ci: bust the re-poisoned cargo cache (v3) + burst-guard the runner prune
apple / swift (push) Successful in 53s
android / android (push) Has been cancelled
deb / build-publish (push) Has been cancelled
ci / rust (push) Has been cancelled
ci / web (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
decky / build-publish (push) Has been cancelled
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Has been cancelled
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Has been cancelled
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Has been cancelled
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Has been cancelled
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Has been cancelled
docker / deploy-docs (push) Has been cancelled
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Has been cancelled
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Has been cancelled
This session's push storm refilled the runner to 100% WITHIN the prune timer's 24h window (it only trims >24h), so a build hit ENOSPC and actions/cache saved a truncated target/ -> `error[E0463]: can't find crate for shlex` in ci.yml's clippy.

Two fixes:
- Bump cargo-target-v2- -> v3- in ci.yml + deb.yml so the poisoned tarball is bypassed (a suffix bump can't: restore-keys falls back to the old prefix; same as the v1->v2 fix).
- Harden scripts/ci/docker-prune: run HOURLY (was 6h) with a burst guard. If the disk is still >85% after the normal until=12h trim, prune ALL idle images + build cache (in-use protected). A fast push-burst can fill 99 GB inside any time window, so the disk-pressure trigger, not the age filter, is the real backstop.

Applied live on home-runner-1 (reclaimed 95%->66%) and checked in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
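The reason a suffix bump cannot bypass the poisoned tarball can be sketched with a small model of the cache lookup. This is an illustration only, not the actions/cache implementation: it assumes the documented behaviour that the exact `key` is tried first and `restore-keys` is then matched as a prefix. The function name `lookup_cache`, the rustc version and the lock-file hash are all hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the cache lookup: exact key first, then restore-keys as a prefix.
# $1 = key, $2 = restore-keys prefix, remaining args = stored cache keys (newest first).
lookup_cache() {
    key=$1
    prefix=$2
    shift 2
    # Pass 1: an exact key hit wins.
    for stored in "$@"; do
        if [ "$stored" = "$key" ]; then
            echo "$stored"
            return 0
        fi
    done
    # Pass 2: the first stored key that starts with the restore-keys prefix.
    for stored in "$@"; do
        case "$stored" in
            "$prefix"*) echo "$stored"; return 0 ;;
        esac
    done
    echo "MISS"
}

# The entry saved by the disk-full build (version and hash are made up).
poisoned="cargo-target-v2-1.80.0-aaaa"

# Suffix bump: the key changes, but the old prefix still matches the poisoned entry.
lookup_cache "cargo-target-v2-1.80.0-aaaa-fix" "cargo-target-v2-1.80.0-" "$poisoned"

# Prefix bump: neither the key nor the restore-keys prefix can match it.
lookup_cache "cargo-target-v3-1.80.0-aaaa" "cargo-target-v3-1.80.0-" "$poisoned"
```

Under these assumptions the first call prints the poisoned key and the second prints `MISS`, so only the prefix bump forces a clean rebuild.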
@@ -42,12 +42,12 @@ jobs:
 - uses: actions/cache@v4
 with:
 path: target
-# -v2-: the prior `cargo-target-<rustc>-*` cache was poisoned when the runner ran
+# -v3-: the prior `cargo-target-<rustc>-*` cache was poisoned when the runner ran
 # out of disk mid-build and actions/cache saved a truncated target/ (a dep's .rmeta
 # went missing -> E0463 "can't find crate"). A suffix bump wouldn't help — restore-keys
 # would fall back to the poisoned prefix — so the prefix itself is versioned.
-key: cargo-target-v2-${{ env.rustc }}-${{ hashFiles('Cargo.lock') }}
-restore-keys: cargo-target-v2-${{ env.rustc }}-
+key: cargo-target-v3-${{ env.rustc }}-${{ hashFiles('Cargo.lock') }}
+restore-keys: cargo-target-v3-${{ env.rustc }}-

 - name: Format
 run: cargo fmt --all --check
@@ -71,10 +71,10 @@ jobs:
 - uses: actions/cache@v4
 with:
 path: target
-# -v2-: bypass a target cache poisoned by a disk-full build (see ci.yml). Shares the
+# -v3-: bypass a target cache poisoned by a disk-full build (see ci.yml). Shares the
 # key with ci.yml so the release build reuses its clean artifacts.
-key: cargo-target-v2-${{ env.rustc }}-${{ hashFiles('Cargo.lock') }}
-restore-keys: cargo-target-v2-${{ env.rustc }}-
+key: cargo-target-v3-${{ env.rustc }}-${{ hashFiles('Cargo.lock') }}
+restore-keys: cargo-target-v3-${{ env.rustc }}-

 - name: Build release host + client
 env:
@@ -2,12 +2,14 @@
 #
 # Why this exists: every CI push builds and sha-<commit>-tags a Docker image per pipeline
 # (rust-ci, web, docs, fedora-rpm, fedora44-rpm, ...). Those tags are never dangling, so a
-# plain `docker image prune` SKIPS them and they accumulate forever — that is what filled the
-# disk (589 images / ~85 GB, builds failing on ENOSPC). This trims everything older than 24h;
-# images IN USE by a running container are always protected regardless of age.
-#
+# plain `docker image prune` SKIPS them and they accumulate — that is what filled the disk.
 # Host-level, not per-repo CI, because the runner is shared (punktfunk + other orgs all benefit).
 #
+# Two tiers: trim anything older than 12h normally, AND — because a push-burst can fill 99 GB
+# WITHIN that 12h window (a fast iteration session hit 100% and poisoned the cargo cache with a
+# truncated, half-saved target/) — a burst guard that prunes ALL idle images + cache once the
+# disk is >85% full. Images IN USE by a running container are always protected.
+#
 # Install on the runner host (root):
 # cp scripts/ci/docker-prune.{service,timer} /etc/systemd/system/
 # systemctl daemon-reload && systemctl enable --now docker-prune.timer
@@ -22,7 +24,10 @@ After=docker.service
 [Service]
 Type=oneshot
 # '-' prefix: each step is independent — a no-op/failure never blocks the others.
-ExecStart=-/usr/bin/docker image prune -af --filter until=24h
-ExecStart=-/usr/bin/docker builder prune -af --filter until=24h
-ExecStart=-/usr/bin/docker buildx prune -af --filter until=24h
-ExecStart=-/usr/bin/docker container prune -f --filter until=24h
+ExecStart=-/usr/bin/docker image prune -af --filter until=12h
+ExecStart=-/usr/bin/docker builder prune -af --filter until=12h
+ExecStart=-/usr/bin/docker buildx prune -af --filter until=12h
+ExecStart=-/usr/bin/docker container prune -f --filter until=12h
+# Burst guard: if STILL >85% full, prune every idle image + all build cache (in-use protected),
+# so a push-storm can't drive CI into ENOSPC (which truncates and poisons the actions/cargo cache).
+ExecStart=-/bin/sh -c 'P=$(df --output=pcent / | tr -dc 0-9); [ "$P" -ge 85 ] && { docker image prune -af; docker builder prune -af; docker buildx prune -af; } || true'
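The burst-guard one-liner in the service unit can be read more easily when unrolled. The sketch below is a dry-run rendering of the same decision, not the checked-in script: the function names `disk_used_pct`, `should_burst_prune` and `burst_guard` are hypothetical, and it prints the decision instead of calling docker. It keeps the `-ge 85` comparison from the unit, so the guard fires at exactly 85% as well as above it. `df --output=pcent` is assumed to be GNU coreutils `df`, as on the runner.

```shell
#!/bin/sh
# Threshold taken from the unit file; the one-liner compares with -ge, so 85 itself triggers.
THRESHOLD=85

# Print the used-space percentage of a mount point as a bare integer.
# `df --output=pcent` prints a header plus e.g. " 66%"; tr keeps only the digits.
disk_used_pct() {
    df --output=pcent "$1" | tr -dc 0-9
}

# Succeed when the given percentage is at or above the threshold.
should_burst_prune() {
    [ "$1" -ge "$THRESHOLD" ]
}

# Dry run: report the decision instead of pruning.
burst_guard() {
    if should_burst_prune "$1"; then
        echo "prune"
    else
        echo "skip"
    fi
}
```

On the runner this would be driven by something like `burst_guard "$(disk_used_pct /)"`. Where the sketch prints `prune`, the real unit runs the unfiltered `docker image prune -af`, `docker builder prune -af` and `docker buildx prune -af`.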
@@ -1,12 +1,12 @@
-# Runs docker-prune.service every 6h. Persistent=true catches up after downtime.
-# Install: see the header of docker-prune.service.
+# Runs docker-prune.service hourly (the burst guard needs to react within the hour, not every 6h).
+# Persistent=true catches up after downtime. Install: see the header of docker-prune.service.

 [Unit]
-Description=Run docker-prune every 6h (CI runner disk hygiene)
+Description=Run docker-prune hourly (CI runner disk hygiene + burst guard)

 [Timer]
-OnCalendar=*-*-* 00/6:00:00
-RandomizedDelaySec=600
+OnCalendar=hourly
+RandomizedDelaySec=300
 Persistent=true

 [Install]