3 Commits

Author SHA1 Message Date
enricobuehler 3074b30988 ci(runner): cap the act_runner cache + 30-min prune (fix recurring disk-full)
apple / swift (push) Successful in 53s
android / android (push) Successful in 10m42s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
ci / rust (push) Successful in 11m39s
ci / bench (push) Successful in 4m43s
deb / build-publish (push) Successful in 3m7s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
decky / build-publish (push) Successful in 12s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 7m23s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 7m24s
docker / deploy-docs (push) Failing after 8s
The hourly docker-prune could never reclaim the real disk filler: the act_runner
cache server's blob store (cache.dir:"" -> /root/.cache/actcache/cache) lives in
the long-running runner container's WRITABLE LAYER, which docker prune can't see.
It grew to ~66 GB and filled the 125 GB disk on its own.

- New docker-prune.sh holds the logic (inline ExecStart= broke under systemd's
  own $-expansion, which emptied $SZ/$(...) before sh ran them — silently no-oping
  the burst guard). The unit now just calls the script.
- Caps the actcache: clears the blobs once they exceed ~20 GB (act_runner
  repopulates; keys are content-hashed, so only stale entries drop).
- Burst guard lowered 85%->80% and now also clears the actcache.
- Timer hourly -> every 30 min; image/cache `until` 12h -> 6h.

Live: cleared 66 GB on home-runner-1 (93% -> 20%), deployed + verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 07:53:08 +00:00
enricobuehler 8265742e74 ci: bust the re-poisoned cargo cache (v3) + burst-guard the runner prune
apple / swift (push) Successful in 53s
android / android (push) Has been cancelled
deb / build-publish (push) Has been cancelled
ci / rust (push) Has been cancelled
ci / web (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
decky / build-publish (push) Has been cancelled
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Has been cancelled
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Has been cancelled
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Has been cancelled
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Has been cancelled
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Has been cancelled
docker / deploy-docs (push) Has been cancelled
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Has been cancelled
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Has been cancelled
This session's push storm refilled the runner to 100% WITHIN the prune timer's 24h window
(it only trims >24h), so a build hit ENOSPC and actions/cache saved a truncated target/ ->
`error[E0463]: can't find crate for shlex` in ci.yml's clippy. Two fixes:

- Bump cargo-target-v2- -> v3- in ci.yml + deb.yml so the poisoned tarball is bypassed (a
  suffix bump can't — restore-keys falls back to the old prefix; same as the v1->v2 fix).
- Harden scripts/ci/docker-prune: run HOURLY (was 6h) with a burst guard — if the disk is
  still >85% after the normal until=12h trim, prune ALL idle images + build cache (in-use
  protected). A fast push-burst can fill 99 GB inside any time window, so the disk-pressure
  trigger, not the age filter, is the real backstop. Applied live on home-runner-1 (reclaimed
  95%->66%) and checked in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:25:40 +00:00
enricobuehler bf65d264fd ci: bound runner disk + bust the disk-full-corrupted cargo target cache
apple / swift (push) Successful in 54s
ci / bench (push) Successful in 1m35s
ci / rust (push) Successful in 6m49s
android / android (push) Failing after 4m5s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 26s
decky / build-publish (push) Successful in 29s
deb / build-publish (push) Failing after 2m33s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m40s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m32s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m17s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
flatpak / build-publish (push) Failing after 4s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m27s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 5m28s
docker / deploy-docs (push) Successful in 20s
The self-hosted runner filled its disk (95%, builds failing on ENOSPC): every CI
push builds a sha-<commit>-tagged Docker image per pipeline, and since those tags
are never dangling a plain `docker image prune` skips them — they piled up to 589
images / ~85 GB plus 18 GB of build cache. Two parts:

- scripts/ci/docker-prune.{service,timer}: a host-level systemd timer (every 6h,
  Persistent) that prunes images/build-cache/containers older than 24h — in-use
  images stay protected. Checked in (the runner is hand-provisioned and shared
  across orgs) and already installed live; reclaimed 89 GB -> 39 GB (95% -> 42%).

- ci.yml / deb.yml: bump the `cargo-target-<rustc>-*` cache key to `-v2-`. The
  disk-full build let actions/cache save a truncated target/ (a dep's .rmeta went
  missing -> "error[E0463]: can't find crate for pem_rfc7468" while compiling der).
  A suffix bump is useless here — restore-keys would fall back to the poisoned
  prefix — so the prefix is versioned to force one clean rebuild. cargo-home is
  untouched (sources were intact; the failure was a missing build artifact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 09:10:56 +00:00