- scripts/bench/compare.py: diff criterion medians (target/criterion/**/estimates.json) vs a
committed baseline, print a markdown table to the job summary, flag >threshold regressions, always
exit 0 (shared CI hardware is too noisy to gate on). --update rewrites the baseline.
- ci.yml `bench` job: runs Tier-1 (criterion) + Tier-2 (loss-harness FEC recovery) GPU-free in the
rust-ci container, then compare.py — report-only visibility per push/PR.
- scripts/bench/gpu-stream.sh + bench-gpu.yml: Tier-3 real pipeline (virtual output → zero-copy →
NVENC → punktfunk/1 → reassemble) on a self-hosted GPU runner; captures encode_us/tx_mbps/
send_dropped + client capture→reassembled latency, compares to gpu-baseline.json (20% threshold).
Needs the dev box registered as a `[self-hosted, gpu]` act_runner (one-time, see the workflow
header) — the dedicated hardware makes its absolute baseline meaningful, unlike shared CI.
- baseline.json: dev-box Tier-1 numbers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>