punktfunk/crates
enricobuehler 99f60b5b08
perf(latency): microburst-cap pacing + per-frame latency histogram
From the latency investigation: the freeze-fix pacing (paced_submit) was the
single biggest software-controllable latency term — it unconditionally spread
EVERY multi-chunk frame over ~90% of the frame interval, adding up to ~7.5 ms
@120 / ~15 ms @60 to a frame's last packet even when the frame was small or the
link idle. Recover that on the common case while keeping the freeze fix:

- Microburst-cap pacing: a frame whose sealed size is <= a cap (default 128 KB,
  PUNKTFUNK_PACE_BURST_KB) goes out in ONE immediate burst — no pacing latency.
  Only the OVERFLOW of a bigger frame (IDR / sustained high bitrate, the bursts
  that actually overran the tx buffer and froze) is spread. 128 KB is well under
  the ~150 Mbps@60 frame size where drops began, so the default is safe; raise it
  after confirming send_dropped stays 0 on a given link. Pacing is still never
  slower than unpaced sending (the budget collapses to 0 when there is no slack).
  Seal-once, in-order nonce handling is preserved: chunks are split, never
  reordered or re-sealed.
- Per-frame instrumentation (PUNKTFUNK_PERF, zero-cost off): encode_us +
  pace_us (the pacing tail) p50/p99/max histograms + immediate-vs-paced frame
  counts in the periodic perf line, so the pacing tail is finally visible and the
  cap is tunable against real numbers.
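
The split described in the first bullet can be sketched roughly as below. This is an illustration only, not the actual paced_submit code: `plan_pacing`, `PacePlan`, and `DEFAULT_BURST_CAP_BYTES` are hypothetical names, the even spacing of overflow chunks is an assumption, and so is the choice to always send at least one chunk immediately even if it alone exceeds the cap.

```rust
use std::time::Duration;

/// Hypothetical constant mirroring the 128 KB default cap from the commit message.
const DEFAULT_BURST_CAP_BYTES: usize = 128 * 1024;

/// How to send one frame's already-sealed chunks, in their original order.
#[derive(Debug, PartialEq)]
struct PacePlan {
    /// Number of leading chunks sent back-to-back with no delay.
    immediate: usize,
    /// Delay inserted before each remaining (overflow) chunk.
    gap: Duration,
}

/// Split a frame into an immediate burst (up to `cap_bytes`) and a paced overflow.
/// `budget` is the time available for spreading the overflow; a zero budget
/// (no slack) yields a zero gap, i.e. the same behaviour as unpaced sending.
fn plan_pacing(chunk_sizes: &[usize], cap_bytes: usize, budget: Duration) -> PacePlan {
    let mut sent = 0usize;
    let mut immediate = 0usize;
    for &size in chunk_sizes {
        // Assumption: the first chunk always goes out immediately.
        if immediate > 0 && sent + size > cap_bytes {
            break;
        }
        sent += size;
        immediate += 1;
    }
    // Chunks are only split into two groups, never reordered.
    let overflow = chunk_sizes.len() - immediate;
    let gap = if overflow == 0 {
        Duration::ZERO
    } else {
        budget / overflow as u32
    };
    PacePlan { immediate, gap }
}
```

Under these assumptions a frame at or below the cap gets `gap == 0` and all chunks in `immediate`, which is the "no pacing latency" case; only frames above the cap pay any pacing tail.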

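For reading the p50/p99/max figures in the perf line, a minimal nearest-rank percentile over per-frame microsecond samples might look like this. It is a sketch under stated assumptions: `percentile_us` is a hypothetical helper, and the real instrumentation may well use a bucketed histogram instead of sorting raw samples.

```rust
/// Nearest-rank percentile over microsecond samples, with `p` in 0..=100.
/// Returns None when there are no samples. Sorts the slice in place.
fn percentile_us(samples: &mut [u64], p: u32) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let n = samples.len();
    // Nearest-rank: rank = ceil(p / 100 * n), clamped to 1..=n.
    let rank = (p as usize * n + 99) / 100;
    let idx = rank.clamp(1, n) - 1;
    Some(samples[idx])
}
```

Calling it with `p = 100` returns the maximum, so the same helper covers p50, p99, and max for a window of pace_us or encode_us samples.
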
Host builds + clippy + fmt green. NOT yet deployed to the running hosts (still on
the safe full-pacing A+B build) — needs the user's LAN soak to validate the cap
doesn't reintroduce send_dropped before raising it. Deferred bigger bets (need
real-NIC/GPU/Mac validation): encode|send thread split on the native path,
CUDA stream+event (one redundant sync), NVENC slice wrapper, stage-2 Apple
presenter, glass-to-glass probe — see docs/roadmap.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:53:52 +00:00