Files
punktfunk/crates
enricobuehler 68c92f6874 feat(host/vaapi): submit-split instrumentation + async_depth knob (depth 1 stays default)
Chasing the 8ms submit at 1440p on the 780M: the sampled PUNKTFUNK_PERF
split (push/pull/send) shows desc+buffersrc at ~5us, hwmap-import+VPP
CSC at ~0.2-0.5ms, and avcodec_send_frame owning the rest — so neither
a VA-surface import cache nor CSC overlap would help. Two facts landed:
(1) async_depth>=2 in libavcodec's vaapi_encode is a structural
+1-frame latency (frame N's packet only materializes when N+1 queues;
measured 18ms vs 8.3ms p50 at depth 1) — depth 1 stays the default,
PUNKTFUNK_VAAPI_ASYNC_DEPTH exists for pixel rates beyond the ASIC's
serial budget, and poll() now does a bounded in-flight wait so a deeper
depth still ships the AU as soon as the ASIC finishes. (2) The residual
send_frame block tracks GPU CLOCKS, not the ASIC: ~8ms/frame at a 60fps
duty cycle vs ~4.4ms at 120fps pacing vs 3.5ms back-to-back (270fps CLI
benchmark, even at -async_depth 1) — the clock-sag fix lands in
gpuclocks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 15:03:45 +00:00
..