Files
punktfunk/crates
enricobuehler 823e0e737a feat(host/windows): two-thread async NVENC retrieve (PUNKTFUNK_NVENC_ASYNC, opt-in)
The gpu-contention plan's §5.B lever: today submit and the blocking
lock_bitstream share one thread, so under a GPU-saturating game the
pipeline serializes on the WDDM scheduling wait (1000/17ms ≈ 59fps —
the depth-1 collapse; the old 'deeper pipeline just stacks latency'
result was a same-thread implementation, not a disproof). Async mode
opens the session enableEncodeAsync=1, registers an auto-reset
completion event per pool bitstream, and moves the wait+lock+copy+
unlock onto an internal retrieve thread feeding poll() through a
channel — the exact split the NVENC guide mandates. Register/map/unmap
stay on the encode thread; teardown drops the job channel, joins the
thread, THEN destroys the session. In-flight depth is bounded by
PUNKTFUNK_NVENC_ASYNC_DEPTH (default 4, hard cap POOL-1) — both for
output-buffer reuse and because NVENC encodes the capture ring's
textures in place. Idle latency cost ≈ 0 (same-tick pickup); under
contention completed frames queue instead of stalling capture.

CI-compile validated only — on-glass A/B under game load on the RTX
box still pending (box offline).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 15:25:22 +00:00
..