Files
punktfunk/crates
enricobuehler 26fbd9ec64
ci / rust (push) Failing after 43s
apple / swift (push) Successful in 53s
ci / web (push) Successful in 35s
android / android (push) Successful in 1m45s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m35s
decky / build-publish (push) Successful in 32s
deb / build-publish (push) Successful in 2m21s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 17s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m59s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3m52s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m37s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 5m4s
docker / deploy-docs (push) Successful in 18s
perf(host/windows): zero-copy NVENC — encode the capturer's texture in place (halve 3D-engine load)
The Windows host pegged the GPU 3D engine at ~97% during high-fps desktop streaming — measured (per-
process GPU-engine counters) as OUR process, not DWM. Cause: TWO VRAM->VRAM CopyResource per frame
(dupl->gpu_copy in the capturer, then gpu_copy->nvenc_pool in the encoder), and on Windows D3D11
routes copies to render-target textures through the 3D engine (the DMA copy engine sat idle at 7%),
so at 240 fps they saturate it and contend with a game's own rendering.

Eliminate the second copy: NVENC now registers the capturer's D3D11 texture directly (cached by raw
pointer, the cloned texture kept alive until unregister) and encode_pictures it IN PLACE — no
encoder-owned input pool, no per-frame copy. Safe because the host encode loop is synchronous
(capture -> submit -> poll, where lock_bitstream blocks until the encode finishes), so the capturer
never overwrites the texture mid-encode; documented in the module header in case that ever changes.

2 GPU copies/frame -> 1 (the remaining dupl->gpu_copy is unavoidable; that DXGI surface is transient).
Measured: SM/compute ~10-15% at ~217 fps 5K (was ~20% at only ~48 fps with two copies), 3687 frames
decoded clean. Windows-only; Linux/macOS unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 17:33:07 +00:00
..