26fbd9ec64
ci / rust (push) Failing after 43s
apple / swift (push) Successful in 53s
ci / web (push) Successful in 35s
android / android (push) Successful in 1m45s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m35s
decky / build-publish (push) Successful in 32s
deb / build-publish (push) Successful in 2m21s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 17s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m59s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3m52s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m37s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 5m4s
docker / deploy-docs (push) Successful in 18s
The Windows host pegged the GPU 3D engine at ~97% during high-fps desktop streaming — measured (per- process GPU-engine counters) as OUR process, not DWM. Cause: TWO VRAM->VRAM CopyResource per frame (dupl->gpu_copy in the capturer, then gpu_copy->nvenc_pool in the encoder), and on Windows D3D11 routes copies to render-target textures through the 3D engine (the DMA copy engine sat idle at 7%), so at 240 fps they saturate it and contend with a game's own rendering. Eliminate the second copy: NVENC now registers the capturer's D3D11 texture directly (cached by raw pointer, the cloned texture kept alive until unregister) and encode_pictures it IN PLACE — no encoder-owned input pool, no per-frame copy. Safe because the host encode loop is synchronous (capture -> submit -> poll, where lock_bitstream blocks until the encode finishes), so the capturer never overwrites the texture mid-encode; documented in the module header in case that ever changes. 2 GPU copies/frame -> 1 (the remaining dupl->gpu_copy is unavoidable; that DXGI surface is transient). Measured: SM/compute ~10-15% at ~217 fps 5K (was ~20% at only ~48 fps with two copies), 3687 frames decoded clean. Windows-only; Linux/macOS unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>