perf(host/linux): NV12 GPU convert — feed NVENC native YUV, off the contended SM (Tier 2A)
apple / swift (push) Successful in 54s
windows-host / package (push) Failing after 2m18s
ci / web (push) Successful in 32s
ci / rust (push) Failing after 5m2s
decky / build-publish (push) Successful in 11s
android / android (push) Failing after 49s
ci / docs-site (push) Successful in 35s
ci / bench (push) Failing after 3m15s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m49s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 15s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 40s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Failing after 28s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Successful in 5m54s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 11s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 1m36s
The Linux zero-copy tiled-GL path can now produce NV12 (BT.709 limited range) on the GPU and feed NVENC native YUV, deleting NVENC's internal RGB->YUV CSC, which runs on the SM/3D-compute engine that a saturating game pins at 100% (the game-vs-encode contention headache). Windows already does this via the D3D11 video processor; this closes the Linux gap. See docs/host-latency-plan.md §2A.

Gated behind PUNKTFUNK_NV12 (default OFF → the RGB/BGRx path is byte-for-byte unchanged; zero regression). Only the tiled EGL/GL path converts; the LINEAR/Vulkan-bridge (gamescope) path stays RGB.

- zerocopy/egl.rs: Nv12Blit: BT.709 limited Y pass (R8, full-res) + UV pass (RG8, half-res, GL_LINEAR 2x2 average); both CUDA-registered; import_nv12.
- zerocopy/cuda.rs: two-plane DeviceBuffer (Y W*H@1B + interleaved UV (W/2)*2 x H/2), paired Y+UV pool, copy_mapped_nv12 + copy_nv12_to_device, on the per-thread priority stream (dmabuf-recycle sync preserved).
- encode/linux.rs: nvenc_input(Nv12)->NV12; submit_cuda copies two planes into NVENC's surface; VUI signalled BT.709 limited (colorspace/range/primaries/trc).
- capture/linux.rs: gate (PUNKTFUNK_NV12 && tiled), report format Nv12.
- main.rs + zerocopy/mod.rs: `nv12-selftest` subcommand.

Validated on RTX 5070 Ti two ways:
(1) nv12-selftest: synthetic RGBA->NV12 round-trip vs a BT.709 reference, max abs error Y=0.56/U=0.33/V=0.26 LSB;
(2) live capture->NV12->NVENC->decode of animated red content matches the RGB path's colour (avg RGB 230,18,18 vs 231,18,20).

build/clippy/fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
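For readers who want to check the colour math by hand, here is a minimal CPU sketch of a BT.709 limited-range RGBA->NV12 conversion with the plane layout described above (Y at W*H bytes, interleaved UV at (W/2)*2 bytes per row for H/2 rows). It is not the commit's `Nv12Blit` shader or its `nv12-selftest` reference; the names `rgb_to_ycbcr709`, `quantize`, and `rgba_to_nv12` are hypothetical, and the sketch assumes tightly packed RGBA8 input with even width and height.

```rust
/// BT.709 luma coefficients (Kr, Kb); Kg follows from Kr + Kg + Kb = 1.
const KR: f32 = 0.2126;
const KB: f32 = 0.0722;
const KG: f32 = 1.0 - KR - KB;

/// Convert one full-range 8-bit RGB pixel to unrounded BT.709
/// limited-range (Y, Cb, Cr): Y in [16, 235], Cb/Cr in [16, 240].
fn rgb_to_ycbcr709(r: u8, g: u8, b: u8) -> (f32, f32, f32) {
    let (r, g, b) = (r as f32 / 255.0, g as f32 / 255.0, b as f32 / 255.0);
    let y = KR * r + KG * g + KB * b;
    let cb = (b - y) / (2.0 * (1.0 - KB));
    let cr = (r - y) / (2.0 * (1.0 - KR));
    (16.0 + 219.0 * y, 128.0 + 224.0 * cb, 128.0 + 224.0 * cr)
}

/// Round to the nearest 8-bit code value.
fn quantize(v: f32) -> u8 {
    v.round().clamp(0.0, 255.0) as u8
}

/// Hypothetical reference converter: tightly packed RGBA8 -> (Y plane, UV plane).
/// Chroma is the average over each 2x2 block, stored interleaved as Cb, Cr.
fn rgba_to_nv12(rgba: &[u8], w: usize, h: usize) -> (Vec<u8>, Vec<u8>) {
    assert!(w % 2 == 0 && h % 2 == 0, "sketch assumes even dimensions");
    assert_eq!(rgba.len(), w * h * 4, "sketch assumes tightly packed RGBA8");
    let mut y_plane = vec![0u8; w * h];
    // UV rows are (w / 2) * 2 = w bytes wide, and there are h / 2 of them.
    let mut uv_plane = vec![0u8; w * h / 2];
    for by in 0..h / 2 {
        for bx in 0..w / 2 {
            let (mut cb_sum, mut cr_sum) = (0.0f32, 0.0f32);
            for dy in 0..2 {
                for dx in 0..2 {
                    let (x, y) = (2 * bx + dx, 2 * by + dy);
                    let p = (y * w + x) * 4;
                    let (luma, cb, cr) = rgb_to_ycbcr709(rgba[p], rgba[p + 1], rgba[p + 2]);
                    y_plane[y * w + x] = quantize(luma);
                    cb_sum += cb;
                    cr_sum += cr;
                }
            }
            let o = by * w + bx * 2;
            uv_plane[o] = quantize(cb_sum / 4.0);
            uv_plane[o + 1] = quantize(cr_sum / 4.0);
        }
    }
    (y_plane, uv_plane)
}
```

Because the conversion is affine, averaging the chroma of four pixels gives the same unrounded value as converting their averaged RGB, which is what a bilinear half-resolution sample would see. With this sketch, pure red (255, 0, 0) maps to Y=63, Cb=102, Cr=240.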
@@ -466,6 +466,9 @@ mod pipewire {
     negotiated: Arc<AtomicBool>,
     /// Present when zero-copy is enabled: imports a dmabuf → CUDA device buffer.
     importer: Option<crate::zerocopy::EglImporter>,
+    /// `PUNKTFUNK_NV12`: on the tiled EGL/GL zero-copy path, convert to NV12 on the GPU and feed
+    /// NVENC native YUV (Tier 2A). Off ⇒ the BGRx path is unchanged.
+    nv12: bool,
     /// Rate-limit counter for the latest-frame-only diagnostic log (see `.process`).
     dbg_log_n: u64,
 }
@@ -780,8 +783,17 @@ mod pipewire {
 // sample LINEAR).
 let modifier = (ud.modifier != 0).then_some(ud.modifier);
 if let Some(fourcc) = crate::zerocopy::drm_fourcc(fmt) {
-    let imported = if modifier.is_some() {
-        importer.import(&plane, w as u32, h as u32, fourcc, modifier)
+    // NV12 convert (Tier 2A) only on the tiled EGL/GL path (`modifier.is_some()`):
+    // produce native YUV so NVENC skips its internal RGB→YUV CSC. The LINEAR/Vulkan
+    // (gamescope) path stays RGB — its convert isn't wired here. When NV12 is
+    // produced the frame's format is reported as `Nv12` so the encoder opens native.
+    let nv12 = ud.nv12 && modifier.is_some();
+    let imported = if let Some(m) = modifier {
+        if nv12 {
+            importer.import_nv12(&plane, w as u32, h as u32, fourcc, Some(m))
+        } else {
+            importer.import(&plane, w as u32, h as u32, fourcc, Some(m))
+        }
     } else {
         importer.import_linear(&plane, w as u32, h as u32)
     };
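The branch structure in this hunk can be read as a small decision table. The sketch below restates it as a pure function so the three outcomes are easy to see and test in isolation; `ImportPath` and `select_import_path` are hypothetical names for illustration and do not exist in the commit, which calls `import_nv12`, `import`, and `import_linear` directly.

```rust
/// Hypothetical label for which importer call the hunk above would take.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImportPath {
    /// Tiled dmabuf, NV12 gate on: corresponds to `import_nv12`.
    TiledNv12,
    /// Tiled dmabuf, NV12 gate off: corresponds to `import` (RGB/BGRx).
    TiledRgb,
    /// Modifier 0 (LINEAR): corresponds to `import_linear`, always RGB.
    Linear,
}

/// Mirror of the diff's logic: a modifier of 0 is treated as "no modifier",
/// and NV12 is only chosen when the gate is on AND the buffer is tiled.
fn select_import_path(nv12_gate: bool, raw_modifier: u64) -> ImportPath {
    let modifier = (raw_modifier != 0).then_some(raw_modifier);
    match (modifier, nv12_gate) {
        (Some(_), true) => ImportPath::TiledNv12,
        (Some(_), false) => ImportPath::TiledRgb,
        (None, _) => ImportPath::Linear,
    }
}
```

The point the table makes explicit is that the gate alone never changes the LINEAR path: with modifier 0 the result is `Linear` whether or not `PUNKTFUNK_NV12` is set, which matches the commit's claim that the gamescope path stays RGB.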
@@ -794,6 +806,7 @@ mod pipewire {
         w,
         h,
         modifier = ud.modifier,
+        nv12,
         "zero-copy: dmabuf imported to CUDA (no CPU copy)"
     );
 }
@@ -805,7 +818,7 @@ mod pipewire {
     width: w as u32,
     height: h as u32,
     pts_ns,
-    format: fmt,
+    format: if nv12 { PixelFormat::Nv12 } else { fmt },
     payload: FramePayload::Cuda(devbuf),
 });
 return;
@@ -978,6 +991,12 @@ mod pipewire {
         "zero-copy: advertising EGL-importable dmabuf modifiers"
     );
 }
+if want_dmabuf && crate::zerocopy::nv12_enabled() {
+    tracing::info!(
+        "PUNKTFUNK_NV12: tiled dmabufs convert to NV12 (BT.709 limited) on the GPU — NVENC \
+         fed native YUV (no internal RGB→YUV CSC)"
+    );
+}
 
 let data = UserData {
     info: VideoInfoRaw::default(),
@@ -987,6 +1006,7 @@ mod pipewire {
     active,
     negotiated,
     importer,
+    nv12: crate::zerocopy::nv12_enabled(),
     dbg_log_n: 0,
 };
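The body of `crate::zerocopy::nv12_enabled()` is not part of this diff, so the following is only a guess at what a default-OFF environment gate of this kind might look like. The accepted values ("1", "true", "on", "yes") and the helper names `flag_is_on` and `nv12_enabled_sketch` are assumptions for illustration, not the project's actual parsing rule.

```rust
use std::env;

/// Hypothetical parsing rule: only an explicit truthy value enables the gate,
/// so an unset or unrecognised variable keeps the default-OFF behaviour.
fn flag_is_on(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(
            v.trim().to_ascii_lowercase().as_str(),
            "1" | "true" | "on" | "yes"
        ),
        None => false,
    }
}

/// Hypothetical stand-in for `nv12_enabled()`: reads `PUNKTFUNK_NV12` from the
/// process environment and applies the rule above.
fn nv12_enabled_sketch() -> bool {
    flag_is_on(env::var("PUNKTFUNK_NV12").ok().as_deref())
}
```

Keeping the parsing in a pure helper makes the default-OFF guarantee testable without touching the process environment. Since the diff calls the gate twice (once for the log line, once for `UserData`), a real implementation might also cache the result, for example in a `std::sync::OnceLock`, so both call sites always agree.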