feat(host/encode): VAAPI zero-copy dmabuf import (AMD/Intel GPU CSC)
apple / swift (push) Successful in 57s
ci / rust (push) Successful in 1m39s
ci / web (push) Successful in 32s
ci / docs-site (push) Successful in 31s
android / android (push) Successful in 3m29s
windows-host / package (push) Successful in 3m39s
deb / build-publish (push) Successful in 3m7s
decky / build-publish (push) Successful in 22s
ci / bench (push) Successful in 4m43s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m24s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m18s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m22s
docker / deploy-docs (push) Successful in 21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m53s
apple / swift (push) Successful in 57s
ci / rust (push) Successful in 1m39s
ci / web (push) Successful in 32s
ci / docs-site (push) Successful in 31s
android / android (push) Successful in 3m29s
windows-host / package (push) Successful in 3m39s
deb / build-publish (push) Successful in 3m7s
decky / build-publish (push) Successful in 22s
ci / bench (push) Successful in 4m43s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m24s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m18s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m22s
docker / deploy-docs (push) Successful in 21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m53s
Phase 2 of AMD/Intel support: the VAAPI encoder now takes the capture dmabuf directly and does the RGB->NV12 colour conversion on the GPU's video engine, eliminating the host-side de-pad + swscale CSC + upload the CPU path pays. - capture: a vendor-neutral FramePayload::Dmabuf (dup'd fd + fourcc/modifier/ layout). When zero-copy is on, the EGL->CUDA importer is unavailable (any non-NVIDIA host), and the backend is VAAPI, the capturer advertises LINEAR dmabuf and hands the raw buffer to the encoder instead of CPU-copying it. - encode/vaapi: the encoder self-configures from the first frame's payload (no open_video signature change). The dmabuf arm wraps the buffer as an AV_PIX_FMT_DRM_PRIME frame and pushes it through a filter graph buffer(drm_prime) -> hwmap(vaapi) -> scale_vaapi=nv12 -> buffersink; the encoder takes NV12 surfaces straight from the sink. The Phase 1 CPU-upload path is kept as the other arm (used when capture produces CPU frames). Live-validated on a Radeon 780M (real Sway/xdpw desktop capture): correct, pixel-perfect HEVC, and ~10x less host CPU at 1440p (4.2s -> 0.4s of CPU for 300 frames) -- the de-pad/CSC/upload moves to the GPU. NVIDIA unchanged (zero-copy still imports to CUDA; the passthrough path only engages on non-NVIDIA hosts). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -56,21 +56,41 @@ pub struct CapturedFrame {
|
||||
pub payload: FramePayload,
|
||||
}
|
||||
|
||||
/// A captured frame still living in a single-plane packed-RGB dmabuf (the VAAPI zero-copy path).
|
||||
/// Owns a *dup* of the PipeWire buffer's fd, so the frame can travel to the encode thread and be
|
||||
/// imported into a VA surface there without the compositor's buffer being closed underneath it.
|
||||
/// (Content stability across the brief import window relies on the compositor's buffer pool depth,
|
||||
/// same as any zero-copy capture — the VAAPI importer copies into its own NV12 surface promptly.)
|
||||
#[cfg(target_os = "linux")]
|
||||
pub struct DmabufFrame {
|
||||
pub fd: std::os::fd::OwnedFd,
|
||||
/// DRM FourCC of the packed-RGB plane (e.g. `XR24` for BGRx).
|
||||
pub fourcc: u32,
|
||||
/// DRM format modifier the compositor allocated (0 = LINEAR).
|
||||
pub modifier: u64,
|
||||
pub offset: u32,
|
||||
pub stride: u32,
|
||||
}
|
||||
|
||||
/// Where a captured frame's pixels live.
|
||||
pub enum FramePayload {
|
||||
/// Tightly-packed CPU pixels in `format`, `width*height*bytes_per_pixel` (no row padding).
|
||||
Cpu(Vec<u8>),
|
||||
/// A pitched GPU buffer (BGRA-order, on the shared CUDA context) — the zero-copy path. The
|
||||
/// dmabuf has already been imported + copied into this owned device buffer.
|
||||
/// A pitched GPU buffer (BGRA-order, on the shared CUDA context) — the NVIDIA zero-copy path.
|
||||
/// The dmabuf has already been imported + copied into this owned device buffer.
|
||||
#[cfg(target_os = "linux")]
|
||||
Cuda(crate::zerocopy::DeviceBuffer),
|
||||
/// A raw packed-RGB dmabuf — the AMD/Intel (VAAPI) zero-copy path. The encoder imports it into
|
||||
/// a VA surface and does RGB→NV12 on the GPU video engine (no host CSC, no upload).
|
||||
#[cfg(target_os = "linux")]
|
||||
Dmabuf(DmabufFrame),
|
||||
/// A GPU-resident D3D11 texture (Windows zero-copy path for NVENC). Owns the copied frame.
|
||||
#[cfg(target_os = "windows")]
|
||||
D3d11(dxgi::D3d11Frame),
|
||||
}
|
||||
|
||||
impl CapturedFrame {
|
||||
/// True if the frame's pixels are a GPU/CUDA buffer (the zero-copy path).
|
||||
/// True if the frame's pixels are a GPU/CUDA buffer (the NVIDIA zero-copy path).
|
||||
pub fn is_cuda(&self) -> bool {
|
||||
#[cfg(target_os = "linux")]
|
||||
{
|
||||
@@ -81,6 +101,18 @@ impl CapturedFrame {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// True if the frame is a raw dmabuf (the VAAPI zero-copy path).
|
||||
pub fn is_dmabuf(&self) -> bool {
|
||||
#[cfg(target_os = "linux")]
|
||||
{
|
||||
matches!(self.payload, FramePayload::Dmabuf(_))
|
||||
}
|
||||
#[cfg(not(target_os = "linux"))]
|
||||
{
|
||||
false
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Produces frames from a captured output. Lives on its own thread, feeding the encoder
|
||||
|
||||
Reference in New Issue
Block a user