diff --git a/.gitea/workflows/windows-host.yml b/.gitea/workflows/windows-host.yml index cb635c2..ec92d67 100644 --- a/.gitea/workflows/windows-host.yml +++ b/.gitea/workflows/windows-host.yml @@ -20,9 +20,12 @@ # an ephemeral self-signed cert is generated and its public .cer published next to the installer # (import once to LocalMachine\TrustedPublisher). See packaging/windows/pack-host-installer.ps1. # -# NVENC: the host builds with --features nvenc; the only link need is nvencodeapi.lib, synthesised -# from a 2-export .def with llvm-dlltool (no GPU/SDK at build time). The resulting exe is NVIDIA-only -# by design — CI never launches it, so no GPU is needed here. +# GPU backends: the host builds with --features nvenc,amf-qsv = all three vendors in one installer. +# - NVENC (NVIDIA, direct SDK): the only link need is nvencodeapi.lib, synthesised from a 2-export +# .def with llvm-dlltool (no GPU/SDK at build time). +# - AMF/QSV (AMD/Intel, libavcodec): link-imports the FFmpeg libs from FFMPEG_DIR (the BtbN gpl-shared +# tree the client uses; includes the *_amf/*_qsv encoders) and bundles its DLLs into the installer. +# CI never launches the exe, so no GPU is needed here — this is build + Windows clippy coverage only. name: windows-host on: @@ -59,6 +62,13 @@ jobs: # (pwsh Out-File utf8 = no BOM, unlike Windows PowerShell 5.1 — keeps the first line clean). "CARGO_TARGET_DIR=C:\t" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8 "CARGO_WORKSPACE_DIR=$env:GITHUB_WORKSPACE" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8 + # FFMPEG_DIR: the same BtbN gpl-shared x64 tree the Windows CLIENT links against (provisioned + # by scripts/ci/setup-windows-runner.ps1). The host's AMD/Intel AMF/QSV encode backend + # (--features amf-qsv) link-imports avcodec/avutil/swscale from it; pack-host-installer.ps1 + # then bundles its bin\*.dll into the installer. LIBCLANG_PATH is in the runner daemon env. 
+ if (-not $env:FFMPEG_DIR) { + "FFMPEG_DIR=C:\Users\Public\ffmpeg" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8 + } $v = if ($env:GITHUB_REF -like 'refs/tags/v*') { $env:GITHUB_REF_NAME -replace '^v', '' } else { @@ -74,14 +84,15 @@ jobs: & packaging/windows/nvenc/gen-nvenc-importlib.ps1 -OutDir C:\t\nvenc "PUNKTFUNK_NVENC_LIB_DIR=C:\t\nvenc" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8 - - name: Build (release, nvenc) + - name: Build (release, nvenc + amf-qsv) shell: pwsh - run: cargo build --release -p punktfunk-host --features nvenc + # All-vendor host: NVENC (NVIDIA, direct SDK) + AMF/QSV (AMD/Intel, libavcodec via FFMPEG_DIR). + run: cargo build --release -p punktfunk-host --features nvenc,amf-qsv - name: Clippy (host, Windows) shell: pwsh # First-ever Windows lint coverage for the host (Linux CI never lints the windows-cfg code). - run: cargo clippy -p punktfunk-host --features nvenc -- -D warnings + run: cargo clippy -p punktfunk-host --features nvenc,amf-qsv -- -D warnings - name: Ensure Inno Setup shell: pwsh diff --git a/CLAUDE.md b/CLAUDE.md index d390b5d..ee694e4 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -74,17 +74,26 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc Clients auto-resolve the type from the physical controller (DS5→DualSense, DS4→DualShock 4, Xbox One→Xbox One). Windows-host DualShock 4 (ViGEm) is not yet wired — Windows clients asking for DS4 get Xbox 360 for now. -- **Windows host: implemented and shipping (NVIDIA-only, x64-only).** `#[cfg(windows)]` backends +- **Windows host: implemented and shipping (all-vendor, x64-only).** `#[cfg(windows)]` backends behind the same traits as Linux — DXGI Desktop Duplication capture (`capture/dxgi.rs`), **SudoVDA** - virtual display per session (`vdisplay/sudovda.rs`), NVENC encode (`--features nvenc`), SendInput + - **ViGEm** gamepads (`inject/gamepad_windows.rs`), WASAPI loopback + virtual mic (`audio/wasapi_*`). 
- Ships as a **signed Inno Setup installer** that registers a `LocalSystem` SCM service launching into - the interactive session for secure-desktop (UAC/lock-screen) capture (`service.rs`), bundles the - SudoVDA driver, and is published by `windows-host.yml`. **HDR (10-bit)**: WGC captures the HDR - desktop as FP16/Rgb10a2 (DDA FP16 for the secure desktop), NVENC forces HEVC Main10 + BT.2020 PQ, - the client auto-detects PQ from the HEVC VUI — gated by `PUNKTFUNK_10BIT` + client `VIDEO_CAP_10BIT`; - **Windows host only** (the Linux host stays 8-bit, blocked upstream). Newer/less battle-tested than - the Linux host; no AMD/Intel/software encode path. Packaging: `packaging/windows/`. + virtual display per session (`vdisplay/sudovda.rs`), GPU encode (NVENC `--features nvenc`; AMD/Intel + `--features amf-qsv`), SendInput + **ViGEm** gamepads (`inject/gamepad_windows.rs`), WASAPI loopback + + virtual mic (`audio/wasapi_*`). Ships as a **signed Inno Setup installer** that registers a + `LocalSystem` SCM service launching into the interactive session for secure-desktop (UAC/lock-screen) + capture (`service.rs`), bundles the SudoVDA driver + the FFmpeg DLLs, and is published by + `windows-host.yml`. **Encoder is GPU-aware** (`encode.rs` `open_video` + `windows_resolved_backend`): + `PUNKTFUNK_ENCODER=auto` (the host.env default) detects the DXGI adapter vendor → **NVENC** (NVIDIA, + direct SDK, `encode/nvenc.rs`), **AMF** (AMD) / **QSV** (Intel) via libavcodec + (`encode/ffmpeg_win.rs`, the Windows analogue of the Linux VAAPI backend — `WinVendor{Amf,Qsv}`, + system-memory NV12/P010 readback default + opt-in zero-copy D3D11 behind `PUNKTFUNK_ZEROCOPY` with a + system fallback), or software H.264 (`encode/sw.rs`, GPU-less). GameStream codec advertisement is + probed per-GPU on AMF/QSV (`windows_codec_support` → `serverinfo`, AV1 gated). 
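The vendor-driven dispatch described above can be sketched as a pure function, which makes the precedence easy to see. An explicit `PUNKTFUNK_ENCODER` value wins. Anything else, including `auto`, an empty value or an unrecognised one, falls back to the adapter's PCI vendor ID. This is a minimal sketch, not the host's code. `Backend` and `resolve_backend` are hypothetical names, and the DXGI enumeration is replaced by an `Option<u32>` vendor ID. It also assumes the host's adapter loop, which skips the WARP adapter and unknown vendors, has already picked that ID.

```rust
/// Hypothetical mirror of the Windows host's encode-backend choice (not the real enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Nvenc,
    Amf,
    Qsv,
    Software,
}

/// Hypothetical helper: `pref` plays the role of PUNKTFUNK_ENCODER and `adapter_vendor_id`
/// the PCI vendor ID of the chosen hardware DXGI adapter (None = no hardware adapter found).
pub fn resolve_backend(pref: &str, adapter_vendor_id: Option<u32>) -> Backend {
    match pref.to_ascii_lowercase().as_str() {
        // An explicit override always wins over detection.
        "nvenc" | "hw" | "nvidia" | "cuda" => Backend::Nvenc,
        "amf" | "amd" => Backend::Amf,
        "qsv" | "intel" => Backend::Qsv,
        "sw" | "software" | "openh264" => Backend::Software,
        // "auto", "" or an unrecognised value: detect from the adapter vendor.
        _ => match adapter_vendor_id {
            Some(0x10DE) => Backend::Nvenc, // NVIDIA
            Some(0x1002) => Backend::Amf,   // AMD
            Some(0x8086) => Backend::Qsv,   // Intel
            _ => Backend::Software,         // GPU-less: software H.264
        },
    }
}
```

With this shape, `PUNKTFUNK_ENCODER=sw` forces the software encoder even on an NVIDIA box. A GPU-less VM with no override lands on software as well.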
**HDR (10-bit)**: WGC + captures the HDR desktop as FP16/Rgb10a2 (DDA FP16 for the secure desktop), the encoder forces HEVC + Main10 + BT.2020 PQ (NVENC ABGR10/P010; AMF/QSV P010 + a swscale Rgb10a2→P010 fallback), the client + auto-detects PQ from the HEVC VUI — gated by `PUNKTFUNK_10BIT` + client `VIDEO_CAP_10BIT`; **Windows + host only** (the Linux host stays 8-bit, blocked upstream). **AMF/QSV is CI-green but not yet + on-glass validated** (no AMD/Intel Windows box in the lab); NVENC is live-validated. Newer/less + battle-tested than the Linux host. Packaging: `packaging/windows/`. ## What's left @@ -243,6 +252,7 @@ crates/punktfunk-host/ vdisplay/{kwin,gamescope,mutter,wlroots}.rs per-compositor client-sized virtual outputs zerocopy/{egl,cuda,vulkan}.rs dmabuf → CUDA → NVENC (tiled via EGL/GL, LINEAR via Vulkan) inject/{libei,wlr,gamepad,dualsense}.rs input backends (uinput xpad + UHID DualSense) + encode/{nvenc,linux,vaapi,ffmpeg_win,sw}.rs per-GPU encoders (NVENC · Linux NVENC/CUDA · VAAPI · AMF/QSV · openh264) capture.rs · encode.rs · audio.rs · spike.rs · punktfunk1.rs · mgmt.rs · native_pairing.rs clients/probe/ punktfunk/1 reference/probe client (headless test/measurement tool) clients/linux/ native Linux client (GTK4/libadwaita · FFmpeg · PipeWire · SDL3) @@ -250,7 +260,7 @@ clients/windows/ native Windows client (WinUI 3 via windows-reactor · D3D11 · clients/apple/ native macOS/iOS/tvOS client (Swift · VideoToolbox · GameController) clients/android/ native Android client (Kotlin app + native/ Rust JNI core over punktfunk-core) clients/decky/ Steam Deck Decky plugin -crates/punktfunk-host/src/{capture/dxgi,vdisplay/sudovda,inject/gamepad_windows,audio/wasapi_*,service}.rs Windows host backends +crates/punktfunk-host/src/{capture/dxgi,vdisplay/sudovda,encode/ffmpeg_win,inject/gamepad_windows,audio/wasapi_*,service}.rs Windows host backends web/ TanStack web console over the mgmt API (status · devices · pairing) packaging/ apt(deb) · RPM/COPR · 
Arch/sysext · Flatpak · Bazzite bootc · Windows host installer (per-dir READMEs) tools/{loss-harness,latency-probe}/ measurement (plan §10) diff --git a/crates/punktfunk-host/Cargo.toml b/crates/punktfunk-host/Cargo.toml index 1a07570..8261b44 100644 --- a/crates/punktfunk-host/Cargo.toml +++ b/crates/punktfunk-host/Cargo.toml @@ -184,6 +184,12 @@ vigem-client = { version = "0.1", features = ["unstable_xtarget_notification"] } # the crate builds on docs.rs/CI. We enable it so the GPU-less VM/CI compiles; the DirectX NVENC path # never calls CUDA at runtime, so the pinned CUDA bindings version is irrelevant. nvidia-video-codec-sdk = { version = "0.4", features = ["ci-check"], optional = true } +# AMD (AMF) + Intel (QSV) hardware encode on Windows via libavcodec — the analogue of the Linux +# VAAPI backend (`src/encode/ffmpeg_win.rs`). Optional + behind the `amf-qsv` feature because it +# link-imports the FFmpeg libs at build time (needs a `FFMPEG_DIR` with the AMF/QSV encoders — the +# same BtbN gpl-shared tree the Windows client uses) and pulls the shared `avcodec/avutil/...` DLLs +# at runtime. `ffmpeg-sys-next` auto-detects the FFmpeg version (7.x/avcodec-61 or 8.x/62). +ffmpeg-next = { version = "8", optional = true } [features] # NVENC hardware encode (Windows). OFF by default: it pulls the NVENC SDK, and the host then needs @@ -191,3 +197,7 @@ nvidia-video-codec-sdk = { version = "0.4", features = ["ci-check"], optional = # time — i.e. `nvencodeapi.lib` from the NVIDIA Video Codec SDK (or an import lib generated from # nvEncodeAPI64.dll) on the linker path. Build the GPU host with `--features nvenc`. nvenc = ["dep:nvidia-video-codec-sdk"] +# AMD/Intel hardware encode on Windows (AMF/QSV via ffmpeg-next). OFF by default: it needs a +# `FFMPEG_DIR` (BtbN gpl-shared, includes `*_amf`/`*_qsv`) at build time and bundles the FFmpeg +# DLLs at runtime. Build the all-vendor GPU host with `--features nvenc,amf-qsv`. 
+amf-qsv = ["dep:ffmpeg-next"] diff --git a/crates/punktfunk-host/src/capture/dxgi.rs b/crates/punktfunk-host/src/capture/dxgi.rs index 2c44b74..4a3cd65 100644 --- a/crates/punktfunk-host/src/capture/dxgi.rs +++ b/crates/punktfunk-host/src/capture/dxgi.rs @@ -2157,9 +2157,14 @@ impl DuplCapturer { .ok() .and_then(|s| s.parse().ok()) .unwrap_or((2000 / refresh_hz.max(1)).max(100)); - let gpu_mode = std::env::var("PUNKTFUNK_ENCODER") - .map(|v| matches!(v.to_ascii_lowercase().as_str(), "nvenc" | "hw" | "nvidia")) - .unwrap_or(false); + // Produce GPU-resident D3D11 frames (zero-copy NVENC, or the NV12/P010 the AMF/QSV + // backends read back / import) whenever the resolved encode backend is a GPU one — so the + // capturer's output format matches the encoder's input. Only the software (GPU-less) path + // takes CPU staging. Mirrors `encode::open_video`'s dispatch exactly. + let gpu_mode = !matches!( + crate::encode::windows_resolved_backend(), + crate::encode::WindowsBackend::Software + ); // Read the source display's HDR mastering metadata while we still hold `output` (it is // moved into the struct below). Only meaningful for an HDR (FP16) duplication. let is_hdr_init = dd.ModeDesc.Format == DXGI_FORMAT_R16G16B16A16_FLOAT; diff --git a/crates/punktfunk-host/src/encode.rs b/crates/punktfunk-host/src/encode.rs index 85116aa..e3e7d9b 100644 --- a/crates/punktfunk-host/src/encode.rs +++ b/crates/punktfunk-host/src/encode.rs @@ -49,6 +49,26 @@ impl Codec { Codec::Av1 => "av1_vaapi", } } + + /// The FFmpeg AMD **AMF** encoder name (the Windows AMD backend). Selected by name (the codec id + /// would pick the software encoder). AV1 (`av1_amf`) is RDNA3+/RX 7000+ — probe, never assume. + pub fn amf_name(self) -> &'static str { + match self { + Codec::H264 => "h264_amf", + Codec::H265 => "hevc_amf", + Codec::Av1 => "av1_amf", + } + } + + /// The FFmpeg Intel **QSV** encoder name (the Windows Intel backend). Selected by name. 
AV1 + /// (`av1_qsv`) is Arc/Xe2+; HEVC Main10 is Gen9.5+ — probe, never assume. + pub fn qsv_name(self) -> &'static str { + match self { + Codec::H264 => "h264_qsv", + Codec::H265 => "hevc_qsv", + Codec::Av1 => "av1_qsv", + } + } } /// A hardware encoder. One per session; runs on the encode thread. @@ -198,49 +218,83 @@ pub fn open_video( #[cfg(target_os = "windows")] { let _ = cuda; // always false on Windows (no Cuda payload) - let _ = bit_depth; // used by the NVENC path below; the software H.264 path is 8-bit only - let pref = std::env::var("PUNKTFUNK_ENCODER") - .unwrap_or_default() - .to_ascii_lowercase(); - if matches!(pref.as_str(), "nvenc" | "hw" | "nvidia") { - // Hardware path: NVENC over D3D11. The DXGI capturer switches to its zero-copy - // FramePayload::D3d11 output under the same env var so capture + encode share textures. - #[cfg(feature = "nvenc")] - { - let enc = nvenc::NvencD3d11Encoder::open( - codec, + // NVIDIA → NVENC (direct SDK), AMD → AMF, Intel → QSV (both libavcodec), else → software + // H.264. `auto` (the default) resolves from the DXGI adapter vendor. + match windows_resolved_backend() { + WindowsBackend::Nvenc => { + // Hardware path: NVENC over D3D11. The DXGI capturer switches to zero-copy + // FramePayload::D3d11 output for any GPU backend, so capture + encode share textures. + #[cfg(feature = "nvenc")] + { + nvenc::NvencD3d11Encoder::open( + codec, + format, + width, + height, + fps, + bitrate_bps, + bit_depth, + ) + .map(|e| Box::new(e) as Box<dyn Encoder>) + } + #[cfg(not(feature = "nvenc"))] + { + anyhow::bail!( + "NVENC requested/detected but this host was built without it — rebuild \ + with `--features nvenc` (needs the NVENC SDK's nvencodeapi.lib at link time)" + ) + } + } + backend @ (WindowsBackend::Amf | WindowsBackend::Qsv) => { + // AMD AMF / Intel QSV via libavcodec (the Windows analogue of the Linux VAAPI path).
+ #[cfg(feature = "amf-qsv")] + { + let vendor = if matches!(backend, WindowsBackend::Amf) { + ffmpeg_win::WinVendor::Amf + } else { + ffmpeg_win::WinVendor::Qsv + }; + ffmpeg_win::FfmpegWinEncoder::open( + vendor, + codec, + format, + width, + height, + fps, + bitrate_bps, + bit_depth, + ) + .map(|e| Box::new(e) as Box<dyn Encoder>) + } + #[cfg(not(feature = "amf-qsv"))] + { + let _ = backend; + anyhow::bail!( + "AMD/Intel (AMF/QSV) encode requested/detected but this host was built \ + without it — rebuild with `--features amf-qsv` (needs ffmpeg-next + a \ + FFMPEG_DIR with the AMF/QSV encoders at build time)" + ) + } + } + WindowsBackend::Software => { + anyhow::ensure!( + codec == Codec::H264, + "the Windows software encoder supports H.264 only; client negotiated {codec:?} \ + (build a GPU backend: --features nvenc or amf-qsv, or request H264)" + ); + let _ = bit_depth; // the software H.264 path is 8-bit only + // Software H.264 realistically caps far below the negotiated hardware rates. + const SW_BITRATE_CEIL: u64 = 100_000_000; + sw::OpenH264Encoder::open( format, width, height, fps, - bitrate_bps, - bit_depth, - )?; - return Ok(Box::new(enc) as Box<dyn Encoder>); - } - #[cfg(not(feature = "nvenc"))] - { - anyhow::bail!( - "NVENC requested but this host was built without it — rebuild with \ - `--features nvenc` (needs the NVENC SDK's nvencodeapi.lib at link time)" - ); + bitrate_bps.min(SW_BITRATE_CEIL), + ) + .map(|e| Box::new(e) as Box<dyn Encoder>) } } - anyhow::ensure!( - codec == Codec::H264, - "the Windows software encoder supports H.264 only; client negotiated {codec:?} \ - (set PUNKTFUNK_ENCODER=nvenc for a GPU host, or request H264)" - ); - // Software H.264 realistically caps far below the negotiated hardware rates.
- const SW_BITRATE_CEIL: u64 = 100_000_000; - let enc = sw::OpenH264Encoder::open( - format, - width, - height, - fps, - bitrate_bps.min(SW_BITRATE_CEIL), - )?; - Ok(Box::new(enc) as Box<dyn Encoder>) } #[cfg(not(any(target_os = "linux", target_os = "windows")))] { @@ -339,7 +393,7 @@ pub fn linux_zero_copy_is_vaapi() -> bool { /// Which codecs the active GPU can actually ENCODE. Used to build the GameStream codec /// advertisement so a client never negotiates a codec the GPU can't do (AV1 encode is narrow — /// Intel Arc/Xe2+, AMD RDNA3+/RDNA4 — so it must be probed, not assumed). -#[cfg(target_os = "linux")] +#[cfg(any(target_os = "linux", target_os = "windows"))] #[derive(Clone, Copy, Debug)] pub struct CodecSupport { pub h264: bool, @@ -370,6 +424,123 @@ pub fn vaapi_codec_support() -> CodecSupport { }) } +// --------------------------------------------------------------------------------------------- +// Windows backend selection (the analogue of the Linux nvidia_present / linux_zero_copy_is_vaapi +// logic). NVIDIA → NVENC, AMD → AMF, Intel → QSV; `auto` (default) reads the DXGI adapter vendor. +// --------------------------------------------------------------------------------------------- + +#[cfg(target_os = "windows")] +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub(crate) enum WindowsBackend { + Nvenc, + Amf, + Qsv, + Software, +} + +#[cfg(target_os = "windows")] +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +enum GpuVendor { + Nvidia, + Amd, + Intel, +} + +/// Resolve the active Windows encode backend from `PUNKTFUNK_ENCODER` (`auto` → the DXGI adapter +/// vendor). Shared by [`open_video`], the DXGI capturer and the codec advertisement so all agree.
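Given a probed `CodecSupport`, the advertisement step reduces to filtering a fixed codec list by the probe results. The sketch below assumes a locally redefined `CodecSupport` with the same three fields and a hypothetical `advertised_codecs` helper. The label strings are illustrative only and are not the GameStream `serverinfo` wire values.

```rust
/// Local stand-in for the host's `CodecSupport` (same three fields).
#[derive(Clone, Copy, Debug)]
pub struct CodecSupport {
    pub h264: bool,
    pub h265: bool,
    pub av1: bool,
}

/// Hypothetical helper: keep only the codecs whose probe encoder opened, in a fixed order,
/// so a client can never negotiate a codec this GPU cannot encode.
pub fn advertised_codecs(caps: CodecSupport) -> Vec<&'static str> {
    [("H264", caps.h264), ("HEVC", caps.h265), ("AV1", caps.av1)]
        .iter()
        .filter(|(_, ok)| *ok)
        .map(|(name, _)| *name)
        .collect()
}
```

On a pre-RDNA3 AMD card, for example, the AV1 probe would fail, and AV1 would simply drop out of the list.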
+#[cfg(target_os = "windows")] +pub(crate) fn windows_resolved_backend() -> WindowsBackend { + let pref = std::env::var("PUNKTFUNK_ENCODER") + .unwrap_or_default() + .to_ascii_lowercase(); + match pref.as_str() { + "nvenc" | "hw" | "nvidia" | "cuda" => WindowsBackend::Nvenc, + "amf" | "amd" => WindowsBackend::Amf, + "qsv" | "intel" => WindowsBackend::Qsv, + "sw" | "software" | "openh264" => WindowsBackend::Software, + _ => match windows_gpu_vendor() { + Some(GpuVendor::Nvidia) => WindowsBackend::Nvenc, + Some(GpuVendor::Amd) => WindowsBackend::Amf, + Some(GpuVendor::Intel) => WindowsBackend::Qsv, + None => WindowsBackend::Software, + }, + } +} + +/// True if the active Windows backend is the libavcodec AMF/QSV path (so the codec advertisement +/// consults a real GPU probe rather than the NVENC static superset). Always false when the +/// `amf-qsv` feature is off — there's then no ffmpeg backend to probe. +#[cfg(target_os = "windows")] +pub fn windows_backend_is_ffmpeg() -> bool { + cfg!(feature = "amf-qsv") + && matches!( + windows_resolved_backend(), + WindowsBackend::Amf | WindowsBackend::Qsv + ) +} + +/// Detect the host GPU vendor from the first hardware DXGI adapter (Windows has no `/dev/nvidia*` +/// probe). Cached. NVIDIA=0x10DE, AMD=0x1002, Intel=0x8086; the software/WARP adapter is skipped. +#[cfg(target_os = "windows")] +fn windows_gpu_vendor() -> Option<GpuVendor> { + use std::sync::OnceLock; + use windows::Win32::Graphics::Dxgi::{ + CreateDXGIFactory1, IDXGIFactory1, DXGI_ADAPTER_FLAG_SOFTWARE, + }; + static CACHE: OnceLock<Option<GpuVendor>> = OnceLock::new(); + *CACHE.get_or_init(|| unsafe { + let factory: IDXGIFactory1 = CreateDXGIFactory1().ok()?; + let mut i = 0u32; + while let Ok(adapter) = factory.EnumAdapters1(i) { + i += 1; + // windows-rs 0.62: GetDesc1 returns the desc by value (no out-param).
+ let Ok(desc) = adapter.GetDesc1() else { + continue; + }; + if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE.0 as u32) != 0 { + continue; // skip the Microsoft Basic Render / WARP adapter + } + match desc.VendorId { + 0x10DE => return Some(GpuVendor::Nvidia), + 0x1002 => return Some(GpuVendor::Amd), + 0x8086 => return Some(GpuVendor::Intel), + _ => continue, + } + } + None + }) +} + +/// Probe the active Windows AMF/QSV backend for its encodable codecs (cached; opens a tiny encoder +/// per codec, once). Mirrors [`vaapi_codec_support`]; called only when [`windows_backend_is_ffmpeg`] +/// is true. AV1 is narrow (AMD RDNA3+, Intel Arc/Xe2+), so it must be probed, not assumed. +#[cfg(all(target_os = "windows", feature = "amf-qsv"))] +pub fn windows_codec_support() -> CodecSupport { + use std::sync::OnceLock; + static CACHE: OnceLock<CodecSupport> = OnceLock::new(); + *CACHE.get_or_init(|| { + let vendor = match windows_resolved_backend() { + WindowsBackend::Qsv => ffmpeg_win::WinVendor::Qsv, + _ => ffmpeg_win::WinVendor::Amf, + }; + let caps = CodecSupport { + h264: ffmpeg_win::probe_can_encode(vendor, Codec::H264), + h265: ffmpeg_win::probe_can_encode(vendor, Codec::H265), + av1: ffmpeg_win::probe_can_encode(vendor, Codec::Av1), + }; + tracing::info!( + backend = ?vendor, + h264 = caps.h264, + h265 = caps.h265, + av1 = caps.av1, + "Windows AMF/QSV encode capabilities probed" + ); + caps + }) +} + +#[cfg(all(target_os = "windows", feature = "amf-qsv"))] +mod ffmpeg_win; #[cfg(target_os = "linux")] mod linux; #[cfg(all(target_os = "windows", feature = "nvenc"))] diff --git a/crates/punktfunk-host/src/encode/ffmpeg_win.rs b/crates/punktfunk-host/src/encode/ffmpeg_win.rs new file mode 100644 index 0000000..8276d85 --- /dev/null +++ b/crates/punktfunk-host/src/encode/ffmpeg_win.rs @@ -0,0 +1,1172 @@ +//! AMD **AMF** and Intel **QSV** hardware encode on Windows via `ffmpeg-next` — the Windows +//!
analogue of the Linux [`super::vaapi`] backend (one libavcodec backend per vendor, selected by +//! encoder name: `*_amf` / `*_qsv`). This is the sibling of the direct-SDK [`super::nvenc`] path +//! behind the shared [`Encoder`] trait, selected in [`super::open_video`] (NVIDIA → NVENC, +//! AMD → AMF, Intel → QSV). +//! +//! The capturer hands a `FramePayload::D3d11` texture (NV12/P010 from the D3D11 video processor, or +//! BGRA/Rgb10a2 as a fallback) on the capturer's own `ID3D11Device`. Two input paths, chosen lazily +//! from the first frame and the `PUNKTFUNK_ZEROCOPY` knob: +//! +//! * **System-memory** ([`SystemInner`], the default): read the captured D3D11 surface back to a CPU +//! NV12/P010 [`AVFrame`] (a same-format `CopyResource` → staging → `Map`, plus a `swscale` step for +//! the BGRA fallback) and `avcodec_send_frame` it. AMF/QSV upload it internally. One +//! GPU→CPU→GPU round-trip per frame — the robust path, and the only one that can be brought up +//! without on-glass validation (it is the analogue of the VAAPI "CPU input" fallback). +//! * **Zero-copy D3D11** ([`ZeroCopyInner`], `PUNKTFUNK_ZEROCOPY=1`): wrap the capturer's +//! `ID3D11Device` as an `AV_HWDEVICE_TYPE_D3D11VA` hwdevice (shared, *not* a second device — the +//! capture textures are not shared-handle, so a different device couldn't read them), keep an +//! FFmpeg D3D11 frames pool, `CopySubresourceRegion` the captured texture into a pooled array +//! slice (a GPU-local copy, like NVENC's CUDA path), then feed AMF `AV_PIX_FMT_D3D11` directly, +//! or map the D3D11 frame to a derived QSV surface for QSV. If the hw setup fails to open, this +//! falls back to the system-memory path for the session. +//! +//! **Status: compiles in CI; not yet on-glass validated** (no AMD/Intel Windows box in the lab as of +//! 2026-06-22). The system path is the conservative default; zero-copy is opt-in until validated. +//! +//! 
+//! Raw FFI: `ffmpeg-next` has no hwcontext wrappers for D3D11VA, so the hwdevice/hwframes calls go +//! through `ffmpeg::ffi` (= `ffmpeg_sys_next`), exactly as the Linux CUDA/VAAPI paths do. The +//! `AVD3D11VADeviceContext`/`AVD3D11VAFramesContext` layouts are mirrored (the bindings don't +//! allowlist `hwcontext_d3d11va.h`), as [`super::linux`] mirrors `AVCUDADeviceContext`. + +use super::{Codec, EncodedFrame, Encoder}; +use crate::capture::{dxgi::D3d11Frame, CapturedFrame, FramePayload, PixelFormat}; +use anyhow::{anyhow, bail, Context, Result}; +use ffmpeg::format::Pixel; +use ffmpeg::{codec, encoder, Dictionary, Packet, Rational}; +use ffmpeg_next as ffmpeg; +use std::os::raw::{c_int, c_uint, c_void}; +use std::ptr; +use windows::core::Interface; +use windows::Win32::Graphics::Direct3D11::{ + ID3D11Device, ID3D11DeviceContext, ID3D11Resource, ID3D11Texture2D, D3D11_BIND_DECODER, + D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_BIND_VIDEO_ENCODER, + D3D11_CPU_ACCESS_READ, D3D11_MAPPED_SUBRESOURCE, D3D11_MAP_READ, D3D11_TEXTURE2D_DESC, + D3D11_USAGE_STAGING, +}; +use windows::Win32::Graphics::Dxgi::Common::{ + DXGI_FORMAT, DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_NV12, DXGI_FORMAT_P010, + DXGI_FORMAT_R10G10B10A2_UNORM, DXGI_SAMPLE_DESC, +}; + +use ffmpeg::ffi; // = ffmpeg_sys_next + +// libswscale scaler-flag + colour-space constants (not exported as Rust consts by the bindings — +// the stable `<libswscale/swscale.h>` #defines, same as the VAAPI path uses). +const SWS_POINT: c_int = 0x10; +const SWS_CS_ITU709: c_int = 1; +const SWS_CS_BT2020: c_int = 9; + +/// `AVD3D11VADeviceContext` (libavutil/hwcontext_d3d11va.h) — mirrored (the ffmpeg-sys bindings +/// don't allowlist that header). We set `device` to the capturer's `ID3D11Device` so AMF/QSV share +/// it; `av_hwdevice_ctx_init` fills `device_context`/`video_device`/`video_context`/the default +/// lock from a non-null `device`.
+#[repr(C)] +struct AVD3D11VADeviceContext { + device: *mut c_void, // ID3D11Device* + device_context: *mut c_void, // ID3D11DeviceContext* + video_device: *mut c_void, // ID3D11VideoDevice* + video_context: *mut c_void, // ID3D11VideoContext* + lock: *mut c_void, // void (*)(void*) + unlock: *mut c_void, // void (*)(void*) + lock_ctx: *mut c_void, +} + +/// `AVD3D11VAFramesContext` (libavutil/hwcontext_d3d11va.h) — mirrored. `BindFlags`/`MiscFlags` +/// customise the texture-array FFmpeg allocates for the pool; `texture` (we leave null) would let us +/// supply our own array. +#[repr(C)] +struct AVD3D11VAFramesContext { + texture: *mut c_void, // ID3D11Texture2D* + bind_flags: c_uint, // UINT BindFlags + misc_flags: c_uint, // UINT MiscFlags + texture_infos: *mut c_void, // AVD3D11FrameDescriptor* (FFmpeg-owned; we never touch it) +} + +/// AMD AMF vs Intel QSV — the two libavcodec vendor backends this module covers. +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub enum WinVendor { + Amf, + Qsv, +} + +impl WinVendor { + fn encoder_name(self, codec: Codec) -> &'static str { + match self { + WinVendor::Amf => codec.amf_name(), + WinVendor::Qsv => codec.qsv_name(), + } + } + + fn label(self) -> &'static str { + match self { + WinVendor::Amf => "AMF", + WinVendor::Qsv => "QSV", + } + } +} + +/// Is the zero-copy D3D11 path enabled? Opt-in until on-glass validated: any `PUNKTFUNK_ZEROCOPY` +/// value (even `0`) enables it; unset keeps the robust system-memory readback path. +fn zerocopy_enabled() -> bool { + std::env::var_os("PUNKTFUNK_ZEROCOPY").is_some() +} + +/// The swscale *source* pixel format for a captured packed-RGB/BGR layout (8-bit BGRA fallback only).
+fn sws_src(format: PixelFormat) -> Result<Pixel> { + Ok(match format { + PixelFormat::Bgrx => Pixel::BGRZ, + PixelFormat::Rgbx => Pixel::RGBZ, + PixelFormat::Bgra => Pixel::BGRA, + PixelFormat::Rgba => Pixel::RGBA, + PixelFormat::Rgb => Pixel::RGB24, + PixelFormat::Bgr => Pixel::BGR24, + PixelFormat::Nv12 | PixelFormat::P010 | PixelFormat::Rgb10a2 => { + bail!("ffmpeg_win swscale path supports packed RGB/BGR only; got {format:?}") + } + }) +} + +/// Does this captured format imply a 10-bit encode (P010 / Rgb10a2)? +fn is_10bit_format(format: PixelFormat, bit_depth: u8) -> bool { + bit_depth >= 10 || matches!(format, PixelFormat::P010 | PixelFormat::Rgb10a2) +} + +/// `ffmpeg::format::Pixel` → raw `AVPixelFormat`. +fn pixel_to_av(p: Pixel) -> ffi::AVPixelFormat { + ffi::AVPixelFormat::from(p) +} + +/// Build the FFmpeg encoder context shared by both inner paths: name, mode, low-latency RC, +/// infinite GOP, the BT.709-limited (SDR) or BT.2020-PQ (HDR) VUI, the given `pix_fmt`, and the +/// optional hw device/frames contexts (null for the system path). Returns the opened encoder. +#[allow(clippy::too_many_arguments)] +unsafe fn open_win_encoder( + vendor: WinVendor, + codec: Codec, + width: u32, + height: u32, + fps: u32, + bitrate_bps: u64, + pix_fmt: ffi::AVPixelFormat, + sw_pix_fmt: ffi::AVPixelFormat, + ten_bit: bool, + device_ref: *mut ffi::AVBufferRef, + frames_ref: *mut ffi::AVBufferRef, +) -> Result<encoder::video::Encoder> { + let name = vendor.encoder_name(codec); + let av_codec = encoder::find_by_name(name).ok_or_else(|| { + anyhow!( + "{name} not built into libavcodec (no {} encoder)", + vendor.label() + ) + })?; + let mut video = codec::context::Context::new_with_codec(av_codec) + .encoder() + .video() + .context("alloc video encoder")?; + video.set_width(width); + video.set_height(height); + // Software view of the input layout (NV12 / P010). For the hw paths `pix_fmt` is overwritten + // with D3D11/QSV below, and the sw layout then comes from the hw frames context's `sw_format`.
+ video.set_format(Pixel::from(sw_pix_fmt)); + video.set_time_base(Rational(1, fps as i32)); + video.set_frame_rate(Some(Rational(fps as i32, 1))); + video.set_bit_rate(bitrate_bps as usize); + video.set_max_bit_rate(bitrate_bps as usize); // target == max → CBR + let vbv_frames = std::env::var("PUNKTFUNK_VBV_FRAMES") + .ok() + .and_then(|s| s.parse::<f32>().ok()) + .filter(|v| v.is_finite() && *v > 0.0) + .unwrap_or(1.0); + let vbv_bits = + ((bitrate_bps as f64 / fps.max(1) as f64) * vbv_frames as f64).clamp(1.0, i32::MAX as f64); + video.set_max_b_frames(0); + let raw = video.as_mut_ptr(); + (*raw).rc_buffer_size = vbv_bits as i32; + (*raw).gop_size = i32::MAX; // no periodic IDR (forced-IDR via pict_type=I on RFI) + if ten_bit { + // 10-bit HDR: BT.2020 primaries + SMPTE-2084 (PQ) transfer. The client auto-detects PQ from + // the HEVC VUI; the static mastering metadata also rides the 0xCE datagram out-of-band. + (*raw).colorspace = ffi::AVColorSpace::AVCOL_SPC_BT2020_NCL; + (*raw).color_range = ffi::AVColorRange::AVCOL_RANGE_MPEG; + (*raw).color_primaries = ffi::AVColorPrimaries::AVCOL_PRI_BT2020; + (*raw).color_trc = ffi::AVColorTransferCharacteristic::AVCOL_TRC_SMPTE2084; + } else { + // We hand the encoder BT.709 *limited* NV12 (video-processor or swscale CSC), so signal that + // VUI — else the client decoder washes the picture out. + (*raw).colorspace = ffi::AVColorSpace::AVCOL_SPC_BT709; + (*raw).color_range = ffi::AVColorRange::AVCOL_RANGE_MPEG; + (*raw).color_primaries = ffi::AVColorPrimaries::AVCOL_PRI_BT709; + (*raw).color_trc = ffi::AVColorTransferCharacteristic::AVCOL_TRC_BT709; + } + (*raw).pix_fmt = pix_fmt; + if !device_ref.is_null() { + (*raw).hw_device_ctx = ffi::av_buffer_ref(device_ref); + } + if !frames_ref.is_null() { + (*raw).hw_frames_ctx = ffi::av_buffer_ref(frames_ref); + } + + // Low-latency tuning. Unknown private options are ignored by avcodec_open2 (left in the dict), + // so vendor-specific keys are safe to set unconditionally.
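The VBV sizing in `open_win_encoder` is one frame's worth of bits at the target rate, scaled by `PUNKTFUNK_VBV_FRAMES`. At 20 Mbit/s and 60 fps with the default of 1.0, that is 20,000,000 / 60 ≈ 333,333 bits. The sketch below restates that arithmetic outside FFmpeg. `vbv_frames_from` and `vbv_buffer_bits` are hypothetical names. The environment lookup is replaced by an `Option<&str>` so the logic can be tested directly.

```rust
/// Hypothetical helper: parse the VBV multiplier, falling back to 1.0 for a missing,
/// unparsable, non-finite or non-positive value (the same filter the encoder setup applies).
pub fn vbv_frames_from(raw: Option<&str>) -> f32 {
    raw.and_then(|s| s.parse::<f32>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(1.0)
}

/// Hypothetical helper: rc_buffer_size in bits = (bitrate / fps) * vbv_frames, clamped to
/// [1, i32::MAX] because AVCodecContext::rc_buffer_size is a C int.
pub fn vbv_buffer_bits(bitrate_bps: u64, fps: u32, vbv_frames: f32) -> i32 {
    let bits = (bitrate_bps as f64 / fps.max(1) as f64) * vbv_frames as f64;
    bits.clamp(1.0, i32::MAX as f64) as i32
}
```

A one-frame buffer forces the rate control to spread each frame's bits evenly. That keeps per-frame transmit time predictable, which is the usual trade-off for low-latency streaming.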
+ let mut opts = Dictionary::new(); + match vendor { + WinVendor::Amf => { + opts.set("usage", "ultralowlatency"); + opts.set("rc", "cbr"); + opts.set("quality", "balanced"); + opts.set("preanalysis", "false"); + opts.set("enforce_hrd", "true"); + // VPS/SPS/PPS on each IDR (clean mid-stream join) — HEVC/AV1 only; ignored elsewhere. + opts.set("header_insertion_mode", "idr"); + } + WinVendor::Qsv => { + opts.set("preset", "veryfast"); + opts.set("async_depth", "1"); // bound in-flight frames — the big QSV latency lever + opts.set("low_power", "1"); // VDEnc fixed-function path (lower latency) + opts.set("look_ahead", "0"); // (h264_qsv only; ignored on hevc/av1) + opts.set("forced_idr", "1"); // a forced key frame becomes a real IDR + opts.set("scenario", "displayremoting"); + } + } + video + .open_with(opts) + .with_context(|| format!("open {name} ({width}x{height}@{fps}, {bitrate_bps} bps)")) +} + +/// Probe whether THIS GPU can `vendor`-encode `codec`, by opening a tiny system-input encoder. The +/// driver/runtime rejects codecs the video engine can't do (AV1 on pre-RDNA3 AMD / pre-Arc Intel, +/// or HEVC on a very old part). Used to build the GameStream codec advertisement so a client never +/// negotiates a codec the encoder can't open. Torn down immediately. +pub fn probe_can_encode(vendor: WinVendor, codec: Codec) -> bool { + if ffmpeg::init().is_err() { + return false; + } + unsafe { + // A missing AMF/QSV runtime (wrong-vendor host, GPU-less CI) is an expected probe outcome — + // quiet ffmpeg's open error for the probe, then restore the level. 
+ let prev = ffi::av_log_get_level(); + ffi::av_log_set_level(ffi::AV_LOG_FATAL); + let ok = open_win_encoder( + vendor, + codec, + 640, + 480, + 30, + 2_000_000, + ffi::AVPixelFormat::AV_PIX_FMT_NV12, + ffi::AVPixelFormat::AV_PIX_FMT_NV12, + false, + ptr::null_mut(), + ptr::null_mut(), + ) + .is_ok(); + ffi::av_log_set_level(prev); + ok + } +} + +/// Drain the encoder for one packet (shared poll logic, identical to the VAAPI/NVENC paths). +fn poll_encoder(enc: &mut encoder::video::Encoder, fps: u32) -> Result<Option<EncodedFrame>> { + let mut pkt = Packet::empty(); + match enc.receive_packet(&mut pkt) { + Ok(()) => { + let data = pkt.data().map(|d| d.to_vec()).unwrap_or_default(); + let pts = pkt.pts().unwrap_or(0).max(0) as u64; + Ok(Some(EncodedFrame { + data, + pts_ns: pts * 1_000_000_000 / fps as u64, + keyframe: pkt.is_key(), + })) + } + Err(ffmpeg::Error::Other { errno }) + if errno == ffmpeg::util::error::EAGAIN + || errno == ffmpeg::util::error::EWOULDBLOCK => + { + Ok(None) + } + Err(ffmpeg::Error::Eof) => Ok(None), + Err(e) => Err(e).context("receive_packet"), + } +} + +/// The immediate context of an `ID3D11Device` (for `CopyResource`/`CopySubresourceRegion`). +unsafe fn immediate_context(device: &ID3D11Device) -> ID3D11DeviceContext { + // windows-rs 0.62: the inherent method takes no args and returns the context (the OutRef form is + // only on the `_Impl` trait, for implementing the interface). Every D3D11 device has one. + device + .GetImmediateContext() + .expect("ID3D11Device always has an immediate context") +} + +// --------------------------------------------------------------------------------------------- +// System-memory path (default): read the captured D3D11 surface back to a CPU NV12/P010 frame. +// --------------------------------------------------------------------------------------------- + +struct SystemInner { + enc: encoder::video::Encoder, + /// Reusable software NV12/P010 frame: swscale dst / readback dst, and the `send_frame` src.
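Because the encoder's time base is `1/fps`, a packet's `pts` is a frame index, and `poll_encoder` converts it to nanoseconds with integer arithmetic. The sketch below shows that conversion as a standalone function under the hypothetical name `pts_to_ns`. Unlike the original expression, it clamps `fps` to at least 1, so a zero frame rate cannot cause a division by zero.

```rust
/// Hypothetical helper: frame-index pts (time base 1/fps) -> nanoseconds.
/// A missing or negative pts maps to 0, as in the encoder poll loop.
pub fn pts_to_ns(pts: Option<i64>, fps: u32) -> u64 {
    let frames = pts.unwrap_or(0).max(0) as u64;
    // Multiply before dividing to keep sub-frame precision; truncates toward zero.
    frames * 1_000_000_000 / fps.max(1) as u64
}
```

At 60 fps, frame 1 maps to 16,666,666 ns, because integer division truncates the fractional 0.67 ns, and frame 60 maps to exactly one second.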
+ sw_frame: *mut ffi::AVFrame, + /// swscale ctx for the BGRA→NV12 fallback (built lazily; null for the YUV-readback path). + sws: *mut ffi::SwsContext, + /// CPU-readable staging texture for the D3D11 readback (built lazily on the captured device). + staging: Option, + ctx: Option, + format: PixelFormat, + ten_bit: bool, + width: u32, + height: u32, +} + +impl SystemInner { + #[allow(clippy::too_many_arguments)] + fn open( + vendor: WinVendor, + codec: Codec, + format: PixelFormat, + width: u32, + height: u32, + fps: u32, + bitrate_bps: u64, + bit_depth: u8, + ) -> Result { + let ten_bit = is_10bit_format(format, bit_depth); + let sw_av = if ten_bit { + ffi::AVPixelFormat::AV_PIX_FMT_P010LE + } else { + ffi::AVPixelFormat::AV_PIX_FMT_NV12 + }; + let enc = unsafe { + open_win_encoder( + vendor, + codec, + width, + height, + fps, + bitrate_bps, + sw_av, // system input: pix_fmt == sw_format (no hw frames ctx) + sw_av, + ten_bit, + ptr::null_mut(), + ptr::null_mut(), + )? + }; + let sw_frame = unsafe { + let f = ffi::av_frame_alloc(); + if f.is_null() { + bail!("av_frame_alloc(sw) failed"); + } + (*f).format = sw_av as c_int; + (*f).width = width as c_int; + (*f).height = height as c_int; + if ffi::av_frame_get_buffer(f, 0) < 0 { + let mut f = f; + ffi::av_frame_free(&mut f); + bail!("av_frame_get_buffer(sw) failed"); + } + f + }; + tracing::info!( + encoder = vendor.encoder_name(codec), + "{} encode active ({width}x{height}@{fps}, system-memory {} path)", + vendor.label(), + if ten_bit { "P010" } else { "NV12" } + ); + Ok(SystemInner { + enc, + sw_frame, + sws: ptr::null_mut(), + staging: None, + ctx: None, + format, + ten_bit, + width, + height, + }) + } + + /// Lazily (re)build the staging texture matching `dxgi_fmt` on the captured device. 
+ unsafe fn ensure_staging( + &mut self, + device: &ID3D11Device, + dxgi_fmt: DXGI_FORMAT, + ) -> Result<()> { + // Reuse the staging texture only while it matches the CURRENT captured format. The capturer + // can switch NV12→Bgra or P010→Rgb10a2 mid-session, and CopyResource requires source and + // destination to share a compatible DXGI format, so a stale staging texture must be rebuilt. + if let Some(staging) = &self.staging { + let mut cur = D3D11_TEXTURE2D_DESC::default(); + staging.GetDesc(&mut cur); + if cur.Format == dxgi_fmt { + return Ok(()); + } + self.staging = None; + } + let desc = D3D11_TEXTURE2D_DESC { + Width: self.width, + Height: self.height, + MipLevels: 1, + ArraySize: 1, + Format: dxgi_fmt, + SampleDesc: DXGI_SAMPLE_DESC { + Count: 1, + Quality: 0, + }, + Usage: D3D11_USAGE_STAGING, + BindFlags: 0, + CPUAccessFlags: D3D11_CPU_ACCESS_READ.0 as u32, + MiscFlags: 0, + }; + let mut t: Option<ID3D11Texture2D> = None; + device + .CreateTexture2D(&desc, None, Some(&mut t)) + .context("CreateTexture2D(staging readback)")?; + self.staging = t; + self.ctx = Some(immediate_context(device)); + Ok(()) + } + + /// Send the reusable `sw_frame` to the encoder with the given pts / IDR flag. + unsafe fn send(&mut self, pts: i64, idr: bool) -> Result<()> { + (*self.sw_frame).pts = pts; + (*self.sw_frame).pict_type = if idr { + ffi::AVPictureType::AV_PICTURE_TYPE_I + } else { + ffi::AVPictureType::AV_PICTURE_TYPE_NONE + }; + let r = ffi::avcodec_send_frame(self.enc.as_mut_ptr(), self.sw_frame); + if r < 0 { + bail!("avcodec_send_frame(ffmpeg_win system) failed ({r})"); + } + Ok(()) + } + + /// D3D11 path: read the captured surface back into `sw_frame`, then send. Dispatches on the + /// CURRENT frame's `format` — the capturer's video processor latches off on failure and switches + /// NV12→Bgra (SDR) or P010→Rgb10a2 (HDR) mid-session, so a fixed open-time format is wrong.
+ fn submit_d3d11( + &mut self, + frame: &D3d11Frame, + format: PixelFormat, + pts: i64, + idr: bool, + ) -> Result<()> { + let fmt_10 = matches!(format, PixelFormat::P010 | PixelFormat::Rgb10a2); + anyhow::ensure!( + fmt_10 == self.ten_bit, + "captured format {format:?} bit-depth changed under the encoder (built {}-bit)", + if self.ten_bit { 10 } else { 8 } + ); + match format { + PixelFormat::Nv12 | PixelFormat::P010 => self.readback_yuv(frame, pts, idr), + PixelFormat::Bgra | PixelFormat::Bgrx => self.readback_bgra(frame, pts, idr), + PixelFormat::Rgb10a2 => self.readback_rgb10(frame, pts, idr), + other => { + bail!("ffmpeg_win system path cannot read back captured D3D11 format {other:?}") + } + } + } + + /// Read back a captured NV12/P010 surface plane-by-plane into the software frame. + fn readback_yuv(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> { + let dxgi_fmt = if self.ten_bit { + DXGI_FORMAT_P010 + } else { + DXGI_FORMAT_NV12 + }; + unsafe { + self.ensure_staging(&frame.device, dxgi_fmt)?; + let staging = self.staging.clone().context("staging texture")?; + let ctx = self.ctx.clone().context("d3d11 context")?; + let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?; + let dst: ID3D11Resource = staging.cast().context("staging -> resource")?; + ctx.CopyResource(&dst, &src); + let mut map = D3D11_MAPPED_SUBRESOURCE::default(); + ctx.Map(&staging, 0, D3D11_MAP_READ, 0, Some(&mut map)) + .context("Map staging (yuv readback)")?; + let pitch = map.RowPitch as usize; + let h = self.height as usize; + // NV12/P010 in a mapped staging surface: the Y plane occupies rows [0,H) at `pitch`; the + // interleaved chroma plane (H/2 rows) starts at byte offset `pitch * H`. P010 samples are + // 16-bit, so a "row" of width pixels is `width*2` bytes (and chroma `width*2` too). 
+ let bytes_per_sample = if self.ten_bit { 2 } else { 1 }; + let row_bytes = self.width as usize * bytes_per_sample; + let base = map.pData as *const u8; + let total = pitch.saturating_mul(h + h.div_ceil(2)); + let mapped = std::slice::from_raw_parts(base, total); + let chroma_off = pitch * h; + let y_dst = (*self.sw_frame).data[0]; + let y_stride = (*self.sw_frame).linesize[0] as usize; + let uv_dst = (*self.sw_frame).data[1]; + let uv_stride = (*self.sw_frame).linesize[1] as usize; + for y in 0..h { + let s = &mapped[y * pitch..y * pitch + row_bytes]; + ptr::copy_nonoverlapping(s.as_ptr(), y_dst.add(y * y_stride), row_bytes); + } + for y in 0..h.div_ceil(2) { + let s = &mapped[chroma_off + y * pitch..chroma_off + y * pitch + row_bytes]; + ptr::copy_nonoverlapping(s.as_ptr(), uv_dst.add(y * uv_stride), row_bytes); + } + ctx.Unmap(&staging, 0); + self.send(pts, idr) + } + } + + /// Read back a captured BGRA surface, then swscale BGRA→NV12 into the software frame (8-bit). + fn readback_bgra(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> { + if self.ten_bit { + bail!("ffmpeg_win: BGRA readback is 8-bit only (HDR needs the P010 capture path)"); + } + unsafe { + self.ensure_staging(&frame.device, DXGI_FORMAT_B8G8R8A8_UNORM)?; + let staging = self.staging.clone().context("staging texture")?; + let ctx = self.ctx.clone().context("d3d11 context")?; + let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?; + let dst: ID3D11Resource = staging.cast().context("staging -> resource")?; + ctx.CopyResource(&dst, &src); + let mut map = D3D11_MAPPED_SUBRESOURCE::default(); + ctx.Map(&staging, 0, D3D11_MAP_READ, 0, Some(&mut map)) + .context("Map staging (bgra readback)")?; + let pitch = map.RowPitch as usize; + let h = self.height as usize; + let base = map.pData as *const u8; + self.ensure_sws( + pixel_to_av(Pixel::BGRA), + ffi::AVPixelFormat::AV_PIX_FMT_NV12, + SWS_CS_ITU709, + )?; + let src_data: [*const u8; 4] = [base, ptr::null(), 
ptr::null(), ptr::null()]; + let src_stride: [c_int; 4] = [pitch as c_int, 0, 0, 0]; + let r = ffi::sws_scale( + self.sws, + src_data.as_ptr(), + src_stride.as_ptr(), + 0, + h as c_int, + (*self.sw_frame).data.as_ptr(), + (*self.sw_frame).linesize.as_ptr(), + ); + ctx.Unmap(&staging, 0); + if r < 0 { + bail!("sws_scale BGRA→NV12 failed"); + } + self.send(pts, idr) + } + } + + /// Read back a captured Rgb10a2 (BT.2020 PQ, R10G10B10A2) surface and swscale it to P010 + /// (BT.2020 PQ, limited range) — the HDR path when the capturer's video processor emitted its + /// R10 shader output instead of P010. DXGI `R10G10B10A2_UNORM` (R in the low 10 bits, X2 alpha in + /// the top 2) == FFmpeg `AV_PIX_FMT_X2BGR10LE`. UNTESTED on glass (no AMD/Intel Windows box). + fn readback_rgb10(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> { + unsafe { + self.ensure_staging(&frame.device, DXGI_FORMAT_R10G10B10A2_UNORM)?; + let staging = self.staging.clone().context("staging texture")?; + let ctx = self.ctx.clone().context("d3d11 context")?; + let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?; + let dst: ID3D11Resource = staging.cast().context("staging -> resource")?; + ctx.CopyResource(&dst, &src); + let mut map = D3D11_MAPPED_SUBRESOURCE::default(); + ctx.Map(&staging, 0, D3D11_MAP_READ, 0, Some(&mut map)) + .context("Map staging (rgb10 readback)")?; + let pitch = map.RowPitch as usize; + let h = self.height as usize; + let base = map.pData as *const u8; + // RGB(BT.2020 PQ) → YUV(BT.2020 PQ): a matrix-only repack (same PQ transfer), full→limited. 
+ self.ensure_sws( + ffi::AVPixelFormat::AV_PIX_FMT_X2BGR10LE, + ffi::AVPixelFormat::AV_PIX_FMT_P010LE, + SWS_CS_BT2020, + )?; + let src_data: [*const u8; 4] = [base, ptr::null(), ptr::null(), ptr::null()]; + let src_stride: [c_int; 4] = [pitch as c_int, 0, 0, 0]; + let r = ffi::sws_scale( + self.sws, + src_data.as_ptr(), + src_stride.as_ptr(), + 0, + h as c_int, + (*self.sw_frame).data.as_ptr(), + (*self.sw_frame).linesize.as_ptr(), + ); + ctx.Unmap(&staging, 0); + if r < 0 { + bail!("sws_scale Rgb10a2→P010 failed"); + } + self.send(pts, idr) + } + } + + /// CPU path: swscale a packed RGB/BGR CPU buffer to NV12, then send (8-bit only). Used when the + /// capturer hands `FramePayload::Cpu` (DDA without the video-processor path). + fn submit_cpu(&mut self, bytes: &[u8], format: PixelFormat, pts: i64, idr: bool) -> Result<()> { + anyhow::ensure!( + format == self.format, + "captured format {format:?} != encoder source {:?}", + self.format + ); + if self.ten_bit { + bail!("ffmpeg_win: CPU swscale path is 8-bit only"); + } + let w = self.width as usize; + let h = self.height as usize; + let src_row = w * format.bytes_per_pixel(); + anyhow::ensure!(bytes.len() >= src_row * h, "captured buffer too small"); + unsafe { + self.ensure_sws( + pixel_to_av(sws_src(format)?), + ffi::AVPixelFormat::AV_PIX_FMT_NV12, + SWS_CS_ITU709, + )?; + let src_data: [*const u8; 4] = [bytes.as_ptr(), ptr::null(), ptr::null(), ptr::null()]; + let src_stride: [c_int; 4] = [src_row as c_int, 0, 0, 0]; + if ffi::sws_scale( + self.sws, + src_data.as_ptr(), + src_stride.as_ptr(), + 0, + h as c_int, + (*self.sw_frame).data.as_ptr(), + (*self.sw_frame).linesize.as_ptr(), + ) < 0 + { + bail!("sws_scale RGB→NV12 failed"); + } + self.send(pts, idr) + } + } + + /// Lazily build the swscale context (src → NV12/P010, limited range, the given colorspace). 
A + /// SystemInner uses exactly one src→dst conversion for its lifetime (8-bit RGB→NV12 BT.709, or + /// 10-bit RGB10→P010 BT.2020), so caching a single context is sound. + unsafe fn ensure_sws( + &mut self, + src_av: ffi::AVPixelFormat, + dst_av: ffi::AVPixelFormat, + cs: c_int, + ) -> Result<()> { + if !self.sws.is_null() { + return Ok(()); + } + let sws = ffi::sws_getContext( + self.width as c_int, + self.height as c_int, + src_av, + self.width as c_int, + self.height as c_int, + dst_av, + SWS_POINT, + ptr::null_mut(), + ptr::null_mut(), + ptr::null(), + ); + if sws.is_null() { + bail!("sws_getContext(RGB→YUV) failed"); + } + // Source full-range RGB → destination limited-range YUV (matches the limited-range VUI we + // signal). For RGB input the src coefficient table is unused; pass the dst table for both. + let coeff = ffi::sws_getCoefficients(cs); + ffi::sws_setColorspaceDetails(sws, coeff, 1, coeff, 0, 0, 1 << 16, 1 << 16); + self.sws = sws; + Ok(()) + } +} + +impl Drop for SystemInner { + fn drop(&mut self) { + unsafe { + if !self.sw_frame.is_null() { + ffi::av_frame_free(&mut self.sw_frame); + } + if !self.sws.is_null() { + ffi::sws_freeContext(self.sws); + } + } + } +} + +// --------------------------------------------------------------------------------------------- +// Zero-copy D3D11 path (PUNKTFUNK_ZEROCOPY=1): share the capture device, pool D3D11 frames, copy +// the captured texture into a pooled slice, feed AMF directly / map to QSV. Falls back to the +// system path if the hw setup fails to open. Untested on glass — opt-in only for now. +// --------------------------------------------------------------------------------------------- + +struct D3d11Hw { + device_ref: *mut ffi::AVBufferRef, + frames_ref: *mut ffi::AVBufferRef, +} + +impl D3d11Hw { + /// Wrap the capturer's `ID3D11Device` as a D3D11VA hwdevice and build an NV12/P010 frames pool. 
+ unsafe fn new( + device: &ID3D11Device, + sw_format: ffi::AVPixelFormat, + bind_flags: u32, + w: u32, + h: u32, + pool: c_int, + ) -> Result { + let mut device_ref = + ffi::av_hwdevice_ctx_alloc(ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_D3D11VA); + if device_ref.is_null() { + bail!("av_hwdevice_ctx_alloc(D3D11VA) failed"); + } + let dev_ctx = (*device_ref).data as *mut ffi::AVHWDeviceContext; + let d11 = (*dev_ctx).hwctx as *mut AVD3D11VADeviceContext; + // Share the capture device. FFmpeg's d3d11va teardown Releases `device`, so hand it an owned + // reference (clone = AddRef, forget = don't Release ours). init() fills + // device_context / video_device / video_context / the default lock from a non-null device. + std::mem::forget(device.clone()); + (*d11).device = device.as_raw(); + let r = ffi::av_hwdevice_ctx_init(device_ref); + if r < 0 { + ffi::av_buffer_unref(&mut device_ref); + bail!("av_hwdevice_ctx_init(D3D11VA) failed ({r})"); + } + + let mut frames_ref = ffi::av_hwframe_ctx_alloc(device_ref); + if frames_ref.is_null() { + ffi::av_buffer_unref(&mut device_ref); + bail!("av_hwframe_ctx_alloc(D3D11VA) failed"); + } + let fc = (*frames_ref).data as *mut ffi::AVHWFramesContext; + (*fc).format = ffi::AVPixelFormat::AV_PIX_FMT_D3D11; + (*fc).sw_format = sw_format; + (*fc).width = w as c_int; + (*fc).height = h as c_int; + (*fc).initial_pool_size = pool; + let f11 = (*fc).hwctx as *mut AVD3D11VAFramesContext; + (*f11).bind_flags = bind_flags; + let r = ffi::av_hwframe_ctx_init(frames_ref); + if r < 0 { + ffi::av_buffer_unref(&mut frames_ref); + ffi::av_buffer_unref(&mut device_ref); + bail!("av_hwframe_ctx_init(D3D11VA) failed ({r})"); + } + Ok(D3d11Hw { + device_ref, + frames_ref, + }) + } +} + +impl Drop for D3d11Hw { + fn drop(&mut self) { + unsafe { + ffi::av_buffer_unref(&mut self.frames_ref); + ffi::av_buffer_unref(&mut self.device_ref); + } + } +} + +struct ZeroCopyInner { + vendor: WinVendor, + enc: encoder::video::Encoder, + hw: D3d11Hw, + /// QSV only: 
the QSV device + frames ctx derived from the D3D11VA ones (the encoder's real + /// input). `None` for AMF (which takes the D3D11 frames directly). + qsv_device: *mut ffi::AVBufferRef, + qsv_frames: *mut ffi::AVBufferRef, + ctx: ID3D11DeviceContext, + /// The pool's fixed sw_format (NV12 8-bit / P010 10-bit). A captured frame whose format differs + /// (the capturer's video-processor fell back to Bgra/Rgb10a2) cannot be CopySubresourceRegion'd + /// into this pool (format-group mismatch → UB), so the caller drops to the system path instead. + pool_format: PixelFormat, +} + +impl ZeroCopyInner { + #[allow(clippy::too_many_arguments)] + fn open( + vendor: WinVendor, + codec: Codec, + format: PixelFormat, + width: u32, + height: u32, + fps: u32, + bitrate_bps: u64, + bit_depth: u8, + device: &ID3D11Device, + ) -> Result { + let ten_bit = is_10bit_format(format, bit_depth); + let sw_av = if ten_bit { + ffi::AVPixelFormat::AV_PIX_FMT_P010LE + } else { + ffi::AVPixelFormat::AV_PIX_FMT_NV12 + }; + let pool_format = if ten_bit { + PixelFormat::P010 + } else { + PixelFormat::Nv12 + }; + // Bind flags on the FFmpeg-allocated pool. AMF reads it as encoder input (RENDER_TARGET + + // SHADER_RESOURCE, matching the video-processor output); QSV maps it as an mfx surface + // (DECODER | VIDEO_ENCODER). The CopySubresourceRegion into the pool works with any usable + // DEFAULT-usage texture regardless. 
+ let bind_flags = match vendor { + WinVendor::Amf => (D3D11_BIND_RENDER_TARGET.0 | D3D11_BIND_SHADER_RESOURCE.0) as u32, + WinVendor::Qsv => (D3D11_BIND_DECODER.0 | D3D11_BIND_VIDEO_ENCODER.0) as u32, + }; + const POOL: c_int = 8; + unsafe { + let hw = D3d11Hw::new(device, sw_av, bind_flags, width, height, POOL)?; + let (pix_fmt, dev_ref, frames_ref, mut qsv_device, mut qsv_frames) = match vendor { + WinVendor::Amf => ( + ffi::AVPixelFormat::AV_PIX_FMT_D3D11, + hw.device_ref, + hw.frames_ref, + ptr::null_mut(), + ptr::null_mut(), + ), + WinVendor::Qsv => { + // Derive a QSV device that SHARES the D3D11 device, and a QSV frames ctx derived + // from the D3D11 frames pool (auto-mapped 1:1). The encoder takes AV_PIX_FMT_QSV. + let mut qsv_device: *mut ffi::AVBufferRef = ptr::null_mut(); + let r = ffi::av_hwdevice_ctx_create_derived( + &mut qsv_device, + ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_QSV, + hw.device_ref, + 0, + ); + if r < 0 { + bail!("derive QSV device from D3D11VA: {}", ffmpeg::Error::from(r)); + } + let mut qsv_frames: *mut ffi::AVBufferRef = ptr::null_mut(); + let r = ffi::av_hwframe_ctx_create_derived( + &mut qsv_frames, + ffi::AVPixelFormat::AV_PIX_FMT_QSV, + qsv_device, + hw.frames_ref, + ffi::AV_HWFRAME_MAP_DIRECT as c_int, + ); + if r < 0 { + ffi::av_buffer_unref(&mut qsv_device); + bail!("derive QSV frames from D3D11VA: {}", ffmpeg::Error::from(r)); + } + ( + ffi::AVPixelFormat::AV_PIX_FMT_QSV, + qsv_device, + qsv_frames, + qsv_device, + qsv_frames, + ) + } + }; + let enc = match open_win_encoder( + vendor, + codec, + width, + height, + fps, + bitrate_bps, + pix_fmt, + sw_av, + ten_bit, + dev_ref, + frames_ref, + ) { + Ok(e) => e, + Err(e) => { + if !qsv_frames.is_null() { + ffi::av_buffer_unref(&mut qsv_frames); + } + if !qsv_device.is_null() { + ffi::av_buffer_unref(&mut qsv_device); + } + return Err(e); + } + }; + tracing::info!( + encoder = vendor.encoder_name(codec), + "{} encode active ({width}x{height}@{fps}, zero-copy D3D11 {} path)", + 
vendor.label(), + if ten_bit { "P010" } else { "NV12" } + ); + Ok(ZeroCopyInner { + vendor, + enc, + hw, + qsv_device, + qsv_frames, + ctx: immediate_context(device), + pool_format, + }) + } + } + + fn submit(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> { + unsafe { + // Resolve the source resource before pulling a pooled frame, so this `?` cannot leak one. + let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?; + // Pull a pooled D3D11 surface; its data[0] is the pool's texture-ARRAY, data[1] the slice. + let mut d3d = ffi::av_frame_alloc(); + if d3d.is_null() { + bail!("av_frame_alloc(d3d11) failed"); + } + let r = ffi::av_hwframe_get_buffer(self.hw.frames_ref, d3d, 0); + if r < 0 { + ffi::av_frame_free(&mut d3d); + bail!("av_hwframe_get_buffer(D3D11) failed ({r})"); + } + let dst_ptr = (*d3d).data[0] as *mut c_void; + let dst_index = (*d3d).data[1] as usize as u32; + // On failure, free the frame (returning its slice to the pool) instead of `?`-leaking it. + let dst: ID3D11Resource = match ID3D11Texture2D::from_raw_borrowed(&dst_ptr) + .map(|t| t.cast::<ID3D11Resource>()) + { + Some(Ok(d)) => d, + _ => { + ffi::av_frame_free(&mut d3d); + return Err(anyhow!("pooled D3D11 frame has no usable texture")); + } + }; + // GPU-local copy of the captured slice into the pooled array slice (like NVENC's CUDA + // device→device copy). Subresource = arrayIndex (MipLevels=1). + self.ctx + .CopySubresourceRegion(&dst, dst_index, 0, 0, 0, &src, 0, None); + + (*d3d).pts = pts; + (*d3d).pict_type = if idr { + ffi::AVPictureType::AV_PICTURE_TYPE_I + } else { + ffi::AVPictureType::AV_PICTURE_TYPE_NONE + }; + + let send = match self.vendor { + WinVendor::Amf => ffi::avcodec_send_frame(self.enc.as_mut_ptr(), d3d), + WinVendor::Qsv => { + // Map the D3D11 frame to a QSV surface (1:1, no copy), then send the mapped frame. 
+ let mut qsv = ffi::av_frame_alloc(); + if qsv.is_null() { + ffi::av_frame_free(&mut d3d); + bail!("av_frame_alloc(qsv) failed"); + } + (*qsv).format = ffi::AVPixelFormat::AV_PIX_FMT_QSV as c_int; + (*qsv).hw_frames_ctx = ffi::av_buffer_ref(self.qsv_frames); + // The map flags are a bindgen enum (no BitOr) — cast each to int before OR-ing.
+ let r = ffi::av_hwframe_map( + qsv, + d3d, + ffi::AV_HWFRAME_MAP_DIRECT as c_int | ffi::AV_HWFRAME_MAP_READ as c_int, + ); + if r < 0 { + ffi::av_frame_free(&mut qsv); + ffi::av_frame_free(&mut d3d); + bail!("av_hwframe_map(D3D11→QSV) failed ({r})"); + } + (*qsv).pts = pts; + (*qsv).pict_type = (*d3d).pict_type; + let s = ffi::avcodec_send_frame(self.enc.as_mut_ptr(), qsv); + ffi::av_frame_free(&mut qsv); + s + } + }; + ffi::av_frame_free(&mut d3d); + if send < 0 { + bail!( + "avcodec_send_frame({}) failed ({send})", + self.vendor.label() + ); + } + } + Ok(()) + } +} + +impl Drop for ZeroCopyInner { + fn drop(&mut self) { + unsafe { + if !self.qsv_frames.is_null() { + ffi::av_buffer_unref(&mut self.qsv_frames); + } + if !self.qsv_device.is_null() { + ffi::av_buffer_unref(&mut self.qsv_device); + } + } + } +} + +// --------------------------------------------------------------------------------------------- + +enum Inner { + System(SystemInner), + ZeroCopy(ZeroCopyInner), +} + +pub struct FfmpegWinEncoder { + vendor: WinVendor, + codec: Codec, + format: PixelFormat, + width: u32, + height: u32, + fps: u32, + bitrate_bps: u64, + bit_depth: u8, + /// Built lazily from the first frame (system readback vs zero-copy D3D11). + inner: Option, + /// Raw `ID3D11Device` pointer the live inner is bound to — re-init on change (the capturer + /// recreates its device across secure-desktop / HDR / resize transitions, like NVENC tracks). + bound_device: isize, + frame_idx: i64, + force_kf: bool, +} + +// Raw FFI pointers + COM objects; the encoder lives on a single thread (same contract as NVENC/VAAPI). 
+unsafe impl Send for FfmpegWinEncoder {} + +impl FfmpegWinEncoder { + #[allow(clippy::too_many_arguments)] + pub fn open( + vendor: WinVendor, + codec: Codec, + format: PixelFormat, + width: u32, + height: u32, + fps: u32, + bitrate_bps: u64, + bit_depth: u8, + ) -> Result { + ffmpeg::init().context("ffmpeg init")?; + if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() { + unsafe { ffi::av_log_set_level(48) }; + } + // Make sure the encoder name exists in this libavcodec build up front (clear error vs a + // first-frame failure). + let name = vendor.encoder_name(codec); + if encoder::find_by_name(name).is_none() { + bail!( + "{name} not built into libavcodec (this FFmpeg lacks the {} encoder)", + vendor.label() + ); + } + Ok(FfmpegWinEncoder { + vendor, + codec, + format, + width, + height, + fps, + bitrate_bps, + bit_depth, + inner: None, + bound_device: 0, + frame_idx: 0, + force_kf: false, + }) + } + + /// Build (or rebuild) the inner for a D3D11 frame, picking zero-copy or system. Zero-copy + /// failures fall back to the system path so a session is never lost to the untested hw path. The + /// device is re-bound on change (the capturer recreates it across secure-desktop / HDR / resize). + fn ensure_inner_d3d11(&mut self, device: &ID3D11Device) -> Result<()> { + let dev_raw = device.as_raw() as isize; + if self.inner.is_some() && self.bound_device == dev_raw { + return Ok(()); + } + self.inner = None; + self.bound_device = dev_raw; + let inner = if zerocopy_enabled() { + match ZeroCopyInner::open( + self.vendor, + self.codec, + self.format, + self.width, + self.height, + self.fps, + self.bitrate_bps, + self.bit_depth, + device, + ) { + Ok(zc) => Inner::ZeroCopy(zc), + Err(e) => { + tracing::warn!( + error = %format!("{e:#}"), + "{} zero-copy D3D11 setup failed — falling back to system-memory readback", + self.vendor.label() + ); + Inner::System(self.open_system()?) + } + } + } else { + Inner::System(self.open_system()?) 
+ }; + self.inner = Some(inner); + Ok(()) + } + + fn open_system(&self) -> Result { + SystemInner::open( + self.vendor, + self.codec, + self.format, + self.width, + self.height, + self.fps, + self.bitrate_bps, + self.bit_depth, + ) + } +} + +impl Encoder for FfmpegWinEncoder { + fn submit(&mut self, captured: &CapturedFrame) -> Result<()> { + anyhow::ensure!( + captured.width == self.width && captured.height == self.height, + "captured frame {}x{} != encoder {}x{}", + captured.width, + captured.height, + self.width, + self.height + ); + let pts = self.frame_idx; + self.frame_idx += 1; + let idr = self.force_kf; + self.force_kf = false; + match &captured.payload { + FramePayload::D3d11(f) => { + self.ensure_inner_d3d11(&f.device)?; + // If zero-copy is active but the capturer fell back to a format the NV12/P010 pool + // can't accept (no video processor → Bgra/Rgb10a2), a CopySubresourceRegion into the + // pool would be a format-group mismatch (UB / device removal). Drop to the system + // readback path, which handles every captured format. + let pool_mismatch = matches!( + &self.inner, + Some(Inner::ZeroCopy(zc)) if captured.format != zc.pool_format + ); + if pool_mismatch { + tracing::warn!( + captured = ?captured.format, + "{} zero-copy pool format mismatch (capturer video-processor fallback) — \ + switching to system-memory readback", + self.vendor.label() + ); + self.inner = Some(Inner::System(self.open_system()?)); + } + match self.inner.as_mut().unwrap() { + Inner::ZeroCopy(zc) => zc.submit(f, pts, idr), + Inner::System(s) => s.submit_d3d11(f, captured.format, pts, idr), + } + } + FramePayload::Cpu(bytes) => { + // DDA-without-video-processor hands CPU BGRA; build a system inner and swscale it. 
+ if self.inner.is_none() { + self.inner = Some(Inner::System(self.open_system()?)); + } + match self.inner.as_mut().unwrap() { + Inner::System(s) => s.submit_cpu(bytes, captured.format, pts, idr), + Inner::ZeroCopy(_) => { + bail!( + "{} encoder built for D3D11 got a CPU frame", + self.vendor.label() + ) + } + } + } + } + } + + fn request_keyframe(&mut self) { + self.force_kf = true; + } + + fn poll(&mut self) -> Result> { + match &mut self.inner { + Some(Inner::System(s)) => poll_encoder(&mut s.enc, self.fps), + Some(Inner::ZeroCopy(z)) => poll_encoder(&mut z.enc, self.fps), + None => Ok(None), + } + } + + fn flush(&mut self) -> Result<()> { + match &mut self.inner { + Some(Inner::System(s)) => s.enc.send_eof().context("send_eof")?, + Some(Inner::ZeroCopy(z)) => z.enc.send_eof().context("send_eof")?, + None => {} + } + Ok(()) + } +} diff --git a/crates/punktfunk-host/src/gamestream/serverinfo.rs b/crates/punktfunk-host/src/gamestream/serverinfo.rs index 9446461..eb3ab01 100644 --- a/crates/punktfunk-host/src/gamestream/serverinfo.rs +++ b/crates/punktfunk-host/src/gamestream/serverinfo.rs @@ -50,29 +50,41 @@ pub fn serverinfo_xml(host: &Host, https: bool, paired: bool) -> String { fn codec_mode_support() -> u32 { #[cfg(target_os = "linux")] if crate::encode::linux_zero_copy_is_vaapi() { - use super::{SCM_AV1_MAIN8, SCM_H264, SCM_HEVC}; - let caps = crate::encode::vaapi_codec_support(); - let mut m = 0; - if caps.h264 { - m |= SCM_H264; + if let Some(m) = probed_mask(crate::encode::vaapi_codec_support()) { + return m; } - if caps.h265 { - m |= SCM_HEVC; - } - if caps.av1 { - m |= SCM_AV1_MAIN8; - } - // Only trust a probe that actually found an encoder. An empty result means VAAPI wasn't - // usable at probe time (no VA display — a GPU-less CI box, or a misconfigured host), NOT - // that the GPU encodes nothing; advertise the static superset (pre-probe behaviour) rather - // than claiming zero codecs. 
- if m != 0 { + } + // Windows AMD/Intel (AMF/QSV): advertise only what the GPU actually encodes (AV1 is narrow, an + // old iGPU might lack HEVC). NVENC and the GPU-less software path keep the static superset. + #[cfg(all(target_os = "windows", feature = "amf-qsv"))] + if crate::encode::windows_backend_is_ffmpeg() { + if let Some(m) = probed_mask(crate::encode::windows_codec_support()) { return m; } } SERVER_CODEC_MODE_SUPPORT } +/// Turn a probed [`CodecSupport`](crate::encode::CodecSupport) into a `ServerCodecModeSupport` mask, +/// or `None` if the probe found nothing — meaning the GPU wasn't usable at probe time (GPU-less CI, +/// a misconfigured/wrong-vendor host), NOT that it encodes zero codecs; the caller then advertises +/// the static superset (pre-probe behaviour) rather than claiming nothing. +#[cfg(any(target_os = "linux", all(target_os = "windows", feature = "amf-qsv")))] +fn probed_mask(caps: crate::encode::CodecSupport) -> Option { + use super::{SCM_AV1_MAIN8, SCM_H264, SCM_HEVC}; + let mut m = 0; + if caps.h264 { + m |= SCM_H264; + } + if caps.h265 { + m |= SCM_HEVC; + } + if caps.av1 { + m |= SCM_AV1_MAIN8; + } + (m != 0).then_some(m) +} + #[cfg(test)] mod tests { use super::*; diff --git a/crates/punktfunk-host/src/service.rs b/crates/punktfunk-host/src/service.rs index 53e68ba..4c0bb91 100644 --- a/crates/punktfunk-host/src/service.rs +++ b/crates/punktfunk-host/src/service.rs @@ -603,7 +603,8 @@ fn uninstall() -> Result<()> { Ok(()) } -/// Write a default `host.env` if none exists, so a fresh install streams with NVENC out of the box. +/// Write a default `host.env` if none exists, so a fresh install streams out of the box. The encoder +/// defaults to `auto` — the host picks NVENC (NVIDIA) / AMF (AMD) / QSV (Intel) from the GPU vendor. fn ensure_default_host_env() -> Result<()> { let path = host_env_path(); if path.exists() { @@ -616,7 +617,9 @@ fn ensure_default_host_env() -> Result<()> { # KEY=VALUE per line; '#' comments. 
Restart the service after editing:\n\ # punktfunk-host service stop && punktfunk-host service start\n\ \n\ - PUNKTFUNK_ENCODER=nvenc\n\ + # Encode backend: auto (default) detects the GPU vendor — NVIDIA->nvenc, AMD->amf, Intel->qsv.\n\ + # Force one with nvenc | amf | qsv | sw (software H.264). amf/qsv need an FFmpeg-built host.\n\ + PUNKTFUNK_ENCODER=auto\n\ PUNKTFUNK_VIDEO_SOURCE=virtual\n\ PUNKTFUNK_SECURE_DDA=1\n\ RUST_LOG=info\n\ @@ -625,7 +628,7 @@ fn ensure_default_host_env() -> Result<()> { # compat). Use `serve` for a SECURE native-only host (no GameStream #5/#9 surface).\n\ # PUNKTFUNK_HOST_CMD=serve --gamestream\n\ \n\ - # Force a specific NVENC render GPU by name substring (multi-GPU boxes only):\n\ + # Force a specific render GPU by name substring (multi-GPU boxes only):\n\ # PUNKTFUNK_RENDER_ADAPTER=4090\n"; std::fs::write(&path, default).with_context(|| format!("write {}", path.display()))?; println!("Wrote default config: {}", path.display()); diff --git a/docs/windows-host.md b/docs/windows-host.md index 8dec174..e920207 100644 --- a/docs/windows-host.md +++ b/docs/windows-host.md @@ -143,12 +143,15 @@ rustc 1.96 clippy is stricter than the Linux CI image on shared code, e.g. `need driver DLL — `lib /def:nvenc.def /machine:x64 /out:nvencodeapi.lib` where `nvenc.def` lists `NvEncodeAPICreateInstance` and `NvEncodeAPIGetMaxSupportedVersion` — and set `PUNKTFUNK_NVENC_LIB_DIR` to its directory. -3. `cargo build -p punktfunk-host --features nvenc` (needs NASM + CMake for aws-lc-rs; libclang for - any ffmpeg-using client). Default build (no feature) uses the openh264 software encoder. +3. `cargo build -p punktfunk-host --features nvenc,amf-qsv` for the all-vendor GPU host (NVENC for + NVIDIA; AMD AMF + Intel QSV via libavcodec — `amf-qsv` needs `FFMPEG_DIR` with the `*_amf`/`*_qsv` + encoders at build, e.g. the BtbN gpl-shared tree, and the FFmpeg DLLs on PATH at run; NASM + CMake + for aws-lc-rs; libclang for ffmpeg-sys-next). 
Default build (no feature) = openh264 software encoder. 4. Run in the **interactive session** (not a Session-0 service / not over SSH — SendInput + DXGI - Desktop Duplication need a desktop): `serve` or `punktfunk1-host --source virtual`. Set - `PUNKTFUNK_ENCODER=nvenc` to select NVENC (the DXGI capturer switches to zero-copy D3D11 output to - match). The SudoVDA monitor activates once a real GPU drives WDDM, so capture + NVENC then work. + Desktop Duplication need a desktop): `serve` or `punktfunk1-host --source virtual`. + `PUNKTFUNK_ENCODER=auto` (default) picks the backend from the GPU vendor — `nvenc`/`amf`/`qsv`/`sw` + force one. The DXGI capturer emits zero-copy D3D11 NV12/P010 to match any GPU backend; the SudoVDA + monitor activates once a real GPU drives WDDM, so capture + encode then work. ### Dev loop (this repo → the Windows VM) @@ -224,7 +227,7 @@ This is the highest-value first move and is **fully doable GPU-less**. | **VirtualDisplay** | KWin/gamescope/Mutter/Sway | **SudoVDA** IOCTLs (below) + `SetDisplayConfig` mode-set | ✅ likely (WARP) — *spike* | | **Capture** | PipeWire/dmabuf | **DXGI Desktop Duplication** primary, **WGC** fallback → `ID3D11Texture2D`; add `FramePayload::D3d11` | ⚠️ DDA-on-WARP unreliable; WGC-on-WARP unverified — *spike* | | **Zero-copy** | dmabuf→EGL/Vulkan→CUDA | register `ID3D11Texture2D` with NVENC (`NV_ENC_DEVICE_TYPE_DIRECTX`) — no CUDA bridge | ❌ needs real GPU | -| **Encode** | ffmpeg `*_nvenc` | `openh264` SW (default on VM) + `nvidia-video-codec-sdk` HW (real GPU); behind `PUNKTFUNK_ENCODER` | SW ✅ / HW ❌ | +| **Encode** | ffmpeg `*_nvenc`/`*_vaapi` | `nvidia-video-codec-sdk` (NVIDIA) + libavcodec `*_amf`/`*_qsv` (AMD/Intel, `encode/ffmpeg_win.rs`, `--features amf-qsv`) + `openh264` SW fallback; vendor-auto via `PUNKTFUNK_ENCODER` | NVENC ✅ live / AMF/QSV CI-only | | **Input kbd/mouse** | libei / wlr | **SendInput** with `MOUSEEVENTF_VIRTUALDESK` absolute mapping onto the virtual desktop rect (skip the VK→evdev 
table — client sends Win VKs; use `KEYEVENTF_SCANCODE`+`EXTENDEDKEY`) | ✅ | | **Gamepad** | uinput xpad + FF | **ViGEmBus** via `vigem-client` (`Xbox360Wired`); rumble via `request_notification()`→`XNotification{large,small}` | ✅ (install driver) | | **Audio capture** | PipeWire sink monitor | **WASAPI loopback** via the `wasapi` crate (48 kHz stereo f32 → existing Opus) | ⚠️ needs an audio endpoint | diff --git a/packaging/windows/pack-host-installer.ps1 b/packaging/windows/pack-host-installer.ps1 index 35f6cb9..a304cee 100644 --- a/packaging/windows/pack-host-installer.ps1 +++ b/packaging/windows/pack-host-installer.ps1 @@ -25,6 +25,7 @@ param( [string]$Publisher = 'CN=unom', [string]$PfxBase64 = $env:MSIX_CERT_PFX_B64, # reuse the client's signing secret [string]$PfxPassword = $env:MSIX_CERT_PASSWORD, + [string]$FfmpegDir = $env:FFMPEG_DIR, # bundle its bin\*.dll (amf-qsv build) [switch]$NoDriver, # build without the bundled SudoVDA driver [switch]$NoSign # skip signing (local debug) ) @@ -146,6 +147,24 @@ if (-not $NoDriver) { } else { Write-Host "-NoDriver: building installer WITHOUT the bundled SudoVDA driver" } +# --- stage the FFmpeg shared DLLs (AMD/Intel AMF/QSV build) ------------------------------------ +# A host built with --features amf-qsv link-imports avcodec/avutil/swscale/... so the shared DLLs +# MUST sit next to the exe (it won't start otherwise). Bundle them from $FfmpegDir\bin — the same +# BtbN gpl-shared tree the build linked against. An nvenc/software-only build doesn't import them, so +# this is a harmless extra there; skipped entirely when $FfmpegDir is unset.
+$ffmpegBinSrc = if ($FfmpegDir) { Join-Path $FfmpegDir 'bin' } else { $null } +if ($ffmpegBinSrc -and (Test-Path $ffmpegBinSrc)) { + $dlls = Get-ChildItem -Path $ffmpegBinSrc -Filter '*.dll' -ErrorAction SilentlyContinue + if ($dlls) { + $ffmpegStage = Join-Path $OutDir 'ffmpeg' + New-Item -ItemType Directory -Force -Path $ffmpegStage | Out-Null + $dlls | ForEach-Object { Copy-Item $_.FullName -Destination $ffmpegStage -Force } + $defines += "/DFfmpegBin=$ffmpegStage" + Write-Host "bundling $($dlls.Count) FFmpeg DLL(s) from $ffmpegBinSrc" + } +} +else { Write-Host "no FFMPEG_DIR\bin -> installer built WITHOUT FFmpeg DLLs (nvenc/software-only host)" } + # --- build the installer (from the non-redirected copy under C:\t) ----------------------------- Write-Host "==> ISCC $($defines -join ' ') $issLocal" & $iscc @defines $issLocal diff --git a/packaging/windows/punktfunk-host.iss b/packaging/windows/punktfunk-host.iss index d6e4cf3..cd7b4e0 100644 --- a/packaging/windows/punktfunk-host.iss +++ b/packaging/windows/punktfunk-host.iss @@ -32,6 +32,11 @@ #ifdef StageDir #define WithDriver #endif +; FfmpegBin (a dir of FFmpeg shared DLLs) is optional — present when the host is built with +; --features amf-qsv (the AMD/Intel AMF/QSV encode backend link-imports the FFmpeg libs). +#ifdef FfmpegBin + #define WithFfmpeg +#endif [Setup] AppId={{7C9E6A52-1F4B-4E8D-A3C7-2B5D8F1E0A93} @@ -68,6 +73,12 @@ Name: "startservice"; Description: "Start the punktfunk host service now (also s Source: "{#BinDir}\punktfunk-host.exe"; DestDir: "{app}"; Flags: ignoreversion Source: "{#HostEnv}"; DestDir: "{app}"; Flags: ignoreversion Source: "{#Readme}"; DestDir: "{app}"; DestName: "README.txt"; Flags: ignoreversion +#ifdef WithFfmpeg +; FFmpeg shared DLLs (avcodec/avutil/swscale/...) laid down next to the exe — the AMD/Intel +; (AMF/QSV) encode backend link-imports them, so the exe won't start without them. NVENC/software- +; only builds simply omit this block. 
+Source: "{#FfmpegBin}\*.dll"; DestDir: "{app}"; Flags: ignoreversion +#endif #ifdef WithDriver ; The driver payload + nefconc.exe + install-sudovda.ps1, extracted to {tmp} and removed after install. Source: "{#StageDir}\*"; DestDir: "{tmp}\sudovda"; Flags: deleteafterinstall recursesubdirs createallsubdirs; Tasks: installdriver diff --git a/scripts/windows/host.env.example b/scripts/windows/host.env.example index b6d3207..c26d3cf 100644 --- a/scripts/windows/host.env.example +++ b/scripts/windows/host.env.example @@ -9,9 +9,11 @@ # Format: KEY=VALUE per line; '#' starts a comment. The service loads these into its environment # and passes PUNKTFUNK_* and RUST_LOG through to the host it launches into the active session. -# Hardware encode via NVENC (NVIDIA). The host must be the `--features nvenc` build. Falls back to -# the software encoder automatically if NVENC is unavailable. -PUNKTFUNK_ENCODER=nvenc +# Hardware encode backend. `auto` (default) detects the GPU vendor: NVIDIA->nvenc (direct SDK), +# AMD->amf, Intel->qsv (both libavcodec). Force one with: nvenc | amf | qsv | sw (software H.264). +# nvenc needs the `--features nvenc` build; amf/qsv need the `--features amf-qsv` build (FFmpeg DLLs +# ship in the installer). The published installer is built with all three. +PUNKTFUNK_ENCODER=auto # Video source: `virtual` creates a per-client virtual display (SudoVDA) at the client's exact # resolution + refresh — the flagship mode. Requires the SudoVDA indirect display driver installed.
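The vendor-auto rule documented above for `PUNKTFUNK_ENCODER` (NVIDIA->nvenc, AMD->amf, Intel->qsv, with `nvenc | amf | qsv | sw` as overrides) can be sketched as a small Rust resolver. This is a minimal sketch, not the host's actual code. The `Backend` enum and the `backend_for_vendor`/`resolve_backend` helpers are hypothetical. The sketch assumes the GPU vendor is identified by the adapter's PCI vendor ID (as reported, for example, in `DXGI_ADAPTER_DESC::VendorId`). It also assumes that an unrecognised vendor under `auto` falls back to the software encoder.

```rust
use std::env;

/// Hypothetical backend enum mirroring the documented PUNKTFUNK_ENCODER values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Nvenc,
    Amf,
    Qsv,
    Sw,
}

// Well-known PCI vendor IDs (assumption: the host keys vendor detection on these).
const VENDOR_NVIDIA: u32 = 0x10DE;
const VENDOR_AMD: u32 = 0x1002;
const VENDOR_INTEL: u32 = 0x8086;

/// `auto` mapping: NVIDIA->nvenc, AMD->amf, Intel->qsv, anything else->software (assumed).
fn backend_for_vendor(vendor_id: u32) -> Backend {
    match vendor_id {
        VENDOR_NVIDIA => Backend::Nvenc,
        VENDOR_AMD => Backend::Amf,
        VENDOR_INTEL => Backend::Qsv,
        _ => Backend::Sw,
    }
}

/// Resolve the env setting (unset, empty or "auto" = vendor detection) to a backend.
/// Matching is case-insensitive and ignores surrounding whitespace.
fn resolve_backend(setting: Option<&str>, vendor_id: u32) -> Result<Backend, String> {
    match setting.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") | Some("auto") => Ok(backend_for_vendor(vendor_id)),
        Some("nvenc") => Ok(Backend::Nvenc),
        Some("amf") => Ok(Backend::Amf),
        Some("qsv") => Ok(Backend::Qsv),
        Some("sw") => Ok(Backend::Sw),
        Some(other) => Err(format!("unknown PUNKTFUNK_ENCODER value: {other}")),
    }
}

fn main() {
    let setting = env::var("PUNKTFUNK_ENCODER").ok();
    // Stand-in for the render adapter's vendor ID; a real host would query DXGI.
    let vendor = VENDOR_AMD;
    match resolve_backend(setting.as_deref(), vendor) {
        Ok(backend) => println!("encode backend: {backend:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

The sketch rejects an unknown override outright instead of silently falling back. This is its own design choice, meant to surface typos in `host.env` early. The real host may instead log the problem and use `sw`.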