feat(windows-client): D3D11VA zero-copy hw decode + HDR10 present + GUI polish
windows-msix / package (push) Successful in 1m2s
apple / swift (push) Successful in 54s
windows / build (push) Failing after 1m2s
android / android (push) Failing after 48s
ci / web (push) Failing after 6s
ci / docs-site (push) Failing after 1s
ci / bench (push) Failing after 0s
deb / build-publish (push) Failing after 0s
decky / build-publish (push) Failing after 0s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Failing after 0s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Failing after 1s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 0s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Failing after 0s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 0s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Failing after 1s
docker / deploy-docs (push) Has been skipped
ci / rust (push) Failing after 2m0s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 4m18s

The client was pure software HEVC decode + CPU swscale->RGBA + a full-frame
dynamic-texture upload every frame -- the reason performance was poor on a GPU
box (the GPU sat idle while the CPU churned). This adds a hardware path, HDR,
and a GUI pass.

Performance -- D3D11VA zero-copy:
- gpu.rs (new): one D3D11 device (hardware + VIDEO_SUPPORT, WARP fallback,
  multithread-protected) shared by decoder and presenter via a Send/Sync
  OnceLock. Sharing is mandatory -- a decoded texture is only bindable on the
  device that created it. windows-rs COM interfaces are !Send/!Sync, so the
  unsafe impl is sound only under the multithread protection + disjoint
  decode(video ctx)/present(immediate ctx) split.
- video.rs: D3d11vaDecoder (raw FFI mirroring the Linux VAAPI module). The
  COM-typed AVD3D11VA{Device,Frames}Context are declared here (stable FFmpeg
  ABI) to avoid ffmpeg-sys binding the d3d11 headers; get_format builds a frames
  ctx with BindFlags=SHADER_RESOURCE so the NV12/P010 array slices are
  sampleable. av_frame_clone guard keeps each surface out of the reuse pool
  until the presenter drops it. Software decode stays as the fallback
  (DecoderPref Auto/Hardware/Software; auto falls back on init/decode error).
- present.rs: shared device; per-plane SRVs over the array slice
  (NV12->R8/R8G8, P010->R16/R16G16) + three pixel shaders (RGBA passthrough,
  NV12/BT.709, P010/BT.2020-PQ). present() now takes the frame by value so the
  GPU surface survives re-presents.

HDR:
- Detected in-band (transfer == SMPTE2084), same signal as the other clients.
  Swapchain flips to R10G10B10A2 + ST.2084 + HDR10 metadata. New Settings toggle
  gates advertising VIDEO_CAP_10BIT|HDR; host still gates 10-bit behind its own
  PUNKTFUNK_10BIT + actual-HDR-content checks.

GUI (windows-reactor):
- Host cards with accent-monogram avatars + colored status pills, InfoBar for
  errors/pairing hints, ToggleSwitch settings (+ HDR, decoder, bitrate), button
  icons, a richer connecting screen, and a stream HUD with GPU/CPU-decode + HDR
  status chips.

Not yet on-glass validated: the Linux dev box can't compile the cfg(windows)
code (ffmpeg/windows crates unfetched; WARP has no hw decode) -- only
cargo fmt checks it here. API shapes verified against the windows-rs/reactor
source and the YUV->RGB coefficients checked by hand, but D3D11VA + shaders +
the GUI need a real build (Windows CI / build VM) and on-glass test on the RTX
box. The host-side HDR encode path is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-18 23:16:07 +00:00
parent af9bb54785
commit 0cc36fa130
8 changed files with 1121 additions and 222 deletions
+11 -2
View File
@@ -10,7 +10,8 @@
//! punktfunk-client (open the WinUI 3 window: host list, settings, pairing)
//! punktfunk-client --discover (list punktfunk hosts on the LAN)
//! punktfunk-client --headless --connect host[:port] [--pin HEX] [--pair PIN] [--mode WxHxHz]
//! [--bitrate MBPS] [--mic] (no window; count frames + print stats)
//! [--bitrate MBPS] [--mic] [--decoder auto|hardware|software] [--no-hdr]
//! (no window; count frames + print stats)
// Link as a GUI (windows) subsystem binary so the default windowed launch (MSIX / double-click)
// does NOT pop a console window. The CLI paths (--headless/--discover) reattach to the launching
@@ -26,6 +27,8 @@ mod discovery;
#[cfg(windows)]
mod gamepad;
#[cfg(windows)]
mod gpu;
#[cfg(windows)]
mod input;
#[cfg(windows)]
mod present;
@@ -162,7 +165,11 @@ fn run_headless_cli(args: &[String], identity: (String, String)) {
}
}
tracing::info!(%host, port, ?mode, tofu = pin.is_none(), "connecting (headless)");
let decoder = arg("--decoder")
.map(|d| crate::video::DecoderPref::from_name(&d))
.unwrap_or_default();
tracing::info!(%host, port, ?mode, tofu = pin.is_none(), ?decoder, "connecting (headless)");
let handle = session::start(session::SessionParams {
host,
port,
@@ -171,6 +178,8 @@ fn run_headless_cli(args: &[String], identity: (String, String)) {
gamepad: GamepadPref::Auto,
bitrate_kbps,
mic_enabled: flag("--mic"),
hdr_enabled: !flag("--no-hdr"),
decoder,
pin,
identity,
});