feat(windows-client): D3D11VA zero-copy hw decode + HDR10 present + GUI polish

The client previously did pure software HEVC decode, then CPU swscale->RGBA,
then a full-frame dynamic-texture upload every frame -- which is why
performance was poor on a GPU box (the GPU sat idle while the CPU churned).
This commit adds a hardware decode path, HDR10 presentation, and a GUI pass.

Performance -- D3D11VA zero-copy:
- gpu.rs (new): one D3D11 device (hardware + VIDEO_SUPPORT, WARP fallback,
  multithread-protected) shared by decoder and presenter via a Send/Sync
  OnceLock. Sharing is mandatory -- a decoded texture is only bindable on the
  device that created it. windows-rs COM interfaces are !Send/!Sync, so the
  unsafe impl is sound only under the multithread protection + disjoint
  decode(video ctx)/present(immediate ctx) split.
- video.rs: D3d11vaDecoder (raw FFI mirroring the Linux VAAPI module). The
  COM-typed AVD3D11VA{Device,Frames}Context are declared here (stable FFmpeg
  ABI) to avoid ffmpeg-sys binding the d3d11 headers; get_format builds a frames
  ctx with BindFlags=SHADER_RESOURCE so the NV12/P010 array slices are
  sampleable. An av_frame_clone guard keeps each surface out of the reuse pool
  until the presenter drops it. Software decode stays as the fallback
  (DecoderPref Auto/Hardware/Software; auto falls back on init/decode error).
- present.rs: shared device; per-plane SRVs over the array slice
  (NV12->R8/R8G8, P010->R16/R16G16) + three pixel shaders (RGBA passthrough,
  NV12/BT.709, P010/BT.2020-PQ). present() now takes the frame by value so the
  GPU surface survives re-presents.
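
A minimal sketch of the DecoderPref fallback rule described above (Auto falls
back to software on an init or decode error). Only the DecoderPref variant
names come from this commit; Backend, select_backend and after_decode_error
are hypothetical names for illustration, and hw_init stands in for the real
D3D11VA setup. What Hardware does on a mid-session error is an assumption:

```rust
/// Variant names mirror the commit's `DecoderPref`; the rest is hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DecoderPref {
    Auto,
    Hardware,
    Software,
}

/// Hypothetical tag for the backend that is currently decoding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    D3d11va,
    Software,
}

/// Pick a backend at init time. `hw_init` stands in for the result of
/// creating the D3D11VA decoder (assumption: any error means "unavailable").
pub fn select_backend(pref: DecoderPref, hw_init: Result<(), String>) -> Result<Backend, String> {
    match (pref, hw_init) {
        // Software never touches the hardware path.
        (DecoderPref::Software, _) => Ok(Backend::Software),
        // Hardware init succeeded for Auto or Hardware.
        (_, Ok(())) => Ok(Backend::D3d11va),
        // Auto quietly falls back; Hardware surfaces the error.
        (DecoderPref::Auto, Err(_)) => Ok(Backend::Software),
        (DecoderPref::Hardware, Err(e)) => Err(e),
    }
}

/// Backend to use after a decode error mid-session. Only Auto demotes
/// D3D11VA to software; the other preferences keep their backend
/// (assumed here: the caller then reports the error).
pub fn after_decode_error(pref: DecoderPref, current: Backend) -> Backend {
    match (pref, current) {
        (DecoderPref::Auto, Backend::D3d11va) => Backend::Software,
        (_, backend) => backend,
    }
}
```

Keeping the rule in two pure functions makes it checkable on the Linux dev
box, which cannot build the cfg(windows) decoder itself.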

HDR:
- HDR is detected in-band (transfer == SMPTE2084), same signal as the other clients.
  Swapchain flips to R10G10B10A2 + ST.2084 + HDR10 metadata. New Settings toggle
  gates advertising VIDEO_CAP_10BIT|HDR; host still gates 10-bit behind its own
  PUNKTFUNK_10BIT + actual-HDR-content checks.
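
A rough sketch of the in-band detection above, assuming the presenter sees the
stream's transfer-characteristics code per decoded frame. The value 16 for
SMPTE ST 2084 is the ITU-T H.273 code (FFmpeg's AVCOL_TRC_SMPTE2084 has the
same value); SwapchainMode, mode_for_transfer and needs_reconfigure are
hypothetical names, not the ones in present.rs:

```rust
/// ITU-T H.273 transfer_characteristics code for SMPTE ST 2084 (PQ).
pub const TRC_SMPTE2084: u8 = 16;

/// Hypothetical summary of the two swapchain configurations.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SwapchainMode {
    /// 8-bit SDR output.
    Sdr8,
    /// R10G10B10A2 + ST.2084 color space + HDR10 metadata.
    Hdr10,
}

/// Map the in-band transfer code of a frame to the mode it needs.
pub fn mode_for_transfer(transfer: u8) -> SwapchainMode {
    if transfer == TRC_SMPTE2084 {
        SwapchainMode::Hdr10
    } else {
        SwapchainMode::Sdr8
    }
}

/// Return the mode to flip to, or `None` when the swapchain already
/// matches, so the flip only happens on an actual SDR<->HDR change.
pub fn needs_reconfigure(current: SwapchainMode, transfer: u8) -> Option<SwapchainMode> {
    let wanted = mode_for_transfer(transfer);
    if wanted != current {
        Some(wanted)
    } else {
        None
    }
}
```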

GUI (windows-reactor):
- Host cards with accent-monogram avatars + colored status pills, InfoBar for
  errors/pairing hints, ToggleSwitch settings (+ HDR, decoder, bitrate), button
  icons, a richer connecting screen, and a stream HUD with GPU/CPU-decode + HDR
  status chips.

Not yet on-glass validated: the Linux dev box can't compile the cfg(windows)
code (ffmpeg/windows crates unfetched; WARP has no hw decode) -- only
cargo fmt checks it here. API shapes were verified against the windows-rs/reactor
source and the YUV->RGB coefficients were checked by hand, but D3D11VA, the
shaders, and the GUI still need a real build (Windows CI / build VM) and an
on-glass test on the RTX box. The host-side HDR encode path is unchanged.
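
For reference, a CPU-side sketch of the kind of hand check mentioned above for
the NV12/BT.709 shader. It assumes limited (video) range 8-bit input, which
this message does not state; bt709_limited_to_rgb is a hypothetical helper,
not code from present.rs. The coefficients are derived from the BT.709 luma
weights Kr = 0.2126 and Kb = 0.0722 rather than hard-coded:

```rust
/// BT.709 luma weights.
const KR: f32 = 0.2126;
const KB: f32 = 0.0722;
const KG: f32 = 1.0 - KR - KB;

/// Convert one limited-range 8-bit BT.709 Y'CbCr sample to full-range RGB.
/// Assumption: Y' in 16..=235 and Cb/Cr in 16..=240 (video range).
pub fn bt709_limited_to_rgb(y: u8, cb: u8, cr: u8) -> [u8; 3] {
    // Expand luma from 219 steps and chroma from 224 steps to full scale.
    let y = (f32::from(y) - 16.0) * (255.0 / 219.0);
    let chroma_scale = 255.0 / 224.0;
    let cb = (f32::from(cb) - 128.0) * chroma_scale;
    let cr = (f32::from(cr) - 128.0) * chroma_scale;

    // Standard Y'CbCr -> R'G'B' inverse for the given luma weights.
    let r = y + 2.0 * (1.0 - KR) * cr;
    let g = y - 2.0 * (1.0 - KB) * KB / KG * cb - 2.0 * (1.0 - KR) * KR / KG * cr;
    let b = y + 2.0 * (1.0 - KB) * cb;

    // Round and clamp to the displayable 8-bit range.
    [r, g, b].map(|v| v.round().clamp(0.0, 255.0) as u8)
}
```

Video black (16, 128, 128) and video white (235, 128, 128) should map to
(0, 0, 0) and (255, 255, 255); a shader using the same matrix can be compared
against these values once a Windows build exists.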

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 23:16:07 +00:00
parent af9bb54785
commit 0cc36fa130
8 changed files with 1121 additions and 222 deletions
+31 -10
@@ -8,7 +8,7 @@
//! (software-only here) and the audio backend (WASAPI). The pump body is identical.
use crate::audio;
-use crate::video::{DecodedFrame, Decoder};
+use crate::video::{DecodedFrame, Decoder, DecoderPref};
use punktfunk_core::client::NativeClient;
use punktfunk_core::config::{CompositorPref, GamepadPref, Mode};
use punktfunk_core::PunktfunkError;
@@ -25,6 +25,10 @@ pub struct SessionParams {
pub bitrate_kbps: u32,
/// Stream the default microphone to the host's virtual mic source.
pub mic_enabled: bool,
+/// Advertise 10-bit + HDR10 so the host may upgrade HDR content to a Main10/PQ stream.
+pub hdr_enabled: bool,
+/// Which video decode backend to use (auto/hardware/software).
+pub decoder: DecoderPref,
/// Pinned host fingerprint; `None` = trust on first use (caller persists the observed one).
pub pin: Option<[u8; 32]>,
pub identity: (String, String),
@@ -37,6 +41,10 @@ pub struct Stats {
pub decode_ms: f32,
/// Median capture→decoded latency over the last window (host-clock corrected).
pub latency_ms: f32,
+/// True when decoding on the GPU (D3D11VA zero-copy) vs. CPU (software).
+pub hardware: bool,
+/// True when the stream is BT.2020 PQ HDR10 (last decoded frame).
+pub hdr: bool,
}
pub enum SessionEvent {
@@ -99,10 +107,15 @@ fn pump(
params.compositor,
params.gamepad,
params.bitrate_kbps,
-// Advertise 10-bit + HDR10: the presenter handles BT.2020 PQ (R10G10B10A2) frames, so the
-// host may upgrade HDR content to a Main10/PQ stream (it still only does so for actual HDR
-// content with its own 10-bit gate). 8-bit SDR is unaffected.
-punktfunk_core::quic::VIDEO_CAP_10BIT | punktfunk_core::quic::VIDEO_CAP_HDR,
+// Advertise 10-bit + HDR10 (when enabled): the presenter handles BT.2020 PQ frames (P010 on
+// the GPU path, X2BGR10 on software), so the host may upgrade HDR content to a Main10/PQ
+// stream — it still only does so for actual HDR content with its own 10-bit gate. 8-bit SDR
+// is unaffected. A client that turns HDR off advertises `0` and always gets the 8-bit stream.
+if params.hdr_enabled {
+punktfunk_core::quic::VIDEO_CAP_10BIT | punktfunk_core::quic::VIDEO_CAP_HDR
+} else {
+0
+},
None, // launch: the Windows client has no library picker yet
params.pin,
Some(params.identity),
@@ -132,13 +145,15 @@ fn pump(
fingerprint: connector.host_fingerprint,
});
-let mut decoder = match Decoder::new() {
+let mut decoder = match Decoder::new(params.decoder) {
Ok(d) => d,
Err(e) => {
let _ = ev_tx.send_blocking(SessionEvent::Ended(Some(format!("video decoder: {e}"))));
return;
}
};
+let mut hardware = decoder.is_hardware();
+let mut hdr = false;
// Audio is best-effort: a session without it still streams. Gamepads are the
// app-lifetime service's job (the UI attaches it on Connected).
let player = audio::AudioPlayer::spawn()
@@ -178,12 +193,16 @@ fn pump(
match decoder.decode(&frame.data) {
Ok(Some(decoded)) => {
total_frames += 1;
+hdr = decoded.hdr();
+// The backend can demote D3D11VA → software mid-session on a hardware error.
+hardware = decoder.is_hardware();
if total_frames == 1 {
-let DecodedFrame::Cpu(c) = &decoded;
+let (w, h) = decoded.dims();
tracing::info!(
-width = c.width,
-height = c.height,
-path = "software",
+width = w,
+height = h,
+path = if hardware { "d3d11va" } else { "software" },
+hdr,
"first frame decoded"
);
}
@@ -253,6 +272,8 @@ fn pump(
0.0
},
latency_ms: p50 as f32 / 1000.0,
+hardware,
+hdr,
}));
window_start = Instant::now();
frames_n = 0;