feat(clients/windows): all-vendor video pipeline rewrite + app icon + hosts-page tiles

Decode+present rewrite (first real pixels on glass for this client):

- Decode: FFmpeg D3D11VA on NVIDIA/AMD/Intel. get_format now only returns
  AV_PIX_FMT_D3D11 and lets libavcodec build the decode pool from
  hw_device_ctx. Hand-built frames contexts failed three different ways:
  NVIDIA rejects DECODER|SHADER_RESOURCE arrays, BindFlags=0 fails texture
  creation, and Intel rejects HEVC surfaces that are not 128-aligned at the
  first SubmitDecoderBuffers. A DXVA profile probe, run before the hw device
  is created, decides hardware vs. software up front instead of burning the
  opening IDR on a failed hardware attempt; extra_hw_frames enlarges the
  pool to cover the frames the client holds.
- Present: the decoded slice is copied into a sampleable NV12/P010 texture
  with ONE CopySubresourceRegion boxed to the display size, then sampled
  through per-plane SRVs by the YUV->RGB shaders. In D3D11 a planar slice
  is a single subresource; the old two-copy, D3D12-style code silently
  no-opped, which was the black screen.
- New dedicated render thread (render.rs) decouples presenting from the
  XAML thread: frame-latency-waitable swapchain + SetMaximumFrameLatency(1),
  a newest-wins drain after the wait, and a crossbeam frame channel that
  carries pts for a capture->presented p50 latency log.
- HiDPI: buffers sized in physical pixels + SetMatrixTransform(96/dpi);
  video was blurry at 125/150 % scaling.
- Software fallback now feeds the same shaders (swscale -> NV12/P010 planes
  -> two dynamic plane textures); the ps_rgba/X2BGR10 path is deleted, so
  hw and sw colour math are identical.
- Adapter selection on hybrid-GPU machines, in priority order:
  PUNKTFUNK_ADAPTER, then the adapter driving the window's monitor, then
  the default. PUNKTFUNK_D3D_DEBUG=1 enables the D3D debug layer.
- Session pump: request_keyframe at session start and on hw->sw demotion
  (with an infinite GOP the host sends no new IDR on its own, so the fresh
  decoder would otherwise sit on a black screen).
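The Decode bullet's get_format rule can be sketched in safe Rust. In libavcodec the callback receives a C array of offered pixel formats terminated by AV_PIX_FMT_NONE; here a hypothetical `PixFmt` enum and slice stand in for it, with `PixFmt::NoFmt` playing the role of the terminator. The sketch only picks the format; it does not touch hw_frames_ctx, matching the commit's "let libavcodec build the pool" approach. What the real client does when D3D11 is not offered is not stated, so returning the terminator here is an assumption.

```rust
/// Hypothetical stand-in for libavcodec's AVPixelFormat values.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PixFmt {
    NoFmt, // plays the role of AV_PIX_FMT_NONE, the list terminator
    Yuv420p,
    Nv12,
    D3d11,
}

/// Accept only the D3D11 hardware format from the offered list (read up to the terminator);
/// anything else yields `NoFmt`. No frames context is built here: the pool comes from the
/// device context libavcodec already holds.
pub fn get_format(offered: &[PixFmt]) -> PixFmt {
    offered
        .iter()
        .copied()
        .take_while(|&f| f != PixFmt::NoFmt)
        .find(|&f| f == PixFmt::D3d11)
        .unwrap_or(PixFmt::NoFmt)
}
```

Because the DXVA profile probe has already committed to hardware before the decoder opens, this callback is expected to find D3D11 in the list.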
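After the copy, the presenter draws the frame Contain-fit: it scales uniformly by the smaller axis ratio, centres the result, and leaves the rest at the clear colour. The diff only shows the start of that viewport code, so the following is a minimal sketch of the same arithmetic under that description. `Viewport` is a hypothetical type mirroring the position and size fields of `D3D11_VIEWPORT`.

```rust
/// Hypothetical viewport rectangle in swapchain pixels (mirrors D3D11_VIEWPORT's
/// TopLeftX / TopLeftY / Width / Height).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Viewport {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

/// Contain-fit: scale the source by the smaller of the two axis ratios so the whole frame
/// fits, then centre it. The uncovered bands (letterbox or pillarbox) keep the clear colour.
pub fn contain_fit(src_w: u32, src_h: u32, panel_w: u32, panel_h: u32) -> Viewport {
    let (sw, sh) = (src_w.max(1) as f32, src_h.max(1) as f32);
    let (pw, ph) = (panel_w.max(1) as f32, panel_h.max(1) as f32);
    let scale = (pw / sw).min(ph / sh);
    let (w, h) = (sw * scale, sh * scale);
    Viewport {
        x: (pw - w) / 2.0,
        y: (ph - h) / 2.0,
        w,
        h,
    }
}
```

For a 1920×1080 stream in a 1920×1200 panel this gives a 60 px band above and below the picture.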
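The render thread's newest-wins drain works as follows. Once the latency waitable signals a free present slot, everything queued is pulled and only the last item is kept, so a stream faster than the display drops frames before any GPU work. The commit uses a crossbeam channel. This sketch swaps in `std::sync::mpsc`, whose `try_recv`/`TryRecvError` shape is similar, to stay dependency-free. `drain_newest` is a hypothetical helper name.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Pull everything currently queued and keep only the newest item. Older items are dropped
/// on the spot (for decoder frames, dropping returns their surface to the pool).
/// `Ok(None)` means nothing was queued; `Err(Disconnected)` tells the loop to exit.
pub fn drain_newest<T>(rx: &Receiver<T>) -> Result<Option<T>, TryRecvError> {
    let mut newest = None;
    loop {
        match rx.try_recv() {
            Ok(item) => newest = Some(item), // replaces (and drops) the previous one
            Err(TryRecvError::Empty) => return Ok(newest),
            Err(TryRecvError::Disconnected) => {
                // Still present whatever arrived before the sender went away.
                return if newest.is_some() {
                    Ok(newest)
                } else {
                    Err(TryRecvError::Disconnected)
                };
            }
        }
    }
}
```

In a loop the call sits right after the wait (`wait_present_slot` in the presenter diff) and before `present`, so the wait never holds a stale frame.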
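The HiDPI fix rests on two related numbers: the back buffers are sized at DIPs × DPI/96, and the swapchain matrix scales by the inverse, 96/DPI, so XAML maps them 1:1 onto physical pixels instead of stretching a DIP-sized buffer. The hypothetical helper below computes both. It assumes the panel size arrives in DIPs; the commit passes physical sizes into `Presenter::new`/`resize`, so where this conversion happens is not shown. The clamp to 96 mirrors `dpi.max(96)` in the presenter.

```rust
/// Hypothetical helper: physical-pixel buffer size for a panel measured in DIPs at `dpi`,
/// plus the uniform scale (96 / DPI) for the swapchain's 3x2 matrix (_11 and _22).
pub fn physical_size_and_scale(dip_w: f32, dip_h: f32, dpi: u32) -> (u32, u32, f32) {
    let dpi = dpi.max(96); // same floor the presenter applies
    let px_per_dip = dpi as f32 / 96.0;
    let w = (dip_w * px_per_dip).round().max(1.0) as u32;
    let h = (dip_h * px_per_dip).round().max(1.0) as u32;
    (w, h, 96.0 / dpi as f32)
}
```

At 125 % scaling (120 DPI), a 1280×720 DIP panel gets 1600×900 buffers and a 0.8 matrix scale.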
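For the software fallback, the two plane textures follow NV12/P010 geometry. Luma is full size. Chroma is half size in each axis, rounded up, with interleaved U/V pairs. Each sample is 1 byte for NV12 and 2 bytes for P010. The sketch below restates that sizing from the `upload` code in the diff and adds a slice-based version of its pitched row copy. The real `map_rows` copies with raw pointers into the mapped texture; this version uses bounds-checked slices, so a short source buffer panics instead of reading out of bounds. Both function names are illustrative.

```rust
/// Per-plane (width, height, packed row bytes) for an NV12 (8-bit) or P010 (10-in-16-bit) frame.
pub fn plane_geometry(w: u32, h: u32, ten_bit: bool) -> [(u32, u32, usize); 2] {
    let bytes = if ten_bit { 2 } else { 1 };
    let (cw, ch) = (w.div_ceil(2), h.div_ceil(2));
    [
        (w, h, w as usize * bytes),        // luma: one sample per pixel
        (cw, ch, cw as usize * 2 * bytes), // chroma: U,V pair per 2x2 block
    ]
}

/// Copy `rows` rows of `row_bytes` from `src` (stride `src_pitch`) into `dst`
/// (stride `dst_pitch`, e.g. a mapped texture's RowPitch). Slice indexing bounds-checks
/// every row.
pub fn copy_rows(
    src: &[u8],
    src_pitch: usize,
    dst: &mut [u8],
    dst_pitch: usize,
    row_bytes: usize,
    rows: usize,
) {
    let n = row_bytes.min(src_pitch).min(dst_pitch);
    for r in 0..rows {
        let s = &src[r * src_pitch..r * src_pitch + n];
        dst[r * dst_pitch..r * dst_pitch + n].copy_from_slice(s);
    }
}
```

The rounded-up chroma size is what keeps odd-sized frames (e.g. 1921×1081) from losing their last chroma column and row.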
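Hardware and software frames share one P010 shader, and that shader's ×65535/65472 rescale can be checked numerically. P010 stores a 10-bit code `v` in the high bits of a 16-bit word (`v << 6`). An R16_UNORM view therefore reads `(v << 6) / 65535`. Since 65472 = 1023 × 64, multiplying by 65535/65472 gives exactly `v / 1023`, the value a true 10-bit UNORM read would produce. The function below is an illustrative CPU model of that one step, not code from the commit.

```rust
/// CPU model of the shader's P010 luma fix-up for a 10-bit code `v` (0..=1023).
pub fn p010_sample_to_unit(v: u16) -> f64 {
    assert!(v <= 1023, "P010 carries 10-bit codes");
    let word = v << 6; // how the code sits in the 16-bit P010 word
    let unorm = word as f64 / 65535.0; // what an R16_UNORM SRV returns
    unorm * (65535.0 / 65472.0) // the rescale; 65472 = 1023 * 64
}
```

Without the rescale, peak code 1023 would sample as 65472/65535 ≈ 0.99904 instead of 1.0. The limited-to-full-range scaling that follows is folded into the YUV->RGB matrix.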
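The adapter priority chain (override, then the monitor's adapter, then the default) can be sketched as plain selection logic. The commit does not say what format PUNKTFUNK_ADAPTER takes. This sketch assumes it is either an adapter index or a case-insensitive substring of the adapter name. `AdapterInfo` is a hypothetical record that real code would fill from DXGI enumeration plus a window-to-monitor lookup. An override that matches nothing falls through to the next rule.

```rust
/// Hypothetical per-adapter record (real code would build it from DXGI enumeration).
pub struct AdapterInfo {
    pub index: usize,
    pub name: String,
    pub drives_window_monitor: bool,
}

/// Priority: env override (index or name substring) > adapter driving the window's
/// monitor > first enumerated adapter (DXGI's default). `None` only for an empty list.
pub fn pick_adapter(adapters: &[AdapterInfo], env_override: Option<&str>) -> Option<usize> {
    if let Some(want) = env_override.map(str::trim).filter(|s| !s.is_empty()) {
        let hit = match want.parse::<usize>() {
            Ok(i) => adapters.iter().find(|a| a.index == i),
            Err(_) => {
                let want_lc = want.to_lowercase();
                adapters
                    .iter()
                    .find(|a| a.name.to_lowercase().contains(&want_lc))
            }
        };
        if let Some(a) = hit {
            return Some(a.index);
        }
    }
    if let Some(a) = adapters.iter().find(|a| a.drives_window_monitor) {
        return Some(a.index);
    }
    adapters.first().map(|a| a.index)
}
```

On a hybrid laptop this keeps decode and present on the GPU that scans out the window unless the user says otherwise, avoiding a cross-adapter copy.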
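The session-pump rule fits in a small state machine. With an infinite GOP no IDR arrives unprompted, so any freshly created decoder with no reference frames has to ask for one: the first decoder of a session, and the software decoder after a hw->sw demotion. `Pump` and `Backend` are hypothetical names, and the counter stands in for sending `request_keyframe` to the host.

```rust
/// Hypothetical decoder backends.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Backend {
    Hardware,
    Software,
}

/// Hypothetical pump state: which backend is live and how many keyframes were requested.
#[derive(Default)]
pub struct Pump {
    backend: Option<Backend>,
    pub keyframe_requests: u32,
}

impl Pump {
    /// Call whenever a decoder is (re)created. Returns true when a keyframe request should go
    /// to the host: at session start and on a hw->sw demotion (the new decoder has no
    /// reference frames, and an infinite GOP never sends an IDR by itself).
    pub fn on_decoder(&mut self, b: Backend) -> bool {
        let need = match (self.backend, b) {
            (None, _) => true,
            (Some(Backend::Hardware), Backend::Software) => true,
            _ => false,
        };
        self.backend = Some(b);
        if need {
            self.keyframe_requests += 1;
        }
        need
    }
}
```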

Validated live on the Arc Pro + RTX 3500 Ada laptop against the local
Windows host: 60 fps D3D11VA on both vendors, plus the software path and
the GUI on glass.

Also: embedded app icon (build.rs winresource + WM_SETICON; MSIX
Square44x44 targetsize assets, staged by pack-msix) and the hosts-page
tile rework: tap-to-connect tiles with a sibling overflow menu (fixes
Forget also triggering a connect), an in-tile rename editor, and an
add-host modal driven by root state.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 16:24:23 +02:00
parent 2c416a4bff
commit a4c84ac620
36 changed files with 1797 additions and 581 deletions
+364 -190
@@ -1,17 +1,29 @@
//! Direct3D11 presenter for a WinUI 3 `SwapChainPanel`. It draws a decoded frame Contain-fit into a
//! **composition** flip-model swapchain, which the reactor stream page binds to the panel via
//! `SwapChainPanelHandle::set_swap_chain`.
//! `SwapChainPanelHandle::set_swap_chain`. After that one UI-thread bind, the presenter lives on
//! the dedicated render thread ([`crate::render`]) — presenting never touches (or is stalled by)
//! the XAML thread.
//!
//! Two frame sources, one swapchain:
//! Two frame sources, one pair of YUV shaders (identical colour math for both):
//!
//! * **GPU (zero-copy)** — [`crate::video::GpuFrame`] is a decoder-owned NV12/P010 `ID3D11Texture2D`
//! array slice (D3D11VA). We create per-plane shader-resource views over the slice and convert
//! YUV→RGB in a pixel shader: NV12 via BT.709 (`ps_nv12`), P010 via BT.2020 with the PQ transfer
//! left intact (`ps_p010`). No CPU copy. The decoder uses the **same** shared device
//! ([`crate::gpu`]) so the texture is bindable here.
//! * **CPU upload** — [`crate::video::CpuFrame`] is packed RGBA (SDR) or X2BGR10 (HDR) from the
//! software decoder; we upload it into a dynamic texture and draw it with a passthrough shader
//! (`ps_rgba`). The fallback path.
//! * **GPU (D3D11VA)** — [`crate::video::GpuFrame`] is a slice of the decoder-only NV12/P010
//! texture array. One `CopySubresourceRegion` with a display-size box moves the slice — **both
//! planes; in D3D11 a planar slice is a single subresource** (unlike D3D12) — into our
//! sampleable texture, which per-plane SRVs (R8/R8G8, R16/R16G16) expose to the shaders. The
//! source box is mandatory: the decode array is coded-size (e.g. 1920×1088), the target
//! display-size (1920×1080), and D3D11 silently drops size-mismatched full-resource copies.
//! * **CPU upload** — [`crate::video::CpuFrame`] carries NV12/P010 planes from the software
//! decoder; they upload into two dynamic plane textures feeding the same SRV slots/shaders.
//!
//! **Pacing**: the swapchain is created with `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT`
//! and `SetMaximumFrameLatency(1)` (flagless fallback for odd drivers). The render thread waits
//! on the latency waitable before drawing, so at most one present is ever queued (minimum compose
//! latency) and a stream faster than the display drops frames *before* any GPU work. Every
//! `ResizeBuffers` must re-pass the creation flags — that's `swap_flags`.
//!
//! **HiDPI**: buffers are sized in physical pixels and `IDXGISwapChain2::SetMatrixTransform`
//! (scale 96/DPI) maps them to the panel's DIP coordinate space — without it XAML samples a
//! DIP-sized buffer up and the video is blurry at 125/150 % scaling.
//!
//! **HDR10**: when a frame is BT.2020 PQ the swapchain flips to `R10G10B10A2` +
//! `DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020` (+ HDR10 metadata) via `ResizeBuffers`/
@@ -21,21 +33,23 @@
//! All `windows` types here come from the same windows-rs commit as `windows-reactor`, so the
//! `IDXGISwapChain1` handed to `set_swap_chain` satisfies reactor's `windows_core::Interface`.
use crate::video::{DecodedFrame, GpuFrame};
use crate::video::{CpuFrame, DecodedFrame, GpuFrame};
use anyhow::{anyhow, Context, Result};
use windows::core::{Interface, PCSTR};
use windows::Win32::Foundation::{CloseHandle, HANDLE, WAIT_OBJECT_0};
use windows::Win32::Graphics::Direct3D::Fxc::{D3DCompile, D3DCOMPILE_OPTIMIZATION_LEVEL3};
use windows::Win32::Graphics::Direct3D::{
ID3DBlob, D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST, D3D_SRV_DIMENSION_TEXTURE2DARRAY,
ID3DBlob, D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST, D3D_SRV_DIMENSION_TEXTURE2D,
};
use windows::Win32::Graphics::Direct3D11::*;
use windows::Win32::Graphics::Dxgi::Common::*;
use windows::Win32::Graphics::Dxgi::*;
use windows::Win32::System::Threading::WaitForSingleObject;
// One vertex shader (fullscreen triangle) + three pixel shaders, selected per frame source. tex0 is
// RGBA (passthrough) or the luma plane; tex1 is the chroma plane. The YUV→RGB matrices fold the
// limited→full range scale into the coefficients; for P010 the R16 sample is rescaled (×65535/65472)
// to undo the 10-bits-in-the-high-bits packing, then converted with BT.2020 NCL, PQ preserved.
// One vertex shader (fullscreen triangle) + two pixel shaders, selected per frame colour space.
// tex0 is the luma plane, tex1 the chroma plane. The YUV→RGB matrices fold the limited→full range
// scale into the coefficients; for P010 the R16 sample is rescaled (×65535/65472) to undo the
// 10-bits-in-the-high-bits packing, then converted with BT.2020 NCL, PQ preserved.
const SHADER_HLSL: &str = r#"
struct VSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; };
VSOut vs_main(uint vid : SV_VertexID) {
@@ -49,8 +63,6 @@ Texture2D tex0 : register(t0);
Texture2D tex1 : register(t1);
SamplerState smp : register(s0);
float4 ps_rgba(VSOut i) : SV_Target { return tex0.Sample(smp, i.uv); }
float4 ps_nv12(VSOut i) : SV_Target {
float y = tex0.Sample(smp, i.uv).r;
float2 uv = tex1.Sample(smp, i.uv).rg;
@@ -77,46 +89,53 @@ float4 ps_p010(VSOut i) : SV_Target {
}
"#;
/// A bound GPU frame: per-plane SRVs over the decoder's texture-array slice, plus the `GpuFrame`
/// itself kept alive so the decoder won't recycle the slice while we re-present it.
struct GpuView {
/// The currently bound frame: per-plane SRVs (over the GPU sample texture or the CPU plane
/// textures) + the colour space that picks the shader. Redraws (resize, letterbox) re-present it.
struct Bound {
y: ID3D11ShaderResourceView,
c: ID3D11ShaderResourceView,
/// Held only for its `Drop` (returns the decoder surface to the reuse pool) — never read.
#[allow(dead_code)]
frame: GpuFrame,
}
/// Current draw source.
#[derive(Clone, Copy, PartialEq)]
enum Mode {
Empty,
Rgba,
Nv12,
P010,
hdr: bool,
}
pub struct Presenter {
device: ID3D11Device,
context: ID3D11DeviceContext,
vs: ID3D11VertexShader,
ps_rgba: ID3D11PixelShader,
ps_nv12: ID3D11PixelShader,
ps_p010: ID3D11PixelShader,
sampler: ID3D11SamplerState,
swap: IDXGISwapChain1,
/// Creation flags — MUST be re-passed to every `ResizeBuffers` or it fails.
swap_flags: u32,
/// The frame-latency waitable (owned; closed in `Drop`), `None` on the flagless fallback.
waitable: Option<HANDLE>,
rtv: Option<ID3D11RenderTargetView>,
/// CPU-upload texture + SRV + dimensions; recreated when the decoded size/format changes.
cpu_tex: Option<(ID3D11Texture2D, ID3D11ShaderResourceView, u32, u32)>,
/// Bound zero-copy GPU frame (held to keep its decoder surface alive).
gpu: Option<GpuView>,
mode: Mode,
/// GPU path: sampleable copy target for the decoded slice — `(tex, w, h, ten_bit)`, recreated
/// when the decoded size/bit depth changes. Format must equal the decode array's (NV12/P010).
sample_tex: Option<(ID3D11Texture2D, u32, u32, bool)>,
/// The last GPU frame, held until the NEXT bind so its decode surface stays out of the reuse
/// pool at least until this frame's copy has been queued ahead of any later decoder write.
gpu_frame: Option<GpuFrame>,
/// CPU path: dynamic luma + chroma plane textures + their SRVs — `(y, uv, y_srv, uv_srv, w, h,
/// ten_bit)`, recreated when the decoded size/bit depth changes.
#[allow(clippy::type_complexity)]
plane_tex: Option<(
ID3D11Texture2D,
ID3D11Texture2D,
ID3D11ShaderResourceView,
ID3D11ShaderResourceView,
u32,
u32,
bool,
)>,
bound: Option<Bound>,
/// Source frame dimensions, for the Contain-fit letterbox.
src_w: u32,
src_h: u32,
/// Panel (swapchain) size in pixels, updated on resize.
/// Panel (swapchain) size in physical pixels + the window DPI, updated on resize.
panel_w: u32,
panel_h: u32,
dpi: u32,
/// Whether the swapchain is currently in 10-bit HDR10 (R10G10B10A2 + ST.2084) mode.
hdr: bool,
/// The source's static HDR mastering metadata received over the protocol (`0xCE`), applied via
@@ -126,45 +145,71 @@ pub struct Presenter {
}
/// Latest source HDR mastering metadata, written by the session pump (`session.rs`, the sole
/// `next_hdr_meta` consumer) and read by `present_newest` on the UI thread — decoupled so the
/// `next_hdr_meta` consumer) and read by the render thread before each present — decoupled so the
/// presenter doesn't need the connector. One session at a time on the client, so a single slot.
pub static LATEST_HDR_META: std::sync::Mutex<Option<punktfunk_core::quic::HdrMeta>> =
std::sync::Mutex::new(None);
impl Presenter {
/// Create the presenter on the process-wide shared D3D11 device (the one the decoder uses), plus
/// the composition swapchain + shaders, sized to the panel.
pub fn new(width: u32, height: u32) -> Result<Presenter> {
/// the composition swapchain + shaders, sized to the panel in physical pixels at `dpi`.
pub fn new(width: u32, height: u32, dpi: u32) -> Result<Presenter> {
let shared = crate::gpu::shared().ok_or_else(|| anyhow!("no shared D3D11 device"))?;
let device = shared.device.clone();
let context = shared.context.clone();
let (vs, ps_rgba, ps_nv12, ps_p010, sampler) = build_pipeline(&device)?;
let swap = create_composition_swapchain(&device, width.max(1), height.max(1))?;
Ok(Presenter {
let (vs, ps_nv12, ps_p010, sampler) = build_pipeline(&device)?;
let (swap, swap_flags) =
create_composition_swapchain(&device, width.max(1), height.max(1))?;
// ≤1 queued present: the render thread blocks on the waitable, so a frame is only drawn
// when the compositor is ready to take it — the newest-wins drain happens after the wait.
let waitable = (swap_flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT.0 as u32
!= 0)
.then(|| unsafe {
let sc2: IDXGISwapChain2 = swap.cast().ok()?;
sc2.SetMaximumFrameLatency(1).ok()?;
let h = sc2.GetFrameLatencyWaitableObject();
(!h.is_invalid()).then_some(h)
})
.flatten();
let p = Presenter {
device,
context,
vs,
ps_rgba,
ps_nv12,
ps_p010,
sampler,
swap,
swap_flags,
waitable,
rtv: None,
cpu_tex: None,
gpu: None,
mode: Mode::Empty,
sample_tex: None,
gpu_frame: None,
plane_tex: None,
bound: None,
src_w: 1,
src_h: 1,
panel_w: width.max(1),
panel_h: height.max(1),
dpi: dpi.max(96),
hdr: false,
hdr_meta: None,
})
};
p.apply_dpi_matrix();
Ok(p)
}
/// Block until the swapchain can take another present (≤ `timeout_ms`). True when a present
/// slot is free; also true on the flagless fallback (no throttle available, just present).
pub fn wait_present_slot(&self, timeout_ms: u32) -> bool {
match self.waitable {
Some(h) => unsafe { WaitForSingleObject(h, timeout_ms) == WAIT_OBJECT_0 },
None => true,
}
}
/// Update the source HDR mastering metadata (from the `0xCE` plane). Stored for the next HDR
/// swapchain switch, and applied immediately if already presenting HDR. A no-op when unchanged
/// (so it's cheap to call every frame from the present loop).
/// (so it's cheap to call every frame from the render loop).
pub fn set_hdr_metadata(&mut self, meta: punktfunk_core::quic::HdrMeta) {
if self.hdr_meta == Some(meta) {
return;
@@ -180,28 +225,54 @@ impl Presenter {
&self.swap
}
/// Resize the back buffers to the panel's new size (drops the stale RTV).
pub fn resize(&mut self, width: u32, height: u32) {
if width == 0 || height == 0 || (width == self.panel_w && height == self.panel_h) {
/// Resize the back buffers to the panel's new size in physical pixels at `dpi` (drops the
/// stale RTV, re-applies the DIP↔pixel matrix).
pub fn resize(&mut self, width: u32, height: u32, dpi: u32) {
let dpi = dpi.max(96);
if width == 0
|| height == 0
|| (width == self.panel_w && height == self.panel_h && dpi == self.dpi)
{
return;
}
self.rtv = None; // release all back-buffer refs before ResizeBuffers
unsafe {
let _ = self.swap.ResizeBuffers(
if let Err(e) = self.swap.ResizeBuffers(
0,
width,
height,
DXGI_FORMAT_UNKNOWN,
DXGI_SWAP_CHAIN_FLAG(0),
);
DXGI_SWAP_CHAIN_FLAG(self.swap_flags as i32),
) {
tracing::warn!(error = %e, "ResizeBuffers failed");
return;
}
}
self.panel_w = width;
self.panel_h = height;
self.dpi = dpi;
self.apply_dpi_matrix();
}
/// Present one decoded frame (Contain-fit) — or, when `frame` is `None`, re-present the last one
/// (or black). Called from the reactor `on_rendering` per-frame callback on the UI thread. Takes
/// the frame by value so the GPU path can retain the decoder surface across re-presents.
/// Map the pixel-sized buffers into the panel's DIP coordinate space (scale 96/DPI) — XAML
/// otherwise stretches whatever size the buffers are to the panel's DIP bounds (blurry).
fn apply_dpi_matrix(&self) {
let s = 96.0 / self.dpi as f32;
if let Ok(sc2) = self.swap.cast::<IDXGISwapChain2>() {
let m = DXGI_MATRIX_3X2_F {
_11: s,
_22: s,
..Default::default()
};
if let Err(e) = unsafe { sc2.SetMatrixTransform(&m) } {
tracing::warn!(error = %e, "SetMatrixTransform failed");
}
}
}
/// Present one decoded frame (Contain-fit) — or, when `frame` is `None`, re-present the last
/// one (or black). Called from the render thread. Takes the frame by value: the GPU path
/// retains the decoder surface until the next bind.
pub fn present(&mut self, frame: Option<DecodedFrame>) {
match frame {
Some(DecodedFrame::Cpu(c)) => {
@@ -210,20 +281,14 @@ impl Presenter {
}
if let Err(e) = self.upload(&c) {
tracing::warn!(error = %e, "frame upload failed");
} else {
self.mode = Mode::Rgba;
self.src_w = c.width;
self.src_h = c.height;
self.gpu = None; // drop any held GPU frame
}
}
Some(DecodedFrame::Gpu(g)) => {
if g.hdr != self.hdr {
self.set_hdr(g.hdr);
}
match self.bind_gpu(g) {
Ok(()) => {}
Err(e) => tracing::warn!(error = %e, "GPU frame bind failed"),
if let Err(e) = self.bind_gpu(g) {
tracing::warn!(error = %e, "GPU frame bind failed");
}
}
None => {}
@@ -231,46 +296,102 @@ impl Presenter {
self.draw();
}
/// Build per-plane SRVs over the decoded texture-array slice and retain the frame.
/// Copy the decoded slice into our sampleable texture and build per-plane SRVs over it. The
/// decode array is decoder-only (NVIDIA won't bind a decoder array as a shader resource), so
/// it can't be sampled directly — one GPU-to-GPU copy makes the frame sampleable on every
/// vendor. D3D11 planar semantics: the slice is ONE subresource (both planes copy together),
/// and the source box is display-size (the array is coded-size; a full-resource copy would
/// size-mismatch and be silently dropped).
fn bind_gpu(&mut self, g: GpuFrame) -> Result<()> {
let tex: ID3D11Texture2D = unsafe {
let src: ID3D11Texture2D = unsafe {
let raw = g.texture_ptr();
ID3D11Texture2D::from_raw_borrowed(&raw)
.ok_or_else(|| anyhow!("null D3D11 texture"))?
.clone()
};
// NV12: R8 luma + R8G8 chroma. P010: R16 luma + R16G16 chroma (10 bits in the high bits).
let (fy, fc) = if g.hdr {
(DXGI_FORMAT_R16_UNORM, DXGI_FORMAT_R16G16_UNORM)
} else {
(DXGI_FORMAT_R8_UNORM, DXGI_FORMAT_R8G8_UNORM)
self.ensure_sample_tex(g.width, g.height, g.ten_bit)?;
let dst = self.sample_tex.as_ref().unwrap().0.clone();
// Even-aligned luma coordinates (NV12/P010 chroma is 2×2 subsampled).
let src_box = D3D11_BOX {
left: 0,
top: 0,
front: 0,
right: g.width & !1,
bottom: g.height & !1,
back: 1,
};
let y = self.array_srv(&tex, fy, g.index)?;
let c = self.array_srv(&tex, fc, g.index)?;
self.mode = if g.hdr { Mode::P010 } else { Mode::Nv12 };
unsafe {
self.context
.CopySubresourceRegion(&dst, 0, 0, 0, 0, &src, g.index, Some(&src_box));
}
let (fy, fc) = plane_formats(g.ten_bit);
let y = self.plane_srv(&dst, fy)?;
let c = self.plane_srv(&dst, fc)?;
if g.ten_bit != g.hdr {
warn_bitdepth_mismatch_once(g.ten_bit, g.hdr);
}
self.src_w = g.width;
self.src_h = g.height;
self.gpu = Some(GpuView { y, c, frame: g });
self.bound = Some(Bound { y, c, hdr: g.hdr });
// Hold the frame until the next bind: its decode surface stays out of the reuse pool
// until this copy is queued ahead of any later decoder write (previous frame drops here).
self.gpu_frame = Some(g);
Ok(())
}
/// A shader-resource view over a single slice of a texture array, reinterpreting the plane
/// format (the NV12/P010 sub-format trick D3D11 allows on video textures).
fn array_srv(
/// Ensure the sampleable copy texture matches the decoded frame's size + bit depth (NV12 for
/// 8-bit, P010 for 10-bit — the same format as the decode array, a `CopySubresourceRegion`
/// requirement), recreating it on a change.
fn ensure_sample_tex(&mut self, w: u32, h: u32, ten_bit: bool) -> Result<()> {
if matches!(&self.sample_tex, Some((_, tw, th, tb)) if *tw == w && *th == h && *tb == ten_bit)
{
return Ok(());
}
let desc = D3D11_TEXTURE2D_DESC {
Width: w,
Height: h,
MipLevels: 1,
ArraySize: 1,
Format: if ten_bit {
DXGI_FORMAT_P010
} else {
DXGI_FORMAT_NV12
},
SampleDesc: DXGI_SAMPLE_DESC {
Count: 1,
Quality: 0,
},
Usage: D3D11_USAGE_DEFAULT,
BindFlags: D3D11_BIND_SHADER_RESOURCE.0 as u32,
CPUAccessFlags: 0,
MiscFlags: 0,
};
let tex = unsafe {
let mut t = None;
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D (sample target)")?;
t.ok_or_else(|| anyhow!("null sample texture"))?
};
self.sample_tex = Some((tex, w, h, ten_bit));
Ok(())
}
/// A shader-resource view over one plane of a single (non-array) NV12/P010 texture — the
/// R8/R8G8 (or R16/R16G16) format selects the luma vs. chroma plane (the D3D11 video
/// sub-format trick).
fn plane_srv(
&self,
tex: &ID3D11Texture2D,
format: DXGI_FORMAT,
slice: u32,
) -> Result<ID3D11ShaderResourceView> {
let desc = D3D11_SHADER_RESOURCE_VIEW_DESC {
Format: format,
ViewDimension: D3D_SRV_DIMENSION_TEXTURE2DARRAY,
ViewDimension: D3D_SRV_DIMENSION_TEXTURE2D,
Anonymous: D3D11_SHADER_RESOURCE_VIEW_DESC_0 {
Texture2DArray: D3D11_TEX2D_ARRAY_SRV {
Texture2D: D3D11_TEX2D_SRV {
MostDetailedMip: 0,
MipLevels: 1,
FirstArraySlice: slice,
ArraySize: 1,
},
},
};
@@ -278,37 +399,109 @@ impl Presenter {
let mut srv = None;
self.device
.CreateShaderResourceView(tex, Some(&desc), Some(&mut srv))
.context("CreateShaderResourceView (array slice)")?;
.context("CreateShaderResourceView (plane)")?;
srv.ok_or_else(|| anyhow!("null SRV"))
}
}
/// Upload a software-decoded frame's two planes into the dynamic plane textures (created to
/// match size/bit depth), feeding the same SRV slots + shaders as the GPU path.
fn upload(&mut self, frame: &CpuFrame) -> Result<()> {
let (w, h) = (frame.width, frame.height);
let rebuild = !matches!(&self.plane_tex,
Some((.., tw, th, tb)) if *tw == w && *th == h && *tb == frame.ten_bit);
if rebuild {
let (fy, fc) = plane_formats(frame.ten_bit);
let y = self.dynamic_tex(w, h, fy)?;
let uv = self.dynamic_tex(w.div_ceil(2), h.div_ceil(2), fc)?;
let y_srv = self.plane_srv(&y, fy)?;
let uv_srv = self.plane_srv(&uv, fc)?;
self.plane_tex = Some((y, uv, y_srv, uv_srv, w, h, frame.ten_bit));
}
let (y, uv, y_srv, uv_srv, ..) = self.plane_tex.as_ref().unwrap();
let bytes = if frame.ten_bit { 2 } else { 1 };
self.map_rows(y, &frame.y, frame.y_stride, w as usize * bytes, h as usize)?;
self.map_rows(
uv,
&frame.uv,
frame.uv_stride,
w.div_ceil(2) as usize * 2 * bytes,
h.div_ceil(2) as usize,
)?;
self.src_w = w;
self.src_h = h;
self.bound = Some(Bound {
y: y_srv.clone(),
c: uv_srv.clone(),
hdr: frame.hdr,
});
self.gpu_frame = None; // drop any held GPU frame
Ok(())
}
fn dynamic_tex(&self, w: u32, h: u32, format: DXGI_FORMAT) -> Result<ID3D11Texture2D> {
let desc = D3D11_TEXTURE2D_DESC {
Width: w,
Height: h,
MipLevels: 1,
ArraySize: 1,
Format: format,
SampleDesc: DXGI_SAMPLE_DESC {
Count: 1,
Quality: 0,
},
Usage: D3D11_USAGE_DYNAMIC,
BindFlags: D3D11_BIND_SHADER_RESOURCE.0 as u32,
CPUAccessFlags: D3D11_CPU_ACCESS_WRITE.0 as u32,
MiscFlags: 0,
};
unsafe {
let mut t = None;
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D (plane)")?;
t.ok_or_else(|| anyhow!("null plane texture"))
}
}
/// Map-discard `tex` and copy `rows` rows of `row_bytes` from `src` (stride `src_pitch`).
fn map_rows(
&self,
tex: &ID3D11Texture2D,
src: &[u8],
src_pitch: usize,
row_bytes: usize,
rows: usize,
) -> Result<()> {
unsafe {
let mut mapped = D3D11_MAPPED_SUBRESOURCE::default();
self.context
.Map(tex, 0, D3D11_MAP_WRITE_DISCARD, 0, Some(&mut mapped))
.context("Map plane texture")?;
let dst = mapped.pData as *mut u8;
let dst_pitch = mapped.RowPitch as usize;
let n = row_bytes.min(src_pitch);
for r in 0..rows {
std::ptr::copy_nonoverlapping(
src.as_ptr().add(r * src_pitch),
dst.add(r * dst_pitch),
n,
);
}
self.context.Unmap(tex, 0);
}
Ok(())
}
fn draw(&mut self) {
let Ok(rtv) = self.rtv() else {
return;
};
let (pw, ph) = (self.panel_w, self.panel_h);
// Resolve the current source's shader + the (up to two) SRVs to bind — cheap interface
// clones. Each arm yields `Option<(&pixel_shader, [Option<SRV>; 2])>`.
let binding = match self.mode {
Mode::Rgba => self
.cpu_tex
.as_ref()
.map(|(_, srv, _, _)| (&self.ps_rgba, [Some(srv.clone()), None])),
Mode::Nv12 => self
.gpu
.as_ref()
.map(|g| (&self.ps_nv12, [Some(g.y.clone()), Some(g.c.clone())])),
Mode::P010 => self
.gpu
.as_ref()
.map(|g| (&self.ps_p010, [Some(g.y.clone()), Some(g.c.clone())])),
Mode::Empty => None,
};
unsafe {
let c = &self.context;
c.ClearRenderTargetView(&rtv, &[0.0, 0.0, 0.0, 1.0]);
if let Some((ps, srvs)) = binding {
if let Some(bound) = &self.bound {
// Contain-fit viewport: scale to the smaller axis, centre, letterbox the rest.
let (ww, wh, vfw, vfh) = (
pw as f32,
@@ -332,8 +525,15 @@ impl Presenter {
c.IASetInputLayout(None);
c.IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
c.VSSetShader(&self.vs, None);
c.PSSetShader(ps, None);
c.PSSetShaderResources(0, Some(&srvs));
c.PSSetShader(
if bound.hdr {
&self.ps_p010
} else {
&self.ps_nv12
},
None,
);
c.PSSetShaderResources(0, Some(&[Some(bound.y.clone()), Some(bound.c.clone())]));
c.PSSetSamplers(0, Some(&[Some(self.sampler.clone())]));
c.Draw(3, 0);
}
@@ -347,7 +547,6 @@ impl Presenter {
/// PQ-encoded BT.2020 for HDR, so the colour space is all the compositor needs.
fn set_hdr(&mut self, on: bool) {
self.rtv = None; // release back-buffer refs before ResizeBuffers
self.cpu_tex = None; // CPU texture format changes (R10G10B10A2 vs R8G8B8A8)
let format = if on {
DXGI_FORMAT_R10G10B10A2_UNORM
} else {
@@ -359,7 +558,7 @@ impl Presenter {
self.panel_w,
self.panel_h,
format,
DXGI_SWAP_CHAIN_FLAG(0),
DXGI_SWAP_CHAIN_FLAG(self.swap_flags as i32),
) {
tracing::warn!(error = %e, "ResizeBuffers for HDR switch failed");
return;
@@ -389,6 +588,7 @@ impl Presenter {
self.apply_hdr_metadata();
}
}
self.apply_dpi_matrix(); // belt-and-braces: keep the DIP mapping across the format switch
tracing::info!(hdr = on, "swapchain colour mode switched");
}
@@ -410,68 +610,6 @@ impl Presenter {
}
}
fn upload(&mut self, frame: &crate::video::CpuFrame) -> Result<()> {
let (w, h) = (frame.width, frame.height);
let need_new = !matches!(&self.cpu_tex, Some((_, _, tw, th)) if *tw == w && *th == h);
if need_new {
let format = if self.hdr {
DXGI_FORMAT_R10G10B10A2_UNORM
} else {
DXGI_FORMAT_R8G8B8A8_UNORM
};
let desc = D3D11_TEXTURE2D_DESC {
Width: w,
Height: h,
MipLevels: 1,
ArraySize: 1,
Format: format,
SampleDesc: DXGI_SAMPLE_DESC {
Count: 1,
Quality: 0,
},
Usage: D3D11_USAGE_DYNAMIC,
BindFlags: D3D11_BIND_SHADER_RESOURCE.0 as u32,
CPUAccessFlags: D3D11_CPU_ACCESS_WRITE.0 as u32,
MiscFlags: 0,
};
let texture = unsafe {
let mut t = None;
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D")?;
t.unwrap()
};
let srv = unsafe {
let mut s = None;
self.device
.CreateShaderResourceView(&texture, None, Some(&mut s))
.context("CreateShaderResourceView")?;
s.unwrap()
};
self.cpu_tex = Some((texture, srv, w, h));
}
let (texture, _, _, _) = self.cpu_tex.as_ref().unwrap();
unsafe {
let mut mapped = D3D11_MAPPED_SUBRESOURCE::default();
self.context
.Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, Some(&mut mapped))
.context("Map video texture")?;
let dst = mapped.pData as *mut u8;
let dst_pitch = mapped.RowPitch as usize;
let src_pitch = frame.stride;
let row_bytes = (w as usize) * 4;
for y in 0..h as usize {
std::ptr::copy_nonoverlapping(
frame.pixels.as_ptr().add(y * src_pitch),
dst.add(y * dst_pitch),
row_bytes.min(src_pitch),
);
}
self.context.Unmap(texture, 0);
}
Ok(())
}
fn rtv(&mut self) -> Result<ID3D11RenderTargetView> {
if self.rtv.is_none() {
let back: ID3D11Texture2D = unsafe { self.swap.GetBuffer(0).context("GetBuffer")? };
@@ -488,18 +626,53 @@ impl Presenter {
}
}
/// A composition flip-model swapchain (no HWND) for binding to a XAML `SwapChainPanel`.
impl Drop for Presenter {
fn drop(&mut self) {
if let Some(h) = self.waitable.take() {
unsafe {
let _ = CloseHandle(h);
}
}
}
}
/// Luma + chroma plane view formats for NV12 (8-bit) vs P010 (10-in-16-bit).
fn plane_formats(ten_bit: bool) -> (DXGI_FORMAT, DXGI_FORMAT) {
if ten_bit {
(DXGI_FORMAT_R16_UNORM, DXGI_FORMAT_R16G16_UNORM)
} else {
(DXGI_FORMAT_R8_UNORM, DXGI_FORMAT_R8G8_UNORM)
}
}
/// The host couples 10-bit ⟺ HDR today; a mismatch means the shader's transfer/matrix assumption
/// is off for this stream (rendered anyway — approximate colour beats no picture).
fn warn_bitdepth_mismatch_once(ten_bit: bool, hdr: bool) {
use std::sync::atomic::{AtomicBool, Ordering};
static ONCE: AtomicBool = AtomicBool::new(true);
if ONCE.swap(false, Ordering::Relaxed) {
tracing::warn!(
ten_bit,
hdr,
"bit depth / HDR mismatch — colour may be approximate"
);
}
}
/// A composition flip-model swapchain (no HWND) for binding to a XAML `SwapChainPanel`, with the
/// frame-latency waitable when the driver allows it. Returns the swapchain + the flags it was
/// created with (every `ResizeBuffers` must re-pass them).
fn create_composition_swapchain(
device: &ID3D11Device,
width: u32,
height: u32,
) -> Result<IDXGISwapChain1> {
) -> Result<(IDXGISwapChain1, u32)> {
let dxdev: IDXGIDevice = device.cast().context("IDXGIDevice cast")?;
let factory: IDXGIFactory2 = unsafe {
let adapter = dxdev.GetAdapter().context("GetAdapter")?;
adapter.GetParent().context("GetParent (IDXGIFactory2)")?
};
let desc = DXGI_SWAP_CHAIN_DESC1 {
let mut desc = DXGI_SWAP_CHAIN_DESC1 {
Width: width,
Height: height,
Format: DXGI_FORMAT_B8G8R8A8_UNORM,
@@ -512,16 +685,24 @@ fn create_composition_swapchain(
BufferCount: 2,
Scaling: DXGI_SCALING_STRETCH,
SwapEffect: DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL,
// IGNORE (opaque), not PREMULTIPLIED: the video fills the panel and the HDR `X2BGR10`
// upload leaves the 2 padding/alpha bits 0 — premultiplied alpha would then make HDR frames
// transparent. Opaque is correct for a full-frame video surface either way.
// IGNORE (opaque), not PREMULTIPLIED: the video fills the panel with opaque RGB either way.
AlphaMode: DXGI_ALPHA_MODE_IGNORE,
Flags: 0,
Flags: DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT.0 as u32,
};
unsafe {
factory
.CreateSwapChainForComposition(device, &desc, None)
.context("CreateSwapChainForComposition")
match factory.CreateSwapChainForComposition(device, &desc, None) {
Ok(sc) => Ok((sc, desc.Flags)),
Err(e) => {
// Odd driver/WARP combinations can reject the waitable — fall back to plain
// Present(1) pacing rather than failing the stream page.
tracing::warn!(error = %e, "waitable swapchain rejected — creating without");
desc.Flags = 0;
let sc = factory
.CreateSwapChainForComposition(device, &desc, None)
.context("CreateSwapChainForComposition")?;
Ok((sc, 0))
}
}
}
}
@@ -531,11 +712,9 @@ fn build_pipeline(
ID3D11VertexShader,
ID3D11PixelShader,
ID3D11PixelShader,
ID3D11PixelShader,
ID3D11SamplerState,
)> {
let vs_blob = compile(SHADER_HLSL, "vs_main", "vs_5_0")?;
let rgba_blob = compile(SHADER_HLSL, "ps_rgba", "ps_5_0")?;
let nv12_blob = compile(SHADER_HLSL, "ps_nv12", "ps_5_0")?;
let p010_blob = compile(SHADER_HLSL, "ps_p010", "ps_5_0")?;
unsafe {
@@ -543,10 +722,6 @@ fn build_pipeline(
device
.CreateVertexShader(blob_bytes(&vs_blob), None, Some(&mut vs))
.context("CreateVertexShader")?;
let mut ps_rgba = None;
device
.CreatePixelShader(blob_bytes(&rgba_blob), None, Some(&mut ps_rgba))
.context("CreatePixelShader (rgba)")?;
let mut ps_nv12 = None;
device
.CreatePixelShader(blob_bytes(&nv12_blob), None, Some(&mut ps_nv12))
@@ -569,7 +744,6 @@ fn build_pipeline(
.context("CreateSamplerState")?;
Ok((
vs.unwrap(),
ps_rgba.unwrap(),
ps_nv12.unwrap(),
ps_p010.unwrap(),
sampler.unwrap(),