feat(clients/windows): all-vendor video pipeline rewrite + app icon + hosts-page tiles

Decode+present rewrite (first real pixels on glass for this client):

- Decode: FFmpeg D3D11VA on NVIDIA/AMD/Intel. get_format now only returns
  AV_PIX_FMT_D3D11 and lets libavcodec build the decode pool from
  hw_device_ctx (hand-built frames contexts failed three different ways:
  NVIDIA rejects DECODER|SHADER_RESOURCE arrays, BindFlags=0 fails texture
  creation, Intel rejects non-128-aligned HEVC surfaces at the first
  SubmitDecoderBuffers). A DXVA profile probe, run ahead of the hwdevice
  setup, decides hardware-vs-software up front instead of burning the
  opening IDR; extra_hw_frames covers the frames the client holds.
- Present: the decoded slice is copied with ONE display-size-boxed
  CopySubresourceRegion (a planar slice is a single subresource in D3D11;
  the old two-copy D3D12-style code silently no-opped, which caused the
  black screen) into a sampleable NV12/P010 texture, then drawn through
  per-plane SRVs + YUV->RGB shaders.
- New dedicated render thread (render.rs): presenting is decoupled from the
  XAML thread; frame-latency-waitable swapchain + SetMaximumFrameLatency(1),
  newest-wins drain after the wait, crossbeam frame channel with pts for a
  capture->presented p50 log.
- HiDPI: pixel-sized buffers + SetMatrixTransform(96/dpi); output was
  blurry at 125% and 150% scaling.
- Software fallback now feeds the same shaders (swscale -> NV12/P010 planes
  -> two dynamic plane textures); ps_rgba/X2BGR10 path deleted, hw/sw colour
  math identical.
- Adapter selection for hybrid boxes: PUNKTFUNK_ADAPTER > the window's
  monitor's adapter > default; PUNKTFUNK_D3D_DEBUG=1 debug layer.
- Session pump: request_keyframe at start and on hw->sw demotion (infinite
  GOP would otherwise sit on a black screen).
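
The newest-wins drain from the render-thread bullet can be sketched roughly
as follows. This is a minimal illustration, not the client's code: it swaps
the crossbeam channel for std::sync::mpsc so it stands alone, and
`drain_newest` and `example` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Drain everything already queued and keep only the most recent item.
/// Returns the newest item (if any) and how many older items were discarded.
fn drain_newest<T>(first: Option<T>, rx: &Receiver<T>) -> (Option<T>, u32) {
    let mut newest = first;
    let mut dropped = 0u32;
    // try_recv never blocks: it stops as soon as the queue is empty.
    while let Ok(item) = rx.try_recv() {
        if newest.is_some() {
            dropped += 1; // an older frame is being superseded
        }
        newest = Some(item);
    }
    (newest, dropped)
}

/// Hypothetical usage: three frames are queued while the renderer was busy.
fn example() -> (Option<u32>, u32) {
    let (tx, rx) = channel();
    for frame in [1u32, 2, 3] {
        tx.send(frame).unwrap();
    }
    // The blocking receive stands in for the "host paces the stream" wait.
    let first = rx.recv().ok();
    drain_newest(first, &rx)
}
```

Here `example()` keeps frame 3 and counts frames 1 and 2 as dropped, so a
backlog is discarded before any GPU work is spent on it.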

Validated live on the Arc Pro + RTX 3500 Ada laptop against the local
Windows host: 60 fps D3D11VA on both vendors, software path, GUI on glass.

Also: embedded app icon (build.rs winresource + WM_SETICON, MSIX
Square44x44 targetsize assets, pack-msix stages them) and the hosts-page
tile rework: tap-to-connect tiles with a sibling overflow menu (fixes
Forget also triggering a connect), an in-tile rename editor, and an
add-host modal driven by root state.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 16:24:23 +02:00
parent 2c416a4bff
commit a4c84ac620
36 changed files with 1797 additions and 581 deletions
@@ -0,0 +1,204 @@
//! The dedicated video render thread: decoded frames flow session pump → bounded channel → here →
//! `Presenter::present`. Presenting off the XAML thread means UI jank (layout, input, dialogs)
//! never stalls video, and a filled present queue never blocks the UI thread — the two failure
//! modes of the old present-from-`on_rendering` design.
//!
//! Pacing: block on the channel (the host paces the stream), then on the swapchain's
//! frame-latency waitable (≤1 queued present — see `present.rs`), then drain to the NEWEST frame
//! so a stream faster than the display drops backlog before any GPU work. The UI thread only
//! writes panel size/DPI into [`RenderShared`] atomics; the loop applies them before the next
//! draw (and redraws the held frame after a resize — fresh back buffers are blank).
use crate::present::Presenter;
use crate::session::FrameRx;
use crossbeam_channel::RecvTimeoutError;
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};
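/// UI-thread → render-thread state. Size is packed into ONE atomic (w<<32|h) so a resize never
/// tears into a (new-width, old-height) pair.
pub struct RenderShared {
    size_px: AtomicU64,
    dpi: AtomicU32,
    stop: AtomicBool,
}

impl RenderShared {
    pub fn new(width: u32, height: u32, dpi: u32) -> Arc<RenderShared> {
        Arc::new(RenderShared {
            size_px: AtomicU64::new(pack(width, height)),
            dpi: AtomicU32::new(dpi),
            stop: AtomicBool::new(false),
        })
    }

    /// Publish a new panel size in pixels (called from the UI thread).
    pub fn set_size(&self, width: u32, height: u32) {
        self.size_px.store(pack(width, height), Ordering::Relaxed);
    }

    /// Publish a new DPI (called from the UI thread and from the render thread's ~1 Hz poll).
    pub fn set_dpi(&self, dpi: u32) {
        self.dpi.store(dpi, Ordering::Relaxed);
    }

    /// Read `(width, height, dpi)`. Width and height come from one load, so they always match;
    /// DPI is a separate atomic, so a snapshot can pair a new size with the previous DPI until
    /// the next loop iteration.
    fn snapshot(&self) -> (u32, u32, u32) {
        let s = self.size_px.load(Ordering::Relaxed);
        ((s >> 32) as u32, s as u32, self.dpi.load(Ordering::Relaxed))
    }
}

/// Pack width into the high 32 bits and height into the low 32 bits of one `u64`.
fn pack(w: u32, h: u32) -> u64 {
    ((w as u64) << 32) | h as u64
}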
/// Handle owned by the stream page; stops + joins the thread on unmount (and on drop, so a
/// navigation away can't leak a presenting thread).
pub struct RenderThread {
    shared: Arc<RenderShared>,
    join: Option<std::thread::JoinHandle<()>>,
}

impl RenderThread {
    pub fn shared(&self) -> &Arc<RenderShared> {
        &self.shared
    }

    /// Signal the loop to stop, then wait for it. Idempotent: the join handle is taken on the
    /// first call, so the later call from `Drop` only re-sets the flag. The loop checks the flag
    /// once per iteration (its `recv_timeout` is 50 ms).
    pub fn stop_and_join(&mut self) {
        self.shared.stop.store(true, Ordering::SeqCst);
        if let Some(j) = self.join.take() {
            let _ = j.join();
        }
    }
}

impl Drop for RenderThread {
    fn drop(&mut self) {
        self.stop_and_join();
    }
}

/// Moves the presenter (COM interfaces, `!Send` by default) onto the render thread. Sound here:
/// the shared device + immediate context are multithread-protected (see `crate::gpu`), D3D/DXGI
/// objects are apartment-agile, and after this one handoff the swapchain/RTV/context calls happen
/// on exactly the render thread, the same single-owner discipline as `SharedDevice`.
struct SendPresenter(Presenter);

unsafe impl Send for SendPresenter {}
/// Spawn the render thread. `frames` carries `(frame, capture pts_ns)`; `clock_offset_ns` maps our
/// wall clock onto the host's so the logged present latency is end-to-end (same math as the pump).
pub fn spawn(
    presenter: Presenter,
    frames: FrameRx,
    shared: Arc<RenderShared>,
    clock_offset_ns: i64,
) -> RenderThread {
    let boxed = SendPresenter(presenter);
    let shared_w = shared.clone();
    let join = std::thread::Builder::new()
        .name("pf-render".into())
        .spawn(move || run(boxed, frames, shared_w, clock_offset_ns))
        .expect("spawn render thread");
    RenderThread {
        shared,
        join: Some(join),
    }
}

/// Wall-clock nanoseconds since the Unix epoch; 0 if the system clock reads before the epoch.
fn now_ns() -> u64 {
    std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

/// The window DPI, polled ~1 Hz as belt-and-braces for a monitor move that changes DPI without a
/// `SizeChanged` (same DIP size on both screens). `None` when the window isn't up (headless).
fn poll_window_dpi() -> Option<u32> {
    use windows::Win32::UI::HiDpi::GetDpiForWindow;
    use windows::Win32::UI::WindowsAndMessaging::FindWindowW;
    unsafe {
        // Looks the window up by its title on every poll; GetDpiForWindow returns 0 on failure.
        let hwnd = FindWindowW(None, windows::core::w!("Punktfunk")).ok()?;
        match GetDpiForWindow(hwnd) {
            0 => None,
            d => Some(d),
        }
    }
}
/// Render loop: wait for a frame (or a 50 ms tick), apply pending size/DPI, wait for a present
/// slot, drain to the newest frame, present it, and log per-second stats.
fn run(presenter: SendPresenter, frames: FrameRx, shared: Arc<RenderShared>, clock_offset_ns: i64) {
    let mut p = presenter.0;
    let mut applied = (0u32, 0u32, 0u32); // last (w, h, dpi) handed to the presenter
    let mut presented = 0u32;
    let mut dropped = 0u32;
    let mut lat_us: Vec<u64> = Vec::with_capacity(256);
    let mut window_start = Instant::now();
    let mut last_dpi_poll = Instant::now();
    loop {
        if shared.stop.load(Ordering::SeqCst) {
            break;
        }
        // The timeout bounds how long a stop request, resize, or DPI poll can go unnoticed.
        let first = match frames.recv_timeout(Duration::from_millis(50)) {
            Ok(f) => Some(f),
            Err(RecvTimeoutError::Timeout) => None,
            Err(RecvTimeoutError::Disconnected) => break,
        };
        if last_dpi_poll.elapsed() >= Duration::from_secs(1) {
            last_dpi_poll = Instant::now();
            if let Some(dpi) = poll_window_dpi() {
                shared.set_dpi(dpi);
            }
        }
        let snap = shared.snapshot();
        let resized = snap != applied && snap.0 > 0 && snap.1 > 0;
        if resized {
            p.resize(snap.0, snap.1, snap.2);
            applied = snap;
        }
        if first.is_none() && !resized {
            continue; // nothing new to show; don't burn GPU re-presenting a static frame
        }
        // Throttle to the compositor: with ≤1 present outstanding this returns as DWM frees a
        // slot, and frames decoded meanwhile are drained below so the newest is what's drawn.
        if !p.wait_present_slot(1000) {
            tracing::debug!("frame-latency waitable timed out; presenting anyway");
        }
        let mut newest = first;
        while let Ok(f) = frames.try_recv() {
            if newest.is_some() {
                dropped += 1;
            }
            newest = Some(f);
        }
        // The session pump is the sole 0xCE consumer and stashes the latest here (rare updates).
        if let Some(meta) = *crate::present::LATEST_HDR_META.lock().unwrap() {
            p.set_hdr_metadata(meta);
        }
        let pts_ns = newest.as_ref().map(|(_, pts)| *pts);
        // `None` here means a resize with no new frame: the presenter redraws the held frame.
        p.present(newest.map(|(f, _)| f));
        presented += 1;
        if let Some(pts) = pts_ns {
            // Capture→presented, host-clock corrected: the glass-side companion to the pump's
            // capture→decoded p50.
            let lat = (now_ns() as i128 + clock_offset_ns as i128 - pts as i128).max(0) as u64;
            // Discard non-positive and >10 s samples (clock skew or a stale pts).
            if lat > 0 && lat < 10_000_000_000 {
                lat_us.push(lat / 1000);
            }
        }
        if window_start.elapsed() >= Duration::from_secs(1) {
            lat_us.sort_unstable();
            let p50 = lat_us.get(lat_us.len() / 2).copied().unwrap_or(0);
            tracing::debug!(presented, dropped, present_p50_us = p50, "render window");
            window_start = Instant::now();
            presented = 0;
            dropped = 0;
            lat_us.clear();
        }
    }
    tracing::info!("render thread exiting");
}
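
The latency bookkeeping at the end of the loop can be pulled out into two small pure functions, which makes the clock-offset arithmetic and the p50 pick easy to check in isolation. This is a sketch under assumptions, not part of the commit: `present_latency_us` and `p50_us` are hypothetical names, and the bounds simply mirror the ones used in `run`.

```rust
/// Capture-to-present latency in microseconds, or `None` when the sample is implausible
/// (non-positive, or 10 s and above, e.g. from clock skew or a stale pts).
fn present_latency_us(now_ns: u64, clock_offset_ns: i64, pts_ns: u64) -> Option<u64> {
    // i128 keeps the signed offset and the subtraction free of overflow.
    let lat_ns = now_ns as i128 + clock_offset_ns as i128 - pts_ns as i128;
    if lat_ns > 0 && lat_ns < 10_000_000_000 {
        Some((lat_ns / 1000) as u64)
    } else {
        None
    }
}

/// Sort the samples in place and return the element at index `len / 2`
/// (the upper median for even counts); 0 when there are no samples.
fn p50_us(samples: &mut [u64]) -> u64 {
    samples.sort_unstable();
    samples.get(samples.len() / 2).copied().unwrap_or(0)
}
```

With a local clock of 1 000 000 000 ns, an offset of +500 000 ns, and a capture pts of 990 000 000 ns, the latency is 10 500 000 ns, which `present_latency_us` reports as 10 500 µs.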