feat(clients/windows): all-vendor video pipeline rewrite + app icon + hosts-page tiles

Decode+present rewrite (first real pixels on glass for this client):

- Decode: FFmpeg D3D11VA on NVIDIA/AMD/Intel. get_format now only returns AV_PIX_FMT_D3D11 and
  lets libavcodec build the decode pool from hw_device_ctx (hand-built frames contexts failed
  three different ways: NVIDIA rejects DECODER|SHADER_RESOURCE arrays, BindFlags=0 fails texture
  creation, Intel rejects non-128-aligned HEVC surfaces at the first SubmitDecoderBuffers). A
  DXVA profile probe before the hwdevice commits hardware-vs-software up front instead of
  burning the opening IDR; extra_hw_frames covers the frames the client holds.
- Present: the decoded slice is copied with ONE display-size-boxed CopySubresourceRegion (a
  planar slice is a single subresource in D3D11; the old two-copy D3D12-style code silently
  no-opped - the black screen) into a sampleable NV12/P010 texture, per-plane SRVs + YUV->RGB
  shaders.
- New dedicated render thread (render.rs): presenting is decoupled from the XAML thread;
  frame-latency-waitable swapchain + SetMaximumFrameLatency(1), newest-wins drain after the
  wait, crossbeam frame channel with pts for a capture->presented p50 log.
- HiDPI: pixel-sized buffers + SetMatrixTransform(96/dpi) - was blurry at 125/150% scaling.
- Software fallback now feeds the same shaders (swscale -> NV12/P010 planes -> two dynamic
  plane textures); ps_rgba/X2BGR10 path deleted, hw/sw colour math identical.
- Adapter selection for hybrid boxes: PUNKTFUNK_ADAPTER > the window's monitor's adapter >
  default; PUNKTFUNK_D3D_DEBUG=1 debug layer.
- Session pump: request_keyframe at start and on hw->sw demotion (infinite GOP would otherwise
  sit on a black screen).

Validated live on the Arc Pro + RTX 3500 Ada laptop against the local Windows host: 60 fps
D3D11VA on both vendors, software path, GUI on glass.

Also: embedded app icon (build.rs winresource + WM_SETICON, MSIX Square44x44 targetsize assets,
pack-msix stages them) and the hosts-page tile rework (tap-to-connect tiles with sibling
overflow menu - fixes forget-also-connects - in-tile rename editor, add-host modal via root
state).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 16:24:23 +02:00
parent 2c416a4bff
commit a4c84ac620
36 changed files with 1797 additions and 581 deletions
@@ -0,0 +1,204 @@
+//! The dedicated video render thread: decoded frames flow session pump → bounded channel → here →
+//! `Presenter::present`. Presenting off the XAML thread means UI jank (layout, input, dialogs)
+//! never stalls video, and a filled present queue never blocks the UI thread — the two failure
+//! modes of the old present-from-`on_rendering` design.
+//!
+//! Pacing: block on the channel (the host paces the stream), then on the swapchain's
+//! frame-latency waitable (≤1 queued present — see `present.rs`), then drain to the NEWEST frame
+//! so a stream faster than the display drops backlog before any GPU work. The UI thread only
+//! writes panel size/DPI into [`RenderShared`] atomics; the loop applies them before the next
+//! draw (and redraws the held frame after a resize — fresh back buffers are blank).
+
+use crate::present::Presenter;
+use crate::session::FrameRx;
+use crossbeam_channel::RecvTimeoutError;
+use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
+use std::sync::Arc;
+use std::time::{Duration, Instant};
+
+/// UI-thread → render-thread state. Size is packed into ONE atomic (w<<32|h) so a resize never
+/// tears into a (new-width, old-height) pair.
+pub struct RenderShared {
+    size_px: AtomicU64,
+    dpi: AtomicU32,
+    stop: AtomicBool,
+}
+
+impl RenderShared {
+    pub fn new(width: u32, height: u32, dpi: u32) -> Arc<RenderShared> {
+        Arc::new(RenderShared {
+            size_px: AtomicU64::new(pack(width, height)),
+            dpi: AtomicU32::new(dpi),
+            stop: AtomicBool::new(false),
+        })
+    }
+
+    pub fn set_size(&self, width: u32, height: u32) {
+        self.size_px.store(pack(width, height), Ordering::Relaxed);
+    }
+
+    pub fn set_dpi(&self, dpi: u32) {
+        self.dpi.store(dpi, Ordering::Relaxed);
+    }
+
+    fn snapshot(&self) -> (u32, u32, u32) {
+        let s = self.size_px.load(Ordering::Relaxed);
+        ((s >> 32) as u32, s as u32, self.dpi.load(Ordering::Relaxed))
+    }
+}
+
+fn pack(w: u32, h: u32) -> u64 {
+    ((w as u64) << 32) | h as u64
+}
+
+/// Handle owned by the stream page; stops + joins the thread on unmount (and on drop, so a
+/// navigation away can't leak a presenting thread).
+pub struct RenderThread {
+    shared: Arc<RenderShared>,
+    join: Option<std::thread::JoinHandle<()>>,
+}
+
+impl RenderThread {
+    pub fn shared(&self) -> &Arc<RenderShared> {
+        &self.shared
+    }
+
+    pub fn stop_and_join(&mut self) {
+        self.shared.stop.store(true, Ordering::SeqCst);
+        if let Some(j) = self.join.take() {
+            let _ = j.join();
+        }
+    }
+}
+
+impl Drop for RenderThread {
+    fn drop(&mut self) {
+        self.stop_and_join();
+    }
+}
+
+/// Moves the presenter (COM interfaces, `!Send` by default) onto the render thread. Sound here:
+/// the shared device + immediate context are multithread-protected (see `crate::gpu`), D3D/DXGI
+/// objects are apartment-agile, and after this one handoff the swapchain/RTV/context calls happen
+/// on exactly the render thread — the same single-owner discipline as `SharedDevice`.
+struct SendPresenter(Presenter);
+unsafe impl Send for SendPresenter {}
+
+/// Spawn the render thread. `frames` carries `(frame, capture pts_ns)`; `clock_offset_ns` maps our
+/// wall clock onto the host's so the logged present latency is end-to-end (same math as the pump).
+pub fn spawn(
+    presenter: Presenter,
+    frames: FrameRx,
+    shared: Arc<RenderShared>,
+    clock_offset_ns: i64,
+) -> RenderThread {
+    let boxed = SendPresenter(presenter);
+    let shared_w = shared.clone();
+    let join = std::thread::Builder::new()
+        .name("pf-render".into())
+        .spawn(move || run(boxed, frames, shared_w, clock_offset_ns))
+        .expect("spawn render thread");
+    RenderThread {
+        shared,
+        join: Some(join),
+    }
+}
+
+fn now_ns() -> u64 {
+    std::time::SystemTime::now()
+        .duration_since(std::time::UNIX_EPOCH)
+        .map(|d| d.as_nanos() as u64)
+        .unwrap_or(0)
+}
+
+/// The window DPI, polled ~1 Hz as belt-and-braces for a monitor move that changes DPI without a
+/// `SizeChanged` (same DIP size on both screens). `None` when the window isn't up (headless).
+fn poll_window_dpi() -> Option<u32> {
+    use windows::Win32::UI::HiDpi::GetDpiForWindow;
+    use windows::Win32::UI::WindowsAndMessaging::FindWindowW;
+    unsafe {
+        let hwnd = FindWindowW(None, windows::core::w!("Punktfunk")).ok()?;
+        match GetDpiForWindow(hwnd) {
+            0 => None,
+            d => Some(d),
+        }
+    }
+}
+
+fn run(presenter: SendPresenter, frames: FrameRx, shared: Arc<RenderShared>, clock_offset_ns: i64) {
+    let mut p = presenter.0;
+    let mut applied = (0u32, 0u32, 0u32); // last (w, h, dpi) handed to the presenter
+    let mut presented = 0u32;
+    let mut dropped = 0u32;
+    let mut lat_us: Vec<u64> = Vec::with_capacity(256);
+    let mut window_start = Instant::now();
+    let mut last_dpi_poll = Instant::now();
+
+    loop {
+        if shared.stop.load(Ordering::SeqCst) {
+            break;
+        }
+        let first = match frames.recv_timeout(Duration::from_millis(50)) {
+            Ok(f) => Some(f),
+            Err(RecvTimeoutError::Timeout) => None,
+            Err(RecvTimeoutError::Disconnected) => break,
+        };
+
+        if last_dpi_poll.elapsed() >= Duration::from_secs(1) {
+            last_dpi_poll = Instant::now();
+            if let Some(dpi) = poll_window_dpi() {
+                shared.set_dpi(dpi);
+            }
+        }
+        let snap = shared.snapshot();
+        let resized = snap != applied && snap.0 > 0 && snap.1 > 0;
+        if resized {
+            p.resize(snap.0, snap.1, snap.2);
+            applied = snap;
+        }
+        if first.is_none() && !resized {
+            continue; // nothing new to show — don't burn GPU re-presenting a static frame
+        }
+
+        // Throttle to the compositor: with ≤1 present outstanding this returns as DWM frees a
+        // slot, and frames decoded meanwhile are drained below so the newest is what's drawn.
+        if !p.wait_present_slot(1000) {
+            tracing::debug!("frame-latency waitable timed out — presenting anyway");
+        }
+        let mut newest = first;
+        while let Ok(f) = frames.try_recv() {
+            if newest.is_some() {
+                dropped += 1;
+            }
+            newest = Some(f);
+        }
+
+        // The session pump is the sole 0xCE consumer and stashes the latest here (rare updates).
+        if let Some(meta) = *crate::present::LATEST_HDR_META.lock().unwrap() {
+            p.set_hdr_metadata(meta);
+        }
+
+        let pts_ns = newest.as_ref().map(|(_, pts)| *pts);
+        p.present(newest.map(|(f, _)| f));
+        presented += 1;
+        if let Some(pts) = pts_ns {
+            // Capture→presented, host-clock corrected — the glass-side companion to the pump's
+            // capture→decoded p50.
+            let lat = (now_ns() as i128 + clock_offset_ns as i128 - pts as i128).max(0) as u64;
+            if lat > 0 && lat < 10_000_000_000 {
+                lat_us.push(lat / 1000);
+            }
+        }
+
+        if window_start.elapsed() >= Duration::from_secs(1) {
+            lat_us.sort_unstable();
+            let p50 = lat_us.get(lat_us.len() / 2).copied().unwrap_or(0);
+            tracing::debug!(presented, dropped, present_p50_us = p50, "render window");
+            window_start = Instant::now();
+            presented = 0;
+            dropped = 0;
+            lat_us.clear();
+        }
+    }
+    tracing::info!("render thread exiting");
+}
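
Two pieces of the render thread above can be exercised without a GPU: the packed size atomic and the newest-wins drain. The sketch below is illustrative only and is not part of this commit. It assumes `std::sync::mpsc` in place of the crossbeam channel so it needs no external crate, and the names `unpack`, `snapshot_size`, and `drain_newest` are hypothetical helpers (the diff inlines the equivalent logic in `RenderShared::snapshot` and `run`).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::Receiver;

/// Width in the high 32 bits, height in the low 32 bits (same layout as the diff's `pack`).
fn pack(w: u32, h: u32) -> u64 {
    ((w as u64) << 32) | h as u64
}

/// Inverse of `pack`. Hypothetical helper: the diff does this inline in `snapshot`.
fn unpack(s: u64) -> (u32, u32) {
    ((s >> 32) as u32, s as u32)
}

/// One load yields a consistent (width, height) pair, because both halves were
/// written by a single store on the UI side.
fn snapshot_size(size_px: &AtomicU64) -> (u32, u32) {
    unpack(size_px.load(Ordering::Relaxed))
}

/// Newest-wins drain: keep the last queued item and count every item it replaced.
/// `first` is whatever the earlier blocking receive returned, if anything.
/// Assumption: std's mpsc `Receiver` stands in for the crossbeam receiver.
fn drain_newest<T>(rx: &Receiver<T>, first: Option<T>) -> (Option<T>, u32) {
    let mut newest = first;
    let mut dropped = 0u32;
    while let Ok(item) = rx.try_recv() {
        if newest.is_some() {
            dropped += 1; // an older item is being superseded
        }
        newest = Some(item);
    }
    (newest, dropped)
}
```

With four items queued and the first already received, `drain_newest` returns the fourth item and a drop count of three, which matches how `dropped` is tallied in `run`. DPI is stored in a separate atomic in the diff, so only width and height are guaranteed to be read as a pair.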