feat(clients/windows): all-vendor video pipeline rewrite + app icon + hosts-page tiles

Decode+present rewrite (first real pixels on glass for this client):

- Decode: FFmpeg D3D11VA on NVIDIA/AMD/Intel. get_format now returns only
  AV_PIX_FMT_D3D11 and lets libavcodec build the decode pool from
  hw_device_ctx (hand-built frames contexts failed three different ways:
  NVIDIA rejects DECODER|SHADER_RESOURCE arrays, BindFlags=0 fails texture
  creation, Intel rejects non-128-aligned HEVC surfaces at the first
  SubmitDecoderBuffers). A DXVA profile probe, run before the hwdevice is
  created, decides hardware vs software up front instead of burning the
  opening IDR; extra_hw_frames reserves pool surfaces for the frames the
  client holds.
- Present: the decoded slice is copied into a sampleable NV12/P010 texture
  with ONE CopySubresourceRegion boxed to the display size, then sampled
  through per-plane SRVs by the YUV->RGB shaders. In D3D11 a planar slice
  is a single subresource; the old two-copy, D3D12-style code silently
  no-opped, which was the black screen.
- New dedicated render thread (render.rs) decouples presenting from the
  XAML thread: frame-latency-waitable swapchain + SetMaximumFrameLatency(1);
  after each wait it drains the frame channel and presents only the newest
  frame; the crossbeam frame channel carries pts for a capture->presented
  p50 log.
- HiDPI: swapchain buffers are sized in physical pixels and
  SetMatrixTransform(96/dpi) maps them 1:1 onto the panel; output was
  blurry at 125 %/150 % scaling.
- Software fallback now feeds the same shaders (swscale -> NV12/P010 planes
  -> two dynamic plane textures); the ps_rgba/X2BGR10 path is deleted, so hw
  and sw colour math are identical.
- Adapter selection on hybrid-GPU machines, highest priority first:
  PUNKTFUNK_ADAPTER, then the adapter driving the window's monitor, then the
  default adapter. PUNKTFUNK_D3D_DEBUG=1 enables the D3D debug layer.
- Session pump: request_keyframe at start and on hw->sw demotion (under
  infinite GOP the decoder would otherwise sit on a black screen).
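The one-copy fix in the Present bullet comes down to subresource numbering. The sketch below reproduces the index formulas of the D3D11CalcSubresource and D3D12CalcSubresource helpers; the function names, the 16-slice pool, slice 5 and the 1920x1080 display size are illustrative assumptions, not values from this commit. In D3D12 the chroma plane of slice 5 gets its own index (21), which lies past the last of the 16 subresources a single-mip D3D11 array has. Because CopySubresourceRegion returns void, a copy aimed at such an index has no error code to check, which is one plausible way two plane-wise copies could silently no-op. The box helper shows the crop from an aligned decode surface (1152 lines if 1080 is rounded up to 128) down to the display size.

```rust
/// Round `v` up to the next multiple of `align` (`align` must be a power of two),
/// e.g. the 128-line alignment Intel wants for HEVC decode surfaces.
fn align_up(v: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (v + align - 1) & !(align - 1)
}

/// D3D11 subresource index (same formula as the D3D11CalcSubresource helper).
/// A planar NV12/P010 array slice is ONE subresource: both planes sit behind it.
fn d3d11_subresource(mip_slice: u32, array_slice: u32, mip_levels: u32) -> u32 {
    mip_slice + array_slice * mip_levels
}

/// D3D12 subresource index (same formula as D3D12CalcSubresource): every plane of a
/// planar format gets its own block of `mip_levels * array_size` indices.
fn d3d12_subresource(
    mip_slice: u32,
    array_slice: u32,
    plane_slice: u32,
    mip_levels: u32,
    array_size: u32,
) -> u32 {
    mip_slice + array_slice * mip_levels + plane_slice * mip_levels * array_size
}

/// Hypothetical mirror of D3D11_BOX (left, top, front, right, bottom, back).
#[derive(Debug, PartialEq)]
struct CopyBox {
    left: u32,
    top: u32,
    front: u32,
    right: u32,
    bottom: u32,
    back: u32,
}

/// Source box that crops the (possibly aligned, larger) decode surface to the display size.
fn display_box(width: u32, height: u32) -> CopyBox {
    CopyBox { left: 0, top: 0, front: 0, right: width, bottom: height, back: 1 }
}
```

With one mip level, the D3D11 index of a slice is just the slice number, and a single boxed copy of that index moves both planes at once.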

Validated live on the Arc Pro + RTX 3500 Ada laptop against the local
Windows host: 60 fps D3D11VA on both vendors, plus the software path and
the GUI on glass.

Also: embedded app icon (build.rs winresource + WM_SETICON, MSIX
Square44x44 targetsize assets, staged by pack-msix) and the hosts-page tile
rework (tap-to-connect tiles with a sibling overflow menu, so Forget no
longer also connects; an in-tile rename editor; an add-host modal via root
state).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 16:24:23 +02:00
parent 2c416a4bff
commit a4c84ac620
36 changed files with 1797 additions and 581 deletions
+35 -7
```diff
@@ -74,9 +74,13 @@ pub enum SessionEvent {
     Stats(Stats),
 }
+/// Decoded frames + their host-capture `pts_ns`, session pump → render thread (crossbeam so that
+/// thread can block with a timeout — async-channel has no `recv_timeout`).
+pub type FrameRx = crossbeam_channel::Receiver<(DecodedFrame, u64)>;
 pub struct SessionHandle {
     pub events: async_channel::Receiver<SessionEvent>,
-    pub frames: async_channel::Receiver<DecodedFrame>,
+    pub frames: FrameRx,
     pub stop: Arc<AtomicBool>,
 }
```
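The doc comment above says the render thread needs `recv_timeout`. A minimal std-only sketch of the "wait, then newest-wins drain" step from the render-thread bullet, using `std::sync::mpsc` (which, like `crossbeam_channel::Receiver`, offers `recv_timeout` and `try_iter`). `Frame` and `next_newest` are hypothetical names, and the real render loop also waits on the swapchain's frame-latency waitable object, which is not modelled here.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical stand-in for `(DecodedFrame, u64)`: only the capture timestamp matters here.
#[derive(Debug, PartialEq)]
struct Frame {
    pts_ns: u64,
}

/// Block up to `timeout` for one frame, then drain anything queued behind it and keep
/// only the newest. Ok(None) on timeout, Err(()) once every sender is gone.
fn next_newest(rx: &Receiver<Frame>, timeout: Duration) -> Result<Option<Frame>, ()> {
    let first = match rx.recv_timeout(timeout) {
        Ok(frame) => frame,
        Err(RecvTimeoutError::Timeout) => return Ok(None),
        Err(RecvTimeoutError::Disconnected) => return Err(()),
    };
    // try_iter() never blocks: it yields what is already queued, oldest first,
    // so `last()` is the freshest frame and the older ones are simply dropped.
    Ok(Some(rx.try_iter().last().unwrap_or(first)))
}
```

Dropping stale frames on the consumer side keeps present latency bounded even if a burst of frames arrives during one vblank wait.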
```diff
@@ -131,13 +135,15 @@ pub fn run_speed_probe(
 pub fn start(params: SessionParams) -> SessionHandle {
     let (ev_tx, ev_rx) = async_channel::unbounded();
-    // Tiny frame queue, newest wins: force_send displaces the oldest when the UI lags.
-    let (frame_tx, frame_rx) = async_channel::bounded(2);
+    // Tiny frame queue, newest wins: the pump displaces the oldest when the renderer lags (it
+    // keeps a Receiver clone for exactly that).
+    let (frame_tx, frame_rx) = crossbeam_channel::bounded(2);
     let stop = Arc::new(AtomicBool::new(false));
     let stop_w = stop.clone();
+    let frame_rx_pump = frame_rx.clone();
     std::thread::Builder::new()
         .name("punktfunk-session".into())
-        .spawn(move || pump(params, ev_tx, frame_tx, stop_w))
+        .spawn(move || pump(params, ev_tx, frame_tx, frame_rx_pump, stop_w))
         .expect("spawn session thread");
     SessionHandle {
         events: ev_rx,
```
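`start` sets up a two-slot queue in which the producer never blocks and the oldest frame is dropped when the renderer falls behind. As a std-only sketch of that policy (not the commit's code, which uses `crossbeam_channel::bounded(2)` plus a receiver clone held by the pump), a mutex-guarded `VecDeque` with a `Condvar` behaves the same way; `NewestWins` is a hypothetical name.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical fixed-capacity queue whose producer never blocks: when full,
/// the oldest entry is displaced so the consumer always sees the freshest items.
struct NewestWins<T> {
    cap: usize,
    q: Mutex<VecDeque<T>>,
    ready: Condvar,
}

impl<T> NewestWins<T> {
    fn new(cap: usize) -> Self {
        assert!(cap > 0);
        Self { cap, q: Mutex::new(VecDeque::with_capacity(cap)), ready: Condvar::new() }
    }

    /// Push, displacing the oldest item if the queue is full; returns what was displaced.
    fn push(&self, item: T) -> Option<T> {
        let mut q = self.q.lock().unwrap();
        let displaced = if q.len() == self.cap { q.pop_front() } else { None };
        q.push_back(item);
        self.ready.notify_one();
        displaced
    }

    /// Pop the oldest queued item, waiting up to `timeout` for one to arrive.
    fn pop_timeout(&self, timeout: Duration) -> Option<T> {
        let guard = self.q.lock().unwrap();
        let (mut q, _) = self
            .ready
            .wait_timeout_while(guard, timeout, |q| q.is_empty())
            .unwrap();
        q.pop_front()
    }
}
```

Here displace-and-insert happens under one lock. The crossbeam version in `pump` does it in two steps (`try_recv`, then a second `try_send`), which is still safe because the pump is the only producer: between its two `try_send` calls only the renderer touches the queue, and it only removes items, so the retry cannot find the queue full.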
```diff
@@ -192,7 +198,8 @@ impl AudioDec {
 fn pump(
     params: SessionParams,
     ev_tx: async_channel::Sender<SessionEvent>,
-    frame_tx: async_channel::Sender<DecodedFrame>,
+    frame_tx: crossbeam_channel::Sender<(DecodedFrame, u64)>,
+    frame_rx: FrameRx,
     stop: Arc<AtomicBool>,
 ) {
     let connector = match NativeClient::connect(
```
```diff
@@ -285,6 +292,11 @@ fn pump(
         })
         .flatten();
+    // Force an immediate IDR (with in-band parameter sets) rather than waiting for the host's own
+    // first keyframe — under infinite GOP a late/missed IDR means the decoder sits on
+    // "PPS id out of range" (a black screen) until one arrives.
+    let _ = connector.request_keyframe();
     let clock_offset = connector.clock_offset_ns;
     let mut total_frames = 0u64;
     let mut window_start = Instant::now();
```
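Right after the start-up keyframe request, the pump reads `connector.clock_offset_ns`, which the commit message ties to a capture->presented p50 log. A sketch of that arithmetic, assuming the offset is the amount added to a host-clock `pts_ns` to put it on the client clock and that it fits an `i64` (neither the sign convention nor the type is visible in this diff); `p50_ns` and `capture_to_present_ns` are hypothetical helpers.

```rust
/// Median (p50) of a window of latency samples, in nanoseconds; None if the window is empty.
/// Sorting a copy is fine for the few hundred samples a once-per-second window holds.
fn p50_ns(samples: &[u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    Some(sorted[sorted.len() / 2]) // upper median for even-length windows
}

/// Capture->presented latency of one frame.
/// ASSUMPTION: client_time = host_time + clock_offset_ns.
fn capture_to_present_ns(pts_ns: u64, clock_offset_ns: i64, presented_local_ns: u64) -> u64 {
    let captured_local = pts_ns as i128 + clock_offset_ns as i128;
    // Clamp at zero: clock-sync error can push a very small latency below zero.
    (presented_local_ns as i128 - captured_local).max(0) as u64
}
```

A median is less sensitive than a mean to the occasional frame that waits out a full vblank or a network hiccup.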
```diff
@@ -304,7 +316,17 @@ fn pump(
         match connector.next_frame(Duration::from_millis(4)) {
             Ok(frame) => {
                 let t0 = Instant::now();
-                match decoder.decode(&frame.data) {
+                // A D3D11VA→software demotion (see `Decoder::decode`) starts a FRESH decoder that
+                // has none of the stream's parameter sets; under infinite GOP it would sit on
+                // "PPS id out of range" forever. Detect the transition and force a new IDR so the
+                // rebuilt decoder resynchronizes immediately.
+                let was_hw = decoder.is_hardware();
+                let decoded = decoder.decode(&frame.data);
+                if was_hw && !decoder.is_hardware() {
+                    tracing::info!("decoder demoted to software — requesting keyframe to resync");
+                    let _ = connector.request_keyframe();
+                }
+                match decoded {
                     Ok(Some(decoded)) => {
                         total_frames += 1;
                         hdr = decoded.hdr();
```
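The demotion check snapshots `is_hardware()` before `decode` and compares afterwards, so each hw->sw transition triggers exactly one keyframe request no matter how many packets follow. A minimal mock of that pattern; `MockDecoder`, `pump_packets` and the fail-on-the-Nth-call behaviour are hypothetical, and the counter stands in for `connector.request_keyframe()`.

```rust
/// Hypothetical decoder that starts on hardware and demotes itself to software
/// on its `hw_fails_at`-th call (loosely mirroring `Decoder::decode`).
struct MockDecoder {
    hardware: bool,
    hw_fails_at: usize,
    calls: usize,
}

impl MockDecoder {
    fn is_hardware(&self) -> bool {
        self.hardware
    }

    fn decode(&mut self, _data: &[u8]) -> Result<Option<()>, String> {
        self.calls += 1;
        if self.hardware && self.calls == self.hw_fails_at {
            // Rebuilt as a fresh software decoder: it has no parameter sets yet.
            self.hardware = false;
        }
        Ok(None)
    }
}

/// Feed packets and count keyframe requests: one per hw->sw transition.
fn pump_packets(dec: &mut MockDecoder, packets: &[&[u8]]) -> usize {
    let mut keyframe_requests = 0;
    for p in packets {
        let was_hw = dec.is_hardware();
        let _ = dec.decode(p);
        if was_hw && !dec.is_hardware() {
            keyframe_requests += 1; // stands in for connector.request_keyframe()
        }
    }
    keyframe_requests
}
```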
```diff
@@ -330,7 +352,13 @@ fn pump(
                         decode_us_sum += t0.elapsed().as_micros() as u64;
                         frames_n += 1;
                         bytes_n += frame.data.len() as u64;
-                        let _ = frame_tx.force_send(decoded);
+                        // Newest wins: displace the oldest queued frame when the renderer lags.
+                        if let Err(crossbeam_channel::TrySendError::Full(item)) =
+                            frame_tx.try_send((decoded, frame.pts_ns))
+                        {
+                            let _ = frame_rx.try_recv();
+                            let _ = frame_tx.try_send(item);
+                        }
                     }
                     Ok(None) => {}
                     // Survivable (loss until the next IDR/RFI recovery) — keep feeding.
```
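The pump accumulates `decode_us_sum`, `frames_n` and `bytes_n` per window (alongside `window_start`). How it turns them into the periodic `Stats` event is not shown in this diff; a plausible sketch, with a hypothetical `WindowStats` type and `summarize` helper, is:

```rust
use std::time::Duration;

/// Hypothetical summary of one stats window.
#[derive(Debug, PartialEq)]
struct WindowStats {
    fps: f64,
    mbps: f64,
    avg_decode_us: u64,
}

/// Turn the pump's per-window accumulators into rates; None for an empty window.
fn summarize(decode_us_sum: u64, frames_n: u64, bytes_n: u64, elapsed: Duration) -> Option<WindowStats> {
    let secs = elapsed.as_secs_f64();
    if frames_n == 0 || secs <= 0.0 {
        return None;
    }
    Some(WindowStats {
        fps: frames_n as f64 / secs,
        // Compressed video payload only, in Mbit/s.
        mbps: bytes_n as f64 * 8.0 / secs / 1e6,
        avg_decode_us: decode_us_sum / frames_n,
    })
}
```

Because `frames_n` and `bytes_n` are only bumped on `Ok(Some(_))`, packets that decode to nothing do not count toward either rate in this sketch.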