feat(windows): pf-vdisplay IDD-push — HDR + pipelined zero-copy capture
apple / swift (push) Successful in 1m4s
windows-host / package (push) Successful in 6m28s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m14s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m10s
release / apple (push) Successful in 7m53s
android / android (push) Successful in 10m33s
ci / web (push) Successful in 44s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 3m4s
ci / docs-site (push) Successful in 53s
ci / rust (push) Successful in 12m22s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m11s
apple / screenshots (push) Successful in 5m24s
deb / build-publish (push) Successful in 3m16s
decky / build-publish (push) Successful in 21s
ci / bench (push) Successful in 4m42s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m34s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m13s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 47s
flatpak / build-publish (push) Successful in 4m24s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m5s
docker / deploy-docs (push) Successful in 25s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m44s

HDR (display-driven, matching the WGC path):
- CTA-861.3 HDR EDID (BT.2020 primaries + HDR Static Metadata block) so Windows
  offers "Use HDR" on the virtual display. The host FOLLOWS the display's live
  advanced-color state: on a toggle it recreates the shared ring in the matching
  format (FP16 in HDR, BGRA in SDR) without freezing the stream.
- Always emit Main10/BT.2020-PQ Rgb10a2 while the display is HDR. The client
  auto-detects PQ from the HEVC VUI, so the host does not rely on
  VIDEO_CAP_10BIT (clients under-report it). A generic HDR10 mastering SEI
  accompanies every IDR.
- A generation-tagged `latest` word (gen<<40 | seq<<8 | slot) plus a driver-side
  `is_stale` re-attach eliminate the garbage frame at toggle time and any read
  from a stale ring.
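A minimal sketch of how such a tagged word can be packed and checked. It assumes the bit widths implied by the shifts (8 bits of slot, 32 bits of sequence, 24 bits of generation); `pack_latest`, `unpack_latest` and `reader_is_stale` are hypothetical helpers, not the driver's actual `is_stale` logic.

```rust
/// Bit offsets inferred from `gen<<40 | seq<<8 | slot` (an assumption about field widths):
/// bits 0..8 = ring slot, bits 8..40 = frame sequence, bits 40..64 = ring generation.
const SEQ_SHIFT: u32 = 8;
const GEN_SHIFT: u32 = 40;
const GEN_MASK: u64 = (1 << 24) - 1;

/// Packs the three fields; `generation` is truncated to the 24 bits that fit above bit 40.
fn pack_latest(generation: u32, seq: u32, slot: u8) -> u64 {
    (((generation as u64) & GEN_MASK) << GEN_SHIFT) | ((seq as u64) << SEQ_SHIFT) | slot as u64
}

fn unpack_latest(word: u64) -> (u32, u32, u8) {
    let generation = (word >> GEN_SHIFT) as u32;
    let seq = (word >> SEQ_SHIFT) as u32; // the cast keeps exactly bits 8..40
    let slot = word as u8; // low 8 bits
    (generation, seq, slot)
}

/// True when the published word belongs to a newer ring than the one the reader attached to,
/// i.e. the reader must re-attach before touching `slot`.
fn reader_is_stale(attached_generation: u32, word: u64) -> bool {
    unpack_latest(word).0 != attached_generation
}

fn main() {
    let word = pack_latest(3, 1_000, 5);
    assert_eq!(unpack_latest(word), (3, 1_000, 5));
    assert!(!reader_is_stale(3, word));
    // An HDR/SDR toggle recreates the ring under the next generation.
    assert!(reader_is_stale(3, pack_latest(4, 0, 0)));
    println!("latest = {word:#x}");
}
```

Packing all three fields into one word means a single load tells the reader both which slot to read and whether that slot belongs to the ring it is attached to.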

Perf:
- Pipeline the encode loop (Capturer::pipeline_depth; 2 for IDD-push): submit
  frame N+1 before polling frame N, so N+1's convert/copy on the 3D engine
  overlaps NVENC's encode of N on the encoder ASIC. PUNKTFUNK_IDD_DEPTH overrides
  the depth (1 = synchronous).
- Rotating host output ring (OUT_RING) so the in-flight encode and the next
  convert never touch the same texture.
- HDR converts directly from the keyed-mutex slot's SRV into the output ring,
  dropping the redundant slot-to-FP16 scratch copy; SDR copies the BGRA slot in.
  The slot mutex is held only across the convert/copy, not the encode.
  RING_LEN goes from 3 to 6 for publish headroom.
- Capture-health diagnostic: PUNKTFUNK_PERF reports new_fps vs repeat_fps. A low
  new_fps at a high send rate means the source isn't compositing new frames,
  not that the encoder is stalling.
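The submit-before-poll ordering and the rotating output ring can be modeled without any GPU calls. In this illustrative sketch, `run_pipeline` and `InFlight` are hypothetical, "submit" only records which output texture a frame was converted into, and the ring length of 3 is an assumption (the commit does not state OUT_RING's size). The assertion inside the loop checks the invariant described above: the next convert never targets a texture whose encode is still in flight.

```rust
use std::collections::VecDeque;

/// Assumed output-ring length for the sketch; the commit does not state OUT_RING's size.
const OUT_RING_LEN: usize = 3;

/// A frame whose convert has been submitted and whose encode has not yet been polled.
struct InFlight {
    frame: u64,
    out_slot: usize,
}

/// Hypothetical model of the pipelined loop. Returns the order in which encodes were polled
/// and the largest number of frames that were in flight at once.
fn run_pipeline(frames: u64, depth: usize) -> (Vec<u64>, usize) {
    assert!(depth >= 1 && depth <= OUT_RING_LEN, "depth must fit in the output ring");
    let mut in_flight: VecDeque<InFlight> = VecDeque::new();
    let mut polled = Vec::new();
    let mut max_in_flight: usize = 0;
    for frame in 0..frames {
        // Submit frame N+1: convert/copy into the next output texture in rotation.
        let out_slot = (frame as usize) % OUT_RING_LEN;
        // Invariant: never overwrite a texture whose encode is still in flight.
        assert!(in_flight.iter().all(|f| f.out_slot != out_slot));
        in_flight.push_back(InFlight { frame, out_slot });
        max_in_flight = max_in_flight.max(in_flight.len());
        // Only when `depth` frames are queued do we block on the oldest encode (frame N).
        if in_flight.len() >= depth {
            let done = in_flight.pop_front().expect("queue is non-empty");
            polled.push(done.frame);
        }
    }
    // Drain the tail at shutdown.
    polled.extend(in_flight.drain(..).map(|f| f.frame));
    (polled, max_in_flight)
}

fn main() {
    let (order, overlap) = run_pipeline(6, 2);
    println!("polled {order:?}; up to {overlap} frames in flight");
}
```

With depth 2 the loop holds two frames at once, which is the window in which one frame's convert overlaps the previous frame's encode; depth 1 degenerates to submit-then-wait.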

Validated live on the RTX box: 5120x1440@240 HDR streams; the driver composes
~180 new frames/s while the encoder sustains 240 fps at ~4.3 ms p50.
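One way to read the capture-health numbers together is sketched below. The 90% send-rate threshold, the `min_new_share` cut-off, and the `CaptureWindow`/`Health`/`classify` names are assumptions for illustration. Treating a low total send rate as an encode stall, and counting sent frames as new plus repeat, are inferences from the Perf note rather than things the commit states.

```rust
/// Hypothetical counters behind the `new_fps` / `repeat_fps` diagnostic.
struct CaptureWindow {
    new_frames: u32,    // frames the driver freshly composed in the window
    repeat_frames: u32, // frames sent without a new composition
    secs: f64,
}

#[derive(Debug, PartialEq)]
enum Health {
    Healthy,
    SourceNotCompositing,
    EncodeStall,
}

/// Illustrative thresholds only: "at rate" means >= 90% of `target_fps`.
fn classify(w: &CaptureWindow, target_fps: f64, min_new_share: f64) -> Health {
    let new_fps = w.new_frames as f64 / w.secs;
    let repeat_fps = w.repeat_frames as f64 / w.secs;
    let send_fps = new_fps + repeat_fps;
    if send_fps < 0.9 * target_fps {
        Health::EncodeStall // the pipeline itself is not keeping up
    } else if new_fps < min_new_share * target_fps {
        Health::SourceNotCompositing // sending at rate, but mostly repeats
    } else {
        Health::Healthy
    }
}

fn main() {
    // Shaped like the validation run: ~180 new frames/s inside a 240 fps send rate.
    let run = CaptureWindow { new_frames: 180, repeat_frames: 60, secs: 1.0 };
    println!("{:?}", classify(&run, 240.0, 0.5));
}
```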

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 00:35:52 +02:00
parent c5dab484df
commit e2c9bfd3d9
26 changed files with 2962 additions and 313 deletions
+137 -12
@@ -9,8 +9,25 @@
use std::ffi::c_void;
use std::mem::size_of;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex, Once};
/// Monotonic monitor generation. Each [`create_monitor`] stamps the next value onto the [`Monitor`]
/// and its [`MonitorLease`]s, so a lease whose monitor was already torn down + recreated (the IDD-push
/// reconnect-preempt path) is ignored on drop instead of decrementing the NEW monitor's refcount.
static MON_GEN: AtomicU64 = AtomicU64::new(1);
/// The gen of the CURRENTLY-active monitor. A session capturer captures this at open and re-checks it
/// each frame; when it changes (a reconnect preempted + recreated the monitor), the old session bails
/// IMMEDIATELY instead of lingering on the dead ring's 20s frame deadline — which would otherwise hold
/// its NVENC encoder open and exhaust the GPU's encode-session limit under rapid reconnects.
pub(crate) static CURRENT_MON_GEN: AtomicU64 = AtomicU64::new(0);
/// IDD-push mode: a new client connection preempts + recreates the monitor (single-client reconnect),
/// because a REUSED IddCx monitor's swap-chain is dead. Off → monitors are shared across sessions.
fn idd_push_mode() -> bool {
std::env::var_os("PUNKTFUNK_IDD_PUSH").is_some()
}
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};
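The per-frame generation check described in the `CURRENT_MON_GEN` doc comment reduces to comparing a value remembered at open against the global on every iteration. A minimal, self-contained sketch: `capture_session` and its reconnect simulation are hypothetical, and only the static's name and `Relaxed` ordering mirror the diff.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the diff's `CURRENT_MON_GEN`; a reconnect stores the fresh monitor's gen here.
static CURRENT_MON_GEN: AtomicU64 = AtomicU64::new(0);

/// Hypothetical capture loop: remembers the generation at open and stops as soon as it changes,
/// instead of waiting out the dead ring's frame deadline. Returns how many frames it delivered.
fn capture_session(max_frames: u32, reconnect_at: Option<u32>) -> u32 {
    let opened_gen = CURRENT_MON_GEN.load(Ordering::Relaxed);
    let mut delivered = 0;
    for frame in 0..max_frames {
        if Some(frame) == reconnect_at {
            // Simulates another connection recreating the monitor mid-session.
            CURRENT_MON_GEN.fetch_add(1, Ordering::Relaxed);
        }
        if CURRENT_MON_GEN.load(Ordering::Relaxed) != opened_gen {
            break; // preempted: bail now so the encoder session is released
        }
        delivered += 1;
    }
    delivered
}

fn main() {
    CURRENT_MON_GEN.store(1, Ordering::Relaxed); // first monitor
    let n = capture_session(100, Some(10));
    println!("old session stopped after {n} frames");
}
```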
@@ -27,7 +44,8 @@ use windows::Win32::Devices::Display::{
DISPLAYCONFIG_DEVICE_INFO_GET_SOURCE_NAME, DISPLAYCONFIG_DEVICE_INFO_SET_ADVANCED_COLOR_STATE,
DISPLAYCONFIG_GET_ADVANCED_COLOR_INFO, DISPLAYCONFIG_MODE_INFO, DISPLAYCONFIG_PATH_INFO,
DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME,
QDC_ONLY_ACTIVE_PATHS, SDC_ALLOW_CHANGES, SDC_APPLY, SDC_USE_SUPPLIED_DISPLAY_CONFIG,
QDC_ONLY_ACTIVE_PATHS, SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION,
SDC_SAVE_TO_DATABASE, SDC_USE_SUPPLIED_DISPLAY_CONFIG,
};
use windows::Win32::Foundation::{CloseHandle, HANDLE, LUID};
use windows::Win32::Graphics::Gdi::{
@@ -119,7 +137,9 @@ unsafe fn set_render_adapter(h: HANDLE, luid: LUID) -> Result<()> {
/// Desktop Duplication (e.g. the RTX 4090). Default: the discrete adapter with the most
/// `DedicatedVideoMemory`, skipping WARP / Basic-Render and the SudoVDA software adapter (≈0 VRAM).
/// `PUNKTFUNK_RENDER_ADAPTER=<substring>` forces a match by Description (Apollo's `adapter_name`).
unsafe fn resolve_render_adapter_luid() -> Option<LUID> {
/// `pub(crate)` so the IDD direct-push capturer can create its shared textures on the same discrete
/// GPU it pins here (and where NVENC runs).
pub(crate) unsafe fn resolve_render_adapter_luid() -> Option<LUID> {
use windows::Win32::Graphics::Dxgi::{CreateDXGIFactory1, IDXGIFactory1};
let want = std::env::var("PUNKTFUNK_RENDER_ADAPTER")
.ok()
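The default selection rule in the doc comment (largest `DedicatedVideoMemory`, skipping software adapters, optional substring override) can be expressed over plain data. This sketch assumes a hypothetical `AdapterInfo` record standing in for DXGI adapter descriptions, with the LUID modeled as an `i64`. The case-insensitive match and the `None` result when the override matches nothing are assumptions, since the diff does not show those details.

```rust
/// Hypothetical, DXGI-free view of an adapter description.
#[derive(Debug)]
struct AdapterInfo {
    description: String,
    dedicated_vram: u64,
    luid: i64,
}

fn pick_render_adapter(adapters: &[AdapterInfo], want: Option<&str>) -> Option<i64> {
    if let Some(want) = want {
        // Override: first adapter whose Description contains the substring
        // (case-insensitive here, an assumption). No match yields None in this sketch.
        let want = want.to_lowercase();
        return adapters
            .iter()
            .find(|a| a.description.to_lowercase().contains(&want))
            .map(|a| a.luid);
    }
    adapters
        .iter()
        // Skip software adapters: WARP / Basic Render and ~0-VRAM virtual ones such as SudoVDA.
        .filter(|a| a.dedicated_vram > 0 && !a.description.contains("Basic Render"))
        .max_by_key(|a| a.dedicated_vram)
        .map(|a| a.luid)
}

fn main() {
    let adapters = vec![
        AdapterInfo { description: "Intel(R) UHD Graphics 770".into(), dedicated_vram: 128 << 20, luid: 1 },
        AdapterInfo { description: "NVIDIA GeForce RTX 4090".into(), dedicated_vram: 24 << 30, luid: 2 },
        AdapterInfo { description: "Microsoft Basic Render Driver".into(), dedicated_vram: 0, luid: 3 },
    ];
    println!("{:?}", pick_render_adapter(&adapters, None));
}
```

On a hybrid box this picks the discrete GPU by VRAM, which is the property the IDD-push path relies on when it pins the render adapter to where NVENC runs.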
@@ -497,13 +517,32 @@ unsafe fn isolate_displays_ccd(keep_target_id: u32) -> Option<SavedConfig> {
}
}
if others == 0 {
tracing::info!("display isolate (CCD): SudoVDA target {keep_target_id} already the only active display");
// The virtual path shows active in the CCD database (from set_active_mode's legacy
// ChangeDisplaySettingsExW), but a legacy mode-set does NOT drive the IddCx adapter's
// EVT_IDD_CX_ADAPTER_COMMIT_MODES — and without COMMIT_MODES the OS never calls
// ASSIGN_SWAPCHAIN, so the driver never receives composed frames. Force an explicit CCD
// SetDisplayConfig commit of the (sole) virtual path so the IddCx path actually activates.
// SDC_FORCE_MODE_ENUMERATION makes the OS re-enumerate + re-commit even though the CCD DB
// already lists the path active.
let rc = SetDisplayConfig(
Some(paths.as_slice()),
Some(modes.as_slice()),
SDC_APPLY
| SDC_USE_SUPPLIED_DISPLAY_CONFIG
| SDC_ALLOW_CHANGES
| SDC_SAVE_TO_DATABASE
| SDC_FORCE_MODE_ENUMERATION,
);
tracing::info!("display isolate (CCD): forced CCD re-commit of sole virtual path {keep_target_id} rc={rc:#x} (drives IddCx COMMIT_MODES → ASSIGN_SWAPCHAIN)");
return Some(saved);
}
let rc = SetDisplayConfig(
Some(paths.as_slice()),
Some(modes.as_slice()),
SDC_APPLY | SDC_USE_SUPPLIED_DISPLAY_CONFIG | SDC_ALLOW_CHANGES,
SDC_APPLY
| SDC_USE_SUPPLIED_DISPLAY_CONFIG
| SDC_ALLOW_CHANGES
| SDC_FORCE_MODE_ENUMERATION,
);
if rc == 0 {
tracing::info!("display isolate (CCD): deactivated {others} other display(s) — SudoVDA target {keep_target_id} is now the sole desktop");
@@ -587,6 +626,8 @@ struct Monitor {
stop: Arc<AtomicBool>,
pinger: Option<JoinHandle<()>>,
ccd_saved: Option<SavedConfig>,
/// Generation stamp ([`MON_GEN`]); a [`MonitorLease`] only releases if its gen still matches.
gen: u64,
}
enum MgrState {
@@ -670,6 +711,14 @@ unsafe fn create_monitor(device: isize, mode: Mode, watchdog_s: u32) -> Result<M
// PUNKTFUNK_RENDER_ADAPTER=<name substring> only on a box that genuinely needs steering.
let pinned = if std::env::var("PUNKTFUNK_RENDER_ADAPTER").is_ok() {
unsafe { resolve_render_adapter_luid() }
} else if std::env::var_os("PUNKTFUNK_IDD_PUSH").is_some() {
// P2 direct frame push: the host opens the driver's shared textures AND runs NVENC on the
// RENDER adapter, so on a hybrid box (4090 + iGPU) it MUST be the discrete encoder GPU —
// an iGPU-rendered surface is untouchable by NVENC. pf-vdisplay HONORS SET_RENDER_ADAPTER
// (SudoVDA ignored it), so pin the discrete GPU. The driver also reports the resulting
// render LUID in the shared header, so the host binds correctly even if this is overridden.
tracing::info!("IDD push: pinning the discrete render GPU (SET_RENDER_ADAPTER)");
unsafe { resolve_render_adapter_luid() }
} else {
tracing::info!(
"SudoVDA SET_RENDER_ADAPTER skipped (Apollo-parity: no render pin — avoids cross-GPU \
@@ -735,7 +784,9 @@ unsafe fn create_monitor(device: isize, mode: Mode, watchdog_s: u32) -> Result<M
// (the old `let _ =` swallowed it, which masked exactly this during the bad-state churn).
Err(e) => {
if !warned {
tracing::warn!("SudoVDA keepalive PING failed (control handle lost?): {e:#}");
tracing::warn!(
"SudoVDA keepalive PING failed (control handle lost?): {e:#}"
);
warned = true;
}
}
@@ -796,6 +847,7 @@ unsafe fn create_monitor(device: isize, mode: Mode, watchdog_s: u32) -> Result<M
stop,
pinger: Some(pinger),
ccd_saved,
gen: MON_GEN.fetch_add(1, Ordering::Relaxed),
})
}
}
@@ -894,6 +946,39 @@ fn mgr_acquire(mode: Mode) -> Result<VirtualOutput> {
let device = mgr_ensure_device(&mut g)?;
let watchdog_s = g.watchdog_s;
// IDD-push: a new connection while a monitor is live = a single-client RECONNECT (the prior client
// is gone — IDD-push is one display, no concurrency). A REUSED IddCx monitor's swap-chain is DEAD,
// so joining it would hand the new client a black screen until the old session times out. PREEMPT:
// tear the old monitor down (its Drop restores topology + IOCTL_REMOVEs) and fall through to create
// a FRESH one. The old session's lease is gen-stamped, so its later drop is ignored (mgr_release
// no-op) and can't tear down the new monitor.
if idd_push_mode()
&& matches!(
g.state,
MgrState::Active { .. } | MgrState::Lingering { .. }
)
{
if let MgrState::Active { mon, .. } | MgrState::Lingering { mon, .. } =
std::mem::replace(&mut g.state, MgrState::Idle)
{
tracing::info!(
old_target = mon.target_id,
"IDD-push reconnect — preempting the prior session, recreating a fresh monitor"
);
// teardown() — NOT drop() — sends IOCTL_REMOVE (and restores topology). `Monitor` has NO
// `Drop` impl, so a bare `drop(mon)` orphaned the IddCx monitor in the driver: it was never
// departed, so it kept a live D3D device + a stuck swap-chain processor thread, and these
// accumulated every reconnect (the driver-side churn leak: +1 device, ~36 nvwgf2umx threads,
// ~50 MB VRAM per session, until it choked). teardown frees it via the driver's do_remove.
unsafe { mon.teardown(device) };
// Let the OS finish the ASYNC IddCx monitor departure before the next ADD. A back-to-back
// REMOVE→ADD races the teardown and the ADD IOCTL is rejected (`DeviceIoControl failed`)
// under reconnect churn. Held under the MGR lock, but IDD-push setup is already serialized
// (IDD_SETUP_LOCK), so this only paces the recreate — exactly what a reconnect flood needs.
thread::sleep(Duration::from_millis(400));
}
}
// A live monitor already exists — join it (refcount++). This covers a concurrent session AND the
// build-then-drop overlap of a mid-stream Reconfigure / secure-return (the new lease is taken while
// the old is still held). If the requested mode differs, reconfigure the shared monitor to it so a
@@ -912,11 +997,13 @@ fn mgr_acquire(mode: Mode) -> Result<VirtualOutput> {
);
let pm = Some((mon.mode.width, mon.mode.height, mon.mode.refresh_hz));
let target = mon.target();
let gen = mon.gen;
CURRENT_MON_GEN.store(gen, Ordering::Relaxed);
return Ok(VirtualOutput {
node_id: 0,
preferred_mode: pm,
win_capture: target,
keepalive: Box::new(MonitorLease),
keepalive: Box::new(MonitorLease { gen }),
});
}
@@ -937,12 +1024,14 @@ fn mgr_acquire(mode: Mode) -> Result<VirtualOutput> {
};
let pm = Some((mon.mode.width, mon.mode.height, mon.mode.refresh_hz));
let target = mon.target();
let gen = mon.gen;
CURRENT_MON_GEN.store(gen, Ordering::Relaxed);
g.state = MgrState::Active { mon, refs: 1 };
Ok(VirtualOutput {
node_id: 0,
preferred_mode: pm,
win_capture: target,
keepalive: Box::new(MonitorLease),
keepalive: Box::new(MonitorLease { gen }),
})
}
@@ -966,8 +1055,18 @@ unsafe fn mgr_reconfigure(mon: &mut Monitor, mode: Mode) {
}
/// Release a session's hold: refcount-- ; when the last session leaves, LINGER before teardown.
fn mgr_release() {
/// `gen` is the lease's monitor generation: a STALE lease (its monitor was already torn down +
/// recreated under it — the IDD-push reconnect-preempt path) does nothing, so it can't decrement the
/// CURRENT (fresh) monitor's refcount and tear it down.
fn mgr_release(gen: u64) {
let mut g = MGR.lock().unwrap();
let stale = match &g.state {
MgrState::Active { mon, .. } | MgrState::Lingering { mon, .. } => mon.gen != gen,
MgrState::Idle => true,
};
if stale {
return;
}
g.state = match std::mem::replace(&mut g.state, MgrState::Idle) {
MgrState::Active { mon, refs } if refs > 1 => MgrState::Active {
mon,
@@ -988,6 +1087,28 @@ fn mgr_release() {
};
}
/// Wait (up to `timeout`) for the active monitor to be RELEASED — i.e. the MGR is no longer `Active`
/// (the prior session dropped its lease → `Lingering`/`Idle`). Used by the IDD-push reconnect preempt:
/// after signalling the old session to stop, we wait here so it tears its monitor down CLEANLY (while
/// frames still flow) before we acquire a fresh one — instead of dropping the monitor out from under a
/// still-live session, which churns the driver's ADD/REMOVE path and wedges it under rapid reconnects.
pub(crate) fn wait_for_monitor_released(timeout: Duration) {
let deadline = Instant::now() + timeout;
loop {
if !matches!(MGR.lock().unwrap().state, MgrState::Active { .. }) {
return;
}
if Instant::now() >= deadline {
tracing::warn!(
"IDD-push preempt: prior session didn't release the monitor within {timeout:?} — \
proceeding (mgr_acquire will preempt it)"
);
return;
}
thread::sleep(Duration::from_millis(25));
}
}
/// Background timer (started once): tear down a monitor that has lingered past its deadline (→ Idle),
/// so a physical-screen user gets their screen back after they stop streaming.
fn ensure_linger_timer() {
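`wait_for_monitor_released` polls the MGR state every 25 ms until a deadline. As an alternative, not what this commit does, the same bounded wait can be expressed with `std::sync::Condvar::wait_timeout_while`, which wakes the waiter as soon as the old session releases instead of on the next tick. `Released` is a hypothetical stand-in for the MGR state.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the MGR: `true` while a monitor is Active.
struct Released {
    active: Mutex<bool>,
    changed: Condvar,
}

impl Released {
    /// Blocks until the monitor is released or `timeout` elapses. Returns whether it was
    /// released; on `false` the caller proceeds anyway and lets the acquire path preempt.
    fn wait(&self, timeout: Duration) -> bool {
        let guard = self.active.lock().unwrap();
        let (guard, _timed_out) = self
            .changed
            .wait_timeout_while(guard, timeout, |active| *active)
            .unwrap();
        !*guard
    }

    /// The old session's lease drop: flip the state and wake any waiter.
    fn release(&self) {
        *self.active.lock().unwrap() = false;
        self.changed.notify_all();
    }
}

fn main() {
    let mgr = Arc::new(Released { active: Mutex::new(true), changed: Condvar::new() });
    let old_session = Arc::clone(&mgr);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // the prior session winds down
        old_session.release();
    });
    let released = mgr.wait(Duration::from_secs(2));
    handle.join().unwrap();
    println!("released before deadline: {released}");
}
```

The trade-off is that every release path must call `notify_all`; the polling loop in the diff needs no such cooperation from `mgr_release` or the linger timer.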
@@ -1012,11 +1133,15 @@ fn ensure_linger_timer() {
});
}
/// A session's lease on the shared monitor. Drop releases the refcount (→ linger when it hits 0).
struct MonitorLease;
/// A session's lease on the shared monitor. Drop releases the refcount (→ linger when it hits 0),
/// UNLESS the monitor was already torn down + recreated under it (gen mismatch — the IDD-push
/// reconnect-preempt path), in which case the drop is a no-op so it can't tear down the new monitor.
struct MonitorLease {
gen: u64,
}
impl Drop for MonitorLease {
fn drop(&mut self) {
mgr_release();
mgr_release(self.gen);
}
}
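The generation-stamped lease can be exercised end to end in a toy model. This is a sketch under simplifying assumptions: `Mgr`, `State`, `Lease` and `acquire` are hypothetical, the last release goes straight to `Idle` instead of lingering, and teardown (IOCTL_REMOVE, topology restore) is omitted. It shows the property the doc comments describe: a lease whose monitor was preempted and recreated drops as a no-op.

```rust
use std::sync::{Arc, Mutex};

/// Toy manager state (hypothetical): the real MGR also has `Lingering`.
#[derive(Debug, PartialEq)]
enum State {
    Idle,
    Active { generation: u64, refs: u32 },
}

struct Mgr {
    state: State,
    next_generation: u64,
}

/// A session's hold on the monitor, stamped with the generation it joined.
struct Lease {
    mgr: Arc<Mutex<Mgr>>,
    generation: u64,
}

/// Joins the live monitor, or (first use / IDD-push reconnect) recreates it under a new generation.
fn acquire(mgr: &Arc<Mutex<Mgr>>, preempt: bool) -> Lease {
    let mut g = mgr.lock().unwrap();
    if !preempt {
        if let State::Active { generation, refs } = &mut g.state {
            *refs += 1;
            return Lease { mgr: Arc::clone(mgr), generation: *generation };
        }
    }
    let generation = g.next_generation;
    g.next_generation += 1;
    g.state = State::Active { generation, refs: 1 };
    Lease { mgr: Arc::clone(mgr), generation }
}

impl Drop for Lease {
    fn drop(&mut self) {
        let mut g = self.mgr.lock().unwrap();
        let next = match g.state {
            // Only a lease for the CURRENT generation may decrement the refcount.
            State::Active { generation, refs } if generation == self.generation => {
                if refs > 1 {
                    State::Active { generation, refs: refs - 1 }
                } else {
                    State::Idle // the real manager lingers before teardown
                }
            }
            // Stale lease (monitor preempted + recreated) or already idle: no-op.
            _ => return,
        };
        g.state = next;
    }
}

fn main() {
    let mgr = Arc::new(Mutex::new(Mgr { state: State::Idle, next_generation: 1 }));
    let old = acquire(&mgr, false); // first session: generation 1
    let fresh = acquire(&mgr, true); // IDD-push reconnect preempts: generation 2
    drop(old); // stale lease: ignored
    println!("{:?}", mgr.lock().unwrap().state); // still generation 2, refs 1
    drop(fresh);
    println!("{:?}", mgr.lock().unwrap().state); // Idle
}
```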