feat(windows): pf-vdisplay IDD-push — HDR + pipelined zero-copy capture
apple / swift (push) Successful in 1m4s
windows-host / package (push) Successful in 6m28s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m14s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m10s
release / apple (push) Successful in 7m53s
android / android (push) Successful in 10m33s
ci / web (push) Successful in 44s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 3m4s
ci / docs-site (push) Successful in 53s
ci / rust (push) Successful in 12m22s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m11s
apple / screenshots (push) Successful in 5m24s
deb / build-publish (push) Successful in 3m16s
decky / build-publish (push) Successful in 21s
ci / bench (push) Successful in 4m42s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m34s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m13s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 47s
flatpak / build-publish (push) Successful in 4m24s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m5s
docker / deploy-docs (push) Successful in 25s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m44s
HDR (display-driven, matching the WGC path):
- CTA-861.3 HDR EDID (BT.2020 primaries + HDR Static Metadata block) so Windows offers "Use HDR" on the virtual display. The host FOLLOWS the display's live advanced-color state, recreating the shared ring at the matching format (FP16 in HDR / BGRA in SDR) on a toggle — no freeze.
- Always emit Main10/BT.2020-PQ Rgb10a2 while the display is HDR; the client auto-detects PQ from the HEVC VUI (clients under-report VIDEO_CAP_10BIT). Generic HDR10 mastering SEI on every IDR.
- Generation-tagged `latest` (gen<<40|seq<<8|slot) + driver `is_stale` re-attach kill the toggle-time garbage frame and any stale-ring read.

Perf:
- Pipeline the encode loop (Capturer::pipeline_depth; IDD-push = 2): submit N+1 before polling N so the convert/copy on the 3D engine overlaps the NVENC encode of N on the ASIC. PUNKTFUNK_IDD_DEPTH overrides (1 = synchronous).
- Rotating host output ring (OUT_RING) so the in-flight encode and the next convert never touch the same texture.
- HDR converts directly from the keyed-mutex slot's SRV into the output ring (drops the redundant slot->fp16 scratch copy); SDR copies the BGRA slot in. The slot mutex is held only across the convert/copy, not the encode. RING_LEN 3->6 for publish headroom.
- Capture-health diagnostic: new_fps vs repeat_fps under PUNKTFUNK_PERF (a low new_fps at a high send rate means the source isn't compositing, not an encode stall).

Validated live on the RTX box: 5120x1440@240 HDR streams; driver composes ~180 new fps, encode 240 fps @ ~4.3 ms p50.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
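
The generation-tagged `latest` word from the message above can be sketched as follows. The bit layout is taken only from the shifts quoted in the message (`gen<<40|seq<<8|slot`); the field widths, the function names `pack_latest`, `unpack_latest` and `tag_is_stale`, and the staleness rule are assumptions for illustration, not the driver's actual code.

```rust
// Assumed layout, inferred from `gen<<40|seq<<8|slot`:
//   bits  0..8   ring slot index
//   bits  8..40  frame sequence number
//   bits 40..64  ring generation (bumped when the ring is recreated)

/// Pack generation, sequence and slot into one 64-bit word (hypothetical helper).
pub fn pack_latest(generation: u32, seq: u32, slot: u8) -> u64 {
    // Only the low 24 bits of the generation fit above bit 40.
    ((generation as u64 & 0x00FF_FFFF) << 40) | ((seq as u64) << 8) | slot as u64
}

/// Split a packed word back into (generation, seq, slot).
pub fn unpack_latest(tag: u64) -> (u32, u32, u8) {
    ((tag >> 40) as u32, (tag >> 8) as u32, tag as u8)
}

/// Assumed staleness rule: a tag written for a different ring generation is stale.
pub fn tag_is_stale(tag: u64, current_generation: u32) -> bool {
    unpack_latest(tag).0 != (current_generation & 0x00FF_FFFF)
}
```

Keeping the generation in the same word as the sequence and slot lets a reader reject a frame from a recreated ring with a single load, which is one plausible reading of how the toggle-time garbage frame is avoided.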
@@ -2149,6 +2149,22 @@ fn session_watcher_loop(tx: std::sync::mpsc::Sender<SessionSwitch>, stop: Arc<At
 /// keepalive, the virtual output) while the data-plane `session` continues untouched —
 /// the rebuilt encoder opens with an IDR + in-band parameter sets. `probe_rx`/`probe_result_tx`
 /// carry speed-test bursts (see [`service_probes`]).
+/// The stop flag of the current in-process IDD-push session, so a NEW connection can PREEMPT it.
+/// A fresh connection means the prior client is gone (a reconnect) and a reused IddCx monitor's
+/// swap-chain is dead — so we stop the prior session (it releases its monitor cleanly while frames
+/// still flow), then build a fresh one, instead of joining a dying session or tearing its monitor out
+/// from under it (which churns the driver's ADD/REMOVE path and wedges it under rapid reconnects).
+#[cfg(target_os = "windows")]
+static IDD_SESSION_STOP: std::sync::Mutex<Option<Arc<AtomicBool>>> = std::sync::Mutex::new(None);
+
+/// Serializes IDD-push session SETUP (preempt + monitor create + first frame). Held across setup,
+/// released before the encode loop — so a reconnect FLOOD can never run concurrent monitor
+/// create/teardown (the churn that fails the ADD IOCTL and wedges the driver). Each session finishes
+/// setup before the next acquires this and preempts it, by which point the preempted session is in its
+/// encode loop and releases its monitor promptly.
+#[cfg(target_os = "windows")]
+static IDD_SETUP_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(());
+
 #[allow(clippy::too_many_arguments)]
 fn virtual_stream(
     session: Session,
@@ -2197,9 +2213,30 @@ fn virtual_stream(
         bit_depth,
         "punktfunk/1 virtual display"
     );
+    // IDD-push reconnect preempt: a fresh connection means the prior client is gone. Hold IDD_SETUP_LOCK
+    // across the preempt + pipeline build so a reconnect FLOOD can't run concurrent monitor
+    // create/teardown. Then STOP the prior session (it ends cleanly while its monitor still composites
+    // frames) and WAIT for it to release its monitor, before building a FRESH one — instead of the
+    // driver-churning teardown of a monitor under a still-live session. Register THIS session's stop so
+    // the next reconnect preempts it.
+    #[cfg(target_os = "windows")]
+    let idd_setup_guard = std::env::var_os("PUNKTFUNK_IDD_PUSH")
+        .is_some()
+        .then(|| IDD_SETUP_LOCK.lock().unwrap());
+    #[cfg(target_os = "windows")]
+    if std::env::var_os("PUNKTFUNK_IDD_PUSH").is_some() {
+        let prev = IDD_SESSION_STOP.lock().unwrap().replace(stop.clone());
+        if let Some(prev_stop) = prev {
+            prev_stop.store(true, Ordering::SeqCst);
+            crate::vdisplay::sudovda::wait_for_monitor_released(std::time::Duration::from_secs(3));
+        }
+    }
     let mut vd = crate::vdisplay::open(compositor)?;
     let (mut capturer, mut enc, mut frame, mut interval) =
         build_pipeline_with_retry(&mut vd, mode, bitrate_kbps, bit_depth)?;
+    // Setup done — release the IDD-push setup lock so the next reconnect can begin (and preempt us).
+    #[cfg(target_os = "windows")]
+    drop(idd_setup_guard);
 
     // Windows single-process DDA path (PUNKTFUNK_NO_WGC=1): the SudoVDA virtual display, isolated as the
     // SOLE active output, goes into fullscreen independent-flip (one plane on one display) which Desktop
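
The preempt-and-register step in the hunk above can be reproduced in isolation with plain `std` types. This is a minimal sketch, not the project's code: the names `SESSION_STOP`, `SETUP_LOCK` and `begin_setup` are hypothetical, and the wait for the previous session to release its monitor is left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, MutexGuard};

// Stop flag of the session that currently owns the virtual monitor (hypothetical name).
static SESSION_STOP: Mutex<Option<Arc<AtomicBool>>> = Mutex::new(None);
// Serializes session setup so two reconnects never set up concurrently (hypothetical name).
static SETUP_LOCK: Mutex<()> = Mutex::new(());

/// Take the setup lock, register `stop` as the current session's flag, and signal the
/// previously registered session (if any) to stop. Returns the setup guard, which the
/// caller holds until its pipeline is built, and whether a prior session was preempted.
pub fn begin_setup(stop: Arc<AtomicBool>) -> (MutexGuard<'static, ()>, bool) {
    let guard = SETUP_LOCK.lock().unwrap();
    // `replace` swaps in the new flag and hands back the old one in a single step.
    let prev = SESSION_STOP.lock().unwrap().replace(stop);
    let preempted = match prev {
        Some(prev_stop) => {
            prev_stop.store(true, Ordering::SeqCst);
            true
        }
        None => false,
    };
    (guard, preempted)
}
```

A caller would keep the returned guard alive across its own setup and drop it just before entering its encode loop, mirroring the `drop(idd_setup_guard)` in the diff.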
@@ -2276,6 +2313,17 @@ fn virtual_stream(
     let mut capture_rebuilds: u32 = 0;
     // Last HDR mastering metadata we forwarded — re-sent as 0xCE on change/keyframe (see below).
     let mut last_hdr_meta: Option<punktfunk_core::quic::HdrMeta> = None;
+    // Frames submitted to NVENC but not yet polled (capture_ns, pacing deadline). With a capturer that
+    // hands a fresh output texture per frame, the loop submits N+1 before polling N (pipeline depth > 1),
+    // overlapping the convert/copy of N+1 on the 3D engine with the encode of N on the NVENC ASIC.
+    let mut inflight: std::collections::VecDeque<(u64, std::time::Instant)> =
+        std::collections::VecDeque::new();
+    // Diagnostic: distinguish NEW captured frames (the source produced a fresh frame) from REPEATS (the
+    // loop re-encoded the last frame because `try_latest` had nothing). A low new-frame rate at a high
+    // send rate ⇒ the capture source isn't producing frames (e.g. an IDD virtual display DWM isn't
+    // compositing), NOT an encoder problem. Logged every 2 s when `PUNKTFUNK_PERF`.
+    let (mut diag_new, mut diag_repeat) = (0u64, 0u64);
+    let mut diag_at = std::time::Instant::now();
     while !stop.load(Ordering::SeqCst) && std::time::Instant::now() < deadline {
         // Mid-stream session switch (the box flipped Gaming↔Desktop): rebuild the WHOLE backend in
         // place — a different compositor at the SAME client mode — keeping the Session + send thread
@@ -2384,9 +2432,10 @@ fn virtual_stream(
         match capturer.try_latest() {
             Ok(Some(f)) => {
                 frame = f;
+                diag_new += 1;
                 capture_rebuilds = 0; // a delivered frame clears the consecutive-loss counter
             }
-            Ok(None) => {} // no new frame (static desktop / mid-rebuild) — repeat the last frame
+            Ok(None) => diag_repeat += 1, // no new frame (static desktop / mid-rebuild) — repeat the last
             // The capture source died (PipeWire/compositor thread ended, virtual output gone). Rather
             // than tear the whole session down — the client has no reconnect path and would have to
             // cold-restart the handshake — rebuild the pipeline IN PLACE at the current mode, exactly
@@ -2411,6 +2460,18 @@ fn virtual_stream(
                 next = std::time::Instant::now();
             }
         }
+        if perf && diag_at.elapsed() >= std::time::Duration::from_secs(2) {
+            let secs = diag_at.elapsed().as_secs_f64();
+            tracing::info!(
+                new_fps = format!("{:.0}", diag_new as f64 / secs),
+                repeat_fps = format!("{:.0}", diag_repeat as f64 / secs),
+                "capture diag: NEW frames from the source vs REPEATS (low new_fps at high send rate ⇒ \
+                 the source isn't producing frames, not an encode stall)"
+            );
+            diag_new = 0;
+            diag_repeat = 0;
+            diag_at = std::time::Instant::now();
+        }
         // The source's static HDR mastering metadata (Windows GetDesc1; None on Linux/SDR) is the
         // single source of truth: hand it to the encoder (in-band SEI on keyframes) and, when it
         // changes, to the client (0xCE). Re-sent on each keyframe below so a dropped best-effort
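
The new-versus-repeat accounting in the hunk above boils down to two counters divided by the elapsed window. A small standalone sketch, with the hypothetical type `CaptureDiag` standing in for the loop-local `diag_new` / `diag_repeat` / `diag_at` variables:

```rust
use std::time::Duration;

/// Counts frames the source actually produced versus frames the loop re-encoded
/// because nothing new was available (hypothetical helper type).
#[derive(Default)]
pub struct CaptureDiag {
    new_frames: u64,
    repeats: u64,
}

impl CaptureDiag {
    /// Record one loop iteration: `true` if the capturer delivered a fresh frame.
    pub fn record(&mut self, got_new_frame: bool) {
        if got_new_frame {
            self.new_frames += 1;
        } else {
            self.repeats += 1;
        }
    }

    /// Return (new_fps, repeat_fps) over `elapsed` and reset both counters.
    pub fn take_rates(&mut self, elapsed: Duration) -> (f64, f64) {
        let secs = elapsed.as_secs_f64();
        let rates = if secs > 0.0 {
            (self.new_frames as f64 / secs, self.repeats as f64 / secs)
        } else {
            (0.0, 0.0)
        };
        *self = Self::default();
        rates
    }
}
```

With the figures quoted in the commit message (about 180 new fps at a 240 fps send rate), a 2 s window would hold roughly 360 new frames and 120 repeats, which points at the compositor rather than the encoder as the limit.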
@@ -2421,13 +2482,26 @@ fn virtual_stream(
         if resend_meta {
             last_hdr_meta = hdr_meta;
         }
+        // How deep to pipeline (1 = synchronous submit→poll, the original behaviour). The IDD-push
+        // capturer hands a rotating ring of output textures, so it returns >1; other capturers default 1.
+        let depth = capturer.pipeline_depth().max(1);
         let capture_ns = now_ns();
         enc.submit(&frame).context("encoder submit")?;
-        // The deadline for this frame's packets (the next frame's due time); the send thread paces
-        // up to here so a high-bitrate frame spreads over the interval instead of bursting.
+        // This frame's pacing deadline (the next frame's due time); the send thread spreads a big frame
+        // up to here. Each in-flight frame carries its own (capture_ns, deadline) for when it's polled.
         next += interval;
+        inflight.push_back((capture_ns, next));
+        // Drain the OLDEST in-flight frames, keeping at most depth-1 deferred. At depth 1 this polls
+        // immediately after every submit (synchronous); at depth 2 it polls N right after submitting N+1,
+        // so the encode of N overlaps the convert/copy of N+1. NVENC's `pending` is FIFO, so poll() returns
+        // the oldest submitted frame's AU — matching `inflight.pop_front()`.
         let mut send_gone = false;
-        while let Some(au) = enc.poll().context("encoder poll")? {
+        while inflight.len() >= depth {
+            let au = match enc.poll().context("encoder poll")? {
+                Some(au) => au,
+                None => break, // no AU ready for a submitted frame (shouldn't happen — poll blocks)
+            };
+            let (cap_ns, deadline) = inflight.pop_front().expect("inflight non-empty");
             let flags = if au.keyframe {
                 (FLAG_PIC | FLAG_SOF) as u32
             } else {
@@ -2442,12 +2516,12 @@ fn virtual_stream(
                     resend_meta = false;
                 }
             }
-            let encode_us = (now_ns().saturating_sub(capture_ns) / 1000) as u32;
+            let encode_us = (now_ns().saturating_sub(cap_ns) / 1000) as u32;
             let msg = FrameMsg {
                 data: au.data,
-                capture_ns,
+                capture_ns: cap_ns,
                 flags,
-                deadline: next,
+                deadline,
                 encode_us,
             };
             // Hand to the send thread; this blocks (backpressure) if it's behind. An Err means it
@@ -2466,6 +2540,28 @@ fn virtual_stream(
             None => next = std::time::Instant::now(),
         }
     }
+    // Drain the in-flight tail (the depth-1 frames submitted but not yet polled) so the last frames still
+    // reach the client instead of being dropped on the way out.
+    while let Some((cap_ns, deadline)) = inflight.pop_front() {
+        let Ok(Some(au)) = enc.poll() else { break };
+        let flags = if au.keyframe {
+            (FLAG_PIC | FLAG_SOF) as u32
+        } else {
+            FLAG_PIC as u32
+        };
+        let encode_us = (now_ns().saturating_sub(cap_ns) / 1000) as u32;
+        let msg = FrameMsg {
+            data: au.data,
+            capture_ns: cap_ns,
+            flags,
+            deadline,
+            encode_us,
+        };
+        if frame_tx.send(msg).is_err() {
+            break;
+        }
+        sent += 1;
+    }
     // Signal the send thread to drain + exit (drop the channel), then join it.
     drop(frame_tx);
     let _ = send_thread.join();
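
The submit-ahead loop and the tail drain from the hunks above can be exercised without a GPU by substituting a FIFO queue for the encoder. The sketch below is an illustration under that assumption: `MockEncoder` and `run_pipeline` are hypothetical names, frames are plain integers, and the per-frame capture stamp is just the submit index.

```rust
use std::collections::VecDeque;

/// Stand-in for a FIFO encoder: `poll` returns the oldest submitted frame.
#[derive(Default)]
pub struct MockEncoder {
    pending: VecDeque<u32>,
}

impl MockEncoder {
    pub fn submit(&mut self, frame: u32) {
        self.pending.push_back(frame);
    }
    pub fn poll(&mut self) -> Option<u32> {
        self.pending.pop_front()
    }
}

/// Run `frames` through the encoder at the given pipeline depth. Returns the
/// (frame, capture stamp) pairs in send order and the peak number of frames
/// that were submitted but not yet polled.
pub fn run_pipeline(frames: &[u32], depth: usize) -> (Vec<(u32, u64)>, usize) {
    let depth = depth.max(1);
    let mut enc = MockEncoder::default();
    let mut inflight: VecDeque<u64> = VecDeque::new();
    let mut sent = Vec::new();
    let mut peak = 0;
    for (i, &frame) in frames.iter().enumerate() {
        enc.submit(frame);
        inflight.push_back(i as u64);
        peak = peak.max(inflight.len());
        // Keep at most depth-1 frames deferred: depth 1 polls right away,
        // depth 2 polls frame N only after frame N+1 has been submitted.
        while inflight.len() >= depth {
            let Some(au) = enc.poll() else { break };
            let stamp = inflight.pop_front().expect("inflight non-empty");
            sent.push((au, stamp));
        }
    }
    // Tail drain: flush whatever is still in flight when the loop ends.
    while let Some(stamp) = inflight.pop_front() {
        let Some(au) = enc.poll() else { break };
        sent.push((au, stamp));
    }
    (sent, peak)
}
```

Because each in-flight entry carries its own stamp, the output pairs every frame with the metadata captured at submit time regardless of depth; only the peak in-flight count changes between depth 1 and depth 2.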
@@ -2484,6 +2580,14 @@ fn should_use_helper() -> bool {
     if std::env::var_os("PUNKTFUNK_NO_HELPER").is_some() || crate::capture::wgc_disabled() {
         return false;
     }
+    // IDD direct-push captures IN-PROCESS in Session 0: the pf-vdisplay driver delivers frames to the
+    // SYSTEM host's session via shared memory and NVENC is headless, so no user-session WGC helper is
+    // needed for VIDEO (and a Session-1 helper couldn't open the Session-0 shared textures anyway).
+    // NOTE: input injection (SendInput) from Session 0 can't reach the user's Session-1 desktop yet —
+    // a known follow-up; this path validates the video transport. See docs/windows-virtual-display-rust-port.md.
+    if std::env::var_os("PUNKTFUNK_IDD_PUSH").is_some() {
+        return false;
+    }
     std::env::var_os("PUNKTFUNK_FORCE_HELPER").is_some()
         || crate::capture::wgc_relay::running_as_system()
 }
@@ -2576,6 +2680,15 @@ fn virtual_stream_relay(
     let (mut _keepalive, mut relay, mut target, mut effective_hz) = build(&mut vd, mode)?;
     let mut cur_mode = mode;
 
+    // O3.1: optionally observe the IDD-push ring alongside WGC (WGC = the presentation trigger) to
+    // confirm the 0257 driver pushes frames into a HOST-created ring. Diagnostic only; gated.
+    if std::env::var_os("PUNKTFUNK_IDD_PUSH_OBSERVE").is_some() {
+        crate::capture::idd_push::spawn_observer(
+            target.clone(),
+            Some((cur_mode.width, cur_mode.height, effective_hz)),
+        );
+    }
+
     // The host's own DDA capturer+encoder for the SECURE (Winlogon) desktop, which WGC — and thus the
     // helper — cannot capture. Opened lazily on the first secure transition (so a session that never
     // hits a UAC/lock screen never pays for a second NVENC session), then kept for fast re-switch.
@@ -3014,8 +3127,12 @@ fn build_pipeline(
             "compositor did not honor the requested refresh — encoding at the achieved rate"
         );
     }
-    let mut capturer =
-        crate::capture::capture_virtual_output(vout).context("capture virtual output")?;
+    // HDR vs SDR for the IDD-push conversion: a negotiated 10-bit session (client advertised
+    // VIDEO_CAP_10BIT + host opted in via PUNKTFUNK_10BIT) is our HDR path → BT.2020 PQ Rgb10a2;
+    // otherwise the FP16 IDD frames are converted to 8-bit SDR. (Ignored by non-IDD-push backends,
+    // which auto-detect HDR from the monitor state.)
+    let mut capturer = crate::capture::capture_virtual_output(vout, bit_depth >= 10)
+        .context("capture virtual output")?;
     capturer.set_active(true);
     let frame = capturer.next_frame().context("first frame")?;
     // `bit_depth` is the handshake-negotiated value (8, or 10 = HEVC Main10 when the client