fix(windows): IDD-push audit highs — keyed-mutex timeout, two per-frame leaks, IDD_PUSH knob, pooled-device threading
Five verified findings from the IDD-push/pf-vdisplay deep audit: - Keyed-mutex acquire (BOTH endpoints): AcquireSync returns WAIT_TIMEOUT (0x102) / WAIT_ABANDONED (0x80) as SUCCESS-severity HRESULTs, which the windows-rs Result wrapper erases — a busy slot read as "acquired", so driver and host could race the same ring texture (torn frames) and the designed busy-skip backpressure was dead code. Both sides now classify the raw vtable HRESULT; WAIT_ABANDONED counts as acquired (ownership transfers — refusing it would wedge the slot forever). - Host SDR hot path leaked one ID3D11VideoProcessorInputView per converted frame: the D3D11_VIDEO_PROCESSOR_STREAM ManuallyDrop field suppressed the release after VideoProcessorBlt. Released by hand now, success or not. - Driver leaked IddCx's per-acquire surface reference (from_raw_borrowed on a TRANSFERRED reference — the MS sample Attach/Reset's it): the swap-chain surface set survived swap-chain destruction, the likely true root cause of the ~50 MB-per-reconnect VRAM loss that device pooling only mitigated. Now adopted via from_raw (publisher or not) and dropped pre-Finished. - PUNKTFUNK_IDD_PUSH removed: capture is unconditionally IDD-push, but the vdisplay manager still gated the lingering-monitor preempt (and render pin) on the knob, whose default was OFF — dev/CLI runs reused a lingering monitor whose IddCx swap-chain is dead (black reconnect). The preempt and the render-GPU pin are now unconditional; host.env comments no longer promise the removed DDA/WGC fallback. - Driver D3D device: dropped D3D11_CREATE_DEVICE_SINGLETHREADED (unsound since DEVICE_POOL shares one device across processors) and the pooled immediate context is now SetMultithreadProtected — two concurrent monitors' workers otherwise race an unlocked context (UB in the UMD). No wire-contract change (pf-driver-proto untouched); the driver fixes take effect on the next pf-vdisplay redeploy. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
@@ -1129,8 +1129,14 @@ impl VideoConverter {
|
||||
pInputSurface: std::mem::ManuallyDrop::new(in_view),
|
||||
..Default::default()
|
||||
};
|
||||
self.vctx
|
||||
.VideoProcessorBlt(&self.vp, &out_view, 0, &[stream])
|
||||
.context("VideoProcessorBlt")
|
||||
let blt =
|
||||
self.vctx
|
||||
.VideoProcessorBlt(&self.vp, &out_view, 0, std::slice::from_ref(&stream));
|
||||
// COM in-params never transfer ownership: the Blt only borrowed the input view, and the
|
||||
// struct's `ManuallyDrop` field suppressed its release — drop it by hand, success or not.
|
||||
// (Skipping this leaked one view + its UMD allocation PER CONVERTED FRAME — the SDR hot
|
||||
// path; D3D11 defers the actual destruction until the GPU is done with the blit.)
|
||||
drop(std::mem::ManuallyDrop::into_inner(stream.pInputSurface));
|
||||
blt.context("VideoProcessorBlt")
|
||||
}
|
||||
}
|
||||
|
||||
@@ -10,7 +10,7 @@
|
||||
//! target process's handle table, so the bootstrap's ACL is not load-bearing; the only way to reach the
|
||||
//! frames is to already be one of the two endpoint processes. The driver copies frames in; we consume
|
||||
//! the ring straight into the zero-copy NVENC path — no DXGI Desktop Duplication, no `win32u` hook.
|
||||
//! Gated by `PUNKTFUNK_IDD_PUSH`. Driver counterpart: `packaging/windows/drivers/pf-vdisplay/src/
|
||||
//! The SOLE Windows capture path. Driver counterpart: `packaging/windows/drivers/pf-vdisplay/src/
|
||||
//! frame_transport.rs`. The shared `SharedHeader` layout, `MAGIC`/`VERSION`/`RING_LEN`, the
|
||||
//! `DRV_STATUS_*` codes, the channel-delivery struct and the publish token all come from
|
||||
//! [`pf_driver_proto`] (which OWNS the contract, with `const` size asserts) — both sides `use` it, so
|
||||
@@ -133,7 +133,7 @@ struct HostSlot {
|
||||
shared: OwnedHandle,
|
||||
/// SRV on the slot texture so the HDR path samples the FP16 slot DIRECTLY (no slot→scratch copy);
|
||||
/// the convert pass writes the output ring while holding the slot's keyed mutex. Unused for SDR
|
||||
/// (which CopyResource's the BGRA slot straight to the output).
|
||||
/// (which converts the BGRA slot → NV12 on the video engine, via its own per-frame input view).
|
||||
srv: ID3D11ShaderResourceView,
|
||||
}
|
||||
|
||||
@@ -147,6 +147,13 @@ struct KeyedMutexGuard<'a> {
|
||||
key: u64,
|
||||
}
|
||||
|
||||
/// `WAIT_ABANDONED` as an HRESULT: the driver died while holding the slot's keyed mutex — ownership
|
||||
/// still transferred to this caller. SUCCESS-severity (positive), like `WAIT_TIMEOUT` (0x102): the
|
||||
/// windows-rs `Result` wrapper erases both (`.ok()` maps every non-negative HRESULT to `Ok(())`), so
|
||||
/// acquisition MUST be classified on the raw vtable HRESULT. Mirrors the driver's constants
|
||||
/// (`frame_transport.rs`).
|
||||
const WAIT_ABANDONED_HRESULT: i32 = 0x0000_0080;
|
||||
|
||||
impl<'a> KeyedMutexGuard<'a> {
|
||||
/// Acquire `mutex` at `key`, waiting up to `timeout_ms`. `None` if the acquire times out / errors
|
||||
/// (the caller skips the frame), so the guard is only ever held when the lock is genuinely held.
|
||||
@@ -156,10 +163,19 @@ impl<'a> KeyedMutexGuard<'a> {
|
||||
timeout_ms: u32,
|
||||
) -> Option<KeyedMutexGuard<'a>> {
|
||||
// SAFETY: `mutex` is a live `IDXGIKeyedMutex` on this thread's immediate-context device.
|
||||
if unsafe { mutex.AcquireSync(key, timeout_ms) }.is_err() {
|
||||
return None;
|
||||
// Raw vtable call, NOT the `Result` wrapper: `.is_err()` treated WAIT_TIMEOUT (positive =
|
||||
// `Ok`) as acquired, handing out a guard for a slot the DRIVER still held — converting from
|
||||
// a texture mid-copy (torn frame) and `ReleaseSync`ing a key this side never took.
|
||||
let hr = unsafe {
|
||||
(Interface::vtable(mutex).AcquireSync)(Interface::as_raw(mutex), key, timeout_ms)
|
||||
};
|
||||
match hr.0 {
|
||||
// Acquired — S_OK, or WAIT_ABANDONED (the driver died holding the slot: the lock is
|
||||
// OURS now, and refusing the guard would leave the key held forever, wedging the slot).
|
||||
0 | WAIT_ABANDONED_HRESULT => Some(KeyedMutexGuard { mutex, key }),
|
||||
// WAIT_TIMEOUT (slot busy — the caller skips this frame) or a genuine error: never held.
|
||||
_ => None,
|
||||
}
|
||||
Some(KeyedMutexGuard { mutex, key })
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -5,7 +5,7 @@
|
||||
//! process lifetime**, so a lazily-parsed global is equivalent to "parsed once at startup".
|
||||
//!
|
||||
//! **Goal-1 stages 1–2** (`design/windows-host-rewrite.md` §2.2): stage 1 stood this up; stage 2 migrated the
|
||||
//! genuinely-constant operator/dispatch knobs onto it (the dispatch-disagreement bug class: `idd_push`,
|
||||
//! genuinely-constant operator/dispatch knobs onto it (the dispatch-disagreement bug class:
|
||||
//! `encoder_pref`, `render_adapter`, the vdisplay backend select — plus the plan-named
|
||||
//! `idd_depth`/`zerocopy`/`ten_bit`/`four_four_four` and the multi-site `perf`/`compositor`/
|
||||
//! `video_source`/`gamepad`). `SessionPlan` (stage 3) consumes it as the single owner of the
|
||||
@@ -36,12 +36,6 @@ use std::sync::OnceLock;
|
||||
/// derived `Debug` impl, so the parser can stay a single platform-neutral function.
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct HostConfig {
|
||||
/// `PUNKTFUNK_IDD_PUSH` — IDD direct-push monitor mode (the per-session monitor + ring recreate and
|
||||
/// the discrete-render-GPU pin in [`crate::vdisplay::manager`]). IDD-push is the sole Windows capture
|
||||
/// path (DXGI Desktop Duplication and the WGC relay were removed), so this should stay on — the
|
||||
/// installer's `host.env` sets it. **Value-aware** (`0`/`false`/`no`/`off`/empty ⇒ off, else on);
|
||||
/// unset ⇒ off. NOT a bare presence flag (so an operator can turn it OFF with `=0`).
|
||||
pub idd_push: bool,
|
||||
/// `PUNKTFUNK_ENCODER` — explicit encoder-backend override (lowercased; empty = auto-detect by GPU vendor).
|
||||
pub encoder_pref: String,
|
||||
/// `PUNKTFUNK_RENDER_ADAPTER` — discrete render-GPU pin by description substring (`Some` even when empty:
|
||||
@@ -80,16 +74,9 @@ impl HostConfig {
|
||||
// String value: `var(k).ok()` — `Some` (possibly empty) when set with valid UTF-8, else `None`.
|
||||
let val = |k: &str| std::env::var(k).ok();
|
||||
Self {
|
||||
// Value-aware (not a bare presence flag): the shipped default `host.env` turns it ON, and an
|
||||
// operator turns it OFF with `PUNKTFUNK_IDD_PUSH=0` (a `var_os` presence check would read `=0`
|
||||
// as "on"). Unset ⇒ off (the dev / non-pf-driver default).
|
||||
idd_push: match std::env::var("PUNKTFUNK_IDD_PUSH") {
|
||||
Ok(v) => !matches!(
|
||||
v.trim().to_ascii_lowercase().as_str(),
|
||||
"" | "0" | "false" | "no" | "off"
|
||||
),
|
||||
Err(_) => false,
|
||||
},
|
||||
// (`PUNKTFUNK_IDD_PUSH` was removed: IDD-push is the sole Windows capture path, so the knob
|
||||
// only split dispatch — capture ignored it while the vdisplay manager obeyed it, and `=0`
|
||||
// produced dead-swap-chain reuse on reconnect. A stale setting in an old host.env is ignored.)
|
||||
encoder_pref: std::env::var("PUNKTFUNK_ENCODER")
|
||||
.unwrap_or_default()
|
||||
.to_ascii_lowercase(),
|
||||
|
||||
@@ -240,7 +240,7 @@ impl VirtualDisplayManager {
|
||||
// NOT a reconnect. Preempting Active would tear a live session down AND churn REMOVE→ADD on every
|
||||
// retry — the per-cold-start monitor churn that exhausts the IddCx slot pool and wedges ADD at
|
||||
// 0x80070490. Active falls through to the JOIN path below (refcount++, no ADD).
|
||||
if idd_push_mode() && matches!(*state, MgrState::Lingering { .. }) {
|
||||
if matches!(*state, MgrState::Lingering { .. }) {
|
||||
if let MgrState::Lingering { mon, .. } = std::mem::replace(&mut *state, MgrState::Idle)
|
||||
{
|
||||
tracing::info!(
|
||||
@@ -676,31 +676,15 @@ impl Drop for MonitorLease {
|
||||
}
|
||||
}
|
||||
|
||||
/// IDD-push mode: a new client connection preempts + recreates the monitor (single-client reconnect),
|
||||
/// because a REUSED IddCx monitor's swap-chain is dead. Off → monitors are shared across sessions.
|
||||
fn idd_push_mode() -> bool {
|
||||
crate::config::config().idd_push
|
||||
}
|
||||
|
||||
/// The render-GPU pin decision (backend-neutral): pin the discrete render GPU when explicitly requested,
|
||||
/// or under IDD-push (the host runs NVENC on the render adapter, so it MUST be the discrete encoder GPU
|
||||
/// on a hybrid box). `None` = let the IDD use its natural adapter (Apollo parity — avoids the cross-GPU
|
||||
/// ACCESS_LOST storm SudoVDA hit when pinned).
|
||||
/// The render-GPU pin (backend-neutral): IDD-push — the sole Windows capture path — runs NVENC on the
|
||||
/// render adapter, so it must always be pinned to the selected encoder GPU (a hybrid box would
|
||||
/// otherwise render on the wrong one). The selection itself (web-console preference >
|
||||
/// `PUNKTFUNK_RENDER_ADAPTER` > max VRAM) lives in [`crate::win_adapter::resolve_render_adapter_luid`].
|
||||
/// (This was gated on the removed `PUNKTFUNK_IDD_PUSH` knob — a dispatch disagreement, since capture
|
||||
/// stopped consulting it when DDA/WGC were removed.)
|
||||
fn resolve_render_pin() -> Option<LUID> {
|
||||
// A web-console manual GPU preference pins exactly like PUNKTFUNK_RENDER_ADAPTER: the whole
|
||||
// pipeline (driver render device, capture ring, encoder) must sit on the chosen adapter.
|
||||
let manual_pref = crate::gpu::prefs().get().mode == crate::gpu::GpuMode::Manual;
|
||||
if crate::config::config().render_adapter.is_some() || manual_pref {
|
||||
crate::win_adapter::resolve_render_adapter_luid()
|
||||
} else if crate::config::config().idd_push {
|
||||
tracing::info!("IDD push: pinning the discrete render GPU (SET_RENDER_ADAPTER)");
|
||||
crate::win_adapter::resolve_render_adapter_luid()
|
||||
} else {
|
||||
tracing::info!(
|
||||
"SET_RENDER_ADAPTER skipped (Apollo-parity: no render pin; set PUNKTFUNK_RENDER_ADAPTER=<name> to force one)"
|
||||
);
|
||||
None
|
||||
}
|
||||
tracing::info!("IDD push: pinning the render GPU (SET_RENDER_ADAPTER)");
|
||||
crate::win_adapter::resolve_render_adapter_luid()
|
||||
}
|
||||
|
||||
/// Linger window before a session-less monitor is torn down (default 10 s; `PUNKTFUNK_MONITOR_LINGER_MS`).
|
||||
|
||||
@@ -797,10 +797,9 @@ fn ensure_default_host_env() -> Result<()> {
|
||||
# Force one with nvenc | amf | qsv | sw (software H.264). amf/qsv need an FFmpeg-built host.\n\
|
||||
PUNKTFUNK_ENCODER=auto\n\
|
||||
PUNKTFUNK_VIDEO_SOURCE=virtual\n\
|
||||
# Virtual display = the bundled pf-vdisplay driver; capture from its shared ring (the validated\n\
|
||||
# zero-copy IDD-push path; falls back to DDA if it can't attach). Set PUNKTFUNK_IDD_PUSH=0 to force WGC/DDA.\n\
|
||||
# Virtual display = the bundled pf-vdisplay driver; capture is IDD-push from its shared ring\n\
|
||||
# (the sole capture path — zero-copy, includes the secure desktop; DDA/WGC were removed).\n\
|
||||
PUNKTFUNK_VDISPLAY=pf\n\
|
||||
PUNKTFUNK_IDD_PUSH=1\n\
|
||||
PUNKTFUNK_SECURE_DDA=1\n\
|
||||
RUST_LOG=info\n\
|
||||
\n\
|
||||
|
||||
Reference in New Issue
Block a user