fix(windows): IDD-push audit highs — keyed-mutex timeout, two per-frame leaks, IDD_PUSH knob, pooled-device threading

Five verified findings from the IDD-push/pf-vdisplay deep audit:

- Keyed-mutex acquire (BOTH endpoints): AcquireSync returns WAIT_TIMEOUT
  (0x102) / WAIT_ABANDONED (0x80) as SUCCESS-severity HRESULTs, which the
  windows-rs Result wrapper erases — a busy slot read as "acquired", so
  driver and host could race the same ring texture (torn frames) and the
  designed busy-skip backpressure was dead code. Both sides now classify
  the raw vtable HRESULT; WAIT_ABANDONED counts as acquired (ownership
  transfers — refusing it would wedge the slot forever).
- Host SDR hot path leaked one ID3D11VideoProcessorInputView per converted
  frame: the D3D11_VIDEO_PROCESSOR_STREAM ManuallyDrop field suppressed the
  release after VideoProcessorBlt. Released by hand now, success or not.
- Driver leaked IddCx's per-acquire surface reference: the surface was
  wrapped with from_raw_borrowed although IddCx TRANSFERS that reference
  (the MS sample Attach()es it and Reset()s it). The swap-chain surface set
  therefore survived swap-chain destruction, the likely true root cause of
  the ~50 MB-per-reconnect VRAM loss that device pooling only mitigated.
  Now adopted via from_raw (publisher or not) and dropped pre-Finished.
- PUNKTFUNK_IDD_PUSH removed: capture is unconditionally IDD-push, but the
  vdisplay manager still gated the lingering-monitor preempt (and render
  pin) on the knob, whose default was OFF — dev/CLI runs reused a lingering
  monitor whose IddCx swap-chain is dead (black reconnect). The preempt and
  the render-GPU pin are now unconditional; host.env comments no longer
  promise the removed DDA/WGC fallback.
- Driver D3D device: dropped D3D11_CREATE_DEVICE_SINGLETHREADED (unsound
  since DEVICE_POOL shares one device across processors) and the pooled
  immediate context is now SetMultithreadProtected — two concurrent
  monitors' workers otherwise race an unlocked context (UB in the UMD).
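
A minimal sketch of the classification described in the first finding,
written over plain i32 values so it runs without the windows crate. The
names `AcquireOutcome`, `severity_only` and `classify_acquire` are
hypothetical and not part of the driver; `severity_only` only imitates a
check that treats every non-negative HRESULT as success.

```rust
/// Raw HRESULT values as signed 32-bit integers (same values as the diff's constants).
const S_OK: i32 = 0;
const WAIT_ABANDONED_HRESULT: i32 = 0x0000_0080;
const WAIT_TIMEOUT_HRESULT: i32 = 0x0000_0102;
/// DXGI_ERROR_DEVICE_REMOVED, used here only as a sample failure code.
const DXGI_ERROR_DEVICE_REMOVED: i32 = 0x887A_0005_u32 as i32;

/// Hypothetical three-way outcome of a 0 ms keyed-mutex try-acquire.
#[derive(Debug, PartialEq, Eq)]
enum AcquireOutcome {
    Acquired,
    Busy,
    Failed(i32),
}

/// Hypothetical stand-in for a severity-only check: any non-negative code is "success".
fn severity_only(hr: i32) -> Result<(), i32> {
    if hr >= 0 { Ok(()) } else { Err(hr) }
}

/// Classify the raw code instead of its severity bit.
fn classify_acquire(hr: i32) -> AcquireOutcome {
    match hr {
        // S_OK, or the previous owner died: ownership transferred either way.
        S_OK | WAIT_ABANDONED_HRESULT => AcquireOutcome::Acquired,
        // The other side still holds the slot: skip it.
        WAIT_TIMEOUT_HRESULT => AcquireOutcome::Busy,
        // Anything else is treated as a failure, as in the diff's catch-all arm.
        other => AcquireOutcome::Failed(other),
    }
}
```

Through the severity-only check a busy slot and a real acquire look
identical, which is why the busy-skip arm could never run.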

No wire-contract change (pf-driver-proto untouched); the driver fixes take
effect on the next pf-vdisplay redeploy.
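
For the second finding, a sketch of why a ManuallyDrop field leaks one
reference per frame unless it is released by hand. `FakeView`,
`FakeStream` and `convert_one_frame` are hypothetical stand-ins, not the
host code; the live counter plays the role of the COM reference count.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical stand-in for an interface pointer: dropping it releases one reference.
struct FakeView {
    live: Rc<Cell<u32>>,
}

impl FakeView {
    fn new(live: &Rc<Cell<u32>>) -> Self {
        live.set(live.get() + 1);
        FakeView { live: Rc::clone(live) }
    }
}

impl Drop for FakeView {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

/// Hypothetical stand-in for a struct that holds its view in a ManuallyDrop field.
struct FakeStream {
    input_view: ManuallyDrop<Option<FakeView>>,
}

/// Creates one view per call; releases it only when `release_by_hand` is true.
fn convert_one_frame(live: &Rc<Cell<u32>>, release_by_hand: bool) {
    let mut stream = FakeStream {
        input_view: ManuallyDrop::new(Some(FakeView::new(live))),
    };
    // ... the blit would happen here, success or not ...
    if release_by_hand {
        // SAFETY: the field is dropped exactly once and never used afterwards.
        unsafe { ManuallyDrop::drop(&mut stream.input_view) };
    }
    // `stream` goes out of scope here; ManuallyDrop suppresses the field's Drop.
}
```

Calling it with `release_by_hand = false` leaves the counter one higher
after every call, which is the per-frame leak in miniature.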

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:27:13 +00:00
parent fbf3fea0c8
commit 0da9d8ec10
10 changed files with 119 additions and 79 deletions
@@ -38,7 +38,12 @@ use windows::Win32::System::Threading::SetEvent;
use windows::core::Interface;
/// `WAIT_TIMEOUT` as an HRESULT — `AcquireSync` returns this when the slot is held by the consumer.
+/// SUCCESS-severity (positive), so the windows-rs `Result` wrapper can never surface it (`.ok()` maps
+/// every non-negative HRESULT to `Ok(())`) — the publish loop reads the raw vtable HRESULT instead.
const WAIT_TIMEOUT_HRESULT: i32 = 0x0000_0102;
+/// `WAIT_ABANDONED` as an HRESULT — the host died while holding the slot's keyed mutex. Also
+/// SUCCESS-severity, and ownership DID transfer to the caller.
+const WAIT_ABANDONED_HRESULT: i32 = 0x0000_0080;
/// One monitor's sealed-channel bootstrap: the handle VALUES the host duplicated into THIS process
/// (`IOCTL_SET_FRAME_CHANNEL`). Owning a `FrameChannel` means owning those handles — exactly one of
@@ -375,9 +380,18 @@ impl FramePublisher {
let slot = (start + attempt) % ring_len;
let s = &self.slots[slot as usize];
// SAFETY: `s.mutex` is the live keyed mutex on this ring slot's shared texture; a 0 ms
-// try-acquire of key 0 (released below or on WAIT_TIMEOUT it's never held).
-match unsafe { s.mutex.AcquireSync(0, 0) } {
-Ok(()) => {
+// try-acquire of key 0 (released below; on WAIT_TIMEOUT it's never held). Raw vtable
+// call, NOT the `Result` wrapper: `.ok()` erases success codes, so through `Result` a
+// WAIT_TIMEOUT (host holds the slot) is indistinguishable from a real acquire — the
+// wrapper made the busy-skip arm below dead code and had us copying into (and
+// publishing) a slot the host was still reading.
+let hr = unsafe {
+(Interface::vtable(&s.mutex).AcquireSync)(Interface::as_raw(&s.mutex), 0, 0)
+};
+match hr.0 {
+// Acquired — S_OK, or WAIT_ABANDONED (the host died holding the slot: ownership
+// still transferred; publish normally, a dead host consumes nothing either way).
+0 | WAIT_ABANDONED_HRESULT => {
// STRAIGHT-LINE, NO `?` between acquire + release — a `?`-return here would leak the
// keyed-mutex lock and wedge the host on this slot. The ordering below is load-bearing:
// the CopyResource is GPU-ordered before the consumer via the slot keyed mutex, and the
@@ -409,8 +423,10 @@ impl FramePublisher {
self.next = (slot + 1) % ring_len;
return;
}
-Err(e) if e.code().0 == WAIT_TIMEOUT_HRESULT => continue,
-Err(_) => return,
+// Busy — the host holds this slot (the designed backpressure): try the next one.
+WAIT_TIMEOUT_HRESULT => continue,
+// Genuine failure (negative HRESULT — device removed / invalid call): drop the frame.
+_ => return,
}
}
// All slots busy — drop this frame (never block the swap-chain thread).