fix(windows): IDD-push audit highs — keyed-mutex timeout, two per-frame leaks, IDD_PUSH knob, pooled-device threading

Five verified findings from the IDD-push/pf-vdisplay deep audit:

- Keyed-mutex acquire (BOTH endpoints): AcquireSync returns WAIT_TIMEOUT
  (0x102) / WAIT_ABANDONED (0x80) as SUCCESS-severity HRESULTs, which the
  windows-rs Result wrapper erases — a busy slot read as "acquired", so
  driver and host could race the same ring texture (torn frames) and the
  designed busy-skip backpressure was dead code. Both sides now classify
  the raw vtable HRESULT; WAIT_ABANDONED counts as acquired (ownership
  transfers — refusing it would wedge the slot forever).
- Host SDR hot path leaked one ID3D11VideoProcessorInputView per converted
  frame: the D3D11_VIDEO_PROCESSOR_STREAM ManuallyDrop field suppressed the
  release after VideoProcessorBlt. Released by hand now, success or not.
- Driver leaked IddCx's per-acquire surface reference: every successful
  acquire TRANSFERS one reference, but the driver wrapped it with
  from_raw_borrowed and never released it (the MS sample Attaches it, then
  Resets). The swap-chain surface set therefore survived swap-chain
  destruction, the likely true root cause of the ~50 MB-per-reconnect VRAM
  loss that device pooling only mitigated. Now adopted via from_raw
  (publisher or not) and dropped pre-Finished.
- PUNKTFUNK_IDD_PUSH removed: capture is unconditionally IDD-push, but the
  vdisplay manager still gated the lingering-monitor preempt (and render
  pin) on the knob, whose default was OFF — dev/CLI runs reused a lingering
  monitor whose IddCx swap-chain is dead (black reconnect). The preempt and
  the render-GPU pin are now unconditional; host.env comments no longer
  promise the removed DDA/WGC fallback.
- Driver D3D device: dropped D3D11_CREATE_DEVICE_SINGLETHREADED (unsound
  since DEVICE_POOL shares one device across processors) and the pooled
  immediate context is now SetMultithreadProtected — two concurrent
  monitors' workers otherwise race an unlocked context (UB in the UMD).
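
A minimal sketch of the classification described in the keyed-mutex finding,
using plain i32 values in place of the real HRESULT type so it runs without
the windows crate. AcquireOutcome, classify_acquire and severity_only_ok are
hypothetical names for illustration, not identifiers from this commit;
severity_only_ok only models a success check that looks at the sign bit.

```rust
/// Raw HRESULT values, as an i32, that a keyed-mutex acquire can return.
const S_OK: i32 = 0;
const WAIT_ABANDONED_HRESULT: i32 = 0x0000_0080;
const WAIT_TIMEOUT_HRESULT: i32 = 0x0000_0102;

/// Hypothetical three-way outcome of one try-acquire.
#[derive(Debug, PartialEq, Eq)]
enum AcquireOutcome {
    /// The caller now owns the slot (S_OK, or WAIT_ABANDONED: ownership transferred).
    Acquired,
    /// The other side still holds the slot; skip it and try the next one.
    Busy,
    /// Anything else is treated as a failure; carries the raw code.
    Failed(i32),
}

/// Classify the raw HRESULT instead of trusting a severity-only check.
fn classify_acquire(hr: i32) -> AcquireOutcome {
    match hr {
        S_OK | WAIT_ABANDONED_HRESULT => AcquireOutcome::Acquired,
        WAIT_TIMEOUT_HRESULT => AcquireOutcome::Busy,
        other => AcquireOutcome::Failed(other),
    }
}

/// Models a severity-only success check: every non-negative code becomes Ok(()).
fn severity_only_ok(hr: i32) -> Result<(), i32> {
    if hr >= 0 { Ok(()) } else { Err(hr) }
}
```

Under these assumptions severity_only_ok(WAIT_TIMEOUT_HRESULT) is Ok(()), which
is how a busy slot came to read as acquired, while classify_acquire reports
Busy for the same value.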

No wire-contract change (pf-driver-proto untouched); the driver fixes take
effect on the next pf-vdisplay redeploy.
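
The two per-frame leaks above share one shape: a reference that was handed to
the caller but never released. The sketch below models that with a plain
counter rather than real COM objects; Ref, acquire, use_as_borrowed,
use_as_adopted and outstanding_after are hypothetical names, and the counter
merely stands in for a COM reference count.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for an owned COM reference: dropping it releases one count.
struct Ref<'a>(&'a Cell<u32>);

impl Drop for Ref<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() - 1);
    }
}

/// Models an API that TRANSFERS one reference to the caller on every acquire.
fn acquire(count: &Cell<u32>) -> Ref<'_> {
    count.set(count.get() + 1);
    Ref(count)
}

/// Treats the transferred reference as borrowed: Drop never runs, so it leaks.
fn use_as_borrowed(count: &Cell<u32>) {
    let r = ManuallyDrop::new(acquire(count));
    let _ = &r; // the surface would be used here
}

/// Adopts the transferred reference: it is released when `r` goes out of scope.
fn use_as_adopted(count: &Cell<u32>) {
    let r = acquire(count);
    let _ = &r; // the surface would be used here
}

/// Outstanding references after `frames` acquires under either policy.
fn outstanding_after(frames: u32, adopt: bool) -> u32 {
    let count = Cell::new(0);
    for _ in 0..frames {
        if adopt { use_as_adopted(&count) } else { use_as_borrowed(&count) }
    }
    count.get()
}
```

In this model the borrowed policy leaves one outstanding reference per frame,
while the adopting policy returns to zero after every frame.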

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:27:13 +00:00
parent fbf3fea0c8
commit 0da9d8ec10
10 changed files with 119 additions and 79 deletions
@@ -6,27 +6,30 @@
 //! to the wdk-sys IddCx world happens via raw pointers in `swap_chain_processor.rs`.
 //!
 //! STEP 5 binds this device to the swap-chain to keep the monitor a live display; STEP 6 reuses the
-//! device's immediate context in the frame publisher's `CopyResource` (both on the swap-chain processor
-//! thread, the one thread this device is touched from).
+//! device's immediate context in the frame publisher's `CopyResource` on each swap-chain processor
+//! thread. The device is POOLED across processors (one per render LUID, [`pooled_device`]), so with
+//! two live monitors two worker threads share it concurrently — creation must NOT pass
+//! `D3D11_CREATE_DEVICE_SINGLETHREADED` (that was sound only pre-pooling, device-per-processor), and
+//! the immediate context is `SetMultithreadProtected` (it has no internal locking of its own).
 use std::sync::atomic::{AtomicI32, Ordering};
 use std::sync::{Arc, Mutex};
 use windows::{
 Win32::{
-Foundation::LUID,
+Foundation::{BOOL, LUID},
 Graphics::{
 Direct3D::D3D_DRIVER_TYPE_UNKNOWN,
 Direct3D11::{
 D3D11_CREATE_DEVICE_BGRA_SUPPORT,
 D3D11_CREATE_DEVICE_PREVENT_ALTERING_LAYER_SETTINGS_FROM_REGISTRY,
-D3D11_CREATE_DEVICE_SINGLETHREADED, D3D11_SDK_VERSION, D3D11CreateDevice,
-ID3D11Device, ID3D11DeviceContext,
+D3D11_SDK_VERSION, D3D11CreateDevice, ID3D11Device, ID3D11DeviceContext,
+ID3D11Multithread,
 },
 Dxgi::{CreateDXGIFactory2, DXGI_CREATE_FACTORY_FLAGS, IDXGIAdapter1, IDXGIFactory5},
 },
 },
-core::Error,
+core::{Error, Interface},
 };
#[derive(thiserror::Error, Debug)]
@@ -53,8 +56,10 @@ pub struct Direct3DDevice {
 _dxgi_factory: IDXGIFactory5,
 _adapter: IDXGIAdapter1,
 pub device: ID3D11Device,
-/// The single (SINGLETHREADED) immediate context — used by STEP 6's frame-push publisher's
-/// `CopyResource` on the swap-chain processor thread (the one thread this device is touched from).
+/// The shared immediate context — used by STEP 6's frame-push publisher's `CopyResource` on each
+/// swap-chain processor thread. Pooled across processors, so it is `SetMultithreadProtected` at
+/// init: an immediate context has no internal locking, and two concurrent monitors' workers would
+/// otherwise race it (undefined behavior inside the UMD).
 pub device_context: ID3D11DeviceContext,
 }
@@ -77,8 +82,10 @@ impl Direct3DDevice {
 &adapter,
 D3D_DRIVER_TYPE_UNKNOWN,
 None,
+// NO `D3D11_CREATE_DEVICE_SINGLETHREADED`: the DEVICE_POOL shares this device (and
+// its immediate context) across every swap-chain processor on the LUID, so the
+// single-caller guarantee that flag declares no longer holds with >1 monitor.
 D3D11_CREATE_DEVICE_BGRA_SUPPORT
-| D3D11_CREATE_DEVICE_SINGLETHREADED
 | D3D11_CREATE_DEVICE_PREVENT_ALTERING_LAYER_SETTINGS_FROM_REGISTRY,
 None,
 D3D11_SDK_VERSION,
@@ -91,6 +98,23 @@ impl Direct3DDevice {
 let device = device.ok_or("ID3D11Device not found")?;
 let device_context = device_context.ok_or("ID3D11DeviceContext not found")?;
+// The pool hands this device (and its immediate context) to every processor on the LUID, and
+// an immediate context is not thread-safe by itself — turn on the runtime's per-call critical
+// section. (D3D11.4 interface, guaranteed on the Win11-22H2 OS floor; if the cast ever fails
+// we log and continue — a single monitor is still safe, concurrent ones would not be.)
+match device_context.cast::<ID3D11Multithread>() {
+Ok(mt) => {
+// SAFETY: plain setter on the live context's multithread interface; the returned
+// previous-state BOOL carries no obligation.
+unsafe {
+let _ = mt.SetMultithreadProtected(BOOL::from(true));
+}
+}
+Err(e) => dbglog!(
+"[pf-vd] ID3D11Multithread unavailable ({e:?}) — immediate context left unprotected"
+),
+}
 let live = LIVE_DEVICES.fetch_add(1, Ordering::Relaxed) + 1;
 dbglog!("[pf-vd] Direct3DDevice::init OK — live D3D devices = {live}");
@@ -38,7 +38,12 @@ use windows::Win32::System::Threading::SetEvent;
 use windows::core::Interface;
 /// `WAIT_TIMEOUT` as an HRESULT — `AcquireSync` returns this when the slot is held by the consumer.
+/// SUCCESS-severity (positive), so the windows-rs `Result` wrapper can never surface it (`.ok()` maps
+/// every non-negative HRESULT to `Ok(())`) — the publish loop reads the raw vtable HRESULT instead.
 const WAIT_TIMEOUT_HRESULT: i32 = 0x0000_0102;
+/// `WAIT_ABANDONED` as an HRESULT — the host died while holding the slot's keyed mutex. Also
+/// SUCCESS-severity, and ownership DID transfer to the caller.
+const WAIT_ABANDONED_HRESULT: i32 = 0x0000_0080;
 /// One monitor's sealed-channel bootstrap: the handle VALUES the host duplicated into THIS process
 /// (`IOCTL_SET_FRAME_CHANNEL`). Owning a `FrameChannel` means owning those handles — exactly one of
@@ -375,9 +380,18 @@ impl FramePublisher {
 let slot = (start + attempt) % ring_len;
 let s = &self.slots[slot as usize];
 // SAFETY: `s.mutex` is the live keyed mutex on this ring slot's shared texture; a 0 ms
-// try-acquire of key 0 (released below or on WAIT_TIMEOUT it's never held).
-match unsafe { s.mutex.AcquireSync(0, 0) } {
-Ok(()) => {
+// try-acquire of key 0 (released below; on WAIT_TIMEOUT it's never held). Raw vtable
+// call, NOT the `Result` wrapper: `.ok()` erases success codes, so through `Result` a
+// WAIT_TIMEOUT (host holds the slot) is indistinguishable from a real acquire — the
+// wrapper made the busy-skip arm below dead code and had us copying into (and
+// publishing) a slot the host was still reading.
+let hr = unsafe {
+(Interface::vtable(&s.mutex).AcquireSync)(Interface::as_raw(&s.mutex), 0, 0)
+};
+match hr.0 {
+// Acquired — S_OK, or WAIT_ABANDONED (the host died holding the slot: ownership
+// still transferred; publish normally, a dead host consumes nothing either way).
+0 | WAIT_ABANDONED_HRESULT => {
 // STRAIGHT-LINE, NO `?` between acquire + release — a `?`-return here would leak the
 // keyed-mutex lock and wedge the host on this slot. The ordering below is load-bearing:
 // the CopyResource is GPU-ordered before the consumer via the slot keyed mutex, and the
@@ -409,8 +423,10 @@ impl FramePublisher {
 self.next = (slot + 1) % ring_len;
 return;
 }
-Err(e) if e.code().0 == WAIT_TIMEOUT_HRESULT => continue,
-Err(_) => return,
+// Busy — the host holds this slot (the designed backpressure): try the next one.
+WAIT_TIMEOUT_HRESULT => continue,
+// Genuine failure (negative HRESULT — device removed / invalid call): drop the frame.
+_ => return,
 }
 }
 // All slots busy — drop this frame (never block the swap-chain thread).
@@ -342,19 +342,28 @@ impl SwapChainProcessor {
 logged_frame = true;
 }
 // STEP 6: copy the acquired surface into the shared ring BEFORE FinishedProcessingFrame
-// (the surface is valid until the next ReleaseAndAcquire). The pointer is BORROWED —
-// `from_raw_borrowed` does NOT take IddCx's refcount — and the GPU-side copy is ordered
-// before the consumer via the slot keyed mutex. (Attach happens at the loop top.)
-if let Some(p) = publisher.as_mut() {
+// (the surface is valid until the next ReleaseAndAcquire). Every successful acquire
+// TRANSFERS one surface reference to the driver — the MS sample `Attach`es it into a
+// ComPtr and `Reset`s BEFORE FinishedProcessingFrame, warning that a driver which
+// "forgets to release the reference" leaves the surfaces alive after the swap-chain is
+// destroyed. Holding it (the old `from_raw_borrowed`) leaked the swap-chain's whole
+// surface set per assign/unassign cycle (reconnect, mode change, HDR flip) — so adopt
+// the reference UNCONDITIONALLY (publisher or not); it is released when `res` drops at
+// the end of this block. (Publisher attach happens at the loop top.)
+{
 let raw = buffer.MetaData.pSurface as *mut core::ffi::c_void;
 if !raw.is_null() {
-// SAFETY: `raw` is IddCx's live surface pointer (valid until the next
-// ReleaseAndAcquire); `from_raw_borrowed` does not consume the refcount.
-if let Some(res) = unsafe { IDXGIResource::from_raw_borrowed(&raw) }
+// SAFETY: `raw` is the live surface IddCx just handed us, carrying the acquire's
+// transferred reference; `from_raw` adopts exactly that reference (released on
+// drop, below — the queued GPU copy is unaffected: D3D defers destruction, and
+// the copy is ordered before the consumer via the slot keyed mutex).
+let res = unsafe { IDXGIResource::from_raw(raw) };
+if let Some(p) = publisher.as_mut()
 && let Ok(tex) = res.cast::<ID3D11Texture2D>()
 {
 p.publish(&tex);
 }
+// `res` drops here → the acquire's surface reference is released, pre-Finished.
 }
 }