feat(windows-drivers): STEP 5 — SwapChainProcessor + Direct3DDevice (swap-chain drain)

The pf-vdisplay driver now consumes the OS swap-chain, so a virtual monitor is a usable
display rather than a stalled one. It compiles and loads on-glass (no regression: the adapter
still inits, Status=OK). Adversarial review found no blockers, and the leak/deadlock invariants
are preserved.

- new swap_chain_processor.rs: a worker thread (MMCSS "Distribution") that binds the render D3D
  device (IddCxSwapChainSetDevice, single-borrow 60x@50ms retry) then drains the swap-chain
  (ReleaseAndAcquireBuffer2 -> FinishedProcessingFrame; E_PENDING waits 16ms on the surface
  event). NO frame publisher yet (STEP 6). Drop is RAII terminate+join; the top-of-loop terminate
  check is load-bearing (the oracle's reconnect-leak fix). Fixed a Rust-2021 disjoint-capture bug:
  `.0` field access bypassed the Sendable Send wrapper -> rebind each wrapper whole.
- new direct_3d_device.rs: CreateDXGIFactory2 -> EnumAdapterByLuid(render LUID) -> D3D11CreateDevice;
  a single-slot DEVICE_POOL holding one Arc<Direct3DDevice> keyed by the render LUID (the
  NVIDIA-UMD-worker-thread leak fix).
- monitor.rs: MonitorObject gains swap_chain_processor; set/take helpers return it for the caller
  to drop OUTSIDE the MONITOR_MODES lock (dropping joins the worker — must never happen under the
  lock); remove_monitor/clear_all drop it before IddCxMonitorDeparture.
- callbacks.rs: assign_swap_chain spawns the processor (pooled device per RenderAdapterLuid;
  WdfObjectDelete on D3D-init failure so the OS retries); unassign_swap_chain drops it. Fixed the
  stale `panic = "abort"` doc (workspace is unwind; the extern "C" boundary aborts on unwind).
- Cargo.toml: windows 0.58 + thiserror (both already resolved in the driver lock). The 3 needed
  swap-chain DDIs were already wrapped in wdk-iddcx; their HRESULT-shaped NTSTATUS is classified
  by hand (hr>=0 success, 0x8000000A E_PENDING).
- Also rustfmt'd the whole driver workspace (it had never been driver-fmt'd).
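
The hand classification of HRESULT-shaped statuses mentioned in the Cargo.toml item can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `DrainStatus` and `classify` are hypothetical names, and only the two rules stated above (hr >= 0 is success, 0x8000000A is E_PENDING) are taken from the commit.

```rust
/// Hypothetical outcome type for an HRESULT-shaped status (not a name from the driver).
#[derive(Debug, PartialEq, Eq)]
enum DrainStatus {
    /// Any non-negative value counts as success.
    Success,
    /// No frame is ready yet; the caller would wait on the surface event and retry.
    Pending,
    /// Any other negative value is treated as a failure and carried along for logging.
    Failure(i32),
}

/// E_PENDING (0x8000000A) reinterpreted as a signed 32-bit value.
const E_PENDING: i32 = 0x8000_000A_u32 as i32;

/// Classify a raw status using the rules from the commit message.
fn classify(hr: i32) -> DrainStatus {
    if hr >= 0 {
        DrainStatus::Success
    } else if hr == E_PENDING {
        DrainStatus::Pending
    } else {
        DrainStatus::Failure(hr)
    }
}
```

A drain loop would match on `classify(status)` and only take the 16ms wait path for `DrainStatus::Pending`; how failures are handled is left open here.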

Built via the ultracode flow: STEP-5 map workflow -> agent-implement -> box build (caught the
Send-capture bug) -> adversarial-verify-agent -> deploy (loads). Session-1 on-glass validation
(the drain loop servicing an ACTIVE monitor) is the next gate — assign_swap_chain only fires
under an interactive session. Note for STEP 6: target_id_for_object still uses the MONITOR_MODES
handle lookup, which the oracle moved to a WDF context; revisit this before target_id keys the
shared frame ring.
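
For readers unfamiliar with the Send-capture bug the box build caught, the sketch below shows the general shape of the problem and the rebinding fix. It is an illustration under assumptions: this `Sendable` is a hypothetical stand-in for the driver's wrapper, and `SAMPLE` and `read_on_worker` exist only for the example.

```rust
use std::thread;

/// Hypothetical stand-in for the driver's Send wrapper around a raw pointer or handle.
struct Sendable<T>(T);

// SAFETY (assumption of this sketch): the wrapped pointer is only dereferenced on the
// worker thread, and the pointee outlives that thread.
unsafe impl<T> Send for Sendable<T> {}

/// Sample data with a 'static lifetime so the raw pointer stays valid.
static SAMPLE: i32 = 7;

/// Moves a raw pointer into a worker thread through the wrapper and reads it there.
fn read_on_worker(value: &'static i32) -> i32 {
    let wrapped = Sendable(value as *const i32);
    thread::spawn(move || {
        // Rebinding the whole wrapper makes the closure capture `Sendable<*const i32>`,
        // which is Send. Under Rust 2021 disjoint capture, using `wrapped.0` directly
        // would capture only the `*const i32` field, which is not Send, so
        // `thread::spawn` would reject the closure.
        let wrapped = wrapped;
        unsafe { *wrapped.0 }
    })
    .join()
    .expect("worker thread panicked")
}
```

Calling `read_on_worker(&SAMPLE)` returns the value read on the worker thread. Deleting the `let wrapped = wrapped;` line is enough to reproduce the compile error on the 2021 edition.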

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:29:20 +00:00
parent 024e709191
commit d8a453f6ca
10 changed files with 705 additions and 48 deletions
@@ -0,0 +1,139 @@
//! The render-side D3D11 device the swap-chain processor binds to the IddCx swap-chain (STEP 5).
//!
//! Ported verbatim from the proven oracle (`packaging/windows/vdisplay-driver/pf-vdisplay/src/
//! direct_3d_device.rs` + the `DEVICE_POOL`/`pooled_device` that lived in its `context.rs`). The
//! D3D/DXGI types come from the `windows` crate (refcounted COM, no manual Drop); the swap-chain/LUID hand-off
//! to the wdk-sys IddCx world happens via raw pointers in `swap_chain_processor.rs`.
//!
//! STEP 5 only DRAINS the swap-chain to keep the monitor a live display — there is no frame publisher,
//! so the device's immediate context is unused here (it returns to use in STEP 6's `CopyResource`).
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::{Arc, Mutex};
use windows::{
    Win32::{
        Foundation::LUID,
        Graphics::{
            Direct3D::D3D_DRIVER_TYPE_UNKNOWN,
            Direct3D11::{
                D3D11_CREATE_DEVICE_BGRA_SUPPORT,
                D3D11_CREATE_DEVICE_PREVENT_ALTERING_LAYER_SETTINGS_FROM_REGISTRY,
                D3D11_CREATE_DEVICE_SINGLETHREADED, D3D11_SDK_VERSION, D3D11CreateDevice,
                ID3D11Device, ID3D11DeviceContext,
            },
            Dxgi::{CreateDXGIFactory2, DXGI_CREATE_FACTORY_FLAGS, IDXGIAdapter1, IDXGIFactory5},
        },
    },
    core::Error,
};

#[derive(thiserror::Error, Debug)]
pub enum Direct3DError {
    #[error("Direct3DError({0:?})")]
    Win32(#[from] Error),
    #[error("Direct3DError(\"{0}\")")]
    Other(&'static str),
}

impl From<&'static str> for Direct3DError {
    fn from(value: &'static str) -> Self {
        Direct3DError::Other(value)
    }
}

/// DIAGNOSTIC: live `Direct3DDevice` count. Each one holds an `ID3D11Device` whose NVIDIA UMD spawns
/// ~dozens of worker threads; if this climbs without bound across reconnects, devices are leaking.
pub static LIVE_DEVICES: AtomicI32 = AtomicI32::new(0);

#[derive(Debug)]
pub struct Direct3DDevice {
    // The following are already refcounted, so they're safe to use directly without additional drop impls
    _dxgi_factory: IDXGIFactory5,
    _adapter: IDXGIAdapter1,
    pub device: ID3D11Device,
    /// The single (SINGLETHREADED) immediate context — used by STEP 6's frame-push publisher's
    /// `CopyResource` on the swap-chain processor thread (the one thread this device is touched from).
    /// Unused in STEP 5 (drain-only); kept so the device matches the oracle exactly.
    #[allow(dead_code)]
    pub device_context: ID3D11DeviceContext,
}

impl Direct3DDevice {
    pub fn init(adapter_luid: LUID) -> Result<Self, Direct3DError> {
        let dxgi_factory =
            unsafe { CreateDXGIFactory2::<IDXGIFactory5>(DXGI_CREATE_FACTORY_FLAGS(0))? };
        let adapter = unsafe { dxgi_factory.EnumAdapterByLuid::<IDXGIAdapter1>(adapter_luid)? };
        let mut device = None;
        let mut device_context = None;
        unsafe {
            D3D11CreateDevice(
                &adapter,
                D3D_DRIVER_TYPE_UNKNOWN,
                None,
                D3D11_CREATE_DEVICE_BGRA_SUPPORT
                    | D3D11_CREATE_DEVICE_SINGLETHREADED
                    | D3D11_CREATE_DEVICE_PREVENT_ALTERING_LAYER_SETTINGS_FROM_REGISTRY,
                None,
                D3D11_SDK_VERSION,
                Some(&mut device),
                None,
                Some(&mut device_context),
            )?;
        }
        let device = device.ok_or("ID3D11Device not found")?;
        let device_context = device_context.ok_or("ID3D11DeviceContext not found")?;
        let live = LIVE_DEVICES.fetch_add(1, Ordering::Relaxed) + 1;
        dbglog!("[pf-vd] Direct3DDevice::init OK — live D3D devices = {live}");
        Ok(Self {
            _dxgi_factory: dxgi_factory,
            _adapter: adapter,
            device,
            device_context,
        })
    }
}

impl Drop for Direct3DDevice {
    fn drop(&mut self) {
        let live = LIVE_DEVICES.fetch_sub(1, Ordering::Relaxed) - 1;
        dbglog!("[pf-vd] Direct3DDevice::drop — live D3D devices = {live}");
    }
}

/// ONE shared D3D render device, reused across every swap-chain assignment (keyed by render LUID).
/// Creating a fresh `Direct3DDevice` per assign — and the swap-chain flap fires several assigns per
/// session — spawned a new NVIDIA UMD worker-thread set each time that was NEVER reclaimed on release
/// (proven on the RTX box: ~70 `nvwgf2umx` threads + ~50 MB VRAM leaked per reconnect, permanently,
/// even though our `Direct3DDevice` refcount dropped to 0). Pooling one device keeps a single, stable
/// thread set: the processors borrow an `Arc`, so the device outlives them and is never re-created.
static DEVICE_POOL: Mutex<Option<(i64, Arc<Direct3DDevice>)>> = Mutex::new(None);

/// Get-or-create the pooled D3D device for `luid`. Re-creates only if the render adapter changes
/// (e.g. a GPU hot-swap): the pool then releases its reference to the old device, which is
/// destroyed once the last processor still holding its `Arc` drops it.
pub fn pooled_device(luid: LUID) -> Option<Arc<Direct3DDevice>> {
    let key = (i64::from(luid.HighPart) << 32) | i64::from(luid.LowPart);
    let mut pool = DEVICE_POOL.lock().ok()?;
    if let Some((k, dev)) = pool.as_ref() {
        if *k == key {
            return Some(dev.clone());
        }
    }
    match Direct3DDevice::init(luid) {
        Ok(d) => {
            let a = Arc::new(d);
            *pool = Some((key, a.clone()));
            Some(a)
        }
        Err(e) => {
            dbglog!("[pf-vd] pooled Direct3DDevice::init failed: {e:?}");
            None
        }
    }
}
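
The reuse behaviour of the single-slot pool can be exercised without any D3D dependency. The following is a minimal sketch under assumptions: `FakeDevice`, `POOL`, `luid_key` and `pooled` are hypothetical stand-ins that mirror the shape of `DEVICE_POOL` and `pooled_device`, with plain integers in place of the `LUID` fields.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `Direct3DDevice`; it only records the key that created it.
#[derive(Debug)]
struct FakeDevice {
    key: i64,
}

/// Same shape as `DEVICE_POOL`: at most one cached device, tagged with its adapter key.
static POOL: Mutex<Option<(i64, Arc<FakeDevice>)>> = Mutex::new(None);

/// Packs the two LUID halves into one key, using the same expression as `pooled_device`.
fn luid_key(high_part: i32, low_part: u32) -> i64 {
    (i64::from(high_part) << 32) | i64::from(low_part)
}

/// Get-or-create: returns the cached device when the key matches, otherwise replaces it.
fn pooled(key: i64) -> Option<Arc<FakeDevice>> {
    let mut pool = POOL.lock().ok()?;
    if let Some((k, dev)) = pool.as_ref() {
        if *k == key {
            return Some(Arc::clone(dev));
        }
    }
    // A different key replaces the slot; the previous device lives on only while
    // some caller still holds a clone of its Arc.
    let dev = Arc::new(FakeDevice { key });
    *pool = Some((key, Arc::clone(&dev)));
    Some(dev)
}
```

Two calls with the same key return clones of one `Arc` (checkable with `Arc::ptr_eq`), while a call with a different key swaps the slot. This mirrors why repeated swap-chain assigns on one adapter do not create new devices.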