feat(windows-drivers): STEP 5 — SwapChainProcessor + Direct3DDevice (swap-chain drain)

The pf-vdisplay driver now consumes the OS swap-chain so a virtual monitor is a usable
display rather than a stalled one. Compiles + loads on-glass (no regression: adapter still
inits, Status=OK); adversarially reviewed — no blockers, the leak/deadlock invariants preserved.

- new swap_chain_processor.rs: a worker thread (MMCSS "Distribution") that binds the render D3D
  device (IddCxSwapChainSetDevice, single-borrow 60x@50ms retry) then drains the swap-chain
  (ReleaseAndAcquireBuffer2 -> FinishedProcessingFrame; E_PENDING waits 16ms on the surface
  event). NO frame publisher yet (STEP 6). Drop is RAII (set terminate, then join); keeps the
  load-bearing top-of-loop terminate check (the oracle's reconnect-leak fix). Fixed a Rust-2021
  disjoint-capture bug: `.0` field access made the closure capture the raw field, bypassing the
  Sendable Send wrapper -> rebind each whole wrapper inside the closure.
- new direct_3d_device.rs: CreateDXGIFactory2 -> EnumAdapterByLuid(render LUID) -> D3D11CreateDevice;
  a DEVICE_POOL of one Arc<Direct3DDevice> per render LUID (the NVIDIA-UMD-worker-thread leak fix).
- monitor.rs: MonitorObject gains swap_chain_processor; set/take helpers return it for the caller
  to drop OUTSIDE the MONITOR_MODES lock (dropping joins the worker — must never happen under the
  lock); remove_monitor/clear_all drop it before IddCxMonitorDeparture.
- callbacks.rs: assign_swap_chain spawns the processor (pooled device per RenderAdapterLuid;
  WdfObjectDelete on D3D-init failure so the OS retries); unassign_swap_chain drops it. Fixed the
  stale `panic = "abort"` doc (workspace is unwind; the extern "C" boundary aborts on unwind).
- Cargo.toml: windows 0.58 + thiserror (both already resolved in the driver lock). The 3 needed
  swap-chain DDIs were already wrapped in wdk-iddcx; their HRESULT-shaped NTSTATUS is classified
  by hand (hr>=0 success, 0x8000000A E_PENDING).
- Also rustfmt'd the whole driver workspace (it had never been driver-fmt'd).
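
A minimal sketch of the monitor.rs rule above (take the processor under the lock, drop it outside).
This is not the driver's code: `Processor`, `Table`, `take_processor` and `remove_monitor_demo` are
hypothetical stand-ins, and the worker needing the same lock before it exits is an assumption made
here to show why a join under the lock could deadlock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for `SwapChainProcessor`: dropping it joins a worker thread.
struct Processor {
    thread: Option<JoinHandle<()>>,
}

impl Drop for Processor {
    fn drop(&mut self) {
        if let Some(handle) = self.thread.take() {
            let _ = handle.join();
        }
    }
}

/// Hypothetical stand-in for the MONITOR_MODES table.
type Table = Mutex<HashMap<u32, Processor>>;

/// Removes the processor while holding the lock, but only RETURNS it.
/// The guard is released when this function returns, before the caller drops the value.
fn take_processor(table: &Table, id: u32) -> Option<Processor> {
    let mut guard = table.lock().unwrap();
    guard.remove(&id)
}

/// Returns true if a processor was found and dropped without holding the lock.
fn remove_monitor_demo() -> bool {
    let table: Arc<Table> = Arc::new(Mutex::new(HashMap::new()));
    let worker_table = Arc::clone(&table);

    // Assumption for this sketch: the worker must take the table lock before it can exit.
    let handle = thread::spawn(move || {
        let _guard = worker_table.lock().unwrap();
    });
    table.lock().unwrap().insert(1, Processor { thread: Some(handle) });

    let taken = take_processor(&table, 1);
    let found = taken.is_some();
    // The join happens here with the lock released. Dropping inside `take_processor`
    // (under the guard) could wait forever on a worker that is blocked on the same lock.
    drop(taken);
    found
}
```

Returning the value instead of dropping it in place keeps the lock hold time short and makes the
join point explicit at the call site.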

Built via the ultracode flow: STEP-5 map workflow -> agent-implement -> box build (caught the
Send-capture bug) -> adversarial-verify-agent -> deploy (loads). Session-1 on-glass validation
(the drain loop servicing an ACTIVE monitor) is the next gate — assign_swap_chain only fires
under an interactive session. Note for STEP 6: target_id_for_object uses the MONITOR_MODES handle
lookup the oracle moved to a WDF context; revisit before target_id keys the shared frame ring.
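
The Send-capture bug caught by the box build can be reproduced in isolation. The sketch below is
illustrative only: `Sendable` mirrors the driver's newtype, while `read_on_worker` and the boxed
`u32` are hypothetical. It assumes edition 2021, where a closure that only uses `wrapped.0`
captures that field rather than the whole wrapper.

```rust
use std::thread;

/// Stand-in for the driver's `Sendable<T>` newtype.
struct Sendable<T>(T);

// SAFETY (this sketch only): the wrapped pointer is read by exactly one thread while
// the allocation it points to is still alive.
unsafe impl<T> Send for Sendable<T> {}

/// Moves a raw pointer (which is `!Send`) into a worker thread and reads through it there.
fn read_on_worker(value: u32) -> u32 {
    let raw: *mut u32 = Box::into_raw(Box::new(value));
    let wrapped = Sendable(raw);

    let handle = thread::spawn(move || {
        // Rebinding names the WHOLE variable, so the closure captures `Sendable<*mut u32>`,
        // which is `Send`. Without this line the only use would be `wrapped.0`, and
        // edition 2021 would capture just the `*mut u32` field, failing `spawn`'s `Send` bound.
        let wrapped = wrapped;
        // SAFETY: `raw` came from `Box::into_raw` above and nothing else accesses it yet.
        unsafe { *wrapped.0 }
    });

    let out = handle.join().expect("worker panicked");
    // SAFETY: the worker has finished; reclaim the allocation exactly once.
    unsafe { drop(Box::from_raw(raw)) };
    out
}
```

Calling `read_on_worker(7)` round-trips the value through the worker; removing the rebind line
turns the same code into a compile error under edition 2021.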

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:29:20 +00:00
parent 024e709191
commit d8a453f6ca
10 changed files with 705 additions and 48 deletions
@@ -0,0 +1,303 @@
//! The swap-chain processor (STEP 5): a worker thread that DRAINS the IddCx swap-chain so the virtual
//! monitor stays a usable display.
//!
//! The OS presents the composited desktop to the driver through a swap-chain; the driver MUST consume
//! it (acquire → finished-processing) or the monitor stalls. STEP 5 binds our render device to the
//! swap-chain (`IddCxSwapChainSetDevice`) and loops acquire/finish, discarding each frame. It does NOT
//! publish frames to the host — that is STEP 6 (the `CopyResource` of `out.MetaData.pSurface` into a
//! shared ring), deliberately omitted here.
//!
//! Ported from the proven oracle (`packaging/windows/vdisplay-driver/pf-vdisplay/src/
//! swap_chain_processor.rs`) onto wdk-sys + wdk-iddcx. The oracle's `wdf_umdf`/`wdf_umdf_sys` are
//! replaced by `wdk_sys::iddcx::*` + the `wdk_iddcx` DDI wrappers. Those wrappers return a RAW
//! `NTSTATUS` (`i32`) that is HRESULT-shaped for the swap-chain DDIs, so we classify it by hand
//! (`hr >= 0` = success; `0x8000_000A` = E_PENDING; `hr < 0 && != E_PENDING` = error) rather than with
//! `nt_success`. The publisher + `render_luid_low/high` params are dropped (STEP 6).

use std::{
    mem::size_of,
    sync::{
        Arc,
        atomic::{AtomicBool, Ordering},
    },
    thread::{self, JoinHandle},
    time::Duration,
};

use wdk_sys::iddcx::{
    IDARG_IN_RELEASEANDACQUIREBUFFER2, IDARG_IN_SWAPCHAINSETDEVICE,
    IDARG_OUT_RELEASEANDACQUIREBUFFER2, IDDCX_SWAPCHAIN,
};
// `HANDLE` is the shared wdk-sys typedef (`crate::types`) re-used by the iddcx bindings — take it from
// the crate root, which is guaranteed to export it (the iddcx module only re-exports it if bindgen
// re-declared it there). It is the same type as `IDARG_IN_SETSWAPCHAIN.hNextSurfaceAvailable`.
use wdk_sys::{HANDLE, NTSTATUS, WDFOBJECT, call_unsafe_wdf_function_binding};
use windows::{
    Win32::{
        Foundation::HANDLE as WHANDLE,
        Graphics::Dxgi::IDXGIDevice,
        System::Threading::{
            AvRevertMmThreadCharacteristics, AvSetMmThreadCharacteristicsW, WaitForSingleObject,
        },
    },
    core::{Interface, w},
};

use crate::direct_3d_device::Direct3DDevice;

/// E_PENDING — `ReleaseAndAcquireBuffer2` returns this (HRESULT-shaped) when the swap-chain is valid but
/// DWM has composed no new frame yet; wait on the surface-available event and retry.
const E_PENDING: u32 = 0x8000_000A;
/// `WAIT_TIMEOUT` from `WaitForSingleObject` (defined locally to avoid pulling a windows-crate constant
/// type into the comparison — the raw `WAIT_EVENT.0` is just a `u32`).
const WAIT_TIMEOUT_U32: u32 = 0x0000_0102;
/// HRESULT-shaped success test for the swap-chain DDIs (raw `NTSTATUS`/HRESULT: success iff non-negative).
#[inline]
fn hr_success(hr: NTSTATUS) -> bool {
    hr >= 0
}

/// A minimal newtype to move a raw pointer / handle across the thread boundary. The wrapped value is a
/// raw IddCx swap-chain handle or an event HANDLE (both raw pointers, framework-managed). Sending them
/// to the worker is sound because only the worker thread touches them after the move, and the
/// framework synchronises their lifetime.
struct Sendable<T>(T);
// SAFETY: see the type doc. The wrapped raw handle is owned by the worker for its lifetime.
unsafe impl<T> Send for Sendable<T> {}

pub struct SwapChainProcessor {
    terminate: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}
// SAFETY: no raw pointer is stored here (the worker closure owns the handles). Both fields,
// `Arc<AtomicBool>` and `Option<JoinHandle<()>>`, are already `Send + Sync`, so these impls are
// redundant but sound.
unsafe impl Send for SwapChainProcessor {}
unsafe impl Sync for SwapChainProcessor {}

impl SwapChainProcessor {
    pub fn new() -> Self {
        Self {
            terminate: Arc::new(AtomicBool::new(false)),
            thread: None,
        }
    }

    pub fn run(
        &mut self,
        swap_chain: IDDCX_SWAPCHAIN,
        device: Arc<Direct3DDevice>,
        available_buffer_event: HANDLE,
        target_id: u32,
    ) {
        let available_buffer_event = Sendable(available_buffer_event);
        let swap_chain = Sendable(swap_chain);
        let terminate = self.terminate.clone();

        let join_handle = thread::spawn(move || {
            // Rust 2021 disjoint closure captures would otherwise grab the raw `swap_chain.0` /
            // `available_buffer_event.0` FIELDS directly (defeating the `Sendable` Send wrapper, since the
            // inner `*mut IDDCX_SWAPCHAIN__` / `HANDLE` are `!Send`). Rebind the WHOLE wrappers here so the
            // closure captures them as `Sendable<_>` (which IS `Send`), then unwrap from the locals.
            let swap_chain = swap_chain;
            let available_buffer_event = available_buffer_event;

            // It is very important to prioritize this thread by making use of the Multimedia Scheduler
            // Service. It will intelligently prioritize the thread for improved throughput in high
            // CPU-load scenarios.
            let mut av_task = 0u32;
            let res = unsafe { AvSetMmThreadCharacteristicsW(w!("Distribution"), &mut av_task) };
            let Ok(av_handle) = res else {
                dbglog!("[pf-vd] swap-chain: failed to prioritize thread: {res:?}");
                return;
            };

            Self::run_core(
                swap_chain.0,
                &device,
                available_buffer_event.0,
                &terminate,
                target_id,
            );
            dbglog!(
                "[pf-vd] swap-chain run_core RETURNED (target={target_id}) — deleting swap-chain, device drops next"
            );

            // Delete the swap-chain WDF object BEFORE the `Arc<Direct3DDevice>` drops (the swap-chain
            // referenced our device). `WdfObjectDelete` takes a WDFOBJECT.
            // SAFETY: `swap_chain` is a live IddCx swap-chain handle; we own the sole reference here and
            // the drain loop has exited.
            unsafe {
                call_unsafe_wdf_function_binding!(WdfObjectDelete, swap_chain.0 as WDFOBJECT);
            }

            // Revert the thread to normal once it's done.
            let res = unsafe { AvRevertMmThreadCharacteristics(av_handle) };
            if let Err(e) = res {
                dbglog!("[pf-vd] swap-chain: failed to revert prioritized thread: {e:?}");
            }
        });
        self.thread = Some(join_handle);
    }

    fn run_core(
        swap_chain: IDDCX_SWAPCHAIN,
        device: &Direct3DDevice,
        available_buffer_event: HANDLE,
        terminate: &AtomicBool,
        target_id: u32,
    ) {
        // SetDevice fails (0x887A0026, FACILITY_DXGI) when the monitor briefly flaps INACTIVE during
        // topology activation — the OS unassigns + re-assigns the swap-chain, and a fresh run_core thread
        // can lose the race to the unassign. Retry briefly so a stable re-assign binds the device instead
        // of giving up on the first transient failure. `terminate` (set when the OS unassigns + drops the
        // processor) breaks us out promptly.
        //
        // Cast to IDXGIDevice ONCE and BORROW it to the swap-chain across all retries. Re-casting +
        // `into_raw()`'ing on EVERY attempt — and a flapping monitor fails several attempts per session —
        // orphans an IDXGIDevice reference per failure, pinning the D3D device (and its ~dozen worker
        // threads + tens of MB of VRAM) so it is NEVER freed when the processor drops. `as_raw()` keeps
        // our single reference (released right after the loop); IddCx AddRefs its own on success, and
        // `device` keeps the object alive for the drain loop regardless.
        let dxgi_device = match device.device.cast::<IDXGIDevice>() {
            Ok(d) => d,
            Err(e) => {
                dbglog!("[pf-vd] swap-chain: failed to cast ID3D11Device to IDXGIDevice: {e:?}");
                return;
            }
        };

        // Built zeroed + field-assigned (driver style) — robust against a bindgen field-set difference.
        let mut set_device: IDARG_IN_SWAPCHAINSETDEVICE = unsafe { core::mem::zeroed() };
        set_device.pDevice = dxgi_device.as_raw().cast();

        let mut set_ok = false;
        let mut terminated = false;
        for attempt in 0..60u32 {
            if terminate.load(Ordering::Relaxed) {
                dbglog!(
                    "[pf-vd] swap-chain run_core: terminated during SetDevice (attempt {attempt}, target={target_id})"
                );
                terminated = true;
                break;
            }
            // SAFETY: driver is loaded; `swap_chain` is valid; `set_device` points to valid local storage.
            let hr = unsafe { wdk_iddcx::IddCxSwapChainSetDevice(swap_chain, &set_device) };
            if hr_success(hr) {
                set_ok = true;
                dbglog!(
                    "[pf-vd] swap-chain run_core: SetDevice OK (target={target_id}, attempt={attempt}) — entering drain loop"
                );
                break;
            }
            if attempt == 0 {
                dbglog!(
                    "[pf-vd] swap-chain run_core: SetDevice attempt 0 failed ({hr:#x}) — retrying up to 60x@50ms (monitor may be flapping)"
                );
            }
            thread::sleep(Duration::from_millis(50));
        }
        // Release our device reference here: IddCx holds its own now, or we gave up. `dxgi_device` is
        // a named local, so Rust cannot drop it any earlier than this; the raw pointer stored in
        // `set_device` therefore stays valid across every retry above. The explicit drop just frees
        // the reference before the drain loop instead of at the end of `run_core`.
        drop(dxgi_device);

        if !set_ok {
            if !terminated {
                dbglog!(
                    "[pf-vd] swap-chain run_core: SetDevice never succeeded after retries (target={target_id}) — giving up"
                );
            }
            return;
        }

        let mut logged_pending = false;
        let mut logged_frame = false;
        loop {
            // Check terminate at the TOP, every iteration. The success branch below does NOT re-check
            // it, so without this check a thread that the OS unassigns (or that the processor is
            // dropping) during a CONTINUOUS frame burst (DWM rendering the freshly-activated desktop)
            // would never see the flag and would loop on, pinning its D3D device (and ~36 NVIDIA worker
            // threads). That is THE reconnect leak; it only reproduced at full speed (the E_PENDING
            // path DOES check terminate, which masked it under a debugger). `SwapChainProcessor::drop`'s
            // join could also block until the burst ends.
            if terminate.load(Ordering::Relaxed) {
                break;
            }

            // ...Buffer2 is required once CAN_PROCESS_FP16 is set. AcquireSystemMemoryBuffer=FALSE keeps
            // the GPU surface (out.MetaData.pSurface). STEP 5 only drains — it does NOT publish the
            // surface (STEP 6 will). Built zeroed + field-assigned (driver style) so a bindgen field-set
            // difference can't break a positional struct literal.
            let mut in_args: IDARG_IN_RELEASEANDACQUIREBUFFER2 = unsafe { core::mem::zeroed() };
            #[allow(clippy::cast_possible_truncation)]
            {
                in_args.Size = size_of::<IDARG_IN_RELEASEANDACQUIREBUFFER2>() as u32;
            }
            in_args.AcquireSystemMemoryBuffer = 0;
            // `core::mem::zeroed()` (not `::default()`) — consistent with every other IddCx out-struct
            // in this driver, and robust whether or not bindgen derives `Default` for this type (its
            // `MetaData` field carries a raw `pSurface` pointer + union which can suppress the derive).
            let mut buffer: IDARG_OUT_RELEASEANDACQUIREBUFFER2 = unsafe { core::mem::zeroed() };
            // SAFETY: driver is loaded; `swap_chain` is valid; in/out point to valid local storage.
            let hr: NTSTATUS = unsafe {
                wdk_iddcx::IddCxSwapChainReleaseAndAcquireBuffer2(
                    swap_chain,
                    &mut in_args,
                    &mut buffer,
                )
            };

            if (hr as u32) == E_PENDING {
                if !logged_pending {
                    dbglog!(
                        "[pf-vd] swap-chain run_core: E_PENDING (target={target_id}) — swap-chain valid but DWM has composed NO frame yet"
                    );
                    logged_pending = true;
                }
                // SAFETY: `available_buffer_event` is the framework-provided surface-available event.
                let wait_result =
                    unsafe { WaitForSingleObject(WHANDLE(available_buffer_event.cast()), 16).0 };
                // The worker was asked to stop while it was waiting.
                if terminate.load(Ordering::Relaxed) {
                    break;
                }
                // WAIT_OBJECT_0 | WAIT_TIMEOUT
                if matches!(wait_result, 0 | WAIT_TIMEOUT_U32) {
                    // We have a new buffer (or timed out), so try the AcquireBuffer again.
                    continue;
                }
                // The wait was cancelled or something unexpected happened.
                break;
            } else if hr_success(hr) {
                if !logged_frame {
                    dbglog!(
                        "[pf-vd] swap-chain run_core: FIRST FRAME acquired (target={target_id}) — DWM IS compositing the virtual display!"
                    );
                    logged_frame = true;
                }
                // STEP 6 publishes `buffer.MetaData.pSurface` into the shared ring HERE (the surface is
                // valid until the next ReleaseAndAcquire). STEP 5 only drains, so we immediately finish
                // the frame.
                // SAFETY: driver is loaded; `swap_chain` is valid.
                let hr = unsafe { wdk_iddcx::IddCxSwapChainFinishedProcessingFrame(swap_chain) };
                if !hr_success(hr) {
                    break;
                }
            } else {
                // The swap-chain was likely abandoned (e.g. DXGI_ERROR_ACCESS_LOST) — exit the loop.
                break;
            }
        }
    }
}

impl Drop for SwapChainProcessor {
    fn drop(&mut self) {
        if let Some(handle) = self.thread.take() {
            // signal the worker to end
            self.terminate.store(true, Ordering::Relaxed);
            // wait until the worker is finished (it deletes the swap-chain object before returning)
            let _ = handle.join();
        }
    }
}
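
The terminate-then-join `Drop` and the top-of-loop terminate check above can be exercised without
IddCx. The following is a rough sketch under stated assumptions, not the driver's implementation:
`Worker`, its `frames` counter and `drained_frames_after_drop` are hypothetical, and the simulated
"frame burst" is just a counter increment with no pending/wait branch.

```rust
use std::sync::{
    atomic::{AtomicBool, AtomicU64, Ordering},
    Arc,
};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the processor: owns a worker thread and stops it on drop.
struct Worker {
    terminate: Arc<AtomicBool>,
    frames: Arc<AtomicU64>,
    thread: Option<JoinHandle<()>>,
}

impl Worker {
    fn spawn() -> Self {
        let terminate = Arc::new(AtomicBool::new(false));
        let frames = Arc::new(AtomicU64::new(0));
        let (t, f) = (Arc::clone(&terminate), Arc::clone(&frames));
        let thread = thread::spawn(move || loop {
            // Top-of-loop check: evaluated even when every iteration "succeeds".
            if t.load(Ordering::Relaxed) {
                break;
            }
            // Simulated continuous frame burst: this path never waits and never
            // re-checks the flag, so the check above is the only exit.
            f.fetch_add(1, Ordering::Relaxed);
        });
        Self { terminate, frames, thread: Some(thread) }
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        if let Some(handle) = self.thread.take() {
            // Signal first, then join; the join returns once the loop sees the flag.
            self.terminate.store(true, Ordering::Relaxed);
            let _ = handle.join();
        }
    }
}

/// Spawns a worker, waits for at least one simulated frame, drops the worker,
/// and returns how many frames were drained in total.
fn drained_frames_after_drop() -> u64 {
    let worker = Worker::spawn();
    let frames = Arc::clone(&worker.frames);
    while frames.load(Ordering::Relaxed) == 0 {
        thread::yield_now();
    }
    drop(worker); // joins the worker thread
    frames.load(Ordering::Relaxed)
}
```

If the check were moved into a branch that the burst never reaches, `drop(worker)` in this sketch
would block in `join` indefinitely, which is the shape of the reconnect leak described in the
comments above.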