Files
punktfunk/packaging/windows/drivers/pf-vdisplay/src/swap_chain_processor.rs
T
enricobuehler 83d3d6384a
apple / swift (push) Successful in 1m7s
ci / rust (push) Successful in 1m14s
windows-drivers / driver-build (push) Successful in 1m8s
apple / screenshots (push) Successful in 3m14s
windows-drivers / probe-and-proto (push) Successful in 19s
ci / web (push) Successful in 40s
ci / docs-site (push) Successful in 1m1s
android / android (push) Successful in 3m13s
deb / build-publish (push) Successful in 2m38s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
windows-host / package (push) Successful in 5m18s
ci / bench (push) Successful in 4m35s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m26s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m16s
docker / deploy-docs (push) Successful in 31s
refactor(windows-drivers): STEP 8 (1/n) — unsafe-reduction pass (per-site // SAFETY)
Audit pass over the new pf-vdisplay driver's unsafe surface: 92 per-site // SAFETY comments added
across adapter.rs / monitor.rs / entry.rs / callbacks.rs / swap_chain_processor.rs /
frame_transport.rs / direct_3d_device.rs (control.rs already had full coverage). COMMENTS ONLY — zero
logic, signature, or control-flow change (verified via git diff: every added line is a // SAFETY
comment or blank).

The dominant gap was the pervasive `core::mem::zeroed()` FFI-struct builds (IDDCX_*/WDF_*/
DISPLAYCONFIG_* C PODs whose all-zero bit pattern is a valid uninitialized/Invalid state, with the
required .Size/fields set immediately after) — each now carries a one-line // SAFETY. Plus explicit
notes on the two stack/local-pointer-into-FFI hazards (adapter.rs `version` ptr into
IddCxAdapterInitAsync; monitor.rs `edid` Vec ptr into IddCxMonitorCreate — both read synchronously
before the local drops) and the frame_transport.rs raw-HANDLE / mapped-header derefs + cleanup paths.
The already-justified Send/Sync wrappers (SendAdapter, CtxTypeInfo/DevCtxInfo, MonitorObject,
Sendable, FramePublisher) were audited — each already carried a // SAFETY. No site needed a code
change.

First slice of STEP 8 (the SudoVDA drop). Comments-only ⇒ build-neutral; windows-drivers.yml verifies
on the next runner build. Remaining STEP 8: re-vendor the installer's driver binary from the new
drivers/ tree (the shipping packaging/windows/pf-vdisplay/ binary is still built from the OLD oracle
tree with the SudoVDA-compat GUID — ABI-mismatched with the host's proto GUID), add an .inx to the
new tree, re-point scripts/README from vdisplay-driver/ to drivers/, flip the selector default to
pf-vdisplay, then delete the old oracle tree. Keep sudovda.rs (the runtime fallback + the
backend-neutral CCD helpers pf_vdisplay.rs reuses) and the WGC-relay/DDA secure path (the
secure-desktop gate is not yet passed on glass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 12:00:55 +00:00

377 lines
19 KiB
Rust

//! The swap-chain processor (STEP 5 + STEP 6): a worker thread that DRAINS the IddCx swap-chain (so the
//! virtual monitor stays a usable display) and PUBLISHES each acquired surface into the host-created
//! shared ring (the IDD-push path).
//!
//! The OS presents the composited desktop to the driver through a swap-chain; the driver MUST consume it
//! (acquire → finished-processing) or the monitor stalls. STEP 5 binds our render device to the swap-chain
//! (`IddCxSwapChainSetDevice`) and loops acquire/finish. STEP 6 lazily attaches a [`FramePublisher`] to
//! the host's shared ring and, on each acquired frame, `CopyResource`s `out.MetaData.pSurface` into the
//! next ring slot before finishing the frame (a non-IDD-push session simply never attaches and keeps
//! draining).
//!
//! Ported from the proven oracle (`packaging/windows/vdisplay-driver/pf-vdisplay/src/
//! swap_chain_processor.rs`) onto wdk-sys + wdk-iddcx. The oracle's `wdf_umdf`/`wdf_umdf_sys` are
//! replaced by `wdk_sys::iddcx::*` + the `wdk_iddcx` DDI wrappers. Those wrappers return a RAW
//! `NTSTATUS` (`i32`) that is HRESULT-shaped for the swap-chain DDIs, so we classify it by hand
//! (`hr >= 0` = success; `0x8000_000A` = E_PENDING; `hr < 0 && != E_PENDING` = error) rather than with
//! `nt_success`.
use std::{
mem::size_of,
sync::{
Arc,
atomic::{AtomicBool, Ordering},
},
thread::{self, JoinHandle},
time::Duration,
};
use wdk_sys::iddcx::{
IDARG_IN_RELEASEANDACQUIREBUFFER2, IDARG_IN_SWAPCHAINSETDEVICE,
IDARG_OUT_RELEASEANDACQUIREBUFFER2, IDDCX_SWAPCHAIN,
};
// `HANDLE` is the shared wdk-sys typedef (`crate::types`) re-used by the iddcx bindings — take it from
// the crate root, which is guaranteed to export it (the iddcx module only re-exports it if bindgen
// re-declared it there). It is the same type as `IDARG_IN_SETSWAPCHAIN.hNextSurfaceAvailable`.
use wdk_sys::{HANDLE, NTSTATUS, WDFOBJECT, call_unsafe_wdf_function_binding};
use windows::{
Win32::{
Foundation::HANDLE as WHANDLE,
Graphics::{
Direct3D11::ID3D11Texture2D,
Dxgi::{IDXGIDevice, IDXGIResource},
},
System::Threading::{
AvRevertMmThreadCharacteristics, AvSetMmThreadCharacteristicsW, WaitForSingleObject,
},
},
core::{Interface, w},
};
use crate::{direct_3d_device::Direct3DDevice, frame_transport::FramePublisher};
/// E_PENDING — `ReleaseAndAcquireBuffer2` returns this (HRESULT-shaped) when the swap-chain is valid but
/// DWM has composed no new frame yet; wait on the surface-available event and retry.
const E_PENDING: u32 = 0x8000_000A;
/// `WAIT_TIMEOUT` from `WaitForSingleObject` (defined locally to avoid pulling a windows-crate constant
/// type into the comparison — the raw `WAIT_EVENT.0` is just a `u32`).
const WAIT_TIMEOUT_U32: u32 = 0x0000_0102;
/// HRESULT-shaped success test for the swap-chain DDIs (raw `NTSTATUS`/HRESULT: success iff non-negative).
#[inline]
fn hr_success(hr: NTSTATUS) -> bool {
hr >= 0
}
/// A minimal newtype to move a raw pointer / handle across the thread boundary. The wrapped value is a
/// raw IddCx swap-chain handle or an event HANDLE (both raw pointers, framework-managed) — sending them
/// to the worker is sound because only this thread touches them and the framework synchronises lifetime.
struct Sendable<T>(T);
// SAFETY: see the type doc — the wrapped raw handle is owned by the worker for its lifetime.
unsafe impl<T> Send for Sendable<T> {}
pub struct SwapChainProcessor {
terminate: Arc<AtomicBool>,
thread: Option<JoinHandle<()>>,
}
// SAFETY: Raw ptr is managed by external library; access is serialised by the worker thread + the
// terminate flag.
unsafe impl Send for SwapChainProcessor {}
unsafe impl Sync for SwapChainProcessor {}
impl SwapChainProcessor {
pub fn new() -> Self {
Self {
terminate: Arc::new(AtomicBool::new(false)),
thread: None,
}
}
pub fn run(
&mut self,
swap_chain: IDDCX_SWAPCHAIN,
device: Arc<Direct3DDevice>,
available_buffer_event: HANDLE,
target_id: u32,
render_luid_low: u32,
render_luid_high: i32,
) {
let available_buffer_event = Sendable(available_buffer_event);
let swap_chain = Sendable(swap_chain);
let terminate = self.terminate.clone();
let join_handle = thread::spawn(move || {
// Rust 2021 disjoint closure captures would otherwise grab the raw `swap_chain.0` /
// `available_buffer_event.0` FIELDS directly (defeating the `Sendable` Send wrapper, since the
// inner `*mut IDDCX_SWAPCHAIN__` / `HANDLE` are `!Send`). Rebind the WHOLE wrappers here so the
// closure captures them as `Sendable<_>` (which IS `Send`), then unwrap from the locals.
let swap_chain = swap_chain;
let available_buffer_event = available_buffer_event;
// It is very important to prioritize this thread by making use of the Multimedia Scheduler
// Service. It will intelligently prioritize the thread for improved throughput in high
// CPU-load scenarios.
let mut av_task = 0u32;
// SAFETY: `w!("Distribution")` is a 'static null-terminated UTF-16 task name; `av_task` is a
// valid local out-param. The returned handle is reverted with AvRevertMmThreadCharacteristics.
let res = unsafe { AvSetMmThreadCharacteristicsW(w!("Distribution"), &mut av_task) };
let Ok(av_handle) = res else {
dbglog!("[pf-vd] swap-chain: failed to prioritize thread: {res:?}");
return;
};
Self::run_core(
swap_chain.0,
&device,
available_buffer_event.0,
&terminate,
target_id,
render_luid_low,
render_luid_high,
);
dbglog!(
"[pf-vd] swap-chain run_core RETURNED (target={target_id}) — deleting swap-chain, device drops next"
);
// Delete the swap-chain WDF object BEFORE the `Arc<Direct3DDevice>` drops (the swap-chain
// referenced our device). `WdfObjectDelete` takes a WDFOBJECT.
// SAFETY: `swap_chain` is a live IddCx swap-chain handle; we own the sole reference here and
// the drain loop has exited.
unsafe {
call_unsafe_wdf_function_binding!(WdfObjectDelete, swap_chain.0 as WDFOBJECT);
}
// Revert the thread to normal once it's done.
// SAFETY: `av_handle` is the live characteristics handle returned by AvSetMmThreadCharacteristicsW
// above, reverted exactly once here at thread exit.
let res = unsafe { AvRevertMmThreadCharacteristics(av_handle) };
if let Err(e) = res {
dbglog!("[pf-vd] swap-chain: failed to revert prioritized thread: {e:?}");
}
});
self.thread = Some(join_handle);
}
fn run_core(
swap_chain: IDDCX_SWAPCHAIN,
device: &Direct3DDevice,
available_buffer_event: HANDLE,
terminate: &AtomicBool,
target_id: u32,
render_luid_low: u32,
render_luid_high: i32,
) {
// SetDevice fails (0x887A0026, FACILITY_DXGI) when the monitor briefly flaps INACTIVE during
// topology activation — the OS unassigns + re-assigns the swap-chain, and a fresh run_core thread
// can lose the race to the unassign. Retry briefly so a stable re-assign binds the device instead
// of giving up on the first transient failure. `terminate` (set when the OS unassigns + drops the
// processor) breaks us out promptly.
//
// Cast to IDXGIDevice ONCE and BORROW it to the swap-chain across all retries. Re-casting +
// `into_raw()`'ing on EVERY attempt — and a flapping monitor fails several attempts per session —
// orphans an IDXGIDevice reference per failure, pinning the D3D device (and its ~dozen worker
// threads + tens of MB of VRAM) so it is NEVER freed when the processor drops. `as_raw()` keeps
// our single reference (released right after the loop); IddCx AddRefs its own on success, and
// `device` keeps the object alive for the drain loop regardless.
let dxgi_device = match device.device.cast::<IDXGIDevice>() {
Ok(d) => d,
Err(e) => {
dbglog!("[pf-vd] swap-chain: failed to cast ID3D11Device to IDXGIDevice: {e:?}");
return;
}
};
// Built zeroed + field-assigned (driver style) — robust against a bindgen field-set difference.
// SAFETY: building a C POD — the all-zero bit pattern is a valid uninitialized
// IDARG_IN_SWAPCHAINSETDEVICE; the `pDevice` field is set immediately below.
let mut set_device: IDARG_IN_SWAPCHAINSETDEVICE = unsafe { core::mem::zeroed() };
set_device.pDevice = dxgi_device.as_raw().cast();
let mut set_ok = false;
let mut terminated = false;
for attempt in 0..60u32 {
if terminate.load(Ordering::Relaxed) {
dbglog!(
"[pf-vd] swap-chain run_core: terminated during SetDevice (attempt {attempt}, target={target_id})"
);
terminated = true;
break;
}
// SAFETY: driver is loaded; `swap_chain` is valid; `set_device` points to valid local storage.
let hr = unsafe { wdk_iddcx::IddCxSwapChainSetDevice(swap_chain, &set_device) };
if hr_success(hr) {
set_ok = true;
dbglog!(
"[pf-vd] swap-chain run_core: SetDevice OK (target={target_id}, attempt={attempt}) — entering drain loop"
);
break;
}
if attempt == 0 {
dbglog!(
"[pf-vd] swap-chain run_core: SetDevice attempt 0 failed ({hr:#x}) — retrying up to 60x@50ms (monitor may be flapping)"
);
}
thread::sleep(Duration::from_millis(50));
}
// Release our borrowed device reference — IddCx holds its own now, or we gave up. (Explicit drop
// so NLL can't release it mid-loop while the swap-chain still references the raw ptr.)
drop(dxgi_device);
if !set_ok {
if !terminated {
dbglog!(
"[pf-vd] swap-chain run_core: SetDevice never succeeded after retries (target={target_id}) — giving up"
);
}
return;
}
// STEP 6 IDD-push: lazily ATTACH to the HOST-created shared ring. The restricted UMDF token can't
// create named objects, so the host creates the header + event + textures and we only OPEN them
// once they appear (`try_open`). Until then we just drain — exactly the STEP-5 behaviour — so a
// non-IDD-push session never stalls. Retried every ~30 loop iterations.
let mut publisher: Option<FramePublisher> = None;
let mut frames_since_try: u32 = u32::MAX; // attach attempt on the first loop iteration
let mut logged_pending = false;
let mut logged_frame = false;
loop {
// Check terminate at the TOP, every iteration. The success branch below does NOT re-check it,
// so during a CONTINUOUS frame burst (DWM rendering the freshly-activated desktop) a thread the
// OS unassigns — or that the processor is dropping — never sees the flag and loops on, pinning
// its D3D device (and ~36 NVIDIA worker threads). That is THE reconnect leak; it only
// reproduced at full speed (E_PENDING gaps DO check terminate and masked it under a debugger).
// Without this, `SwapChainProcessor::drop`'s join can also block until the burst ends.
if terminate.load(Ordering::Relaxed) {
break;
}
// The host recreates the shared ring (new format) mid-session when the display's HDR mode
// flips — it bumps the header generation. Detect that and drop the publisher so we re-attach to
// the new-format textures below; otherwise we'd keep CopyResource'ing into the stale ring, whose
// format now mismatches the surface → the publish() format-guard drops every frame and the
// stream freezes until the next swap-chain recreate.
if publisher.as_ref().is_some_and(FramePublisher::is_stale) {
publisher = None;
frames_since_try = u32::MAX; // re-attach immediately
}
// Lazy-attach (rate-limited) at the loop TOP so we keep trying even while the display is idle
// (E_PENDING / no frames presented yet), not only when a frame is acquired. `try_open` is a
// cheap OpenFileMapping that fails fast until the host has created the ring.
if publisher.is_none() {
if frames_since_try >= 30 {
frames_since_try = 0;
// `if let Ok` (not a `match` with an empty `Err` arm) keeps clippy's `single_match`
// happy under `-D warnings`; semantics are identical — attach on success, retry on Err.
if let Ok(p) = FramePublisher::try_open(
target_id,
render_luid_low,
render_luid_high,
&device.device,
&device.device_context,
) {
publisher = Some(p);
}
} else {
frames_since_try += 1;
}
}
// ...Buffer2 is required once CAN_PROCESS_FP16 is set. AcquireSystemMemoryBuffer=FALSE keeps
// the GPU surface (out.MetaData.pSurface) — STEP 6 publishes it into the shared ring in the
// success branch below. Built zeroed + field-assigned (driver style) so a bindgen field-set
// difference can't break a positional struct literal.
// SAFETY: building a C POD — the all-zero bit pattern is a valid uninitialized
// IDARG_IN_RELEASEANDACQUIREBUFFER2; the required `.Size`/AcquireSystemMemoryBuffer are set below.
let mut in_args: IDARG_IN_RELEASEANDACQUIREBUFFER2 = unsafe { core::mem::zeroed() };
#[allow(clippy::cast_possible_truncation)]
{
in_args.Size = size_of::<IDARG_IN_RELEASEANDACQUIREBUFFER2>() as u32;
}
in_args.AcquireSystemMemoryBuffer = 0;
// `core::mem::zeroed()` (not `::default()`) — consistent with every other IddCx out-struct
// in this driver, and robust whether or not bindgen derives `Default` for this type (its
// `MetaData` field carries a raw `pSurface` pointer + union which can suppress the derive).
// SAFETY: building a C POD — the all-zero bit pattern is a valid uninitialized
// IDARG_OUT_RELEASEANDACQUIREBUFFER2 (an out-param the framework fills).
let mut buffer: IDARG_OUT_RELEASEANDACQUIREBUFFER2 = unsafe { core::mem::zeroed() };
// SAFETY: driver is loaded; `swap_chain` is valid; in/out point to valid local storage.
let hr: NTSTATUS = unsafe {
wdk_iddcx::IddCxSwapChainReleaseAndAcquireBuffer2(
swap_chain,
&mut in_args,
&mut buffer,
)
};
if (hr as u32) == E_PENDING {
if !logged_pending {
dbglog!(
"[pf-vd] swap-chain run_core: E_PENDING (target={target_id}) — swap-chain valid but DWM has composed NO frame yet"
);
logged_pending = true;
}
// SAFETY: `available_buffer_event` is the framework-provided surface-available event.
let wait_result =
unsafe { WaitForSingleObject(WHANDLE(available_buffer_event.cast()), 16).0 };
// thread requested an end
if terminate.load(Ordering::Relaxed) {
break;
}
// WAIT_OBJECT_0 | WAIT_TIMEOUT
if matches!(wait_result, 0 | WAIT_TIMEOUT_U32) {
// We have a new buffer (or timed out), so try the AcquireBuffer again.
continue;
}
// The wait was cancelled or something unexpected happened.
break;
} else if hr_success(hr) {
if !logged_frame {
dbglog!(
"[pf-vd] swap-chain run_core: FIRST FRAME acquired (target={target_id}) — DWM IS compositing the virtual display!"
);
logged_frame = true;
}
// STEP 6: copy the acquired surface into the shared ring BEFORE FinishedProcessingFrame
// (the surface is valid until the next ReleaseAndAcquire). The pointer is BORROWED —
// `from_raw_borrowed` does NOT take IddCx's refcount — and the GPU-side copy is ordered
// before the consumer via the slot keyed mutex. (Attach happens at the loop top.)
if let Some(p) = publisher.as_mut() {
let raw = buffer.MetaData.pSurface as *mut core::ffi::c_void;
if !raw.is_null() {
// SAFETY: `raw` is IddCx's live surface pointer (valid until the next
// ReleaseAndAcquire); `from_raw_borrowed` does not consume the refcount.
if let Some(res) = unsafe { IDXGIResource::from_raw_borrowed(&raw) } {
if let Ok(tex) = res.cast::<ID3D11Texture2D>() {
p.publish(&tex);
}
}
}
}
// SAFETY: driver is loaded; `swap_chain` is valid.
let hr = unsafe { wdk_iddcx::IddCxSwapChainFinishedProcessingFrame(swap_chain) };
if !hr_success(hr) {
break;
}
} else {
// The swap-chain was likely abandoned (e.g. DXGI_ERROR_ACCESS_LOST) — exit the loop.
break;
}
}
}
}
impl Drop for SwapChainProcessor {
fn drop(&mut self) {
if let Some(handle) = self.thread.take() {
// signal the worker to end
self.terminate.store(true, Ordering::Relaxed);
// wait until the worker is finished (it deletes the swap-chain object before returning)
let _ = handle.join();
}
}
}