punktfunk/docs/windows-virtual-display-rust-port.md
enricobuehler e2c9bfd3d9
feat(windows): pf-vdisplay IDD-push — HDR + pipelined zero-copy capture
HDR (display-driven, matching the WGC path):
- CTA-861.3 HDR EDID (BT.2020 primaries + HDR Static Metadata block) so Windows
  offers "Use HDR" on the virtual display. The host FOLLOWS the display's live
  advanced-color state, recreating the shared ring at the matching format
  (FP16 in HDR / BGRA in SDR) on a toggle — no freeze.
- Always emit Main10/BT.2020-PQ Rgb10a2 while the display is HDR; the client
  auto-detects PQ from the HEVC VUI (clients under-report VIDEO_CAP_10BIT).
  Generic HDR10 mastering SEI on every IDR.
- Generation-tagged `latest` (gen<<40|seq<<8|slot) + driver `is_stale` re-attach
  kill the toggle-time garbage frame and any stale-ring read.

Perf:
- Pipeline the encode loop (Capturer::pipeline_depth; IDD-push = 2): submit N+1
  before polling N so the convert/copy on the 3D engine overlaps the NVENC encode
  of N on the ASIC. PUNKTFUNK_IDD_DEPTH overrides (1 = synchronous).
- Rotating host output ring (OUT_RING) so the in-flight encode and the next
  convert never touch the same texture.
- HDR converts directly from the keyed-mutex slot's SRV into the output ring
  (drops the redundant slot->fp16 scratch copy); SDR copies the BGRA slot in.
  The slot mutex is held only across the convert/copy, not the encode.
  RING_LEN 3->6 for publish headroom.
- Capture-health diagnostic: new_fps vs repeat_fps under PUNKTFUNK_PERF (a low
  new_fps at a high send rate means the source isn't compositing, not an encode
  stall).

Validated live on the RTX box: 5120x1440@240 HDR streams; driver composes
~180 new fps, encode 240 fps @ ~4.3 ms p50.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 00:39:28 +02:00


Windows virtual display — a Rust port of SudoVDA (investigation & plan)

Status: P1 done — pf-vdisplay validated streaming on glass at 5120×1440@240 (2026-06-22). The all-Rust IddCx driver replaces the vendored SudoVDA C++ driver, matching the "all-Rust UMDF, zero external driver deps" direction we finished for gamepads (ViGEmBus gone; DualSense/DS4/XUSB shipped). The investigation/plan below is kept for context; see Validated on-box for the result.

TL;DR

A Rust port is feasible, low-on-blockers, and strategically aligned — and there's an unexpected architectural prize beyond "same thing, in Rust."

  • Signing is not a blocker. An IddCx driver is UMDF user-mode; it needs no WHQL, no attestation, no test-signing. A self-signed cert in LocalMachine Root + TrustedPublisher loads it — exactly the model our gamepad drivers already ship (and exactly what SudoVDA and the other forks do). (Do UMDF drivers require signing?)
  • We would not be first in Rust. MolotovCherry/virtual-display-rs is a complete, shipping IddCx driver written in Rust (MIT), with hand-rolled IddCx/WDF bindgen bindings (wdf-umdf-sys + wdf-umdf) and a reference swap-chain processor. This turns "greenfield FFI" into "adapt a proven reference."
  • The prize: we can stop using DXGI Desktop Duplication. An IddCx driver already receives the composited desktop frames in its swap-chain. Looking Glass ships exactly this in production — driver consumes the swap-chain, hands frames to a separate process, "operates entirely independently of DDA." Doing the same would delete an entire class of multi-GPU bugs the current capture/dxgi.rs is built to survive (ACCESS_LOST storms, MODE_CHANGE_IN_PROGRESS, the win32u.dll reparenting patch).

Recommendation: yes, build it in Rust, in phases — a drop-in DDA-compatible driver first (own the stack at low risk), then the direct-frame-push path (the real cleanup). Keep vendoring SudoVDA as the safe interim until the Rust driver is on-glass-validated on the RTX box.

Validated on-box (2026-06-22)

Before committing, the toolchain + load path were proven on the RTX box (Win11 26200, WDK 26100):

  • A Rust IddCx driver builds with our toolchain. Cloned virtual-display-rs and built its driver .dll against our WDK (UMDF 2.31 + IddCx 1.4 stubs, bindgen over IddCx.h via our LLVM, nightly-2024-07-26). One fix needed: its build.rs picked the max SDK Lib version (10.0.28000.0, a base SDK with no IddCx) for the IddCxStub search path; resolving it by the version that actually contains um\x64\iddcx\1.4 (10.0.26100.0, the WDK) fixed the link.
  • It installs self-signed and loads. Signed .dll/.cat with our existing driver cert (the gamepad punktfunk-ds-test), pnputil /add-driver, root devnode via devgen. The device came up Status OK / CM_PROB_NONE, Class Display, hosted by WUDFRd — a Rust IddCx adapter initialized cleanly. (SudoVDA, already live here, independently confirms IddCx + self-signed UMDF work on this box.) Test artifacts removed afterward; SudoVDA untouched.
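The SDK-version fix described above amounts to choosing the newest Lib directory that actually contains the IddCx stub folder, rather than the newest directory outright. A minimal sketch of that selection, assuming a hypothetical helper name (`find_sdk_lib_with`) and not reproducing the vendored build.rs:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Parse a directory name such as "10.0.26100.0" into numeric parts.
/// Names that are not dotted version numbers yield `None`.
fn parse_version(name: &str) -> Option<Vec<u32>> {
    name.split('.').map(|part| part.parse::<u32>().ok()).collect()
}

/// Hypothetical helper: return the highest-versioned directory under
/// `lib_root` that contains the `required` sub-path (for example
/// um\x64\iddcx\1.4), instead of simply taking the maximum version.
fn find_sdk_lib_with(lib_root: &Path, required: &Path) -> Option<PathBuf> {
    let mut candidates: Vec<(Vec<u32>, PathBuf)> = fs::read_dir(lib_root)
        .ok()?
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| {
            let version = parse_version(entry.file_name().to_str()?)?;
            Some((version, entry.path()))
        })
        // Keep only SDK versions that really ship the required folder.
        .filter(|(_, path)| path.join(required).is_dir())
        .collect();
    // Numeric, component-wise ordering: 10.0.26100.0 < 10.0.28000.0.
    candidates.sort();
    candidates.pop().map(|(_, path)| path)
}
```

With a Lib root holding both 10.0.28000.0 (no IddCx) and 10.0.26100.0 (with um\x64\iddcx\1.4), this returns the 10.0.26100.0 path, which matches the resolution that fixed the link.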

Conclusion: the central risk ("can we build + load a Rust IddCx driver here?") is retired. The binding question (D2) resolves toward reusing virtual-display-rs's self-contained wdf-umdf-sys + wdf-umdf bindgen crates (now proven to build + load on our box) rather than extending windows-drivers-rs — IddCx functions are direct IddCxStub exports the WDF function-table macro can't reach anyway, so a unified bindgen is the cleaner base for pf-vdisplay. Reference clone kept at C:\Users\Public\virtual-display-rs.

Scaffold + driver logic landed + on-glass: packaging/windows/vdisplay-driver/ holds the vendored wdf-umdf-sys/wdf-umdf (MIT, + the SDK-version build.rs fix) + the pf-vdisplay driver crate. The full IddCx driver is ported (entry → IDD_CX_CLIENT_CONFIG with all 7 callbacks → device/monitor context → our own EDID → a real swap-chain drain), with the IPC/serde/tokio stack replaced by an in-tree monitor model and OutputDebugString logging. Validated on the RTX box: built, signed (our punktfunk-ds-test cert), installed, loaded Status OK, and a real virtual monitor arrived ("VirtuDisplay+", DISPLAY\CHY0000), i.e. our own all-Rust IddCx virtual display creating a monitor.

IOCTL control plane done + on-glass (P1 functionally complete): the SudoVDA-compatible control plane is implemented (EVT_IDD_CX_DEVICE_IO_CONTROL + the {e5bcc234-…} interface registered via WdfDeviceCreateDeviceInterface; control.rs with byte-identical structs) — ADD a monitor at a requested mode → {LUID, target_id} (target id + adapter LUID captured from IDARG_OUT_MONITORARRIVAL), REMOVE by GUID, PING/GET_WATCHDOG watchdog, GET_VERSION, SET_RENDER_ADAPTER (IddCxAdapterSetRenderAdapter); per-ADD mode injection (requested mode preferred + fallbacks). Added the five missing FFI wrappers to the vendored wdf-umdf. Validated on the RTX box with a probe that mimics vdisplay/sudovda.rs exactly: GET_VERSION → 0.2.1, GET_WATCHDOG → timeout=3, ADD 1920×1080@60 → target_id=257 + adapter LUID, a real "VirtuDisplay+" monitor arrived at the requested mode, REMOVE ok. Constraint: pf-vdisplay can't coexist with SudoVDA — they register the same interface GUID, so two IddCx adapters claiming it → FAILED_POST_START; pf-vdisplay replaces SudoVDA (validated by disabling SudoVDA first).

Watchdog + real-host drive validated: added the watchdog thread (1 Hz countdown reset by any IOCTL; tears down all monitors at 0 so a gone host never leaves a phantom display; mirrors SudoVDA's RunWatchdog). Pointed the real host at it — removed SudoVDA's devnode so pf-vdisplay is the sole {e5bcc234} provider, then ran the host's vdisplay::sudovda::tests::live_create_drop (PUNKTFUNK_SUDOVDA_LIVE=1): test passed, and the pf-vdisplay log shows the host's IOCTLs landing — ADD 1920x1080@60 → target_id=258, luid=…02619823, then the watchdog correctly tore the monitor down when the test process exited without a final REMOVE. So vdisplay/sudovda.rs drives pf-vdisplay unchanged through the full control contract.
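The watchdog behaviour described above (a 1 Hz countdown that any IOCTL resets, firing a teardown when it reaches 0) can be sketched as a small atomic counter. This is an illustrative model only: the `Watchdog` type and its method names are hypothetical, and the real driver thread and monitor teardown are not shown.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical countdown watchdog: every IOCTL feeds it, a 1 Hz timer ticks it.
pub struct Watchdog {
    timeout_secs: u32,
    remaining: AtomicU32,
}

impl Watchdog {
    pub fn new(timeout_secs: u32) -> Self {
        Self { timeout_secs, remaining: AtomicU32::new(timeout_secs) }
    }

    /// Called from the IOCTL handler for every request (PING included):
    /// resets the countdown to the full timeout.
    pub fn feed(&self) {
        self.remaining.store(self.timeout_secs, Ordering::Release);
    }

    /// Called once per second by the watchdog thread. Returns `true` only on
    /// the tick that moves the countdown from 1 to 0, so the caller tears the
    /// monitors down once and not on every later tick.
    pub fn tick(&self) -> bool {
        let previous = self
            .remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |r| {
                if r > 0 { Some(r - 1) } else { None }
            });
        previous == Ok(1)
    }
}
```

In this model, a host that exits without a final REMOVE simply stops feeding, and the third tick of a 3 s timeout is the one that triggers the teardown.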

Validated streaming end-to-end on glass (2026-06-22) — P1 complete. pf-vdisplay is a working SudoVDA replacement. Driven by the real host (serve, the LocalSystem service) with a stock client at 5120×1440@240: the monitor arrives, resolve_gdi_name → \\.\DISPLAY10, set_active_mode + CCD-isolate succeed, the DXGI output resolves under the RTX 4090, WGC capture + NVENC run at steady 240 fps, ~2.4 ms encode, 6512 AUs sent, clean teardown (isolate restored rc=0x0). Same vdisplay/sudovda.rs path, unchanged — full parity with SudoVDA.

The earlier "monitor arrives but never gets a swap-chain / no DXGI output" symptoms were a measurement + state artifact, not a driver bug. Two traps cost a lot of time:

  1. Session 0. Every standalone probe (vdtest, the host's live_create_drop test) ran in Session 0 — the services session, whose desktop is a throwaway 1024×768 basic display. IddCx activation happens in the console Session 1, where the 4090 drives the real desktop. So Screen.AllScreens/CCD queries from Session 0 can never see the virtual monitor activate — they report the wrong desktop. The only valid way to drive + observe it is the host service (SYSTEM, which targets Session 1) plus the driver's own OutputDebugString (system-wide, session-agnostic).
  2. Accumulated device-state damage. Repeated reinstalls + Disable/Enable-PnpDevice cycles + a control handle the host cached across all of it left the device tree wedged (stale handle → the host's PINGs fail → the 3 s watchdog tears the monitor down mid-session → capture opens a dying display → "no DXGI output"). A reboot cleared it and it worked on the first connect. Lesson: after device churn, restart the host service (fresh handle) — and when in doubt, reboot.

The swap-chain processor is a faithful port of virtual-display-rs's (it drains correctly via ReleaseAndAcquireBuffer + FinishedProcessingFrame — the drain is required; a true no-op would stall DWM and freeze the captured image). The EDID is our own clean 128-byte block (manufacturer PNK, product punktfunk) — no SudoVDA bytes.

Build gotcha (important for iterating): updating an installed UMDF driver only takes if the INF DriverVer changes, so deploy-dev.ps1 stamps a date.time -v on every run; without a bump the old binary keeps running (silently). Devnode hygiene: create the root devnode with nefconc --create-device-node (a clean ROOT\DISPLAY node), NOT devgen /add. devgen makes persistent SWD\DEVGEN software devices that survive reboot and registry deletion and resurrect on every pnputil /add-driver (they have hwid root\pf_vdisplay, so the driver install re-materializes them). The production installer must use a single nefconc/INF-created node and never devgen.

P2 — direct frame push (kill DDA): design & decision record

Status: in progress. P1 ships frames the old way (the driver drains its swap-chain and DDA/WGC re-captures the composited desktop). P2 makes the driver publish each swap-chain frame to the host directly, so we can retire Desktop Duplication and its multi-GPU survival code. Built behind PUNKTFUNK_IDD_PUSH, A/B'd against DDA, and only then made the default.

The decisive finding: producer and consumer are both in Session 0

The whole transport design hinged on one unknown — same-session or cross-session? Measured on the RTX box (2026-06-22): the pf-vdisplay host process is WUDFHost.exe with -DeviceGroupId:pfVDisplayGroup, running in Session 0; the punktfunk host service is LocalSystem, also Session 0. So the swap-chain processor thread (spawned by our own thread::spawn inside the driver, i.e. in WUDFHost) and the encoder live in the same session. This is the easy case:

  • A D3D11 shared keyed-mutex texture created in the driver can be opened by name in the host with ID3D11Device1::OpenSharedResourceByName — both devices created on the same render-adapter LUID (which the driver already reports out of the ADD IOCTL via OsAdapterLuid, surfaced as WinCaptureTarget::adapter_luid).
  • Named kernel objects resolve through Session 0's shared \BaseNamedObjects, so no Global\ prefix / SeCreateGlobalPrivilege gymnastics are needed (kept the names unprefixed; documented that this relies on both processes being Session 0). The Looking-Glass cross-VM shared-memory device is unnecessary — this is cross-process, same-session, on one GPU.

This collapses the "Session-0 cross-process transport is the long pole" risk from the original plan.

Transport: a ring of shared keyed-mutex textures + a metadata header + an event

A single ping-pong keyed mutex would couple the driver's present rate to the host's consume rate — and the swap-chain thread must never block (a stalled IddCxSwapChainReleaseAndAcquire/processing loop freezes DWM compositing system-wide). So, the Looking-Glass shape — multiple frame buffers, newest wins:

  • Ring of N (default 3) shared textures, RESOURCE_MISC_SHARED_NTHANDLE | SHARED_KEYEDMUTEX, fixed size for the session. A generation counter bumps on a mode change (resize): the driver tears down + recreates the ring at the new size, the host notices the generation change and re-opens.
  • Named metadata header (CreateFileMapping): {magic, version, generation, width, height, dxgi_format, ring_len, latest} where latest packs {write_index, monotonic sequence} published after the copy completes. Plain (unprefixed) names — Session-0 shared namespace.
  • Frame-ready auto-reset event so the consumer waits instead of spinning.
  • Producer (driver, per acquired frame): pick (latest_index + 1) % N; try-acquire that slot's keyed mutex with a 0 ms timeout (if the host still holds it — rare with 3 slots — reuse the current slot or skip, never block); CopyResource the acquired MetaData.pSurface into the slot; release the mutex; publish {index, ++seq}; SetEvent. Then FinishedProcessingFrame as today.
  • Consumer (host IddPushCapturer): WaitForSingleObject(event, timeout); read latest; if seq advanced, acquire that slot's mutex, CopyResource into an owned NVENC-input texture, release, yield FramePayload::D3d11{texture, device} — straight into the existing zero-copy NVENC path. No DDA, no CPU readback.
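The newest-wins ring above can be modelled without any D3D objects to show the control flow. In this sketch the per-slot keyed mutex is replaced by an `AtomicBool` stand-in, the bit layout of `latest` (sequence in the high bits, slot index in the low 8 bits) is an assumption, and a busy slot is handled by skipping the frame (one of the two options named above). All type and function names are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

const SLOT_BITS: u32 = 8; // assumed layout: latest = seq << 8 | slot

fn pack_latest(seq: u64, slot: u8) -> u64 {
    (seq << SLOT_BITS) | slot as u64
}

fn unpack_latest(value: u64) -> (u64, u8) {
    (value >> SLOT_BITS, (value & 0xFF) as u8)
}

/// Model of the shared ring: `latest` lives in the metadata header,
/// `busy[i]` stands in for slot i's keyed mutex.
pub struct Ring {
    latest: AtomicU64,
    busy: Vec<AtomicBool>,
}

impl Ring {
    pub fn new(len: usize) -> Self {
        assert!(len >= 1 && len <= 256, "slot index must fit in 8 bits");
        Self {
            latest: AtomicU64::new(0),
            busy: (0..len).map(|_| AtomicBool::new(false)).collect(),
        }
    }

    /// Equivalent of a keyed-mutex acquire with a 0 ms timeout.
    fn try_lock(&self, slot: u8) -> bool {
        self.busy[slot as usize]
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn unlock(&self, slot: u8) {
        self.busy[slot as usize].store(false, Ordering::Release);
    }

    /// Producer: copy into the next slot and publish it, or skip the frame
    /// if the consumer still holds that slot. Never blocks.
    pub fn publish(&self, copy: impl FnOnce(u8)) -> Option<u8> {
        let (seq, current) = unpack_latest(self.latest.load(Ordering::Acquire));
        let next = ((current as usize + 1) % self.busy.len()) as u8;
        if !self.try_lock(next) {
            return None;
        }
        copy(next); // CopyResource into the slot would happen here
        self.unlock(next);
        // Publish only after the copy has completed.
        self.latest.store(pack_latest(seq + 1, next), Ordering::Release);
        Some(next)
    }

    /// Consumer: report the newest (seq, slot) if it is newer than `last_seen`.
    pub fn poll(&self, last_seen: u64) -> Option<(u64, u8)> {
        let (seq, slot) = unpack_latest(self.latest.load(Ordering::Acquire));
        (seq > last_seen).then_some((seq, slot))
    }
}
```

The point of the shape is that `publish` has no waiting path: if the host is slow, the producer drops a frame rather than stalling the swap-chain thread.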

What P2 removes vs. keeps

  • Removes: capture/dxgi.rs's DXGI_ERROR_ACCESS_LOST/MODE_CHANGE_IN_PROGRESS re-duplication churn, the legacy-DuplicateOutput fallback, and install_gpu_pref_hook() (the win32u.dll patch) — by pinning the render adapter to the encoder GPU (IddCxAdapterSetRenderAdapter, the existing SET_RENDER_ADAPTER IOCTL, driven before ADD), so the OS never reparents the output and the shared texture + NVENC share one device by construction.
  • Keeps: display topology (making the virtual display the composited desktop) and the watchdog (now ours). The two-process WGC secure-desktop relay stays until we confirm the IDD push also delivers the secure (Winlogon) desktop; if it does, that retires too.

On-glass attempt 2026-06-22 — code complete, blocked at driver load

The full transport (driver publisher + host IddPushCapturer + render-LUID robustness + in-process routing) is written and compiles clean. The first on-glass A/B exposed several real things and one hard blocker:

  • The service captures in a Session-1 WGC helper, not in-process. should_use_helper() returns true for a SYSTEM service, so it spawns a user-session helper that does capture and input injection. IDD-push must capture in-process in Session 0 (where the driver publishes) — wired via should_use_helper() returning false for PUNKTFUNK_IDD_PUSH. Caveat: SendInput from Session 0 can't reach the user's Session-1 desktop, so in-process IDD-push has no working input yet. Production needs either a Session-1 input-only helper, or Global\-namespaced shared textures so a Session-1 helper consumes IDD-push for both video + input.
  • SET_RENDER_ADAPTER is ignored by the driver (the IDD lands on a different adapter than pinned: observed IDD adapter 0xd60722 vs pinned 4090 0x15de1). The render-LUID-in-header path makes the host bind correctly regardless, but the driver should be made to actually honor the pin (or the host must copy across adapters) so NVENC gets a 4090 surface.
  • Cursor is included in the IddCx composited frame (DDA strips it) — so the host-side cursor compositor (P2.5) is likely unnecessary for this path.
  • FAILED_POST_START was a red herring (churn, not the binary). Comparing the 2157 (works) and the frame_transport DLL import tables: identical (same 8 DLLs; the size/hash delta is just the Authenticode signature). A clean install + reboot (no restart-device/disable-enable/kill in between) loads the frame_transport driver to OK. The earlier FAILED_POST_START was the device wedging from the hot-reload churn (the deploy gotchas above). Lesson: deploy = install + reboot, full stop.
  • THE REAL BLOCKER: the driver can't CREATE the shared objects. With the driver loaded clean and the monitor active, the host's IddPushCapturer still times out: pfvd-hdr-<target> never appeared. The driver's own OutputDebugString is invisible (UMDF redirects it to ETW, not DebugView; verified with a working DBWIN self-test), so a file-logging driver build was tried, and it wrote no file at all, even though init() runs in DriverEntry, the device is OK, WUDFHost runs as LocalService, and C:\Users\Public is world-writable. WUDFHost runs with a restricted token: it can neither write the filesystem nor create named kernel objects (CreateFileMappingW/CreateEventW/CreateSharedHandle), so FramePublisher::new fails silently. This is exactly why the gamepad UMDF drivers invert it. From inject/dualsense_windows.rs: "the host creates the section (privileged → a permissive SDDL so the WUDFHost can open it); the driver maps it", using Global\pfds-shm-<idx> + SDDL D:(A;;GA;;;WD). Fix: invert frame-push to match. The HOST creates the header + event + ring textures (Global\ names, D:(A;;GA;;;WD) SDDL); the DRIVER only OPENS them, writes its actual render LUID + a status code back into the host-created header (so we get driver visibility through the host log), and runs the copy loop. The host creates the textures on the render adapter the driver reports.
  • Also unresolved: SET_RENDER_ADAPTER appears ignored (the host's pin to the 4090 vs the ADD-reply adapter differ every time). The inverted header carries the driver's actual render LUID so the host can create textures + run NVENC on the right adapter — but if that's the iGPU, NVENC (NVIDIA) can't encode it, so the driver must be made to honor the pin (or the host must cross-adapter copy). Needs its own investigation.

Driver deploy gotchas learned (this box): hot-reloading a UMDF display driver is unreliable — pnputil /restart-device does NOT restart WUDFHost (old image stays mapped), Disable/Enable-PnpDevice errors on the root-enumerated IDD, and killing WUDFHost invalidates the host's cached {e5bcc234} control handle (every ADD then fails 0x80070006, and the device can wedge to FAILED_POST_START). A reboot loads a freshly-installed build cleanly. Recovery from a broken build is clean and reboot-free: pnputil /delete-driver <oemNN>.inf /uninstall removes the bad package and the device rebinds the previous (validated) package in the DriverStore — restored 2157 → OK immediately.

On-glass attempt 2 (2026-06-23) — inversion works; in-process Session-0 path is a dead end

Implemented the inversion (host creates the header + event + ring textures with the D:(A;;GA;;;WD) SDDL, driver only opens them) + a per-attempt generation (kills the DXGI_ERROR_NAME_ALREADY_EXISTS retry collisions) + a fixed-name Global\pfvd-dbg debug channel (structured counters the driver writes, since UMDF/ETW + the restricted token block its other logs). Results on the RTX box:

  • The host creates the shared ring every time (created shared ring … render_luid=…) — the privileged-create / restricted-open split is sound.
  • No more name collisions (generation fix).
  • The driver writes NOTHING — debug block all zeros, crucially run_core_entries=0. The swap-chain processor never runs, i.e. the OS never assigns a swap-chain to the virtual monitor in this path.

Root cause: an IddCx monitor only gets a swap-chain when something PRESENTS to it, and the in-process path has no presenter. The host + the CCD topology-isolate run in Session 0, which has no DWM / compositor. The WGC path works because its capture helper lives in Session 1, where DWM composes the desktop onto the display (that composition is the swap-chain trigger). So in-process Session-0 IDD-push gets no frames to push, full stop — a fundamental barrier, not a fixable bug. The original plan's "Session-0 transport is the long pole" was right, but the long pole turned out to be triggering presentation, not the shared-memory mechanics (those work).

Consequence: the only viable IDD-push shape is option 3 — a Session-1 helper drives presentation + consumes the Global\ ring (the inversion built here is exactly what it needs). But it carries an unretired risk: it's still unproven whether the swap-chain gets assigned even with a Session-1 consumer that isn't WGC. Until that's answered, DDA/WGC stays the shipping Windows capture path — it works. All the IDD-push code (driver open-side + host create-side + debug channel) is written, compiles, and is gated behind PUNKTFUNK_IDD_PUSH (off), so it's dormant and harmless.

CONCLUSION (2026-06-23): IDD-push is not viable for bare-metal capture — the swap-chain is never assigned

After the inversion + a fixed-name debug channel + a host-created-ring observer + an autonomous loopback test harness (punktfunk-probe → the SYSTEM service, paired via the mgmt API), the question "does the driver's swap-chain processor ever run?" was answered definitively: no. The driver's run_core is never entered: run_core_entries=0 in every configuration tested:

  • in-process (Session 0) and WGC-triggered (Session 1 helper) sessions,
  • a user-created ring AND a host-created (LocalSystem) ring with a permissive D:(A;;GA;;;WD) SDDL,
  • with and without a Low-IL (S:(ML;;NW;;;LW)) mandatory label,
  • with WUDFHost confirmed not an AppContainer (IsAppContainer=0),

— even while WGC simultaneously captured the same virtual monitor's composition and streamed multi-MB of HEVC. The gamepad UMDF drivers prove a UMDF driver can open + write a host-created Global\ section on this box, so the driver writing nothing is not an access problem — run_core simply does not run.

Root cause (researched + ecosystem-confirmed): an IddCx virtual monitor only receives a swap-chain (EVT_IDD_CX_MONITOR_ASSIGN_SWAPCHAIN) when the OS presents/scans-out to it, which requires a real presentation consumer. WGC/DDA capture of the composed desktop does NOT count — it reads DWM's composition, bypassing the driver's swap-chain. With no physical scanout and no consumer that routes through the driver, the path stays inactive (IDDCX_PATH_FLAGS=0) and ASSIGN_SWAPCHAIN never fires. Confirming evidence:

  • Every bare-metal virtual-display capture project uses WGC/DDA, not the driver swap-chain: SudoVDA (its swap-chain loop acquires-and-discards), Apollo/Sunshine (DDA + WGC backends), virtual-display-rs (discards), parsec-vdd (no frame path). Only Looking Glass consumes the driver swap-chain — and only because a VM guest scans out the display (the consumer). We have no equivalent on bare metal.
  • Microsoft's own unanswered Q&A (learn.microsoft.com/answers 4096179) reports the identical symptom for the IddSampleDriver: virtual display "always inactive," ASSIGN_SWAPCHAIN never runs.

Verdict: the "driver consumes its swap-chain and pushes frames" architecture (P2 / Looking-Glass style) cannot get frames for punktfunk's bare-metal, whole-desktop, capture-only use case. The shared-memory transport machinery (host-creates / driver-opens, the gamepad pattern) is all sound and proven to create, but there is nothing for the driver to publish. DDA/WGC remains the only viable Windows capture path, which is exactly what the entire ecosystem does. The IDD-push code stays in-tree, compiles, and is gated off (PUNKTFUNK_IDD_PUSH) — dormant and harmless — documenting the attempt so it isn't re-tried. "Better performance/lower overhead" must come from optimizing the WGC/DDA path (e.g. trimming the Session-0↔Session-1 relay, zero-copy encode), not from IDD-push.

The only unexplored avenue is driver-side (a different adapter/monitor/path setup that might make the OS treat the virtual display as a presentation target) — but it needs a reboot to test, the MS Q&A suggests it's unsolved, and the unanimous ecosystem choice of WGC/DDA argues it's a dead end.

Final exhaustion (2026-06-23, follow-up): both remaining avenues closed.

  • Option 3 (present source) — TESTED, failed. Added a present-trigger to the Session-1 WGC helper: it successfully created a D3D11 swapchain on the virtual display and presented continuously (WGC even captured the flashing window). The driver stayed run_core_entries=0 / frames_acquired=0. So an active present source on the display does NOT make the OS assign the driver's swap-chain either — DWM composes the present onto the display (capturable) without routing it through the driver's swap-chain.
  • Option 2 (driver flag) — closed by analysis. The present-trigger succeeding proves the path is already active (a swapchain presents to the display fine); the missing piece is scanout routed through the driver, which the OS does only for a real consumer (physical display / VM guest / RDP). The one IddCx flag for that — IDDCX_ADAPTER_FLAGS_REMOTE_SESSION_DRIVER — requires the RDP protocol stack as the consumer, which bare-metal console capture has no equivalent of.

Verdict is final: IDD-push needs a presentation consumer (scanout / VM guest / RDP) that bare-metal console desktop-capture fundamentally cannot provide. No host-side capture, no in-process path, no present source, and no available driver flag overcomes it. WGC (normal desktop) + DDA (secure desktop) is the only viable Windows capture path — as the entire ecosystem already does. The IDD-push + present-trigger code stays in-tree, gated off, as the documented record of the attempt.

Known gaps the build-out must close (tracked as P2.* tasks)

  • Cursor. DDA/WGC composite the HW cursor host-side from frame-info; the IDD path delivers the cursor separately (IddCxMonitorSetupHardwareCursor event → QueryHardwareCursor). The prototype may ship cursor-less; the build-out wires the IDD cursor into the existing CursorCompositor.
  • HDR. The default IddCx swap-chain surface is 8-bit B8G8R8A8; FP16/HDR needs the IddCx 1.11 D3D12 acquire path (SetDevice2/ReleaseAndAcquireBuffer2 → ID3D12Resource). Build against 1.10, runtime-gate 1.11. SDR-only for the prototype.

Why we'd do this

The user's goals, mapped to outcomes:

| Goal | Outcome |
| --- | --- |
| Drop external deps | No more vendored prebuilt SudoVDA .dll/.cat (third-party, C++, single upstream). |
| Increase Rust coverage | The display driver joins the gamepad drivers as in-tree Rust UMDF. |
| Own the stack / easier display management | We control the IOCTL protocol, the EDID, the mode list, and the watchdog, and can fold the topology/mode logic that's currently scattered in vdisplay/sudovda.rs into the driver. |
| Cleaner code | Phase 2 retires capture/dxgi.rs's DDA workarounds + the win32u.dll patch. |

What we'd be replacing (current architecture)

  • Driver: SudoVDA — UMDF2 IddCx, Class=Display, UmdfExtensions=IddCx0102, UpperFilters=IndirectKmd, root-enumerated Root\SudoMaker\SudoVDA. Vendored prebuilt under packaging/windows/sudovda/, installed by install-sudovda.ps1 (cert → nefconc devnode → pnputil). Source is public (SudoMaker/SudoVDA, README-only MIT/CC0 grant over the MS sample, ~1,900 LOC C++).
  • Host contract: crates/punktfunk-host/src/vdisplay/sudovda.rs opens the control device by interface GUID {e5bcc234-…} and drives a tiny METHOD_BUFFERED IOCTL protocol — byte-identical to SudoVDA's Common/Include/sudovda-ioctl.h:
    • ADD (0x800) {w,h,refresh,GUID,name[14],serial[14]} → {LUID, target_id}
    • REMOVE (0x801) {GUID} · SET_RENDER_ADAPTER (0x802) {LUID} · GET_WATCHDOG (0x803) · PING (0x888) (mandatory keepalive) · GET_VERSION (0x8FF)
  • Capture: capture/dxgi.rs finds the virtual monitor's GDI output across all adapters (it's enumerated under the rendering GPU, not SudoVDA's LUID) and runs DXGI Desktop Duplication (DuplicateOutput1, FP16 for HDR). This file is dominated by virtual-display-over-DDA survival code: DXGI_ERROR_ACCESS_LOST re-duplication with retries, MODE_CHANGE_IN_PROGRESS backoff, legacy-DuplicateOutput fallback, CCD display isolation to make the IDD the sole composited desktop, and an install_gpu_pref_hook() that patches win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI reparenting the output across GPUs. Most of that exists because we capture a virtual display via DDA on a multi-GPU box.
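The function codes in the host contract above (0x800, 0x801, ...) are only one field of a full Windows IOCTL code. As a sketch, the standard CTL_CODE packing can be written as a `const fn`. The device type (FILE_DEVICE_UNKNOWN) and access value (FILE_ANY_ACCESS) used here are assumptions, since this document does not state them; the authoritative values are in sudovda-ioctl.h.

```rust
// Assumed values: the document only specifies METHOD_BUFFERED and the function codes.
const FILE_DEVICE_UNKNOWN: u32 = 0x22;
const FILE_ANY_ACCESS: u32 = 0;
const METHOD_BUFFERED: u32 = 0;

/// Same bit layout as the Windows CTL_CODE macro:
/// device type in bits 16+, access in bits 14-15, function in bits 2-13, method in bits 0-1.
const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

/// Hypothetical helper that fills in the assumed device type and access.
const fn vda_ioctl(function: u32) -> u32 {
    ctl_code(FILE_DEVICE_UNKNOWN, function, METHOD_BUFFERED, FILE_ANY_ACCESS)
}

pub const IOCTL_ADD: u32 = vda_ioctl(0x800);
pub const IOCTL_REMOVE: u32 = vda_ioctl(0x801);
pub const IOCTL_SET_RENDER_ADAPTER: u32 = vda_ioctl(0x802);
pub const IOCTL_GET_WATCHDOG: u32 = vda_ioctl(0x803);
pub const IOCTL_PING: u32 = vda_ioctl(0x888);
pub const IOCTL_GET_VERSION: u32 = vda_ioctl(0x8FF);

/// Recover the 12-bit function code from a packed IOCTL code,
/// which is what a driver-side dispatch would match on.
const fn function_of(code: u32) -> u32 {
    (code >> 2) & 0xFFF
}
```

Under these assumptions IOCTL_ADD evaluates to 0x222000. Because the protocol must stay byte-identical to SudoVDA's header, any real implementation should take the constants from that header, not recompute them from guesses.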

Feasibility findings

Signing — green (the make-or-break)

UMDF user-mode ⇒ Code-Integrity signing rules don't apply to our binary (the only kernel piece is Microsoft's inbox IndirectKmd). Self-signed cert in Root + TrustedPublisher is sufficient on a normal Secure-Boot Win11 box — no bcdedit /set testsigning. SudoVDA and virtual-display-rs both ship this way. This is the same model as our DualSense/DS4/XUSB drivers. (The only thing that breaks install is a botched cert placement, not a signing tier.)

Rust prior art — exists, MIT, reusable

virtual-display-rs proves an all-Rust IddCx driver runs in production and gives us: wdf-umdf-sys (bindgen over WDF and iddcx.h, links IddCxStub), wdf-umdf (safe wrappers — iddcx.rs ~300 LOC, with an IddCxIsFunctionAvailable! version-gate macro), and a reference driver (swap_chain_processor.rs ~158 LOC, direct_3d_device.rs, edid.rs). Caveat: it uses its own bindgen stack, not microsoft/windows-drivers-rs — see Decision D2.

windows-drivers-rs IddCx support — absent, but a bounded extension

Our wdk-sys (m0) binds Base + WDF + feature-gated subsets (hid/gpio/spb/…). Zero IddCx symbols. Adding it is the same shape as the existing subsets: an ApiSubset::Iddcx variant + iddcx feature → iddcx_headers() returning iddcx.h for bindgen, and linking IddCx.lib. IddCx functions are not WDF-table functions, so the call_unsafe_wdf_function_binding! macro doesn't apply — they're direct IddCx.lib exports we'd #[link(name="IddCx")] extern (or bindgen) and wrap ourselves. windows 0.58 (already in the tree) provides the Direct3D11/Dxgi APIs the swap-chain loop needs.

The IddCx driver itself — well-understood, ~1-2k LOC

Required callbacks (baselined on the MS IddSampleDriver, ~1,100 LOC, IddCx 1.4): EVT_IDD_CX_ADAPTER_INIT_FINISHED, ADAPTER_COMMIT_MODES, PARSE_MONITOR_DESCRIPTION, MONITOR_GET_DEFAULT_DESCRIPTION_MODES, MONITOR_QUERY_TARGET_MODES, MONITOR_ASSIGN_SWAPCHAIN (the only callback with real D3D work), MONITOR_UNASSIGN_SWAPCHAIN, and DEVICE_IO_CONTROL (where our ADD/REMOVE/PING IOCTLs live). Init flow: WdfDeviceCreate → IddCxDeviceInitConfig → IddCxDeviceInitialize → IddCxAdapterInitAsync → IddCxMonitorCreate → IddCxMonitorArrival.

Arbitrary resolutions don't need EDID timings: ship one generic EDID (a 128-byte base block, or 256 bytes with one extension block) so Windows treats the target as a real monitor, then advertise modes programmatically from the mode-list callbacks: a static table plus the runtime-requested client mode injected as preferred (exactly SudoVDA's s_DefaultModes[] + per-ADD preferred-mode approach). A mode such as 5120×1440@240 is simply added at ADD time.
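
The static-table-plus-preferred-mode scheme can be sketched in plain Rust. This is a minimal model under stated assumptions: `Mode`, `DEFAULT_MODES`, and `build_mode_list` are hypothetical names, the table entries are examples, and "preferred" is modelled as "first in the list"; a real driver would translate the result into the IddCx mode structures and report the preferred index through the callback's output arguments.

```rust
/// One display mode as the client requests it (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Mode {
    width: u32,
    height: u32,
    refresh_hz: u32,
}

/// Static fallback table, analogous in spirit to a default-modes array.
/// The entries here are examples only.
const DEFAULT_MODES: &[Mode] = &[
    Mode { width: 1920, height: 1080, refresh_hz: 60 },
    Mode { width: 2560, height: 1440, refresh_hz: 60 },
    Mode { width: 3840, height: 2160, refresh_hz: 60 },
];

/// Builds the advertised mode list for one ADD request.
/// The requested mode goes first (index 0 = preferred in this sketch),
/// followed by the static table with any duplicate of it removed.
fn build_mode_list(requested: Mode) -> Vec<Mode> {
    let mut modes = Vec::with_capacity(DEFAULT_MODES.len() + 1);
    modes.push(requested);
    modes.extend(DEFAULT_MODES.iter().copied().filter(|m| *m != requested));
    modes
}
```

For example, `build_mode_list(Mode { width: 5120, height: 1440, refresh_hz: 240 })` yields the ultrawide mode followed by the three table entries, without that mode ever appearing in the EDID.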

HDR/10-bit: supported, but it's the one place IddCx is harder than today. The default swap-chain surface is 8-bit A8R8G8B8; FP16/HDR requires the IddCx 1.11 D3D12 acquire path (SetDevice2 / ReleaseAndAcquireBuffer2 → ID3D12Resource, with a stricter sync model). Our box is Win11 26200 (IddCx ≥ 1.10), so this is reachable, but it's real work, whereas our current WGC/DDA path gives FP16 HDR "for free." Build against 1.10 and runtime-gate the newer DDIs (SudoVDA's pattern).
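
The "build against the older version, runtime-gate the newer DDIs" pattern boils down to one version comparison before choosing an acquire path. The sketch below assumes a hypothetical `IddCxVersion` type and `choose_acquire_path` helper; the 1.11 threshold is taken from the paragraph above and is not verified against the WDK headers. In a real driver the runtime version would come from the framework, and the gate would sit next to a function-availability check.

```rust
/// Runtime IddCx version as (major, minor); hypothetical representation.
/// Derived ordering compares `major` first, then `minor`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct IddCxVersion {
    major: u16,
    minor: u16,
}

/// Minimum version for the HDR acquire path, per the text above (assumption).
const HDR_MIN_VERSION: IddCxVersion = IddCxVersion { major: 1, minor: 11 };

/// Which swap-chain acquire path the driver should use.
#[derive(Debug, PartialEq, Eq)]
enum AcquirePath {
    /// Default 8-bit surface path, available on every supported version.
    Sdr8Bit,
    /// FP16/HDR path that needs the newer DDIs.
    HdrFp16,
}

/// Picks the HDR path only when it is both wanted and available at runtime;
/// otherwise falls back to the 8-bit path instead of failing.
fn choose_acquire_path(runtime: IddCxVersion, want_hdr: bool) -> AcquirePath {
    if want_hdr && runtime >= HDR_MIN_VERSION {
        AcquirePath::HdrFp16
    } else {
        AcquirePath::Sdr8Bit
    }
}
```

Falling back rather than erroring keeps the driver usable on older hosts, at the cost of silently delivering SDR when HDR was requested; a production driver would presumably log that downgrade.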

The architectural prize: skip DDA (Phase 2)

An IddCx driver gets each presented frame from IddCxSwapChainReleaseAndAcquireBuffer as an IDXGIResource on a device we bind via IddCxSwapChainSetDevice. We can copy it into a shared texture / shared section and hand it to the host's encoder process directly — no Desktop Duplication. Why this is the real win, not just a detour:

  • It's the intended IddCx use case. IddCx exists for remote/wireless/USB displays that ship swap-chain frames over a wire; consuming frames in the driver is the designed path, and Looking Glass already does exactly this (driver → shared memory → separate consumer, no DDA).
  • It kills the multi-GPU bug class. We call IddCxAdapterSetRenderAdapter to pin the swap-chain to the same GPU as our NVENC encoder before adding the monitor, and the OS honors it. No more DXGI reparenting the output onto the wrong GPU, no ACCESS_LOST storms, and we can retire install_gpu_pref_hook() (the win32u.dll patch) and most of capture/dxgi.rs. Swap-chain re-creation becomes a documented, in-band event (ABANDON_SWAPCHAIN) instead of an undocumented failure we fight with retries.

What it does not remove (be honest): display topology management — making the virtual display the sole/primary composited desktop so the game (and Winlogon) render to it — is independent of how we get frames and stays (though we can integrate it more cleanly). And the watchdog stays, now ours.

The cost: a Session-0 → service cross-process frame transport (the driver host is WUDFHost in Session 0 / LocalService; our host is a LocalSystem service). A Global\-named, explicitly-ACL'd shared section + keyed-mutex texture (Looking Glass's shape) is where the engineering actually goes — prototype this first, it's the only genuinely new risk. Plus the HDR D3D12 path above.
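
To make the transport discussion concrete, here is a minimal in-process model of the bookkeeping a shared frame ring might need: a header with a publish counter from which producer and consumer both derive the slot index. Everything here is an assumption for illustration (`RingHeader`, `SLOTS`, the three-slot size); it covers only the sequencing, not the named section, its ACL, or the keyed-mutex protection of the textures, which is where the real risk lies.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Number of frame slots in the ring (hypothetical choice).
const SLOTS: u64 = 3;

/// Header that would sit at the start of the shared section (hypothetical layout).
#[repr(C)]
struct RingHeader {
    /// Count of frames published so far; 0 means no frame yet.
    published: AtomicU64,
}

impl RingHeader {
    const fn new() -> Self {
        Self { published: AtomicU64::new(0) }
    }

    /// Producer side: slot the next frame should be copied into.
    fn next_write_slot(&self) -> u64 {
        self.published.load(Ordering::Relaxed) % SLOTS
    }

    /// Producer side: call after the copy completes. Returns the new sequence.
    /// Release ordering makes the copy visible before the counter moves.
    fn publish(&self) -> u64 {
        self.published.fetch_add(1, Ordering::Release) + 1
    }

    /// Consumer side: newest frame as (sequence, slot) if it is newer than
    /// `last_seen`. Intermediate frames are skipped on purpose.
    fn latest(&self, last_seen: u64) -> Option<(u64, u64)> {
        let seq = self.published.load(Ordering::Acquire);
        if seq > last_seen {
            Some((seq, (seq - 1) % SLOTS))
        } else {
            None
        }
    }
}
```

Note that this model does not stop the producer from wrapping around onto a slot the consumer is still reading; in the design described above that exclusion would come from the per-texture keyed mutex, not from the counter.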

Decisions to make at kickoff

  • D1 — Own the driver? Recommend yes, in Rust. (Alternatives: fork SudoVDA's C++ — fastest to a known-good HDR driver but reintroduces a C++ toolchain and README-only license provenance; or keep vendoring — zero cost, but none of the goals.)
  • D2 — Binding stack? The main implementation fork.
    • (a) Extend our windows-drivers-rs (m0) with an iddcx subset: one toolchain and one build env across all our drivers, but we write the IddCx bindings ourselves (+~3–5 wk), using virtual-display-rs's iddcx.rs as the 1:1 guide. Preferred for consistency.
    • (b) Vendor virtual-display-rs's wdf-umdf* crates (MIT) — fastest to first light, but a second WDK-binding stack in-tree.
    • Suggested sequence: prototype on (b) to prove IddCx-on-our-box in days, then build production on (a) for consistency.
  • D3 — Frame transport? Phase it: DDA-compatible first (zero capture-side change), direct push second (the cleanup). Don't couple the driver rewrite to the transport rewrite.
  • P0 — now: keep vendoring SudoVDA. No change. (The gamepad-driver installer work just shipped; this is independent.)
  • P1 — drop-in Rust IddCx driver (pf-vdisplay). Replicate SudoVDA's IOCTL contract exactly (same struct layouts; reuse or re-issue the control interface GUID) so vdisplay/sudovda.rs needs ~zero change (at most a GUID constant). Class=Display + IddCx INF, our own EDID + programmatic mode list incl. the per-ADD client mode, the watchdog, and a real swap-chain drain (the vdd port; the drain is required so DWM keeps compositing, while DDA/WGC still captures the desktop). Bundle + self-sign + pnputil-install via the installer, identical to the gamepad-driver path we just built. Outcome: all-Rust, SudoVDA dependency dropped, DDA capture unchanged. Effort ≈ 2–4 wk to first light, 5–7 wk to parity (HDR, multi-monitor, CI).
  • P2 — direct frame push (kill DDA). Add a swap-chain processor that copies each frame into a shared section/texture; new capture backend reads it directly; pin the render adapter to the encoder GPU. Gate behind a flag, validate against DDA, then retire the DDA path + the win32u.dll patch. HDR via the IddCx 1.11 D3D12 acquire path. Outcome: the real "owning the stack pays off" cleanup. Effort: additional; the Session-0 transport is the long pole.
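
The watchdog named in P1 pairs naturally with the PING IOCTL. The sketch below assumes the usual shape of such a mechanism, which this section does not spell out: each PING resets a deadline, and once the deadline passes the driver would tear down its virtual monitors. `Watchdog` and its methods are hypothetical names, and time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Tracks whether the host has pinged recently enough (hypothetical type).
struct Watchdog {
    /// Maximum allowed gap between two pings.
    timeout: Duration,
    /// Time of the most recent ping (or of creation).
    last_ping: Instant,
}

impl Watchdog {
    /// Starts the watchdog; creation counts as the first ping.
    fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_ping: now }
    }

    /// Called from the PING handler: pushes the deadline forward.
    fn ping(&mut self, now: Instant) {
        self.last_ping = now;
    }

    /// True once `timeout` has elapsed since the last ping.
    /// A driver would then remove its monitors (assumed behaviour).
    fn expired(&self, now: Instant) -> bool {
        now.duration_since(self.last_ping) >= self.timeout
    }
}
```

Injecting `now` instead of calling `Instant::now()` inside the methods is a deliberate choice: the same struct can then be driven by a WDF timer callback in the driver and by synthetic timestamps in unit tests.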

Risks

  1. D3D-in-a-driver swap-chain loop — the one genuinely new piece; bugs here = black screens/TDR. Mitigated by virtual-display-rs's swap_chain_processor.rs + the MS sample as references.
  2. Session-0 cross-process transport (P2) — the actual hard part; prototype it first.
  3. HDR = the harder D3D12 1.11 path — our current WGC/DDA HDR is free; the IddCx HDR path is not.
  4. Two binding stacks if we go D2(b) — a maintenance cost cutting against "clean/consistent."
  5. No WHQL ⇒ no Windows Update / Dev-Center distribution — same constraint our gamepad drivers already accept (bundle + self-sign + import cert).

References