Windows virtual display — a Rust port of SudoVDA (investigation & plan)
Status: P1 done — pf-vdisplay validated streaming on glass at 5120×1440@240 (2026-06-22). The
all-Rust IddCx driver replaces the vendored SudoVDA C++ driver, matching the "all-Rust UMDF, zero
external driver deps" direction we finished for gamepads (ViGEmBus gone; DualSense/DS4/XUSB shipped).
The investigation/plan below is kept for context; see Validated on-box for the result.
TL;DR
A Rust port is feasible, low-on-blockers, and strategically aligned — and there's an unexpected architectural prize beyond "same thing, in Rust."
- Signing is not a blocker. An IddCx driver is UMDF user-mode; it needs no WHQL, no attestation, no test-signing. A self-signed cert in LocalMachine Root + TrustedPublisher loads it, exactly the model our gamepad drivers already ship (and exactly what SudoVDA and the other forks do). (Do UMDF drivers require signing?)
- We would not be first in Rust. MolotovCherry/virtual-display-rs is a complete, shipping IddCx driver written in Rust (MIT), with hand-rolled IddCx/WDF bindgen bindings (wdf-umdf-sys + wdf-umdf) and a reference swap-chain processor. This turns "greenfield FFI" into "adapt a proven reference."
- The prize: we can stop using DXGI Desktop Duplication. An IddCx driver already receives the composited desktop frames in its swap-chain. Looking Glass ships exactly this in production: the driver consumes the swap-chain, hands frames to a separate process, and "operates entirely independently of DDA." Doing the same would delete an entire class of multi-GPU bugs the current capture/dxgi.rs is built to survive (ACCESS_LOST storms, MODE_CHANGE_IN_PROGRESS, the win32u.dll reparenting patch).
Recommendation: yes, build it in Rust, in phases — a drop-in DDA-compatible driver first (own the stack at low risk), then the direct-frame-push path (the real cleanup). Keep vendoring SudoVDA as the safe interim until the Rust driver is on-glass-validated on the RTX box.
Validated on-box (2026-06-22)
Before committing, the toolchain + load path were proven on the RTX box (Win11 26200, WDK 26100):
- A Rust IddCx driver builds with our toolchain. Cloned virtual-display-rs and built its driver .dll against our WDK (UMDF 2.31 + IddCx 1.4 stubs, bindgen over IddCx.h via our LLVM, nightly-2024-07-26). One fix needed: its build.rs picked the max SDK Lib version (10.0.28000.0, a base SDK with no IddCx) for the IddCxStub search path; resolving it by the version that actually contains um\x64\iddcx\1.4 (10.0.26100.0, the WDK) fixed the link.
- It installs self-signed and loads. Signed .dll/.cat with our existing driver cert (the gamepad punktfunk-ds-test), pnputil /add-driver, root devnode via devgen. The device came up Status OK / CM_PROB_NONE, Class Display, hosted by WUDFRd: a Rust IddCx adapter initialized cleanly. (SudoVDA, already live here, independently confirms IddCx + self-signed UMDF work on this box.) Test artifacts removed afterward; SudoVDA untouched.
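The build.rs fix described above (choose the SDK Lib version that actually contains the IddCx stubs, not simply the highest one) can be sketched as follows. This is a minimal illustration, not the vendored build.rs: the function names `parse_version` and `find_iddcx_lib_dir` are hypothetical, and the only facts taken from the text are the um\x64\iddcx\1.4 sub-path and the two version numbers.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Parse a dotted SDK version directory name such as "10.0.26100.0".
/// (Hypothetical helper, not taken from virtual-display-rs.)
fn parse_version(name: &str) -> Option<Vec<u32>> {
    name.split('.').map(|part| part.parse::<u32>().ok()).collect()
}

/// Return the IddCx stub directory of the highest SDK version under `lib_root`
/// that actually ships one, skipping versions (like a base SDK) that lack it.
fn find_iddcx_lib_dir(lib_root: &Path) -> Option<PathBuf> {
    // Location of the IddCx 1.4 stubs relative to one SDK version directory.
    let stub_rel: PathBuf = ["um", "x64", "iddcx", "1.4"].iter().collect();

    let mut candidates: Vec<(Vec<u32>, PathBuf)> = fs::read_dir(lib_root)
        .ok()?
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| {
            let version = parse_version(entry.file_name().to_str()?)?;
            let stub_dir = entry.path().join(&stub_rel);
            stub_dir.is_dir().then(|| (version, stub_dir))
        })
        .collect();

    // Vec<u32> compares component by component, which orders the versions.
    candidates.sort();
    candidates.pop().map(|(_, dir)| dir)
}
```

With a Lib root holding both 10.0.28000.0 (no iddcx directory) and 10.0.26100.0 (with it), this returns the 10.0.26100.0 path even though 28000 sorts higher.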
Conclusion: the central risk ("can we build + load a Rust IddCx driver here?") is retired. The
binding question (D2) resolves toward reusing virtual-display-rs's self-contained wdf-umdf-sys +
wdf-umdf bindgen crates (now proven to build + load on our box) rather than extending
windows-drivers-rs — IddCx functions are direct IddCxStub exports the WDF function-table macro
can't reach anyway, so a unified bindgen is the cleaner base for pf-vdisplay. Reference clone kept at
C:\Users\Public\virtual-display-rs.
Scaffold + driver logic landed + on-glass: packaging/windows/vdisplay-driver/ — vendored
wdf-umdf-sys/wdf-umdf (MIT, + the SDK-version build.rs fix) + the pf-vdisplay driver crate. The
full IddCx driver is ported (entry → IDD_CX_CLIENT_CONFIG with all 7 callbacks → device/monitor
context → our own EDID → a real swap-chain drain), with the IPC/serde/tokio stack replaced by an
in-tree monitor model and OutputDebugString logging. Validated on the RTX box: built, signed
(our punktfunk-ds-test cert), installed, loaded with Status OK, and a real virtual monitor arrived
("VirtuDisplay+", DISPLAY\CHY0000), i.e. our own all-Rust IddCx virtual display creating a monitor.
IOCTL control plane done + on-glass (P1 functionally complete): the SudoVDA-compatible control
plane is implemented (EVT_IDD_CX_DEVICE_IO_CONTROL + the {e5bcc234-…} interface registered via
WdfDeviceCreateDeviceInterface; control.rs with byte-identical structs) — ADD a monitor at a
requested mode → {LUID, target_id} (target id + adapter LUID captured from IDARG_OUT_MONITORARRIVAL),
REMOVE by GUID, PING/GET_WATCHDOG watchdog, GET_VERSION, SET_RENDER_ADAPTER
(IddCxAdapterSetRenderAdapter); per-ADD mode injection (requested mode preferred + fallbacks). Added
the five missing FFI wrappers to the vendored wdf-umdf. Validated on the RTX box with a probe
that mimics vdisplay/sudovda.rs exactly: GET_VERSION → 0.2.1, GET_WATCHDOG → timeout=3,
ADD 1920×1080@60 → target_id=257 + adapter LUID, a real "VirtuDisplay+" monitor arrived at the
requested mode, REMOVE ok. Constraint: pf-vdisplay can't coexist with SudoVDA — they register the
same interface GUID, so two IddCx adapters claiming it → FAILED_POST_START; pf-vdisplay replaces
SudoVDA (validated by disabling SudoVDA first).
Watchdog + real-host drive validated: added the watchdog thread (1 Hz countdown reset by any IOCTL;
tears down all monitors at 0 so a gone host never leaves a phantom display; mirrors SudoVDA's
RunWatchdog). Pointed the real host at it — removed SudoVDA's devnode so pf-vdisplay is the sole
{e5bcc234} provider, then ran the host's vdisplay::sudovda::tests::live_create_drop
(PUNKTFUNK_SUDOVDA_LIVE=1): test passed, and the pf-vdisplay log shows the host's IOCTLs landing —
ADD 1920x1080@60 → target_id=258, luid=…02619823, then the watchdog correctly tore the monitor down
when the test process exited without a final REMOVE. So vdisplay/sudovda.rs drives pf-vdisplay
unchanged through the full control contract.
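The watchdog behaviour described above (a countdown ticked at 1 Hz, reset by any IOCTL, tearing down all monitors when it reaches 0) reduces to a small state machine. The sketch below is an assumption-laden illustration, not the driver's code: `Watchdog`, `WatchdogAction`, and the choice to stay idle after expiry until the next IOCTL are all hypothetical.

```rust
/// What the 1 Hz watchdog thread should do after a tick (hypothetical type).
#[derive(Debug, PartialEq)]
enum WatchdogAction {
    KeepGoing,
    TearDownAllMonitors,
}

/// Countdown watchdog: every IOCTL calls `reset`, the thread calls `tick` once per second.
struct Watchdog {
    timeout_secs: u32,
    remaining: u32,
}

impl Watchdog {
    fn new(timeout_secs: u32) -> Self {
        Watchdog { timeout_secs, remaining: timeout_secs }
    }

    /// Called from the IOCTL handler: any request from the host counts as a keepalive.
    fn reset(&mut self) {
        self.remaining = self.timeout_secs;
    }

    /// Called once per second. Fires the teardown exactly once, on the tick that
    /// reaches zero; afterwards it stays idle until the host resets it again
    /// (an assumption of this sketch).
    fn tick(&mut self) -> WatchdogAction {
        if self.remaining == 0 {
            return WatchdogAction::KeepGoing;
        }
        self.remaining -= 1;
        if self.remaining == 0 {
            WatchdogAction::TearDownAllMonitors
        } else {
            WatchdogAction::KeepGoing
        }
    }
}
```

With the timeout of 3 that GET_WATCHDOG reported, a host that stops sending PINGs loses its monitors on the third tick, which matches the teardown seen when the test process exited without a final REMOVE.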
Validated streaming end-to-end on glass (2026-06-22) — P1 complete. pf-vdisplay is a working
SudoVDA replacement. Driven by the real host (serve, the LocalSystem service) with a stock client
at 5120×1440@240: the monitor arrives, resolve_gdi_name → \\.\DISPLAY10, set_active_mode +
CCD-isolate succeed, the DXGI output resolves under the RTX 4090, WGC capture + NVENC run at
steady 240 fps, ~2.4 ms encode, 6512 AUs sent, clean teardown (isolate restored rc=0x0). Same
vdisplay/sudovda.rs path, unchanged — full parity with SudoVDA.
The earlier "monitor arrives but never gets a swap-chain / no DXGI output" symptoms were a measurement + state artifact, not a driver bug. Two traps cost a lot of time:
- Session 0. Every standalone probe (vdtest, the host's live_create_drop test) ran in Session 0, the services session, whose desktop is a throwaway 1024×768 basic display. IddCx activation happens in the console Session 1, where the 4090 drives the real desktop. So Screen.AllScreens/CCD queries from Session 0 can never see the virtual monitor activate; they report the wrong desktop. The only valid way to drive + observe it is the host service (SYSTEM, which targets Session 1) plus the driver's own OutputDebugString (system-wide, session-agnostic).
- Accumulated device-state damage. Repeated reinstalls + Disable/Enable-PnpDevice cycles + a control handle the host cached across all of it left the device tree wedged (stale handle → the host's PINGs fail → the 3 s watchdog tears the monitor down mid-session → capture opens a dying display → "no DXGI output"). A reboot cleared it and it worked on the first connect. Lesson: after device churn, restart the host service (fresh handle), and when in doubt, reboot.
The swap-chain processor is a faithful port of virtual-display-rs's (it drains correctly via
ReleaseAndAcquireBuffer + FinishedProcessingFrame — the drain is required; a true no-op would
stall DWM and freeze the captured image). The EDID is our own clean 128-byte block (manufacturer
PNK, product punktfunk) — no SudoVDA bytes.
Build gotcha (important for iterating): updating an installed UMDF driver only takes if the INF
DriverVer changes — deploy-dev.ps1 stamps a date.time -v on every run; without a bump the old
binary keeps running (silently). Devnode hygiene: create the root devnode with
nefconc --create-device-node (a clean ROOT\DISPLAY node), NOT devgen /add — devgen makes
persistent SWD\DEVGEN software devices that survive reboot and registry deletion and resurrect
on every pnputil /add-driver (they have hwid root\pf_vdisplay, so the driver install re-materializes
them). The production installer must use a single nefconc/INF-created node and never devgen.
P2 — direct frame push (kill DDA): design & decision record
Status: in progress. P1 ships frames the old way (the driver drains its swap-chain and DDA/WGC
re-captures the composited desktop). P2 makes the driver publish each swap-chain frame to the host
directly, so we can retire Desktop Duplication and its multi-GPU survival code. Built behind
PUNKTFUNK_IDD_PUSH, A/B'd against DDA, and only then made the default.
The decisive finding: producer and consumer are both in Session 0
The whole transport design hinged on one unknown — same-session or cross-session? Measured on the
RTX box (2026-06-22): the pf-vdisplay host process is WUDFHost.exe with
-DeviceGroupId:pfVDisplayGroup, running in Session 0; the punktfunk host service is LocalSystem,
also Session 0. So the swap-chain processor thread (spawned by our own thread::spawn inside the
driver, i.e. in WUDFHost) and the encoder live in the same session. This is the easy case:
- A D3D11 shared keyed-mutex texture created in the driver can be opened by name in the host with ID3D11Device1::OpenSharedResourceByName, with both devices created on the same render-adapter LUID (which the driver already reports out of the ADD IOCTL via OsAdapterLuid, surfaced as WinCaptureTarget::adapter_luid).
- Named kernel objects resolve through Session 0's shared \BaseNamedObjects, so no Global\ prefix / SeCreateGlobalPrivilege gymnastics are needed (kept the names unprefixed; documented that this relies on both processes being Session 0). The Looking-Glass cross-VM shared-memory device is unnecessary: this is cross-process, same-session, on one GPU.
This collapses the "Session-0 cross-process transport is the long pole" risk from the original plan.
Transport: a ring of shared keyed-mutex textures + a metadata header + an event
A single ping-pong keyed mutex would couple the driver's present rate to the host's consume rate — and
the swap-chain thread must never block (a stalled IddCxSwapChainReleaseAndAcquireBuffer/processing loop
freezes DWM compositing system-wide). So, the Looking-Glass shape — multiple frame buffers, newest
wins:
- Ring of N (default 3) shared textures, RESOURCE_MISC_SHARED_NTHANDLE | SHARED_KEYEDMUTEX, fixed size for the session. A generation counter bumps on a mode change (resize): the driver tears down + recreates the ring at the new size, the host notices the generation change and re-opens.
- Named metadata header (CreateFileMapping): {magic, version, generation, width, height, dxgi_format, ring_len, latest} where latest packs {write_index, monotonic sequence} published after the copy completes. Plain (unprefixed) names: Session-0 shared namespace.
- Frame-ready auto-reset event so the consumer waits instead of spinning.
- Producer (driver, per acquired frame): pick (latest_index + 1) % N; try-acquire that slot's keyed mutex with a 0 ms timeout (if the host still holds it, which is rare with 3 slots, reuse the current slot or skip, never block); CopyResource the acquired MetaData.pSurface into the slot; release the mutex; publish {index, ++seq}; SetEvent. Then FinishedProcessingFrame as today.
- Consumer (host IddPushCapturer): WaitForSingleObject(event, timeout); read latest; if seq advanced, acquire that slot's mutex, CopyResource into an owned NVENC-input texture, release, yield FramePayload::D3d11{texture, device}, straight into the existing zero-copy NVENC path. No DDA, no CPU readback.
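The "newest wins" bookkeeping in the producer and consumer steps above can be sketched without any D3D calls. The bit layout of `latest` used here (sequence in the high 32 bits, slot index in the low 32) is an assumption made for illustration, as are all function names; `slot_busy` stands in for the 0 ms keyed-mutex try-acquire, and in the real transport the atomic would live inside the mapped header.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Number of ring slots (the default N from the text).
const RING_LEN: u32 = 3;

/// Hypothetical packing: sequence in the high 32 bits, slot index in the low 32.
fn pack_latest(index: u32, seq: u32) -> u64 {
    (u64::from(seq) << 32) | u64::from(index)
}

/// Inverse of `pack_latest`: returns (slot index, sequence).
fn unpack_latest(latest: u64) -> (u32, u32) {
    (latest as u32, (latest >> 32) as u32)
}

/// Producer: pick (latest_index + 1) % N. `slot_busy` stands in for the 0 ms
/// keyed-mutex try-acquire; a busy slot means "skip this frame", never block.
fn next_slot(latest: &AtomicU64, slot_busy: impl Fn(u32) -> bool) -> Option<u32> {
    let (index, _) = unpack_latest(latest.load(Ordering::Relaxed));
    let candidate = (index + 1) % RING_LEN;
    if slot_busy(candidate) {
        None
    } else {
        Some(candidate)
    }
}

/// Producer: publish {index, ++seq} once the copy into the slot has completed.
fn publish(latest: &AtomicU64, index: u32) {
    let (_, seq) = unpack_latest(latest.load(Ordering::Relaxed));
    latest.store(pack_latest(index, seq.wrapping_add(1)), Ordering::Release);
}

/// Consumer: return the slot to read only if the sequence advanced since the last poll.
fn poll(latest: &AtomicU64, last_seen_seq: &mut u32) -> Option<u32> {
    let (index, seq) = unpack_latest(latest.load(Ordering::Acquire));
    if seq == *last_seen_seq {
        return None;
    }
    *last_seen_seq = seq;
    Some(index)
}
```

Publishing index and sequence in a single 64-bit store means the consumer never observes a new sequence paired with an old slot index, which is presumably why the header packs them into one field.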
What P2 removes vs. keeps
- Removes: capture/dxgi.rs's DXGI_ERROR_ACCESS_LOST / MODE_CHANGE_IN_PROGRESS re-duplication churn, the legacy-DuplicateOutput fallback, and install_gpu_pref_hook() (the win32u.dll patch), by pinning the render adapter to the encoder GPU (IddCxAdapterSetRenderAdapter, the existing SET_RENDER_ADAPTER IOCTL, driven before ADD), so the OS never reparents the output and the shared texture + NVENC share one device by construction.
- Keeps: display topology (making the virtual display the composited desktop) and the watchdog (now ours). The two-process WGC secure-desktop relay stays until we confirm the IDD push also delivers the secure (Winlogon) desktop; if it does, that retires too.
On-glass attempt 2026-06-22 — code complete, blocked at driver load
The full transport (driver publisher + host IddPushCapturer + render-LUID robustness + in-process
routing) is written and compiles clean. The first on-glass A/B exposed several real things and one
hard blocker:
- The service captures in a Session-1 WGC helper, not in-process. should_use_helper() returns true for a SYSTEM service, so it spawns a user-session helper that does capture and input injection. IDD-push must capture in-process in Session 0 (where the driver publishes), wired via should_use_helper() returning false for PUNKTFUNK_IDD_PUSH. Caveat: SendInput from Session 0 can't reach the user's Session-1 desktop, so in-process IDD-push has no working input yet. Production needs either a Session-1 input-only helper, or Global\-namespaced shared textures so a Session-1 helper consumes IDD-push for both video + input.
- SET_RENDER_ADAPTER is ignored by the driver (the IDD lands on a different adapter than pinned: observed IDD adapter 0xd60722 vs pinned 4090 0x15de1). The render-LUID-in-header path makes the host bind correctly regardless, but the driver should be made to actually honor the pin (or the host must copy across adapters) so NVENC gets a 4090 surface.
- Cursor is included in the IddCx composited frame (DDA strips it), so the host-side cursor compositor (P2.5) is likely unnecessary for this path.
- FAILED_POST_START was a red herring (churn, not the binary). Comparing the 2157 (works) and the frame_transport DLL import tables: identical (same 8 DLLs; the size/hash delta is just the Authenticode signature). A clean install + reboot (no restart-device / disable-enable / kill in between) loads the frame_transport driver to OK. The earlier FAILED_POST_START was the device wedging from the hot-reload churn (the deploy gotchas above). Lesson: deploy = install + reboot, full stop.
- THE REAL BLOCKER: the driver can't CREATE the shared objects. With the driver loaded clean and the monitor active, the host's IddPushCapturer still times out: pfvd-hdr-<target> never appeared. The driver's own OutputDebugString is invisible (UMDF redirects it to ETW, not DebugView; verified with a working DBWIN self-test), so a file-logging driver build was tried, and it wrote no file at all, even though init() runs in DriverEntry, the device is OK, WUDFHost runs as LocalService, and C:\Users\Public is world-writable. WUDFHost runs with a restricted token: it can neither write the filesystem nor create named kernel objects (CreateFileMappingW / CreateEventW / CreateSharedHandle), so FramePublisher::new fails silently. This is exactly why the gamepad UMDF drivers invert it: inject/dualsense_windows.rs says "the host creates the section (privileged → a permissive SDDL so the WUDFHost can open it); the driver maps it", using Global\pfds-shm-<idx> + SDDL D:(A;;GA;;;WD). Fix: invert frame-push to match. The HOST creates the header + event + ring textures (Global\ names, D:(A;;GA;;;WD) SDDL); the DRIVER only OPENS them, writes its actual render LUID + a status code back into the host-created header (so we get driver visibility through the host log), and runs the copy loop. The host creates the textures on the render adapter the driver reports.
- Also unresolved: SET_RENDER_ADAPTER appears ignored (the host's pin to the 4090 vs the ADD-reply adapter differ every time). The inverted header carries the driver's actual render LUID so the host can create textures + run NVENC on the right adapter, but if that's the iGPU, NVENC (NVIDIA) can't encode it, so the driver must be made to honor the pin (or the host must cross-adapter copy). Needs its own investigation.
Driver deploy gotchas learned (this box): hot-reloading a UMDF display driver is unreliable —
pnputil /restart-device does NOT restart WUDFHost (old image stays mapped), Disable/Enable-PnpDevice
errors on the root-enumerated IDD, and killing WUDFHost invalidates the host's cached {e5bcc234}
control handle (every ADD then fails 0x80070006, and the device can wedge to FAILED_POST_START).
A reboot loads a freshly-installed build cleanly. Recovery from a broken build is clean and
reboot-free: pnputil /delete-driver <oemNN>.inf /uninstall removes the bad package and the device
rebinds the previous (validated) package in the DriverStore — restored 2157 → OK immediately.
On-glass attempt 2 (2026-06-23) — inversion works; in-process Session-0 path is a dead end
Implemented the inversion (host creates the header + event + ring textures with the
D:(A;;GA;;;WD) SDDL, driver only opens them) + a per-attempt generation (kills the
DXGI_ERROR_NAME_ALREADY_EXISTS retry collisions) + a fixed-name Global\pfvd-dbg debug channel
(structured counters the driver writes, since UMDF/ETW + the restricted token block its other logs).
Results on the RTX box:
- ✅ The host creates the shared ring every time (created shared ring … render_luid=…): the privileged-create / restricted-open split is sound.
- ✅ No more name collisions (generation fix).
- ❌ The driver writes NOTHING: debug block all zeros, crucially run_core_entries=0. The swap-chain processor never runs, i.e. the OS never assigns a swap-chain to the virtual monitor in this path.
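The debug channel used for these results (structured counters the driver writes into a host-created block, read back by the host) might look roughly like the sketch below. Only the counter names run_core_entries and frames_acquired come from this document; the struct layout, the method names, and the use of plain in-process atomics instead of a mapped Global\pfvd-dbg section are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical layout of the shared debug block. In the real channel this
/// would sit in a host-created shared section; here it is an ordinary struct.
#[repr(C)]
#[derive(Default)]
struct DebugBlock {
    run_core_entries: AtomicU32,
    frames_acquired: AtomicU32,
}

impl DebugBlock {
    /// Driver side: bump on every entry into the swap-chain processor loop.
    fn note_run_core_entry(&self) {
        self.run_core_entries.fetch_add(1, Ordering::Relaxed);
    }

    /// Driver side: bump for every frame acquired from the swap-chain.
    fn note_frame_acquired(&self) {
        self.frames_acquired.fetch_add(1, Ordering::Relaxed);
    }

    /// Host side: an all-zero block means the processor never ran, which is a
    /// different failure from "ran but acquired no frames".
    fn driver_ever_ran(&self) -> bool {
        self.run_core_entries.load(Ordering::Relaxed) > 0
    }

    /// Host side: snapshot (run_core_entries, frames_acquired) for logging.
    fn snapshot(&self) -> (u32, u32) {
        (
            self.run_core_entries.load(Ordering::Relaxed),
            self.frames_acquired.load(Ordering::Relaxed),
        )
    }
}
```

Keeping separate counters for "entered the loop" and "acquired a frame" is what lets the host log distinguish a swap-chain that was never assigned from one that was assigned but delivered nothing.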
Root cause: an IddCx monitor only gets a swap-chain when something PRESENTS to it, and the in-process path has no presenter. The host + the CCD topology-isolate run in Session 0, which has no DWM / compositor. The WGC path works because its capture helper lives in Session 1, where DWM composes the desktop onto the display (that composition is the swap-chain trigger). So in-process Session-0 IDD-push gets no frames to push, full stop — a fundamental barrier, not a fixable bug. The original plan's "Session-0 transport is the long pole" was right, but the long pole turned out to be triggering presentation, not the shared-memory mechanics (those work).
Consequence: the only viable IDD-push shape is option 3 — a Session-1 helper drives presentation +
consumes the Global\ ring (the inversion built here is exactly what it needs). But it carries an
unretired risk: it's still unproven whether the swap-chain gets assigned even with a Session-1 consumer
that isn't WGC. Until that's answered, DDA/WGC stays the shipping Windows capture path — it works.
All the IDD-push code (driver open-side + host create-side + debug channel) is written, compiles, and is
gated behind PUNKTFUNK_IDD_PUSH (off), so it's dormant and harmless.
CONCLUSION (2026-06-23): IDD-push is not viable for bare-metal capture — the swap-chain is never assigned
After the inversion + a fixed-name debug channel + a host-created-ring observer + an autonomous
loopback test harness (punktfunk-probe → the SYSTEM service, paired via the mgmt API), the question
"does the driver's swap-chain processor ever run?" was answered definitively: no. The driver's
run_core is never entered — run_core_entries=0 in every configuration tested:
- in-process (Session 0) and WGC-triggered (Session 1 helper) sessions,
- a user-created ring AND a host-created (LocalSystem) ring with a permissive D:(A;;GA;;;WD) SDDL,
- with and without a Low-IL (S:(ML;;NW;;;LW)) mandatory label,
- with WUDFHost confirmed not an AppContainer (IsAppContainer=0),
— even while WGC simultaneously captured the same virtual monitor's composition and streamed multi-MB
of HEVC. The gamepad UMDF drivers prove a UMDF driver can open + write a host-created Global\
section on this box, so the driver writing nothing is not an access problem — run_core simply
does not run.
Root cause (researched + ecosystem-confirmed): an IddCx virtual monitor only receives a swap-chain
(EVT_IDD_CX_MONITOR_ASSIGN_SWAPCHAIN) when the OS presents/scans-out to it, which requires a real
presentation consumer. WGC/DDA capture of the composed desktop does NOT count — it reads DWM's
composition, bypassing the driver's swap-chain. With no physical scanout and no consumer that routes
through the driver, the path stays inactive (IDDCX_PATH_FLAGS=0) and ASSIGN_SWAPCHAIN never fires.
Confirming evidence:
- Every bare-metal virtual-display capture project uses WGC/DDA, not the driver swap-chain: SudoVDA (its swap-chain loop acquires-and-discards), Apollo/Sunshine (DDA + WGC backends), virtual-display-rs (discards), parsec-vdd (no frame path). Only Looking Glass consumes the driver swap-chain — and only because a VM guest scans out the display (the consumer). We have no equivalent on bare metal.
- Microsoft's own unanswered Q&A (learn.microsoft.com/answers 4096179) reports the identical symptom for the IddSampleDriver: virtual display "always inactive," ASSIGN_SWAPCHAIN never runs.
Verdict: the "driver consumes its swap-chain and pushes frames" architecture (P2 / Looking-Glass
style) cannot get frames for punktfunk's bare-metal, whole-desktop, capture-only use case. The
shared-memory transport machinery (host-creates / driver-opens, the gamepad pattern) is all sound and
proven to create, but there is nothing for the driver to publish. DDA/WGC remains the only viable
Windows capture path, which is exactly what the entire ecosystem does. The IDD-push code stays
in-tree, compiles, and is gated off (PUNKTFUNK_IDD_PUSH) — dormant and harmless — documenting the
attempt so it isn't re-tried. "Better performance/lower overhead" must come from optimizing the WGC/DDA
path (e.g. trimming the Session-0↔Session-1 relay, zero-copy encode), not from IDD-push.
The only unexplored avenue is driver-side (a different adapter/monitor/path setup that might make the OS treat the virtual display as a presentation target) — but it needs a reboot to test, the MS Q&A suggests it's unsolved, and the unanimous ecosystem choice of WGC/DDA argues it's a dead end.
Final exhaustion (2026-06-23, follow-up): both remaining avenues closed.
- Option 3 (present source): TESTED, failed. Added a present-trigger to the Session-1 WGC helper: it successfully created a D3D11 swapchain on the virtual display and presented continuously (WGC even captured the flashing window). The driver stayed at run_core_entries=0 / frames_acquired=0. So an active present source on the display does NOT make the OS assign the driver's swap-chain either: DWM composes the present onto the display (capturable) without routing it through the driver's swap-chain.
- Option 2 (driver flag): closed by analysis. The present-trigger succeeding proves the path is already active (a swapchain presents to the display fine); the missing piece is scanout routed through the driver, which the OS does only for a real consumer (physical display / VM guest / RDP). The one IddCx flag for that, IDDCX_ADAPTER_FLAGS_REMOTE_SESSION_DRIVER, requires the RDP protocol stack as the consumer, which bare-metal console capture has no equivalent of.
Verdict is final: IDD-push needs a presentation consumer (scanout / VM guest / RDP) that bare-metal console desktop-capture fundamentally cannot provide. No host-side capture, no in-process path, no present source, and no available driver flag overcomes it. WGC (normal desktop) + DDA (secure desktop) is the only viable Windows capture path — as the entire ecosystem already does. The IDD-push + present-trigger code stays in-tree, gated off, as the documented record of the attempt.
Known gaps the build-out must close (tracked as P2.* tasks)
- Cursor. DDA/WGC composite the HW cursor host-side from frame-info; the IDD path delivers the cursor separately (IddCxMonitorSetupHardwareCursor event → QueryHardwareCursor). The prototype may ship cursor-less; the build-out wires the IDD cursor into the existing CursorCompositor.
- HDR. The default IddCx swap-chain surface is 8-bit B8G8R8A8; FP16/HDR needs the IddCx 1.11 D3D12 acquire path (SetDevice2/ReleaseAndAcquireBuffer2 → ID3D12Resource). Build against 1.10, runtime-gate 1.11. SDR-only for the prototype.
Why we'd do this
The user's goals, mapped to outcomes:
| Goal | Outcome |
|---|---|
| Drop external deps | No more vendored prebuilt SudoVDA .dll/.cat (third-party, C++, single upstream). |
| Increase Rust coverage | The display driver joins the gamepad drivers as in-tree Rust UMDF. |
| Own the stack / easier display management | We control the IOCTL protocol, the EDID, the mode list, the watchdog — and can fold the topology/mode logic that's currently scattered in vdisplay/sudovda.rs into the driver. |
| Cleaner code | Phase 2 retires capture/dxgi.rs's DDA workarounds + the win32u.dll patch. |
What we'd be replacing (current architecture)
- Driver: SudoVDA, a UMDF2 IddCx driver: Class=Display, UmdfExtensions=IddCx0102, UpperFilters=IndirectKmd, root-enumerated Root\SudoMaker\SudoVDA. Vendored prebuilt under packaging/windows/sudovda/, installed by install-sudovda.ps1 (cert → nefconc devnode → pnputil). Source is public (SudoMaker/SudoVDA, README-only MIT/CC0 grant over the MS sample, ~1,900 LOC C++).
- Host contract: crates/punktfunk-host/src/vdisplay/sudovda.rs opens the control device by interface GUID {e5bcc234-…} and drives a tiny METHOD_BUFFERED IOCTL protocol, byte-identical to SudoVDA's Common/Include/sudovda-ioctl.h:
  - ADD (0x800) {w,h,refresh,GUID,name[14],serial[14]} → {LUID, target_id}
  - REMOVE (0x801) {GUID} · SET_RENDER_ADAPTER (0x802) {LUID} · GET_WATCHDOG (0x803) · PING (0x888) (mandatory keepalive) · GET_VERSION (0x8FF)
- Capture: capture/dxgi.rs finds the virtual monitor's GDI output across all adapters (it's enumerated under the rendering GPU, not SudoVDA's LUID) and runs DXGI Desktop Duplication (DuplicateOutput1, FP16 for HDR). This file is dominated by virtual-display-over-DDA survival code: DXGI_ERROR_ACCESS_LOST re-duplication with retries, MODE_CHANGE_IN_PROGRESS backoff, legacy-DuplicateOutput fallback, CCD display isolation to make the IDD the sole composited desktop, and an install_gpu_pref_hook() that patches win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI reparenting the output across GPUs. Most of that exists because we capture a virtual display via DDA on a multi-GPU box.
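To make the host contract above concrete, here is a hedged sketch of how the IOCTL codes and the ADD request could be expressed in Rust. The function numbers and METHOD_BUFFERED come from the text; the device type (FILE_DEVICE_UNKNOWN), the access bits (FILE_ANY_ACCESS), every field width, and all type and field names are assumptions. The authoritative layout is sudovda-ioctl.h, which this sketch does not reproduce.

```rust
use std::mem::size_of;

const FILE_DEVICE_UNKNOWN: u32 = 0x22; // assumption: device type is not stated in the text
const METHOD_BUFFERED: u32 = 0;
const FILE_ANY_ACCESS: u32 = 0; // assumption: access bits are not stated in the text

/// Rust equivalent of the Windows CTL_CODE macro.
const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

/// Build a control code from one of the function numbers listed above.
const fn vda_ioctl(function: u32) -> u32 {
    ctl_code(FILE_DEVICE_UNKNOWN, function, METHOD_BUFFERED, FILE_ANY_ACCESS)
}

const IOCTL_ADD: u32 = vda_ioctl(0x800);
const IOCTL_REMOVE: u32 = vda_ioctl(0x801);
const IOCTL_SET_RENDER_ADAPTER: u32 = vda_ioctl(0x802);
const IOCTL_GET_WATCHDOG: u32 = vda_ioctl(0x803);
const IOCTL_PING: u32 = vda_ioctl(0x888);
const IOCTL_GET_VERSION: u32 = vda_ioctl(0x8FF);

/// Win32 GUID layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct Guid {
    data1: u32,
    data2: u16,
    data3: u16,
    data4: [u8; 8],
}

/// Win32 LUID layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct Luid {
    low_part: u32,
    high_part: i32,
}

/// ADD input {w,h,refresh,GUID,name[14],serial[14]}; field widths are assumed.
#[repr(C)]
struct AddParams {
    width: u32,
    height: u32,
    refresh: u32,
    monitor_guid: Guid,
    device_name: [u8; 14],
    serial: [u8; 14],
}

/// ADD output {LUID, target_id}; field widths are assumed.
#[repr(C)]
struct AddOutput {
    adapter_luid: Luid,
    target_id: u32,
}

// Compile-time layout checks: a byte-identical contract is only as good as these.
const _: () = assert!(size_of::<AddParams>() == 56);
const _: () = assert!(size_of::<AddOutput>() == 12);
```

The compile-time size assertions are the part worth keeping whatever the real field types turn out to be: a mismatch against the C header then fails the build instead of failing at DeviceIoControl time.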
Feasibility findings
Signing — green (the make-or-break)
UMDF user-mode ⇒ Code-Integrity signing rules don't apply to our binary (the only kernel piece is
Microsoft's inbox IndirectKmd). Self-signed cert in Root + TrustedPublisher is sufficient on a
normal Secure-Boot Win11 box — no bcdedit /set testsigning. SudoVDA and virtual-display-rs both
ship this way. This is the same model as our DualSense/DS4/XUSB drivers. (The only thing that
breaks install is a botched cert placement, not a signing tier.)
Rust prior art — exists, MIT, reusable
virtual-display-rs proves an all-Rust IddCx driver runs in production and gives us:
wdf-umdf-sys (bindgen over WDF and iddcx.h, links IddCxStub), wdf-umdf (safe wrappers —
iddcx.rs ~300 LOC, with an IddCxIsFunctionAvailable! version-gate macro), and a reference driver
(swap_chain_processor.rs ~158 LOC, direct_3d_device.rs, edid.rs). Caveat: it uses its own
bindgen stack, not microsoft/windows-drivers-rs — see Decision D2.
windows-drivers-rs IddCx support — absent, but a bounded extension
Our wdk-sys (m0) binds Base + WDF + feature-gated subsets (hid/gpio/spb/…). Zero IddCx symbols.
Adding it is the same shape as the existing subsets: an ApiSubset::Iddcx variant + iddcx feature →
iddcx_headers() returning iddcx.h for bindgen, and linking IddCx.lib. IddCx functions are not
WDF-table functions, so the call_unsafe_wdf_function_binding! macro doesn't apply — they're direct
IddCx.lib exports we'd #[link(name="IddCx")] extern (or bindgen) and wrap ourselves.
windows 0.58 (already in the tree) provides the Direct3D11/Dxgi APIs the swap-chain loop needs.
The IddCx driver itself — well-understood, ~1–2k LOC
Required callbacks (baselined on the MS IddSampleDriver, ~1,100 LOC, IddCx 1.4):
EVT_IDD_CX_ADAPTER_INIT_FINISHED, ADAPTER_COMMIT_MODES, PARSE_MONITOR_DESCRIPTION,
MONITOR_GET_DEFAULT_DESCRIPTION_MODES, MONITOR_QUERY_TARGET_MODES, MONITOR_ASSIGN_SWAPCHAIN
(the only callback with real D3D work), MONITOR_UNASSIGN_SWAPCHAIN, and DEVICE_IO_CONTROL (where
our ADD/REMOVE/PING IOCTLs live). Init flow: WdfDeviceCreate → IddCxDeviceInitConfig → IddCxDeviceInitialize → IddCxAdapterInitAsync → IddCxMonitorCreate → IddCxMonitorArrival.
Arbitrary resolutions don't need EDID timings: ship one generic ~128/256-byte EDID base block to
make Windows treat the target as a real monitor, then advertise modes programmatically from the
mode-list callbacks — a static table plus the runtime-requested client mode injected as preferred
(exactly SudoVDA's s_DefaultModes[] + per-ADD preferred-mode approach). 5120×1440@240 just gets
added at ADD time.
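The "static table plus the requested mode injected as preferred" approach can be sketched as below. The `Mode` type, the contents of `DEFAULT_MODES`, and the convention that the first entry is the preferred one are assumptions for illustration; they are not SudoVDA's s_DefaultModes[] or the pf-vdisplay mode model.

```rust
/// One display mode (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mode {
    width: u32,
    height: u32,
    refresh_hz: u32,
}

/// Illustrative fallback table; the real default list is not reproduced here.
const DEFAULT_MODES: &[Mode] = &[
    Mode { width: 1920, height: 1080, refresh_hz: 60 },
    Mode { width: 2560, height: 1440, refresh_hz: 60 },
    Mode { width: 3840, height: 2160, refresh_hz: 60 },
];

/// Build the mode list for one ADD: the client's requested mode goes first
/// (treated as preferred), followed by the fallbacks without duplicating it.
fn build_mode_list(requested: Mode) -> Vec<Mode> {
    let mut modes = vec![requested];
    modes.extend(DEFAULT_MODES.iter().copied().filter(|m| *m != requested));
    modes
}
```

Requesting 5120×1440@240 yields that mode followed by the three fallbacks; requesting a mode that is already in the table simply moves it to the front.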
HDR/10-bit: supported, but it's the one place IddCx is harder than today. The default swap-chain
surface is 8-bit A8R8G8B8; FP16/HDR requires the IddCx 1.11 D3D12 acquire path
(SetDevice2/ReleaseAndAcquireBuffer2 → ID3D12Resource, with a stricter sync model). Our box is
Win11 26200 (IddCx ≥ 1.10), so this is reachable, but it's real work — and our current WGC/DDA path
gives FP16 HDR "for free." Build against 1.10 and runtime-gate the newer DDIs (SudoVDA's pattern).
The architectural prize: skip DDA (Phase 2)
An IddCx driver gets each presented frame from IddCxSwapChainReleaseAndAcquireBuffer as an
IDXGIResource on a device we bind via IddCxSwapChainSetDevice. We can copy it into a shared
texture / shared section and hand it to the host's encoder process directly — no Desktop
Duplication. Why this is the real win, not just a detour:
- It's the intended IddCx use case. IddCx exists for remote/wireless/USB displays that ship swap-chain frames over a wire; consuming frames in the driver is the designed path, and Looking Glass already does exactly this (driver → shared memory → separate consumer, no DDA).
- It kills the multi-GPU bug class. We call IddCxAdapterSetRenderAdapter to pin the swap-chain to the same GPU as our NVENC encoder before adding the monitor, and the OS honors it. No more DXGI reparenting the output onto the wrong GPU, no ACCESS_LOST storms, and we can retire install_gpu_pref_hook() (the win32u.dll patch) and most of capture/dxgi.rs. Swap-chain re-creation becomes a documented, in-band event (ABANDON_SWAPCHAIN) instead of an undocumented failure we fight with retries.
What it does not remove (be honest): display topology management — making the virtual display the sole/primary composited desktop so the game (and Winlogon) render to it — is independent of how we get frames and stays (though we can integrate it more cleanly). And the watchdog stays, now ours.
The cost: a Session-0 → service cross-process frame transport (the driver host is WUDFHost in
Session 0 / LocalService; our host is a LocalSystem service). A Global\-named, explicitly-ACL'd
shared section + keyed-mutex texture (Looking Glass's shape) is where the engineering actually goes —
prototype this first, it's the only genuinely new risk. Plus the HDR D3D12 path above.
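One small piece of such a transport is the header word that tells the consumer which ring slot is newest. The sketch below packs a generation, a sequence number, and a slot index into one 64-bit value using the gen<<40|seq<<8|slot layout mentioned in the project's change notes. The field widths (24/32/8 bits), the function names, and the staleness rule are assumptions made for illustration.

```rust
/// Assumed field widths: 24-bit generation, 32-bit sequence, 8-bit slot.
const GEN_SHIFT: u32 = 40;
const SEQ_SHIFT: u32 = 8;
const GEN_MASK: u64 = 0x00FF_FFFF;

/// Pack (generation, sequence, slot) into one u64 so the producer can
/// publish it with a single atomic store.
pub fn pack_latest(generation: u32, seq: u32, slot: u8) -> u64 {
    (((generation as u64) & GEN_MASK) << GEN_SHIFT)
        | ((seq as u64) << SEQ_SHIFT)
        | (slot as u64)
}

/// Split a packed value back into (generation, sequence, slot).
pub fn unpack_latest(tag: u64) -> (u32, u32, u8) {
    let generation = (tag >> GEN_SHIFT) as u32;
    let seq = (tag >> SEQ_SHIFT) as u32; // truncates to the 32 sequence bits
    let slot = tag as u8; // truncates to the low 8 bits
    (generation, seq, slot)
}

/// A tag is stale when it was written for a different ring generation,
/// for example before the ring was recreated at a new pixel format.
pub fn is_stale(tag: u64, current_generation: u32) -> bool {
    (tag >> GEN_SHIFT) != ((current_generation as u64) & GEN_MASK)
}
```

A consumer would read the word, call `is_stale` against the generation it attached to, and re-attach instead of reading the slot when the check fails.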
Decisions to make at kickoff
- D1 — Own the driver? Recommend yes, in Rust. (Alternatives: fork SudoVDA's C++ — fastest to a known-good HDR driver but reintroduces a C++ toolchain and README-only license provenance; or keep vendoring — zero cost, but none of the goals.)
- D2 — Binding stack? The main implementation fork.
  - (a) Extend our windows-drivers-rs (m0) with an iddcx subset — one toolchain across all our drivers, our build env, but we write the IddCx bindings ourselves (+~3–5 wk), using virtual-display-rs's iddcx.rs as the 1:1 guide. Preferred for consistency.
  - (b) Vendor virtual-display-rs's wdf-umdf* crates (MIT) — fastest to first light, but a second WDK-binding stack in-tree.
  - Suggested sequence: prototype on (b) to prove IddCx-on-our-box in days, then build production on (a) for consistency.
- D3 — Frame transport? Phase it: DDA-compatible first (zero capture-side change), direct push second (the cleanup). Don't couple the driver rewrite to the transport rewrite.
Recommended plan
- P0 — now: keep vendoring SudoVDA. No change. (The gamepad-driver installer work just shipped; this is independent.)
- P1 — drop-in Rust IddCx driver (pf-vdisplay). Replicate SudoVDA's IOCTL contract exactly (same struct layouts; reuse or re-issue the control interface GUID) so vdisplay/sudovda.rs needs ~zero change (at most a GUID constant). Class=Display + IddCx INF, our own EDID + programmatic mode list incl. the per-ADD client mode, the watchdog, a real swap-chain drain (the vdd port — the drain is required so DWM keeps compositing; DDA/WGC still captures the desktop). Bundle + self-sign + pnputil-install via the installer, identical to the gamepad-driver path we just built. Outcome: all-Rust, SudoVDA dependency dropped, DDA capture unchanged. Effort ≈ 2–4 wk to first light, 5–7 wk to parity (HDR, multi-monitor, CI).
- P2 — direct frame push (kill DDA). Add a swap-chain processor that copies each frame into a shared section/texture; new capture backend reads it directly; pin the render adapter to the encoder GPU. Gate behind a flag, validate against DDA, then retire the DDA path + the win32u.dll patch. HDR via the IddCx 1.11 D3D12 acquire path. Outcome: the real "owning the stack pays off" cleanup. Effort: additional; the Session-0 transport is the long pole.
Risks
- D3D-in-a-driver swap-chain loop — the one genuinely new piece; bugs here = black screens/TDR. Mitigated by virtual-display-rs's swap_chain_processor.rs + the MS sample as references.
- Session-0 cross-process transport (P2) — the actual hard part; prototype it first.
- HDR = the harder D3D12 1.11 path — our current WGC/DDA HDR is free; the IddCx HDR path is not.
- Two binding stacks if we go D2(b) — a maintenance cost cutting against "clean/consistent."
- No WHQL ⇒ no Windows Update / Dev-Center distribution — same constraint our gamepad drivers already accept (bundle + self-sign + import cert).
References
- IddCx model + signing: IDD model overview · IddCx versions · 1.10+ updates · UMDF signing
- Swap-chain / frames: IDDCX_METADATA · SetDevice · SetRenderAdapter · ASSIGN_SWAPCHAIN
- Prior art: microsoft IddSampleDriver · SudoMaker/SudoVDA (ioctl.h) · MolotovCherry/virtual-display-rs (Rust, MIT) · Looking Glass IDD (swap-chain → shm, no DDA) · itsmikethetech/Virtual-Display-Driver