# Windows virtual display — a Rust port of SudoVDA (investigation & plan)
Status: **P1 done — `pf-vdisplay` validated streaming on glass at 5120×1440@240** (2026-06-22). The
all-Rust IddCx driver replaces the vendored **SudoVDA** C++ driver, matching the "all-Rust UMDF, zero
external driver deps" direction we finished for gamepads (ViGEmBus gone; DualSense/DS4/XUSB shipped).
The investigation/plan below is kept for context; see **Validated on-box** for the result.
## TL;DR
A Rust port is **feasible, low-on-blockers, and strategically aligned** — and there's an unexpected
architectural prize beyond "same thing, in Rust."
- **Signing is not a blocker.** An IddCx driver is UMDF *user-mode*; it needs **no WHQL, no
attestation, no test-signing**. A self-signed cert in LocalMachine `Root` + `TrustedPublisher`
loads it — **exactly the model our gamepad drivers already ship** (and exactly what SudoVDA and the
other forks do). ([Do UMDF drivers require signing?](https://learn.microsoft.com/en-us/archive/blogs/peterwie/do-umdf-drivers-require-signing))
- **We would not be first in Rust.** [`MolotovCherry/virtual-display-rs`](https://github.com/MolotovCherry/virtual-display-rs)
is a complete, shipping **IddCx driver written in Rust** (MIT), with hand-rolled IddCx/WDF bindgen
bindings (`wdf-umdf-sys` + `wdf-umdf`) and a reference swap-chain processor. This turns "greenfield
FFI" into "adapt a proven reference."
- **The prize: we can stop using DXGI Desktop Duplication.** An IddCx driver already *receives* the
composited desktop frames in its swap-chain. [Looking Glass](https://deepwiki.com/gnif/LookingGlass/2.5-indirect-display-driver-(idd))
ships exactly this in production — driver consumes the swap-chain, hands frames to a separate
process, "operates entirely independently of DDA." Doing the same would **delete an entire class of
multi-GPU bugs** the current `capture/dxgi.rs` is built to survive (ACCESS_LOST storms,
MODE_CHANGE_IN_PROGRESS, the `win32u.dll` reparenting patch).
Recommendation: **yes, build it in Rust**, in phases — a drop-in DDA-compatible driver first (own the
stack at low risk), then the direct-frame-push path (the real cleanup). Keep vendoring SudoVDA as the
safe interim until the Rust driver is on-glass-validated on the RTX box.
## Validated on-box (2026-06-22)
Before committing, the toolchain + load path were proven on the RTX box (Win11 26200, WDK 26100):
- **A Rust IddCx driver builds with our toolchain.** Cloned [`virtual-display-rs`](https://github.com/MolotovCherry/virtual-display-rs)
and built its driver `.dll` against our WDK (UMDF 2.31 + IddCx 1.4 stubs, bindgen over `IddCx.h` via
our LLVM, nightly-2024-07-26). One fix needed: its `build.rs` picked the **max** SDK Lib version
(`10.0.28000.0`, a base SDK with no IddCx) for the `IddCxStub` search path; resolving it by the
version that actually contains `um\x64\iddcx\1.4` (`10.0.26100.0`, the WDK) fixed the link.
- **It installs self-signed and loads.** Signed `.dll`/`.cat` with our existing driver cert (the
gamepad `punktfunk-ds-test`), `pnputil /add-driver`, root devnode via `devgen`. The device came up
**Status OK / CM_PROB_NONE**, Class Display, hosted by `WUDFRd` — a Rust IddCx adapter initialized
cleanly. (SudoVDA, already live here, independently confirms IddCx + self-signed UMDF work on this
box.) Test artifacts removed afterward; SudoVDA untouched.
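The `build.rs` fix in the first bullet above can be sketched roughly as follows. This is a minimal illustration, not the vendored build script: `pick_iddcx_lib_dir`, its parameters, and the injected `has_dir` check are hypothetical names, and the directory layout is assumed from the `um\x64\iddcx\1.4` path quoted above.

```rust
use std::path::{Path, PathBuf};

/// Parse "10.0.26100.0" into comparable numeric parts; other names yield None.
fn parse_version(name: &str) -> Option<Vec<u32>> {
    name.split('.').map(|p| p.parse::<u32>().ok()).collect()
}

/// Pick the highest SDK Lib version that actually contains the IddCx stub
/// directory, instead of blindly taking the maximum version.
/// `has_dir` is injected so the logic can be exercised without a real SDK.
fn pick_iddcx_lib_dir(
    lib_root: &Path,
    versions: &[&str],
    has_dir: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let mut candidates: Vec<(Vec<u32>, PathBuf)> = versions
        .iter()
        .filter_map(|v| {
            let dir = lib_root.join(v).join("um").join("x64").join("iddcx").join("1.4");
            Some((parse_version(v)?, dir))
        })
        .filter(|(_, dir)| has_dir(dir))
        .collect();
    candidates.sort_by(|a, b| a.0.cmp(&b.0));
    candidates.pop().map(|(_, dir)| dir)
}

fn main() {
    // Simulated layout from the text: 10.0.28000.0 is a base SDK without IddCx.
    let root = Path::new("Lib");
    let versions = ["10.0.22621.0", "10.0.26100.0", "10.0.28000.0"];
    let has_dir = |p: &Path| p.to_string_lossy().contains("10.0.26100.0");
    println!("{:?}", pick_iddcx_lib_dir(root, &versions, has_dir));
}
```

In a real build script the `has_dir` closure would presumably be `|p| p.is_dir()`, and `versions` would come from listing the SDK `Lib` directory.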
**Conclusion:** the central risk ("can we build + load a Rust IddCx driver here?") is retired. The
binding question (D2) resolves toward **reusing `virtual-display-rs`'s self-contained `wdf-umdf-sys` +
`wdf-umdf` bindgen crates** (now proven to build + load on our box) rather than extending
`windows-drivers-rs` — IddCx functions are direct `IddCxStub` exports the WDF function-table macro
can't reach anyway, so a unified bindgen is the cleaner base for `pf-vdisplay`. Reference clone kept at
`C:\Users\Public\virtual-display-rs`.
**Scaffold + driver logic landed + on-glass:** `packaging/windows/vdisplay-driver/` — vendored
`wdf-umdf-sys`/`wdf-umdf` (MIT, + the SDK-version build.rs fix) + the `pf-vdisplay` driver crate. The
full IddCx driver is ported (entry → `IDD_CX_CLIENT_CONFIG` with all 7 callbacks → device/monitor
context → our own EDID → a real swap-chain drain), with the IPC/serde/`tokio` stack replaced by an
in-tree `monitor` model and `OutputDebugString` logging. **Validated on the RTX box:** built, signed
(our `punktfunk-ds-test` cert), installed, loaded **Status OK**, and **a real virtual monitor arrived**
("VirtuDisplay+", `DISPLAY\CHY0000`), i.e. our own all-Rust IddCx virtual display creating a monitor.
**IOCTL control plane done + on-glass (P1 functionally complete):** the SudoVDA-compatible control
plane is implemented (`EVT_IDD_CX_DEVICE_IO_CONTROL` + the `{e5bcc234-…}` interface registered via
`WdfDeviceCreateDeviceInterface`; `control.rs` with byte-identical structs) — `ADD` a monitor at a
requested mode → `{LUID, target_id}` (target id + adapter LUID captured from `IDARG_OUT_MONITORARRIVAL`),
`REMOVE` by GUID, `PING`/`GET_WATCHDOG` watchdog, `GET_VERSION`, `SET_RENDER_ADAPTER`
(`IddCxAdapterSetRenderAdapter`); per-`ADD` mode injection (requested mode preferred + fallbacks). Added
the five missing FFI wrappers to the vendored `wdf-umdf`. **Validated on the RTX box** with a probe
that mimics `vdisplay/sudovda.rs` exactly: `GET_VERSION → 0.2.1`, `GET_WATCHDOG → timeout=3`,
`ADD 1920×1080@60 → target_id=257 + adapter LUID`, a real "VirtuDisplay+" monitor arrived at the
requested mode, `REMOVE` ok. **Constraint:** pf-vdisplay can't coexist with SudoVDA — they register the
same interface GUID, so two IddCx adapters claiming it → `FAILED_POST_START`; pf-vdisplay *replaces*
SudoVDA (validated by disabling SudoVDA first).
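The per-`ADD` mode injection mentioned above (requested mode preferred, plus fallbacks) could look roughly like this. It is a sketch only: the `Mode` type, the `DEFAULT_MODES` table, and `modes_for_add` are hypothetical stand-ins, not the driver's actual mode list.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mode {
    width: u32,
    height: u32,
    refresh_hz: u32,
}

// Hypothetical static fallback table; the real driver's list is not shown in this document.
const DEFAULT_MODES: &[Mode] = &[
    Mode { width: 1920, height: 1080, refresh_hz: 60 },
    Mode { width: 2560, height: 1440, refresh_hz: 60 },
    Mode { width: 3840, height: 2160, refresh_hz: 60 },
];

/// Build the mode list for one ADD: the requested mode goes first (preferred),
/// followed by the static fallbacks, without duplicating the requested mode.
fn modes_for_add(requested: Mode) -> Vec<Mode> {
    let mut modes = vec![requested];
    for m in DEFAULT_MODES {
        if !modes.contains(m) {
            modes.push(*m);
        }
    }
    modes
}

fn main() {
    let requested = Mode { width: 5120, height: 1440, refresh_hz: 240 };
    for m in modes_for_add(requested) {
        println!("{}x{}@{}", m.width, m.height, m.refresh_hz);
    }
}
```

Keeping the requested mode at index 0 means the mode-list callbacks can treat the first entry as the preferred one without extra bookkeeping.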
**Watchdog + real-host drive validated:** added the watchdog thread (1 Hz countdown reset by any IOCTL;
tears down all monitors at 0 so a gone host never leaves a phantom display; mirrors SudoVDA's
`RunWatchdog`). Pointed the **real host** at it — removed SudoVDA's devnode so pf-vdisplay is the sole
`{e5bcc234}` provider, then ran the host's `vdisplay::sudovda::tests::live_create_drop`
(`PUNKTFUNK_SUDOVDA_LIVE=1`): **test passed**, and the pf-vdisplay log shows the host's IOCTLs landing —
`ADD 1920x1080@60 → target_id=258, luid=…02619823`, then the watchdog correctly tore the monitor down
when the test process exited without a final REMOVE. So `vdisplay/sudovda.rs` drives pf-vdisplay
unchanged through the full control contract.
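The watchdog behaviour described above (a 1 Hz countdown that any IOCTL resets, with teardown at zero) can be modelled in a few lines. This is a simplified sketch, not the driver's code: the `Watchdog` type and its method names are hypothetical, and the timer thread that would call `tick` once per second is left out so the logic stays deterministic.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Countdown watchdog: `on_ioctl` re-arms it, `tick` is called once per second.
struct Watchdog {
    timeout_s: u32,
    remaining: AtomicU32,
}

impl Watchdog {
    fn new(timeout_s: u32) -> Self {
        Watchdog { timeout_s, remaining: AtomicU32::new(timeout_s) }
    }

    /// Any IOCTL from the host (PING included) resets the countdown.
    fn on_ioctl(&self) {
        self.remaining.store(self.timeout_s, Ordering::Relaxed);
    }

    /// Returns true exactly once, on the tick that moves the countdown from 1 to 0;
    /// the caller would then tear down all monitors.
    fn tick(&self) -> bool {
        let prev = self
            .remaining
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |r| Some(r.saturating_sub(1)))
            .unwrap(); // the closure always returns Some, so this cannot fail
        prev == 1
    }
}

fn main() {
    let wd = Watchdog::new(3); // 3 s, matching the GET_WATCHDOG reply quoted above
    wd.tick();
    wd.on_ioctl(); // host pinged: countdown re-armed
    let expired: Vec<bool> = (0..4).map(|_| wd.tick()).collect();
    println!("{:?}", expired);
}
```

Because `tick` reports expiry only on the 1 to 0 transition, a host that has gone away triggers one teardown rather than one per second.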
**Validated streaming end-to-end on glass (2026-06-22) — P1 complete.** pf-vdisplay is a working
SudoVDA replacement. Driven by the **real host** (`serve`, the LocalSystem service) with a stock client
at **5120×1440@240**: the monitor arrives, `resolve_gdi_name → \\.\DISPLAY10`, `set_active_mode` +
CCD-isolate succeed, the DXGI output resolves **under the RTX 4090**, WGC capture + NVENC run at
**steady 240 fps, ~2.4 ms encode**, 6512 AUs sent, clean teardown (`isolate restored rc=0x0`). Same
`vdisplay/sudovda.rs` path, unchanged — full parity with SudoVDA.
**The earlier "monitor arrives but never gets a swap-chain / no DXGI output" symptoms were a
measurement + state artifact, not a driver bug.** Two traps cost a lot of time:
1. **Session 0.** Every standalone probe (`vdtest`, the host's `live_create_drop` test) ran in
**Session 0** — the services session, whose desktop is a throwaway **1024×768** basic display. IddCx
activation happens in the **console Session 1**, where the 4090 drives the real desktop. So
`Screen.AllScreens`/CCD queries from Session 0 *can never* see the virtual monitor activate — they
report the wrong desktop. The only valid way to drive + observe it is the **host service** (SYSTEM,
which targets Session 1) plus the driver's own `OutputDebugString` (system-wide, session-agnostic).
2. **Accumulated device-state damage.** Repeated reinstalls + `Disable`/`Enable-PnpDevice` cycles +
a control handle the host **cached across all of it** left the device tree wedged (stale handle →
the host's PINGs fail → the 3 s watchdog tears the monitor down mid-session → capture opens a dying
display → "no DXGI output"). **A reboot cleared it and it worked on the first connect.** Lesson:
after device churn, restart the host service (fresh handle) — and when in doubt, reboot.
The swap-chain processor is a **faithful port of virtual-display-rs's** (it drains correctly via
`ReleaseAndAcquireBuffer` + `FinishedProcessingFrame` — the drain is *required*; a true no-op would
stall DWM and freeze the captured image). The EDID is our **own clean 128-byte block** (manufacturer
`PNK`, product `punktfunk`) — no SudoVDA bytes.
**Build gotcha (important for iterating):** updating an installed UMDF driver only takes if the INF
**DriverVer changes**: `deploy-dev.ps1` stamps a date.time `-v` on every run; without a bump the old
binary keeps running (silently). **Devnode hygiene:** create the root devnode with
`nefconc --create-device-node` (a clean `ROOT\DISPLAY` node), NOT `devgen /add` — devgen makes
**persistent `SWD\DEVGEN` software devices** that survive reboot *and* registry deletion and resurrect
on every `pnputil /add-driver` (they have `hwid root\pf_vdisplay`, so the driver install re-materializes
them). The production installer must use a single `nefconc`/INF-created node and never `devgen`.
## P2 — direct frame push (kill DDA): design & decision record
Status: **in progress.** P1 ships frames the old way (the driver drains its swap-chain and DDA/WGC
re-captures the composited desktop). P2 makes the driver *publish* each swap-chain frame to the host
directly, so we can retire Desktop Duplication and its multi-GPU survival code. Built behind
`PUNKTFUNK_IDD_PUSH`, A/B'd against DDA, and only then made the default.
### The decisive finding: producer and consumer are both in Session 0
The whole transport design hinged on one unknown — same-session or cross-session? **Measured on the
RTX box (2026-06-22):** the pf-vdisplay host process is `WUDFHost.exe` with
`-DeviceGroupId:pfVDisplayGroup`, running in **Session 0**; the punktfunk host service is `LocalSystem`,
also **Session 0**. So the swap-chain processor thread (spawned by our own `thread::spawn` inside the
driver, i.e. in `WUDFHost`) and the encoder live in the **same session**. This is the easy case:
- A D3D11 **shared keyed-mutex texture** created in the driver can be opened by name in the host with
`ID3D11Device1::OpenSharedResourceByName` — both devices created on the **same render-adapter LUID**
(which the driver already reports out of the `ADD` IOCTL via `OsAdapterLuid`, surfaced as
`WinCaptureTarget::adapter_luid`).
- Named kernel objects resolve through Session 0's shared `\BaseNamedObjects`, so **no `Global\`
prefix / `SeCreateGlobalPrivilege` gymnastics** are needed (kept the names unprefixed; documented
that this relies on both processes being Session 0). The Looking-Glass cross-*VM* shared-memory
device is unnecessary — this is cross-*process*, same-session, on one GPU.
This collapses the "Session-0 cross-process transport is the long pole" risk from the original plan.
### Transport: a ring of shared keyed-mutex textures + a metadata header + an event
A single ping-pong keyed mutex would couple the driver's present rate to the host's consume rate — and
**the swap-chain thread must never block** (a stalled `IddCxSwapChainReleaseAndAcquireBuffer`/processing loop
freezes DWM compositing system-wide). So, the Looking-Glass shape — multiple frame buffers, newest
wins:
- **Ring** of `N` (default 3) shared textures, `RESOURCE_MISC_SHARED_NTHANDLE |
SHARED_KEYEDMUTEX`, fixed size for the session. A **generation** counter bumps on a mode change
(resize): the driver tears down + recreates the ring at the new size, the host notices the
generation change and re-opens.
- **Named metadata header** (`CreateFileMapping`): `{magic, version, generation, width, height,
dxgi_format, ring_len, latest}` where `latest` packs `{write_index, monotonic sequence}` published
*after* the copy completes. Plain (unprefixed) names — Session-0 shared namespace.
- **Frame-ready auto-reset event** so the consumer waits instead of spinning.
- **Producer (driver, per acquired frame):** pick `(latest_index + 1) % N`; **try**-acquire that
slot's keyed mutex with a 0 ms timeout (if the host still holds it — rare with 3 slots — reuse the
current slot or skip, **never block**); `CopyResource` the acquired `MetaData.pSurface` into the
slot; release the mutex; publish `{index, ++seq}`; `SetEvent`. Then `FinishedProcessingFrame` as
today.
- **Consumer (host `IddPushCapturer`):** `WaitForSingleObject(event, timeout)`; read `latest`; if `seq`
advanced, acquire that slot's mutex, `CopyResource` into an owned NVENC-input texture, release, yield
`FramePayload::D3d11{texture, device}` — straight into the existing zero-copy NVENC path. No DDA, no
CPU readback.
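The publish/consume protocol in the list above can be modelled in plain Rust to check the "newest wins, never block" logic. This is a sketch under stated assumptions: `std::sync::Mutex` stands in for the keyed mutex, `Vec<u8>` for the shared texture, the packing of `latest` (sequence in the high 32 bits, index in the low 32) is one possible layout rather than the actual header format, and the frame-ready event is omitted. When the next slot is busy this version simply skips the frame.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

const RING_LEN: usize = 3;

/// Pack {write_index, sequence} into one word so both are published atomically.
fn pack_latest(index: u32, seq: u32) -> u64 {
    ((seq as u64) << 32) | index as u64
}

fn unpack_latest(v: u64) -> (u32, u32) {
    (v as u32, (v >> 32) as u32)
}

struct Ring {
    slots: [Mutex<Vec<u8>>; RING_LEN], // stand-ins for keyed-mutex textures
    latest: AtomicU64,
}

impl Ring {
    fn new() -> Self {
        Ring {
            slots: std::array::from_fn(|_| Mutex::new(Vec::new())),
            latest: AtomicU64::new(pack_latest(0, 0)),
        }
    }

    /// Producer: try-acquire the next slot and publish; returns None (frame
    /// skipped) instead of blocking if the consumer still holds that slot.
    fn publish(&self, frame: &[u8]) -> Option<(u32, u32)> {
        let (cur, seq) = unpack_latest(self.latest.load(Ordering::Acquire));
        let next = (cur as usize + 1) % RING_LEN;
        let mut slot = self.slots[next].try_lock().ok()?; // the "0 ms timeout"
        slot.clear();
        slot.extend_from_slice(frame); // the CopyResource step
        drop(slot); // release the slot before publishing it
        let seq = seq.wrapping_add(1);
        self.latest.store(pack_latest(next as u32, seq), Ordering::Release);
        Some((next as u32, seq))
    }

    /// Consumer: copy out the newest frame if the sequence moved past `last_seq`.
    fn consume(&self, last_seq: u32) -> Option<(u32, Vec<u8>)> {
        let (idx, seq) = unpack_latest(self.latest.load(Ordering::Acquire));
        if seq == last_seq {
            return None;
        }
        let slot = self.slots[idx as usize].lock().ok()?;
        Some((seq, slot.clone()))
    }
}

fn main() {
    let ring = Ring::new();
    ring.publish(b"frame-1");
    ring.publish(b"frame-2");
    // Only the newest frame is observed; frame-1 was overwritten in importance.
    println!("{:?}", ring.consume(0));
}
```

Publishing index and sequence as a single atomic word keeps the consumer from ever pairing a new index with a stale sequence, which is the property the `latest` field is meant to provide.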
### What P2 removes vs. keeps
- **Removes:** `capture/dxgi.rs`'s `DXGI_ERROR_ACCESS_LOST`/`MODE_CHANGE_IN_PROGRESS` re-duplication
churn, the legacy-`DuplicateOutput` fallback, and **`install_gpu_pref_hook()` (the `win32u.dll`
patch)** — by **pinning the render adapter to the encoder GPU** (`IddCxAdapterSetRenderAdapter`, the
existing `SET_RENDER_ADAPTER` IOCTL, driven before `ADD`), so the OS never reparents the output and
the shared texture + NVENC share one device by construction.
- **Keeps:** display **topology** (making the virtual display the composited desktop) and the
**watchdog** (now ours). The **two-process WGC secure-desktop relay** stays until we confirm the IDD
push also delivers the secure (Winlogon) desktop; if it does, that retires too.
### On-glass attempt 2026-06-22 — code complete, blocked at driver load
The full transport (driver publisher + host `IddPushCapturer` + render-LUID robustness + in-process
routing) is written and compiles clean. The first on-glass A/B exposed several real things and one
hard blocker:
- **The service captures in a Session-1 WGC helper, not in-process.** `should_use_helper()` returns
true for a SYSTEM service, so it spawns a user-session helper that does capture **and input
injection**. IDD-push must capture **in-process in Session 0** (where the driver publishes) — wired
via `should_use_helper()` returning false for `PUNKTFUNK_IDD_PUSH`. **Caveat:** `SendInput` from
Session 0 can't reach the user's Session-1 desktop, so in-process IDD-push has **no working input**
yet. Production needs either a Session-1 input-only helper, or `Global\`-namespaced shared textures
so a Session-1 helper consumes IDD-push for both video + input.
- **`SET_RENDER_ADAPTER` is ignored by the driver** (the IDD lands on a different adapter than pinned:
observed IDD adapter `0xd60722` vs pinned 4090 `0x15de1`). The render-LUID-in-header path makes the
host bind correctly regardless, but the driver should be made to actually honor the pin (or the host
must copy across adapters) so NVENC gets a 4090 surface.
- **Cursor is included** in the IddCx composited frame (DDA strips it) — so the host-side cursor
compositor (P2.5) is likely unnecessary for this path.
- **`FAILED_POST_START` was a red herring (churn, not the binary).** Comparing the DLL import tables of
2157 (works) and `frame_transport`: **identical** (same 8 DLLs; the size/hash delta is just the
Authenticode signature). A clean install **+ reboot** (no `restart-device`/`disable-enable`/kill in
between) loads the `frame_transport` driver to **`OK`**. The earlier `FAILED_POST_START` was the
device wedging from the hot-reload churn (the deploy gotchas above). **Lesson: deploy = install +
reboot, full stop.**
- **THE REAL BLOCKER — the driver can't CREATE the shared objects.** With the driver loaded clean and
the monitor active, the host's `IddPushCapturer` still times out: `pfvd-hdr-<target> never appeared`.
The driver's own `OutputDebugString` is invisible (UMDF redirects it to ETW, not DebugView — verified
with a working DBWIN self-test), so a **file-logging** driver build was tried — and it wrote **no
file at all**, even though `init()` runs in `DriverEntry`, the device is `OK`, WUDFHost runs as
`LocalService`, and `C:\Users\Public` is world-writable. **WUDFHost runs with a restricted token: it
can neither write the filesystem nor create named kernel objects** (`CreateFileMappingW`/`CreateEventW`/
`CreateSharedHandle`), so `FramePublisher::new` fails silently. This is exactly why the **gamepad UMDF
drivers invert it**: `inject/dualsense_windows.rs` — *"the host creates the section (privileged → a
permissive SDDL so the WUDFHost can open it); the driver maps it"* — `Global\pfds-shm-<idx>` + SDDL
`D:(A;;GA;;;WD)`. **Fix: invert frame-push to match.** The HOST creates the header + event + ring
textures (`Global\` names, `D:(A;;GA;;;WD)` SDDL); the DRIVER only OPENS them, writes its actual
render LUID + a status code back into the host-created header (so we get driver visibility through the
host log), and runs the copy loop. The host creates the textures on the render adapter the driver
reports.
- **Also unresolved: `SET_RENDER_ADAPTER` appears ignored** (the host's pin to the 4090 vs the ADD-reply
adapter differ every time). The inverted header carries the driver's *actual* render LUID so the host
can create textures + run NVENC on the right adapter — but if that's the iGPU, NVENC (NVIDIA) can't
encode it, so the driver must be made to honor the pin (or the host must cross-adapter copy). Needs its
own investigation.
**Driver deploy gotchas learned (this box):** hot-reloading a UMDF display driver is unreliable —
`pnputil /restart-device` does NOT restart WUDFHost (old image stays mapped), `Disable/Enable-PnpDevice`
errors on the root-enumerated IDD, and **killing WUDFHost invalidates the host's cached `{e5bcc234}`
control handle** (every ADD then fails `0x80070006`, and the device can wedge to `FAILED_POST_START`).
A **reboot** loads a freshly-installed build cleanly. **Recovery** from a broken build is clean and
reboot-free: `pnputil /delete-driver <oemNN>.inf /uninstall` removes the bad package and the device
rebinds the previous (validated) package in the DriverStore — restored 2157 → `OK` immediately.
### On-glass attempt 2 (2026-06-23) — inversion works; in-process Session-0 path is a dead end
Implemented the **inversion** (host creates the header + event + ring textures with the
`D:(A;;GA;;;WD)` SDDL, driver only opens them) + a per-attempt **generation** (kills the
`DXGI_ERROR_NAME_ALREADY_EXISTS` retry collisions) + a fixed-name **`Global\pfvd-dbg` debug channel**
(structured counters the driver writes, since UMDF/ETW + the restricted token block its other logs).
Results on the RTX box:
- ✅ The host **creates the shared ring every time** (`created shared ring … render_luid=…`) — the
privileged-create / restricted-open split is sound.
- ✅ No more name collisions (generation fix).
- ❌ **The driver writes NOTHING**: debug block all zeros, crucially `run_core_entries=0`. The
swap-chain processor **never runs**, i.e. the OS **never assigns a swap-chain** to the virtual
monitor in this path.
**Root cause: an IddCx monitor only gets a swap-chain when something PRESENTS to it, and the in-process
path has no presenter.** The host + the CCD topology-isolate run in **Session 0, which has no DWM /
compositor**. The WGC path works because its capture helper lives in **Session 1**, where DWM composes
the desktop onto the display (that composition is the swap-chain trigger). So in-process Session-0
IDD-push gets no frames to push, full stop — a **fundamental** barrier, not a fixable bug. The original
plan's "Session-0 transport is the long pole" was right, but the long pole turned out to be *triggering
presentation*, not the shared-memory mechanics (those work).
**Consequence:** the only viable IDD-push shape is **option 3 — a Session-1 helper drives presentation +
consumes the `Global\` ring** (the inversion built here is exactly what it needs). But it carries an
unretired risk: it's still unproven whether the swap-chain gets assigned even with a Session-1 consumer
that isn't WGC. Until that's answered, **DDA/WGC stays the shipping Windows capture path** — it works.
All the IDD-push code (driver open-side + host create-side + debug channel) is written, compiles, and is
gated behind `PUNKTFUNK_IDD_PUSH` (off), so it's dormant and harmless.
### CONCLUSION (2026-06-23): IDD-push is not viable for bare-metal capture — the swap-chain is never assigned
After the inversion + a fixed-name debug channel + a host-created-ring observer + an autonomous
loopback test harness (`punktfunk-probe` → the SYSTEM service, paired via the mgmt API), the question
"does the driver's swap-chain processor ever run?" was answered **definitively: no.** The driver's
`run_core` is **never entered**: `run_core_entries=0` in *every* configuration tested:
- in-process (Session 0) and WGC-triggered (Session 1 helper) sessions,
- a user-created ring AND a host-created (LocalSystem) ring with a permissive `D:(A;;GA;;;WD)` SDDL,
- with and without a Low-IL (`S:(ML;;NW;;;LW)`) mandatory label,
- with WUDFHost confirmed **not** an AppContainer (`IsAppContainer=0`),
— even while WGC simultaneously captured the same virtual monitor's composition and streamed multi-MB
of HEVC. The gamepad UMDF drivers prove a UMDF driver *can* open + write a host-created `Global\`
section on this box, so the driver writing nothing is **not** an access problem — `run_core` simply
does not run.
**Root cause (researched + ecosystem-confirmed):** an IddCx virtual monitor only receives a swap-chain
(`EVT_IDD_CX_MONITOR_ASSIGN_SWAPCHAIN`) when the OS **presents/scans-out** to it, which requires a real
presentation consumer. **WGC/DDA capture of the composed desktop does NOT count** — it reads DWM's
composition, bypassing the driver's swap-chain. With no physical scanout and no consumer that routes
*through the driver*, the path stays inactive (`IDDCX_PATH_FLAGS=0`) and `ASSIGN_SWAPCHAIN` never fires.
Confirming evidence:
- **Every bare-metal virtual-display capture project uses WGC/DDA, not the driver swap-chain:** SudoVDA
(its swap-chain loop acquires-and-discards), Apollo/Sunshine (DDA + WGC backends), virtual-display-rs
(discards), parsec-vdd (no frame path). Only **Looking Glass** consumes the driver swap-chain — and
only because a **VM guest scans out** the display (the consumer). We have no equivalent on bare metal.
- Microsoft's own unanswered Q&A (learn.microsoft.com/answers 4096179) reports the identical symptom for
the IddSampleDriver: virtual display "always inactive," `ASSIGN_SWAPCHAIN` never runs.
**Verdict:** the "driver consumes its swap-chain and pushes frames" architecture (P2 / Looking-Glass
style) **cannot get frames** for punktfunk's bare-metal, whole-desktop, capture-only use case. The
shared-memory transport machinery (host-creates / driver-opens, the gamepad pattern) is all sound and
proven to *create*, but there is nothing for the driver to publish. **DDA/WGC remains the only viable
Windows capture path**, which is exactly what the entire ecosystem does. The IDD-push code stays
in-tree, compiles, and is gated `off` (`PUNKTFUNK_IDD_PUSH`) — dormant and harmless — documenting the
attempt so it isn't re-tried. "Better performance/lower overhead" must come from optimizing the WGC/DDA
path (e.g. trimming the Session-0↔Session-1 relay, zero-copy encode), not from IDD-push.
The only unexplored avenue is **driver-side** (a different adapter/monitor/path setup that might make the
OS treat the virtual display as a presentation target) — but it needs a reboot to test, the MS Q&A
suggests it's unsolved, and the unanimous ecosystem choice of WGC/DDA argues it's a dead end.
**Final exhaustion (2026-06-23, follow-up): both remaining avenues closed.**
- **Option 3 (present source) — TESTED, failed.** Added a present-trigger to the Session-1 WGC helper:
it successfully created a D3D11 swapchain on the virtual display and presented continuously (WGC even
captured the flashing window). The driver stayed `run_core_entries=0` / `frames_acquired=0`. So an
active *present source* on the display does NOT make the OS assign the driver's swap-chain either —
DWM composes the present onto the display (capturable) without routing it through the driver's
swap-chain.
- **Option 2 (driver flag) — closed by analysis.** The present-trigger succeeding proves the **path is
already active** (a swapchain presents to the display fine); the missing piece is **scanout routed
through the driver**, which the OS does only for a real consumer (physical display / VM guest / RDP).
The one IddCx flag for that — `IDDCX_ADAPTER_FLAGS_REMOTE_SESSION_DRIVER` — requires the **RDP
protocol stack** as the consumer, which bare-metal console capture has no equivalent of.
**Verdict is final:** IDD-push needs a presentation consumer (scanout / VM guest / RDP) that bare-metal
console desktop-capture fundamentally cannot provide. No host-side capture, no in-process path, no
present source, and no available driver flag overcomes it. WGC (normal desktop) + DDA (secure desktop)
is the only viable Windows capture path — as the entire ecosystem already does. The IDD-push +
present-trigger code stays in-tree, gated off, as the documented record of the attempt.
### Known gaps the build-out must close (tracked as P2.* tasks)
- **Cursor.** DDA/WGC composite the HW cursor host-side from frame-info; the IDD path delivers the
cursor separately (`IddCxMonitorSetupHardwareCursor` event → `QueryHardwareCursor`). The prototype
may ship cursor-less; the build-out wires the IDD cursor into the existing `CursorCompositor`.
- **HDR.** The default IddCx swap-chain surface is 8-bit `B8G8R8A8`; FP16/HDR needs the **IddCx 1.11
D3D12 acquire path** (`SetDevice2`/`ReleaseAndAcquireBuffer2` → `ID3D12Resource`). Build against
1.10, runtime-gate 1.11. SDR-only for the prototype.
## Why we'd do this
The user's goals, mapped to outcomes:
| Goal | Outcome |
| --- | --- |
| Drop external deps | No more vendored prebuilt SudoVDA `.dll`/`.cat` (third-party, C++, single upstream). |
| Increase Rust coverage | The display driver joins the gamepad drivers as in-tree Rust UMDF. |
| Own the stack / easier display management | We control the IOCTL protocol, the EDID, the mode list, the watchdog — and can fold the topology/mode logic that's currently scattered in `vdisplay/sudovda.rs` into the driver. |
| Cleaner code | Phase 2 retires `capture/dxgi.rs`'s DDA workarounds + the `win32u.dll` patch. |
## What we'd be replacing (current architecture)
- **Driver:** SudoVDA — UMDF2 IddCx, `Class=Display`, `UmdfExtensions=IddCx0102`,
`UpperFilters=IndirectKmd`, root-enumerated `Root\SudoMaker\SudoVDA`. Vendored prebuilt under
`packaging/windows/sudovda/`, installed by `install-sudovda.ps1` (cert → `nefconc` devnode →
`pnputil`). Source is public ([SudoMaker/SudoVDA](https://github.com/SudoMaker/SudoVDA), README-only
MIT/CC0 grant over the MS sample, ~1,900 LOC C++).
- **Host contract:** `crates/punktfunk-host/src/vdisplay/sudovda.rs` opens the control device by
interface GUID `{e5bcc234-…}` and drives a tiny `METHOD_BUFFERED` IOCTL protocol — byte-identical to
SudoVDA's `Common/Include/sudovda-ioctl.h`:
- `ADD (0x800)` `{w,h,refresh,GUID,name[14],serial[14]}` → `{LUID, target_id}`
- `REMOVE (0x801)` `{GUID}` · `SET_RENDER_ADAPTER (0x802)` `{LUID}` · `GET_WATCHDOG (0x803)` ·
`PING (0x888)` (mandatory keepalive) · `GET_VERSION (0x8FF)`
- **Capture:** `capture/dxgi.rs` finds the virtual monitor's GDI output **across all adapters** (it's
enumerated under the *rendering* GPU, not SudoVDA's LUID) and runs **DXGI Desktop Duplication**
(`DuplicateOutput1`, FP16 for HDR). This file is **dominated by virtual-display-over-DDA survival
code**: `DXGI_ERROR_ACCESS_LOST` re-duplication with retries, `MODE_CHANGE_IN_PROGRESS` backoff,
legacy-`DuplicateOutput` fallback, CCD display isolation to make the IDD the sole composited
desktop, and an **`install_gpu_pref_hook()` that patches `win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue`**
to stop DXGI reparenting the output across GPUs. Most of that exists *because* we capture a virtual
display via DDA on a multi-GPU box.
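For the host-contract bullet above, a rough sketch of how the IOCTL codes and the `ADD` input struct might be expressed in Rust. The bit layout follows the standard Windows `CTL_CODE` macro and the function numbers come from the list above, but the device type (`FILE_DEVICE_UNKNOWN`) and access (`FILE_ANY_ACCESS`) are assumptions, as are the field types and names in `AddParams`; the authoritative definitions are in SudoVDA's `sudovda-ioctl.h` and `control.rs`.

```rust
// Standard Windows values; that SudoVDA uses this device type and access is an assumption.
const FILE_DEVICE_UNKNOWN: u32 = 0x22;
const METHOD_BUFFERED: u32 = 0;
const FILE_ANY_ACCESS: u32 = 0;

/// Rust equivalent of the Windows CTL_CODE macro.
const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

const IOCTL_ADD: u32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS);
const IOCTL_REMOVE: u32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS);
const IOCTL_SET_RENDER_ADAPTER: u32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x802, METHOD_BUFFERED, FILE_ANY_ACCESS);
const IOCTL_GET_WATCHDOG: u32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x803, METHOD_BUFFERED, FILE_ANY_ACCESS);
const IOCTL_PING: u32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x888, METHOD_BUFFERED, FILE_ANY_ACCESS);
const IOCTL_GET_VERSION: u32 = ctl_code(FILE_DEVICE_UNKNOWN, 0x8FF, METHOD_BUFFERED, FILE_ANY_ACCESS);

/// ADD input, mirroring `{w,h,refresh,GUID,name[14],serial[14]}`.
/// Field widths are assumed (32-bit integers, 16-byte GUID, byte arrays).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct AddParams {
    width: u32,
    height: u32,
    refresh: u32,
    monitor_guid: [u8; 16],
    name: [u8; 14],
    serial: [u8; 14],
}

fn main() {
    println!("ADD  = {:#010x}", IOCTL_ADD);
    println!("PING = {:#010x}", IOCTL_PING);
    println!(
        "others: {:#x} {:#x} {:#x} {:#x}",
        IOCTL_REMOVE, IOCTL_SET_RENDER_ADAPTER, IOCTL_GET_WATCHDOG, IOCTL_GET_VERSION
    );
    println!("AddParams is {} bytes", std::mem::size_of::<AddParams>());
}
```

`#[repr(C)]` is what makes a byte-identical layout possible at all; whether the sizes match the C header still has to be checked against the real definitions.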
## Feasibility findings
### Signing — green (the make-or-break)
UMDF user-mode ⇒ Code-Integrity signing rules don't apply to our binary (the only kernel piece is
Microsoft's inbox `IndirectKmd`). Self-signed cert in `Root` + `TrustedPublisher` is sufficient on a
normal Secure-Boot Win11 box — no `bcdedit /set testsigning`. SudoVDA and `virtual-display-rs` both
ship this way. This is the **same** model as our DualSense/DS4/XUSB drivers. (The only thing that
breaks install is a botched cert placement, not a signing *tier*.)
### Rust prior art — exists, MIT, reusable
`virtual-display-rs` proves an all-Rust IddCx driver runs in production and gives us:
`wdf-umdf-sys` (bindgen over WDF **and** `iddcx.h`, links `IddCxStub`), `wdf-umdf` (safe wrappers —
`iddcx.rs` ~300 LOC, with an `IddCxIsFunctionAvailable!` version-gate macro), and a reference driver
(`swap_chain_processor.rs` ~158 LOC, `direct_3d_device.rs`, `edid.rs`). **Caveat:** it uses its *own*
bindgen stack, **not** `microsoft/windows-drivers-rs` — see Decision D2.
### windows-drivers-rs IddCx support — absent, but a bounded extension
Our `wdk-sys` (m0) binds Base + WDF + feature-gated subsets (hid/gpio/spb/…). **Zero IddCx symbols.**
Adding it is the same shape as the existing subsets: an `ApiSubset::Iddcx` variant + `iddcx` feature →
`iddcx_headers()` returning `iddcx.h` for bindgen, and linking `IddCx.lib`. IddCx functions are **not**
WDF-table functions, so the `call_unsafe_wdf_function_binding!` macro doesn't apply — they're direct
`IddCx.lib` exports we'd `#[link(name="IddCx")] extern` (or bindgen) and wrap ourselves.
`windows` 0.58 (already in the tree) provides the Direct3D11/Dxgi APIs the swap-chain loop needs.
### The IddCx driver itself — well-understood, ~1-2k LOC
Required callbacks (baselined on the MS [IddSampleDriver](https://github.com/microsoft/Windows-driver-samples/blob/main/video/IndirectDisplay/IddSampleDriver/Driver.cpp), ~1,100 LOC, IddCx 1.4):
`EVT_IDD_CX_ADAPTER_INIT_FINISHED`, `ADAPTER_COMMIT_MODES`, `PARSE_MONITOR_DESCRIPTION`,
`MONITOR_GET_DEFAULT_DESCRIPTION_MODES`, `MONITOR_QUERY_TARGET_MODES`, `MONITOR_ASSIGN_SWAPCHAIN`
(the only callback with real D3D work), `MONITOR_UNASSIGN_SWAPCHAIN`, and `DEVICE_IO_CONTROL` (where
our ADD/REMOVE/PING IOCTLs live). Init flow: `WdfDeviceCreate → IddCxDeviceInitConfig →
IddCxDeviceInitialize → IddCxAdapterInitAsync → IddCxMonitorCreate → IddCxMonitorArrival`.
**Arbitrary resolutions don't need EDID timings:** ship one generic ~128/256-byte EDID base block to
make Windows treat the target as a real monitor, then advertise modes programmatically from the
mode-list callbacks — a static table **plus the runtime-requested client mode injected as preferred**
(exactly SudoVDA's `s_DefaultModes[]` + per-ADD preferred-mode approach). 5120×1440@240 just gets
added at ADD time.
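
A minimal sketch of the "static table plus injected preferred mode" idea, in plain Rust. The `Mode` type, the contents of `DEFAULT_MODES`, and the convention that index 0 is the preferred entry are all assumptions made for illustration; SudoVDA's actual `s_DefaultModes[]` contents are not reproduced here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

// Hypothetical default table, not SudoVDA's real list.
pub const DEFAULT_MODES: &[Mode] = &[
    Mode { width: 1920, height: 1080, refresh_hz: 60 },
    Mode { width: 2560, height: 1440, refresh_hz: 60 },
    Mode { width: 3840, height: 2160, refresh_hz: 60 },
];

/// Builds the list a mode-list callback would advertise.
/// The client-requested mode goes first (this sketch's "preferred" slot),
/// followed by the static defaults with any duplicate of it removed.
pub fn build_mode_list(requested: Mode) -> Vec<Mode> {
    let mut modes = Vec::with_capacity(DEFAULT_MODES.len() + 1);
    modes.push(requested);
    modes.extend(DEFAULT_MODES.iter().copied().filter(|m| *m != requested));
    modes
}
```

With this shape, a request for 5120×1440@240 yields a four-entry list with that mode at index 0, while a request that already matches a default leaves the list at its original length.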
**HDR/10-bit:** supported, but it's the one place IddCx is *harder* than today. The default swap-chain
surface is **8-bit `A8R8G8B8`**; FP16/HDR requires the IddCx **1.11 D3D12 acquire path**
(`SetDevice2`/`ReleaseAndAcquireBuffer2` → `ID3D12Resource`, with a stricter sync model). Our box is
Win11 26200 (IddCx ≥ 1.10), so this is reachable, but it's real work — and our current WGC/DDA path
gives FP16 HDR "for free." Build against 1.10 and runtime-gate the newer DDIs (SudoVDA's pattern).
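
The runtime gate could look roughly like the following. This is a sketch only: it takes the 1.11 threshold from the paragraph above as given, and `IddCxVersion`, `AcquirePath`, and `choose_acquire_path` are hypothetical names. How the runtime version is actually queried is left out.

```rust
/// (major, minor) of the IddCx runtime; derived ordering compares major first.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct IddCxVersion {
    pub major: u16,
    pub minor: u16,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AcquirePath {
    /// Default 8-bit swap-chain surface via the original acquire DDIs.
    D3d11Sdr,
    /// FP16/HDR via the newer D3D12 acquire DDIs.
    D3d12Hdr,
}

/// Minimum runtime for the HDR path, per the text above (assumption).
pub const HDR_MIN: IddCxVersion = IddCxVersion { major: 1, minor: 11 };

/// Picks the acquire path: HDR only when requested and the runtime is new enough.
pub fn choose_acquire_path(runtime: IddCxVersion, want_hdr: bool) -> AcquirePath {
    if want_hdr && runtime >= HDR_MIN {
        AcquirePath::D3d12Hdr
    } else {
        AcquirePath::D3d11Sdr
    }
}
```

Falling back to the SDR path instead of failing keeps the driver loadable on older runtimes, which is the point of building against the lower version.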
## The architectural prize: skip DDA (Phase 2)
An IddCx driver gets each presented frame from `IddCxSwapChainReleaseAndAcquireBuffer` as an
`IDXGIResource` on a device **we** bind via `IddCxSwapChainSetDevice`. We can copy it into a shared
texture / shared section and hand it to the host's encoder process directly — **no Desktop
Duplication**. Why this is the real win, not just a detour:
- **It's the *intended* IddCx use case.** IddCx exists for remote/wireless/USB displays that ship
swap-chain frames over a wire; consuming frames in the driver is the designed path, and **Looking
Glass already does exactly this** (driver → shared memory → separate consumer, no DDA).
- **It kills the multi-GPU bug class.** We call `IddCxAdapterSetRenderAdapter` to pin the swap-chain to
the **same GPU as our NVENC encoder before adding the monitor**, and the OS honors it. No more DXGI
reparenting the output onto the wrong GPU, no ACCESS_LOST storms, and we can **retire
`install_gpu_pref_hook()` (the `win32u.dll` patch)** and most of `capture/dxgi.rs`. Swap-chain
re-creation becomes a documented, in-band event (`ABANDON_SWAPCHAIN`) instead of an undocumented
failure we fight with retries.
What it does **not** remove (be honest): display **topology** management — making the virtual display
the sole/primary composited desktop so the game (and Winlogon) render to it — is independent of how we
*get* frames and stays (though we can integrate it more cleanly). And the watchdog stays, now ours.
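
Since the watchdog becomes ours, its core bookkeeping is worth pinning down early. The sketch below assumes the driver should treat the host as gone once no `PING` has arrived within a timeout; the `Watchdog` type and the timeout value are hypothetical. Time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Tracks the last keepalive from the host (hypothetical type).
pub struct Watchdog {
    timeout: Duration,
    last_ping: Instant,
}

impl Watchdog {
    pub fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_ping: now }
    }

    /// Called from the PING IOCTL handler.
    pub fn ping(&mut self, now: Instant) {
        self.last_ping = now;
    }

    /// True once the host has been silent for longer than the timeout.
    /// A timer callback would check this and tear down the virtual displays.
    pub fn expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_ping) > self.timeout
    }
}
```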
The cost: a **Session-0 → service cross-process frame transport** (the driver host is `WUDFHost` in
Session 0 / LocalService; our host is a LocalSystem service). A `Global\`-named, explicitly-ACL'd
shared section + keyed-mutex texture (Looking Glass's shape) is where the engineering actually goes —
prototype this first, it's the only genuinely new risk. Plus the HDR D3D12 path above.
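
For the transport prototype, one early decision is the fixed header at the start of the shared section. The following is a sketch under stated assumptions: the field set, the 24-byte little-endian layout, and the magic value are all invented for illustration, and the actual synchronization (keyed mutex or otherwise) is out of scope here.

```rust
/// Hypothetical magic value identifying our section layout.
pub const MAGIC: u32 = 0x5046_5644;
/// Header size in bytes: magic + width + height + stride + 64-bit sequence.
pub const HEADER_LEN: usize = 24;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FrameHeader {
    pub width: u32,
    pub height: u32,
    pub stride: u32,
    pub sequence: u64,
}

fn le_u32(src: &[u8], at: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&src[at..at + 4]);
    u32::from_le_bytes(b)
}

fn le_u64(src: &[u8], at: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&src[at..at + 8]);
    u64::from_le_bytes(b)
}

impl FrameHeader {
    /// Driver side: serialize into the start of the section. None if too small.
    pub fn write_to(&self, buf: &mut [u8]) -> Option<()> {
        let dst = buf.get_mut(..HEADER_LEN)?;
        dst[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        dst[4..8].copy_from_slice(&self.width.to_le_bytes());
        dst[8..12].copy_from_slice(&self.height.to_le_bytes());
        dst[12..16].copy_from_slice(&self.stride.to_le_bytes());
        dst[16..24].copy_from_slice(&self.sequence.to_le_bytes());
        Some(())
    }

    /// Host side: parse and validate. None on short buffer or wrong magic.
    pub fn read_from(buf: &[u8]) -> Option<Self> {
        let src = buf.get(..HEADER_LEN)?;
        if le_u32(src, 0) != MAGIC {
            return None;
        }
        Some(Self {
            width: le_u32(src, 4),
            height: le_u32(src, 8),
            stride: le_u32(src, 12),
            sequence: le_u64(src, 16),
        })
    }
}
```

Explicit byte serialization avoids relying on struct layout matching across two separately built binaries, and the sequence counter gives the host a cheap way to detect dropped or repeated frames.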
## Decisions to make at kickoff
- **D1 — Own the driver?** Recommend **yes, in Rust.** (Alternatives: fork SudoVDA's C++ — fastest to a
known-good HDR driver but reintroduces a C++ toolchain and README-only license provenance; or keep
vendoring — zero cost, but none of the goals.)
- **D2 — Binding stack?** The main implementation fork.
- **(a)** Extend our `windows-drivers-rs` (m0) with an `iddcx` subset — **one toolchain across all
our drivers**, our build env, but we write the IddCx bindings ourselves (+~3-5 wk), using
`virtual-display-rs`'s `iddcx.rs` as the 1:1 guide. *Preferred for consistency.*
- **(b)** Vendor `virtual-display-rs`'s `wdf-umdf*` crates (MIT) — fastest to first light, but a
*second* WDK-binding stack in-tree.
- Suggested sequence: **prototype on (b) to prove IddCx-on-our-box in days**, then build production on
**(a)** for consistency.
- **D3 — Frame transport?** Phase it: **DDA-compatible first** (zero capture-side change), **direct
push second** (the cleanup). Don't couple the driver rewrite to the transport rewrite.
## Recommended plan
- **P0 — now:** keep vendoring SudoVDA. No change. (The gamepad-driver installer work just shipped;
this is independent.)
- **P1 — drop-in Rust IddCx driver (`pf-vdisplay`).** Replicate SudoVDA's IOCTL contract **exactly**
(same struct layouts; reuse or re-issue the control interface GUID) so `vdisplay/sudovda.rs` needs
**~zero change** (at most a GUID constant). Class=Display + IddCx INF, our own EDID + programmatic
mode list incl. the per-ADD client mode, the watchdog, a real swap-chain drain (the vdd port — the
drain is required so DWM keeps compositing; DDA/WGC still captures the desktop). Bundle + self-sign +
`pnputil`-install via the installer, identical to the gamepad-driver path we just built.
**Outcome:** all-Rust, SudoVDA dependency dropped, DDA capture unchanged. Effort ≈ **2-4 wk to first
light**, **5-7 wk to parity** (HDR, multi-monitor, CI).
- **P2 — direct frame push (kill DDA).** Add a swap-chain processor that copies each frame into a
shared section/texture; new `capture` backend reads it directly; pin the render adapter to the
encoder GPU. Gate behind a flag, validate against DDA, then retire the DDA path + the `win32u.dll`
patch. HDR via the IddCx 1.11 D3D12 acquire path. **Outcome:** the real "owning the stack pays off"
cleanup. Effort: additional; the Session-0 transport is the long pole.
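
The "gate behind a flag, validate against DDA" step in P2 could be sketched as below. Everything here is hypothetical: the `CaptureBackend` enum, the two booleans, and the byte-wise comparison are illustrative stand-ins, not existing code in the `capture` module.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CaptureBackend {
    /// Existing Desktop Duplication path.
    Dda,
    /// New path reading frames pushed by the driver.
    DirectPush,
}

/// Direct push is used only when the flag is on AND the driver offers it;
/// anything else falls back to DDA.
pub fn select_backend(flag_enabled: bool, driver_supports_push: bool) -> CaptureBackend {
    if flag_enabled && driver_supports_push {
        CaptureBackend::DirectPush
    } else {
        CaptureBackend::Dda
    }
}

/// Validation helper: largest per-byte difference between two same-sized frames.
/// Returns None when the buffers differ in length (not comparable).
pub fn max_channel_diff(a: &[u8], b: &[u8]) -> Option<u8> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0))
}
```

During validation, both backends could capture the same static test pattern and the run would assert that `max_channel_diff` stays at or near zero before the DDA path is retired.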
## Risks
1. **D3D-in-a-driver swap-chain loop** — the one genuinely new piece; bugs here = black screens/TDR.
Mitigated by `virtual-display-rs`'s `swap_chain_processor.rs` + the MS sample as references.
2. **Session-0 cross-process transport** (P2) — the actual hard part; prototype it first.
3. **HDR = the harder D3D12 1.11 path** — our current WGC/DDA HDR is free; the IddCx HDR path is not.
4. **Two binding stacks** if we go D2(b) — a maintenance cost cutting against "clean/consistent."
5. **No WHQL ⇒ no Windows Update / Dev-Center distribution** — same constraint our gamepad drivers
already accept (bundle + self-sign + import cert).
## References
- IddCx model + signing: [IDD model overview](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview) ·
[IddCx versions](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iddcx-versions) ·
[1.10+ updates](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iddcx1.10-updates) ·
[UMDF signing](https://learn.microsoft.com/en-us/archive/blogs/peterwie/do-umdf-drivers-require-signing)
- Swap-chain / frames: [IDDCX_METADATA](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/ns-iddcx-iddcx_metadata) ·
[SetDevice](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nf-iddcx-iddcxswapchainsetdevice) ·
[SetRenderAdapter](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nf-iddcx-iddcxadaptersetrenderadapter) ·
[ASSIGN_SWAPCHAIN](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nc-iddcx-evt_idd_cx_monitor_assign_swapchain)
- Prior art: [microsoft IddSampleDriver](https://github.com/microsoft/Windows-driver-samples/tree/main/video/IndirectDisplay) ·
[SudoMaker/SudoVDA](https://github.com/SudoMaker/SudoVDA) ([ioctl.h](https://github.com/SudoMaker/SudoVDA/blob/master/Common/Include/sudovda-ioctl.h)) ·
**[MolotovCherry/virtual-display-rs (Rust, MIT)](https://github.com/MolotovCherry/virtual-display-rs)** ·
[Looking Glass IDD (swap-chain → shm, no DDA)](https://deepwiki.com/gnif/LookingGlass/2.5-indirect-display-driver-(idd)) ·
[itsmikethetech/Virtual-Display-Driver](https://github.com/itsmikethetech/Virtual-Display-Driver)