# Windows virtual display — a Rust port of SudoVDA (investigation & plan)

Status: **P1 done — `pf-vdisplay` validated streaming on glass at 5120×1440@240** (2026-06-22). The
all-Rust IddCx driver replaces the vendored **SudoVDA** C++ driver, matching the "all-Rust UMDF, zero
external driver deps" direction we finished for gamepads (ViGEmBus gone; DualSense/DS4/XUSB shipped).
The investigation/plan below is kept for context; see **Validated on-box** for the result.
## TL;DR

A Rust port is **feasible, low-on-blockers, and strategically aligned** — and there's an unexpected
architectural prize beyond "same thing, in Rust."

- **Signing is not a blocker.** An IddCx driver is UMDF *user-mode*; it needs **no WHQL, no
  attestation, no test-signing**. A self-signed cert in LocalMachine `Root` + `TrustedPublisher`
  loads it — **exactly the model our gamepad drivers already ship** (and exactly what SudoVDA and the
  other forks do). ([Do UMDF drivers require signing?](https://learn.microsoft.com/en-us/archive/blogs/peterwie/do-umdf-drivers-require-signing))
- **We would not be first in Rust.** [`MolotovCherry/virtual-display-rs`](https://github.com/MolotovCherry/virtual-display-rs)
  is a complete, shipping **IddCx driver written in Rust** (MIT), with hand-rolled IddCx/WDF bindgen
  bindings (`wdf-umdf-sys` + `wdf-umdf`) and a reference swap-chain processor. This turns "greenfield
  FFI" into "adapt a proven reference."
- **The prize: we can stop using DXGI Desktop Duplication.** An IddCx driver already *receives* the
  composited desktop frames in its swap-chain. [Looking Glass](https://deepwiki.com/gnif/LookingGlass/2.5-indirect-display-driver-(idd))
  ships exactly this in production — driver consumes the swap-chain, hands frames to a separate
  process, "operates entirely independently of DDA." Doing the same would **delete an entire class of
  multi-GPU bugs** the current `capture/dxgi.rs` is built to survive (ACCESS_LOST storms,
  MODE_CHANGE_IN_PROGRESS, the `win32u.dll` reparenting patch).

Recommendation: **yes, build it in Rust**, in phases — a drop-in DDA-compatible driver first (own the
stack at low risk), then the direct-frame-push path (the real cleanup). Keep vendoring SudoVDA as the
safe interim until the Rust driver is on-glass-validated on the RTX box.
## Validated on-box (2026-06-22)

Before committing, the toolchain + load path were proven on the RTX box (Win11 26200, WDK 26100):

- **A Rust IddCx driver builds with our toolchain.** Cloned [`virtual-display-rs`](https://github.com/MolotovCherry/virtual-display-rs)
  and built its driver `.dll` against our WDK (UMDF 2.31 + IddCx 1.4 stubs, bindgen over `IddCx.h` via
  our LLVM, nightly-2024-07-26). One fix needed: its `build.rs` picked the **max** SDK Lib version
  (`10.0.28000.0`, a base SDK with no IddCx) for the `IddCxStub` search path; resolving it by the
  version that actually contains `um\x64\iddcx\1.4` (`10.0.26100.0`, the WDK) fixed the link.
- **It installs self-signed and loads.** Signed `.dll`/`.cat` with our existing driver cert (the
  gamepad `punktfunk-ds-test`), `pnputil /add-driver`, root devnode via `devgen`. The device came up
  **Status OK / CM_PROB_NONE**, Class Display, hosted by `WUDFRd` — a Rust IddCx adapter initialized
  cleanly. (SudoVDA, already live here, independently confirms IddCx + self-signed UMDF work on this
  box.) Test artifacts removed afterward; SudoVDA untouched.
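
The `build.rs` fix above amounts to choosing the SDK `Lib` directory by content rather than by version number. A minimal sketch of that selection, assuming the versioned directories sit directly under one `Lib` root; the function names (`parse_version`, `find_iddcx_lib_dir`) are hypothetical and this is not the vendored crate's actual code:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Parse a dotted SDK version such as "10.0.26100.0" into comparable parts.
/// Returns None for directory names that are not purely numeric versions.
fn parse_version(name: &str) -> Option<Vec<u32>> {
    name.split('.').map(|part| part.parse().ok()).collect()
}

/// Return the `um/x64/iddcx/1.4` directory of the highest SDK version that
/// actually contains it, instead of blindly taking the highest version.
fn find_iddcx_lib_dir(lib_root: &Path) -> Option<PathBuf> {
    let mut best: Option<(Vec<u32>, PathBuf)> = None;
    for entry in fs::read_dir(lib_root).ok()?.flatten() {
        let dir = entry.path();
        let Some(version) = dir
            .file_name()
            .and_then(|n| n.to_str())
            .and_then(parse_version)
        else {
            continue; // not a versioned directory
        };
        let stub_dir = dir.join("um").join("x64").join("iddcx").join("1.4");
        if !stub_dir.is_dir() {
            continue; // e.g. a base SDK without the IddCx stubs
        }
        if best.as_ref().map_or(true, |(v, _)| version > *v) {
            best = Some((version, stub_dir));
        }
    }
    best.map(|(_, path)| path)
}
```

In a `build.rs`, the returned path would feed a `cargo:rustc-link-search=` line; a `None` result is a clearer failure than a link error against an SDK that lacks the stub.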

**Conclusion:** the central risk ("can we build + load a Rust IddCx driver here?") is retired. The
binding question (D2) resolves toward **reusing `virtual-display-rs`'s self-contained `wdf-umdf-sys` +
`wdf-umdf` bindgen crates** (now proven to build + load on our box) rather than extending
`windows-drivers-rs` — IddCx functions are direct `IddCxStub` exports the WDF function-table macro
can't reach anyway, so a unified bindgen is the cleaner base for `pf-vdisplay`. Reference clone kept at
`C:\Users\Public\virtual-display-rs`.

**Scaffold + driver logic landed + on-glass:** `packaging/windows/vdisplay-driver/` — vendored
`wdf-umdf-sys`/`wdf-umdf` (MIT, + the SDK-version build.rs fix) + the `pf-vdisplay` driver crate. The
full IddCx driver is ported (entry → `IDD_CX_CLIENT_CONFIG` with all 7 callbacks → device/monitor
context → our own EDID → a real swap-chain drain), with the IPC/serde/`tokio` stack replaced by an
in-tree `monitor` model and `OutputDebugString` logging. **Validated on the RTX box:** built, signed
(our `punktfunk-ds-test` cert), installed, loaded **Status OK**, and **a real virtual monitor arrived**
("VirtuDisplay+", `DISPLAY\CHY0000`), i.e. our own all-Rust IddCx virtual display creating a monitor.

**IOCTL control plane done + on-glass (P1 functionally complete):** the SudoVDA-compatible control
plane is implemented (`EVT_IDD_CX_DEVICE_IO_CONTROL` + the `{e5bcc234-…}` interface registered via
`WdfDeviceCreateDeviceInterface`; `control.rs` with byte-identical structs) — `ADD` a monitor at a
requested mode → `{LUID, target_id}` (target id + adapter LUID captured from `IDARG_OUT_MONITORARRIVAL`),
`REMOVE` by GUID, `PING`/`GET_WATCHDOG` watchdog, `GET_VERSION`, `SET_RENDER_ADAPTER`
(`IddCxAdapterSetRenderAdapter`); per-`ADD` mode injection (requested mode preferred + fallbacks). Added
the five missing FFI wrappers to the vendored `wdf-umdf`. **Validated on the RTX box** with a probe
that mimics `vdisplay/sudovda.rs` exactly: `GET_VERSION → 0.2.1`, `GET_WATCHDOG → timeout=3`,
`ADD 1920×1080@60 → target_id=257 + adapter LUID`, a real "VirtuDisplay+" monitor arrived at the
requested mode, `REMOVE` ok. **Constraint:** pf-vdisplay can't coexist with SudoVDA — they register the
same interface GUID, so two IddCx adapters claiming it → `FAILED_POST_START`; pf-vdisplay *replaces*
SudoVDA (validated by disabling SudoVDA first).

**Watchdog + real-host drive validated:** added the watchdog thread (1 Hz countdown reset by any IOCTL;
tears down all monitors at 0 so a gone host never leaves a phantom display; mirrors SudoVDA's
`RunWatchdog`). Pointed the **real host** at it — removed SudoVDA's devnode so pf-vdisplay is the sole
`{e5bcc234}` provider, then ran the host's `vdisplay::sudovda::tests::live_create_drop`
(`PUNKTFUNK_SUDOVDA_LIVE=1`): **test passed**, and the pf-vdisplay log shows the host's IOCTLs landing —
`ADD 1920x1080@60 → target_id=258, luid=…02619823`, then the watchdog correctly tore the monitor down
when the test process exited without a final REMOVE. So `vdisplay/sudovda.rs` drives pf-vdisplay
unchanged through the full control contract.
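
The watchdog contract described above (a 1 Hz countdown that any IOCTL resets, with teardown at zero) can be sketched as a small atomic counter. This is an illustration only: the `Watchdog` type and its method names are hypothetical, and the driver's real thread and teardown code are not shown.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Countdown watchdog: any IOCTL resets it, a 1 Hz tick decrements it.
pub struct Watchdog {
    timeout_secs: u32,
    remaining: AtomicU32,
}

impl Watchdog {
    pub fn new(timeout_secs: u32) -> Self {
        Self {
            timeout_secs,
            remaining: AtomicU32::new(timeout_secs),
        }
    }

    /// Called from the IOCTL handler for every request (PING included).
    pub fn reset(&self) {
        self.remaining.store(self.timeout_secs, Ordering::Release);
    }

    /// Called once per second by the watchdog thread. Returns true only on
    /// the tick that moves the countdown from 1 to 0, which is the moment
    /// all monitors should be torn down. Further ticks at 0 return false.
    pub fn tick(&self) -> bool {
        let prev = self
            .remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |r| r.checked_sub(1));
        matches!(prev, Ok(1))
    }
}
```

With `Watchdog::new(3)`, matching the `timeout=3` reply reported earlier, a host that stops pinging triggers teardown on the third tick, and the teardown fires once rather than on every later tick.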

**Validated streaming end-to-end on glass (2026-06-22) — P1 complete.** pf-vdisplay is a working
SudoVDA replacement. Driven by the **real host** (`serve`, the LocalSystem service) with a stock client
at **5120×1440@240**: the monitor arrives, `resolve_gdi_name → \\.\DISPLAY10`, `set_active_mode` +
CCD-isolate succeed, the DXGI output resolves **under the RTX 4090**, WGC capture + NVENC run at
**steady 240 fps, ~2.4 ms encode**, 6512 AUs sent, clean teardown (`isolate restored rc=0x0`). Same
`vdisplay/sudovda.rs` path, unchanged — full parity with SudoVDA.

**The earlier "monitor arrives but never gets a swap-chain / no DXGI output" symptoms were a
measurement + state artifact, not a driver bug.** Two traps cost a lot of time:
1. **Session 0.** Every standalone probe (`vdtest`, the host's `live_create_drop` test) ran in
   **Session 0** — the services session, whose desktop is a throwaway **1024×768** basic display. IddCx
   activation happens in the **console Session 1**, where the 4090 drives the real desktop. So
   `Screen.AllScreens`/CCD queries from Session 0 *can never* see the virtual monitor activate — they
   report the wrong desktop. The only valid way to drive + observe it is the **host service** (SYSTEM,
   which targets Session 1) plus the driver's own `OutputDebugString` (system-wide, session-agnostic).
2. **Accumulated device-state damage.** Repeated reinstalls + `Disable`/`Enable-PnpDevice` cycles +
   a control handle the host **cached across all of it** left the device tree wedged (stale handle →
   the host's PINGs fail → the 3 s watchdog tears the monitor down mid-session → capture opens a dying
   display → "no DXGI output"). **A reboot cleared it and it worked on the first connect.** Lesson:
   after device churn, restart the host service (fresh handle) — and when in doubt, reboot.

The swap-chain processor is a **faithful port of virtual-display-rs's** (it drains correctly via
`ReleaseAndAcquireBuffer` + `FinishedProcessingFrame` — the drain is *required*; a true no-op would
stall DWM and freeze the captured image). The EDID is our **own clean 128-byte block** (manufacturer
`PNK`, product `punktfunk`) — no SudoVDA bytes.

**Build gotcha (important for iterating):** updating an installed UMDF driver only takes if the INF
**DriverVer changes** — `deploy-dev.ps1` stamps a date.time `-v` on every run; without a bump the old
binary keeps running (silently). **Devnode hygiene:** create the root devnode with
`nefconc --create-device-node` (a clean `ROOT\DISPLAY` node), NOT `devgen /add` — devgen makes
**persistent `SWD\DEVGEN` software devices** that survive reboot *and* registry deletion and resurrect
on every `pnputil /add-driver` (they have `hwid root\pf_vdisplay`, so the driver install re-materializes
them). The production installer must use a single `nefconc`/INF-created node and never `devgen`.
## P2 — direct frame push (kill DDA): design & decision record

Status: **in progress.** P1 ships frames the old way (the driver drains its swap-chain and DDA/WGC
re-captures the composited desktop). P2 makes the driver *publish* each swap-chain frame to the host
directly, so we can retire Desktop Duplication and its multi-GPU survival code. Built behind
`PUNKTFUNK_IDD_PUSH`, A/B'd against DDA, and only then made the default.
### The decisive finding: producer and consumer are both in Session 0

The whole transport design hinged on one unknown — same-session or cross-session? **Measured on the
RTX box (2026-06-22):** the pf-vdisplay host process is `WUDFHost.exe` with
`-DeviceGroupId:pfVDisplayGroup`, running in **Session 0**; the punktfunk host service is `LocalSystem`,
also **Session 0**. So the swap-chain processor thread (spawned by our own `thread::spawn` inside the
driver, i.e. in `WUDFHost`) and the encoder live in the **same session**. This is the easy case:

- A D3D11 **shared keyed-mutex texture** created in the driver can be opened by name in the host with
  `ID3D11Device1::OpenSharedResourceByName` — both devices created on the **same render-adapter LUID**
  (which the driver already reports out of the `ADD` IOCTL via `OsAdapterLuid`, surfaced as
  `WinCaptureTarget::adapter_luid`).
- Named kernel objects resolve through Session 0's shared `\BaseNamedObjects`, so **no `Global\`
  prefix / `SeCreateGlobalPrivilege` gymnastics** are needed (kept the names unprefixed; documented
  that this relies on both processes being Session 0). The Looking-Glass cross-*VM* shared-memory
  device is unnecessary — this is cross-*process*, same-session, on one GPU.

This collapses the "Session-0 cross-process transport is the long pole" risk from the original plan.
### Transport: a ring of shared keyed-mutex textures + a metadata header + an event

A single ping-pong keyed mutex would couple the driver's present rate to the host's consume rate — and
**the swap-chain thread must never block** (a stalled `IddCxSwapChainReleaseAndAcquireBuffer`/processing
loop freezes DWM compositing system-wide). So we take the Looking-Glass shape (multiple frame buffers,
newest wins):

- **Ring** of `N` (default 3) shared textures,
  `RESOURCE_MISC_SHARED_NTHANDLE | SHARED_KEYEDMUTEX`, fixed size for the session. A **generation**
  counter bumps on a mode change (resize): the driver tears down + recreates the ring at the new size,
  the host notices the generation change and re-opens.
- **Named metadata header** (`CreateFileMapping`):
  `{magic, version, generation, width, height, dxgi_format, ring_len, latest}` where `latest` packs
  `{write_index, monotonic sequence}` published *after* the copy completes. Plain (unprefixed) names —
  Session-0 shared namespace.
- **Frame-ready auto-reset event** so the consumer waits instead of spinning.
- **Producer (driver, per acquired frame):** pick `(latest_index + 1) % N`; **try**-acquire that
  slot's keyed mutex with a 0 ms timeout (if the host still holds it — rare with 3 slots — reuse the
  current slot or skip, **never block**); `CopyResource` the acquired `MetaData.pSurface` into the
  slot; release the mutex; publish `{index, ++seq}`; `SetEvent`. Then `FinishedProcessingFrame` as
  today.
- **Consumer (host `IddPushCapturer`):** `WaitForSingleObject(event, timeout)`; read `latest`; if `seq`
  advanced, acquire that slot's mutex, `CopyResource` into an owned NVENC-input texture, release, yield
  `FramePayload::D3d11{texture, device}` — straight into the existing zero-copy NVENC path. No DDA, no
  CPU readback.
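
The publish/consume handshake on `latest` can be sketched without any D3D objects. The sketch below assumes a single producer, packs the slot index into the low 8 bits and the sequence into the bits above it (the bit layout is an assumption of this sketch), and leaves out the keyed-mutex try-acquire, the `CopyResource` calls, and the event; `pack`, `unpack`, `next_slot`, `publish`, and `poll` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const RING_LEN: u64 = 3;

/// Pack the slot index into the low 8 bits and the sequence above it
/// (an assumed layout for this sketch).
fn pack(index: u64, seq: u64) -> u64 {
    (seq << 8) | (index & 0xff)
}

fn unpack(latest: u64) -> (u64, u64) {
    (latest & 0xff, latest >> 8)
}

/// Producer: choose the slot after the last published one.
fn next_slot(latest: &AtomicU64) -> u64 {
    let (index, _) = unpack(latest.load(Ordering::Acquire));
    (index + 1) % RING_LEN
}

/// Producer: call only after the copy into `index` has completed, so a
/// consumer that observes the new value also observes a finished frame.
fn publish(latest: &AtomicU64, index: u64) -> u64 {
    let (_, seq) = unpack(latest.load(Ordering::Acquire));
    latest.store(pack(index, seq + 1), Ordering::Release);
    seq + 1
}

/// Consumer: return the slot to read if a newer frame exists, else None.
/// Intermediate frames are skipped on purpose (newest wins).
fn poll(latest: &AtomicU64, last_seen_seq: &mut u64) -> Option<u64> {
    let (index, seq) = unpack(latest.load(Ordering::Acquire));
    if seq > *last_seen_seq {
        *last_seen_seq = seq;
        Some(index)
    } else {
        None
    }
}
```

Publishing index and sequence in one atomic word means the consumer can never pair a new sequence number with an old slot index. If the producer publishes twice before the consumer wakes, `poll` returns only the second slot, which is the intended behaviour for a live stream.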
### What P2 removes vs. keeps

- **Removes:** `capture/dxgi.rs`'s `DXGI_ERROR_ACCESS_LOST`/`MODE_CHANGE_IN_PROGRESS` re-duplication
  churn, the legacy-`DuplicateOutput` fallback, and **`install_gpu_pref_hook()` (the `win32u.dll`
  patch)** — by **pinning the render adapter to the encoder GPU** (`IddCxAdapterSetRenderAdapter`, the
  existing `SET_RENDER_ADAPTER` IOCTL, driven before `ADD`), so the OS never reparents the output and
  the shared texture + NVENC share one device by construction.
- **Keeps:** display **topology** (making the virtual display the composited desktop) and the
  **watchdog** (now ours). The **two-process WGC secure-desktop relay** stays until we confirm the IDD
  push also delivers the secure (Winlogon) desktop; if it does, that retires too.
### On-glass attempt 2026-06-22 — code complete, blocked at driver load

The full transport (driver publisher + host `IddPushCapturer` + render-LUID robustness + in-process
routing) is written and compiles clean. The first on-glass A/B exposed several real things and one
hard blocker:

- **The service captures in a Session-1 WGC helper, not in-process.** `should_use_helper()` returns
  true for a SYSTEM service, so it spawns a user-session helper that does capture **and input
  injection**. IDD-push must capture **in-process in Session 0** (where the driver publishes) — wired
  via `should_use_helper()` returning false for `PUNKTFUNK_IDD_PUSH`. **Caveat:** `SendInput` from
  Session 0 can't reach the user's Session-1 desktop, so in-process IDD-push has **no working input**
  yet. Production needs either a Session-1 input-only helper, or `Global\`-namespaced shared textures
  so a Session-1 helper consumes IDD-push for both video + input.
- **`SET_RENDER_ADAPTER` is ignored by the driver** (the IDD lands on a different adapter than pinned:
  observed IDD adapter `0xd60722` vs pinned 4090 `0x15de1`). The render-LUID-in-header path makes the
  host bind correctly regardless, but the driver should be made to actually honor the pin (or the host
  must copy across adapters) so NVENC gets a 4090 surface.
- **Cursor is included** in the IddCx composited frame (DDA strips it) — so the host-side cursor
  compositor (P2.5) is likely unnecessary for this path.
- **`FAILED_POST_START` was a red herring (churn, not the binary).** Comparing the 2157 (works) and
  the `frame_transport` DLL import tables: **identical** (same 8 DLLs; the size/hash delta is just the
  Authenticode signature). A clean install **+ reboot** (no `restart-device`/`disable-enable`/kill in
  between) loads the `frame_transport` driver to **`OK`**. The earlier `FAILED_POST_START` was the
  device wedging from the hot-reload churn (the deploy gotchas above). **Lesson: deploy = install +
  reboot, full stop.**
- **THE REAL BLOCKER — the driver can't CREATE the shared objects.** With the driver loaded clean and
  the monitor active, the host's `IddPushCapturer` still times out: `pfvd-hdr-<target> never appeared`.
  The driver's own `OutputDebugString` is invisible (UMDF redirects it to ETW, not DebugView — verified
  with a working DBWIN self-test), so a **file-logging** driver build was tried — and it wrote **no
  file at all**, even though `init()` runs in `DriverEntry`, the device is `OK`, WUDFHost runs as
  `LocalService`, and `C:\Users\Public` is world-writable. **WUDFHost runs with a restricted token: it
  can neither write the filesystem nor create named kernel objects** (`CreateFileMappingW`/`CreateEventW`/
  `CreateSharedHandle`), so `FramePublisher::new` fails silently. This is exactly why the **gamepad UMDF
  drivers invert it**: `inject/dualsense_windows.rs` — *"the host creates the section (privileged → a
  permissive SDDL so the WUDFHost can open it); the driver maps it"* — `Global\pfds-shm-<idx>` + SDDL
  `D:(A;;GA;;;WD)`. **Fix: invert frame-push to match.** The HOST creates the header + event + ring
  textures (`Global\` names, `D:(A;;GA;;;WD)` SDDL); the DRIVER only OPENS them, writes its actual
  render LUID + a status code back into the host-created header (so we get driver visibility through the
  host log), and runs the copy loop. The host creates the textures on the render adapter the driver
  reports.
- **Also unresolved: `SET_RENDER_ADAPTER` appears ignored** (the host's pin to the 4090 vs the ADD-reply
  adapter differ every time). The inverted header carries the driver's *actual* render LUID so the host
  can create textures + run NVENC on the right adapter — but if that's the iGPU, NVENC (NVIDIA) can't
  encode it, so the driver must be made to honor the pin (or the host must cross-adapter copy). Needs its
  own investigation.

**Driver deploy gotchas learned (this box):** hot-reloading a UMDF display driver is unreliable —
`pnputil /restart-device` does NOT restart WUDFHost (old image stays mapped), `Disable/Enable-PnpDevice`
errors on the root-enumerated IDD, and **killing WUDFHost invalidates the host's cached `{e5bcc234}`
control handle** (every ADD then fails `0x80070006`, and the device can wedge to `FAILED_POST_START`).
A **reboot** loads a freshly-installed build cleanly. **Recovery** from a broken build is clean and
reboot-free: `pnputil /delete-driver <oemNN>.inf /uninstall` removes the bad package and the device
rebinds the previous (validated) package in the DriverStore — restored 2157 → `OK` immediately.
### On-glass attempt 2 (2026-06-23) — inversion works; in-process Session-0 path is a dead end

Implemented the **inversion** (host creates the header + event + ring textures with the
`D:(A;;GA;;;WD)` SDDL, driver only opens them) + a per-attempt **generation** (kills the
`DXGI_ERROR_NAME_ALREADY_EXISTS` retry collisions) + a fixed-name **`Global\pfvd-dbg` debug channel**
(structured counters the driver writes, since UMDF/ETW + the restricted token block its other logs).
Results on the RTX box:

- ✅ The host **creates the shared ring every time** (`created shared ring … render_luid=…`) — the
  privileged-create / restricted-open split is sound.
- ✅ No more name collisions (generation fix).
- ❌ **The driver writes NOTHING** — debug block all zeros, crucially `run_core_entries=0`. The
  swap-chain processor **never runs**, i.e. the OS **never assigns a swap-chain** to the virtual
  monitor in this path.
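
Reading the debug channel comes down to interpreting a handful of counters. A sketch of the host-side interpretation, assuming a hypothetical layout of two little-endian `u32` counters at the start of the section; only the counter names `run_core_entries` and `frames_acquired` come from these notes, while `DebugBlock`, `Verdict`, and `diagnose` are invented for illustration:

```rust
/// Counters the driver would write into the shared debug section.
/// The field order and widths are assumptions of this sketch.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct DebugBlock {
    pub run_core_entries: u32,
    pub frames_acquired: u32,
}

impl DebugBlock {
    /// Decode the block from the first 8 bytes of the mapped section.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 8 {
            return None;
        }
        let word =
            |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
        Some(Self {
            run_core_entries: word(0),
            frames_acquired: word(4),
        })
    }
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// The swap-chain processor was never entered.
    SwapChainNeverAssigned,
    /// The processor ran but acquired no frames.
    NoFramesAcquired,
    /// Frames are being acquired.
    Flowing,
}

pub fn diagnose(block: &DebugBlock) -> Verdict {
    if block.run_core_entries == 0 {
        Verdict::SwapChainNeverAssigned
    } else if block.frames_acquired == 0 {
        Verdict::NoFramesAcquired
    } else {
        Verdict::Flowing
    }
}
```

An all-zero block maps to `SwapChainNeverAssigned`, which is the reading taken in these notes. On its own, an all-zero block cannot distinguish that case from a driver that never opened the section, so the verdict still needs independent evidence that the driver can write to host-created sections.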

**Root cause: an IddCx monitor only gets a swap-chain when something PRESENTS to it, and the in-process
path has no presenter.** The host + the CCD topology-isolate run in **Session 0, which has no DWM /
compositor**. The WGC path works because its capture helper lives in **Session 1**, where DWM composes
the desktop onto the display (that composition is the swap-chain trigger). So in-process Session-0
IDD-push gets no frames to push, full stop — a **fundamental** barrier, not a fixable bug. The original
plan's "Session-0 transport is the long pole" was right, but the long pole turned out to be *triggering
presentation*, not the shared-memory mechanics (those work).

**Consequence:** the only viable IDD-push shape is **option 3 — a Session-1 helper drives presentation +
consumes the `Global\` ring** (the inversion built here is exactly what it needs). But it carries an
unretired risk: it's still unproven whether the swap-chain gets assigned even with a Session-1 consumer
that isn't WGC. Until that's answered, **DDA/WGC stays the shipping Windows capture path** — it works.
All the IDD-push code (driver open-side + host create-side + debug channel) is written, compiles, and is
gated behind `PUNKTFUNK_IDD_PUSH` (off), so it's dormant and harmless.
### CONCLUSION (2026-06-23): IDD-push is not viable for bare-metal capture — the swap-chain is never assigned

After the inversion + a fixed-name debug channel + a host-created-ring observer + an autonomous
loopback test harness (`punktfunk-probe` → the SYSTEM service, paired via the mgmt API), the question
"does the driver's swap-chain processor ever run?" was answered **definitively: no.** The driver's
`run_core` is **never entered** — `run_core_entries=0` in *every* configuration tested:

- in-process (Session 0) and WGC-triggered (Session 1 helper) sessions,
- a user-created ring AND a host-created (LocalSystem) ring with a permissive `D:(A;;GA;;;WD)` SDDL,
- with and without a Low-IL (`S:(ML;;NW;;;LW)`) mandatory label,
- with WUDFHost confirmed **not** an AppContainer (`IsAppContainer=0`).

This held even while WGC simultaneously captured the same virtual monitor's composition and streamed
multiple MB of HEVC. The gamepad UMDF drivers prove a UMDF driver *can* open + write a host-created
`Global\` section on this box, so the driver writing nothing is **not** an access problem — `run_core`
simply does not run.

**Root cause (researched + ecosystem-confirmed):** an IddCx virtual monitor only receives a swap-chain
(`EVT_IDD_CX_MONITOR_ASSIGN_SWAPCHAIN`) when the OS **presents/scans-out** to it, which requires a real
presentation consumer. **WGC/DDA capture of the composed desktop does NOT count** — it reads DWM's
composition, bypassing the driver's swap-chain. With no physical scanout and no consumer that routes
*through the driver*, the path stays inactive (`IDDCX_PATH_FLAGS=0`) and `ASSIGN_SWAPCHAIN` never fires.
Confirming evidence:

- **Every bare-metal virtual-display capture project uses WGC/DDA, not the driver swap-chain:** SudoVDA
  (its swap-chain loop acquires-and-discards), Apollo/Sunshine (DDA + WGC backends), virtual-display-rs
  (discards), parsec-vdd (no frame path). Only **Looking Glass** consumes the driver swap-chain — and
  only because a **VM guest scans out** the display (the consumer). We have no equivalent on bare metal.
- Microsoft's own unanswered Q&A (learn.microsoft.com/answers 4096179) reports the identical symptom for
  the IddSampleDriver: virtual display "always inactive," `ASSIGN_SWAPCHAIN` never runs.

**Verdict:** the "driver consumes its swap-chain and pushes frames" architecture (P2 / Looking-Glass
style) **cannot get frames** for punktfunk's bare-metal, whole-desktop, capture-only use case. The
shared-memory transport machinery (host-creates / driver-opens, the gamepad pattern) is all sound and
proven to *create*, but there is nothing for the driver to publish. **DDA/WGC remains the only viable
Windows capture path**, which is exactly what the entire ecosystem does. The IDD-push code stays
in-tree, compiles, and is gated `off` (`PUNKTFUNK_IDD_PUSH`) — dormant and harmless — documenting the
attempt so it isn't re-tried. "Better performance/lower overhead" must come from optimizing the WGC/DDA
path (e.g. trimming the Session-0↔Session-1 relay, zero-copy encode), not from IDD-push.

The only unexplored avenue is **driver-side** (a different adapter/monitor/path setup that might make the
OS treat the virtual display as a presentation target) — but it needs a reboot to test, the MS Q&A
suggests it's unsolved, and the unanimous ecosystem choice of WGC/DDA argues it's a dead end.

**Final exhaustion (2026-06-23, follow-up): both remaining avenues closed.**

- **Option 3 (present source) — TESTED, failed.** Added a present-trigger to the Session-1 WGC helper:
  it successfully created a D3D11 swapchain on the virtual display and presented continuously (WGC even
  captured the flashing window). The driver stayed `run_core_entries=0` / `frames_acquired=0`. So an
  active *present source* on the display does NOT make the OS assign the driver's swap-chain either —
  DWM composes the present onto the display (capturable) without routing it through the driver's
  swap-chain.
- **Option 2 (driver flag) — closed by analysis.** The present-trigger succeeding proves the **path is
  already active** (a swapchain presents to the display fine); the missing piece is **scanout routed
  through the driver**, which the OS does only for a real consumer (physical display / VM guest / RDP).
  The one IddCx flag for that — `IDDCX_ADAPTER_FLAGS_REMOTE_SESSION_DRIVER` — requires the **RDP
  protocol stack** as the consumer, which bare-metal console capture has no equivalent of.

**Verdict is final:** IDD-push needs a presentation consumer (scanout / VM guest / RDP) that bare-metal
console desktop-capture fundamentally cannot provide. No host-side capture, no in-process path, no
present source, and no available driver flag overcomes it. WGC (normal desktop) + DDA (secure desktop)
is the only viable Windows capture path — as the entire ecosystem already does. The IDD-push +
present-trigger code stays in-tree, gated off, as the documented record of the attempt.
### Known gaps the build-out must close (tracked as P2.* tasks)

- **Cursor.** DDA/WGC composite the HW cursor host-side from frame-info; the IDD path delivers the
  cursor separately (`IddCxMonitorSetupHardwareCursor` event → `QueryHardwareCursor`). The prototype
  may ship cursor-less; the build-out wires the IDD cursor into the existing `CursorCompositor`.
- **HDR.** The default IddCx swap-chain surface is 8-bit `B8G8R8A8`; FP16/HDR needs the **IddCx 1.11
  D3D12 acquire path** (`SetDevice2`/`ReleaseAndAcquireBuffer2` → `ID3D12Resource`). Build against
  1.10, runtime-gate 1.11. SDR-only for the prototype.
## Why we'd do this

The user's goals, mapped to outcomes:

| Goal | Outcome |
| --- | --- |
| Drop external deps | No more vendored prebuilt SudoVDA `.dll`/`.cat` (third-party, C++, single upstream). |
| Increase Rust coverage | The display driver joins the gamepad drivers as in-tree Rust UMDF. |
| Own the stack / easier display management | We control the IOCTL protocol, the EDID, the mode list, the watchdog — and can fold the topology/mode logic that's currently scattered in `vdisplay/sudovda.rs` into the driver. |
| Cleaner code | Phase 2 retires `capture/dxgi.rs`'s DDA workarounds + the `win32u.dll` patch. |
## What we'd be replacing (current architecture)

- **Driver:** SudoVDA — UMDF2 IddCx, `Class=Display`, `UmdfExtensions=IddCx0102`,
  `UpperFilters=IndirectKmd`, root-enumerated `Root\SudoMaker\SudoVDA`. Vendored prebuilt under
  `packaging/windows/sudovda/`, installed by `install-sudovda.ps1` (cert → `nefconc` devnode →
  `pnputil`). Source is public ([SudoMaker/SudoVDA](https://github.com/SudoMaker/SudoVDA), README-only
  MIT/CC0 grant over the MS sample, ~1,900 LOC C++).
- **Host contract:** `crates/punktfunk-host/src/vdisplay/sudovda.rs` opens the control device by
  interface GUID `{e5bcc234-…}` and drives a tiny `METHOD_BUFFERED` IOCTL protocol — byte-identical to
  SudoVDA's `Common/Include/sudovda-ioctl.h`:
  - `ADD (0x800)` `{w,h,refresh,GUID,name[14],serial[14]}` → `{LUID, target_id}`
  - `REMOVE (0x801)` `{GUID}` · `SET_RENDER_ADAPTER (0x802)` `{LUID}` · `GET_WATCHDOG (0x803)` ·
    `PING (0x888)` (mandatory keepalive) · `GET_VERSION (0x8FF)`
- **Capture:** `capture/dxgi.rs` finds the virtual monitor's GDI output **across all adapters** (it's
  enumerated under the *rendering* GPU, not SudoVDA's LUID) and runs **DXGI Desktop Duplication**
  (`DuplicateOutput1`, FP16 for HDR). This file is **dominated by virtual-display-over-DDA survival
  code**: `DXGI_ERROR_ACCESS_LOST` re-duplication with retries, `MODE_CHANGE_IN_PROGRESS` backoff,
  legacy-`DuplicateOutput` fallback, CCD display isolation to make the IDD the sole composited
  desktop, and an **`install_gpu_pref_hook()` that patches `win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue`**
  to stop DXGI reparenting the output across GPUs. Most of that exists *because* we capture a virtual
  display via DDA on a multi-GPU box.
|
||
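
To make the host contract above concrete, here is a minimal sketch of how the listed function numbers would map to Win32 control codes. The function numbers and `METHOD_BUFFERED` come from the list above; the device type (`FILE_DEVICE_UNKNOWN`) and access (`FILE_ANY_ACCESS`) are assumptions of this sketch rather than values read from `sudovda-ioctl.h`, and the `Function` enum and helper names are hypothetical. Payload structs are left out because the field widths aren't stated here.

```rust
// ASSUMED values, not confirmed against sudovda-ioctl.h.
const FILE_DEVICE_UNKNOWN: u32 = 0x22;
const FILE_ANY_ACCESS: u32 = 0;
// Stated in the host contract.
const METHOD_BUFFERED: u32 = 0;

/// Rust equivalent of the Windows `CTL_CODE` macro.
pub const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

/// Function numbers exactly as listed in the host contract (names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum Function {
    Add = 0x800,
    Remove = 0x801,
    SetRenderAdapter = 0x802,
    GetWatchdog = 0x803,
    Ping = 0x888,
    GetVersion = 0x8FF,
}

impl Function {
    /// Full control code under the assumed device type and access.
    pub const fn ioctl(self) -> u32 {
        ctl_code(FILE_DEVICE_UNKNOWN, self as u32, METHOD_BUFFERED, FILE_ANY_ACCESS)
    }
}

/// Recover the 12-bit function number (bits 2..=13) from a control code.
pub const fn function_of(code: u32) -> u32 {
    (code >> 2) & 0xFFF
}
```

Under those assumptions `Function::Add.ioctl()` evaluates to `0x0022_2000`. A drop-in driver would take the real values from the header; the arithmetic is what has to stay byte-identical.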

## Feasibility findings

### Signing: green (the make-or-break)
Because the driver is user-mode (UMDF), Code-Integrity signing rules don't apply to our binary (the
only kernel piece is Microsoft's inbox `IndirectKmd`). A self-signed cert in `Root` +
`TrustedPublisher` is sufficient on a normal Secure-Boot Win11 box, with no
`bcdedit /set testsigning`. SudoVDA and `virtual-display-rs` both ship this way. This is the **same**
model as our DualSense/DS4/XUSB drivers. (The only thing that breaks install is a botched cert
placement, not a signing *tier*.)

### Rust prior art: exists, MIT, reusable
`virtual-display-rs` proves an all-Rust IddCx driver runs in production and gives us:
`wdf-umdf-sys` (bindgen over WDF **and** `iddcx.h`, links `IddCxStub`), `wdf-umdf` (safe wrappers:
`iddcx.rs` ~300 LOC, with an `IddCxIsFunctionAvailable!` version-gate macro), and a reference driver
(`swap_chain_processor.rs` ~158 LOC, `direct_3d_device.rs`, `edid.rs`). **Caveat:** it uses its *own*
bindgen stack, **not** `microsoft/windows-drivers-rs`; see Decision D2.

### windows-drivers-rs IddCx support: absent, but a bounded extension
Our `wdk-sys` (m0) binds Base + WDF + feature-gated subsets (hid/gpio/spb/…). **Zero IddCx symbols.**
Adding it is the same shape as the existing subsets: an `ApiSubset::Iddcx` variant + `iddcx` feature →
`iddcx_headers()` returning `iddcx.h` for bindgen, and linking `IddCx.lib`. IddCx functions are **not**
WDF-table functions, so the `call_unsafe_wdf_function_binding!` macro doesn't apply; they're direct
`IddCx.lib` exports we'd `#[link(name="IddCx")] extern` (or bindgen) and wrap ourselves.
`windows` 0.58 (already in the tree) provides the Direct3D11/Dxgi APIs the swap-chain loop needs.

### The IddCx driver itself: well-understood, ~1–2k LOC
Required callbacks (baselined on the MS [IddSampleDriver](https://github.com/microsoft/Windows-driver-samples/blob/main/video/IndirectDisplay/IddSampleDriver/Driver.cpp), ~1,100 LOC, IddCx 1.4):
`EVT_IDD_CX_ADAPTER_INIT_FINISHED`, `ADAPTER_COMMIT_MODES`, `PARSE_MONITOR_DESCRIPTION`,
`MONITOR_GET_DEFAULT_DESCRIPTION_MODES`, `MONITOR_QUERY_TARGET_MODES`, `MONITOR_ASSIGN_SWAPCHAIN`
(the only callback with real D3D work), `MONITOR_UNASSIGN_SWAPCHAIN`, and `DEVICE_IO_CONTROL` (where
our ADD/REMOVE/PING IOCTLs live). Init flow: `IddCxDeviceInitConfig → WdfDeviceCreate →
IddCxDeviceInitialize → IddCxAdapterInitAsync → IddCxMonitorCreate → IddCxMonitorArrival`
(`IddCxDeviceInitConfig` takes the `WDFDEVICE_INIT`, so it runs before `WdfDeviceCreate`).

**Arbitrary resolutions don't need EDID timings:** ship one generic ~128/256-byte EDID base block to
make Windows treat the target as a real monitor, then advertise modes programmatically from the
mode-list callbacks: a static table **plus the runtime-requested client mode injected as preferred**
(exactly SudoVDA's `s_DefaultModes[]` + per-ADD preferred-mode approach). 5120×1440@240 just gets
added at ADD time.
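
The "static table plus injected client mode" idea can be sketched in a few lines. This is an illustration only: the `Mode` struct, the contents of `DEFAULT_MODES`, and the `build_mode_list` helper are hypothetical, and putting the preferred mode first is an assumption of this sketch (how the preference is actually reported to the OS in the mode-list callbacks is not shown).

```rust
/// One display mode (hypothetical shape, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Stand-in for the driver's static mode table; the entries are made up.
pub const DEFAULT_MODES: &[Mode] = &[
    Mode { width: 1920, height: 1080, refresh_hz: 60 },
    Mode { width: 2560, height: 1440, refresh_hz: 60 },
    Mode { width: 3840, height: 2160, refresh_hz: 60 },
];

/// Build the advertised list for one ADD request: the client's requested
/// mode comes first (treated as preferred), followed by the static table
/// with any exact duplicate of the requested mode dropped.
pub fn build_mode_list(requested: Mode, defaults: &[Mode]) -> Vec<Mode> {
    let mut modes = Vec::with_capacity(defaults.len() + 1);
    modes.push(requested);
    modes.extend(defaults.iter().copied().filter(|m| *m != requested));
    modes
}
```

Calling `build_mode_list` with a 5120×1440@240 request yields that mode at index 0 followed by the three table entries; requesting a mode already in the table does not duplicate it.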

**HDR/10-bit:** supported, but it's the one place IddCx is *harder* than today. The default swap-chain
surface is **8-bit `B8G8R8A8`**; FP16/HDR requires the IddCx **1.11 D3D12 acquire path**
(`SetDevice2`/`ReleaseAndAcquireBuffer2` → `ID3D12Resource`, with a stricter sync model). Our box is
Win11 26200 (IddCx ≥ 1.10), so this is reachable, but it's real work, and our current WGC/DDA path
gives FP16 HDR "for free." Build against 1.10 and runtime-gate the newer DDIs (SudoVDA's pattern).

## The architectural prize: skip DDA (Phase 2)

An IddCx driver gets each presented frame from `IddCxSwapChainReleaseAndAcquireBuffer` as an
`IDXGIResource` on a device **we** bind via `IddCxSwapChainSetDevice`. We can copy it into a shared
texture / shared section and hand it to the host's encoder process directly, with **no Desktop
Duplication**. Why this is the real win, not just a detour:

- **It's the *intended* IddCx use case.** IddCx exists for remote/wireless/USB displays that ship
  swap-chain frames over a wire; consuming frames in the driver is the designed path, and **Looking
  Glass already does exactly this** (driver → shared memory → separate consumer, no DDA).
- **It kills the multi-GPU bug class.** We call `IddCxAdapterSetRenderAdapter` to pin the swap-chain to
  the **same GPU as our NVENC encoder before adding the monitor**, and the OS honors it. No more DXGI
  reparenting the output onto the wrong GPU, no ACCESS_LOST storms, and we can **retire
  `install_gpu_pref_hook()` (the `win32u.dll` patch)** and most of `capture/dxgi.rs`. Swap-chain
  re-creation becomes a documented, in-band event (`ABANDON_SWAPCHAIN`) instead of an undocumented
  failure we fight with retries.

What it does **not** remove (be honest): display **topology** management, i.e. making the virtual
display the sole/primary composited desktop so the game (and Winlogon) render to it, is independent
of how we *get* frames and stays (though we can integrate it more cleanly). And the watchdog stays,
now ours.

The cost: a **Session-0 → service cross-process frame transport** (the driver host is `WUDFHost` in
Session 0 / LocalService; our host is a LocalSystem service). A `Global\`-named, explicitly-ACL'd
shared section + keyed-mutex texture (Looking Glass's shape) is where the engineering actually goes;
prototype this first, it's the only genuinely new risk. Plus the HDR D3D12 path above.
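
As a rough illustration of the bookkeeping such a transport needs, the sketch below shows only the "which slot holds the newest frame" word that would sit at the start of the shared section. Everything in it is hypothetical (`RingHeader`, `SLOT_COUNT`, the `(seq << 8) | slot` packing), it uses in-process atomics as a stand-in for the mapped section, and it does not cover the keyed-mutex texture access or the section ACLs, which are the hard parts named above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical number of frame slots in the ring.
pub const SLOT_COUNT: u64 = 3;

/// Header that would live at offset 0 of the shared section.
/// `latest` packs (seq << 8) | slot; 0 means "nothing published yet",
/// so sequence numbers start at 1.
#[repr(C)]
pub struct RingHeader {
    latest: AtomicU64,
}

impl RingHeader {
    pub const fn new() -> Self {
        Self { latest: AtomicU64::new(0) }
    }

    /// Producer (driver) side: call only after the copy into `slot` is done.
    /// Release ordering makes the slot write visible before the new word.
    pub fn publish(&self, seq: u64, slot: u8) {
        self.latest.store((seq << 8) | slot as u64, Ordering::Release);
    }

    /// Consumer (host) side: returns `(seq, slot)` if a frame newer than
    /// `last_seen` has been published, otherwise `None`.
    pub fn poll(&self, last_seen: u64) -> Option<(u64, u8)> {
        let word = self.latest.load(Ordering::Acquire);
        let (seq, slot) = (word >> 8, (word & 0xFF) as u8);
        (seq > last_seen).then_some((seq, slot))
    }
}

/// Next slot index for the producer, wrapping at `SLOT_COUNT`.
pub fn next_slot(slot: u8) -> u8 {
    ((slot as u64 + 1) % SLOT_COUNT) as u8
}
```

The consumer keeps its own `last_seen` sequence number, so a repeated poll with no new frame returns `None` instead of re-delivering the same slot.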

## Decisions to make at kickoff

- **D1: Own the driver?** Recommend **yes, in Rust.** (Alternatives: fork SudoVDA's C++, which is the
  fastest route to a known-good HDR driver but reintroduces a C++ toolchain and README-only license
  provenance; or keep vendoring, which costs nothing but meets none of the goals.)
- **D2: Binding stack?** The main implementation fork.
  - **(a)** Extend our `windows-drivers-rs` (m0) with an `iddcx` subset: **one toolchain across all
    our drivers**, our build env, but we write the IddCx bindings ourselves (+~3–5 wk), using
    `virtual-display-rs`'s `iddcx.rs` as the 1:1 guide. *Preferred for consistency.*
  - **(b)** Vendor `virtual-display-rs`'s `wdf-umdf*` crates (MIT): fastest to first light, but a
    *second* WDK-binding stack in-tree.
  - Suggested sequence: **prototype on (b) to prove IddCx-on-our-box in days**, then build production on
    **(a)** for consistency.
- **D3: Frame transport?** Phase it: **DDA-compatible first** (zero capture-side change), **direct
  push second** (the cleanup). Don't couple the driver rewrite to the transport rewrite.

## Recommended plan

- **P0 (now):** keep vendoring SudoVDA. No change. (The gamepad-driver installer work just shipped;
  this is independent.)
- **P1: drop-in Rust IddCx driver (`pf-vdisplay`).** Replicate SudoVDA's IOCTL contract **exactly**
  (same struct layouts; reuse or re-issue the control interface GUID) so `vdisplay/sudovda.rs` needs
  **~zero change** (at most a GUID constant). Class=Display + IddCx INF, our own EDID + programmatic
  mode list incl. the per-ADD client mode, the watchdog, a real swap-chain drain (the vdd port; the
  drain is required so DWM keeps compositing, while DDA/WGC still captures the desktop). Bundle +
  self-sign + `pnputil`-install via the installer, identical to the gamepad-driver path we just
  built. **Outcome:** all-Rust, SudoVDA dependency dropped, DDA capture unchanged.
  Effort ≈ **2–4 wk to first light**, **5–7 wk to parity** (HDR, multi-monitor, CI).
- **P2: direct frame push (kill DDA).** Add a swap-chain processor that copies each frame into a
  shared section/texture; new `capture` backend reads it directly; pin the render adapter to the
  encoder GPU. Gate behind a flag, validate against DDA, then retire the DDA path + the `win32u.dll`
  patch. HDR via the IddCx 1.11 D3D12 acquire path. **Outcome:** the real "owning the stack pays off"
  cleanup. Effort: additional; the Session-0 transport is the long pole.
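
The watchdog that P1 carries over is simple enough to sketch. The version below is a guess at the shape, not SudoVDA's implementation: the `Watchdog` type, its methods, and the timeout value are hypothetical, and the clock is passed in explicitly so the logic can be exercised without sleeping. The only behavior taken from this document is that `PING` is a mandatory keepalive.

```rust
use std::time::{Duration, Instant};

/// Tracks the host's keepalive; the driver would tear down its virtual
/// displays once `expired` returns true (teardown itself is not shown).
pub struct Watchdog {
    timeout: Duration,
    last_ping: Instant,
}

impl Watchdog {
    /// Start the countdown at `now`.
    pub fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_ping: now }
    }

    /// Called from the PING IOCTL handler.
    pub fn ping(&mut self, now: Instant) {
        self.last_ping = now;
    }

    /// True once more than `timeout` has passed since the last ping.
    /// A `now` earlier than the last ping counts as zero elapsed time.
    pub fn expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_ping) > self.timeout
    }
}
```

A periodic timer in the driver would call `expired(Instant::now())` and remove the monitors when it trips, so a crashed host cannot leave a phantom display behind.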

## Risks

1. **D3D-in-a-driver swap-chain loop**: the one genuinely new piece; bugs here = black screens/TDR.
   Mitigated by `virtual-display-rs`'s `swap_chain_processor.rs` + the MS sample as references.
2. **Session-0 cross-process transport** (P2): the actual hard part; prototype it first.
3. **HDR = the harder D3D12 1.11 path**: our current WGC/DDA HDR is free; the IddCx HDR path is not.
4. **Two binding stacks** if we go D2(b): a maintenance cost cutting against "clean/consistent."
5. **No WHQL ⇒ no Windows Update / Dev-Center distribution**: same constraint our gamepad drivers
   already accept (bundle + self-sign + import cert).

## References

- IddCx model + signing: [IDD model overview](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview) ·
  [IddCx versions](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iddcx-versions) ·
  [1.10+ updates](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iddcx1.10-updates) ·
  [UMDF signing](https://learn.microsoft.com/en-us/archive/blogs/peterwie/do-umdf-drivers-require-signing)
- Swap-chain / frames: [IDDCX_METADATA](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/ns-iddcx-iddcx_metadata) ·
  [SetDevice](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nf-iddcx-iddcxswapchainsetdevice) ·
  [SetRenderAdapter](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nf-iddcx-iddcxadaptersetrenderadapter) ·
  [ASSIGN_SWAPCHAIN](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nc-iddcx-evt_idd_cx_monitor_assign_swapchain)
- Prior art: [microsoft IddSampleDriver](https://github.com/microsoft/Windows-driver-samples/tree/main/video/IndirectDisplay) ·
  [SudoMaker/SudoVDA](https://github.com/SudoMaker/SudoVDA) ([ioctl.h](https://github.com/SudoMaker/SudoVDA/blob/master/Common/Include/sudovda-ioctl.h)) ·
  **[MolotovCherry/virtual-display-rs (Rust, MIT)](https://github.com/MolotovCherry/virtual-display-rs)** ·
  [Looking Glass IDD (swap-chain → shm, no DDA)](https://deepwiki.com/gnif/LookingGlass/2.5-indirect-display-driver-(idd)) ·
  [itsmikethetech/Virtual-Display-Driver](https://github.com/itsmikethetech/Virtual-Display-Driver)