From 0badc17d8799914ffce526b727a588e0f95451b0 Mon Sep 17 00:00:00 2001
From: enricobuehler
Date: Thu, 25 Jun 2026 12:39:42 +0000
Subject: [PATCH] docs(windows-rewrite): audit the IDD-push rewrite against its plan

Driver track (M0+M1, STEPs 0-7) landed and is on-glass-validated, but the
host-side goals (clean architecture, SudoVDA removal, unsafe reduction) and
several driver-spec items (host-gone watchdog, SET_RENDER_ADAPTER, ownership
model) are not yet done. Full findings + a prioritized P0-P2 fix list in the
doc.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/windows-host-rewrite-audit.md | 287 +++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)
 create mode 100644 docs/windows-host-rewrite-audit.md

diff --git a/docs/windows-host-rewrite-audit.md b/docs/windows-host-rewrite-audit.md
new file mode 100644
index 0000000..7e1a833
--- /dev/null
+++ b/docs/windows-host-rewrite-audit.md
@@ -0,0 +1,287 @@
+# Windows Host Rewrite — Audit
+
+Status: **audit** (2026-06-25). Reviews the state of the Windows host rewrite against its plan
+([`docs/windows-host-rewrite.md`](windows-host-rewrite.md)). Read-only assessment — no code changed.
+Scope: the new IddCx driver workspace (`packaging/windows/drivers/`), the owned ABI crate
+(`crates/pf-vdisplay-proto`), the host-side IDD-push path (`capture/idd_push.rs`,
+`vdisplay/pf_vdisplay.rs`), and the deployment/packaging seam. Evidence is cited as `file:line`.
+
+> **Remediation in progress (2026-06-25).** The findings below were the state at audit time; several are
+> already being worked through. Resolved since: the **cutover (§3)** — STEP 8 gave the new driver its own
+> `.inx` and re-vendored the installer to the new wdk-sys build (`pf_vdisplay.dll` 613 KB → 251 KB), so the
+> new driver is now the shipped one; and the **proto ABI hardening (§6.1/§6.2)** — offset asserts + the
+> owned gamepad SHM layouts have landed. Remaining items are tracked and in progress.
+
+---
+
+## 0. Bottom line
+
+The framing "the Windows host has been rewritten with IDD-push as the main path" **overstates what is
+on disk.** What actually landed is the **driver rewrite** (plan M0 + M1, STEPs 0–7): a clean, new,
+all-Rust IddCx driver (`packaging/windows/drivers/pf-vdisplay`, ~2,000 LOC) on the unified
+`windows-drivers-rs` stack, speaking an owned ABI crate (`pf-vdisplay-proto`), validated on-glass through
+HDR. That is the hardest, highest-risk part of the plan (the `/INTEGRITYCHECK` answer, the `iddcx` binding
+on `wdk-sys`, on-glass IDD-push + HDR) and it is genuinely well executed.
+
+Three facts the framing hides:
+
+1. **The new path is not the shipped path — it is not shipped at all.** The installer still vendors and
+   installs the **old** `vdisplay-driver/` (wdf-umdf) build
+   (`packaging/windows/pf-vdisplay/pf_vdisplay.dll`, dated 2026-06-24). The new driver has **no INF
+   in-tree**, is not vendored, and therefore cannot be packaged. IDD-push capture is gated behind
+   `PUNKTFUNK_IDD_PUSH`, which is **not set** in `scripts/windows/host.env.example`, so the default
+   capture path is **WGC→DDA** and the default display backend falls back to **SudoVDA** whenever the new
+   driver interface isn't enumerable. The new path runs only on a hand-built bench box with the env var
+   set.
+2. **The host-side rewrite — Goal 1 — has not started.** No `src/windows/` tree, no `config.rs`/
+   `HostConfig`, no `SessionFactory`/`SessionPlan`, no `session/`. The old god-files are intact. SudoVDA
+   was not removed (135 refs; `sudovda.rs` is a *hard dependency* of the new path). Unsafe went **up**,
+   not down.
+3. **The new driver itself diverges from its own spec in load-bearing ways** — the watchdog is dead code,
+   `SET_RENDER_ADAPTER` is a stub, the §2.5 ownership-model refactor wasn't done, and world-writable
+   logging was re-introduced.
+
+So the riskiest **proof** is done (real progress). The **rewrite** (clean architecture, cutover,
+hardening) is still ahead.
+
+---
+
+## 1. Goal / milestone scorecard
+
+| Goal / milestone | Status | Evidence |
+|---|---|---|
+| **M0** proto ABI + driver toolchain + `/INTEGRITYCHECK` + iddcx binding | ✅ Done | `pf-vdisplay-proto`, vendored `windows-drivers-rs`, `clear-force-integrity.ps1` |
+| **M1** new IddCx driver, first light + HDR | ✅ Done (on-glass) | STEPs 0–7; `swap_chain_processor.rs`, `frame_transport.rs`, `callbacks.rs` |
+| **Goal 1** clean, layered host architecture | ❌ Not started | no `src/windows/`, `config.rs`, `session/`, `SessionFactory`/`SessionPlan` |
+| **Goal 2** drop every trace of SudoVDA | ❌ Not done | 135 `sudovda` refs; `sudovda.rs` (1,193 LOC) is a hard dep of `pf_vdisplay.rs` + `idd_push.rs` |
+| **Goal 3** minimize unsafe + P0 lints | ❌ Regressed | host unsafe ~476 (↑); driver ~160 vs ~60 target; **no** P0 lints anywhere; `OwnedHandle` in **0** host files |
+| **§2.5** delete driver global statics / DeviceContext-owned state / `EvtCleanupCallback` | ❌ Not done | `MONITOR_MODES`/`NEXT_ID`/`ADAPTER`/`DEVICE_POOL` still process-globals; `DeviceContext{_device}` empty; no monitor cleanup callback |
+| **M4** unify gamepad drivers onto new stack | ❌ Not started | workspace members = `wdk-probe/wdk-iddcx/pf-vdisplay` only; gamepad drivers still standalone wdf-umdf |
+| **M6** cutover + delete old monoliths | ❌ Not reached | old driver trees + `dxgi/wgc/wgc_relay/sudovda/punktfunk1` all present (partly by-design as "reference until parity") |
+
+---
+
+## 2. What landed well (preserve, do not regress)
+
+- **The §1 driver "jewels" survived the port.** The two real swap-chain leak fixes are verbatim with
+  their rationale: borrow `IDXGIDevice` once across `SetDevice` retries
+  (`swap_chain_processor.rs:174`), and check `terminate` at the loop top during a frame burst (`:238`).
+  `DEVICE_POOL` keyed by render LUID (the NVIDIA UMD-thread/VRAM leak fix) is intact
+  (`direct_3d_device.rs:115`). Monitor lock discipline (drop the worker **outside** `MONITOR_MODES`) is
+  correct (`monitor.rs:343-390`).
+- **The frame transport is clean and correct** — the standout module. `FramePublisher` uses
+  `pf_vdisplay_proto::frame` for header/token/names (no hand-rolled offsets), straight-line
+  acquire→copy→release with no `?` between lock/unlock (`frame_transport.rs:266-275`), format guard
+  before `CopyResource`, stale-ring generation detection, correct drop order.
+- **The proto control plane is properly owned**: fresh GUID (not SudoVDA's `e5bcc234`), centralized
+  `FrameToken::pack/unpack` used by both sides, and a **real version handshake the host actually
+  asserts** and bails on mismatch (`pf_vdisplay.rs:455-466`). Typed IOCTL dispatch collapsed the
+  per-call unsafe (`control.rs`).
+- **Per-block `// SAFETY:` discipline** is already present throughout the new driver — most of the value
+  of `clippy::undocumented_unsafe_blocks` without the lint being on yet.
+
+---
+
+## 3. Deployment gap (the headline)
+
+The new path is built and validated but not reachable by an installed product.
+
+- **Installer ships the old driver.** `packaging/windows/stage-pf-vdisplay.ps1:7-8` vendors the signed
+  output of `packaging/windows/vdisplay-driver/` (the wdf-umdf tree); `punktfunk-host.iss` installs that
+  via `install-pf-vdisplay.ps1`. The vendored binary is `packaging/windows/pf-vdisplay/pf_vdisplay.dll`
+  (613,760 bytes — the old build).
+- **New driver is not packageable.** `find packaging/windows/drivers -name '*.inf'` → none. The new
+  workspace is built + FORCE_INTEGRITY-cleared in CI (`windows-drivers.yml`) as a **compile/link gate
+  only**; nothing signs or vendors its output.
+- **GUID split keeps them apart.** The old driver exposes the old SudoVDA interface GUID; the host's
+  `sudovda.rs` backend opens it. The new driver exposes the fresh `70667664-…` GUID; only
+  `pf_vdisplay.rs` opens it. With the old driver installed, `pf_vdisplay::is_available()` → false → the
+  host silently uses the SudoVDA backend.
+- **IDD-push is off by default.** `scripts/windows/host.env.example` sets only
+  `PUNKTFUNK_ENCODER=auto`, `PUNKTFUNK_VIDEO_SOURCE=virtual`, `PUNKTFUNK_SECURE_DDA=1`, `RUST_LOG=info`.
+  `PUNKTFUNK_IDD_PUSH` is checked via `var_os(...).is_some()` (`capture.rs:348`, `punktfunk1.rs:2223+`,
+  `pf_vdisplay.rs:57`) but never set in deployment.
+
+Net: a freshly installed Windows host runs **old driver + SudoVDA backend + WGC/DDA capture** — the
+pre-rewrite path. The rewrite is a manually-validated parallel track, not a delivered feature.
+
+---
+
+## 4. Driver code audit — stability / correctness
+
+### 4.1 P0 — the watchdog is dead code; host-crash leaks an orphan monitor
+
+`WATCHDOG_PINGS` is incremented on `IOCTL_PING` (`control.rs:35`) but **nothing reads it** — the only
+`thread::spawn` in the driver is the swap-chain worker (`swap_chain_processor.rs:104`). The comments are
+misleading: "STEP 4's watchdog thread samples it" (`control.rs:17`) and "the watchdog reaps all monitors"
+(`control.rs:14`) describe a thread that does not exist; `adapter_init_finished`
+(`callbacks.rs:30-37`) does not start one despite its doc claiming so.
+
+Consequence: if `serve` dies or the service is stopped with `TerminateProcess` (skipping `Drop` → no
+`IOCTL_REMOVE`), the virtual monitor + its worker thread + pooled D3D device persist in WUDFHost until the
+**next** host start issues `IOCTL_CLEAR_ALL`. If the host is not restarted, the orphan monitor stays
+plugged into the desktop topology indefinitely.
+
+The plan called for host-gone detection by **`EvtCleanupCallback` RAII**, a **polling watchdog**, or
+**`EvtFileClose`** (§3.4) — none is implemented. Fix: implement the watchdog thread, or (preferred) wire
+`EvtFileClose` so "host holds the control handle open" = liveness; and remove the false comments.
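As a rough illustration of the "polling watchdog" option in §4.1, the sketch below samples a ping counter on a fixed interval and fires a reap callback once after several consecutive silent intervals. It uses only `std`. The names `Watchdog`, `spawn`, `ping` and `on_host_gone` are hypothetical and are not taken from the driver; a real implementation would presumably arm the sampler only while a monitor exists and would call the driver's own reap path.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical host-gone watchdog: a counter bumped by the ping handler
/// plus a sampler thread that actually reads it.
pub struct Watchdog {
    pings: Arc<AtomicU64>,
    stop: Arc<AtomicBool>,
    worker: Option<thread::JoinHandle<()>>,
}

impl Watchdog {
    /// Spawn the sampler. `on_host_gone` runs at most once, after
    /// `max_missed` consecutive intervals without a new ping.
    pub fn spawn<F>(interval: Duration, max_missed: u32, on_host_gone: F) -> Self
    where
        F: FnOnce() + Send + 'static,
    {
        let pings = Arc::new(AtomicU64::new(0));
        let stop = Arc::new(AtomicBool::new(false));
        let (p, s) = (Arc::clone(&pings), Arc::clone(&stop));
        let worker = thread::spawn(move || {
            let mut last = p.load(Ordering::Acquire);
            let mut missed = 0u32;
            loop {
                thread::sleep(interval);
                if s.load(Ordering::Acquire) {
                    return; // watchdog dropped: exit without reaping
                }
                let now = p.load(Ordering::Acquire);
                if now != last {
                    last = now;
                    missed = 0; // host is alive
                } else {
                    missed += 1;
                    if missed >= max_missed {
                        on_host_gone(); // e.g. remove all monitors
                        return;
                    }
                }
            }
        });
        Self { pings, stop, worker: Some(worker) }
    }

    /// Would be called from the ping IOCTL handler.
    pub fn ping(&self) {
        self.pings.fetch_add(1, Ordering::Release);
    }
}

impl Drop for Watchdog {
    fn drop(&mut self) {
        self.stop.store(true, Ordering::Release);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

The audit's preferred `EvtFileClose` route needs no thread at all, so this sketch only covers the fallback option; the interval and miss threshold would need tuning against the host's real ping cadence.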
+
+### 4.2 P1 — `SET_RENDER_ADAPTER` is a stub → hybrid-GPU is a hard failure
+
+`control.rs:47` returns `STATUS_NOT_IMPLEMENTED`, contradicting plan §3.2 (which made it unconditional).
+The driver renders the virtual monitor on whatever adapter the OS picks (`callbacks.rs:275`,
+`pooled_device(luid)`) and reports that LUID to the host. On a hybrid **iGPU+dGPU** box, if the OS picks
+the iGPU, the host's ring textures (created on the NVENC dGPU) fail `OpenSharedResourceByName` →
+`DRV_STATUS_TEX_FAIL` (`frame_transport.rs:195-208`) → the host's 20 s hard bail (§5.1). This is a silent
+hard failure on common Optimus/hybrid configs. The single-dGPU RTX bench box never reproduced it.
+
+### 4.3 P1 — the §2.5 ownership refactor wasn't done
+
+State is still process-global: `MONITOR_MODES`/`NEXT_ID` (`monitor.rs:63,65`), `ADAPTER`
+(`adapter.rs:41`), `DEVICE_POOL` (`direct_3d_device.rs:115`); `DeviceContext` is an empty `{ _device }`
+(`entry.rs:20`). No `EvtCleanupCallback` on the monitor object (`monitor.rs:292-296` sets only Size +
+scope). Monitor identity is still 3-keyed (`id`/`object`/`session_id`), not the collapsed single
+`Monitor`.
+
+This is why the plan's central payoff — *stable monitor reuse → drop the preempt dance → unblock
+`max_concurrent>1` on Windows* — was not achieved. The host still does fresh-monitor-per-session with the
+`IDD_SETUP_LOCK` preempt + `wait_for_monitor_released` dance (`punktfunk1.rs:2216-2237`), so Windows
+IDD-push is effectively single-client even though `DEFAULT_MAX_CONCURRENT = 4`.
+
+### 4.4 P2 — world-writable logging re-introduced
+
+Plan §6 said delete the `C:\Users\Public\*.log` driver logging; the new driver re-added it
+(`pf-vdisplay/src/log.rs:18` → `C:\Users\Public\pfvd-driver.log`). Info-leak / DoS surface; should move to
+ETW or be gated off release builds.
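One way to read "gated off release builds" in §4.4 is a compile-time switch that the logging helper consults before it touches any sink. The sketch below is an assumption about shape only: `FILE_LOG_ENABLED` and `log_line` are hypothetical names, and the sink is a generic `Write` so the example stays independent of the driver's real log path.

```rust
use std::io::{self, Write};

/// Hypothetical switch: true only when debug assertions are compiled in,
/// which is off by default for `--release` builds.
pub const FILE_LOG_ENABLED: bool = cfg!(debug_assertions);

/// Write one line to `sink` when `enabled`, otherwise do nothing.
/// Returns whether a line was written.
pub fn log_line<W: Write>(sink: &mut W, enabled: bool, msg: &str) -> io::Result<bool> {
    if !enabled {
        return Ok(false); // release builds never reach the sink
    }
    writeln!(sink, "{msg}")?;
    Ok(true)
}
```

A caller would pass `FILE_LOG_ENABLED` as `enabled`. Moving to ETW, the other option the audit names, would replace the sink rather than gate it.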
+
+### 4.5 P2 — no control-plane input validation
+
+`create_monitor` receives `width/height/refresh` from the IOCTL with no bounds check (`control.rs:62-63`
+→ `monitor.rs:243`). The host is a trusted LocalSystem process so the trust boundary holds, but a buggy
+host could request an absurd mode. `read_input` uses `T: Copy`, not `bytemuck::Pod` (`control.rs:96`);
+Pod would be a stronger guarantee.
+
+---
+
+## 5. Host code audit
+
+### 5.1 P1 — when IDD-push is engaged there is no fallback
+
+The plan kept WGC/DDA as a safety net; the code commits hard. `capture.rs:345` consumes the keepalive and
+returns the IDD-push capturer with "no fall-through"; attach failure surfaces as a **20 s deadline
+`bail!`** (`idd_push.rs:820-846`) that tears the session down black rather than degrading to DDA. Combined
+with §4.2, hybrid-GPU = a guaranteed 20 s black-then-drop.
+
+### 5.2 P1 — SudoVDA is a hard dependency of the "new" path
+
+`pf_vdisplay.rs` and `idd_push.rs` import `isolate_displays_ccd`/`resolve_render_adapter_luid`/
+`set_advanced_color`/`CURRENT_MON_GEN` directly from `super::sudovda` (`pf_vdisplay.rs:43-46`,
+`idd_push.rs:351-356,809`). `punktfunk1.rs:2231` calls `crate::vdisplay::sudovda::wait_for_monitor_released`
+even when pf-vdisplay is the live backend — benign **today** only because pf-vdisplay preempts inline and
+the SudoVDA `MGR` is empty (`pf_vdisplay.rs:645-647`), but it is a fragile cross-static landmine. Plan §9
+(move CCD/adapter helpers into neutral `windows/display_ccd.rs` + `adapter.rs`) is the right fix and is
+unstarted.
+
+### 5.3 P2 — texture-ownership contract is convention, not types
+
+The §4 in-place-encode hazard is *mitigated* by a host-owned 3-slot `OUT_RING` +
+`pipeline_depth().clamp(1, OUT_RING)` (`idd_push.rs:60,867-872`) — sound for the live synchronous loop —
+but nothing type-enforces it. `nvenc.rs:7-10` still carries the "safe because the loop is synchronous"
+comment, and `repeat_last()` (`idd_push.rs:755-766`) can re-hand an out-ring slot that may still be
+encoding under depth>1. Narrow, but it is the residual corruption edge the plan wanted closed type-level.
+
+### 5.4 P2 — HDR toggle recreates the whole ring mid-session
+
+`recreate_ring` (`idd_push.rs:582-617`) drops + recreates all 6 keyed-mutex textures on an HDR mode flip,
+polled on a 250 ms throttle (`idd_push.rs:622-626`) → up to a 250 ms format-mismatch freeze window where
+the driver drops every frame (`frame_transport.rs:256-260`). Works, but heavy and visibly janky.
+
+---
+
+## 6. ABI / proto
+
+### 6.1 P1 — gamepad SHM was not migrated into proto (the one real drift hazard)
+
+Plan §3.1 wanted `XusbShm` (64 B) and `PadShm` (256 B incl. `device_type`) in `pf-vdisplay-proto`. They
+are hand-duplicated across four sides on two build graphs, with `device_type` as a bare literal `140`:
+host `inject/dualsense_windows.rs:45-52` (`OFF_DEVTYPE=140`) vs driver `dualsense-driver/src/lib.rs:753`
+(`*view.add(140)`); XUSB host `inject/gamepad_windows.rs:36-47` vs driver `xusb-driver/src/lib.rs`. A
+one-sided edit compiles clean on both and silently mis-routes. The `pf-vdisplay` frame/control contract
+got compile-error-on-drift; the gamepad contract did not. (The gamepad drivers being standalone cargo
+workspaces is the structural blocker — folding them into the unified workspace, M4, fixes both.)
+
+### 6.2 P2 — proto advertises offset asserts but only has size asserts
+
+`SharedHeader` (14 mixed-width fields + a `_pad`) is guarded by `size_of == 64` + bytemuck-Pod
+(`pf-vdisplay-proto/src/lib.rs:232`), which catches most regressions but not a same-size field reorder.
+Add `offset_of!` asserts for `magic/latest/generation/dxgi_format/driver_status` and the `AddReply` LUID
+split.
+
+---
+
+## 7. Performance opportunities
+
+- **Hybrid-GPU cross-adapter copy** (once §4.2 `SET_RENDER_ADAPTER` works): pinning the driver render to
+  the NVENC GPU removes a cross-adapter staging path entirely — correctness *and* latency.
+- **HDR ring recreate** (§5.4) is the heaviest per-session-event op; if the display HDR state is known at
+  `open()` from the negotiated mode, size the ring right the first time and skip the recreate + 250 ms
+  window in the common case.
+- **Keyed-mutex acquire timeout is 8 ms** on the host consume side (`idd_push.rs:725`) — at 240 Hz
+  (4.2 ms/frame) one stall already drops ≥2 frames. Reasonable as a safety bound; worth measuring under
+  load against a tighter value plus an explicit drop counter.
+- The encode|send split, microburst pacing, and `pipeline_depth=2` convert/copy-vs-NVENC overlap are
+  preserved — no regression on the hot path.
+
+---
+
+## 8. Hygiene (Goal 3)
+
+- **No P0 lints anywhere.** Neither the host crate nor the new driver crates carry
+  `deny(unsafe_op_in_unsafe_fn)` / `warn(clippy::undocumented_unsafe_blocks)` /
+  `warn(clippy::multiple_unsafe_ops_per_block)`. The plan claimed the driver workspace "already has it";
+  it does not (`pf-vdisplay/src/lib.rs:11` is only `allow(...)`). A few-line, high-leverage first step
+  before any further unsafe work.
+- **`OwnedHandle`/`from_raw_handle` used in zero host files** — the plan's "single biggest cheap win."
+  `pf_vdisplay.rs` holds a raw `isize` device handle in the pinger thread; `idd_push.rs` holds raw
+  event/map handles. Obvious first conversions.
+- **Unsafe counts moved the wrong way.** Host ~476 (target ~35); new driver ~160 (target for all three
+  drivers ~60), and the old gamepad drivers are untouched on top of that.
+
+---
+
+## 9. Recommended priority order
+
+**P0 — correctness/stability, before relying on the path**
+1. Make host-gone detection real: implement the watchdog thread **or** `EvtFileClose`, and delete the
+   false "watchdog" comments. Verify service stop is cooperative (named stop event → `Drop` →
+   `IOCTL_REMOVE`), not `TerminateProcess`. (§4.1)
+2. Implement `SET_RENDER_ADAPTER` (pin driver render to the NVENC adapter) **and** add a real capture
+   fallback (IDD-push attach failure → DDA) instead of the 20 s black bail. (§4.2, §5.1)
+
+**P1 — ship-ability + the actual rewrite**
+3. Cutover plan: give the new driver an in-tree INF, vendor *its* signed output, flip
+   `stage-pf-vdisplay.ps1`, and make IDD-push the code default (WGC/DDA fallback) or set
+   `PUNKTFUNK_IDD_PUSH=1` in `host.env`. Until then the rewrite does not reach users. (§3)
+4. Migrate the gamepad SHM into `pf-vdisplay-proto` (kills the `140`-literal drift hazard). (§6.1)
+5. Add the P0 lints; convert raw handles to `OwnedHandle`. (§8)
+
+**P2 — the host-side architecture (Goal 1, the bulk of "rewrite the host")**
+6. §2.5 driver ownership refactor (DeviceContext state + `EvtCleanupCallback` + single monitor identity)
+   — the prerequisite to `max_concurrent>1` on Windows. (§4.3)
+7. §9 SudoVDA decoupling (split CCD/adapter helpers into neutral modules), then the §2.2/§2.4 host tree
+   (`config.rs`/`SessionFactory`) — the clean architecture that was Goal 1. (§5.2)
+8. Offset asserts in proto; remove world-writable driver logging; M4 gamepad-driver unification; then M6
+   deletion of the old monoliths. (§6.2, §4.4)
+
+---
+
+## Appendix — methodology
+
+Full read of the new driver (`packaging/windows/drivers/pf-vdisplay/src/*.rs`, `wdk-iddcx/src/lib.rs`)
+and `pf-vdisplay-proto`; targeted read of the host IDD-push path (`capture/idd_push.rs`,
+`vdisplay/pf_vdisplay.rs`, `capture.rs`, `vdisplay.rs`, `encode.rs`, `encode/nvenc.rs`); structural
+grep/diff of plan §2.2/§6/§8/§9/§10 against the on-disk tree; packaging/CI inspection
+(`punktfunk-host.iss`, `stage-pf-vdisplay.ps1`, `windows-drivers.yml`, `scripts/windows/host.env.example`).
+Unsafe counts are raw `grep -c unsafe` over the relevant subtrees (occurrences, not blocks). Not validated
+on hardware — this audit reads code and packaging only; on-glass behavior is per the commit log and
+[`docs/windows-host-rewrite.md`](windows-host-rewrite.md) §13–14.
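For readers who want to see what the §6.2 recommendation means in practice, the sketch below pairs a size assert with per-field `offset_of!` asserts, all evaluated at compile time. `DemoHeader` and its field layout are invented for illustration and are not the real `SharedHeader`; only the technique (`core::mem::offset_of!` inside `const` assertions, available in recent stable Rust) is the point.

```rust
use core::mem::{offset_of, size_of};

/// Hypothetical 64-byte shared header; NOT the real `SharedHeader` layout.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct DemoHeader {
    pub magic: u32,         // offset 0
    pub version: u32,       // offset 4
    pub latest: u64,        // offset 8
    pub generation: u32,    // offset 16
    pub dxgi_format: u32,   // offset 20
    pub driver_status: u32, // offset 24
    pub _pad: [u8; 36],     // offsets 28..64
}

// A size assert alone would still pass if two same-width fields swapped places.
const _: () = assert!(size_of::<DemoHeader>() == 64);

// Offset asserts turn a same-size reorder into a compile error.
const _: () = assert!(offset_of!(DemoHeader, magic) == 0);
const _: () = assert!(offset_of!(DemoHeader, latest) == 8);
const _: () = assert!(offset_of!(DemoHeader, generation) == 16);
const _: () = assert!(offset_of!(DemoHeader, dxgi_format) == 20);
const _: () = assert!(offset_of!(DemoHeader, driver_status) == 24);
```

Swapping `generation` and `dxgi_format` in this sketch keeps the size at 64 but fails the build on the offset asserts, which is the drift the size check cannot see.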