Files
punktfunk/docs/windows-host-rewrite-audit.md
T
enricobuehler 5d279f8886 docs(windows-rewrite): audit-remediation hand-off tracker
Living progress/hand-off doc (docs/windows-host-rewrite-remediation.md): the 9 committed remediation commits with audit refs + how each was verified, the remaining tasks (D2, D1-host, E1, G) with scope / on-glass-gating / verification notes, the box verification recipe, and the new modules introduced. Cross-linked from the audit doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 14:30:43 +00:00

18 KiB
Raw Blame History

Windows Host Rewrite — Audit

Status: audit (2026-06-25). Reviews the state of the Windows host rewrite against its plan (docs/windows-host-rewrite.md). Read-only assessment — no code changed. Scope: the new IddCx driver workspace (packaging/windows/drivers/), the owned ABI crate (crates/pf-vdisplay-proto), the host-side IDD-push path (capture/idd_push.rs, vdisplay/pf_vdisplay.rs), and the deployment/packaging seam. Evidence is cited as file:line.

Remediation in progress (2026-06-25). The findings below were the state at audit time; several are already being worked through. Resolved since: the cutover (§3) — STEP 8 gave the new driver its own .inx and re-vendored the installer to the new wdk-sys build (pf_vdisplay.dll 613 KB → 251 KB), so the new driver is now the shipped one; and the proto ABI hardening (§6.1/§6.2) — offset asserts + the owned gamepad SHM layouts have landed. Live progress + the hand-off task list are tracked in docs/windows-host-rewrite-remediation.md.


0. Bottom line

The framing "the Windows host has been rewritten with IDD-push as the main path" overstates what is on disk. What actually landed is the driver rewrite (plan M0 + M1, STEPs 07): a clean, new, all-Rust IddCx driver (packaging/windows/drivers/pf-vdisplay, ~2,000 LOC) on the unified windows-drivers-rs stack, speaking an owned ABI crate (pf-vdisplay-proto), validated on-glass through HDR. That is the hardest, highest-risk part of the plan (the /INTEGRITYCHECK answer, the iddcx binding on wdk-sys, on-glass IDD-push + HDR) and it is genuinely well executed.

Three facts the framing hides:

  1. The new path is not the shipped path — it is not shipped at all. The installer still vendors and installs the old vdisplay-driver/ (wdf-umdf) build (packaging/windows/pf-vdisplay/pf_vdisplay.dll, dated 2026-06-24). The new driver has no INF in-tree, is not vendored, and therefore cannot be packaged. IDD-push capture is gated behind PUNKTFUNK_IDD_PUSH, which is not set in scripts/windows/host.env.example, so the default capture path is WGC→DDA and the default display backend falls back to SudoVDA whenever the new driver interface isn't enumerable. The new path runs only on a hand-built bench box with the env var set.
  2. The host-side rewrite — Goal 1 — has not started. No src/windows/ tree, no config.rs/ HostConfig, no SessionFactory/SessionPlan, no session/. The old god-files are intact. SudoVDA was not removed (135 refs; sudovda.rs is a hard dependency of the new path). Unsafe went up, not down.
  3. The new driver itself diverges from its own spec in load-bearing ways — the watchdog is dead code, SET_RENDER_ADAPTER is a stub, the §2.5 ownership-model refactor wasn't done, and world-writable logging was re-introduced.

So the riskiest proof is done (real progress). The rewrite (clean architecture, cutover, hardening) is still ahead.


1. Goal / milestone scorecard

Goal / milestone Status Evidence
M0 proto ABI + driver toolchain + /INTEGRITYCHECK + iddcx binding Done pf-vdisplay-proto, vendored windows-drivers-rs, clear-force-integrity.ps1
M1 new IddCx driver, first light + HDR Done (on-glass) STEPs 07; swap_chain_processor.rs, frame_transport.rs, callbacks.rs
Goal 1 clean, layered host architecture Not started no src/windows/, config.rs, session/, SessionFactory/SessionPlan
Goal 2 drop every trace of SudoVDA Not done 135 sudovda refs; sudovda.rs (1,193 LOC) is a hard dep of pf_vdisplay.rs + idd_push.rs
Goal 3 minimize unsafe + P0 lints Regressed host unsafe ~476 (↑); driver ~160 vs ~60 target; no P0 lints anywhere; OwnedHandle in 0 host files
§2.5 delete driver global statics / DeviceContext-owned state / EvtCleanupCallback Not done MONITOR_MODES/NEXT_ID/ADAPTER/DEVICE_POOL still process-globals; DeviceContext{_device} empty; no monitor cleanup callback
M4 unify gamepad drivers onto new stack Not started workspace members = wdk-probe/wdk-iddcx/pf-vdisplay only; gamepad drivers still standalone wdf-umdf
M6 cutover + delete old monoliths Not reached old driver trees + dxgi/wgc/wgc_relay/sudovda/punktfunk1 all present (partly by-design as "reference until parity")

2. What landed well (preserve, do not regress)

  • The §1 driver "jewels" survived the port. The two real swap-chain leak fixes are verbatim with their rationale: borrow IDXGIDevice once across SetDevice retries (swap_chain_processor.rs:174), and check terminate at the loop top during a frame burst (:238). DEVICE_POOL keyed by render LUID (the NVIDIA UMD-thread/VRAM leak fix) is intact (direct_3d_device.rs:115). Monitor lock discipline (drop the worker outside MONITOR_MODES) is correct (monitor.rs:343-390).
  • The frame transport is clean and correct — the standout module. FramePublisher uses pf_vdisplay_proto::frame for header/token/names (no hand-rolled offsets), straight-line acquire→copy→release with no ? between lock/unlock (frame_transport.rs:266-275), format guard before CopyResource, stale-ring generation detection, correct drop order.
  • The proto control plane is properly owned: fresh GUID (not SudoVDA's e5bcc234), centralized FrameToken::pack/unpack used by both sides, and a real version handshake the host actually asserts and bails on mismatch (pf_vdisplay.rs:455-466). Typed IOCTL dispatch collapsed the per-call unsafe (control.rs).
  • Per-block // SAFETY: discipline is already present throughout the new driver — most of the value of clippy::undocumented_unsafe_blocks without the lint being on yet.

3. Deployment gap (the headline)

The new path is built and validated but not reachable by an installed product.

  • Installer ships the old driver. packaging/windows/stage-pf-vdisplay.ps1:7-8 vendors the signed output of packaging/windows/vdisplay-driver/ (the wdf-umdf tree); punktfunk-host.iss installs that via install-pf-vdisplay.ps1. The vendored binary is packaging/windows/pf-vdisplay/pf_vdisplay.dll (613,760 bytes — the old build).
  • New driver is not packageable. find packaging/windows/drivers -name '*.inf' → none. The new workspace is built + FORCE_INTEGRITY-cleared in CI (windows-drivers.yml) as a compile/link gate only; nothing signs or vendors its output.
  • GUID split keeps them apart. The old driver exposes the old SudoVDA interface GUID; the host's sudovda.rs backend opens it. The new driver exposes the fresh 70667664-… GUID; only pf_vdisplay.rs opens it. With the old driver installed, pf_vdisplay::is_available() → false → the host silently uses the SudoVDA backend.
  • IDD-push is off by default. scripts/windows/host.env.example sets only PUNKTFUNK_ENCODER=auto, PUNKTFUNK_VIDEO_SOURCE=virtual, PUNKTFUNK_SECURE_DDA=1, RUST_LOG=info. PUNKTFUNK_IDD_PUSH is checked via var_os(...).is_some() (capture.rs:348, punktfunk1.rs:2223+, pf_vdisplay.rs:57) but never set in deployment.

Net: a freshly installed Windows host runs old driver + SudoVDA backend + WGC/DDA capture — the pre-rewrite path. The rewrite is a manually-validated parallel track, not a delivered feature.


4. Driver code audit — stability / correctness

4.1 P0 — the watchdog is dead code; host-crash leaks an orphan monitor

WATCHDOG_PINGS is incremented on IOCTL_PING (control.rs:35) but nothing reads it — the only thread::spawn in the driver is the swap-chain worker (swap_chain_processor.rs:104). The comments are misleading: "STEP 4's watchdog thread samples it" (control.rs:17) and "the watchdog reaps all monitors" (control.rs:14) describe a thread that does not exist; adapter_init_finished (callbacks.rs:30-37) does not start one despite its doc claiming so.

Consequence: if serve dies or the service is stopped with TerminateProcess (skipping Drop → no IOCTL_REMOVE), the virtual monitor + its worker thread + pooled D3D device persist in WUDFHost until the next host start issues IOCTL_CLEAR_ALL. If the host is not restarted, the orphan monitor stays plugged into the desktop topology indefinitely.

The plan called for host-gone detection by EvtCleanupCallback RAII, a polling watchdog, or EvtFileClose (§3.4) — none is implemented. Fix: implement the watchdog thread, or (preferred) wire EvtFileClose so "host holds the control handle open" = liveness; and remove the false comments.

4.2 P1 — SET_RENDER_ADAPTER is a stub → hybrid-GPU is a hard failure

control.rs:47 returns STATUS_NOT_IMPLEMENTED, contradicting plan §3.2 (which made it unconditional). The driver renders the virtual monitor on whatever adapter the OS picks (callbacks.rs:275, pooled_device(luid)) and reports that LUID to the host. On a hybrid iGPU+dGPU box, if the OS picks the iGPU, the host's ring textures (created on the NVENC dGPU) fail OpenSharedResourceByNameDRV_STATUS_TEX_FAIL (frame_transport.rs:195-208) → the host's 20 s hard bail (§5.1). This is a silent hard failure on common Optimus/hybrid configs. The single-dGPU RTX bench box never reproduced it.

4.3 P1 — the §2.5 ownership refactor wasn't done

State is still process-global: MONITOR_MODES/NEXT_ID (monitor.rs:63,65), ADAPTER (adapter.rs:41), DEVICE_POOL (direct_3d_device.rs:115); DeviceContext is an empty { _device } (entry.rs:20). No EvtCleanupCallback on the monitor object (monitor.rs:292-296 sets only Size + scope). Monitor identity is still 3-keyed (id/object/session_id), not the collapsed single Monitor.

This is why the plan's central payoff — stable monitor reuse → drop the preempt dance → unblock max_concurrent>1 on Windows — was not achieved. The host still does fresh-monitor-per-session with the IDD_SETUP_LOCK preempt + wait_for_monitor_released dance (punktfunk1.rs:2216-2237), so Windows IDD-push is effectively single-client even though DEFAULT_MAX_CONCURRENT = 4.

4.4 P2 — world-writable logging re-introduced

Plan §6 said delete the C:\Users\Public\*.log driver logging; the new driver re-added it (pf-vdisplay/src/log.rs:18C:\Users\Public\pfvd-driver.log). Info-leak / DoS surface; should move to ETW or be gated off release builds.

4.5 P2 — no control-plane input validation

create_monitor receives width/height/refresh from the IOCTL with no bounds check (control.rs:62-63monitor.rs:243). The host is a trusted LocalSystem process so the trust boundary holds, but a buggy host could request an absurd mode. read_input uses T: Copy, not bytemuck::Pod (control.rs:96); Pod would be a stronger guarantee.


5. Host code audit

5.1 P1 — when IDD-push is engaged there is no fallback

The plan kept WGC/DDA as a safety net; the code commits hard. capture.rs:345 consumes the keepalive and returns the IDD-push capturer with "no fall-through"; attach failure surfaces as a 20 s deadline bail! (idd_push.rs:820-846) that tears the session down black rather than degrading to DDA. Combined with §4.2, hybrid-GPU = a guaranteed 20 s black-then-drop.

5.2 P1 — SudoVDA is a hard dependency of the "new" path

pf_vdisplay.rs and idd_push.rs import isolate_displays_ccd/resolve_render_adapter_luid/ set_advanced_color/CURRENT_MON_GEN directly from super::sudovda (pf_vdisplay.rs:43-46, idd_push.rs:351-356,809). punktfunk1.rs:2231 calls crate::vdisplay::sudovda::wait_for_monitor_released even when pf-vdisplay is the live backend — benign today only because pf-vdisplay preempts inline and the SudoVDA MGR is empty (pf_vdisplay.rs:645-647), but it is a fragile cross-static landmine. Plan §9 (move CCD/adapter helpers into neutral windows/display_ccd.rs + adapter.rs) is the right fix and is unstarted.

5.3 P2 — texture-ownership contract is convention, not types

The §4 in-place-encode hazard is mitigated by a host-owned 3-slot OUT_RING + pipeline_depth().clamp(1, OUT_RING) (idd_push.rs:60,867-872) — sound for the live synchronous loop — but nothing type-enforces it. nvenc.rs:7-10 still carries the "safe because the loop is synchronous" comment, and repeat_last() (idd_push.rs:755-766) can re-hand an out-ring slot that may still be encoding under depth>1. Narrow, but it is the residual corruption edge the plan wanted closed type-level.

5.4 P2 — HDR toggle recreates the whole ring mid-session

recreate_ring (idd_push.rs:582-617) drops + recreates all 6 keyed-mutex textures on an HDR mode flip, polled on a 250 ms throttle (idd_push.rs:622-626) → up to a 250 ms format-mismatch freeze window where the driver drops every frame (frame_transport.rs:256-260). Works, but heavy and visibly janky.


6. ABI / proto

6.1 P1 — gamepad SHM was not migrated into proto (the one real drift hazard)

Plan §3.1 wanted XusbShm (64 B) and PadShm (256 B incl. device_type) in pf-vdisplay-proto. They are hand-duplicated across four sides on two build graphs, with device_type as a bare literal 140: host inject/dualsense_windows.rs:45-52 (OFF_DEVTYPE=140) vs driver dualsense-driver/src/lib.rs:753 (*view.add(140)); XUSB host inject/gamepad_windows.rs:36-47 vs driver xusb-driver/src/lib.rs. A one-sided edit compiles clean on both and silently mis-routes. The pf-vdisplay frame/control contract got compile-error-on-drift; the gamepad contract did not. (The gamepad drivers being standalone cargo workspaces is the structural blocker — folding them into the unified workspace, M4, fixes both.)

6.2 P2 — proto advertises offset asserts but only has size asserts

SharedHeader (14 mixed-width fields + a _pad) is guarded by size_of == 64 + bytemuck-Pod (pf-vdisplay-proto/src/lib.rs:232), which catches most regressions but not a same-size field reorder. Add offset_of! asserts for magic/latest/generation/dxgi_format/driver_status and the AddReply LUID split.


7. Performance opportunities

  • Hybrid-GPU cross-adapter copy (once §4.2 SET_RENDER_ADAPTER works): pinning the driver render to the NVENC GPU removes a cross-adapter staging path entirely — correctness and latency.
  • HDR ring recreate (§5.4) is the heaviest per-session-event op; if the display HDR state is known at open() from the negotiated mode, size the ring right the first time and skip the recreate + 250 ms window in the common case.
  • Keyed-mutex acquire timeout is 8 ms on the host consume side (idd_push.rs:725) — at 240 Hz (4.2 ms/frame) one stall already drops ≥2 frames. Reasonable as a safety bound; worth measuring under load against a tighter value plus an explicit drop counter.
  • The encode|send split, microburst pacing, and pipeline_depth=2 convert/copy-vs-NVENC overlap are preserved — no regression on the hot path.

8. Hygiene (Goal 3)

  • No P0 lints anywhere. Neither the host crate nor the new driver crates carry deny(unsafe_op_in_unsafe_fn) / warn(clippy::undocumented_unsafe_blocks) / warn(clippy::multiple_unsafe_ops_per_block). The plan claimed the driver workspace "already has it"; it does not (pf-vdisplay/src/lib.rs:11 is only allow(...)). A few-line, high-leverage first step before any further unsafe work.
  • OwnedHandle/from_raw_handle used in zero host files — the plan's "single biggest cheap win." pf_vdisplay.rs holds a raw isize device handle in the pinger thread; idd_push.rs holds raw event/map handles. Obvious first conversions.
  • Unsafe counts moved the wrong way. Host ~476 (target ~35); new driver ~160 (target for all three drivers ~60), and the old gamepad drivers are untouched on top of that.

P0 — correctness/stability, before relying on the path

  1. Make host-gone detection real: implement the watchdog thread or EvtFileClose, and delete the false "watchdog" comments. Verify service stop is cooperative (named stop event → DropIOCTL_REMOVE), not TerminateProcess. (§4.1)
  2. Implement SET_RENDER_ADAPTER (pin driver render to the NVENC adapter) and add a real capture fallback (IDD-push attach failure → DDA) instead of the 20 s black bail. (§4.2, §5.1)

P1 — ship-ability + the actual rewrite 3. Cutover plan: give the new driver an in-tree INF, vendor its signed output, flip stage-pf-vdisplay.ps1, and make IDD-push the code default (WGC/DDA fallback) or set PUNKTFUNK_IDD_PUSH=1 in host.env. Until then the rewrite does not reach users. (§3) 4. Migrate the gamepad SHM into pf-vdisplay-proto (kills the 140-literal drift hazard). (§6.1) 5. Add the P0 lints; convert raw handles to OwnedHandle. (§8)

P2 — the host-side architecture (Goal 1, the bulk of "rewrite the host") 6. §2.5 driver ownership refactor (DeviceContext state + EvtCleanupCallback + single monitor identity) — the prerequisite to max_concurrent>1 on Windows. (§4.3) 7. §9 SudoVDA decoupling (split CCD/adapter helpers into neutral modules), then the §2.2/§2.4 host tree (config.rs/SessionFactory) — the clean architecture that was Goal 1. (§5.2) 8. Offset asserts in proto; remove world-writable driver logging; M4 gamepad-driver unification; then M6 deletion of the old monoliths. (§6.2, §4.4)


Appendix — methodology

Full read of the new driver (packaging/windows/drivers/pf-vdisplay/src/*.rs, wdk-iddcx/src/lib.rs) and pf-vdisplay-proto; targeted read of the host IDD-push path (capture/idd_push.rs, vdisplay/pf_vdisplay.rs, capture.rs, vdisplay.rs, encode.rs, encode/nvenc.rs); structural grep/diff of plan §2.2/§6/§8/§9/§10 against the on-disk tree; packaging/CI inspection (punktfunk-host.iss, stage-pf-vdisplay.ps1, windows-drivers.yml, scripts/windows/host.env.example). Unsafe counts are raw grep -c unsafe over the relevant subtrees (occurrences, not blocks). Not validated on hardware — this audit reads code and packaging only; on-glass behavior is per the commit log and docs/windows-host-rewrite.md §1314.