Windows Host Rewrite — Audit Remediation Tracker
Status: in progress (2026-06-25). Living hand-off doc for working through the findings in
docs/windows-host-rewrite-audit.md (the audit of the IDD-push rewrite
vs docs/windows-host-rewrite.md). Keep this updated as items land so the work
can be handed off without losing tasks.
TL;DR
- 9 commits on main, NOT pushed (+9 ahead of origin/main, tip e60cda3). Each is compile-verified on the RTX box (see Verification).
- Done: the entire audit P0 + P1 + P2 payload, the driver unsafe lint, and F1 (SudoVDA helper decoupling) complete.
- Remaining: D2 (OwnedHandle), D1-host (unsafe-lint sweep), E1 (driver ownership refactor), G (gamepad-driver unification + old-tree deletion + host src/windows/ tree).
- Two cross-cutting follow-ups: (1) on-glass behavioral validation of the committed driver/host fixes (the box is single-GPU + headless-ish, so hybrid-GPU / HDR-toggle / fallback paths weren't exercised at runtime); (2) push to run the full CI matrix (the local checks skip the amf-qsv path).
Done — committed on main (unpushed)
| Commit | Audit § | What | Compile-verified |
|---|---|---|---|
| 0badc17 | - | The audit doc itself | - |
| 95dcef3 | §6.1/6.2 | A proto: offset_of! asserts on SharedHeader/AddReply/control structs; owned XusbShm/PadShm gamepad layouts (+ min_const_generics) | local cargo test + MSVC (box) |
| 0a7ae5e | §4.1/4.2/4.4/4.5 | B driver: real host-gone watchdog (was dead code), SET_RENDER_ADAPTER impl, world-writable-log gate, mode bounds + display_info u64-saturate | driver cargo build (box) |
| e5c9ee8 | §4.2h/6.1 | C2/C5 host: render-pin comment/activation (driver now honors it); gamepad SHM consumers derive from pf_vdisplay_proto::gamepad | host clippy (box) |
| ed58365 | §5.1 | C1 host: IDD-push attach fallback to DDA (open() hands keepalive back; bounded wait_for_attach on DRV_STATUS_OPENED) instead of the 20s black bail | host clippy (box) |
| b0d2838 | §5.3/5.4 | C3/C4 host: repeat_last rotates+copies into a fresh out-ring slot; HDR ring sized FP16 at open when advanced-color is enabled | host clippy (box) |
| a755d6e | §8 | D1-driver #![deny(unsafe_op_in_unsafe_fn)] on pf-vdisplay + wdk-iddcx | driver cargo build (box) |
| d638a93 | §9 | F1 pt1: resolve_render_adapter_luid → neutral crate::win_adapter | host clippy (box) |
| e60cda3 | §9 | F1 rest: 6 CCD/HDR helpers + SavedConfig → neutral crate::win_display; SudoVDA reach-in fully broken | host clippy (box) + Linux cargo check |
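The layout asserts added in commit 95dcef3 can be pictured with a minimal sketch like the one below. It assumes a hypothetical DemoHeader with made-up fields; the real SharedHeader/AddReply layouts are not reproduced here. Only core::mem::offset_of! and size_of are real APIs.

```rust
use core::mem::{offset_of, size_of};

/// Hypothetical stand-in for a shared-memory header (NOT the real SharedHeader).
/// #[repr(C)] gives a defined field order so both sides of the SHM agree.
#[repr(C)]
pub struct DemoHeader {
    pub magic: u32,
    pub version: u32,
    pub width: u32,
    pub height: u32,
    pub frame_seq: u64,
}

// Compile-time layout checks: the build fails if a field moves or the struct grows.
const _: () = assert!(offset_of!(DemoHeader, magic) == 0);
const _: () = assert!(offset_of!(DemoHeader, version) == 4);
const _: () = assert!(offset_of!(DemoHeader, width) == 8);
const _: () = assert!(offset_of!(DemoHeader, height) == 12);
const _: () = assert!(offset_of!(DemoHeader, frame_seq) == 16);
const _: () = assert!(size_of::<DemoHeader>() == 24);
```

Because the checks are const items, adding a field without updating them is a compile error rather than a silent host/driver mismatch at runtime.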
Remaining — to do
Ordered by suggested sequence. On-glass = cannot be finished without a real session on the RTX box, driven by a human (driver install + client connect).
D2 — OwnedHandle on the new path · audit §8 · compile-verifiable · moderate
- Goal: replace raw HANDLE/isize handles held across their lifetime with std::os::windows::io::OwnedHandle (RAII close, fixes leak-on-error, deletes manual CloseHandle).
- Targets: vdisplay/pf_vdisplay.rs: the pinger thread's raw isize device handle (pf_vdisplay.rs ~324-344); capture/idd_push.rs: IddPushCapturer { map, event, dbg_map: HANDLE } (manually closed in Drop). The plan also lists events/jobs/tokens/sections in windows/process.rs and service.rs (broader).
- Risk: handle ownership (double-close / premature close). Compile catches type errors; lifecycle needs care. Touches the live IDD-push path → ideally smoke-tested on glass after.
- Verify: host clippy on the box (the new path is --features nvenc).
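The leak-on-error case that D2 targets can be illustrated with a portable model. This is a sketch only: DemoOwnedHandle and open_pair are hypothetical names, and the close is simulated with a counter so it runs on any platform. On Windows, std::os::windows::io::OwnedHandle fills this role and closes the handle when dropped.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical model of an owned OS handle: it is closed exactly once, on drop.
pub struct DemoOwnedHandle {
    raw: isize,
    closed: Rc<Cell<u32>>, // counts simulated close calls so the behavior is observable
}

impl DemoOwnedHandle {
    pub fn new(raw: isize, closed: Rc<Cell<u32>>) -> Self {
        Self { raw, closed }
    }

    pub fn raw(&self) -> isize {
        self.raw
    }
}

impl Drop for DemoOwnedHandle {
    fn drop(&mut self) {
        // Stand-in for CloseHandle(self.raw).
        self.closed.set(self.closed.get() + 1);
    }
}

/// Opens two handles and fails after the first one when `fail` is set.
/// With raw handles the early return would leak `map`; with RAII it cannot.
pub fn open_pair(
    fail: bool,
    closed: Rc<Cell<u32>>,
) -> Result<(DemoOwnedHandle, DemoOwnedHandle), String> {
    let map = DemoOwnedHandle::new(1, closed.clone());
    if fail {
        // `map` goes out of scope here and is closed automatically.
        return Err("event creation failed".to_string());
    }
    let event = DemoOwnedHandle::new(2, closed);
    Ok((map, event))
}
```

The double-close risk named above maps to the same model: a handle value must have exactly one owner, so any raw copy handed to another thread has to be a borrow or a duplicated handle, never a second owner.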
D1-host — host-wide unsafe lint sweep · audit §8 · large/mechanical
- Goal: add #![deny(unsafe_op_in_unsafe_fn)] + # to the host crate (crates/punktfunk-host/src/main.rs), and fix the fallout.
- Scope: large. Hundreds of unsafe blocks across both Linux and Windows code need explicit unsafe {} wrapping inside unsafe fns and // SAFETY: comments. The driver already has the deny (a755d6e); the host has none.
- Verify: Linux cargo clippy -p punktfunk-host --all-targets -- -D warnings (Linux/cross paths) and host clippy on the box (Windows paths). Do it incrementally per-subsystem to keep the diff reviewable.
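The mechanical change the sweep requires looks roughly like the sketch below. The module and function names are hypothetical, and the lint is applied as an outer attribute on a module only to keep the example self-contained; the host crate would use the crate-level #![deny(unsafe_op_in_unsafe_fn)] form.

```rust
#[deny(unsafe_op_in_unsafe_fn)]
pub mod demo {
    /// Reads a u32 through a raw pointer.
    ///
    /// # Safety
    /// `ptr` must be non-null, properly aligned, and point to an initialized u32.
    pub unsafe fn read_u32(ptr: *const u32) -> u32 {
        // With the lint enabled, the body of an unsafe fn is no longer an
        // implicit unsafe block: each unsafe operation needs its own block.
        // SAFETY: the caller upholds the contract documented above.
        unsafe { ptr.read() }
    }
}
```

Without the inner unsafe { } the lint rejects the bare ptr.read() call, which is what produces the large but mechanical diff described above.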
E1 — driver ownership refactor · audit §4.3 / plan §2.5 + §14 step 5 · on-glass-gated · large
- Goal: move the driver's process-global statics (MONITOR_MODES, NEXT_ID, ADAPTER, DEVICE_POOL) into a WDF DeviceContext; wire EvtCleanupCallback on the IDDCX_MONITOR object so the SwapChainProcessor + D3D drop via RAII; collapse the 3-key monitor identity (id/object/session_id) to one. Unblocks max_concurrent > 1 on Windows + removes the host-side preempt dance.
- Why on-glass: the plan's critique is explicit: instrument that MonitorContext::Drop actually RAN; if the cleanup callback does not fire on this UMDF/IddCx stack, keep the current explicit REMOVE/teardown path as the fallback. Cannot be signed off compile-only.
- Files: packaging/windows/drivers/pf-vdisplay/src/{entry,adapter,monitor,callbacks,swap_chain_processor}.rs.
- Verify: driver cargo build (compile) on the box; then on-glass reconnect-storm + leak check (LIVE_DEVICES counter in direct_3d_device.rs, the world-readable log when PFVD_DEBUG_LOG is set).
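One way to instrument "Drop actually ran" is a pair of atomic counters, sketched below under assumptions: DemoMonitorContext, DEMO_LIVE and DEMO_DROPS are hypothetical names, and the real MonitorContext and LIVE_DEVICES counter may be shaped differently. The loop only models a reconnect storm in-process; it says nothing about whether the WDF cleanup callback fires on the real stack.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counters, in the spirit of a live-object leak check.
pub static DEMO_LIVE: AtomicUsize = AtomicUsize::new(0);
pub static DEMO_DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical per-monitor context that owns its resources.
pub struct DemoMonitorContext {
    pub id: u32,
}

impl DemoMonitorContext {
    pub fn new(id: u32) -> Self {
        DEMO_LIVE.fetch_add(1, Ordering::SeqCst);
        Self { id }
    }
}

impl Drop for DemoMonitorContext {
    fn drop(&mut self) {
        // Evidence that cleanup ran: one fewer live context, one more recorded drop.
        DEMO_LIVE.fetch_sub(1, Ordering::SeqCst);
        DEMO_DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Creates and destroys `rounds` contexts.
/// Returns (live contexts afterwards, drops observed during this call).
pub fn reconnect_storm(rounds: u32) -> (usize, usize) {
    let drops_before = DEMO_DROPS.load(Ordering::SeqCst);
    for i in 0..rounds {
        let ctx = DemoMonitorContext::new(i);
        drop(ctx);
    }
    let drops_after = DEMO_DROPS.load(Ordering::SeqCst);
    (DEMO_LIVE.load(Ordering::SeqCst), drops_after - drops_before)
}
```

If the live count stays above zero after a storm while the drop count lags behind the number of sessions, the cleanup path is not firing and the explicit teardown fallback should stay.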
G — gamepad-driver unification (M4) + deletion (M6) + host tree · audit §6/§10 + plan §2.2 · on-glass-gated · largest
- M4: fold pf_dualsense + pf_xusb (today standalone packaging/windows/{dualsense,xusb}-driver/ on the old wdf stack) into the unified packaging/windows/drivers/ workspace on windows-drivers-rs. This also enables the driver-side gamepad-SHM→proto switch (host side already done in C5; the driver still hand-reads view.add(140), so point it at pf_vdisplay_proto::gamepad::PadShm/XusbShm).
- M6: delete the old packaging/windows/vdisplay-driver/ tree + the old gamepad driver trees + the bring-up scaffolding (DebugBlock/spawn_observer/IDD_PERSIST/open_or_reuse in idd_push.rs), but only after on-glass parity of the new path.
- Host architecture (Goal 1, plan §2.2/2.4): the src/windows/ subtree + config.rs (HostConfig) + SessionFactory/SessionPlan: not started. The biggest clarity lever; large.
Cross-cutting follow-ups (not a single task)
- On-glass validation of the committed fixes: needs the RTX box + a client. Specifically: the watchdog actually reaps on host-kill (B1); SET_RENDER_ADAPTER pins correctly on a hybrid box (B2/C2; the lab box is single-dGPU, so this path is unexercised); the IDD-push→DDA fallback triggers and the happy path still attaches within 4s (C1); HDR ring sizing + out-ring repeat under real HDR / static-desktop pipelining (C3/C4).
- Push to run the full CI matrix: the local host checks use --features nvenc only (no FFmpeg), so the amf-qsv encode path is unexercised locally; CI (windows-host.yml) covers it.
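The open-time behavior to validate for C1 (attach within a bound, otherwise fall back to DDA) has roughly the shape sketched below. All names here (DemoBackend, wait_for_attach_demo, choose_backend) are hypothetical, and the closure stands in for whatever the host really polls, such as a DRV_STATUS_OPENED check; the poll interval is an arbitrary illustrative value.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum DemoBackend {
    IddPush,
    Dda,
}

/// Polls `is_attached` until it returns true or `timeout` elapses.
/// Returns true only if the attach was observed in time.
pub fn wait_for_attach_demo(
    mut is_attached: impl FnMut() -> bool,
    timeout: Duration,
    poll: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_attached() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(poll);
    }
}

/// Picks the capture backend: IDD-push when the driver attaches in time,
/// otherwise the DDA fallback instead of failing the session.
pub fn choose_backend(is_attached: impl FnMut() -> bool, timeout: Duration) -> DemoBackend {
    if wait_for_attach_demo(is_attached, timeout, Duration::from_millis(5)) {
        DemoBackend::IddPush
    } else {
        DemoBackend::Dda
    }
}
```

On glass, both branches need a run: a normal connect should take the IddPush branch well inside the bound, and a deliberately blocked attach should land on Dda.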
Related workstream — fullscreen-game IDD-push capture bug (separate doc)
A separate, newly-found bug (NOT an audit finding) in the same IDD-push subsystem, with its own staged
fix plan: docs/windows-host-rewrite-game-capture-bug.md.
Symptom: launching a fullscreen game (Doom: The Dark Ages) on an HDR IDD-push stream flashes the desktop,
the game never shows, and reconnect = black screen + working audio. Root cause: the IDD-push ring is
fixed format+size at session start; the driver silently drops every frame whose surface descriptor no longer
matches (a game forces a mode-set); the host has no channel to learn the descriptor changed; and there is no
mid-session fallback → 20 s bail!.
Intersections with this remediation — read before implementing:
- Stage 1 builds on our C1 (ed58365); do not duplicate it. C1 added an IDD-push→DDA fallback, but open-time only (driver never attaches). The game bug is mid-session (attached, then a game changes format/size). The bug doc's Stage 1 (a composing capturer that fails over mid-session) is the generalization; build it on C1's open()-returns-keepalive + bounded-attach infrastructure.
- The bug doc was written against pre-remediation main (a11b0dd). Its line numbers and its claim "capture.rs:348-356… no fall-through" are stale after our 9 commits (C1 changed exactly that). Rebase on current main first.
- Stage 2 (new SharedHeader fields + PROTOCOL_VERSION bump) must update the offset_of!/size asserts added in A (95dcef3); they catch drift at compile time (the intended safety net). Note: those asserts live in the frame module of crates/pf-vdisplay-proto/src/lib.rs (the doc says frame.rs).
- Stage 0 / S3 diagnostics rely on the driver log, which B3 (0a7ae5e) gated off in release builds (debug_assertions || PFVD_DEBUG_LOG). Enable it (PFVD_DEBUG_LOG=1 or a debug build) for the repro.
- S1/S2 (driver swap-chain resilience) is adjacent to E1 (same swap_chain_processor.rs/callbacks.rs); coordinate so they don't conflict.
- The bug doc's "doc-lag" note (stage-pf-vdisplay.ps1 still names the old vdisplay-driver/ tree) is part of our G / M6 packaging cleanup.
Stages (detail in the bug doc): Stage 0 diagnostics (S3) → Stage 1 mid-session fallback (P3, host-only, the user-visible fix) → Stage 2 adaptive ring (P1/P2; proto bump + driver re-vendor) → Stage 3 trim advertised modes → Stage S driver resilience (S1/S2). Tracked as GB0–GB3 in the task list.
Progress (2026-06-25): GB1 landed host-side — recover-or-drop, no DDA (per the owner's call): the
ring now tracks the display's ACTUAL mode (CCD active_resolution), recreating on a size/HDR change so a
game mode-set recovers in-place; if no frame resumes within 3 s it drops the session cleanly (client
reconnects). Commits f98ab07 (first-frame failover) + c87bfe0. Awaiting on-glass Doom validation.
GB3 groundwork landed — driver publish() width/height guard + descriptor-on-drop logging + a flushed
process-lifetime log appender so the swap-chain worker's lines land (commit 789ad49); needs a driver
rebuild + re-vendor to deploy. Stage 3 (trim modes) deprioritized; Stage S code-fix gated on these
diagnostics showing whether S1/S2 fire on-glass.
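The recover-or-drop policy described for GB1 can be sketched as a small decision function. Everything here is hypothetical (DemoMode, DemoRing, DemoAction), and one detail is an assumption rather than something the notes state: the 3 s stall window is measured from the last delivered frame and is not reset by a ring recreation.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DemoMode {
    pub width: u32,
    pub height: u32,
    pub hdr: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DemoAction {
    Keep,
    RecreateRing,
    DropSession,
}

/// Stall limit taken from the tracker text ("within 3 s").
pub const DEMO_STALL_LIMIT: Duration = Duration::from_secs(3);

pub struct DemoRing {
    mode: DemoMode,
    last_frame: Instant,
}

impl DemoRing {
    pub fn new(mode: DemoMode, now: Instant) -> Self {
        Self { mode, last_frame: now }
    }

    /// Called whenever a frame is delivered through the ring.
    pub fn on_frame(&mut self, now: Instant) {
        self.last_frame = now;
    }

    /// Compares the ring's mode with the display's actual mode and decides what to do.
    pub fn tick(&mut self, actual: DemoMode, now: Instant) -> DemoAction {
        if actual != self.mode {
            // Size or HDR changed (e.g. a game mode-set): follow the display.
            self.mode = actual;
            return DemoAction::RecreateRing;
        }
        if now.duration_since(self.last_frame) >= DEMO_STALL_LIMIT {
            // No frame resumed in time: end the session so the client reconnects.
            DemoAction::DropSession
        } else {
            DemoAction::Keep
        }
    }
}
```

Keeping the decision pure (mode and time passed in) makes the policy testable without a display, which is useful while the on-glass validation is still pending.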
Verification
The persistent validator is the RTX box ssh "Enrico Bühler"@<ip> (ENRICOS-DESKTOP, RTX 4090,
PS shell). The IP FLOATS — DHCP + boots to Proxmox on reboot (new lease each time); recently .173 /
.158, confirm the current IP first. EPHEMERAL — never reboot it, never depend on it surviving. It has
WDK 26100 + LLVM 21.1.2 + the Rust toolchain. Build clone: C:\Users\Public\pf-rewrite.
```sh
# 0. (local, cross-platform) the proto crate + the Linux host build
cargo test -p pf-vdisplay-proto
cargo check -p punktfunk-host # Linux paths; the win_* mods are #[cfg(windows)]

# 1. reset the box clone to a clean base, then overlay your changed files
# ssh ... "cd C:\Users\Public\pf-rewrite; git fetch -q origin; git reset -q --hard origin/main; git clean -qfd; git checkout -q <rev>"
# scp <changed files> "Enrico Bühler@<ip>:C:/Users/Public/pf-rewrite/<same rel path>"

# 2. host clippy (warm target ~4s). NVENC import lib at C:\t\nvenc; no FFmpeg needed (amf-qsv off).
ssh ... "cd C:\Users\Public\pf-rewrite; $env:PUNKTFUNK_NVENC_LIB_DIR='C:\t\nvenc'; \
  cargo clippy -p punktfunk-host --features nvenc --target x86_64-pc-windows-msvc -- -D warnings"

# 3. driver workspace build (fires deny(unsafe_op_in_unsafe_fn)); ~5s
ssh ... "cd C:\Users\Public\pf-rewrite\packaging\windows\drivers; \
  $env:Version_Number='10.0.26100.0'; $env:LIBCLANG_PATH='C:\Program Files\LLVM\bin'; cargo build"
```
Gotchas: the box username has a ü → quote it; PS shell, filter output with Select-Object -Last N. After
a git reset --hard on the box clone, re-scp your working files (reset discards them). Do not build in
C:\Users\Public\punktfunk-native (the deployed host).
New modules introduced by this work
- crates/pf-vdisplay-proto/src/lib.rs → added mod gamepad (XusbShm/PadShm/magics/name helpers) + offset_of! asserts.
- crates/punktfunk-host/src/win_adapter.rs → resolve_render_adapter_luid (plan's windows/adapter.rs).
- crates/punktfunk-host/src/win_display.rs → CCD/HDR display helpers (plan's windows/display_ccd.rs).
- Driver: start_watchdog/reap_orphaned (control.rs/monitor.rs), set_render_adapter (adapter.rs), file_log_enabled gate (log.rs).