Files
punktfunk/docs/windows-host-rewrite-remediation.md
T
enricobuehler 1cd87066d7 docs(windows-rewrite): track GB1/GB3 progress + box IP floats (DHCP)
Record GB1 (host-side recover-or-drop) + GB3 groundwork (driver descriptor guard/logging) in the tracker; note the RTX validation box IP floats (DHCP/ephemeral, recently .173/.158) instead of hardcoding .158.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 15:35:27 +00:00

13 KiB
Raw Blame History

Windows Host Rewrite — Audit Remediation Tracker

Status: in progress (2026-06-25). Living hand-off doc for working through the findings in docs/windows-host-rewrite-audit.md (the audit of the IDD-push rewrite vs docs/windows-host-rewrite.md). Keep this updated as items land so the work can be handed off without losing tasks.

TL;DR

  • 9 commits on main, NOT pushed (+9 ahead of origin/main, tip e60cda3). Each is compile-verified on the RTX box (see Verification).
  • Done: the entire audit P0 + P1 + P2 payload, the driver unsafe lint, and F1 (SudoVDA helper decoupling) complete.
  • Remaining: D2 (OwnedHandle), D1-host (unsafe-lint sweep), E1 (driver ownership refactor), G (gamepad-driver unification + old-tree deletion + host src/windows/ tree).
  • Two cross-cutting follow-ups: (1) on-glass behavioral validation of the committed driver/host fixes (the box is single-GPU + headless-ish, so hybrid-GPU / HDR-toggle / fallback paths weren't exercised at runtime); (2) push to run the full CI matrix (the local checks skip the amf-qsv path).

Done — committed on main (unpushed)

Commit Audit § What Compile-verified
0badc17 The audit doc itself
95dcef3 §6.1/6.2 A proto: offset_of! asserts on SharedHeader/AddReply/control structs; owned XusbShm/PadShm gamepad layouts (+ min_const_generics) local cargo test + MSVC (box)
0a7ae5e §4.1/4.2/4.4/4.5 B driver: real host-gone watchdog (was dead code), SET_RENDER_ADAPTER impl, world-writable-log gate, mode bounds + display_info u64-saturate driver cargo build (box)
e5c9ee8 §4.2h/6.1 C2/C5 host: render-pin comment/activation (driver now honors it); gamepad SHM consumers derive from pf_vdisplay_proto::gamepad host clippy (box)
ed58365 §5.1 C1 host: IDD-push attach fallback to DDA (open() hands keepalive back; bounded wait_for_attach on DRV_STATUS_OPENED) instead of the 20s black bail host clippy (box)
b0d2838 §5.3/5.4 C3/C4 host: repeat_last rotates+copies into a fresh out-ring slot; HDR ring sized FP16 at open when advanced-color is enabled host clippy (box)
a755d6e §8 D1-driver #![deny(unsafe_op_in_unsafe_fn)] on pf-vdisplay + wdk-iddcx driver cargo build (box)
d638a93 §9 F1 pt1: resolve_render_adapter_luid → neutral crate::win_adapter host clippy (box)
e60cda3 §9 F1 rest: 6 CCD/HDR helpers + SavedConfig → neutral crate::win_display; SudoVDA reach-in fully broken host clippy (box) + Linux cargo check

Remaining — to do

Ordered by suggested sequence. On-glass = cannot be finished without a real session on the RTX box, driven by a human (driver install + client connect).

D2 — OwnedHandle on the new path · audit §8 · compile-verifiable · moderate

  • Goal: replace raw HANDLE/isize handles held across their lifetime with std::os::windows::io::OwnedHandle (RAII close, fixes leak-on-error, deletes manual CloseHandle).
  • Targets: vdisplay/pf_vdisplay.rs — the pinger thread's raw isize device handle (pf_vdisplay.rs ~324-344); capture/idd_push.rsIddPushCapturer { map, event, dbg_map: HANDLE } (manually closed in Drop). The plan also lists events/jobs/tokens/sections in windows/process.rs/service.rs (broader).
  • Risk: handle ownership (double-close / premature close). Compile catches type errors; lifecycle needs care. Touches the live IDD-push path → ideally smoke-tested on glass after.
  • Verify: host clippy on the box (the new path is --features nvenc).

D1-host — host-wide unsafe lint sweep · audit §8 · large/mechanical

  • Goal: add #![deny(unsafe_op_in_unsafe_fn)] + #![warn(clippy::undocumented_unsafe_blocks)] (+ optionally multiple_unsafe_ops_per_block) to the host crate (crates/punktfunk-host/src/main.rs), and fix the fallout.
  • Scope: large — hundreds of unsafe blocks across both Linux and Windows code need explicit unsafe {} wrapping inside unsafe fns and // SAFETY: comments. The driver already has the deny (a755d6e); the host has none.
  • Verify: Linux cargo clippy -p punktfunk-host --all-targets -- -D warnings (Linux/cross paths) and host clippy on the box (Windows paths). Do it incrementally per-subsystem to keep the diff reviewable.

E1 — driver ownership refactor · audit §4.3 / plan §2.5 + §14 step 5 · on-glass-gated · large

  • Goal: move the driver's process-global statics (MONITOR_MODES, NEXT_ID, ADAPTER, DEVICE_POOL) into a WDF DeviceContext; wire EvtCleanupCallback on the IDDCX_MONITOR object so the SwapChainProcessor + D3D drop via RAII; collapse the 3-key monitor identity (id/object/session_id) to one. Unblocks max_concurrent>1 on Windows + removes the host-side preempt dance.
  • Why on-glass: the plan's critique is explicit — instrument that MonitorContext::Drop actually RAN; if the cleanup callback does not fire on this UMDF/IddCx stack, keep the current explicit REMOVE/teardown path as the fallback. Cannot be signed off compile-only.
  • Files: packaging/windows/drivers/pf-vdisplay/src/{entry,adapter,monitor,callbacks,swap_chain_processor}.rs.
  • Verify: driver cargo build (compile) on the box; then on-glass reconnect-storm + leak check (LIVE_DEVICES counter in direct_3d_device.rs, the world-readable log when PFVD_DEBUG_LOG is set).

G — gamepad-driver unification (M4) + deletion (M6) + host tree · audit §6/§10 + plan §2.2 · on-glass-gated · largest

  • M4: fold pf_dualsense + pf_xusb (today standalone packaging/windows/{dualsense,xusb}-driver/ on the old wdf stack) into the unified packaging/windows/drivers/ workspace on windows-drivers-rs. This also enables the driver-side gamepad-SHM→proto switch (host side already done in C5 — the driver still hand-reads view.add(140); point it at pf_vdisplay_proto::gamepad::PadShm/XusbShm).
  • M6: delete the old packaging/windows/vdisplay-driver/ tree + the old gamepad driver trees + the bring-up scaffolding (DebugBlock/spawn_observer/IDD_PERSIST/open_or_reuse in idd_push.rs) — only after on-glass parity of the new path.
  • Host architecture (Goal 1, plan §2.2/2.4): the src/windows/ subtree + config.rs (HostConfig) + SessionFactory/SessionPlannot started. The biggest clarity lever; large.

Cross-cutting follow-ups (not a single task)

  • On-glass validation of the committed fixes — needs the RTX box + a client. Specifically: the watchdog actually reaps on host-kill (B1); SET_RENDER_ADAPTER pins correctly on a hybrid box (B2/C2 — the lab box is single-dGPU, so this path is unexercised); the IDD-push→DDA fallback triggers
    • the happy path still attaches within 4s (C1); HDR ring sizing + out-ring repeat under real HDR / static-desktop pipelining (C3/C4).
  • Push to run the full CI matrix — the local host checks use --features nvenc only (no FFmpeg), so the amf-qsv encode path is unexercised locally; CI (windows-host.yml) covers it.

A separate, newly-found bug (NOT an audit finding) in the same IDD-push subsystem, with its own staged fix plan: docs/windows-host-rewrite-game-capture-bug.md. Symptom: launching a fullscreen game (Doom the Dark Ages) on an HDR IDD-push stream flashes the desktop, the game never shows, and reconnect = black screen + working audio. Root cause: the IDD-push ring is fixed format+size at session start; the driver silently drops every frame whose surface descriptor no longer matches (a game forces a mode-set); the host has no channel to learn the descriptor changed; and there is no mid-session fallback → 20 s bail!.

Intersections with this remediation — read before implementing:

  • Stage 1 builds on our C1 (ed58365); do not duplicate it. C1 added an IDD-push→DDA fallback, but open-time only (driver never attaches). The game bug is mid-session (attached, then a game changes format/size). The bug doc's Stage 1 (a composing capturer that fails over mid-session) is the generalization — build it on C1's open()-returns-keepalive + bounded-attach infrastructure.
  • The bug doc was written against pre-remediation main (a11b0dd). Its line numbers and its claim "capture.rs:348-356 … no fall-through" are stale after our 9 commits (C1 changed exactly that). Rebase on current main first.
  • Stage 2 (new SharedHeader fields + PROTOCOL_VERSION bump) must update the offset_of!/size asserts added in A (95dcef3) — they catch drift at compile time (the intended safety net). Note: those asserts live in the frame module of crates/pf-vdisplay-proto/src/lib.rs (the doc says frame.rs).
  • Stage 0 / S3 diagnostics rely on the driver log, which B3 (0a7ae5e) gated off in release builds (debug_assertions || PFVD_DEBUG_LOG). Enable it (PFVD_DEBUG_LOG=1 or a debug build) for the repro.
  • S1/S2 (driver swap-chain resilience) is adjacent to E1 (same swap_chain_processor.rs/ callbacks.rs); coordinate so they don't conflict.
  • The bug doc's "doc-lag" note (stage-pf-vdisplay.ps1 still names the old vdisplay-driver/ tree) is part of our G / M6 packaging cleanup.

Stages (detail in the bug doc): Stage 0 diagnostics (S3) → Stage 1 mid-session fallback (P3, host-only, the user-visible fix) → Stage 2 adaptive ring (P1/P2; proto bump + driver re-vendor) → Stage 3 trim advertised modes → Stage S driver resilience (S1/S2). Tracked as GB0GB3 in the task list.

Progress (2026-06-25): GB1 landed host-siderecover-or-drop, no DDA (per the owner's call): the ring now tracks the display's ACTUAL mode (CCD active_resolution), recreating on a size/HDR change so a game mode-set recovers in-place; if no frame resumes within 3 s it drops the session cleanly (client reconnects). Commits f98ab07 (first-frame failover) + c87bfe0. Awaiting on-glass Doom validation. GB3 groundwork landed — driver publish() width/height guard + descriptor-on-drop logging + a flushed process-lifetime log appender so the swap-chain worker's lines land (commit 789ad49); needs a driver rebuild + re-vendor to deploy. Stage 3 (trim modes) deprioritized; Stage S code-fix gated on these diagnostics showing whether S1/S2 fire on-glass.

Verification

The persistent validator is the RTX box ssh "Enrico Bühler"@<ip> (ENRICOS-DESKTOP, RTX 4090, PS shell). The IP FLOATS — DHCP + boots to Proxmox on reboot (new lease each time); recently .173 / .158, confirm the current IP first. EPHEMERAL — never reboot it, never depend on it surviving. It has WDK 26100 + LLVM 21.1.2 + the Rust toolchain. Build clone: C:\Users\Public\pf-rewrite.

# 0. (local, cross-platform) the proto crate + the Linux host build
cargo test  -p pf-vdisplay-proto
cargo check -p punktfunk-host                       # Linux paths; the win_* mods are #[cfg(windows)]

# 1. reset the box clone to a clean base, then overlay your changed files
#    ssh ... "cd C:\Users\Public\pf-rewrite; git fetch -q origin; git reset -q --hard origin/main; git clean -qfd; git checkout -q <rev>"
#    scp <changed files> "Enrico Bühler@<ip>:C:/Users/Public/pf-rewrite/<same rel path>"

# 2. host clippy (warm target ~4s). NVENC import lib at C:\t\nvenc; no FFmpeg needed (amf-qsv off).
ssh ... "cd C:\Users\Public\pf-rewrite; $env:PUNKTFUNK_NVENC_LIB_DIR='C:\t\nvenc'; \
  cargo clippy -p punktfunk-host --features nvenc --target x86_64-pc-windows-msvc -- -D warnings"

# 3. driver workspace build (fires deny(unsafe_op_in_unsafe_fn)); ~5s
ssh ... "cd C:\Users\Public\pf-rewrite\packaging\windows\drivers; \
  $env:Version_Number='10.0.26100.0'; $env:LIBCLANG_PATH='C:\Program Files\LLVM\bin'; cargo build"

Gotchas: the box username has a ü → quote it; PS shell, filter output with Select-Object -Last N. After a git reset --hard on the box clone, re-scp your working files (reset discards them). Do not build in C:\Users\Public\punktfunk-native (the deployed host).

New modules introduced by this work

  • crates/pf-vdisplay-proto/src/lib.rs → added mod gamepad (XusbShm/PadShm/magics/name helpers) + offset_of! asserts.
  • crates/punktfunk-host/src/win_adapter.rsresolve_render_adapter_luid (plan's windows/adapter.rs).
  • crates/punktfunk-host/src/win_display.rs → CCD/HDR display helpers (plan's windows/display_ccd.rs).
  • Driver: start_watchdog/reap_orphaned (control.rs/monitor.rs), set_render_adapter (adapter.rs), file_log_enabled gate (log.rs).