dbab1f98ba
Cross-reference docs/windows-host-rewrite-game-capture-bug.md from the remediation tracker, with the intersections that matter for whoever implements it: Stage 1 builds on (doesn't duplicate) our C1 mid-/open-time fallback; the bug doc is written against pre-remediation main (a11b0dd) so its line refs are stale; Stage 2's new SharedHeader fields must update A's offset asserts (in lib.rs frame mod); Stage 0/S3 diagnostics need the driver log B3 gated off in release; S1/S2 is adjacent to E1.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
159 lines
12 KiB
Markdown
159 lines
12 KiB
Markdown
# Windows Host Rewrite — Audit Remediation Tracker
|
||
|
||
Status: **in progress** (2026-06-25). Living hand-off doc for working through the findings in
|
||
[`docs/windows-host-rewrite-audit.md`](windows-host-rewrite-audit.md) (the audit of the IDD-push rewrite
|
||
vs [`docs/windows-host-rewrite.md`](windows-host-rewrite.md)). Keep this updated as items land so the work
|
||
can be handed off without losing tasks.
|
||
|
||
## TL;DR
|
||
|
||
- **9 commits on `main`, NOT pushed** (`+9` ahead of `origin/main`, tip `e60cda3`). Each is compile-verified
|
||
on the RTX box (see [Verification](#verification)).
|
||
- **Done:** the entire audit **P0 + P1 + P2** payload, the driver `unsafe` lint, and **F1** (SudoVDA helper
|
||
decoupling) complete.
|
||
- **Remaining:** **D2** (OwnedHandle), **D1-host** (unsafe-lint sweep), **E1** (driver ownership refactor),
|
||
**G** (gamepad-driver unification + old-tree deletion + host `src/windows/` tree).
|
||
- **Two cross-cutting follow-ups:** (1) **on-glass behavioral validation** of the committed driver/host
|
||
fixes (the box is single-GPU + headless-ish, so hybrid-GPU / HDR-toggle / fallback paths weren't
|
||
exercised at runtime); (2) **push** to run the full CI matrix (the local checks skip the `amf-qsv` path).
|
||
|
||
## Done — committed on `main` (unpushed)
|
||
|
||
| Commit | Audit § | What | Compile-verified |
|
||
|---|---|---|---|
|
||
| `0badc17` | — | The audit doc itself | — |
|
||
| `95dcef3` | §6.1/6.2 | **A** proto: `offset_of!` asserts on `SharedHeader`/`AddReply`/control structs; owned `XusbShm`/`PadShm` gamepad layouts (+ `min_const_generics`) | local `cargo test` + MSVC (box) |
|
||
| `0a7ae5e` | §4.1/4.2/4.4/4.5 | **B** driver: real host-gone **watchdog** (was dead code), **`SET_RENDER_ADAPTER`** impl, world-writable-log gate, mode bounds + `display_info` u64-saturate | driver `cargo build` (box) |
|
||
| `e5c9ee8` | §4.2h/6.1 | **C2/C5** host: render-pin comment/activation (driver now honors it); gamepad SHM consumers derive from `pf_vdisplay_proto::gamepad` | host clippy (box) |
|
||
| `ed58365` | §5.1 | **C1** host: IDD-push **attach fallback to DDA** (open() hands keepalive back; bounded `wait_for_attach` on `DRV_STATUS_OPENED`) instead of the 20s black bail | host clippy (box) |
|
||
| `b0d2838` | §5.3/5.4 | **C3/C4** host: `repeat_last` rotates+copies into a fresh out-ring slot; HDR ring sized FP16 at open when advanced-color is enabled | host clippy (box) |
|
||
| `a755d6e` | §8 | **D1-driver** `#![deny(unsafe_op_in_unsafe_fn)]` on `pf-vdisplay` + `wdk-iddcx` | driver `cargo build` (box) |
|
||
| `d638a93` | §9 | **F1 pt1**: `resolve_render_adapter_luid` → neutral `crate::win_adapter` | host clippy (box) |
|
||
| `e60cda3` | §9 | **F1 rest**: 6 CCD/HDR helpers + `SavedConfig` → neutral `crate::win_display`; SudoVDA reach-in fully broken | host clippy (box) + Linux `cargo check` |
|
||
|
||
## Remaining — to do
|
||
|
||
Ordered by suggested sequence. **On-glass = cannot be *finished* without a real session on the RTX box,
|
||
driven by a human** (driver install + client connect).
|
||
|
||
### D2 — `OwnedHandle` on the new path · audit §8 · compile-verifiable · moderate
|
||
- **Goal:** replace raw `HANDLE`/`isize` handles held across their lifetime with
|
||
`std::os::windows::io::OwnedHandle` (RAII close, fixes leak-on-error, deletes manual `CloseHandle`).
|
||
- **Targets:** `vdisplay/pf_vdisplay.rs` — the pinger thread's raw `isize` device handle (`pf_vdisplay.rs`
|
||
~324-344); `capture/idd_push.rs` — `IddPushCapturer { map, event, dbg_map: HANDLE }` (manually closed in
|
||
`Drop`). The plan also lists events/jobs/tokens/sections in `windows/process.rs`/`service.rs` (broader).
|
||
- **Risk:** handle ownership (double-close / premature close). Compile catches type errors; lifecycle
|
||
needs care. Touches the live IDD-push path → ideally smoke-tested on glass after.
|
||
- **Verify:** host clippy on the box (the new path is `--features nvenc`).
|
||
|
||
### D1-host — host-wide `unsafe` lint sweep · audit §8 · large/mechanical
|
||
- **Goal:** add `#![deny(unsafe_op_in_unsafe_fn)]` + `#![warn(clippy::undocumented_unsafe_blocks)]`
|
||
(+ optionally `multiple_unsafe_ops_per_block`) to the **host crate** (`crates/punktfunk-host/src/main.rs`),
|
||
and fix the fallout.
|
||
- **Scope:** large — hundreds of `unsafe` blocks across **both** Linux and Windows code need explicit
|
||
`unsafe {}` wrapping inside `unsafe fn`s and `// SAFETY:` comments. The driver already has the `deny`
|
||
(`a755d6e`); the host has none.
|
||
- **Verify:** Linux `cargo clippy -p punktfunk-host --all-targets -- -D warnings` (Linux/cross paths) **and**
|
||
host clippy on the box (Windows paths). Do it incrementally per-subsystem to keep the diff reviewable.
|
||
|
||
### E1 — driver ownership refactor · audit §4.3 / plan §2.5 + §14 step 5 · **on-glass-gated** · large
|
||
- **Goal:** move the driver's process-global statics (`MONITOR_MODES`, `NEXT_ID`, `ADAPTER`, `DEVICE_POOL`)
|
||
into a WDF `DeviceContext`; **wire `EvtCleanupCallback` on the `IDDCX_MONITOR` object** so the
|
||
`SwapChainProcessor` + D3D drop via RAII; collapse the 3-key monitor identity (`id`/`object`/`session_id`)
|
||
to one. Unblocks `max_concurrent>1` on Windows + removes the host-side preempt dance.
|
||
- **Why on-glass:** the plan's critique is explicit — *instrument that `MonitorContext::Drop` actually
|
||
RAN*; if the cleanup callback does not fire on this UMDF/IddCx stack, **keep the current explicit
|
||
REMOVE/teardown path as the fallback**. Cannot be signed off compile-only.
|
||
- **Files:** `packaging/windows/drivers/pf-vdisplay/src/{entry,adapter,monitor,callbacks,swap_chain_processor}.rs`.
|
||
- **Verify:** driver `cargo build` (compile) on the box; then on-glass reconnect-storm + leak check
|
||
(`LIVE_DEVICES` counter in `direct_3d_device.rs`, the world-readable log when `PFVD_DEBUG_LOG` is set).
|
||
|
||
### G — gamepad-driver unification (M4) + deletion (M6) + host tree · audit §6/§10 + plan §2.2 · **on-glass-gated** · largest
|
||
- **M4:** fold `pf_dualsense` + `pf_xusb` (today standalone `packaging/windows/{dualsense,xusb}-driver/` on
|
||
the old `wdf` stack) into the unified `packaging/windows/drivers/` workspace on `windows-drivers-rs`. This
|
||
also enables the **driver-side** gamepad-SHM→proto switch (host side already done in C5 — the driver still
|
||
hand-reads `view.add(140)`; point it at `pf_vdisplay_proto::gamepad::PadShm`/`XusbShm`).
|
||
- **M6:** delete the old `packaging/windows/vdisplay-driver/` tree + the old gamepad driver trees + the
|
||
bring-up scaffolding (`DebugBlock`/`spawn_observer`/`IDD_PERSIST`/`open_or_reuse` in `idd_push.rs`) — **only
|
||
after on-glass parity** of the new path.
|
||
- **Host architecture (Goal 1, plan §2.2/2.4):** the `src/windows/` subtree + `config.rs` (`HostConfig`) +
|
||
`SessionFactory`/`SessionPlan` — **not started**. The biggest clarity lever; large.
|
||
|
||
### Cross-cutting follow-ups (not a single task)
|
||
- **On-glass validation of the committed fixes** — needs the RTX box + a client. Specifically: the
|
||
**watchdog** actually reaps on host-kill (B1); **`SET_RENDER_ADAPTER`** pins correctly on a *hybrid* box
|
||
(B2/C2 — the lab box is single-dGPU, so this path is unexercised); the **IDD-push→DDA fallback** triggers
|
||
+ the happy path still attaches within 4s (C1); **HDR ring sizing** + **out-ring repeat** under real HDR /
|
||
static-desktop pipelining (C3/C4).
|
||
- **Push** to run the full CI matrix — the local host checks use `--features nvenc` only (no FFmpeg), so the
|
||
`amf-qsv` encode path is unexercised locally; CI (`windows-host.yml`) covers it.
|
||
|
||
## Related workstream — fullscreen-game IDD-push capture bug (separate doc)
|
||
|
||
A **separate, newly-found bug** (NOT an audit finding) in the same IDD-push subsystem, with its own staged
|
||
fix plan: [`docs/windows-host-rewrite-game-capture-bug.md`](windows-host-rewrite-game-capture-bug.md).
|
||
**Symptom:** launching a fullscreen game (Doom the Dark Ages) on an HDR IDD-push stream flashes the desktop,
|
||
the game never shows, and reconnect = black screen + working audio. **Root cause:** the IDD-push ring is
|
||
fixed format+size at session start; the driver silently drops every frame whose surface descriptor no longer
|
||
matches (a game forces a mode-set); the host has no channel to learn the descriptor changed; and there is no
|
||
mid-session fallback → 20 s `bail!`.
|
||
|
||
**Intersections with this remediation — read before implementing:**
|
||
- **Stage 1 builds on our C1 (`ed58365`); do not duplicate it.** C1 added an IDD-push→DDA fallback, but
|
||
**open-time only** (driver never attaches). The game bug is **mid-session** (attached, then a game changes
|
||
format/size). The bug doc's Stage 1 (a composing capturer that fails over mid-session) is the
|
||
generalization — build it on C1's `open()`-returns-keepalive + bounded-attach infrastructure.
|
||
- **The bug doc was written against pre-remediation `main` (`a11b0dd`).** Its line numbers and its claim
|
||
"`capture.rs:348-356` … no fall-through" are **stale after our 9 commits** (C1 changed exactly that).
|
||
Rebase on current `main` first.
|
||
- **Stage 2 (new `SharedHeader` fields + `PROTOCOL_VERSION` bump)** must update the **`offset_of!`/size
|
||
asserts added in A (`95dcef3`)** — they catch drift at compile time (the intended safety net). Note: those
|
||
asserts live in the `frame` module of `crates/pf-vdisplay-proto/src/lib.rs` (the doc says `frame.rs`).
|
||
- **Stage 0 / S3 diagnostics rely on the driver log**, which **B3 (`0a7ae5e`) gated off in release builds**
|
||
(`debug_assertions || PFVD_DEBUG_LOG`). Enable it (`PFVD_DEBUG_LOG=1` or a debug build) for the repro.
|
||
- **S1/S2 (driver swap-chain resilience)** is adjacent to **E1** (same `swap_chain_processor.rs`/
|
||
`callbacks.rs`); coordinate so they don't conflict.
|
||
- The bug doc's "doc-lag" note (`stage-pf-vdisplay.ps1` still names the old `vdisplay-driver/` tree) is part
|
||
of our **G / M6** packaging cleanup.
|
||
|
||
**Stages (detail in the bug doc):** Stage 0 diagnostics (S3) → Stage 1 mid-session fallback (P3, host-only,
|
||
the user-visible fix) → Stage 2 adaptive ring (P1/P2; proto bump + driver re-vendor) → Stage 3 trim
|
||
advertised modes → Stage S driver resilience (S1/S2). Tracked as GB0–GB3 in the task list.
|
||
|
||
## Verification
|
||
|
||
The persistent validator is the **RTX box** `ssh "Enrico Bühler"@192.168.1.158` (ENRICOS-DESKTOP, RTX 4090,
|
||
PS shell). **EPHEMERAL — boots to Proxmox on reboot; never reboot it, never depend on it surviving.** It has
|
||
WDK 26100 + LLVM 21.1.2 + the Rust toolchain. Build clone: `C:\Users\Public\pf-rewrite`.
|
||
|
||
```sh
|
||
# 0. (local, cross-platform) the proto crate + the Linux host build
|
||
cargo test -p pf-vdisplay-proto
|
||
cargo check -p punktfunk-host # Linux paths; the win_* mods are #[cfg(windows)]
|
||
|
||
# 1. reset the box clone to a clean base, then overlay your changed files
|
||
# ssh ... "cd C:\Users\Public\pf-rewrite; git fetch -q origin; git reset -q --hard origin/main; git clean -qfd; git checkout -q <rev>"
|
||
# scp <changed files> "Enrico Bühler@192.168.1.158:C:/Users/Public/pf-rewrite/<same rel path>"
|
||
|
||
# 2. host clippy (warm target ~4s). NVENC import lib at C:\t\nvenc; no FFmpeg needed (amf-qsv off).
|
||
ssh ... "cd C:\Users\Public\pf-rewrite; $env:PUNKTFUNK_NVENC_LIB_DIR='C:\t\nvenc'; \
|
||
cargo clippy -p punktfunk-host --features nvenc --target x86_64-pc-windows-msvc -- -D warnings"
|
||
|
||
# 3. driver workspace build (fires deny(unsafe_op_in_unsafe_fn)); ~5s
|
||
ssh ... "cd C:\Users\Public\pf-rewrite\packaging\windows\drivers; \
|
||
$env:Version_Number='10.0.26100.0'; $env:LIBCLANG_PATH='C:\Program Files\LLVM\bin'; cargo build"
|
||
```
|
||
|
||
Gotchas: the box username has a `ü` → quote it; PS shell, filter output with `Select-Object -Last N`. After
|
||
a `git reset --hard` on the box clone, re-`scp` your working files (reset discards them). Do **not** build in
|
||
`C:\Users\Public\punktfunk-native` (the deployed host).
|
||
|
||
## New modules introduced by this work
|
||
|
||
- `crates/pf-vdisplay-proto/src/lib.rs` → added `mod gamepad` (`XusbShm`/`PadShm`/magics/name helpers) +
|
||
`offset_of!` asserts.
|
||
- `crates/punktfunk-host/src/win_adapter.rs` → `resolve_render_adapter_luid` (plan's `windows/adapter.rs`).
|
||
- `crates/punktfunk-host/src/win_display.rs` → CCD/HDR display helpers (plan's `windows/display_ccd.rs`).
|
||
- Driver: `start_watchdog`/`reap_orphaned` (control.rs/monitor.rs), `set_render_adapter` (adapter.rs),
|
||
`file_log_enabled` gate (log.rs).
|