# Goal-1 (clean, layered host architecture) — staged execution plan The design is in [`windows-host-rewrite.md`](windows-host-rewrite.md) §2.2–2.4. This file is the **ordered, independently-shippable execution plan**, because the host is **live-validated** (GameStream + punktfunk/1, NVENC + IDD-push on-glass) and Goal-1 rewires its session/config/dispatch flow — so every stage must **preserve behavior**, compile + box-verify on its own, and be committed before the next starts. The plan's own §14 makes the §1 preservation checklist a mandatory per-module assert contract; honour it. > **Status (2026-06-25):** all six staged stages **and** §2.5 (the ownership-model rewrite) are **DONE** — > each is code + box-`cargo check --features nvenc` + (where it touches the deployed path) on-glass > validated. Work lives on branch **`windows-host-goal1`** (off `main`, **not merged**). What's left is > small and non-blocking — see [Remaining (next session)](#remaining-next-session) at the end. ## Why staged (not one big rewrite) `main` is at parity and shipping. A monolithic rewrite would put the validated host in a broken intermediate state for a long window and make a regression impossible to bisect. Each stage below is a behaviour-preserving transform with its own verification, so a regression is caught at the stage that introduced it. ## Stages (ordered; each = goal · files · risk · verify) **Stage 1 — `HostConfig` foundation. ✅ DONE (this commit).** `config.rs`: typed `HostConfig` parsed ONCE from env (`idd_push`/`encoder_pref`/`no_helper`/`force_helper`). Migrated the two highest-churn dispatch reads onto it (`encode::windows_resolved_backend`, `punktfunk1::should_use_helper`). Risk: low (env constant at runtime → identical behaviour). Verify: box `cargo check --features nvenc`. **Stage 2 — finish `HostConfig` + resolve-once. ✅ DONE (this commit).** Migrated **31** genuinely-constant operator/dispatch sites onto `HostConfig`: `idd_push` ×7 (the capture/topology disagreement knob), `no_wgc`, `capture_backend`, `render_adapter`, `encoder_pref` (Linux), the Windows vdisplay-backend select, plus the plan-named `secure_dda`/`idd_depth`/`zerocopy`/`ten_bit` and the multi-site `perf` ×4 / `compositor` ×5 / `video_source` ×3 / `gamepad`. Each `HostConfig` field's parser is **byte-identical** to the read it replaced, so `old == new` by construction (the §1 "flipped bool" guard). **Scope correction (the plan's "~64 sites / Linux XDG+compositor / grep→0" was unsafe as written):** two classes of `env::var` read are deliberately **kept live** and documented in `config.rs`: - **Runtime-mutated session vars.** On Linux, `vdisplay::apply_session_env` *rewrites the process env on every connect* (the Bazzite Gaming↔Desktop follow): `WAYLAND_DISPLAY`, `XDG_CURRENT_DESKTOP`, `XDG_RUNTIME_DIR`, `DBUS_SESSION_BUS_ADDRESS`, and the derived `PUNKTFUNK_INPUT_BACKEND`, `PUNKTFUNK_GAMESCOPE_SESSION/NODE`, `PUNKTFUNK_KWIN/MUTTER_VIRTUAL_PRIMARY`, `PUNKTFUNK_FORCE_SHM`. Parse-once would freeze them at startup → silent session-following regression. They are NOT constant. - **Single-use local tuning** (no resolve-once benefit, call-site-local default/clamp, and `FEC_PCT` even has *two different* semantics): `FEC_PCT`, `VIDEO_DROP`, `VBV_FRAMES`, `SPLIT_ENCODE`, `PACE_BURST_KB`, the `capture/dxgi.rs` timing knobs, the `*_LIVE`/test gates, plus path/dynamic reads (config-dir, `PATH` search, env-forward-to-child). `PUNKTFUNK_ZEROCOPY` is split on purpose: Windows presence-semantics moved to the field; Linux keeps its own truthy parser. Risk: medium (semantics-preservation). Verify: Linux `cargo check`/`clippy`/`fmt` green (the Windows-only edits are 1:1 substitutions, compile-verified on the box as part of Stage 3's build). **Stage 3 — `SessionPlan` (the single biggest clarity lever, plan §2.4). ✅ DONE (box-build + on-glass validated).** New `src/session_plan.rs`: a `Copy` `SessionPlan { capture, topology, encoder, bit_depth, hdr }` resolved **once** from `HostConfig` (+ the negotiated `bit_depth`) in `virtual_stream`, logged, and threaded through `build_pipeline_with_retry`/`build_pipeline`. The three dispatch points now read it: - **capture** — `capture::capture_virtual_output` takes a `CaptureBackend` IN (was re-deriving from `config().idd_push`/`capture_backend`/`no_wgc`); `CaptureBackend::resolve()` is the one resolver (also used by the GameStream + spike call sites). - **topology** — `virtual_stream` reads `plan.topology` (`should_use_helper` deleted; its logic is `session_plan::resolve_topology`, verbatim). The IDD-preempt guard reads `plan.capture` too. - **encoder** — recorded as `EncoderBackend` from `encode::windows_resolved_backend` (config-backed + GPU-vendor cached since stage 2, already a single source). Threading `encoder`/`input_format` into the encoder + capturer opens (which removes the `dxgi.rs` back-reference) is **stage 5**. Every decision is provably equivalent to the pre-stage-3 scattered reads (same `config()` + cached probes), so it is behavior-preserving. Risk: medium-high (rewires the deployed decision). Verify: - **Box build ✅** — `cargo check -p punktfunk-host --features nvenc` (the deployed config: NVENC SDK + `cudarc` + `encode/nvenc.rs`) is **clean, zero warnings**, on the RTX box (`192.168.1.173`), in an isolated worktree. This also covers stage 2's Windows-only edits (their first real Windows compile). - **On-glass ✅** — deployed my Stage-3 host into the SCM service (Session-1 launch, the real IDD-push environment) on the RTX box and drove a `punktfunk-probe` loopback session. The host logged `resolved session plan { capture: IddPush, topology: SingleProcess, encoder: Nvenc, bit_depth: 8, hdr: false }` — the **correct** resolution for the deployed config (IDD_PUSH + VDISPLAY=pf + nvenc) — and routed correctly (IDD-push capturer → shared ring → IDD→DDA fallback). This box has a pre-existing **hybrid-GPU IDD render-adapter mismatch** (driver renders on the iGPU `af4825`, host ring on the 4090 `294d29`) that yielded no published frame in this loopback scenario; an **A/B against the shipping binary reproduced the identical `frames=0`**, proving the no-frame is environmental, **not** a Stage-3 regression. Stage 3 is behavior-equivalent to the shipping host. Box restored to its deployed state. **Stage 4 — `SessionContext` (the arg-bundling). ✅ DONE (box-build validated). `SessionFactory`/`Session::drop` deferred to §2.5 — see below.** Bundled the 13-positional-argument `#[allow(too_many_arguments)]` session entry (`virtual_stream` **and** `virtual_stream_relay`) into one owned `SessionContext` struct, moved into the stream thread. The receivers move in (`virtual_stream` is their only consumer), retiring the `&Receiver` borrow plumbing. **Behavior- identical by construction**: each function destructures the context into the same local names at the top, so the ~400-line loop bodies are byte-for-byte unchanged. Removed both `#[allow(too_many_arguments)]` attrs. **Scoped deliberately.** The plan's `SessionFactory.build()` owning a `vdm.lease(mode) → open_capturer → open_encoder → spawn` RAII chain with `Session::drop` as the *only* teardown is **coupled to §2.5's ownership-model rewrite** — it needs a host-side `VirtualDisplayManager`/`MonitorLease` that does not exist yet (the lifecycle still lives in the `CURRENT_MON_GEN`/`IDD_SETUP_LOCK` globals + the per-compositor `vdisplay` backends). The current teardown is **already drop-based** (the capturer owns the keepalive whose `Drop` releases the monitor — "restore displays before REMOVE" lives there; only `send_thread.join()` is explicit), and it is the validated shipping path. Wrapping the deployed reconfig/switch/rebuild loop in a `Session::drop` for a behavior-preserving change would add real regression risk for marginal gain. So the `SessionFactory`/`Session::drop`/`vdm.lease` work is folded into §2.5 (its natural home); this stage delivers the concrete, safe arg-bundling. Risk: low (behavior-identical). Verify: Linux + box build (the relay destructure is the only Windows-only piece); the teardown on-glass gate moves to the §2.5 work. **Stage 5 — seam-trait tightenings (plan §2.3). 🟡 Tightening 1 ✅ DONE (box-build validated); 2→§2.5, 3 follow-on.** The three §2.3 tightenings have different coupling, so they split: - **(1) `OutputFormat` into the capturer ✅** — the headline (the explicit Stage-3 deferral; §5's "highest-severity coupling"). New `capture::OutputFormat { gpu, hdr }`, resolved once per session and passed **into** `capture_virtual_output` (`SessionPlan::output_format()` for the native path — `gpu = encoder.is_gpu()`, no second probe; `OutputFormat::resolve()` for the GameStream/spike paths). `dxgi::DuplCapturer::open` takes `gpu` in and **its `windows_resolved_backend()` recompute is deleted** — capture no longer re-derives the encode backend. Behavior-preserving (the `gpu` passed in equals the value the capturer used to compute). Linux + box-build clean. - **(2) HDR/release → `VirtualLease`** — **moved to §2.5.** `await_released` as a lease method needs the monitor-generation carried *on the lease* (today it's the `CURRENT_MON_GEN` global + the `sudovda::wait_for_monitor_released` free fn), and the keepalive becoming `Box` is the ownership-model change. It belongs with the `VirtualDisplayManager`/`MonitorLease` work, not bolted on here. - **(3) `EncoderCaps`** — small additive follow-on (query optional encoder capabilities instead of default no-ops); not blocking. Tracked for the next seam pass. Risk: medium (Tightening 1 is behavior-preserving + Windows-only → box-compile is the gate; on-glass parity is the same env-limited story as Stage 3). **Stage 6 — `windows/` + `linux/` tree confinement (cfg-sprawl, plan §2.2). ✅ DONE (Linux + box-build validated).** Moved **36 platform-specific files** into per-module `windows/` and `linux/` folders (and the shared HID codecs into `inject/proto/`): `capture/{windows,linux}/`, `encode/{windows,linux}/`, `inject/{windows,linux,proto}/`, `audio/{windows,linux}/`, `vdisplay/{windows,linux}/`, and the top-level `src/windows/` (service, wgc_helper, win_adapter, win_display) + `src/linux/` (dmabuf_fence, drm_sync, zerocopy/). **Done with `#[path]`, not a module rename** — every file moves into its folder while the `crate::*::*` module names stay **flat**, so all caller paths and every internal `super::`/`crate::` reference are **unchanged** (only the parent `mod` decls gained `#[path = "…"]`). This is the codebase's existing pattern (inject's `gamepad_windows`) and makes the move byte-identical in behaviour with **zero reference churn** — far lower risk than collapsing to a single `crate::capture::windows::` namespace (that deeper rename is an optional follow-on; this delivers the folder confinement the stage is about). Done LAST, after the semantic stages. Verify: Linux `cargo check`/`clippy`/`fmt` clean; all 36 `#[path]` targets exist; no internal `#[path]`/`include!`/file-child-`mod` in any moved file; **box `cargo check --features nvenc` clean**. **§2.5 — ownership-model rewrite: `VirtualDisplayManager` + `MonitorLease`. ✅ DONE (3 steps; code + box + on-glass reconnect-leak validated).** The natural home for the deferrals above (Stage 4's `SessionFactory`/`Session::drop`/`vdm.lease`; Stage 5 tightening 2's HDR/release → `VirtualLease`). A 5-agent map first established two facts that shaped the work: **`CURRENT_MON_GEN` was WRITE-ONLY** (its only reader, `idd_push::my_gen`, was set-but-never-read — the "per-frame monitor-gen bail" the docs describe was never wired; per-frame staleness is the *separate* ring `FrameToken.generation`), so the design's "carry the monitor gen through `WinCaptureTarget`" was unnecessary; and the two Windows backends (`sudovda` + `pf_vdisplay`) **duplicated the Idle/Active/Lingering refcount state machine verbatim** (differing only in IOCTL proto + REMOVE key). User-approved shape: **one OnceLock singleton `VirtualDisplayManager`**, not a threaded `Arc`. - **Step 1 (`1520201`)** — delete the dead/write-only code: `CURRENT_MON_GEN`/`my_gen`, `IDD_PERSIST`/ `open_or_reuse`/`IddReuseHandle` (~150 lines). - **Step 2 (`d9b8b88`)** — new `vdisplay/windows/manager.rs`: the two duplicated `MGR: Mutex` globals collapse into one OnceLock `VirtualDisplayManager` { `Box`, `Arc` device (typed — kills the raw-`isize` cross-thread smuggle **and** fixes a latent control-handle leak), `Mutex`, `AtomicU64` gen }. `sudovda`/`pf_vdisplay` shrink to thin `VdisplayDriver` impls (`open`/`add_monitor`/`remove_monitor`/`ping`) + thin `VirtualDisplay` wrappers; the IOCTL surface is the only backend-specific code left. `MonitorKey = Guid(GUID) | Session(u64)`. `MON_GEN` + `wait_for_monitor_released` move onto the manager; `MonitorLease::drop → vdm().release(gen)` preserves the stale-lease no-op verbatim. - **Step 3 (`fe61597`)** — the last two globals (`IDD_SETUP_LOCK`/`IDD_SESSION_STOP`) move onto the manager behind `vdm().begin_idd_setup(stop)`; `punktfunk1` no longer reaches into vdisplay internals for the preempt. Net: `CURRENT_MON_GEN` / `MON_GEN` / two `MGR` / `IDD_PERSIST` / `IDD_SETUP_LOCK` / `IDD_SESSION_STOP` — **all gone**, replaced by one encapsulated, typed manager. Behavior-preserving (the state machine is the canonical `sudovda` copy routed through the driver seam). **On-glass reconnect-leak test ✅ (`683c81b`)** — it earned its keep: the box *compile* was clean, but the first deploy **panicked** (`VirtualDisplayManager used before a backend initialised it`) because `begin_idd_setup` called `vdm()` **before** `vdisplay::open` constructs the backend that runs `manager::init()` (the old globals needed no init, so the ordering only broke once it became a manager method). Fixed by opening the backend first — it does no monitor work, so the preempt-before-monitor-creation semantics are preserved. After the fix: **0 panics**, the new `manager` module owns the lifecycle (`vdisplay::manager: virtual-display monitor removed`), create == removed (net 0, **bounded**), **0 leaked active monitors** across many reconnects; an A/B vs the shipping binary confirmed §2.5 is behaviour-equivalent. Verified live on the **IDD-push zero-copy path** (`new_fps ~200` @5120×1440@240, **0 DDA fallbacks**). ## Remaining (next session) Small, non-blocking follow-ons — the layered architecture is in place: 1. **`EncoderCaps` (Stage 5 tightening 3)** — query optional encoder capabilities behind a small trait instead of the default no-ops; additive, low-risk. The last seam-trait tightening. 2. **Optional `crate::*::windows::` namespace collapse** — Stage 6 confined the platform files into `windows/`/`linux/` folders via `#[path]` (flat module names, zero reference churn); the deeper rename to real `crate::capture::windows::` paths is optional cleanup, not required. 3. **Merge `windows-host-goal1` → `main`** — the branch is off `main` and **not merged**; local `main` is also ~20 commits ahead of `origin/main` with unpushed audit/Stage work. Land both when ready (the [Work-on-main] habit otherwise applies). 4. **(driver — NOT the host refactor) pf-vdisplay slot reclaim** — surfaced on-glass: sustained ADD/REMOVE churn wedges the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because it doesn't reclaim IddCx monitor slots on REMOVE (ghost monitor nodes accumulate, `target_id`s climb). Recovery today is `packaging/windows/reset-pf-vdisplay.ps1`; the real fix lives in the driver WIP (`packaging/windows/drivers/pf-vdisplay/src/{control,adapter}.rs`). Dev-iteration helpers `reset-pf-vdisplay.ps1` + `redeploy-pf-vdisplay.ps1` are committed under `packaging/windows/` (validated live). ## Guardrails (mandatory, plan §14) - Each stage is its own commit; box-verify before moving on. - Stages 3–5 touch the deployed path → **on-glass re-test** (NVENC + IDD-push, a mode switch, a connect/disconnect cycle) before the next stage. - Preserve every `PUNKTFUNK_*` var's exact semantics; when in doubt, assert old==new at the call site.