e1ca2e4d3c
The plan tracker referenced "§2.5 — see below" but had no §2.5 section and no "what's left". Add:
* a Status banner (all 6 stages + §2.5 done; branch not merged),
* the §2.5 section — the 3-step ownership-model rewrite (VirtualDisplayManager/MonitorLease,
the deleted globals), the CURRENT_MON_GEN-write-only finding, and the on-glass reconnect-leak
result (the vdm-init-order panic found+fixed, 0 leaks, IDD-push zero-copy verified),
* a "Remaining (next session)" list: EncoderCaps, optional namespace collapse, merge to main, and
the pf-vdisplay driver slot-reclaim fix (driver WIP, not the host refactor) with the dev scripts.
Mark §2.5 IMPLEMENTED in the design doc (windows-host-rewrite.md) with the write-only-gen deviation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
197 lines
16 KiB
Markdown
197 lines
16 KiB
Markdown
# Goal-1 (clean, layered host architecture) — staged execution plan
|
||
|
||
The design is in [`windows-host-rewrite.md`](windows-host-rewrite.md) §2.2–2.4. This file is the **ordered,
|
||
independently-shippable execution plan**, because the host is **live-validated** (GameStream + punktfunk/1,
|
||
NVENC + IDD-push on-glass) and Goal-1 rewires its session/config/dispatch flow — so every stage must
|
||
**preserve behavior**, compile + box-verify on its own, and be committed before the next starts. The plan's
|
||
own §14 makes the §1 preservation checklist a mandatory per-module assert contract; honour it.
|
||
|
||
> **Status (2026-06-25):** all six staged stages **and** §2.5 (the ownership-model rewrite) are **DONE** —
|
||
> each is code + box-`cargo check --features nvenc` + (where it touches the deployed path) on-glass
|
||
> validated. Work lives on branch **`windows-host-goal1`** (off `main`, **not merged**). What's left is
|
||
> small and non-blocking — see [Remaining (next session)](#remaining-next-session) at the end.
|
||
|
||
## Why staged (not one big rewrite)
|
||
|
||
`main` is at parity and shipping. A monolithic rewrite would put the validated host in a broken
|
||
intermediate state for a long window and make a regression impossible to bisect. Each stage below is a
|
||
behaviour-preserving transform with its own verification, so a regression is caught at the stage that
|
||
introduced it.
|
||
|
||
## Stages (ordered; each = goal · files · risk · verify)
|
||
|
||
**Stage 1 — `HostConfig` foundation. ✅ DONE (this commit).**
|
||
`config.rs`: typed `HostConfig` parsed ONCE from env (`idd_push`/`encoder_pref`/`no_helper`/`force_helper`).
|
||
Migrated the two highest-churn dispatch reads onto it (`encode::windows_resolved_backend`,
|
||
`punktfunk1::should_use_helper`). Risk: low (env constant at runtime → identical behaviour). Verify: box
|
||
`cargo check --features nvenc`.
|
||
|
||
**Stage 2 — finish `HostConfig` + resolve-once. ✅ DONE (this commit).**
|
||
Migrated **31** genuinely-constant operator/dispatch sites onto `HostConfig`: `idd_push` ×7 (the
|
||
capture/topology disagreement knob), `no_wgc`, `capture_backend`, `render_adapter`, `encoder_pref` (Linux),
|
||
the Windows vdisplay-backend select, plus the plan-named `secure_dda`/`idd_depth`/`zerocopy`/`ten_bit` and the
|
||
multi-site `perf` ×4 / `compositor` ×5 / `video_source` ×3 / `gamepad`. Each `HostConfig` field's parser is
|
||
**byte-identical** to the read it replaced, so `old == new` by construction (the §1 "flipped bool" guard).
|
||
|
||
**Scope correction (the plan's "~64 sites / Linux XDG+compositor / grep→0" was unsafe as written):** two
|
||
classes of `env::var` read are deliberately **kept live** and documented in `config.rs`:
|
||
- **Runtime-mutated session vars.** On Linux, `vdisplay::apply_session_env` *rewrites the process env on
|
||
every connect* (the Bazzite Gaming↔Desktop follow): `WAYLAND_DISPLAY`, `XDG_CURRENT_DESKTOP`,
|
||
`XDG_RUNTIME_DIR`, `DBUS_SESSION_BUS_ADDRESS`, and the derived `PUNKTFUNK_INPUT_BACKEND`,
|
||
`PUNKTFUNK_GAMESCOPE_SESSION/NODE`, `PUNKTFUNK_KWIN/MUTTER_VIRTUAL_PRIMARY`, `PUNKTFUNK_FORCE_SHM`.
|
||
Parse-once would freeze them at startup → silent session-following regression. They are NOT constant.
|
||
- **Single-use local tuning** (no resolve-once benefit, call-site-local default/clamp, and `FEC_PCT` even has
|
||
*two different* semantics): `FEC_PCT`, `VIDEO_DROP`, `VBV_FRAMES`, `SPLIT_ENCODE`, `PACE_BURST_KB`, the
|
||
`capture/dxgi.rs` timing knobs, the `*_LIVE`/test gates, plus path/dynamic reads (config-dir, `PATH`
|
||
search, env-forward-to-child). `PUNKTFUNK_ZEROCOPY` is split on purpose: Windows presence-semantics moved
|
||
to the field; Linux keeps its own truthy parser.
|
||
|
||
Risk: medium (semantics-preservation). Verify: Linux `cargo check`/`clippy`/`fmt` green (the Windows-only
|
||
edits are 1:1 substitutions, compile-verified on the box as part of Stage 3's build).
|
||
|
||
**Stage 3 — `SessionPlan` (the single biggest clarity lever, plan §2.4). ✅ DONE (box-build + on-glass validated).**
|
||
New `src/session_plan.rs`: a `Copy` `SessionPlan { capture, topology, encoder, bit_depth, hdr }` resolved
|
||
**once** from `HostConfig` (+ the negotiated `bit_depth`) in `virtual_stream`, logged, and threaded through
|
||
`build_pipeline_with_retry`/`build_pipeline`. The three dispatch points now read it:
|
||
- **capture** — `capture::capture_virtual_output` takes a `CaptureBackend` IN (was re-deriving from
|
||
`config().idd_push`/`capture_backend`/`no_wgc`); `CaptureBackend::resolve()` is the one resolver (also
|
||
used by the GameStream + spike call sites).
|
||
- **topology** — `virtual_stream` reads `plan.topology` (`should_use_helper` deleted; its logic is
|
||
`session_plan::resolve_topology`, verbatim). The IDD-preempt guard reads `plan.capture` too.
|
||
- **encoder** — recorded as `EncoderBackend` from `encode::windows_resolved_backend` (config-backed +
|
||
GPU-vendor cached since stage 2, already a single source). Threading `encoder`/`input_format` into the
|
||
encoder + capturer opens (which removes the `dxgi.rs` back-reference) is **stage 5**.
|
||
|
||
Every decision is provably equivalent to the pre-stage-3 scattered reads (same `config()` + cached probes),
|
||
so it is behavior-preserving. Risk: medium-high (rewires the deployed decision). Verify:
|
||
- **Box build ✅** — `cargo check -p punktfunk-host --features nvenc` (the deployed config: NVENC SDK +
|
||
`cudarc` + `encode/nvenc.rs`) is **clean, zero warnings**, on the RTX box (`192.168.1.173`), in an
|
||
isolated worktree. This also covers stage 2's Windows-only edits (their first real Windows compile).
|
||
- **On-glass ✅** — deployed my Stage-3 host into the SCM service (Session-1 launch, the real IDD-push
|
||
environment) on the RTX box and drove a `punktfunk-probe` loopback session. The host logged
|
||
`resolved session plan { capture: IddPush, topology: SingleProcess, encoder: Nvenc, bit_depth: 8,
|
||
hdr: false }` — the **correct** resolution for the deployed config (IDD_PUSH + VDISPLAY=pf + nvenc) —
|
||
and routed correctly (IDD-push capturer → shared ring → IDD→DDA fallback). This box has a pre-existing
|
||
**hybrid-GPU IDD render-adapter mismatch** (driver renders on the iGPU `af4825`, host ring on the 4090
|
||
`294d29`) that yielded no published frame in this loopback scenario; an **A/B against the shipping
|
||
binary reproduced the identical `frames=0`**, proving the no-frame is environmental, **not** a Stage-3
|
||
regression. Stage 3 is behavior-equivalent to the shipping host. Box restored to its deployed state.
|
||
|
||
**Stage 4 — `SessionContext` (the arg-bundling). ✅ DONE (box-build validated). `SessionFactory`/`Session::drop` deferred to §2.5 — see below.**
|
||
Bundled the 13-positional-argument `#[allow(too_many_arguments)]` session entry (`virtual_stream` **and**
|
||
`virtual_stream_relay`) into one owned `SessionContext` struct, moved into the stream thread. The receivers
|
||
move in (`virtual_stream` is their only consumer), retiring the `&Receiver` borrow plumbing. **Behavior-
|
||
identical by construction**: each function destructures the context into the same local names at the top, so
|
||
the ~400-line loop bodies are byte-for-byte unchanged. Removed both `#[allow(too_many_arguments)]` attrs.
|
||
|
||
**Scoped deliberately.** The plan's `SessionFactory.build()` owning a `vdm.lease(mode) → open_capturer →
|
||
open_encoder → spawn` RAII chain with `Session::drop` as the *only* teardown is **coupled to §2.5's
|
||
ownership-model rewrite** — it needs a host-side `VirtualDisplayManager`/`MonitorLease` that does not exist
|
||
yet (the lifecycle still lives in the `CURRENT_MON_GEN`/`IDD_SETUP_LOCK` globals + the per-compositor
|
||
`vdisplay` backends). The current teardown is **already drop-based** (the capturer owns the keepalive whose
|
||
`Drop` releases the monitor — "restore displays before REMOVE" lives there; only `send_thread.join()` is
|
||
explicit), and it is the validated shipping path. Wrapping the deployed reconfig/switch/rebuild loop in a
|
||
`Session::drop` for a behavior-preserving change would add real regression risk for marginal gain. So the
|
||
`SessionFactory`/`Session::drop`/`vdm.lease` work is folded into §2.5 (its natural home); this stage delivers
|
||
the concrete, safe arg-bundling. Risk: low (behavior-identical). Verify: Linux + box build (the relay
|
||
destructure is the only Windows-only piece); the teardown on-glass gate moves to the §2.5 work.
|
||
|
||
**Stage 5 — seam-trait tightenings (plan §2.3). 🟡 Tightening 1 ✅ DONE (box-build validated); 2→§2.5, 3 follow-on.**
|
||
The three §2.3 tightenings have different coupling, so they split:
|
||
- **(1) `OutputFormat` into the capturer ✅** — the headline (the explicit Stage-3 deferral; §5's
|
||
"highest-severity coupling"). New `capture::OutputFormat { gpu, hdr }`, resolved once per session and
|
||
passed **into** `capture_virtual_output` (`SessionPlan::output_format()` for the native path —
|
||
`gpu = encoder.is_gpu()`, no second probe; `OutputFormat::resolve()` for the GameStream/spike paths).
|
||
`dxgi::DuplCapturer::open` takes `gpu` in and **its `windows_resolved_backend()` recompute is deleted** —
|
||
capture no longer re-derives the encode backend. Behavior-preserving (the `gpu` passed in equals the value
|
||
the capturer used to compute). Linux + box-build clean.
|
||
- **(2) HDR/release → `VirtualLease`** — **moved to §2.5.** `await_released` as a lease method needs the
|
||
monitor-generation carried *on the lease* (today it's the `CURRENT_MON_GEN` global + the
|
||
`sudovda::wait_for_monitor_released` free fn), and the keepalive becoming `Box<dyn VirtualLease>` is the
|
||
ownership-model change. It belongs with the `VirtualDisplayManager`/`MonitorLease` work, not bolted on here.
|
||
- **(3) `EncoderCaps`** — small additive follow-on (query optional encoder capabilities instead of default
|
||
no-ops); not blocking. Tracked for the next seam pass.
|
||
|
||
Risk: medium (Tightening 1 is behavior-preserving + Windows-only → box-compile is the gate; on-glass parity is
|
||
the same env-limited story as Stage 3).
|
||
|
||
**Stage 6 — `windows/` + `linux/` tree confinement (cfg-sprawl, plan §2.2). ✅ DONE (Linux + box-build validated).**
|
||
Moved **36 platform-specific files** into per-module `windows/` and `linux/` folders (and the shared HID
|
||
codecs into `inject/proto/`): `capture/{windows,linux}/`, `encode/{windows,linux}/`,
|
||
`inject/{windows,linux,proto}/`, `audio/{windows,linux}/`, `vdisplay/{windows,linux}/`, and the top-level
|
||
`src/windows/` (service, wgc_helper, win_adapter, win_display) + `src/linux/` (dmabuf_fence, drm_sync,
|
||
zerocopy/).
|
||
|
||
**Done with `#[path]`, not a module rename** — every file moves into its folder while the `crate::*::*` module
|
||
names stay **flat**, so all caller paths and every internal `super::`/`crate::` reference are **unchanged**
|
||
(only the parent `mod` decls gained `#[path = "…"]`). This is the codebase's existing pattern (inject's
|
||
`gamepad_windows`) and makes the move byte-identical in behaviour with **zero reference churn** — far lower
|
||
risk than collapsing to a single `crate::capture::windows::` namespace (that deeper rename is an optional
|
||
follow-on; this delivers the folder confinement the stage is about). Done LAST, after the semantic stages.
|
||
Verify: Linux `cargo check`/`clippy`/`fmt` clean; all 36 `#[path]` targets exist; no internal
|
||
`#[path]`/`include!`/file-child-`mod` in any moved file; **box `cargo check --features nvenc` clean**.
|
||
|
||
**§2.5 — ownership-model rewrite: `VirtualDisplayManager` + `MonitorLease`. ✅ DONE (3 steps; code + box + on-glass reconnect-leak validated).**
|
||
The natural home for the deferrals above (Stage 4's `SessionFactory`/`Session::drop`/`vdm.lease`; Stage 5
|
||
tightening 2's HDR/release → `VirtualLease`). A 5-agent map first established two facts that shaped the work:
|
||
**`CURRENT_MON_GEN` was WRITE-ONLY** (its only reader, `idd_push::my_gen`, was set-but-never-read — the
|
||
"per-frame monitor-gen bail" the docs describe was never wired; per-frame staleness is the *separate* ring
|
||
`FrameToken.generation`), so the design's "carry the monitor gen through `WinCaptureTarget`" was unnecessary;
|
||
and the two Windows backends (`sudovda` + `pf_vdisplay`) **duplicated the Idle/Active/Lingering refcount
|
||
state machine verbatim** (differing only in IOCTL proto + REMOVE key). User-approved shape: **one OnceLock
|
||
singleton `VirtualDisplayManager`**, not a threaded `Arc`.
|
||
|
||
- **Step 1 (`1520201`)** — delete the dead/write-only code: `CURRENT_MON_GEN`/`my_gen`, `IDD_PERSIST`/
|
||
`open_or_reuse`/`IddReuseHandle` (~150 lines).
|
||
- **Step 2 (`d9b8b88`)** — new `vdisplay/windows/manager.rs`: the two duplicated `MGR: Mutex<Mgr>` globals
|
||
collapse into one OnceLock `VirtualDisplayManager` { `Box<dyn VdisplayDriver>`, `Arc<OwnedHandle>` device
|
||
(typed — kills the raw-`isize` cross-thread smuggle **and** fixes a latent control-handle leak),
|
||
`Mutex<MgrState>`, `AtomicU64` gen }. `sudovda`/`pf_vdisplay` shrink to thin `VdisplayDriver` impls
|
||
(`open`/`add_monitor`/`remove_monitor`/`ping`) + thin `VirtualDisplay` wrappers; the IOCTL surface is the
|
||
only backend-specific code left. `MonitorKey = Guid(GUID) | Session(u64)`. `MON_GEN` +
|
||
`wait_for_monitor_released` move onto the manager; `MonitorLease::drop → vdm().release(gen)` preserves the
|
||
stale-lease no-op verbatim.
|
||
- **Step 3 (`fe61597`)** — the last two globals (`IDD_SETUP_LOCK`/`IDD_SESSION_STOP`) move onto the manager
|
||
behind `vdm().begin_idd_setup(stop)`; `punktfunk1` no longer reaches into vdisplay internals for the preempt.
|
||
|
||
Net: `CURRENT_MON_GEN` / `MON_GEN` / two `MGR` / `IDD_PERSIST` / `IDD_SETUP_LOCK` / `IDD_SESSION_STOP` —
|
||
**all gone**, replaced by one encapsulated, typed manager. Behavior-preserving (the state machine is the
|
||
canonical `sudovda` copy routed through the driver seam).
|
||
|
||
**On-glass reconnect-leak test ✅ (`683c81b`)** — it earned its keep: the box *compile* was clean, but the
|
||
first deploy **panicked** (`VirtualDisplayManager used before a backend initialised it`) because
|
||
`begin_idd_setup` called `vdm()` **before** `vdisplay::open` constructs the backend that runs
|
||
`manager::init()` (the old globals needed no init, so the ordering only broke once it became a manager
|
||
method). Fixed by opening the backend first — it does no monitor work, so the preempt-before-monitor-creation
|
||
semantics are preserved. After the fix: **0 panics**, the new `manager` module owns the lifecycle
|
||
(`vdisplay::manager: virtual-display monitor removed`), create == removed (net 0, **bounded**), **0 leaked
|
||
active monitors** across many reconnects; an A/B vs the shipping binary confirmed §2.5 is behaviour-equivalent.
|
||
Verified live on the **IDD-push zero-copy path** (`new_fps ~200` @5120×1440@240, **0 DDA fallbacks**).
|
||
|
||
## Remaining (next session)
|
||
|
||
Small, non-blocking follow-ons — the layered architecture is in place:
|
||
|
||
1. **`EncoderCaps` (Stage 5 tightening 3)** — query optional encoder capabilities behind a small trait
|
||
instead of the default no-ops; additive, low-risk. The last seam-trait tightening.
|
||
2. **Optional `crate::*::windows::` namespace collapse** — Stage 6 confined the platform files into
|
||
`windows/`/`linux/` folders via `#[path]` (flat module names, zero reference churn); the deeper rename to
|
||
real `crate::capture::windows::` paths is optional cleanup, not required.
|
||
3. **Merge `windows-host-goal1` → `main`** — the branch is off `main` and **not merged**; local `main` is
|
||
also ~20 commits ahead of `origin/main` with unpushed audit/Stage work. Land both when ready (the
|
||
[Work-on-main] habit otherwise applies).
|
||
4. **(driver — NOT the host refactor) pf-vdisplay slot reclaim** — surfaced on-glass: sustained ADD/REMOVE
|
||
churn wedges the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because it doesn't reclaim IddCx monitor
|
||
slots on REMOVE (ghost monitor nodes accumulate, `target_id`s climb). Recovery today is
|
||
`packaging/windows/reset-pf-vdisplay.ps1`; the real fix lives in the driver WIP
|
||
(`packaging/windows/drivers/pf-vdisplay/src/{control,adapter}.rs`). Dev-iteration helpers
|
||
`reset-pf-vdisplay.ps1` + `redeploy-pf-vdisplay.ps1` are committed under `packaging/windows/` (validated
|
||
live).
|
||
|
||
## Guardrails (mandatory, plan §14)
|
||
|
||
- Each stage is its own commit; box-verify before moving on.
|
||
- Stages 3–5 touch the deployed path → **on-glass re-test** (NVENC + IDD-push, a mode switch, a
|
||
connect/disconnect cycle) before the next stage.
|
||
- Preserve every `PUNKTFUNK_*` var's exact semantics; when in doubt, assert old==new at the call site.
|