Files
punktfunk/docs/windows-host-goal1-plan.md
T
enricobuehler bf57aa4000 docs(windows-host-goal1): Stage 5 tightening 3 (EncoderCaps) DONE; refresh Remaining
The Goal-1 host refactor is now functionally complete — all 6 stages, §2.5, and
all three Stage-5 seam-trait tightenings have landed (EncoderCaps = 0ccd0fe).
Remaining is non-blocking: the optional namespace collapse (decision: skip —
pure churn), the merge to main (confirm with the user — outward-facing), and the
pf-vdisplay slot-reclaim driver fix (reassigned to windows-host-rewrite.md, the
greenfield driver rewrite, alongside the fullscreen-game capture bug).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 21:28:30 +00:00

205 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Goal-1 (clean, layered host architecture) — staged execution plan
The design is in [`windows-host-rewrite.md`](windows-host-rewrite.md) §2.22.4. This file is the **ordered,
independently-shippable execution plan**, because the host is **live-validated** (GameStream + punktfunk/1,
NVENC + IDD-push on-glass) and Goal-1 rewires its session/config/dispatch flow — so every stage must
**preserve behavior**, compile + box-verify on its own, and be committed before the next starts. The plan's
own §14 makes the §1 preservation checklist a mandatory per-module assert contract; honour it.
> **Status (2026-06-25):** all six staged stages, §2.5 (the ownership-model rewrite), **and** all three
> Stage-5 seam-trait tightenings (incl. `EncoderCaps`, `0ccd0fe`) are **DONE** — each is code +
> box-`cargo check --features nvenc` + (where it touches the deployed path) on-glass validated, except the
> Windows-only `EncoderCaps` NVENC override which is Linux-clippy-clean + CI-gated. Work lives on branch
> **`windows-host-goal1`** (off `main`, **not merged**). The Goal-1 host refactor is **functionally
> complete**; what's left is non-blocking — see [Remaining (next session)](#remaining-next-session).
## Why staged (not one big rewrite)
`main` is at parity and shipping. A monolithic rewrite would put the validated host in a broken
intermediate state for a long window and make a regression impossible to bisect. Each stage below is a
behaviour-preserving transform with its own verification, so a regression is caught at the stage that
introduced it.
## Stages (ordered; each = goal · files · risk · verify)
**Stage 1 — `HostConfig` foundation. ✅ DONE (this commit).**
`config.rs`: typed `HostConfig` parsed ONCE from env (`idd_push`/`encoder_pref`/`no_helper`/`force_helper`).
Migrated the two highest-churn dispatch reads onto it (`encode::windows_resolved_backend`,
`punktfunk1::should_use_helper`). Risk: low (env constant at runtime → identical behaviour). Verify: box
`cargo check --features nvenc`.
**Stage 2 — finish `HostConfig` + resolve-once. ✅ DONE (this commit).**
Migrated **31** genuinely-constant operator/dispatch sites onto `HostConfig`: `idd_push` ×7 (the
capture/topology disagreement knob), `no_wgc`, `capture_backend`, `render_adapter`, `encoder_pref` (Linux),
the Windows vdisplay-backend select, plus the plan-named `secure_dda`/`idd_depth`/`zerocopy`/`ten_bit` and the
multi-site `perf` ×4 / `compositor` ×5 / `video_source` ×3 / `gamepad`. Each `HostConfig` field's parser is
**byte-identical** to the read it replaced, so `old == new` by construction (the §1 "flipped bool" guard).
**Scope correction (the plan's "~64 sites / Linux XDG+compositor / grep→0" was unsafe as written):** two
classes of `env::var` read are deliberately **kept live** and documented in `config.rs`:
- **Runtime-mutated session vars.** On Linux, `vdisplay::apply_session_env` *rewrites the process env on
every connect* (the Bazzite Gaming↔Desktop follow): `WAYLAND_DISPLAY`, `XDG_CURRENT_DESKTOP`,
`XDG_RUNTIME_DIR`, `DBUS_SESSION_BUS_ADDRESS`, and the derived `PUNKTFUNK_INPUT_BACKEND`,
`PUNKTFUNK_GAMESCOPE_SESSION/NODE`, `PUNKTFUNK_KWIN/MUTTER_VIRTUAL_PRIMARY`, `PUNKTFUNK_FORCE_SHM`.
Parse-once would freeze them at startup → silent session-following regression. They are NOT constant.
- **Single-use local tuning** (no resolve-once benefit, call-site-local default/clamp, and `FEC_PCT` even has
*two different* semantics): `FEC_PCT`, `VIDEO_DROP`, `VBV_FRAMES`, `SPLIT_ENCODE`, `PACE_BURST_KB`, the
`capture/dxgi.rs` timing knobs, the `*_LIVE`/test gates, plus path/dynamic reads (config-dir, `PATH`
search, env-forward-to-child). `PUNKTFUNK_ZEROCOPY` is split on purpose: Windows presence-semantics moved
to the field; Linux keeps its own truthy parser.
Risk: medium (semantics-preservation). Verify: Linux `cargo check`/`clippy`/`fmt` green (the Windows-only
edits are 1:1 substitutions, compile-verified on the box as part of Stage 3's build).
**Stage 3 — `SessionPlan` (the single biggest clarity lever, plan §2.4). ✅ DONE (box-build + on-glass validated).**
New `src/session_plan.rs`: a `Copy` `SessionPlan { capture, topology, encoder, bit_depth, hdr }` resolved
**once** from `HostConfig` (+ the negotiated `bit_depth`) in `virtual_stream`, logged, and threaded through
`build_pipeline_with_retry`/`build_pipeline`. The three dispatch points now read it:
- **capture** — `capture::capture_virtual_output` takes a `CaptureBackend` IN (was re-deriving from
`config().idd_push`/`capture_backend`/`no_wgc`); `CaptureBackend::resolve()` is the one resolver (also
used by the GameStream + spike call sites).
- **topology** — `virtual_stream` reads `plan.topology` (`should_use_helper` deleted; its logic is
`session_plan::resolve_topology`, verbatim). The IDD-preempt guard reads `plan.capture` too.
- **encoder** — recorded as `EncoderBackend` from `encode::windows_resolved_backend` (config-backed +
GPU-vendor cached since stage 2, already a single source). Threading `encoder`/`input_format` into the
encoder + capturer opens (which removes the `dxgi.rs` back-reference) is **stage 5**.
Every decision is provably equivalent to the pre-stage-3 scattered reads (same `config()` + cached probes),
so it is behavior-preserving. Risk: medium-high (rewires the deployed decision). Verify:
- **Box build ✅** — `cargo check -p punktfunk-host --features nvenc` (the deployed config: NVENC SDK +
`cudarc` + `encode/nvenc.rs`) is **clean, zero warnings**, on the RTX box (`192.168.1.173`), in an
isolated worktree. This also covers stage 2's Windows-only edits (their first real Windows compile).
- **On-glass ✅** — deployed my Stage-3 host into the SCM service (Session-1 launch, the real IDD-push
environment) on the RTX box and drove a `punktfunk-probe` loopback session. The host logged
`resolved session plan { capture: IddPush, topology: SingleProcess, encoder: Nvenc, bit_depth: 8,
hdr: false }` — the **correct** resolution for the deployed config (IDD_PUSH + VDISPLAY=pf + nvenc) —
and routed correctly (IDD-push capturer → shared ring → IDD→DDA fallback). This box has a pre-existing
**hybrid-GPU IDD render-adapter mismatch** (driver renders on the iGPU `af4825`, host ring on the 4090
`294d29`) that yielded no published frame in this loopback scenario; an **A/B against the shipping
binary reproduced the identical `frames=0`**, proving the no-frame is environmental, **not** a Stage-3
regression. Stage 3 is behavior-equivalent to the shipping host. Box restored to its deployed state.
**Stage 4 — `SessionContext` (the arg-bundling). ✅ DONE (box-build validated). `SessionFactory`/`Session::drop` deferred to §2.5 — see below.**
Bundled the 13-positional-argument `#[allow(too_many_arguments)]` session entry (`virtual_stream` **and**
`virtual_stream_relay`) into one owned `SessionContext` struct, moved into the stream thread. The receivers
move in (`virtual_stream` is their only consumer), retiring the `&Receiver` borrow plumbing. **Behavior-
identical by construction**: each function destructures the context into the same local names at the top, so
the ~400-line loop bodies are byte-for-byte unchanged. Removed both `#[allow(too_many_arguments)]` attrs.
**Scoped deliberately.** The plan's `SessionFactory.build()` owning a `vdm.lease(mode) → open_capturer →
open_encoder → spawn` RAII chain with `Session::drop` as the *only* teardown is **coupled to §2.5's
ownership-model rewrite** — it needs a host-side `VirtualDisplayManager`/`MonitorLease` that does not exist
yet (the lifecycle still lives in the `CURRENT_MON_GEN`/`IDD_SETUP_LOCK` globals + the per-compositor
`vdisplay` backends). The current teardown is **already drop-based** (the capturer owns the keepalive whose
`Drop` releases the monitor — "restore displays before REMOVE" lives there; only `send_thread.join()` is
explicit), and it is the validated shipping path. Wrapping the deployed reconfig/switch/rebuild loop in a
`Session::drop` for a behavior-preserving change would add real regression risk for marginal gain. So the
`SessionFactory`/`Session::drop`/`vdm.lease` work is folded into §2.5 (its natural home); this stage delivers
the concrete, safe arg-bundling. Risk: low (behavior-identical). Verify: Linux + box build (the relay
destructure is the only Windows-only piece); the teardown on-glass gate moves to the §2.5 work.
**Stage 5 — seam-trait tightenings (plan §2.3). ✅ Tightening 1 + 3 DONE; 2 folded into §2.5.**
The three §2.3 tightenings have different coupling, so they split:
- **(1) `OutputFormat` into the capturer ✅** — the headline (the explicit Stage-3 deferral; §5's
"highest-severity coupling"). New `capture::OutputFormat { gpu, hdr }`, resolved once per session and
passed **into** `capture_virtual_output` (`SessionPlan::output_format()` for the native path —
`gpu = encoder.is_gpu()`, no second probe; `OutputFormat::resolve()` for the GameStream/spike paths).
`dxgi::DuplCapturer::open` takes `gpu` in and **its `windows_resolved_backend()` recompute is deleted**
capture no longer re-derives the encode backend. Behavior-preserving (the `gpu` passed in equals the value
the capturer used to compute). Linux + box-build clean.
- **(2) HDR/release → `VirtualLease`** — **moved to §2.5.** `await_released` as a lease method needs the
monitor-generation carried *on the lease* (today it's the `CURRENT_MON_GEN` global + the
`sudovda::wait_for_monitor_released` free fn), and the keepalive becoming `Box<dyn VirtualLease>` is the
ownership-model change. It belongs with the `VirtualDisplayManager`/`MonitorLease` work, not bolted on here.
- **(3) `EncoderCaps` ✅ (`0ccd0fe`)** — `Encoder::caps() -> EncoderCaps { supports_rfi, supports_hdr_metadata }`,
a default-`false` query (so every SDR/libavcodec backend — Linux NVENC, VAAPI, AMF/QSV, software — is
unchanged); only the Windows direct-NVENC path overrides it, reporting the real `rfi_supported` (probed
once at open) + `hdr`. Consumer: the GameStream encode loop hoists `supports_rfi` once and gates the
loss-recovery path on it — `!(supports_rfi && invalidate_ref_frames(..))` forces a keyframe directly on
non-RFI encoders instead of an always-`false` call every loss event (behavior-preserving, intent
explicit). Linux `clippy -D warnings` clean; the NVENC override is Windows-only → CI/on-glass gate.
Risk: medium (Tightening 1 is behavior-preserving + Windows-only → box-compile is the gate; on-glass parity is
the same env-limited story as Stage 3).
**Stage 6 — `windows/` + `linux/` tree confinement (cfg-sprawl, plan §2.2). ✅ DONE (Linux + box-build validated).**
Moved **36 platform-specific files** into per-module `windows/` and `linux/` folders (and the shared HID
codecs into `inject/proto/`): `capture/{windows,linux}/`, `encode/{windows,linux}/`,
`inject/{windows,linux,proto}/`, `audio/{windows,linux}/`, `vdisplay/{windows,linux}/`, and the top-level
`src/windows/` (service, wgc_helper, win_adapter, win_display) + `src/linux/` (dmabuf_fence, drm_sync,
zerocopy/).
**Done with `#[path]`, not a module rename** — every file moves into its folder while the `crate::*::*` module
names stay **flat**, so all caller paths and every internal `super::`/`crate::` reference are **unchanged**
(only the parent `mod` decls gained `#[path = "…"]`). This is the codebase's existing pattern (inject's
`gamepad_windows`) and makes the move byte-identical in behaviour with **zero reference churn** — far lower
risk than collapsing to a single `crate::capture::windows::` namespace (that deeper rename is an optional
follow-on; this delivers the folder confinement the stage is about). Done LAST, after the semantic stages.
Verify: Linux `cargo check`/`clippy`/`fmt` clean; all 36 `#[path]` targets exist; no internal
`#[path]`/`include!`/file-child-`mod` in any moved file; **box `cargo check --features nvenc` clean**.
**§2.5 — ownership-model rewrite: `VirtualDisplayManager` + `MonitorLease`. ✅ DONE (3 steps; code + box + on-glass reconnect-leak validated).**
The natural home for the deferrals above (Stage 4's `SessionFactory`/`Session::drop`/`vdm.lease`; Stage 5
tightening 2's HDR/release → `VirtualLease`). A 5-agent map first established two facts that shaped the work:
**`CURRENT_MON_GEN` was WRITE-ONLY** (its only reader, `idd_push::my_gen`, was set-but-never-read — the
"per-frame monitor-gen bail" the docs describe was never wired; per-frame staleness is the *separate* ring
`FrameToken.generation`), so the design's "carry the monitor gen through `WinCaptureTarget`" was unnecessary;
and the two Windows backends (`sudovda` + `pf_vdisplay`) **duplicated the Idle/Active/Lingering refcount
state machine verbatim** (differing only in IOCTL proto + REMOVE key). User-approved shape: **one OnceLock
singleton `VirtualDisplayManager`**, not a threaded `Arc`.
- **Step 1 (`1520201`)** — delete the dead/write-only code: `CURRENT_MON_GEN`/`my_gen`, `IDD_PERSIST`/
`open_or_reuse`/`IddReuseHandle` (~150 lines).
- **Step 2 (`d9b8b88`)** — new `vdisplay/windows/manager.rs`: the two duplicated `MGR: Mutex<Mgr>` globals
collapse into one OnceLock `VirtualDisplayManager` { `Box<dyn VdisplayDriver>`, `Arc<OwnedHandle>` device
(typed — kills the raw-`isize` cross-thread smuggle **and** fixes a latent control-handle leak),
`Mutex<MgrState>`, `AtomicU64` gen }. `sudovda`/`pf_vdisplay` shrink to thin `VdisplayDriver` impls
(`open`/`add_monitor`/`remove_monitor`/`ping`) + thin `VirtualDisplay` wrappers; the IOCTL surface is the
only backend-specific code left. `MonitorKey = Guid(GUID) | Session(u64)`. `MON_GEN` +
`wait_for_monitor_released` move onto the manager; `MonitorLease::drop → vdm().release(gen)` preserves the
stale-lease no-op verbatim.
- **Step 3 (`fe61597`)** — the last two globals (`IDD_SETUP_LOCK`/`IDD_SESSION_STOP`) move onto the manager
behind `vdm().begin_idd_setup(stop)`; `punktfunk1` no longer reaches into vdisplay internals for the preempt.
Net: `CURRENT_MON_GEN` / `MON_GEN` / two `MGR` / `IDD_PERSIST` / `IDD_SETUP_LOCK` / `IDD_SESSION_STOP`
**all gone**, replaced by one encapsulated, typed manager. Behavior-preserving (the state machine is the
canonical `sudovda` copy routed through the driver seam).
**On-glass reconnect-leak test ✅ (`683c81b`)** — it earned its keep: the box *compile* was clean, but the
first deploy **panicked** (`VirtualDisplayManager used before a backend initialised it`) because
`begin_idd_setup` called `vdm()` **before** `vdisplay::open` constructs the backend that runs
`manager::init()` (the old globals needed no init, so the ordering only broke once it became a manager
method). Fixed by opening the backend first — it does no monitor work, so the preempt-before-monitor-creation
semantics are preserved. After the fix: **0 panics**, the new `manager` module owns the lifecycle
(`vdisplay::manager: virtual-display monitor removed`), create == removed (net 0, **bounded**), **0 leaked
active monitors** across many reconnects; an A/B vs the shipping binary confirmed §2.5 is behaviour-equivalent.
Verified live on the **IDD-push zero-copy path** (`new_fps ~200` @5120×1440@240, **0 DDA fallbacks**).
## Remaining (next session)
The layered architecture is **complete**: all 6 staged stages, §2.5, **and** the three Stage-5 seam-trait
tightenings have landed. What's left is non-blocking:
1. **Optional `crate::*::windows::` namespace collapse** — Stage 6 confined the platform files into
`windows/`/`linux/` folders via `#[path]` (flat module names, zero reference churn); the deeper rename to
real `crate::capture::windows::` paths is optional cleanup, not required. **Decision: skip** — pure churn
for no behavior/clarity gain, and it would touch every `super::`/`crate::` path in the moved files.
2. **Merge `windows-host-goal1` → `main`** — the branch is off `main` and **not merged**; local `main` is
also ~20 commits ahead of `origin/main` with unpushed audit/Stage work. Outward-facing (a 30+-commit push
to `origin`) → **confirm with the user before landing**; the [Work-on-main] habit otherwise applies.
3. **(driver — NOT the host refactor) pf-vdisplay slot reclaim** — surfaced on-glass: sustained ADD/REMOVE
churn wedges the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because it doesn't reclaim IddCx monitor
slots on REMOVE (ghost monitor nodes accumulate, `target_id`s climb). Recovery today is
`packaging/windows/reset-pf-vdisplay.ps1`; the real fix lives in the driver WIP. **This belongs to the
greenfield driver rewrite, not the Goal-1 host refactor** — tracked in
[`windows-host-rewrite.md`](windows-host-rewrite.md) (alongside the fullscreen-game capture bug).
Dev-iteration helpers `reset-pf-vdisplay.ps1` + `redeploy-pf-vdisplay.ps1` are committed under
`packaging/windows/` (validated live).
## Guardrails (mandatory, plan §14)
- Each stage is its own commit; box-verify before moving on.
- Stages 35 touch the deployed path → **on-glass re-test** (NVENC + IDD-push, a mode switch, a
connect/disconnect cycle) before the next stage.
- Preserve every `PUNKTFUNK_*` var's exact semantics; when in doubt, assert old==new at the call site.