The Goal-1 host refactor is now functionally complete — all 6 stages, §2.5, and
all three Stage-5 seam-trait tightenings have landed (EncoderCaps = 0ccd0fe).
Remaining is non-blocking: the optional namespace collapse (decision: skip —
pure churn), the merge to main (confirm with the user — outward-facing), and the
pf-vdisplay slot-reclaim driver fix (reassigned to windows-host-rewrite.md, the
greenfield driver rewrite, alongside the fullscreen-game capture bug).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
17 KiB
Goal-1 (clean, layered host architecture) — staged execution plan
The design is in windows-host-rewrite.md §2.2–2.4. This file is the ordered,
independently-shippable execution plan, because the host is live-validated (GameStream + punktfunk/1,
NVENC + IDD-push on-glass) and Goal-1 rewires its session/config/dispatch flow — so every stage must
preserve behavior, compile + box-verify on its own, and be committed before the next starts. The plan's
own §14 makes the §1 preservation checklist a mandatory per-module assert contract; honour it.
Status (2026-06-25): all six staged stages, §2.5 (the ownership-model rewrite), and all three Stage-5 seam-trait tightenings (incl.
EncoderCaps,0ccd0fe) are DONE — each is code + box-cargo check --features nvenc+ (where it touches the deployed path) on-glass validated, except the Windows-onlyEncoderCapsNVENC override which is Linux-clippy-clean + CI-gated. Work lives on branchwindows-host-goal1(offmain, not merged). The Goal-1 host refactor is functionally complete; what's left is non-blocking — see Remaining (next session).
Why staged (not one big rewrite)
main is at parity and shipping. A monolithic rewrite would put the validated host in a broken
intermediate state for a long window and make a regression impossible to bisect. Each stage below is a
behaviour-preserving transform with its own verification, so a regression is caught at the stage that
introduced it.
Stages (ordered; each = goal · files · risk · verify)
Stage 1 — HostConfig foundation. ✅ DONE (this commit).
config.rs: typed HostConfig parsed ONCE from env (idd_push/encoder_pref/no_helper/force_helper).
Migrated the two highest-churn dispatch reads onto it (encode::windows_resolved_backend,
punktfunk1::should_use_helper). Risk: low (env constant at runtime → identical behaviour). Verify: box
cargo check --features nvenc.
Stage 2 — finish HostConfig + resolve-once. ✅ DONE (this commit).
Migrated 31 genuinely-constant operator/dispatch sites onto HostConfig: idd_push ×7 (the
capture/topology disagreement knob), no_wgc, capture_backend, render_adapter, encoder_pref (Linux),
the Windows vdisplay-backend select, plus the plan-named secure_dda/idd_depth/zerocopy/ten_bit and the
multi-site perf ×4 / compositor ×5 / video_source ×3 / gamepad. Each HostConfig field's parser is
byte-identical to the read it replaced, so old == new by construction (the §1 "flipped bool" guard).
Scope correction (the plan's "~64 sites / Linux XDG+compositor / grep→0" was unsafe as written): two
classes of env::var read are deliberately kept live and documented in config.rs:
- Runtime-mutated session vars. On Linux,
vdisplay::apply_session_envrewrites the process env on every connect (the Bazzite Gaming↔Desktop follow):WAYLAND_DISPLAY,XDG_CURRENT_DESKTOP,XDG_RUNTIME_DIR,DBUS_SESSION_BUS_ADDRESS, and the derivedPUNKTFUNK_INPUT_BACKEND,PUNKTFUNK_GAMESCOPE_SESSION/NODE,PUNKTFUNK_KWIN/MUTTER_VIRTUAL_PRIMARY,PUNKTFUNK_FORCE_SHM. Parse-once would freeze them at startup → silent session-following regression. They are NOT constant. - Single-use local tuning (no resolve-once benefit, call-site-local default/clamp, and
FEC_PCTeven has two different semantics):FEC_PCT,VIDEO_DROP,VBV_FRAMES,SPLIT_ENCODE,PACE_BURST_KB, thecapture/dxgi.rstiming knobs, the*_LIVE/test gates, plus path/dynamic reads (config-dir,PATHsearch, env-forward-to-child).PUNKTFUNK_ZEROCOPYis split on purpose: Windows presence-semantics moved to the field; Linux keeps its own truthy parser.
Risk: medium (semantics-preservation). Verify: Linux cargo check/clippy/fmt green (the Windows-only
edits are 1:1 substitutions, compile-verified on the box as part of Stage 3's build).
Stage 3 — SessionPlan (the single biggest clarity lever, plan §2.4). ✅ DONE (box-build + on-glass validated).
New src/session_plan.rs: a Copy SessionPlan { capture, topology, encoder, bit_depth, hdr } resolved
once from HostConfig (+ the negotiated bit_depth) in virtual_stream, logged, and threaded through
build_pipeline_with_retry/build_pipeline. The three dispatch points now read it:
- capture —
capture::capture_virtual_outputtakes aCaptureBackendIN (was re-deriving fromconfig().idd_push/capture_backend/no_wgc);CaptureBackend::resolve()is the one resolver (also used by the GameStream + spike call sites). - topology —
virtual_streamreadsplan.topology(should_use_helperdeleted; its logic issession_plan::resolve_topology, verbatim). The IDD-preempt guard readsplan.capturetoo. - encoder — recorded as
EncoderBackendfromencode::windows_resolved_backend(config-backed + GPU-vendor cached since stage 2, already a single source). Threadingencoder/input_formatinto the encoder + capturer opens (which removes thedxgi.rsback-reference) is stage 5.
Every decision is provably equivalent to the pre-stage-3 scattered reads (same config() + cached probes),
so it is behavior-preserving. Risk: medium-high (rewires the deployed decision). Verify:
- Box build ✅ —
cargo check -p punktfunk-host --features nvenc(the deployed config: NVENC SDK +cudarc+encode/nvenc.rs) is clean, zero warnings, on the RTX box (192.168.1.173), in an isolated worktree. This also covers stage 2's Windows-only edits (their first real Windows compile). - On-glass ✅ — deployed my Stage-3 host into the SCM service (Session-1 launch, the real IDD-push
environment) on the RTX box and drove a
punktfunk-probeloopback session. The host loggedresolved session plan { capture: IddPush, topology: SingleProcess, encoder: Nvenc, bit_depth: 8, hdr: false }— the correct resolution for the deployed config (IDD_PUSH + VDISPLAY=pf + nvenc) — and routed correctly (IDD-push capturer → shared ring → IDD→DDA fallback). This box has a pre-existing hybrid-GPU IDD render-adapter mismatch (driver renders on the iGPUaf4825, host ring on the 4090294d29) that yielded no published frame in this loopback scenario; an A/B against the shipping binary reproduced the identicalframes=0, proving the no-frame is environmental, not a Stage-3 regression. Stage 3 is behavior-equivalent to the shipping host. Box restored to its deployed state.
Stage 4 — SessionContext (the arg-bundling). ✅ DONE (box-build validated). SessionFactory/Session::drop deferred to §2.5 — see below.
Bundled the 13-positional-argument #[allow(too_many_arguments)] session entry (virtual_stream and
virtual_stream_relay) into one owned SessionContext struct, moved into the stream thread. The receivers
move in (virtual_stream is their only consumer), retiring the &Receiver borrow plumbing. Behavior-
identical by construction: each function destructures the context into the same local names at the top, so
the ~400-line loop bodies are byte-for-byte unchanged. Removed both #[allow(too_many_arguments)] attrs.
Scoped deliberately. The plan's SessionFactory.build() owning a vdm.lease(mode) → open_capturer → open_encoder → spawn RAII chain with Session::drop as the only teardown is coupled to §2.5's
ownership-model rewrite — it needs a host-side VirtualDisplayManager/MonitorLease that does not exist
yet (the lifecycle still lives in the CURRENT_MON_GEN/IDD_SETUP_LOCK globals + the per-compositor
vdisplay backends). The current teardown is already drop-based (the capturer owns the keepalive whose
Drop releases the monitor — "restore displays before REMOVE" lives there; only send_thread.join() is
explicit), and it is the validated shipping path. Wrapping the deployed reconfig/switch/rebuild loop in a
Session::drop for a behavior-preserving change would add real regression risk for marginal gain. So the
SessionFactory/Session::drop/vdm.lease work is folded into §2.5 (its natural home); this stage delivers
the concrete, safe arg-bundling. Risk: low (behavior-identical). Verify: Linux + box build (the relay
destructure is the only Windows-only piece); the teardown on-glass gate moves to the §2.5 work.
Stage 5 — seam-trait tightenings (plan §2.3). ✅ Tightening 1 + 3 DONE; 2 folded into §2.5. The three §2.3 tightenings have different coupling, so they split:
- (1)
OutputFormatinto the capturer ✅ — the headline (the explicit Stage-3 deferral; §5's "highest-severity coupling"). Newcapture::OutputFormat { gpu, hdr }, resolved once per session and passed intocapture_virtual_output(SessionPlan::output_format()for the native path —gpu = encoder.is_gpu(), no second probe;OutputFormat::resolve()for the GameStream/spike paths).dxgi::DuplCapturer::opentakesgpuin and itswindows_resolved_backend()recompute is deleted — capture no longer re-derives the encode backend. Behavior-preserving (thegpupassed in equals the value the capturer used to compute). Linux + box-build clean. - (2) HDR/release →
VirtualLease— moved to §2.5.await_releasedas a lease method needs the monitor-generation carried on the lease (today it's theCURRENT_MON_GENglobal + thesudovda::wait_for_monitor_releasedfree fn), and the keepalive becomingBox<dyn VirtualLease>is the ownership-model change. It belongs with theVirtualDisplayManager/MonitorLeasework, not bolted on here. - (3)
EncoderCaps✅ (0ccd0fe) —Encoder::caps() -> EncoderCaps { supports_rfi, supports_hdr_metadata }, a default-falsequery (so every SDR/libavcodec backend — Linux NVENC, VAAPI, AMF/QSV, software — is unchanged); only the Windows direct-NVENC path overrides it, reporting the realrfi_supported(probed once at open) +hdr. Consumer: the GameStream encode loop hoistssupports_rfionce and gates the loss-recovery path on it —!(supports_rfi && invalidate_ref_frames(..))forces a keyframe directly on non-RFI encoders instead of an always-falsecall every loss event (behavior-preserving, intent explicit). Linuxclippy -D warningsclean; the NVENC override is Windows-only → CI/on-glass gate.
Risk: medium (Tightening 1 is behavior-preserving + Windows-only → box-compile is the gate; on-glass parity is the same env-limited story as Stage 3).
Stage 6 — windows/ + linux/ tree confinement (cfg-sprawl, plan §2.2). ✅ DONE (Linux + box-build validated).
Moved 36 platform-specific files into per-module windows/ and linux/ folders (and the shared HID
codecs into inject/proto/): capture/{windows,linux}/, encode/{windows,linux}/,
inject/{windows,linux,proto}/, audio/{windows,linux}/, vdisplay/{windows,linux}/, and the top-level
src/windows/ (service, wgc_helper, win_adapter, win_display) + src/linux/ (dmabuf_fence, drm_sync,
zerocopy/).
Done with #[path], not a module rename — every file moves into its folder while the crate::*::* module
names stay flat, so all caller paths and every internal super::/crate:: reference are unchanged
(only the parent mod decls gained #[path = "…"]). This is the codebase's existing pattern (inject's
gamepad_windows) and makes the move byte-identical in behaviour with zero reference churn — far lower
risk than collapsing to a single crate::capture::windows:: namespace (that deeper rename is an optional
follow-on; this delivers the folder confinement the stage is about). Done LAST, after the semantic stages.
Verify: Linux cargo check/clippy/fmt clean; all 36 #[path] targets exist; no internal
#[path]/include!/file-child-mod in any moved file; box cargo check --features nvenc clean.
§2.5 — ownership-model rewrite: VirtualDisplayManager + MonitorLease. ✅ DONE (3 steps; code + box + on-glass reconnect-leak validated).
The natural home for the deferrals above (Stage 4's SessionFactory/Session::drop/vdm.lease; Stage 5
tightening 2's HDR/release → VirtualLease). A 5-agent map first established two facts that shaped the work:
CURRENT_MON_GEN was WRITE-ONLY (its only reader, idd_push::my_gen, was set-but-never-read — the
"per-frame monitor-gen bail" the docs describe was never wired; per-frame staleness is the separate ring
FrameToken.generation), so the design's "carry the monitor gen through WinCaptureTarget" was unnecessary;
and the two Windows backends (sudovda + pf_vdisplay) duplicated the Idle/Active/Lingering refcount
state machine verbatim (differing only in IOCTL proto + REMOVE key). User-approved shape: one OnceLock
singleton VirtualDisplayManager, not a threaded Arc.
- Step 1 (
1520201) — delete the dead/write-only code:CURRENT_MON_GEN/my_gen,IDD_PERSIST/open_or_reuse/IddReuseHandle(~150 lines). - Step 2 (
d9b8b88) — newvdisplay/windows/manager.rs: the two duplicatedMGR: Mutex<Mgr>globals collapse into one OnceLockVirtualDisplayManager{Box<dyn VdisplayDriver>,Arc<OwnedHandle>device (typed — kills the raw-isizecross-thread smuggle and fixes a latent control-handle leak),Mutex<MgrState>,AtomicU64gen }.sudovda/pf_vdisplayshrink to thinVdisplayDriverimpls (open/add_monitor/remove_monitor/ping) + thinVirtualDisplaywrappers; the IOCTL surface is the only backend-specific code left.MonitorKey = Guid(GUID) | Session(u64).MON_GEN+wait_for_monitor_releasedmove onto the manager;MonitorLease::drop → vdm().release(gen)preserves the stale-lease no-op verbatim. - Step 3 (
fe61597) — the last two globals (IDD_SETUP_LOCK/IDD_SESSION_STOP) move onto the manager behindvdm().begin_idd_setup(stop);punktfunk1no longer reaches into vdisplay internals for the preempt.
Net: CURRENT_MON_GEN / MON_GEN / two MGR / IDD_PERSIST / IDD_SETUP_LOCK / IDD_SESSION_STOP —
all gone, replaced by one encapsulated, typed manager. Behavior-preserving (the state machine is the
canonical sudovda copy routed through the driver seam).
On-glass reconnect-leak test ✅ (683c81b) — it earned its keep: the box compile was clean, but the
first deploy panicked (VirtualDisplayManager used before a backend initialised it) because
begin_idd_setup called vdm() before vdisplay::open constructs the backend that runs
manager::init() (the old globals needed no init, so the ordering only broke once it became a manager
method). Fixed by opening the backend first — it does no monitor work, so the preempt-before-monitor-creation
semantics are preserved. After the fix: 0 panics, the new manager module owns the lifecycle
(vdisplay::manager: virtual-display monitor removed), create == removed (net 0, bounded), 0 leaked
active monitors across many reconnects; an A/B vs the shipping binary confirmed §2.5 is behaviour-equivalent.
Verified live on the IDD-push zero-copy path (new_fps ~200 @5120×1440@240, 0 DDA fallbacks).
Remaining (next session)
The layered architecture is complete: all 6 staged stages, §2.5, and the three Stage-5 seam-trait tightenings have landed. What's left is non-blocking:
- Optional
crate::*::windows::namespace collapse — Stage 6 confined the platform files intowindows//linux/folders via#[path](flat module names, zero reference churn); the deeper rename to realcrate::capture::windows::paths is optional cleanup, not required. Decision: skip — pure churn for no behavior/clarity gain, and it would touch everysuper::/crate::path in the moved files. - Merge
windows-host-goal1→main— the branch is offmainand not merged; localmainis also ~20 commits ahead oforigin/mainwith unpushed audit/Stage work. Outward-facing (a 30+-commit push toorigin) → confirm with the user before landing; the [Work-on-main] habit otherwise applies. - (driver — NOT the host refactor) pf-vdisplay slot reclaim — surfaced on-glass: sustained ADD/REMOVE
churn wedges the driver (
ADD → 0x80070490 ERROR_NOT_FOUND) because it doesn't reclaim IddCx monitor slots on REMOVE (ghost monitor nodes accumulate,target_ids climb). Recovery today ispackaging/windows/reset-pf-vdisplay.ps1; the real fix lives in the driver WIP. This belongs to the greenfield driver rewrite, not the Goal-1 host refactor — tracked inwindows-host-rewrite.md(alongside the fullscreen-game capture bug). Dev-iteration helpersreset-pf-vdisplay.ps1+redeploy-pf-vdisplay.ps1are committed underpackaging/windows/(validated live).
Guardrails (mandatory, plan §14)
- Each stage is its own commit; box-verify before moving on.
- Stages 3–5 touch the deployed path → on-glass re-test (NVENC + IDD-push, a mode switch, a connect/disconnect cycle) before the next stage.
- Preserve every
PUNKTFUNK_*var's exact semantics; when in doubt, assert old==new at the call site.