punktfunk/design/windows-host-rewrite.md
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00


Windows Host — Architecture, Status & Roadmap

Single source of truth for the punktfunk Windows streaming host: the all-Rust pf-vdisplay IddCx virtual-display driver + IDD-push zero-copy capture + NVENC/AMF/QSV encode, shipped as a signed Inno Setup installer with a LocalSystem SCM service. Live-validated on the RTX box through 5120×1440@240 HDR, the secure desktop (lock/UAC), and a fullscreen game.

This file is the consolidated Windows-host doc — it absorbs the rewrite design plan, the Goal-1 staged-refactor plan, the audit + remediation tracker, the fullscreen-game capture-bug analysis, and the durable rationale from the original windows-host.md implementation plan (now a stub). Last updated 2026-06-26. All of this work is merged to main (the windows-host-goal1 branch landed at 3e7c9bd).


1. Status at a glance

Goals 1–3 and milestones M0–M4 are complete and merged to main. The host has a clean, typed, layered architecture (HostConfig → SessionPlan → SessionContext, windows/+linux/ confinement, a single VirtualDisplayManager ownership model, EncoderCaps); the all-Rust IddCx pf-vdisplay driver loads self-signed under Secure Boot and does IDD-push zero-copy capture at 5K@240 HDR including the secure desktop (Winlogon/UAC/lock); SudoVDA is gone (84a3b95) — pf-vdisplay is the sole virtual-display backend; and the three UMDF drivers (pf-vdisplay, pf-dualsense, pf-xusb) now build from source in one unified packaging/windows/drivers/ workspace (M4, 92e6802). The shipped path (IDD-push + NVENC) is live-validated on glass; the AMF/QSV encode path is CI-green but not yet validated on hardware (no AMD/Intel Windows box in the lab).

Ground the details against the code: crates/punktfunk-host/src/windows/, crates/punktfunk-host/src/{capture,encode,inject,audio,vdisplay}/windows/, and packaging/windows/drivers/.

What remains (all non-blocking): the pf-vdisplay slot-reclaim-on-REMOVE fix needs an on-glass reconnect-storm A/B (§4 P1.3); host-crate unsafe lint hygiene + old-monolith / bring-up-scaffolding cleanup (§4 P2); and the hardware-gated items — AMF/QSV on-glass, hybrid-GPU SET_RENDER_ADAPTER, the WGC/DDA fallback reshape, and true max_concurrent>1 (§4 P3). One framing note: the host was not greenfield-rebuilt — it was refactored in place via a staged, behavior-preserving sequence that kept the live host working at every step; only the driver was rebuilt fresh.


2. Architecture (what is on disk)

A ~1-page map; the empirical constraints these encode are in §3, the deep reference is in §6.

2.1 Layering & crates

  • crates/punktfunk-host — one shared host crate (Linux + Windows; not split). Platform code is confined under per-module windows/+linux/ folders behind #[cfg] seams (capture/{windows,linux}/, encode/…, inject/…, audio/…, vdisplay/…, plus top-level src/windows/+src/linux/). Module names stay flat (#[path]), so caller paths are platform-agnostic.
  • crates/punktfunk-core — the one linked protocol/FEC/crypto/QUIC core (unchanged here).
  • crates/pf-driver-proto — the owned, no_std host↔driver ABI (frame ring + control plane + gamepad SHM), consumed by both the host crate and the driver workspace.
  • packaging/windows/drivers/ — the unified driver workspace on microsoft/windows-drivers-rs (vendored 0.5.1 + an iddcx subset): pf-vdisplay (the IddCx display driver), pf-dualsense + pf-xusb (the gamepad drivers, folded in by M4), wdk-iddcx (typed IddCx DDI wrappers), wdk-probe (the CI link/surface gate), vendor/{wdk-build,wdk-sys}.

2.2 Session resolution, ownership, and seam traits (Goal-1)

The old ~40-knob PUNKTFUNK_* env soup (re-read and recomputed in three places) is replaced by a resolve-once pipeline: config.rs HostConfig (typed, parsed once) → session_plan.rs SessionPlan (a Copy plan resolved once per session — CaptureBackend::resolve() picks IddPush | Dda | Wgc, resolve_topology picks SingleProcess | TwoProcessRelay; this killed the latent capture/encode backend-disagreement bug) → SessionContext (bundles the ~13 session args + plane receivers, moved into the stream thread).
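A minimal sketch of the resolve-once idea, not the code in config.rs or session_plan.rs. The field names, the parse signature, and the resolution order below (IDD-push first, then pure DDA when PUNKTFUNK_NO_WGC=1, else WGC) are assumptions made for illustration; only the type names and the env knob names come from this document.

```rust
use std::collections::HashMap;

/// The three capture backends named in the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CaptureBackend {
    IddPush,
    Dda,
    Wgc,
}

/// Typed configuration, parsed exactly once (hypothetical fields).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct HostConfig {
    pub idd_push: bool,
    pub no_wgc: bool,
}

impl HostConfig {
    /// Parse from an env-like map so the sketch stays testable.
    pub fn parse(env: &HashMap<String, String>) -> Self {
        let flag = |key: &str| env.get(key).map(|v| v == "1").unwrap_or(false);
        HostConfig {
            idd_push: flag("PUNKTFUNK_IDD_PUSH"),
            no_wgc: flag("PUNKTFUNK_NO_WGC"),
        }
    }
}

impl CaptureBackend {
    /// Assumed resolution order; the real resolve() may weigh more inputs.
    pub fn resolve(cfg: &HostConfig) -> Self {
        if cfg.idd_push {
            CaptureBackend::IddPush
        } else if cfg.no_wgc {
            CaptureBackend::Dda
        } else {
            CaptureBackend::Wgc
        }
    }
}

/// A Copy plan: every consumer reads the same already-made decision.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SessionPlan {
    pub capture: CaptureBackend,
}

impl SessionPlan {
    pub fn resolve(cfg: &HostConfig) -> Self {
        SessionPlan { capture: CaptureBackend::resolve(cfg) }
    }
}
```

Because capture and encode both read the one SessionPlan value, they cannot disagree about the backend the way two independent env reads could.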

Ownership is a single OnceLock VirtualDisplayManager (vdisplay/windows/manager.rs) owning a typed Arc<OwnedHandle> control-device handle (no raw-isize cross-thread smuggle), a refcounted Idle/Active/Lingering state machine, and the monitor generation; a per-session MonitorLease's Drop releases the refcount (a stale lease can't tear down a fresh monitor). This deleted a fistful of CURRENT_MON_GEN/MGR/IDD_* globals and validated on glass at 0 leaked monitors across a reconnect storm, A/B-equivalent to the shipping host.
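The generation-stamped lease can be illustrated with a small model. This is a sketch under stated assumptions: Manager, force_reset, and the exact transitions are hypothetical simplifications (the real manager is a OnceLock singleton that also owns the control-device handle and a linger timer), and only the Idle/Active/Lingering states and the "Drop releases the refcount" rule come from the text.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Idle,
    Active { refs: u32 },
    Lingering,
}

struct Inner {
    state: State,
    generation: u64,
}

/// Hypothetical stand-in for the VirtualDisplayManager state machine.
#[derive(Clone)]
pub struct Manager {
    inner: Arc<Mutex<Inner>>,
}

/// A per-session lease stamped with the monitor generation it was issued for.
pub struct MonitorLease {
    mgr: Manager,
    generation: u64,
}

impl Manager {
    pub fn new() -> Self {
        Manager { inner: Arc::new(Mutex::new(Inner { state: State::Idle, generation: 0 })) }
    }

    pub fn acquire(&self) -> MonitorLease {
        let mut g = self.inner.lock().unwrap();
        let current = g.state;
        let next = match current {
            // A fresh monitor starts a new generation.
            State::Idle => {
                g.generation += 1;
                State::Active { refs: 1 }
            }
            // A lingering monitor is reused without a new generation.
            State::Lingering => State::Active { refs: 1 },
            State::Active { refs } => State::Active { refs: refs + 1 },
        };
        g.state = next;
        let generation = g.generation;
        drop(g);
        MonitorLease { mgr: self.clone(), generation }
    }

    /// Models a teardown that happens behind outstanding leases.
    pub fn force_reset(&self) {
        self.inner.lock().unwrap().state = State::Idle;
    }

    pub fn state(&self) -> State {
        self.inner.lock().unwrap().state
    }
}

impl Drop for MonitorLease {
    fn drop(&mut self) {
        let mut g = self.mgr.inner.lock().unwrap();
        if g.generation != self.generation {
            return; // stale lease: must not touch the fresh monitor
        }
        let current = g.state;
        if let State::Active { refs } = current {
            g.state = if refs > 1 { State::Active { refs: refs - 1 } } else { State::Lingering };
        }
    }
}
```

The generation check in Drop is the whole point: a lease that outlived its monitor becomes a no-op instead of decrementing someone else's refcount.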

The seam traits (VirtualDisplay/VirtualOutput/VirtualLease, Capturer, Encoder, AudioCapturer/VirtualMic/InputInjector/PadManager) got two tightenings: the capturer takes the desired OutputFormat { gpu, hdr } in (killing the capture → encode::windows_resolved_backend() back-reference), and Encoder::caps() -> EncoderCaps (§2.4) lets the session glue route loss-recovery by query.

2.3 Capture — IDD-push primary (normal and secure desktop), WGC/DDA fallback, GB1 recovery

IDD-push is the universal primary path. Capture comes straight from the driver's shared keyed-mutex texture ring (capture/windows/idd_push.rs) — no Desktop Duplication, no win32u reparenting hook. The host creates the ring; the driver opens it (permissive D:(A;;GA;;;WD) SDDL). The generation-tagged latest = gen<<40 | seq<<8 | slot stale-ring reject kills the HDR-flip garbage frame; a host-owned 3-slot OUT_RING rotated per frame is the texture-ownership contract that enables pipeline_depth=2 (convert/copy on the 3D engine overlapping NVENC on the ASIC). It captures the secure desktop (Winlogon/UAC/lock) directly (validated 2026-06-25), so there is no separate secure capturer in the primary path.
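To make the latest = gen<<40 | seq<<8 | slot packing concrete, here is a small sketch. The shifts come from the text; the field widths (8-bit slot, 32-bit seq, 24-bit generation) are inferred from those shifts, and the function names are hypothetical rather than the ones in idd_push.rs or pf-driver-proto.

```rust
/// Pack the ring's "latest" word: generation in bits 40..64,
/// sequence in bits 8..40, slot index in bits 0..8 (widths assumed).
pub fn pack_latest(generation: u32, seq: u32, slot: u8) -> u64 {
    ((generation as u64 & 0xFF_FFFF) << 40) | ((seq as u64) << 8) | slot as u64
}

/// Split a "latest" word back into (generation, seq, slot).
pub fn unpack_latest(word: u64) -> (u32, u32, u8) {
    let generation = (word >> 40) as u32;
    let seq = ((word >> 8) & 0xFFFF_FFFF) as u32;
    let slot = (word & 0xFF) as u8;
    (generation, seq, slot)
}

/// Stale-ring reject: a frame published against an older ring generation
/// (for example, one from before an HDR flip recreated the ring) is ignored.
pub fn accept(word: u64, current_generation: u32) -> Option<(u32, u8)> {
    let (generation, seq, slot) = unpack_latest(word);
    if generation == (current_generation & 0xFF_FFFF) {
        Some((seq, slot))
    } else {
        None
    }
}
```

Because the whole tuple fits in one 64-bit word, a reader can observe generation, sequence, and slot in a single atomic load, which is presumably why the fields are packed rather than stored separately.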

  • Open-time fallback: IddPushCapturer::open waits a bounded ~4 s for a first frame (not just DRV_STATUS_OPENED); on attach failure it returns the keepalive back so capture.rs opens DDA on the same WinCaptureTarget — never a 20 s black bail (ed58365/f98ab07).
  • Mid-session game mode-set recovery (GB1, fixed): the 250 ms poll follows the display's actual resolution (win_display::active_resolution, CCD/GDI) and recreates the ring on any descriptor change (size or HDR) → the driver re-attaches → frames resume at the game's mode, no reconnect. If a change is unrecoverable (e.g. an exclusive flip), a recovering_since clock drops the session after 3 s so the client reconnects cleanly. No protocol bump was needed — the host reads the resolution straight from Windows (c87bfe0; the driver's publish() width/height guard + flushed log is 789ad49).
  • WGC + DDA stay as demoted fallbacks for non-IddCx hardware (wgc.rs/dxgi.rs). The two-process WGC secure-desktop relay (wgc_relay.rs) is no longer load-bearing now that IDD-push handles the secure desktop; it is kept recoverable but slated for M5/M6 cleanup. (Its constraint analysis is archived in archive/windows-secure-desktop.md.)
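The recovering_since clock from the GB1 bullet above can be sketched as a tiny state holder. The 3 s budget and the recovering_since name come from the text; RecoveryClock, Verdict, and on_poll are hypothetical names, and taking the current time as a parameter is a choice made here so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Budget after which an unrecoverable mode change drops the session.
pub const RECOVERY_BUDGET: Duration = Duration::from_secs(3);

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Streaming,
    Recovering,
    DropSession,
}

/// Tracks how long frames have been missing (hypothetical helper).
#[derive(Default)]
pub struct RecoveryClock {
    recovering_since: Option<Instant>,
}

impl RecoveryClock {
    /// Called from the periodic poll with whether frames are flowing.
    pub fn on_poll(&mut self, frames_flowing: bool, now: Instant) -> Verdict {
        if frames_flowing {
            // Any delivered frame clears the clock.
            self.recovering_since = None;
            return Verdict::Streaming;
        }
        // Start the clock on the first poll without frames.
        let since = *self.recovering_since.get_or_insert(now);
        if now.duration_since(since) >= RECOVERY_BUDGET {
            Verdict::DropSession
        } else {
            Verdict::Recovering
        }
    }
}
```

Dropping the session after the budget lets the client reconnect cleanly rather than sit on a frozen stream.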

2.4 Encode — NVENC / AMF / QSV / software; EncoderCaps; HDR

encode/windows/ dispatches per DXGI adapter vendor (open_video): NVENC (NVIDIA, direct SDK, nvenc.rs — caps-probe-before-configure, bitrate-clamp binary search, true RFI over the DPB, in-band ST.2086/CLL SEI), AMF/QSV (AMD/Intel via libavcodec, ffmpeg_win.rs — system-readback default, opt-in zero-copy D3D11; CI-only, no lab hardware), or software H.264 (sw.rs). HDR (10-bit) forces HEVC Main10 + BT.2020 PQ; the client auto-detects PQ from the VUI. The encoder adapts to a mid-session size/format/HDR change per frame (tears down + re-inits), so the GB1 capturer's resolution changes are handled downstream with no API change.

Encoder::caps() -> EncoderCaps { supports_rfi, supports_hdr_metadata } lets the session glue route loss-recovery by query (only Windows direct-NVENC overrides it; the GameStream loop gates the RFI path on supports_rfi rather than hard-coding per-backend knowledge into the glue).
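A hedged sketch of capability routing by query. EncoderCaps and its two fields are from the text; the trait's default method, the two stand-in encoder types, and the keyframe fallback are assumptions for illustration, not the host's actual trait or glue.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct EncoderCaps {
    pub supports_rfi: bool,
    pub supports_hdr_metadata: bool,
}

pub trait Encoder {
    /// Conservative default: a backend that says nothing supports nothing.
    fn caps(&self) -> EncoderCaps {
        EncoderCaps::default()
    }
}

/// Stand-in for the direct-SDK NVENC backend (the only one that overrides).
pub struct DirectNvenc;
/// Stand-in for a backend that keeps the default caps.
pub struct SoftwareH264;

impl Encoder for DirectNvenc {
    fn caps(&self) -> EncoderCaps {
        EncoderCaps { supports_rfi: true, supports_hdr_metadata: true }
    }
}

impl Encoder for SoftwareH264 {}

#[derive(Debug, PartialEq, Eq)]
pub enum LossRecovery {
    InvalidateReferenceFrames,
    /// Assumed fallback when RFI is unavailable.
    RequestKeyframe,
}

/// The glue asks the encoder instead of matching on a backend name.
pub fn route_loss_recovery(encoder: &dyn Encoder) -> LossRecovery {
    if encoder.caps().supports_rfi {
        LossRecovery::InvalidateReferenceFrames
    } else {
        LossRecovery::RequestKeyframe
    }
}
```

A new backend then opts into RFI by overriding caps(), with no change to the session glue.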

2.5 Host↔driver ABI & the pf-vdisplay driver

pf-driver-proto is one no_std crate in both build graphs. It owns the frame plane (FrameToken + Global\pfvd-* names), the control plane (a fresh interface GUID — not SudoVDA's e5bcc234; contiguous 0x900 IOCTL ops; a GET_INFO version handshake the host asserts + bails on mismatch), and the gamepad SHM (XusbShm/PadShm incl. device_type). bytemuck-Pod + size_of and offset_of! asserts make ABI drift a compile error.
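The "ABI drift is a compile error" technique can be shown with the standard library alone. The struct below is a made-up layout, not the real PadShm; only the idea (repr(C) plus const size_of/offset_of! assertions) reflects the text, and the real crate additionally derives bytemuck's Pod, which is omitted here to keep the sketch dependency-free.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical shared-memory record; field names and sizes are invented.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct PadShmSketch {
    pub magic: u32,
    pub version: u16,
    pub device_type: u8,
    pub _pad: u8,
    pub buttons: u32,
}

// Evaluated at compile time: reordering or resizing a field on either side
// of the ABI breaks the build instead of corrupting shared memory.
const _: () = assert!(size_of::<PadShmSketch>() == 12);
const _: () = assert!(align_of::<PadShmSketch>() == 4);
const _: () = assert!(offset_of!(PadShmSketch, device_type) == 6);
const _: () = assert!(offset_of!(PadShmSketch, buttons) == 8);

/// Runtime view of the same fact, handy in logs or tests.
pub fn device_type_offset() -> usize {
    offset_of!(PadShmSketch, device_type)
}
```

With both the host and the driver depending on the one crate that carries these assertions, a hand-duplicated offset such as device_type cannot silently diverge.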

The driver (packaging/windows/drivers/pf-vdisplay/src/) is an all-Rust UMDF IddCx driver on windows-drivers-rs + the iddcx wdk-sys subset; the STEP 08 build is the checklist in §6.3, its internals are the invariants in §3, and it loads self-signed under Secure Boot (FORCE_INTEGRITY cleared post-link, §6.1). Known gaps: ownership state is still partly process-global with EvtCleanupCallback on the WDFDEVICE (a deliberate, sound choice — E1 in §4); and slot-reclaim-on-REMOVE (§4 P1.3).

2.6 Service, packaging, installer

A LocalSystem SCM supervisor (windows/service.rs) retargets its token to the console session and launches serve there via CreateProcessAsUserW (so SendInput reaches both the streamed and the secure desktop), relaunches on session-change, and kills-on-close via a Job Object — the Sunshine/Apollo model (rationale: windows-service.md). Shipped as a signed Inno Setup setup.exe (packaging/windows/, windows-host.yml) that builds + signs all three drivers from source, bundles them + the FFmpeg DLLs, and delegates to service install. GameStream (Moonlight) is kept, but the installer/service default to secure serve (GameStream opt-in).


3. Validated invariants — preserve, do not regress

These are expensive empirical wins; keep them intact when touching the code:

  • Frame transport: host-creates/driver-opens keyed-mutex ring; generation-tagged stale-ring reject; 0 ms try-acquire / drop-on-full publish (never block the swap-chain thread); the OUT_RING rotation + pipeline_depth=2 overlap; repeat_last rotates into a fresh out-ring slot (depth-safe).
  • Driver internals: edid.rs (128-byte EDID + CTA-861.3 HDR block, dual checksums); the FP16 HDR recipe (CAN_PROCESS_FP16 + the *2 DDIs + gamma/HDR accept-stubs + HIGH_COLOR_SPACE); DEVICE_POOL per render-LUID (NVIDIA UMD/VRAM leak fix); target-id stamped on the monitor context; the two swap-chain leak fixes (borrow IDXGIDevice across SetDevice retries; check terminate at the loop top).
  • Monitor lifecycle: serialized ADD/REMOVE/teardown; restore CCD topology before REMOVE; the generation-stamped lease (a stale lease can't tear down a fresh monitor); 0-leak across reconnects.
  • HDR color math: hdr.rs (pure, unit-tested, ST.2086 + big-endian SEI); the FP16→P010/Rgb10a2 converters + hdr_p010_selftest; the cursor decomposition.
  • NVENC tuning: caps-probe-before-configure (10-bit→8-bit graceful downgrade); bitrate-clamp binary search (each GPU's real ceiling); true RFI over the DPB; CBR / infinite-GOP / P-only / ~1-frame VBV.
  • Gamepad recipe: the SwDeviceCreate identity (enumerator with no _; mandatory completion callback; synthesized DS5 compat-ids; non-null per-pad ContainerId); one pf_dualsense serving DualSense+DS4 via a device_type byte; XUSB declining WAIT_*; per-pad index via pszDeviceLocation.
  • Session glue: the trait seam + RAII keepalive teardown; host-lifetime shared services + per-session gamepads; the encode|send split + microburst pacing; build_pipeline_with_retry permanent-vs-transient classification; the GameStream VideoPacketizer (GF8 Cauchy, Moonlight byte-exact); the pairing/trust handshake.
  • Core discipline: no async on the per-frame path; pf-driver-proto is the single ABI source (drift = compile error); the version handshake the host asserts.
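The bitrate-clamp binary search from the NVENC tuning bullet can be sketched as follows. This assumes acceptance is monotonic (if a bitrate is accepted, every lower one is too) and probes through a caller-supplied closure; the function name, the kbps unit, and the 1 kbps resolution are illustrative choices, not the behavior of nvenc.rs.

```rust
/// Find the highest bitrate in [floor_kbps, requested_kbps] that the encoder
/// accepts. `accepts` stands in for a real configure/probe call.
pub fn clamp_bitrate<F>(floor_kbps: u32, requested_kbps: u32, mut accepts: F) -> Option<u32>
where
    F: FnMut(u32) -> bool,
{
    // Fast path: the requested rate is already under the GPU's ceiling.
    if accepts(requested_kbps) {
        return Some(requested_kbps);
    }
    // Nothing to search if even the floor is rejected.
    if requested_kbps <= floor_kbps || !accepts(floor_kbps) {
        return None;
    }
    // Invariant: `lo` is accepted, `hi` is rejected.
    let (mut lo, mut hi) = (floor_kbps, requested_kbps);
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if accepts(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    Some(lo)
}
```

The search needs about log2(requested - floor) probes, so discovering each GPU's real ceiling costs a handful of configure attempts rather than a hard-coded table.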

4. Open work / next tasks (prioritized)

P1 — ship-readiness / correctness

  1. Goal-1 → main merge — DONE. The windows-host-goal1 branch is merged (tip 3e7c9bd); the full Windows CI matrix (incl. the amf-qsv encode path that local checks skip) runs on push.
  2. IDD-push default — resolved via host.env. The shipped default host.env sets PUNKTFUNK_IDD_PUSH=1, so a fresh install runs the validated IDD-push path (with the WGC/DDA fallback in place). The bare in-code default (config.rs) is still false (the dev / non-pf-driver default); flipping it to follow the deployed default is an optional tidy.
  3. pf-vdisplay slot reclaim on REMOVE (driver robustness) — 🟡 fix landed, on-glass-validation pending. Sustained ADD/REMOVE churn wedged the driver (ADD → 0x80070490 ERROR_NOT_FOUND) because the monitor id (EDID serial / ConnectorIndex / container GUID) was a monotonic NEXT_ID, never reclaimed → IddCx accumulated a new OS target slot per cycle until exhaustion. monitor.rs now allocates the lowest free id (alloc_monitor_id), reused on REMOVE, so a fresh ADD reuses the departed monitor's target slot instead of orphaning it. CI-compile-gated; the wedge only reproduces under sustained churn on the RTX box, so this needs an on-glass reconnect-storm A/B to confirm (the box is ephemeral). Keep packaging/windows/reset-pf-vdisplay.ps1 as the recovery until validated.
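The lowest-free-id allocation described in item 3 can be modeled in a few lines. This is a sketch of the technique only: MonitorIds, the BTreeSet bookkeeping, and ids starting at 0 are assumptions, and the real alloc_monitor_id in monitor.rs may differ in all three.

```rust
use std::collections::BTreeSet;

/// Tracks which monitor ids are currently attached (hypothetical helper).
#[derive(Default)]
pub struct MonitorIds {
    in_use: BTreeSet<u32>,
}

impl MonitorIds {
    /// Return the lowest id not currently in use and mark it used.
    pub fn alloc(&mut self) -> u32 {
        let mut candidate = 0;
        // BTreeSet iterates in ascending order, so the first gap wins.
        for &used in &self.in_use {
            if used != candidate {
                break;
            }
            candidate += 1;
        }
        self.in_use.insert(candidate);
        candidate
    }

    /// Called on REMOVE: the id becomes available to the next ADD.
    pub fn release(&mut self, id: u32) -> bool {
        self.in_use.remove(&id)
    }
}
```

Under ADD/REMOVE churn this keeps handing back the same id, whereas a monotonic counter would hand out a fresh one per cycle, which is the accumulation the text identifies as the cause of the wedge.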

P2 — hygiene / architecture completion

  4. D1-host — host-crate P0 lints — deferred (low value / high churn). A crate-wide #![deny(unsafe_op_in_unsafe_fn)] produced 100+ FFI-wrap sites across the Linux modules; it wraps unsafe (discipline) rather than reducing it and doesn't improve stability, so it was deprioritized vs the OwnedHandle/RAII reductions (which are complete: idd_push.rs, service.rs, the three gamepad backends via a shared gamepad_raii.rs, the SCM STOP/SESSION events as OnceLock<OwnedHandle>, the hot-loop KeyedMutexGuard, and the driver's pod_init!; all box-validated, clean sc stop in ~1 s). The driver already has the deny. Revisit D1-host as a final discipline pass (staged per-module) if desired.
  5. M6 scaffolding cleanup — delete the bring-up diagnostics (spawn_observer/DebugBlock in idd_push.rs) and, once full parity is proven on glass, the host monoliths.

Explicitly NOT doing (stability decision): E1 — driver DeviceContext ownership + per-IDDCX_MONITOR EvtCleanupCallback. The current process-global design is sound: IddCx DDIs receive only an IDDCX_MONITOR handle (never the WDFDEVICE/context), and ProcessSharingDisabled makes one devnode = one host process that dies with the device. A "device-owned" variant would add a use-after-free window (the watchdog races device cleanup) for no gain, and the per-monitor cleanup callback isn't reliably reachable on this UMDF/IddCx stack. Cleanup is already deterministic (WDFDEVICE EvtCleanupCallback + cleanup_for_device_removal + the host-gone watchdog). Revisit only if max_concurrent>1 on Windows is actually needed. (monitor.rs documents this rationale at the MONITOR_MODES static.)

P3 — larger, mostly hardware-gated

  6. M4 — gamepad-driver unification — substantially DONE (92e6802). pf-dualsense (DualSense / DualShock 4) and pf-xusb (Xbox 360 / XInput) now live in the unified packaging/windows/drivers/ workspace and build from source per release against the vendored wdk-sys, exactly like pf-vdisplay; build-gamepad-drivers.ps1 signs them with the shared cert. Remaining: point the driver side at pf_driver_proto::gamepad::{PadShm,XusbShm} (the host side already does — the device_type-at-offset hand-duplication is the last ABI-drift hazard), add WDF device contexts for true multi-pad, and confirm the source build matches the prior shipped binaries.
  7. M5 — reshape WGC/DDA + GameStream onto session/pipeline, then delete the old relay/monoliths. AMF/QSV stays CI-only (no lab hardware).
  8. On-glass behavioral validation of the committed-but-unexercised fixes: the watchdog reaping on host-kill, SET_RENDER_ADAPTER on a hybrid box (the lab box is single-dGPU), the IDD-push→DDA fallback trigger, HDR-ring sizing + out-ring repeat under real HDR/static-desktop pipelining, and the AMF/QSV encode path on real AMD/Intel hardware.


5. Operations

5.1 RTX box on-glass recipe

The persistent on-glass validator is the RTX box (ssh "Enrico Bühler"@<ip>, ENRICOS-DESKTOP, RTX 4090, PS shell). The IP FLOATS (DHCP; boots to Proxmox on reboot → ephemeral, unreachable after a reboot; recently .173/.158 — confirm current first; never reboot it, never depend on it surviving). It has WDK 26100 + LLVM 21.1.2 + the Rust toolchain; build clone at C:\Users\Public\pf-rewrite (the user's active driver-dev tree — don't clobber uncommitted WIP; use a worktree). Username has a ü → quote it; it only breaks SDL3/client builds, not the host. To validate a host branch: worktree-checkout, build with CARGO_TARGET_DIR=C:\t-goal1, then stop the PunktfunkHost service, back up the binary + %ProgramData%\punktfunk\host.env, copy your build in, restart, drive punktfunk-probe.exe loopback, then restore + git worktree remove. Drive over ssh via powershell -EncodedCommand <base64 UTF-16LE> (plain quoting mangles; prefer Write-Output/file-redirect for clean output). Driver redeploy: packaging/windows/redeploy-pf-vdisplay.ps1; ghost-monitor recovery: reset-pf-vdisplay.ps1.
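For the powershell -EncodedCommand step, the argument is the script text encoded as UTF-16LE and then Base64. A dependency-free sketch of that encoding follows; the function names are hypothetical, and the hand-rolled Base64 is there only to keep the example self-contained (a real tool would likely use an existing crate).

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 with '=' padding.
pub fn base64(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Missing trailing bytes are treated as zero and padded with '='.
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { ALPHABET[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Encode a PowerShell script for `powershell -EncodedCommand <value>`.
pub fn encoded_command(script: &str) -> String {
    // Each UTF-16 code unit is written low byte first (UTF-16LE).
    let utf16le: Vec<u8> = script
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect();
    base64(&utf16le)
}
```

Passing the script this way sidesteps the nested ssh/cmd/PowerShell quoting layers entirely, since the payload contains only Base64 characters.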

5.2 CI / validation

The persistent build validator is the windows-amd64 CI runner (no GPU — fine for builds / iddcx link / /INTEGRITYCHECK self-sign / the surface-asserts; live NVENC encode + on-glass defers to the RTX box). Workflows: windows-host.yml (the host installer), windows-drivers.yml (the driver workspace build + FORCE_INTEGRITY clear), windows-drivers-provision.yml (WDK/LLVM toolchain), windows-msix.yml (the client). A single Windows runner serializes the whole fleet; a Cargo.toml touch costs ~25 min of queue, so driver pushes that avoid Cargo.toml skip the fleet serialization.

Local pre-push checks (this Linux box can't compile the Windows paths):

cargo test  -p pf-driver-proto                 # the ABI crate (cross-platform)
cargo check -p punktfunk-host                     # Linux paths; win_* mods are #[cfg(windows)]
cargo clippy -p punktfunk-host --all-targets -- -D warnings
# Windows host clippy (on the box): PUNKTFUNK_NVENC_LIB_DIR=C:\t\nvenc;
#   cargo clippy -p punktfunk-host --features nvenc --target x86_64-pc-windows-msvc -- -D warnings
# Driver build (on the box): cd packaging/windows/drivers; Version_Number=10.0.26100.0;
#   LIBCLANG_PATH='C:\Program Files\LLVM\bin'; cargo build

Note: a pre-existing rustfmt-version drift exists in some Windows-only files (this box's rustfmt 1.9.0 wraps offset_of!/unsafe fn differently than the runner's) — don't reformat unrelated files to chase it.

5.3 Env knobs (Windows host)

PUNKTFUNK_IDD_PUSH=1 (capture from the driver ring; shipped host.env default on, in-code default off), PUNKTFUNK_ENCODER=auto|nvenc (auto → vendor-detect), PUNKTFUNK_10BIT=1 + PUNKTFUNK_HDR_SHADER_P010=1 (HDR), PUNKTFUNK_SECURE_DDA=1, PUNKTFUNK_NO_WGC=1 (pure DDA), PUNKTFUNK_ZEROCOPY=1, PUNKTFUNK_MONITOR_LINGER_MS, PFVD_DEBUG_LOG=1 (driver file log — release builds are silent without it). Config lives in %ProgramData%\punktfunk\host.env; logs in %ProgramData%\punktfunk\logs\host.log.

5.4 Build / deploy / packaging

x64-only by design (no ARM64 NVIDIA driver). The installer is the thin-.iss / fat-binary model delegating to service install; tag host-win-vX.Y.Z. The drivers are built + FORCE_INTEGRITY-cleared + signed + Inf2Cat'd in CI from source. DriverVer must bump on any driver change; create the ROOT devnode via nefcon (devgen is forbidden).


6. Reference (hard-won — keep)

6.1 The /INTEGRITYCHECK answer

wdk-build emits cargo::rustc-cdylib-link-arg=/INTEGRITYCHECK unconditionally (no cfg/env/Config opt-out), so a self-signed driver can't load (CodeIntegrity 3004/3089). The fix: a deterministic, idempotent post-link step packaging/windows/clear-force-integrity.ps1 clears the PE FORCE_INTEGRITY bit (0x0080 @ e_lfanew+0x5e) + verifies (CI-proven 0x01E0 → 0x0160), before signing. Packaging order: cargo build → clear-force-integrity → sign .dll → Inf2Cat → sign .cat. (A public build would use real attestation signing, which satisfies /INTEGRITYCHECK legitimately.)
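The bit-clear itself is simple enough to sketch in Rust over an in-memory image. This is an illustration of the same operation, not a port of clear-force-integrity.ps1: the function name and error strings are invented, and whether the real script also touches the PE checksum is not stated here, so the sketch leaves it alone. The offsets follow the PE layout: e_lfanew lives at 0x3C, and DllCharacteristics sits 0x5E bytes past the PE signature.

```rust
/// IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY.
pub const FORCE_INTEGRITY: u16 = 0x0080;
/// Offset of the e_lfanew field in the DOS header.
const E_LFANEW_OFFSET: usize = 0x3C;
/// Offset of DllCharacteristics relative to the PE signature.
const DLL_CHARACTERISTICS_REL: usize = 0x5E;

/// Clear FORCE_INTEGRITY in place; returns (before, after) for verification.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<(u16, u16), &'static str> {
    if image.len() < 0x40 || !image.starts_with(b"MZ") {
        return Err("not an MZ image");
    }
    let raw = [
        image[E_LFANEW_OFFSET],
        image[E_LFANEW_OFFSET + 1],
        image[E_LFANEW_OFFSET + 2],
        image[E_LFANEW_OFFSET + 3],
    ];
    let e_lfanew = u32::from_le_bytes(raw) as usize;
    if image.get(e_lfanew..e_lfanew + 4) != Some(&b"PE\0\0"[..]) {
        return Err("missing PE signature");
    }
    let off = e_lfanew + DLL_CHARACTERISTICS_REL;
    let field = image.get_mut(off..off + 2).ok_or("truncated optional header")?;
    let before = u16::from_le_bytes([field[0], field[1]]);
    let after = before & !FORCE_INTEGRITY;
    field.copy_from_slice(&after.to_le_bytes());
    Ok((before, after))
}
```

Masking rather than toggling makes the step idempotent: a second run over an already-cleared image reports the same value before and after.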

6.2 The iddcx binding on wdk-sys (the make-or-break — proven, the 6 bindgen knobs)

IddCx DDIs are function-table dispatched (IddFunctions[] indexed by _IDDFUNCENUM::<Name>TableIndex, IddDriverGlobals implicit arg 1) — the same model wdk-sys already implements for WDF. The vendored windows-drivers-rs 0.5.1 (packaging/windows/drivers/vendor/, [patch.crates-io]'d) gets a first-class ApiSubset::Iddcx that bindgens iddcx/1.10/IddCx.h reusing the identical wdk_default(config) baseline (so WDF/DXGI types resolve to, not redefine, wdk-sys's — type-identity by construction). The six knobs generate_iddcx needed (each a real gotcha, all CI-proven):

  1. --language=c++: wdk_default parses C; IddCx.h's IDARG_* typedefs need C++ (else a "must use 'struct' tag" cascade).
  2. -DIDD_STUB — table-dispatch mode; skips IddCxFuncEnum.h's #error IDDCX_VERSION_MAJOR not defined. Do NOT add WDF_STUB (would desync the shared WDF type-identity).
  3. allowlist_recursively(false) + allowlist_file("(?i).*iddcx.*"), full codegen (no .complement()) — emit ONLY IddCx items; WDF/Win types resolve via use crate::types::*.
  4. allowlist_type("_?DXGI_.*" / "IDXGI.*" / "_?OPM_.*" / "_?D3DCOLORVALUE") — emit the non-WDF types wdk-sys doesn't bindgen, locally. The _? is load-bearing (typedef struct _OPM_X {} OPM_X needs the tag AND the alias).
  5. pub type UINT = ::core::ffi::c_uint; in src/iddcx.rs: UINT is absent from crate::types.
  6. translate_enum_integer_types(true) — emit native u32 reprs for the DXGI/OPM ModuleConsts enums (nested modules can't see a parent UINT).

Wrapper note: table dispatch via _IDDFUNCENUM::<Name>TableIndex as usize (the ModuleConsts const, not a NewType .0); NTSTATUS is plain i32 (wdk_sys::NT_SUCCESS). The driver build.rs adds the IddCxStub link-search (the import lib is under iddcx\1.0\ even though headers are 1.10) + #[no_mangle] pub static IddMinimumVersionRequired: ULONG = 4. The versioned IDD_STRUCTURE_SIZE! path is dropped — the WDK links the iddcx 1.0 stub (lacks the version table); we target 1.10 vs a current framework, so size_of is exactly correct.

6.3 Driver port checklist (STEP 08, as landed)

  1. workspace pf-vdisplay(cdylib)+wdk-iddcx; prove std::thread+OwnedHandle link under UMDF (done).
  2. wdk-iddcx: 11 typed DDI wrappers via one dispatch macro + re-export the inbound PFN_* types.
  3. DriverEntry + IDD_CX_CLIENT_CONFIG (15 callbacks) + DeviceInitConfig + WdfDeviceCreate + CreateDeviceInterface (the owned pf GUID) + DeviceInitialize; edid.rs salvaged verbatim.
  4. DeviceContext + WDF_DECLARE_CONTEXT_TYPE blob; init_adapter in D0Entry (caps + FP16) → AdapterInitAsync; the *2 mode DDIs + query_target_info + gamma/HDR accept-stubs. (Box gate: loads under Secure Boot, enumerates as an IddCx adapter, Status OK.)
  5. control plane (GET_INFO version handshake the host asserts, ADD/REMOVE/SET_RENDER_ADAPTER/PING/ CLEAR_ALL) + create_monitor + real mode DDIs + watchdog + mode bounds; host switched to pf_driver_proto.
  6. Direct3DDevice + assign/unassign + SwapChainProcessor (worker, SetDevice 60×@50 ms single-borrow retry, top-of-loop terminate, ReleaseAndAcquireBuffer2, from_raw_borrowed).
  7. FramePublisher on pf_driver_proto::frame + keyed-mutex RAII guard; wire into run_core. (Box: full IDD-push glass-to-glass + the secure-desktop gate — validated 2026-06-25.)
  8. HDR / FP16 ring (validated: Mac connects WITH HDR).
  9. its own .inx + an unsafe-reduction pass (deny(unsafe_op_in_unsafe_fn), per-site // SAFETY:).

Remaining driver work beyond STEP 8: E1 (DeviceContext-owned state + per-IDDCX_MONITOR EvtCleanupCallback → unblock max_concurrent>1 — see §4 for why it's deliberately deferred), the slot-reclaim-on-REMOVE fix (§4 P1.3), and folding the gamepad-driver side onto pf_driver_proto (M4 tail, §4 P3).

6.4 Resolved product decisions (the five forks)

  • A: the host was refactored in place (staged, behavior-preserving), not greenfield-rebuilt — the driver was rebuilt fresh.
  • B: IDD-push primary for everything incl. the secure desktop (validated); WGC+DDA demoted to non-IddCx fallbacks.
  • C: all drivers on microsoft/windows-drivers-rs (+ the iddcx subset; /INTEGRITYCHECK solved) — done for pf-vdisplay and now for the gamepad drivers (M4, 92e6802).
  • D: keep GameStream (Moonlight), default to secure serve.
  • E: concurrent sessions. The host-side preempt dance was removed by the ownership-model work, but true max_concurrent>1 on Windows stays blocked on the E1 driver swap-chain-reuse work (deliberately deferred, §4).
  • Rejected: DeviceContext-per-monitor ownership — see the E1 stability decision in §4 (it would add a use-after-free window for no gain under ProcessSharingDisabled).


Origins & design rationale (from the original plan)

This folds in the durable rationale from the original Windows host + client plan (windows-host.md, now a stub; full original text in git history). The Windows host began (2026-06-10 to 2026-06-14) as an "add backends behind the existing traits" job, not a parallel port — punktfunk-core and the whole control plane are platform-agnostic, and the host already compiled on non-Linux (macOS) thanks to existing cfg(target_os) gating. These framing decisions shaped what shipped and still explain why the code is the way it is:

  • Build order: host-first. A user preference (the research had recommended client-first, since the client is not blocked by the no-GPU problem and becomes the host's test endpoint). The trade-off held — the GPU-gated steps were the only ones that stalled without a GPU.
  • Trait-based abstraction → ~95% reuse. punktfunk-core (protocol/FEC/crypto/session/transport/QUIC/ C ABI), the GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet), the management REST API + native_pairing/discovery, and the punktfunk1/spike/pipeline orchestration all carried over unchanged — only the OS-touching backends behind Capturer/Encoder/VirtualDisplay/InputInjector/ AudioCapturer/VirtualMic are new #[cfg(windows)] code. Getting to MSVC needed only ~3 cfg-gates (gate the std::os::fd/OwnedFd unix-isms in main.rs/vdisplay.rs).
  • The no-GPU dev strategy. Most of the port was built + validated on a GPU-less Windows VM: the MSVC compile, the virtual-display control path (WARP), the openh264 software-encode pipeline (full capture→encode→FEC→UDP transport minus HW), SendInput injection + interactive-session/desktop-reattach, gamepad + rumble, and the entire client (software-decode loopback). Only NVENC-D3D11 zero-copy, the DDA-vs-WGC bake-off, split-encode/bitrate-ceiling, and all glass-to-glass numbers deferred to a real NVIDIA box (no perf claim transfers from Linux).
  • Windows-specific structural issues (no Linux precedent) — these are the gotchas that drove the service + capture design and remain true:
    • Interactive session, not a Session-0 service. SendInput can't reach the desktop from Session 0; Desktop Duplication / capture need the interactive session. Hence the SYSTEM-in-interactive-session supervisor (§2.6, windows-service.md) and the OpenInputDesktop/ SetThreadDesktop re-attach to survive UAC/lock desktop switches.
    • Clock epoch. The skew handshake assumes both ends read the same realtime epoch in ns — the Windows host must emit timestamps from GetSystemTimePreciseAsFileTime→Unix-epoch-ns, or cross-machine latency + ClockProbe/ClockEcho break (std SystemTime on Windows is historically coarser).
    • No audio endpoint on a headless IDD. WASAPI loopback needs a real/virtual render device; the virtual mic (client→host) has no clean user-mode path — deferred.
    • Color/range. All clients assume BT.709 limited-range; the BGRA→I420/NV12 path must match or colors wash out — validated against the existing decoders.
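The clock-epoch conversion in the list above is plain arithmetic: a FILETIME counts 100 ns ticks since 1601-01-01 UTC, and the Unix epoch lies 11,644,473,600 seconds later. A sketch of the conversion, with a hypothetical function name and taking the two FILETIME halves as plain integers so it compiles on any platform:

```rust
/// Seconds between 1601-01-01 and 1970-01-01 (UTC).
pub const FILETIME_TO_UNIX_EPOCH_SECS: u64 = 11_644_473_600;
/// FILETIME resolution: 100 ns ticks, i.e. 10 million per second.
const TICKS_PER_SEC: u64 = 10_000_000;

/// Convert a FILETIME (dwHighDateTime, dwLowDateTime) to Unix-epoch
/// nanoseconds. Returns None for times before 1970 or on overflow.
pub fn filetime_to_unix_ns(high: u32, low: u32) -> Option<u64> {
    let ticks = ((high as u64) << 32) | (low as u64);
    let unix_ticks = ticks.checked_sub(FILETIME_TO_UNIX_EPOCH_SECS * TICKS_PER_SEC)?;
    // Each tick is 100 ns.
    unix_ticks.checked_mul(100)
}
```

On Windows the two halves would come from the FILETIME filled in by GetSystemTimePreciseAsFileTime; the result is then directly comparable with a Unix host's realtime clock in nanoseconds.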

SudoVDA → pf-vdisplay evolution. The original plan was built around SudoVDA, an off-the-shelf indirect display driver (the same IDD Apollo ships) — chosen to avoid writing/WHQL-signing a driver and to get arbitrary WxH@Hz modes on the fly. It carried the host all the way to live-validated NVENC on a real RTX 4090. It was then replaced by the all-Rust pf-vdisplay IddCx driver (which solved /INTEGRITYCHECK self-signing, §6.1, and gave us the IDD-push zero-copy capture path that captures the secure desktop directly) and deleted in commit 84a3b95; pf-vdisplay is now the sole virtual-display backend. The full SudoVDA control protocol (IOCTL layout, watchdog keepalive, GDI-name resolution) lives in git history if ever needed as a reference.