enricobuehler cd3368fc71 docs(windows-host): KeyedMutexGuard done + record the on-glass build validation
Goal 3: the IDD-push hot-loop KeyedMutexGuard (6585643) landed, and the whole
session's Windows + driver work is now ON-GLASS BUILD-VALIDATED on the RTX box —
host clippy -D warnings clean + driver build clean (the gate that surfaced + got
11 lints fixed in bd05bc8). Only the deferred host P0 lints + the deliberately-
left service.rs SCM-handler event smuggling remain, plus an optional latency A/B.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:16:23 +00:00


Windows Host — Architecture, Status & Roadmap

Single source of truth for the punktfunk Windows streaming host: the all-Rust pf-vdisplay IddCx virtual-display driver + IDD-push zero-copy capture + NVENC/AMF/QSV encode, shipped as a signed Inno Setup installer with a LocalSystem SCM service. Live-validated on the RTX box through 5120×1440@240 HDR, the secure desktop (lock/UAC), and a fullscreen game.

This file consolidates and replaces five earlier docs (now retired into it): the rewrite design plan, the Goal-1 staged-refactor plan, the audit, the audit-remediation tracker, and the fullscreen-game capture-bug analysis. See the consolidation note for what moved where. Last updated 2026-06-26. Work lives on branch windows-host-goal1 (off main, not yet merged).


1. Status at a glance

The Windows host is functionally complete and validated on glass. The hard, high-risk proofs are done: a clean all-Rust IddCx driver on the unified windows-drivers-rs stack (the /INTEGRITYCHECK answer + the iddcx wdk-sys binding), IDD-push zero-copy capture at 5K@240 HDR, the secure desktop (Winlogon / UAC / lock), and the host re-architected into a clean, typed, layered shape. What remains is non-blocking: hygiene (host unsafe lints, a few OwnedHandle rollouts), the SudoVDA backend deletion (decoupled, not yet removed), a driver robustness gap (slot reclaim), the gamepad-driver unification (M4), and old-monolith cleanup (M6) — plus the merge to main.

One framing correction baked into this doc: the host was not greenfield-rebuilt as the original plan imagined. It was refactored in place via a staged, behavior-preserving sequence (the "Goal-1" plan), which kept the live-validated host working at every step. The driver, by contrast, was rebuilt fresh (the new packaging/windows/drivers/pf-vdisplay/ tree).

Scorecard (verified against windows-host-goal1 HEAD, 2026-06-25)

| Item | Status | Evidence |
| --- | --- | --- |
| Goal 1 — clean, layered host architecture | DONE | config.rs (HostConfig), session_plan.rs (SessionPlan), SessionContext, windows/+linux/ confinement (38c68c3), VirtualDisplayManager (§2.5), EncoderCaps (0ccd0fe) |
| Goal 2 — drop every trace of SudoVDA | DONE | reach-in decoupled (F1: d638a93/e60cda3 → win_adapter/win_display), then the sudovda.rs backend + the dual-backend select deleted (this branch) — pf-vdisplay is the sole Windows virtual-display backend |
| Goal 3 — minimize unsafe + P0 lints | 🟡 PARTIAL (box-validated) | driver deny(unsafe_op_in_unsafe_fn) (a755d6e); OwnedHandle/RAII rollout: idd_push.rs (011607e, view-leak fix) + service.rs child/job (4c95ba7) + the 3 gamepad backends via shared gamepad_raii.rs (e5c2b4e) + the IDD-push KeyedMutexGuard hot loop (6585643); driver pod_init! (bf57704, 27→1). On-glass clean: host clippy -D warnings + driver build (RTX box; bd05bc8 fixed 11 lints the gate surfaced). Remaining: host-crate P0 lints (deferred — churn>value), the service.rs SCM-handler event smuggling (deliberately left) |
| M0 — proto ABI + driver toolchain + /INTEGRITYCHECK + iddcx | DONE | pf-driver-proto; vendored windows-drivers-rs 0.5.1; clear-force-integrity.ps1; CI-green |
| M1 — new IddCx driver, first light + HDR | DONE (on-glass) | STEP 08 (d7a9fbf → cd59151); HDR live ("Mac connects WITH HDR", 6399d28) |
| M2 — IDD-push capture + NVENC, glass-to-glass | DONE (on-glass) | 5120×1440@240 HDR zero-copy; integrated into the host path |
| M3 — service / input / audio / secure desktop | DONE (on-glass) | secure desktop (lock/UAC) owner-confirmed 2026-06-25 — IDD-push captures it + input reaches it |
| M4 — gamepad drivers onto the unified stack | OPEN | pf_dualsense/pf_xusb still standalone (packaging/windows/{dualsense,xusb}-driver/), not in drivers/ workspace |
| M5 — WGC/DDA fallback reshape + GameStream-on-pipeline + AMF/QSV | 🟡 PARTIAL | fallbacks exist (wgc.rs/wgc_relay.rs/dxgi.rs), not reshaped onto the new seams; AMF/QSV CI-only (no lab hw) |
| M6 — cut over + delete the old monoliths | 🟡 PARTIAL | old vdisplay-driver/ tree deleted (a2bd0cd); host monoliths + bring-up scaffolding (spawn_observer/DebugBlock) remain |
| Game-capture bug (GB1) — fullscreen game breaks IDD-push | FIXED | resolution-listening recovery (c87bfe0) + open-time DDA failover (f98ab07) + driver guard/log (789ad49) |
| Audit P0/P1/P2 | mostly RESOLVED | watchdog, SET_RENDER_ADAPTER, log gate, mode bounds, IDD-push fallback, F1, out-ring/HDR-ring, proto asserts — all landed; open: host hygiene (§8), E1 completion, slot-reclaim |

2. Architecture (what is on disk)

2.1 Layering & crates

  • crates/punktfunk-host — one shared host crate (Linux + Windows; not split). Platform code is confined under per-module windows/+linux/ folders behind #[cfg] seams (capture/{windows,linux}/, encode/{windows,linux}/, inject/{windows,linux}/, audio/{windows,linux}/, vdisplay/{windows,linux}/, and top-level src/windows/+src/linux/). Module names stay flat (#[path]), so caller paths are platform-agnostic.
  • crates/punktfunk-core — the one linked protocol/FEC/crypto/QUIC core (unchanged here).
  • crates/pf-driver-proto — the owned, no_std host↔driver ABI (frame ring + control plane + gamepad SHM), consumed by both the host crate and the driver workspace (§2.7).
  • packaging/windows/drivers/ — the unified driver workspace on microsoft/windows-drivers-rs (vendored 0.5.1 + an iddcx subset): members pf-vdisplay (the IddCx display driver), wdk-iddcx (the typed IddCx DDI wrappers), wdk-probe (the CI link/surface gate), vendor/{wdk-build,wdk-sys}.

2.2 Session resolution — HostConfig → SessionPlan → SessionContext (Goal-1 realized)

The old ~40-knob PUNKTFUNK_* env soup, re-read and recomputed in three places, is replaced by a resolve-once pipeline:

  • config.rs HostConfig — typed config parsed once from host.env/env/flags (idd_push/encoder_pref/no_wgc/capture_backend/render_adapter/secure_dda/ten_bit/zerocopy/…). Each field's parser is byte-identical to the read it replaced. (Runtime-mutated Linux session vars from vdisplay::apply_session_env, and single-use local tuning knobs, are deliberately kept live — see the config.rs header.)
  • session_plan.rs SessionPlan { display, capture, topology, encoder, input_format, bit_depth, hdr, pipeline_depth } — a Copy plan resolved once per session from HostConfig + the negotiated bit-depth, logged, and threaded through build_pipeline. CaptureBackend::resolve() is the one resolver (IddPush | Dda | Wgc); resolve_topology decides SingleProcess | TwoProcessRelay. This killed the latent capture/encode backend-disagreement bug.
  • SessionContext — bundles the session entry's ~13 args (was #[allow(too_many_arguments)]) and the plane receivers into one owned struct moved into the stream thread.
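The resolve-once idea above can be illustrated with a small sketch. Only the names HostConfig, SessionPlan, CaptureBackend::resolve, resolve_topology and the PUNKTFUNK_* keys come from this document; the reduced field set, the "1 means on" parsing, and the precedence rules below are assumptions made for illustration, not the host's actual logic.

```rust
// Hypothetical, trimmed-down stand-ins for the real types. The precedence
// rules and the reduced field set are assumptions, not the shipped code.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptureBackend {
    IddPush,
    Dda,
    Wgc,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Topology {
    SingleProcess,
    TwoProcessRelay,
}

#[derive(Debug, Clone, Copy, Default)]
pub struct HostConfig {
    pub idd_push: bool,
    pub no_wgc: bool,
    pub ten_bit: bool,
}

impl HostConfig {
    /// Parse once from KEY=VALUE pairs; "1" means on (assumed convention).
    pub fn from_pairs<'a>(pairs: impl IntoIterator<Item = (&'a str, &'a str)>) -> Self {
        let mut cfg = HostConfig::default();
        for (key, value) in pairs {
            let on = value == "1";
            match key {
                "PUNKTFUNK_IDD_PUSH" => cfg.idd_push = on,
                "PUNKTFUNK_NO_WGC" => cfg.no_wgc = on,
                "PUNKTFUNK_10BIT" => cfg.ten_bit = on,
                _ => {} // unknown keys are ignored in this sketch
            }
        }
        cfg
    }
}

impl CaptureBackend {
    /// The single resolver: consumers read the plan, none re-derives it.
    pub fn resolve(cfg: &HostConfig) -> Self {
        if cfg.idd_push {
            CaptureBackend::IddPush
        } else if cfg.no_wgc {
            CaptureBackend::Dda
        } else {
            CaptureBackend::Wgc
        }
    }
}

/// Assumed rule: only the WGC path needs the two-process relay.
pub fn resolve_topology(capture: CaptureBackend) -> Topology {
    match capture {
        CaptureBackend::Wgc => Topology::TwoProcessRelay,
        CaptureBackend::IddPush | CaptureBackend::Dda => Topology::SingleProcess,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SessionPlan {
    pub capture: CaptureBackend,
    pub topology: Topology,
    pub bit_depth: u8,
    pub hdr: bool,
}

impl SessionPlan {
    /// Resolved once per session from the config plus the negotiated depth.
    pub fn resolve(cfg: &HostConfig, negotiated_bit_depth: u8) -> Self {
        let capture = CaptureBackend::resolve(cfg);
        let bit_depth = if cfg.ten_bit && negotiated_bit_depth >= 10 { 10 } else { 8 };
        SessionPlan {
            capture,
            topology: resolve_topology(capture),
            bit_depth,
            hdr: bit_depth == 10,
        }
    }
}
```

Because the plan is Copy and computed in one place, capture and encode cannot disagree about the backend: both read the same value instead of re-reading the environment.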

2.3 Ownership model — VirtualDisplayManager + MonitorLease (§2.5 realized)

A single OnceLock VirtualDisplayManager (vdisplay/windows/manager.rs) owns a typed Arc<OwnedHandle> control-device handle (no raw-isize cross-thread smuggle), the refcounted Idle/Active/Lingering state machine, and the monitor generation (AtomicU64). Both Windows backends (pf_vdisplay, sudovda) shrank to thin VdisplayDriver impls (open/add_monitor/remove_monitor/ping) behind it; MonitorKey = Guid | Session(u64). A per-session MonitorLease's Drop releases the refcount (a stale lease can't tear down a fresh monitor). This deleted the old CURRENT_MON_GEN/MON_GEN/two-MGR/IDD_PERSIST/IDD_SETUP_LOCK/IDD_SESSION_STOP globals. Validated on glass: 0 leaked active monitors across a reconnect storm, A/B-equivalent to the shipping host. (The 5-agent map found CURRENT_MON_GEN had been write-only — the per-frame "monitor-gen bail" was never wired — so the gen lives on the manager + lease only.)
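A minimal sketch of the generation-stamped lease, to show why a stale lease cannot tear down a fresh monitor. It is deliberately simplified: a Mutex-guarded struct stands in for the real OnceLock manager, AtomicU64 generation and Idle/Active/Lingering state machine, and the Manager type, its methods, and force_teardown are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

// Simplified state; the real manager's linger state is omitted here.
#[derive(Debug, Default)]
struct State {
    generation: u64,
    refcount: u32,
    active: bool,
}

#[derive(Clone, Default)]
pub struct Manager {
    state: Arc<Mutex<State>>,
}

/// RAII lease: dropping it releases one reference, but only if the
/// monitor it was issued for is still the live one.
pub struct MonitorLease {
    state: Arc<Mutex<State>>,
    generation: u64,
}

impl Manager {
    pub fn acquire(&self) -> MonitorLease {
        let mut s = self.state.lock().unwrap();
        if !s.active {
            // A fresh monitor gets a fresh generation.
            s.active = true;
            s.generation += 1;
            s.refcount = 0;
        }
        s.refcount += 1;
        MonitorLease { state: Arc::clone(&self.state), generation: s.generation }
    }

    /// Hypothetical out-of-band teardown (e.g. a driver reset).
    pub fn force_teardown(&self) {
        let mut s = self.state.lock().unwrap();
        s.active = false;
        s.refcount = 0;
    }

    pub fn is_active(&self) -> bool {
        self.state.lock().unwrap().active
    }

    pub fn refcount(&self) -> u32 {
        self.state.lock().unwrap().refcount
    }
}

impl Drop for MonitorLease {
    fn drop(&mut self) {
        let mut s = self.state.lock().unwrap();
        // Stale lease: the monitor it referred to is gone, so do nothing.
        if s.generation != self.generation || !s.active {
            return;
        }
        s.refcount -= 1;
        if s.refcount == 0 {
            s.active = false; // last reference: tear the monitor down
        }
    }
}
```

The generation check in Drop is the whole point: the refcount is only touched when the lease and the live monitor agree on the generation.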

2.4 The seam traits

VirtualDisplay/VirtualOutput/VirtualLease (RAII keepalive = release), Capturer (next_frame/try_latest/set_active/hdr_meta/pipeline_depth), Encoder (submit/caps/request_keyframe/set_hdr_meta/invalidate_ref_frames/poll/flush), AudioCapturer/VirtualMic/InputInjector/PadManager. Realized tightenings: the capturer takes the desired OutputFormat { gpu, hdr } in (killed the capture → encode::windows_resolved_backend() back-reference recomputed in dxgi.rs); and Encoder::caps() -> EncoderCaps { supports_rfi, supports_hdr_metadata } lets the session glue route loss-recovery by query (only Windows direct-NVENC overrides it; the GameStream loop gates the RFI path on supports_rfi).
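Routing loss recovery by capability query might look like the sketch below. EncoderCaps and its two fields are named in the text; the method signatures, the two example encoders, and the keyframe fallback when RFI is unsupported are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct EncoderCaps {
    pub supports_rfi: bool,
    pub supports_hdr_metadata: bool,
}

/// Reduced, hypothetical version of the Encoder seam.
pub trait Encoder {
    /// Default: no optional capabilities. Backends override as needed.
    fn caps(&self) -> EncoderCaps {
        EncoderCaps::default()
    }
    fn request_keyframe(&mut self);
    fn invalidate_ref_frames(&mut self, first_frame: u64, last_frame: u64);
}

/// Stand-in for the direct-NVENC backend, the only one overriding caps().
#[derive(Default)]
pub struct DirectNvenc {
    pub invalidated: Vec<(u64, u64)>,
    pub keyframes: u32,
}

impl Encoder for DirectNvenc {
    fn caps(&self) -> EncoderCaps {
        EncoderCaps { supports_rfi: true, supports_hdr_metadata: true }
    }
    fn request_keyframe(&mut self) {
        self.keyframes += 1;
    }
    fn invalidate_ref_frames(&mut self, first_frame: u64, last_frame: u64) {
        self.invalidated.push((first_frame, last_frame));
    }
}

/// Stand-in for a backend that keeps the default caps.
#[derive(Default)]
pub struct SoftwareH264 {
    pub keyframes: u32,
}

impl Encoder for SoftwareH264 {
    fn request_keyframe(&mut self) {
        self.keyframes += 1;
    }
    fn invalidate_ref_frames(&mut self, _first_frame: u64, _last_frame: u64) {
        // Unsupported: callers are expected to check caps() first.
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    RefFrameInvalidation,
    Keyframe,
}

/// Session glue: pick the recovery path by query, not by backend type.
pub fn recover_from_loss(enc: &mut dyn Encoder, first: u64, last: u64) -> Recovery {
    if enc.caps().supports_rfi {
        enc.invalidate_ref_frames(first, last);
        Recovery::RefFrameInvalidation
    } else {
        enc.request_keyframe(); // assumed fallback
        Recovery::Keyframe
    }
}
```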

2.5 Capture — IDD-push primary (normal and secure desktop), WGC/DDA fallback, GB1 recovery

IDD-push is the universal primary path. Capture comes straight from the driver's shared keyed-mutex texture ring (capture/windows/idd_push.rs) — no Desktop Duplication, no win32u reparenting hook. The host creates the ring; the driver opens it (permissive D:(A;;GA;;;WD) SDDL). The generation-tagged latest = gen<<40 | seq<<8 | slot stale-ring reject kills the HDR-flip garbage frame; a host-owned 3-slot OUT_RING rotated per frame is the texture-ownership contract that enables pipeline_depth=2 (convert/copy on the 3D engine overlapping NVENC on the ASIC). It captures the secure desktop (Winlogon/UAC/lock) directly (validated 2026-06-25), so there is no separate secure capturer in the primary path.

  • Open-time fallback: IddPushCapturer::open waits a bounded ~4 s for a first frame (not just DRV_STATUS_OPENED); on attach failure it returns the keepalive back so capture.rs opens DDA on the same WinCaptureTarget — never a 20 s black bail (audit §5.1, ed58365/f98ab07).
  • Mid-session game mode-set recovery (GB1, fixed): the 250 ms poll follows the display's actual resolution (win_display::active_resolution, CCD/GDI) and recreates the ring on any descriptor change (size or HDR) → the driver re-attaches → frames resume at the game's mode, no reconnect. If a change is unrecoverable (e.g. an exclusive flip), a recovering_since clock drops the session after 3 s so the client reconnects cleanly. No protocol bump was needed — the host reads the resolution straight from Windows (c87bfe0; the driver's publish() width/height guard + flushed log is 789ad49).
  • WGC + DDA stay as demoted fallbacks for non-IddCx hardware (wgc.rs/dxgi.rs). The two-process WGC secure-desktop relay (wgc_relay.rs) is no longer load-bearing now that IDD-push handles the secure desktop; it is kept recoverable but slated for M5/M6 cleanup.
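The recovering_since clock from the GB1 bullet can be sketched as a small state holder. The 3 s deadline and the field name come from the text; the RecoveryClock type, the Verdict enum, and passing the current time in as a parameter (to keep the sketch testable) are assumptions.

```rust
use std::time::{Duration, Instant};

/// Deadline after which an unrecoverable mode change drops the session.
pub const RECOVERY_DEADLINE: Duration = Duration::from_secs(3);

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Healthy,
    Recovering,
    DropSession,
}

#[derive(Default)]
pub struct RecoveryClock {
    recovering_since: Option<Instant>,
}

impl RecoveryClock {
    /// Call once per poll tick with whether frames are currently arriving.
    pub fn observe(&mut self, frames_flowing: bool, now: Instant) -> Verdict {
        if frames_flowing {
            // Any frame resets the clock.
            self.recovering_since = None;
            return Verdict::Healthy;
        }
        // First stalled tick starts the clock; later ticks reuse it.
        let since = *self.recovering_since.get_or_insert(now);
        if now.duration_since(since) >= RECOVERY_DEADLINE {
            Verdict::DropSession
        } else {
            Verdict::Recovering
        }
    }
}
```

A stall that clears before the deadline never drops the session, and a later stall starts a fresh 3 s window.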

2.6 Encode — NVENC / AMF / QSV / software; EncoderCaps; HDR

encode/windows/ dispatches per DXGI adapter vendor (open_video): NVENC (NVIDIA, direct SDK, nvenc.rs — caps-probe-before-configure, bitrate-clamp binary search, true RFI over the DPB, in-band ST.2086/CLL SEI), AMF/QSV (AMD/Intel via libavcodec, ffmpeg_win.rs — system-readback default, opt-in zero-copy D3D11; CI-only, no lab hardware), or software H.264 (sw.rs). HDR (10-bit) forces HEVC Main10 + BT.2020 PQ; the client auto-detects PQ from the VUI. The encoder adapts to a mid-session size/format/HDR change per frame (tears down + re-inits), so the GB1 capturer's resolution changes are handled downstream with no API change.

2.7 Host↔driver ABI — pf-driver-proto

One no_std crate, both build graphs. Owns the frame plane (SharedHeader, FrameToken { generation, seq, slot } with pack/unpack, Global\pfvd-* name helpers), the control plane (fresh interface GUID — not SudoVDA's e5bcc234; contiguous 0x900 IOCTL ops; u64 session id; a real GET_INFO version handshake the host asserts + bails on mismatch), and the gamepad SHM (XusbShm 64 B, PadShm 256 B incl. device_type). bytemuck-Pod + size_of and offset_of! asserts make ABI drift a compile error (95dcef3). The host-side gamepad consumers derive their layouts from here; the driver-side gamepad drivers do not yet (M4).
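As an illustration of the frame-plane token, here is a sketch of pack/unpack for the gen<<40 | seq<<8 | slot layout quoted in §2.5, plus a compile-time size assert in the spirit of the ABI asserts. The field widths (24-bit generation, 32-bit sequence, 8-bit slot) are inferred from the shifts and are an assumption, as are the Rust field types and is_current; the real pf-driver-proto definitions may differ.

```rust
/// Assumed layout of the packed u64: [63:40] generation, [39:8] seq, [7:0] slot.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameToken {
    pub generation: u32, // only the low 24 bits survive packing
    pub seq: u32,
    pub slot: u8,
}

const GEN_MASK: u64 = (1 << 24) - 1;

impl FrameToken {
    pub const fn pack(self) -> u64 {
        ((self.generation as u64 & GEN_MASK) << 40)
            | ((self.seq as u64) << 8)
            | self.slot as u64
    }

    pub const fn unpack(raw: u64) -> Self {
        FrameToken {
            generation: (raw >> 40) as u32,
            seq: ((raw >> 8) & 0xFFFF_FFFF) as u32,
            slot: (raw & 0xFF) as u8,
        }
    }

    /// Stale-ring reject: a token from an older ring generation is ignored.
    pub const fn is_current(self, ring_generation: u32) -> bool {
        self.generation == ring_generation
    }
}

// Layout drift becomes a compile error rather than a runtime surprise.
const _: () = assert!(core::mem::size_of::<FrameToken>() == 12);
```

Publishing the whole token as one u64 lets the reader observe generation, sequence and slot together, which is what makes the stale-ring reject a single comparison.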

2.8 The pf-vdisplay IddCx driver

All-Rust UMDF IddCx driver on windows-drivers-rs + the iddcx wdk-sys subset. STEP 08 landed (packaging/windows/drivers/pf-vdisplay/src/): entry.rs (DriverEntry + IDD_CX_CLIENT_CONFIG, 15 callbacks), adapter.rs (caps + FP16 + SET_RENDER_ADAPTER), monitor.rs/callbacks.rs (the *2 HDR mode DDIs, EDID verbatim), swap_chain_processor.rs (the worker, SetDevice-retry + top-of-loop terminate), frame_transport.rs (the FramePublisher on pf_driver_proto::frame), control.rs (the typed IOCTL dispatch + host-gone watchdog + mode bounds). Self-signed-loadable under Secure Boot (FORCE_INTEGRITY cleared post-link). Known gaps: ownership state is still partly process-global (MONITOR_MODES/NEXT_ID/ADAPTER/DEVICE_POOL) with EvtCleanupCallback on the WDFDEVICE (not per-IDDCX_MONITOR) — see E1 in §4; and it does not reclaim IddCx monitor slots on REMOVE (the ghost-monitor wedge, §4).

2.9 Service, packaging, installer

A LocalSystem SCM supervisor (service.rs) token-retargets and CreateProcessAsUserWs serve into the console session (so SendInput reaches the streamed desktop + the secure desktop), relaunches on session-change, and kills-on-close via a Job Object. Shipped as a signed Inno Setup setup.exe (packaging/windows/, windows-host.yml) that bundles the new pf-vdisplay driver (pf_vdisplay.inx in-tree, old vdisplay-driver/ tree deleted) + FFmpeg DLLs and delegates to service install. GameStream (Moonlight) is kept but the installer/service default to secure serve (GameStream opt-in).


3. Validated invariants — preserve, do not regress

These are expensive empirical wins; keep them intact when touching the code:

  • Frame transport: host-creates/driver-opens keyed-mutex ring; generation-tagged stale-ring reject; 0 ms try-acquire / drop-on-full publish (never block the swap-chain thread); the OUT_RING rotation + pipeline_depth=2 overlap; repeat_last rotates into a fresh out-ring slot (depth-safe).
  • Driver internals: edid.rs (128-byte EDID + CTA-861.3 HDR block, dual checksums); the FP16 HDR recipe (CAN_PROCESS_FP16 + the *2 DDIs + gamma/HDR accept-stubs + HIGH_COLOR_SPACE); DEVICE_POOL per render-LUID (NVIDIA UMD/VRAM leak fix); target-id stamped on the monitor context; the two swap-chain leak fixes (borrow IDXGIDevice across SetDevice retries; check terminate at the loop top).
  • Monitor lifecycle: serialized ADD/REMOVE/teardown; restore CCD topology before REMOVE; the generation-stamped lease (a stale lease can't tear down a fresh monitor); 0-leak across reconnects.
  • HDR color math: hdr.rs (pure, unit-tested, ST.2086 + big-endian SEI); the FP16→P010/Rgb10a2 converters + hdr_p010_selftest; the cursor decomposition.
  • NVENC tuning: caps-probe-before-configure (10-bit→8-bit graceful downgrade); bitrate-clamp binary search (each GPU's real ceiling); true RFI over the DPB; CBR / infinite-GOP / P-only / ~1-frame VBV.
  • Gamepad recipe: the SwDeviceCreate identity (enumerator with no _; mandatory completion callback; synthesized DS5 compat-ids; non-null per-pad ContainerId); one pf_dualsense serving DualSense+DS4 via a device_type byte; XUSB declining WAIT_*; per-pad index via pszDeviceLocation.
  • Session glue: the trait seam + RAII keepalive teardown; host-lifetime shared services + per-session gamepads; the encode|send split + microburst pacing; build_pipeline_with_retry permanent-vs-transient classification; the GameStream VideoPacketizer (GF8 Cauchy, Moonlight byte-exact); the pairing/trust handshake.
  • Core discipline: no async on the per-frame path; pf-driver-proto is the single ABI source (drift = compile error); the version handshake the host asserts.

4. Open work / next tasks (prioritized)

P1 — ship-readiness / correctness

  1. Merge windows-host-goal1 → main + push (outward-facing → confirm first). Pushing also runs the full Windows CI matrix incl. the amf-qsv encode path, which local checks skip.
  2. Make IDD-push the default — today it is gated behind PUNKTFUNK_IDD_PUSH (config.rs default false); deployment sets it in host.env. Flip the code default (with the WGC/DDA fallback already in place) so a fresh install runs the validated path, or document the host.env requirement explicitly.
  3. pf-vdisplay slot reclaim on REMOVE (driver robustness) — 🟡 fix landed, on-glass-validation pending. Sustained ADD/REMOVE churn wedged the driver (ADD → 0x80070490 ERROR_NOT_FOUND) because the monitor id (EDID serial / ConnectorIndex / container GUID) was a monotonic NEXT_ID, never reclaimed → IddCx accumulated a new OS target slot per cycle until exhaustion. monitor.rs now allocates the lowest free id (alloc_monitor_id), reused on REMOVE, so a fresh ADD reuses the departed monitor's target slot instead of orphaning it. CI-compile-gated; the wedge only reproduces under sustained churn on the RTX box, so this needs an on-glass reconnect-storm A/B to confirm (the box is ephemeral). Keep packaging/windows/reset-pf-vdisplay.ps1 as the recovery until validated.
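The lowest-free-id allocation described in item 3 can be sketched as follows. The function name alloc_monitor_id comes from the text; the MonitorIds container, the release method, and starting ids at 0 are assumptions, and the real monitor.rs may track ids differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical id pool: hands out the lowest id not currently in use, so a
/// departed monitor's id (and with it the OS target slot) is reused.
#[derive(Default)]
pub struct MonitorIds {
    in_use: BTreeSet<u32>,
}

impl MonitorIds {
    pub fn alloc_monitor_id(&mut self) -> u32 {
        // BTreeSet iterates in ascending order, so the first gap is the
        // lowest free id.
        let mut id = 0;
        for &used in &self.in_use {
            if used != id {
                break;
            }
            id += 1;
        }
        self.in_use.insert(id);
        id
    }

    /// Called on REMOVE; returns false if the id was not allocated.
    pub fn release(&mut self, id: u32) -> bool {
        self.in_use.remove(&id)
    }
}
```

Under ADD/REMOVE churn with one monitor at a time, this pool keeps returning the same id, whereas a monotonic counter would mint a new one on every cycle.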

P2 — hygiene / architecture completion (the unsafe-reduction + stability priority)

  4. D1-host — host-crate P0 lints. Add #![deny(unsafe_op_in_unsafe_fn)] + #![warn(clippy::undocumented_unsafe_blocks)] to the host crate and fix the fallout (~30 of the 52 unsafe fns need an inner unsafe {}). Stage it per-module, Linux-first (item-level #[deny] on linux/zerocopy/cuda.rs/egl.rs, encode/linux/vaapi.rs — locally verifiable), then the Windows modules (CI-gated), then promote to crate-level. The driver already has the deny.
  5. D2 — OwnedHandle / RAII rollout. Done: capture/windows/idd_push.rs (011607e: a MappedSection RAII for the mapping handle + the leaked MapViewOfFile view, + OwnedHandle for the event / ring-slot shared handles); windows/service.rs (4c95ba7: the child process/thread + Job handles, ~9 CloseHandle deleted); and the three gamepad backends (e5c2b4e: a shared inject/windows/gamepad_raii.rs with Shm for the section+view, SwDevice for the devnode — replacing the duplicated create_shm_section + three hand-written Drops). Remaining (deliberately left): the service.rs AtomicIsize STOP/SESSION events — smuggled into the C SCM handler, a separate riskier redesign. manager.rs/pf_vdisplay.rs already used the pattern.
  6. Hot-loop KeyedMutexGuard done (6585643) — the IDD-push consume loop's hand-written AcquireSync/ReleaseSync (with its "don't ?-return between them or you leak the lock + stall the driver" caveat) is now a RAII guard scoped to the convert/copy block: same release point (latency unchanged), but leak-proof on any early return. Driver pod_init! (bf57704, 27 mem::zeroed → 1). Skipped ThreadBound<T> (each unsafe impl Send wraps a distinct type — churn, no real gain) and scratched the IOCTL dispatcher (control.rs's read_input<T>/write_output_complete<T> are already generic with minimal unsafe).
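The shape of the KeyedMutexGuard idea can be shown without touching D3D11. In the sketch below a small KeyedMutex trait stands in for IDXGIKeyedMutex (whose AcquireSync takes a key and a millisecond timeout, and whose ReleaseSync takes a key); the trait, the FakeMutex test double, the error type, and consume_frame are all hypothetical and not the host's code.

```rust
use std::cell::Cell;

/// Stand-in for the two IDXGIKeyedMutex calls the hot loop uses.
pub trait KeyedMutex {
    fn acquire_sync(&self, key: u64, timeout_ms: u32) -> Result<(), i32>;
    fn release_sync(&self, key: u64);
}

/// RAII guard: the lock is released when the guard goes out of scope,
/// including on any early `?` return.
pub struct KeyedMutexGuard<'a, M: KeyedMutex> {
    mutex: &'a M,
    release_key: u64,
}

impl<'a, M: KeyedMutex> KeyedMutexGuard<'a, M> {
    pub fn acquire(
        mutex: &'a M,
        key: u64,
        release_key: u64,
        timeout_ms: u32,
    ) -> Result<Self, i32> {
        mutex.acquire_sync(key, timeout_ms)?;
        Ok(Self { mutex, release_key })
    }
}

impl<M: KeyedMutex> Drop for KeyedMutexGuard<'_, M> {
    fn drop(&mut self) {
        self.mutex.release_sync(self.release_key);
    }
}

/// One consume step: the guard is scoped to the convert/copy work only.
pub fn consume_frame<M: KeyedMutex>(
    mutex: &M,
    convert: impl FnOnce() -> Result<(), i32>,
) -> Result<(), i32> {
    let _guard = KeyedMutexGuard::acquire(mutex, 0, 0, 0)?;
    convert()?; // an error here still releases the lock via Drop
    Ok(())
}

/// Test double that records whether the lock is held and how often it
/// was released.
#[derive(Default)]
pub struct FakeMutex {
    pub held: Cell<bool>,
    pub releases: Cell<u32>,
}

impl KeyedMutex for FakeMutex {
    fn acquire_sync(&self, _key: u64, _timeout_ms: u32) -> Result<(), i32> {
        if self.held.replace(true) { Err(-1) } else { Ok(()) }
    }
    fn release_sync(&self, _key: u64) {
        self.held.set(false);
        self.releases.set(self.releases.get() + 1);
    }
}
```

Because the guard lives exactly as long as the convert/copy block, the release point is the same as with the hand-written pair; only the failure paths change.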

On-glass build validation (RTX box, 2026-06-26). Built this branch on the box in an isolated worktree: host cargo clippy -p punktfunk-host --features nvenc -D warnings = CLEAN, driver cargo build = CLEAN — validating the whole session's Windows + driver work on real hardware. The clippy gate (which the goal1/§2.5 work never ran — it used cargo check) surfaced + fixed 11 lint issues (bd05bc8: 9 redundant as *mut c_void, an if_same_then_else, an unused_unsafe in pod_init!). What remains is only an optional runtime latency A/B for the KeyedMutexGuard (provably equivalent: same release point), should a deeper check be wanted.

  7. D1-host P0 lints — deferred (low value / high churn). A crate-wide #![deny(unsafe_op_in_unsafe_fn)] produced 100+ FFI-wrap sites across the Linux modules; it wraps unsafe (discipline) rather than reducing it and doesn't improve stability, so it was deprioritized vs the OwnedHandle/RAII reductions above. Revisit as a final discipline pass (staged per-module) if desired.
  8. M6 scaffolding cleanup — delete the bring-up diagnostics (spawn_observer/DebugBlock in idd_push.rs) and, once full parity is proven on glass, the host monoliths.

Explicitly NOT doing (stability decision): E1 — driver DeviceContext ownership + per-IDDCX_MONITOR EvtCleanupCallback. The current process-global design is sound: IddCx DDIs receive only an IDDCX_MONITOR handle (never the WDFDEVICE/context), and ProcessSharingDisabled makes one devnode = one host process that dies with the device. A "device-owned" variant would add a use-after-free window (the watchdog races device cleanup) for no gain, and the per-monitor cleanup callback isn't reliably reachable on this UMDF/IddCx stack. Cleanup is already deterministic (WDFDEVICE EvtCleanupCallback + cleanup_for_device_removal + the host-gone watchdog). Revisit only if max_concurrent>1 on Windows is actually needed. (monitor.rs documents this rationale at the MONITOR_MODES static.)

P3 — larger, mostly hardware-gated

  9. M4 — gamepad-driver unification. Fold pf_dualsense + pf_xusb (standalone packaging/windows/{dualsense,xusb}-driver/ on the old WDF stack) into the unified drivers/ workspace on windows-drivers-rs with WDF device contexts (true multi-pad), and point the driver side at pf_driver_proto::gamepad::{PadShm,XusbShm} (host side already does — the device_type-at-offset-140 hand-duplication is the last ABI-drift hazard). Largest item.
  10. M5 — reshape WGC/DDA + GameStream onto session/pipeline, then delete the old relay/monoliths. AMF/QSV stays CI-only (no lab hardware).
  11. On-glass behavioral validation of the committed-but-unexercised fixes: the watchdog reaping on host-kill, SET_RENDER_ADAPTER on a hybrid box (the lab box is single-dGPU), the IDD-push→DDA fallback trigger, HDR-ring sizing + out-ring repeat under real HDR/static-desktop pipelining.


5. Operations

5.1 RTX box on-glass recipe

The persistent on-glass validator is the RTX box (ssh "Enrico Bühler"@<ip>, ENRICOS-DESKTOP, RTX 4090, PS shell). The IP FLOATS (DHCP; boots to Proxmox on reboot → ephemeral, unreachable after a reboot; recently .173/.158 — confirm current first; never reboot it, never depend on it surviving). It has WDK 26100 + LLVM 21.1.2 + the Rust toolchain; build clone at C:\Users\Public\pf-rewrite (the user's active driver-dev tree — don't clobber uncommitted WIP; use a worktree). Username has a ü → quote it; it only breaks SDL3/client builds, not the host. To validate a host branch: worktree-checkout, build with CARGO_TARGET_DIR=C:\t-goal1, then stop the PunktfunkHost service, back up the binary + %ProgramData%\punktfunk\host.env, copy your build in, restart, drive punktfunk-probe.exe loopback, then restore + git worktree remove. Drive over ssh via powershell -EncodedCommand <base64 UTF-16LE> (plain quoting mangles; prefer Write-Output/file-redirect for clean output). Driver redeploy: packaging/windows/redeploy-pf-vdisplay.ps1; ghost-monitor recovery: reset-pf-vdisplay.ps1.
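powershell -EncodedCommand expects the script as base64 over its UTF-16LE bytes, which is why plain quoting over ssh is avoided. A std-only sketch of producing that argument follows; the function names are illustrative, and a real tool would more likely use an existing base64 crate than the hand-rolled encoder here.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with '=' padding (hand-rolled to stay dependency-free).
fn base64(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (chunk[0] as u32) << 16 | (b1 as u32) << 8 | b2 as u32;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Encode a PowerShell script for `powershell -EncodedCommand <result>`.
pub fn encoded_command(script: &str) -> String {
    // PowerShell decodes the argument as UTF-16LE, not UTF-8.
    let utf16le: Vec<u8> = script
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect();
    base64(&utf16le)
}
```

The result contains only base64 characters, so it survives the ssh and PowerShell quoting layers unchanged.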

5.2 CI / validation

The persistent build validator is the windows-amd64 CI runner (no GPU — fine for builds / iddcx link / /INTEGRITYCHECK self-sign / the surface-asserts; live NVENC encode + on-glass defers to the RTX box). Workflows: windows-host.yml (the host installer), windows-drivers.yml (the driver workspace build + FORCE_INTEGRITY clear), windows-drivers-provision.yml (WDK/LLVM toolchain), windows-msix.yml (the client). A single Windows runner serializes the whole fleet; a Cargo.toml touch costs ~25 min of queue, so driver pushes that avoid Cargo.toml skip the fleet serialization.

Local pre-push checks (this Linux box can't compile the Windows paths):

cargo test  -p pf-driver-proto                 # the ABI crate (cross-platform)
cargo check -p punktfunk-host                     # Linux paths; win_* mods are #[cfg(windows)]
cargo clippy -p punktfunk-host --all-targets -- -D warnings
# Windows host clippy (on the box): PUNKTFUNK_NVENC_LIB_DIR=C:\t\nvenc;
#   cargo clippy -p punktfunk-host --features nvenc --target x86_64-pc-windows-msvc -- -D warnings
# Driver build (on the box): cd packaging/windows/drivers; Version_Number=10.0.26100.0;
#   LIBCLANG_PATH='C:\Program Files\LLVM\bin'; cargo build

Note: a pre-existing rustfmt-version drift exists in some Windows-only files (this box's rustfmt 1.9.0 wraps offset_of!/unsafe fn differently than the runner's) — don't reformat unrelated files to chase it.

5.3 Env knobs (Windows host)

PUNKTFUNK_IDD_PUSH=1 (capture from the driver ring; default off), PUNKTFUNK_VDISPLAY=pf|sudovda, PUNKTFUNK_ENCODER=auto|nvenc (auto → vendor-detect), PUNKTFUNK_10BIT=1 + PUNKTFUNK_HDR_SHADER_P010=1 (HDR), PUNKTFUNK_SECURE_DDA=1, PUNKTFUNK_NO_WGC=1 (pure DDA), PUNKTFUNK_ZEROCOPY=1, PUNKTFUNK_MONITOR_LINGER_MS, PFVD_DEBUG_LOG=1 (driver file log — release builds are silent without it). Config lives in %ProgramData%\punktfunk\host.env; logs in %ProgramData%\punktfunk\logs\host.log.

5.4 Build / deploy / packaging

x64-only by design (no ARM64 NVIDIA driver / SudoVDA). The installer is the thin-.iss / fat-binary model delegating to service install; tag host-win-vX.Y.Z. The driver is built + FORCE_INTEGRITY-cleared + signed + Inf2Cat'd in CI from source. DriverVer must bump on any driver change; create the ROOT devnode via nefcon (devgen is forbidden).


6. Reference (hard-won — keep)

6.1 The /INTEGRITYCHECK answer

wdk-build emits cargo::rustc-cdylib-link-arg=/INTEGRITYCHECK unconditionally (no cfg/env/Config opt-out), so a self-signed driver can't load (CodeIntegrity 3004/3089). The fix: a deterministic, idempotent post-link step packaging/windows/clear-force-integrity.ps1 clears the PE FORCE_INTEGRITY bit (0x0080 @ e_lfanew+0x5e) + verifies (CI-proven 0x01E0 → 0x0160), before signing. Packaging order: cargo build → clear-force-integrity → sign .dll → Inf2Cat → sign .cat. (A public build would use real attestation signing, which satisfies /INTEGRITYCHECK legitimately.)
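For reference, the bit-clear itself is small. The sketch below performs the same operation on an in-memory image; it is not the PowerShell script, and the function name and error type are made up for illustration. The offsets follow the text: e_lfanew is the little-endian u32 at 0x3C, and DllCharacteristics sits 0x5E bytes past the PE signature.

```rust
/// IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY
pub const FORCE_INTEGRITY: u16 = 0x0080;
const E_LFANEW_OFFSET: usize = 0x3C;
const DLL_CHARACTERISTICS_OFFSET: usize = 0x5E; // relative to the "PE\0\0" signature

#[derive(Debug, PartialEq, Eq)]
pub enum PeError {
    Truncated,
    BadSignature,
}

/// Clear FORCE_INTEGRITY in place and return (before, after) so the caller
/// can verify. Idempotent: a second run returns the same value twice.
/// Note: this sketch does not update the optional-header CheckSum.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<(u16, u16), PeError> {
    let l = image
        .get(E_LFANEW_OFFSET..E_LFANEW_OFFSET + 4)
        .ok_or(PeError::Truncated)?;
    let pe = u32::from_le_bytes([l[0], l[1], l[2], l[3]]) as usize;

    let sig_end = pe.checked_add(4).ok_or(PeError::Truncated)?;
    let sig = image.get(pe..sig_end).ok_or(PeError::Truncated)?;
    if sig != &b"PE\0\0"[..] {
        return Err(PeError::BadSignature);
    }

    let off = pe
        .checked_add(DLL_CHARACTERISTICS_OFFSET)
        .ok_or(PeError::Truncated)?;
    let field = image.get_mut(off..off + 2).ok_or(PeError::Truncated)?;
    let before = u16::from_le_bytes([field[0], field[1]]);
    let after = before & !FORCE_INTEGRITY;
    field.copy_from_slice(&after.to_le_bytes());
    Ok((before, after))
}
```

On an image whose DllCharacteristics is 0x01E0 this returns (0x01E0, 0x0160), matching the CI-proven transition above.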

6.2 The iddcx binding on wdk-sys (the make-or-break — proven, the 6 bindgen knobs)

IddCx DDIs are function-table dispatched (IddFunctions[] indexed by _IDDFUNCENUM::<Name>TableIndex, IddDriverGlobals implicit arg 1) — the same model wdk-sys already implements for WDF. The vendored windows-drivers-rs 0.5.1 (packaging/windows/drivers/vendor/, [patch.crates-io]'d) gets a first-class ApiSubset::Iddcx that bindgens iddcx/1.10/IddCx.h reusing the identical wdk_default(config) baseline (so WDF/DXGI types resolve to, not redefine, wdk-sys's — type-identity by construction). The six knobs generate_iddcx needed (each a real gotcha, all CI-proven):

  1. --language=c++: wdk_default parses C; IddCx.h's IDARG_* typedefs need C++ (else a "must use 'struct' tag" cascade).
  2. -DIDD_STUB — table-dispatch mode; skips IddCxFuncEnum.h's #error IDDCX_VERSION_MAJOR not defined. Do NOT add WDF_STUB (would desync the shared WDF type-identity).
  3. allowlist_recursively(false) + allowlist_file("(?i).*iddcx.*"), full codegen (no .complement()) — emit ONLY IddCx items; WDF/Win types resolve via use crate::types::*.
  4. allowlist_type("_?DXGI_.*" / "IDXGI.*" / "_?OPM_.*" / "_?D3DCOLORVALUE") — emit the non-WDF types wdk-sys doesn't bindgen, locally. The _? is load-bearing (typedef struct _OPM_X {} OPM_X needs the tag AND the alias).
  5. pub type UINT = ::core::ffi::c_uint; in src/iddcx.rs: UINT is absent from crate::types.
  6. translate_enum_integer_types(true) — emit native u32 reprs for the DXGI/OPM ModuleConsts enums (nested modules can't see a parent UINT).

Wrapper note: table dispatch via _IDDFUNCENUM::<Name>TableIndex as usize (the ModuleConsts const, not a NewType .0); NTSTATUS is plain i32 (wdk_sys::NT_SUCCESS). The driver build.rs adds the IddCxStub link-search (the import lib is under iddcx\1.0\ even though headers are 1.10) + #[no_mangle] pub static IddMinimumVersionRequired: ULONG = 4. The versioned IDD_STRUCTURE_SIZE! path is dropped — the WDK links the iddcx 1.0 stub (lacks the version table); we target 1.10 vs a current framework, so size_of is exactly correct.

6.3 Driver port checklist (STEP 08, as landed)

  1. workspace pf-vdisplay(cdylib)+wdk-iddcx; prove std::thread+OwnedHandle link under UMDF (done).
  2. wdk-iddcx: 11 typed DDI wrappers via one dispatch macro + re-export the inbound PFN_* types.
  3. DriverEntry + IDD_CX_CLIENT_CONFIG (15 callbacks) + DeviceInitConfig + WdfDeviceCreate + CreateDeviceInterface (the owned pf GUID) + DeviceInitialize; edid.rs salvaged verbatim.
  4. DeviceContext + WDF_DECLARE_CONTEXT_TYPE blob; init_adapter in D0Entry (caps + FP16) → AdapterInitAsync; the *2 mode DDIs + query_target_info + gamma/HDR accept-stubs. (Box gate: loads under Secure Boot, enumerates as an IddCx adapter, Status OK.)
  5. control plane (GET_INFO version handshake the host asserts, ADD/REMOVE/SET_RENDER_ADAPTER/PING/ CLEAR_ALL) + create_monitor + real mode DDIs + watchdog + mode bounds; host switched to pf_driver_proto.
  6. Direct3DDevice + assign/unassign + SwapChainProcessor (worker, SetDevice 60×@50 ms single-borrow retry, top-of-loop terminate, ReleaseAndAcquireBuffer2, from_raw_borrowed).
  7. FramePublisher on pf_driver_proto::frame + keyed-mutex RAII guard; wire into run_core. (Box: full IDD-push glass-to-glass + the secure-desktop gate — validated 2026-06-25.)
  8. HDR / FP16 ring (validated: Mac connects WITH HDR).
  9. its own .inx + an unsafe-reduction pass (deny(unsafe_op_in_unsafe_fn), per-site // SAFETY:).

Remaining driver work beyond STEP 8: E1 (DeviceContext-owned state + per-IDDCX_MONITOR EvtCleanupCallback → unblock max_concurrent>1), the slot-reclaim-on-REMOVE fix, and M4 (fold the gamepad drivers in). See §4.

6.4 Resolved product decisions (the five forks)

A the host was refactored in place (staged, behavior-preserving), not greenfield-rebuilt — the driver was rebuilt fresh. B IDD-push primary for everything incl. the secure desktop (validated); WGC+DDA demoted to non-IddCx fallbacks. C all drivers on microsoft/windows-drivers-rs (+ the iddcx subset; /INTEGRITYCHECK solved) — done for pf-vdisplay, pending for the gamepad drivers (M4). D keep GameStream (Moonlight), default to secure serve. E concurrent sessions: the host-side preempt dance was removed by §2.5, but true max_concurrent>1 on Windows stays blocked on the E1 driver swap-chain-reuse work.


Appendix — consolidation note

This file replaces five docs (recoverable from git history):

  • windows-host-rewrite.md (the original design + plan, §0–§15) — its current status, architecture, the jewels, the seam traits, and the deep reference (§6) are folded in here.
  • windows-host-goal1-plan.md (the 6-stage in-place host refactor) — complete; its outcome is §2.2–§2.4 and the Goal-1 scorecard row.
  • windows-host-rewrite-audit.md (the 2026-06-25 audit) — its findings are reconciled to current reality in §1 (scorecard) and §4 (only the still-open items survive: host hygiene, E1, slot-reclaim).
  • windows-host-rewrite-remediation.md (the audit-remediation tracker) — its landed items are in §1; its remaining items (D1-host, D2, E1, G) are §4 P2/P3.
  • windows-host-rewrite-game-capture-bug.md (the GB1 investigation + fix) — fixed; the resolution is §2.5 (capture). The full investigation narrative is in git history.

(The older docs/windows-host.md, a pre-rewrite implementation plan from 2026-06-22, is a separate lineage and is left as-is.)