punktfunk/docs/windows-host-rewrite.md
enricobuehler 0b663cefb6 feat(windows): pf-vdisplay-proto — owned host<->driver ABI crate (rewrite M0)
First foundation of the Windows-host rewrite (docs/windows-host-rewrite.md): a
self-contained, no_std + bytemuck crate that defines the host<->driver binary
contract ONCE — the control-plane IOCTLs (add/remove/set-render-adapter/ping/
get-info/clear-all) and the IDD-push frame transport (SharedHeader, the
(gen<<40|seq<<8|slot) FrameToken, the Global\pfvd-* name scheme, driver-status
codes). Previously these were hand-duplicated byte-for-byte across
idd_push.rs/frame_transport.rs and sudovda.rs/control.rs with only "must match"
comments; here const size-asserts + bytemuck round-trips make any drift a COMPILE
error.

Clean break from SudoVDA: a freshly-minted interface GUID (not e5bcc234), a
contiguous 0x900 op space (not the gappy 0x800/0x888/0x8FF), a u64 session id (not
the 16-byte GUID + pid-mangling), a single u32 protocol version. Self-contained
(no workspace inheritance, no Windows deps) so the out-of-workspace driver build
graph can path-dep it identically. 7 tests green on Linux; clippy + fmt clean.

Also lands the full rewrite plan in docs/windows-host-rewrite.md (decisions:
greenfield; IDD-push primary incl. secure desktop, WGC+DDA demoted to fallbacks;
unify drivers on windows-drivers-rs + solve /INTEGRITYCHECK; keep GameStream,
default secure).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 06:49:50 +00:00


Windows Host Rewrite — Design & Plan

Status: proposed (2026-06-24). This plan takes the current, hard-won Windows host (pf-vdisplay all-Rust IddCx driver + IDD-push zero-copy capture, live-validated 5120×1440@240 HDR on the RTX box) as a knowledge base and re-derives a clean, stable, well-layered architecture from it. It drops all SudoVDA back-compat (we own both ends now) and drives unsafe to a contained minimum.

It supersedes the stale conclusion in docs/windows-virtual-display-rust-port.md ("IDD-push not viable") — that verdict was written in the same commit (e2c9bfd) that shipped the working 922-line consumer + 424-line producer. IDD-push works and is the architecture. The breakthrough the prose never recorded: once the CCD topology makes the virtual display the sole composited desktop in the console session, DWM composites to it and the IddCx swap-chain is assigned (run_core: FIRST FRAME acquired — DWM IS compositing the virtual display!). Per the owner, IDD-push also captures the secure desktop (Winlogon / UAC / lock) — so it is the universal primary path, not just the normal-desktop path.

Decisions resolved (2026-06-24)

| # | Decision | Chosen |
|---|----------|--------|
| A | Execution (greenfield vs staged) | Greenfield rewrite: rebuild the Windows host fresh against the clean architecture, salvaging the validated "jewels" (§1) verbatim. (Risk acknowledged: no CI for the Windows paths; mitigated by the §1 preservation checklist + on-glass gates, §10.) |
| B | Capture surface (IDD-only / IDD+secure-DDA / keep fallbacks) | IDD-push primary for everything (incl. the secure desktop); keep WGC + DDA as fallbacks. |
| C | Driver binding stack (wdf-umdf vs windows-drivers-rs) | Extend microsoft/windows-drivers-rs with an iddcx subset; unify all three drivers on it; solve /INTEGRITYCHECK properly (§6). |
| D | GameStream on Windows (keep / keep-secure-default / drop) | Keep Moonlight compat; flip the installer/service default to secure serve (GameStream an explicit opt-in). |

0. Goals (from the brief)

  1. Clean, stable, well-layered architecture. Decompose the god-files, give every subsystem one owner, and replace the ~40-knob PUNKTFUNK_* env soup with a typed config resolved once per session.
  2. Drop every trace of SudoVDA back-compat. We own the driver (pf-vdisplay) and the host. The byte-identical IOCTL ABI, the reused {e5bcc234} GUID, the sudovda module name, the "SudoVDA ignores this" conditionals — all pure liability now.
  3. Minimize unsafe. ~480 unsafe occurrences across the Windows surface; the large majority are FFI-mechanical (windows-rs/NVENC/WDK already return Result). Target: host ~144→~35, drivers ~227→~60, with the irreducible floor contained in 3–4 named modules under deny(unsafe_op_in_unsafe_fn).

Non-goals / invariants (do not regress)

  • Linux host behavior is out of scope and must not change. The host crate is shared; Linux is validated across KWin/gamescope/Mutter/Sway. Touch only the seams.
  • punktfunk-core stays the one linked core. Protocol/FEC/crypto/QUIC live there behind the C ABI; the host is a leaf binary. No protocol changes here.
  • No async on the per-frame path. Native threads only (the existing discipline).

1. What we KEEP (validated, load-bearing — port, don't rewrite)

These are expensive empirical wins. The rewrite relocates/wraps them but must preserve behavior byte-for-byte:

  • The IDD-push frame transport shape: host-creates / driver-opens shared keyed-mutex texture ring with the permissive D:(A;;GA;;;WD) SDDL (forced by the restricted WUDFHost token, mirrors the gamepad drivers); the generation-tagged latest = gen<<40 | seq<<8 | slot stale-ring reject (kills the HDR-flip garbage frame); 0 ms try-acquire / drop-on-full publish (never block the swap-chain thread); the host output ring OUT_RING + pipeline_depth=2 overlap of convert/copy vs NVENC.
  • The IddCx driver internals that earned their keep: edid.rs in full (128-byte EDID + CTA-861.3 HDR block, serial-as-index round-trip, dual checksums); the HDR enablement recipe (CAN_PROCESS_FP16 + the *2 mode DDIs + set_gamma_ramp/set_default_hdr_metadata accept-stubs + HIGH_COLOR_SPACE + 8|10 bpc); DEVICE_POOL one-device-per-render-LUID (the NVIDIA UMD-thread/VRAM leak fix); stamping the OS target id onto the monitor context (the recreated-monitor target_id=0 fix); the swap-chain processor's two real leak fixes (borrow IDXGIDevice across SetDevice retries; check terminate at the loop top during a frame burst).
  • The monitor-lifecycle concurrency correctness: serialized ADD/REMOVE/teardown, the documented lock order, the watchdog CAS + re-check-under-lock, the creation grace window, the generation-stamped lease (a stale lease can't tear down a fresh monitor). Structure can change; these properties must survive.
  • The CCD topology fixes: isolate_displays_ccd (the iGPU-attached-monitor hybrid-box correctness; the SDC_FORCE_MODE_ENUMERATION re-commit that drives COMMIT_MODES → ASSIGN_SWAPCHAIN); restore topology before REMOVE.
  • The HDR color math: hdr.rs verbatim (pure, unit-tested, ST.2086 G/B/R + big-endian SEI); HdrConverter/HdrP010Converter + the f64 p010_reference + hdr_p010_selftest; VideoConverter (RGB→NV12/P010 on the video engine — a measured latency win); the cursor decomposition (convert_pointer_shape color/masked/monochrome edge cases).
  • NVENC tuning: caps-probe-before-configure (disambiguate unsupported-config vs too-high-bitrate; 10-bit→8-bit graceful downgrade); the bitrate-clamp binary search (finds each GPU's real ceiling); true RFI over the DPB; the low-latency configs (CBR, infinite GOP, P-only, ~1-frame VBV).
  • The gamepad driver wins: the SwDeviceCreate identity recipe (enumerator with no _; mandatory completion callback; synthesized USB\VID_054C&PID_0CE6 compat-ids for native-DS5 detection; the non-null per-pad ContainerId dodging the xinput1_4 slot-skip); one pf_dualsense serving DualSense+DS4 via a device_type byte; XUSB declining WAIT_* to force synchronous GET_STATE; the static HID descriptors/feature blobs; per-pad index via pszDeviceLocation.
  • The session-glue patterns: the Capturer/VirtualDisplay/Encoder trait seam + RAII keepalive teardown; host-lifetime shared services (InjectorService/MicService/AudioCapSlot) with per-session gamepads; the encode|send thread split + microburst pacing; build_pipeline_with_retry + permanent-vs-transient classification; the control-task select! + adaptive-FEC; the GameStream VideoPacketizer (GF8 Cauchy, Moonlight byte-exact); the pairing/trust handshake.
  • The SCM supervisor model: Session-0 LocalSystem supervisor → token-retarget → CreateProcessAsUserW serve into the console session, relaunch-on-session-change, kill-on-close Job Object; the file-append log-mask; the two-tier logging init.
  • Build/CI wins: the wdf-umdf-sys build.rs SDK-version resolution (picks the SDK version that actually contains iddcx, not the max base SDK); the ARM64 cross-compile off the x64 runner; the thin-.iss / fat-binary installer delegating to service install.
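
The bitrate-clamp binary search named in the NVENC bullet can be sketched as follows. This is a minimal sketch, not the host's actual code: `find_bitrate_ceiling`, its parameters, and the kbps unit are hypothetical, and it assumes acceptance is monotonic (if the encoder accepts a rate, it accepts every lower rate). The `accepts` closure stands in for a real configure/probe call.

```rust
/// Largest bitrate in `[floor_kbps, want_kbps]` that `accepts` reports as
/// configurable, or `None` if even the floor is rejected.
/// Assumption: acceptance is monotonic in the bitrate.
pub fn find_bitrate_ceiling(
    floor_kbps: u32,
    want_kbps: u32,
    step_kbps: u32,
    mut accepts: impl FnMut(u32) -> bool,
) -> Option<u32> {
    // Fast path: the requested rate is fine, no search needed.
    if want_kbps >= floor_kbps && accepts(want_kbps) {
        return Some(want_kbps);
    }
    if floor_kbps > want_kbps || !accepts(floor_kbps) {
        return None;
    }
    // Invariant: `lo` is accepted, `hi` is rejected.
    let (mut lo, mut hi) = (floor_kbps, want_kbps);
    while hi - lo > step_kbps.max(1) {
        let mid = lo + (hi - lo) / 2;
        if accepts(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    Some(lo)
}
```

With `step_kbps = 1` the search converges on the exact ceiling; a coarser step trades precision for fewer probe calls, which matters if each probe reconfigures the encoder.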

2. Target architecture

2.1 Crate & workspace strategy

Keep ONE shared crates/punktfunk-host crate (do not split punktfunk-host-windows). The host is a leaf binary consumed by nobody; the "one core, linked everywhere" invariant is already satisfied by punktfunk-core. A split would only fork the genuinely-shared session glue, traits, and hdr.rs. The cfg-sprawl win comes instead from confining all Windows code under one src/windows/ subtree behind a single #[cfg(windows)] mod windows; seam, with backend impls next to their trait's dispatch point.

Pull the three drivers into ONE in-tree driver workspace (packaging/windows/drivers/) on a single binding stack, one rust-toolchain.toml, one signing recipe, one CI build. Today they are 2–3 disjoint cargo packages on two incompatible WDK stacks (see §6).

Add ONE shared no_std ABI crate (crates/pf-vdisplay-proto, name TBD) consumed by both the host crate and the driver workspace. It owns every cross-process binary contract that is currently hand-duplicated with "must match" comments. This is the single highest-value correctness change (§3.1).
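
To illustrate how "must match" comments turn into compile errors, here is a minimal sketch of a const-asserted layout. The field set of `SharedHeader` below is hypothetical (the real header is defined in the proto crate), and the sketch uses only the standard library with manual little-endian helpers in place of the bytemuck derives, so it builds without dependencies.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical frame-ring header; the fields are illustrative only.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SharedHeader {
    pub magic: u32,
    pub proto_version: u32,
    pub latest: u64, // packed frame token
    pub driver_status: u32,
    pub slot_count: u32,
}

// The layout is the ABI: any edit that moves a field fails the build on both sides.
const _: () = assert!(size_of::<SharedHeader>() == 24);
const _: () = assert!(align_of::<SharedHeader>() == 8);
const _: () = assert!(offset_of!(SharedHeader, latest) == 8);
const _: () = assert!(offset_of!(SharedHeader, driver_status) == 16);

impl SharedHeader {
    /// Serialize field by field (little-endian), matching the repr(C) offsets.
    pub fn to_bytes(&self) -> [u8; 24] {
        let mut out = [0u8; 24];
        out[0..4].copy_from_slice(&self.magic.to_le_bytes());
        out[4..8].copy_from_slice(&self.proto_version.to_le_bytes());
        out[8..16].copy_from_slice(&self.latest.to_le_bytes());
        out[16..20].copy_from_slice(&self.driver_status.to_le_bytes());
        out[20..24].copy_from_slice(&self.slot_count.to_le_bytes());
        out
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(b: &[u8; 24]) -> Self {
        let u32_at = |i: usize| u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]]);
        let mut latest = [0u8; 8];
        latest.copy_from_slice(&b[8..16]);
        SharedHeader {
            magic: u32_at(0),
            proto_version: u32_at(4),
            latest: u64::from_le_bytes(latest),
            driver_status: u32_at(16),
            slot_count: u32_at(20),
        }
    }
}
```

Because the asserts are evaluated at compile time in the one shared crate, neither the host nor the driver can build against a layout the other side does not share.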

2.2 Target file tree (host crate)

crates/punktfunk-host/src/
  main.rs                  clap-derive subcommand dispatch only (kills parse_serve/parse_spike/hand --help)
  config.rs                HostConfig (typed; parsed ONCE from host.env/env/flags) + config_dir
  session/
    mod.rs                 SessionFactory, SessionPlan, SessionContext, Session (the ONLY teardown path)
    server.rs              QUIC accept loop, handshake, shared-service wiring
    serve_session.rs       resolve_* → Welcome/Start → spawn → RAII teardown
    control.rs             mid-stream renegotiation select! loop
    pipeline.rs            REAL shared encode|send split, send_loop, FrameMsg, pacing (used by native AND GameStream)
  capture.rs               Capturer trait + CapturedFrame/PixelFormat/FramePayload (platform-neutral)
  capture/linux.rs
  capture/windows/         mod.rs (dispatch), idd_push.rs, dda.rs, wgc.rs, secure_desktop.rs*
  vdisplay.rs              VirtualDisplay/VirtualOutput trait + open() dispatch (neutral)
  vdisplay/{kwin,gamescope,mutter,wlroots}.rs
  vdisplay/windows.rs      was sudovda.rs → PfVirtualDisplay + VirtualDisplayManager
  encode.rs                Encoder trait, EncodedFrame, validate_dimensions, open_encoder dispatch
  encode/{linux,vaapi,sw}.rs
  encode/windows/          mod.rs (dispatch), nvenc.rs, nvenc_sys.rs, ffmpeg_win/{mod,system,zerocopy,d3d11va_ffi}.rs
  hdr.rs                   PRESERVE VERBATIM
  inject.rs / inject/linux/* / inject/windows/{mod,sendinput,pad_manager,xusb,dualsense,dualshock4,swdevice,section}.rs
  inject/proto/{dualsense,dualshock4}.rs   shared pure codecs (PRESERVE)
  audio.rs / audio/linux.rs / audio/windows/{mod,wasapi_cap,wasapi_mic}.rs
  windows/                 mod.rs, d3d/{mod,texture,ring,convert}.rs, color/{hdr,p010,video_proc}.rs,
                           cursor.rs, display_ccd.rs, adapter.rs, process.rs (Token/Event/Job/Child/spawn_as_user),
                           service.rs (SCM; uses process.rs), win32u_hook.rs*, gpu_priority.rs
  session_tuning.rs (PRESERVE) / pwinit.rs / discovery.rs / mgmt.rs / native_pairing.rs / library.rs
  gamestream/              unchanged module set; stream.rs slims by reusing session/pipeline.rs

* = survives only per the secure-desktop / WGC product decisions (§5, §11).

2.3 The seam traits (keep the shape; tighten 3 things)

trait VirtualDisplay: Send {
    fn name(&self) -> &str;
    fn create(&self, mode: Mode) -> Result<VirtualOutput>;
    fn set_launch_command(&self, cmd: Option<String>);   // per-instance, not a global env var
}
struct VirtualOutput {
    node_id: u32,
    preferred_mode: Mode,
    #[cfg(windows)] win_capture: WinCaptureTarget,        // target_id + adapter_luid + monitor_gen (carried, not ambient)
    keepalive: Box<dyn VirtualLease>,
}
trait VirtualLease: Send {                                // Drop = release; replaces the sudovda free-fns + CURRENT_MON_GEN reach-in
    fn set_hdr(&self, on: bool) -> Result<()>;
    fn hdr_enabled(&self) -> bool;
    fn await_released(&self, timeout: Duration) -> bool;
}

trait Capturer: Send {
    fn next_frame(&mut self) -> Result<CapturedFrame>;
    fn try_latest(&mut self) -> Option<CapturedFrame>;
    fn set_active(&mut self, a: bool);
    fn hdr_meta(&self) -> Option<HdrMeta>;
    fn pipeline_depth(&self) -> usize;
}
fn open_capturer(vout: VirtualOutput, want: OutputFormat) -> Result<Box<dyn Capturer>>;  // format+HDR passed IN

trait Encoder: Send {
    fn submit(&mut self, f: &CapturedFrame) -> Result<()>;
    fn poll(&mut self) -> Option<EncodedFrame>;
    fn flush(&mut self);
    fn request_keyframe(&mut self);
    fn caps(&self) -> EncoderCaps;                        // query, don't rely on default no-ops
    fn set_hdr_meta(&mut self, m: Option<HdrMeta>);
    fn invalidate_ref_frames(&mut self, lo: u64, hi: u64) -> bool;
}
fn open_encoder(plan: &EncodePlan) -> Result<Box<dyn Encoder>>;

trait AudioCapturer: Send { fn next_chunk(&mut self) -> Result<Vec<f32>>; fn channels(&self) -> u16; fn drain(&mut self); }
trait VirtualMic:    Send { fn push(&mut self, pcm: &[f32]); fn channels(&self) -> u16; }
trait InputInjector: Send { fn inject(&mut self, e: &InputEvent); }
trait PadManager:    Send { /* handle/apply_rich/pump/heartbeat — Box<dyn PadManager> via select(GamepadPref), replaces the PadBackend enum */ }

The three tightenings: (1) Capturer takes the desired OutputFormat IN — kills the capture → encode::windows_resolved_backend() back-reference that's recomputed in dxgi.rs; (2) HDR control + monitor-release become VirtualLease methods so the session glue never names a concrete backend and contains zero unsafe; (3) optional encoder capabilities are queried via EncoderCaps.

2.4 SessionFactory + typed plan (the single biggest clarity lever)

Today the Windows capture/topology/encoder decision is made by ~40 scattered env reads, recomputed in THREE places (capture_virtual_output, should_use_helper, virtual_stream) with no single owner and a latent mirrored-dispatch bug (capture and encode can disagree on the backend). Replace with:

struct SessionPlan {
    display:  DisplayBackend,
    capture:  CaptureBackend,        // IddPush | Dda | Wgc
    topology: SessionTopology,       // SingleProcess | TwoProcessRelay
    encoder:  EncoderBackend,        // Nvenc | Amf | Qsv | Software
    input_format: OutputFormat,
    bit_depth: u8, hdr: bool, pipeline_depth: usize,
}
struct SessionFactory { cfg: Arc<HostConfig>, vdm: Arc<VirtualDisplayManager>, injector, mic, audio }
impl SessionFactory {
    fn plan(&self, welcome: &Welcome) -> SessionPlan;          // resolves ONCE from HostConfig; no env reads downstream
    fn build(&self, plan: &SessionPlan, ctx: SessionContext) -> Result<Session>;  // owns the RAII chain
}

build() owns the chain vdm.lease(mode) → open_capturer(vout, fmt) → open_encoder(plan) → spawn pipeline, and Session::drop is the only teardown path. This kills the env soup, makes the deployed path readable, and removes the capture/encode backend-disagreement bug class. It also lets us drop the 12–13-arg #[allow(too_many_arguments)] signatures (a SessionContext struct) and the dead Compositor ceremony threaded through the Windows path.
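
A minimal sketch of "resolve once, no env reads downstream". The `HostConfig` and `Welcome` fields here are hypothetical subsets, and the HDR → P010/10-bit mapping and the depth of 1 for the fallback backends are assumptions for illustration; only the IDD-push depth of 2 comes from §1.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CaptureBackend { IddPush, Dda, Wgc }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OutputFormat { Nv12, P010 }

/// Hypothetical subset of the typed host config, parsed once at startup.
pub struct HostConfig {
    pub capture_fallback: Option<CaptureBackend>,
    pub allow_hdr: bool,
}

/// Hypothetical subset of the client's Welcome.
pub struct Welcome {
    pub wants_hdr: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SessionPlan {
    pub capture: CaptureBackend,
    pub input_format: OutputFormat,
    pub bit_depth: u8,
    pub hdr: bool,
    pub pipeline_depth: usize,
}

/// The single decision point: capturer and encoder both receive this value,
/// so they cannot disagree about backend or format.
pub fn plan(cfg: &HostConfig, welcome: &Welcome) -> SessionPlan {
    let hdr = cfg.allow_hdr && welcome.wants_hdr;
    let capture = cfg.capture_fallback.unwrap_or(CaptureBackend::IddPush);
    SessionPlan {
        capture,
        input_format: if hdr { OutputFormat::P010 } else { OutputFormat::Nv12 },
        bit_depth: if hdr { 10 } else { 8 },
        hdr,
        // Assumption: only the IDD-push ring overlaps convert/copy with encode.
        pipeline_depth: if capture == CaptureBackend::IddPush { 2 } else { 1 },
    }
}
```

Since `SessionPlan` is `Copy` and immutable after `plan()`, passing it to both factories is cheap and leaves no place for a second, diverging resolution.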

2.5 Ownership model — delete the global statics

Today the lifecycle is smeared across IDD_PERSIST + open_or_reuse (dead code), CURRENT_MON_GEN (read per-frame), IDD_SETUP_LOCK/IDD_SESSION_STOP (the preempt dance), MGR: Mutex<Mgr>, and on the driver side ADAPTER/MONITOR_MODES/NEXT_ID/WATCHDOG_*/DEVICE_POOL. Replace with:

  • A host-lifetime VirtualDisplayManager owning a typed OwnedHandle device handle (not a raw isize smuggled across threads) and the refcounted Idle/Active/Lingering state machine (preserve the machine — it's earned).
  • A per-session MonitorLease whose Drop releases the refcount; the monitor generation carried through WinCaptureTarget instead of the ambient CURRENT_MON_GEN.
  • On the driver: wire EvtCleanupCallback for MonitorContext (only DeviceContext has it today) so the SwapChainProcessor + D3D resources drop via WDF RAII — deleting free_swap_chain_processor and the manual-free-before-departure dance that is the documented dominant reconnect leak. Move the process-global driver state into the DeviceContext; collapse the 3-way monitor identity (MONITOR_MODES / EDID serial / context stamp) to one Monitor owned by the context.
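
The generation-stamped lease can be sketched in a few lines. This is a simplified model under stated assumptions: it omits the Idle/Active/Lingering machine and the device handle, and `recreate()` is a hypothetical stand-in for any forced teardown (e.g. the watchdog). The point is only the Drop rule: a lease whose generation no longer matches must not touch the refcount.

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct State {
    generation: u64,
    refcount: u32,
}

#[derive(Clone, Default)]
pub struct VirtualDisplayManager {
    state: Arc<Mutex<State>>,
}

pub struct MonitorLease {
    mgr: VirtualDisplayManager,
    generation: u64,
}

impl VirtualDisplayManager {
    /// Take a lease; the first lease after idle creates a fresh generation.
    pub fn lease(&self) -> MonitorLease {
        let mut s = self.state.lock().unwrap();
        if s.refcount == 0 {
            s.generation += 1;
        }
        s.refcount += 1;
        MonitorLease { mgr: self.clone(), generation: s.generation }
    }

    /// Hypothetical forced teardown: outstanding leases become stale.
    pub fn recreate(&self) {
        let mut s = self.state.lock().unwrap();
        s.generation += 1;
        s.refcount = 0;
    }

    pub fn refcount(&self) -> u32 {
        self.state.lock().unwrap().refcount
    }
}

impl MonitorLease {
    pub fn generation(&self) -> u64 {
        self.generation
    }
}

impl Drop for MonitorLease {
    fn drop(&mut self) {
        let mut s = self.mgr.state.lock().unwrap();
        // A stale lease must not release a monitor it never owned.
        if s.generation == self.generation && s.refcount > 0 {
            s.refcount -= 1;
        }
    }
}
```

Carrying `generation` inside the lease (and, in the plan above, inside WinCaptureTarget) is what removes the need for an ambient CURRENT_MON_GEN.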

3. The host↔driver contract (own it; define once)

3.1 pf-vdisplay-proto (no_std, bytemuck/zerocopy)

One crate, both build graphs (path dep). Owns:

  • Control plane: a fresh interface GUID; a contiguous, versioned op enum; #[repr(C)] request/reply structs carrying only used fields.
  • Frame plane: SharedHeader, the FrameToken { generation, seq, slot } with pack/unpack (replacing the hand-twiddled gen<<40|seq<<8|slot on both sides), the Global\pfvd-* name helpers.
  • Gamepad sections: XusbShm (64 B) and PadShm (256 B, incl. device_type) layouts.
  • Derive FromBytes/IntoBytes/Pod; const size+offset asserts; round-trip tests. ABI drift becomes a compile error, not a runtime corruption. (bytemuck is already a dep in the driver + wdf-umdf-sys.) This deletes every OFF_* constant + read/write_unaligned on both sides of every boundary — the largest single block of shared-memory unsafe, and the top drift hazard.
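
A minimal sketch of the typed FrameToken replacing the hand-twiddled shifts. The field widths (24-bit generation, 32-bit seq, 8-bit slot in a u64) are an assumption inferred from the gen<<40 | seq<<8 | slot shape; `is_current` is a hypothetical helper name for the stale-ring reject.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FrameToken {
    pub generation: u32, // only the low 24 bits are carried on the wire (assumed width)
    pub seq: u32,
    pub slot: u8,
}

impl FrameToken {
    const GEN_MASK: u64 = 0x00FF_FFFF;

    /// Pack as gen<<40 | seq<<8 | slot.
    pub const fn pack(self) -> u64 {
        ((self.generation as u64 & Self::GEN_MASK) << 40)
            | ((self.seq as u64) << 8)
            | self.slot as u64
    }

    /// Inverse of `pack`; the casts truncate to each field's width.
    pub const fn unpack(raw: u64) -> Self {
        FrameToken {
            generation: (raw >> 40) as u32,
            seq: (raw >> 8) as u32,
            slot: raw as u8,
        }
    }

    /// Stale-ring reject: accept only tokens stamped with the current ring generation.
    pub const fn is_current(self, ring_generation: u32) -> bool {
        self.generation as u64 == ring_generation as u64 & Self::GEN_MASK
    }
}
```

With pack/unpack living in one crate, the host consumer and the driver producer can no longer disagree on a shift count.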

3.2 Control plane — keep DeviceIoControl, redesign the ABI

DeviceIoControl is the correct WDF idiom for a driver with no control device and is low-frequency (ADD/REMOVE per session + a keepalive); the shared-memory pattern buys nothing here. Keep it; redesign the surface:

  • Ops actually needed: Add(mode, identity) → {luid, target_id}, Remove, SetRenderAdapter (now unconditional — pf-vdisplay honors it for hybrid-GPU IDD-push; drop the SudoVDA-parity default-off branch), ClearAll (first-class startup orphan reap, not an "ignored by SudoVDA" hack), GetInfo (a real version handshake), and keepalive (see §3.4).
  • Drop the SudoVDA-isms: AddParams.device_name[14]/serial[14] (ignored), the 16-byte GUID → a monotonic u64 session id (the refcount manager owns collision safety; retires next_monitor_guid's pid-mangling), the 4-byte {major,minor,incr,test} version tuple → one u32, the gappy 0x800/0x888/0x8FF func numbering → contiguous.
  • One typed IOCTL dispatch helper retrieves+validates+aligns the buffers and hands the body a safe &Req / &mut MaybeUninit<Reply> — collapses ~20 of control.rs's 29 unsafe blocks.
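
The contiguous, versioned op space might look like the sketch below. The 0x900 base comes from the commit description; the assignment of individual ops to 0x900..0x905, the `PROTOCOL_VERSION` value, and `check_version` are assumptions for illustration.

```rust
use std::convert::TryFrom;

pub const PROTOCOL_VERSION: u32 = 1; // hypothetical value
pub const OP_BASE: u32 = 0x900;

/// Contiguous function numbers; the per-op assignment here is assumed.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Add = OP_BASE,
    Remove,
    SetRenderAdapter,
    Ping,
    GetInfo,
    ClearAll,
}

impl TryFrom<u32> for Op {
    type Error = u32;

    /// Reject anything outside the contiguous range, returning the raw value.
    fn try_from(raw: u32) -> Result<Self, u32> {
        const ALL: [Op; 6] = [
            Op::Add,
            Op::Remove,
            Op::SetRenderAdapter,
            Op::Ping,
            Op::GetInfo,
            Op::ClearAll,
        ];
        raw.checked_sub(OP_BASE)
            .and_then(|i| ALL.get(i as usize).copied())
            .ok_or(raw)
    }
}

/// GetInfo handshake: a single u32 compared for equality.
pub fn check_version(driver_version: u32) -> Result<(), String> {
    if driver_version == PROTOCOL_VERSION {
        Ok(())
    } else {
        Err(format!(
            "protocol mismatch: host {PROTOCOL_VERSION}, driver {driver_version}"
        ))
    }
}
```

A contiguous range makes the decode a bounds check plus a table lookup, so an old SudoVDA-style number such as 0x888 is rejected by construction.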

3.3 Frame plane — keep the inversion, retire the scaffolding

Keep the host-creates / driver-opens ring exactly. Remove the bring-up scaffolding that diagnosed the now-solved run_core=0 mystery: the DebugBlock channel + DBG_MAGIC, spawn_observer / PUNKTFUNK_IDD_PUSH_OBSERVE, the error!-as-info! logging, the intentional handle leak, and the 20 s blind no-frame deadline (replace with the DRV_STATUS_OPENED handshake as a bounded liveness signal).
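
A minimal sketch of a bounded liveness wait that could replace the blind deadline. It assumes the driver publishes its status as a u32 the host can read atomically; in the real transport that word would live in the shared section, while here a plain `AtomicU32` stands in. The numeric value of `DRV_STATUS_OPENED` and the function name are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{Duration, Instant};

pub const DRV_STATUS_OPENED: u32 = 1; // hypothetical value

/// Poll `status` until the driver reports it opened the ring, or `timeout` elapses.
/// Returns true on success, false if the deadline passed first.
pub fn await_driver_opened(status: &AtomicU32, timeout: Duration, poll: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if status.load(Ordering::Acquire) == DRV_STATUS_OPENED {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Never sleep past the deadline.
        std::thread::sleep(poll.min(deadline.saturating_duration_since(now)));
    }
}
```

The caller chooses the bound, so a driver that never attaches surfaces as a fast, explicit failure that the plan can route to a fallback backend.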

3.4 Driver swap-chain reuse — the one open root cause

Today a reused IddCx monitor's swap-chain dies after ~2 sessions (target id resolves to 0, SetDevice fails 0x80070057, then an access violation), forcing fresh-monitor-per-session + the host-side preempt/wait_for_monitor_released dance + the IDD_PERSIST "create once, never recreate" workaround. The fix is in the driver: with EvtCleanupCallback wired + state owned by DeviceContext + the identity collapsed to one Monitor (the recreate-path bugs are exactly the 3-way identity desync), the clean recreate should become stable. If that holds, delete IDD_SETUP_LOCK/IDD_SESSION_STOP + the preempt dance and unblock max_concurrent>1 on Windows. If it can't be fixed cheaply, isolate the residual serialization inside VirtualDisplayManager (not smeared back into the session loop). Separately, evaluate replacing the polling watchdog (PING/countdown/grace/linger constellation) with a WDF file-object EvtFileClose (host holds the control handle open; close = host gone) — feasibility TBD on UMDF/IddCx.


4. Capture strategy

IDD-push is the universal primary path — normal AND secure desktop (Decision B). It composes in-process (cross-session via Global\ shared textures: driver in WUDFHost/Session 0, serve in the console session), needs no DXGI Desktop Duplication and no win32u reparenting hook, is live-validated at 5K@240 HDR, and (per the owner) also captures the secure desktop (Winlogon/UAC/lock). So there is no separate "secure capturer" in the primary path: the same IddPushCapturer spans the lock screen and UAC. Capture selection moves into a typed CaptureBackend in the SessionPlan — replacing the 3-way env branch with IddPush (default) → Dda/Wgc (explicit fallbacks).

WGC + DDA are kept as fallbacks, not deleted (Decision B). They cover non-IddCx / pre-pf-vdisplay hardware and act as a safety net if IDD-push fails to attach. But they are demoted: they are no longer the default, no longer entangled with the secure-desktop mux, and selected only via the explicit CaptureBackend fallback in the plan. This lets the DDA module shed the parts that existed only to make virtual-display-over-DDA survive on a hybrid box, while the genuinely-useful capture/recovery core stays:

  • Scope the win32u self-modifying-code hook + the GPU-pref hook to the DDA fallback leg (one win32u_hook::install()), so the primary IDD-push path never touches them. Re-confirm whether DDA even needs the win32u hook against pf-vdisplay (it may not — open verification item).
  • The two-process WGC relay's secure-desktop mux is retired — IDD-push handles the secure desktop directly, so desktop_watch.rs + composed_flip.rs + the virtual_stream_relay monolith are no longer needed for their original purpose. Keep a minimal WGC fallback capturer if the WGC backend is retained; do not port the 400-line relay state machine. (The cross-session input concern below is handled by the InputInjector/topology abstraction, not the AU video relay.)

Shared D3D primitives move out of dxgi.rs (today the de-facto dumping ground that wgc.rs and idd_push.rs import from) into windows/d3d/ (typed Texture2d/Ring/CopyResource/Map-as-bytes), windows/color/ (the converters + hdr_p010_selftest verbatim), and windows/cursor.rs. All three capturers consume them — deletes the duplicated tex_desc, cursor, HDR-poll, repeat-last logic.

The texture-ownership contract becomes type-level. NVENC encodes the capturer's texture in place (no copy), sound today only because the IDD-push capturer rotates OUT_RING and the loop honors pipeline_depth() — an undocumented cross-module coupling that is already a latent corruption risk. Fix: either the encoder always CopySubresourceRegions (as ffmpeg_win does), or the capturer hands an explicitly-leased ring texture with a documented lifetime. No more relying on the synchronous-loop assumption.
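
One way to make the second option (an explicitly leased ring texture) type-level is sketched below. `OutRing` and `SlotLease` are hypothetical names, `T` stands in for the D3D texture type, and the drop-frame-when-full policy is an assumption carried over from the drop-on-full publish rule in §1.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Ring of output textures; a slot is unavailable while a lease on it is alive.
pub struct OutRing<T> {
    slots: Vec<(Arc<AtomicBool>, Arc<T>)>,
    next: usize,
}

/// Handed to the encoder; dropping it returns the slot to the ring.
pub struct SlotLease<T> {
    in_use: Arc<AtomicBool>,
    tex: Arc<T>,
    pub index: usize,
}

impl<T> OutRing<T> {
    pub fn new(textures: Vec<T>) -> Self {
        let slots = textures
            .into_iter()
            .map(|t| (Arc::new(AtomicBool::new(false)), Arc::new(t)))
            .collect();
        OutRing { slots, next: 0 }
    }

    /// Lease the next free slot, or None if every slot is still held
    /// (the capturer then drops the frame instead of overwriting one in flight).
    pub fn lease(&mut self) -> Option<SlotLease<T>> {
        let n = self.slots.len();
        for step in 0..n {
            let index = (self.next + step) % n;
            let (in_use, tex) = &self.slots[index];
            if in_use
                .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                let lease = SlotLease { in_use: in_use.clone(), tex: tex.clone(), index };
                self.next = (index + 1) % n;
                return Some(lease);
            }
        }
        None
    }
}

impl<T> SlotLease<T> {
    pub fn texture(&self) -> &T {
        &self.tex
    }
}

impl<T> Drop for SlotLease<T> {
    fn drop(&mut self) {
        self.in_use.store(false, Ordering::Release);
    }
}
```

The encoder can hold the lease for as long as the in-place encode needs it, and the ring cannot reuse that slot regardless of how the loop is scheduled.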

The IDD-push input question (must confirm on-glass): capture+encode run in serve; input must reach the streamed (console-session) desktop. If serve runs in the console session, SendInput works directly. A code comment flags "SendInput from Session 0 can't reach Session 1" — so the architecture must make InputInjector satisfiable either by in-session SendInput or by a tiny input-only Session-1 agent (re-scope the old WGC helper to input only). The SessionPlan.topology expresses this.


5. Encode layer

  • Resolve backend + input format + pipeline depth once into EncodePlan and hand it to both the capturer and the encoder factory — kill the duplicated windows_resolved_backend() call in dxgi.rs (the highest-severity coupling). Trim open_video's 8-arg grab-bag (cuda is always false on Windows; bit_depth is overridden by the capture format anyway).
  • nvenc_sys.rs: a thin safe wrapper — RAII NvSession/NvBitstream/NvRegistration/ NvMappedInput (Drop = destroy/unregister/unmap) + an NV_ENC_CONFIG builder. The public encoder then has near-zero unsafe and no hand-written teardown loops. (The SDK table already returns Result via result_without_string().) This is the single biggest encode-side unsafe reduction.
  • ffmpeg_win: RAII AvFrame/SwsCtx/HwDeviceCtx/HwFramesCtx delete every manual av_*_free and the error-path cleanup ladders (also the biggest leak-risk reduction); a checked MappedSurface for the staging readback; a const size-assert on the hand-mirrored AVD3D11VA* structs in a dedicated d3d11va_ffi submodule (silent FFmpeg ABI drift is currently undetectable). Keep system-readback the default; zero-copy stays opt-in/experimental (no AMD/Intel lab box).
  • HDR symmetry: make in-band ST.2086/CLL SEI a shared post-encode step so AMF/QSV get the same mastering metadata as NVENC (today only NVENC attaches it; AMF/QSV rely solely on the 0xCE datagram). Centralize "when does the client learn HDR metadata" in one owner.
  • Keep hdr.rs, the Encoder trait, EncodedFrame, validate_dimensions, the caps-probe + RFI logic verbatim. Delete the pipeline.rs pump_once doc stub (the real loop is session/pipeline.rs).

6. Drivers — one binding stack (windows-drivers-rs), one workspace, one signing recipe

Today: pf-vdisplay on the vendored wdf-umdf stack; pf_dualsense + pf_xusb on microsoft/windows-drivers-rs (wdk/wdk-sys/wdk-build). Two bindgen passes, two SDK resolutions, two NTSTATUS, two build systems, two signing recipes.

Decision C: unify all three on microsoft/windows-drivers-rs (the official Microsoft stack), in one in-tree packaging/windows/drivers/ workspace, edition 2024, one rust-toolchain.toml, one CI build. The gamepad drivers already ship on it; the work is to migrate pf-vdisplay onto it and add the IddCx surface it lacks today.

Required pieces of this migration (each a Phase-0/early task):

  1. Add an iddcx subset to wdk-sys. IddCx DDIs are not WDF-table functions — they are direct IddCxStub exports — so the extension is bounded: an ApiSubset::Iddcx + iddcx feature → bindgen IddCx.h + link IddCxStub, then ~15 thin extern/wrapper fns. Use the current wdf-umdf/src/iddcx.rs (~345 LOC, validated) as a line-by-line oracle, including the IddCx 1.10 *2 HDR DDIs (IddCxSwapChainReleaseAndAcquireBuffer2, IDARG_*2, _METADATA2).
  2. Solve /INTEGRITYCHECK for self-signed loading — properly. wdk-build links the driver with /INTEGRITYCHECK, which a self-signed cert can't satisfy (CodeIntegrity 3004/3089). Today the gamepad drivers hand-patch the FORCE_INTEGRITY PE bit post-link. Replace that hack with a robust solution, in order of preference: (a) override the linker flag — drop /INTEGRITYCHECK via wdk-build config / RUSTFLAGS/link-args if it can be suppressed cleanly; else (b) a deterministic, tested CI post-link tool (a small Rust/PowerShell step that clears bit 0x80 at e_lfanew+0x5e and re-signs, run in CI, not by hand) so it's reproducible and not a footgun; (c) for a public build, real attestation signing (Partner Center) satisfies /INTEGRITYCHECK legitimately. Pick (a) if feasible; (b) as the fleet-self-signed fallback. This is the headline cost of choosing this stack and must be nailed in Phase 0.
  3. Backport the wdf-umdf-sys build.rs SDK-resolution fix into wdk-build (or a local override): resolve IddCx.h/IddCxStub by the SDK version that actually contains um\x64\iddcx, not the max base SDK (the real failure where a newer base SDK shadows the WDK SDK). windows-drivers-rs's default resolution doesn't exercise IddCx today, so this likely needs porting.
  4. Port pf-vdisplay's typed safety wins onto the new stack: re-create the WDF_DECLARE_CONTEXT_TYPE! Arc<RwLock<T>> context abstraction (the gold-standard contained unsafe); the version-gate protocol (IddCxIsFunctionAvailable! / IDD_STRUCTURE_SIZE!); and a thin safe wrapper layer so the gamepad drivers stop emitting raw call_unsafe_wdf_function_binding! everywhere (the biggest driver-unsafe lever).
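
For option (b) in item 2, the core of a deterministic post-link step can be sketched as a pure function over the image bytes. The offsets follow the text above (bit 0x80 of the 16-bit field at e_lfanew+0x5e); the function name and error strings are hypothetical, and re-signing is out of scope for the sketch.

```rust
/// FORCE_INTEGRITY bit in the PE optional header's DllCharacteristics field.
pub const FORCE_INTEGRITY: u16 = 0x0080;

/// Clear FORCE_INTEGRITY in a PE image held in memory.
/// Returns Ok(true) if the bit was set, Ok(false) if it was already clear.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<bool, &'static str> {
    if image.len() < 0x40 || &image[..2] != b"MZ" {
        return Err("not an MZ image");
    }
    // e_lfanew: little-endian u32 at 0x3C, the file offset of the "PE\0\0" signature.
    let e_lfanew =
        u32::from_le_bytes([image[0x3C], image[0x3D], image[0x3E], image[0x3F]]) as usize;
    let field = e_lfanew.checked_add(0x5E).ok_or("e_lfanew out of range")?;
    let end = field.checked_add(2).ok_or("e_lfanew out of range")?;
    if image.len() < end || &image[e_lfanew..e_lfanew + 4] != b"PE\0\0" {
        return Err("missing PE signature");
    }
    let flags = u16::from_le_bytes([image[field], image[field + 1]]);
    let was_set = flags & FORCE_INTEGRITY != 0;
    image[field..end].copy_from_slice(&(flags & !FORCE_INTEGRITY).to_le_bytes());
    Ok(was_set)
}
```

Keeping the patch a pure function makes it unit-testable on Linux CI with a synthetic image; the CI step would read the built DLL, apply it, write the result, and then sign, since any edit to the image invalidates an existing signature.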

While unifying, also: adopt WDF device contexts for per-pad state (drop the UmdfHostProcessSharing=ProcessSharingDisabled-dependent statics → true multi-pad-per-host); replace mem::zeroed() configs with the WDF_*_CONFIG_INIT initializers (kills the recurring zeroed-default bug class that already caused 3 driver bugs); cache the shm view (RAII ShmView) instead of re-mapping ~125×/s; delete the world-writable C:\Users\Public\*.log driver logging and the "M0 spike" naming; collapse is_nt_error()/dyn-Any/From<()>-as-error into a typed IntoDriverResult; collapse the per-call dispatch unsafe into one generic dispatch() helper.

Provenance note: confirm where wdk/wdk-sys/wdk-build come from (the gamepad drivers' Cargo.toml path-deps ../../crates/wdk* don't exist in this checkout — they resolve inside a windows-drivers-rs checkout on the dev box). Pin them as crates.io deps or a vendored, version-pinned copy so the driver workspace builds reproducibly in CI.


7. Input, audio, service, packaging

  • Input: consolidate the host-side device plumbing (create_swdevice/create_shm_section/SwDeviceProfile) into one inject/windows/swdevice.rs used by all three managers (XUSB included, which currently re-implements its own). The shm layouts come from pf-vdisplay-proto. Re-scope the cross-session helper (if any) to input-only.
  • Audio: small, already fairly clean. Replace the lone newdev.dll LoadLibrary+transmute (wasapi_mic.rs, the audio runtime's only unsafe) with the windows-rs DiInstallDriverW binding (or move provisioning to the installer) → zero unsafe in the audio runtime.
  • Service / process: one windows/process.rs owning RAII Token/Event/Job/Child + a single spawn_as_user() used by BOTH the SCM supervisor and any helper — deletes the duplicated token-dup/merged_env_block/CreateProcessAsUserW machinery and ~12 manual CloseHandle sites. Add a cooperative stop: a named stop event the supervisor sets and serve waits on, so Stop runs RAII teardown (today TerminateProcess skips Drop → the virtual monitor lingers, the documented stale-monitor gotcha); TerminateProcess only as a bounded fallback.
  • Packaging/CI: keep the thin-.iss / fat-binary model; add a punktfunk-host web install/uninstall subcommand to absorb the web-setup PowerShell. Build + sign the unified driver workspace in CI from source (or a CI guard that fails on stale-vendored-DLL / un-bumped DriverVer) so the driver can't silently drift from its source. Mint the fresh pf-vdisplay GUID coordinated across host + driver + INF. Single source of truth for version → build + ISCC AppVersion + INF DriverVer. Investigate retiring nefconc by creating the ROOT devnode via SwDevice/CM in Rust. Keep the devgen-never / nefconc-only and DriverVer-bump gotchas codified.

8. Unsafe-reduction program (run at port time, not as a separate pass)

  • P0 lints first (a few lines, before new code): #![deny(unsafe_op_in_unsafe_fn)] (host crate has none today; the driver workspace already has it), #![warn(clippy::undocumented_unsafe_blocks)], #![warn(clippy::multiple_unsafe_ops_per_block)]. Generated bindings keep their opt-out.
  • P0 std handle ownership: std::os::windows::io::OwnedHandle / std::fs::File::from_raw_handle everywhere a raw HANDLE/isize is held (events/jobs/tokens/sections/pipes). Used in zero host files today — the single biggest cheap win. Deletes the bespoke unsafe impl Read/Write/Drop (HandleReader), the never-closed sudovda control handle, the AtomicIsize HANDLE globals, ~6 manual CloseHandle sites — and fixes real leaks.
  • P0 the proto crate (§3.1) — kills the shared-memory pointer-cast unsafe.
  • P1 typed wrappers: windows/d3d/ (most COM calls already return Result; per-frame loop bodies become unsafe-free, the irreducible keyed-mutex/from_raw_parts lands in one frame_xfer fn); nvenc_sys + RAII ffmpeg (§5); one windows/process.rs (§7); collapse the 21 unsafe impl Send onto one audited SendPtr<T>/ThreadBound<T> (directly de-risks the NVENC in-place coupling).
  • P2 contain the irreducible: win32u_hook.rs (one install(); scope to secure-DDA or drop), gpu_priority.rs (the D3DKMT transmute), the WDF context-blob macro, the IddCx swap-chain DDI + from_raw_borrowed (wrap in a typed SwapChain guard returning a borrowed AcquiredSurface<'_>). Document a // SAFETY: per residual site.
  • P2 delete unsafe by deleting code: the present_trigger dead diagnostic, the DebugBlock channel, spawn_observer, IDD_PERSIST/open_or_reuse, helpers.rs Sendable<T>, the WGC-open thread-watchdog hack (gone with WGC), the driver file-logging.
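As an illustration of the P1 item that collapses the scattered `unsafe impl Send` sites onto one audited wrapper, here is a minimal sketch. The plan only names `SendPtr<T>`; the API below (an `unsafe` constructor plus `as_ptr`) and the `bump_on_worker` helper are assumptions made for the example, not the host's actual type.

```rust
use std::ptr::NonNull;
use std::thread;

/// Hypothetical single audited wrapper for raw pointers that must cross
/// threads. All cross-thread justification lives on the constructor.
pub struct SendPtr<T>(NonNull<T>);

impl<T> SendPtr<T> {
    /// # Safety
    /// The caller guarantees the pointee outlives this wrapper and that
    /// only one thread at a time accesses it through the pointer.
    pub unsafe fn new(ptr: NonNull<T>) -> Self {
        Self(ptr)
    }

    pub fn as_ptr(&self) -> *mut T {
        self.0.as_ptr()
    }
}

// SAFETY: a SendPtr can only be built through the unsafe constructor,
// whose contract already covers moving the pointer to another thread.
unsafe impl<T> Send for SendPtr<T> {}

/// Example use: hand a heap value to a worker, mutate it there, reclaim it.
pub fn bump_on_worker(start: u64) -> u64 {
    let raw = NonNull::new(Box::into_raw(Box::new(start))).expect("Box is never null");
    // SAFETY: the allocation stays alive until we rebuild the Box below,
    // and this thread does not touch it while the worker runs.
    let sent = unsafe { SendPtr::new(raw) };

    let returned = thread::spawn(move || {
        // SAFETY: the worker is the sole accessor for its lifetime.
        unsafe { *sent.as_ptr() += 1 };
        sent
    })
    .join()
    .expect("worker panicked");

    // SAFETY: the pointer came from Box::into_raw and the worker is done.
    let boxed = unsafe { Box::from_raw(returned.as_ptr()) };
    *boxed
}
```

With this shape, a reviewer audits one `unsafe impl Send` and then checks each `SendPtr::new` call site against a single written contract.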

Estimated: host ~144→~35, drivers ~227→~60, residual concentrated and auditable. (#![forbid(unsafe)] is impossible for the drivers and the per-frame D3D path — the realistic target is containment.)


9. SudoVDA decoupling (mechanical rename + scrub)

vdisplay/sudovda.rs → vdisplay/windows.rs; SudoVdaDisplay → PfVirtualDisplay; scrub "SudoVDA" from all log/error/doc strings across capture.rs/dxgi.rs/wgc*.rs/idd_push.rs/punktfunk1.rs/main.rs/sendinput.rs (141 refs / 15 files). Split the reach-in helpers out of the vdisplay backend (they're display-utility, not virtual-display creation): set_advanced_color, advanced_color_enabled, resolve_gdi_name, isolate/restore_displays_ccd, set_active_mode → windows/display_ccd.rs (collapsing the 4× copy-pasted QueryDisplayConfig preamble into one safe query_active_config()); resolve_render_adapter_luid → windows/adapter.rs. Both vdisplay and capture then depend on these as peers, breaking the circular reach-in. WinCaptureTarget moves to a neutral location (defined in dxgi.rs, constructed in sudovda.rs today). Drop the dual-driver fallback conditionals. Expose HDR/monitor-release as VirtualLease methods (zero unsafe in the session glue).


10. Build plan (greenfield — Decision A)

A from-scratch rebuild of the Windows host against the clean architecture, salvaging the §1 jewels verbatim (the already-clean, already-tested modules: hdr.rs, edid.rs, the inject/proto codecs, the HDR/cursor converters + their self-tests, the GF8 packetizer, the pairing handshake). The old Windows code stays in-tree, untouched, as the reference implementation until the new path reaches parity on glass, then is deleted.

Greenfield-risk mitigation (the survey's strong caveat stands): almost none of this is CI-validatable — the Windows backends + drivers need the RTX box (192.168.1.173) + the build VM, and AMF/QSV have no lab hardware at all. A greenfield rewrite therefore carries real risk of silently dropping a layered bug-fix. Two guardrails are mandatory:

  1. The §1 preservation checklist is a test/assert contract, not prose: each rebuilt module ports its hard-won invariants as unit tests or runtime asserts — RAII teardown order (restore displays before REMOVE), keyed-mutex held only across convert/copy, terminate checked at the swap-chain loop top, magic stamped last, OUT_RING texture rotation under pipeline_depth>1, the NVENC caps-probe downgrade, the SwDeviceCreate identity recipe. A rebuild that drops one fails its own test.
  2. On-glass A/B gates at each milestone below, on the RTX box, against the current shipping build: 1080p60, 5K@240 HDR, reconnect-storm, secure desktop (lock/UAC), multi-pad. Nothing replaces the old path until its A/B passes.
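To make the first guardrail concrete, here is one way an invariant such as "restore displays before REMOVE" could be pinned by a test instead of prose. This is a sketch under assumptions: `DisplayIsolation`, `MonitorHandle`, and the field layout of `VirtualLease` are hypothetical stand-ins, and the real teardown would issue CCD and IOCTL calls rather than push strings into a log. It relies only on Rust's guarantee that struct fields drop in declaration order.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log of teardown steps, in the order they actually ran.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Stand-in for the CCD isolation guard; Drop restores the displays.
struct DisplayIsolation {
    log: Log,
}

impl Drop for DisplayIsolation {
    fn drop(&mut self) {
        self.log.borrow_mut().push("restore_displays");
    }
}

/// Stand-in for the driver-side monitor; Drop issues REMOVE.
struct MonitorHandle {
    log: Log,
}

impl Drop for MonitorHandle {
    fn drop(&mut self) {
        self.log.borrow_mut().push("remove_monitor");
    }
}

/// Struct fields drop in declaration order, so the isolation guard is
/// declared first: displays are restored before the monitor is removed.
pub struct VirtualLease {
    _isolation: DisplayIsolation,
    _monitor: MonitorHandle,
}

/// Builds a lease, drops it, and returns the observed teardown order.
pub fn teardown_order() -> Vec<&'static str> {
    let log = Log::default();
    let lease = VirtualLease {
        _isolation: DisplayIsolation { log: Rc::clone(&log) },
        _monitor: MonitorHandle { log: Rc::clone(&log) },
    };
    drop(lease);
    let order = log.borrow().clone();
    order
}
```

A unit test asserting that `teardown_order()` equals `["restore_displays", "remove_monitor"]` then fails the moment someone reorders the fields, which is the "a rebuild that drops one fails its own test" property.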

Build order

  • M0 — Foundations + the /INTEGRITYCHECK answer. Stand up crates/pf-vdisplay-proto (the clean, owned ABI: fresh GUID, the redesigned IOCTL op enum + #[repr(C)] structs, SharedHeader, FrameToken, the gamepad shm layouts, const size-asserts, round-trip tests). Stand up the in-tree packaging/windows/drivers/ workspace on windows-drivers-rs and prove the two hard unknowns: (a) the iddcx wdk-sys subset bindgen+links and a trivial IddCx adapter loads; (b) /INTEGRITYCHECK is solved (§6.2) so a self-signed driver loads under Secure Boot with no hand-patching. Add the P0 lints to the host crate. No host behavior yet.
  • M1 — pf-vdisplay on the new stack, first light. Rebuild the IddCx driver against windows-drivers-rs+iddcx, clean from the start: DeviceContext-owned state (no process-globals), one Monitor identity, EvtCleanupCallback on MonitorContext, the ported Arc<RwLock<T>> context, the EDID + HDR recipe verbatim, the redesigned control plane from the proto crate. (On-glass: ADD → monitor arrives → IDD-push ring attaches → frames flow at 1080p; REMOVE clean.)
  • M2 — IDD-push capture + NVENC, glass-to-glass. New src/windows/ tree: windows/d3d/ typed wrappers, windows/color/ (converters + self-tests), windows/cursor.rs, capture/windows/idd_push.rs consuming the proto ring with a type-level texture-ownership contract (no in-place-encode assumption), encode/windows/{nvenc.rs,nvenc_sys.rs}, vdisplay/windows.rs + windows/display_ccd.rs + windows/adapter.rs. Wire the SessionFactory/SessionPlan (M2 only needs the IDD-push+NVENC plan). (On-glass A/B: 1080p60 + 5K@240 HDR, latency parity with the current build.)
  • M3 — Service, input, audio, secure desktop. windows/process.rs (RAII Token/Event/Job/Child + spawn_as_user + cooperative stop) + windows/service.rs; inject/windows/* on the proto shm + consolidated swdevice.rs; audio/windows/* (zero-unsafe runtime). Confirm IDD-push captures the secure desktop (lock/UAC) and input reaches the streamed session (in-session SendInput, or the input-only agent if needed). (On-glass: full session incl. lock screen + UAC + a real pad.)
  • M4 — Gamepad drivers onto the unified stack. Rebuild pf_dualsense + pf_xusb on windows-drivers-rs in the same workspace, WDF device contexts (true multi-pad), proto shm, WDF_*_CONFIG_INIT, no file logging, no "M0 spike" naming. (On-glass: 2 XInput + 2 DualSense pads, rumble/lightbar/adaptive-trigger round-trip.)
  • M5 — Fallbacks + GameStream + AMF/QSV. Port the demoted WGC + DDA fallback capturers (minimal, win32u hook scoped to the DDA leg); encode/windows/ffmpeg_win/* with RAII FFmpeg + the d3d11va_ffi size-assert (system-readback default; zero-copy experimental); GameStream planes reusing session/pipeline.rs, installer default flipped to secure serve. (On-glass: Moonlight client on the DDA fallback; AMF/QSV stays CI-only.)
  • M6 — Cut over + delete. Flip the default to the new path, run the full A/B matrix, then delete the old dxgi.rs/wgc*/sudovda.rs/punktfunk1.rs Windows monoliths + the bring-up scaffolding (DebugBlock/spawn_observer/observe gate) + the old gamepad driver crates. Single source of truth for version; CI builds+signs all drivers from source.
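The M0 proto crate's "const size-asserts + round-trip tests" idea can be sketched in plain Rust with no dependencies. The `(generation << 40 | seq << 8 | slot)` packing follows the FrameToken scheme named for the proto crate; the field widths (24/32/8 bits) are inferred from those shifts, and the `SharedHeader` fields shown here are purely hypothetical placeholders, not the real layout. The real crate uses bytemuck, which this sketch leaves out.

```rust
use core::mem::size_of;

/// Frame token packed as (generation << 40) | (seq << 8) | slot.
/// Widths assumed from the shifts: 24-bit generation, 32-bit seq, 8-bit slot.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FrameToken(pub u64);

impl FrameToken {
    pub const fn pack(generation: u32, seq: u32, slot: u8) -> Self {
        // Mask the generation to 24 bits so it cannot overflow the u64.
        let generation = (generation as u64) & 0x00FF_FFFF;
        Self((generation << 40) | ((seq as u64) << 8) | slot as u64)
    }

    pub const fn generation(self) -> u32 {
        (self.0 >> 40) as u32
    }

    pub const fn seq(self) -> u32 {
        // The cast keeps the low 32 bits, which are exactly the seq field.
        (self.0 >> 8) as u32
    }

    pub const fn slot(self) -> u8 {
        self.0 as u8
    }
}

/// Hypothetical shared-memory header; the fields are placeholders only.
#[repr(C)]
pub struct SharedHeader {
    pub magic: u32,
    pub version: u32,
    pub width: u32,
    pub height: u32,
    pub latest: u64,
}

// Compile-time layout checks: any drift in size becomes a build error
// on both the host and the driver side.
const _: () = assert!(size_of::<FrameToken>() == 8);
const _: () = assert!(size_of::<SharedHeader>() == 24);
```

Because both build graphs path-depend on the same crate, a field added on one side changes `size_of::<SharedHeader>()` for both, and the `const` assertion stops the build instead of letting the mismatch surface as corrupted frames.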

Milestones are roughly dependency-ordered; M0 is the long pole (the /INTEGRITYCHECK + iddcx proof gates everything else). M5's AMF/QSV cannot be validated without hardware — keep it system-readback-only and clearly experimental.
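M2's "type-level texture-ownership contract" for the OUT_RING under pipeline_depth>1 could take roughly the following typestate form. Everything here is a hypothetical sketch: `OutRing`, `FreeSlot`, and `InFlightSlot` are invented names, and a plain index stands in for the D3D11 texture. The idea shown is that a slot handed to the encoder is consumed by value, so the capture side cannot write into it again until the encoder gives it back.

```rust
use std::collections::VecDeque;

/// A ring slot the capturer may write into.
pub struct FreeSlot {
    index: usize,
}

/// A ring slot the encoder is still reading. It has no write API.
pub struct InFlightSlot {
    index: usize,
}

impl FreeSlot {
    pub fn index(&self) -> usize {
        self.index
    }

    /// Hands the slot to the encoder. Taking `self` by value means the
    /// capturer no longer holds anything it could write through.
    pub fn submit(self) -> InFlightSlot {
        InFlightSlot { index: self.index }
    }
}

impl InFlightSlot {
    pub fn index(&self) -> usize {
        self.index
    }

    /// Called once the encoder has finished reading the texture.
    pub fn complete(self) -> FreeSlot {
        FreeSlot { index: self.index }
    }
}

/// Hypothetical output ring holding `depth` rotating slots.
pub struct OutRing {
    free: VecDeque<FreeSlot>,
}

impl OutRing {
    pub fn new(depth: usize) -> Self {
        Self {
            free: (0..depth).map(|index| FreeSlot { index }).collect(),
        }
    }

    /// Returns None when every slot is written or in flight, which is
    /// back-pressure rather than silent reuse of a texture being encoded.
    pub fn acquire(&mut self) -> Option<FreeSlot> {
        self.free.pop_front()
    }

    pub fn release(&mut self, slot: FreeSlot) {
        self.free.push_back(slot);
    }
}
```

In this shape the in-place-encode hazard is a compile error or an explicit `None`, not a timing assumption about a synchronous loop.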


11. Decisions (resolved 2026-06-24) + open verification items

The five product forks are decided (see the table in §0): A greenfield; B IDD-push primary for everything incl. secure desktop, WGC+DDA kept as demoted fallbacks; C extend windows-drivers-rs + solve /INTEGRITYCHECK; D keep GameStream, default secure. On E (concurrent sessions): fix the driver swap-chain lifecycle regardless (it removes the leak + the preempt dance); treat true max_concurrent>1 on Windows as a follow-on once clean reuse is proven on glass.

What remains are technical unknowns to confirm on the RTX box (not user decisions):

  • /INTEGRITYCHECK resolution path (M0 long pole). Can wdk-build suppress /INTEGRITYCHECK via config/link-args (preferred), or must we keep a deterministic CI post-link bit-clear? Decides the signing story for all three drivers.
  • iddcx subset on wdk-sys. Does the bindgen+IddCxStub link cleanly, and does the SDK-resolution fix need backporting? (windows-drivers-rs doesn't exercise IddCx today.)
  • Driver swap-chain reuse. Does the clean ownership model (EvtCleanupCallback + DeviceContext state + single Monitor identity) actually fix the "reused swap-chain dies after ~2 sessions" root cause? If not, the residual serialization stays inside VirtualDisplayManager.
  • IDD-push input + secure desktop. Confirm serve runs in the console session so SendInput reaches the streamed desktop (a code comment warns about Session 0→1); confirm IDD-push frames flow through the lock screen / UAC (owner reports yes — verify and lock it in as the primary, demoting the DDA secure leg to fallback).
  • Does the demoted DDA fallback still need the win32u hook against pf-vdisplay, or was that purely a SudoVDA/hybrid pathology? If unneeded, the self-modifying-code hook can be deleted entirely.
  • AMF/QSV stays CI-only (no hardware) — system-readback default, zero-copy experimental.

12. Risks

  • Greenfield with no CI (the dominant risk). The build VM is headless/WARP; the WinUI/hardware/driver paths need the RTX box, and AMF/QSV have no hardware. A from-scratch rebuild can silently drop a layered bug-fix. Mitigation: the §1 preservation checklist is a test/assert contract per rebuilt module; on-glass A/B gates the new path before the old one is deleted (M6); keep the old code in-tree as the reference until parity.
  • /INTEGRITYCHECK (M0 long pole). Choosing windows-drivers-rs means self-signed loading depends on solving it cleanly (§6.2). If neither linker-flag suppression nor a deterministic CI post-link step works, drivers can't load self-signed — prove this first, it gates everything.
  • iddcx on wdk-sys is new surface (windows-drivers-rs doesn't bind IddCx). Bounded (IddCxStub exports + ~15 wrappers, with the validated wdf-umdf/iddcx.rs as oracle) but unproven on this stack — M0 must light it.
  • pf-vdisplay-proto spans two cargo build graphs (host workspace + the driver workspace). Validate the path-dep resolves on the Windows build env in M0; pin wdk* provenance so the driver workspace builds reproducibly in CI.
  • Driver swap-chain-reuse root cause still undiagnosed. The clean ownership model should fix it; if not, residual serialization stays inside VirtualDisplayManager and max_concurrent>1 stays blocked. Keep await_released on the trait until reuse is proven on glass.
  • NVENC in-place encode + pipeline_depth>1 is a latent corruption risk; the M2 texture-ownership contract must be type-level (not the synchronous-loop assumption). Verify the ring on glass.
  • Host/driver version drift in the field. New host + new driver are always built together (greenfield), but the installer bundles both — enforce a startup version handshake (proto version in both binaries) and a CI guarantee they're built from the same revision.
  • Big-bang cutover (M6). Flipping the default and deleting the old monoliths is the riskiest moment; it is gated on the full A/B matrix passing, and the old code is recoverable from git if a regression surfaces post-cutover.
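The startup version handshake proposed for the host/driver drift risk above could be as small as the following. This is a sketch under assumptions: the single u32 protocol version comes from the proto crate design, but `PROTOCOL_VERSION`'s value, `HandshakeError`, and `check_driver_version` are hypothetical, and how the driver's version is actually obtained (for example via the get-info IOCTL) is left out.

```rust
/// Protocol version compiled into both host and driver via the shared
/// proto crate. The value 1 is a placeholder for illustration.
pub const PROTOCOL_VERSION: u32 = 1;

/// Why the host refused to start a session against the installed driver.
#[derive(Debug, PartialEq, Eq)]
pub enum HandshakeError {
    VersionMismatch { host: u32, driver: u32 },
}

/// Compares the version the driver reported with the one the host was
/// built against. Exact match only: host and driver always ship together.
pub fn check_driver_version(driver_version: u32) -> Result<(), HandshakeError> {
    if driver_version == PROTOCOL_VERSION {
        Ok(())
    } else {
        Err(HandshakeError::VersionMismatch {
            host: PROTOCOL_VERSION,
            driver: driver_version,
        })
    }
}
```

Failing fast at startup with both numbers in the error turns a field-drift bug into a clear "reinstall the driver" message instead of undefined behavior on the shared ring.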