From 0b663cefb6d080336e5ab2ce8d07935f46787fc9 Mon Sep 17 00:00:00 2001 From: enricobuehler Date: Wed, 24 Jun 2026 06:49:50 +0000 Subject: [PATCH] =?UTF-8?q?feat(windows):=20pf-vdisplay-proto=20=E2=80=94?= =?UTF-8?q?=20owned=20host<->driver=20ABI=20crate=20(rewrite=20M0)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First foundation of the Windows-host rewrite (docs/windows-host-rewrite.md): a self-contained, no_std + bytemuck crate that defines the host<->driver binary contract ONCE — the control-plane IOCTLs (add/remove/set-render-adapter/ping/ get-info/clear-all) and the IDD-push frame transport (SharedHeader, the (gen<<40|seq<<8|slot) FrameToken, the Global\pfvd-* name scheme, driver-status codes). Previously these were hand-duplicated byte-for-byte across idd_push.rs/frame_transport.rs and sudovda.rs/control.rs with only "must match" comments; here const size-asserts + bytemuck round-trips make any drift a COMPILE error. Clean break from SudoVDA: a freshly-minted interface GUID (not e5bcc234), a contiguous 0x900 op space (not the gappy 0x800/0x888/0x8FF), a u64 session id (not the 16-byte GUID + pid-mangling), a single u32 protocol version. Self-contained (no workspace inheritance, no Windows deps) so the out-of-workspace driver build graph can path-dep it identically. 7 tests green on Linux; clippy + fmt clean. Also lands the full rewrite plan in docs/windows-host-rewrite.md (decisions: greenfield; IDD-push primary incl. secure desktop, WGC+DDA demoted to fallbacks; unify drivers on windows-drivers-rs + solve /INTEGRITYCHECK; keep GameStream, default secure). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- Cargo.lock | 21 + Cargo.toml | 1 + crates/pf-vdisplay-proto/Cargo.toml | 19 + crates/pf-vdisplay-proto/src/lib.rs | 311 ++++++++++++++ docs/windows-host-rewrite.md | 617 ++++++++++++++++++++++++++++ 5 files changed, 969 insertions(+) create mode 100644 crates/pf-vdisplay-proto/Cargo.toml create mode 100644 crates/pf-vdisplay-proto/src/lib.rs create mode 100644 docs/windows-host-rewrite.md diff --git a/Cargo.lock b/Cargo.lock index 9d25097..9e9f95d 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -453,6 +453,20 @@ name = "bytemuck" version = "1.25.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec" +dependencies = [ + "bytemuck_derive", +] + +[[package]] +name = "bytemuck_derive" +version = "1.10.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f9abbd1bc6865053c427f7198e6af43bfdedc55ab791faed4fbd361d789575ff" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] [[package]] name = "bytes" @@ -2404,6 +2418,13 @@ version = "2.3.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220" +[[package]] +name = "pf-vdisplay-proto" +version = "0.0.1" +dependencies = [ + "bytemuck", +] + [[package]] name = "pin-project-lite" version = "0.2.17" diff --git a/Cargo.toml b/Cargo.toml index 21706c4..69a6ea8 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -3,6 +3,7 @@ resolver = "2" members = [ "crates/punktfunk-core", "crates/punktfunk-host", + "crates/pf-vdisplay-proto", "clients/probe", "clients/linux", "clients/windows", diff --git a/crates/pf-vdisplay-proto/Cargo.toml b/crates/pf-vdisplay-proto/Cargo.toml new file mode 100644 index 0000000..2092166 --- /dev/null +++ b/crates/pf-vdisplay-proto/Cargo.toml @@ -0,0 +1,19 @@ +# Shared host<->driver binary contract for the punktfunk pf-vdisplay virtual display. 
+# +# Deliberately self-contained (no `*.workspace = true` inheritance, no Windows deps): this crate is a +# path dependency of BOTH the host workspace (crates/punktfunk-host) AND the out-of-workspace driver +# workspace (packaging/windows/drivers/), so it must resolve identically from either build graph. It is +# `no_std` (+ alloc) and platform-neutral; the GUID/LUID are plain integers each side converts to its +# own OS type. Defining every wire struct ONCE here — with `const` size/offset asserts + bytemuck +# round-trips — makes host<->driver ABI drift a COMPILE error instead of a silent frame/IOCTL corruption. +[package] +name = "pf-vdisplay-proto" +version = "0.0.1" +edition = "2021" +rust-version = "1.82" +license = "MIT OR Apache-2.0" +description = "Shared host<->driver binary contract for the punktfunk pf-vdisplay virtual display (control IOCTLs + IDD-push frame transport)." +publish = false + +[dependencies] +bytemuck = { version = "1.19", features = ["derive"] } diff --git a/crates/pf-vdisplay-proto/src/lib.rs b/crates/pf-vdisplay-proto/src/lib.rs new file mode 100644 index 0000000..584aae1 --- /dev/null +++ b/crates/pf-vdisplay-proto/src/lib.rs @@ -0,0 +1,311 @@ +//! Shared binary contract between the punktfunk host and the `pf-vdisplay` IddCx driver. +//! +//! Two planes: +//! * [`control`] — the low-frequency `DeviceIoControl` plane (add/remove a virtual monitor, pin the +//! render adapter, keepalive, info, clear-all). Owned, clean, versioned — NOT the SudoVDA ABI. +//! * [`frame`] — the IDD-push frame transport: the host creates a ring of shared keyed-mutex textures +//! (+ a header + a frame-ready event) and the driver opens them and publishes composited frames into +//! them. This crate owns the [`frame::SharedHeader`] layout, the [`frame::FrameToken`] packing, the +//! `Global\` object-name scheme, and the driver-status codes. +//! +//! Both planes were previously hand-duplicated, byte-for-byte, across `idd_push.rs`/`frame_transport.rs` +//! 
and `vdisplay/sudovda.rs`/`control.rs` with only "must match" comments guarding them. Defining them +//! once here — with bytemuck `Pod` derives and `const` size asserts — makes any drift a compile error. +//! +//! The GUID and LUID are carried as plain integers; the host converts to `windows::core::GUID` / +//! `windows::Win32::Foundation::LUID` and the driver to its own bindgen types via the same constants. + +#![cfg_attr(not(test), no_std)] + +extern crate alloc; + +/// Freshly-minted pf-vdisplay device-interface GUID — `{70667664-7044-5350-a1b2-c3d4e5f60001}`. +/// Deliberately NOT SudoVDA's `{e5bcc234-…}`: we own the driver, so a private interface GUID signals +/// it and removes any accidental coexistence with a real SudoVDA install. Construct on each side via +/// `GUID::from_u128(PF_VDISPLAY_INTERFACE_GUID_U128)`. +pub const PF_VDISPLAY_INTERFACE_GUID_U128: u128 = 0x7066_7664_7044_5350_a1b2_c3d4_e5f6_0001; + +/// Bumped on any incompatible change to either plane. Exchanged via [`control::IOCTL_GET_INFO`]; host +/// and driver assert a match at startup so a mismatched pair fails loudly instead of corrupting. +pub const PROTOCOL_VERSION: u32 = 1; + +/// `CTL_CODE(FILE_DEVICE_UNKNOWN = 0x22, func, METHOD_BUFFERED = 0, FILE_ANY_ACCESS = 0)`. +pub const fn ctl_code(func: u32) -> u32 { + (0x22u32 << 16) | (func << 2) +} + +/// The control (`DeviceIoControl`) plane: add/remove a virtual monitor + adapter pin + keepalive. +pub mod control { + use super::ctl_code; + use bytemuck::{Pod, Zeroable}; + + // Contiguous op space at 0x900 — distinct from SudoVDA's gappy 0x800/0x888/0x8FF numbering. + /// Add a virtual monitor at a mode → [`AddReply`]. Input [`AddRequest`]. + pub const IOCTL_ADD: u32 = ctl_code(0x900); + /// Remove a virtual monitor by session id. Input [`RemoveRequest`]. + pub const IOCTL_REMOVE: u32 = ctl_code(0x901); + /// Pin the IddCx render adapter (hybrid-GPU IDD-push). Input [`SetRenderAdapterRequest`]. 
+ pub const IOCTL_SET_RENDER_ADAPTER: u32 = ctl_code(0x902); + /// Keepalive (resets the driver watchdog). No payload. + pub const IOCTL_PING: u32 = ctl_code(0x903); + /// Version + watchdog handshake → [`InfoReply`]. No input. + pub const IOCTL_GET_INFO: u32 = ctl_code(0x904); + /// Tear down every virtual monitor (host-startup orphan reap). No payload. First-class op — NOT the + /// SudoVDA "send-and-hope-it's-ignored" hack. + pub const IOCTL_CLEAR_ALL: u32 = ctl_code(0x905); + + /// `IOCTL_ADD` input. A monotonic `session_id` keys the monitor (the host's refcount manager owns + /// collision safety — no more SudoVDA's 16-byte GUID + pid-mangling). The driver advertises this + /// mode as preferred; the host still CCD-forces the active mode (the OS activates IDDs at a default). + #[repr(C)] + #[derive(Clone, Copy, Pod, Zeroable, Debug, PartialEq, Eq)] + pub struct AddRequest { + pub session_id: u64, + pub width: u32, + pub height: u32, + pub refresh_hz: u32, + pub _reserved: u32, + } + + /// `IOCTL_ADD` reply: the OS target id + the adapter LUID the IDD landed on (split low/high to + /// match `windows` `LUID { LowPart: u32, HighPart: i32 }`). + #[repr(C)] + #[derive(Clone, Copy, Pod, Zeroable, Debug, PartialEq, Eq)] + pub struct AddReply { + pub adapter_luid_low: u32, + pub adapter_luid_high: i32, + pub target_id: u32, + pub _reserved: u32, + } + + /// `IOCTL_REMOVE` input. + #[repr(C)] + #[derive(Clone, Copy, Pod, Zeroable, Debug, PartialEq, Eq)] + pub struct RemoveRequest { + pub session_id: u64, + } + + /// `IOCTL_SET_RENDER_ADAPTER` input (the GPU the IddCx swap-chain should render on). + #[repr(C)] + #[derive(Clone, Copy, Pod, Zeroable, Debug, PartialEq, Eq)] + pub struct SetRenderAdapterRequest { + pub luid_low: u32, + pub luid_high: i32, + } + + /// `IOCTL_GET_INFO` reply: the protocol version (asserted against [`super::PROTOCOL_VERSION`]) and + /// the watchdog timeout the host must ping within. 
+ #[repr(C)] + #[derive(Clone, Copy, Pod, Zeroable, Debug, PartialEq, Eq)] + pub struct InfoReply { + pub protocol_version: u32, + pub watchdog_timeout_s: u32, + } + + // Layout is load-bearing across the process boundary — pin it. (bytemuck's Pod derive already + // rejects any internal padding; these assert the externally-visible sizes too.) + const _: () = { + assert!(core::mem::size_of::<AddRequest>() == 24); + assert!(core::mem::size_of::<AddReply>() == 16); + assert!(core::mem::size_of::<RemoveRequest>() == 8); + assert!(core::mem::size_of::<SetRenderAdapterRequest>() == 8); + assert!(core::mem::size_of::<InfoReply>() == 8); + }; +} + +/// The IDD-push frame transport: the host-created shared ring header, the publish token, the names, and +/// the driver-status codes. The texture ring itself is host-created D3D11 keyed-mutex textures (opened +/// by name on the driver side); only the *layout/contract* lives here. +pub mod frame { + use alloc::string::String; + use bytemuck::{Pod, Zeroable}; + + /// Header magic (`"PFVD"` LE). The host stamps it LAST (after the ring textures exist) so the driver + /// only attaches to a fully-published ring. + pub const MAGIC: u32 = 0x4456_4650; + /// Frame-plane version (independent bump of the header layout). + pub const VERSION: u32 = 1; + /// Ring slots. Headroom so the driver's 0 ms-timeout publish always finds a free slot while the host + /// holds one across the convert/copy + the pipelined encode. MUST be identical on both sides — it is, + /// because both read this one constant. + pub const RING_LEN: u32 = 6; + + /// `driver_status` values the driver writes into the host header (the host logs them on a timeout). + pub const DRV_STATUS_NONE: u32 = 0; + /// Driver attached to the ring and is publishing. + pub const DRV_STATUS_OPENED: u32 = 1; + /// Driver could not open the host's textures — render-adapter mismatch (it renders on a different GPU + /// than where the host created the ring). `driver_status_detail` carries the HRESULT. 
+ pub const DRV_STATUS_TEX_FAIL: u32 = 2; + /// Driver has no `ID3D11Device1` to open shared resources. + pub const DRV_STATUS_NO_DEVICE1: u32 = 3; + + /// The shared metadata header (host-created, mapped by both sides). Atomic fields (`magic`, `latest`, + /// `generation`) are accessed via each side's own atomic view over the mapping; this is the layout. + #[repr(C)] + #[derive(Clone, Copy, Pod, Zeroable, Debug)] + pub struct SharedHeader { + pub magic: u32, + pub version: u32, + /// Bumped by the host on a ring recreate (HDR-mode flip → new texture format/names). The driver + /// re-attaches when it changes; a publish carries it so the host rejects a stale-ring publish. + pub generation: u32, + pub ring_len: u32, + pub width: u32, + pub height: u32, + pub dxgi_format: u32, + pub _pad: u32, + /// Driver-written after each copy; host loads `Acquire`. See [`FrameToken`]. + pub latest: u64, + pub qpc_pts: u64, + /// Driver-written: the adapter the swap-chain actually renders on (mismatch detection). + pub driver_render_luid_low: u32, + pub driver_render_luid_high: i32, + /// Driver-written status (visibility channel — UMDF hides OutputDebugString + the restricted + /// token blocks file writes, so this header is how the driver reports state). + pub driver_status: u32, + pub driver_status_detail: u32, + } + + /// The `SharedHeader.latest` publish token: `(generation << 40) | (seq << 8) | slot`. + /// `generation` is 24-bit, `seq` 32-bit, `slot` 8-bit. The generation tag lets the host REJECT a + /// publish from a stale ring (an old-generation publisher racing a mid-session recreate) so it never + /// consumes an unwritten new-ring slot — eliminating the toggle-time garbage frame. + #[derive(Clone, Copy, Debug, PartialEq, Eq)] + pub struct FrameToken { + pub generation: u32, + pub seq: u32, + pub slot: u8, + } + + impl FrameToken { + /// Low 24 bits of `generation` are significant (see the field docs). 
+ pub const GENERATION_MASK: u32 = 0x00FF_FFFF; + + pub const fn pack(self) -> u64 { + (((self.generation & Self::GENERATION_MASK) as u64) << 40) + | (((self.seq as u64) & 0xFFFF_FFFF) << 8) + | (self.slot as u64) + } + + pub const fn unpack(v: u64) -> Self { + Self { + generation: ((v >> 40) as u32) & Self::GENERATION_MASK, + seq: ((v >> 8) & 0xFFFF_FFFF) as u32, + slot: (v & 0xFF) as u8, + } + } + } + + /// `Global\pfvd-hdr-<target_id>` — the shared metadata header mapping name. + pub fn header_name(target_id: u32) -> String { + alloc::format!("Global\\pfvd-hdr-{target_id}") + } + /// `Global\pfvd-evt-<target_id>` — the frame-ready auto-reset event name. + pub fn event_name(target_id: u32) -> String { + alloc::format!("Global\\pfvd-evt-{target_id}") + } + /// `Global\pfvd-tex-<target_id>-<generation>-<slot>` — a ring texture's shared-handle name. The + /// generation in the name means a recreate's new textures never collide with the old ring's + /// not-yet-released handles. + pub fn texture_name(target_id: u32, generation: u32, slot: u32) -> String { + alloc::format!("Global\\pfvd-tex-{target_id}-{generation}-{slot}") + } + + const _: () = { + assert!(core::mem::size_of::<SharedHeader>() == 64); + }; +} + +#[cfg(test)] +mod tests { + use super::*; + use bytemuck::Zeroable; + + #[test] + fn frame_token_roundtrips() { + for (g, s, slot) in [ + (1u32, 0u32, 0u8), + (5, 12_345, 3), + (frame::FrameToken::GENERATION_MASK, 0xFFFF_FFFF, 5), + (0, 1, 255), + ] { + let t = frame::FrameToken { + generation: g, + seq: s, + slot, + }; + assert_eq!(frame::FrameToken::unpack(t.pack()), t); + } + } + + #[test] + fn frame_token_packing_matches_legacy_layout() { + // The legacy code packed (gen<<40)|(seq<<8)|slot by hand; lock the bit positions. 
+ let t = frame::FrameToken { + generation: 7, + seq: 42, + slot: 3, + }; + assert_eq!(t.pack(), (7u64 << 40) | (42u64 << 8) | 3u64); + } + + #[test] + fn shared_header_is_pod_and_64_bytes() { + let mut h = frame::SharedHeader::zeroed(); + h.magic = frame::MAGIC; + h.width = 5120; + h.height = 1440; + let bytes = bytemuck::bytes_of(&h); + assert_eq!(bytes.len(), 64); + let back: frame::SharedHeader = *bytemuck::from_bytes(bytes); + assert_eq!(back.magic, frame::MAGIC); + assert_eq!(back.width, 5120); + assert_eq!(back.height, 1440); + } + + #[test] + fn control_structs_roundtrip_through_bytes() { + let req = control::AddRequest { + session_id: 0xDEAD_BEEF_CAFE_F00D, + width: 3840, + height: 2160, + refresh_hz: 120, + _reserved: 0, + }; + let bytes = bytemuck::bytes_of(&req); + assert_eq!(bytes.len(), 24); + assert_eq!(*bytemuck::from_bytes::<control::AddRequest>(bytes), req); + } + + #[test] + fn names_are_stable() { + assert_eq!(frame::header_name(10), "Global\\pfvd-hdr-10"); + assert_eq!(frame::event_name(10), "Global\\pfvd-evt-10"); + assert_eq!(frame::texture_name(10, 3, 5), "Global\\pfvd-tex-10-3-5"); + } + + #[test] + fn ctl_codes_are_contiguous_and_distinct() { + assert_eq!(control::IOCTL_ADD, ctl_code(0x900)); + let all = [ + control::IOCTL_ADD, + control::IOCTL_REMOVE, + control::IOCTL_SET_RENDER_ADAPTER, + control::IOCTL_PING, + control::IOCTL_GET_INFO, + control::IOCTL_CLEAR_ALL, + ]; + for (i, a) in all.iter().enumerate() { + for b in &all[i + 1..] { + assert_ne!(a, b); + } + } + } + + #[test] + fn guid_is_not_sudovda() { + const SUDOVDA: u128 = 0xE5BC_C234_1E0C_418A_A0D4_EF8B_7501_414D; + assert_ne!(PF_VDISPLAY_INTERFACE_GUID_U128, SUDOVDA); + } +} diff --git a/docs/windows-host-rewrite.md b/docs/windows-host-rewrite.md new file mode 100644 index 0000000..de9e49e --- /dev/null +++ b/docs/windows-host-rewrite.md @@ -0,0 +1,617 @@ +# Windows Host Rewrite — Design & Plan + +Status: **proposed** (2026-06-24). 
This plan takes the current, hard-won Windows host (pf-vdisplay +all-Rust IddCx driver + IDD-push zero-copy capture, live-validated 5120×1440@240 HDR on the RTX box) +as a *knowledge base* and re-derives a clean, stable, well-layered architecture from it. It drops all +SudoVDA back-compat (we own both ends now) and drives `unsafe` to a contained minimum. + +It supersedes the stale conclusion in `docs/windows-virtual-display-rust-port.md` ("IDD-push not +viable") — that verdict was written in the *same commit* (`e2c9bfd`) that shipped the working +922-line consumer + 424-line producer. **IDD-push works and is the architecture.** The breakthrough the +prose never recorded: once the CCD topology makes the virtual display the sole composited desktop in the +console session, DWM composites to it and the IddCx swap-chain *is* assigned +(`run_core: FIRST FRAME acquired — DWM IS compositing the virtual display!`). Per the owner, **IDD-push +also captures the secure desktop (Winlogon / UAC / lock)** — so it is the universal primary path, not +just the normal-desktop path. + +### Decisions resolved (2026-06-24) + +| # | Decision | Chosen | +|---|----------|--------| +| A. Execution | greenfield vs staged | **Greenfield rewrite** — rebuild the Windows host fresh against the clean architecture, salvaging the validated "jewels" (§1) verbatim. (Risk acknowledged: no CI for the Windows paths — mitigated by the §1 preservation checklist + on-glass gates, §10.) | +| B. Capture surface | IDD-only / IDD+secure-DDA / keep fallbacks | **IDD-push primary for everything (incl. the secure desktop); keep WGC + DDA as fallbacks.** | +| C. Driver binding stack | wdf-umdf vs windows-drivers-rs | **Extend `microsoft/windows-drivers-rs`** with an `iddcx` subset; unify all three drivers on it; **solve `/INTEGRITYCHECK` properly** (§6). | +| D. 
GameStream on Windows | keep / keep-secure-default / drop | **Keep Moonlight compat; flip the installer/service default to secure `serve`** (GameStream an explicit opt-in). | + +--- + +## 0. Goals (from the brief) + +1. **Clean, stable, well-layered architecture.** Decompose the god-files, give every subsystem one + owner, and replace the ~40-knob `PUNKTFUNK_*` env soup with a typed config resolved once per session. +2. **Drop every trace of SudoVDA back-compat.** We own the driver (`pf-vdisplay`) and the host. The + byte-identical IOCTL ABI, the reused `{e5bcc234}` GUID, the `sudovda` module name, the "SudoVDA + ignores this" conditionals — all pure liability now. +3. **Minimize `unsafe`.** ~480 `unsafe` occurrences across the Windows surface; the large majority are + FFI-mechanical (windows-rs/NVENC/WDK already return `Result`). Target: host ~144→~35, drivers + ~227→~60, with the irreducible floor *contained* in 3–4 named modules under + `deny(unsafe_op_in_unsafe_fn)`. + +### Non-goals / invariants (do not regress) + +- **Linux host behavior is out of scope and must not change.** The host crate is shared; Linux is + validated across KWin/gamescope/Mutter/Sway. Touch only the seams. +- **`punktfunk-core` stays the one linked core.** Protocol/FEC/crypto/QUIC live there behind the C + ABI; the host is a leaf binary. No protocol changes here. +- **No async on the per-frame path.** Native threads only (the existing discipline). + +--- + +## 1. What we KEEP (validated, load-bearing — port, don't rewrite) + +These are expensive empirical wins. 
The rewrite relocates/wraps them but must preserve behavior +byte-for-byte: + +- **The IDD-push frame transport shape**: host-creates / driver-opens shared keyed-mutex texture ring + with the permissive `D:(A;;GA;;;WD)` SDDL (forced by the restricted WUDFHost token, mirrors the + gamepad drivers); the generation-tagged `latest = gen<<40 | seq<<8 | slot` stale-ring reject (kills + the HDR-flip garbage frame); 0 ms try-acquire / drop-on-full publish (never block the swap-chain + thread); the host output ring `OUT_RING` + `pipeline_depth=2` overlap of convert/copy vs NVENC. +- **The IddCx driver internals that earned their keep**: `edid.rs` in full (128-byte EDID + CTA-861.3 + HDR block, serial-as-index round-trip, dual checksums); the HDR enablement recipe (`CAN_PROCESS_FP16` + + the `*2` mode DDIs + `set_gamma_ramp`/`set_default_hdr_metadata` accept-stubs + `HIGH_COLOR_SPACE` + + 8|10 bpc); `DEVICE_POOL` one-device-per-render-LUID (the NVIDIA UMD-thread/VRAM leak fix); stamping + the OS target id onto the monitor context (the recreated-monitor `target_id=0` fix); the swap-chain + processor's two real leak fixes (borrow `IDXGIDevice` across `SetDevice` retries; check `terminate` + at the loop top during a frame burst). +- **The monitor-lifecycle concurrency correctness**: serialized ADD/REMOVE/teardown, the documented + lock order, the watchdog CAS + re-check-under-lock, the creation grace window, the + generation-stamped lease (a stale lease can't tear down a fresh monitor). *Structure* can change; + these properties must survive. +- **The CCD topology fixes**: `isolate_displays_ccd` (the iGPU-attached-monitor hybrid-box correctness; + the `SDC_FORCE_MODE_ENUMERATION` re-commit that drives `COMMIT_MODES → ASSIGN_SWAPCHAIN`); restore + topology *before* REMOVE. 
+- **The HDR color math**: `hdr.rs` verbatim (pure, unit-tested, ST.2086 G/B/R + big-endian SEI); + `HdrConverter`/`HdrP010Converter` + the f64 `p010_reference` + `hdr_p010_selftest`; `VideoConverter` + (RGB→NV12/P010 on the video engine — a measured latency win); the cursor decomposition + (`convert_pointer_shape` color/masked/monochrome edge cases). +- **NVENC tuning**: caps-probe-before-configure (disambiguate unsupported-config vs too-high-bitrate; + 10-bit→8-bit graceful downgrade); the bitrate-clamp binary search (finds each GPU's real ceiling); + true RFI over the DPB; the low-latency configs (CBR, infinite GOP, P-only, ~1-frame VBV). +- **The gamepad driver wins**: the SwDeviceCreate identity recipe (enumerator with no `_`; mandatory + completion callback; synthesized `USB\VID_054C&PID_0CE6` compat-ids for native-DS5 detection; the + non-null per-pad `ContainerId` dodging the xinput1_4 slot-skip); one `pf_dualsense` serving + DualSense+DS4 via a `device_type` byte; XUSB declining `WAIT_*` to force synchronous `GET_STATE`; + the static HID descriptors/feature blobs; per-pad index via `pszDeviceLocation`. +- **The session-glue patterns**: the `Capturer`/`VirtualDisplay`/`Encoder` trait seam + RAII keepalive + teardown; host-lifetime shared services (`InjectorService`/`MicService`/`AudioCapSlot`) with + per-session gamepads; the encode|send thread split + microburst pacing; `build_pipeline_with_retry` + + permanent-vs-transient classification; the control-task `select!` + adaptive-FEC; the GameStream + `VideoPacketizer` (GF8 Cauchy, Moonlight byte-exact); the pairing/trust handshake. +- **The SCM supervisor model**: Session-0 LocalSystem supervisor → token-retarget → + `CreateProcessAsUserW` `serve` into the console session, relaunch-on-session-change, kill-on-close + Job Object; the file-append log-mask; the two-tier logging init. 
+- **Build/CI wins**: the `wdf-umdf-sys` build.rs SDK-version resolution (picks the SDK version that + actually contains `iddcx`, not the max base SDK); the ARM64 cross-compile off the x64 runner; the + thin-.iss / fat-binary installer delegating to `service install`. + +--- + +## 2. Target architecture + +### 2.1 Crate & workspace strategy + +**Keep ONE shared `crates/punktfunk-host` crate** (do *not* split `punktfunk-host-windows`). The host is +a leaf binary consumed by nobody; the "one core, linked everywhere" invariant is already satisfied by +`punktfunk-core`. A split would only fork the genuinely-shared session glue, traits, and `hdr.rs`. The +cfg-sprawl win comes instead from confining all Windows code under one `src/windows/` subtree behind a +single `#[cfg(windows)] mod windows;` seam, with backend impls next to their trait's dispatch point. + +**Pull the three drivers into ONE in-tree driver workspace** (`packaging/windows/drivers/`) on a single +binding stack, one `rust-toolchain.toml`, one signing recipe, one CI build. Today they are 2–3 disjoint +cargo packages on two incompatible WDK stacks (see §6). + +**Add ONE shared `no_std` ABI crate** (`crates/pf-vdisplay-proto`, name TBD) consumed by both the host +crate and the driver workspace. It owns *every* cross-process binary contract that is currently +hand-duplicated with "must match" comments. This is the single highest-value correctness change (§4.1). + +### 2.2 Target file tree (host crate) + +``` +crates/punktfunk-host/src/ + main.rs clap-derive subcommand dispatch only (kills parse_serve/parse_spike/hand --help) + config.rs HostConfig (typed; parsed ONCE from host.env/env/flags) + config_dir + session/ + mod.rs SessionFactory, SessionPlan, SessionContext, Session (the ONLY teardown path) + server.rs QUIC accept loop, handshake, shared-service wiring + serve_session.rs resolve_* → Welcome/Start → spawn → RAII teardown + control.rs mid-stream renegotiation select! 
loop + pipeline.rs REAL shared encode|send split, send_loop, FrameMsg, pacing (used by native AND GameStream) + capture.rs Capturer trait + CapturedFrame/PixelFormat/FramePayload (platform-neutral) + capture/linux.rs + capture/windows/ mod.rs (dispatch), idd_push.rs, dda.rs, wgc.rs, secure_desktop.rs* + vdisplay.rs VirtualDisplay/VirtualOutput trait + open() dispatch (neutral) + vdisplay/{kwin,gamescope,mutter,wlroots}.rs + vdisplay/windows.rs was sudovda.rs → PfVirtualDisplay + VirtualDisplayManager + encode.rs Encoder trait, EncodedFrame, validate_dimensions, open_encoder dispatch + encode/{linux,vaapi,sw}.rs + encode/windows/ mod.rs (dispatch), nvenc.rs, nvenc_sys.rs, ffmpeg_win/{mod,system,zerocopy,d3d11va_ffi}.rs + hdr.rs PRESERVE VERBATIM + inject.rs / inject/linux/* / inject/windows/{mod,sendinput,pad_manager,xusb,dualsense,dualshock4,swdevice,section}.rs + inject/proto/{dualsense,dualshock4}.rs shared pure codecs (PRESERVE) + audio.rs / audio/linux.rs / audio/windows/{mod,wasapi_cap,wasapi_mic}.rs + windows/ mod.rs, d3d/{mod,texture,ring,convert}.rs, color/{hdr,p010,video_proc}.rs, + cursor.rs, display_ccd.rs, adapter.rs, process.rs (Token/Event/Job/Child/spawn_as_user), + service.rs (SCM; uses process.rs), win32u_hook.rs*, gpu_priority.rs + session_tuning.rs (PRESERVE) / pwinit.rs / discovery.rs / mgmt.rs / native_pairing.rs / library.rs + gamestream/ unchanged module set; stream.rs slims by reusing session/pipeline.rs +``` +`*` = survives only per the secure-desktop / WGC product decisions (§5, §11). 
+ +### 2.3 The seam traits (keep the shape; tighten 3 things) + +```rust +trait VirtualDisplay: Send { + fn name(&self) -> &str; + fn create(&self, mode: Mode) -> Result; + fn set_launch_command(&self, cmd: Option); // per-instance, not a global env var +} +struct VirtualOutput { + node_id: u32, + preferred_mode: Mode, + #[cfg(windows)] win_capture: WinCaptureTarget, // target_id + adapter_luid + monitor_gen (carried, not ambient) + keepalive: Box, +} +trait VirtualLease: Send { // Drop = release; replaces the sudovda free-fns + CURRENT_MON_GEN reach-in + fn set_hdr(&self, on: bool) -> Result<()>; + fn hdr_enabled(&self) -> bool; + fn await_released(&self, timeout: Duration) -> bool; +} + +trait Capturer: Send { + fn next_frame(&mut self) -> Result; + fn try_latest(&mut self) -> Option; + fn set_active(&mut self, a: bool); + fn hdr_meta(&self) -> Option; + fn pipeline_depth(&self) -> usize; +} +fn open_capturer(vout: VirtualOutput, want: OutputFormat) -> Result>; // format+HDR passed IN + +trait Encoder: Send { + fn submit(&mut self, f: &CapturedFrame) -> Result<()>; + fn poll(&mut self) -> Option; + fn flush(&mut self); + fn request_keyframe(&mut self); + fn caps(&self) -> EncoderCaps; // query, don't rely on default no-ops + fn set_hdr_meta(&mut self, m: Option); + fn invalidate_ref_frames(&mut self, lo: u64, hi: u64) -> bool; +} +fn open_encoder(plan: &EncodePlan) -> Result>; + +trait AudioCapturer: Send { fn next_chunk(&mut self) -> Result>; fn channels(&self) -> u16; fn drain(&mut self); } +trait VirtualMic: Send { fn push(&mut self, pcm: &[f32]); fn channels(&self) -> u16; } +trait InputInjector: Send { fn inject(&mut self, e: &InputEvent); } +trait PadManager: Send { /* handle/apply_rich/pump/heartbeat — Box via select(GamepadPref), replaces the PadBackend enum */ } +``` + +The three tightenings: (1) `Capturer` takes the desired `OutputFormat` IN — kills the +`capture → encode::windows_resolved_backend()` back-reference that's recomputed in `dxgi.rs`; (2) 
HDR +control + monitor-release become `VirtualLease` methods so the session glue never names a concrete +backend and contains zero `unsafe`; (3) optional encoder capabilities are queried via `EncoderCaps`. + +### 2.4 SessionFactory + typed plan (the single biggest clarity lever) + +Today the Windows capture/topology/encoder decision is made by ~40 scattered env reads, recomputed in +THREE places (`capture_virtual_output`, `should_use_helper`, `virtual_stream`) with no single owner and +a latent mirrored-dispatch bug (capture and encode can disagree on the backend). Replace with: + +```rust +struct SessionPlan { + display: DisplayBackend, + capture: CaptureBackend, // IddPush | Dda | Wgc + topology: SessionTopology, // SingleProcess | TwoProcessRelay + encoder: EncoderBackend, // Nvenc | Amf | Qsv | Software + input_format: OutputFormat, + bit_depth: u8, hdr: bool, pipeline_depth: usize, +} +struct SessionFactory { cfg: Arc, vdm: Arc, injector, mic, audio } +impl SessionFactory { + fn plan(&self, welcome: &Welcome) -> SessionPlan; // resolves ONCE from HostConfig; no env reads downstream + fn build(&self, plan: &SessionPlan, ctx: SessionContext) -> Result; // owns the RAII chain +} +``` + +`build()` owns the chain `vdm.lease(mode) → open_capturer(vout, fmt) → open_encoder(plan) → spawn +pipeline`, and `Session::drop` is the only teardown path. This kills the env soup, makes the deployed +path readable, and removes the capture/encode backend-disagreement bug class. It also lets us drop the +12–13-arg `#[allow(too_many_arguments)]` signatures (a `SessionContext` struct) and the dead +`Compositor` ceremony threaded through the Windows path. 
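The resolve-once idea in §2.4 can be illustrated with a small, platform-neutral sketch. Only the `SessionPlan` field names follow the text above. `HostConfig`'s fields, the `Welcome` field, the HDR rule, the non-IDD pipeline depth, and the two `*_label` helpers are hypothetical stand-ins chosen for illustration. The point is that capture and encode both read one immutable plan, so they cannot disagree on format or bit depth.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CaptureBackend { IddPush, Dda, Wgc }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EncoderBackend { Nvenc, Software }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OutputFormat { Nv12, P010 }

/// Hypothetical typed config, parsed once at startup (no env reads downstream).
struct HostConfig {
    capture: CaptureBackend,
    encoder: EncoderBackend,
    allow_hdr: bool,
}

/// Hypothetical slice of the client handshake.
struct Welcome {
    client_wants_hdr: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SessionPlan {
    capture: CaptureBackend,
    encoder: EncoderBackend,
    input_format: OutputFormat,
    bit_depth: u8,
    hdr: bool,
    pipeline_depth: usize,
}

/// Resolve every per-session decision exactly once.
fn plan(cfg: &HostConfig, welcome: &Welcome) -> SessionPlan {
    // Assumed rule: HDR needs both host permission and a client request.
    let hdr = cfg.allow_hdr && welcome.client_wants_hdr;
    SessionPlan {
        capture: cfg.capture,
        encoder: cfg.encoder,
        input_format: if hdr { OutputFormat::P010 } else { OutputFormat::Nv12 },
        bit_depth: if hdr { 10 } else { 8 },
        hdr,
        // Depth 2 for IDD-push matches the overlap described in §1; depth 1
        // for the other backends is an assumption of this sketch.
        pipeline_depth: if cfg.capture == CaptureBackend::IddPush { 2 } else { 1 },
    }
}

/// Stand-in for `open_capturer`: reads the format from the plan only.
fn capturer_label(plan: &SessionPlan) -> String {
    format!("{:?} -> {:?}", plan.capture, plan.input_format)
}

/// Stand-in for `open_encoder`: reads the same plan, so it sees the same format.
fn encoder_label(plan: &SessionPlan) -> String {
    format!("{:?} <- {:?} @ {} bit", plan.encoder, plan.input_format, plan.bit_depth)
}
```

Because `SessionPlan` is `Copy` and has no interior mutability, a mid-session renegotiation would have to produce a new plan rather than mutate the old one, which keeps the decision point single.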
+
+### 2.5 Ownership model — delete the global statics
+
+Today the lifecycle is smeared across `IDD_PERSIST` + `open_or_reuse` (dead code), `CURRENT_MON_GEN`
+(read per-frame), `IDD_SETUP_LOCK`/`IDD_SESSION_STOP` (the preempt dance), `MGR: Mutex`, and on
+the driver side `ADAPTER`/`MONITOR_MODES`/`NEXT_ID`/`WATCHDOG_*`/`DEVICE_POOL`. Replace with:
+
+- A host-lifetime **`VirtualDisplayManager`** owning a *typed* `OwnedHandle` device handle (not a raw
+  `isize` smuggled across threads) and the refcounted Idle/Active/Lingering state machine (preserve the
+  machine — it's earned).
+- A per-session **`MonitorLease`** whose `Drop` releases the refcount; the monitor **generation carried
+  through `WinCaptureTarget`** instead of the ambient `CURRENT_MON_GEN`.
+- On the driver: **wire `EvtCleanupCallback` for `MonitorContext`** (only `DeviceContext` has it today)
+  so the `SwapChainProcessor` + D3D resources drop via WDF RAII — deleting `free_swap_chain_processor`
+  and the manual-free-before-departure dance that is the **documented dominant reconnect leak**. Move
+  the process-global driver state into the `DeviceContext`; collapse the 3-way monitor identity
+  (`MONITOR_MODES` / EDID serial / context stamp) to one `Monitor` owned by the context.
+
+---
+
+## 3. The host↔driver contract (own it; define once)
+
+### 3.1 `pf-vdisplay-proto` (no_std, bytemuck/zerocopy)
+
+One crate, both build graphs (path dep). Owns:
+
+- **Control plane**: a fresh interface GUID; a contiguous, versioned op enum; `#[repr(C)]` request/reply
+  structs carrying only used fields.
+- **Frame plane**: `SharedHeader`, the `FrameToken { generation, seq, slot }` with `pack`/`unpack`
+  (replacing the hand-twiddled `gen<<40|seq<<8|slot` on both sides), the `Global\pfvd-*` name helpers.
+- **Gamepad sections**: `XusbShm` (64 B) and `PadShm` (256 B, incl. `device_type`) layouts.
+- Derive `FromBytes`/`IntoBytes`/`Pod`; `const` size+offset asserts; round-trip tests. **ABI drift
+  becomes a compile error, not a runtime corruption.** (bytemuck is already a dep in the driver +
+  wdf-umdf-sys.) This deletes every `OFF_*` constant + `read/write_unaligned` on both sides of every
+  boundary — the largest single block of shared-memory `unsafe`, and the top drift hazard.
+
+### 3.2 Control plane — keep DeviceIoControl, redesign the ABI
+
+`DeviceIoControl` is the correct WDF idiom for a driver with no control device and is low-frequency
+(ADD/REMOVE per session + a keepalive); the shared-memory pattern buys nothing here. Keep it; redesign
+the surface:
+
+- Ops actually needed: `Add(mode, identity) → {luid, target_id}`, `Remove`, `SetRenderAdapter`
+  (now **unconditional** — pf-vdisplay honors it for hybrid-GPU IDD-push; drop the SudoVDA-parity
+  default-off branch), `ClearAll` (first-class startup orphan reap, not an "ignored by SudoVDA" hack),
+  `GetInfo` (a real version handshake), and keepalive (see §3.4).
+- Drop the SudoVDA-isms: `AddParams.device_name[14]`/`serial[14]` (ignored), the 16-byte GUID → a
+  monotonic `u64` session id (the refcount manager owns collision safety; retires `next_monitor_guid`'s
+  pid-mangling), the 4-byte `{major,minor,incr,test}` version tuple → one `u32`, the gappy
+  `0x800/0x888/0x8FF` func numbering → contiguous.
+- One typed IOCTL dispatch helper retrieves+validates+aligns the buffers and hands the body a safe
+  `&Req` / `&mut MaybeUninit` — collapses ~20 of `control.rs`'s 29 `unsafe` blocks.
+
+### 3.3 Frame plane — keep the inversion, retire the scaffolding
+
+Keep the host-creates / driver-opens ring exactly. **Remove the bring-up scaffolding** that diagnosed
+the now-solved `run_core=0` mystery: the `DebugBlock` channel + `DBG_MAGIC`, `spawn_observer` /
+`PUNKTFUNK_IDD_PUSH_OBSERVE`, the `error!`-as-`info!` logging, the intentional handle leak, and the
+20 s blind no-frame deadline (replace with the `DRV_STATUS_OPENED` handshake as a bounded liveness
+signal).
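The `FrameToken` packing and the compile-time size check from §3.1 can be sketched with the standard library alone. The field widths here are an assumption read off the `gen<<40|seq<<8|slot` layout (24-bit generation, 32-bit seq, 8-bit slot), and the `SharedHeader` fields are hypothetical placeholders; the real crate derives the bytemuck traits rather than hand-rolling this.

```rust
/// Frame-plane token. Assumed widths: 24-bit generation, 32-bit seq, 8-bit slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameToken {
    pub generation: u32,
    pub seq: u32,
    pub slot: u8,
}

impl FrameToken {
    /// Only the low 24 bits of `generation` fit above bit 40 of a u64.
    const GENERATION_MASK: u64 = (1 << 24) - 1;

    /// Packs into the `generation<<40 | seq<<8 | slot` wire form.
    pub const fn pack(self) -> u64 {
        let generation = (self.generation as u64) & Self::GENERATION_MASK;
        (generation << 40) | ((self.seq as u64) << 8) | (self.slot as u64)
    }

    /// Inverse of `pack`; the `as` casts deliberately truncate to each field's width.
    pub const fn unpack(raw: u64) -> Self {
        Self {
            generation: (raw >> 40) as u32,
            seq: (raw >> 8) as u32,
            slot: raw as u8,
        }
    }
}

/// Hypothetical header layout, used only to demonstrate the size assert.
#[repr(C)]
pub struct SharedHeader {
    pub magic: u32,
    pub version: u32,
    pub latest_token: u64,
}

// If a field is added or resized on one side, this fails at compile time.
const _: () = assert!(core::mem::size_of::<SharedHeader>() == 16);
```

A generation wider than 24 bits is masked on `pack`, so a round trip is only lossless while the generation stays below `1 << 24`.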
+
+### 3.4 Driver swap-chain reuse — the one open root cause
+
+Today a *reused* IddCx monitor's swap-chain dies after ~2 sessions (target id resolves to 0, `SetDevice`
+fails `0x80070057`, then an access violation), forcing fresh-monitor-per-session + the host-side
+preempt/`wait_for_monitor_released` dance + the `IDD_PERSIST` "create once, never recreate" workaround.
+The fix is in the **driver**: with `EvtCleanupCallback` wired + state owned by `DeviceContext` + the
+identity collapsed to one `Monitor` (the recreate-path bugs are exactly the 3-way identity desync), the
+clean recreate should become stable. **If** that holds, delete `IDD_SETUP_LOCK`/`IDD_SESSION_STOP` +
+the preempt dance and unblock `max_concurrent>1` on Windows. **If** it can't be fixed cheaply, isolate
+the residual serialization inside `VirtualDisplayManager` (not smeared back into the session loop).
+Separately, evaluate replacing the polling watchdog (PING/countdown/grace/linger constellation) with a
+**WDF file-object `EvtFileClose`** (host holds the control handle open; close = host gone) — feasibility
+TBD on UMDF/IddCx.
+
+---
+
+## 4. Capture strategy
+
+**IDD-push is the universal primary path — normal AND secure desktop (Decision B).** It composes
+in-process (cross-session via `Global\` shared textures: driver in WUDFHost/Session 0, `serve` in the
+console session), needs no DXGI Desktop Duplication and no `win32u` reparenting hook, is live-validated
+at 5K@240 HDR, and (per the owner) also captures the secure desktop (Winlogon/UAC/lock). So there is no
+separate "secure capturer" in the primary path: the same `IddPushCapturer` spans the lock screen and
+UAC. Capture selection moves into a typed `CaptureBackend` in the `SessionPlan` — replacing the 3-way
+env branch with `IddPush` (default) → `Dda`/`Wgc` (explicit fallbacks).
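A minimal sketch of resolving the capture backend once into a typed value, with IDD-push as the default and the fallbacks reachable only by explicit request. The slimmed-down `HostConfig`, its `capture` field, and the spellings `idd-push`/`dda`/`wgc` are assumptions made for illustration; the plan does not specify how the override is expressed.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptureBackend {
    IddPush,
    Dda,
    Wgc,
}

impl FromStr for CaptureBackend {
    type Err = String;

    /// Accepts the (assumed) config spellings; anything else is an error,
    /// never a silent default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "idd-push" => Ok(Self::IddPush),
            "dda" => Ok(Self::Dda),
            "wgc" => Ok(Self::Wgc),
            other => Err(format!("unknown capture backend: {other}")),
        }
    }
}

/// Hypothetical slice of the host config: an optional explicit override.
#[derive(Debug, Default)]
pub struct HostConfig {
    pub capture: Option<String>,
}

/// Resolved once while planning the session; downstream code only sees the enum.
pub fn resolve_capture(cfg: &HostConfig) -> Result<CaptureBackend, String> {
    match cfg.capture.as_deref() {
        None => Ok(CaptureBackend::IddPush),
        Some(raw) => raw.parse(),
    }
}
```

Rejecting unknown values at plan time keeps a typo from quietly selecting a fallback capturer.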
+
+**WGC + DDA are kept as fallbacks, not deleted (Decision B).** They cover non-IddCx / pre-pf-vdisplay
+hardware and act as a safety net if IDD-push fails to attach. But they are **demoted**: they are no
+longer the default, no longer entangled with the secure-desktop mux, and selected only via the explicit
+`CaptureBackend` fallback in the plan. This lets the DDA module shed the parts that existed *only* to
+make virtual-display-over-DDA survive on a hybrid box, while the genuinely-useful capture/recovery core
+stays:
+- **Scope the `win32u` self-modifying-code hook + the GPU-pref hook to the DDA fallback leg** (one
+  `win32u_hook::install()`), so the primary IDD-push path never touches them. Re-confirm whether DDA
+  even needs the `win32u` hook against pf-vdisplay (it may not — open verification item).
+- **The two-process WGC relay's secure-desktop mux is retired** — IDD-push handles the secure desktop
+  directly, so `desktop_watch.rs` + `composed_flip.rs` + the `virtual_stream_relay` monolith are no
+  longer needed for their original purpose. Keep a **minimal** WGC fallback capturer if the WGC backend
+  is retained; do not port the 400-line relay state machine. (The cross-session input concern below is
+  handled by the `InputInjector`/topology abstraction, not the AU video relay.)
+
+**Shared D3D primitives** move out of `dxgi.rs` (today the de-facto dumping ground that `wgc.rs` and
+`idd_push.rs` import from) into `windows/d3d/` (typed `Texture2d`/`Ring`/`CopyResource`/`Map`-as-bytes),
+`windows/color/` (the converters + `hdr_p010_selftest` verbatim), and `windows/cursor.rs`. All three
+capturers consume them — deletes the duplicated `tex_desc`, cursor, HDR-poll, repeat-last logic.
+
+**The texture-ownership contract becomes type-level.** NVENC encodes the capturer's texture *in place*
+(no copy), sound today only because the IDD-push capturer rotates `OUT_RING` and the loop honors
+`pipeline_depth()` — an undocumented cross-module coupling that is *already* a latent corruption risk.
+Fix: either the encoder always `CopySubresourceRegion`s (as `ffmpeg_win` does), or the capturer hands an
+explicitly-leased ring texture with a documented lifetime. No more relying on the synchronous-loop
+assumption.
+
+**The IDD-push input question** (must confirm on-glass): capture+encode run in `serve`; input must reach
+the *streamed* (console-session) desktop. If `serve` runs in the console session, `SendInput` works
+directly. A code comment flags "SendInput from Session 0 can't reach Session 1" — so the architecture
+must make `InputInjector` satisfiable either by in-session `SendInput` *or* by a tiny **input-only
+Session-1 agent** (re-scope the old WGC helper to input only). The `SessionPlan.topology` expresses
+this.
+
+---
+
+## 5. Encode layer
+
+- Resolve backend + input format + pipeline depth **once** into `EncodePlan` and hand it to both the
+  capturer and the encoder factory — kill the duplicated `windows_resolved_backend()` call in `dxgi.rs`
+  (the highest-severity coupling). Trim `open_video`'s 8-arg grab-bag (`cuda` is always false on
+  Windows; `bit_depth` is overridden by the capture format anyway).
+- **`nvenc_sys.rs`**: a thin safe wrapper — RAII `NvSession`/`NvBitstream`/`NvRegistration`/
+  `NvMappedInput` (Drop = destroy/unregister/unmap) + an `NV_ENC_CONFIG` builder. The public encoder
+  then has near-zero `unsafe` and no hand-written teardown loops. (The SDK table already returns
+  `Result` via `result_without_string()`.) This is the single biggest encode-side `unsafe` reduction.
+- **`ffmpeg_win`**: RAII `AvFrame`/`SwsCtx`/`HwDeviceCtx`/`HwFramesCtx` delete every manual `av_*_free`
+  and the error-path cleanup ladders (also the biggest leak-risk reduction); a checked `MappedSurface`
+  for the staging readback; a `const` size-assert on the hand-mirrored `AVD3D11VA*` structs in a
+  dedicated `d3d11va_ffi` submodule (silent FFmpeg ABI drift is currently undetectable). Keep
+  system-readback the default; zero-copy stays opt-in/experimental (no AMD/Intel lab box).
+- **HDR symmetry**: make in-band ST.2086/CLL SEI a shared post-encode step so AMF/QSV get the same
+  mastering metadata as NVENC (today only NVENC attaches it; AMF/QSV rely solely on the 0xCE datagram).
+  Centralize "when does the client learn HDR metadata" in one owner.
+- Keep `hdr.rs`, the `Encoder` trait, `EncodedFrame`, `validate_dimensions`, the caps-probe + RFI logic
+  verbatim. Delete the `pipeline.rs` `pump_once` doc stub (the real loop is `session/pipeline.rs`).
+
+---
+
+## 6. Drivers — one binding stack (`windows-drivers-rs`), one workspace, one signing recipe
+
+Today: `pf-vdisplay` on the vendored **`wdf-umdf`** stack; `pf_dualsense` + `pf_xusb` on
+**`microsoft/windows-drivers-rs`** (`wdk`/`wdk-sys`/`wdk-build`). Two bindgen passes, two SDK
+resolutions, two `NTSTATUS`, two build systems, two signing recipes.
+
+**Decision C: unify all three on `microsoft/windows-drivers-rs`** (the official Microsoft stack), in one
+in-tree `packaging/windows/drivers/` workspace, edition 2024, one `rust-toolchain.toml`, one CI build.
+The gamepad drivers already ship on it; the work is to **migrate `pf-vdisplay` onto it** and **add the
+IddCx surface** it lacks today.
+
+**Required pieces of this migration (each a Phase-0/early task):**
+
+1. **Add an `iddcx` subset to `wdk-sys`.** IddCx DDIs are *not* WDF-table functions — they are direct
+   `IddCxStub` exports — so the extension is bounded: an `ApiSubset::Iddcx` + `iddcx` feature →
+   bindgen `IddCx.h` + link `IddCxStub`, then ~15 thin `extern`/wrapper fns. Use the current
+   `wdf-umdf/src/iddcx.rs` (~345 LOC, validated) as a **line-by-line oracle**, including the IddCx 1.10
+   `*2` HDR DDIs (`IddCxSwapChainReleaseAndAcquireBuffer2`, `IDARG_*2`, `_METADATA2`).
+2. **Solve `/INTEGRITYCHECK` for self-signed loading — properly.** `wdk-build` links the driver with
+   `/INTEGRITYCHECK`, which a self-signed cert can't satisfy (CodeIntegrity 3004/3089). Today the
+   gamepad drivers hand-patch the FORCE_INTEGRITY PE bit post-link. Replace that hack with a robust
+   solution, in order of preference: (a) **override the linker flag** — drop `/INTEGRITYCHECK` via
+   `wdk-build` config / `RUSTFLAGS`/`link-args` if it can be suppressed cleanly; else (b) a
+   **deterministic, tested CI post-link tool** (a small Rust/PowerShell step that clears bit `0x80` at
+   `e_lfanew+0x5e` and re-signs, run in CI, not by hand) so it's reproducible and not a footgun; (c) for
+   a public build, real **attestation signing** (Partner Center) satisfies `/INTEGRITYCHECK`
+   legitimately. Pick (a) if feasible; (b) as the fleet-self-signed fallback. This is the headline cost
+   of choosing this stack and must be nailed in Phase 0.
+3. **Backport the `wdf-umdf-sys` build.rs SDK-resolution fix** into `wdk-build` (or a local override):
+   resolve `IddCx.h`/`IddCxStub` by the SDK version that *actually contains* `um\x64\iddcx`, not the max
+   base SDK (the real failure where a newer base SDK shadows the WDK SDK). windows-drivers-rs's default
+   resolution doesn't exercise IddCx today, so this likely needs porting.
+4. **Port `pf-vdisplay`'s typed safety wins** onto the new stack: re-create the
+   `WDF_DECLARE_CONTEXT_TYPE!` `Arc>` context abstraction (the gold-standard contained
+   `unsafe`); the version-gate protocol (`IddCxIsFunctionAvailable!` / `IDD_STRUCTURE_SIZE!`); and a
+   thin safe wrapper layer so the gamepad drivers stop emitting raw `call_unsafe_wdf_function_binding!`
+   everywhere (the biggest driver-`unsafe` lever).
+
+While unifying, also: adopt WDF device contexts for per-pad state (drop the
+`UmdfHostProcessSharing=ProcessSharingDisabled`-dependent statics → true multi-pad-per-host); replace
+`mem::zeroed()` configs with the `WDF_*_CONFIG_INIT` initializers (kills the recurring zeroed-default
+bug class that already caused 3 driver bugs); cache the shm view (RAII `ShmView`) instead of
+re-mapping ~125×/s; **delete the world-writable `C:\Users\Public\*.log` driver logging** and the "M0
+spike" naming; collapse `is_nt_error()`/`dyn-Any`/`From<()>`-as-error into a typed `IntoDriverResult`;
+collapse the per-call dispatch `unsafe` into one generic `dispatch()` helper.
+
+**Provenance note:** confirm where `wdk`/`wdk-sys`/`wdk-build` come from (the gamepad drivers' Cargo.toml
+path-deps `../../crates/wdk*` don't exist in this checkout — they resolve inside a windows-drivers-rs
+checkout on the dev box). Pin them as crates.io deps or a vendored, version-pinned copy so the driver
+workspace builds reproducibly in CI.
+
+---
+
+## 7. Input, audio, service, packaging
+
+- **Input**: consolidate the host-side device plumbing (`create_swdevice`/`create_shm_section`/
+  `SwDeviceProfile`) into one `inject/windows/swdevice.rs` used by all three managers (XUSB included,
+  which currently re-implements its own). The shm layouts come from `pf-vdisplay-proto`. Re-scope the
+  cross-session helper (if any) to input-only.
+- **Audio**: small, already fairly clean. Replace the lone `newdev.dll` `LoadLibrary`+`transmute`
+  (`wasapi_mic.rs`, the audio runtime's *only* `unsafe`) with the windows-rs `DiInstallDriverW` binding
+  (or move provisioning to the installer) → zero `unsafe` in the audio runtime.
+- **Service / process**: one `windows/process.rs` owning RAII `Token`/`Event`/`Job`/`Child` + a single
+  `spawn_as_user()` used by BOTH the SCM supervisor and any helper — deletes the duplicated
+  token-dup/`merged_env_block`/`CreateProcessAsUserW` machinery and ~12 manual `CloseHandle` sites. Add
+  a **cooperative stop**: a named stop event the supervisor sets and `serve` waits on, so Stop runs RAII
+  teardown (today `TerminateProcess` skips Drop → the virtual monitor lingers, the documented
+  stale-monitor gotcha); `TerminateProcess` only as a bounded fallback.
+- **Packaging/CI**: keep the thin-.iss / fat-binary model; add a `punktfunk-host web install/uninstall`
+  subcommand to absorb the web-setup PowerShell. **Build + sign the unified driver workspace in CI from
+  source** (or a CI guard that fails on stale-vendored-DLL / un-bumped DriverVer) so the driver can't
+  silently drift from its source. Mint the **fresh pf-vdisplay GUID** coordinated across host + driver +
+  INF. Single source of truth for version → build + ISCC AppVersion + INF DriverVer. Investigate
+  retiring `nefconc` by creating the ROOT devnode via SwDevice/CM in Rust. Keep the
+  devgen-never / nefconc-only and DriverVer-bump gotchas codified.
+
+---
+
+## 8. Unsafe-reduction program (run at port time, not as a separate pass)
+
+- **P0 lints first** (a few lines, before new code): `#![deny(unsafe_op_in_unsafe_fn)]` (host crate has
+  none today; the driver workspace already has it), `#![warn(clippy::undocumented_unsafe_blocks)]`,
+  `#![warn(clippy::multiple_unsafe_ops_per_block)]`. Generated bindings keep their opt-out.
+- **P0 std handle ownership**: `std::os::windows::io::OwnedHandle` / `std::fs::File::from_raw_handle`
+  everywhere a raw `HANDLE`/`isize` is held (events/jobs/tokens/sections/pipes). Used in **zero** host
+  files today — the single biggest cheap win. Deletes the bespoke `unsafe impl Read/Write/Drop`
+  (`HandleReader`), the never-closed sudovda control handle, the `AtomicIsize` HANDLE globals, ~6 manual
+  `CloseHandle` sites — and fixes real leaks.
+- **P0 the proto crate** (§3.1) — kills the shared-memory pointer-cast `unsafe`.
+- **P1 typed wrappers**: `windows/d3d/` (most COM calls already return `Result`; per-frame loop bodies
+  become `unsafe`-free, the irreducible keyed-mutex/`from_raw_parts` lands in one `frame_xfer` fn);
+  `nvenc_sys` + RAII ffmpeg (§5); one `windows/process.rs` (§7); collapse the 21 `unsafe impl Send`
+  onto one audited `SendPtr`/`ThreadBound` (directly de-risks the NVENC in-place coupling).
+- **P2 contain the irreducible**: `win32u_hook.rs` (one `install()`; scope to secure-DDA or drop),
+  `gpu_priority.rs` (the D3DKMT transmute), the WDF context-blob macro, the IddCx swap-chain DDI +
+  `from_raw_borrowed` (wrap in a typed `SwapChain` guard returning a borrowed `AcquiredSurface<'_>`).
+  Document a `// SAFETY:` per residual site.
+- **P2 delete `unsafe` by deleting code**: the `present_trigger` dead diagnostic, the `DebugBlock`
+  channel, `spawn_observer`, `IDD_PERSIST`/`open_or_reuse`, `helpers.rs Sendable`, the WGC-open
+  thread-watchdog hack (gone with WGC), the driver file-logging.
+
+Estimated: host ~144→~35, drivers ~227→~60, residual concentrated and auditable. (`#![forbid(unsafe_code)]`
+is impossible for the drivers and the per-frame D3D path — the realistic target is *containment*.)
+
+---
+
+## 9. SudoVDA decoupling (mechanical rename + scrub)
+
+`vdisplay/sudovda.rs` → `vdisplay/windows.rs`; `SudoVdaDisplay` → `PfVirtualDisplay`; scrub "SudoVDA"
+from all log/error/doc strings across `capture.rs`/`dxgi.rs`/`wgc*.rs`/`idd_push.rs`/`punktfunk1.rs`/
+`main.rs`/`sendinput.rs` (141 refs / 15 files). **Split the reach-in helpers out** of the vdisplay
+backend (they're display-utility, not virtual-display creation): `set_advanced_color`,
+`advanced_color_enabled`, `resolve_gdi_name`, `isolate/restore_displays_ccd`, `set_active_mode` →
+`windows/display_ccd.rs` (collapsing the 4× copy-pasted `QueryDisplayConfig` preamble into one safe
+`query_active_config()`); `resolve_render_adapter_luid` → `windows/adapter.rs`. Both vdisplay and
+capture then depend on these as peers, breaking the circular reach-in. `WinCaptureTarget` moves to a
+neutral location (defined in `dxgi.rs`, constructed in `sudovda.rs` today). Drop the dual-driver
+fallback conditionals. Expose HDR/monitor-release as `VirtualLease` methods (zero `unsafe` in the
+session glue).
+
+---
+
+## 10. Build plan (greenfield — Decision A)
+
+A from-scratch rebuild of the Windows host against the clean architecture, **salvaging the §1 jewels
+verbatim** (the already-clean, already-tested modules: `hdr.rs`, `edid.rs`, the `inject/proto` codecs,
+the HDR/cursor converters + their self-tests, the GF8 packetizer, the pairing handshake). The old
+Windows code stays in-tree, untouched, as the *reference implementation* until the new path reaches
+parity on glass, then is deleted.
+
+**Greenfield-risk mitigation (the survey's strong caveat stands):** almost none of this is
+CI-validatable — the Windows backends + drivers need the RTX box (192.168.1.173) + the build VM, and
+**AMF/QSV have no lab hardware at all**. A greenfield rewrite therefore carries real risk of silently
+dropping a layered bug-fix. Two guardrails are mandatory:
+1. **The §1 preservation checklist is a test/assert contract**, not prose: each rebuilt module ports its
+   hard-won invariants as unit tests or runtime asserts — RAII teardown order (restore displays *before*
+   REMOVE), keyed-mutex held only across convert/copy, `terminate` checked at the swap-chain loop top,
+   magic stamped last, `OUT_RING` texture rotation under `pipeline_depth>1`, the NVENC caps-probe
+   downgrade, the SwDeviceCreate identity recipe. A rebuild that drops one fails its own test.
+2. **On-glass A/B gates** at each milestone below, on the RTX box, against the current shipping build:
+   1080p60, 5K@240 HDR, reconnect-storm, secure desktop (lock/UAC), multi-pad. Nothing replaces the old
+   path until its A/B passes.
+
+### Build order
+
+- **M0 — Foundations + the `/INTEGRITYCHECK` answer.** Stand up `crates/pf-vdisplay-proto` (the clean,
+  owned ABI: fresh GUID, the redesigned IOCTL op enum + `#[repr(C)]` structs, `SharedHeader`,
+  `FrameToken`, the gamepad shm layouts, `const` size-asserts, round-trip tests). Stand up the in-tree
+  `packaging/windows/drivers/` workspace on `windows-drivers-rs` and **prove the two hard unknowns**:
+  (a) the `iddcx` `wdk-sys` subset bindgen+links and a trivial IddCx adapter loads; (b) `/INTEGRITYCHECK`
+  is solved (§6.2) so a self-signed driver loads under Secure Boot with no hand-patching. Add the P0
+  lints to the host crate. *No host behavior yet.*
+- **M1 — pf-vdisplay on the new stack, first light.** Rebuild the IddCx driver against
+  `windows-drivers-rs`+`iddcx`, clean from the start: `DeviceContext`-owned state (no process-globals),
+  one `Monitor` identity, `EvtCleanupCallback` on `MonitorContext`, the ported `Arc>` context,
+  the EDID + HDR recipe verbatim, the redesigned control plane from the proto crate. *(On-glass: ADD →
+  monitor arrives → IDD-push ring attaches → frames flow at 1080p; REMOVE clean.)*
+- **M2 — IDD-push capture + NVENC, glass-to-glass.** New `src/windows/` tree: `windows/d3d/` typed
+  wrappers, `windows/color/` (converters + self-tests), `windows/cursor.rs`, `capture/windows/idd_push.rs`
+  consuming the proto ring with a **type-level texture-ownership contract** (no in-place-encode
+  assumption), `encode/windows/{nvenc.rs,nvenc_sys.rs}`, `vdisplay/windows.rs` + `windows/display_ccd.rs` +
+  `windows/adapter.rs`. Wire the `SessionFactory`/`SessionPlan` (M2 only needs the IDD-push+NVENC
+  plan). *(On-glass A/B: 1080p60 + 5K@240 HDR, latency parity with the current build.)*
+- **M3 — Service, input, audio, secure desktop.** `windows/process.rs` (RAII Token/Event/Job/Child +
+  `spawn_as_user` + cooperative stop) + `windows/service.rs`; `inject/windows/*` on the proto shm +
+  consolidated `swdevice.rs`; `audio/windows/*` (zero-`unsafe` runtime). Confirm IDD-push captures the
+  secure desktop (lock/UAC) and input reaches the streamed session (in-session `SendInput`, or the
+  input-only agent if needed). *(On-glass: full session incl. lock screen + UAC + a real pad.)*
+- **M4 — Gamepad drivers onto the unified stack.** Rebuild `pf_dualsense` + `pf_xusb` on
+  `windows-drivers-rs` in the same workspace, WDF device contexts (true multi-pad), proto shm,
+  `WDF_*_CONFIG_INIT`, no file logging, no "M0 spike" naming. *(On-glass: 2 XInput + 2 DualSense pads,
+  rumble/lightbar/adaptive-trigger round-trip.)*
+- **M5 — Fallbacks + GameStream + AMF/QSV.** Port the demoted WGC + DDA fallback capturers (minimal,
+  `win32u` hook scoped to the DDA leg); `encode/windows/ffmpeg_win/*` with RAII FFmpeg + the
+  `d3d11va_ffi` size-assert (system-readback default; zero-copy experimental); GameStream planes reusing
+  `session/pipeline.rs`, installer default flipped to secure `serve`. *(On-glass: Moonlight client on
+  the DDA fallback; AMF/QSV stays CI-only.)*
+- **M6 — Cut over + delete.** Flip the default to the new path, run the full A/B matrix, then delete the
+  old `dxgi.rs`/`wgc*`/`sudovda.rs`/`punktfunk1.rs` Windows monoliths + the bring-up scaffolding
+  (`DebugBlock`/`spawn_observer`/observe gate) + the old gamepad driver crates. Single source of truth
+  for version; CI builds+signs all drivers from source.
+
+Milestones are roughly dependency-ordered; M0 is the long pole (the `/INTEGRITYCHECK` + `iddcx` proof
+gates everything else). M5's AMF/QSV cannot be validated without hardware — keep it system-readback-only
+and clearly experimental.
+
+---
+
+## 11. Decisions (resolved 2026-06-24) + open verification items
+
+The five product forks are decided (see the table in §0): **A** greenfield; **B** IDD-push primary for
+everything incl. secure desktop, WGC+DDA kept as demoted fallbacks; **C** extend `windows-drivers-rs` +
+solve `/INTEGRITYCHECK`; **D** keep GameStream, default secure. On **E (concurrent sessions)**: fix the
+driver swap-chain lifecycle regardless (it removes the leak + the preempt dance); treat true
+`max_concurrent>1` on Windows as a follow-on once clean reuse is proven on glass.
+
+What remains are **technical unknowns to confirm on the RTX box** (not user decisions):
+
+- **`/INTEGRITYCHECK` resolution path (M0 long pole).** Can `wdk-build` suppress `/INTEGRITYCHECK` via
+  config/link-args (preferred), or must we keep a deterministic CI post-link bit-clear? Decides the
+  signing story for all three drivers.
+- **`iddcx` subset on `wdk-sys`.** Does the bindgen+`IddCxStub` link cleanly, and does the SDK-resolution
+  fix need backporting? (windows-drivers-rs doesn't exercise IddCx today.)
+- **Driver swap-chain reuse.** Does the clean ownership model (`EvtCleanupCallback` + DeviceContext state +
+  single `Monitor` identity) actually fix the "reused swap-chain dies after ~2 sessions" root cause? If
+  not, the residual serialization stays inside `VirtualDisplayManager`.
+- **IDD-push input + secure desktop.** Confirm `serve` runs in the console session so `SendInput` reaches
+  the streamed desktop (a code comment warns about Session 0→1); confirm IDD-push frames flow through the
+  lock screen / UAC (owner reports yes — verify and lock it in as the primary, demoting the DDA secure
+  leg to fallback).
+- **Does the demoted DDA fallback still need the `win32u` hook** against pf-vdisplay, or was that purely
+  a SudoVDA/hybrid pathology? If unneeded, the self-modifying-code hook can be deleted entirely.
+- **AMF/QSV** stays CI-only (no hardware) — system-readback default, zero-copy experimental.
+
+---
+
+## 12. Risks
+
+- **Greenfield with no CI (the dominant risk).** The build VM is headless/WARP; the WinUI/hardware/driver
+  paths need the RTX box, and AMF/QSV have no hardware. A from-scratch rebuild can silently drop a
+  layered bug-fix. Mitigation: the §1 preservation checklist is a *test/assert contract* per rebuilt
+  module; on-glass A/B gates the new path before the old one is deleted (M6); keep the old code in-tree
+  as the reference until parity.
+- **`/INTEGRITYCHECK` (M0 long pole).** Choosing `windows-drivers-rs` means self-signed loading depends
+  on solving it cleanly (§6.2). If neither linker-flag suppression nor a deterministic CI post-link step
+  works, drivers can't load self-signed — prove this first, it gates everything.
+- **`iddcx` on `wdk-sys`** is new surface (windows-drivers-rs doesn't bind IddCx). Bounded
+  (`IddCxStub` exports + ~15 wrappers, with the validated `wdf-umdf/iddcx.rs` as oracle) but unproven on
+  this stack — M0 must light it.
+- **`pf-vdisplay-proto` spans two cargo build graphs** (host workspace + the driver workspace). Validate
+  the path-dep resolves on the Windows build env in M0; pin `wdk*` provenance so the driver workspace
+  builds reproducibly in CI.
+- **Driver swap-chain-reuse root cause still undiagnosed.** The clean ownership model *should* fix it;
+  if not, residual serialization stays inside `VirtualDisplayManager` and `max_concurrent>1` stays
+  blocked. Keep `await_released` on the trait until reuse is proven on glass.
+- **NVENC in-place encode + `pipeline_depth>1`** is a latent corruption risk; the M2 texture-ownership
+  contract must be type-level (not the synchronous-loop assumption). Verify the ring on glass.
+- **Host/driver version drift in the field.** New host + new driver are always built together (greenfield),
+  but the installer bundles both — enforce a startup version handshake (proto version in both binaries)
+  and a CI guarantee they're built from the same revision.
+- **Big-bang cutover (M6).** Flipping the default and deleting the old monoliths is the riskiest moment;
+  it is gated on the full A/B matrix passing, and the old code is recoverable from git if a regression
+  surfaces post-cutover.
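The explicitly-leased ring texture named in the NVENC risk above could look roughly like the sketch below. It is a simplified model under stated assumptions: `Texture` stands in for a D3D11 texture, `OutRing`/`TextureLease` are names invented here, and the lease is enforced at run time through `Arc` reference counts rather than purely by the type system.

```rust
use std::sync::Arc;

/// Stand-in for a GPU texture; a real ring would hold D3D11 resources.
#[derive(Debug)]
pub struct Texture {
    pub index: usize,
}

/// Fixed-depth ring of output textures (depth would follow `pipeline_depth`).
pub struct OutRing {
    slots: Vec<Arc<Texture>>,
    next: usize,
}

/// While a lease is alive, the ring will not hand the same texture out again.
/// Dropping the lease returns the slot.
pub struct TextureLease {
    tex: Arc<Texture>,
}

impl TextureLease {
    pub fn texture(&self) -> &Texture {
        &self.tex
    }
}

impl OutRing {
    pub fn new(depth: usize) -> Self {
        let slots = (0..depth).map(|index| Arc::new(Texture { index })).collect();
        Self { slots, next: 0 }
    }

    /// Returns the next free slot in rotation order, or `None` when every slot
    /// is still leased (the caller must then wait or drop a frame instead of
    /// overwriting a texture the encoder is still reading).
    pub fn acquire(&mut self) -> Option<TextureLease> {
        let depth = self.slots.len();
        for step in 0..depth {
            let i = (self.next + step) % depth;
            // strong_count == 1 means only the ring itself holds this slot.
            if Arc::strong_count(&self.slots[i]) == 1 {
                self.next = (i + 1) % depth;
                return Some(TextureLease { tex: Arc::clone(&self.slots[i]) });
            }
        }
        None
    }
}
```

With this shape the capturer cannot rotate onto a texture that an in-flight encode still holds, which is the coupling the current code leaves implicit.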