@@ -0,0 +1,617 @@
# Windows Host Rewrite — Design & Plan
Status: **proposed** (2026-06-24). This plan takes the current, hard-won Windows host (pf-vdisplay
all-Rust IddCx driver + IDD-push zero-copy capture, live-validated at 5120×1440@240 HDR on the RTX box)
as a *knowledge base* and re-derives a clean, stable, well-layered architecture from it. It drops all
SudoVDA back-compat (we own both ends now) and drives `unsafe` to a contained minimum.

It supersedes the stale conclusion in `docs/windows-virtual-display-rust-port.md` ("IDD-push not
viable"); that verdict was written in the *same commit* (`e2c9bfd`) that shipped the working
922-line consumer + 424-line producer. **IDD-push works and is the architecture.** The breakthrough the
prose never recorded: once the CCD topology makes the virtual display the sole composited desktop in the
console session, DWM composites to it and the IddCx swap-chain *is* assigned
(`run_core: FIRST FRAME acquired — DWM IS compositing the virtual display!`). Per the owner, **IDD-push
also captures the secure desktop (Winlogon / UAC / lock)**, so it is the universal primary path, not
just the normal-desktop path.
### Decisions resolved (2026-06-24)
| Decision | Options | Chosen |
|----------|---------|--------|
| A. Execution | greenfield vs staged | **Greenfield rewrite**: rebuild the Windows host fresh against the clean architecture, salvaging the validated "jewels" (§1) verbatim. (Risk acknowledged: no CI for the Windows paths; mitigated by the §1 preservation checklist + on-glass gates, §10.) |
| B. Capture surface | IDD-only / IDD+secure-DDA / keep fallbacks | **IDD-push primary for everything (incl. the secure desktop); keep WGC + DDA as fallbacks.** |
| C. Driver binding stack | wdf-umdf vs windows-drivers-rs | **Extend `microsoft/windows-drivers-rs`** with an `iddcx` subset; unify all three drivers on it; **solve `/INTEGRITYCHECK` properly** (§6). |
| D. GameStream on Windows | keep / keep-secure-default / drop | **Keep Moonlight compat; flip the installer/service default to secure `serve`** (GameStream an explicit opt-in). |
---
## 0. Goals (from the brief)
1. **Clean, stable, well-layered architecture.** Decompose the god-files, give every subsystem one
owner, and replace the ~40-knob `PUNKTFUNK_*` env soup with a typed config resolved once per session.
2. **Drop every trace of SudoVDA back-compat.** We own the driver (`pf-vdisplay`) and the host. The
byte-identical IOCTL ABI, the reused `{e5bcc234}` GUID, the `sudovda` module name, the "SudoVDA
ignores this" conditionals: all pure liability now.
3. **Minimize `unsafe`.** ~480 `unsafe` occurrences across the Windows surface; the large majority are
FFI-mechanical (windows-rs/NVENC/WDK already return `Result`). Target: host ~144→~35, drivers
~227→~60, with the irreducible floor *contained* in 3–4 named modules under
`deny(unsafe_op_in_unsafe_fn)`.
### Non-goals / invariants (do not regress)
- **Linux host behavior is out of scope and must not change.** The host crate is shared; Linux is
validated across KWin/gamescope/Mutter/Sway. Touch only the seams.
- **`punktfunk-core` stays the one linked core.** Protocol/FEC/crypto/QUIC live there behind the C
ABI; the host is a leaf binary. No protocol changes here.
- **No async on the per-frame path.** Native threads only (the existing discipline).
---
## 1. What we KEEP (validated, load-bearing — port, don't rewrite)
These are expensive empirical wins. The rewrite relocates/wraps them but must preserve behavior
byte-for-byte:
- **The IDD-push frame transport shape**: host-creates / driver-opens shared keyed-mutex texture ring
with the permissive `D:(A;;GA;;;WD)` SDDL (forced by the restricted WUDFHost token, mirrors the
gamepad drivers); the generation-tagged `latest = gen<<40 | seq<<8 | slot` stale-ring reject (kills
the HDR-flip garbage frame); 0 ms try-acquire / drop-on-full publish (never block the swap-chain
thread); the host output ring `OUT_RING` + `pipeline_depth=2` overlap of convert/copy vs NVENC.
- **The IddCx driver internals that earned their keep**: `edid.rs` in full (128-byte EDID + CTA-861.3
HDR block, serial-as-index round-trip, dual checksums); the HDR enablement recipe (`CAN_PROCESS_FP16` +
the `*2` mode DDIs + `set_gamma_ramp`/`set_default_hdr_metadata` accept-stubs + `HIGH_COLOR_SPACE` +
8|10 bpc); `DEVICE_POOL` one-device-per-render-LUID (the NVIDIA UMD-thread/VRAM leak fix); stamping
the OS target id onto the monitor context (the recreated-monitor `target_id=0` fix); the swap-chain
processor's two real leak fixes (borrow `IDXGIDevice` across `SetDevice` retries; check `terminate`
at the loop top during a frame burst).
- **The monitor-lifecycle concurrency correctness**: serialized ADD/REMOVE/teardown, the documented
lock order, the watchdog CAS + re-check-under-lock, the creation grace window, the
generation-stamped lease (a stale lease can't tear down a fresh monitor). *Structure* can change;
these properties must survive.
- **The CCD topology fixes**: `isolate_displays_ccd` (the iGPU-attached-monitor hybrid-box correctness;
the `SDC_FORCE_MODE_ENUMERATION` re-commit that drives `COMMIT_MODES → ASSIGN_SWAPCHAIN`); restore
topology *before* REMOVE.
- **The HDR color math**: `hdr.rs` verbatim (pure, unit-tested, ST.2086 G/B/R + big-endian SEI);
`HdrConverter`/`HdrP010Converter` + the f64 `p010_reference` + `hdr_p010_selftest`; `VideoConverter`
(RGB→NV12/P010 on the video engine, a measured latency win); the cursor decomposition
(`convert_pointer_shape` color/masked/monochrome edge cases).
- **NVENC tuning**: caps-probe-before-configure (disambiguate unsupported-config vs too-high-bitrate;
10-bit→8-bit graceful downgrade); the bitrate-clamp binary search (finds each GPU's real ceiling);
true RFI over the DPB; the low-latency configs (CBR, infinite GOP, P-only, ~1-frame VBV).
- **The gamepad driver wins**: the SwDeviceCreate identity recipe (enumerator with no `_`; mandatory
completion callback; synthesized `USB\VID_054C&PID_0CE6` compat-ids for native-DS5 detection; the
non-null per-pad `ContainerId` dodging the xinput1_4 slot-skip); one `pf_dualsense` serving
DualSense+DS4 via a `device_type` byte; XUSB declining `WAIT_*` to force synchronous `GET_STATE`;
the static HID descriptors/feature blobs; per-pad index via `pszDeviceLocation`.
- **The session-glue patterns**: the `Capturer`/`VirtualDisplay`/`Encoder` trait seam + RAII keepalive
teardown; host-lifetime shared services (`InjectorService`/`MicService`/`AudioCapSlot`) with
per-session gamepads; the encode|send thread split + microburst pacing; `build_pipeline_with_retry` +
permanent-vs-transient classification; the control-task `select!` + adaptive-FEC; the GameStream
`VideoPacketizer` (GF8 Cauchy, Moonlight byte-exact); the pairing/trust handshake.
- **The SCM supervisor model**: Session-0 LocalSystem supervisor → token-retarget →
`CreateProcessAsUserW` `serve` into the console session, relaunch-on-session-change, kill-on-close
Job Object; the file-append log-mask; the two-tier logging init.
- **Build/CI wins**: the `wdf-umdf-sys` build.rs SDK-version resolution (picks the SDK version that
actually contains `iddcx`, not the max base SDK); the ARM64 cross-compile off the x64 runner; the
thin-.iss / fat-binary installer delegating to `service install`.
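
The bitrate-clamp search from the NVENC bullet can be pictured as an ordinary binary search over a probe. The sketch below is illustrative only: `find_max_bitrate`, its parameters, and the closure-based probe are hypothetical names, and it assumes acceptance is monotonic (if a rate is rejected, every higher rate is rejected too). In the real host the probe would be an NVENC configure attempt, not a closure.

```rust
/// Returns the highest bitrate in `[lo, hi]` that `accepts` reports as
/// configurable, or `None` if even `lo` is rejected.
/// Assumption: acceptance is monotonic in the bitrate.
pub fn find_max_bitrate(
    lo: u32,
    hi: u32,
    step: u32,
    mut accepts: impl FnMut(u32) -> bool,
) -> Option<u32> {
    if lo > hi || !accepts(lo) {
        return None;
    }
    if accepts(hi) {
        return Some(hi); // the requested rate is already under the ceiling
    }
    // Invariant from here on: `good` is accepted, `bad` is rejected.
    let (mut good, mut bad) = (lo, hi);
    let step = step.max(1); // a zero step would never terminate
    while bad - good > step {
        let mid = good + (bad - good) / 2;
        if accepts(mid) {
            good = mid;
        } else {
            bad = mid;
        }
    }
    Some(good)
}
```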
---
## 2. Target architecture
### 2.1 Crate & workspace strategy
**Keep ONE shared `crates/punktfunk-host` crate** (do *not* split `punktfunk-host-windows`). The host is
a leaf binary consumed by nobody; the "one core, linked everywhere" invariant is already satisfied by
`punktfunk-core`. A split would only fork the genuinely-shared session glue, traits, and `hdr.rs`. The
cfg-sprawl win comes instead from confining all Windows code under one `src/windows/` subtree behind a
single `#[cfg(windows)] mod windows;` seam, with backend impls next to their trait's dispatch point.

**Pull the three drivers into ONE in-tree driver workspace** (`packaging/windows/drivers/`) on a single
binding stack, one `rust-toolchain.toml`, one signing recipe, one CI build. Today they are 2–3 disjoint
cargo packages on two incompatible WDK stacks (see §6).

**Add ONE shared `no_std` ABI crate** (`crates/pf-vdisplay-proto`, name TBD) consumed by both the host
crate and the driver workspace. It owns *every* cross-process binary contract that is currently
hand-duplicated with "must match" comments. This is the single highest-value correctness change (§3.1).
### 2.2 Target file tree (host crate)
```
crates/punktfunk-host/src/
  main.rs                 clap-derive subcommand dispatch only (kills parse_serve/parse_spike/hand --help)
  config.rs               HostConfig (typed; parsed ONCE from host.env/env/flags) + config_dir
  session/
    mod.rs                SessionFactory, SessionPlan, SessionContext, Session (the ONLY teardown path)
    server.rs             QUIC accept loop, handshake, shared-service wiring
    serve_session.rs      resolve_* → Welcome/Start → spawn → RAII teardown
    control.rs            mid-stream renegotiation select! loop
    pipeline.rs           REAL shared encode|send split, send_loop, FrameMsg, pacing (used by native AND GameStream)
  capture.rs              Capturer trait + CapturedFrame/PixelFormat/FramePayload (platform-neutral)
  capture/linux.rs
  capture/windows/        mod.rs (dispatch), idd_push.rs, dda.rs, wgc.rs, secure_desktop.rs*
  vdisplay.rs             VirtualDisplay/VirtualOutput trait + open() dispatch (neutral)
  vdisplay/{kwin,gamescope,mutter,wlroots}.rs
  vdisplay/windows.rs     was sudovda.rs → PfVirtualDisplay + VirtualDisplayManager
  encode.rs               Encoder trait, EncodedFrame, validate_dimensions, open_encoder dispatch
  encode/{linux,vaapi,sw}.rs
  encode/windows/         mod.rs (dispatch), nvenc.rs, nvenc_sys.rs, ffmpeg_win/{mod,system,zerocopy,d3d11va_ffi}.rs
  hdr.rs                  PRESERVE VERBATIM
  inject.rs / inject/linux/* / inject/windows/{mod,sendinput,pad_manager,xusb,dualsense,dualshock4,swdevice,section}.rs
  inject/proto/{dualsense,dualshock4}.rs   shared pure codecs (PRESERVE)
  audio.rs / audio/linux.rs / audio/windows/{mod,wasapi_cap,wasapi_mic}.rs
  windows/                mod.rs, d3d/{mod,texture,ring,convert}.rs, color/{hdr,p010,video_proc}.rs,
                          cursor.rs, display_ccd.rs, adapter.rs, process.rs (Token/Event/Job/Child/spawn_as_user),
                          service.rs (SCM; uses process.rs), win32u_hook.rs*, gpu_priority.rs
  session_tuning.rs (PRESERVE) / pwinit.rs / discovery.rs / mgmt.rs / native_pairing.rs / library.rs
  gamestream/             unchanged module set; stream.rs slims by reusing session/pipeline.rs
```
`*` = survives only per the secure-desktop / WGC product decisions (§4, §11).
### 2.3 The seam traits (keep the shape; tighten 3 things)
```rust
trait VirtualDisplay: Send {
    fn name(&self) -> &str;
    fn create(&self, mode: Mode) -> Result<VirtualOutput>;
    fn set_launch_command(&self, cmd: Option<String>); // per-instance, not a global env var
}

struct VirtualOutput {
    node_id: u32,
    preferred_mode: Mode,
    #[cfg(windows)]
    win_capture: WinCaptureTarget, // target_id + adapter_luid + monitor_gen (carried, not ambient)
    keepalive: Box<dyn VirtualLease>,
}

// Drop = release; replaces the sudovda free-fns + CURRENT_MON_GEN reach-in
trait VirtualLease: Send {
    fn set_hdr(&self, on: bool) -> Result<()>;
    fn hdr_enabled(&self) -> bool;
    fn await_released(&self, timeout: Duration) -> bool;
}

trait Capturer: Send {
    fn next_frame(&mut self) -> Result<CapturedFrame>;
    fn try_latest(&mut self) -> Option<CapturedFrame>;
    fn set_active(&mut self, a: bool);
    fn hdr_meta(&self) -> Option<HdrMeta>;
    fn pipeline_depth(&self) -> usize;
}

// format + HDR passed IN
fn open_capturer(vout: VirtualOutput, want: OutputFormat) -> Result<Box<dyn Capturer>>;

trait Encoder: Send {
    fn submit(&mut self, f: &CapturedFrame) -> Result<()>;
    fn poll(&mut self) -> Option<EncodedFrame>;
    fn flush(&mut self);
    fn request_keyframe(&mut self);
    fn caps(&self) -> EncoderCaps; // query, don't rely on default no-ops
    fn set_hdr_meta(&mut self, m: Option<HdrMeta>);
    fn invalidate_ref_frames(&mut self, lo: u64, hi: u64) -> bool;
}

fn open_encoder(plan: &EncodePlan) -> Result<Box<dyn Encoder>>;

trait AudioCapturer: Send {
    fn next_chunk(&mut self) -> Result<Vec<f32>>;
    fn channels(&self) -> u16;
    fn drain(&mut self);
}

trait VirtualMic: Send {
    fn push(&mut self, pcm: &[f32]);
    fn channels(&self) -> u16;
}

trait InputInjector: Send {
    fn inject(&mut self, e: &InputEvent);
}

trait PadManager: Send {
    // handle / apply_rich / pump / heartbeat; used as Box<dyn PadManager> via
    // select(GamepadPref), replaces the PadBackend enum
}
```
The three tightenings: (1) the `Capturer` factory (`open_capturer`) takes the desired `OutputFormat` IN,
which kills the `capture → encode::windows_resolved_backend()` back-reference that's recomputed in
`dxgi.rs`; (2) HDR control + monitor-release become `VirtualLease` methods so the session glue never
names a concrete backend and contains zero `unsafe`; (3) optional encoder capabilities are queried via
`EncoderCaps`.
### 2.4 SessionFactory + typed plan (the single biggest clarity lever)
Today the Windows capture/topology/encoder decision is made by ~40 scattered env reads, recomputed in
THREE places (`capture_virtual_output`, `should_use_helper`, `virtual_stream`) with no single owner and
a latent mirrored-dispatch bug (capture and encode can disagree on the backend). Replace with:
```rust
struct SessionPlan {
    display: DisplayBackend,
    capture: CaptureBackend,   // IddPush | Dda | Wgc
    topology: SessionTopology, // SingleProcess | TwoProcessRelay
    encoder: EncoderBackend,   // Nvenc | Amf | Qsv | Software
    input_format: OutputFormat,
    bit_depth: u8,
    hdr: bool,
    pipeline_depth: usize,
}

struct SessionFactory {
    cfg: Arc<HostConfig>,
    vdm: Arc<VirtualDisplayManager>,
    // plus the host-lifetime shared services: injector, mic, audio
}

impl SessionFactory {
    // resolves ONCE from HostConfig; no env reads downstream
    fn plan(&self, welcome: &Welcome) -> SessionPlan;
    // owns the RAII chain
    fn build(&self, plan: &SessionPlan, ctx: SessionContext) -> Result<Session>;
}
```
`build()` owns the chain `vdm.lease(mode) → open_capturer(vout, fmt) → open_encoder(plan) → spawn
pipeline`, and `Session::drop` is the only teardown path. This kills the env soup, makes the deployed
path readable, and removes the capture/encode backend-disagreement bug class. It also lets us drop the
12–13-arg `#[allow(clippy::too_many_arguments)]` signatures (replaced by a `SessionContext` struct) and
the dead `Compositor` ceremony threaded through the Windows path.
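
To make "resolved once, no env reads downstream" concrete, here is a minimal sketch of a pure planning function. Everything in it is an assumption for illustration: the `HostConfig` and `Welcome` fields, the rule that HDR implies P010 at 10 bits, and the depth of 1 for the fallback capturers (only `pipeline_depth=2` for IDD-push comes from this document). The point is that capture and encode read the same resolved values, so they cannot disagree.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CaptureBackend { IddPush, Dda, Wgc }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EncoderBackend { Nvenc, Amf, Qsv, Software }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OutputFormat { Nv12, P010 }

/// Hypothetical typed config, parsed once at startup.
pub struct HostConfig {
    pub capture_fallback: Option<CaptureBackend>, // None = IDD-push default
    pub encoder: EncoderBackend,
    pub allow_hdr: bool,
}

/// Hypothetical subset of what the client negotiated.
pub struct Welcome {
    pub wants_hdr: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct SessionPlan {
    pub capture: CaptureBackend,
    pub encoder: EncoderBackend,
    pub input_format: OutputFormat,
    pub bit_depth: u8,
    pub hdr: bool,
    pub pipeline_depth: usize,
}

/// Pure function of (config, welcome): no environment access, so every
/// downstream consumer sees one consistent decision.
pub fn plan(cfg: &HostConfig, welcome: &Welcome) -> SessionPlan {
    let capture = cfg.capture_fallback.unwrap_or(CaptureBackend::IddPush);
    let hdr = cfg.allow_hdr && welcome.wants_hdr;
    // Assumption: HDR sessions use P010 at 10 bits, SDR sessions NV12 at 8 bits.
    let (input_format, bit_depth) = if hdr {
        (OutputFormat::P010, 10)
    } else {
        (OutputFormat::Nv12, 8)
    };
    // Depth 2 for IDD-push is from the document; 1 for fallbacks is a guess.
    let pipeline_depth = match capture {
        CaptureBackend::IddPush => 2,
        CaptureBackend::Dda | CaptureBackend::Wgc => 1,
    };
    SessionPlan { capture, encoder: cfg.encoder, input_format, bit_depth, hdr, pipeline_depth }
}
```

Because `plan` takes no ambient state, it can be unit-tested exhaustively on any platform, which partly offsets the missing Windows CI noted under Decision A.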
### 2.5 Ownership model — delete the global statics
Today the lifecycle is smeared across `IDD_PERSIST` + `open_or_reuse` (dead code), `CURRENT_MON_GEN`
(read per-frame), `IDD_SETUP_LOCK`/`IDD_SESSION_STOP` (the preempt dance), `MGR: Mutex<Mgr>`, and on
the driver side `ADAPTER`/`MONITOR_MODES`/`NEXT_ID`/`WATCHDOG_*`/`DEVICE_POOL`. Replace with:
- A host-lifetime **`VirtualDisplayManager`** owning a *typed* `OwnedHandle` device handle (not a raw
`isize` smuggled across threads) and the refcounted Idle/Active/Lingering state machine (preserve the
machine; it's earned).
- A per-session **`MonitorLease`** whose `Drop` releases the refcount; the monitor **generation carried
through `WinCaptureTarget`** instead of the ambient `CURRENT_MON_GEN`.
- On the driver: **wire `EvtCleanupCallback` for `MonitorContext`** (only `DeviceContext` has it today)
so the `SwapChainProcessor` + D3D resources drop via WDF RAII, deleting `free_swap_chain_processor`
and the manual-free-before-departure dance that is the **documented dominant reconnect leak**. Move
the process-global driver state into the `DeviceContext`; collapse the 3-way monitor identity
(`MONITOR_MODES` / EDID serial / context stamp) to one `Monitor` owned by the context.
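
The host-side half of this ownership model can be sketched without any Windows API. The code below is a simplified illustration, not the existing state machine: the type and method names beyond `VirtualDisplayManager` and `MonitorLease` are hypothetical, `force_remove` stands in for a watchdog or forced teardown, and the linger timer is omitted. It shows the one property that must survive the rewrite: a lease stamped with an old generation cannot tear down a fresh monitor.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MonitorState {
    Idle,
    Active { refs: u32 },
    Lingering,
}

struct Inner {
    state: MonitorState,
    generation: u64,
}

#[derive(Clone)]
pub struct VirtualDisplayManager {
    inner: Arc<Mutex<Inner>>,
}

/// Per-session lease; dropping it releases one reference.
pub struct MonitorLease {
    mgr: VirtualDisplayManager,
    generation: u64, // generation of the monitor this lease was issued for
}

impl VirtualDisplayManager {
    pub fn new() -> Self {
        let inner = Inner { state: MonitorState::Idle, generation: 0 };
        Self { inner: Arc::new(Mutex::new(inner)) }
    }

    pub fn lease(&self) -> MonitorLease {
        let mut g = self.inner.lock().unwrap();
        let next = match g.state {
            MonitorState::Active { refs } => MonitorState::Active { refs: refs + 1 },
            // A lingering monitor is reused, so the generation is unchanged.
            MonitorState::Lingering => MonitorState::Active { refs: 1 },
            MonitorState::Idle => {
                g.generation += 1; // a fresh monitor gets a fresh generation
                MonitorState::Active { refs: 1 }
            }
        };
        g.state = next;
        MonitorLease { mgr: self.clone(), generation: g.generation }
    }

    /// Stand-in for a forced teardown (for example a watchdog REMOVE).
    pub fn force_remove(&self) {
        self.inner.lock().unwrap().state = MonitorState::Idle;
    }

    pub fn state(&self) -> MonitorState {
        self.inner.lock().unwrap().state
    }
}

impl Drop for MonitorLease {
    fn drop(&mut self) {
        let mut g = match self.mgr.inner.lock() {
            Ok(g) => g,
            Err(_) => return, // poisoned lock: do not panic inside Drop
        };
        if g.generation != self.generation {
            return; // stale lease: the monitor it referred to is already gone
        }
        let next = match g.state {
            MonitorState::Active { refs } if refs > 1 => MonitorState::Active { refs: refs - 1 },
            MonitorState::Active { .. } => MonitorState::Lingering,
            other => other,
        };
        g.state = next;
    }
}
```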
---
## 3. The host↔driver contract (own it; define once)
### 3.1 `pf-vdisplay-proto` (no_std, bytemuck/zerocopy)
One crate, both build graphs (path dep). Owns:
- **Control plane**: a fresh interface GUID; a contiguous, versioned op enum; `#[repr(C)]` request/reply
structs carrying only used fields.
- **Frame plane**: `SharedHeader`, the `FrameToken { generation, seq, slot }` with `pack`/`unpack`
(replacing the hand-twiddled `gen<<40|seq<<8|slot` on both sides), the `Global\pfvd-*` name helpers.
- **Gamepad sections**: `XusbShm` (64 B) and `PadShm` (256 B, incl. `device_type`) layouts.
- Derive `FromBytes`/`IntoBytes`/`Pod`; `const` size+offset asserts; round-trip tests. **ABI drift
becomes a compile error, not a runtime corruption.** (bytemuck is already a dep in the driver +
wdf-umdf-sys.) This deletes every `OFF_*` constant + `read/write_unaligned` on both sides of every
boundary: the largest single block of shared-memory `unsafe`, and the top drift hazard.
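
A minimal sketch of what the shared `FrameToken` could look like, using only `core`-compatible code. The field widths are an assumption read off the `gen<<40|seq<<8|slot` shifts (24-bit generation, 32-bit sequence, 8-bit slot), and `is_current` is a hypothetical helper name for the stale-ring reject.

```rust
/// Packed layout (assumed from the shifts): bits 40..64 = generation,
/// bits 8..40 = seq, bits 0..8 = slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FrameToken {
    pub generation: u32, // only the low 24 bits fit in the packed form
    pub seq: u32,
    pub slot: u8,
}

impl FrameToken {
    const GEN_MASK: u64 = (1 << 24) - 1;

    pub const fn pack(self) -> u64 {
        ((self.generation as u64 & Self::GEN_MASK) << 40)
            | ((self.seq as u64) << 8)
            | (self.slot as u64)
    }

    pub const fn unpack(raw: u64) -> Self {
        Self {
            generation: (raw >> 40) as u32, // top 24 bits
            seq: (raw >> 8) as u32,         // truncates to the middle 32 bits
            slot: raw as u8,                // truncates to the low 8 bits
        }
    }

    /// Stale-ring reject: a token published for an older ring generation
    /// must not be consumed after the ring was recreated.
    pub const fn is_current(self, ring_generation: u32) -> bool {
        self.generation as u64 == (ring_generation as u64 & Self::GEN_MASK)
    }
}
```

With one definition on both sides, a round-trip test in the proto crate covers the host and the driver at once.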
### 3.2 Control plane — keep DeviceIoControl, redesign the ABI
`DeviceIoControl` is the correct WDF idiom for a driver with no control device and is low-frequency
(ADD/REMOVE per session + a keepalive); the shared-memory pattern buys nothing here. Keep it; redesign
the surface:
- Ops actually needed: `Add(mode, identity) → {luid, target_id}`, `Remove`, `SetRenderAdapter`
(now **unconditional**: pf-vdisplay honors it for hybrid-GPU IDD-push; drop the SudoVDA-parity
default-off branch), `ClearAll` (first-class startup orphan reap, not an "ignored by SudoVDA" hack),
`GetInfo` (a real version handshake), and keepalive (see §3.4).
- Drop the SudoVDA-isms: `AddParams.device_name[14]`/`serial[14]` (ignored), the 16-byte GUID → a
monotonic `u64` session id (the refcount manager owns collision safety; retires `next_monitor_guid`'s
pid-mangling), the 4-byte `{major,minor,incr,test}` version tuple → one `u32`, the gappy
`0x800/0x888/0x8FF` func numbering → contiguous.
- One typed IOCTL dispatch helper retrieves+validates+aligns the buffers and hands the body a safe
`&Req` / `&mut MaybeUninit<Reply>`, collapsing ~20 of `control.rs`'s 29 `unsafe` blocks.
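
As an illustration of the contiguous op numbering and the validate-before-cast step, here is a small sketch. The numeric values, `PROTO_VERSION`, `RemoveReq`, `BufferError`, and `check_request` are all hypothetical; the real helper would additionally perform the cast to `&Req` (for example through the `FromBytes`/`Pod` derives mentioned in §3.1), which is left out so the sketch stays free of `unsafe`.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical single-`u32` protocol version returned by `GetInfo`.
pub const PROTO_VERSION: u32 = 1;

/// Contiguous op numbering (the values are illustrative, not the real ABI).
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ControlOp {
    GetInfo = 0,
    Add = 1,
    Remove = 2,
    SetRenderAdapter = 3,
    ClearAll = 4,
    Keepalive = 5,
}

impl ControlOp {
    pub const COUNT: u32 = 6;

    pub fn from_raw(raw: u32) -> Option<Self> {
        Some(match raw {
            0 => Self::GetInfo,
            1 => Self::Add,
            2 => Self::Remove,
            3 => Self::SetRenderAdapter,
            4 => Self::ClearAll,
            5 => Self::Keepalive,
            _ => return None,
        })
    }
}

/// Hypothetical request body: the monotonic `u64` session id.
#[repr(C)]
pub struct RemoveReq {
    pub session_id: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BufferError {
    TooShort { need: usize, got: usize },
    Misaligned { align: usize },
}

/// Checks that `buf` is large enough and suitably aligned to be viewed as `Req`.
pub fn check_request<Req>(buf: &[u8]) -> Result<(), BufferError> {
    if buf.len() < size_of::<Req>() {
        return Err(BufferError::TooShort { need: size_of::<Req>(), got: buf.len() });
    }
    if (buf.as_ptr() as usize) % align_of::<Req>() != 0 {
        return Err(BufferError::Misaligned { align: align_of::<Req>() });
    }
    Ok(())
}
```

Keeping the length and alignment checks in one generic function is what would let each op handler receive an already-validated, typed request.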
### 3.3 Frame plane — keep the inversion, retire the scaffolding
Keep the host-creates / driver-opens ring exactly. **Remove the bring-up scaffolding** that diagnosed
the now-solved `run_core=0` mystery: the `DebugBlock` channel + `DBG_MAGIC`, `spawn_observer` /
`PUNKTFUNK_IDD_PUSH_OBSERVE`, the `error!`-as-`info!` logging, the intentional handle leak, and the
20 s blind no-frame deadline (replace with the `DRV_STATUS_OPENED` handshake as a bounded liveness
signal).
### 3.4 Driver swap-chain reuse — the one open root cause
Today a *reused* IddCx monitor's swap-chain dies after ~2 sessions (target id resolves to 0, `SetDevice`
fails `0x80070057` (`E_INVALIDARG`), then an access violation), forcing fresh-monitor-per-session + the
host-side preempt/`wait_for_monitor_released` dance + the `IDD_PERSIST` "create once, never recreate"
workaround. The fix is in the **driver**: with `EvtCleanupCallback` wired + state owned by
`DeviceContext` + the identity collapsed to one `Monitor` (the recreate-path bugs are exactly the 3-way
identity desync), the clean recreate should become stable. **If** that holds, delete
`IDD_SETUP_LOCK`/`IDD_SESSION_STOP` + the preempt dance and unblock `max_concurrent>1` on Windows.
**If** it can't be fixed cheaply, isolate the residual serialization inside `VirtualDisplayManager`
(not smeared back into the session loop).

Separately, evaluate replacing the polling watchdog (PING/countdown/grace/linger constellation) with a
**WDF file-object `EvtFileClose`** (host holds the control handle open; close = host gone); feasibility
TBD on UMDF/IddCx.

---
## 4. Capture strategy
**IDD-push is the universal primary path: normal AND secure desktop (Decision B).** It composes
in-process (cross-session via `Global\` shared textures: driver in WUDFHost/Session 0, `serve` in the
console session), needs no DXGI Desktop Duplication and no `win32u` reparenting hook, is live-validated
at 5K@240 HDR, and (per the owner) also captures the secure desktop (Winlogon/UAC/lock). So there is no
separate "secure capturer" in the primary path: the same `IddPushCapturer` spans the lock screen and
UAC. Capture selection moves into a typed `CaptureBackend` in the `SessionPlan`, replacing the 3-way
env branch with `IddPush` (default) → `Dda`/`Wgc` (explicit fallbacks).

**WGC + DDA are kept as fallbacks, not deleted (Decision B).** They cover non-IddCx / pre-pf-vdisplay
hardware and act as a safety net if IDD-push fails to attach. But they are **demoted**: they are no
longer the default, no longer entangled with the secure-desktop mux, and selected only via the explicit
`CaptureBackend` fallback in the plan. This lets the DDA module shed the parts that existed *only* to
make virtual-display-over-DDA survive on a hybrid box, while the genuinely-useful capture/recovery core
stays:
- **Scope the `win32u` self-modifying-code hook + the GPU-pref hook to the DDA fallback leg** (one
`win32u_hook::install()`), so the primary IDD-push path never touches them. Re-confirm whether DDA
even needs the `win32u` hook against pf-vdisplay (it may not; open verification item).
- **The two-process WGC relay's secure-desktop mux is retired**: IDD-push handles the secure desktop
directly, so `desktop_watch.rs` + `composed_flip.rs` + the `virtual_stream_relay` monolith are no
longer needed for their original purpose. Keep a **minimal** WGC fallback capturer if the WGC backend
is retained; do not port the 400-line relay state machine. (The cross-session input concern below is
handled by the `InputInjector`/topology abstraction, not the AU video relay.)

**Shared D3D primitives** move out of `dxgi.rs` (today the de-facto dumping ground that `wgc.rs` and
`idd_push.rs` import from) into `windows/d3d/` (typed `Texture2d`/`Ring`/`CopyResource`/`Map`-as-bytes),
`windows/color/` (the converters + `hdr_p010_selftest` verbatim), and `windows/cursor.rs`. All three
capturers consume them, which deletes the duplicated `tex_desc`, cursor, HDR-poll, repeat-last logic.

**The texture-ownership contract becomes type-level.** NVENC encodes the capturer's texture *in place*
(no copy), sound today only because the IDD-push capturer rotates `OUT_RING` and the loop honors
`pipeline_depth()`: an undocumented cross-module coupling that is *already* a latent corruption risk.
Fix: either the encoder always `CopySubresourceRegion`s (as `ffmpeg_win` does), or the capturer hands an
explicitly-leased ring texture with a documented lifetime. No more relying on the synchronous-loop
assumption.
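
The "explicitly-leased ring texture" option can be sketched generically. This is a hedged illustration, not the existing `OUT_RING`: `OutRing`, `LeasedTexture`, and `try_lease` are hypothetical names, and `T` stands in for a D3D11 texture handle. A slot stays marked busy until the encoder drops its lease, so the capturer cannot recycle a texture that is still being encoded in place, whatever the loop timing is.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

struct Slot<T> {
    tex: Arc<T>,
    busy: Arc<AtomicBool>,
}

/// Fixed ring of output textures owned by the capturer.
pub struct OutRing<T> {
    slots: Vec<Slot<T>>,
    next: usize,
}

/// Handed to the encoder; the slot is released when this is dropped.
pub struct LeasedTexture<T> {
    tex: Arc<T>,
    busy: Arc<AtomicBool>,
    slot: usize,
}

impl<T> OutRing<T> {
    pub fn new(textures: Vec<T>) -> Self {
        assert!(!textures.is_empty(), "ring needs at least one slot");
        let slots = textures
            .into_iter()
            .map(|t| Slot { tex: Arc::new(t), busy: Arc::new(AtomicBool::new(false)) })
            .collect();
        Self { slots, next: 0 }
    }

    /// Leases the next slot in rotation, or returns `None` if the encoder
    /// still holds it (the caller then drops or repeats the frame).
    pub fn try_lease(&mut self) -> Option<LeasedTexture<T>> {
        let slot = self.next;
        let s = &self.slots[slot];
        if s.busy
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            return None;
        }
        self.next = (slot + 1) % self.slots.len();
        Some(LeasedTexture { tex: Arc::clone(&s.tex), busy: Arc::clone(&s.busy), slot })
    }
}

impl<T> LeasedTexture<T> {
    pub fn slot(&self) -> usize {
        self.slot
    }
}

impl<T> Deref for LeasedTexture<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.tex
    }
}

impl<T> Drop for LeasedTexture<T> {
    fn drop(&mut self) {
        // Releasing the lease makes the slot reusable by the capturer.
        self.busy.store(false, Ordering::Release);
    }
}
```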

**The IDD-push input question** (must confirm on-glass): capture+encode run in `serve`; input must reach
the *streamed* (console-session) desktop. If `serve` runs in the console session, `SendInput` works
directly. A code comment flags "SendInput from Session 0 can't reach Session 1", so the architecture
must make `InputInjector` satisfiable either by in-session `SendInput` *or* by a tiny **input-only
Session-1 agent** (re-scope the old WGC helper to input only). The `SessionPlan.topology` expresses
this.

---
## 5. Encode layer
- Resolve backend + input format + pipeline depth **once** into `EncodePlan` and hand it to both the
capturer and the encoder factory, killing the duplicated `windows_resolved_backend()` call in `dxgi.rs`
(the highest-severity coupling). Trim `open_video`'s 8-arg grab-bag (`cuda` is always false on
Windows; `bit_depth` is overridden by the capture format anyway).
- **`nvenc_sys.rs`**: a thin safe wrapper with RAII `NvSession`/`NvBitstream`/`NvRegistration`/
`NvMappedInput` (Drop = destroy/unregister/unmap) + an `NV_ENC_CONFIG` builder. The public encoder
then has near-zero `unsafe` and no hand-written teardown loops. (The SDK table already returns
`Result` via `result_without_string()`.) This is the single biggest encode-side `unsafe` reduction.
- **`ffmpeg_win`**: RAII `AvFrame`/`SwsCtx`/`HwDeviceCtx`/`HwFramesCtx` delete every manual `av_*_free`
and the error-path cleanup ladders (also the biggest leak-risk reduction); a checked `MappedSurface`
for the staging readback; a `const` size-assert on the hand-mirrored `AVD3D11VA*` structs in a
dedicated `d3d11va_ffi` submodule (silent FFmpeg ABI drift is currently undetectable). Keep
system-readback the default; zero-copy stays opt-in/experimental (no AMD/Intel lab box).
- **HDR symmetry**: make in-band ST.2086/CLL SEI a shared post-encode step so AMF/QSV get the same
mastering metadata as NVENC (today only NVENC attaches it; AMF/QSV rely solely on the 0xCE datagram).
Centralize "when does the client learn HDR metadata" in one owner.
- Keep `hdr.rs`, the `Encoder` trait, `EncodedFrame`, `validate_dimensions`, the caps-probe + RFI logic
verbatim. Delete the `pipeline.rs` `pump_once` doc stub (the real loop is `session/pipeline.rs`).
---
## 6. Drivers — one binding stack (`windows-drivers-rs`), one workspace, one signing recipe
Today: `pf-vdisplay` on the vendored **`wdf-umdf`** stack; `pf_dualsense` + `pf_xusb` on
**`microsoft/windows-drivers-rs`** (`wdk`/`wdk-sys`/`wdk-build`). Two bindgen passes, two SDK
resolutions, two `NTSTATUS`, two build systems, two signing recipes.

**Decision C: unify all three on `microsoft/windows-drivers-rs`** (the official Microsoft stack), in one
in-tree `packaging/windows/drivers/` workspace, edition 2024, one `rust-toolchain.toml`, one CI build.
The gamepad drivers already ship on it; the work is to **migrate `pf-vdisplay` onto it** and **add the
IddCx surface** it lacks today.

**Required pieces of this migration (each a Phase-0/early task):**

1. **Add an `iddcx` subset to `wdk-sys`.** IddCx DDIs are *not* WDF-table functions (they are direct
`IddCxStub` exports), so the extension is bounded: an `ApiSubset::Iddcx` + `iddcx` feature →
bindgen `IddCx.h` + link `IddCxStub`, then ~15 thin `extern`/wrapper fns. Use the current
`wdf-umdf/src/iddcx.rs` (~345 LOC, validated) as a **line-by-line oracle**, including the IddCx 1.10
`*2` HDR DDIs (`IddCxSwapChainReleaseAndAcquireBuffer2`, `IDARG_*2`, `_METADATA2`).
2. **Solve `/INTEGRITYCHECK` for self-signed loading, properly.** `wdk-build` links the driver with
`/INTEGRITYCHECK`, which a self-signed cert can't satisfy (CodeIntegrity 3004/3089). Today the
gamepad drivers hand-patch the FORCE_INTEGRITY PE bit post-link. Replace that hack with a robust
solution, in order of preference: (a) **override the linker flag**: drop `/INTEGRITYCHECK` via
`wdk-build` config / `RUSTFLAGS`/`link-args` if it can be suppressed cleanly; else (b) a
**deterministic, tested CI post-link tool** (a small Rust/PowerShell step that clears bit `0x80` at
`e_lfanew+0x5e` and re-signs, run in CI, not by hand) so it's reproducible and not a footgun; (c) for
a public build, real **attestation signing** (Partner Center) satisfies `/INTEGRITYCHECK`
legitimately. Pick (a) if feasible; (b) as the fleet-self-signed fallback. This is the headline cost
of choosing this stack and must be nailed in Phase 0.
3. **Backport the `wdf-umdf-sys` build.rs SDK-resolution fix** into `wdk-build` (or a local override):
resolve `IddCx.h`/`IddCxStub` by the SDK version that *actually contains* `um\x64\iddcx`, not the max
base SDK (the real failure where a newer base SDK shadows the WDK SDK). windows-drivers-rs's default
resolution doesn't exercise IddCx today, so this likely needs porting.
4. **Port `pf-vdisplay`'s typed safety wins** onto the new stack: re-create the
`WDF_DECLARE_CONTEXT_TYPE!` `Arc<RwLock<T>>` context abstraction (the gold-standard contained
`unsafe`); the version-gate protocol (`IddCxIsFunctionAvailable!` / `IDD_STRUCTURE_SIZE!`); and a
thin safe wrapper layer so the gamepad drivers stop emitting raw `call_unsafe_wdf_function_binding!`
everywhere (the biggest driver-`unsafe` lever).
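
For option (b) in item 2, the byte-level part of the post-link step is small. The sketch below operates on an in-memory image and is an illustration under stated assumptions: `clear_force_integrity` and `PeError` are hypothetical names, file I/O is left to the caller, and the PE checksum is not recomputed (re-signing must still follow). The offset works for both PE32 and PE32+ because `DllCharacteristics` sits 70 bytes into the optional header in either format, after the 4-byte signature and the 20-byte file header.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PeError {
    Truncated,
    NotMz,
    NotPe,
}

/// IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY
const FORCE_INTEGRITY: u16 = 0x0080;
/// 4 (signature) + 20 (file header) + 70 (offset inside the optional header)
const DLL_CHARACTERISTICS_OFFSET: usize = 0x5e;

/// Clears the FORCE_INTEGRITY bit in place.
/// Returns `Ok(true)` if the bit was set and has been cleared,
/// `Ok(false)` if it was already clear.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<bool, PeError> {
    if image.len() < 0x40 {
        return Err(PeError::Truncated);
    }
    if !image.starts_with(b"MZ") {
        return Err(PeError::NotMz);
    }
    // e_lfanew: little-endian u32 at offset 0x3c of the DOS header.
    let e_lfanew =
        u32::from_le_bytes([image[0x3c], image[0x3d], image[0x3e], image[0x3f]]) as usize;
    let field = e_lfanew
        .checked_add(DLL_CHARACTERISTICS_OFFSET)
        .ok_or(PeError::Truncated)?;
    if field.checked_add(2).map_or(true, |end| end > image.len()) {
        return Err(PeError::Truncated);
    }
    if !image[e_lfanew..].starts_with(b"PE\0\0") {
        return Err(PeError::NotPe);
    }
    let flags = u16::from_le_bytes([image[field], image[field + 1]]);
    if flags & FORCE_INTEGRITY == 0 {
        return Ok(false);
    }
    let cleared = (flags & !FORCE_INTEGRITY).to_le_bytes();
    image[field] = cleared[0];
    image[field + 1] = cleared[1];
    Ok(true)
}
```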

While unifying, also: adopt WDF device contexts for per-pad state (drop the
`UmdfHostProcessSharing=ProcessSharingDisabled`-dependent statics → true multi-pad-per-host); replace
`mem::zeroed()` configs with the `WDF_*_CONFIG_INIT` initializers (kills the recurring zeroed-default
bug class that already caused 3 driver bugs); cache the shm view (RAII `ShmView`) instead of
re-mapping ~125×/s; **delete the world-writable `C:\Users\Public\*.log` driver logging** and the "M0
spike" naming; collapse `is_nt_error()`/`dyn-Any`/`From<()>`-as-error into a typed `IntoDriverResult`;
collapse the per-call dispatch `unsafe` into one generic `dispatch()` helper.

**Provenance note:** confirm where `wdk`/`wdk-sys`/`wdk-build` come from (the gamepad drivers' Cargo.toml
path-deps `../../crates/wdk*` don't exist in this checkout; they resolve inside a windows-drivers-rs
checkout on the dev box). Pin them as crates.io deps or a vendored, version-pinned copy so the driver
workspace builds reproducibly in CI.

---
## 7. Input, audio, service, packaging
- **Input**: consolidate the host-side device plumbing (`create_swdevice`/`create_shm_section`/
`SwDeviceProfile`) into one `inject/windows/swdevice.rs` used by all three managers (XUSB included,
which currently re-implements its own). The shm layouts come from `pf-vdisplay-proto`. Re-scope the
cross-session helper (if any) to input-only.
- **Audio**: small, already fairly clean. Replace the lone `newdev.dll` `LoadLibrary`+`transmute`
(`wasapi_mic.rs`, the audio runtime's *only* `unsafe`) with the windows-rs `DiInstallDriverW` binding
(or move provisioning to the installer) → zero `unsafe` in the audio runtime.
- **Service / process**: one `windows/process.rs` owning RAII `Token`/`Event`/`Job`/`Child` + a single
`spawn_as_user()` used by BOTH the SCM supervisor and any helper, which deletes the duplicated
token-dup/`merged_env_block`/`CreateProcessAsUserW` machinery and ~12 manual `CloseHandle` sites. Add
a **cooperative stop**: a named stop event the supervisor sets and `serve` waits on, so Stop runs RAII
teardown (today `TerminateProcess` skips Drop → the virtual monitor lingers, the documented
stale-monitor gotcha); `TerminateProcess` only as a bounded fallback.
- **Packaging/CI**: keep the thin-.iss / fat-binary model; add a `punktfunk-host web install/uninstall`
subcommand to absorb the web-setup PowerShell. **Build + sign the unified driver workspace in CI from
source** (or a CI guard that fails on stale-vendored-DLL / un-bumped DriverVer) so the driver can't
silently drift from its source. Mint the **fresh pf-vdisplay GUID** coordinated across host + driver +
INF. Single source of truth for version → build + ISCC AppVersion + INF DriverVer. Investigate
retiring `nefconc` by creating the ROOT devnode via SwDevice/CM in Rust. Keep the
devgen-never / nefconc-only and DriverVer-bump gotchas codified.
---
## 8. Unsafe-reduction program (run at port time, not as a separate pass)
- **P0 lints first** (a few lines, before new code): `#![deny(unsafe_op_in_unsafe_fn)]` (host crate has
none today; the driver workspace already has it), `#![warn(clippy::undocumented_unsafe_blocks)]` ,
`#![warn(clippy::multiple_unsafe_ops_per_block)]` . Generated bindings keep their opt-out.
- **P0 std handle ownership**: `std::os::windows::io::OwnedHandle` / `std::fs::File::from_raw_handle`
everywhere a raw `HANDLE` /`isize` is held (events/jobs/tokens/sections/pipes). Used in **zero ** host
files today — the single biggest cheap win. Deletes the bespoke `unsafe impl Read/Write/Drop`
(`HandleReader` ), the never-closed sudovda control handle, the `AtomicIsize` HANDLE globals, ~6 manual
`CloseHandle` sites — and fixes real leaks.
- **P0 the proto crate** (§3.1) — kills the shared-memory pointer-cast `unsafe` .
- **P1 typed wrappers**: `windows/d3d/` (most COM calls already return `Result` ; per-frame loop bodies
become `unsafe` -free, the irreducible keyed-mutex/`from_raw_parts` lands in one `frame_xfer` fn);
`nvenc_sys` + RAII ffmpeg (§5); one `windows/process.rs` (§7); collapse the 21 `unsafe impl Send`
onto one audited `SendPtr<T>` /`ThreadBound<T>` (directly de-risks the NVENC in-place coupling).
- **P2 contain the irreducible**: `win32u_hook.rs` (one `install()`; scope to secure-DDA or drop),
`gpu_priority.rs` (the D3DKMT transmute), the WDF context-blob macro, the IddCx swap-chain DDI +
`from_raw_borrowed` (wrap in a typed `SwapChain` guard returning a borrowed `AcquiredSurface<'_>`).
Document a `// SAFETY:` per residual site.
- **P2 delete `unsafe` by deleting code**: the `present_trigger` dead diagnostic, the `DebugBlock`
channel, `spawn_observer`, `IDD_PERSIST`/`open_or_reuse`, `helpers.rs Sendable<T>`, the WGC-open
thread-watchdog hack (gone with WGC), the driver file-logging.
Estimated: host ~144→~35, drivers ~227→~60, residual concentrated and auditable. (`#![forbid(unsafe_code)]`
is impossible for the drivers and the per-frame D3D path — the realistic target is *containment*.)
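
To make the "one audited `ThreadBound<T>`" idea concrete, here is a minimal sketch of what such a wrapper might look like. It is an assumption about the shape, not the planned implementation: the accessor names are hypothetical, and the policy of leaking the value when dropped on a foreign thread is one possible choice among several.

```rust
use std::mem::ManuallyDrop;
use std::thread::{self, ThreadId};

/// Lets a thread-affine value travel inside a `Send` container while keeping
/// every access on the thread that created it. Hypothetical sketch.
pub struct ThreadBound<T> {
    owner: ThreadId,
    value: ManuallyDrop<T>,
}

// SAFETY: the wrapper may move between threads, but `value` is only read,
// mutated, or dropped after comparing the current thread id with `owner`.
// `ThreadId`s are never reused within a process, so the check cannot alias.
unsafe impl<T> Send for ThreadBound<T> {}

impl<T> ThreadBound<T> {
    pub fn new(value: T) -> Self {
        Self { owner: thread::current().id(), value: ManuallyDrop::new(value) }
    }

    fn on_owner(&self) -> bool {
        thread::current().id() == self.owner
    }

    /// Shared access, or `None` when called from any other thread.
    pub fn try_get(&self) -> Option<&T> {
        if self.on_owner() { Some(&*self.value) } else { None }
    }

    /// Exclusive access, or `None` when called from any other thread.
    pub fn try_get_mut(&mut self) -> Option<&mut T> {
        if self.on_owner() { Some(&mut *self.value) } else { None }
    }
}

impl<T> Drop for ThreadBound<T> {
    fn drop(&mut self) {
        if self.on_owner() {
            // SAFETY: `value` is dropped exactly once, here, on the owning thread.
            unsafe { ManuallyDrop::drop(&mut self.value) };
        }
        // On a foreign thread the value is deliberately leaked instead of
        // running a thread-affine destructor in the wrong place.
    }
}
```

The point of the design is that the single `unsafe impl Send` carries the whole argument, so call sites need no `unsafe` of their own.
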
---
## 9. SudoVDA decoupling (mechanical rename + scrub)
`vdisplay/sudovda.rs` → `vdisplay/windows.rs`; `SudoVdaDisplay` → `PfVirtualDisplay`; scrub "SudoVDA"
from all log/error/doc strings across `capture.rs`/`dxgi.rs`/`wgc*.rs`/`idd_push.rs`/`punktfunk1.rs`/
`main.rs`/`sendinput.rs` (141 refs / 15 files). **Split the reach-in helpers out** of the vdisplay
backend (they're display-utility, not virtual-display creation): `set_advanced_color`,
`advanced_color_enabled`, `resolve_gdi_name`, `isolate/restore_displays_ccd`, `set_active_mode` →
`windows/display_ccd.rs` (collapsing the 4× copy-pasted `QueryDisplayConfig` preamble into one safe
`query_active_config()`); `resolve_render_adapter_luid` → `windows/adapter.rs`. Both vdisplay and
capture then depend on these as peers, breaking the circular reach-in. `WinCaptureTarget` moves to a
neutral location (defined in `dxgi.rs`, constructed in `sudovda.rs` today). Drop the dual-driver
fallback conditionals. Expose HDR/monitor-release as `VirtualLease` methods (zero `unsafe` in the
session glue).
---
## 10. Build plan (greenfield — Decision A)
A from-scratch rebuild of the Windows host against the clean architecture, **salvaging the §1 jewels
verbatim** (the already-clean, already-tested modules: `hdr.rs`, `edid.rs`, the `inject/proto` codecs,
the HDR/cursor converters + their self-tests, the GF8 packetizer, the pairing handshake). The old
Windows code stays in-tree, untouched, as the *reference implementation* until the new path reaches
parity on glass, then is deleted.
**Greenfield-risk mitigation (the survey's strong caveat stands):** almost none of this is
CI-validatable — the Windows backends + drivers need the RTX box (192.168.1.173) + the build VM, and
**AMF/QSV have no lab hardware at all**. A greenfield rewrite therefore carries real risk of silently
dropping a layered bug-fix. Two guardrails are mandatory:
1. **The §1 preservation checklist is a test/assert contract**, not prose: each rebuilt module ports its
hard-won invariants as unit tests or runtime asserts — RAII teardown order (restore displays *before*
REMOVE), keyed-mutex held only across convert/copy, `terminate` checked at the swap-chain loop top,
magic stamped last, `OUT_RING` texture rotation under `pipeline_depth>1`, the NVENC caps-probe
downgrade, the SwDeviceCreate identity recipe. A rebuild that drops one fails its own test.
2. **On-glass A/B gates** at each milestone below, on the RTX box, against the current shipping build:
1080p60, 5K@240 HDR, reconnect-storm, secure desktop (lock/UAC), multi-pad. Nothing replaces the old
path until its A/B passes.
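
As an illustration of guardrail 1, the "restore displays before REMOVE" invariant can be pinned by a host-independent unit test if the display operations sit behind a small seam. This is a sketch under assumptions: `DisplayOps`, `Recorder`, and `teardown_trace` are hypothetical names, and the real `VirtualLease` would carry more state than shown.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical seam over the two teardown steps the checklist orders.
pub trait DisplayOps {
    fn restore_displays(&mut self);
    fn remove_monitor(&mut self);
}

/// RAII lease: dropping it tears the virtual display down in the required order.
pub struct VirtualLease<D: DisplayOps> {
    ops: D,
}

impl<D: DisplayOps> VirtualLease<D> {
    pub fn new(ops: D) -> Self {
        Self { ops }
    }
}

impl<D: DisplayOps> Drop for VirtualLease<D> {
    fn drop(&mut self) {
        // Invariant from the preservation checklist: restore first, REMOVE second.
        self.ops.restore_displays();
        self.ops.remove_monitor();
    }
}

/// Test double that records the order in which teardown steps ran.
#[derive(Clone, Default)]
pub struct Recorder(pub Rc<RefCell<Vec<&'static str>>>);

impl DisplayOps for Recorder {
    fn restore_displays(&mut self) {
        self.0.borrow_mut().push("restore_displays");
    }
    fn remove_monitor(&mut self) {
        self.0.borrow_mut().push("remove_monitor");
    }
}

/// Drop a lease over the recorder and return the observed call order.
pub fn teardown_trace() -> Vec<&'static str> {
    let rec = Recorder::default();
    drop(VirtualLease::new(rec.clone()));
    let trace = rec.0.borrow().clone();
    trace
}
```

A rebuild that swaps the two calls in `Drop` would then fail an assertion on `teardown_trace()` without needing the RTX box.
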
### Build order
- **M0 — Foundations + the `/INTEGRITYCHECK` answer.** Stand up `crates/pf-vdisplay-proto` (the clean,
owned ABI: fresh GUID, the redesigned IOCTL op enum + `#[repr(C)]` structs, `SharedHeader`,
`FrameToken`, the gamepad shm layouts, `const` size-asserts, round-trip tests). Stand up the in-tree
`packaging/windows/drivers/` workspace on `windows-drivers-rs` and **prove the two hard unknowns**:
(a) the `iddcx` `wdk-sys` subset bindgen+links and a trivial IddCx adapter loads; (b) `/INTEGRITYCHECK`
is solved (§6.2) so a self-signed driver loads under Secure Boot with no hand-patching. Add the P0
lints to the host crate. *No host behavior yet.*
- **M1 — pf-vdisplay on the new stack, first light.** Rebuild the IddCx driver against
`windows-drivers-rs`+`iddcx`, clean from the start: `DeviceContext`-owned state (no process-globals),
one `Monitor` identity, `EvtCleanupCallback` on `MonitorContext`, the ported `Arc<RwLock<T>>` context,
the EDID + HDR recipe verbatim, the redesigned control plane from the proto crate. *(On-glass: ADD →
monitor arrives → IDD-push ring attaches → frames flow at 1080p; REMOVE clean.)*
- **M2 — IDD-push capture + NVENC, glass-to-glass.** New `src/windows/` tree: `windows/d3d/` typed
wrappers, `windows/color/` (converters + self-tests), `windows/cursor.rs`, `capture/windows/idd_push.rs`
consuming the proto ring with a **type-level texture-ownership contract** (no in-place-encode
assumption), `encode/windows/{nvenc.rs,nvenc_sys.rs}`, `vdisplay/windows.rs` + `windows/display_ccd.rs` +
`windows/adapter.rs`. Wire the `SessionFactory`/`SessionPlan` (M2 only needs the IDD-push+NVENC
plan). *(On-glass A/B: 1080p60 + 5K@240 HDR, latency parity with the current build.)*
- **M3 — Service, input, audio, secure desktop.** `windows/process.rs` (RAII Token/Event/Job/Child +
`spawn_as_user` + cooperative stop) + `windows/service.rs`; `inject/windows/*` on the proto shm +
consolidated `swdevice.rs`; `audio/windows/*` (zero-`unsafe` runtime). Confirm IDD-push captures the
secure desktop (lock/UAC) and input reaches the streamed session (in-session `SendInput`, or the
input-only agent if needed). *(On-glass: full session incl. lock screen + UAC + a real pad.)*
- **M4 — Gamepad drivers onto the unified stack.** Rebuild `pf_dualsense` + `pf_xusb` on
`windows-drivers-rs` in the same workspace, WDF device contexts (true multi-pad), proto shm,
`WDF_*_CONFIG_INIT`, no file logging, no "M0 spike" naming. *(On-glass: 2 XInput + 2 DualSense pads,
rumble/lightbar/adaptive-trigger round-trip.)*
- **M5 — Fallbacks + GameStream + AMF/QSV.** Port the demoted WGC + DDA fallback capturers (minimal,
`win32u` hook scoped to the DDA leg); `encode/windows/ffmpeg_win/*` with RAII FFmpeg + the
`d3d11va_ffi` size-assert (system-readback default; zero-copy experimental); GameStream planes reusing
`session/pipeline.rs`, installer default flipped to secure `serve`. *(On-glass: Moonlight client on
the DDA fallback; AMF/QSV stays CI-only.)*
- **M6 — Cut over + delete.** Flip the default to the new path, run the full A/B matrix, then delete the
old `dxgi.rs`/`wgc*`/`sudovda.rs`/`punktfunk1.rs` Windows monoliths + the bring-up scaffolding
(`DebugBlock`/`spawn_observer`/observe gate) + the old gamepad driver crates. Single source of truth
for version; CI builds+signs all drivers from source.
Milestones are roughly dependency-ordered; M0 is the long pole (the `/INTEGRITYCHECK` + `iddcx` proof
gates everything else). M5's AMF/QSV cannot be validated without hardware — keep it system-readback-only
and clearly experimental.
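
The M0 proto-crate deliverables (a `#[repr(C)]` `SharedHeader`, `const` size-asserts, round-trip tests, magic stamped last) might look roughly like the sketch below. The field set, the magic value, and the error type are all assumptions made for illustration; only the names `SharedHeader` and the listed invariants come from this plan. Encoding goes through explicit little-endian byte copies, so no pointer cast is needed.

```rust
pub const MAGIC: u32 = 0x5046_5644; // hypothetical tag, not the real ABI value
pub const PROTO_VERSION: u32 = 1; // hypothetical
pub const HEADER_LEN: usize = 24;

/// Hypothetical field layout for the shared-memory header.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SharedHeader {
    pub magic: u32,
    pub proto_version: u32,
    pub width: u32,
    pub height: u32,
    pub frame_seq: u64,
}

// Compile-time size pin: a field change that alters the ABI fails the build.
const _: () = assert!(std::mem::size_of::<SharedHeader>() == HEADER_LEN);

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    TooShort,
    BadMagic(u32),
    VersionMismatch { host: u32, driver: u32 },
}

fn u32_at(buf: &[u8], o: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[o..o + 4]);
    u32::from_le_bytes(b)
}

/// Producer side: write every field, then stamp the magic last.
/// (Real shared memory would also need a release store; omitted in this sketch.)
pub fn write_header(buf: &mut [u8], h: &SharedHeader) -> Result<(), HeaderError> {
    let buf = buf.get_mut(..HEADER_LEN).ok_or(HeaderError::TooShort)?;
    buf[0..4].copy_from_slice(&0u32.to_le_bytes()); // un-stamp while fields change
    buf[4..8].copy_from_slice(&h.proto_version.to_le_bytes());
    buf[8..12].copy_from_slice(&h.width.to_le_bytes());
    buf[12..16].copy_from_slice(&h.height.to_le_bytes());
    buf[16..24].copy_from_slice(&h.frame_seq.to_le_bytes());
    buf[0..4].copy_from_slice(&h.magic.to_le_bytes());
    Ok(())
}

/// Consumer side: validate length, magic, and protocol version before trusting fields.
pub fn read_header(buf: &[u8]) -> Result<SharedHeader, HeaderError> {
    let buf = buf.get(..HEADER_LEN).ok_or(HeaderError::TooShort)?;
    let magic = u32_at(buf, 0);
    if magic != MAGIC {
        return Err(HeaderError::BadMagic(magic));
    }
    let proto_version = u32_at(buf, 4);
    if proto_version != PROTO_VERSION {
        return Err(HeaderError::VersionMismatch { host: PROTO_VERSION, driver: proto_version });
    }
    let mut seq = [0u8; 8];
    seq.copy_from_slice(&buf[16..24]);
    Ok(SharedHeader {
        magic,
        proto_version,
        width: u32_at(buf, 8),
        height: u32_at(buf, 12),
        frame_seq: u64::from_le_bytes(seq),
    })
}
```

The `VersionMismatch` arm is also one place a host/driver version handshake could surface, since both binaries would compile against the same `PROTO_VERSION`.
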
---
## 11. Decisions (resolved 2026-06-24) + open verification items
The five product forks are decided (see the table in §0): **A** greenfield; **B** IDD-push primary for
everything incl. secure desktop, WGC+DDA kept as demoted fallbacks; **C** extend `windows-drivers-rs` +
solve `/INTEGRITYCHECK`; **D** keep GameStream, default secure. On **E (concurrent sessions)**: fix the
driver swap-chain lifecycle regardless (it removes the leak + the preempt dance); treat true
`max_concurrent>1` on Windows as a follow-on once clean reuse is proven on glass.
What remains are **technical unknowns to confirm on the RTX box** (not user decisions):
- **`/INTEGRITYCHECK` resolution path (M0 long pole).** Can `wdk-build` suppress `/INTEGRITYCHECK` via
config/link-args (preferred), or must we keep a deterministic CI post-link bit-clear? Decides the
signing story for all three drivers.
- **`iddcx` subset on `wdk-sys`.** Does the bindgen+`IddCxStub` link cleanly, and does the SDK-resolution
fix need backporting? (windows-drivers-rs doesn't exercise IddCx today.)
- **Driver swap-chain reuse.** Does the clean ownership model (`EvtCleanupCallback` + DeviceContext state
+ single `Monitor` identity) actually fix the "reused swap-chain dies after ~2 sessions" root cause? If
not, the residual serialization stays inside `VirtualDisplayManager` .
- **IDD-push input + secure desktop.** Confirm `serve` runs in the console session so `SendInput` reaches
the streamed desktop (a code comment warns about Session 0→1); confirm IDD-push frames flow through the
lock screen / UAC (owner reports yes — verify and lock it in as the primary, demoting the DDA secure
leg to fallback).
- **Does the demoted DDA fallback still need the `win32u` hook** against pf-vdisplay, or was that purely
a SudoVDA/hybrid pathology? If unneeded, the self-modifying-code hook can be deleted entirely.
- **AMF/QSV** stays CI-only (no hardware) — system-readback default, zero-copy experimental.
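
If the `/INTEGRITYCHECK` item resolves to the post-link fallback, the "bit-clear" could be a small deterministic step over the linked image. The sketch below assumes the bit in question is `IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY` (`0x0080`) in the PE optional header's `DllCharacteristics` field, which is what the linker's `/INTEGRITYCHECK` option sets; the function name and error type are hypothetical. It operates on an in-memory buffer and leaves file I/O to the caller.

```rust
/// PE optional-header flag set by the linker's /INTEGRITYCHECK option.
const FORCE_INTEGRITY: u16 = 0x0080;

#[derive(Debug, PartialEq, Eq)]
pub enum PeError {
    Truncated,
    NotPe,
}

/// Bounds-checked, overflow-checked sub-slice.
fn field(image: &[u8], off: usize, len: usize) -> Option<&[u8]> {
    image.get(off..off.checked_add(len)?)
}

/// Clear FORCE_INTEGRITY in place. Returns `Ok(true)` if the bit was set.
/// Note: the header CheckSum becomes stale after the edit, and the edit must
/// run before signing, because it changes the bytes the signature covers.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<bool, PeError> {
    if field(image, 0, 2) != Some(&b"MZ"[..]) {
        return Err(PeError::NotPe);
    }
    // e_lfanew at 0x3C holds the file offset of the "PE\0\0" signature.
    let lfanew = field(image, 0x3C, 4).ok_or(PeError::Truncated)?;
    let pe = u32::from_le_bytes([lfanew[0], lfanew[1], lfanew[2], lfanew[3]]) as usize;
    if field(image, pe, 4).ok_or(PeError::Truncated)? != &b"PE\0\0"[..] {
        return Err(PeError::NotPe);
    }
    // Signature (4) + COFF header (20) precede the optional header;
    // DllCharacteristics sits at offset 70 there for both PE32 and PE32+.
    let off = pe.checked_add(24 + 70).ok_or(PeError::Truncated)?;
    let raw = field(image, off, 2).ok_or(PeError::Truncated)?;
    let flags = u16::from_le_bytes([raw[0], raw[1]]);
    let cleared = flags & !FORCE_INTEGRITY;
    image[off..off + 2].copy_from_slice(&cleared.to_le_bytes());
    Ok(flags != cleared)
}
```

Because the function is idempotent and reports whether it changed anything, a CI step could fail loudly if the bit unexpectedly reappears or disappears between builds.
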
---
## 12. Risks
- **Greenfield with no CI (the dominant risk).** The build VM is headless/WARP; the WinUI/hardware/driver
paths need the RTX box, and AMF/QSV have no hardware. A from-scratch rebuild can silently drop a
layered bug-fix. Mitigation: the §1 preservation checklist is a *test/assert contract* per rebuilt
module; on-glass A/B gates the new path before the old one is deleted (M6); keep the old code in-tree
as the reference until parity.
- **`/INTEGRITYCHECK` (M0 long pole).** Choosing `windows-drivers-rs` means self-signed loading depends
on solving it cleanly (§6.2). If neither linker-flag suppression nor a deterministic CI post-link step
works, drivers can't load self-signed — prove this first, it gates everything.
- **`iddcx` on `wdk-sys`** is new surface (windows-drivers-rs doesn't bind IddCx). Bounded
(`IddCxStub` exports + ~15 wrappers, with the validated `wdf-umdf/iddcx.rs` as oracle) but unproven on
this stack — M0 must light it.
- **`pf-vdisplay-proto` spans two cargo build graphs** (host workspace + the driver workspace). Validate
the path-dep resolves on the Windows build env in M0; pin `wdk*` provenance so the driver workspace
builds reproducibly in CI.
- **Driver swap-chain-reuse root cause still undiagnosed.** The clean ownership model *should* fix it;
if not, residual serialization stays inside `VirtualDisplayManager` and `max_concurrent>1` stays
blocked. Keep `await_released` on the trait until reuse is proven on glass.
- **NVENC in-place encode + `pipeline_depth>1`** is a latent corruption risk; the M2 texture-ownership
contract must be type-level (not the synchronous-loop assumption). Verify the ring on glass.
- **Host/driver version drift in the field.** New host + new driver are always built together (greenfield),
but the installer bundles both — enforce a startup version handshake (proto version in both binaries)
and a CI guarantee they're built from the same revision.
- **Big-bang cutover (M6).** Flipping the default and deleting the old monoliths is the riskiest moment;
it is gated on the full A/B matrix passing, and the old code is recoverable from git if a regression
surfaces post-cutover.
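
For the NVENC in-place risk above, one way to make the texture-ownership contract type-level is to have the ring move a texture out when it is leased, so the capture side cannot reach a slot that the encoder still holds. This is a sketch under assumptions: `OutRing` here is a generic stand-in for the document's `OUT_RING`, `Leased` and the method names are hypothetical, and `T` stands for whatever texture handle the real code uses.

```rust
/// A texture that has been moved out of the ring. Holding a `Leased<T>` is the
/// only way to reach the texture, so the ring cannot hand it out twice.
pub struct Leased<T> {
    slot: usize,
    texture: T,
}

impl<T> Leased<T> {
    pub fn slot(&self) -> usize {
        self.slot
    }
    pub fn texture(&self) -> &T {
        &self.texture
    }
    pub fn texture_mut(&mut self) -> &mut T {
        &mut self.texture
    }
}

/// Fixed-depth ring of output textures with round-robin rotation.
pub struct OutRing<T> {
    slots: Vec<Option<T>>,
    next: usize,
}

impl<T> OutRing<T> {
    pub fn new(textures: Vec<T>) -> Self {
        Self { slots: textures.into_iter().map(Some).collect(), next: 0 }
    }

    /// Lease the next free texture in rotation order, or `None` if every
    /// slot is still in flight (the caller must wait or drop a frame).
    pub fn lease(&mut self) -> Option<Leased<T>> {
        let n = self.slots.len();
        let slot = (0..n)
            .map(|i| (self.next + i) % n)
            .find(|&i| self.slots[i].is_some())?;
        self.next = (slot + 1) % n;
        let texture = self.slots[slot].take()?;
        Some(Leased { slot, texture })
    }

    /// Return a texture once the encoder is finished with it.
    pub fn give_back(&mut self, lease: Leased<T>) {
        debug_assert!(self.slots[lease.slot].is_none());
        self.slots[lease.slot] = Some(lease.texture);
    }

    /// Number of textures currently held outside the ring.
    pub fn in_flight(&self) -> usize {
        self.slots.iter().filter(|s| s.is_none()).count()
    }
}
```

With this shape, "no in-place-encode assumption" is enforced by the borrow checker rather than by the loop being synchronous; whether a move-based lease fits the real D3D resource lifetimes is something the M2 on-glass run would still have to confirm.
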