Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).
- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
host-latency, gpu-contention (fixed stale status table), game-library,
linux-setup (fixed m0->spike + stale zero-copy claim),
session-aware-host-followups, windows-client-bootstrap,
windows-dualsense-{scoping,game-detection}, windows-virtual-display,
security-review (per-finding status table; #12 still open),
apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
merged, M4 done); windows-secure-desktop.md archived (now a fallback
behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Windows Host — Architecture, Status & Roadmap

> **Single source of truth** for the punktfunk Windows streaming host: the all-Rust **`pf-vdisplay`
> IddCx virtual-display driver** + **IDD-push zero-copy capture** + **NVENC/AMF/QSV encode**, shipped as
> a signed Inno Setup installer with a LocalSystem SCM service. Live-validated on the RTX box through
> 5120×1440@240 HDR, the secure desktop (lock/UAC), and a fullscreen game.
>
> This file is the consolidated Windows-host doc — it absorbs the rewrite design plan, the Goal-1
> staged-refactor plan, the audit + remediation tracker, the fullscreen-game capture-bug analysis, and
> the durable rationale from the original `windows-host.md` implementation plan (now a stub).
> **Last updated 2026-06-26.** All of this work is **merged to `main`** (the `windows-host-goal1`
> branch landed at `3e7c9bd`).

---

## 1. Status at a glance

**Goals 1–3 and milestones M0–M4 are complete and merged to `main`.** The host has a clean, typed,
layered architecture (`HostConfig → SessionPlan → SessionContext`, `windows/`+`linux/` confinement, a
single `VirtualDisplayManager` ownership model, `EncoderCaps`); the all-Rust IddCx `pf-vdisplay` driver
loads self-signed under Secure Boot and does IDD-push zero-copy capture at 5K@240 HDR including the
**secure desktop** (Winlogon/UAC/lock); SudoVDA is gone (`84a3b95`) — `pf-vdisplay` is the sole
virtual-display backend; and the three UMDF drivers (`pf-vdisplay`, `pf-dualsense`, `pf-xusb`) now build
from source in one unified `packaging/windows/drivers/` workspace (M4, `92e6802`). The shipped path
(IDD-push + NVENC) is live-validated on glass; the AMF/QSV encode path is CI-green but not yet
validated on hardware (no AMD/Intel Windows box in the lab).

Ground the details against the code: `crates/punktfunk-host/src/windows/`,
`crates/punktfunk-host/src/{capture,encode,inject,audio,vdisplay}/windows/`, and
`packaging/windows/drivers/`.

**What remains (all non-blocking):** the `pf-vdisplay` slot-reclaim-on-REMOVE fix needs an on-glass
reconnect-storm A/B (§4 P1.3); host-crate `unsafe` lint hygiene + old-monolith / bring-up-scaffolding
cleanup (§4 P2); and the hardware-gated items — AMF/QSV on-glass, hybrid-GPU `SET_RENDER_ADAPTER`,
the WGC/DDA fallback reshape, and true `max_concurrent>1` (§4 P3). One framing note: the host was
**not** greenfield-rebuilt — it was **refactored in place** via a staged, behavior-preserving sequence
that kept the live host working at every step; only the *driver* was rebuilt fresh.

---

## 2. Architecture (what is on disk)

A ~1-page map; the empirical constraints these encode are in §3, the deep reference is in §6.

### 2.1 Layering & crates

- **`crates/punktfunk-host`** — one shared host crate (Linux + Windows; not split). Platform code is
  confined under per-module `windows/`+`linux/` folders behind `#[cfg]` seams (`capture/{windows,linux}/`,
  `encode/…`, `inject/…`, `audio/…`, `vdisplay/…`, plus top-level `src/windows/`+`src/linux/`). Module
  names stay flat (`#[path]`), so caller paths are platform-agnostic.
- **`crates/punktfunk-core`** — the one linked protocol/FEC/crypto/QUIC core (unchanged here).
- **`crates/pf-driver-proto`** — the owned, `no_std` host↔driver ABI (frame ring + control plane +
  gamepad SHM), consumed by both the host crate and the driver workspace.
- **`packaging/windows/drivers/`** — the unified driver workspace on `microsoft/windows-drivers-rs`
  (vendored 0.5.1 + an `iddcx` subset): `pf-vdisplay` (the IddCx display driver), `pf-dualsense` +
  `pf-xusb` (the gamepad drivers, folded in by M4), `wdk-iddcx` (typed IddCx DDI wrappers), `wdk-probe`
  (the CI link/surface gate), `vendor/{wdk-build,wdk-sys}`.

### 2.2 Session resolution, ownership, and seam traits (Goal-1)

The old ~40-knob `PUNKTFUNK_*` env soup (re-read and recomputed in three places) is replaced by a
**resolve-once** pipeline: `config.rs` `HostConfig` (typed, parsed once) → `session_plan.rs` `SessionPlan`
(a `Copy` plan resolved once per session — `CaptureBackend::resolve()` picks `IddPush | Dda | Wgc`,
`resolve_topology` picks `SingleProcess | TwoProcessRelay`; this killed the latent capture/encode
backend-disagreement bug) → `SessionContext` (bundles the ~13 session args + plane receivers, moved into
the stream thread).

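A minimal sketch of the resolve-once idea, for orientation only. The type names (`HostConfig`, `SessionPlan`, `CaptureBackend`, the topology variants) come from the text above; the fields, the `parse` helper, and the selection rules inside `resolve` are assumptions made for illustration and are not taken from `config.rs` or `session_plan.rs`.

```rust
// Illustrative only: fields and decision rules are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CaptureBackend { IddPush, Dda, Wgc }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Topology { SingleProcess, TwoProcessRelay }

/// Typed configuration, parsed exactly once at startup.
#[derive(Clone, Debug)]
pub struct HostConfig {
    pub idd_push: bool,
    pub no_wgc: bool,
}

impl HostConfig {
    /// Parse from any key lookup (the process environment, a host.env map, a test stub).
    pub fn parse(get: impl Fn(&str) -> Option<String>) -> Self {
        let flag = |k: &str| get(k).as_deref() == Some("1");
        Self { idd_push: flag("PUNKTFUNK_IDD_PUSH"), no_wgc: flag("PUNKTFUNK_NO_WGC") }
    }
}

impl CaptureBackend {
    /// Assumed rule: IDD-push wins, then pure DDA, else WGC.
    pub fn resolve(cfg: &HostConfig) -> Self {
        if cfg.idd_push {
            CaptureBackend::IddPush
        } else if cfg.no_wgc {
            CaptureBackend::Dda
        } else {
            CaptureBackend::Wgc
        }
    }
}

/// A `Copy` plan: every later stage reads this instead of re-reading env vars,
/// so capture and encode cannot disagree about the backend.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SessionPlan {
    pub capture: CaptureBackend,
    pub topology: Topology,
}

impl SessionPlan {
    pub fn resolve(cfg: &HostConfig) -> Self {
        let capture = CaptureBackend::resolve(cfg);
        // Assumed rule: only the WGC path needs the two-process relay.
        let topology = match capture {
            CaptureBackend::Wgc => Topology::TwoProcessRelay,
            _ => Topology::SingleProcess,
        };
        Self { capture, topology }
    }
}
```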
Ownership is a single **OnceLock `VirtualDisplayManager`** (`vdisplay/windows/manager.rs`) owning a
*typed* `Arc<OwnedHandle>` control-device handle (no raw-`isize` cross-thread smuggle), a refcounted
Idle/Active/Lingering state machine, and the monitor generation; a per-session `MonitorLease`'s `Drop`
releases the refcount (a stale lease can't tear down a fresh monitor). This deleted a fistful of
`CURRENT_MON_GEN`/`MGR`/`IDD_*` globals and validated on glass at **0 leaked monitors across a reconnect
storm**, A/B-equivalent to the shipping host.

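The generation-stamped lease can be sketched as follows. This is a simplified model, not the code in `manager.rs`: it omits the `OnceLock`, the control-device handle, and the Idle/Active/Lingering states, and the `Manager`, `State`, `acquire`, and `recreate` names are hypothetical. It only shows why a lease from an older generation cannot decrement the refcount of a newer monitor.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Default)]
struct State {
    generation: u64, // bumped every time the monitor is recreated
    refs: u32,       // live leases on the current generation
}

/// Hypothetical stand-in for the manager's shared state.
#[derive(Clone, Default)]
pub struct Manager {
    state: Arc<Mutex<State>>,
}

/// Per-session lease, stamped with the generation it was issued for.
pub struct MonitorLease {
    mgr: Manager,
    generation: u64,
}

impl Manager {
    pub fn acquire(&self) -> MonitorLease {
        let mut s = self.state.lock().unwrap();
        s.refs += 1;
        MonitorLease { mgr: self.clone(), generation: s.generation }
    }

    /// Replace the monitor: a new generation starts with no leases.
    pub fn recreate(&self) {
        let mut s = self.state.lock().unwrap();
        s.generation += 1;
        s.refs = 0;
    }

    pub fn refs(&self) -> u32 {
        self.state.lock().unwrap().refs
    }
}

impl Drop for MonitorLease {
    fn drop(&mut self) {
        let mut s = self.mgr.state.lock().unwrap();
        // A stale lease (older generation) is ignored, so it cannot
        // release, and thereby tear down, a freshly created monitor.
        if s.generation == self.generation && s.refs > 0 {
            s.refs -= 1;
        }
    }
}
```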
The seam traits (`VirtualDisplay`/`VirtualOutput`/`VirtualLease`, `Capturer`, `Encoder`,
`AudioCapturer`/`VirtualMic`/`InputInjector`/`PadManager`) got two tightenings: the capturer takes the
desired `OutputFormat { gpu, hdr }` **in** (killing the `capture → encode::windows_resolved_backend()`
back-reference), and `Encoder::caps() -> EncoderCaps` (§2.4) lets the session glue route loss-recovery by
query.

### 2.3 Capture — IDD-push primary (normal **and** secure desktop), WGC/DDA fallback, GB1 recovery

**IDD-push is the universal primary path.** Capture comes straight from the driver's shared keyed-mutex
texture ring (`capture/windows/idd_push.rs`) — no Desktop Duplication, no `win32u` reparenting hook. The
host creates the ring; the driver opens it (permissive `D:(A;;GA;;;WD)` SDDL). The generation-tagged
`latest = gen<<40 | seq<<8 | slot` stale-ring reject kills the HDR-flip garbage frame; a host-owned
3-slot `OUT_RING` rotated per frame is the texture-ownership contract that enables `pipeline_depth=2`
(convert/copy on the 3D engine overlapping NVENC on the ASIC). It captures the **secure desktop**
(Winlogon/UAC/lock) directly (validated 2026-06-25), so there is no separate secure capturer in the
primary path.

- **Open-time fallback:** `IddPushCapturer::open` waits a bounded ~4 s for a *first frame* (not just
  `DRV_STATUS_OPENED`); on attach failure it hands the keepalive back so `capture.rs` opens **DDA** on
  the same `WinCaptureTarget` — never a 20 s black bail (`ed58365`/`f98ab07`).
- **Mid-session game mode-set recovery (GB1, fixed):** the 250 ms poll follows the display's *actual*
  resolution (`win_display::active_resolution`, CCD/GDI) and recreates the ring on any descriptor change
  (size **or** HDR) → the driver re-attaches → frames resume at the game's mode, **no reconnect**. If a
  change is unrecoverable (e.g. an exclusive flip), a `recovering_since` clock drops the session after 3 s
  so the client reconnects cleanly. No protocol bump was needed — the host reads the resolution straight
  from Windows (`c87bfe0`; the driver's `publish()` width/height guard + flushed log is `789ad49`).
- **WGC + DDA** stay as demoted fallbacks for non-IddCx hardware (`wgc.rs`/`dxgi.rs`). The two-process WGC
  secure-desktop relay (`wgc_relay.rs`) is no longer load-bearing now that IDD-push handles the secure
  desktop; it is kept recoverable but slated for M5/M6 cleanup. (Its constraint analysis is archived in
  [`archive/windows-secure-desktop.md`](archive/windows-secure-desktop.md).)

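The generation-tagged `latest` word described above can be sketched like this. Only the shifts (`gen<<40 | seq<<8 | slot`) come from the text; the field widths (24-bit generation, 32-bit sequence, 8-bit slot) are inferred from those shifts, and the function names are hypothetical rather than the ones in `idd_push.rs`.

```rust
/// Pack `gen<<40 | seq<<8 | slot` into one 64-bit word.
/// Assumed widths: generation = 24 bits, seq = 32 bits, slot = 8 bits.
pub fn pack_latest(generation: u32, seq: u32, slot: u8) -> u64 {
    // Mask the generation so an overflowing counter cannot spill out of its field.
    ((generation as u64 & 0xFF_FFFF) << 40) | ((seq as u64) << 8) | (slot as u64)
}

/// Split the word back into (generation, seq, slot).
pub fn unpack_latest(latest: u64) -> (u32, u32, u8) {
    ((latest >> 40) as u32, (latest >> 8) as u32, latest as u8)
}

/// Stale-ring reject: accept a published frame only if it was tagged with the
/// generation of the ring the host currently owns.
pub fn accept(latest: u64, ring_generation: u32) -> Option<(u32, u8)> {
    let (generation, seq, slot) = unpack_latest(latest);
    (generation == (ring_generation & 0xFF_FFFF)).then_some((seq, slot))
}
```

Because the generation lives in the same atomic-sized word as the slot index, a reader that sees a word from a previous ring (for example one torn down by an HDR flip) can discard it without touching the texture at all.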
### 2.4 Encode — NVENC / AMF / QSV / software; `EncoderCaps`; HDR

`encode/windows/` dispatches per DXGI adapter vendor (`open_video`): **NVENC** (NVIDIA, direct SDK,
`nvenc.rs` — caps-probe-before-configure, bitrate-clamp binary search, true RFI over the DPB, in-band
ST.2086/CLL SEI), **AMF**/**QSV** (AMD/Intel via libavcodec, `ffmpeg_win.rs` — system-readback default,
opt-in zero-copy D3D11; CI-only, no lab hardware), or **software** H.264 (`sw.rs`). HDR (10-bit) forces
HEVC Main10 + BT.2020 PQ; the client auto-detects PQ from the VUI. The encoder adapts to a mid-session
size/format/HDR change per frame (tears down + re-inits), so the GB1 capturer's resolution changes are
handled downstream with no API change.

`Encoder::caps() -> EncoderCaps { supports_rfi, supports_hdr_metadata }` lets the session glue route
loss-recovery by query (only Windows direct-NVENC overrides it; the GameStream loop gates the RFI path on
`supports_rfi` rather than hard-coding per-backend knowledge into the glue).

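A minimal sketch of capability-based routing, assuming a default `caps()` that reports nothing supported and a keyframe request as the non-RFI fallback. `EncoderCaps` and its two fields come from the text; `DirectNvenc`, `Software`, `LossRecovery`, and `route_loss` are hypothetical names, and the real trait has many more methods.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct EncoderCaps {
    pub supports_rfi: bool,
    pub supports_hdr_metadata: bool,
}

pub trait Encoder {
    /// Default: no optional capabilities. Backends override only what they support.
    fn caps(&self) -> EncoderCaps {
        EncoderCaps::default()
    }
}

/// Stand-in for the direct-SDK NVENC backend (assumed to report both capabilities).
pub struct DirectNvenc;
impl Encoder for DirectNvenc {
    fn caps(&self) -> EncoderCaps {
        EncoderCaps { supports_rfi: true, supports_hdr_metadata: true }
    }
}

/// Stand-in for a backend that keeps the default.
pub struct Software;
impl Encoder for Software {}

#[derive(Debug, PartialEq, Eq)]
pub enum LossRecovery {
    InvalidateReferenceFrames,
    RequestKeyframe, // assumed fallback when RFI is unavailable
}

/// The glue asks the encoder what it can do instead of matching on backend type.
pub fn route_loss(enc: &dyn Encoder) -> LossRecovery {
    if enc.caps().supports_rfi {
        LossRecovery::InvalidateReferenceFrames
    } else {
        LossRecovery::RequestKeyframe
    }
}
```

The point of the query is that adding a backend never requires touching the session loop: a new encoder either overrides `caps()` or inherits the conservative default.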
### 2.5 Host↔driver ABI & the `pf-vdisplay` driver

`pf-driver-proto` is one `no_std` crate in both build graphs. It owns the **frame plane** (`FrameToken`
+ `Global\pfvd-*` names), the **control plane** (a fresh interface GUID — *not* SudoVDA's `e5bcc234`;
contiguous `0x900` IOCTL ops; a `GET_INFO` version handshake the host **asserts** + bails on mismatch),
and the **gamepad SHM** (`XusbShm`/`PadShm` incl. `device_type`). `bytemuck`-`Pod` + `size_of` **and**
`offset_of!` asserts make ABI drift a **compile error**.

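The compile-time layout asserts and the version handshake can be illustrated with the standard library alone. This is a sketch under stated assumptions: `PadShmSketch` is an invented layout (not the real `PadShm`), `PROTO_VERSION` and `check_handshake` are hypothetical, and the `bytemuck` `Pod` derive is left out so the block has no external dependencies.

```rust
/// Hypothetical shared-memory header; NOT the real `PadShm` layout.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct PadShmSketch {
    pub magic: u32,      // offset 0
    pub version: u16,    // offset 4
    pub device_type: u8, // offset 6
    pub _pad: u8,        // offset 7, explicit so there is no hidden padding
    pub buttons: u32,    // offset 8
}

// Any field reorder, resize, or insertion fails the build on BOTH sides of the
// ABI, because host and driver compile the same crate.
const _: () = assert!(std::mem::size_of::<PadShmSketch>() == 12);
const _: () = assert!(std::mem::offset_of!(PadShmSketch, device_type) == 6);
const _: () = assert!(std::mem::offset_of!(PadShmSketch, buttons) == 8);

/// Hypothetical protocol version constant shared by host and driver.
pub const PROTO_VERSION: u32 = 1;

/// Host-side check of the version a `GET_INFO`-style query returned:
/// bail on mismatch rather than guess at the layout.
pub fn check_handshake(driver_version: u32) -> Result<(), String> {
    if driver_version == PROTO_VERSION {
        Ok(())
    } else {
        Err(format!(
            "driver protocol v{driver_version} does not match host v{PROTO_VERSION}"
        ))
    }
}
```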
The driver (`packaging/windows/drivers/pf-vdisplay/src/`) is an all-Rust UMDF IddCx driver on
`windows-drivers-rs` + the `iddcx` `wdk-sys` subset; the STEP 0–8 build is the checklist in §6.3, its
internals are the invariants in §3, and it loads self-signed under Secure Boot (FORCE_INTEGRITY cleared
post-link, §6.1). **Known gaps:** ownership state is still partly process-global with
`EvtCleanupCallback` on the **WDFDEVICE** (a deliberate, sound choice — E1 in §4); and
slot-reclaim-on-REMOVE (§4 P1.3).

### 2.6 Service, packaging, installer

A `LocalSystem` SCM supervisor (`windows/service.rs`) token-retargets and `CreateProcessAsUserW`s `serve`
into the console session (so `SendInput` reaches both the streamed and the secure desktop), relaunches on
session-change, and kills-on-close via a Job Object — the Sunshine/Apollo model (rationale:
[`windows-service.md`](windows-service.md)). Shipped as a **signed Inno Setup** `setup.exe`
(`packaging/windows/`, `windows-host.yml`) that builds + signs all three drivers from source, bundles
them + the FFmpeg DLLs, and delegates to `service install`. GameStream (Moonlight) is kept, but the
installer/service default to secure `serve` (GameStream opt-in).

---

## 3. Validated invariants — preserve, do not regress

These are expensive empirical wins; keep them intact when touching the code:

- **Frame transport:** host-creates/driver-opens keyed-mutex ring; generation-tagged stale-ring reject;
  0 ms try-acquire / drop-on-full publish (never block the swap-chain thread); the `OUT_RING` rotation +
  `pipeline_depth=2` overlap; `repeat_last` rotates into a fresh out-ring slot (depth-safe).
- **Driver internals:** `edid.rs` (128-byte EDID + CTA-861.3 HDR block, dual checksums); the FP16 HDR
  recipe (`CAN_PROCESS_FP16` + the `*2` DDIs + gamma/HDR accept-stubs + `HIGH_COLOR_SPACE`); `DEVICE_POOL`
  per render-LUID (NVIDIA UMD/VRAM leak fix); target-id stamped on the monitor context; the two swap-chain
  leak fixes (borrow `IDXGIDevice` across `SetDevice` retries; check `terminate` at the loop top).
- **Monitor lifecycle:** serialized ADD/REMOVE/teardown; restore CCD topology **before** REMOVE; the
  generation-stamped lease (a stale lease can't tear down a fresh monitor); 0-leak across reconnects.
- **HDR color math:** `hdr.rs` (pure, unit-tested, ST.2086 + big-endian SEI); the FP16→P010/Rgb10a2
  converters + `hdr_p010_selftest`; the cursor decomposition.
- **NVENC tuning:** caps-probe-before-configure (10-bit→8-bit graceful downgrade); bitrate-clamp binary
  search (each GPU's real ceiling); true RFI over the DPB; CBR / infinite-GOP / P-only / ~1-frame VBV.
- **Gamepad recipe:** the SwDeviceCreate identity (enumerator with no `_`; mandatory completion callback;
  synthesized DS5 compat-ids; non-null per-pad `ContainerId`); one `pf_dualsense` serving DualSense+DS4
  via a `device_type` byte; XUSB declining `WAIT_*`; per-pad index via `pszDeviceLocation`.
- **Session glue:** the trait seam + RAII keepalive teardown; host-lifetime shared services + per-session
  gamepads; the encode|send split + microburst pacing; `build_pipeline_with_retry` permanent-vs-transient
  classification; the GameStream `VideoPacketizer` (GF8 Cauchy, Moonlight byte-exact); the pairing/trust
  handshake.
- **Core discipline:** no async on the per-frame path; `pf-driver-proto` is the single ABI source
  (drift = compile error); the version handshake the host asserts.

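As one concrete instance of the driver-internals invariant, here is a sketch of the EDID block checksum rule, assuming "dual checksums" means one checksum per 128-byte block (the base block and the CTA extension block). The rule itself is standard EDID: the last byte of each block is chosen so that all 128 bytes sum to 0 modulo 256. The function names are hypothetical and are not taken from `edid.rs`.

```rust
/// Checksum byte that makes a 128-byte EDID block sum to 0 (mod 256).
pub fn edid_block_checksum(block: &[u8; 128]) -> u8 {
    // Sum the first 127 bytes; the 128th is the checksum slot itself.
    let sum = block[..127].iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    0u8.wrapping_sub(sum)
}

/// Write the checksum into the last byte of the block.
pub fn seal_block(block: &mut [u8; 128]) {
    let checksum = edid_block_checksum(block);
    block[127] = checksum;
}

/// A block is valid when all 128 bytes, checksum included, sum to 0 (mod 256).
pub fn block_is_valid(block: &[u8; 128]) -> bool {
    block.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) == 0
}

/// With one extension, both the base block and the extension block carry
/// their own checksum; each must be sealed and validated independently.
pub fn edid_is_valid(base: &[u8; 128], extension: &[u8; 128]) -> bool {
    block_is_valid(base) && block_is_valid(extension)
}
```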
---

## 4. Open work / next tasks (prioritized)

**P1 — ship-readiness / correctness**

1. **Goal-1 → `main` merge — ✅ DONE.** The `windows-host-goal1` branch is merged (tip `3e7c9bd`); the
   full Windows CI matrix (incl. the `amf-qsv` encode path that local checks skip) runs on push.
2. **IDD-push default — ✅ resolved via `host.env`.** The shipped default `host.env` sets
   `PUNKTFUNK_IDD_PUSH=1`, so a fresh install runs the validated IDD-push path (with the WGC/DDA fallback
   in place). The bare *in-code* default (`config.rs`) is still `false` (the dev / non-pf-driver default);
   flipping it to follow the deployed default is an optional tidy.
3. **pf-vdisplay slot reclaim on REMOVE** (driver robustness) — 🟡 **fix landed, on-glass-validation
   pending.** Sustained ADD/REMOVE churn wedged the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because the
   monitor id (EDID serial / `ConnectorIndex` / container GUID) was a **monotonic** `NEXT_ID`, never
   reclaimed → IddCx accumulated a new OS target slot per cycle until exhaustion. `monitor.rs` now allocates
   the **lowest free id** (`alloc_monitor_id`), reused on REMOVE, so a fresh ADD reuses the departed
   monitor's target slot instead of orphaning it. CI-compile-gated; the wedge only reproduces under
   sustained churn on the RTX box, so this needs an **on-glass reconnect-storm A/B** to confirm (the box is
   ephemeral). Keep `packaging/windows/reset-pf-vdisplay.ps1` as the recovery until validated.

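The lowest-free-id policy from item 3 can be sketched as below. This is an illustration of the allocation rule only: `MonitorIds`, `alloc`, and `release` are hypothetical names, the ids are assumed to start at 0, and the real `alloc_monitor_id` in `monitor.rs` may use a different data structure.

```rust
use std::collections::BTreeSet;

/// Tracks which monitor ids are currently attached.
#[derive(Default)]
pub struct MonitorIds {
    in_use: BTreeSet<u32>,
}

impl MonitorIds {
    /// Allocate the lowest id that is not in use, so a departed monitor's
    /// id (and with it the OS target slot) is reused by the next ADD.
    pub fn alloc(&mut self) -> u32 {
        let mut id = 0u32;
        // BTreeSet iterates in ascending order: walk until the first gap.
        for used in &self.in_use {
            if *used != id {
                break;
            }
            id += 1;
        }
        self.in_use.insert(id);
        id
    }

    /// Return an id on REMOVE; reports whether it was actually held.
    pub fn release(&mut self, id: u32) -> bool {
        self.in_use.remove(&id)
    }
}
```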
**P2 — hygiene / architecture completion**

4. **D1-host — host-crate P0 lints — deferred (low value / high churn).** A crate-wide
   `#![deny(unsafe_op_in_unsafe_fn)]` produced 100+ FFI-wrap sites across the Linux modules; it *wraps*
   unsafe (discipline) rather than reducing it and doesn't improve stability, so it was deprioritized vs
   the `OwnedHandle`/RAII reductions (which are **complete** — `idd_push.rs`, `service.rs`, the three
   gamepad backends via a shared `gamepad_raii.rs`, the SCM STOP/SESSION events as `OnceLock<OwnedHandle>`,
   the hot-loop `KeyedMutexGuard`, and the driver's `pod_init!`; all box-validated, clean `sc stop` in
   ~1 s). The driver already has the deny. Revisit D1-host as a final discipline pass (staged per-module)
   if desired.
5. **M6 scaffolding cleanup** — delete the bring-up diagnostics (`spawn_observer`/`DebugBlock` in
   `idd_push.rs`) and, once full parity is proven on glass, the host monoliths.

**Explicitly NOT doing (stability decision): E1 — driver `DeviceContext` ownership + per-`IDDCX_MONITOR`
`EvtCleanupCallback`.** The current process-global design is *sound*: IddCx DDIs receive only an
`IDDCX_MONITOR` handle (never the WDFDEVICE/context), and `ProcessSharingDisabled` makes one devnode = one
host process that dies with the device. A "device-owned" variant would *add* a use-after-free window (the
watchdog races device cleanup) for no gain, and the per-monitor cleanup callback isn't reliably reachable
on this UMDF/IddCx stack. Cleanup is already deterministic (WDFDEVICE `EvtCleanupCallback` +
`cleanup_for_device_removal` + the host-gone watchdog). **Revisit only if `max_concurrent>1` on Windows is
actually needed.** (`monitor.rs` documents this rationale at the `MONITOR_MODES` static.)

**P3 — larger, mostly hardware-gated**

6. **M4 — gamepad-driver unification — ✅ substantially DONE** (`92e6802`). `pf-dualsense` (DualSense /
   DualShock 4) and `pf-xusb` (Xbox 360 / XInput) now live in the unified `packaging/windows/drivers/`
   workspace and build from source per release against the vendored `wdk-sys`, exactly like `pf-vdisplay`;
   `build-gamepad-drivers.ps1` signs them with the shared cert. **Remaining:** point the **driver side** at
   `pf_driver_proto::gamepad::{PadShm,XusbShm}` (the host side already does — the `device_type`-at-offset
   hand-duplication is the last ABI-drift hazard), add WDF device contexts for true multi-pad, and confirm
   the source build matches the prior shipped binaries.
7. **M5 — reshape WGC/DDA + GameStream onto `session/pipeline`**, then delete the old relay/monoliths.
   AMF/QSV stays CI-only (no lab hardware).
8. **On-glass behavioral validation** of the committed-but-unexercised fixes: the watchdog reaping on
   host-kill, `SET_RENDER_ADAPTER` on a **hybrid** box (the lab box is single-dGPU), the IDD-push→DDA
   fallback trigger, HDR-ring sizing + out-ring repeat under real HDR/static-desktop pipelining, and the
   AMF/QSV encode path on real AMD/Intel hardware.

---

## 5. Operations

### 5.1 RTX box on-glass recipe

The persistent on-glass validator is the **RTX box** (`ssh "Enrico Bühler"@<ip>`, ENRICOS-DESKTOP, RTX
4090, PS shell). **The IP FLOATS** (DHCP; boots to **Proxmox** on reboot → ephemeral, unreachable after a
reboot; recently `.173`/`.158` — confirm current first; **never reboot it, never depend on it surviving**).
It has WDK 26100 + LLVM 21.1.2 + the Rust toolchain; build clone at `C:\Users\Public\pf-rewrite` (the
user's active driver-dev tree — **don't clobber uncommitted WIP**; use a worktree). Username has a `ü` →
quote it; it only breaks SDL3/client builds, not the host. To validate a host branch: worktree-checkout,
build with `CARGO_TARGET_DIR=C:\t-goal1`, then stop the **PunktfunkHost** service, back up the binary +
`%ProgramData%\punktfunk\host.env`, copy your build in, restart, drive `punktfunk-probe.exe` loopback,
then restore + `git worktree remove`. Drive over ssh via `powershell -EncodedCommand <base64 UTF-16LE>`
(plain quoting mangles; prefer `Write-Output`/file-redirect for clean output). Driver redeploy:
`packaging/windows/redeploy-pf-vdisplay.ps1`; ghost-monitor recovery: `reset-pf-vdisplay.ps1`.

### 5.2 CI / validation

The persistent build validator is the **windows-amd64 CI runner** (no GPU — fine for builds / `iddcx`
link / `/INTEGRITYCHECK` self-sign / the surface-asserts; live NVENC encode + on-glass defers to the RTX
box). Workflows: `windows-host.yml` (the host installer), `windows-drivers.yml` (the driver workspace
build + FORCE_INTEGRITY clear), `windows-drivers-provision.yml` (WDK/LLVM toolchain), `windows-msix.yml`
(the client). A single Windows runner serializes the whole fleet; a `Cargo.toml` touch costs ~25 min of
queue, so driver pushes that avoid `Cargo.toml` skip the fleet serialization.

Local pre-push checks (this Linux box can't compile the Windows paths):
```sh
cargo test -p pf-driver-proto # the ABI crate (cross-platform)
cargo check -p punktfunk-host # Linux paths; win_* mods are #[cfg(windows)]
cargo clippy -p punktfunk-host --all-targets -- -D warnings
# Windows host clippy (on the box): PUNKTFUNK_NVENC_LIB_DIR=C:\t\nvenc;
# cargo clippy -p punktfunk-host --features nvenc --target x86_64-pc-windows-msvc -- -D warnings
# Driver build (on the box): cd packaging/windows/drivers; Version_Number=10.0.26100.0;
# LIBCLANG_PATH='C:\Program Files\LLVM\bin'; cargo build
```
Note: a pre-existing rustfmt-version drift exists in some Windows-only files (this box's rustfmt 1.9.0
wraps `offset_of!`/`unsafe fn` differently than the runner's) — don't reformat unrelated files to chase it.

### 5.3 Env knobs (Windows host)

`PUNKTFUNK_IDD_PUSH=1` (capture from the driver ring; shipped `host.env` default on, in-code default off),
`PUNKTFUNK_ENCODER=auto|nvenc` (auto → vendor-detect), `PUNKTFUNK_10BIT=1` + `PUNKTFUNK_HDR_SHADER_P010=1`
(HDR), `PUNKTFUNK_SECURE_DDA=1`, `PUNKTFUNK_NO_WGC=1` (pure DDA), `PUNKTFUNK_ZEROCOPY=1`,
`PUNKTFUNK_MONITOR_LINGER_MS`, `PFVD_DEBUG_LOG=1` (driver file log — release builds are silent without it).
Config lives in `%ProgramData%\punktfunk\host.env`; logs in `%ProgramData%\punktfunk\logs\host.log`.

### 5.4 Build / deploy / packaging

x64-only by design (no ARM64 NVIDIA driver). The installer is the thin-`.iss` / fat-binary model
delegating to `service install`; tag `host-win-vX.Y.Z`. The drivers are built + FORCE_INTEGRITY-cleared +
signed + `Inf2Cat`'d in CI from source. DriverVer must bump on any driver change; create the ROOT devnode
via nefcon (devgen is forbidden).

---

## 6. Reference (hard-won — keep)

### 6.1 The `/INTEGRITYCHECK` answer

`wdk-build` emits `cargo::rustc-cdylib-link-arg=/INTEGRITYCHECK` **unconditionally** (no cfg/env/Config
opt-out), so a self-signed driver can't load (CodeIntegrity 3004/3089). The fix: a deterministic,
idempotent post-link step `packaging/windows/clear-force-integrity.ps1` clears the PE FORCE_INTEGRITY bit
(`0x0080 @ e_lfanew+0x5e`) + verifies (CI-proven `0x01E0 → 0x0160`), **before** signing. Packaging order:
`cargo build` → clear-force-integrity → sign `.dll` → `Inf2Cat` → sign `.cat`. (A public build would use
real attestation signing, which satisfies `/INTEGRITYCHECK` legitimately.)

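For readers who want to see the bit manipulation itself, here is a Rust re-expression of what the post-link step does to the image bytes. It is a sketch, not a port of `clear-force-integrity.ps1`: the function name is hypothetical, and it only flips the `DllCharacteristics` bit at `e_lfanew + 0x5e` as described above. It does not recompute the optional-header `CheckSum` field, and whether the real script does so is not stated here.

```rust
/// IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY
pub const FORCE_INTEGRITY: u16 = 0x0080;

/// Clear the FORCE_INTEGRITY bit in a PE image held in memory.
/// Returns the (before, after) DllCharacteristics values for verification.
/// Idempotent: running it twice leaves the image unchanged the second time.
pub fn clear_force_integrity(pe: &mut [u8]) -> Result<(u16, u16), String> {
    if pe.len() < 0x40 || !pe.starts_with(b"MZ") {
        return Err("not an MZ image".into());
    }
    // e_lfanew (offset of the "PE\0\0" signature) lives at 0x3c in the DOS header.
    let e_lfanew = u32::from_le_bytes([pe[0x3c], pe[0x3d], pe[0x3e], pe[0x3f]]) as usize;
    // 4 (signature) + 20 (COFF header) + 70 (field offset in the optional header) = 0x5e.
    let end = e_lfanew.checked_add(0x60).ok_or("bad e_lfanew")?;
    if end > pe.len() || !pe[e_lfanew..].starts_with(b"PE\0\0") {
        return Err("missing PE signature".into());
    }
    let off = end - 2;
    let before = u16::from_le_bytes([pe[off], pe[off + 1]]);
    let after = before & !FORCE_INTEGRITY;
    pe[off..off + 2].copy_from_slice(&after.to_le_bytes());
    Ok((before, after))
}
```

Run against an image whose `DllCharacteristics` is `0x01E0`, the function reports `(0x01E0, 0x0160)`, matching the CI-proven transition quoted above.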
### 6.2 The `iddcx` binding on `wdk-sys` (the make-or-break — proven, the 6 bindgen knobs)

IddCx DDIs are **function-table dispatched** (`IddFunctions[]` indexed by `_IDDFUNCENUM::<Name>TableIndex`,
`IddDriverGlobals` implicit arg 1) — the same model `wdk-sys` already implements for WDF. The vendored
`windows-drivers-rs` 0.5.1 (`packaging/windows/drivers/vendor/`, `[patch.crates-io]`'d) gets a first-class
`ApiSubset::Iddcx` that bindgens `iddcx/1.10/IddCx.h` reusing the identical `wdk_default(config)` baseline
(so WDF/DXGI types **resolve to**, not redefine, `wdk-sys`'s — type-identity by construction). The six
knobs `generate_iddcx` needed (each a real gotcha, all CI-proven):

1. **`--language=c++`** — `wdk_default` parses C; `IddCx.h`'s `IDARG_*` typedefs need C++ (else a "must use
   'struct' tag" cascade).
2. **`-DIDD_STUB`** — table-dispatch mode; skips `IddCxFuncEnum.h`'s `#error IDDCX_VERSION_MAJOR not
   defined`. **Do NOT add `WDF_STUB`** (would desync the shared WDF type-identity).
3. **`allowlist_recursively(false)` + `allowlist_file("(?i).*iddcx.*")`, full codegen (no `.complement()`)**
   — emit ONLY IddCx items; WDF/Win types resolve via `use crate::types::*`.
4. **`allowlist_type("_?DXGI_.*" / "IDXGI.*" / "_?OPM_.*" / "_?D3DCOLORVALUE")`** — emit the non-WDF types
   `wdk-sys` doesn't bindgen, locally. The `_?` is load-bearing (`typedef struct _OPM_X {} OPM_X` needs the
   tag AND the alias).
5. **`pub type UINT = ::core::ffi::c_uint;` in `src/iddcx.rs`** — `UINT` is absent from `crate::types`.
6. **`translate_enum_integer_types(true)`** — emit native `u32` reprs for the DXGI/OPM ModuleConsts enums
   (nested modules can't see a parent `UINT`).

Wrapper note: table dispatch via `_IDDFUNCENUM::<Name>TableIndex as usize` (the ModuleConsts const, **not**
a NewType `.0`); NTSTATUS is plain `i32` (`wdk_sys::NT_SUCCESS`). The driver `build.rs` adds the IddCxStub
link-search (the import lib is under `iddcx\1.0\` even though headers are `1.10`) + `#[no_mangle] pub static
IddMinimumVersionRequired: ULONG = 4`. The versioned `IDD_STRUCTURE_SIZE!` path is dropped — the WDK links
the iddcx **1.0** stub (lacks the version table); we target 1.10 vs a current framework, so `size_of` is
exactly correct.

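The function-table dispatch model can be shown in miniature without any WDK headers. Everything below is a toy: `DriverGlobals`, `FuncEnum`, `demo_ddi`, and `call_demo` are invented stand-ins for `IddDriverGlobals`, `_IDDFUNCENUM`, a real DDI, and a `wdk-iddcx` wrapper. The sketch only demonstrates the three mechanics named in the wrapper note: index the table with a module constant cast to `usize`, cast the untyped entry to the DDI's real signature, and pass the globals pointer as the implicit first argument.

```rust
/// NTSTATUS is a plain i32; success is any non-negative value.
pub type Ntstatus = i32;
pub fn nt_success(status: Ntstatus) -> bool {
    status >= 0
}

/// Toy stand-in for the framework's driver-globals block.
#[repr(C)]
pub struct DriverGlobals {
    pub tag: u32,
}

/// Untyped table entry, as a function table stores it.
pub type RawFn = unsafe extern "C" fn();
/// The real signature of our one demo DDI.
pub type DemoDdi = unsafe extern "C" fn(*const DriverGlobals, u32) -> Ntstatus;

/// ModuleConsts-style enum: a module of integer constants, not a newtype.
#[allow(non_snake_case, non_upper_case_globals)]
pub mod FuncEnum {
    pub type Type = i32;
    pub const DemoDdiTableIndex: Type = 0;
}

unsafe extern "C" fn demo_ddi(globals: *const DriverGlobals, arg: u32) -> Ntstatus {
    if globals.is_null() {
        return -1;
    }
    // SAFETY: checked non-null above; the caller guarantees it points at live globals.
    let tag = unsafe { (*globals).tag };
    if tag == 0xC0DE && arg > 0 { 0 } else { -1 }
}

/// Build the (toy) function table the framework would normally hand us.
pub fn build_table() -> [RawFn; 1] {
    let typed: DemoDdi = demo_ddi;
    // SAFETY: fn pointers share one size; the entry is only called back through `DemoDdi`.
    [unsafe { std::mem::transmute::<DemoDdi, RawFn>(typed) }]
}

/// Typed wrapper: look up by table index, restore the signature, pass globals first.
///
/// # Safety
/// `table` must hold a `DemoDdi` at `DemoDdiTableIndex`, and `globals` must be valid.
pub unsafe fn call_demo(table: &[RawFn], globals: *const DriverGlobals, arg: u32) -> Ntstatus {
    let raw = table[FuncEnum::DemoDdiTableIndex as usize];
    // SAFETY: per the contract above, this entry really has the `DemoDdi` signature.
    let f = unsafe { std::mem::transmute::<RawFn, DemoDdi>(raw) };
    unsafe { f(globals, arg) }
}
```

In the real bindings the table and globals are provided by the framework stub at load time; a single dispatch macro can then generate one such typed wrapper per DDI.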
### 6.3 Driver port checklist (STEP 0–8, as landed)

0. workspace `pf-vdisplay`(cdylib)+`wdk-iddcx`; prove `std::thread`+`OwnedHandle` link under UMDF (done).
1. `wdk-iddcx`: 11 typed DDI wrappers via one dispatch macro + re-export the inbound `PFN_*` types.
2. DriverEntry + `IDD_CX_CLIENT_CONFIG` (15 callbacks) + DeviceInitConfig + WdfDeviceCreate +
   CreateDeviceInterface (the owned pf GUID) + DeviceInitialize; `edid.rs` salvaged verbatim.
3. DeviceContext + `WDF_DECLARE_CONTEXT_TYPE` blob; `init_adapter` in D0Entry (caps + FP16) →
   AdapterInitAsync; the `*2` mode DDIs + `query_target_info` + gamma/HDR accept-stubs. (Box gate: loads
   under Secure Boot, enumerates as an IddCx adapter, Status OK.)
4. control plane (`GET_INFO` version handshake the host asserts, ADD/REMOVE/SET_RENDER_ADAPTER/PING/
   CLEAR_ALL) + create_monitor + real mode DDIs + watchdog + mode bounds; host switched to
   `pf_driver_proto`.
5. `Direct3DDevice` + assign/unassign + `SwapChainProcessor` (worker, `SetDevice` 60×@50 ms single-borrow
   retry, top-of-loop `terminate`, `ReleaseAndAcquireBuffer2`, `from_raw_borrowed`).
6. `FramePublisher` on `pf_driver_proto::frame` + keyed-mutex RAII guard; wire into `run_core`. (Box:
   full IDD-push glass-to-glass + the **secure-desktop** gate — validated 2026-06-25.)
7. HDR / FP16 ring (validated: Mac connects WITH HDR).
8. its own `.inx` + an `unsafe`-reduction pass (`deny(unsafe_op_in_unsafe_fn)`, per-site `// SAFETY:`).

**Remaining driver work** beyond STEP 8: E1 (DeviceContext-owned state + per-`IDDCX_MONITOR`
`EvtCleanupCallback` → unblock `max_concurrent>1` — see §4 for why it's deliberately deferred), the
slot-reclaim-on-REMOVE fix (§4 P1.3), and folding the gamepad-driver side onto `pf_driver_proto` (M4 tail,
§4 P3).

### 6.4 Resolved product decisions (the five forks)

**A** the host was refactored **in place** (staged, behavior-preserving), not greenfield-rebuilt — the
driver *was* rebuilt fresh. **B** IDD-push primary for everything incl. the **secure desktop** (validated);
WGC+DDA demoted to non-IddCx fallbacks. **C** all drivers on `microsoft/windows-drivers-rs` (+ the `iddcx`
subset; `/INTEGRITYCHECK` solved) — done for `pf-vdisplay` and now for the gamepad drivers (M4, `92e6802`).
**D** keep GameStream (Moonlight), default to secure `serve`. **E** concurrent sessions: the host-side
preempt dance was removed by the ownership-model work, but true `max_concurrent>1` on Windows stays blocked
on the E1 driver swap-chain-reuse work (deliberately deferred, §4). **Rejected: DeviceContext-per-monitor
ownership** — see the E1 stability decision in §4 (it would add a use-after-free window for no gain under
`ProcessSharingDisabled`).

---

## Origins & design rationale (from the original plan)

This folds in the durable rationale from the original Windows host + client plan
([`windows-host.md`](windows-host.md), now a stub; full original text in git history). The Windows host
began (2026-06-10 to 2026-06-14) as an *"add backends behind the existing traits"* job, not a parallel
port — `punktfunk-core` and the whole control plane are platform-agnostic, and the host already compiled
on non-Linux (macOS) thanks to existing `cfg(target_os)` gating. These framing decisions shaped what
shipped and still explain *why* the code is the way it is:

- **Build order: host-first.** A user preference (the research had recommended *client*-first, since the
  client is not blocked by the no-GPU problem and becomes the host's test endpoint). The trade-off held —
  the GPU-gated steps were the only ones that stalled without a GPU.
- **Trait-based abstraction → ~95% reuse.** `punktfunk-core` (protocol/FEC/crypto/session/transport/QUIC/
  C ABI), the GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet), the management REST API +
  `native_pairing`/`discovery`, and the `punktfunk1`/`spike`/`pipeline` orchestration all carried over
  unchanged — only the OS-touching backends behind `Capturer`/`Encoder`/`VirtualDisplay`/`InputInjector`/
  `AudioCapturer`/`VirtualMic` are new `#[cfg(windows)]` code. Getting to MSVC needed only ~3 `cfg`-gates
  (gate the `std::os::fd`/`OwnedFd` unix-isms in `main.rs`/`vdisplay.rs`).
- **The no-GPU dev strategy.** Most of the port was built + validated on a **GPU-less Windows VM**: the
  MSVC compile, the virtual-display control path (WARP), the openh264 software-encode pipeline (full
  capture→encode→FEC→UDP transport minus HW), SendInput injection + interactive-session/desktop-reattach,
  gamepad + rumble, and the entire client (software-decode loopback). Only NVENC-D3D11 zero-copy, the
  DDA-vs-WGC bake-off, split-encode/bitrate-ceiling, and *all* glass-to-glass numbers deferred to a real
  NVIDIA box (no perf claim transfers from Linux).
- **Windows-specific structural issues (no Linux precedent)** — these are the gotchas that drove the
  service + capture design and remain true:
  - **Interactive session, not a Session-0 service.** SendInput can't reach the desktop from Session 0;
    Desktop Duplication / capture need the interactive session. Hence the SYSTEM-in-interactive-session
    supervisor (§2.6, [`windows-service.md`](windows-service.md)) and the `OpenInputDesktop`/
    `SetThreadDesktop` re-attach to survive UAC/lock desktop switches.
  - **Clock epoch.** The skew handshake assumes both ends read the same realtime epoch in ns — the Windows
    host must emit timestamps from `GetSystemTimePreciseAsFileTime`→Unix-epoch-ns, or cross-machine latency
    + `ClockProbe`/`ClockEcho` break (std `SystemTime` on Windows is historically coarser).
  - **No audio endpoint on a headless IDD.** WASAPI loopback needs a real/virtual render device; the
    virtual *mic* (client→host) has no clean user-mode path — deferred.
  - **Color/range.** All clients assume BT.709 limited-range; the BGRA→I420/NV12 path must match or colors
    wash out — validated against the existing decoders.

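The clock-epoch conversion mentioned in the list above is simple arithmetic and worth pinning down. A `FILETIME` counts 100 ns ticks since 1601-01-01 UTC, and the Unix epoch sits 116,444,736,000,000,000 ticks later. The sketch below takes the two 32-bit halves that `GetSystemTimePreciseAsFileTime` fills in (`dwHighDateTime`, `dwLowDateTime`) as plain integers so it runs on any platform; the function name is hypothetical and the actual Win32 call is left out.

```rust
/// 100 ns ticks between 1601-01-01 and 1970-01-01 (UTC):
/// 11_644_473_600 seconds * 10_000_000 ticks per second.
pub const FILETIME_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;

/// Convert a FILETIME (high/low 32-bit halves) to nanoseconds since the Unix epoch.
/// Returns `None` for timestamps before 1970 or on arithmetic overflow.
pub fn filetime_to_unix_ns(high: u32, low: u32) -> Option<u64> {
    let ticks = ((high as u64) << 32) | (low as u64);
    ticks
        .checked_sub(FILETIME_UNIX_EPOCH_TICKS)?
        .checked_mul(100) // one tick = 100 ns
}
```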
**SudoVDA → pf-vdisplay evolution.** The original plan was built around **SudoVDA**, an off-the-shelf
indirect display driver (the same IDD Apollo ships) — chosen to avoid writing/WHQL-signing a driver and to
get arbitrary `WxH@Hz` modes on the fly. It carried the host all the way to live-validated NVENC on a real
RTX 4090. It was then replaced by the all-Rust `pf-vdisplay` IddCx driver (which solved
`/INTEGRITYCHECK` self-signing, §6.1, and gave us the IDD-**push** zero-copy capture path that captures the
secure desktop directly) and **deleted in commit `84a3b95`** — `pf-vdisplay` is now the sole
virtual-display backend. The full SudoVDA control protocol (IOCTL layout, watchdog keepalive, GDI-name
resolution) lives in git history if ever needed as a reference.