feat(host): HDR Vulkan layer so Vulkan games get HDR on the virtual display

NVIDIA/AMD Vulkan ICDs refuse to *advertise* an HDR color space for a surface on an
IddCx indirect/virtual display, so Vulkan games (Doom: The Dark Ages, id Tech, Indiana
Jones, …) report "device does not support HDR" — even though Windows HDR, DWM compose,
and the client PQ stream all work, and the ICD happily *accepts + presents* a forced HDR
swapchain there. The whole gap is enumeration; the community (Apollo/Sunshine/VDD) wrote
this off as kernel-side / unfixable.

Add VK_LAYER_PUNKTFUNK_hdr_inject (packaging/windows/pf-vkhdr-layer/): a standalone
cdylib Vulkan implicit layer that appends {A2B10G10R10, HDR10_ST2084} + {RGBA16F, scRGB}
to vkGetPhysicalDeviceSurfaceFormats[2]KHR (no need to hook vkCreateSwapchainKHR — the
ICD doesn't validate the color space there). Self-gated on the surface monitor's actual
advanced-color state (DisplayConfig GET_ADVANCED_COLOR_INFO), so it is a complete no-op
on SDR sessions and real monitors (dedup). Always-on (registry-discovered) so it works
regardless of how a game is launched — env-scoping silently fails for already-running
Steam. Escape hatches: DISABLE_PF_VKHDR, PF_VKHDR_EXCLUDE, and a built-in kernel-anti-
cheat denylist.

The installer builds/signs/stages it and registers it under
HKLM64\SOFTWARE\Khronos\Vulkan\ImplicitLayers (opt-out "Install the HDR Vulkan layer"
task); windows-host CI fmt+clippy-gates it (msvc-only FFI).

Live-validated on the RTX box: Doom: The Dark Ages enables HDR over the pf-vdisplay
virtual display.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:33:20 +00:00
parent 3e7c9bd059
commit d01a8fd17a
31 changed files with 1021 additions and 0 deletions
@@ -0,0 +1,375 @@
# Windows host + client — implementation plan
**Status: in progress — dev box provisioned, host-first.** A Windows host is an *"add backends
behind the existing traits"* job, not a parallel port: `punktfunk-core` and the whole control plane
are platform-agnostic and the host already compiles on non-Linux (macOS) thanks to existing
`cfg(target_os)` gating. The one piece that used to make it XL — a per-client *virtual* output, which
has no user-mode Windows API — is solved by reusing **[SudoVDA](https://github.com/SudoMaker/SudoVDA)**
(the SudoMaker Virtual Display Adapter, the same IDD the Apollo Sunshine-fork ships): a pre-built IDD
that creates virtual displays at **arbitrary `WxH@Hz` on the fly**. We install it and drive its IOCTL
control interface — **no driver to write or WHQL-sign.**
History: scoped 2026-06-10 (4-agent read of the host crate); SudoVDA path 2026-06-11; this concrete
plan + dev box + SudoVDA protocol + no-GPU strategy added 2026-06-14 (12-agent research pass).
## Status (2026-06-15) — full pipeline live-validated on an RTX 4090
Every OS-touching backend is implemented behind the existing traits and **builds clean on
`x86_64-pc-windows-msvc`** (and Linux unaffected). `serve` / `punktfunk1-host` **run on Windows**
(identity in `%APPDATA%`, QUIC bound, mDNS advertising, accepting sessions). The **full native
pipeline is validated live on a real RTX 4090** (Windows 11): SudoVDA virtual display → DXGI
Desktop Duplication (D3D11 zero-copy) → **NVENC HEVC** → punktfunk/1 → Rust reference client, at
720p60 and 1080p60 (0 mismatched frames, p50 1.6 / 3.45 ms cross-machine, ffmpeg-decodes clean),
coexisting with a running Apollo (two concurrent NVENC sessions).
| Backend | State | Validation (GPU-less VM unless noted) |
|---|---|---|
| Virtual display (SudoVDA) | ✅ done | live: open/version/watchdog/ADD/REMOVE via the trait |
| Input (SendInput) | ✅ **live on RTX 4090** | mouse injection verified — cursor tracked the client's absolute diagonal sweep across the desktop in Session 1 (keyboard shares the same SendInput primitive) |
| Software encode (openh264) | ✅ done | **live: m0 synthetic→openh264→core FEC loopback, 120/120, 0 mismatches** |
| Audio (WASAPI loopback) | ✅ done | live: init chain opens (silent VM → no samples) |
| Capture (DXGI Desktop Duplication) | ✅ **live on RTX 4090** | SudoVDA monitor → D3D11 zero-copy duplication; output is enumerated under the *rendering* GPU, not the SudoVDA LUID (search all adapters) |
| NVENC (D3D11, `--features nvenc`) | ✅ **live on RTX 4090** | 720p60 + 1080p60 HEVC end-to-end to the Rust client; ffmpeg-decodes clean; ran alongside Apollo (2 NVENC sessions) |
| Run host (serve/punktfunk1-host) | ✅ live | punktfunk1-host starts + listens; `c_abi_connection_roundtrip` passes |
| Gamepad (ViGEm) | ✅ done | compiles incl. rumble back-channel; live needs ViGEmBus + a physical pad |
| Host→client audio wiring | ✅ done | builds on MSVC; `m3` `audio_thread` active on Windows (silent VM → no samples to send) |
| GameStream (Moonlight) audio | ✅ done | stereo path active on Windows (WASAPI→Opus→RTP/FEC); surround stays Linux-only (libopus multistream / `audiopus_sys`) |
| Rumble back-channel (ViGEm) | ✅ done | `request_notification` → background thread → 0xCA; live needs a physical pad |
| Game library (Steam discovery) | ✅ done | Windows Steam roots (Program Files) + VDF other-drive libraries; custom store already cross-platform. Non-default Steam install dir (registry) not yet covered |
**Remaining for full parity:**
- **Keyboard injection** — exercised via the same SendInput path (mouse verified live); not yet
asserted into a focused text field.
- **ViGEm rumble + gamepad input** — the pad is created live (ViGEmBus connected); the rumble
back-channel + input still need a physical pad to verify.
- **GameStream (Moonlight) path on a GPU box** — not yet run live (its fixed ports collide with
Apollo, so stop Apollo first).
- **Frame pacing on static content** — DXGI duplication is change-driven, so a blank/idle virtual
display delivers only ~12 fps (181/177 frames over ~15 s); a rendering app drives the full rate.
### Live UX hardening (2026-06-15, validated Mac ↔ RTX 4090)
Driven by live testing with the native macOS client at the display's native **5120×1440@240**:
- **Native resolution, not 1080p.** `sudovda::set_active_mode` enumerates the modes the IDD actually
advertises (`EnumDisplaySettingsW`) and sets the requested **resolution** at the best supported
refresh — keeping 5120×1440@240, never silently collapsing to the 1280×720/1920×1080 OS default
when an exact mode is briefly unavailable.
- **Bitrate auto-cap.** NVENC `init_session` probes and steps the average bitrate down (×3/4 to a
floor) when the requested rate exceeds the GPU's codec-level max, so a high client bitrate connects
instead of failing (matches the Linux host; we do NOT split NVENC sessions).
- **Mouse cursor.** DXGI duplication excludes the hardware cursor; we read the pointer
position/shape from the frame info (`GetFramePointerShape`) and GPU-composite it onto the captured
texture before NVENC (a CPU read-back would stall the pipeline). Color cursors alpha-blend;
**masked-color** cursors (the text I-beam) use an `INV_DEST_COLOR` blend for true screen inversion,
so the caret is visible on any background (no black box). Monochrome handled too.
- **Secure desktop (lock / login / UAC).** The host runs as **SYSTEM in the interactive session**;
the capturer `SetThreadDesktop`s onto the current input desktop and, on the WinSta switch,
**recreates the D3D11 device** and **re-resolves the virtual output's GDI name from the stable
SudoVDA target id** (the name changes across the topology rebuild — the old failure was hunting the
stale `\\.\DISPLAYn` and dropping). `ACCESS_LOST` / `INVALID_CALL` / device-removed are all treated
as recoverable, and a mid-stream resolution change is followed (capturer + NVENC re-init at the new
size). Validated: logging in / locking through the stream stays connected (one real session
recovered from 1012 desktop switches and completed cleanly). *Display isolation* (`isolate_displays`
detaches other monitors so Winlogon renders to the virtual output) covers the case where a physical
monitor is also attached.
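
The bitrate auto-cap in the list above can be sketched as a small retry loop. This is a minimal sketch, not the actual `init_session` code: `fit_bitrate` and the `try_init` callback are hypothetical names, and the callback stands in for "open an NVENC session at this average bitrate and report whether the driver accepted it".

```rust
/// Step the requested average bitrate down by x3/4 until `try_init` accepts it
/// or the floor has been tried. Returns the accepted bitrate, or `None` if even
/// the floor is rejected.
///
/// `try_init` is a hypothetical stand-in for opening an encoder session at the
/// given bitrate (bits per second).
pub fn fit_bitrate<F>(requested_bps: u64, floor_bps: u64, mut try_init: F) -> Option<u64>
where
    F: FnMut(u64) -> bool,
{
    let mut bps = requested_bps;
    loop {
        if try_init(bps) {
            return Some(bps);
        }
        if bps <= floor_bps {
            // The floor (or a request already below it) was rejected: give up.
            return None;
        }
        // x3/4 step, clamped so the floor itself always gets one attempt.
        bps = (bps / 4 * 3).max(floor_bps);
    }
}
```

With a codec-level ceiling of 500 Mbit/s, a request for 800 Mbit/s would settle at 450 Mbit/s after two steps (800 → 600 → 450), so the session connects instead of failing.
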
### Running as SYSTEM (deployment) — the `PunktfunkHost` service
To capture the secure desktop the host must run as **SYSTEM in the interactive Session 1** (a Session
0 service can't duplicate Session 1). The end-user deployment is the built-in Windows **service**
(`src/service.rs`) — see [`windows-service.md`](windows-service.md). One elevated command:
```powershell
punktfunk-host service install # auto-start LocalSystem service + firewall rules + default host.env
punktfunk-host service start
```
The service runs in Session 0 but never captures: it duplicates its own LocalSystem token, retargets
it to the active console session, and `CreateProcessAsUserW`s the host there — supervising it across
exits and console-session switches (the Sunshine/Apollo model). Config lives in
`%ProgramData%\punktfunk\host.env`; logs in `%ProgramData%\punktfunk\logs\`.
> **Old bring-up chain (debug only, superseded by the service):** a scheduled task (Interactive,
> Highest) → `PsExec64 -s -i 1 -d wscript.exe launch.vbs` → `host-run.cmd` (hidden window), with
> `APPDATA=C:\Users\Public` as the shared-identity hack. The service replaces all of this; the host
> now resolves its config dir to `%ProgramData%\punktfunk` directly (`PUNKTFUNK_CONFIG_DIR` overrides).
### Real-GPU test box (RTX 4090, `ssh "Enrico Bühler"@192.168.1.174`)
Windows 11, RTX 4090 (driver 596.36) + AMD iGPU, SudoVDA + Apollo (sunshine) installed. SSH lands in
**Session 0 (non-interactive)** — DXGI duplication + SendInput need the **interactive Session 1**, so
launch the host there via an Interactive scheduled task (admin SSH session is the same user):
`Register-ScheduledTask -Principal (New-ScheduledTaskPrincipal -UserId (whoami) -LogonType
Interactive -RunLevel Highest)`, then `Start-ScheduledTask`. The host runs with desktop access; read
its redirected log over SSH. `nvEncodeAPI64.dll` ships with the driver, so a VM-built `--features
nvenc` exe runs here as-is (no SDK install). The 4090's Ada NVENC has no consumer session cap, so the
host encodes alongside Apollo. **Gotcha:** the SudoVDA monitor is rendered by — and DXGI-enumerated
under — the 4090, not the SudoVDA adapter LUID (the capturer searches all adapters; see the fix).
#### Native build on the 4090 (fast iteration loop)
Build on the box itself (edit locally → `sftp` to the repo → `cargo build` there → run via the task)
instead of build-on-VM-then-copy. Prereqs that bit us, in order:
1. **Full MSVC C++ build tools, incl. the CRT libs.** A VS install can land `cl.exe` + the Windows
SDK + sanitizer libs but *miss* the desktop CRT import libs (`VC\Tools\MSVC\<ver>\lib\x64\msvcrt.lib`,
`libcmt.lib`, …) → `LNK1104: msvcrt.lib`. Root cause here: the `Microsoft.VisualCpp.Redist.14`
package failed to install (1603), cascading to skip the NativeDesktop workload. Fix = (re)install
the C++ workload via the VS Installer **GUI** (the headless `setup.exe modify` over SSH fails — a
non-elevated SSH token gives 1603/87, and `--quiet` as SYSTEM hangs). A reboot may be needed first
(a pending reboot also yields 1603). Stop-gap: the desktop CRT libs are version-pinned, so they can
be copied from another box with the **identical** MSVC version (`14.51.36231` here).
2. **Build from an ASCII path.** A username with a non-ASCII char (`C:\Users\Enrico Bühler\…`) breaks
the MSVC PDB writer → `LNK1201: error writing to the program database`. Clone/copy the repo to
e.g. `C:\Users\Public\punktfunk-native` and build there (the VM worked only because it built in
`C:\Users\Public\punktfunk`).
3. `winget install NASM.NASM Kitware.CMake`; generate the NVENC import lib (`lib /def` → set
`PUNKTFUNK_NVENC_LIB_DIR`); set `CMAKE_POLICY_VERSION_MINIMUM=3.5` (libopus).
Build env (each `cargo` invocation): `$env:PATH += ";C:\Program Files\NASM;C:\Program Files\CMake\bin"`,
`$env:CMAKE_POLICY_VERSION_MINIMUM="3.5"`, `$env:PUNKTFUNK_NVENC_LIB_DIR="C:\Users\Public\nvenc"`, then
`cargo build --release -p punktfunk-host --features nvenc`. Validated: native build (1m37s) →
720p60 NVENC, 174/174 frames, p50 2.5 ms, ffmpeg-decodes clean.
All Windows backends are `clippy -D warnings` and `rustfmt` clean on `x86_64-pc-windows-msvc` (the
Windows-only modules are cfg-excluded from Linux CI, so run clippy on the VM after touching them — its
rustc 1.96 clippy is stricter than the Linux CI image on shared code, e.g. `needless_return`).
### Building & testing on a real-GPU Windows box (NVENC)
1. Install **SudoVDA** (virtual display) and **ViGEmBus** (gamepad) drivers; install the NVIDIA driver.
2. NVENC link lib: either install the NVIDIA Video Codec SDK, or generate an import lib from the
driver DLL — `lib /def:nvenc.def /machine:x64 /out:nvencodeapi.lib` where `nvenc.def` lists
`NvEncodeAPICreateInstance` and `NvEncodeAPIGetMaxSupportedVersion` — and set
`PUNKTFUNK_NVENC_LIB_DIR` to its directory.
3. `cargo build -p punktfunk-host --features nvenc,amf-qsv` for the all-vendor GPU host (NVENC for
NVIDIA; AMD AMF + Intel QSV via libavcodec — `amf-qsv` needs `FFMPEG_DIR` with the `*_amf`/`*_qsv`
encoders at build, e.g. the BtbN gpl-shared tree, and the FFmpeg DLLs on PATH at run; NASM + CMake
for aws-lc-rs; libclang for ffmpeg-sys-next). Default build (no feature) = openh264 software encoder.
4. Run in the **interactive session** (not a Session-0 service / not over SSH — SendInput + DXGI
Desktop Duplication need a desktop): `serve` or `punktfunk1-host --source virtual`.
`PUNKTFUNK_ENCODER=auto` (default) picks the backend from the GPU vendor — `nvenc`/`amf`/`qsv`/`sw`
force one. The DXGI capturer emits zero-copy D3D11 NV12/P010 to match any GPU backend; the SudoVDA
monitor activates once a real GPU drives WDDM, so capture + encode then work.
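
A hedged sketch of the `PUNKTFUNK_ENCODER` selection described above. The enum, the function name, the case-insensitive parsing, and the software fallback for an unknown vendor are assumptions for illustration; only the setting values (`auto`/`nvenc`/`amf`/`qsv`/`sw`) come from this plan, and the PCI vendor IDs are the standard ones.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderBackend {
    Nvenc,
    Amf,
    Qsv,
    Software,
}

// Standard PCI vendor IDs.
const VENDOR_NVIDIA: u32 = 0x10DE;
const VENDOR_AMD: u32 = 0x1002;
const VENDOR_INTEL: u32 = 0x8086;

/// Resolve the encoder backend from the `PUNKTFUNK_ENCODER` value (if set) and
/// the rendering GPU's PCI vendor ID (if one was found).
/// Hypothetical helper: the real host may parse and fall back differently.
pub fn pick_encoder(
    setting: Option<&str>,
    gpu_vendor_id: Option<u32>,
) -> Result<EncoderBackend, String> {
    let setting = setting.map(|s| s.trim().to_ascii_lowercase());
    match setting.as_deref() {
        // Unset, empty, or "auto": choose by GPU vendor.
        None | Some("") | Some("auto") => Ok(match gpu_vendor_id {
            Some(VENDOR_NVIDIA) => EncoderBackend::Nvenc,
            Some(VENDOR_AMD) => EncoderBackend::Amf,
            Some(VENDOR_INTEL) => EncoderBackend::Qsv,
            // Assumption: no (known) GPU means the openh264 software path.
            _ => EncoderBackend::Software,
        }),
        // Explicit values force one backend regardless of the GPU.
        Some("nvenc") => Ok(EncoderBackend::Nvenc),
        Some("amf") => Ok(EncoderBackend::Amf),
        Some("qsv") => Ok(EncoderBackend::Qsv),
        Some("sw") => Ok(EncoderBackend::Software),
        Some(other) => Err(format!("unknown PUNKTFUNK_ENCODER value: {other}")),
    }
}
```

A caller would pass `std::env::var("PUNKTFUNK_ENCODER").ok().as_deref()` as the first argument and the vendor ID of the adapter that renders the SudoVDA monitor as the second.
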
### Dev loop (this repo → the Windows VM)
`ssh "Enrico Bühler"@192.168.1.57` (PowerShell shell). Repo cloned at `C:\Users\Public\punktfunk`
(Gitea). Sync uncommitted files with **sftp** (`sftp -b - host`, `/C:/...` paths — scp and
base64-over-ssh are unreliable here). Commit on Linux → `git reset --hard origin/main` on the VM.
Build env: `PATH` += cargo bin + NASM + CMake + LLVM (vcvars not needed — rustc/cc self-locate MSVC).
Set `CMAKE_POLICY_VERSION_MINIMUM=3.5` — CMake 4 rejects libopus's old `cmake_minimum_required` when
`audiopus_sys` (vendored by the `opus` crate) builds libopus from source for the host→client audio path.
## Decisions (locked 2026-06-14)
| Decision | Choice | Rationale |
|---|---|---|
| **Build order** | **Host first** | User preference. (Note: the research recommended *client* first, since the client is unblocked by the no-GPU problem and becomes the host's test endpoint — see "No-GPU dev strategy". Revisit if host progress stalls on GPU-gated steps.) |
| **Virtual display** | **SudoVDA** | Arbitrary modes on the fly (no baked EDID / registry mode list, unlike parsec-vdd), MIT/CC0 (bundleable), already installed on the dev box, proven by Apollo. |
| **Client UI** | **Pure Rust: `windows-rs` + Windows Reactor (WinUI 3)** | No C++/C#. Links `punktfunk-core` directly as a crate (like the GTK Linux client — no C ABI, no GC/FFI-lifetime hazard). Built-in `SwapChainPanel` widget for the video surface; `Custom` escape hatch + raw `Microsoft.UI.Xaml` as fallback. |
| **Client decode** | **FFmpeg + D3D11VA** | Exactly what Moonlight ships; feeds AnnexB H.264/HEVC/AV1 directly, decodes AV1 via the GPU DXVA profile with **no** Store Video Extension. Cost: ffmpeg dep + libclang. |
| **Host SW encode (no-GPU dev)** | **openh264** | BSD, no system ffmpeg, low-latency single-ref/zero-lookahead with intra-refresh. Lets the full capture→encode→FEC→send pipeline run GPU-less. |
| **Host HW encode** | **nvidia-video-codec-sdk (D3D11)** | `NV_ENC_DEVICE_TYPE_DIRECTX` + `NvEncRegisterResource` on the captured `ID3D11Texture2D` = true zero-copy, no CUDA bridge. Young crate — vendor + wrap behind the `Encoder` trait. Defers to a real-GPU box. |
## Dev box (`ssh "Enrico Bühler"@192.168.1.57`)
Windows 11 Pro 25H2 (build 26200), QEMU Q35, 8 vCPU, 12 GB. **No working GPU** (an `RTX 5070 Ti` node
is present but `Status: Unknown`; `nvidia-smi` fails → NVENC cannot initialize). Installed: Rust 1.96
(MSVC), Visual Studio Community 2026 + VC tools + Windows SDK 10.0.26100/28000, Windows App Runtime
2.2 (Reactor needs ≥2.0.1 ✅), **SudoVDA** (`ROOT\DISPLAY\0000`, hwid `root\sudomaker\sudovda`, INF
`oem6.inf`, Status OK) and Parsec VDD, git, winget. **Toolchain gaps to fill** (see Step 0): NASM,
CMake, libclang.
## Reused as-is (~95% of the codebase — no changes)
| Reusable | Why |
|---|---|
| `punktfunk-core` (protocol, FEC, crypto, session, transport, QUIC control plane, C ABI) | Zero platform deps; already compiles on Windows MSVC |
| GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet) *except* the capture/encode/audio backends | pure protocol |
| Management REST API (`mgmt.rs`) + OpenAPI, `native_pairing`, `discovery` | axum/tokio/quinn — portable |
| `punktfunk1.rs` / `spike.rs` / `pipeline.rs` orchestration | trait-generic: call `capturer.next_frame()`, `encoder.submit/poll()`, `vd.create(mode)` — no changes |
| The trait boundaries: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral; Linux deps already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |
## Step 0 — make `punktfunk-host` compile on `x86_64-pc-windows-msvc` — ✅ DONE (2026-06-14)
**Result:** the full dependency tree builds clean on MSVC (aws-lc-rs with NASM+CMake, quinn,
rusty_enet, axum/hyper/utoipa), and `punktfunk-host` compiles **and runs** (the `openapi` subcommand
emits the spec). Only **3 cfg-gates** were needed — the host was already ~95% portable:
`main.rs` `mod dmabuf_fence`/`mod drm_sync` → `#[cfg(target_os = "linux")]`; `vdisplay.rs` the
`use std::os::fd::OwnedFd` import + `VirtualOutput.remote_fd` field → `#[cfg(target_os = "linux")]`.
Verified green on Linux too. Build env on the VM: rustc+`cc`/`cmake` self-locate MSVC (vcvars not
needed); `PATH` must include cargo bin + NASM + CMake + LLVM.
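
The shape of the `vdisplay.rs` gate can be sketched as follows. Only `VirtualOutput`, `remote_fd`, and the `OwnedFd` import come from the text; the `name` field, the constructor, and `has_remote_fd` are hypothetical additions to make the sketch self-contained.

```rust
// The import only exists where the field does, so non-Linux targets never
// reference `std::os::fd` (which is unavailable on Windows).
#[cfg(target_os = "linux")]
use std::os::fd::OwnedFd;

pub struct VirtualOutput {
    /// Hypothetical field, present on every platform.
    pub name: String,
    /// Linux-only handle; compiled out elsewhere.
    #[cfg(target_os = "linux")]
    pub remote_fd: Option<OwnedFd>,
}

impl VirtualOutput {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_owned(),
            // Struct-literal fields can carry the same cfg as the declaration.
            #[cfg(target_os = "linux")]
            remote_fd: None,
        }
    }

    /// Platform-neutral accessor so callers need no cfg of their own.
    #[cfg(target_os = "linux")]
    pub fn has_remote_fd(&self) -> bool {
        self.remote_fd.is_some()
    }

    #[cfg(not(target_os = "linux"))]
    pub fn has_remote_fd(&self) -> bool {
        false
    }
}
```

Keeping the cfg on the field and behind a small accessor means the trait-generic orchestration code compiles unchanged on all targets.
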
The host already compiles on macOS (Linux backends are `cfg`-gated; heavy Linux deps are
target-gated). Getting to Windows MSVC is the **unix-but-not-linux** delta, not a from-scratch port:
1. **Toolchain**: `winget install NASM.NASM Kitware.CMake LLVM.LLVM`, set `LIBCLANG_PATH`
(or tick VS "C++ Clang tools"). NASM+CMake are for **aws-lc-rs** (pulled by `rustls`/`rcgen` on
the `quic` path); libclang is for `ffmpeg-sys`/bindgen (client decode + any host bindgen crate).
2. **`std::os::fd` / `libc`**: `vdisplay.rs:18` has an unconditional `use std::os::fd::OwnedFd;` and
`VirtualOutput.remote_fd: Option<OwnedFd>`. `std::os::fd` is `cfg(unix)`, so it builds on macOS
but breaks on Windows. Gate the import + field (`#[cfg(unix)]`, with a Windows arm or omission).
Sweep for other `cfg(target_os="linux")`-missing unix-isms (`libc`, fds).
3. **Build natively on the VM** (`cargo build -p punktfunk-host`, *not* cross-compile; xwin chokes on
aws-lc-rs/ffmpeg-sys/WDK). Triage the remaining errors. Suspect deps to verify link on MSVC:
`aws-lc-rs` (needs NASM+CMake), `rusty_enet`, the hyper/axum/utoipa stack (expected fine).
4. **CI**: add a `cargo build -p punktfunk-host --target x86_64-pc-windows-msvc` job so the Windows
path stops bit-rotting (the dev box can be a Gitea runner later).
This is the highest-value first move and is **fully doable GPU-less**.
## Windows backends (new `#[cfg(target_os = "windows")]` code behind existing traits)
| Subsystem | Linux today | Windows backend | VM-testable? |
|---|---|---|---|
| **VirtualDisplay** | KWin/gamescope/Mutter/Sway | **SudoVDA** IOCTLs (below) + `SetDisplayConfig` mode-set | ✅ likely (WARP) — *spike* |
| **Capture** | PipeWire/dmabuf | **DXGI Desktop Duplication** primary, **WGC** fallback → `ID3D11Texture2D`; add `FramePayload::D3d11` | ⚠️ DDA-on-WARP unreliable; WGC-on-WARP unverified — *spike* |
| **Zero-copy** | dmabuf→EGL/Vulkan→CUDA | register `ID3D11Texture2D` with NVENC (`NV_ENC_DEVICE_TYPE_DIRECTX`) — no CUDA bridge | ❌ needs real GPU |
| **Encode** | ffmpeg `*_nvenc`/`*_vaapi` | `nvidia-video-codec-sdk` (NVIDIA) + libavcodec `*_amf`/`*_qsv` (AMD/Intel, `encode/ffmpeg_win.rs`, `--features amf-qsv`) + `openh264` SW fallback; vendor-auto via `PUNKTFUNK_ENCODER` | NVENC ✅ live / AMF/QSV CI-only |
| **Input kbd/mouse** | libei / wlr | **SendInput** with `MOUSEEVENTF_VIRTUALDESK` absolute mapping onto the virtual desktop rect (skip the VK→evdev table — client sends Win VKs; use `KEYEVENTF_SCANCODE`+`EXTENDEDKEY`) | ✅ |
| **Gamepad** | uinput xpad + FF | **ViGEmBus** via `vigem-client` (`Xbox360Wired`); rumble via `request_notification()` → `XNotification{large,small}` | ✅ (install driver) |
| **Audio capture** | PipeWire sink monitor | **WASAPI loopback** via the `wasapi` crate (48 kHz stereo f32 → existing Opus) | ⚠️ needs an audio endpoint |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio driver (`Virtual-Audio-Driver`) or defer | ❌ second driver — defer |
`punktfunk1.rs`/`spike.rs`/`pipeline.rs` are unchanged. Note: the Windows capture needs its own
`capture_virtual_output` entry point (the SudoVDA identity is a DXGI adapter LUID + DisplayConfig
TargetId → GDI `\\.\DisplayN`, which doesn't fit the PipeWire `node_id: u32` field — carry it inside
the `keepalive` / a Windows-specific seam rather than overloading `node_id`).
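
For the SendInput row in the table above, the absolute mapping is plain arithmetic: with `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK`, coordinates are normalized to 0..=65535 across the whole virtual desktop. A minimal sketch, assuming the rect comes from `GetSystemMetrics(SM_XVIRTUALSCREEN / SM_YVIRTUALSCREEN / SM_CXVIRTUALSCREEN / SM_CYVIRTUALSCREEN)`; the type and function names are hypothetical, and the rounding choice is one reasonable option rather than the host's confirmed behavior.

```rust
/// Virtual desktop bounds in desktop pixels (origin may be negative when a
/// monitor sits left of or above the primary).
#[derive(Clone, Copy, Debug)]
pub struct DesktopRect {
    pub left: i32,
    pub top: i32,
    pub width: i32,
    pub height: i32,
}

/// Map a desktop pixel position to the 0..=65535 normalized coordinates that
/// an absolute, virtual-desktop mouse event expects.
pub fn to_absolute(rect: DesktopRect, x: i32, y: i32) -> (i32, i32) {
    fn axis(pos: i32, origin: i32, extent: i32) -> i32 {
        // Last addressable pixel maps to 65535; guard degenerate extents.
        let span = (extent as i64 - 1).max(1);
        // Clamp positions outside the desktop onto its edge.
        let offset = (pos as i64 - origin as i64).clamp(0, span);
        // Round to nearest to avoid a systematic bias toward the origin.
        ((offset * 65535 + span / 2) / span) as i32
    }
    (
        axis(x, rect.left, rect.width),
        axis(y, rect.top, rect.height),
    )
}
```

The client's position on the virtual output would first be offset by that output's origin inside the virtual desktop, then passed through `to_absolute`.
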
## SudoVDA control protocol (the `VirtualDisplay` backend spec)
Pure Rust via the `windows` crate (no C lib; Apollo vendors a header-only client under
`third-party/sudovda/`). Reference port pattern: `parsec-vdd-rust` (SetupAPI/CM_* → `CreateFileW` →
`DeviceIoControl`). **Verify the IOCTL hex with a `const fn ctl_code()`:**
`CTL_CODE(dev,func,method,access) = (dev<<16)|(access<<14)|(func<<2)|method`, with
`FILE_DEVICE_UNKNOWN=0x22`, `METHOD_BUFFERED=0`, `FILE_ANY_ACCESS=0`.
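
A minimal sketch of that `const fn`, with the IOCTL values listed below checked at compile time. The constant names and the `sudovda_ioctl` helper are illustrative, not taken from the SudoVDA header or the punktfunk source:

```rust
const FILE_DEVICE_UNKNOWN: u32 = 0x22;
const METHOD_BUFFERED: u32 = 0;
const FILE_ANY_ACCESS: u32 = 0;

/// Rust equivalent of the Windows `CTL_CODE` macro.
pub const fn ctl_code(dev: u32, func: u32, method: u32, access: u32) -> u32 {
    (dev << 16) | (access << 14) | (func << 2) | method
}

/// Hypothetical helper: build a SudoVDA IOCTL from its function code.
pub const fn sudovda_ioctl(func: u32) -> u32 {
    ctl_code(FILE_DEVICE_UNKNOWN, func, METHOD_BUFFERED, FILE_ANY_ACCESS)
}

pub const IOCTL_ADD: u32 = sudovda_ioctl(0x800);
pub const IOCTL_REMOVE: u32 = sudovda_ioctl(0x801);
pub const IOCTL_SET_RENDER_ADAPTER: u32 = sudovda_ioctl(0x802);
pub const IOCTL_GET_WATCHDOG: u32 = sudovda_ioctl(0x803);
pub const IOCTL_DRIVER_PING: u32 = sudovda_ioctl(0x888);
pub const IOCTL_GET_PROTOCOL_VERSION: u32 = sudovda_ioctl(0x8FF);

// Compile-time verification against the documented hex values: a typo in a
// function code fails the build instead of sending a wrong IOCTL.
const _: () = assert!(IOCTL_ADD == 0x0022_2000);
const _: () = assert!(IOCTL_REMOVE == 0x0022_2004);
const _: () = assert!(IOCTL_SET_RENDER_ADAPTER == 0x0022_2008);
const _: () = assert!(IOCTL_GET_WATCHDOG == 0x0022_200C);
const _: () = assert!(IOCTL_DRIVER_PING == 0x0022_2220);
const _: () = assert!(IOCTL_GET_PROTOCOL_VERSION == 0x0022_23FC);
```
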
- **Device interface GUID**: `{E5BCC234-1E0C-418A-A0D4-EF8B7501414D}` · **HWID**: `root\sudomaker\sudovda`
- **IOCTLs** (func → value): ADD `0x800` → `0x00222000`, REMOVE `0x801` → `0x00222004`,
SET_RENDER_ADAPTER `0x802` → `0x00222008`, GET_WATCHDOG `0x803` → `0x0022200C`,
DRIVER_PING `0x888` → `0x00222220`, GET_PROTOCOL_VERSION `0x8FF` → `0x002223FC`.
- **Add** (`#[repr(C)]` exact layout): in `{ u32 Width; u32 Height; u32 RefreshRate; GUID MonitorGuid;
CHAR DeviceName[14]; CHAR SerialNumber[14] }` → out `{ LUID AdapterLuid; u32 TargetId }`. **The mode
is set at create** (driver computes timing arithmetically — no EDID seeding). Pick a *stable
per-client* `MonitorGuid` (Windows persists that monitor's layout; remove is by GUID).
- **Resolve the capture target**: the monitor appears **asynchronously** — poll
`QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS)`, match `targetInfo.id == TargetId`,
`DisplayConfigGetDeviceInfo` → `viewGdiDeviceName` (`\\.\DisplayN`). Apollo polls 20 ms → ×2 → cap
320 ms. Then point DXGI Desktop Duplication at that output.
- **Keepalive (mandatory)**: `GET_WATCHDOG` → `{ u32 Timeout_s; u32 Countdown }` (default **3 s**,
driver-wide). Run one thread firing `DRIVER_PING` every `Timeout*1000/3` ms (~1 s). Miss it and the
driver tears down **all** virtual displays.
- **Teardown (RAII)**: `Drop` → `DeviceIoControl(REMOVE, { GUID MonitorGuid })` = the `VirtualOutput`
keepalive drop.
- **Mid-stream `Reconfigure`**: SudoVDA has no in-place mode IOCTL (Apollo only relaunches). Implement
punktfunk's `Reconfigure` as remove+re-add at the new mode (or add-second + migrate capture), and
**watch the Win11 24H2/25H2 IDD mode-apply regression** (post-create `ChangeDisplaySettingsEx` may
not move the *desktop* to the new mode without a Settings-UI poke — VirtualDrivers #471). The
~90 ms `Reconfigure` budget needs an isolated spike to confirm on 24H2/25H2.
- **Install / signing**: self-signed — ship `sudovda.cer`, import to Root + TrustedPublisher, create
the device node via `nefconc.exe` (`--create-device-node`/`--install-driver`). Installs **without**
test-signing (trusted-publisher). MIT/CC0 → bundleable (Apollo precedent). **Already installed on
the dev box.** Document it as a host prerequisite (like the Linux udev rule).
- **GPU caveat**: SudoVDA's `Driver.cpp` does `D3D11CreateDevice(UNKNOWN)` on a render adapter with
**no explicit WARP fallback**; on the GPU-less VM Windows binds the Basic Render Driver (WARP), so
display compositing *should* work but NVENC won't. Confirm `ADD` actually brings a monitor up on the
VM in the first spike.
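
The `Add` layout, the keepalive interval, and the resolve-poll backoff from the list above can be sketched in portable Rust. This is a sketch under stated assumptions: the struct and field names are illustrative, `Guid`/`Luid` are local stand-ins for the Win32 types, and the sizes assume default C packing with no `#pragma pack` in the driver header, which should be checked against the SudoVDA source.

```rust
use std::mem::size_of;
use std::time::Duration;

/// Local stand-in for the Win32 `GUID` (16 bytes, 4-byte aligned).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Guid {
    pub data1: u32,
    pub data2: u16,
    pub data3: u16,
    pub data4: [u8; 8],
}

/// Local stand-in for the Win32 `LUID`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Luid {
    pub low_part: u32,
    pub high_part: i32,
}

/// Input buffer for ADD, mirroring the documented field order.
#[repr(C)]
pub struct AddParams {
    pub width: u32,
    pub height: u32,
    pub refresh_rate: u32,
    pub monitor_guid: Guid,
    pub device_name: [u8; 14],
    pub serial_number: [u8; 14],
}

/// Output buffer for ADD.
#[repr(C)]
pub struct AddOut {
    pub adapter_luid: Luid,
    pub target_id: u32,
}

// Layout guards (assumption: default packing): 12 + 16 + 14 + 14 = 56, 8 + 4 = 12.
const _: () = assert!(size_of::<AddParams>() == 56);
const _: () = assert!(size_of::<AddOut>() == 12);

/// Ping period: a third of the watchdog timeout (`Timeout*1000/3` ms).
pub fn ping_interval(timeout_s: u32) -> Duration {
    Duration::from_millis(u64::from(timeout_s) * 1000 / 3)
}

/// Resolve-poll schedule: 20 ms, doubling, capped at 320 ms.
pub fn poll_delays() -> impl Iterator<Item = Duration> {
    std::iter::successors(Some(20u64), |ms| Some((ms * 2).min(320)))
        .map(Duration::from_millis)
}
```

With the default 3 s watchdog, `ping_interval(3)` yields a 1 s ping period, leaving two missed pings of slack before the driver would tear the displays down.
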
## No-GPU dev strategy
**Buildable + validatable on the VM now:** Step 0 (MSVC compile); the SudoVDA backend
(add/mode-set/keepalive/remove via WARP — *spike to confirm*); the openh264 SW encode path fed a CPU
BGRA staging copy → real AnnexB → FEC → UDP (the full transport minus HW); SendInput injection +
interactive-session/desktop-reattach; ViGEm gamepad + rumble; WASAPI loopback (if an endpoint
exists); and the entire client (software decode loopback).
**Defers to a real NVIDIA-GPU Windows box:** NVENC-D3D11 zero-copy encode; whether the captured
`ID3D11Texture2D` registers with NVENC zero-copy vs needing a `CopyResource`; the DDA-vs-WGC latency
bake-off (DDA-on-WARP is `E_NOTIMPL`-class); split-encode + bitrate-ceiling probe; and **all**
glass-to-glass / throughput numbers (no perf claim transfers from Linux).
## Windows-specific structural issues (no Linux precedent)
- **Interactive session, not a Session-0 service.** SendInput can't reach the desktop from Session 0.
Run the host in the user's interactive session and replicate Apollo/Sunshine's
`OpenInputDesktop`/`SetThreadDesktop` re-attach to survive UAC/lock-screen desktop switches. (Driving
the UAC *secure* desktop needs a UIAccess manifest + signing — out of scope; document it.)
- **Clock epoch on the host side.** The skew handshake assumes both ends read the same realtime epoch
in ns. The Windows host must emit timestamps from `GetSystemTimePreciseAsFileTime`→Unix-epoch-ns or
cross-machine latency numbers + `ClockProbe`/`ClockEcho` break.
- **IDD has no audio endpoint.** There's nothing to loop back on a headless box unless a real/virtual
render device exists → WASAPI loopback needs an endpoint, and the virtual *mic* (client→host) has no
clean user-mode path. Audio is potentially a second driver-install problem; defer the mic.
- **Color/range.** All clients assume BT.709 limited-range. A new openh264/NVENC-D3D11 path doing
BGRA→I420 must match, or colors wash out — validate against the existing decoders.
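
The clock-epoch conversion in the list above is fixed arithmetic: a `FILETIME` counts 100 ns ticks since 1601-01-01 UTC, which is 11 644 473 600 s before the Unix epoch. A minimal sketch, assuming the two `FILETIME` halves have already been read from `GetSystemTimePreciseAsFileTime`; the function names are hypothetical.

```rust
/// 100 ns ticks between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
const FILETIME_UNIX_OFFSET_100NS: u64 = 116_444_736_000_000_000;

/// Combine the `dwLowDateTime` / `dwHighDateTime` halves of a FILETIME.
pub fn filetime_parts_to_u64(low: u32, high: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}

/// Convert FILETIME ticks to nanoseconds since the Unix epoch.
/// Returns `None` for times before 1970 or on overflow.
pub fn filetime_to_unix_ns(filetime_100ns: u64) -> Option<u64> {
    filetime_100ns
        .checked_sub(FILETIME_UNIX_OFFSET_100NS)?
        .checked_mul(100)
}
```

Because the source has 100 ns resolution, the resulting nanosecond values are always multiples of 100; the skew handshake only needs both ends to share the epoch, not the granularity.
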
## Phased plan (host-first)
0. **Compile on MSVC** (Step 0 above). GPU-less. ← *start here*
1. **SudoVDA `VirtualDisplay` backend** — ✅ *control path landed* (`vdisplay/sudovda.rs`:
add/keepalive/remove + GDI-name resolution + RAII teardown, behind the existing trait; `open()`
returns it on Windows). Compiles + live-tested on the VM. **Remaining:** monitor activation +
`\\.\DisplayN` resolution (needs a GPU), then `SetDisplayConfig` mid-stream `Reconfigure`.
2. **Capture + SW encode** — DXGI Desktop Duplication (or WGC) → `ID3D11Texture2D` → CPU staging →
openh264 → existing FEC/transport. First end-to-end Windows session, GPU-less, against the Linux
`punktfunk-probe` or the new Windows client.
3. **Input** — SendInput (kbd/mouse, VIRTUALDESK mapping) + interactive-session/desktop-reattach.
4. **Gamepad + audio** — ViGEm + rumble; WASAPI loopback.
5. **HW encode (real-GPU box)** — `nvidia-video-codec-sdk` D3D11 zero-copy; DDA-vs-WGC bake-off;
glass-to-glass numbers. Resolve to Xbox-360 pad on Windows (drop DualSense fidelity/virtual-mic to
follow-ups, as the host already does for non-Linux).
## The Windows client (separate track, pure Rust)
Structurally a sibling of `clients/linux` (GTK4) — same shape, different toolkit:
- **UI**: `windows-rs` + **Windows Reactor** (WinUI 3) for native chrome. Link `punktfunk-core`
directly (no C ABI). **De-risk early**: a Reactor window with a `SwapChainPanel` presenting a
test pattern through a flip-model waitable swapchain, before building on it. Fallback if Reactor's
immaturity (it is about 3 weeks old) bites: the `Custom` element + raw `windows-rs` `Microsoft.UI.Xaml`.
- **Decode**: FFmpeg `avcodec_send_packet`/`receive_frame` with the **D3D11VA** hwaccel → `NV12/P010`
`ID3D11Texture2D`. Feeds AnnexB directly (matches host output), decodes AV1 with no Store extension.
- **Present**: DXGI flip-model **waitable** swapchain (`FLIP_DISCARD` + `FRAME_LATENCY_WAITABLE_OBJECT`,
max latency 1) bound to the `SwapChainPanel` via `ISwapChainPanelNative::SetSwapChain`. **Not**
MediaPlayerElement.
- **Input capture**: RAWINPUT/`WM_INPUT` for relative/pointer-lock mouse; `Windows.Gaming.Input` for
gamepads + rumble. Forward via the linked `NativeClient` (`send_input`/`send_rich_input`).
- **Trust**: SPAKE2 PIN + TOFU pinning via core; persist the client identity in Windows Credential
Manager / DPAPI (the Keychain analog).
## Open risks / spikes (do these in isolation, early)
1. **`cargo build -p punktfunk-host` on the VM** — count + triage the real MSVC errors before
estimating Step 0. (GPU-less.)
2. **SudoVDA `ADD` on the VM** — ✅ *done 2026-06-15.* The control path is fully validated on the
GPU-less VM, both standalone and through the real `VirtualDisplay` trait (`vdisplay/sudovda.rs`):
device open by GUID, `GET_PROTOCOL_VERSION` (0.2.1), `GET_WATCHDOG` (3 s), `ADD 1920×1080@60` → returns
adapter LUID + `target_id`, watchdog ping holds it, RAII `Drop` → `REMOVE`. **Gap:** with no GPU the
target does NOT activate into a WDDM display path (`QueryDisplayConfig` active paths stay 0 → no
`\\.\DisplayN` to resolve/capture). So **activation + name-resolution + capture defer to a real
GPU** (passthrough on the Proxmox VM, or a GPU box) — consistent with capture/NVENC deferring anyway.
3. **IDD arbitrary-mode + `Reconfigure` on 24H2/25H2** — does 5120×1440@240 apply, and does a
remove+re-add (or re-modeset) hit the ~90 ms budget without a Settings-UI toggle? Make-or-break for
"native client resolution, no scaling".
4. **NVENC-D3D11 zero-copy** (real-GPU box) — does the captured texture register as-is, or need a
copy? Does `nvidia-video-codec-sdk`'s `NV_ENC_DEVICE_TYPE_DIRECTX` path work end-to-end? (Expect to
vendor/patch.)
5. **DDA vs WGC** against the SudoVDA monitor — measure latency/jitter on a real GPU; resolve the
primary-capture choice.
6. **Driver redistribution** — confirm bundling SudoVDA (`.cer` + nefcon) + ViGEmBus installers in the
punktfunk Windows package; document them as prerequisites.
## References
- SudoVDA: <https://github.com/SudoMaker/SudoVDA> · Apollo integration:
<https://github.com/ClassicOldSong/Apollo/tree/master/src/platform/windows> (`virtual_display.cpp`)
+ `third-party/sudovda/`
- parsec-vdd-rust (port pattern): <https://github.com/rohitsangwan01/parsec-vdd-rust>
- Win11 24H2 IDD mode-apply regression: VirtualDrivers/Virtual-Display-Driver #471
- Windows Reactor (WinUI 3 in Rust): windows-rs PR #4479
- Crates: `windows`, `windows-capture`, `vigem-client`, `wasapi`, `openh264`,
`nvidia-video-codec-sdk`, `ffmpeg-next`