punktfunk/docs/windows-host.md
enricobuehler 5cf7b561b5
docs(windows-host): gamepad done; audio/rumble/GPU-validation remaining
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 01:48:23 +00:00


Windows host + client — implementation plan

Status: in progress — dev box provisioned, host-first. A Windows host is an "add backends behind the existing traits" job, not a parallel port: punktfunk-core and the whole control plane are platform-agnostic and the host already compiles on non-Linux (macOS) thanks to existing cfg(target_os) gating. The one piece that used to make it XL — a per-client virtual output, which has no user-mode Windows API — is solved by reusing SudoVDA (the SudoMaker Virtual Display Adapter, the same IDD the Apollo Sunshine-fork ships): a pre-built IDD that creates virtual displays at arbitrary WxH@Hz on the fly. We install it and drive its IOCTL control interface — no driver to write or WHQL-sign.

History: scoped 2026-06-10 (4-agent read of the host crate); SudoVDA path 2026-06-11; this concrete plan + dev box + SudoVDA protocol + no-GPU strategy added 2026-06-14 (12-agent research pass).

Status (2026-06-15) — all host backends landed; runnable on Windows

Every OS-touching backend is implemented behind the existing traits and builds clean on x86_64-pc-windows-msvc (and Linux unaffected). serve --native / m3-host run on Windows (identity in %APPDATA%/HOME, QUIC bound, mDNS advertising, accepting sessions — validated live).

| Backend | State | GPU-less validation on the VM |
| --- | --- | --- |
| Virtual display (SudoVDA) | done | live: open/version/watchdog/ADD/REMOVE via the trait |
| Input (SendInput) | done | compiles; live injection needs an interactive session (not SSH) |
| Software encode (openh264) | done | live: m0 synthetic→openh264→core FEC loopback, 120/120, 0 mismatches |
| Audio (WASAPI loopback) | done | live: init chain opens (silent VM → no samples) |
| Capture (DXGI Desktop Duplication) | done | helpers unit-tested; DuplicateOutput needs a GPU-activated monitor |
| NVENC (D3D11, --features nvenc) | compiles+links | needs a GPU at runtime |
| Run host (serve/m3-host) | live | m3-host starts + listens; c_abi_connection_roundtrip passes |
| Gamepad (ViGEm) | done | compiles; live needs ViGEmBus + a physical pad; rumble back-channel TODO |
| Host→client audio wiring | blocked | opus crate links system libopus (not on MSVC) |

Remaining for full parity:

  • Host→client Opus audio — the WASAPI capture backend is done + init-validated, but the m3 audio_thread stays Linux-gated because the opus crate links system libopus, absent on MSVC. Finish by providing libopus on MSVC (vcpkg) or switching audio_thread to a vendoring Opus crate (audiopus/magnum-opus build libopus from C source), then widen the audio_thread cfg.
  • ViGEm rumble back-channel (Xbox360Wired::request_notification) — small; needs a physical pad.
  • Live GPU/in-session validation — SudoVDA monitor activation, DXGI capture, NVENC encode, and SendInput injection all need a real GPU and an interactive (console) session, not SSH/Session-0.

Building & testing on a real-GPU Windows box (NVENC)

  1. Install SudoVDA (virtual display) and ViGEmBus (gamepad) drivers; install the NVIDIA driver.
  2. NVENC link lib: either install the NVIDIA Video Codec SDK, or generate an import lib from the driver DLL — lib /def:nvenc.def /machine:x64 /out:nvencodeapi.lib where nvenc.def lists NvEncodeAPICreateInstance and NvEncodeAPIGetMaxSupportedVersion — and set PUNKTFUNK_NVENC_LIB_DIR to its directory.
  3. cargo build -p punktfunk-host --features nvenc (needs NASM + CMake for aws-lc-rs; libclang for any ffmpeg-using client). Default build (no feature) uses the openh264 software encoder.
  4. Run in the interactive session (not a Session-0 service / not over SSH — SendInput + DXGI Desktop Duplication need a desktop): serve --native or m3-host --source virtual. Set PUNKTFUNK_ENCODER=nvenc to select NVENC (the DXGI capturer switches to zero-copy D3D11 output to match). The SudoVDA monitor activates once a real GPU drives WDDM, so capture + NVENC then work.
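The encoder choice in step 4 can be sketched as a small parser over the value of PUNKTFUNK_ENCODER. This is a minimal sketch, not the host's actual code: the `EncoderKind` enum and both function names are hypothetical, and the fallback to openh264 for unset or unrecognised values is an assumption based on the default build using the software encoder.

```rust
/// Which encoder backend to construct.
/// Hypothetical type for illustration; the host's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderKind {
    /// Software encoder, the default build's choice.
    OpenH264,
    /// Hardware encoder, only useful with `--features nvenc` and a real GPU.
    Nvenc,
}

/// Interpret the value of `PUNKTFUNK_ENCODER`.
/// Assumption: unset or unrecognised values fall back to the software encoder.
pub fn encoder_from_value(value: Option<&str>) -> EncoderKind {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("nvenc") => EncoderKind::Nvenc,
        _ => EncoderKind::OpenH264,
    }
}

/// Read the selection from the process environment.
pub fn encoder_from_env() -> EncoderKind {
    encoder_from_value(std::env::var("PUNKTFUNK_ENCODER").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the selection testable without mutating process state.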

Dev loop (this repo → the Windows VM)

ssh "Enrico Bühler"@192.168.1.57 (PowerShell shell). Repo cloned at C:\Users\Public\punktfunk (Gitea). Sync uncommitted files with sftp (sftp -b - host, /C:/... paths — scp and base64-over-ssh are unreliable here). Commit on Linux → git reset --hard origin/main on the VM. Build env: PATH += cargo bin + NASM + CMake + LLVM (vcvars not needed — rustc/cc self-locate MSVC).

Decisions (locked 2026-06-14)

| Decision | Choice | Rationale |
| --- | --- | --- |
| Build order | Host first | User preference. (Note: the research recommended client first, since the client is unblocked by the no-GPU problem and becomes the host's test endpoint; see "No-GPU dev strategy". Revisit if host progress stalls on GPU-gated steps.) |
| Virtual display | SudoVDA | Arbitrary modes on the fly (no baked EDID / registry mode list, unlike parsec-vdd), MIT/CC0 (bundleable), already installed on the dev box, proven by Apollo. |
| Client UI | Pure Rust: windows-rs + Windows Reactor (WinUI 3) | No C++/C#. Links punktfunk-core directly as a crate (like the GTK Linux client: no C ABI, no GC/FFI-lifetime hazard). Built-in SwapChainPanel widget for the video surface; Custom escape hatch + raw Microsoft.UI.Xaml as fallback. |
| Client decode | FFmpeg + D3D11VA | Exactly what Moonlight ships; feeds AnnexB H.264/HEVC/AV1 directly, decodes AV1 via the GPU DXVA profile with no Store Video Extension. Cost: ffmpeg dep + libclang. |
| Host SW encode (no-GPU dev) | openh264 | BSD, no system ffmpeg, low-latency single-ref/zero-lookahead with intra-refresh. Lets the full capture→encode→FEC→send pipeline run GPU-less. |
| Host HW encode | nvidia-video-codec-sdk (D3D11) | NV_ENC_DEVICE_TYPE_DIRECTX + NvEncRegisterResource on the captured ID3D11Texture2D = true zero-copy, no CUDA bridge. Young crate: vendor + wrap behind the Encoder trait. Defers to a real-GPU box. |

Dev box (ssh "Enrico Bühler"@192.168.1.57)

Windows 11 Pro 25H2 (build 26200), QEMU Q35, 8 vCPU, 12 GB. No working GPU (an RTX 5070 Ti node is present but Status: Unknown; nvidia-smi fails → NVENC cannot initialize). Installed: Rust 1.96 (MSVC), Visual Studio Community 2026 + VC tools + Windows SDK 10.0.26100/28000, Windows App Runtime 2.2 (Reactor needs ≥2.0.1), SudoVDA (ROOT\DISPLAY\0000, hwid root\sudomaker\sudovda, INF oem6.inf, Status OK) and Parsec VDD, git, winget. Toolchain gaps to fill (see Step 0): NASM, CMake, libclang.

Reused as-is (~95% of the codebase — no changes)

| Reusable | Why |
| --- | --- |
| punktfunk-core (protocol, FEC, crypto, session, transport, QUIC control plane, C ABI) | Zero platform deps; already compiles on Windows MSVC |
| GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet) except the capture/encode/audio backends | pure protocol |
| Management REST API (mgmt.rs) + OpenAPI, native_pairing, discovery | axum/tokio/quinn; portable |
| m3.rs / m0.rs / pipeline.rs orchestration | trait-generic: call capturer.next_frame(), encoder.submit/poll(), vd.create(mode); no changes |
| The trait boundaries: Capturer, Encoder, VirtualDisplay, InputInjector, AudioCapturer, VirtualMic | platform-neutral; Linux deps already isolated under [target.'cfg(target_os="linux")'.dependencies] |

Step 0 — make punktfunk-host compile on x86_64-pc-windows-msvc DONE (2026-06-14)

Result: the full dependency tree builds clean on MSVC (aws-lc-rs with NASM+CMake, quinn, rusty_enet, axum/hyper/utoipa), and punktfunk-host compiles and runs (the openapi subcommand emits the spec). Only 3 cfg-gates were needed, because the host was already ~95% portable: in main.rs, mod dmabuf_fence / mod drm_sync → #[cfg(target_os = "linux")]; in vdisplay.rs, the use std::os::fd::OwnedFd import + VirtualOutput.remote_fd field → #[cfg(target_os = "linux")]. Verified green on Linux too. Build env on the VM: rustc+cc/cmake self-locate MSVC (vcvars not needed); PATH must include cargo bin + NASM + CMake + LLVM.

The host already compiles on macOS (Linux backends are cfg-gated; heavy Linux deps are target-gated). Getting to Windows MSVC is the unix-but-not-linux delta, not a from-scratch port:

  1. Toolchain: winget install NASM.NASM Kitware.CMake LLVM.LLVM, set LIBCLANG_PATH (or tick VS "C++ Clang tools"). NASM+CMake are for aws-lc-rs (pulled by rustls/rcgen on the quic path); libclang is for ffmpeg-sys/bindgen (client decode + any host bindgen crate).
  2. std::os::fd / libc: vdisplay.rs:18 has an unconditional use std::os::fd::OwnedFd; and VirtualOutput.remote_fd: Option<OwnedFd>. std::os::fd is cfg(unix), so it builds on macOS but breaks on Windows. Gate the import + field (#[cfg(unix)], with a Windows arm or omission). Sweep for other cfg(target_os="linux")-missing unix-isms (libc, fds).
  3. Build natively on the VM (cargo build -p punktfunk-host), not cross-compile: xwin chokes on aws-lc-rs/ffmpeg-sys/WDK. Triage the remaining errors. Suspect deps to verify link on MSVC: aws-lc-rs (needs NASM+CMake), rusty_enet, the hyper/axum/utoipa stack (expected fine).
  4. CI: add a cargo build -p punktfunk-host --target x86_64-pc-windows-msvc job so the Windows path stops bit-rotting (the dev box can be a Gitea runner later).

This is the highest-value first move and is fully doable GPU-less.
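The import-and-field gating described in item 2 can be sketched as follows. This is a simplified stand-in, not the real struct: only `remote_fd` and the PipeWire `node_id` are named in this document, the `without_fd` constructor is hypothetical, and the sketch uses `cfg(unix)` where the landed change used `cfg(target_os = "linux")`.

```rust
// std::os::fd only exists on Unix targets, so the import is gated too.
#[cfg(unix)]
use std::os::fd::OwnedFd;

/// Simplified stand-in for the host's `VirtualOutput`.
pub struct VirtualOutput {
    /// PipeWire node id on Linux.
    pub node_id: u32,
    /// File descriptor handed over by the compositor; the field does not
    /// exist at all on Windows.
    #[cfg(unix)]
    pub remote_fd: Option<OwnedFd>,
}

impl VirtualOutput {
    /// Hypothetical constructor: builds an output with no fd attached and
    /// compiles unchanged on every target, because the field initialiser
    /// carries the same cfg attribute as the field.
    pub fn without_fd(node_id: u32) -> Self {
        Self {
            node_id,
            #[cfg(unix)]
            remote_fd: None,
        }
    }
}
```

Gating the field rather than wrapping it in a portable placeholder type keeps the Linux code path byte-for-byte unchanged, at the cost of a cfg attribute at every construction site.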

Windows backends (new #[cfg(target_os = "windows")] code behind existing traits)

| Subsystem | Linux today | Windows backend | VM-testable? |
| --- | --- | --- | --- |
| VirtualDisplay | KWin/gamescope/Mutter/Sway | SudoVDA IOCTLs (below) + SetDisplayConfig mode-set | likely (WARP); spike |
| Capture | PipeWire/dmabuf | DXGI Desktop Duplication primary, WGC fallback → ID3D11Texture2D; add FramePayload::D3d11 | ⚠️ DDA-on-WARP unreliable; WGC-on-WARP unverified; spike |
| Zero-copy | dmabuf→EGL/Vulkan→CUDA | register ID3D11Texture2D with NVENC (NV_ENC_DEVICE_TYPE_DIRECTX); no CUDA bridge | needs real GPU |
| Encode | ffmpeg *_nvenc | openh264 SW (default on VM) + nvidia-video-codec-sdk HW (real GPU); behind PUNKTFUNK_ENCODER | SW / HW |
| Input kbd/mouse | libei / wlr | SendInput with MOUSEEVENTF_VIRTUALDESK absolute mapping onto the virtual desktop rect (skip the VK→evdev table: the client sends Win VKs; use KEYEVENTF_SCANCODE+EXTENDEDKEY) | |
| Gamepad | uinput xpad + FF | ViGEmBus via vigem-client (Xbox360Wired); rumble via request_notification() → XNotification{large,small} | (install driver) |
| Audio capture | PipeWire sink monitor | WASAPI loopback via the wasapi crate (48 kHz stereo f32 → existing Opus) | ⚠️ needs an audio endpoint |
| Virtual mic | PipeWire Audio/Source | virtual audio driver (Virtual-Audio-Driver) or defer | second driver; defer |
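The absolute mouse mapping in the Input row comes down to normalising a pixel position inside the virtual desktop rectangle to the 0..=65535 range that SendInput expects for absolute moves. The sketch below is pure arithmetic with no Win32 calls. `DesktopRect` and `to_absolute` are hypothetical names, and the rounding convention (divide by extent minus one, round to nearest) is one common choice rather than something this plan specifies. In a real backend the rectangle would come from GetSystemMetrics with the SM_*VIRTUALSCREEN metrics.

```rust
/// Bounding rectangle of the virtual desktop in pixels.
/// The origin can be negative when a monitor sits left of or above the primary.
#[derive(Debug, Clone, Copy)]
pub struct DesktopRect {
    pub left: i32,
    pub top: i32,
    pub width: i32,
    pub height: i32,
}

/// Map a pixel position to the 0..=65535 absolute coordinate space.
/// Positions outside the rectangle are clamped to its edges.
pub fn to_absolute(rect: DesktopRect, x: i32, y: i32) -> (i32, i32) {
    fn axis(pos: i32, origin: i32, extent: i32) -> i32 {
        // Last addressable pixel index along this axis (at least 1 to avoid /0).
        let span = i64::from((extent - 1).max(1));
        let offset = (i64::from(pos) - i64::from(origin)).clamp(0, span);
        // Round to nearest instead of truncating.
        ((offset * 65535 + span / 2) / span) as i32
    }
    (
        axis(x, rect.left, rect.width),
        axis(y, rect.top, rect.height),
    )
}
```

For example, with a 3840×1080 desktop whose origin is at x = -1920, the top-left pixel maps to (0, 0) and the bottom-right pixel to (65535, 65535).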

m3.rs/m0.rs/pipeline.rs are unchanged. Note: the Windows capture needs its own capture_virtual_output entry point (the SudoVDA identity is a DXGI adapter LUID + DisplayConfig TargetId → GDI \\.\DisplayN, which doesn't fit the PipeWire node_id: u32 field — carry it inside the keepalive / a Windows-specific seam rather than overloading node_id).
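One possible shape for that Windows-specific seam is an enum that carries the per-platform capture identity instead of a bare `node_id`. Everything here is a hypothetical sketch: `CaptureTarget`, `Luid`, and the field names are illustrative, and the plan leaves open whether the identity lives in the keepalive or elsewhere.

```rust
/// 64-bit adapter identifier, mirroring the Win32 LUID layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Luid {
    pub low_part: u32,
    pub high_part: i32,
}

/// Hypothetical per-platform capture identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CaptureTarget {
    /// Linux: the PipeWire node to capture from.
    PipeWire { node_id: u32 },
    /// Windows: what SudoVDA's ADD returns, plus the GDI name once the
    /// monitor has appeared in the active display paths.
    SudoVda {
        adapter_luid: Luid,
        target_id: u32,
        gdi_name: Option<String>,
    },
}

impl CaptureTarget {
    /// GDI device name if this is a resolved Windows target.
    pub fn gdi_name(&self) -> Option<&str> {
        match self {
            CaptureTarget::SudoVda { gdi_name, .. } => gdi_name.as_deref(),
            CaptureTarget::PipeWire { .. } => None,
        }
    }
}
```

An enum makes it a compile error to hand a PipeWire id to the DXGI capturer or the reverse, which a shared `u32` field cannot guarantee.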

SudoVDA control protocol (the VirtualDisplay backend spec)

Pure Rust via the windows crate (no C lib; Apollo vendors a header-only client under third-party/sudovda/). Reference port pattern: parsec-vdd-rust (SetupAPI/CM_* → CreateFileW → DeviceIoControl). Verify the IOCTL hex with a const fn ctl_code(): CTL_CODE(dev,func,method,access) = (dev<<16)|(access<<14)|(func<<2)|method, with FILE_DEVICE_UNKNOWN=0x22, METHOD_BUFFERED=0, FILE_ANY_ACCESS=0.
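The ctl_code() check can be sketched as a const fn plus compile-time assertions, so a wrong constant fails the build rather than a live IOCTL. The function codes are the ones listed for SudoVDA in this section; the constant and helper names are illustrative assumptions, not the backend's actual identifiers.

```rust
pub const FILE_DEVICE_UNKNOWN: u32 = 0x22;
pub const METHOD_BUFFERED: u32 = 0;
pub const FILE_ANY_ACCESS: u32 = 0;

/// Rust equivalent of the Win32 CTL_CODE macro.
pub const fn ctl_code(device: u32, function: u32, method: u32, access: u32) -> u32 {
    (device << 16) | (access << 14) | (function << 2) | method
}

/// Every SudoVDA IOCTL here uses the same device type, method and access.
const fn sudovda_ioctl(function: u32) -> u32 {
    ctl_code(FILE_DEVICE_UNKNOWN, function, METHOD_BUFFERED, FILE_ANY_ACCESS)
}

pub const IOCTL_ADD: u32 = sudovda_ioctl(0x800);
pub const IOCTL_REMOVE: u32 = sudovda_ioctl(0x801);
pub const IOCTL_SET_RENDER_ADAPTER: u32 = sudovda_ioctl(0x802);
pub const IOCTL_GET_WATCHDOG: u32 = sudovda_ioctl(0x803);
pub const IOCTL_DRIVER_PING: u32 = sudovda_ioctl(0x888);
pub const IOCTL_GET_PROTOCOL_VERSION: u32 = sudovda_ioctl(0x8FF);

// Compile-time checks against the expected hex values.
const _: () = assert!(IOCTL_ADD == 0x0022_2000);
const _: () = assert!(IOCTL_REMOVE == 0x0022_2004);
const _: () = assert!(IOCTL_SET_RENDER_ADAPTER == 0x0022_2008);
const _: () = assert!(IOCTL_GET_WATCHDOG == 0x0022_200C);
const _: () = assert!(IOCTL_DRIVER_PING == 0x0022_2220);
const _: () = assert!(IOCTL_GET_PROTOCOL_VERSION == 0x0022_23FC);
```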

  • Device interface GUID: {E5BCC234-1E0C-418A-A0D4-EF8B7501414D} · HWID: root\sudomaker\sudovda
  • IOCTLs (func → value): ADD 0x800 → 0x00222000, REMOVE 0x801 → 0x00222004, SET_RENDER_ADAPTER 0x802 → 0x00222008, GET_WATCHDOG 0x803 → 0x0022200C, DRIVER_PING 0x888 → 0x00222220, GET_PROTOCOL_VERSION 0x8FF → 0x002223FC.
  • Add (#[repr(C)] exact layout): in { u32 Width; u32 Height; u32 RefreshRate; GUID MonitorGuid; CHAR DeviceName[14]; CHAR SerialNumber[14] } → out { LUID AdapterLuid; u32 TargetId }. The mode is set at create (driver computes timing arithmetically — no EDID seeding). Pick a stable per-client MonitorGuid (Windows persists that monitor's layout; remove is by GUID).
  • Resolve the capture target: the monitor appears asynchronously, so poll QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS), match targetInfo.id == TargetId, then DisplayConfigGetDeviceInfo → viewGdiDeviceName (\\.\DisplayN). Apollo polls 20 ms → ×2 → cap 320 ms. Then point DXGI Desktop Duplication at that output.
  • Keepalive (mandatory): GET_WATCHDOG → { u32 Timeout_s; u32 Countdown } (default 3 s, driver-wide). Run one thread firing DRIVER_PING every Timeout*1000/3 ms (~1 s). Miss it and the driver tears down all virtual displays.
  • Teardown (RAII): Drop → DeviceIoControl(REMOVE, { GUID MonitorGuid }) = the VirtualOutput keepalive drop.
  • Mid-stream Reconfigure: SudoVDA has no in-place mode IOCTL (Apollo only relaunches). Implement punktfunk's Reconfigure as remove+re-add at the new mode (or add-second + migrate capture), and watch the Win11 24H2/25H2 IDD mode-apply regression (post-create ChangeDisplaySettingsEx may not move the desktop to the new mode without a Settings-UI poke — VirtualDrivers #471). The ~90 ms Reconfigure budget needs an isolated spike to confirm on 24H2/25H2.
  • Install / signing: self-signed — ship sudovda.cer, import to Root + TrustedPublisher, create the device node via nefconc.exe (--create-device-node/--install-driver). Installs without test-signing (trusted-publisher). MIT/CC0 → bundleable (Apollo precedent). Already installed on the dev box. Document it as a host prerequisite (like the Linux udev rule).
  • GPU caveat: SudoVDA's Driver.cpp does D3D11CreateDevice(UNKNOWN) on a render adapter with no explicit WARP fallback; on the GPU-less VM Windows binds the Basic Render Driver (WARP), so display compositing should work but NVENC won't. Confirm ADD actually brings a monitor up on the VM in the first spike.
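Three of the protocol details above are plain data and arithmetic, and can be sketched without touching the driver: the ADD buffer layouts, the poll backoff, and the ping interval. The Rust type and function names are illustrative assumptions. The size checks assume natural C alignment with no packing pragma in the driver header, which should be confirmed against the header Apollo vendors.

```rust
use std::mem::size_of;
use std::time::Duration;

/// Win32 GUID layout (16 bytes).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Guid {
    pub data1: u32,
    pub data2: u16,
    pub data3: u16,
    pub data4: [u8; 8],
}

/// Win32 LUID layout (8 bytes).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Luid {
    pub low_part: u32,
    pub high_part: i32,
}

/// Input buffer for ADD, field for field as listed above.
#[repr(C)]
pub struct AddIn {
    pub width: u32,
    pub height: u32,
    pub refresh_rate: u32,
    pub monitor_guid: Guid,
    pub device_name: [u8; 14],
    pub serial_number: [u8; 14],
}

/// Output buffer for ADD.
#[repr(C)]
pub struct AddOut {
    pub adapter_luid: Luid,
    pub target_id: u32,
}

// 12 + 16 + 14 + 14 = 56 bytes and 8 + 4 = 12 bytes, with no padding.
const _: () = assert!(size_of::<AddIn>() == 56);
const _: () = assert!(size_of::<AddOut>() == 12);

/// Delays between QueryDisplayConfig polls: 20 ms, doubling, capped at 320 ms.
pub fn poll_delays() -> impl Iterator<Item = Duration> {
    std::iter::successors(Some(20u64), |ms| Some((ms * 2).min(320)))
        .map(Duration::from_millis)
}

/// DRIVER_PING period derived from the watchdog timeout: Timeout*1000/3 ms.
pub fn ping_interval(timeout_s: u32) -> Duration {
    Duration::from_millis(u64::from(timeout_s) * 1000 / 3)
}
```

Pinging at a third of the timeout leaves room for two missed or delayed pings before the driver tears the displays down.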

No-GPU dev strategy

Buildable + validatable on the VM now: Step 0 (MSVC compile); the SudoVDA backend (add/mode-set/keepalive/remove via WARP — spike to confirm); the openh264 SW encode path fed a CPU BGRA staging copy → real AnnexB → FEC → UDP (the full transport minus HW); SendInput injection + interactive-session/desktop-reattach; ViGEm gamepad + rumble; WASAPI loopback (if an endpoint exists); and the entire client (software decode loopback).

Defers to a real NVIDIA-GPU Windows box: NVENC-D3D11 zero-copy encode; whether the captured ID3D11Texture2D registers with NVENC zero-copy vs needing a CopyResource; the DDA-vs-WGC latency bake-off (DDA-on-WARP is E_NOTIMPL-class); split-encode + bitrate-ceiling probe; and all glass-to-glass / throughput numbers (no perf claim transfers from Linux).

Windows-specific structural issues (no Linux precedent)

  • Interactive session, not a Session-0 service. SendInput can't reach the desktop from Session 0. Run the host in the user's interactive session and replicate Apollo/Sunshine's OpenInputDesktop/SetThreadDesktop re-attach to survive UAC/lock-screen desktop switches. (Driving the UAC secure desktop needs a UIAccess manifest + signing — out of scope; document it.)
  • Clock epoch on the host side. The skew handshake assumes both ends read the same realtime epoch in ns. The Windows host must emit timestamps from GetSystemTimePreciseAsFileTime→Unix-epoch-ns or cross-machine latency numbers + ClockProbe/ClockEcho break.
  • IDD has no audio endpoint. There's nothing to loop back on a headless box unless a real/virtual render device exists → WASAPI loopback needs an endpoint, and the virtual mic (client→host) has no clean user-mode path. Audio is potentially a second driver-install problem; defer the mic.
  • Color/range. All clients assume BT.709 limited-range. A new openh264/NVENC-D3D11 path doing BGRA→I420 must match, or colors wash out — validate against the existing decoders.
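Two of the issues above reduce to small pure functions: converting a FILETIME tick count to Unix-epoch nanoseconds, and converting a BGRA pixel to BT.709 limited-range Y'CbCr. The sketch below assumes the FILETIME value has already been combined into a single u64, uses hypothetical function names, and works per pixel, leaving out the 4:2:0 chroma subsampling an I420 path also needs. Any real path should still be validated against the existing decoders.

```rust
/// 100 ns ticks between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
pub const FILETIME_UNIX_OFFSET_TICKS: u64 = 116_444_736_000_000_000;

/// Convert FILETIME ticks (100 ns units since 1601) to Unix-epoch nanoseconds.
/// Returns None for times before 1970 or on overflow.
pub fn filetime_to_unix_ns(ticks: u64) -> Option<u64> {
    ticks
        .checked_sub(FILETIME_UNIX_OFFSET_TICKS)?
        .checked_mul(100)
}

/// Convert one 8-bit BGRA pixel to BT.709 limited-range [Y', Cb, Cr].
/// Luma lands in 16..=235 and chroma in 16..=240; alpha is ignored.
pub fn bgra_to_bt709_limited(px: [u8; 4]) -> [u8; 3] {
    const KR: f64 = 0.2126;
    const KB: f64 = 0.0722;
    const KG: f64 = 1.0 - KR - KB;

    let b = f64::from(px[0]) / 255.0;
    let g = f64::from(px[1]) / 255.0;
    let r = f64::from(px[2]) / 255.0;

    // Full-range analog values: y in 0..=1, cb and cr in -0.5..=0.5.
    let y = KR * r + KG * g + KB * b;
    let cb = (b - y) / (2.0 * (1.0 - KB));
    let cr = (r - y) / (2.0 * (1.0 - KR));

    // Scale into the limited-range code values and round to nearest.
    let quantize = |v: f64| v.round().clamp(0.0, 255.0) as u8;
    [
        quantize(16.0 + 219.0 * y),
        quantize(128.0 + 224.0 * cb),
        quantize(128.0 + 224.0 * cr),
    ]
}
```

White should come out as Y' = 235 and black as Y' = 16, both with neutral chroma at 128; a full-range conversion would give 255 and 0 instead, which is the washed-out mismatch described above.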

Phased plan (host-first)

  1. Compile on MSVC (Step 0 above). GPU-less. ← start here
  2. SudoVDA VirtualDisplay backend: control path landed (vdisplay/sudovda.rs: add/keepalive/remove + GDI-name resolution + RAII teardown, behind the existing trait; open() returns it on Windows). Compiles + live-tested on the VM. Remaining: monitor activation + \\.\DisplayN resolution (needs a GPU), then SetDisplayConfig mid-stream Reconfigure.
  3. Capture + SW encode — DXGI Desktop Duplication (or WGC) → ID3D11Texture2D → CPU staging → openh264 → existing FEC/transport. First end-to-end Windows session, GPU-less, against the Linux punktfunk-client-rs or the new Windows client.
  4. Input — SendInput (kbd/mouse, VIRTUALDESK mapping) + interactive-session/desktop-reattach.
  5. Gamepad + audio — ViGEm + rumble; WASAPI loopback.
  6. HW encode (real-GPU box): nvidia-video-codec-sdk D3D11 zero-copy; DDA-vs-WGC bake-off; glass-to-glass numbers. Resolve to Xbox-360 pad on Windows (drop DualSense fidelity/virtual-mic to follow-ups, as the host already does for non-Linux).

The Windows client (separate track, pure Rust)

Structurally a sibling of crates/punktfunk-client-linux (GTK4) — same shape, different toolkit:

  • UI: windows-rs + Windows Reactor (WinUI 3) for native chrome. Link punktfunk-core directly (no C ABI). De-risk early: a Reactor window with a SwapChainPanel presenting a test pattern through a flip-model waitable swapchain, before building on it. Fallback if Reactor's immaturity bites (it is about three weeks old): the Custom element + raw windows-rs Microsoft.UI.Xaml.
  • Decode: FFmpeg avcodec_send_packet/receive_frame with the D3D11VA hwaccel → NV12/P010 ID3D11Texture2D. Feeds AnnexB directly (matches host output), decodes AV1 with no Store extension.
  • Present: DXGI flip-model waitable swapchain (FLIP_DISCARD + FRAME_LATENCY_WAITABLE_OBJECT, max latency 1) bound to the SwapChainPanel via ISwapChainPanelNative::SetSwapChain. Not MediaPlayerElement.
  • Input capture: RAWINPUT/WM_INPUT for relative/pointer-lock mouse; Windows.Gaming.Input for gamepads + rumble. Forward via the linked NativeClient (send_input/send_rich_input).
  • Trust: SPAKE2 PIN + TOFU pinning via core; persist the client identity in Windows Credential Manager / DPAPI (the Keychain analog).

Open risks / spikes (do these in isolation, early)

  1. cargo build -p punktfunk-host on the VM — count + triage the real MSVC errors before estimating Step 0. (GPU-less.)
  2. SudoVDA ADD on the VM: done 2026-06-15. The control path is fully validated on the GPU-less VM, both standalone and through the real VirtualDisplay trait (vdisplay/sudovda.rs): device open by GUID, GET_VERSION (0.2.1), GET_WATCHDOG (3 s), ADD 1920×1080@60 → returns adapter LUID + target_id, watchdog ping holds it, RAII Drop → REMOVE. Gap: with no GPU the target does NOT activate into a WDDM display path (QueryDisplayConfig active paths stay 0 → no \\.\DisplayN to resolve/capture). So activation + name-resolution + capture defer to a real GPU (passthrough on the Proxmox VM, or a GPU box), consistent with capture/NVENC deferring anyway.
  3. IDD arbitrary-mode + Reconfigure on 24H2/25H2 — does 5120×1440@240 apply, and does a remove+re-add (or re-modeset) hit the ~90 ms budget without a Settings-UI toggle? Make-or-break for "native client resolution, no scaling".
  4. NVENC-D3D11 zero-copy (real-GPU box) — does the captured texture register as-is, or need a copy? Does nvidia-video-codec-sdk's NV_ENC_DEVICE_TYPE_DIRECTX path work end-to-end? (Expect to vendor/patch.)
  5. DDA vs WGC against the SudoVDA monitor — measure latency/jitter on a real GPU; resolve the primary-capture choice.
  6. Driver redistribution — confirm bundling SudoVDA (.cer + nefcon) + ViGEmBus installers in the punktfunk Windows package; document them as prerequisites.

References