punktfunk/docs/windows-host.md
enricobuehler e07e359b6d
docs: scope Windows-as-host (deferred) + update roadmap status
A 4-agent read of the host crate: a Windows host is an "add a backend" job, not a parallel
port — ~95% reuse (core/protocol/FEC/crypto/C-ABI, QUIC, GameStream, mgmt, m3/pipeline are all
platform-agnostic and already cfg-isolated). New cfg(windows) backends behind the existing
traits: DXGI Desktop Duplication (capture), Media Foundation / NVENC-SDK (encode), SendInput +
ViGEm (input), WASAPI loopback + virtual mic (audio). The blocker is the virtual-display
feature — no user-mode Windows API; it needs a signed kernel-mode IDD driver (XL).

docs/windows-host.md records the per-subsystem effort + a phased plan (Phase 0 = a "basic
Windows host" capturing an existing monitor, smallest surface). Deferred: large and unbuildable
on the Linux dev box, per the request to only take it on if manageable. roadmap.md marks
#1/#2/#4 done, #3 packaged, and adds #7 Windows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 22:29:01 +00:00


Windows as a host — feasibility & scoping

Status: scoped, deferred. A Windows host is architecturally an "add a backend" job, not a parallel port, but it is a large implementation effort across five GPU/driver subsystems. The project's headline feature (a per-client virtual output at the client's exact mode) has no application-level Windows API: it needs a signed Indirect Display Driver (IDD), which is a UMDF driver built on IddCx rather than ordinary application code. This doc records what it takes so the work can be picked up deliberately later.

(Grounded in a 4-agent read of the host crate, 2026-06-10.)

What's already done for us

punktfunk is cleanly layered. ~95% of the codebase is platform-agnostic and can be reused verbatim:

| Reusable as-is | Why |
| --- | --- |
| punktfunk-core (protocol, FEC, crypto, session, transport, C ABI) | Zero platform deps: no cfg(linux) anywhere; the C ABI is already cross-platform |
| QUIC control plane (quic.rs, pairing, mode negotiation) | quinn + tokio are portable |
| GameStream P1.1 (mDNS, serverinfo, pairing, RTSP, ENet), except stream.rs/audio.rs | pure wire logic |
| Management REST API (mgmt.rs) + OpenAPI | axum/tokio, portable |
| Pipeline + m3.rs orchestration | trait-generic: calls capturer.next_frame(), encoder.submit/poll(); needs zero changes |
| The trait boundaries themselves: Capturer, Encoder, VirtualDisplay, InputInjector, AudioCapturer, VirtualMic | platform-neutral signatures; Linux deps are already isolated under [target.'cfg(target_os="linux")'.dependencies] |

So a Windows host is new #[cfg(target_os = "windows")] backend modules behind the existing traits — the per-frame path, protocol, and control plane don't move. No architectural refactor is required; the boundaries are already in the right places.
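
As a rough illustration of what "a new backend module behind the existing trait" could look like, here is a minimal sketch. The `Frame` layout, the `backend_name` method, the stub types and the `open_capturer`/`pump` helpers are assumptions made for the example; only the `Capturer` name and its `next_frame()` call come from this doc, and the real signatures in the host crate may differ.

```rust
/// One captured frame. The layout here is illustrative only.
pub struct Frame {
    pub width: u32,
    pub height: u32,
    pub bgra: Vec<u8>,
}

/// Hypothetical stand-in for the host crate's `Capturer` trait.
pub trait Capturer {
    fn backend_name(&self) -> &'static str;
    fn next_frame(&mut self) -> Option<Frame>;
}

/// Shared by the stub backends below: a blank 2x2 BGRA frame.
#[allow(dead_code)]
fn placeholder_frame() -> Frame {
    Frame { width: 2, height: 2, bgra: vec![0; 2 * 2 * 4] }
}

#[cfg(target_os = "linux")]
mod backend {
    use super::{placeholder_frame, Capturer, Frame};

    /// Stub standing in for the PipeWire capturer.
    pub struct PipeWireCapturer;

    impl Capturer for PipeWireCapturer {
        fn backend_name(&self) -> &'static str { "pipewire" }
        fn next_frame(&mut self) -> Option<Frame> { Some(placeholder_frame()) }
    }

    pub fn open() -> Box<dyn Capturer> { Box::new(PipeWireCapturer) }
}

#[cfg(target_os = "windows")]
mod backend {
    use super::{placeholder_frame, Capturer, Frame};

    /// Stub where a DXGI Desktop Duplication capturer would live.
    pub struct DxgiCapturer;

    impl Capturer for DxgiCapturer {
        fn backend_name(&self) -> &'static str { "dxgi-desktop-duplication" }
        fn next_frame(&mut self) -> Option<Frame> { Some(placeholder_frame()) }
    }

    pub fn open() -> Box<dyn Capturer> { Box::new(DxgiCapturer) }
}

#[cfg(not(any(target_os = "linux", target_os = "windows")))]
mod backend {
    use super::{Capturer, Frame};

    /// No capture backend on this platform: never yields a frame.
    pub struct NullCapturer;

    impl Capturer for NullCapturer {
        fn backend_name(&self) -> &'static str { "none" }
        fn next_frame(&mut self) -> Option<Frame> { None }
    }

    pub fn open() -> Box<dyn Capturer> { Box::new(NullCapturer) }
}

/// Picks whichever backend was compiled in for the target OS.
pub fn open_capturer() -> Box<dyn Capturer> {
    backend::open()
}

/// Platform-agnostic pipeline code: it only ever sees the trait.
/// Returns how many frames were delivered before the source ran dry.
pub fn pump(capturer: &mut dyn Capturer, max_frames: usize) -> usize {
    let mut delivered = 0;
    while delivered < max_frames {
        match capturer.next_frame() {
            Some(frame) => {
                debug_assert_eq!(frame.bgra.len() as u32, frame.width * frame.height * 4);
                delivered += 1;
            }
            None => break,
        }
    }
    delivered
}
```

The point of the shape is that `pump` compiles unchanged on every target; only the `backend` module that gets selected differs.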

What a Windows host needs (new code)

Each row is a Linux backend that needs a Windows sibling. Effort is the estimated implementation effort (S/M/L/XL); every row reuses the existing trait.

| Subsystem | Linux today | Windows equivalent | Effort | Notes |
| --- | --- | --- | --- | --- |
| Capture | xdg ScreenCast portal → PipeWire (dmabuf) | DXGI Desktop Duplication (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU B8G8R8A8 texture directly |
| Virtual display | KWin/Mutter/Sway/gamescope protocols | Indirect Display Driver (IDD): a UMDF driver built on IddCx | XL | ⚠️ the blocker: no application-level API; C++ driver + code signing (test-sign or WHQL). Fallback: capture an existing monitor (loses the native-resolution feature) or a borderless window |
| Encode | ffmpeg-next NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, or NVENC SDK direct with a D3D11 device context (AVD3D11VADeviceContext) | M–L | encode.rs AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
| Zero-copy bridge | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / cudaImportExternalMemory + D3D12 fence) | M | optional: a portable CPU-copy path already exists, so v1 can skip this |
| Input (ptr/kbd) | libei (RemoteDesktop portal) / wlr protocols | SendInput (supersedes the legacy keybd_event/mouse_event) | S | the VK→evdev table just becomes VK→VIRTUAL_KEY (already Win32-native) |
| Input (gamepads) | uinput X-Box-360 pad + FF rumble | ViGEm (Virtual Gamepad Emulation) + HID reports | M | rumble back-channel maps to ViGEm notifications |
| Audio capture | PipeWire sink-monitor | WASAPI loopback (IAudioCaptureClient) | S–M | also produces interleaved f32, so the same AudioCapturer contract holds |
| Virtual mic | PipeWire Audio/Source | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
| sendmmsg batching | | gamestream/stream.rs already has a cfg(not(linux)) per-packet fallback | | nothing to do |
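
To make the "S" rating on the pointer/keyboard row concrete, the sketch below contrasts the two translations. It assumes the wire format carries Windows virtual-key codes (which is what a VK→evdev table implies). The function names are hypothetical and the Linux table is cut down to four keys for illustration.

```rust
/// Linux side: translate a Windows virtual-key code to an evdev key code.
/// Only a handful of keys are listed; a real table covers the full keyboard.
pub fn vk_to_evdev(vk: u16) -> Option<u16> {
    match vk {
        0x0D => Some(28), // VK_RETURN -> KEY_ENTER
        0x1B => Some(1),  // VK_ESCAPE -> KEY_ESC
        0x20 => Some(57), // VK_SPACE  -> KEY_SPACE
        0x41 => Some(30), // 'A'       -> KEY_A
        _ => None,
    }
}

/// Windows side: the wire value is already a virtual-key code, so the
/// "translation" is a range check followed by a pass-through.
/// Valid virtual-key codes are 1..=254.
pub fn vk_to_win32(vk: u16) -> Option<u16> {
    (0x01..=0xFE).contains(&vk).then_some(vk)
}
```

On Windows the returned value would go into the `wVk` field of a `KEYBDINPUT` passed to `SendInput`; that call is left out here so the sketch builds on any platform.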

Rough total: ~2,000–4,000 LOC of new Rust (plus a C++ IDD driver if the virtual-display feature is kept), spread over capture/encode/vdisplay/input/audio. Every reader rated the overall effort large; the input+audio layer alone is medium.
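
For the sendmmsg row, the per-packet fallback amounts to a plain loop over the batch. The sketch below is not the code in gamestream/stream.rs; the function name and signature are assumptions, and it uses only `std::net::UdpSocket` so it runs on any platform.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Portable fallback for batched sends: one `send_to` call per datagram.
/// On Linux a batched path could hand the whole slice to sendmmsg(2)
/// instead; everywhere else this loop is the simple, correct option.
/// Returns the number of datagrams sent, or the first I/O error.
pub fn send_per_packet(
    sock: &UdpSocket,
    dest: SocketAddr,
    packets: &[Vec<u8>],
) -> io::Result<usize> {
    let mut sent = 0;
    for packet in packets {
        sock.send_to(packet, dest)?;
        sent += 1;
    }
    Ok(sent)
}
```

The trade-off is one syscall per packet rather than one per batch, which costs some CPU at high packet rates but needs no platform-specific code.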

  1. Phase 0 — "basic Windows host" (no virtual display). Capture an existing monitor (DXGI Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt. It loses the per-client native-resolution output but is a working Windows host quickly.
  2. Phase 1 — input + audio parity. ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC zero-copy.
  3. Phase 2 — the virtual display (IDD). The XL piece: a signed Indirect Display Driver that surfaces a client-sized monitor, captured via DXGI. This restores punktfunk's differentiator on Windows. Gated on solving driver signing/distribution.
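
The difference between Phase 0 and Phase 2 can be pictured as a single decision about where the captured surface comes from. The sketch below is purely illustrative: `Mode`, `DisplaySource` and `choose_display_source` are hypothetical names, not types from the host crate.

```rust
/// A display mode as negotiated with the client (illustrative fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Where the host captures from.
#[derive(Debug, PartialEq, Eq)]
pub enum DisplaySource {
    /// Phase 2: a driver-backed virtual monitor at the client's exact mode.
    Virtual(Mode),
    /// Phase 0: an already-attached monitor; the stream inherits its mode.
    ExistingMonitor(Mode),
}

/// Prefer a virtual display when the driver is installed, otherwise fall
/// back to capturing the primary monitor as-is.
pub fn choose_display_source(
    client: Mode,
    primary: Mode,
    idd_available: bool,
) -> DisplaySource {
    if idd_available {
        DisplaySource::Virtual(client)
    } else {
        DisplaySource::ExistingMonitor(primary)
    }
}
```

With this shape, Phase 0 is the `idd_available == false` branch, and Phase 2 only has to make the other branch reachable; the capture and encode stages downstream would not need to change.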

Why it's deferred (not started now)

  • It's large, and the virtual-display blocker (IDD) is a driver-development + signing problem outside Rust, so it is not "somewhat manageable" as a side effort.
  • None of it is buildable or testable on the Linux dev box — it would be unvalidated code.

The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are the down payment. Start at Phase 0 for the fastest path to a working Windows host.