# Windows as a host — feasibility & scoping

**Status: scoped, deferred.** A Windows host is architecturally an *"add a backend"* job, not a parallel port — but it is a **large** implementation effort across five GPU/driver subsystems, and the project's headline feature (a per-client *virtual* output at the client's exact mode) has **no application-level Windows API**: it needs a signed Indirect Display Driver (IDD), which is a UMDF (user-mode) driver built on IddCx rather than something an ordinary process can call. This doc records what it takes so the work can be picked up deliberately later. (Grounded in a 4-agent read of the host crate, 2026-06-10.)

## What's already done for us

punktfunk is cleanly layered. **~95% of the codebase is platform-agnostic and can be reused verbatim:**

| Reusable as-is | Why |
|---|---|
| `punktfunk-core` (protocol, FEC, crypto, session, transport, **C ABI**) | Zero platform deps — no `cfg(linux)` anywhere; the C ABI is already cross-platform |
| QUIC control plane (`quic.rs`, pairing, mode negotiation) | quinn + tokio are portable |
| GameStream P1.1 (mDNS, serverinfo, pairing, RTSP, ENet) — *except* `stream.rs`/`audio.rs` | pure wire logic |
| Management REST API (`mgmt.rs`) + OpenAPI | axum/tokio, portable |
| Pipeline + `m3.rs` orchestration | trait-generic — calls `capturer.next_frame()`, `encoder.submit/poll()`; **needs zero changes** |
| The **trait boundaries** themselves: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral signatures; Linux deps are already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |

So a Windows host is **new `#[cfg(target_os = "windows")]` backend modules behind the existing traits** — the per-frame path, protocol, and control plane don't move. No architectural refactor is required; the boundaries are already in the right places.

## What a Windows host needs (new code)

Each row is a Linux backend that needs a Windows sibling. Effort is the *implementation* effort; all reuse the existing trait.


| Subsystem | Linux today | Windows equivalent | Effort | Notes |
|---|---|---|---|---|
| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly |
| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **Indirect Display Driver (IDD)**, a UMDF (user-mode) driver on IddCx | **XL** | ⚠️ **the blocker**: no application-level API; C++ driver + **code signing** (test-sign or WHQL). Fallback: capture an existing monitor (loses the native-resolution feature) or a borderless window |
| **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC (SDK direct, or via ffmpeg with a D3D11VA hwdevice, `AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
| **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this |
| **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (`INPUT` structs; supersedes the legacy `keybd_event`/`mouse_event`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) |
| **Input (gamepads)** | uinput X-Box-360 pad + FF rumble | **ViGEm** (Virtual Gamepad Emulation) + HID reports | M | rumble back-channel maps to ViGEm notifications |
| **Audio capture** | PipeWire sink-monitor | **WASAPI loopback** (`IAudioCaptureClient`) | S–M | also produces interleaved f32 — same `AudioCapturer` contract |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
| **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do |

**Rough total: ~2,000–4,000 LOC of new Rust** (+ a C++ IDD driver if
the virtual-display feature is kept), spread over capture/encode/vdisplay/input/audio. Every reader rated the overall effort **large**; the input+audio layer alone is *medium*.

## Recommended phasing (when picked up)

1. **Phase 0 — "basic Windows host" (no virtual display).** Capture an *existing* monitor (DXGI Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt. It loses the per-client native-resolution output but gets to a working Windows host quickly.
2. **Phase 1 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC zero-copy.
3. **Phase 2 — the virtual display (IDD).** The XL piece: a signed Indirect Display Driver that surfaces a client-sized monitor, captured via DXGI. This restores punktfunk's differentiator on Windows. Gated on solving driver signing/distribution.

## Why it's deferred (not started now)

- It's **large**, and the virtual-display blocker (IDD) is a C++ driver + signing problem outside Rust, which cannot be absorbed as a side effort.
- None of it is **buildable or testable on the Linux dev box** — it would be unvalidated code.

The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are the down payment. Start at **Phase 0** for the fastest path to a working Windows host.
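
As a rough illustration of the "backend modules behind the existing traits" shape this doc relies on, here is a minimal sketch. It is not the host crate's code: the method names `next_frame`, `submit`, and `poll` come from the table above, but every signature, the `Frame` layout, `PlatformCapturer`, `NullEncoder`, `make_capturer`, and `run_pipeline` are hypothetical stand-ins. The Windows module makes no DXGI or Win32 calls; both backends emit blank frames so the sketch compiles and runs on any OS.

```rust
use std::collections::VecDeque;

/// Hypothetical frame type: tightly packed BGRA pixels.
pub struct Frame {
    pub width: u32,
    pub height: u32,
    pub bgra: Vec<u8>,
}

/// Simplified stand-in for the `Capturer` trait (signature assumed).
pub trait Capturer {
    /// Returns the next frame, or `None` once the source is exhausted.
    fn next_frame(&mut self) -> Option<Frame>;
}

/// Simplified stand-in for the `Encoder` trait (signatures assumed).
pub trait Encoder {
    fn submit(&mut self, frame: &Frame);
    /// Returns one encoded access unit if output is ready.
    fn poll(&mut self) -> Option<Vec<u8>>;
}

fn blank_frame(width: u32, height: u32) -> Frame {
    let len = width as usize * height as usize * 4;
    Frame { width, height, bgra: vec![0; len] }
}

#[cfg(target_os = "windows")]
mod backend {
    use super::{blank_frame, Capturer, Frame};

    /// Placeholder for where a DXGI Desktop Duplication capturer would live.
    pub struct PlatformCapturer { remaining: u32 }

    impl PlatformCapturer {
        pub fn new(frames: u32) -> Self { Self { remaining: frames } }
    }

    impl Capturer for PlatformCapturer {
        fn next_frame(&mut self) -> Option<Frame> {
            if self.remaining == 0 { return None; }
            self.remaining -= 1;
            Some(blank_frame(64, 36))
        }
    }
}

#[cfg(not(target_os = "windows"))]
mod backend {
    use super::{blank_frame, Capturer, Frame};

    /// Placeholder for the existing non-Windows capture backend.
    pub struct PlatformCapturer { remaining: u32 }

    impl PlatformCapturer {
        pub fn new(frames: u32) -> Self { Self { remaining: frames } }
    }

    impl Capturer for PlatformCapturer {
        fn next_frame(&mut self) -> Option<Frame> {
            if self.remaining == 0 { return None; }
            self.remaining -= 1;
            Some(blank_frame(64, 36))
        }
    }
}

/// Picks whichever backend the target OS compiled in.
pub fn make_capturer(frames: u32) -> Box<dyn Capturer> {
    Box::new(backend::PlatformCapturer::new(frames))
}

/// Fake encoder: each submitted frame yields one 8-byte "access unit".
#[derive(Default)]
pub struct NullEncoder { pending: VecDeque<Vec<u8>> }

impl Encoder for NullEncoder {
    fn submit(&mut self, frame: &Frame) {
        let mut au = frame.width.to_le_bytes().to_vec();
        au.extend_from_slice(&frame.height.to_le_bytes());
        self.pending.push_back(au);
    }
    fn poll(&mut self) -> Option<Vec<u8>> { self.pending.pop_front() }
}

/// Platform-agnostic loop: it only sees the traits, never the backend.
pub fn run_pipeline(capturer: &mut dyn Capturer, encoder: &mut dyn Encoder) -> usize {
    let mut units = 0;
    while let Some(frame) = capturer.next_frame() {
        encoder.submit(&frame);
        while encoder.poll().is_some() { units += 1; }
    }
    units
}
```

Calling `run_pipeline(make_capturer(3).as_mut(), &mut NullEncoder::default())` drives the same loop regardless of which `backend` module was compiled in, which is the property Phase 0 depends on: only the module body behind the `cfg` gate would change.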