diff --git a/CLAUDE.md b/CLAUDE.md index d00ea73..02f987a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -72,7 +72,11 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc NVENC SDK wrapper (libavcodec only emits whole AUs) — the next big latency lever (~2–4 ms at high res). 3. **punktfunk/1 protocol growth**: concurrent sessions (today: one at a time, extras wait - in the accept queue); mgmt REST endpoints for the punktfunk/1 paired-client list. + in the accept queue). **Done:** unified host (`serve --native` runs GameStream + the + punktfunk/1 QUIC host in one process) with native pairing driven over the mgmt API / + web console (`mod native_pairing`: arm-on-demand → display PIN, paired-device list). Next + (see roadmap): **mandatory PIN pairing by default** (TOFU-without-pairing is insecure on a + LAN) + **delegated pairing approval** (an already-paired device approves a new one). 4. **M2 polish**: HDR/10-bit (needs HDR capture + metadata plumbing; `av1_nvenc -highbitdepth 1` already encodes Main10 from 8-bit input on this box), reconnect-at-new-mode robustness. 
AV1 negotiation and surround audio are implemented @@ -108,9 +112,11 @@ crates/punktfunk-host/ gamestream/ Moonlight compat: nvhttp · pairing · rtsp · control · stream · gamepad · apps vdisplay/{kwin,gamescope,mutter,wlroots}.rs per-compositor client-sized virtual outputs zerocopy/{egl,cuda,vulkan}.rs dmabuf → CUDA → NVENC (tiled via EGL/GL, LINEAR via Vulkan) - inject/{libei,wlr,gamepad}.rs input backends (+ uinput virtual gamepads) - capture.rs · encode.rs · audio.rs · m0.rs · m3.rs · mgmt.rs + inject/{libei,wlr,gamepad,dualsense}.rs input backends (uinput xpad + UHID DualSense) + capture.rs · encode.rs · audio.rs · m0.rs · m3.rs · mgmt.rs · native_pairing.rs crates/punktfunk-client-rs/ punktfunk/1 reference client (M3 headless; M4 adds decode+present) +web/ TanStack web console over the mgmt API (status · devices · pairing) +packaging/ Fedora/Bazzite RPM · bootc · COPR (packaging/bazzite/README.md) tools/{loss-harness,latency-probe}/ measurement (plan §10) scripts/ 60-punktfunk.rules · punktfunk-host.service · host.env.example · headless/ include/punktfunk_core.h generated C header diff --git a/README.md b/README.md index 770cdc6..a852434 100644 --- a/README.md +++ b/README.md @@ -12,34 +12,36 @@ negotiated extension. 
See [`docs/implementation-plan.md`](docs/implementation-pl | Milestone | State | |-----------|-------| -| **M1 — `punktfunk-core` + C ABI** | ✅ done & tested (FEC, packetization, crypto, session, `punktfunk_core.h`) | -| **M0 — pipeline spike** (wlroots→PipeWire→NVENC→file→`punktfunk-core`) | ✅ done & verified on NVIDIA (RTX 5070 Ti / driver 595) | -| M2 — P1 host → stock Moonlight | 🟡 capture+encode landed in M0; pairing/RTSP/vdisplay pending | -| M3 — measurement harness | 🟡 `tools/loss-harness` runs; `latency-probe` scaffolded | -| M4 — P2 transport + Rust client | 🟡 GF(2¹⁶) core done; `punktfunk-client-rs` scaffolded | -| M5 — Apple client | 🟡 macOS first light: HEVC on glass + input over `punktfunk/1` (`clients/apple`) | +| **M1 — `punktfunk-core` + C ABI** | ✅ done & hardened (FEC, packetization, AES-GCM, session, adversarial-review fixes, `punktfunk_core.h`) | +| **M2 — GameStream host → stock Moonlight** | ✅ live end-to-end: pairing, RTSP, audio, per-client virtual output at native res, GPU zero-copy NVENC, gamepads | +| **M3 — `punktfunk/1` native protocol** | ✅ validated live: QUIC control + GF(2¹⁶) FEC/AES data plane, SPAKE2 PIN pairing, mid-stream mode renegotiation | +| **M4 — client decode + present (Apple)** | 🟡 macOS first light: AnnexB→VideoToolbox HEVC on glass + input/pairing over `punktfunk/1` (`clients/apple`); iOS + presenter next | +| **Web console + management API** | ✅ TanStack web console (`web/`) over the OpenAPI mgmt API: host status, paired devices, on-demand native pairing (arm → show PIN) | -`punktfunk-core` is complete and verified: it builds and its full test suite (FEC recovery, -loopback round-trip under loss, property tests, and a **C ABI harness**) passes on -macOS/aarch64. 
**M0 is done:** `punktfunk-host` captures a headless wlroots output via the -ScreenCast portal + PipeWire, encodes it with NVENC, writes a playable H.265 file, and -round-trips every access unit through a `punktfunk_core` host→client session (see -`docs/linux-setup.md`). M2 is in flight: the GameStream control plane (`gamestream/`) and -the management REST API (`mgmt.rs`, OpenAPI spec in `docs/api/`) are implemented; the -remaining Linux host backends (KWin/Mutter virtual displays, libei input) are -`#[cfg(target_os = "linux")]` seams — defined and compiling, implementations pending. +The **GameStream host works with a stock Moonlight client** — validated live on NVIDIA +(RTX 5070 Ti & RTX 4090, driver 595): trust-on-first-use pairing that persists, an app +catalog, RTSP/ENet/audio, and **video at the client's exact resolution and refresh** via a +per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded with GPU +**zero-copy** (dmabuf → CUDA/Vulkan → NVENC) at up to 5120×1440@240. The native +**`punktfunk/1`** protocol adds a QUIC control plane and a GF(2¹⁶) Leopard-FEC + AES-GCM data +plane (p50 ~0.8 ms capture→reassembled at 720p120), with a SPAKE2 PIN pairing ceremony. Both +run from **one process** (`serve --native`), managed through a REST API + web console. Builds +against FFmpeg 7 or 8; deployed live on Bazzite. Full status: [`CLAUDE.md`](CLAUDE.md); +roadmap: [`docs/roadmap.md`](docs/roadmap.md). 
## Layout ``` crates/ - punktfunk-core/ protocol · FEC · pacing · crypto — the C ABI (lib + cdylib + staticlib) - punktfunk-host/ Linux host: vdisplay · capture · encode · inject · gamestream · mgmt - punktfunk-client-rs/ reference client (M4): VAAPI decode + wgpu present -clients/{apple,android}/ native client scaffolds (import punktfunk_core.h) + punktfunk-core/ protocol · FEC · pacing · crypto · quic — the C ABI (lib + cdylib + staticlib) + punktfunk-host/ Linux host: vdisplay · capture · encode · inject · gamestream · m3 · mgmt · native_pairing + punktfunk-client-rs/ punktfunk/1 reference client (M3 headless; M4 adds decode+present) +clients/{apple,android}/ native client scaffolds (import punktfunk_core.h); apple = macOS first light +web/ TanStack web console (host status · paired devices · pairing) over the mgmt API +packaging/ Fedora/Bazzite RPM · bootc image · COPR (see packaging/bazzite/README.md) include/punktfunk_core.h cbindgen-generated C header (checked in) tools/{latency-probe,loss-harness}/ measurement (plan §10) -docs/implementation-plan.md +docs/{implementation-plan,roadmap,windows-host,dualsense-haptics}.md ``` ## Build & test diff --git a/docs/roadmap.md b/docs/roadmap.md index c8e485f..46435a4 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -1,13 +1,19 @@ # punktfunk roadmap — next goals -Decided 2026-06-10 (research-grounded; see commit history). Sequence: -**KDE reliability → client compositor options → mic passthrough → Bazzite COPR RPM (then -bootc) → touch → full UHID DualSense → iOS** (+ Windows host, scoped & deferred). +Decided 2026-06-10 (research-grounded; see commit history), extended since. -**Done (2026-06-10):** #1 KDE reliability (Phase 1 + 2), #2 compositor options (full stack incl. -macOS client), #4 mic passthrough — all on `main`, live-validated. #3 Bazzite packaging written -(`packaging/`); the COPR/bootc build is operator-run. Remaining: #5 touch → UHID DualSense, #6 iOS, -and a Windows host (`docs/windows-host.md`). 
+**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full +stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense** +— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E +live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds +against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced). +**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one +process, with native pairing driven from the **web console** (arm → show PIN), not the service log. +Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`). + +**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the M4 +client presenter + iOS (§6), and a Windows host (§7 — now **de-risked via SudoVDA**, no custom +signed driver needed). ## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)* @@ -111,14 +117,45 @@ select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`. PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin + touch capture (#5) + UI. tvOS later. -## 7. Windows as a host *(scoped & deferred — `docs/windows-host.md`)* +## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)* Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/ crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits: -capture (DXGI Desktop Duplication), encode (Media Foundation / NVENC-SDK with a D3D11 context), -input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic). 
**The blocker** is the -virtual-display feature — no user-mode Windows API; it needs a signed kernel-mode **IDD** driver -(XL). Recommended start: **Phase 0** — a "basic Windows host" capturing an existing monitor (no -virtual display), proving the whole stack with the smallest surface. Deferred because it's large and -unbuildable on the Linux dev box; the trait boundaries are already in the right places. +capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK +with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic). + +**The old blocker is gone.** Rather than author + sign our own IDD for the per-client virtual +display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect +Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay` +backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a +display driver"* (XL). That removes the only hard blocker — the Windows host is now a medium, +mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the +stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because +it's unbuildable on the Linux dev box; the trait boundaries are already in the right places. + +## 8. Pairing & trust hardening *(next)* + +The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the +client) is built and live. Two changes harden it from "works" to "secure by default": + +- **Mandatory PIN pairing by default.** Today the punktfunk/1 host can run open (trust-on-first-use) + — *not* acceptable on a shared LAN, where any reachable device could connect. The unified host + should `require_pairing` out of the box: a client must complete the SPAKE2 PIN ceremony (one online + guess, no offline attack) before any session. 
The operator arms a window and reads the PIN from the + web console (already built); an explicit `--open` escape hatch covers trusted single-user setups. + The wire is already in place (`M3Options.require_pairing` + the `serve_session` gate); this flips + the default and threads it through `serve --native` and the mgmt arm endpoint. +- **Delegated pairing approval** — the ergonomic enabler for "mandatory" (pair a new device without + fetching the host PIN out of band): + 1. Device A is already paired (authenticated) to Host X. + 2. The user tries to connect Device B to Host X. + 3. Host X pushes a request to the authenticated Device A: *"Allow Device B to pair with Host X?"* + 4. The user approves/denies on Device A; on approve, Host X admits Device B — binding B's + certificate fingerprint — with no PIN typed. + + Needs: a host→client *pairing-approval-request* (B's fingerprint + a human label) delivered to A's + live connection (a QUIC side-plane message) or polled via the mgmt API; an approve/deny round-trip + carrying an approval token; the host gating B's admission on it. The web console **and** the Apple + client render the approval prompt. PIN pairing stays the bootstrap (the first device, or when no + paired device is online to approve). diff --git a/docs/windows-host.md b/docs/windows-host.md index 166a812..1a802c5 100644 --- a/docs/windows-host.md +++ b/docs/windows-host.md @@ -1,12 +1,15 @@ # Windows as a host — feasibility & scoping -**Status: scoped, deferred.** A Windows host is architecturally an *"add a backend"* job, not a -parallel port — but it is a **large** implementation effort across five GPU/driver subsystems, and -the project's headline feature (a per-client *virtual* output at the client's exact mode) has **no -user-mode Windows API**: it needs a signed kernel-mode Indirect Display Driver (IDD). This doc -records what it takes so the work can be picked up deliberately later. 
+**Status: scoped, deferred — but de-risked.** A Windows host is architecturally an *"add a backend"* +job, not a parallel port. The one thing that used to make it **large** — the per-client *virtual* +output, which has no user-mode Windows API and seemingly needed a self-authored, signed Indirect +Display Driver (IDD, a UMDF driver) — is **solved by reusing [SudoVDA](https://github.com/VirtualDrivers), the +Sunshine Virtual Display Adapter**: a pre-built, signed IDD that creates virtual displays at +arbitrary `WxH@Hz` on demand. We install it and drive its control interface; **no driver to write or +WHQL-sign.** That turns the headline feature from XL into a medium backend. This doc records what's +left so the work can be picked up deliberately. -(Grounded in a 4-agent read of the host crate, 2026-06-10.) +(Grounded in a 4-agent read of the host crate, 2026-06-10; SudoVDA path added 2026-06-11.) ## What's already done for us @@ -33,7 +36,7 @@ all reuse the existing trait. | Subsystem | Linux today | Windows equivalent | Effort | Notes | |---|---|---|---|---| | **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly | -| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **Indirect Display Driver (IDD)** — kernel UMDF mini-driver | **XL** | ⚠️ **the blocker**: no user-mode API; C++ driver + **code signing** (test-sign or WHQL). Fallback: capture an existing monitor (loses the native-resolution feature) or a borderless window | +| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **SudoVDA** (pre-built signed IDD) — install + drive its control API to add/remove a `WxH@Hz` virtual monitor per session | **M** | ✅ **no longer the blocker**: SudoVDA is the same IDD Sunshine ships, so no driver to author or sign. 
The `VirtualDisplay` backend = enable the adapter, create a monitor at the client's mode, capture it (DXGI), tear it down on session end. Fallback if SudoVDA is absent: capture an existing monitor (loses native-resolution) | | **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC SDK direct with a D3D11 device context (`AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps | | **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this | | **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (`keybd_event`/`mouse_event`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) | @@ -42,9 +45,9 @@ all reuse the existing trait. | **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable | | **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do | -**Rough total: ~2,000–4,000 LOC of new Rust** (+ a C++ IDD driver if the virtual-display feature is -kept), spread over capture/encode/vdisplay/input/audio. Every reader rated the overall effort -**large**; the input+audio layer alone is *medium*. +**Rough total: ~2,000–4,000 LOC of new Rust** (no C++ driver — SudoVDA is reused as-is), spread over +capture/encode/vdisplay/input/audio. With the driver problem solved, the overall effort is now +**medium**; the input+audio layer alone is *small–medium*. ## Recommended phasing (when picked up) @@ -52,17 +55,19 @@ kept), spread over capture/encode/vdisplay/input/audio. 
Every reader rated the o Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt. It loses the per-client native-resolution output but is a working Windows host quickly. -2. **Phase 1 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC +2. **Phase 1 — the virtual display via SudoVDA.** A `VirtualDisplay` backend that enables SudoVDA, + creates a monitor at the client's exact `WxH@Hz`, captures it (DXGI), and tears it down on session + end — restoring punktfunk's headline feature with **no driver authoring or signing**. (Ship/guide + the SudoVDA install as a host prerequisite, like the udev rule on Linux.) +3. **Phase 2 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC zero-copy. -3. **Phase 2 — the virtual display (IDD).** The XL piece: a signed Indirect Display Driver that - surfaces a client-sized monitor, captured via DXGI. This restores punktfunk's differentiator on - Windows. Gated on solving driver signing/distribution. ## Why it's deferred (not started now) -- It's **large**, and the virtual-display blocker (IDD) is a kernel driver + signing problem - outside Rust — not "somewhat manageable" as a side effort. -- None of it is **buildable or testable on the Linux dev box** — it would be unvalidated code. +- The remaining work is **medium** and mechanical, but **none of it is buildable or testable on the + Linux dev box** — it would be unvalidated code until there's a Windows box in the loop. +- SudoVDA removed the hard blocker (authoring and signing our own IDD); what's left is a backend port, picked + up whenever a Windows target is in scope. The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are the down payment. Start at **Phase 0** for the fastest path to a working Windows host.
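
Review note on roadmap §8 in this diff: the admission rule it describes (pairing required by default, an explicit open mode, PIN ceremony as bootstrap, delegated approval by an already-paired device) can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the host's actual code: `Host`, `PairingProof`, `Admission`, `approve` and `admit` are hypothetical names, only the `require_pairing` flag is taken from the diff, and the counter-based token stands in for the unguessable random token a real host would need.

```rust
use std::collections::{HashMap, HashSet};

/// What a connecting client presents (hypothetical type, for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PairingProof {
    None,
    /// The SPAKE2 PIN ceremony already completed successfully.
    PinCeremonyOk,
    /// A token the host issued after a paired device approved this client.
    ApprovalToken(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    /// Session may start; nothing new is stored.
    Admit,
    /// Session may start and the certificate fingerprint is now bound.
    AdmitAndBind,
    Reject,
}

pub struct Host {
    require_pairing: bool,
    /// Certificate fingerprints of paired devices.
    paired: HashSet<String>,
    /// Outstanding approvals: token -> fingerprint it was issued for.
    approvals: HashMap<String, String>,
    next_token: u64,
}

impl Host {
    pub fn new(require_pairing: bool) -> Self {
        Host {
            require_pairing,
            paired: HashSet::new(),
            approvals: HashMap::new(),
            next_token: 0,
        }
    }

    /// Device A (`approver_fp`) approves device B (`new_fp`).
    /// Only an already-paired device may approve.
    pub fn approve(&mut self, approver_fp: &str, new_fp: &str) -> Option<String> {
        if !self.paired.contains(approver_fp) {
            return None;
        }
        // Placeholder token: a real host must use an unguessable random value.
        self.next_token += 1;
        let token = format!("approval-{}", self.next_token);
        self.approvals.insert(token.clone(), new_fp.to_string());
        Some(token)
    }

    /// Gate a session on the client's fingerprint and the proof it presents.
    pub fn admit(&mut self, fp: &str, proof: &PairingProof) -> Admission {
        if self.paired.contains(fp) {
            return Admission::Admit;
        }
        if !self.require_pairing {
            // Open mode: no ceremony needed (TOFU persistence is not modelled).
            return Admission::Admit;
        }
        let ok = match proof {
            PairingProof::PinCeremonyOk => true,
            // Tokens are single-use and valid only for the approved fingerprint.
            PairingProof::ApprovalToken(t) => self.approvals.remove(t).as_deref() == Some(fp),
            PairingProof::None => false,
        };
        if ok {
            self.paired.insert(fp.to_string());
            Admission::AdmitAndBind
        } else {
            Admission::Reject
        }
    }
}
```

In this sketch, `Host::new(true)` models the proposed secure default and `Host::new(false)` the `--open` escape hatch. A token presented with the wrong fingerprint is consumed and rejected, which is one possible design choice; the roadmap text does not specify it.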