docs: refresh README/CLAUDE status; roadmap pairing-hardening + SudoVDA Windows

- README: replace the stale M0/M2-in-flight status with reality — M1 hardened, M2
  GameStream host live to stock Moonlight, M3 punktfunk/1 validated, M4 Apple first
  light, web console + unified host; FFmpeg 7/8; Bazzite-deployed. Layout adds
  web/, packaging/, native_pairing, dualsense.
- CLAUDE: protocol-growth item now reflects the unified host + web-console native
  pairing (done) and flags the next steps; layout updated.
- roadmap §7 Windows: de-risked via SudoVDA (the Sunshine Virtual Display Adapter) —
  no self-signed kernel IDD needed; the virtual-display backend drops XL→M.
- roadmap §8 (new) Pairing & trust hardening: mandatory PIN pairing by default
  (TOFU-open is insecure on a LAN) + delegated pairing approval (an already-paired
  device approves a new one, no out-of-band PIN).
- windows-host.md: SudoVDA path throughout (status, table, phasing, effort M not L).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 09:54:55 +00:00
parent 19666ba57e
commit 12cf2e4e16
4 changed files with 104 additions and 54 deletions
+9 -3
@@ -72,7 +72,11 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc
NVENC SDK wrapper (libavcodec only emits whole AUs) — the next big latency lever (~24 ms
at high res).
3. **punktfunk/1 protocol growth**: concurrent sessions (today: one at a time, extras wait
in the accept queue); mgmt REST endpoints for the punktfunk/1 paired-client list.
in the accept queue). **Done:** unified host (`serve --native` runs GameStream + the
punktfunk/1 QUIC host in one process) with native pairing driven over the mgmt API /
web console (`mod native_pairing`: arm-on-demand → display PIN, paired-device list). Next
(see roadmap): **mandatory PIN pairing by default** (TOFU-without-pairing is insecure on a
LAN) + **delegated pairing approval** (an already-paired device approves a new one).
4. **M2 polish**: HDR/10-bit (needs HDR capture + metadata plumbing; `av1_nvenc
-highbitdepth 1` already encodes Main10 from 8-bit input on this box),
reconnect-at-new-mode robustness. AV1 negotiation and surround audio are implemented
@@ -108,9 +112,11 @@ crates/punktfunk-host/
gamestream/ Moonlight compat: nvhttp · pairing · rtsp · control · stream · gamepad · apps
vdisplay/{kwin,gamescope,mutter,wlroots}.rs per-compositor client-sized virtual outputs
zerocopy/{egl,cuda,vulkan}.rs dmabuf → CUDA → NVENC (tiled via EGL/GL, LINEAR via Vulkan)
inject/{libei,wlr,gamepad}.rs input backends (+ uinput virtual gamepads)
capture.rs · encode.rs · audio.rs · m0.rs · m3.rs · mgmt.rs
inject/{libei,wlr,gamepad,dualsense}.rs input backends (uinput xpad + UHID DualSense)
capture.rs · encode.rs · audio.rs · m0.rs · m3.rs · mgmt.rs · native_pairing.rs
crates/punktfunk-client-rs/ punktfunk/1 reference client (M3 headless; M4 adds decode+present)
web/ TanStack web console over the mgmt API (status · devices · pairing)
packaging/ Fedora/Bazzite RPM · bootc · COPR (packaging/bazzite/README.md)
tools/{loss-harness,latency-probe}/ measurement (plan §10)
scripts/ 60-punktfunk.rules · punktfunk-host.service · host.env.example · headless/
include/punktfunk_core.h generated C header
+22 -20
@@ -12,34 +12,36 @@ negotiated extension. See [`docs/implementation-plan.md`](docs/implementation-pl
| Milestone | State |
|-----------|-------|
| **M1 — `punktfunk-core` + C ABI** | ✅ done & tested (FEC, packetization, crypto, session, `punktfunk_core.h`) |
| **M0 — pipeline spike** (wlroots→PipeWire→NVENC→file→`punktfunk-core`) | ✅ done & verified on NVIDIA (RTX 5070 Ti / driver 595) |
| M2 — P1 host → stock Moonlight | 🟡 capture+encode landed in M0; pairing/RTSP/vdisplay pending |
| M3 — measurement harness | 🟡 `tools/loss-harness` runs; `latency-probe` scaffolded |
| M4 — P2 transport + Rust client | 🟡 GF(2¹⁶) core done; `punktfunk-client-rs` scaffolded |
| M5 — Apple client | 🟡 macOS first light: HEVC on glass + input over `punktfunk/1` (`clients/apple`) |
| **M1 — `punktfunk-core` + C ABI** | ✅ done & hardened (FEC, packetization, AES-GCM, session, adversarial-review fixes, `punktfunk_core.h`) |
| **M2 — GameStream host → stock Moonlight** | ✅ live end-to-end: pairing, RTSP, audio, per-client virtual output at native res, GPU zero-copy NVENC, gamepads |
| **M3 — `punktfunk/1` native protocol** | ✅ validated live: QUIC control + GF(2¹⁶) FEC/AES data plane, SPAKE2 PIN pairing, mid-stream mode renegotiation |
| **M4 — client decode + present (Apple)** | 🟡 macOS first light: AnnexB→VideoToolbox HEVC on glass + input/pairing over `punktfunk/1` (`clients/apple`); iOS + presenter next |
| **Web console + management API** | ✅ TanStack web console (`web/`) over the OpenAPI mgmt API: host status, paired devices, on-demand native pairing (arm → show PIN) |
`punktfunk-core` is complete and verified: it builds and its full test suite (FEC recovery,
loopback round-trip under loss, property tests, and a **C ABI harness**) passes on
macOS/aarch64. **M0 is done:** `punktfunk-host` captures a headless wlroots output via the
ScreenCast portal + PipeWire, encodes it with NVENC, writes a playable H.265 file, and
round-trips every access unit through a `punktfunk_core` host→client session (see
`docs/linux-setup.md`). M2 is in flight: the GameStream control plane (`gamestream/`) and
the management REST API (`mgmt.rs`, OpenAPI spec in `docs/api/`) are implemented; the
remaining Linux host backends (KWin/Mutter virtual displays, libei input) are
`#[cfg(target_os = "linux")]` seams — defined and compiling, implementations pending.
The **GameStream host works with a stock Moonlight client** — validated live on NVIDIA
(RTX 5070 Ti & RTX 4090, driver 595): trust-on-first-use pairing that persists, an app
catalog, RTSP/ENet/audio, and **video at the client's exact resolution and refresh** via a
per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded with GPU
**zero-copy** (dmabuf → CUDA/Vulkan → NVENC) at up to 5120×1440@240. The native
**`punktfunk/1`** protocol adds a QUIC control plane and a GF(2¹⁶) Leopard-FEC + AES-GCM data
plane (p50 ~0.8 ms capture→reassembled at 720p120), with a SPAKE2 PIN pairing ceremony. Both
run from **one process** (`serve --native`), managed through a REST API + web console. Builds
against FFmpeg 7 or 8; deployed live on Bazzite. Full status: [`CLAUDE.md`](CLAUDE.md);
roadmap: [`docs/roadmap.md`](docs/roadmap.md).
## Layout
```
crates/
punktfunk-core/ protocol · FEC · pacing · crypto — the C ABI (lib + cdylib + staticlib)
punktfunk-host/ Linux host: vdisplay · capture · encode · inject · gamestream · mgmt
punktfunk-client-rs/ reference client (M4): VAAPI decode + wgpu present
clients/{apple,android}/ native client scaffolds (import punktfunk_core.h)
punktfunk-core/ protocol · FEC · pacing · crypto · quic — the C ABI (lib + cdylib + staticlib)
punktfunk-host/ Linux host: vdisplay · capture · encode · inject · gamestream · m3 · mgmt · native_pairing
punktfunk-client-rs/ punktfunk/1 reference client (M3 headless; M4 adds decode+present)
clients/{apple,android}/ native client scaffolds (import punktfunk_core.h); apple = macOS first light
web/ TanStack web console (host status · paired devices · pairing) over the mgmt API
packaging/ Fedora/Bazzite RPM · bootc image · COPR (see packaging/bazzite/README.md)
include/punktfunk_core.h cbindgen-generated C header (checked in)
tools/{latency-probe,loss-harness}/ measurement (plan §10)
docs/implementation-plan.md
docs/{implementation-plan,roadmap,windows-host,dualsense-haptics}.md
```
## Build & test
+51 -14
@@ -1,13 +1,19 @@
# punktfunk roadmap — next goals
Decided 2026-06-10 (research-grounded; see commit history). Sequence:
**KDE reliability → client compositor options → mic passthrough → Bazzite COPR RPM (then
bootc) → touch → full UHID DualSense → iOS** (+ Windows host, scoped & deferred).
Decided 2026-06-10 (research-grounded; see commit history), extended since.
**Done (2026-06-10):** #1 KDE reliability (Phase 1 + 2), #2 compositor options (full stack incl.
macOS client), #4 mic passthrough — all on `main`, live-validated. #3 Bazzite packaging written
(`packaging/`); the COPR/bootc build is operator-run. Remaining: #5 touch → UHID DualSense, #6 iOS,
and a Windows host (`docs/windows-host.md`).
**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the M4
client presenter + iOS (§6), and a Windows host (§7 — now **de-risked via SudoVDA**, no custom
signed driver needed).
## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
@@ -111,14 +117,45 @@ select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin
+ touch capture (#5) + UI. tvOS later.
## 7. Windows as a host *(scoped & deferred — `docs/windows-host.md`)*
## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)*
Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/
crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic
and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits:
capture (DXGI Desktop Duplication), encode (Media Foundation / NVENC-SDK with a D3D11 context),
input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic). **The blocker** is the
virtual-display feature — no user-mode Windows API; it needs a signed kernel-mode **IDD** driver
(XL). Recommended start: **Phase 0** — a "basic Windows host" capturing an existing monitor (no
virtual display), proving the whole stack with the smallest surface. Deferred because it's large and
unbuildable on the Linux dev box; the trait boundaries are already in the right places.
capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK
with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic).
**The old blocker is gone.** Rather than author + sign our own kernel IDD for the per-client virtual
display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect
Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay`
backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a
kernel driver"* (XL). That removes the only hard blocker — the Windows host is now a medium,
mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the
stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because
it's unbuildable on the Linux dev box; the trait boundaries are already in the right places.
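The per-session lifecycle described above (create a monitor at the client's mode, capture it, tear it down on session end) can be sketched as an RAII guard. This is a minimal sketch under stated assumptions: `DisplayDriver`, `SessionDisplay`, `Mode` and `MockDriver` are hypothetical names invented for illustration, not the host crate's existing `VirtualDisplay` trait and not SudoVDA's real control interface, which is not described here. The mock stands in for the driver so the shape can be exercised off Windows.

```rust
use std::cell::{Cell, RefCell};

/// Client-requested display mode (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Hypothetical seam over a virtual-display driver's control interface.
/// A real Windows backend would implement this by talking to the driver.
pub trait DisplayDriver {
    fn add_monitor(&self, mode: Mode) -> Result<u32, String>;
    fn remove_monitor(&self, id: u32);
}

/// One virtual monitor owned by one streaming session.
/// Dropping the guard removes the monitor, so teardown also runs on error paths.
pub struct SessionDisplay<'a, D: DisplayDriver> {
    driver: &'a D,
    id: u32,
    pub mode: Mode,
}

impl<'a, D: DisplayDriver> SessionDisplay<'a, D> {
    pub fn create(driver: &'a D, mode: Mode) -> Result<Self, String> {
        let id = driver.add_monitor(mode)?;
        Ok(Self { driver, id, mode })
    }
}

impl<'a, D: DisplayDriver> Drop for SessionDisplay<'a, D> {
    fn drop(&mut self) {
        self.driver.remove_monitor(self.id);
    }
}

/// In-memory stand-in for the driver (assumption: used only for testing the guard).
#[derive(Default)]
pub struct MockDriver {
    active: RefCell<Vec<(u32, Mode)>>,
    next_id: Cell<u32>,
}

impl MockDriver {
    pub fn active_count(&self) -> usize {
        self.active.borrow().len()
    }
}

impl DisplayDriver for MockDriver {
    fn add_monitor(&self, mode: Mode) -> Result<u32, String> {
        if mode.width == 0 || mode.height == 0 || mode.refresh_hz == 0 {
            return Err(format!("invalid mode: {:?}", mode));
        }
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        self.active.borrow_mut().push((id, mode));
        Ok(id)
    }

    fn remove_monitor(&self, id: u32) {
        self.active.borrow_mut().retain(|(existing, _)| *existing != id);
    }
}
```

A session would hold the `SessionDisplay` for as long as it streams. If `create` fails (for example because the driver is not installed), the caller can fall back to capturing an existing monitor, as Phase 0 does.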
## 8. Pairing & trust hardening *(next)*
The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":
- **Mandatory PIN pairing by default.** Today the punktfunk/1 host can run open (trust-on-first-use)
— *not* acceptable on a shared LAN, where any reachable device could connect. The unified host
should `require_pairing` out of the box: a client must complete the SPAKE2 PIN ceremony (one online
guess, no offline attack) before any session. The operator arms a window and reads the PIN from the
web console (already built); an explicit `--open` escape hatch covers trusted single-user setups.
The wire is already in place (`M3Options.require_pairing` + the `serve_session` gate); this flips
the default and threads it through `serve --native` and the mgmt arm endpoint.
- **Delegated pairing approval** — the ergonomic enabler for "mandatory" (pair a new device without
fetching the host PIN out of band):
1. Device A is already paired (authenticated) to Host X.
2. The user tries to connect Device B to Host X.
3. Host X pushes a request to the authenticated Device A: *"Allow Device B to pair with Host X?"*
4. The user approves/denies on Device A; on approve, Host X admits Device B — binding B's
certificate fingerprint — with no PIN typed.
Needs: a host→client *pairing-approval-request* (B's fingerprint + a human label) delivered to A's
live connection (a QUIC side-plane message) or polled via the mgmt API; an approve/deny round-trip
carrying an approval token; the host gating B's admission on it. The web console **and** the Apple
client render the approval prompt. PIN pairing stays the bootstrap (the first device, or when no
paired device is online to approve).
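The two changes above combine into a single admission decision, which can be sketched as follows. This is an illustrative sketch only: `PairingPolicy`, `Admission` and `TrustStore` are hypothetical names, not the existing `M3Options.require_pairing` or `serve_session` code, and the SPAKE2 ceremony, the approval token and the QUIC side-plane message are all left out. It models just the gate: pairing required by default, an explicit open mode, and delegated approval that only a paired, online device can grant.

```rust
use std::collections::HashSet;

/// How the host treats clients it has not paired with (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PairingPolicy {
    /// Secure default: an unknown client must pair before any session.
    Required,
    /// Explicit escape hatch for trusted single-user setups (trust on first use).
    Open,
}

impl Default for PairingPolicy {
    fn default() -> Self {
        PairingPolicy::Required
    }
}

/// Outcome of the admission gate for a connecting client.
#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    /// Start the session.
    Admit,
    /// Push an approval request to a paired device that is online.
    AskPairedDevice,
    /// No approver available: fall back to the PIN ceremony.
    NeedPin,
}

pub struct TrustStore {
    policy: PairingPolicy,
    /// Certificate fingerprints of paired devices.
    paired: HashSet<String>,
    /// Fingerprints of devices with a live connection.
    online: HashSet<String>,
}

impl TrustStore {
    pub fn new(policy: PairingPolicy) -> Self {
        Self { policy, paired: HashSet::new(), online: HashSet::new() }
    }

    /// Record a device after a successful PIN ceremony.
    pub fn pair(&mut self, fingerprint: &str) {
        self.paired.insert(fingerprint.to_owned());
    }

    pub fn set_online(&mut self, fingerprint: &str, online: bool) {
        if online {
            self.online.insert(fingerprint.to_owned());
        } else {
            self.online.remove(fingerprint);
        }
    }

    /// Decide what to do with a connecting client.
    pub fn admit(&self, fingerprint: &str) -> Admission {
        if self.policy == PairingPolicy::Open || self.paired.contains(fingerprint) {
            return Admission::Admit;
        }
        let approver_online = self.online.iter().any(|fp| self.paired.contains(fp));
        if approver_online {
            Admission::AskPairedDevice
        } else {
            Admission::NeedPin
        }
    }

    /// Apply an approve/deny answer. Only a paired, online device may approve;
    /// on success the candidate's fingerprint is bound as paired.
    pub fn resolve_approval(&mut self, approver: &str, candidate: &str, approved: bool) -> bool {
        let ok = approved && self.paired.contains(approver) && self.online.contains(approver);
        if ok {
            self.paired.insert(candidate.to_owned());
        }
        ok
    }
}
```

With the default policy, an unknown Device B gets `NeedPin` when no paired device is online and `AskPairedDevice` once Device A is connected. After A approves, B is admitted without a PIN. Keying the store on the certificate fingerprint mirrors step 4 of the flow above.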
+22 -17
@@ -1,12 +1,15 @@
# Windows as a host — feasibility & scoping
**Status: scoped, deferred.** A Windows host is architecturally an *"add a backend"* job, not a
parallel port — but it is a **large** implementation effort across five GPU/driver subsystems, and
the project's headline feature (a per-client *virtual* output at the client's exact mode) has **no
user-mode Windows API**: it needs a signed kernel-mode Indirect Display Driver (IDD). This doc
records what it takes so the work can be picked up deliberately later.
**Status: scoped, deferred — but de-risked.** A Windows host is architecturally an *"add a backend"*
job, not a parallel port. The one thing that used to make it **large** — the per-client *virtual*
output, which has no user-mode Windows API and seemingly needed a self-signed kernel Indirect
Display Driver (IDD) — is **solved by reusing [SudoVDA](https://github.com/VirtualDrivers), the
Sunshine Virtual Display Adapter**: a pre-built, signed IDD that creates virtual displays at
arbitrary `WxH@Hz` on demand. We install it and drive its control interface; **no driver to write or
WHQL-sign.** That turns the headline feature from XL into a medium backend. This doc records what's
left so the work can be picked up deliberately.
(Grounded in a 4-agent read of the host crate, 2026-06-10.)
(Grounded in a 4-agent read of the host crate, 2026-06-10; SudoVDA path added 2026-06-11.)
## What's already done for us
@@ -33,7 +36,7 @@ all reuse the existing trait.
| Subsystem | Linux today | Windows equivalent | Effort | Notes |
|---|---|---|---|---|
| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly |
| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **Indirect Display Driver (IDD)** — kernel UMDF mini-driver | **XL** | ⚠️ **the blocker**: no user-mode API; C++ driver + **code signing** (test-sign or WHQL). Fallback: capture an existing monitor (loses the native-resolution feature) or a borderless window |
| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **SudoVDA** (pre-built signed IDD) — install + drive its control API to add/remove a `WxH@Hz` virtual monitor per session | **M** | **no longer the blocker**: SudoVDA is the same IDD Sunshine ships, so no driver to author or sign. The `VirtualDisplay` backend = enable the adapter, create a monitor at the client's mode, capture it (DXGI), tear it down on session end. Fallback if SudoVDA is absent: capture an existing monitor (loses native-resolution) |
| **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC SDK direct with a D3D11 device context (`AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
| **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this |
| **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (`keybd_event`/`mouse_event`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) |
@@ -42,9 +45,9 @@ all reuse the existing trait.
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
| **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do |
**Rough total: ~2,000–4,000 LOC of new Rust** (+ a C++ IDD driver if the virtual-display feature is
kept), spread over capture/encode/vdisplay/input/audio. Every reader rated the overall effort
**large**; the input+audio layer alone is *medium*.
**Rough total: ~2,000–4,000 LOC of new Rust** (no C++ driver — SudoVDA is reused as-is), spread over
capture/encode/vdisplay/input/audio. With the driver problem solved, the overall effort is now
**medium**; the input+audio layer alone is *small–medium*.
## Recommended phasing (when picked up)
@@ -52,17 +55,19 @@ kept), spread over capture/encode/vdisplay/input/audio. Every reader rated the o
Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves
the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt.
It loses the per-client native-resolution output but is a working Windows host quickly.
2. **Phase 1 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC
2. **Phase 1 — the virtual display via SudoVDA.** A `VirtualDisplay` backend that enables SudoVDA,
creates a monitor at the client's exact `WxH@Hz`, captures it (DXGI), and tears it down on session
end — restoring punktfunk's headline feature with **no driver authoring or signing**. (Ship/guide
the SudoVDA install as a host prerequisite, like the udev rule on Linux.)
3. **Phase 2 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC
zero-copy.
3. **Phase 2 — the virtual display (IDD).** The XL piece: a signed Indirect Display Driver that
surfaces a client-sized monitor, captured via DXGI. This restores punktfunk's differentiator on
Windows. Gated on solving driver signing/distribution.
## Why it's deferred (not started now)
- It's **large**, and the virtual-display blocker (IDD) is a kernel driver + signing problem
outside Rust — not "somewhat manageable" as a side effort.
- None of it is **buildable or testable on the Linux dev box** — it would be unvalidated code.
- The remaining work is **medium** and mechanical, but **none of it is buildable or testable on the
Linux dev box** — it would be unvalidated code until there's a Windows box in the loop.
- SudoVDA removed the hard blocker (the signed kernel driver); what's left is a backend port, picked
up whenever a Windows target is in scope.
The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are
the down payment. Start at **Phase 0** for the fastest path to a working Windows host.