`,
+ `streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
+ (`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.
+
+## The two highest interop risks (validate EARLY)
+
+1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly
+ 0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy) — parity
+ bytes likely **don't match**, so Moonlight silently fails to recover any frame with a lost
+ data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so
+ defer this — get a frame decoded first, then FFI/port nanors for loss recovery.
+2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
+ needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
+ usually off on LAN** — implement plaintext video first, add GCM later.
+
+## Phasing (each phase independently testable with a real Moonlight client)
+
+- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
+ `/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
+ pairs (PIN), and shows the host as ready.* ← first slice.
+- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
+ handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
+ *Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
+- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
+ layout in punktfunk-core; wire M0's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
+- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
+ (uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
+- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
+ *Acceptance: stable under `tc netem` loss; encrypted streams.*
+- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
+ backend for the user's KDE box. *Acceptance: full game stream with sound — the M2 goal.*
+
+## Crates (verified available)
+
+`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
+`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/
+`ecb` (pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
+(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
+core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.
+
+## Testing note
+
+The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at
+this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against
+`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display.
diff --git a/docs-site/content/docs/meta.json b/docs-site/content/docs/meta.json
new file mode 100644
index 0000000..3232070
--- /dev/null
+++ b/docs-site/content/docs/meta.json
@@ -0,0 +1,14 @@
+{
+ "title": "Documentation",
+ "pages": [
+ "index",
+ "overview",
+ "implementation-plan",
+ "roadmap",
+ "m2-plan",
+ "---Setup---",
+ "linux-setup",
+ "windows-host",
+ "dualsense-haptics"
+ ]
+}
diff --git a/docs-site/content/docs/overview.md b/docs-site/content/docs/overview.md
new file mode 100644
index 0000000..832ce1a
--- /dev/null
+++ b/docs-site/content/docs/overview.md
@@ -0,0 +1,77 @@
+---
+title: "Project Overview"
+description: "Status, architecture, and milestones at a glance."
+---
+
+
+*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust
+protocol core and native clients per platform.*
+
+`punktfunk` is a placeholder codename. The bet: ship a **Linux virtual-display streaming
+host** that speaks the existing Moonlight protocol (every Moonlight/Artemis client works
+day one), then break the ~1 Gbps FEC wall with a **GF(2¹⁶) Leopard-RS** transport as a
+negotiated extension. See [`docs/implementation-plan.md`](docs/implementation-plan.md).
+
+## Status
+
+| Milestone | State |
+|-----------|-------|
+| **M1 — `punktfunk-core` + C ABI** | ✅ done & hardened (FEC, packetization, AES-GCM, session, adversarial-review fixes, `punktfunk_core.h`) |
+| **M2 — GameStream host → stock Moonlight** | ✅ live end-to-end: pairing, RTSP, audio, per-client virtual output at native res, GPU zero-copy NVENC, gamepads |
+| **M3 — `punktfunk/1` native protocol** | ✅ validated live: QUIC control + GF(2¹⁶) FEC/AES data plane, SPAKE2 PIN pairing, mid-stream mode renegotiation |
+| **M4 — client decode + present (Apple)** | 🟡 macOS first light: AnnexB→VideoToolbox HEVC on glass + input/pairing over `punktfunk/1` (`clients/apple`); iOS + presenter next |
+| **Web console + management API** | ✅ TanStack web console (`web/`) over the OpenAPI mgmt API: host status, paired devices, on-demand native pairing (arm → show PIN) |
+
+The **GameStream host works with a stock Moonlight client** — validated live on NVIDIA
+(RTX 5070 Ti & RTX 4090, driver 595): trust-on-first-use pairing that persists, an app
+catalog, RTSP/ENet/audio, and **video at the client's exact resolution and refresh** via a
+per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded with GPU
+**zero-copy** (dmabuf → CUDA/Vulkan → NVENC) at up to 5120×1440@240. The native
+**`punktfunk/1`** protocol adds a QUIC control plane and a GF(2¹⁶) Leopard-FEC + AES-GCM data
+plane (p50 ~0.8 ms capture→reassembled at 720p120), with a SPAKE2 PIN pairing ceremony. Both
+run from **one process** (`serve --native`), managed through a REST API + web console. Builds
+against FFmpeg 7 or 8; deployed live on Bazzite. Full status: [`CLAUDE.md`](CLAUDE.md);
+roadmap: [`docs/roadmap.md`](docs/roadmap.md).
+
+## Layout
+
+```
+crates/
+ punktfunk-core/ protocol · FEC · pacing · crypto · quic — the C ABI (lib + cdylib + staticlib)
+ punktfunk-host/ Linux host: vdisplay · capture · encode · inject · gamestream · m3 · mgmt · native_pairing
+ punktfunk-client-rs/ punktfunk/1 reference client (M3 headless; M4 adds decode+present)
+clients/{apple,android}/ native client scaffolds (import punktfunk_core.h); apple = macOS first light
+web/ TanStack web console (host status · paired devices · pairing) over the mgmt API
+packaging/ Fedora/Bazzite RPM · bootc image · COPR (see packaging/bazzite/README.md)
+include/punktfunk_core.h cbindgen-generated C header (checked in)
+tools/{latency-probe,loss-harness}/ measurement (plan §10)
+docs/{implementation-plan,roadmap,windows-host,dualsense-haptics}.md
+```
+
+## Build & test
+
+```sh
+cargo build --workspace # green on Linux and macOS
+cargo test --workspace # unit + loopback + proptest + C ABI harness
+cargo clippy --workspace --all-targets
+
+cargo run -p loss-harness # FEC loss-resilience sweep (no network needed)
+bash crates/punktfunk-core/tests/c/run.sh # standalone C-ABI link+round-trip proof
+```
+
+The C header regenerates from `crates/punktfunk-core/src/abi.rs` on every build (cbindgen via
+`build.rs`) into `include/punktfunk_core.h`.
+
+## Design invariants
+
+- **One core, linked everywhere.** Protocol/FEC/crypto/pacing live in `punktfunk-core` exactly
+ once, exposed over a stable, versioned C ABI (`punktfunk_abi_version()`, `PunktfunkConfig`
+ carries its own `struct_size`).
+- **No async on the hot path.** The per-frame pipeline uses native threads only;
+ `tokio`/`quinn` are gated behind the off-by-default `quic` feature (control plane only).
+- **FEC is the wall-breaker.** GF(2⁸) (≤255 shards/block) for Moonlight compat;
+ GF(2¹⁶) (≤65535 shards/block, SIMD, O(n log n)) to push past ~1 Gbps.
+
+## License
+
+MIT OR Apache-2.0.
diff --git a/docs-site/content/docs/roadmap.md b/docs-site/content/docs/roadmap.md
new file mode 100644
index 0000000..80621eb
--- /dev/null
+++ b/docs-site/content/docs/roadmap.md
@@ -0,0 +1,334 @@
+---
+title: "Roadmap"
+description: "Decided next goals and the longer-term bets."
+---
+
+
+Decided 2026-06-10 (research-grounded; see commit history), extended since.
+
+**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
+stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
+— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
+live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
+against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
+**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
+process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
+Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
+**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
+Steam session at the **client's exact resolution + refresh** — games see it (via injected
+`--nested-refresh` + generated CVT modes, not the box's TV EDID), relaunched per-connection on a mode
+change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
+iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).
+
+**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the M4
+client presenter + iOS (§6), and a Windows host (§7 — now **de-risked via SudoVDA**, no custom
+signed driver needed). **§10 HDR/10-bit is parked — blocked upstream at the compositor** (no
+gamescope/KWin PipeWire 10-bit producer yet).
+
+## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
+
+Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
+`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.
+
+- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
+ wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
+ KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
+ registry roundtrip); move the portal restart to *after* readiness and precede it with
+ `systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
+ env import — the Sway script does this, the KDE one doesn't).
+- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
+ (permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
+ ("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
+ to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
+ node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
+- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
+ probe as an `ExecStartPost` gate, `Restart=on-failure`.
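+
+A minimal sketch of the Phase 1 idea: a bounded readiness poll in place of a blind `sleep`.
+`wait_until` and its parameters are hypothetical names for illustration, not the host's API; the
+real checks (kwin socket, `wl_display` roundtrip, `zkde_screencast` global) would live in the closure.
+
+```rust
+use std::thread::sleep;
+use std::time::{Duration, Instant};
+
+/// Poll `ready` every `interval` until it returns true or `timeout` elapses.
+/// Ok(elapsed) on readiness, Err(elapsed) on timeout, so the caller can log a
+/// precise diagnosis instead of failing into a black screen.
+pub fn wait_until(
+    timeout: Duration,
+    interval: Duration,
+    mut ready: impl FnMut() -> bool,
+) -> Result<Duration, Duration> {
+    let start = Instant::now();
+    loop {
+        if ready() {
+            return Ok(start.elapsed());
+        }
+        if start.elapsed() >= timeout {
+            return Err(start.elapsed());
+        }
+        sleep(interval);
+    }
+}
+```
+
+A caller might pass something like `|| socket_path.exists()` as the closure and treat `Err` as a
+hard startup failure rather than continuing blind.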
+
+## 2. Offer available compositors in the client ✅ *(done)*
+
+Host enumerates which backends are actually available (binary present + version OK:
+gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
+Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
+Picker in the Apple client + web console.
+
+## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*
+
+Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
+it's Fedora-atomic, and the community already installs Sunshine via COPR + `rpm-ostree`, which is the model followed here.
+Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
+(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
+`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
+the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.
+
+- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
+ `/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
+ NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
+ Install = `rpm-ostree install` + reboot + add the user to the `input`/`render` groups. Default backend =
+ Bazzite's already-present **gamescope** (minimal session plumbing).
+- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
+ ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
+- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
+ zero-copy NVENC/dmabuf/uinput).
+
+## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*
+
+The exact mirror of the host→client desktop-audio path. A PipeWire virtual source that apps can
+select is a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
+
+- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
+ `punktfunk_send_audio`.
+- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
+ jitter buffer (silence-fill underrun, Opus PLC).
+- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
+- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
+ surface). punktfunk/1-only.
+
+## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*
+
+- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
+ InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
+ `inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
+ injects `ei_touchscreen` down/motion/up; `punktfunk-client-rs --touch-test` drags a finger.
+ **Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
+ server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
+ (now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
+ (gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
+- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
+ `/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (VID:PID 054C:0CE6, the 232-byte
+ inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
+ GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
+ kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
+ from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
+ trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
+ (touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
+- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
+ selects a per-session `DualSenseManager` (the `PadBackend` enum in `m3.rs`): client gamepad frames
+ build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
+ (lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
+ (so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
+ (`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
+ `punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
+ (← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `m3-host` +
+ `punktfunk-client-rs --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
+ 12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
+ (600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
+ DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
+- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
+ Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
+ the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
+ rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
+ (only ~5–10 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
+ don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-
+ based, no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
+ path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
+
+## 6. iOS/iPadOS → tvOS *(deferred)*
+
+PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin,
+touch capture (#5), and UI. tvOS later.
+
+## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)*
+
+Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/
+crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic
+and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits:
+capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK
+with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic).
+
+**The old blocker is gone.** Rather than author + sign our own kernel IDD for the per-client virtual
+display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect
+Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay`
+backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a
+kernel driver"* (XL). That removes the only hard blocker — the Windows host is now a medium,
+mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the
+stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because
+it's unbuildable on the Linux dev box; the trait boundaries are already in the right places.
+
+## 8. Pairing & trust hardening *(next)*
+
+The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
+client) is built and live. Two changes harden it from "works" to "secure by default":
+
+- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
+ pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
+ is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
+ Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
+ Deployed to the dev box + Bazzite.
+- **Delegated pairing approval** *(next — the ergonomic enabler for "mandatory": pair a device
+ without fetching the host PIN out of band).* Target flow:
+ 1. Device A is already paired (authenticated) to Host X.
+ 2. The user tries to connect Device B to Host X.
+ 3. Host X surfaces a request: *"Allow Device B to pair with Host X?"*
+ 4. The user approves/denies; on approve, Host X admits Device B — binding B's certificate
+ fingerprint — with no PIN typed.
+
+ Two buildable layers:
+ - **§8b-1 (host + web — achievable now):** an unpaired B that connects to an approval-enabled host
+ is held as a **pending request** `{id, name, fingerprint, requested_at}` in `NativePairing`
+ instead of a flat reject; mgmt gains `GET /native/pending` + `POST /native/pending/{id}/{approve,
+ deny}`; the web console lists pending requests with Approve/Deny. The **operator approves from
+ the console** — delegated approval via the management surface.
+ - **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
+ **Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
+ replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
+ is a client-agent task.
+
+ PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
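+
+The §8b-1 pending-request flow could be sketched as a small in-memory store. This is an
+illustration under assumptions: `PairingStore` and its methods are hypothetical (only the
+`{id, name, fingerprint, requested_at}` shape comes from the text above), and the real
+`NativePairing` may differ.
+
+```rust
+use std::collections::BTreeMap;
+
+/// One held request, mirroring `{id, name, fingerprint, requested_at}`.
+#[derive(Debug, Clone, PartialEq)]
+pub struct PendingRequest {
+    pub id: u64,
+    pub name: String,
+    pub fingerprint: String,
+    pub requested_at: u64, // seconds since epoch, supplied by the caller
+}
+
+/// Hypothetical store: pending requests plus the trusted fingerprint list.
+#[derive(Default)]
+pub struct PairingStore {
+    next_id: u64,
+    pending: BTreeMap<u64, PendingRequest>,
+    trusted: Vec<String>,
+}
+
+impl PairingStore {
+    /// Hold an unpaired client instead of rejecting it. A repeat connect from
+    /// the same fingerprint reuses the existing request id.
+    pub fn hold(&mut self, name: &str, fingerprint: &str, now: u64) -> u64 {
+        if let Some(r) = self.pending.values().find(|r| r.fingerprint == fingerprint) {
+            return r.id;
+        }
+        self.next_id += 1;
+        let id = self.next_id;
+        self.pending.insert(
+            id,
+            PendingRequest { id, name: name.into(), fingerprint: fingerprint.into(), requested_at: now },
+        );
+        id
+    }
+
+    /// What a `GET /native/pending` handler would list.
+    pub fn list(&self) -> Vec<&PendingRequest> {
+        self.pending.values().collect()
+    }
+
+    /// Approve: bind the fingerprint as trusted. False if the id is unknown.
+    pub fn approve(&mut self, id: u64) -> bool {
+        match self.pending.remove(&id) {
+            Some(r) => {
+                self.trusted.push(r.fingerprint);
+                true
+            }
+            None => false,
+        }
+    }
+
+    /// Deny: drop the request without trusting it.
+    pub fn deny(&mut self, id: u64) -> bool {
+        self.pending.remove(&id).is_some()
+    }
+
+    pub fn is_trusted(&self, fingerprint: &str) -> bool {
+        self.trusted.iter().any(|t| t == fingerprint)
+    }
+}
+```
+
+Keying approval on the request id (not the name) means a second approve or deny for the same id
+is a no-op, which keeps the two mgmt endpoints idempotent in this sketch.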
+
+## 9. Client→host network speed test + settable bitrate *(host side done — client UI remaining)*
+
+Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
+value) instead of guesswork that ends in a stuttering stream.
+
+**Done & live (host + protocol + connector + C ABI, `74819b1`):**
+- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
+ client requests a rate; the host clamps to [500 kbps, 500 Mbps] (or its 20 Mbps default on 0),
+ applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
+ and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
+ `punktfunk_connection_bitrate()`.
+- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
+ `ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
+ zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 1 Gbps / ≤ 5 s,
+ video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
+ and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
+ `PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
+ Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
+ interleaved probe AUs excluded from frame verification. `punktfunk-client-rs` gains `--bitrate` +
+ `--speed-test KBPS:MS` as the reference/loopback driver.
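+
+The clamp rule described above (0 means "use the host default", anything else is clamped to the
+allowed range) is small enough to sketch. The constants use the values stated in this section;
+`resolve_bitrate_kbps`, `MIN_BITRATE_KBPS`, and `DEFAULT_BITRATE_KBPS` are assumed names, not
+necessarily the host's.
+
+```rust
+/// 500 kbps floor, as stated in this section.
+pub const MIN_BITRATE_KBPS: u32 = 500;
+/// 500 Mbps ceiling, as stated in this section.
+pub const MAX_BITRATE_KBPS: u32 = 500_000;
+/// 20 Mbps host default, applied when the client sends 0.
+pub const DEFAULT_BITRATE_KBPS: u32 = 20_000;
+
+/// Resolve the client's requested bitrate into the value the host applies
+/// to the encoder and echoes back in the Welcome.
+pub fn resolve_bitrate_kbps(requested: u32) -> u32 {
+    if requested == 0 {
+        DEFAULT_BITRATE_KBPS
+    } else {
+        requested.clamp(MIN_BITRATE_KBPS, MAX_BITRATE_KBPS)
+    }
+}
+```
+
+Echoing the resolved value matters: a client that asked for more than the ceiling learns the
+rate actually in effect instead of assuming its request was honored.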
+
+**Remaining (client UI):** wire the C ABI into the Apple client — a "Test network" action
+(`speed_test` → poll `probe_result` → "~XXX Mbps · recommended bitrate YYY") feeding a bitrate
+control (`connect_ex3`), and surface both in the web console.
+
+## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*
+
+Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
+stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
+researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
+build the moment a producer lands.
+
+- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
+ (`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/
+ `NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
+ AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
+ HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
+- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
+ Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
+ F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
+ wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
+ (for its own sake), not for HDR.
+- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
+ PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
+ Steam-Deck-UI polish + the Bazzite dynamic-resolution work (`c894c6f`, described in the intro). Track #2126 and !8293.
+- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
+ Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
+ (BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
+ the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
+- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
+ zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
+ path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
+ VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
+
+## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
+
+Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
+the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
+
+**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
+untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
+only ~434–874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
+(RustCrypto auto AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **M1
+`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
+legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
+must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
+params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
+
+- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
+ clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
+ headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
+ so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
+ send-buffer drop — `Transport::send` → `Result`, a new `packets_send_dropped` stat (Stats +
+ C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
+ completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
+- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
+ per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
+ bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
+ (64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
+ `sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
+ `recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
+ underdrain). ~64× fewer syscalls.
+- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
+ reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
+ blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
+ `shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
+- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
+ never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
+- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
+ `punktfunk-client-rs --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
+ `packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
+ `rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
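+
+The packet-rate figures in this section follow from simple arithmetic. A sketch, assuming roughly
+1000 bytes on the wire per packet (back-computed from the ~125k pkt/s figure; the text does not
+state it) and using hypothetical function names:
+
+```rust
+/// Packets per second needed to carry `wire_bps` with `bytes_on_wire` per packet.
+pub fn packets_per_sec(wire_bps: u64, bytes_on_wire: u64) -> u64 {
+    wire_bps / (bytes_on_wire * 8)
+}
+
+/// Syscalls per second when `batch` packets go out per call (sendmmsg-style).
+pub fn syscalls_per_sec(pps: u64, batch: u64) -> u64 {
+    pps.div_ceil(batch)
+}
+
+/// Percentage of packets saved by growing the payload from `old` to `new` bytes.
+pub fn packet_reduction_pct(old: u64, new: u64) -> f64 {
+    100.0 * (1.0 - old as f64 / new as f64)
+}
+```
+
+With these, `packets_per_sec(1_000_000_000, 1000)` is 125000, batching 64 per call brings that to
+about 1954 syscalls/s, and `packet_reduction_pct(1200, 1452)` comes out near 17%.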
+
+## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*
+
+A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
+is only the same-host **capture-stamp→reassembled** slice (~30–40% of true glass-to-glass) and was
+measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
+priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
+frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
+`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
+frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
+~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
+
+**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
+forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
+NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
+block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
+last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
+buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
+
+- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
+ `PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
+ (IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
+ common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
+ the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
+ `encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
+ numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
+- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
+ on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
+ probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
+ `sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @60–120 fps) and is the
+ substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
+ LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
+- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
+ 1. **Wall-clock skew handshake + glass-to-glass probe** (`tools/latency-probe`) — measures the two
+ biggest unmeasured terms (render→capture, decode→present); client present-stamp vs the AU's
+ `pts_ns` (already attached).
+ 2. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
+ copy) — ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
+ 3. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
+ off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
+ 4. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
+ encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
+ thread split, only after measurement justifies it.
+
+## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
+
+The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
+GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
+(`serve --native`) and standalone `m3-host` advertise the native service over mDNS:
+
+- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
+ control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
+ both host entry points get it; best-effort (a discovery failure never blocks streaming —
+ `--connect` always works). The advert is held for the host's lifetime (RAII unregister).
+- **TXT records**: `proto=punktfunk/1`, `fp=` (the value a client pins — advisory
+ over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
+ (so a picker knows up front whether the PIN ceremony is needed), `id=` (dedup).
+- **Client**: `punktfunk-client-rs --discover [SECS]` browses and prints each host (name, addr:port,
+ pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
+ (Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
+- **Validated**: cross-LAN — dev box discovered the GNOME-box appliance
+ (`home-worker-3 192.168.1.248:9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
+ (`pair=optional`); fingerprint + pairing state correct in both.
+- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
+ host-side contract above is all it needs.
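
Reviewer sketch (not part of the patch): the TXT keys listed in §13 are the whole client-side contract, so a picker can be built on a small parser. The following is a minimal illustration in std-only Rust, assuming each TXT record arrives as a `key=value` string. The names `NativeAdvert`, `Pairing`, and `parse_txt` are hypothetical and do not come from the punktfunk codebase.

```rust
use std::collections::HashMap;

/// Whether the host demands the PIN ceremony (from the `pair=` TXT key).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Pairing {
    Required,
    Optional,
}

/// Hypothetical client-side view of one advert; fields mirror the TXT keys.
#[derive(Debug, PartialEq, Eq)]
struct NativeAdvert {
    /// `fp=`: advisory only over mDNS, still verified by pinning on connect.
    fingerprint: String,
    /// `pair=`: lets a picker show up front whether a PIN is needed.
    pairing: Pairing,
    /// `id=`: used to dedup the same host seen more than once.
    id: String,
}

/// Parse `key=value` TXT strings. Returns `None` when the advert is not
/// `punktfunk/1`, a key is missing, or `pair=` has an unknown value.
fn parse_txt(records: &[&str]) -> Option<NativeAdvert> {
    let kv: HashMap<&str, &str> = records
        .iter()
        .filter_map(|r| r.split_once('='))
        .collect();

    if *kv.get("proto")? != "punktfunk/1" {
        return None;
    }
    let pairing = match *kv.get("pair")? {
        "required" => Pairing::Required,
        "optional" => Pairing::Optional,
        _ => return None,
    };
    Some(NativeAdvert {
        fingerprint: kv.get("fp")?.to_string(),
        pairing,
        id: kv.get("id")?.to_string(),
    })
}
```

Rejecting adverts with a missing or unknown key keeps the picker conservative. Since the fingerprint is unauthenticated at this stage, the parsed value should only pre-fill the pin that the connect step then verifies.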
diff --git a/docs-site/content/docs/windows-host.md b/docs-site/content/docs/windows-host.md
new file mode 100644
index 0000000..6fbabf2
--- /dev/null
+++ b/docs-site/content/docs/windows-host.md
@@ -0,0 +1,77 @@
+---
+title: "Windows Host"
+description: "Feasibility and scoping for a Windows host backend."
+---
+
+
+**Status: scoped, deferred — but de-risked.** A Windows host is architecturally an *"add a backend"*
+job, not a parallel port. The one thing that used to make it **large** was the per-client *virtual*
+output: Windows has no application-level API for it, so it seemingly meant writing and signing our own
+Indirect Display Driver (IDD, a UMDF driver). It is **solved by reusing [SudoVDA](https://github.com/VirtualDrivers), the
+Sunshine Virtual Display Adapter**: a pre-built, signed IDD that creates virtual displays at
+arbitrary `WxH@Hz` on demand. We install it and drive its control interface; **no driver to write or
+WHQL-sign.** That turns the headline feature from XL into a medium backend. This doc records what's
+left so the work can be picked up deliberately.
+
+(Grounded in a 4-agent read of the host crate, 2026-06-10; SudoVDA path added 2026-06-11.)
+
+## What's already done for us
+
+punktfunk is cleanly layered. **~95% of the codebase is platform-agnostic and can be reused verbatim:**
+
+| Reusable as-is | Why |
+|---|---|
+| `punktfunk-core` (protocol, FEC, crypto, session, transport, **C ABI**) | Zero platform deps — no `cfg(linux)` anywhere; the C ABI is already cross-platform |
+| QUIC control plane (`quic.rs`, pairing, mode negotiation) | quinn + tokio are portable |
+| GameStream P1.1 (mDNS, serverinfo, pairing, RTSP, ENet) — *except* `stream.rs`/`audio.rs` | pure wire logic |
+| Management REST API (`mgmt.rs`) + OpenAPI | axum/tokio, portable |
+| Pipeline + `m3.rs` orchestration | trait-generic — calls `capturer.next_frame()`, `encoder.submit/poll()`; **needs zero changes** |
+| The **trait boundaries** themselves: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral signatures; Linux deps are already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |
+
+So a Windows host is **new `#[cfg(target_os = "windows")]` backend modules behind the existing
+traits** — the per-frame path, protocol, and control plane don't move. No architectural refactor is
+required; the boundaries are already in the right places.
+
+## What a Windows host needs (new code)
+
+Each row is a Linux backend that needs a Windows sibling. Effort is the *implementation* effort;
+every new backend plugs in behind its existing trait.
+
+| Subsystem | Linux today | Windows equivalent | Effort | Notes |
+|---|---|---|---|---|
+| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly |
+| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **SudoVDA** (pre-built signed IDD) — install + drive its control API to add/remove a `WxH@Hz` virtual monitor per session | **M** | ✅ **no longer the blocker**: SudoVDA is the same IDD Sunshine ships, so no driver to author or sign. The `VirtualDisplay` backend = enable the adapter, create a monitor at the client's mode, capture it (DXGI), tear it down on session end. Fallback if SudoVDA is absent: capture an existing monitor (loses native-resolution) |
+| **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC with a D3D11 device context (FFmpeg's `AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
+| **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this |
+| **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (`INPUT` with `KEYBDINPUT`/`MOUSEINPUT`; supersedes the legacy `keybd_event`/`mouse_event`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) |
+| **Input (gamepads)** | uinput X-Box-360 pad + FF rumble | **ViGEm** (Virtual Gamepad Emulation) + HID reports | M | rumble back-channel maps to ViGEm notifications |
+| **Audio capture** | PipeWire sink-monitor | **WASAPI loopback** (`IAudioCaptureClient`) | S–M | also produces interleaved f32 — same `AudioCapturer` contract |
+| **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
+| **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do |
+
+**Rough total: ~2,000–4,000 LOC of new Rust** (no C++ driver — SudoVDA is reused as-is), spread over
+capture/encode/vdisplay/input/audio. With the driver problem solved, the overall effort is now
+**medium**; the input+audio layer alone is *small–medium*.
+
+## Recommended phasing (when picked up)
+
+1. **Phase 0 — "basic Windows host" (no virtual display).** Capture an *existing* monitor (DXGI
+ Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves
+ the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt.
+ It loses the per-client native-resolution output but is a working Windows host quickly.
+2. **Phase 1 — the virtual display via SudoVDA.** A `VirtualDisplay` backend that enables SudoVDA,
+ creates a monitor at the client's exact `WxH@Hz`, captures it (DXGI), and tears it down on session
+ end — restoring punktfunk's headline feature with **no driver authoring or signing**. (Ship/guide
+ the SudoVDA install as a host prerequisite, like the udev rule on Linux.)
+3. **Phase 2 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC
+ zero-copy.
+
+## Why it's deferred (not started now)
+
+- The remaining work is **medium** and mechanical, but **none of it is buildable or testable on the
+ Linux dev box** — it would be unvalidated code until there's a Windows box in the loop.
+- SudoVDA removed the hard blocker (authoring and signing a display driver); what's left is a backend
+ port, picked up whenever a Windows target is in scope.
+
+The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are
+the down payment. Start at **Phase 0** for the fastest path to a working Windows host.
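
Reviewer sketch (not part of the patch): the "new `#[cfg(target_os = "windows")]` backend modules behind the existing traits" idea can be illustrated as below. This is a minimal sketch under stated assumptions. The trait here is a hypothetical stand-in for the host crate's `VirtualDisplay`, whose real signature this doc does not show, and both backends are stubs that only format a name where a real one would drive SudoVDA or a compositor protocol.

```rust
/// Hypothetical stand-in for the host crate's `VirtualDisplay` trait.
trait VirtualDisplay {
    /// Create an output at the client's mode; returns a backend-specific name.
    fn create(&mut self, width: u32, height: u32, hz: u32) -> Result<String, String>;
    /// Tear the output down; returns true if one was active.
    fn destroy(&mut self) -> bool;
}

#[cfg(target_os = "windows")]
mod backend {
    /// Stub: a real backend would ask SudoVDA to add or remove a monitor.
    #[derive(Default)]
    pub struct Display {
        active: Option<String>,
    }

    impl super::VirtualDisplay for Display {
        fn create(&mut self, width: u32, height: u32, hz: u32) -> Result<String, String> {
            let name = format!("sudovda:{width}x{height}@{hz}");
            self.active = Some(name.clone());
            Ok(name)
        }
        fn destroy(&mut self) -> bool {
            self.active.take().is_some()
        }
    }
}

#[cfg(not(target_os = "windows"))]
mod backend {
    /// Stub: a real backend would talk to KWin/Mutter/Sway/gamescope.
    #[derive(Default)]
    pub struct Display {
        active: Option<String>,
    }

    impl super::VirtualDisplay for Display {
        fn create(&mut self, width: u32, height: u32, hz: u32) -> Result<String, String> {
            let name = format!("compositor:{width}x{height}@{hz}");
            self.active = Some(name.clone());
            Ok(name)
        }
        fn destroy(&mut self) -> bool {
            self.active.take().is_some()
        }
    }
}

/// Platform-neutral entry point: callers never name a concrete backend.
fn open_display() -> Box<dyn VirtualDisplay> {
    Box::new(backend::Display::default())
}
```

Because orchestration code would only ever see `Box<dyn VirtualDisplay>`, adding the Windows module leaves the per-frame path untouched, which is the property the table under "What's already done for us" relies on.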
diff --git a/docs-site/package.json b/docs-site/package.json
new file mode 100644
index 0000000..9412b54
--- /dev/null
+++ b/docs-site/package.json
@@ -0,0 +1,34 @@
+{
+ "name": "punktfunk-docs",
+ "private": true,
+ "type": "module",
+ "description": "punktfunk documentation site — Fumadocs on TanStack Start (Vite + Nitro/bun)",
+ "scripts": {
+ "dev": "vite dev --port 3001",
+ "build": "vite build",
+ "start": "bun run .output/server/index.mjs",
+ "lint": "tsc --noEmit"
+ },
+ "dependencies": {
+ "@tanstack/react-router": "^1.121.0",
+ "@tanstack/react-start": "^1.121.0",
+ "fumadocs-core": "^16.10.1",
+ "fumadocs-ui": "^16.10.1",
+ "react": "^19.0.0",
+ "react-dom": "^19.0.0"
+ },
+ "devDependencies": {
+ "@tailwindcss/vite": "^4.0.0",
+ "@tanstack/nitro-v2-vite-plugin": "^1.155.0",
+ "@types/mdx": "^2.0.14",
+ "@types/node": "^22.10.0",
+ "@types/react": "^19.0.0",
+ "@types/react-dom": "^19.0.0",
+ "@vitejs/plugin-react": "^5",
+ "fumadocs-mdx": "^15.0.12",
+ "tailwindcss": "^4.0.0",
+ "typescript": "^5.7.0",
+ "vite": "^7.3.5",
+ "vite-tsconfig-paths": "^5.1.0"
+ }
+}
diff --git a/docs-site/source.config.ts b/docs-site/source.config.ts
new file mode 100644
index 0000000..5bd5063
--- /dev/null
+++ b/docs-site/source.config.ts
@@ -0,0 +1,10 @@
+import { defineDocs, defineConfig } from 'fumadocs-mdx/config'
+
+// Single docs collection: every .md/.mdx under content/docs/. Frontmatter `title` is
+// required (Fumadocs renders it as the page heading); `description` is optional.
+export const docs = defineDocs({
+ dir: 'content/docs',
+})
+
+// Global MDX options (remark/rehype). Default export is read by the fumadocs-mdx vite plugin.
+export default defineConfig()
diff --git a/docs-site/src/components/mdx.tsx b/docs-site/src/components/mdx.tsx
new file mode 100644
index 0000000..5049598
--- /dev/null
+++ b/docs-site/src/components/mdx.tsx
@@ -0,0 +1,15 @@
+import defaultMdxComponents from 'fumadocs-ui/mdx'
+import type { MDXComponents } from 'mdx/types'
+
+export function getMDXComponents(components?: MDXComponents) {
+ return {
+ ...defaultMdxComponents,
+ ...components,
+ } satisfies MDXComponents
+}
+
+export const useMDXComponents = getMDXComponents
+
+declare global {
+ type MDXProvidedComponents = ReturnType<typeof getMDXComponents>
+}
diff --git a/docs-site/src/lib/layout.shared.tsx b/docs-site/src/lib/layout.shared.tsx
new file mode 100644
index 0000000..6017665
--- /dev/null
+++ b/docs-site/src/lib/layout.shared.tsx
@@ -0,0 +1,10 @@
+import type { BaseLayoutProps } from 'fumadocs-ui/layouts/shared'
+
+// Shared chrome (nav title, links) for both the docs layout and the home layout.
+export function baseOptions(): BaseLayoutProps {
+ return {
+ nav: {
+ title: 'punktfunk',
+ },
+ }
+}
diff --git a/docs-site/src/lib/source.ts b/docs-site/src/lib/source.ts
new file mode 100644
index 0000000..30fc067
--- /dev/null
+++ b/docs-site/src/lib/source.ts
@@ -0,0 +1,7 @@
+import { docs } from 'collections/server'
+import { loader } from 'fumadocs-core/source'
+
+export const source = loader({
+ baseUrl: '/docs',
+ source: docs.toFumadocsSource(),
+})
diff --git a/docs-site/src/router.tsx b/docs-site/src/router.tsx
new file mode 100644
index 0000000..fc65115
--- /dev/null
+++ b/docs-site/src/router.tsx
@@ -0,0 +1,29 @@
+import { createRouter as createTanStackRouter, Link } from '@tanstack/react-router'
+import { routeTree } from './routeTree.gen'
+
+export function getRouter() {
+ return createTanStackRouter({
+ routeTree,
+ defaultPreload: 'intent',
+ scrollRestoration: true,
+ defaultNotFoundComponent: NotFound,
+ })
+}
+
+function NotFound() {
+ return (
+
+