docs(site): Fumadocs documentation site on TanStack Start

New standalone app at docs-site/ — Fumadocs (fumadocs-core/ui 16, fumadocs-mdx
15) on TanStack Start (Vite 7 + nitro-v2 bun preset, React 19, Tailwind 4),
mirroring the web/ console stack but with no auth/i18n/orval — docs stay public.

- catch-all docs route (routes/docs/$.tsx), Orama search (routes/api/search.ts),
  RootProvider shell, MDX component map, shared nav, custom 404
- content/docs/: hand-written index.mdx + meta.json nav, plus 7 pages imported
  from repo docs/ + README (leading H1 stripped, YAML frontmatter added; kept as
  .md so literal < and { characters don't trip MDX's JSX parser). Content is a
  one-time snapshot.
- mdx() is plugins[0]; tsconfig collections/* -> ./.source/*; SSR search variant;
  @source for fumadocs-ui classes. Generated .source/routeTree/dist/.output ignored.

Verified: bun run build (client+SSR+nitro) green, tsc clean, dev + prod servers
serve all routes 200 with SSR content + nav, search returns hits, 404 works.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 11:17:59 +00:00
parent 4fff4641bb
commit 50c9db785a
25 changed files with 3144 additions and 0 deletions
@@ -0,0 +1,15 @@
node_modules
.output
.tanstack
.nitro
dist
*.local
# Fumadocs MDX codegen (the `collections/*` virtual modules) — regenerated by the
# fumadocs-mdx vite plugin on dev/build.
.source
# TanStack Router file-based route tree — regenerated by the start plugin.
src/routeTree.gen.ts
.env
@@ -0,0 +1,39 @@
# punktfunk-docs
The punktfunk documentation site: [Fumadocs](https://fumadocs.dev) on
[TanStack Start](https://tanstack.com/start) (Vite + Nitro/bun preset).
Content lives in [`content/docs/`](content/docs) as `.md`/`.mdx`. Several pages are a one-time
snapshot of the repo's `docs/` design notes (leading H1 stripped, frontmatter added). The two copies
are not synced, so carry an edit made in one place over to the other until the docs site becomes the
source of truth.
## Develop
```sh
bun install
bun run dev # http://localhost:3001 (docs at /docs)
```
## Build & serve
```sh
bun run build
bun run start # serves .output/ via Bun
```
## Layout
```
source.config.ts          Fumadocs MDX collection (content/docs)
content/docs/             the docs content (.md/.mdx) + meta.json nav
src/
  routes/
    __root.tsx            RootProvider + html shell
    index.tsx             landing page
    docs/$.tsx            catch-all docs renderer (Fumadocs DocsLayout)
    api/search.ts         Orama search endpoint
  lib/source.ts           Fumadocs loader over the generated collection
  lib/layout.shared.tsx   shared nav chrome
  components/mdx.tsx      MDX component map
  styles/app.css          Tailwind 4 + Fumadocs preset
```
File diff suppressed because it is too large (+1592 lines)
@@ -0,0 +1,111 @@
---
title: "DualSense Haptics"
description: "Feasibility and scoping for audio-driven DualSense haptics."
---
**Status: scoped, NO-GO for now (deferred).** Advanced voice-coil haptics on the DualSense are
driven by the controller's **USB audio interface** (4-channel surround, the back two channels carry
the haptic waveform), *not* by HID reports. Emulating that on a Linux host and faithfully replaying
it on the Apple client both hit hard walls, and the supply of software that actually *emits* these
haptics on a Linux host is essentially zero. We defer the audio-haptics feature and instead land the
parts of "really supporting the DualSense" that *are* reachable: **adaptive triggers (HID) and
two-motor rumble.**
(Grounded in a 4-agent feasibility read — host USB-gadget viability, DualSense audio descriptors,
Linux game demand, Apple client render path — 2026-06-10.)
## The one distinction that decides everything
| Feature | How it's driven | Reachable for us? |
|---|---|---|
| Basic rumble (2 motors) | HID output report `0x02`, bytes 3–4 | **Yes** — already parsed; client already has `nextRumble()` |
| **Adaptive triggers** (L2/R2 resistance) | HID output report `0x02`, bytes 11–22 / 22–33 | **Yes** — already parsed in `dualsense.rs`; just needs the `0xCD` back-channel + client render |
| **Advanced haptics** (voice-coil actuators) | **USB *audio* interface** — 4-ch, back 2 channels = haptic PCM | **No (for now)** — see the three walls below |
The UHID DualSense we already built is **HID-only**. It cannot present the DualSense's *audio*
interface, so it structurally cannot carry advanced haptics. That's not a bug in our implementation —
it's the wrong transport for this signal.
## The three walls (any one is fatal on its own)
### Wall 1 — Host capture needs a kernel rebuild
To *capture* haptic audio a game emits, the host must present a virtual device that owns the
DualSense audio interface. The standard way is a composite USB gadget (`configfs` + `f_hid` +
`f_uac2`) bound to a software UDC (`dummy_hcd`).
- ✅ Present & enabled on this box: `CONFIG_USB_CONFIGFS`, `CONFIG_USB_CONFIGFS_F_HID`,
`CONFIG_USB_CONFIGFS_F_UAC2`, plus `libcomposite`/`usb_f_hid`/`usb_f_uac2`/`u_audio` modules.
- **Blocker:** `# CONFIG_USB_DUMMY_HCD is not set` in `/boot/config-7.0.0-22-generic`. No
`dummy_hcd.ko`, no `/sys/class/udc/`. **No UDC → nothing to bind the gadget to.** Requires a
custom kernel build to enable `CONFIG_USB_DUMMY_HCD=m`, plus root for module-load/configfs.
A lighter alternative exists — a **virtual PipeWire/ALSA sink renamed as the DualSense** (this is how
the working Linux setups capture the back-2-channels today, via WirePlumber rules). It skips the
kernel rebuild, but is gated by the same Wall 2 below, and games' audio-device detection is
hardcoded per-title so it's fragile.
### Wall 2 — Almost nothing on a Linux host emits these haptics
This is the decisive one. The *supply* that would feed our capture barely exists:
- **Steam Input (Linux):** no official advanced-haptics support (open feature request as of 2026).
- **Sony's `hid-playstation` kernel driver:** explicitly does **not** expose VCM haptics or adaptive
triggers — basic rumble only.
- **RPCS3:** treats the DualSense as a generic pad; no advanced haptics.
- **Native Linux games:** effectively **zero** with advanced haptics.
- **The only working path** is a handful of Proton titles (FF7 Remake, Ghostwire, Deathloop, Animal
Well, Stellar Blade) via ClearlyClaire's *custom Wine patches*, **USB-only**, Steam Input
disabled, forced into a 4.0-surround profile, device renamed to match Windows. ~5–10 games total.
- Bluetooth can't carry it on Linux *or* Windows (Sony's proprietary A2DP repurposing isn't exposed).
A host-side capture feature is only as useful as the software willing to drive it. On Linux that set
is a niche-of-a-niche.
### Wall 3 — The Apple client can't faithfully replay it
Even with a captured waveform, the primary client (macOS/iOS) can't render it well:
- macOS GameController exposes the DualSense's rumble and adaptive triggers
(`GCDualSenseAdaptiveTrigger`), but gives no access to the voice-coil actuators or their audio channels.
- CoreHaptics is **discrete, pattern-based** (`CHHapticPattern` events, ≤30 s), **not** a PCM
streaming sink. Converting a streamed haptic waveform to patterns is lossy — it throws away exactly
the fidelity that makes voice-coil haptics worth having.
- There is **no public macOS API** to route CoreAudio to the DualSense's channels 3–4. Doing it
anyway means private/reverse-engineered APIs that break across OS updates.
## What we *can* ship instead ("really supporting the DualSense" minus audio haptics)
The HID DualSense we built is the foundation, and the high-value parts are within reach:
1. **Adaptive triggers — GO.** `dualsense.rs` already parses the L2/R2 trigger effects out of HID
output report `0x02`. Finishing this is the paused HID work: route them over the `0xCD`
HID-output back-channel and render on the client. This delivers the headline "DualSense feel"
(trigger resistance/weapon tension) for any source that emits it — and it's pure HID, no audio
interface, no kernel rebuild.
2. **Two-motor rumble — already done.** Parsed host-side; the Apple client already has
`nextRumble()`. Wire it to `GCDeviceHaptics`/`CHHapticEngine` as discrete patterns (API-clean,
no private APIs).
3. **LED / player-LED / touchpad / motion** — already parsed; finish the `0xCC`/`0xCD` routing.
This is the resume-able HID DualSense Phase C/D/E work — it stands on its own and was never blocked.
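For reference, a minimal Rust sketch of pulling the HID-reachable fields out of output report `0x02`, reading the byte ranges from the table above as half-open ranges. It is not the code in `dualsense.rs`: the struct, constant, and function names are hypothetical, the right/left assignment of bytes 3 and 4 follows the Linux `hid-playstation` field order and is an assumption here, and the valid-flag bits (assumed to sit in bytes 1–2) are ignored.

```rust
use std::ops::Range;

/// Offsets in USB output report 0x02, read from the table above as half-open ranges.
const REPORT_ID: u8 = 0x02;
const MOTOR_RIGHT: usize = 3; // assumption: right motor at byte 3, left at byte 4
const MOTOR_LEFT: usize = 4;
const L2_EFFECT: Range<usize> = 11..22; // trigger order as listed in the table
const R2_EFFECT: Range<usize> = 22..33;

/// Hypothetical view of the HID-reachable haptics in one output report.
#[derive(Debug, PartialEq)]
struct HidHaptics {
    motor_right: u8,
    motor_left: u8,
    l2_effect: [u8; 11],
    r2_effect: [u8; 11],
}

/// Returns `None` for any other report ID or a buffer too short for both trigger blocks.
/// Valid-flag bits (assumed to live in bytes 1-2) are not checked in this sketch.
fn parse_output_report(buf: &[u8]) -> Option<HidHaptics> {
    if buf.first() != Some(&REPORT_ID) || buf.len() < R2_EFFECT.end {
        return None;
    }
    let mut l2_effect = [0u8; 11];
    let mut r2_effect = [0u8; 11];
    l2_effect.copy_from_slice(&buf[L2_EFFECT]);
    r2_effect.copy_from_slice(&buf[R2_EFFECT]);
    Some(HidHaptics {
        motor_right: buf[MOTOR_RIGHT],
        motor_left: buf[MOTOR_LEFT],
        l2_effect,
        r2_effect,
    })
}

fn main() {
    // Zero-padded buffer; the real report length is not modeled here.
    let mut report = [0u8; 64];
    report[0] = REPORT_ID;
    report[MOTOR_RIGHT] = 0x40;
    report[MOTOR_LEFT] = 0xff;
    report[L2_EFFECT.start] = 0x01; // first byte of the L2 block
    println!("{:?}", parse_output_report(&report));
}
```

Everything this touches is plain HID bytes, which is why the trigger and rumble work needs neither the audio interface nor a kernel rebuild.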
## Conditions for a future GO on audio haptics
Revisit if **all three** change:
- A real DualSense is available on the dev box to capture an authoritative `lsusb -v` + the exact
UAC channel/sample-rate/format layout (today: undocumented, would need reverse-engineering).
- The host target gains a UDC (custom kernel with `dummy_hcd`, or real hardware OTG) **or** we accept
the PipeWire-renamed-sink path *and* the title set that emits haptics on Linux grows beyond the
Proton-patch niche.
- The client target shifts to one that can render PCM haptics (a Linux/Windows client with direct
CoreAudio-style channel access, or a future Apple API) — or we accept lossy pattern conversion.
Until then the cost/benefit is upside-down: three hard subsystems (kernel, USB gadget, audio
routing) to serve ~5–10 Proton titles, rendered lossily on the one client we ship.
## Recommendation
**Defer audio-driven advanced haptics. Land adaptive triggers (HID) + rumble instead** — that's the
reachable 80% of "really supporting the DualSense," needs no kernel work, and the parsing is already
written. Keep this doc as the down payment for the audio-haptics feature whenever the three
conditions above are met.
@@ -0,0 +1,300 @@
---
title: "Implementation Plan"
description: "The full design: protocol core, milestones, and architecture."
---
*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.*
> `punktfunk` is a placeholder codename — rename freely. It fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point.
---
## 0. The thesis (why this is worth building)
Two concrete gaps justify a new project rather than another fork:
1. **The 1 Gbps wall is a FEC design limit, not a bandwidth limit.** Moonlight/Sunshine protect each frame with Reed–Solomon over GF(2⁸), which caps a block at 255 shards. At 5120×1440@240 that ceiling is hit around 1 Gbps. Switching the erasure code to **Leopard-RS over GF(2¹⁶)** (via the `reed-solomon-simd` crate) raises the per-block shard limit to 65,536 and runs in O(n log n) with SIMD. The wall disappears as a *consequence* of a better core, not as a hack.
2. **Linux software virtual displays are a real, unfilled gap.** The compositor-side capability now exists (Mutter headless virtual monitors since GNOME 40; wlroots headless outputs; KWin virtual outputs in Plasma 6), but no streaming host *drives* those APIs to create a client-sized output on demand, capture it via PipeWire, and route input back via libei. Apollo's virtual display is Windows-only. This is the immediate, shippable win.
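The shard-limit argument in point 1 is simple arithmetic. A minimal Rust sketch, assuming a fixed 1,024-byte shard payload and 20 % parity (illustrative values, not settings taken from Moonlight/Sunshine or from this project), counts how many independent FEC blocks one 5120×1440 frame at 0.8 bpp needs under each field's per-block limit:

```rust
/// Per-block shard limits (data + parity) for the two fields discussed above.
const GF8_MAX_SHARDS: u64 = 255;
const GF16_MAX_SHARDS: u64 = 65_536;

/// Data shards needed to carry one frame of `frame_bytes`.
fn data_shards(frame_bytes: u64, shard_bytes: u64) -> u64 {
    frame_bytes.div_ceil(shard_bytes)
}

/// Minimum number of FEC blocks so that no block exceeds `max_shards`,
/// adding `parity_pct` percent recovery shards (rounded up).
fn blocks_needed(data: u64, parity_pct: u64, max_shards: u64) -> u64 {
    let parity = (data * parity_pct).div_ceil(100);
    (data + parity).div_ceil(max_shards)
}

fn main() {
    // One 5120x1440 frame at 0.8 bpp: 5120 * 1440 * 0.8 / 8 = 737,280 bytes.
    let frame_bytes: u64 = 5120 * 1440 * 8 / 10 / 8;
    let data = data_shards(frame_bytes, 1024); // assumed shard payload
    println!("data shards per frame: {data}");
    println!("GF(2^8) blocks:  {}", blocks_needed(data, 20, GF8_MAX_SHARDS));
    println!("GF(2^16) blocks: {}", blocks_needed(data, 20, GF16_MAX_SHARDS));
}
```

Under these assumptions the frame needs 720 data shards: four separate GF(2⁸) blocks, but a single GF(2¹⁶) block. Split blocks are protected independently, so a loss burst that lands in one block can exceed that block's parity even while the frame as a whole has recovery shards to spare.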
**Strategic ordering:** ship the Linux virtual-display host speaking the *existing* Moonlight protocol first (every Moonlight/Artemis client works on day one, no client to write). Only then introduce the new GF(2¹⁶) transport as a negotiated protocol extension with our own clients. Value early, hard parts deferred until de-risked.
---
## 1. Scope & non-goals
**In scope (eventually):**
- Linux streaming host with on-demand software virtual displays (KWin first, then wlroots, then Mutter).
- A shared Rust protocol/transport/FEC core exposed over a stable C ABI.
- A modern transport that removes the 1 Gbps ceiling.
- Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core.
**Explicit non-goals (at least at first):**
- Windows *host* support (Sunshine/Apollo already do this well; no gap to fill).
- Internet/NAT-traversal relay infrastructure (LAN/VPN first; you already run Headscale/NetBird — lean on that).
- Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs).
- A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6).
---
## 2. Architecture overview
```mermaid
flowchart TD
subgraph Host["Linux Host (Rust)"]
VD["Virtual display orchestrator<br/>(KWin / wlroots / Mutter)"]
CAP["Capture<br/>(PipeWire / dmabuf)"]
ENC["Encoder<br/>(VAAPI / NVENC via FFmpeg)"]
VD --> CAP --> ENC
ENC --> COREH
IN_H["Input injector<br/>(libei / uinput)"]
COREH["punktfunk-core (C ABI)<br/>protocol · FEC · pacing · crypto"]
COREH --> IN_H
end
COREH <-->|"UDP+FEC video / QUIC control+audio"| COREC
subgraph Client["Client (Rust / Swift / Kotlin)"]
COREC["punktfunk-core (same crate, C ABI)"]
DEC["Decoder<br/>(VideoToolbox / NVDEC / VAAPI)"]
PRES["Present + frame pacing"]
INP["Input capture"]
COREC --> DEC --> PRES
INP --> COREC
end
```
**The load-bearing decision:** `punktfunk-core` is one crate, compiled once, linked by every host and client through a C ABI. Protocol logic, FEC, packet pacing, jitter buffering, pairing, and crypto live there and exist exactly once. Platform code (capture, encode, decode, present, input, UI) lives outside the core and is written in whatever language suits the platform.
---
## 3. Protocol strategy (three phases)
| Phase | Protocol | Clients that work | Bitrate ceiling | Purpose |
|------|----------|-------------------|-----------------|---------|
| **P1** | GameStream-compatible (existing Moonlight wire format) | All existing Moonlight/Artemis clients | ~1 Gbps (legacy GF(2⁸) FEC) | Ship the Linux virtual-display win with zero client work |
| **P2** | `punktfunk/1` negotiated extension: GF(2¹⁶) FEC, multi-block framing, optional QUIC control | punktfunk clients only; falls back to P1 for others | Multi-Gbps | Break the wall; introduce native clients |
| **P3** | `punktfunk/1` as primary; GameStream kept as compat shim | punktfunk everywhere, Moonlight as fallback | Multi-Gbps | Full control of features (mic passthrough, per-client identity, HDR signalling) |
Negotiation: extend the `serverinfo`/RTSP `SETUP` handshake with a capability flag. Old clients never see the flag and get P1 behavior. This is how Apollo/Artemis diverge cleanly, and it keeps you compatible while you build.
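A minimal sketch of that fallback rule, assuming a single hypothetical capability bit (`CAP_PUNKTFUNK_V1`); the real flag's name, bit position, and where it rides in `serverinfo`/`SETUP` are not specified here:

```rust
/// Hypothetical capability bit advertised by punktfunk hosts and clients.
const CAP_PUNKTFUNK_V1: u32 = 1 << 0;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Protocol {
    /// P1: the stock GameStream wire format with GF(2^8) FEC.
    GameStream,
    /// P2: the `punktfunk/1` extension with GF(2^16) FEC.
    PunktfunkV1,
}

/// Stock Moonlight clients never send the flag, so `client_caps` is 0
/// for them and the session stays on P1.
fn negotiate(host_caps: u32, client_caps: u32) -> Protocol {
    if (host_caps & client_caps & CAP_PUNKTFUNK_V1) != 0 {
        Protocol::PunktfunkV1
    } else {
        Protocol::GameStream
    }
}

fn main() {
    let host = CAP_PUNKTFUNK_V1;
    println!("stock Moonlight  -> {:?}", negotiate(host, 0));
    println!("punktfunk client -> {:?}", negotiate(host, CAP_PUNKTFUNK_V1));
}
```

Intersecting both sides' capabilities keeps the rule symmetric: a punktfunk client talking to a host without the extension also lands on P1.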
---
## 4. Tech stack (settled)
**Language split:** Rust for the core and all non-Apple platform code; Swift only for the macOS/iOS client UI + VideoToolbox/Metal; Kotlin for Android UI + MediaCodec. The C ABI is the seam.
**Threading:** native OS threads for the video hot path. `tokio` is allowed *only* for the control plane (pairing, web config, QUIC control stream). The per-frame pipeline must never touch an async runtime.
### Core crate dependencies
| Concern | Crate | Notes |
|--------|-------|-------|
| FEC | `reed-solomon-simd` (v3+) | Leopard/GF(2¹⁶), SIMD, O(n log n) — the wall-breaker |
| QUIC (control/audio) | `quinn` | Datagram ext for audio; reliable streams for control |
| TLS / crypto | `rustls` + `ring` (or `aws-lc-rs`) | Pairing, session keys (AES-GCM to match GameStream in P1) |
| Serialization | `zerocopy` / `bytes` | Wire structs `#[repr(C)]`, zero-copy parse |
| C header gen | `cbindgen` | Generates `punktfunk_core.h` from the ABI module |
| Error/log | `tracing` | Structured; feature-gate off the hot path |
### Linux host dependencies
| Concern | Crate / API | Notes |
|--------|-------------|-------|
| Capture | `pipewire` (pipewire-rs) | ScreenCast portal stream → dmabuf |
| Portal / DBus | `ashpd` + `zbus` | xdg-desktop-portal: ScreenCast, RemoteDesktop |
| Encode | `ffmpeg-next` or `rsmpeg` | VAAPI / NVENC, dmabuf import (zero-copy) |
| Input inject | `reis` (libei) + `input-linux` (uinput fallback) | Wayland-native first, uinput as universal fallback |
| Virtual output | per-compositor (see §6) | KWin DBus / Sway `create_output` / Mutter DBus |
| Web config | `axum` + `tokio` + small Vite/React UI | You own this stack already |
### Apple client (P2+)
Swift + VideoToolbox (decode) + Metal (present) + SwiftUI. Imports `punktfunk_core.h` directly via a module map — no glue layer.
### Ruled out
- **Swift for the host/core:** no Linux Wayland/PipeWire/DRM/VAAPI ecosystem; ARC in hot loops. (Excellent *Apple-client* language, wrong for systems/Linux.)
- **Go:** GC disqualifies the hot path.
- **C++:** throws away the safety/concurrency wins that justified greenfield over forking.
- **Zig:** best-in-class C interop, but pre-1.0 with no Wayland/QUIC ecosystem — too much risk for a multi-month build. Revisit later if desired.
---
## 5. The C ABI boundary
Design it on day one; retrofitting an ABI is painful.
**Principles**
- Opaque handles only across the boundary: `PunktfunkSession*`, never Rust types.
- All cross-boundary structs are `#[repr(C)]`; primitives + pointer/len pairs for buffers.
- Async events via registered C callbacks (`fn ptr` + `void* userdata`).
- Explicit, documented ownership: who frees what, when. Provide `punktfunk_*_free` for every allocation that crosses out.
- Versioned ABI: `uint32_t punktfunk_abi_version(void)` + a `PunktfunkConfig` struct whose first field is its own size for forward-compat.
**Minimal surface (sketch)**
```c
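#include <stddef.h>
#include <stdint.h>
// opaque handle + structs; field layouts are not pinned down in this sketch
typedef struct PunktfunkSession PunktfunkSession;
typedef struct PunktfunkConfig PunktfunkConfig;          // first field: uint32_t struct_size
typedef struct PunktfunkFrame PunktfunkFrame;
typedef struct PunktfunkInputEvent PunktfunkInputEvent;
typedef struct PunktfunkStats PunktfunkStats;
typedef uint32_t PunktfunkFrameFlags;                    // assumed: a bitmask (e.g. keyframe)
typedef void (*PunktfunkInputCb)(const PunktfunkInputEvent* ev, void* user);  // assumed shape
// versioning
uint32_t punktfunk_abi_version(void);
// lifecycle
PunktfunkSession* punktfunk_session_new(const PunktfunkConfig* cfg);
void punktfunk_session_free(PunktfunkSession*);
// host: feed an encoded access unit (the core does FEC + packetize + pace + send)
int punktfunk_host_submit_frame(PunktfunkSession*, const uint8_t* data, size_t len,
                                uint64_t pts_ns, PunktfunkFrameFlags flags);
// client: pull a reassembled, FEC-recovered access unit ready to decode
int punktfunk_client_poll_frame(PunktfunkSession*, PunktfunkFrame* out /*borrowed until next poll*/);
// input (both directions): client captures, host receives via callback
int punktfunk_send_input(PunktfunkSession*, const PunktfunkInputEvent*);
void punktfunk_set_input_callback(PunktfunkSession*, PunktfunkInputCb, void* user);
// stats for the frame-pacing/quality logic and the web UI
void punktfunk_get_stats(PunktfunkSession*, PunktfunkStats* out);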
```
Keep it this small. Everything platform-specific (how you got the encoded bytes, how you decode them) stays on the platform side.
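On the Rust side, the lifecycle half of that surface might look like the sketch below. It assumes a hypothetical `PunktfunkConfig` layout (only the leading `struct_size` field comes from the principles above) and shows two of those rules in practice: opaque handles created with `Box::into_raw` and released by a matching `_free`, and forward-compatible config reads that copy only the bytes both sides know about.

```rust
use std::mem::size_of;
use std::ptr;

/// Value returned by `punktfunk_abi_version` (hypothetical numbering).
const ABI_VERSION: u32 = 1;

/// Only `struct_size` comes from the design; the other fields are placeholders.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct PunktfunkConfig {
    pub struct_size: u32,
    pub max_bitrate_kbps: u32,
    pub fec_percent: u32,
}

impl Default for PunktfunkConfig {
    fn default() -> Self {
        Self { struct_size: size_of::<Self>() as u32, max_bitrate_kbps: 150_000, fec_percent: 20 }
    }
}

/// Not `#[repr(C)]`: C only ever sees a pointer to it.
pub struct PunktfunkSession {
    config: PunktfunkConfig,
}

// `#[unsafe(no_mangle)]` is the Rust 2024 spelling (accepted since Rust 1.82);
// earlier editions also accept plain `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn punktfunk_abi_version() -> u32 {
    ABI_VERSION
}

/// Reads only `min(caller size, our size)` bytes: fields an older caller does not
/// know keep their defaults, and fields a newer caller adds are ignored.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn punktfunk_session_new(cfg: *const PunktfunkConfig) -> *mut PunktfunkSession {
    if cfg.is_null() {
        return ptr::null_mut();
    }
    // SAFETY: the caller guarantees `cfg` points at `struct_size` readable bytes.
    let caller_size = unsafe { ptr::read_unaligned(cfg.cast::<u32>()) } as usize;
    if caller_size < size_of::<u32>() {
        return ptr::null_mut();
    }
    let mut config = PunktfunkConfig::default();
    let n = caller_size.min(size_of::<PunktfunkConfig>());
    unsafe {
        ptr::copy_nonoverlapping(cfg.cast::<u8>(), (&mut config as *mut PunktfunkConfig).cast::<u8>(), n)
    };
    config.struct_size = size_of::<PunktfunkConfig>() as u32;
    Box::into_raw(Box::new(PunktfunkSession { config }))
}

/// Matching `_free` for the handle above; passing null is a no-op.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn punktfunk_session_free(session: *mut PunktfunkSession) {
    if !session.is_null() {
        // SAFETY: `session` came from `punktfunk_session_new` and is freed once.
        drop(unsafe { Box::from_raw(session) });
    }
}

fn main() {
    let cfg = PunktfunkConfig::default();
    let session = unsafe { punktfunk_session_new(&cfg) };
    let bitrate = unsafe { (*session).config.max_bitrate_kbps };
    println!("abi v{}, bitrate {bitrate} kbps", punktfunk_abi_version());
    unsafe { punktfunk_session_free(session) };
}
```

Because `PunktfunkSession` is not `#[repr(C)]`, `cbindgen` would emit it as an opaque forward declaration, which is what keeps Rust types off the boundary.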
---
## 6. Virtual display orchestration
This is the differentiator and the most fragmented part. Two deployment models — support both eventually, pick one for the MVP.
**Model A — Attach to the running session.** Create a client-sized virtual output *inside the user's live desktop*, stream it, tear it down on disconnect. This is "add a monitor to my actual PC." Best UX, hardest because it depends on per-compositor runtime APIs.
**Model B — Dedicated headless session.** Spawn a separate headless compositor purely for the stream (e.g. `gnome-shell --headless --virtual-monitor WxH`, or a headless wlroots compositor). Cleaner isolation, sidesteps runtime-output APIs, ideal for "remote second PC." Worse for "mirror/extend my real desktop."
**Per-compositor (Model A) runtime virtual-output creation:**
- **KWin / Plasma 6 (recommended MVP target — matches your CachyOS/KDE daily driver and where the gap is loudest):** KWin can create virtual outputs; KRdp already does this internally for remote sessions. Drive it via the KWin DBus interface; capture via `xdg-desktop-portal-kde` ScreenCast (PipeWire); inject input via the RemoteDesktop portal or `reis`.
- **wlroots (Sway/Hyprland — fastest to *prototype* the pipeline):** enable the headless backend (`WLR_BACKENDS=…,headless`), then `swaymsg create_output` / `hyprctl output create headless`. Capture via `wlr-screencopy` or the portal. Simplest API; good for validating capture→encode→send before fighting KWin/Mutter.
- **Mutter / GNOME:** virtual monitors via the headless backend; runtime creation via Mutter DBus (`org.gnome.Mutter.*` — partly experimental). Capture via `xdg-desktop-portal-gnome` ScreenCast.
**Recommendation:** do a 1–2 day wlroots spike to prove the *pipeline*, then build the real MVP on KWin because that's your deployment target. Abstract virtual-output creation behind a trait so compositors are pluggable:
```rust
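// `Mode` (client width/height/refresh), `OutputHandle`, and `Result` are crate types.
trait VirtualDisplay {
    /// Create a client-sized virtual output (KWin / wlroots / Mutter backend).
    fn create(&self, mode: Mode) -> Result<OutputHandle>;
    /// Tear the output down again on disconnect.
    fn destroy(&self, h: OutputHandle) -> Result<()>;
}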
```
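One way to make "created on connect, destroyed on disconnect" hard to get wrong is to tie each output to an RAII guard. A minimal sketch under stated assumptions: `Mode`, `OutputHandle`, the `String` error alias, `StreamOutput`, and the in-memory `MockDisplay` are hypothetical stand-ins for the crate's types and for a real KWin/wlroots/Mutter backend.

```rust
use std::cell::{Cell, RefCell};

/// Stand-in for the crate's error type.
type Result<T> = std::result::Result<T, String>;

/// Hypothetical client mode.
#[derive(Debug, Clone, Copy)]
struct Mode {
    width: u32,
    height: u32,
    refresh_hz: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct OutputHandle(u32);

trait VirtualDisplay {
    fn create(&self, mode: Mode) -> Result<OutputHandle>;
    fn destroy(&self, h: OutputHandle) -> Result<()>;
}

/// Hypothetical in-memory backend standing in for a real compositor.
#[derive(Default)]
struct MockDisplay {
    live: RefCell<Vec<OutputHandle>>,
    next: Cell<u32>,
}

impl VirtualDisplay for MockDisplay {
    fn create(&self, mode: Mode) -> Result<OutputHandle> {
        if mode.width == 0 || mode.height == 0 || mode.refresh_hz == 0 {
            return Err(format!("invalid mode {mode:?}"));
        }
        self.next.set(self.next.get() + 1);
        let h = OutputHandle(self.next.get());
        self.live.borrow_mut().push(h);
        Ok(h)
    }

    fn destroy(&self, h: OutputHandle) -> Result<()> {
        let mut live = self.live.borrow_mut();
        let i = live.iter().position(|x| *x == h).ok_or("unknown output")?;
        live.remove(i);
        Ok(())
    }
}

/// Owns one virtual output for the lifetime of a client connection.
struct StreamOutput<'a, D: VirtualDisplay> {
    display: &'a D,
    handle: OutputHandle,
}

impl<'a, D: VirtualDisplay> StreamOutput<'a, D> {
    fn open(display: &'a D, mode: Mode) -> Result<Self> {
        Ok(Self { display, handle: display.create(mode)? })
    }
}

impl<D: VirtualDisplay> Drop for StreamOutput<'_, D> {
    fn drop(&mut self) {
        // Drop cannot return errors, so a failed teardown can only be logged.
        if let Err(e) = self.display.destroy(self.handle) {
            eprintln!("failed to destroy {:?}: {e}", self.handle);
        }
    }
}

fn main() {
    let display = MockDisplay::default();
    {
        let mode = Mode { width: 2560, height: 1440, refresh_hz: 60 };
        let out = StreamOutput::open(&display, mode).expect("create output");
        println!("streaming on {:?}", out.handle);
    } // client disconnected: `out` is dropped and the output destroyed
    assert!(display.live.borrow().is_empty());
}
```

With teardown in `Drop`, every exit path from a session (normal disconnect, an early `?` return, an unwinding panic) removes the output; the trade-off is that a failed `destroy` can only be logged, not returned.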
---
## 7. The hot path: pipeline & latency budget
Per-frame pipeline, each stage on its own thread, connected by bounded SPSC channels (drop-oldest on overflow, never block the encoder):
```
capture(dmabuf) → encode(NVENC/VAAPI) → core[FEC+packetize+pace+send]
                                        │ network
client: recv → core[reorder+FEC recover+jitter] → decode → present
```
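A minimal sketch of one such drop-oldest channel, built on `std` only (`Mutex` + `Condvar`); the `DropOldest` name is hypothetical. The producer never waits for the consumer, only for a short mutex hold, so a production hot path might swap in a lock-free ring buffer with the same eviction rule.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Bounded SPSC queue: when full, `push` evicts the oldest item instead of blocking.
struct DropOldest<T> {
    inner: Mutex<VecDeque<T>>,
    ready: Condvar,
    cap: usize,
}

impl<T> DropOldest<T> {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        Self { inner: Mutex::new(VecDeque::with_capacity(cap)), ready: Condvar::new(), cap }
    }

    /// Producer side (e.g. the encoder thread). Returns true if an old item was dropped.
    fn push(&self, item: T) -> bool {
        let mut q = self.inner.lock().unwrap();
        let dropped = if q.len() == self.cap {
            q.pop_front();
            true
        } else {
            false
        };
        q.push_back(item);
        drop(q); // release the lock before waking the consumer
        self.ready.notify_one();
        dropped
    }

    /// Consumer side: blocks until an item is available.
    fn pop(&self) -> T {
        let mut q = self.inner.lock().unwrap();
        loop {
            if let Some(item) = q.pop_front() {
                return item;
            }
            q = self.ready.wait(q).unwrap();
        }
    }
}

fn main() {
    let q = Arc::new(DropOldest::new(2));
    let producer = {
        let q = Arc::clone(&q);
        // Five "frames" into a queue of two: the three oldest get evicted.
        thread::spawn(move || (0..5u32).filter(|&frame| q.push(frame)).count())
    };
    let dropped = producer.join().unwrap();
    println!("dropped {dropped}, next frames: {} {}", q.pop(), q.pop());
}
```

Dropping at the producer bounds latency: a slow stage downstream sees the newest frames rather than a growing backlog.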
**Glass-to-glass budget (LAN, 240 Hz = 4.17 ms/frame):**
| Stage | Target | Notes |
|------|--------|-------|
| Capture latency | ≤ 1 frame | dmabuf, no copy to CPU |
| Encode | 1–4 ms | NVENC low-latency preset; tune lookahead off |
| FEC + packetize | < 1 ms | SIMD RS; pre-allocated shard buffers |
| Network (LAN) | < 1 ms | `sendmmsg` / UDP GSO to cut syscalls |
| Jitter buffer | 0–1 frame | adaptive; minimum that hides observed jitter |
| FEC recover + reassemble | < 1 ms | only when loss occurs |
| Decode | 1–4 ms | hardware decoder |
| Present | ≤ 1 frame | align to client vsync |
**Target: 15–35 ms glass-to-glass on LAN.** The art is *frame pacing* — matching capture/encode cadence to the client's actual refresh and keeping the jitter buffer as small as the link allows. This, not the codec, is what separates good from bad streaming. Budget real time for it.
**Throughput math to keep honest:** 5120×1440@240 ≈ 1.77 Gpx/s. At 0.5 bpp that's ~885 Mbps; 0.6 bpp ≈ 1.06 Gbps; 0.8 bpp (4:4:4 headroom) ≈ 1.4 Gbps. The GF(2¹⁶) FEC + multi-block framing must sustain these without the per-frame shard count being the limiter — which it no longer is once you leave GF(2⁸).
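The throughput figures above can be reproduced with a few lines of Rust (a sketch; the function names are hypothetical):

```rust
/// Pixel rate in pixels per second.
fn pixel_rate(width: u64, height: u64, hz: u64) -> u64 {
    width * height * hz
}

/// Video bitrate in bits per second for a given bits-per-pixel budget.
fn bitrate_bps(px_per_s: u64, bpp: f64) -> f64 {
    px_per_s as f64 * bpp
}

fn main() {
    let px = pixel_rate(5120, 1440, 240);
    println!("pixel rate: {:.2} Gpx/s", px as f64 / 1e9);
    for bpp in [0.5, 0.6, 0.8] {
        println!("{bpp} bpp -> {:.0} Mbps", bitrate_bps(px, bpp) / 1e6);
    }
}
```

At 0.8 bpp a single frame is 5120 × 1440 × 0.8 / 8 = 737,280 bytes, which is the per-frame payload the FEC framing has to split into shards 240 times a second.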
---
## 8. Milestones
Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat as ordering, not deadlines.
**M0 — Pipeline spike (S).** wlroots headless output → PipeWire capture → VAAPI/NVENC encode → dump H.265 to a file that plays. *Acceptance:* a valid encoded file from a virtual output, no streaming yet. Proves the Linux capture+encode chain end-to-end.
**M1 — `punktfunk-core` skeleton + C ABI (M).** Session lifecycle, GameStream-compatible packetization and GF(2⁸) FEC (P1), AES-GCM, `cbindgen` header, a tiny C test harness. *Acceptance:* core links from C; round-trips packets in a loopback test with simulated loss.
**M2 — P1 host: stream to stock Moonlight (L).** Wire M0's pipeline into the core; implement `serverinfo`/pairing/RTSP enough for a real Moonlight client to connect, with a KWin virtual output created on connect and destroyed on disconnect. Input via `reis`/uinput. *Acceptance:* **you play a game on your KDE box streamed to a stock Moonlight client on a virtual display, no dummy plug, no kernel args.** This is the shippable milestone and the project's reason to exist.
**M3 — Measurement harness (S).** Glass-to-glass latency measurement (on-screen QR/timestamp or photodiode), packet-loss injection, frame-pacing and stall metrics surfaced in the web UI. *Acceptance:* you can quantify a regression. Build this before optimizing anything.
**M4 — P2 transport: break the wall (L).** Add `punktfunk/1` negotiation; swap to `reed-solomon-simd` GF(2¹⁶) with multi-block per-frame framing; optional QUIC control/audio. Write a minimal **Rust** reference client (decode via VAAPI, present via wgpu/Vulkan) to exercise it. *Acceptance:* a stable stream above 1.4 Gbps at 5120×1440@240 with loss recovery working; latency unchanged vs. M2.
**M5 — Apple client (L).** Swift + VideoToolbox + Metal + SwiftUI, linking `punktfunk-core` via the C header. *Acceptance:* the Mac Studio plays a stream at native resolution/refresh.
**M6 — Feature surface (M, ongoing).** Mic passthrough as a proper encrypted, per-client reverse audio stream (the thing the upstream PR got wrong); HDR signalling; per-client identity/permissions; pause/resume. *Acceptance:* feature parity with Apollo on the items you care about, plus mic done right.
---
## 9. Risk register
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| KWin runtime virtual-output API is undocumented/unstable | High | High | Spike on wlroots first to de-risk the pipeline; study KRdp's source for the KWin path; keep `VirtualDisplay` pluggable so a stuck compositor doesn't block the project |
| Wayland input injection gaps (libei still evolving) | Med | Med | uinput fallback always available; `reis` for the Wayland-native path |
| dmabuf → encoder zero-copy import quirks per GPU/driver | High | Med | Validate on your actual NVIDIA + AMD hardware early (M0); have a CPU-copy fallback path |
| Encoder/decoder can't sustain 1.77 Gpx/s @ 240 | Med | High | Measure in M0/M4 on real silicon; this is a hardware ceiling no rewrite fixes — discover it before P2, not after |
| Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step |
| Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client |
| Solo bandwidth vs. your other projects (ENRW thesis, played) | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone |
---
## 10. Testing & measurement
- **Loopback correctness:** core encodes→FEC→loss-inject→recover→decode in-process; property tests over loss patterns and shard counts (proptest).
- **Glass-to-glass latency:** rendered timestamp/QR on host, read back on client capture; or a photodiode for true photons. Track p50/p99.
- **Loss resilience:** `tc netem` to inject loss/jitter/reorder; verify FEC recovery and graceful degradation.
- **Pacing:** log present timestamps vs. client vsync; alert on stalls and duplicate/dropped frames.
- **Soak:** multi-hour streams; watch for buffer growth, fd leaks, encoder session exhaustion.
- **Hardware matrix:** your NVIDIA box (NVENC), an AMD/Intel box (VAAPI), Mac Studio (VideoToolbox decode). Catch driver quirks early.
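The loopback loop can be exercised before the real FEC lands. A minimal sketch that swaps in single-parity XOR for Reed–Solomon (it repairs at most one lost shard per block, far weaker than the RS codes the core uses; all function names are hypothetical): it splits a payload into shards, drops one "on the wire", and checks that the reassembled bytes match. A proptest version would randomize the payload length, the shard count, and which shard is lost.

```rust
/// Split `payload` into `k` equal-length shards, zero-padding the tail.
fn split(payload: &[u8], k: usize) -> Vec<Vec<u8>> {
    let len = payload.len().div_ceil(k);
    (0..k)
        .map(|i| {
            let mut shard = vec![0u8; len];
            let start = (i * len).min(payload.len());
            let end = ((i + 1) * len).min(payload.len());
            shard[..end - start].copy_from_slice(&payload[start..end]);
            shard
        })
        .collect()
}

/// XOR of all shards (assumes at least one shard, all of equal length).
fn xor_parity(shards: &[Vec<u8>]) -> Vec<u8> {
    let mut parity = vec![0u8; shards[0].len()];
    for shard in shards {
        for (p, b) in parity.iter_mut().zip(shard) {
            *p ^= b;
        }
    }
    parity
}

/// Rebuild at most one missing shard; more losses than that are unrecoverable.
fn recover(mut received: Vec<Option<Vec<u8>>>, parity: &[u8]) -> Option<Vec<Vec<u8>>> {
    let missing: Vec<usize> = received
        .iter()
        .enumerate()
        .filter(|(_, s)| s.is_none())
        .map(|(i, _)| i)
        .collect();
    match missing.as_slice() {
        [] => {}
        [i] => {
            // parity XOR every surviving shard = the lost shard
            let mut rebuilt = parity.to_vec();
            for shard in received.iter().flatten() {
                for (r, b) in rebuilt.iter_mut().zip(shard) {
                    *r ^= b;
                }
            }
            received[*i] = Some(rebuilt);
        }
        _ => return None,
    }
    received.into_iter().collect()
}

fn main() {
    let payload: Vec<u8> = (0..=255u8).cycle().take(1000).collect();
    let shards = split(&payload, 4);
    let parity = xor_parity(&shards);
    // Inject loss: shard 2 never arrives.
    let mut received: Vec<Option<Vec<u8>>> = shards.into_iter().map(Some).collect();
    received[2] = None;
    let mut out = recover(received, &parity).expect("one loss is recoverable").concat();
    out.truncate(payload.len()); // strip zero padding
    assert_eq!(out, payload);
    println!("recovered {} bytes after losing 1 of 4 shards", out.len());
}
```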
---
## 11. Repo / workspace structure
```
punktfunk/
├── Cargo.toml # workspace
├── crates/
│ ├── punktfunk-core/ # protocol, FEC, pacing, crypto — C ABI (cdylib + staticlib)
│ │ ├── src/abi.rs # #[no_mangle] extern "C" surface
│ │ ├── src/fec.rs # GF(2^16) blocking over reed-solomon-simd
│ │ ├── src/transport/ # udp+fec video, quinn control/audio
│ │ ├── src/protocol/ # gamestream-compat (P1) + punktfunk/1 (P2)
│ │ └── cbindgen.toml
│ ├── punktfunk-host/ # Linux host binary
│ │ ├── src/capture/ # pipewire / portal
│ │ ├── src/encode/ # ffmpeg vaapi/nvenc
│ │ ├── src/vdisplay/ # trait + kwin/wlroots/mutter impls
│ │ ├── src/input/ # reis + uinput
│ │ └── src/web/ # axum config/pairing API
│ └── punktfunk-client-rs/ # reference Rust client (M4)
├── clients/
│ ├── apple/ # Swift package, imports punktfunk_core.h (M5)
│ └── android/ # Kotlin + JNI (later)
├── include/ # generated punktfunk_core.h
└── tools/
├── latency-probe/
└── loss-harness/
```
---
## 12. Immediate next actions (first week)
1. **Stand up the workspace** with `punktfunk-core` (empty ABI + `cbindgen`) and `punktfunk-host` skeletons; CI on your Gitea (you already have BuildKit pipelines).
2. **M0 spike on wlroots:** headless output → PipeWire capture → NVENC/VAAPI encode → playable file. This validates the riskiest *pipeline* assumptions in days, on your real GPU.
3. **Read KRdp's source** for how KDE creates virtual outputs and casts them — it's the closest existing reference for the KWin path you'll need in M2.
4. **Decide P1 protocol depth:** confirm exactly which `serverinfo`/RTSP/pairing messages a current Moonlight client requires for a successful connect, so M2's compat surface is scoped precisely (this is also the question to take back to the dev who mentioned the 1G limit).
---
*The shape of the bet: M2 alone — virtual-display streaming to stock Moonlight clients on Linux — is a complete, useful, gap-filling release. Everything after it (the wall-breaking transport, native clients, mic-done-right) is upside you unlock from a position of having already shipped, with the hard transport work resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.*
@@ -0,0 +1,21 @@
---
title: Introduction
description: Low-latency desktop and game streaming, Linux-first.
---
import { Cards, Card } from 'fumadocs-ui/components/card'
**punktfunk** is a ground-up low-latency desktop and game streaming stack, built Linux-first,
with a shared Rust protocol core (`punktfunk-core`) exposed over a C ABI and native clients per
platform. It speaks two protocols: GameStream (so a stock Moonlight client just works) and the
native `punktfunk/1` (QUIC control plane + a hardened UDP data plane with GF(2¹⁶) Leopard FEC and
AES-GCM).
## Start here
<Cards>
<Card title="Project Overview" href="/docs/overview" description="Status, architecture, and milestones at a glance." />
<Card title="Implementation Plan" href="/docs/implementation-plan" description="The full design: protocol core, milestones, and architecture." />
<Card title="Roadmap" href="/docs/roadmap" description="Decided next goals and the longer-term bets." />
<Card title="Linux Host Setup" href="/docs/linux-setup" description="Bring up the build environment on an NVIDIA-GPU Ubuntu VM." />
</Cards>
@@ -0,0 +1,163 @@
---
title: "Linux Host Setup"
description: "Bring up the build environment on an NVIDIA-GPU Ubuntu VM."
---
How to bring up the build environment for the punktfunk Linux host on an NVIDIA-GPU Ubuntu VM
and run the **M0** capture→encode spike. `punktfunk-core` already builds and is tested
cross-platform; this is about the platform backends in `crates/punktfunk-host`.
> Target **Ubuntu 24.04 (noble)**: Sway 1.9, FFmpeg 6.1.1, xdg-desktop-portal 1.18.
> 22.04 (jammy) ships Sway 1.7 / FFmpeg 4.4 — too old for this path; build from source or
> upgrade. Package names/versions below were verified against the live Ubuntu archive.
## 1. Bootstrap
```sh
git clone git@git.unom.io:unom/punktfunk.git && cd punktfunk && git checkout m1-punktfunk-core
bash scripts/bootstrap-ubuntu.sh
```
It **verifies** the (already-installed) NVIDIA + NVENC stack, installs the Rust toolchain
(rustup) and the build/runtime deps (PipeWire, xdg-desktop-portal + the wlroots backend,
Sway, Wayland/DRM/EGL/GBM/VA dev libs, capture tools), **gates** the FFmpeg `-dev`
headers so it can't clobber your custom NVENC FFmpeg, and drops headless-Sway + portal
config templates into `~/.config` (only if absent). It does **not** reboot or edit GRUB.
After it runs, sanity-check the core on Linux:
```sh
cargo test --workspace # 21 tests; same suite that's green on macOS
```
## 2. NVIDIA prerequisites (one-time, may need a reboot)
Wayland on NVIDIA requires KMS modeset. The bootstrap checks it; if it isn't `Y`:
```sh
echo 'options nvidia-drm modeset=1 fbdev=1' | sudo tee /etc/modprobe.d/nvidia-drm.conf
sudo update-initramfs -u && sudo reboot
cat /sys/module/nvidia_drm/parameters/modeset # must print Y after reboot
```
- Driver **≥ 535** is the floor for headless wlroots (EGL/dmabuf); 550+ recommended.
- **Install the NVIDIA GL/EGL userspace, not just `nvidia-utils`:**
`sudo apt install libnvidia-gl-<NNN>` (matching the driver, e.g. `libnvidia-gl-595`).
`nvidia-utils-NNN` ships nvidia-smi + NVENC but **not** `libEGL_nvidia.so.0` or the GLVND
vendor JSON (`/usr/share/glvnd/egl_vendor.d/10_nvidia.json`). Without them libglvnd falls
back to Mesa, wlroots can't init EGL on the GPU and drops to the **pixman** software
renderer — and the ScreenCast portal then fails to negotiate a buffer format
(`unable to receive a valid format from wlr_screencopy`). Verify after install:
`ls /usr/share/glvnd/egl_vendor.d/10_nvidia.json && ldconfig -p | grep libEGL_nvidia`.
A correct GPU Sway logs `EGL vendor: NVIDIA` and a list of DMA-BUF formats.
- **Join the `render` + `video` groups:** `sudo usermod -aG render,video $USER`, then
**re-login** (group changes only apply to new logins). wlroots opens
`/dev/dri/renderD128` (group `render`) and `/dev/dri/card*` (group `video`), both 0660;
without membership Sway aborts with `Permission denied`. (`scripts/headless/*.sh` bridge a
not-yet-re-logged-in shell with `sg render`, but re-login is the clean fix.)
- A **headless VM GPU exposes no DRM connectors** — that's expected. We don't use the DRM
backend; `WLR_BACKENDS=headless` renders to an offscreen GBM/EGL surface and creates a
virtual `HEADLESS-1` output. Use the render node `/dev/dri/renderD128`.
- **NVENC in a VM:** full PCI **passthrough** = bare-metal NVENC, no license. **vGPU**
needs a valid license (vWS) or NVENC runs degraded — the bootstrap's smoke-encode tells
you if it actually works. Consumer GeForce cards also cap concurrent NVENC sessions
(~8); datacenter/RTX-pro are effectively unlimited — relevant once we serve many clients.
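The bootstrap already performs these checks. You may want the same preflight inside a Rust tool, for example before a session starts. A minimal std-only sketch follows:

- `modeset_enabled` and `nvidia_preflight` are hypothetical names.
- The paths are the ones given above.
- It does not replicate the `ldconfig -p | grep libEGL_nvidia` check.

```rust
use std::fs;
use std::path::Path;

/// `/sys/module/nvidia_drm/parameters/modeset` reads "Y" when KMS modeset is on.
fn modeset_enabled(sysfs_value: &str) -> bool {
    sysfs_value.trim() == "Y"
}

/// Hypothetical preflight: returns every problem found (empty = ready for headless Sway).
fn nvidia_preflight() -> Vec<String> {
    let mut problems = Vec::new();
    match fs::read_to_string("/sys/module/nvidia_drm/parameters/modeset") {
        Ok(v) if modeset_enabled(&v) => {}
        Ok(v) => problems.push(format!("nvidia-drm modeset is {:?}, expected Y", v.trim())),
        Err(e) => problems.push(format!("cannot read nvidia_drm modeset (module not loaded?): {e}")),
    }
    // Without the GLVND vendor JSON, libglvnd falls back to Mesa and wlroots drops to pixman.
    if !Path::new("/usr/share/glvnd/egl_vendor.d/10_nvidia.json").exists() {
        problems.push("missing NVIDIA EGL vendor JSON: install libnvidia-gl-<NNN>".into());
    }
    // wlroots opens the render node read-write; this fails with EACCES until the
    // user is in the `render` group in the current login.
    if let Err(e) = fs::OpenOptions::new().read(true).write(true).open("/dev/dri/renderD128") {
        problems.push(format!("cannot open /dev/dri/renderD128: {e}"));
    }
    problems
}

fn main() {
    for p in nvidia_preflight() {
        eprintln!("preflight: {p}");
    }
}
```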
## 3. Bring up the headless compositor + prove capture→NVENC
```sh
# shell 1 — start headless GPU Sway on the shared user bus (blocks; -d for debug log)
bash scripts/headless/run-headless-sway.sh # success logs "EGL vendor: NVIDIA"
# shell 2 — same user: set the client mode, import the portal env, write the env file
bash scripts/headless/prepare-session.sh 2560x1440@60Hz
source /tmp/punktfunk-sway-env.sh
swaymsg -t get_outputs # confirm HEADLESS-1 active
swaymsg exec foot # optional: animated content to capture
bash scripts/headless/capture-smoke-test.sh # wf-recorder (wlr-screencopy) -> hevc_nvenc
ffprobe /tmp/punktfunk-headless-test.mkv # confirm a real H.265 stream
```
`wf-recorder` uses `wlr-screencopy` directly (no portal/D-Bus) — the fastest way to
de-risk the GPU encode path. **Note:** screencopy encodes straight to a file and *cannot*
feed PipeWire; the real integration uses the ScreenCast portal (see M0). If shell 1 logged
a Mesa/EGL fallback (or Sway dropped to pixman) instead of `EGL vendor: NVIDIA`, install the
NVIDIA GL userspace (§2) — the portal cannot capture a pixman output.
**An idle headless output produces no frames** (its frame clock is driven by damage); give
it a real refresh mode (`prepare-session.sh` does) *and* run something animated
(`swaymsg exec foot`) or the capture will be ~1 frame.
The wlroots-on-NVIDIA env workarounds (`WLR_RENDERER=gles2`, `WLR_NO_HARDWARE_CURSORS=1`,
`GBM_BACKEND=nvidia-drm`, `sway --unsupported-gpu`, …) live in
`scripts/headless/env.sh`; `source` it before launching anything Wayland.
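You may launch the compositor from Rust instead of the helper scripts. In that case, the same workarounds can be attached to a `std::process::Command`. This is a hedged sketch:

- The variables are the ones named above, plus `WLR_BACKENDS=headless` from §2.
- `headless_sway_command` is a hypothetical helper.
- `env.sh` stays the source of truth and may set more than this.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Builds (but does not spawn) a headless Sway command with the
/// wlroots-on-NVIDIA workarounds applied.
fn headless_sway_command() -> Command {
    let mut cmd = Command::new("sway");
    cmd.arg("--unsupported-gpu")
        .env("WLR_BACKENDS", "headless") // offscreen output; no DRM connectors needed
        .env("WLR_RENDERER", "gles2")
        .env("WLR_NO_HARDWARE_CURSORS", "1")
        .env("GBM_BACKEND", "nvidia-drm");
    cmd
}

fn main() {
    let cmd = headless_sway_command();
    for (k, v) in cmd.get_envs() {
        println!("{}={}", k.to_string_lossy(), v.map(OsStr::to_string_lossy).unwrap_or_default());
    }
    // cmd.spawn() would start Sway; omitted so the sketch runs on any machine.
}
```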
## 4. M0 proper — wire it into `punktfunk-core`
Goal (plan §8): headless output → PipeWire ScreenCast → NVENC → a playable file, then feed
the encoded access units into a `punktfunk_core::Session` (host role). The module seams exist
in `crates/punktfunk-host/src/{vdisplay,capture,encode,inject,pipeline}.rs`.
**Status: implemented and verified end-to-end** in `crates/punktfunk-host` (`m0.rs`,
`capture/linux.rs`, `encode/linux.rs`). After the §3 bring-up:
```sh
source /tmp/punktfunk-sway-env.sh
swaymsg exec foot # animated content
# Live portal capture → NVENC HEVC → playable file, with each AU also round-tripped
# through a punktfunk_core host→client Session (FEC + packetize + reassemble) and verified:
cargo run -p punktfunk-host -- m0 --source portal --seconds 5 --out /tmp/punktfunk-m0.h265
ffprobe /tmp/punktfunk-m0.h265
# No capture session needed (encode + core only): --source synthetic
```
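`ffprobe` remains the authoritative check. For a dependency-free sanity check of the raw Annex-B `.h265` file written above, you can use the sketch below; it is illustrative, not code from the repo. It does three things:

- Scans for `00 00 01` start codes.
- Reads the 6-bit HEVC `nal_unit_type` from the first header byte.
- Buckets the result.

A healthy stream should show VPS/SPS/PPS, IRAP units (such as IDR), and regular VCL units.

```rust
use std::collections::BTreeMap;

/// Returns the HEVC `nal_unit_type` of every NAL unit that follows an Annex-B
/// start code. Matching `00 00 01` also covers the 4-byte `00 00 00 01` form.
fn hevc_nal_types(stream: &[u8]) -> Vec<u8> {
    let mut types = Vec::new();
    let mut i = 0;
    while i + 3 < stream.len() {
        if stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1 {
            // First NAL header byte: forbidden_zero_bit (1) | nal_unit_type (6) | layer-id MSB (1)
            types.push((stream[i + 3] >> 1) & 0x3f);
            i += 3;
        } else {
            i += 1;
        }
    }
    types
}

/// Buckets NAL types into the categories worth eyeballing.
fn summarize(types: &[u8]) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for &t in types {
        let kind = match t {
            0..=9 => "non-IRAP VCL",
            16..=21 => "IRAP (BLA/IDR/CRA)",
            32 => "VPS",
            33 => "SPS",
            34 => "PPS",
            _ => "other",
        };
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let path = std::env::args().nth(1).unwrap_or_else(|| "/tmp/punktfunk-m0.h265".to_string());
    match std::fs::read(&path) {
        Ok(data) => {
            for (kind, n) in summarize(&hevc_nal_types(&data)) {
                println!("{kind:>20}: {n}");
            }
        }
        Err(e) => eprintln!("cannot read {path}: {e}"),
    }
}
```

It only understands raw elementary streams, not the `.mkv` container produced by the §3 smoke test.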
Verified result: `1920x1080` HEVC, ~300 frames in 5s, `punktfunk-core loopback … 0 mismatches`.
The portal negotiates packed **`RGB` (24-bit, 3 bpp)** on wlroots; the encoder expands it to
`rgb0` (one pad byte/pixel, no colour math) since NVENC accepts `rgb0`/`bgr0` but not
`rgb24`. dmabuf zero-copy import is still deferred (plan §9) — this is the CPU-copy path.
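The `RGB` → `rgb0` expansion is a per-pixel copy plus one pad byte. The std-only sketch below uses an illustrative function name, not the crate's. It also honours a source row stride, on the assumption that the negotiated PipeWire buffer may pad rows beyond `width * 3`.

```rust
/// Expands packed 24-bit RGB (3 bytes/pixel) into `rgb0` (4 bytes/pixel, the
/// fourth byte is padding). `src_stride` is the byte length of one source row.
fn rgb24_to_rgb0(src: &[u8], width: usize, height: usize, src_stride: usize) -> Vec<u8> {
    if height == 0 || width == 0 {
        return Vec::new();
    }
    assert!(src_stride >= width * 3, "stride smaller than one row of pixels");
    assert!(src.len() >= src_stride * (height - 1) + width * 3, "buffer too short");
    let mut dst = Vec::with_capacity(width * height * 4);
    for row in src.chunks(src_stride).take(height) {
        for px in row[..width * 3].chunks_exact(3) {
            dst.extend_from_slice(px); // R, G, B copied unchanged: no colour math
            dst.push(0); // pad byte
        }
    }
    dst
}

fn main() {
    // 2x1 image: one red pixel, one blue pixel, no row padding.
    let rgb: [u8; 6] = [255, 0, 0, 0, 0, 255];
    println!("{:?}", rgb24_to_rgb0(&rgb, 2, 1, 6));
}
```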
Crate choices, verified current:
- **Capture (portal path):** [`ashpd`](https://docs.rs/ashpd) **0.13** with the
`screencast` feature (the `pipewire` feature is *not* needed — `open_pipe_wire_remote`
is unconditional). Flow (0.13 API, verified against the vendored source): `Screencast::new` →
`create_session(Default)` →
`select_sources(&session, SelectSourcesOptions::default().set_sources(BitFlags::from_flag(SourceType::Monitor))…)` →
`start(&session, None, Default)` → `.response()?` → `Stream::pipe_wire_node_id()` + `open_pipe_wire_remote()`.
Note 0.13 takes **options structs**, not the old positional args, and defaults to the
**tokio** runtime — drive the handshake on a *multi-thread* tokio runtime (a
current-thread one starves zbus's reader and the portal reports "Invalid session").
Pull frames with [`pipewire`](https://docs.rs/pipewire) **0.9** — it must match the
pipewire crate ashpd 0.13 links (the `pipewire-sys` `links` key is unique per build, so
`0.10` fails to resolve). 0.9 uses `MainLoopRc`/`ContextRc::connect_fd_rc(OwnedFd)`/
`StreamBox`. Only request `SourceType::Monitor` — the wlr backend's
`AvailableSourceTypes` is `1` (Monitor only); asking for `Window`/`Virtual` invalidates
the session. Set `XDG_CURRENT_DESKTOP=sway` so the wlr portal backend is chosen, and
import it into the portal's environment (see "Portal bring-up" below).
- **Encode:** [`ffmpeg-next`](https://crates.io/crates/ffmpeg-next) **8.x** (binds the
system FFmpeg 8.x via pkg-config; needs `clang`/`libclang`). Select the encoder by
name — `encoder::find_by_name("hevc_nvenc")`, *not* by codec id (that's the SW encoder).
Low-latency opts: `preset=p1`, `tune=ull`, `rc=cbr`, `bf=0`, `delay=0`, large `g`.
If your FFmpeg is in a non-standard prefix, `export FFMPEG_DIR=/that/prefix`.
- **Zero-copy is the hard part.** There's no direct dmabuf→CUDA import in FFmpeg.
**Start with the CPU-copy fallback** (download frame → `hwupload_cuda` → `hevc_nvenc`)
to get an end-to-end stream, then chase true dmabuf zero-copy. The plan flags this
(§9) and the `capture` module already has a `cpu_bytes` fallback field.
- **Input (M2):** [`reis`](https://crates.io/crates/reis) (pure-Rust libei — no native
`libei` needed) with `input-linux`/uinput as the universal fallback.
Then continue toward **M2**: `serverinfo`/RTSP/pairing enough for a stock Moonlight client
to connect, a KWin virtual output created on connect, input via reis/uinput — the
shippable milestone.
## Troubleshooting
| Symptom | Fix |
|---|---|
| Sway aborts on NVIDIA | add `--unsupported-gpu` (the helper scripts do) |
| `not a KMS device` / no connectors | expected on a headless VM GPU — use `WLR_BACKENDS=headless`, not the DRM backend |
| Sway won't start at all | `WLR_RENDERER_ALLOW_SOFTWARE=1 WLR_RENDERER=pixman` to prove the pipeline, then fix EGL |
| ScreenCast portal finds no output | ensure `xdg-desktop-portal-wlr` is running in the same session, `XDG_CURRENT_DESKTOP=sway`, and `~/.config/xdg-desktop-portal-wlr/config` has `output_name=HEADLESS-1` |
| `Cannot load libnvidia-encode.so.1` | NVENC runtime lib missing (driver) or unlicensed vGPU |
| `cargo build` can't find FFmpeg | `export FFMPEG_DIR=$(pkg-config --variable=prefix libavcodec)` or point `PKG_CONFIG_PATH` at the custom build |
| bindgen: libclang not found | `export LIBCLANG_PATH=$(llvm-config --libdir)` |
+95
@@ -0,0 +1,95 @@
---
title: "M2 — Moonlight Host"
description: "Stream to a stock Moonlight client on a client-sized virtual display."
---
The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host,
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display.
Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
(distilled from Sunshine + moonlight-common-c source; cite those for byte-level detail).
## Architecture (respects the "one core" invariant)
- **punktfunk-core** gains a **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`, the
hook already exists): the exact RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard
layout, and the video/audio AES-GCM/CBC paths. Hot path, native threads, **no async**.
Kept beside punktfunk's native internal format (P2), selected by phase.
- **punktfunk-host** gains the **control plane** (tokio/axum OK — I/O-bound, not the hot path):
mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet
control stream + input injection, the virtual-display lifecycle, and Opus audio encode.
## Port map (base 47989; Moonlight derives all by offset)
| Port | Proto | Role |
|---|---|---|
| 47989 | TCP | HTTP nvhttp (unpaired: /serverinfo, /pair PIN flow) |
| 47984 | TCP | HTTPS nvhttp (paired; **client-cert pinned**) — /launch, /resume, … |
| 48010 | TCP | RTSP (OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY) |
| 47998 | UDP | Video RTP (+ RS-FEC, optional AES-GCM) |
| 47999 | UDP | Audio RTP (Opus, RS-FEC 4+2, optional **AES-CBC**) |
| 48000 | UDP | ENet control stream (AES-GCM) + remote input |
| 5353 | UDP | mDNS `_nvstream._tcp.local` advertisement |
## Key wire facts (the non-obvious ones)
- **Video datagram** = `RTP_PACKET(12, BIG-endian)` + `reserved[4]` + `NV_VIDEO_PACKET(16,
LITTLE-endian)` + payload. Endianness differs *within the same packet*. `header=0x80|0x10`.
- `fecInfo` (u32 LE) = `(dataShards<<22)|(fecIndex<<12)|(fecPercentage<<4)`; parityShards is
**recomputed** by the client as `ceil(dataShards*pct/100)` — must match exactly.
- `multiFecBlocks` = `(blockIdx<<4)|((nBlocks-1)<<6)`; **≤4 FEC blocks/frame**, ≤255 shards/block.
- Each frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`
(`headerType=0x01`, `frameType` 2=IDR, `lastPayloadLen`) before striping into shards.
- Shard size = `packetSize + 16`. Data shards first, then parity, over a contiguous RTP
sequence range. Last data shard zero-padded.
- **Video crypto** (when `SS_ENC_VIDEO` negotiated): AES-128-GCM, key = raw 16-byte RIKEY
(from `/launch?rikey=`), IV = `counter_le[8]||0,0,0||'V'(0x56)`, **NO AAD**, 32-byte
`ENC_VIDEO_HEADER{iv[12],frameNumber,tag[16]}` prefix; **FEC first, then encrypt per shard**.
- **Pairing**: PIN key = `SHA-256(salt[16] || ascii_pin)[..16]`; AES-128-**ECB** (no padding)
for the challenge blocks; SHA-256 rolling hashes; RSA-SHA256 signatures over X.509 certs;
the client cert is pinned for subsequent HTTPS. 4 phases over `/pair?phrase=…`.
- **RTSP** `Session: DEADBEEFCAFE;timeout = 90` (literal), `Transport: server_port=<p>`,
`streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
(`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.
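The bit-packing and IV rules above can be written out as a std-only sketch with hypothetical function names. Two notes:

- `parity_shards` uses integer ceiling division, which equals `ceil(dataShards*pct/100)` exactly.
- The `frameNumber` field of the encrypted-video prefix is assumed to be 4 bytes little-endian. Four bytes is what makes 12 + 4 + 16 = 32; the byte order is an assumption to verify against Sunshine.

```rust
/// fecInfo (u32, sent little-endian): (dataShards << 22) | (fecIndex << 12) | (fecPercentage << 4)
fn fec_info(data_shards: u32, fec_index: u32, fec_percentage: u32) -> u32 {
    debug_assert!(data_shards < (1 << 10) && fec_index < (1 << 10) && fec_percentage < 256);
    (data_shards << 22) | (fec_index << 12) | (fec_percentage << 4)
}

/// Parity count the client recomputes: ceil(dataShards * pct / 100).
/// The host must emit exactly this many parity shards.
fn parity_shards(data_shards: u32, fec_percentage: u32) -> u32 {
    (data_shards * fec_percentage + 99) / 100
}

/// multiFecBlocks: (blockIdx << 4) | ((nBlocks - 1) << 6), at most 4 FEC blocks per frame.
fn multi_fec_blocks(block_idx: u8, n_blocks: u8) -> u8 {
    assert!((1..=4).contains(&n_blocks) && block_idx < n_blocks);
    (block_idx << 4) | ((n_blocks - 1) << 6)
}

/// AES-128-GCM video IV: counter as 8 little-endian bytes, three zero bytes, then 'V'.
fn video_iv(counter: u64) -> [u8; 12] {
    let mut iv = [0u8; 12];
    iv[..8].copy_from_slice(&counter.to_le_bytes());
    iv[11] = b'V'; // 0x56
    iv
}

/// 32-byte ENC_VIDEO_HEADER prefix: iv[12] || frameNumber || tag[16].
/// ASSUMPTION: frameNumber is a 4-byte little-endian field.
fn enc_video_header(iv: [u8; 12], frame_number: u32, tag: [u8; 16]) -> [u8; 32] {
    let mut h = [0u8; 32];
    h[..12].copy_from_slice(&iv);
    h[12..16].copy_from_slice(&frame_number.to_le_bytes());
    h[16..].copy_from_slice(&tag);
    h
}

fn main() {
    let info = fec_info(10, 3, 20);
    println!("fecInfo = {info:#010x}, wire bytes {:?}", info.to_le_bytes());
    println!("parity shards for 10 data @ 20% = {}", parity_shards(10, 20));
    println!("multiFecBlocks (block 1 of 3) = {:#04x}", multi_fec_blocks(1, 3));
    println!("IV for counter 1 = {:02x?}", video_iv(1));
    println!("prefix = {:02x?}", enc_video_header(video_iv(1), 1, [0; 16]));
}
```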
## The two highest interop risks (validate EARLY)
1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly
0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy) — parity
bytes likely **don't match**, so Moonlight silently fails to recover any frame with a lost
data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so
defer this — get a frame decoded first, then FFI/port nanors for loss recovery.
2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
usually off on LAN** — implement plaintext video first, add GCM later.
## Phasing (each phase independently testable with a real Moonlight client)
- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
`/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
pairs (PIN), and shows the host as ready.* ← first slice.
- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
*Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
layout in punktfunk-core; wire M0's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
(uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
*Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
backend for the user's KDE box. *Acceptance: full game stream with sound — the M2 goal.*
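P1.2 begins by parsing `/launch`. This is a hedged std-only sketch with these assumptions:

- The `mode=WIDTHxHEIGHTxFPS` format, hex-encoded `rikey`, and decimal `rikeyid` are based on common GameStream clients.
- `LaunchRequest` is a hypothetical type.
- Percent-decoding is skipped because these particular values are plain ASCII.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of a GameStream `/launch` query.
#[derive(Debug, PartialEq)]
struct LaunchRequest {
    rikey: [u8; 16], // raw AES-128 key; also the video GCM key if encryption is negotiated
    rikeyid: i32,
    width: u32,
    height: u32,
    fps: u32,
}

/// Decodes exactly 32 hex characters into 16 bytes.
fn parse_hex16(s: &str) -> Option<[u8; 16]> {
    if s.len() != 32 {
        return None;
    }
    let mut out = [0u8; 16];
    for (i, b) in out.iter_mut().enumerate() {
        *b = u8::from_str_radix(s.get(2 * i..2 * i + 2)?, 16).ok()?;
    }
    Some(out)
}

/// Parses the query string (everything after `?`); returns None if anything is missing or malformed.
fn parse_launch(query: &str) -> Option<LaunchRequest> {
    let args: HashMap<&str, &str> = query.split('&').filter_map(|kv| kv.split_once('=')).collect();
    let mut mode = args.get("mode")?.split('x').map(|n| n.parse::<u32>());
    let (width, height, fps) = (mode.next()?.ok()?, mode.next()?.ok()?, mode.next()?.ok()?);
    Some(LaunchRequest {
        rikey: parse_hex16(args.get("rikey")?)?,
        rikeyid: args.get("rikeyid")?.parse().ok()?,
        width,
        height,
        fps,
    })
}

fn main() {
    let q = "appid=1&mode=2560x1440x120&rikey=00112233445566778899aabbccddeeff&rikeyid=7";
    println!("{:?}", parse_launch(q));
}
```

The resulting mode is what the virtual output gets sized to.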
## Crates (verified available)
`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/
`ecb` (pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.
## Testing note
The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at
this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against
`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display.
+14
@@ -0,0 +1,14 @@
{
"title": "Documentation",
"pages": [
"index",
"overview",
"implementation-plan",
"roadmap",
"m2-plan",
"---Setup---",
"linux-setup",
"windows-host",
"dualsense-haptics"
]
}
+77
@@ -0,0 +1,77 @@
---
title: "Project Overview"
description: "Status, architecture, and milestones at a glance."
---
*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust
protocol core and native clients per platform.*
`punktfunk` is a placeholder codename. The bet: ship a **Linux virtual-display streaming
host** that speaks the existing Moonlight protocol (every Moonlight/Artemis client works
day one), then break the ~1 Gbps FEC wall with a **GF(2¹⁶) Leopard-RS** transport as a
negotiated extension. See [`docs/implementation-plan.md`](docs/implementation-plan.md).
## Status
| Milestone | State |
|-----------|-------|
| **M1 — `punktfunk-core` + C ABI** | ✅ done & hardened (FEC, packetization, AES-GCM, session, adversarial-review fixes, `punktfunk_core.h`) |
| **M2 — GameStream host → stock Moonlight** | ✅ live end-to-end: pairing, RTSP, audio, per-client virtual output at native res, GPU zero-copy NVENC, gamepads |
| **M3 — `punktfunk/1` native protocol** | ✅ validated live: QUIC control + GF(2¹⁶) FEC/AES data plane, SPAKE2 PIN pairing, mid-stream mode renegotiation |
| **M4 — client decode + present (Apple)** | 🟡 macOS first light: AnnexB→VideoToolbox HEVC on glass + input/pairing over `punktfunk/1` (`clients/apple`); iOS + presenter next |
| **Web console + management API** | ✅ TanStack web console (`web/`) over the OpenAPI mgmt API: host status, paired devices, on-demand native pairing (arm → show PIN) |
The **GameStream host works with a stock Moonlight client** — validated live on NVIDIA
(RTX 5070 Ti & RTX 4090, driver 595): trust-on-first-use pairing that persists, an app
catalog, RTSP/ENet/audio, and **video at the client's exact resolution and refresh** via a
per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded with GPU
**zero-copy** (dmabuf → CUDA/Vulkan → NVENC) at up to 5120×1440@240. The native
**`punktfunk/1`** protocol adds a QUIC control plane and a GF(2¹⁶) Leopard-FEC + AES-GCM data
plane (p50 ~0.8 ms capture→reassembled at 720p120), with a SPAKE2 PIN pairing ceremony. Both
run from **one process** (`serve --native`), managed through a REST API + web console. Builds
against FFmpeg 7 or 8; deployed live on Bazzite. Full status: [`CLAUDE.md`](CLAUDE.md);
roadmap: [`docs/roadmap.md`](docs/roadmap.md).
## Layout
```
crates/
punktfunk-core/ protocol · FEC · pacing · crypto · quic — the C ABI (lib + cdylib + staticlib)
punktfunk-host/ Linux host: vdisplay · capture · encode · inject · gamestream · m3 · mgmt · native_pairing
punktfunk-client-rs/ punktfunk/1 reference client (M3 headless; M4 adds decode+present)
clients/{apple,android}/ native client scaffolds (import punktfunk_core.h); apple = macOS first light
web/ TanStack web console (host status · paired devices · pairing) over the mgmt API
packaging/ Fedora/Bazzite RPM · bootc image · COPR (see packaging/bazzite/README.md)
include/punktfunk_core.h cbindgen-generated C header (checked in)
tools/{latency-probe,loss-harness}/ measurement (plan §10)
docs/{implementation-plan,roadmap,windows-host,dualsense-haptics}.md
```
## Build & test
```sh
cargo build --workspace # green on Linux and macOS
cargo test --workspace # unit + loopback + proptest + C ABI harness
cargo clippy --workspace --all-targets
cargo run -p loss-harness # FEC loss-resilience sweep (no network needed)
bash crates/punktfunk-core/tests/c/run.sh # standalone C-ABI link+round-trip proof
```
The C header regenerates from `crates/punktfunk-core/src/abi.rs` on every build (cbindgen via
`build.rs`) into `include/punktfunk_core.h`.
## Design invariants
- **One core, linked everywhere.** Protocol/FEC/crypto/pacing live in `punktfunk-core` exactly
once, exposed over a stable, versioned C ABI (`punktfunk_abi_version()`, `PunktfunkConfig`
carries its own `struct_size`).
- **No async on the hot path.** The per-frame pipeline uses native threads only;
`tokio`/`quinn` are gated behind the off-by-default `quic` feature (control plane only).
- **FEC is the wall-breaker.** GF(2⁸) (≤255 shards/block) for Moonlight compat;
GF(2¹⁶) (≤65535 shards/block, SIMD, O(n log n)) to push past ~1 Gbps.
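The `struct_size` convention lets the C ABI grow without breaking older callers. The sketch below shows the pattern with hypothetical types; `ExampleConfigV1`/`ExampleConfig` are not the real `PunktfunkConfig` layout. The caller sets `struct_size` to the size it compiled against. The library then copies only the prefix both sides share, so newer fields keep their defaults.

```rust
use std::mem::size_of;

/// Layout an older caller was compiled against (hypothetical).
#[repr(C)]
struct ExampleConfigV1 {
    struct_size: u32,
    width: u32,
    height: u32,
}

/// Current layout: fields are only ever appended, so old offsets stay valid.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct ExampleConfig {
    struct_size: u32,
    width: u32,
    height: u32,
    bitrate_kbps: u32, // added later; 0 means "host default" for older callers
}

/// Reads a caller-provided config of any known-or-newer version.
///
/// # Safety
/// `ptr` is null or points to at least `struct_size` readable bytes.
unsafe fn read_config(ptr: *const ExampleConfig) -> Option<ExampleConfig> {
    if ptr.is_null() {
        return None;
    }
    // struct_size is always the first field, so reading it alone is valid for every version.
    let size = unsafe { std::ptr::read_unaligned(ptr as *const u32) } as usize;
    if size < size_of::<ExampleConfigV1>() {
        return None; // older than the oldest supported layout
    }
    let mut cfg = ExampleConfig { struct_size: 0, width: 0, height: 0, bitrate_kbps: 0 };
    // Copy only the shared prefix: fields the caller lacks keep their defaults,
    // fields only a newer caller has are ignored.
    let n = size.min(size_of::<ExampleConfig>());
    unsafe {
        std::ptr::copy_nonoverlapping(ptr as *const u8, &mut cfg as *mut ExampleConfig as *mut u8, n);
    }
    Some(cfg)
}

fn main() {
    let old = ExampleConfigV1 { struct_size: size_of::<ExampleConfigV1>() as u32, width: 1920, height: 1080 };
    // SAFETY: `old` is live and its struct_size matches its real size.
    let cfg = unsafe { read_config(&old as *const ExampleConfigV1 as *const ExampleConfig) };
    println!("{cfg:?}");
}
```

In the real ABI this would be an `extern "C"` entry point. The `struct_size` check prevents reading bytes an older caller never allocated.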
## License
MIT OR Apache-2.0.
+334
@@ -0,0 +1,334 @@
---
title: "Roadmap"
description: "Decided next goals and the longer-term bets."
---
Decided 2026-06-10 (research-grounded; see commit history), extended since.
**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
Steam session at the **client's exact resolution + refresh** — games see it (via injected
`--nested-refresh` + generated CVT modes, not the box's TV EDID), relaunched per-connection on a mode
change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).
**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the M4
client presenter + iOS (§6), and a Windows host (§7 — now **de-risked via SudoVDA**, no custom
signed driver needed). **§10 HDR/10-bit is parked — blocked upstream at the compositor** (no
gamescope/KWin PipeWire 10-bit producer yet).
## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.
- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
registry roundtrip); move the portal restart to *after* readiness and precede it with
`systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
env import — the Sway script does this, the KDE one doesn't).
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
(permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
probe as an `ExecStartPost` gate, `Restart=on-failure`.
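Phase 2's bounded retry-with-backoff reduces to a small helper that retries only transient failures. This is a std-only sketch; `CreateError`, the attempt limit, and the delays are illustrative, not the host's actual types or values.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error split: only transient failures are worth retrying.
#[derive(Debug, PartialEq)]
enum CreateError {
    Transient(String), // e.g. output not advertised yet, PipeWire node missing
    Permanent(String), // e.g. backend binary absent, unsupported mode
}

/// Calls `op(attempt)` up to `max_attempts` times, doubling the delay after each
/// transient failure (capped at `max_delay`). Permanent errors return immediately.
fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, CreateError>,
) -> Result<T, CreateError> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            Err(CreateError::Transient(msg)) if attempt < max_attempts => {
                eprintln!("attempt {attempt} failed ({msg}); retrying in {delay:?}");
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
            Err(e) => return Err(e), // permanent, or attempts exhausted
        }
    }
}

fn main() {
    let r = retry_with_backoff(5, Duration::from_millis(10), Duration::from_millis(100), |n| {
        if n < 3 {
            Err(CreateError::Transient(format!("output not ready (try {n})")))
        } else {
            Ok(n)
        }
    });
    println!("{r:?}");
}
```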
## 2. Offer available compositors in the client ✅ *(done)*
Host enumerates which backends are actually available (binary present + version OK:
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
Picker in the Apple client + web console.
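The "version OK" half of availability detection needs a numeric comparison, because a string compare would rank `3.16.9` above `3.16.22`. The hedged sketch below uses the minimums listed above. Extracting the version string from each binary's output is not shown, and the helper names are hypothetical.

```rust
/// Parses "3.16.22" (or "6.5.6-1.fc43", keeping only the leading digits of
/// each dot-separated component; non-numeric components count as 0).
fn parse_version(s: &str) -> Vec<u64> {
    s.trim()
        .split('.')
        .map(|part| {
            let digits: String = part.chars().take_while(|c| c.is_ascii_digit()).collect();
            digits.parse().unwrap_or(0)
        })
        .collect()
}

/// True if `found` >= `minimum`, treating missing trailing components as 0.
fn version_at_least(found: &str, minimum: &str) -> bool {
    let (f, m) = (parse_version(found), parse_version(minimum));
    let get = |v: &Vec<u64>, i: usize| v.get(i).copied().unwrap_or(0);
    for i in 0..f.len().max(m.len()) {
        let (a, b) = (get(&f, i), get(&m, i));
        if a != b {
            return a > b;
        }
    }
    true
}

const MINIMUMS: [(&str, &str); 2] = [("gamescope", "3.16.22"), ("kwin", "6.5.6")];

fn main() {
    // Example versions only; a real probe would run each binary and parse its output.
    let found = [("gamescope", "3.16.14"), ("kwin", "6.5.6")];
    for ((name, min), (_, have)) in MINIMUMS.iter().zip(found.iter()) {
        let verdict = if version_at_least(have, min) { "ok" } else { "too old" };
        println!("{name}: {have} (needs >= {min}) -> {verdict}");
    }
}
```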
## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*
Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
it's Fedora-atomic and the community installs Sunshine via COPR rpm-ostree — the analog.
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.
- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
`/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
Bazzite's already-present **gamescope** (minimal session plumbing).
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
zero-copy NVENC/dmabuf/uinput).
## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*
The exact mirror of the host→client desktop-audio path. A PipeWire virtual source apps can
select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
`punktfunk_send_audio`.
- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
jitter buffer (silence-fill underrun, Opus PLC).
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
surface). punktfunk/1-only.
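The mic source's silence-fill behaviour can be shown as a minimal std-only sketch. `JitterBuffer` is hypothetical and works as follows:

- Decoded PCM frames are queued.
- The oldest frame is dropped when the queue is full.
- The PipeWire-side `fill` never blocks and writes zeros on underrun.

Opus PLC, the other underrun strategy named above, is not modelled.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded jitter buffer of decoded PCM frames (interleaved i16).
struct JitterBuffer {
    frames: VecDeque<Vec<i16>>,
    capacity: usize, // max queued frames; bounds the added latency
    underruns: u64,
}

impl JitterBuffer {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least one frame");
        Self { frames: VecDeque::with_capacity(capacity), capacity, underruns: 0 }
    }

    /// Decoder side: enqueue a frame, dropping the oldest if the buffer is full.
    fn push(&mut self, frame: Vec<i16>) {
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(frame);
    }

    /// PipeWire side: fill `out` from the next frame, or with silence on underrun.
    /// Never blocks, so a late network cannot stall the audio graph.
    fn fill(&mut self, out: &mut [i16]) {
        match self.frames.pop_front() {
            Some(frame) => {
                let n = frame.len().min(out.len());
                out[..n].copy_from_slice(&frame[..n]);
                out[n..].fill(0); // short frame: pad the remainder with silence
            }
            None => {
                self.underruns += 1;
                out.fill(0);
            }
        }
    }
}

fn main() {
    let mut jb = JitterBuffer::new(3);
    jb.push(vec![100; 960]); // one 20 ms frame of 48 kHz mono
    let mut out = vec![0i16; 960];
    jb.fill(&mut out); // real audio
    jb.fill(&mut out); // underrun: silence
    println!("underruns = {}", jb.underruns);
}
```

A strictly real-time callback would also avoid dropping the popped `Vec` inside the callback. This sketch ignores that for brevity.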
## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*
- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
`inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
injects `ei_touchscreen` down/motion/up; `punktfunk-client-rs --touch-test` drags a finger.
**Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
(now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
(gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
`/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (vendor 054C/0CE6, the 232-byte
inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
(touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
selects a per-session `DualSenseManager` (the `PadBackend` enum in `m3.rs`): client gamepad frames
build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
(lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
(so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
(`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
`punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
(← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `m3-host` +
`punktfunk-client-rs --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
(600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
(only ~510 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-
based, no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
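Touch reuses the absolute-pointer convention: the client's surface size rides in `flags=(w<<16)|h`, and `code` carries the touch id. The hedged sketch below packs that field and scales a client-space point onto the host output. The helper names are illustrative, as is the assumption that `flags` is a `u32` holding each dimension in 16 bits.

```rust
/// flags = (w << 16) | h; each dimension must fit in 16 bits.
fn pack_surface(w: u32, h: u32) -> u32 {
    assert!(w <= 0xFFFF && h <= 0xFFFF, "dimension exceeds 16 bits");
    (w << 16) | h
}

fn unpack_surface(flags: u32) -> (u32, u32) {
    (flags >> 16, flags & 0xFFFF)
}

/// Maps a point in the client surface described by `flags` onto an
/// `out_w` x `out_h` host output, clamping to the last pixel.
fn scale_to_output(x: u32, y: u32, flags: u32, out_w: u32, out_h: u32) -> Option<(u32, u32)> {
    let (w, h) = unpack_surface(flags);
    if w == 0 || h == 0 {
        return None; // no surface size: nothing sensible to scale against
    }
    let sx = (x as u64 * out_w as u64 / w as u64).min(out_w.saturating_sub(1) as u64);
    let sy = (y as u64 * out_h as u64 / h as u64).min(out_h.saturating_sub(1) as u64);
    Some((sx as u32, sy as u32))
}

fn main() {
    // A 1180x820 client surface touched at its centre, mapped onto a 2560x1440 output.
    let flags = pack_surface(1180, 820);
    println!("{:?}", scale_to_output(590, 410, flags, 2560, 1440));
}
```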
## 6. iOS/iPadOS → tvOS *(deferred)*
PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin
+ touch capture (#5) + UI. tvOS later.
## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)*
Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/
crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic
and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits:
capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK
with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic).
**The old blocker is gone.** Rather than author + sign our own kernel IDD for the per-client virtual
display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect
Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay`
backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a
kernel driver"* (XL). That removes the only hard blocker — the Windows host is now a medium,
mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the
stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because
it's unbuildable on the Linux dev box; the trait boundaries are already in the right places.
## 8. Pairing & trust hardening *(next)*
The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":
- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
Deployed to the dev box + Bazzite.
- **Delegated pairing approval** *(next — the ergonomic enabler for "mandatory": pair a device
without fetching the host PIN out of band).* Target flow:
1. Device A is already paired (authenticated) to Host X.
2. The user tries to connect Device B to Host X.
3. Host X surfaces a request: *"Allow Device B to pair with Host X?"*
4. The user approves/denies; on approve, Host X admits Device B — binding B's certificate
fingerprint — with no PIN typed.
Two buildable layers:
- **§8b-1 (host + web — achievable now):** an unpaired B that connects to an approval-enabled host
is held as a **pending request** `{id, name, fingerprint, requested_at}` in `NativePairing`
instead of a flat reject; mgmt gains `GET /native/pending` + `POST /native/pending/{id}/{approve,
deny}`; the web console lists pending requests with Approve/Deny. The **operator approves from
the console** — delegated approval via the management surface.
- **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
**Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
is a client-agent task.
PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
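§8b-1's pending-request queue can be sketched as a hedged in-memory model:

- The record fields are the ones listed above.
- `PendingPairings`, the id allocation, de-duplication by fingerprint, and the expiry policy are assumptions, not the host's `NativePairing` API.
- Approving moves the certificate fingerprint into the trusted set; denying just drops the request.

```rust
use std::collections::{BTreeMap, HashSet};
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone)]
struct PendingRequest {
    id: u64,
    name: String,
    fingerprint: String, // e.g. hex SHA-256 of the client certificate
    requested_at: SystemTime,
}

#[derive(Default)]
struct PendingPairings {
    next_id: u64,
    pending: BTreeMap<u64, PendingRequest>,
    trusted: HashSet<String>, // fingerprints admitted so far
}

impl PendingPairings {
    /// An unpaired client connects: hold it as a request instead of rejecting it.
    fn request(&mut self, name: &str, fingerprint: &str) -> u64 {
        // The same certificate retrying maps to its existing request.
        if let Some(r) = self.pending.values().find(|r| r.fingerprint == fingerprint) {
            return r.id;
        }
        self.next_id += 1;
        let req = PendingRequest {
            id: self.next_id,
            name: name.to_string(),
            fingerprint: fingerprint.to_string(),
            requested_at: SystemTime::now(),
        };
        self.pending.insert(req.id, req);
        self.next_id
    }

    /// What a `GET /native/pending`-style listing would return.
    fn list(&self) -> Vec<&PendingRequest> {
        self.pending.values().collect()
    }

    /// Binds the requester's fingerprint; false for an unknown or already-handled id.
    fn approve(&mut self, id: u64) -> bool {
        match self.pending.remove(&id) {
            Some(r) => {
                self.trusted.insert(r.fingerprint);
                true
            }
            None => false,
        }
    }

    fn deny(&mut self, id: u64) -> bool {
        self.pending.remove(&id).is_some()
    }

    /// Assumed policy: drop requests nobody acted on within `ttl`.
    fn expire(&mut self, ttl: Duration) {
        let now = SystemTime::now();
        self.pending.retain(|_, r| now.duration_since(r.requested_at).map_or(true, |age| age < ttl));
    }

    fn is_trusted(&self, fingerprint: &str) -> bool {
        self.trusted.contains(fingerprint)
    }
}

fn main() {
    let mut pairing = PendingPairings::default();
    let id = pairing.request("Device B", "3f9a0c");
    println!("pending: {:?}", pairing.list());
    pairing.approve(id);
    println!("Device B trusted: {}", pairing.is_trusted("3f9a0c"));
}
```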
## 9. Client→host network speed test + settable bitrate *(host side done — client UI remaining)*
Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
value) instead of guesswork that ends in a stuttering stream.
**Done & live (host + protocol + connector + C ABI, `74819b1`):**
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
client requests a rate; the host clamps to [500 kbps, 500 Mbps] (or its 20 Mbps default on 0),
applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
`punktfunk_connection_bitrate()`.
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
`ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 1 Gbps / ≤ 5 s,
video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
`PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
interleaved probe AUs excluded from frame verification. `punktfunk-client-rs` gains `--bitrate` +
`--speed-test KBPS:MS` as the reference/loopback driver.
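The bitrate resolution and the probe arithmetic above can be written as a std-only sketch:

- The clamp bounds, the 0 → 20 Mbps default, and the probe caps are the stated ones.
- The function names and `ProbeSummary` are hypothetical.
- The loss formula (bytes the host reports sent versus bytes received) is an assumption about how the connector derives `loss_pct`.

```rust
const MIN_KBPS: u32 = 500;
const MAX_KBPS: u32 = 500_000; // 500 Mbps
const DEFAULT_KBPS: u32 = 20_000; // used when the client requests 0
const PROBE_MAX_KBPS: u32 = 1_000_000; // 1 Gbps
const PROBE_MAX_MS: u32 = 5_000;

/// Host-side resolution of the requested bitrate (the value echoed back in Welcome).
fn resolve_bitrate_kbps(requested: u32) -> u32 {
    if requested == 0 {
        DEFAULT_KBPS
    } else {
        requested.clamp(MIN_KBPS, MAX_KBPS)
    }
}

/// Caps a ProbeRequest at <= 1 Gbps and <= 5 s.
fn clamp_probe(target_kbps: u32, duration_ms: u32) -> (u32, u32) {
    (target_kbps.min(PROBE_MAX_KBPS), duration_ms.min(PROBE_MAX_MS))
}

#[derive(Debug, PartialEq)]
struct ProbeSummary {
    throughput_kbps: u64,
    loss_pct: f64,
}

/// Goodput over the receive window, plus loss relative to what the host reports sending.
fn summarize_probe(bytes_sent: u64, bytes_received: u64, window_ms: u64) -> ProbeSummary {
    // bytes * 8 = bits, and bits per millisecond == kilobits per second.
    let throughput_kbps = if window_ms == 0 { 0 } else { bytes_received * 8 / window_ms };
    let loss_pct = if bytes_sent == 0 {
        0.0
    } else {
        bytes_sent.saturating_sub(bytes_received) as f64 * 100.0 / bytes_sent as f64
    };
    ProbeSummary { throughput_kbps, loss_pct }
}

fn main() {
    println!("requested 0 -> {} kbps", resolve_bitrate_kbps(0));
    println!("probe caps -> {:?}", clamp_probe(2_000_000, 10_000));
    println!("{:?}", summarize_probe(2_500_000, 2_475_000, 1_000));
}
```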
**Remaining (client UI):** wire the C ABI into the Apple client — a "Test network" action
(`speed_test` → poll `probe_result` → "~XXX Mbps · recommended bitrate YYY") feeding a bitrate
control (`connect_ex3`), and surface both in the web console.
## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*
Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
build the moment a producer lands.
- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
(`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/
`NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
(for its own sake), not for HDR.
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
Steam-Deck-UI polish + the dynamic-resolution work (§ above). Track #2126 and !8293.
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
(BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
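The settled constraints reduce to one small decision, sketched here with hypothetical types. HDR is chosen only when three things hold:

- the client asks for it,
- the host toggle is on, and
- capture actually delivers 10-bit frames, which is false on every shipping gamescope today.

Choosing HDR pins the codec to HEVC Main10. A client that predates the trailing byte is modeled as `None` and stays SDR:

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec { H264, Hevc, Av1 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ColorMode { Sdr8, Hdr10 }

/// Hypothetical negotiated result.
#[derive(Debug, PartialEq)]
struct StreamColor {
    mode: ColorMode,
    codec: Codec,
    hevc_profile: Option<&'static str>,
}

/// `client_wants_hdr`: None = older client without the trailing byte (SDR).
fn negotiate_color(
    client_wants_hdr: Option<bool>,
    host_hdr_enabled: bool,
    capture_has_10bit: bool,
    preferred_sdr_codec: Codec,
) -> StreamColor {
    let hdr = client_wants_hdr.unwrap_or(false) && host_hdr_enabled && capture_has_10bit;
    if hdr {
        // NVENC tops out at 10-bit, and VideoToolbox decodes 10-bit HEVC but not
        // 10-bit AV1, so HDR always means HEVC Main10.
        StreamColor { mode: ColorMode::Hdr10, codec: Codec::Hevc, hevc_profile: Some("main10") }
    } else {
        let profile = if preferred_sdr_codec == Codec::Hevc { Some("main") } else { None };
        StreamColor { mode: ColorMode::Sdr8, codec: preferred_sdr_codec, hevc_profile: profile }
    }
}
```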
## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434–874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
(RustCrypto auto AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **M1
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
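The shard-count and packet-rate figures follow from simple arithmetic. The sketch below uses hypothetical function names and ignores FEC parity and packet headers:

```rust
/// Data shards one frame needs: bitrate / fps / 8 bytes, cut into
/// `shard_payload`-sized pieces, rounding up (parity excluded).
fn data_shards_per_frame(bitrate_kbps: u64, fps: u64, shard_payload: u64) -> u64 {
    let bytes_per_frame = bitrate_kbps * 1000 / 8 / fps;
    (bytes_per_frame + shard_payload - 1) / shard_payload
}

/// Data packets per second at a given bitrate (parity and headers excluded).
fn data_packets_per_sec(bitrate_kbps: u64, shard_payload: u64) -> u64 {
    let bytes_per_sec = bitrate_kbps * 1000 / 8;
    (bytes_per_sec + shard_payload - 1) / shard_payload
}

/// Whether a frame fits in one FEC block under the negotiated cap.
fn fits_one_block(shards: u64, max_data_per_block: u64) -> bool {
    shards <= max_data_per_block
}
```

At 1 Gbps, 120 fps and a 1,200-byte payload, this gives 869 data shards per frame, well within a 4096-shard block. The same rate is about 104k data packets per second. A 1,452-byte payload lowers that to about 86k, roughly the 17% saving mentioned below.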
- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
(64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
`sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
`recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
underdrain). ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
`shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
`punktfunk-client-rs --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
`packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
`rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
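A sketch of the `send_batch` seam with its scalar default follows, using only `std::net::UdpSocket`. The trait shape and `UdpTransport` are assumptions for illustration. The Linux `sendmmsg` override appears only as a comment, because it needs the `libc` crate. A `WouldBlock` from a full send buffer is reported as a drop rather than an error, matching the `Result<bool>` change above:

```rust
use std::io::{self, ErrorKind};
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical transport seam: `send` returns Ok(false) when the kernel send
/// buffer is full (WouldBlock), so the caller can count the drop.
trait Transport {
    fn send(&self, pkt: &[u8]) -> io::Result<bool>;

    /// Scalar default: one syscall per packet. A Linux override would push the
    /// same slice through `sendmmsg` in chunks (not shown). Returns drops.
    fn send_batch(&self, pkts: &[&[u8]]) -> io::Result<usize> {
        let mut dropped = 0;
        for &pkt in pkts {
            if !self.send(pkt)? {
                dropped += 1;
            }
        }
        Ok(dropped)
    }
}

/// Hypothetical UDP implementation over a non-blocking socket.
struct UdpTransport {
    sock: UdpSocket,
    peer: SocketAddr,
}

impl Transport for UdpTransport {
    fn send(&self, pkt: &[u8]) -> io::Result<bool> {
        match self.sock.send_to(pkt, self.peer) {
            Ok(_) => Ok(true),
            Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(false),
            Err(e) => Err(e),
        }
    }
}
```

Keeping the scalar loop as the trait default means non-Linux targets compile unchanged. Only the Linux implementation opts into batching.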
## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*
A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
is only the same-host **capture-stamp→reassembled** slice (~30–40% of true glass-to-glass) and was
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
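The pacing-tail figures in (1) are 90% of one frame interval. The sketch below reproduces them and contrasts the old behavior with the microburst cap described further down. The helper names are hypothetical. The sketch also assumes the paced overflow reuses the same ~90% window, which the text does not specify:

```rust
use std::time::Duration;

/// Share of the frame interval the old pacer spread every multi-chunk frame over.
const PACE_FRACTION: f64 = 0.9;

/// Old pacing tail for one frame at `fps`.
fn old_pacing_tail(fps: u32) -> Duration {
    Duration::from_secs_f64(PACE_FRACTION / fps as f64)
}

/// Microburst-cap plan: (bytes sent immediately, bytes paced, paced window).
/// Assumption: the overflow is spread over the same ~90% window as before.
fn microburst_plan(frame_bytes: usize, cap_bytes: usize, fps: u32) -> (usize, usize, Duration) {
    let immediate = frame_bytes.min(cap_bytes);
    let overflow = frame_bytes - immediate;
    let window = if overflow == 0 { Duration::ZERO } else { old_pacing_tail(fps) };
    (immediate, overflow, window)
}
```

Under the 128 KB default, a 64 KB P-frame leaves with no pacing tail at all. Only the portion of a larger frame beyond the cap waits.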
**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
`PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
(IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
`sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @ 60–120 fps) and is the
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
1. **Wall-clock skew handshake + glass-to-glass probe** (`tools/latency-probe`) — measures the two
biggest unmeasured terms (render→capture, decode→present); client present-stamp vs the AU's
`pts_ns` (already attached).
2. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
copy) — ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
3. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
4. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
thread split, only after measurement justifies it.
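The shape of the encode|send split can be sketched with `std::sync::mpsc::sync_channel(3)`, the same bound named above. `FrameMsg` here is a stand-in with hypothetical fields. `send_loop` only records what it would have sealed, paced and sent:

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for the encoded access unit handed across threads.
struct FrameMsg {
    pts_ns: u64,
    au: Vec<u8>,
}

/// The caller plays the encode thread; a spawned thread owns sending.
fn run_pipeline(frames: Vec<FrameMsg>) -> Vec<u64> {
    // Bounded queue: once 3 frames are waiting, `tx.send` blocks (backpressure).
    let (tx, rx) = sync_channel::<FrameMsg>(3);
    let sender = thread::spawn(move || send_loop(rx));

    for f in frames {
        tx.send(f).expect("send thread hung up");
    }
    drop(tx); // closing the channel ends send_loop's iteration

    sender.join().expect("send thread panicked")
}

/// Seal + pace + send would happen here; this sketch records pts order only.
fn send_loop(rx: Receiver<FrameMsg>) -> Vec<u64> {
    let mut sent = Vec::new();
    for msg in rx {
        let _bytes = msg.au.len(); // placeholder for seal/pace/send of msg.au
        sent.push(msg.pts_ns);
    }
    sent
}
```

Because the channel holds at most three frames, a stalled send thread blocks the encode side once three frames are queued. Latency therefore cannot pile up in an unbounded queue.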
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `m3-host` advertise the native service over mDNS:
- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
both host entry points get it; best-effort (a discovery failure never blocks streaming —
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
(so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-client-rs --discover [SECS]` browses and prints each host (name, addr:port,
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — dev box discovered the GNOME-box appliance
(`home-worker-3 192.168.1.248:9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
(`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
host-side contract above is all it needs.
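A client-side check of the TXT contract might look like the sketch below. Browsing itself is left out: the input is an already-decoded key/value map. `NativeHost`, `Pairing`, `field` and `parse_native_txt` are hypothetical names. The fingerprint is carried through as advisory only, and pinning still happens on connect:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Pairing { Required, Optional }

/// Hypothetical parsed advert.
#[derive(Debug, PartialEq)]
struct NativeHost {
    fingerprint: String, // advisory over mDNS; TOFU/pinning verifies on connect
    pairing: Pairing,
    id: String,
}

fn field<'a>(txt: &'a HashMap<String, String>, key: &str) -> Result<&'a str, String> {
    txt.get(key).map(String::as_str).ok_or_else(|| format!("missing TXT key `{key}`"))
}

/// Validate the keys listed above: proto, fp, pair, id.
fn parse_native_txt(txt: &HashMap<String, String>) -> Result<NativeHost, String> {
    if field(txt, "proto")? != "punktfunk/1" {
        return Err("unsupported proto".into());
    }
    let pairing = match field(txt, "pair")? {
        "required" => Pairing::Required,
        "optional" => Pairing::Optional,
        other => return Err(format!("bad pair value `{other}`")),
    };
    Ok(NativeHost {
        fingerprint: field(txt, "fp")?.to_string(),
        pairing,
        id: field(txt, "id")?.to_string(),
    })
}
```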
+77
@@ -0,0 +1,77 @@
---
title: "Windows Host"
description: "Feasibility and scoping for a Windows host backend."
---
**Status: scoped, deferred — but de-risked.** A Windows host is architecturally an *"add a backend"*
job, not a parallel port. The one thing that used to make it **large** — the per-client *virtual*
output, which has no user-mode Windows API and seemingly needed a self-signed kernel Indirect
Display Driver (IDD) — is **solved by reusing [SudoVDA](https://github.com/VirtualDrivers), the
Sunshine Virtual Display Adapter**: a pre-built, signed IDD that creates virtual displays at
arbitrary `WxH@Hz` on demand. We install it and drive its control interface; **no driver to write or
WHQL-sign.** That turns the headline feature from XL into a medium backend. This doc records what's
left so the work can be picked up deliberately.
(Grounded in a 4-agent read of the host crate, 2026-06-10; SudoVDA path added 2026-06-11.)
## What's already done for us
punktfunk is cleanly layered. **~95% of the codebase is platform-agnostic and is reused verbatim:**
| Reusable as-is | Why |
|---|---|
| `punktfunk-core` (protocol, FEC, crypto, session, transport, **C ABI**) | Zero platform deps — no `cfg(linux)` anywhere; the C ABI is already cross-platform |
| QUIC control plane (`quic.rs`, pairing, mode negotiation) | quinn + tokio are portable |
| GameStream P1.1 (mDNS, serverinfo, pairing, RTSP, ENet) — *except* `stream.rs`/`audio.rs` | pure wire logic |
| Management REST API (`mgmt.rs`) + OpenAPI | axum/tokio, portable |
| Pipeline + `m3.rs` orchestration | trait-generic — calls `capturer.next_frame()`, `encoder.submit/poll()`; **needs zero changes** |
| The **trait boundaries** themselves: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral signatures; Linux deps are already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |
So a Windows host is **new `#[cfg(target_os = "windows")]` backend modules behind the existing
traits** — the per-frame path, protocol, and control plane don't move. No architectural refactor is
required; the boundaries are already in the right places.
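In code, "new backends behind the existing traits" amounts to `cfg`-gated modules that each provide the same trait object. The trait below is a deliberately reduced stand-in for the real `Capturer`, whose signature is not reproduced here. The backend structs are empty placeholders:

```rust
/// Reduced, hypothetical stand-in for the host's `Capturer` trait.
trait Capturer {
    fn backend_name(&self) -> &'static str;
}

#[cfg(target_os = "linux")]
mod backend {
    /// Placeholder for the existing PipeWire backend.
    pub struct PipeWireCapturer;
    impl super::Capturer for PipeWireCapturer {
        fn backend_name(&self) -> &'static str { "pipewire" }
    }
    pub fn capturer() -> Box<dyn super::Capturer> { Box::new(PipeWireCapturer) }
}

#[cfg(target_os = "windows")]
mod backend {
    /// Placeholder for a new DXGI Desktop Duplication backend.
    pub struct DxgiCapturer;
    impl super::Capturer for DxgiCapturer {
        fn backend_name(&self) -> &'static str { "dxgi" }
    }
    pub fn capturer() -> Box<dyn super::Capturer> { Box::new(DxgiCapturer) }
}

#[cfg(not(any(target_os = "linux", target_os = "windows")))]
mod backend {
    pub struct Unsupported;
    impl super::Capturer for Unsupported {
        fn backend_name(&self) -> &'static str { "unsupported" }
    }
    pub fn capturer() -> Box<dyn super::Capturer> { Box::new(Unsupported) }
}

/// Platform-neutral orchestration only ever sees the trait object.
fn describe_pipeline(c: &dyn Capturer) -> String {
    format!("capture via {}", c.backend_name())
}
```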
## What a Windows host needs (new code)
Each row is a Linux backend that needs a Windows sibling. Effort is the *implementation* effort;
all reuse the existing trait.
| Subsystem | Linux today | Windows equivalent | Effort | Notes |
|---|---|---|---|---|
| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly |
| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **SudoVDA** (pre-built signed IDD) — install + drive its control API to add/remove a `WxH@Hz` virtual monitor per session | **M** | ✅ **no longer the blocker**: SudoVDA is the same IDD Sunshine ships, so no driver to author or sign. The `VirtualDisplay` backend = enable the adapter, create a monitor at the client's mode, capture it (DXGI), tear it down on session end. Fallback if SudoVDA is absent: capture an existing monitor (loses native-resolution) |
| **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC SDK direct with a D3D11 device context (`AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
| **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this |
| **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (`INPUT` records with `KEYBDINPUT`/`MOUSEINPUT`; supersedes the legacy `keybd_event`/`mouse_event`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) |
| **Input (gamepads)** | uinput X-Box-360 pad + FF rumble | **ViGEm** (Virtual Gamepad Emulation) + HID reports | M | rumble back-channel maps to ViGEm notifications |
| **Audio capture** | PipeWire sink-monitor | **WASAPI loopback** (`IAudioCaptureClient`) | S–M | also produces interleaved f32 — same `AudioCapturer` contract |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
| **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do |
**Rough total: ~2,000–4,000 LOC of new Rust** (no C++ driver — SudoVDA is reused as-is), spread over
capture/encode/vdisplay/input/audio. With the driver problem solved, the overall effort is now
**medium**; the input+audio layer alone is *small–medium*.
## Recommended phasing (when picked up)
1. **Phase 0 — "basic Windows host" (no virtual display).** Capture an *existing* monitor (DXGI
Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves
the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt.
It loses the per-client native-resolution output but is a working Windows host quickly.
2. **Phase 1 — the virtual display via SudoVDA.** A `VirtualDisplay` backend that enables SudoVDA,
creates a monitor at the client's exact `WxH@Hz`, captures it (DXGI), and tears it down on session
end — restoring punktfunk's headline feature with **no driver authoring or signing**. (Ship/guide
the SudoVDA install as a host prerequisite, like the udev rule on Linux.)
3. **Phase 2 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC
zero-copy.
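The Phase 0 / Phase 1 split and the SudoVDA-absent fallback come down to one choice per session. The sketch below uses hypothetical types. Detecting SudoVDA and driving its control interface are out of scope:

```rust
/// Hypothetical display mode.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Mode { width: u32, height: u32, refresh_hz: u32 }

#[derive(Debug, PartialEq)]
enum DisplayTarget {
    /// Phase 1: a SudoVDA virtual monitor at the client's exact mode.
    Virtual(Mode),
    /// Phase 0 / fallback: duplicate an existing monitor at its own mode.
    ExistingMonitor { native: Mode },
}

fn choose_display_target(sudovda_available: bool, client: Mode, primary: Mode) -> DisplayTarget {
    if sudovda_available {
        DisplayTarget::Virtual(client)
    } else {
        DisplayTarget::ExistingMonitor { native: primary }
    }
}

/// True when the stream runs at the client's requested mode.
fn is_native_resolution(target: &DisplayTarget, client: Mode) -> bool {
    match target {
        DisplayTarget::Virtual(m) => *m == client,
        DisplayTarget::ExistingMonitor { native } => *native == client,
    }
}
```

The fallback branch is exactly the Phase 0 host. Phase 1 only adds the `Virtual` arm.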
## Why it's deferred (not started now)
- The remaining work is **medium** and mechanical, but **none of it is buildable or testable on the
Linux dev box** — it would be unvalidated code until there's a Windows box in the loop.
- SudoVDA removed the hard blocker (the signed kernel driver); what's left is a backend port, picked
up whenever a Windows target is in scope.
The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are
the down payment. Start at **Phase 0** for the fastest path to a working Windows host.
+34
@@ -0,0 +1,34 @@
{
"name": "punktfunk-docs",
"private": true,
"type": "module",
"description": "punktfunk documentation site — Fumadocs on TanStack Start (Vite + Nitro/bun)",
"scripts": {
"dev": "vite dev --port 3001",
"build": "vite build",
"start": "bun run .output/server/index.mjs",
"lint": "tsc --noEmit"
},
"dependencies": {
"@tanstack/react-router": "^1.121.0",
"@tanstack/react-start": "^1.121.0",
"fumadocs-core": "^16.10.1",
"fumadocs-ui": "^16.10.1",
"react": "^19.0.0",
"react-dom": "^19.0.0"
},
"devDependencies": {
"@tailwindcss/vite": "^4.0.0",
"@tanstack/nitro-v2-vite-plugin": "^1.155.0",
"@types/mdx": "^2.0.14",
"@types/node": "^22.10.0",
"@types/react": "^19.0.0",
"@types/react-dom": "^19.0.0",
"@vitejs/plugin-react": "^5",
"fumadocs-mdx": "^15.0.12",
"tailwindcss": "^4.0.0",
"typescript": "^5.7.0",
"vite": "^7.3.5",
"vite-tsconfig-paths": "^5.1.0"
}
}
+10
@@ -0,0 +1,10 @@
import { defineDocs, defineConfig } from 'fumadocs-mdx/config'
// Single docs collection: every .md/.mdx under content/docs/. Frontmatter `title` is
// required (Fumadocs renders it as the page heading); `description` is optional.
export const docs = defineDocs({
dir: 'content/docs',
})
// Global MDX options (remark/rehype). Default export is read by the fumadocs-mdx vite plugin.
export default defineConfig()
+15
@@ -0,0 +1,15 @@
import defaultMdxComponents from 'fumadocs-ui/mdx'
import type { MDXComponents } from 'mdx/types'
export function getMDXComponents(components?: MDXComponents) {
return {
...defaultMdxComponents,
...components,
} satisfies MDXComponents
}
export const useMDXComponents = getMDXComponents
declare global {
type MDXProvidedComponents = ReturnType<typeof getMDXComponents>
}
+10
@@ -0,0 +1,10 @@
import type { BaseLayoutProps } from 'fumadocs-ui/layouts/shared'
// Shared chrome (nav title, links) for both the docs layout and the home layout.
export function baseOptions(): BaseLayoutProps {
return {
nav: {
title: 'punktfunk',
},
}
}
+7
@@ -0,0 +1,7 @@
import { docs } from 'collections/server'
import { loader } from 'fumadocs-core/source'
export const source = loader({
baseUrl: '/docs',
source: docs.toFumadocsSource(),
})
+29
@@ -0,0 +1,29 @@
import { createRouter as createTanStackRouter, Link } from '@tanstack/react-router'
import { routeTree } from './routeTree.gen'
export function getRouter() {
return createTanStackRouter({
routeTree,
defaultPreload: 'intent',
scrollRestoration: true,
defaultNotFoundComponent: NotFound,
})
}
function NotFound() {
return (
<main className="flex flex-1 flex-col items-center justify-center gap-4 px-4 py-24 text-center">
<h1 className="text-3xl font-bold">404</h1>
<p className="text-fd-muted-foreground">This page could not be found.</p>
<Link to="/" className="text-fd-primary underline underline-offset-4">
Back home
</Link>
</main>
)
}
declare module '@tanstack/react-router' {
interface Register {
router: ReturnType<typeof getRouter>
}
}
+40
@@ -0,0 +1,40 @@
/// <reference types="vite/client" />
import { createRootRoute, HeadContent, Outlet, Scripts } from '@tanstack/react-router'
import * as React from 'react'
import { RootProvider } from 'fumadocs-ui/provider/tanstack'
import appCss from '@/styles/app.css?url'
export const Route = createRootRoute({
head: () => ({
meta: [
{ charSet: 'utf-8' },
{ name: 'viewport', content: 'width=device-width, initial-scale=1' },
{ name: 'color-scheme', content: 'dark light' },
{ title: 'punktfunk docs' },
],
links: [{ rel: 'stylesheet', href: appCss }],
}),
component: RootComponent,
})
function RootComponent() {
return (
<RootDocument>
<Outlet />
</RootDocument>
)
}
function RootDocument({ children }: { children: React.ReactNode }) {
return (
<html lang="en" suppressHydrationWarning>
<head>
<HeadContent />
</head>
<body className="flex flex-col min-h-screen">
<RootProvider>{children}</RootProvider>
<Scripts />
</body>
</html>
)
}
+16
@@ -0,0 +1,16 @@
import { createFileRoute } from '@tanstack/react-router'
import { source } from '@/lib/source'
import { createFromSource } from 'fumadocs-core/search/server'
const server = createFromSource(source, {
// https://docs.orama.com/docs/orama-js/supported-languages
language: 'english',
})
export const Route = createFileRoute('/api/search')({
server: {
handlers: {
GET: async ({ request }) => server.GET(request),
},
},
})
+58
@@ -0,0 +1,58 @@
import { createFileRoute, notFound } from '@tanstack/react-router'
import { DocsLayout } from 'fumadocs-ui/layouts/docs'
import { createServerFn } from '@tanstack/react-start'
import { source } from '@/lib/source'
import browserCollections from 'collections/browser'
import { DocsBody, DocsDescription, DocsPage, DocsTitle } from 'fumadocs-ui/layouts/docs/page'
import { baseOptions } from '@/lib/layout.shared'
import { useFumadocsLoader } from 'fumadocs-core/source/client'
import { Suspense } from 'react'
import { useMDXComponents } from '@/components/mdx'
export const Route = createFileRoute('/docs/$')({
component: Page,
loader: async ({ params }) => {
const slugs = params._splat?.split('/') ?? []
const data = await serverLoader({ data: slugs })
await clientLoader.preload(data.path)
return data
},
})
const serverLoader = createServerFn({
method: 'GET',
})
.validator((slugs: string[]) => slugs)
.handler(async ({ data: slugs }) => {
const page = source.getPage(slugs)
if (!page) throw notFound()
return {
path: page.path,
pageTree: await source.serializePageTree(source.getPageTree()),
}
})
const clientLoader = browserCollections.docs.createClientLoader({
component({ toc, frontmatter, default: MDX }, _props: undefined) {
return (
<DocsPage toc={toc}>
<DocsTitle>{frontmatter.title}</DocsTitle>
<DocsDescription>{frontmatter.description}</DocsDescription>
<DocsBody>
<MDX components={useMDXComponents()} />
</DocsBody>
</DocsPage>
)
},
})
function Page() {
const data = useFumadocsLoader(Route.useLoaderData())
return (
<DocsLayout {...baseOptions()} tree={data.pageTree}>
<Suspense>{clientLoader.useContent(data.path)}</Suspense>
</DocsLayout>
)
}
+26
@@ -0,0 +1,26 @@
import { createFileRoute, Link } from '@tanstack/react-router'
import { HomeLayout } from 'fumadocs-ui/layouts/home'
import { baseOptions } from '@/lib/layout.shared'
export const Route = createFileRoute('/')({ component: Home })
function Home() {
return (
<HomeLayout {...baseOptions()}>
<main className="flex flex-1 flex-col items-center justify-center gap-6 px-4 py-24 text-center">
<h1 className="text-4xl font-bold tracking-tight">punktfunk</h1>
<p className="max-w-xl text-fd-muted-foreground">
A ground-up low-latency desktop and game streaming stack, built Linux-first, with a
shared Rust protocol core and native clients per platform.
</p>
<Link
to="/docs/$"
params={{ _splat: '' }}
className="rounded-lg bg-fd-primary px-5 py-2.5 font-medium text-fd-primary-foreground"
>
Read the docs
</Link>
</main>
</HomeLayout>
)
}
+6
@@ -0,0 +1,6 @@
@import 'tailwindcss';
@import 'fumadocs-ui/css/neutral.css';
@import 'fumadocs-ui/css/preset.css';
/* Pull Fumadocs UI's own classes into the Tailwind 4 scan so they aren't purged. */
@source '../../node_modules/fumadocs-ui/dist/**/*.js';
+25
@@ -0,0 +1,25 @@
{
"compilerOptions": {
"target": "ES2022",
"lib": ["ES2023", "DOM", "DOM.Iterable"],
"module": "ESNext",
"moduleResolution": "Bundler",
"jsx": "react-jsx",
"strict": true,
"skipLibCheck": true,
"esModuleInterop": true,
"verbatimModuleSyntax": true,
"noUncheckedIndexedAccess": true,
"resolveJsonModule": true,
"isolatedModules": true,
"allowJs": true,
"checkJs": false,
"noEmit": true,
"types": ["vite/client"],
"paths": {
"@/*": ["./src/*"],
"collections/*": ["./.source/*"]
}
},
"include": ["src", "source.config.ts", "vite.config.ts"]
}
+30
@@ -0,0 +1,30 @@
import { defineConfig } from 'vite'
import tsConfigPaths from 'vite-tsconfig-paths'
import tailwindcss from '@tailwindcss/vite'
import mdx from 'fumadocs-mdx/vite'
import { tanstackStart } from '@tanstack/react-start/plugin/vite'
import { nitroV2Plugin } from '@tanstack/nitro-v2-vite-plugin'
import viteReact from '@vitejs/plugin-react'
export default defineConfig({
server: { port: 3001 },
plugins: [
// Fumadocs MDX must run first: it registers the resolver/loader that transforms
// .md/.mdx and emits the `.source` typegen (the `collections/*` virtual modules the
// docs route imports), before route generation and the React transform.
mdx(),
tsConfigPaths({ projects: ['./tsconfig.json'] }),
tailwindcss(),
// Full SSR on the TanStack Start runtime; the docs render server-side then hydrate.
tanstackStart(),
// Nitro v2 is the deployment target: the `bun` preset bundles a Bun-runnable server
// to .output/ (`bun run .output/server/index.mjs`), matching the rest of the project.
nitroV2Plugin({
preset: 'bun',
compatibilityDate: '2026-06-12',
}),
// Must come AFTER tanstackStart — provides the React JSX transform + Refresh runtime
// that Start's dev mode requires.
viteReact(),
],
})