New standalone app at docs-site/ — Fumadocs (fumadocs-core/ui 16, fumadocs-mdx
15) on TanStack Start (Vite 7 + nitro-v2 bun preset, React 19, Tailwind 4),
mirroring the web/ console stack but with no auth/i18n/orval — docs stay public.
- catch-all docs route (routes/docs/$.tsx), Orama search (routes/api/search.ts),
RootProvider shell, MDX component map, shared nav, custom 404
- content/docs/: hand-written index.mdx + meta.json nav, plus 7 pages imported
from repo docs/ + README (leading H1 stripped, YAML frontmatter added; kept as
.md so existing literal < and { characters don't trip MDX's JSX parsing).
Content is a one-time snapshot.
- mdx() is plugins[0]; tsconfig collections/* -> ./.source/*; SSR search variant;
@source for fumadocs-ui classes. Generated .source/routeTree/dist/.output ignored.
Verified: bun run build (client+SSR+nitro) green, tsc clean, dev + prod servers
serve all routes 200 with SSR content + nav, search returns hits, 404 works.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,111 @@
---
title: "DualSense Haptics"
description: "Feasibility and scoping for audio-driven DualSense haptics."
---

**Status: scoped, NO-GO for now (deferred).** Advanced voice-coil haptics on the DualSense are
driven by the controller's **USB audio interface** (4-channel surround, the back two channels carry
the haptic waveform), *not* by HID reports. Emulating that on a Linux host and faithfully replaying
it on the Apple client both hit hard walls, and the supply of software that actually *emits* these
haptics on a Linux host is essentially zero. We defer the audio-haptics feature and instead land the
parts of "really supporting the DualSense" that *are* reachable: **adaptive triggers (HID) and
two-motor rumble.**

(Grounded in a 4-agent feasibility read — host USB-gadget viability, DualSense audio descriptors,
Linux game demand, Apple client render path — 2026-06-10.)

## The one distinction that decides everything

| Feature | How it's driven | Reachable for us? |
|---|---|---|
| Basic rumble (2 motors) | HID output report `0x02`, bytes 3–4 | **Yes** — already parsed; client already has `nextRumble()` |
| **Adaptive triggers** (L2/R2 resistance) | HID output report `0x02`, bytes 11–22 / 22–33 | **Yes** — already parsed in `dualsense.rs`; just needs the `0xCD` back-channel + client render |
| **Advanced haptics** (voice-coil actuators) | **USB *audio* interface** — 4-ch, back 2 channels = haptic PCM | **No (for now)** — see the three walls below |
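
To make the HID rows concrete, here is a minimal Rust sketch that pulls the rumble bytes and the two trigger-effect blocks out of a USB output report `0x02`. It is illustrative only, not the parser in `dualsense.rs`. It reads the table's ranges as half-open (`11..22`, `22..33`, 11 bytes each) with the report-ID byte at index 0. `OutputReportView` and `parse_output_report` are hypothetical names, and which block belongs to L2 versus R2 should be checked against `dualsense.rs` before relying on it.

```rust
/// Fields of interest from a DualSense USB output report 0x02 (hypothetical view type).
#[derive(Debug, PartialEq)]
pub struct OutputReportView<'a> {
    /// Bytes 3 and 4: the two rumble motor strengths, 0..=255.
    pub rumble: (u8, u8),
    /// Bytes 11..22: first adaptive-trigger effect block (11 bytes).
    pub trigger_block_a: &'a [u8],
    /// Bytes 22..33: second adaptive-trigger effect block (11 bytes).
    pub trigger_block_b: &'a [u8],
}

/// Returns `None` if the buffer is not report 0x02 or is too short to hold
/// both trigger blocks. Offsets include the leading report-ID byte at index 0.
pub fn parse_output_report(report: &[u8]) -> Option<OutputReportView<'_>> {
    const REPORT_ID: u8 = 0x02;
    const MIN_LEN: usize = 33; // the second trigger block ends at byte 33 (exclusive)
    if report.len() < MIN_LEN || report[0] != REPORT_ID {
        return None;
    }
    Some(OutputReportView {
        rumble: (report[3], report[4]),
        trigger_block_a: &report[11..22],
        trigger_block_b: &report[22..33],
    })
}
```

Returning borrowed slices keeps the parse zero-copy. Decoding the effect modes inside each block is a separate step.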

The UHID DualSense we already built is **HID-only**. It cannot present the DualSense's *audio*
interface, so it structurally cannot carry advanced haptics. That's not a bug in our implementation —
it's the wrong transport for this signal.

## The three walls (any one is fatal on its own)

### Wall 1 — Host capture needs a kernel rebuild

To *capture* haptic audio a game emits, the host must present a virtual device that owns the
DualSense audio interface. The standard way is a composite USB gadget (`configfs` + `f_hid` +
`f_uac2`) bound to a software UDC (`dummy_hcd`).

- ✅ Present & enabled on this box: `CONFIG_USB_CONFIGFS`, `CONFIG_USB_CONFIGFS_F_HID`,
  `CONFIG_USB_CONFIGFS_F_UAC2`, plus `libcomposite`/`usb_f_hid`/`usb_f_uac2`/`u_audio` modules.
- ❌ **Blocker:** `# CONFIG_USB_DUMMY_HCD is not set` in `/boot/config-7.0.0-22-generic`. No
  `dummy_hcd.ko`, no `/sys/class/udc/`. **No UDC → nothing to bind the gadget to.** Requires a
  custom kernel build to enable `CONFIG_USB_DUMMY_HCD=m`, plus root for module-load/configfs.
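
The blocker can be detected before attempting any gadget setup. This is a minimal sketch that assumes only the standard sysfs layout, where each registered UDC appears as an entry under `/sys/class/udc/`. An empty or missing directory means there is nothing to bind a configfs gadget to. `list_udcs` is a hypothetical helper, and the directory is a parameter so the check can be tested against a fake tree.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Lists USB Device Controllers visible under `udc_dir` (normally /sys/class/udc).
/// A missing directory is treated the same as an empty one: no UDC available.
pub fn list_udcs(udc_dir: &Path) -> io::Result<Vec<String>> {
    let entries = match fs::read_dir(udc_dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut names = Vec::new();
    for entry in entries {
        // Each entry's file name is the UDC name a gadget's UDC attribute would reference.
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    Ok(names)
}
```

On this box, `list_udcs(Path::new("/sys/class/udc"))` would return an empty list. With `dummy_hcd` loaded, one would expect an entry such as `dummy_udc.0`.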

A lighter alternative exists — a **virtual PipeWire/ALSA sink renamed as the DualSense** (this is how
the working Linux setups capture the back two channels today, via WirePlumber rules). It skips the
kernel rebuild, but it is still gated by Wall 2 below. It is also fragile, because each game
hardcodes how it detects the controller's audio device.

### Wall 2 — Almost nothing on a Linux host emits these haptics

This is the decisive one. The *supply* that would feed our capture barely exists:

- **Steam Input (Linux):** no official advanced-haptics support (open feature request as of 2026).
- **Sony's `hid-playstation` kernel driver:** explicitly does **not** expose voice-coil (VCM) haptics or
  adaptive triggers — basic rumble only.
- **RPCS3:** treats the DualSense as a generic pad; no advanced haptics.
- **Native Linux games:** effectively **zero** with advanced haptics.
- **The only working path** is a handful of Proton titles (FF7 Remake, Ghostwire, Deathloop, Animal
  Well, Stellar Blade) via ClearlyClaire's *custom Wine patches*. It works only over **USB**, with Steam
  Input disabled, the audio profile forced to 4.0 surround, and the device renamed to match Windows.
  That is ~5–10 games total.
- Bluetooth can't carry it on Linux *or* Windows (Sony's proprietary A2DP repurposing isn't exposed).

A host-side capture feature is only as useful as the software willing to drive it. On Linux that set
is a niche-of-a-niche.

### Wall 3 — The Apple client can't faithfully replay it

Even with a captured waveform, the primary client (macOS/iOS) can't render it well:

- macOS GameController exposes the DualSense's adaptive triggers (`GCDualSenseAdaptiveTrigger`) and
  CoreHaptics engines via `GCDeviceHaptics`. It gives **no access to the voice-coil actuators as an
  audio sink**.
- CoreHaptics is **discrete, pattern-based** (`CHHapticPattern` events, ≤30 s), **not** a PCM
  streaming sink. Converting a streamed haptic waveform to patterns is lossy — it throws away exactly
  the fidelity that makes voice-coil haptics worth having.
- There is **no public macOS API** to route CoreAudio to the DualSense's channels 3–4. Doing it
  anyway means private/reverse-engineered APIs that break across OS updates.
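
To see why pattern conversion is lossy, here is a minimal Rust sketch of the obvious approach. It slices the haptic PCM into fixed windows and keeps one peak intensity per window, the kind of value a discrete haptic event carries. It is an illustration under stated assumptions (mono `f32` samples in −1.0..=1.0, a hypothetical `HapticEvent` type), not a proposed implementation. Everything inside a window, including frequency content and waveform shape, collapses to a single number.

```rust
/// One discrete event: start time in seconds and a 0.0..=1.0 intensity (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HapticEvent {
    pub time_s: f32,
    pub intensity: f32,
}

/// Reduce mono PCM to one event per `window_ms` window, keeping only the peak
/// absolute amplitude. Waveform shape and frequency inside a window are lost.
pub fn pcm_to_events(samples: &[f32], sample_rate: u32, window_ms: u32) -> Vec<HapticEvent> {
    let window = ((sample_rate as u64 * window_ms as u64) / 1000).max(1) as usize;
    samples
        .chunks(window)
        .enumerate()
        .map(|(i, chunk)| {
            let peak = chunk.iter().fold(0.0f32, |m, s| m.max(s.abs()));
            HapticEvent {
                time_s: (i * window) as f32 / sample_rate as f32,
                intensity: peak.min(1.0),
            }
        })
        .collect()
}
```

For example, a 100 Hz sine and a 200 Hz sine of equal amplitude produce the same envelope. The two feel different on a voice-coil actuator, but not in the converted pattern.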

## What we *can* ship instead ("really supporting the DualSense" minus audio haptics)

The HID DualSense we built is the foundation, and the high-value parts are within reach:

1. **Adaptive triggers — GO.** `dualsense.rs` already parses the L2/R2 trigger effects out of HID
   output report `0x02`. Finishing this is the paused HID work: route them over the `0xCD`
   HID-output back-channel and render them on the client (`GCDualSenseAdaptiveTrigger` on Apple).
   This delivers the headline "DualSense feel" (trigger resistance/weapon tension) for any source
   that emits it — and it's pure HID, no audio interface, no kernel rebuild.
2. **Two-motor rumble — host side done.** Parsed host-side, and the Apple client already has
   `nextRumble()`. What remains is wiring it to `GCDeviceHaptics`/`CHHapticEngine` as discrete
   patterns (API-clean, no private APIs).
3. **LED / player-LED / touchpad / motion** — already parsed; finish the `0xCC`/`0xCD` routing.

This is the resumable HID DualSense Phase C/D/E work — it stands on its own and was never blocked.

## Conditions for a future GO on audio haptics

Revisit only when **all three** of these hold:

- A real DualSense is available on the dev box to capture an authoritative `lsusb -v` + the exact
  UAC channel/sample-rate/format layout (today: undocumented, would need reverse-engineering).
- The host target gains a UDC (custom kernel with `dummy_hcd`, or real hardware OTG) **or** we accept
  the PipeWire-renamed-sink path *and* the title set that emits haptics on Linux grows beyond the
  Proton-patch niche.
- The client target shifts to one that can render PCM haptics (a Linux/Windows client with direct
  access to the controller's audio channels, or a future Apple API) — or we accept lossy pattern
  conversion.

Until then the cost/benefit is upside-down: three hard subsystems (kernel, USB gadget, audio
routing) to serve ~5–10 Proton titles, rendered lossily on the one client we ship.

## Recommendation

**Defer audio-driven advanced haptics. Land adaptive triggers (HID) + rumble instead** — that's the
reachable 80% of "really supporting the DualSense," needs no kernel work, and the parsing is already
written. Keep this doc as the down payment for the audio-haptics feature whenever the three
conditions above are met.
@@ -0,0 +1,300 @@
---
title: "Implementation Plan"
description: "The full design: protocol core, milestones, and architecture."
---

*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.*

> `punktfunk` is a placeholder codename — rename freely. It fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point.

---

## 0. The thesis (why this is worth building)

Two concrete gaps justify a new project rather than another fork:

1. **The 1 Gbps wall is a FEC design limit, not a bandwidth limit.** Moonlight/Sunshine protect each frame with Reed–Solomon over GF(2⁸), which caps a block at 255 shards. At 5120×1440@240 that ceiling is hit around 1 Gbps. Switching the erasure code to **Leopard-RS over GF(2¹⁶)** (via the `reed-solomon-simd` crate) raises the per-block shard limit to 65,536 and runs in O(n log n) with SIMD. The wall disappears as a *consequence* of a better core, not as a hack.

2. **Linux software virtual displays are a real, unfilled gap.** The compositor-side capability now exists (Mutter headless virtual monitors since GNOME 40; wlroots headless outputs; KWin virtual outputs in Plasma 6), but no streaming host *drives* those APIs to create a client-sized output on demand, capture it via PipeWire, and route input back via libei. Apollo's virtual display is Windows-only. This is the immediate, shippable win.
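
The shard arithmetic behind point 1 fits in a few lines. This Rust sketch counts the data shards a frame needs at a given bitrate and frame rate. It then checks whether data plus parity fit in one GF(2⁸) block (255 shards) or one GF(2¹⁶) block (65,536). The 1,400-byte shard payload and the 20% parity figure are illustrative assumptions, not Moonlight's or punktfunk's actual parameters. Real senders may also split a frame across several blocks, so the exact ceiling depends on their framing.

```rust
/// Maximum shards (data + parity) in one Reed–Solomon block for each field size.
pub const GF256_MAX_SHARDS: u64 = 255;
pub const GF65536_MAX_SHARDS: u64 = 65_536;

/// Data shards needed to carry one frame at `bitrate_bps` and `fps`,
/// assuming each shard carries `shard_payload` bytes.
pub fn data_shards_per_frame(bitrate_bps: u64, fps: u64, shard_payload: u64) -> u64 {
    let bits_per_frame = div_ceil(bitrate_bps, fps);
    let bytes_per_frame = div_ceil(bits_per_frame, 8);
    div_ceil(bytes_per_frame, shard_payload)
}

/// Parity shards for a given overhead percentage (rounded up).
pub fn parity_shards(data: u64, parity_percent: u64) -> u64 {
    div_ceil(data * parity_percent, 100)
}

/// Whether data + parity fit in a single block with the given shard limit.
pub fn fits_one_block(data: u64, parity: u64, max_shards: u64) -> bool {
    data + parity <= max_shards
}

/// Integer ceiling division for positive divisors.
fn div_ceil(a: u64, b: u64) -> u64 {
    (a + b - 1) / b
}
```

Under these assumptions, one frame at 1 Gbps and 240 fps needs 373 data shards and 75 parity shards (448 total). That does not fit in a single GF(2⁸) block but fits easily in a GF(2¹⁶) one.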

**Strategic ordering:** ship the Linux virtual-display host speaking the *existing* Moonlight protocol first (every Moonlight/Artemis client works on day one, no client to write). Only then introduce the new GF(2¹⁶) transport as a negotiated protocol extension with our own clients. Value early, hard parts deferred until de-risked.

---

## 1. Scope & non-goals

**In scope (eventually):**
- Linux streaming host with on-demand software virtual displays (KWin first, then wlroots, then Mutter).
- A shared Rust protocol/transport/FEC core exposed over a stable C ABI.
- A modern transport that removes the 1 Gbps ceiling.
- Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core.

**Explicit non-goals (at least at first):**
- Windows *host* support (Sunshine/Apollo already do this well; no gap to fill).
- Internet/NAT-traversal relay infrastructure (LAN/VPN first; you already run Headscale/NetBird — lean on that).
- Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs).
- A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6).

---

## 2. Architecture overview

```mermaid
flowchart TD
    subgraph Host["Linux Host (Rust)"]
        VD["Virtual display orchestrator<br/>(KWin / wlroots / Mutter)"]
        CAP["Capture<br/>(PipeWire / dmabuf)"]
        ENC["Encoder<br/>(VAAPI / NVENC via FFmpeg)"]
        COREH["punktfunk-core (C ABI)<br/>protocol · FEC · pacing · crypto"]
        IN_H["Input injector<br/>(libei / uinput)"]
        VD --> CAP --> ENC --> COREH
        COREH --> IN_H
    end

    subgraph Client["Client (Rust / Swift / Kotlin)"]
        COREC["punktfunk-core (same crate, C ABI)"]
        DEC["Decoder<br/>(VideoToolbox / NVDEC / VAAPI)"]
        PRES["Present + frame pacing"]
        INP["Input capture"]
        COREC --> DEC --> PRES
        INP --> COREC
    end

    COREH <-->|"UDP+FEC video / QUIC control+audio"| COREC
```

**The load-bearing decision:** `punktfunk-core` is one crate, compiled once, linked by every host and client through a C ABI. Protocol logic, FEC, packet pacing, jitter buffering, pairing, and crypto live there and exist exactly once. Platform code (capture, encode, decode, present, input, UI) lives outside the core and is written in whatever language suits the platform.

---

## 3. Protocol strategy (three phases)

| Phase | Protocol | Clients that work | Bitrate ceiling | Purpose |
|------|----------|-------------------|-----------------|---------|
| **P1** | GameStream-compatible (existing Moonlight wire format) | All existing Moonlight/Artemis clients | ~1 Gbps (legacy GF(2⁸) FEC) | Ship the Linux virtual-display win with zero client work |
| **P2** | `punktfunk/1` negotiated extension: GF(2¹⁶) FEC, multi-block framing, optional QUIC control | punktfunk clients only; falls back to P1 for others | Multi-Gbps | Break the wall; introduce native clients |
| **P3** | `punktfunk/1` as primary; GameStream kept as compat shim | punktfunk everywhere, Moonlight as fallback | Multi-Gbps | Full control of features (mic passthrough, per-client identity, HDR signalling) |

Negotiation: extend the `serverinfo`/RTSP `SETUP` handshake with a capability flag. Clients that don't recognize the flag ignore it and get P1 behavior. This is how Apollo/Artemis diverge cleanly, and it keeps you compatible while you build.
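
A minimal sketch of that negotiation rule. It assumes a hypothetical capability bit (`CAP_PUNKTFUNK_V1`) that the host advertises and the client echoes back in its `SETUP` request. The bit value and type names are placeholders. The point is that P2 is selected only when both sides opt in, so any client that doesn't know the bit falls through to P1.

```rust
/// Hypothetical capability bit for the punktfunk/1 extension.
pub const CAP_PUNKTFUNK_V1: u32 = 1 << 0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    /// GameStream-compatible wire format (legacy GF(2^8) FEC).
    P1GameStream,
    /// punktfunk/1 extension (GF(2^16) FEC, multi-block framing).
    P2Punktfunk,
}

/// Pick the session protocol from the host's advertised capabilities and the
/// capabilities the client requested in SETUP. Unknown bits are ignored.
pub fn negotiate(host_caps: u32, client_requested: u32) -> Protocol {
    if (host_caps & client_requested & CAP_PUNKTFUNK_V1) != 0 {
        Protocol::P2Punktfunk
    } else {
        Protocol::P1GameStream
    }
}
```

Because the fallback is the default branch, a stock Moonlight client (which never sets the bit) always lands on P1 without any special-casing.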

---

## 4. Tech stack (settled)

**Language split:** Rust for the core and all non-Apple platform code; Swift only for the macOS/iOS client UI + VideoToolbox/Metal; Kotlin for Android UI + MediaCodec. The C ABI is the seam.

**Threading:** native OS threads for the video hot path. `tokio` is allowed *only* for the control plane (pairing, web config, QUIC control stream). The per-frame pipeline must never touch an async runtime.

### Core crate dependencies

| Concern | Crate | Notes |
|--------|-------|-------|
| FEC | `reed-solomon-simd` (v3+) | Leopard/GF(2¹⁶), SIMD, O(n log n) — the wall-breaker |
| QUIC (control/audio) | `quinn` | Datagram ext for audio; reliable streams for control |
| TLS / crypto | `rustls` + `ring` (or `aws-lc-rs`) | Pairing, session keys (AES-GCM to match GameStream in P1) |
| Serialization | `zerocopy` / `bytes` | Wire structs `#[repr(C)]`, zero-copy parse |
| C header gen | `cbindgen` | Generates `punktfunk_core.h` from the ABI module |
| Error/log | `tracing` | Structured; feature-gate off the hot path |

### Linux host dependencies

| Concern | Crate / API | Notes |
|--------|-------------|-------|
| Capture | `pipewire` (pipewire-rs) | ScreenCast portal stream → dmabuf |
| Portal / DBus | `ashpd` + `zbus` | xdg-desktop-portal: ScreenCast, RemoteDesktop |
| Encode | `ffmpeg-next` or `rsmpeg` | VAAPI / NVENC, dmabuf import (zero-copy) |
| Input inject | `reis` (libei) + `input-linux` (uinput fallback) | Wayland-native first, uinput as universal fallback |
| Virtual output | per-compositor (see §6) | KWin DBus / Sway `create_output` / Mutter DBus |
| Web config | `axum` + `tokio` + small Vite/React UI | You own this stack already |

### Apple client (P2+)

Swift + VideoToolbox (decode) + Metal (present) + SwiftUI. Imports `punktfunk_core.h` directly via a module map — no glue layer.

### Ruled out

- **Swift for the host/core:** no Linux Wayland/PipeWire/DRM/VAAPI ecosystem; ARC in hot loops. (Excellent *Apple-client* language, wrong for systems/Linux.)
- **Go:** GC disqualifies the hot path.
- **C++:** throws away the safety/concurrency wins that justified greenfield over forking.
- **Zig:** best-in-class C interop, but pre-1.0 with no Wayland/QUIC ecosystem — too much risk for a multi-month build. Revisit later if desired.

---

## 5. The C ABI boundary

Design it on day one; retrofitting an ABI is painful.

**Principles**
- Opaque handles only across the boundary: `PunktfunkSession*`, never Rust types.
- All cross-boundary structs are `#[repr(C)]`; primitives + pointer/len pairs for buffers.
- Async events via registered C callbacks (`fn ptr` + `void* userdata`).
- Explicit, documented ownership: who frees what, when. Provide `punktfunk_*_free` for every allocation handed out across the boundary.
- Versioned ABI: `uint32_t punktfunk_abi_version(void)` + a `PunktfunkConfig` struct whose first field is its own size for forward-compat.
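
The size-prefixed config is the part that is easiest to get subtly wrong, so here is a minimal Rust sketch of how the core side might accept it. Field names other than the leading size, the default values, and the `read_config` helper are all hypothetical. The idea is to copy only as many bytes as both sides know about. An older caller (smaller struct) then gets defaults for newer fields, and a newer caller's extra fields are ignored. A real export would also carry `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` on edition 2024).

```rust
use std::mem::size_of;
use std::ptr;

pub const PUNKTFUNK_ABI_VERSION: u32 = 1;

/// Config passed across the C ABI. `struct_size` must stay the first field.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PunktfunkConfig {
    pub struct_size: u32,      // sizeof(PunktfunkConfig) as compiled by the caller
    pub max_bitrate_kbps: u32, // hypothetical field
    pub fec_percent: u32,      // hypothetical field, imagined as added in a later revision
}

impl Default for PunktfunkConfig {
    fn default() -> Self {
        PunktfunkConfig {
            struct_size: size_of::<PunktfunkConfig>() as u32,
            max_bitrate_kbps: 500_000,
            fec_percent: 20,
        }
    }
}

pub extern "C" fn punktfunk_abi_version() -> u32 {
    PUNKTFUNK_ABI_VERSION
}

/// Copy a caller-owned config, tolerating callers built against an older
/// (smaller) or newer (larger) layout.
///
/// # Safety
/// `cfg` must be null or point to at least `struct_size` readable bytes.
pub unsafe fn read_config(cfg: *const PunktfunkConfig) -> Option<PunktfunkConfig> {
    if cfg.is_null() {
        return None;
    }
    // Read only the leading size field first; later fields may not exist.
    let caller_size = unsafe { ptr::read_unaligned(cfg as *const u32) } as usize;
    if caller_size < size_of::<u32>() {
        return None;
    }
    let mut out = PunktfunkConfig::default();
    let n = caller_size.min(size_of::<PunktfunkConfig>());
    unsafe {
        ptr::copy_nonoverlapping(cfg as *const u8, &mut out as *mut PunktfunkConfig as *mut u8, n);
    }
    out.struct_size = size_of::<PunktfunkConfig>() as u32;
    Some(out)
}
```

With this rule, appending fields never breaks existing callers. The discipline it demands is that fields are only ever added at the end.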

**Minimal surface (sketch)**

```c
#include <stddef.h>
#include <stdint.h>

// opaque / forward-declared types; full layouts come from the cbindgen-generated header
typedef struct PunktfunkSession PunktfunkSession;
typedef struct PunktfunkConfig PunktfunkConfig;
typedef struct PunktfunkFrame PunktfunkFrame;
typedef struct PunktfunkInputEvent PunktfunkInputEvent;
typedef struct PunktfunkStats PunktfunkStats;
typedef uint32_t PunktfunkFrameFlags;                                         // assumed: bitflags
typedef void (*PunktfunkInputCb)(const PunktfunkInputEvent* ev, void* user); // assumed signature

// versioning
uint32_t punktfunk_abi_version(void);

// lifecycle
PunktfunkSession* punktfunk_session_new(const PunktfunkConfig* cfg);
void punktfunk_session_free(PunktfunkSession*);

// host: feed an encoded access unit (the core does FEC + packetize + pace + send)
int punktfunk_host_submit_frame(PunktfunkSession*, const uint8_t* data, size_t len,
                                uint64_t pts_ns, PunktfunkFrameFlags flags);

// client: pull a reassembled, FEC-recovered access unit ready to decode
int punktfunk_client_poll_frame(PunktfunkSession*, PunktfunkFrame* out /*borrowed until next poll*/);

// input (both directions): client captures, host receives via callback
int punktfunk_send_input(PunktfunkSession*, const PunktfunkInputEvent*);
void punktfunk_set_input_callback(PunktfunkSession*, PunktfunkInputCb, void* user);

// stats for the frame-pacing/quality logic and the web UI
void punktfunk_get_stats(PunktfunkSession*, PunktfunkStats* out);
```

Keep it this small. Everything platform-specific (how you got the encoded bytes, how you decode them) stays on the platform side.
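
On the Rust side, the opaque-handle and callback principles map onto a boxed struct handed out as a raw pointer, plus a stored `(fn ptr, userdata)` pair. The sketch below is a simplified, hypothetical version. Here `punktfunk_session_new` takes no config, the fields of `PunktfunkInputEvent` are invented, and `dispatch_input` stands in for whatever internal path delivers received input to the host.

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical input event layout shared with C.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct PunktfunkInputEvent {
    pub kind: u32,
    pub code: u32,
    pub value: i32,
}

/// C callback `void (*)(const PunktfunkInputEvent*, void* user)`; NULL maps to `None`.
pub type PunktfunkInputCb = Option<extern "C" fn(*const PunktfunkInputEvent, *mut c_void)>;

/// Opaque to C: only ever seen as `PunktfunkSession*`.
pub struct PunktfunkSession {
    input_cb: PunktfunkInputCb,
    input_user: *mut c_void,
}

/// Ownership passes to the caller, who must release it with `punktfunk_session_free`.
pub extern "C" fn punktfunk_session_new() -> *mut PunktfunkSession {
    Box::into_raw(Box::new(PunktfunkSession { input_cb: None, input_user: ptr::null_mut() }))
}

/// # Safety
/// `s` must be null or a pointer from `punktfunk_session_new`, freed at most once.
pub unsafe extern "C" fn punktfunk_session_free(s: *mut PunktfunkSession) {
    if !s.is_null() {
        drop(unsafe { Box::from_raw(s) });
    }
}

/// # Safety
/// `s` must be null or a live session; `user` is passed back verbatim and never freed by the core.
pub unsafe extern "C" fn punktfunk_set_input_callback(
    s: *mut PunktfunkSession,
    cb: PunktfunkInputCb,
    user: *mut c_void,
) {
    if let Some(session) = unsafe { s.as_mut() } {
        session.input_cb = cb;
        session.input_user = user;
    }
}

/// Internal: deliver a received input event to the registered callback, if any.
fn dispatch_input(session: &PunktfunkSession, ev: &PunktfunkInputEvent) {
    if let Some(cb) = session.input_cb {
        cb(ev as *const PunktfunkInputEvent, session.input_user);
    }
}
```

The event pointer passed to the callback is only valid for the duration of the call. A C consumer that needs the event later has to copy it.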

---

## 6. Virtual display orchestration

This is the differentiator and the most fragmented part. Two deployment models — support both eventually, pick one for the MVP.

**Model A — Attach to the running session.** Create a client-sized virtual output *inside the user's live desktop*, stream it, tear it down on disconnect. This is "add a monitor to my actual PC." Best UX, hardest because it depends on per-compositor runtime APIs.

**Model B — Dedicated headless session.** Spawn a separate headless compositor purely for the stream (e.g. `gnome-shell --headless --virtual-monitor WxH`, or a headless wlroots compositor). Cleaner isolation, sidesteps runtime-output APIs, ideal for "remote second PC." Worse for "mirror/extend my real desktop."

**Per-compositor (Model A) runtime virtual-output creation:**
- **KWin / Plasma 6 (recommended MVP target — matches your CachyOS/KDE daily driver and where the gap is loudest):** KWin can create virtual outputs; KRdp already does this internally for remote sessions. Drive it via the KWin DBus interface; capture via `xdg-desktop-portal-kde` ScreenCast (PipeWire); inject input via the RemoteDesktop portal or `reis`.
- **wlroots (Sway/Hyprland — fastest to *prototype* the pipeline):** enable the headless backend (`WLR_BACKENDS=…,headless`), then `swaymsg create_output` / `hyprctl output create headless`. Capture via `wlr-screencopy` or the portal. Simplest API; good for validating capture→encode→send before fighting KWin/Mutter.
- **Mutter / GNOME:** virtual monitors via the headless backend; runtime creation via Mutter DBus (`org.gnome.Mutter.*` — partly experimental). Capture via `xdg-desktop-portal-gnome` ScreenCast.

**Recommendation:** do a 1–2 day wlroots spike to prove the *pipeline*, then build the real MVP on KWin because that's your deployment target. Abstract virtual-output creation behind a trait so compositors are pluggable:

```rust
pub struct Mode { pub width: u32, pub height: u32, pub refresh_mhz: u32 } // illustrative fields
pub struct OutputHandle(pub u64); // opaque, backend-specific id
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;
pub trait VirtualDisplay {
    fn create(&self, mode: Mode) -> Result<OutputHandle>;
    fn destroy(&self, h: OutputHandle) -> Result<()>;
}
```

---

## 7. The hot path: pipeline & latency budget

Per-frame pipeline, each stage on its own thread, connected by bounded SPSC channels (drop-oldest on overflow, never block the encoder):

```
capture(dmabuf) → encode(NVENC/VAAPI) → core[FEC+packetize+pace+send]
                                                        │ network
client:  recv → core[reorder+FEC recover+jitter] → decode → present
```
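
A bounded, drop-oldest hand-off between two stage threads can be sketched with just the standard library. For clarity this version uses a `Mutex<VecDeque>` plus a `Condvar` rather than a lock-free SPSC ring, which the real hot path would more likely use. `DropOldest` is a hypothetical name. The producer's `push` never waits. When the queue is full it evicts the oldest item, so a slow consumer costs stale frames rather than encoder stalls.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Bounded queue where `push` never blocks: on overflow the oldest item is dropped.
pub struct DropOldest<T> {
    inner: Mutex<VecDeque<T>>,
    ready: Condvar,
    capacity: usize,
}

impl<T> DropOldest<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        DropOldest { inner: Mutex::new(VecDeque::with_capacity(capacity)), ready: Condvar::new(), capacity }
    }

    /// Enqueue without waiting. Returns the evicted item, if any, so the stage can count drops.
    pub fn push(&self, item: T) -> Option<T> {
        let mut q = self.inner.lock().unwrap();
        let evicted = if q.len() == self.capacity { q.pop_front() } else { None };
        q.push_back(item);
        drop(q); // release the lock before waking the consumer
        self.ready.notify_one();
        evicted
    }

    /// Wait up to `timeout` for the next item; `None` if nothing arrived in time.
    pub fn pop_timeout(&self, timeout: Duration) -> Option<T> {
        let q = self.inner.lock().unwrap();
        let (mut q, _) = self.ready.wait_timeout_while(q, timeout, |q| q.is_empty()).unwrap();
        q.pop_front()
    }
}
```

The brief lock here is held only for a few pointer moves. The design choice that matters for the pipeline is the eviction policy, not the lock.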

**Glass-to-glass budget (LAN, 240 Hz = 4.17 ms/frame):**

| Stage | Target | Notes |
|------|--------|-------|
| Capture latency | ≤ 1 frame | dmabuf, no copy to CPU |
| Encode | 1–4 ms | NVENC low-latency preset; tune lookahead off |
| FEC + packetize | < 1 ms | SIMD RS; pre-allocated shard buffers |
| Network (LAN) | < 1 ms | `sendmmsg` / UDP GSO to cut syscalls |
| Jitter buffer | 0–1 frame | adaptive; minimum that hides observed jitter |
| FEC recover + reassemble | < 1 ms | only when loss occurs |
| Decode | 1–4 ms | hardware decoder |
| Present | ≤ 1 frame | align to client vsync |
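
Summing the table's upper bounds is a quick sanity check on the glass-to-glass target. This minimal Rust sketch treats each "≤ 1 frame" or "0–1 frame" row as one full frame time at 240 Hz and each "< 1 ms" row as 1 ms. The stage list is copied from the table. Summing everything at its maximum is a pessimistic assumption, since stages rarely all peak together.

```rust
/// Frame time in milliseconds for a given refresh rate.
pub fn frame_ms(refresh_hz: f64) -> f64 {
    1000.0 / refresh_hz
}

/// Upper bound of one pipeline stage, either in frames or in milliseconds.
#[derive(Debug, Clone, Copy)]
pub enum Bound {
    Frames(f64),
    Millis(f64),
}

/// Worst-case glass-to-glass latency: the sum of every stage's upper bound.
pub fn worst_case_ms(stages: &[(&str, Bound)], refresh_hz: f64) -> f64 {
    stages
        .iter()
        .map(|&(_, b)| match b {
            Bound::Frames(n) => n * frame_ms(refresh_hz),
            Bound::Millis(ms) => ms,
        })
        .sum()
}

/// Stage upper bounds from the LAN / 240 Hz budget table.
pub fn lan_240hz_budget() -> Vec<(&'static str, Bound)> {
    vec![
        ("capture", Bound::Frames(1.0)),
        ("encode", Bound::Millis(4.0)),
        ("fec+packetize", Bound::Millis(1.0)),
        ("network", Bound::Millis(1.0)),
        ("jitter buffer", Bound::Frames(1.0)),
        ("fec recover", Bound::Millis(1.0)),
        ("decode", Bound::Millis(4.0)),
        ("present", Bound::Frames(1.0)),
    ]
}
```

The worst case comes out at about 23.5 ms (three frames of 4.17 ms plus 11 ms of fixed stages). That leaves headroom inside the 15–35 ms target for anything the table does not list.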

**Target: 15–35 ms glass-to-glass on LAN.** The art is *frame pacing* — matching capture/encode cadence to the client's actual refresh and keeping the jitter buffer as small as the link allows. This, not the codec, is what separates good from bad streaming. Budget real time for it.

**Throughput math to keep honest:** 5120×1440@240 ≈ 1.77 Gpx/s. At 0.5 bpp that's ~885 Mbps of video payload (before FEC parity); 0.6 bpp ≈ 1.06 Gbps; 0.8 bpp (4:4:4 headroom) ≈ 1.4 Gbps. The GF(2¹⁶) FEC + multi-block framing must sustain these without the per-frame shard count being the limiter — which it no longer is once you leave GF(2⁸).
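
The throughput numbers follow from pixels per second times bits per pixel. This small Rust sketch reproduces them and adds the on-the-wire rate once FEC parity is included. The 20% parity figure used in the example is an illustrative assumption, not a chosen parameter.

```rust
/// Pixels per second for a given mode.
pub fn pixels_per_second(width: u64, height: u64, fps: u64) -> u64 {
    width * height * fps
}

/// Video payload bitrate in bits/s at a given bits-per-pixel budget.
pub fn video_bps(px_per_s: u64, bpp: f64) -> f64 {
    px_per_s as f64 * bpp
}

/// On-the-wire rate once `parity_percent` of FEC overhead is added.
pub fn wire_bps(video_bps: f64, parity_percent: f64) -> f64 {
    video_bps * (1.0 + parity_percent / 100.0)
}
```

For example, `pixels_per_second(5120, 1440, 240)` is 1,769,472,000. With 20% parity, the 0.8 bpp case comes to roughly 1.7 Gbps on the wire.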

---

## 8. Milestones

Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat as ordering, not deadlines.

**M0 — Pipeline spike (S).** wlroots headless output → PipeWire capture → VAAPI/NVENC encode → dump H.265 to a file that plays. *Acceptance:* a valid encoded file from a virtual output, no streaming yet. Proves the Linux capture+encode chain end-to-end.

**M1 — `punktfunk-core` skeleton + C ABI (M).** Session lifecycle, GameStream-compatible packetization and GF(2⁸) FEC (P1), AES-GCM, `cbindgen` header, a tiny C test harness. *Acceptance:* core links from C; round-trips packets in a loopback test with simulated loss.

**M2 — P1 host: stream to stock Moonlight (L).** Wire M0's pipeline into the core; implement `serverinfo`/pairing/RTSP enough for a real Moonlight client to connect, with a KWin virtual output created on connect and destroyed on disconnect. Input via `reis`/uinput. *Acceptance:* **you play a game on your KDE box streamed to a stock Moonlight client on a virtual display, no dummy plug, no kernel args.** This is the shippable milestone and the project's reason to exist.

**M3 — Measurement harness (S).** Glass-to-glass latency measurement (on-screen QR/timestamp or photodiode), packet-loss injection, frame-pacing and stall metrics surfaced in the web UI. *Acceptance:* you can quantify a regression. Build this before optimizing anything.
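
Quantifying a regression starts with summary statistics over the collected glass-to-glass samples, such as the p50/p99 tracked in §10. A nearest-rank percentile is enough to start. This is a minimal Rust sketch; the function name and the choice of nearest-rank (rather than interpolated) percentiles are assumptions.

```rust
/// Nearest-rank percentile (`p` in 0.0..=100.0) of latency samples in milliseconds.
/// Returns `None` for an empty sample set; panics if a sample is NaN.
pub fn percentile_ms(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("latency samples must not be NaN"));
    let p = p.max(0.0).min(100.0);
    // Nearest rank: ceil(p/100 * n), clamped to 1..=n, then converted to a 0-based index.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil().max(1.0) as usize;
    Some(sorted[rank.min(sorted.len()) - 1])
}
```

Nearest-rank always returns an observed sample, which keeps p99 honest on small runs. The cost is that it moves in steps rather than smoothly.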

**M4 — P2 transport: break the wall (L).** Add `punktfunk/1` negotiation; swap to `reed-solomon-simd` GF(2¹⁶) with multi-block per-frame framing; optional QUIC control/audio. Write a minimal **Rust** reference client (decode via VAAPI, present via wgpu/Vulkan) to exercise it. *Acceptance:* a stable stream above 1.4 Gbps at 5120×1440@240 with loss recovery working; latency unchanged vs. M2.

**M5 — Apple client (L).** Swift + VideoToolbox + Metal + SwiftUI, linking `punktfunk-core` via the C header. *Acceptance:* the Mac Studio plays a stream at native resolution/refresh.

**M6 — Feature surface (M, ongoing).** Mic passthrough as a proper encrypted, per-client reverse audio stream (the thing the upstream PR got wrong); HDR signalling; per-client identity/permissions; pause/resume. *Acceptance:* feature parity with Apollo on the items you care about, plus mic done right.

---

## 9. Risk register

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| KWin runtime virtual-output API is undocumented/unstable | High | High | Spike on wlroots first to de-risk the pipeline; study KRdp's source for the KWin path; keep `VirtualDisplay` pluggable so a stuck compositor doesn't block the project |
| Wayland input injection gaps (libei still evolving) | Med | Med | uinput fallback always available; `reis` for the Wayland-native path |
| dmabuf → encoder zero-copy import quirks per GPU/driver | High | Med | Validate on your actual NVIDIA + AMD hardware early (M0); have a CPU-copy fallback path |
| Encoder/decoder can't sustain 1.77 Gpx/s @ 240 | Med | High | Measure in M0/M4 on real silicon; this is a hardware ceiling no rewrite fixes — discover it before P2, not after |
| Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step |
| Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client |
| Solo bandwidth vs. your other projects (ENRW thesis, played) | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone |

---

## 10. Testing & measurement

- **Loopback correctness:** core encodes→FEC→loss-inject→recover→decode in-process; property tests over loss patterns and shard counts (proptest).
- **Glass-to-glass latency:** rendered timestamp/QR on host, read back on client capture; or a photodiode for true photons. Track p50/p99.
- **Loss resilience:** `tc netem` to inject loss/jitter/reorder; verify FEC recovery and graceful degradation.
- **Pacing:** log present timestamps vs. client vsync; alert on stalls and duplicate/dropped frames.
- **Soak:** multi-hour streams; watch for buffer growth, fd leaks, encoder session exhaustion.
- **Hardware matrix:** your NVIDIA box (NVENC), an AMD/Intel box (VAAPI), Mac Studio (VideoToolbox decode). Catch driver quirks early.
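
The loopback test's shape (split, protect, drop, recover, compare) can be shown without any FEC crate. This sketch deliberately swaps in a single XOR parity shard, which can repair exactly one lost shard per block, instead of Reed–Solomon. The real test would run the same loop over `reed-solomon-simd` with many parity shards and proptest-generated loss patterns. `encode_xor` and `recover_xor` are hypothetical names.

```rust
/// Split `data` into `k` equal shards (zero-padded) plus one XOR parity shard.
pub fn encode_xor(data: &[u8], k: usize) -> Vec<Vec<u8>> {
    assert!(k > 0);
    let shard_len = (data.len() + k - 1) / k;
    let mut shards: Vec<Vec<u8>> = (0..k)
        .map(|i| {
            let start = (i * shard_len).min(data.len());
            let end = ((i + 1) * shard_len).min(data.len());
            let mut s = data[start..end].to_vec();
            s.resize(shard_len, 0); // pad the tail shard
            s
        })
        .collect();
    let mut parity = vec![0u8; shard_len];
    for s in &shards {
        for (p, b) in parity.iter_mut().zip(s) {
            *p ^= *b;
        }
    }
    shards.push(parity);
    shards
}

/// Rebuild the original bytes when at most one of the k+1 shards is missing (`None`).
pub fn recover_xor(mut shards: Vec<Option<Vec<u8>>>, original_len: usize) -> Option<Vec<u8>> {
    let missing: Vec<usize> = (0..shards.len()).filter(|&i| shards[i].is_none()).collect();
    if missing.len() > 1 {
        return None; // a single parity shard cannot repair two erasures
    }
    if let Some(&lost) = missing.first() {
        // XOR of every surviving shard (data and parity) equals the lost one.
        let shard_len = shards.iter().flatten().next()?.len();
        let mut rebuilt = vec![0u8; shard_len];
        for s in shards.iter().flatten() {
            for (r, b) in rebuilt.iter_mut().zip(s) {
                *r ^= *b;
            }
        }
        shards[lost] = Some(rebuilt);
    }
    let k = shards.len() - 1; // the last shard is parity
    let mut out: Vec<u8> = shards.into_iter().take(k).flatten().flatten().collect();
    out.truncate(original_len);
    Some(out)
}
```

The test loops over every possible single-shard loss. That exhaustive sweep is the small-scale version of what proptest would randomize over larger shard counts.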

---

## 11. Repo / workspace structure

```
punktfunk/
├── Cargo.toml                 # workspace
├── crates/
│   ├── punktfunk-core/        # protocol, FEC, pacing, crypto — C ABI (cdylib + staticlib)
│   │   ├── src/abi.rs         # #[no_mangle] extern "C" surface
│   │   ├── src/fec.rs         # GF(2^16) blocking over reed-solomon-simd
│   │   ├── src/transport/     # udp+fec video, quinn control/audio
│   │   ├── src/protocol/      # gamestream-compat (P1) + punktfunk/1 (P2)
│   │   └── cbindgen.toml
│   ├── punktfunk-host/        # Linux host binary
│   │   ├── src/capture/       # pipewire / portal
│   │   ├── src/encode/        # ffmpeg vaapi/nvenc
│   │   ├── src/vdisplay/      # trait + kwin/wlroots/mutter impls
│   │   ├── src/input/         # reis + uinput
│   │   └── src/web/           # axum config/pairing API
│   └── punktfunk-client-rs/   # reference Rust client (M4)
├── clients/
│   ├── apple/                 # Swift package, imports punktfunk_core.h (M5)
│   └── android/               # Kotlin + JNI (later)
├── include/                   # generated punktfunk_core.h
└── tools/
    ├── latency-probe/
    └── loss-harness/
```

---

## 12. Immediate next actions (first week)

1. **Stand up the workspace** with `punktfunk-core` (empty ABI + `cbindgen`) and `punktfunk-host` skeletons; CI on your Gitea (you already have BuildKit pipelines).
2. **M0 spike on wlroots:** headless output → PipeWire capture → NVENC/VAAPI encode → playable file. This validates the riskiest *pipeline* assumptions in days, on your real GPU.
3. **Read KRdp's source** for how KDE creates virtual outputs and casts them — it's the closest existing reference for the KWin path you'll need in M2.
4. **Decide P1 protocol depth:** confirm exactly which `serverinfo`/RTSP/pairing messages a current Moonlight client requires for a successful connect, so M2's compat surface is scoped precisely (this is also the question to take back to the dev who mentioned the 1G limit).

---

*The shape of the bet: M2 alone — virtual-display streaming to stock Moonlight clients on Linux — is a complete, useful, gap-filling release. Everything after it (the wall-breaking transport, native clients, mic-done-right) is upside you unlock from a position of having already shipped, with the hard transport work resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.*
@@ -0,0 +1,21 @@
---
title: Introduction
description: Low-latency desktop and game streaming, Linux-first.
---

import { Cards, Card } from 'fumadocs-ui/components/card'

**punktfunk** is a ground-up low-latency desktop and game streaming stack, built Linux-first,
with a shared Rust protocol core (`punktfunk-core`) exposed over a C ABI and native clients per
platform. It speaks two protocols: GameStream (so a stock Moonlight client just works) and the
native `punktfunk/1` (QUIC control plane + a hardened UDP data plane with GF(2¹⁶) Leopard FEC and
AES-GCM).

## Start here

<Cards>
  <Card title="Project Overview" href="/docs/overview" description="Status, architecture, and milestones at a glance." />
  <Card title="Implementation Plan" href="/docs/implementation-plan" description="The full design: protocol core, milestones, and architecture." />
  <Card title="Roadmap" href="/docs/roadmap" description="Decided next goals and the longer-term bets." />
  <Card title="Linux Host Setup" href="/docs/linux-setup" description="Bring up the build environment on an NVIDIA-GPU Ubuntu VM." />
</Cards>
|
||||
@@ -0,0 +1,163 @@
|
||||
---
|
||||
title: "Linux Host Setup"
|
||||
description: "Bring up the build environment on an NVIDIA-GPU Ubuntu VM."
|
||||
---
|
||||
|
||||
|
||||
How to bring up the build environment for the punktfunk Linux host on an NVIDIA-GPU Ubuntu VM
|
||||
and run the **M0** capture→encode spike. `punktfunk-core` already builds and is tested
|
||||
cross-platform; this is about the platform backends in `crates/punktfunk-host`.
|
||||
|
||||
> Target **Ubuntu 24.04 (noble)**: Sway 1.9, FFmpeg 6.1.1, xdg-desktop-portal 1.18.
|
||||
> 22.04 (jammy) ships Sway 1.7 / FFmpeg 4.4 — too old for this path; build from source or
|
||||
> upgrade. Package names/versions below were verified against the live Ubuntu archive.
|
||||
|
||||
## 1. Bootstrap

```sh
git clone git@git.unom.io:unom/punktfunk.git && cd punktfunk && git checkout m1-punktfunk-core
bash scripts/bootstrap-ubuntu.sh
```

The script **verifies** the (already-installed) NVIDIA + NVENC stack, installs the Rust toolchain
(rustup) and the build/runtime deps (PipeWire, xdg-desktop-portal + the wlroots backend,
Sway, Wayland/DRM/EGL/GBM/VA dev libs, capture tools), **gates** the FFmpeg `-dev`
headers so installing them can't clobber your custom NVENC FFmpeg, and drops headless-Sway + portal
config templates into `~/.config` (only if absent). It does **not** reboot or edit GRUB.

After it runs, sanity-check the core on Linux:

```sh
cargo test --workspace   # 21 tests; same suite that's green on macOS
```

## 2. NVIDIA prerequisites (one-time, may need a reboot)

Wayland on NVIDIA requires KMS modeset. The bootstrap checks it; if it isn't `Y`:

```sh
echo 'options nvidia-drm modeset=1 fbdev=1' | sudo tee /etc/modprobe.d/nvidia-drm.conf
sudo update-initramfs -u && sudo reboot
# after the reboot:
cat /sys/module/nvidia_drm/parameters/modeset   # must print Y
```

- Driver **≥ 535** is the floor for headless wlroots (EGL/dmabuf); 550+ recommended.
- **Install the NVIDIA GL/EGL userspace, not just `nvidia-utils`:**
  `sudo apt install libnvidia-gl-<NNN>` (matching the driver, e.g. `libnvidia-gl-595`).
  `nvidia-utils-NNN` ships nvidia-smi + NVENC but **not** `libEGL_nvidia.so.0` or the GLVND
  vendor JSON (`/usr/share/glvnd/egl_vendor.d/10_nvidia.json`). Without them libglvnd falls
  back to Mesa, wlroots can't init EGL on the GPU and drops to the **pixman** software
  renderer, and the ScreenCast portal then fails to negotiate a buffer format
  (`unable to receive a valid format from wlr_screencopy`). Verify after install:
  `ls /usr/share/glvnd/egl_vendor.d/10_nvidia.json && ldconfig -p | grep libEGL_nvidia`.
  A correct GPU Sway logs `EGL vendor: NVIDIA` and a list of DMA-BUF formats.
- **Join the `render` + `video` groups:** `sudo usermod -aG render,video $USER`, then
  **re-login** (group changes only apply to new logins). wlroots opens
  `/dev/dri/renderD128` (group `render`) and `/dev/dri/card*` (group `video`), both 0660;
  without membership Sway aborts with `Permission denied`. (`scripts/headless/*.sh` bridge a
  not-yet-re-logged-in shell with `sg render`, but re-login is the clean fix.)
- A **headless VM GPU exposes no DRM connectors**; that's expected. We don't use the DRM
  backend: `WLR_BACKENDS=headless` renders to an offscreen GBM/EGL surface and creates a
  virtual `HEADLESS-1` output. Use the render node `/dev/dri/renderD128`.
- **NVENC in a VM:** full PCI **passthrough** = bare-metal NVENC, no license. **vGPU**
  needs a valid license (vWS) or NVENC runs degraded; the bootstrap's smoke-encode tells
  you whether it actually works. Consumer GeForce cards also cap concurrent NVENC sessions
  (~8); datacenter/RTX-pro cards are effectively unlimited, which matters once we serve many clients.

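The file-based checks above are easy to script. The sketch below is a hypothetical preflight helper: `modeset_enabled`, `driver_meets_floor` and `preflight` are our names, not functions in `scripts/bootstrap-ubuntu.sh`. It reads the same sysfs parameter and GLVND vendor file under a configurable `root` so it can be exercised against a temp directory. The driver string is assumed to come from something like `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```rust
use std::fs;
use std::path::Path;

/// `/sys/module/nvidia_drm/parameters/modeset` must read "Y" for Wayland on NVIDIA.
pub fn modeset_enabled(sysfs_value: &str) -> bool {
    sysfs_value.trim() == "Y"
}

/// Compare the major component of a driver version such as "595.45.04" to a floor (535).
pub fn driver_meets_floor(driver_version: &str, floor_major: u32) -> bool {
    driver_version
        .trim()
        .split('.')
        .next()
        .and_then(|major| major.parse::<u32>().ok())
        .map_or(false, |major| major >= floor_major)
}

/// Run the file-based checks under `root` ("/" on a real host, a temp dir in tests).
/// Returns human-readable problems; an empty Vec means both checks passed.
pub fn preflight(root: &Path) -> Vec<String> {
    let mut problems = Vec::new();
    let modeset = root.join("sys/module/nvidia_drm/parameters/modeset");
    match fs::read_to_string(&modeset) {
        Ok(v) if modeset_enabled(&v) => {}
        Ok(v) => problems.push(format!("nvidia-drm modeset is {:?}, expected Y", v.trim())),
        Err(_) => problems.push("nvidia_drm not loaded (no modeset parameter)".to_string()),
    }
    let egl_vendor = root.join("usr/share/glvnd/egl_vendor.d/10_nvidia.json");
    if !egl_vendor.exists() {
        problems.push("GLVND EGL vendor JSON missing: install libnvidia-gl-<NNN>".to_string());
    }
    problems
}
```

On a real host, call `preflight(Path::new("/"))`. Group membership and vGPU licensing still need the manual checks listed above.
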
## 3. Bring up the headless compositor + prove capture→NVENC

```sh
# shell 1 — start headless GPU Sway on the shared user bus (blocks; -d for debug log)
bash scripts/headless/run-headless-sway.sh     # success logs "EGL vendor: NVIDIA"

# shell 2 — same user: set the client mode, import the portal env, write the env file
bash scripts/headless/prepare-session.sh 2560x1440@60Hz
source /tmp/punktfunk-sway-env.sh
swaymsg -t get_outputs                         # confirm HEADLESS-1 active
swaymsg exec foot                              # animated content to capture (an idle output yields ~1 frame)
bash scripts/headless/capture-smoke-test.sh    # wf-recorder (wlr-screencopy) -> hevc_nvenc
ffprobe /tmp/punktfunk-headless-test.mkv       # confirm a real H.265 stream
```

`wf-recorder` uses `wlr-screencopy` directly (no portal/D-Bus), the fastest way to
de-risk the GPU encode path. **Note:** wf-recorder encodes straight to a file, and screencopy
*cannot* feed PipeWire; the real integration uses the ScreenCast portal (see §4). If shell 1 logged
a Mesa/EGL fallback (or Sway dropped to pixman) instead of `EGL vendor: NVIDIA`, install the
NVIDIA GL userspace (§2); the portal cannot capture a pixman output.

**An idle headless output produces no frames** (its frame clock is driven by damage): give
it a real refresh mode (`prepare-session.sh` does) *and* run something animated
(`swaymsg exec foot`), or the capture will contain only ~1 frame.

The wlroots-on-NVIDIA workarounds (`WLR_RENDERER=gles2`, `WLR_NO_HARDWARE_CURSORS=1`,
`GBM_BACKEND=nvidia-drm`, `sway --unsupported-gpu`, …) live in
`scripts/headless/env.sh`; `source` it before launching anything Wayland.

## 4. M0 proper — wire it into `punktfunk-core`

Goal (plan §8): headless output → PipeWire ScreenCast → NVENC → a playable file, then feed
the encoded access units into a `punktfunk_core::Session` (host role). The module seams exist
in `crates/punktfunk-host/src/{vdisplay,capture,encode,inject,pipeline}.rs`.

**Status: implemented and verified end-to-end** in `crates/punktfunk-host` (`m0.rs`,
`capture/linux.rs`, `encode/linux.rs`). After the §3 bring-up:

```sh
source /tmp/punktfunk-sway-env.sh
swaymsg exec foot   # animated content
# Live portal capture → NVENC HEVC → playable file, with each AU also round-tripped
# through a punktfunk_core host→client Session (FEC + packetize + reassemble) and verified:
cargo run -p punktfunk-host -- m0 --source portal --seconds 5 --out /tmp/punktfunk-m0.h265
ffprobe /tmp/punktfunk-m0.h265
# No capture session needed (encode + core only): use --source synthetic instead of portal
```

Verified result: `1920x1080` HEVC, ~300 frames in 5s, `punktfunk-core loopback … 0 mismatches`.
The portal negotiates packed **`RGB` (24-bit, 3 bytes/pixel)** on wlroots; the encoder expands it to
`rgb0` (one pad byte/pixel, no colour math) since NVENC accepts `rgb0`/`bgr0` but not
`rgb24`. dmabuf zero-copy import is still deferred (plan §9); this is the CPU-copy path.

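A minimal sketch of the `RGB` → `rgb0` expansion, assuming the PipeWire buffer delivers rows `src_stride` bytes apart (the stride can exceed `width * 3`). `rgb24_to_rgb0` is a hypothetical name, not the function in `encode/linux.rs`. It only appends a zero pad byte after each pixel, which is why no colour math is involved.

```rust
/// Expand packed 24-bit RGB (3 bytes/pixel, rows `src_stride` bytes apart) into
/// `rgb0` (4 bytes/pixel: R, G, B, pad). Pure byte shuffling, no colour conversion.
pub fn rgb24_to_rgb0(src: &[u8], width: usize, height: usize, src_stride: usize) -> Vec<u8> {
    assert!(src_stride >= width * 3, "stride shorter than one row of pixels");
    if height > 0 {
        assert!(src.len() >= (height - 1) * src_stride + width * 3, "source buffer too short");
    }
    let mut dst = vec![0u8; width * height * 4];
    for y in 0..height {
        // Skip any per-row padding in the source by slicing exactly width * 3 bytes.
        let row = &src[y * src_stride..y * src_stride + width * 3];
        let out = &mut dst[y * width * 4..(y + 1) * width * 4];
        for (px, o) in row.chunks_exact(3).zip(out.chunks_exact_mut(4)) {
            o[..3].copy_from_slice(px); // R, G, B in the same order
            // o[3] stays 0: the pad byte
        }
    }
    dst
}
```

The output is tightly packed (`width * 4` bytes per row), which is what a CPU-side upload into an `rgb0` frame expects.
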
Crate choices, verified current:

- **Capture (portal path):** [`ashpd`](https://docs.rs/ashpd) **0.13** with the
  `screencast` feature (the `pipewire` feature is *not* needed; `open_pipe_wire_remote`
  is unconditional). Flow (0.13 API, verified against the vendored source): `Screencast::new`
  → `create_session(Default)` → `select_sources(&session, SelectSourcesOptions::default()
  .set_sources(BitFlags::from_flag(SourceType::Monitor))…)` → `start(&session, None,
  Default)` → `.response()?` → `Stream::pipe_wire_node_id()` + `open_pipe_wire_remote()`.
  Note that 0.13 takes **options structs**, not the old positional args, and defaults to the
  **tokio** runtime: drive the handshake on a *multi-thread* tokio runtime (a
  current-thread one starves zbus's reader and the portal reports "Invalid session").
  Pull frames with [`pipewire`](https://docs.rs/pipewire) **0.9**; it must match the
  pipewire crate ashpd 0.13 links, because Cargo allows only one crate per `links` key
  (`pipewire-sys`) in a build, so `0.10` fails to resolve. 0.9 uses `MainLoopRc`/`ContextRc::connect_fd_rc(OwnedFd)`/
  `StreamBox`. Only request `SourceType::Monitor`: the wlr backend's
  `AvailableSourceTypes` is `1` (Monitor only), and asking for `Window`/`Virtual` invalidates
  the session. Set `XDG_CURRENT_DESKTOP=sway` so the wlr portal backend is chosen, and
  import it into the portal's environment (`prepare-session.sh` in §3 does this).
- **Encode:** [`ffmpeg-next`](https://crates.io/crates/ffmpeg-next) **8.x** (binds the
  system FFmpeg 8.x via pkg-config; needs `clang`/`libclang`). Select the encoder by
  name with `encoder::find_by_name("hevc_nvenc")`, *not* by codec id (that picks the SW encoder).
  Low-latency opts: `preset=p1`, `tune=ull`, `rc=cbr`, `bf=0`, `delay=0`, and a large GOP (`g`).
  If your FFmpeg is in a non-standard prefix, `export FFMPEG_DIR=/that/prefix`.
- **Zero-copy is the hard part.** There's no direct dmabuf→CUDA import in FFmpeg.
  **Start with the CPU-copy fallback** (download frame → `hwupload_cuda` → `hevc_nvenc`)
  to get an end-to-end stream, then chase true dmabuf zero-copy. The plan flags this
  (§9), and the `capture` module already has a `cpu_bytes` fallback field.
- **Input (M2):** [`reis`](https://crates.io/crates/reis) (pure-Rust libei, no native
  `libei` needed), with `input-linux`/uinput as the universal fallback.

Then continue toward **M2**: `serverinfo`/RTSP/pairing enough for a stock Moonlight client
to connect, a KWin virtual output created on connect, and input via reis/uinput; that is the
shippable milestone.

## Troubleshooting

| Symptom | Fix |
|---|---|
| Sway aborts on NVIDIA | add `--unsupported-gpu` (the helper scripts do) |
| `not a KMS device` / no connectors | expected on a headless VM GPU; use `WLR_BACKENDS=headless`, not the DRM backend |
| Sway won't start at all | `WLR_RENDERER_ALLOW_SOFTWARE=1 WLR_RENDERER=pixman` to prove Sway itself starts, then fix EGL (the ScreenCast portal can't capture a pixman output) |
| ScreenCast portal finds no output | ensure `xdg-desktop-portal-wlr` is running in the same session, `XDG_CURRENT_DESKTOP=sway`, and `~/.config/xdg-desktop-portal-wlr/config` has `output_name=HEADLESS-1` |
| `Cannot load libnvidia-encode.so.1` | NVENC runtime lib missing (driver) or unlicensed vGPU |
| `cargo build` can't find FFmpeg | `export FFMPEG_DIR=$(pkg-config --variable=prefix libavcodec)` or point `PKG_CONFIG_PATH` at the custom build |
| bindgen: libclang not found | `export LIBCLANG_PATH=$(llvm-config --libdir)` |
@@ -0,0 +1,95 @@
---
title: "M2 — Moonlight Host"
description: "Stream to a stock Moonlight client on a client-sized virtual display."
---

The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host,
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display.
Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
(distilled from Sunshine + moonlight-common-c source; cite those for byte-level detail).

## Architecture (respects the "one core" invariant)

- **punktfunk-core** gains a **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`; the
  hook already exists): the exact RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard
  layout, and the video/audio AES-GCM/CBC paths. Hot path, native threads, **no async**.
  It sits beside punktfunk's native internal format (P2), selected by phase.
- **punktfunk-host** gains the **control plane** (tokio/axum OK: it is I/O-bound, not the hot path):
  mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet
  control stream + input injection, the virtual-display lifecycle, and Opus audio encode.

## Port map (base 47989; Moonlight derives all by offset)

| Port | Proto | Role |
|---|---|---|
| 47989 | TCP | HTTP nvhttp (unpaired: /serverinfo, /pair PIN flow) |
| 47984 | TCP | HTTPS nvhttp (paired; **client-cert pinned**) — /launch, /resume, … |
| 48010 | TCP | RTSP (OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY) |
| 47998 | UDP | Video RTP (+ RS-FEC, optional AES-GCM) |
| 47999 | UDP | Audio RTP (Opus, RS-FEC 4+2, optional **AES-CBC**) |
| 48000 | UDP | ENet control stream (AES-GCM) + remote input |
| 5353 | UDP | mDNS `_nvstream._tcp.local` advertisement |

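Because every GameStream port is a fixed offset from the base, a host configured with a different base can derive the rest the same way the client does. A sketch with the offsets read off the table above; `PortMap` and `from_base` are hypothetical names, and mDNS stays on 5353 (it is not offset).

```rust
use std::convert::TryFrom;

/// Default GameStream base port (HTTP nvhttp).
pub const DEFAULT_BASE: u16 = 47989;

/// Every GameStream port, derived from one base port by fixed offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PortMap {
    pub http: u16,    // base
    pub https: u16,   // base - 5
    pub video: u16,   // base + 9
    pub audio: u16,   // base + 10
    pub control: u16, // base + 11
    pub rtsp: u16,    // base + 21
}

impl PortMap {
    /// Returns `None` if any derived port would fall outside 0..=65535.
    pub fn from_base(base: u16) -> Option<PortMap> {
        let at = |offset: i32| u16::try_from(i32::from(base) + offset).ok();
        Some(PortMap {
            http: base,
            https: at(-5)?,
            video: at(9)?,
            audio: at(10)?,
            control: at(11)?,
            rtsp: at(21)?,
        })
    }
}
```
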
## Key wire facts (the non-obvious ones)

- **Video datagram** = `RTP_PACKET` (12 bytes, BIG-endian) + `reserved[4]` + `NV_VIDEO_PACKET`
  (16 bytes, LITTLE-endian) + payload. Endianness differs *within the same packet*. `header=0x80|0x10`.
- `fecInfo` (u32 LE) = `(dataShards<<22)|(fecIndex<<12)|(fecPercentage<<4)`; parityShards is
  **recomputed** by the client as `ceil(dataShards*pct/100)`, so the host's count must match exactly.
- `multiFecBlocks` = `(blockIdx<<4)|((nBlocks-1)<<6)`; **≤4 FEC blocks/frame**, ≤255 shards/block.
- Each frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`
  (`headerType=0x01`, `frameType` 2=IDR, `lastPayloadLen`) before striping into shards.
- Shard size = `packetSize + 16`. Data shards come first, then parity, over a contiguous RTP
  sequence range. The last data shard is zero-padded.
- **Video crypto** (when `SS_ENC_VIDEO` is negotiated): AES-128-GCM, key = raw 16-byte RIKEY
  (from `/launch?rikey=`), IV = `counter_le[8]||0,0,0||'V'(0x56)`, **NO AAD**, 32-byte
  `ENC_VIDEO_HEADER{iv[12],frameNumber,tag[16]}` prefix; **FEC first, then encrypt per shard**.
- **Pairing**: PIN key = `SHA-256(salt[16] || ascii_pin)[..16]`; AES-128-**ECB** (no padding)
  for the challenge blocks; SHA-256 rolling hashes; RSA-SHA256 signatures over X.509 certs;
  the client cert is pinned for subsequent HTTPS. 4 phases over `/pair?phrase=…`.
- **RTSP** `Session: DEADBEEFCAFE;timeout = 90` (literal), `Transport: server_port=<p>`,
  `streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
  (`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.

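The bit-packing rules above are small enough to pin down in code. This is a sketch, not punktfunk-core's P1 codec: the helper names are hypothetical, and the field widths (10, 10 and 8 bits for `fecInfo`) are our reading of the masks the client uses to unpack it, so check them against the research JSON before relying on them.

```rust
/// Pack `fecInfo`: (dataShards << 22) | (fecIndex << 12) | (fecPercentage << 4).
/// Sent little-endian on the wire, i.e. `fec_info(..).to_le_bytes()`.
pub fn fec_info(data_shards: u32, fec_index: u32, fec_percentage: u32) -> u32 {
    assert!(data_shards < (1 << 10) && fec_index < (1 << 10) && fec_percentage < (1 << 8));
    (data_shards << 22) | (fec_index << 12) | (fec_percentage << 4)
}

/// Inverse of `fec_info`: (data_shards, fec_index, fec_percentage).
pub fn unpack_fec_info(word: u32) -> (u32, u32, u32) {
    ((word >> 22) & 0x3ff, (word >> 12) & 0x3ff, (word >> 4) & 0xff)
}

/// Parity shards as the client recomputes them: ceil(data_shards * pct / 100), integer math.
pub fn parity_shards(data_shards: u32, fec_percentage: u32) -> u32 {
    (data_shards * fec_percentage + 99) / 100
}

/// `multiFecBlocks`: (blockIdx << 4) | ((nBlocks - 1) << 6), with at most 4 blocks per frame.
pub fn multi_fec_blocks(block_idx: u8, n_blocks: u8) -> u8 {
    assert!((1..=4).contains(&n_blocks) && block_idx < n_blocks);
    (block_idx << 4) | ((n_blocks - 1) << 6)
}

/// 12-byte video GCM IV: the counter as 8 little-endian bytes, three zero bytes, then b'V'.
pub fn video_iv(counter: u64) -> [u8; 12] {
    let mut iv = [0u8; 12];
    iv[..8].copy_from_slice(&counter.to_le_bytes());
    iv[11] = b'V'; // 0x56
    iv
}
```

Keeping `parity_shards` in integer math gives an exact ceiling with no floating-point rounding, so host and client arrive at the same count.
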
## The two highest interop risks (validate EARLY)

1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly
   0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy), so the parity
   bytes likely **don't match** and Moonlight silently fails to recover any frame with a lost
   data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so
   defer this: get a frame decoded first, then FFI/port nanors for loss recovery.
2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
   needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
   usually off on LAN**, so implement plaintext video first and add GCM later.

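Why the parity bytes diverge: a Reed-Solomon parity shard is a GF(2⁸) linear combination of the data shards, and the coefficients come from the encoding matrix. The sketch below implements GF(2⁸) arithmetic over poly 0x11d and computes one parity row from two illustrative coefficient rows, one Vandermonde-style and one Cauchy-style. These rows are simplified stand-ins, not the exact matrices nanors or `reed-solomon-erasure` build, but the effect is the same: identical data, different parity.

```rust
/// Multiply in GF(2^8) with reduction polynomial 0x11d (x^8 + x^4 + x^3 + x^2 + 1).
pub fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            product ^= a; // addition in GF(2^8) is XOR
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1d; // the x^8 bit was shifted out; fold it back in via 0x11d's low byte
        }
        b >>= 1;
    }
    product
}

/// Square-and-multiply exponentiation in GF(2^8).
pub fn gf_pow(mut base: u8, mut exp: u32) -> u8 {
    let mut result = 1u8;
    while exp > 0 {
        if exp & 1 == 1 {
            result = gf_mul(result, base);
        }
        base = gf_mul(base, base);
        exp >>= 1;
    }
    result
}

/// a^254 is a^-1 for nonzero a, because the multiplicative group has order 255.
pub fn gf_inv(a: u8) -> u8 {
    assert!(a != 0, "0 has no inverse");
    gf_pow(a, 254)
}

/// One parity shard: byte-wise XOR-sum over i of coeffs[i] * data[i].
pub fn parity_row(coeffs: &[u8], data: &[&[u8]]) -> Vec<u8> {
    assert_eq!(coeffs.len(), data.len());
    let mut out = vec![0u8; data.first().map_or(0, |s| s.len())];
    for (&c, shard) in coeffs.iter().zip(data) {
        for (o, &d) in out.iter_mut().zip(shard.iter()) {
            *o ^= gf_mul(c, d);
        }
    }
    out
}

/// Illustrative Vandermonde-style row for parity j: alpha^(i*j) with alpha = 2.
pub fn vandermonde_row(k: usize, j: u32) -> Vec<u8> {
    (0..k).map(|i| gf_pow(2, i as u32 * j)).collect()
}

/// Illustrative Cauchy-style row: 1 / (x_i + y_j) with x_i = i and y_j = k + j (all distinct).
pub fn cauchy_row(k: usize, j: usize) -> Vec<u8> {
    assert!(k + j < 256);
    (0..k).map(|i| gf_inv((i ^ (k + j)) as u8)).collect()
}
```

A decoder that inverts one matrix cannot reconstruct data from parity generated with the other, which is why P1.5 calls for nanors-exact FEC rather than any GF(2⁸) RS.
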
## Phasing (each phase independently testable with a real Moonlight client)

- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
  `/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
  pairs (PIN), and shows the host as ready.* ← first slice.
- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
  handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
  *Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
  layout in punktfunk-core; wire M0's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
  (uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
  *Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
  backend for the user's KDE box. *Acceptance: full game stream with sound — the M2 goal.*

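For P1.2, the `/launch` query has to yield the client mode and the RIKEY. A std-only sketch, assuming `mode` arrives as `WIDTHxHEIGHTxFPS`, `rikey` as 32 hex characters (the raw 16-byte key) and `rikeyid` as a decimal integer; this is our reading of the reference clients, so confirm it against the research JSON. `LaunchParams` and `parse_launch_query` are hypothetical, and URL decoding is skipped because these fields are plain ASCII.

```rust
/// The `/launch` fields P1.2 needs (hypothetical struct, not the host's own type).
#[derive(Debug, PartialEq)]
pub struct LaunchParams {
    pub width: u32,
    pub height: u32,
    pub fps: u32,
    pub rikey: [u8; 16], // raw AES-128 key
    pub rikeyid: i32,    // key id sent alongside rikey
}

fn parse_hex16(s: &str) -> Result<[u8; 16], String> {
    if s.len() != 32 || !s.is_ascii() {
        return Err(format!("rikey must be 32 hex characters, got {:?}", s));
    }
    let mut key = [0u8; 16];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
    }
    Ok(key)
}

fn parse_mode(s: &str) -> Result<(u32, u32, u32), String> {
    let parts: Vec<&str> = s.split('x').collect();
    if parts.len() != 3 {
        return Err(format!("mode must be WIDTHxHEIGHTxFPS, got {:?}", s));
    }
    let num = |p: &str| p.parse::<u32>().map_err(|e| format!("{:?}: {}", p, e));
    Ok((num(parts[0])?, num(parts[1])?, num(parts[2])?))
}

pub fn parse_launch_query(query: &str) -> Result<LaunchParams, String> {
    let (mut mode, mut rikey, mut rikeyid) = (None, None, None);
    for pair in query.split('&') {
        let mut kv = pair.splitn(2, '=');
        match (kv.next(), kv.next()) {
            (Some("mode"), Some(v)) => mode = Some(parse_mode(v)?),
            (Some("rikey"), Some(v)) => rikey = Some(parse_hex16(v)?),
            (Some("rikeyid"), Some(v)) => rikeyid = Some(v.parse::<i32>().map_err(|e| e.to_string())?),
            _ => {} // other parameters (appid, ...) are ignored in this sketch
        }
    }
    let (width, height, fps) = mode.ok_or("missing mode")?;
    Ok(LaunchParams {
        width,
        height,
        fps,
        rikey: rikey.ok_or("missing rikey")?,
        rikeyid: rikeyid.ok_or("missing rikeyid")?,
    })
}
```

The parsed `width`/`height`/`fps` are what the virtual output gets sized to, and `rikey` is only needed once video encryption (P1.5) is negotiated.
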
## Crates (verified available)

`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/
`ecb` (pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.

## Testing note

The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at
this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against
`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display.
@@ -0,0 +1,14 @@
{
  "title": "Documentation",
  "pages": [
    "index",
    "overview",
    "implementation-plan",
    "roadmap",
    "m2-plan",
    "---Setup---",
    "linux-setup",
    "windows-host",
    "dualsense-haptics"
  ]
}
@@ -0,0 +1,77 @@
---
title: "Project Overview"
description: "Status, architecture, and milestones at a glance."
---

*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust
protocol core and native clients per platform.*

`punktfunk` is a placeholder codename. The bet: ship a **Linux virtual-display streaming
host** that speaks the existing Moonlight protocol (every Moonlight/Artemis client works
day one), then break the ~1 Gbps FEC wall with a **GF(2¹⁶) Leopard-RS** transport as a
negotiated extension. See the [Implementation Plan](/docs/implementation-plan).

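One way to see why GF(2⁸) gets tight near 1 Gbps is to count shards per frame. The sketch below assumes a 1400-byte shard payload, 20% FEC and average-sized frames (keyframes are larger), and simply fills 255-shard blocks; the names and numbers are illustrative, not punktfunk's tuning. The 4-block cap is the GameStream limit from the M2 plan.

```rust
/// GF(2^8) Reed-Solomon: at most 255 shards (data + parity) per block.
pub const GF256_MAX_SHARDS: u64 = 255;
/// GameStream allows at most 4 FEC blocks per frame.
pub const GAMESTREAM_MAX_BLOCKS: u64 = 4;

/// Average data shards per frame: ceil((bitrate / 8 / fps) / shard_payload).
pub fn data_shards_per_frame(bitrate_bps: u64, fps: u64, shard_payload: u64) -> u64 {
    assert!(fps > 0 && shard_payload > 0);
    let frame_bytes = bitrate_bps / 8 / fps; // average frame; keyframes are much larger
    (frame_bytes + shard_payload - 1) / shard_payload
}

/// GF(2^8) blocks needed once parity, ceil(data * pct / 100), is added.
pub fn gf256_blocks(data_shards: u64, fec_percentage: u64) -> u64 {
    let parity = (data_shards * fec_percentage + 99) / 100;
    (data_shards + parity + GF256_MAX_SHARDS - 1) / GF256_MAX_SHARDS
}
```

By this estimate, 1 Gbps at 120 fps (745 data shards) just fits the 4-block cap, while 1 Gbps at 60 fps (1489 data shards) needs 8 GF(2⁸) blocks. A single GF(2¹⁶) block can hold up to 65535 shards.
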
## Status

| Milestone | State |
|-----------|-------|
| **M1 — `punktfunk-core` + C ABI** | ✅ done & hardened (FEC, packetization, AES-GCM, session, adversarial-review fixes, `punktfunk_core.h`) |
| **M2 — GameStream host → stock Moonlight** | ✅ live end-to-end: pairing, RTSP, audio, per-client virtual output at native res, GPU zero-copy NVENC, gamepads |
| **M3 — `punktfunk/1` native protocol** | ✅ validated live: QUIC control + GF(2¹⁶) FEC/AES data plane, SPAKE2 PIN pairing, mid-stream mode renegotiation |
| **M4 — client decode + present (Apple)** | 🟡 macOS first light: AnnexB→VideoToolbox HEVC on glass + input/pairing over `punktfunk/1` (`clients/apple`); iOS + presenter next |
| **Web console + management API** | ✅ TanStack web console (`web/`) over the OpenAPI mgmt API: host status, paired devices, on-demand native pairing (arm → show PIN) |

The **GameStream host works with a stock Moonlight client**, validated live on NVIDIA
(RTX 5070 Ti & RTX 4090, driver 595): trust-on-first-use pairing that persists, an app
catalog, RTSP/ENet/audio, and **video at the client's exact resolution and refresh** via a
per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded with GPU
**zero-copy** (dmabuf → CUDA/Vulkan → NVENC) at up to 5120×1440@240. The native
**`punktfunk/1`** protocol adds a QUIC control plane and a GF(2¹⁶) Leopard-FEC + AES-GCM data
plane (p50 ~0.8 ms capture→reassembled at 720p120), with a SPAKE2 PIN pairing ceremony. Both
run from **one process** (`serve --native`), managed through a REST API + web console. Builds
against FFmpeg 7 or 8; deployed live on Bazzite. Full status: [`CLAUDE.md`](CLAUDE.md);
roadmap: [Roadmap](/docs/roadmap).

## Layout

```
crates/
  punktfunk-core/         protocol · FEC · pacing · crypto · quic — the C ABI (lib + cdylib + staticlib)
  punktfunk-host/         Linux host: vdisplay · capture · encode · inject · gamestream · m3 · mgmt · native_pairing
  punktfunk-client-rs/    punktfunk/1 reference client (M3 headless; M4 adds decode+present)
clients/{apple,android}/  native client scaffolds (import punktfunk_core.h); apple = macOS first light
web/                      TanStack web console (host status · paired devices · pairing) over the mgmt API
packaging/                Fedora/Bazzite RPM · bootc image · COPR (see packaging/bazzite/README.md)
include/punktfunk_core.h  cbindgen-generated C header (checked in)
tools/{latency-probe,loss-harness}/  measurement (plan §10)
docs/{implementation-plan,roadmap,windows-host,dualsense-haptics}.md
```

## Build & test

```sh
cargo build --workspace                     # green on Linux and macOS
cargo test --workspace                      # unit + loopback + proptest + C ABI harness
cargo clippy --workspace --all-targets

cargo run -p loss-harness                   # FEC loss-resilience sweep (no network needed)
bash crates/punktfunk-core/tests/c/run.sh   # standalone C-ABI link+round-trip proof
```

The C header regenerates from `crates/punktfunk-core/src/abi.rs` on every build (cbindgen via
`build.rs`) into `include/punktfunk_core.h`.

## Design invariants

- **One core, linked everywhere.** Protocol/FEC/crypto/pacing live in `punktfunk-core` exactly
  once, exposed over a stable, versioned C ABI (`punktfunk_abi_version()`, `PunktfunkConfig`
  carries its own `struct_size`).
- **No async on the hot path.** The per-frame pipeline uses native threads only;
  `tokio`/`quinn` are gated behind the off-by-default `quic` feature (control plane only).
- **FEC is the wall-breaker.** GF(2⁸) (≤255 shards/block) for Moonlight compat;
  GF(2¹⁶) (≤65535 shards/block, SIMD, O(n log n)) to push past ~1 Gbps.

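The `struct_size` field lets the ABI grow without breaking older callers: the library reads only as many bytes as the caller says its struct has and defaults the rest. A minimal sketch of that pattern, with hypothetical `ConfigV1`/`ConfigV2` structs standing in for `PunktfunkConfig` (the real fields and defaults differ):

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical first ABI version (illustrative fields, not the real `PunktfunkConfig`).
#[repr(C)]
pub struct ConfigV1 {
    pub struct_size: u32, // always first; the caller sets it to the size of its struct
    pub width: u32,
    pub height: u32,
}

/// Hypothetical later version: appends `fps`; v1 callers must keep working.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ConfigV2 {
    pub struct_size: u32,
    pub width: u32,
    pub height: u32,
    pub fps: u32,
}

/// Copy a caller's config, defaulting any trailing fields its (older) struct lacks.
///
/// # Safety
/// `cfg` must be null, or 4-byte aligned and valid for `struct_size` bytes of reads.
pub unsafe fn read_config(cfg: *const ConfigV2) -> Option<ConfigV2> {
    if cfg.is_null() {
        return None;
    }
    // struct_size is the first field of every version, so it is always safe to read.
    let size = unsafe { ptr::read(cfg as *const u32) as usize };
    if size < size_of::<ConfigV1>() {
        return None; // too small to be any known version
    }
    let mut out = ConfigV2 { struct_size: size as u32, width: 0, height: 0, fps: 60 };
    let n = size.min(size_of::<ConfigV2>()); // never read past the caller's struct
    unsafe { ptr::copy_nonoverlapping(cfg as *const u8, &mut out as *mut ConfigV2 as *mut u8, n) };
    Some(out)
}
```

Appending fields (never reordering or removing them) is what keeps this safe; `punktfunk_abi_version()` covers the changes that can't be expressed that way.
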
## License

MIT OR Apache-2.0.
@@ -0,0 +1,334 @@
---
title: "Roadmap"
description: "Decided next goals and the longer-term bets."
---

Decided 2026-06-10 (research-grounded; see commit history), extended since.

**Done & live (on `main`):**

- #1 KDE reliability (Phase 1+2).
- #2 client compositor options (full stack incl. the macOS client).
- #4 mic passthrough.
- #5 touch (host path) + **rich UHID DualSense**: input + adaptive-trigger/LED feedback over the
  new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E live-validated.
- #3 Bazzite packaging (`packaging/`), **deployed live** on a Bazzite F43 box (builds
  against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
- **Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
  process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
- Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
- **Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
  Steam session at the **client's exact resolution + refresh**. Games see it (via injected
  `--nested-refresh` + generated CVT modes, not the box's TV EDID); the session is relaunched per
  connection on a mode change and reused (no Steam restart) on the same mode. Plus macOS/iPad input
  fixes (NSEvent motion + iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).

**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the M4
client presenter + iOS (§6), and a Windows host (§7, now **de-risked via SudoVDA**, no custom
signed driver needed). **§10 HDR/10-bit is parked: it is blocked upstream at the compositor** (no
gamescope/KWin PipeWire 10-bit producer yet).

## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*

Startup was a chain of timing-sensitive handoffs with no readiness checks: each was a blind
`sleep`, one-shot timeout, or silent fire-and-forget that failed into a black screen.

- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
  wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
  KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
  registry roundtrip); move the portal restart to *after* readiness and precede it with
  `systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
  env import: the Sway script does this, the KDE one didn't).
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
  (permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
  ("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
  to wait for the output, read back the active mode, and reconcile encoder fps; harden gamescope
  node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
  probe as an `ExecStartPost` gate, `Restart=on-failure`.

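Phase 2's retry rule (retry transient failures with a growing delay, give up at once on permanent ones) fits in a small helper. A sketch with hypothetical names (`Failure`, `retry_with_backoff`); in the host this would wrap calls such as `vd.create()`, whose real error type is not shown here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Classification the caller attaches to each failure.
#[derive(Debug, PartialEq)]
pub enum Failure {
    Transient(String), // e.g. compositor not ready yet: worth retrying
    Permanent(String), // e.g. backend binary missing: retrying cannot help
}

/// Call `op` up to `max_attempts` times, doubling the delay after each transient
/// failure (capped at `max_delay`). A permanent failure is returned immediately.
pub fn retry_with_backoff<T, F>(
    max_attempts: u32,
    initial: Duration,
    max_delay: Duration,
    mut op: F,
) -> Result<T, Failure>
where
    F: FnMut(u32) -> Result<T, Failure>,
{
    let mut delay = initial;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(Failure::Permanent(e)) => return Err(Failure::Permanent(e)),
            Err(Failure::Transient(e)) => {
                if attempt >= max_attempts {
                    return Err(Failure::Transient(e)); // bounded: report the last error
                }
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```
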
## 2. Offer available compositors in the client ✅ *(done)*

The host enumerates which backends are actually available (binary present + version OK:
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway) and advertises the list in the punktfunk/1
Welcome + a mgmt-API field; the client sends its pick in the Hello, and the host honors it per session.
The picker lives in the Apple client + web console.

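The "version OK" gate reduces to parsing a dotted version and comparing it component by component. A sketch, assuming the backend's version output contains a token like `3.16.22` (exact output formats vary by compositor, so treat `extract_version` as a heuristic); the function and constant names are hypothetical.

```rust
/// Minimum versions named in the text above.
pub const MIN_GAMESCOPE: &[u32] = &[3, 16, 22];
pub const MIN_KWIN: &[u32] = &[6, 5, 6];

/// Find the first whitespace-separated token that looks like a dotted version
/// ("3.16.22", "v6.5.6"; a suffix such as "-1" is ignored) and split it into numbers.
pub fn extract_version(line: &str) -> Option<Vec<u32>> {
    line.split_whitespace()
        .map(|tok| tok.trim_start_matches('v'))
        .find(|tok| tok.starts_with(|c: char| c.is_ascii_digit()) && tok.contains('.'))
        .map(|tok| {
            tok.split('.')
                .map(|part| part.chars().take_while(|c| c.is_ascii_digit()).collect::<String>())
                .take_while(|digits| !digits.is_empty())
                .map(|digits| digits.parse::<u32>().unwrap_or(0))
                .collect()
        })
}

/// Component-wise `found >= min`, treating missing components as 0 (so 6.5 == 6.5.0).
pub fn at_least(found: &[u32], min: &[u32]) -> bool {
    for i in 0..found.len().max(min.len()) {
        let a = found.get(i).copied().unwrap_or(0);
        let b = min.get(i).copied().unwrap_or(0);
        if a != b {
            return a > b;
        }
    }
    true
}
```

A backend whose version can't be parsed is safest treated as unavailable rather than assumed new enough.
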
## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*

Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
it's Fedora-atomic, and the community installs Sunshine from a COPR repo via rpm-ostree, which is the analog we follow.
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.

- **M-Bazzite-1:** a **COPR RPM** (primary): binary + `60-punktfunk.rules` (→
  `/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
  NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
  Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
  Bazzite's already-present **gamescope** (minimal session plumbing).
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
  ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
- Flatpak only later, as an explicitly-degraded convenience build (the sandbox fights
  zero-copy NVENC/dmabuf/uinput).

## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*

The exact mirror of the host→client desktop-audio path. A PipeWire virtual source that apps can
select is a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.

- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
  `punktfunk_send_audio`.
- `audio/source_linux.rs`: a near-copy of the capture file, Direction::Output, fed from a
  jitter buffer (silence-fill on underrun, Opus PLC).
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
  surface). punktfunk/1-only.

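The jitter buffer in front of the virtual source can be as simple as a queue that waits for a target depth, hands out one frame per pull, and returns silence on underrun. A sketch with hypothetical names and fixed-size `i16` frames; where the real source can, it would substitute Opus PLC output for plain silence.

```rust
use std::collections::VecDeque;

/// Minimal playout buffer for fixed-size PCM frames (hypothetical, for illustration).
pub struct JitterBuffer {
    frames: VecDeque<Vec<i16>>,
    frame_len: usize,    // samples per frame (all channels)
    target_depth: usize, // frames to accumulate before playout (re)starts
    max_depth: usize,    // cap; the oldest frame is dropped beyond this to bound latency
    primed: bool,
}

impl JitterBuffer {
    pub fn new(frame_len: usize, target_depth: usize, max_depth: usize) -> Self {
        assert!(target_depth >= 1 && max_depth >= target_depth);
        JitterBuffer { frames: VecDeque::new(), frame_len, target_depth, max_depth, primed: false }
    }

    /// Queue a decoded frame arriving from the network.
    pub fn push(&mut self, frame: Vec<i16>) {
        if self.frames.len() == self.max_depth {
            self.frames.pop_front(); // drop the oldest frame to bound latency
        }
        self.frames.push_back(frame);
        if self.frames.len() >= self.target_depth {
            self.primed = true;
        }
    }

    /// Next frame for the PipeWire source; the bool is `true` when silence was filled in.
    pub fn pull(&mut self) -> (Vec<i16>, bool) {
        if self.primed {
            if let Some(frame) = self.frames.pop_front() {
                return (frame, false);
            }
            self.primed = false; // ran dry: wait for target_depth frames again
        }
        (vec![0; self.frame_len], true)
    }
}
```

The PipeWire process callback pulls on its own clock, so it never blocks on the network; the depth settings trade latency against underrun frequency.
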
## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*

- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
  InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
  `inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
  injects `ei_touchscreen` down/motion/up; `punktfunk-client-rs --touch-test` drags a finger.
  **Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
  server creates **no touchscreen device** (headless KWin), so touch currently no-ops on KWin
  (now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
  (gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
  `/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (vendor 054C/0CE6, the 232-byte
  inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
  GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
  kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
  from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
  trigger effects (L2/R2)**. The protocol carries new side-planes: rich-input `0xCC`
  (touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
  selects a per-session `DualSenseManager` (the `PadBackend` enum in `m3.rs`): client gamepad frames
  build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
  (lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
  (so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
  (`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
  `punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
  (← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `m3-host` +
  `punktfunk-client-rs --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
  12 live 0xCD events (the kernel's actual lightbar/trigger init reports), with the data plane unaffected
  (600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
  DualSense (`GCDualSenseAdaptiveTrigger`); this is handed off to the client agent for the real playtest.
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
  Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID, so
  the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
  rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
  (only ~5–10 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
  don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-based,
  with no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
  path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.

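The touch path reuses the abs-pointer convention of packing the client's reference surface size into `flags` as `(w<<16)|h`. A sketch of packing, unpacking, and mapping a point into the host output; the helper names are hypothetical, and proportional scaling by `out / surface` is our assumption about how the host uses `w` and `h`.

```rust
/// Pack the client's reference surface size into `flags`: (w << 16) | h.
pub fn pack_surface(w: u16, h: u16) -> u32 {
    (u32::from(w) << 16) | u32::from(h)
}

/// Inverse of `pack_surface`: (w, h).
pub fn unpack_surface(flags: u32) -> (u16, u16) {
    ((flags >> 16) as u16, (flags & 0xffff) as u16)
}

/// Map a client-space (x, y) into an `out_w` x `out_h` host output, clamping to the surface.
/// Returns `None` for a zero-sized reference surface.
pub fn scale_point(x: u32, y: u32, flags: u32, out_w: u32, out_h: u32) -> Option<(u32, u32)> {
    let (w, h) = unpack_surface(flags);
    let (w, h) = (u64::from(w), u64::from(h));
    if w == 0 || h == 0 {
        return None;
    }
    // u64 intermediates avoid overflow for large coordinates times large outputs.
    let sx = u64::from(x).min(w - 1) * u64::from(out_w) / w;
    let sy = u64::from(y).min(h - 1) * u64::from(out_h) / h;
    Some((sx as u32, sy as u32))
}
```

Because the size travels with every event, the host can map touches correctly even if the client's surface is resized mid-session.
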
## 6. iOS/iPadOS → tvOS *(deferred)*

PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin,
touch capture (#5), and UI. tvOS later.

## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)*
|
||||
|
||||
Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/
|
||||
crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic
|
||||
and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits:
|
||||
capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK
|
||||
with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic).
|
||||
|
||||
**The old blocker is gone.** Rather than author + sign our own kernel IDD for the per-client virtual
|
||||
display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect
|
||||
Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay`
|
||||
backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a
|
||||
kernel driver"* (XL). That removes the only hard blocker — the Windows host is now a medium,
|
||||
mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the
|
||||
stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because
|
||||
it's unbuildable on the Linux dev box; the trait boundaries are already in the right places.
|
||||

## 8. Pairing & trust hardening *(next)*

The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":

- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
  pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
  is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
  Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
  Deployed to the dev box + Bazzite.
- **Delegated pairing approval** *(next — the ergonomic enabler for "mandatory": pair a device
  without fetching the host PIN out of band).* Target flow:
  1. Device A is already paired (authenticated) to Host X.
  2. The user tries to connect Device B to Host X.
  3. Host X surfaces a request: *"Allow Device B to pair with Host X?"*
  4. The user approves/denies; on approve, Host X admits Device B — binding B's certificate
     fingerprint — with no PIN typed.

  Two buildable layers:
  - **§8b-1 (host + web — achievable now):** an unpaired B that connects to an approval-enabled host
    is held as a **pending request** `{id, name, fingerprint, requested_at}` in `NativePairing`
    instead of a flat reject; mgmt gains `GET /native/pending` + `POST /native/pending/{id}/{approve,deny}`;
    the web console lists pending requests with Approve/Deny. The **operator approves from
    the console** — delegated approval via the management surface.
  - **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
    **Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
    replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
    is a client-agent task.

PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
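
A minimal sketch of the §8b-1 host-side state, assuming an in-memory map keyed by request id. The struct fields mirror `{id, name, fingerprint, requested_at}` from the text. `PendingQueue`, `Decision`, the 32-byte fingerprint type, and the `trusted` list (standing in for the real trust store inside `NativePairing`) are hypothetical names for illustration, not the actual host code.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

/// A held pairing request (fields named after the §8b-1 text).
#[derive(Debug, Clone)]
pub struct PendingRequest {
    pub id: u64,
    pub name: String,
    pub fingerprint: [u8; 32],
    pub requested_at: SystemTime,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Approve,
    Deny,
}

/// Hypothetical approval queue; `trusted` stands in for the real trust store.
#[derive(Default)]
pub struct PendingQueue {
    next_id: u64,
    pending: HashMap<u64, PendingRequest>,
    trusted: Vec<[u8; 32]>,
}

impl PendingQueue {
    /// Hold an unpaired client instead of rejecting it; returns the request id.
    pub fn hold(&mut self, name: &str, fingerprint: [u8; 32]) -> u64 {
        // A retry from the same certificate reuses its existing request.
        if let Some(existing) = self.pending.values().find(|r| r.fingerprint == fingerprint) {
            return existing.id;
        }
        self.next_id += 1;
        let id = self.next_id;
        let req = PendingRequest { id, name: name.to_owned(), fingerprint, requested_at: SystemTime::now() };
        self.pending.insert(id, req);
        id
    }

    /// What `GET /native/pending` would list, oldest first.
    pub fn list(&self) -> Vec<&PendingRequest> {
        let mut reqs: Vec<&PendingRequest> = self.pending.values().collect();
        reqs.sort_by_key(|r| r.id);
        reqs
    }

    /// What `POST /native/pending/{id}/{approve,deny}` would do; false for an unknown id.
    pub fn decide(&mut self, id: u64, decision: Decision) -> bool {
        match self.pending.remove(&id) {
            Some(req) => {
                if decision == Decision::Approve {
                    // Bind the approved device's certificate fingerprint; no PIN involved.
                    self.trusted.push(req.fingerprint);
                }
                true
            }
            None => false,
        }
    }

    pub fn is_trusted(&self, fingerprint: &[u8; 32]) -> bool {
        self.trusted.contains(fingerprint)
    }
}
```

Collapsing retries from the same fingerprint onto one request keeps a reconnecting Device B from flooding the console. Whether the real host does this is not stated above.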

## 9. Client→host network speed test + settable bitrate *(host side done — client UI remaining)*

Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
value) instead of guesswork that ends in a stuttering stream.

**Done & live (host + protocol + connector + C ABI, `74819b1`):**
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
  client requests a rate; the host clamps to [500 kbps, 500 Mbps] (or its 20 Mbps default on 0),
  applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
  and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
  `punktfunk_connection_bitrate()`.
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
  `ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
  zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 1 Gbps / ≤ 5 s,
  video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
  and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
  `PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
  Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
  interleaved probe AUs excluded from frame verification. `punktfunk-client-rs` gains `--bitrate` +
  `--speed-test KBPS:MS` as the reference/loopback driver.
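
The trailing-byte back-compat and the clamp can be sketched as follows. This assumes the bitrate is a little-endian `u32` appended after the fixed Hello fields; the real wire layout is not given here. `trailing_bitrate_kbps`, `resolve_bitrate_kbps`, and `base_len` are hypothetical. The bounds and the 20 Mbps default are the values stated above (§11 later raises the ceiling to 2 Gbps).

```rust
/// Bounds and default as stated above (§11 later raises the maximum to 2 Gbps).
const MIN_BITRATE_KBPS: u32 = 500;
const MAX_BITRATE_KBPS: u32 = 500_000;
const DEFAULT_BITRATE_KBPS: u32 = 20_000;

/// Reads the optional trailing bitrate (hypothetical layout: little-endian u32
/// right after `base_len` bytes of fixed fields). Missing or short means 0.
fn trailing_bitrate_kbps(msg: &[u8], base_len: usize) -> u32 {
    msg.get(base_len..base_len + 4)
        .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .unwrap_or(0)
}

/// Host-side resolution: 0 ("no preference" or an old client) maps to the default;
/// anything else is clamped into range. The result is what Welcome echoes back.
fn resolve_bitrate_kbps(requested: u32) -> u32 {
    if requested == 0 {
        DEFAULT_BITRATE_KBPS
    } else {
        requested.clamp(MIN_BITRATE_KBPS, MAX_BITRATE_KBPS)
    }
}
```

An older client that never writes the trailing field reads back as 0, and the host treats 0 as "use the default". Old and new peers therefore interoperate without a version bump.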

**Remaining (client UI):** wire the C ABI into the Apple client — a "Test network" action
(`speed_test` → poll `probe_result` → "~XXX Mbps · recommended bitrate YYY") feeding a bitrate
control (`connect_ex3`), and surface both in the web console.
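
Here is a sketch of the arithmetic the "Test network" action could run on a finished probe. It assumes the client knows the host's `bytes_sent` (from `ProbeResult`), its own received byte count, and the window length. `summarize`, `ProbeSummary`, and the 80% headroom in `recommended_bitrate_kbps` are hypothetical choices for illustration, not the connector's actual policy.

```rust
/// Client-side summary of one probe window (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ProbeSummary {
    pub throughput_kbps: u64,
    pub loss_pct: f64,
}

/// `bytes_sent` comes from the host's ProbeResult; the other two are measured locally.
pub fn summarize(bytes_sent: u64, bytes_received: u64, window_ms: u64) -> ProbeSummary {
    // bytes * 8 / ms = bits per millisecond = kilobits per second.
    let throughput_kbps = if window_ms == 0 { 0 } else { bytes_received * 8 / window_ms };
    let lost = bytes_sent.saturating_sub(bytes_received);
    let loss_pct = if bytes_sent == 0 { 0.0 } else { 100.0 * lost as f64 / bytes_sent as f64 };
    ProbeSummary { throughput_kbps, loss_pct }
}

/// Hypothetical policy: suggest 80% of measured goodput, clamped to the host's range.
pub fn recommended_bitrate_kbps(s: &ProbeSummary, min_kbps: u64, max_kbps: u64) -> u64 {
    (s.throughput_kbps * 8 / 10).clamp(min_kbps, max_kbps)
}
```

Take a 20 Mbps / 2 s probe that delivers all 5,000,000 bytes. It summarizes to 20000 kbps at 0% loss, so this policy would suggest 16000 kbps.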

## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*

Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
build the moment a producer lands.

- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
  (`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only
  `BGRx`/`NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream
  master AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire:
  add HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
  Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
  F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
  wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
  (for its own sake), not for HDR.
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
  PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
  Steam-Deck-UI polish + the dynamic-resolution work (§ above). Track #2126 and !8293.
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
  Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
  (BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
  the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
  zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
  path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
  VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
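
The opt-in rule from the constraints bullet fits in a small decision function. This is a sketch under stated assumptions. The three booleans are the client's HDR want, the host master toggle, and a hypothetical `capture_10bit` flag for whether the producer actually delivers 10-bit frames. They, along with the `Codec`/`StreamColor` enums, are illustrative names, not the real Hello/Welcome fields.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    H264,
    Hevc,
    Av1,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamColor {
    /// 8-bit SDR (the default).
    Sdr,
    /// BT.2020 + PQ, 10-bit, static HDR10 metadata.
    Hdr10,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Negotiated {
    pub color: StreamColor,
    pub codec: Codec,
}

/// HDR only if the client asks, the host toggle allows it, and capture can deliver 10-bit.
/// HDR then forces HEVC (Main10): per the constraints above, VideoToolbox decodes
/// 10-bit HEVC but not 10-bit AV1.
pub fn negotiate(client_wants_hdr: bool, host_hdr_enabled: bool, capture_10bit: bool, preferred: Codec) -> Negotiated {
    if client_wants_hdr && host_hdr_enabled && capture_10bit {
        Negotiated { color: StreamColor::Hdr10, codec: Codec::Hevc }
    } else {
        Negotiated { color: StreamColor::Sdr, codec: preferred }
    }
}
```

SDR is the fallback for every unmet condition. A client that asks for HDR against today's 8-bit gamescope capture therefore still gets a working stream.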

## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*

Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.

**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434–874 data shards = a single GF(2¹⁶) block, roughly 75–150× under the 65535 ceiling); AES-GCM
(RustCrypto auto AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **M1
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
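
The invariant "every allocation size traces to a negotiated parameter clamped to a scheme ceiling" can be sketched together with the shard arithmetic behind the numbers above. The `FecConfig` field names come from the text. `Limits`, `MAX_SHARD_PAYLOAD` (a hypothetical 9000-byte ceiling), the 16× IDR headroom in `max_frame_bytes`, and all function names are assumptions for illustration. `max_frame_bytes` models the DoS-hygiene item listed below: derive the bound from mode and bitrate, then cap it at today's 64 MiB so the change can only tighten it.

```rust
/// Ceiling stated in the text for a single GF(2^16) block.
const GF16_MAX_SHARDS: u32 = 65_535;
/// Hypothetical payload ceiling (jumbo-frame sized); not from the source.
const MAX_SHARD_PAYLOAD: u32 = 9_000;
/// Today's hardcoded reassembler bound (64 MiB).
const MAX_FRAME_BYTES_CEILING: u64 = 64 * 1024 * 1024;

/// Host-negotiated FEC parameters (names from the text).
pub struct FecConfig {
    pub max_data_per_block: u32,
    pub shard_payload: u32,
}

/// Reassembler limits derived only from negotiated, clamped values.
#[derive(Debug, PartialEq, Eq)]
pub struct Limits {
    pub max_data_shards: u32,
    pub max_block_bytes: u64,
}

pub fn derive_limits(cfg: &FecConfig) -> Limits {
    let max_data_shards = cfg.max_data_per_block.clamp(1, GF16_MAX_SHARDS);
    let shard_payload = cfg.shard_payload.clamp(1, MAX_SHARD_PAYLOAD);
    Limits { max_data_shards, max_block_bytes: u64::from(max_data_shards) * u64::from(shard_payload) }
}

/// Data shards one average frame needs: ceil(frame_bytes / shard_payload). `fps` must be > 0.
pub fn shards_per_frame(bitrate_bps: u64, fps: u64, shard_payload: u64) -> u64 {
    let frame_bytes = bitrate_bps / 8 / fps;
    (frame_bytes + shard_payload - 1) / shard_payload
}

/// Hypothetical derivation: 16x the average frame as IDR headroom, never above the old ceiling.
pub fn max_frame_bytes(bitrate_bps: u64, fps: u64) -> u64 {
    (bitrate_bps / 8 / fps).saturating_mul(16).min(MAX_FRAME_BYTES_CEILING)
}
```

With a 1200-byte payload, `shards_per_frame` gives 435 and 869 shards for a 1 Gbps stream at 240 and 120 fps. That is in line with the ~434–874 range quoted above; the exact figure depends on per-shard header overhead. Both counts fit in one 4096-shard block.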

- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
  clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
  headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
  so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
  send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
  C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
  completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
  per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
  bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
  (64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
  `sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
  `recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
  underdrain). ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
  reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
  blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
  `shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
  never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
  `punktfunk-client-rs --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
  `packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
  `rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
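
The `Transport` seam from the first two bullets can be sketched with the standard library alone. `send` reports a full send buffer as `Ok(false)` and bumps a drop counter instead of failing silently. `send_batch` has a scalar default that issues one syscall per packet. A Linux implementation would override `send_batch` with `sendmmsg` (for example via the `libc` crate), which is not shown. `UdpTransport` and the `usize` return of `send_batch` are assumptions; only the method names, `Result<bool>`, and the stat name come from the text.

```rust
use std::io;
use std::net::UdpSocket;
use std::sync::atomic::{AtomicU64, Ordering};

/// The seam described above; only `send`, `send_batch`, and `Result<bool>` come from the text.
pub trait Transport {
    /// Ok(true): handed to the kernel. Ok(false): dropped because the send buffer is full.
    fn send(&self, pkt: &[u8]) -> io::Result<bool>;

    /// Scalar default: one syscall per packet. A Linux impl would override this with
    /// `sendmmsg` (not shown). Returns how many packets were accepted.
    fn send_batch(&self, pkts: &[&[u8]]) -> io::Result<usize> {
        let mut accepted = 0;
        for pkt in pkts {
            if self.send(pkt)? {
                accepted += 1;
            }
        }
        Ok(accepted)
    }
}

/// Hypothetical UDP transport over a connected, non-blocking socket.
pub struct UdpTransport {
    pub sock: UdpSocket,
    /// Mirrors the `packets_send_dropped` stat.
    pub packets_send_dropped: AtomicU64,
}

impl Transport for UdpTransport {
    fn send(&self, pkt: &[u8]) -> io::Result<bool> {
        match self.sock.send(pkt) {
            Ok(_) => Ok(true),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                // Previously a silent drop; now it is counted and visible in stats.
                self.packets_send_dropped.fetch_add(1, Ordering::Relaxed);
                Ok(false)
            }
            Err(e) => Err(e),
        }
    }
}
```

Keeping the scalar loop as the trait default means non-Linux builds keep working unchanged. The batched syscall path becomes a per-platform override rather than a fork of the send logic.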

## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*

A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
is only the same-host **capture-stamp→reassembled** slice (~30–40% of true glass-to-glass) and was
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
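
To make the pacing tail concrete, here is a sketch of a per-packet send schedule. It assumes packets are spread linearly across ~90% of the frame interval. With `burst_cap = 0` it models the old behavior, where every multi-packet frame is spread. With a cap it models the microburst change listed below: frames under the cap go out at once, and for bigger frames only the bytes past the cap are paced. `pace_offsets` is a hypothetical helper, not the real `paced_submit`, and the handling of the packet that straddles the cap is an assumption.

```rust
use std::time::Duration;

/// Hypothetical helper: send offset (from frame start) for each packet of one frame.
/// `burst_cap = 0` models the old pacing; the text's new default cap is 128 KB.
pub fn pace_offsets(packet_sizes: &[usize], frame_interval: Duration, burst_cap: usize) -> Vec<Duration> {
    let total: usize = packet_sizes.iter().sum();
    // Single-packet frames and frames under the cap go out immediately.
    if packet_sizes.len() <= 1 || total <= burst_cap {
        return vec![Duration::ZERO; packet_sizes.len()];
    }
    let window = frame_interval.mul_f64(0.9); // spread over ~90% of the interval
    let overflow = (total - burst_cap) as f64; // bytes that must be paced (> 0 here)
    let mut offset_bytes = 0usize;
    packet_sizes
        .iter()
        .map(|&len| {
            let start = offset_bytes;
            offset_bytes += len;
            if start < burst_cap {
                Duration::ZERO // inside the immediate microburst
            } else {
                // Linear spread of the overflow bytes across the pacing window.
                window.mul_f64((start - burst_cap) as f64 / overflow)
            }
        })
        .collect()
}
```

At 120 Hz the interval is ~8.33 ms. The old schedule therefore holds the last packet of even a small 10 × 1200-byte frame until ~6.75 ms after frame start. Under a 128 KB cap the same frame leaves at once.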

**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.

- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
  `PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
  (IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail in the
  common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
  the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
  `encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
  numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
  on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+probes;
  the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
  `sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @60–120 fps) and is the
  substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
  LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
  1. **Wall-clock skew handshake + glass-to-glass probe** (`tools/latency-probe`) — measures the two
     biggest unmeasured terms (render→capture, decode→present); client present-stamp vs the AU's
     `pts_ns` (already attached).
  2. **CUDA stream+event** to drop one of the two redundant `cuCtxSynchronize` calls in `submit_cuda`
     (keep the copy) — ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
  3. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
     off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
  4. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
     encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
     thread split, only after measurement justifies it.
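
The encode|send split is the standard bounded-channel pattern. In the sketch below, `sync_channel(3)` gives the backpressure described: once three frames are queued, the encoder blocks on `send` until the send thread catches up, rather than queueing without bound. `FrameMsg` here is a hypothetical stand-in carrying only a sequence number and AU bytes, and the seal/pace/send work is elided to comments.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for the message handed from the encode thread.
pub struct FrameMsg {
    pub seq: u64,
    pub au: Vec<u8>,
}

/// Send side: in the real split this thread owns the `Session`.
fn send_loop(rx: Receiver<FrameMsg>) -> Vec<u64> {
    let mut sent = Vec::new();
    // `recv` fails only after the encode side drops its Sender and the queue is empty.
    while let Ok(msg) = rx.recv() {
        // seal + pace + send (+ probes) would happen here.
        debug_assert!(!msg.au.is_empty());
        sent.push(msg.seq);
    }
    sent
}

/// Encode side: capture + encode would produce each AU; here they are dummies.
pub fn run(frames: u64) -> Vec<u64> {
    // Capacity 3: once three frames are queued, `send` blocks the encoder (backpressure).
    let (tx, rx) = sync_channel::<FrameMsg>(3);
    let sender = thread::spawn(move || send_loop(rx));
    for seq in 0..frames {
        tx.send(FrameMsg { seq, au: vec![0u8; 1024] }).expect("send thread exited");
    }
    drop(tx); // shutdown signal
    sender.join().expect("send thread panicked")
}
```

Dropping the `Sender` is the shutdown signal. `recv()` returns `Err` once the queue drains, so the send thread exits cleanly without a separate stop flag.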

## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*

The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `m3-host` advertise the native service over mDNS:

- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
  control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
  both host entry points get it; best-effort (a discovery failure never blocks streaming —
  `--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
  over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
  (so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-client-rs --discover [SECS]` browses and prints each host (name, addr:port,
  pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
  (Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — dev box discovered the GNOME-box appliance
  (`home-worker-3 192.168.1.248:9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
  (`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
  host-side contract above is all it needs.
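
A client-side reading of the TXT contract might look like this sketch. It assumes the TXT strings have already been pulled from the mDNS browse result as `key=value` entries. `parse_txt`, `NativeHost`, and `Pairing` are hypothetical; the keys and values are the ones listed above. The fingerprint is kept as advisory data only, since mDNS is unauthenticated and pinning happens on connect.

```rust
use std::collections::HashMap;

/// Pairing policy advertised in the `pair` key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pairing {
    Required,
    Optional,
}

/// One discovered native host, as read from its TXT record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NativeHost {
    /// Advisory only: mDNS is unauthenticated, pinning still verifies on connect.
    pub fp: String,
    pub pairing: Pairing,
    /// Host uniqueid, used for dedup.
    pub id: String,
}

/// Parses `key=value` TXT entries; None if not punktfunk/1 or a required key is missing.
pub fn parse_txt(entries: &[&str]) -> Option<NativeHost> {
    let kv: HashMap<&str, &str> = entries.iter().filter_map(|e| e.split_once('=')).collect();
    if kv.get("proto") != Some(&"punktfunk/1") {
        return None;
    }
    let pairing = match *kv.get("pair")? {
        "required" => Pairing::Required,
        "optional" => Pairing::Optional,
        _ => return None,
    };
    Some(NativeHost {
        fp: kv.get("fp")?.to_string(),
        pairing,
        id: kv.get("id")?.to_string(),
    })
}
```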

---
title: "Windows Host"
description: "Feasibility and scoping for a Windows host backend."
---

**Status: scoped, deferred — but de-risked.** A Windows host is architecturally an *"add a backend"*
job, not a parallel port. The one thing that used to make it **large** — the per-client *virtual*
output, which has no user-mode Windows API and seemingly needed us to write and sign our own kernel
Indirect Display Driver (IDD) — is **solved by reusing [SudoVDA](https://github.com/VirtualDrivers), the
Sunshine Virtual Display Adapter**: a pre-built, signed IDD that creates virtual displays at
arbitrary `WxH@Hz` on demand. We install it and drive its control interface; **no driver to write or
WHQL-sign.** That turns the headline feature from XL into a medium backend. This doc records what's
left so the work can be picked up deliberately.

(Grounded in a 4-agent read of the host crate, 2026-06-10; SudoVDA path added 2026-06-11.)

## What's already done for us

punktfunk is cleanly layered. **~95% of the codebase is platform-agnostic and reuses verbatim:**

| Reusable as-is | Why |
|---|---|
| `punktfunk-core` (protocol, FEC, crypto, session, transport, **C ABI**) | Zero platform deps — no `cfg(linux)` anywhere; the C ABI is already cross-platform |
| QUIC control plane (`quic.rs`, pairing, mode negotiation) | quinn + tokio are portable |
| GameStream P1.1 (mDNS, serverinfo, pairing, RTSP, ENet) — *except* `stream.rs`/`audio.rs` | pure wire logic |
| Management REST API (`mgmt.rs`) + OpenAPI | axum/tokio, portable |
| Pipeline + `m3.rs` orchestration | trait-generic — calls `capturer.next_frame()`, `encoder.submit/poll()`; **needs zero changes** |
| The **trait boundaries** themselves: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral signatures; Linux deps are already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |

So a Windows host is **new `#[cfg(target_os = "windows")]` backend modules behind the existing
traits** — the per-frame path, protocol, and control plane don't move. No architectural refactor is
required; the boundaries are already in the right places.
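
Here is a minimal sketch of that layout. It assumes one selector function per trait: each platform module provides the same constructor, and orchestration code only sees `dyn Capturer`. The one-method `Capturer` trait and the `PipeWireCapturer`/`DxgiCapturer` types are simplified, hypothetical stand-ins; the real trait signatures are not reproduced in this doc.

```rust
/// Simplified stand-in for the real `Capturer` trait.
pub trait Capturer {
    fn backend_name(&self) -> &'static str;
}

#[cfg(target_os = "linux")]
mod backend {
    /// Stand-in for the existing PipeWire/ScreenCast capturer.
    pub struct PipeWireCapturer;
    impl super::Capturer for PipeWireCapturer {
        fn backend_name(&self) -> &'static str {
            "pipewire"
        }
    }
    pub fn new_capturer() -> Box<dyn super::Capturer> {
        Box::new(PipeWireCapturer)
    }
}

#[cfg(target_os = "windows")]
mod backend {
    /// Hypothetical DXGI Desktop Duplication capturer.
    pub struct DxgiCapturer;
    impl super::Capturer for DxgiCapturer {
        fn backend_name(&self) -> &'static str {
            "dxgi"
        }
    }
    pub fn new_capturer() -> Box<dyn super::Capturer> {
        Box::new(DxgiCapturer)
    }
}

#[cfg(not(any(target_os = "linux", target_os = "windows")))]
mod backend {
    /// Placeholder so the crate still builds on other targets.
    pub struct Unsupported;
    impl super::Capturer for Unsupported {
        fn backend_name(&self) -> &'static str {
            "unsupported"
        }
    }
    pub fn new_capturer() -> Box<dyn super::Capturer> {
        Box::new(Unsupported)
    }
}

/// Platform-agnostic orchestration: never names a concrete backend.
pub fn describe_pipeline() -> String {
    let capturer = backend::new_capturer();
    format!("capture via {}", capturer.backend_name())
}
```

Because every `backend` module exports the same constructor, the orchestration function compiles unchanged on each target. Only the `cfg`-selected module differs.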

## What a Windows host needs (new code)

Each row is a Linux backend that needs a Windows sibling. Effort is the *implementation* effort;
all reuse the existing trait.

| Subsystem | Linux today | Windows equivalent | Effort | Notes |
|---|---|---|---|---|
| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly |
| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **SudoVDA** (pre-built signed IDD) — install + drive its control API to add/remove a `WxH@Hz` virtual monitor per session | **M** | ✅ **no longer the blocker**: SudoVDA is the same IDD Sunshine ships, so no driver to author or sign. The `VirtualDisplay` backend = enable the adapter, create a monitor at the client's mode, capture it (DXGI), tear it down on session end. Fallback if SudoVDA is absent: capture an existing monitor (loses native-resolution) |
| **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC SDK direct with a D3D11 device context (`AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
| **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this |
| **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (`INPUT` records with `KEYBDINPUT`/`MOUSEINPUT`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) |
| **Input (gamepads)** | uinput X-Box-360 pad + FF rumble | **ViGEm** (Virtual Gamepad Emulation) + HID reports | M | rumble back-channel maps to ViGEm notifications |
| **Audio capture** | PipeWire sink-monitor | **WASAPI loopback** (`IAudioCaptureClient`) | S–M | also produces interleaved f32 — same `AudioCapturer` contract |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
| **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do |

**Rough total: ~2,000–4,000 LOC of new Rust** (no C++ driver — SudoVDA is reused as-is), spread over
capture/encode/vdisplay/input/audio. With the driver problem solved, the overall effort is now
**medium**; the input+audio layer alone is *small–medium*.

## Recommended phasing (when picked up)

1. **Phase 0 — "basic Windows host" (no virtual display).** Capture an *existing* monitor (DXGI
   Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves
   the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt.
   It loses the per-client native-resolution output but is a working Windows host quickly.
2. **Phase 1 — the virtual display via SudoVDA.** A `VirtualDisplay` backend that enables SudoVDA,
   creates a monitor at the client's exact `WxH@Hz`, captures it (DXGI), and tears it down on session
   end — restoring punktfunk's headline feature with **no driver authoring or signing**. (Ship/guide
   the SudoVDA install as a host prerequisite, like the udev rule on Linux.)
3. **Phase 2 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC
   zero-copy.
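
The Phase 1 lifecycle maps naturally onto an RAII guard: create a monitor at the client's mode, and tear it down when the session ends. This sketch assumes a hypothetical `VdaControl` trait standing in for SudoVDA's control interface; the real SudoVDA API is not modeled here. `FakeVda` is an in-memory double so the lifecycle can be exercised on any OS.

```rust
/// A client-requested display mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Hypothetical stand-in for the virtual-display driver's control interface.
pub trait VdaControl {
    /// Adds a virtual monitor at `mode`; returns an opaque monitor id.
    fn add_monitor(&mut self, mode: Mode) -> Result<u32, String>;
    fn remove_monitor(&mut self, id: u32);
}

/// RAII guard: the virtual monitor lives exactly as long as the session holds it.
pub struct VirtualMonitor<'a, C: VdaControl> {
    ctl: &'a mut C,
    id: u32,
}

impl<'a, C: VdaControl> VirtualMonitor<'a, C> {
    pub fn create(ctl: &'a mut C, mode: Mode) -> Result<Self, String> {
        let id = ctl.add_monitor(mode)?;
        Ok(VirtualMonitor { ctl, id })
    }

    pub fn id(&self) -> u32 {
        self.id
    }
}

impl<C: VdaControl> Drop for VirtualMonitor<'_, C> {
    fn drop(&mut self) {
        // Tear down on session end, including early returns and unwinding panics.
        self.ctl.remove_monitor(self.id);
    }
}

/// In-memory double so the lifecycle can be exercised without the driver.
#[derive(Default)]
pub struct FakeVda {
    next_id: u32,
    pub live: Vec<(u32, Mode)>,
}

impl VdaControl for FakeVda {
    fn add_monitor(&mut self, mode: Mode) -> Result<u32, String> {
        if mode.width == 0 || mode.height == 0 || mode.refresh_hz == 0 {
            return Err(format!("invalid mode {:?}", mode));
        }
        self.next_id += 1;
        self.live.push((self.next_id, mode));
        Ok(self.next_id)
    }

    fn remove_monitor(&mut self, id: u32) {
        self.live.retain(|(m, _)| *m != id);
    }
}
```

Teardown is tied to `Drop`, so an early return or an unwinding panic in the session still removes the virtual monitor. A failed session therefore does not leave a stray display attached.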

## Why it's deferred (not started now)

- The remaining work is **medium** and mechanical, but **none of it is buildable or testable on the
  Linux dev box** — it would be unvalidated code until there's a Windows box in the loop.
- SudoVDA removed the hard blocker (the signed kernel driver); what's left is a backend port, picked
  up whenever a Windows target is in scope.

The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are
the down payment. Start at **Phase 0** for the fastest path to a working Windows host.