docs(site): add Windows host install, restructure nav, new public roadmap
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 2m24s
ci / web (push) Successful in 35s
android / android (push) Successful in 3m27s
ci / docs-site (push) Successful in 31s
ci / bench (push) Successful in 4m43s
deb / build-publish (push) Successful in 4m49s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 24s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m28s
docker / deploy-docs (push) Successful in 17s

- install (host): add a Windows (NVIDIA) section with signed-installer and
  certificate-trust steps; note the .cer is the same across releases.
- install-client: clarify the Windows MSIX certificate is the same every
  release (trust once, updates need nothing).
- Move "Project & Internals" out of the public docs site: relocate
  implementation-plan, apple-stage2-presenter, gamescope-multiuser,
  dualsense-haptics, ci, and gamestream-host-plan to docs/; drop them from
  the nav. Move windows-host into Host Setup.
- Rewrite roadmap as a lean public page with an at-a-glance grid and
  current statuses (Windows host shipped/beta, Apple incl. tvOS shipped,
  Android shipped, concurrent sessions + delegated pairing done).
- Fix status.md link to the now-internal implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-20 19:52:23 +02:00
parent 76f4484ded
commit 9f049f965f
14 changed files with 123 additions and 913 deletions
+59 -377
View File
@@ -1,392 +1,74 @@
---
title: "Roadmap"
description: "Decided next goals and the longer-term bets."
description: "What's shipped, what's in progress, and what's next for punktfunk."
---
A quick map of where punktfunk is today and where it's heading. For the detailed, dated changelog,
see [Status & Progress](/docs/status).
Decided 2026-06-10 (research-grounded; see commit history), extended since.
**Legend:** ✅ shipped · 🟡 in progress · 🔭 planned · ⛔ blocked upstream
**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
Steam session at the **client's exact resolution + refresh** — games see it (via injected
`--nested-refresh` + generated CVT modes, not the host's TV EDID), relaunched per-connection on a mode
change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).
## At a glance
The four native clients — **macOS/iOS/iPadOS/tvOS**, **Linux**, **Windows**, and **Android** — are
all real, working clients; see [Status & Progress](/docs/status) for their current state.
| Area | |
|---|---|
| Protocol core — FEC · crypto · C ABI | ✅ |
| GameStream host (works with Moonlight) | ✅ |
| Native `punktfunk/1` protocol | ✅ |
| Linux host (KWin · GNOME · gamescope · Sway) | ✅ |
| Windows host (NVIDIA) | ✅ beta |
| Apple client (macOS · iOS · iPadOS · tvOS) | ✅ |
| Linux client (GTK4) | ✅ |
| Android client (phone · TV) | ✅ |
| Windows client | 🟡 |
| Web console + pairing | ✅ |
| Concurrent sessions (shared desktop) | ✅ |
| Network speed test + bitrate | ✅ |
| HDR / 10-bit (host capture) | ⛔ |
| Sub-frame pipelining (latency) | 🔭 |
A native **Windows host** (§7) is also implemented and shipping (NVIDIA-only, x64-only) — see
[Windows Host](/docs/windows-host).
## ✅ Shipped
**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval) and the
native client presenter + iOS (§6). **§10 HDR/10-bit is parked — blocked upstream at the compositor**
(no gamescope/KWin PipeWire 10-bit producer yet).
- **The host, two ways.** A GameStream host any [Moonlight](/docs/moonlight) client can use, and the
lower-latency native [`punktfunk/1`](/docs/how-it-works) protocol (QUIC control + UDP data with
GF(2¹⁶) Leopard FEC + AES-GCM). Both run from one process.
- **Native-resolution virtual displays** on Linux across KWin, GNOME/Mutter, gamescope, and
Sway/wlroots, with a fully zero-copy GPU path to NVENC (stable 240 fps at 5120×1440).
- **A native Windows host** (NVIDIA, x64) — a signed installer with secure-desktop capture and a
bundled virtual-display driver. See [Windows Host](/docs/windows-host). *(Beta — newer than the
Linux host.)*
- **Clients on every platform** — native apps for **Apple** (macOS, iOS, iPadOS, tvOS), **Linux**,
**Android** (phone + TV), and **Windows**, each with hardware decode, controllers including
DualSense, audio + mic, and automatic host discovery. See [Clients](/docs/clients).
- **Secure by default** — SPAKE2 PIN pairing with pinned reconnects, one-click delegated approval from
the web console, and mDNS LAN auto-discovery.
- **Tuned for latency** — concurrent sessions (stream one desktop to several devices at once),
mid-stream resolution renegotiation, a cross-machine clock-skew handshake, a 1 Gbps+ data plane, and
an in-app network speed test that informs the bitrate picker.
## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
## 🟡 In progress
Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.
- **Windows client on-glass validation.** The hardware (D3D11VA) decode, HDR present, and GUI are
built and ship as a signed MSIX — they just need verification on real GPU hardware.
- **Apple stage-2 presenter as the default.** The lower-latency `VTDecompressionSession`
`CAMetalLayer` path is live behind an opt-in flag and graduating to the default.
- **Web console parity.** Surfacing the speed test and bitrate picker the apps already have.
- **Windows host hardening.** Broader real-world testing, AMD/Intel encode (NVIDIA-only today), and
bundling the ViGEm gamepad driver.
- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
registry roundtrip); move the portal restart to *after* readiness and precede it with
`systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
env import — the Sway script does this, the KDE one doesn't).
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
(permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
probe as an `ExecStartPost` gate, `Restart=on-failure`.
## 🔭 Planned
## 2. Offer available compositors in the client ✅ *(done)*
- **Sub-frame pipelining.** Overlap encode and transmit within a single frame (a direct NVENC slice
path) — the next big latency lever at high resolutions.
- **True glass-to-glass latency** measured end to end (capture → on-screen present).
- **gamescope multi-user isolation.** Per-session input and audio so concurrent clients are fully
independent desktops (the shared-desktop case already works).
- **Peer-approved pairing.** Approve a new device from an already-paired device's own app.
Host enumerates which backends are actually available (binary present + version OK:
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
Picker in the Apple client + web console.
## ⛔ Parked / blocked
## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*
Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
it's Fedora-atomic and the community installs Sunshine via COPR rpm-ostree — the analog.
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.
- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
`/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
Bazzite's already-present **gamescope** (minimal session plumbing).
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
zero-copy NVENC/dmabuf/uinput).
## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*
The exact mirror of the host→client desktop-audio path. A PipeWire virtual source apps can
select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
`punktfunk_send_audio`.
- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
jitter buffer (silence-fill underrun, Opus PLC).
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
surface). punktfunk/1-only.
## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*
- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
`inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
injects `ei_touchscreen` down/motion/up; `punktfunk-probe --touch-test` drags a finger.
**Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
(now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
(gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
`/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (vendor 054C/0CE6, the 232-byte
inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
(touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
selects a per-session `DualSenseManager` (the `PadBackend` enum in `punktfunk1.rs`): client gamepad frames
build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
(lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
(so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
(`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
`punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
(← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `punktfunk1-host` +
`punktfunk-probe --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
(600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
(only ~510 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-
based, no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
## 6. iOS/iPadOS → tvOS *(deferred)*
PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin
+ touch capture (#5) + UI. tvOS later.
## 7. Windows as a host ✅ *(implemented & shipping — NVIDIA-only, x64-only; see [Windows Host](/docs/windows-host))*
Done as an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/crypto/C-ABI) +
QUIC + GameStream + mgmt + the pipeline orchestration are all platform-agnostic and already
`cfg`-isolated (~95% reuse), so the Windows host is a set of `#[cfg(windows)]` backends behind the
existing traits:
- **Capture** — DXGI Desktop Duplication → D3D11 texture.
- **Virtual display** — **SudoVDA** (the Sunshine Virtual Display Adapter), a pre-built, signed
Indirect Display Driver that creates a `WxH@Hz` monitor on demand — so there's no kernel driver to
author or WHQL-sign. Bundled and staged by the installer; falls back to capturing an existing
monitor if absent.
- **Encode** — NVENC with a D3D11 device (`--features nvenc`), so the host is **NVIDIA-only**.
- **Input** — SendInput (mouse/keyboard) + ViGEm (virtual gamepads with rumble).
- **Audio** — WASAPI loopback capture + a WASAPI virtual mic.
It ships as a **signed Inno Setup installer** that registers a `LocalSystem` SCM service launching
into the interactive session for secure-desktop (UAC / lock-screen) capture, published by
`windows-host.yml`. Still open: AMD/Intel encode (none today), broader real-world testing, and
bundling ViGEmBus.
## 8. Pairing & trust hardening *(§8a + §8b-1 done; §8b-2 next)*
The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":
- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
Deployed to the dev box + Bazzite.
- ✅ **§8b-1 Delegated approval via the console — done (2026-06-12)** *(the ergonomic enabler for
"mandatory": pair a device without fetching the host PIN out of band).* An identified-but-unpaired
device that knocks on a pairing-required host is held as a **pending request** in `NativePairing`
(in-memory, deduped by fingerprint, 32-entry cap, 10-min expiry — a LAN scanner can't grow it,
and an anonymous client with no certificate records nothing). The mgmt API gains
`GET /native/pending` + `POST /native/pending/{id}/approve` (optional `{name}` to relabel) +
`POST /native/pending/{id}/deny`; the web console's Pairing page shows a **Waiting for approval**
section (live-polling) with Approve/Deny — approve pins the fingerprint on the spot, no PIN.
The `Hello` carries an optional trailing **device name** (same back-compat pattern as
compositor/gamepad/bitrate; `client-rs --name` sends it, fingerprint-derived label otherwise) so
the pending list is human-readable. End-to-end tested (knock → pending → approve → same identity
streams) + unit/mgmt tests.
- **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
**Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
is a client-agent task. The Apple connector should also start sending the `Hello` device name
(needs a connect-ABI name parameter; the wire field is already live).
PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
## 9. Client→host network speed test + settable bitrate *(host + Apple client done — web console remaining)*
Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
value) instead of guesswork that ends in a stuttering stream.
**Done & live (host + protocol + connector + C ABI, `74819b1`):**
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
client requests a rate; the host clamps to [500 kbps, 2 Gbps] (or its 20 Mbps default on 0),
applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
`punktfunk_connection_bitrate()`.
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
`ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 3 Gbps / ≤ 5 s,
video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
`PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
interleaved probe AUs excluded from frame verification. `punktfunk-probe` gains `--bitrate` +
`--speed-test KBPS:MS` as the reference/loopback driver.
**Done (Apple client UI):** Settings grows a Bitrate control (Automatic = host default; manual is
a log-scale slider up to 3 Gbps with an above-1-Gbps "test the speed first" warning — tvOS keeps
a focus-native preset picker; rides `connect_ex3` on every connect, `PUNKTFUNK_BITRATE_KBPS` dev
override), and each host card's context menu gets
"Test Network Speed…" — a sheet that connects, runs `speed_test` (up to the host's 3 Gbps
probe ceiling for 2 s), polls `probe_result` with a live readout, and shows measured
goodput · loss · recommended bitrate (≈70% of measured, capped at the 2 Gbps session
ceiling) with a one-tap "Use N Mbps" writing the setting. Loopback-tested through the
xcframework: bitrate echo (50 000 → 50 000) + a 20 Mbps/500 ms probe completing with real numbers.
**Remaining:** surface both in the web console.
## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*
Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
build the moment a producer lands.
- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
(`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/
`NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
(for its own sake), not for HDR.
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
Steam-Deck-UI polish + the dynamic-resolution work (§ above). Track #2126 and !8293.
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
(BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
(RustCrypto auto AES-NI, ~1025× headroom on x86_64); the u64 sequence/nonce space; and the **core
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
(64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
`sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
`recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
underdrain). ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 2561024 (bounds burst-drop
blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
`shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
`punktfunk-probe --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
`packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.51 Gbps (no explicit
`rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*
A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
is only the same-host **capture-stamp→reassembled** slice (~3040% of true glass-to-glass) and was
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
`PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
(IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
`sync_channel(3)` with backpressure. Removes the serialization (~28 ms @60120 fps) and is the
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
- **Done & live (skew handshake landed 2026-06-12):** **wall-clock skew handshake** — `ClockProbe`/
`ClockEcho` on the control stream (8 NTP-style rounds right after `Start`; min-RTT sample →
hostclient offset; `clock_offset_ns`). The client adds the offset to its receive instant before
differencing against the AU `pts_ns`, so the `capture→reassembled` percentiles are now valid
**across machines** (reported `skew_corrected=true`), not just same-host. Back-compat: an old host
that doesn't answer times out → `skew_corrected=false` (shared-clock assumption, as before).
Validated cross-LAN (GNOME box → dev box): offset ≈ 1.57 ms (reproducible), rtt ~140 µs, **p50
1.30 ms** skew-corrected capture→reassembled. The skew handshake is now a shared core helper
(`quic::clock_sync` → `ClockSkew`) used by both the reference client and the **embeddable
connector** — `NativeClient` runs it at connect and exposes the offset over the C ABI
(`punktfunk_connection_clock_offset_ns`), so the Apple client can convert a present instant to the
host clock. The Apple client now consumes that offset: `PunktfunkConnection.clockOffsetNs` +
`LatencyMeter` surface a **capture→client-receipt** (skew-corrected) p50/p95 in the HUD — the first
cross-machine latency the real Apple client reports. **Remaining for *true* glass-to-glass**:
(1) the **decode→present** tail — the stage-1 `AVSampleBufferDisplayLayer` decodes+presents
compressed samples internally with no per-frame callback, so it needs the **stage-2 presenter**
(`VTDecompressionSession` decode-completion timestamp + `CAMetalLayer`/display-link present) to
stamp on-glass present time; (2) the host **render→capture** term (PipeWire buffer presentation
timestamp vs our capture stamp). **render→capture is parked (low priority):** pipewire-rs 0.9.2
exposes no per-buffer meta accessor, no raw buffer pointer (`pub(crate)`), and no stream-timing
API, so reading `SPA_META_Header.pts` would require introducing raw `spa_sys`/`pw_sys` FFI into the
working, perf-critical capture buffer-acquisition — a risky rewrite for the *smallest* g2g term,
with KWin/Mutter `Header.pts` support unconfirmed. Glass-to-glass is effectively complete as
**capture→present** (the stage-2 presenter measures it). `tools/latency-probe` is still the
cross-machine orchestrator.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
1. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
copy) — ~0.10.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
2. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
3. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
encode+send within a frame (~36 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
thread split, only after measurement justifies it.
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `punktfunk1-host` advertise the native service over mDNS:
- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
both host entry points get it; best-effort (a discovery failure never blocks streaming —
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
(so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-probe --discover [SECS]` browses and prints each host (name, addr:port,
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — a client discovered a GNOME host appliance
(`myhost.local 9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
(`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
host-side contract above is all it needs.
## 14. Concurrent sessions *(shared-desktop multi-view ✅ done; multi-user deferred)*
The host no longer serves one client at a time. The accept loop spawns each session (`JoinSet`),
bounded by `--max-concurrent` (default 4 — a NVENC bound; overflow waits in the accept queue). Each
session keeps its own virtual output + NVENC encoder; the host-lifetime input/audio/mic services stay
shared.
- **Done & live (shared-desktop multi-view):** multiple devices viewing/controlling the **same**
desktop on the shared-desktop backends (kwin/mutter/wlroots) — e.g. stream *your* desktop to a
laptop + a TV at once; shared input/audio is the correct semantics there. Validated live on the
GNOME box: two clients → **two independent Mutter virtual outputs (1280×720 + 1920×1080) streaming
simultaneously**. The QUIC handshake stays in the accept loop so a failed handshake doesn't consume
a slot or block the next client.
- **Deferred — gamescope multi-user (independent desktops):** the *other* model — each client its own
gamescope instance, with **per-session input + audio + mic** (the multi-user / cloud-gaming case).
Researched 2026-06-12 and **parked**: it's a large multi-file refactor (per-instance EIS sockets +
a per-session injector + per-session **null-sink audio routing** + per-session mic) for a niche
use case, while the common multi-device case is already covered by the multi-view model above. Full
research + the plumbing list: [gamescope Multi-User Isolation](/docs/gamescope-multiuser).
- **HDR / 10-bit streaming.** Designed end to end, but blocked upstream — no shipping compositor emits
a 10-bit/HDR capture stream yet. Ready to build the moment one lands.
- **Advanced DualSense voice-coil haptics.** Scoped and shelved (it rides the controller's USB audio
interface, with near-zero game support on Linux). Adaptive triggers, rumble, and the lightbar
already ship.