docs(site): make docs-site the knowledge base — status tracker + setup guides
ci / rust (push) Has been cancelled
ci / rust (push) Has been cancelled
Per the new docs workflow (docs-site = KB layer; repo docs/ keeps design notes): - Add a canonical Status & Progress tracker (status.md): milestones, per-box live state, and a dated progress log — the go-forward place to track progress. - Add setup guides: GNOME/Mutter host (gnome-box — Secure Boot MOK enroll, the libnvidia-gl EGL fix, autologin, screen-lock disable, appliance unit), headless KDE box, and Bazzite host (ujust input group, gamescope session, gotchas). - Roadmap is now canonical in docs-site (synced the skew-handshake section 12 update); removed the repo docs/roadmap.md copy and repointed README to docs-site. - Nav (meta.json) + landing cards updated; site builds (bun run build). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -27,7 +27,9 @@ per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded wit
|
||||
plane (p50 ~0.8 ms capture→reassembled at 720p120), with a SPAKE2 PIN pairing ceremony. Both
|
||||
run from **one process** (`serve --native`), managed through a REST API + web console. Builds
|
||||
against FFmpeg 7 or 8; deployed live on Bazzite. Full status: [`CLAUDE.md`](CLAUDE.md);
|
||||
roadmap: [`docs/roadmap.md`](docs/roadmap.md).
|
||||
roadmap, setup guides & progress: the docs site ([`docs-site/`](docs-site) — Fumadocs;
|
||||
`bun run dev`), with the canonical [roadmap](docs-site/content/docs/roadmap.md) and
|
||||
[status](docs-site/content/docs/status.md) there. Design notes stay in [`docs/`](docs).
|
||||
|
||||
## Layout
|
||||
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
---
|
||||
title: "Bazzite / SteamOS-like Host"
|
||||
description: "Run punktfunk on Bazzite as a headless Steam host — gamescope session at the client's mode, input perms, and the gotchas."
|
||||
---
|
||||
|
||||
Running punktfunk on **Bazzite** (Fedora Atomic / SteamOS-like) as a headless game-streaming host.
|
||||
The host launches a **gamescope** session at the *client's* exact resolution + refresh, so games see
|
||||
the client mode, not the box's TV. Full packaging (COPR / RPM / bootc) is in
|
||||
[`packaging/bazzite/README.md`](https://github.com); this page is the operational quick-reference.
|
||||
|
||||
## Input permissions — use the ujust command
|
||||
|
||||
Gamepad + DualSense injection needs the user in the `input` group. On Bazzite you **can't** just
|
||||
`usermod -aG input` (the immutable base + how the group is managed) — use the provided recipe:
|
||||
|
||||
```sh
|
||||
ujust add-user-to-input-group
|
||||
```
|
||||
|
||||
The udev rule (`60-punktfunk.rules`) grants access to `/dev/uinput` and `/dev/uhid`. A DualSense that
|
||||
shows "detected but no input" is almost always this **host-side** `/dev/uhid`/`/dev/uinput`
|
||||
permission, not a client bug — confirm the input group + the udev rule, then re-login.
|
||||
|
||||
## Headless Steam session at the client's mode
|
||||
|
||||
The host owns a `gamescope-session-plus` session and relaunches it when the client mode changes, so
|
||||
games run at the client's resolution + refresh (`--nested-refresh` + a generated CVT mode). Requires
|
||||
the headless-appliance prerequisites (linger + `multi-user.target`) and **no** physical gaming
|
||||
session running. Configure via `host.env`:
|
||||
|
||||
```sh
|
||||
PUNKTFUNK_COMPOSITOR=gamescope
|
||||
PUNKTFUNK_GAMESCOPE_SESSION=steam # host owns the session at the client mode
|
||||
PUNKTFUNK_INPUT_BACKEND=gamescope
|
||||
```
|
||||
|
||||
## Gotchas
|
||||
|
||||
- **gamescope ≥ 3.16.22 required.** Older versions deadlock capturing on PipeWire ≥ 1.6 (a loop-lock
|
||||
bug), and a wedged capture link head-blocks the whole PipeWire daemon. Never `pkill -x gamescope-wl`
|
||||
on a box where it's the live session compositor — it kills the user's session.
|
||||
- **The hardware cursor isn't in the capture** (gamescope limitation; won't-fix for now).
|
||||
- **HDR is blocked upstream**: gamescope's capture node is 8-bit only (PipeWire HDR export
|
||||
unimplemented), so HDR/10-bit is deferred even though the downstream encode path is ready.
|
||||
|
||||
## FFmpeg
|
||||
|
||||
Bazzite ships FFmpeg 7.x / libavcodec 61 — `ffmpeg-sys-next` auto-detects it and the host builds
|
||||
against it (validated live). NVENC (`hevc_nvenc` / `av1_nvenc`) works through the system FFmpeg.
|
||||
@@ -0,0 +1,99 @@
|
||||
---
|
||||
title: "GNOME / Mutter Host Setup"
|
||||
description: "Bring up an Ubuntu GNOME desktop as a headless punktfunk appliance — the physical-NVIDIA gotchas, autologin, and the Mutter virtual-output path."
|
||||
---
|
||||
|
||||
How to bring up an **Ubuntu GNOME** box as a punktfunk host using the **Mutter** backend (per-client
|
||||
virtual output via the `RecordVirtual` D-Bus API, full zero-copy). Validated live on home-worker-3
|
||||
(Ubuntu 26.04, RTX 4090, GNOME Shell 50). Two gotchas here that a QEMU VM never hits — a **physical**
|
||||
NVIDIA box has Secure Boot and needs the GLVND EGL vendor — so they're called out explicitly.
|
||||
|
||||
## 1. NVIDIA driver under Secure Boot
|
||||
|
||||
Install the driver (`ubuntu-drivers` recommends the right `-open` build):
|
||||
|
||||
```sh
|
||||
sudo apt-get install -y nvidia-driver-595-open # match what `ubuntu-drivers devices` recommends
|
||||
```
|
||||
|
||||
On a physical box with **Secure Boot enabled**, the DKMS module is signed with a local MOK that
|
||||
isn't enrolled, so `modprobe nvidia` fails with **`Key was rejected by service`** and `nvidia-smi`
|
||||
reports it can't talk to the driver. Enroll the MOK (no BIOS change, survives kernel/driver updates):
|
||||
|
||||
```sh
|
||||
sudo mokutil --import /var/lib/shim-signed/mok/MOK.der # set a throwaway one-time password
|
||||
sudo reboot
|
||||
# At the blue "Shim UEFI key management" screen on boot: Enroll MOK -> Continue -> Yes -> <password>.
|
||||
# Needs console access (the screen is firmware-level, not reachable over SSH).
|
||||
```
|
||||
|
||||
After reboot, `nvidia-smi` should show the GPU. (Alternatively, disable Secure Boot in firmware.)
|
||||
|
||||
## 2. GNOME Wayland needs the GLVND NVIDIA EGL vendor
|
||||
|
||||
If gnome-shell logs **`GPU /dev/dri/cardN ... not supported by EGL`** / `No EGL display` and
|
||||
`libEGL warning: ... driver (null)`, GLVND has no NVIDIA EGL vendor and falls back to Mesa for the
|
||||
NVIDIA card. The missing file is `/usr/share/glvnd/egl_vendor.d/10_nvidia.json`, shipped by
|
||||
`libnvidia-gl-<DRV>` — which `nvidia-driver-NNN-open` does **not** always pull in:
|
||||
|
||||
```sh
|
||||
sudo apt-get install -y libnvidia-gl-595 # provides 10_nvidia.json
|
||||
```
|
||||
|
||||
(The EGL external-platform JSONs `10_nvidia_wayland` / `15_nvidia_gbm` are usually already present.)
|
||||
`nvidia-drm modeset=1` must also be set (it's in `/etc/modprobe.d/nvidia-graphics-drivers-kms.conf`
|
||||
on a normal install) for Wayland on NVIDIA.
|
||||
|
||||
## 3. A GNOME Wayland session for Mutter to attach to
|
||||
|
||||
The host attaches to a running GNOME **Wayland** session (`wayland-0`). On a headless box, autologin
|
||||
provides it (a connected monitor also lets the session boot; a truly headless box would need
|
||||
`gnome-shell --headless --virtual-monitor`). Enable GDM autologin:
|
||||
|
||||
```ini
|
||||
# /etc/gdm3/custom.conf
|
||||
[daemon]
|
||||
AutomaticLoginEnable = true
|
||||
AutomaticLogin = <your-user>
|
||||
```
|
||||
|
||||
### Disable the screen lock (important for a headless appliance)
|
||||
|
||||
A **locked** GNOME session makes Mutter reject RemoteDesktop/ScreenCast with
|
||||
`RemoteDesktop.CreateSession: ... Session creation inhibited` — so capture fails after the session
|
||||
auto-locks on idle. There's no human to unlock a headless box, so disable it (in the user session):
|
||||
|
||||
```sh
|
||||
gsettings set org.gnome.desktop.screensaver lock-enabled false
|
||||
gsettings set org.gnome.desktop.screensaver idle-activation-enabled false
|
||||
gsettings set org.gnome.desktop.session idle-delay 0
|
||||
```
|
||||
|
||||
## 4. Host config + appliance unit
|
||||
|
||||
`~/.config/punktfunk/host.env` for the Mutter backend:
|
||||
|
||||
```sh
|
||||
XDG_RUNTIME_DIR=/run/user/1000
|
||||
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
|
||||
WAYLAND_DISPLAY=wayland-0
|
||||
XDG_CURRENT_DESKTOP=GNOME
|
||||
PUNKTFUNK_COMPOSITOR=mutter
|
||||
PUNKTFUNK_VIDEO_SOURCE=virtual
|
||||
PUNKTFUNK_ZEROCOPY=1
|
||||
PUNKTFUNK_INPUT_BACKEND=libei
|
||||
```
|
||||
|
||||
Run it as a persistent appliance — the standard user unit + linger (no kde-session unit needed here,
|
||||
autologin provides the session):
|
||||
|
||||
```sh
|
||||
mkdir -p ~/.config/systemd/user
|
||||
cp scripts/punktfunk-host.service ~/.config/systemd/user/
|
||||
systemctl --user daemon-reload && systemctl --user enable --now punktfunk-host
|
||||
sudo loginctl enable-linger "$USER" # start the user unit at boot without an interactive login
|
||||
```
|
||||
|
||||
`serve --native` then listens on GameStream + native QUIC (9777), creates a per-client Mutter virtual
|
||||
output at the client's exact mode, and streams it zero-copy. Confirm it's up:
|
||||
`punktfunk-client-rs --discover` from another box should list it.
|
||||
@@ -0,0 +1,62 @@
|
||||
---
|
||||
title: "Headless KDE Box Setup"
|
||||
description: "Run punktfunk on a headless box with a nested KWin/Plasma session — the boot-appliance pattern."
|
||||
---
|
||||
|
||||
How to run a punktfunk host on a **headless** box (no physical display / KMS scanout) using the
|
||||
**KWin** backend: a nested headless Plasma session on `WAYLAND_DISPLAY=wayland-kde`, captured into a
|
||||
per-client virtual output. This is the dev-box pattern (a QEMU VM with a passthrough NVIDIA GPU, no
|
||||
KMS scanout → everything renders offscreen via `renderD128`).
|
||||
|
||||
## Requirements
|
||||
|
||||
- **KWin ≥ 6.5.6** (headless `--virtual` gained `createVirtualOutput`), or a DRM backend. On a box
|
||||
with no KMS scanout, `kwin --drm` is impossible — use the headless/virtual path below.
|
||||
- NVIDIA driver with GL/EGL userspace (see [Linux Host Setup](/docs/linux-setup) for the build deps).
|
||||
|
||||
## Bring up the session
|
||||
|
||||
The headless Plasma session is launched by [`scripts/headless/run-headless-kde.sh`](https://github.com),
|
||||
which starts `kwin --virtual` on `wayland-kde` plus the full Plasma desktop (portals, polkit agent, a
|
||||
supervised plasmashell). It sets the env Plasma needs — notably `XDG_MENU_PREFIX=plasma-`, without
|
||||
which plasmashell runs but the launcher menu is empty:
|
||||
|
||||
```sh
|
||||
# shell 1 — the compositor session
|
||||
bash scripts/headless/run-headless-kde.sh 1920x1080
|
||||
|
||||
# shell 2 — the host
|
||||
WAYLAND_DISPLAY=wayland-kde XDG_CURRENT_DESKTOP=KDE PUNKTFUNK_VIDEO_SOURCE=virtual \
|
||||
PUNKTFUNK_ZEROCOPY=1 cargo run -rp punktfunk-host -- serve --native
|
||||
```
|
||||
|
||||
## Boot appliance (no login, comes up at boot)
|
||||
|
||||
Two user systemd units bring the whole thing up at boot with no interaction:
|
||||
|
||||
```sh
|
||||
cp scripts/punktfunk-kde-session.service scripts/punktfunk-host.service ~/.config/systemd/user/
|
||||
cp scripts/host.env.example ~/.config/punktfunk/host.env # edit for the kwin backend
|
||||
systemctl --user daemon-reload
|
||||
systemctl --user enable punktfunk-kde-session punktfunk-host
|
||||
sudo loginctl enable-linger "$USER" # start user units at boot WITHOUT a login
|
||||
reboot
|
||||
```
|
||||
|
||||
`punktfunk-kde-session.service` runs the headless KWin/Plasma session; `punktfunk-host.service`
|
||||
(`serve --native`) `After=`s it and starts listening immediately (it only touches the compositor
|
||||
per session, so the ordering is soft). `host.env` for this backend:
|
||||
|
||||
```sh
|
||||
WAYLAND_DISPLAY=wayland-kde
|
||||
XDG_CURRENT_DESKTOP=KDE
|
||||
PUNKTFUNK_COMPOSITOR=kwin
|
||||
PUNKTFUNK_VIDEO_SOURCE=virtual
|
||||
PUNKTFUNK_ZEROCOPY=1
|
||||
```
|
||||
|
||||
## Other backends
|
||||
|
||||
The same box can stream a **nested app** (no desktop) via the gamescope backend, or attach to GNOME
|
||||
([GNOME Box Setup](/docs/gnome-box)) or Sway/wlroots. Each compositor keeps its own
|
||||
`VirtualDisplay` backend — there's no cross-compositor protocol for client-sized outputs.
|
||||
@@ -14,8 +14,8 @@ AES-GCM).
|
||||
## Start here
|
||||
|
||||
<Cards>
|
||||
<Card title="Project Overview" href="/docs/overview" description="Status, architecture, and milestones at a glance." />
|
||||
<Card title="Status & Progress" href="/docs/status" description="Where the work stands, what's live on each box, and a dated progress log." />
|
||||
<Card title="Implementation Plan" href="/docs/implementation-plan" description="The full design: protocol core, milestones, and architecture." />
|
||||
<Card title="Roadmap" href="/docs/roadmap" description="Decided next goals and the longer-term bets." />
|
||||
<Card title="Linux Host Setup" href="/docs/linux-setup" description="Bring up the build environment on an NVIDIA-GPU Ubuntu VM." />
|
||||
<Card title="Host Setup" href="/docs/linux-setup" description="Build env + bring-up: Linux, headless KDE, GNOME/Mutter, Bazzite." />
|
||||
</Cards>
|
||||
|
||||
@@ -3,11 +3,15 @@
|
||||
"pages": [
|
||||
"index",
|
||||
"overview",
|
||||
"status",
|
||||
"implementation-plan",
|
||||
"roadmap",
|
||||
"m2-plan",
|
||||
"---Setup---",
|
||||
"linux-setup",
|
||||
"headless-box",
|
||||
"gnome-box",
|
||||
"bazzite",
|
||||
"windows-host",
|
||||
"dualsense-haptics"
|
||||
]
|
||||
|
||||
@@ -299,15 +299,24 @@ buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
|
||||
`sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @60–120 fps) and is the
|
||||
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
|
||||
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
|
||||
- **Done & live (skew handshake landed 2026-06-12):** **wall-clock skew handshake** — `ClockProbe`/
|
||||
`ClockEcho` on the control stream (8 NTP-style rounds right after `Start`; min-RTT sample →
|
||||
host−client offset; `clock_offset_ns`). The client adds the offset to its receive instant before
|
||||
differencing against the AU `pts_ns`, so the `capture→reassembled` percentiles are now valid
|
||||
**across machines** (reported `skew_corrected=true`), not just same-host. Back-compat: an old host
|
||||
that doesn't answer times out → `skew_corrected=false` (shared-clock assumption, as before).
|
||||
Validated cross-LAN (GNOME box → dev box): offset ≈ −1.57 ms (reproducible), rtt ~140 µs, **p50
|
||||
1.30 ms** skew-corrected capture→reassembled. **Remaining for true glass-to-glass**: the **client
|
||||
present-stamp** (decode→present term) — only the Apple client presents today, so it needs the
|
||||
connector to expose the offset + an Apple present-time probe; and the **render→capture** term
|
||||
(PipeWire buffer presentation timestamp vs our capture stamp). `tools/latency-probe` is still the
|
||||
cross-machine orchestrator.
|
||||
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
|
||||
1. **Wall-clock skew handshake + glass-to-glass probe** (`tools/latency-probe`) — measures the two
|
||||
biggest unmeasured terms (render→capture, decode→present); client present-stamp vs the AU's
|
||||
`pts_ns` (already attached).
|
||||
2. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
|
||||
1. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
|
||||
copy) — ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
|
||||
3. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
|
||||
2. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
|
||||
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
|
||||
4. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
|
||||
3. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
|
||||
encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
|
||||
thread split, only after measurement justifies it.
|
||||
|
||||
|
||||
@@ -0,0 +1,54 @@
|
||||
---
|
||||
title: "Status & Progress"
|
||||
description: "Where the work stands, what's live on each box, and a running progress log."
|
||||
---
|
||||
|
||||
The living progress tracker. Milestone-level status lives in [`CLAUDE.md`](https://github.com)
|
||||
and the design in the [Implementation Plan](/docs/implementation-plan); this page is the
|
||||
**current state + a dated log** of what landed, kept up to date as work happens. Newest first.
|
||||
|
||||
## Milestones at a glance
|
||||
|
||||
| Milestone | State |
|
||||
|---|---|
|
||||
| **M1** — `punktfunk-core` + C ABI (protocol · FEC · crypto) | ✅ complete & hardened |
|
||||
| **M2** — GameStream host (Moonlight-compatible) | ✅ working end-to-end; HDR/surround-audio polish open |
|
||||
| **M3** — `punktfunk/1` native protocol (QUIC control + UDP data) | ✅ full session planes, validated live |
|
||||
| **M4** — native client decode + present (Apple first) | 🟡 stage 1 done (first light); stage-2 presenter next |
|
||||
|
||||
## Live on the boxes
|
||||
|
||||
| Box | Role | Compositor | Notes |
|
||||
|---|---|---|---|
|
||||
| **home-worker-2** (dev) | KDE/KWin appliance | kwin (headless Plasma) | QEMU VM, passthrough RTX 5070 Ti; `serve --native` user unit |
|
||||
| **home-worker-3** (GNOME) | GNOME/Mutter appliance | mutter (RecordVirtual) | RTX 4090; autologin GNOME Wayland; `serve --native` user unit. See [GNOME Box Setup](/docs/gnome-box) |
|
||||
| **home-bazzite-1** | SteamOS-like host | gamescope | host-managed Steam session at client mode. See [Bazzite Setup](/docs/bazzite) |
|
||||
|
||||
All three appliances advertise over mDNS (`_punktfunk._udp`) and require PIN pairing by default.
|
||||
|
||||
## Progress log
|
||||
|
||||
### 2026-06-12
|
||||
- **Wall-clock skew handshake** (`ClockProbe`/`ClockEcho`, 8 NTP rounds after `Start`) — makes the
|
||||
client's capture→reassembled latency valid **cross-machine**. Validated GNOME box → dev box:
|
||||
offset −1.57 ms removed, **p50 1.30 ms** skew-corrected. (`05bc9ab`)
|
||||
- **Native LAN auto-discovery** — host advertises `_punktfunk._udp` (TXT: fingerprint, pairing,
|
||||
proto); `punktfunk-client-rs --discover` lists hosts. Validated cross-LAN. (`4fff464`)
|
||||
- **Third test box stood up** — home-worker-3 (Ubuntu 26.04, RTX 4090, GNOME 50): first GNOME/Mutter
|
||||
zero-copy streaming on a real desktop; **1 Gbps probe clean** (625 MB/5 s, `send_dropped=0`).
|
||||
Two physical-NVIDIA gotchas documented in [GNOME Box Setup](/docs/gnome-box).
|
||||
- **Encode|send thread split** validated on real NIC (`send_dropped=0` at 720p60 / 1080p120). (`b295a5b`)
|
||||
|
||||
### Earlier (see roadmap + git log)
|
||||
- **1 Gbps data plane**: batched `sendmmsg`/`recvmmsg` + microburst-cap paced send thread.
|
||||
- **Boot appliance**: headless KDE session + host systemd units (no login).
|
||||
- **Speed test + settable bitrate**: negotiation + bandwidth probe (host side).
|
||||
- **DualSense** UHID + haptics; gamepads live; mic uplink; AV1 + surround (unit/live-capture tested).
|
||||
|
||||
## In flight / next
|
||||
|
||||
See the [Roadmap](/docs/roadmap) for the ordered list. Near-term:
|
||||
- **True glass-to-glass**: Apple client present-stamp (decode→present) + host render→capture term.
|
||||
- **Apple stage-2 presenter** (`VTDecompressionSession` → `CAMetalLayer`).
|
||||
- **Mandatory PIN pairing + delegated pairing approval**; concurrent sessions.
|
||||
- **bazzite** kept up to date (currently offline; one rebuild behind).
|
||||
-337
@@ -1,337 +0,0 @@
|
||||
# punktfunk roadmap — next goals
|
||||
|
||||
Decided 2026-06-10 (research-grounded; see commit history), extended since.
|
||||
|
||||
**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
|
||||
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
|
||||
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
|
||||
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
|
||||
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
|
||||
**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
|
||||
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
|
||||
Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
|
||||
**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
|
||||
Steam session at the **client's exact resolution + refresh** — games see it (via injected
|
||||
`--nested-refresh` + generated CVT modes, not the box's TV EDID), relaunched per-connection on a mode
|
||||
change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
|
||||
iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).
|
||||
|
||||
**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the M4
|
||||
client presenter + iOS (§6), and a Windows host (§7 — now **de-risked via SudoVDA**, no custom
|
||||
signed driver needed). **§10 HDR/10-bit is parked — blocked upstream at the compositor** (no
|
||||
gamescope/KWin PipeWire 10-bit producer yet).
|
||||
|
||||
## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
|
||||
|
||||
Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
|
||||
`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.
|
||||
|
||||
- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
|
||||
wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
|
||||
KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
|
||||
registry roundtrip); move the portal restart to *after* readiness and precede it with
|
||||
`systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
|
||||
env import — the Sway script does this, the KDE one doesn't).
|
||||
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
|
||||
(permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
|
||||
("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
|
||||
to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
|
||||
node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
|
||||
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
|
||||
probe as an `ExecStartPost` gate, `Restart=on-failure`.
|
||||
|
||||
## 2. Offer available compositors in the client ✅ *(done)*
|
||||
|
||||
Host enumerates which backends are actually available (binary present + version OK:
|
||||
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
|
||||
Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
|
||||
Picker in the Apple client + web console.
|
||||
|
||||
## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*
|
||||
|
||||
Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
|
||||
it's Fedora-atomic and the community installs Sunshine via COPR rpm-ostree — the analog.
|
||||
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
|
||||
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
|
||||
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
|
||||
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.
|
||||
|
||||
- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
|
||||
`/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
|
||||
NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
|
||||
Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
|
||||
Bazzite's already-present **gamescope** (minimal session plumbing).
|
||||
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
|
||||
ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
|
||||
- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
|
||||
zero-copy NVENC/dmabuf/uinput).
|
||||
|
||||
## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*
|
||||
|
||||
The exact mirror of the host→client desktop-audio path. A PipeWire virtual source apps can
|
||||
select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
|
||||
|
||||
- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
|
||||
`punktfunk_send_audio`.
|
||||
- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
|
||||
jitter buffer (silence-fill underrun, Opus PLC).
|
||||
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
|
||||
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
|
||||
surface). punktfunk/1-only.
|
||||
|
||||
## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*
|
||||
|
||||
- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
|
||||
InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
|
||||
`inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
|
||||
injects `ei_touchscreen` down/motion/up; `punktfunk-client-rs --touch-test` drags a finger.
|
||||
**Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
|
||||
server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
|
||||
(now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
|
||||
(gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
|
||||
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
|
||||
`/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (vendor 054C/0CE6, the 232-byte
|
||||
inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
|
||||
GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
|
||||
kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
|
||||
from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
|
||||
trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
|
||||
(touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
|
||||
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
|
||||
selects a per-session `DualSenseManager` (the `PadBackend` enum in `m3.rs`): client gamepad frames
|
||||
build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
|
||||
(lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
|
||||
(so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
|
||||
(`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
|
||||
`punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
|
||||
(← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `m3-host` +
|
||||
`punktfunk-client-rs --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
|
||||
12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
|
||||
(600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
|
||||
DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
|
||||
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
|
||||
Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
|
||||
the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
|
||||
rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
|
||||
(only ~5–10 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
|
||||
don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-
|
||||
based, no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
|
||||
path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
|
||||
|
||||
## 6. iOS/iPadOS → tvOS *(deferred)*
|
||||
|
||||
PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin
|
||||
+ touch capture (#5) + UI. tvOS later.
|
||||
|
||||
## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)*
|
||||
|
||||
Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/
|
||||
crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic
|
||||
and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits:
|
||||
capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK
|
||||
with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic).
|
||||
|
||||
**The old blocker is gone.** Rather than author + sign our own kernel IDD for the per-client virtual
|
||||
display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect
|
||||
Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay`
|
||||
backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a
|
||||
kernel driver"* (XL). That removes the only hard blocker — the Windows host is now a medium,
|
||||
mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the
|
||||
stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because
|
||||
it's unbuildable on the Linux dev box; the trait boundaries are already in the right places.
|
||||
|
||||
## 8. Pairing & trust hardening *(next)*
|
||||
|
||||
The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
|
||||
client) is built and live. Two changes harden it from "works" to "secure by default":
|
||||
|
||||
- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
|
||||
pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
|
||||
is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
|
||||
Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
|
||||
Deployed to the dev box + Bazzite.
|
||||
- **Delegated pairing approval** *(next — the ergonomic enabler for "mandatory": pair a device
|
||||
without fetching the host PIN out of band).* Target flow:
|
||||
1. Device A is already paired (authenticated) to Host X.
|
||||
2. The user tries to connect Device B to Host X.
|
||||
3. Host X surfaces a request: *"Allow Device B to pair with Host X?"*
|
||||
4. The user approves/denies; on approve, Host X admits Device B — binding B's certificate
|
||||
fingerprint — with no PIN typed.
|
||||
|
||||
Two buildable layers:
|
||||
- **§8b-1 (host + web — achievable now):** an unpaired B that connects to an approval-enabled host
|
||||
is held as a **pending request** `{id, name, fingerprint, requested_at}` in `NativePairing`
|
||||
instead of a flat reject; mgmt gains `GET /native/pending` + `POST /native/pending/{id}/{approve,
|
||||
deny}`; the web console lists pending requests with Approve/Deny. The **operator approves from
|
||||
the console** — delegated approval via the management surface.
|
||||
- **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
|
||||
**Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
|
||||
replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
|
||||
is a client-agent task.
|
||||
|
||||
PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
|
||||
|
||||
## 9. Client→host network speed test + settable bitrate *(host side done — client UI remaining)*
|
||||
|
||||
Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
|
||||
value) instead of guesswork that ends in a stuttering stream.
|
||||
|
||||
**Done & live (host + protocol + connector + C ABI, `74819b1`):**
|
||||
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
|
||||
client requests a rate; the host clamps to [500 kbps, 500 Mbps] (or its 20 Mbps default on 0),
|
||||
applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
|
||||
and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
|
||||
`punktfunk_connection_bitrate()`.
|
||||
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
|
||||
`ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
|
||||
zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 1 Gbps / ≤ 5 s,
|
||||
video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
|
||||
and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
|
||||
`PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
|
||||
Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
|
||||
interleaved probe AUs excluded from frame verification. `punktfunk-client-rs` gains `--bitrate` +
|
||||
`--speed-test KBPS:MS` as the reference/loopback driver.
|
||||
|
||||
**Remaining (client UI):** wire the C ABI into the Apple client — a "Test network" action
|
||||
(`speed_test` → poll `probe_result` → "~XXX Mbps · recommended bitrate YYY") feeding a bitrate
|
||||
control (`connect_ex3`), and surface both in the web console.
|
||||
|
||||
## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*
|
||||
|
||||
Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
|
||||
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
|
||||
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
|
||||
build the moment a producer lands.
|
||||
|
||||
- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
|
||||
(`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/
|
||||
`NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
|
||||
AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
|
||||
HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
|
||||
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
|
||||
Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
|
||||
F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
|
||||
wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
|
||||
(for its own sake), not for HDR.
|
||||
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
|
||||
PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
|
||||
Steam-Deck-UI polish + the dynamic-resolution work (§ above). Track #2126 and !8293.
|
||||
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
|
||||
Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
|
||||
(BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
|
||||
the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
|
||||
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
|
||||
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
|
||||
path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
|
||||
VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
|
||||
|
||||
## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
|
||||
|
||||
Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
|
||||
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
|
||||
|
||||
**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
|
||||
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
|
||||
only ~434–874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
|
||||
(RustCrypto auto AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **M1
|
||||
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
|
||||
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
|
||||
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
|
||||
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
|
||||
|
||||
- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
|
||||
clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
|
||||
headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
|
||||
so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
|
||||
send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
|
||||
C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
|
||||
completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
|
||||
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
|
||||
per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
|
||||
bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
|
||||
(64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
|
||||
`sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
|
||||
`recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
|
||||
underdrain). ~64× fewer syscalls.
|
||||
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
|
||||
reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
|
||||
blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
|
||||
`shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
|
||||
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
|
||||
never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
|
||||
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
|
||||
`punktfunk-client-rs --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
|
||||
`packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
|
||||
`rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
|
||||
|
||||
## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*
|
||||
|
||||
A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
|
||||
is only the same-host **capture-stamp→reassembled** slice (~30–40% of true glass-to-glass) and was
|
||||
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
|
||||
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
|
||||
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
|
||||
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
|
||||
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
|
||||
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
|
||||
|
||||
**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
|
||||
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
|
||||
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
|
||||
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
|
||||
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
|
||||
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
|
||||
|
||||
- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
|
||||
`PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
|
||||
(IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
|
||||
common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
|
||||
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
|
||||
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
|
||||
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
|
||||
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
|
||||
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
|
||||
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
|
||||
`sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @60–120 fps) and is the
|
||||
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
|
||||
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
|
||||
- **Done & live (skew handshake landed 2026-06-12):** **wall-clock skew handshake** — `ClockProbe`/
|
||||
`ClockEcho` on the control stream (8 NTP-style rounds right after `Start`; min-RTT sample →
|
||||
host−client offset; `clock_offset_ns`). The client adds the offset to its receive instant before
|
||||
differencing against the AU `pts_ns`, so the `capture→reassembled` percentiles are now valid
|
||||
**across machines** (reported `skew_corrected=true`), not just same-host. Back-compat: an old host
|
||||
that doesn't answer times out → `skew_corrected=false` (shared-clock assumption, as before).
|
||||
**Remaining for true glass-to-glass**: the **client present-stamp** (decode→present term) — only
|
||||
the Apple client presents today, so it needs the connector to expose the offset + an Apple
|
||||
present-time probe; and the **render→capture** term (compare the PipeWire buffer presentation
|
||||
timestamp to our capture stamp). `tools/latency-probe` is still the cross-machine orchestrator.
|
||||
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
|
||||
1. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
|
||||
copy) — ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
|
||||
2. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
|
||||
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
|
||||
3. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
|
||||
encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
|
||||
thread split, only after measurement justifies it.
|
||||
|
||||
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
|
||||
|
||||
The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
|
||||
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
|
||||
(`serve --native`) and standalone `m3-host` advertise the native service over mDNS:
|
||||
|
||||
- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
|
||||
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
|
||||
both host entry points get it; best-effort (a discovery failure never blocks streaming —
|
||||
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
|
||||
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
|
||||
over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
|
||||
(so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
|
||||
- **Client**: `punktfunk-client-rs --discover [SECS]` browses and prints each host (name, addr:port,
|
||||
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
|
||||
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
|
||||
- **Validated**: cross-LAN — dev box discovered the GNOME-box appliance
|
||||
(`home-worker-3 192.168.1.248:9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
|
||||
(`pair=optional`); fingerprint + pairing state correct in both.
|
||||
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
|
||||
host-side contract above is all it needs.
|
||||
Reference in New Issue
Block a user