docs(site): add Windows host install, restructure nav, new public roadmap
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 2m24s
ci / web (push) Successful in 35s
android / android (push) Successful in 3m27s
ci / docs-site (push) Successful in 31s
ci / bench (push) Successful in 4m43s
deb / build-publish (push) Successful in 4m49s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 24s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m28s
docker / deploy-docs (push) Successful in 17s

- install (host): add a Windows (NVIDIA) section with signed-installer and
  certificate-trust steps; note the .cer is the same across releases.
- install-client: clarify the Windows MSIX certificate is the same every
  release (trust once, updates need nothing).
- Move "Project & Internals" out of the public docs site: relocate
  implementation-plan, apple-stage2-presenter, gamescope-multiuser,
  dualsense-haptics, ci, and gamestream-host-plan to docs/; drop them from
  the nav. Move windows-host into Host Setup.
- Rewrite roadmap as a lean public page with an at-a-glance grid and
  current statuses (Windows host shipped/beta, Apple incl. tvOS shipped,
  Android shipped, concurrent sessions + delegated pairing done).
- Fix status.md link to the now-internal implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-20 19:52:23 +02:00
parent 76f4484ded
commit 9f049f965f
14 changed files with 123 additions and 913 deletions
@@ -1,126 +0,0 @@
---
title: "Apple Stage-2 Presenter (handoff)"
description: "Implementation plan for the explicit VTDecompressionSession → CAMetalLayer presenter — hand-paced present + true decode→present (glass-to-glass) measurement. Written so a Mac agent can pick it up."
---
> **Status update:** the stage-2 presenter described here has since been **built and live-validated**,
> shipping behind an opt-in flag (`AVSampleBufferDisplayLayer` remains the default known-good path).
> This page is preserved as the implementation/handoff record for that work.
The implementation plan for the **stage-2 Apple presenter**. The **stage-1** presenter feeds
compressed HEVC straight into `AVSampleBufferDisplayLayer`, which hardware-decodes **and presents
internally with no per-frame callback** — so we can't stamp decode or present, and we can't hand-pace.
Stage-2 takes explicit control: decode with `VTDecompressionSession`, present decoded frames through a
`CAMetalLayer` driven by a display link. Two wins: **~0.5 refresh off the present tail** (the biggest
client latency term at 60 Hz) and **true decode→present / glass-to-glass** numbers.
All of this is **macOS/iOS/tvOS-only** — build + validate on a Mac (`swift build && swift test`, then
live against a Linux host). The host + connector side is already done: `PunktfunkConnection.clockOffsetNs`
(the connect-time skew offset, host minus client) is what makes the present timestamp cross-machine
valid. See [Status](/docs/status) and roadmap §12.
## Where it plugs into the existing code
| Existing (stage-1) | Stage-2 change |
|---|---|
| `StreamPump` pulls AUs → `AnnexB.sampleBuffer``layer.enqueue` (compressed) | A `Stage2Pump` (or a mode flag on `StreamPump`) feeds AUs to `VTDecompressionSessionDecodeFrame` instead |
| `StreamView`/`StreamViewIOS` host an `AVSampleBufferDisplayLayer` | Host a `CAMetalLayer` (+ a display link); keep the input-capture + HUD overlay unchanged |
| `AnnexB.formatDescription(fromIDR:)` builds the format desc, refreshed on every IDR | **Reused** — it's the `VTDecompressionSession`'s format description; recreate the session when it changes |
| `LatencyMeter` records capture→client-receipt at `onFrame` | Extend to record **decode-completion** and **present** stages (below) |
Keep stage-1 behind a `UserDefaults` flag (e.g. `punktfunk.presenter = "stage1" | "stage2"`) so a
regression can fall back — `AVSampleBufferDisplayLayer` is the known-good path.
## Decode: VTDecompressionSession
1. Create the session from the IDR's `CMVideoFormatDescription`
(`AnnexB.formatDescription(fromIDR:)`):
```
VTDecompressionSessionCreate(
allocator: nil,
formatDescription: fmt,
decoderSpecification: nil, // hardware by default; no need to force
imageBufferAttributes: [
kCVPixelBufferMetalCompatibilityKey: true,
kCVPixelBufferPixelFormatTypeKey:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, // 8-bit SDR; 10-bit (…10BiPlanar) for HDR later
],
outputCallback: <C-callback>,
decompressionSessionOut: &session)
```
2. Per AU: build the same `CMSampleBuffer` as stage-1 (`AnnexB.sampleBuffer(au:format:)`, PTS =
`au.ptsNs` @ 1e9 timescale) and submit:
```
VTDecompressionSessionDecodeFrame(session, sampleBuffer,
flags: ._EnableAsynchronousDecompression,
frameRefcon: <pts or a boxed context>, infoFlagsOut: nil)
```
3. The **output callback** delivers `(status, infoFlags, imageBuffer: CVImageBuffer?, presentationTimeStamp, …)`.
`presentationTimeStamp` is `au.ptsNs` (the host capture clock). **Stamp decode-completion here**
(`CLOCK_REALTIME` ns), retain the `CVPixelBuffer`, and push `{pts, pixelBuffer, decodedNs}` into a
small NSLock-guarded ring (the "ready" queue) the display link drains.
4. **IDR / mode change**: when `AnnexB.formatDescription` yields a new desc, check
`VTDecompressionSessionCanAcceptFormatDescription`; if not, finish-and-recreate the session (same
trigger stage-1 uses to refresh `format`). On decoder error (`kVTVideoDecoderBadDataErr`, etc.) drop
to the next IDR — there's no out-of-band extradata; recovery keyframes re-carry the parameter sets.
## Present: CAMetalLayer + display link
- `CAMetalLayer` (device = system default, `pixelFormat = .bgra8Unorm`, `framebufferOnly = true`,
`drawableSize` = stream WxH). The view: macOS `NSView`/iOS `UIView` whose `layerClass`/backing layer
is the `CAMetalLayer` (mirror `StreamView`/`StreamViewIOS`).
- **Display link** drives present: macOS `CVDisplayLink` (or `CADisplayLink` on macOS 14+),
iOS/tvOS `CADisplayLink`. Each callback carries the **target present timestamp** (`CVTimeStamp` /
`targetTimestamp`).
- Each vsync: pop the **newest** ready frame (drop older undisplayed ones — low-latency default; no
smoothing buffer to start), render a fullscreen quad sampling the **biplanar YUV** (luma +
chroma planes via `CVMetalTextureCache`) with a BT.709 YUV→RGB fragment shader, then
`commandBuffer.present(drawable)` (or `present(drawable, atTime:)`). **Stamp present time** for the
frame just shown (use the display link's target timestamp converted to `CLOCK_REALTIME`).
- Colorspace: BT.709 8-bit for now (matches the host's SDR). HDR (BT.2020/PQ, 10-bit `…10BiPlanar` +
EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`) is a later tie-in with the HDR roadmap (§10).
### Cheaper intermediate (2a) if the Metal path is too big in one step
Decode with `VTDecompressionSession` (gets the **decode-completion timestamp** = capture→decoded),
then wrap the decoded `CVPixelBuffer` in a `CMSampleBuffer` and `enqueue` it into the existing
`AVSampleBufferDisplayLayer` (it accepts uncompressed pixel buffers too). This yields the decode term
**without** a Metal renderer — but **not** true present (the layer still presents internally). Ship 2a
first if useful; 2b (CAMetalLayer + display link) is required for the on-glass present stamp.
## Measurement (the whole point)
Extend `LatencyMeter` (or add per-stage meters) so each frame records three instants, all
`CLOCK_REALTIME` ns, all shifted by `connection.clockOffsetNs` to the host clock:
- **capture→decoded** = `decodedNs + offset pts_ns` (VideoToolbox decode latency, cross-machine)
- **decode→present** = `presentedNs decodedNs` (the present tail stage-2 shortens)
- **capture→present** = `presentedNs + offset pts_ns` — **the glass-to-glass number** (modulo the
host render→capture term, still unmeasured; see roadmap §12)
Surface `capture→present` p50/p95 in the HUD (extend the existing `model.latency*` line in
`ContentView`). `skewCorrected` stays false when `clockOffsetNs == 0` (old host) — then the numbers are
same-host-only, as today.
## Validation
- `swift test`: add a decode-output test (decode a known IDR built like
`VideoToolboxRoundTripTests` → assert a `CVPixelBuffer` of the right dimensions + the
decode callback fires). Present is display-bound — validate it **live** via the HUD number.
- Live: connect to a Linux host (`punktfunk1-host --source virtual` on the GNOME box; see
[Ubuntu — GNOME](/docs/ubuntu-gnome)), confirm `capture→present` is a few ms over `capture→client`
and that `decode→present` shrank vs. an `AVSampleBufferDisplayLayer` baseline.
- Compare against the headless reference number: `punktfunk-probe` reports skew-corrected
capture→reassembled (~1.3 ms p50 GNOME box → dev box); capture→present should be that **+ decode +
present**.
## Gotchas
- VT decode is **async**; the output callback runs on a VT-managed thread — don't block it, just stamp
+ enqueue. Retain the `CVPixelBuffer` until presented (the ring owns it).
- `VTDecompressionSessionDecodeFrame` wants the **same** `CMSampleBuffer` shape stage-1 builds (AVCC
length-prefixed NALs, in-band parameter sets in the format desc, never as extradata).
- `CAMetalLayer.drawableSize` must track mode changes (the host can `Reconfigure` mid-stream — watch
`PunktfunkConnection.mode`/the new-IDR dimensions).
- Don't add a jitter/smoothing buffer for the first cut — present newest-ready for lowest latency; a
pacing policy can come later if frames look uneven.
- Keep `clients/apple/README.md`'s "Stage 2" item + [Status](/docs/status) updated when this lands.
-119
View File
@@ -1,119 +0,0 @@
---
title: "CI & Docker"
description: "Gitea Actions setup — workflows, the dockerized pieces, and the runners."
---
CI runs on **Gitea Actions** (`git.unom.io`, org `unom`). The workflows live in
`.gitea/workflows/`; they run across Linux and macOS runners and push a few images to the
Gitea container registry.
## Workflows
| Workflow | Trigger | Runner | What it does |
|---|---|---|---|
| `ci.yml` | push to `main`, PRs | Linux | Rust workspace (fmt · clippy `-D warnings` · build · test · C-ABI harness · generated-header drift) inside the `punktfunk-rust-ci` image; `web/` and `docs-site/` build + typecheck in `oven/bun:1` |
| `docker.yml` | push to `main`, `v*` tags, manual | Linux | Builds + pushes the images below (`latest` + `sha-<short>` tags) |
| `apple.yml` | push to `main`, PRs, manual | macOS | Rust core → `PunktfunkCore.xcframework``swift build` + `swift test` in `clients/apple` |
| `release.yml` | `v*` tags, manual | macOS | Production Apple builds: sandboxed macOS `.dmg` (Developer ID, notarized, stapled) attached to the Gitea release + macOS/iOS/tvOS archives uploaded to TestFlight |
| `windows-msix.yml` | push to `main`, `v*` tags, manual | Windows | Builds the Windows client for `x86_64`/`aarch64` and packages signed MSIX artifacts |
## Dockerized pieces
The host and the native clients are intentionally **not** containerized (the host needs
the GPU/compositor stack of the box it runs on). What is:
| Image | Source | Notes |
|---|---|---|
| `git.unom.io/unom/punktfunk-web` | `web/Dockerfile` (repo-root context — orval needs `docs/api/openapi.json`) | Nitro `bun` bundle; `PORT` (3000) and `PUNKTFUNK_MGMT_URL` env at runtime |
| `git.unom.io/unom/punktfunk-docs` | `docs-site/Dockerfile` | This site; `PORT` (3000) |
| `git.unom.io/unom/punktfunk-rust-ci` | `ci/rust-ci.Dockerfile` | Ubuntu 26.04 + FFmpeg 8/PipeWire/GL/GBM dev libs + a libcuda **link stub** (driver userspace, no kernel module) + pinned rustup — the container `ci.yml`'s Rust job runs in |
Registry pushes authenticate with a repo Actions secret holding a registry token (a PAT
with `write:package`; the login username in `docker.yml` is the token owner, not the
push actor).
## Runners
- **Linux runner** — runs the Rust/web/docs jobs (as docker containers) and the image
build+push jobs.
- **macOS runner** — an Apple-silicon Mac running macOS, a **host-mode** `act_runner`
(upstream now ships it as `gitea-runner`) provisioned by
[`scripts/ci/setup-macos-runner.sh`](https://git.unom.io/unom/punktfunk/src/branch/main/scripts/ci/setup-macos-runner.sh):
rustup (+ both darwin targets for the universal xcframework), Node.js (host-mode runners
execute JS actions via `node` from PATH — nothing auto-provisions it), the runner binary
in `~/.local/bin`, state under `~/ci/act-runner/` (config, `.runner` registration,
`runner.log`), kept alive by the `io.gitea.act_runner` **root LaunchDaemon** — it cannot
be a user LaunchAgent: macOS Local Network privacy silently blocks LAN dials
("no route to host") from unbundled CLI binaries in gui/user launchd domains, while
system daemons are exempt. Needs full **Xcode** for `xcodebuild -create-xcframework`
(CLT alone only covers `swift build/test`); if `xcode-select` still points at CLT, the
script auto-detects `/Applications/Xcode*.app` and bakes a `DEVELOPER_DIR` override into
the daemon environment — no `xcode-select -s` required.
- **Windows runner** — builds and packages the native Windows client (MSIX) for the
release matrix.
Re-provisioning is idempotent — re-running `scripts/ci/setup-macos-runner.sh` on the macOS
runner with a fresh `GITEA_RUNNER_TOKEN` (org `unom` → Settings → Actions → Runners →
Create new runner) re-registers it without manual cleanup.
## Apple releases
`release.yml` produces the production client builds on the Mac runner. All three app
targets share the bundle ID **`io.unom.punktfunk`** (one App Store listing, universal
purchase — effectively unchangeable after first submission). Signing is **not** secret-based:
the runner uses its **login keychain** directly, so install the **Developer ID Application**,
**Apple Distribution**, and (for the Mac App Store `.pkg`) **3rd Party Mac Developer
Installer** identities once via Xcode, with the WWDR intermediate present so they show as
valid. The only secrets are `ASC_API_KEY_P8`/`ASC_API_KEY_ID`/`ASC_API_ISSUER_ID` (App Store
Connect API key — notarization + TestFlight upload). Per-platform state:
- **macOS (Developer ID)** — sandboxed app (`Config/Punktfunk-macOS.entitlements`) → export
`notarytool` → stapled `.dmg` on the Gitea release.
- **macOS (App Store)** — manual-signed archive (Apple Distribution + the *Punktfunk macOS
App Store Distribution* profile) → upload to TestFlight. App Sandbox is **mandatory** here
and is now declared (app-sandbox + network client/server + audio-input + bluetooth/usb).
Prereqs (one-time, Apple portal): add the **macOS platform** to the App Store Connect app
record (universal purchase), install the Mac App Store distribution profile + the installer
cert above. `continue-on-error` until those exist.
- **iOS** — archive + upload to TestFlight (`method: app-store-connect`,
`destination: upload`). Crypto is declared exempt (`ITSAppUsesNonExemptEncryption`,
`Config/Info.plist`) so builds don't stall on the compliance question.
- **tvOS** — archive + upload to TestFlight (Rust core built from tier-3 targets, nightly
`-Zbuild-std` via `build-xcframework.sh`).
Each macOS target uses its own entitlements: `Config/Punktfunk-macOS.entitlements` (App
Sandbox is macOS-only) for the macOS app, and the shared `Config/Punktfunk.entitlements`
(keychain-access-groups only) for iOS/tvOS — `com.apple.security.app-sandbox` is invalid on
iOS/tvOS and would fail upload validation.
The runner needs a **release (non-beta) Xcode** — App Store processing rejects beta-SDK
builds, and a beta is unusable for the Rust side too: a newer-than-OS ld emits dylibs the
running dyld rejects ("mis-aligned LINKEDIT string pool"), killing every proc-macro build
with a misleading `E0463 can't find crate`. `build-xcframework.sh` therefore resolves
toolchains itself: non-beta Xcode for everything; with only CLT + a beta present it
builds macOS slices against CLT (packaging via any Xcode — `-create-xcframework` does no
linking) and **refuses iOS/tvOS slices** (CLT has no iOS SDK).
## Deployment
`docker.yml`'s `deploy-docs` job ships this docs site after every image push: it syncs
`compose.production.yml` to the docs server and runs `docker compose pull && up -d` there
over SSH, driven by a small set of deploy secrets (`DEPLOY_HOST` / `DEPLOY_USER` /
`DEPLOY_PORT` / `DEPLOY_SSH_KEY`). A reverse proxy in front of that server serves the
container as <https://docs.punktfunk.unom.io>. The host and the web console are NOT
deployed — the console fronts a punktfunk host's management API on whatever box runs the
host.
## Troubleshooting
- **macOS runner offline** — check `~/ci/act-runner/runner.log` on the runner; restart with
`sudo launchctl kickstart -k system/io.gitea.act_runner`. "no route to host" in the log
means the daemon is running in a gui/user domain again — see the Local Network note
above.
- **`apple.yml` fails at the xcframework step** — Xcode missing or unselected:
`sudo xcode-select -s /Applications/Xcode.app/Contents/Developer` and accept the license
(`sudo xcodebuild -license accept`), then re-run.
- **Rust job can't pull `punktfunk-rust-ci`** — the runner host's docker daemon needs a
`docker login git.unom.io` if the org/registry isn't anonymously readable.
- **Stale builder image after toolchain/dep changes** — `docker.yml` re-pushes it on every
`main` push; a manual `workflow_dispatch` of `docker.yml` forces a rebuild.
-111
View File
@@ -1,111 +0,0 @@
---
title: "DualSense Haptics"
description: "Feasibility and scoping for audio-driven DualSense haptics."
---
**Status: scoped, NO-GO for now (deferred).** Advanced voice-coil haptics on the DualSense are
driven by the controller's **USB audio interface** (4-channel surround, the back two channels carry
the haptic waveform), *not* by HID reports. Emulating that on a Linux host and faithfully replaying
it on the Apple client both hit hard walls, and the supply of software that actually *emits* these
haptics on a Linux host is essentially zero. We defer the audio-haptics feature and instead land the
parts of "really supporting the DualSense" that *are* reachable: **adaptive triggers (HID) and
two-motor rumble.**
(Grounded in a 4-agent feasibility read — host USB-gadget viability, DualSense audio descriptors,
Linux game demand, Apple client render path — 2026-06-10.)
## The one distinction that decides everything
| Feature | How it's driven | Reachable for us? |
|---|---|---|
| Basic rumble (2 motors) | HID output report `0x02`, bytes 34 | **Yes** — already parsed; client already has `nextRumble()` |
| **Adaptive triggers** (L2/R2 resistance) | HID output report `0x02`, bytes 1122 / 2233 | **Yes** — already parsed in `dualsense.rs`; just needs the `0xCD` back-channel + client render |
| **Advanced haptics** (voice-coil actuators) | **USB *audio* interface** — 4-ch, back 2 channels = haptic PCM | **No (for now)** — see the three walls below |
The UHID DualSense we already built is **HID-only**. It cannot present the DualSense's *audio*
interface, so it structurally cannot carry advanced haptics. That's not a bug in our implementation —
it's the wrong transport for this signal.
## The three walls (any one is fatal on its own)
### Wall 1 — Host capture needs a kernel rebuild
To *capture* haptic audio a game emits, the host must present a virtual device that owns the
DualSense audio interface. The standard way is a composite USB gadget (`configfs` + `f_hid` +
`f_uac2`) bound to a software UDC (`dummy_hcd`).
- ✅ Present & enabled on this box: `CONFIG_USB_CONFIGFS`, `CONFIG_USB_CONFIGFS_F_HID`,
`CONFIG_USB_CONFIGFS_F_UAC2`, plus `libcomposite`/`usb_f_hid`/`usb_f_uac2`/`u_audio` modules.
-**Blocker:** `# CONFIG_USB_DUMMY_HCD is not set` in `/boot/config-7.0.0-22-generic`. No
`dummy_hcd.ko`, no `/sys/class/udc/`. **No UDC → nothing to bind the gadget to.** Requires a
custom kernel build to enable `CONFIG_USB_DUMMY_HCD=m`, plus root for module-load/configfs.
A lighter alternative exists — a **virtual PipeWire/ALSA sink renamed as the DualSense** (this is how
the working Linux setups capture the back-2-channels today, via WirePlumber rules). It skips the
kernel rebuild, but is gated by the same Wall 2 below, and games' audio-device detection is
hardcoded per-title so it's fragile.
### Wall 2 — Almost nothing on a Linux host emits these haptics
This is the decisive one. The *supply* that would feed our capture barely exists:
- **Steam Input (Linux):** no official advanced-haptics support (open feature request as of 2026).
- **Sony's `hid-playstation` kernel driver:** explicitly does **not** expose VCM haptics or adaptive
triggers — basic rumble only.
- **RPCS3:** treats the DualSense as a generic pad; no advanced haptics.
- **Native Linux games:** effectively **zero** with advanced haptics.
- **The only working path** is a handful of Proton titles (FF7 Remake, Ghostwire, Deathloop, Animal
Well, Stellar Blade) via ClearlyClaire's *custom Wine patches*, **USB-only**, Steam Input
disabled, forced into a 4.0-surround profile, device renamed to match Windows. ~510 games total.
- Bluetooth can't carry it on Linux *or* Windows (Sony's proprietary A2DP repurposing isn't exposed).
A host-side capture feature is only as useful as the software willing to drive it. On Linux that set
is a niche-of-a-niche.
### Wall 3 — The Apple client can't faithfully replay it
Even with a captured waveform, the primary client (macOS/iOS) can't render it well:
- macOS GameController exposes the DualSense as a **basic gamepad** — no voice-coil / adaptive-trigger
access. Those are PS5-only in Apple's stack.
- CoreHaptics is **discrete, pattern-based** (`CHHapticPattern` events, ≤30 s), **not** a PCM
streaming sink. Converting a streamed haptic waveform to patterns is lossy — it throws away exactly
the fidelity that makes voice-coil haptics worth having.
- There is **no public macOS API** to route CoreAudio to the DualSense's channels 34. Doing it
anyway means private/reverse-engineered APIs that break across OS updates.
## What we *can* ship instead ("really supporting the DualSense" minus audio haptics)
The HID DualSense we built is the foundation, and the high-value parts are within reach:
1. **Adaptive triggers — GO.** `dualsense.rs` already parses the L2/R2 trigger effects out of HID
output report `0x02`. Finishing this is the paused HID work: route them over the `0xCD`
HID-output back-channel and render on the client. This delivers the headline "DualSense feel"
(trigger resistance/weapon tension) for any source that emits it — and it's pure HID, no audio
interface, no kernel rebuild.
2. **Two-motor rumble — already done.** Parsed host-side; the Apple client already has
`nextRumble()`. Wire it to `GCDeviceHaptics`/`CHHapticEngine` as discrete patterns (API-clean,
no private APIs).
3. **LED / player-LED / touchpad / motion** — already parsed; finish the `0xCC`/`0xCD` routing.
This is the resume-able HID DualSense Phase C/D/E work — it stands on its own and was never blocked.
## Conditions for a future GO on audio haptics
Revisit if **all three** change:
- A real DualSense is available on the dev box to capture an authoritative `lsusb -v` + the exact
UAC channel/sample-rate/format layout (today: undocumented, would need reverse-engineering).
- The host target gains a UDC (custom kernel with `dummy_hcd`, or real hardware OTG) **or** we accept
the PipeWire-renamed-sink path *and* the title set that emits haptics on Linux grows beyond the
Proton-patch niche.
- The client target shifts to one that can render PCM haptics (a Linux/Windows client with direct
CoreAudio-style channel access, or a future Apple API) — or we accept lossy pattern conversion.
Until then the cost/benefit is upside-down: three hard subsystems (kernel, USB gadget, audio
routing) to serve ~510 Proton titles, rendered lossily on the one client we ship.
## Recommendation
**Defer audio-driven advanced haptics. Land adaptive triggers (HID) + rumble instead** — that's the
reachable 80% of "really supporting the DualSense," needs no kernel work, and the parsing is already
written. Keep this doc as the down payment for the audio-haptics feature whenever the three
conditions above are met.
@@ -1,73 +0,0 @@
---
title: "gamescope Multi-User Isolation (deferred)"
description: "Research + design for concurrent INDEPENDENT gamescope desktops (multi-user), and why it's deferred. The shared-desktop multi-view case already landed."
---
**Status: deferred (2026-06-12).** Concurrent sessions landed for the **shared-desktop multi-view**
case — multiple devices viewing/controlling the *same* KWin/Mutter/wlroots desktop ([Status](/docs/status)).
This page captures the research for the *other* model — **independent desktops** (each client its own
gamescope instance: the multi-user / cloud-gaming-on-one-box case) — and why it's parked. Pick this
up from here if the use case becomes a priority.
## What landed vs what this is
| Model | Backends | Input | Audio | Status |
|---|---|---|---|---|
| **Shared-desktop multi-view** | kwin / mutter / wlroots | shared (all drive one desktop) | shared (all hear one desktop) | ✅ **landed** — correct semantics: stream *your* desktop to laptop + TV at once |
| **Independent desktops (multi-user)** | gamescope | **per-session** (each drives its own game) | **per-session** | ⏸ **deferred** — this page |
For independent desktops, shared input/audio is *wrong* — each user must drive and hear only their own
session. gamescope is the natural fit: each `create()` spawns a fresh nested compositor (own
rendering, own EIS input socket). The blocker is that the host's input/audio/mic are host-lifetime
**shared** services, and the gamescope EIS socket is relayed through a single global file.
## Current architecture (the research)
Each gamescope **process is per-session** (`vdisplay/gamescope.rs::create()` spawns one; the
`VirtualOutput.keepalive` owns it). But:
- **EIS input socket — single global file.** gamescope exports `LIBEI_SOCKET` for its children; a
shell wrapper relays it to the fixed path `/tmp/punktfunk-gamescope-ei` (`EI_SOCKET_FILE`).
**Two concurrent instances overwrite each other's socket name** in that one file.
- **Injector — one host-lifetime `!Send` service.** `punktfunk1.rs::InjectorService` opens **one**
`inject::open(backend)` for the whole run and forwards events over an mpsc channel. It was made
shared deliberately (the portal `CreateSession` churn wedged KWin's EIS — "EIS setup timed out").
For gamescope it reads the one global socket file, so all sessions' input lands in whichever
instance wrote last.
- **Audio — global default-sink monitor.** `audio::open_audio_capture()` sets
`STREAM_CAPTURE_SINK` and autoconnects to the host's **default sink monitor** (PW_ID_ANY) — the
whole system's output, not a per-gamescope node. gamescope exposes **no per-instance audio node**.
- **Mic — one global `Audio/Source`.** `MicService` feeds one PipeWire source named `punktfunk-mic`;
all clients' mic uplinks mix into it.
- Per-session already (no work): the gamescope process, the PipeWire video node, and the uinput
gamepads.
## What it would take
1. **Per-instance EIS socket** — give each gamescope a unique relay file
(`/tmp/punktfunk-gamescope-{id}-ei`) and carry the path on `VirtualOutput` (new field) so the
session can find its own socket.
2. **Per-session injector** — for gamescope sessions, create a **per-session** injector bound to that
socket (its own thread, since `InputInjector` is `!Send`), instead of the shared `InjectorService`.
Keep the shared service for the portal backends (kwin/mutter) where shared input is correct.
Ordering nuance: the input thread is wired before the gamescope socket exists, so the per-session
injector must open **lazily** (on first event, by which time gamescope is up) or be created after
`build_pipeline`.
3. **Per-session audio (the bigger piece).** gamescope has no per-instance audio node, but audio
*is* isolatable: create a **per-session PipeWire null-sink**, route that gamescope's apps to it
(`PULSE_SINK` / a target node on the spawn env), and capture **that sink's monitor** per session.
This is the largest addition — null-sink create/teardown + routing + per-session capture.
4. **Per-session mic** — a virtual `Audio/Source` per session (`punktfunk-mic-{id}`), routed into
that gamescope, instead of the one global source.
## Why deferred
- It's a **large multi-file refactor** — the whole input path (per-instance sockets + per-session
injector + the lazy-open ordering), **plus** per-session null-sink audio routing, **plus** per-session
mic — for a **niche** use case (multiple independent users gaming on one box).
- The **common** concurrency case — stream one desktop to several of *your own* devices — is the
shared-desktop multi-view model, which **already landed and is the correct semantics** for it.
- No correctness gap in what shipped: concurrent sessions work today; this is purely the *additional*
independent-desktops model.
Revisit when there's a real multi-user requirement. The plumbing list above is the whole job.
@@ -1,95 +0,0 @@
---
title: "GameStream Host"
description: "Stream to a stock Moonlight client on a client-sized virtual display."
---
The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host,
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display.
Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
(distilled from Sunshine + moonlight-common-c source; cite those for byte-level detail).
## Architecture (respects the "one core" invariant)
- **punktfunk-core** gains a **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`, the
hook already exists): the exact RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard
layout, and the video/audio AES-GCM/CBC paths. Hot path, native threads, **no async**.
Kept beside punktfunk's native internal format (P2), selected by phase.
- **punktfunk-host** gains the **control plane** (tokio/axum OK — I/O-bound, not the hot path):
mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet
control stream + input injection, the virtual-display lifecycle, and Opus audio encode.
## Port map (base 47989; Moonlight derives all by offset)
| Port | Proto | Role |
|---|---|---|
| 47989 | TCP | HTTP nvhttp (unpaired: /serverinfo, /pair PIN flow) |
| 47984 | TCP | HTTPS nvhttp (paired; **client-cert pinned**) — /launch, /resume, … |
| 48010 | TCP | RTSP (OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY) |
| 47998 | UDP | Video RTP (+ RS-FEC, optional AES-GCM) |
| 47999 | UDP | Audio RTP (Opus, RS-FEC 4+2, optional **AES-CBC**) |
| 48000 | UDP | ENet control stream (AES-GCM) + remote input |
| 5353 | UDP | mDNS `_nvstream._tcp.local` advertisement |
## Key wire facts (the non-obvious ones)
- **Video datagram** = `RTP_PACKET(12, BIG-endian)` + `reserved[4]` + `NV_VIDEO_PACKET(16,
LITTLE-endian)` + payload. Endianness differs *within the same packet*. `header=0x80|0x10`.
- `fecInfo` (u32 LE) = `(dataShards<<22)|(fecIndex<<12)|(fecPercentage<<4)`; parityShards is
**recomputed** by the client as `ceil(dataShards*pct/100)` — must match exactly.
- `multiFecBlocks` = `(blockIdx<<4)|((nBlocks-1)<<6)`; **≤4 FEC blocks/frame**, ≤255 shards/block.
- Each frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`
(`headerType=0x01`, `frameType` 2=IDR, `lastPayloadLen`) before striping into shards.
- Shard size = `packetSize + 16`. Data shards first, then parity, over a contiguous RTP
sequence range. Last data shard zero-padded.
- **Video crypto** (when `SS_ENC_VIDEO` negotiated): AES-128-GCM, key = raw 16-byte RIKEY
(from `/launch?rikey=`), IV = `counter_le[8]||0,0,0||'V'(0x56)`, **NO AAD**, 32-byte
`ENC_VIDEO_HEADER{iv[12],frameNumber,tag[16]}` prefix; **FEC first, then encrypt per shard**.
- **Pairing**: PIN key = `SHA-256(salt[16] || ascii_pin)[..16]`; AES-128-**ECB** (no padding)
for the challenge blocks; SHA-256 rolling hashes; RSA-SHA256 signatures over X.509 certs;
the client cert is pinned for subsequent HTTPS. 4 phases over `/pair?phrase=…`.
- **RTSP** `Session: DEADBEEFCAFE;timeout = 90` (literal), `Transport: server_port=<p>`,
`streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
(`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.
## The two highest interop risks (validate EARLY)
1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly
0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy) — parity
bytes likely **don't match**, so Moonlight silently fails to recover any frame with a lost
data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so
defer this — get a frame decoded first, then FFI/port nanors for loss recovery.
2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
usually off on LAN** — implement plaintext video first, add GCM later.
## Phasing (each phase independently testable with a real Moonlight client)
- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
`/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
pairs (PIN), and shows the host as ready.* ← first slice.
- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
*Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
layout in punktfunk-core; wire the spike's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
(uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
*Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
backend for the user's KDE box. *Acceptance: full game stream with sound — the GameStream-host goal.*
## Crates (verified available)
`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/
`ecb` (pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.
## Testing note
The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at
this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against
`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display.
@@ -1,300 +0,0 @@
---
title: "Implementation Plan"
description: "The full design: protocol core, milestones, and architecture."
---
*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.*
> The name `punktfunk` fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point.
---
## 0. The thesis (why this is worth building)
Two concrete gaps justify a new project rather than another fork:
1. **The 1 Gbps wall is a FEC design limit, not a bandwidth limit.** Moonlight/Sunshine protect each frame with ReedSolomon over GF(2⁸), which caps a block at 255 shards. At 5120×1440@240 that ceiling is hit around 1 Gbps. Switching the erasure code to **Leopard-RS over GF(2¹⁶)** (via the `reed-solomon-simd` crate) raises the per-block shard limit to 65,536 and runs in O(n log n) with SIMD. The wall disappears as a *consequence* of a better core, not as a hack.
2. **Linux software virtual displays are a real, unfilled gap.** The compositor-side capability now exists (Mutter headless virtual monitors since GNOME 40; wlroots headless outputs; KWin virtual outputs in Plasma 6), but no streaming host *drives* those APIs to create a client-sized output on demand, capture it via PipeWire, and route input back via libei. Apollo's virtual display is Windows-only. This is the immediate, shippable win.
**Strategic ordering:** ship the Linux virtual-display host speaking the *existing* Moonlight protocol first (every Moonlight/Artemis client works on day one, no client to write). Only then introduce the new GF(2¹⁶) transport as a negotiated protocol extension with our own clients. Value early, hard parts deferred until de-risked.
---
## 1. Scope & non-goals
**In scope (eventually):**
- Linux streaming host with on-demand software virtual displays (KWin first, then wlroots, then Mutter).
- A shared Rust protocol/transport/FEC core exposed over a stable C ABI.
- A modern transport that removes the 1 Gbps ceiling.
- Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core.
**Explicit non-goals (at least at first):**
- Windows *host* support (Sunshine/Apollo already do this well; no gap to fill).
- Internet/NAT-traversal relay infrastructure (LAN/VPN first; lean on an existing mesh VPN such as Headscale/NetBird/Tailscale).
- Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs).
- A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6).
---
## 2. Architecture overview
```mermaid
flowchart TD
subgraph Host["Linux Host (Rust)"]
VD["Virtual display orchestrator<br/>(KWin / wlroots / Mutter)"]
CAP["Capture<br/>(PipeWire / dmabuf)"]
ENC["Encoder<br/>(VAAPI / NVENC via FFmpeg)"]
VD --> CAP --> ENC
ENC --> COREH
IN_H["Input injector<br/>(libei / uinput)"]
COREH["punktfunk-core (C ABI)<br/>protocol · FEC · pacing · crypto"]
COREH --> IN_H
end
COREH <-->|"UDP+FEC video / QUIC control+audio"| COREC
subgraph Client["Client (Rust / Swift / Kotlin)"]
COREC["punktfunk-core (same crate, C ABI)"]
DEC["Decoder<br/>(VideoToolbox / NVDEC / VAAPI)"]
PRES["Present + frame pacing"]
INP["Input capture"]
COREC --> DEC --> PRES
INP --> COREC
end
```
**The load-bearing decision:** `punktfunk-core` is one crate, compiled once, linked by every host and client through a C ABI. Protocol logic, FEC, packet pacing, jitter buffering, pairing, and crypto live there and exist exactly once. Platform code (capture, encode, decode, present, input, UI) lives outside the core and is written in whatever language suits the platform.
---
## 3. Protocol strategy (three phases)
| Phase | Protocol | Clients that work | Bitrate ceiling | Purpose |
|------|----------|-------------------|-----------------|---------|
| **P1** | GameStream-compatible (existing Moonlight wire format) | All existing Moonlight/Artemis clients | ~1 Gbps (legacy GF(2⁸) FEC) | Ship the Linux virtual-display win with zero client work |
| **P2** | `punktfunk/1` negotiated extension: GF(2¹⁶) FEC, multi-block framing, optional QUIC control | punktfunk clients only; falls back to P1 for others | Multi-Gbps | Break the wall; introduce native clients |
| **P3** | `punktfunk/1` as primary; GameStream kept as compat shim | punktfunk everywhere, Moonlight as fallback | Multi-Gbps | Full control of features (mic passthrough, per-client identity, HDR signalling) |
Negotiation: extend the `serverinfo`/RTSP `SETUP` handshake with a capability flag. Old clients never see the flag and get P1 behavior. This is how Apollo/Artemis diverge cleanly, and it keeps you compatible while you build.
---
## 4. Tech stack (settled)
**Language split:** Rust for the core and all non-Apple platform code; Swift only for the macOS/iOS client UI + VideoToolbox/Metal; Kotlin for Android UI + MediaCodec. The C ABI is the seam.
**Threading:** native OS threads for the video hot path. `tokio` is allowed *only* for the control plane (pairing, web config, QUIC control stream). The per-frame pipeline must never touch an async runtime.
### Core crate dependencies
| Concern | Crate | Notes |
|--------|-------|-------|
| FEC | `reed-solomon-simd` (v3+) | Leopard/GF(2¹⁶), SIMD, O(n log n) — the wall-breaker |
| QUIC (control/audio) | `quinn` | Datagram ext for audio; reliable streams for control |
| TLS / crypto | `rustls` + `ring` (or `aws-lc-rs`) | Pairing, session keys (AES-GCM to match GameStream in P1) |
| Serialization | `zerocopy` / `bytes` | Wire structs `#[repr(C)]`, zero-copy parse |
| C header gen | `cbindgen` | Generates `punktfunk_core.h` from the ABI module |
| Error/log | `tracing` | Structured; feature-gate off the hot path |
### Linux host dependencies
| Concern | Crate / API | Notes |
|--------|-------------|-------|
| Capture | `pipewire` (pipewire-rs) | ScreenCast portal stream → dmabuf |
| Portal / DBus | `ashpd` + `zbus` | xdg-desktop-portal: ScreenCast, RemoteDesktop |
| Encode | `ffmpeg-next` or `rsmpeg` | VAAPI / NVENC, dmabuf import (zero-copy) |
| Input inject | `reis` (libei) + `input-linux` (uinput fallback) | Wayland-native first, uinput as universal fallback |
| Virtual output | per-compositor (see §6) | KWin DBus / Sway `create_output` / Mutter DBus |
| Web config | `axum` + `tokio` + small Vite/React UI | You own this stack already |
### Apple client (P2+)
Swift + VideoToolbox (decode) + Metal (present) + SwiftUI. Imports `punktfunk_core.h` directly via a module map — no glue layer.
### Ruled out
- **Swift for the host/core:** no Linux Wayland/PipeWire/DRM/VAAPI ecosystem; ARC in hot loops. (Excellent *Apple-client* language, wrong for systems/Linux.)
- **Go:** GC disqualifies the hot path.
- **C++:** throws away the safety/concurrency wins that justified greenfield over forking.
- **Zig:** best-in-class C interop, but pre-1.0 with no Wayland/QUIC ecosystem — too much risk for a multi-month build. Revisit later if desired.
---
## 5. The C ABI boundary
Design it on day one; retrofitting an ABI is painful.
**Principles**
- Opaque handles only across the boundary: `PunktfunkSession*`, never Rust types.
- All cross-boundary structs are `#[repr(C)]`; primitives + pointer/len pairs for buffers.
- Async events via registered C callbacks (`fn ptr` + `void* userdata`).
- Explicit, documented ownership: who frees what, when. Provide `punktfunk_*_free` for every allocation that crosses out.
- Versioned ABI: `uint32_t punktfunk_abi_version(void)` + a `PunktfunkConfig` struct whose first field is its own size for forward-compat.
**Minimal surface (sketch)**
```c
// lifecycle
PunktfunkSession* punktfunk_session_new(const PunktfunkConfig* cfg);
void punktfunk_session_free(PunktfunkSession*);
// host: feed an encoded access unit (the core does FEC + packetize + pace + send)
int punktfunk_host_submit_frame(PunktfunkSession*, const uint8_t* data, size_t len,
uint64_t pts_ns, PunktfunkFrameFlags flags);
// client: pull a reassembled, FEC-recovered access unit ready to decode
int punktfunk_client_poll_frame(PunktfunkSession*, PunktfunkFrame* out /*borrowed until next poll*/);
// input (both directions): client captures, host receives via callback
int punktfunk_send_input(PunktfunkSession*, const PunktfunkInputEvent*);
void punktfunk_set_input_callback(PunktfunkSession*, PunktfunkInputCb, void* user);
// stats for the frame-pacing/quality logic and the web UI
void punktfunk_get_stats(PunktfunkSession*, PunktfunkStats* out);
```
Keep it this small. Everything platform-specific (how you got the encoded bytes, how you decode them) stays on the platform side.
---
## 6. Virtual display orchestration
This is the differentiator and the most fragmented part. Two deployment models — support both eventually, pick one for the MVP.
**Model A — Attach to the running session.** Create a client-sized virtual output *inside the user's live desktop*, stream it, tear it down on disconnect. This is "add a monitor to my actual PC." Best UX, hardest because it depends on per-compositor runtime APIs.
**Model B — Dedicated headless session.** Spawn a separate headless compositor purely for the stream (e.g. `gnome-shell --headless --virtual-monitor WxH`, or a headless wlroots compositor). Cleaner isolation, sidesteps runtime-output APIs, ideal for "remote second PC." Worse for "mirror/extend my real desktop."
**Per-compositor (Model A) runtime virtual-output creation:**
- **KWin / Plasma 6 (recommended MVP target — a common KDE daily-driver setup, and where the gap is loudest):** KWin can create virtual outputs; KRdp already does this internally for remote sessions. Drive it via the KWin DBus interface; capture via `xdg-desktop-portal-kde` ScreenCast (PipeWire); inject input via the RemoteDesktop portal or `reis`.
- **wlroots (Sway/Hyprland — fastest to *prototype* the pipeline):** enable the headless backend (`WLR_BACKENDS=…,headless`), then `swaymsg create_output` / `hyprctl output create headless`. Capture via `wlr-screencopy` or the portal. Simplest API; good for validating capture→encode→send before fighting KWin/Mutter.
- **Mutter / GNOME:** virtual monitors via the headless backend; runtime creation via Mutter DBus (`org.gnome.Mutter.*` — partly experimental). Capture via `xdg-desktop-portal-gnome` ScreenCast.
**Recommendation:** do a 12 day wlroots spike to prove the *pipeline*, then build the real MVP on KWin because that's your deployment target. Abstract virtual-output creation behind a trait so compositors are pluggable:
```rust
trait VirtualDisplay {
fn create(&self, mode: Mode) -> Result<OutputHandle>;
fn destroy(&self, h: OutputHandle) -> Result<()>;
}
```
---
## 7. The hot path: pipeline & latency budget
Per-frame pipeline, each stage on its own thread, connected by bounded SPSC channels (drop-oldest on overflow, never block the encoder):
```
capture(dmabuf) → encode(NVENC/VAAPI) → core[FEC+packetize+pace+send]
│ network
client: recv → core[reorder+FEC recover+jitter] → decode → present
```
**Glass-to-glass budget (LAN, 240 Hz = 4.17 ms/frame):**
| Stage | Target | Notes |
|------|--------|-------|
| Capture latency | ≤ 1 frame | dmabuf, no copy to CPU |
| Encode | 14 ms | NVENC low-latency preset; tune lookahead off |
| FEC + packetize | < 1 ms | SIMD RS; pre-allocated shard buffers |
| Network (LAN) | < 1 ms | `sendmmsg` / UDP GSO to cut syscalls |
| Jitter buffer | 01 frame | adaptive; minimum that hides observed jitter |
| FEC recover + reassemble | < 1 ms | only when loss occurs |
| Decode | 14 ms | hardware decoder |
| Present | ≤ 1 frame | align to client vsync |
**Target: 1535 ms glass-to-glass on LAN.** The art is *frame pacing* — matching capture/encode cadence to the client's actual refresh and keeping the jitter buffer as small as the link allows. This, not the codec, is what separates good from bad streaming. Budget real time for it.
**Throughput math to keep honest:** 5120×1440@240 ≈ 1.77 Gpx/s. At 0.5 bpp that's ~885 Mbps; 0.6 bpp ≈ 1.06 Gbps; 0.8 bpp (4:4:4 headroom) ≈ 1.4 Gbps. The GF(2¹⁶) FEC + multi-block framing must sustain these without the per-frame shard count being the limiter — which it no longer is once you leave GF(2⁸).
---
## 8. Milestones
Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat as ordering, not deadlines.
**M0 — Pipeline spike (S).** wlroots headless output → PipeWire capture → VAAPI/NVENC encode → dump H.265 to a file that plays. *Acceptance:* a valid encoded file from a virtual output, no streaming yet. Proves the Linux capture+encode chain end-to-end.
**M1 — `punktfunk-core` skeleton + C ABI (M).** Session lifecycle, GameStream-compatible packetization and GF(2⁸) FEC (P1), AES-GCM, `cbindgen` header, a tiny C test harness. *Acceptance:* core links from C; round-trips packets in a loopback test with simulated loss.
**M2 — P1 host: stream to stock Moonlight (L).** Wire M0's pipeline into the core; implement `serverinfo`/pairing/RTSP enough for a real Moonlight client to connect, with a KWin virtual output created on connect and destroyed on disconnect. Input via `reis`/uinput. *Acceptance:* **you play a game on your KDE box streamed to a stock Moonlight client on a virtual display, no dummy plug, no kernel args.** This is the shippable milestone and the project's reason to exist.
**M3 — Measurement harness (S).** Glass-to-glass latency measurement (on-screen QR/timestamp or photodiode), packet-loss injection, frame-pacing and stall metrics surfaced in the web UI. *Acceptance:* you can quantify a regression. Build this before optimizing anything.
**M4 — P2 transport: break the wall (L).** Add `punktfunk/1` negotiation; swap to `reed-solomon-simd` GF(2¹⁶) with multi-block per-frame framing; optional QUIC control/audio. Write a minimal **Rust** reference client (decode via VAAPI, present via wgpu/Vulkan) to exercise it. *Acceptance:* a stable stream above 1.4 Gbps at 5120×1440@240 with loss recovery working; latency unchanged vs. M2.
**M5 — Apple client (L).** Swift + VideoToolbox + Metal + SwiftUI, linking `punktfunk-core` via the C header. *Acceptance:* a Mac plays a stream at native resolution/refresh.
**M6 — Feature surface (M, ongoing).** Mic passthrough as a proper encrypted, per-client reverse audio stream (the thing the upstream PR got wrong); HDR signalling; per-client identity/permissions; pause/resume. *Acceptance:* feature parity with Apollo on the items you care about, plus mic done right.
---
## 9. Risk register
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| KWin runtime virtual-output API is undocumented/unstable | High | High | Spike on wlroots first to de-risk the pipeline; study KRdp's source for the KWin path; keep `VirtualDisplay` pluggable so a stuck compositor doesn't block the project |
| Wayland input injection gaps (libei still evolving) | Med | Med | uinput fallback always available; `reis` for the Wayland-native path |
| dmabuf → encoder zero-copy import quirks per GPU/driver | High | Med | Validate on your actual NVIDIA + AMD hardware early (M0); have a CPU-copy fallback path |
| Encoder/decoder can't sustain 1.77 Gpx/s @ 240 | Med | High | Measure in M0/M4 on real silicon; this is a hardware ceiling no rewrite fixes — discover it before P2, not after |
| Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step |
| Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client |
| Solo bandwidth vs. other projects | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone |
---
## 10. Testing & measurement
- **Loopback correctness:** core encodes→FEC→loss-inject→recover→decode in-process; property tests over loss patterns and shard counts (proptest).
- **Glass-to-glass latency:** rendered timestamp/QR on host, read back on client capture; or a photodiode for true photons. Track p50/p99.
- **Loss resilience:** `tc netem` to inject loss/jitter/reorder; verify FEC recovery and graceful degradation.
- **Pacing:** log present timestamps vs. client vsync; alert on stalls and duplicate/dropped frames.
- **Soak:** multi-hour streams; watch for buffer growth, fd leaks, encoder session exhaustion.
- **Hardware matrix:** an NVIDIA box (NVENC), an AMD/Intel box (VAAPI), a Mac (VideoToolbox decode). Catch driver quirks early.
---
## 11. Repo / workspace structure
```
punktfunk/
├── Cargo.toml # workspace
├── crates/
│ ├── punktfunk-core/ # protocol, FEC, pacing, crypto — C ABI (cdylib + staticlib)
│ │ ├── src/abi.rs # #[no_mangle] extern "C" surface
│ │ ├── src/fec.rs # GF(2^16) blocking over reed-solomon-simd
│ │ ├── src/transport/ # udp+fec video, quinn control/audio
│ │ ├── src/protocol/ # gamestream-compat (P1) + punktfunk/1 (P2)
│ │ └── cbindgen.toml
│ ├── punktfunk-host/ # Linux host binary
│ │ ├── src/capture/ # pipewire / portal
│ │ ├── src/encode/ # ffmpeg vaapi/nvenc
│ │ ├── src/vdisplay/ # trait + kwin/wlroots/mutter impls
│ │ ├── src/input/ # reis + uinput
│ │ └── src/web/ # axum config/pairing API
│ └── punktfunk-probe/ # reference Rust client (M4)
├── clients/
│ ├── apple/ # Swift package, imports punktfunk_core.h (M5)
│ └── android/ # Kotlin + JNI (later)
├── include/ # generated punktfunk_core.h
└── tools/
├── latency-probe/
└── loss-harness/
```
---
## 12. Immediate next actions (first week)
1. **Stand up the workspace** with `punktfunk-core` (empty ABI + `cbindgen`) and `punktfunk-host` skeletons; wire up CI (Gitea Actions, BuildKit-based pipelines).
2. **M0 spike on wlroots:** headless output → PipeWire capture → NVENC/VAAPI encode → playable file. This validates the riskiest *pipeline* assumptions in days, on real GPU hardware.
3. **Read KRdp's source** for how KDE creates virtual outputs and casts them — it's the closest existing reference for the KWin path needed in M2.
4. **Decide P1 protocol depth:** confirm exactly which `serverinfo`/RTSP/pairing messages a current Moonlight client requires for a successful connect, so M2's compat surface is scoped precisely.
---
*The shape of the bet: M2 alone — virtual-display streaming to stock Moonlight clients on Linux — is a complete, useful, gap-filling release. Everything after it (the wall-breaking transport, native clients, mic-done-right) is upside you unlock from a position of having already shipped, with the hard transport work resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.*
+3 -1
View File
@@ -71,7 +71,9 @@ certificate, so you import that certificate once before Windows will install the
1. Open the [packages page](https://git.unom.io/unom/-/packages) (generic group), find
**`punktfunk-client-windows`**, and download the newest **`.msix`** and its matching **`.cer`**.
2. In an **admin** PowerShell, trust the publisher certificate (one-time), then install:
2. **Trust the publisher certificate**, then install. The MSIX won't install until the certificate is
trusted — but it's the **same certificate for every release**, so this is genuinely one-time and
later updates need nothing. In an **admin** PowerShell:
```powershell
Import-Certificate -FilePath .\punktfunk-client-windows.cer `
+30 -4
View File
@@ -1,11 +1,12 @@
---
title: Install the Host
description: Pick your distro and install the punktfunk host from its package registry.
description: Install the punktfunk host — on Linux from its package registry, or on Windows from a signed installer.
---
The package registries are the real distribution channel. Pick your distro, add the repo, and install
with your native package manager. Each row links to the full per-distro guide (add the repo, first-run
steps, the web console) — those are the source of truth, so this page doesn't duplicate them.
On Linux, the package registries are the real distribution channel. Pick your distro, add the repo, and
install with your native package manager. Each row links to the full per-distro guide (add the repo,
first-run steps, the web console) — those are the source of truth, so this page doesn't duplicate them.
On **Windows** (NVIDIA), the host ships as a signed installer instead — see [Windows](#windows-nvidia).
## Pick your distro
@@ -19,6 +20,31 @@ Each registry is public — no auth, you just trust the repo's signing key. Addi
one-time step covered in the linked guide; after that, normal `apt upgrade` / `rpm-ostree upgrade`
tracks new builds automatically.
## Windows (NVIDIA)
punktfunk also runs as a native host on **Windows 10/11 (x64) with an NVIDIA GPU**, shipped as a
signed installer — see [Windows Host](/docs/windows-host) for what it includes and its limitations.
1. From the [packages page](https://git.unom.io/unom/-/packages) (generic group), download the newest
**`punktfunk-host-setup-<ver>.exe`** and its matching **`.cer`**.
2. **Trust the publisher certificate once.** The installer is signed with a self-signed certificate
whose public `.cer` is published next to it — the **same certificate for every release**, so this is
genuinely one-time and later updates need nothing. In an **admin** PowerShell:
```powershell
Import-Certificate -FilePath .\punktfunk-host-setup.cer `
-CertStoreLocation Cert:\LocalMachine\TrustedPublisher
```
3. Run `punktfunk-host-setup-<ver>.exe` (elevated). It installs to `C:\Program Files\punktfunk`,
optionally installs the bundled **SudoVDA** virtual-display driver, and registers + starts the
`LocalSystem` service (`/VERYSILENT` for an unattended install). Upgrades and uninstall go through
Add/Remove Programs.
You need an NVIDIA GPU + driver (the host is NVENC-only on Windows). More detail — including the CLI
`punktfunk-host service install` path — is in
[Running as a Service → Windows](/docs/running-as-a-service#windows).
## What the packages are
- **`punktfunk-host`** — the streaming host. Install this on your Linux + NVIDIA gaming machine.
+3 -8
View File
@@ -11,6 +11,7 @@
"ubuntu-kde",
"fedora-kde",
"bazzite",
"windows-host",
"running-as-a-service",
"---Connecting---",
"clients",
@@ -21,13 +22,7 @@
"configuration",
"host-cli",
"troubleshooting",
"---Project & Internals---",
"roadmap",
"implementation-plan",
"windows-host",
"apple-stage2-presenter",
"gamescope-multiuser",
"dualsense-haptics",
"ci"
"---Project---",
"roadmap"
]
}
+59 -377
View File
@@ -1,392 +1,74 @@
---
title: "Roadmap"
description: "Decided next goals and the longer-term bets."
description: "What's shipped, what's in progress, and what's next for punktfunk."
---
A quick map of where punktfunk is today and where it's heading. For the detailed, dated changelog,
see [Status & Progress](/docs/status).
Decided 2026-06-10 (research-grounded; see commit history), extended since.
**Legend:** ✅ shipped · 🟡 in progress · 🔭 planned · ⛔ blocked upstream
**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
Steam session at the **client's exact resolution + refresh** — games see it (via injected
`--nested-refresh` + generated CVT modes, not the host's TV EDID), relaunched per-connection on a mode
change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).
## At a glance
The four native clients — **macOS/iOS/iPadOS/tvOS**, **Linux**, **Windows**, and **Android** — are
all real, working clients; see [Status & Progress](/docs/status) for their current state.
| Area | |
|---|---|
| Protocol core — FEC · crypto · C ABI | ✅ |
| GameStream host (works with Moonlight) | ✅ |
| Native `punktfunk/1` protocol | ✅ |
| Linux host (KWin · GNOME · gamescope · Sway) | ✅ |
| Windows host (NVIDIA) | ✅ beta |
| Apple client (macOS · iOS · iPadOS · tvOS) | ✅ |
| Linux client (GTK4) | ✅ |
| Android client (phone · TV) | ✅ |
| Windows client | 🟡 |
| Web console + pairing | ✅ |
| Concurrent sessions (shared desktop) | ✅ |
| Network speed test + bitrate | ✅ |
| HDR / 10-bit (host capture) | ⛔ |
| Sub-frame pipelining (latency) | 🔭 |
A native **Windows host** (§7) is also implemented and shipping (NVIDIA-only, x64-only) — see
[Windows Host](/docs/windows-host).
## ✅ Shipped
**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval) and the
native client presenter + iOS (§6). **§10 HDR/10-bit is parked — blocked upstream at the compositor**
(no gamescope/KWin PipeWire 10-bit producer yet).
- **The host, two ways.** A GameStream host any [Moonlight](/docs/moonlight) client can use, and the
lower-latency native [`punktfunk/1`](/docs/how-it-works) protocol (QUIC control + UDP data with
GF(2¹⁶) Leopard FEC + AES-GCM). Both run from one process.
- **Native-resolution virtual displays** on Linux across KWin, GNOME/Mutter, gamescope, and
Sway/wlroots, with a fully zero-copy GPU path to NVENC (stable 240 fps at 5120×1440).
- **A native Windows host** (NVIDIA, x64) — a signed installer with secure-desktop capture and a
bundled virtual-display driver. See [Windows Host](/docs/windows-host). *(Beta — newer than the
Linux host.)*
- **Clients on every platform** — native apps for **Apple** (macOS, iOS, iPadOS, tvOS), **Linux**,
**Android** (phone + TV), and **Windows**, each with hardware decode, controllers including
DualSense, audio + mic, and automatic host discovery. See [Clients](/docs/clients).
- **Secure by default** — SPAKE2 PIN pairing with pinned reconnects, one-click delegated approval from
the web console, and mDNS LAN auto-discovery.
- **Tuned for latency** — concurrent sessions (stream one desktop to several devices at once),
mid-stream resolution renegotiation, a cross-machine clock-skew handshake, a 1 Gbps+ data plane, and
an in-app network speed test that informs the bitrate picker.
## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
## 🟡 In progress
Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.
- **Windows client on-glass validation.** The hardware (D3D11VA) decode, HDR present, and GUI are
built and ship as a signed MSIX — they just need verification on real GPU hardware.
- **Apple stage-2 presenter as the default.** The lower-latency `VTDecompressionSession`
`CAMetalLayer` path is live behind an opt-in flag and graduating to the default.
- **Web console parity.** Surfacing the speed test and bitrate picker the apps already have.
- **Windows host hardening.** Broader real-world testing, AMD/Intel encode (NVIDIA-only today), and
bundling the ViGEm gamepad driver.
- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
registry roundtrip); move the portal restart to *after* readiness and precede it with
`systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
env import — the Sway script does this, the KDE one doesn't).
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
(permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
probe as an `ExecStartPost` gate, `Restart=on-failure`.
## 🔭 Planned
## 2. Offer available compositors in the client ✅ *(done)*
- **Sub-frame pipelining.** Overlap encode and transmit within a single frame (a direct NVENC slice
path) — the next big latency lever at high resolutions.
- **True glass-to-glass latency** measured end to end (capture → on-screen present).
- **gamescope multi-user isolation.** Per-session input and audio so concurrent clients are fully
independent desktops (the shared-desktop case already works).
- **Peer-approved pairing.** Approve a new device from an already-paired device's own app.
Host enumerates which backends are actually available (binary present + version OK:
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
Picker in the Apple client + web console.
## ⛔ Parked / blocked
## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*
Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
it's Fedora-atomic and the community installs Sunshine via COPR rpm-ostree — the analog.
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.
- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
`/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
Bazzite's already-present **gamescope** (minimal session plumbing).
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
zero-copy NVENC/dmabuf/uinput).
## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*
The exact mirror of the host→client desktop-audio path. A PipeWire virtual source apps can
select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
`punktfunk_send_audio`.
- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
jitter buffer (silence-fill underrun, Opus PLC).
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
surface). punktfunk/1-only.
## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*
- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
`inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
injects `ei_touchscreen` down/motion/up; `punktfunk-probe --touch-test` drags a finger.
**Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
(now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
(gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
`/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (vendor 054C/0CE6, the 232-byte
inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
(touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
selects a per-session `DualSenseManager` (the `PadBackend` enum in `punktfunk1.rs`): client gamepad frames
build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
(lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
(so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
(`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
`punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
(← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `punktfunk1-host` +
`punktfunk-probe --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
(600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
(only ~510 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-
based, no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
## 6. iOS/iPadOS → tvOS *(deferred)*
PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin
+ touch capture (#5) + UI. tvOS later.
## 7. Windows as a host ✅ *(implemented & shipping — NVIDIA-only, x64-only; see [Windows Host](/docs/windows-host))*
Done as an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/crypto/C-ABI) +
QUIC + GameStream + mgmt + the pipeline orchestration are all platform-agnostic and already
`cfg`-isolated (~95% reuse), so the Windows host is a set of `#[cfg(windows)]` backends behind the
existing traits:
- **Capture** — DXGI Desktop Duplication → D3D11 texture.
- **Virtual display** — **SudoVDA** (the Sunshine Virtual Display Adapter), a pre-built, signed
Indirect Display Driver that creates a `WxH@Hz` monitor on demand — so there's no kernel driver to
author or WHQL-sign. Bundled and staged by the installer; falls back to capturing an existing
monitor if absent.
- **Encode** — NVENC with a D3D11 device (`--features nvenc`), so the host is **NVIDIA-only**.
- **Input** — SendInput (mouse/keyboard) + ViGEm (virtual gamepads with rumble).
- **Audio** — WASAPI loopback capture + a WASAPI virtual mic.
It ships as a **signed Inno Setup installer** that registers a `LocalSystem` SCM service launching
into the interactive session for secure-desktop (UAC / lock-screen) capture, published by
`windows-host.yml`. Still open: AMD/Intel encode (none today), broader real-world testing, and
bundling ViGEmBus.
## 8. Pairing & trust hardening *(§8a + §8b-1 done; §8b-2 next)*
The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":
- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
Deployed to the dev box + Bazzite.
- ✅ **§8b-1 Delegated approval via the console — done (2026-06-12)** *(the ergonomic enabler for
"mandatory": pair a device without fetching the host PIN out of band).* An identified-but-unpaired
device that knocks on a pairing-required host is held as a **pending request** in `NativePairing`
(in-memory, deduped by fingerprint, 32-entry cap, 10-min expiry — a LAN scanner can't grow it,
and an anonymous client with no certificate records nothing). The mgmt API gains
`GET /native/pending` + `POST /native/pending/{id}/approve` (optional `{name}` to relabel) +
`POST /native/pending/{id}/deny`; the web console's Pairing page shows a **Waiting for approval**
section (live-polling) with Approve/Deny — approve pins the fingerprint on the spot, no PIN.
The `Hello` carries an optional trailing **device name** (same back-compat pattern as
compositor/gamepad/bitrate; `client-rs --name` sends it, fingerprint-derived label otherwise) so
the pending list is human-readable. End-to-end tested (knock → pending → approve → same identity
streams) + unit/mgmt tests.
- **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
**Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
is a client-agent task. The Apple connector should also start sending the `Hello` device name
(needs a connect-ABI name parameter; the wire field is already live).
PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
## 9. Client→host network speed test + settable bitrate *(host + Apple client done — web console remaining)*
Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
value) instead of guesswork that ends in a stuttering stream.
**Done & live (host + protocol + connector + C ABI, `74819b1`):**
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
client requests a rate; the host clamps to [500 kbps, 2 Gbps] (or its 20 Mbps default on 0),
applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
`punktfunk_connection_bitrate()`.
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
`ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 3 Gbps / ≤ 5 s,
video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
`PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
interleaved probe AUs excluded from frame verification. `punktfunk-probe` gains `--bitrate` +
`--speed-test KBPS:MS` as the reference/loopback driver.
**Done (Apple client UI):** Settings grows a Bitrate control (Automatic = host default; manual is
a log-scale slider up to 3 Gbps with an above-1-Gbps "test the speed first" warning — tvOS keeps
a focus-native preset picker; rides `connect_ex3` on every connect, `PUNKTFUNK_BITRATE_KBPS` dev
override), and each host card's context menu gets
"Test Network Speed…" — a sheet that connects, runs `speed_test` (up to the host's 3 Gbps
probe ceiling for 2 s), polls `probe_result` with a live readout, and shows measured
goodput · loss · recommended bitrate (≈70% of measured, capped at the 2 Gbps session
ceiling) with a one-tap "Use N Mbps" writing the setting. Loopback-tested through the
xcframework: bitrate echo (50 000 → 50 000) + a 20 Mbps/500 ms probe completing with real numbers.
**Remaining:** surface both in the web console.
## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*
Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
build the moment a producer lands.
- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
(`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/
`NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
(for its own sake), not for HDR.
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
Steam-Deck-UI polish + the dynamic-resolution work (§ above). Track #2126 and !8293.
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
(BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
(RustCrypto auto AES-NI, ~1025× headroom on x86_64); the u64 sequence/nonce space; and the **core
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
(64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
`sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
`recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
underdrain). ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 2561024 (bounds burst-drop
blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
`shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
`punktfunk-probe --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
`packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.51 Gbps (no explicit
`rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*
A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
is only the same-host **capture-stamp→reassembled** slice (~3040% of true glass-to-glass) and was
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
`PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
(IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
`sync_channel(3)` with backpressure. Removes the serialization (~28 ms @60120 fps) and is the
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
- **Done & live (skew handshake landed 2026-06-12):** **wall-clock skew handshake** — `ClockProbe`/
`ClockEcho` on the control stream (8 NTP-style rounds right after `Start`; min-RTT sample →
hostclient offset; `clock_offset_ns`). The client adds the offset to its receive instant before
differencing against the AU `pts_ns`, so the `capture→reassembled` percentiles are now valid
**across machines** (reported `skew_corrected=true`), not just same-host. Back-compat: an old host
that doesn't answer times out → `skew_corrected=false` (shared-clock assumption, as before).
Validated cross-LAN (GNOME box → dev box): offset ≈ 1.57 ms (reproducible), rtt ~140 µs, **p50
1.30 ms** skew-corrected capture→reassembled. The skew handshake is now a shared core helper
(`quic::clock_sync` → `ClockSkew`) used by both the reference client and the **embeddable
connector** — `NativeClient` runs it at connect and exposes the offset over the C ABI
(`punktfunk_connection_clock_offset_ns`), so the Apple client can convert a present instant to the
host clock. The Apple client now consumes that offset: `PunktfunkConnection.clockOffsetNs` +
`LatencyMeter` surface a **capture→client-receipt** (skew-corrected) p50/p95 in the HUD — the first
cross-machine latency the real Apple client reports. **Remaining for *true* glass-to-glass**:
(1) the **decode→present** tail — the stage-1 `AVSampleBufferDisplayLayer` decodes+presents
compressed samples internally with no per-frame callback, so it needs the **stage-2 presenter**
(`VTDecompressionSession` decode-completion timestamp + `CAMetalLayer`/display-link present) to
stamp on-glass present time; (2) the host **render→capture** term (PipeWire buffer presentation
timestamp vs our capture stamp). **render→capture is parked (low priority):** pipewire-rs 0.9.2
exposes no per-buffer meta accessor, no raw buffer pointer (`pub(crate)`), and no stream-timing
API, so reading `SPA_META_Header.pts` would require introducing raw `spa_sys`/`pw_sys` FFI into the
working, perf-critical capture buffer-acquisition — a risky rewrite for the *smallest* g2g term,
with KWin/Mutter `Header.pts` support unconfirmed. Glass-to-glass is effectively complete as
**capture→present** (the stage-2 presenter measures it). `tools/latency-probe` is still the
cross-machine orchestrator.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
1. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
copy) — ~0.10.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
2. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
3. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
encode+send within a frame (~36 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
thread split, only after measurement justifies it.
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `punktfunk1-host` advertise the native service over mDNS:
- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
both host entry points get it; best-effort (a discovery failure never blocks streaming —
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
(so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-probe --discover [SECS]` browses and prints each host (name, addr:port,
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — a client discovered a GNOME host appliance
(`myhost.local 9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
(`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
host-side contract above is all it needs.
## 14. Concurrent sessions *(shared-desktop multi-view ✅ done; multi-user deferred)*
The host no longer serves one client at a time. The accept loop spawns each session (`JoinSet`),
bounded by `--max-concurrent` (default 4 — a NVENC bound; overflow waits in the accept queue). Each
session keeps its own virtual output + NVENC encoder; the host-lifetime input/audio/mic services stay
shared.
- **Done & live (shared-desktop multi-view):** multiple devices viewing/controlling the **same**
desktop on the shared-desktop backends (kwin/mutter/wlroots) — e.g. stream *your* desktop to a
laptop + a TV at once; shared input/audio is the correct semantics there. Validated live on the
GNOME box: two clients → **two independent Mutter virtual outputs (1280×720 + 1920×1080) streaming
simultaneously**. The QUIC handshake stays in the accept loop so a failed handshake doesn't consume
a slot or block the next client.
- **Deferred — gamescope multi-user (independent desktops):** the *other* model — each client its own
gamescope instance, with **per-session input + audio + mic** (the multi-user / cloud-gaming case).
Researched 2026-06-12 and **parked**: it's a large multi-file refactor (per-instance EIS sockets +
a per-session injector + per-session **null-sink audio routing** + per-session mic) for a niche
use case, while the common multi-device case is already covered by the multi-view model above. Full
research + the plumbing list: [gamescope Multi-User Isolation](/docs/gamescope-multiuser).
- **HDR / 10-bit streaming.** Designed end to end, but blocked upstream — no shipping compositor emits
a 10-bit/HDR capture stream yet. Ready to build the moment one lands.
- **Advanced DualSense voice-coil haptics.** Scoped and shelved (it rides the controller's USB audio
interface, with near-zero game support on Linux). Adaptive triggers, rumble, and the lightbar
already ship.
+2 -3
View File
@@ -3,9 +3,8 @@ title: "Status & Progress"
description: "Where the work stands across the core, the host, and the native clients."
---
A high-level view of where punktfunk stands. The full design lives in the
[Implementation Plan](/docs/implementation-plan), the ordered plan of work in the
[Roadmap](/docs/roadmap), and milestone-level detail in
A high-level view of where punktfunk stands. The ordered plan of work is on the
[Roadmap](/docs/roadmap), and milestone-level detail lives in
[`CLAUDE.md`](https://git.unom.io/unom/punktfunk/src/branch/main/CLAUDE.md).
## Milestones at a glance