Merge remote-tracking branch 'origin/main'

# Conflicts:
#	docs-site/content/docs/meta.json
2026-06-21 00:07:36 +00:00
30 changed files with 573 additions and 1209 deletions
@@ -1,122 +0,0 @@
---
title: "Apple Stage-2 Presenter (handoff)"
description: "Implementation plan for the explicit VTDecompressionSession → CAMetalLayer presenter — hand-paced present + true decode→present (glass-to-glass) measurement. Written so a Mac agent can pick it up."
---
A pickup-ready plan for the **stage-2 Apple presenter**. The current **stage-1** presenter feeds
compressed HEVC straight into `AVSampleBufferDisplayLayer`, which hardware-decodes **and presents
internally with no per-frame callback** — so we can't stamp decode or present, and we can't hand-pace.
Stage-2 takes explicit control: decode with `VTDecompressionSession`, present decoded frames through a
`CAMetalLayer` driven by a display link. Two wins: **~0.5 refresh off the present tail** (the biggest
client latency term at 60 Hz) and **true decode→present / glass-to-glass** numbers.
All of this is **macOS/iOS/tvOS-only** — build + validate on a Mac (`swift build && swift test`, then
live against a Linux host). The host + connector side is already done: `PunktfunkConnection.clockOffsetNs`
(the connect-time skew offset, host minus client) is what makes the present timestamp cross-machine
valid. See [Status](/docs/status) and roadmap §12.
## Where it plugs into the existing code
| Existing (stage-1) | Stage-2 change |
|---|---|
| `StreamPump` pulls AUs → `AnnexB.sampleBuffer` → `layer.enqueue` (compressed) | A `Stage2Pump` (or a mode flag on `StreamPump`) feeds AUs to `VTDecompressionSessionDecodeFrame` instead |
| `StreamView`/`StreamViewIOS` host an `AVSampleBufferDisplayLayer` | Host a `CAMetalLayer` (+ a display link); keep the input-capture + HUD overlay unchanged |
| `AnnexB.formatDescription(fromIDR:)` builds the format desc, refreshed on every IDR | **Reused** — it's the `VTDecompressionSession`'s format description; recreate the session when it changes |
| `LatencyMeter` records capture→client-receipt at `onFrame` | Extend to record **decode-completion** and **present** stages (below) |
Keep stage-1 behind a `UserDefaults` flag (e.g. `punktfunk.presenter = "stage1" | "stage2"`) so a
regression can fall back — `AVSampleBufferDisplayLayer` is the known-good path.
## Decode: VTDecompressionSession
1. Create the session from the IDR's `CMVideoFormatDescription`
(`AnnexB.formatDescription(fromIDR:)`):
```
VTDecompressionSessionCreate(
    allocator: nil,
    formatDescription: fmt,
    decoderSpecification: nil, // hardware by default; no need to force
    imageBufferAttributes: [
        kCVPixelBufferMetalCompatibilityKey: true,
        kCVPixelBufferPixelFormatTypeKey:
            kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, // 8-bit SDR; 10-bit (…10BiPlanar) for HDR later
    ] as CFDictionary,
    outputCallback: <C-callback>, // pointer to a VTDecompressionOutputCallbackRecord
    decompressionSessionOut: &session)
```
2. Per AU: build the same `CMSampleBuffer` as stage-1 (`AnnexB.sampleBuffer(au:format:)`, PTS =
`au.ptsNs` @ 1e9 timescale) and submit:
```
VTDecompressionSessionDecodeFrame(session, sampleBuffer: sampleBuffer,
    flags: ._EnableAsynchronousDecompression,
    frameRefcon: <pts or a boxed context>, infoFlagsOut: nil)
```
3. The **output callback** delivers `(status, infoFlags, imageBuffer: CVImageBuffer?, presentationTimeStamp, …)`.
`presentationTimeStamp` is `au.ptsNs` (the host capture clock). **Stamp decode-completion here**
(`CLOCK_REALTIME` ns), retain the `CVPixelBuffer`, and push `{pts, pixelBuffer, decodedNs}` into a
small NSLock-guarded ring (the "ready" queue) the display link drains.
4. **IDR / mode change**: when `AnnexB.formatDescription` yields a new desc, check
`VTDecompressionSessionCanAcceptFormatDescription`; if not, finish-and-recreate the session (same
trigger stage-1 uses to refresh `format`). On decoder error (`kVTVideoDecoderBadDataErr`, etc.) drop
to the next IDR — there's no out-of-band extradata; recovery keyframes re-carry the parameter sets.
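The drop-to-next-IDR rule in step 4 can be sketched as a small gate in front of the decoder. This is a minimal illustration in Rust rather than the Swift the presenter would use; `AccessUnit` and `RecoveryGate` are hypothetical names, not existing types.

```rust
/// Hypothetical access-unit descriptor; only the IDR flag matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AccessUnit {
    pub pts_ns: u64,
    pub is_idr: bool,
}

/// Gate that decides which access units reach the decoder.
#[derive(Debug, Default)]
pub struct RecoveryGate {
    waiting_for_idr: bool,
}

impl RecoveryGate {
    /// Call when the decoder reports an error (for example bad data).
    pub fn on_decode_error(&mut self) {
        self.waiting_for_idr = true;
    }

    /// Returns true if `au` should be submitted to the decoder.
    /// After an error, everything is dropped until the next IDR, which
    /// re-carries the parameter sets and clears the gate.
    pub fn should_submit(&mut self, au: &AccessUnit) -> bool {
        if self.waiting_for_idr {
            if !au.is_idr {
                return false;
            }
            self.waiting_for_idr = false;
        }
        true
    }
}
```

The gate holds no decoder state itself, so it should be equally usable in front of a freshly recreated session.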
## Present: CAMetalLayer + display link
- `CAMetalLayer` (device = system default, `pixelFormat = .bgra8Unorm`, `framebufferOnly = true`,
`drawableSize` = stream WxH). The view: macOS `NSView`/iOS `UIView` whose `layerClass`/backing layer
is the `CAMetalLayer` (mirror `StreamView`/`StreamViewIOS`).
- **Display link** drives present: macOS `CVDisplayLink` (or `CADisplayLink` on macOS 14+),
iOS/tvOS `CADisplayLink`. Each callback carries the **target present timestamp** (`CVTimeStamp` /
`targetTimestamp`).
- Each vsync: pop the **newest** ready frame (drop older undisplayed ones — low-latency default; no
smoothing buffer to start), render a fullscreen quad sampling the **biplanar YUV** (luma +
chroma planes via `CVMetalTextureCache`) with a BT.709 YUV→RGB fragment shader, then
`commandBuffer.present(drawable)` (or `present(drawable, atTime:)`). **Stamp present time** for the
frame just shown (use the display link's target timestamp converted to `CLOCK_REALTIME`).
- Colorspace: BT.709 8-bit for now (matches the host's SDR). HDR (BT.2020/PQ, 10-bit `…10BiPlanar` +
EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`) is a later tie-in with the HDR roadmap (§10).
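The newest-ready policy above can be sketched as a lock-guarded bounded queue. This is an illustrative Rust model, not the Swift implementation; `ReadyFrame`, `ReadyQueue`, and the generic `F` (standing in for the retained pixel buffer) are hypothetical, and it assumes frames arrive in presentation order.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical decoded-frame record; `F` stands in for the retained pixel buffer.
#[derive(Debug)]
pub struct ReadyFrame<F> {
    pub pts_ns: u64,
    pub decoded_ns: u64,
    pub pixels: F,
}

/// Lock-guarded bounded queue between the decode callback and the display link.
pub struct ReadyQueue<F> {
    inner: Mutex<VecDeque<ReadyFrame<F>>>,
    capacity: usize,
}

impl<F> ReadyQueue<F> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { inner: Mutex::new(VecDeque::with_capacity(capacity)), capacity }
    }

    /// Decode side: push a frame, evicting the oldest if the ring is full.
    pub fn push(&self, frame: ReadyFrame<F>) {
        let mut q = self.inner.lock().unwrap();
        if q.len() == self.capacity {
            q.pop_front();
        }
        q.push_back(frame);
    }

    /// Present side: take the newest frame and discard older undisplayed ones.
    /// Returns the frame (if any) and how many stale frames were dropped.
    pub fn pop_newest(&self) -> (Option<ReadyFrame<F>>, usize) {
        let mut q = self.inner.lock().unwrap();
        let newest = q.pop_back();
        let dropped = q.len();
        q.clear();
        (newest, dropped)
    }
}
```

Returning the dropped count is a design choice for this sketch: it would let a HUD show how often the display link skips frames without adding a smoothing buffer.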
### Cheaper intermediate (2a) if the Metal path is too big in one step
Decode with `VTDecompressionSession` (gets the **decode-completion timestamp** = capture→decoded),
then wrap the decoded `CVPixelBuffer` in a `CMSampleBuffer` and `enqueue` it into the existing
`AVSampleBufferDisplayLayer` (it accepts uncompressed pixel buffers too). This yields the decode term
**without** a Metal renderer — but **not** true present (the layer still presents internally). Ship 2a
first if useful; 2b (CAMetalLayer + display link) is required for the on-glass present stamp.
## Measurement (the whole point)
Extend `LatencyMeter` (or add per-stage meters) so each frame records three instants, all
`CLOCK_REALTIME` ns, all shifted by `connection.clockOffsetNs` to the host clock:
- **capture→decoded** = `decodedNs + offset − pts_ns` (VideoToolbox decode latency, cross-machine)
- **decode→present** = `presentedNs − decodedNs` (the present tail stage-2 shortens)
- **capture→present** = `presentedNs + offset − pts_ns`: **the glass-to-glass number** (modulo the
host render→capture term, still unmeasured; see roadmap §12)
Surface `capture→present` p50/p95 in the HUD (extend the existing `model.latency*` line in
`ContentView`). `skewCorrected` stays false when `clockOffsetNs == 0` (old host) — then the numbers are
same-host-only, as today.
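As a worked example of the three stage figures, here is a minimal Rust sketch of the arithmetic. `FrameStamps`, `StageLatencies`, and both functions are hypothetical names, and the nearest-rank percentile is only one possible way to produce p50/p95.

```rust
/// Per-frame instants in nanoseconds. `pts_ns` is on the host capture clock;
/// `decoded_ns` and `presented_ns` are client CLOCK_REALTIME readings.
#[derive(Debug, Clone, Copy)]
pub struct FrameStamps {
    pub pts_ns: i64,
    pub decoded_ns: i64,
    pub presented_ns: i64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StageLatencies {
    pub capture_to_decoded_ns: i64,
    pub decode_to_present_ns: i64,
    pub capture_to_present_ns: i64,
}

/// `clock_offset_ns` is host minus client, so adding it moves a client
/// timestamp onto the host clock before subtracting the capture time.
pub fn stage_latencies(s: FrameStamps, clock_offset_ns: i64) -> StageLatencies {
    StageLatencies {
        capture_to_decoded_ns: s.decoded_ns + clock_offset_ns - s.pts_ns,
        // Both instants are client-side, so the offset cancels out.
        decode_to_present_ns: s.presented_ns - s.decoded_ns,
        capture_to_present_ns: s.presented_ns + clock_offset_ns - s.pts_ns,
    }
}

/// Nearest-rank percentile (assumed method); `p` is in 0..=100.
pub fn percentile(samples: &[i64], p: f64) -> Option<i64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

With an offset of zero the same code still runs, but the capture-relative figures are then only meaningful when host and client share a clock.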
## Validation
- `swift test`: add a decode-output test (decode a known IDR built like
`VideoToolboxRoundTripTests` → assert a `CVPixelBuffer` of the right dimensions + the
decode callback fires). Present is display-bound — validate it **live** via the HUD number.
- Live: connect to a Linux host (`punktfunk1-host --source virtual` on the GNOME box; see
[Ubuntu — GNOME](/docs/ubuntu-gnome)), confirm `capture→present` is a few ms over `capture→client`
and that `decode→present` shrank vs. an `AVSampleBufferDisplayLayer` baseline.
- Compare against the headless reference number: `punktfunk-probe` reports skew-corrected
capture→reassembled (~1.3 ms p50 GNOME box → dev box); capture→present should be that **+ decode +
present**.
## Gotchas
- VT decode is **async**; the output callback runs on a VT-managed thread — don't block it, just stamp
+ enqueue. Retain the `CVPixelBuffer` until presented (the ring owns it).
- `VTDecompressionSessionDecodeFrame` wants the **same** `CMSampleBuffer` shape stage-1 builds (AVCC
length-prefixed NALs, in-band parameter sets in the format desc, never as extradata).
- `CAMetalLayer.drawableSize` must track mode changes (the host can `Reconfigure` mid-stream — watch
`PunktfunkConnection.mode`/the new-IDR dimensions).
- Don't add a jitter/smoothing buffer for the first cut — present newest-ready for lowest latency; a
pacing policy can come later if frames look uneven.
- Keep `clients/apple/README.md`'s "Stage 2" item + [Status](/docs/status) updated when this lands.
-121
@@ -1,121 +0,0 @@
---
title: "CI & Docker"
description: "Gitea Actions setup — workflows, the dockerized pieces, and the runners."
---
CI runs on **Gitea Actions** (`git.unom.io`, org `unom`). Four workflows in
`.gitea/workflows/`, two runners, three images in the Gitea container registry.
## Workflows
| Workflow | Trigger | Runner | What it does |
|---|---|---|---|
| `ci.yml` | push to `main`, PRs | `ubuntu-24.04` | Rust workspace (fmt · clippy `-D warnings` · build · test · C-ABI harness · generated-header drift) inside the `punktfunk-rust-ci` image; `web/` and `docs-site/` build + typecheck in `oven/bun:1` |
| `docker.yml` | push to `main`, `v*` tags, manual | `ubuntu-24.04` | Builds + pushes the three images below (`latest` + `sha-<short>` tags) |
| `apple.yml` | push to `main`, PRs, manual | `macos-arm64` | Rust core → `PunktfunkCore.xcframework` → `swift build` + `swift test` in `clients/apple` |
| `release.yml` | `v*` tags, manual | `macos-arm64` | Production Apple builds: sandboxed macOS `.dmg` (Developer ID, notarized, stapled) attached to the Gitea release + macOS/iOS/tvOS archives uploaded to TestFlight |
## Dockerized pieces
The host and the native clients are intentionally **not** containerized (the host needs
the GPU/compositor stack of the box it runs on). What is:
| Image | Source | Notes |
|---|---|---|
| `git.unom.io/unom/punktfunk-web` | `web/Dockerfile` (repo-root context — orval needs `docs/api/openapi.json`) | Nitro `bun` bundle; `PORT` (3000) and `PUNKTFUNK_MGMT_URL` env at runtime |
| `git.unom.io/unom/punktfunk-docs` | `docs-site/Dockerfile` | This site; `PORT` (3000) |
| `git.unom.io/unom/punktfunk-rust-ci` | `ci/rust-ci.Dockerfile` | Ubuntu 26.04 + FFmpeg 8/PipeWire/GL/GBM dev libs + a libcuda **link stub** (driver userspace, no kernel module) + pinned rustup — the container `ci.yml`'s Rust job runs in |
Registry pushes authenticate with the repo Actions secret **`REGISTRY_TOKEN`** (a PAT
with `write:package`; the login username in `docker.yml` is the token owner, not the
push actor).
## Runners
- **`ubuntu-24.04`** — the pre-existing Linux runner; runs the Rust/web/docs jobs (as
docker containers) and the image build+push jobs.
- **`macos-arm64`** — `home-mac-mini-1` (M-series, macOS 26), a **host-mode**
`act_runner` (upstream now ships it as `gitea-runner`) provisioned by
[`scripts/ci/setup-macos-runner.sh`](https://git.unom.io/unom/punktfunk/src/branch/main/scripts/ci/setup-macos-runner.sh):
rustup (+ both darwin targets for the universal xcframework), Node.js (host-mode runners
execute JS actions via `node` from PATH — nothing auto-provisions it), the runner binary
in `~/.local/bin`, state in `~/ci/act-runner/` (config, `.runner` registration,
`runner.log`), kept alive by the `io.gitea.act_runner` **root LaunchDaemon** — it cannot
be a user LaunchAgent: macOS Local Network privacy silently blocks LAN dials
("no route to host") from unbundled CLI binaries in gui/user launchd domains, while
system daemons are exempt. Needs full **Xcode** for `xcodebuild -create-xcframework`
(CLT alone only covers `swift build/test`); if `xcode-select` still points at CLT, the
script auto-detects `/Applications/Xcode*.app` and bakes a `DEVELOPER_DIR` override into
the daemon environment — no `xcode-select -s` required.
Re-provisioning (idempotent) or first-time registration from a dev box:
```sh
# token: org unom → Settings → Actions → Runners → Create new runner
ssh enricobuehler@192.168.1.135 GITEA_RUNNER_TOKEN=<token> bash -s \
< scripts/ci/setup-macos-runner.sh
```
## Apple releases
`release.yml` produces the production client builds on the Mac runner. All three app
targets share the bundle ID **`io.unom.punktfunk`** (one App Store listing, universal
purchase — effectively unchangeable after first submission). Signing is **not** secret-based:
the runner uses its **login keychain** directly, so install the **Developer ID Application**,
**Apple Distribution**, and (for the Mac App Store `.pkg`) **3rd Party Mac Developer
Installer** identities once via Xcode, with the WWDR intermediate present so they show as
valid. The only secrets are `ASC_API_KEY_P8`/`ASC_API_KEY_ID`/`ASC_API_ISSUER_ID` (App Store
Connect API key — notarization + TestFlight upload). Per-platform state:
- **macOS (Developer ID)** — sandboxed app (`Config/Punktfunk-macOS.entitlements`) → export
→ `notarytool` → stapled `.dmg` on the Gitea release.
- **macOS (App Store)** — manual-signed archive (Apple Distribution + the *Punktfunk macOS
App Store Distribution* profile) → upload to TestFlight. App Sandbox is **mandatory** here
and is now declared (app-sandbox + network client/server + audio-input + bluetooth/usb).
Prereqs (one-time, Apple portal): add the **macOS platform** to the App Store Connect app
record (universal purchase), install the Mac App Store distribution profile + the installer
cert above. `continue-on-error` until those exist.
- **iOS** — archive + upload to TestFlight (`method: app-store-connect`,
`destination: upload`). Crypto is declared exempt (`ITSAppUsesNonExemptEncryption`,
`Config/Info.plist`) so builds don't stall on the compliance question.
- **tvOS** — archive + upload to TestFlight (Rust core built from tier-3 targets, nightly
`-Zbuild-std` via `build-xcframework.sh`).
Entitlements are split by platform: `Config/Punktfunk-macOS.entitlements` (App
Sandbox is macOS-only) for the macOS app, and the shared `Config/Punktfunk.entitlements`
(keychain-access-groups only) for iOS/tvOS — `com.apple.security.app-sandbox` is invalid on
iOS/tvOS and would fail upload validation.
The runner needs a **release (non-beta) Xcode** — App Store processing rejects beta-SDK
builds, and a beta is unusable for the Rust side too: a newer-than-OS ld emits dylibs the
running dyld rejects ("mis-aligned LINKEDIT string pool"), killing every proc-macro build
with a misleading `E0463 can't find crate`. `build-xcframework.sh` therefore resolves
toolchains itself: non-beta Xcode for everything; with only CLT + a beta present it
builds macOS slices against CLT (packaging via any Xcode — `-create-xcframework` does no
linking) and **refuses iOS/tvOS slices** (CLT has no iOS SDK).
## Deployment
`docker.yml`'s `deploy-docs` job ships this docs site after every image push: it syncs
`compose.production.yml` to `~/punktfunk-docs` on **unom-1** (the DMZ services VM
website and cms deploy to) and runs `docker compose pull && up -d` there over SSH (same
pattern and secret set as `unom/website`: `DEPLOY_HOST` / `DEPLOY_USER` / `DEPLOY_PORT` /
`DEPLOY_SSH_KEY`, the `unom-ci-deploy` key). The container binds host port **3220**;
Caddy on `home-reverse-proxy-1` serves it as <https://docs.punktfunk.unom.io> (vhost in
`unom/reverse-proxy`, UniFi firewall allowlist Caddy→unom-1:3220 in `unom/infra`
`proxmox/unom-1`). The host and the web console are NOT deployed — the console
fronts a punktfunk host's management API on whatever box runs the host.
## Troubleshooting
- **Mac runner offline** — `ssh <mac> tail -50 '~/ci/act-runner/runner.log'`; restart with
`sudo launchctl kickstart -k system/io.gitea.act_runner`. "no route to host" in the log
means the daemon is running in a gui/user domain again — see the Local Network note
above.
- **`apple.yml` fails at the xcframework step** — Xcode missing or unselected:
`sudo xcode-select -s /Applications/Xcode.app/Contents/Developer` and accept the license
(`sudo xcodebuild -license accept`), then re-run.
- **Rust job can't pull `punktfunk-rust-ci`** — the runner host's docker daemon needs a
`docker login git.unom.io` if the org/registry isn't anonymously readable.
- **Stale builder image after toolchain/dep changes** — `docker.yml` re-pushes it on every
`main` push; a manual `workflow_dispatch` of `docker.yml` forces a rebuild.
+34 -14
@@ -3,8 +3,9 @@ title: Clients
description: The ways to connect to a punktfunk host — the Apple app, Moonlight, or the Linux client.
---
A punktfunk host accepts clients over its own `punktfunk/1` protocol (the Apple and Linux apps) and
over GameStream (Moonlight). Pick whichever fits the device you're streaming *to*. Ready to install?
A punktfunk host accepts clients over its own `punktfunk/1` protocol (the macOS, Linux, Windows, and
Android apps) and over GameStream (Moonlight). Pick whichever fits the device you're streaming *to*.
Ready to install?
**[Install a Client](/docs/install-client)** has the step-by-step for every device.
## Apple app (Mac, iPhone, iPad, Apple TV)
@@ -24,8 +25,9 @@ Open the app, pick your host, [pair](/docs/pairing) once, and stream. It builds
## Moonlight (anything else)
punktfunk also speaks the **GameStream** protocol, so any [Moonlight](https://moonlight-stream.org/)
client — Windows, Android, Steam Deck, a browser, an old phone — connects with no punktfunk-specific
software. See [Connect with Moonlight](/docs/moonlight).
client — a browser, a smart TV, an old phone, a games console — connects with no punktfunk-specific
software. (Most platforms also have a native punktfunk app below — Moonlight is the catch-all.) See
[Connect with Moonlight](/docs/moonlight).
This is the broadest-compatibility option and great for couch gaming. It doesn't use the native
protocol's FEC/encryption extensions, but for a healthy LAN that rarely matters.
@@ -54,15 +56,31 @@ connect straight away:
punktfunk-client --connect <host>:9777 # skip the picker, start a session immediately
```
## Windows desktop client (in development)
## Android app (phone + Android TV)
`punktfunk-client` for Windows (`clients/windows`) is the native graphical client
for Windows — pure Rust, the same `punktfunk/1` core as the Apple and Linux apps, with a **WinUI 3**
UI (host list, settings, PIN pairing) and the video on a `SwapChainPanel`, plus WASAPI audio, FFmpeg
decode, SDL3 controllers, network discovery, and PIN pairing. Launch it and pick a host from the
list, just like the Apple and Linux apps. It builds on `x86_64-pc-windows-msvc`; hardware (D3D11VA)
decode, 10-bit/HDR present, and packaging are in progress, so it is not yet shipped. A headless CLI
path exists for scripting/measurement:
The native Android app speaks `punktfunk/1` directly, on both phones and Android TV. It does hardware
HEVC decode (including HDR10), Opus audio with a mic uplink, game controllers with rumble and
DualSense feedback, automatic host discovery, PIN pairing with pinned reconnects, and a live stats
overlay — with D-pad and game-controller focus navigation for the couch. It builds from the
`clients/android` directory (Kotlin + a shared Rust core).
Install it from **Google Play** — see [Install a Client](/docs/install-client#android). Open the app,
pick your host, [pair](/docs/pairing) once, and stream.
## Windows desktop client
`punktfunk-client` for Windows (`clients/windows`) is the native graphical client for Windows — pure
Rust, the same `punktfunk/1` core as the Apple, Linux, and Android apps, with a **WinUI 3** UI (host
list, settings, PIN pairing) and the video on a `SwapChainPanel`. It does D3D11VA hardware decode
(software fallback), 10-bit/HDR present, WASAPI audio + mic, SDL3 controllers (rumble, lightbar,
DualSense), network discovery, and the full PIN-pairing trust surface. It builds for both `x86_64`
and `aarch64` and ships as a **signed MSIX**. Launch it and pick a host from the list, just like the
other native apps.
> The hardware-decode and HDR paths are complete but still pending validation on real GPU hardware.
> If anything misbehaves, **[Moonlight](/docs/moonlight)** is a proven alternative for Windows.
A headless CLI path exists for scripting/measurement:
```sh
punktfunk-client # open the WinUI 3 window (host list / settings)
@@ -70,7 +88,7 @@ punktfunk-client --discover # list hosts on the network
punktfunk-client --headless --connect <host>:9777 # no window: connect, count frames, print stats
```
Until it ships, **Moonlight** remains the recommended way to stream to Windows (see below).
Prefer the broadest compatibility, or no install? **Moonlight** also streams to Windows (see below).
## Linux reference client (headless)
@@ -89,7 +107,9 @@ punktfunk-probe --connect <host>:9777 --pin <fp> # connect to one
|---|---|
| A Mac, iPhone, iPad, or Apple TV | The **Apple app** |
| A Linux desktop or laptop, or a Steam Deck | **`punktfunk-client`** (GTK4) |
| Windows, Android, a browser, a TV | **Moonlight** |
| An Android phone or TV | The **Android app** |
| Windows | The native **`punktfunk-client`** (signed MSIX) or **Moonlight** |
| A browser, a smart TV, or any other device | **Moonlight** |
| Automated tests / latency measurement | **`punktfunk-probe`** (headless) |
Whichever you choose, the first connection needs a one-time [pairing](/docs/pairing).
+4 -4
@@ -30,15 +30,15 @@ These tell the host which desktop session to attach to. Your setup guide sets th
You don't set these on the host — **the client chooses them**. When a device connects, the host
creates a virtual display at that device's resolution and refresh rate. A 1080p60 laptop and a
1440p120 desktop each get their own. (With Moonlight, set the mode in Moonlight's settings; with the
Apple app, it uses the device's display.)
1440p120 desktop each get their own. (With Moonlight, set the mode in Moonlight's settings; the
native clients let you pick a mode or default to the device's display.)
## Bitrate
The client requests a bitrate; the host encodes to it. To find a good value for your link:
- **Apple app:** use the built-in **speed test** (a host card's menu → *Test Network Speed*). It
measures your link and suggests a bitrate, then applies it.
- **Native clients (Apple, Linux, and more):** use the built-in **speed test** (from a host's menu).
It measures your link, suggests a bitrate, and applies it.
- **Moonlight:** set the bitrate in Moonlight's settings. Start moderate and raise it.
## Multiple devices at once
-111
@@ -1,111 +0,0 @@
---
title: "DualSense Haptics"
description: "Feasibility and scoping for audio-driven DualSense haptics."
---
**Status: scoped, NO-GO for now (deferred).** Advanced voice-coil haptics on the DualSense are
driven by the controller's **USB audio interface** (4-channel surround, the back two channels carry
the haptic waveform), *not* by HID reports. Emulating that on a Linux host and faithfully replaying
it on the Apple client both hit hard walls, and the supply of software that actually *emits* these
haptics on a Linux host is essentially zero. We defer the audio-haptics feature and instead land the
parts of "really supporting the DualSense" that *are* reachable: **adaptive triggers (HID) and
two-motor rumble.**
(Grounded in a 4-agent feasibility read — host USB-gadget viability, DualSense audio descriptors,
Linux game demand, Apple client render path — 2026-06-10.)
## The one distinction that decides everything
| Feature | How it's driven | Reachable for us? |
|---|---|---|
| Basic rumble (2 motors) | HID output report `0x02`, bytes 3–4 | **Yes**: already parsed; client already has `nextRumble()` |
| **Adaptive triggers** (L2/R2 resistance) | HID output report `0x02`, bytes 11–22 / 22–33 | **Yes**: already parsed in `dualsense.rs`; just needs the `0xCD` back-channel + client render |
| **Advanced haptics** (voice-coil actuators) | **USB *audio* interface** — 4-ch, back 2 channels = haptic PCM | **No (for now)** — see the three walls below |
The UHID DualSense we already built is **HID-only**. It cannot present the DualSense's *audio*
interface, so it structurally cannot carry advanced haptics. That's not a bug in our implementation —
it's the wrong transport for this signal.
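To make the HID-only half concrete, here is a minimal Rust sketch that slices the rumble and trigger bytes out of an output report `0x02` using the offsets from the table above. It is not the code in `dualsense.rs`. It assumes byte 0 carries the report ID and reads the two trigger ranges as half-open 11-byte blocks (11..22 and 22..33); which block belongs to L2 or R2, and which rumble byte drives which motor, is deliberately left unnamed because the table does not say.

```rust
/// Report ID from the table above.
pub const REPORT_ID: u8 = 0x02;
/// Assumed offsets, with byte 0 taken to be the report ID.
const RUMBLE_OFFSET: usize = 3;
const TRIGGER_OFFSETS: [usize; 2] = [11, 22];
/// Assumed block length: the table's ranges read as half-open 11-byte spans.
const TRIGGER_LEN: usize = 11;

/// Hypothetical parsed view of the HID-reachable fields.
#[derive(Debug, PartialEq, Eq)]
pub struct OutputReport {
    /// The two rumble bytes, in report order.
    pub rumble: [u8; 2],
    /// The two raw trigger-effect blocks, in report order.
    pub trigger_blocks: [[u8; TRIGGER_LEN]; 2],
}

/// Returns `None` for a short buffer or a different report ID.
pub fn parse_output_report(buf: &[u8]) -> Option<OutputReport> {
    let min_len = TRIGGER_OFFSETS[1] + TRIGGER_LEN;
    if buf.len() < min_len || buf[0] != REPORT_ID {
        return None;
    }
    let rumble = [buf[RUMBLE_OFFSET], buf[RUMBLE_OFFSET + 1]];
    let mut trigger_blocks = [[0u8; TRIGGER_LEN]; 2];
    for (block, &off) in trigger_blocks.iter_mut().zip(TRIGGER_OFFSETS.iter()) {
        block.copy_from_slice(&buf[off..off + TRIGGER_LEN]);
    }
    Some(OutputReport { rumble, trigger_blocks })
}
```

Nothing in this parse touches an audio interface, which is why rumble and adaptive triggers stay reachable over UHID while the voice-coil waveform does not.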
## The three walls (any one is fatal on its own)
### Wall 1 — Host capture needs a kernel rebuild
To *capture* haptic audio a game emits, the host must present a virtual device that owns the
DualSense audio interface. The standard way is a composite USB gadget (`configfs` + `f_hid` +
`f_uac2`) bound to a software UDC (`dummy_hcd`).
- ✅ Present & enabled on this box: `CONFIG_USB_CONFIGFS`, `CONFIG_USB_CONFIGFS_F_HID`,
`CONFIG_USB_CONFIGFS_F_UAC2`, plus `libcomposite`/`usb_f_hid`/`usb_f_uac2`/`u_audio` modules.
- ❌ **Blocker:** `# CONFIG_USB_DUMMY_HCD is not set` in `/boot/config-7.0.0-22-generic`. No
`dummy_hcd.ko`, no `/sys/class/udc/`. **No UDC → nothing to bind the gadget to.** Requires a
custom kernel build to enable `CONFIG_USB_DUMMY_HCD=m`, plus root for module-load/configfs.
A lighter alternative exists — a **virtual PipeWire/ALSA sink renamed as the DualSense** (this is how
the working Linux setups capture the back-2-channels today, via WirePlumber rules). It skips the
kernel rebuild, but is gated by the same Wall 2 below, and games' audio-device detection is
hardcoded per-title so it's fragile.
### Wall 2 — Almost nothing on a Linux host emits these haptics
This is the decisive one. The *supply* that would feed our capture barely exists:
- **Steam Input (Linux):** no official advanced-haptics support (open feature request as of 2026).
- **Sony's `hid-playstation` kernel driver:** explicitly does **not** expose VCM haptics or adaptive
triggers — basic rumble only.
- **RPCS3:** treats the DualSense as a generic pad; no advanced haptics.
- **Native Linux games:** effectively **zero** with advanced haptics.
- **The only working path** is a handful of Proton titles (FF7 Remake, Ghostwire, Deathloop, Animal
Well, Stellar Blade) via ClearlyClaire's *custom Wine patches*, **USB-only**, Steam Input
disabled, forced into a 4.0-surround profile, device renamed to match Windows. ~5–10 games total.
- Bluetooth can't carry it on Linux *or* Windows (Sony's proprietary A2DP repurposing isn't exposed).
A host-side capture feature is only as useful as the software willing to drive it. On Linux that set
is a niche-of-a-niche.
### Wall 3 — The Apple client can't faithfully replay it
Even with a captured waveform, the primary client (macOS/iOS) can't render it well:
- macOS GameController exposes the DualSense as a **basic gamepad** — no voice-coil / adaptive-trigger
access. Those are PS5-only in Apple's stack.
- CoreHaptics is **discrete, pattern-based** (`CHHapticPattern` events, ≤30 s), **not** a PCM
streaming sink. Converting a streamed haptic waveform to patterns is lossy — it throws away exactly
the fidelity that makes voice-coil haptics worth having.
- There is **no public macOS API** to route CoreAudio to the DualSense's channels 3–4. Doing it
anyway means private/reverse-engineered APIs that break across OS updates.
## What we *can* ship instead ("really supporting the DualSense" minus audio haptics)
The HID DualSense we built is the foundation, and the high-value parts are within reach:
1. **Adaptive triggers — GO.** `dualsense.rs` already parses the L2/R2 trigger effects out of HID
output report `0x02`. Finishing this is the paused HID work: route them over the `0xCD`
HID-output back-channel and render on the client. This delivers the headline "DualSense feel"
(trigger resistance/weapon tension) for any source that emits it — and it's pure HID, no audio
interface, no kernel rebuild.
2. **Two-motor rumble — already done.** Parsed host-side; the Apple client already has
`nextRumble()`. Wire it to `GCDeviceHaptics`/`CHHapticEngine` as discrete patterns (API-clean,
no private APIs).
3. **LED / player-LED / touchpad / motion** — already parsed; finish the `0xCC`/`0xCD` routing.
This is the resume-able HID DualSense Phase C/D/E work — it stands on its own and was never blocked.
## Conditions for a future GO on audio haptics
Revisit if **all three** change:
- A real DualSense is available on the dev box to capture an authoritative `lsusb -v` + the exact
UAC channel/sample-rate/format layout (today: undocumented, would need reverse-engineering).
- The host target gains a UDC (custom kernel with `dummy_hcd`, or real hardware OTG) **or** we accept
the PipeWire-renamed-sink path *and* the title set that emits haptics on Linux grows beyond the
Proton-patch niche.
- The client target shifts to one that can render PCM haptics (a Linux/Windows client with direct
CoreAudio-style channel access, or a future Apple API) — or we accept lossy pattern conversion.
Until then the cost/benefit is upside-down: three hard subsystems (kernel, USB gadget, audio
routing) to serve ~5–10 Proton titles, rendered lossily on the one client we ship.
## Recommendation
**Defer audio-driven advanced haptics. Land adaptive triggers (HID) + rumble instead** — that's the
reachable 80% of "really supporting the DualSense," needs no kernel work, and the parsing is already
written. Keep this doc as the down payment for the audio-haptics feature whenever the three
conditions above are met.
@@ -1,73 +0,0 @@
---
title: "gamescope Multi-User Isolation (deferred)"
description: "Research + design for concurrent INDEPENDENT gamescope desktops (multi-user), and why it's deferred. The shared-desktop multi-view case already landed."
---
**Status: deferred (2026-06-12).** Concurrent sessions landed for the **shared-desktop multi-view**
case — multiple devices viewing/controlling the *same* KWin/Mutter/wlroots desktop ([Status](/docs/status)).
This page captures the research for the *other* model — **independent desktops** (each client its own
gamescope instance: the multi-user / cloud-gaming-on-one-box case) — and why it's parked. Pick this
up from here if the use case becomes a priority.
## What landed vs what this is
| Model | Backends | Input | Audio | Status |
|---|---|---|---|---|
| **Shared-desktop multi-view** | kwin / mutter / wlroots | shared (all drive one desktop) | shared (all hear one desktop) | ✅ **landed** — correct semantics: stream *your* desktop to laptop + TV at once |
| **Independent desktops (multi-user)** | gamescope | **per-session** (each drives its own game) | **per-session** | ⏸ **deferred** — this page |
For independent desktops, shared input/audio is *wrong* — each user must drive and hear only their own
session. gamescope is the natural fit: each `create()` spawns a fresh nested compositor (own
rendering, own EIS input socket). The blocker is that the host's input/audio/mic are host-lifetime
**shared** services, and the gamescope EIS socket is relayed through a single global file.
## Current architecture (the research)
Each gamescope **process is per-session** (`vdisplay/gamescope.rs::create()` spawns one; the
`VirtualOutput.keepalive` owns it). But:
- **EIS input socket — single global file.** gamescope exports `LIBEI_SOCKET` for its children; a
shell wrapper relays it to the fixed path `/tmp/punktfunk-gamescope-ei` (`EI_SOCKET_FILE`).
**Two concurrent instances overwrite each other's socket name** in that one file.
- **Injector — one host-lifetime `!Send` service.** `punktfunk1.rs::InjectorService` opens **one**
`inject::open(backend)` for the whole run and forwards events over an mpsc channel. It was made
shared deliberately (the portal `CreateSession` churn wedged KWin's EIS — "EIS setup timed out").
For gamescope it reads the one global socket file, so all sessions' input lands in whichever
instance wrote last.
- **Audio — global default-sink monitor.** `audio::open_audio_capture()` sets
`STREAM_CAPTURE_SINK` and autoconnects to the host's **default sink monitor** (PW_ID_ANY) — the
whole system's output, not a per-gamescope node. gamescope exposes **no per-instance audio node**.
- **Mic — one global `Audio/Source`.** `MicService` feeds one PipeWire source named `punktfunk-mic`;
all clients' mic uplinks mix into it.
- Per-session already (no work): the gamescope process, the PipeWire video node, and the uinput
gamepads.
## What it would take
1. **Per-instance EIS socket** — give each gamescope a unique relay file
(`/tmp/punktfunk-gamescope-{id}-ei`) and carry the path on `VirtualOutput` (new field) so the
session can find its own socket.
2. **Per-session injector** — for gamescope sessions, create a **per-session** injector bound to that
socket (its own thread, since `InputInjector` is `!Send`), instead of the shared `InjectorService`.
Keep the shared service for the portal backends (kwin/mutter) where shared input is correct.
Ordering nuance: the input thread is wired before the gamescope socket exists, so the per-session
injector must open **lazily** (on first event, by which time gamescope is up) or be created after
`build_pipeline`.
3. **Per-session audio (the bigger piece).** gamescope has no per-instance audio node, but audio
*is* isolatable: create a **per-session PipeWire null-sink**, route that gamescope's apps to it
(`PULSE_SINK` / a target node on the spawn env), and capture **that sink's monitor** per session.
This is the largest addition — null-sink create/teardown + routing + per-session capture.
4. **Per-session mic** — a virtual `Audio/Source` per session (`punktfunk-mic-{id}`), routed into
that gamescope, instead of the one global source.
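A minimal sketch of steps 1, 2, and 4 above, under stated assumptions: `session_id`, `ei_relay_path`, `mic_source_name`, and `LazyInjector` are hypothetical names for illustration, and the generic `I` stands in for the real `InputInjector` (whose API is not shown here). It only illustrates the per-session naming and the lazy-open ordering, not the actual injector wiring.

```rust
use std::io;
use std::path::PathBuf;

/// Per-instance EIS relay file (step 1). `session_id` is a hypothetical identifier.
pub fn ei_relay_path(session_id: u32) -> PathBuf {
    PathBuf::from(format!("/tmp/punktfunk-gamescope-{session_id}-ei"))
}

/// Per-session virtual mic source name (step 4).
pub fn mic_source_name(session_id: u32) -> String {
    format!("punktfunk-mic-{session_id}")
}

/// Opens the injector on first use (step 2), so the input thread can be wired
/// before the gamescope socket exists. `I` is a stand-in for the real injector.
pub struct LazyInjector<I, F> {
    open: F,
    inner: Option<I>,
}

impl<I, F: FnMut() -> io::Result<I>> LazyInjector<I, F> {
    pub fn new(open: F) -> Self {
        Self { open, inner: None }
    }

    /// Returns the injector, opening it on the first call. If opening fails,
    /// nothing is cached, so the next event simply retries.
    pub fn get(&mut self) -> io::Result<&mut I> {
        if self.inner.is_none() {
            self.inner = Some((self.open)()?);
        }
        Ok(self.inner.as_mut().expect("injector was just opened"))
    }
}
```

Because the value is created and used on the thread that owns the `LazyInjector`, this shape is compatible with a `!Send` injector living on its own per-session thread.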
## Why deferred
- It's a **large multi-file refactor** — the whole input path (per-instance sockets + per-session
injector + the lazy-open ordering), **plus** per-session null-sink audio routing, **plus** per-session
mic — for a **niche** use case (multiple independent users gaming on one box).
- The **common** concurrency case — stream one desktop to several of *your own* devices — is the
shared-desktop multi-view model, which **already landed and is the correct semantics** for it.
- No correctness gap in what shipped: concurrent sessions work today; this is purely the *additional*
independent-desktops model.
Revisit when there's a real multi-user requirement. The plumbing list above is the whole job.
@@ -1,95 +0,0 @@
---
title: "GameStream Host"
description: "Stream to a stock Moonlight client on a client-sized virtual display."
---
The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host,
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display.
Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
(distilled from Sunshine + moonlight-common-c source; cite those for byte-level detail).
## Architecture (respects the "one core" invariant)
- **punktfunk-core** gains a **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`, the
hook already exists): the exact RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard
layout, and the video/audio AES-GCM/CBC paths. Hot path, native threads, **no async**.
Kept beside punktfunk's native internal format (P2), selected by phase.
- **punktfunk-host** gains the **control plane** (tokio/axum OK — I/O-bound, not the hot path):
mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet
control stream + input injection, the virtual-display lifecycle, and Opus audio encode.
## Port map (base 47989; Moonlight derives all by offset)
| Port | Proto | Role |
|---|---|---|
| 47989 | TCP | HTTP nvhttp (unpaired: /serverinfo, /pair PIN flow) |
| 47984 | TCP | HTTPS nvhttp (paired; **client-cert pinned**) — /launch, /resume, … |
| 48010 | TCP | RTSP (OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY) |
| 47998 | UDP | Video RTP (+ RS-FEC, optional AES-GCM) |
| 47999 | UDP | Audio RTP (Opus, RS-FEC 4+2, optional **AES-CBC**) |
| 48000 | UDP | ENet control stream (AES-GCM) + remote input |
| 5353 | UDP | mDNS `_nvstream._tcp.local` advertisement |
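As a rough illustration of "derives all by offset", the sketch below computes the ports from the base. The offsets are simply read off the table above (for example 47984 = base − 5, 48010 = base + 21); `GameStreamPorts` and `ports_from_base` are hypothetical names, not existing host APIs.

```rust
/// GameStream ports derived from the base HTTP port.
/// Offsets are inferred from the port map table, not from a spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GameStreamPorts {
    pub http: u16,
    pub https: u16,
    pub rtsp: u16,
    pub video: u16,
    pub audio: u16,
    pub control: u16,
}

/// Returns `None` if any derived port would fall outside the u16 range.
pub fn ports_from_base(base: u16) -> Option<GameStreamPorts> {
    Some(GameStreamPorts {
        http: base,
        https: base.checked_sub(5)?,
        video: base.checked_add(9)?,
        audio: base.checked_add(10)?,
        control: base.checked_add(11)?,
        rtsp: base.checked_add(21)?,
    })
}
```

Using checked arithmetic keeps a misconfigured base port from silently wrapping into an unrelated port number.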
## Key wire facts (the non-obvious ones)
- **Video datagram** = `RTP_PACKET(12, BIG-endian)` + `reserved[4]` + `NV_VIDEO_PACKET(16,
LITTLE-endian)` + payload. Endianness differs *within the same packet*. `header=0x80|0x10`.
- `fecInfo` (u32 LE) = `(dataShards<<22)|(fecIndex<<12)|(fecPercentage<<4)`; parityShards is
**recomputed** by the client as `ceil(dataShards*pct/100)` — must match exactly.
- `multiFecBlocks` = `(blockIdx<<4)|((nBlocks-1)<<6)`; **≤4 FEC blocks/frame**, ≤255 shards/block.
- Each frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`
(`headerType=0x01`, `frameType` 2=IDR, `lastPayloadLen`) before striping into shards.
- Shard size = `packetSize + 16`. Data shards first, then parity, over a contiguous RTP
sequence range. Last data shard zero-padded.
- **Video crypto** (when `SS_ENC_VIDEO` negotiated): AES-128-GCM, key = raw 16-byte RIKEY
(from `/launch?rikey=`), IV = `counter_le[8]||0,0,0||'V'(0x56)`, **NO AAD**, 32-byte
`ENC_VIDEO_HEADER{iv[12],frameNumber,tag[16]}` prefix; **FEC first, then encrypt per shard**.
- **Pairing**: PIN key = `SHA-256(salt[16] || ascii_pin)[..16]`; AES-128-**ECB** (no padding)
for the challenge blocks; SHA-256 rolling hashes; RSA-SHA256 signatures over X.509 certs;
the client cert is pinned for subsequent HTTPS. 4 phases over `/pair?phrase=…`.
- **RTSP** `Session: DEADBEEFCAFE;timeout = 90` (literal), `Transport: server_port=<p>`,
`streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
(`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.
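The bit-packing and IV facts above can be sketched as small pure functions. This is an illustration of the formulas as written in the list, not the codec in punktfunk-core: the function names are hypothetical, and treating the IV counter as a `u64` is an assumption based on the `counter_le[8]` notation.

```rust
/// `fecInfo` as given above; serialize the result with `to_le_bytes()`.
pub fn pack_fec_info(data_shards: u32, fec_index: u32, fec_percentage: u32) -> u32 {
    (data_shards << 22) | (fec_index << 12) | (fec_percentage << 4)
}

/// Parity count the client recomputes: ceil(data_shards * pct / 100),
/// done in integer math so host and client cannot disagree on rounding.
pub fn parity_shards(data_shards: u32, fec_percentage: u32) -> u32 {
    (data_shards * fec_percentage + 99) / 100
}

/// `multiFecBlocks`; expects `n_blocks` in 1..=4 and `block_idx < n_blocks`.
pub fn pack_multi_fec_blocks(block_idx: u8, n_blocks: u8) -> u8 {
    debug_assert!((1..=4).contains(&n_blocks) && block_idx < n_blocks);
    (block_idx << 4) | ((n_blocks - 1) << 6)
}

/// 12-byte video GCM IV: counter (8 bytes, little-endian) || 0,0,0 || 'V'.
/// The u64 counter type is an assumption.
pub fn video_iv(counter: u64) -> [u8; 12] {
    let mut iv = [0u8; 12];
    iv[..8].copy_from_slice(&counter.to_le_bytes());
    iv[11] = b'V'; // 0x56; bytes 8..11 stay zero
    iv
}
```

Keeping these as standalone functions makes it straightforward to check them against captured packets from a real Sunshine/Moonlight session before wiring them into the hot path.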
## The two highest interop risks (validate EARLY)
1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly
0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy) — parity
bytes likely **don't match**, so Moonlight silently fails to recover any frame with a lost
data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so
defer this — get a frame decoded first, then FFI/port nanors for loss recovery.
2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
usually off on LAN** — implement plaintext video first, add GCM later.
## Phasing (each phase independently testable with a real Moonlight client)
- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
`/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
pairs (PIN), and shows the host as ready.* ← first slice.
- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
*Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
layout in punktfunk-core; wire the spike's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
(uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
*Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
backend for the user's KDE box. *Acceptance: full game stream with sound — the GameStream-host goal.*
## Crates (verified available)
`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/
`ecb` (pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.
## Testing note
The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at
this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against
`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display.
+1 -1
@@ -17,7 +17,7 @@ punktfunk-host serve --native
| Flag | Meaning |
|---|---|
| `--native` | Also run the native `punktfunk/1` server (recommended; enables the Apple app and discovery). |
| `--native` | Also run the native `punktfunk/1` server (recommended; enables the native clients and discovery). |
| `--native-port <PORT>` | Native QUIC port (default `9777`). |
| `--open` | Don't require pairing — serve any device on the network. Off by default; only for trusted single-user setups. |
| `--mgmt-bind <IP:PORT>` | Management API address (default loopback `127.0.0.1:47990`). |
+4 -2
@@ -39,7 +39,8 @@ punktfunk speaks two protocols over the same host:
no special software. This is the most compatible way in.
- **punktfunk/1 (native)** — a purpose-built protocol with a QUIC control channel and a UDP data
channel hardened with forward error correction and encryption. It's lower-latency and more resilient
on imperfect networks, and it's what the [Apple app](/docs/clients) uses.
on imperfect networks, and it's what the [native clients](/docs/clients) (Apple, Linux, Windows,
Android) use.
Both run from a single host process, so you don't choose up front — Moonlight clients use GameStream,
the native clients use punktfunk/1.
@@ -53,7 +54,8 @@ cryptographic identity — no PIN, no account, no cloud. See [Pairing & Trust](/
## Finding hosts
Hosts advertise themselves on your local network, so clients can **discover** them automatically
instead of needing an IP address. The Apple app and Moonlight both list hosts they find on the LAN.
instead of needing an IP address. The native clients and Moonlight both list hosts they find on the
LAN.
## Multiple devices at once
@@ -1,300 +0,0 @@
---
title: "Implementation Plan"
description: "The full design: protocol core, milestones, and architecture."
---
*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.*
> `punktfunk` is a placeholder codename — rename freely. It fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point.
---
## 0. The thesis (why this is worth building)
Two concrete gaps justify a new project rather than another fork:
1. **The 1 Gbps wall is a FEC design limit, not a bandwidth limit.** Moonlight/Sunshine protect each frame with Reed-Solomon over GF(2⁸), which caps a block at 255 shards. At 5120×1440@240 that ceiling is hit around 1 Gbps. Switching the erasure code to **Leopard-RS over GF(2¹⁶)** (via the `reed-solomon-simd` crate) raises the per-block shard limit to 65,536 and runs in O(n log n) with SIMD. The wall disappears as a *consequence* of a better core, not as a hack.
2. **Linux software virtual displays are a real, unfilled gap.** The compositor-side capability now exists (Mutter headless virtual monitors since GNOME 40; wlroots headless outputs; KWin virtual outputs in Plasma 6), but no streaming host *drives* those APIs to create a client-sized output on demand, capture it via PipeWire, and route input back via libei. Apollo's virtual display is Windows-only. This is the immediate, shippable win.
**Strategic ordering:** ship the Linux virtual-display host speaking the *existing* Moonlight protocol first (every Moonlight/Artemis client works on day one, no client to write). Only then introduce the new GF(2¹⁶) transport as a negotiated protocol extension with our own clients. Value early, hard parts deferred until de-risked.
---
## 1. Scope & non-goals
**In scope (eventually):**
- Linux streaming host with on-demand software virtual displays (KWin first, then wlroots, then Mutter).
- A shared Rust protocol/transport/FEC core exposed over a stable C ABI.
- A modern transport that removes the 1 Gbps ceiling.
- Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core.
**Explicit non-goals (at least at first):**
- Windows *host* support (Sunshine/Apollo already do this well; no gap to fill).
- Internet/NAT-traversal relay infrastructure (LAN/VPN first; you already run Headscale/NetBird — lean on that).
- Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs).
- A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6).
---
## 2. Architecture overview
```mermaid
flowchart TD
subgraph Host["Linux Host (Rust)"]
VD["Virtual display orchestrator<br/>(KWin / wlroots / Mutter)"]
CAP["Capture<br/>(PipeWire / dmabuf)"]
ENC["Encoder<br/>(VAAPI / NVENC via FFmpeg)"]
VD --> CAP --> ENC
ENC --> COREH
IN_H["Input injector<br/>(libei / uinput)"]
COREH["punktfunk-core (C ABI)<br/>protocol · FEC · pacing · crypto"]
COREH --> IN_H
end
COREH <-->|"UDP+FEC video / QUIC control+audio"| COREC
subgraph Client["Client (Rust / Swift / Kotlin)"]
COREC["punktfunk-core (same crate, C ABI)"]
DEC["Decoder<br/>(VideoToolbox / NVDEC / VAAPI)"]
PRES["Present + frame pacing"]
INP["Input capture"]
COREC --> DEC --> PRES
INP --> COREC
end
```
**The load-bearing decision:** `punktfunk-core` is one crate, compiled once, linked by every host and client through a C ABI. Protocol logic, FEC, packet pacing, jitter buffering, pairing, and crypto live there and exist exactly once. Platform code (capture, encode, decode, present, input, UI) lives outside the core and is written in whatever language suits the platform.
---
## 3. Protocol strategy (three phases)
| Phase | Protocol | Clients that work | Bitrate ceiling | Purpose |
|------|----------|-------------------|-----------------|---------|
| **P1** | GameStream-compatible (existing Moonlight wire format) | All existing Moonlight/Artemis clients | ~1 Gbps (legacy GF(2⁸) FEC) | Ship the Linux virtual-display win with zero client work |
| **P2** | `punktfunk/1` negotiated extension: GF(2¹⁶) FEC, multi-block framing, optional QUIC control | punktfunk clients only; falls back to P1 for others | Multi-Gbps | Break the wall; introduce native clients |
| **P3** | `punktfunk/1` as primary; GameStream kept as compat shim | punktfunk everywhere, Moonlight as fallback | Multi-Gbps | Full control of features (mic passthrough, per-client identity, HDR signalling) |
Negotiation: extend the `serverinfo`/RTSP `SETUP` handshake with a capability flag. Old clients never see the flag and get P1 behavior. This is how Apollo/Artemis diverge cleanly, and it keeps you compatible while you build.
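A minimal sketch of that fallback rule, assuming a single capability bit. `CAP_PUNKTFUNK_1`, `P2Punktfunk`, and `negotiate` are hypothetical names chosen for illustration; only `ProtocolPhase::P1GameStream` is mentioned elsewhere in these docs, and the real flag encoding is not specified here.

```rust
/// Which wire protocol a session uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolPhase {
    P1GameStream,
    P2Punktfunk, // hypothetical variant name
}

/// Hypothetical capability bit advertised by punktfunk-aware peers.
pub const CAP_PUNKTFUNK_1: u32 = 1 << 0;

/// `client_caps` is `None` for a stock client that never sends the flag.
pub fn negotiate(client_caps: Option<u32>, host_caps: u32) -> ProtocolPhase {
    match client_caps {
        // Both sides must advertise the bit to leave P1.
        Some(caps) if (caps & host_caps & CAP_PUNKTFUNK_1) != 0 => ProtocolPhase::P2Punktfunk,
        // Missing or non-matching flags always fall back to GameStream.
        _ => ProtocolPhase::P1GameStream,
    }
}
```

The default arm is the important part: anything the host does not positively recognize gets P1 behavior, which is what keeps stock clients working.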
---
## 4. Tech stack (settled)
**Language split:** Rust for the core and all non-Apple platform code; Swift only for the macOS/iOS client UI + VideoToolbox/Metal; Kotlin for Android UI + MediaCodec. The C ABI is the seam.
**Threading:** native OS threads for the video hot path. `tokio` is allowed *only* for the control plane (pairing, web config, QUIC control stream). The per-frame pipeline must never touch an async runtime.
### Core crate dependencies
| Concern | Crate | Notes |
|--------|-------|-------|
| FEC | `reed-solomon-simd` (v3+) | Leopard/GF(2¹⁶), SIMD, O(n log n) — the wall-breaker |
| QUIC (control/audio) | `quinn` | Datagram ext for audio; reliable streams for control |
| TLS / crypto | `rustls` + `ring` (or `aws-lc-rs`) | Pairing, session keys (AES-GCM to match GameStream in P1) |
| Serialization | `zerocopy` / `bytes` | Wire structs `#[repr(C)]`, zero-copy parse |
| C header gen | `cbindgen` | Generates `punktfunk_core.h` from the ABI module |
| Error/log | `tracing` | Structured; feature-gate off the hot path |
### Linux host dependencies
| Concern | Crate / API | Notes |
|--------|-------------|-------|
| Capture | `pipewire` (pipewire-rs) | ScreenCast portal stream → dmabuf |
| Portal / DBus | `ashpd` + `zbus` | xdg-desktop-portal: ScreenCast, RemoteDesktop |
| Encode | `ffmpeg-next` or `rsmpeg` | VAAPI / NVENC, dmabuf import (zero-copy) |
| Input inject | `reis` (libei) + `input-linux` (uinput fallback) | Wayland-native first, uinput as universal fallback |
| Virtual output | per-compositor (see §6) | KWin DBus / Sway `create_output` / Mutter DBus |
| Web config | `axum` + `tokio` + small Vite/React UI | You own this stack already |
### Apple client (P2+)
Swift + VideoToolbox (decode) + Metal (present) + SwiftUI. Imports `punktfunk_core.h` directly via a module map — no glue layer.
### Ruled out
- **Swift for the host/core:** no Linux Wayland/PipeWire/DRM/VAAPI ecosystem; ARC in hot loops. (Excellent *Apple-client* language, wrong for systems/Linux.)
- **Go:** GC disqualifies the hot path.
- **C++:** throws away the safety/concurrency wins that justified greenfield over forking.
- **Zig:** best-in-class C interop, but pre-1.0 with no Wayland/QUIC ecosystem — too much risk for a multi-month build. Revisit later if desired.
---
## 5. The C ABI boundary
Design it on day one; retrofitting an ABI is painful.
**Principles**
- Opaque handles only across the boundary: `PunktfunkSession*`, never Rust types.
- All cross-boundary structs are `#[repr(C)]`; primitives + pointer/len pairs for buffers.
- Async events via registered C callbacks (`fn ptr` + `void* userdata`).
- Explicit, documented ownership: who frees what, when. Provide `punktfunk_*_free` for every allocation that crosses out.
- Versioned ABI: `uint32_t punktfunk_abi_version(void)` + a `PunktfunkConfig` struct whose first field is its own size for forward-compat.
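To make the size-prefixed config idea concrete, here is a hedged sketch of how the core side might accept a struct from an older caller. The field names (`struct_size`, `width`, `height`, `fps`) and the `read_config` helper are assumptions for illustration; the real `PunktfunkConfig` layout is not defined in this document.

```rust
use std::mem::size_of;

/// Illustrative layout only: the first field is the caller's struct size.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct PunktfunkConfig {
    pub struct_size: u32,
    pub width: u32,
    pub height: u32,
    pub fps: u32, // imagine this field was added in a later ABI revision
}

/// Copies the caller's config, tolerating an older (shorter) struct.
/// Fields the caller did not provide keep their zero defaults.
///
/// # Safety
/// `cfg` must be null or point to at least `struct_size` readable bytes.
pub unsafe fn read_config(cfg: *const PunktfunkConfig) -> Option<PunktfunkConfig> {
    if cfg.is_null() {
        return None;
    }
    // Read only the size prefix first; nothing else is known to be valid yet.
    let claimed = unsafe { std::ptr::read_unaligned(cfg as *const u32) } as usize;
    if claimed < size_of::<u32>() {
        return None;
    }
    // Never copy more than this build of the core understands.
    let n = claimed.min(size_of::<PunktfunkConfig>());
    let mut out = PunktfunkConfig::default();
    unsafe {
        std::ptr::copy_nonoverlapping(
            cfg as *const u8,
            &mut out as *mut PunktfunkConfig as *mut u8,
            n,
        );
    }
    out.struct_size = n as u32;
    Some(out)
}
```

The same trick works in the other direction: a newer caller passing a larger struct is truncated to what this core knows, so neither side reads past the other's allocation.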
**Minimal surface (sketch)**
```c
// lifecycle
PunktfunkSession* punktfunk_session_new(const PunktfunkConfig* cfg);
void punktfunk_session_free(PunktfunkSession*);
// host: feed an encoded access unit (the core does FEC + packetize + pace + send)
int punktfunk_host_submit_frame(PunktfunkSession*, const uint8_t* data, size_t len,
uint64_t pts_ns, PunktfunkFrameFlags flags);
// client: pull a reassembled, FEC-recovered access unit ready to decode
int punktfunk_client_poll_frame(PunktfunkSession*, PunktfunkFrame* out /*borrowed until next poll*/);
// input (both directions): client captures, host receives via callback
int punktfunk_send_input(PunktfunkSession*, const PunktfunkInputEvent*);
void punktfunk_set_input_callback(PunktfunkSession*, PunktfunkInputCb, void* user);
// stats for the frame-pacing/quality logic and the web UI
void punktfunk_get_stats(PunktfunkSession*, PunktfunkStats* out);
```
Keep it this small. Everything platform-specific (how you got the encoded bytes, how you decode them) stays on the platform side.
---
## 6. Virtual display orchestration
This is the differentiator and the most fragmented part. Two deployment models — support both eventually, pick one for the MVP.
**Model A — Attach to the running session.** Create a client-sized virtual output *inside the user's live desktop*, stream it, tear it down on disconnect. This is "add a monitor to my actual PC." Best UX, hardest because it depends on per-compositor runtime APIs.
**Model B — Dedicated headless session.** Spawn a separate headless compositor purely for the stream (e.g. `gnome-shell --headless --virtual-monitor WxH`, or a headless wlroots compositor). Cleaner isolation, sidesteps runtime-output APIs, ideal for "remote second PC." Worse for "mirror/extend my real desktop."
**Per-compositor (Model A) runtime virtual-output creation:**
- **KWin / Plasma 6 (recommended MVP target — matches your CachyOS/KDE daily driver and where the gap is loudest):** KWin can create virtual outputs; KRdp already does this internally for remote sessions. Drive it via the KWin DBus interface; capture via `xdg-desktop-portal-kde` ScreenCast (PipeWire); inject input via the RemoteDesktop portal or `reis`.
- **wlroots (Sway/Hyprland — fastest to *prototype* the pipeline):** enable the headless backend (`WLR_BACKENDS=…,headless`), then `swaymsg create_output` / `hyprctl output create headless`. Capture via `wlr-screencopy` or the portal. Simplest API; good for validating capture→encode→send before fighting KWin/Mutter.
- **Mutter / GNOME:** virtual monitors via the headless backend; runtime creation via Mutter DBus (`org.gnome.Mutter.*` — partly experimental). Capture via `xdg-desktop-portal-gnome` ScreenCast.
**Recommendation:** do a 1–2 day wlroots spike to prove the *pipeline*, then build the real MVP on KWin because that's your deployment target. Abstract virtual-output creation behind a trait so compositors are pluggable:
```rust
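/// Compositor-agnostic virtual-output control. `Mode`, `OutputHandle`, and `Result`
/// are placeholders for types defined elsewhere in the host crate.
trait VirtualDisplay {
    /// Create a client-sized virtual output.
    fn create(&self, mode: Mode) -> Result<OutputHandle>;
    /// Tear the output down, e.g. on disconnect.
    fn destroy(&self, h: OutputHandle) -> Result<()>;
}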
```
---
## 7. The hot path: pipeline & latency budget
Per-frame pipeline, each stage on its own thread, connected by bounded SPSC channels (drop-oldest on overflow, never block the encoder):
```
capture(dmabuf) → encode(NVENC/VAAPI) → core[FEC+packetize+pace+send]
│ network
client: recv → core[reorder+FEC recover+jitter] → decode → present
```
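The "drop-oldest on overflow, never block the encoder" rule can be sketched with a small bounded queue. This version uses a `Mutex` and `Condvar` for clarity, which is a simplification: a production hot path would more likely use a lock-free SPSC ring. `DropOldestQueue` is a hypothetical name.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// Bounded queue where a full queue evicts its oldest item instead of
/// making the producer wait.
pub struct DropOldestQueue<T> {
    items: Mutex<VecDeque<T>>,
    ready: Condvar,
    capacity: usize,
}

impl<T> DropOldestQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            items: Mutex::new(VecDeque::with_capacity(capacity)),
            ready: Condvar::new(),
            capacity,
        }
    }

    /// Producer side: never waits for space. Returns the evicted item, if any,
    /// so the caller can count drops or recycle the buffer.
    pub fn push(&self, item: T) -> Option<T> {
        let mut items = self.items.lock().unwrap();
        let evicted = if items.len() == self.capacity {
            items.pop_front()
        } else {
            None
        };
        items.push_back(item);
        self.ready.notify_one();
        evicted
    }

    /// Consumer side: blocks until an item is available.
    pub fn pop(&self) -> T {
        let mut items = self.items.lock().unwrap();
        loop {
            if let Some(item) = items.pop_front() {
                return item;
            }
            items = self.ready.wait(items).unwrap();
        }
    }
}
```

Returning the evicted item from `push` gives the stage a natural place to increment a dropped-frame counter, which feeds the pacing and stall metrics discussed later.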
**Glass-to-glass budget (LAN, 240 Hz = 4.17 ms/frame):**
| Stage | Target | Notes |
|------|--------|-------|
| Capture latency | ≤ 1 frame | dmabuf, no copy to CPU |
| Encode | 1–4 ms | NVENC low-latency preset; tune lookahead off |
| FEC + packetize | < 1 ms | SIMD RS; pre-allocated shard buffers |
| Network (LAN) | < 1 ms | `sendmmsg` / UDP GSO to cut syscalls |
| Jitter buffer | 0–1 frame | adaptive; minimum that hides observed jitter |
| FEC recover + reassemble | < 1 ms | only when loss occurs |
| Decode | 1–4 ms | hardware decoder |
| Present | ≤ 1 frame | align to client vsync |
**Target: 15–35 ms glass-to-glass on LAN.** The art is *frame pacing*: matching capture/encode cadence to the client's actual refresh and keeping the jitter buffer as small as the link allows. This, not the codec, is what separates good from bad streaming. Budget real time for it.
**Throughput math to keep honest:** 5120×1440@240 ≈ 1.77 Gpx/s. At 0.5 bpp that's ~885 Mbps; 0.6 bpp ≈ 1.06 Gbps; 0.8 bpp (4:4:4 headroom) ≈ 1.4 Gbps. The GF(2¹⁶) FEC + multi-block framing must sustain these without the per-frame shard count being the limiter — which it no longer is once you leave GF(2⁸).
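The arithmetic behind those figures is just pixel rate times bits per pixel; a small sketch makes it easy to re-run for other modes. The function names are hypothetical helpers, not part of the core.

```rust
/// Pixels per second for a given mode.
pub fn pixel_rate(width: u64, height: u64, fps: u64) -> u64 {
    width * height * fps
}

/// Encoded bitrate in bits per second at a given bits-per-pixel budget.
pub fn bitrate_bps(pixel_rate: u64, bits_per_pixel: f64) -> f64 {
    pixel_rate as f64 * bits_per_pixel
}

/// Time available per frame, in milliseconds.
pub fn frame_budget_ms(fps: u64) -> f64 {
    1000.0 / fps as f64
}
```

For 5120×1440@240 the pixel rate is 1,769,472,000 px/s, so 0.5, 0.6, and 0.8 bpp work out to roughly 885 Mbps, 1.06 Gbps, and 1.42 Gbps, with about 4.17 ms available per frame.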
---
## 8. Milestones
Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat as ordering, not deadlines.
**M0 — Pipeline spike (S).** wlroots headless output → PipeWire capture → VAAPI/NVENC encode → dump H.265 to a file that plays. *Acceptance:* a valid encoded file from a virtual output, no streaming yet. Proves the Linux capture+encode chain end-to-end.
**M1 — `punktfunk-core` skeleton + C ABI (M).** Session lifecycle, GameStream-compatible packetization and GF(2⁸) FEC (P1), AES-GCM, `cbindgen` header, a tiny C test harness. *Acceptance:* core links from C; round-trips packets in a loopback test with simulated loss.
**M2 — P1 host: stream to stock Moonlight (L).** Wire M0's pipeline into the core; implement `serverinfo`/pairing/RTSP enough for a real Moonlight client to connect, with a KWin virtual output created on connect and destroyed on disconnect. Input via `reis`/uinput. *Acceptance:* **you play a game on your KDE box streamed to a stock Moonlight client on a virtual display, no dummy plug, no kernel args.** This is the shippable milestone and the project's reason to exist.
**M3 — Measurement harness (S).** Glass-to-glass latency measurement (on-screen QR/timestamp or photodiode), packet-loss injection, frame-pacing and stall metrics surfaced in the web UI. *Acceptance:* you can quantify a regression. Build this before optimizing anything.
**M4 — P2 transport: break the wall (L).** Add `punktfunk/1` negotiation; swap to `reed-solomon-simd` GF(2¹⁶) with multi-block per-frame framing; optional QUIC control/audio. Write a minimal **Rust** reference client (decode via VAAPI, present via wgpu/Vulkan) to exercise it. *Acceptance:* a stable stream above 1.4 Gbps at 5120×1440@240 with loss recovery working; latency unchanged vs. M2.
**M5 — Apple client (L).** Swift + VideoToolbox + Metal + SwiftUI, linking `punktfunk-core` via the C header. *Acceptance:* the Mac Studio plays a stream at native resolution/refresh.
**M6 — Feature surface (M, ongoing).** Mic passthrough as a proper encrypted, per-client reverse audio stream (the thing the upstream PR got wrong); HDR signalling; per-client identity/permissions; pause/resume. *Acceptance:* feature parity with Apollo on the items you care about, plus mic done right.
---
## 9. Risk register
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| KWin runtime virtual-output API is undocumented/unstable | High | High | Spike on wlroots first to de-risk the pipeline; study KRdp's source for the KWin path; keep `VirtualDisplay` pluggable so a stuck compositor doesn't block the project |
| Wayland input injection gaps (libei still evolving) | Med | Med | uinput fallback always available; `reis` for the Wayland-native path |
| dmabuf → encoder zero-copy import quirks per GPU/driver | High | Med | Validate on your actual NVIDIA + AMD hardware early (M0); have a CPU-copy fallback path |
| Encoder/decoder can't sustain 1.77 Gpx/s @ 240 | Med | High | Measure in M0/M4 on real silicon; this is a hardware ceiling no rewrite fixes — discover it before P2, not after |
| Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step |
| Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client |
| Solo bandwidth vs. your other projects (ENRW thesis, played) | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone |
---
## 10. Testing & measurement
- **Loopback correctness:** core encodes→FEC→loss-inject→recover→decode in-process; property tests over loss patterns and shard counts (proptest).
- **Glass-to-glass latency:** rendered timestamp/QR on host, read back on client capture; or a photodiode for true photons. Track p50/p99.
- **Loss resilience:** `tc netem` to inject loss/jitter/reorder; verify FEC recovery and graceful degradation.
- **Pacing:** log present timestamps vs. client vsync; alert on stalls and duplicate/dropped frames.
- **Soak:** multi-hour streams; watch for buffer growth, fd leaks, encoder session exhaustion.
- **Hardware matrix:** your NVIDIA box (NVENC), an AMD/Intel box (VAAPI), Mac Studio (VideoToolbox decode). Catch driver quirks early.
---
## 11. Repo / workspace structure
```
punktfunk/
├── Cargo.toml # workspace
├── crates/
│ ├── punktfunk-core/ # protocol, FEC, pacing, crypto — C ABI (cdylib + staticlib)
│ │ ├── src/abi.rs # #[no_mangle] extern "C" surface
│ │ ├── src/fec.rs # GF(2^16) blocking over reed-solomon-simd
│ │ ├── src/transport/ # udp+fec video, quinn control/audio
│ │ ├── src/protocol/ # gamestream-compat (P1) + punktfunk/1 (P2)
│ │ └── cbindgen.toml
│ ├── punktfunk-host/ # Linux host binary
│ │ ├── src/capture/ # pipewire / portal
│ │ ├── src/encode/ # ffmpeg vaapi/nvenc
│ │ ├── src/vdisplay/ # trait + kwin/wlroots/mutter impls
│ │ ├── src/input/ # reis + uinput
│ │ └── src/web/ # axum config/pairing API
│ └── punktfunk-probe/ # reference Rust client (M4)
├── clients/
│ ├── apple/ # Swift package, imports punktfunk_core.h (M5)
│ └── android/ # Kotlin + JNI (later)
├── include/ # generated punktfunk_core.h
└── tools/
├── latency-probe/
└── loss-harness/
```
---
## 12. Immediate next actions (first week)
1. **Stand up the workspace** with `punktfunk-core` (empty ABI + `cbindgen`) and `punktfunk-host` skeletons; CI on your Gitea (you already have BuildKit pipelines).
2. **M0 spike on wlroots:** headless output → PipeWire capture → NVENC/VAAPI encode → playable file. This validates the riskiest *pipeline* assumptions in days, on your real GPU.
3. **Read KRdp's source** for how KDE creates virtual outputs and casts them — it's the closest existing reference for the KWin path you'll need in M2.
4. **Decide P1 protocol depth:** confirm exactly which `serverinfo`/RTSP/pairing messages a current Moonlight client requires for a successful connect, so M2's compat surface is scoped precisely (this is also the question to take back to the dev who mentioned the 1G limit).
---
*The shape of the bet: M2 alone — virtual-display streaming to stock Moonlight clients on Linux — is a complete, useful, gap-filling release. Everything after it (the wall-breaking transport, native clients, mic-done-right) is upside you unlock from a position of having already shipped, with the hard transport work resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.*
+6 -5
@@ -17,8 +17,8 @@ It's built for the things that make streaming feel native:
- **Low latency, GPU end to end.** Frames go straight from the compositor to the GPU encoder
(NVENC) with zero CPU copies, and over a transport tuned for responsiveness rather than throughput.
- **Works with the apps you already have.** punktfunk speaks the GameStream protocol, so any
**Moonlight** client connects out of the box — and a faster **native protocol** with a dedicated
app for Apple devices.
**Moonlight** client connects out of the box — and a faster **native protocol** with dedicated apps
for **macOS, iOS, tvOS, Linux, Windows, and Android**.
- **Secure by default.** Hosts require a one-time PIN pairing; after that, devices reconnect on a
pinned identity. No accounts, no cloud.
@@ -28,15 +28,16 @@ It's built for the things that make streaming feel native:
<Card title="How It Works" href="/docs/how-it-works" description="The ideas behind punktfunk in a few minutes — virtual displays, the two protocols, pairing." />
<Card title="Quick Start" href="/docs/quickstart" description="From nothing to streaming: set up a host and connect your first client." />
<Card title="Host Setup" href="/docs/requirements" description="Install the host on Ubuntu (GNOME or KDE), Fedora (KDE), or Bazzite." />
<Card title="Connect a Client" href="/docs/clients" description="Stream with the Apple app, Moonlight, or the Linux client." />
<Card title="Connect a Client" href="/docs/clients" description="Stream with the native app for your device — macOS, Linux, Windows, Android — or any Moonlight client." />
</Cards>
## What you need
- A **Linux host** with an **NVIDIA GPU** (for the NVENC hardware encoder) running one of the
[supported setups](/docs/requirements): **Ubuntu** (GNOME or KDE), **Fedora** (KDE), or **Bazzite**.
- A **client device** to stream to — a Mac/iPhone/iPad/Apple TV (native app), or anything that runs
**Moonlight**.
A native [**Windows host**](/docs/windows-host) (NVIDIA-only) is also available.
- A **client device** to stream to — there are native apps for **macOS, iOS/iPadOS, tvOS, Linux,
Windows, and Android**, plus any device that runs **Moonlight**.
- Both on the **same network** (LAN or VPN). punktfunk is designed for a trusted local network.
Ready? Head to the [Quick Start](/docs/quickstart).
+6 -3
@@ -71,7 +71,9 @@ certificate, so you import that certificate once before Windows will install the
1. Open the [packages page](https://git.unom.io/unom/-/packages) (generic group), find
**`punktfunk-client-windows`**, and download the newest **`.msix`** and its matching **`.cer`**.
2. In an **admin** PowerShell, trust the publisher certificate (one-time), then install:
2. **Trust the publisher certificate**, then install. The MSIX won't install until the certificate is
trusted — but it's the **same certificate for every release**, so this is genuinely one-time and
later updates need nothing. In an **admin** PowerShell:
```powershell
Import-Certificate -FilePath .\punktfunk-client-windows.cer `
@@ -85,8 +87,9 @@ certificate, so you import that certificate once before Windows will install the
3. Launch **Punktfunk** from the Start menu and pick your host.
> The Windows client is young (software decode; hardware D3D11VA/HDR in progress). If it
> misbehaves, **[Moonlight](/docs/moonlight)** is a solid alternative for Windows.
> The Windows client's hardware-decode (D3D11VA) and HDR paths are complete but still pending
> validation on real GPU hardware. If anything misbehaves, **[Moonlight](/docs/moonlight)** is a
> solid alternative for Windows.
## macOS
+30 -4
@@ -1,11 +1,12 @@
---
title: Install the Host
description: Pick your distro and install the punktfunk host from its package registry.
description: Install the punktfunk host — on Linux from its package registry, or on Windows from a signed installer.
---
The package registries are the real distribution channel. Pick your distro, add the repo, and install
with your native package manager. Each row links to the full per-distro guide (add the repo, first-run
steps, the web console) — those are the source of truth, so this page doesn't duplicate them.
On Linux, the package registries are the real distribution channel. Pick your distro, add the repo, and
install with your native package manager. Each row links to the full per-distro guide (add the repo,
first-run steps, the web console) — those are the source of truth, so this page doesn't duplicate them.
On **Windows** (NVIDIA), the host ships as a signed installer instead — see [Windows](#windows-nvidia).
## Pick your distro
@@ -20,6 +21,31 @@ Each registry is public — no auth, you just trust the repo's signing key. Addi
one-time step covered in the linked guide; after that, normal `apt upgrade` / `rpm-ostree upgrade`
tracks new builds automatically.
## Windows (NVIDIA)
punktfunk also runs as a native host on **Windows 10/11 (x64) with an NVIDIA GPU**, shipped as a
signed installer — see [Windows Host](/docs/windows-host) for what it includes and its limitations.
1. From the [packages page](https://git.unom.io/unom/-/packages) (generic group), download the newest
**`punktfunk-host-setup-<ver>.exe`** and its matching **`.cer`**.
2. **Trust the publisher certificate once.** The installer is signed with a self-signed certificate
whose public `.cer` is published next to it — the **same certificate for every release**, so this is
genuinely one-time and later updates need nothing. In an **admin** PowerShell:
```powershell
Import-Certificate -FilePath .\punktfunk-host-setup.cer `
-CertStoreLocation Cert:\LocalMachine\TrustedPublisher
```
3. Run `punktfunk-host-setup-<ver>.exe` (elevated). It installs to `C:\Program Files\punktfunk`,
optionally installs the bundled **SudoVDA** virtual-display driver, and registers + starts the
`LocalSystem` service (`/VERYSILENT` for an unattended install). Upgrades and uninstall go through
Add/Remove Programs.
You need an NVIDIA GPU + driver (the host is NVENC-only on Windows). More detail — including the CLI
`punktfunk-host service install` path — is in
[Running as a Service → Windows](/docs/running-as-a-service#windows).
## What the packages are
- **`punktfunk-host`** — the streaming host. Install this on your Linux + NVIDIA gaming machine.
+3 -8
@@ -12,6 +12,7 @@
"fedora-kde",
"bazzite",
"steam-deck-host",
"windows-host",
"running-as-a-service",
"---Connecting---",
"clients",
@@ -22,13 +23,7 @@
"configuration",
"host-cli",
"troubleshooting",
"---Project & Internals---",
"roadmap",
"implementation-plan",
"windows-host",
"apple-stage2-presenter",
"gamescope-multiuser",
"dualsense-haptics",
"ci"
"---Project---",
"roadmap"
]
}
+9 -5
@@ -4,8 +4,12 @@ description: Stream from a punktfunk host using any Moonlight client.
---
punktfunk speaks the **GameStream** protocol, so [Moonlight](https://moonlight-stream.org/) connects
to it like it would to any GameStream host — no punktfunk-specific app needed. This is the easiest way
to stream to Windows, Android, the Steam Deck, a browser, or a TV.
to it like it would to any GameStream host — no punktfunk-specific app needed. It's a great option for
a browser, a smart TV, or any device without a native client.
> Many platforms also have a **native punktfunk client** with lower latency and built-in
> discovery/pairing — including **Windows** and **Android** (phone and Android TV). See
> [Clients](/docs/clients) before reaching for Moonlight.
## 1. Make sure the host is running
@@ -34,7 +38,7 @@ it. Mouse, keyboard, and controllers flow back to the host.
- **Set your resolution and frame rate in Moonlight's settings** before connecting — the host matches
whatever Moonlight asks for, creating the virtual display at that exact mode.
- **Codec:** HEVC (H.265) is a good default; AV1 is available if your client supports it.
- **Bitrate:** start moderate and raise it. For very high bitrates, the native [Apple
app](/docs/clients) has a built-in speed test; with Moonlight, set the bitrate manually.
- **Bitrate:** start moderate and raise it. For very high bitrates, the [native
clients](/docs/clients) have a built-in speed test; with Moonlight, set the bitrate manually.
- Moonlight uses the GameStream protocol, not punktfunk's native FEC/encryption extensions. On a
solid LAN this is fine; on a lossy link the [Apple app](/docs/clients) holds up better.
solid LAN this is fine; on a lossy link a [native client](/docs/clients) holds up better.
+2 -1
@@ -50,7 +50,8 @@ command line instead; the production `serve --native` host arms pairing from the
Then, on the client:
- **Apple app:** select the host (or use *Pair with PIN…* from its menu) and enter the PIN.
- **Native clients (Apple, Linux, Windows, Android):** select the host (or use *Pair with PIN…* from
its menu) and enter the PIN the host displays.
- **Moonlight:** choose **Pair**; Moonlight shows the PIN to confirm on the host side.
## Requiring pairing (the default)
+4 -3
@@ -31,10 +31,11 @@ network, so clients can find it by name. Leave it running. (To start it automati
## 3. Connect and pair a client
On the device you want to stream to:
On the device you want to stream to, use a [native punktfunk client](/docs/clients) for the lowest
latency, or any Moonlight client:
- **Apple (Mac, iPhone, iPad, Apple TV):** open the punktfunk app — your host appears under *On this
network*. Tap it, and when prompted, **pair**.
- **Native client (Apple, Linux, Windows, Android):** open the punktfunk app — your host appears in
the list of hosts found on your network. Select it, and when prompted, **pair**.
- **Anything with Moonlight:** add the host (it should be discovered automatically), then pair.
To pair, the host needs to show a PIN. Arm pairing from the host's web console — the host displays a
+10 -4
@@ -5,8 +5,9 @@ description: What you need to run a punktfunk host — GPU, driver, desktop, and
## Supported setups
A punktfunk host runs on a Linux machine with an NVIDIA GPU. These are the desktop environments it
supports today, each with its own guide:
A punktfunk host runs primarily on a Linux machine with an NVIDIA GPU (a native
[Windows host](/docs/windows-host) is also available — see below). These are the Linux desktop
environments it supports today, each with its own guide:
| Setup | Desktop / compositor | Guide |
|---|---|---|
@@ -18,6 +19,10 @@ supports today, each with its own guide:
Other wlroots compositors (Sway/Hyprland) also work but aren't a primary target. If your desktop isn't
listed, the host still needs one of these compositor backends to create a virtual display.
> **Windows host:** punktfunk also runs as a native host on **Windows 10/11 (x64) with an NVIDIA GPU**
> — a signed installer that registers a service and bundles a virtual-display driver. It's NVIDIA-only
> and newer than the Linux host; see [Windows Host](/docs/windows-host).
## GPU and driver
- **An NVIDIA GPU** with NVENC — effectively any GeForce RTX or workstation card. NVENC is what
@@ -54,5 +59,6 @@ Minimum compositor versions (newer is fine):
## A client
You also need something to stream *to* — see [Connect a Client](/docs/clients). The Apple app and any
Moonlight client both work; both can discover the host on your network automatically.
You also need something to stream *to* — see [Connect a Client](/docs/clients). There are native
punktfunk clients for **Apple (macOS, iOS, iPadOS, tvOS), Linux, Windows, and Android**, and any
Moonlight client works too. All of them can discover the host on your network automatically.
+62 -368
@@ -1,383 +1,77 @@
---
title: "Roadmap"
description: "Decided next goals and the longer-term bets."
description: "What's shipped, what's in progress, and what's next for punktfunk."
---
A quick map of where punktfunk is today and where it's heading. For the detailed, dated changelog,
see [Status & Progress](/docs/status).
Decided 2026-06-10 (research-grounded; see commit history), extended since.
**Legend:** ✅ shipped · 🟡 in progress · 🔭 planned · ⛔ blocked upstream
**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).
**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.
Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).
**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
Steam session at the **client's exact resolution + refresh** — games see it (via injected
`--nested-refresh` + generated CVT modes, not the box's TV EDID), relaunched per-connection on a mode
change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).
## At a glance
**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval), the native
client presenter + iOS (§6), and a Windows host (§7 — now **de-risked via SudoVDA**, no custom
signed driver needed). **§10 HDR/10-bit is parked — blocked upstream at the compositor** (no
gamescope/KWin PipeWire 10-bit producer yet).
| Area | |
|---|---|
| Protocol core — FEC · crypto · C ABI | ✅ |
| GameStream host (works with Moonlight) | ✅ |
| Native `punktfunk/1` protocol | ✅ |
| Linux host (KWin · GNOME · gamescope · Sway) | ✅ |
| Windows host (NVIDIA) | ✅ beta |
| Apple client (macOS · iOS · iPadOS · tvOS) | ✅ |
| Linux client (GTK4) | ✅ |
| Android client (phone · TV) | ✅ |
| Windows client | 🟡 |
| Web console + pairing | ✅ |
| Concurrent sessions (shared desktop) | ✅ |
| Network speed test + bitrate | ✅ |
| HDR / 10-bit streaming | ✅ Windows host · ⛔ Linux host |
| Sub-frame pipelining (latency) | 🔭 |
## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*
## ✅ Shipped
Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.
- **The host, two ways.** A GameStream host any [Moonlight](/docs/moonlight) client can use, and the
lower-latency native [`punktfunk/1`](/docs/how-it-works) protocol (QUIC control + UDP data with
GF(2¹⁶) Leopard FEC + AES-GCM). Both run from one process.
- **Native-resolution virtual displays** on Linux across KWin, GNOME/Mutter, gamescope, and
Sway/wlroots, with a fully zero-copy GPU path to NVENC (stable 240 fps at 5120×1440).
- **A native Windows host** (NVIDIA, x64) — a signed installer with secure-desktop capture and a
bundled virtual-display driver, and the only host that can stream **HDR** (10-bit BT.2020 PQ,
captured from an HDR Windows desktop and encoded as HEVC Main10). See
[Windows Host](/docs/windows-host). *(Beta — newer than the Linux host.)*
- **Clients on every platform** — native apps for **Apple** (macOS, iOS, iPadOS, tvOS), **Linux**,
**Android** (phone + TV), and **Windows**, each with hardware decode, controllers including
DualSense, audio + mic, and automatic host discovery. See [Clients](/docs/clients).
- **Secure by default** — SPAKE2 PIN pairing with pinned reconnects, one-click delegated approval from
the web console, and mDNS LAN auto-discovery.
- **Tuned for latency** — concurrent sessions (stream one desktop to several devices at once),
mid-stream resolution renegotiation, a cross-machine clock-skew handshake, a 1 Gbps+ data plane, and
an in-app network speed test that informs the bitrate picker.
- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
registry roundtrip); move the portal restart to *after* readiness and precede it with
`systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
env import — the Sway script does this, the KDE one doesn't).
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
(permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
probe as an `ExecStartPost` gate, `Restart=on-failure`.
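The Phase 2 idea of a bounded retry-with-backoff that separates permanent from transient failures can be sketched as below. This is a minimal illustration, not the host's code: `SpawnError`, `retry_with_backoff`, and the delay parameters are hypothetical names chosen for the example, and the closure stands in for something like `vd.create()` plus the first-frame wait.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification; not the host's real error type.
#[derive(Debug, PartialEq)]
enum SpawnError {
    /// Worth retrying, e.g. the compositor has not advertised its globals yet.
    Transient(String),
    /// Retrying cannot help, e.g. the compositor binary is missing.
    Permanent(String),
}

/// Run `attempt` up to `max_attempts` times. After each transient failure the
/// delay doubles (capped at `max_delay`); a permanent failure returns at once.
fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut attempt: impl FnMut(u32) -> Result<T, SpawnError>,
) -> Result<T, SpawnError> {
    let mut delay = initial_delay;
    let mut last = SpawnError::Permanent("max_attempts was 0".to_string());
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(SpawnError::Permanent(msg)) => return Err(SpawnError::Permanent(msg)),
            Err(transient) => {
                last = transient;
                // Only sleep if another attempt will actually follow.
                if n < max_attempts {
                    sleep(delay);
                    delay = (delay * 2).min(max_delay);
                }
            }
        }
    }
    Err(last)
}
```

Keeping the bound explicit means a compositor that never becomes ready surfaces as the last transient error rather than as an indefinite hang.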
## 🟡 In progress
## 2. Offer available compositors in the client ✅ *(done)*
- **Windows client on-glass validation.** The hardware (D3D11VA) decode, HDR present, and GUI are
built and ship as a signed MSIX — they just need verification on real GPU hardware.
- **Apple stage-2 presenter as the default.** The lower-latency `VTDecompressionSession`
→ `CAMetalLayer` path is live behind an opt-in flag and graduating to the default.
- **Web console parity.** Surfacing the speed test and bitrate picker the apps already have.
- **Windows host hardening.** Broader real-world testing, AMD/Intel encode (NVIDIA-only today), and
bundling the ViGEm gamepad driver.
Host enumerates which backends are actually available (binary present + version OK:
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
Picker in the Apple client + web console.
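The "binary present + version OK" check could look roughly like the sketch below. The minimum versions for gamescope and KWin come from the text above. The `MINIMUMS` table, the function names, and the assumption that detection yields plain dotted version strings are illustrative only; real `--version` output may carry suffixes that need stripping first.

```rust
/// Minimum versions named in the text; `None` means presence alone is enough.
const MINIMUMS: &[(&str, Option<&str>)] = &[
    ("gamescope", Some("3.16.22")),
    ("kwin", Some("6.5.6")),
    ("gnome-shell", None),
    ("sway", None),
];

/// Parse a dotted version such as "3.16.22" into numeric components.
/// Returns None if any component is not a plain number.
fn parse_version(s: &str) -> Option<Vec<u32>> {
    s.trim().split('.').map(|part| part.parse::<u32>().ok()).collect()
}

/// True if `found` is at least `minimum`; missing components count as 0.
/// Unparseable input is treated as "not OK".
fn version_at_least(found: &str, minimum: &str) -> bool {
    let (Some(mut f), Some(mut m)) = (parse_version(found), parse_version(minimum)) else {
        return false;
    };
    let len = f.len().max(m.len());
    f.resize(len, 0);
    m.resize(len, 0);
    f >= m // Vec comparison is lexicographic, which is what we want here
}

/// Given (backend, version) pairs for binaries that were found, return the
/// backend names that pass their minimum and are worth advertising.
fn available_backends(detected: &[(&str, &str)]) -> Vec<String> {
    let mut out = Vec::new();
    for &(name, minimum) in MINIMUMS {
        let ok = detected.iter().any(|&(found_name, found_version)| {
            found_name == name && minimum.map_or(true, |m| version_at_least(found_version, m))
        });
        if ok {
            out.push(name.to_string());
        }
    }
    out
}
```

For example, a box with gamescope 3.16.20, KWin 6.5.6, and sway would advertise only `kwin` and `sway`, because gamescope falls below its minimum.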
## 🔭 Planned
## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*
- **Sub-frame pipelining.** Overlap encode and transmit within a single frame (a direct NVENC slice
path) — the next big latency lever at high resolutions.
- **True glass-to-glass latency** measured end to end (capture → on-screen present).
- **gamescope multi-user isolation.** Per-session input and audio so concurrent clients are fully
independent desktops (the shared-desktop case already works).
- **Peer-approved pairing.** Approve a new device from an already-paired device's own app.
Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
it's Fedora-atomic and the community installs Sunshine via COPR rpm-ostree — the analog.
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.
## ⛔ Parked / blocked
- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
`/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
Bazzite's already-present **gamescope** (minimal session plumbing).
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer** (`FROM
ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
zero-copy NVENC/dmabuf/uinput).
## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*
The exact mirror of the host→client desktop-audio path. A PipeWire virtual source apps can
select = a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.
- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
`punktfunk_send_audio`.
- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
jitter buffer (silence-fill underrun, Opus PLC).
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
surface). punktfunk/1-only.
## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*
- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
`inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
injects `ei_touchscreen` down/motion/up; `punktfunk-probe --touch-test` drags a finger.
**Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
(now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
(gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
`/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (vendor 054C/0CE6, the 232-byte
inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
(touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
selects a per-session `DualSenseManager` (the `PadBackend` enum in `punktfunk1.rs`): client gamepad frames
build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
(lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
(so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
(`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
`punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
(← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `punktfunk1-host` +
`punktfunk-probe --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
(600/600 frames). *Remaining:* the Apple client renders adaptive triggers + rumble on a real
DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
(only ~5–10 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-
based, no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
## 6. iOS/iPadOS → tvOS *(deferred)*
PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin
+ touch capture (#5) + UI. tvOS later.
## 7. Windows as a host *(scoped — `docs/windows-host.md`; de-risked via SudoVDA)*
Architecturally an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/
crypto/C-ABI) + QUIC + GameStream + mgmt + the `m3`/pipeline orchestration are all platform-agnostic
and already `cfg`-isolated (~95% reuse). New `#[cfg(windows)]` backends behind the existing traits:
capture (DXGI Desktop Duplication / Windows.Graphics.Capture), encode (Media Foundation / NVENC-SDK
with a D3D11 context), input (SendInput + ViGEm), audio (WASAPI loopback + a virtual mic).
**The old blocker is gone.** Rather than author + sign our own kernel IDD for the per-client virtual
display, use **SudoVDA** (the Sunshine Virtual Display Adapter) — a pre-built, signed Indirect
Display Driver that creates virtual displays at arbitrary WxH@Hz on demand. The `VirtualDisplay`
backend becomes *"install + drive SudoVDA's control API"* (M effort), not *"write + WHQL-sign a
kernel driver"* (XL). That removes the only hard blocker — the Windows host is now a medium,
mostly-mechanical port. Recommended start: **Phase 0** — capture an existing monitor to prove the
stack end to end; **Phase 1** wires SudoVDA for the native-resolution output. Deferred only because
it's unbuildable on the Linux dev box; the trait boundaries are already in the right places.
## 8. Pairing & trust hardening *(§8a + §8b-1 done; §8b-2 next)*
The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":
- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
Deployed to the dev box + Bazzite.
- ✅ **§8b-1 Delegated approval via the console — done (2026-06-12)** *(the ergonomic enabler for
"mandatory": pair a device without fetching the host PIN out of band).* An identified-but-unpaired
device that knocks on a pairing-required host is held as a **pending request** in `NativePairing`
(in-memory, deduped by fingerprint, 32-entry cap, 10-min expiry — a LAN scanner can't grow it,
and an anonymous client with no certificate records nothing). The mgmt API gains
`GET /native/pending` + `POST /native/pending/{id}/approve` (optional `{name}` to relabel) +
`POST /native/pending/{id}/deny`; the web console's Pairing page shows a **Waiting for approval**
section (live-polling) with Approve/Deny — approve pins the fingerprint on the spot, no PIN.
The `Hello` carries an optional trailing **device name** (same back-compat pattern as
compositor/gamepad/bitrate; `client-rs --name` sends it, fingerprint-derived label otherwise) so
the pending list is human-readable. End-to-end tested (knock → pending → approve → same identity
streams) + unit/mgmt tests.
- **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
**Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
is a client-agent task. The Apple connector should also start sending the `Hello` device name
(needs a connect-ABI name parameter; the wire field is already live).
PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.
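The bounds on the pending list (deduplicated by fingerprint, capped at 32 entries, expiring after 10 minutes, nothing recorded for anonymous clients) can be illustrated with a small in-memory structure. This is a sketch under those stated rules only: `PendingList`, `knock`, and `approve` are hypothetical names, not the actual `NativePairing` API, and the mgmt endpoints are not modelled.

```rust
use std::time::{Duration, Instant};

const MAX_PENDING: usize = 32;
const PENDING_TTL: Duration = Duration::from_secs(10 * 60);

/// One identified-but-unpaired device waiting for approval.
struct PendingRequest {
    fingerprint: String,
    name: String,
    first_seen: Instant,
}

#[derive(Default)]
struct PendingList {
    entries: Vec<PendingRequest>,
}

impl PendingList {
    /// Record a knock. Returns true if the device is pending afterwards.
    /// `fingerprint` is None for an anonymous client without a certificate.
    fn knock(&mut self, fingerprint: Option<&str>, name: &str, now: Instant) -> bool {
        // Anonymous clients record nothing.
        let Some(fp) = fingerprint else { return false };
        // Drop expired entries before applying the cap.
        self.entries
            .retain(|e| now.saturating_duration_since(e.first_seen) < PENDING_TTL);
        // Deduplicate by fingerprint: a repeated knock does not add an entry.
        if self.entries.iter().any(|e| e.fingerprint == fp) {
            return true;
        }
        // Hard cap, so a LAN scanner cannot grow the list without bound.
        if self.entries.len() >= MAX_PENDING {
            return false;
        }
        self.entries.push(PendingRequest {
            fingerprint: fp.to_string(),
            name: name.to_string(),
            first_seen: now,
        });
        true
    }

    /// Remove and return the request so the caller can pin its fingerprint.
    fn approve(&mut self, fingerprint: &str) -> Option<PendingRequest> {
        let idx = self.entries.iter().position(|e| e.fingerprint == fingerprint)?;
        Some(self.entries.remove(idx))
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the expiry rule easy to test without waiting ten minutes.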
## 9. Client→host network speed test + settable bitrate *(host + Apple client done — web console remaining)*
Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
value) instead of guesswork that ends in a stuttering stream.
**Done & live (host + protocol + connector + C ABI, `74819b1`):**
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
client requests a rate; the host clamps to [500 kbps, 2 Gbps] (or its 20 Mbps default on 0),
applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
`punktfunk_connection_bitrate()`.
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
`ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 3 Gbps / ≤ 5 s,
video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
`PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
interleaved probe AUs excluded from frame verification. `punktfunk-probe` gains `--bitrate` +
`--speed-test KBPS:MS` as the reference/loopback driver.
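The clamp and the probe arithmetic described above are simple enough to sketch. The limits (500 kbps, 2 Gbps, 20 Mbps default on 0) are from the text; the function names, the `MIN_`/`DEFAULT_` constant names, and the byte-based loss formula are assumptions made for illustration, not the connector's actual implementation.

```rust
const MIN_BITRATE_KBPS: u32 = 500;
const MAX_BITRATE_KBPS: u32 = 2_000_000; // 2 Gbps
const DEFAULT_BITRATE_KBPS: u32 = 20_000; // 20 Mbps

/// Resolve the client's requested bitrate: 0 means "use the host default",
/// anything else is clamped into the supported range.
fn resolve_bitrate_kbps(requested: u32) -> u32 {
    if requested == 0 {
        DEFAULT_BITRATE_KBPS
    } else {
        requested.clamp(MIN_BITRATE_KBPS, MAX_BITRATE_KBPS)
    }
}

/// Turn probe byte counts into (goodput in kbps, loss in percent).
/// Bits per millisecond equals kilobits per second, so no extra scaling is needed.
fn probe_summary(bytes_sent: u64, bytes_received: u64, window_ms: u64) -> (u64, f64) {
    let goodput_kbps = if window_ms == 0 {
        0
    } else {
        bytes_received * 8 / window_ms
    };
    let loss_pct = if bytes_sent == 0 {
        0.0
    } else {
        let lost = bytes_sent.saturating_sub(bytes_received);
        lost as f64 * 100.0 / bytes_sent as f64
    };
    (goodput_kbps, loss_pct)
}
```

With these definitions, a probe that delivers 5,000,000 bytes in a 2000 ms window works out to 20,000 kbps, which is the shape of the loopback figure quoted above.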
**Done (Apple client UI):** Settings grows a Bitrate control (Automatic = host default; manual is
a log-scale slider up to 3 Gbps with an above-1-Gbps "test the speed first" warning — tvOS keeps
a focus-native preset picker; rides `connect_ex3` on every connect, `PUNKTFUNK_BITRATE_KBPS` dev
override), and each host card's context menu gets
"Test Network Speed…" — a sheet that connects, runs `speed_test` (up to the host's 3 Gbps
probe ceiling for 2 s), polls `probe_result` with a live readout, and shows measured
goodput · loss · recommended bitrate (≈70% of measured, capped at the 2 Gbps session
ceiling) with a one-tap "Use N Mbps" writing the setting. Loopback-tested through the
xcframework: bitrate echo (50 000 → 50 000) + a 20 Mbps/500 ms probe completing with real numbers.
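The recommendation rule in the speed-test sheet (about 70% of measured goodput, capped at the 2 Gbps session ceiling) reduces to one line of integer arithmetic. The function name and the exact rounding are assumptions for this sketch; the Apple client may round differently.

```rust
const SESSION_CEILING_KBPS: u64 = 2_000_000; // 2 Gbps

/// Suggest a bitrate from measured goodput: roughly 70% of what the probe
/// sustained, never above the session ceiling. Integer math rounds down.
fn recommended_bitrate_kbps(measured_kbps: u64) -> u64 {
    (measured_kbps * 7 / 10).min(SESSION_CEILING_KBPS)
}
```

Leaving about 30% of headroom gives the stream room for keyframe bursts and FEC overhead on a link that only just sustained the probe rate.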
**Remaining:** surface both in the web console.
## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*
Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
build the moment a producer lands.
- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
(`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/
`NV12` (8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
(for its own sake), not for HDR.
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
Steam-Deck-UI polish + the dynamic-resolution work (§ above). Track #2126 and !8293.
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
(BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434–874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
(RustCrypto auto AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **core
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
(64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
`sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
`recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
underdrain). ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
`shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
`punktfunk-probe --speed-test KBPS:MS`, RELEASE build (a debug build is CPU-bound at ~30 Mbps), watching
`packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
`rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
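The batched-send seam described in the list above might look roughly like the following sketch. It is an assumption-laden illustration: the trait shape, `SendStats`, `send_frame`, and `MockTransport` are hypothetical names, the only details taken from the text are the `send` → `Result<bool>` convention, the `send_batch(&[&[u8]])` signature with a scalar default, the 64-packet batch, and the `packets_send_dropped` counter. A Linux backend would override `send_batch` with one `sendmmsg` call per batch; that part is not shown.

```rust
use std::io;

/// Packets per batch, matching the 64/call figure in the text.
const BATCH: usize = 64;

/// Hypothetical transport seam. `send` returns Ok(false) when the socket
/// buffer is full (WouldBlock), so the caller can count the drop.
trait Transport {
    fn send(&mut self, packet: &[u8]) -> io::Result<bool>;

    /// Scalar default: one `send` per packet. Returns how many were accepted.
    fn send_batch(&mut self, packets: &[&[u8]]) -> io::Result<usize> {
        let mut sent = 0;
        for &p in packets {
            if self.send(p)? {
                sent += 1;
            }
        }
        Ok(sent)
    }
}

#[derive(Default)]
struct SendStats {
    packets_sent: u64,
    packets_send_dropped: u64,
    batch_calls: u64,
}

/// Send one frame's packets in batches and record drops instead of hiding them.
fn send_frame<T: Transport>(
    t: &mut T,
    packets: &[&[u8]],
    stats: &mut SendStats,
) -> io::Result<()> {
    for chunk in packets.chunks(BATCH) {
        let sent = t.send_batch(chunk)?;
        stats.batch_calls += 1;
        stats.packets_sent += sent as u64;
        stats.packets_send_dropped += (chunk.len() - sent) as u64;
    }
    Ok(())
}

/// Test double: accepts `capacity` packets, then behaves like a full buffer.
struct MockTransport {
    capacity: usize,
    accepted: usize,
}

impl Transport for MockTransport {
    fn send(&mut self, _packet: &[u8]) -> io::Result<bool> {
        if self.accepted < self.capacity {
            self.accepted += 1;
            Ok(true)
        } else {
            Ok(false)
        }
    }
}
```

With the mock, a 130-packet frame goes out in three batch calls, and any packet the buffer refuses shows up in `packets_send_dropped`. Keeping the scalar default means non-Linux targets need no extra code.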
## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*
A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
is only the same-host **capture-stamp→reassembled** slice (~30–40% of true glass-to-glass) and was
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
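The pacing-tail and present figures in this paragraph follow from plain interval arithmetic. The sketch below reproduces them, with hypothetical helper names and the 0.9 and 0.5 factors taken from the text.

```rust
/// Frame interval in milliseconds for a given rate.
fn frame_interval_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Worst-case pacing tail when a frame is spread over `fraction` of the interval.
fn pacing_tail_ms(fps: f64, fraction: f64) -> f64 {
    frame_interval_ms(fps) * fraction
}

/// Average added wait if the display layer costs about half a refresh.
fn present_wait_ms(refresh_hz: f64) -> f64 {
    frame_interval_ms(refresh_hz) * 0.5
}
```

`pacing_tail_ms(120.0, 0.9)` is 7.5 ms and `pacing_tail_ms(60.0, 0.9)` is 15 ms. `present_wait_ms` gives about 4.2 ms at 120 Hz and about 8.3 ms at 60 Hz, which is why the present term dominates on a 60 Hz client.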
**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
`PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
(IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
`sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @60–120 fps) and is the
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
- **Done & live (skew handshake landed 2026-06-12):** **wall-clock skew handshake** — `ClockProbe`/
`ClockEcho` on the control stream (8 NTP-style rounds right after `Start`; min-RTT sample →
host↔client offset; `clock_offset_ns`). The client adds the offset to its receive instant before
differencing against the AU `pts_ns`, so the `capture→reassembled` percentiles are now valid
**across machines** (reported `skew_corrected=true`), not just same-host. Back-compat: an old host
that doesn't answer times out → `skew_corrected=false` (shared-clock assumption, as before).
Validated cross-LAN (GNOME box → dev box): offset ≈ 1.57 ms (reproducible), rtt ~140 µs, **p50
1.30 ms** skew-corrected capture→reassembled. The skew handshake is now a shared core helper
(`quic::clock_sync` → `ClockSkew`) used by both the reference client and the **embeddable
connector** — `NativeClient` runs it at connect and exposes the offset over the C ABI
(`punktfunk_connection_clock_offset_ns`), so the Apple client can convert a present instant to the
host clock. The Apple client now consumes that offset: `PunktfunkConnection.clockOffsetNs` +
`LatencyMeter` surface a **capture→client-receipt** (skew-corrected) p50/p95 in the HUD — the first
cross-machine latency the real Apple client reports. **Remaining for *true* glass-to-glass**:
(1) the **decode→present** tail — the stage-1 `AVSampleBufferDisplayLayer` decodes+presents
compressed samples internally with no per-frame callback, so it needs the **stage-2 presenter**
(`VTDecompressionSession` decode-completion timestamp + `CAMetalLayer`/display-link present) to
stamp on-glass present time; (2) the host **render→capture** term (PipeWire buffer presentation
timestamp vs our capture stamp). **render→capture is parked (low priority):** pipewire-rs 0.9.2
exposes no per-buffer meta accessor, no raw buffer pointer (`pub(crate)`), and no stream-timing
API, so reading `SPA_META_Header.pts` would require introducing raw `spa_sys`/`pw_sys` FFI into the
working, perf-critical capture buffer-acquisition — a risky rewrite for the *smallest* g2g term,
with KWin/Mutter `Header.pts` support unconfirmed. Glass-to-glass is effectively complete as
**capture→present** (the stage-2 presenter measures it). `tools/latency-probe` is still the
cross-machine orchestrator.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
1. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
copy), saving ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
2. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
3. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
thread split, only after measurement justifies it.
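The skew handshake in the list above (several NTP-style rounds, keep the min-RTT sample, add the offset to the client receive instant) can be sketched as follows. This is a hedged illustration: it assumes each echo carries a single host timestamp and that path delay is symmetric, and `Round`, `estimate_offset`, and `capture_to_receive_ns` are hypothetical names, not the real `ClockProbe`/`ClockEcho` layout.

```rust
/// One probe round, all values in nanoseconds.
/// Assumption: `host_ns` is the host clock reading when it handled the probe.
#[derive(Clone, Copy, Debug)]
struct Round {
    client_send_ns: i64,
    host_ns: i64,
    client_recv_ns: i64,
}

impl Round {
    fn rtt_ns(&self) -> i64 {
        self.client_recv_ns - self.client_send_ns
    }

    /// Host clock minus client clock, assuming the probe and the echo
    /// each took half of the round trip.
    fn offset_ns(&self) -> i64 {
        let midpoint = self.client_send_ns + self.rtt_ns() / 2;
        self.host_ns - midpoint
    }
}

/// Keep the min-RTT round (least queueing noise); returns (offset, rtt).
fn estimate_offset(rounds: &[Round]) -> Option<(i64, i64)> {
    rounds
        .iter()
        .min_by_key(|r| r.rtt_ns())
        .map(|r| (r.offset_ns(), r.rtt_ns()))
}

/// Skew-corrected capture->receive latency: move the client receive instant
/// onto the host clock, then subtract the host capture stamp.
fn capture_to_receive_ns(client_recv_ns: i64, offset_ns: i64, pts_ns: i64) -> i64 {
    client_recv_ns + offset_ns - pts_ns
}
```

Choosing the min-RTT round matters because a round delayed by queueing skews the midpoint estimate. The residual error is bounded by half the chosen RTT, so a LAN round trip in the low hundreds of microseconds keeps the latency percentiles meaningful.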
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `punktfunk1-host` advertise the native service over mDNS:
- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
both host entry points get it; best-effort (a discovery failure never blocks streaming —
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
(so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-probe --discover [SECS]` browses and prints each host (name, addr:port,
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — dev box discovered the GNOME-box appliance
(`home-worker-3 192.168.1.248:9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
(`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
host-side contract above is all it needs.
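Since the service type and TXT keys are the contract, a client-side parser for the advert can be small. The sketch below is illustrative only: `NativeAdvert`, `Pairing`, and `parse_txt` are hypothetical names, and it assumes the TXT entries have already been split into `key=value` strings by whatever mDNS library is in use.

```rust
#[derive(Debug, PartialEq)]
enum Pairing {
    Required,
    Optional,
}

#[derive(Debug, PartialEq)]
struct NativeAdvert {
    proto: String,
    fingerprint: String, // advisory only; pinning still verifies on connect
    pairing: Pairing,
    host_id: String, // used to de-duplicate hosts seen on several interfaces
}

/// Parse `key=value` TXT entries. Returns None if a required key is missing,
/// a value is malformed, or the protocol is not `punktfunk/1`.
fn parse_txt(entries: &[&str]) -> Option<NativeAdvert> {
    let mut proto: Option<String> = None;
    let mut fingerprint: Option<String> = None;
    let mut pairing: Option<Pairing> = None;
    let mut host_id: Option<String> = None;

    for entry in entries {
        if let Some((key, value)) = entry.split_once('=') {
            match key {
                "proto" => proto = Some(value.to_string()),
                "fp" => fingerprint = Some(value.to_string()),
                "pair" => {
                    pairing = match value {
                        "required" => Some(Pairing::Required),
                        "optional" => Some(Pairing::Optional),
                        _ => None,
                    }
                }
                "id" => host_id = Some(value.to_string()),
                _ => {} // ignore unknown keys so newer hosts stay browsable
            }
        }
    }

    let proto = proto?;
    if proto != "punktfunk/1" {
        return None;
    }
    Some(NativeAdvert {
        proto,
        fingerprint: fingerprint?,
        pairing: pairing?,
        host_id: host_id?,
    })
}
```

A host picker can call `parse_txt` on each browse result, skip entries that return `None`, and use `pairing` to decide up front whether to show the PIN dialog.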
## 14. Concurrent sessions *(shared-desktop multi-view ✅ done; multi-user deferred)*
The host no longer serves one client at a time. The accept loop spawns each session (`JoinSet`),
bounded by `--max-concurrent` (default 4, an NVENC bound; overflow waits in the accept queue). Each
session keeps its own virtual output + NVENC encoder; the host-lifetime input/audio/mic services stay
shared.
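The slot bound can be pictured as a counting semaphore whose guard frees the slot when a session ends. The sketch below uses only the standard library and is not the host's implementation (the text says the real accept loop uses a `JoinSet`); `SessionSlots` and `SlotGuard` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Shared counter of sessions in flight plus a condvar to wake waiters.
#[derive(Clone)]
struct SessionSlots {
    state: Arc<(Mutex<usize>, Condvar)>,
    max: usize,
}

/// Held for the lifetime of one session; dropping it frees the slot.
struct SlotGuard {
    state: Arc<(Mutex<usize>, Condvar)>,
}

impl SessionSlots {
    fn new(max: usize) -> Self {
        Self { state: Arc::new((Mutex::new(0), Condvar::new())), max }
    }

    /// Block until a slot is free (overflow clients wait here).
    fn acquire(&self) -> SlotGuard {
        let (lock, cv) = &*self.state;
        let mut in_use = lock.lock().unwrap();
        while *in_use >= self.max {
            in_use = cv.wait(in_use).unwrap();
        }
        *in_use += 1;
        SlotGuard { state: Arc::clone(&self.state) }
    }

    /// Non-blocking variant: None when every slot is taken.
    fn try_acquire(&self) -> Option<SlotGuard> {
        let (lock, _) = &*self.state;
        let mut in_use = lock.lock().unwrap();
        if *in_use >= self.max {
            return None;
        }
        *in_use += 1;
        Some(SlotGuard { state: Arc::clone(&self.state) })
    }

    fn in_use(&self) -> usize {
        *self.state.0.lock().unwrap()
    }
}

impl Drop for SlotGuard {
    fn drop(&mut self) {
        let (lock, cv) = &*self.state;
        *lock.lock().unwrap() -= 1;
        cv.notify_one(); // wake one waiting acceptor, if any
    }
}
```

Tying the release to `Drop` means a session that exits early or panics still returns its slot, so the bound cannot leak over time.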
- **Done & live (shared-desktop multi-view):** multiple devices viewing/controlling the **same**
desktop on the shared-desktop backends (kwin/mutter/wlroots) — e.g. stream *your* desktop to a
laptop + a TV at once; shared input/audio is the correct semantics there. Validated live on the
GNOME box: two clients → **two independent Mutter virtual outputs (1280×720 + 1920×1080) streaming
simultaneously**. The QUIC handshake stays in the accept loop so a failed handshake doesn't consume
a slot or block the next client.
- **Deferred — gamescope multi-user (independent desktops):** the *other* model — each client its own
gamescope instance, with **per-session input + audio + mic** (the multi-user / cloud-gaming case).
Researched 2026-06-12 and **parked**: it's a large multi-file refactor (per-instance EIS sockets +
a per-session injector + per-session **null-sink audio routing** + per-session mic) for a niche
use case, while the common multi-device case is already covered by the multi-view model above. Full
research + the plumbing list: [gamescope Multi-User Isolation](/docs/gamescope-multiuser).
- **HDR / 10-bit on the *Linux* host.** HDR streaming already works from a
[Windows host](/docs/windows-host) to an HDR-capable client (Windows, Android). On Linux it's
blocked upstream — no shipping compositor emits a 10-bit/HDR capture stream yet — and ready the
moment one does.
- **Advanced DualSense voice-coil haptics.** Scoped and shelved (it rides the controller's USB audio
interface, with near-zero game support on Linux). Adaptive triggers, rumble, and the lightbar
already ship.
@@ -79,6 +79,10 @@ session unit — see [Bazzite](/docs/bazzite).
## Windows
> punktfunk is Linux-first, but a native **Windows host** also ships — a signed installer with an SCM
> service and a bundled virtual-display driver. It's **NVIDIA-only** (NVENC) and newer than the Linux
> host. (Not to be confused with the Windows *client*, which streams *to* a Windows PC.)
On Windows the host runs as a `LocalSystem` service that launches into the interactive session, so it
captures the secure desktop (UAC / lock screen) and survives reboots with nobody logged in — the same
model Sunshine/Apollo use.
@@ -98,7 +102,7 @@ way you need an NVIDIA GPU + driver (the host is NVENC-only on Windows).
After a reboot, from another machine on the network:
```sh
punktfunk-probe --discover # or just look for the host in the Apple app / Moonlight
punktfunk-probe --discover # or just look for the host in a native client / Moonlight
```
If the host is listed, it's up. If not, check `journalctl --user -u punktfunk-host` on the host.
+99 -83
@@ -1,11 +1,11 @@
---
title: "Status & Progress"
description: "Where the work stands, what's live on each box, and a running progress log."
description: "Where the work stands across the core, the host, and the native clients."
---
The living progress tracker. Milestone-level status lives in [`CLAUDE.md`](https://github.com)
and the design in the [Implementation Plan](/docs/implementation-plan); this page is the
**current state + a dated log** of what landed, kept up to date as work happens. Newest first.
A high-level view of where punktfunk stands. The ordered plan of work is on the
[Roadmap](/docs/roadmap), and milestone-level detail lives in
[`CLAUDE.md`](https://git.unom.io/unom/punktfunk/src/branch/main/CLAUDE.md).
## Milestones at a glance
@@ -13,92 +13,108 @@ and the design in the [Implementation Plan](/docs/implementation-plan); this pag
|---|---|
| **Core** `punktfunk-core` + C ABI (protocol · FEC · crypto) | ✅ complete & hardened |
| **GameStream host** (Moonlight-compatible) | ✅ working end-to-end; HDR/surround-audio polish open |
| **Native protocol** `punktfunk/1` (QUIC control + UDP data) | ✅ full session planes, validated live |
| **Native clients** — decode + present (Apple first) | 🟡 macOS stage 1 live; stage-2 presenter built + decode-tested (opt-in, present needs live validation). **Linux GTK client stage 1 live** (2026-06-12) |
| **Native protocol** `punktfunk/1` (QUIC control + UDP data, GF(2¹⁶) Leopard FEC + AES-GCM) | ✅ full session planes, validated live |
| **Windows host** (NVIDIA, x64) | 🟡 implemented & shipping as a signed installer; NVIDIA-only, newer than the Linux host |
| **macOS / iOS / iPadOS / tvOS client** | ✅ full client; on-glass stage-2 presenter behind an opt-in flag, becoming the default |
| **Linux client** (`punktfunk-client`, GTK4/libadwaita) | ✅ full client; VAAPI zero-copy decode + software fallback |
| **Windows client** (`punktfunk-client`, WinUI 3) | ✅ stage 1 complete; ships as signed MSIX; on-glass hardware validation pending |
| **Android client** (phone + Android TV) | ✅ full client; hardware HEVC decode + HDR10 |
| **Web console** (over the management API) | ✅ status · devices · pairing |
## Live on the boxes
## What works today
| Box | Role | Compositor | Notes |
|---|---|---|---|
| **home-worker-2** (dev) | KDE/KWin appliance | kwin (headless Plasma) | QEMU VM, passthrough RTX 5070 Ti; `serve --native` user unit |
| **home-worker-3** (GNOME) | GNOME/Mutter appliance | mutter (RecordVirtual) | RTX 4090; autologin GNOME Wayland; `serve --native` user unit. See [Ubuntu — GNOME](/docs/ubuntu-gnome) |
| **home-bazzite-1** | SteamOS-like host | gamescope | host-managed Steam session at client mode. See [Bazzite Setup](/docs/bazzite) |
punktfunk is a low-latency desktop and game streaming **host** — Linux-first (Linux + NVIDIA, NVENC),
with a newer **NVIDIA-only Windows host** too — and native **clients** on macOS, iOS/iPadOS/tvOS,
Linux, Windows, and Android.
All three appliances advertise over mDNS (`_punktfunk._udp`) and require PIN pairing by default.
- **Two protocols.** The host speaks the **GameStream** protocol, so any **Moonlight**
client works out of the box, plus its own lower-latency **`punktfunk/1`** protocol
(QUIC control plane + UDP data plane with GF(2¹⁶) Leopard FEC and AES-GCM).
- **Native resolution, no scaling.** Every session gets a virtual display at the client's
exact resolution and refresh rate, via per-compositor backends for **KWin**,
**gamescope**, **Mutter**, and **Sway/wlroots**.
- **Zero-copy GPU pipeline.** Captured frames stay on the GPU (dmabuf → CUDA → NVENC) with
automatic split-encode at very high resolutions. Stable 240 fps at 5120×1440 has been
measured.
- **HDR (10-bit), on the Windows host.** An HDR Windows desktop is captured and encoded as HEVC
Main10 (BT.2020 PQ) to HDR-capable clients (Windows, Android). Linux hosts stream 8-bit for now —
HDR there is blocked upstream at the compositor.
- **Secure by default.** A **SPAKE2 PIN pairing** ceremony establishes trust (the host
shows a 4-digit PIN; an attacker gets a single online guess, no offline dictionary
attack). Trust-on-first-use (TOFU) remains an explicit opt-in for fully trusted LANs.
- **LAN auto-discovery.** Hosts advertise over mDNS (`_punktfunk._udp`); clients browse and
list them automatically.
- **Full input.** Mouse, keyboard, and gamepads (including DualSense touchpad, motion,
rumble, lightbar, player LEDs, and adaptive triggers) in both directions.
- **Audio both ways.** Opus desktop audio host → client, plus an opt-in, paired-only client
microphone uplink.
- **Management surface.** A REST management API with a checked-in OpenAPI document, plus a
web console for status, paired devices, and pairing.
## Progress log
### Native clients
### 2026-06-12
- **Native Linux client — stage 1, first light** (`clients/linux`, binary
`punktfunk-client`). GTK4/libadwaita app on the **Option A** architecture picked after a
six-angle research pass (toolkits / hw decode / Wayland presentation / input capture /
prior art / codebase): links `punktfunk-core` directly as a crate (no C ABI;
`NativeClient` is `Sync` now), mDNS host list, TOFU + SPAKE2 PIN pairing dialogs
(identity shared with `client-rs`), FFmpeg software HEVC decode (`LOW_DELAY` + slice
threads) into a `GtkGraphicsOffload`-wrapped picture, PipeWire playback with the host
mic-player's jitter ring inverted, SDL3 gamepad capture + rumble/lightbar feedback,
layout-independent keyboard (exact inverse of the host's VK table), absolute mouse +
WHEEL_DELTA scroll, compositor-shortcut inhibition, fullscreen, stats overlay.
**Validated live** against this box's `serve --native`: 1080p60 at a locked 60 fps,
capture→decoded **p50 ≈ 6.4 ms** (software decode, debug build). Next: VAAPI dmabuf →
`GdkDmabufTexture` (Tier-1 zero-copy on Intel/AMD clients), DualSense
touchpad/motion/trigger replay over SDL3, then the stage-2 raw-Wayland presenter
(wp_presentation feedback, tearing-control, Vulkan Video for NVIDIA clients).
- **Delegated pairing approval (§8b-1)** — an unpaired device that tries to connect to a
pairing-required host now shows up as a **pending request** in the web console's Pairing page;
one click approves it (optionally relabeling) and pairs its certificate fingerprint — no PIN
fetched out of band. New mgmt endpoints (`/native/pending` + approve/deny), an in-memory pending
queue in `NativePairing` (fp-deduped, capped, 10-min expiry), and an optional **device name** in
the `Hello` (back-compat trailing field; `client-rs --name` sends it). End-to-end tested.
§8b-2 (approve from a paired device's own app) is the client-side follow-up.
- **CI + deployment landed** (see the [CI & Docker](/docs/ci) guide). Gitea Actions, three
workflows: Rust workspace checks inside the new `punktfunk-rust-ci` builder image (Ubuntu 26.04,
full link-dep stack incl. a libcuda stub — 141/141 tests green in-container), web + docs-site
build/typecheck, `docker.yml` building+pushing `punktfunk-web`/`punktfunk-docs`/`punktfunk-rust-ci`
to the registry, and `apple.yml` (xcframework → `swift build`/`swift test`) on a new **host-mode
macOS runner** (`home-mac-mini-1`, provisioned by `scripts/ci/setup-macos-runner.sh`; macOS
Local-Network privacy forces it to run as a root LaunchDaemon). Host and native clients stay
un-dockerized by design. **This site now deploys automatically**: `deploy-docs` ships it to
unom-1:3220, Caddy serves <https://docs.punktfunk.unom.io> — live and verified.
- **Concurrent sessions** — the host no longer serves one client at a time. The accept loop spawns
each session (`JoinSet`), bounded by `--max-concurrent` (default 4, an NVENC bound; overflow waits
in the accept queue). Each session keeps its own virtual output + encoder; they share the
host-lifetime input/audio/mic services — i.e. **multiple devices viewing/controlling the same
desktop** on kwin/mutter/wlroots. Validated live on the GNOME box: two clients connected at once →
**two independent Mutter virtual outputs (1280×720 + 1920×1080) streaming simultaneously**
(39 MB + 48 MB). gamescope's *independent-desktops* (multi-user) isolation — per-session
input/audio — is a follow-up.
- **Apple client latency HUD** — `PunktfunkConnection.clockOffsetNs` (from the C-ABI getter) +
`LatencyMeter` surface a skew-corrected **capture→client-receipt** p50/p95 in the macOS HUD: the
first cross-machine latency the real Apple client reports. (Stage-1 `AVSampleBufferDisplayLayer`
has no present callback, so decode→present is excluded — that needs the stage-2 presenter.)
Needs an `xcframework` rebuild + `swift test` on the Mac to validate.
- **Skew handshake in the connector + C ABI** — `quic::clock_sync` is now a shared core helper used
by both the reference client and `NativeClient`; the connector runs it at connect and exposes the
host clock offset over the C ABI (`punktfunk_connection_clock_offset_ns`). This is the substrate
the Apple client needs for the decode→present (glass-to-glass) term.
- **Wall-clock skew handshake** (`ClockProbe`/`ClockEcho`, 8 NTP rounds after `Start`) — makes the
client's capture→reassembled latency valid **cross-machine**. Validated GNOME box → dev box:
offset 1.57 ms removed, **p50 1.30 ms** skew-corrected. (`05bc9ab`)
- **Native LAN auto-discovery** — host advertises `_punktfunk._udp` (TXT: fingerprint, pairing,
proto); `punktfunk-probe --discover` lists hosts. Validated cross-LAN. (`4fff464`)
- **Third test box stood up** — home-worker-3 (Ubuntu 26.04, RTX 4090, GNOME 50): first GNOME/Mutter
zero-copy streaming on a real desktop; **1 Gbps probe clean** (625 MB/5 s, `send_dropped=0`).
Two physical-NVIDIA gotchas documented in [Ubuntu — GNOME](/docs/ubuntu-gnome).
- **Encode|send thread split** validated on real NIC (`send_dropped=0` at 720p60 / 1080p120). (`b295a5b`)
| Client | Highlights |
|---|---|
| **macOS / iOS / iPadOS / tvOS** | VideoToolbox HEVC decode, GameController capture, full DualSense feedback, mDNS discovery, PIN pairing + TOFU, network speed test, latency HUD. Stage-2 presenter (`VTDecompressionSession` → `CAMetalLayer`, ~11 ms p50 capture→present) is built and validated on glass behind an opt-in flag, becoming the default. Ships as one universal TestFlight build / App Store listing. |
| **Linux** (`punktfunk-client`) | GTK4/libadwaita. FFmpeg decode with VAAPI → DRM-PRIME dmabuf zero-copy (Intel/AMD; software fallback on NVIDIA), PipeWire audio + mic, SDL3 gamepads incl. DualSense, mDNS discovery, PIN pairing + TOFU, speed test. Ships as Flatpak, apt, rpm, and Arch packages. |
| **Windows** (`punktfunk-client`) | WinUI 3. D3D11VA zero-copy decode, HDR10, WASAPI audio + mic, SDL3 gamepads incl. DualSense, mDNS discovery, and the full PIN/TOFU trust surface are all implemented. Ships as a signed MSIX (x86_64 + ARM64). **Stage 1 complete; D3D11VA decode, HDR present, and the GUI are pending on-glass validation on real GPU hardware.** |
| **Android** (phone + Android TV) | Kotlin app with a Rust core over JNI. NDK `AMediaCodec` hardware HEVC decode + HDR10 (Main10/BT.2020 PQ), Opus/Oboe audio + mic, gamepad input with rumble/HID feedback, NsdManager mDNS discovery, PIN pairing + TOFU (Keystore identity), live stats HUD, and D-pad/controller focus navigation for TV. Ships to the Google Play Internal Testing track. |
### Earlier (see roadmap + git log)
- **1 Gbps data plane**: batched `sendmmsg`/`recvmmsg` + microburst-cap paced send thread.
- **Boot appliance**: headless KDE session + host systemd units (no login).
- **Speed test + settable bitrate**: negotiation + bandwidth probe (host side).
- **DualSense** UHID + haptics; gamepads live; mic uplink; AV1 + surround (unit/live-capture tested).
`punktfunk-probe` is a headless reference and measurement client (for testing and
benchmarking, not everyday use).
## Validated live on
The stack has been validated live on a range of hosts and clients:
- **Hosts:** Ubuntu (GNOME / KDE), Fedora KDE, and Bazzite (gamescope) on machines with
RTX-class NVIDIA GPUs, across the KWin, gamescope, Mutter, and Sway/wlroots backends.
- **Clients:** native macOS, Linux, and Android clients against live hosts, plus stock
Moonlight clients over the GameStream path.
- **Cross-machine latency** is measured and skew-corrected (a wall-clock handshake removes
the inter-machine clock offset), so capture-to-present numbers are valid across the LAN.
## Highlights
Notable capabilities that have landed, newest first:
- **Native Linux client (stage 1).** GTK4/libadwaita app that links `punktfunk-core`
directly: mDNS host list, TOFU + SPAKE2 PIN pairing, FFmpeg HEVC decode, PipeWire audio
with mic uplink, SDL3 gamepad capture with rumble/lightbar feedback, layout-independent
keyboard, absolute mouse, fullscreen, and a stats overlay. VAAPI → `GdkDmabufTexture`
zero-copy decode on Intel/AMD with a proven software fallback.
- **Delegated pairing approval.** An unpaired device that knocks on a pairing-required host
appears as a pending request in the web console's Pairing page; one click approves and
pins its certificate — no PIN fetched out of band.
- **Concurrent sessions.** The host serves multiple clients at once (bounded by an NVENC
limit), each with its own virtual output and encoder — e.g. stream the same desktop to a
laptop and a TV simultaneously.
- **Cross-machine latency HUD + wall-clock skew handshake.** A short NTP-style handshake
aligns client and host clocks, making capture-to-reassembled latency valid across
machines; the Apple client surfaces a skew-corrected capture-to-receipt p50/p95 in its
HUD.
- **Native LAN auto-discovery.** Hosts advertise `_punktfunk._udp` over mDNS (with TXT
records carrying the protocol, cert fingerprint, and pairing requirement); clients
discover and list them automatically.
- **1 Gbps data plane.** Batched `sendmmsg`/`recvmmsg`, a microburst-capped paced send
thread, and larger socket buffers, exploiting the GF(2¹⁶) Leopard FEC that breaks the
classic ~1 Gbps GameStream ceiling.
- **Boot appliance.** A headless compositor session plus host systemd units, so a host can
come up and stream with no interactive login.
- **Network speed test + settable bitrate.** Bitrate negotiation and an in-band bandwidth
probe inform the client's bitrate picker instead of guesswork.
- **Rich DualSense.** A full UHID DualSense backend on the host (gamepad, motion, touchpad,
lightbar, player LEDs, adaptive triggers) with feedback carried back to the clients.
- **AV1 + surround audio** are implemented and unit/live-capture tested.
## In flight / next
See the [Roadmap](/docs/roadmap) for the ordered list. Near-term:
- **True glass-to-glass**: Apple client present-stamp (decode→present) + host render→capture term.
- **Apple stage-2 presenter** (`VTDecompressionSession` → `CAMetalLayer`): built + decode-unit-tested + live-validated on glass behind the `punktfunk.presenter` flag (capture→present ~11 ms p50); make it the default after a few resolution/HDR checks.
- **Mandatory PIN pairing + delegated pairing approval** (an already-paired device approves a new one).
- **gamescope multi-user isolation** — per-session input/audio so concurrent sessions are independent
desktops (the shared-desktop multi-view case landed).
- **bazzite** kept up to date (currently offline; one rebuild behind).
- **True glass-to-glass latency** — combine the client present-stamp (decode → present)
with the host render → capture term for a complete end-to-end number.
- **Make the Apple stage-2 presenter the default** after a few more resolution/HDR checks.
- **On-glass validation of the Windows client** (D3D11VA decode, HDR present, GUI) on real
GPU hardware.
- **gamescope multi-user isolation** — per-session input/audio so concurrent sessions can
be fully independent desktops (the shared-desktop multi-view case already works).
+3 -2
@@ -65,8 +65,9 @@ Then log out and back in. On other distros this is `sudo usermod -aG input $USER
## Stutter, drops, or high latency
- Lower the **bitrate**. On a busy or Wi-Fi link, the requested bitrate may be too high — the Apple
app's [speed test](/docs/configuration#bitrate) picks a safe value; with Moonlight, set it manually.
- Lower the **bitrate**. On a busy or Wi-Fi link, the requested bitrate may be too high — the native
clients' [speed test](/docs/configuration#bitrate) picks a safe value; with Moonlight, set it
manually.
- Prefer a **wired** connection or 5 GHz Wi-Fi between host and client.
- Streaming to **many devices at once** shares the GPU encoder. The production host
(`serve --native`) handles one native session at a time, with extra clients queued; heavy load is
+53 -59
@@ -1,77 +1,71 @@
---
title: "Windows Host"
description: "Feasibility and scoping for a Windows host backend."
description: "Run the punktfunk streaming host on a Windows PC with an NVIDIA GPU."
---
**Status: scoped, deferred — but de-risked.** A Windows host is architecturally an *"add a backend"*
job, not a parallel port. The one thing that used to make it **large** — the per-client *virtual*
output, which has no user-mode Windows API and seemingly needed a self-signed kernel Indirect
Display Driver (IDD) — is **solved by reusing [SudoVDA](https://github.com/VirtualDrivers), the
Sunshine Virtual Display Adapter**: a pre-built, signed IDD that creates virtual displays at
arbitrary `WxH@Hz` on demand. We install it and drive its control interface; **no driver to write or
WHQL-sign.** That turns the headline feature from XL into a medium backend. This doc records what's
left so the work can be picked up deliberately.
**Status: implemented and shipping — NVIDIA-only, x64-only.** punktfunk is Linux-first, but it also
runs as a native **Windows host**: a signed installer registers a `LocalSystem` service that streams
your Windows desktop or games to any punktfunk or Moonlight client, at the client's exact resolution
via a virtual display — including **HDR10** (10-bit BT.2020 PQ) when your Windows desktop is in HDR
mode. It's newer and less battle-tested than the Linux host, and it is built specifically around
NVIDIA hardware. (The Linux host is 8-bit only — HDR there is blocked upstream.)
(Grounded in a 4-agent read of the host crate, 2026-06-10; SudoVDA path added 2026-06-11.)
> This page is about the Windows **host** (streaming *from* a Windows PC). To stream *to* a Windows
> PC, see the [Windows client](/docs/clients#windows-desktop-client).
## What's already done for us
## Requirements
punktfunk is cleanly layered. **~95% of the codebase is platform-agnostic and reuses verbatim:**
- **Windows 10/11, x64.** ARM64 is not supported — both NVENC and the virtual-display driver are
x64-only.
- **An NVIDIA GPU + driver.** The host encodes with NVENC (`nvEncodeAPI64.dll`); there is no other
encoder backend on Windows.
- **(Optional) ViGEmBus** for virtual gamepads — a manual prerequisite for now
([releases](https://github.com/nefarius/ViGEmBus/releases)).
| Reusable as-is | Why |
|---|---|
| `punktfunk-core` (protocol, FEC, crypto, session, transport, **C ABI**) | Zero platform deps — no `cfg(linux)` anywhere; the C ABI is already cross-platform |
| QUIC control plane (`quic.rs`, pairing, mode negotiation) | quinn + tokio are portable |
| GameStream P1.1 (mDNS, serverinfo, pairing, RTSP, ENet) — *except* `stream.rs`/`audio.rs` | pure wire logic |
| Management REST API (`mgmt.rs`) + OpenAPI | axum/tokio, portable |
| Pipeline + `punktfunk1.rs` orchestration | trait-generic — calls `capturer.next_frame()`, `encoder.submit/poll()`; **needs zero changes** |
| The **trait boundaries** themselves: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral signatures; Linux deps are already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |
## Install
So a Windows host is **new `#[cfg(target_os = "windows")]` backend modules behind the existing
traits** — the per-frame path, protocol, and control plane don't move. No architectural refactor is
required; the boundaries are already in the right places.
Download the signed `punktfunk-host-setup-<ver>.exe` from the package registry and run it — it
installs the host into `C:\Program Files\punktfunk`, optionally installs the bundled **SudoVDA**
virtual-display driver, and registers + starts the service. Full steps (including the silent install
and the CLI `punktfunk-host service install` path) are in
[Running as a Service → Windows](/docs/running-as-a-service#windows); packaging internals live in
[`packaging/windows`](https://git.unom.io/unom/punktfunk/src/branch/main/packaging/windows/README.md).
## What a Windows host needs (new code)
## How it works
Each row is a Linux backend that needs a Windows sibling. Effort is the *implementation* effort;
all reuse the existing trait.
The host installs a **`LocalSystem` SCM service** that runs from Session 0 and launches a worker into
the interactive session (`CreateProcessAsUserW`). That lets it **capture the secure desktop** (UAC
prompts, the lock screen) and keep streaming across reboots with nobody logged in — the same model
Sunshine and Apollo use. Service registration, firewall rules, and the supervisor all live in
`punktfunk-host service install`; the installer just lays the exe down and calls it elevated.
| Subsystem | Linux today | Windows equivalent | Effort | Notes |
|---|---|---|---|---|
| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **DXGI Desktop Duplication** (or Windows.Graphics.Capture) → D3D11 texture | M | DXGI gives a GPU `B8G8R8A8` texture directly |
| **Virtual display** | KWin/Mutter/Sway/gamescope protocols | **SudoVDA** (pre-built signed IDD) — install + drive its control API to add/remove a `WxH@Hz` virtual monitor per session | **M** | ✅ **no longer the blocker**: SudoVDA is the same IDD Sunshine ships, so no driver to author or sign. The `VirtualDisplay` backend = enable the adapter, create a monitor at the client's mode, capture it (DXGI), tear it down on session end. Fallback if SudoVDA is absent: capture an existing monitor (loses native-resolution) |
| **Encode** | `ffmpeg-next` NVENC, CUDA hwframes | Media Foundation H.264/HEVC/AV1, **or** NVENC SDK direct with a D3D11 device context (`AVD3D11VADeviceContext`) | M–L | `encode.rs` AU/codec logic + NVENC option strings are portable; only the hwdevice + frame-pool glue swaps |
| **Zero-copy bridge** | dmabuf → EGL/Vulkan → CUDA | D3D11 texture → NVENC (shared texture / `cudaImportExternalMemory` + D3D12 fence) | M | **optional** — a portable CPU-copy path already exists, so v1 can skip this |
| **Input (ptr/kbd)** | libei (RemoteDesktop portal) / wlr protocols | **SendInput** (the successor to `keybd_event`/`mouse_event`) | S | the VK→evdev table just becomes VK→`VIRTUAL_KEY` (already Win32-native) |
| **Input (gamepads)** | uinput X-Box-360 pad + FF rumble | **ViGEm** (Virtual Gamepad Emulation) + HID reports | M | rumble back-channel maps to ViGEm notifications |
| **Audio capture** | PipeWire sink-monitor | **WASAPI loopback** (`IAudioCaptureClient`) | S–M | also produces interleaved f32, so the same `AudioCapturer` contract applies |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio device (VB-Cable-style WDM driver) or WASAPI render-to-fake-device | M | needs a driver or a bundled 3rd-party cable |
| **`sendmmsg` batching** | `gamestream/stream.rs` | already has a `cfg(not(linux))` per-packet fallback | — | nothing to do |
### One core, Windows backends
**Rough total: ~2,000–4,000 LOC of new Rust** (no C++ driver; SudoVDA is reused as-is), spread over
capture/encode/vdisplay/input/audio. With the driver problem solved, the overall effort is now
**medium**; the input+audio layer alone is *small–medium*.
Most of punktfunk is platform-agnostic. `punktfunk-core` (protocol, FEC, crypto, session, transport,
the C ABI), the QUIC control plane, the GameStream wire logic, the management API, and the per-frame
pipeline orchestration are all shared with the Linux host. The Windows host is a set of
`#[cfg(windows)]` backends behind the same traits the Linux host uses:
## Recommended phasing (when picked up)
| Subsystem | Linux backend | Windows backend |
|---|---|---|
| **Capture** | xdg ScreenCast portal → PipeWire (dmabuf) | **Windows.Graphics.Capture** (+ Desktop Duplication for the secure desktop) → D3D11 texture; FP16/10-bit when the desktop is HDR |
| **Virtual display** | KWin / Mutter / Sway / gamescope | **SudoVDA** signed IDD — create a `WxH@Hz` monitor per session, capture it, tear it down |
| **Encode** | `ffmpeg-next` NVENC (CUDA hwframes) | **NVENC** with a D3D11 device (`--features nvenc`); HEVC Main10 / BT.2020 PQ for HDR |
| **Input — mouse/keyboard** | libei / wlr protocols | **SendInput** (Win32 VK + absolute mouse) |
| **Input — gamepads** | uinput Xbox 360 pad + rumble | **ViGEm** virtual pad + rumble back-channel |
| **Audio capture** | PipeWire sink-monitor | **WASAPI loopback** |
| **Virtual mic** | PipeWire `Audio/Source` | WASAPI virtual mic |
1. **Phase 0 — "basic Windows host" (no virtual display).** Capture an *existing* monitor (DXGI
Desktop Duplication) → Media Foundation/NVENC encode → SendInput + WASAPI loopback. This proves
the whole stack on Windows with the smallest surface, reusing all of core/QUIC/GameStream/mgmt.
It loses the per-client native-resolution output but is a working Windows host quickly.
2. **Phase 1 — the virtual display via SudoVDA.** A `VirtualDisplay` backend that enables SudoVDA,
creates a monitor at the client's exact `WxH@Hz`, captures it (DXGI), and tears it down on session
end — restoring punktfunk's headline feature with **no driver authoring or signing**. (Ship/guide
the SudoVDA install as a host prerequisite, like the udev rule on Linux.)
3. **Phase 2 — input + audio parity.** ViGEm gamepads + rumble; WASAPI virtual mic; D3D11→NVENC
zero-copy.
The virtual display uses **[SudoVDA](https://github.com/VirtualDrivers)** (the Sunshine Virtual
Display Adapter) — a pre-built, signed Indirect Display Driver — so there is **no kernel driver to
author or WHQL-sign**. The installer bundles and stages it; if it's absent, the host falls back to
capturing an existing monitor (losing the per-client native-resolution output).
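
The fallback described above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the `Mode` and `CaptureTarget` types, the `parse_mode` and `choose_target` helpers, and the idea that a mode arrives as a `WxH@Hz` string are all hypothetical names for illustration, not the host's actual API.

```rust
/// Hypothetical display mode requested by a client.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Hypothetical capture decision for one session.
#[derive(Debug, PartialEq)]
pub enum CaptureTarget {
    /// SudoVDA is available: create a monitor at the client's exact mode.
    VirtualMonitor(Mode),
    /// SudoVDA is absent: capture a monitor that already exists,
    /// giving up the per-client native-resolution output.
    ExistingMonitor,
}

/// Parses a `WxH@Hz` string such as "2560x1440@120".
/// The string form is an assumption made for this example.
pub fn parse_mode(s: &str) -> Option<Mode> {
    let (resolution, hz) = s.split_once('@')?;
    let (w, h) = resolution.split_once('x')?;
    Some(Mode {
        width: w.parse().ok()?,
        height: h.parse().ok()?,
        refresh_hz: hz.parse().ok()?,
    })
}

/// Picks the virtual monitor when the driver is present, else falls back.
pub fn choose_target(sudovda_present: bool, client_mode: Mode) -> CaptureTarget {
    if sudovda_present {
        CaptureTarget::VirtualMonitor(client_mode)
    } else {
        CaptureTarget::ExistingMonitor
    }
}

fn main() {
    let mode = parse_mode("2560x1440@120").expect("valid mode string");
    println!("{:?}", choose_target(true, mode));
    println!("{:?}", choose_target(false, mode));
}
```

Keeping the decision in one pure function makes the degraded path explicit: callers see from the returned variant whether the session runs at the client's native mode.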
## Why it's deferred (not started now)
## Limitations
- The remaining work is **medium** and mechanical, but **none of it is buildable or testable on the
Linux dev box** — it would be unvalidated code until there's a Windows box in the loop.
- SudoVDA removed the hard blocker (the signed kernel driver); what's left is a backend port, picked
up whenever a Windows target is in scope.
The architecture is ready whenever the work is scheduled; this doc + the clean trait boundaries are
the down payment. Start at **Phase 0** for the fastest path to a working Windows host.
- **NVIDIA-only.** NVENC is the only encoder backend — there is no AMD / Intel / software encode path
on Windows.
- **x64-only.** No ARM64 build (no ARM64 NVIDIA driver, and SudoVDA is x64-only).
- **Newer than the Linux host.** The Linux host is the most battle-tested path; the Windows host is
more recent, with the virtual-mic and gamepad backends the youngest pieces.