docs(site): add Windows host install, restructure nav, new public roadmap
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 2m24s
ci / web (push) Successful in 35s
android / android (push) Successful in 3m27s
ci / docs-site (push) Successful in 31s
ci / bench (push) Successful in 4m43s
deb / build-publish (push) Successful in 4m49s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 24s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m28s
docker / deploy-docs (push) Successful in 17s
- install (host): add a Windows (NVIDIA) section with signed-installer and certificate-trust steps; note the .cer is the same across releases.
- install-client: clarify that the Windows MSIX certificate is the same every release (trust it once; updates need nothing further).
- Move "Project & Internals" out of the public docs site: relocate implementation-plan, apple-stage2-presenter, gamescope-multiuser, dualsense-haptics, ci, and gamestream-host-plan to docs/ and drop them from the nav. Move windows-host into Host Setup.
- Rewrite the roadmap as a lean public page with an at-a-glance grid and current statuses (Windows host shipped/beta, Apple incl. tvOS shipped, Android shipped, concurrent sessions + delegated pairing done).
- Fix the status.md link to the now-internal implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -1,126 +0,0 @@
---
title: "Apple Stage-2 Presenter (handoff)"
description: "Implementation plan for the explicit VTDecompressionSession → CAMetalLayer presenter: hand-paced present plus true decode→present (glass-to-glass) measurement. Written so a Mac agent can pick it up."
---

> **Status update:** the stage-2 presenter described here has since been **built and live-validated**,
> shipping behind an opt-in flag (`AVSampleBufferDisplayLayer` remains the default, known-good path).
> This page is preserved as the implementation/handoff record for that work.

This is the implementation plan for the **stage-2 Apple presenter**. The **stage-1** presenter feeds
compressed HEVC straight into `AVSampleBufferDisplayLayer`, which hardware-decodes **and presents
internally with no per-frame callback**, so we can neither stamp decode/present times nor hand-pace.
Stage-2 takes explicit control: decode with `VTDecompressionSession`, then present decoded frames through a
`CAMetalLayer` driven by a display link. Two wins: **~0.5 refresh off the present tail** (≈8.3 ms at
60 Hz, the biggest client latency term there) and **true decode→present / glass-to-glass** numbers.

All of this is **macOS/iOS/tvOS-only**: build and validate on a Mac (`swift build && swift test`, then
live against a Linux host). The host and connector side is already done: `PunktfunkConnection.clockOffsetNs`
(the connect-time skew offset, host minus client) is what makes the present timestamp valid across
machines. See [Status](/docs/status) and roadmap §12.

## Where it plugs into the existing code

| Existing (stage-1) | Stage-2 change |
|---|---|
| `StreamPump` pulls AUs → `AnnexB.sampleBuffer` → `layer.enqueue` (compressed) | A `Stage2Pump` (or a mode flag on `StreamPump`) feeds AUs to `VTDecompressionSessionDecodeFrame` instead |
| `StreamView`/`StreamViewIOS` host an `AVSampleBufferDisplayLayer` | Host a `CAMetalLayer` (+ a display link); keep the input-capture + HUD overlay unchanged |
| `AnnexB.formatDescription(fromIDR:)` builds the format desc, refreshed on every IDR | **Reused**: it becomes the `VTDecompressionSession`'s format description; recreate the session when it changes |
| `LatencyMeter` records capture→client-receipt at `onFrame` | Extend it to record the **decode-completion** and **present** stages (below) |

Keep stage-1 behind a `UserDefaults` flag (e.g. `punktfunk.presenter = "stage1" | "stage2"`) so a
regression can fall back; `AVSampleBufferDisplayLayer` is the known-good path.

## Decode: VTDecompressionSession

1. Create the session from the IDR's `CMVideoFormatDescription`
   (`AnnexB.formatDescription(fromIDR:)`):
   ```swift
   var session: VTDecompressionSession?
   let attrs: [String: Any] = [
       kCVPixelBufferMetalCompatibilityKey as String: true,
       // 8-bit SDR; 10-bit (…10BiPlanar) for HDR later
       kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
   ]
   // `decodeOutput` (hypothetical name): a @convention(c) VTDecompressionOutputCallback, see step 3
   var callback = VTDecompressionOutputCallbackRecord(
       decompressionOutputCallback: decodeOutput,
       decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
   let status = VTDecompressionSessionCreate(
       allocator: nil,
       formatDescription: fmt,
       decoderSpecification: nil,  // hardware by default; no need to force
       imageBufferAttributes: attrs as CFDictionary,
       outputCallback: &callback,
       decompressionSessionOut: &session)
   ```
2. Per AU: build the same `CMSampleBuffer` as stage-1 (`AnnexB.sampleBuffer(au:format:)`, PTS =
   `au.ptsNs` at a 1e9 timescale) and submit it:
   ```swift
   let status = VTDecompressionSessionDecodeFrame(session, sampleBuffer: sampleBuffer,
       flags: ._EnableAsynchronousDecompression,
       frameRefcon: nil,  // or a boxed context; the PTS already reaches the callback (step 3)
       infoFlagsOut: nil)
   ```
3. The **output callback** delivers `(status, infoFlags, imageBuffer: CVImageBuffer?, presentationTimeStamp, …)`.
   `presentationTimeStamp` is `au.ptsNs` (the host capture clock). **Stamp decode completion here**
   (`CLOCK_REALTIME` ns), retain the `CVPixelBuffer`, and push `{pts, pixelBuffer, decodedNs}` into a
   small `NSLock`-guarded ring (the "ready" queue) that the display link drains.
4. **IDR / mode change**: when `AnnexB.formatDescription` yields a new description, check
   `VTDecompressionSessionCanAcceptFormatDescription`; if it can't, finish and recreate the session (the
   same trigger stage-1 uses to refresh `format`). On a decoder error (`kVTVideoDecoderBadDataErr`, etc.),
   drop frames until the next IDR: there's no out-of-band extradata, and recovery keyframes re-carry the parameter sets.
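
The "ready" ring from step 3 is small enough to sketch. The client itself is Swift; this is a language-neutral illustration in Rust, not the shipped code. `ReadyRing`, `ReadyFrame`, and the fixed capacity are hypothetical, a `Mutex` plays the role of the `NSLock`, and the generic `image` field stands in for the retained `CVPixelBuffer`. The decode callback only pushes (evicting the oldest entry when full), and the display link takes the newest frame and discards the rest, matching the low-latency present policy described in the Present section.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// One decoded frame waiting for present. `T` stands in for the retained
/// CVPixelBuffer: the ring owns it until it is presented or dropped.
#[derive(Debug, PartialEq)]
pub struct ReadyFrame<T> {
    pub pts_ns: i64,     // host capture timestamp (au.ptsNs)
    pub decoded_ns: i64, // client CLOCK_REALTIME at decode completion
    pub image: T,
}

/// Small lock-guarded ring: the decode callback pushes, the display link pops.
pub struct ReadyRing<T> {
    cap: usize,
    frames: Mutex<VecDeque<ReadyFrame<T>>>,
}

impl<T> ReadyRing<T> {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        ReadyRing { cap, frames: Mutex::new(VecDeque::with_capacity(cap)) }
    }

    /// Decode-callback side: enqueue and return immediately.
    /// When the ring is full, the oldest undisplayed frame is evicted.
    pub fn push(&self, frame: ReadyFrame<T>) {
        let mut q = self.frames.lock().unwrap();
        if q.len() == self.cap {
            q.pop_front();
        }
        q.push_back(frame);
    }

    /// Display-link side, once per vsync: take the newest frame and drop every
    /// older undisplayed one. Returns the frame and how many stale frames were dropped.
    pub fn pop_newest(&self) -> Option<(ReadyFrame<T>, usize)> {
        let mut q = self.frames.lock().unwrap();
        let newest = q.pop_back()?;
        let dropped = q.len();
        q.clear();
        Some((newest, dropped))
    }
}
```

Evicting on push keeps the decode callback's work bounded, so it never waits on the display link.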

## Present: CAMetalLayer + display link

- `CAMetalLayer` (device = system default, `pixelFormat = .bgra8Unorm`, `framebufferOnly = true`,
  `drawableSize` = stream W×H). The view is a macOS `NSView`/iOS `UIView` whose `layerClass`/backing layer
  is the `CAMetalLayer` (mirror `StreamView`/`StreamViewIOS`).
- A **display link** drives present: macOS `CVDisplayLink` (or `CADisplayLink` on macOS 14+),
  iOS/tvOS `CADisplayLink`. Each callback carries the **target present timestamp** (`CVTimeStamp` /
  `targetTimestamp`).
- Each vsync: pop the **newest** ready frame (drop older undisplayed ones; low-latency default, no
  smoothing buffer to start), render a fullscreen quad sampling the **biplanar YUV** (luma +
  chroma planes via `CVMetalTextureCache`) with a BT.709 video-range YUV→RGB fragment shader (matching the
  `kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange` buffers), then `commandBuffer.present(drawable)`
  (or `present(drawable, atTime:)`). **Stamp the present time** for the frame just shown (the display
  link's target timestamp, converted to `CLOCK_REALTIME`).
- Colorspace: BT.709 8-bit for now (matches the host's SDR). HDR (BT.2020/PQ, 10-bit `…10BiPlanar` +
  EDR via `CAMetalLayer.wantsExtendedDynamicRangeContent`) is a later tie-in with the HDR roadmap (§10).
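
The per-pixel math the BT.709 fragment shader performs can be checked on the CPU first. A minimal Rust sketch, assuming 8-bit video-range input (luma 16–235, chroma 16–240, as the `…VideoRange` pixel format implies) and producing non-linear R'G'B' in [0, 1]. The function name is hypothetical; the coefficients are derived from the standard BT.709 luma weights Kr = 0.2126 and Kb = 0.0722 rather than hard-coded.

```rust
/// BT.709 luma weights.
const KR: f32 = 0.2126;
const KB: f32 = 0.0722;
const KG: f32 = 1.0 - KR - KB;

/// Convert one 8-bit video-range (limited) BT.709 Y'CbCr sample to
/// non-linear R'G'B' in [0, 1], the same math a fragment shader would run
/// per pixel after sampling the luma and chroma planes.
pub fn bt709_video_range_to_rgb(y: u8, cb: u8, cr: u8) -> [f32; 3] {
    // Expand limited range: luma 16..=235, chroma 16..=240 centred on 128.
    let y = (y as f32 - 16.0) / 219.0;
    let cb = (cb as f32 - 128.0) / 224.0;
    let cr = (cr as f32 - 128.0) / 224.0;

    let r = y + 2.0 * (1.0 - KR) * cr;
    let g = y - (2.0 * KB * (1.0 - KB) / KG) * cb - (2.0 * KR * (1.0 - KR) / KG) * cr;
    let b = y + 2.0 * (1.0 - KB) * cb;

    // Out-of-gamut combinations are clamped, like a shader's saturate().
    [r.clamp(0.0, 1.0), g.clamp(0.0, 1.0), b.clamp(0.0, 1.0)]
}
```

A reference like this is handy for testing the shader output: video-range black (16, 128, 128) maps to (0, 0, 0) and white (235, 128, 128) to (1, 1, 1).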

### Cheaper intermediate (2a) if the Metal path is too big for one step
Decode with `VTDecompressionSession` (this yields the **decode-completion timestamp**, i.e. capture→decoded),
then wrap the decoded `CVPixelBuffer` in a `CMSampleBuffer` and `enqueue` it into the existing
`AVSampleBufferDisplayLayer` (it accepts uncompressed pixel buffers too). This measures the decode term
**without** a Metal renderer, but **not** true present (the layer still presents internally). Ship 2a
first if useful; 2b (CAMetalLayer + display link) is required for the on-glass present stamp.

## Measurement (the whole point)

Extend `LatencyMeter` (or add per-stage meters) so each frame records three instants (capture `pts_ns`,
`decodedNs`, `presentedNs`). The client stamps are `CLOCK_REALTIME` ns; `offset` below is
`connection.clockOffsetNs`, which shifts a client stamp onto the host clock that `pts_ns` already uses:

- **capture→decoded** = `decodedNs + offset − pts_ns` (VideoToolbox decode latency, cross-machine)
- **decode→present** = `presentedNs − decodedNs` (the present tail stage-2 shortens; both ends are client
  stamps, so no offset is needed)
- **capture→present** = `presentedNs + offset − pts_ns`: **the glass-to-glass number** (modulo the
  host render→capture term, still unmeasured; see roadmap §12)
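
The stage arithmetic is easy to get backwards, so here it is as a minimal Rust sketch (hypothetical names; the Swift `LatencyMeter` would do the same integer math). It also shows one way to express a display-link target time in `CLOCK_REALTIME`: sample the monotonic and realtime clocks together and carry the target's distance from "now". That assumes the target timestamp is on the same monotonic timebase you sample (for `CADisplayLink.targetTimestamp`, the `CACurrentMediaTime()` clock).

```rust
/// Per-frame instants in nanoseconds. `pts_ns` is on the host capture clock;
/// `decoded_ns` and `presented_ns` are client CLOCK_REALTIME stamps.
#[derive(Clone, Copy, Debug)]
pub struct FrameStamps {
    pub pts_ns: i64,
    pub decoded_ns: i64,
    pub presented_ns: i64,
}

/// The three stage durations, in nanoseconds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Stages {
    pub capture_to_decoded: i64,
    pub decode_to_present: i64,
    pub capture_to_present: i64,
}

/// `clock_offset_ns` is host minus client (clockOffsetNs): adding it to a
/// client stamp expresses that stamp on the host clock.
pub fn stages(s: FrameStamps, clock_offset_ns: i64) -> Stages {
    Stages {
        capture_to_decoded: s.decoded_ns + clock_offset_ns - s.pts_ns,
        // Both ends are client stamps, so the offset cancels out.
        decode_to_present: s.presented_ns - s.decoded_ns,
        capture_to_present: s.presented_ns + clock_offset_ns - s.pts_ns,
    }
}

/// Map a display-link target time on a monotonic clock to CLOCK_REALTIME,
/// given both clocks sampled at (nearly) the same instant.
pub fn mono_target_to_realtime_ns(target_mono_ns: i64, now_mono_ns: i64, now_real_ns: i64) -> i64 {
    now_real_ns + (target_mono_ns - now_mono_ns)
}
```

With a client clock 2 ms behind the host (`clock_offset_ns = +2_000_000`), a frame that finishes decoding 3 ms and reaches the glass 10 ms after capture yields 3 ms / 7 ms / 10 ms. capture→present is always the sum of the other two, which makes a cheap sanity check for the HUD.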

Surface `capture→present` p50/p95 in the HUD (extend the existing `model.latency*` line in
`ContentView`). `skewCorrected` stays false when `clockOffsetNs == 0` (an old host); the numbers are then
only meaningful when client and host share a clock (same machine), as today.
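
For the p50/p95 line, a nearest-rank percentile over a rolling window of `capture→present` samples is enough. A minimal Rust sketch; `percentile_ns`, `hud_line`, and the label text are hypothetical, and the skew flag simply mirrors the rule above (not skew-corrected when `clockOffsetNs == 0`).

```rust
/// Nearest-rank percentile (p in 0..=100) over a window of samples, e.g. the
/// last N capture→present durations. Returns None for an empty window.
pub fn percentile_ns(samples: &[i64], p: u32) -> Option<i64> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut v = samples.to_vec();
    v.sort_unstable();
    // Nearest rank = ceil(p/100 * n), at least rank 1.
    let rank = (p as usize * v.len() + 99) / 100;
    Some(v[rank.max(1) - 1])
}

/// HUD text; numbers without a skew offset are flagged as same-host only.
pub fn hud_line(samples_ns: &[i64], clock_offset_ns: i64) -> String {
    let skew_corrected = clock_offset_ns != 0;
    match (percentile_ns(samples_ns, 50), percentile_ns(samples_ns, 95)) {
        (Some(p50), Some(p95)) => format!(
            "capture→present p50 {:.1} ms / p95 {:.1} ms{}",
            p50 as f64 / 1e6,
            p95 as f64 / 1e6,
            if skew_corrected { "" } else { " (same-host only)" }
        ),
        _ => "capture→present: no samples".to_string(),
    }
}
```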

## Validation

- `swift test`: add a decode-output test (decode a known IDR built like
  `VideoToolboxRoundTripTests`, then assert that the decode callback fires and delivers a `CVPixelBuffer`
  of the right dimensions). Present is display-bound, so validate it **live** via the HUD number.
- Live: connect to a Linux host (`punktfunk1-host --source virtual` on the GNOME box; see
  [Ubuntu — GNOME](/docs/ubuntu-gnome)), confirm that `capture→present` is a few ms over `capture→client`,
  and that `decode→present` shrank vs. an `AVSampleBufferDisplayLayer` baseline.
- Compare against the headless reference number: `punktfunk-probe` reports skew-corrected
  capture→reassembled (~1.3 ms p50, GNOME box → dev box); capture→present should be that **+ decode +
  present**.

## Gotchas

- VT decode is **async**; the output callback runs on a VT-managed thread, so don't block it: just stamp
  and enqueue. Retain the `CVPixelBuffer` until it is presented (the ring owns it).
- `VTDecompressionSessionDecodeFrame` wants the **same** `CMSampleBuffer` shape stage-1 builds (AVCC
  length-prefixed NALs, with the parameter sets taken from the in-band IDR into the format description,
  never passed as extradata).
- `CAMetalLayer.drawableSize` must track mode changes (the host can `Reconfigure` mid-stream; watch
  `PunktfunkConnection.mode`/the new-IDR dimensions).
- Don't add a jitter/smoothing buffer for the first cut: present the newest ready frame for the lowest
  latency; a pacing policy can come later if frames look uneven.
- Keep the "Stage 2" item in `clients/apple/README.md` and [Status](/docs/status) updated when this lands.
@@ -1,119 +0,0 @@
---
title: "CI & Docker"
description: "Gitea Actions setup: workflows, the dockerized pieces, and the runners."
---

CI runs on **Gitea Actions** (`git.unom.io`, org `unom`). The workflows live in
`.gitea/workflows/`; they run across Linux, macOS, and Windows runners and push a few images to the
Gitea container registry.

## Workflows

| Workflow | Trigger | Runner | What it does |
|---|---|---|---|
| `ci.yml` | push to `main`, PRs | Linux | Rust workspace (fmt · clippy `-D warnings` · build · test · C-ABI harness · generated-header drift) inside the `punktfunk-rust-ci` image; `web/` and `docs-site/` build + typecheck in `oven/bun:1` |
| `docker.yml` | push to `main`, `v*` tags, manual | Linux | Builds + pushes the images below (`latest` + `sha-<short>` tags) |
| `apple.yml` | push to `main`, PRs, manual | macOS | Rust core → `PunktfunkCore.xcframework` → `swift build` + `swift test` in `clients/apple` |
| `release.yml` | `v*` tags, manual | macOS | Production Apple builds: sandboxed macOS `.dmg` (Developer ID, notarized, stapled) attached to the Gitea release + macOS/iOS/tvOS archives uploaded to TestFlight |
| `windows-msix.yml` | push to `main`, `v*` tags, manual | Windows | Builds the Windows client for `x86_64`/`aarch64` and packages signed MSIX artifacts |

## Dockerized pieces

The host and the native clients are intentionally **not** containerized (the host needs
the GPU/compositor stack of the box it runs on). What is containerized:

| Image | Source | Notes |
|---|---|---|
| `git.unom.io/unom/punktfunk-web` | `web/Dockerfile` (repo-root context, because orval needs `docs/api/openapi.json`) | Nitro `bun` bundle; `PORT` (3000) and `PUNKTFUNK_MGMT_URL` env at runtime |
| `git.unom.io/unom/punktfunk-docs` | `docs-site/Dockerfile` | This site; `PORT` (3000) |
| `git.unom.io/unom/punktfunk-rust-ci` | `ci/rust-ci.Dockerfile` | Ubuntu 26.04 + FFmpeg 8/PipeWire/GL/GBM dev libs + a libcuda **link stub** (driver userspace, no kernel module) + pinned rustup; the container that `ci.yml`'s Rust job runs in |

Registry pushes authenticate with a repo Actions secret holding a registry token (a PAT
with `write:package`; the login username in `docker.yml` is the token owner, not the
push actor).

## Runners

- **Linux runner**: runs the Rust/web/docs jobs (as docker containers) and the image
  build+push jobs.
- **macOS runner**: an Apple-silicon Mac running a **host-mode** `act_runner`
  (upstream now ships it as `gitea-runner`), provisioned by
  [`scripts/ci/setup-macos-runner.sh`](https://git.unom.io/unom/punktfunk/src/branch/main/scripts/ci/setup-macos-runner.sh).
  The script installs rustup (+ both darwin targets for the universal xcframework), Node.js (host-mode
  runners execute JS actions via `node` from PATH, and nothing auto-provisions it), and the runner binary
  in `~/.local/bin`, and keeps state under `~/ci/act-runner/` (config, `.runner` registration,
  `runner.log`). The runner is kept alive by the `io.gitea.act_runner` **root LaunchDaemon**. It cannot
  be a user LaunchAgent: macOS Local Network privacy silently blocks LAN dials
  ("no route to host") from unbundled CLI binaries in gui/user launchd domains, while
  system daemons are exempt. The runner needs full **Xcode** for `xcodebuild -create-xcframework`
  (CLT alone only covers `swift build/test`); if `xcode-select` still points at CLT, the
  script auto-detects `/Applications/Xcode*.app` and bakes a `DEVELOPER_DIR` override into
  the daemon environment, so no `xcode-select -s` is required.
- **Windows runner**: builds and packages the native Windows client (MSIX) for the
  release matrix.

Re-provisioning is idempotent: re-running `scripts/ci/setup-macos-runner.sh` on the macOS
runner with a fresh `GITEA_RUNNER_TOKEN` (org `unom` → Settings → Actions → Runners →
Create new runner) re-registers it without manual cleanup.

## Apple releases

`release.yml` produces the production client builds on the Mac runner. All three app
targets share the bundle ID **`io.unom.punktfunk`** (one App Store listing, universal
purchase; effectively unchangeable after first submission). Signing is **not** secret-based:
the runner uses its **login keychain** directly, so install the **Developer ID Application**,
**Apple Distribution**, and (for the Mac App Store `.pkg`) **3rd Party Mac Developer
Installer** identities once via Xcode, with the WWDR intermediate present so they show as
valid. The only secrets are `ASC_API_KEY_P8`/`ASC_API_KEY_ID`/`ASC_API_ISSUER_ID` (the App Store
Connect API key, used for notarization and TestFlight upload). Per-platform state:

- **macOS (Developer ID)**: sandboxed app (`Config/Punktfunk-macOS.entitlements`) → export
  → `notarytool` → stapled `.dmg` on the Gitea release.
- **macOS (App Store)**: manual-signed archive (Apple Distribution + the *Punktfunk macOS
  App Store Distribution* profile) → upload to TestFlight. App Sandbox is **mandatory** here
  and is now declared (app-sandbox + network client/server + audio-input + bluetooth/usb).
  Prereqs (one-time, Apple portal): add the **macOS platform** to the App Store Connect app
  record (universal purchase), then install the Mac App Store distribution profile and the installer
  cert above. The job stays `continue-on-error` until those exist.
- **iOS**: archive + upload to TestFlight (`method: app-store-connect`,
  `destination: upload`). Crypto is declared exempt (`ITSAppUsesNonExemptEncryption`,
  `Config/Info.plist`) so builds don't stall on the export-compliance question.
- **tvOS**: archive + upload to TestFlight (Rust core built for tier-3 targets with nightly
  `-Zbuild-std` via `build-xcframework.sh`).

Each platform uses its own entitlements: `Config/Punktfunk-macOS.entitlements` (App
Sandbox is macOS-only) for the macOS app, and the shared `Config/Punktfunk.entitlements`
(keychain-access-groups only) for iOS/tvOS, where `com.apple.security.app-sandbox` is invalid
and would fail upload validation.

The runner needs a **release (non-beta) Xcode**. App Store processing rejects beta-SDK
builds, and a beta is unusable for the Rust side too: a newer-than-OS `ld` emits dylibs that the
running dyld rejects ("mis-aligned LINKEDIT string pool"), which kills every proc-macro build
with a misleading `E0463 can't find crate`. `build-xcframework.sh` therefore resolves
toolchains itself: with a non-beta Xcode it uses that for everything; with only CLT + a beta, it
builds the macOS slices against CLT (packaging via any Xcode, since `-create-xcframework` does no
linking) and **refuses iOS/tvOS slices** (CLT ships no iOS or tvOS SDK).

## Deployment

`docker.yml`'s `deploy-docs` job ships this docs site after every image push: it syncs
`compose.production.yml` to the docs server and runs `docker compose pull && docker compose up -d`
there over SSH, driven by a small set of deploy secrets (`DEPLOY_HOST` / `DEPLOY_USER` /
`DEPLOY_PORT` / `DEPLOY_SSH_KEY`). A reverse proxy in front of that server serves the
container as <https://docs.punktfunk.unom.io>. The host and the web console are **not**
deployed: the console fronts a punktfunk host's management API on whatever box runs the
host.

## Troubleshooting

- **macOS runner offline**: check `~/ci/act-runner/runner.log` on the runner; restart with
  `sudo launchctl kickstart -k system/io.gitea.act_runner`. "no route to host" in the log
  means the runner is running in a gui/user domain again; see the Local Network note
  above.
- **`apple.yml` fails at the xcframework step**: Xcode is missing or not selected. Run
  `sudo xcode-select -s /Applications/Xcode.app/Contents/Developer`, accept the license
  (`sudo xcodebuild -license accept`), then re-run.
- **Rust job can't pull `punktfunk-rust-ci`**: the runner host's docker daemon needs a
  `docker login git.unom.io` if the org/registry isn't anonymously readable.
- **Stale builder image after toolchain/dependency changes**: `docker.yml` re-pushes it on every
  `main` push; a manual `workflow_dispatch` of `docker.yml` forces a rebuild.
@@ -1,111 +0,0 @@
---
title: "DualSense Haptics"
description: "Feasibility and scoping for audio-driven DualSense haptics."
---

**Status: scoped, NO-GO for now (deferred).** Advanced voice-coil haptics on the DualSense are
driven by the controller's **USB audio interface** (4-channel surround; the back two channels carry
the haptic waveform), *not* by HID reports. Emulating that on a Linux host and faithfully replaying
it on the Apple client both hit hard walls, and the supply of software that actually *emits* these
haptics on a Linux host is essentially zero. We defer the audio-haptics feature and instead land the
parts of "really supporting the DualSense" that *are* reachable: **adaptive triggers (HID) and
two-motor rumble.**

(Grounded in a 4-agent feasibility read covering host USB-gadget viability, DualSense audio descriptors,
Linux game demand, and the Apple client render path; 2026-06-10.)
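
To make "the back two channels carry the haptic waveform" concrete, here is a minimal Rust sketch that keeps channels 3 and 4 of each interleaved 4-channel frame. It assumes interleaved `i16` samples in front-left, front-right, rear-left, rear-right order; the real sample format and layout are undocumented (see the conditions for a future GO below), so treat both as placeholders. The function name is hypothetical.

```rust
/// Channels per interleaved frame on the DualSense's 4-channel audio interface.
const CHANNELS: usize = 4;

/// Keep the haptic pair (channels 3 and 4, i.e. indices 2 and 3 of each frame)
/// from interleaved 4-channel PCM. A trailing partial frame is ignored.
/// Assumption: i16 samples ordered FL, FR, RL, RR.
pub fn haptic_channels(interleaved: &[i16]) -> Vec<[i16; 2]> {
    interleaved
        .chunks_exact(CHANNELS)
        .map(|frame| [frame[2], frame[3]])
        .collect()
}
```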

## The one distinction that decides everything

| Feature | How it's driven | Reachable for us? |
|---|---|---|
| Basic rumble (2 motors) | HID output report `0x02`, bytes 3–4 | **Yes**: already parsed; the client already has `nextRumble()` |
| **Adaptive triggers** (L2/R2 resistance) | HID output report `0x02`, bytes `11..22` (R2) / `22..33` (L2), half-open | **Yes**: already parsed in `dualsense.rs`; just needs the `0xCD` back-channel + client render |
| **Advanced haptics** (voice-coil actuators) | **USB *audio* interface**: 4-ch, back 2 channels = haptic PCM | **No (for now)**: see the three walls below |
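
A minimal Rust sketch of pulling the two reachable signals out of a USB output report `0x02`, using the byte offsets from the table (rumble at bytes 3 and 4, each trigger effect an 11-byte block starting at bytes 11 and 22). The R2-before-L2 ordering follows commonly published DualSense layouts rather than `dualsense.rs`, the `OutputReport` type is hypothetical, and the valid-flag bytes (1 and 2) are left uninterpreted.

```rust
use std::convert::TryInto;

/// Report ID of the DualSense USB HID output report.
pub const OUTPUT_REPORT_USB: u8 = 0x02;

/// The signals we can forward today (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct OutputReport {
    pub rumble_right: u8,    // byte 3
    pub rumble_left: u8,     // byte 4
    pub r2_effect: [u8; 11], // bytes 11..22 (assumed R2)
    pub l2_effect: [u8; 11], // bytes 22..33 (assumed L2)
}

/// Parse rumble and adaptive-trigger fields from a raw report that starts with
/// its report-ID byte. Returns None for other report IDs or for reports too
/// short to hold both trigger blocks.
pub fn parse_output_report(buf: &[u8]) -> Option<OutputReport> {
    if buf.len() < 33 || buf[0] != OUTPUT_REPORT_USB {
        return None;
    }
    Some(OutputReport {
        rumble_right: buf[3],
        rumble_left: buf[4],
        r2_effect: buf[11..22].try_into().ok()?,
        l2_effect: buf[22..33].try_into().ok()?,
    })
}
```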

The UHID DualSense we already built is **HID-only**. It cannot present the DualSense's *audio*
interface, so it structurally cannot carry advanced haptics. That's not a bug in our implementation;
it's the wrong transport for this signal.

## The three walls (any one is fatal on its own)

### Wall 1: Host capture needs a kernel rebuild
To *capture* the haptic audio a game emits, the host must present a virtual device that owns the
DualSense audio interface. The standard way is a composite USB gadget (`configfs` + `f_hid` +
`f_uac2`) bound to a software UDC (`dummy_hcd`).

- ✅ Present and enabled on this box: `CONFIG_USB_CONFIGFS`, `CONFIG_USB_CONFIGFS_F_HID`,
  `CONFIG_USB_CONFIGFS_F_UAC2`, plus the `libcomposite`/`usb_f_hid`/`usb_f_uac2`/`u_audio` modules.
- ❌ **Blocker:** `# CONFIG_USB_DUMMY_HCD is not set` in `/boot/config-7.0.0-22-generic`. No
  `dummy_hcd.ko`, no `/sys/class/udc/`. **No UDC → nothing to bind the gadget to.** Enabling
  `CONFIG_USB_DUMMY_HCD=m` requires a custom kernel build, plus root for the module load and configfs setup.
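
The blocker reduces to one check: does the kernel expose any USB Device Controller to bind a gadget to? A minimal Rust sketch, assuming the standard sysfs location `/sys/class/udc` (one entry per UDC); the function name is hypothetical, and a missing directory is treated the same as an empty one.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// List the USB Device Controllers under a sysfs class directory (normally
/// /sys/class/udc). A missing directory means no UDC support is loaded, which
/// is reported as an empty list rather than an error.
pub fn list_udcs(class_dir: &Path) -> io::Result<Vec<String>> {
    let entries = match fs::read_dir(class_dir) {
        Ok(e) => e,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(err) => return Err(err),
    };
    let mut names = Vec::new();
    for entry in entries {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    Ok(names)
}
```

An empty result for `Path::new("/sys/class/udc")` is exactly the "nothing to bind the gadget to" case; a non-empty one means a composite gadget could at least be bound.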

A lighter alternative exists: a **virtual PipeWire/ALSA sink renamed as the DualSense** (this is how
the working Linux setups capture the back two channels today, via WirePlumber rules). It skips the
kernel rebuild, but it is gated by the same Wall 2 below, and games' audio-device detection is
hardcoded per title, so it's fragile.

### Wall 2: Almost nothing on a Linux host emits these haptics
This is the decisive one. The *supply* that would feed our capture barely exists:

- **Steam Input (Linux):** no official advanced-haptics support (open feature request as of 2026).
- **Sony's `hid-playstation` kernel driver:** explicitly does **not** expose VCM (voice-coil) haptics or
  adaptive triggers; basic rumble only.
- **RPCS3:** treats the DualSense as a generic pad; no advanced haptics.
- **Native Linux games:** effectively **zero** with advanced haptics.
- **The only working path** is a handful of Proton titles (FF7 Remake, Ghostwire, Deathloop, Animal
  Well, Stellar Blade) via ClearlyClaire's *custom Wine patches*: **USB-only**, Steam Input
  disabled, forced into a 4.0-surround profile, device renamed to match Windows. ~5–10 games total.
- Bluetooth can't carry it on Linux *or* Windows (Sony's proprietary A2DP repurposing isn't exposed).

A host-side capture feature is only as useful as the software willing to drive it. On Linux that set
is a niche of a niche.

### Wall 3: The Apple client can't faithfully replay it
Even with a captured waveform, the primary client (macOS/iOS) can't render it well:

- macOS GameController exposes the DualSense's adaptive triggers (`GCDualSenseAdaptiveTrigger`) and
  basic haptics, but **no voice-coil waveform access**; that remains PS5-only in Apple's stack.
- CoreHaptics is **discrete and pattern-based** (`CHHapticPattern` events, ≤30 s), **not** a PCM
  streaming sink. Converting a streamed haptic waveform to patterns is lossy: it throws away exactly
  the fidelity that makes voice-coil haptics worth having.
- There is **no public macOS API** to route CoreAudio to the DualSense's channels 3–4. Doing it
  anyway means private/reverse-engineered APIs that break across OS updates.
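
Why pattern conversion is lossy fits in a few lines: the obvious mapping collapses each window of haptic PCM into one intensity value (RMS here), roughly what a discrete event can carry, and discards the waveform inside the window. A minimal Rust sketch; the window size and the RMS choice are assumptions, not anything CoreHaptics prescribes.

```rust
/// Collapse a mono haptic waveform into one normalized intensity per window,
/// roughly what a discrete pattern event can carry. Everything inside a
/// window (frequency content, transient shape) is discarded.
pub fn window_intensities(samples: &[i16], window: usize) -> Vec<f32> {
    assert!(window > 0, "window must be non-empty");
    samples
        .chunks(window)
        .map(|w| {
            let mean_sq: f64 = w
                .iter()
                .map(|&s| {
                    let x = s as f64 / i16::MAX as f64;
                    x * x
                })
                .sum::<f64>()
                / w.len() as f64;
            // i16::MIN slightly exceeds full scale, so clamp to 1.0.
            (mean_sq.sqrt() as f32).min(1.0)
        })
        .collect()
}
```

A fast and a slow square wave at the same amplitude come out identical, which is precisely the detail voice-coil haptics exist to reproduce.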

## What we *can* ship instead ("really supporting the DualSense" minus audio haptics)

The HID DualSense we built is the foundation, and the high-value parts are within reach:

1. **Adaptive triggers: GO.** `dualsense.rs` already parses the L2/R2 trigger effects out of HID
   output report `0x02`. Finishing this is the paused HID work: route them over the `0xCD`
   HID-output back-channel and render them on the client. This delivers the headline "DualSense feel"
   (trigger resistance/weapon tension) for any source that emits it, and it's pure HID: no audio
   interface, no kernel rebuild.
2. **Two-motor rumble: mostly done.** It is parsed host-side, and the Apple client already has
   `nextRumble()`; what remains is wiring it to `GCDeviceHaptics`/`CHHapticEngine` as discrete
   patterns (API-clean, no private APIs).
3. **LED / player-LED / touchpad / motion**: already parsed; finish the `0xCC`/`0xCD` routing.

This is the resumable HID DualSense Phase C/D/E work; it stands on its own and was never blocked.

## Conditions for a future GO on audio haptics

Revisit if **all three** change:

- A real DualSense is available on the dev box to capture an authoritative `lsusb -v` plus the exact
  UAC channel/sample-rate/format layout (today it is undocumented and would need reverse-engineering).
- The host target gains a UDC (custom kernel with `dummy_hcd`, or real OTG hardware) **or** we accept
  the PipeWire-renamed-sink path *and* the set of titles that emit haptics on Linux grows beyond the
  Proton-patch niche.
- The client target shifts to one that can render PCM haptics (a Linux/Windows client with direct
  channel-level access to the controller's audio device, or a future Apple API), or we accept lossy
  pattern conversion.

Until then the cost/benefit is upside-down: three hard subsystems (kernel, USB gadget, audio
routing) to serve ~5–10 Proton titles, rendered lossily on the one client we ship.

## Recommendation

**Defer audio-driven advanced haptics. Land adaptive triggers (HID) + rumble instead**: that's the
reachable 80% of "really supporting the DualSense," it needs no kernel work, and the parsing is already
written. Keep this doc as the down payment on the audio-haptics feature for whenever the three
conditions above are met.
@@ -1,73 +0,0 @@
|
||||
---
|
||||
title: "gamescope Multi-User Isolation (deferred)"
|
||||
description: "Research + design for concurrent INDEPENDENT gamescope desktops (multi-user), and why it's deferred. The shared-desktop multi-view case already landed."
|
||||
---
|
||||
|
||||
**Status: deferred (2026-06-12).** Concurrent sessions landed for the **shared-desktop multi-view**
|
||||
case — multiple devices viewing/controlling the *same* KWin/Mutter/wlroots desktop ([Status](/docs/status)).
|
||||
This page captures the research for the *other* model — **independent desktops** (each client its own
|
||||
gamescope instance: the multi-user / cloud-gaming-on-one-box case) — and why it's parked. Pick this
|
||||
up from here if the use case becomes a priority.
|
||||
|
||||
## What landed vs what this is
|
||||
|
||||
| Model | Backends | Input | Audio | Status |
|
||||
|---|---|---|---|---|
|
||||
| **Shared-desktop multi-view** | kwin / mutter / wlroots | shared (all drive one desktop) | shared (all hear one desktop) | ✅ **landed** — correct semantics: stream *your* desktop to laptop + TV at once |
|
||||
| **Independent desktops (multi-user)** | gamescope | **per-session** (each drives its own game) | **per-session** | ⏸ **deferred** — this page |
|
||||
|
||||
For independent desktops, shared input/audio is *wrong* — each user must drive and hear only their own
|
||||
session. gamescope is the natural fit: each `create()` spawns a fresh nested compositor (own
|
||||
rendering, own EIS input socket). The blocker is that the host's input/audio/mic are host-lifetime
|
||||
**shared** services, and the gamescope EIS socket is relayed through a single global file.
|
||||
|
||||
## Current architecture (the research)
|
||||
|
||||
Each gamescope **process is per-session** (`vdisplay/gamescope.rs::create()` spawns one; the
|
||||
`VirtualOutput.keepalive` owns it). But:

- **EIS input socket — single global file.** gamescope exports `LIBEI_SOCKET` for its children; a
  shell wrapper relays it to the fixed path `/tmp/punktfunk-gamescope-ei` (`EI_SOCKET_FILE`).
  **Two concurrent instances overwrite each other's socket name** in that one file.
- **Injector — one host-lifetime `!Send` service.** `punktfunk1.rs::InjectorService` opens **one**
  `inject::open(backend)` for the whole run and forwards events over an mpsc channel. It was made
  shared deliberately (the portal `CreateSession` churn wedged KWin's EIS — "EIS setup timed out").
  For gamescope it reads the one global socket file, so all sessions' input lands in whichever
  instance wrote last.
- **Audio — global default-sink monitor.** `audio::open_audio_capture()` sets
  `STREAM_CAPTURE_SINK` and autoconnects to the host's **default sink monitor** (`PW_ID_ANY`) — the
  whole system's output, not a per-gamescope node. gamescope exposes **no per-instance audio node**.
- **Mic — one global `Audio/Source`.** `MicService` feeds one PipeWire source named `punktfunk-mic`;
  all clients' mic uplinks mix into it.
- Per-session already (no work): the gamescope process, the PipeWire video node, and the uinput
  gamepads.

## What it would take

1. **Per-instance EIS socket** — give each gamescope a unique relay file
   (`/tmp/punktfunk-gamescope-{id}-ei`) and carry the path on `VirtualOutput` (new field) so the
   session can find its own socket.
2. **Per-session injector** — for gamescope sessions, create a **per-session** injector bound to that
   socket (its own thread, since `InputInjector` is `!Send`), instead of the shared `InjectorService`.
   Keep the shared service for the portal backends (kwin/mutter) where shared input is correct.
   Ordering nuance: the input thread is wired before the gamescope socket exists, so the per-session
   injector must open **lazily** (on first event, by which time gamescope is up) or be created after
   `build_pipeline`.
3. **Per-session audio (the bigger piece).** gamescope has no per-instance audio node, but audio
   *is* isolatable: create a **per-session PipeWire null-sink**, route that gamescope's apps to it
   (`PULSE_SINK` / a target node on the spawn env), and capture **that sink's monitor** per session.
   This is the largest addition — null-sink create/teardown + routing + per-session capture.
4. **Per-session mic** — a virtual `Audio/Source` per session (`punktfunk-mic-{id}`), routed into
   that gamescope, instead of the one global source.
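
The ordering nuance in step 2 can be handled with a per-session input thread that owns the `!Send` injector and opens it on the first event. This is a hypothetical sketch throughout: `SessionInjector` stands in for the real `InputInjector` (an `Rc` field makes it `!Send`), `InputEvent` is a placeholder, and `open` only records the relay-file path where real code would read gamescope's socket name from that file and connect.

```rust
use std::path::PathBuf;
use std::rc::Rc;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Placeholder input event (hypothetical).
#[derive(Debug, Clone, Copy)]
pub enum InputEvent {
    KeyDown(u32),
    KeyUp(u32),
}

/// Stand-in for the real injector. The `Rc` makes it `!Send`, like
/// `InputInjector`, so it must be created on the thread that uses it.
struct SessionInjector {
    relay_file: PathBuf,
    _not_send: Rc<()>,
    injected: usize,
}

impl SessionInjector {
    fn open(relay_file: PathBuf) -> Self {
        // Real code: read gamescope's LIBEI_SOCKET name from `relay_file` and connect.
        SessionInjector { relay_file, _not_send: Rc::new(()), injected: 0 }
    }
    fn inject(&mut self, _ev: InputEvent) {
        self.injected += 1;
    }
}

/// Per-instance relay file, following the proposed `{id}` naming.
pub fn ei_relay_file(session_id: u32) -> PathBuf {
    PathBuf::from(format!("/tmp/punktfunk-gamescope-{session_id}-ei"))
}

/// Spawn one input thread per gamescope session. The injector is opened lazily
/// on the first event, by which time gamescope has written its socket name.
/// The thread exits when the session drops its `Sender`.
pub fn spawn_session_injector(
    session_id: u32,
) -> (Sender<InputEvent>, JoinHandle<(Option<PathBuf>, usize)>) {
    let (tx, rx) = mpsc::channel::<InputEvent>();
    let handle = thread::spawn(move || {
        let mut injector: Option<SessionInjector> = None;
        for ev in rx {
            injector
                .get_or_insert_with(|| SessionInjector::open(ei_relay_file(session_id)))
                .inject(ev);
        }
        // Report what was opened and how many events went through.
        match injector {
            Some(inj) => (Some(inj.relay_file), inj.injected),
            None => (None, 0),
        }
    });
    (tx, handle)
}
```

A session that never receives input never touches the socket, which also avoids racing gamescope's startup.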

## Why deferred

- It's a **large multi-file refactor** — the whole input path (per-instance sockets + per-session
  injector + the lazy-open ordering), **plus** per-session null-sink audio routing, **plus** per-session
  mic — for a **niche** use case (multiple independent users gaming on one box).
- The **common** concurrency case — stream one desktop to several of *your own* devices — is the
  shared-desktop multi-view model, which **already landed and is the correct semantics** for it.
- No correctness gap in what shipped: concurrent sessions work today; this is purely the *additional*
  independent-desktops model.

Revisit when there's a real multi-user requirement. The plumbing list above is the whole job.
@@ -1,95 +0,0 @@
---
title: "GameStream Host"
description: "Stream to a stock Moonlight client on a client-sized virtual display."
---

The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host,
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display.
Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
(distilled from Sunshine + moonlight-common-c source; cite those for byte-level detail).

## Architecture (respects the "one core" invariant)

- **punktfunk-core** gains a **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`, the
  hook already exists): the exact RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard
  layout, and the video/audio AES-GCM/CBC paths. Hot path, native threads, **no async**.
  Kept beside punktfunk's native internal format (P2), selected by phase.
- **punktfunk-host** gains the **control plane** (tokio/axum OK — I/O-bound, not the hot path):
  mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet
  control stream + input injection, the virtual-display lifecycle, and Opus audio encode.

## Port map (base 47989; Moonlight derives all by offset)

| Port | Proto | Role |
|---|---|---|
| 47989 | TCP | HTTP nvhttp (unpaired: /serverinfo, /pair PIN flow) |
| 47984 | TCP | HTTPS nvhttp (paired; **client-cert pinned**) — /launch, /resume, … |
| 48010 | TCP | RTSP (OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY) |
| 47998 | UDP | Video RTP (+ RS-FEC, optional AES-GCM) |
| 47999 | UDP | Audio RTP (Opus, RS-FEC 4+2, optional **AES-CBC**) |
| 48000 | UDP | ENet control stream (AES-GCM) + remote input |
| 5353 | UDP | mDNS `_nvstream._tcp.local` advertisement |

## Key wire facts (the non-obvious ones)

- **Video datagram** = `RTP_PACKET(12, BIG-endian)` + `reserved[4]` + `NV_VIDEO_PACKET(16, LITTLE-endian)`
  + payload. Endianness differs *within the same packet*. `header=0x80|0x10`.
- `fecInfo` (u32 LE) = `(dataShards<<22)|(fecIndex<<12)|(fecPercentage<<4)`; parityShards is
  **recomputed** by the client as `ceil(dataShards*pct/100)` — must match exactly.
- `multiFecBlocks` = `(blockIdx<<4)|((nBlocks-1)<<6)`; **≤4 FEC blocks/frame**, ≤255 shards/block.
- Each frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`
  (`headerType=0x01`, `frameType` 2=IDR, `lastPayloadLen`) before striping into shards.
- Shard size = `packetSize + 16`. Data shards first, then parity, over a contiguous RTP
  sequence range. Last data shard zero-padded.
- **Video crypto** (when `SS_ENC_VIDEO` negotiated): AES-128-GCM, key = raw 16-byte RIKEY
  (from `/launch?rikey=`), IV = `counter_le[8]||0,0,0||'V'(0x56)`, **NO AAD**, 32-byte
  `ENC_VIDEO_HEADER{iv[12],frameNumber,tag[16]}` prefix; **FEC first, then encrypt per shard**.
- **Pairing**: PIN key = `SHA-256(salt[16] || ascii_pin)[..16]`; AES-128-**ECB** (no padding)
  for the challenge blocks; SHA-256 rolling hashes; RSA-SHA256 signatures over X.509 certs;
  the client cert is pinned for subsequent HTTPS. 4 phases over `/pair?phrase=…`.
- **RTSP** `Session: DEADBEEFCAFE;timeout = 90` (literal), `Transport: server_port=<p>`,
  `streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
  (`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.
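
The bit packing above is easy to get subtly wrong, so a few small helpers make it unit-testable. This is a minimal sketch with hypothetical function names: the decode masks (10/10/8-bit fields) are inferred from the shift amounts rather than copied from moonlight-common-c, and the parity rule is the `ceil(dataShards*pct/100)` formula stated above, written in integer arithmetic.

```rust
/// Pack the `fecInfo` word (written little-endian inside NV_VIDEO_PACKET).
pub fn fec_info(data_shards: u32, fec_index: u32, fec_percentage: u32) -> u32 {
    (data_shards << 22) | (fec_index << 12) | (fec_percentage << 4)
}

/// Inverse of `fec_info`. Field widths are an assumption derived from the shifts.
pub fn parse_fec_info(v: u32) -> (u32, u32, u32) {
    (v >> 22, (v >> 12) & 0x3ff, (v >> 4) & 0xff)
}

/// Parity shard count as the client recomputes it: ceil(data * pct / 100).
pub fn parity_shards(data_shards: u32, fec_percentage: u32) -> u32 {
    (data_shards * fec_percentage + 99) / 100
}

/// `multiFecBlocks` byte: block index in bits 4-5, (block count - 1) in bits 6-7.
pub fn multi_fec_blocks(block_idx: u8, n_blocks: u8) -> u8 {
    assert!(n_blocks >= 1 && n_blocks <= 4 && block_idx < n_blocks);
    (block_idx << 4) | ((n_blocks - 1) << 6)
}

/// 12-byte AES-GCM IV for video: counter (8 bytes LE) || 0,0,0 || 'V'.
pub fn video_iv(counter: u64) -> [u8; 12] {
    let mut iv = [0u8; 12];
    iv[..8].copy_from_slice(&counter.to_le_bytes());
    iv[11] = b'V';
    iv
}
```

Pinning these against captured Moonlight traffic before wiring the packetizer keeps a silent off-by-one in the parity count from surfacing as unrecoverable frames later.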

## The two highest interop risks (validate EARLY)

1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly
   0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy) — parity
   bytes likely **don't match**, so Moonlight silently fails to recover any frame with a lost
   data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so
   defer this — get a frame decoded first, then FFI/port nanors for loss recovery.
2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
   needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
   usually off on LAN** — implement plaintext video first, add GCM later.

## Phasing (each phase independently testable with a real Moonlight client)

- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
  `/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
  pairs (PIN), and shows the host as ready.* ← first slice.
- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
  handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
  *Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
  layout in punktfunk-core; wire the spike's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
  (uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
  *Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
  backend for the user's KDE box. *Acceptance: full game stream with sound — the GameStream-host goal.*

## Crates (verified available)

`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/`ecb`
(pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.

## Testing note

The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at
this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against
`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display.
@@ -1,300 +0,0 @@
---
title: "Implementation Plan"
description: "The full design: protocol core, milestones, and architecture."
---

*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.*

> The name `punktfunk` fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point.

---

## 0. The thesis (why this is worth building)

Two concrete gaps justify a new project rather than another fork:

1. **The 1 Gbps wall is a FEC design limit, not a bandwidth limit.** Moonlight/Sunshine protect each frame with Reed–Solomon over GF(2⁸), which caps a block at 255 shards. At 5120×1440@240 that ceiling is hit around 1 Gbps. Switching the erasure code to **Leopard-RS over GF(2¹⁶)** (via the `reed-solomon-simd` crate) raises the per-block shard limit to 65,536 and runs in O(n log n) with SIMD. The wall disappears as a *consequence* of a better core, not as a hack.

2. **Linux software virtual displays are a real, unfilled gap.** The compositor-side capability now exists (Mutter headless virtual monitors since GNOME 40; wlroots headless outputs; KWin virtual outputs in Plasma 6), but no streaming host *drives* those APIs to create a client-sized output on demand, capture it via PipeWire, and route input back via libei. Apollo's virtual display is Windows-only. This is the immediate, shippable win.

**Strategic ordering:** ship the Linux virtual-display host speaking the *existing* Moonlight protocol first (every Moonlight/Artemis client works on day one, no client to write). Only then introduce the new GF(2¹⁶) transport as a negotiated protocol extension with our own clients. Value early, hard parts deferred until de-risked.

---

## 1. Scope & non-goals

**In scope (eventually):**
- Linux streaming host with on-demand software virtual displays (KWin first, then wlroots, then Mutter).
- A shared Rust protocol/transport/FEC core exposed over a stable C ABI.
- A modern transport that removes the 1 Gbps ceiling.
- Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core.

**Explicit non-goals (at least at first):**
- Windows *host* support (Sunshine/Apollo already do this well; no gap to fill).
- Internet/NAT-traversal relay infrastructure (LAN/VPN first; lean on an existing mesh VPN such as Headscale/NetBird/Tailscale).
- Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs).
- A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6).

---

## 2. Architecture overview

```mermaid
flowchart TD
    subgraph Host["Linux Host (Rust)"]
        VD["Virtual display orchestrator<br/>(KWin / wlroots / Mutter)"]
        CAP["Capture<br/>(PipeWire / dmabuf)"]
        ENC["Encoder<br/>(VAAPI / NVENC via FFmpeg)"]
        VD --> CAP --> ENC
        ENC --> COREH
        IN_H["Input injector<br/>(libei / uinput)"]
        COREH["punktfunk-core (C ABI)<br/>protocol · FEC · pacing · crypto"]
        COREH --> IN_H
    end

    COREH <-->|"UDP+FEC video / QUIC control+audio"| COREC

    subgraph Client["Client (Rust / Swift / Kotlin)"]
        COREC["punktfunk-core (same crate, C ABI)"]
        DEC["Decoder<br/>(VideoToolbox / NVDEC / VAAPI)"]
        PRES["Present + frame pacing"]
        INP["Input capture"]
        COREC --> DEC --> PRES
        INP --> COREC
    end
```

**The load-bearing decision:** `punktfunk-core` is one crate, compiled once, linked by every host and client through a C ABI. Protocol logic, FEC, packet pacing, jitter buffering, pairing, and crypto live there and exist exactly once. Platform code (capture, encode, decode, present, input, UI) lives outside the core and is written in whatever language suits the platform.

---

## 3. Protocol strategy (three phases)

| Phase | Protocol | Clients that work | Bitrate ceiling | Purpose |
|------|----------|-------------------|-----------------|---------|
| **P1** | GameStream-compatible (existing Moonlight wire format) | All existing Moonlight/Artemis clients | ~1 Gbps (legacy GF(2⁸) FEC) | Ship the Linux virtual-display win with zero client work |
| **P2** | `punktfunk/1` negotiated extension: GF(2¹⁶) FEC, multi-block framing, optional QUIC control | punktfunk clients only; falls back to P1 for others | Multi-Gbps | Break the wall; introduce native clients |
| **P3** | `punktfunk/1` as primary; GameStream kept as compat shim | punktfunk everywhere, Moonlight as fallback | Multi-Gbps | Full control of features (mic passthrough, per-client identity, HDR signalling) |

Negotiation: extend the `serverinfo`/RTSP `SETUP` handshake with a capability flag. Old clients never see the flag and get P1 behavior. This is how Apollo/Artemis diverge cleanly, and it keeps you compatible while you build.
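
A minimal sketch of that fallback rule, assuming the client advertises a capability token somewhere in the handshake. The token spelling, `PUNKTFUNK_CAP`, and `negotiate` are hypothetical; only the phase names follow the plan.

```rust
/// Mirrors the plan's phases; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolPhase {
    P1GameStream,
    P2Punktfunk,
}

/// Capability token a punktfunk client would send (hypothetical spelling).
pub const PUNKTFUNK_CAP: &str = "punktfunk/1";

/// Pick the phase for one client. Stock Moonlight never sends the token,
/// so it falls through to P1 without seeing anything new.
pub fn negotiate(host_has_p2: bool, client_caps: &[&str]) -> ProtocolPhase {
    if host_has_p2 && client_caps.iter().any(|c| *c == PUNKTFUNK_CAP) {
        ProtocolPhase::P2Punktfunk
    } else {
        ProtocolPhase::P1GameStream
    }
}
```

Both conditions are required: an upgraded client talking to an older host also lands on P1.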

---

## 4. Tech stack (settled)

**Language split:** Rust for the core and all non-Apple platform code; Swift only for the macOS/iOS client UI + VideoToolbox/Metal; Kotlin for Android UI + MediaCodec. The C ABI is the seam.

**Threading:** native OS threads for the video hot path. `tokio` is allowed *only* for the control plane (pairing, web config, QUIC control stream). The per-frame pipeline must never touch an async runtime.

### Core crate dependencies

| Concern | Crate | Notes |
|--------|-------|-------|
| FEC | `reed-solomon-simd` (v3+) | Leopard/GF(2¹⁶), SIMD, O(n log n) — the wall-breaker |
| QUIC (control/audio) | `quinn` | Datagram ext for audio; reliable streams for control |
| TLS / crypto | `rustls` + `ring` (or `aws-lc-rs`) | Pairing, session keys (AES-GCM to match GameStream in P1) |
| Serialization | `zerocopy` / `bytes` | Wire structs `#[repr(C)]`, zero-copy parse |
| C header gen | `cbindgen` | Generates `punktfunk_core.h` from the ABI module |
| Error/log | `tracing` | Structured; feature-gate off the hot path |

### Linux host dependencies

| Concern | Crate / API | Notes |
|--------|-------------|-------|
| Capture | `pipewire` (pipewire-rs) | ScreenCast portal stream → dmabuf |
| Portal / DBus | `ashpd` + `zbus` | xdg-desktop-portal: ScreenCast, RemoteDesktop |
| Encode | `ffmpeg-next` or `rsmpeg` | VAAPI / NVENC, dmabuf import (zero-copy) |
| Input inject | `reis` (libei) + `input-linux` (uinput fallback) | Wayland-native first, uinput as universal fallback |
| Virtual output | per-compositor (see §6) | KWin DBus / Sway `create_output` / Mutter DBus |
| Web config | `axum` + `tokio` + small Vite/React UI | You own this stack already |

### Apple client (P2+)

Swift + VideoToolbox (decode) + Metal (present) + SwiftUI. Imports `punktfunk_core.h` directly via a module map — no glue layer.

### Ruled out

- **Swift for the host/core:** no Linux Wayland/PipeWire/DRM/VAAPI ecosystem; ARC in hot loops. (Excellent *Apple-client* language, wrong for systems/Linux.)
- **Go:** GC disqualifies the hot path.
- **C++:** throws away the safety/concurrency wins that justified greenfield over forking.
- **Zig:** best-in-class C interop, but pre-1.0 with no Wayland/QUIC ecosystem — too much risk for a multi-month build. Revisit later if desired.

---

## 5. The C ABI boundary

Design it on day one; retrofitting an ABI is painful.

**Principles**
- Opaque handles only across the boundary: `PunktfunkSession*`, never Rust types.
- All cross-boundary structs are `#[repr(C)]`; primitives + pointer/len pairs for buffers.
- Async events via registered C callbacks (`fn ptr` + `void* userdata`).
- Explicit, documented ownership: who frees what, when. Provide `punktfunk_*_free` for every allocation that crosses out.
- Versioned ABI: `uint32_t punktfunk_abi_version(void)` + a `PunktfunkConfig` struct whose first field is its own size for forward-compat.

**Minimal surface (sketch)**

```c
#include <stddef.h>
#include <stdint.h>

// Forward declarations so the sketch compiles on its own; the real layouts come
// from the generated punktfunk_core.h. The flag and callback types are assumed.
typedef struct PunktfunkSession PunktfunkSession;       // opaque handle
typedef struct PunktfunkConfig PunktfunkConfig;         // first field: its own size
typedef struct PunktfunkFrame PunktfunkFrame;
typedef struct PunktfunkInputEvent PunktfunkInputEvent;
typedef struct PunktfunkStats PunktfunkStats;
typedef uint32_t PunktfunkFrameFlags;                   // assumed: bitflags
typedef void (*PunktfunkInputCb)(const PunktfunkInputEvent* ev, void* user);  // assumed

// versioning
uint32_t punktfunk_abi_version(void);

// lifecycle
PunktfunkSession* punktfunk_session_new(const PunktfunkConfig* cfg);
void punktfunk_session_free(PunktfunkSession*);

// host: feed an encoded access unit (the core does FEC + packetize + pace + send)
int punktfunk_host_submit_frame(PunktfunkSession*, const uint8_t* data, size_t len,
                                uint64_t pts_ns, PunktfunkFrameFlags flags);

// client: pull a reassembled, FEC-recovered access unit ready to decode
int punktfunk_client_poll_frame(PunktfunkSession*, PunktfunkFrame* out /*borrowed until next poll*/);

// input (both directions): client captures, host receives via callback
int punktfunk_send_input(PunktfunkSession*, const PunktfunkInputEvent*);
void punktfunk_set_input_callback(PunktfunkSession*, PunktfunkInputCb, void* user);

// stats for the frame-pacing/quality logic and the web UI
void punktfunk_get_stats(PunktfunkSession*, PunktfunkStats* out);
```

Keep it this small. Everything platform-specific (how you got the encoded bytes, how you decode them) stays on the platform side.
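
On the Rust side, the opaque-handle and size-prefixed-config principles might look like the sketch below. Only the `punktfunk_session_new`, `punktfunk_session_free`, and `punktfunk_abi_version` names come from the surface above; the config fields, defaults, and version number are illustrative assumptions. Each field is read only if the caller's `struct_size` covers it, so a client compiled against an older, shorter `PunktfunkConfig` still works. `std::mem::offset_of!` needs Rust 1.77 or newer.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical config following the "first field is its own size" rule.
#[repr(C)]
pub struct PunktfunkConfig {
    pub struct_size: u32,
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32, // pretend this field arrived in a later ABI revision
}

/// Opaque to C: callers only ever hold a `*mut PunktfunkSession`.
pub struct PunktfunkSession {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Read a `u32` field at `offset`, or fall back to `default` when the caller's
/// struct (per its `struct_size`) is too short to contain it.
unsafe fn field_or(cfg: *const PunktfunkConfig, offset: usize, default: u32) -> u32 {
    let base = cfg as *const u8;
    // `struct_size` sits at offset 0 in every revision, so it is always readable.
    let size = unsafe { ptr::read_unaligned(base as *const u32) } as usize;
    if offset + size_of::<u32>() <= size {
        unsafe { ptr::read_unaligned(base.add(offset) as *const u32) }
    } else {
        default
    }
}

#[no_mangle]
pub extern "C" fn punktfunk_abi_version() -> u32 {
    1 // illustrative value
}

#[no_mangle]
pub unsafe extern "C" fn punktfunk_session_new(cfg: *const PunktfunkConfig) -> *mut PunktfunkSession {
    if cfg.is_null() {
        return ptr::null_mut();
    }
    let session = unsafe {
        PunktfunkSession {
            width: field_or(cfg, std::mem::offset_of!(PunktfunkConfig, width), 1920),
            height: field_or(cfg, std::mem::offset_of!(PunktfunkConfig, height), 1080),
            refresh_hz: field_or(cfg, std::mem::offset_of!(PunktfunkConfig, refresh_hz), 60),
        }
    };
    // Ownership passes to the caller until punktfunk_session_free.
    Box::into_raw(Box::new(session))
}

#[no_mangle]
pub unsafe extern "C" fn punktfunk_session_free(s: *mut PunktfunkSession) {
    if !s.is_null() {
        drop(unsafe { Box::from_raw(s) });
    }
}
```

`cbindgen` would emit `PunktfunkSession` as an opaque forward declaration, which is exactly what the C side should see.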

---

## 6. Virtual display orchestration

This is the differentiator and the most fragmented part. Two deployment models — support both eventually, pick one for the MVP.

**Model A — Attach to the running session.** Create a client-sized virtual output *inside the user's live desktop*, stream it, tear it down on disconnect. This is "add a monitor to my actual PC." Best UX, hardest because it depends on per-compositor runtime APIs.

**Model B — Dedicated headless session.** Spawn a separate headless compositor purely for the stream (e.g. `gnome-shell --headless --virtual-monitor WxH`, or a headless wlroots compositor). Cleaner isolation, sidesteps runtime-output APIs, ideal for "remote second PC." Worse for "mirror/extend my real desktop."

**Per-compositor (Model A) runtime virtual-output creation:**

- **KWin / Plasma 6 (recommended MVP target — a common KDE daily-driver setup, and where the gap is loudest):** KWin can create virtual outputs; KRdp already does this internally for remote sessions. Drive it via the KWin DBus interface; capture via `xdg-desktop-portal-kde` ScreenCast (PipeWire); inject input via the RemoteDesktop portal or `reis`.
- **wlroots (Sway/Hyprland — fastest to *prototype* the pipeline):** enable the headless backend (`WLR_BACKENDS=…,headless`), then `swaymsg create_output` / `hyprctl output create headless`. Capture via `wlr-screencopy` or the portal. Simplest API; good for validating capture→encode→send before fighting KWin/Mutter.
- **Mutter / GNOME:** virtual monitors via the headless backend; runtime creation via Mutter DBus (`org.gnome.Mutter.*` — partly experimental). Capture via `xdg-desktop-portal-gnome` ScreenCast.

**Recommendation:** do a 1–2 day wlroots spike to prove the *pipeline*, then build the real MVP on KWin because that's your deployment target. Abstract virtual-output creation behind a trait so compositors are pluggable:

```rust
// `Mode`, `OutputHandle`, and the crate's `Result` alias are defined elsewhere.
trait VirtualDisplay {
    /// Create a client-sized output and return a handle to it.
    fn create(&self, mode: Mode) -> Result<OutputHandle>;
    /// Tear the output down again (e.g. on client disconnect).
    fn destroy(&self, h: OutputHandle) -> Result<()>;
}
```
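
To show why the trait keeps backends pluggable, here is a self-contained sketch with a mock backend. `Mode`'s fields, `OutputHandle(u64)`, the `String` error type, `MockCompositor`, and `with_output` are all hypothetical; a real KWin or wlroots implementation would make DBus or IPC calls where the mock touches a `HashMap`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct OutputHandle(pub u64);

pub type Result<T> = std::result::Result<T, String>;

pub trait VirtualDisplay {
    fn create(&self, mode: Mode) -> Result<OutputHandle>;
    fn destroy(&self, h: OutputHandle) -> Result<()>;
}

/// Fake backend standing in for KWin/wlroots/Mutter in tests.
#[derive(Default)]
pub struct MockCompositor {
    next_id: RefCell<u64>,
    outputs: RefCell<HashMap<OutputHandle, Mode>>,
}

impl VirtualDisplay for MockCompositor {
    fn create(&self, mode: Mode) -> Result<OutputHandle> {
        if mode.width == 0 || mode.height == 0 {
            return Err(format!("invalid mode {mode:?}"));
        }
        let mut id = self.next_id.borrow_mut();
        *id += 1;
        let h = OutputHandle(*id);
        self.outputs.borrow_mut().insert(h, mode);
        Ok(h)
    }

    fn destroy(&self, h: OutputHandle) -> Result<()> {
        self.outputs
            .borrow_mut()
            .remove(&h)
            .map(|_| ())
            .ok_or_else(|| format!("unknown output {h:?}"))
    }
}

/// Session code sees only the trait: create on connect, destroy on disconnect.
pub fn with_output<T>(
    vd: &dyn VirtualDisplay,
    mode: Mode,
    body: impl FnOnce(OutputHandle) -> T,
) -> Result<T> {
    let h = vd.create(mode)?;
    let out = body(h);
    vd.destroy(h)?;
    Ok(out)
}
```

The same session code can then run against the mock in CI and against a real compositor on the test box.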

---

## 7. The hot path: pipeline & latency budget

Per-frame pipeline, each stage on its own thread, connected by bounded SPSC channels (drop-oldest on overflow, never block the encoder):

```
capture(dmabuf) → encode(NVENC/VAAPI) → core[FEC+packetize+pace+send]
                                                     │ network
client: recv → core[reorder+FEC recover+jitter] → decode → present
```
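
The drop-oldest policy can be sketched with a `Mutex<VecDeque>` and a `Condvar`. This is an illustration, not the project's channel: a production hot path would more likely use a lock-free ring, but the semantics match. The producer never waits on the consumer, and evictions are counted so the stats path can surface them. `DropOldest` and its methods are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Bounded SPSC queue: when full, the oldest item is evicted instead of
/// blocking the producer (e.g. the encoder thread).
pub struct DropOldest<T> {
    inner: Mutex<(VecDeque<T>, u64)>, // (queued items, dropped count)
    ready: Condvar,
    cap: usize,
}

impl<T> DropOldest<T> {
    pub fn new(cap: usize) -> Arc<Self> {
        assert!(cap > 0);
        Arc::new(DropOldest {
            inner: Mutex::new((VecDeque::with_capacity(cap), 0)),
            ready: Condvar::new(),
            cap,
        })
    }

    /// Producer side: O(1), never waits for the consumer to catch up.
    pub fn push(&self, item: T) {
        let mut g = self.inner.lock().unwrap();
        if g.0.len() == self.cap {
            g.0.pop_front(); // stale frame: newer data supersedes it
            g.1 += 1;
        }
        g.0.push_back(item);
        drop(g);
        self.ready.notify_one();
    }

    /// Consumer side: wait up to `timeout` for the next item.
    pub fn pop_timeout(&self, timeout: Duration) -> Option<T> {
        let g = self.inner.lock().unwrap();
        let (mut g, _) = self
            .ready
            .wait_timeout_while(g, timeout, |s| s.0.is_empty())
            .unwrap();
        g.0.pop_front()
    }

    pub fn dropped(&self) -> u64 {
        self.inner.lock().unwrap().1
    }
}
```

Dropping the oldest rather than the newest item is what keeps latency bounded: a stalled consumer resumes on the freshest frame instead of working through a backlog.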

**Glass-to-glass budget (LAN, 240 Hz = 4.17 ms/frame):**

| Stage | Target | Notes |
|------|--------|-------|
| Capture latency | ≤ 1 frame | dmabuf, no copy to CPU |
| Encode | 1–4 ms | NVENC low-latency preset; tune lookahead off |
| FEC + packetize | < 1 ms | SIMD RS; pre-allocated shard buffers |
| Network (LAN) | < 1 ms | `sendmmsg` / UDP GSO to cut syscalls |
| Jitter buffer | 0–1 frame | adaptive; minimum that hides observed jitter |
| FEC recover + reassemble | < 1 ms | only when loss occurs |
| Decode | 1–4 ms | hardware decoder |
| Present | ≤ 1 frame | align to client vsync |

**Target: 15–35 ms glass-to-glass on LAN.** The art is *frame pacing* — matching capture/encode cadence to the client's actual refresh and keeping the jitter buffer as small as the link allows. This, not the codec, is what separates good from bad streaming. Budget real time for it.

**Throughput math to keep honest:** 5120×1440@240 ≈ 1.77 Gpx/s. At 0.5 bpp that's ~885 Mbps; 0.6 bpp ≈ 1.06 Gbps; 0.8 bpp (4:4:4 headroom) ≈ 1.4 Gbps. The GF(2¹⁶) FEC + multi-block framing must sustain these without the per-frame shard count being the limiter — which it no longer is once you leave GF(2⁸).
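
The frame time and throughput figures above can be re-derived with a few lines of arithmetic, which is handy when checking other modes against the budget. The helper names are hypothetical.

```rust
/// Milliseconds available per frame at a given refresh rate.
pub fn frame_time_ms(hz: u32) -> f64 {
    1000.0 / hz as f64
}

/// Pixels per second for a mode (width × height × refresh).
pub fn pixel_rate(width: u64, height: u64, hz: u64) -> u64 {
    width * height * hz
}

/// Encoded bitrate in Mbps for a bits-per-pixel budget.
pub fn bitrate_mbps(px_per_s: u64, bpp: f64) -> f64 {
    px_per_s as f64 * bpp / 1e6
}
```

For 5120×1440@240 this gives exactly 1,769,472,000 px/s, and 884.7 / 1061.7 / 1415.6 Mbps at 0.5 / 0.6 / 0.8 bpp, matching the rounded figures in the text.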

---

## 8. Milestones

Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat as ordering, not deadlines.

**M0 — Pipeline spike (S).** wlroots headless output → PipeWire capture → VAAPI/NVENC encode → dump H.265 to a file that plays. *Acceptance:* a valid encoded file from a virtual output, no streaming yet. Proves the Linux capture+encode chain end-to-end.

**M1 — `punktfunk-core` skeleton + C ABI (M).** Session lifecycle, GameStream-compatible packetization and GF(2⁸) FEC (P1), AES-GCM, `cbindgen` header, a tiny C test harness. *Acceptance:* core links from C; round-trips packets in a loopback test with simulated loss.

**M2 — P1 host: stream to stock Moonlight (L).** Wire M0's pipeline into the core; implement `serverinfo`/pairing/RTSP enough for a real Moonlight client to connect, with a KWin virtual output created on connect and destroyed on disconnect. Input via `reis`/uinput. *Acceptance:* **you play a game on your KDE box streamed to a stock Moonlight client on a virtual display, no dummy plug, no kernel args.** This is the shippable milestone and the project's reason to exist.

**M3 — Measurement harness (S).** Glass-to-glass latency measurement (on-screen QR/timestamp or photodiode), packet-loss injection, frame-pacing and stall metrics surfaced in the web UI. *Acceptance:* you can quantify a regression. Build this before optimizing anything.

**M4 — P2 transport: break the wall (L).** Add `punktfunk/1` negotiation; swap to `reed-solomon-simd` GF(2¹⁶) with multi-block per-frame framing; optional QUIC control/audio. Write a minimal **Rust** reference client (decode via VAAPI, present via wgpu/Vulkan) to exercise it. *Acceptance:* a stable stream above 1.4 Gbps at 5120×1440@240 with loss recovery working; latency unchanged vs. M2.

**M5 — Apple client (L).** Swift + VideoToolbox + Metal + SwiftUI, linking `punktfunk-core` via the C header. *Acceptance:* a Mac plays a stream at native resolution/refresh.

**M6 — Feature surface (M, ongoing).** Mic passthrough as a proper encrypted, per-client reverse audio stream (the thing the upstream PR got wrong); HDR signalling; per-client identity/permissions; pause/resume. *Acceptance:* feature parity with Apollo on the items you care about, plus mic done right.

---

## 9. Risk register

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| KWin runtime virtual-output API is undocumented/unstable | High | High | Spike on wlroots first to de-risk the pipeline; study KRdp's source for the KWin path; keep `VirtualDisplay` pluggable so a stuck compositor doesn't block the project |
| Wayland input injection gaps (libei still evolving) | Med | Med | uinput fallback always available; `reis` for the Wayland-native path |
| dmabuf → encoder zero-copy import quirks per GPU/driver | High | Med | Validate on your actual NVIDIA + AMD hardware early (M0); have a CPU-copy fallback path |
| Encoder/decoder can't sustain 1.77 Gpx/s @ 240 | Med | High | Measure in M0/M4 on real silicon; this is a hardware ceiling no rewrite fixes — discover it before P2, not after |
| Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step |
| Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client |
| Solo bandwidth vs. other projects | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone |

---

## 10. Testing & measurement

- **Loopback correctness:** core encodes→FEC→loss-inject→recover→decode in-process; property tests over loss patterns and shard counts (proptest).
- **Glass-to-glass latency:** rendered timestamp/QR on host, read back on client capture; or a photodiode for true photons. Track p50/p99.
- **Loss resilience:** `tc netem` to inject loss/jitter/reorder; verify FEC recovery and graceful degradation.
- **Pacing:** log present timestamps vs. client vsync; alert on stalls and duplicate/dropped frames.
- **Soak:** multi-hour streams; watch for buffer growth, fd leaks, encoder session exhaustion.
- **Hardware matrix:** an NVIDIA box (NVENC), an AMD/Intel box (VAAPI), a Mac (VideoToolbox decode). Catch driver quirks early.
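
The loopback-correctness loop can be prototyped before any real codec is wired in. The sketch below swaps in a single XOR parity shard for Reed–Solomon (it repairs at most one lost shard per block) purely to show the encode, inject-loss, recover, compare cycle; a real harness would drive `reed-solomon-simd` and generate loss patterns with proptest. All function names are hypothetical.

```rust
/// XOR of all data shards: a one-erasure code standing in for RS.
pub fn xor_parity(shards: &[Vec<u8>]) -> Vec<u8> {
    let len = shards[0].len();
    let mut p = vec![0u8; len];
    for s in shards {
        assert_eq!(s.len(), len, "shards must be equal-sized");
        for (a, b) in p.iter_mut().zip(s) {
            *a ^= b;
        }
    }
    p
}

/// Drop every shard whose index is in `lost` (a deterministic loss pattern).
pub fn inject_loss(shards: &[Vec<u8>], lost: &[usize]) -> Vec<Option<Vec<u8>>> {
    shards
        .iter()
        .enumerate()
        .map(|(i, s)| if lost.contains(&i) { None } else { Some(s.clone()) })
        .collect()
}

/// Rebuild the data shards; `None` when loss exceeds what one parity shard can fix.
pub fn recover(received: &[Option<Vec<u8>>], parity: &[u8]) -> Option<Vec<Vec<u8>>> {
    let missing: Vec<usize> = received
        .iter()
        .enumerate()
        .filter(|(_, s)| s.is_none())
        .map(|(i, _)| i)
        .collect();
    match missing.len() {
        0 => Some(received.iter().map(|s| s.clone().unwrap()).collect()),
        1 => {
            // parity ^ (all surviving shards) == the missing shard
            let mut rebuilt = parity.to_vec();
            for s in received.iter().flatten() {
                for (a, b) in rebuilt.iter_mut().zip(s) {
                    *a ^= b;
                }
            }
            let mut out: Vec<Vec<u8>> =
                received.iter().map(|s| s.clone().unwrap_or_default()).collect();
            out[missing[0]] = rebuilt;
            Some(out)
        }
        _ => None,
    }
}
```

Swapping `xor_parity`/`recover` for the real FEC keeps the same harness shape, and the "more loss than parity" case becomes the boundary the property tests probe.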

---

## 11. Repo / workspace structure

```
punktfunk/
├── Cargo.toml              # workspace
├── crates/
│   ├── punktfunk-core/     # protocol, FEC, pacing, crypto — C ABI (cdylib + staticlib)
│   │   ├── src/abi.rs      # #[no_mangle] extern "C" surface
│   │   ├── src/fec.rs      # GF(2^16) blocking over reed-solomon-simd
│   │   ├── src/transport/  # udp+fec video, quinn control/audio
│   │   ├── src/protocol/   # gamestream-compat (P1) + punktfunk/1 (P2)
│   │   └── cbindgen.toml
│   ├── punktfunk-host/     # Linux host binary
│   │   ├── src/capture/    # pipewire / portal
│   │   ├── src/encode/     # ffmpeg vaapi/nvenc
│   │   ├── src/vdisplay/   # trait + kwin/wlroots/mutter impls
│   │   ├── src/input/      # reis + uinput
│   │   └── src/web/        # axum config/pairing API
│   └── punktfunk-probe/    # reference Rust client (M4)
├── clients/
│   ├── apple/              # Swift package, imports punktfunk_core.h (M5)
│   └── android/            # Kotlin + JNI (later)
├── include/                # generated punktfunk_core.h
└── tools/
    ├── latency-probe/
    └── loss-harness/
```

---

## 12. Immediate next actions (first week)

1. **Stand up the workspace** with `punktfunk-core` (empty ABI + `cbindgen`) and `punktfunk-host` skeletons; wire up CI (Gitea Actions, BuildKit-based pipelines).
2. **M0 spike on wlroots:** headless output → PipeWire capture → NVENC/VAAPI encode → playable file. This validates the riskiest *pipeline* assumptions in days, on real GPU hardware.
3. **Read KRdp's source** for how KDE creates virtual outputs and casts them — it's the closest existing reference for the KWin path needed in M2.
4. **Decide P1 protocol depth:** confirm exactly which `serverinfo`/RTSP/pairing messages a current Moonlight client requires for a successful connect, so M2's compat surface is scoped precisely.

---

*The shape of the bet: M2 alone — virtual-display streaming to stock Moonlight clients on Linux — is a complete, useful, gap-filling release. Everything after it (the wall-breaking transport, native clients, mic-done-right) is upside you unlock from a position of having already shipped, with the hard transport work resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.*
@@ -71,7 +71,9 @@ certificate, so you import that certificate once before Windows will install the

1. Open the [packages page](https://git.unom.io/unom/-/packages) (generic group), find
**`punktfunk-client-windows`**, and download the newest **`.msix`** and its matching **`.cer`**.
2. In an **admin** PowerShell, trust the publisher certificate (one-time), then install:
2. **Trust the publisher certificate**, then install. The MSIX won't install until the certificate is
trusted — but it's the **same certificate for every release**, so this is genuinely one-time and
later updates need nothing. In an **admin** PowerShell:

```powershell
Import-Certificate -FilePath .\punktfunk-client-windows.cer `
@@ -1,11 +1,12 @@
---
title: Install the Host
description: Pick your distro and install the punktfunk host from its package registry.
description: Install the punktfunk host — on Linux from its package registry, or on Windows from a signed installer.
---

The package registries are the real distribution channel. Pick your distro, add the repo, and install
with your native package manager. Each row links to the full per-distro guide (add the repo, first-run
steps, the web console) — those are the source of truth, so this page doesn't duplicate them.
On Linux, the package registries are the real distribution channel. Pick your distro, add the repo, and
install with your native package manager. Each row links to the full per-distro guide (add the repo,
first-run steps, the web console) — those are the source of truth, so this page doesn't duplicate them.
On **Windows** (NVIDIA), the host ships as a signed installer instead — see [Windows](#windows-nvidia).

## Pick your distro
@@ -19,6 +20,31 @@ Each registry is public — no auth, you just trust the repo's signing key. Addi
|
||||
one-time step covered in the linked guide; after that, normal `apt upgrade` / `rpm-ostree upgrade`
|
||||
tracks new builds automatically.
|
||||
|
||||
## Windows (NVIDIA)
|
||||
|
||||
punktfunk also runs as a native host on **Windows 10/11 (x64) with an NVIDIA GPU**, shipped as a
|
||||
signed installer — see [Windows Host](/docs/windows-host) for what it includes and its limitations.
|
||||

1. From the [packages page](https://git.unom.io/unom/-/packages) (generic group), download the newest
   **`punktfunk-host-setup-<ver>.exe`** and its matching **`.cer`**.
2. **Trust the publisher certificate once.** The installer is signed with a self-signed certificate
   whose public `.cer` is published next to it — the **same certificate for every release**, so this is
   genuinely one-time and later updates need nothing. In an **admin** PowerShell:

   ```powershell
   Import-Certificate -FilePath .\punktfunk-host-setup.cer `
       -CertStoreLocation Cert:\LocalMachine\TrustedPublisher
   ```

3. Run `punktfunk-host-setup-<ver>.exe` (elevated). It installs to `C:\Program Files\punktfunk`,
   optionally installs the bundled **SudoVDA** virtual-display driver, and registers + starts the
   `LocalSystem` service (`/VERYSILENT` for an unattended install). Upgrades and uninstall go through
   Add/Remove Programs.

You need an NVIDIA GPU + driver (the host is NVENC-only on Windows). More detail — including the CLI
`punktfunk-host service install` path — is in
[Running as a Service → Windows](/docs/running-as-a-service#windows).

## What the packages are

- **`punktfunk-host`** — the streaming host. Install this on your Linux + NVIDIA gaming machine.

@@ -11,6 +11,7 @@
"ubuntu-kde",
"fedora-kde",
"bazzite",
"windows-host",
"running-as-a-service",
"---Connecting---",
"clients",
@@ -21,13 +22,7 @@
"configuration",
"host-cli",
"troubleshooting",
"---Project---",
"roadmap"
]
}

@@ -1,392 +1,74 @@
---
title: "Roadmap"
description: "What's shipped, what's in progress, and what's next for punktfunk."
---

A quick map of where punktfunk is today and where it's heading. For the detailed, dated changelog,
see [Status & Progress](/docs/status).

The numbered goals below were decided 2026-06-10 (research-grounded; see commit history) and extended since.

**Legend:** ✅ shipped · 🟡 in progress · 🔭 planned · ⛔ blocked upstream

**Done & live (on `main`):** #1 KDE reliability (Phase 1+2), #2 client compositor options (full
stack incl. the macOS client), #4 mic passthrough, #5 touch (host path) + **rich UHID DualSense**
— input + adaptive-trigger/LED feedback over the new `0xCC`/`0xCD` planes + C ABI, Phase C/D/E
live-validated. #3 Bazzite packaging (`packaging/`) **deployed live** on a Bazzite F43 box (builds
against FFmpeg 7 **or** 8; gamescope capture → zero-copy NVENC, sub-ms latency; Sunshine replaced).

**Unified host:** `serve --native` runs the GameStream host + the punktfunk/1 QUIC host in one
process, with native pairing driven from the **web console** (arm → show PIN), not the service log.

Advanced DualSense (audio-driven voice-coil) haptics **scoped NO-GO** (`docs/dualsense-haptics.md`).

**Bazzite dynamic resolution (`c894c6f`):** the host now *manages* a headless `gamescope-session-plus`
Steam session at the **client's exact resolution + refresh** — games see it (via injected
`--nested-refresh` + generated CVT modes, not the host's TV EDID), relaunched per-connection on a mode
change, reused (no Steam restart) on the same mode. Plus macOS/iPad input fixes (NSEvent motion +
iPad pointer-lock) and a 4K/5K one-frame-freeze fix (grow the UDP socket buffers).

## At a glance

The four native clients — **macOS/iOS/iPadOS/tvOS**, **Linux**, **Windows**, and **Android** — are
all real, working clients; see [Status & Progress](/docs/status) for their current state.

| Area | Status |
|---|---|
| Protocol core — FEC · crypto · C ABI | ✅ |
| GameStream host (works with Moonlight) | ✅ |
| Native `punktfunk/1` protocol | ✅ |
| Linux host (KWin · GNOME · gamescope · Sway) | ✅ |
| Windows host (NVIDIA) | ✅ beta |
| Apple client (macOS · iOS · iPadOS · tvOS) | ✅ |
| Linux client (GTK4) | ✅ |
| Android client (phone · TV) | ✅ |
| Windows client | 🟡 |
| Web console + pairing | ✅ |
| Concurrent sessions (shared desktop) | ✅ |
| Network speed test + bitrate | ✅ |
| HDR / 10-bit (host capture) | ⛔ |
| Sub-frame pipelining (latency) | 🔭 |

A native **Windows host** (§7) is also implemented and shipping (NVIDIA-only, x64-only) — see
[Windows Host](/docs/windows-host).

**Next:** **§8 pairing & trust hardening** (mandatory PIN by default + delegated approval) and the
native client presenter + iOS (§6). **§10 HDR/10-bit is parked — blocked upstream at the compositor**
(no gamescope/KWin PipeWire 10-bit producer yet).

## ✅ Shipped

- **The host, two ways.** A GameStream host any [Moonlight](/docs/moonlight) client can use, and the
lower-latency native [`punktfunk/1`](/docs/how-it-works) protocol (QUIC control + UDP data with
GF(2¹⁶) Leopard FEC + AES-GCM). Both run from one process.
- **Native-resolution virtual displays** on Linux across KWin, GNOME/Mutter, gamescope, and
Sway/wlroots, with a fully zero-copy GPU path to NVENC (stable 240 fps at 5120×1440).
- **A native Windows host** (NVIDIA, x64) — a signed installer with secure-desktop capture and a
bundled virtual-display driver. See [Windows Host](/docs/windows-host). *(Beta — newer than the
Linux host.)*
- **Clients on every platform** — native apps for **Apple** (macOS, iOS, iPadOS, tvOS), **Linux**,
**Android** (phone + TV), and **Windows**, each with hardware decode, controllers including
DualSense, audio + mic, and automatic host discovery. See [Clients](/docs/clients).
- **Secure by default** — SPAKE2 PIN pairing with pinned reconnects, one-click delegated approval from
the web console, and mDNS LAN auto-discovery.
- **Tuned for latency** — concurrent sessions (stream one desktop to several devices at once),
mid-stream resolution renegotiation, a cross-machine clock-skew handshake, a 1 Gbps+ data plane, and
an in-app network speed test that informs the bitrate picker.

## 🟡 In progress

- **Windows client on-glass validation.** The hardware (D3D11VA) decode, HDR present, and GUI are
built and ship as a signed MSIX — they just need verification on real GPU hardware.
- **Apple stage-2 presenter as the default.** The lower-latency `VTDecompressionSession` →
`CAMetalLayer` path is live behind an opt-in flag and graduating to the default.
- **Web console parity.** Surfacing the speed test and bitrate picker the apps already have.
- **Windows host hardening.** Broader real-world testing, AMD/Intel encode (NVIDIA-only today), and
bundling the ViGEm gamepad driver.

## 1. Reliable headless KDE/compositor spawning ✅ *(done — Phase 1 + 2)*

Startup is a chain of timing-sensitive handoffs with no readiness checks — each is a blind
`sleep`, one-shot timeout, or silent fire-and-forget that fails into a black screen.

- **Phase 1 (S):** replace `run-headless-kde.sh`'s blind `sleep 2` with an active readiness
wait (kwin socket + `wl_display` roundtrip + `zkde_screencast` global advertised +
KWIN_PID alive); add a `punktfunk-host probe-compositor` subcommand (reuses kwin.rs's
registry roundtrip); move the portal restart to *after* readiness and precede it with
`systemctl --user import-environment` + `dbus-update-activation-environment` (the missing
env import — the Sway script does this, the KDE one doesn't).
- **Phase 2 (M):** bounded retry-with-backoff around `vd.create()` + first-frame
(permanent vs transient); a PipeWire negotiation watchdog with zero-copy→CPU auto-fallback
("no PipeWire frame within 10s" → recovery or precise diagnosis); fix `set_custom_refresh`
to wait for the output, read back the active mode, reconcile encoder fps; harden gamescope
node discovery + detect the known-bad-gamescope signature; graceful PipeWire-thread stop.
- **Phase 3 (L):** supervised systemd user session (kwin + portal + host) with the readiness
probe as an `ExecStartPost` gate, `Restart=on-failure`.
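
Phase 2's bounded retry-with-backoff, with its permanent-vs-transient split, fits in a few lines of Rust. This is a sketch under assumptions, not the host's code: `CreateError`, `retry_with_backoff`, and the delay values are hypothetical, and the sleep is injected so the loop can be tested without waiting.

```rust
// Hypothetical sketch: names and delays are illustrative, not punktfunk's API.
use std::time::Duration;

/// Only transient failures are worth retrying.
#[derive(Debug, PartialEq)]
enum CreateError {
    /// e.g. the compositor socket isn't up yet.
    Transient(String),
    /// e.g. the backend binary is missing; retrying cannot help.
    Permanent(String),
}

/// Runs `op` up to `max_attempts` times. After each transient failure it sleeps,
/// then doubles the delay (capped at `max_delay`). A permanent error, or the
/// last failed attempt, is returned immediately.
fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut(u32) -> Result<T, CreateError>,
) -> Result<T, CreateError> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            Err(CreateError::Permanent(e)) => return Err(CreateError::Permanent(e)),
            Err(err @ CreateError::Transient(_)) => {
                if attempt >= max_attempts {
                    return Err(err);
                }
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

A caller would wrap something like `vd.create()` in `op`. It would map "socket not up yet" to `Transient` and "binary missing" to `Permanent`. A misconfiguration then fails fast, while a slow compositor gets a few bounded retries instead of one blind `sleep`.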

## 🔭 Planned

- **Sub-frame pipelining.** Overlap encode and transmit within a single frame (a direct NVENC slice
path) — the next big latency lever at high resolutions.
- **True glass-to-glass latency** measured end to end (capture → on-screen present).
- **gamescope multi-user isolation.** Per-session input and audio so concurrent clients are fully
independent desktops (the shared-desktop case already works).
- **Peer-approved pairing.** Approve a new device from an already-paired device's own app.

## 2. Offer available compositors in the client ✅ *(done)*

Host enumerates which backends are actually available (binary present + version OK:
gamescope ≥3.16.22, KWin ≥6.5.6, gnome-shell, sway), advertises the list in the punktfunk/1
Welcome + a mgmt-API field; client sends its pick in the Hello; host honors it per session.
Picker in the Apple client + web console.
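
The version gate in the enumeration step amounts to parsing each backend's version text and comparing it with a floor. A minimal Rust sketch follows. The floors are the ones listed above. `parse_version`, `is_usable`, and feeding in raw version output are assumptions for illustration. gnome-shell and sway have no floor in the text, so they are left out.

```rust
// Hypothetical sketch: helper names are illustrative, not punktfunk's API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Gamescope,
    Kwin,
}

fn minimum(b: Backend) -> (u32, u32, u32) {
    match b {
        Backend::Gamescope => (3, 16, 22),
        Backend::Kwin => (6, 5, 6),
    }
}

/// Takes the first whitespace-separated token that starts with a digit, drops any
/// packaging suffix ("6.5.6-1.fc43" -> "6.5.6"), and parses (major, minor, patch).
fn parse_version(text: &str) -> Option<(u32, u32, u32)> {
    let token = text
        .split_whitespace()
        .find(|t| t.starts_with(|c: char| c.is_ascii_digit()))?;
    let core = token.split(|c: char| !(c.is_ascii_digit() || c == '.')).next()?;
    let mut nums = core.split('.').map(|p| p.parse::<u32>().ok());
    let major = nums.next()??;
    let minor = nums.next().flatten().unwrap_or(0);
    let patch = nums.next().flatten().unwrap_or(0);
    Some((major, minor, patch))
}

/// Usable only if the binary was found and meets its floor. Tuples compare
/// lexicographically, so (3, 16, 22) < (3, 17, 0).
fn is_usable(b: Backend, version_output: Option<&str>) -> bool {
    version_output
        .and_then(parse_version)
        .map_or(false, |v| v >= minimum(b))
}

/// `None` means "binary not found". Returns the list the Welcome would advertise.
fn available(probes: &[(Backend, Option<&str>)]) -> Vec<Backend> {
    probes
        .iter()
        .filter(|(b, out)| is_usable(*b, *out))
        .map(|(b, _)| *b)
        .collect()
}
```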

## ⛔ Parked / blocked

## 3. Bazzite / install on other devices ✅ *(packaging written — `packaging/`)*

Bazzite already ships gamescope + PipeWire + the NVIDIA driver (incl. `libnvidia-encode`);
it's Fedora-atomic and the community installs Sunshine via COPR rpm-ostree — the analog.
Written: `packaging/rpm/punktfunk.spec` (builds the host from source), `packaging/bootc/Containerfile`
(`FROM bazzite-nvidia`), `packaging/bazzite/host.env` (gamescope default), `packaging/copr/` +
`packaging/README.md`. The build itself is operator-run (COPR / a Fedora toolbox; not buildable on
the Ubuntu dev box). `LICENSE-{MIT,APACHE}` added to match the declared dual license.

- **M-Bazzite-1:** a **COPR RPM** (primary) — binary + `60-punktfunk.rules` (→
`/usr/lib/udev/rules.d`) + systemd `--user` unit + `host.env.example`; `Requires` the
NVENC ffmpeg-libs Bazzite already pulls; links host `libcuda`/`libnvidia-encode` directly.
Install = `rpm-ostree install` + reboot + add to `input`/`render`. Default backend =
Bazzite's already-present **gamescope** (minimal session plumbing).
- **M-Bazzite-2:** wrap the RPM in a **bootc/OCI image layer**
(`FROM ghcr.io/ublue-os/bazzite-nvidia:stable`) for the appliance/"just rebase" experience.
- Flatpak only later as an explicitly-degraded convenience build (sandbox fights
zero-copy NVENC/dmabuf/uinput).

## 4. Mic passthrough — client mic → host input device ✅ *(done — host side)*

The exact mirror of the host→client desktop-audio path. A PipeWire virtual source that apps can
select is a `pw_stream` with `Direction::Output` + `media.class=Audio/Source`.

- New `0xCB` MIC_AUDIO datagram (mirror of `0xC9`) + `NativeClient::send_audio` + ABI
`punktfunk_send_audio`.
- `audio/source_linux.rs` — near-copy of the capture file, Direction::Output, fed from a
jitter buffer (silence-fill underrun, Opus PLC).
- Host `mic_thread` (Opus decode → ring → source); teardown RAII, set `node.dont-reconnect`.
- Apple capture (AVAudioEngine → Opus). **Opt-in + paired-only** (a remote mic is a privacy
surface). punktfunk/1-only.
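
The jitter buffer feeding the virtual source (silence-fill on underrun) can be sketched as a bounded sample queue. This is a hypothetical illustration: the type is invented, it assumes mono `i16` PCM, and it leaves out Opus PLC, which the real path uses to conceal lost packets before silence is needed.

```rust
// Hypothetical sketch: a mono i16 jitter buffer, not punktfunk's implementation.
use std::collections::VecDeque;

struct JitterBuffer {
    samples: VecDeque<i16>,
    max_samples: usize, // bound on buffered audio, i.e. on added latency
    underruns: u64,
}

impl JitterBuffer {
    fn new(max_samples: usize) -> Self {
        Self { samples: VecDeque::with_capacity(max_samples), max_samples, underruns: 0 }
    }

    /// Appends decoded samples, dropping the oldest ones if the bound is exceeded.
    fn push(&mut self, pcm: &[i16]) {
        self.samples.extend(pcm.iter().copied());
        let excess = self.samples.len().saturating_sub(self.max_samples);
        self.samples.drain(..excess);
    }

    /// Fills `out` completely: buffered samples first, then silence on underrun.
    fn pull(&mut self, out: &mut [i16]) {
        let n = out.len().min(self.samples.len());
        for (dst, src) in out.iter_mut().zip(self.samples.drain(..n)) {
            *dst = src;
        }
        if n < out.len() {
            out[n..].fill(0);
            self.underruns += 1;
        }
    }
}
```

The source side always gets a full period, so the PipeWire node never starves. Dropping the oldest samples on overflow keeps the mic from drifting further behind real time.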

## 5. Touch + rich DualSense *(decision: commit to full UHID DualSense)*

- **Touch — implemented (host path), pending a backend that lands it.** `TouchDown/Move/Up`
InputKinds (reuse the abs-pointer `flags=(w<<16)|h` mapping, `code`=touch id); host
`inject/libei.rs` requests the `Touchscreen` device type + binds the `Touch` capability and
injects `ei_touchscreen` down/motion/up; `punktfunk-probe --touch-test` drags a finger.
**Validated:** KWin's RemoteDesktop portal *grants* the Touchscreen device type, but its EIS
server creates **no touchscreen device** (headless KWin) — so touch currently no-ops on KWin
(now logged once). The code is correct; it needs a backend that exposes `ei_touchscreen`
(gamescope / newer KWin / the real iPad client path) to land. wlroots: no virtual-touch wired.
- **Rich DualSense — HID backend built & validated live.** `inject/dualsense.rs`: a hand-rolled
`/dev/uhid` codec (no bindgen) presenting a genuine USB DualSense (VID/PID 054C/0CE6, the 232-byte
inputtino report descriptor) bound by the kernel `hid-playstation` driver. The mandatory
GET_REPORT feature handshake (calibration 0x05 / pairing 0x09 / firmware 0x20) is answered, so the
kernel creates the full device (gamepad/motion/touchpad/lightbar). Input report `0x01` is built
from gamepad frames; output report `0x02` is parsed for LED RGB, player LEDs, and **adaptive
trigger effects (L2/R2)**. Protocol carries new side-planes: rich-input `0xCC`
(touchpad/motion) + HID-output `0xCD` (LED/triggers). `/dev/uhid` udev rule shipped.
- **Rich DualSense — Phase C/D/E end-to-end, validated live.** `PUNKTFUNK_GAMEPAD=dualsense`
selects a per-session `DualSenseManager` (the `PadBackend` enum in `punktfunk1.rs`): client gamepad frames
build the DualSense report; the kernel's feedback comes back as `HidOutput` on the **0xCD** plane
(lightbar / player LEDs / adaptive triggers) while **rumble stays on the universal 0xCA plane**
(so non-DualSense clients still feel it); touchpad + motion ride the **0xCC** rich-input plane
(`DualSenseManager::apply_rich`, merged with button state). The connector + C ABI gained
`punktfunk_connection_next_hidout` (→ `PunktfunkHidOutput`) and `punktfunk_connection_send_rich_input`
(← `PunktfunkRichInput`); header regenerated. Validated on-box: a synthetic-source `punktfunk1-host` +
`punktfunk-probe --rich-input-test` created the real kernel DualSense, drove 0xCC, and decoded
12 live 0xCD events (the kernel's actual lightbar/trigger init reports) — data plane unaffected
(600/600 frames). *Remaining:* having the Apple client render adaptive triggers + rumble on a real
DualSense (`GCDualSenseAdaptiveTrigger`) — handed off to the client agent for the real playtest.
- **Advanced (audio-driven voice-coil) haptics — scoped, NO-GO for now (`docs/dualsense-haptics.md`).**
Driven by the DualSense's USB *audio* interface (4-ch, back 2 channels = haptic PCM), not HID — so
the UHID backend structurally can't carry it. Three independent walls: host capture needs a kernel
rebuild (`CONFIG_USB_DUMMY_HCD` is off → no UDC for an `f_uac2` gadget); **near-zero Linux supply**
(only ~5–10 Proton titles via custom Wine patches emit it; `hid-playstation`/Steam Input/RPCS3
don't); and the Apple client can't faithfully replay PCM haptics (CoreHaptics is discrete/pattern-based,
no public channel-3/4 routing). Deferred; revisit only if a real DS for capture + a UDC/host
path + a PCM-capable client all land. Adaptive triggers (HID, above) deliver the reachable 80%.
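
The `flags=(w<<16)|h` convention carries the client's surface size alongside each absolute position, so the host can rescale a touch to its own output. A Rust sketch of both ends follows. The helper names and the `u32` coordinate types are assumptions, not punktfunk's real `InputKind` layout. The touch id would travel separately in `code` and is not modeled.

```rust
// Hypothetical sketch: helper names and coordinate types are illustrative.

/// Client side: pack the surface size into `flags` as `(w << 16) | h`.
fn pack_surface(w: u16, h: u16) -> u32 {
    (u32::from(w) << 16) | u32::from(h)
}

/// Host side: recover (w, h) from `flags`.
fn unpack_surface(flags: u32) -> (u16, u16) {
    ((flags >> 16) as u16, (flags & 0xFFFF) as u16)
}

/// Host side: scale a client-space point onto the host output so the same
/// relative position is touched whatever the two resolutions are.
/// Returns None when the event carries no usable surface size.
fn map_to_output(x: u32, y: u32, flags: u32, out_w: u32, out_h: u32) -> Option<(u32, u32)> {
    let (w, h) = unpack_surface(flags);
    if w == 0 || h == 0 || out_w == 0 || out_h == 0 {
        return None;
    }
    // Widen to u64 so x * out_w cannot overflow.
    let sx = u64::from(x) * u64::from(out_w) / u64::from(w);
    let sy = u64::from(y) * u64::from(out_h) / u64::from(h);
    // Clamp to the last pixel: x == w would otherwise land one past the edge.
    let sx = sx.min(u64::from(out_w - 1)) as u32;
    let sy = sy.min(u64::from(out_h - 1)) as u32;
    Some((sx, sy))
}
```

Packing 16-bit width and height into one field keeps the event fixed-size. The cost is a 65 535-pixel ceiling per axis, which is far above any real display.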

## 6. iOS/iPadOS → tvOS *(deferred)*

PunktfunkKit is already platform-shared; iOS needs the `UIViewRepresentable` presenter twin +
touch capture (#5) + UI. tvOS later.

## 7. Windows as a host ✅ *(implemented & shipping — NVIDIA-only, x64-only; see [Windows Host](/docs/windows-host))*

Done as an "add a backend" job, not a parallel port: `punktfunk-core` (protocol/FEC/crypto/C-ABI) +
QUIC + GameStream + mgmt + the pipeline orchestration are all platform-agnostic and already
`cfg`-isolated (~95% reuse), so the Windows host is a set of `#[cfg(windows)]` backends behind the
existing traits:

- **Capture** — DXGI Desktop Duplication → D3D11 texture.
- **Virtual display** — **SudoVDA** (the Sunshine Virtual Display Adapter), a pre-built, signed
Indirect Display Driver that creates a `WxH@Hz` monitor on demand — so there's no kernel driver to
author or WHQL-sign. Bundled and staged by the installer; falls back to capturing an existing
monitor if absent.
- **Encode** — NVENC with a D3D11 device (`--features nvenc`), so the host is **NVIDIA-only**.
- **Input** — SendInput (mouse/keyboard) + ViGEm (virtual gamepads with rumble).
- **Audio** — WASAPI loopback capture + a WASAPI virtual mic.
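
The "`#[cfg(windows)]` backends behind the existing traits" approach looks roughly like the Rust sketch below. The trait and type names are hypothetical stand-ins, not punktfunk's real traits. The point is that the orchestration only ever sees the trait, and exactly one `cfg`-selected constructor exists per platform.

```rust
// Hypothetical sketch: trait and type names are illustrative, not punktfunk's.

/// The seam the platform-agnostic pipeline programs against.
trait CaptureBackend {
    fn name(&self) -> &'static str;
}

#[cfg(windows)]
struct DxgiDuplication; // would wrap DXGI Desktop Duplication -> D3D11 texture

#[cfg(windows)]
impl CaptureBackend for DxgiDuplication {
    fn name(&self) -> &'static str {
        "dxgi-desktop-duplication"
    }
}

#[cfg(target_os = "linux")]
struct PipeWireCapture; // would wrap the compositor's PipeWire screencast node

#[cfg(target_os = "linux")]
impl CaptureBackend for PipeWireCapture {
    fn name(&self) -> &'static str {
        "pipewire"
    }
}

#[cfg(not(any(windows, target_os = "linux")))]
struct Unsupported;

#[cfg(not(any(windows, target_os = "linux")))]
impl CaptureBackend for Unsupported {
    fn name(&self) -> &'static str {
        "unsupported"
    }
}

// Exactly one of these compiles on any given target.
#[cfg(windows)]
fn default_capture() -> Box<dyn CaptureBackend> {
    Box::new(DxgiDuplication)
}

#[cfg(target_os = "linux")]
fn default_capture() -> Box<dyn CaptureBackend> {
    Box::new(PipeWireCapture)
}

#[cfg(not(any(windows, target_os = "linux")))]
fn default_capture() -> Box<dyn CaptureBackend> {
    Box::new(Unsupported)
}
```

The same shape would repeat for encode, input, audio, and virtual display. This is why the rest of the stack (protocol, QUIC, GameStream, mgmt) can be reused unchanged.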

It ships as a **signed Inno Setup installer** that registers a `LocalSystem` SCM service launching
into the interactive session for secure-desktop (UAC / lock-screen) capture, published by
`windows-host.yml`. Still open: AMD/Intel encode (none today), broader real-world testing, and
bundling ViGEmBus.

## 8. Pairing & trust hardening *(§8a + §8b-1 done; §8b-2 next)*

The unified host + web-console pairing (arm a window → display the host PIN → user enters it on the
client) is built and live. Two changes harden it from "works" to "secure by default":

- ✅ **Mandatory PIN pairing by default — done & live** (`§8a`, `serve --native` now requires
pairing; `serve --open` disables it). An unpaired client is rejected at the session gate; pairing
is via the SPAKE2 PIN ceremony (one online guess, no offline attack) armed from the web console.
Validated live: unpaired → "this host requires pairing", then web-armed PIN → "client trusted".
Deployed to the dev box + Bazzite.
- ✅ **§8b-1 Delegated approval via the console — done (2026-06-12)** *(the ergonomic enabler for
"mandatory": pair a device without fetching the host PIN out of band).* An identified-but-unpaired
device that knocks on a pairing-required host is held as a **pending request** in `NativePairing`
(in-memory, deduped by fingerprint, 32-entry cap, 10-min expiry — a LAN scanner can't grow it,
and an anonymous client with no certificate records nothing). The mgmt API gains
`GET /native/pending` + `POST /native/pending/{id}/approve` (optional `{name}` to relabel) +
`POST /native/pending/{id}/deny`; the web console's Pairing page shows a **Waiting for approval**
section (live-polling) with Approve/Deny — approve pins the fingerprint on the spot, no PIN.
The `Hello` carries an optional trailing **device name** (same back-compat pattern as
compositor/gamepad/bitrate; `client-rs --name` sends it, fingerprint-derived label otherwise) so
the pending list is human-readable. End-to-end tested (knock → pending → approve → same identity
streams) + unit/mgmt tests.
- **§8b-2 (peer push — needs the client):** the host also pushes the pending request over a paired
**Device A**'s live QUIC connection (a new control-plane message); A's app renders the prompt and
replies approve/deny — the user's exact "Device A gets a notification" flow. The native/Apple UI
is a client-agent task. The Apple connector should also start sending the `Hello` device name
(needs a connect-ABI name parameter; the wire field is already live).
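
The pending-request table's three guards can be sketched with a `HashMap`: dedup by fingerprint, a 32-entry cap, and a 10-minute expiry. Apart from those limits everything here is hypothetical. The type names, the 32-byte fingerprint, and measuring expiry from the first knock are assumptions, and the real `NativePairing` may differ.

```rust
// Hypothetical sketch: names, fingerprint size, and expiry rule are assumptions.
use std::collections::HashMap;
use std::time::{Duration, Instant};

const MAX_PENDING: usize = 32; // 32-entry cap
const PENDING_TTL: Duration = Duration::from_secs(10 * 60); // 10-min expiry

struct Pending {
    id: u64,
    name: String, // Hello device name, or a fingerprint-derived label
    first_seen: Instant,
}

#[derive(Default)]
struct PendingRequests {
    by_fingerprint: HashMap<[u8; 32], Pending>,
    next_id: u64,
}

impl PendingRequests {
    /// Records a knock. A repeat knock from the same fingerprint reuses its slot;
    /// a new device is refused once the table is full.
    fn knock(&mut self, fp: [u8; 32], name: &str, now: Instant) -> Option<u64> {
        self.expire(now);
        if let Some(p) = self.by_fingerprint.get_mut(&fp) {
            p.name = name.to_owned();
            return Some(p.id);
        }
        if self.by_fingerprint.len() >= MAX_PENDING {
            return None;
        }
        self.next_id += 1;
        let id = self.next_id;
        self.by_fingerprint
            .insert(fp, Pending { id, name: name.to_owned(), first_seen: now });
        Some(id)
    }

    /// Drops requests older than the TTL.
    fn expire(&mut self, now: Instant) {
        self.by_fingerprint
            .retain(|_, p| now.saturating_duration_since(p.first_seen) < PENDING_TTL);
    }

    /// Approve and deny both remove the entry. Approve would then pin the
    /// returned fingerprint; deny would just discard it.
    fn take(&mut self, id: u64) -> Option<([u8; 32], String)> {
        let fp = *self.by_fingerprint.iter().find(|(_, p)| p.id == id)?.0;
        self.by_fingerprint.remove(&fp).map(|p| (fp, p.name))
    }
}
```

Passing `now` in, rather than reading the clock inside, keeps expiry deterministic in tests. The cap bounds memory no matter how many fingerprints a LAN scanner invents.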

PIN pairing (§8a) stays the bootstrap — the first device, or when no approver is online.

## 9. Client→host network speed test + settable bitrate *(host + Apple client done — web console remaining)*

Measure what the network actually sustains so the bitrate picker is informed (suggest/cap a safe
value) instead of guesswork that ends in a stuttering stream.

**Done & live (host + protocol + connector + C ABI, `74819b1`):**
- **Bitrate negotiation**: `bitrate_kbps` rides Hello/Welcome (trailing-byte back-compat). The
client requests a rate; the host clamps to [500 kbps, 2 Gbps] (or its 20 Mbps default on 0),
applies it to NVENC (replacing the old hardcoded 20 Mbps) on the initial mode + every reconfigure,
and echoes the resolved value. C ABI: `punktfunk_connect_ex3(…, bitrate_kbps, …)` +
`punktfunk_connection_bitrate()`.
- **Bandwidth probe over the punktfunk/1 data path**: `ProbeRequest{target_kbps,duration_ms}` /
`ProbeResult{bytes_sent,…}` control messages + a `FLAG_PROBE` packet flag. The host bursts
zero-filled FEC-encoded AUs at the target goodput for the duration (clamped ≤ 3 Gbps / ≤ 5 s,
video paused), reports what it sent; the connector measures received bytes/window → goodput + loss
and exposes it (`punktfunk_connection_speed_test()` + `punktfunk_connection_probe_result()` →
`PunktfunkProbeResult{throughput_kbps, loss_pct, …}`). Probe filler is diverted from the decoder.
Validated on loopback (synthetic source): a 20 Mbps/2 s probe measured 20050 kbps at 0% loss,
interleaved probe AUs excluded from frame verification. `punktfunk-probe` gains `--bitrate` +
`--speed-test KBPS:MS` as the reference/loopback driver.
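
The negotiation and probe clamps described above reduce to two pure functions. In this Rust sketch, `MAX_BITRATE_KBPS` and `MAX_PROBE_KBPS` are named in the data-plane section, but their exact definitions are assumed. The other constant names and the decimal-unit reading (2 Gbps = 2 000 000 kbps) are also assumptions.

```rust
// Hypothetical sketch: constants mirror the roadmap's limits; names are assumed.
const MIN_BITRATE_KBPS: u32 = 500;
const MAX_BITRATE_KBPS: u32 = 2_000_000; // 2 Gbps session ceiling
const DEFAULT_BITRATE_KBPS: u32 = 20_000; // used when the client sends 0
const MAX_PROBE_KBPS: u32 = 3_000_000; // 3 Gbps probe ceiling
const MAX_PROBE_MS: u32 = 5_000; // 5 s probe ceiling

/// Resolves the client's requested bitrate; the host echoes this in the Welcome.
fn resolve_bitrate_kbps(requested: u32) -> u32 {
    if requested == 0 {
        DEFAULT_BITRATE_KBPS
    } else {
        requested.clamp(MIN_BITRATE_KBPS, MAX_BITRATE_KBPS)
    }
}

/// Clamps a probe request before the host starts bursting filler AUs.
fn clamp_probe(target_kbps: u32, duration_ms: u32) -> (u32, u32) {
    (target_kbps.min(MAX_PROBE_KBPS), duration_ms.min(MAX_PROBE_MS))
}
```

An in-range request comes back unchanged. That is the case the `50 000 → 50 000` echo in the Apple client test observes.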

**Done (Apple client UI):** Settings grows a Bitrate control (Automatic = host default; manual is
a log-scale slider up to 3 Gbps with an above-1-Gbps "test the speed first" warning — tvOS keeps
a focus-native preset picker; rides `connect_ex3` on every connect, `PUNKTFUNK_BITRATE_KBPS` dev
override), and each host card's context menu gets
"Test Network Speed…" — a sheet that connects, runs `speed_test` (up to the host's 3 Gbps
probe ceiling for 2 s), polls `probe_result` with a live readout, and shows measured
goodput · loss · recommended bitrate (≈70% of measured, capped at the 2 Gbps session
ceiling) with a one-tap "Use N Mbps" writing the setting. Loopback-tested through the
xcframework: bitrate echo (50 000 → 50 000) + a 20 Mbps/500 ms probe completing with real numbers.
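
The speed-test sheet's numbers follow from the probe result with simple arithmetic. This Rust sketch assumes decimal kilobits (bits per millisecond equals kbit/s) and byte-based loss. The function names are hypothetical. The ~70% factor and the 2 Gbps cap come from the text above.

```rust
// Hypothetical sketch: function names are illustrative, not the app's code.

/// Goodput from bytes received over the measurement window.
/// bytes * 8 = bits; bits per millisecond = kbit/s.
fn goodput_kbps(bytes_received: u64, window_ms: u64) -> u64 {
    if window_ms == 0 {
        return 0;
    }
    bytes_received * 8 / window_ms
}

/// Share of the bytes the host reports sending that never arrived, in percent.
fn loss_pct(bytes_sent: u64, bytes_received: u64) -> f64 {
    if bytes_sent == 0 {
        return 0.0;
    }
    let lost = bytes_sent.saturating_sub(bytes_received);
    lost as f64 * 100.0 / bytes_sent as f64
}

/// Recommendation: ~70% of measured goodput, capped at the 2 Gbps session ceiling.
fn recommended_bitrate_kbps(measured_kbps: u64) -> u64 {
    (measured_kbps * 7 / 10).min(2_000_000)
}
```

For the loopback figure quoted earlier (20 050 kbps measured over a 2 s probe), this would suggest about 14 Mbps. The 30% margin leaves room for IDR bursts and FEC overhead on top of the average rate.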

**Remaining:** surface both in the web console.

## 10. HDR + 10-bit color *(parked — blocked upstream at the compositor producer)*

Opt-in HDR10 (BT.2020 + PQ, 10-bit) streaming. Designed end to end; **blocked at capture, not in our
stack** — the compositor doesn't emit a 10-bit/HDR PipeWire frame on any shipping build. Spiked +
researched 2026-06-11 (memory: `hdr-blocked-gamescope-pipewire`); the downstream design is ready to
build the moment a producer lands.

- **The wall — gamescope capture is 8-bit.** gamescope composites HDR for a *display*
(`--hdr-enabled`, `--hdr-debug-force-output`), but its PipeWire capture node offers only `BGRx`/`NV12`
(8-bit) — confirmed by reading `src/pipewire.cpp` `build_format_params()` on upstream master
AND the box's exact build (`c31743d`); color is capped BT.601/709. Issue **#2126** ("pipewire: add
HDR streams") is OPEN + unstarted (no PR). Forcing HDR output does not change the capture format.
- **PipeWire ≥1.6** is the other prerequisite (HDR colortype transport). Fedora 43 ships 1.4.x;
Fedora 44 ships **1.6.6** — but Bazzite F44 `deck-nvidia` is **testing-only** (`:stable` is still
F43; `:testing` has a confirmed NVIDIA Game-Mode crash). Rebasing now clears *only* the PipeWire
wall while gamescope stays 8-bit → **no-go** for HDR; revisit a rebase when F44 promotes to stable
(for its own sake), not for HDR.
- **The realistic route is KWin, not gamescope:** **KWin MR !8293** is a live draft adding HDR
PipeWire capture. That pulls HDR onto the *desktop* (KWin) path — trading away gamescope's
Steam-Deck-UI polish + the Bazzite dynamic-resolution work described above. Track #2126 and !8293.
- **Constraints (settled):** NVENC tops out at **10-bit** → no Main12, no 12-bit AV1. **HDR ⟹ HEVC
Main10** (Apple VideoToolbox decodes 10-bit HEVC but **not** 10-bit AV1). Static HDR10 SEI
(BT.2020-PQ default) since the compositor won't surface per-frame metadata. Opt-in negotiation via
the Hello/Welcome trailing-byte pattern (SDR default; client declares HDR want; host master toggle).
- **Downstream design (ready when capture unblocks):** add P010 + `ColorInfo` to capture; 10-bit
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
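
The settled opt-in rule reduces to a small decision function: SDR by default, and HDR only when every gate passes, which always means HEVC Main10. This Rust sketch is hypothetical. The field names are invented, and `capture_is_10bit` models the upstream blocker, which per this section is `false` on every shipping compositor.

```rust
// Hypothetical sketch: field and type names are illustrative only.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec {
    H264,
    Hevc,
    Av1,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ColorMode {
    Sdr8,
    Hdr10Main10, // BT.2020 + PQ, static HDR10 SEI
}

struct HdrInputs {
    client_wants_hdr: bool,      // declared in the Hello
    client_decodes_hevc10: bool, // e.g. VideoToolbox Main10
    host_hdr_enabled: bool,      // host master toggle
    capture_is_10bit: bool,      // false today: no 10-bit PipeWire producer
}

/// SDR unless every gate passes; HDR forces HEVC Main10 (no 10-bit AV1 on Apple).
fn negotiate(inp: &HdrInputs, preferred: Codec) -> (Codec, ColorMode) {
    let hdr = inp.client_wants_hdr
        && inp.client_decodes_hevc10
        && inp.host_hdr_enabled
        && inp.capture_is_10bit;
    if hdr {
        (Codec::Hevc, ColorMode::Hdr10Main10)
    } else {
        (preferred, ColorMode::Sdr8)
    }
}
```

Folding the capture format into the same gate means the rest of the design can ship before upstream does. Sessions simply stay SDR until a 10-bit producer appears.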

## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*

Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.

**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434–874 data shards = a single GF(2¹⁶) block, two orders under the 65535 ceiling); AES-GCM
(RustCrypto auto AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **core
`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
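
The shard arithmetic behind the verdict can be checked directly. Average frame size is bitrate ÷ 8 ÷ fps, and data shards are that size divided by the shard payload, rounded up. This Rust sketch assumes decimal units (1 Gbps = 10⁹ bit/s) and the 1200-byte `shard_payload` mentioned in this section. It ignores parity shards and the fact that an IDR frame is larger than the average.

```rust
// Hypothetical sketch: a back-of-envelope check, not punktfunk's FEC sizing code.

/// Largest shard count one GF(2^16) block can address.
const GF16_MAX_SHARDS: u64 = 65_535;

/// Average data shards per frame: ceil((bitrate / 8 / fps) / shard_payload).
fn data_shards_per_frame(bitrate_bps: u64, fps: u64, shard_payload: u64) -> u64 {
    let frame_bytes = bitrate_bps / 8 / fps;
    (frame_bytes + shard_payload - 1) / shard_payload
}
```

At 1 Gbps this gives 435 shards per frame at 240 fps and 869 at 120 fps. That is in line with the ~434–874 quoted above; the exact figures there may assume a slightly different payload. Even at 60 fps, about 1 737 shards per frame is far below the single-block ceiling.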

- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
(64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
`sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
`recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
underdrain). ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
`shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
`punktfunk-probe --speed-test KBPS:MS`, RELEASE build (debug is CPU-bound ~30 Mbps), watching
`packets_send_dropped`. Open Qs: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
`rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
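
The proposed `Transport` seam can be sketched with two fake sockets that count syscalls instead of sending. The seam has a `send_batch(&[&[u8]])` with a scalar default and a `sendmmsg` override, plus the `Result<bool>` drop signal. This is hypothetical Rust, not punktfunk's trait. A real Linux override would build `mmsghdr` arrays and call `sendmmsg` through `libc`; that is omitted to keep the sketch self-contained.

```rust
// Hypothetical sketch: the seam's shape with fake sockets; names are illustrative.
use std::io;

/// Packets per `sendmmsg` call on the GameStream path.
const BATCH: usize = 64;

trait Transport {
    /// Ok(true) = queued; Ok(false) = dropped on WouldBlock (send buffer full).
    fn send(&mut self, pkt: &[u8]) -> io::Result<bool>;

    /// Scalar default: one `send`, i.e. one syscall, per packet.
    /// Returns how many packets were queued; drops are reported, not fatal.
    fn send_batch(&mut self, pkts: &[&[u8]]) -> io::Result<usize> {
        let mut queued = 0;
        for pkt in pkts {
            if self.send(pkt)? {
                queued += 1;
            }
        }
        Ok(queued)
    }
}

/// Shared bookkeeping for the two fakes below.
#[derive(Default)]
struct Counters {
    free_slots: usize, // room left in the simulated socket send buffer
    syscalls: usize,
    dropped: u64, // what a `packets_send_dropped`-style stat would show
}

impl Counters {
    fn enqueue(&mut self) -> bool {
        if self.free_slots == 0 {
            self.dropped += 1;
            return false;
        }
        self.free_slots -= 1;
        true
    }
}

/// Relies on the trait's scalar `send_batch`.
struct ScalarFake(Counters);

impl Transport for ScalarFake {
    fn send(&mut self, _pkt: &[u8]) -> io::Result<bool> {
        self.0.syscalls += 1;
        Ok(self.0.enqueue())
    }
}

/// Overrides `send_batch` the way a Linux `sendmmsg` backend would:
/// one call per chunk of up to `BATCH` packets.
struct MmsgFake(Counters);

impl Transport for MmsgFake {
    fn send(&mut self, pkt: &[u8]) -> io::Result<bool> {
        self.send_batch(&[pkt]).map(|n| n == 1)
    }

    fn send_batch(&mut self, pkts: &[&[u8]]) -> io::Result<usize> {
        let mut queued = 0;
        for chunk in pkts.chunks(BATCH) {
            self.0.syscalls += 1;
            for _ in chunk {
                if self.0.enqueue() {
                    queued += 1;
                }
            }
        }
        Ok(queued)
    }
}
```

Take 200 packets into a buffer with 150 free slots. The scalar path makes 200 calls and the batched path makes 4. Both report the same 50 drops instead of swallowing them, which is the point of the `Result<bool>` change.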

## 12. Glass-to-glass latency *(investigated; quick wins landed, bigger bets scoped)*

A 5-way investigation (2026-06-11) mapped where latency actually lives. The measured "p50 0.83 ms"
is only the same-host **capture-stamp→reassembled** slice (~30–40% of true glass-to-glass) and was
measured with tiny single-chunk frames, so it excludes the pacing tail. The latency that matters, in
priority order: **(1) the host pacing tail** — `paced_submit` used to spread *every* multi-chunk
frame over ~90% of the interval (up to ~7.5 ms@120 / ~15 ms@60); **(2) native-path serialization** —
`virtual_stream` runs capture+encode+seal+paced-send on one thread, so frame N+1 can't start until
frame N's paced tail leaves the wire; **(3) client present** — `AVSampleBufferDisplayLayer` adds
~0.5 refresh (~4 ms@120Hz, ~8 ms@60Hz), the dominant client term at 60 Hz.
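
Term (1) is easy to quantify. The Rust sketch below contrasts the old rule (pace every multi-chunk frame over ~90% of the frame interval) with the microburst cap described in this section's first bullet. Assumptions: the paced remainder still uses the same 90% window, `KB` means KiB (128 KB = 131 072 bytes), and the function names are hypothetical.

```rust
// Hypothetical sketch: models pacing tails only; not punktfunk's pacer.

/// Old rule: any frame bigger than one chunk was spread over ~90% of the frame
/// interval. Returns the pacing tail in microseconds.
fn old_tail_us(frame_bytes: usize, chunk_bytes: usize, fps: u64) -> u64 {
    if frame_bytes > chunk_bytes {
        1_000_000 / fps * 9 / 10
    } else {
        0
    }
}

/// Microburst cap: up to `cap_bytes` go out immediately and only the overflow is
/// paced (assumed to use the same 90% window).
/// Returns (burst bytes, paced bytes, tail in microseconds).
fn capped_plan(frame_bytes: usize, cap_bytes: usize, fps: u64) -> (usize, usize, u64) {
    let burst = frame_bytes.min(cap_bytes);
    let paced = frame_bytes - burst;
    let tail_us = if paced == 0 { 0 } else { 1_000_000 / fps * 9 / 10 };
    (burst, paced, tail_us)
}
```

A typical 40 KB P-frame at 120 fps had a ~7.5 ms tail under the old rule and has none under the cap. A 400 KB IDR at 60 fps still gets its overflow paced over ~15 ms, which keeps the freeze fix where it matters.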

**Already optimal — do NOT touch** (confirmed): NVENC tuning (p1/ull/cbr/bf0/delay0/infinite-GOP +
forced-IDR — `receive_packet` is already same-frame); the device→device copy in `submit_cuda` (avoids
NVENC registration-cache thrash); FEC `max_data_per_block=4096` (every frame incl. a 4 MB IDR is one
block — no multi-block latency); the client reassembler (no jitter buffer, frame emitted on
last-packet arrival, `REORDER_WINDOW` is a dedup bound not a delay) — do **not** add a client jitter
buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
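
Comparing a host `pts_ns` against a client receive instant needs the host−client clock offset. The standard NTP estimate from one probe/echo round looks like this in Rust, keeping the minimum-RTT round as the skew handshake in this section does. The `Round` type and function names are hypothetical. The sketch assumes the echo carries the host's receive and reply times; if it carries a single host timestamp, set `t1 == t2`.

```rust
// Hypothetical sketch of the NTP-style offset estimate; names are illustrative.

/// One probe/echo round in nanoseconds. t0 and t3 come from the client's clock
/// (probe sent, echo received); t1 and t2 from the host's clock (probe
/// received, echo sent).
#[derive(Clone, Copy, Debug)]
struct Round {
    t0: i64,
    t1: i64,
    t2: i64,
    t3: i64,
}

impl Round {
    /// Network round trip, excluding the host's own turnaround time.
    fn rtt(&self) -> i64 {
        (self.t3 - self.t0) - (self.t2 - self.t1)
    }

    /// Host clock minus client clock, assuming symmetric one-way delays.
    fn offset(&self) -> i64 {
        ((self.t1 - self.t0) + (self.t2 - self.t3)) / 2
    }
}

/// Keeps the minimum-RTT round: it saw the least queueing, so its symmetric
/// delay assumption is the most trustworthy. Returns (offset_ns, rtt_ns).
fn estimate(rounds: &[Round]) -> Option<(i64, i64)> {
    rounds.iter().min_by_key(|r| r.rtt()).map(|r| (r.offset(), r.rtt()))
}

/// Skew-corrected latency of one access unit: map the client's receive instant
/// onto the host clock, then subtract the host's capture stamp.
fn capture_to_receipt_ns(recv_client_ns: i64, offset_ns: i64, pts_host_ns: i64) -> i64 {
    recv_client_ns + offset_ns - pts_host_ns
}
```

A round that queued for 1 ms on the way out biases its own offset by about half that delay. Taking the minimum-RTT sample discards it automatically, which is why a handful of rounds is enough.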
|
||||
- **Done & live (`99f60b5`):** **microburst-cap pacing** — a frame ≤ a cap (default 128 KB,
|
||||
`PUNKTFUNK_PACE_BURST_KB`) bursts out immediately (no pacing tail); only a bigger frame's overflow
|
||||
(IDR / sustained high bitrate — the bursts that actually froze) is spread. Recovers the tail on the
|
||||
common case, keeps the freeze fix for the frames that need it; 128 KB is a safe default (well under
|
||||
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
|
||||
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
|
||||
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
|
||||
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
|
||||
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
|
||||
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
|
||||
`sync_channel(3)` with backpressure. Removes the serialization (~2–8 ms @60–120 fps) and is the
|
||||
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
|
||||
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
|
||||
- **Done & live (skew handshake landed 2026-06-12):** **wall-clock skew handshake** — `ClockProbe`/
`ClockEcho` on the control stream (8 NTP-style rounds right after `Start`; min-RTT sample →
host−client offset; `clock_offset_ns`). The client adds the offset to its receive instant before
differencing against the AU `pts_ns`, so the `capture→reassembled` percentiles are now valid
**across machines** (reported `skew_corrected=true`), not just same-host. Back-compat: against an old
host that doesn't answer, the handshake times out → `skew_corrected=false` (shared-clock assumption,
as before). Validated cross-LAN (GNOME box → dev box): offset ≈ −1.57 ms (reproducible), rtt ~140 µs,
**p50 1.30 ms** skew-corrected capture→reassembled. The skew handshake is now a shared core helper
(`quic::clock_sync` → `ClockSkew`) used by both the reference client and the **embeddable
connector** — `NativeClient` runs it at connect and exposes the offset over the C ABI
(`punktfunk_connection_clock_offset_ns`), so the Apple client can convert a present instant to the
host clock. The Apple client now consumes that offset: `PunktfunkConnection.clockOffsetNs` +
`LatencyMeter` surface a **capture→client-receipt** (skew-corrected) p50/p95 in the HUD, the first
cross-machine latency figure the real Apple client reports. **Remaining for *true* glass-to-glass**:
(1) the **decode→present** tail: the stage-1 `AVSampleBufferDisplayLayer` decodes and presents
compressed samples internally with no per-frame callback, so stamping on-glass present time needs
the **stage-2 presenter** (`VTDecompressionSession` decode-completion timestamp +
`CAMetalLayer`/display-link present); (2) the host **render→capture** term (PipeWire buffer
presentation timestamp vs our capture stamp). **render→capture is parked (low priority):**
pipewire-rs 0.9.2 exposes no per-buffer meta accessor, no raw buffer pointer (`pub(crate)`), and no
stream-timing API, so reading `SPA_META_Header.pts` would require introducing raw `spa_sys`/`pw_sys`
FFI into the working, perf-critical capture buffer-acquisition path — a risky rewrite for the
*smallest* g2g term, with KWin/Mutter `Header.pts` support unconfirmed. With that term parked,
glass-to-glass effectively reduces to **capture→present**, which the stage-2 presenter will measure.
`tools/latency-probe` is still the cross-machine orchestrator.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
  1. **CUDA stream+event** to drop one of the two redundant `cuCtxSynchronize` calls in
     `submit_cuda` (keep the copy) — ~0.1–0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves
     the sync is on the critical path.
  2. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5
     refresh interval off the present tail (biggest client win at 60 Hz); gate it on the probe
     proving the present tail is real.
  3. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
     encode+send within a frame (~3–6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
     thread split, only after measurement justifies it.
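The encode|send split is a bounded producer/consumer pipeline. The following is a minimal sketch built on Rust's standard `std::sync::mpsc::sync_channel`. It assumes a simplified `FrameMsg` that carries only a sequence number and an encoded payload. The real `send_loop` also owns the QUIC `Session` and does sealing, pacing and probes, which are elided here, so these function bodies are illustrative rather than the host's code.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Simplified stand-in for the host's `FrameMsg` (hypothetical fields).
struct FrameMsg {
    seq: u64,
    payload: Vec<u8>,
}

/// Send side (sketch): the real thread owns the QUIC `Session` and does
/// seal+pace+send+probes; here it only records what it would send.
fn send_loop(rx: Receiver<FrameMsg>) -> Vec<u64> {
    let mut sent = Vec::new();
    // recv() returns Err once every sender has been dropped, ending the loop.
    while let Ok(frame) = rx.recv() {
        let _wire_bytes = frame.payload.len(); // seal + pace + send would go here
        sent.push(frame.seq);
    }
    sent
}

/// Encode side (sketch): `send` blocks while 3 frames are already queued,
/// which is the backpressure that keeps the queue (and latency) bounded.
fn encode_loop(tx: SyncSender<FrameMsg>, frames: u64) {
    for seq in 0..frames {
        let payload = vec![0u8; 1024]; // placeholder for one encoded AU
        if tx.send(FrameMsg { seq, payload }).is_err() {
            break; // send thread is gone: stop encoding
        }
    }
    // `tx` is dropped here, closing the channel so send_loop can exit.
}

/// Wire the two threads together over a bounded channel of depth 3.
fn run_pipeline(frames: u64) -> Vec<u64> {
    let (tx, rx) = sync_channel::<FrameMsg>(3);
    let sender = thread::spawn(move || send_loop(rx));
    let encoder = thread::spawn(move || encode_loop(tx, frames));
    encoder.join().expect("encode thread panicked");
    sender.join().expect("send thread panicked")
}
```

With a depth of 3, `tx.send` blocks once three frames are waiting. A slow network path therefore stalls the encoder instead of letting queued frames, and with them latency, grow without bound.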
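The min-RTT offset estimate from the skew handshake can be sketched with the standard NTP four-timestamp formulas: offset θ = ((t1 − t0) + (t2 − t3)) / 2 and delay δ = (t3 − t0) − (t2 − t1). This sketch assumes each `ClockProbe`/`ClockEcho` round yields a client send time, host receive and reply times, and a client receive time. The actual wire format is not described here, so `ProbeSample` and the helper names are hypothetical. The sign convention follows the text: the offset is host − client and is added to the client's receive instant.

```rust
/// One ClockProbe/ClockEcho round in nanoseconds (hypothetical layout):
/// t0/t3 are read on the client clock, t1/t2 on the host clock.
#[derive(Clone, Copy, Debug)]
struct ProbeSample {
    t0_client_send: i64,
    t1_host_recv: i64,
    t2_host_send: i64,
    t3_client_recv: i64,
}

impl ProbeSample {
    /// Round-trip time minus the host's turnaround time.
    fn rtt_ns(&self) -> i64 {
        (self.t3_client_recv - self.t0_client_send) - (self.t2_host_send - self.t1_host_recv)
    }

    /// NTP offset estimate (host - client); exact only if both one-way delays are equal.
    fn offset_ns(&self) -> i64 {
        ((self.t1_host_recv - self.t0_client_send) + (self.t2_host_send - self.t3_client_recv)) / 2
    }
}

/// Keep the lowest-RTT round because it carries the least queueing noise.
/// `None` (no round answered, e.g. an old host) corresponds to `skew_corrected=false`.
fn estimate_offset(samples: &[ProbeSample]) -> Option<i64> {
    samples.iter().min_by_key(|s| s.rtt_ns()).map(|s| s.offset_ns())
}

/// Capture-to-receipt latency on the host clock: shift the client receive
/// instant by the offset, then subtract the host-side capture `pts`.
fn capture_to_receipt_ns(recv_client_ns: i64, pts_host_ns: i64, offset_ns: i64) -> i64 {
    recv_client_ns + offset_ns - pts_host_ns
}
```

Picking the lowest-RTT round matters because the formula assumes equal one-way delays. A round with extra queueing in one direction shifts the estimate by half the asymmetry. For example, with synthetic delays of 400 µs out and 70 µs back, the estimate is off by 165 µs.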
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*

The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `punktfunk1-host` advertise the native service over mDNS:

- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
both host entry points get it; best-effort (a discovery failure never blocks streaming —
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins; advisory
only, since mDNS is unauthenticated, and TOFU/pinning still verifies on connect),
`pair=required|optional` (so a picker knows up front whether the PIN ceremony is needed),
`id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-probe --discover [SECS]` browses and prints each host (name, addr:port,
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — a client discovered a GNOME host appliance
(`myhost.local 9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
(`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
host-side contract above is all it needs.
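A client-side decoder for this TXT contract might look like the sketch below. It assumes the mDNS library has already split the TXT record into `key=value` strings. `NativeHost`, `Pairing`, `parse_txt` and `fp_matches_pin` are hypothetical names. The fingerprint check is advisory only, as the contract says, because the real verification happens on connect.

```rust
use std::collections::HashMap;

/// Pairing policy advertised in the `pair` TXT key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pairing {
    Required,
    Optional,
}

/// A discovered host, decoded from the `_punktfunk._udp` TXT keys (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct NativeHost {
    fingerprint: String, // `fp`: host cert SHA-256, advisory only
    pairing: Pairing,    // `pair`
    id: String,          // `id`: host uniqueid, used for dedup
}

/// Decode `key=value` TXT strings. Returns None if the advert is not
/// punktfunk/1 or a required key is missing or has an unknown value.
fn parse_txt(entries: &[&str]) -> Option<NativeHost> {
    let kv: HashMap<&str, &str> = entries.iter().filter_map(|e| e.split_once('=')).collect();
    if kv.get("proto") != Some(&"punktfunk/1") {
        return None;
    }
    let pairing = match *kv.get("pair")? {
        "required" => Pairing::Required,
        "optional" => Pairing::Optional,
        _ => return None,
    };
    Some(NativeHost {
        fingerprint: kv.get("fp")?.to_string(),
        pairing,
        id: kv.get("id")?.to_string(),
    })
}

/// Compare the advertised fingerprint with a previously pinned one.
/// A mismatch is only a warning sign: unauthenticated mDNS cannot prove
/// identity, so the TLS handshake on connect remains the real check.
fn fp_matches_pin(host: &NativeHost, pinned: Option<&str>) -> bool {
    match pinned {
        Some(pin) => pin.eq_ignore_ascii_case(&host.fingerprint),
        None => true, // first contact: nothing pinned yet (TOFU)
    }
}
```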
## 14. Concurrent sessions *(shared-desktop multi-view ✅ done; multi-user deferred)*

The host no longer serves one client at a time. The accept loop spawns each session (`JoinSet`),
bounded by `--max-concurrent` (default 4, an NVENC bound; overflow waits in the accept queue). Each
session keeps its own virtual output + NVENC encoder; the host-lifetime input/audio/mic services stay
shared.
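The session bound can be modeled with a counting semaphore. The host itself uses a tokio `JoinSet`. This std-only sketch swaps in OS threads and a hypothetical `Slots` semaphore, since std has none, to show two properties described in this section: at most `max_concurrent` sessions run at once, and a client whose handshake fails is rejected before it takes a slot.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical counting semaphore standing in for `--max-concurrent`.
struct Slots {
    free: Mutex<usize>,
    cv: Condvar,
}

impl Slots {
    fn new(n: usize) -> Self {
        Slots { free: Mutex::new(n), cv: Condvar::new() }
    }

    /// Block until a slot is free, then take it (overflow waits here).
    fn acquire(&self) {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
    }

    /// Return a slot and wake one waiter.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Accept-loop sketch. Each entry in `handshakes` says whether that client's
/// handshake succeeded. Returns the peak number of simultaneous sessions.
fn serve(handshakes: Vec<bool>, max_concurrent: usize) -> usize {
    let slots = Arc::new(Slots::new(max_concurrent));
    let counts = Arc::new(Mutex::new((0usize, 0usize))); // (running, peak)
    let mut sessions = Vec::new();
    for ok in handshakes {
        if !ok {
            continue; // failed handshake: rejected before taking a slot
        }
        slots.acquire();
        let (slots, counts) = (Arc::clone(&slots), Arc::clone(&counts));
        sessions.push(thread::spawn(move || {
            {
                let mut c = counts.lock().unwrap();
                c.0 += 1;
                c.1 = c.1.max(c.0);
            }
            thread::sleep(Duration::from_millis(20)); // stand-in for streaming
            counts.lock().unwrap().0 -= 1;
            slots.release(); // free the slot only after the session ends
        }));
    }
    for session in sessions {
        session.join().unwrap();
    }
    // Bind before returning so the MutexGuard temporary is dropped first.
    let peak = counts.lock().unwrap().1;
    peak
}
```

The sketch acquires the slot in the accept loop rather than inside the session. Later connections therefore stay queued until a running session ends, which mirrors "overflow waits in the accept queue".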
- **Done & live (shared-desktop multi-view):** multiple devices viewing/controlling the **same**
desktop on the shared-desktop backends (kwin/mutter/wlroots) — e.g. stream *your* desktop to a
laptop + a TV at once; shared input/audio is the correct semantics there. Validated live on the
GNOME box: two clients → **two independent Mutter virtual outputs (1280×720 + 1920×1080) streaming
simultaneously**. The QUIC handshake stays in the accept loop so a failed handshake doesn't consume
a slot or block the next client.
- **Deferred — gamescope multi-user (independent desktops):** the *other* model, where each client
gets its own gamescope instance with **per-session input + audio + mic** (the multi-user /
cloud-gaming case). Researched 2026-06-12 and **parked**: it's a large multi-file refactor
(per-instance EIS sockets + a per-session injector + per-session **null-sink audio routing** +
per-session mic) for a niche use case, while the common multi-device case is already covered by the
multi-view model above. Full research + the plumbing list:
[gamescope Multi-User Isolation](/docs/gamescope-multiuser).
- **HDR / 10-bit streaming.** Designed end to end, but blocked upstream — no shipping compositor emits
a 10-bit/HDR capture stream yet. Ready to build the moment one lands.
- **Advanced DualSense voice-coil haptics.** Scoped and shelved (it rides the controller's USB audio
interface, with near-zero game support on Linux). Adaptive triggers, rumble, and the lightbar
already ship.

title: "Status & Progress"
description: "Where the work stands across the core, the host, and the native clients."
---
A high-level view of where punktfunk stands. The ordered plan of work is on the
[Roadmap](/docs/roadmap), and milestone-level detail lives in
[`CLAUDE.md`](https://git.unom.io/unom/punktfunk/src/branch/main/CLAUDE.md).

## Milestones at a glance