Phase 2 of AMD/Intel support: the VAAPI encoder now takes the capture dmabuf directly and does the RGB->NV12 colour conversion on the GPU's video engine, eliminating the host-side de-pad + swscale CSC + upload the CPU path pays.

- capture: a vendor-neutral FramePayload::Dmabuf (dup'd fd + fourcc/modifier/layout). When zero-copy is on, the EGL->CUDA importer is unavailable (any non-NVIDIA host), and the backend is VAAPI, the capturer advertises LINEAR dmabuf and hands the raw buffer to the encoder instead of CPU-copying it.
- encode/vaapi: the encoder self-configures from the first frame's payload (no open_video signature change). The dmabuf arm wraps the buffer as an AV_PIX_FMT_DRM_PRIME frame and pushes it through a filter graph buffer(drm_prime) -> hwmap(vaapi) -> scale_vaapi=nv12 -> buffersink; the encoder takes NV12 surfaces straight from the sink. The Phase 1 CPU-upload path is kept as the other arm (used when capture produces CPU frames).

Live-validated on a Radeon 780M (real Sway/xdpw desktop capture): correct, pixel-perfect HEVC, and ~10x less host CPU at 1440p (4.2s -> 0.4s of CPU for 300 frames) -- the de-pad/CSC/upload moves to the GPU. NVIDIA unchanged (zero-copy still imports to CUDA; the passthrough path only engages on non-NVIDIA hosts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
punktfunk
A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.
punktfunk is a placeholder codename. The bet: ship a Linux virtual-display streaming
host that speaks the existing Moonlight protocol (every Moonlight/Artemis client works
day one), then break the ~1 Gbps FEC wall with a GF(2¹⁶) Leopard-RS transport as a
negotiated extension. See docs/implementation-plan.md.
Status
| Milestone | State |
|---|---|
| Core — punktfunk-core + C ABI | ✅ done & hardened (FEC, packetization, AES-GCM, session, adversarial-review fixes, punktfunk_core.h) |
| GameStream host → stock Moonlight | ✅ live end-to-end: pairing, RTSP, audio, per-client virtual output at native res, GPU zero-copy NVENC, gamepads |
| Native protocol — punktfunk/1 | ✅ validated live: QUIC control + GF(2¹⁶) FEC/AES data plane, SPAKE2 PIN pairing, mid-stream mode renegotiation |
| Native clients — decode + present | 🟡 macOS first light: AnnexB→VideoToolbox HEVC on glass + input/pairing over punktfunk/1 (clients/apple); iOS + presenter next |
| Web console + management API | ✅ TanStack web console (web/) over the OpenAPI mgmt API: host status, paired devices, on-demand native pairing (arm → show PIN) |
The GameStream host works with a stock Moonlight client — validated live on NVIDIA
(RTX 5070 Ti & RTX 4090, driver 595): trust-on-first-use pairing that persists, an app
catalog, RTSP/ENet/audio, and video at the client's exact resolution and refresh via a
per-session virtual output (KWin, gamescope, Mutter, Sway backends), encoded with GPU
zero-copy (dmabuf → CUDA/Vulkan → NVENC) at up to 5120×1440@240. The native
punktfunk/1 protocol adds a QUIC control plane and a GF(2¹⁶) Leopard-FEC + AES-GCM data
plane (p50 ~0.8 ms capture→reassembled at 720p120). Its trust model is SPAKE2 PIN pairing by
default — a new host requires the PIN ceremony; trust-on-first-use is an explicit host opt-in
(punktfunk1-host --allow-tofu / serve --open, advertised as pair=optional) for fully trusted LANs. Both
run from one process (serve --native), managed through a REST API + web console. Builds
against FFmpeg 7 or 8; deployed live on Bazzite. Full status: CLAUDE.md. The canonical
roadmap and status, along with setup guides and progress notes, live on the docs site
(docs-site/, Fumadocs; bun run dev). Design notes stay in docs/.
Install (host)
The package registries are the real distribution channel — pick your distro and run one command. Per-distro setup (add the repo, first-run, web console) lives in the linked READMEs.
| Distro | One-command happy path | Details |
|---|---|---|
| Ubuntu / Debian (apt) | sudo apt install punktfunk-host (after adding the repo) | packaging/debian/README.md |
| Fedora / Bazzite (rpm-ostree) | rpm-ostree install punktfunk punktfunk-web (after adding the repo; or the bootc image) | packaging/rpm/README.md |
| Arch / Steam Deck (PKGBUILD / sysext) | makepkg -si (Arch) · sysext .raw (SteamOS/Deck) | packaging/arch/README.md |
punktfunk-host is the streaming host; punktfunk-web is the browser console (pairing + status);
punktfunk-client is the GTK4 desktop client (also shipped via apt/RPM/Arch/Flatpak). After install,
run punktfunk-host serve --native inside your desktop session, then pair from the web console.
Building from source (below) is a fallback.
Layout
crates/
  punktfunk-core/  protocol · FEC · pacing · crypto · quic — the C ABI (lib + cdylib + staticlib)
  punktfunk-host/  Linux host: vdisplay · capture · encode · inject · gamestream · punktfunk1 · mgmt · native_pairing
clients/
  probe/  punktfunk/1 reference/probe client (headless test + latency measurement)
  linux/ windows/  native desktop clients (Rust: GTK4 / WinUI 3, link punktfunk-core directly)
  apple/ android/  Swift (macOS+iOS) · Kotlin app + native/ Rust JNI core
  decky/  Steam Deck Decky plugin
web/  TanStack web console (host status · paired devices · pairing) over the mgmt API
packaging/  Fedora/Bazzite RPM · bootc image · COPR (see packaging/bazzite/README.md)
include/punktfunk_core.h  cbindgen-generated C header (checked in)
tools/{latency-probe,loss-harness}/  measurement (plan §10)
docs/{implementation-plan,roadmap,windows-host,dualsense-haptics}.md
Build & test (from source)
For development, or as an install fallback where no package is available:
cargo build --workspace # green on Linux and macOS
cargo test --workspace # unit + loopback + proptest + C ABI harness
cargo clippy --workspace --all-targets
cargo run -p loss-harness # FEC loss-resilience sweep (no network needed)
bash crates/punktfunk-core/tests/c/run.sh # standalone C-ABI link+round-trip proof
The C header regenerates from crates/punktfunk-core/src/abi.rs on every build (cbindgen via
build.rs) into include/punktfunk_core.h.
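As a rough illustration of the kind of declarations such an ABI file might hold, here is a minimal sketch of a size-prefixed config struct. Only the names punktfunk_abi_version, PunktfunkConfig, and struct_size come from this README; ConfigV1, read_config, the mtu and fec_percent fields, and the default value are hypothetical and are not the project's actual layout. A real export would also carry a no_mangle attribute, omitted here to keep the sketch portable.

```rust
use std::mem::size_of;

/// Hypothetical ABI version; the real value lives in punktfunk-core.
const ABI_VERSION: u32 = 1;

/// Lets a caller check compatibility before passing any structs.
pub extern "C" fn punktfunk_abi_version() -> u32 {
    ABI_VERSION
}

/// Older layout a caller may have been compiled against (hypothetical).
#[repr(C)]
pub struct ConfigV1 {
    pub struct_size: u32,
    pub mtu: u32,
}

/// Newer layout: same prefix plus one appended field (hypothetical).
#[repr(C)]
pub struct PunktfunkConfig {
    pub struct_size: u32,
    pub mtu: u32,
    pub fec_percent: u32,
}

/// Assumed default for fields an older caller does not know about.
const DEFAULT_FEC_PERCENT: u32 = 20;

/// Reads a caller-supplied config, touching only the bytes it declared.
///
/// # Safety
/// `ptr` must be null or point to a struct whose first field is a `u32`
/// size and which has at least that many readable bytes.
pub unsafe fn read_config(ptr: *const u8) -> Option<PunktfunkConfig> {
    if ptr.is_null() {
        return None;
    }
    // The size prefix tells us which layout the caller was built against.
    let declared = unsafe { (ptr as *const u32).read_unaligned() } as usize;
    if declared < size_of::<ConfigV1>() {
        return None; // too small to hold even the oldest layout
    }
    let v1 = unsafe { (ptr as *const ConfigV1).read_unaligned() };
    // Only read the appended field when the caller's struct includes it.
    let fec_percent = if declared >= size_of::<PunktfunkConfig>() {
        unsafe { (ptr as *const PunktfunkConfig).read_unaligned() }.fec_percent
    } else {
        DEFAULT_FEC_PERCENT
    };
    Some(PunktfunkConfig {
        struct_size: size_of::<PunktfunkConfig>() as u32,
        mtu: v1.mtu,
        fec_percent,
    })
}
```

The point of the size prefix in this sketch is that fields can be appended without breaking callers built against an older header: the library fills in defaults for anything the caller's struct is too short to contain.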
Design invariants
- One core, linked everywhere. Protocol/FEC/crypto/pacing live in punktfunk-core exactly once, exposed over a stable, versioned C ABI (punktfunk_abi_version(), PunktfunkConfig carries its own struct_size).
- No async on the hot path. The per-frame pipeline uses native threads only; tokio/quinn are gated behind the off-by-default quic feature (control plane only).
- FEC is the wall-breaker. GF(2⁸) (≤255 shards/block) for Moonlight compat; GF(2¹⁶) (≤65535 shards/block, SIMD, O(n log n)) to push past ~1 Gbps.
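A back-of-the-envelope sketch of how the two shard ceilings play out at high bitrates. The 255 and 65535 limits are the ones quoted above; the function names, the shard payload size, and the parity ratio are assumptions chosen for illustration, not punktfunk's actual parameters, and the block count is only a lower bound that ignores how a real packetizer splits frames.

```rust
/// Shard ceilings per FEC block, taken from the limits quoted above.
pub const GF8_MAX_SHARDS: usize = 255;
pub const GF16_MAX_SHARDS: usize = 65_535;

/// Integer division rounding up.
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

/// Shard counts for one frame (hypothetical helper type).
pub struct FramePlan {
    pub data_shards: usize,
    pub parity_shards: usize,
}

/// Splits a frame into fixed-size data shards and adds a parity overhead.
/// `shard_payload` and `parity_percent` are illustrative assumptions.
pub fn plan_frame(frame_bytes: usize, shard_payload: usize, parity_percent: usize) -> FramePlan {
    let data_shards = ceil_div(frame_bytes, shard_payload);
    let parity_shards = ceil_div(data_shards * parity_percent, 100);
    FramePlan { data_shards, parity_shards }
}

/// Lower bound on the number of FEC blocks needed when each block can
/// hold at most `max_shards` shards (data plus parity).
pub fn min_blocks(plan: &FramePlan, max_shards: usize) -> usize {
    ceil_div(plan.data_shards + plan.parity_shards, max_shards)
}
```

For example, a 1 Gbps stream at 60 fps averages about 2,083,333 bytes per frame. With an assumed 1400-byte shard payload and 20% parity, plan_frame gives 1489 data shards plus 298 parity shards, so min_blocks returns 8 for the GF(2⁸) ceiling and 1 for the GF(2¹⁶) ceiling.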
License
MIT OR Apache-2.0.