Both directions of the audio plane, on CoreAudio's built-in Opus codec
(kAudioFormatOpus — no bundled libopus; OpusCodec.swift, round trip unit-tested):
- Playback: a drain thread pulls nextAudio() packets, decodes, and writes a priming
jitter ring feeding an AVAudioSourceNode (~20 ms prefill, adaptive to the device's
render quantum so large-buffer devices don't oscillate prime/dropout; a high-water
clamp sheds stall backlog so one network hiccup can't permanently lag audio behind
video; underrun re-primes — one dip, not sustained crackle).
- Mic: a second engine taps the input device, resamples to 48 kHz stereo, Opus-encodes
20 ms chunks and sendMic()s them into the host's virtual PipeWire source. Permission
via AVCaptureDevice (NSMicrophoneUsageDescription added to the Xcode target).
- Settings: Speaker + Microphone pickers (CoreAudio HAL enumeration, persisted by
device UID — "System default" leaves the engine unpinned so it follows macOS device
changes) and a "Send microphone" toggle (default on). Applies from the next session.
- Audio starts with streaming, never during the trust prompt (no host sound — and no
mic uplink — before the user trusted the host); teardown stops audio before close().
Adversarial-review fixes baked in: stop() and the dangling mic-permission callback
share one lock+flag protocol (no hot mic with no owner), the connect-success handler
bails when the attempt was abandoned mid-handshake (no session/mic for a dead window),
SessionAudio gets a deinit backstop (a dropped instance can't pin the connection via
its drain thread), and the render scratch buffer is block-owned (was leaked per
session).
Verified live against the box: remote test decodes 100 host Opus packets to PCM and
the host opens its virtual mic on the first uplinked frame ("punktfunk/1 virtual mic
ready"); on-glass session runs with both engines up.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
punktfunk Apple client (SwiftUI)
The native macOS/iOS client for punktfunk/1 (the post-GameStream protocol). All
networking/protocol work — QUIC control plane, UDP data plane, GF(2¹⁶) FEC, AES-GCM,
input datagrams, Opus audio, cert pinning — lives in the shared Rust core (statically
linked as PunktfunkCore.xcframework); this package is the Swift shell: decode
(VideoToolbox), present (SwiftUI), input capture.
Status — first light achieved (2026-06-10)
Validated live, Mac ↔ Linux box over the LAN: gamescope virtual output → NVENC HEVC →
punktfunk/1 (GF(2¹⁶) FEC + AES-GCM over UDP, QUIC control) → VideoToolbox →
AVSampleBufferDisplayLayer on glass at 1280×720@60, with mouse/keyboard flowing back as
QUIC datagrams into the host's gamescope EIS injector (thousands of events injected during
the session). Headless variant of the same proof: RemoteFirstLightTests decoded 60/60
received AUs spanning 983 ms of host capture clock.
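As a sanity check on the headless figure: 60 AUs at 60 Hz are separated by 59 frame intervals, which is where a span of roughly 983 ms comes from. A small sketch of that arithmetic, assuming evenly spaced frames (the names are illustrative and not taken from RemoteFirstLightTests):

```swift
// Assumption: the AUs are evenly spaced at the negotiated 60 Hz refresh rate.
let frameCount = 60
let refreshHz = 60.0

// N frames are separated by N - 1 intervals, so the span falls one frame short of a full second.
let spanMs = Double(frameCount - 1) / refreshHz * 1000.0
print("expected span: \(spanMs) ms")
```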
The connector underneath (punktfunk_core::client::NativeClient over the C ABI) carries the
full session: video AUs, Opus audio (nextAudio()), rumble (nextRumble()),
input incl. gamepads, and cert pinning + TOFU (pinSHA256:/hostFingerprint) — see
m3.rs::tests::c_abi_connection_roundtrip (three sequential sessions: TOFU, pinned
reconnect, wrong-pin rejection). The host (punktfunk-host m3-host) is a persistent listener:
reconnect at will during development.
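The three sessions in that round-trip test (TOFU, pinned reconnect, wrong-pin rejection) correspond to three outcomes of one decision. A minimal sketch of that decision, assuming fingerprints are compared as hex strings; `TrustDecision` and `evaluateTrust` are hypothetical names for illustration and are not part of PunktfunkKit or the C ABI:

```swift
// Hypothetical model of the three trust outcomes; not the PunktfunkKit API.
enum TrustDecision: Equatable {
    case promptUser(fingerprint: String)  // no stored pin yet: trust on first use
    case accept                           // presented fingerprint matches the stored pin
    case reject                           // presented fingerprint differs from the stored pin
}

func evaluateTrust(storedPin: String?, presented: String) -> TrustDecision {
    guard let pin = storedPin else {
        // First contact: nothing to compare against, so the user has to decide.
        return .promptUser(fingerprint: presented)
    }
    // Assumption: fingerprints are hex strings, so compare case-insensitively.
    return pin.lowercased() == presented.lowercased() ? .accept : .reject
}

print(evaluateTrust(storedPin: nil, presented: "ab12"))
print(evaluateTrust(storedPin: "AB12", presented: "ab12"))
print(evaluateTrust(storedPin: "ab12", presented: "cd34"))
```

In the real client the rejection happens inside the connector when a pin is supplied; the sketch only illustrates which case leads to which outcome.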
What's here, all compiled and tested on macOS (Xcode 26.5 / Swift 6.3):
- PunktfunkKit (library)
  - PunktfunkConnection.swift: wrapper over the C ABI. AUs/audio are copied into Data (the C pointer is only valid until the next call of the same kind). close() is safe from any thread: per-plane locks enforce the C contract ("never close with a next_au/next_audio in flight") instead of leaving it to callers. Pinning + TOFU via pinSHA256:/hostFingerprint.
  - AnnexB.swift: in-band VPS/SPS/PPS → CMVideoFormatDescription; Annex-B → AVCC CMSampleBuffer with DisplayImmediately set.
  - StreamView.swift: SwiftUI NSViewRepresentable over AVSampleBufferDisplayLayer (stage-1 presenter: the layer hardware-decodes compressed HEVC itself). One pump thread per view, token-cancelled so reconnects can't double-pump.
  - InputCapture.swift: GCMouse raw deltas + GCKeyboard HID→VK mapping (the host's vk_to_evdev consumes Windows VKs), with fractional-delta accumulation so sub-pixel motion isn't truncated away. Buttons use GameStream ids (1=left … 5=X2). Scroll arrives via the stream view's scrollWheel override instead of GC (trackpad/Magic Mouse gestures never reach GCMouse's scroll dpad), WHEEL_DELTA(120)-scaled.
- PunktfunkClient (the app): hosts grid (saved in UserDefaults), "+" toolbar sheet to add hosts, stream mode in Settings (⌘,), two trust flows (the trust-on-first-use fingerprint prompt over the live-but-blurred stream, and SPAKE2 PIN pairing via PairSheet, from a host card's context menu or the trust prompt; ClientIdentityStore keeps the client identity in the Keychain and presents it on every connect), then pinned reconnects, fps/Mb-s HUD. Settings also picks the HOST compositor (KWin/wlroots/Mutter/gamescope, default automatic; the host honors it only if that backend is available there). (Gamepad capture is not wired into the app yet; the connector surface is there, see note 6. Audio is wired in both directions, see note 5.)
- Tests (swift test): byte-level Annex-B units; a real-codec round trip (VTCompressionSession-encoded HEVC rebuilt as the host's wire shape → AnnexB → VTDecompressionSession → pixels); loopback integration against real local hosts (test-loopback.sh: stream round trip, plus the PIN pairing ceremony and the --require-pairing gate against a second, armed host); the remote first-light test above.
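The Annex-B → AVCC step named above amounts to replacing each start code with a 4-byte big-endian length prefix. A minimal sketch on plain byte arrays, assuming 4-byte length prefixes; `splitAnnexB` and `annexBToAVCC` are hypothetical helpers and not the contents of AnnexB.swift. It also leaves the parameter sets in the output, whereas the real flow routes VPS/SPS/PPS into the format description:

```swift
/// Splits an Annex-B byte stream into NAL units with the start codes removed.
func splitAnnexB(_ bytes: [UInt8]) -> [[UInt8]] {
    var nals: [[UInt8]] = []
    var nalStart: Int? = nil  // index of the first byte of the NAL being collected
    var i = 0
    while i + 2 < bytes.count {
        if bytes[i] == 0 && bytes[i + 1] == 0 && bytes[i + 2] == 1 {
            if let start = nalStart {
                var end = i
                // A 4-byte start code (00 00 00 01) leaves an extra zero behind; a NAL
                // never ends in 0x00, so trailing zeros belong to the delimiter.
                while end > start && bytes[end - 1] == 0 { end -= 1 }
                if end > start { nals.append(Array(bytes[start..<end])) }
            }
            nalStart = i + 3
            i += 3
        } else {
            i += 1
        }
    }
    // The last NAL runs to the end of the buffer.
    if let start = nalStart, start < bytes.count {
        nals.append(Array(bytes[start...]))
    }
    return nals
}

/// Rewrites Annex-B as length-prefixed NAL units (4-byte big-endian lengths).
func annexBToAVCC(_ bytes: [UInt8]) -> [UInt8] {
    var out: [UInt8] = []
    for nal in splitAnnexB(bytes) {
        let n = UInt32(nal.count)
        out += [UInt8(n >> 24), UInt8((n >> 16) & 0xFF), UInt8((n >> 8) & 0xFF), UInt8(n & 0xFF)]
        out += nal
    }
    return out
}

// Two tiny fake NALs: one behind a 4-byte start code, one behind a 3-byte start code.
let annexB: [UInt8] = [0, 0, 0, 1, 0x40, 0x01, 0xAA, 0, 0, 1, 0x42, 0x01, 0xBB]
print(annexBToAVCC(annexB))
```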
Build / run / test (on a Mac)
rustup target add aarch64-apple-darwin x86_64-apple-darwin
bash scripts/build-xcframework.sh # → clients/apple/PunktfunkCore.xcframework
cd clients/apple
swift build && swift test # loopback/remote tests self-skip without a host
swift run PunktfunkClient # the unbundled dev shell (CLI)
open Punktfunk.xcodeproj # the real app: ⌘R builds + runs Punktfunk.app
bash test-loopback.sh # full loopback proof: builds punktfunk-host
# (synthetic source — runs on macOS), streams
# byte-verified frames into the Swift client
# against the real host (Linux box, see CLAUDE.md "Running on this box") — m3-host is a
# persistent listener, reconnect at will:
# PUNKTFUNK_COMPOSITOR=gamescope PUNKTFUNK_GAMESCOPE_APP=vkcube PUNKTFUNK_ZEROCOPY=1 \
# cargo run -rp punktfunk-host -- m3-host --source virtual --seconds 60
PUNKTFUNK_REMOTE_HOST=<box-ip> swift test --filter RemoteFirstLightTests # headless
# (+ PUNKTFUNK_REMOTE_PORT / PUNKTFUNK_REMOTE_COMPOSITOR=gamescope|kwin|… /
# PUNKTFUNK_REMOTE_PIN=<arming-pin> for the remote pairing test)
PUNKTFUNK_AUTOCONNECT=<box-ip> PUNKTFUNK_MODE=1280x720x60 swift run PunktfunkClient # on glass
Xcode project (Punktfunk.xcodeproj)
The app target Punktfunk wraps the same sources as the swift run shell
(Sources/PunktfunkClient, a synchronized folder — no duplication) plus App/ (asset
catalog) and links PunktfunkKit from the local package. Generated Info.plist, ad-hoc
signing, bundle id io.unom.punktfunk. Notes:
- App icon: App/Assets.xcassets ships an empty AppIcon slot. For an Icon Composer .icon: add the file to the project (target Punktfunk), set it as the App Icon in the target's General tab, and delete the placeholder AppIcon.appiconset. Heads-up: CLI actool (Xcode 26.5) crashed compiling punktfunk_Logo.icon; if Xcode does the same, suspect the icon bundle (it has a duplicate-named layer, "…Layer-3 2.svg"), not the project.
- Tests from Xcode: the package tests run with swift test; to get them on ⌘U, add PunktfunkKitTests once via Edit Scheme → Test → + (Xcode persists it into the shared scheme; a hand-written package-test reference doesn't resolve headlessly). xcodebuild -project Punktfunk.xcodeproj -scheme Punktfunk build works headlessly.
Notes for whoever picks this up next
1. cbindgen import quirk (the predicted "small compile fixes", now fixed): the C17-compatible header spells PunktfunkStatus/PunktfunkInputKind as integer typedefs while the enum constants import into Swift as a distinct same-named type; bridge with .rawValue (see the top of PunktfunkConnection.swift). Don't fight the generated header.
2. ABI contract: one video pump thread per connection, plus optionally one separate audio drain thread for nextAudio()/nextRumble() (the core keeps per-plane borrow slots, so the planes never alias); send() is enqueue-only and safe alongside all of them. The wrapper's per-plane locks make close() safe from anywhere (it waits out in-flight polls, ≤ their timeouts).
3. Decode flow: the host opens every stream with an IDR carrying VPS/SPS/PPS in-band and recovery keyframes re-send them, so "refresh the format description on every IDR" (what StreamView does) is sufficient; there is no out-of-band extradata, ever.
4. Stage 2 (next): explicit VTDecompressionSession + CAMetalLayer for frame-pacing control (ProMotion/120 Hz), glass-to-glass measurement via tools/latency-probe (the host stamps pts_ns with its capture wall clock; across machines you need a clock offset estimate from the QUIC RTT).
5. Audio: wired, both directions. Playback: SessionAudio drains nextAudio() on its own thread, decodes through CoreAudio's built-in Opus codec (OpusCodec.swift: kAudioFormatOpus, no bundled libopus; round-trip unit-tested) into a priming jitter ring feeding an AVAudioSourceNode. Mic: a second engine taps the input device, resamples to 48 kHz stereo, Opus-encodes 20 ms chunks and sendMic()s them (the host's virtual PipeWire source accepts any frame size ≤ 120 ms). Speaker/mic are chosen in Settings (AudioDevices.swift: persisted by UID; "System default" leaves the engines unpinned so they follow macOS device changes), mic on/off toggle included; the app asks for mic permission on first use (NSMicrophoneUsageDescription is in the Xcode target). A/V sync and packet-loss concealment beyond silence-fill are still open (AudioPacket.seq/ptsNs carry what's needed). Decode with libopus or AVAudioConverter/kAudioFormatOpus into an AVAudioEngine source node; conceal gaps (drop/dup) rather than blocking, since the Rust side buffers 320 ms and drops the newest packet when the puller lags. Wall-clock ptsNs shares the host clock with video AUs for A/V sync. Wiring this into PunktfunkClient is the next app-side task.
6. Gamepads: GCController → .gamepadButton(...)/.gamepadAxis(...) events (wire contract documented on the constructors; the host accumulates them into a virtual Xbox 360 pad). Poll nextRumble() and feed GCDeviceHaptics for force feedback. Client-side capture isn't in InputCapture yet.
7. Trust: the full ceremony exists now (SPAKE2). generateIdentity() once (persist both PEMs in the Keychain), then pair(host:identity:pin:name:) with the 4-digit PIN the host prints when it ARMS pairing (--allow-pairing/--require-pairing; one PIN per arming window, shown at startup, so the user reads it before pairing). Returns the host's VERIFIED fingerprint; persist it and pass pinSHA256: + identity: to every connect. Pairing is a real PAKE: a wrong PIN gets ONE online guess (no offline dictionary attack), throwing .wrongPIN; a wrong-size pin throws .invalidPin. PunktfunkClient implements both flows: the TOFU fingerprint sheet keeps working against hosts not running --require-pairing, and the PIN ceremony is wired in: ClientIdentityStore (Keychain) on every connect, PairSheet from a host card's context menu or the trust prompt's "Pair with PIN instead…" (the host's accept loop is sequential, so that path drops the live session before pairing). With --require-pairing the host now authorizes clients too (the "other direction" is no longer open, opt-in per host); the whole gate is regression-tested in testPairingCeremonyAndRequirePairingGate.
7b. Resize without reconnect: requestMode(width:height:refreshHz:) mid-stream. The host rebuilds at the new mode in ~90 ms; the first new-mode AU is an IDR with fresh parameter sets (the refresh-on-IDR decode flow handles it untouched) and currentMode() reflects the switch. Wire it to window-resize events.
8. Input capture (stage 1): capture is a deliberate, reversible STATE owned by StreamLayerView, Moonlight-style. Engaged when the stream starts / trust is confirmed and when the user clicks into the video (that click is suppressed toward the host); released by ⌘⎋ (toggles) or focus loss; NEVER engaged by mere app activation, because activating clicks may be title-bar drags or resizes, which used to get their cursor warped away mid-drag. While captured: the local cursor is hidden + frozen mid-view (the host renders its own), all input is forwarded, and the view consumes key events as first responder so unhandled keyDowns don't beep; ⌘-combos still work locally (⌘D disconnect, ⌘Q) and reach the host via GC. While released: nothing is forwarded (InputCapture.forwarding gates the GC handlers; held keys/buttons are flushed host-side on release so nothing sticks down), the cursor is free, and the HUD shows "Click the stream to capture input". GC handlers only fire while the app has focus, and focus loss also auto-releases everything held. One live capture per process (the GC mouse/keyboard singletons have a single handler slot, so ownership is tracked and a stale capture's stop() can't clobber a newer one).
9. iOS: same package (BUILD_IOS=1 for the xcframework slice); StreamView needs the UIViewRepresentable twin and touch→input mapping.
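The priming jitter ring from the audio note can be pictured as a buffer with three rules: stay silent until a prefill is reached, shed the oldest audio above a high-water mark, and fall back to priming after an underrun. A minimal sketch under simplifying assumptions (mono Float samples, a plain array instead of a ring, no thread safety, fixed thresholds where the real prefill is adaptive); `JitterBuffer` is a hypothetical type and not the one in SessionAudio:

```swift
/// Hypothetical priming jitter buffer. Not thread-safe: a real drain/render pair
/// would need a lock or a lock-free ring between the two threads.
struct JitterBuffer {
    private var samples: [Float] = []
    private var primed = false
    let prefill: Int    // samples required before playback (re)starts
    let highWater: Int  // backlog cap in samples

    init(prefill: Int, highWater: Int) {
        self.prefill = prefill
        self.highWater = highWater
    }

    var count: Int { samples.count }

    /// Called by the drain side with freshly decoded PCM.
    mutating func write(_ pcm: [Float]) {
        samples += pcm
        // High-water clamp: drop the oldest audio so a stall backlog cannot add lasting latency.
        if samples.count > highWater {
            samples.removeFirst(samples.count - highWater)
        }
        if samples.count >= prefill { primed = true }
    }

    /// Called by the render side; always returns exactly n samples.
    mutating func read(_ n: Int) -> [Float] {
        // Unprimed or underrun: emit silence and wait for the prefill again.
        guard primed, samples.count >= n else {
            primed = false
            return [Float](repeating: 0, count: n)
        }
        let out = Array(samples.prefix(n))
        samples.removeFirst(n)
        return out
    }
}

var jitter = JitterBuffer(prefill: 4, highWater: 8)
jitter.write([1, 2])
print(jitter.read(2))  // silence: the prefill has not been reached yet
jitter.write([3, 4])
print(jitter.read(2))  // primed now, so the oldest samples come out first
```

At 48 kHz, a 20 ms prefill corresponds to 960 frames per channel, so a real instance would use thresholds on that order and not the toy values above.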
Known limitations of the current host (relevant to client UX)
- One session at a time (the listener is persistent, but a second concurrent client waits in the accept queue until the current session ends — the virtual output and encoder are single-tenant).
- Mid-stream renegotiation (resolution change without reconnect) is designed-for but not implemented (the Welcome is one-shot today).
- Host-side gamepad injection needs /dev/uinput access on the box (udev rule from docs/linux-setup.md).