Opt-in (Settings -> Presenter; `punktfunk.presenter`, default stage-1). Stage-1's
AVSampleBufferDisplayLayer decodes AND presents internally with no per-frame
callback, so neither decode nor present can be stamped or hand-paced. Stage-2
takes explicit control:
- VideoDecoder: VTDecompressionSession, async output callback stamps
decode-completion, session rebuilt on every IDR / format change. Unit-tested
(testVideoDecoderAsyncCallbackDeliversPixels).
- MetalVideoPresenter: CAMetalLayer + CVMetalTextureCache + a runtime-compiled
BT.709 limited-range NV12->RGB shader, present at the next vsync. The
CVMetalTextures + pixel buffer are held until the GPU completes.
- Stage2Pipeline: pump thread -> decoder -> newest-ready 1-slot ring; the hosting
view's display link drains it once per vsync and stamps capture->present
(the display-link target time projected into CLOCK_REALTIME).
- LatencyMeter gains record(ptsNs:atNs:offsetNs:); the HUD shows a capture->present
(glass-to-glass, modulo host render->capture) line, skew-corrected via
clockOffsetNs. Measured live ~11 ms p50 vs ~2.2 ms capture->client.
- StreamView / StreamViewIOS host the CAMetalLayer as a sublayer + a CADisplayLink
(NSView.displayLink on macOS) when stage-2; input capture + HUD unchanged. The
session-active gates switch from `pump != nil` to `connection != nil` so capture
engages without a StreamPump.
Validated: builds macOS/iOS/tvOS; the decode half is unit-tested; the Metal
present is live-validated on glass (correct image + the capture->present number).
Colorspace is BT.709 SDR for now; 10-bit/HDR + a pacing policy are later.
Plan: docs-site/content/docs/apple-stage2-presenter.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The accept loop no longer awaits each session inline — it spawns each onto a
JoinSet, bounded by a semaphore (--max-concurrent, default 4: a NVENC session
bound; overflow clients wait in QUIC's accept backlog until a slot frees). The
QUIC handshake stays in the accept loop so a failed handshake (e.g. a pin
mismatch where the client aborts) doesn't consume a session slot or block
accepting the next client; the slow part (control handshake, pairing, the
capture/encode pipeline) runs in the spawned task.
Each session already had its own virtual output + NVENC encoder; the
host-lifetime input/audio/mic services stay shared — the natural "multiple
devices viewing/controlling the same desktop" semantic on kwin/mutter/wlroots.
gamescope's independent-desktops (per-session input/audio) isolation is a
follow-up. New M3Options.max_concurrent + the `--max-concurrent` CLI flag.
Validated live (GNOME box): two clients connected at once -> two independent
Mutter virtual outputs (720p60 + 1080p60) streaming simultaneously (39 MB +
48 MB). All 61 host tests green (the c_abi/pairing tests exercise the new loop +
the failed-handshake-doesn't-count semantics).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Apple client now consumes the connector's clock offset. PunktfunkConnection
reads punktfunk_connection_clock_offset_ns into clockOffsetNs at connect; a new
LatencyMeter (PunktfunkKit, NSLock + percentiles, mirrors FrameMeter) records each
AU's capture->client-receipt latency = now(CLOCK_REALTIME) + offset - pts_ns, and
SessionModel drains p50/p95 into the macOS HUD ("capture->client N/N ms p50/p95",
"(same-host)" when the host didn't answer the skew handshake). Wired at the
existing onFrame hook in ContentView — additive, no change to the decode/present
path. Unit test for the meter (percentiles, skew flag, absurd-value guard).
This is the first cross-machine latency the real Apple client reports. SCOPE:
stage-1 AVSampleBufferDisplayLayer decodes+presents compressed samples internally
with no per-frame callback, so this excludes decode+present; true decode->present
needs the stage-2 presenter (VTDecompressionSession + CAMetalLayer). Rebuild
PunktfunkCore.xcframework (for the new C getter) before swift build/test on a Mac.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Factor the client-side skew handshake into a shared core helper (quic::clock_sync
-> ClockSkew) so both the reference client and the embeddable connector use one
implementation. NativeClient now runs the handshake at connect (right after Start,
before the control task takes the stream) and stores the host-client offset; it's
read over the C ABI via punktfunk_connection_clock_offset_ns (i64 ns, host minus
client; 0 = no correction / old host).
This is the substrate the Apple client needs for the decode->present (glass-to-
glass) term: stamp present time, add the offset to express it in the host's
capture clock, subtract the AU pts_ns. client-rs drops its local clock_sync copy
and uses the shared helper (behavior unchanged; validated locally).
Regenerates include/punktfunk_core.h. Roadmap section 12 + status updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the new docs workflow (docs-site = KB layer; repo docs/ keeps design notes):
- Add a canonical Status & Progress tracker (status.md): milestones, per-box live
state, and a dated progress log — the go-forward place to track progress.
- Add setup guides: GNOME/Mutter host (gnome-box — Secure Boot MOK enroll, the
libnvidia-gl EGL fix, autologin, screen-lock disable, appliance unit), headless
KDE box, and Bazzite host (ujust input group, gamescope session, gotchas).
- Roadmap is now canonical in docs-site (synced the skew-handshake section 12
update); removed the repo docs/roadmap.md copy and repointed README to docs-site.
- Nav (meta.json) + landing cards updated; site builds (bun run build).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>