feat(apple): stage-2 presenter — explicit decode + Metal present + glass-to-glass

Opt-in (Settings -> Presenter; `punktfunk.presenter`, default stage-1). Stage-1's
AVSampleBufferDisplayLayer decodes AND presents internally with no per-frame
callback, so neither decode nor present can be stamped or hand-paced. Stage-2
takes explicit control:

- VideoDecoder: VTDecompressionSession, async output callback stamps
  decode-completion, session rebuilt on every IDR / format change. Unit-tested
  (testVideoDecoderAsyncCallbackDeliversPixels).
- MetalVideoPresenter: CAMetalLayer + CVMetalTextureCache + a runtime-compiled
  BT.709 limited-range NV12->RGB shader, present at the next vsync. The
  CVMetalTextures + pixel buffer are held until the GPU completes.
- Stage2Pipeline: pump thread -> decoder -> newest-ready 1-slot ring; the hosting
  view's display link drains it once per vsync and stamps capture->present
  (the display-link target time projected into CLOCK_REALTIME).
- LatencyMeter gains record(ptsNs:atNs:offsetNs:); the HUD shows a capture->present
  (glass-to-glass, modulo host render->capture) line, skew-corrected via
  clockOffsetNs. Measured live: ~11 ms p50 capture->present, vs ~2.2 ms
  capture->client.
- StreamView / StreamViewIOS host the CAMetalLayer as a sublayer + a CADisplayLink
  (NSView.displayLink on macOS) when stage-2; input capture + HUD unchanged. The
  session-active gates switch from `pump != nil` to `connection != nil` so capture
  engages without a StreamPump.
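
A minimal sketch of the newest-ready 1-slot hand-off and the capture->present
arithmetic described above, not the code in this commit. `NewestSlot`,
`droppedCount()` and `capturePresentNs` are hypothetical names, and the Int64
types are assumptions; the formula follows the README note (add clockOffsetNs
to the CLOCK_REALTIME present instant, subtract the AU pts_ns).

```swift
import Foundation

/// Hypothetical one-slot mailbox: the decoder overwrites, the display link takes.
/// An unpresented frame is replaced by a newer one rather than queued.
final class NewestSlot<Frame>: @unchecked Sendable {
    private let lock = NSLock()
    private var slot: Frame?
    private var dropped = 0

    /// Called from the decode callback; replaces any frame not yet taken.
    func publish(_ frame: Frame) {
        lock.lock(); defer { lock.unlock() }
        if slot != nil { dropped += 1 }
        slot = frame
    }

    /// Called once per vsync; returns the newest frame, or nil if none arrived.
    func take() -> Frame? {
        lock.lock(); defer { lock.unlock() }
        let frame = slot
        slot = nil
        return frame
    }

    /// Number of frames overwritten before they were presented.
    func droppedCount() -> Int {
        lock.lock(); defer { lock.unlock() }
        return dropped
    }
}

/// Capture->present latency in nanoseconds (assumed Int64 throughout):
/// present instant in CLOCK_REALTIME, plus the cross-machine clock offset,
/// minus the capture timestamp carried by the access unit.
func capturePresentNs(presentRealtimeNs: Int64,
                      clockOffsetNs: Int64,
                      ptsNs: Int64) -> Int64 {
    presentRealtimeNs + clockOffsetNs - ptsNs
}
```

Overwriting instead of queueing keeps latency bounded at the cost of dropping
frames when decode outruns the display; a smoothing pacer would trade that the
other way.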

Validated: builds macOS/iOS/tvOS; the decode half is unit-tested; the Metal
present is live-validated on glass (correct image + the capture->present number).
Colorspace is BT.709 SDR for now; 10-bit/HDR + a pacing policy are later.
Plan: docs-site/content/docs/apple-stage2-presenter.md.
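
For reference, the BT.709 limited-range conversion can be sketched on the CPU
as below. This is an illustration of the standard math under the assumption of
8-bit samples, not the runtime-compiled Metal shader from this commit; `RGB`
and `bt709LimitedToRGB` are hypothetical names.

```swift
/// Normalised RGB triple in [0, 1].
struct RGB { var r, g, b: Double }

/// Converts one 8-bit limited-range BT.709 Y'CbCr sample to normalised RGB.
/// Luma occupies 16...235 and chroma 16...240, centred on 128.
func bt709LimitedToRGB(y: UInt8, cb: UInt8, cr: UInt8) -> RGB {
    // BT.709 luma coefficients.
    let kr = 0.2126, kb = 0.0722
    let kg = 1.0 - kr - kb

    // Expand limited range to Y in [0, 1] and Pb/Pr in [-0.5, 0.5].
    let yn = (Double(y) - 16.0) / 219.0
    let pb = (Double(cb) - 128.0) / 224.0
    let pr = (Double(cr) - 128.0) / 224.0

    // Invert Y = kr*R + kg*G + kb*B with Pb = (B - Y) / (2(1 - kb)),
    // Pr = (R - Y) / (2(1 - kr)).
    let r = yn + 2.0 * (1.0 - kr) * pr
    let b = yn + 2.0 * (1.0 - kb) * pb
    let g = (yn - kr * r - kb * b) / kg

    func clamp(_ v: Double) -> Double { min(max(v, 0.0), 1.0) }
    return RGB(r: clamp(r), g: clamp(g), b: clamp(b))
}
```

In NV12 the Cb/Cr pair comes from the half-resolution interleaved chroma plane,
so a shader would sample two textures (luma and chroma) per output pixel before
applying the same matrix.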

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 15:28:23 +02:00
parent 848738ed00
commit 7b10714b62
12 changed files with 737 additions and 30 deletions
+11 -8
@@ -74,7 +74,8 @@ What's here, all compiled and tested on macOS (Xcode 26.5 / Swift 6.3):
 received it, **skew-corrected** across machines via `PunktfunkConnection.clockOffsetNs` (the
 connect-time wall-clock handshake, `punktfunk_connection_clock_offset_ns`). It excludes the
 layer's decode+present (stage-1 `AVSampleBufferDisplayLayer` has no per-frame present callback);
-true decode→present awaits the stage-2 presenter. Settings also picks the HOST
+the opt-in **stage-2 presenter** (Settings → Presenter) adds a **capture→present**
+(glass-to-glass) line via explicit decode + a Metal/display-link present. Settings also picks the HOST
 compositor (KWin/wlroots/Mutter/gamescope, default automatic — the host honors it
 only if that backend is available there) and has a **Controllers** section: every
 detected controller (capability glyphs, battery, "In use" badge), which one to forward
@@ -166,13 +167,15 @@ signing, bundle id `io.unom.punktfunk`. Notes:
 3. **Decode flow**: the host opens every stream with an IDR carrying VPS/SPS/PPS in-band
 and recovery keyframes re-send them — "refresh the format description on every IDR"
 (what `StreamView` does) is sufficient; there is no out-of-band extradata, ever.
-4. **Stage 2 (next)**: explicit `VTDecompressionSession` + `CAMetalLayer` for frame-pacing
-control (ProMotion/120 Hz) and true decode→present / glass-to-glass measurement. The
-cross-machine clock offset is **already wired**: `PunktfunkConnection.clockOffsetNs` (from
-the connect-time skew handshake); add it to a `CLOCK_REALTIME` present instant and subtract
-the AU `pts_ns`. **Full pickup-ready implementation plan** (decode + present + measurement
-wiring, integration points, gotchas): `docs-site/content/docs/apple-stage2-presenter.md`
-(rendered in the docs site under "Apple Stage-2 Presenter").
+4. **Stage 2 — built, opt-in (`punktfunk.presenter == "stage2"`, default stage 1).** Explicit
+`VTDecompressionSession` decode (`VideoDecoder`) → a `CAMetalLayer` + display-link present
+(`MetalVideoPresenter`/`Stage2Pipeline`), hosted as a sublayer by the same `StreamView`s with
+input capture + HUD unchanged. It adds a **capture→present** (glass-to-glass, modulo the host
+render→capture term) HUD line, skew-corrected via `PunktfunkConnection.clockOffsetNs`. The
+decode half is unit-tested (`testVideoDecoderAsyncCallbackDeliversPixels`); the Metal present
+is display-bound — **validate live** (flip the Settings "Presenter" picker, watch the HUD
+number and that the image looks right) before making it the default. 10-bit/HDR + a smoothing
+pacer are later. Plan: `docs-site/content/docs/apple-stage2-presenter.md`.
 5. **Audio — wired, both directions.** Playback: `SessionAudio` drains `nextAudio()`
 on its own thread, decodes through CoreAudio's built-in Opus codec (`OpusCodec.swift`
 — kAudioFormatOpus, no bundled libopus; round-trip unit-tested) into a priming