Opt-in (Settings -> Presenter; `punktfunk.presenter`, default stage-1). Stage-1's
AVSampleBufferDisplayLayer decodes AND presents internally with no per-frame
callback, so neither decode nor present can be stamped or hand-paced. Stage-2
takes explicit control:
- VideoDecoder: VTDecompressionSession, async output callback stamps
decode-completion, session rebuilt on every IDR / format change. Unit-tested
(testVideoDecoderAsyncCallbackDeliversPixels).
- MetalVideoPresenter: CAMetalLayer + CVMetalTextureCache + a runtime-compiled
BT.709 limited-range NV12->RGB shader, present at the next vsync. The
CVMetalTextures + pixel buffer are held until the GPU completes.
- Stage2Pipeline: pump thread -> decoder -> newest-ready 1-slot ring; the hosting
view's display link drains it once per vsync and stamps capture->present
(the display-link target time projected into CLOCK_REALTIME).
- LatencyMeter gains record(ptsNs:atNs:offsetNs:); the HUD shows a capture->present
(glass-to-glass, modulo host render->capture) line, skew-corrected via
clockOffsetNs. Measured live: capture->present ~11 ms p50, vs ~2.2 ms capture->client.
- StreamView / StreamViewIOS host the CAMetalLayer as a sublayer + a CADisplayLink
(NSView.displayLink on macOS) when stage-2; input capture + HUD unchanged. The
session-active gates switch from `pump != nil` to `connection != nil` so capture
engages without a StreamPump.
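For reviewers unfamiliar with the "newest-ready 1-slot ring" named above, here is
a minimal sketch of the idea, not the code in this commit. The type name
`NewestReadySlot` and its methods `publish`, `take` and `droppedCount` are
hypothetical. It assumes one producer (the decoder's output callback) and one
consumer (the display-link callback), and that a frame overwritten before it is
drained is simply dropped.

```swift
import Foundation

/// Hypothetical 1-slot "newest-ready" mailbox (illustrative, not the commit's type).
/// The producer overwrites the slot; the consumer takes at most one frame per vsync.
final class NewestReadySlot<Frame> {
    private let lock = NSLock()
    private var slot: Frame?
    private var dropped = 0

    /// Called by the producer, e.g. from the decoder's async output callback.
    /// A frame still waiting in the slot is replaced and counted as dropped.
    func publish(_ frame: Frame) {
        lock.lock(); defer { lock.unlock() }
        if slot != nil { dropped += 1 }
        slot = frame
    }

    /// Called by the consumer, e.g. once per display-link tick.
    /// Returns the newest frame, or nil if nothing arrived since the last call.
    func take() -> Frame? {
        lock.lock(); defer { lock.unlock() }
        let frame = slot
        slot = nil
        return frame
    }

    /// Number of frames overwritten before they could be presented.
    var droppedCount: Int {
        lock.lock(); defer { lock.unlock() }
        return dropped
    }
}

// Usage: two frames decoded between vsyncs; only the newer one is presented.
let slot = NewestReadySlot<Int>()
slot.publish(1)
slot.publish(2)
let first = slot.take()    // newest frame
let second = slot.take()   // slot already drained
```

A lock keeps the sketch simple. A real presenter might prefer a lock-free
exchange if the display-link callback must never block.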
Validated: builds macOS/iOS/tvOS; the decode half is unit-tested; the Metal
present is live-validated on glass (correct image + the capture->present number).
Colorspace is BT.709 SDR for now; 10-bit/HDR and a pacing policy come later.
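As a reference for what "BT.709 limited-range" conversion means for 8-bit
samples, the following is a CPU-side sketch in Swift. It is not the
runtime-compiled Metal shader from this commit. The names `RGB` and
`bt709LimitedToRGB` are hypothetical, and the sketch assumes 8-bit video-range
input (luma 16..235, chroma 16..240) with output clamped to 0...1.

```swift
import Foundation

/// Normalised RGB result (hypothetical helper type for this sketch).
struct RGB { var r: Double; var g: Double; var b: Double }

/// Converts one 8-bit limited-range BT.709 Y'CbCr sample to R'G'B' in 0...1.
func bt709LimitedToRGB(y: UInt8, cb: UInt8, cr: UInt8) -> RGB {
    // BT.709 luma coefficients.
    let kr = 0.2126, kb = 0.0722
    let kg = 1.0 - kr - kb

    // Undo the limited-range scaling: luma spans 219 codes, chroma 224.
    let yn = (Double(y) - 16.0) / 219.0
    let pb = (Double(cb) - 128.0) / 224.0
    let pr = (Double(cr) - 128.0) / 224.0

    // Invert Y' = kr*R + kg*G + kb*B with Pb = (B - Y') / (2 * (1 - kb))
    // and Pr = (R - Y') / (2 * (1 - kr)).
    let r = yn + 2.0 * (1.0 - kr) * pr
    let b = yn + 2.0 * (1.0 - kb) * pb
    let g = (yn - kr * r - kb * b) / kg

    func clamp(_ v: Double) -> Double { min(max(v, 0.0), 1.0) }
    return RGB(r: clamp(r), g: clamp(g), b: clamp(b))
}

// Usage: reference white and black in video range.
let white = bt709LimitedToRGB(y: 235, cb: 128, cr: 128)
let black = bt709LimitedToRGB(y: 16, cb: 128, cr: 128)
```

For NV12 input the Cb and Cr samples come from the interleaved second plane and
are shared by a 2x2 block of luma samples.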
Plan: docs-site/content/docs/apple-stage2-presenter.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>