---
title: "Apple Stage-2 Presenter (handoff)"
description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
---

> **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer`
> stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit
> `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`;
> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed
> to design rationale + open items — the shipped `.swift` code is the source of truth for the
> decode/present/measurement walkthrough.

## Why stage 2 (design rationale)

The **stage-1** presenter feeds compressed HEVC straight into `AVSampleBufferDisplayLayer`, which
hardware-decodes **and presents internally with no per-frame callback** — so we can't stamp decode or
present, and we can't hand-pace. **Stage-2** takes explicit control: decode with
`VTDecompressionSession`, present decoded frames through a `CAMetalLayer` driven by a display link.
Two wins justify the extra machinery:

- **~0.5 refresh off the present tail** — the present tail is the biggest client latency term at
  60 Hz; display-link-driven present pops the newest-ready frame each vsync instead of letting the
  layer present on its own internal schedule.
- **True decode→present / glass-to-glass measurement** — explicit decode-completion and present
  timestamps make `capture→present` measurable (modulo the still-unmeasured host render→capture term).

All of this is **macOS/iOS/tvOS-only** — build + validate on a Mac (`swift build && swift test`, then
live against a Linux host). The host + connector side is already done:
`PunktfunkConnection.clockOffsetNs` (the connect-time skew offset, host minus client) is what makes
the present timestamp cross-machine valid.
`skewCorrected` stays false when `clockOffsetNs == 0` (old host) — then the numbers are same-host-only.

## Architecture pattern (worth recording)

Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → display-link-driven present:

- VT decode is **async**; the output callback runs on a VT-managed thread — don't block it, just
  stamp decode-completion (`CLOCK_REALTIME` ns) + enqueue. Retain the `CVPixelBuffer` until presented
  (the ring owns it).
- Each vsync pops the **newest** ready frame and drops older undisplayed ones — low-latency default,
  no smoothing buffer.
- Three per-frame instants (all `CLOCK_REALTIME` ns, all shifted by `clockOffsetNs` to the host clock):
  **capture→decoded** = `decodedNs + offset − pts_ns`;
  **decode→present** = `presentedNs − decodedNs` (the tail stage-2 shortens);
  **capture→present** = `presentedNs + offset − pts_ns` — the glass-to-glass number.

## Open items

- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit
  `…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap).
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host
  render→capture term.
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can
  come later if frames look uneven.
- **iOS / iPadOS / tvOS stage-2 variants.**
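
For reference, the architecture pattern above (1-slot newest-ready ring plus the three per-frame latency terms) can be sketched in a few lines of Swift. This is an illustration under stated assumptions, not the shipped code: `NewestReadySlot`, `FrameStamp`, `Latency`, and `latency(...)` are hypothetical names, the slot carries a plain timestamp struct rather than a retained `CVPixelBuffer`, and deriving `skewCorrected` from `clockOffsetNs != 0` is inferred from the note above rather than taken from `LatencyMeter.swift`.

```swift
import Foundation

/// Hypothetical per-frame record. `ptsNs` is on the host clock,
/// `decodedNs` is on the client clock (both in nanoseconds).
struct FrameStamp {
    let ptsNs: Int64
    let decodedNs: Int64
}

/// Hypothetical 1-slot "newest-ready" holder: the decode callback overwrites
/// the slot, the vsync tick empties it. Older undisplayed frames are dropped.
final class NewestReadySlot<Frame> {
    private let lock = NSLock()
    private var slot: Frame?
    private var dropped = 0

    /// Called from the decode callback; only swaps a value under a short lock.
    func publish(_ frame: Frame) {
        lock.lock(); defer { lock.unlock() }
        if slot != nil { dropped += 1 }  // previous frame was never shown
        slot = frame
    }

    /// Called once per vsync; returns nil when no new frame was decoded.
    func take() -> Frame? {
        lock.lock(); defer { lock.unlock() }
        let frame = slot
        slot = nil
        return frame
    }

    /// Number of frames overwritten before they could be presented.
    var droppedCount: Int {
        lock.lock(); defer { lock.unlock() }
        return dropped
    }
}

/// The three latency terms from the list above, in nanoseconds.
struct Latency {
    let captureToDecodedNs: Int64
    let decodeToPresentNs: Int64
    let captureToPresentNs: Int64
    let skewCorrected: Bool
}

/// `clockOffsetNs` is host minus client, so adding it moves a client
/// timestamp onto the host clock before subtracting the host-side PTS.
func latency(ptsNs: Int64, decodedNs: Int64, presentedNs: Int64,
             clockOffsetNs: Int64) -> Latency {
    Latency(
        captureToDecodedNs: decodedNs + clockOffsetNs - ptsNs,
        decodeToPresentNs: presentedNs - decodedNs,  // same clock, no offset needed
        captureToPresentNs: presentedNs + clockOffsetNs - ptsNs,
        skewCorrected: clockOffsetNs != 0            // assumption: 0 means an old host
    )
}

// Usage: two frames decode between vsyncs; only the newer one is presented.
let slot = NewestReadySlot<FrameStamp>()
slot.publish(FrameStamp(ptsNs: 1_000_000, decodedNs: 4_000_000))
slot.publish(FrameStamp(ptsNs: 2_000_000, decodedNs: 5_000_000))
if let newest = slot.take() {
    let sample = latency(ptsNs: newest.ptsNs, decodedNs: newest.decodedNs,
                         presentedNs: 9_000_000, clockOffsetNs: 500_000)
    print(sample.captureToDecodedNs, sample.decodeToPresentNs, sample.captureToPresentNs)
}
```

Note that decode→present subtracts two client-clock instants, so it stays valid even when `clockOffsetNs` is 0; only the two capture-relative terms depend on the skew offset.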