---
title: "Apple Stage-2 Presenter (handoff)"
description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
---

> **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer`
> stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit
> `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`;
> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed
> to design rationale + open items — the shipped `.swift` code is the source of truth for the
> decode/present/measurement walkthrough.

## Why stage 2 (design rationale)

The **stage-1** presenter feeds compressed HEVC straight into `AVSampleBufferDisplayLayer`, which
hardware-decodes **and presents internally with no per-frame callback** — so we can't stamp decode or
present, and we can't hand-pace. **Stage-2** takes explicit control: decode with
`VTDecompressionSession`, present decoded frames through a `CAMetalLayer` driven by a display link.

Two wins justify the extra machinery:

- **~0.5 refresh off the present tail** (about 8 ms at 60 Hz). The present tail is the biggest client
  latency term at 60 Hz; display-link-driven present pops the newest-ready frame each vsync instead of
  letting the layer present on its own internal schedule.
- **True decode→present / glass-to-glass measurement** — explicit decode-completion and present
  timestamps make `capture→present` measurable (modulo the still-unmeasured host render→capture term).

All of this is **macOS/iOS/tvOS-only** — build + validate on a Mac (`swift build && swift test`, then
live against a Linux host). The host + connector side is already done:
`PunktfunkConnection.clockOffsetNs` (the connect-time skew offset, host minus client) is what makes the
present timestamp comparable across machines. `skewCorrected` stays false when `clockOffsetNs == 0`
(old host); in that case the numbers are only meaningful when host and client run on the same machine.
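
As a rough illustration of the offset convention described above, the sketch below maps a client `CLOCK_REALTIME` reading onto the host clock and carries a `skewCorrected` flag alongside it. `HostAlignedTimestamp`, `alignToHost`, and `realtimeNowNs` are hypothetical names for this page, not the shipped API; the shipped `.swift` files remain the source of truth.

```swift
import Foundation

/// Hypothetical value type (not the shipped API): a client timestamp moved onto the host clock.
struct HostAlignedTimestamp {
    let hostNs: Int64        // client reading shifted by the connect-time offset
    let skewCorrected: Bool  // false when no offset was available
}

/// `clockOffsetNs` is host minus client, so adding it to a client reading
/// yields the equivalent host-clock reading. An offset of exactly 0 is treated
/// as "old host, no skew data", mirroring the rule stated in the text.
func alignToHost(clientNs: Int64, clockOffsetNs: Int64) -> HostAlignedTimestamp {
    HostAlignedTimestamp(hostNs: clientNs + clockOffsetNs,
                         skewCorrected: clockOffsetNs != 0)
}

/// Current CLOCK_REALTIME in nanoseconds (assumed here to be how stamps are taken).
func realtimeNowNs() -> Int64 {
    var ts = timespec()
    clock_gettime(CLOCK_REALTIME, &ts)
    return Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
}
```

Keeping the flag next to the shifted value means a consumer cannot accidentally report an uncorrected number as a cross-machine measurement.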

## Architecture pattern (worth recording)

Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → display-link-driven present:

- VT decode is **async**; the output callback runs on a VT-managed thread — don't block it, just stamp
  decode-completion (`CLOCK_REALTIME` ns) + enqueue. Retain the `CVPixelBuffer` until presented (the ring
  owns it).
- Each vsync pops the **newest** ready frame and drops older undisplayed ones — low-latency default, no
  smoothing buffer.
- Three per-frame intervals, all in `CLOCK_REALTIME` ns, with client-side stamps shifted onto the host
  clock by `offset` = `clockOffsetNs` wherever they are compared with `pts_ns`:
  **capture→decoded** = `decodedNs + offset − pts_ns`; **decode→present** = `presentedNs − decodedNs`
  (the tail stage-2 shortens; both stamps are client-side, so the offset cancels);
  **capture→present** = `presentedNs + offset − pts_ns`, the glass-to-glass number.
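
The pattern above can be sketched in a few lines. This is a minimal illustration, not the shipped code: `DecodedFrame`, `NewestReadySlot`, `FrameLatency`, and `latency(...)` are hypothetical names, the payload is generic where the real ring would retain a `CVPixelBuffer`, and an `NSLock` is assumed as the synchronisation between the VT callback thread and the display-link callback.

```swift
import Foundation

/// Hypothetical stand-in for a decoded frame; the real ring would hold a CVPixelBuffer.
struct DecodedFrame<Payload> {
    let payload: Payload
    let ptsNs: Int64      // capture time as stamped by the host
    let decodedNs: Int64  // client CLOCK_REALTIME at decode completion
}

/// 1-slot newest-ready ring: a push overwrites any undisplayed frame,
/// a pop takes the frame and leaves the slot empty.
final class NewestReadySlot<Payload> {
    private let lock = NSLock()
    private var slot: DecodedFrame<Payload>?
    private var dropped = 0

    /// Intended for the decoder output callback: only a short critical section.
    func push(_ frame: DecodedFrame<Payload>) {
        lock.lock(); defer { lock.unlock() }
        if slot != nil { dropped += 1 }  // older undisplayed frame is discarded
        slot = frame
    }

    /// Intended to be called once per vsync from the display-link callback.
    func popNewest() -> DecodedFrame<Payload>? {
        lock.lock(); defer { lock.unlock() }
        let frame = slot
        slot = nil
        return frame
    }

    /// Number of frames overwritten before they were presented.
    var droppedCount: Int {
        lock.lock(); defer { lock.unlock() }
        return dropped
    }
}

/// The three per-frame intervals from the list above, in nanoseconds.
struct FrameLatency {
    let captureToDecodedNs: Int64
    let decodeToPresentNs: Int64
    let captureToPresentNs: Int64
}

func latency(ptsNs: Int64, decodedNs: Int64, presentedNs: Int64,
             clockOffsetNs: Int64) -> FrameLatency {
    FrameLatency(
        captureToDecodedNs: decodedNs + clockOffsetNs - ptsNs,
        decodeToPresentNs: presentedNs - decodedNs,  // same clock, offset cancels
        captureToPresentNs: presentedNs + clockOffsetNs - ptsNs)
}
```

By construction `captureToPresentNs` equals `captureToDecodedNs + decodeToPresentNs`, which makes a cheap sanity check on recorded samples.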

## Open items

- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit
  `…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap).
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture
  term.
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can come
  later if frames look uneven.
- **iOS / iPadOS / tvOS stage-2 variants.**