punktfunk/design/apple-stage2-presenter.md
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00


title: Apple Stage-2 Presenter (handoff)
description: Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left.

Status: SHIPPED behind the opt-in punktfunk.presenter flag (AVSampleBufferDisplayLayer stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit 7b10714). Code: clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift; Settings has a presenter picker (DefaultsKey.presenter, SettingsView.swift). This doc is trimmed to design rationale + open items — the shipped .swift code is the source of truth for the decode/present/measurement walkthrough.

Why stage 2 (design rationale)

The stage-1 presenter feeds compressed HEVC straight into AVSampleBufferDisplayLayer, which hardware-decodes and presents internally with no per-frame callback — so we can't stamp decode or present, and we can't hand-pace. Stage-2 takes explicit control: decode with VTDecompressionSession, present decoded frames through a CAMetalLayer driven by a display link.

Two wins justify the extra machinery:

  • ~0.5 refresh off the present tail — the present tail is the biggest client latency term at 60 Hz; display-link-driven present pops the newest-ready frame each vsync instead of letting the layer present on its own internal schedule.
  • True decode→present / glass-to-glass measurement — explicit decode-completion and present timestamps make capture→present measurable (modulo the still-unmeasured host render→capture term).

All of this is macOS/iOS/tvOS-only: build and validate on a Mac (swift build && swift test, then live against a Linux host). The host + connector side is already done: PunktfunkConnection.clockOffsetNs (the connect-time skew offset, host minus client) is what makes the present timestamp valid across machines. skewCorrected stays false when clockOffsetNs == 0 (old host); in that case the numbers are only meaningful when host and client share a clock (same-host runs).
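A minimal sketch of the offset rule described above, not the shipped code. Only clockOffsetNs and skewCorrected come from this doc; realtimeNowNs, toHostClock and isSkewCorrected are hypothetical helper names chosen for illustration.

```swift
import Foundation

/// Wall-clock "now" in nanoseconds, read from CLOCK_REALTIME.
func realtimeNowNs() -> Int64 {
    var ts = timespec()
    clock_gettime(CLOCK_REALTIME, &ts)
    return Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
}

/// Hypothetical helper: shift a client-clock stamp onto the host clock.
/// clockOffsetNs is "host minus client", so adding it converts client -> host.
func toHostClock(_ clientNs: Int64, clockOffsetNs: Int64) -> Int64 {
    clientNs + clockOffsetNs
}

/// Hypothetical helper mirroring the rule in the text: a zero offset
/// (old host) means no skew correction was applied.
func isSkewCorrected(clockOffsetNs: Int64) -> Bool {
    clockOffsetNs != 0
}
```

With a non-zero offset, a client-side present stamp can be compared directly against the host's capture timestamp; with a zero offset the same subtraction still runs, but the result silently includes whatever skew exists between the two machines, which is why the flag is worth carrying alongside the numbers.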

Architecture pattern (worth recording)

Async VTDecompressionSession callback → 1-slot newest-ready ring → display-link-driven present:

  • VT decode is async; the output callback runs on a VT-managed thread — don't block it, just stamp decode-completion (CLOCK_REALTIME ns) + enqueue. Retain the CVPixelBuffer until presented (the ring owns it).
  • Each vsync pops the newest ready frame and drops older undisplayed ones — low-latency default, no smoothing buffer.
  • Three per-frame intervals (all stamps are CLOCK_REALTIME ns, with client stamps shifted by clockOffsetNs onto the host clock): capture→decoded = decodedNs + offset - pts_ns; decode→present = presentedNs - decodedNs (the tail stage-2 shortens; the offset cancels here); capture→present = presentedNs + offset - pts_ns, the glass-to-glass number.
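The pattern above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Stage2Pipeline.swift: ReadyFrame, NewestReadySlot, FrameLatency and frameLatency are hypothetical names, a generic Payload stands in for the retained CVPixelBuffer, and an NSLock is one plausible way to guard the slot between the decoder callback thread and the display-link callback.

```swift
import Foundation

/// One decoded frame plus its stamps. `Payload` stands in for the CVPixelBuffer
/// that the slot keeps alive until it is presented or replaced.
struct ReadyFrame<Payload> {
    let payload: Payload
    let ptsNs: Int64      // host capture time (host clock)
    let decodedNs: Int64  // decode-completion time (client clock)
}

/// 1-slot newest-ready ring: the producer overwrites, the consumer takes.
final class NewestReadySlot<Payload> {
    private let lock = NSLock()
    private var slot: ReadyFrame<Payload>?
    private var dropped = 0

    /// Call from the decoder output callback: stamp, enqueue, return quickly.
    func publish(_ frame: ReadyFrame<Payload>) {
        lock.lock()
        if slot != nil { dropped += 1 }  // an older undisplayed frame is discarded
        slot = frame
        lock.unlock()
    }

    /// Call once per vsync: returns the newest ready frame, or nil if none arrived.
    func takeNewest() -> ReadyFrame<Payload>? {
        lock.lock()
        defer { lock.unlock() }
        let frame = slot
        slot = nil
        return frame
    }

    /// Number of frames overwritten before they could be presented.
    var droppedCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return dropped
    }
}

/// The three per-frame intervals, in nanoseconds.
struct FrameLatency {
    let captureToDecodedNs: Int64
    let decodeToPresentNs: Int64
    let captureToPresentNs: Int64
}

/// clockOffsetNs is "host minus client"; it cancels in the decode→present term.
func frameLatency<P>(_ frame: ReadyFrame<P>, presentedNs: Int64,
                     clockOffsetNs: Int64) -> FrameLatency {
    FrameLatency(
        captureToDecodedNs: frame.decodedNs + clockOffsetNs - frame.ptsNs,
        decodeToPresentNs: presentedNs - frame.decodedNs,
        captureToPresentNs: presentedNs + clockOffsetNs - frame.ptsNs
    )
}
```

A single slot keeps the policy obvious: whatever was decoded most recently is what the next vsync shows, and anything it replaced counts as a drop. A deeper queue would only matter if a smoothing or pacing policy were added later.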

Open items

  • Make stage 2 the default — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit …10BiPlanar + EDR CAMetalLayer.wantsExtendedDynamicRangeContent; ties in with the HDR roadmap).
  • Glass-to-glass numbers via tools/latency-probe — close the still-unmeasured host render→capture term.
  • Smoothing / pacing policy — present newest-ready for lowest latency today; a pacing policy can come later if frames look uneven.
  • iOS / iPadOS / tvOS stage-2 variants.