Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).
- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
host-latency, gpu-contention (fixed stale status table), game-library,
linux-setup (fixed m0->spike + stale zero-copy claim),
session-aware-host-followups, windows-client-bootstrap,
windows-dualsense-{scoping,game-detection}, windows-virtual-display,
security-review (per-finding status table; #12 still open),
apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
merged, M4 done); windows-secure-desktop.md archived (now a fallback
behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| title | description |
|---|---|
| Apple Stage-2 Presenter (handoff) | Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left. |
Status: SHIPPED behind the opt-in punktfunk.presenter flag (AVSampleBufferDisplayLayer stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit 7b10714). Code: clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift; Settings has a presenter picker (DefaultsKey.presenter, SettingsView.swift). This doc is trimmed to design rationale + open items; the shipped .swift code is the source of truth for the decode/present/measurement walkthrough.
Why stage 2 (design rationale)
The stage-1 presenter feeds compressed HEVC straight into AVSampleBufferDisplayLayer, which
hardware-decodes and presents internally with no per-frame callback — so we can't stamp decode or
present, and we can't hand-pace. Stage-2 takes explicit control: decode with
VTDecompressionSession, present decoded frames through a CAMetalLayer driven by a display link.
Two wins justify the extra machinery:
- ~0.5 refresh off the present tail: the present tail is the biggest client latency term at 60 Hz; display-link-driven present pops the newest-ready frame each vsync instead of letting the layer present on its own internal schedule.
- True decode→present / glass-to-glass measurement: explicit decode-completion and present timestamps make capture→present measurable (modulo the still-unmeasured host render→capture term).
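To put the "~0.5 refresh" figure in wall-clock terms, the following is a small arithmetic sketch. The function names are illustrative and not part of PunktfunkKit; the only input is the display refresh rate.

```swift
import Foundation

/// Duration of one refresh interval in milliseconds.
func refreshIntervalMs(hz: Double) -> Double {
    return 1000.0 / hz
}

/// The "~0.5 refresh" saving expressed in milliseconds for a given refresh rate.
func halfRefreshMs(hz: Double) -> Double {
    return 0.5 * refreshIntervalMs(hz: hz)
}

for hz in [60.0, 120.0] {
    // At 60 Hz: one refresh is about 16.67 ms, so half a refresh is about 8.33 ms.
    print(String(format: "%.0f Hz: refresh %.2f ms, half %.2f ms",
                 hz, refreshIntervalMs(hz: hz), halfRefreshMs(hz: hz)))
}
```

At 60 Hz the saving is therefore on the order of 8 ms, which is why the present tail dominates at that refresh rate and matters less at 120 Hz.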
All of this is macOS/iOS/tvOS-only — build + validate on a Mac (swift build && swift test, then
live against a Linux host). The host + connector side is already done:
PunktfunkConnection.clockOffsetNs (the connect-time skew offset, host minus client) is what makes the
present timestamp cross-machine valid. skewCorrected stays false when clockOffsetNs == 0 (old host)
— then the numbers are same-host-only.
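The offset handling described above can be sketched roughly as follows. This is a minimal illustration, not the shipped code: HostAlignedTimestamp and toHostClock are hypothetical names, and the only behavior taken from this doc is that clockOffsetNs is host minus client and that skewCorrected stays false when the offset is 0.

```swift
import Foundation

/// Hypothetical value type (not the shipped API): a client timestamp
/// re-expressed on the host clock, plus whether the shift was meaningful.
struct HostAlignedTimestamp {
    let ns: Int64
    let skewCorrected: Bool
}

/// clockOffsetNs is (host minus client), so adding it moves a client
/// CLOCK_REALTIME stamp onto the host clock. An offset of 0 is treated as
/// "old host, no skew data", so the result is flagged as not skew-corrected.
func toHostClock(clientNs: Int64, clockOffsetNs: Int64) -> HostAlignedTimestamp {
    return HostAlignedTimestamp(ns: clientNs + clockOffsetNs,
                                skewCorrected: clockOffsetNs != 0)
}
```

With an offset of 2_500_000 ns, a client stamp of 1_000_000_000 maps to 1_002_500_000 on the host clock with skewCorrected true; with an offset of 0 the stamp is returned unchanged and flagged false, so any latency derived from it is only valid when host and client share a clock.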
Architecture pattern (worth recording)
Async VTDecompressionSession callback → 1-slot newest-ready ring → display-link-driven present:
- VT decode is async; the output callback runs on a VT-managed thread. Don't block it: just stamp decode-completion (CLOCK_REALTIME ns) + enqueue. Retain the CVPixelBuffer until presented (the ring owns it).
- Each vsync pops the newest ready frame and drops older undisplayed ones: low-latency default, no smoothing buffer.
- Three per-frame instants (all CLOCK_REALTIME ns, all shifted by clockOffsetNs to the host clock): capture→decoded = decodedNs + offset − pts_ns; decode→present = presentedNs − decodedNs (the tail stage-2 shortens); capture→present = presentedNs + offset − pts_ns, the glass-to-glass number.
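The pattern above can be sketched without any Apple frameworks. The following is a simplified illustration, not the shipped Stage2Pipeline: DecodedFrame, NewestReadySlot, LatencyBreakdown and latency(for:presentedNs:clockOffsetNs:) are hypothetical names, and the real ring would also hold the CVPixelBuffer. Only the three formulas and the newest-ready drop policy come from this doc.

```swift
import Foundation

/// Hypothetical decoded-frame record; a real one would also retain the pixel buffer.
struct DecodedFrame {
    let ptsNs: Int64      // host capture timestamp (pts_ns)
    let decodedNs: Int64  // client CLOCK_REALTIME at decode completion
}

/// One-slot newest-ready ring: a newer frame overwrites an undisplayed older one.
final class NewestReadySlot {
    private let lock = NSLock()
    private var slot: DecodedFrame?
    private var dropped = 0

    /// Called from the decode callback; holds the lock only for the swap.
    func push(_ frame: DecodedFrame) {
        lock.lock()
        if slot != nil { dropped += 1 }  // older frame was never displayed
        slot = frame
        lock.unlock()
    }

    /// Called once per vsync; returns the newest ready frame, or nil if none arrived.
    func pop() -> DecodedFrame? {
        lock.lock()
        defer { lock.unlock() }
        let frame = slot
        slot = nil
        return frame
    }

    /// Number of frames overwritten before they could be presented.
    var droppedCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return dropped
    }
}

struct LatencyBreakdown {
    let captureToDecodedNs: Int64
    let decodeToPresentNs: Int64
    let captureToPresentNs: Int64
}

/// Applies the three formulas from the list above; offset is clockOffsetNs (host minus client).
func latency(for frame: DecodedFrame, presentedNs: Int64, clockOffsetNs: Int64) -> LatencyBreakdown {
    return LatencyBreakdown(
        captureToDecodedNs: frame.decodedNs + clockOffsetNs - frame.ptsNs,
        decodeToPresentNs: presentedNs - frame.decodedNs,
        captureToPresentNs: presentedNs + clockOffsetNs - frame.ptsNs)
}
```

Note that decode→present uses two client-side stamps, so the offset cancels and that term is valid even without skew correction. The first two intervals always sum to capture→present, which makes a convenient sanity check on the stamps.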
Open items
- Make stage 2 the default: after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit …10BiPlanar + EDR CAMetalLayer.wantsExtendedDynamicRangeContent; ties in with the HDR roadmap).
- Glass-to-glass numbers via tools/latency-probe: close the still-unmeasured host render→capture term.
- Smoothing / pacing policy: present newest-ready for lowest latency today; a pacing policy can come later if frames look uneven.
- iOS / iPadOS / tvOS stage-2 variants.