Sources reorganized (client: Home/Session/Settings/Stores/Support/Trust; kit: Audio/Connection/Gamepad/Input/Support/Video/Views) with the big files split along the same seams. The gamepad mode is couch-complete, and now on macOS too (the living-room Mac case), not just iOS/iPadOS: - GamepadSettingsView: a console-style, fully controller-navigable settings screen (X from the launcher) — up/down moves focus, left/right steps values (clamped, boundary thud), A cycles/toggles, B closes; the focused row shows a one-line description. Backed by GamepadMenuList, the vertical sibling of GamepadCarousel, and SettingsOptions — the option lists hoisted out of SettingsView statics and shared by the touch, tvOS and gamepad settings. - GamepadAddHostView + GamepadKeyboard: register a host end to end with a pad — field rows open an on-screen controller keyboard (dpad grid, A types, X backspaces, B done); the launcher carousel ends in an Add Host tile, so the dead-end "add one with touch first" empty state is gone. - Launcher polish: contextual hint bar with the pad's real button glyphs, controller name + battery chip, one shared console chrome. - GamepadScreenBackground: an animated aurora (TimelineView-driven drifting blobs in the brand's violet family, breathing radii, slow hue shift, legibility scrim; freezes under Reduce Motion). Pure SwiftUI on purpose — a .metal library only bundles reliably in one of the two build systems (SPM vs the xcodeproj's synced folders) these sources compile under. - macOS port: settings/add-host/library present as sized sheets (a macOS sheet takes its content's IDEAL size, and the GeometryReader-driven screens collapsed to nothing), NSScreen-based mode lists, scroll indicators .never (the "always show scroll bars" setting overrides .hidden), tray scrims so scrolled rows dim under the pinned title/hints, extra title clearance, and a PUNKTFUNK_FORCE_GAMEPAD_UI=1 dev hook — launcher/settings/add-host/keyboard/ library render-verified live on a real Mac + LAN hosts. - GamepadMenuInput: X button support, and (re)start now snapshots held buttons so a controller handoff press never fires twice (the B that closed the keyboard no longer also cancels the screen underneath). - Cleanups: one "Connection failed" alert in ContentView instead of one per home screen; HostDiscovery.advertises/unsaved shared by both home screens. - host: can_encode_444 stub for the non-Linux/Windows host build (the macOS synthetic-source loopback used by the Swift tests). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
11 KiB
title, description
| title | description |
|---|---|
| Apple Stage-2 Presenter (handoff) | Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left. |
Status: SHIPPED as the default presenter (stage-1
AVSampleBufferDisplayLayeris the Metal-unavailable / DEBUG fallback). HDR corrected and 4:4:4 added on top of the proven main-thread present path (the hosting view'sCADisplayLinkdrivesrenderper vsync). Code:clients/apple/Sources/PunktfunkKit/Video/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,SessionPresenter,Stage444Probe,LatencyMeter}.swift. This doc is trimmed to design rationale + open items — the shipped.swiftcode is the source of truth for the decode/present/measurement walkthrough.HDR (the "too bright" fix). The presenter renders to a separate CAMetalLayer drawable, so the mastering metadata that was attached to the source
CVPixelBufferwas never composited — and with no reference-white anchor the system rendered the PQ signal far too bright. The fix is to keep the PQ-passthrough shader (BT.2020 limited→full → PQ R′G′B′ as-is) and put the anchor on the layer:colorspace = itur_2100_PQ,wantsExtendedDynamicRangeContent = true(on all platforms — the old#if os(macOS)guard left iOS/tvOS EDR half-engaged), andedrMetadata = CAEDRMetadata.hdr10(displayInfo:contentInfo:opticalOutputScale: 203). 203 nits = BT.2408 HDR reference white anchors diffuse white at EDR 1.0; a larger value renders dimmer. The mastering/CLL blobs (host0xCEdatagram) now refineedrMetadata(drained by the pump,setHdrMetahops the layer write to main) rather than being attached to a never-composited source buffer. Needs on-glass validation on a real EDR panel.Mid-session SDR↔HDR. The control-plane colour (
connection.isHDR, from the Welcome) is fixed per session, but the host can re-init its encoder mid-session (a game entering HDR), so the HEVC VUI — and the decoder'sframe.isHDR— flips. The presenter follows the decoded frame, not the latched session flag:rendercalls the idempotentconfigure(hdr:)every frame, so on a flip it reconfigures the layer (per-mode pixel formatbgra8UnormSDR /rgba16FloatHDR, colorspace, EDR) and selects the matching shader — all synchronously on the main thread (the present path is main-thread, so no cross-thread hop is needed). The last0xCEgrade is cached so an SDR→HDR reconfigure re-applies the real mastering metadata instead of the bare anchor. The pump drains0xCEunconditionally (not gated on the Welcome flag) so a session that starts SDR still gets mastering metadata when it goes HDR. A ≤2-frame transition flash on the rare flip is accepted.Pacing. The hosting view owns a main-runloop
CADisplayLink(a weakDisplayLinkProxybreaks the retain cycle; the shared per-session lifecycle lives inSessionPresenter) that callsrenderTickonce per vsync.renderTickpops the newest ready frame from the 1-slot ring (older undisplayed frames dropped — lowest latency, no smoothing buffer) and, if there is one, draws it via manuallayer.nextDrawable()and presents at the next vsync; a frame that could not be rendered (no drawable yet) is put back into the still-empty ring so the next tick retries it (under the infinite GOP a static scene sends no replacement — losing the frame would freeze stale content). On an idle vsync it does nothing and the compositor holds the last presented drawable (no idle re-render — matters at 5K).drawableSizeis set beforenextDrawable(it doesn't track bounds, defaults to 0) to the layer's pixel size (bounds × contentsScale), so the shader — not the compositor's bilinear — performs the decoded→on-screen scale (bicubic Catmull-Rom luma + siting-corrected bilinear chroma); a native-mode session is exactly 1:1 (the kernel reduces to the identity texel).maximumDrawableCount = 3. On iOS/tvOSSessionPresentersets the link'spreferredFrameRateRangeto the negotiated refresh (+CADisableMinimumFrameDurationOnPhonein Info.plist) — without it ProMotion devices cap the link at 60 Hz and a 120 fps stream presents at half rate; macOS'sNSView.displayLinkalready tracks its display and is left alone. macOSdisplaySyncEnabled = **false**: the display link is the single pacing source, so leaving the layer's own vsync wait on would also blockpresent/nextDrawableon the main thread and serialize it to the display — the cause of the fullscreen judder; disabling it lets present return promptly. Present is stamped at the drawable's actualpresentedTime(addPresentedHandler, converted toCLOCK_REALTIME), falling back to the display link'stargetTimestampprojection when the system reports none (a dropped drawable) — so the HUD numbers reflect glass, and a missed vsync shows up instead of being assumed away. The same stamp feeds decode→present (presentTailMeter→ the HUD's "decode→present" line), closing the third instant promised below.(History: an off-main
CAMetalDisplayLinkvariant and an off-main blocking-render present thread were both tried and reverted — both measured slower on macOS and iPad than this main-thread display-link path, whose real judder fix was simplydisplaySyncEnabled = false, not moving present off-thread. Measured ~11 ms p50 on the main-thread path.)4:4:4. Chroma, bit-depth, and colorimetry are orthogonal: the decode pixel format is a 2×2 of
(chroma, HDR)→420v/x420/444v/x444(all biplanar, so the existing shaders sample a full-size chroma plane unchanged); the shader is keyed only on HDR. The client advertisesVIDEO_CAP_444only whenStage444Probeconfirms hardware 4:4:4 decode (a hardware-requiredVTDecompressionSessionover an embedded 256×256 4:4:4 keyframe — software 4:4:4 is too slow for real-time; validated on M3:444v/x444produced). A bounded pump backstop ends a 4:4:4 session that persistently fails to decode (gated to 4:4:4 sessions, so 4:2:0 loss-recovery is untouched).
Why stage 2 (design rationale)
The stage-1 presenter feeds compressed HEVC straight into AVSampleBufferDisplayLayer, which
hardware-decodes and presents internally with no per-frame callback — so we can't stamp decode or
present, and we can't hand-pace. Stage-2 takes explicit control: decode with
VTDecompressionSession, present decoded frames through a CAMetalLayer driven by a display link.
Two wins justify the extra machinery:
- ~0.5 refresh off the present tail — the present tail is the biggest client latency term at 60 Hz; display-link-driven present pops the newest-ready frame each vsync instead of letting the layer present on its own internal schedule.
- True decode→present / glass-to-glass measurement — explicit decode-completion and present
timestamps make
capture→presentmeasurable (modulo the still-unmeasured host render→capture term).
All of this is macOS/iOS/tvOS-only — build + validate on a Mac (swift build && swift test, then
live against a Linux host). The host + connector side is already done:
PunktfunkConnection.clockOffsetNs (the connect-time skew offset, host minus client) is what makes the
present timestamp cross-machine valid. skewCorrected stays false when clockOffsetNs == 0 (old host)
— then the numbers are same-host-only.
Architecture pattern (worth recording)
Async VTDecompressionSession callback → 1-slot newest-ready ring → display-link-driven present:
- VT decode is async; the output callback runs on a VT-managed thread — don't block it, just stamp
decode-completion (
CLOCK_REALTIMEns) + enqueue. Retain theCVPixelBufferuntil presented (the ring owns it). - Each vsync pops the newest ready frame and drops older undisplayed ones — low-latency default, no smoothing buffer.
- Three per-frame instants (all
CLOCK_REALTIMEns, all shifted byclockOffsetNsto the host clock): capture→decoded =decodedNs + offset − pts_ns; decode→present =presentedNs − decodedNs(the tail stage-2 shortens); capture→present =presentedNs + offset − pts_ns— the glass-to-glass number.
Open items
- On-glass HDR validation — eyeball
edrMetadata+opticalOutputScale: 203on a real EDR panel (XDR display) against stage-1 side-by-side: diffuse white should sit at SDR-white level with only highlights climbing. The reference white is a single named constant (hdrReferenceWhiteNits) for easy tuning. (Needs a Windows HDR host; the Linux host is 8-bit SDR only.) - On-glass 4:4:4 validation — confirm a
PUNKTFUNK_444host (RTX box) streams a 4:4:4 session the client decodes in hardware (HUD shows the resolved chroma); verify the resolution-ceiling backstop by forcing a too-large 4:4:4 mode. - Glass-to-glass numbers via
tools/latency-probe— close the still-unmeasured host render→capture term and confirm the main-thread display-link present p50 holds at ~11 ms (and isn't regressed by the per-frameconfigure/ HDR-anchor work). The HUD's new decode→present line + thepresentedTime-based stamp make the client-side share directly visible now. - On-glass validation of the 2026-07 presenter batch — the shader-side scale (drawable at layer
pixel size; bicubic luma + chroma-siting offset — compare a resized/fullscreen-on-larger-panel
window against stage-1 for sharpness, and check GPU headroom at 5K HDR), the iOS/tvOS
preferredFrameRateRange(a 120 fps stream on a ProMotion iPhone/iPad should now present at ~120 — watch the HUD fps),kVTDecompressionPropertyKey_RealTime, and the zero-copy AnnexB → CMBlockBuffer packing (unit/round-trip tested; confirm live). - Smoothing / pacing policy — present newest-ready for lowest latency today; an optional even-pacing
policy (
present(_:afterMinimumDuration:)) can come later if frames look uneven. - 4:4:4 runtime downgrade-reconnect — today a persistently-undecodable 4:4:4 session ends cleanly
(the live 4:4:4 decode requires hardware, so a resolution-ceiling miss fails the session create
synchronously and the pump backstop ends it — no black-screen loop); auto-reconnecting at 4:2:0
(dropping
VIDEO_CAP_444) is a future refinement. - HLG —
isHDR/isHDRFormatfold HLG (transfer 18) in with PQ, but the presenter is PQ-only (itur_2100_PQ+hdr10EDR), so an HLG stream would be mis-toned. Latent — no host emits HLG (the stack is BT.2020 PQ only). A real HLG path (itur_2100_HLG, no PQ reference-white anchor) is future work; until then HLG should be treated as out of scope. - Full-range — the shaders hardcode limited→full expansion and the decoder requests the
*VideoRangeformats regardless ofconnection.colorFullRange; VideoToolbox range-converts a full-range source to video range on decode, so it stays self-consistent (mild level compression on genuinely full-range content, which no host emits). Pre-existing; wirecolorFullRangeinto the range constants eventually.