feat(apple): Improve presenter
apple / screenshots (push) Has been cancelled
apple / swift (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
ci / web (push) Has been cancelled
ci / rust (push) Has been cancelled
android-screenshots / screenshots (push) Successful in 2m16s
deb / build-publish (push) Successful in 3m26s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
windows-host / package (push) Successful in 6m48s
release / apple (push) Successful in 7m45s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
android / android (push) Successful in 9m35s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m32s
linux-client-screenshots / screenshots (push) Successful in 2m31s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m53s
web-screenshots / screenshots (push) Successful in 2m32s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m37s
docker / deploy-docs (push) Failing after 1m4s
flatpak / build-publish (push) Failing after 3m44s

feat(apple): add cursor capture on iPad
This commit is contained in:
2026-06-30 01:31:48 +02:00
parent e2d4c40167
commit 1c04e77293
18 changed files with 901 additions and 228 deletions
+19 -5
View File
@@ -144,11 +144,25 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc
`test-loopback.sh` (Swift client vs synthetic punktfunk1-hosts on loopback — runs on macOS;
includes the pairing ceremony + `--require-pairing` gate),
`RemoteFirstLightTests` (full pipeline over the LAN). See
[`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter**
(`VTDecompressionSession` + `CAMetalLayer`) is built and live-validated on glass behind the opt-in
`punktfunk.presenter` flag (~11 ms p50 capture→present), to become the default after a few
resolution/HDR checks. Next: make stage 2 the default, glass-to-glass numbers via
`tools/latency-probe`, iOS/iPadOS/tvOS variants.
[`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter is now the DEFAULT**
(stage-1 is the Metal-unavailable / DEBUG fallback): explicit `VTDecompressionSession` decode →
`CAMetalLayer`, presented from the hosting view's **main-runloop `CADisplayLink`** (`renderTick` pops
the newest ready frame per vsync; macOS `displaySyncEnabled = false` is the real fullscreen-judder fix,
~11 ms p50). *(An off-main `CAMetalDisplayLink` and an off-main blocking-render present thread were
both tried and reverted — both measured slower on macOS and iPad.)* **HDR fixed**
(`design/apple-stage2-presenter.md`): the "too bright" bug was a missing reference-white anchor — the
fix keeps the PQ-passthrough shader and adds `CAEDRMetadata.hdr10(…, opticalOutputScale: 203)` +
`wantsExtendedDynamicRangeContent` on the layer (on all platforms; the old `#if os(macOS)` guard left
iOS/tvOS EDR half-engaged), routing the 0xCE mastering metadata to the layer (via `setHdrMeta`) instead
of a never-composited source buffer. **Mid-session SDR↔HDR** is handled: `render` reconciles the layer
per-frame from the decoded `frame.isHDR` (per-mode pixel format `bgra8`/`rgba16Float`), so a game
entering HDR mid-stream just reconfigures (last 0xCE grade cached + re-applied; pump drains 0xCE
unconditionally). **4:4:4 added**: decode format is a 2×2 `(chroma, HDR)` matrix
(`420v/x420/444v/x444`, all biplanar so the shaders are unchanged), advertised (`VIDEO_CAP_444`) only
behind a **hardware-required `VTDecompressionSession` probe** (`Stage444Probe`, validated on M3) with a
Settings opt-out + a bounded pump backstop for an undecodable 4:4:4 session. *HDR brightness + 4:4:4
still need on-glass validation (Windows-HDR / `PUNKTFUNK_444` host).* Next: glass-to-glass numbers via
`tools/latency-probe`.
**Linux stage 1 done, first light 2026-06-12** (`clients/linux`, binary
`punktfunk-client`): GTK4/libadwaita shell linking `punktfunk-core` directly (no C ABI;
`NativeClient` is now `Sync` — mutexed plane receivers), mDNS host list, TOFU + SPAKE2
@@ -129,6 +129,8 @@ final class SessionModel: ObservableObject {
#endif
}()
let hdrCapable = hdrEnabled && displayHDR
// 4:4:4 opt-out (default on); the hardware-decode probe below is the real gate.
let want444 = (UserDefaults.standard.object(forKey: DefaultsKey.enable444) as? Bool) ?? true
Task.detached(priority: .userInitiated) {
// PunktfunkConnection.init blocks on the QUIC handshake keep it off the main
// actor. The persistent identity is presented on every connect so a paired
@@ -138,9 +140,21 @@ final class SessionModel: ObservableObject {
// Advertise 10-bit + HDR10 when enabled: the host upgrades to a BT.2020 PQ Main10 stream
// only for actual HDR content (its own gate); the VideoToolbox/Metal present path is
// HDR-capable (P010 + itur_2100_PQ + EDR). 0 keeps the 8-bit BT.709 SDR stream.
let videoCaps: UInt8 = hdrCapable
var videoCaps: UInt8 = hdrCapable
? (PunktfunkConnection.videoCap10Bit | PunktfunkConnection.videoCapHDR)
: 0
// Advertise full-chroma 4:4:4 only when allowed AND this device can HARDWARE-decode it
// (software 4:4:4 is too slow for real-time). The host content-gates depth, so an
// HDR-advertised session can still receive an 8-bit 4:4:4 stream (SDR content) require
// BOTH depths there. Otherwise a no-op (the host emits 4:4:4 only if it too opted in);
// `chromaFormat` on the connection reflects what was actually resolved.
let canDecode444 =
hdrCapable
? (Stage444Probe.hwDecode444_8bit && Stage444Probe.hwDecode444_10bit)
: Stage444Probe.hwDecode444_8bit
if want444, canDecode444 {
videoCaps |= PunktfunkConnection.videoCap444
}
let result = Result { try PunktfunkConnection(
host: host.address, port: host.port,
width: width, height: height, refreshHz: hz,
@@ -25,6 +25,7 @@ struct SettingsView: View {
@AppStorage(DefaultsKey.bitrateKbps) private var bitrateKbps = 0
@AppStorage(DefaultsKey.presenter) private var presenter = "stage2"
@AppStorage(DefaultsKey.hdrEnabled) private var hdrEnabled = true
@AppStorage(DefaultsKey.enable444) private var enable444 = true
@AppStorage(DefaultsKey.libraryEnabled) private var libraryEnabled = false
@AppStorage(DefaultsKey.fullscreenWhileStreaming) private var fullscreenWhileStreaming = true
@AppStorage(DefaultsKey.micEnabled) private var micEnabled = true
@@ -36,6 +37,7 @@ struct SettingsView: View {
@State private var showControllerTest = false
#endif
#if os(iOS)
@AppStorage(DefaultsKey.pointerCapture) private var pointerCapture = true
// The sidebar selection drives the detail pane on iPad and the pushed sub-page on iPhone.
// Width class decides the initial value: nil on iPhone (show the category list first),
// General on iPad (a two-column layout should never open with an empty detail).
@@ -206,6 +208,7 @@ struct SettingsView: View {
case .general:
Form {
streamModeSection
pointerSection
compositorSection
}
.formStyle(.grouped)
@@ -581,6 +584,30 @@ struct SettingsView: View {
}
}
#if os(iOS)
/// iPad-only pointer-capture toggle: lock the mouse/trackpad for relative movement (games) vs
/// forward an absolute cursor position (desktop). Empty on iPhone (no hardware-pointer lock
/// the mouse path there is always the absolute fallback).
@ViewBuilder private var pointerSection: some View {
if UIDevice.current.userInterfaceIdiom == .pad {
Section {
Toggle("Capture pointer for games", isOn: $pointerCapture)
} header: {
Text("Pointer")
} footer: {
Text("With a mouse or trackpad connected, lock the pointer and send relative "
+ "movement — the expected behavior for games (mouse-look). Turn this off for "
+ "desktop use to keep the pointer free and send its absolute position instead. "
+ "The lock needs the stream full-screen and frontmost; it falls back to the "
+ "absolute pointer automatically (Stage Manager, Slide Over). Finger touch is "
+ "unaffected. Applies from the next session.")
.font(.geist(12, relativeTo: .caption))
.foregroundStyle(.secondary)
}
}
}
#endif
@ViewBuilder private var compositorSection: some View {
Section {
Picker("Compositor", selection: $compositor) {
@@ -644,12 +671,15 @@ struct SettingsView: View {
@ViewBuilder private var hdrSection: some View {
Section {
Toggle("10-bit HDR", isOn: $hdrEnabled)
Toggle("Full chroma (4:4:4)", isOn: $enable444)
} header: {
Text("HDR")
Text("Video quality")
} footer: {
Text("Request a 10-bit BT.2020 PQ (HDR10) stream. It only engages when the host is "
+ "sending HDR content AND this display supports HDR — otherwise the stream stays "
+ "8-bit SDR. Applies from the next session.")
Text("HDR requests a 10-bit BT.2020 PQ (HDR10) stream — it only engages when the host is "
+ "sending HDR content AND this display supports HDR. 4:4:4 requests full chroma "
+ "(sharper text/UI, more bandwidth) — it only engages when this device can "
+ "hardware-decode it AND the host opted in. Otherwise the stream stays 8-bit "
+ "4:2:0 SDR. Applies from the next session.")
.font(.geist(12, relativeTo: .caption))
.foregroundStyle(.secondary)
}
@@ -25,9 +25,19 @@ public enum DefaultsKey {
/// Request a 10-bit BT.2020 PQ (HDR10) stream. On by default; only takes effect when the host
/// has HDR content AND this display supports HDR otherwise the stream stays 8-bit SDR.
public static let hdrEnabled = "punktfunk.hdrEnabled"
/// Request a full-chroma 4:4:4 stream when this device can HARDWARE-decode it (`Stage444Probe`).
/// On by default; only takes effect when the host also opted in to 4:4:4 (otherwise the stream
/// stays 4:2:0). Sharper text/UI at the cost of more bandwidth.
public static let enable444 = "punktfunk.enable444"
public static let hosts = "punktfunk.hosts"
/// Client-side cursor mode: "auto" (shown only in gamescope sessions), "always", "never".
public static let cursorMode = "punktfunk.cursorMode"
/// iPad: capture the mouse/trackpad pointer (pointer lock relative movement) for games,
/// rather than forwarding an absolute cursor position. On by default. Only meaningful on iPad
/// with a hardware mouse/trackpad; the system grants the lock only to a full-screen, frontmost
/// scene and silently falls back to the absolute pointer when it can't (Stage Manager / Slide
/// Over). Read by `StreamViewController.prefersPointerLocked`.
public static let pointerCapture = "punktfunk.pointerCapture"
/// Experimental: show the host's game library (browsed over the management API). Off by default.
public static let libraryEnabled = "punktfunk.libraryEnabled"
/// macOS: take the window fullscreen while streaming and restore it on the host list. On by default.
@@ -370,29 +370,32 @@ public final class GamepadFeedback {
// Hidout traffic (lightbar / player LEDs / triggers) only exists on a PlayStation-pad
// session a DualSense or a DualShock 4 (lightbar only). Block briefly on it there and
// let rumble own the wait elsewhere; on an Xbox session it stays nonblocking.
let hasHidout = connection.resolvedGamepad == .dualSense
|| connection.resolvedGamepad == .dualShock4
let hidTimeout: UInt32 = hasHidout ? 10 : 0
let thread = Thread { [connection, flag, drainDone, weak self] in
while !flag.isStopped {
do {
if let r = try connection.nextRumble(timeoutMs: 10), r.pad == 0 {
// Poll the feedback planes NON-BLOCKING. A blocking poll (timeoutMs > 0) holds
// the connection's shared feedback lock for its whole wait; the video pump drains
// HDR mastering metadata (nextHdrMeta) on the SAME lock every frame, so a blocking
// poll here starved it and throttled HDR to ~1 fps (SDR, which never drains HDR
// meta, was unaffected). Pacing with a short sleep OUTSIDE the lock (below) keeps
// rumble/HID latency low while leaving the lock free between polls.
if let r = try connection.nextRumble(timeoutMs: 0), r.pad == 0 {
self?.rumble.apply(low: r.low, high: r.high)
}
// Drain a BOUNDED burst of hidout events: only the first poll waits,
// and the cap + stop check keep sustained 0xCD traffic (a game writing
// per-frame LED/trigger reports) from starving the rumble poll above
// or blocking stop() past one cycle.
// Drain a BOUNDED burst of hidout events so sustained 0xCD traffic (a game writing
// per-frame LED/trigger reports) can't spin here or block stop() past one cycle.
var burst = 0
while burst < 64, !flag.isStopped,
let ev = try connection.nextHidOutput(
timeoutMs: burst == 0 ? hidTimeout : 0) {
let ev = try connection.nextHidOutput(timeoutMs: 0) {
self?.render(ev)
burst += 1
}
} catch {
break // .closed (or fatal) the session is over
}
// ~8 ms poll cadence (125 Hz), slept OUTSIDE the feedback lock low rumble/HID
// latency without holding the lock the HDR-meta drain needs.
if !flag.isStopped { Thread.sleep(forTimeInterval: 0.008) }
}
drainDone.signal()
}
@@ -107,6 +107,23 @@ public final class InputCapture {
/// macOS (no GCMouse handlers installed; `sendMouseAbs` is never called there). Main-queue.
public var gcMouseForwarding = false
#if os(iOS)
/// Whether any device is attached as a `GCMouse` right now. The Magic Keyboard TRACKPAD does
/// not always register as a GCMouse on iPadOS (only a standalone mouse does) when no GCMouse
/// is present the relative GCMouse path can't carry pointer motion. Main-queue.
public var hasGCMouse: Bool { !mice.isEmpty }
/// Diagnostic: a one-line description of every attached GCMouse (count + GCDevice identity), so
/// PUNKTFUNK_INPUT_DEBUG can reveal whether the trackpad showed up as a mouse at all.
public var attachedMiceSummary: String {
guard !mice.isEmpty else { return "0 mice" }
let parts = mice.map { mouse -> String in
"\(mouse.productCategory)/\(mouse.vendorName ?? "?")"
}
return "\(mice.count) mice: \(parts.joined(separator: ", "))"
}
#endif
/// Fired on (the capture toggle detected here so it works in both states; the
/// event itself is swallowed). Main queue.
public var onToggleCapture: (() -> Void)?
@@ -394,6 +411,12 @@ public final class InputCapture {
!mice.contains(where: { $0 === mouse }) // re-delivered on wake attach once
else { return }
mice.append(mouse)
#if os(iOS)
if inputDebug {
inputLog.debug(
"GCMouse attached: \(mouse.productCategory, privacy: .public)/\(mouse.vendorName ?? "?", privacy: .public) — now \(self.attachedMiceSummary, privacy: .public)")
}
#endif
// macOS drives motion + buttons from NSEvent (StreamLayerView's local monitor
// sendMotion/sendMouseButton) because GCMouse's handlers proved unreliable there;
// installing them too would double-send. iOS keeps GCMouse (raw deltas under
@@ -1,10 +1,12 @@
// Stage-2 presenter, present half: draw a decoded NV12 CVPixelBuffer into a CAMetalLayer
// drawable with a BT.709 YUVRGB shader. The display link (owned by the hosting view) drives
// `render` once per vsync with the target present time, so a present can finally be stamped and
// the present tail hand-paced. See docs apple-stage2-presenter.md.
// Stage-2 presenter, present half: draw a decoded NV12 / P010 / 4:4:4 CVPixelBuffer into a CAMetalLayer
// drawable with a YCbCrRGB shader. The hosting view's CADisplayLink drives `render` once per vsync
// (via Stage2Pipeline.renderTick) with the target present time, so a present can be stamped and the
// present tail hand-paced. See docs apple-stage2-presenter.md.
//
// Main-thread only: created during view setup, `render` called from the view's CADisplayLink
// (which fires on the main runloop). The Metal objects + texture cache are touched only here.
// Main-thread only: created during view setup, `render`/`configure` called from the view's CADisplayLink
// (which fires on the main runloop). The Metal objects + texture cache are touched only here. The one
// exception is `setHdrMeta`, called from the pump thread it hops the layer write to main so every
// CALayer mutation stays on one thread.
#if canImport(Metal) && canImport(QuartzCore)
import CoreGraphics
@@ -15,10 +17,19 @@ import os
private let presenterLog = Logger(subsystem: "io.unom.punktfunk", category: "presenter")
/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and a
/// BT.709 limited-range NV12RGB fragment shader. uv.y is flipped (1 - p.y) so the top-left-
/// origin texture presents upright (NDC y is up), not upside down. (Colorspace is BT.709 SDR
/// for now matches the host; 10-bit/HDR + other matrices are a later tie-in.)
/// HDR reference white (BT.2408 "HDR Reference White"): the absolute luminance, in nits, that the
/// PQ signal's diffuse white sits at. Passed to `CAEDRMetadata.hdr10(opticalOutputScale:)`, it anchors
/// 203-nit diffuse white at EDR 1.0 (the display's SDR-white level) and lets the system tone-map the
/// brighter highlights into the panel's headroom. This is the missing anchor that made the old HDR path
/// render "way too bright" (no `edrMetadata` no reference-white anchoring); a LARGER value renders
/// dimmer. Matches the host's standard PQ reference white.
private let hdrReferenceWhiteNits: Float = 203.0
/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and BT.709 SDR
/// and BT.2020-PQ HDR YCbCrRGB fragment shaders. uv.y is flipped (1 - p.y) so the top-left-origin
/// texture presents upright (NDC y is up). The HDR shader outputs PQ-encoded RGB as-is the
/// CAMetalLayer's `itur_2100_PQ` colour space + `edrMetadata` tell the system compositor the samples
/// are PQ and how to tone-map them (no EOTF here, matching the host's BT.2020 PQ emission).
private let shaderSource = """
#include <metal_stdlib>
using namespace metal;
@@ -66,6 +77,8 @@ float catmullRomLuma(texture2d<float> tex, sampler s, float2 uv) {
return r;
}
// SDR: 8-bit NV12 / 4:4:4 (BT.709, limited/video range) → full-range RGB. Chroma is sampled at the
// same UV as luma, so a full-size 4:4:4 chroma plane needs no shader change vs 4:2:0.
fragment float4 pf_frag(VOut in [[stage_in]],
texture2d<float> lumaTex [[texture(0)]],
texture2d<float> chromaTex [[texture(1)]]) {
@@ -82,18 +95,18 @@ fragment float4 pf_frag(VOut in [[stage_in]],
return float4(saturate(float3(r, g, b)), 1.0);
}
// HDR: 10-bit P010 (BT.2020, limited range), Y'CbCr that is PQ-encoded. We apply the BT.2020
// matrix to get PQ-encoded R'G'B' and output it as-is — the CAMetalLayer's itur_2100_PQ colour
// space + EDR tells the compositor the samples are PQ, so it does the PQ→display mapping. No EOTF
// here (matching the host, which emitted BT.2020 PQ). P010 stores the 10-bit code in the high bits
// of each 16-bit sample, so an .r16Unorm sample reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
// HDR: 10-bit P010 / 4:4:4 (BT.2020, limited range), YCbCr that is PQ-encoded. We apply the BT.2020
// matrix to get PQ-encoded RGB and output it as-is — the CAMetalLayer's itur_2100_PQ colour space
// + edrMetadata tell the compositor the samples are PQ, so it does the PQ→display tone-map. No EOTF
// here. P010/x444 store the 10-bit code in the high bits of each 16-bit sample, so an .r16Unorm sample
// reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
fragment float4 pf_frag_hdr(VOut in [[stage_in]],
texture2d<float> lumaTex [[texture(0)]],
texture2d<float> chromaTex [[texture(1)]]) {
constexpr sampler s(filter::linear, address::clamp_to_edge);
float y = catmullRomLuma(lumaTex, s, in.uv);
float2 c = chromaTex.sample(s, in.uv).rg;
// BT.2020 10-bit limited (video) range → full-range PQ R'G'B'.
// BT.2020 10-bit limited (video) range → full-range PQ RGB.
y = (y - 64.0/1023.0) * (1023.0/876.0);
float u = (c.x - 512.0/1023.0) * (1023.0/896.0);
float v = (c.y - 512.0/1023.0) * (1023.0/896.0);
@@ -110,26 +123,34 @@ public final class MetalVideoPresenter {
private let device: MTLDevice
private let queue: MTLCommandQueue
/// SDR (BT.709 8-bit NV12 bgra8) and HDR (BT.2020 PQ 10-bit P010 rgba16Float) pipelines.
/// Selected per frame by `render`; the layer is reconfigured when the mode flips (HDR toggle).
/// SDR (BT.709 8-bit bgra8) and HDR (BT.2020 PQ 10-bit rgba16Float) pipelines. Selected per
/// frame in `render`; the layer is reconfigured to match when the session flips (HDR toggle).
private let pipelineSDR: MTLRenderPipelineState
private let pipelineHDR: MTLRenderPipelineState
private var textureCache: CVMetalTextureCache?
/// Current layer configuration switched lazily in `configure(hdr:)` when a frame's mode differs.
/// Current layer configuration switched in `configure(hdr:)` when a frame's HDR-ness differs.
/// Main-thread only (read + written from `render`/`configure`, all on the display-link runloop).
private var hdrActive = false
/// Last HDR mastering grade received via `setHdrMeta` (the host's 0xCE). Cached so a mid-session
/// SDRHDR flip's `configureColor` re-applies the real grade instead of clobbering it back to the
/// bare reference-white anchor (an out-of-order race otherwise: `setHdrMeta` and the flip both write
/// `edrMetadata`). Main-thread only.
private var lastHdrMeta: PunktfunkConnection.HdrMeta?
#if DEBUG
/// Last logged "decodeddrawable" signature, so the diagnostic logs only when a size changes
/// (on first frame, a resize, or a host Reconfigure) instead of every frame.
/// Last logged "decodeddrawable" signature, so the diagnostic logs only on a size/HDR change.
private var lastSizeSig = ""
#endif
/// nil if Metal is unavailable (no GPU / a headless CI) the caller falls back to stage-1.
public init?() {
/// nil if Metal is unavailable (no GPU / a headless CI) or a shader fails to compile the caller
/// falls back to stage-1.
public static func make() -> MetalVideoPresenter? {
guard let device = MTLCreateSystemDefaultDevice(),
let queue = device.makeCommandQueue()
else { return nil }
self.device = device
self.queue = queue
let pipelineSDR: MTLRenderPipelineState
let pipelineHDR: MTLRenderPipelineState
do {
let library = try device.makeLibrary(source: shaderSource, options: nil)
let vtx = library.makeFunction(name: "pf_vtx")
@@ -146,99 +167,140 @@ public final class MetalVideoPresenter {
} catch {
return nil
}
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)
guard textureCache != nil else { return nil }
var cache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache)
guard let textureCache = cache else { return nil }
let layer = CAMetalLayer()
layer.device = device
layer.pixelFormat = .bgra8Unorm
layer.framebufferOnly = true
layer.isOpaque = true
// Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let
// the system compositor scale it to the layer's bounds the same `.resizeAspect` path
// stage-1's AVSampleBufferDisplayLayer (videoGravity) uses, so stage-2 matches its sharpness.
// A native-resolution present is then pixel-exact (1:1, no shader scaling), and any display
// scaling uses the system's high-quality scaler rather than the in-shader bicubic.
layer.contentsGravity = .resizeAspect
// Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the
// display-link / MAIN thread) has to block waiting for one to free.
layer.maximumDrawableCount = 3
#if os(macOS)
// The display link already paces exactly one present per vsync. Leaving the layer's
// own vsync wait on means `commandBuffer.present` ALSO blocks for the hardware vsync,
// so `nextDrawable()` stalls the MAIN thread until a drawable frees windowed, the
// WindowServer's looser compositing hides it; FULLSCREEN's tighter, more-direct path
// serializes the main thread to the display and the stall surfaces as bad judder.
// Disabling the layer-level sync lets present return promptly (the display link is the
// pacing source), which is what fixes the fullscreen stutter. macOS-only property.
// The display link already paces exactly one present per vsync. Leaving the layer's own vsync
// wait on means `commandBuffer.present` ALSO blocks for the hardware vsync, so `nextDrawable()`
// stalls the MAIN thread until a drawable frees windowed, the WindowServer's looser
// compositing hides it; FULLSCREEN's tighter path serializes the main thread to the display and
// the stall surfaces as bad judder. Disabling the layer-level sync lets present return promptly
// (the display link is the pacing source) the fix for the fullscreen stutter. macOS-only.
layer.displaySyncEnabled = false
#endif
// Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let the
// system compositor scale it to the layer's bounds the same `.resizeAspect` path stage-1's
// AVSampleBufferDisplayLayer uses. A native-resolution present is then pixel-exact (1:1, no
// shader scaling); a resized window rescales via the system's scaler.
layer.contentsGravity = .resizeAspect
// Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the display-link /
// MAIN thread) has to block waiting for one to free.
layer.maximumDrawableCount = 3
return MetalVideoPresenter(
device: device, queue: queue, pipelineSDR: pipelineSDR, pipelineHDR: pipelineHDR,
textureCache: textureCache, layer: layer)
}
private init(
device: MTLDevice, queue: MTLCommandQueue, pipelineSDR: MTLRenderPipelineState,
pipelineHDR: MTLRenderPipelineState, textureCache: CVMetalTextureCache, layer: CAMetalLayer
) {
self.device = device
self.queue = queue
self.pipelineSDR = pipelineSDR
self.pipelineHDR = pipelineHDR
self.textureCache = textureCache
self.layer = layer
}
/// Reconfigure the layer for SDR or HDR when the stream mode flips (HDR toggle). HDR uses an
/// rgba16Float drawable + a BT.2020 PQ colour space + EDR, so the compositor PQ-maps to the
/// display; SDR uses the plain 8-bit sRGB path. Main-thread only (called from `render`).
private func configure(hdr: Bool) {
/// Configure the layer + active pipeline for an SDR or HDR session. MAIN THREAD ONLY. Called once at
/// session start and again per-frame from `render` (idempotent the guard makes a same-state call a
/// no-op), so a mid-session HDR toggle (the host re-inits its encoder; the decoded `frame.isHDR`
/// flips) reconfigures here automatically. HDR uses an rgba16Float drawable + BT.2020 PQ colour space
/// + EDR with a 203-nit reference-white anchor; SDR uses the plain 8-bit sRGB path.
public func configure(hdr: Bool) {
guard hdr != hdrActive else { return }
hdrActive = hdr
configureColor(hdr: hdr)
}
/// Set the layer's pixel format + colour config for SDR or HDR. MAIN THREAD ONLY. EDR is requested
/// on ALL platforms the property is available on macOS/iOS/tvOS at our deployment floor, and the
/// old `#if os(macOS)` guard left iOS/tvOS EDR half-engaged.
private func configureColor(hdr: Bool) {
if hdr {
layer.pixelFormat = .rgba16Float
layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
#if os(macOS)
layer.wantsExtendedDynamicRangeContent = true
#endif
// Anchor reference white. Re-apply the real grade if one already arrived (0xCE before the
// flip); otherwise the bare 203-nit anchor. Without this anchor the PQ signal is too bright.
layer.edrMetadata = makeEDR(lastHdrMeta)
} else {
// SDR: gamma-encoded BT.709 [0,1] in an 8-bit drawable; a nil colorspace tags it device/sRGB
// (the proven SDR path never showed the "too bright" issue, which was HDR-only).
layer.pixelFormat = .bgra8Unorm
layer.colorspace = nil
#if os(macOS)
layer.wantsExtendedDynamicRangeContent = false
#endif
layer.edrMetadata = nil
}
}
/// Draw one decoded frame to the next drawable and present it. `isHDR` selects the 10-bit
/// BT.2020 PQ path (P010 input) vs the 8-bit BT.709 path (NV12 input). Returns true on success;
/// false when there's no drawable yet, a texture couldn't be made, or Metal errored the
/// caller then doesn't stamp a present for this frame.
private func makeEDR(_ meta: PunktfunkConnection.HdrMeta?) -> CAEDRMetadata {
CAEDRMetadata.hdr10(
displayInfo: meta?.masteringDisplayColorVolume(),
contentInfo: meta?.contentLightLevelInfo(),
opticalOutputScale: hdrReferenceWhiteNits)
}
/// Update the HDR mastering metadata (drained from the host's 0xCE datagram) to refine the system
/// tone-map from the real grade. Called from the PUMP thread, so the layer write is hopped to MAIN
/// (every CALayer mutation stays on one thread). The grade is cached so a later SDRHDR
/// `configureColor` re-applies it; the `edrMetadata` write is gated on `hdrActive` (setting it on an
/// SDR layer is harmless but pointless, and the flip will apply it anyway).
public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) {
DispatchQueue.main.async { [weak self] in
guard let self else { return }
self.lastHdrMeta = meta
if self.hdrActive { self.layer.edrMetadata = self.makeEDR(meta) }
}
}
/// Draw one decoded frame to the next drawable and present it. MAIN THREAD (the display link).
/// `isHDR` selects the 10-bit BT.2020 PQ path vs the 8-bit BT.709 path and is reconciled with the
/// layer config via `configure`. Returns true on success; false when there's no drawable yet, a
/// texture couldn't be made, or Metal errored the caller then doesn't stamp a present.
@discardableResult
public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool {
// Reconcile the layer with the decoded frame's HDR-ness (handles a mid-session SDRHDR flip).
configure(hdr: isHDR)
// P010 stores 10-bit luma/chroma in 16-bit samples R16/RG16; NV12 is 8-bit R8/RG8.
let lumaFmt: MTLPixelFormat = isHDR ? .r16Unorm : .r8Unorm
let chromaFmt: MTLPixelFormat = isHDR ? .rg16Unorm : .rg8Unorm
// P010/x444 store 10-bit luma/chroma in 16-bit samples R16/RG16; NV12/444v is 8-bit R8/RG8.
// Derived from the actual decoded buffer so a 4:4:4 (full chroma plane) frame just works.
let pf = CVPixelBufferGetPixelFormatType(pixelBuffer)
let tenBit =
pf == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|| pf == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
|| pf == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|| pf == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
guard let textureCache,
let luma = makeTexture(pixelBuffer, plane: 0, format: lumaFmt, cache: textureCache),
let chroma = makeTexture(pixelBuffer, plane: 1, format: chromaFmt, cache: textureCache)
let luma = makeTexture(
pixelBuffer, plane: 0, format: tenBit ? .r16Unorm : .r8Unorm, cache: textureCache),
let chroma = makeTexture(
pixelBuffer, plane: 1, format: tenBit ? .rg16Unorm : .rg8Unorm, cache: textureCache)
else { return false }
// Size the drawable to the decoded frame so the fullscreen triangle samples the texture 1:1
// (pixel-exact); the layer's contentsGravity then scales it to the on-screen bounds via the
// system compositor (matching stage-1). Re-set only on a change (first frame / Reconfigure).
// Size the drawable to the decoded frame so the fullscreen triangle samples 1:1 (pixel-exact);
// the layer's contentsGravity then scales it to the on-screen bounds via the system compositor
// (matching stage-1). drawableSize does NOT track bounds (defaults to 0), so set it BEFORE
// nextDrawable; re-set only on a change (first frame / Reconfigure / HDR flip).
let decodedSize = CGSize(
width: CVPixelBufferGetWidth(pixelBuffer), height: CVPixelBufferGetHeight(pixelBuffer))
if layer.drawableSize != decodedSize { layer.drawableSize = decodedSize }
#if DEBUG
logSizeIfChanged(decoded: decodedSize)
#endif
guard let drawable = layer.nextDrawable(),
let commandBuffer = queue.makeCommandBuffer()
else { return false }
#if DEBUG
// Diagnose sharpness: decoded should equal the drawable (the shader is 1:1); the layer's
// bounds may differ (the system scales). Logged only when a size changes.
let decodedW = Int(decodedSize.width)
let decodedH = Int(decodedSize.height)
let sig = "\(decodedW)x\(decodedH)|\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height))"
if sig != lastSizeSig {
lastSizeSig = sig
let msg = "stage2: decoded \(decodedW)x\(decodedH) → drawable "
+ "\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height)) "
+ "(texture \(drawable.texture.width)x\(drawable.texture.height), "
+ "contentsScale \(layer.contentsScale), "
+ "layerBounds \(Int(layer.bounds.width))x\(Int(layer.bounds.height)))"
presenterLog.info("\(msg, privacy: .public)")
}
#endif
let pass = MTLRenderPassDescriptor()
pass.colorAttachments[0].texture = drawable.texture
pass.colorAttachments[0].loadAction = .clear
@@ -247,24 +309,23 @@ public final class MetalVideoPresenter {
guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
return false
}
encoder.setRenderPipelineState(isHDR ? pipelineHDR : pipelineSDR)
encoder.setRenderPipelineState(hdrActive ? pipelineHDR : pipelineSDR)
encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
encoder.endEncoding()
commandBuffer.present(drawable) // present at the next vsync lowest latency
// Hold the CVMetalTextures + the source pixel buffer (its IOSurface) alive until the GPU
// finishes sampling releasing them at scope exit could free the backing mid-read.
// Hold the CVMetalTextures + source pixel buffer (its IOSurface) alive until the GPU finishes
// sampling releasing them at scope exit could free the backing mid-read.
commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) }
commandBuffer.commit()
return true
}
/// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past
/// the draw the MTLTexture is only valid while its CVMetalTexture is retained.
/// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past the
/// draw the MTLTexture is only valid while its CVMetalTexture is retained.
private func makeTexture(
_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat,
cache: CVMetalTextureCache
_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat, cache: CVMetalTextureCache
) -> CVMetalTexture? {
let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
@@ -276,5 +337,16 @@ public final class MetalVideoPresenter {
else { return nil }
return cvTexture
}
#if DEBUG
private func logSizeIfChanged(decoded: CGSize) {
let sig = "\(Int(decoded.width))x\(Int(decoded.height))|hdr\(hdrActive ? 1 : 0)"
if sig != lastSizeSig {
lastSizeSig = sig
let msg = "stage2: decoded \(Int(decoded.width))x\(Int(decoded.height)) hdr=\(hdrActive)"
presenterLog.info("\(msg, privacy: .public)")
}
}
#endif
}
#endif
@@ -0,0 +1,94 @@
// Steers the system's iPad pointer-lock resolution down to a chosen "anchor" view controller.
//
// `UIViewController.prefersPointerLocked` is resolved the same way as the status bar: the system
// walks DOWN from the window's root view controller through `childViewControllerForPointerLock`.
// SwiftUI's hosting / container view controllers do NOT forward that query to their children, so a
// `UIViewControllerRepresentable` controller buried in the SwiftUI tree (our StreamViewController)
// is never consulted its `prefersPointerLocked = true` is silently ignored and a Magic Keyboard
// trackpad / mouse falls through to the absolute-pointer path instead of being captured.
//
// Swizzling the DEFAULT implementation isn't enough: the controllers that break the chain
// (UIHostingController and SwiftUI's internal containers) provide their OWN implementation of the
// property, so a base-class swizzle never runs for them. Instead we walk UP the LIVE `parent`
// chain from the anchor to the window root and, on each real ancestor, force
// `childViewControllerForPointerLock` to return the next controller toward the anchor. Each forced
// value is a genuine direct child (we follow the actual containment chain), so the system's
// downward walk reaches the anchor and reads its `prefersPointerLocked`.
//
// The forcing is per-INSTANCE an associated object gated behind a one-time per-CLASS IMP
// swizzle. Only the specific controllers in the anchor's chain are affected; every other instance
// of those classes keeps its original behavior (associated object nil original IMP). The forced
// values are cleared on disengage so the long-lived SwiftUI parents don't retain a stale controller
// across sessions. Only the PUBLIC `childViewControllerForPointerLock` selector is touched
// (App-Store-safe; no private API).
#if os(iOS)
import ObjectiveC
import UIKit
enum PointerLockChain {
private static var forcedChildKey: UInt8 = 0
/// Classes whose `childViewControllerForPointerLock` we've already IMP-swizzled (keyed by the
/// class object). Main-thread only pointer-lock resolution and capture toggles are all main.
private static var swizzledClasses = Set<ObjectIdentifier>()
/// Ancestors we've stamped with a forced child this engagement, held weakly so a deallocated
/// SwiftUI controller drops out on its own (no dangling). disengage() clears every one even
/// if the live `parent` chain has since broken so a stamped parent can never retain a stale
/// controller subtree across sessions. One anchor is ever engaged at a time.
private static let stampedParents = NSHashTable<UIViewController>.weakObjects()
private static func forcedChild(of vc: UIViewController) -> UIViewController? {
objc_getAssociatedObject(vc, &forcedChildKey) as? UIViewController
}
private static func setForcedChild(_ child: UIViewController?, on vc: UIViewController) {
// RETAIN: while steering, the parent must keep the toward-anchor child alive. It's also
// already a strong child of `vc` via UIKit containment, so this adds no cycle (the reverse
// `.parent` link is weak), and disengage() always clears it so it can't outlive a session.
objc_setAssociatedObject(vc, &forcedChildKey, child, .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
}
/// Ensure `cls`'s `childViewControllerForPointerLock` getter consults the per-instance forced
/// child first, falling back to the class's original implementation. Idempotent per class.
private static func ensureSwizzled(_ cls: AnyClass) {
let id = ObjectIdentifier(cls)
guard !swizzledClasses.contains(id) else { return }
swizzledClasses.insert(id)
let selector = #selector(getter: UIViewController.childViewControllerForPointerLock)
guard let method = class_getInstanceMethod(cls, selector) else { return }
let originalIMP = method_getImplementation(method)
typealias OriginalFn = @convention(c) (AnyObject, Selector) -> UIViewController?
let original = unsafeBitCast(originalIMP, to: OriginalFn.self)
let forwarding: @convention(block) (UIViewController) -> UIViewController? = { vc in
if let forced = forcedChild(of: vc) { return forced }
return original(vc, selector)
}
method_setImplementation(method, imp_implementationWithBlock(forwarding))
}
/// Force every ancestor of `anchor` to forward pointer-lock resolution toward it, then ask the
/// system to re-resolve. No-op when `anchor` isn't in a view-controller hierarchy yet (it
/// re-runs from the anchor's appearance/parent callbacks once it is).
static func engage(_ anchor: UIViewController) {
disengage(anchor) // clear any prior engagement first (reparent / re-anchor)
var child = anchor
while let parent = child.parent {
ensureSwizzled(object_getClass(parent)!)
setForcedChild(child, on: parent)
stampedParents.add(parent)
child = parent
}
anchor.setNeedsUpdateOfPrefersPointerLocked()
}
/// Clear the forced forwarding on every stamped ancestor (so the SwiftUI parents stop retaining
/// the anchor's subtree) and re-resolve to drop the lock.
static func disengage(_ anchor: UIViewController) {
for parent in stampedParents.allObjects {
setForcedChild(nil, on: parent)
}
stampedParents.removeAllObjects()
anchor.setNeedsUpdateOfPrefersPointerLocked()
}
}
#endif
@@ -0,0 +1,36 @@
// Synthetic 4:4:4 HEVC keyframes used only by `Stage444Probe` to probe decode capability.
//
// Each is the first IDR access unit (VPS + SPS + PPS + IDR slice, Annex-B) of a 256×256 HEVC
// Range-Extensions clip `chroma_format_idc = 3` generated offline with libx265:
// ffmpeg -f lavfi -i color=c=gray:s=256x256:r=30:d=0.1 -frames:v 3 \
// -pix_fmt yuv444p[10le] -c:v libx265 \
// -x265-params keyint=1:min-keyint=1:no-info=1:repeat-headers=1:aud=0 -f hevc out.hevc
// 256×256 clears the hardware decoder's minimum-dimension floor (a 16×16 clip is rejected for every
// chroma format). Validated to hardware-decode to `444v`/`x444` on Apple Silicon (M3).
enum Probe444Blobs {
/// 256×256 HEVC Range-Extensions 4:4:4 keyframe (Annex-B): 134 bytes.
static let au444_8bit: [UInt8] = [
0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00,
0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42,
0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c,
0x90, 0x01, 0x01, 0x00, 0x80, 0xb2, 0xdd, 0x49, 0x26, 0x57, 0x80, 0xb4, 0x04, 0x00, 0x00, 0x03,
0x00, 0x04, 0x00, 0x00, 0x03, 0x00, 0x78, 0x20, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72,
0x86, 0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb,
0xae, 0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6,
0x65, 0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87,
0x00, 0x00, 0x03, 0x00, 0x5b, 0x40,
]
/// 256×256 HEVC Range-Extensions 4:4:4 10-bit keyframe (Annex-B): 133 bytes.
static let au444_10bit: [UInt8] = [
0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00,
0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42,
0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c,
0x90, 0x01, 0x01, 0x00, 0x80, 0x9b, 0x2d, 0xd4, 0x92, 0x65, 0x78, 0x0b, 0x40, 0x40, 0x00, 0x00,
0x03, 0x00, 0x40, 0x00, 0x00, 0x07, 0x82, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72, 0x86,
0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb, 0xae,
0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6, 0x65,
0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87, 0x00,
0x00, 0x03, 0x00, 0x5b, 0x40,
]
}
@@ -238,6 +238,13 @@ public final class PunktfunkConnection {
public private(set) var colorFullRange: Bool = false
/// Encoded bit depth (8 or 10).
public private(set) var bitDepth: UInt8 = 8
/// The chroma subsampling the host resolved for this session, as the HEVC `chroma_format_idc`:
/// `1` = 4:2:0 (every pre-4:4:4 host, and the back-compat default) or `3` = full-chroma 4:4:4
/// (only when this client advertised `videoCap444` *and* the host could open a real 4:4:4
/// encoder). Drive the decoder's requested pixel format from this. See `isChroma444`.
public private(set) var chromaFormat: UInt8 = 1
/// Convenience: the resolved stream is full-chroma 4:4:4 (`chroma_format_idc == 3`).
public var isChroma444: Bool { chromaFormat == 3 }
/// True when the negotiated stream is HDR (PQ or HLG transfer) drive an HDR present path and
/// drain `nextHdrMeta`.
public var isHDR: Bool { colorTransfer == 16 || colorTransfer == 18 }
@@ -334,6 +341,9 @@ public final class PunktfunkConnection {
colorMatrix = mtx
colorFullRange = fullRange != 0
bitDepth = depth
var cf: UInt8 = 1
_ = punktfunk_connection_chroma_format(handle, &cf)
chromaFormat = cf
var ac: UInt8 = 2
_ = punktfunk_connection_audio_channels(handle, &ac)
resolvedAudioChannels = ac
@@ -605,6 +615,10 @@ public final class PunktfunkConnection {
public static let videoCap10Bit: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_10BIT)
/// Video-capability bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
public static let videoCapHDR: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_HDR)
/// Video-capability bit: the client can decode a full-chroma 4:4:4 HEVC stream (Range
/// Extensions). Advertise only when the device can *hardware*-decode it (`Stage444Probe`);
/// the host then emits 4:4:4 only if it too opted in. `chromaFormat` reflects the real value.
public static let videoCap444: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_444)
/// Static HDR mastering metadata (SMPTE ST.2086 + content light level) the host sent for an HDR
/// session. Mirrors the wire/ABI `PunktfunkHdrMeta`; primaries are in ST.2086 **G, B, R** order,
@@ -1,21 +1,21 @@
// Stage-2 presenter orchestrator: a pump thread pulls AUs VideoDecoder; the decoder's async
// output drops the newest decoded frame into a 1-slot ring; the hosting view's display link
// calls `renderTick` once per vsync to draw + present the newest ready frame and stamp
// capturepresent. Mirrors StreamPump's lifecycle (one per start; cancel is permanent).
// Stage-2 presenter orchestrator: a pump thread pulls AUs VideoDecoder; the decoder's async output
// drops the newest decoded frame into a 1-slot ring; the hosting view's display link calls `renderTick`
// once per vsync to draw + present the newest ready frame and stamp capturepresent. Mirrors
// StreamPump's lifecycle (one per start; cancel is permanent).
//
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick`
// + `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there).
// Only the ring + decoder cross threads and both are internally locked.
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` +
// `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). Only the ring (lock-guarded)
// and the decoder/presenter (internally locked / main-hopped) cross threads.
#if canImport(Metal) && canImport(QuartzCore)
import AVFoundation
import Foundation
import QuartzCore
/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view
/// directly makes a `view link view` cycle that only `invalidate()` breaks if a teardown
/// is ever missed the view leaks and keeps ticking. This proxy holds the handler weakly, so the
/// view can deallocate and its `deinit` invalidate the link.
/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view directly
/// makes a `view link view` cycle that only `invalidate()` breaks if a teardown is ever missed
/// the view leaks and keeps ticking. This proxy holds the handler weakly, so the view can deallocate
/// and its `deinit` invalidate the link.
public final class DisplayLinkProxy: NSObject {
private let onTick: (CADisplayLink) -> Void
public init(_ onTick: @escaping (CADisplayLink) -> Void) { self.onTick = onTick }
@@ -44,10 +44,10 @@ private final class PumpToken: @unchecked Sendable {
func cancel() { lock.lock(); live = false; lock.unlock() }
}
/// Throttled host keyframe requests for decode recovery. The decoder's async error callback
/// (a VT thread) and the pump thread (a submit failure) both signal a wedge; this coalesces
/// them so the control stream isn't flooded while the decode stays stalled for several frames
/// until the requested IDR lands. Bound to the live connection in `start`, unbound in `stop`.
/// Throttled host keyframe requests for decode recovery. The decoder's async error callback (a VT
/// thread) and the pump thread (a submit failure) both signal a wedge; this coalesces them so the
/// control stream isn't flooded while the decode stays stalled for several frames until the requested
/// IDR lands. Bound to the live connection in `start`, unbound in `stop`.
private final class KeyframeRecovery: @unchecked Sendable {
private let lock = NSLock()
private var connection: PunktfunkConnection?
@@ -60,7 +60,7 @@ private final class KeyframeRecovery: @unchecked Sendable {
func request() {
lock.lock()
let now = DispatchTime.now().uptimeNanoseconds
let due = lastNs == 0 || now &- lastNs > 100_000_000 // 100 ms since the last request (matches Android)
let due = lastNs == 0 || now &- lastNs > 100_000_000 // 100 ms since the last request
if due { lastNs = now }
let conn = due ? connection : nil
lock.unlock()
@@ -76,30 +76,36 @@ public final class Stage2Pipeline {
private let recovery = KeyframeRecovery()
private var token = PumpToken()
private var offsetNs: Int64 = 0
/// Signalled when the pump thread exits, so `stop()` can join it (bounded) before `decoder.reset()`
/// otherwise a pump iteration already past its `token.isLive` check can rebuild a decode session
/// right after the reset (a brief orphan session). `pumpJoinable` is armed by `start`, consumed by
/// the first `stop` (so the idempotent second `stop`/deinit doesn't block on an already-drained
/// semaphore). start/stop are sequential lifecycle calls, so the plain flag is safe.
private let pumpStopped = DispatchSemaphore(value: 0)
private var pumpJoinable = false
/// The Metal layer the hosting view installs + sizes. nil-init fails when Metal is
/// unavailable so the caller can fall back to stage-1.
/// The Metal layer the hosting view installs + sizes.
public var layer: CAMetalLayer { presenter.layer }
/// `presentMeter` records capturepresent (the glass-to-glass term). Returns nil if Metal
/// can't be set up (headless / no GPU) caller falls back to the stage-1 presenter.
/// `presentMeter` records capturepresent (the glass-to-glass term). Returns nil if Metal can't be
/// set up (headless / no GPU) caller falls back to the stage-1 presenter.
public init?(presentMeter: LatencyMeter) {
guard let presenter = MetalVideoPresenter() else { return nil }
guard let presenter = MetalVideoPresenter.make() else { return nil }
self.presenter = presenter
self.presentMeter = presentMeter
let ring = ring
let recovery = recovery
self.decoder = VideoDecoder(
onDecoded: { ring.submit($0) },
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump
// resets to re-gate on the next IDR, and we ask the host to send one now (infinite
// GOP it wouldn't otherwise come soon). Throttled in KeyframeRecovery.
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump resets to
// re-gate on the next IDR, and we ask the host to send one now (infinite GOP it wouldn't
// otherwise come soon). Throttled in KeyframeRecovery.
onDecodeError: { _ in recovery.request() })
}
/// Start pulling AUs into the decoder. `onFrame` fires per AU at receipt (captureclient
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client)
/// makes the present stamp cross-machine valid.
/// Start pulling AUs into the decoder. MAIN THREAD. `onFrame` fires per AU at receipt (captureclient
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) makes the
/// present stamp cross-machine valid.
public func start(
connection: PunktfunkConnection,
onFrame: (@Sendable (AccessUnit) -> Void)?,
@@ -108,34 +114,48 @@ public final class Stage2Pipeline {
offsetNs = connection.clockOffsetNs
recovery.bind(connection) // arm host-keyframe recovery for this session
token = PumpToken() // fresh token per start cancel is permanent (like StreamPump)
// Configure the decoder's chroma + the layer's initial colorimetry before the first frame. The
// chroma subsampling drives only the decode pixel format (orthogonal to HDR/depth); the HDR
// config is the Welcome's latched value, which a mid-session flip then overrides per-frame.
decoder.setChroma444(connection.isChroma444)
presenter.configure(hdr: connection.isHDR)
let token = token
let decoder = decoder
let recovery = recovery
let presenter = presenter
let pumpStopped = pumpStopped
let thread = Thread {
defer { pumpStopped.signal() } // let stop() join the pump (bounded) before decoder.reset()
var format: CMVideoFormatDescription?
var lastFramesDropped = connection.framesDropped()
// Persistent recovery WANT, not a one-shot edge (see StreamPump for the full rationale):
// the old code advanced lastFramesDropped on the same edge it called recovery.request(),
// so a request swallowed by the throttle (the lost recovery IDR being pruned within the
// window) was never re-sent and the picture stayed frozen. Keep asking until an IDR lands.
// keep asking until an IDR lands so a request swallowed by the throttle is re-sent.
var awaitingIDR = false
// 4:4:4 backstop: a run of decode/create failures in a 4:4:4 session means this device can't
// decode 4:4:4 at the negotiated resolution (the HW probe clears the common case but not a
// resolution-ceiling miss). End cleanly instead of looping on a black screen.
var decodeFailRun = 0
while token.isLive {
do {
// Loss recovery (the primary path). The reassembler drops unrecoverable AUs
// (framesDropped) and the decoder conceals the reference-missing deltas that
// follow often WITHOUT an error callback so key off the drop count climbing,
// then keep asking (awaitingIDR) until a fresh IDR re-anchors decode. Polled every
// iteration so a total-loss drought recovers the moment packets resume.
// Loss recovery (the primary path). The reassembler drops unrecoverable AUs and the
// decoder conceals the reference-missing deltas often WITHOUT an error callback
// so key off the drop count climbing, then keep asking (awaitingIDR) until a fresh
// IDR re-anchors decode.
let dropped = connection.framesDropped()
if dropped > lastFramesDropped {
lastFramesDropped = dropped
awaitingIDR = true
}
if awaitingIDR { recovery.request() }
// Drain any HDR mastering-metadata update (0xCE) and hand it to the decoder, which
// attaches it to subsequent HDR frames. Non-blocking; only HDR sessions emit these.
if connection.isHDR, let meta = try? connection.nextHdrMeta(timeoutMs: 0) {
decoder.setHdrMeta(meta)
// Drain HDR mastering metadata (0xCE) and hand it to the PRESENTER ( CAEDRMetadata).
// Polled UNCONDITIONALLY (not gated on connection.isHDR, the fixed Welcome flag): the
// host sends 0xCE only for HDR, INCLUDING a mid-session SDRHDR transition (a game
// entering HDR the host re-inits its encoder) the Welcome flag would never reflect.
// Non-blocking; nil for an SDR stream.
if let meta = try? connection.nextHdrMeta(timeoutMs: 0) {
presenter.setHdrMeta(meta)
}
guard let au = try connection.nextAU(timeoutMs: 100) else { continue }
onFrame?(au)
@@ -144,12 +164,20 @@ public final class Stage2Pipeline {
awaitingIDR = false // a fresh IDR re-anchored decode recovery complete
}
guard let f = format, token.isLive else { continue }
if !decoder.decode(au: au, format: f) {
// Submit/decoder error: drop the session and re-gate on the next IDR's
// in-band parameter sets (a delta frame can't recover) stage-1's policy
// and keep asking for that IDR (infinite GOP) until one re-anchors decode.
if decoder.decode(au: au, format: f) {
decodeFailRun = 0
} else {
// Submit/decoder error: drop the session and re-gate on the next IDR's in-band
// parameter sets (a delta frame can't recover) and keep asking for that IDR.
decoder.reset()
awaitingIDR = true
decodeFailRun += 1
// ~3 s of solid failure in a 4:4:4 session (and only there a 4:2:0 loss
// recovers within a GOP) 4:4:4 isn't decodable here; end the session.
if connection.isChroma444, decodeFailRun >= 180 {
if token.isLive { onSessionEnd?() }
break
}
}
} catch {
if token.isLive { onSessionEnd?() }
@@ -159,22 +187,30 @@ public final class Stage2Pipeline {
}
thread.name = "punktfunk-stage2-pump"
thread.qualityOfService = .userInteractive
pumpJoinable = true
thread.start()
}
/// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp
/// capturepresent at `targetPresentNs` the display link's target present instant, already
/// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
/// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp capturepresent at
/// `targetPresentNs` the display link's target present instant, already converted to
/// `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
public func renderTick(targetPresentNs: Int64) {
guard let frame = ring.take() else { return }
guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return }
presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
}
/// Stop the pump ( one poll timeout) and drop the decode session. Does not close the
/// connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
/// Stop the pump ( one poll timeout) and drop the decode session. MAIN THREAD; idempotent. Does not
/// close the connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
public func stop() {
token.cancel()
// Join the pump (bounded: one nextAU poll + an in-flight decode) before resetting the decoder,
// so the pump can't rebuild a session right after the reset. Only the first stop joins; a
// repeat/deinit stop skips the already-drained semaphore.
if pumpJoinable {
pumpJoinable = false
_ = pumpStopped.wait(timeout: .now() + 0.5)
}
decoder.reset()
recovery.bind(nil) // stop requesting keyframes once the session is torn down
}
@@ -182,8 +218,8 @@ public final class Stage2Pipeline {
deinit { token.cancel() }
/// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME`
/// nanosecond instant the present clock the AU pts + skew offset live in. Projects to the
/// target present time (when the frame is actually on glass), not the moment we drew.
/// nanosecond instant the present clock the AU pts + skew offset live in. Projects to the target
/// present time (when the frame is actually on glass), not the moment we drew.
public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 {
let caNow = CACurrentMediaTime()
var ts = timespec()
@@ -0,0 +1,83 @@
// Runtime 4:4:4 HEVC decode-capability probe.
//
// We advertise `VIDEO_CAP_444` (so the host upgrades to a full-chroma 4:4:4 stream) ONLY when this
// device can decode 4:4:4 HEVC *in hardware* software 4:4:4 decode works but is far too slow for a
// real-time stream at the negotiated resolution, so a software-only device must keep 4:2:0.
//
// `VTIsHardwareDecodeSupported(HEVC)` and the HEVC-decoder-capabilities dictionary report HEVC HW
// decode but expose nothing about `chroma_format_idc`, so the only reliable signal is to actually
// create a *hardware-required* `VTDecompressionSession` for a tiny synthetic 4:4:4 keyframe and
// confirm it both creates and decodes to the expected biplanar 4:4:4 pixel format. Validated on an
// Apple M3 (HW 4:4:4 8- and 10-bit decode to `444v`/`x444`); a software-only decoder fails the
// hardware-required create and we fall back to 4:2:0.
//
// The probe blobs are 256×256 (above the hardware decoder's minimum-dimension floor a 16×16 clip
// is rejected for ALL chroma formats, including 4:2:0) HEVC Range-Extensions keyframes generated
// offline with libx265; see scripts notes. Results are cached (device-static) in lazy statics.
import CoreMedia
import CoreVideo
import Foundation
import VideoToolbox
public enum Stage444Probe {
/// True iff this device hardware-decodes 8-bit 4:4:4 HEVC (the host's current 4:4:4 path
/// BT.709 limited `yuv444p`). Cached after first evaluation.
public static let hwDecode444_8bit: Bool = probeHardware444(
au: Probe444Blobs.au444_8bit,
want: kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange,
fullRangeSibling: kCVPixelFormatType_444YpCbCr8BiPlanarFullRange)
/// True iff this device hardware-decodes 10-bit 4:4:4 HEVC (the 4:4:4 HDR/10-bit intersection).
/// Cached after first evaluation.
public static let hwDecode444_10bit: Bool = probeHardware444(
au: Probe444Blobs.au444_10bit,
want: kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange,
fullRangeSibling: kCVPixelFormatType_444YpCbCr10BiPlanarFullRange)
/// Create a hardware-REQUIRED `VTDecompressionSession` for the synthetic 4:4:4 keyframe and
/// decode it, returning true only when the decoder produces the expected (video- or full-range)
/// biplanar 4:4:4 pixel format. Any failure (no hardware path, wrong output format, decode error)
/// false we keep 4:2:0.
private static func probeHardware444(
au auBytes: [UInt8], want: OSType, fullRangeSibling: OSType
) -> Bool {
let data = Data(auBytes)
guard let format = AnnexB.formatDescription(fromIDR: data) else { return false }
// Require a hardware decoder a software false-positive would make us advertise 4:4:4 and
// then decode every real frame on the CPU, blowing the latency budget.
let spec: [CFString: Any] = [
kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true,
]
let attrs: [CFString: Any] = [
kCVPixelBufferPixelFormatTypeKey: want,
kCVPixelBufferMetalCompatibilityKey: true,
]
var session: VTDecompressionSession?
let created = VTDecompressionSessionCreate(
allocator: kCFAllocatorDefault, formatDescription: format,
decoderSpecification: spec as CFDictionary, imageBufferAttributes: attrs as CFDictionary,
outputCallback: nil, decompressionSessionOut: &session)
guard created == noErr, let session else { return false }
defer { VTDecompressionSessionInvalidate(session) }
let au = AccessUnit(data: data, ptsNs: 0, frameIndex: 0, flags: 0)
guard let sample = AnnexB.sampleBuffer(au: au, format: format) else { return false }
var produced: OSType = 0
let done = DispatchSemaphore(value: 0)
let status = VTDecompressionSessionDecodeFrame(
session, sampleBuffer: sample,
flags: [._EnableAsynchronousDecompression], infoFlagsOut: nil
) { status, _, imageBuffer, _, _ in
if status == noErr, let imageBuffer {
produced = CVPixelBufferGetPixelFormatType(imageBuffer)
}
done.signal()
}
guard status == noErr else { return false }
VTDecompressionSessionWaitForAsynchronousFrames(session)
_ = done.wait(timeout: .now() + 1.0)
return produced == want || produced == fullRangeSibling
}
}
@@ -137,8 +137,8 @@ public struct StreamView: NSViewRepresentable {
public final class StreamLayerView: NSView {
private let displayLayer = AVSampleBufferDisplayLayer()
private var pump: StreamPump?
/// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
/// display link instead of the StreamPump displayLayer path. nil = stage-1 (default).
/// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a display link instead of the
/// StreamPump displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle).
var presentMeter: LatencyMeter?
private var stage2: Stage2Pipeline?
private var stage2Link: CADisplayLink?
@@ -638,7 +638,7 @@ public final class StreamLayerView: NSView {
private func teardownStage2() {
stage2Link?.invalidate()
stage2Link = nil
stage2?.stop()
stage2?.stop() // stops the pump (synchronous join) + drops the decode session
stage2 = nil
metalLayer?.removeFromSuperlayer()
metalLayer = nil
@@ -92,8 +92,8 @@ public final class StreamViewController: UIViewController {
public private(set) var connection: PunktfunkConnection?
private var pump: StreamPump?
private var observers: [NSObjectProtocol] = []
/// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
/// CADisplayLink instead of the StreamPump displayLayer path. nil = stage-1 (default).
/// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a CADisplayLink instead of the
/// StreamPump displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle).
var presentMeter: LatencyMeter?
private var stage2: Stage2Pipeline?
private var stage2Link: CADisplayLink?
@@ -155,19 +155,58 @@ public final class StreamViewController: UIViewController {
}
#if os(iOS)
// Pointer lock is only meaningful on iPad (iPhone has no hardware-pointer lock) and
// only when capture is engaged. The system additionally requires full-screen + frontmost
// and may drop it (Slide Over/Stage Manager/backgrounding) verified in setCaptured().
public override var prefersPointerLocked: Bool {
captured && UIDevice.current.userInterfaceIdiom == .pad
/// Whether the user wants the mouse/trackpad pointer CAPTURED (pointer lock relative
/// movement, the gaming default) rather than forwarded as an absolute position (desktop
/// use). Read live from UserDefaults so it tracks the Settings toggle; defaults to on when
/// unset. iPad-only gated again in `prefersPointerLocked`.
private var pointerCaptureEnabled: Bool {
UserDefaults.standard.object(forKey: DefaultsKey.pointerCapture) as? Bool ?? true
}
/// Whether the pointer should be CAPTURED right now: iPad, capture engaged, and the user
/// hasn't opted into the absolute (desktop) pointer. The system additionally requires
/// full-screen + frontmost and may drop the lock (Slide Over/Stage Manager/backgrounding)
/// syncPointerLock() handles the actual grant/drop and falls back to absolute when unlocked.
private var wantsPointerLock: Bool {
captured && pointerCaptureEnabled && UIDevice.current.userInterfaceIdiom == .pad
}
public override var prefersPointerLocked: Bool { wantsPointerLock }
public override var prefersHomeIndicatorAutoHidden: Bool { true }
// If SwiftUI's UIHostingController reparents us, a plain container parent that forwards
// its pointer-lock decision to its children will then reach this VC. (UIHostingController
// itself does not consult children, which is why GCMouse deltas can never arrive there
// the touch path, always forwarded, is the unconditional fallback.)
public override var childViewControllerForPointerLock: UIViewController? { self }
// NOTE: we deliberately do NOT override `childViewControllerForPointerLock`. The default
// returns nil, which tells the system to use THIS controller's own `prefersPointerLocked`
// exactly what we want, since `PointerLockChain` forces our SwiftUI ancestors to forward the
// downward walk to us and we are the terminal anchor. Returning `self` here would make the
// system ask the same controller forever (it keeps delegating to the returned child)
// unbounded recursion stack overflow once the chain actually reaches us.
/// (Re)build or tear down the forced pointer-lock forwarding chain from this controller to the
/// window root so the system actually resolves our `prefersPointerLocked`. Safe to call
/// repeatedly it no-ops until the view is in a window with a parent chain, and re-runs from
/// the appearance/parent callbacks once SwiftUI has placed us.
private func updatePointerLockChain() {
// Engaging needs a live parent chain to the window root; disengaging is always safe and
// must run even after the view has left the window (session teardown) so the stamped
// SwiftUI ancestors are cleared.
if wantsPointerLock, view.window != nil {
PointerLockChain.engage(self)
} else {
PointerLockChain.disengage(self)
}
}
public override func viewDidAppear(_ animated: Bool) {
super.viewDidAppear(animated)
// SwiftUI places us in the hierarchy AFTER start()'s setCaptured(true), and may reparent us
// later re-anchor the chain here so a lock requested before we had a parent still lands.
updatePointerLockChain()
}
public override func didMove(toParent parent: UIViewController?) {
super.didMove(toParent: parent)
updatePointerLockChain() // chain shape changed re-anchor (or no-op if not yet in a window)
}
#endif
func start(
@@ -200,7 +239,14 @@ public final class StreamViewController: UIViewController {
// Indirect pointer (mouse/trackpad with no lock) absolute cursor + buttons, routed
// through InputCapture so the forwarding gate and release-on-blur apply uniformly.
streamView.onPointerMoveAbs = { [weak self] p in
self?.inputCapture?.sendMouseAbs(
guard let self else { return }
if iosInputDebug {
// Whether ANY UIKit pointer movement reaches us while the scene is LOCKED tells us
// if the trackpad (which may not be a GCMouse) can still be captured via UIKit.
iosInputLog.debug(
"UIKit pointer move x=\(p.x, privacy: .public) y=\(p.y, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public) gcFwd=\(self.inputCapture?.gcMouseForwarding == true, privacy: .public)")
}
self.inputCapture?.sendMouseAbs(
x: p.x, y: p.y, surfaceWidth: p.w, surfaceHeight: p.h)
}
streamView.onPointerButton = { [weak self] button, down in
@@ -210,7 +256,12 @@ public final class StreamViewController: UIViewController {
// UNLOCKED regime; while locked, GCMouse's scroll handler owns it mirror the
// sendMouseAbs !gcMouseForwarding gate so the two can't double-send.
streamView.onScroll = { [weak self] dx, dy in
guard let self, self.inputCapture?.gcMouseForwarding == false else { return }
guard let self else { return }
if iosInputDebug {
iosInputLog.debug(
"UIKit scroll dx=\(dx, privacy: .public) dy=\(dy, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public)")
}
guard self.inputCapture?.gcMouseForwarding == false else { return }
self.inputCapture?.sendScroll(dx: dx, dy: dy)
}
@@ -315,7 +366,7 @@ public final class StreamViewController: UIViewController {
) {
let metal = pipeline.layer
// Composites OVER the idle (un-enqueued in stage-2) AVSampleBufferDisplayLayer base.
// (contentsScale + frame + drawableSize are all set by layoutMetalLayer() just below.)
// (contentsScale + frame are set by layoutMetalLayer() just below.)
streamView.layer.addSublayer(metal)
metalLayer = metal
stage2 = pipeline
@@ -372,7 +423,7 @@ public final class StreamViewController: UIViewController {
private func teardownStage2() {
stage2Link?.invalidate()
stage2Link = nil
stage2?.stop()
stage2?.stop() // stops the pump (synchronous join) + drops the decode session
stage2 = nil
metalLayer?.removeFromSuperlayer()
metalLayer = nil
@@ -392,6 +443,7 @@ public final class StreamViewController: UIViewController {
captured = false
}
setNeedsUpdateOfPrefersPointerLocked()
updatePointerLockChain() // (re)anchor the SwiftUI ancestors so the lock actually resolves
syncPointerLock() // resolve cursor + GCMouse/absolute routing for the current state
let onCaptureChange = onCaptureChange
let captured = captured
@@ -420,7 +472,7 @@ public final class StreamViewController: UIViewController {
pointerInteraction?.invalidate() // re-resolve the hidden/visible cursor for the state
if iosInputDebug {
iosInputLog.debug(
"pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public)")
"pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public) useGCMouse=\(useGCMouse, privacy: .public) [\(self.inputCapture?.attachedMiceSummary ?? "n/a", privacy: .public)]")
}
}
#endif
@@ -49,11 +49,10 @@ public final class VideoDecoder: @unchecked Sendable {
/// pump can re-gate on the next IDR.
private let onDecodeError: @Sendable (OSStatus) -> Void
/// Latest source HDR mastering metadata (from `PunktfunkConnection.nextHdrMeta`), attached to
/// each decoded HDR pixel buffer so the compositor tone-maps from the real grade. Guarded by its
/// own lock written by the pump thread, read on the VT decode callback.
private let metaLock = NSLock()
private var hdrMeta: PunktfunkConnection.HdrMeta?
/// Whether the negotiated stream is full-chroma 4:4:4 (`connection.isChroma444`), set once at
/// session start before any decode. Selects the 4:4:4 decode pixel format (orthogonal to bit
/// depth / HDR). Read inside `createSessionLocked` under `lock`.
private var chroma444 = false
public init(
onDecoded: @escaping @Sendable (ReadyFrame) -> Void,
@@ -65,12 +64,13 @@ public final class VideoDecoder: @unchecked Sendable {
deinit { teardown() }
/// Set the source HDR mastering metadata (drained from `PunktfunkConnection.nextHdrMeta`). It's
/// attached to subsequent decoded HDR pixel buffers. Thread-safe; cheap to call on each update.
public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) {
metaLock.lock()
hdrMeta = meta
metaLock.unlock()
/// Select the chroma subsampling of the decode output (4:2:0 vs full-chroma 4:4:4). Call once at
/// session start, before decoding, from `connection.isChroma444`. Takes effect on the next
/// session (re)build. Thread-safe.
public func setChroma444(_ on: Bool) {
lock.lock()
chroma444 = on
lock.unlock()
}
/// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The
@@ -135,8 +135,10 @@ public final class VideoDecoder: @unchecked Sendable {
/// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function i.e. the host
/// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the
/// HEVC VUI, so this tracks the *stream*, switching dynamically when the user toggles HDR
/// (the host re-emits parameter sets with the new VUI a new format desc session rebuild).
/// HEVC VUI, so this picks the decode bit depth (10-bit P010/x444 vs 8-bit NV12/444v) from the
/// stream. The present-side HDR config (colorspace/EDR/shader) is latched once per session from
/// the Welcome (`connection.isHDR`), which the host does NOT flip mid-session so this predicate
/// and that config agree for the session (a `#if DEBUG` assert in the presenter guards it).
static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool {
guard
let tf = CMFormatDescriptionGetExtension(
@@ -157,11 +159,18 @@ public final class VideoDecoder: @unchecked Sendable {
session = nil
format = nil
// Decode pixel format is a 2×2 of (chroma, depth/HDR), both biplanar so the presenter binds
// plane 0 = luma, plane 1 = interleaved chroma uniformly 4:4:4 just delivers a full-size
// chroma plane. 10-bit (P010 / `x444`) for HDR (PQ/HLG), 8-bit (NV12 / `444v`) otherwise.
let hdr = Self.isHDRFormat(newFormat)
let pixelFormat =
hdr
? kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010 (10-bit)
: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12 (8-bit)
let pixelFormat: OSType = {
switch (chroma444, hdr) {
case (false, false): return kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12
case (false, true): return kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010
case (true, false): return kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange // 444v
case (true, true): return kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange // x444
}
}()
let imageAttrs: [CFString: Any] = [
kCVPixelBufferMetalCompatibilityKey: true,
kCVPixelBufferPixelFormatTypeKey: pixelFormat,
@@ -169,11 +178,20 @@ public final class VideoDecoder: @unchecked Sendable {
var callback = VTDecompressionOutputCallbackRecord(
decompressionOutputCallback: decoderOutputCallback,
decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
// 4:4:4 sessions REQUIRE a hardware decoder: we only advertise 4:4:4 when the hardware probe
// passed, so a hardware-incapable mode (e.g. a resolution past the HW 4:4:4 ceiling) must fail
// HERE, synchronously, letting the pump's backstop end the session rather than silently
// falling back to a software 4:4:4 decoder far too slow for a real-time stream. 4:2:0 keeps the
// software fallback (nil spec) as a robustness net.
let spec: CFDictionary? =
chroma444
? [kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true] as CFDictionary
: nil
var newSession: VTDecompressionSession?
let status = VTDecompressionSessionCreate(
allocator: kCFAllocatorDefault,
formatDescription: newFormat,
decoderSpecification: nil, // hardware by default
decoderSpecification: spec,
imageBufferAttributes: imageAttrs as CFDictionary,
outputCallback: &callback,
decompressionSessionOut: &newSession)
@@ -195,26 +213,17 @@ public final class VideoDecoder: @unchecked Sendable {
// pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
let ptsNs = p.value > 0 ? UInt64(p.value) : 0
// HDR iff the decoder produced a 10-bit P010 buffer (we only request P010 for PQ streams).
// HDR iff the decoder produced a 10-bit buffer (we only request a 10-bit format for PQ/HLG
// streams). Covers 4:2:0 (P010) and 4:4:4 (`x444`), video- and full-range, so a 10-bit 4:4:4
// HDR frame isn't misclassified as SDR. (The mastering metadata is applied to the presenter's
// CAMetalLayer via CAEDRMetadata, not to this source buffer a separate-drawable presenter
// never composites the source buffer's attachments, so attaching them here would be dead.)
let fmt = CVPixelBufferGetPixelFormatType(imageBuffer)
let isHDR =
CVPixelBufferGetPixelFormatType(imageBuffer)
== kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
// Attach the source's mastering display + content light level (ST.2086 / CEA-861.3) so the
// compositor tone-maps from the real grade rather than inferring from the PQ colourspace
// alone. The SEI byte payloads map 1:1 to these CVImageBuffer attachment keys.
if isHDR {
metaLock.lock()
let meta = hdrMeta
metaLock.unlock()
if let meta {
CVBufferSetAttachment(
imageBuffer, kCVImageBufferMasteringDisplayColorVolumeKey,
meta.masteringDisplayColorVolume() as CFData, .shouldPropagate)
CVBufferSetAttachment(
imageBuffer, kCVImageBufferContentLightLevelInfoKey,
meta.contentLightLevelInfo() as CFData, .shouldPropagate)
}
}
fmt == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|| fmt == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
onDecoded(
ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
}
@@ -1,11 +1,13 @@
import XCTest
#if canImport(Metal)
import CoreVideo
import Metal
import QuartzCore
@testable import PunktfunkKit
final class MetalPresenterTests: XCTestCase {
/// `MetalVideoPresenter.init?()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUVRGB
/// `MetalVideoPresenter.make()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUVRGB
/// fragment shaders plus the Catmull-Rom luma sampler). A `nil` result on a GPU-equipped host
/// means a shader failed to compile this catches a malformed shader before it silently
/// degrades stage-2 to a stage-1 fallback on device.
@@ -14,8 +16,54 @@ final class MetalPresenterTests: XCTestCase {
throw XCTSkip("no Metal device available in this environment")
}
XCTAssertNotNil(
MetalVideoPresenter(),
MetalVideoPresenter.make(),
"stage-2 Metal shaders failed to compile (presenter init returned nil)")
}
/// The HDR fix: `configure(hdr:)` must put the layer into the BT.2020-PQ EDR configuration with a
/// reference-white anchor (`edrMetadata`) the missing anchor was what made HDR render "too
/// bright". SDR must use the plain 8-bit path with EDR off and no metadata. A mid-session flip is a
/// per-mode reconfigure, so the round trip back to SDR must fully restore the SDR config.
func testConfigureHDRSetsEDRAnchor() throws {
guard let presenter = MetalVideoPresenter.make() else {
throw XCTSkip("no Metal device available in this environment")
}
presenter.configure(hdr: true)
XCTAssertEqual(presenter.layer.pixelFormat, .rgba16Float, "HDR uses an EDR-capable drawable")
XCTAssertNotNil(presenter.layer.colorspace, "HDR layer must be tagged (itur_2100_PQ)")
XCTAssertTrue(
presenter.layer.wantsExtendedDynamicRangeContent, "EDR must be requested on all platforms")
XCTAssertNotNil(
presenter.layer.edrMetadata,
"HDR must anchor reference white via edrMetadata (the fix for 'too bright')")
// Mid-session HDRSDR flip: the 8-bit path, EDR off, no metadata.
presenter.configure(hdr: false)
XCTAssertEqual(presenter.layer.pixelFormat, .bgra8Unorm, "SDR uses the plain 8-bit drawable")
XCTAssertFalse(presenter.layer.wantsExtendedDynamicRangeContent)
XCTAssertNil(presenter.layer.edrMetadata)
}
/// `render` with a freshly-allocated NV12 buffer must present without crashing or hanging the
/// main-thread present path is the highest-risk part of the stage-2 rewrite. (A headless CI with no
/// display can still allocate a drawable from a CAMetalLayer; if it can't, render returns false,
/// which is also a valid non-crashing outcome.)
func testRenderDoesNotCrashOnNV12Frame() throws {
guard let presenter = MetalVideoPresenter.make() else {
throw XCTSkip("no Metal device available in this environment")
}
presenter.configure(hdr: false)
var pb: CVPixelBuffer?
let attrs: [CFString: Any] = [kCVPixelBufferMetalCompatibilityKey: true]
let status = CVPixelBufferCreate(
kCFAllocatorDefault, 256, 256, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
attrs as CFDictionary, &pb)
guard status == kCVReturnSuccess, let pixelBuffer = pb else {
throw XCTSkip("could not allocate a test pixel buffer")
}
// Just asserting it returns (true or false) without trapping the layer may have no drawable
// source headless, so a false return is acceptable.
_ = presenter.render(pixelBuffer, isHDR: false)
}
}
#endif
@@ -0,0 +1,68 @@
// 4:4:4 decode-path coverage: the hardware-capability probe is stable/cached, and a real 4:4:4 HEVC
// keyframe decodes through VideoDecoder to a biplanar 4:4:4 pixel buffer. Reuses the same synthetic
// 4:4:4 blobs the runtime probe ships with.
import CoreVideo
import VideoToolbox
import XCTest
@testable import PunktfunkKit
private final class FrameBox: @unchecked Sendable {
let lock = NSLock()
var frame: ReadyFrame?
var error: OSStatus?
}
final class Stage444Tests: XCTestCase {
/// The capability probe is device-static and cached reading it twice must return the same value
/// (and must never crash, including where 4:4:4 is unsupported false).
func testProbeIsStableAndCached() {
XCTAssertEqual(Stage444Probe.hwDecode444_8bit, Stage444Probe.hwDecode444_8bit)
XCTAssertEqual(Stage444Probe.hwDecode444_10bit, Stage444Probe.hwDecode444_10bit)
}
/// A real 8-bit 4:4:4 HEVC keyframe (the embedded probe blob) decodes through `VideoDecoder` with
/// `setChroma444(true)` to a 256×256 biplanar 4:4:4 (`444v`/`444f`) buffer classified SDR.
/// (4:4:4 sessions require a hardware decoder skip where there isn't one, which is exactly where
/// the client wouldn't advertise 4:4:4 anyway.)
func testVideoDecoderDecodes444() throws {
try XCTSkipUnless(
Stage444Probe.hwDecode444_8bit, "no hardware 4:4:4 decode on this device")
let data = Data(Probe444Blobs.au444_8bit)
let format = try XCTUnwrap(
AnnexB.formatDescription(fromIDR: data), "the 4:4:4 blob must yield a format description")
let au = AccessUnit(data: data, ptsNs: 7_000_000, frameIndex: 0, flags: 0)
let box = FrameBox()
let done = DispatchSemaphore(value: 0)
let decoder = VideoDecoder(
onDecoded: { f in box.lock.lock(); box.frame = f; box.lock.unlock(); done.signal() },
onDecodeError: { s in box.lock.lock(); box.error = s; box.lock.unlock(); done.signal() })
decoder.setChroma444(true)
XCTAssertTrue(decoder.decode(au: au, format: format), "4:4:4 frame submit should succeed")
XCTAssertEqual(done.wait(timeout: .now() + 10), .success, "the decode callback must fire")
decoder.reset()
box.lock.lock(); let frame = box.frame; let error = box.error; box.lock.unlock()
XCTAssertNil(error.map { "decode error \($0)" })
let ready = try XCTUnwrap(frame, "a 4:4:4 ReadyFrame must be delivered")
XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), 256)
XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), 256)
let pf = CVPixelBufferGetPixelFormatType(ready.pixelBuffer)
XCTAssertTrue(
pf == kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange
|| pf == kCVPixelFormatType_444YpCbCr8BiPlanarFullRange,
"expected a biplanar 4:4:4 8-bit buffer, got \(fourCCString(pf))")
XCTAssertFalse(ready.isHDR, "an 8-bit BT.709 4:4:4 stream is SDR")
// The chroma plane (plane 1) must be FULL resolution for 4:4:4 (vs half for 4:2:0) this is
// what lets the unchanged shader sample chroma at the luma UV.
XCTAssertEqual(CVPixelBufferGetWidthOfPlane(ready.pixelBuffer, 1), 256)
XCTAssertEqual(CVPixelBufferGetHeightOfPlane(ready.pixelBuffer, 1), 256)
}
private func fourCCString(_ t: OSType) -> String {
let b = [UInt8(t >> 24 & 0xff), UInt8(t >> 16 & 0xff), UInt8(t >> 8 & 0xff), UInt8(t & 0xff)]
return String(bytes: b, encoding: .ascii) ?? "\(t)"
}
}
+79 -12
View File
@@ -3,12 +3,61 @@ title: "Apple Stage-2 Presenter (handoff)"
description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
---
> **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer`
> stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit
> `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`;
> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed
> to design rationale + open items — the shipped `.swift` code is the source of truth for the
> decode/present/measurement walkthrough.
> **Status:** SHIPPED as the **default** presenter (stage-1 `AVSampleBufferDisplayLayer` is the
> Metal-unavailable / DEBUG fallback). HDR corrected and **4:4:4** added on top of the proven
> main-thread present path (the hosting view's `CADisplayLink` drives `render` per vsync). Code:
> `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,Stage444Probe,LatencyMeter}.swift`.
> This doc is trimmed to design rationale + open items — the shipped `.swift` code is the source of
> truth for the decode/present/measurement walkthrough.
>
> **HDR (the "too bright" fix).** The presenter renders to a *separate* CAMetalLayer drawable, so the
> mastering metadata that was attached to the source `CVPixelBuffer` was never composited — and with no
> reference-white anchor the system rendered the PQ signal far too bright. The fix is to keep the
> PQ-passthrough shader (BT.2020 limited→full → PQ RGB as-is) and put the anchor **on the layer**:
> `colorspace = itur_2100_PQ`, `wantsExtendedDynamicRangeContent = true` (on **all** platforms — the old
> `#if os(macOS)` guard left iOS/tvOS EDR half-engaged), and
> `edrMetadata = CAEDRMetadata.hdr10(displayInfo:contentInfo:opticalOutputScale: 203)`. 203 nits =
> BT.2408 HDR reference white anchors diffuse white at EDR 1.0; a larger value renders dimmer. The
> mastering/CLL blobs (host `0xCE` datagram) now refine `edrMetadata` (drained by the pump,
> `setHdrMeta` hops the layer write to main) rather than being attached to a never-composited source
> buffer. **Needs on-glass validation on a real EDR panel.**
>
> **Mid-session SDR↔HDR.** The control-plane colour (`connection.isHDR`, from the Welcome) is fixed per
> session, but the host can re-init its encoder mid-session (a game entering HDR), so the HEVC VUI — and
> the decoder's `frame.isHDR` — flips. The presenter follows the **decoded frame**, not the latched
> session flag: `render` calls the idempotent `configure(hdr:)` every frame, so on a flip it
> reconfigures the layer (per-mode pixel format `bgra8Unorm` SDR / `rgba16Float` HDR, colorspace, EDR)
> and selects the matching shader — all synchronously on the main thread (the present path is
> main-thread, so no cross-thread hop is needed). The last `0xCE` grade is cached so an SDR→HDR
> reconfigure re-applies the real mastering metadata instead of the bare anchor. The pump drains `0xCE`
> **unconditionally** (not gated on the Welcome flag) so a session that starts SDR still gets mastering
> metadata when it goes HDR. A ≤2-frame transition flash on the rare flip is accepted.
>
> **Pacing.** The hosting view owns a **main-runloop `CADisplayLink`** (a weak `DisplayLinkProxy`
> breaks the retain cycle) that calls `renderTick` once per vsync. `renderTick` pops the **newest**
> ready frame from the 1-slot ring (older undisplayed frames dropped — lowest latency, no smoothing
> buffer) and, if there is one, draws it via **manual `layer.nextDrawable()`** and presents at the next
> vsync; on an idle vsync (no new frame) it does nothing and the compositor holds the last presented
> drawable (no idle re-render — matters at 5K). `drawableSize` is set **before** `nextDrawable` (it
> doesn't track bounds, defaults to 0), so allocation always uses the decoded size. `maximumDrawableCount
> = 3`. macOS `displaySyncEnabled = **false**`: the display link is the single pacing source, so leaving
> the layer's own vsync wait on would *also* block `present`/`nextDrawable` on the main thread and
> serialize it to the display — the cause of the fullscreen judder; disabling it lets present return
> promptly. Present is stamped at the display link's `targetTimestamp` projected to `CLOCK_REALTIME`
> (the actual on-glass instant, <1 vsync after the draw — accurate for the HUD).
>
> *(History: an off-main `CAMetalDisplayLink` variant and an off-main blocking-render present thread
> were both tried and **reverted** — both measured slower on macOS *and* iPad than this main-thread
> display-link path, whose real judder fix was simply `displaySyncEnabled = false`, not moving present
> off-thread. Measured ~11 ms p50 on the main-thread path.)*
>
> **4:4:4.** Chroma, bit-depth, and colorimetry are orthogonal: the decode pixel format is a 2×2 of
> `(chroma, HDR)` → `420v/x420/444v/x444` (all biplanar, so the existing shaders sample a full-size
> chroma plane unchanged); the shader is keyed only on HDR. The client advertises `VIDEO_CAP_444` only
> when `Stage444Probe` confirms **hardware** 4:4:4 decode (a hardware-required `VTDecompressionSession`
> over an embedded 256×256 4:4:4 keyframe — software 4:4:4 is too slow for real-time; validated on M3:
> `444v`/`x444` produced). A bounded pump backstop ends a 4:4:4 session that persistently fails to
> decode (gated to 4:4:4 sessions, so 4:2:0 loss-recovery is untouched).
## Why stage 2 (design rationale)
@@ -47,10 +96,28 @@ Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → dis
## Open items
- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit
`…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap).
- **On-glass HDR validation** — eyeball `edrMetadata` + `opticalOutputScale: 203` on a real EDR panel
(XDR display) against stage-1 side-by-side: diffuse white should sit at SDR-white level with only
highlights climbing. The reference white is a single named constant (`hdrReferenceWhiteNits`) for easy
tuning. (Needs a Windows HDR host; the Linux host is 8-bit SDR only.)
- **On-glass 4:4:4 validation** — confirm a `PUNKTFUNK_444` host (RTX box) streams a 4:4:4 session the
client decodes in hardware (HUD shows the resolved chroma); verify the resolution-ceiling backstop by
forcing a too-large 4:4:4 mode.
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture
term.
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can come
later if frames look uneven.
- **iOS / iPadOS / tvOS stage-2 variants.**
term and confirm the main-thread display-link present p50 holds at ~11 ms (and isn't regressed by the
per-frame `configure` / HDR-anchor work).
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; an optional even-pacing
policy (`present(_:afterMinimumDuration:)`) can come later if frames look uneven.
- **4:4:4 runtime downgrade-reconnect** — today a persistently-undecodable 4:4:4 session ends cleanly
(the live 4:4:4 decode requires hardware, so a resolution-ceiling miss fails the session create
*synchronously* and the pump backstop ends it — no black-screen loop); auto-reconnecting at 4:2:0
(dropping `VIDEO_CAP_444`) is a future refinement.
- **HLG** — `isHDR`/`isHDRFormat` fold HLG (transfer 18) in with PQ, but the presenter is PQ-only
(`itur_2100_PQ` + `hdr10` EDR), so an HLG stream would be mis-toned. Latent — no host emits HLG
(the stack is BT.2020 **PQ** only). A real HLG path (`itur_2100_HLG`, no PQ reference-white anchor)
is future work; until then HLG should be treated as out of scope.
- **Full-range** — the shaders hardcode limited→full expansion and the decoder requests the
`*VideoRange` formats regardless of `connection.colorFullRange`; VideoToolbox range-converts a
full-range source to video range on decode, so it stays self-consistent (mild level compression on
genuinely full-range content, which no host emits). Pre-existing; wire `colorFullRange` into the
range constants eventually.