diff --git a/CLAUDE.md b/CLAUDE.md index 663dd5a..18d05f6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -144,11 +144,25 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc `test-loopback.sh` (Swift client vs synthetic punktfunk1-hosts on loopback — runs on macOS; includes the pairing ceremony + `--require-pairing` gate), `RemoteFirstLightTests` (full pipeline over the LAN). See - [`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter** - (`VTDecompressionSession` + `CAMetalLayer`) is built and live-validated on glass behind the opt-in - `punktfunk.presenter` flag (~11 ms p50 capture→present), to become the default after a few - resolution/HDR checks. Next: make stage 2 the default, glass-to-glass numbers via - `tools/latency-probe`, iOS/iPadOS/tvOS variants. + [`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter is now the DEFAULT** + (stage-1 is the Metal-unavailable / DEBUG fallback): explicit `VTDecompressionSession` decode → + `CAMetalLayer`, presented from the hosting view's **main-runloop `CADisplayLink`** (`renderTick` pops + the newest ready frame per vsync; macOS `displaySyncEnabled = false` is the real fullscreen-judder fix, + ~11 ms p50). *(An off-main `CAMetalDisplayLink` and an off-main blocking-render present thread were + both tried and reverted — both measured slower on macOS and iPad.)* **HDR fixed** + (`design/apple-stage2-presenter.md`): the "too bright" bug was a missing reference-white anchor — the + fix keeps the PQ-passthrough shader and adds `CAEDRMetadata.hdr10(…, opticalOutputScale: 203)` + + `wantsExtendedDynamicRangeContent` on the layer (on all platforms; the old `#if os(macOS)` guard left + iOS/tvOS EDR half-engaged), routing the 0xCE mastering metadata to the layer (via `setHdrMeta`) instead + of a never-composited source buffer. **Mid-session SDR↔HDR** is handled: `render` reconciles the layer + per-frame from the decoded `frame.isHDR` (per-mode pixel format `bgra8`/`rgba16Float`), so a game + entering HDR mid-stream just reconfigures (last 0xCE grade cached + re-applied; pump drains 0xCE + unconditionally). **4:4:4 added**: decode format is a 2×2 `(chroma, HDR)` matrix + (`420v/x420/444v/x444`, all biplanar so the shaders are unchanged), advertised (`VIDEO_CAP_444`) only + behind a **hardware-required `VTDecompressionSession` probe** (`Stage444Probe`, validated on M3) with a + Settings opt-out + a bounded pump backstop for an undecodable 4:4:4 session. *HDR brightness + 4:4:4 + still need on-glass validation (Windows-HDR / `PUNKTFUNK_444` host).* Next: glass-to-glass numbers via + `tools/latency-probe`. **Linux stage 1 done, first light 2026-06-12** (`clients/linux`, binary `punktfunk-client`): GTK4/libadwaita shell linking `punktfunk-core` directly (no C ABI; `NativeClient` is now `Sync` — mutexed plane receivers), mDNS host list, TOFU + SPAKE2 diff --git a/clients/apple/Sources/PunktfunkClient/SessionModel.swift b/clients/apple/Sources/PunktfunkClient/SessionModel.swift index 07d35ba..ea96df0 100644 --- a/clients/apple/Sources/PunktfunkClient/SessionModel.swift +++ b/clients/apple/Sources/PunktfunkClient/SessionModel.swift @@ -129,6 +129,8 @@ final class SessionModel: ObservableObject { #endif }() let hdrCapable = hdrEnabled && displayHDR + // 4:4:4 opt-out (default on); the hardware-decode probe below is the real gate. + let want444 = (UserDefaults.standard.object(forKey: DefaultsKey.enable444) as? Bool) ?? true Task.detached(priority: .userInitiated) { // PunktfunkConnection.init blocks on the QUIC handshake — keep it off the main // actor. The persistent identity is presented on every connect so a paired @@ -138,9 +140,21 @@ final class SessionModel: ObservableObject { // Advertise 10-bit + HDR10 when enabled: the host upgrades to a BT.2020 PQ Main10 stream // only for actual HDR content (its own gate); the VideoToolbox/Metal present path is // HDR-capable (P010 + itur_2100_PQ + EDR). 0 keeps the 8-bit BT.709 SDR stream. - let videoCaps: UInt8 = hdrCapable + var videoCaps: UInt8 = hdrCapable ? (PunktfunkConnection.videoCap10Bit | PunktfunkConnection.videoCapHDR) : 0 + // Advertise full-chroma 4:4:4 only when allowed AND this device can HARDWARE-decode it + // (software 4:4:4 is too slow for real-time). The host content-gates depth, so an + // HDR-advertised session can still receive an 8-bit 4:4:4 stream (SDR content) — require + // BOTH depths there. Otherwise a no-op (the host emits 4:4:4 only if it too opted in); + // `chromaFormat` on the connection reflects what was actually resolved. + let canDecode444 = + hdrCapable + ? (Stage444Probe.hwDecode444_8bit && Stage444Probe.hwDecode444_10bit) + : Stage444Probe.hwDecode444_8bit + if want444, canDecode444 { + videoCaps |= PunktfunkConnection.videoCap444 + } let result = Result { try PunktfunkConnection( host: host.address, port: host.port, width: width, height: height, refreshHz: hz, diff --git a/clients/apple/Sources/PunktfunkClient/SettingsView.swift b/clients/apple/Sources/PunktfunkClient/SettingsView.swift index bf88fa1..0a8a067 100644 --- a/clients/apple/Sources/PunktfunkClient/SettingsView.swift +++ b/clients/apple/Sources/PunktfunkClient/SettingsView.swift @@ -25,6 +25,7 @@ struct SettingsView: View { @AppStorage(DefaultsKey.bitrateKbps) private var bitrateKbps = 0 @AppStorage(DefaultsKey.presenter) private var presenter = "stage2" @AppStorage(DefaultsKey.hdrEnabled) private var hdrEnabled = true + @AppStorage(DefaultsKey.enable444) private var enable444 = true @AppStorage(DefaultsKey.libraryEnabled) private var libraryEnabled = false @AppStorage(DefaultsKey.fullscreenWhileStreaming) private var fullscreenWhileStreaming = true @AppStorage(DefaultsKey.micEnabled) private var micEnabled = true @@ -36,6 +37,7 @@ struct SettingsView: View { @State private var showControllerTest = false #endif #if os(iOS) + @AppStorage(DefaultsKey.pointerCapture) private var pointerCapture = true // The sidebar selection drives the detail pane on iPad and the pushed sub-page on iPhone. // Width class decides the initial value: nil on iPhone (show the category list first), // General on iPad (a two-column layout should never open with an empty detail). @@ -206,6 +208,7 @@ struct SettingsView: View { case .general: Form { streamModeSection + pointerSection compositorSection } .formStyle(.grouped) @@ -581,6 +584,30 @@ struct SettingsView: View { } } + #if os(iOS) + /// iPad-only pointer-capture toggle: lock the mouse/trackpad for relative movement (games) vs + /// forward an absolute cursor position (desktop). Empty on iPhone (no hardware-pointer lock — + /// the mouse path there is always the absolute fallback). + @ViewBuilder private var pointerSection: some View { + if UIDevice.current.userInterfaceIdiom == .pad { + Section { + Toggle("Capture pointer for games", isOn: $pointerCapture) + } header: { + Text("Pointer") + } footer: { + Text("With a mouse or trackpad connected, lock the pointer and send relative " + + "movement — the expected behavior for games (mouse-look). Turn this off for " + + "desktop use to keep the pointer free and send its absolute position instead. " + + "The lock needs the stream full-screen and frontmost; it falls back to the " + + "absolute pointer automatically (Stage Manager, Slide Over). Finger touch is " + + "unaffected. Applies from the next session.") + .font(.geist(12, relativeTo: .caption)) + .foregroundStyle(.secondary) + } + } + } + #endif + @ViewBuilder private var compositorSection: some View { Section { Picker("Compositor", selection: $compositor) { @@ -644,12 +671,15 @@ struct SettingsView: View { @ViewBuilder private var hdrSection: some View { Section { Toggle("10-bit HDR", isOn: $hdrEnabled) + Toggle("Full chroma (4:4:4)", isOn: $enable444) } header: { - Text("HDR") + Text("Video quality") } footer: { - Text("Request a 10-bit BT.2020 PQ (HDR10) stream. It only engages when the host is " - + "sending HDR content AND this display supports HDR — otherwise the stream stays " - + "8-bit SDR. Applies from the next session.") + Text("HDR requests a 10-bit BT.2020 PQ (HDR10) stream — it only engages when the host is " + + "sending HDR content AND this display supports HDR. 4:4:4 requests full chroma " + + "(sharper text/UI, more bandwidth) — it only engages when this device can " + + "hardware-decode it AND the host opted in. Otherwise the stream stays 8-bit " + + "4:2:0 SDR. Applies from the next session.") .font(.geist(12, relativeTo: .caption)) .foregroundStyle(.secondary) } diff --git a/clients/apple/Sources/PunktfunkKit/DefaultsKeys.swift b/clients/apple/Sources/PunktfunkKit/DefaultsKeys.swift index 4a4c9da..642a42d 100644 --- a/clients/apple/Sources/PunktfunkKit/DefaultsKeys.swift +++ b/clients/apple/Sources/PunktfunkKit/DefaultsKeys.swift @@ -25,9 +25,19 @@ public enum DefaultsKey { /// Request a 10-bit BT.2020 PQ (HDR10) stream. On by default; only takes effect when the host /// has HDR content AND this display supports HDR — otherwise the stream stays 8-bit SDR. public static let hdrEnabled = "punktfunk.hdrEnabled" + /// Request a full-chroma 4:4:4 stream when this device can HARDWARE-decode it (`Stage444Probe`). + /// On by default; only takes effect when the host also opted in to 4:4:4 (otherwise the stream + /// stays 4:2:0). Sharper text/UI at the cost of more bandwidth. + public static let enable444 = "punktfunk.enable444" public static let hosts = "punktfunk.hosts" /// Client-side cursor mode: "auto" (shown only in gamescope sessions), "always", "never". public static let cursorMode = "punktfunk.cursorMode" + /// iPad: capture the mouse/trackpad pointer (pointer lock → relative movement) for games, + /// rather than forwarding an absolute cursor position. On by default. Only meaningful on iPad + /// with a hardware mouse/trackpad; the system grants the lock only to a full-screen, frontmost + /// scene and silently falls back to the absolute pointer when it can't (Stage Manager / Slide + /// Over). Read by `StreamViewController.prefersPointerLocked`. + public static let pointerCapture = "punktfunk.pointerCapture" /// Experimental: show the host's game library (browsed over the management API). Off by default. public static let libraryEnabled = "punktfunk.libraryEnabled" /// macOS: take the window fullscreen while streaming and restore it on the host list. On by default. diff --git a/clients/apple/Sources/PunktfunkKit/GamepadFeedback.swift b/clients/apple/Sources/PunktfunkKit/GamepadFeedback.swift index 0397a38..ee1880f 100644 --- a/clients/apple/Sources/PunktfunkKit/GamepadFeedback.swift +++ b/clients/apple/Sources/PunktfunkKit/GamepadFeedback.swift @@ -370,29 +370,32 @@ public final class GamepadFeedback { // Hidout traffic (lightbar / player LEDs / triggers) only exists on a PlayStation-pad // session — a DualSense or a DualShock 4 (lightbar only). Block briefly on it there and // let rumble own the wait elsewhere; on an Xbox session it stays nonblocking. - let hasHidout = connection.resolvedGamepad == .dualSense - || connection.resolvedGamepad == .dualShock4 - let hidTimeout: UInt32 = hasHidout ? 10 : 0 let thread = Thread { [connection, flag, drainDone, weak self] in while !flag.isStopped { do { - if let r = try connection.nextRumble(timeoutMs: 10), r.pad == 0 { + // Poll the feedback planes NON-BLOCKING. A blocking poll (timeoutMs > 0) holds + // the connection's shared feedback lock for its whole wait; the video pump drains + // HDR mastering metadata (nextHdrMeta) on the SAME lock every frame, so a blocking + // poll here starved it and throttled HDR to ~1 fps (SDR, which never drains HDR + // meta, was unaffected). Pacing with a short sleep OUTSIDE the lock (below) keeps + // rumble/HID latency low while leaving the lock free between polls. + if let r = try connection.nextRumble(timeoutMs: 0), r.pad == 0 { self?.rumble.apply(low: r.low, high: r.high) } - // Drain a BOUNDED burst of hidout events: only the first poll waits, - // and the cap + stop check keep sustained 0xCD traffic (a game writing - // per-frame LED/trigger reports) from starving the rumble poll above - // or blocking stop() past one cycle. + // Drain a BOUNDED burst of hidout events so sustained 0xCD traffic (a game writing + // per-frame LED/trigger reports) can't spin here or block stop() past one cycle. var burst = 0 while burst < 64, !flag.isStopped, - let ev = try connection.nextHidOutput( - timeoutMs: burst == 0 ? hidTimeout : 0) { + let ev = try connection.nextHidOutput(timeoutMs: 0) { self?.render(ev) burst += 1 } } catch { break // .closed (or fatal) — the session is over } + // ~8 ms poll cadence (≈125 Hz), slept OUTSIDE the feedback lock — low rumble/HID + // latency without holding the lock the HDR-meta drain needs. + if !flag.isStopped { Thread.sleep(forTimeInterval: 0.008) } } drainDone.signal() } diff --git a/clients/apple/Sources/PunktfunkKit/InputCapture.swift b/clients/apple/Sources/PunktfunkKit/InputCapture.swift index 68da205..dec02f6 100644 --- a/clients/apple/Sources/PunktfunkKit/InputCapture.swift +++ b/clients/apple/Sources/PunktfunkKit/InputCapture.swift @@ -107,6 +107,23 @@ public final class InputCapture { /// macOS (no GCMouse handlers installed; `sendMouseAbs` is never called there). Main-queue. public var gcMouseForwarding = false + #if os(iOS) + /// Whether any device is attached as a `GCMouse` right now. The Magic Keyboard TRACKPAD does + /// not always register as a GCMouse on iPadOS (only a standalone mouse does) — when no GCMouse + /// is present the relative GCMouse path can't carry pointer motion. Main-queue. + public var hasGCMouse: Bool { !mice.isEmpty } + + /// Diagnostic: a one-line description of every attached GCMouse (count + GCDevice identity), so + /// PUNKTFUNK_INPUT_DEBUG can reveal whether the trackpad showed up as a mouse at all. + public var attachedMiceSummary: String { + guard !mice.isEmpty else { return "0 mice" } + let parts = mice.map { mouse -> String in + "\(mouse.productCategory)/\(mouse.vendorName ?? "?")" + } + return "\(mice.count) mice: \(parts.joined(separator: ", "))" + } + #endif + /// Fired on ⌘⎋ (the capture toggle — detected here so it works in both states; the /// event itself is swallowed). Main queue. public var onToggleCapture: (() -> Void)? @@ -394,6 +411,12 @@ public final class InputCapture { !mice.contains(where: { $0 === mouse }) // re-delivered on wake — attach once else { return } mice.append(mouse) + #if os(iOS) + if inputDebug { + inputLog.debug( + "GCMouse attached: \(mouse.productCategory, privacy: .public)/\(mouse.vendorName ?? "?", privacy: .public) — now \(self.attachedMiceSummary, privacy: .public)") + } + #endif // macOS drives motion + buttons from NSEvent (StreamLayerView's local monitor → // sendMotion/sendMouseButton) because GCMouse's handlers proved unreliable there; // installing them too would double-send. iOS keeps GCMouse (raw deltas under diff --git a/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift b/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift index 529c391..69c4d33 100644 --- a/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift +++ b/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift @@ -1,10 +1,12 @@ -// Stage-2 presenter, present half: draw a decoded NV12 CVPixelBuffer into a CAMetalLayer -// drawable with a BT.709 YUV→RGB shader. The display link (owned by the hosting view) drives -// `render` once per vsync with the target present time, so a present can finally be stamped and -// the present tail hand-paced. See docs apple-stage2-presenter.md. +// Stage-2 presenter, present half: draw a decoded NV12 / P010 / 4:4:4 CVPixelBuffer into a CAMetalLayer +// drawable with a Y′CbCr→RGB shader. The hosting view's CADisplayLink drives `render` once per vsync +// (via Stage2Pipeline.renderTick) with the target present time, so a present can be stamped and the +// present tail hand-paced. See docs apple-stage2-presenter.md. // -// Main-thread only: created during view setup, `render` called from the view's CADisplayLink -// (which fires on the main runloop). The Metal objects + texture cache are touched only here. +// Main-thread only: created during view setup, `render`/`configure` called from the view's CADisplayLink +// (which fires on the main runloop). The Metal objects + texture cache are touched only here. The one +// exception is `setHdrMeta`, called from the pump thread — it hops the layer write to main so every +// CALayer mutation stays on one thread. #if canImport(Metal) && canImport(QuartzCore) import CoreGraphics @@ -15,10 +17,19 @@ import os private let presenterLog = Logger(subsystem: "io.unom.punktfunk", category: "presenter") -/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and a -/// BT.709 limited-range NV12→RGB fragment shader. uv.y is flipped (1 - p.y) so the top-left- -/// origin texture presents upright (NDC y is up), not upside down. (Colorspace is BT.709 SDR -/// for now — matches the host; 10-bit/HDR + other matrices are a later tie-in.) +/// HDR reference white (BT.2408 "HDR Reference White"): the absolute luminance, in nits, that the +/// PQ signal's diffuse white sits at. Passed to `CAEDRMetadata.hdr10(opticalOutputScale:)`, it anchors +/// 203-nit diffuse white at EDR 1.0 (the display's SDR-white level) and lets the system tone-map the +/// brighter highlights into the panel's headroom. This is the missing anchor that made the old HDR path +/// render "way too bright" (no `edrMetadata` → no reference-white anchoring); a LARGER value renders +/// dimmer. Matches the host's standard PQ reference white. +private let hdrReferenceWhiteNits: Float = 203.0 + +/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and BT.709 SDR +/// and BT.2020-PQ HDR Y′CbCr→RGB fragment shaders. uv.y is flipped (1 - p.y) so the top-left-origin +/// texture presents upright (NDC y is up). The HDR shader outputs PQ-encoded R′G′B′ as-is — the +/// CAMetalLayer's `itur_2100_PQ` colour space + `edrMetadata` tell the system compositor the samples +/// are PQ and how to tone-map them (no EOTF here, matching the host's BT.2020 PQ emission). private let shaderSource = """ #include using namespace metal; @@ -66,6 +77,8 @@ float catmullRomLuma(texture2d tex, sampler s, float2 uv) { return r; } +// SDR: 8-bit NV12 / 4:4:4 (BT.709, limited/video range) → full-range RGB. Chroma is sampled at the +// same UV as luma, so a full-size 4:4:4 chroma plane needs no shader change vs 4:2:0. fragment float4 pf_frag(VOut in [[stage_in]], texture2d lumaTex [[texture(0)]], texture2d chromaTex [[texture(1)]]) { @@ -82,18 +95,18 @@ fragment float4 pf_frag(VOut in [[stage_in]], return float4(saturate(float3(r, g, b)), 1.0); } -// HDR: 10-bit P010 (BT.2020, limited range), Y'CbCr that is PQ-encoded. We apply the BT.2020 -// matrix to get PQ-encoded R'G'B' and output it as-is — the CAMetalLayer's itur_2100_PQ colour -// space + EDR tells the compositor the samples are PQ, so it does the PQ→display mapping. No EOTF -// here (matching the host, which emitted BT.2020 PQ). P010 stores the 10-bit code in the high bits -// of each 16-bit sample, so an .r16Unorm sample reads ~code/1023 (the /1024 vs /1023 error is < 0.1%). +// HDR: 10-bit P010 / 4:4:4 (BT.2020, limited range), Y′CbCr that is PQ-encoded. We apply the BT.2020 +// matrix to get PQ-encoded R′G′B′ and output it as-is — the CAMetalLayer's itur_2100_PQ colour space +// + edrMetadata tell the compositor the samples are PQ, so it does the PQ→display tone-map. No EOTF +// here. P010/x444 store the 10-bit code in the high bits of each 16-bit sample, so an .r16Unorm sample +// reads ~code/1023 (the /1024 vs /1023 error is < 0.1%). fragment float4 pf_frag_hdr(VOut in [[stage_in]], texture2d lumaTex [[texture(0)]], texture2d chromaTex [[texture(1)]]) { constexpr sampler s(filter::linear, address::clamp_to_edge); float y = catmullRomLuma(lumaTex, s, in.uv); float2 c = chromaTex.sample(s, in.uv).rg; - // BT.2020 10-bit limited (video) range → full-range PQ R'G'B'. + // BT.2020 10-bit limited (video) range → full-range PQ R′G′B′. y = (y - 64.0/1023.0) * (1023.0/876.0); float u = (c.x - 512.0/1023.0) * (1023.0/896.0); float v = (c.y - 512.0/1023.0) * (1023.0/896.0); @@ -110,26 +123,34 @@ public final class MetalVideoPresenter { private let device: MTLDevice private let queue: MTLCommandQueue - /// SDR (BT.709 8-bit NV12 → bgra8) and HDR (BT.2020 PQ 10-bit P010 → rgba16Float) pipelines. - /// Selected per frame by `render`; the layer is reconfigured when the mode flips (HDR toggle). + /// SDR (BT.709 8-bit → bgra8) and HDR (BT.2020 PQ 10-bit → rgba16Float) pipelines. Selected per + /// frame in `render`; the layer is reconfigured to match when the session flips (HDR toggle). private let pipelineSDR: MTLRenderPipelineState private let pipelineHDR: MTLRenderPipelineState private var textureCache: CVMetalTextureCache? - /// Current layer configuration — switched lazily in `configure(hdr:)` when a frame's mode differs. + + /// Current layer configuration — switched in `configure(hdr:)` when a frame's HDR-ness differs. + /// Main-thread only (read + written from `render`/`configure`, all on the display-link runloop). private var hdrActive = false + /// Last HDR mastering grade received via `setHdrMeta` (the host's 0xCE). Cached so a mid-session + /// SDR→HDR flip's `configureColor` re-applies the real grade instead of clobbering it back to the + /// bare reference-white anchor (an out-of-order race otherwise: `setHdrMeta` and the flip both write + /// `edrMetadata`). Main-thread only. + private var lastHdrMeta: PunktfunkConnection.HdrMeta? + #if DEBUG - /// Last logged "decoded→drawable" signature, so the diagnostic logs only when a size changes - /// (on first frame, a resize, or a host Reconfigure) instead of every frame. + /// Last logged "decoded→drawable" signature, so the diagnostic logs only on a size/HDR change. private var lastSizeSig = "" #endif - /// nil if Metal is unavailable (no GPU / a headless CI) — the caller falls back to stage-1. - public init?() { + /// nil if Metal is unavailable (no GPU / a headless CI) or a shader fails to compile — the caller + /// falls back to stage-1. + public static func make() -> MetalVideoPresenter? { guard let device = MTLCreateSystemDefaultDevice(), let queue = device.makeCommandQueue() else { return nil } - self.device = device - self.queue = queue + let pipelineSDR: MTLRenderPipelineState + let pipelineHDR: MTLRenderPipelineState do { let library = try device.makeLibrary(source: shaderSource, options: nil) let vtx = library.makeFunction(name: "pf_vtx") @@ -146,99 +167,140 @@ public final class MetalVideoPresenter { } catch { return nil } - CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache) - guard textureCache != nil else { return nil } + var cache: CVMetalTextureCache? + CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache) + guard let textureCache = cache else { return nil } let layer = CAMetalLayer() layer.device = device layer.pixelFormat = .bgra8Unorm layer.framebufferOnly = true layer.isOpaque = true - // Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let - // the system compositor scale it to the layer's bounds — the same `.resizeAspect` path - // stage-1's AVSampleBufferDisplayLayer (videoGravity) uses, so stage-2 matches its sharpness. - // A native-resolution present is then pixel-exact (1:1, no shader scaling), and any display - // scaling uses the system's high-quality scaler rather than the in-shader bicubic. - layer.contentsGravity = .resizeAspect - // Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the - // display-link / MAIN thread) has to block waiting for one to free. - layer.maximumDrawableCount = 3 #if os(macOS) - // The display link already paces exactly one present per vsync. Leaving the layer's - // own vsync wait on means `commandBuffer.present` ALSO blocks for the hardware vsync, - // so `nextDrawable()` stalls the MAIN thread until a drawable frees — windowed, the - // WindowServer's looser compositing hides it; FULLSCREEN's tighter, more-direct path - // serializes the main thread to the display and the stall surfaces as bad judder. - // Disabling the layer-level sync lets present return promptly (the display link is the - // pacing source), which is what fixes the fullscreen stutter. macOS-only property. + // The display link already paces exactly one present per vsync. Leaving the layer's own vsync + // wait on means `commandBuffer.present` ALSO blocks for the hardware vsync, so `nextDrawable()` + // stalls the MAIN thread until a drawable frees — windowed, the WindowServer's looser + // compositing hides it; FULLSCREEN's tighter path serializes the main thread to the display and + // the stall surfaces as bad judder. Disabling the layer-level sync lets present return promptly + // (the display link is the pacing source) — the fix for the fullscreen stutter. macOS-only. layer.displaySyncEnabled = false #endif + // Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let the + // system compositor scale it to the layer's bounds — the same `.resizeAspect` path stage-1's + // AVSampleBufferDisplayLayer uses. A native-resolution present is then pixel-exact (1:1, no + // shader scaling); a resized window rescales via the system's scaler. + layer.contentsGravity = .resizeAspect + // Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the display-link / + // MAIN thread) has to block waiting for one to free. + layer.maximumDrawableCount = 3 + + return MetalVideoPresenter( + device: device, queue: queue, pipelineSDR: pipelineSDR, pipelineHDR: pipelineHDR, + textureCache: textureCache, layer: layer) + } + + private init( + device: MTLDevice, queue: MTLCommandQueue, pipelineSDR: MTLRenderPipelineState, + pipelineHDR: MTLRenderPipelineState, textureCache: CVMetalTextureCache, layer: CAMetalLayer + ) { + self.device = device + self.queue = queue + self.pipelineSDR = pipelineSDR + self.pipelineHDR = pipelineHDR + self.textureCache = textureCache self.layer = layer } - /// Reconfigure the layer for SDR or HDR when the stream mode flips (HDR toggle). HDR uses an - /// rgba16Float drawable + a BT.2020 PQ colour space + EDR, so the compositor PQ-maps to the - /// display; SDR uses the plain 8-bit sRGB path. Main-thread only (called from `render`). - private func configure(hdr: Bool) { + /// Configure the layer + active pipeline for an SDR or HDR session. MAIN THREAD ONLY. Called once at + /// session start and again per-frame from `render` (idempotent — the guard makes a same-state call a + /// no-op), so a mid-session HDR toggle (the host re-inits its encoder; the decoded `frame.isHDR` + /// flips) reconfigures here automatically. HDR uses an rgba16Float drawable + BT.2020 PQ colour space + /// + EDR with a 203-nit reference-white anchor; SDR uses the plain 8-bit sRGB path. + public func configure(hdr: Bool) { guard hdr != hdrActive else { return } hdrActive = hdr + configureColor(hdr: hdr) + } + + /// Set the layer's pixel format + colour config for SDR or HDR. MAIN THREAD ONLY. EDR is requested + /// on ALL platforms — the property is available on macOS/iOS/tvOS at our deployment floor, and the + /// old `#if os(macOS)` guard left iOS/tvOS EDR half-engaged. + private func configureColor(hdr: Bool) { if hdr { layer.pixelFormat = .rgba16Float layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ) - #if os(macOS) layer.wantsExtendedDynamicRangeContent = true - #endif + // Anchor reference white. Re-apply the real grade if one already arrived (0xCE before the + // flip); otherwise the bare 203-nit anchor. Without this anchor the PQ signal is too bright. + layer.edrMetadata = makeEDR(lastHdrMeta) } else { + // SDR: gamma-encoded BT.709 [0,1] in an 8-bit drawable; a nil colorspace tags it device/sRGB + // (the proven SDR path — never showed the "too bright" issue, which was HDR-only). layer.pixelFormat = .bgra8Unorm layer.colorspace = nil - #if os(macOS) layer.wantsExtendedDynamicRangeContent = false - #endif + layer.edrMetadata = nil } } - /// Draw one decoded frame to the next drawable and present it. `isHDR` selects the 10-bit - /// BT.2020 PQ path (P010 input) vs the 8-bit BT.709 path (NV12 input). Returns true on success; - /// false when there's no drawable yet, a texture couldn't be made, or Metal errored — the - /// caller then doesn't stamp a present for this frame. + private func makeEDR(_ meta: PunktfunkConnection.HdrMeta?) -> CAEDRMetadata { + CAEDRMetadata.hdr10( + displayInfo: meta?.masteringDisplayColorVolume(), + contentInfo: meta?.contentLightLevelInfo(), + opticalOutputScale: hdrReferenceWhiteNits) + } + + /// Update the HDR mastering metadata (drained from the host's 0xCE datagram) to refine the system + /// tone-map from the real grade. Called from the PUMP thread, so the layer write is hopped to MAIN + /// (every CALayer mutation stays on one thread). The grade is cached so a later SDR→HDR + /// `configureColor` re-applies it; the `edrMetadata` write is gated on `hdrActive` (setting it on an + /// SDR layer is harmless but pointless, and the flip will apply it anyway). + public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) { + DispatchQueue.main.async { [weak self] in + guard let self else { return } + self.lastHdrMeta = meta + if self.hdrActive { self.layer.edrMetadata = self.makeEDR(meta) } + } + } + + /// Draw one decoded frame to the next drawable and present it. MAIN THREAD (the display link). + /// `isHDR` selects the 10-bit BT.2020 PQ path vs the 8-bit BT.709 path and is reconciled with the + /// layer config via `configure`. Returns true on success; false when there's no drawable yet, a + /// texture couldn't be made, or Metal errored — the caller then doesn't stamp a present. @discardableResult public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool { + // Reconcile the layer with the decoded frame's HDR-ness (handles a mid-session SDR↔HDR flip). configure(hdr: isHDR) - // P010 stores 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12 is 8-bit → R8/RG8. - let lumaFmt: MTLPixelFormat = isHDR ? .r16Unorm : .r8Unorm - let chromaFmt: MTLPixelFormat = isHDR ? .rg16Unorm : .rg8Unorm + + // P010/x444 store 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12/444v is 8-bit → R8/RG8. + // Derived from the actual decoded buffer so a 4:4:4 (full chroma plane) frame just works. + let pf = CVPixelBufferGetPixelFormatType(pixelBuffer) + let tenBit = + pf == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange + || pf == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange + || pf == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange + || pf == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange guard let textureCache, - let luma = makeTexture(pixelBuffer, plane: 0, format: lumaFmt, cache: textureCache), - let chroma = makeTexture(pixelBuffer, plane: 1, format: chromaFmt, cache: textureCache) + let luma = makeTexture( + pixelBuffer, plane: 0, format: tenBit ? .r16Unorm : .r8Unorm, cache: textureCache), + let chroma = makeTexture( + pixelBuffer, plane: 1, format: tenBit ? .rg16Unorm : .rg8Unorm, cache: textureCache) else { return false } - // Size the drawable to the decoded frame so the fullscreen triangle samples the texture 1:1 - // (pixel-exact); the layer's contentsGravity then scales it to the on-screen bounds via the - // system compositor (matching stage-1). Re-set only on a change (first frame / Reconfigure). + // Size the drawable to the decoded frame so the fullscreen triangle samples 1:1 (pixel-exact); + // the layer's contentsGravity then scales it to the on-screen bounds via the system compositor + // (matching stage-1). drawableSize does NOT track bounds (defaults to 0), so set it BEFORE + // nextDrawable; re-set only on a change (first frame / Reconfigure / HDR flip). let decodedSize = CGSize( width: CVPixelBufferGetWidth(pixelBuffer), height: CVPixelBufferGetHeight(pixelBuffer)) if layer.drawableSize != decodedSize { layer.drawableSize = decodedSize } + #if DEBUG + logSizeIfChanged(decoded: decodedSize) + #endif guard let drawable = layer.nextDrawable(), let commandBuffer = queue.makeCommandBuffer() else { return false } - #if DEBUG - // Diagnose sharpness: decoded should equal the drawable (the shader is 1:1); the layer's - // bounds may differ (the system scales). Logged only when a size changes. - let decodedW = Int(decodedSize.width) - let decodedH = Int(decodedSize.height) - let sig = "\(decodedW)x\(decodedH)|\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height))" - if sig != lastSizeSig { - lastSizeSig = sig - let msg = "stage2: decoded \(decodedW)x\(decodedH) → drawable " - + "\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height)) " - + "(texture \(drawable.texture.width)x\(drawable.texture.height), " - + "contentsScale \(layer.contentsScale), " - + "layerBounds \(Int(layer.bounds.width))x\(Int(layer.bounds.height)))" - presenterLog.info("\(msg, privacy: .public)") - } - #endif - let pass = MTLRenderPassDescriptor() pass.colorAttachments[0].texture = drawable.texture pass.colorAttachments[0].loadAction = .clear @@ -247,24 +309,23 @@ public final class MetalVideoPresenter { guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else { return false } - encoder.setRenderPipelineState(isHDR ? pipelineHDR : pipelineSDR) + encoder.setRenderPipelineState(hdrActive ? pipelineHDR : pipelineSDR) encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0) encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1) encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3) encoder.endEncoding() commandBuffer.present(drawable) // present at the next vsync — lowest latency - // Hold the CVMetalTextures + the source pixel buffer (its IOSurface) alive until the GPU - // finishes sampling — releasing them at scope exit could free the backing mid-read. + // Hold the CVMetalTextures + source pixel buffer (its IOSurface) alive until the GPU finishes + // sampling — releasing them at scope exit could free the backing mid-read. commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) } commandBuffer.commit() return true } - /// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past - /// the draw — the MTLTexture is only valid while its CVMetalTexture is retained. + /// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past the + /// draw — the MTLTexture is only valid while its CVMetalTexture is retained. private func makeTexture( - _ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat, - cache: CVMetalTextureCache + _ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat, cache: CVMetalTextureCache ) -> CVMetalTexture? { let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane) let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane) @@ -276,5 +337,16 @@ public final class MetalVideoPresenter { else { return nil } return cvTexture } + + #if DEBUG + private func logSizeIfChanged(decoded: CGSize) { + let sig = "\(Int(decoded.width))x\(Int(decoded.height))|hdr\(hdrActive ? 1 : 0)" + if sig != lastSizeSig { + lastSizeSig = sig + let msg = "stage2: decoded \(Int(decoded.width))x\(Int(decoded.height)) hdr=\(hdrActive)" + presenterLog.info("\(msg, privacy: .public)") + } + } + #endif } #endif diff --git a/clients/apple/Sources/PunktfunkKit/PointerLockChain.swift b/clients/apple/Sources/PunktfunkKit/PointerLockChain.swift new file mode 100644 index 0000000..f930e64 --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/PointerLockChain.swift @@ -0,0 +1,94 @@ +// Steers the system's iPad pointer-lock resolution down to a chosen "anchor" view controller. +// +// `UIViewController.prefersPointerLocked` is resolved the same way as the status bar: the system +// walks DOWN from the window's root view controller through `childViewControllerForPointerLock`. +// SwiftUI's hosting / container view controllers do NOT forward that query to their children, so a +// `UIViewControllerRepresentable` controller buried in the SwiftUI tree (our StreamViewController) +// is never consulted — its `prefersPointerLocked = true` is silently ignored and a Magic Keyboard +// trackpad / mouse falls through to the absolute-pointer path instead of being captured. +// +// Swizzling the DEFAULT implementation isn't enough: the controllers that break the chain +// (UIHostingController and SwiftUI's internal containers) provide their OWN implementation of the +// property, so a base-class swizzle never runs for them. Instead we walk UP the LIVE `parent` +// chain from the anchor to the window root and, on each real ancestor, force +// `childViewControllerForPointerLock` to return the next controller toward the anchor. Each forced +// value is a genuine direct child (we follow the actual containment chain), so the system's +// downward walk reaches the anchor and reads its `prefersPointerLocked`. +// +// The forcing is per-INSTANCE — an associated object — gated behind a one-time per-CLASS IMP +// swizzle. Only the specific controllers in the anchor's chain are affected; every other instance +// of those classes keeps its original behavior (associated object nil → original IMP). The forced +// values are cleared on disengage so the long-lived SwiftUI parents don't retain a stale controller +// across sessions. Only the PUBLIC `childViewControllerForPointerLock` selector is touched +// (App-Store-safe; no private API). + +#if os(iOS) +import ObjectiveC +import UIKit + +enum PointerLockChain { + private static var forcedChildKey: UInt8 = 0 + /// Classes whose `childViewControllerForPointerLock` we've already IMP-swizzled (keyed by the + /// class object). Main-thread only — pointer-lock resolution and capture toggles are all main. + private static var swizzledClasses = Set() + /// Ancestors we've stamped with a forced child this engagement, held weakly so a deallocated + /// SwiftUI controller drops out on its own (no dangling). disengage() clears every one — even + /// if the live `parent` chain has since broken — so a stamped parent can never retain a stale + /// controller subtree across sessions. One anchor is ever engaged at a time. + private static let stampedParents = NSHashTable.weakObjects() + + private static func forcedChild(of vc: UIViewController) -> UIViewController? { + objc_getAssociatedObject(vc, &forcedChildKey) as? UIViewController + } + + private static func setForcedChild(_ child: UIViewController?, on vc: UIViewController) { + // RETAIN: while steering, the parent must keep the toward-anchor child alive. It's also + // already a strong child of `vc` via UIKit containment, so this adds no cycle (the reverse + // `.parent` link is weak), and disengage() always clears it — so it can't outlive a session. + objc_setAssociatedObject(vc, &forcedChildKey, child, .OBJC_ASSOCIATION_RETAIN_NONATOMIC) + } + + /// Ensure `cls`'s `childViewControllerForPointerLock` getter consults the per-instance forced + /// child first, falling back to the class's original implementation. Idempotent per class. + private static func ensureSwizzled(_ cls: AnyClass) { + let id = ObjectIdentifier(cls) + guard !swizzledClasses.contains(id) else { return } + swizzledClasses.insert(id) + let selector = #selector(getter: UIViewController.childViewControllerForPointerLock) + guard let method = class_getInstanceMethod(cls, selector) else { return } + let originalIMP = method_getImplementation(method) + typealias OriginalFn = @convention(c) (AnyObject, Selector) -> UIViewController? + let original = unsafeBitCast(originalIMP, to: OriginalFn.self) + let forwarding: @convention(block) (UIViewController) -> UIViewController? = { vc in + if let forced = forcedChild(of: vc) { return forced } + return original(vc, selector) + } + method_setImplementation(method, imp_implementationWithBlock(forwarding)) + } + + /// Force every ancestor of `anchor` to forward pointer-lock resolution toward it, then ask the + /// system to re-resolve. No-op when `anchor` isn't in a view-controller hierarchy yet (it + /// re-runs from the anchor's appearance/parent callbacks once it is). + static func engage(_ anchor: UIViewController) { + disengage(anchor) // clear any prior engagement first (reparent / re-anchor) + var child = anchor + while let parent = child.parent { + ensureSwizzled(object_getClass(parent)!) + setForcedChild(child, on: parent) + stampedParents.add(parent) + child = parent + } + anchor.setNeedsUpdateOfPrefersPointerLocked() + } + + /// Clear the forced forwarding on every stamped ancestor (so the SwiftUI parents stop retaining + /// the anchor's subtree) and re-resolve to drop the lock. + static func disengage(_ anchor: UIViewController) { + for parent in stampedParents.allObjects { + setForcedChild(nil, on: parent) + } + stampedParents.removeAllObjects() + anchor.setNeedsUpdateOfPrefersPointerLocked() + } +} +#endif diff --git a/clients/apple/Sources/PunktfunkKit/Probe444Blobs.swift b/clients/apple/Sources/PunktfunkKit/Probe444Blobs.swift new file mode 100644 index 0000000..0e54488 --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/Probe444Blobs.swift @@ -0,0 +1,36 @@ +// Synthetic 4:4:4 HEVC keyframes used only by `Stage444Probe` to probe decode capability. +// +// Each is the first IDR access unit (VPS + SPS + PPS + IDR slice, Annex-B) of a 256×256 HEVC +// Range-Extensions clip — `chroma_format_idc = 3` — generated offline with libx265: +// ffmpeg -f lavfi -i color=c=gray:s=256x256:r=30:d=0.1 -frames:v 3 \ +// -pix_fmt yuv444p[10le] -c:v libx265 \ +// -x265-params keyint=1:min-keyint=1:no-info=1:repeat-headers=1:aud=0 -f hevc out.hevc +// 256×256 clears the hardware decoder's minimum-dimension floor (a 16×16 clip is rejected for every +// chroma format). Validated to hardware-decode to `444v`/`x444` on Apple Silicon (M3). +enum Probe444Blobs { + /// 256×256 HEVC Range-Extensions 4:4:4 keyframe (Annex-B): 134 bytes. + static let au444_8bit: [UInt8] = [ + 0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, + 0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42, + 0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, + 0x90, 0x01, 0x01, 0x00, 0x80, 0xb2, 0xdd, 0x49, 0x26, 0x57, 0x80, 0xb4, 0x04, 0x00, 0x00, 0x03, + 0x00, 0x04, 0x00, 0x00, 0x03, 0x00, 0x78, 0x20, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72, + 0x86, 0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb, + 0xae, 0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6, + 0x65, 0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87, + 0x00, 0x00, 0x03, 0x00, 0x5b, 0x40, + ] + + /// 256×256 HEVC Range-Extensions 4:4:4 10-bit keyframe (Annex-B): 133 bytes. + static let au444_10bit: [UInt8] = [ + 0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, + 0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42, + 0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, + 0x90, 0x01, 0x01, 0x00, 0x80, 0x9b, 0x2d, 0xd4, 0x92, 0x65, 0x78, 0x0b, 0x40, 0x40, 0x00, 0x00, + 0x03, 0x00, 0x40, 0x00, 0x00, 0x07, 0x82, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72, 0x86, + 0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb, 0xae, + 0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6, 0x65, + 0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87, 0x00, + 0x00, 0x03, 0x00, 0x5b, 0x40, + ] +} diff --git a/clients/apple/Sources/PunktfunkKit/PunktfunkConnection.swift b/clients/apple/Sources/PunktfunkKit/PunktfunkConnection.swift index 17488d7..564954d 100644 --- a/clients/apple/Sources/PunktfunkKit/PunktfunkConnection.swift +++ b/clients/apple/Sources/PunktfunkKit/PunktfunkConnection.swift @@ -238,6 +238,13 @@ public final class PunktfunkConnection { public private(set) var colorFullRange: Bool = false /// Encoded bit depth (8 or 10). public private(set) var bitDepth: UInt8 = 8 + /// The chroma subsampling the host resolved for this session, as the HEVC `chroma_format_idc`: + /// `1` = 4:2:0 (every pre-4:4:4 host, and the back-compat default) or `3` = full-chroma 4:4:4 + /// (only when this client advertised `videoCap444` *and* the host could open a real 4:4:4 + /// encoder). Drive the decoder's requested pixel format from this. See `isChroma444`. + public private(set) var chromaFormat: UInt8 = 1 + /// Convenience: the resolved stream is full-chroma 4:4:4 (`chroma_format_idc == 3`). + public var isChroma444: Bool { chromaFormat == 3 } /// True when the negotiated stream is HDR (PQ or HLG transfer) — drive an HDR present path and /// drain `nextHdrMeta`. public var isHDR: Bool { colorTransfer == 16 || colorTransfer == 18 } @@ -334,6 +341,9 @@ public final class PunktfunkConnection { colorMatrix = mtx colorFullRange = fullRange != 0 bitDepth = depth + var cf: UInt8 = 1 + _ = punktfunk_connection_chroma_format(handle, &cf) + chromaFormat = cf var ac: UInt8 = 2 _ = punktfunk_connection_audio_channels(handle, &ac) resolvedAudioChannels = ac @@ -605,6 +615,10 @@ public final class PunktfunkConnection { public static let videoCap10Bit: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_10BIT) /// Video-capability bit: the client can present BT.2020 PQ HDR10 (implies 10-bit). public static let videoCapHDR: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_HDR) + /// Video-capability bit: the client can decode a full-chroma 4:4:4 HEVC stream (Range + /// Extensions). Advertise only when the device can *hardware*-decode it (`Stage444Probe`); + /// the host then emits 4:4:4 only if it too opted in. `chromaFormat` reflects the real value. + public static let videoCap444: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_444) /// Static HDR mastering metadata (SMPTE ST.2086 + content light level) the host sent for an HDR /// session. Mirrors the wire/ABI `PunktfunkHdrMeta`; primaries are in ST.2086 **G, B, R** order, diff --git a/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift b/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift index 2a65908..5c8741d 100644 --- a/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift +++ b/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift @@ -1,21 +1,21 @@ -// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async -// output drops the newest decoded frame into a 1-slot ring; the hosting view's display link -// calls `renderTick` once per vsync to draw + present the newest ready frame and stamp -// capture→present. Mirrors StreamPump's lifecycle (one per start; cancel is permanent). +// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async output +// drops the newest decoded frame into a 1-slot ring; the hosting view's display link calls `renderTick` +// once per vsync to draw + present the newest ready frame and stamp capture→present. Mirrors +// StreamPump's lifecycle (one per start; cancel is permanent). // -// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` -// + `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). -// Only the ring + decoder cross threads and both are internally locked. +// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` + +// `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). Only the ring (lock-guarded) +// and the decoder/presenter (internally locked / main-hopped) cross threads. #if canImport(Metal) && canImport(QuartzCore) import AVFoundation import Foundation import QuartzCore -/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view -/// directly makes a `view → link → view` cycle that only `invalidate()` breaks — if a teardown -/// is ever missed the view leaks and keeps ticking. This proxy holds the handler weakly, so the -/// view can deallocate and its `deinit` invalidate the link. +/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view directly +/// makes a `view → link → view` cycle that only `invalidate()` breaks — if a teardown is ever missed +/// the view leaks and keeps ticking. This proxy holds the handler weakly, so the view can deallocate +/// and its `deinit` invalidate the link. public final class DisplayLinkProxy: NSObject { private let onTick: (CADisplayLink) -> Void public init(_ onTick: @escaping (CADisplayLink) -> Void) { self.onTick = onTick } @@ -44,10 +44,10 @@ private final class PumpToken: @unchecked Sendable { func cancel() { lock.lock(); live = false; lock.unlock() } } -/// Throttled host keyframe requests for decode recovery. The decoder's async error callback -/// (a VT thread) and the pump thread (a submit failure) both signal a wedge; this coalesces -/// them so the control stream isn't flooded while the decode stays stalled for several frames -/// until the requested IDR lands. Bound to the live connection in `start`, unbound in `stop`. +/// Throttled host keyframe requests for decode recovery. The decoder's async error callback (a VT +/// thread) and the pump thread (a submit failure) both signal a wedge; this coalesces them so the +/// control stream isn't flooded while the decode stays stalled for several frames until the requested +/// IDR lands. Bound to the live connection in `start`, unbound in `stop`. private final class KeyframeRecovery: @unchecked Sendable { private let lock = NSLock() private var connection: PunktfunkConnection? @@ -60,7 +60,7 @@ private final class KeyframeRecovery: @unchecked Sendable { func request() { lock.lock() let now = DispatchTime.now().uptimeNanoseconds - let due = lastNs == 0 || now &- lastNs > 100_000_000 // ≥ 100 ms since the last request (matches Android) + let due = lastNs == 0 || now &- lastNs > 100_000_000 // ≥ 100 ms since the last request if due { lastNs = now } let conn = due ? connection : nil lock.unlock() @@ -76,30 +76,36 @@ public final class Stage2Pipeline { private let recovery = KeyframeRecovery() private var token = PumpToken() private var offsetNs: Int64 = 0 + /// Signalled when the pump thread exits, so `stop()` can join it (bounded) before `decoder.reset()` + /// — otherwise a pump iteration already past its `token.isLive` check can rebuild a decode session + /// right after the reset (a brief orphan session). `pumpJoinable` is armed by `start`, consumed by + /// the first `stop` (so the idempotent second `stop`/deinit doesn't block on an already-drained + /// semaphore). start/stop are sequential lifecycle calls, so the plain flag is safe. + private let pumpStopped = DispatchSemaphore(value: 0) + private var pumpJoinable = false - /// The Metal layer the hosting view installs + sizes. nil-init fails when Metal is - /// unavailable so the caller can fall back to stage-1. + /// The Metal layer the hosting view installs + sizes. public var layer: CAMetalLayer { presenter.layer } - /// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal - /// can't be set up (headless / no GPU) — caller falls back to the stage-1 presenter. + /// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal can't be + /// set up (headless / no GPU) — caller falls back to the stage-1 presenter. public init?(presentMeter: LatencyMeter) { - guard let presenter = MetalVideoPresenter() else { return nil } + guard let presenter = MetalVideoPresenter.make() else { return nil } self.presenter = presenter self.presentMeter = presentMeter let ring = ring let recovery = recovery self.decoder = VideoDecoder( onDecoded: { ring.submit($0) }, - // Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump - // resets to re-gate on the next IDR, and we ask the host to send one now (infinite - // GOP — it wouldn't otherwise come soon). Throttled in KeyframeRecovery. + // Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump resets to + // re-gate on the next IDR, and we ask the host to send one now (infinite GOP — it wouldn't + // otherwise come soon). Throttled in KeyframeRecovery. onDecodeError: { _ in recovery.request() }) } - /// Start pulling AUs into the decoder. `onFrame` fires per AU at receipt (capture→client - /// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) - /// makes the present stamp cross-machine valid. + /// Start pulling AUs into the decoder. MAIN THREAD. `onFrame` fires per AU at receipt (capture→client + /// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) makes the + /// present stamp cross-machine valid. public func start( connection: PunktfunkConnection, onFrame: (@Sendable (AccessUnit) -> Void)?, @@ -108,34 +114,48 @@ public final class Stage2Pipeline { offsetNs = connection.clockOffsetNs recovery.bind(connection) // arm host-keyframe recovery for this session token = PumpToken() // fresh token per start — cancel is permanent (like StreamPump) + + // Configure the decoder's chroma + the layer's initial colorimetry before the first frame. The + // chroma subsampling drives only the decode pixel format (orthogonal to HDR/depth); the HDR + // config is the Welcome's latched value, which a mid-session flip then overrides per-frame. + decoder.setChroma444(connection.isChroma444) + presenter.configure(hdr: connection.isHDR) + let token = token let decoder = decoder let recovery = recovery + let presenter = presenter + let pumpStopped = pumpStopped let thread = Thread { + defer { pumpStopped.signal() } // let stop() join the pump (bounded) before decoder.reset() var format: CMVideoFormatDescription? var lastFramesDropped = connection.framesDropped() // Persistent recovery WANT, not a one-shot edge (see StreamPump for the full rationale): - // the old code advanced lastFramesDropped on the same edge it called recovery.request(), - // so a request swallowed by the throttle (the lost recovery IDR being pruned within the - // window) was never re-sent and the picture stayed frozen. Keep asking until an IDR lands. + // keep asking until an IDR lands so a request swallowed by the throttle is re-sent. var awaitingIDR = false + // 4:4:4 backstop: a run of decode/create failures in a 4:4:4 session means this device can't + // decode 4:4:4 at the negotiated resolution (the HW probe clears the common case but not a + // resolution-ceiling miss). End cleanly instead of looping on a black screen. + var decodeFailRun = 0 while token.isLive { do { - // Loss recovery (the primary path). The reassembler drops unrecoverable AUs - // (framesDropped) and the decoder conceals the reference-missing deltas that - // follow — often WITHOUT an error callback — so key off the drop count climbing, - // then keep asking (awaitingIDR) until a fresh IDR re-anchors decode. Polled every - // iteration so a total-loss drought recovers the moment packets resume. + // Loss recovery (the primary path). The reassembler drops unrecoverable AUs and the + // decoder conceals the reference-missing deltas — often WITHOUT an error callback — + // so key off the drop count climbing, then keep asking (awaitingIDR) until a fresh + // IDR re-anchors decode. let dropped = connection.framesDropped() if dropped > lastFramesDropped { lastFramesDropped = dropped awaitingIDR = true } if awaitingIDR { recovery.request() } - // Drain any HDR mastering-metadata update (0xCE) and hand it to the decoder, which - // attaches it to subsequent HDR frames. Non-blocking; only HDR sessions emit these. - if connection.isHDR, let meta = try? connection.nextHdrMeta(timeoutMs: 0) { - decoder.setHdrMeta(meta) + // Drain HDR mastering metadata (0xCE) and hand it to the PRESENTER (→ CAEDRMetadata). + // Polled UNCONDITIONALLY (not gated on connection.isHDR, the fixed Welcome flag): the + // host sends 0xCE only for HDR, INCLUDING a mid-session SDR→HDR transition (a game + // entering HDR — the host re-inits its encoder) the Welcome flag would never reflect. + // Non-blocking; nil for an SDR stream. + if let meta = try? connection.nextHdrMeta(timeoutMs: 0) { + presenter.setHdrMeta(meta) } guard let au = try connection.nextAU(timeoutMs: 100) else { continue } onFrame?(au) @@ -144,12 +164,20 @@ public final class Stage2Pipeline { awaitingIDR = false // a fresh IDR re-anchored decode — recovery complete } guard let f = format, token.isLive else { continue } - if !decoder.decode(au: au, format: f) { - // Submit/decoder error: drop the session and re-gate on the next IDR's - // in-band parameter sets (a delta frame can't recover) — stage-1's policy — - // and keep asking for that IDR (infinite GOP) until one re-anchors decode. + if decoder.decode(au: au, format: f) { + decodeFailRun = 0 + } else { + // Submit/decoder error: drop the session and re-gate on the next IDR's in-band + // parameter sets (a delta frame can't recover) and keep asking for that IDR. decoder.reset() awaitingIDR = true + decodeFailRun += 1 + // ~3 s of solid failure in a 4:4:4 session (and only there — a 4:2:0 loss + // recovers within a GOP) ⇒ 4:4:4 isn't decodable here; end the session. + if connection.isChroma444, decodeFailRun >= 180 { + if token.isLive { onSessionEnd?() } + break + } } } catch { if token.isLive { onSessionEnd?() } @@ -159,22 +187,30 @@ public final class Stage2Pipeline { } thread.name = "punktfunk-stage2-pump" thread.qualityOfService = .userInteractive + pumpJoinable = true thread.start() } - /// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp - /// capture→present at `targetPresentNs` — the display link's target present instant, already - /// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`). + /// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp capture→present at + /// `targetPresentNs` — the display link's target present instant, already converted to + /// `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`). public func renderTick(targetPresentNs: Int64) { guard let frame = ring.take() else { return } guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return } presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs) } - /// Stop the pump (≤ one poll timeout) and drop the decode session. Does not close the - /// connection. A restart needs a fresh Stage2Pipeline (cancel is permanent). + /// Stop the pump (≤ one poll timeout) and drop the decode session. MAIN THREAD; idempotent. Does not + /// close the connection. A restart needs a fresh Stage2Pipeline (cancel is permanent). public func stop() { token.cancel() + // Join the pump (bounded: ≤ one nextAU poll + an in-flight decode) before resetting the decoder, + // so the pump can't rebuild a session right after the reset. Only the first stop joins; a + // repeat/deinit stop skips the already-drained semaphore. + if pumpJoinable { + pumpJoinable = false + _ = pumpStopped.wait(timeout: .now() + 0.5) + } decoder.reset() recovery.bind(nil) // stop requesting keyframes once the session is torn down } @@ -182,8 +218,8 @@ public final class Stage2Pipeline { deinit { token.cancel() } /// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME` - /// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the - /// target present time (when the frame is actually on glass), not the moment we drew. + /// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the target + /// present time (when the frame is actually on glass), not the moment we drew. public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 { let caNow = CACurrentMediaTime() var ts = timespec() diff --git a/clients/apple/Sources/PunktfunkKit/Stage444Probe.swift b/clients/apple/Sources/PunktfunkKit/Stage444Probe.swift new file mode 100644 index 0000000..474189c --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/Stage444Probe.swift @@ -0,0 +1,83 @@ +// Runtime 4:4:4 HEVC decode-capability probe. +// +// We advertise `VIDEO_CAP_444` (so the host upgrades to a full-chroma 4:4:4 stream) ONLY when this +// device can decode 4:4:4 HEVC *in hardware* — software 4:4:4 decode works but is far too slow for a +// real-time stream at the negotiated resolution, so a software-only device must keep 4:2:0. +// +// `VTIsHardwareDecodeSupported(HEVC)` and the HEVC-decoder-capabilities dictionary report HEVC HW +// decode but expose nothing about `chroma_format_idc`, so the only reliable signal is to actually +// create a *hardware-required* `VTDecompressionSession` for a tiny synthetic 4:4:4 keyframe and +// confirm it both creates and decodes to the expected biplanar 4:4:4 pixel format. Validated on an +// Apple M3 (HW 4:4:4 8- and 10-bit decode to `444v`/`x444`); a software-only decoder fails the +// hardware-required create and we fall back to 4:2:0. +// +// The probe blobs are 256×256 (above the hardware decoder's minimum-dimension floor — a 16×16 clip +// is rejected for ALL chroma formats, including 4:2:0) HEVC Range-Extensions keyframes generated +// offline with libx265; see scripts notes. Results are cached (device-static) in lazy statics. + +import CoreMedia +import CoreVideo +import Foundation +import VideoToolbox + +public enum Stage444Probe { + /// True iff this device hardware-decodes 8-bit 4:4:4 HEVC (the host's current 4:4:4 path — + /// BT.709 limited `yuv444p`). Cached after first evaluation. + public static let hwDecode444_8bit: Bool = probeHardware444( + au: Probe444Blobs.au444_8bit, + want: kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange, + fullRangeSibling: kCVPixelFormatType_444YpCbCr8BiPlanarFullRange) + + /// True iff this device hardware-decodes 10-bit 4:4:4 HEVC (the 4:4:4 ∩ HDR/10-bit intersection). + /// Cached after first evaluation. + public static let hwDecode444_10bit: Bool = probeHardware444( + au: Probe444Blobs.au444_10bit, + want: kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange, + fullRangeSibling: kCVPixelFormatType_444YpCbCr10BiPlanarFullRange) + + /// Create a hardware-REQUIRED `VTDecompressionSession` for the synthetic 4:4:4 keyframe and + /// decode it, returning true only when the decoder produces the expected (video- or full-range) + /// biplanar 4:4:4 pixel format. Any failure (no hardware path, wrong output format, decode error) + /// → false → we keep 4:2:0. + private static func probeHardware444( + au auBytes: [UInt8], want: OSType, fullRangeSibling: OSType + ) -> Bool { + let data = Data(auBytes) + guard let format = AnnexB.formatDescription(fromIDR: data) else { return false } + // Require a hardware decoder — a software false-positive would make us advertise 4:4:4 and + // then decode every real frame on the CPU, blowing the latency budget. + let spec: [CFString: Any] = [ + kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true, + ] + let attrs: [CFString: Any] = [ + kCVPixelBufferPixelFormatTypeKey: want, + kCVPixelBufferMetalCompatibilityKey: true, + ] + var session: VTDecompressionSession? + let created = VTDecompressionSessionCreate( + allocator: kCFAllocatorDefault, formatDescription: format, + decoderSpecification: spec as CFDictionary, imageBufferAttributes: attrs as CFDictionary, + outputCallback: nil, decompressionSessionOut: &session) + guard created == noErr, let session else { return false } + defer { VTDecompressionSessionInvalidate(session) } + + let au = AccessUnit(data: data, ptsNs: 0, frameIndex: 0, flags: 0) + guard let sample = AnnexB.sampleBuffer(au: au, format: format) else { return false } + + var produced: OSType = 0 + let done = DispatchSemaphore(value: 0) + let status = VTDecompressionSessionDecodeFrame( + session, sampleBuffer: sample, + flags: [._EnableAsynchronousDecompression], infoFlagsOut: nil + ) { status, _, imageBuffer, _, _ in + if status == noErr, let imageBuffer { + produced = CVPixelBufferGetPixelFormatType(imageBuffer) + } + done.signal() + } + guard status == noErr else { return false } + VTDecompressionSessionWaitForAsynchronousFrames(session) + _ = done.wait(timeout: .now() + 1.0) + return produced == want || produced == fullRangeSibling + } +} diff --git a/clients/apple/Sources/PunktfunkKit/StreamView.swift b/clients/apple/Sources/PunktfunkKit/StreamView.swift index 8449984..5d6768d 100644 --- a/clients/apple/Sources/PunktfunkKit/StreamView.swift +++ b/clients/apple/Sources/PunktfunkKit/StreamView.swift @@ -137,8 +137,8 @@ public struct StreamView: NSViewRepresentable { public final class StreamLayerView: NSView { private let displayLayer = AVSampleBufferDisplayLayer() private var pump: StreamPump? - /// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a - /// display link instead of the StreamPump → displayLayer path. nil = stage-1 (default). + /// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a display link instead of the + /// StreamPump → displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle). var presentMeter: LatencyMeter? private var stage2: Stage2Pipeline? private var stage2Link: CADisplayLink? @@ -638,7 +638,7 @@ public final class StreamLayerView: NSView { private func teardownStage2() { stage2Link?.invalidate() stage2Link = nil - stage2?.stop() + stage2?.stop() // stops the pump (synchronous join) + drops the decode session stage2 = nil metalLayer?.removeFromSuperlayer() metalLayer = nil diff --git a/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift b/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift index 58a402c..6d4af7c 100644 --- a/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift +++ b/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift @@ -92,8 +92,8 @@ public final class StreamViewController: UIViewController { public private(set) var connection: PunktfunkConnection? private var pump: StreamPump? private var observers: [NSObjectProtocol] = [] - /// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a - /// CADisplayLink instead of the StreamPump → displayLayer path. nil = stage-1 (default). + /// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a CADisplayLink instead of the + /// StreamPump → displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle). var presentMeter: LatencyMeter? private var stage2: Stage2Pipeline? private var stage2Link: CADisplayLink? @@ -155,19 +155,58 @@ public final class StreamViewController: UIViewController { } #if os(iOS) - // Pointer lock is only meaningful on iPad (iPhone has no hardware-pointer lock) and - // only when capture is engaged. The system additionally requires full-screen + frontmost - // and may drop it (Slide Over/Stage Manager/backgrounding) — verified in setCaptured(). - public override var prefersPointerLocked: Bool { - captured && UIDevice.current.userInterfaceIdiom == .pad + /// Whether the user wants the mouse/trackpad pointer CAPTURED (pointer lock → relative + /// movement, the gaming default) rather than forwarded as an absolute position (desktop + /// use). Read live from UserDefaults so it tracks the Settings toggle; defaults to on when + /// unset. iPad-only — gated again in `prefersPointerLocked`. + private var pointerCaptureEnabled: Bool { + UserDefaults.standard.object(forKey: DefaultsKey.pointerCapture) as? Bool ?? true } + + /// Whether the pointer should be CAPTURED right now: iPad, capture engaged, and the user + /// hasn't opted into the absolute (desktop) pointer. The system additionally requires + /// full-screen + frontmost and may drop the lock (Slide Over/Stage Manager/backgrounding) — + /// syncPointerLock() handles the actual grant/drop and falls back to absolute when unlocked. + private var wantsPointerLock: Bool { + captured && pointerCaptureEnabled && UIDevice.current.userInterfaceIdiom == .pad + } + + public override var prefersPointerLocked: Bool { wantsPointerLock } public override var prefersHomeIndicatorAutoHidden: Bool { true } - // If SwiftUI's UIHostingController reparents us, a plain container parent that forwards - // its pointer-lock decision to its children will then reach this VC. (UIHostingController - // itself does not consult children, which is why GCMouse deltas can never arrive there — - // the touch path, always forwarded, is the unconditional fallback.) - public override var childViewControllerForPointerLock: UIViewController? { self } + // NOTE: we deliberately do NOT override `childViewControllerForPointerLock`. The default + // returns nil, which tells the system to use THIS controller's own `prefersPointerLocked` — + // exactly what we want, since `PointerLockChain` forces our SwiftUI ancestors to forward the + // downward walk to us and we are the terminal anchor. Returning `self` here would make the + // system ask the same controller forever (it keeps delegating to the returned child) → + // unbounded recursion → stack overflow once the chain actually reaches us. + + /// (Re)build or tear down the forced pointer-lock forwarding chain from this controller to the + /// window root so the system actually resolves our `prefersPointerLocked`. Safe to call + /// repeatedly — it no-ops until the view is in a window with a parent chain, and re-runs from + /// the appearance/parent callbacks once SwiftUI has placed us. + private func updatePointerLockChain() { + // Engaging needs a live parent chain to the window root; disengaging is always safe and + // must run even after the view has left the window (session teardown) so the stamped + // SwiftUI ancestors are cleared. + if wantsPointerLock, view.window != nil { + PointerLockChain.engage(self) + } else { + PointerLockChain.disengage(self) + } + } + + public override func viewDidAppear(_ animated: Bool) { + super.viewDidAppear(animated) + // SwiftUI places us in the hierarchy AFTER start()'s setCaptured(true), and may reparent us + // later — re-anchor the chain here so a lock requested before we had a parent still lands. + updatePointerLockChain() + } + + public override func didMove(toParent parent: UIViewController?) { + super.didMove(toParent: parent) + updatePointerLockChain() // chain shape changed — re-anchor (or no-op if not yet in a window) + } #endif func start( @@ -200,7 +239,14 @@ public final class StreamViewController: UIViewController { // Indirect pointer (mouse/trackpad with no lock) → absolute cursor + buttons, routed // through InputCapture so the forwarding gate and release-on-blur apply uniformly. streamView.onPointerMoveAbs = { [weak self] p in - self?.inputCapture?.sendMouseAbs( + guard let self else { return } + if iosInputDebug { + // Whether ANY UIKit pointer movement reaches us while the scene is LOCKED tells us + // if the trackpad (which may not be a GCMouse) can still be captured via UIKit. + iosInputLog.debug( + "UIKit pointer move x=\(p.x, privacy: .public) y=\(p.y, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public) gcFwd=\(self.inputCapture?.gcMouseForwarding == true, privacy: .public)") + } + self.inputCapture?.sendMouseAbs( x: p.x, y: p.y, surfaceWidth: p.w, surfaceHeight: p.h) } streamView.onPointerButton = { [weak self] button, down in @@ -210,7 +256,12 @@ public final class StreamViewController: UIViewController { // UNLOCKED regime; while locked, GCMouse's scroll handler owns it — mirror the // sendMouseAbs !gcMouseForwarding gate so the two can't double-send. streamView.onScroll = { [weak self] dx, dy in - guard let self, self.inputCapture?.gcMouseForwarding == false else { return } + guard let self else { return } + if iosInputDebug { + iosInputLog.debug( + "UIKit scroll dx=\(dx, privacy: .public) dy=\(dy, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public)") + } + guard self.inputCapture?.gcMouseForwarding == false else { return } self.inputCapture?.sendScroll(dx: dx, dy: dy) } @@ -315,7 +366,7 @@ public final class StreamViewController: UIViewController { ) { let metal = pipeline.layer // Composites OVER the idle (un-enqueued in stage-2) AVSampleBufferDisplayLayer base. - // (contentsScale + frame + drawableSize are all set by layoutMetalLayer() just below.) + // (contentsScale + frame are set by layoutMetalLayer() just below.) streamView.layer.addSublayer(metal) metalLayer = metal stage2 = pipeline @@ -372,7 +423,7 @@ public final class StreamViewController: UIViewController { private func teardownStage2() { stage2Link?.invalidate() stage2Link = nil - stage2?.stop() + stage2?.stop() // stops the pump (synchronous join) + drops the decode session stage2 = nil metalLayer?.removeFromSuperlayer() metalLayer = nil @@ -392,6 +443,7 @@ public final class StreamViewController: UIViewController { captured = false } setNeedsUpdateOfPrefersPointerLocked() + updatePointerLockChain() // (re)anchor the SwiftUI ancestors so the lock actually resolves syncPointerLock() // resolve cursor + GCMouse/absolute routing for the current state let onCaptureChange = onCaptureChange let captured = captured @@ -420,7 +472,7 @@ public final class StreamViewController: UIViewController { pointerInteraction?.invalidate() // re-resolve the hidden/visible cursor for the state if iosInputDebug { iosInputLog.debug( - "pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public)") + "pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public) useGCMouse=\(useGCMouse, privacy: .public) [\(self.inputCapture?.attachedMiceSummary ?? "n/a", privacy: .public)]") } } #endif diff --git a/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift b/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift index fa84521..127b3f9 100644 --- a/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift +++ b/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift @@ -49,11 +49,10 @@ public final class VideoDecoder: @unchecked Sendable { /// pump can re-gate on the next IDR. private let onDecodeError: @Sendable (OSStatus) -> Void - /// Latest source HDR mastering metadata (from `PunktfunkConnection.nextHdrMeta`), attached to - /// each decoded HDR pixel buffer so the compositor tone-maps from the real grade. Guarded by its - /// own lock — written by the pump thread, read on the VT decode callback. - private let metaLock = NSLock() - private var hdrMeta: PunktfunkConnection.HdrMeta? + /// Whether the negotiated stream is full-chroma 4:4:4 (`connection.isChroma444`), set once at + /// session start before any decode. Selects the 4:4:4 decode pixel format (orthogonal to bit + /// depth / HDR). Read inside `createSessionLocked` under `lock`. + private var chroma444 = false public init( onDecoded: @escaping @Sendable (ReadyFrame) -> Void, @@ -65,12 +64,13 @@ public final class VideoDecoder: @unchecked Sendable { deinit { teardown() } - /// Set the source HDR mastering metadata (drained from `PunktfunkConnection.nextHdrMeta`). It's - /// attached to subsequent decoded HDR pixel buffers. Thread-safe; cheap to call on each update. - public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) { - metaLock.lock() - hdrMeta = meta - metaLock.unlock() + /// Select the chroma subsampling of the decode output (4:2:0 vs full-chroma 4:4:4). Call once at + /// session start, before decoding, from `connection.isChroma444`. Takes effect on the next + /// session (re)build. Thread-safe. + public func setChroma444(_ on: Bool) { + lock.lock() + chroma444 = on + lock.unlock() } /// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The @@ -135,8 +135,10 @@ public final class VideoDecoder: @unchecked Sendable { /// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function — i.e. the host /// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the - /// HEVC VUI, so this tracks the *stream*, switching dynamically when the user toggles HDR - /// (the host re-emits parameter sets with the new VUI → a new format desc → session rebuild). + /// HEVC VUI, so this picks the decode bit depth (10-bit P010/x444 vs 8-bit NV12/444v) from the + /// stream. The present-side HDR config (colorspace/EDR/shader) is latched once per session from + /// the Welcome (`connection.isHDR`), which the host does NOT flip mid-session — so this predicate + /// and that config agree for the session (a `#if DEBUG` assert in the presenter guards it). static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool { guard let tf = CMFormatDescriptionGetExtension( @@ -157,11 +159,18 @@ public final class VideoDecoder: @unchecked Sendable { session = nil format = nil + // Decode pixel format is a 2×2 of (chroma, depth/HDR), both biplanar so the presenter binds + // plane 0 = luma, plane 1 = interleaved chroma uniformly — 4:4:4 just delivers a full-size + // chroma plane. 10-bit (P010 / `x444`) for HDR (PQ/HLG), 8-bit (NV12 / `444v`) otherwise. let hdr = Self.isHDRFormat(newFormat) - let pixelFormat = - hdr - ? kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010 (10-bit) - : kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12 (8-bit) + let pixelFormat: OSType = { + switch (chroma444, hdr) { + case (false, false): return kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12 + case (false, true): return kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010 + case (true, false): return kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange // 444v + case (true, true): return kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange // x444 + } + }() let imageAttrs: [CFString: Any] = [ kCVPixelBufferMetalCompatibilityKey: true, kCVPixelBufferPixelFormatTypeKey: pixelFormat, @@ -169,11 +178,20 @@ public final class VideoDecoder: @unchecked Sendable { var callback = VTDecompressionOutputCallbackRecord( decompressionOutputCallback: decoderOutputCallback, decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque()) + // 4:4:4 sessions REQUIRE a hardware decoder: we only advertise 4:4:4 when the hardware probe + // passed, so a hardware-incapable mode (e.g. a resolution past the HW 4:4:4 ceiling) must fail + // HERE, synchronously, letting the pump's backstop end the session — rather than silently + // falling back to a software 4:4:4 decoder far too slow for a real-time stream. 4:2:0 keeps the + // software fallback (nil spec) as a robustness net. + let spec: CFDictionary? = + chroma444 + ? [kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true] as CFDictionary + : nil var newSession: VTDecompressionSession? let status = VTDecompressionSessionCreate( allocator: kCFAllocatorDefault, formatDescription: newFormat, - decoderSpecification: nil, // hardware by default + decoderSpecification: spec, imageBufferAttributes: imageAttrs as CFDictionary, outputCallback: &callback, decompressionSessionOut: &newSession) @@ -195,26 +213,17 @@ public final class VideoDecoder: @unchecked Sendable { // pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively. let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default) let ptsNs = p.value > 0 ? UInt64(p.value) : 0 - // HDR iff the decoder produced a 10-bit P010 buffer (we only request P010 for PQ streams). + // HDR iff the decoder produced a 10-bit buffer (we only request a 10-bit format for PQ/HLG + // streams). Covers 4:2:0 (P010) and 4:4:4 (`x444`), video- and full-range, so a 10-bit 4:4:4 + // HDR frame isn't misclassified as SDR. (The mastering metadata is applied to the presenter's + // CAMetalLayer via CAEDRMetadata, not to this source buffer — a separate-drawable presenter + // never composites the source buffer's attachments, so attaching them here would be dead.) + let fmt = CVPixelBufferGetPixelFormatType(imageBuffer) let isHDR = - CVPixelBufferGetPixelFormatType(imageBuffer) - == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange - // Attach the source's mastering display + content light level (ST.2086 / CEA-861.3) so the - // compositor tone-maps from the real grade rather than inferring from the PQ colourspace - // alone. The SEI byte payloads map 1:1 to these CVImageBuffer attachment keys. - if isHDR { - metaLock.lock() - let meta = hdrMeta - metaLock.unlock() - if let meta { - CVBufferSetAttachment( - imageBuffer, kCVImageBufferMasteringDisplayColorVolumeKey, - meta.masteringDisplayColorVolume() as CFData, .shouldPropagate) - CVBufferSetAttachment( - imageBuffer, kCVImageBufferContentLightLevelInfoKey, - meta.contentLightLevelInfo() as CFData, .shouldPropagate) - } - } + fmt == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange + || fmt == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange + || fmt == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange + || fmt == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange onDecoded( ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR)) } diff --git a/clients/apple/Tests/PunktfunkKitTests/MetalPresenterTests.swift b/clients/apple/Tests/PunktfunkKitTests/MetalPresenterTests.swift index 3d4c71b..eaffe94 100644 --- a/clients/apple/Tests/PunktfunkKitTests/MetalPresenterTests.swift +++ b/clients/apple/Tests/PunktfunkKitTests/MetalPresenterTests.swift @@ -1,11 +1,13 @@ import XCTest #if canImport(Metal) +import CoreVideo import Metal +import QuartzCore @testable import PunktfunkKit final class MetalPresenterTests: XCTestCase { - /// `MetalVideoPresenter.init?()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUV→RGB + /// `MetalVideoPresenter.make()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUV→RGB /// fragment shaders plus the Catmull-Rom luma sampler). A `nil` result on a GPU-equipped host /// means a shader failed to compile — this catches a malformed shader before it silently /// degrades stage-2 to a stage-1 fallback on device. @@ -14,8 +16,54 @@ final class MetalPresenterTests: XCTestCase { throw XCTSkip("no Metal device available in this environment") } XCTAssertNotNil( - MetalVideoPresenter(), + MetalVideoPresenter.make(), "stage-2 Metal shaders failed to compile (presenter init returned nil)") } + + /// The HDR fix: `configure(hdr:)` must put the layer into the BT.2020-PQ EDR configuration with a + /// reference-white anchor (`edrMetadata`) — the missing anchor was what made HDR render "too + /// bright". SDR must use the plain 8-bit path with EDR off and no metadata. A mid-session flip is a + /// per-mode reconfigure, so the round trip back to SDR must fully restore the SDR config. + func testConfigureHDRSetsEDRAnchor() throws { + guard let presenter = MetalVideoPresenter.make() else { + throw XCTSkip("no Metal device available in this environment") + } + presenter.configure(hdr: true) + XCTAssertEqual(presenter.layer.pixelFormat, .rgba16Float, "HDR uses an EDR-capable drawable") + XCTAssertNotNil(presenter.layer.colorspace, "HDR layer must be tagged (itur_2100_PQ)") + XCTAssertTrue( + presenter.layer.wantsExtendedDynamicRangeContent, "EDR must be requested on all platforms") + XCTAssertNotNil( + presenter.layer.edrMetadata, + "HDR must anchor reference white via edrMetadata (the fix for 'too bright')") + + // Mid-session HDR→SDR flip: the 8-bit path, EDR off, no metadata. + presenter.configure(hdr: false) + XCTAssertEqual(presenter.layer.pixelFormat, .bgra8Unorm, "SDR uses the plain 8-bit drawable") + XCTAssertFalse(presenter.layer.wantsExtendedDynamicRangeContent) + XCTAssertNil(presenter.layer.edrMetadata) + } + + /// `render` with a freshly-allocated NV12 buffer must present without crashing or hanging — the + /// main-thread present path is the highest-risk part of the stage-2 rewrite. (A headless CI with no + /// display can still allocate a drawable from a CAMetalLayer; if it can't, render returns false, + /// which is also a valid non-crashing outcome.) + func testRenderDoesNotCrashOnNV12Frame() throws { + guard let presenter = MetalVideoPresenter.make() else { + throw XCTSkip("no Metal device available in this environment") + } + presenter.configure(hdr: false) + var pb: CVPixelBuffer? + let attrs: [CFString: Any] = [kCVPixelBufferMetalCompatibilityKey: true] + let status = CVPixelBufferCreate( + kCFAllocatorDefault, 256, 256, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, + attrs as CFDictionary, &pb) + guard status == kCVReturnSuccess, let pixelBuffer = pb else { + throw XCTSkip("could not allocate a test pixel buffer") + } + // Just asserting it returns (true or false) without trapping — the layer may have no drawable + // source headless, so a false return is acceptable. + _ = presenter.render(pixelBuffer, isHDR: false) + } } #endif diff --git a/clients/apple/Tests/PunktfunkKitTests/Stage444Tests.swift b/clients/apple/Tests/PunktfunkKitTests/Stage444Tests.swift new file mode 100644 index 0000000..e78d9f3 --- /dev/null +++ b/clients/apple/Tests/PunktfunkKitTests/Stage444Tests.swift @@ -0,0 +1,68 @@ +// 4:4:4 decode-path coverage: the hardware-capability probe is stable/cached, and a real 4:4:4 HEVC +// keyframe decodes through VideoDecoder to a biplanar 4:4:4 pixel buffer. Reuses the same synthetic +// 4:4:4 blobs the runtime probe ships with. + +import CoreVideo +import VideoToolbox +import XCTest +@testable import PunktfunkKit + +private final class FrameBox: @unchecked Sendable { + let lock = NSLock() + var frame: ReadyFrame? + var error: OSStatus? +} + +final class Stage444Tests: XCTestCase { + /// The capability probe is device-static and cached — reading it twice must return the same value + /// (and must never crash, including where 4:4:4 is unsupported → false). + func testProbeIsStableAndCached() { + XCTAssertEqual(Stage444Probe.hwDecode444_8bit, Stage444Probe.hwDecode444_8bit) + XCTAssertEqual(Stage444Probe.hwDecode444_10bit, Stage444Probe.hwDecode444_10bit) + } + + /// A real 8-bit 4:4:4 HEVC keyframe (the embedded probe blob) decodes through `VideoDecoder` with + /// `setChroma444(true)` to a 256×256 biplanar 4:4:4 (`444v`/`444f`) buffer classified SDR. + /// (4:4:4 sessions require a hardware decoder — skip where there isn't one, which is exactly where + /// the client wouldn't advertise 4:4:4 anyway.) + func testVideoDecoderDecodes444() throws { + try XCTSkipUnless( + Stage444Probe.hwDecode444_8bit, "no hardware 4:4:4 decode on this device") + let data = Data(Probe444Blobs.au444_8bit) + let format = try XCTUnwrap( + AnnexB.formatDescription(fromIDR: data), "the 4:4:4 blob must yield a format description") + let au = AccessUnit(data: data, ptsNs: 7_000_000, frameIndex: 0, flags: 0) + + let box = FrameBox() + let done = DispatchSemaphore(value: 0) + let decoder = VideoDecoder( + onDecoded: { f in box.lock.lock(); box.frame = f; box.lock.unlock(); done.signal() }, + onDecodeError: { s in box.lock.lock(); box.error = s; box.lock.unlock(); done.signal() }) + decoder.setChroma444(true) + + XCTAssertTrue(decoder.decode(au: au, format: format), "4:4:4 frame submit should succeed") + XCTAssertEqual(done.wait(timeout: .now() + 10), .success, "the decode callback must fire") + decoder.reset() + + box.lock.lock(); let frame = box.frame; let error = box.error; box.lock.unlock() + XCTAssertNil(error.map { "decode error \($0)" }) + let ready = try XCTUnwrap(frame, "a 4:4:4 ReadyFrame must be delivered") + XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), 256) + XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), 256) + let pf = CVPixelBufferGetPixelFormatType(ready.pixelBuffer) + XCTAssertTrue( + pf == kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange + || pf == kCVPixelFormatType_444YpCbCr8BiPlanarFullRange, + "expected a biplanar 4:4:4 8-bit buffer, got \(fourCCString(pf))") + XCTAssertFalse(ready.isHDR, "an 8-bit BT.709 4:4:4 stream is SDR") + // The chroma plane (plane 1) must be FULL resolution for 4:4:4 (vs half for 4:2:0) — this is + // what lets the unchanged shader sample chroma at the luma UV. + XCTAssertEqual(CVPixelBufferGetWidthOfPlane(ready.pixelBuffer, 1), 256) + XCTAssertEqual(CVPixelBufferGetHeightOfPlane(ready.pixelBuffer, 1), 256) + } + + private func fourCCString(_ t: OSType) -> String { + let b = [UInt8(t >> 24 & 0xff), UInt8(t >> 16 & 0xff), UInt8(t >> 8 & 0xff), UInt8(t & 0xff)] + return String(bytes: b, encoding: .ascii) ?? "\(t)" + } +} diff --git a/design/apple-stage2-presenter.md b/design/apple-stage2-presenter.md index e6f11c4..e84ad2c 100644 --- a/design/apple-stage2-presenter.md +++ b/design/apple-stage2-presenter.md @@ -3,12 +3,61 @@ title: "Apple Stage-2 Presenter (handoff)" description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left." --- -> **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer` -> stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit -> `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`; -> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed -> to design rationale + open items — the shipped `.swift` code is the source of truth for the -> decode/present/measurement walkthrough. +> **Status:** SHIPPED as the **default** presenter (stage-1 `AVSampleBufferDisplayLayer` is the +> Metal-unavailable / DEBUG fallback). HDR corrected and **4:4:4** added on top of the proven +> main-thread present path (the hosting view's `CADisplayLink` drives `render` per vsync). Code: +> `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,Stage444Probe,LatencyMeter}.swift`. +> This doc is trimmed to design rationale + open items — the shipped `.swift` code is the source of +> truth for the decode/present/measurement walkthrough. +> +> **HDR (the "too bright" fix).** The presenter renders to a *separate* CAMetalLayer drawable, so the +> mastering metadata that was attached to the source `CVPixelBuffer` was never composited — and with no +> reference-white anchor the system rendered the PQ signal far too bright. The fix is to keep the +> PQ-passthrough shader (BT.2020 limited→full → PQ R′G′B′ as-is) and put the anchor **on the layer**: +> `colorspace = itur_2100_PQ`, `wantsExtendedDynamicRangeContent = true` (on **all** platforms — the old +> `#if os(macOS)` guard left iOS/tvOS EDR half-engaged), and +> `edrMetadata = CAEDRMetadata.hdr10(displayInfo:contentInfo:opticalOutputScale: 203)`. 203 nits = +> BT.2408 HDR reference white anchors diffuse white at EDR 1.0; a larger value renders dimmer. The +> mastering/CLL blobs (host `0xCE` datagram) now refine `edrMetadata` (drained by the pump, +> `setHdrMeta` hops the layer write to main) rather than being attached to a never-composited source +> buffer. **Needs on-glass validation on a real EDR panel.** +> +> **Mid-session SDR↔HDR.** The control-plane colour (`connection.isHDR`, from the Welcome) is fixed per +> session, but the host can re-init its encoder mid-session (a game entering HDR), so the HEVC VUI — and +> the decoder's `frame.isHDR` — flips. The presenter follows the **decoded frame**, not the latched +> session flag: `render` calls the idempotent `configure(hdr:)` every frame, so on a flip it +> reconfigures the layer (per-mode pixel format `bgra8Unorm` SDR / `rgba16Float` HDR, colorspace, EDR) +> and selects the matching shader — all synchronously on the main thread (the present path is +> main-thread, so no cross-thread hop is needed). The last `0xCE` grade is cached so an SDR→HDR +> reconfigure re-applies the real mastering metadata instead of the bare anchor. The pump drains `0xCE` +> **unconditionally** (not gated on the Welcome flag) so a session that starts SDR still gets mastering +> metadata when it goes HDR. A ≤2-frame transition flash on the rare flip is accepted. +> +> **Pacing.** The hosting view owns a **main-runloop `CADisplayLink`** (a weak `DisplayLinkProxy` +> breaks the retain cycle) that calls `renderTick` once per vsync. `renderTick` pops the **newest** +> ready frame from the 1-slot ring (older undisplayed frames dropped — lowest latency, no smoothing +> buffer) and, if there is one, draws it via **manual `layer.nextDrawable()`** and presents at the next +> vsync; on an idle vsync (no new frame) it does nothing and the compositor holds the last presented +> drawable (no idle re-render — matters at 5K). `drawableSize` is set **before** `nextDrawable` (it +> doesn't track bounds, defaults to 0), so allocation always uses the decoded size. `maximumDrawableCount +> = 3`. macOS `displaySyncEnabled = **false**`: the display link is the single pacing source, so leaving +> the layer's own vsync wait on would *also* block `present`/`nextDrawable` on the main thread and +> serialize it to the display — the cause of the fullscreen judder; disabling it lets present return +> promptly. Present is stamped at the display link's `targetTimestamp` projected to `CLOCK_REALTIME` +> (the actual on-glass instant, <1 vsync after the draw — accurate for the HUD). +> +> *(History: an off-main `CAMetalDisplayLink` variant and an off-main blocking-render present thread +> were both tried and **reverted** — both measured slower on macOS *and* iPad than this main-thread +> display-link path, whose real judder fix was simply `displaySyncEnabled = false`, not moving present +> off-thread. Measured ~11 ms p50 on the main-thread path.)* +> +> **4:4:4.** Chroma, bit-depth, and colorimetry are orthogonal: the decode pixel format is a 2×2 of +> `(chroma, HDR)` → `420v/x420/444v/x444` (all biplanar, so the existing shaders sample a full-size +> chroma plane unchanged); the shader is keyed only on HDR. The client advertises `VIDEO_CAP_444` only +> when `Stage444Probe` confirms **hardware** 4:4:4 decode (a hardware-required `VTDecompressionSession` +> over an embedded 256×256 4:4:4 keyframe — software 4:4:4 is too slow for real-time; validated on M3: +> `444v`/`x444` produced). A bounded pump backstop ends a 4:4:4 session that persistently fails to +> decode (gated to 4:4:4 sessions, so 4:2:0 loss-recovery is untouched). ## Why stage 2 (design rationale) @@ -47,10 +96,28 @@ Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → dis ## Open items -- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit - `…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap). +- **On-glass HDR validation** — eyeball `edrMetadata` + `opticalOutputScale: 203` on a real EDR panel + (XDR display) against stage-1 side-by-side: diffuse white should sit at SDR-white level with only + highlights climbing. The reference white is a single named constant (`hdrReferenceWhiteNits`) for easy + tuning. (Needs a Windows HDR host; the Linux host is 8-bit SDR only.) +- **On-glass 4:4:4 validation** — confirm a `PUNKTFUNK_444` host (RTX box) streams a 4:4:4 session the + client decodes in hardware (HUD shows the resolved chroma); verify the resolution-ceiling backstop by + forcing a too-large 4:4:4 mode. - **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture - term. -- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can come - later if frames look uneven. -- **iOS / iPadOS / tvOS stage-2 variants.** + term and confirm the main-thread display-link present p50 holds at ~11 ms (and isn't regressed by the + per-frame `configure` / HDR-anchor work). +- **Smoothing / pacing policy** — present newest-ready for lowest latency today; an optional even-pacing + policy (`present(_:afterMinimumDuration:)`) can come later if frames look uneven. +- **4:4:4 runtime downgrade-reconnect** — today a persistently-undecodable 4:4:4 session ends cleanly + (the live 4:4:4 decode requires hardware, so a resolution-ceiling miss fails the session create + *synchronously* and the pump backstop ends it — no black-screen loop); auto-reconnecting at 4:2:0 + (dropping `VIDEO_CAP_444`) is a future refinement. +- **HLG** — `isHDR`/`isHDRFormat` fold HLG (transfer 18) in with PQ, but the presenter is PQ-only + (`itur_2100_PQ` + `hdr10` EDR), so an HLG stream would be mis-toned. Latent — no host emits HLG + (the stack is BT.2020 **PQ** only). A real HLG path (`itur_2100_HLG`, no PQ reference-white anchor) + is future work; until then HLG should be treated as out of scope. +- **Full-range** — the shaders hardcode limited→full expansion and the decoder requests the + `*VideoRange` formats regardless of `connection.colorFullRange`; VideoToolbox range-converts a + full-range source to video range on decode, so it stays self-consistent (mild level compression on + genuinely full-range content, which no host emits). Pre-existing; wire `colorFullRange` into the + range constants eventually.