feat(apple): Improve presenter
feat(apple): add cursor capture on iPad
This commit is contained in:
@@ -144,11 +144,25 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc
|
|||||||
`test-loopback.sh` (Swift client vs synthetic punktfunk1-hosts on loopback — runs on macOS;
|
`test-loopback.sh` (Swift client vs synthetic punktfunk1-hosts on loopback — runs on macOS;
|
||||||
includes the pairing ceremony + `--require-pairing` gate),
|
includes the pairing ceremony + `--require-pairing` gate),
|
||||||
`RemoteFirstLightTests` (full pipeline over the LAN). See
|
`RemoteFirstLightTests` (full pipeline over the LAN). See
|
||||||
[`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter**
|
[`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter is now the DEFAULT**
|
||||||
(`VTDecompressionSession` + `CAMetalLayer`) is built and live-validated on glass behind the opt-in
|
(stage-1 is the Metal-unavailable / DEBUG fallback): explicit `VTDecompressionSession` decode →
|
||||||
`punktfunk.presenter` flag (~11 ms p50 capture→present), to become the default after a few
|
`CAMetalLayer`, presented from the hosting view's **main-runloop `CADisplayLink`** (`renderTick` pops
|
||||||
resolution/HDR checks. Next: make stage 2 the default, glass-to-glass numbers via
|
the newest ready frame per vsync; macOS `displaySyncEnabled = false` is the real fullscreen-judder fix,
|
||||||
`tools/latency-probe`, iOS/iPadOS/tvOS variants.
|
~11 ms p50). *(An off-main `CAMetalDisplayLink` and an off-main blocking-render present thread were
|
||||||
|
both tried and reverted — both measured slower on macOS and iPad.)* **HDR fixed**
|
||||||
|
(`design/apple-stage2-presenter.md`): the "too bright" bug was a missing reference-white anchor — the
|
||||||
|
fix keeps the PQ-passthrough shader and adds `CAEDRMetadata.hdr10(…, opticalOutputScale: 203)` +
|
||||||
|
`wantsExtendedDynamicRangeContent` on the layer (on all platforms; the old `#if os(macOS)` guard left
|
||||||
|
iOS/tvOS EDR half-engaged), routing the 0xCE mastering metadata to the layer (via `setHdrMeta`) instead
|
||||||
|
of a never-composited source buffer. **Mid-session SDR↔HDR** is handled: `render` reconciles the layer
|
||||||
|
per-frame from the decoded `frame.isHDR` (per-mode pixel format `bgra8`/`rgba16Float`), so a game
|
||||||
|
entering HDR mid-stream just reconfigures (last 0xCE grade cached + re-applied; pump drains 0xCE
|
||||||
|
unconditionally). **4:4:4 added**: decode format is a 2×2 `(chroma, HDR)` matrix
|
||||||
|
(`420v/x420/444v/x444`, all biplanar so the shaders are unchanged), advertised (`VIDEO_CAP_444`) only
|
||||||
|
behind a **hardware-required `VTDecompressionSession` probe** (`Stage444Probe`, validated on M3) with a
|
||||||
|
Settings opt-out + a bounded pump backstop for an undecodable 4:4:4 session. *HDR brightness + 4:4:4
|
||||||
|
still need on-glass validation (Windows-HDR / `PUNKTFUNK_444` host).* Next: glass-to-glass numbers via
|
||||||
|
`tools/latency-probe`.
|
||||||
**Linux stage 1 done, first light 2026-06-12** (`clients/linux`, binary
|
**Linux stage 1 done, first light 2026-06-12** (`clients/linux`, binary
|
||||||
`punktfunk-client`): GTK4/libadwaita shell linking `punktfunk-core` directly (no C ABI;
|
`punktfunk-client`): GTK4/libadwaita shell linking `punktfunk-core` directly (no C ABI;
|
||||||
`NativeClient` is now `Sync` — mutexed plane receivers), mDNS host list, TOFU + SPAKE2
|
`NativeClient` is now `Sync` — mutexed plane receivers), mDNS host list, TOFU + SPAKE2
|
||||||
|
|||||||
@@ -129,6 +129,8 @@ final class SessionModel: ObservableObject {
|
|||||||
#endif
|
#endif
|
||||||
}()
|
}()
|
||||||
let hdrCapable = hdrEnabled && displayHDR
|
let hdrCapable = hdrEnabled && displayHDR
|
||||||
|
// 4:4:4 opt-out (default on); the hardware-decode probe below is the real gate.
|
||||||
|
let want444 = (UserDefaults.standard.object(forKey: DefaultsKey.enable444) as? Bool) ?? true
|
||||||
Task.detached(priority: .userInitiated) {
|
Task.detached(priority: .userInitiated) {
|
||||||
// PunktfunkConnection.init blocks on the QUIC handshake — keep it off the main
|
// PunktfunkConnection.init blocks on the QUIC handshake — keep it off the main
|
||||||
// actor. The persistent identity is presented on every connect so a paired
|
// actor. The persistent identity is presented on every connect so a paired
|
||||||
@@ -138,9 +140,21 @@ final class SessionModel: ObservableObject {
|
|||||||
// Advertise 10-bit + HDR10 when enabled: the host upgrades to a BT.2020 PQ Main10 stream
|
// Advertise 10-bit + HDR10 when enabled: the host upgrades to a BT.2020 PQ Main10 stream
|
||||||
// only for actual HDR content (its own gate); the VideoToolbox/Metal present path is
|
// only for actual HDR content (its own gate); the VideoToolbox/Metal present path is
|
||||||
// HDR-capable (P010 + itur_2100_PQ + EDR). 0 keeps the 8-bit BT.709 SDR stream.
|
// HDR-capable (P010 + itur_2100_PQ + EDR). 0 keeps the 8-bit BT.709 SDR stream.
|
||||||
let videoCaps: UInt8 = hdrCapable
|
var videoCaps: UInt8 = hdrCapable
|
||||||
? (PunktfunkConnection.videoCap10Bit | PunktfunkConnection.videoCapHDR)
|
? (PunktfunkConnection.videoCap10Bit | PunktfunkConnection.videoCapHDR)
|
||||||
: 0
|
: 0
|
||||||
|
// Advertise full-chroma 4:4:4 only when allowed AND this device can HARDWARE-decode it
|
||||||
|
// (software 4:4:4 is too slow for real-time). The host content-gates depth, so an
|
||||||
|
// HDR-advertised session can still receive an 8-bit 4:4:4 stream (SDR content) — require
|
||||||
|
// BOTH depths there. Otherwise a no-op (the host emits 4:4:4 only if it too opted in);
|
||||||
|
// `chromaFormat` on the connection reflects what was actually resolved.
|
||||||
|
let canDecode444 =
|
||||||
|
hdrCapable
|
||||||
|
? (Stage444Probe.hwDecode444_8bit && Stage444Probe.hwDecode444_10bit)
|
||||||
|
: Stage444Probe.hwDecode444_8bit
|
||||||
|
if want444, canDecode444 {
|
||||||
|
videoCaps |= PunktfunkConnection.videoCap444
|
||||||
|
}
|
||||||
let result = Result { try PunktfunkConnection(
|
let result = Result { try PunktfunkConnection(
|
||||||
host: host.address, port: host.port,
|
host: host.address, port: host.port,
|
||||||
width: width, height: height, refreshHz: hz,
|
width: width, height: height, refreshHz: hz,
|
||||||
|
|||||||
@@ -25,6 +25,7 @@ struct SettingsView: View {
|
|||||||
@AppStorage(DefaultsKey.bitrateKbps) private var bitrateKbps = 0
|
@AppStorage(DefaultsKey.bitrateKbps) private var bitrateKbps = 0
|
||||||
@AppStorage(DefaultsKey.presenter) private var presenter = "stage2"
|
@AppStorage(DefaultsKey.presenter) private var presenter = "stage2"
|
||||||
@AppStorage(DefaultsKey.hdrEnabled) private var hdrEnabled = true
|
@AppStorage(DefaultsKey.hdrEnabled) private var hdrEnabled = true
|
||||||
|
@AppStorage(DefaultsKey.enable444) private var enable444 = true
|
||||||
@AppStorage(DefaultsKey.libraryEnabled) private var libraryEnabled = false
|
@AppStorage(DefaultsKey.libraryEnabled) private var libraryEnabled = false
|
||||||
@AppStorage(DefaultsKey.fullscreenWhileStreaming) private var fullscreenWhileStreaming = true
|
@AppStorage(DefaultsKey.fullscreenWhileStreaming) private var fullscreenWhileStreaming = true
|
||||||
@AppStorage(DefaultsKey.micEnabled) private var micEnabled = true
|
@AppStorage(DefaultsKey.micEnabled) private var micEnabled = true
|
||||||
@@ -36,6 +37,7 @@ struct SettingsView: View {
|
|||||||
@State private var showControllerTest = false
|
@State private var showControllerTest = false
|
||||||
#endif
|
#endif
|
||||||
#if os(iOS)
|
#if os(iOS)
|
||||||
|
@AppStorage(DefaultsKey.pointerCapture) private var pointerCapture = true
|
||||||
// The sidebar selection drives the detail pane on iPad and the pushed sub-page on iPhone.
|
// The sidebar selection drives the detail pane on iPad and the pushed sub-page on iPhone.
|
||||||
// Width class decides the initial value: nil on iPhone (show the category list first),
|
// Width class decides the initial value: nil on iPhone (show the category list first),
|
||||||
// General on iPad (a two-column layout should never open with an empty detail).
|
// General on iPad (a two-column layout should never open with an empty detail).
|
||||||
@@ -206,6 +208,7 @@ struct SettingsView: View {
|
|||||||
case .general:
|
case .general:
|
||||||
Form {
|
Form {
|
||||||
streamModeSection
|
streamModeSection
|
||||||
|
pointerSection
|
||||||
compositorSection
|
compositorSection
|
||||||
}
|
}
|
||||||
.formStyle(.grouped)
|
.formStyle(.grouped)
|
||||||
@@ -581,6 +584,30 @@ struct SettingsView: View {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#if os(iOS)
|
||||||
|
/// iPad-only pointer-capture toggle: lock the mouse/trackpad for relative movement (games) vs
|
||||||
|
/// forward an absolute cursor position (desktop). Empty on iPhone (no hardware-pointer lock —
|
||||||
|
/// the mouse path there is always the absolute fallback).
|
||||||
|
@ViewBuilder private var pointerSection: some View {
|
||||||
|
if UIDevice.current.userInterfaceIdiom == .pad {
|
||||||
|
Section {
|
||||||
|
Toggle("Capture pointer for games", isOn: $pointerCapture)
|
||||||
|
} header: {
|
||||||
|
Text("Pointer")
|
||||||
|
} footer: {
|
||||||
|
Text("With a mouse or trackpad connected, lock the pointer and send relative "
|
||||||
|
+ "movement — the expected behavior for games (mouse-look). Turn this off for "
|
||||||
|
+ "desktop use to keep the pointer free and send its absolute position instead. "
|
||||||
|
+ "The lock needs the stream full-screen and frontmost; it falls back to the "
|
||||||
|
+ "absolute pointer automatically (Stage Manager, Slide Over). Finger touch is "
|
||||||
|
+ "unaffected. Applies from the next session.")
|
||||||
|
.font(.geist(12, relativeTo: .caption))
|
||||||
|
.foregroundStyle(.secondary)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
@ViewBuilder private var compositorSection: some View {
|
@ViewBuilder private var compositorSection: some View {
|
||||||
Section {
|
Section {
|
||||||
Picker("Compositor", selection: $compositor) {
|
Picker("Compositor", selection: $compositor) {
|
||||||
@@ -644,12 +671,15 @@ struct SettingsView: View {
|
|||||||
@ViewBuilder private var hdrSection: some View {
|
@ViewBuilder private var hdrSection: some View {
|
||||||
Section {
|
Section {
|
||||||
Toggle("10-bit HDR", isOn: $hdrEnabled)
|
Toggle("10-bit HDR", isOn: $hdrEnabled)
|
||||||
|
Toggle("Full chroma (4:4:4)", isOn: $enable444)
|
||||||
} header: {
|
} header: {
|
||||||
Text("HDR")
|
Text("Video quality")
|
||||||
} footer: {
|
} footer: {
|
||||||
Text("Request a 10-bit BT.2020 PQ (HDR10) stream. It only engages when the host is "
|
Text("HDR requests a 10-bit BT.2020 PQ (HDR10) stream — it only engages when the host is "
|
||||||
+ "sending HDR content AND this display supports HDR — otherwise the stream stays "
|
+ "sending HDR content AND this display supports HDR. 4:4:4 requests full chroma "
|
||||||
+ "8-bit SDR. Applies from the next session.")
|
+ "(sharper text/UI, more bandwidth) — it only engages when this device can "
|
||||||
|
+ "hardware-decode it AND the host opted in. Otherwise the stream stays 8-bit "
|
||||||
|
+ "4:2:0 SDR. Applies from the next session.")
|
||||||
.font(.geist(12, relativeTo: .caption))
|
.font(.geist(12, relativeTo: .caption))
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -25,9 +25,19 @@ public enum DefaultsKey {
|
|||||||
/// Request a 10-bit BT.2020 PQ (HDR10) stream. On by default; only takes effect when the host
|
/// Request a 10-bit BT.2020 PQ (HDR10) stream. On by default; only takes effect when the host
|
||||||
/// has HDR content AND this display supports HDR — otherwise the stream stays 8-bit SDR.
|
/// has HDR content AND this display supports HDR — otherwise the stream stays 8-bit SDR.
|
||||||
public static let hdrEnabled = "punktfunk.hdrEnabled"
|
public static let hdrEnabled = "punktfunk.hdrEnabled"
|
||||||
|
/// Request a full-chroma 4:4:4 stream when this device can HARDWARE-decode it (`Stage444Probe`).
|
||||||
|
/// On by default; only takes effect when the host also opted in to 4:4:4 (otherwise the stream
|
||||||
|
/// stays 4:2:0). Sharper text/UI at the cost of more bandwidth.
|
||||||
|
public static let enable444 = "punktfunk.enable444"
|
||||||
public static let hosts = "punktfunk.hosts"
|
public static let hosts = "punktfunk.hosts"
|
||||||
/// Client-side cursor mode: "auto" (shown only in gamescope sessions), "always", "never".
|
/// Client-side cursor mode: "auto" (shown only in gamescope sessions), "always", "never".
|
||||||
public static let cursorMode = "punktfunk.cursorMode"
|
public static let cursorMode = "punktfunk.cursorMode"
|
||||||
|
/// iPad: capture the mouse/trackpad pointer (pointer lock → relative movement) for games,
|
||||||
|
/// rather than forwarding an absolute cursor position. On by default. Only meaningful on iPad
|
||||||
|
/// with a hardware mouse/trackpad; the system grants the lock only to a full-screen, frontmost
|
||||||
|
/// scene and silently falls back to the absolute pointer when it can't (Stage Manager / Slide
|
||||||
|
/// Over). Read by `StreamViewController.prefersPointerLocked`.
|
||||||
|
public static let pointerCapture = "punktfunk.pointerCapture"
|
||||||
/// Experimental: show the host's game library (browsed over the management API). Off by default.
|
/// Experimental: show the host's game library (browsed over the management API). Off by default.
|
||||||
public static let libraryEnabled = "punktfunk.libraryEnabled"
|
public static let libraryEnabled = "punktfunk.libraryEnabled"
|
||||||
/// macOS: take the window fullscreen while streaming and restore it on the host list. On by default.
|
/// macOS: take the window fullscreen while streaming and restore it on the host list. On by default.
|
||||||
|
|||||||
@@ -370,29 +370,32 @@ public final class GamepadFeedback {
|
|||||||
// Hidout traffic (lightbar / player LEDs / triggers) only exists on a PlayStation-pad
|
// Hidout traffic (lightbar / player LEDs / triggers) only exists on a PlayStation-pad
|
||||||
// session — a DualSense or a DualShock 4 (lightbar only). Block briefly on it there and
|
// session — a DualSense or a DualShock 4 (lightbar only). Block briefly on it there and
|
||||||
// let rumble own the wait elsewhere; on an Xbox session it stays nonblocking.
|
// let rumble own the wait elsewhere; on an Xbox session it stays nonblocking.
|
||||||
let hasHidout = connection.resolvedGamepad == .dualSense
|
|
||||||
|| connection.resolvedGamepad == .dualShock4
|
|
||||||
let hidTimeout: UInt32 = hasHidout ? 10 : 0
|
|
||||||
let thread = Thread { [connection, flag, drainDone, weak self] in
|
let thread = Thread { [connection, flag, drainDone, weak self] in
|
||||||
while !flag.isStopped {
|
while !flag.isStopped {
|
||||||
do {
|
do {
|
||||||
if let r = try connection.nextRumble(timeoutMs: 10), r.pad == 0 {
|
// Poll the feedback planes NON-BLOCKING. A blocking poll (timeoutMs > 0) holds
|
||||||
|
// the connection's shared feedback lock for its whole wait; the video pump drains
|
||||||
|
// HDR mastering metadata (nextHdrMeta) on the SAME lock every frame, so a blocking
|
||||||
|
// poll here starved it and throttled HDR to ~1 fps (SDR, which never drains HDR
|
||||||
|
// meta, was unaffected). Pacing with a short sleep OUTSIDE the lock (below) keeps
|
||||||
|
// rumble/HID latency low while leaving the lock free between polls.
|
||||||
|
if let r = try connection.nextRumble(timeoutMs: 0), r.pad == 0 {
|
||||||
self?.rumble.apply(low: r.low, high: r.high)
|
self?.rumble.apply(low: r.low, high: r.high)
|
||||||
}
|
}
|
||||||
// Drain a BOUNDED burst of hidout events: only the first poll waits,
|
// Drain a BOUNDED burst of hidout events so sustained 0xCD traffic (a game writing
|
||||||
// and the cap + stop check keep sustained 0xCD traffic (a game writing
|
// per-frame LED/trigger reports) can't spin here or block stop() past one cycle.
|
||||||
// per-frame LED/trigger reports) from starving the rumble poll above
|
|
||||||
// or blocking stop() past one cycle.
|
|
||||||
var burst = 0
|
var burst = 0
|
||||||
while burst < 64, !flag.isStopped,
|
while burst < 64, !flag.isStopped,
|
||||||
let ev = try connection.nextHidOutput(
|
let ev = try connection.nextHidOutput(timeoutMs: 0) {
|
||||||
timeoutMs: burst == 0 ? hidTimeout : 0) {
|
|
||||||
self?.render(ev)
|
self?.render(ev)
|
||||||
burst += 1
|
burst += 1
|
||||||
}
|
}
|
||||||
} catch {
|
} catch {
|
||||||
break // .closed (or fatal) — the session is over
|
break // .closed (or fatal) — the session is over
|
||||||
}
|
}
|
||||||
|
// ~8 ms poll cadence (≈125 Hz), slept OUTSIDE the feedback lock — low rumble/HID
|
||||||
|
// latency without holding the lock the HDR-meta drain needs.
|
||||||
|
if !flag.isStopped { Thread.sleep(forTimeInterval: 0.008) }
|
||||||
}
|
}
|
||||||
drainDone.signal()
|
drainDone.signal()
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -107,6 +107,23 @@ public final class InputCapture {
|
|||||||
/// macOS (no GCMouse handlers installed; `sendMouseAbs` is never called there). Main-queue.
|
/// macOS (no GCMouse handlers installed; `sendMouseAbs` is never called there). Main-queue.
|
||||||
public var gcMouseForwarding = false
|
public var gcMouseForwarding = false
|
||||||
|
|
||||||
|
#if os(iOS)
|
||||||
|
/// Whether any device is attached as a `GCMouse` right now. The Magic Keyboard TRACKPAD does
|
||||||
|
/// not always register as a GCMouse on iPadOS (only a standalone mouse does) — when no GCMouse
|
||||||
|
/// is present the relative GCMouse path can't carry pointer motion. Main-queue.
|
||||||
|
public var hasGCMouse: Bool { !mice.isEmpty }
|
||||||
|
|
||||||
|
/// Diagnostic: a one-line description of every attached GCMouse (count + GCDevice identity), so
|
||||||
|
/// PUNKTFUNK_INPUT_DEBUG can reveal whether the trackpad showed up as a mouse at all.
|
||||||
|
public var attachedMiceSummary: String {
|
||||||
|
guard !mice.isEmpty else { return "0 mice" }
|
||||||
|
let parts = mice.map { mouse -> String in
|
||||||
|
"\(mouse.productCategory)/\(mouse.vendorName ?? "?")"
|
||||||
|
}
|
||||||
|
return "\(mice.count) mice: \(parts.joined(separator: ", "))"
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
/// Fired on ⌘⎋ (the capture toggle — detected here so it works in both states; the
|
/// Fired on ⌘⎋ (the capture toggle — detected here so it works in both states; the
|
||||||
/// event itself is swallowed). Main queue.
|
/// event itself is swallowed). Main queue.
|
||||||
public var onToggleCapture: (() -> Void)?
|
public var onToggleCapture: (() -> Void)?
|
||||||
@@ -394,6 +411,12 @@ public final class InputCapture {
|
|||||||
!mice.contains(where: { $0 === mouse }) // re-delivered on wake — attach once
|
!mice.contains(where: { $0 === mouse }) // re-delivered on wake — attach once
|
||||||
else { return }
|
else { return }
|
||||||
mice.append(mouse)
|
mice.append(mouse)
|
||||||
|
#if os(iOS)
|
||||||
|
if inputDebug {
|
||||||
|
inputLog.debug(
|
||||||
|
"GCMouse attached: \(mouse.productCategory, privacy: .public)/\(mouse.vendorName ?? "?", privacy: .public) — now \(self.attachedMiceSummary, privacy: .public)")
|
||||||
|
}
|
||||||
|
#endif
|
||||||
// macOS drives motion + buttons from NSEvent (StreamLayerView's local monitor →
|
// macOS drives motion + buttons from NSEvent (StreamLayerView's local monitor →
|
||||||
// sendMotion/sendMouseButton) because GCMouse's handlers proved unreliable there;
|
// sendMotion/sendMouseButton) because GCMouse's handlers proved unreliable there;
|
||||||
// installing them too would double-send. iOS keeps GCMouse (raw deltas under
|
// installing them too would double-send. iOS keeps GCMouse (raw deltas under
|
||||||
|
|||||||
@@ -1,10 +1,12 @@
|
|||||||
// Stage-2 presenter, present half: draw a decoded NV12 CVPixelBuffer into a CAMetalLayer
|
// Stage-2 presenter, present half: draw a decoded NV12 / P010 / 4:4:4 CVPixelBuffer into a CAMetalLayer
|
||||||
// drawable with a BT.709 YUV→RGB shader. The display link (owned by the hosting view) drives
|
// drawable with a Y′CbCr→RGB shader. The hosting view's CADisplayLink drives `render` once per vsync
|
||||||
// `render` once per vsync with the target present time, so a present can finally be stamped and
|
// (via Stage2Pipeline.renderTick) with the target present time, so a present can be stamped and the
|
||||||
// the present tail hand-paced. See docs apple-stage2-presenter.md.
|
// present tail hand-paced. See docs apple-stage2-presenter.md.
|
||||||
//
|
//
|
||||||
// Main-thread only: created during view setup, `render` called from the view's CADisplayLink
|
// Main-thread only: created during view setup, `render`/`configure` called from the view's CADisplayLink
|
||||||
// (which fires on the main runloop). The Metal objects + texture cache are touched only here.
|
// (which fires on the main runloop). The Metal objects + texture cache are touched only here. The one
|
||||||
|
// exception is `setHdrMeta`, called from the pump thread — it hops the layer write to main so every
|
||||||
|
// CALayer mutation stays on one thread.
|
||||||
|
|
||||||
#if canImport(Metal) && canImport(QuartzCore)
|
#if canImport(Metal) && canImport(QuartzCore)
|
||||||
import CoreGraphics
|
import CoreGraphics
|
||||||
@@ -15,10 +17,19 @@ import os
|
|||||||
|
|
||||||
private let presenterLog = Logger(subsystem: "io.unom.punktfunk", category: "presenter")
|
private let presenterLog = Logger(subsystem: "io.unom.punktfunk", category: "presenter")
|
||||||
|
|
||||||
/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and a
|
/// HDR reference white (BT.2408 "HDR Reference White"): the absolute luminance, in nits, that the
|
||||||
/// BT.709 limited-range NV12→RGB fragment shader. uv.y is flipped (1 - p.y) so the top-left-
|
/// PQ signal's diffuse white sits at. Passed to `CAEDRMetadata.hdr10(opticalOutputScale:)`, it anchors
|
||||||
/// origin texture presents upright (NDC y is up), not upside down. (Colorspace is BT.709 SDR
|
/// 203-nit diffuse white at EDR 1.0 (the display's SDR-white level) and lets the system tone-map the
|
||||||
/// for now — matches the host; 10-bit/HDR + other matrices are a later tie-in.)
|
/// brighter highlights into the panel's headroom. This is the missing anchor that made the old HDR path
|
||||||
|
/// render "way too bright" (no `edrMetadata` → no reference-white anchoring); a LARGER value renders
|
||||||
|
/// dimmer. Matches the host's standard PQ reference white.
|
||||||
|
private let hdrReferenceWhiteNits: Float = 203.0
|
||||||
|
|
||||||
|
/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and BT.709 SDR
|
||||||
|
/// and BT.2020-PQ HDR Y′CbCr→RGB fragment shaders. uv.y is flipped (1 - p.y) so the top-left-origin
|
||||||
|
/// texture presents upright (NDC y is up). The HDR shader outputs PQ-encoded R′G′B′ as-is — the
|
||||||
|
/// CAMetalLayer's `itur_2100_PQ` colour space + `edrMetadata` tell the system compositor the samples
|
||||||
|
/// are PQ and how to tone-map them (no EOTF here, matching the host's BT.2020 PQ emission).
|
||||||
private let shaderSource = """
|
private let shaderSource = """
|
||||||
#include <metal_stdlib>
|
#include <metal_stdlib>
|
||||||
using namespace metal;
|
using namespace metal;
|
||||||
@@ -66,6 +77,8 @@ float catmullRomLuma(texture2d<float> tex, sampler s, float2 uv) {
|
|||||||
return r;
|
return r;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// SDR: 8-bit NV12 / 4:4:4 (BT.709, limited/video range) → full-range RGB. Chroma is sampled at the
|
||||||
|
// same UV as luma, so a full-size 4:4:4 chroma plane needs no shader change vs 4:2:0.
|
||||||
fragment float4 pf_frag(VOut in [[stage_in]],
|
fragment float4 pf_frag(VOut in [[stage_in]],
|
||||||
texture2d<float> lumaTex [[texture(0)]],
|
texture2d<float> lumaTex [[texture(0)]],
|
||||||
texture2d<float> chromaTex [[texture(1)]]) {
|
texture2d<float> chromaTex [[texture(1)]]) {
|
||||||
@@ -82,18 +95,18 @@ fragment float4 pf_frag(VOut in [[stage_in]],
|
|||||||
return float4(saturate(float3(r, g, b)), 1.0);
|
return float4(saturate(float3(r, g, b)), 1.0);
|
||||||
}
|
}
|
||||||
|
|
||||||
// HDR: 10-bit P010 (BT.2020, limited range), Y'CbCr that is PQ-encoded. We apply the BT.2020
|
// HDR: 10-bit P010 / 4:4:4 (BT.2020, limited range), Y′CbCr that is PQ-encoded. We apply the BT.2020
|
||||||
// matrix to get PQ-encoded R'G'B' and output it as-is — the CAMetalLayer's itur_2100_PQ colour
|
// matrix to get PQ-encoded R′G′B′ and output it as-is — the CAMetalLayer's itur_2100_PQ colour space
|
||||||
// space + EDR tells the compositor the samples are PQ, so it does the PQ→display mapping. No EOTF
|
// + edrMetadata tell the compositor the samples are PQ, so it does the PQ→display tone-map. No EOTF
|
||||||
// here (matching the host, which emitted BT.2020 PQ). P010 stores the 10-bit code in the high bits
|
// here. P010/x444 store the 10-bit code in the high bits of each 16-bit sample, so an .r16Unorm sample
|
||||||
// of each 16-bit sample, so an .r16Unorm sample reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
|
// reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
|
||||||
fragment float4 pf_frag_hdr(VOut in [[stage_in]],
|
fragment float4 pf_frag_hdr(VOut in [[stage_in]],
|
||||||
texture2d<float> lumaTex [[texture(0)]],
|
texture2d<float> lumaTex [[texture(0)]],
|
||||||
texture2d<float> chromaTex [[texture(1)]]) {
|
texture2d<float> chromaTex [[texture(1)]]) {
|
||||||
constexpr sampler s(filter::linear, address::clamp_to_edge);
|
constexpr sampler s(filter::linear, address::clamp_to_edge);
|
||||||
float y = catmullRomLuma(lumaTex, s, in.uv);
|
float y = catmullRomLuma(lumaTex, s, in.uv);
|
||||||
float2 c = chromaTex.sample(s, in.uv).rg;
|
float2 c = chromaTex.sample(s, in.uv).rg;
|
||||||
// BT.2020 10-bit limited (video) range → full-range PQ R'G'B'.
|
// BT.2020 10-bit limited (video) range → full-range PQ R′G′B′.
|
||||||
y = (y - 64.0/1023.0) * (1023.0/876.0);
|
y = (y - 64.0/1023.0) * (1023.0/876.0);
|
||||||
float u = (c.x - 512.0/1023.0) * (1023.0/896.0);
|
float u = (c.x - 512.0/1023.0) * (1023.0/896.0);
|
||||||
float v = (c.y - 512.0/1023.0) * (1023.0/896.0);
|
float v = (c.y - 512.0/1023.0) * (1023.0/896.0);
|
||||||
@@ -110,26 +123,34 @@ public final class MetalVideoPresenter {
|
|||||||
|
|
||||||
private let device: MTLDevice
|
private let device: MTLDevice
|
||||||
private let queue: MTLCommandQueue
|
private let queue: MTLCommandQueue
|
||||||
/// SDR (BT.709 8-bit NV12 → bgra8) and HDR (BT.2020 PQ 10-bit P010 → rgba16Float) pipelines.
|
/// SDR (BT.709 8-bit → bgra8) and HDR (BT.2020 PQ 10-bit → rgba16Float) pipelines. Selected per
|
||||||
/// Selected per frame by `render`; the layer is reconfigured when the mode flips (HDR toggle).
|
/// frame in `render`; the layer is reconfigured to match when the session flips (HDR toggle).
|
||||||
private let pipelineSDR: MTLRenderPipelineState
|
private let pipelineSDR: MTLRenderPipelineState
|
||||||
private let pipelineHDR: MTLRenderPipelineState
|
private let pipelineHDR: MTLRenderPipelineState
|
||||||
private var textureCache: CVMetalTextureCache?
|
private var textureCache: CVMetalTextureCache?
|
||||||
/// Current layer configuration — switched lazily in `configure(hdr:)` when a frame's mode differs.
|
|
||||||
|
/// Current layer configuration — switched in `configure(hdr:)` when a frame's HDR-ness differs.
|
||||||
|
/// Main-thread only (read + written from `render`/`configure`, all on the display-link runloop).
|
||||||
private var hdrActive = false
|
private var hdrActive = false
|
||||||
|
/// Last HDR mastering grade received via `setHdrMeta` (the host's 0xCE). Cached so a mid-session
|
||||||
|
/// SDR→HDR flip's `configureColor` re-applies the real grade instead of clobbering it back to the
|
||||||
|
/// bare reference-white anchor (an out-of-order race otherwise: `setHdrMeta` and the flip both write
|
||||||
|
/// `edrMetadata`). Main-thread only.
|
||||||
|
private var lastHdrMeta: PunktfunkConnection.HdrMeta?
|
||||||
|
|
||||||
#if DEBUG
|
#if DEBUG
|
||||||
/// Last logged "decoded→drawable" signature, so the diagnostic logs only when a size changes
|
/// Last logged "decoded→drawable" signature, so the diagnostic logs only on a size/HDR change.
|
||||||
/// (on first frame, a resize, or a host Reconfigure) instead of every frame.
|
|
||||||
private var lastSizeSig = ""
|
private var lastSizeSig = ""
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/// nil if Metal is unavailable (no GPU / a headless CI) — the caller falls back to stage-1.
|
/// nil if Metal is unavailable (no GPU / a headless CI) or a shader fails to compile — the caller
|
||||||
public init?() {
|
/// falls back to stage-1.
|
||||||
|
public static func make() -> MetalVideoPresenter? {
|
||||||
guard let device = MTLCreateSystemDefaultDevice(),
|
guard let device = MTLCreateSystemDefaultDevice(),
|
||||||
let queue = device.makeCommandQueue()
|
let queue = device.makeCommandQueue()
|
||||||
else { return nil }
|
else { return nil }
|
||||||
self.device = device
|
let pipelineSDR: MTLRenderPipelineState
|
||||||
self.queue = queue
|
let pipelineHDR: MTLRenderPipelineState
|
||||||
do {
|
do {
|
||||||
let library = try device.makeLibrary(source: shaderSource, options: nil)
|
let library = try device.makeLibrary(source: shaderSource, options: nil)
|
||||||
let vtx = library.makeFunction(name: "pf_vtx")
|
let vtx = library.makeFunction(name: "pf_vtx")
|
||||||
@@ -146,99 +167,140 @@ public final class MetalVideoPresenter {
|
|||||||
} catch {
|
} catch {
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)
|
var cache: CVMetalTextureCache?
|
||||||
guard textureCache != nil else { return nil }
|
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache)
|
||||||
|
guard let textureCache = cache else { return nil }
|
||||||
|
|
||||||
let layer = CAMetalLayer()
|
let layer = CAMetalLayer()
|
||||||
layer.device = device
|
layer.device = device
|
||||||
layer.pixelFormat = .bgra8Unorm
|
layer.pixelFormat = .bgra8Unorm
|
||||||
layer.framebufferOnly = true
|
layer.framebufferOnly = true
|
||||||
layer.isOpaque = true
|
layer.isOpaque = true
|
||||||
// Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let
|
|
||||||
// the system compositor scale it to the layer's bounds — the same `.resizeAspect` path
|
|
||||||
// stage-1's AVSampleBufferDisplayLayer (videoGravity) uses, so stage-2 matches its sharpness.
|
|
||||||
// A native-resolution present is then pixel-exact (1:1, no shader scaling), and any display
|
|
||||||
// scaling uses the system's high-quality scaler rather than the in-shader bicubic.
|
|
||||||
layer.contentsGravity = .resizeAspect
|
|
||||||
// Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the
|
|
||||||
// display-link / MAIN thread) has to block waiting for one to free.
|
|
||||||
layer.maximumDrawableCount = 3
|
|
||||||
#if os(macOS)
|
#if os(macOS)
|
||||||
// The display link already paces exactly one present per vsync. Leaving the layer's
|
// The display link already paces exactly one present per vsync. Leaving the layer's own vsync
|
||||||
// own vsync wait on means `commandBuffer.present` ALSO blocks for the hardware vsync,
|
// wait on means `commandBuffer.present` ALSO blocks for the hardware vsync, so `nextDrawable()`
|
||||||
// so `nextDrawable()` stalls the MAIN thread until a drawable frees — windowed, the
|
// stalls the MAIN thread until a drawable frees — windowed, the WindowServer's looser
|
||||||
// WindowServer's looser compositing hides it; FULLSCREEN's tighter, more-direct path
|
// compositing hides it; FULLSCREEN's tighter path serializes the main thread to the display and
|
||||||
// serializes the main thread to the display and the stall surfaces as bad judder.
|
// the stall surfaces as bad judder. Disabling the layer-level sync lets present return promptly
|
||||||
// Disabling the layer-level sync lets present return promptly (the display link is the
|
// (the display link is the pacing source) — the fix for the fullscreen stutter. macOS-only.
|
||||||
// pacing source), which is what fixes the fullscreen stutter. macOS-only property.
|
|
||||||
layer.displaySyncEnabled = false
|
layer.displaySyncEnabled = false
|
||||||
#endif
|
#endif
|
||||||
|
// Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let the
|
||||||
|
// system compositor scale it to the layer's bounds — the same `.resizeAspect` path stage-1's
|
||||||
|
// AVSampleBufferDisplayLayer uses. A native-resolution present is then pixel-exact (1:1, no
|
||||||
|
// shader scaling); a resized window rescales via the system's scaler.
|
||||||
|
layer.contentsGravity = .resizeAspect
|
||||||
|
// Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the display-link /
|
||||||
|
// MAIN thread) has to block waiting for one to free.
|
||||||
|
layer.maximumDrawableCount = 3
|
||||||
|
|
||||||
|
return MetalVideoPresenter(
|
||||||
|
device: device, queue: queue, pipelineSDR: pipelineSDR, pipelineHDR: pipelineHDR,
|
||||||
|
textureCache: textureCache, layer: layer)
|
||||||
|
}
|
||||||
|
|
||||||
|
private init(
|
||||||
|
device: MTLDevice, queue: MTLCommandQueue, pipelineSDR: MTLRenderPipelineState,
|
||||||
|
pipelineHDR: MTLRenderPipelineState, textureCache: CVMetalTextureCache, layer: CAMetalLayer
|
||||||
|
) {
|
||||||
|
self.device = device
|
||||||
|
self.queue = queue
|
||||||
|
self.pipelineSDR = pipelineSDR
|
||||||
|
self.pipelineHDR = pipelineHDR
|
||||||
|
self.textureCache = textureCache
|
||||||
self.layer = layer
|
self.layer = layer
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Reconfigure the layer for SDR or HDR when the stream mode flips (HDR toggle). HDR uses an
|
/// Configure the layer + active pipeline for an SDR or HDR session. MAIN THREAD ONLY. Called once at
|
||||||
/// rgba16Float drawable + a BT.2020 PQ colour space + EDR, so the compositor PQ-maps to the
|
/// session start and again per-frame from `render` (idempotent — the guard makes a same-state call a
|
||||||
/// display; SDR uses the plain 8-bit sRGB path. Main-thread only (called from `render`).
|
/// no-op), so a mid-session HDR toggle (the host re-inits its encoder; the decoded `frame.isHDR`
|
||||||
private func configure(hdr: Bool) {
|
/// flips) reconfigures here automatically. HDR uses an rgba16Float drawable + BT.2020 PQ colour space
|
||||||
|
/// + EDR with a 203-nit reference-white anchor; SDR uses the plain 8-bit sRGB path.
|
||||||
|
public func configure(hdr: Bool) {
|
||||||
guard hdr != hdrActive else { return }
|
guard hdr != hdrActive else { return }
|
||||||
hdrActive = hdr
|
hdrActive = hdr
|
||||||
|
configureColor(hdr: hdr)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Set the layer's pixel format + colour config for SDR or HDR. MAIN THREAD ONLY. EDR is requested
|
||||||
|
/// on ALL platforms — the property is available on macOS/iOS/tvOS at our deployment floor, and the
|
||||||
|
/// old `#if os(macOS)` guard left iOS/tvOS EDR half-engaged.
|
||||||
|
private func configureColor(hdr: Bool) {
|
||||||
if hdr {
|
if hdr {
|
||||||
layer.pixelFormat = .rgba16Float
|
layer.pixelFormat = .rgba16Float
|
||||||
layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
|
layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
|
||||||
#if os(macOS)
|
|
||||||
layer.wantsExtendedDynamicRangeContent = true
|
layer.wantsExtendedDynamicRangeContent = true
|
||||||
#endif
|
// Anchor reference white. Re-apply the real grade if one already arrived (0xCE before the
|
||||||
|
// flip); otherwise the bare 203-nit anchor. Without this anchor the PQ signal is too bright.
|
||||||
|
layer.edrMetadata = makeEDR(lastHdrMeta)
|
||||||
} else {
|
} else {
|
||||||
|
// SDR: gamma-encoded BT.709 [0,1] in an 8-bit drawable; a nil colorspace tags it device/sRGB
|
||||||
|
// (the proven SDR path — never showed the "too bright" issue, which was HDR-only).
|
||||||
layer.pixelFormat = .bgra8Unorm
|
layer.pixelFormat = .bgra8Unorm
|
||||||
layer.colorspace = nil
|
layer.colorspace = nil
|
||||||
#if os(macOS)
|
|
||||||
layer.wantsExtendedDynamicRangeContent = false
|
layer.wantsExtendedDynamicRangeContent = false
|
||||||
#endif
|
layer.edrMetadata = nil
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Draw one decoded frame to the next drawable and present it. `isHDR` selects the 10-bit
|
private func makeEDR(_ meta: PunktfunkConnection.HdrMeta?) -> CAEDRMetadata {
|
||||||
/// BT.2020 PQ path (P010 input) vs the 8-bit BT.709 path (NV12 input). Returns true on success;
|
CAEDRMetadata.hdr10(
|
||||||
/// false when there's no drawable yet, a texture couldn't be made, or Metal errored — the
|
displayInfo: meta?.masteringDisplayColorVolume(),
|
||||||
/// caller then doesn't stamp a present for this frame.
|
contentInfo: meta?.contentLightLevelInfo(),
|
||||||
|
opticalOutputScale: hdrReferenceWhiteNits)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Update the HDR mastering metadata (drained from the host's 0xCE datagram) to refine the system
|
||||||
|
/// tone-map from the real grade. Called from the PUMP thread, so the layer write is hopped to MAIN
|
||||||
|
/// (every CALayer mutation stays on one thread). The grade is cached so a later SDR→HDR
|
||||||
|
/// `configureColor` re-applies it; the `edrMetadata` write is gated on `hdrActive` (setting it on an
|
||||||
|
/// SDR layer is harmless but pointless, and the flip will apply it anyway).
|
||||||
|
public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) {
|
||||||
|
DispatchQueue.main.async { [weak self] in
|
||||||
|
guard let self else { return }
|
||||||
|
self.lastHdrMeta = meta
|
||||||
|
if self.hdrActive { self.layer.edrMetadata = self.makeEDR(meta) }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Draw one decoded frame to the next drawable and present it. MAIN THREAD (the display link).
|
||||||
|
/// `isHDR` selects the 10-bit BT.2020 PQ path vs the 8-bit BT.709 path and is reconciled with the
|
||||||
|
/// layer config via `configure`. Returns true on success; false when there's no drawable yet, a
|
||||||
|
/// texture couldn't be made, or Metal errored — the caller then doesn't stamp a present.
|
||||||
@discardableResult
|
@discardableResult
|
||||||
public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool {
|
public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool {
|
||||||
|
// Reconcile the layer with the decoded frame's HDR-ness (handles a mid-session SDR↔HDR flip).
|
||||||
configure(hdr: isHDR)
|
configure(hdr: isHDR)
|
||||||
// P010 stores 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12 is 8-bit → R8/RG8.
|
|
||||||
let lumaFmt: MTLPixelFormat = isHDR ? .r16Unorm : .r8Unorm
|
// P010/x444 store 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12/444v is 8-bit → R8/RG8.
|
||||||
let chromaFmt: MTLPixelFormat = isHDR ? .rg16Unorm : .rg8Unorm
|
// Derived from the actual decoded buffer so a 4:4:4 (full chroma plane) frame just works.
|
||||||
|
let pf = CVPixelBufferGetPixelFormatType(pixelBuffer)
|
||||||
|
let tenBit =
|
||||||
|
pf == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|
||||||
|
|| pf == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
|
||||||
|
|| pf == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|
||||||
|
|| pf == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
|
||||||
guard let textureCache,
|
guard let textureCache,
|
||||||
let luma = makeTexture(pixelBuffer, plane: 0, format: lumaFmt, cache: textureCache),
|
let luma = makeTexture(
|
||||||
let chroma = makeTexture(pixelBuffer, plane: 1, format: chromaFmt, cache: textureCache)
|
pixelBuffer, plane: 0, format: tenBit ? .r16Unorm : .r8Unorm, cache: textureCache),
|
||||||
|
let chroma = makeTexture(
|
||||||
|
pixelBuffer, plane: 1, format: tenBit ? .rg16Unorm : .rg8Unorm, cache: textureCache)
|
||||||
else { return false }
|
else { return false }
|
||||||
|
|
||||||
// Size the drawable to the decoded frame so the fullscreen triangle samples the texture 1:1
|
// Size the drawable to the decoded frame so the fullscreen triangle samples 1:1 (pixel-exact);
|
||||||
// (pixel-exact); the layer's contentsGravity then scales it to the on-screen bounds via the
|
// the layer's contentsGravity then scales it to the on-screen bounds via the system compositor
|
||||||
// system compositor (matching stage-1). Re-set only on a change (first frame / Reconfigure).
|
// (matching stage-1). drawableSize does NOT track bounds (defaults to 0), so set it BEFORE
|
||||||
|
// nextDrawable; re-set only on a change (first frame / Reconfigure / HDR flip).
|
||||||
let decodedSize = CGSize(
|
let decodedSize = CGSize(
|
||||||
width: CVPixelBufferGetWidth(pixelBuffer), height: CVPixelBufferGetHeight(pixelBuffer))
|
width: CVPixelBufferGetWidth(pixelBuffer), height: CVPixelBufferGetHeight(pixelBuffer))
|
||||||
if layer.drawableSize != decodedSize { layer.drawableSize = decodedSize }
|
if layer.drawableSize != decodedSize { layer.drawableSize = decodedSize }
|
||||||
|
#if DEBUG
|
||||||
|
logSizeIfChanged(decoded: decodedSize)
|
||||||
|
#endif
|
||||||
guard let drawable = layer.nextDrawable(),
|
guard let drawable = layer.nextDrawable(),
|
||||||
let commandBuffer = queue.makeCommandBuffer()
|
let commandBuffer = queue.makeCommandBuffer()
|
||||||
else { return false }
|
else { return false }
|
||||||
|
|
||||||
#if DEBUG
|
|
||||||
// Diagnose sharpness: decoded should equal the drawable (the shader is 1:1); the layer's
|
|
||||||
// bounds may differ (the system scales). Logged only when a size changes.
|
|
||||||
let decodedW = Int(decodedSize.width)
|
|
||||||
let decodedH = Int(decodedSize.height)
|
|
||||||
let sig = "\(decodedW)x\(decodedH)|\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height))"
|
|
||||||
if sig != lastSizeSig {
|
|
||||||
lastSizeSig = sig
|
|
||||||
let msg = "stage2: decoded \(decodedW)x\(decodedH) → drawable "
|
|
||||||
+ "\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height)) "
|
|
||||||
+ "(texture \(drawable.texture.width)x\(drawable.texture.height), "
|
|
||||||
+ "contentsScale \(layer.contentsScale), "
|
|
||||||
+ "layerBounds \(Int(layer.bounds.width))x\(Int(layer.bounds.height)))"
|
|
||||||
presenterLog.info("\(msg, privacy: .public)")
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
|
|
||||||
let pass = MTLRenderPassDescriptor()
|
let pass = MTLRenderPassDescriptor()
|
||||||
pass.colorAttachments[0].texture = drawable.texture
|
pass.colorAttachments[0].texture = drawable.texture
|
||||||
pass.colorAttachments[0].loadAction = .clear
|
pass.colorAttachments[0].loadAction = .clear
|
||||||
@@ -247,24 +309,23 @@ public final class MetalVideoPresenter {
|
|||||||
guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
|
guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
|
||||||
return false
|
return false
|
||||||
}
|
}
|
||||||
encoder.setRenderPipelineState(isHDR ? pipelineHDR : pipelineSDR)
|
encoder.setRenderPipelineState(hdrActive ? pipelineHDR : pipelineSDR)
|
||||||
encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
|
encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
|
||||||
encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
|
encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
|
||||||
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
|
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
|
||||||
encoder.endEncoding()
|
encoder.endEncoding()
|
||||||
commandBuffer.present(drawable) // present at the next vsync — lowest latency
|
commandBuffer.present(drawable) // present at the next vsync — lowest latency
|
||||||
// Hold the CVMetalTextures + the source pixel buffer (its IOSurface) alive until the GPU
|
// Hold the CVMetalTextures + source pixel buffer (its IOSurface) alive until the GPU finishes
|
||||||
// finishes sampling — releasing them at scope exit could free the backing mid-read.
|
// sampling — releasing them at scope exit could free the backing mid-read.
|
||||||
commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) }
|
commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) }
|
||||||
commandBuffer.commit()
|
commandBuffer.commit()
|
||||||
return true
|
return true
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past
|
/// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past the
|
||||||
/// the draw — the MTLTexture is only valid while its CVMetalTexture is retained.
|
/// draw — the MTLTexture is only valid while its CVMetalTexture is retained.
|
||||||
private func makeTexture(
|
private func makeTexture(
|
||||||
_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat,
|
_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat, cache: CVMetalTextureCache
|
||||||
cache: CVMetalTextureCache
|
|
||||||
) -> CVMetalTexture? {
|
) -> CVMetalTexture? {
|
||||||
let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
|
let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
|
||||||
let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
|
let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
|
||||||
@@ -276,5 +337,16 @@ public final class MetalVideoPresenter {
|
|||||||
else { return nil }
|
else { return nil }
|
||||||
return cvTexture
|
return cvTexture
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#if DEBUG
|
||||||
|
private func logSizeIfChanged(decoded: CGSize) {
|
||||||
|
let sig = "\(Int(decoded.width))x\(Int(decoded.height))|hdr\(hdrActive ? 1 : 0)"
|
||||||
|
if sig != lastSizeSig {
|
||||||
|
lastSizeSig = sig
|
||||||
|
let msg = "stage2: decoded \(Int(decoded.width))x\(Int(decoded.height)) hdr=\(hdrActive)"
|
||||||
|
presenterLog.info("\(msg, privacy: .public)")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|||||||
@@ -0,0 +1,94 @@
|
|||||||
|
// Steers the system's iPad pointer-lock resolution down to a chosen "anchor" view controller.
|
||||||
|
//
|
||||||
|
// `UIViewController.prefersPointerLocked` is resolved the same way as the status bar: the system
|
||||||
|
// walks DOWN from the window's root view controller through `childViewControllerForPointerLock`.
|
||||||
|
// SwiftUI's hosting / container view controllers do NOT forward that query to their children, so a
|
||||||
|
// `UIViewControllerRepresentable` controller buried in the SwiftUI tree (our StreamViewController)
|
||||||
|
// is never consulted — its `prefersPointerLocked = true` is silently ignored and a Magic Keyboard
|
||||||
|
// trackpad / mouse falls through to the absolute-pointer path instead of being captured.
|
||||||
|
//
|
||||||
|
// Swizzling the DEFAULT implementation isn't enough: the controllers that break the chain
|
||||||
|
// (UIHostingController and SwiftUI's internal containers) provide their OWN implementation of the
|
||||||
|
// property, so a base-class swizzle never runs for them. Instead we walk UP the LIVE `parent`
|
||||||
|
// chain from the anchor to the window root and, on each real ancestor, force
|
||||||
|
// `childViewControllerForPointerLock` to return the next controller toward the anchor. Each forced
|
||||||
|
// value is a genuine direct child (we follow the actual containment chain), so the system's
|
||||||
|
// downward walk reaches the anchor and reads its `prefersPointerLocked`.
|
||||||
|
//
|
||||||
|
// The forcing is per-INSTANCE — an associated object — gated behind a one-time per-CLASS IMP
|
||||||
|
// swizzle. Only the specific controllers in the anchor's chain are affected; every other instance
|
||||||
|
// of those classes keeps its original behavior (associated object nil → original IMP). The forced
|
||||||
|
// values are cleared on disengage so the long-lived SwiftUI parents don't retain a stale controller
|
||||||
|
// across sessions. Only the PUBLIC `childViewControllerForPointerLock` selector is touched
|
||||||
|
// (App-Store-safe; no private API).
|
||||||
|
|
||||||
|
#if os(iOS)
|
||||||
|
import ObjectiveC
|
||||||
|
import UIKit
|
||||||
|
|
||||||
|
enum PointerLockChain {
|
||||||
|
private static var forcedChildKey: UInt8 = 0
|
||||||
|
/// Classes whose `childViewControllerForPointerLock` we've already IMP-swizzled (keyed by the
|
||||||
|
/// class object). Main-thread only — pointer-lock resolution and capture toggles are all main.
|
||||||
|
private static var swizzledClasses = Set<ObjectIdentifier>()
|
||||||
|
/// Ancestors we've stamped with a forced child this engagement, held weakly so a deallocated
|
||||||
|
/// SwiftUI controller drops out on its own (no dangling). disengage() clears every one — even
|
||||||
|
/// if the live `parent` chain has since broken — so a stamped parent can never retain a stale
|
||||||
|
/// controller subtree across sessions. One anchor is ever engaged at a time.
|
||||||
|
private static let stampedParents = NSHashTable<UIViewController>.weakObjects()
|
||||||
|
|
||||||
|
private static func forcedChild(of vc: UIViewController) -> UIViewController? {
|
||||||
|
objc_getAssociatedObject(vc, &forcedChildKey) as? UIViewController
|
||||||
|
}
|
||||||
|
|
||||||
|
private static func setForcedChild(_ child: UIViewController?, on vc: UIViewController) {
|
||||||
|
// RETAIN: while steering, the parent must keep the toward-anchor child alive. It's also
|
||||||
|
// already a strong child of `vc` via UIKit containment, so this adds no cycle (the reverse
|
||||||
|
// `.parent` link is weak), and disengage() always clears it — so it can't outlive a session.
|
||||||
|
objc_setAssociatedObject(vc, &forcedChildKey, child, .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Ensure `cls`'s `childViewControllerForPointerLock` getter consults the per-instance forced
|
||||||
|
/// child first, falling back to the class's original implementation. Idempotent per class.
|
||||||
|
private static func ensureSwizzled(_ cls: AnyClass) {
|
||||||
|
let id = ObjectIdentifier(cls)
|
||||||
|
guard !swizzledClasses.contains(id) else { return }
|
||||||
|
swizzledClasses.insert(id)
|
||||||
|
let selector = #selector(getter: UIViewController.childViewControllerForPointerLock)
|
||||||
|
guard let method = class_getInstanceMethod(cls, selector) else { return }
|
||||||
|
let originalIMP = method_getImplementation(method)
|
||||||
|
typealias OriginalFn = @convention(c) (AnyObject, Selector) -> UIViewController?
|
||||||
|
let original = unsafeBitCast(originalIMP, to: OriginalFn.self)
|
||||||
|
let forwarding: @convention(block) (UIViewController) -> UIViewController? = { vc in
|
||||||
|
if let forced = forcedChild(of: vc) { return forced }
|
||||||
|
return original(vc, selector)
|
||||||
|
}
|
||||||
|
method_setImplementation(method, imp_implementationWithBlock(forwarding))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Force every ancestor of `anchor` to forward pointer-lock resolution toward it, then ask the
|
||||||
|
/// system to re-resolve. No-op when `anchor` isn't in a view-controller hierarchy yet (it
|
||||||
|
/// re-runs from the anchor's appearance/parent callbacks once it is).
|
||||||
|
static func engage(_ anchor: UIViewController) {
|
||||||
|
disengage(anchor) // clear any prior engagement first (reparent / re-anchor)
|
||||||
|
var child = anchor
|
||||||
|
while let parent = child.parent {
|
||||||
|
ensureSwizzled(object_getClass(parent)!)
|
||||||
|
setForcedChild(child, on: parent)
|
||||||
|
stampedParents.add(parent)
|
||||||
|
child = parent
|
||||||
|
}
|
||||||
|
anchor.setNeedsUpdateOfPrefersPointerLocked()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Clear the forced forwarding on every stamped ancestor (so the SwiftUI parents stop retaining
|
||||||
|
/// the anchor's subtree) and re-resolve to drop the lock.
|
||||||
|
static func disengage(_ anchor: UIViewController) {
|
||||||
|
for parent in stampedParents.allObjects {
|
||||||
|
setForcedChild(nil, on: parent)
|
||||||
|
}
|
||||||
|
stampedParents.removeAllObjects()
|
||||||
|
anchor.setNeedsUpdateOfPrefersPointerLocked()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif
|
||||||
@@ -0,0 +1,36 @@
|
|||||||
|
// Synthetic 4:4:4 HEVC keyframes used only by `Stage444Probe` to probe decode capability.
|
||||||
|
//
|
||||||
|
// Each is the first IDR access unit (VPS + SPS + PPS + IDR slice, Annex-B) of a 256×256 HEVC
|
||||||
|
// Range-Extensions clip — `chroma_format_idc = 3` — generated offline with libx265:
|
||||||
|
// ffmpeg -f lavfi -i color=c=gray:s=256x256:r=30:d=0.1 -frames:v 3 \
|
||||||
|
// -pix_fmt yuv444p[10le] -c:v libx265 \
|
||||||
|
// -x265-params keyint=1:min-keyint=1:no-info=1:repeat-headers=1:aud=0 -f hevc out.hevc
|
||||||
|
// 256×256 clears the hardware decoder's minimum-dimension floor (a 16×16 clip is rejected for every
|
||||||
|
// chroma format). Validated to hardware-decode to `444v`/`x444` on Apple Silicon (M3).
|
||||||
|
enum Probe444Blobs {
|
||||||
|
/// 256×256 HEVC Range-Extensions 4:4:4 keyframe (Annex-B): 134 bytes.
|
||||||
|
static let au444_8bit: [UInt8] = [
|
||||||
|
0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00,
|
||||||
|
0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42,
|
||||||
|
0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c,
|
||||||
|
0x90, 0x01, 0x01, 0x00, 0x80, 0xb2, 0xdd, 0x49, 0x26, 0x57, 0x80, 0xb4, 0x04, 0x00, 0x00, 0x03,
|
||||||
|
0x00, 0x04, 0x00, 0x00, 0x03, 0x00, 0x78, 0x20, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72,
|
||||||
|
0x86, 0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb,
|
||||||
|
0xae, 0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6,
|
||||||
|
0x65, 0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87,
|
||||||
|
0x00, 0x00, 0x03, 0x00, 0x5b, 0x40,
|
||||||
|
]
|
||||||
|
|
||||||
|
/// 256×256 HEVC Range-Extensions 4:4:4 10-bit keyframe (Annex-B): 133 bytes.
|
||||||
|
static let au444_10bit: [UInt8] = [
|
||||||
|
0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00,
|
||||||
|
0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42,
|
||||||
|
0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c,
|
||||||
|
0x90, 0x01, 0x01, 0x00, 0x80, 0x9b, 0x2d, 0xd4, 0x92, 0x65, 0x78, 0x0b, 0x40, 0x40, 0x00, 0x00,
|
||||||
|
0x03, 0x00, 0x40, 0x00, 0x00, 0x07, 0x82, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72, 0x86,
|
||||||
|
0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb, 0xae,
|
||||||
|
0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6, 0x65,
|
||||||
|
0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87, 0x00,
|
||||||
|
0x00, 0x03, 0x00, 0x5b, 0x40,
|
||||||
|
]
|
||||||
|
}
|
||||||
@@ -238,6 +238,13 @@ public final class PunktfunkConnection {
|
|||||||
public private(set) var colorFullRange: Bool = false
|
public private(set) var colorFullRange: Bool = false
|
||||||
/// Encoded bit depth (8 or 10).
|
/// Encoded bit depth (8 or 10).
|
||||||
public private(set) var bitDepth: UInt8 = 8
|
public private(set) var bitDepth: UInt8 = 8
|
||||||
|
/// The chroma subsampling the host resolved for this session, as the HEVC `chroma_format_idc`:
|
||||||
|
/// `1` = 4:2:0 (every pre-4:4:4 host, and the back-compat default) or `3` = full-chroma 4:4:4
|
||||||
|
/// (only when this client advertised `videoCap444` *and* the host could open a real 4:4:4
|
||||||
|
/// encoder). Drive the decoder's requested pixel format from this. See `isChroma444`.
|
||||||
|
public private(set) var chromaFormat: UInt8 = 1
|
||||||
|
/// Convenience: the resolved stream is full-chroma 4:4:4 (`chroma_format_idc == 3`).
|
||||||
|
public var isChroma444: Bool { chromaFormat == 3 }
|
||||||
/// True when the negotiated stream is HDR (PQ or HLG transfer) — drive an HDR present path and
|
/// True when the negotiated stream is HDR (PQ or HLG transfer) — drive an HDR present path and
|
||||||
/// drain `nextHdrMeta`.
|
/// drain `nextHdrMeta`.
|
||||||
public var isHDR: Bool { colorTransfer == 16 || colorTransfer == 18 }
|
public var isHDR: Bool { colorTransfer == 16 || colorTransfer == 18 }
|
||||||
@@ -334,6 +341,9 @@ public final class PunktfunkConnection {
|
|||||||
colorMatrix = mtx
|
colorMatrix = mtx
|
||||||
colorFullRange = fullRange != 0
|
colorFullRange = fullRange != 0
|
||||||
bitDepth = depth
|
bitDepth = depth
|
||||||
|
var cf: UInt8 = 1
|
||||||
|
_ = punktfunk_connection_chroma_format(handle, &cf)
|
||||||
|
chromaFormat = cf
|
||||||
var ac: UInt8 = 2
|
var ac: UInt8 = 2
|
||||||
_ = punktfunk_connection_audio_channels(handle, &ac)
|
_ = punktfunk_connection_audio_channels(handle, &ac)
|
||||||
resolvedAudioChannels = ac
|
resolvedAudioChannels = ac
|
||||||
@@ -605,6 +615,10 @@ public final class PunktfunkConnection {
|
|||||||
public static let videoCap10Bit: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_10BIT)
|
public static let videoCap10Bit: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_10BIT)
|
||||||
/// Video-capability bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
|
/// Video-capability bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
|
||||||
public static let videoCapHDR: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_HDR)
|
public static let videoCapHDR: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_HDR)
|
||||||
|
/// Video-capability bit: the client can decode a full-chroma 4:4:4 HEVC stream (Range
|
||||||
|
/// Extensions). Advertise only when the device can *hardware*-decode it (`Stage444Probe`);
|
||||||
|
/// the host then emits 4:4:4 only if it too opted in. `chromaFormat` reflects the real value.
|
||||||
|
public static let videoCap444: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_444)
|
||||||
|
|
||||||
/// Static HDR mastering metadata (SMPTE ST.2086 + content light level) the host sent for an HDR
|
/// Static HDR mastering metadata (SMPTE ST.2086 + content light level) the host sent for an HDR
|
||||||
/// session. Mirrors the wire/ABI `PunktfunkHdrMeta`; primaries are in ST.2086 **G, B, R** order,
|
/// session. Mirrors the wire/ABI `PunktfunkHdrMeta`; primaries are in ST.2086 **G, B, R** order,
|
||||||
|
|||||||
@@ -1,21 +1,21 @@
|
|||||||
// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async
|
// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async output
|
||||||
// output drops the newest decoded frame into a 1-slot ring; the hosting view's display link
|
// drops the newest decoded frame into a 1-slot ring; the hosting view's display link calls `renderTick`
|
||||||
// calls `renderTick` once per vsync to draw + present the newest ready frame and stamp
|
// once per vsync to draw + present the newest ready frame and stamp capture→present. Mirrors
|
||||||
// capture→present. Mirrors StreamPump's lifecycle (one per start; cancel is permanent).
|
// StreamPump's lifecycle (one per start; cancel is permanent).
|
||||||
//
|
//
|
||||||
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick`
|
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` +
|
||||||
// + `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there).
|
// `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). Only the ring (lock-guarded)
|
||||||
// Only the ring + decoder cross threads and both are internally locked.
|
// and the decoder/presenter (internally locked / main-hopped) cross threads.
|
||||||
|
|
||||||
#if canImport(Metal) && canImport(QuartzCore)
|
#if canImport(Metal) && canImport(QuartzCore)
|
||||||
import AVFoundation
|
import AVFoundation
|
||||||
import Foundation
|
import Foundation
|
||||||
import QuartzCore
|
import QuartzCore
|
||||||
|
|
||||||
/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view
|
/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view directly
|
||||||
/// directly makes a `view → link → view` cycle that only `invalidate()` breaks — if a teardown
|
/// makes a `view → link → view` cycle that only `invalidate()` breaks — if a teardown is ever missed
|
||||||
/// is ever missed the view leaks and keeps ticking. This proxy holds the handler weakly, so the
|
/// the view leaks and keeps ticking. This proxy holds the handler weakly, so the view can deallocate
|
||||||
/// view can deallocate and its `deinit` invalidate the link.
|
/// and its `deinit` invalidate the link.
|
||||||
public final class DisplayLinkProxy: NSObject {
|
public final class DisplayLinkProxy: NSObject {
|
||||||
private let onTick: (CADisplayLink) -> Void
|
private let onTick: (CADisplayLink) -> Void
|
||||||
public init(_ onTick: @escaping (CADisplayLink) -> Void) { self.onTick = onTick }
|
public init(_ onTick: @escaping (CADisplayLink) -> Void) { self.onTick = onTick }
|
||||||
@@ -44,10 +44,10 @@ private final class PumpToken: @unchecked Sendable {
|
|||||||
func cancel() { lock.lock(); live = false; lock.unlock() }
|
func cancel() { lock.lock(); live = false; lock.unlock() }
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Throttled host keyframe requests for decode recovery. The decoder's async error callback
|
/// Throttled host keyframe requests for decode recovery. The decoder's async error callback (a VT
|
||||||
/// (a VT thread) and the pump thread (a submit failure) both signal a wedge; this coalesces
|
/// thread) and the pump thread (a submit failure) both signal a wedge; this coalesces them so the
|
||||||
/// them so the control stream isn't flooded while the decode stays stalled for several frames
|
/// control stream isn't flooded while the decode stays stalled for several frames until the requested
|
||||||
/// until the requested IDR lands. Bound to the live connection in `start`, unbound in `stop`.
|
/// IDR lands. Bound to the live connection in `start`, unbound in `stop`.
|
||||||
private final class KeyframeRecovery: @unchecked Sendable {
|
private final class KeyframeRecovery: @unchecked Sendable {
|
||||||
private let lock = NSLock()
|
private let lock = NSLock()
|
||||||
private var connection: PunktfunkConnection?
|
private var connection: PunktfunkConnection?
|
||||||
@@ -60,7 +60,7 @@ private final class KeyframeRecovery: @unchecked Sendable {
|
|||||||
func request() {
|
func request() {
|
||||||
lock.lock()
|
lock.lock()
|
||||||
let now = DispatchTime.now().uptimeNanoseconds
|
let now = DispatchTime.now().uptimeNanoseconds
|
||||||
let due = lastNs == 0 || now &- lastNs > 100_000_000 // ≥ 100 ms since the last request (matches Android)
|
let due = lastNs == 0 || now &- lastNs > 100_000_000 // ≥ 100 ms since the last request
|
||||||
if due { lastNs = now }
|
if due { lastNs = now }
|
||||||
let conn = due ? connection : nil
|
let conn = due ? connection : nil
|
||||||
lock.unlock()
|
lock.unlock()
|
||||||
@@ -76,30 +76,36 @@ public final class Stage2Pipeline {
|
|||||||
private let recovery = KeyframeRecovery()
|
private let recovery = KeyframeRecovery()
|
||||||
private var token = PumpToken()
|
private var token = PumpToken()
|
||||||
private var offsetNs: Int64 = 0
|
private var offsetNs: Int64 = 0
|
||||||
|
/// Signalled when the pump thread exits, so `stop()` can join it (bounded) before `decoder.reset()`
|
||||||
|
/// — otherwise a pump iteration already past its `token.isLive` check can rebuild a decode session
|
||||||
|
/// right after the reset (a brief orphan session). `pumpJoinable` is armed by `start`, consumed by
|
||||||
|
/// the first `stop` (so the idempotent second `stop`/deinit doesn't block on an already-drained
|
||||||
|
/// semaphore). start/stop are sequential lifecycle calls, so the plain flag is safe.
|
||||||
|
private let pumpStopped = DispatchSemaphore(value: 0)
|
||||||
|
private var pumpJoinable = false
|
||||||
|
|
||||||
/// The Metal layer the hosting view installs + sizes. nil-init fails when Metal is
|
/// The Metal layer the hosting view installs + sizes.
|
||||||
/// unavailable so the caller can fall back to stage-1.
|
|
||||||
public var layer: CAMetalLayer { presenter.layer }
|
public var layer: CAMetalLayer { presenter.layer }
|
||||||
|
|
||||||
/// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal
|
/// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal can't be
|
||||||
/// can't be set up (headless / no GPU) — caller falls back to the stage-1 presenter.
|
/// set up (headless / no GPU) — caller falls back to the stage-1 presenter.
|
||||||
public init?(presentMeter: LatencyMeter) {
|
public init?(presentMeter: LatencyMeter) {
|
||||||
guard let presenter = MetalVideoPresenter() else { return nil }
|
guard let presenter = MetalVideoPresenter.make() else { return nil }
|
||||||
self.presenter = presenter
|
self.presenter = presenter
|
||||||
self.presentMeter = presentMeter
|
self.presentMeter = presentMeter
|
||||||
let ring = ring
|
let ring = ring
|
||||||
let recovery = recovery
|
let recovery = recovery
|
||||||
self.decoder = VideoDecoder(
|
self.decoder = VideoDecoder(
|
||||||
onDecoded: { ring.submit($0) },
|
onDecoded: { ring.submit($0) },
|
||||||
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump
|
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump resets to
|
||||||
// resets to re-gate on the next IDR, and we ask the host to send one now (infinite
|
// re-gate on the next IDR, and we ask the host to send one now (infinite GOP — it wouldn't
|
||||||
// GOP — it wouldn't otherwise come soon). Throttled in KeyframeRecovery.
|
// otherwise come soon). Throttled in KeyframeRecovery.
|
||||||
onDecodeError: { _ in recovery.request() })
|
onDecodeError: { _ in recovery.request() })
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Start pulling AUs into the decoder. `onFrame` fires per AU at receipt (capture→client
|
/// Start pulling AUs into the decoder. MAIN THREAD. `onFrame` fires per AU at receipt (capture→client
|
||||||
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client)
|
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) makes the
|
||||||
/// makes the present stamp cross-machine valid.
|
/// present stamp cross-machine valid.
|
||||||
public func start(
|
public func start(
|
||||||
connection: PunktfunkConnection,
|
connection: PunktfunkConnection,
|
||||||
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
||||||
@@ -108,34 +114,48 @@ public final class Stage2Pipeline {
|
|||||||
offsetNs = connection.clockOffsetNs
|
offsetNs = connection.clockOffsetNs
|
||||||
recovery.bind(connection) // arm host-keyframe recovery for this session
|
recovery.bind(connection) // arm host-keyframe recovery for this session
|
||||||
token = PumpToken() // fresh token per start — cancel is permanent (like StreamPump)
|
token = PumpToken() // fresh token per start — cancel is permanent (like StreamPump)
|
||||||
|
|
||||||
|
// Configure the decoder's chroma + the layer's initial colorimetry before the first frame. The
|
||||||
|
// chroma subsampling drives only the decode pixel format (orthogonal to HDR/depth); the HDR
|
||||||
|
// config is the Welcome's latched value, which a mid-session flip then overrides per-frame.
|
||||||
|
decoder.setChroma444(connection.isChroma444)
|
||||||
|
presenter.configure(hdr: connection.isHDR)
|
||||||
|
|
||||||
let token = token
|
let token = token
|
||||||
let decoder = decoder
|
let decoder = decoder
|
||||||
let recovery = recovery
|
let recovery = recovery
|
||||||
|
let presenter = presenter
|
||||||
|
let pumpStopped = pumpStopped
|
||||||
let thread = Thread {
|
let thread = Thread {
|
||||||
|
defer { pumpStopped.signal() } // let stop() join the pump (bounded) before decoder.reset()
|
||||||
var format: CMVideoFormatDescription?
|
var format: CMVideoFormatDescription?
|
||||||
var lastFramesDropped = connection.framesDropped()
|
var lastFramesDropped = connection.framesDropped()
|
||||||
// Persistent recovery WANT, not a one-shot edge (see StreamPump for the full rationale):
|
// Persistent recovery WANT, not a one-shot edge (see StreamPump for the full rationale):
|
||||||
// the old code advanced lastFramesDropped on the same edge it called recovery.request(),
|
// keep asking until an IDR lands so a request swallowed by the throttle is re-sent.
|
||||||
// so a request swallowed by the throttle (the lost recovery IDR being pruned within the
|
|
||||||
// window) was never re-sent and the picture stayed frozen. Keep asking until an IDR lands.
|
|
||||||
var awaitingIDR = false
|
var awaitingIDR = false
|
||||||
|
// 4:4:4 backstop: a run of decode/create failures in a 4:4:4 session means this device can't
|
||||||
|
// decode 4:4:4 at the negotiated resolution (the HW probe clears the common case but not a
|
||||||
|
// resolution-ceiling miss). End cleanly instead of looping on a black screen.
|
||||||
|
var decodeFailRun = 0
|
||||||
while token.isLive {
|
while token.isLive {
|
||||||
do {
|
do {
|
||||||
// Loss recovery (the primary path). The reassembler drops unrecoverable AUs
|
// Loss recovery (the primary path). The reassembler drops unrecoverable AUs and the
|
||||||
// (framesDropped) and the decoder conceals the reference-missing deltas that
|
// decoder conceals the reference-missing deltas — often WITHOUT an error callback —
|
||||||
// follow — often WITHOUT an error callback — so key off the drop count climbing,
|
// so key off the drop count climbing, then keep asking (awaitingIDR) until a fresh
|
||||||
// then keep asking (awaitingIDR) until a fresh IDR re-anchors decode. Polled every
|
// IDR re-anchors decode.
|
||||||
// iteration so a total-loss drought recovers the moment packets resume.
|
|
||||||
let dropped = connection.framesDropped()
|
let dropped = connection.framesDropped()
|
||||||
if dropped > lastFramesDropped {
|
if dropped > lastFramesDropped {
|
||||||
lastFramesDropped = dropped
|
lastFramesDropped = dropped
|
||||||
awaitingIDR = true
|
awaitingIDR = true
|
||||||
}
|
}
|
||||||
if awaitingIDR { recovery.request() }
|
if awaitingIDR { recovery.request() }
|
||||||
// Drain any HDR mastering-metadata update (0xCE) and hand it to the decoder, which
|
// Drain HDR mastering metadata (0xCE) and hand it to the PRESENTER (→ CAEDRMetadata).
|
||||||
// attaches it to subsequent HDR frames. Non-blocking; only HDR sessions emit these.
|
// Polled UNCONDITIONALLY (not gated on connection.isHDR, the fixed Welcome flag): the
|
||||||
if connection.isHDR, let meta = try? connection.nextHdrMeta(timeoutMs: 0) {
|
// host sends 0xCE only for HDR, INCLUDING a mid-session SDR→HDR transition (a game
|
||||||
decoder.setHdrMeta(meta)
|
// entering HDR — the host re-inits its encoder) the Welcome flag would never reflect.
|
||||||
|
// Non-blocking; nil for an SDR stream.
|
||||||
|
if let meta = try? connection.nextHdrMeta(timeoutMs: 0) {
|
||||||
|
presenter.setHdrMeta(meta)
|
||||||
}
|
}
|
||||||
guard let au = try connection.nextAU(timeoutMs: 100) else { continue }
|
guard let au = try connection.nextAU(timeoutMs: 100) else { continue }
|
||||||
onFrame?(au)
|
onFrame?(au)
|
||||||
@@ -144,12 +164,20 @@ public final class Stage2Pipeline {
|
|||||||
awaitingIDR = false // a fresh IDR re-anchored decode — recovery complete
|
awaitingIDR = false // a fresh IDR re-anchored decode — recovery complete
|
||||||
}
|
}
|
||||||
guard let f = format, token.isLive else { continue }
|
guard let f = format, token.isLive else { continue }
|
||||||
if !decoder.decode(au: au, format: f) {
|
if decoder.decode(au: au, format: f) {
|
||||||
// Submit/decoder error: drop the session and re-gate on the next IDR's
|
decodeFailRun = 0
|
||||||
// in-band parameter sets (a delta frame can't recover) — stage-1's policy —
|
} else {
|
||||||
// and keep asking for that IDR (infinite GOP) until one re-anchors decode.
|
// Submit/decoder error: drop the session and re-gate on the next IDR's in-band
|
||||||
|
// parameter sets (a delta frame can't recover) and keep asking for that IDR.
|
||||||
decoder.reset()
|
decoder.reset()
|
||||||
awaitingIDR = true
|
awaitingIDR = true
|
||||||
|
decodeFailRun += 1
|
||||||
|
// ~3 s of solid failure in a 4:4:4 session (and only there — a 4:2:0 loss
|
||||||
|
// recovers within a GOP) ⇒ 4:4:4 isn't decodable here; end the session.
|
||||||
|
if connection.isChroma444, decodeFailRun >= 180 {
|
||||||
|
if token.isLive { onSessionEnd?() }
|
||||||
|
break
|
||||||
|
}
|
||||||
}
|
}
|
||||||
} catch {
|
} catch {
|
||||||
if token.isLive { onSessionEnd?() }
|
if token.isLive { onSessionEnd?() }
|
||||||
@@ -159,22 +187,30 @@ public final class Stage2Pipeline {
|
|||||||
}
|
}
|
||||||
thread.name = "punktfunk-stage2-pump"
|
thread.name = "punktfunk-stage2-pump"
|
||||||
thread.qualityOfService = .userInteractive
|
thread.qualityOfService = .userInteractive
|
||||||
|
pumpJoinable = true
|
||||||
thread.start()
|
thread.start()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp
|
/// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp capture→present at
|
||||||
/// capture→present at `targetPresentNs` — the display link's target present instant, already
|
/// `targetPresentNs` — the display link's target present instant, already converted to
|
||||||
/// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
|
/// `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
|
||||||
public func renderTick(targetPresentNs: Int64) {
|
public func renderTick(targetPresentNs: Int64) {
|
||||||
guard let frame = ring.take() else { return }
|
guard let frame = ring.take() else { return }
|
||||||
guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return }
|
guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return }
|
||||||
presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
|
presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Stop the pump (≤ one poll timeout) and drop the decode session. Does not close the
|
/// Stop the pump (≤ one poll timeout) and drop the decode session. MAIN THREAD; idempotent. Does not
|
||||||
/// connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
|
/// close the connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
|
||||||
public func stop() {
|
public func stop() {
|
||||||
token.cancel()
|
token.cancel()
|
||||||
|
// Join the pump (bounded: ≤ one nextAU poll + an in-flight decode) before resetting the decoder,
|
||||||
|
// so the pump can't rebuild a session right after the reset. Only the first stop joins; a
|
||||||
|
// repeat/deinit stop skips the already-drained semaphore.
|
||||||
|
if pumpJoinable {
|
||||||
|
pumpJoinable = false
|
||||||
|
_ = pumpStopped.wait(timeout: .now() + 0.5)
|
||||||
|
}
|
||||||
decoder.reset()
|
decoder.reset()
|
||||||
recovery.bind(nil) // stop requesting keyframes once the session is torn down
|
recovery.bind(nil) // stop requesting keyframes once the session is torn down
|
||||||
}
|
}
|
||||||
@@ -182,8 +218,8 @@ public final class Stage2Pipeline {
|
|||||||
deinit { token.cancel() }
|
deinit { token.cancel() }
|
||||||
|
|
||||||
/// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME`
|
/// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME`
|
||||||
/// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the
|
/// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the target
|
||||||
/// target present time (when the frame is actually on glass), not the moment we drew.
|
/// present time (when the frame is actually on glass), not the moment we drew.
|
||||||
public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 {
|
public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 {
|
||||||
let caNow = CACurrentMediaTime()
|
let caNow = CACurrentMediaTime()
|
||||||
var ts = timespec()
|
var ts = timespec()
|
||||||
|
|||||||
@@ -0,0 +1,83 @@
|
|||||||
|
// Runtime 4:4:4 HEVC decode-capability probe.
|
||||||
|
//
|
||||||
|
// We advertise `VIDEO_CAP_444` (so the host upgrades to a full-chroma 4:4:4 stream) ONLY when this
|
||||||
|
// device can decode 4:4:4 HEVC *in hardware* — software 4:4:4 decode works but is far too slow for a
|
||||||
|
// real-time stream at the negotiated resolution, so a software-only device must keep 4:2:0.
|
||||||
|
//
|
||||||
|
// `VTIsHardwareDecodeSupported(HEVC)` and the HEVC-decoder-capabilities dictionary report HEVC HW
|
||||||
|
// decode but expose nothing about `chroma_format_idc`, so the only reliable signal is to actually
|
||||||
|
// create a *hardware-required* `VTDecompressionSession` for a tiny synthetic 4:4:4 keyframe and
|
||||||
|
// confirm it both creates and decodes to the expected biplanar 4:4:4 pixel format. Validated on an
|
||||||
|
// Apple M3 (HW 4:4:4 8- and 10-bit decode to `444v`/`x444`); a software-only decoder fails the
|
||||||
|
// hardware-required create and we fall back to 4:2:0.
|
||||||
|
//
|
||||||
|
// The probe blobs are 256×256 (above the hardware decoder's minimum-dimension floor — a 16×16 clip
|
||||||
|
// is rejected for ALL chroma formats, including 4:2:0) HEVC Range-Extensions keyframes generated
|
||||||
|
// offline with libx265; see scripts notes. Results are cached (device-static) in lazy statics.
|
||||||
|
|
||||||
|
import CoreMedia
|
||||||
|
import CoreVideo
|
||||||
|
import Foundation
|
||||||
|
import VideoToolbox
|
||||||
|
|
||||||
|
public enum Stage444Probe {
|
||||||
|
/// True iff this device hardware-decodes 8-bit 4:4:4 HEVC (the host's current 4:4:4 path —
|
||||||
|
/// BT.709 limited `yuv444p`). Cached after first evaluation.
|
||||||
|
public static let hwDecode444_8bit: Bool = probeHardware444(
|
||||||
|
au: Probe444Blobs.au444_8bit,
|
||||||
|
want: kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange,
|
||||||
|
fullRangeSibling: kCVPixelFormatType_444YpCbCr8BiPlanarFullRange)
|
||||||
|
|
||||||
|
/// True iff this device hardware-decodes 10-bit 4:4:4 HEVC (the 4:4:4 ∩ HDR/10-bit intersection).
|
||||||
|
/// Cached after first evaluation.
|
||||||
|
public static let hwDecode444_10bit: Bool = probeHardware444(
|
||||||
|
au: Probe444Blobs.au444_10bit,
|
||||||
|
want: kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange,
|
||||||
|
fullRangeSibling: kCVPixelFormatType_444YpCbCr10BiPlanarFullRange)
|
||||||
|
|
||||||
|
/// Create a hardware-REQUIRED `VTDecompressionSession` for the synthetic 4:4:4 keyframe and
|
||||||
|
/// decode it, returning true only when the decoder produces the expected (video- or full-range)
|
||||||
|
/// biplanar 4:4:4 pixel format. Any failure (no hardware path, wrong output format, decode error)
|
||||||
|
/// → false → we keep 4:2:0.
|
||||||
|
private static func probeHardware444(
|
||||||
|
au auBytes: [UInt8], want: OSType, fullRangeSibling: OSType
|
||||||
|
) -> Bool {
|
||||||
|
let data = Data(auBytes)
|
||||||
|
guard let format = AnnexB.formatDescription(fromIDR: data) else { return false }
|
||||||
|
// Require a hardware decoder — a software false-positive would make us advertise 4:4:4 and
|
||||||
|
// then decode every real frame on the CPU, blowing the latency budget.
|
||||||
|
let spec: [CFString: Any] = [
|
||||||
|
kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true,
|
||||||
|
]
|
||||||
|
let attrs: [CFString: Any] = [
|
||||||
|
kCVPixelBufferPixelFormatTypeKey: want,
|
||||||
|
kCVPixelBufferMetalCompatibilityKey: true,
|
||||||
|
]
|
||||||
|
var session: VTDecompressionSession?
|
||||||
|
let created = VTDecompressionSessionCreate(
|
||||||
|
allocator: kCFAllocatorDefault, formatDescription: format,
|
||||||
|
decoderSpecification: spec as CFDictionary, imageBufferAttributes: attrs as CFDictionary,
|
||||||
|
outputCallback: nil, decompressionSessionOut: &session)
|
||||||
|
guard created == noErr, let session else { return false }
|
||||||
|
defer { VTDecompressionSessionInvalidate(session) }
|
||||||
|
|
||||||
|
let au = AccessUnit(data: data, ptsNs: 0, frameIndex: 0, flags: 0)
|
||||||
|
guard let sample = AnnexB.sampleBuffer(au: au, format: format) else { return false }
|
||||||
|
|
||||||
|
var produced: OSType = 0
|
||||||
|
let done = DispatchSemaphore(value: 0)
|
||||||
|
let status = VTDecompressionSessionDecodeFrame(
|
||||||
|
session, sampleBuffer: sample,
|
||||||
|
flags: [._EnableAsynchronousDecompression], infoFlagsOut: nil
|
||||||
|
) { status, _, imageBuffer, _, _ in
|
||||||
|
if status == noErr, let imageBuffer {
|
||||||
|
produced = CVPixelBufferGetPixelFormatType(imageBuffer)
|
||||||
|
}
|
||||||
|
done.signal()
|
||||||
|
}
|
||||||
|
guard status == noErr else { return false }
|
||||||
|
VTDecompressionSessionWaitForAsynchronousFrames(session)
|
||||||
|
_ = done.wait(timeout: .now() + 1.0)
|
||||||
|
return produced == want || produced == fullRangeSibling
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -137,8 +137,8 @@ public struct StreamView: NSViewRepresentable {
|
|||||||
public final class StreamLayerView: NSView {
|
public final class StreamLayerView: NSView {
|
||||||
private let displayLayer = AVSampleBufferDisplayLayer()
|
private let displayLayer = AVSampleBufferDisplayLayer()
|
||||||
private var pump: StreamPump?
|
private var pump: StreamPump?
|
||||||
/// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
|
/// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a display link instead of the
|
||||||
/// display link instead of the StreamPump → displayLayer path. nil = stage-1 (default).
|
/// StreamPump → displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle).
|
||||||
var presentMeter: LatencyMeter?
|
var presentMeter: LatencyMeter?
|
||||||
private var stage2: Stage2Pipeline?
|
private var stage2: Stage2Pipeline?
|
||||||
private var stage2Link: CADisplayLink?
|
private var stage2Link: CADisplayLink?
|
||||||
@@ -638,7 +638,7 @@ public final class StreamLayerView: NSView {
|
|||||||
private func teardownStage2() {
|
private func teardownStage2() {
|
||||||
stage2Link?.invalidate()
|
stage2Link?.invalidate()
|
||||||
stage2Link = nil
|
stage2Link = nil
|
||||||
stage2?.stop()
|
stage2?.stop() // stops the pump (synchronous join) + drops the decode session
|
||||||
stage2 = nil
|
stage2 = nil
|
||||||
metalLayer?.removeFromSuperlayer()
|
metalLayer?.removeFromSuperlayer()
|
||||||
metalLayer = nil
|
metalLayer = nil
|
||||||
|
|||||||
@@ -92,8 +92,8 @@ public final class StreamViewController: UIViewController {
|
|||||||
public private(set) var connection: PunktfunkConnection?
|
public private(set) var connection: PunktfunkConnection?
|
||||||
private var pump: StreamPump?
|
private var pump: StreamPump?
|
||||||
private var observers: [NSObjectProtocol] = []
|
private var observers: [NSObjectProtocol] = []
|
||||||
/// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
|
/// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a CADisplayLink instead of the
|
||||||
/// CADisplayLink instead of the StreamPump → displayLayer path. nil = stage-1 (default).
|
/// StreamPump → displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle).
|
||||||
var presentMeter: LatencyMeter?
|
var presentMeter: LatencyMeter?
|
||||||
private var stage2: Stage2Pipeline?
|
private var stage2: Stage2Pipeline?
|
||||||
private var stage2Link: CADisplayLink?
|
private var stage2Link: CADisplayLink?
|
||||||
@@ -155,19 +155,58 @@ public final class StreamViewController: UIViewController {
|
|||||||
}
|
}
|
||||||
|
|
||||||
#if os(iOS)
|
#if os(iOS)
|
||||||
// Pointer lock is only meaningful on iPad (iPhone has no hardware-pointer lock) and
|
/// Whether the user wants the mouse/trackpad pointer CAPTURED (pointer lock → relative
|
||||||
// only when capture is engaged. The system additionally requires full-screen + frontmost
|
/// movement, the gaming default) rather than forwarded as an absolute position (desktop
|
||||||
// and may drop it (Slide Over/Stage Manager/backgrounding) — verified in setCaptured().
|
/// use). Read live from UserDefaults so it tracks the Settings toggle; defaults to on when
|
||||||
public override var prefersPointerLocked: Bool {
|
/// unset. iPad-only — gated again in `prefersPointerLocked`.
|
||||||
captured && UIDevice.current.userInterfaceIdiom == .pad
|
private var pointerCaptureEnabled: Bool {
|
||||||
|
UserDefaults.standard.object(forKey: DefaultsKey.pointerCapture) as? Bool ?? true
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Whether the pointer should be CAPTURED right now: iPad, capture engaged, and the user
|
||||||
|
/// hasn't opted into the absolute (desktop) pointer. The system additionally requires
|
||||||
|
/// full-screen + frontmost and may drop the lock (Slide Over/Stage Manager/backgrounding) —
|
||||||
|
/// syncPointerLock() handles the actual grant/drop and falls back to absolute when unlocked.
|
||||||
|
private var wantsPointerLock: Bool {
|
||||||
|
captured && pointerCaptureEnabled && UIDevice.current.userInterfaceIdiom == .pad
|
||||||
|
}
|
||||||
|
|
||||||
|
public override var prefersPointerLocked: Bool { wantsPointerLock }
|
||||||
public override var prefersHomeIndicatorAutoHidden: Bool { true }
|
public override var prefersHomeIndicatorAutoHidden: Bool { true }
|
||||||
|
|
||||||
// If SwiftUI's UIHostingController reparents us, a plain container parent that forwards
|
// NOTE: we deliberately do NOT override `childViewControllerForPointerLock`. The default
|
||||||
// its pointer-lock decision to its children will then reach this VC. (UIHostingController
|
// returns nil, which tells the system to use THIS controller's own `prefersPointerLocked` —
|
||||||
// itself does not consult children, which is why GCMouse deltas can never arrive there —
|
// exactly what we want, since `PointerLockChain` forces our SwiftUI ancestors to forward the
|
||||||
// the touch path, always forwarded, is the unconditional fallback.)
|
// downward walk to us and we are the terminal anchor. Returning `self` here would make the
|
||||||
public override var childViewControllerForPointerLock: UIViewController? { self }
|
// system ask the same controller forever (it keeps delegating to the returned child) →
|
||||||
|
// unbounded recursion → stack overflow once the chain actually reaches us.
|
||||||
|
|
||||||
|
/// (Re)build or tear down the forced pointer-lock forwarding chain from this controller to the
|
||||||
|
/// window root so the system actually resolves our `prefersPointerLocked`. Safe to call
|
||||||
|
/// repeatedly — it no-ops until the view is in a window with a parent chain, and re-runs from
|
||||||
|
/// the appearance/parent callbacks once SwiftUI has placed us.
|
||||||
|
private func updatePointerLockChain() {
|
||||||
|
// Engaging needs a live parent chain to the window root; disengaging is always safe and
|
||||||
|
// must run even after the view has left the window (session teardown) so the stamped
|
||||||
|
// SwiftUI ancestors are cleared.
|
||||||
|
if wantsPointerLock, view.window != nil {
|
||||||
|
PointerLockChain.engage(self)
|
||||||
|
} else {
|
||||||
|
PointerLockChain.disengage(self)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public override func viewDidAppear(_ animated: Bool) {
|
||||||
|
super.viewDidAppear(animated)
|
||||||
|
// SwiftUI places us in the hierarchy AFTER start()'s setCaptured(true), and may reparent us
|
||||||
|
// later — re-anchor the chain here so a lock requested before we had a parent still lands.
|
||||||
|
updatePointerLockChain()
|
||||||
|
}
|
||||||
|
|
||||||
|
public override func didMove(toParent parent: UIViewController?) {
|
||||||
|
super.didMove(toParent: parent)
|
||||||
|
updatePointerLockChain() // chain shape changed — re-anchor (or no-op if not yet in a window)
|
||||||
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
func start(
|
func start(
|
||||||
@@ -200,7 +239,14 @@ public final class StreamViewController: UIViewController {
|
|||||||
// Indirect pointer (mouse/trackpad with no lock) → absolute cursor + buttons, routed
|
// Indirect pointer (mouse/trackpad with no lock) → absolute cursor + buttons, routed
|
||||||
// through InputCapture so the forwarding gate and release-on-blur apply uniformly.
|
// through InputCapture so the forwarding gate and release-on-blur apply uniformly.
|
||||||
streamView.onPointerMoveAbs = { [weak self] p in
|
streamView.onPointerMoveAbs = { [weak self] p in
|
||||||
self?.inputCapture?.sendMouseAbs(
|
guard let self else { return }
|
||||||
|
if iosInputDebug {
|
||||||
|
// Whether ANY UIKit pointer movement reaches us while the scene is LOCKED tells us
|
||||||
|
// if the trackpad (which may not be a GCMouse) can still be captured via UIKit.
|
||||||
|
iosInputLog.debug(
|
||||||
|
"UIKit pointer move x=\(p.x, privacy: .public) y=\(p.y, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public) gcFwd=\(self.inputCapture?.gcMouseForwarding == true, privacy: .public)")
|
||||||
|
}
|
||||||
|
self.inputCapture?.sendMouseAbs(
|
||||||
x: p.x, y: p.y, surfaceWidth: p.w, surfaceHeight: p.h)
|
x: p.x, y: p.y, surfaceWidth: p.w, surfaceHeight: p.h)
|
||||||
}
|
}
|
||||||
streamView.onPointerButton = { [weak self] button, down in
|
streamView.onPointerButton = { [weak self] button, down in
|
||||||
@@ -210,7 +256,12 @@ public final class StreamViewController: UIViewController {
|
|||||||
// UNLOCKED regime; while locked, GCMouse's scroll handler owns it — mirror the
|
// UNLOCKED regime; while locked, GCMouse's scroll handler owns it — mirror the
|
||||||
// sendMouseAbs !gcMouseForwarding gate so the two can't double-send.
|
// sendMouseAbs !gcMouseForwarding gate so the two can't double-send.
|
||||||
streamView.onScroll = { [weak self] dx, dy in
|
streamView.onScroll = { [weak self] dx, dy in
|
||||||
guard let self, self.inputCapture?.gcMouseForwarding == false else { return }
|
guard let self else { return }
|
||||||
|
if iosInputDebug {
|
||||||
|
iosInputLog.debug(
|
||||||
|
"UIKit scroll dx=\(dx, privacy: .public) dy=\(dy, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public)")
|
||||||
|
}
|
||||||
|
guard self.inputCapture?.gcMouseForwarding == false else { return }
|
||||||
self.inputCapture?.sendScroll(dx: dx, dy: dy)
|
self.inputCapture?.sendScroll(dx: dx, dy: dy)
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -315,7 +366,7 @@ public final class StreamViewController: UIViewController {
|
|||||||
) {
|
) {
|
||||||
let metal = pipeline.layer
|
let metal = pipeline.layer
|
||||||
// Composites OVER the idle (un-enqueued in stage-2) AVSampleBufferDisplayLayer base.
|
// Composites OVER the idle (un-enqueued in stage-2) AVSampleBufferDisplayLayer base.
|
||||||
// (contentsScale + frame + drawableSize are all set by layoutMetalLayer() just below.)
|
// (contentsScale + frame are set by layoutMetalLayer() just below.)
|
||||||
streamView.layer.addSublayer(metal)
|
streamView.layer.addSublayer(metal)
|
||||||
metalLayer = metal
|
metalLayer = metal
|
||||||
stage2 = pipeline
|
stage2 = pipeline
|
||||||
@@ -372,7 +423,7 @@ public final class StreamViewController: UIViewController {
|
|||||||
private func teardownStage2() {
|
private func teardownStage2() {
|
||||||
stage2Link?.invalidate()
|
stage2Link?.invalidate()
|
||||||
stage2Link = nil
|
stage2Link = nil
|
||||||
stage2?.stop()
|
stage2?.stop() // stops the pump (synchronous join) + drops the decode session
|
||||||
stage2 = nil
|
stage2 = nil
|
||||||
metalLayer?.removeFromSuperlayer()
|
metalLayer?.removeFromSuperlayer()
|
||||||
metalLayer = nil
|
metalLayer = nil
|
||||||
@@ -392,6 +443,7 @@ public final class StreamViewController: UIViewController {
|
|||||||
captured = false
|
captured = false
|
||||||
}
|
}
|
||||||
setNeedsUpdateOfPrefersPointerLocked()
|
setNeedsUpdateOfPrefersPointerLocked()
|
||||||
|
updatePointerLockChain() // (re)anchor the SwiftUI ancestors so the lock actually resolves
|
||||||
syncPointerLock() // resolve cursor + GCMouse/absolute routing for the current state
|
syncPointerLock() // resolve cursor + GCMouse/absolute routing for the current state
|
||||||
let onCaptureChange = onCaptureChange
|
let onCaptureChange = onCaptureChange
|
||||||
let captured = captured
|
let captured = captured
|
||||||
@@ -420,7 +472,7 @@ public final class StreamViewController: UIViewController {
|
|||||||
pointerInteraction?.invalidate() // re-resolve the hidden/visible cursor for the state
|
pointerInteraction?.invalidate() // re-resolve the hidden/visible cursor for the state
|
||||||
if iosInputDebug {
|
if iosInputDebug {
|
||||||
iosInputLog.debug(
|
iosInputLog.debug(
|
||||||
"pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public)")
|
"pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public) useGCMouse=\(useGCMouse, privacy: .public) [\(self.inputCapture?.attachedMiceSummary ?? "n/a", privacy: .public)]")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|||||||
@@ -49,11 +49,10 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
/// pump can re-gate on the next IDR.
|
/// pump can re-gate on the next IDR.
|
||||||
private let onDecodeError: @Sendable (OSStatus) -> Void
|
private let onDecodeError: @Sendable (OSStatus) -> Void
|
||||||
|
|
||||||
/// Latest source HDR mastering metadata (from `PunktfunkConnection.nextHdrMeta`), attached to
|
/// Whether the negotiated stream is full-chroma 4:4:4 (`connection.isChroma444`), set once at
|
||||||
/// each decoded HDR pixel buffer so the compositor tone-maps from the real grade. Guarded by its
|
/// session start before any decode. Selects the 4:4:4 decode pixel format (orthogonal to bit
|
||||||
/// own lock — written by the pump thread, read on the VT decode callback.
|
/// depth / HDR). Read inside `createSessionLocked` under `lock`.
|
||||||
private let metaLock = NSLock()
|
private var chroma444 = false
|
||||||
private var hdrMeta: PunktfunkConnection.HdrMeta?
|
|
||||||
|
|
||||||
public init(
|
public init(
|
||||||
onDecoded: @escaping @Sendable (ReadyFrame) -> Void,
|
onDecoded: @escaping @Sendable (ReadyFrame) -> Void,
|
||||||
@@ -65,12 +64,13 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
|
|
||||||
deinit { teardown() }
|
deinit { teardown() }
|
||||||
|
|
||||||
/// Set the source HDR mastering metadata (drained from `PunktfunkConnection.nextHdrMeta`). It's
|
/// Select the chroma subsampling of the decode output (4:2:0 vs full-chroma 4:4:4). Call once at
|
||||||
/// attached to subsequent decoded HDR pixel buffers. Thread-safe; cheap to call on each update.
|
/// session start, before decoding, from `connection.isChroma444`. Takes effect on the next
|
||||||
public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) {
|
/// session (re)build. Thread-safe.
|
||||||
metaLock.lock()
|
public func setChroma444(_ on: Bool) {
|
||||||
hdrMeta = meta
|
lock.lock()
|
||||||
metaLock.unlock()
|
chroma444 = on
|
||||||
|
lock.unlock()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The
|
/// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The
|
||||||
@@ -135,8 +135,10 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
|
|
||||||
/// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function — i.e. the host
|
/// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function — i.e. the host
|
||||||
/// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the
|
/// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the
|
||||||
/// HEVC VUI, so this tracks the *stream*, switching dynamically when the user toggles HDR
|
/// HEVC VUI, so this picks the decode bit depth (10-bit P010/x444 vs 8-bit NV12/444v) from the
|
||||||
/// (the host re-emits parameter sets with the new VUI → a new format desc → session rebuild).
|
/// stream. The present-side HDR config (colorspace/EDR/shader) is latched once per session from
|
||||||
|
/// the Welcome (`connection.isHDR`), which the host does NOT flip mid-session — so this predicate
|
||||||
|
/// and that config agree for the session (a `#if DEBUG` assert in the presenter guards it).
|
||||||
static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool {
|
static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool {
|
||||||
guard
|
guard
|
||||||
let tf = CMFormatDescriptionGetExtension(
|
let tf = CMFormatDescriptionGetExtension(
|
||||||
@@ -157,11 +159,18 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
session = nil
|
session = nil
|
||||||
format = nil
|
format = nil
|
||||||
|
|
||||||
|
// Decode pixel format is a 2×2 of (chroma, depth/HDR), both biplanar so the presenter binds
|
||||||
|
// plane 0 = luma, plane 1 = interleaved chroma uniformly — 4:4:4 just delivers a full-size
|
||||||
|
// chroma plane. 10-bit (P010 / `x444`) for HDR (PQ/HLG), 8-bit (NV12 / `444v`) otherwise.
|
||||||
let hdr = Self.isHDRFormat(newFormat)
|
let hdr = Self.isHDRFormat(newFormat)
|
||||||
let pixelFormat =
|
let pixelFormat: OSType = {
|
||||||
hdr
|
switch (chroma444, hdr) {
|
||||||
? kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010 (10-bit)
|
case (false, false): return kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12
|
||||||
: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12 (8-bit)
|
case (false, true): return kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010
|
||||||
|
case (true, false): return kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange // 444v
|
||||||
|
case (true, true): return kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange // x444
|
||||||
|
}
|
||||||
|
}()
|
||||||
let imageAttrs: [CFString: Any] = [
|
let imageAttrs: [CFString: Any] = [
|
||||||
kCVPixelBufferMetalCompatibilityKey: true,
|
kCVPixelBufferMetalCompatibilityKey: true,
|
||||||
kCVPixelBufferPixelFormatTypeKey: pixelFormat,
|
kCVPixelBufferPixelFormatTypeKey: pixelFormat,
|
||||||
@@ -169,11 +178,20 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
var callback = VTDecompressionOutputCallbackRecord(
|
var callback = VTDecompressionOutputCallbackRecord(
|
||||||
decompressionOutputCallback: decoderOutputCallback,
|
decompressionOutputCallback: decoderOutputCallback,
|
||||||
decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
|
decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
|
||||||
|
// 4:4:4 sessions REQUIRE a hardware decoder: we only advertise 4:4:4 when the hardware probe
|
||||||
|
// passed, so a hardware-incapable mode (e.g. a resolution past the HW 4:4:4 ceiling) must fail
|
||||||
|
// HERE, synchronously, letting the pump's backstop end the session — rather than silently
|
||||||
|
// falling back to a software 4:4:4 decoder far too slow for a real-time stream. 4:2:0 keeps the
|
||||||
|
// software fallback (nil spec) as a robustness net.
|
||||||
|
let spec: CFDictionary? =
|
||||||
|
chroma444
|
||||||
|
? [kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true] as CFDictionary
|
||||||
|
: nil
|
||||||
var newSession: VTDecompressionSession?
|
var newSession: VTDecompressionSession?
|
||||||
let status = VTDecompressionSessionCreate(
|
let status = VTDecompressionSessionCreate(
|
||||||
allocator: kCFAllocatorDefault,
|
allocator: kCFAllocatorDefault,
|
||||||
formatDescription: newFormat,
|
formatDescription: newFormat,
|
||||||
decoderSpecification: nil, // hardware by default
|
decoderSpecification: spec,
|
||||||
imageBufferAttributes: imageAttrs as CFDictionary,
|
imageBufferAttributes: imageAttrs as CFDictionary,
|
||||||
outputCallback: &callback,
|
outputCallback: &callback,
|
||||||
decompressionSessionOut: &newSession)
|
decompressionSessionOut: &newSession)
|
||||||
@@ -195,26 +213,17 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
// pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
|
// pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
|
||||||
let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
|
let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
|
||||||
let ptsNs = p.value > 0 ? UInt64(p.value) : 0
|
let ptsNs = p.value > 0 ? UInt64(p.value) : 0
|
||||||
// HDR iff the decoder produced a 10-bit P010 buffer (we only request P010 for PQ streams).
|
// HDR iff the decoder produced a 10-bit buffer (we only request a 10-bit format for PQ/HLG
|
||||||
|
// streams). Covers 4:2:0 (P010) and 4:4:4 (`x444`), video- and full-range, so a 10-bit 4:4:4
|
||||||
|
// HDR frame isn't misclassified as SDR. (The mastering metadata is applied to the presenter's
|
||||||
|
// CAMetalLayer via CAEDRMetadata, not to this source buffer — a separate-drawable presenter
|
||||||
|
// never composites the source buffer's attachments, so attaching them here would be dead.)
|
||||||
|
let fmt = CVPixelBufferGetPixelFormatType(imageBuffer)
|
||||||
let isHDR =
|
let isHDR =
|
||||||
CVPixelBufferGetPixelFormatType(imageBuffer)
|
fmt == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|
||||||
== kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|
|| fmt == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
|
||||||
// Attach the source's mastering display + content light level (ST.2086 / CEA-861.3) so the
|
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|
||||||
// compositor tone-maps from the real grade rather than inferring from the PQ colourspace
|
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
|
||||||
// alone. The SEI byte payloads map 1:1 to these CVImageBuffer attachment keys.
|
|
||||||
if isHDR {
|
|
||||||
metaLock.lock()
|
|
||||||
let meta = hdrMeta
|
|
||||||
metaLock.unlock()
|
|
||||||
if let meta {
|
|
||||||
CVBufferSetAttachment(
|
|
||||||
imageBuffer, kCVImageBufferMasteringDisplayColorVolumeKey,
|
|
||||||
meta.masteringDisplayColorVolume() as CFData, .shouldPropagate)
|
|
||||||
CVBufferSetAttachment(
|
|
||||||
imageBuffer, kCVImageBufferContentLightLevelInfoKey,
|
|
||||||
meta.contentLightLevelInfo() as CFData, .shouldPropagate)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
onDecoded(
|
onDecoded(
|
||||||
ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
|
ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,11 +1,13 @@
|
|||||||
import XCTest
|
import XCTest
|
||||||
|
|
||||||
#if canImport(Metal)
|
#if canImport(Metal)
|
||||||
|
import CoreVideo
|
||||||
import Metal
|
import Metal
|
||||||
|
import QuartzCore
|
||||||
@testable import PunktfunkKit
|
@testable import PunktfunkKit
|
||||||
|
|
||||||
final class MetalPresenterTests: XCTestCase {
|
final class MetalPresenterTests: XCTestCase {
|
||||||
/// `MetalVideoPresenter.init?()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUV→RGB
|
/// `MetalVideoPresenter.make()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUV→RGB
|
||||||
/// fragment shaders plus the Catmull-Rom luma sampler). A `nil` result on a GPU-equipped host
|
/// fragment shaders plus the Catmull-Rom luma sampler). A `nil` result on a GPU-equipped host
|
||||||
/// means a shader failed to compile — this catches a malformed shader before it silently
|
/// means a shader failed to compile — this catches a malformed shader before it silently
|
||||||
/// degrades stage-2 to a stage-1 fallback on device.
|
/// degrades stage-2 to a stage-1 fallback on device.
|
||||||
@@ -14,8 +16,54 @@ final class MetalPresenterTests: XCTestCase {
|
|||||||
throw XCTSkip("no Metal device available in this environment")
|
throw XCTSkip("no Metal device available in this environment")
|
||||||
}
|
}
|
||||||
XCTAssertNotNil(
|
XCTAssertNotNil(
|
||||||
MetalVideoPresenter(),
|
MetalVideoPresenter.make(),
|
||||||
"stage-2 Metal shaders failed to compile (presenter init returned nil)")
|
"stage-2 Metal shaders failed to compile (presenter init returned nil)")
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// The HDR fix: `configure(hdr:)` must put the layer into the BT.2020-PQ EDR configuration with a
|
||||||
|
/// reference-white anchor (`edrMetadata`) — the missing anchor was what made HDR render "too
|
||||||
|
/// bright". SDR must use the plain 8-bit path with EDR off and no metadata. A mid-session flip is a
|
||||||
|
/// per-mode reconfigure, so the round trip back to SDR must fully restore the SDR config.
|
||||||
|
func testConfigureHDRSetsEDRAnchor() throws {
|
||||||
|
guard let presenter = MetalVideoPresenter.make() else {
|
||||||
|
throw XCTSkip("no Metal device available in this environment")
|
||||||
|
}
|
||||||
|
presenter.configure(hdr: true)
|
||||||
|
XCTAssertEqual(presenter.layer.pixelFormat, .rgba16Float, "HDR uses an EDR-capable drawable")
|
||||||
|
XCTAssertNotNil(presenter.layer.colorspace, "HDR layer must be tagged (itur_2100_PQ)")
|
||||||
|
XCTAssertTrue(
|
||||||
|
presenter.layer.wantsExtendedDynamicRangeContent, "EDR must be requested on all platforms")
|
||||||
|
XCTAssertNotNil(
|
||||||
|
presenter.layer.edrMetadata,
|
||||||
|
"HDR must anchor reference white via edrMetadata (the fix for 'too bright')")
|
||||||
|
|
||||||
|
// Mid-session HDR→SDR flip: the 8-bit path, EDR off, no metadata.
|
||||||
|
presenter.configure(hdr: false)
|
||||||
|
XCTAssertEqual(presenter.layer.pixelFormat, .bgra8Unorm, "SDR uses the plain 8-bit drawable")
|
||||||
|
XCTAssertFalse(presenter.layer.wantsExtendedDynamicRangeContent)
|
||||||
|
XCTAssertNil(presenter.layer.edrMetadata)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// `render` with a freshly-allocated NV12 buffer must present without crashing or hanging — the
|
||||||
|
/// main-thread present path is the highest-risk part of the stage-2 rewrite. (A headless CI with no
|
||||||
|
/// display can still allocate a drawable from a CAMetalLayer; if it can't, render returns false,
|
||||||
|
/// which is also a valid non-crashing outcome.)
|
||||||
|
func testRenderDoesNotCrashOnNV12Frame() throws {
|
||||||
|
guard let presenter = MetalVideoPresenter.make() else {
|
||||||
|
throw XCTSkip("no Metal device available in this environment")
|
||||||
|
}
|
||||||
|
presenter.configure(hdr: false)
|
||||||
|
var pb: CVPixelBuffer?
|
||||||
|
let attrs: [CFString: Any] = [kCVPixelBufferMetalCompatibilityKey: true]
|
||||||
|
let status = CVPixelBufferCreate(
|
||||||
|
kCFAllocatorDefault, 256, 256, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
|
||||||
|
attrs as CFDictionary, &pb)
|
||||||
|
guard status == kCVReturnSuccess, let pixelBuffer = pb else {
|
||||||
|
throw XCTSkip("could not allocate a test pixel buffer")
|
||||||
|
}
|
||||||
|
// Just asserting it returns (true or false) without trapping — the layer may have no drawable
|
||||||
|
// source headless, so a false return is acceptable.
|
||||||
|
_ = presenter.render(pixelBuffer, isHDR: false)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|||||||
@@ -0,0 +1,68 @@
|
|||||||
|
// 4:4:4 decode-path coverage: the hardware-capability probe is stable/cached, and a real 4:4:4 HEVC
|
||||||
|
// keyframe decodes through VideoDecoder to a biplanar 4:4:4 pixel buffer. Reuses the same synthetic
|
||||||
|
// 4:4:4 blobs the runtime probe ships with.
|
||||||
|
|
||||||
|
import CoreVideo
|
||||||
|
import VideoToolbox
|
||||||
|
import XCTest
|
||||||
|
@testable import PunktfunkKit
|
||||||
|
|
||||||
|
private final class FrameBox: @unchecked Sendable {
|
||||||
|
let lock = NSLock()
|
||||||
|
var frame: ReadyFrame?
|
||||||
|
var error: OSStatus?
|
||||||
|
}
|
||||||
|
|
||||||
|
final class Stage444Tests: XCTestCase {
|
||||||
|
/// The capability probe is device-static and cached — reading it twice must return the same value
|
||||||
|
/// (and must never crash, including where 4:4:4 is unsupported → false).
|
||||||
|
func testProbeIsStableAndCached() {
|
||||||
|
XCTAssertEqual(Stage444Probe.hwDecode444_8bit, Stage444Probe.hwDecode444_8bit)
|
||||||
|
XCTAssertEqual(Stage444Probe.hwDecode444_10bit, Stage444Probe.hwDecode444_10bit)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// A real 8-bit 4:4:4 HEVC keyframe (the embedded probe blob) decodes through `VideoDecoder` with
|
||||||
|
/// `setChroma444(true)` to a 256×256 biplanar 4:4:4 (`444v`/`444f`) buffer classified SDR.
|
||||||
|
/// (4:4:4 sessions require a hardware decoder — skip where there isn't one, which is exactly where
|
||||||
|
/// the client wouldn't advertise 4:4:4 anyway.)
|
||||||
|
func testVideoDecoderDecodes444() throws {
|
||||||
|
try XCTSkipUnless(
|
||||||
|
Stage444Probe.hwDecode444_8bit, "no hardware 4:4:4 decode on this device")
|
||||||
|
let data = Data(Probe444Blobs.au444_8bit)
|
||||||
|
let format = try XCTUnwrap(
|
||||||
|
AnnexB.formatDescription(fromIDR: data), "the 4:4:4 blob must yield a format description")
|
||||||
|
let au = AccessUnit(data: data, ptsNs: 7_000_000, frameIndex: 0, flags: 0)
|
||||||
|
|
||||||
|
let box = FrameBox()
|
||||||
|
let done = DispatchSemaphore(value: 0)
|
||||||
|
let decoder = VideoDecoder(
|
||||||
|
onDecoded: { f in box.lock.lock(); box.frame = f; box.lock.unlock(); done.signal() },
|
||||||
|
onDecodeError: { s in box.lock.lock(); box.error = s; box.lock.unlock(); done.signal() })
|
||||||
|
decoder.setChroma444(true)
|
||||||
|
|
||||||
|
XCTAssertTrue(decoder.decode(au: au, format: format), "4:4:4 frame submit should succeed")
|
||||||
|
XCTAssertEqual(done.wait(timeout: .now() + 10), .success, "the decode callback must fire")
|
||||||
|
decoder.reset()
|
||||||
|
|
||||||
|
box.lock.lock(); let frame = box.frame; let error = box.error; box.lock.unlock()
|
||||||
|
XCTAssertNil(error.map { "decode error \($0)" })
|
||||||
|
let ready = try XCTUnwrap(frame, "a 4:4:4 ReadyFrame must be delivered")
|
||||||
|
XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), 256)
|
||||||
|
XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), 256)
|
||||||
|
let pf = CVPixelBufferGetPixelFormatType(ready.pixelBuffer)
|
||||||
|
XCTAssertTrue(
|
||||||
|
pf == kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange
|
||||||
|
|| pf == kCVPixelFormatType_444YpCbCr8BiPlanarFullRange,
|
||||||
|
"expected a biplanar 4:4:4 8-bit buffer, got \(fourCCString(pf))")
|
||||||
|
XCTAssertFalse(ready.isHDR, "an 8-bit BT.709 4:4:4 stream is SDR")
|
||||||
|
// The chroma plane (plane 1) must be FULL resolution for 4:4:4 (vs half for 4:2:0) — this is
|
||||||
|
// what lets the unchanged shader sample chroma at the luma UV.
|
||||||
|
XCTAssertEqual(CVPixelBufferGetWidthOfPlane(ready.pixelBuffer, 1), 256)
|
||||||
|
XCTAssertEqual(CVPixelBufferGetHeightOfPlane(ready.pixelBuffer, 1), 256)
|
||||||
|
}
|
||||||
|
|
||||||
|
private func fourCCString(_ t: OSType) -> String {
|
||||||
|
let b = [UInt8(t >> 24 & 0xff), UInt8(t >> 16 & 0xff), UInt8(t >> 8 & 0xff), UInt8(t & 0xff)]
|
||||||
|
return String(bytes: b, encoding: .ascii) ?? "\(t)"
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -3,12 +3,61 @@ title: "Apple Stage-2 Presenter (handoff)"
|
|||||||
description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
|
description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
|
||||||
---
|
---
|
||||||
|
|
||||||
> **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer`
|
> **Status:** SHIPPED as the **default** presenter (stage-1 `AVSampleBufferDisplayLayer` is the
|
||||||
> stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit
|
> Metal-unavailable / DEBUG fallback). HDR corrected and **4:4:4** added on top of the proven
|
||||||
> `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`;
|
> main-thread present path (the hosting view's `CADisplayLink` drives `render` per vsync). Code:
|
||||||
> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed
|
> `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,Stage444Probe,LatencyMeter}.swift`.
|
||||||
> to design rationale + open items — the shipped `.swift` code is the source of truth for the
|
> This doc is trimmed to design rationale + open items — the shipped `.swift` code is the source of
|
||||||
> decode/present/measurement walkthrough.
|
> truth for the decode/present/measurement walkthrough.
|
||||||
|
>
|
||||||
|
> **HDR (the "too bright" fix).** The presenter renders to a *separate* CAMetalLayer drawable, so the
|
||||||
|
> mastering metadata that was attached to the source `CVPixelBuffer` was never composited — and with no
|
||||||
|
> reference-white anchor the system rendered the PQ signal far too bright. The fix is to keep the
|
||||||
|
> PQ-passthrough shader (BT.2020 limited→full → PQ R′G′B′ as-is) and put the anchor **on the layer**:
|
||||||
|
> `colorspace = itur_2100_PQ`, `wantsExtendedDynamicRangeContent = true` (on **all** platforms — the old
|
||||||
|
> `#if os(macOS)` guard left iOS/tvOS EDR half-engaged), and
|
||||||
|
> `edrMetadata = CAEDRMetadata.hdr10(displayInfo:contentInfo:opticalOutputScale: 203)`. 203 nits =
|
||||||
|
> BT.2408 HDR reference white anchors diffuse white at EDR 1.0; a larger value renders dimmer. The
|
||||||
|
> mastering/CLL blobs (host `0xCE` datagram) now refine `edrMetadata` (drained by the pump,
|
||||||
|
> `setHdrMeta` hops the layer write to main) rather than being attached to a never-composited source
|
||||||
|
> buffer. **Needs on-glass validation on a real EDR panel.**
|
||||||
|
>
|
||||||
|
> **Mid-session SDR↔HDR.** The control-plane colour (`connection.isHDR`, from the Welcome) is fixed per
|
||||||
|
> session, but the host can re-init its encoder mid-session (a game entering HDR), so the HEVC VUI — and
|
||||||
|
> the decoder's `frame.isHDR` — flips. The presenter follows the **decoded frame**, not the latched
|
||||||
|
> session flag: `render` calls the idempotent `configure(hdr:)` every frame, so on a flip it
|
||||||
|
> reconfigures the layer (per-mode pixel format `bgra8Unorm` SDR / `rgba16Float` HDR, colorspace, EDR)
|
||||||
|
> and selects the matching shader — all synchronously on the main thread (the present path is
|
||||||
|
> main-thread, so no cross-thread hop is needed). The last `0xCE` grade is cached so an SDR→HDR
|
||||||
|
> reconfigure re-applies the real mastering metadata instead of the bare anchor. The pump drains `0xCE`
|
||||||
|
> **unconditionally** (not gated on the Welcome flag) so a session that starts SDR still gets mastering
|
||||||
|
> metadata when it goes HDR. A ≤2-frame transition flash on the rare flip is accepted.
|
||||||
|
>
|
||||||
|
> **Pacing.** The hosting view owns a **main-runloop `CADisplayLink`** (a weak `DisplayLinkProxy`
|
||||||
|
> breaks the retain cycle) that calls `renderTick` once per vsync. `renderTick` pops the **newest**
|
||||||
|
> ready frame from the 1-slot ring (older undisplayed frames dropped — lowest latency, no smoothing
|
||||||
|
> buffer) and, if there is one, draws it via **manual `layer.nextDrawable()`** and presents at the next
|
||||||
|
> vsync; on an idle vsync (no new frame) it does nothing and the compositor holds the last presented
|
||||||
|
> drawable (no idle re-render — matters at 5K). `drawableSize` is set **before** `nextDrawable` (it
|
||||||
|
> doesn't track bounds, defaults to 0), so allocation always uses the decoded size. `maximumDrawableCount
|
||||||
|
> = 3`. macOS `displaySyncEnabled = **false**`: the display link is the single pacing source, so leaving
|
||||||
|
> the layer's own vsync wait on would *also* block `present`/`nextDrawable` on the main thread and
|
||||||
|
> serialize it to the display — the cause of the fullscreen judder; disabling it lets present return
|
||||||
|
> promptly. Present is stamped at the display link's `targetTimestamp` projected to `CLOCK_REALTIME`
|
||||||
|
> (the actual on-glass instant, <1 vsync after the draw — accurate for the HUD).
|
||||||
|
>
|
||||||
|
> *(History: an off-main `CAMetalDisplayLink` variant and an off-main blocking-render present thread
|
||||||
|
> were both tried and **reverted** — both measured slower on macOS *and* iPad than this main-thread
|
||||||
|
> display-link path, whose real judder fix was simply `displaySyncEnabled = false`, not moving present
|
||||||
|
> off-thread. Measured ~11 ms p50 on the main-thread path.)*
|
||||||
|
>
|
||||||
|
> **4:4:4.** Chroma, bit-depth, and colorimetry are orthogonal: the decode pixel format is a 2×2 of
|
||||||
|
> `(chroma, HDR)` → `420v/x420/444v/x444` (all biplanar, so the existing shaders sample a full-size
|
||||||
|
> chroma plane unchanged); the shader is keyed only on HDR. The client advertises `VIDEO_CAP_444` only
|
||||||
|
> when `Stage444Probe` confirms **hardware** 4:4:4 decode (a hardware-required `VTDecompressionSession`
|
||||||
|
> over an embedded 256×256 4:4:4 keyframe — software 4:4:4 is too slow for real-time; validated on M3:
|
||||||
|
> `444v`/`x444` produced). A bounded pump backstop ends a 4:4:4 session that persistently fails to
|
||||||
|
> decode (gated to 4:4:4 sessions, so 4:2:0 loss-recovery is untouched).
|
||||||
|
|
||||||
## Why stage 2 (design rationale)
|
## Why stage 2 (design rationale)
|
||||||
|
|
||||||
@@ -47,10 +96,28 @@ Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → dis
|
|||||||
|
|
||||||
## Open items
|
## Open items
|
||||||
|
|
||||||
- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit
|
- **On-glass HDR validation** — eyeball `edrMetadata` + `opticalOutputScale: 203` on a real EDR panel
|
||||||
`…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap).
|
(XDR display) against stage-1 side-by-side: diffuse white should sit at SDR-white level with only
|
||||||
|
highlights climbing. The reference white is a single named constant (`hdrReferenceWhiteNits`) for easy
|
||||||
|
tuning. (Needs a Windows HDR host; the Linux host is 8-bit SDR only.)
|
||||||
|
- **On-glass 4:4:4 validation** — confirm a `PUNKTFUNK_444` host (RTX box) streams a 4:4:4 session the
|
||||||
|
client decodes in hardware (HUD shows the resolved chroma); verify the resolution-ceiling backstop by
|
||||||
|
forcing a too-large 4:4:4 mode.
|
||||||
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture
|
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture
|
||||||
term.
|
term and confirm the main-thread display-link present p50 holds at ~11 ms (and isn't regressed by the
|
||||||
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can come
|
per-frame `configure` / HDR-anchor work).
|
||||||
later if frames look uneven.
|
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; an optional even-pacing
|
||||||
- **iOS / iPadOS / tvOS stage-2 variants.**
|
policy (`present(_:afterMinimumDuration:)`) can come later if frames look uneven.
|
||||||
|
- **4:4:4 runtime downgrade-reconnect** — today a persistently-undecodable 4:4:4 session ends cleanly
|
||||||
|
(the live 4:4:4 decode requires hardware, so a resolution-ceiling miss fails the session create
|
||||||
|
*synchronously* and the pump backstop ends it — no black-screen loop); auto-reconnecting at 4:2:0
|
||||||
|
(dropping `VIDEO_CAP_444`) is a future refinement.
|
||||||
|
- **HLG** — `isHDR`/`isHDRFormat` fold HLG (transfer 18) in with PQ, but the presenter is PQ-only
|
||||||
|
(`itur_2100_PQ` + `hdr10` EDR), so an HLG stream would be mis-toned. Latent — no host emits HLG
|
||||||
|
(the stack is BT.2020 **PQ** only). A real HLG path (`itur_2100_HLG`, no PQ reference-white anchor)
|
||||||
|
is future work; until then HLG should be treated as out of scope.
|
||||||
|
- **Full-range** — the shaders hardcode limited→full expansion and the decoder requests the
|
||||||
|
`*VideoRange` formats regardless of `connection.colorFullRange`; VideoToolbox range-converts a
|
||||||
|
full-range source to video range on decode, so it stays self-consistent (mild level compression on
|
||||||
|
genuinely full-range content, which no host emits). Pre-existing; wire `colorFullRange` into the
|
||||||
|
range constants eventually.
|
||||||
|
|||||||
Reference in New Issue
Block a user