feat(apple): Improve presenter
apple / screenshots (push) Has been cancelled
apple / swift (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
ci / web (push) Has been cancelled
ci / rust (push) Has been cancelled
android-screenshots / screenshots (push) Successful in 2m16s
deb / build-publish (push) Successful in 3m26s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
windows-host / package (push) Successful in 6m48s
release / apple (push) Successful in 7m45s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
android / android (push) Successful in 9m35s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m32s
linux-client-screenshots / screenshots (push) Successful in 2m31s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m53s
web-screenshots / screenshots (push) Successful in 2m32s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m37s
docker / deploy-docs (push) Failing after 1m4s
flatpak / build-publish (push) Failing after 3m44s
apple / screenshots (push) Has been cancelled
apple / swift (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
ci / web (push) Has been cancelled
ci / rust (push) Has been cancelled
android-screenshots / screenshots (push) Successful in 2m16s
deb / build-publish (push) Successful in 3m26s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
windows-host / package (push) Successful in 6m48s
release / apple (push) Successful in 7m45s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
android / android (push) Successful in 9m35s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m32s
linux-client-screenshots / screenshots (push) Successful in 2m31s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m53s
web-screenshots / screenshots (push) Successful in 2m32s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m37s
docker / deploy-docs (push) Failing after 1m4s
flatpak / build-publish (push) Failing after 3m44s
feat(apple): add cursor capture on iPad
This commit is contained in:
@@ -144,11 +144,25 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc
|
||||
`test-loopback.sh` (Swift client vs synthetic punktfunk1-hosts on loopback — runs on macOS;
|
||||
includes the pairing ceremony + `--require-pairing` gate),
|
||||
`RemoteFirstLightTests` (full pipeline over the LAN). See
|
||||
[`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter**
|
||||
(`VTDecompressionSession` + `CAMetalLayer`) is built and live-validated on glass behind the opt-in
|
||||
`punktfunk.presenter` flag (~11 ms p50 capture→present), to become the default after a few
|
||||
resolution/HDR checks. Next: make stage 2 the default, glass-to-glass numbers via
|
||||
`tools/latency-probe`, iOS/iPadOS/tvOS variants.
|
||||
[`clients/apple/README.md`](clients/apple/README.md). **Stage 2 presenter is now the DEFAULT**
|
||||
(stage-1 is the Metal-unavailable / DEBUG fallback): explicit `VTDecompressionSession` decode →
|
||||
`CAMetalLayer`, presented from the hosting view's **main-runloop `CADisplayLink`** (`renderTick` pops
|
||||
the newest ready frame per vsync; macOS `displaySyncEnabled = false` is the real fullscreen-judder fix,
|
||||
~11 ms p50). *(An off-main `CAMetalDisplayLink` and an off-main blocking-render present thread were
|
||||
both tried and reverted — both measured slower on macOS and iPad.)* **HDR fixed**
|
||||
(`design/apple-stage2-presenter.md`): the "too bright" bug was a missing reference-white anchor — the
|
||||
fix keeps the PQ-passthrough shader and adds `CAEDRMetadata.hdr10(…, opticalOutputScale: 203)` +
|
||||
`wantsExtendedDynamicRangeContent` on the layer (on all platforms; the old `#if os(macOS)` guard left
|
||||
iOS/tvOS EDR half-engaged), routing the 0xCE mastering metadata to the layer (via `setHdrMeta`) instead
|
||||
of a never-composited source buffer. **Mid-session SDR↔HDR** is handled: `render` reconciles the layer
|
||||
per-frame from the decoded `frame.isHDR` (per-mode pixel format `bgra8`/`rgba16Float`), so a game
|
||||
entering HDR mid-stream just reconfigures (last 0xCE grade cached + re-applied; pump drains 0xCE
|
||||
unconditionally). **4:4:4 added**: decode format is a 2×2 `(chroma, HDR)` matrix
|
||||
(`420v/x420/444v/x444`, all biplanar so the shaders are unchanged), advertised (`VIDEO_CAP_444`) only
|
||||
behind a **hardware-required `VTDecompressionSession` probe** (`Stage444Probe`, validated on M3) with a
|
||||
Settings opt-out + a bounded pump backstop for an undecodable 4:4:4 session. *HDR brightness + 4:4:4
|
||||
still need on-glass validation (Windows-HDR / `PUNKTFUNK_444` host).* Next: glass-to-glass numbers via
|
||||
`tools/latency-probe`.
|
||||
**Linux stage 1 done, first light 2026-06-12** (`clients/linux`, binary
|
||||
`punktfunk-client`): GTK4/libadwaita shell linking `punktfunk-core` directly (no C ABI;
|
||||
`NativeClient` is now `Sync` — mutexed plane receivers), mDNS host list, TOFU + SPAKE2
|
||||
|
||||
@@ -129,6 +129,8 @@ final class SessionModel: ObservableObject {
|
||||
#endif
|
||||
}()
|
||||
let hdrCapable = hdrEnabled && displayHDR
|
||||
// 4:4:4 opt-out (default on); the hardware-decode probe below is the real gate.
|
||||
let want444 = (UserDefaults.standard.object(forKey: DefaultsKey.enable444) as? Bool) ?? true
|
||||
Task.detached(priority: .userInitiated) {
|
||||
// PunktfunkConnection.init blocks on the QUIC handshake — keep it off the main
|
||||
// actor. The persistent identity is presented on every connect so a paired
|
||||
@@ -138,9 +140,21 @@ final class SessionModel: ObservableObject {
|
||||
// Advertise 10-bit + HDR10 when enabled: the host upgrades to a BT.2020 PQ Main10 stream
|
||||
// only for actual HDR content (its own gate); the VideoToolbox/Metal present path is
|
||||
// HDR-capable (P010 + itur_2100_PQ + EDR). 0 keeps the 8-bit BT.709 SDR stream.
|
||||
let videoCaps: UInt8 = hdrCapable
|
||||
var videoCaps: UInt8 = hdrCapable
|
||||
? (PunktfunkConnection.videoCap10Bit | PunktfunkConnection.videoCapHDR)
|
||||
: 0
|
||||
// Advertise full-chroma 4:4:4 only when allowed AND this device can HARDWARE-decode it
|
||||
// (software 4:4:4 is too slow for real-time). The host content-gates depth, so an
|
||||
// HDR-advertised session can still receive an 8-bit 4:4:4 stream (SDR content) — require
|
||||
// BOTH depths there. Otherwise a no-op (the host emits 4:4:4 only if it too opted in);
|
||||
// `chromaFormat` on the connection reflects what was actually resolved.
|
||||
let canDecode444 =
|
||||
hdrCapable
|
||||
? (Stage444Probe.hwDecode444_8bit && Stage444Probe.hwDecode444_10bit)
|
||||
: Stage444Probe.hwDecode444_8bit
|
||||
if want444, canDecode444 {
|
||||
videoCaps |= PunktfunkConnection.videoCap444
|
||||
}
|
||||
let result = Result { try PunktfunkConnection(
|
||||
host: host.address, port: host.port,
|
||||
width: width, height: height, refreshHz: hz,
|
||||
|
||||
@@ -25,6 +25,7 @@ struct SettingsView: View {
|
||||
@AppStorage(DefaultsKey.bitrateKbps) private var bitrateKbps = 0
|
||||
@AppStorage(DefaultsKey.presenter) private var presenter = "stage2"
|
||||
@AppStorage(DefaultsKey.hdrEnabled) private var hdrEnabled = true
|
||||
@AppStorage(DefaultsKey.enable444) private var enable444 = true
|
||||
@AppStorage(DefaultsKey.libraryEnabled) private var libraryEnabled = false
|
||||
@AppStorage(DefaultsKey.fullscreenWhileStreaming) private var fullscreenWhileStreaming = true
|
||||
@AppStorage(DefaultsKey.micEnabled) private var micEnabled = true
|
||||
@@ -36,6 +37,7 @@ struct SettingsView: View {
|
||||
@State private var showControllerTest = false
|
||||
#endif
|
||||
#if os(iOS)
|
||||
@AppStorage(DefaultsKey.pointerCapture) private var pointerCapture = true
|
||||
// The sidebar selection drives the detail pane on iPad and the pushed sub-page on iPhone.
|
||||
// Width class decides the initial value: nil on iPhone (show the category list first),
|
||||
// General on iPad (a two-column layout should never open with an empty detail).
|
||||
@@ -206,6 +208,7 @@ struct SettingsView: View {
|
||||
case .general:
|
||||
Form {
|
||||
streamModeSection
|
||||
pointerSection
|
||||
compositorSection
|
||||
}
|
||||
.formStyle(.grouped)
|
||||
@@ -581,6 +584,30 @@ struct SettingsView: View {
|
||||
}
|
||||
}
|
||||
|
||||
#if os(iOS)
|
||||
/// iPad-only pointer-capture toggle: lock the mouse/trackpad for relative movement (games) vs
|
||||
/// forward an absolute cursor position (desktop). Empty on iPhone (no hardware-pointer lock —
|
||||
/// the mouse path there is always the absolute fallback).
|
||||
@ViewBuilder private var pointerSection: some View {
|
||||
if UIDevice.current.userInterfaceIdiom == .pad {
|
||||
Section {
|
||||
Toggle("Capture pointer for games", isOn: $pointerCapture)
|
||||
} header: {
|
||||
Text("Pointer")
|
||||
} footer: {
|
||||
Text("With a mouse or trackpad connected, lock the pointer and send relative "
|
||||
+ "movement — the expected behavior for games (mouse-look). Turn this off for "
|
||||
+ "desktop use to keep the pointer free and send its absolute position instead. "
|
||||
+ "The lock needs the stream full-screen and frontmost; it falls back to the "
|
||||
+ "absolute pointer automatically (Stage Manager, Slide Over). Finger touch is "
|
||||
+ "unaffected. Applies from the next session.")
|
||||
.font(.geist(12, relativeTo: .caption))
|
||||
.foregroundStyle(.secondary)
|
||||
}
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
@ViewBuilder private var compositorSection: some View {
|
||||
Section {
|
||||
Picker("Compositor", selection: $compositor) {
|
||||
@@ -644,12 +671,15 @@ struct SettingsView: View {
|
||||
@ViewBuilder private var hdrSection: some View {
|
||||
Section {
|
||||
Toggle("10-bit HDR", isOn: $hdrEnabled)
|
||||
Toggle("Full chroma (4:4:4)", isOn: $enable444)
|
||||
} header: {
|
||||
Text("HDR")
|
||||
Text("Video quality")
|
||||
} footer: {
|
||||
Text("Request a 10-bit BT.2020 PQ (HDR10) stream. It only engages when the host is "
|
||||
+ "sending HDR content AND this display supports HDR — otherwise the stream stays "
|
||||
+ "8-bit SDR. Applies from the next session.")
|
||||
Text("HDR requests a 10-bit BT.2020 PQ (HDR10) stream — it only engages when the host is "
|
||||
+ "sending HDR content AND this display supports HDR. 4:4:4 requests full chroma "
|
||||
+ "(sharper text/UI, more bandwidth) — it only engages when this device can "
|
||||
+ "hardware-decode it AND the host opted in. Otherwise the stream stays 8-bit "
|
||||
+ "4:2:0 SDR. Applies from the next session.")
|
||||
.font(.geist(12, relativeTo: .caption))
|
||||
.foregroundStyle(.secondary)
|
||||
}
|
||||
|
||||
@@ -25,9 +25,19 @@ public enum DefaultsKey {
|
||||
/// Request a 10-bit BT.2020 PQ (HDR10) stream. On by default; only takes effect when the host
|
||||
/// has HDR content AND this display supports HDR — otherwise the stream stays 8-bit SDR.
|
||||
public static let hdrEnabled = "punktfunk.hdrEnabled"
|
||||
/// Request a full-chroma 4:4:4 stream when this device can HARDWARE-decode it (`Stage444Probe`).
|
||||
/// On by default; only takes effect when the host also opted in to 4:4:4 (otherwise the stream
|
||||
/// stays 4:2:0). Sharper text/UI at the cost of more bandwidth.
|
||||
public static let enable444 = "punktfunk.enable444"
|
||||
public static let hosts = "punktfunk.hosts"
|
||||
/// Client-side cursor mode: "auto" (shown only in gamescope sessions), "always", "never".
|
||||
public static let cursorMode = "punktfunk.cursorMode"
|
||||
/// iPad: capture the mouse/trackpad pointer (pointer lock → relative movement) for games,
|
||||
/// rather than forwarding an absolute cursor position. On by default. Only meaningful on iPad
|
||||
/// with a hardware mouse/trackpad; the system grants the lock only to a full-screen, frontmost
|
||||
/// scene and silently falls back to the absolute pointer when it can't (Stage Manager / Slide
|
||||
/// Over). Read by `StreamViewController.prefersPointerLocked`.
|
||||
public static let pointerCapture = "punktfunk.pointerCapture"
|
||||
/// Experimental: show the host's game library (browsed over the management API). Off by default.
|
||||
public static let libraryEnabled = "punktfunk.libraryEnabled"
|
||||
/// macOS: take the window fullscreen while streaming and restore it on the host list. On by default.
|
||||
|
||||
@@ -370,29 +370,32 @@ public final class GamepadFeedback {
|
||||
// Hidout traffic (lightbar / player LEDs / triggers) only exists on a PlayStation-pad
|
||||
// session — a DualSense or a DualShock 4 (lightbar only). Block briefly on it there and
|
||||
// let rumble own the wait elsewhere; on an Xbox session it stays nonblocking.
|
||||
let hasHidout = connection.resolvedGamepad == .dualSense
|
||||
|| connection.resolvedGamepad == .dualShock4
|
||||
let hidTimeout: UInt32 = hasHidout ? 10 : 0
|
||||
let thread = Thread { [connection, flag, drainDone, weak self] in
|
||||
while !flag.isStopped {
|
||||
do {
|
||||
if let r = try connection.nextRumble(timeoutMs: 10), r.pad == 0 {
|
||||
// Poll the feedback planes NON-BLOCKING. A blocking poll (timeoutMs > 0) holds
|
||||
// the connection's shared feedback lock for its whole wait; the video pump drains
|
||||
// HDR mastering metadata (nextHdrMeta) on the SAME lock every frame, so a blocking
|
||||
// poll here starved it and throttled HDR to ~1 fps (SDR, which never drains HDR
|
||||
// meta, was unaffected). Pacing with a short sleep OUTSIDE the lock (below) keeps
|
||||
// rumble/HID latency low while leaving the lock free between polls.
|
||||
if let r = try connection.nextRumble(timeoutMs: 0), r.pad == 0 {
|
||||
self?.rumble.apply(low: r.low, high: r.high)
|
||||
}
|
||||
// Drain a BOUNDED burst of hidout events: only the first poll waits,
|
||||
// and the cap + stop check keep sustained 0xCD traffic (a game writing
|
||||
// per-frame LED/trigger reports) from starving the rumble poll above
|
||||
// or blocking stop() past one cycle.
|
||||
// Drain a BOUNDED burst of hidout events so sustained 0xCD traffic (a game writing
|
||||
// per-frame LED/trigger reports) can't spin here or block stop() past one cycle.
|
||||
var burst = 0
|
||||
while burst < 64, !flag.isStopped,
|
||||
let ev = try connection.nextHidOutput(
|
||||
timeoutMs: burst == 0 ? hidTimeout : 0) {
|
||||
let ev = try connection.nextHidOutput(timeoutMs: 0) {
|
||||
self?.render(ev)
|
||||
burst += 1
|
||||
}
|
||||
} catch {
|
||||
break // .closed (or fatal) — the session is over
|
||||
}
|
||||
// ~8 ms poll cadence (≈125 Hz), slept OUTSIDE the feedback lock — low rumble/HID
|
||||
// latency without holding the lock the HDR-meta drain needs.
|
||||
if !flag.isStopped { Thread.sleep(forTimeInterval: 0.008) }
|
||||
}
|
||||
drainDone.signal()
|
||||
}
|
||||
|
||||
@@ -107,6 +107,23 @@ public final class InputCapture {
|
||||
/// macOS (no GCMouse handlers installed; `sendMouseAbs` is never called there). Main-queue.
|
||||
public var gcMouseForwarding = false
|
||||
|
||||
#if os(iOS)
|
||||
/// Whether any device is attached as a `GCMouse` right now. The Magic Keyboard TRACKPAD does
|
||||
/// not always register as a GCMouse on iPadOS (only a standalone mouse does) — when no GCMouse
|
||||
/// is present the relative GCMouse path can't carry pointer motion. Main-queue.
|
||||
public var hasGCMouse: Bool { !mice.isEmpty }
|
||||
|
||||
/// Diagnostic: a one-line description of every attached GCMouse (count + GCDevice identity), so
|
||||
/// PUNKTFUNK_INPUT_DEBUG can reveal whether the trackpad showed up as a mouse at all.
|
||||
public var attachedMiceSummary: String {
|
||||
guard !mice.isEmpty else { return "0 mice" }
|
||||
let parts = mice.map { mouse -> String in
|
||||
"\(mouse.productCategory)/\(mouse.vendorName ?? "?")"
|
||||
}
|
||||
return "\(mice.count) mice: \(parts.joined(separator: ", "))"
|
||||
}
|
||||
#endif
|
||||
|
||||
/// Fired on ⌘⎋ (the capture toggle — detected here so it works in both states; the
|
||||
/// event itself is swallowed). Main queue.
|
||||
public var onToggleCapture: (() -> Void)?
|
||||
@@ -394,6 +411,12 @@ public final class InputCapture {
|
||||
!mice.contains(where: { $0 === mouse }) // re-delivered on wake — attach once
|
||||
else { return }
|
||||
mice.append(mouse)
|
||||
#if os(iOS)
|
||||
if inputDebug {
|
||||
inputLog.debug(
|
||||
"GCMouse attached: \(mouse.productCategory, privacy: .public)/\(mouse.vendorName ?? "?", privacy: .public) — now \(self.attachedMiceSummary, privacy: .public)")
|
||||
}
|
||||
#endif
|
||||
// macOS drives motion + buttons from NSEvent (StreamLayerView's local monitor →
|
||||
// sendMotion/sendMouseButton) because GCMouse's handlers proved unreliable there;
|
||||
// installing them too would double-send. iOS keeps GCMouse (raw deltas under
|
||||
|
||||
@@ -1,10 +1,12 @@
|
||||
// Stage-2 presenter, present half: draw a decoded NV12 CVPixelBuffer into a CAMetalLayer
|
||||
// drawable with a BT.709 YUV→RGB shader. The display link (owned by the hosting view) drives
|
||||
// `render` once per vsync with the target present time, so a present can finally be stamped and
|
||||
// the present tail hand-paced. See docs apple-stage2-presenter.md.
|
||||
// Stage-2 presenter, present half: draw a decoded NV12 / P010 / 4:4:4 CVPixelBuffer into a CAMetalLayer
|
||||
// drawable with a Y′CbCr→RGB shader. The hosting view's CADisplayLink drives `render` once per vsync
|
||||
// (via Stage2Pipeline.renderTick) with the target present time, so a present can be stamped and the
|
||||
// present tail hand-paced. See docs apple-stage2-presenter.md.
|
||||
//
|
||||
// Main-thread only: created during view setup, `render` called from the view's CADisplayLink
|
||||
// (which fires on the main runloop). The Metal objects + texture cache are touched only here.
|
||||
// Main-thread only: created during view setup, `render`/`configure` called from the view's CADisplayLink
|
||||
// (which fires on the main runloop). The Metal objects + texture cache are touched only here. The one
|
||||
// exception is `setHdrMeta`, called from the pump thread — it hops the layer write to main so every
|
||||
// CALayer mutation stays on one thread.
|
||||
|
||||
#if canImport(Metal) && canImport(QuartzCore)
|
||||
import CoreGraphics
|
||||
@@ -15,10 +17,19 @@ import os
|
||||
|
||||
private let presenterLog = Logger(subsystem: "io.unom.punktfunk", category: "presenter")
|
||||
|
||||
/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and a
|
||||
/// BT.709 limited-range NV12→RGB fragment shader. uv.y is flipped (1 - p.y) so the top-left-
|
||||
/// origin texture presents upright (NDC y is up), not upside down. (Colorspace is BT.709 SDR
|
||||
/// for now — matches the host; 10-bit/HDR + other matrices are a later tie-in.)
|
||||
/// HDR reference white (BT.2408 "HDR Reference White"): the absolute luminance, in nits, that the
|
||||
/// PQ signal's diffuse white sits at. Passed to `CAEDRMetadata.hdr10(opticalOutputScale:)`, it anchors
|
||||
/// 203-nit diffuse white at EDR 1.0 (the display's SDR-white level) and lets the system tone-map the
|
||||
/// brighter highlights into the panel's headroom. This is the missing anchor that made the old HDR path
|
||||
/// render "way too bright" (no `edrMetadata` → no reference-white anchoring); a LARGER value renders
|
||||
/// dimmer. Matches the host's standard PQ reference white.
|
||||
private let hdrReferenceWhiteNits: Float = 203.0
|
||||
|
||||
/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and BT.709 SDR
|
||||
/// and BT.2020-PQ HDR Y′CbCr→RGB fragment shaders. uv.y is flipped (1 - p.y) so the top-left-origin
|
||||
/// texture presents upright (NDC y is up). The HDR shader outputs PQ-encoded R′G′B′ as-is — the
|
||||
/// CAMetalLayer's `itur_2100_PQ` colour space + `edrMetadata` tell the system compositor the samples
|
||||
/// are PQ and how to tone-map them (no EOTF here, matching the host's BT.2020 PQ emission).
|
||||
private let shaderSource = """
|
||||
#include <metal_stdlib>
|
||||
using namespace metal;
|
||||
@@ -66,6 +77,8 @@ float catmullRomLuma(texture2d<float> tex, sampler s, float2 uv) {
|
||||
return r;
|
||||
}
|
||||
|
||||
// SDR: 8-bit NV12 / 4:4:4 (BT.709, limited/video range) → full-range RGB. Chroma is sampled at the
|
||||
// same UV as luma, so a full-size 4:4:4 chroma plane needs no shader change vs 4:2:0.
|
||||
fragment float4 pf_frag(VOut in [[stage_in]],
|
||||
texture2d<float> lumaTex [[texture(0)]],
|
||||
texture2d<float> chromaTex [[texture(1)]]) {
|
||||
@@ -82,18 +95,18 @@ fragment float4 pf_frag(VOut in [[stage_in]],
|
||||
return float4(saturate(float3(r, g, b)), 1.0);
|
||||
}
|
||||
|
||||
// HDR: 10-bit P010 (BT.2020, limited range), Y'CbCr that is PQ-encoded. We apply the BT.2020
|
||||
// matrix to get PQ-encoded R'G'B' and output it as-is — the CAMetalLayer's itur_2100_PQ colour
|
||||
// space + EDR tells the compositor the samples are PQ, so it does the PQ→display mapping. No EOTF
|
||||
// here (matching the host, which emitted BT.2020 PQ). P010 stores the 10-bit code in the high bits
|
||||
// of each 16-bit sample, so an .r16Unorm sample reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
|
||||
// HDR: 10-bit P010 / 4:4:4 (BT.2020, limited range), Y′CbCr that is PQ-encoded. We apply the BT.2020
|
||||
// matrix to get PQ-encoded R′G′B′ and output it as-is — the CAMetalLayer's itur_2100_PQ colour space
|
||||
// + edrMetadata tell the compositor the samples are PQ, so it does the PQ→display tone-map. No EOTF
|
||||
// here. P010/x444 store the 10-bit code in the high bits of each 16-bit sample, so an .r16Unorm sample
|
||||
// reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
|
||||
fragment float4 pf_frag_hdr(VOut in [[stage_in]],
|
||||
texture2d<float> lumaTex [[texture(0)]],
|
||||
texture2d<float> chromaTex [[texture(1)]]) {
|
||||
constexpr sampler s(filter::linear, address::clamp_to_edge);
|
||||
float y = catmullRomLuma(lumaTex, s, in.uv);
|
||||
float2 c = chromaTex.sample(s, in.uv).rg;
|
||||
// BT.2020 10-bit limited (video) range → full-range PQ R'G'B'.
|
||||
// BT.2020 10-bit limited (video) range → full-range PQ R′G′B′.
|
||||
y = (y - 64.0/1023.0) * (1023.0/876.0);
|
||||
float u = (c.x - 512.0/1023.0) * (1023.0/896.0);
|
||||
float v = (c.y - 512.0/1023.0) * (1023.0/896.0);
|
||||
@@ -110,26 +123,34 @@ public final class MetalVideoPresenter {
|
||||
|
||||
private let device: MTLDevice
|
||||
private let queue: MTLCommandQueue
|
||||
/// SDR (BT.709 8-bit NV12 → bgra8) and HDR (BT.2020 PQ 10-bit P010 → rgba16Float) pipelines.
|
||||
/// Selected per frame by `render`; the layer is reconfigured when the mode flips (HDR toggle).
|
||||
/// SDR (BT.709 8-bit → bgra8) and HDR (BT.2020 PQ 10-bit → rgba16Float) pipelines. Selected per
|
||||
/// frame in `render`; the layer is reconfigured to match when the session flips (HDR toggle).
|
||||
private let pipelineSDR: MTLRenderPipelineState
|
||||
private let pipelineHDR: MTLRenderPipelineState
|
||||
private var textureCache: CVMetalTextureCache?
|
||||
/// Current layer configuration — switched lazily in `configure(hdr:)` when a frame's mode differs.
|
||||
|
||||
/// Current layer configuration — switched in `configure(hdr:)` when a frame's HDR-ness differs.
|
||||
/// Main-thread only (read + written from `render`/`configure`, all on the display-link runloop).
|
||||
private var hdrActive = false
|
||||
/// Last HDR mastering grade received via `setHdrMeta` (the host's 0xCE). Cached so a mid-session
|
||||
/// SDR→HDR flip's `configureColor` re-applies the real grade instead of clobbering it back to the
|
||||
/// bare reference-white anchor (an out-of-order race otherwise: `setHdrMeta` and the flip both write
|
||||
/// `edrMetadata`). Main-thread only.
|
||||
private var lastHdrMeta: PunktfunkConnection.HdrMeta?
|
||||
|
||||
#if DEBUG
|
||||
/// Last logged "decoded→drawable" signature, so the diagnostic logs only when a size changes
|
||||
/// (on first frame, a resize, or a host Reconfigure) instead of every frame.
|
||||
/// Last logged "decoded→drawable" signature, so the diagnostic logs only on a size/HDR change.
|
||||
private var lastSizeSig = ""
|
||||
#endif
|
||||
|
||||
/// nil if Metal is unavailable (no GPU / a headless CI) — the caller falls back to stage-1.
|
||||
public init?() {
|
||||
/// nil if Metal is unavailable (no GPU / a headless CI) or a shader fails to compile — the caller
|
||||
/// falls back to stage-1.
|
||||
public static func make() -> MetalVideoPresenter? {
|
||||
guard let device = MTLCreateSystemDefaultDevice(),
|
||||
let queue = device.makeCommandQueue()
|
||||
else { return nil }
|
||||
self.device = device
|
||||
self.queue = queue
|
||||
let pipelineSDR: MTLRenderPipelineState
|
||||
let pipelineHDR: MTLRenderPipelineState
|
||||
do {
|
||||
let library = try device.makeLibrary(source: shaderSource, options: nil)
|
||||
let vtx = library.makeFunction(name: "pf_vtx")
|
||||
@@ -146,99 +167,140 @@ public final class MetalVideoPresenter {
|
||||
} catch {
|
||||
return nil
|
||||
}
|
||||
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)
|
||||
guard textureCache != nil else { return nil }
|
||||
var cache: CVMetalTextureCache?
|
||||
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &cache)
|
||||
guard let textureCache = cache else { return nil }
|
||||
|
||||
let layer = CAMetalLayer()
|
||||
layer.device = device
|
||||
layer.pixelFormat = .bgra8Unorm
|
||||
layer.framebufferOnly = true
|
||||
layer.isOpaque = true
|
||||
// Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let
|
||||
// the system compositor scale it to the layer's bounds — the same `.resizeAspect` path
|
||||
// stage-1's AVSampleBufferDisplayLayer (videoGravity) uses, so stage-2 matches its sharpness.
|
||||
// A native-resolution present is then pixel-exact (1:1, no shader scaling), and any display
|
||||
// scaling uses the system's high-quality scaler rather than the in-shader bicubic.
|
||||
layer.contentsGravity = .resizeAspect
|
||||
// Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the
|
||||
// display-link / MAIN thread) has to block waiting for one to free.
|
||||
layer.maximumDrawableCount = 3
|
||||
#if os(macOS)
|
||||
// The display link already paces exactly one present per vsync. Leaving the layer's
|
||||
// own vsync wait on means `commandBuffer.present` ALSO blocks for the hardware vsync,
|
||||
// so `nextDrawable()` stalls the MAIN thread until a drawable frees — windowed, the
|
||||
// WindowServer's looser compositing hides it; FULLSCREEN's tighter, more-direct path
|
||||
// serializes the main thread to the display and the stall surfaces as bad judder.
|
||||
// Disabling the layer-level sync lets present return promptly (the display link is the
|
||||
// pacing source), which is what fixes the fullscreen stutter. macOS-only property.
|
||||
// The display link already paces exactly one present per vsync. Leaving the layer's own vsync
|
||||
// wait on means `commandBuffer.present` ALSO blocks for the hardware vsync, so `nextDrawable()`
|
||||
// stalls the MAIN thread until a drawable frees — windowed, the WindowServer's looser
|
||||
// compositing hides it; FULLSCREEN's tighter path serializes the main thread to the display and
|
||||
// the stall surfaces as bad judder. Disabling the layer-level sync lets present return promptly
|
||||
// (the display link is the pacing source) — the fix for the fullscreen stutter. macOS-only.
|
||||
layer.displaySyncEnabled = false
|
||||
#endif
|
||||
// Render the drawable at the DECODED frame's resolution (set per-frame in `render`) and let the
|
||||
// system compositor scale it to the layer's bounds — the same `.resizeAspect` path stage-1's
|
||||
// AVSampleBufferDisplayLayer uses. A native-resolution present is then pixel-exact (1:1, no
|
||||
// shader scaling); a resized window rescales via the system's scaler.
|
||||
layer.contentsGravity = .resizeAspect
|
||||
// Triple-buffer: more in-flight drawables before `nextDrawable()` (called on the display-link /
|
||||
// MAIN thread) has to block waiting for one to free.
|
||||
layer.maximumDrawableCount = 3
|
||||
|
||||
return MetalVideoPresenter(
|
||||
device: device, queue: queue, pipelineSDR: pipelineSDR, pipelineHDR: pipelineHDR,
|
||||
textureCache: textureCache, layer: layer)
|
||||
}
|
||||
|
||||
private init(
|
||||
device: MTLDevice, queue: MTLCommandQueue, pipelineSDR: MTLRenderPipelineState,
|
||||
pipelineHDR: MTLRenderPipelineState, textureCache: CVMetalTextureCache, layer: CAMetalLayer
|
||||
) {
|
||||
self.device = device
|
||||
self.queue = queue
|
||||
self.pipelineSDR = pipelineSDR
|
||||
self.pipelineHDR = pipelineHDR
|
||||
self.textureCache = textureCache
|
||||
self.layer = layer
|
||||
}
|
||||
|
||||
/// Reconfigure the layer for SDR or HDR when the stream mode flips (HDR toggle). HDR uses an
|
||||
/// rgba16Float drawable + a BT.2020 PQ colour space + EDR, so the compositor PQ-maps to the
|
||||
/// display; SDR uses the plain 8-bit sRGB path. Main-thread only (called from `render`).
|
||||
private func configure(hdr: Bool) {
|
||||
/// Configure the layer + active pipeline for an SDR or HDR session. MAIN THREAD ONLY. Called once at
|
||||
/// session start and again per-frame from `render` (idempotent — the guard makes a same-state call a
|
||||
/// no-op), so a mid-session HDR toggle (the host re-inits its encoder; the decoded `frame.isHDR`
|
||||
/// flips) reconfigures here automatically. HDR uses an rgba16Float drawable + BT.2020 PQ colour space
|
||||
/// + EDR with a 203-nit reference-white anchor; SDR uses the plain 8-bit sRGB path.
|
||||
public func configure(hdr: Bool) {
|
||||
guard hdr != hdrActive else { return }
|
||||
hdrActive = hdr
|
||||
configureColor(hdr: hdr)
|
||||
}
|
||||
|
||||
/// Set the layer's pixel format + colour config for SDR or HDR. MAIN THREAD ONLY. EDR is requested
|
||||
/// on ALL platforms — the property is available on macOS/iOS/tvOS at our deployment floor, and the
|
||||
/// old `#if os(macOS)` guard left iOS/tvOS EDR half-engaged.
|
||||
private func configureColor(hdr: Bool) {
|
||||
if hdr {
|
||||
layer.pixelFormat = .rgba16Float
|
||||
layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
|
||||
#if os(macOS)
|
||||
layer.wantsExtendedDynamicRangeContent = true
|
||||
#endif
|
||||
// Anchor reference white. Re-apply the real grade if one already arrived (0xCE before the
|
||||
// flip); otherwise the bare 203-nit anchor. Without this anchor the PQ signal is too bright.
|
||||
layer.edrMetadata = makeEDR(lastHdrMeta)
|
||||
} else {
|
||||
// SDR: gamma-encoded BT.709 [0,1] in an 8-bit drawable; a nil colorspace tags it device/sRGB
|
||||
// (the proven SDR path — never showed the "too bright" issue, which was HDR-only).
|
||||
layer.pixelFormat = .bgra8Unorm
|
||||
layer.colorspace = nil
|
||||
#if os(macOS)
|
||||
layer.wantsExtendedDynamicRangeContent = false
|
||||
#endif
|
||||
layer.edrMetadata = nil
|
||||
}
|
||||
}
|
||||
|
||||
/// Draw one decoded frame to the next drawable and present it. `isHDR` selects the 10-bit
|
||||
/// BT.2020 PQ path (P010 input) vs the 8-bit BT.709 path (NV12 input). Returns true on success;
|
||||
/// false when there's no drawable yet, a texture couldn't be made, or Metal errored — the
|
||||
/// caller then doesn't stamp a present for this frame.
|
||||
private func makeEDR(_ meta: PunktfunkConnection.HdrMeta?) -> CAEDRMetadata {
|
||||
CAEDRMetadata.hdr10(
|
||||
displayInfo: meta?.masteringDisplayColorVolume(),
|
||||
contentInfo: meta?.contentLightLevelInfo(),
|
||||
opticalOutputScale: hdrReferenceWhiteNits)
|
||||
}
|
||||
|
||||
/// Update the HDR mastering metadata (drained from the host's 0xCE datagram) to refine the system
|
||||
/// tone-map from the real grade. Called from the PUMP thread, so the layer write is hopped to MAIN
|
||||
/// (every CALayer mutation stays on one thread). The grade is cached so a later SDR→HDR
|
||||
/// `configureColor` re-applies it; the `edrMetadata` write is gated on `hdrActive` (setting it on an
|
||||
/// SDR layer is harmless but pointless, and the flip will apply it anyway).
|
||||
public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) {
|
||||
DispatchQueue.main.async { [weak self] in
|
||||
guard let self else { return }
|
||||
self.lastHdrMeta = meta
|
||||
if self.hdrActive { self.layer.edrMetadata = self.makeEDR(meta) }
|
||||
}
|
||||
}
|
||||
|
||||
/// Draw one decoded frame to the next drawable and present it. MAIN THREAD (the display link).
|
||||
/// `isHDR` selects the 10-bit BT.2020 PQ path vs the 8-bit BT.709 path and is reconciled with the
|
||||
/// layer config via `configure`. Returns true on success; false when there's no drawable yet, a
|
||||
/// texture couldn't be made, or Metal errored — the caller then doesn't stamp a present.
|
||||
@discardableResult
|
||||
public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool {
|
||||
// Reconcile the layer with the decoded frame's HDR-ness (handles a mid-session SDR↔HDR flip).
|
||||
configure(hdr: isHDR)
|
||||
// P010 stores 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12 is 8-bit → R8/RG8.
|
||||
let lumaFmt: MTLPixelFormat = isHDR ? .r16Unorm : .r8Unorm
|
||||
let chromaFmt: MTLPixelFormat = isHDR ? .rg16Unorm : .rg8Unorm
|
||||
|
||||
// P010/x444 store 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12/444v is 8-bit → R8/RG8.
|
||||
// Derived from the actual decoded buffer so a 4:4:4 (full chroma plane) frame just works.
|
||||
let pf = CVPixelBufferGetPixelFormatType(pixelBuffer)
|
||||
let tenBit =
|
||||
pf == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|
||||
|| pf == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
|
||||
|| pf == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|
||||
|| pf == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
|
||||
guard let textureCache,
|
||||
let luma = makeTexture(pixelBuffer, plane: 0, format: lumaFmt, cache: textureCache),
|
||||
let chroma = makeTexture(pixelBuffer, plane: 1, format: chromaFmt, cache: textureCache)
|
||||
let luma = makeTexture(
|
||||
pixelBuffer, plane: 0, format: tenBit ? .r16Unorm : .r8Unorm, cache: textureCache),
|
||||
let chroma = makeTexture(
|
||||
pixelBuffer, plane: 1, format: tenBit ? .rg16Unorm : .rg8Unorm, cache: textureCache)
|
||||
else { return false }
|
||||
|
||||
// Size the drawable to the decoded frame so the fullscreen triangle samples the texture 1:1
|
||||
// (pixel-exact); the layer's contentsGravity then scales it to the on-screen bounds via the
|
||||
// system compositor (matching stage-1). Re-set only on a change (first frame / Reconfigure).
|
||||
// Size the drawable to the decoded frame so the fullscreen triangle samples 1:1 (pixel-exact);
|
||||
// the layer's contentsGravity then scales it to the on-screen bounds via the system compositor
|
||||
// (matching stage-1). drawableSize does NOT track bounds (defaults to 0), so set it BEFORE
|
||||
// nextDrawable; re-set only on a change (first frame / Reconfigure / HDR flip).
|
||||
let decodedSize = CGSize(
|
||||
width: CVPixelBufferGetWidth(pixelBuffer), height: CVPixelBufferGetHeight(pixelBuffer))
|
||||
if layer.drawableSize != decodedSize { layer.drawableSize = decodedSize }
|
||||
#if DEBUG
|
||||
logSizeIfChanged(decoded: decodedSize)
|
||||
#endif
|
||||
guard let drawable = layer.nextDrawable(),
|
||||
let commandBuffer = queue.makeCommandBuffer()
|
||||
else { return false }
|
||||
|
||||
#if DEBUG
|
||||
// Diagnose sharpness: decoded should equal the drawable (the shader is 1:1); the layer's
|
||||
// bounds may differ (the system scales). Logged only when a size changes.
|
||||
let decodedW = Int(decodedSize.width)
|
||||
let decodedH = Int(decodedSize.height)
|
||||
let sig = "\(decodedW)x\(decodedH)|\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height))"
|
||||
if sig != lastSizeSig {
|
||||
lastSizeSig = sig
|
||||
let msg = "stage2: decoded \(decodedW)x\(decodedH) → drawable "
|
||||
+ "\(Int(layer.drawableSize.width))x\(Int(layer.drawableSize.height)) "
|
||||
+ "(texture \(drawable.texture.width)x\(drawable.texture.height), "
|
||||
+ "contentsScale \(layer.contentsScale), "
|
||||
+ "layerBounds \(Int(layer.bounds.width))x\(Int(layer.bounds.height)))"
|
||||
presenterLog.info("\(msg, privacy: .public)")
|
||||
}
|
||||
#endif
|
||||
|
||||
let pass = MTLRenderPassDescriptor()
|
||||
pass.colorAttachments[0].texture = drawable.texture
|
||||
pass.colorAttachments[0].loadAction = .clear
|
||||
@@ -247,24 +309,23 @@ public final class MetalVideoPresenter {
|
||||
guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
|
||||
return false
|
||||
}
|
||||
encoder.setRenderPipelineState(isHDR ? pipelineHDR : pipelineSDR)
|
||||
encoder.setRenderPipelineState(hdrActive ? pipelineHDR : pipelineSDR)
|
||||
encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
|
||||
encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
|
||||
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
|
||||
encoder.endEncoding()
|
||||
commandBuffer.present(drawable) // present at the next vsync — lowest latency
|
||||
// Hold the CVMetalTextures + the source pixel buffer (its IOSurface) alive until the GPU
|
||||
// finishes sampling — releasing them at scope exit could free the backing mid-read.
|
||||
// Hold the CVMetalTextures + source pixel buffer (its IOSurface) alive until the GPU finishes
|
||||
// sampling — releasing them at scope exit could free the backing mid-read.
|
||||
commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) }
|
||||
commandBuffer.commit()
|
||||
return true
|
||||
}
|
||||
|
||||
/// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past
|
||||
/// the draw — the MTLTexture is only valid while its CVMetalTexture is retained.
|
||||
/// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past the
|
||||
/// draw — the MTLTexture is only valid while its CVMetalTexture is retained.
|
||||
private func makeTexture(
|
||||
_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat,
|
||||
cache: CVMetalTextureCache
|
||||
_ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat, cache: CVMetalTextureCache
|
||||
) -> CVMetalTexture? {
|
||||
let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
|
||||
let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
|
||||
@@ -276,5 +337,16 @@ public final class MetalVideoPresenter {
|
||||
else { return nil }
|
||||
return cvTexture
|
||||
}
|
||||
|
||||
#if DEBUG
|
||||
private func logSizeIfChanged(decoded: CGSize) {
|
||||
let sig = "\(Int(decoded.width))x\(Int(decoded.height))|hdr\(hdrActive ? 1 : 0)"
|
||||
if sig != lastSizeSig {
|
||||
lastSizeSig = sig
|
||||
let msg = "stage2: decoded \(Int(decoded.width))x\(Int(decoded.height)) hdr=\(hdrActive)"
|
||||
presenterLog.info("\(msg, privacy: .public)")
|
||||
}
|
||||
}
|
||||
#endif
|
||||
}
|
||||
#endif
|
||||
|
||||
@@ -0,0 +1,94 @@
|
||||
// Steers the system's iPad pointer-lock resolution down to a chosen "anchor" view controller.
|
||||
//
|
||||
// `UIViewController.prefersPointerLocked` is resolved the same way as the status bar: the system
|
||||
// walks DOWN from the window's root view controller through `childViewControllerForPointerLock`.
|
||||
// SwiftUI's hosting / container view controllers do NOT forward that query to their children, so a
|
||||
// `UIViewControllerRepresentable` controller buried in the SwiftUI tree (our StreamViewController)
|
||||
// is never consulted — its `prefersPointerLocked = true` is silently ignored and a Magic Keyboard
|
||||
// trackpad / mouse falls through to the absolute-pointer path instead of being captured.
|
||||
//
|
||||
// Swizzling the DEFAULT implementation isn't enough: the controllers that break the chain
|
||||
// (UIHostingController and SwiftUI's internal containers) provide their OWN implementation of the
|
||||
// property, so a base-class swizzle never runs for them. Instead we walk UP the LIVE `parent`
|
||||
// chain from the anchor to the window root and, on each real ancestor, force
|
||||
// `childViewControllerForPointerLock` to return the next controller toward the anchor. Each forced
|
||||
// value is a genuine direct child (we follow the actual containment chain), so the system's
|
||||
// downward walk reaches the anchor and reads its `prefersPointerLocked`.
|
||||
//
|
||||
// The forcing is per-INSTANCE — an associated object — gated behind a one-time per-CLASS IMP
|
||||
// swizzle. Only the specific controllers in the anchor's chain are affected; every other instance
|
||||
// of those classes keeps its original behavior (associated object nil → original IMP). The forced
|
||||
// values are cleared on disengage so the long-lived SwiftUI parents don't retain a stale controller
|
||||
// across sessions. Only the PUBLIC `childViewControllerForPointerLock` selector is touched
|
||||
// (App-Store-safe; no private API).
|
||||
|
||||
#if os(iOS)
|
||||
import ObjectiveC
|
||||
import UIKit
|
||||
|
||||
enum PointerLockChain {
|
||||
private static var forcedChildKey: UInt8 = 0
|
||||
/// Classes whose `childViewControllerForPointerLock` we've already IMP-swizzled (keyed by the
|
||||
/// class object). Main-thread only — pointer-lock resolution and capture toggles are all main.
|
||||
private static var swizzledClasses = Set<ObjectIdentifier>()
|
||||
/// Ancestors we've stamped with a forced child this engagement, held weakly so a deallocated
|
||||
/// SwiftUI controller drops out on its own (no dangling). disengage() clears every one — even
|
||||
/// if the live `parent` chain has since broken — so a stamped parent can never retain a stale
|
||||
/// controller subtree across sessions. One anchor is ever engaged at a time.
|
||||
private static let stampedParents = NSHashTable<UIViewController>.weakObjects()
|
||||
|
||||
private static func forcedChild(of vc: UIViewController) -> UIViewController? {
|
||||
objc_getAssociatedObject(vc, &forcedChildKey) as? UIViewController
|
||||
}
|
||||
|
||||
private static func setForcedChild(_ child: UIViewController?, on vc: UIViewController) {
|
||||
// RETAIN: while steering, the parent must keep the toward-anchor child alive. It's also
|
||||
// already a strong child of `vc` via UIKit containment, so this adds no cycle (the reverse
|
||||
// `.parent` link is weak), and disengage() always clears it — so it can't outlive a session.
|
||||
objc_setAssociatedObject(vc, &forcedChildKey, child, .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
|
||||
}
|
||||
|
||||
/// Ensure `cls`'s `childViewControllerForPointerLock` getter consults the per-instance forced
|
||||
/// child first, falling back to the class's original implementation. Idempotent per class.
|
||||
private static func ensureSwizzled(_ cls: AnyClass) {
|
||||
let id = ObjectIdentifier(cls)
|
||||
guard !swizzledClasses.contains(id) else { return }
|
||||
swizzledClasses.insert(id)
|
||||
let selector = #selector(getter: UIViewController.childViewControllerForPointerLock)
|
||||
guard let method = class_getInstanceMethod(cls, selector) else { return }
|
||||
let originalIMP = method_getImplementation(method)
|
||||
typealias OriginalFn = @convention(c) (AnyObject, Selector) -> UIViewController?
|
||||
let original = unsafeBitCast(originalIMP, to: OriginalFn.self)
|
||||
let forwarding: @convention(block) (UIViewController) -> UIViewController? = { vc in
|
||||
if let forced = forcedChild(of: vc) { return forced }
|
||||
return original(vc, selector)
|
||||
}
|
||||
method_setImplementation(method, imp_implementationWithBlock(forwarding))
|
||||
}
|
||||
|
||||
/// Force every ancestor of `anchor` to forward pointer-lock resolution toward it, then ask the
|
||||
/// system to re-resolve. No-op when `anchor` isn't in a view-controller hierarchy yet (it
|
||||
/// re-runs from the anchor's appearance/parent callbacks once it is).
|
||||
static func engage(_ anchor: UIViewController) {
|
||||
disengage(anchor) // clear any prior engagement first (reparent / re-anchor)
|
||||
var child = anchor
|
||||
while let parent = child.parent {
|
||||
ensureSwizzled(object_getClass(parent)!)
|
||||
setForcedChild(child, on: parent)
|
||||
stampedParents.add(parent)
|
||||
child = parent
|
||||
}
|
||||
anchor.setNeedsUpdateOfPrefersPointerLocked()
|
||||
}
|
||||
|
||||
/// Clear the forced forwarding on every stamped ancestor (so the SwiftUI parents stop retaining
|
||||
/// the anchor's subtree) and re-resolve to drop the lock.
|
||||
static func disengage(_ anchor: UIViewController) {
|
||||
for parent in stampedParents.allObjects {
|
||||
setForcedChild(nil, on: parent)
|
||||
}
|
||||
stampedParents.removeAllObjects()
|
||||
anchor.setNeedsUpdateOfPrefersPointerLocked()
|
||||
}
|
||||
}
|
||||
#endif
|
||||
@@ -0,0 +1,36 @@
|
||||
// Synthetic 4:4:4 HEVC keyframes used only by `Stage444Probe` to probe decode capability.
|
||||
//
|
||||
// Each is the first IDR access unit (VPS + SPS + PPS + IDR slice, Annex-B) of a 256×256 HEVC
|
||||
// Range-Extensions clip — `chroma_format_idc = 3` — generated offline with libx265:
|
||||
// ffmpeg -f lavfi -i color=c=gray:s=256x256:r=30:d=0.1 -frames:v 3 \
|
||||
// -pix_fmt yuv444p[10le] -c:v libx265 \
|
||||
// -x265-params keyint=1:min-keyint=1:no-info=1:repeat-headers=1:aud=0 -f hevc out.hevc
|
||||
// 256×256 clears the hardware decoder's minimum-dimension floor (a 16×16 clip is rejected for every
|
||||
// chroma format). Validated to hardware-decode to `444v`/`x444` on Apple Silicon (M3).
|
||||
enum Probe444Blobs {
|
||||
/// 256×256 HEVC Range-Extensions 4:4:4 keyframe (Annex-B): 134 bytes.
|
||||
static let au444_8bit: [UInt8] = [
|
||||
0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00,
|
||||
0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42,
|
||||
0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9e, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c,
|
||||
0x90, 0x01, 0x01, 0x00, 0x80, 0xb2, 0xdd, 0x49, 0x26, 0x57, 0x80, 0xb4, 0x04, 0x00, 0x00, 0x03,
|
||||
0x00, 0x04, 0x00, 0x00, 0x03, 0x00, 0x78, 0x20, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72,
|
||||
0x86, 0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb,
|
||||
0xae, 0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6,
|
||||
0x65, 0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87,
|
||||
0x00, 0x00, 0x03, 0x00, 0x5b, 0x40,
|
||||
]
|
||||
|
||||
/// 256×256 HEVC Range-Extensions 4:4:4 10-bit keyframe (Annex-B): 133 bytes.
|
||||
static let au444_10bit: [UInt8] = [
|
||||
0x00, 0x00, 0x00, 0x01, 0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00,
|
||||
0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c, 0xba, 0x02, 0x40, 0x00, 0x00, 0x00, 0x01, 0x42,
|
||||
0x01, 0x01, 0x04, 0x08, 0x00, 0x00, 0x03, 0x00, 0x9c, 0x28, 0x00, 0x00, 0x03, 0x00, 0x00, 0x3c,
|
||||
0x90, 0x01, 0x01, 0x00, 0x80, 0x9b, 0x2d, 0xd4, 0x92, 0x65, 0x78, 0x0b, 0x40, 0x40, 0x00, 0x00,
|
||||
0x03, 0x00, 0x40, 0x00, 0x00, 0x07, 0x82, 0x00, 0x00, 0x00, 0x01, 0x44, 0x01, 0xc1, 0x72, 0x86,
|
||||
0x0c, 0x06, 0x24, 0x00, 0x00, 0x00, 0x01, 0x28, 0x01, 0xaf, 0x72, 0x15, 0xe8, 0x34, 0xeb, 0xae,
|
||||
0xfb, 0xfe, 0x75, 0x57, 0xca, 0xc1, 0x71, 0x43, 0x16, 0xf5, 0xc2, 0x40, 0xbd, 0x80, 0xa6, 0x65,
|
||||
0x35, 0x20, 0x28, 0x81, 0xa2, 0x5e, 0xc0, 0x93, 0x04, 0x10, 0x9b, 0x00, 0x34, 0xe0, 0x87, 0x00,
|
||||
0x00, 0x03, 0x00, 0x5b, 0x40,
|
||||
]
|
||||
}
|
||||
@@ -238,6 +238,13 @@ public final class PunktfunkConnection {
|
||||
public private(set) var colorFullRange: Bool = false
|
||||
/// Encoded bit depth (8 or 10).
|
||||
public private(set) var bitDepth: UInt8 = 8
|
||||
/// The chroma subsampling the host resolved for this session, as the HEVC `chroma_format_idc`:
|
||||
/// `1` = 4:2:0 (every pre-4:4:4 host, and the back-compat default) or `3` = full-chroma 4:4:4
|
||||
/// (only when this client advertised `videoCap444` *and* the host could open a real 4:4:4
|
||||
/// encoder). Drive the decoder's requested pixel format from this. See `isChroma444`.
|
||||
public private(set) var chromaFormat: UInt8 = 1
|
||||
/// Convenience: the resolved stream is full-chroma 4:4:4 (`chroma_format_idc == 3`).
|
||||
public var isChroma444: Bool { chromaFormat == 3 }
|
||||
/// True when the negotiated stream is HDR (PQ or HLG transfer) — drive an HDR present path and
|
||||
/// drain `nextHdrMeta`.
|
||||
public var isHDR: Bool { colorTransfer == 16 || colorTransfer == 18 }
|
||||
@@ -334,6 +341,9 @@ public final class PunktfunkConnection {
|
||||
colorMatrix = mtx
|
||||
colorFullRange = fullRange != 0
|
||||
bitDepth = depth
|
||||
var cf: UInt8 = 1
|
||||
_ = punktfunk_connection_chroma_format(handle, &cf)
|
||||
chromaFormat = cf
|
||||
var ac: UInt8 = 2
|
||||
_ = punktfunk_connection_audio_channels(handle, &ac)
|
||||
resolvedAudioChannels = ac
|
||||
@@ -605,6 +615,10 @@ public final class PunktfunkConnection {
|
||||
public static let videoCap10Bit: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_10BIT)
|
||||
/// Video-capability bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
|
||||
public static let videoCapHDR: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_HDR)
|
||||
/// Video-capability bit: the client can decode a full-chroma 4:4:4 HEVC stream (Range
|
||||
/// Extensions). Advertise only when the device can *hardware*-decode it (`Stage444Probe`);
|
||||
/// the host then emits 4:4:4 only if it too opted in. `chromaFormat` reflects the real value.
|
||||
public static let videoCap444: UInt8 = UInt8(PUNKTFUNK_VIDEO_CAP_444)
|
||||
|
||||
/// Static HDR mastering metadata (SMPTE ST.2086 + content light level) the host sent for an HDR
|
||||
/// session. Mirrors the wire/ABI `PunktfunkHdrMeta`; primaries are in ST.2086 **G, B, R** order,
|
||||
|
||||
@@ -1,21 +1,21 @@
|
||||
// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async
|
||||
// output drops the newest decoded frame into a 1-slot ring; the hosting view's display link
|
||||
// calls `renderTick` once per vsync to draw + present the newest ready frame and stamp
|
||||
// capture→present. Mirrors StreamPump's lifecycle (one per start; cancel is permanent).
|
||||
// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async output
|
||||
// drops the newest decoded frame into a 1-slot ring; the hosting view's display link calls `renderTick`
|
||||
// once per vsync to draw + present the newest ready frame and stamp capture→present. Mirrors
|
||||
// StreamPump's lifecycle (one per start; cancel is permanent).
|
||||
//
|
||||
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick`
|
||||
// + `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there).
|
||||
// Only the ring + decoder cross threads and both are internally locked.
|
||||
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` +
|
||||
// `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). Only the ring (lock-guarded)
|
||||
// and the decoder/presenter (internally locked / main-hopped) cross threads.
|
||||
|
||||
#if canImport(Metal) && canImport(QuartzCore)
|
||||
import AVFoundation
|
||||
import Foundation
|
||||
import QuartzCore
|
||||
|
||||
/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view
|
||||
/// directly makes a `view → link → view` cycle that only `invalidate()` breaks — if a teardown
|
||||
/// is ever missed the view leaks and keeps ticking. This proxy holds the handler weakly, so the
|
||||
/// view can deallocate and its `deinit` invalidate the link.
|
||||
/// Weak-target wrapper for CADisplayLink. The link retains its target, so targeting a view directly
|
||||
/// makes a `view → link → view` cycle that only `invalidate()` breaks — if a teardown is ever missed
|
||||
/// the view leaks and keeps ticking. This proxy holds the handler weakly, so the view can deallocate
|
||||
/// and its `deinit` invalidate the link.
|
||||
public final class DisplayLinkProxy: NSObject {
|
||||
private let onTick: (CADisplayLink) -> Void
|
||||
public init(_ onTick: @escaping (CADisplayLink) -> Void) { self.onTick = onTick }
|
||||
@@ -44,10 +44,10 @@ private final class PumpToken: @unchecked Sendable {
|
||||
func cancel() { lock.lock(); live = false; lock.unlock() }
|
||||
}
|
||||
|
||||
/// Throttled host keyframe requests for decode recovery. The decoder's async error callback
|
||||
/// (a VT thread) and the pump thread (a submit failure) both signal a wedge; this coalesces
|
||||
/// them so the control stream isn't flooded while the decode stays stalled for several frames
|
||||
/// until the requested IDR lands. Bound to the live connection in `start`, unbound in `stop`.
|
||||
/// Throttled host keyframe requests for decode recovery. The decoder's async error callback (a VT
|
||||
/// thread) and the pump thread (a submit failure) both signal a wedge; this coalesces them so the
|
||||
/// control stream isn't flooded while the decode stays stalled for several frames until the requested
|
||||
/// IDR lands. Bound to the live connection in `start`, unbound in `stop`.
|
||||
private final class KeyframeRecovery: @unchecked Sendable {
|
||||
private let lock = NSLock()
|
||||
private var connection: PunktfunkConnection?
|
||||
@@ -60,7 +60,7 @@ private final class KeyframeRecovery: @unchecked Sendable {
|
||||
func request() {
|
||||
lock.lock()
|
||||
let now = DispatchTime.now().uptimeNanoseconds
|
||||
let due = lastNs == 0 || now &- lastNs > 100_000_000 // ≥ 100 ms since the last request (matches Android)
|
||||
let due = lastNs == 0 || now &- lastNs > 100_000_000 // ≥ 100 ms since the last request
|
||||
if due { lastNs = now }
|
||||
let conn = due ? connection : nil
|
||||
lock.unlock()
|
||||
@@ -76,30 +76,36 @@ public final class Stage2Pipeline {
|
||||
private let recovery = KeyframeRecovery()
|
||||
private var token = PumpToken()
|
||||
private var offsetNs: Int64 = 0
|
||||
/// Signalled when the pump thread exits, so `stop()` can join it (bounded) before `decoder.reset()`
|
||||
/// — otherwise a pump iteration already past its `token.isLive` check can rebuild a decode session
|
||||
/// right after the reset (a brief orphan session). `pumpJoinable` is armed by `start`, consumed by
|
||||
/// the first `stop` (so the idempotent second `stop`/deinit doesn't block on an already-drained
|
||||
/// semaphore). start/stop are sequential lifecycle calls, so the plain flag is safe.
|
||||
private let pumpStopped = DispatchSemaphore(value: 0)
|
||||
private var pumpJoinable = false
|
||||
|
||||
/// The Metal layer the hosting view installs + sizes. nil-init fails when Metal is
|
||||
/// unavailable so the caller can fall back to stage-1.
|
||||
/// The Metal layer the hosting view installs + sizes.
|
||||
public var layer: CAMetalLayer { presenter.layer }
|
||||
|
||||
/// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal
|
||||
/// can't be set up (headless / no GPU) — caller falls back to the stage-1 presenter.
|
||||
/// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal can't be
|
||||
/// set up (headless / no GPU) — caller falls back to the stage-1 presenter.
|
||||
public init?(presentMeter: LatencyMeter) {
|
||||
guard let presenter = MetalVideoPresenter() else { return nil }
|
||||
guard let presenter = MetalVideoPresenter.make() else { return nil }
|
||||
self.presenter = presenter
|
||||
self.presentMeter = presentMeter
|
||||
let ring = ring
|
||||
let recovery = recovery
|
||||
self.decoder = VideoDecoder(
|
||||
onDecoded: { ring.submit($0) },
|
||||
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump
|
||||
// resets to re-gate on the next IDR, and we ask the host to send one now (infinite
|
||||
// GOP — it wouldn't otherwise come soon). Throttled in KeyframeRecovery.
|
||||
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump resets to
|
||||
// re-gate on the next IDR, and we ask the host to send one now (infinite GOP — it wouldn't
|
||||
// otherwise come soon). Throttled in KeyframeRecovery.
|
||||
onDecodeError: { _ in recovery.request() })
|
||||
}
|
||||
|
||||
/// Start pulling AUs into the decoder. `onFrame` fires per AU at receipt (capture→client
|
||||
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client)
|
||||
/// makes the present stamp cross-machine valid.
|
||||
/// Start pulling AUs into the decoder. MAIN THREAD. `onFrame` fires per AU at receipt (capture→client
|
||||
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) makes the
|
||||
/// present stamp cross-machine valid.
|
||||
public func start(
|
||||
connection: PunktfunkConnection,
|
||||
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
||||
@@ -108,34 +114,48 @@ public final class Stage2Pipeline {
|
||||
offsetNs = connection.clockOffsetNs
|
||||
recovery.bind(connection) // arm host-keyframe recovery for this session
|
||||
token = PumpToken() // fresh token per start — cancel is permanent (like StreamPump)
|
||||
|
||||
// Configure the decoder's chroma + the layer's initial colorimetry before the first frame. The
|
||||
// chroma subsampling drives only the decode pixel format (orthogonal to HDR/depth); the HDR
|
||||
// config is the Welcome's latched value, which a mid-session flip then overrides per-frame.
|
||||
decoder.setChroma444(connection.isChroma444)
|
||||
presenter.configure(hdr: connection.isHDR)
|
||||
|
||||
let token = token
|
||||
let decoder = decoder
|
||||
let recovery = recovery
|
||||
let presenter = presenter
|
||||
let pumpStopped = pumpStopped
|
||||
let thread = Thread {
|
||||
defer { pumpStopped.signal() } // let stop() join the pump (bounded) before decoder.reset()
|
||||
var format: CMVideoFormatDescription?
|
||||
var lastFramesDropped = connection.framesDropped()
|
||||
// Persistent recovery WANT, not a one-shot edge (see StreamPump for the full rationale):
|
||||
// the old code advanced lastFramesDropped on the same edge it called recovery.request(),
|
||||
// so a request swallowed by the throttle (the lost recovery IDR being pruned within the
|
||||
// window) was never re-sent and the picture stayed frozen. Keep asking until an IDR lands.
|
||||
// keep asking until an IDR lands so a request swallowed by the throttle is re-sent.
|
||||
var awaitingIDR = false
|
||||
// 4:4:4 backstop: a run of decode/create failures in a 4:4:4 session means this device can't
|
||||
// decode 4:4:4 at the negotiated resolution (the HW probe clears the common case but not a
|
||||
// resolution-ceiling miss). End cleanly instead of looping on a black screen.
|
||||
var decodeFailRun = 0
|
||||
while token.isLive {
|
||||
do {
|
||||
// Loss recovery (the primary path). The reassembler drops unrecoverable AUs
|
||||
// (framesDropped) and the decoder conceals the reference-missing deltas that
|
||||
// follow — often WITHOUT an error callback — so key off the drop count climbing,
|
||||
// then keep asking (awaitingIDR) until a fresh IDR re-anchors decode. Polled every
|
||||
// iteration so a total-loss drought recovers the moment packets resume.
|
||||
// Loss recovery (the primary path). The reassembler drops unrecoverable AUs and the
|
||||
// decoder conceals the reference-missing deltas — often WITHOUT an error callback —
|
||||
// so key off the drop count climbing, then keep asking (awaitingIDR) until a fresh
|
||||
// IDR re-anchors decode.
|
||||
let dropped = connection.framesDropped()
|
||||
if dropped > lastFramesDropped {
|
||||
lastFramesDropped = dropped
|
||||
awaitingIDR = true
|
||||
}
|
||||
if awaitingIDR { recovery.request() }
|
||||
// Drain any HDR mastering-metadata update (0xCE) and hand it to the decoder, which
|
||||
// attaches it to subsequent HDR frames. Non-blocking; only HDR sessions emit these.
|
||||
if connection.isHDR, let meta = try? connection.nextHdrMeta(timeoutMs: 0) {
|
||||
decoder.setHdrMeta(meta)
|
||||
// Drain HDR mastering metadata (0xCE) and hand it to the PRESENTER (→ CAEDRMetadata).
|
||||
// Polled UNCONDITIONALLY (not gated on connection.isHDR, the fixed Welcome flag): the
|
||||
// host sends 0xCE only for HDR, INCLUDING a mid-session SDR→HDR transition (a game
|
||||
// entering HDR — the host re-inits its encoder) the Welcome flag would never reflect.
|
||||
// Non-blocking; nil for an SDR stream.
|
||||
if let meta = try? connection.nextHdrMeta(timeoutMs: 0) {
|
||||
presenter.setHdrMeta(meta)
|
||||
}
|
||||
guard let au = try connection.nextAU(timeoutMs: 100) else { continue }
|
||||
onFrame?(au)
|
||||
@@ -144,12 +164,20 @@ public final class Stage2Pipeline {
|
||||
awaitingIDR = false // a fresh IDR re-anchored decode — recovery complete
|
||||
}
|
||||
guard let f = format, token.isLive else { continue }
|
||||
if !decoder.decode(au: au, format: f) {
|
||||
// Submit/decoder error: drop the session and re-gate on the next IDR's
|
||||
// in-band parameter sets (a delta frame can't recover) — stage-1's policy —
|
||||
// and keep asking for that IDR (infinite GOP) until one re-anchors decode.
|
||||
if decoder.decode(au: au, format: f) {
|
||||
decodeFailRun = 0
|
||||
} else {
|
||||
// Submit/decoder error: drop the session and re-gate on the next IDR's in-band
|
||||
// parameter sets (a delta frame can't recover) and keep asking for that IDR.
|
||||
decoder.reset()
|
||||
awaitingIDR = true
|
||||
decodeFailRun += 1
|
||||
// ~3 s of solid failure in a 4:4:4 session (and only there — a 4:2:0 loss
|
||||
// recovers within a GOP) ⇒ 4:4:4 isn't decodable here; end the session.
|
||||
if connection.isChroma444, decodeFailRun >= 180 {
|
||||
if token.isLive { onSessionEnd?() }
|
||||
break
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
if token.isLive { onSessionEnd?() }
|
||||
@@ -159,22 +187,30 @@ public final class Stage2Pipeline {
|
||||
}
|
||||
thread.name = "punktfunk-stage2-pump"
|
||||
thread.qualityOfService = .userInteractive
|
||||
pumpJoinable = true
|
||||
thread.start()
|
||||
}
|
||||
|
||||
/// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp
|
||||
/// capture→present at `targetPresentNs` — the display link's target present instant, already
|
||||
/// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
|
||||
/// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp capture→present at
|
||||
/// `targetPresentNs` — the display link's target present instant, already converted to
|
||||
/// `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
|
||||
public func renderTick(targetPresentNs: Int64) {
|
||||
guard let frame = ring.take() else { return }
|
||||
guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return }
|
||||
presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
|
||||
}
|
||||
|
||||
/// Stop the pump (≤ one poll timeout) and drop the decode session. Does not close the
|
||||
/// connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
|
||||
/// Stop the pump (≤ one poll timeout) and drop the decode session. MAIN THREAD; idempotent. Does not
|
||||
/// close the connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
|
||||
public func stop() {
|
||||
token.cancel()
|
||||
// Join the pump (bounded: ≤ one nextAU poll + an in-flight decode) before resetting the decoder,
|
||||
// so the pump can't rebuild a session right after the reset. Only the first stop joins; a
|
||||
// repeat/deinit stop skips the already-drained semaphore.
|
||||
if pumpJoinable {
|
||||
pumpJoinable = false
|
||||
_ = pumpStopped.wait(timeout: .now() + 0.5)
|
||||
}
|
||||
decoder.reset()
|
||||
recovery.bind(nil) // stop requesting keyframes once the session is torn down
|
||||
}
|
||||
@@ -182,8 +218,8 @@ public final class Stage2Pipeline {
|
||||
deinit { token.cancel() }
|
||||
|
||||
/// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME`
|
||||
/// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the
|
||||
/// target present time (when the frame is actually on glass), not the moment we drew.
|
||||
/// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the target
|
||||
/// present time (when the frame is actually on glass), not the moment we drew.
|
||||
public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 {
|
||||
let caNow = CACurrentMediaTime()
|
||||
var ts = timespec()
|
||||
|
||||
@@ -0,0 +1,83 @@
|
||||
// Runtime 4:4:4 HEVC decode-capability probe.
|
||||
//
|
||||
// We advertise `VIDEO_CAP_444` (so the host upgrades to a full-chroma 4:4:4 stream) ONLY when this
|
||||
// device can decode 4:4:4 HEVC *in hardware* — software 4:4:4 decode works but is far too slow for a
|
||||
// real-time stream at the negotiated resolution, so a software-only device must keep 4:2:0.
|
||||
//
|
||||
// `VTIsHardwareDecodeSupported(HEVC)` and the HEVC-decoder-capabilities dictionary report HEVC HW
|
||||
// decode but expose nothing about `chroma_format_idc`, so the only reliable signal is to actually
|
||||
// create a *hardware-required* `VTDecompressionSession` for a tiny synthetic 4:4:4 keyframe and
|
||||
// confirm it both creates and decodes to the expected biplanar 4:4:4 pixel format. Validated on an
|
||||
// Apple M3 (HW 4:4:4 8- and 10-bit decode to `444v`/`x444`); a software-only decoder fails the
|
||||
// hardware-required create and we fall back to 4:2:0.
|
||||
//
|
||||
// The probe blobs are 256×256 (above the hardware decoder's minimum-dimension floor — a 16×16 clip
|
||||
// is rejected for ALL chroma formats, including 4:2:0) HEVC Range-Extensions keyframes generated
|
||||
// offline with libx265; see scripts notes. Results are cached (device-static) in lazy statics.
|
||||
|
||||
import CoreMedia
|
||||
import CoreVideo
|
||||
import Foundation
|
||||
import VideoToolbox
|
||||
|
||||
public enum Stage444Probe {
|
||||
/// True iff this device hardware-decodes 8-bit 4:4:4 HEVC (the host's current 4:4:4 path —
|
||||
/// BT.709 limited `yuv444p`). Cached after first evaluation.
|
||||
public static let hwDecode444_8bit: Bool = probeHardware444(
|
||||
au: Probe444Blobs.au444_8bit,
|
||||
want: kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange,
|
||||
fullRangeSibling: kCVPixelFormatType_444YpCbCr8BiPlanarFullRange)
|
||||
|
||||
/// True iff this device hardware-decodes 10-bit 4:4:4 HEVC (the 4:4:4 ∩ HDR/10-bit intersection).
|
||||
/// Cached after first evaluation.
|
||||
public static let hwDecode444_10bit: Bool = probeHardware444(
|
||||
au: Probe444Blobs.au444_10bit,
|
||||
want: kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange,
|
||||
fullRangeSibling: kCVPixelFormatType_444YpCbCr10BiPlanarFullRange)
|
||||
|
||||
/// Create a hardware-REQUIRED `VTDecompressionSession` for the synthetic 4:4:4 keyframe and
|
||||
/// decode it, returning true only when the decoder produces the expected (video- or full-range)
|
||||
/// biplanar 4:4:4 pixel format. Any failure (no hardware path, wrong output format, decode error)
|
||||
/// → false → we keep 4:2:0.
|
||||
private static func probeHardware444(
|
||||
au auBytes: [UInt8], want: OSType, fullRangeSibling: OSType
|
||||
) -> Bool {
|
||||
let data = Data(auBytes)
|
||||
guard let format = AnnexB.formatDescription(fromIDR: data) else { return false }
|
||||
// Require a hardware decoder — a software false-positive would make us advertise 4:4:4 and
|
||||
// then decode every real frame on the CPU, blowing the latency budget.
|
||||
let spec: [CFString: Any] = [
|
||||
kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true,
|
||||
]
|
||||
let attrs: [CFString: Any] = [
|
||||
kCVPixelBufferPixelFormatTypeKey: want,
|
||||
kCVPixelBufferMetalCompatibilityKey: true,
|
||||
]
|
||||
var session: VTDecompressionSession?
|
||||
let created = VTDecompressionSessionCreate(
|
||||
allocator: kCFAllocatorDefault, formatDescription: format,
|
||||
decoderSpecification: spec as CFDictionary, imageBufferAttributes: attrs as CFDictionary,
|
||||
outputCallback: nil, decompressionSessionOut: &session)
|
||||
guard created == noErr, let session else { return false }
|
||||
defer { VTDecompressionSessionInvalidate(session) }
|
||||
|
||||
let au = AccessUnit(data: data, ptsNs: 0, frameIndex: 0, flags: 0)
|
||||
guard let sample = AnnexB.sampleBuffer(au: au, format: format) else { return false }
|
||||
|
||||
var produced: OSType = 0
|
||||
let done = DispatchSemaphore(value: 0)
|
||||
let status = VTDecompressionSessionDecodeFrame(
|
||||
session, sampleBuffer: sample,
|
||||
flags: [._EnableAsynchronousDecompression], infoFlagsOut: nil
|
||||
) { status, _, imageBuffer, _, _ in
|
||||
if status == noErr, let imageBuffer {
|
||||
produced = CVPixelBufferGetPixelFormatType(imageBuffer)
|
||||
}
|
||||
done.signal()
|
||||
}
|
||||
guard status == noErr else { return false }
|
||||
VTDecompressionSessionWaitForAsynchronousFrames(session)
|
||||
_ = done.wait(timeout: .now() + 1.0)
|
||||
return produced == want || produced == fullRangeSibling
|
||||
}
|
||||
}
|
||||
@@ -137,8 +137,8 @@ public struct StreamView: NSViewRepresentable {
|
||||
public final class StreamLayerView: NSView {
|
||||
private let displayLayer = AVSampleBufferDisplayLayer()
|
||||
private var pump: StreamPump?
|
||||
/// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
|
||||
/// display link instead of the StreamPump → displayLayer path. nil = stage-1 (default).
|
||||
/// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a display link instead of the
|
||||
/// StreamPump → displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle).
|
||||
var presentMeter: LatencyMeter?
|
||||
private var stage2: Stage2Pipeline?
|
||||
private var stage2Link: CADisplayLink?
|
||||
@@ -638,7 +638,7 @@ public final class StreamLayerView: NSView {
|
||||
private func teardownStage2() {
|
||||
stage2Link?.invalidate()
|
||||
stage2Link = nil
|
||||
stage2?.stop()
|
||||
stage2?.stop() // stops the pump (synchronous join) + drops the decode session
|
||||
stage2 = nil
|
||||
metalLayer?.removeFromSuperlayer()
|
||||
metalLayer = nil
|
||||
|
||||
@@ -92,8 +92,8 @@ public final class StreamViewController: UIViewController {
|
||||
public private(set) var connection: PunktfunkConnection?
|
||||
private var pump: StreamPump?
|
||||
private var observers: [NSObjectProtocol] = []
|
||||
/// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
|
||||
/// CADisplayLink instead of the StreamPump → displayLayer path. nil = stage-1 (default).
|
||||
/// Stage-2 presenter (default): a CAMetalLayer sublayer driven by a CADisplayLink instead of the
|
||||
/// StreamPump → displayLayer path. nil = stage-1 (Metal-unavailable fallback / DEBUG toggle).
|
||||
var presentMeter: LatencyMeter?
|
||||
private var stage2: Stage2Pipeline?
|
||||
private var stage2Link: CADisplayLink?
|
||||
@@ -155,19 +155,58 @@ public final class StreamViewController: UIViewController {
|
||||
}
|
||||
|
||||
#if os(iOS)
|
||||
// Pointer lock is only meaningful on iPad (iPhone has no hardware-pointer lock) and
|
||||
// only when capture is engaged. The system additionally requires full-screen + frontmost
|
||||
// and may drop it (Slide Over/Stage Manager/backgrounding) — verified in setCaptured().
|
||||
public override var prefersPointerLocked: Bool {
|
||||
captured && UIDevice.current.userInterfaceIdiom == .pad
|
||||
/// Whether the user wants the mouse/trackpad pointer CAPTURED (pointer lock → relative
|
||||
/// movement, the gaming default) rather than forwarded as an absolute position (desktop
|
||||
/// use). Read live from UserDefaults so it tracks the Settings toggle; defaults to on when
|
||||
/// unset. iPad-only — gated again in `prefersPointerLocked`.
|
||||
private var pointerCaptureEnabled: Bool {
|
||||
UserDefaults.standard.object(forKey: DefaultsKey.pointerCapture) as? Bool ?? true
|
||||
}
|
||||
|
||||
/// Whether the pointer should be CAPTURED right now: iPad, capture engaged, and the user
|
||||
/// hasn't opted into the absolute (desktop) pointer. The system additionally requires
|
||||
/// full-screen + frontmost and may drop the lock (Slide Over/Stage Manager/backgrounding) —
|
||||
/// syncPointerLock() handles the actual grant/drop and falls back to absolute when unlocked.
|
||||
private var wantsPointerLock: Bool {
|
||||
captured && pointerCaptureEnabled && UIDevice.current.userInterfaceIdiom == .pad
|
||||
}
|
||||
|
||||
public override var prefersPointerLocked: Bool { wantsPointerLock }
|
||||
public override var prefersHomeIndicatorAutoHidden: Bool { true }
|
||||
|
||||
// If SwiftUI's UIHostingController reparents us, a plain container parent that forwards
|
||||
// its pointer-lock decision to its children will then reach this VC. (UIHostingController
|
||||
// itself does not consult children, which is why GCMouse deltas can never arrive there —
|
||||
// the touch path, always forwarded, is the unconditional fallback.)
|
||||
public override var childViewControllerForPointerLock: UIViewController? { self }
|
||||
// NOTE: we deliberately do NOT override `childViewControllerForPointerLock`. The default
|
||||
// returns nil, which tells the system to use THIS controller's own `prefersPointerLocked` —
|
||||
// exactly what we want, since `PointerLockChain` forces our SwiftUI ancestors to forward the
|
||||
// downward walk to us and we are the terminal anchor. Returning `self` here would make the
|
||||
// system ask the same controller forever (it keeps delegating to the returned child) →
|
||||
// unbounded recursion → stack overflow once the chain actually reaches us.
|
||||
|
||||
/// (Re)build or tear down the forced pointer-lock forwarding chain from this controller to the
|
||||
/// window root so the system actually resolves our `prefersPointerLocked`. Safe to call
|
||||
/// repeatedly — it no-ops until the view is in a window with a parent chain, and re-runs from
|
||||
/// the appearance/parent callbacks once SwiftUI has placed us.
|
||||
private func updatePointerLockChain() {
|
||||
// Engaging needs a live parent chain to the window root; disengaging is always safe and
|
||||
// must run even after the view has left the window (session teardown) so the stamped
|
||||
// SwiftUI ancestors are cleared.
|
||||
if wantsPointerLock, view.window != nil {
|
||||
PointerLockChain.engage(self)
|
||||
} else {
|
||||
PointerLockChain.disengage(self)
|
||||
}
|
||||
}
|
||||
|
||||
public override func viewDidAppear(_ animated: Bool) {
|
||||
super.viewDidAppear(animated)
|
||||
// SwiftUI places us in the hierarchy AFTER start()'s setCaptured(true), and may reparent us
|
||||
// later — re-anchor the chain here so a lock requested before we had a parent still lands.
|
||||
updatePointerLockChain()
|
||||
}
|
||||
|
||||
public override func didMove(toParent parent: UIViewController?) {
|
||||
super.didMove(toParent: parent)
|
||||
updatePointerLockChain() // chain shape changed — re-anchor (or no-op if not yet in a window)
|
||||
}
|
||||
#endif
|
||||
|
||||
func start(
|
||||
@@ -200,7 +239,14 @@ public final class StreamViewController: UIViewController {
|
||||
// Indirect pointer (mouse/trackpad with no lock) → absolute cursor + buttons, routed
|
||||
// through InputCapture so the forwarding gate and release-on-blur apply uniformly.
|
||||
streamView.onPointerMoveAbs = { [weak self] p in
|
||||
self?.inputCapture?.sendMouseAbs(
|
||||
guard let self else { return }
|
||||
if iosInputDebug {
|
||||
// Whether ANY UIKit pointer movement reaches us while the scene is LOCKED tells us
|
||||
// if the trackpad (which may not be a GCMouse) can still be captured via UIKit.
|
||||
iosInputLog.debug(
|
||||
"UIKit pointer move x=\(p.x, privacy: .public) y=\(p.y, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public) gcFwd=\(self.inputCapture?.gcMouseForwarding == true, privacy: .public)")
|
||||
}
|
||||
self.inputCapture?.sendMouseAbs(
|
||||
x: p.x, y: p.y, surfaceWidth: p.w, surfaceHeight: p.h)
|
||||
}
|
||||
streamView.onPointerButton = { [weak self] button, down in
|
||||
@@ -210,7 +256,12 @@ public final class StreamViewController: UIViewController {
|
||||
// UNLOCKED regime; while locked, GCMouse's scroll handler owns it — mirror the
|
||||
// sendMouseAbs !gcMouseForwarding gate so the two can't double-send.
|
||||
streamView.onScroll = { [weak self] dx, dy in
|
||||
guard let self, self.inputCapture?.gcMouseForwarding == false else { return }
|
||||
guard let self else { return }
|
||||
if iosInputDebug {
|
||||
iosInputLog.debug(
|
||||
"UIKit scroll dx=\(dx, privacy: .public) dy=\(dy, privacy: .public) locked=\(self.pointerLockEngaged() == true, privacy: .public)")
|
||||
}
|
||||
guard self.inputCapture?.gcMouseForwarding == false else { return }
|
||||
self.inputCapture?.sendScroll(dx: dx, dy: dy)
|
||||
}
|
||||
|
||||
@@ -315,7 +366,7 @@ public final class StreamViewController: UIViewController {
|
||||
) {
|
||||
let metal = pipeline.layer
|
||||
// Composites OVER the idle (un-enqueued in stage-2) AVSampleBufferDisplayLayer base.
|
||||
// (contentsScale + frame + drawableSize are all set by layoutMetalLayer() just below.)
|
||||
// (contentsScale + frame are set by layoutMetalLayer() just below.)
|
||||
streamView.layer.addSublayer(metal)
|
||||
metalLayer = metal
|
||||
stage2 = pipeline
|
||||
@@ -372,7 +423,7 @@ public final class StreamViewController: UIViewController {
|
||||
private func teardownStage2() {
|
||||
stage2Link?.invalidate()
|
||||
stage2Link = nil
|
||||
stage2?.stop()
|
||||
stage2?.stop() // stops the pump (synchronous join) + drops the decode session
|
||||
stage2 = nil
|
||||
metalLayer?.removeFromSuperlayer()
|
||||
metalLayer = nil
|
||||
@@ -392,6 +443,7 @@ public final class StreamViewController: UIViewController {
|
||||
captured = false
|
||||
}
|
||||
setNeedsUpdateOfPrefersPointerLocked()
|
||||
updatePointerLockChain() // (re)anchor the SwiftUI ancestors so the lock actually resolves
|
||||
syncPointerLock() // resolve cursor + GCMouse/absolute routing for the current state
|
||||
let onCaptureChange = onCaptureChange
|
||||
let captured = captured
|
||||
@@ -420,7 +472,7 @@ public final class StreamViewController: UIViewController {
|
||||
pointerInteraction?.invalidate() // re-resolve the hidden/visible cursor for the state
|
||||
if iosInputDebug {
|
||||
iosInputLog.debug(
|
||||
"pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public)")
|
||||
"pointer lock isLocked=\(locked, privacy: .public) captured=\(self.captured, privacy: .public) useGCMouse=\(useGCMouse, privacy: .public) [\(self.inputCapture?.attachedMiceSummary ?? "n/a", privacy: .public)]")
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
@@ -49,11 +49,10 @@ public final class VideoDecoder: @unchecked Sendable {
|
||||
/// pump can re-gate on the next IDR.
|
||||
private let onDecodeError: @Sendable (OSStatus) -> Void
|
||||
|
||||
/// Latest source HDR mastering metadata (from `PunktfunkConnection.nextHdrMeta`), attached to
|
||||
/// each decoded HDR pixel buffer so the compositor tone-maps from the real grade. Guarded by its
|
||||
/// own lock — written by the pump thread, read on the VT decode callback.
|
||||
private let metaLock = NSLock()
|
||||
private var hdrMeta: PunktfunkConnection.HdrMeta?
|
||||
/// Whether the negotiated stream is full-chroma 4:4:4 (`connection.isChroma444`), set once at
|
||||
/// session start before any decode. Selects the 4:4:4 decode pixel format (orthogonal to bit
|
||||
/// depth / HDR). Read inside `createSessionLocked` under `lock`.
|
||||
private var chroma444 = false
|
||||
|
||||
public init(
|
||||
onDecoded: @escaping @Sendable (ReadyFrame) -> Void,
|
||||
@@ -65,12 +64,13 @@ public final class VideoDecoder: @unchecked Sendable {
|
||||
|
||||
deinit { teardown() }
|
||||
|
||||
/// Set the source HDR mastering metadata (drained from `PunktfunkConnection.nextHdrMeta`). It's
|
||||
/// attached to subsequent decoded HDR pixel buffers. Thread-safe; cheap to call on each update.
|
||||
public func setHdrMeta(_ meta: PunktfunkConnection.HdrMeta) {
|
||||
metaLock.lock()
|
||||
hdrMeta = meta
|
||||
metaLock.unlock()
|
||||
/// Select the chroma subsampling of the decode output (4:2:0 vs full-chroma 4:4:4). Call once at
|
||||
/// session start, before decoding, from `connection.isChroma444`. Takes effect on the next
|
||||
/// session (re)build. Thread-safe.
|
||||
public func setChroma444(_ on: Bool) {
|
||||
lock.lock()
|
||||
chroma444 = on
|
||||
lock.unlock()
|
||||
}
|
||||
|
||||
/// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The
|
||||
@@ -135,8 +135,10 @@ public final class VideoDecoder: @unchecked Sendable {
|
||||
|
||||
/// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function — i.e. the host
|
||||
/// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the
|
||||
/// HEVC VUI, so this tracks the *stream*, switching dynamically when the user toggles HDR
|
||||
/// (the host re-emits parameter sets with the new VUI → a new format desc → session rebuild).
|
||||
/// HEVC VUI, so this picks the decode bit depth (10-bit P010/x444 vs 8-bit NV12/444v) from the
|
||||
/// stream. The present-side HDR config (colorspace/EDR/shader) is latched once per session from
|
||||
/// the Welcome (`connection.isHDR`), which the host does NOT flip mid-session — so this predicate
|
||||
/// and that config agree for the session (a `#if DEBUG` assert in the presenter guards it).
|
||||
static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool {
|
||||
guard
|
||||
let tf = CMFormatDescriptionGetExtension(
|
||||
@@ -157,11 +159,18 @@ public final class VideoDecoder: @unchecked Sendable {
|
||||
session = nil
|
||||
format = nil
|
||||
|
||||
// Decode pixel format is a 2×2 of (chroma, depth/HDR), both biplanar so the presenter binds
|
||||
// plane 0 = luma, plane 1 = interleaved chroma uniformly — 4:4:4 just delivers a full-size
|
||||
// chroma plane. 10-bit (P010 / `x444`) for HDR (PQ/HLG), 8-bit (NV12 / `444v`) otherwise.
|
||||
let hdr = Self.isHDRFormat(newFormat)
|
||||
let pixelFormat =
|
||||
hdr
|
||||
? kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010 (10-bit)
|
||||
: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12 (8-bit)
|
||||
let pixelFormat: OSType = {
|
||||
switch (chroma444, hdr) {
|
||||
case (false, false): return kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12
|
||||
case (false, true): return kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010
|
||||
case (true, false): return kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange // 444v
|
||||
case (true, true): return kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange // x444
|
||||
}
|
||||
}()
|
||||
let imageAttrs: [CFString: Any] = [
|
||||
kCVPixelBufferMetalCompatibilityKey: true,
|
||||
kCVPixelBufferPixelFormatTypeKey: pixelFormat,
|
||||
@@ -169,11 +178,20 @@ public final class VideoDecoder: @unchecked Sendable {
|
||||
var callback = VTDecompressionOutputCallbackRecord(
|
||||
decompressionOutputCallback: decoderOutputCallback,
|
||||
decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
|
||||
// 4:4:4 sessions REQUIRE a hardware decoder: we only advertise 4:4:4 when the hardware probe
|
||||
// passed, so a hardware-incapable mode (e.g. a resolution past the HW 4:4:4 ceiling) must fail
|
||||
// HERE, synchronously, letting the pump's backstop end the session — rather than silently
|
||||
// falling back to a software 4:4:4 decoder far too slow for a real-time stream. 4:2:0 keeps the
|
||||
// software fallback (nil spec) as a robustness net.
|
||||
let spec: CFDictionary? =
|
||||
chroma444
|
||||
? [kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder: true] as CFDictionary
|
||||
: nil
|
||||
var newSession: VTDecompressionSession?
|
||||
let status = VTDecompressionSessionCreate(
|
||||
allocator: kCFAllocatorDefault,
|
||||
formatDescription: newFormat,
|
||||
decoderSpecification: nil, // hardware by default
|
||||
decoderSpecification: spec,
|
||||
imageBufferAttributes: imageAttrs as CFDictionary,
|
||||
outputCallback: &callback,
|
||||
decompressionSessionOut: &newSession)
|
||||
@@ -195,26 +213,17 @@ public final class VideoDecoder: @unchecked Sendable {
|
||||
// pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
|
||||
let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
|
||||
let ptsNs = p.value > 0 ? UInt64(p.value) : 0
|
||||
// HDR iff the decoder produced a 10-bit P010 buffer (we only request P010 for PQ streams).
|
||||
// HDR iff the decoder produced a 10-bit buffer (we only request a 10-bit format for PQ/HLG
|
||||
// streams). Covers 4:2:0 (P010) and 4:4:4 (`x444`), video- and full-range, so a 10-bit 4:4:4
|
||||
// HDR frame isn't misclassified as SDR. (The mastering metadata is applied to the presenter's
|
||||
// CAMetalLayer via CAEDRMetadata, not to this source buffer — a separate-drawable presenter
|
||||
// never composites the source buffer's attachments, so attaching them here would be dead.)
|
||||
let fmt = CVPixelBufferGetPixelFormatType(imageBuffer)
|
||||
let isHDR =
|
||||
CVPixelBufferGetPixelFormatType(imageBuffer)
|
||||
== kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|
||||
// Attach the source's mastering display + content light level (ST.2086 / CEA-861.3) so the
|
||||
// compositor tone-maps from the real grade rather than inferring from the PQ colourspace
|
||||
// alone. The SEI byte payloads map 1:1 to these CVImageBuffer attachment keys.
|
||||
if isHDR {
|
||||
metaLock.lock()
|
||||
let meta = hdrMeta
|
||||
metaLock.unlock()
|
||||
if let meta {
|
||||
CVBufferSetAttachment(
|
||||
imageBuffer, kCVImageBufferMasteringDisplayColorVolumeKey,
|
||||
meta.masteringDisplayColorVolume() as CFData, .shouldPropagate)
|
||||
CVBufferSetAttachment(
|
||||
imageBuffer, kCVImageBufferContentLightLevelInfoKey,
|
||||
meta.contentLightLevelInfo() as CFData, .shouldPropagate)
|
||||
}
|
||||
}
|
||||
fmt == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
|
||||
|| fmt == kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
|
||||
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|
||||
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
|
||||
onDecoded(
|
||||
ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
|
||||
}
|
||||
|
||||
@@ -1,11 +1,13 @@
|
||||
import XCTest
|
||||
|
||||
#if canImport(Metal)
|
||||
import CoreVideo
|
||||
import Metal
|
||||
import QuartzCore
|
||||
@testable import PunktfunkKit
|
||||
|
||||
final class MetalPresenterTests: XCTestCase {
|
||||
/// `MetalVideoPresenter.init?()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUV→RGB
|
||||
/// `MetalVideoPresenter.make()` compiles the runtime Metal shaders (the BT.709/BT.2020 YUV→RGB
|
||||
/// fragment shaders plus the Catmull-Rom luma sampler). A `nil` result on a GPU-equipped host
|
||||
/// means a shader failed to compile — this catches a malformed shader before it silently
|
||||
/// degrades stage-2 to a stage-1 fallback on device.
|
||||
@@ -14,8 +16,54 @@ final class MetalPresenterTests: XCTestCase {
|
||||
throw XCTSkip("no Metal device available in this environment")
|
||||
}
|
||||
XCTAssertNotNil(
|
||||
MetalVideoPresenter(),
|
||||
MetalVideoPresenter.make(),
|
||||
"stage-2 Metal shaders failed to compile (presenter init returned nil)")
|
||||
}
|
||||
|
||||
/// The HDR fix: `configure(hdr:)` must put the layer into the BT.2020-PQ EDR configuration with a
|
||||
/// reference-white anchor (`edrMetadata`) — the missing anchor was what made HDR render "too
|
||||
/// bright". SDR must use the plain 8-bit path with EDR off and no metadata. A mid-session flip is a
|
||||
/// per-mode reconfigure, so the round trip back to SDR must fully restore the SDR config.
|
||||
func testConfigureHDRSetsEDRAnchor() throws {
|
||||
guard let presenter = MetalVideoPresenter.make() else {
|
||||
throw XCTSkip("no Metal device available in this environment")
|
||||
}
|
||||
presenter.configure(hdr: true)
|
||||
XCTAssertEqual(presenter.layer.pixelFormat, .rgba16Float, "HDR uses an EDR-capable drawable")
|
||||
XCTAssertNotNil(presenter.layer.colorspace, "HDR layer must be tagged (itur_2100_PQ)")
|
||||
XCTAssertTrue(
|
||||
presenter.layer.wantsExtendedDynamicRangeContent, "EDR must be requested on all platforms")
|
||||
XCTAssertNotNil(
|
||||
presenter.layer.edrMetadata,
|
||||
"HDR must anchor reference white via edrMetadata (the fix for 'too bright')")
|
||||
|
||||
// Mid-session HDR→SDR flip: the 8-bit path, EDR off, no metadata.
|
||||
presenter.configure(hdr: false)
|
||||
XCTAssertEqual(presenter.layer.pixelFormat, .bgra8Unorm, "SDR uses the plain 8-bit drawable")
|
||||
XCTAssertFalse(presenter.layer.wantsExtendedDynamicRangeContent)
|
||||
XCTAssertNil(presenter.layer.edrMetadata)
|
||||
}
|
||||
|
||||
/// `render` with a freshly-allocated NV12 buffer must present without crashing or hanging — the
|
||||
/// main-thread present path is the highest-risk part of the stage-2 rewrite. (A headless CI with no
|
||||
/// display can still allocate a drawable from a CAMetalLayer; if it can't, render returns false,
|
||||
/// which is also a valid non-crashing outcome.)
|
||||
func testRenderDoesNotCrashOnNV12Frame() throws {
|
||||
guard let presenter = MetalVideoPresenter.make() else {
|
||||
throw XCTSkip("no Metal device available in this environment")
|
||||
}
|
||||
presenter.configure(hdr: false)
|
||||
var pb: CVPixelBuffer?
|
||||
let attrs: [CFString: Any] = [kCVPixelBufferMetalCompatibilityKey: true]
|
||||
let status = CVPixelBufferCreate(
|
||||
kCFAllocatorDefault, 256, 256, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
|
||||
attrs as CFDictionary, &pb)
|
||||
guard status == kCVReturnSuccess, let pixelBuffer = pb else {
|
||||
throw XCTSkip("could not allocate a test pixel buffer")
|
||||
}
|
||||
// Just asserting it returns (true or false) without trapping — the layer may have no drawable
|
||||
// source headless, so a false return is acceptable.
|
||||
_ = presenter.render(pixelBuffer, isHDR: false)
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
@@ -0,0 +1,68 @@
|
||||
// 4:4:4 decode-path coverage: the hardware-capability probe is stable/cached, and a real 4:4:4 HEVC
|
||||
// keyframe decodes through VideoDecoder to a biplanar 4:4:4 pixel buffer. Reuses the same synthetic
|
||||
// 4:4:4 blobs the runtime probe ships with.
|
||||
|
||||
import CoreVideo
|
||||
import VideoToolbox
|
||||
import XCTest
|
||||
@testable import PunktfunkKit
|
||||
|
||||
private final class FrameBox: @unchecked Sendable {
|
||||
let lock = NSLock()
|
||||
var frame: ReadyFrame?
|
||||
var error: OSStatus?
|
||||
}
|
||||
|
||||
final class Stage444Tests: XCTestCase {
|
||||
/// The capability probe is device-static and cached — reading it twice must return the same value
|
||||
/// (and must never crash, including where 4:4:4 is unsupported → false).
|
||||
func testProbeIsStableAndCached() {
|
||||
XCTAssertEqual(Stage444Probe.hwDecode444_8bit, Stage444Probe.hwDecode444_8bit)
|
||||
XCTAssertEqual(Stage444Probe.hwDecode444_10bit, Stage444Probe.hwDecode444_10bit)
|
||||
}
|
||||
|
||||
/// A real 8-bit 4:4:4 HEVC keyframe (the embedded probe blob) decodes through `VideoDecoder` with
|
||||
/// `setChroma444(true)` to a 256×256 biplanar 4:4:4 (`444v`/`444f`) buffer classified SDR.
|
||||
/// (4:4:4 sessions require a hardware decoder — skip where there isn't one, which is exactly where
|
||||
/// the client wouldn't advertise 4:4:4 anyway.)
|
||||
func testVideoDecoderDecodes444() throws {
|
||||
try XCTSkipUnless(
|
||||
Stage444Probe.hwDecode444_8bit, "no hardware 4:4:4 decode on this device")
|
||||
let data = Data(Probe444Blobs.au444_8bit)
|
||||
let format = try XCTUnwrap(
|
||||
AnnexB.formatDescription(fromIDR: data), "the 4:4:4 blob must yield a format description")
|
||||
let au = AccessUnit(data: data, ptsNs: 7_000_000, frameIndex: 0, flags: 0)
|
||||
|
||||
let box = FrameBox()
|
||||
let done = DispatchSemaphore(value: 0)
|
||||
let decoder = VideoDecoder(
|
||||
onDecoded: { f in box.lock.lock(); box.frame = f; box.lock.unlock(); done.signal() },
|
||||
onDecodeError: { s in box.lock.lock(); box.error = s; box.lock.unlock(); done.signal() })
|
||||
decoder.setChroma444(true)
|
||||
|
||||
XCTAssertTrue(decoder.decode(au: au, format: format), "4:4:4 frame submit should succeed")
|
||||
XCTAssertEqual(done.wait(timeout: .now() + 10), .success, "the decode callback must fire")
|
||||
decoder.reset()
|
||||
|
||||
box.lock.lock(); let frame = box.frame; let error = box.error; box.lock.unlock()
|
||||
XCTAssertNil(error.map { "decode error \($0)" })
|
||||
let ready = try XCTUnwrap(frame, "a 4:4:4 ReadyFrame must be delivered")
|
||||
XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), 256)
|
||||
XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), 256)
|
||||
let pf = CVPixelBufferGetPixelFormatType(ready.pixelBuffer)
|
||||
XCTAssertTrue(
|
||||
pf == kCVPixelFormatType_444YpCbCr8BiPlanarVideoRange
|
||||
|| pf == kCVPixelFormatType_444YpCbCr8BiPlanarFullRange,
|
||||
"expected a biplanar 4:4:4 8-bit buffer, got \(fourCCString(pf))")
|
||||
XCTAssertFalse(ready.isHDR, "an 8-bit BT.709 4:4:4 stream is SDR")
|
||||
// The chroma plane (plane 1) must be FULL resolution for 4:4:4 (vs half for 4:2:0) — this is
|
||||
// what lets the unchanged shader sample chroma at the luma UV.
|
||||
XCTAssertEqual(CVPixelBufferGetWidthOfPlane(ready.pixelBuffer, 1), 256)
|
||||
XCTAssertEqual(CVPixelBufferGetHeightOfPlane(ready.pixelBuffer, 1), 256)
|
||||
}
|
||||
|
||||
private func fourCCString(_ t: OSType) -> String {
|
||||
let b = [UInt8(t >> 24 & 0xff), UInt8(t >> 16 & 0xff), UInt8(t >> 8 & 0xff), UInt8(t & 0xff)]
|
||||
return String(bytes: b, encoding: .ascii) ?? "\(t)"
|
||||
}
|
||||
}
|
||||
@@ -3,12 +3,61 @@ title: "Apple Stage-2 Presenter (handoff)"
|
||||
description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
|
||||
---
|
||||
|
||||
> **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer`
|
||||
> stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit
|
||||
> `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`;
|
||||
> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed
|
||||
> to design rationale + open items — the shipped `.swift` code is the source of truth for the
|
||||
> decode/present/measurement walkthrough.
|
||||
> **Status:** SHIPPED as the **default** presenter (stage-1 `AVSampleBufferDisplayLayer` is the
|
||||
> Metal-unavailable / DEBUG fallback). HDR corrected and **4:4:4** added on top of the proven
|
||||
> main-thread present path (the hosting view's `CADisplayLink` drives `render` per vsync). Code:
|
||||
> `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,Stage444Probe,LatencyMeter}.swift`.
|
||||
> This doc is trimmed to design rationale + open items — the shipped `.swift` code is the source of
|
||||
> truth for the decode/present/measurement walkthrough.
|
||||
>
|
||||
> **HDR (the "too bright" fix).** The presenter renders to a *separate* CAMetalLayer drawable, so the
|
||||
> mastering metadata that was attached to the source `CVPixelBuffer` was never composited — and with no
|
||||
> reference-white anchor the system rendered the PQ signal far too bright. The fix is to keep the
|
||||
> PQ-passthrough shader (BT.2020 limited→full → PQ R′G′B′ as-is) and put the anchor **on the layer**:
|
||||
> `colorspace = itur_2100_PQ`, `wantsExtendedDynamicRangeContent = true` (on **all** platforms — the old
|
||||
> `#if os(macOS)` guard left iOS/tvOS EDR half-engaged), and
|
||||
> `edrMetadata = CAEDRMetadata.hdr10(displayInfo:contentInfo:opticalOutputScale: 203)`. 203 nits =
|
||||
> BT.2408 HDR reference white anchors diffuse white at EDR 1.0; a larger value renders dimmer. The
|
||||
> mastering/CLL blobs (host `0xCE` datagram) now refine `edrMetadata` (drained by the pump,
|
||||
> `setHdrMeta` hops the layer write to main) rather than being attached to a never-composited source
|
||||
> buffer. **Needs on-glass validation on a real EDR panel.**
|
||||
>
|
||||
> **Mid-session SDR↔HDR.** The control-plane colour (`connection.isHDR`, from the Welcome) is fixed per
|
||||
> session, but the host can re-init its encoder mid-session (a game entering HDR), so the HEVC VUI — and
|
||||
> the decoder's `frame.isHDR` — flips. The presenter follows the **decoded frame**, not the latched
|
||||
> session flag: `render` calls the idempotent `configure(hdr:)` every frame, so on a flip it
|
||||
> reconfigures the layer (per-mode pixel format `bgra8Unorm` SDR / `rgba16Float` HDR, colorspace, EDR)
|
||||
> and selects the matching shader — all synchronously on the main thread (the present path is
|
||||
> main-thread, so no cross-thread hop is needed). The last `0xCE` grade is cached so an SDR→HDR
|
||||
> reconfigure re-applies the real mastering metadata instead of the bare anchor. The pump drains `0xCE`
|
||||
> **unconditionally** (not gated on the Welcome flag) so a session that starts SDR still gets mastering
|
||||
> metadata when it goes HDR. A ≤2-frame transition flash on the rare flip is accepted.
|
||||
>
|
||||
> **Pacing.** The hosting view owns a **main-runloop `CADisplayLink`** (a weak `DisplayLinkProxy`
|
||||
> breaks the retain cycle) that calls `renderTick` once per vsync. `renderTick` pops the **newest**
|
||||
> ready frame from the 1-slot ring (older undisplayed frames dropped — lowest latency, no smoothing
|
||||
> buffer) and, if there is one, draws it via **manual `layer.nextDrawable()`** and presents at the next
|
||||
> vsync; on an idle vsync (no new frame) it does nothing and the compositor holds the last presented
|
||||
> drawable (no idle re-render — matters at 5K). `drawableSize` is set **before** `nextDrawable` (it
|
||||
> doesn't track bounds, defaults to 0), so allocation always uses the decoded size. `maximumDrawableCount
|
||||
> = 3`. macOS `displaySyncEnabled = **false**`: the display link is the single pacing source, so leaving
|
||||
> the layer's own vsync wait on would *also* block `present`/`nextDrawable` on the main thread and
|
||||
> serialize it to the display — the cause of the fullscreen judder; disabling it lets present return
|
||||
> promptly. Present is stamped at the display link's `targetTimestamp` projected to `CLOCK_REALTIME`
|
||||
> (the actual on-glass instant, <1 vsync after the draw — accurate for the HUD).
|
||||
>
|
||||
> *(History: an off-main `CAMetalDisplayLink` variant and an off-main blocking-render present thread
|
||||
> were both tried and **reverted** — both measured slower on macOS *and* iPad than this main-thread
|
||||
> display-link path, whose real judder fix was simply `displaySyncEnabled = false`, not moving present
|
||||
> off-thread. Measured ~11 ms p50 on the main-thread path.)*
|
||||
>
|
||||
> **4:4:4.** Chroma, bit-depth, and colorimetry are orthogonal: the decode pixel format is a 2×2 of
|
||||
> `(chroma, HDR)` → `420v/x420/444v/x444` (all biplanar, so the existing shaders sample a full-size
|
||||
> chroma plane unchanged); the shader is keyed only on HDR. The client advertises `VIDEO_CAP_444` only
|
||||
> when `Stage444Probe` confirms **hardware** 4:4:4 decode (a hardware-required `VTDecompressionSession`
|
||||
> over an embedded 256×256 4:4:4 keyframe — software 4:4:4 is too slow for real-time; validated on M3:
|
||||
> `444v`/`x444` produced). A bounded pump backstop ends a 4:4:4 session that persistently fails to
|
||||
> decode (gated to 4:4:4 sessions, so 4:2:0 loss-recovery is untouched).
|
||||
|
||||
## Why stage 2 (design rationale)
|
||||
|
||||
@@ -47,10 +96,28 @@ Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → dis
|
||||
|
||||
## Open items
|
||||
|
||||
- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit
|
||||
`…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap).
|
||||
- **On-glass HDR validation** — eyeball `edrMetadata` + `opticalOutputScale: 203` on a real EDR panel
|
||||
(XDR display) against stage-1 side-by-side: diffuse white should sit at SDR-white level with only
|
||||
highlights climbing. The reference white is a single named constant (`hdrReferenceWhiteNits`) for easy
|
||||
tuning. (Needs a Windows HDR host; the Linux host is 8-bit SDR only.)
|
||||
- **On-glass 4:4:4 validation** — confirm a `PUNKTFUNK_444` host (RTX box) streams a 4:4:4 session the
|
||||
client decodes in hardware (HUD shows the resolved chroma); verify the resolution-ceiling backstop by
|
||||
forcing a too-large 4:4:4 mode.
|
||||
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture
|
||||
term.
|
||||
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can come
|
||||
later if frames look uneven.
|
||||
- **iOS / iPadOS / tvOS stage-2 variants.**
|
||||
term and confirm the main-thread display-link present p50 holds at ~11 ms (and isn't regressed by the
|
||||
per-frame `configure` / HDR-anchor work).
|
||||
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; an optional even-pacing
|
||||
policy (`present(_:afterMinimumDuration:)`) can come later if frames look uneven.
|
||||
- **4:4:4 runtime downgrade-reconnect** — today a persistently-undecodable 4:4:4 session ends cleanly
|
||||
(the live 4:4:4 decode requires hardware, so a resolution-ceiling miss fails the session create
|
||||
*synchronously* and the pump backstop ends it — no black-screen loop); auto-reconnecting at 4:2:0
|
||||
(dropping `VIDEO_CAP_444`) is a future refinement.
|
||||
- **HLG** — `isHDR`/`isHDRFormat` fold HLG (transfer 18) in with PQ, but the presenter is PQ-only
|
||||
(`itur_2100_PQ` + `hdr10` EDR), so an HLG stream would be mis-toned. Latent — no host emits HLG
|
||||
(the stack is BT.2020 **PQ** only). A real HLG path (`itur_2100_HLG`, no PQ reference-white anchor)
|
||||
is future work; until then HLG should be treated as out of scope.
|
||||
- **Full-range** — the shaders hardcode limited→full expansion and the decoder requests the
|
||||
`*VideoRange` formats regardless of `connection.colorFullRange`; VideoToolbox range-converts a
|
||||
full-range source to video range on decode, so it stays self-consistent (mild level compression on
|
||||
genuinely full-range content, which no host emits). Pre-existing; wire `colorFullRange` into the
|
||||
range constants eventually.
|
||||
|
||||
Reference in New Issue
Block a user