feat(apple): stage-2 presenter — explicit decode + Metal present + glass-to-glass
ci / web (push) Failing after 38s
ci / rust (push) Successful in 53s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 16s
ci / docs-site (push) Failing after 39s
docker / deploy-docs (push) Successful in 16s
apple / swift (push) Successful in 1m17s
Opt-in (Settings -> Presenter; `punktfunk.presenter`, default stage-1). Stage-1's
AVSampleBufferDisplayLayer decodes AND presents internally with no per-frame callback, so
neither decode nor present can be stamped or hand-paced. Stage-2 takes explicit control:

- VideoDecoder: VTDecompressionSession, async output callback stamps decode-completion,
  session rebuilt on every IDR / format change. Unit-tested
  (testVideoDecoderAsyncCallbackDeliversPixels).
- MetalVideoPresenter: CAMetalLayer + CVMetalTextureCache + a runtime-compiled BT.709
  limited-range NV12->RGB shader, present at the next vsync. The CVMetalTextures + pixel
  buffer are held until the GPU completes.
- Stage2Pipeline: pump thread -> decoder -> newest-ready 1-slot ring; the hosting view's
  display link drains it once per vsync and stamps capture->present (the display-link target
  time projected into CLOCK_REALTIME).
- LatencyMeter gains record(ptsNs:atNs:offsetNs:); the HUD shows a capture->present
  (glass-to-glass, modulo host render->capture) line, skew-corrected via clockOffsetNs.
  Measured live ~11 ms p50 vs ~2.2 ms capture->client.
- StreamView / StreamViewIOS host the CAMetalLayer as a sublayer + a CADisplayLink
  (NSView.displayLink on macOS) when stage-2; input capture + HUD unchanged. The
  session-active gates switch from `pump != nil` to `connection != nil` so capture engages
  without a StreamPump.

Validated: builds macOS/iOS/tvOS; the decode half is unit-tested; the Metal present is
live-validated on glass (correct image + the capture->present number). Colorspace is BT.709
SDR for now; 10-bit/HDR + a pacing policy are later.

Plan: docs-site/content/docs/apple-stage2-presenter.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
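
As a rough illustration of the capture->present stamp described in the message, the sketch below reproduces the arithmetic in plain Swift: project a display-link target time onto the wall clock, then compute `atNs + offsetNs - ptsNs`. The names `projectToRealtimeNs` and `capturePresentLatencyNs` and all sample values are hypothetical, chosen only for illustration; the formula and the 0..10 s sanity window are the parts taken from this commit.

```swift
import Foundation

/// Hypothetical helper: move a timestamp from a monotonic basis (seconds) onto the
/// wall clock (nanoseconds), given "now" sampled on both clocks at the same moment.
func projectToRealtimeNs(
    targetSeconds: Double, monotonicNowSeconds: Double, realtimeNowNs: Int64
) -> Int64 {
    realtimeNowNs + Int64((targetSeconds - monotonicNowSeconds) * 1_000_000_000)
}

/// Hypothetical helper: latency = client instant + (host minus client) offset - host pts.
/// Returns nil for values outside (0, 10 s), the same sanity window the diff applies.
func capturePresentLatencyNs(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) -> Int64? {
    let latNs = atNs &+ offsetNs &- Int64(bitPattern: ptsNs)
    guard latNs > 0, latNs < 10_000_000_000 else { return nil }
    return latNs
}

// Sample values (assumptions): the host clock runs 2 ms ahead of the client, the frame
// was captured 3 ms before "now", and the display link targets a present 7.8125 ms ahead.
let clientNowNs: Int64 = 1_700_000_000_000_000_000
let offsetNs: Int64 = 2_000_000
let ptsNs = UInt64(clientNowNs + offsetNs - 3_000_000)
let targetNs = projectToRealtimeNs(
    targetSeconds: 100.0078125, monotonicNowSeconds: 100.0, realtimeNowNs: clientNowNs)

// With these inputs the result is 10_812_500 ns (3 ms already elapsed + 7.8125 ms to glass).
if let latNs = capturePresentLatencyNs(ptsNs: ptsNs, atNs: targetNs, offsetNs: offsetNs) {
    print("capture->present: \(Double(latNs) / 1_000_000) ms")
}
```

Stamping at the projected target time rather than at the moment the present call runs is what lets the number approximate when the frame reaches the display, under the assumption that the display link's target is honored.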
+11
-8
@@ -74,7 +74,8 @@ What's here, all compiled and tested on macOS (Xcode 26.5 / Swift 6.3):
 received it, **skew-corrected** across machines via `PunktfunkConnection.clockOffsetNs` (the
 connect-time wall-clock handshake, `punktfunk_connection_clock_offset_ns`). It excludes the
 layer's decode+present (stage-1 `AVSampleBufferDisplayLayer` has no per-frame present callback);
-true decode→present awaits the stage-2 presenter. Settings also picks the HOST
+the opt-in **stage-2 presenter** (Settings → Presenter) adds a **capture→present**
+(glass-to-glass) line via explicit decode + a Metal/display-link present. Settings also picks the HOST
 compositor (KWin/wlroots/Mutter/gamescope, default automatic — the host honors it
 only if that backend is available there) and has a **Controllers** section: every
 detected controller (capability glyphs, battery, "In use" badge), which one to forward
@@ -166,13 +167,15 @@ signing, bundle id `io.unom.punktfunk`. Notes:
 3. **Decode flow**: the host opens every stream with an IDR carrying VPS/SPS/PPS in-band
 and recovery keyframes re-send them — "refresh the format description on every IDR"
 (what `StreamView` does) is sufficient; there is no out-of-band extradata, ever.
-4. **Stage 2 (next)**: explicit `VTDecompressionSession` + `CAMetalLayer` for frame-pacing
-control (ProMotion/120 Hz) and true decode→present / glass-to-glass measurement. The
-cross-machine clock offset is **already wired** — `PunktfunkConnection.clockOffsetNs` (from
-the connect-time skew handshake); add it to a `CLOCK_REALTIME` present instant and subtract
-the AU `pts_ns`. **Full pickup-ready implementation plan** (decode + present + measurement
-wiring, integration points, gotchas): `docs-site/content/docs/apple-stage2-presenter.md`
-(rendered in the docs site under "Apple Stage-2 Presenter").
+4. **Stage 2 — built, opt-in (`punktfunk.presenter == "stage2"`, default stage 1).** Explicit
+`VTDecompressionSession` decode (`VideoDecoder`) → a `CAMetalLayer` + display-link present
+(`MetalVideoPresenter`/`Stage2Pipeline`), hosted as a sublayer by the same `StreamView`s with
+input capture + HUD unchanged. It adds a **capture→present** (glass-to-glass, modulo the host
+render→capture term) HUD line, skew-corrected via `PunktfunkConnection.clockOffsetNs`. The
+decode half is unit-tested (`testVideoDecoderAsyncCallbackDeliversPixels`); the Metal present
+is display-bound — **validate live** (flip the Settings "Presenter" picker, watch the HUD
+number and that the image looks right) before making it the default. 10-bit/HDR + a smoothing
+pacer are later. Plan: `docs-site/content/docs/apple-stage2-presenter.md`.
 5. **Audio — wired, both directions.** Playback: `SessionAudio` drains `nextAudio()`
 on its own thread, decodes through CoreAudio's built-in Opus codec (`OpusCodec.swift`
 — kAudioFormatOpus, no bundled libopus; round-trip unit-tested) into a priming
@@ -610,7 +610,8 @@ struct ContentView: View {
                 },
                 onSessionEnd: { [weak model] in
                     Task { @MainActor in model?.sessionEnded() }
-                }
+                },
+                presentMeter: model.presentLatency
             )
             .overlay(alignment: .topTrailing) {
                 if captureEnabled { hud(conn) }
@@ -635,6 +636,13 @@ struct ContentView: View {
                     .font(.system(.caption2, design: .monospaced))
                     .foregroundStyle(.secondary)
             }
+            if model.presentLatencyValid {
+                // Capture→present (glass-to-glass, modulo host render→capture) — stage-2 presenter
+                // only; stage-1's layer presents internally with no per-frame stamp.
+                Text("capture→present \(model.presentLatencyP50Ms, specifier: "%.1f")/\(model.presentLatencyP95Ms, specifier: "%.1f") ms p50/p95\(model.presentLatencySkewCorrected ? "" : " (same-host)")")
+                    .font(.system(.caption2, design: .monospaced))
+                    .foregroundStyle(.secondary)
+            }
             // While captured the cursor is hidden+frozen, so the button is keyboard-only
             // (⌘⎋ or Cmd+Tab release the cursor; released, it's clickable again).
             #if os(macOS)
@@ -61,12 +61,21 @@ final class SessionModel: ObservableObject {
     @Published var latencyP95Ms = 0.0
     @Published var latencyValid = false
     @Published var latencySkewCorrected = false
+    /// Capture→present (glass-to-glass, modulo the host render→capture term) — only the stage-2
+    /// presenter can stamp this (it owns decode + a CAMetalLayer/display-link present). Stays
+    /// invalid under stage-1, where the layer presents internally with no per-frame callback.
+    @Published var presentLatencyP50Ms = 0.0
+    @Published var presentLatencyP95Ms = 0.0
+    @Published var presentLatencyValid = false
+    @Published var presentLatencySkewCorrected = false
     /// Mirrors StreamView's capture state (it owns the input capture; this drives the
     /// HUD's "click to capture" / "⌘⎋ releases" hint).
     @Published var mouseCaptured = false
 
     let meter = FrameMeter()
     let latency = LatencyMeter()
+    /// Fed by the stage-2 presenter's display link (capture→present). Passed to StreamView.
+    let presentLatency = LatencyMeter()
     private var statsTimer: Timer?
     private var audio: SessionAudio?
     private var gamepadCapture: GamepadCapture?
@@ -230,6 +239,14 @@ final class SessionModel: ObservableObject {
                 } else {
                     self.latencyValid = false
                 }
+                if let p = self.presentLatency.drain() {
+                    self.presentLatencyP50Ms = p.p50Ms
+                    self.presentLatencyP95Ms = p.p95Ms
+                    self.presentLatencySkewCorrected = p.skewCorrected
+                    self.presentLatencyValid = true
+                } else {
+                    self.presentLatencyValid = false
+                }
             }
         }
         // .common so the HUD keeps updating during window drags / menu tracking.
@@ -17,6 +17,7 @@ struct SettingsView: View {
     @AppStorage("punktfunk.compositor") private var compositor = 0
     @AppStorage("punktfunk.gamepadType") private var gamepadType = 0
     @AppStorage("punktfunk.bitrateKbps") private var bitrateKbps = 0
+    @AppStorage("punktfunk.presenter") private var presenter = "stage1"
     @AppStorage("punktfunk.micEnabled") private var micEnabled = true
     @ObservedObject private var gamepads = GamepadManager.shared
     #if os(macOS)
@@ -88,6 +89,10 @@ struct SettingsView: View {
                 }
                 TVSelectionRow(
                     title: "Compositor", options: compositors, selection: $compositor)
+                TVSelectionRow(
+                    title: "Presenter",
+                    options: [("Stage 1 (default)", "stage1"), ("Stage 2 (experimental)", "stage2")],
+                    selection: $presenter)
                 Text("The host creates a virtual output at exactly this mode — native "
                     + "resolution, no scaling. \(Self.bitrateFooter) A specific compositor "
                     + "is honored only if available on the host.")
@@ -366,6 +371,21 @@ struct SettingsView: View {
                     .font(.caption)
                     .foregroundStyle(.secondary)
             }
+            Section {
+                Picker("Presenter", selection: $presenter) {
+                    Text("Stage 1 (default)").tag("stage1")
+                    Text("Stage 2 (experimental)").tag("stage2")
+                }
+            } header: {
+                Text("Video presenter")
+            } footer: {
+                Text("Stage 1 feeds compressed video to the system display layer (known-good). "
+                    + "Stage 2 decodes explicitly and presents through Metal with a display "
+                    + "link — it adds a capture→present (glass-to-glass) latency line in the HUD "
+                    + "and shortens the present tail. Applies from the next session.")
+                    .font(.caption)
+                    .foregroundStyle(.secondary)
+            }
             Section {
                 if gamepads.controllers.isEmpty {
                     Text("No controllers detected")
@@ -25,13 +25,20 @@ public final class LatencyMeter: @unchecked Sendable {
 
     public init() {}
 
-    /// Record one frame at receipt. `ptsNs` is the host capture clock (the AU's pts); `offsetNs` is
-    /// the host-client clock offset from the skew handshake (0 = uncorrected / old host).
+    /// Record one frame at receipt (now). `ptsNs` is the host capture clock (the AU's pts);
+    /// `offsetNs` is the host-client clock offset from the skew handshake (0 = uncorrected).
     public func record(ptsNs: UInt64, offsetNs: Int64) {
         var ts = timespec()
         clock_gettime(CLOCK_REALTIME, &ts)
         let nowNs = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
-        let latNs = nowNs &+ offsetNs &- Int64(bitPattern: ptsNs)
+        record(ptsNs: ptsNs, atNs: nowNs, offsetNs: offsetNs)
+    }
+
+    /// Record one frame whose latency is `atNs + offsetNs - ptsNs` — an EXPLICIT client instant
+    /// rather than now. The stage-2 presenter uses this to stamp capture→present at the display
+    /// link's target present time (not the moment the present call ran). All in `CLOCK_REALTIME`.
+    public func record(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) {
+        let latNs = atNs &+ offsetNs &- Int64(bitPattern: ptsNs)
         // Drop absurd values (a clock step, a wildly wrong offset, or garbage pts).
         guard latNs > 0, latNs < 10_000_000_000 else { return }
         lock.lock()
@@ -0,0 +1,147 @@
+// Stage-2 presenter, present half: draw a decoded NV12 CVPixelBuffer into a CAMetalLayer
+// drawable with a BT.709 YUV→RGB shader. The display link (owned by the hosting view) drives
+// `render` once per vsync with the target present time, so a present can finally be stamped and
+// the present tail hand-paced. See docs apple-stage2-presenter.md.
+//
+// Main-thread only: created during view setup, `render` called from the view's CADisplayLink
+// (which fires on the main runloop). The Metal objects + texture cache are touched only here.
+
+#if canImport(Metal) && canImport(QuartzCore)
+import CoreVideo
+import Metal
+import QuartzCore
+
+/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and a
+/// BT.709 limited-range NV12→RGB fragment shader. uv.y is flipped (1 - p.y) so the top-left-
+/// origin texture presents upright (NDC y is up), not upside down. (Colorspace is BT.709 SDR
+/// for now — matches the host; 10-bit/HDR + other matrices are a later tie-in.)
+private let shaderSource = """
+#include <metal_stdlib>
+using namespace metal;
+
+struct VOut { float4 pos [[position]]; float2 uv; };
+
+vertex VOut pf_vtx(uint vid [[vertex_id]]) {
+    float2 p = float2(float((vid << 1) & 2), float(vid & 2));
+    VOut o;
+    o.pos = float4(p * 2.0 - 1.0, 0.0, 1.0);
+    o.uv = float2(p.x, 1.0 - p.y);
+    return o;
+}
+
+fragment float4 pf_frag(VOut in [[stage_in]],
+                        texture2d<float> lumaTex [[texture(0)]],
+                        texture2d<float> chromaTex [[texture(1)]]) {
+    constexpr sampler s(filter::linear, address::clamp_to_edge);
+    float y = lumaTex.sample(s, in.uv).r;
+    float2 c = chromaTex.sample(s, in.uv).rg;
+    // BT.709, 8-bit limited (video) range → full-range RGB.
+    y = (y - 16.0/255.0) * (255.0/219.0);
+    float u = (c.x - 128.0/255.0) * (255.0/224.0);
+    float v = (c.y - 128.0/255.0) * (255.0/224.0);
+    float r = y + 1.5748 * v;
+    float g = y - 0.1873 * u - 0.4681 * v;
+    float b = y + 1.8556 * u;
+    return float4(saturate(float3(r, g, b)), 1.0);
+}
+"""
+
+public final class MetalVideoPresenter {
+    /// The layer the hosting view installs (as a sublayer) and sizes to its bounds.
+    public let layer: CAMetalLayer
+
+    private let device: MTLDevice
+    private let queue: MTLCommandQueue
+    private let pipeline: MTLRenderPipelineState
+    private var textureCache: CVMetalTextureCache?
+
+    /// nil if Metal is unavailable (no GPU / a headless CI) — the caller falls back to stage-1.
+    public init?() {
+        guard let device = MTLCreateSystemDefaultDevice(),
+            let queue = device.makeCommandQueue()
+        else { return nil }
+        self.device = device
+        self.queue = queue
+        do {
+            let library = try device.makeLibrary(source: shaderSource, options: nil)
+            let desc = MTLRenderPipelineDescriptor()
+            desc.vertexFunction = library.makeFunction(name: "pf_vtx")
+            desc.fragmentFunction = library.makeFunction(name: "pf_frag")
+            desc.colorAttachments[0].pixelFormat = .bgra8Unorm
+            pipeline = try device.makeRenderPipelineState(descriptor: desc)
+        } catch {
+            return nil
+        }
+        CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)
+        guard textureCache != nil else { return nil }
+
+        let layer = CAMetalLayer()
+        layer.device = device
+        layer.pixelFormat = .bgra8Unorm
+        layer.framebufferOnly = true
+        layer.isOpaque = true
+        self.layer = layer
+    }
+
+    /// Track the stream mode (the host can Reconfigure mid-stream). Size is in pixels.
+    public func setDrawableSize(_ size: CGSize) {
+        guard size.width > 0, size.height > 0 else { return }
+        if layer.drawableSize != size { layer.drawableSize = size }
+    }
+
+    /// Draw one decoded frame to the next drawable and present it. Returns true on success;
+    /// false when there's no drawable yet, a texture couldn't be made, or Metal errored — the
+    /// caller then doesn't stamp a present for this frame.
+    @discardableResult
+    public func render(_ pixelBuffer: CVPixelBuffer) -> Bool {
+        guard let textureCache,
+            let luma = makeTexture(pixelBuffer, plane: 0, format: .r8Unorm, cache: textureCache),
+            let chroma = makeTexture(pixelBuffer, plane: 1, format: .rg8Unorm, cache: textureCache)
+        else { return false }
+
+        // The hosting view owns drawableSize (aspect-fit to its bounds); skip until it's laid
+        // out. The fullscreen triangle scales the decoded texture to fill the drawable.
+        guard layer.drawableSize.width > 0, layer.drawableSize.height > 0,
+            let drawable = layer.nextDrawable(),
+            let commandBuffer = queue.makeCommandBuffer()
+        else { return false }
+
+        let pass = MTLRenderPassDescriptor()
+        pass.colorAttachments[0].texture = drawable.texture
+        pass.colorAttachments[0].loadAction = .clear
+        pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
+        pass.colorAttachments[0].storeAction = .store
+        guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
+            return false
+        }
+        encoder.setRenderPipelineState(pipeline)
+        encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
+        encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
+        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
+        encoder.endEncoding()
+        commandBuffer.present(drawable) // present at the next vsync — lowest latency
+        // Hold the CVMetalTextures + the source pixel buffer (its IOSurface) alive until the GPU
+        // finishes sampling — releasing them at scope exit could free the backing mid-read.
+        commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) }
+        commandBuffer.commit()
+        return true
+    }
+
+    /// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past
+    /// the draw — the MTLTexture is only valid while its CVMetalTexture is retained.
+    private func makeTexture(
+        _ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat,
+        cache: CVMetalTextureCache
+    ) -> CVMetalTexture? {
+        let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
+        let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
+        var cvTexture: CVMetalTexture?
+        let status = CVMetalTextureCacheCreateTextureFromImage(
+            kCFAllocatorDefault, cache, pixelBuffer, nil, format, w, h, plane, &cvTexture)
+        guard status == kCVReturnSuccess, let cvTexture,
+            CVMetalTextureGetTexture(cvTexture) != nil
+        else { return nil }
+        return cvTexture
+    }
+}
+#endif
@@ -0,0 +1,133 @@
+// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async
+// output drops the newest decoded frame into a 1-slot ring; the hosting view's display link
+// calls `renderTick` once per vsync to draw + present the newest ready frame and stamp
+// capture→present. Mirrors StreamPump's lifecycle (one per start; cancel is permanent).
+//
+// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick`
+// + `setDrawableSize` + `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there).
+// Only the ring + decoder cross threads and both are internally locked.
+
+#if canImport(Metal) && canImport(QuartzCore)
+import AVFoundation
+import Foundation
+import QuartzCore
+
+/// Newest-ready 1-slot ring: the decoder overwrites (drops the older undisplayed frame — lowest
+/// latency, no smoothing buffer), the display link takes-and-clears. Sendable; lock-guarded.
+private final class ReadyRing: @unchecked Sendable {
+    private let lock = NSLock()
+    private var frame: ReadyFrame?
+    func submit(_ f: ReadyFrame) {
+        lock.lock(); frame = f; lock.unlock()
+    }
+    func take() -> ReadyFrame? {
+        lock.lock(); defer { lock.unlock() }
+        let f = frame; frame = nil; return f
+    }
+}
+
+/// Cancellation handle owned by one pump thread (same pattern as StreamPump).
+private final class PumpToken: @unchecked Sendable {
+    private let lock = NSLock()
+    private var live = true
+    var isLive: Bool { lock.lock(); defer { lock.unlock() }; return live }
+    func cancel() { lock.lock(); live = false; lock.unlock() }
+}
+
+public final class Stage2Pipeline {
+    private let ring = ReadyRing()
+    private let presenter: MetalVideoPresenter
+    private let decoder: VideoDecoder
+    private let presentMeter: LatencyMeter
+    private var token = PumpToken()
+    private var offsetNs: Int64 = 0
+
+    /// The Metal layer the hosting view installs + sizes. nil-init fails when Metal is
+    /// unavailable so the caller can fall back to stage-1.
+    public var layer: CAMetalLayer { presenter.layer }
+
+    /// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal
+    /// can't be set up (headless / no GPU) — caller falls back to the stage-1 presenter.
+    public init?(presentMeter: LatencyMeter) {
+        guard let presenter = MetalVideoPresenter() else { return nil }
+        self.presenter = presenter
+        self.presentMeter = presentMeter
+        let ring = ring
+        self.decoder = VideoDecoder(
+            onDecoded: { ring.submit($0) },
+            onDecodeError: { _ in /* the pump resets the session via reset() on the next IDR */ })
+    }
+
+    /// Start pulling AUs into the decoder. `onFrame` fires per AU at receipt (capture→client
+    /// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client)
+    /// makes the present stamp cross-machine valid.
+    public func start(
+        connection: PunktfunkConnection,
+        onFrame: (@Sendable (AccessUnit) -> Void)?,
+        onSessionEnd: (@Sendable () -> Void)?
+    ) {
+        offsetNs = connection.clockOffsetNs
+        token = PumpToken() // fresh token per start — cancel is permanent (like StreamPump)
+        let token = token
+        let decoder = decoder
+        let thread = Thread {
+            var format: CMVideoFormatDescription?
+            while token.isLive {
+                do {
+                    guard let au = try connection.nextAU(timeoutMs: 100) else { continue }
+                    onFrame?(au)
+                    if let f = AnnexB.formatDescription(fromIDR: au.data) {
+                        format = f // refreshed on every IDR (mode changes included)
+                    }
+                    guard let f = format, token.isLive else { continue }
+                    if !decoder.decode(au: au, format: f) {
+                        // Submit/decoder error: drop the session and re-gate on the next IDR's
+                        // in-band parameter sets (a delta frame can't recover) — stage-1's policy.
+                        decoder.reset()
+                    }
+                } catch {
+                    if token.isLive { onSessionEnd?() }
+                    break // session closed
+                }
+            }
+        }
+        thread.name = "punktfunk-stage2-pump"
+        thread.qualityOfService = .userInteractive
+        thread.start()
+    }
+
+    /// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp
+    /// capture→present at `targetPresentNs` — the display link's target present instant, already
+    /// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
+    public func renderTick(targetPresentNs: Int64) {
+        guard let frame = ring.take() else { return }
+        guard presenter.render(frame.pixelBuffer) else { return }
+        presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
+    }
+
+    /// MAIN thread. Keep the drawable matched to the negotiated mode (host can Reconfigure).
+    public func setDrawableSize(_ size: CGSize) {
+        presenter.setDrawableSize(size)
+    }
+
+    /// Stop the pump (≤ one poll timeout) and drop the decode session. Does not close the
+    /// connection. A restart needs a fresh Stage2Pipeline (cancel is permanent).
+    public func stop() {
+        token.cancel()
+        decoder.reset()
+    }
+
+    deinit { token.cancel() }
+
+    /// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME`
+    /// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the
+    /// target present time (when the frame is actually on glass), not the moment we drew.
+    public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 {
+        let caNow = CACurrentMediaTime()
+        var ts = timespec()
+        clock_gettime(CLOCK_REALTIME, &ts)
+        let realtimeNow = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
+        return realtimeNow + Int64((t - caNow) * 1_000_000_000)
+    }
+}
+#endif
@@ -69,30 +69,35 @@ public struct StreamView: NSViewRepresentable {
     private let onCaptureChange: ((Bool) -> Void)?
     private let onFrame: (@Sendable (AccessUnit) -> Void)?
     private let onSessionEnd: (@Sendable () -> Void)?
+    private let presentMeter: LatencyMeter?
 
     /// `onFrame`/`onSessionEnd` fire on the pump thread — hop to the main actor for UI.
     /// `captureEnabled: false` disables input capture entirely while UI (e.g. a trust
     /// prompt) is layered over the stream; flipping it to true auto-engages capture
     /// once. `onCaptureChange` (main thread) reports engage/release — drive the HUD's
-    /// "click to capture" / "⌘⎋ releases" hint with it.
+    /// "click to capture" / "⌘⎋ releases" hint with it. `presentMeter` records capture→present
+    /// when the stage-2 presenter is active (`punktfunk.presenter == "stage2"`).
     public init(
         connection: PunktfunkConnection,
         captureEnabled: Bool = true,
         onCaptureChange: ((Bool) -> Void)? = nil,
         onFrame: (@Sendable (AccessUnit) -> Void)? = nil,
-        onSessionEnd: (@Sendable () -> Void)? = nil
+        onSessionEnd: (@Sendable () -> Void)? = nil,
+        presentMeter: LatencyMeter? = nil
     ) {
         self.connection = connection
         self.captureEnabled = captureEnabled
         self.onCaptureChange = onCaptureChange
         self.onFrame = onFrame
         self.onSessionEnd = onSessionEnd
+        self.presentMeter = presentMeter
     }
 
     public func makeNSView(context: Context) -> StreamLayerView {
         let view = StreamLayerView()
         view.onCaptureChange = onCaptureChange
         view.captureEnabled = captureEnabled
+        view.presentMeter = presentMeter
         view.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
         return view
     }
@@ -100,6 +105,7 @@ public struct StreamView: NSViewRepresentable {
     public func updateNSView(_ view: StreamLayerView, context: Context) {
         view.onCaptureChange = onCaptureChange
         view.captureEnabled = captureEnabled
+        view.presentMeter = presentMeter
         // SwiftUI reuses the NSView across state changes — repoint the pump only when the
         // connection identity actually changed.
         if view.connection !== connection {
@@ -115,6 +121,12 @@ public struct StreamView: NSViewRepresentable {
 public final class StreamLayerView: NSView {
     private let displayLayer = AVSampleBufferDisplayLayer()
     private var pump: StreamPump?
+    /// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
+    /// display link instead of the StreamPump → displayLayer path. nil = stage-1 (default).
+    var presentMeter: LatencyMeter?
+    private var stage2: Stage2Pipeline?
+    private var stage2Link: CADisplayLink?
+    private var metalLayer: CAMetalLayer?
     public private(set) var connection: PunktfunkConnection?
     private let cursorCapture = CursorCapture()
     private var inputCapture: InputCapture?
@@ -191,6 +203,7 @@ public final class StreamLayerView: NSView {
     public override func layout() {
         super.layout()
         attemptPendingCapture() // bounds become real here on first presentation
+        layoutMetalLayer() // keep the stage-2 sublayer aspect-fit to the view
     }
 
     // MARK: - Capture state machine
@@ -296,7 +309,9 @@ public final class StreamLayerView: NSView {
         // A click is explicit intent AND may arrive mid-activation (acceptsFirstMouse:
         // NSApp.isActive / isKeyWindow are still false for the click coming in from
         // another app) — only the auto-engage paths require already-held key status.
-        guard captureEnabled, !captured, pump != nil, window != nil,
+        // `connection != nil` (not `pump`) is the session-active gate — the stage-2 presenter
+        // runs without a StreamPump, and capture must still engage there.
+        guard captureEnabled, !captured, connection != nil, window != nil,
             fromClick || (NSApp.isActive && window?.isKeyWindow == true)
         else { return }
         // If the cursor grab is refused (e.g. the reactivating click arrives before the app is
@@ -416,14 +431,80 @@ public final class StreamLayerView: NSView {
|
||||
capture.start()
|
||||
inputCapture = capture
|
||||
|
||||
let pump = StreamPump()
|
||||
pump.start(
|
||||
connection: connection, layer: displayLayer,
|
||||
onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||
self.pump = pump
|
||||
// Presenter choice — default stage-1 (the known-good AVSampleBufferDisplayLayer). Stage-2
|
||||
// (`punktfunk.presenter == "stage2"`) takes explicit VTDecompressionSession decode + a
|
||||
// CAMetalLayer/display-link present; it falls back here if Metal can't be set up.
|
||||
if UserDefaults.standard.string(forKey: "punktfunk.presenter") == "stage2",
|
||||
let meter = presentMeter,
|
||||
let pipeline = Stage2Pipeline(presentMeter: meter) {
|
||||
startStage2(pipeline, connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||
} else {
|
||||
let pump = StreamPump()
|
||||
pump.start(
|
||||
connection: connection, layer: displayLayer,
|
||||
onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||
self.pump = pump
|
||||
}
|
||||
requestAutoCapture() // entering a session is the deliberate "capture me" moment
|
||||
}
|
||||
|
||||
// MARK: - Stage-2 presenter (VTDecompressionSession → CAMetalLayer + display link)
|
||||
|
||||
private func startStage2(
|
||||
_ pipeline: Stage2Pipeline, connection: PunktfunkConnection,
|
||||
onFrame: (@Sendable (AccessUnit) -> Void)?, onSessionEnd: (@Sendable () -> Void)?
|
||||
) {
|
||||
let metal = pipeline.layer
|
||||
displayLayer.addSublayer(metal) // contentsScale + frame set in layoutMetalLayer()
|
||||
metalLayer = metal
|
||||
stage2 = pipeline
|
||||
layoutMetalLayer()
|
||||
let link = displayLink(target: self, selector: #selector(stage2Tick(_:)))
|
||||
link.add(to: .main, forMode: .common)
|
||||
stage2Link = link
|
||||
pipeline.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||
}
|
||||
|
||||
@objc private func stage2Tick(_ link: CADisplayLink) {
|
||||
stage2?.renderTick(
|
||||
targetPresentNs: Stage2Pipeline.realtimeNs(forDisplayLinkTimestamp: link.targetTimestamp))
|
||||
}
|
||||
|
||||
+    /// Aspect-fit the metal sublayer in the view (the host streams at the client's native mode,
+    /// so this is usually the full bounds; it letterboxes a resized window). drawableSize is the
+    /// layer's pixel size — the fullscreen-triangle shader scales the decoded texture to fill it.
+    private func layoutMetalLayer() {
+        guard let metalLayer, let connection else { return }
+        let mode = connection.currentMode()
+        let fit: NSRect = (mode.width > 0 && mode.height > 0)
+            ? AVMakeRect(
+                aspectRatio: CGSize(width: Int(mode.width), height: Int(mode.height)),
+                insideRect: bounds)
+            : bounds
+        let scale = window?.backingScaleFactor ?? 1
+        // No implicit resize animation; refresh contentsScale on a retina↔non-retina move.
+        CATransaction.begin()
+        CATransaction.setDisableActions(true)
+        metalLayer.contentsScale = scale
+        metalLayer.frame = fit
+        CATransaction.commit()
+        stage2?.setDrawableSize(CGSize(width: fit.width * scale, height: fit.height * scale))
+    }
+
+    public override func viewDidChangeBackingProperties() {
+        super.viewDidChangeBackingProperties()
+        layoutMetalLayer() // backing scale changed (e.g. moved to a non-retina display)
+    }
+
+    private func teardownStage2() {
+        stage2Link?.invalidate()
+        stage2Link = nil
+        stage2?.stop()
+        stage2 = nil
+        metalLayer?.removeFromSuperlayer()
+        metalLayer = nil
+    }

     /// Stop pumping (≤ one poll timeout). Does not close the connection — that stays with
     /// whoever owns it (PunktfunkConnection.close() is safe alongside a draining pump).
     public func stop() {
@@ -433,6 +514,7 @@ public final class StreamLayerView: NSView {
         inputCapture = nil
         pump?.stop()
         pump = nil
+        teardownStage2()
         connection = nil
     }

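Editor's sketch (not part of the commit): the aspect-fit arithmetic that `layoutMetalLayer()` delegates to `AVMakeRect(aspectRatio:insideRect:)` can be written out by hand to make the letterboxing and the point-to-pixel step explicit. The helper names `aspectFit` and `drawablePixelSize` are hypothetical, and the sample sizes are illustrative only.

```swift
import Foundation

/// Hypothetical helper: centre `content` inside `bounds` at the largest scale
/// that still fits, which is the role AVMakeRect plays in the diff above.
func aspectFit(content: CGSize, in bounds: CGRect) -> CGRect {
    // Same guard as the diff: an unknown mode falls back to the full bounds.
    guard content.width > 0, content.height > 0,
          bounds.width > 0, bounds.height > 0 else { return bounds }
    let scale = min(bounds.width / content.width, bounds.height / content.height)
    let size = CGSize(width: content.width * scale, height: content.height * scale)
    return CGRect(
        x: bounds.minX + (bounds.width - size.width) / 2,
        y: bounds.minY + (bounds.height - size.height) / 2,
        width: size.width, height: size.height)
}

/// Hypothetical helper: the drawable is sized in pixels, so the fitted rect
/// (in points) is multiplied by the backing scale factor.
func drawablePixelSize(fit: CGRect, scale: CGFloat) -> CGSize {
    CGSize(width: fit.width * scale, height: fit.height * scale)
}

// A 16:9 stream in a square 1000x1000 pt view on a 2x display.
let fit = aspectFit(
    content: CGSize(width: 1920, height: 1080),
    in: CGRect(x: 0, y: 0, width: 1000, height: 1000))
let pixels = drawablePixelSize(fit: fit, scale: 2)
print(fit, pixels)
```

Keeping the fit in points and converting to pixels only for the drawable is why the diff refreshes `contentsScale` and the drawable size together whenever the backing scale changes.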
@@ -45,25 +45,29 @@ public struct StreamView: UIViewControllerRepresentable {
     private let onCaptureChange: ((Bool) -> Void)?
     private let onFrame: (@Sendable (AccessUnit) -> Void)?
     private let onSessionEnd: (@Sendable () -> Void)?
+    private let presentMeter: LatencyMeter?

     public init(
         connection: PunktfunkConnection,
         captureEnabled: Bool = true,
         onCaptureChange: ((Bool) -> Void)? = nil,
         onFrame: (@Sendable (AccessUnit) -> Void)? = nil,
-        onSessionEnd: (@Sendable () -> Void)? = nil
+        onSessionEnd: (@Sendable () -> Void)? = nil,
+        presentMeter: LatencyMeter? = nil
     ) {
         self.connection = connection
         self.captureEnabled = captureEnabled
         self.onCaptureChange = onCaptureChange
         self.onFrame = onFrame
         self.onSessionEnd = onSessionEnd
+        self.presentMeter = presentMeter
     }

     public func makeUIViewController(context: Context) -> StreamViewController {
         let controller = StreamViewController()
         controller.onCaptureChange = onCaptureChange
         controller.captureEnabled = captureEnabled
+        controller.presentMeter = presentMeter
         controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
         return controller
     }
@@ -71,6 +75,7 @@ public struct StreamView: UIViewControllerRepresentable {
     public func updateUIViewController(_ controller: StreamViewController, context: Context) {
         controller.onCaptureChange = onCaptureChange
         controller.captureEnabled = captureEnabled
+        controller.presentMeter = presentMeter
         if controller.connection !== connection {
             controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
         }
@@ -87,6 +92,12 @@ public final class StreamViewController: UIViewController {
     public private(set) var connection: PunktfunkConnection?
     private var pump: StreamPump?
     private var observers: [NSObjectProtocol] = []
+    /// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a
+    /// CADisplayLink instead of the StreamPump → displayLayer path. nil = stage-1 (default).
+    var presentMeter: LatencyMeter?
+    private var stage2: Stage2Pipeline?
+    private var stage2Link: CADisplayLink?
+    private var metalLayer: CAMetalLayer?
     #if os(iOS)
     private var inputCapture: InputCapture?
     fileprivate var captured = false
@@ -204,11 +215,20 @@ public final class StreamViewController: UIViewController {
         inputCapture = capture
         #endif

-        let pump = StreamPump()
-        pump.start(
-            connection: connection, layer: streamView.displayLayer,
-            onFrame: onFrame, onSessionEnd: onSessionEnd)
-        self.pump = pump
+        // Presenter choice — default stage-1 (the known-good AVSampleBufferDisplayLayer). Stage-2
+        // (`punktfunk.presenter == "stage2"`) takes VTDecompressionSession decode + a
+        // CAMetalLayer/display-link present; falls back here if Metal can't be set up.
+        if UserDefaults.standard.string(forKey: "punktfunk.presenter") == "stage2",
+            let meter = presentMeter,
+            let pipeline = Stage2Pipeline(presentMeter: meter) {
+            startStage2(pipeline, connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
+        } else {
+            let pump = StreamPump()
+            pump.start(
+                connection: connection, layer: streamView.displayLayer,
+                onFrame: onFrame, onSessionEnd: onSessionEnd)
+            self.pump = pump
+        }

         #if os(iOS)
         // GC only delivers while active; everything held is flushed by InputCapture's
@@ -227,7 +247,7 @@ public final class StreamViewController: UIViewController {
         observers.append(NotificationCenter.default.addObserver(
             forName: UIApplication.didBecomeActiveNotification, object: nil, queue: .main
         ) { [weak self] _ in
-            guard let self, self.wasCapturedOnResign, self.captureEnabled, self.pump != nil
+            guard let self, self.wasCapturedOnResign, self.captureEnabled, self.connection != nil
             else { return }
             self.setCaptured(true)
         })
@@ -262,13 +282,74 @@ public final class StreamViewController: UIViewController {
         #endif
         pump?.stop()
         pump = nil
+        teardownStage2()
         connection = nil
     }
+
+    // MARK: - Stage-2 presenter (VTDecompressionSession → CAMetalLayer + display link)
+
+    private func startStage2(
+        _ pipeline: Stage2Pipeline, connection: PunktfunkConnection,
+        onFrame: (@Sendable (AccessUnit) -> Void)?, onSessionEnd: (@Sendable () -> Void)?
+    ) {
+        let metal = pipeline.layer
+        metal.contentsScale = streamView.contentScaleFactor
+        streamView.layer.addSublayer(metal)
+        metalLayer = metal
+        stage2 = pipeline
+        layoutMetalLayer()
+        let link = CADisplayLink(target: self, selector: #selector(stage2Tick(_:)))
+        link.add(to: .main, forMode: .common)
+        stage2Link = link
+        pipeline.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
+    }
+
+    @objc private func stage2Tick(_ link: CADisplayLink) {
+        stage2?.renderTick(
+            targetPresentNs: Stage2Pipeline.realtimeNs(forDisplayLinkTimestamp: link.targetTimestamp))
+    }
+
+    public override func viewDidLayoutSubviews() {
+        super.viewDidLayoutSubviews()
+        layoutMetalLayer()
+    }
+
+    /// Aspect-fit the metal sublayer in the view (the host streams at the client's native mode,
+    /// so this is usually the full bounds). drawableSize is the layer's pixel size; the shader's
+    /// fullscreen triangle scales the decoded texture to fill it.
+    private func layoutMetalLayer() {
+        guard let metalLayer, let connection else { return }
+        let mode = connection.currentMode()
+        let bounds = streamView.bounds
+        let fit: CGRect = (mode.width > 0 && mode.height > 0)
+            ? AVMakeRect(
+                aspectRatio: CGSize(width: Int(mode.width), height: Int(mode.height)),
+                insideRect: bounds)
+            : bounds
+        let scale = streamView.contentScaleFactor
+        CATransaction.begin()
+        CATransaction.setDisableActions(true) // don't animate the resize
+        metalLayer.contentsScale = scale
+        metalLayer.frame = fit
+        CATransaction.commit()
+        stage2?.setDrawableSize(CGSize(width: fit.width * scale, height: fit.height * scale))
+    }
+
+    private func teardownStage2() {
+        stage2Link?.invalidate()
+        stage2Link = nil
+        stage2?.stop()
+        stage2 = nil
+        metalLayer?.removeFromSuperlayer()
+        metalLayer = nil
+    }

     #if os(iOS)
     private func setCaptured(_ on: Bool) {
         if on {
-            guard captureEnabled, !captured, pump != nil else { return }
+            // `connection != nil` (not `pump`) is the session-active gate — the stage-2 presenter
+            // runs without a StreamPump.
+            guard captureEnabled, !captured, connection != nil else { return }
             inputCapture?.setForwarding(true)
             captured = true
         } else {
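Editor's sketch (not part of the commit): `stage2Tick` passes `link.targetTimestamp` through `Stage2Pipeline.realtimeNs(forDisplayLinkTimestamp:)`, whose body is not shown in this diff. One plausible way to do such a projection is to sample both clocks back to back and shift the target by the difference. The function below takes the two samples as parameters so it stays a pure function; in an app they would come from `CACurrentMediaTime()` and `clock_gettime(CLOCK_REALTIME, ...)`. Both function names, and the sign convention chosen for `offsetNs` (client clock minus host clock), are assumptions.

```swift
import Foundation

/// Hypothetical projection: `target` and `mediaNow` are seconds on the clock the
/// display link reports; `realtimeNowNs` is CLOCK_REALTIME sampled at the same
/// instant as `mediaNow`. Returns the target instant in CLOCK_REALTIME nanoseconds.
func realtimeNs(forMediaTime target: Double, mediaNow: Double, realtimeNowNs: Int64) -> Int64 {
    let deltaNs = Int64(((target - mediaNow) * 1_000_000_000).rounded())
    return realtimeNowNs + deltaNs
}

/// Hypothetical capture -> present latency. ASSUMPTION: `offsetNs` is
/// (client clock - host clock), so adding it moves the host pts onto the client clock.
func capturePresentNs(presentNs: Int64, hostPtsNs: UInt64, offsetNs: Int64) -> Int64 {
    presentNs - (Int64(clamping: hostPtsNs) + offsetNs)
}

// The next vsync is 16 ms away; the frame was captured 5 ms after the reference instant.
let presentNs = realtimeNs(forMediaTime: 100.016, mediaNow: 100.0, realtimeNowNs: 1_000_000_000)
let latencyNs = capturePresentNs(presentNs: presentNs, hostPtsNs: 1_005_000_000, offsetNs: 0)
print(presentNs, latencyNs)
```

Stamping against the display link's target time rather than the callback time attributes the wait until the next vsync to the measured latency, which is what a glass-to-glass figure needs.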
@@ -0,0 +1,165 @@
+// Stage-2 presenter, decode half: explicit VideoToolbox decode of the host's HEVC AUs.
+//
+// Stage-1 hands compressed samples to AVSampleBufferDisplayLayer, which decodes AND presents
+// internally with no per-frame callback — so neither decode-completion nor present can be
+// stamped, and frames can't be hand-paced. Here we drive VTDecompressionSession ourselves: the
+// output callback delivers a decoded CVPixelBuffer, we stamp decode-completion, and push it into
+// a ready ring the presenter's display link drains. See docs apple-stage2-presenter.md.
+
+import CoreMedia
+import CoreVideo
+import Foundation
+import VideoToolbox
+
+/// One decoded frame waiting to be presented. Owns a retained `CVPixelBuffer` until shown.
+public struct ReadyFrame: @unchecked Sendable {
+    /// Host capture clock (the AU's pts), in nanoseconds.
+    public let ptsNs: UInt64
+    /// Client `CLOCK_REALTIME` instant decode completed, in nanoseconds.
+    public let decodedNs: Int64
+    /// The decoded image (NV12 biplanar, Metal-compatible).
+    public let pixelBuffer: CVPixelBuffer
+}
+
+/// The C output callback can't capture context, so VideoToolbox hands it the refcon we set at
+/// session creation — a pointer back to the owning `VideoDecoder`.
+private let decoderOutputCallback: VTDecompressionOutputCallback = {
+    refcon, _, status, _, imageBuffer, pts, _ in
+    guard let refcon else { return }
+    Unmanaged<VideoDecoder>.fromOpaque(refcon)
+        .takeUnretainedValue()
+        .handleDecoded(status: status, imageBuffer: imageBuffer, pts: pts)
+}
+
+/// Owns a `VTDecompressionSession` rebuilt whenever the format description changes (every IDR /
+/// mode change, the same trigger stage-1 uses). Thread-safe: `decode` runs on the pump thread,
+/// the output callback on a VT-managed thread; the only shared mutable state is the session +
+/// format, guarded by `lock`. `@unchecked Sendable` — the lock enforces the contract.
+public final class VideoDecoder: @unchecked Sendable {
+    private let lock = NSLock()
+    private var session: VTDecompressionSession?
+    private var format: CMVideoFormatDescription?
+
+    /// Called on the VT thread for each successfully decoded frame — stamp + enqueue, don't block.
+    private let onDecoded: @Sendable (ReadyFrame) -> Void
+    /// Called on the VT thread when a frame fails to decode (bad data / decoder reset) so the
+    /// pump can re-gate on the next IDR.
+    private let onDecodeError: @Sendable (OSStatus) -> Void
+
+    public init(
+        onDecoded: @escaping @Sendable (ReadyFrame) -> Void,
+        onDecodeError: @escaping @Sendable (OSStatus) -> Void = { _ in }
+    ) {
+        self.onDecoded = onDecoded
+        self.onDecodeError = onDecodeError
+    }
+
+    deinit { teardown() }
+
+    /// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The
+    /// caller resolves `format` from the IDR exactly as stage-1 does (`AnnexB.formatDescription`).
+    /// Returns false if the session couldn't be created or the frame couldn't be submitted.
+    @discardableResult
+    public func decode(au: AccessUnit, format newFormat: CMVideoFormatDescription) -> Bool {
+        lock.lock()
+        let needsNew: Bool = {
+            guard let session, let format else { return true }
+            if CMFormatDescriptionEqual(format, otherFormatDescription: newFormat) { return false }
+            // A new desc that the live session can still accept (rare for HEVC) avoids a rebuild.
+            return !VTDecompressionSessionCanAcceptFormatDescription(session, formatDescription: newFormat)
+        }()
+        if needsNew, !createSessionLocked(format: newFormat) {
+            lock.unlock()
+            return false
+        }
+        // Submit WHILE holding the lock so a concurrent reset()/teardown (main thread) can't
+        // invalidate the session between here and DecodeFrame. The VT output callback takes the
+        // ring lock, not this one, so there's no re-entrancy. DecodeFrame is async — non-blocking.
+        guard let session,
+            let sample = AnnexB.sampleBuffer(au: au, format: newFormat)
+        else { lock.unlock(); return false }
+        var infoOut = VTDecodeInfoFlags()
+        let status = VTDecompressionSessionDecodeFrame(
+            session,
+            sampleBuffer: sample,
+            flags: [._EnableAsynchronousDecompression],
+            frameRefcon: nil,
+            infoFlagsOut: &infoOut)
+        lock.unlock()
+        if status != noErr {
+            onDecodeError(status)
+            return false
+        }
+        return true
+    }
+
+    /// Drop the session — the next `decode` rebuilds it. Used on stop and to recover from a
+    /// wedged decoder (re-gates on the next in-band parameter sets, like stage-1's flush).
+    public func reset() {
+        lock.lock()
+        teardownLocked()
+        lock.unlock()
+    }
+
+    private func teardown() {
+        lock.lock()
+        teardownLocked()
+        lock.unlock()
+    }
+
+    private func teardownLocked() {
+        if let session {
+            VTDecompressionSessionWaitForAsynchronousFrames(session)
+            VTDecompressionSessionInvalidate(session)
+        }
+        session = nil
+        format = nil
+    }
+
+    /// `lock` held. Replace the session with one for `newFormat`. NV12 video-range, Metal-
+    /// compatible output (10-bit/HDR is a later tie-in — see the plan).
+    private func createSessionLocked(format newFormat: CMVideoFormatDescription) -> Bool {
+        if let session {
+            VTDecompressionSessionWaitForAsynchronousFrames(session)
+            VTDecompressionSessionInvalidate(session)
+        }
+        session = nil
+        format = nil
+
+        let imageAttrs: [CFString: Any] = [
+            kCVPixelBufferMetalCompatibilityKey: true,
+            kCVPixelBufferPixelFormatTypeKey:
+                kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
+        ]
+        var callback = VTDecompressionOutputCallbackRecord(
+            decompressionOutputCallback: decoderOutputCallback,
+            decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
+        var newSession: VTDecompressionSession?
+        let status = VTDecompressionSessionCreate(
+            allocator: kCFAllocatorDefault,
+            formatDescription: newFormat,
+            decoderSpecification: nil, // hardware by default
+            imageBufferAttributes: imageAttrs as CFDictionary,
+            outputCallback: &callback,
+            decompressionSessionOut: &newSession)
+        guard status == noErr, let newSession else { return false }
+        session = newSession
+        format = newFormat
+        return true
+    }
+
+    /// VT thread. Stamp decode-completion and enqueue, or report the error.
+    fileprivate func handleDecoded(status: OSStatus, imageBuffer: CVImageBuffer?, pts: CMTime) {
+        guard status == noErr, let imageBuffer else {
+            onDecodeError(status)
+            return
+        }
+        var ts = timespec()
+        clock_gettime(CLOCK_REALTIME, &ts)
+        let decodedNs = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
+        // pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
+        let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
+        let ptsNs = p.value > 0 ? UInt64(p.value) : 0
+        onDecoded(ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer))
+    }
+}
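Editor's sketch (not part of the commit): the header comment above mentions a ready ring that the presenter's display link drains, and the commit message calls it a newest-ready 1-slot ring. The ring itself is not in this diff, so the following is only a minimal illustration of that idea under a lock. The type name `LatestSlot` and its methods are hypothetical.

```swift
import Foundation

/// Hypothetical 1-slot "newest wins" mailbox: the producer (decode callback)
/// overwrites whatever is waiting, and the consumer (display-link tick) takes
/// at most one value per tick, so a slow consumer never builds up a queue.
final class LatestSlot<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value?

    /// Producer side. Returns true if an older, never-taken value was replaced.
    @discardableResult
    func publish(_ newValue: Value) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        let replaced = value != nil
        value = newValue
        return replaced
    }

    /// Consumer side. Returns the newest value and empties the slot.
    func take() -> Value? {
        lock.lock()
        defer { lock.unlock() }
        let current = value
        value = nil
        return current
    }
}

// Two frames arrive between two vsyncs: only the newer one is presented.
let slot = LatestSlot<Int>()
slot.publish(1)
let droppedOlder = slot.publish(2)
let first = slot.take()
let second = slot.take()
print(droppedOlder, first as Any, second as Any)
```

Dropping the older frame trades smoothness for latency, which suits a low-latency stream; a pacing policy (listed as later work in the commit message) could choose differently.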
@@ -9,6 +9,13 @@ import VideoToolbox
 import XCTest
 @testable import PunktfunkKit

+/// Sendable holder for the values the (background-thread) decode callback writes.
+private final class FrameBox: @unchecked Sendable {
+    let lock = NSLock()
+    var frame: ReadyFrame?
+    var error: OSStatus?
+}
+
 final class VideoToolboxRoundTripTests: XCTestCase {
     private let width = 320
     private let height = 240
@@ -59,6 +66,43 @@ final class VideoToolboxRoundTripTests: XCTestCase {
         XCTAssertEqual(CVPixelBufferGetHeight(pixels), height)
     }

+    /// Stage-2 decode half: the same known IDR through `VideoDecoder` — assert its async output
+    /// callback fires with a CVPixelBuffer of the right dimensions, the pts round-trips, and
+    /// decode-completion is stamped.
+    func testVideoDecoderAsyncCallbackDeliversPixels() throws {
+        let (formatDesc, avccSample) = try encodeOneHEVCKeyframe()
+        let annexB = try annexBAU(formatDesc: formatDesc, avccSample: avccSample)
+        let format = try XCTUnwrap(AnnexB.formatDescription(fromIDR: annexB))
+        let au = AccessUnit(data: annexB, ptsNs: 42_000_000, frameIndex: 0, flags: 0)
+
+        let box = FrameBox()
+        let done = DispatchSemaphore(value: 0)
+        let decoder = VideoDecoder(
+            onDecoded: { frame in
+                box.lock.lock(); box.frame = frame; box.lock.unlock()
+                done.signal()
+            },
+            onDecodeError: { status in
+                box.lock.lock(); box.error = status; box.lock.unlock()
+                done.signal()
+            })
+
+        XCTAssertTrue(decoder.decode(au: au, format: format), "frame submit should succeed")
+        XCTAssertEqual(done.wait(timeout: .now() + 10), .success, "the decode callback must fire")
+        decoder.reset()
+
+        box.lock.lock()
+        let frame = box.frame
+        let error = box.error
+        box.lock.unlock()
+        XCTAssertNil(error.map { "decode error \($0)" })
+        let ready = try XCTUnwrap(frame, "the async output callback must deliver a ReadyFrame")
+        XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), width)
+        XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), height)
+        XCTAssertEqual(ready.ptsNs, 42_000_000, "pts round-trips through the decoder")
+        XCTAssertGreaterThan(ready.decodedNs, 0, "decode-completion is stamped")
+    }
+
     // MARK: - encode helpers

     /// One forced-IDR HEVC frame; returns its format description and raw AVCC sample bytes.
@@ -14,7 +14,7 @@ and the design in the [Implementation Plan](/docs/implementation-plan); this pag
 | **M1** — `punktfunk-core` + C ABI (protocol · FEC · crypto) | ✅ complete & hardened |
 | **M2** — GameStream host (Moonlight-compatible) | ✅ working end-to-end; HDR/surround-audio polish open |
 | **M3** — `punktfunk/1` native protocol (QUIC control + UDP data) | ✅ full session planes, validated live |
-| **M4** — native client decode + present (Apple first) | 🟡 stage 1 done (first light); stage-2 presenter next |
+| **M4** — native client decode + present (Apple first) | 🟡 stage 1 live; stage-2 presenter built + decode-tested (opt-in, present needs live validation) |

 ## Live on the boxes

@@ -75,7 +75,7 @@ All three appliances advertise over mDNS (`_punktfunk._udp`) and require PIN pai

 See the [Roadmap](/docs/roadmap) for the ordered list. Near-term:
 - **True glass-to-glass**: Apple client present-stamp (decode→present) + host render→capture term.
-- **Apple stage-2 presenter** (`VTDecompressionSession` → `CAMetalLayer`).
+- **Apple stage-2 presenter** (`VTDecompressionSession` → `CAMetalLayer`) — built + decode-unit-tested + live-validated on glass behind the `punktfunk.presenter` flag (capture→present ~11 ms p50); make it the default after a few resolution/HDR checks.
 - **Mandatory PIN pairing + delegated pairing approval** (an already-paired device approves a new one).
 - **gamescope multi-user isolation** — per-session input/audio so concurrent sessions are independent
   desktops (the shared-desktop multi-view case landed).