From 7b10714b6239598432a2347f9f35b9471e0b5e6c Mon Sep 17 00:00:00 2001 From: enricobuehler Date: Fri, 12 Jun 2026 15:28:23 +0200 Subject: [PATCH] =?UTF-8?q?feat(apple):=20stage-2=20presenter=20=E2=80=94?= =?UTF-8?q?=20explicit=20decode=20+=20Metal=20present=20+=20glass-to-glass?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Opt-in (Settings -> Presenter; `punktfunk.presenter`, default stage-1). Stage-1's AVSampleBufferDisplayLayer decodes AND presents internally with no per-frame callback, so neither decode nor present can be stamped or hand-paced. Stage-2 takes explicit control: - VideoDecoder: VTDecompressionSession, async output callback stamps decode-completion, session rebuilt on every IDR / format change. Unit-tested (testVideoDecoderAsyncCallbackDeliversPixels). - MetalVideoPresenter: CAMetalLayer + CVMetalTextureCache + a runtime-compiled BT.709 limited-range NV12->RGB shader, present at the next vsync. The CVMetalTextures + pixel buffer are held until the GPU completes. - Stage2Pipeline: pump thread -> decoder -> newest-ready 1-slot ring; the hosting view's display link drains it once per vsync and stamps capture->present (the display-link target time projected into CLOCK_REALTIME). - LatencyMeter gains record(ptsNs:atNs:offsetNs:); the HUD shows a capture->present (glass-to-glass, modulo host render->capture) line, skew-corrected via clockOffsetNs. Measured live ~11 ms p50 vs ~2.2 ms capture->client. - StreamView / StreamViewIOS host the CAMetalLayer as a sublayer + a CADisplayLink (NSView.displayLink on macOS) when stage-2; input capture + HUD unchanged. The session-active gates switch from `pump != nil` to `connection != nil` so capture engages without a StreamPump. Validated: builds macOS/iOS/tvOS; the decode half is unit-tested; the Metal present is live-validated on glass (correct image + the capture->present number). Colorspace is BT.709 SDR for now; 10-bit/HDR + a pacing policy are later. Plan: docs-site/content/docs/apple-stage2-presenter.md. Co-Authored-By: Claude Opus 4.8 (1M context) --- clients/apple/README.md | 19 +- .../Sources/PunktfunkClient/ContentView.swift | 10 +- .../PunktfunkClient/SessionModel.swift | 17 ++ .../PunktfunkClient/SettingsView.swift | 20 +++ .../Sources/PunktfunkKit/LatencyMeter.swift | 13 +- .../PunktfunkKit/MetalVideoPresenter.swift | 147 ++++++++++++++++ .../Sources/PunktfunkKit/Stage2Pipeline.swift | 133 ++++++++++++++ .../Sources/PunktfunkKit/StreamView.swift | 98 ++++++++++- .../Sources/PunktfunkKit/StreamViewIOS.swift | 97 +++++++++- .../Sources/PunktfunkKit/VideoDecoder.swift | 165 ++++++++++++++++++ .../VideoToolboxRoundTripTests.swift | 44 +++++ docs-site/content/docs/status.md | 4 +- 12 files changed, 737 insertions(+), 30 deletions(-) create mode 100644 clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift create mode 100644 clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift create mode 100644 clients/apple/Sources/PunktfunkKit/VideoDecoder.swift diff --git a/clients/apple/README.md b/clients/apple/README.md index 03ea622..7b37214 100644 --- a/clients/apple/README.md +++ b/clients/apple/README.md @@ -74,7 +74,8 @@ What's here, all compiled and tested on macOS (Xcode 26.5 / Swift 6.3): received it, **skew-corrected** across machines via `PunktfunkConnection.clockOffsetNs` (the connect-time wall-clock handshake, `punktfunk_connection_clock_offset_ns`). It excludes the layer's decode+present (stage-1 `AVSampleBufferDisplayLayer` has no per-frame present callback); - true decode→present awaits the stage-2 presenter. Settings also picks the HOST + the opt-in **stage-2 presenter** (Settings → Presenter) adds a **capture→present** + (glass-to-glass) line via explicit decode + a Metal/display-link present. Settings also picks the HOST compositor (KWin/wlroots/Mutter/gamescope, default automatic — the host honors it only if that backend is available there) and has a **Controllers** section: every detected controller (capability glyphs, battery, "In use" badge), which one to forward @@ -166,13 +167,15 @@ signing, bundle id `io.unom.punktfunk`. Notes: 3. **Decode flow**: the host opens every stream with an IDR carrying VPS/SPS/PPS in-band and recovery keyframes re-send them — "refresh the format description on every IDR" (what `StreamView` does) is sufficient; there is no out-of-band extradata, ever. -4. **Stage 2 (next)**: explicit `VTDecompressionSession` + `CAMetalLayer` for frame-pacing - control (ProMotion/120 Hz) and true decode→present / glass-to-glass measurement. The - cross-machine clock offset is **already wired** — `PunktfunkConnection.clockOffsetNs` (from - the connect-time skew handshake); add it to a `CLOCK_REALTIME` present instant and subtract - the AU `pts_ns`. **Full pickup-ready implementation plan** (decode + present + measurement - wiring, integration points, gotchas): `docs-site/content/docs/apple-stage2-presenter.md` - (rendered in the docs site under "Apple Stage-2 Presenter"). +4. **Stage 2 — built, opt-in (`punktfunk.presenter == "stage2"`, default stage 1).** Explicit + `VTDecompressionSession` decode (`VideoDecoder`) → a `CAMetalLayer` + display-link present + (`MetalVideoPresenter`/`Stage2Pipeline`), hosted as a sublayer by the same `StreamView`s with + input capture + HUD unchanged. It adds a **capture→present** (glass-to-glass, modulo the host + render→capture term) HUD line, skew-corrected via `PunktfunkConnection.clockOffsetNs`. The + decode half is unit-tested (`testVideoDecoderAsyncCallbackDeliversPixels`); the Metal present + is display-bound — **validate live** (flip the Settings "Presenter" picker, watch the HUD + number and that the image looks right) before making it the default. 10-bit/HDR + a smoothing + pacer are later. Plan: `docs-site/content/docs/apple-stage2-presenter.md`. 5. **Audio — wired, both directions.** Playback: `SessionAudio` drains `nextAudio()` on its own thread, decodes through CoreAudio's built-in Opus codec (`OpusCodec.swift` — kAudioFormatOpus, no bundled libopus; round-trip unit-tested) into a priming diff --git a/clients/apple/Sources/PunktfunkClient/ContentView.swift b/clients/apple/Sources/PunktfunkClient/ContentView.swift index 1105604..4b79475 100644 --- a/clients/apple/Sources/PunktfunkClient/ContentView.swift +++ b/clients/apple/Sources/PunktfunkClient/ContentView.swift @@ -610,7 +610,8 @@ struct ContentView: View { }, onSessionEnd: { [weak model] in Task { @MainActor in model?.sessionEnded() } - } + }, + presentMeter: model.presentLatency ) .overlay(alignment: .topTrailing) { if captureEnabled { hud(conn) } @@ -635,6 +636,13 @@ struct ContentView: View { .font(.system(.caption2, design: .monospaced)) .foregroundStyle(.secondary) } + if model.presentLatencyValid { + // Capture→present (glass-to-glass, modulo host render→capture) — stage-2 presenter + // only; stage-1's layer presents internally with no per-frame stamp. + Text("capture→present \(model.presentLatencyP50Ms, specifier: "%.1f")/\(model.presentLatencyP95Ms, specifier: "%.1f") ms p50/p95\(model.presentLatencySkewCorrected ? "" : " (same-host)")") + .font(.system(.caption2, design: .monospaced)) + .foregroundStyle(.secondary) + } // While captured the cursor is hidden+frozen, so the button is keyboard-only // (⌘⎋ or Cmd+Tab release the cursor; released, it's clickable again). #if os(macOS) diff --git a/clients/apple/Sources/PunktfunkClient/SessionModel.swift b/clients/apple/Sources/PunktfunkClient/SessionModel.swift index d980bec..a976e8e 100644 --- a/clients/apple/Sources/PunktfunkClient/SessionModel.swift +++ b/clients/apple/Sources/PunktfunkClient/SessionModel.swift @@ -61,12 +61,21 @@ final class SessionModel: ObservableObject { @Published var latencyP95Ms = 0.0 @Published var latencyValid = false @Published var latencySkewCorrected = false + /// Capture→present (glass-to-glass, modulo the host render→capture term) — only the stage-2 + /// presenter can stamp this (it owns decode + a CAMetalLayer/display-link present). Stays + /// invalid under stage-1, where the layer presents internally with no per-frame callback. + @Published var presentLatencyP50Ms = 0.0 + @Published var presentLatencyP95Ms = 0.0 + @Published var presentLatencyValid = false + @Published var presentLatencySkewCorrected = false /// Mirrors StreamView's capture state (it owns the input capture; this drives the /// HUD's "click to capture" / "⌘⎋ releases" hint). @Published var mouseCaptured = false let meter = FrameMeter() let latency = LatencyMeter() + /// Fed by the stage-2 presenter's display link (capture→present). Passed to StreamView. + let presentLatency = LatencyMeter() private var statsTimer: Timer? private var audio: SessionAudio? private var gamepadCapture: GamepadCapture? @@ -230,6 +239,14 @@ final class SessionModel: ObservableObject { } else { self.latencyValid = false } + if let p = self.presentLatency.drain() { + self.presentLatencyP50Ms = p.p50Ms + self.presentLatencyP95Ms = p.p95Ms + self.presentLatencySkewCorrected = p.skewCorrected + self.presentLatencyValid = true + } else { + self.presentLatencyValid = false + } } } // .common so the HUD keeps updating during window drags / menu tracking. diff --git a/clients/apple/Sources/PunktfunkClient/SettingsView.swift b/clients/apple/Sources/PunktfunkClient/SettingsView.swift index 6421708..2343816 100644 --- a/clients/apple/Sources/PunktfunkClient/SettingsView.swift +++ b/clients/apple/Sources/PunktfunkClient/SettingsView.swift @@ -17,6 +17,7 @@ struct SettingsView: View { @AppStorage("punktfunk.compositor") private var compositor = 0 @AppStorage("punktfunk.gamepadType") private var gamepadType = 0 @AppStorage("punktfunk.bitrateKbps") private var bitrateKbps = 0 + @AppStorage("punktfunk.presenter") private var presenter = "stage1" @AppStorage("punktfunk.micEnabled") private var micEnabled = true @ObservedObject private var gamepads = GamepadManager.shared #if os(macOS) @@ -88,6 +89,10 @@ struct SettingsView: View { } TVSelectionRow( title: "Compositor", options: compositors, selection: $compositor) + TVSelectionRow( + title: "Presenter", + options: [("Stage 1 (default)", "stage1"), ("Stage 2 (experimental)", "stage2")], + selection: $presenter) Text("The host creates a virtual output at exactly this mode — native " + "resolution, no scaling. \(Self.bitrateFooter) A specific compositor " + "is honored only if available on the host.") @@ -366,6 +371,21 @@ struct SettingsView: View { .font(.caption) .foregroundStyle(.secondary) } + Section { + Picker("Presenter", selection: $presenter) { + Text("Stage 1 (default)").tag("stage1") + Text("Stage 2 (experimental)").tag("stage2") + } + } header: { + Text("Video presenter") + } footer: { + Text("Stage 1 feeds compressed video to the system display layer (known-good). " + + "Stage 2 decodes explicitly and presents through Metal with a display " + + "link — it adds a capture→present (glass-to-glass) latency line in the HUD " + + "and shortens the present tail. Applies from the next session.") + .font(.caption) + .foregroundStyle(.secondary) + } Section { if gamepads.controllers.isEmpty { Text("No controllers detected") diff --git a/clients/apple/Sources/PunktfunkKit/LatencyMeter.swift b/clients/apple/Sources/PunktfunkKit/LatencyMeter.swift index 1bbf371..7c9b6da 100644 --- a/clients/apple/Sources/PunktfunkKit/LatencyMeter.swift +++ b/clients/apple/Sources/PunktfunkKit/LatencyMeter.swift @@ -25,13 +25,20 @@ public final class LatencyMeter: @unchecked Sendable { public init() {} - /// Record one frame at receipt. `ptsNs` is the host capture clock (the AU's pts); `offsetNs` is - /// the host-client clock offset from the skew handshake (0 = uncorrected / old host). + /// Record one frame at receipt (now). `ptsNs` is the host capture clock (the AU's pts); + /// `offsetNs` is the host-client clock offset from the skew handshake (0 = uncorrected). public func record(ptsNs: UInt64, offsetNs: Int64) { var ts = timespec() clock_gettime(CLOCK_REALTIME, &ts) let nowNs = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec) - let latNs = nowNs &+ offsetNs &- Int64(bitPattern: ptsNs) + record(ptsNs: ptsNs, atNs: nowNs, offsetNs: offsetNs) + } + + /// Record one frame whose latency is `atNs + offsetNs - ptsNs` — an EXPLICIT client instant + /// rather than now. The stage-2 presenter uses this to stamp capture→present at the display + /// link's target present time (not the moment the present call ran). All in `CLOCK_REALTIME`. + public func record(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) { + let latNs = atNs &+ offsetNs &- Int64(bitPattern: ptsNs) // Drop absurd values (a clock step, a wildly wrong offset, or garbage pts). guard latNs > 0, latNs < 10_000_000_000 else { return } lock.lock() diff --git a/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift b/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift new file mode 100644 index 0000000..399e1ff --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/MetalVideoPresenter.swift @@ -0,0 +1,147 @@ +// Stage-2 presenter, present half: draw a decoded NV12 CVPixelBuffer into a CAMetalLayer +// drawable with a BT.709 YUV→RGB shader. The display link (owned by the hosting view) drives +// `render` once per vsync with the target present time, so a present can finally be stamped and +// the present tail hand-paced. See docs apple-stage2-presenter.md. +// +// Main-thread only: created during view setup, `render` called from the view's CADisplayLink +// (which fires on the main runloop). The Metal objects + texture cache are touched only here. + +#if canImport(Metal) && canImport(QuartzCore) +import CoreVideo +import Metal +import QuartzCore + +/// Runtime-compiled (no metallib build step needed in SwiftPM): a fullscreen triangle and a +/// BT.709 limited-range NV12→RGB fragment shader. uv.y is flipped (1 - p.y) so the top-left- +/// origin texture presents upright (NDC y is up), not upside down. (Colorspace is BT.709 SDR +/// for now — matches the host; 10-bit/HDR + other matrices are a later tie-in.) +private let shaderSource = """ +#include +using namespace metal; + +struct VOut { float4 pos [[position]]; float2 uv; }; + +vertex VOut pf_vtx(uint vid [[vertex_id]]) { + float2 p = float2(float((vid << 1) & 2), float(vid & 2)); + VOut o; + o.pos = float4(p * 2.0 - 1.0, 0.0, 1.0); + o.uv = float2(p.x, 1.0 - p.y); + return o; +} + +fragment float4 pf_frag(VOut in [[stage_in]], + texture2d lumaTex [[texture(0)]], + texture2d chromaTex [[texture(1)]]) { + constexpr sampler s(filter::linear, address::clamp_to_edge); + float y = lumaTex.sample(s, in.uv).r; + float2 c = chromaTex.sample(s, in.uv).rg; + // BT.709, 8-bit limited (video) range → full-range RGB. + y = (y - 16.0/255.0) * (255.0/219.0); + float u = (c.x - 128.0/255.0) * (255.0/224.0); + float v = (c.y - 128.0/255.0) * (255.0/224.0); + float r = y + 1.5748 * v; + float g = y - 0.1873 * u - 0.4681 * v; + float b = y + 1.8556 * u; + return float4(saturate(float3(r, g, b)), 1.0); +} +""" + +public final class MetalVideoPresenter { + /// The layer the hosting view installs (as a sublayer) and sizes to its bounds. + public let layer: CAMetalLayer + + private let device: MTLDevice + private let queue: MTLCommandQueue + private let pipeline: MTLRenderPipelineState + private var textureCache: CVMetalTextureCache? + + /// nil if Metal is unavailable (no GPU / a headless CI) — the caller falls back to stage-1. + public init?() { + guard let device = MTLCreateSystemDefaultDevice(), + let queue = device.makeCommandQueue() + else { return nil } + self.device = device + self.queue = queue + do { + let library = try device.makeLibrary(source: shaderSource, options: nil) + let desc = MTLRenderPipelineDescriptor() + desc.vertexFunction = library.makeFunction(name: "pf_vtx") + desc.fragmentFunction = library.makeFunction(name: "pf_frag") + desc.colorAttachments[0].pixelFormat = .bgra8Unorm + pipeline = try device.makeRenderPipelineState(descriptor: desc) + } catch { + return nil + } + CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache) + guard textureCache != nil else { return nil } + + let layer = CAMetalLayer() + layer.device = device + layer.pixelFormat = .bgra8Unorm + layer.framebufferOnly = true + layer.isOpaque = true + self.layer = layer + } + + /// Track the stream mode (the host can Reconfigure mid-stream). Size is in pixels. + public func setDrawableSize(_ size: CGSize) { + guard size.width > 0, size.height > 0 else { return } + if layer.drawableSize != size { layer.drawableSize = size } + } + + /// Draw one decoded frame to the next drawable and present it. Returns true on success; + /// false when there's no drawable yet, a texture couldn't be made, or Metal errored — the + /// caller then doesn't stamp a present for this frame. + @discardableResult + public func render(_ pixelBuffer: CVPixelBuffer) -> Bool { + guard let textureCache, + let luma = makeTexture(pixelBuffer, plane: 0, format: .r8Unorm, cache: textureCache), + let chroma = makeTexture(pixelBuffer, plane: 1, format: .rg8Unorm, cache: textureCache) + else { return false } + + // The hosting view owns drawableSize (aspect-fit to its bounds); skip until it's laid + // out. The fullscreen triangle scales the decoded texture to fill the drawable. + guard layer.drawableSize.width > 0, layer.drawableSize.height > 0, + let drawable = layer.nextDrawable(), + let commandBuffer = queue.makeCommandBuffer() + else { return false } + + let pass = MTLRenderPassDescriptor() + pass.colorAttachments[0].texture = drawable.texture + pass.colorAttachments[0].loadAction = .clear + pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1) + pass.colorAttachments[0].storeAction = .store + guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else { + return false + } + encoder.setRenderPipelineState(pipeline) + encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0) + encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1) + encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3) + encoder.endEncoding() + commandBuffer.present(drawable) // present at the next vsync — lowest latency + // Hold the CVMetalTextures + the source pixel buffer (its IOSurface) alive until the GPU + // finishes sampling — releasing them at scope exit could free the backing mid-read. + commandBuffer.addCompletedHandler { _ in _ = (luma, chroma, pixelBuffer) } + commandBuffer.commit() + return true + } + + /// Returns the CVMetalTexture (not just its MTLTexture) so the caller can keep it alive past + /// the draw — the MTLTexture is only valid while its CVMetalTexture is retained. + private func makeTexture( + _ pixelBuffer: CVPixelBuffer, plane: Int, format: MTLPixelFormat, + cache: CVMetalTextureCache + ) -> CVMetalTexture? { + let w = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane) + let h = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane) + var cvTexture: CVMetalTexture? + let status = CVMetalTextureCacheCreateTextureFromImage( + kCFAllocatorDefault, cache, pixelBuffer, nil, format, w, h, plane, &cvTexture) + guard status == kCVReturnSuccess, let cvTexture, + CVMetalTextureGetTexture(cvTexture) != nil + else { return nil } + return cvTexture + } +} +#endif diff --git a/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift b/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift new file mode 100644 index 0000000..17a9476 --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/Stage2Pipeline.swift @@ -0,0 +1,133 @@ +// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async +// output drops the newest decoded frame into a 1-slot ring; the hosting view's display link +// calls `renderTick` once per vsync to draw + present the newest ready frame and stamp +// capture→present. Mirrors StreamPump's lifecycle (one per start; cancel is permanent). +// +// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` +// + `setDrawableSize` + `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). +// Only the ring + decoder cross threads and both are internally locked. + +#if canImport(Metal) && canImport(QuartzCore) +import AVFoundation +import Foundation +import QuartzCore + +/// Newest-ready 1-slot ring: the decoder overwrites (drops the older undisplayed frame — lowest +/// latency, no smoothing buffer), the display link takes-and-clears. Sendable; lock-guarded. +private final class ReadyRing: @unchecked Sendable { + private let lock = NSLock() + private var frame: ReadyFrame? + func submit(_ f: ReadyFrame) { + lock.lock(); frame = f; lock.unlock() + } + func take() -> ReadyFrame? { + lock.lock(); defer { lock.unlock() } + let f = frame; frame = nil; return f + } +} + +/// Cancellation handle owned by one pump thread (same pattern as StreamPump). +private final class PumpToken: @unchecked Sendable { + private let lock = NSLock() + private var live = true + var isLive: Bool { lock.lock(); defer { lock.unlock() }; return live } + func cancel() { lock.lock(); live = false; lock.unlock() } +} + +public final class Stage2Pipeline { + private let ring = ReadyRing() + private let presenter: MetalVideoPresenter + private let decoder: VideoDecoder + private let presentMeter: LatencyMeter + private var token = PumpToken() + private var offsetNs: Int64 = 0 + + /// The Metal layer the hosting view installs + sizes. nil-init fails when Metal is + /// unavailable so the caller can fall back to stage-1. + public var layer: CAMetalLayer { presenter.layer } + + /// `presentMeter` records capture→present (the glass-to-glass term). Returns nil if Metal + /// can't be set up (headless / no GPU) — caller falls back to the stage-1 presenter. + public init?(presentMeter: LatencyMeter) { + guard let presenter = MetalVideoPresenter() else { return nil } + self.presenter = presenter + self.presentMeter = presentMeter + let ring = ring + self.decoder = VideoDecoder( + onDecoded: { ring.submit($0) }, + onDecodeError: { _ in /* the pump resets the session via reset() on the next IDR */ }) + } + + /// Start pulling AUs into the decoder. `onFrame` fires per AU at receipt (capture→client + /// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) + /// makes the present stamp cross-machine valid. + public func start( + connection: PunktfunkConnection, + onFrame: (@Sendable (AccessUnit) -> Void)?, + onSessionEnd: (@Sendable () -> Void)? + ) { + offsetNs = connection.clockOffsetNs + token = PumpToken() // fresh token per start — cancel is permanent (like StreamPump) + let token = token + let decoder = decoder + let thread = Thread { + var format: CMVideoFormatDescription? + while token.isLive { + do { + guard let au = try connection.nextAU(timeoutMs: 100) else { continue } + onFrame?(au) + if let f = AnnexB.formatDescription(fromIDR: au.data) { + format = f // refreshed on every IDR (mode changes included) + } + guard let f = format, token.isLive else { continue } + if !decoder.decode(au: au, format: f) { + // Submit/decoder error: drop the session and re-gate on the next IDR's + // in-band parameter sets (a delta frame can't recover) — stage-1's policy. + decoder.reset() + } + } catch { + if token.isLive { onSessionEnd?() } + break // session closed + } + } + } + thread.name = "punktfunk-stage2-pump" + thread.qualityOfService = .userInteractive + thread.start() + } + + /// MAIN thread, once per vsync. Present the newest ready frame (if any) and stamp + /// capture→present at `targetPresentNs` — the display link's target present instant, already + /// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`). + public func renderTick(targetPresentNs: Int64) { + guard let frame = ring.take() else { return } + guard presenter.render(frame.pixelBuffer) else { return } + presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs) + } + + /// MAIN thread. Keep the drawable matched to the negotiated mode (host can Reconfigure). + public func setDrawableSize(_ size: CGSize) { + presenter.setDrawableSize(size) + } + + /// Stop the pump (≤ one poll timeout) and drop the decode session. Does not close the + /// connection. A restart needs a fresh Stage2Pipeline (cancel is permanent). + public func stop() { + token.cancel() + decoder.reset() + } + + deinit { token.cancel() } + + /// Convert a `CADisplayLink.targetTimestamp` (CACurrentMediaTime basis) to a `CLOCK_REALTIME` + /// nanosecond instant — the present clock the AU pts + skew offset live in. Projects to the + /// target present time (when the frame is actually on glass), not the moment we drew. + public static func realtimeNs(forDisplayLinkTimestamp t: CFTimeInterval) -> Int64 { + let caNow = CACurrentMediaTime() + var ts = timespec() + clock_gettime(CLOCK_REALTIME, &ts) + let realtimeNow = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec) + return realtimeNow + Int64((t - caNow) * 1_000_000_000) + } +} +#endif diff --git a/clients/apple/Sources/PunktfunkKit/StreamView.swift b/clients/apple/Sources/PunktfunkKit/StreamView.swift index a0ad446..27b5236 100644 --- a/clients/apple/Sources/PunktfunkKit/StreamView.swift +++ b/clients/apple/Sources/PunktfunkKit/StreamView.swift @@ -69,30 +69,35 @@ public struct StreamView: NSViewRepresentable { private let onCaptureChange: ((Bool) -> Void)? private let onFrame: (@Sendable (AccessUnit) -> Void)? private let onSessionEnd: (@Sendable () -> Void)? + private let presentMeter: LatencyMeter? /// `onFrame`/`onSessionEnd` fire on the pump thread — hop to the main actor for UI. /// `captureEnabled: false` disables input capture entirely while UI (e.g. a trust /// prompt) is layered over the stream; flipping it to true auto-engages capture /// once. `onCaptureChange` (main thread) reports engage/release — drive the HUD's - /// "click to capture" / "⌘⎋ releases" hint with it. + /// "click to capture" / "⌘⎋ releases" hint with it. `presentMeter` records capture→present + /// when the stage-2 presenter is active (`punktfunk.presenter == "stage2"`). public init( connection: PunktfunkConnection, captureEnabled: Bool = true, onCaptureChange: ((Bool) -> Void)? = nil, onFrame: (@Sendable (AccessUnit) -> Void)? = nil, - onSessionEnd: (@Sendable () -> Void)? = nil + onSessionEnd: (@Sendable () -> Void)? = nil, + presentMeter: LatencyMeter? = nil ) { self.connection = connection self.captureEnabled = captureEnabled self.onCaptureChange = onCaptureChange self.onFrame = onFrame self.onSessionEnd = onSessionEnd + self.presentMeter = presentMeter } public func makeNSView(context: Context) -> StreamLayerView { let view = StreamLayerView() view.onCaptureChange = onCaptureChange view.captureEnabled = captureEnabled + view.presentMeter = presentMeter view.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) return view } @@ -100,6 +105,7 @@ public struct StreamView: NSViewRepresentable { public func updateNSView(_ view: StreamLayerView, context: Context) { view.onCaptureChange = onCaptureChange view.captureEnabled = captureEnabled + view.presentMeter = presentMeter // SwiftUI reuses the NSView across state changes — repoint the pump only when the // connection identity actually changed. if view.connection !== connection { @@ -115,6 +121,12 @@ public struct StreamView: NSViewRepresentable { public final class StreamLayerView: NSView { private let displayLayer = AVSampleBufferDisplayLayer() private var pump: StreamPump? + /// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a + /// display link instead of the StreamPump → displayLayer path. nil = stage-1 (default). + var presentMeter: LatencyMeter? + private var stage2: Stage2Pipeline? + private var stage2Link: CADisplayLink? + private var metalLayer: CAMetalLayer? public private(set) var connection: PunktfunkConnection? private let cursorCapture = CursorCapture() private var inputCapture: InputCapture? @@ -191,6 +203,7 @@ public final class StreamLayerView: NSView { public override func layout() { super.layout() attemptPendingCapture() // bounds become real here on first presentation + layoutMetalLayer() // keep the stage-2 sublayer aspect-fit to the view } // MARK: - Capture state machine @@ -296,7 +309,9 @@ public final class StreamLayerView: NSView { // A click is explicit intent AND may arrive mid-activation (acceptsFirstMouse: // NSApp.isActive / isKeyWindow are still false for the click coming in from // another app) — only the auto-engage paths require already-held key status. - guard captureEnabled, !captured, pump != nil, window != nil, + // `connection != nil` (not `pump`) is the session-active gate — the stage-2 presenter + // runs without a StreamPump, and capture must still engage there. + guard captureEnabled, !captured, connection != nil, window != nil, fromClick || (NSApp.isActive && window?.isKeyWindow == true) else { return } // If the cursor grab is refused (e.g. the reactivating click arrives before the app is @@ -416,14 +431,80 @@ public final class StreamLayerView: NSView { capture.start() inputCapture = capture - let pump = StreamPump() - pump.start( - connection: connection, layer: displayLayer, - onFrame: onFrame, onSessionEnd: onSessionEnd) - self.pump = pump + // Presenter choice — default stage-1 (the known-good AVSampleBufferDisplayLayer). Stage-2 + // (`punktfunk.presenter == "stage2"`) takes explicit VTDecompressionSession decode + a + // CAMetalLayer/display-link present; it falls back here if Metal can't be set up. + if UserDefaults.standard.string(forKey: "punktfunk.presenter") == "stage2", + let meter = presentMeter, + let pipeline = Stage2Pipeline(presentMeter: meter) { + startStage2(pipeline, connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) + } else { + let pump = StreamPump() + pump.start( + connection: connection, layer: displayLayer, + onFrame: onFrame, onSessionEnd: onSessionEnd) + self.pump = pump + } requestAutoCapture() // entering a session is the deliberate "capture me" moment } + // MARK: - Stage-2 presenter (VTDecompressionSession → CAMetalLayer + display link) + + private func startStage2( + _ pipeline: Stage2Pipeline, connection: PunktfunkConnection, + onFrame: (@Sendable (AccessUnit) -> Void)?, onSessionEnd: (@Sendable () -> Void)? + ) { + let metal = pipeline.layer + displayLayer.addSublayer(metal) // contentsScale + frame set in layoutMetalLayer() + metalLayer = metal + stage2 = pipeline + layoutMetalLayer() + let link = displayLink(target: self, selector: #selector(stage2Tick(_:))) + link.add(to: .main, forMode: .common) + stage2Link = link + pipeline.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) + } + + @objc private func stage2Tick(_ link: CADisplayLink) { + stage2?.renderTick( + targetPresentNs: Stage2Pipeline.realtimeNs(forDisplayLinkTimestamp: link.targetTimestamp)) + } + + /// Aspect-fit the metal sublayer in the view (the host streams at the client's native mode, + /// so this is usually the full bounds; it letterboxes a resized window). drawableSize is the + /// layer's pixel size — the fullscreen-triangle shader scales the decoded texture to fill it. + private func layoutMetalLayer() { + guard let metalLayer, let connection else { return } + let mode = connection.currentMode() + let fit: NSRect = (mode.width > 0 && mode.height > 0) + ? AVMakeRect( + aspectRatio: CGSize(width: Int(mode.width), height: Int(mode.height)), + insideRect: bounds) + : bounds + let scale = window?.backingScaleFactor ?? 1 + // No implicit resize animation; refresh contentsScale on a retina↔non-retina move. + CATransaction.begin() + CATransaction.setDisableActions(true) + metalLayer.contentsScale = scale + metalLayer.frame = fit + CATransaction.commit() + stage2?.setDrawableSize(CGSize(width: fit.width * scale, height: fit.height * scale)) + } + + public override func viewDidChangeBackingProperties() { + super.viewDidChangeBackingProperties() + layoutMetalLayer() // backing scale changed (e.g. moved to a non-retina display) + } + + private func teardownStage2() { + stage2Link?.invalidate() + stage2Link = nil + stage2?.stop() + stage2 = nil + metalLayer?.removeFromSuperlayer() + metalLayer = nil + } + /// Stop pumping (≤ one poll timeout). Does not close the connection — that stays with /// whoever owns it (PunktfunkConnection.close() is safe alongside a draining pump). public func stop() { @@ -433,6 +514,7 @@ public final class StreamLayerView: NSView { inputCapture = nil pump?.stop() pump = nil + teardownStage2() connection = nil } diff --git a/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift b/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift index 62826e7..caa4003 100644 --- a/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift +++ b/clients/apple/Sources/PunktfunkKit/StreamViewIOS.swift @@ -45,25 +45,29 @@ public struct StreamView: UIViewControllerRepresentable { private let onCaptureChange: ((Bool) -> Void)? private let onFrame: (@Sendable (AccessUnit) -> Void)? private let onSessionEnd: (@Sendable () -> Void)? + private let presentMeter: LatencyMeter? public init( connection: PunktfunkConnection, captureEnabled: Bool = true, onCaptureChange: ((Bool) -> Void)? = nil, onFrame: (@Sendable (AccessUnit) -> Void)? = nil, - onSessionEnd: (@Sendable () -> Void)? = nil + onSessionEnd: (@Sendable () -> Void)? = nil, + presentMeter: LatencyMeter? = nil ) { self.connection = connection self.captureEnabled = captureEnabled self.onCaptureChange = onCaptureChange self.onFrame = onFrame self.onSessionEnd = onSessionEnd + self.presentMeter = presentMeter } public func makeUIViewController(context: Context) -> StreamViewController { let controller = StreamViewController() controller.onCaptureChange = onCaptureChange controller.captureEnabled = captureEnabled + controller.presentMeter = presentMeter controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) return controller } @@ -71,6 +75,7 @@ public struct StreamView: UIViewControllerRepresentable { public func updateUIViewController(_ controller: StreamViewController, context: Context) { controller.onCaptureChange = onCaptureChange controller.captureEnabled = captureEnabled + controller.presentMeter = presentMeter if controller.connection !== connection { controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) } @@ -87,6 +92,12 @@ public final class StreamViewController: UIViewController { public private(set) var connection: PunktfunkConnection? private var pump: StreamPump? private var observers: [NSObjectProtocol] = [] + /// Stage-2 presenter (opt-in via `punktfunk.presenter`): a CAMetalLayer sublayer driven by a + /// CADisplayLink instead of the StreamPump → displayLayer path. nil = stage-1 (default). + var presentMeter: LatencyMeter? + private var stage2: Stage2Pipeline? + private var stage2Link: CADisplayLink? + private var metalLayer: CAMetalLayer? #if os(iOS) private var inputCapture: InputCapture? fileprivate var captured = false @@ -204,11 +215,20 @@ public final class StreamViewController: UIViewController { inputCapture = capture #endif - let pump = StreamPump() - pump.start( - connection: connection, layer: streamView.displayLayer, - onFrame: onFrame, onSessionEnd: onSessionEnd) - self.pump = pump + // Presenter choice — default stage-1 (the known-good AVSampleBufferDisplayLayer). Stage-2 + // (`punktfunk.presenter == "stage2"`) takes VTDecompressionSession decode + a + // CAMetalLayer/display-link present; falls back here if Metal can't be set up. + if UserDefaults.standard.string(forKey: "punktfunk.presenter") == "stage2", + let meter = presentMeter, + let pipeline = Stage2Pipeline(presentMeter: meter) { + startStage2(pipeline, connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) + } else { + let pump = StreamPump() + pump.start( + connection: connection, layer: streamView.displayLayer, + onFrame: onFrame, onSessionEnd: onSessionEnd) + self.pump = pump + } #if os(iOS) // GC only delivers while active; everything held is flushed by InputCapture's @@ -227,7 +247,7 @@ public final class StreamViewController: UIViewController { observers.append(NotificationCenter.default.addObserver( forName: UIApplication.didBecomeActiveNotification, object: nil, queue: .main ) { [weak self] _ in - guard let self, self.wasCapturedOnResign, self.captureEnabled, self.pump != nil + guard let self, self.wasCapturedOnResign, self.captureEnabled, self.connection != nil else { return } self.setCaptured(true) }) @@ -262,13 +282,74 @@ public final class StreamViewController: UIViewController { #endif pump?.stop() pump = nil + teardownStage2() connection = nil } + // MARK: - Stage-2 presenter (VTDecompressionSession → CAMetalLayer + display link) + + private func startStage2( + _ pipeline: Stage2Pipeline, connection: PunktfunkConnection, + onFrame: (@Sendable (AccessUnit) -> Void)?, onSessionEnd: (@Sendable () -> Void)? + ) { + let metal = pipeline.layer + metal.contentsScale = streamView.contentScaleFactor + streamView.layer.addSublayer(metal) + metalLayer = metal + stage2 = pipeline + layoutMetalLayer() + let link = CADisplayLink(target: self, selector: #selector(stage2Tick(_:))) + link.add(to: .main, forMode: .common) + stage2Link = link + pipeline.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd) + } + + @objc private func stage2Tick(_ link: CADisplayLink) { + stage2?.renderTick( + targetPresentNs: Stage2Pipeline.realtimeNs(forDisplayLinkTimestamp: link.targetTimestamp)) + } + + public override func viewDidLayoutSubviews() { + super.viewDidLayoutSubviews() + layoutMetalLayer() + } + + /// Aspect-fit the metal sublayer in the view (the host streams at the client's native mode, + /// so this is usually the full bounds). drawableSize is the layer's pixel size; the shader's + /// fullscreen triangle scales the decoded texture to fill it. + private func layoutMetalLayer() { + guard let metalLayer, let connection else { return } + let mode = connection.currentMode() + let bounds = streamView.bounds + let fit: CGRect = (mode.width > 0 && mode.height > 0) + ? AVMakeRect( + aspectRatio: CGSize(width: Int(mode.width), height: Int(mode.height)), + insideRect: bounds) + : bounds + let scale = streamView.contentScaleFactor + CATransaction.begin() + CATransaction.setDisableActions(true) // don't animate the resize + metalLayer.contentsScale = scale + metalLayer.frame = fit + CATransaction.commit() + stage2?.setDrawableSize(CGSize(width: fit.width * scale, height: fit.height * scale)) + } + + private func teardownStage2() { + stage2Link?.invalidate() + stage2Link = nil + stage2?.stop() + stage2 = nil + metalLayer?.removeFromSuperlayer() + metalLayer = nil + } + #if os(iOS) private func setCaptured(_ on: Bool) { if on { - guard captureEnabled, !captured, pump != nil else { return } + // `connection != nil` (not `pump`) is the session-active gate — the stage-2 presenter + // runs without a StreamPump. + guard captureEnabled, !captured, connection != nil else { return } inputCapture?.setForwarding(true) captured = true } else { diff --git a/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift b/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift new file mode 100644 index 0000000..b0d92ec --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/VideoDecoder.swift @@ -0,0 +1,165 @@ +// Stage-2 presenter, decode half: explicit VideoToolbox decode of the host's HEVC AUs. +// +// Stage-1 hands compressed samples to AVSampleBufferDisplayLayer, which decodes AND presents +// internally with no per-frame callback — so neither decode-completion nor present can be +// stamped, and frames can't be hand-paced. Here we drive VTDecompressionSession ourselves: the +// output callback delivers a decoded CVPixelBuffer, we stamp decode-completion, and push it into +// a ready ring the presenter's display link drains. See docs apple-stage2-presenter.md. + +import CoreMedia +import CoreVideo +import Foundation +import VideoToolbox + +/// One decoded frame waiting to be presented. Owns a retained `CVPixelBuffer` until shown. +public struct ReadyFrame: @unchecked Sendable { + /// Host capture clock (the AU's pts), in nanoseconds. + public let ptsNs: UInt64 + /// Client `CLOCK_REALTIME` instant decode completed, in nanoseconds. + public let decodedNs: Int64 + /// The decoded image (NV12 biplanar, Metal-compatible). + public let pixelBuffer: CVPixelBuffer +} + +/// The C output callback can't capture context, so VideoToolbox hands it the refcon we set at +/// session creation — a pointer back to the owning `VideoDecoder`. +private let decoderOutputCallback: VTDecompressionOutputCallback = { + refcon, _, status, _, imageBuffer, pts, _ in + guard let refcon else { return } + Unmanaged.fromOpaque(refcon) + .takeUnretainedValue() + .handleDecoded(status: status, imageBuffer: imageBuffer, pts: pts) +} + +/// Owns a `VTDecompressionSession` rebuilt whenever the format description changes (every IDR / +/// mode change, the same trigger stage-1 uses). Thread-safe: `decode` runs on the pump thread, +/// the output callback on a VT-managed thread; the only shared mutable state is the session + +/// format, guarded by `lock`. `@unchecked Sendable` — the lock enforces the contract. +public final class VideoDecoder: @unchecked Sendable { + private let lock = NSLock() + private var session: VTDecompressionSession? + private var format: CMVideoFormatDescription? + + /// Called on the VT thread for each successfully decoded frame — stamp + enqueue, don't block. + private let onDecoded: @Sendable (ReadyFrame) -> Void + /// Called on the VT thread when a frame fails to decode (bad data / decoder reset) so the + /// pump can re-gate on the next IDR. + private let onDecodeError: @Sendable (OSStatus) -> Void + + public init( + onDecoded: @escaping @Sendable (ReadyFrame) -> Void, + onDecodeError: @escaping @Sendable (OSStatus) -> Void = { _ in } + ) { + self.onDecoded = onDecoded + self.onDecodeError = onDecodeError + } + + deinit { teardown() } + + /// Submit one AU for asynchronous decode, (re)creating the session if `format` changed. The + /// caller resolves `format` from the IDR exactly as stage-1 does (`AnnexB.formatDescription`). + /// Returns false if the session couldn't be created or the frame couldn't be submitted. + @discardableResult + public func decode(au: AccessUnit, format newFormat: CMVideoFormatDescription) -> Bool { + lock.lock() + let needsNew: Bool = { + guard let session, let format else { return true } + if CMFormatDescriptionEqual(format, otherFormatDescription: newFormat) { return false } + // A new desc that the live session can still accept (rare for HEVC) avoids a rebuild. + return !VTDecompressionSessionCanAcceptFormatDescription(session, formatDescription: newFormat) + }() + if needsNew, !createSessionLocked(format: newFormat) { + lock.unlock() + return false + } + // Submit WHILE holding the lock so a concurrent reset()/teardown (main thread) can't + // invalidate the session between here and DecodeFrame. The VT output callback takes the + // ring lock, not this one, so there's no re-entrancy. DecodeFrame is async — non-blocking. + guard let session, + let sample = AnnexB.sampleBuffer(au: au, format: newFormat) + else { lock.unlock(); return false } + var infoOut = VTDecodeInfoFlags() + let status = VTDecompressionSessionDecodeFrame( + session, + sampleBuffer: sample, + flags: [._EnableAsynchronousDecompression], + frameRefcon: nil, + infoFlagsOut: &infoOut) + lock.unlock() + if status != noErr { + onDecodeError(status) + return false + } + return true + } + + /// Drop the session — the next `decode` rebuilds it. Used on stop and to recover from a + /// wedged decoder (re-gates on the next in-band parameter sets, like stage-1's flush). + public func reset() { + lock.lock() + teardownLocked() + lock.unlock() + } + + private func teardown() { + lock.lock() + teardownLocked() + lock.unlock() + } + + private func teardownLocked() { + if let session { + VTDecompressionSessionWaitForAsynchronousFrames(session) + VTDecompressionSessionInvalidate(session) + } + session = nil + format = nil + } + + /// `lock` held. Replace the session with one for `newFormat`. NV12 video-range, Metal- + /// compatible output (10-bit/HDR is a later tie-in — see the plan). + private func createSessionLocked(format newFormat: CMVideoFormatDescription) -> Bool { + if let session { + VTDecompressionSessionWaitForAsynchronousFrames(session) + VTDecompressionSessionInvalidate(session) + } + session = nil + format = nil + + let imageAttrs: [CFString: Any] = [ + kCVPixelBufferMetalCompatibilityKey: true, + kCVPixelBufferPixelFormatTypeKey: + kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, + ] + var callback = VTDecompressionOutputCallbackRecord( + decompressionOutputCallback: decoderOutputCallback, + decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque()) + var newSession: VTDecompressionSession? + let status = VTDecompressionSessionCreate( + allocator: kCFAllocatorDefault, + formatDescription: newFormat, + decoderSpecification: nil, // hardware by default + imageBufferAttributes: imageAttrs as CFDictionary, + outputCallback: &callback, + decompressionSessionOut: &newSession) + guard status == noErr, let newSession else { return false } + session = newSession + format = newFormat + return true + } + + /// VT thread. Stamp decode-completion and enqueue, or report the error. + fileprivate func handleDecoded(status: OSStatus, imageBuffer: CVImageBuffer?, pts: CMTime) { + guard status == noErr, let imageBuffer else { + onDecodeError(status) + return + } + var ts = timespec() + clock_gettime(CLOCK_REALTIME, &ts) + let decodedNs = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec) + // pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively. + let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default) + let ptsNs = p.value > 0 ? UInt64(p.value) : 0 + onDecoded(ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer)) + } +} diff --git a/clients/apple/Tests/PunktfunkKitTests/VideoToolboxRoundTripTests.swift b/clients/apple/Tests/PunktfunkKitTests/VideoToolboxRoundTripTests.swift index 03c3b46..d501617 100644 --- a/clients/apple/Tests/PunktfunkKitTests/VideoToolboxRoundTripTests.swift +++ b/clients/apple/Tests/PunktfunkKitTests/VideoToolboxRoundTripTests.swift @@ -9,6 +9,13 @@ import VideoToolbox import XCTest @testable import PunktfunkKit +/// Sendable holder for the values the (background-thread) decode callback writes. +private final class FrameBox: @unchecked Sendable { + let lock = NSLock() + var frame: ReadyFrame? + var error: OSStatus? +} + final class VideoToolboxRoundTripTests: XCTestCase { private let width = 320 private let height = 240 @@ -59,6 +66,43 @@ final class VideoToolboxRoundTripTests: XCTestCase { XCTAssertEqual(CVPixelBufferGetHeight(pixels), height) } + /// Stage-2 decode half: the same known IDR through `VideoDecoder` — assert its async output + /// callback fires with a CVPixelBuffer of the right dimensions, the pts round-trips, and + /// decode-completion is stamped. + func testVideoDecoderAsyncCallbackDeliversPixels() throws { + let (formatDesc, avccSample) = try encodeOneHEVCKeyframe() + let annexB = try annexBAU(formatDesc: formatDesc, avccSample: avccSample) + let format = try XCTUnwrap(AnnexB.formatDescription(fromIDR: annexB)) + let au = AccessUnit(data: annexB, ptsNs: 42_000_000, frameIndex: 0, flags: 0) + + let box = FrameBox() + let done = DispatchSemaphore(value: 0) + let decoder = VideoDecoder( + onDecoded: { frame in + box.lock.lock(); box.frame = frame; box.lock.unlock() + done.signal() + }, + onDecodeError: { status in + box.lock.lock(); box.error = status; box.lock.unlock() + done.signal() + }) + + XCTAssertTrue(decoder.decode(au: au, format: format), "frame submit should succeed") + XCTAssertEqual(done.wait(timeout: .now() + 10), .success, "the decode callback must fire") + decoder.reset() + + box.lock.lock() + let frame = box.frame + let error = box.error + box.lock.unlock() + XCTAssertNil(error.map { "decode error \($0)" }) + let ready = try XCTUnwrap(frame, "the async output callback must deliver a ReadyFrame") + XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), width) + XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), height) + XCTAssertEqual(ready.ptsNs, 42_000_000, "pts round-trips through the decoder") + XCTAssertGreaterThan(ready.decodedNs, 0, "decode-completion is stamped") + } + // MARK: - encode helpers /// One forced-IDR HEVC frame; returns its format description and raw AVCC sample bytes. diff --git a/docs-site/content/docs/status.md b/docs-site/content/docs/status.md index 28cd150..7322b08 100644 --- a/docs-site/content/docs/status.md +++ b/docs-site/content/docs/status.md @@ -14,7 +14,7 @@ and the design in the [Implementation Plan](/docs/implementation-plan); this pag | **M1** — `punktfunk-core` + C ABI (protocol · FEC · crypto) | ✅ complete & hardened | | **M2** — GameStream host (Moonlight-compatible) | ✅ working end-to-end; HDR/surround-audio polish open | | **M3** — `punktfunk/1` native protocol (QUIC control + UDP data) | ✅ full session planes, validated live | -| **M4** — native client decode + present (Apple first) | 🟡 stage 1 done (first light); stage-2 presenter next | +| **M4** — native client decode + present (Apple first) | 🟡 stage 1 live; stage-2 presenter built + decode-tested (opt-in, present needs live validation) | ## Live on the boxes @@ -75,7 +75,7 @@ All three appliances advertise over mDNS (`_punktfunk._udp`) and require PIN pai See the [Roadmap](/docs/roadmap) for the ordered list. Near-term: - **True glass-to-glass**: Apple client present-stamp (decode→present) + host render→capture term. -- **Apple stage-2 presenter** (`VTDecompressionSession` → `CAMetalLayer`). +- **Apple stage-2 presenter** (`VTDecompressionSession` → `CAMetalLayer`) — built + decode-unit-tested + live-validated on glass behind the `punktfunk.presenter` flag (capture→present ~11 ms p50); make it the default after a few resolution/HDR checks. - **Mandatory PIN pairing + delegated pairing approval** (an already-paired device approves a new one). - **gamescope multi-user isolation** — per-session input/audio so concurrent sessions are independent desktops (the shared-desktop multi-view case landed).