7b10714b62
Opt-in (Settings -> Presenter; `punktfunk.presenter`, default stage-1). Stage-1's AVSampleBufferDisplayLayer decodes AND presents internally with no per-frame callback, so neither decode nor present can be stamped or hand-paced. Stage-2 takes explicit control:

- VideoDecoder: VTDecompressionSession, async output callback stamps decode-completion, session rebuilt on every IDR / format change. Unit-tested (testVideoDecoderAsyncCallbackDeliversPixels).
- MetalVideoPresenter: CAMetalLayer + CVMetalTextureCache + a runtime-compiled BT.709 limited-range NV12->RGB shader, present at the next vsync. The CVMetalTextures + pixel buffer are held until the GPU completes.
- Stage2Pipeline: pump thread -> decoder -> newest-ready 1-slot ring; the hosting view's display link drains it once per vsync and stamps capture->present (the display-link target time projected into CLOCK_REALTIME).
- LatencyMeter gains record(ptsNs:atNs:offsetNs:); the HUD shows a capture->present (glass-to-glass, modulo host render->capture) line, skew-corrected via clockOffsetNs. Measured live ~11 ms p50 vs ~2.2 ms capture->client.
- StreamView / StreamViewIOS host the CAMetalLayer as a sublayer + a CADisplayLink (NSView.displayLink on macOS) when stage-2; input capture + HUD unchanged. The session-active gates switch from `pump != nil` to `connection != nil` so capture engages without a StreamPump.

Validated: builds macOS/iOS/tvOS; the decode half is unit-tested; the Metal present is live-validated on glass (correct image + the capture->present number). Colorspace is BT.709 SDR for now; 10-bit/HDR + a pacing policy are later.

Plan: docs-site/content/docs/apple-stage2-presenter.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
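The commit message says the display-link target time is "projected into CLOCK_REALTIME" before it is handed to `record(ptsNs:atNs:offsetNs:)`. How the app does that is not shown here. The following is only a minimal sketch of one way such a projection could work, assuming the display link's timestamps live on a monotonic clock measured in seconds and that both clocks are sampled at (nearly) the same moment. The names `projectToRealtimeNs` and `realtimeNowNs` are hypothetical and do not come from the source.

```swift
import Foundation

/// Hypothetical helper (not from the original source): projects a timestamp taken on a
/// monotonic clock onto CLOCK_REALTIME, given one paired reading of both clocks.
func projectToRealtimeNs(targetMonotonicSec: Double,
                         nowMonotonicSec: Double,
                         nowRealtimeNs: Int64) -> Int64 {
    // Distance from "now" to the target on the monotonic clock, in nanoseconds.
    let deltaNs = Int64(((targetMonotonicSec - nowMonotonicSec) * 1_000_000_000).rounded())
    // Apply the same distance to the wall-clock reading.
    return nowRealtimeNs + deltaNs
}

/// Reads CLOCK_REALTIME in nanoseconds, the same way LatencyMeter.record(ptsNs:offsetNs:) does.
func realtimeNowNs() -> Int64 {
    var ts = timespec()
    clock_gettime(CLOCK_REALTIME, &ts)
    return Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
}

// Made-up example: the target present time is 8 ms after "now" on the monotonic clock.
let baseRealtimeNs: Int64 = 1_700_000_000_000_000_000
let projected = projectToRealtimeNs(targetMonotonicSec: 100.008,
                                    nowMonotonicSec: 100.000,
                                    nowRealtimeNs: baseRealtimeNs)
print(projected - baseRealtimeNs)
```

In a real presenter the two "now" readings would come from the display-link callback itself; the projected value would then be the `atNs` argument, so the sample reflects when the frame is due on glass rather than when the present call ran.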
79 lines
3.8 KiB
Swift
// Per-frame latency sampler for the live HUD: records capture->client-receipt latency and drains
// percentiles on demand. NSLock rather than an actor: the writer is the non-async pump/arrival
// path (same pattern as the app's FrameMeter).

import Foundation

/// Samples the **capture->client-receipt** latency of each access unit and reports percentiles.
///
/// The latency is `now - pts_ns`, where `pts_ns` is the host's capture wall clock (the AU's pts) and
/// `now` is the client's `CLOCK_REALTIME` instant the AU was received, shifted by the connect-time
/// **clock-skew offset** (`PunktfunkConnection.clockOffsetNs`, host minus client) so the difference
/// is valid across machines. `offsetNs == 0` means an old host that didn't answer the skew handshake
/// (or genuinely synced clocks); the number is then only meaningful same-host.
///
/// SCOPE (stage-1 presenter): this covers host capture -> encode -> FEC -> network -> reassembly ->
/// decrypt -> handed to the presenter. It does **not** include the on-device VideoToolbox decode or
/// the `AVSampleBufferDisplayLayer` present; that layer decodes and presents compressed samples
/// internally with no per-frame callback. True decode->present (the full glass-to-glass) needs the
/// stage-2 presenter (`VTDecompressionSession` decode-completion + `CAMetalLayer`/display-link
/// present); this meter is the substrate it will extend.
public final class LatencyMeter: @unchecked Sendable {
    private let lock = NSLock()
    private var samplesUs: [Int64] = []
    private var skewCorrected = false

    public init() {}

    /// Record one frame at receipt (now). `ptsNs` is the host capture clock (the AU's pts);
    /// `offsetNs` is the host-client clock offset from the skew handshake (0 = uncorrected).
    public func record(ptsNs: UInt64, offsetNs: Int64) {
        var ts = timespec()
        clock_gettime(CLOCK_REALTIME, &ts)
        let nowNs = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
        record(ptsNs: ptsNs, atNs: nowNs, offsetNs: offsetNs)
    }

    /// Record one frame whose latency is `atNs + offsetNs - ptsNs`, using an EXPLICIT client instant
    /// rather than now. The stage-2 presenter uses this to stamp capture->present at the display
    /// link's target present time (not the moment the present call ran). All in `CLOCK_REALTIME`.
    public func record(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) {
        let latNs = atNs &+ offsetNs &- Int64(bitPattern: ptsNs)
        // Drop absurd values (a clock step, a wildly wrong offset, or garbage pts).
        guard latNs > 0, latNs < 10_000_000_000 else { return }
        lock.lock()
        samplesUs.append(latNs / 1000)
        if offsetNs != 0 { skewCorrected = true }
        lock.unlock()
    }

    public struct Stats: Sendable {
        public let p50Ms: Double
        public let p95Ms: Double
        public let p99Ms: Double
        public let count: Int
        /// True if the skew offset was applied (a host that answered the handshake), i.e. the
        /// numbers are cross-machine valid, not just same-host.
        public let skewCorrected: Bool
    }

    /// Percentiles over the samples accumulated since the last drain, then reset the window. `nil`
    /// when no samples arrived in the interval.
    public func drain() -> Stats? {
        lock.lock()
        let sorted = samplesUs.sorted()
        let corrected = skewCorrected
        samplesUs.removeAll(keepingCapacity: true)
        skewCorrected = false
        lock.unlock()
        guard !sorted.isEmpty else { return nil }
        func pct(_ p: Double) -> Double {
            let i = min(Int(Double(sorted.count) * p), sorted.count - 1)
            return Double(sorted[i]) / 1000.0  // us -> ms
        }
        return Stats(
            p50Ms: pct(0.50), p95Ms: pct(0.95), p99Ms: pct(0.99),
            count: sorted.count, skewCorrected: corrected)
    }
}
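As a worked illustration of the arithmetic in `record(ptsNs:atNs:offsetNs:)` and `drain()`, the sketch below reproduces the skew-corrected latency formula and the percentile index rule as free functions. The function names `latencyNs` and `percentileMs`, and all sample values, are made up for illustration; only the formula `atNs + offsetNs - ptsNs`, the (0, 10 s) sanity window, and the `min(Int(count * p), count - 1)` index come from the code above.

```swift
/// Skew-corrected latency in nanoseconds, mirroring `atNs + offsetNs - ptsNs`.
/// Returns nil outside the open interval (0, 10 s), like the meter's sanity guard.
func latencyNs(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) -> Int64? {
    let value = atNs &+ offsetNs &- Int64(bitPattern: ptsNs)
    guard value > 0, value < 10_000_000_000 else { return nil }
    return value
}

/// Percentile lookup using the same index rule as drain(): floor(count * p),
/// clamped to the last element. Input is sorted microseconds, output is milliseconds.
func percentileMs(_ sortedUs: [Int64], _ p: Double) -> Double? {
    guard !sortedUs.isEmpty else { return nil }
    let i = min(Int(Double(sortedUs.count) * p), sortedUs.count - 1)
    return Double(sortedUs[i]) / 1000.0
}

// Hypothetical clocks: the host is 5 ms ahead of the client (offset = host minus client).
let hostCaptureNs: UInt64 = 1_000_000_000   // pts, on the host clock
let clientReceiptNs: Int64 = 997_200_000    // receipt instant, on the client clock
let skewNs: Int64 = 5_000_000
let sample = latencyNs(ptsNs: hostCaptureNs, atNs: clientReceiptNs, offsetNs: skewNs)
// 997_200_000 + 5_000_000 - 1_000_000_000 = 2_200_000 ns, i.e. 2.2 ms

// Ten made-up samples in microseconds with one outlier.
let windowUs: [Int64] = [1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 9000]
let p50 = percentileMs(windowUs, 0.50)  // index 5
let p95 = percentileMs(windowUs, 0.95)  // index 9
let p99 = percentileMs(windowUs, 0.99)  // index 9
print(sample as Any, p50 as Any, p95 as Any, p99 as Any)
```

Without the offset, the same two timestamps would give a negative difference and the sample would be dropped, which is why an uncorrected reading is only meaningful when host and client share a clock. Note also that with this index rule an even-sized window reports the upper of the two middle samples as p50, and in small windows p95 and p99 both land on the largest sample.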