feat(apple): capture->client latency HUD (skew-corrected) via the connect offset

The Apple client now consumes the connector's clock offset. PunktfunkConnection
reads punktfunk_connection_clock_offset_ns into clockOffsetNs at connect; a new
LatencyMeter (PunktfunkKit, NSLock + percentiles, mirrors FrameMeter) records each
AU's capture->client-receipt latency = now(CLOCK_REALTIME) + offset - pts_ns, and
SessionModel drains p50/p95 into the macOS HUD ("capture->client N/N ms p50/p95",
"(same-host)" when the host didn't answer the skew handshake). Wired at the
existing onFrame hook in ContentView — additive, no change to the decode/present
path. Unit test for the meter (percentiles, skew flag, absurd-value guard).
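
For illustration only, the per-AU arithmetic and the p50/p95 drain described above
could be sketched as follows. This is a Rust analogue, not the Swift LatencyMeter
from this commit: the type layout, the 10 s plausibility bound, the nearest-rank
percentile rule and every function name here are assumptions.

```rust
use std::sync::Mutex;

/// Assumed upper bound for a plausible latency (10 s); the real guard may differ.
const MAX_PLAUSIBLE_NS: i64 = 10_000_000_000;

/// capture->client latency: the client wall clock mapped onto the host clock
/// (now + offset), minus the AU's capture timestamp. Returns None for negative,
/// overflowing, or implausibly large values (the "absurd-value guard").
fn capture_to_client_ns(now_realtime_ns: i64, clock_offset_ns: i64, pts_ns: i64) -> Option<i64> {
    let latency = now_realtime_ns
        .checked_add(clock_offset_ns)?
        .checked_sub(pts_ns)?;
    if (0..=MAX_PLAUSIBLE_NS).contains(&latency) {
        Some(latency)
    } else {
        None
    }
}

/// Hypothetical meter: a lock-protected sample buffer, drained by the HUD.
struct LatencyMeter {
    samples: Mutex<Vec<i64>>,
}

impl LatencyMeter {
    fn new() -> Self {
        LatencyMeter { samples: Mutex::new(Vec::new()) }
    }

    /// Record one AU; samples rejected by the guard are dropped silently.
    fn record(&self, now_realtime_ns: i64, clock_offset_ns: i64, pts_ns: i64) {
        if let Some(ns) = capture_to_client_ns(now_realtime_ns, clock_offset_ns, pts_ns) {
            self.samples.lock().unwrap().push(ns);
        }
    }

    /// Take all samples and return (p50, p95) in milliseconds, or None if empty.
    fn drain_ms(&self) -> Option<(f64, f64)> {
        let mut s = std::mem::take(&mut *self.samples.lock().unwrap());
        if s.is_empty() {
            return None;
        }
        s.sort_unstable();
        // Nearest-rank percentile using integer math: rank = ceil(pct * n / 100).
        let pick = |pct: usize| {
            let rank = ((pct * s.len() + 99) / 100).clamp(1, s.len());
            s[rank - 1] as f64 / 1e6
        };
        Some((pick(50), pick(95)))
    }
}
```

Draining resets the buffer, so each HUD refresh reports only the samples recorded
since the previous one; a sliding window would be an equally reasonable choice.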

This is the first cross-machine latency the real Apple client reports. SCOPE:
stage-1 AVSampleBufferDisplayLayer decodes+presents compressed samples internally
with no per-frame callback, so this excludes decode+present; true decode->present
needs the stage-2 presenter (VTDecompressionSession + CAMetalLayer). Rebuild
PunktfunkCore.xcframework (for the new C getter) before swift build/test on a Mac.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 11:58:54 +00:00
parent 7eb9a927cf
commit e04328f086
8 changed files with 179 additions and 5 deletions
+7 -3
@@ -310,9 +310,13 @@ buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
(`quic::clock_sync` → `ClockSkew`) used by both the reference client and the **embeddable
connector** — `NativeClient` runs it at connect and exposes the offset over the C ABI
(`punktfunk_connection_clock_offset_ns`), so the Apple client can convert a present instant to the
-host clock. **Remaining for true glass-to-glass**: (1) the **Apple client present-stamp**
-(decode→present) — Swift: stamp `AVSampleBufferDisplayLayer`/presenter time, add the C-ABI offset,
-subtract the AU `pts_ns`; (2) the host **render→capture** term (PipeWire buffer presentation
+host clock. The Apple client now consumes that offset: `PunktfunkConnection.clockOffsetNs` +
+`LatencyMeter` surface a **capture→client-receipt** (skew-corrected) p50/p95 in the HUD — the first
+cross-machine latency the real Apple client reports. **Remaining for *true* glass-to-glass**:
+(1) the **decode→present** tail — the stage-1 `AVSampleBufferDisplayLayer` decodes+presents
+compressed samples internally with no per-frame callback, so it needs the **stage-2 presenter**
+(`VTDecompressionSession` decode-completion timestamp + `CAMetalLayer`/display-link present) to
+stamp on-glass present time; (2) the host **render→capture** term (PipeWire buffer presentation
timestamp vs our capture stamp). `tools/latency-probe` is still the cross-machine orchestrator.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
1. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
+5
@@ -29,6 +29,11 @@ All three appliances advertise over mDNS (`_punktfunk._udp`) and require PIN pai
## Progress log
### 2026-06-12
+- **Apple client latency HUD** — `PunktfunkConnection.clockOffsetNs` (from the C-ABI getter) +
+`LatencyMeter` surface a skew-corrected **capture→client-receipt** p50/p95 in the macOS HUD: the
+first cross-machine latency the real Apple client reports. (Stage-1 `AVSampleBufferDisplayLayer`
+has no present callback, so decode→present is excluded — that needs the stage-2 presenter.)
+Needs an `xcframework` rebuild + `swift test` on the Mac to validate.
- **Skew handshake in the connector + C ABI** — `quic::clock_sync` is now a shared core helper used
by both the reference client and `NativeClient`; the connector runs it at connect and exposes the
host clock offset over the C ABI (`punktfunk_connection_clock_offset_ns`). This is the substrate