From 69609945a3e6903672e7f21789bc8820241900f8 Mon Sep 17 00:00:00 2001
From: enricobuehler
Date: Fri, 3 Jul 2026 21:31:49 +0000
Subject: [PATCH] feat(clients): host/network split in every stats HUD (stats phase 2, client side)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Consumes the 0xCF host-timing plane (449a67c) on all four GUI clients:
each keeps a bounded pending ring of receipt samples keyed by pts,
matches the host's per-AU capture→sent reports against it, and the HUD
equation becomes

    = host 3.1 + network 6.7 + decode 2.1 + display 2.3

falling back to the combined `= host+network …` term whenever no timing
matched the window (old host / datagram loss) — same total, one split
fewer, never a misleading zero.

Apple additionally gains the split as the only equation line under the
stage-1 fallback presenter (receipt is presenter-independent), a
`nextHostTiming` wrapper with its own plane lock, and a unit-tested
`HostNetworkSplitter`; Android extends the JNI stats array 16→18 doubles
(0–15 unchanged); Windows/Linux thread the split through `Stats` into the
HUD and the headless/debug logs.

Docs updated: design/stats-unification.md Phase 2 → implemented (wire
format, fallback semantics), and the docs-site matrix's Sunshine "Host
processing latency" row is now a direct match (ours includes the paced
send; avg vs p50).

Verified here: linux client clippy -D warnings green on the live tree,
windows stub check + hand-verified diff, android cargo-ndk arm64 check
green, apple loopback test extended (needs the rebuilt xcframework +
swift test on the mac). On-glass: pending on all platforms.

Co-Authored-By: Claude Fable 5 --- .../kotlin/io/unom/punktfunk/StatsOverlay.kt | 21 +++- .../io/unom/punktfunk/kit/NativeBridge.kt | 11 +- clients/android/native/src/decode.rs | 30 +++++ clients/android/native/src/session/planes.rs | 21 ++-- clients/android/native/src/stats.rs | 43 ++++++- .../Sources/PunktfunkClient/ContentView.swift | 7 +- .../Session/SessionModel.swift | 33 ++++++ .../Session/StreamHUDView.swift | 28 +++-- .../Connection/PunktfunkConnection.swift | 39 +++++++ .../Video/HostNetworkSplitter.swift | 88 ++++++++++++++ .../HostNetworkSplitterTests.swift | 107 ++++++++++++++++++ .../LoopbackIntegrationTests.swift | 20 +++- clients/linux/src/session.rs | 51 +++++++++ clients/linux/src/ui_stream.rs | 23 +++- clients/windows/src/app/stream.rs | 24 ++-- clients/windows/src/main.rs | 12 ++ clients/windows/src/session.rs | 44 +++++++ design/stats-unification.md | 49 +++++--- docs-site/content/docs/stats.md | 18 +-- 19 files changed, 610 insertions(+), 59 deletions(-) create mode 100644 clients/apple/Sources/PunktfunkKit/Video/HostNetworkSplitter.swift create mode 100644 clients/apple/Tests/PunktfunkKitTests/HostNetworkSplitterTests.swift diff --git a/clients/android/app/src/main/kotlin/io/unom/punktfunk/StatsOverlay.kt b/clients/android/app/src/main/kotlin/io/unom/punktfunk/StatsOverlay.kt index f989c83..c10a82b 100644 --- a/clients/android/app/src/main/kotlin/io/unom/punktfunk/StatsOverlay.kt +++ b/clients/android/app/src/main/kotlin/io/unom/punktfunk/StatsOverlay.kt @@ -16,12 +16,15 @@ import kotlin.math.roundToInt /** * The live stats overlay — the unified HUD (`design/stats-unification.md`, Android v1: headline is - * `capture→decoded`, tiled by `host+network` + `decode`). Reads the 16-double layout from + * `capture→decoded`, tiled by `host+network` + `decode`). 
Reads the 18-double layout from * [NativeBridge.nativeVideoStats]: * `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skew, w, h, hz, lost, bitDepth, colorPrimaries, - * colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`. Indexes 10–13 (present on a current - * native lib) describe the negotiated video feed and render as a codec/depth/colour/chroma line; - * 14/15 render as the stage equation; older layouts just omit those lines. + * colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms, hostP50Ms, netP50Ms]`. Indexes 10–13 + * (present on a current native lib) describe the negotiated video feed and render as a + * codec/depth/colour/chroma line; 14/15 render as the stage equation — split into + * `host + network + decode` when the Phase-2 terms at 16/17 are nonzero (a current host sends + * per-AU 0xCF timings; an old host leaves them 0 and the combined `host+network` term stands); + * older layouts just omit those lines. */ @Composable internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) { @@ -60,8 +63,16 @@ internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) { fontSize = 12.sp, ) if (s.size >= 16) { + // Phase-2 split (s[16]/s[17]): render `host + network` separately when the host + // reported its share this window; otherwise the combined term (old host / no + // matched 0xCF timing). 
+ val equation = if (s.size >= 18 && s[16] > 0) { + "= host ${"%.1f".format(s[16])} + network ${"%.1f".format(s[17])} + decode ${"%.1f".format(s[15])}" + } else { + "= host+network ${"%.1f".format(s[14])} + decode ${"%.1f".format(s[15])}" + } Text( - "= host+network ${"%.1f".format(s[14])} + decode ${"%.1f".format(s[15])}", + equation, color = Color.White, fontFamily = FontFamily.Monospace, fontSize = 12.sp, diff --git a/clients/android/kit/src/main/kotlin/io/unom/punktfunk/kit/NativeBridge.kt b/clients/android/kit/src/main/kotlin/io/unom/punktfunk/kit/NativeBridge.kt index c7e624a..1f49ae0 100644 --- a/clients/android/kit/src/main/kotlin/io/unom/punktfunk/kit/NativeBridge.kt +++ b/clients/android/kit/src/main/kotlin/io/unom/punktfunk/kit/NativeBridge.kt @@ -105,14 +105,17 @@ object NativeBridge { /** * Drain ~1 s of live decode stats for the on-stream HUD, or `null` when no decode thread runs. - * Returns 16 doubles (unified stats spec, `design/stats-unification.md`): + * Returns 18 doubles (unified stats spec, `design/stats-unification.md`): * `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skewCorrected, width, height, refreshHz, framesLost, - * bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]` + * bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms, hostP50Ms, + * netP50Ms]` * (the two flags are 1.0/0.0; indexes 2/3 are the end-to-end capture→decoded headline; 10–13 * describe the negotiated video feed — bit depth 8/10, CICP primaries/transfer, and the HEVC * chroma_format_idc 1=4:2:0 / 3=4:4:4; 14/15 are the stage p50s tiling the headline — - * `host+network` = capture→received, `decode` = received→decoded). Poll ~1 Hz; each call - * resets the measurement window. 
+ * `host+network` = capture→received, `decode` = received→decoded; 16/17 split the + * `host+network` term via the host's per-AU 0xCF timings — `host` = the host's capture→sent, + * `network` = the remainder — both 0.0 when no timing matched this window, i.e. an old host). + * Poll ~1 Hz; each call resets the measurement window. */ external fun nativeVideoStats(handle: Long): DoubleArray? diff --git a/clients/android/native/src/decode.rs b/clients/android/native/src/decode.rs index 8fc67d7..3d934c7 100644 --- a/clients/android/native/src/decode.rs +++ b/clients/android/native/src/decode.rs @@ -25,6 +25,11 @@ use std::time::{Duration, Instant}; /// flight, so anything beyond this is stale (codec flushed / HUD toggled) and gets evicted. const IN_FLIGHT_CAP: usize = 64; +/// Cap on received AUs awaiting their 0xCF host timing (Phase 2 host/network split): the timing +/// datagram trails its AU by at most the wire, so a match lands within a frame or two — anything +/// this deep is a lost datagram (or an old host that never sends any) and gets evicted. +const PENDING_SPLIT_CAP: usize = 256; + /// The decode loop. Runs on the `pf-decode` thread until `shutdown` is set or the session closes. pub fn run( client: Arc, @@ -155,6 +160,11 @@ pub fn run( // point (output-buffer dequeue — MediaCodec round-trips presentationTimeUs) can be paired back // to its receipt for the `decode` stage. Only fed while the HUD is visible. let mut in_flight: VecDeque<(u64, i128)> = VecDeque::new(); + // Phase-2 host/network split (design/stats-unification.md): received AUs awaiting their 0xCF + // host timing, as (pts_ns, capture→received µs). The timings are drained non-blockingly right + // where receipts are recorded and matched by pts; `network = hostnet − host` (saturating). + // Only fed while the HUD is visible; an old host never sends a 0xCF, so entries just age out. 
+ let mut pending_split: VecDeque<(u64, u64)> = VecDeque::new(); // The dataspace we've signalled on the Surface so far (None = default/SDR). Set reactively once // the decoder reports an HDR stream (see `drain`); avoids re-applying every format event. let mut applied_ds: Option = None; @@ -190,6 +200,26 @@ pub fn run( if in_flight.len() > IN_FLIGHT_CAP { in_flight.pop_front(); // stale — codec never echoed it back } + // Phase-2 split: park this AU's capture→received sample, then match any + // 0xCF host timings that have arrived — host = the host's own + // capture→sent, network = our capture→received minus it (per-frame + // tiling; saturating in case of clock jitter). + if let Some(hostnet_us) = lat_us { + pending_split.push_back((frame.pts_ns, hostnet_us)); + if pending_split.len() > PENDING_SPLIT_CAP { + pending_split.pop_front(); // 0xCF lost / old host — evict + } + } + while let Ok(t) = client.next_host_timing(Duration::ZERO) { + if let Some(i) = pending_split.iter().position(|&(p, _)| p == t.pts_ns) + { + let (_, hostnet_us) = pending_split.remove(i).unwrap(); + stats.note_host_split( + t.host_us as u64, + hostnet_us.saturating_sub(t.host_us as u64), + ); + } + } } pending = Some(frame); } diff --git a/clients/android/native/src/session/planes.rs b/clients/android/native/src/session/planes.rs index baab09f..960db7d 100644 --- a/clients/android/native/src/session/planes.rs +++ b/clients/android/native/src/session/planes.rs @@ -73,13 +73,16 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeStopVideo( } /// `NativeBridge.nativeVideoStats(handle): DoubleArray?` — drain ~1 s of decode stats for the HUD -/// (unified stats spec, `design/stats-unification.md`). Returns 16 doubles +/// (unified stats spec, `design/stats-unification.md`). 
Returns 18 doubles /// `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skewCorrected, width, height, refreshHz, framesLost, -/// bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]` -/// (the two flags are 1.0/0.0; indexes 0–13 match the previous 14-double layout with the latency -/// pair re-based from capture→received to the end-to-end capture→decoded headline; the two stage -/// p50s tiling it — `host+network` = capture→received, `decode` = received→decoded — are appended -/// at the end), or `null` when no decode thread is running. Poll ~1 Hz from the UI; each call +/// bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms, hostP50Ms, +/// netP50Ms]` +/// (the two flags are 1.0/0.0; indexes 0–15 match the previous 16-double layout — 0–13 the original +/// 14-double one with the latency pair re-based to the end-to-end capture→decoded headline, 14/15 +/// the stage p50s tiling it: `host+network` = capture→received, `decode` = received→decoded; 16/17 +/// are the Phase-2 split of the `host+network` term from the per-AU 0xCF host timings — `host` = +/// the host's capture→sent, `network` = the remainder — both 0.0 when no timing matched this +/// window, i.e. an old host), or `null` when no decode thread is running. Poll ~1 Hz from the UI; each call /// resets the measurement window. Not android-gated — pure `jni` + connector reads, so it links on /// the host build too (Kotlin only ever calls it on device). #[no_mangle] @@ -100,7 +103,7 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats( let snap = h.stats.drain(); let mode = h.client.mode(); let color = h.client.color; - let buf: [f64; 16] = [ + let buf: [f64; 18] = [ snap.fps, snap.mbps, snap.e2e_p50_ms, @@ -122,6 +125,10 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats( // Stage p50s tiling the end-to-end headline (appended to keep 0–13 index-compatible). 
snap.hostnet_p50_ms, snap.decode_p50_ms, + // Phase-2 host/network split of the `host+network` stage (0xCF host timings): 0.0 + // when no timing matched this window (old host) — the HUD keeps the combined term. + snap.host_p50_ms, + snap.net_p50_ms, ]; let arr = match env.new_double_array(buf.len() as jsize) { Ok(a) => a, diff --git a/clients/android/native/src/stats.rs b/clients/android/native/src/stats.rs index 071aff1..45d28bb 100644 --- a/clients/android/native/src/stats.rs +++ b/clients/android/native/src/stats.rs @@ -1,7 +1,9 @@ //! Live decode stats for the on-stream HUD, following the unified stats spec //! (`design/stats-unification.md`): FPS, receive throughput, and the Android v1 stage split — //! headline `end-to-end` = capture→decoded (p50/p95) tiled by `host+network` = capture→received -//! and `decode` = received→decoded (stage p50s). The decode thread is the sole writer +//! and `decode` = received→decoded (stage p50s). When the host emits per-AU 0xCF host timings, the +//! `host+network` term further splits into `host` + `network` (Phase 2, `note_host_split`); an old +//! host emits none and the combined term stands. The decode thread is the sole writer //! (`note_received` per access unit at receipt, `note_decoded` per decoder output buffer); the JNI //! accessor `nativeVideoStats` drains a snapshot ~1 Hz and resets the window. Sampling is gated on //! the HUD actually being visible (`set_enabled`, driven by `nativeSetVideoStatsEnabled`) so the @@ -32,6 +34,12 @@ struct Inner { e2e_us: Vec, /// `host+network` stage = capture→received samples, in microseconds (skew-corrected). hostnet_us: Vec, + /// Phase-2 split of `host+network` (design/stats-unification.md Phase 2), fed only when the + /// host emits per-AU 0xCF timings: `host` = the host's own capture→sent duration, µs. + host_us: Vec, + /// The matching `network` term, µs: capture→received minus the host's capture→sent + /// (wire + reassembly). Always pushed in lockstep with `host_us`. 
+ net_us: Vec, /// `decode` stage = received→decoded samples, in microseconds (client-local, single clock). decode_us: Vec, /// Whether the host answered the clock-skew handshake (latency is cross-machine valid). @@ -50,6 +58,10 @@ pub struct Snapshot { /// Stage p50s (ms): `host+network` (capture→received) and `decode` (received→decoded). pub hostnet_p50_ms: f64, pub decode_p50_ms: f64, + /// Phase-2 `host` / `network` split p50s (ms) — 0.0 when no 0xCF timing matched this window + /// (old host / no samples yet), in which case the HUD keeps the combined `host+network` term. + pub host_p50_ms: f64, + pub net_p50_ms: f64, pub lat_valid: bool, pub skew_corrected: bool, } @@ -73,6 +85,8 @@ impl VideoStats { bytes: 0, e2e_us: Vec::with_capacity(256), hostnet_us: Vec::with_capacity(256), + host_us: Vec::with_capacity(256), + net_us: Vec::with_capacity(256), decode_us: Vec::with_capacity(256), skew_corrected: false, }), @@ -101,6 +115,8 @@ impl VideoStats { g.bytes = 0; g.e2e_us.clear(); g.hostnet_us.clear(); + g.host_us.clear(); + g.net_us.clear(); g.decode_us.clear(); } } @@ -128,6 +144,25 @@ impl VideoStats { } } + /// Record one matched host/network split sample (Phase 2): the host's reported capture→sent + /// duration and our capture→received minus it, both µs — one pair per AU whose 0xCF host + /// timing arrived and matched by pts. An old host emits none, leaving the vecs empty and the + /// snapshot p50s at 0 (HUD keeps the combined `host+network` term). + // Driven only by the android-only decode thread; unreferenced on the host build — expected. + #[cfg_attr(not(target_os = "android"), allow(dead_code))] + pub fn note_host_split(&self, host_us: u64, net_us: u64) { + if !self.enabled.load(Ordering::Relaxed) { + return; // HUD hidden — skip the lock + } + // Poison-proof for the same reason as `note_received`. 
+ let mut g = self + .inner + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner); + g.host_us.push(host_us); + g.net_us.push(net_us); + } + /// Record one decoded output frame: its capture→decoded `end-to-end` sample and its /// received→decoded `decode` stage sample (either may be absent — e.g. the receipt stamp for /// this pts predates the HUD being shown). @@ -163,6 +198,8 @@ impl VideoStats { let mbps = g.bytes as f64 * 8.0 / 1_000_000.0 / elapsed; g.e2e_us.sort_unstable(); g.hostnet_us.sort_unstable(); + g.host_us.sort_unstable(); + g.net_us.sort_unstable(); g.decode_us.sort_unstable(); let snap = Snapshot { fps, @@ -171,6 +208,8 @@ impl VideoStats { e2e_p95_ms: pctl_ms(&g.e2e_us, 0.95), hostnet_p50_ms: pctl_ms(&g.hostnet_us, 0.50), decode_p50_ms: pctl_ms(&g.decode_us, 0.50), + host_p50_ms: pctl_ms(&g.host_us, 0.50), + net_p50_ms: pctl_ms(&g.net_us, 0.50), lat_valid: !g.e2e_us.is_empty(), skew_corrected: g.skew_corrected, }; @@ -179,6 +218,8 @@ impl VideoStats { g.bytes = 0; g.e2e_us.clear(); g.hostnet_us.clear(); + g.host_us.clear(); + g.net_us.clear(); g.decode_us.clear(); snap } diff --git a/clients/apple/Sources/PunktfunkClient/ContentView.swift b/clients/apple/Sources/PunktfunkClient/ContentView.swift index d09bd35..bb9f66a 100644 --- a/clients/apple/Sources/PunktfunkClient/ContentView.swift +++ b/clients/apple/Sources/PunktfunkClient/ContentView.swift @@ -326,9 +326,14 @@ struct ContentView: View { onCaptureChange: { [weak model] captured in model?.mouseCaptured = captured }, - onFrame: { [meter = model.meter, latency = model.latency, offset = conn.clockOffsetNs] au in + onFrame: { [meter = model.meter, latency = model.latency, + split = model.latencySplit, offset = conn.clockOffsetNs] au in meter.note(byteCount: au.data.count) latency.record(ptsNs: au.ptsNs, offsetNs: offset) + // The same receipt, keyed by pts, awaiting its 0xCF host timing (the + // host/network split — drained by the 1 s stats tick). 
+ split.recordReceipt( + ptsNs: au.ptsNs, receivedNs: au.receivedNs, offsetNs: offset) }, onSessionEnd: { [weak model] in Task { @MainActor in model?.sessionEnded() } diff --git a/clients/apple/Sources/PunktfunkClient/Session/SessionModel.swift b/clients/apple/Sources/PunktfunkClient/Session/SessionModel.swift index b5f1547..c57c158 100644 --- a/clients/apple/Sources/PunktfunkClient/Session/SessionModel.swift +++ b/clients/apple/Sources/PunktfunkClient/Session/SessionModel.swift @@ -69,6 +69,14 @@ final class SessionModel: ObservableObject { @Published var hostNetworkP95Ms = 0.0 @Published var hostNetworkValid = false @Published var hostNetworkSkewCorrected = false + /// Phase 2 of the same stage: `host+network` split into its two terms via the host's per-AU + /// 0xCF timing reports (host = capture→fully-sent as the host measured it, network = the + /// remainder), matched to receipts by pts in `latencySplit`. `splitValid` is false whenever + /// no timing matched in the window — an old host that never emits the plane, or heavy 0xCF + /// loss — and the HUD then falls back to the combined `host+network` term. + @Published var hostP50Ms = 0.0 + @Published var networkP50Ms = 0.0 + @Published var splitValid = false /// End-to-end = capture→on-glass, measured directly per frame (never summed from the stages) — /// the HUD headline. Only the stage-2 presenter can stamp it (it owns decode + a /// CAMetalLayer/display-link present); stays invalid under stage-1, where the layer presents @@ -96,6 +104,10 @@ final class SessionModel: ObservableObject { /// Capture→received (the host+network stage), fed per AU at receipt by the stream view's /// onFrame — under both presenters. let latency = LatencyMeter() + /// The host/network split of that same stage: onFrame also records (pts, interval) receipts + /// here, and the 1 s stats tick drains the connection's 0xCF host timings into it — under + /// both presenters (the receipt path is presenter-independent). 
+ let latencySplit = HostNetworkSplitter() /// The stage-2 meters, passed to StreamView: end-to-end (capture→on-glass, stamped at /// present), decode (received→decoded), display (decoded→on-glass). let endToEnd = LatencyMeter() @@ -296,6 +308,7 @@ final class SessionModel: ObservableObject { fps = 0 mbps = 0 hostNetworkValid = false + splitValid = false endToEndValid = false decodeValid = false displayValid = false @@ -341,6 +354,7 @@ final class SessionModel: ObservableObject { private func startStatsTimer() { lastFramesDropped = 0 // a fresh connection's cumulative drop counter starts at 0 + latencySplit.reset() // no stale receipts/samples from a previous session let timer = Timer(timeInterval: 1.0, repeats: true) { [weak self] _ in guard let self else { return } Task { @MainActor in @@ -364,6 +378,25 @@ final class SessionModel: ObservableObject { } else { self.hostNetworkValid = false } + // Phase 2: drain the window's per-AU host timings (0xCF) into the splitter — + // non-blocking, bounded (a 240 fps window is ~240 reports; the cap only guards + // a pathological burst). `try?` flattens (SE-0230); a throw (.closed during + // teardown) just ends the drain. An old host never emits any → splitValid stays + // false and the HUD keeps the combined host+network term. + if let conn = self.connection { + var burst = 0 + while burst < 1024, let t = try? 
conn.nextHostTiming(timeoutMs: 0) { + self.latencySplit.noteHostTiming(ptsNs: t.ptsNs, hostUs: t.hostUs) + burst += 1 + } + } + if let s = self.latencySplit.drain() { + self.hostP50Ms = s.hostP50Ms + self.networkP50Ms = s.networkP50Ms + self.splitValid = true + } else { + self.splitValid = false + } if let e = self.endToEnd.drain() { self.endToEndP50Ms = e.p50Ms self.endToEndP95Ms = e.p95Ms diff --git a/clients/apple/Sources/PunktfunkClient/Session/StreamHUDView.swift b/clients/apple/Sources/PunktfunkClient/Session/StreamHUDView.swift index 5d5a65a..22f4d6d 100644 --- a/clients/apple/Sources/PunktfunkClient/Session/StreamHUDView.swift +++ b/clients/apple/Sources/PunktfunkClient/Session/StreamHUDView.swift @@ -26,20 +26,34 @@ struct StreamHUDView: View { Text("end-to-end \(model.endToEndP50Ms, specifier: "%.1f") ms p50 · \(model.endToEndP95Ms, specifier: "%.1f") p95 · capture→on-glass\(model.endToEndSkewCorrected ? "" : " (same-host clock)")") .font(.system(.caption2, design: .monospaced)) .foregroundStyle(.secondary) - // The equation: the three stages tiling the headline interval (per-window p50s — - // they only approximately sum to the directly-measured total). + // The equation: the stages tiling the headline interval (per-window p50s — + // they only approximately sum to the directly-measured total). With a host + // that reports per-AU timings (0xCF) the first term splits into host + network + // (phase 2); an old host keeps the combined term. 
if model.hostNetworkValid && model.decodeValid && model.displayValid { - Text("= host+network \(model.hostNetworkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")") - .font(.system(.caption2, design: .monospaced)) - .foregroundStyle(.secondary) + if model.splitValid { + Text("= host \(model.hostP50Ms, specifier: "%.1f") + network \(model.networkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")") + .font(.system(.caption2, design: .monospaced)) + .foregroundStyle(.secondary) + } else { + Text("= host+network \(model.hostNetworkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")") + .font(.system(.caption2, design: .monospaced)) + .foregroundStyle(.secondary) + } } } else if model.hostNetworkValid { // Stage-1 fallback presenter: the layer decodes + presents internally with no - // per-frame stamp, so the honest headline ends at receipt — and there is no - // equation line (host+network is the whole measured interval). + // per-frame stamp, so the honest headline ends at receipt. The host/network + // split still applies there (receipt is presenter-independent) — it becomes the + // only equation line; without it, host+network IS the whole measured interval. Text("capture→received \(model.hostNetworkP50Ms, specifier: "%.1f") ms p50 · \(model.hostNetworkP95Ms, specifier: "%.1f") p95\(model.hostNetworkSkewCorrected ? "" : " (same-host clock)")") .font(.system(.caption2, design: .monospaced)) .foregroundStyle(.secondary) + if model.splitValid { + Text("= host \(model.hostP50Ms, specifier: "%.1f") + network \(model.networkP50Ms, specifier: "%.1f")") + .font(.system(.caption2, design: .monospaced)) + .foregroundStyle(.secondary) + } } if model.lostFrames > 0 { // Unrecoverable network drops this window; hidden while the link is clean. 
diff --git a/clients/apple/Sources/PunktfunkKit/Connection/PunktfunkConnection.swift b/clients/apple/Sources/PunktfunkKit/Connection/PunktfunkConnection.swift index cc1740a..6423a3f 100644 --- a/clients/apple/Sources/PunktfunkKit/Connection/PunktfunkConnection.swift +++ b/clients/apple/Sources/PunktfunkKit/Connection/PunktfunkConnection.swift @@ -83,6 +83,9 @@ public final class PunktfunkConnection { /// Same role for the feedback drain thread (rumble + HID-output — two core planes, /// drained sequentially by one thread). private let feedbackLock = NSLock() + /// Same role for the host-timing (0xCF) puller — its own plane in the core, drained + /// non-blockingly by the app's 1 s stats tick (never contends with the blocking pullers). + private let statsLock = NSLock() /// Negotiated session mode (host-confirmed). public private(set) var width: UInt32 = 0 @@ -665,6 +668,40 @@ public final class PunktfunkConnection { } } + /// One per-AU host-timing report (0xCF): the host's capture→fully-sent duration for the + /// access unit whose `AccessUnit.ptsNs` equals `ptsNs` exactly. The stats consumer derives + /// `network = (receivedNs + clockOffsetNs − ptsNs) − hostUs` — the host/network split of the + /// HUD's `host+network` stage (design/stats-unification.md Phase 2). + public struct HostTiming: Sendable, Equatable { + /// The AU's capture stamp (host capture clock — matches the AU's `ptsNs`). + public let ptsNs: UInt64 + /// Host capture→sent duration, µs. + public let hostUs: UInt32 + } + + /// Pull the next per-AU host timing; nil on timeout, throws `.closed` once the session + /// ended. Best-effort plane: an older host never emits any — keep showing the combined + /// `host+network` stage then. Drain non-blockingly (`timeoutMs: 0`) from ONE stats + /// consumer (its own core plane, safe alongside the other pullers). + public func nextHostTiming(timeoutMs: UInt32 = 0) throws -> HostTiming? 
{ + statsLock.lock() + defer { statsLock.unlock() } + guard let h = liveHandle() else { throw PunktfunkClientError.closed } + + var out = PunktfunkHostTiming() + let rc = punktfunk_connection_next_host_timing(h, &out, timeoutMs) + switch rc { + case statusOK: + return HostTiming(ptsNs: out.pts_ns, hostUs: out.host_us) + case statusNoFrame: + return nil + case statusClosed: + throw PunktfunkClientError.closed + default: + throw PunktfunkClientError.status(rc) + } + } + /// Send one input event (delivered to the host as a QUIC datagram). Thread-safe; /// silently dropped after close. public func send(_ event: PunktfunkInputEvent) { @@ -684,10 +721,12 @@ public final class PunktfunkConnection { pumpLock.lock() // pullers exit at their next poll boundary, releasing these audioLock.lock() feedbackLock.lock() + statsLock.lock() abiLock.lock() let h = handle handle = nil abiLock.unlock() + statsLock.unlock() feedbackLock.unlock() audioLock.unlock() pumpLock.unlock() diff --git a/clients/apple/Sources/PunktfunkKit/Video/HostNetworkSplitter.swift b/clients/apple/Sources/PunktfunkKit/Video/HostNetworkSplitter.swift new file mode 100644 index 0000000..eefaa33 --- /dev/null +++ b/clients/apple/Sources/PunktfunkKit/Video/HostNetworkSplitter.swift @@ -0,0 +1,88 @@ +// Splits the unified stats model's `host+network` stage (capture→received) into its `host` +// (capture→fully-sent, reported per AU by the host on the 0xCF plane) and `network` +// (the remainder) terms — design/stats-unification.md Phase 2. +// +// Receipt samples are recorded per frame from the pump path; host timings are matched to them +// by exact pts (the 0xCF datagram carries the AU's own `pts_ns`). Best-effort by construction: +// a lost 0xCF datagram, an FEC-dropped AU, or an old host that never emits the plane simply +// contributes no split sample — the HUD then keeps the combined `host+network` line. 
NSLock +// rather than an actor — the receipt writer is the non-async pump path (same pattern as +// LatencyMeter/FrameMeter). + +import Foundation + +/// Per-frame `host` / `network` sampler: `recordReceipt` at AU receipt (pts + the combined +/// capture→received interval), `noteHostTiming` per drained 0xCF report, `drain` the window's +/// p50s once a second. The pending ring is bounded (drop-oldest) so an old host — receipts +/// forever, timings never — costs a fixed ~4 KB, not growth. +public final class HostNetworkSplitter: @unchecked Sendable { + private let lock = NSLock() + /// Received AUs awaiting their 0xCF host timing: (pts, combined capture→received µs). + private var pending: [(ptsNs: UInt64, combinedUs: Int64)] = [] + private var hostUsSamples: [Int64] = [] + private var networkUsSamples: [Int64] = [] + /// ~1 s of frames at 240 fps; beyond it the oldest receipt can no longer expect a match. + private static let pendingCap = 256 + + public init() {} + + /// Record one frame at receipt. `ptsNs` is the host capture clock (the AU's pts), + /// `receivedNs` the client `CLOCK_REALTIME` receipt instant (`AccessUnit.receivedNs`), + /// `offsetNs` the connect-time host−client clock offset (0 = uncorrected). Same + /// absurd-value clamp as LatencyMeter — a sample it would drop must not linger here. + public func recordReceipt(ptsNs: UInt64, receivedNs: Int64, offsetNs: Int64) { + let combinedNs = receivedNs &+ offsetNs &- Int64(bitPattern: ptsNs) + guard combinedNs > 0, combinedNs < 10_000_000_000 else { return } + lock.lock() + pending.append((ptsNs: ptsNs, combinedUs: combinedNs / 1000)) + if pending.count > Self.pendingCap { + pending.removeFirst(pending.count - Self.pendingCap) + } + lock.unlock() + } + + /// Match one host timing (0xCF) to its receipt: `host` = the reported capture→sent, + /// `network` = the combined interval minus it, floored at 0 (the terms tile per frame; a + /// slightly-off skew offset must not produce a negative wire time). 
Unmatched timings — + /// the AU was FEC-dropped, or its receipt raced this drain — are simply skipped. + public func noteHostTiming(ptsNs: UInt64, hostUs: UInt32) { + lock.lock() + defer { lock.unlock() } + guard let i = pending.firstIndex(where: { $0.ptsNs == ptsNs }) else { return } + let combinedUs = pending.remove(at: i).combinedUs + hostUsSamples.append(Int64(hostUs)) + networkUsSamples.append(max(0, combinedUs - Int64(hostUs))) + } + + public struct Split: Sendable { + public let hostP50Ms: Double + public let networkP50Ms: Double + public let count: Int + } + + /// The window's p50s since the last drain, then reset (matched samples only; the pending + /// ring survives — a receipt may still match a timing drained next tick). `nil` when no + /// timing matched in the interval — the caller falls back to the combined stage. + public func drain() -> Split? { + lock.lock() + let host = hostUsSamples.sorted() + let network = networkUsSamples.sorted() + hostUsSamples.removeAll(keepingCapacity: true) + networkUsSamples.removeAll(keepingCapacity: true) + lock.unlock() + guard !host.isEmpty else { return nil } + func p50(_ sorted: [Int64]) -> Double { + Double(sorted[min(sorted.count / 2, sorted.count - 1)]) / 1000.0 // µs → ms + } + return Split(hostP50Ms: p50(host), networkP50Ms: p50(network), count: host.count) + } + + /// Forget everything (pending receipts + window) — a fresh connection starts clean. 
+ public func reset() { + lock.lock() + pending.removeAll() + hostUsSamples.removeAll() + networkUsSamples.removeAll() + lock.unlock() + } +} diff --git a/clients/apple/Tests/PunktfunkKitTests/HostNetworkSplitterTests.swift b/clients/apple/Tests/PunktfunkKitTests/HostNetworkSplitterTests.swift new file mode 100644 index 0000000..b0582ef --- /dev/null +++ b/clients/apple/Tests/PunktfunkKitTests/HostNetworkSplitterTests.swift @@ -0,0 +1,107 @@ +// Unit tests for HostNetworkSplitter (the host/network split of the unified stats model's +// host+network stage — design/stats-unification.md Phase 2): pts matching, the per-frame +// tiling arithmetic (network = combined − host, floored at 0), drain/reset semantics, the +// bounded pending ring, and the absurd-receipt clamp. All samples use explicit instants, so +// the expectations are exact. + +import Foundation +import XCTest + +@testable import PunktfunkKit + +final class HostNetworkSplitterTests: XCTestCase { + /// An arbitrary host-capture pts (ns) far from zero, like a real CLOCK_REALTIME stamp. 
+ private let basePts: UInt64 = 1_000_000_000_000 + + private func receipt(_ s: HostNetworkSplitter, pts: UInt64, combinedMs: Int64, + offsetNs: Int64 = 0) { + s.recordReceipt( + ptsNs: pts, receivedNs: Int64(pts) + combinedMs * 1_000_000 - offsetNs, + offsetNs: offsetNs) + } + + func testEmptyDrainIsNil() { + XCTAssertNil(HostNetworkSplitter().drain()) + } + + func testMatchSplitsCombinedIntoHostAndNetwork() { + let s = HostNetworkSplitter() + receipt(s, pts: basePts, combinedMs: 8) // capture→received 8 ms + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // host says 3 ms of it was its own + guard let split = s.drain() else { return XCTFail("expected a matched sample") } + XCTAssertEqual(split.count, 1) + XCTAssertEqual(split.hostP50Ms, 3.0) + XCTAssertEqual(split.networkP50Ms, 5.0, "the two terms tile the combined interval") + XCTAssertNil(s.drain(), "drain resets the window") + } + + func testSkewOffsetAppliesToTheCombinedInterval() { + let s = HostNetworkSplitter() + // Client clock 2 ms behind the host: the raw difference alone would read 6 ms. + receipt(s, pts: basePts, combinedMs: 8, offsetNs: 2_000_000) + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) + XCTAssertEqual(s.drain()?.networkP50Ms, 5.0) + } + + func testUnmatchedTimingIsSkipped() { + let s = HostNetworkSplitter() + receipt(s, pts: basePts, combinedMs: 8) + // A timing for an AU we never received (FEC-dropped) must not fabricate a sample. 
+ s.noteHostTiming(ptsNs: basePts + 1, hostUs: 3_000) + XCTAssertNil(s.drain()) + } + + func testReceiptSurvivesADrainUntilItsTimingArrives() { + let s = HostNetworkSplitter() + receipt(s, pts: basePts, combinedMs: 8) + XCTAssertNil(s.drain(), "no timing matched yet") + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // arrives one tick late — still matches + XCTAssertEqual(s.drain()?.hostP50Ms, 3.0) + } + + func testEachReceiptMatchesOnce() { + let s = HostNetworkSplitter() + receipt(s, pts: basePts, combinedMs: 8) + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // duplicate 0xCF — no second sample + XCTAssertEqual(s.drain()?.count, 1) + } + + func testNetworkFlooredAtZero() { + let s = HostNetworkSplitter() + // A slightly-off skew offset can make host_us exceed the combined interval. + receipt(s, pts: basePts, combinedMs: 2) + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) + guard let split = s.drain() else { return XCTFail("expected a sample") } + XCTAssertEqual(split.hostP50Ms, 3.0) + XCTAssertEqual(split.networkP50Ms, 0.0) + } + + func testPendingRingDropsOldest() { + let s = HostNetworkSplitter() + for i in 0..<300 { // cap is 256 — the first receipts fall out + receipt(s, pts: basePts + UInt64(i), combinedMs: 8) + } + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // evicted — no match + XCTAssertNil(s.drain()) + s.noteHostTiming(ptsNs: basePts + 299, hostUs: 3_000) // newest — still pending + XCTAssertEqual(s.drain()?.count, 1) + } + + func testAbsurdReceiptsAreDropped() { + let s = HostNetworkSplitter() + receipt(s, pts: basePts, combinedMs: -1) // received before capture — clock step + receipt(s, pts: basePts + 1, combinedMs: 20_000) // > 10 s — garbage pts/offset + s.noteHostTiming(ptsNs: basePts, hostUs: 1_000) + s.noteHostTiming(ptsNs: basePts + 1, hostUs: 1_000) + XCTAssertNil(s.drain()) + } + + func testResetForgetsPendingReceipts() { + let s = HostNetworkSplitter() + receipt(s, pts: basePts, 
combinedMs: 8) + s.reset() + s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) + XCTAssertNil(s.drain(), "a fresh session must not match a previous session's receipts") + } +} diff --git a/clients/apple/Tests/PunktfunkKitTests/LoopbackIntegrationTests.swift b/clients/apple/Tests/PunktfunkKitTests/LoopbackIntegrationTests.swift index 5d22c93..f61120d 100644 --- a/clients/apple/Tests/PunktfunkKitTests/LoopbackIntegrationTests.swift +++ b/clients/apple/Tests/PunktfunkKitTests/LoopbackIntegrationTests.swift @@ -25,12 +25,18 @@ final class LoopbackIntegrationTests: XCTestCase { XCTAssertEqual(conn.resolvedBitrateKbps, 50_000) // Pull 25 synthetic frames and byte-verify the documented pattern: - // u32 LE frame index, then data[i] = (idx as u8) &+ (i as u8). + // u32 LE frame index, then data[i] = (idx as u8) &+ (i as u8). Alongside, drain the + // per-AU host-timing plane (0xCF) the way the app's stats tick does — the connector + // ORs VIDEO_CAP_HOST_TIMING in unconditionally and the synthetic host stamps one + // report per AU, so the pts correlation must hold end to end through the xcframework. 
var got = 0 var lastIndex: UInt32 = 0 + var receivedPts = Set<UInt64>() + var timings: [PunktfunkConnection.HostTiming] = [] let deadline = Date().addingTimeInterval(30) while got < 25 { XCTAssertLessThan(Date(), deadline, "timed out after \(got) frames") + while let t = try conn.nextHostTiming(timeoutMs: 0) { timings.append(t) } guard let au = try conn.nextAU(timeoutMs: 2000) else { continue } let idx = au.data.prefix(4).reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) } for (i, byte) in au.data.enumerated().dropFirst(4) { @@ -41,10 +47,22 @@ final class LoopbackIntegrationTests: XCTestCase { } } XCTAssertGreaterThan(au.ptsNs, 0) + receivedPts.insert(au.ptsNs) lastIndex = idx got += 1 } XCTAssertGreaterThanOrEqual(lastIndex, 24) + // Belt-and-braces: the last frame's timing lands just after its AU — give it a bounded + // grace drain (the stream keeps running, so this must not loop on fresh timings). + var grace = 0 + while grace < 64, !timings.contains(where: { receivedPts.contains($0.ptsNs) }), + let t = try conn.nextHostTiming(timeoutMs: 100) { + timings.append(t) + grace += 1 + } + XCTAssertTrue( + timings.contains { receivedPts.contains($0.ptsNs) }, + "no 0xCF host timing matched a received AU's pts (got \(timings.count) timings)") // Input goes the other way (enqueue-only; the host logs the count on close) — // including the touch kinds, gamepad events, the rich-input plane (DualSense diff --git a/clients/linux/src/session.rs b/clients/linux/src/session.rs index 4fe86d8..cc1d4ef 100644 --- a/clients/linux/src/session.rs +++ b/clients/linux/src/session.rs @@ -56,6 +56,16 @@ pub struct Stats { pub mbps: f32, /// p50 `host+network` stage: capture → received, host-clock corrected (ms). pub host_net_ms: f32, + /// p50 `host` stage: the host's own capture→fully-sent, from the per-AU 0xCF host + /// timings (design/stats-unification.md Phase 2). Valid only when `split`. 
+ pub host_ms: f32, + /// p50 `network` stage: capture→received minus the host-reported share + /// (`hostnet − host`, per-frame, saturating). Valid only when `split`. + pub net_ms: f32, + /// The window had matched host timings — the OSD splits `host+network` into + /// `host + network`. An old host never emits 0xCF, so this stays false and the + /// combined stage renders unchanged. + pub split: bool, /// p50 `decode` stage: received → decoded, single-clock client-local (ms). pub decode_ms: f32, /// Unrecoverable network frame drops this window, and their share of @@ -67,6 +77,11 @@ pub struct Stats { pub decoder: &'static str, } +/// Frames the pump keeps waiting for their 0xCF host timing (pts → capture→received µs). +/// ~2 s at 120 Hz — a timing arrives within a frame or two of its AU, and against an old +/// host (no 0xCF at all) this just caps the dead-weight ring. +const PENDING_SPLIT_CAP: usize = 256; + /// Sort a window of µs samples in place and return `(p50, p95)` per the spec's index /// rules (`sorted[len/2]`, `sorted[min(len*95/100, len-1)]`); an empty window reads 0. pub fn window_percentiles(samples: &mut [u64]) -> (u64, u64) { @@ -245,6 +260,12 @@ fn pump( // corrected), `decode` = received→decoded (client-local). p50 per 1 s window. let mut hostnet_us: Vec<u64> = Vec::with_capacity(256); let mut decode_us: Vec<u64> = Vec::with_capacity(256); + // Host/network split (Phase 2): frames awaiting their per-AU 0xCF host timing, + // correlated by pts_ns. Bounded — an old host never sends any, so entries just age out. + let mut pending_split: std::collections::VecDeque<(u64, u64)> = + std::collections::VecDeque::with_capacity(PENDING_SPLIT_CAP); + let mut host_us_win: Vec<u64> = Vec::with_capacity(256); + let mut net_us_win: Vec<u64> = Vec::with_capacity(256); // What actually decoded the last frame — a VAAPI failure demotes mid-session, so // this is read off each frame's image variant rather than fixed at startup. 
let mut dec_path: &'static str = ""; @@ -291,6 +312,12 @@ fn pump( .max(0) as u64; if hn > 0 && hn < 10_000_000_000 { hostnet_us.push(hn / 1000); + // Remember the sample for the host/network split — matched + // against the AU's 0xCF host timing when it arrives. + if pending_split.len() >= PENDING_SPLIT_CAP { + pending_split.pop_front(); + } + pending_split.push_back((frame.pts_ns, hn / 1000)); } // `decode` stage: received→decoded, single clock, no skew. decode_us.push(decoded_ns.saturating_sub(received_ns) / 1000); @@ -310,6 +337,19 @@ fn pump( Err(e) => break Some(format!("session: {e:?}")), } + // Drain the per-AU host timings (0xCF) non-blockingly and match them to received + // frames by pts: host = the host's own capture→sent, network = our + // capture→received minus it (the two tile per frame by construction). An old + // host never emits any — the deque fills to its cap and the OSD keeps the + // combined `host+network` stage. + while let Ok(t) = connector.next_host_timing(Duration::ZERO) { + if let Some(i) = pending_split.iter().position(|(p, _)| *p == t.pts_ns) { + let (_, hn_us) = pending_split.remove(i).unwrap(); + host_us_win.push(t.host_us as u64); + net_us_win.push(hn_us.saturating_sub(t.host_us as u64)); + } + } + // Loss recovery: under infinite GOP the only recovery keyframe is one we request. The // reassembler drops unrecoverable AUs (frames_dropped); the decoder then conceals the // reference-missing delta frames that follow and returns Ok, so keying off a decode error @@ -330,11 +370,17 @@ fn pump( let secs = window_start.elapsed().as_secs_f32(); let (hn_p50, _) = window_percentiles(&mut hostnet_us); let (dec_p50, _) = window_percentiles(&mut decode_us); + // Host/network split — present only when this window matched 0xCF timings. 
+ let split = !host_us_win.is_empty(); + let (host_p50, _) = window_percentiles(&mut host_us_win); + let (net_p50, _) = window_percentiles(&mut net_us_win); let lost = dropped.saturating_sub(window_dropped) as u32; window_dropped = dropped; tracing::debug!( fps = frames_n, hostnet_p50_us = hn_p50, + host_p50_us = host_p50, + net_p50_us = net_p50, decode_p50_us = dec_p50, lost, total_frames, @@ -344,6 +390,9 @@ fn pump( fps: frames_n as f32 / secs, mbps: bytes_n as f32 * 8.0 / 1e6 / secs, host_net_ms: hn_p50 as f32 / 1000.0, + host_ms: host_p50 as f32 / 1000.0, + net_ms: net_p50 as f32 / 1000.0, + split, decode_ms: dec_p50 as f32 / 1000.0, lost, lost_pct: if lost > 0 { @@ -358,6 +407,8 @@ fn pump( bytes_n = 0; hostnet_us.clear(); decode_us.clear(); + host_us_win.clear(); + net_us_win.clear(); } }; diff --git a/clients/linux/src/ui_stream.rs b/clients/linux/src/ui_stream.rs index 5b76248..b76a608 100644 --- a/clients/linux/src/ui_stream.rs +++ b/clients/linux/src/ui_stream.rs @@ -68,10 +68,28 @@ impl StreamPage { if self.hdr.get() { line1.push_str(" · HDR"); } + // The equation line: split `host+network` into `host + network` when the host + // reported per-AU timings (0xCF, stats Phase 2); the combined stage otherwise. 
+ let equation = if s.split { + format!( + "= host {:.1} + network {:.1} + decode {:.1} + display {:.1}", + s.host_ms, + s.net_ms, + s.decode_ms, + self.presented.display_ms.get(), + ) + } else { + format!( + "= host+network {:.1} + decode {:.1} + display {:.1}", + s.host_net_ms, + s.decode_ms, + self.presented.display_ms.get(), + ) + }; let mut text = format!( "{line1}\n\ end-to-end {:.1} ms p50 · {:.1} p95 · capture→displayed{}\n\ - = host+network {:.1} + decode {:.1} + display {:.1}", + {equation}", self.presented.e2e_p50_ms.get(), self.presented.e2e_p95_ms.get(), if self.same_host { @@ -79,9 +97,6 @@ impl StreamPage { } else { "" }, - s.host_net_ms, - s.decode_ms, - self.presented.display_ms.get(), ); // Counters — only rendered when nonzero this window. if s.lost > 0 { diff --git a/clients/windows/src/app/stream.rs b/clients/windows/src/app/stream.rs index a5ae9f6..49b058c 100644 --- a/clients/windows/src/app/stream.rs +++ b/clients/windows/src/app/stream.rs @@ -175,7 +175,8 @@ fn fmt_uptime(secs: u32) -> String { /// The streaming HUD overlay (top-right), unified stats vocabulary (design/stats-unification.md): /// a chip row (mode · codec · decode path · HDR), a stream line (received fps · goodput · /// presenter fps), the end-to-end headline (capture→on-glass p50/p95, host-clock corrected), the -/// stage equation (= host+network + decode + display, stage p50s), a session line +/// stage equation (= host + network + decode + display when the host reports 0xCF timings, else +/// the combined = host+network + decode + display; stage p50s), a session line /// (host · time · loss/skips), and the shortcut hints. Layered over the `SwapChainPanel` in the /// same grid cell. 
fn hud_overlay(hud: &HudSample, mode: Option, host: &str) -> Element { @@ -212,12 +213,21 @@ fn hud_overlay(hud: &HudSample, mode: Option, host: &str) -> Element { if stats.same_host { e2e_line.push_str(" (same-host clock)"); } - // The equation: the three stages tile the headline interval per frame; the window p50s only - // approximately sum (percentiles aren't additive). - let stage_line = format!( - "= host+network {:.1} + decode {:.1} + display {:.1}", - stats.hostnet_ms, stats.decode_ms, present.display_p50_ms - ); + // The equation: the stages tile the headline interval per frame; the window p50s only + // approximately sum (percentiles aren't additive). With per-AU 0xCF host timings the opaque + // `host+network` term splits into `host` (host capture→sent) + `network` (the remainder); + // an old host emits none and the combined term stays. + let stage_line = if stats.split { + format!( + "= host {:.1} + network {:.1} + decode {:.1} + display {:.1}", + stats.host_ms, stats.net_ms, stats.decode_ms, present.display_p50_ms + ) + } else { + format!( + "= host+network {:.1} + decode {:.1} + display {:.1}", + stats.hostnet_ms, stats.decode_ms, present.display_p50_ms + ) + }; let mut session_bits: Vec = Vec::new(); if !host.is_empty() { session_bits.push(host.to_string()); diff --git a/clients/windows/src/main.rs b/clients/windows/src/main.rs index 8882610..4c1c275 100644 --- a/clients/windows/src/main.rs +++ b/clients/windows/src/main.rs @@ -238,6 +238,18 @@ fn run_headless_cli(args: &[String], identity: (String, String)) { session::SessionEvent::Connected { mode, fingerprint, .. } => tracing::info!(?mode, fp = %trust::hex(&fingerprint), "connected"), + // With per-AU 0xCF host timings the combined host+network stage splits into + // host (capture→sent on the host) + net; an old host emits none → combined only. 
+ session::SessionEvent::Stats(s) if s.split => tracing::info!( + fps = format!("{:.0}", s.fps), + mbps = format!("{:.1}", s.mbps), + decode_p50_ms = format!("{:.2}", s.decode_ms), + hostnet_p50_ms = format!("{:.2}", s.hostnet_ms), + host_p50_ms = format!("{:.2}", s.host_ms), + net_p50_ms = format!("{:.2}", s.net_ms), + frames_seen, + "stats" + ), session::SessionEvent::Stats(s) => tracing::info!( fps = format!("{:.0}", s.fps), mbps = format!("{:.1}", s.mbps), diff --git a/clients/windows/src/session.rs b/clients/windows/src/session.rs index f2b89c6..c4b6c20 100644 --- a/clients/windows/src/session.rs +++ b/clients/windows/src/session.rs @@ -55,6 +55,15 @@ pub struct Stats { /// `host+network` stage p50 over the last 1 s window: capture (`pts_ns`) → received, /// host-clock corrected via `clock_offset_ns`. pub hostnet_ms: f32, + /// `host` stage p50 (host capture→sent, from the per-AU 0xCF host-timing plane). Valid only + /// when `split` — an old host emits no 0xCF and the HUD keeps the combined stage. + pub host_ms: f32, + /// `network` stage p50 (`hostnet − host`, tiled per frame before taking the percentile). + /// Valid only when `split`. + pub net_ms: f32, + /// True when any 0xCF host timings matched received AUs this window — the HUD then renders + /// `host + network` instead of the combined `host+network` term. + pub split: bool, /// True when `clock_offset_ns == 0` (host didn't answer the skew handshake / same host) — /// the HUD appends `(same-host clock)` to the end-to-end line. pub same_host: bool, @@ -330,6 +339,12 @@ fn pump( // 1 s tumbling stage windows (spec: design/stats-unification.md — percentiles, never means). let mut hostnet_us: Vec<u64> = Vec::with_capacity(256); let mut decode_us: Vec<u64> = Vec::with_capacity(256); + // Host/network split (Phase 2): received AUs awaiting their 0xCF host timing, `(pts_ns, + // hostnet_us)`, matched as the datagrams arrive. Bounded — an old host never sends any. 
+ let mut pending_split: std::collections::VecDeque<(u64, u64)> = + std::collections::VecDeque::with_capacity(256); + let mut host_us_w: Vec<u64> = Vec::with_capacity(256); + let mut net_us_w: Vec<u64> = Vec::with_capacity(256); let mut pcm = vec![0f32; 5760 * channels as usize]; // scratch: max Opus frame (120 ms) × channels // Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs. let mut last_dropped = connector.frames_dropped(); @@ -352,6 +367,11 @@ fn pump( .max(0) as u64; if hostnet > 0 && hostnet < 10_000_000_000 { hostnet_us.push(hostnet / 1000); + // Remember this AU for the 0xCF match below (host/network split). + pending_split.push_back((frame.pts_ns, hostnet / 1000)); + if pending_split.len() > 256 { + pending_split.pop_front(); + } } // A D3D11VA→software demotion (see `Decoder::decode`) starts a FRESH decoder that // has none of the stream's parameter sets; under infinite GOP it would sit on @@ -440,15 +460,34 @@ fn pump( *crate::present::LATEST_HDR_META.lock().unwrap() = Some(meta); } + // Drain the per-AU host-timing plane (0xCF) and match by pts: `host` = the host's own + // capture→sent, `network` = our capture→received minus it — the two tile per frame + // (design/stats-unification.md Phase 2). An old host never emits any; `split` stays false + // and the HUD keeps the combined `host+network` stage. 
+ while let Ok(t) = connector.next_host_timing(Duration::ZERO) { + if let Some(i) = pending_split.iter().position(|(p, _)| *p == t.pts_ns) { + let (_, hn_us) = pending_split.remove(i).unwrap(); + host_us_w.push(t.host_us as u64); + net_us_w.push(hn_us.saturating_sub(t.host_us as u64)); + } + } + if window_start.elapsed() >= Duration::from_secs(1) { let secs = window_start.elapsed().as_secs_f32(); hostnet_us.sort_unstable(); decode_us.sort_unstable(); + host_us_w.sort_unstable(); + net_us_w.sort_unstable(); let p50 = |v: &[u64]| v.get(v.len() / 2).copied().unwrap_or(0); let (hostnet_p50, decode_p50) = (p50(&hostnet_us), p50(&decode_us)); + let (host_p50, net_p50) = (p50(&host_us_w), p50(&net_us_w)); + let split = !host_us_w.is_empty(); tracing::debug!( fps = frames_n, hostnet_p50_us = hostnet_p50, + host_p50_us = host_p50, + net_p50_us = net_p50, + split, decode_p50_us = decode_p50, total_frames, "stream window" @@ -458,6 +497,9 @@ fn pump( mbps: bytes_n as f32 * 8.0 / 1e6 / secs, decode_ms: decode_p50 as f32 / 1000.0, hostnet_ms: hostnet_p50 as f32 / 1000.0, + host_ms: host_p50 as f32 / 1000.0, + net_ms: net_p50 as f32 / 1000.0, + split, same_host: clock_offset == 0, hardware, hdr, @@ -470,6 +512,8 @@ fn pump( bytes_n = 0; hostnet_us.clear(); decode_us.clear(); + host_us_w.clear(); + net_us_w.clear(); } }; diff --git a/design/stats-unification.md b/design/stats-unification.md index 38ac20a..693b094 100644 --- a/design/stats-unification.md +++ b/design/stats-unification.md @@ -121,22 +121,41 @@ Sunshine's "host processing latency" (capture→send). from frame-number gaps); our `fps` counts received only — equal at ~0 loss. - Moonlight decode/queue/render times are **means**; ours are p50s. -## Phase 2 (specced, not in v1): split `host+network` +## Phase 2 (implemented): split `host+network` via the 0xCF host-timing plane -Carry the host's capture→send duration per AU (host stamps it at send, e.g. 
a -varint-µs field in the AU header or a 0.1 ms u16 à la Sunshine's frame header). Client -then displays `host {x} + network {y}` instead of `host+network`, where -`network = (received − capture) − host_reported` — and the Moonlight matrix gains a -direct "Host processing latency" counterpart. Requires a core wire/ABI bump -(`punktfunk_frame` gains `host_latency_us`), trailing-byte back-compat like the -compositor/gamepad preference bytes. Also consider surfacing the QUIC path RTT -(quinn exposes it) as a diagnostics line, clearly labelled control-plane RTT. +Not an AU-header change after all — the hardened data-plane format stays untouched. +The host reports its share on the established QUIC side-plane pattern: + +- **Cap bit**: `Hello::video_caps` gains `VIDEO_CAP_HOST_TIMING` (0x08). NativeClient + ORs it in unconditionally; the probe sets it explicitly. Old hosts ignore it. +- **Datagram**: `HOST_TIMING_MAGIC` 0xCF, 13 bytes — `[tag][pts_ns u64 LE][host_us + u32 LE]` (`quic::HostTiming`). Emitted once per AU by the send thread right after + the AU's last packet left the socket, so `host_us` = capture→fully-sent (capture + read/convert, encode, FEC+seal, paced send) against the same anchor as the wire + pts. Speed-test filler (FLAG_PROBE) is skipped. The synthetic host emits it too + (loopback protocol tests cover the plane). +- **Client math**: correlate by `pts_ns` (a bounded pending ring of receipt samples), + `host = host_us`, `network = hostnet_sample − host_us` (saturating) — the two terms + tile the `host+network` stage per frame by construction. Equation line becomes + `= host {x} + network {y} + decode + display`; when no 0xCF arrived in the window + (old host / all datagrams lost) it falls back to the combined `host+network` term. 
+- **Surfaces**: `NativeClient::next_host_timing()` (Rust clients), C ABI + `PunktfunkHostTiming` + `punktfunk_connection_next_host_timing` (Apple), probe log + line `host/network latency split` (host_p50/p95_us · net_p50/p95_us). +- This is our direct analogue of Sunshine's "host processing latency" — ours + additionally includes the paced send (theirs stops just before the UDP send). + +Still open for a later phase: surfacing the QUIC path RTT (quinn exposes it) as a +diagnostics line, clearly labelled control-plane RTT. ## Implementation status -- [ ] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse) -- [ ] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95) -- [ ] Linux (`ui_stream.rs` OSD, `session.rs` window meters) -- [ ] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`) -- [ ] probe (rename `capture→reassembled` → `capture→received` in the log line) -- [ ] docs-site stats page + matrix; link from `moonlight.md` +- [x] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse) — 09a5957 +- [x] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95) — 09a5957 +- [x] Linux (`ui_stream.rs` OSD, `session.rs` window meters) — 09a5957 +- [x] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`) — 09a5957 +- [x] probe (rename `capture→reassembled` → `capture→received` in the log line) — 09a5957 +- [x] docs-site stats page + matrix; link from `moonlight.md` — 09a5957 +- [x] Phase 2 wire layer (0xCF + cap bit + NativeClient/ABI + host emission + probe split) — 449a67c +- [ ] Phase 2 client HUDs (host/network equation terms on Apple/Windows/Linux/Android) +- [ ] On-glass validation everywhere (Mac swift test + glass, Windows CI + glass, Android device, Linux glass) diff --git a/docs-site/content/docs/stats.md b/docs-site/content/docs/stats.md index a2e7bfa..c943432 100644 --- a/docs-site/content/docs/stats.md +++ b/docs-site/content/docs/stats.md @@ 
-25,7 +25,7 @@ life: ``` 1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass -= host+network 9.8 + decode 2.1 + display 2.3 += host 3.1 + network 6.7 + decode 2.1 + display 2.3 lost 3 (0.1%) · skipped 1 · FEC 12 ``` @@ -35,14 +35,18 @@ lost 3 (0.1%) · skipped 1 · FEC 12 capture to the endpoint named at the end of the line (`capture→on-glass` here). `p50` = the typical frame (median), `p95` = the slow outliers. This is the one number that summarizes your stream. -- **Line 3 — where the time goes.** The three stages **tile the end-to-end interval** - — each starts where the previous one ends, so they add up to the headline: - - `host+network` — capture → received: the host's capture/encode/send pipeline - *plus* the network flight and reassembly, in one number. +- **Line 3 — where the time goes.** The stages **tile the end-to-end interval** — + each starts where the previous one ends, so they add up to the headline: + - `host` — capture → sent: the host's own share (capture read, encode, error + coding, the paced send), reported by the host itself once per frame. + - `network` — sent → received: the network flight plus reassembly on your device. - `decode` — received → decoded, on your device. - `display` — decoded → displayed: waiting for the right screen refresh, rendering, and vsync. + Against an **older host** that doesn't report its share yet, the first two terms + merge into a single `host+network` number — same total, one split fewer. + (Stage values are per-stage medians, so they sum only *approximately* to the headline median — percentiles aren't perfectly additive. The headline is measured directly, never computed as a sum.) @@ -109,10 +113,10 @@ stands in for a one-way frame flight that Moonlight doesn't measure.) 
| `Incoming frame rate from network` | Frames reassembled from the network per second | `fps` (line 1) | **Yes — direct** | | `Decoding frame rate` (desktop only) | Frames leaving the decoder per second | not shown separately (equals `fps` unless the decoder is falling behind) | — | | `Rendering frame rate` (desktop only) | Frames actually presented per second | `fps` minus `skipped` | Approximately | -| `Host processing latency min/max/avg` (Sunshine hosts) | Host capture → just-before-send, reported by Sunshine per frame | contained inside `host+network`; the host-side breakdown lives in the punktfunk web console (capture/encode/send stages) | Indirect — punktfunk's `host+network` additionally includes the network flight | +| `Host processing latency min/max/avg` (Sunshine hosts) | Host capture → just-before-send, reported by Sunshine per frame | `host` (line 3) — the host reports capture→fully-sent per frame the same way | **Yes — direct** (punktfunk's includes the paced send itself, Sunshine's stops just before it; avg vs p50) | | `Frames dropped by your network connection` | Frame-sequence gaps ÷ total frames | `lost` (line 4) | **Yes — direct** | | `Frames dropped due to network jitter` | Decoded frames the *client's pacer* chose to drop ÷ decoded frames | `skipped` (line 4) | Approximately (both are client-side pacing decisions, despite Moonlight's name) | -| `Average network latency` | The **control connection's round-trip time** (ENet RTT + variance) — not video frame latency | none, on purpose | **No.** An RTT is not a frame latency; punktfunk measures the actual per-frame path instead | +| `Average network latency` | The **control connection's round-trip time** (ENet RTT + variance) — not video frame latency | `network` (line 3) is the closest concept, but it's the *actual one-way frame path* (flight + reassembly), not an RTT | **No direct comparison.** Roughly, punktfunk's `network` ≈ ½ × an idle RTT plus serialization time of the frame | | `Average 
decoding time` | Mean time from decoder enqueue to picture out | `decode` (p50) | Yes (mean vs median; both include decoder queueing) | | `Average frame queue delay` | Mean time a decoded frame waits for its vsync slot | inside `display` | Sum the two Moonlight lines → | | `Average rendering time (incl. V-sync latency)` | Mean duration of the present call | inside `display` | …and compare against punktfunk's `display` |
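
Reviewer note: the per-frame split that all four clients implement can be condensed into the minimal Rust sketch below. It is an illustration under stated assumptions, not code from this patch: the `Splitter` type, its method names, and the `PENDING_CAP` constant are hypothetical, chosen to mirror the 256-entry pending ring, the pts match, the saturating subtraction, and the "no match means fall back to the combined term" rule described in the hunks above.

```rust
use std::collections::VecDeque;

/// Hypothetical cap, mirroring the 256-entry pending rings in the patch.
const PENDING_CAP: usize = 256;

/// Illustrative stand-in for the per-client split state (not a type from the patch).
#[derive(Default)]
struct Splitter {
    /// Received frames awaiting a host timing: (pts_ns, capture→received µs).
    pending: VecDeque<(u64, u64)>,
    host_us: Vec<u64>,
    net_us: Vec<u64>,
}

impl Splitter {
    /// Remember one received frame; drop the oldest entry once the ring is full.
    fn record_receipt(&mut self, pts_ns: u64, hostnet_us: u64) {
        if self.pending.len() >= PENDING_CAP {
            self.pending.pop_front();
        }
        self.pending.push_back((pts_ns, hostnet_us));
    }

    /// Match one host timing by pts; unmatched timings are skipped.
    fn note_host_timing(&mut self, pts_ns: u64, host_us: u32) {
        if let Some(i) = self.pending.iter().position(|(p, _)| *p == pts_ns) {
            if let Some((_, hostnet_us)) = self.pending.remove(i) {
                let host = u64::from(host_us);
                self.host_us.push(host);
                // network = combined − host, floored at 0.
                self.net_us.push(hostnet_us.saturating_sub(host));
            }
        }
    }

    /// Window p50s in µs as (host, network), then reset the window.
    /// `None` means no timing matched: the caller renders the combined term.
    fn drain(&mut self) -> Option<(u64, u64)> {
        if self.host_us.is_empty() {
            return None;
        }
        self.host_us.sort_unstable();
        self.net_us.sort_unstable();
        let p50 = |v: &[u64]| v[v.len() / 2];
        let out = (p50(&self.host_us), p50(&self.net_us));
        self.host_us.clear();
        self.net_us.clear();
        Some(out)
    }
}
```

The pending ring deliberately survives `drain`, so a receipt recorded in one window can still match a timing that arrives in the next; only the matched samples are cleared.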