feat(clients): unified stats vocabulary across every client + Moonlight comparison docs

One stat model everywhere (design/stats-unification.md): four measurement
points (capture/received/decoded/displayed), three stages that tile the
interval exactly, and a HUD that shows the addition explicitly —

  end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
  = host+network 9.8 + decode 2.1 + display 2.3

replacing each client's ad-hoc mix of overlapping absolutes (the Apple HUD's
three arrow lines that looked sequential but weren't), mean-vs-median decode
times (Windows/Linux), missing same-host-clock flags (Windows/Linux), and
three different names for the same capture→received measurement (probe's
"reassembled", Apple/Android's "client", Windows/Linux's post-decode "lat").

Per client: Apple threads receivedNs through the VT decode via the frame
refcon bit pattern so the decode stage exists at all (stage-1 fallback
honestly degrades to a capture→received headline); Windows carries
FrameTimes through the existing frame channel to the render thread and adds
e2e p50/p95 post-Present; Linux stamps received at AU pop and rides
decoded_ns on DecodedFrame to the paintable-set site; Android pairs receipt
stamps with MediaCodec output buffers via the codec's pts round-trip (JNI
stats array 14→16 doubles, indexes 0-13 unchanged). fps now uniformly counts
received AUs; lost/(received+lost) per window, hidden at zero.

docs-site gains "Understanding the Stats Overlay": what each line means, why
the equation only approximately sums (percentiles), and a line-by-line
Moonlight/Sunshine matrix — including that Moonlight has no end-to-end
number and its "network latency" is an ENet control RTT, so punktfunk's
headline must not be compared against any single Moonlight line.

Verified here: linux client + probe + core check/clippy/fmt green, android
native cargo-ndk arm64 check green. Pending: Windows CI + on-glass, swift
test on the mac, on-device Android.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
2026-07-03 21:01:29 +00:00
parent c7630ff5dc
commit 09a5957c6d
38 changed files with 1122 additions and 380 deletions
+89 -40
View File
@@ -1,8 +1,11 @@
//! Live decode stats for the on-stream HUD (mirrors the Apple client's stats overlay): FPS,
//! receive throughput, and capture→client-receipt latency (p50/p95). The decode thread is the sole
//! writer (`note` per access unit); the JNI accessor `nativeVideoStats` drains a snapshot ~1 Hz and
//! resets the window. Sampling is gated on the HUD actually being visible (`set_enabled`, driven by
//! `nativeSetVideoStatsEnabled`) so the hidden steady state costs one relaxed atomic load per frame.
//! Live decode stats for the on-stream HUD, following the unified stats spec
//! (`design/stats-unification.md`): FPS, receive throughput, and the Android v1 stage split —
//! headline `end-to-end` = capture→decoded (p50/p95) tiled by `host+network` = capture→received
//! and `decode` = received→decoded (stage p50s). The decode thread is the sole writer
//! (`note_received` per access unit at receipt, `note_decoded` per decoder output buffer); the JNI
//! accessor `nativeVideoStats` drains a snapshot ~1 Hz and resets the window. Sampling is gated on
//! the HUD actually being visible (`set_enabled`, driven by `nativeSetVideoStatsEnabled`) so the
//! hidden steady state costs one relaxed atomic load per frame.
//! Pure `std` so it compiles on the host build too (the decode thread is android-only, but
//! `SessionHandle` holds the shared handle unconditionally).
@@ -13,9 +16,9 @@ use std::time::Instant;
/// Rolling per-window accumulator. Rates are computed over the actual elapsed wall-time at drain
/// (robust to poll jitter), so a poll that lands at 0.9 s or 1.1 s still reports the right FPS.
pub struct VideoStats {
/// HUD gate: `note` runs on the per-frame decode path, so while the overlay is hidden it (and
/// the caller's latency computation — see `enabled`) early-outs on this flag alone. Off until
/// Kotlin shows the HUD.
/// HUD gate: the samplers run on the per-frame decode path, so while the overlay is hidden
/// they (and the caller's latency computation — see `enabled`) early-out on this flag alone.
/// Off until Kotlin shows the HUD.
enabled: AtomicBool,
inner: Mutex<Inner>,
}
@@ -24,23 +27,42 @@ struct Inner {
window_start: Instant,
frames: u64,
bytes: u64,
/// capture→client-receipt latency samples for this window, in microseconds.
lat_us: Vec<u64>,
/// `end-to-end` = capture→decoded latency samples for this window, in microseconds
/// (skew-corrected clock base).
e2e_us: Vec<u64>,
/// `host+network` stage = capture→received samples, in microseconds (skew-corrected).
hostnet_us: Vec<u64>,
/// `decode` stage = received→decoded samples, in microseconds (client-local, single clock).
decode_us: Vec<u64>,
/// Whether the host answered the clock-skew handshake (latency is cross-machine valid).
skew_corrected: bool,
}
/// A drained, computed view of one window. `lat_valid` is false when no in-range latency sample
/// landed (then p50/p95 are 0 and the HUD hides the latency line, exactly like the Apple client).
/// A drained, computed view of one window. `lat_valid` is false when no in-range end-to-end sample
/// landed (then the latency figures are 0 and the HUD hides the latency lines, exactly like the
/// Apple client).
pub struct Snapshot {
pub fps: f64,
pub mbps: f64,
pub lat_p50_ms: f64,
pub lat_p95_ms: f64,
/// Headline `end-to-end` (capture→decoded) percentiles, ms.
pub e2e_p50_ms: f64,
pub e2e_p95_ms: f64,
/// Stage p50s (ms): `host+network` (capture→received) and `decode` (received→decoded).
pub hostnet_p50_ms: f64,
pub decode_p50_ms: f64,
pub lat_valid: bool,
pub skew_corrected: bool,
}
/// Percentile over a sorted-in-place µs sample vec, in ms. 0.0 when empty.
fn pctl_ms(sorted_us: &[u64], p: f64) -> f64 {
if sorted_us.is_empty() {
return 0.0;
}
let n = sorted_us.len();
sorted_us[((n as f64 * p) as usize).min(n - 1)] as f64 / 1000.0
}
impl VideoStats {
pub fn new() -> VideoStats {
VideoStats {
@@ -49,14 +71,16 @@ impl VideoStats {
window_start: Instant::now(),
frames: 0,
bytes: 0,
lat_us: Vec::with_capacity(256),
e2e_us: Vec::with_capacity(256),
hostnet_us: Vec::with_capacity(256),
decode_us: Vec::with_capacity(256),
skew_corrected: false,
}),
}
}
/// Whether the HUD wants samples. The decode thread checks this BEFORE building a latency
/// sample, so the per-frame wall-clock read is skipped too while hidden.
/// sample, so the per-frame wall-clock reads are skipped too while hidden.
// Read only by the android-only decode thread; unreferenced on the host build — expected.
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
pub fn enabled(&self) -> bool {
@@ -75,18 +99,21 @@ impl VideoStats {
g.window_start = Instant::now();
g.frames = 0;
g.bytes = 0;
g.lat_us.clear();
g.e2e_us.clear();
g.hostnet_us.clear();
g.decode_us.clear();
}
}
/// Record one decoded access unit: its wire size and (if in range) its capture→client latency.
/// Record one received access unit: its wire size and (if in range) its capture→received
/// `host+network` stage sample. Receipt is the fps/goodput counting point per the spec.
// Driven only by the android-only decode thread; unreferenced on the host build — expected.
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
pub fn note(&self, bytes: usize, lat_us: Option<u64>, skew_corrected: bool) {
pub fn note_received(&self, bytes: usize, hostnet_us: Option<u64>, skew_corrected: bool) {
if !self.enabled.load(Ordering::Relaxed) {
return; // HUD hidden — skip the lock (the caller already skipped the clock read)
}
// Poison-proof: `note` runs per-frame on the decode thread, which has no catch_unwind —
// Poison-proof: this runs per-frame on the decode thread, which has no catch_unwind —
// a panic elsewhere must not turn every later lock into a second panic (the counters
// stay consistent regardless).
let mut g = self
@@ -96,14 +123,37 @@ impl VideoStats {
g.frames += 1;
g.bytes += bytes as u64;
g.skew_corrected = skew_corrected;
if let Some(l) = lat_us {
g.lat_us.push(l);
if let Some(l) = hostnet_us {
g.hostnet_us.push(l);
}
}
/// Record one decoded output frame: its capture→decoded `end-to-end` sample and its
/// received→decoded `decode` stage sample (either may be absent — e.g. the receipt stamp for
/// this pts predates the HUD being shown).
// Driven only by the android-only decode thread; unreferenced on the host build — expected.
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
pub fn note_decoded(&self, e2e_us: Option<u64>, decode_us: Option<u64>) {
if !self.enabled.load(Ordering::Relaxed) {
return; // HUD hidden — skip the lock (the caller already skipped the clock read)
}
// Poison-proof for the same reason as `note_received`.
let mut g = self
.inner
.lock()
.unwrap_or_else(std::sync::PoisonError::into_inner);
if let Some(l) = e2e_us {
g.e2e_us.push(l);
}
if let Some(l) = decode_us {
g.decode_us.push(l);
}
}
/// Compute the window's rates + latency percentiles, then reset for the next window.
pub fn drain(&self) -> Snapshot {
// Poison-proof for the same reason as `note` — a poisoned window still drains fine.
// Poison-proof for the same reason as `note_received` — a poisoned window still drains
// fine.
let mut g = self
.inner
.lock()
@@ -111,26 +161,25 @@ impl VideoStats {
let elapsed = g.window_start.elapsed().as_secs_f64().max(1e-3);
let fps = g.frames as f64 / elapsed;
let mbps = g.bytes as f64 * 8.0 / 1_000_000.0 / elapsed;
let (p50, p95, valid) = if g.lat_us.is_empty() {
(0.0, 0.0, false)
} else {
g.lat_us.sort_unstable();
let n = g.lat_us.len();
let at = |p: f64| g.lat_us[((n as f64 * p) as usize).min(n - 1)] as f64 / 1000.0;
(at(0.50), at(0.95), true)
g.e2e_us.sort_unstable();
g.hostnet_us.sort_unstable();
g.decode_us.sort_unstable();
let snap = Snapshot {
fps,
mbps,
e2e_p50_ms: pctl_ms(&g.e2e_us, 0.50),
e2e_p95_ms: pctl_ms(&g.e2e_us, 0.95),
hostnet_p50_ms: pctl_ms(&g.hostnet_us, 0.50),
decode_p50_ms: pctl_ms(&g.decode_us, 0.50),
lat_valid: !g.e2e_us.is_empty(),
skew_corrected: g.skew_corrected,
};
let skew = g.skew_corrected;
g.window_start = Instant::now();
g.frames = 0;
g.bytes = 0;
g.lat_us.clear();
Snapshot {
fps,
mbps,
lat_p50_ms: p50,
lat_p95_ms: p95,
lat_valid: valid,
skew_corrected: skew,
}
g.e2e_us.clear();
g.hostnet_us.clear();
g.decode_us.clear();
snap
}
}