feat(clients): host/network split in every stats HUD (stats phase 2, client side)

Consumes the 0xCF host-timing plane (449a67c) on all four GUI clients: each
keeps a bounded pending ring of receipt samples keyed by pts, matches the
host's per-AU capture→sent reports against it, and the HUD equation becomes

    = host 3.1 + network 6.7 + decode 2.1 + display 2.3

falling back to the combined `= host+network …` term whenever no timing
matched the window (old host / datagram loss): same total, one split
fewer, never a misleading zero. Apple additionally gains the split as the
only equation line under the stage-1 fallback presenter (receipt is
presenter-independent), a `nextHostTiming` wrapper with its own plane lock,
and a unit-tested `HostNetworkSplitter`; Android extends the JNI stats
array 16→18 doubles (0–15 unchanged); Windows/Linux thread the split
through `Stats` into the HUD and the headless/debug logs.
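
The matching described above can be sketched roughly as follows. This is a minimal illustration in Rust, not the code from this commit: `SplitRing`, `PENDING_CAP`, `record_receipt`, `match_timing` and `equation` are hypothetical names, and the cap of 256 only mirrors the role `PENDING_SPLIT_CAP` plays in the Android decode loop.

```rust
use std::collections::VecDeque;

/// Hypothetical cap on receipts awaiting a host timing (assumption for this sketch).
const PENDING_CAP: usize = 256;

/// Bounded ring of receipts awaiting their host timing,
/// stored as (pts_ns, capture→received in µs).
#[derive(Default)]
pub struct SplitRing {
    pending: VecDeque<(u64, u64)>,
}

impl SplitRing {
    /// Park one receipt; evict the oldest entry once the ring is over the cap
    /// (its timing was lost, or the host never sends any).
    pub fn record_receipt(&mut self, pts_ns: u64, hostnet_us: u64) {
        self.pending.push_back((pts_ns, hostnet_us));
        if self.pending.len() > PENDING_CAP {
            self.pending.pop_front();
        }
    }

    /// Match one host timing by exact pts. On a match, returns
    /// (host_us, network_us) with network = hostnet - host, saturating at 0
    /// so clock jitter cannot produce a negative term.
    pub fn match_timing(&mut self, pts_ns: u64, host_us: u64) -> Option<(u64, u64)> {
        let i = self.pending.iter().position(|&(p, _)| p == pts_ns)?;
        let (_, hostnet_us) = self.pending.remove(i)?;
        Some((host_us, hostnet_us.saturating_sub(host_us)))
    }
}

/// Format the HUD equation line: the split form when a (host, network) pair
/// is available for the window, otherwise the combined host+network term.
pub fn equation(split: Option<(f64, f64)>, hostnet_ms: f64, decode_ms: f64, display_ms: f64) -> String {
    match split {
        Some((host_ms, net_ms)) => format!(
            "= host {:.1} + network {:.1} + decode {:.1} + display {:.1}",
            host_ms, net_ms, decode_ms, display_ms
        ),
        None => format!(
            "= host+network {:.1} + decode {:.1} + display {:.1}",
            hostnet_ms, decode_ms, display_ms
        ),
    }
}
```

In this sketch a receipt recorded with a capture→received of 9800 µs and later matched with a host timing of 3100 µs yields the pair (3100, 6700); a timing whose pts has no parked receipt is simply ignored, which is what lets an old host or a lost datagram degrade to the combined term.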

Docs updated: design/stats-unification.md Phase 2 → implemented (wire
format, fallback semantics), and the docs-site matrix's Sunshine "Host
processing latency" row is now a direct match (ours includes the paced
send; avg vs p50).

Verified here: Linux client clippy -D warnings green on the live tree,
Windows stub check + hand-verified diff, Android cargo-ndk arm64 check
green, Apple loopback test extended (needs the rebuilt xcframework + swift
test on the Mac). On-glass: pending on all platforms.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -16,12 +16,15 @@ import kotlin.math.roundToInt
 
 /**
  * The live stats overlay — the unified HUD (`design/stats-unification.md`, Android v1: headline is
- * `capture→decoded`, tiled by `host+network` + `decode`). Reads the 16-double layout from
+ * `capture→decoded`, tiled by `host+network` + `decode`). Reads the 18-double layout from
  * [NativeBridge.nativeVideoStats]:
  * `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skew, w, h, hz, lost, bitDepth, colorPrimaries,
- * colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`. Indexes 10–13 (present on a current
- * native lib) describe the negotiated video feed and render as a codec/depth/colour/chroma line;
- * 14/15 render as the stage equation; older layouts just omit those lines.
+ * colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms, hostP50Ms, netP50Ms]`. Indexes 10–13
+ * (present on a current native lib) describe the negotiated video feed and render as a
+ * codec/depth/colour/chroma line; 14/15 render as the stage equation — split into
+ * `host + network + decode` when the Phase-2 terms at 16/17 are nonzero (a current host sends
+ * per-AU 0xCF timings; an old host leaves them 0 and the combined `host+network` term stands);
+ * older layouts just omit those lines.
  */
 @Composable
 internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) {
@@ -60,8 +63,16 @@ internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) {
                 fontSize = 12.sp,
             )
             if (s.size >= 16) {
+                // Phase-2 split (s[16]/s[17]): render `host + network` separately when the host
+                // reported its share this window; otherwise the combined term (old host / no
+                // matched 0xCF timing).
+                val equation = if (s.size >= 18 && s[16] > 0) {
+                    "= host ${"%.1f".format(s[16])} + network ${"%.1f".format(s[17])} + decode ${"%.1f".format(s[15])}"
+                } else {
+                    "= host+network ${"%.1f".format(s[14])} + decode ${"%.1f".format(s[15])}"
+                }
                 Text(
-                    "= host+network ${"%.1f".format(s[14])} + decode ${"%.1f".format(s[15])}",
+                    equation,
                     color = Color.White,
                     fontFamily = FontFamily.Monospace,
                     fontSize = 12.sp,
@@ -105,14 +105,17 @@ object NativeBridge {
 
     /**
      * Drain ~1 s of live decode stats for the on-stream HUD, or `null` when no decode thread runs.
-     * Returns 16 doubles (unified stats spec, `design/stats-unification.md`):
+     * Returns 18 doubles (unified stats spec, `design/stats-unification.md`):
      * `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skewCorrected, width, height, refreshHz, framesLost,
-     * bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`
+     * bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms, hostP50Ms,
+     * netP50Ms]`
      * (the two flags are 1.0/0.0; indexes 2/3 are the end-to-end capture→decoded headline; 10–13
      * describe the negotiated video feed — bit depth 8/10, CICP primaries/transfer, and the HEVC
      * chroma_format_idc 1=4:2:0 / 3=4:4:4; 14/15 are the stage p50s tiling the headline —
-     * `host+network` = capture→received, `decode` = received→decoded). Poll ~1 Hz; each call
-     * resets the measurement window.
+     * `host+network` = capture→received, `decode` = received→decoded; 16/17 split the
+     * `host+network` term via the host's per-AU 0xCF timings — `host` = the host's capture→sent,
+     * `network` = the remainder — both 0.0 when no timing matched this window, i.e. an old host).
+     * Poll ~1 Hz; each call resets the measurement window.
      */
     external fun nativeVideoStats(handle: Long): DoubleArray?
 
@@ -25,6 +25,11 @@ use std::time::{Duration, Instant};
 /// flight, so anything beyond this is stale (codec flushed / HUD toggled) and gets evicted.
 const IN_FLIGHT_CAP: usize = 64;
 
+/// Cap on received AUs awaiting their 0xCF host timing (Phase 2 host/network split): the timing
+/// datagram trails its AU by at most the wire, so a match lands within a frame or two — anything
+/// this deep is a lost datagram (or an old host that never sends any) and gets evicted.
+const PENDING_SPLIT_CAP: usize = 256;
+
 /// The decode loop. Runs on the `pf-decode` thread until `shutdown` is set or the session closes.
 pub fn run(
     client: Arc<NativeClient>,
@@ -155,6 +160,11 @@ pub fn run(
     // point (output-buffer dequeue — MediaCodec round-trips presentationTimeUs) can be paired back
     // to its receipt for the `decode` stage. Only fed while the HUD is visible.
     let mut in_flight: VecDeque<(u64, i128)> = VecDeque::new();
+    // Phase-2 host/network split (design/stats-unification.md): received AUs awaiting their 0xCF
+    // host timing, as (pts_ns, capture→received µs). The timings are drained non-blockingly right
+    // where receipts are recorded and matched by pts; `network = hostnet − host` (saturating).
+    // Only fed while the HUD is visible; an old host never sends a 0xCF, so entries just age out.
+    let mut pending_split: VecDeque<(u64, u64)> = VecDeque::new();
     // The dataspace we've signalled on the Surface so far (None = default/SDR). Set reactively once
     // the decoder reports an HDR stream (see `drain`); avoids re-applying every format event.
     let mut applied_ds: Option<DataSpace> = None;
@@ -190,6 +200,26 @@ pub fn run(
                     if in_flight.len() > IN_FLIGHT_CAP {
                         in_flight.pop_front(); // stale — codec never echoed it back
                     }
+                    // Phase-2 split: park this AU's capture→received sample, then match any
+                    // 0xCF host timings that have arrived — host = the host's own
+                    // capture→sent, network = our capture→received minus it (per-frame
+                    // tiling; saturating in case of clock jitter).
+                    if let Some(hostnet_us) = lat_us {
+                        pending_split.push_back((frame.pts_ns, hostnet_us));
+                        if pending_split.len() > PENDING_SPLIT_CAP {
+                            pending_split.pop_front(); // 0xCF lost / old host — evict
+                        }
+                    }
+                    while let Ok(t) = client.next_host_timing(Duration::ZERO) {
+                        if let Some(i) = pending_split.iter().position(|&(p, _)| p == t.pts_ns)
+                        {
+                            let (_, hostnet_us) = pending_split.remove(i).unwrap();
+                            stats.note_host_split(
+                                t.host_us as u64,
+                                hostnet_us.saturating_sub(t.host_us as u64),
+                            );
+                        }
+                    }
                 }
                 pending = Some(frame);
             }
@@ -73,13 +73,16 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeStopVideo(
 }
 
 /// `NativeBridge.nativeVideoStats(handle): DoubleArray?` — drain ~1 s of decode stats for the HUD
-/// (unified stats spec, `design/stats-unification.md`). Returns 16 doubles
+/// (unified stats spec, `design/stats-unification.md`). Returns 18 doubles
 /// `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skewCorrected, width, height, refreshHz, framesLost,
-/// bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`
-/// (the two flags are 1.0/0.0; indexes 0–13 match the previous 14-double layout with the latency
-/// pair re-based from capture→received to the end-to-end capture→decoded headline; the two stage
-/// p50s tiling it — `host+network` = capture→received, `decode` = received→decoded — are appended
-/// at the end), or `null` when no decode thread is running. Poll ~1 Hz from the UI; each call
+/// bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms, hostP50Ms,
+/// netP50Ms]`
+/// (the two flags are 1.0/0.0; indexes 0–15 match the previous 16-double layout — 0–13 the original
+/// 14-double one with the latency pair re-based to the end-to-end capture→decoded headline, 14/15
+/// the stage p50s tiling it: `host+network` = capture→received, `decode` = received→decoded; 16/17
+/// are the Phase-2 split of the `host+network` term from the per-AU 0xCF host timings — `host` =
+/// the host's capture→sent, `network` = the remainder — both 0.0 when no timing matched this
+/// window, i.e. an old host), or `null` when no decode thread is running. Poll ~1 Hz from the UI; each call
 /// resets the measurement window. Not android-gated — pure `jni` + connector reads, so it links on
 /// the host build too (Kotlin only ever calls it on device).
 #[no_mangle]
@@ -100,7 +103,7 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats(
     let snap = h.stats.drain();
     let mode = h.client.mode();
     let color = h.client.color;
-    let buf: [f64; 16] = [
+    let buf: [f64; 18] = [
         snap.fps,
         snap.mbps,
         snap.e2e_p50_ms,
@@ -122,6 +125,10 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats(
         // Stage p50s tiling the end-to-end headline (appended to keep 0–13 index-compatible).
         snap.hostnet_p50_ms,
         snap.decode_p50_ms,
+        // Phase-2 host/network split of the `host+network` stage (0xCF host timings): 0.0
+        // when no timing matched this window (old host) — the HUD keeps the combined term.
+        snap.host_p50_ms,
+        snap.net_p50_ms,
     ];
     let arr = match env.new_double_array(buf.len() as jsize) {
         Ok(a) => a,
@@ -1,7 +1,9 @@
 //! Live decode stats for the on-stream HUD, following the unified stats spec
 //! (`design/stats-unification.md`): FPS, receive throughput, and the Android v1 stage split —
 //! headline `end-to-end` = capture→decoded (p50/p95) tiled by `host+network` = capture→received
-//! and `decode` = received→decoded (stage p50s). The decode thread is the sole writer
+//! and `decode` = received→decoded (stage p50s). When the host emits per-AU 0xCF host timings, the
+//! `host+network` term further splits into `host` + `network` (Phase 2, `note_host_split`); an old
+//! host emits none and the combined term stands. The decode thread is the sole writer
 //! (`note_received` per access unit at receipt, `note_decoded` per decoder output buffer); the JNI
 //! accessor `nativeVideoStats` drains a snapshot ~1 Hz and resets the window. Sampling is gated on
 //! the HUD actually being visible (`set_enabled`, driven by `nativeSetVideoStatsEnabled`) so the
@@ -32,6 +34,12 @@ struct Inner {
     e2e_us: Vec<u64>,
     /// `host+network` stage = capture→received samples, in microseconds (skew-corrected).
     hostnet_us: Vec<u64>,
+    /// Phase-2 split of `host+network` (design/stats-unification.md Phase 2), fed only when the
+    /// host emits per-AU 0xCF timings: `host` = the host's own capture→sent duration, µs.
+    host_us: Vec<u64>,
+    /// The matching `network` term, µs: capture→received minus the host's capture→sent
+    /// (wire + reassembly). Always pushed in lockstep with `host_us`.
+    net_us: Vec<u64>,
     /// `decode` stage = received→decoded samples, in microseconds (client-local, single clock).
     decode_us: Vec<u64>,
     /// Whether the host answered the clock-skew handshake (latency is cross-machine valid).
@@ -50,6 +58,10 @@ pub struct Snapshot {
     /// Stage p50s (ms): `host+network` (capture→received) and `decode` (received→decoded).
     pub hostnet_p50_ms: f64,
     pub decode_p50_ms: f64,
+    /// Phase-2 `host` / `network` split p50s (ms) — 0.0 when no 0xCF timing matched this window
+    /// (old host / no samples yet), in which case the HUD keeps the combined `host+network` term.
+    pub host_p50_ms: f64,
+    pub net_p50_ms: f64,
     pub lat_valid: bool,
     pub skew_corrected: bool,
 }
@@ -73,6 +85,8 @@ impl VideoStats {
                 bytes: 0,
                 e2e_us: Vec::with_capacity(256),
                 hostnet_us: Vec::with_capacity(256),
+                host_us: Vec::with_capacity(256),
+                net_us: Vec::with_capacity(256),
                 decode_us: Vec::with_capacity(256),
                 skew_corrected: false,
             }),
@@ -101,6 +115,8 @@ impl VideoStats {
             g.bytes = 0;
             g.e2e_us.clear();
             g.hostnet_us.clear();
+            g.host_us.clear();
+            g.net_us.clear();
             g.decode_us.clear();
         }
     }
@@ -128,6 +144,25 @@ impl VideoStats {
         }
     }
 
+    /// Record one matched host/network split sample (Phase 2): the host's reported capture→sent
+    /// duration and our capture→received minus it, both µs — one pair per AU whose 0xCF host
+    /// timing arrived and matched by pts. An old host emits none, leaving the vecs empty and the
+    /// snapshot p50s at 0 (HUD keeps the combined `host+network` term).
+    // Driven only by the android-only decode thread; unreferenced on the host build — expected.
+    #[cfg_attr(not(target_os = "android"), allow(dead_code))]
+    pub fn note_host_split(&self, host_us: u64, net_us: u64) {
+        if !self.enabled.load(Ordering::Relaxed) {
+            return; // HUD hidden — skip the lock
+        }
+        // Poison-proof for the same reason as `note_received`.
+        let mut g = self
+            .inner
+            .lock()
+            .unwrap_or_else(std::sync::PoisonError::into_inner);
+        g.host_us.push(host_us);
+        g.net_us.push(net_us);
+    }
+
     /// Record one decoded output frame: its capture→decoded `end-to-end` sample and its
     /// received→decoded `decode` stage sample (either may be absent — e.g. the receipt stamp for
     /// this pts predates the HUD being shown).
@@ -163,6 +198,8 @@ impl VideoStats {
         let mbps = g.bytes as f64 * 8.0 / 1_000_000.0 / elapsed;
         g.e2e_us.sort_unstable();
         g.hostnet_us.sort_unstable();
+        g.host_us.sort_unstable();
+        g.net_us.sort_unstable();
         g.decode_us.sort_unstable();
         let snap = Snapshot {
             fps,
@@ -171,6 +208,8 @@ impl VideoStats {
             e2e_p95_ms: pctl_ms(&g.e2e_us, 0.95),
             hostnet_p50_ms: pctl_ms(&g.hostnet_us, 0.50),
             decode_p50_ms: pctl_ms(&g.decode_us, 0.50),
+            host_p50_ms: pctl_ms(&g.host_us, 0.50),
+            net_p50_ms: pctl_ms(&g.net_us, 0.50),
             lat_valid: !g.e2e_us.is_empty(),
             skew_corrected: g.skew_corrected,
         };
@@ -179,6 +218,8 @@ impl VideoStats {
         g.bytes = 0;
         g.e2e_us.clear();
         g.hostnet_us.clear();
+        g.host_us.clear();
+        g.net_us.clear();
         g.decode_us.clear();
         snap
     }
@@ -326,9 +326,14 @@ struct ContentView: View {
                 onCaptureChange: { [weak model] captured in
                     model?.mouseCaptured = captured
                 },
-                onFrame: { [meter = model.meter, latency = model.latency, offset = conn.clockOffsetNs] au in
+                onFrame: { [meter = model.meter, latency = model.latency,
+                            split = model.latencySplit, offset = conn.clockOffsetNs] au in
                     meter.note(byteCount: au.data.count)
                     latency.record(ptsNs: au.ptsNs, offsetNs: offset)
+                    // The same receipt, keyed by pts, awaiting its 0xCF host timing (the
+                    // host/network split — drained by the 1 s stats tick).
+                    split.recordReceipt(
+                        ptsNs: au.ptsNs, receivedNs: au.receivedNs, offsetNs: offset)
                 },
                 onSessionEnd: { [weak model] in
                     Task { @MainActor in model?.sessionEnded() }
@@ -69,6 +69,14 @@ final class SessionModel: ObservableObject {
     @Published var hostNetworkP95Ms = 0.0
     @Published var hostNetworkValid = false
     @Published var hostNetworkSkewCorrected = false
+    /// Phase 2 of the same stage: `host+network` split into its two terms via the host's per-AU
+    /// 0xCF timing reports (host = capture→fully-sent as the host measured it, network = the
+    /// remainder), matched to receipts by pts in `latencySplit`. `splitValid` is false whenever
+    /// no timing matched in the window — an old host that never emits the plane, or heavy 0xCF
+    /// loss — and the HUD then falls back to the combined `host+network` term.
+    @Published var hostP50Ms = 0.0
+    @Published var networkP50Ms = 0.0
+    @Published var splitValid = false
     /// End-to-end = capture→on-glass, measured directly per frame (never summed from the stages) —
     /// the HUD headline. Only the stage-2 presenter can stamp it (it owns decode + a
     /// CAMetalLayer/display-link present); stays invalid under stage-1, where the layer presents
@@ -96,6 +104,10 @@ final class SessionModel: ObservableObject {
     /// Capture→received (the host+network stage), fed per AU at receipt by the stream view's
     /// onFrame — under both presenters.
     let latency = LatencyMeter()
+    /// The host/network split of that same stage: onFrame also records (pts, interval) receipts
+    /// here, and the 1 s stats tick drains the connection's 0xCF host timings into it — under
+    /// both presenters (the receipt path is presenter-independent).
+    let latencySplit = HostNetworkSplitter()
     /// The stage-2 meters, passed to StreamView: end-to-end (capture→on-glass, stamped at
     /// present), decode (received→decoded), display (decoded→on-glass).
     let endToEnd = LatencyMeter()
@@ -296,6 +308,7 @@ final class SessionModel: ObservableObject {
         fps = 0
         mbps = 0
         hostNetworkValid = false
+        splitValid = false
         endToEndValid = false
         decodeValid = false
         displayValid = false
@@ -341,6 +354,7 @@ final class SessionModel: ObservableObject {
 
     private func startStatsTimer() {
         lastFramesDropped = 0 // a fresh connection's cumulative drop counter starts at 0
+        latencySplit.reset() // no stale receipts/samples from a previous session
         let timer = Timer(timeInterval: 1.0, repeats: true) { [weak self] _ in
             guard let self else { return }
             Task { @MainActor in
@@ -364,6 +378,25 @@ final class SessionModel: ObservableObject {
                 } else {
                     self.hostNetworkValid = false
                 }
+                // Phase 2: drain the window's per-AU host timings (0xCF) into the splitter —
+                // non-blocking, bounded (a 240 fps window is ~240 reports; the cap only guards
+                // a pathological burst). `try?` flattens (SE-0230); a throw (.closed during
+                // teardown) just ends the drain. An old host never emits any → splitValid stays
+                // false and the HUD keeps the combined host+network term.
+                if let conn = self.connection {
+                    var burst = 0
+                    while burst < 1024, let t = try? conn.nextHostTiming(timeoutMs: 0) {
+                        self.latencySplit.noteHostTiming(ptsNs: t.ptsNs, hostUs: t.hostUs)
+                        burst += 1
+                    }
+                }
+                if let s = self.latencySplit.drain() {
+                    self.hostP50Ms = s.hostP50Ms
+                    self.networkP50Ms = s.networkP50Ms
+                    self.splitValid = true
+                } else {
+                    self.splitValid = false
+                }
                 if let e = self.endToEnd.drain() {
                     self.endToEndP50Ms = e.p50Ms
                     self.endToEndP95Ms = e.p95Ms
@@ -26,20 +26,34 @@ struct StreamHUDView: View {
                 Text("end-to-end \(model.endToEndP50Ms, specifier: "%.1f") ms p50 · \(model.endToEndP95Ms, specifier: "%.1f") p95 · capture→on-glass\(model.endToEndSkewCorrected ? "" : " (same-host clock)")")
                     .font(.system(.caption2, design: .monospaced))
                     .foregroundStyle(.secondary)
-                // The equation: the three stages tiling the headline interval (per-window p50s —
-                // they only approximately sum to the directly-measured total).
+                // The equation: the stages tiling the headline interval (per-window p50s —
+                // they only approximately sum to the directly-measured total). With a host
+                // that reports per-AU timings (0xCF) the first term splits into host + network
+                // (phase 2); an old host keeps the combined term.
                 if model.hostNetworkValid && model.decodeValid && model.displayValid {
-                    Text("= host+network \(model.hostNetworkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")")
-                        .font(.system(.caption2, design: .monospaced))
-                        .foregroundStyle(.secondary)
+                    if model.splitValid {
+                        Text("= host \(model.hostP50Ms, specifier: "%.1f") + network \(model.networkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")")
+                            .font(.system(.caption2, design: .monospaced))
+                            .foregroundStyle(.secondary)
+                    } else {
+                        Text("= host+network \(model.hostNetworkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")")
+                            .font(.system(.caption2, design: .monospaced))
+                            .foregroundStyle(.secondary)
+                    }
                 }
             } else if model.hostNetworkValid {
                 // Stage-1 fallback presenter: the layer decodes + presents internally with no
-                // per-frame stamp, so the honest headline ends at receipt — and there is no
-                // equation line (host+network is the whole measured interval).
+                // per-frame stamp, so the honest headline ends at receipt. The host/network
+                // split still applies there (receipt is presenter-independent) — it becomes the
+                // only equation line; without it, host+network IS the whole measured interval.
                 Text("capture→received \(model.hostNetworkP50Ms, specifier: "%.1f") ms p50 · \(model.hostNetworkP95Ms, specifier: "%.1f") p95\(model.hostNetworkSkewCorrected ? "" : " (same-host clock)")")
                     .font(.system(.caption2, design: .monospaced))
                     .foregroundStyle(.secondary)
+                if model.splitValid {
+                    Text("= host \(model.hostP50Ms, specifier: "%.1f") + network \(model.networkP50Ms, specifier: "%.1f")")
+                        .font(.system(.caption2, design: .monospaced))
+                        .foregroundStyle(.secondary)
+                }
             }
             if model.lostFrames > 0 {
                 // Unrecoverable network drops this window; hidden while the link is clean.
@@ -83,6 +83,9 @@ public final class PunktfunkConnection {
     /// Same role for the feedback drain thread (rumble + HID-output — two core planes,
     /// drained sequentially by one thread).
     private let feedbackLock = NSLock()
+    /// Same role for the host-timing (0xCF) puller — its own plane in the core, drained
+    /// non-blockingly by the app's 1 s stats tick (never contends with the blocking pullers).
+    private let statsLock = NSLock()
 
     /// Negotiated session mode (host-confirmed).
     public private(set) var width: UInt32 = 0
@@ -665,6 +668,40 @@ public final class PunktfunkConnection {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// One per-AU host-timing report (0xCF): the host's capture→fully-sent duration for the
|
||||||
|
/// access unit whose `AccessUnit.ptsNs` equals `ptsNs` exactly. The stats consumer derives
|
||||||
|
/// `network = (receivedNs + clockOffsetNs − ptsNs) / 1000 − hostUs` (µs, floored at 0): the host/network split of the
|
||||||
|
/// HUD's `host+network` stage (design/stats-unification.md Phase 2).
|
||||||
|
public struct HostTiming: Sendable, Equatable {
|
||||||
|
/// The AU's capture stamp (host capture clock — matches the AU's `ptsNs`).
|
||||||
|
public let ptsNs: UInt64
|
||||||
|
/// Host capture→sent duration, µs.
|
||||||
|
public let hostUs: UInt32
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Pull the next per-AU host timing; nil on timeout, throws `.closed` once the session
|
||||||
|
/// ended. Best-effort plane: an older host never emits any — keep showing the combined
|
||||||
|
/// `host+network` stage then. Drain non-blockingly (`timeoutMs: 0`) from ONE stats
|
||||||
|
/// consumer (its own core plane, safe alongside the other pullers).
|
||||||
|
public func nextHostTiming(timeoutMs: UInt32 = 0) throws -> HostTiming? {
|
||||||
|
statsLock.lock()
|
||||||
|
defer { statsLock.unlock() }
|
||||||
|
guard let h = liveHandle() else { throw PunktfunkClientError.closed }
|
||||||
|
|
||||||
|
var out = PunktfunkHostTiming()
|
||||||
|
let rc = punktfunk_connection_next_host_timing(h, &out, timeoutMs)
|
||||||
|
switch rc {
|
||||||
|
case statusOK:
|
||||||
|
return HostTiming(ptsNs: out.pts_ns, hostUs: out.host_us)
|
||||||
|
case statusNoFrame:
|
||||||
|
return nil
|
||||||
|
case statusClosed:
|
||||||
|
throw PunktfunkClientError.closed
|
||||||
|
default:
|
||||||
|
throw PunktfunkClientError.status(rc)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Send one input event (delivered to the host as a QUIC datagram). Thread-safe;
|
/// Send one input event (delivered to the host as a QUIC datagram). Thread-safe;
|
||||||
/// silently dropped after close.
|
/// silently dropped after close.
|
||||||
public func send(_ event: PunktfunkInputEvent) {
|
public func send(_ event: PunktfunkInputEvent) {
|
||||||
@@ -684,10 +721,12 @@ public final class PunktfunkConnection {
|
|||||||
pumpLock.lock() // pullers exit at their next poll boundary, releasing these
|
pumpLock.lock() // pullers exit at their next poll boundary, releasing these
|
||||||
audioLock.lock()
|
audioLock.lock()
|
||||||
feedbackLock.lock()
|
feedbackLock.lock()
|
||||||
|
statsLock.lock()
|
||||||
abiLock.lock()
|
abiLock.lock()
|
||||||
let h = handle
|
let h = handle
|
||||||
handle = nil
|
handle = nil
|
||||||
abiLock.unlock()
|
abiLock.unlock()
|
||||||
|
statsLock.unlock()
|
||||||
feedbackLock.unlock()
|
feedbackLock.unlock()
|
||||||
audioLock.unlock()
|
audioLock.unlock()
|
||||||
pumpLock.unlock()
|
pumpLock.unlock()
|
||||||
|
|||||||
@@ -0,0 +1,88 @@
|
|||||||
|
// Splits the unified stats model's `host+network` stage (capture→received) into its `host`
|
||||||
|
// (capture→fully-sent, reported per AU by the host on the 0xCF plane) and `network`
|
||||||
|
// (the remainder) terms — design/stats-unification.md Phase 2.
|
||||||
|
//
|
||||||
|
// Receipt samples are recorded per frame from the pump path; host timings are matched to them
|
||||||
|
// by exact pts (the 0xCF datagram carries the AU's own `pts_ns`). Best-effort by construction:
|
||||||
|
// a lost 0xCF datagram, an FEC-dropped AU, or an old host that never emits the plane simply
|
||||||
|
// contributes no split sample — the HUD then keeps the combined `host+network` line. NSLock
|
||||||
|
// rather than an actor — the receipt writer is the non-async pump path (same pattern as
|
||||||
|
// LatencyMeter/FrameMeter).
|
||||||
|
|
||||||
|
import Foundation
|
||||||
|
|
||||||
|
/// Per-frame `host` / `network` sampler: `recordReceipt` at AU receipt (pts + the combined
|
||||||
|
/// capture→received interval), `noteHostTiming` per drained 0xCF report, `drain` the window's
|
||||||
|
/// p50s once a second. The pending ring is bounded (drop-oldest) so an old host — receipts
|
||||||
|
/// forever, timings never — costs a fixed ~4 KB, not growth.
|
||||||
|
public final class HostNetworkSplitter: @unchecked Sendable {
|
||||||
|
private let lock = NSLock()
|
||||||
|
/// Received AUs awaiting their 0xCF host timing: (pts, combined capture→received µs).
|
||||||
|
private var pending: [(ptsNs: UInt64, combinedUs: Int64)] = []
|
||||||
|
private var hostUsSamples: [Int64] = []
|
||||||
|
private var networkUsSamples: [Int64] = []
|
||||||
|
/// ~1 s of frames at 240 fps; beyond it the oldest receipt can no longer expect a match.
|
||||||
|
private static let pendingCap = 256
|
||||||
|
|
||||||
|
public init() {}
|
||||||
|
|
||||||
|
/// Record one frame at receipt. `ptsNs` is the host capture clock (the AU's pts),
|
||||||
|
/// `receivedNs` the client `CLOCK_REALTIME` receipt instant (`AccessUnit.receivedNs`),
|
||||||
|
/// `offsetNs` the connect-time host−client clock offset (0 = uncorrected). Same
|
||||||
|
/// absurd-value clamp as LatencyMeter — a sample it would drop must not linger here.
|
||||||
|
public func recordReceipt(ptsNs: UInt64, receivedNs: Int64, offsetNs: Int64) {
|
||||||
|
let combinedNs = receivedNs &+ offsetNs &- Int64(bitPattern: ptsNs)
|
||||||
|
guard combinedNs > 0, combinedNs < 10_000_000_000 else { return }
|
||||||
|
lock.lock()
|
||||||
|
pending.append((ptsNs: ptsNs, combinedUs: combinedNs / 1000))
|
||||||
|
if pending.count > Self.pendingCap {
|
||||||
|
pending.removeFirst(pending.count - Self.pendingCap)
|
||||||
|
}
|
||||||
|
lock.unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Match one host timing (0xCF) to its receipt: `host` = the reported capture→sent,
|
||||||
|
/// `network` = the combined interval minus it, floored at 0 (the terms tile per frame; a
|
||||||
|
/// slightly-off skew offset must not produce a negative wire time). Unmatched timings —
|
||||||
|
/// the AU was FEC-dropped, or its receipt raced this drain — are simply skipped.
|
||||||
|
public func noteHostTiming(ptsNs: UInt64, hostUs: UInt32) {
|
||||||
|
lock.lock()
|
||||||
|
defer { lock.unlock() }
|
||||||
|
guard let i = pending.firstIndex(where: { $0.ptsNs == ptsNs }) else { return }
|
||||||
|
let combinedUs = pending.remove(at: i).combinedUs
|
||||||
|
hostUsSamples.append(Int64(hostUs))
|
||||||
|
networkUsSamples.append(max(0, combinedUs - Int64(hostUs)))
|
||||||
|
}
|
||||||
|
|
||||||
|
public struct Split: Sendable {
|
||||||
|
public let hostP50Ms: Double
|
||||||
|
public let networkP50Ms: Double
|
||||||
|
public let count: Int
|
||||||
|
}
|
||||||
|
|
||||||
|
/// The window's p50s since the last drain, then reset (matched samples only; the pending
|
||||||
|
/// ring survives — a receipt may still match a timing drained next tick). `nil` when no
|
||||||
|
/// timing matched in the interval — the caller falls back to the combined stage.
|
||||||
|
public func drain() -> Split? {
|
||||||
|
lock.lock()
|
||||||
|
let host = hostUsSamples.sorted()
|
||||||
|
let network = networkUsSamples.sorted()
|
||||||
|
hostUsSamples.removeAll(keepingCapacity: true)
|
||||||
|
networkUsSamples.removeAll(keepingCapacity: true)
|
||||||
|
lock.unlock()
|
||||||
|
guard !host.isEmpty else { return nil }
|
||||||
|
func p50(_ sorted: [Int64]) -> Double {
|
||||||
|
Double(sorted[min(sorted.count / 2, sorted.count - 1)]) / 1000.0 // µs → ms
|
||||||
|
}
|
||||||
|
return Split(hostP50Ms: p50(host), networkP50Ms: p50(network), count: host.count)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Forget everything (pending receipts + window) — a fresh connection starts clean.
|
||||||
|
public func reset() {
|
||||||
|
lock.lock()
|
||||||
|
pending.removeAll()
|
||||||
|
hostUsSamples.removeAll()
|
||||||
|
networkUsSamples.removeAll()
|
||||||
|
lock.unlock()
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,107 @@
|
|||||||
|
// Unit tests for HostNetworkSplitter (the host/network split of the unified stats model's
|
||||||
|
// host+network stage — design/stats-unification.md Phase 2): pts matching, the per-frame
|
||||||
|
// tiling arithmetic (network = combined − host, floored at 0), drain/reset semantics, the
|
||||||
|
// bounded pending ring, and the absurd-receipt clamp. All samples use explicit instants, so
|
||||||
|
// the expectations are exact.
|
||||||
|
|
||||||
|
import Foundation
|
||||||
|
import XCTest
|
||||||
|
|
||||||
|
@testable import PunktfunkKit
|
||||||
|
|
||||||
|
final class HostNetworkSplitterTests: XCTestCase {
|
||||||
|
/// An arbitrary host-capture pts (ns) far from zero, like a real CLOCK_REALTIME stamp.
|
||||||
|
private let basePts: UInt64 = 1_000_000_000_000
|
||||||
|
|
||||||
|
private func receipt(_ s: HostNetworkSplitter, pts: UInt64, combinedMs: Int64,
|
||||||
|
offsetNs: Int64 = 0) {
|
||||||
|
s.recordReceipt(
|
||||||
|
ptsNs: pts, receivedNs: Int64(pts) + combinedMs * 1_000_000 - offsetNs,
|
||||||
|
offsetNs: offsetNs)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testEmptyDrainIsNil() {
|
||||||
|
XCTAssertNil(HostNetworkSplitter().drain())
|
||||||
|
}
|
||||||
|
|
||||||
|
func testMatchSplitsCombinedIntoHostAndNetwork() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
receipt(s, pts: basePts, combinedMs: 8) // capture→received 8 ms
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // host says 3 ms of it was its own
|
||||||
|
guard let split = s.drain() else { return XCTFail("expected a matched sample") }
|
||||||
|
XCTAssertEqual(split.count, 1)
|
||||||
|
XCTAssertEqual(split.hostP50Ms, 3.0)
|
||||||
|
XCTAssertEqual(split.networkP50Ms, 5.0, "the two terms tile the combined interval")
|
||||||
|
XCTAssertNil(s.drain(), "drain resets the window")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testSkewOffsetAppliesToTheCombinedInterval() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
// Client clock 2 ms behind the host: the raw difference alone would read 6 ms.
|
||||||
|
receipt(s, pts: basePts, combinedMs: 8, offsetNs: 2_000_000)
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000)
|
||||||
|
XCTAssertEqual(s.drain()?.networkP50Ms, 5.0)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testUnmatchedTimingIsSkipped() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
receipt(s, pts: basePts, combinedMs: 8)
|
||||||
|
// A timing for an AU we never received (FEC-dropped) must not fabricate a sample.
|
||||||
|
s.noteHostTiming(ptsNs: basePts + 1, hostUs: 3_000)
|
||||||
|
XCTAssertNil(s.drain())
|
||||||
|
}
|
||||||
|
|
||||||
|
func testReceiptSurvivesADrainUntilItsTimingArrives() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
receipt(s, pts: basePts, combinedMs: 8)
|
||||||
|
XCTAssertNil(s.drain(), "no timing matched yet")
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // arrives one tick late — still matches
|
||||||
|
XCTAssertEqual(s.drain()?.hostP50Ms, 3.0)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testEachReceiptMatchesOnce() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
receipt(s, pts: basePts, combinedMs: 8)
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000)
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // duplicate 0xCF — no second sample
|
||||||
|
XCTAssertEqual(s.drain()?.count, 1)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testNetworkFlooredAtZero() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
// A slightly-off skew offset can make host_us exceed the combined interval.
|
||||||
|
receipt(s, pts: basePts, combinedMs: 2)
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000)
|
||||||
|
guard let split = s.drain() else { return XCTFail("expected a sample") }
|
||||||
|
XCTAssertEqual(split.hostP50Ms, 3.0)
|
||||||
|
XCTAssertEqual(split.networkP50Ms, 0.0)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testPendingRingDropsOldest() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
for i in 0..<300 { // cap is 256 — the first receipts fall out
|
||||||
|
receipt(s, pts: basePts + UInt64(i), combinedMs: 8)
|
||||||
|
}
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000) // evicted — no match
|
||||||
|
XCTAssertNil(s.drain())
|
||||||
|
s.noteHostTiming(ptsNs: basePts + 299, hostUs: 3_000) // newest — still pending
|
||||||
|
XCTAssertEqual(s.drain()?.count, 1)
|
||||||
|
}
|
||||||
|
|
||||||
|
func testAbsurdReceiptsAreDropped() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
receipt(s, pts: basePts, combinedMs: -1) // received before capture — clock step
|
||||||
|
receipt(s, pts: basePts + 1, combinedMs: 20_000) // > 10 s — garbage pts/offset
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 1_000)
|
||||||
|
s.noteHostTiming(ptsNs: basePts + 1, hostUs: 1_000)
|
||||||
|
XCTAssertNil(s.drain())
|
||||||
|
}
|
||||||
|
|
||||||
|
func testResetForgetsPendingReceipts() {
|
||||||
|
let s = HostNetworkSplitter()
|
||||||
|
receipt(s, pts: basePts, combinedMs: 8)
|
||||||
|
s.reset()
|
||||||
|
s.noteHostTiming(ptsNs: basePts, hostUs: 3_000)
|
||||||
|
XCTAssertNil(s.drain(), "a fresh session must not match a previous session's receipts")
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -25,12 +25,18 @@ final class LoopbackIntegrationTests: XCTestCase {
|
|||||||
XCTAssertEqual(conn.resolvedBitrateKbps, 50_000)
|
XCTAssertEqual(conn.resolvedBitrateKbps, 50_000)
|
||||||
|
|
||||||
// Pull 25 synthetic frames and byte-verify the documented pattern:
|
// Pull 25 synthetic frames and byte-verify the documented pattern:
|
||||||
// u32 LE frame index, then data[i] = (idx as u8) &+ (i as u8).
|
// u32 LE frame index, then data[i] = (idx as u8) &+ (i as u8). Alongside, drain the
|
||||||
|
// per-AU host-timing plane (0xCF) the way the app's stats tick does — the connector
|
||||||
|
// ORs VIDEO_CAP_HOST_TIMING in unconditionally and the synthetic host stamps one
|
||||||
|
// report per AU, so the pts correlation must hold end to end through the xcframework.
|
||||||
var got = 0
|
var got = 0
|
||||||
var lastIndex: UInt32 = 0
|
var lastIndex: UInt32 = 0
|
||||||
|
var receivedPts = Set<UInt64>()
|
||||||
|
var timings: [PunktfunkConnection.HostTiming] = []
|
||||||
let deadline = Date().addingTimeInterval(30)
|
let deadline = Date().addingTimeInterval(30)
|
||||||
while got < 25 {
|
while got < 25 {
|
||||||
XCTAssertLessThan(Date(), deadline, "timed out after \(got) frames")
|
XCTAssertLessThan(Date(), deadline, "timed out after \(got) frames")
|
||||||
|
while let t = try conn.nextHostTiming(timeoutMs: 0) { timings.append(t) }
|
||||||
guard let au = try conn.nextAU(timeoutMs: 2000) else { continue }
|
guard let au = try conn.nextAU(timeoutMs: 2000) else { continue }
|
||||||
let idx = au.data.prefix(4).reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
|
let idx = au.data.prefix(4).reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
|
||||||
for (i, byte) in au.data.enumerated().dropFirst(4) {
|
for (i, byte) in au.data.enumerated().dropFirst(4) {
|
||||||
@@ -41,10 +47,22 @@ final class LoopbackIntegrationTests: XCTestCase {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
XCTAssertGreaterThan(au.ptsNs, 0)
|
XCTAssertGreaterThan(au.ptsNs, 0)
|
||||||
|
receivedPts.insert(au.ptsNs)
|
||||||
lastIndex = idx
|
lastIndex = idx
|
||||||
got += 1
|
got += 1
|
||||||
}
|
}
|
||||||
XCTAssertGreaterThanOrEqual(lastIndex, 24)
|
XCTAssertGreaterThanOrEqual(lastIndex, 24)
|
||||||
|
// Belt-and-braces: the last frame's timing lands just after its AU — give it a bounded
|
||||||
|
// grace drain (the stream keeps running, so this must not loop on fresh timings).
|
||||||
|
var grace = 0
|
||||||
|
while grace < 64, !timings.contains(where: { receivedPts.contains($0.ptsNs) }),
|
||||||
|
let t = try conn.nextHostTiming(timeoutMs: 100) {
|
||||||
|
timings.append(t)
|
||||||
|
grace += 1
|
||||||
|
}
|
||||||
|
XCTAssertTrue(
|
||||||
|
timings.contains { receivedPts.contains($0.ptsNs) },
|
||||||
|
"no 0xCF host timing matched a received AU's pts (got \(timings.count) timings)")
|
||||||
|
|
||||||
// Input goes the other way (enqueue-only; the host logs the count on close) —
|
// Input goes the other way (enqueue-only; the host logs the count on close) —
|
||||||
// including the touch kinds, gamepad events, the rich-input plane (DualSense
|
// including the touch kinds, gamepad events, the rich-input plane (DualSense
|
||||||
|
|||||||
@@ -56,6 +56,16 @@ pub struct Stats {
|
|||||||
pub mbps: f32,
|
pub mbps: f32,
|
||||||
/// p50 `host+network` stage: capture → received, host-clock corrected (ms).
|
/// p50 `host+network` stage: capture → received, host-clock corrected (ms).
|
||||||
pub host_net_ms: f32,
|
pub host_net_ms: f32,
|
||||||
|
/// p50 `host` stage: the host's own capture→fully-sent, from the per-AU 0xCF host
|
||||||
|
/// timings (design/stats-unification.md Phase 2). Valid only when `split`.
|
||||||
|
pub host_ms: f32,
|
||||||
|
/// p50 `network` stage: capture→received minus the host-reported share
|
||||||
|
/// (`hostnet − host`, per-frame, saturating). Valid only when `split`.
|
||||||
|
pub net_ms: f32,
|
||||||
|
/// The window had matched host timings — the OSD splits `host+network` into
|
||||||
|
/// `host + network`. An old host never emits 0xCF, so this stays false and the
|
||||||
|
/// combined stage renders unchanged.
|
||||||
|
pub split: bool,
|
||||||
/// p50 `decode` stage: received → decoded, single-clock client-local (ms).
|
/// p50 `decode` stage: received → decoded, single-clock client-local (ms).
|
||||||
pub decode_ms: f32,
|
pub decode_ms: f32,
|
||||||
/// Unrecoverable network frame drops this window, and their share of
|
/// Unrecoverable network frame drops this window, and their share of
|
||||||
@@ -67,6 +77,11 @@ pub struct Stats {
|
|||||||
pub decoder: &'static str,
|
pub decoder: &'static str,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Cap on the frames the pump keeps pending for their 0xCF host timing (pts → capture→received µs).
|
||||||
|
/// ~2 s at 120 Hz — a timing arrives within a frame or two of its AU, and against an old
|
||||||
|
/// host (no 0xCF at all) this just caps the dead-weight ring.
|
||||||
|
const PENDING_SPLIT_CAP: usize = 256;
|
||||||
|
|
||||||
/// Sort a window of µs samples in place and return `(p50, p95)` per the spec's index
|
/// Sort a window of µs samples in place and return `(p50, p95)` per the spec's index
|
||||||
/// rules (`sorted[len/2]`, `sorted[min(len*95/100, len-1)]`); an empty window reads 0.
|
/// rules (`sorted[len/2]`, `sorted[min(len*95/100, len-1)]`); an empty window reads 0.
|
||||||
pub fn window_percentiles(samples: &mut [u64]) -> (u64, u64) {
|
pub fn window_percentiles(samples: &mut [u64]) -> (u64, u64) {
|
||||||
@@ -245,6 +260,12 @@ fn pump(
|
|||||||
// corrected), `decode` = received→decoded (client-local). p50 per 1 s window.
|
// corrected), `decode` = received→decoded (client-local). p50 per 1 s window.
|
||||||
let mut hostnet_us: Vec<u64> = Vec::with_capacity(256);
|
let mut hostnet_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
let mut decode_us: Vec<u64> = Vec::with_capacity(256);
|
let mut decode_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
// Host/network split (Phase 2): frames awaiting their per-AU 0xCF host timing,
|
||||||
|
// correlated by pts_ns. Bounded — an old host never sends any, so entries just age out.
|
||||||
|
let mut pending_split: std::collections::VecDeque<(u64, u64)> =
|
||||||
|
std::collections::VecDeque::with_capacity(PENDING_SPLIT_CAP);
|
||||||
|
let mut host_us_win: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
let mut net_us_win: Vec<u64> = Vec::with_capacity(256);
|
||||||
// What actually decoded the last frame — a VAAPI failure demotes mid-session, so
|
// What actually decoded the last frame — a VAAPI failure demotes mid-session, so
|
||||||
// this is read off each frame's image variant rather than fixed at startup.
|
// this is read off each frame's image variant rather than fixed at startup.
|
||||||
let mut dec_path: &'static str = "";
|
let mut dec_path: &'static str = "";
|
||||||
@@ -291,6 +312,12 @@ fn pump(
|
|||||||
.max(0) as u64;
|
.max(0) as u64;
|
||||||
if hn > 0 && hn < 10_000_000_000 {
|
if hn > 0 && hn < 10_000_000_000 {
|
||||||
hostnet_us.push(hn / 1000);
|
hostnet_us.push(hn / 1000);
|
||||||
|
// Remember the sample for the host/network split — matched
|
||||||
|
// against the AU's 0xCF host timing when it arrives.
|
||||||
|
if pending_split.len() >= PENDING_SPLIT_CAP {
|
||||||
|
pending_split.pop_front();
|
||||||
|
}
|
||||||
|
pending_split.push_back((frame.pts_ns, hn / 1000));
|
||||||
}
|
}
|
||||||
// `decode` stage: received→decoded, single clock, no skew.
|
// `decode` stage: received→decoded, single clock, no skew.
|
||||||
decode_us.push(decoded_ns.saturating_sub(received_ns) / 1000);
|
decode_us.push(decoded_ns.saturating_sub(received_ns) / 1000);
|
||||||
@@ -310,6 +337,19 @@ fn pump(
|
|||||||
Err(e) => break Some(format!("session: {e:?}")),
|
Err(e) => break Some(format!("session: {e:?}")),
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Drain the per-AU host timings (0xCF) non-blockingly and match them to received
|
||||||
|
// frames by pts: host = the host's own capture→sent, network = our
|
||||||
|
// capture→received minus it (the two tile per frame by construction). An old
|
||||||
|
// host never emits any — the deque fills to its cap and the OSD keeps the
|
||||||
|
// combined `host+network` stage.
|
||||||
|
while let Ok(t) = connector.next_host_timing(Duration::ZERO) {
|
||||||
|
if let Some(i) = pending_split.iter().position(|(p, _)| *p == t.pts_ns) {
|
||||||
|
let (_, hn_us) = pending_split.remove(i).unwrap();
|
||||||
|
host_us_win.push(t.host_us as u64);
|
||||||
|
net_us_win.push(hn_us.saturating_sub(t.host_us as u64));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// Loss recovery: under infinite GOP the only recovery keyframe is one we request. The
|
// Loss recovery: under infinite GOP the only recovery keyframe is one we request. The
|
||||||
// reassembler drops unrecoverable AUs (frames_dropped); the decoder then conceals the
|
// reassembler drops unrecoverable AUs (frames_dropped); the decoder then conceals the
|
||||||
// reference-missing delta frames that follow and returns Ok, so keying off a decode error
|
// reference-missing delta frames that follow and returns Ok, so keying off a decode error
|
||||||
@@ -330,11 +370,17 @@ fn pump(
|
|||||||
let secs = window_start.elapsed().as_secs_f32();
|
let secs = window_start.elapsed().as_secs_f32();
|
||||||
let (hn_p50, _) = window_percentiles(&mut hostnet_us);
|
let (hn_p50, _) = window_percentiles(&mut hostnet_us);
|
||||||
let (dec_p50, _) = window_percentiles(&mut decode_us);
|
let (dec_p50, _) = window_percentiles(&mut decode_us);
|
||||||
|
// Host/network split — present only when this window matched 0xCF timings.
|
||||||
|
let split = !host_us_win.is_empty();
|
||||||
|
let (host_p50, _) = window_percentiles(&mut host_us_win);
|
||||||
|
let (net_p50, _) = window_percentiles(&mut net_us_win);
|
||||||
let lost = dropped.saturating_sub(window_dropped) as u32;
|
let lost = dropped.saturating_sub(window_dropped) as u32;
|
||||||
window_dropped = dropped;
|
window_dropped = dropped;
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
fps = frames_n,
|
fps = frames_n,
|
||||||
hostnet_p50_us = hn_p50,
|
hostnet_p50_us = hn_p50,
|
||||||
|
host_p50_us = host_p50,
|
||||||
|
net_p50_us = net_p50,
|
||||||
decode_p50_us = dec_p50,
|
decode_p50_us = dec_p50,
|
||||||
lost,
|
lost,
|
||||||
total_frames,
|
total_frames,
|
||||||
@@ -344,6 +390,9 @@ fn pump(
|
|||||||
fps: frames_n as f32 / secs,
|
fps: frames_n as f32 / secs,
|
||||||
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
||||||
host_net_ms: hn_p50 as f32 / 1000.0,
|
host_net_ms: hn_p50 as f32 / 1000.0,
|
||||||
|
host_ms: host_p50 as f32 / 1000.0,
|
||||||
|
net_ms: net_p50 as f32 / 1000.0,
|
||||||
|
split,
|
||||||
decode_ms: dec_p50 as f32 / 1000.0,
|
decode_ms: dec_p50 as f32 / 1000.0,
|
||||||
lost,
|
lost,
|
||||||
lost_pct: if lost > 0 {
|
lost_pct: if lost > 0 {
|
||||||
@@ -358,6 +407,8 @@ fn pump(
|
|||||||
bytes_n = 0;
|
bytes_n = 0;
|
||||||
hostnet_us.clear();
|
hostnet_us.clear();
|
||||||
decode_us.clear();
|
decode_us.clear();
|
||||||
|
host_us_win.clear();
|
||||||
|
net_us_win.clear();
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
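The pending-ring matching in the pump hunk above can be read in isolation. The following is a minimal sketch, not the client's actual code: `SplitSketch`, `record_receipt`, `note_host_timing`, and `CAP` are hypothetical names chosen for illustration, and the 256-entry cap simply mirrors `PENDING_SPLIT_CAP` from the diff.

```rust
use std::collections::VecDeque;

/// Hypothetical cap, mirroring the 256-entry bound used in the diff.
const CAP: usize = 256;

/// Illustrative splitter (an assumption, not the project's type): receipts wait
/// in a bounded ring until the host timing carrying the same pts arrives.
#[derive(Default)]
struct SplitSketch {
    /// (pts_ns, capture→received µs) for frames still awaiting a host timing.
    pending: VecDeque<(u64, u64)>,
    /// Matched `host` samples for the current window, µs.
    host_us: Vec<u64>,
    /// Matched `network` samples for the current window, µs.
    net_us: Vec<u64>,
}

impl SplitSketch {
    /// Remember one received frame; the oldest entry is dropped once the ring is full.
    fn record_receipt(&mut self, pts_ns: u64, hostnet_us: u64) {
        if self.pending.len() >= CAP {
            self.pending.pop_front();
        }
        self.pending.push_back((pts_ns, hostnet_us));
    }

    /// Match a host timing by exact pts. Returns false when nothing is pending
    /// for that pts (frame lost, already matched, or evicted from the ring).
    fn note_host_timing(&mut self, pts_ns: u64, host_us: u32) -> bool {
        if let Some(i) = self.pending.iter().position(|(p, _)| *p == pts_ns) {
            if let Some((_, hostnet_us)) = self.pending.remove(i) {
                let host = u64::from(host_us);
                self.host_us.push(host);
                // network = combined minus host, saturating so it never goes negative.
                self.net_us.push(hostnet_us.saturating_sub(host));
                return true;
            }
        }
        false
    }
}
```

With this sketch, `record_receipt(pts, 8_000)` followed by `note_host_timing(pts, 3_000)` leaves one host sample of 3 000 µs and one network sample of 5 000 µs. A duplicate timing for the same pts finds nothing pending and is ignored.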
@@ -68,10 +68,28 @@ impl StreamPage {
|
|||||||
if self.hdr.get() {
|
if self.hdr.get() {
|
||||||
line1.push_str(" · HDR");
|
line1.push_str(" · HDR");
|
||||||
}
|
}
|
||||||
|
// The equation line: split `host+network` into `host + network` when the host
|
||||||
|
// reported per-AU timings (0xCF, stats Phase 2); the combined stage otherwise.
|
||||||
|
let equation = if s.split {
|
||||||
|
format!(
|
||||||
|
"= host {:.1} + network {:.1} + decode {:.1} + display {:.1}",
|
||||||
|
s.host_ms,
|
||||||
|
s.net_ms,
|
||||||
|
s.decode_ms,
|
||||||
|
self.presented.display_ms.get(),
|
||||||
|
)
|
||||||
|
} else {
|
||||||
|
format!(
|
||||||
|
"= host+network {:.1} + decode {:.1} + display {:.1}",
|
||||||
|
s.host_net_ms,
|
||||||
|
s.decode_ms,
|
||||||
|
self.presented.display_ms.get(),
|
||||||
|
)
|
||||||
|
};
|
||||||
let mut text = format!(
|
let mut text = format!(
|
||||||
"{line1}\n\
|
"{line1}\n\
|
||||||
end-to-end {:.1} ms p50 · {:.1} p95 · capture→displayed{}\n\
|
end-to-end {:.1} ms p50 · {:.1} p95 · capture→displayed{}\n\
|
||||||
= host+network {:.1} + decode {:.1} + display {:.1}",
|
{equation}",
|
||||||
self.presented.e2e_p50_ms.get(),
|
self.presented.e2e_p50_ms.get(),
|
||||||
self.presented.e2e_p95_ms.get(),
|
self.presented.e2e_p95_ms.get(),
|
||||||
if self.same_host {
|
if self.same_host {
|
||||||
@@ -79,9 +97,6 @@ impl StreamPage {
|
|||||||
} else {
|
} else {
|
||||||
""
|
""
|
||||||
},
|
},
|
||||||
s.host_net_ms,
|
|
||||||
s.decode_ms,
|
|
||||||
self.presented.display_ms.get(),
|
|
||||||
);
|
);
|
||||||
// Counters — only rendered when nonzero this window.
|
// Counters — only rendered when nonzero this window.
|
||||||
if s.lost > 0 {
|
if s.lost > 0 {
|
||||||
|
|||||||
@@ -175,7 +175,8 @@ fn fmt_uptime(secs: u32) -> String {
|
|||||||
/// The streaming HUD overlay (top-right), unified stats vocabulary (design/stats-unification.md):
|
/// The streaming HUD overlay (top-right), unified stats vocabulary (design/stats-unification.md):
|
||||||
/// a chip row (mode · codec · decode path · HDR), a stream line (received fps · goodput ·
|
/// a chip row (mode · codec · decode path · HDR), a stream line (received fps · goodput ·
|
||||||
/// presenter fps), the end-to-end headline (capture→on-glass p50/p95, host-clock corrected), the
|
/// presenter fps), the end-to-end headline (capture→on-glass p50/p95, host-clock corrected), the
|
||||||
/// stage equation (= host+network + decode + display, stage p50s), a session line
|
/// stage equation (= host + network + decode + display when the host reports 0xCF timings, else
|
||||||
|
/// the combined = host+network + decode + display; stage p50s), a session line
|
||||||
/// (host · time · loss/skips), and the shortcut hints. Layered over the `SwapChainPanel` in the
|
/// (host · time · loss/skips), and the shortcut hints. Layered over the `SwapChainPanel` in the
|
||||||
/// same grid cell.
|
/// same grid cell.
|
||||||
fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
||||||
@@ -212,12 +213,21 @@ fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
|||||||
if stats.same_host {
|
if stats.same_host {
|
||||||
e2e_line.push_str(" (same-host clock)");
|
e2e_line.push_str(" (same-host clock)");
|
||||||
}
|
}
|
||||||
// The equation: the three stages tile the headline interval per frame; the window p50s only
|
// The equation: the stages tile the headline interval per frame; the window p50s only
|
||||||
// approximately sum (percentiles aren't additive).
|
// approximately sum (percentiles aren't additive). With per-AU 0xCF host timings the opaque
|
||||||
let stage_line = format!(
|
// `host+network` term splits into `host` (host capture→sent) + `network` (the remainder);
|
||||||
"= host+network {:.1} + decode {:.1} + display {:.1}",
|
// an old host emits none and the combined term stays.
|
||||||
stats.hostnet_ms, stats.decode_ms, present.display_p50_ms
|
let stage_line = if stats.split {
|
||||||
);
|
format!(
|
||||||
|
"= host {:.1} + network {:.1} + decode {:.1} + display {:.1}",
|
||||||
|
stats.host_ms, stats.net_ms, stats.decode_ms, present.display_p50_ms
|
||||||
|
)
|
||||||
|
} else {
|
||||||
|
format!(
|
||||||
|
"= host+network {:.1} + decode {:.1} + display {:.1}",
|
||||||
|
stats.hostnet_ms, stats.decode_ms, present.display_p50_ms
|
||||||
|
)
|
||||||
|
};
|
||||||
let mut session_bits: Vec<String> = Vec::new();
|
let mut session_bits: Vec<String> = Vec::new();
|
||||||
if !host.is_empty() {
|
if !host.is_empty() {
|
||||||
session_bits.push(host.to_string());
|
session_bits.push(host.to_string());
|
||||||
|
|||||||
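The HUD comment above notes that the stages tile per frame while the window p50s only approximately sum. A small worked example may make that concrete. It is a sketch with made-up sample values: `p50` follows the `sorted[len / 2]` index rule quoted elsewhere in this diff, and `demo` is a hypothetical helper, not part of the client.

```rust
/// p50 by the index rule `sorted[len / 2]`; an empty window reads 0.
fn p50(samples: &mut [u64]) -> u64 {
    if samples.is_empty() {
        return 0;
    }
    samples.sort_unstable();
    samples[samples.len() / 2]
}

/// Returns (p50 of host, p50 of network, p50 of the per-frame host + network sum)
/// for three invented frames, all in µs.
fn demo() -> (u64, u64, u64) {
    let mut host = [1_000u64, 2_000, 9_000];
    let mut net = [9_000u64, 2_000, 1_000];
    // Per frame the two terms tile the combined interval exactly.
    let mut combined: Vec<u64> = host.iter().zip(net.iter()).map(|(h, n)| h + n).collect();
    (p50(&mut host), p50(&mut net), p50(&mut combined))
}
```

Here the two stage medians are 2 000 µs each, yet the median of the per-frame sums is 10 000 µs. This is why the equation line is a guide to where time goes rather than an exact decomposition of the headline.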
@@ -238,6 +238,18 @@ fn run_headless_cli(args: &[String], identity: (String, String)) {
|
|||||||
session::SessionEvent::Connected {
|
session::SessionEvent::Connected {
|
||||||
mode, fingerprint, ..
|
mode, fingerprint, ..
|
||||||
} => tracing::info!(?mode, fp = %trust::hex(&fingerprint), "connected"),
|
} => tracing::info!(?mode, fp = %trust::hex(&fingerprint), "connected"),
|
||||||
|
// With per-AU 0xCF host timings the combined host+network stage splits into
|
||||||
|
// host (capture→sent on the host) + net; an old host emits none → combined only.
|
||||||
|
session::SessionEvent::Stats(s) if s.split => tracing::info!(
|
||||||
|
fps = format!("{:.0}", s.fps),
|
||||||
|
mbps = format!("{:.1}", s.mbps),
|
||||||
|
decode_p50_ms = format!("{:.2}", s.decode_ms),
|
||||||
|
hostnet_p50_ms = format!("{:.2}", s.hostnet_ms),
|
||||||
|
host_p50_ms = format!("{:.2}", s.host_ms),
|
||||||
|
net_p50_ms = format!("{:.2}", s.net_ms),
|
||||||
|
frames_seen,
|
||||||
|
"stats"
|
||||||
|
),
|
||||||
session::SessionEvent::Stats(s) => tracing::info!(
|
session::SessionEvent::Stats(s) => tracing::info!(
|
||||||
fps = format!("{:.0}", s.fps),
|
fps = format!("{:.0}", s.fps),
|
||||||
mbps = format!("{:.1}", s.mbps),
|
mbps = format!("{:.1}", s.mbps),
|
||||||
|
|||||||
@@ -55,6 +55,15 @@ pub struct Stats {
|
|||||||
/// `host+network` stage p50 over the last 1 s window: capture (`pts_ns`) → received,
|
/// `host+network` stage p50 over the last 1 s window: capture (`pts_ns`) → received,
|
||||||
/// host-clock corrected via `clock_offset_ns`.
|
/// host-clock corrected via `clock_offset_ns`.
|
||||||
pub hostnet_ms: f32,
|
pub hostnet_ms: f32,
|
||||||
|
/// `host` stage p50 (host capture→sent, from the per-AU 0xCF host-timing plane). Valid only
|
||||||
|
/// when `split` — an old host emits no 0xCF and the HUD keeps the combined stage.
|
||||||
|
pub host_ms: f32,
|
||||||
|
/// `network` stage p50 (`hostnet − host`, tiled per frame before taking the percentile).
|
||||||
|
/// Valid only when `split`.
|
||||||
|
pub net_ms: f32,
|
||||||
|
/// True when any 0xCF host timings matched received AUs this window — the HUD then renders
|
||||||
|
/// `host + network` instead of the combined `host+network` term.
|
||||||
|
pub split: bool,
|
||||||
/// True when `clock_offset_ns == 0` (host didn't answer the skew handshake / same host) —
|
/// True when `clock_offset_ns == 0` (host didn't answer the skew handshake / same host) —
|
||||||
/// the HUD appends `(same-host clock)` to the end-to-end line.
|
/// the HUD appends `(same-host clock)` to the end-to-end line.
|
||||||
pub same_host: bool,
|
pub same_host: bool,
|
||||||
@@ -330,6 +339,12 @@ fn pump(
|
|||||||
// 1 s tumbling stage windows (spec: design/stats-unification.md — percentiles, never means).
|
// 1 s tumbling stage windows (spec: design/stats-unification.md — percentiles, never means).
|
||||||
let mut hostnet_us: Vec<u64> = Vec::with_capacity(256);
|
let mut hostnet_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
let mut decode_us: Vec<u64> = Vec::with_capacity(256);
|
let mut decode_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
// Host/network split (Phase 2): received AUs awaiting their 0xCF host timing, `(pts_ns,
|
||||||
|
// hostnet_us)`, matched as the datagrams arrive. Bounded — an old host never sends any.
|
||||||
|
let mut pending_split: std::collections::VecDeque<(u64, u64)> =
|
||||||
|
std::collections::VecDeque::with_capacity(256);
|
||||||
|
let mut host_us_w: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
let mut net_us_w: Vec<u64> = Vec::with_capacity(256);
|
||||||
let mut pcm = vec![0f32; 5760 * channels as usize]; // scratch: max Opus frame (120 ms) × channels
|
let mut pcm = vec![0f32; 5760 * channels as usize]; // scratch: max Opus frame (120 ms) × channels
|
||||||
// Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs.
|
// Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs.
|
||||||
let mut last_dropped = connector.frames_dropped();
|
let mut last_dropped = connector.frames_dropped();
|
||||||
@@ -352,6 +367,11 @@ fn pump(
|
|||||||
.max(0) as u64;
|
.max(0) as u64;
|
||||||
if hostnet > 0 && hostnet < 10_000_000_000 {
|
if hostnet > 0 && hostnet < 10_000_000_000 {
|
||||||
hostnet_us.push(hostnet / 1000);
|
hostnet_us.push(hostnet / 1000);
|
||||||
|
// Remember this AU for the 0xCF match below (host/network split).
|
||||||
|
pending_split.push_back((frame.pts_ns, hostnet / 1000));
|
||||||
|
if pending_split.len() > 256 {
|
||||||
|
pending_split.pop_front();
|
||||||
|
}
|
||||||
}
|
}
|
||||||
// A D3D11VA→software demotion (see `Decoder::decode`) starts a FRESH decoder that
|
// A D3D11VA→software demotion (see `Decoder::decode`) starts a FRESH decoder that
|
||||||
// has none of the stream's parameter sets; under infinite GOP it would sit on
|
// has none of the stream's parameter sets; under infinite GOP it would sit on
|
||||||
@@ -440,15 +460,34 @@ fn pump(
|
|||||||
*crate::present::LATEST_HDR_META.lock().unwrap() = Some(meta);
|
*crate::present::LATEST_HDR_META.lock().unwrap() = Some(meta);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Drain the per-AU host-timing plane (0xCF) and match by pts: `host` = the host's own
|
||||||
|
// capture→sent, `network` = our capture→received minus it — the two tile per frame
|
||||||
|
// (design/stats-unification.md Phase 2). An old host never emits any; `split` stays false
|
||||||
|
// and the HUD keeps the combined `host+network` stage.
|
||||||
|
while let Ok(t) = connector.next_host_timing(Duration::ZERO) {
|
||||||
|
if let Some(i) = pending_split.iter().position(|(p, _)| *p == t.pts_ns) {
|
||||||
|
let (_, hn_us) = pending_split.remove(i).unwrap();
|
||||||
|
host_us_w.push(t.host_us as u64);
|
||||||
|
net_us_w.push(hn_us.saturating_sub(t.host_us as u64));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if window_start.elapsed() >= Duration::from_secs(1) {
|
if window_start.elapsed() >= Duration::from_secs(1) {
|
||||||
let secs = window_start.elapsed().as_secs_f32();
|
let secs = window_start.elapsed().as_secs_f32();
|
||||||
hostnet_us.sort_unstable();
|
hostnet_us.sort_unstable();
|
||||||
decode_us.sort_unstable();
|
decode_us.sort_unstable();
|
||||||
|
host_us_w.sort_unstable();
|
||||||
|
net_us_w.sort_unstable();
|
||||||
let p50 = |v: &[u64]| v.get(v.len() / 2).copied().unwrap_or(0);
|
let p50 = |v: &[u64]| v.get(v.len() / 2).copied().unwrap_or(0);
|
||||||
let (hostnet_p50, decode_p50) = (p50(&hostnet_us), p50(&decode_us));
|
let (hostnet_p50, decode_p50) = (p50(&hostnet_us), p50(&decode_us));
|
||||||
|
let (host_p50, net_p50) = (p50(&host_us_w), p50(&net_us_w));
|
||||||
|
let split = !host_us_w.is_empty();
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
fps = frames_n,
|
fps = frames_n,
|
||||||
hostnet_p50_us = hostnet_p50,
|
hostnet_p50_us = hostnet_p50,
|
||||||
|
host_p50_us = host_p50,
|
||||||
|
net_p50_us = net_p50,
|
||||||
|
split,
|
||||||
decode_p50_us = decode_p50,
|
decode_p50_us = decode_p50,
|
||||||
total_frames,
|
total_frames,
|
||||||
"stream window"
|
"stream window"
|
||||||
@@ -458,6 +497,9 @@ fn pump(
|
|||||||
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
||||||
decode_ms: decode_p50 as f32 / 1000.0,
|
decode_ms: decode_p50 as f32 / 1000.0,
|
||||||
hostnet_ms: hostnet_p50 as f32 / 1000.0,
|
hostnet_ms: hostnet_p50 as f32 / 1000.0,
|
||||||
|
host_ms: host_p50 as f32 / 1000.0,
|
||||||
|
net_ms: net_p50 as f32 / 1000.0,
|
||||||
|
split,
|
||||||
same_host: clock_offset == 0,
|
same_host: clock_offset == 0,
|
||||||
hardware,
|
hardware,
|
||||||
hdr,
|
hdr,
|
||||||
@@ -470,6 +512,8 @@ fn pump(
|
|||||||
bytes_n = 0;
|
bytes_n = 0;
|
||||||
hostnet_us.clear();
|
hostnet_us.clear();
|
||||||
decode_us.clear();
|
decode_us.clear();
|
||||||
|
host_us_w.clear();
|
||||||
|
net_us_w.clear();
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
+34
-15
@@ -121,22 +121,41 @@ Sunshine's "host processing latency" (capture→send).
|
|||||||
from frame-number gaps); our `fps` counts received only — equal at ~0 loss.
|
from frame-number gaps); our `fps` counts received only — equal at ~0 loss.
|
||||||
- Moonlight decode/queue/render times are **means**; ours are p50s.
|
- Moonlight decode/queue/render times are **means**; ours are p50s.
|
||||||
|
|
||||||
## Phase 2 (specced, not in v1): split `host+network`
|
## Phase 2 (implemented): split `host+network` via the 0xCF host-timing plane
|
||||||
|
|
||||||
Carry the host's capture→send duration per AU (host stamps it at send, e.g. a
|
Not an AU-header change after all — the hardened data-plane format stays untouched.
|
||||||
varint-µs field in the AU header or a 0.1 ms u16 à la Sunshine's frame header). Client
|
The host reports its share on the established QUIC side-plane pattern:
|
||||||
then displays `host {x} + network {y}` instead of `host+network`, where
|
|
||||||
`network = (received − capture) − host_reported` — and the Moonlight matrix gains a
|
- **Cap bit**: `Hello::video_caps` gains `VIDEO_CAP_HOST_TIMING` (0x08). NativeClient
|
||||||
direct "Host processing latency" counterpart. Requires a core wire/ABI bump
|
ORs it in unconditionally; the probe sets it explicitly. Old hosts ignore it.
|
||||||
(`punktfunk_frame` gains `host_latency_us`), trailing-byte back-compat like the
|
- **Datagram**: `HOST_TIMING_MAGIC` 0xCF, 13 bytes — `[tag][pts_ns u64 LE][host_us
|
||||||
compositor/gamepad preference bytes. Also consider surfacing the QUIC path RTT
|
u32 LE]` (`quic::HostTiming`). Emitted once per AU by the send thread right after
|
||||||
(quinn exposes it) as a diagnostics line, clearly labelled control-plane RTT.
|
the AU's last packet left the socket, so `host_us` = capture→fully-sent (capture
|
||||||
|
read/convert, encode, FEC+seal, paced send) against the same anchor as the wire
|
||||||
|
pts. Speed-test filler (FLAG_PROBE) is skipped. The synthetic host emits it too
|
||||||
|
(loopback protocol tests cover the plane).
|
||||||
|
- **Client math**: correlate by `pts_ns` (a bounded pending ring of receipt samples),
|
||||||
|
`host = host_us`, `network = hostnet_sample − host_us` (saturating) — the two terms
|
||||||
|
tile the `host+network` stage per frame by construction. Equation line becomes
|
||||||
|
`= host {x} + network {y} + decode + display`; when no 0xCF arrived in the window
|
||||||
|
(old host / all datagrams lost) it falls back to the combined `host+network` term.
|
||||||
|
- **Surfaces**: `NativeClient::next_host_timing()` (Rust clients), C ABI
|
||||||
|
`PunktfunkHostTiming` + `punktfunk_connection_next_host_timing` (Apple), probe log
|
||||||
|
line `host/network latency split` (host_p50/p95_us · net_p50/p95_us).
|
||||||
|
- This is our direct analogue of Sunshine's "host processing latency" — ours
|
||||||
|
additionally includes the paced send (theirs stops just before the UDP send).
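
As a rough illustration of the datagram layout and client math listed above, the sketch below decodes one 13-byte 0xCF payload and splits a single receipt sample. `parse_host_timing` and `split_sample` are hypothetical names used only for illustration, and the local `HostTiming` struct is a stand-in for `quic::HostTiming`; only the byte layout, the pts match, and the saturating subtraction come from the text above.

```rust
use std::collections::VecDeque;

/// Tag byte of the host-timing datagram (value from the spec text above).
const HOST_TIMING_MAGIC: u8 = 0xCF;

/// Illustrative stand-in for the wire fields; the real type is `quic::HostTiming`.
#[derive(Debug, PartialEq)]
struct HostTiming {
    pts_ns: u64,
    host_us: u32,
}

/// Hypothetical decoder for `[tag][pts_ns u64 LE][host_us u32 LE]` (13 bytes).
fn parse_host_timing(buf: &[u8]) -> Option<HostTiming> {
    if buf.len() != 13 || buf[0] != HOST_TIMING_MAGIC {
        return None;
    }
    let mut pts = [0u8; 8];
    pts.copy_from_slice(&buf[1..9]);
    let mut host = [0u8; 4];
    host.copy_from_slice(&buf[9..13]);
    Some(HostTiming {
        pts_ns: u64::from_le_bytes(pts),
        host_us: u32::from_le_bytes(host),
    })
}

/// Hypothetical matcher: looks up the receipt sample `(pts_ns, hostnet_us)` with
/// the same pts, removes it from the pending ring and returns `(host_us, net_us)`.
fn split_sample(pending: &mut VecDeque<(u64, u64)>, t: &HostTiming) -> Option<(u64, u64)> {
    let i = pending.iter().position(|(pts, _)| *pts == t.pts_ns)?;
    let (_, hostnet_us) = pending.remove(i)?;
    let host_us = u64::from(t.host_us);
    // Saturating: a host report larger than the measured stage clamps network to 0.
    Some((host_us, hostnet_us.saturating_sub(host_us)))
}

fn main() {
    // Build a datagram for a frame with pts 1_000_000 ns and host share 3_100 µs.
    let mut dgram = vec![HOST_TIMING_MAGIC];
    dgram.extend_from_slice(&1_000_000u64.to_le_bytes());
    dgram.extend_from_slice(&3_100u32.to_le_bytes());
    let t = parse_host_timing(&dgram).expect("well-formed datagram");

    // The client measured 9_800 µs capture→received for the same frame.
    let mut pending = VecDeque::from([(1_000_000u64, 9_800u64)]);
    let (host, net) = split_sample(&mut pending, &t).expect("pts matches");
    println!("host {host} us + network {net} us");
}
```

Because `network` is derived per frame from the same `hostnet` sample, the two terms always add back up to that sample, except where the saturating subtraction clamps.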
|
||||||
|
|
||||||
|
Still open for a later phase: surfacing the QUIC path RTT (quinn exposes it) as a
|
||||||
|
diagnostics line, clearly labelled control-plane RTT.
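
The fallback rule for the equation line can be sketched as a small formatter. This is a minimal illustration, not the HUD code of any client: `StageP50s` and `equation_line` are hypothetical names, and the exact spacing is copied from the docs-site example rather than from a renderer.

```rust
/// Hypothetical per-window stage medians in milliseconds (names are illustrative).
struct StageP50s {
    hostnet_ms: f32,
    host_ms: f32,
    net_ms: f32,
    decode_ms: f32,
    display_ms: f32,
    /// True when at least one 0xCF host timing matched a received AU this window.
    split: bool,
}

/// Renders the equation line: split terms when host timings matched,
/// otherwise the combined `host+network` term.
fn equation_line(s: &StageP50s) -> String {
    let head = if s.split {
        format!("host {:.1} + network {:.1}", s.host_ms, s.net_ms)
    } else {
        format!("host+network {:.1}", s.hostnet_ms)
    };
    format!("= {head} + decode {:.1} + display {:.1}", s.decode_ms, s.display_ms)
}

fn main() {
    let mut s = StageP50s {
        hostnet_ms: 9.8,
        host_ms: 3.1,
        net_ms: 6.7,
        decode_ms: 2.1,
        display_ms: 2.3,
        split: true,
    };
    println!("{}", equation_line(&s));
    // Old host or every datagram lost: no timing matched, so fall back.
    s.split = false;
    println!("{}", equation_line(&s));
}
```

Keying the choice on "any timing matched this window" rather than on the negotiated cap bit means a window in which every 0xCF datagram was lost also falls back, so the HUD does not show a zero host term.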
|
||||||
|
|
||||||
## Implementation status
|
## Implementation status
|
||||||
|
|
||||||
- [ ] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse)
|
- [x] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse) — 09a5957
|
||||||
- [ ] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95)
|
- [x] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95) — 09a5957
|
||||||
- [ ] Linux (`ui_stream.rs` OSD, `session.rs` window meters)
|
- [x] Linux (`ui_stream.rs` OSD, `session.rs` window meters) — 09a5957
|
||||||
- [ ] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`)
|
- [x] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`) — 09a5957
|
||||||
- [ ] probe (rename `capture→reassembled` → `capture→received` in the log line)
|
- [x] probe (rename `capture→reassembled` → `capture→received` in the log line) — 09a5957
|
||||||
- [ ] docs-site stats page + matrix; link from `moonlight.md`
|
- [x] docs-site stats page + matrix; link from `moonlight.md` — 09a5957
|
||||||
|
- [x] Phase 2 wire layer (0xCF + cap bit + NativeClient/ABI + host emission + probe split) — 449a67c
|
||||||
|
- [ ] Phase 2 client HUDs (host/network equation terms on Apple/Windows/Linux/Android)
|
||||||
|
- [ ] On-glass validation everywhere (Mac swift test + glass, Windows CI + glass, Android device, Linux glass)
|
||||||
|
|||||||
@@ -25,7 +25,7 @@ life:
|
|||||||
```
|
```
|
||||||
1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
|
1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
|
||||||
end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
|
end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
|
||||||
= host+network 9.8 + decode 2.1 + display 2.3
|
= host 3.1 + network 6.7 + decode 2.1 + display 2.3
|
||||||
lost 3 (0.1%) · skipped 1 · FEC 12
|
lost 3 (0.1%) · skipped 1 · FEC 12
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -35,14 +35,18 @@ lost 3 (0.1%) · skipped 1 · FEC 12
|
|||||||
capture to the endpoint named at the end of the line (`capture→on-glass` here).
|
capture to the endpoint named at the end of the line (`capture→on-glass` here).
|
||||||
`p50` = the typical frame (median), `p95` = the slow outliers. This is the one
|
`p50` = the typical frame (median), `p95` = the slow outliers. This is the one
|
||||||
number that summarizes your stream.
|
number that summarizes your stream.
|
||||||
- **Line 3 — where the time goes.** The three stages **tile the end-to-end interval**
|
- **Line 3 — where the time goes.** The stages **tile the end-to-end interval** —
|
||||||
— each starts where the previous one ends, so they add up to the headline:
|
each starts where the previous one ends, so they add up to the headline:
|
||||||
- `host+network` — capture → received: the host's capture/encode/send pipeline
|
- `host` — capture → sent: the host's own share (capture read, encode, error
|
||||||
*plus* the network flight and reassembly, in one number.
|
coding, the paced send), reported by the host itself once per frame.
|
||||||
|
- `network` — sent → received: the network flight plus reassembly on your device.
|
||||||
- `decode` — received → decoded, on your device.
|
- `decode` — received → decoded, on your device.
|
||||||
- `display` — decoded → displayed: waiting for the right screen refresh, rendering,
|
- `display` — decoded → displayed: waiting for the right screen refresh, rendering,
|
||||||
and vsync.
|
and vsync.
|
||||||
|
|
||||||
|
Against an **older host** that doesn't report its share yet, the first two terms
|
||||||
|
merge into a single `host+network` number — same total, one split fewer.
|
||||||
|
|
||||||
(Stage values are per-stage medians, so they sum only *approximately* to the
|
(Stage values are per-stage medians, so they sum only *approximately* to the
|
||||||
headline median — percentiles aren't perfectly additive. The headline is measured
|
headline median — percentiles aren't perfectly additive. The headline is measured
|
||||||
directly, never computed as a sum.)
|
directly, never computed as a sum.)
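
A small worked example of why per-stage medians only approximately add up to the headline. The three frames are made-up numbers, and `p50` mirrors the "sort, then take the element at `len / 2`" rule the clients use; `sum_of_medians_and_headline` is a hypothetical helper for this illustration only.

```rust
/// Median as the element at `len / 2` of the sorted samples (0 when empty).
fn p50(v: &[u64]) -> u64 {
    let mut s = v.to_vec();
    s.sort_unstable();
    s.get(s.len() / 2).copied().unwrap_or(0)
}

/// Takes per-frame `(host+network, decode, display)` stage times in microseconds
/// and returns `(sum of the three stage medians, median of the per-frame totals)`.
fn sum_of_medians_and_headline(frames: &[(u64, u64, u64)]) -> (u64, u64) {
    let hostnet: Vec<u64> = frames.iter().map(|f| f.0).collect();
    let decode: Vec<u64> = frames.iter().map(|f| f.1).collect();
    let display: Vec<u64> = frames.iter().map(|f| f.2).collect();
    let totals: Vec<u64> = frames.iter().map(|f| f.0 + f.1 + f.2).collect();
    (p50(&hostnet) + p50(&decode) + p50(&display), p50(&totals))
}

fn main() {
    // Made-up frames: within each frame the stages tile the total exactly.
    let frames = [
        (9_000, 1_000, 5_000),  // total 15_000
        (10_000, 3_000, 1_000), // total 14_000
        (14_000, 2_000, 2_000), // total 18_000
    ];
    let (sum, headline) = sum_of_medians_and_headline(&frames);
    // Stage medians are 10_000 + 2_000 + 2_000; the median total is 15_000.
    println!("sum of stage medians = {sum} us, headline median = {headline} us");
}
```

Every frame tiles exactly, yet the stage medians come from different frames, so their sum (14 000 µs) differs from the measured headline median (15 000 µs).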
|
||||||
@@ -109,10 +113,10 @@ stands in for a one-way frame flight that Moonlight doesn't measure.)
|
|||||||
| `Incoming frame rate from network` | Frames reassembled from the network per second | `fps` (line 1) | **Yes — direct** |
|
| `Incoming frame rate from network` | Frames reassembled from the network per second | `fps` (line 1) | **Yes — direct** |
|
||||||
| `Decoding frame rate` (desktop only) | Frames leaving the decoder per second | not shown separately (equals `fps` unless the decoder is falling behind) | — |
|
| `Decoding frame rate` (desktop only) | Frames leaving the decoder per second | not shown separately (equals `fps` unless the decoder is falling behind) | — |
|
||||||
| `Rendering frame rate` (desktop only) | Frames actually presented per second | `fps` minus `skipped` | Approximately |
|
| `Rendering frame rate` (desktop only) | Frames actually presented per second | `fps` minus `skipped` | Approximately |
|
||||||
| `Host processing latency min/max/avg` (Sunshine hosts) | Host capture → just-before-send, reported by Sunshine per frame | contained inside `host+network`; the host-side breakdown lives in the punktfunk web console (capture/encode/send stages) | Indirect — punktfunk's `host+network` additionally includes the network flight |
|
| `Host processing latency min/max/avg` (Sunshine hosts) | Host capture → just-before-send, reported by Sunshine per frame | `host` (line 3) — the host reports capture→fully-sent per frame the same way | **Yes — direct** (punktfunk's includes the paced send itself, Sunshine's stops just before it; avg vs p50) |
|
||||||
| `Frames dropped by your network connection` | Frame-sequence gaps ÷ total frames | `lost` (line 4) | **Yes — direct** |
|
| `Frames dropped by your network connection` | Frame-sequence gaps ÷ total frames | `lost` (line 4) | **Yes — direct** |
|
||||||
| `Frames dropped due to network jitter` | Decoded frames the *client's pacer* chose to drop ÷ decoded frames | `skipped` (line 4) | Approximately (both are client-side pacing decisions, despite Moonlight's name) |
|
| `Frames dropped due to network jitter` | Decoded frames the *client's pacer* chose to drop ÷ decoded frames | `skipped` (line 4) | Approximately (both are client-side pacing decisions, despite Moonlight's name) |
|
||||||
| `Average network latency` | The **control connection's round-trip time** (ENet RTT + variance) — not video frame latency | none, on purpose | **No.** An RTT is not a frame latency; punktfunk measures the actual per-frame path instead |
|
| `Average network latency` | The **control connection's round-trip time** (ENet RTT + variance) — not video frame latency | `network` (line 3) is the closest concept, but it's the *actual one-way frame path* (flight + reassembly), not an RTT | **No direct comparison.** Roughly, punktfunk's `network` ≈ ½ × an idle RTT plus serialization time of the frame |
|
||||||
| `Average decoding time` | Mean time from decoder enqueue to picture out | `decode` (p50) | Yes (mean vs median; both include decoder queueing) |
|
| `Average decoding time` | Mean time from decoder enqueue to picture out | `decode` (p50) | Yes (mean vs median; both include decoder queueing) |
|
||||||
| `Average frame queue delay` | Mean time a decoded frame waits for its vsync slot | inside `display` | Sum the two Moonlight lines → |
|
| `Average frame queue delay` | Mean time a decoded frame waits for its vsync slot | inside `display` | Sum the two Moonlight lines → |
|
||||||
| `Average rendering time (incl. V-sync latency)` | Mean duration of the present call | inside `display` | …and compare against punktfunk's `display` |
|
| `Average rendering time (incl. V-sync latency)` | Mean duration of the present call | inside `display` | …and compare against punktfunk's `display` |
|
||||||
|
|||||||