feat(clients): unified stats vocabulary across every client + Moonlight comparison docs
One stat model everywhere (design/stats-unification.md): four measurement points (capture/received/decoded/displayed), three stages that tile the interval exactly, and a HUD that shows the addition explicitly — end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass = host+network 9.8 + decode 2.1 + display 2.3 replacing each client's ad-hoc mix of overlapping absolutes (the Apple HUD's three arrow lines that looked sequential but weren't), mean-vs-median decode times (Windows/Linux), missing same-host-clock flags (Windows/Linux), and three different names for the same capture→received measurement (probe's "reassembled", Apple/Android's "client", Windows/Linux's post-decode "lat"). Per client: Apple threads receivedNs through the VT decode via the frame refcon bit pattern so the decode stage exists at all (stage-1 fallback honestly degrades to a capture→received headline); Windows carries FrameTimes through the existing frame channel to the render thread and adds e2e p50/p95 post-Present; Linux stamps received at AU pop and rides decoded_ns on DecodedFrame to the paintable-set site; Android pairs receipt stamps with MediaCodec output buffers via the codec's pts round-trip (JNI stats array 14→16 doubles, indexes 0-13 unchanged). fps now uniformly counts received AUs; lost/(received+lost) per window, hidden at zero. docs-site gains "Understanding the Stats Overlay": what each line means, why the equation only approximately sums (percentiles), and a line-by-line Moonlight/Sunshine matrix — including that Moonlight has no end-to-end number and its "network latency" is an ENet control RTT, so punktfunk's headline must not be compared against any single Moonlight line. Verified here: linux client + probe + core check/clippy/fmt green, android native cargo-ndk arm64 check green. Pending: Windows CI + on-glass, swift test on the mac, on-device Android. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
@@ -64,7 +64,7 @@ The **GameStream host works with a stock Moonlight client** — validated live o
|
|||||||
and **video at the client's exact resolution and refresh** via a per-session virtual output (KWin,
|
and **video at the client's exact resolution and refresh** via a per-session virtual output (KWin,
|
||||||
gamescope, Mutter, and Sway/wlroots backends), encoded with GPU **zero-copy** (dmabuf → CUDA/Vulkan →
|
gamescope, Mutter, and Sway/wlroots backends), encoded with GPU **zero-copy** (dmabuf → CUDA/Vulkan →
|
||||||
NVENC) up to 5120×1440@240. The native **`punktfunk/1`** protocol adds a QUIC control plane and a
|
NVENC) up to 5120×1440@240. The native **`punktfunk/1`** protocol adds a QUIC control plane and a
|
||||||
GF(2¹⁶) Leopard-FEC + AES-GCM data plane (p50 ~0.8 ms capture→reassembled at 720p120), with
|
GF(2¹⁶) Leopard-FEC + AES-GCM data plane (p50 ~0.8 ms capture→received at 720p120), with
|
||||||
mid-stream mode renegotiation and a wall-clock skew handshake so latency stays valid across machines.
|
mid-stream mode renegotiation and a wall-clock skew handshake so latency stays valid across machines.
|
||||||
Both run from **one process**: bare `punktfunk-host serve` is the **secure native-only default**
|
Both run from **one process**: bare `punktfunk-host serve` is the **secure native-only default**
|
||||||
(`punktfunk/1` + the management API/web console), and `serve --gamestream` additionally enables the
|
(`punktfunk/1` + the management API/web console), and `serve --gamestream` additionally enables the
|
||||||
|
|||||||
@@ -15,11 +15,13 @@ import io.unom.punktfunk.kit.NativeBridge
|
|||||||
import kotlin.math.roundToInt
|
import kotlin.math.roundToInt
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The live stats overlay — mirrors the Apple client's HUD. Reads the 14-double layout from
|
* The live stats overlay — the unified HUD (`design/stats-unification.md`, Android v1: headline is
|
||||||
|
* `capture→decoded`, tiled by `host+network` + `decode`). Reads the 16-double layout from
|
||||||
* [NativeBridge.nativeVideoStats]:
|
* [NativeBridge.nativeVideoStats]:
|
||||||
* `[fps, mbps, latP50Ms, latP95Ms, latValid, skew, w, h, hz, dropped, bitDepth, colorPrimaries,
|
* `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skew, w, h, hz, lost, bitDepth, colorPrimaries,
|
||||||
* colorTransfer, chromaFormatIdc]`. The trailing four (present on a current native lib) describe the
|
* colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`. Indexes 10–13 (present on a current
|
||||||
* negotiated video feed and render as a codec/depth/colour/chroma line; older layouts just omit it.
|
* native lib) describe the negotiated video feed and render as a codec/depth/colour/chroma line;
|
||||||
|
* 14/15 render as the stage equation; older layouts just omit those lines.
|
||||||
*/
|
*/
|
||||||
@Composable
|
@Composable
|
||||||
internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) {
|
internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) {
|
||||||
@@ -29,7 +31,7 @@ internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) {
|
|||||||
val hz = s[8].toInt()
|
val hz = s[8].toInt()
|
||||||
val latValid = s[4] != 0.0
|
val latValid = s[4] != 0.0
|
||||||
val skew = s[5] != 0.0
|
val skew = s[5] != 0.0
|
||||||
val dropped = s[9].toLong()
|
val lost = s[9].toLong()
|
||||||
Column(
|
Column(
|
||||||
modifier = modifier
|
modifier = modifier
|
||||||
.background(Color.Black.copy(alpha = 0.45f), RoundedCornerShape(6.dp))
|
.background(Color.Black.copy(alpha = 0.45f), RoundedCornerShape(6.dp))
|
||||||
@@ -50,17 +52,25 @@ internal fun StatsOverlay(s: DoubleArray, modifier: Modifier = Modifier) {
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
if (latValid) {
|
if (latValid) {
|
||||||
val tag = if (skew) "" else " (same-host)"
|
val tag = if (skew) "" else " (same-host clock)"
|
||||||
Text(
|
Text(
|
||||||
"capture→client ${"%.1f".format(s[2])}/${"%.1f".format(s[3])} ms p50/p95$tag",
|
"end-to-end ${"%.1f".format(s[2])} ms p50 · ${"%.1f".format(s[3])} p95 · capture→decoded$tag",
|
||||||
|
color = Color.White,
|
||||||
|
fontFamily = FontFamily.Monospace,
|
||||||
|
fontSize = 12.sp,
|
||||||
|
)
|
||||||
|
if (s.size >= 16) {
|
||||||
|
Text(
|
||||||
|
"= host+network ${"%.1f".format(s[14])} + decode ${"%.1f".format(s[15])}",
|
||||||
color = Color.White,
|
color = Color.White,
|
||||||
fontFamily = FontFamily.Monospace,
|
fontFamily = FontFamily.Monospace,
|
||||||
fontSize = 12.sp,
|
fontSize = 12.sp,
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
if (dropped > 0) {
|
}
|
||||||
|
if (lost > 0) {
|
||||||
Text(
|
Text(
|
||||||
"dropped $dropped",
|
"lost $lost",
|
||||||
color = Color(0xFFFFB0B0),
|
color = Color(0xFFFFB0B0),
|
||||||
fontFamily = FontFamily.Monospace,
|
fontFamily = FontFamily.Monospace,
|
||||||
fontSize = 12.sp,
|
fontSize = 12.sp,
|
||||||
|
|||||||
@@ -105,12 +105,14 @@ object NativeBridge {
|
|||||||
|
|
||||||
/**
|
/**
|
||||||
* Drain ~1 s of live decode stats for the on-stream HUD, or `null` when no decode thread runs.
|
* Drain ~1 s of live decode stats for the on-stream HUD, or `null` when no decode thread runs.
|
||||||
* Returns 14 doubles:
|
* Returns 16 doubles (unified stats spec, `design/stats-unification.md`):
|
||||||
* `[fps, mbps, latP50Ms, latP95Ms, latValid, skewCorrected, width, height, refreshHz, framesDropped,
|
* `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skewCorrected, width, height, refreshHz, framesLost,
|
||||||
* bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc]`
|
* bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`
|
||||||
* (the two flags are 1.0/0.0; the trailing four describe the negotiated video feed — bit depth
|
* (the two flags are 1.0/0.0; indexes 2/3 are the end-to-end capture→decoded headline; 10–13
|
||||||
* 8/10, CICP primaries/transfer, and the HEVC chroma_format_idc 1=4:2:0 / 3=4:4:4). Poll ~1 Hz;
|
* describe the negotiated video feed — bit depth 8/10, CICP primaries/transfer, and the HEVC
|
||||||
* each call resets the measurement window.
|
* chroma_format_idc 1=4:2:0 / 3=4:4:4; 14/15 are the stage p50s tiling the headline —
|
||||||
|
* `host+network` = capture→received, `decode` = received→decoded). Poll ~1 Hz; each call
|
||||||
|
* resets the measurement window.
|
||||||
*/
|
*/
|
||||||
external fun nativeVideoStats(handle: Long): DoubleArray?
|
external fun nativeVideoStats(handle: Long): DoubleArray?
|
||||||
|
|
||||||
|
|||||||
@@ -9,16 +9,22 @@
|
|||||||
use ndk::data_space::DataSpace;
|
use ndk::data_space::DataSpace;
|
||||||
use ndk::media::media_codec::{
|
use ndk::media::media_codec::{
|
||||||
DequeuedInputBufferResult, DequeuedOutputBufferInfoResult, MediaCodec, MediaCodecDirection,
|
DequeuedInputBufferResult, DequeuedOutputBufferInfoResult, MediaCodec, MediaCodecDirection,
|
||||||
|
OutputBuffer,
|
||||||
};
|
};
|
||||||
use ndk::media::media_format::MediaFormat;
|
use ndk::media::media_format::MediaFormat;
|
||||||
use ndk::native_window::{FrameRateCompatibility, NativeWindow};
|
use ndk::native_window::{FrameRateCompatibility, NativeWindow};
|
||||||
use punktfunk_core::client::NativeClient;
|
use punktfunk_core::client::NativeClient;
|
||||||
use punktfunk_core::error::PunktfunkError;
|
use punktfunk_core::error::PunktfunkError;
|
||||||
use punktfunk_core::session::Frame;
|
use punktfunk_core::session::Frame;
|
||||||
|
use std::collections::VecDeque;
|
||||||
use std::sync::atomic::{AtomicBool, Ordering};
|
use std::sync::atomic::{AtomicBool, Ordering};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use std::time::{Duration, Instant};
|
use std::time::{Duration, Instant};
|
||||||
|
|
||||||
|
/// Cap on the pts→received-timestamp map below: MediaCodec holds only a handful of frames in
|
||||||
|
/// flight, so anything beyond this is stale (codec flushed / HUD toggled) and gets evicted.
|
||||||
|
const IN_FLIGHT_CAP: usize = 64;
|
||||||
|
|
||||||
/// The decode loop. Runs on the `pf-decode` thread until `shutdown` is set or the session closes.
|
/// The decode loop. Runs on the `pf-decode` thread until `shutdown` is set or the session closes.
|
||||||
pub fn run(
|
pub fn run(
|
||||||
client: Arc<NativeClient>,
|
client: Arc<NativeClient>,
|
||||||
@@ -141,9 +147,14 @@ pub fn run(
|
|||||||
// climbs.
|
// climbs.
|
||||||
let mut last_dropped = client.frames_dropped();
|
let mut last_dropped = client.frames_dropped();
|
||||||
let mut last_kf_req: Option<Instant> = None;
|
let mut last_kf_req: Option<Instant> = None;
|
||||||
// Capture→client-receipt latency uses the negotiated host-minus-client clock offset (0 if the
|
// Skew-corrected latency stats (spec: design/stats-unification.md) use the negotiated
|
||||||
// host didn't answer the skew handshake — then the HUD flags it "same-host").
|
// host-minus-client clock offset (0 if the host didn't answer the skew handshake — then the
|
||||||
|
// HUD flags it "(same-host clock)").
|
||||||
let clock_offset = client.clock_offset_ns;
|
let clock_offset = client.clock_offset_ns;
|
||||||
|
// HUD stage split: receipt timestamps keyed by the pts we queue into the codec, so the decoded
|
||||||
|
// point (output-buffer dequeue — MediaCodec round-trips presentationTimeUs) can be paired back
|
||||||
|
// to its receipt for the `decode` stage. Only fed while the HUD is visible.
|
||||||
|
let mut in_flight: VecDeque<(u64, i128)> = VecDeque::new();
|
||||||
// The dataspace we've signalled on the Surface so far (None = default/SDR). Set reactively once
|
// The dataspace we've signalled on the Surface so far (None = default/SDR). Set reactively once
|
||||||
// the decoder reports an HDR stream (see `drain`); avoids re-applying every format event.
|
// the decoder reports an HDR stream (see `drain`); avoids re-applying every format event.
|
||||||
let mut applied_ds: Option<DataSpace> = None;
|
let mut applied_ds: Option<DataSpace> = None;
|
||||||
@@ -164,15 +175,21 @@ pub fn run(
|
|||||||
&p[..p.len().min(6)]
|
&p[..p.len().min(6)]
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
// HUD stat: capture→client-receipt latency = client_now + (host−client) −
|
// HUD stat, `received` point: host+network = client_now + (host−client) −
|
||||||
// capture_pts. Gated on the HUD being visible — `enabled` first so the hidden
|
// capture_pts. Gated on the HUD being visible — `enabled` first so the hidden
|
||||||
// steady state skips the wall-clock read and the lock entirely.
|
// steady state skips the wall-clock read and the lock entirely. The receipt
|
||||||
|
// stamp is also parked in `in_flight` (keyed by the pts the codec will echo on
|
||||||
|
// the output buffer) for the decoded-point pairing in `drain`.
|
||||||
if stats.enabled() {
|
if stats.enabled() {
|
||||||
let lat_ns =
|
let received_ns = now_realtime_ns();
|
||||||
now_realtime_ns() + clock_offset as i128 - frame.pts_ns as i128;
|
let lat_ns = received_ns + clock_offset as i128 - frame.pts_ns as i128;
|
||||||
let lat_us = (lat_ns > 0 && lat_ns < 10_000_000_000)
|
let lat_us = (lat_ns > 0 && lat_ns < 10_000_000_000)
|
||||||
.then_some((lat_ns / 1000) as u64);
|
.then_some((lat_ns / 1000) as u64);
|
||||||
stats.note(frame.data.len(), lat_us, clock_offset != 0);
|
stats.note_received(frame.data.len(), lat_us, clock_offset != 0);
|
||||||
|
in_flight.push_back((frame.pts_ns / 1000, received_ns));
|
||||||
|
if in_flight.len() > IN_FLIGHT_CAP {
|
||||||
|
in_flight.pop_front(); // stale — codec never echoed it back
|
||||||
|
}
|
||||||
}
|
}
|
||||||
pending = Some(frame);
|
pending = Some(frame);
|
||||||
}
|
}
|
||||||
@@ -202,7 +219,15 @@ pub fn run(
|
|||||||
} else {
|
} else {
|
||||||
Duration::ZERO
|
Duration::ZERO
|
||||||
};
|
};
|
||||||
let (r, d) = drain(&codec, &window, &mut applied_ds, wait);
|
let (r, d) = drain(
|
||||||
|
&codec,
|
||||||
|
&window,
|
||||||
|
&mut applied_ds,
|
||||||
|
wait,
|
||||||
|
&stats,
|
||||||
|
&mut in_flight,
|
||||||
|
clock_offset,
|
||||||
|
);
|
||||||
rendered += r;
|
rendered += r;
|
||||||
discarded += d;
|
discarded += d;
|
||||||
|
|
||||||
@@ -330,11 +355,19 @@ fn feed(codec: &MediaCodec, au: &[u8], pts_us: u64) -> bool {
|
|||||||
/// the caller's input is blocked so the loop waits on decoder progress instead of busy-spinning.
|
/// the caller's input is blocked so the loop waits on decoder progress instead of busy-spinning.
|
||||||
/// Returns `(rendered, discarded)`. Also reacts to `OutputFormatChanged` (which can interleave
|
/// Returns `(rendered, discarded)`. Also reacts to `OutputFormatChanged` (which can interleave
|
||||||
/// between buffers — handled without losing the held buffer) to signal HDR on the Surface.
|
/// between buffers — handled without losing the held buffer) to signal HDR on the Surface.
|
||||||
|
///
|
||||||
|
/// Each dequeued buffer is also the HUD's `decoded` measurement point (rendered or not — the frame
|
||||||
|
/// finished decoding either way): end-to-end = decoded + clock_offset − capture pts, and the
|
||||||
|
/// `decode` stage pairs the buffer's echoed presentationTimeUs back to the receipt stamp in
|
||||||
|
/// `in_flight` (single-clock local difference, no skew involved).
|
||||||
fn drain(
|
fn drain(
|
||||||
codec: &MediaCodec,
|
codec: &MediaCodec,
|
||||||
window: &NativeWindow,
|
window: &NativeWindow,
|
||||||
applied_ds: &mut Option<DataSpace>,
|
applied_ds: &mut Option<DataSpace>,
|
||||||
first_wait: Duration,
|
first_wait: Duration,
|
||||||
|
stats: &crate::stats::VideoStats,
|
||||||
|
in_flight: &mut VecDeque<(u64, i128)>,
|
||||||
|
clock_offset: i64,
|
||||||
) -> (u64, u64) {
|
) -> (u64, u64) {
|
||||||
let mut held = None; // newest ready buffer so far, presented after the loop
|
let mut held = None; // newest ready buffer so far, presented after the loop
|
||||||
let mut discarded: u64 = 0;
|
let mut discarded: u64 = 0;
|
||||||
@@ -343,6 +376,9 @@ fn drain(
|
|||||||
match codec.dequeue_output_buffer(wait) {
|
match codec.dequeue_output_buffer(wait) {
|
||||||
Ok(DequeuedOutputBufferInfoResult::Buffer(buf)) => {
|
Ok(DequeuedOutputBufferInfoResult::Buffer(buf)) => {
|
||||||
wait = Duration::ZERO; // only the first dequeue may block
|
wait = Duration::ZERO; // only the first dequeue may block
|
||||||
|
if stats.enabled() {
|
||||||
|
note_decoded(stats, in_flight, clock_offset, &buf);
|
||||||
|
}
|
||||||
if let Some(stale) = held.replace(buf) {
|
if let Some(stale) = held.replace(buf) {
|
||||||
// A newer frame is ready — drop the held one without rendering.
|
// A newer frame is ready — drop the held one without rendering.
|
||||||
if let Err(e) = codec.release_output_buffer(stale, false) {
|
if let Err(e) = codec.release_output_buffer(stale, false) {
|
||||||
@@ -392,6 +428,40 @@ fn drain(
|
|||||||
(rendered, discarded)
|
(rendered, discarded)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// HUD `decoded` point for one dequeued output buffer: build the end-to-end (capture→decoded,
|
||||||
|
/// skew-corrected, clamped to (0, 10 s)) and `decode` (received→decoded, single-clock local, ≥ 0)
|
||||||
|
/// samples and hand them to [`crate::stats::VideoStats::note_decoded`]. The codec echoes the input
|
||||||
|
/// `presentationTimeUs` on the output buffer, which keys the receipt stamp in `in_flight`; entries
|
||||||
|
/// older than the echoed pts are evicted (decode order == input order here — low-latency, no
|
||||||
|
/// B-frames — so anything before it was dropped inside the codec or stamped before a flush).
|
||||||
|
fn note_decoded(
|
||||||
|
stats: &crate::stats::VideoStats,
|
||||||
|
in_flight: &mut VecDeque<(u64, i128)>,
|
||||||
|
clock_offset: i64,
|
||||||
|
buf: &OutputBuffer<'_>,
|
||||||
|
) {
|
||||||
|
let pts_us = buf.info().presentation_time_us().max(0) as u64;
|
||||||
|
let decoded_ns = now_realtime_ns();
|
||||||
|
// Pair the echoed pts back to its receipt stamp, evicting stale (older) entries as we go.
|
||||||
|
let mut received_ns = None;
|
||||||
|
while let Some(&(p, r)) = in_flight.front() {
|
||||||
|
if p > pts_us {
|
||||||
|
break; // future frame — leave it for its own output buffer
|
||||||
|
}
|
||||||
|
in_flight.pop_front();
|
||||||
|
if p == pts_us {
|
||||||
|
received_ns = Some(r);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// pts_us is the truncated frame.pts_ns/1000 we queued, so ×1000 re-approximates capture time
|
||||||
|
// to < 1 µs — negligible against the ms-scale figures shown.
|
||||||
|
let e2e_ns = decoded_ns + clock_offset as i128 - pts_us as i128 * 1000;
|
||||||
|
let e2e_us = (e2e_ns > 0 && e2e_ns < 10_000_000_000).then_some((e2e_ns / 1000) as u64);
|
||||||
|
let decode_us = received_ns.map(|r| ((decoded_ns - r).max(0) / 1000) as u64);
|
||||||
|
stats.note_decoded(e2e_us, decode_us);
|
||||||
|
}
|
||||||
|
|
||||||
/// Map the decoder's reported output colour to a BT.2020 HDR dataspace, or `None` for SDR. The
|
/// Map the decoder's reported output colour to a BT.2020 HDR dataspace, or `None` for SDR. The
|
||||||
/// integer values are the Android MediaFormat colour constants the NDK shares: COLOR_TRANSFER
|
/// integer values are the Android MediaFormat colour constants the NDK shares: COLOR_TRANSFER
|
||||||
/// ST2084 = 6 (PQ/HDR10), HLG = 7; COLOR_RANGE FULL = 1, LIMITED = 2 (the host encodes limited).
|
/// ST2084 = 6 (PQ/HDR10), HLG = 7; COLOR_RANGE FULL = 1, LIMITED = 2 (the host encodes limited).
|
||||||
|
|||||||
@@ -72,14 +72,16 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeStopVideo(
|
|||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
/// `NativeBridge.nativeVideoStats(handle): DoubleArray?` — drain ~1 s of decode stats for the HUD.
|
/// `NativeBridge.nativeVideoStats(handle): DoubleArray?` — drain ~1 s of decode stats for the HUD
|
||||||
/// Returns 14 doubles
|
/// (unified stats spec, `design/stats-unification.md`). Returns 16 doubles
|
||||||
/// `[fps, mbps, latP50Ms, latP95Ms, latValid, skewCorrected, width, height, refreshHz, framesDropped,
|
/// `[fps, mbps, e2eP50Ms, e2eP95Ms, latValid, skewCorrected, width, height, refreshHz, framesLost,
|
||||||
/// bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc]`
|
/// bitDepth, colorPrimaries, colorTransfer, chromaFormatIdc, hostNetP50Ms, decodeP50Ms]`
|
||||||
/// (the two flags are 1.0/0.0; the trailing four describe the negotiated video feed — see below), or
|
/// (the two flags are 1.0/0.0; indexes 0–13 match the previous 14-double layout with the latency
|
||||||
/// `null` when no decode thread is running. Poll ~1 Hz from the UI; each call resets the measurement
|
/// pair re-based from capture→received to the end-to-end capture→decoded headline; the two stage
|
||||||
/// window. Not android-gated — pure `jni` + connector reads, so it links on the host build too
|
/// p50s tiling it — `host+network` = capture→received, `decode` = received→decoded — are appended
|
||||||
/// (Kotlin only ever calls it on device).
|
/// at the end), or `null` when no decode thread is running. Poll ~1 Hz from the UI; each call
|
||||||
|
/// resets the measurement window. Not android-gated — pure `jni` + connector reads, so it links on
|
||||||
|
/// the host build too (Kotlin only ever calls it on device).
|
||||||
#[no_mangle]
|
#[no_mangle]
|
||||||
pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats(
|
pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats(
|
||||||
env: JNIEnv,
|
env: JNIEnv,
|
||||||
@@ -98,11 +100,11 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats(
|
|||||||
let snap = h.stats.drain();
|
let snap = h.stats.drain();
|
||||||
let mode = h.client.mode();
|
let mode = h.client.mode();
|
||||||
let color = h.client.color;
|
let color = h.client.color;
|
||||||
let buf: [f64; 14] = [
|
let buf: [f64; 16] = [
|
||||||
snap.fps,
|
snap.fps,
|
||||||
snap.mbps,
|
snap.mbps,
|
||||||
snap.lat_p50_ms,
|
snap.e2e_p50_ms,
|
||||||
snap.lat_p95_ms,
|
snap.e2e_p95_ms,
|
||||||
if snap.lat_valid { 1.0 } else { 0.0 },
|
if snap.lat_valid { 1.0 } else { 0.0 },
|
||||||
if snap.skew_corrected { 1.0 } else { 0.0 },
|
if snap.skew_corrected { 1.0 } else { 0.0 },
|
||||||
mode.width as f64,
|
mode.width as f64,
|
||||||
@@ -117,6 +119,9 @@ pub extern "system" fn Java_io_unom_punktfunk_kit_NativeBridge_nativeVideoStats(
|
|||||||
color.primaries as f64,
|
color.primaries as f64,
|
||||||
color.transfer as f64,
|
color.transfer as f64,
|
||||||
h.client.chroma_format as f64,
|
h.client.chroma_format as f64,
|
||||||
|
// Stage p50s tiling the end-to-end headline (appended to keep 0–13 index-compatible).
|
||||||
|
snap.hostnet_p50_ms,
|
||||||
|
snap.decode_p50_ms,
|
||||||
];
|
];
|
||||||
let arr = match env.new_double_array(buf.len() as jsize) {
|
let arr = match env.new_double_array(buf.len() as jsize) {
|
||||||
Ok(a) => a,
|
Ok(a) => a,
|
||||||
|
|||||||
@@ -1,8 +1,11 @@
|
|||||||
//! Live decode stats for the on-stream HUD (mirrors the Apple client's stats overlay): FPS,
|
//! Live decode stats for the on-stream HUD, following the unified stats spec
|
||||||
//! receive throughput, and capture→client-receipt latency (p50/p95). The decode thread is the sole
|
//! (`design/stats-unification.md`): FPS, receive throughput, and the Android v1 stage split —
|
||||||
//! writer (`note` per access unit); the JNI accessor `nativeVideoStats` drains a snapshot ~1 Hz and
|
//! headline `end-to-end` = capture→decoded (p50/p95) tiled by `host+network` = capture→received
|
||||||
//! resets the window. Sampling is gated on the HUD actually being visible (`set_enabled`, driven by
|
//! and `decode` = received→decoded (stage p50s). The decode thread is the sole writer
|
||||||
//! `nativeSetVideoStatsEnabled`) so the hidden steady state costs one relaxed atomic load per frame.
|
//! (`note_received` per access unit at receipt, `note_decoded` per decoder output buffer); the JNI
|
||||||
|
//! accessor `nativeVideoStats` drains a snapshot ~1 Hz and resets the window. Sampling is gated on
|
||||||
|
//! the HUD actually being visible (`set_enabled`, driven by `nativeSetVideoStatsEnabled`) so the
|
||||||
|
//! hidden steady state costs one relaxed atomic load per frame.
|
||||||
//! Pure `std` so it compiles on the host build too (the decode thread is android-only, but
|
//! Pure `std` so it compiles on the host build too (the decode thread is android-only, but
|
||||||
//! `SessionHandle` holds the shared handle unconditionally).
|
//! `SessionHandle` holds the shared handle unconditionally).
|
||||||
|
|
||||||
@@ -13,9 +16,9 @@ use std::time::Instant;
|
|||||||
/// Rolling per-window accumulator. Rates are computed over the actual elapsed wall-time at drain
|
/// Rolling per-window accumulator. Rates are computed over the actual elapsed wall-time at drain
|
||||||
/// (robust to poll jitter), so a poll that lands at 0.9 s or 1.1 s still reports the right FPS.
|
/// (robust to poll jitter), so a poll that lands at 0.9 s or 1.1 s still reports the right FPS.
|
||||||
pub struct VideoStats {
|
pub struct VideoStats {
|
||||||
/// HUD gate: `note` runs on the per-frame decode path, so while the overlay is hidden it (and
|
/// HUD gate: the samplers run on the per-frame decode path, so while the overlay is hidden
|
||||||
/// the caller's latency computation — see `enabled`) early-outs on this flag alone. Off until
|
/// they (and the caller's latency computation — see `enabled`) early-out on this flag alone.
|
||||||
/// Kotlin shows the HUD.
|
/// Off until Kotlin shows the HUD.
|
||||||
enabled: AtomicBool,
|
enabled: AtomicBool,
|
||||||
inner: Mutex<Inner>,
|
inner: Mutex<Inner>,
|
||||||
}
|
}
|
||||||
@@ -24,23 +27,42 @@ struct Inner {
|
|||||||
window_start: Instant,
|
window_start: Instant,
|
||||||
frames: u64,
|
frames: u64,
|
||||||
bytes: u64,
|
bytes: u64,
|
||||||
/// capture→client-receipt latency samples for this window, in microseconds.
|
/// `end-to-end` = capture→decoded latency samples for this window, in microseconds
|
||||||
lat_us: Vec<u64>,
|
/// (skew-corrected clock base).
|
||||||
|
e2e_us: Vec<u64>,
|
||||||
|
/// `host+network` stage = capture→received samples, in microseconds (skew-corrected).
|
||||||
|
hostnet_us: Vec<u64>,
|
||||||
|
/// `decode` stage = received→decoded samples, in microseconds (client-local, single clock).
|
||||||
|
decode_us: Vec<u64>,
|
||||||
/// Whether the host answered the clock-skew handshake (latency is cross-machine valid).
|
/// Whether the host answered the clock-skew handshake (latency is cross-machine valid).
|
||||||
skew_corrected: bool,
|
skew_corrected: bool,
|
||||||
}
|
}
|
||||||
|
|
||||||
/// A drained, computed view of one window. `lat_valid` is false when no in-range latency sample
|
/// A drained, computed view of one window. `lat_valid` is false when no in-range end-to-end sample
|
||||||
/// landed (then p50/p95 are 0 and the HUD hides the latency line, exactly like the Apple client).
|
/// landed (then the latency figures are 0 and the HUD hides the latency lines, exactly like the
|
||||||
|
/// Apple client).
|
||||||
pub struct Snapshot {
|
pub struct Snapshot {
|
||||||
pub fps: f64,
|
pub fps: f64,
|
||||||
pub mbps: f64,
|
pub mbps: f64,
|
||||||
pub lat_p50_ms: f64,
|
/// Headline `end-to-end` (capture→decoded) percentiles, ms.
|
||||||
pub lat_p95_ms: f64,
|
pub e2e_p50_ms: f64,
|
||||||
|
pub e2e_p95_ms: f64,
|
||||||
|
/// Stage p50s (ms): `host+network` (capture→received) and `decode` (received→decoded).
|
||||||
|
pub hostnet_p50_ms: f64,
|
||||||
|
pub decode_p50_ms: f64,
|
||||||
pub lat_valid: bool,
|
pub lat_valid: bool,
|
||||||
pub skew_corrected: bool,
|
pub skew_corrected: bool,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Percentile over a sorted-in-place µs sample vec, in ms. 0.0 when empty.
|
||||||
|
fn pctl_ms(sorted_us: &[u64], p: f64) -> f64 {
|
||||||
|
if sorted_us.is_empty() {
|
||||||
|
return 0.0;
|
||||||
|
}
|
||||||
|
let n = sorted_us.len();
|
||||||
|
sorted_us[((n as f64 * p) as usize).min(n - 1)] as f64 / 1000.0
|
||||||
|
}
|
||||||
|
|
||||||
impl VideoStats {
|
impl VideoStats {
|
||||||
pub fn new() -> VideoStats {
|
pub fn new() -> VideoStats {
|
||||||
VideoStats {
|
VideoStats {
|
||||||
@@ -49,14 +71,16 @@ impl VideoStats {
|
|||||||
window_start: Instant::now(),
|
window_start: Instant::now(),
|
||||||
frames: 0,
|
frames: 0,
|
||||||
bytes: 0,
|
bytes: 0,
|
||||||
lat_us: Vec::with_capacity(256),
|
e2e_us: Vec::with_capacity(256),
|
||||||
|
hostnet_us: Vec::with_capacity(256),
|
||||||
|
decode_us: Vec::with_capacity(256),
|
||||||
skew_corrected: false,
|
skew_corrected: false,
|
||||||
}),
|
}),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Whether the HUD wants samples. The decode thread checks this BEFORE building a latency
|
/// Whether the HUD wants samples. The decode thread checks this BEFORE building a latency
|
||||||
/// sample, so the per-frame wall-clock read is skipped too while hidden.
|
/// sample, so the per-frame wall-clock reads are skipped too while hidden.
|
||||||
// Read only by the android-only decode thread; unreferenced on the host build — expected.
|
// Read only by the android-only decode thread; unreferenced on the host build — expected.
|
||||||
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
|
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
|
||||||
pub fn enabled(&self) -> bool {
|
pub fn enabled(&self) -> bool {
|
||||||
@@ -75,18 +99,21 @@ impl VideoStats {
|
|||||||
g.window_start = Instant::now();
|
g.window_start = Instant::now();
|
||||||
g.frames = 0;
|
g.frames = 0;
|
||||||
g.bytes = 0;
|
g.bytes = 0;
|
||||||
g.lat_us.clear();
|
g.e2e_us.clear();
|
||||||
|
g.hostnet_us.clear();
|
||||||
|
g.decode_us.clear();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Record one decoded access unit: its wire size and (if in range) its capture→client latency.
|
/// Record one received access unit: its wire size and (if in range) its capture→received
|
||||||
|
/// `host+network` stage sample. Receipt is the fps/goodput counting point per the spec.
|
||||||
// Driven only by the android-only decode thread; unreferenced on the host build — expected.
|
// Driven only by the android-only decode thread; unreferenced on the host build — expected.
|
||||||
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
|
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
|
||||||
pub fn note(&self, bytes: usize, lat_us: Option<u64>, skew_corrected: bool) {
|
pub fn note_received(&self, bytes: usize, hostnet_us: Option<u64>, skew_corrected: bool) {
|
||||||
if !self.enabled.load(Ordering::Relaxed) {
|
if !self.enabled.load(Ordering::Relaxed) {
|
||||||
return; // HUD hidden — skip the lock (the caller already skipped the clock read)
|
return; // HUD hidden — skip the lock (the caller already skipped the clock read)
|
||||||
}
|
}
|
||||||
// Poison-proof: `note` runs per-frame on the decode thread, which has no catch_unwind —
|
// Poison-proof: this runs per-frame on the decode thread, which has no catch_unwind —
|
||||||
// a panic elsewhere must not turn every later lock into a second panic (the counters
|
// a panic elsewhere must not turn every later lock into a second panic (the counters
|
||||||
// stay consistent regardless).
|
// stay consistent regardless).
|
||||||
let mut g = self
|
let mut g = self
|
||||||
@@ -96,14 +123,37 @@ impl VideoStats {
|
|||||||
g.frames += 1;
|
g.frames += 1;
|
||||||
g.bytes += bytes as u64;
|
g.bytes += bytes as u64;
|
||||||
g.skew_corrected = skew_corrected;
|
g.skew_corrected = skew_corrected;
|
||||||
if let Some(l) = lat_us {
|
if let Some(l) = hostnet_us {
|
||||||
g.lat_us.push(l);
|
g.hostnet_us.push(l);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Record one decoded output frame: its capture→decoded `end-to-end` sample and its
|
||||||
|
/// received→decoded `decode` stage sample (either may be absent — e.g. the receipt stamp for
|
||||||
|
/// this pts predates the HUD being shown).
|
||||||
|
// Driven only by the android-only decode thread; unreferenced on the host build — expected.
|
||||||
|
#[cfg_attr(not(target_os = "android"), allow(dead_code))]
|
||||||
|
pub fn note_decoded(&self, e2e_us: Option<u64>, decode_us: Option<u64>) {
|
||||||
|
if !self.enabled.load(Ordering::Relaxed) {
|
||||||
|
return; // HUD hidden — skip the lock (the caller already skipped the clock read)
|
||||||
|
}
|
||||||
|
// Poison-proof for the same reason as `note_received`.
|
||||||
|
let mut g = self
|
||||||
|
.inner
|
||||||
|
.lock()
|
||||||
|
.unwrap_or_else(std::sync::PoisonError::into_inner);
|
||||||
|
if let Some(l) = e2e_us {
|
||||||
|
g.e2e_us.push(l);
|
||||||
|
}
|
||||||
|
if let Some(l) = decode_us {
|
||||||
|
g.decode_us.push(l);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Compute the window's rates + latency percentiles, then reset for the next window.
|
/// Compute the window's rates + latency percentiles, then reset for the next window.
|
||||||
pub fn drain(&self) -> Snapshot {
|
pub fn drain(&self) -> Snapshot {
|
||||||
// Poison-proof for the same reason as `note` — a poisoned window still drains fine.
|
// Poison-proof for the same reason as `note_received` — a poisoned window still drains
|
||||||
|
// fine.
|
||||||
let mut g = self
|
let mut g = self
|
||||||
.inner
|
.inner
|
||||||
.lock()
|
.lock()
|
||||||
@@ -111,26 +161,25 @@ impl VideoStats {
|
|||||||
let elapsed = g.window_start.elapsed().as_secs_f64().max(1e-3);
|
let elapsed = g.window_start.elapsed().as_secs_f64().max(1e-3);
|
||||||
let fps = g.frames as f64 / elapsed;
|
let fps = g.frames as f64 / elapsed;
|
||||||
let mbps = g.bytes as f64 * 8.0 / 1_000_000.0 / elapsed;
|
let mbps = g.bytes as f64 * 8.0 / 1_000_000.0 / elapsed;
|
||||||
let (p50, p95, valid) = if g.lat_us.is_empty() {
|
g.e2e_us.sort_unstable();
|
||||||
(0.0, 0.0, false)
|
g.hostnet_us.sort_unstable();
|
||||||
} else {
|
g.decode_us.sort_unstable();
|
||||||
g.lat_us.sort_unstable();
|
let snap = Snapshot {
|
||||||
let n = g.lat_us.len();
|
fps,
|
||||||
let at = |p: f64| g.lat_us[((n as f64 * p) as usize).min(n - 1)] as f64 / 1000.0;
|
mbps,
|
||||||
(at(0.50), at(0.95), true)
|
e2e_p50_ms: pctl_ms(&g.e2e_us, 0.50),
|
||||||
|
e2e_p95_ms: pctl_ms(&g.e2e_us, 0.95),
|
||||||
|
hostnet_p50_ms: pctl_ms(&g.hostnet_us, 0.50),
|
||||||
|
decode_p50_ms: pctl_ms(&g.decode_us, 0.50),
|
||||||
|
lat_valid: !g.e2e_us.is_empty(),
|
||||||
|
skew_corrected: g.skew_corrected,
|
||||||
};
|
};
|
||||||
let skew = g.skew_corrected;
|
|
||||||
g.window_start = Instant::now();
|
g.window_start = Instant::now();
|
||||||
g.frames = 0;
|
g.frames = 0;
|
||||||
g.bytes = 0;
|
g.bytes = 0;
|
||||||
g.lat_us.clear();
|
g.e2e_us.clear();
|
||||||
Snapshot {
|
g.hostnet_us.clear();
|
||||||
fps,
|
g.decode_us.clear();
|
||||||
mbps,
|
snap
|
||||||
lat_p50_ms: p50,
|
|
||||||
lat_p95_ms: p95,
|
|
||||||
lat_valid: valid,
|
|
||||||
skew_corrected: skew,
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -333,8 +333,9 @@ struct ContentView: View {
|
|||||||
onSessionEnd: { [weak model] in
|
onSessionEnd: { [weak model] in
|
||||||
Task { @MainActor in model?.sessionEnded() }
|
Task { @MainActor in model?.sessionEnded() }
|
||||||
},
|
},
|
||||||
presentMeter: model.presentLatency,
|
endToEndMeter: model.endToEnd,
|
||||||
presentTailMeter: model.presentTail
|
decodeMeter: model.decodeStage,
|
||||||
|
displayMeter: model.displayStage
|
||||||
)
|
)
|
||||||
.overlay(alignment: placement.alignment) {
|
.overlay(alignment: placement.alignment) {
|
||||||
if captureEnabled && hudEnabled {
|
if captureEnabled && hudEnabled {
|
||||||
|
|||||||
@@ -170,7 +170,10 @@ private struct ShotHUD: View {
|
|||||||
Text("5120×1440@240 240 fps 812.4 Mb/s")
|
Text("5120×1440@240 240 fps 812.4 Mb/s")
|
||||||
.font(.system(.caption, design: .monospaced))
|
.font(.system(.caption, design: .monospaced))
|
||||||
}
|
}
|
||||||
Text("capture→client 1.3/2.1 ms p50/p95")
|
Text("end-to-end 2.9 ms p50 · 3.8 p95 · capture→on-glass")
|
||||||
|
.font(.system(.caption2, design: .monospaced))
|
||||||
|
.foregroundStyle(.secondary)
|
||||||
|
Text("= host+network 1.3 + decode 0.7 + display 0.9")
|
||||||
.font(.system(.caption2, design: .monospaced))
|
.font(.system(.caption2, design: .monospaced))
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
#if os(macOS)
|
#if os(macOS)
|
||||||
|
|||||||
@@ -59,36 +59,50 @@ final class SessionModel: ObservableObject {
|
|||||||
@Published var fps = 0
|
@Published var fps = 0
|
||||||
@Published var mbps = 0.0
|
@Published var mbps = 0.0
|
||||||
@Published var totalFrames = 0
|
@Published var totalFrames = 0
|
||||||
/// Capture→client-receipt latency (ms), skew-corrected across machines via the connect-time
|
/// The unified latency stages (design/stats-unification.md), ms per 1 s window. `host+network`
|
||||||
/// clock offset — p50/p95 for the HUD. `latencyValid` is false until the first sample drains
|
/// = capture→received, skew-corrected across machines via the connect-time clock offset: the
|
||||||
/// (and whenever no host frames arrived in the last interval). `latencySkewCorrected` = the host
|
/// stage-2 HUD shows its p50 in the equation line; the stage-1 fallback shows p50/p95 as its
|
||||||
|
/// `capture→received` headline. `hostNetworkValid` is false until the first sample drains (and
|
||||||
|
/// whenever no host frames arrived in the last interval). `hostNetworkSkewCorrected` = the host
|
||||||
/// answered the skew handshake (the number is cross-machine valid, not just same-host).
|
/// answered the skew handshake (the number is cross-machine valid, not just same-host).
|
||||||
@Published var latencyP50Ms = 0.0
|
@Published var hostNetworkP50Ms = 0.0
|
||||||
@Published var latencyP95Ms = 0.0
|
@Published var hostNetworkP95Ms = 0.0
|
||||||
@Published var latencyValid = false
|
@Published var hostNetworkValid = false
|
||||||
@Published var latencySkewCorrected = false
|
@Published var hostNetworkSkewCorrected = false
|
||||||
/// Capture→present (glass-to-glass, modulo the host render→capture term) — only the stage-2
|
/// End-to-end = capture→on-glass, measured directly per frame (never summed from the stages) —
|
||||||
/// presenter can stamp this (it owns decode + a CAMetalLayer/display-link present). Stays
|
/// the HUD headline. Only the stage-2 presenter can stamp it (it owns decode + a
|
||||||
/// invalid under stage-1, where the layer presents internally with no per-frame callback.
|
/// CAMetalLayer/display-link present); stays invalid under stage-1, where the layer presents
|
||||||
@Published var presentLatencyP50Ms = 0.0
|
/// internally with no per-frame callback.
|
||||||
@Published var presentLatencyP95Ms = 0.0
|
@Published var endToEndP50Ms = 0.0
|
||||||
@Published var presentLatencyValid = false
|
@Published var endToEndP95Ms = 0.0
|
||||||
@Published var presentLatencySkewCorrected = false
|
@Published var endToEndValid = false
|
||||||
/// Decode-completion→present (the "present tail": ring wait + render + vsync) — the term the
|
@Published var endToEndSkewCorrected = false
|
||||||
/// stage-2 presenter exists to shorten. Both instants are client-side, so no skew applies.
|
/// The client-local stage terms of the HUD's equation line (single clock, no skew; p50 only):
|
||||||
@Published var presentTailP50Ms = 0.0
|
/// decode = received→decoded, display = decoded→on-glass (ring wait + render + vsync — the
|
||||||
@Published var presentTailP95Ms = 0.0
|
/// term the stage-2 presenter exists to shorten).
|
||||||
@Published var presentTailValid = false
|
@Published var decodeP50Ms = 0.0
|
||||||
|
@Published var decodeValid = false
|
||||||
|
@Published var displayP50Ms = 0.0
|
||||||
|
@Published var displayValid = false
|
||||||
|
/// Unrecoverable network frame drops in the last window (FEC couldn't rebuild them) and their
|
||||||
|
/// share of frames offered, `lost/(received+lost)`. The HUD hides the line while zero.
|
||||||
|
@Published var lostFrames = 0
|
||||||
|
@Published var lostPct = 0.0
|
||||||
/// Mirrors StreamView's capture state (it owns the input capture; this drives the
|
/// Mirrors StreamView's capture state (it owns the input capture; this drives the
|
||||||
/// HUD's "click to capture" / "⌘⎋ releases" hint).
|
/// HUD's "click to capture" / "⌘⎋ releases" hint).
|
||||||
@Published var mouseCaptured = false
|
@Published var mouseCaptured = false
|
||||||
|
|
||||||
let meter = FrameMeter()
|
let meter = FrameMeter()
|
||||||
|
/// Capture→received (the host+network stage), fed per AU at receipt by the stream view's
|
||||||
|
/// onFrame — under both presenters.
|
||||||
let latency = LatencyMeter()
|
let latency = LatencyMeter()
|
||||||
/// Fed by the stage-2 presenter's display link (capture→present). Passed to StreamView.
|
/// The stage-2 meters, passed to StreamView: end-to-end (capture→on-glass, stamped at
|
||||||
let presentLatency = LatencyMeter()
|
/// present), decode (received→decoded), display (decoded→on-glass).
|
||||||
/// Fed by the same present stamp (decode-completion→present). Passed to StreamView.
|
let endToEnd = LatencyMeter()
|
||||||
let presentTail = LatencyMeter()
|
let decodeStage = LatencyMeter()
|
||||||
|
let displayStage = LatencyMeter()
|
||||||
|
/// Cumulative reassembler-drop counter at the last stats drain (per-window `lost` delta).
|
||||||
|
private var lastFramesDropped: UInt64 = 0
|
||||||
private var statsTimer: Timer?
|
private var statsTimer: Timer?
|
||||||
private var audio: SessionAudio?
|
private var audio: SessionAudio?
|
||||||
private var gamepadCapture: GamepadCapture?
|
private var gamepadCapture: GamepadCapture?
|
||||||
@@ -281,7 +295,12 @@ final class SessionModel: ObservableObject {
|
|||||||
phase = .idle
|
phase = .idle
|
||||||
fps = 0
|
fps = 0
|
||||||
mbps = 0
|
mbps = 0
|
||||||
latencyValid = false
|
hostNetworkValid = false
|
||||||
|
endToEndValid = false
|
||||||
|
decodeValid = false
|
||||||
|
displayValid = false
|
||||||
|
lostFrames = 0
|
||||||
|
lostPct = 0
|
||||||
mouseCaptured = false
|
mouseCaptured = false
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -321,6 +340,7 @@ final class SessionModel: ObservableObject {
|
|||||||
}
|
}
|
||||||
|
|
||||||
private func startStatsTimer() {
|
private func startStatsTimer() {
|
||||||
|
lastFramesDropped = 0 // a fresh connection's cumulative drop counter starts at 0
|
||||||
let timer = Timer(timeInterval: 1.0, repeats: true) { [weak self] _ in
|
let timer = Timer(timeInterval: 1.0, repeats: true) { [weak self] _ in
|
||||||
guard let self else { return }
|
guard let self else { return }
|
||||||
Task { @MainActor in
|
Task { @MainActor in
|
||||||
@@ -328,28 +348,41 @@ final class SessionModel: ObservableObject {
|
|||||||
self.fps = frames
|
self.fps = frames
|
||||||
self.mbps = Double(bytes) * 8 / 1_000_000
|
self.mbps = Double(bytes) * 8 / 1_000_000
|
||||||
self.totalFrames = total
|
self.totalFrames = total
|
||||||
|
// Per-window `lost` = the delta of the connector's cumulative reassembler-drop
|
||||||
|
// counter (0 after close — treat a rewind as no loss rather than underflowing).
|
||||||
|
let dropped = self.connection?.framesDropped() ?? 0
|
||||||
|
let lost = dropped >= self.lastFramesDropped
|
||||||
|
? Int(dropped - self.lastFramesDropped) : 0
|
||||||
|
self.lastFramesDropped = dropped
|
||||||
|
self.lostFrames = lost
|
||||||
|
self.lostPct = lost > 0 ? Double(lost) / Double(frames + lost) * 100 : 0
|
||||||
if let lat = self.latency.drain() {
|
if let lat = self.latency.drain() {
|
||||||
self.latencyP50Ms = lat.p50Ms
|
self.hostNetworkP50Ms = lat.p50Ms
|
||||||
self.latencyP95Ms = lat.p95Ms
|
self.hostNetworkP95Ms = lat.p95Ms
|
||||||
self.latencySkewCorrected = lat.skewCorrected
|
self.hostNetworkSkewCorrected = lat.skewCorrected
|
||||||
self.latencyValid = true
|
self.hostNetworkValid = true
|
||||||
} else {
|
} else {
|
||||||
self.latencyValid = false
|
self.hostNetworkValid = false
|
||||||
}
|
}
|
||||||
if let p = self.presentLatency.drain() {
|
if let e = self.endToEnd.drain() {
|
||||||
self.presentLatencyP50Ms = p.p50Ms
|
self.endToEndP50Ms = e.p50Ms
|
||||||
self.presentLatencyP95Ms = p.p95Ms
|
self.endToEndP95Ms = e.p95Ms
|
||||||
self.presentLatencySkewCorrected = p.skewCorrected
|
self.endToEndSkewCorrected = e.skewCorrected
|
||||||
self.presentLatencyValid = true
|
self.endToEndValid = true
|
||||||
} else {
|
} else {
|
||||||
self.presentLatencyValid = false
|
self.endToEndValid = false
|
||||||
}
|
}
|
||||||
if let t = self.presentTail.drain() {
|
if let d = self.decodeStage.drain() {
|
||||||
self.presentTailP50Ms = t.p50Ms
|
self.decodeP50Ms = d.p50Ms
|
||||||
self.presentTailP95Ms = t.p95Ms
|
self.decodeValid = true
|
||||||
self.presentTailValid = true
|
|
||||||
} else {
|
} else {
|
||||||
self.presentTailValid = false
|
self.decodeValid = false
|
||||||
|
}
|
||||||
|
if let d = self.displayStage.drain() {
|
||||||
|
self.displayP50Ms = d.p50Ms
|
||||||
|
self.displayValid = true
|
||||||
|
} else {
|
||||||
|
self.displayValid = false
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,5 +1,7 @@
|
|||||||
// The streaming overlay HUD: mode + fps/throughput, the capture→client (and, under the stage-2
|
// The streaming overlay HUD: mode + fps/throughput, the unified latency lines
|
||||||
// presenter, capture→present) latency lines, the platform input hint, and disconnect.
|
// (design/stats-unification.md — end-to-end headline + the stage equation under stage-2, the
|
||||||
|
// capture→received headline under the stage-1 fallback), the loss counter, the platform input
|
||||||
|
// hint, and disconnect.
|
||||||
|
|
||||||
import PunktfunkKit
|
import PunktfunkKit
|
||||||
import SwiftUI
|
import SwiftUI
|
||||||
@@ -18,24 +20,32 @@ struct StreamHUDView: View {
|
|||||||
Text("\(connection.width)×\(connection.height)@\(connection.refreshHz) \(model.fps) fps \(model.mbps, specifier: "%.1f") Mb/s")
|
Text("\(connection.width)×\(connection.height)@\(connection.refreshHz) \(model.fps) fps \(model.mbps, specifier: "%.1f") Mb/s")
|
||||||
.font(.system(.caption, design: .monospaced))
|
.font(.system(.caption, design: .monospaced))
|
||||||
}
|
}
|
||||||
if model.latencyValid {
|
if model.endToEndValid {
|
||||||
// Capture→client-receipt (skew-corrected); excludes the layer's decode+present —
|
// Stage-2: the end-to-end headline (capture→on-glass, measured directly, skew-
|
||||||
// see LatencyMeter. "(same-host)" when the host didn't answer the skew handshake.
|
// corrected) — "(same-host clock)" when the host didn't answer the skew handshake.
|
||||||
Text("capture→client \(model.latencyP50Ms, specifier: "%.1f")/\(model.latencyP95Ms, specifier: "%.1f") ms p50/p95\(model.latencySkewCorrected ? "" : " (same-host)")")
|
Text("end-to-end \(model.endToEndP50Ms, specifier: "%.1f") ms p50 · \(model.endToEndP95Ms, specifier: "%.1f") p95 · capture→on-glass\(model.endToEndSkewCorrected ? "" : " (same-host clock)")")
|
||||||
|
.font(.system(.caption2, design: .monospaced))
|
||||||
|
.foregroundStyle(.secondary)
|
||||||
|
// The equation: the three stages tiling the headline interval (per-window p50s —
|
||||||
|
// they only approximately sum to the directly-measured total).
|
||||||
|
if model.hostNetworkValid && model.decodeValid && model.displayValid {
|
||||||
|
Text("= host+network \(model.hostNetworkP50Ms, specifier: "%.1f") + decode \(model.decodeP50Ms, specifier: "%.1f") + display \(model.displayP50Ms, specifier: "%.1f")")
|
||||||
.font(.system(.caption2, design: .monospaced))
|
.font(.system(.caption2, design: .monospaced))
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
}
|
}
|
||||||
if model.presentLatencyValid {
|
} else if model.hostNetworkValid {
|
||||||
// Capture→present (glass-to-glass, modulo host render→capture) — stage-2 presenter
|
// Stage-1 fallback presenter: the layer decodes + presents internally with no
|
||||||
// only; stage-1's layer presents internally with no per-frame stamp.
|
// per-frame stamp, so the honest headline ends at receipt — and there is no
|
||||||
Text("capture→present \(model.presentLatencyP50Ms, specifier: "%.1f")/\(model.presentLatencyP95Ms, specifier: "%.1f") ms p50/p95\(model.presentLatencySkewCorrected ? "" : " (same-host)")")
|
// equation line (host+network is the whole measured interval).
|
||||||
|
Text("capture→received \(model.hostNetworkP50Ms, specifier: "%.1f") ms p50 · \(model.hostNetworkP95Ms, specifier: "%.1f") p95\(model.hostNetworkSkewCorrected ? "" : " (same-host clock)")")
|
||||||
.font(.system(.caption2, design: .monospaced))
|
.font(.system(.caption2, design: .monospaced))
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
}
|
}
|
||||||
if model.presentTailValid {
|
if model.lostFrames > 0 {
|
||||||
// Decode→present (the client-local "present tail": ring wait + render + vsync) —
|
// Unrecoverable network drops this window; hidden while the link is clean.
|
||||||
// the term the stage-2 presenter shortens; no skew applies (one clock).
|
// String(format:) rather than specifier interpolation: the literal % would
|
||||||
Text("decode→present \(model.presentTailP50Ms, specifier: "%.1f")/\(model.presentTailP95Ms, specifier: "%.1f") ms p50/p95")
|
// otherwise land in the LocalizedStringKey's format string as a bogus conversion.
|
||||||
|
Text(String(format: "lost %d (%.1f%%)", model.lostFrames, model.lostPct))
|
||||||
.font(.system(.caption2, design: .monospaced))
|
.font(.system(.caption2, design: .monospaced))
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -310,10 +310,11 @@ extension SettingsView {
|
|||||||
Text("Video presenter · debug")
|
Text("Video presenter · debug")
|
||||||
} footer: {
|
} footer: {
|
||||||
Text("Stage 2 (default) decodes explicitly and presents through Metal with a display "
|
Text("Stage 2 (default) decodes explicitly and presents through Metal with a display "
|
||||||
+ "link — it adds a capture→present (glass-to-glass) latency line in the HUD and "
|
+ "link — it gives the HUD the end-to-end (capture→on-glass) headline with the "
|
||||||
+ "self-recovers from decode stalls. Stage 1 feeds compressed video straight to the "
|
+ "host+network/decode/display stage equation and self-recovers from decode "
|
||||||
+ "system display layer; it freezes on a lost HEVC reference frame, so it's a debug "
|
+ "stalls. Stage 1 feeds compressed video straight to the system display layer; "
|
||||||
+ "fallback only. Applies from the next session.")
|
+ "it freezes on a lost HEVC reference frame, so it's a debug fallback only. "
|
||||||
|
+ "Applies from the next session.")
|
||||||
.font(.geist(12, relativeTo: .caption))
|
.font(.geist(12, relativeTo: .caption))
|
||||||
.foregroundStyle(.secondary)
|
.foregroundStyle(.secondary)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -35,6 +35,10 @@ public struct AccessUnit: Sendable {
|
|||||||
public let ptsNs: UInt64
|
public let ptsNs: UInt64
|
||||||
public let frameIndex: UInt32
|
public let frameIndex: UInt32
|
||||||
public let flags: UInt32
|
public let flags: UInt32
|
||||||
|
/// Client `CLOCK_REALTIME` instant the AU was handed over by the core (post-FEC, decrypted)
|
||||||
|
/// — the **received** measurement point of design/stats-unification.md. The decode stage is
|
||||||
|
/// `decodedNs - receivedNs`, both client-local (no skew offset applies).
|
||||||
|
public let receivedNs: Int64
|
||||||
}
|
}
|
||||||
|
|
||||||
/// One Opus audio packet (48 kHz stereo, 5 ms frames) — decode with AVAudioConverter
|
/// One Opus audio packet (48 kHz stereo, 5 ms frames) — decode with AVAudioConverter
|
||||||
@@ -419,9 +423,13 @@ public final class PunktfunkConnection {
|
|||||||
case statusOK:
|
case statusOK:
|
||||||
guard let base = frame.data, frame.len > 0 else { return nil }
|
guard let base = frame.data, frame.len > 0 else { return nil }
|
||||||
let data = Data(bytes: base, count: Int(frame.len)) // copy: ptr valid only until next call
|
let data = Data(bytes: base, count: Int(frame.len)) // copy: ptr valid only until next call
|
||||||
|
var ts = timespec()
|
||||||
|
clock_gettime(CLOCK_REALTIME, &ts)
|
||||||
|
let receivedNs = Int64(ts.tv_sec) * 1_000_000_000 + Int64(ts.tv_nsec)
|
||||||
return AccessUnit(
|
return AccessUnit(
|
||||||
data: data, ptsNs: frame.pts_ns,
|
data: data, ptsNs: frame.pts_ns,
|
||||||
frameIndex: frame.frame_index, flags: frame.flags)
|
frameIndex: frame.frame_index, flags: frame.flags,
|
||||||
|
receivedNs: receivedNs)
|
||||||
case statusNoFrame:
|
case statusNoFrame:
|
||||||
return nil
|
return nil
|
||||||
case statusClosed:
|
case statusClosed:
|
||||||
|
|||||||
@@ -1,23 +1,25 @@
|
|||||||
// Per-frame latency sampler for the live HUD: records capture->client-receipt latency and drains
|
// Per-frame latency-stage sampler for the live HUD: records one interval per frame (an end
|
||||||
// percentiles on demand. NSLock rather than an actor — the writer is the non-async pump/arrival
|
// instant minus a start instant, both CLOCK_REALTIME ns) and drains percentiles on demand.
|
||||||
// path (same pattern as the app's FrameMeter).
|
// NSLock rather than an actor — the writers are the non-async pump/decode/present paths (same
|
||||||
|
// pattern as the app's FrameMeter).
|
||||||
|
|
||||||
import Foundation
|
import Foundation
|
||||||
|
|
||||||
/// Samples the **capture->client-receipt** latency of each access unit and reports percentiles.
|
/// Samples one **latency stage** per frame and reports percentiles. One instance per stage of the
|
||||||
|
/// unified stats model (design/stats-unification.md):
|
||||||
///
|
///
|
||||||
/// The latency is `now - pts_ns`, where `pts_ns` is the host's capture wall clock (the AU's pts) and
|
/// - `host+network` = capture→received: `record(ptsNs:offsetNs:)` at AU receipt.
|
||||||
/// `now` is the client's `CLOCK_REALTIME` instant the AU was received, shifted by the connect-time
|
/// - `decode` = received→decoded and `display` = decoded→displayed: client-local single-clock
|
||||||
/// **clock-skew offset** (`PunktfunkConnection.clockOffsetNs`, host minus client) so the difference
|
/// stages — `record(ptsNs:atNs:offsetNs:)` with the start instant as `ptsNs` and `offsetNs: 0`.
|
||||||
/// is valid across machines. `offsetNs == 0` means an old host that didn't answer the skew handshake
|
/// - `end-to-end` = capture→displayed, measured directly (never summed from the stages):
|
||||||
/// (or genuinely synced clocks) — the number is then only meaningful same-host.
|
/// `record(ptsNs:atNs:offsetNs:)` at present.
|
||||||
///
|
///
|
||||||
/// SCOPE (stage-1 presenter): this covers host capture -> encode -> FEC -> network -> reassembly ->
|
/// For the host-anchored intervals (capture→…) the sample is `end + offset - pts_ns`, where
|
||||||
/// decrypt -> handed to the presenter. It does **not** include the on-device VideoToolbox decode or
|
/// `pts_ns` is the host's capture wall clock (the AU's pts) and the connect-time **clock-skew
|
||||||
/// the `AVSampleBufferDisplayLayer` present — that layer decodes and presents compressed samples
|
/// offset** (`PunktfunkConnection.clockOffsetNs`, host minus client) makes the difference valid
|
||||||
/// internally with no per-frame callback. True decode->present (the full glass-to-glass) needs the
|
/// across machines. `offsetNs == 0` means an old host that didn't answer the skew handshake (or
|
||||||
/// stage-2 presenter (`VTDecompressionSession` decode-completion + `CAMetalLayer`/display-link
|
/// genuinely synced clocks) — the number is then only meaningful same-host, and the HUD tags the
|
||||||
/// present); this meter is the substrate it will extend.
|
/// end-to-end line `(same-host clock)`.
|
||||||
public final class LatencyMeter: @unchecked Sendable {
|
public final class LatencyMeter: @unchecked Sendable {
|
||||||
private let lock = NSLock()
|
private let lock = NSLock()
|
||||||
private var samplesUs: [Int64] = []
|
private var samplesUs: [Int64] = []
|
||||||
@@ -34,12 +36,16 @@ public final class LatencyMeter: @unchecked Sendable {
|
|||||||
record(ptsNs: ptsNs, atNs: nowNs, offsetNs: offsetNs)
|
record(ptsNs: ptsNs, atNs: nowNs, offsetNs: offsetNs)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Record one frame whose latency is `atNs + offsetNs - ptsNs` — an EXPLICIT client instant
|
/// Record one frame whose sample is `atNs + offsetNs - ptsNs` — an EXPLICIT end instant
|
||||||
/// rather than now. The stage-2 presenter uses this to stamp capture→present at the display
|
/// rather than now. `ptsNs` is the stage's start point: the AU pts for the host-anchored
|
||||||
/// link's target present time (not the moment the present call ran). All in `CLOCK_REALTIME`.
|
/// intervals, or a client stamp (receivedNs / decodedNs, with `offsetNs: 0`) for the local
|
||||||
|
/// decode/display stages. The stage-2 presenter stamps its present-side samples at the
|
||||||
|
/// display link's target present time (not the moment the present call ran). All in
|
||||||
|
/// `CLOCK_REALTIME`.
|
||||||
public func record(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) {
|
public func record(ptsNs: UInt64, atNs: Int64, offsetNs: Int64) {
|
||||||
let latNs = atNs &+ offsetNs &- Int64(bitPattern: ptsNs)
|
let latNs = atNs &+ offsetNs &- Int64(bitPattern: ptsNs)
|
||||||
// Drop absurd values (a clock step, a wildly wrong offset, or garbage pts).
|
// Drop absurd values (a clock step, a wildly wrong offset, garbage pts, or a stage whose
|
||||||
|
// start stamp is missing/after its end) — samples are clamped to (0, 10 s).
|
||||||
guard latNs > 0, latNs < 10_000_000_000 else { return }
|
guard latNs > 0, latNs < 10_000_000_000 else { return }
|
||||||
lock.lock()
|
lock.lock()
|
||||||
samplesUs.append(latNs / 1000)
|
samplesUs.append(latNs / 1000)
|
||||||
|
|||||||
@@ -38,8 +38,9 @@ final class SessionPresenter {
|
|||||||
func start(
|
func start(
|
||||||
connection: PunktfunkConnection,
|
connection: PunktfunkConnection,
|
||||||
baseLayer: AVSampleBufferDisplayLayer,
|
baseLayer: AVSampleBufferDisplayLayer,
|
||||||
presentMeter: LatencyMeter?,
|
endToEndMeter: LatencyMeter?,
|
||||||
presentTailMeter: LatencyMeter? = nil,
|
decodeMeter: LatencyMeter? = nil,
|
||||||
|
displayMeter: LatencyMeter? = nil,
|
||||||
makeDisplayLink: (AnyObject, Selector) -> CADisplayLink,
|
makeDisplayLink: (AnyObject, Selector) -> CADisplayLink,
|
||||||
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
||||||
onSessionEnd: (@Sendable () -> Void)?
|
onSessionEnd: (@Sendable () -> Void)?
|
||||||
@@ -59,7 +60,8 @@ final class SessionPresenter {
|
|||||||
#endif
|
#endif
|
||||||
if !forceStage1,
|
if !forceStage1,
|
||||||
let pipeline = Stage2Pipeline(
|
let pipeline = Stage2Pipeline(
|
||||||
presentMeter: presentMeter, presentTailMeter: presentTailMeter) {
|
endToEndMeter: endToEndMeter, decodeMeter: decodeMeter,
|
||||||
|
displayMeter: displayMeter) {
|
||||||
let metal = pipeline.layer
|
let metal = pipeline.layer
|
||||||
// The opaque metal layer composites OVER the AVSampleBufferDisplayLayer base, which
|
// The opaque metal layer composites OVER the AVSampleBufferDisplayLayer base, which
|
||||||
// sits idle (un-enqueued) in stage-2. contentsScale + frame are set in layout().
|
// sits idle (un-enqueued) in stage-2. contentsScale + frame are set in layout().
|
||||||
|
|||||||
@@ -1,7 +1,8 @@
|
|||||||
// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async output
|
// Stage-2 presenter orchestrator: a pump thread pulls AUs → VideoDecoder; the decoder's async output
|
||||||
// drops the newest decoded frame into a 1-slot ring; the hosting view's display link calls `renderTick`
|
// drops the newest decoded frame into a 1-slot ring; the hosting view's display link calls `renderTick`
|
||||||
// once per vsync to draw + present the newest ready frame and stamp capture→present. Mirrors
|
// once per vsync to draw + present the newest ready frame and stamp the unified latency stages
|
||||||
// StreamPump's lifecycle (one per start; cancel is permanent).
|
// (end-to-end capture→on-glass, plus the decode and display stage terms —
|
||||||
|
// design/stats-unification.md). Mirrors StreamPump's lifecycle (one per start; cancel is permanent).
|
||||||
//
|
//
|
||||||
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` +
|
// Threading: the pump runs on its own thread; the decoder callback on a VT thread; `renderTick` +
|
||||||
// `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). Only the ring (lock-guarded)
|
// `start`/`stop` on the MAIN thread (the view's CADisplayLink fires there). Only the ring (lock-guarded)
|
||||||
@@ -40,8 +41,8 @@ public final class Stage2Pipeline {
|
|||||||
private let ring = ReadyRing()
|
private let ring = ReadyRing()
|
||||||
private let presenter: MetalVideoPresenter
|
private let presenter: MetalVideoPresenter
|
||||||
private let decoder: VideoDecoder
|
private let decoder: VideoDecoder
|
||||||
private let presentMeter: LatencyMeter?
|
private let endToEndMeter: LatencyMeter?
|
||||||
private let presentTailMeter: LatencyMeter?
|
private let displayMeter: LatencyMeter?
|
||||||
private let recovery = KeyframeRecovery()
|
private let recovery = KeyframeRecovery()
|
||||||
private var token = StopFlag()
|
private var token = StopFlag()
|
||||||
private var offsetNs: Int64 = 0
|
private var offsetNs: Int64 = 0
|
||||||
@@ -56,28 +57,41 @@ public final class Stage2Pipeline {
|
|||||||
/// The Metal layer the hosting view installs + sizes.
|
/// The Metal layer the hosting view installs + sizes.
|
||||||
public var layer: CAMetalLayer { presenter.layer }
|
public var layer: CAMetalLayer { presenter.layer }
|
||||||
|
|
||||||
/// `presentMeter` records capture→present (the glass-to-glass term); `presentTailMeter`
|
/// Unified-stats meters (design/stats-unification.md): `endToEndMeter` records the headline
|
||||||
/// records decode-completion→present (the ring wait + render — the tail stage-2 exists to
|
/// end-to-end (capture→on-glass, skew-corrected); `decodeMeter` the decode stage
|
||||||
/// shorten). Both optional: metering never gates the presenter choice. Returns nil if Metal
|
/// (received→decoded); `displayMeter` the display stage (decoded→on-glass, the ring wait +
|
||||||
/// can't be set up (headless / no GPU) — caller falls back to the stage-1 presenter.
|
/// render + vsync — the tail stage-2 exists to shorten). All optional: metering never gates
|
||||||
public init?(presentMeter: LatencyMeter?, presentTailMeter: LatencyMeter? = nil) {
|
/// the presenter choice. Returns nil if Metal can't be set up (headless / no GPU) — caller
|
||||||
|
/// falls back to the stage-1 presenter.
|
||||||
|
public init?(
|
||||||
|
endToEndMeter: LatencyMeter?,
|
||||||
|
decodeMeter: LatencyMeter? = nil,
|
||||||
|
displayMeter: LatencyMeter? = nil
|
||||||
|
) {
|
||||||
guard let presenter = MetalVideoPresenter.make() else { return nil }
|
guard let presenter = MetalVideoPresenter.make() else { return nil }
|
||||||
self.presenter = presenter
|
self.presenter = presenter
|
||||||
self.presentMeter = presentMeter
|
self.endToEndMeter = endToEndMeter
|
||||||
self.presentTailMeter = presentTailMeter
|
self.displayMeter = displayMeter
|
||||||
let ring = ring
|
let ring = ring
|
||||||
let recovery = recovery
|
let recovery = recovery
|
||||||
self.decoder = VideoDecoder(
|
self.decoder = VideoDecoder(
|
||||||
onDecoded: { ring.submit($0) },
|
onDecoded: { frame in
|
||||||
|
// Decode stage = received→decoded, both client CLOCK_REALTIME (offset 0 — no
|
||||||
|
// skew applies). Stamped at decode completion, so it covers every decoded frame,
|
||||||
|
// including ones the newest-wins ring drops before present.
|
||||||
|
decodeMeter?.record(
|
||||||
|
ptsNs: UInt64(frame.receivedNs), atNs: frame.decodedNs, offsetNs: 0)
|
||||||
|
ring.submit(frame)
|
||||||
|
},
|
||||||
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump resets to
|
// Async decode failure (a bad P-frame referencing a lost/corrupt IDR): the pump resets to
|
||||||
// re-gate on the next IDR, and we ask the host to send one now (infinite GOP — it wouldn't
|
// re-gate on the next IDR, and we ask the host to send one now (infinite GOP — it wouldn't
|
||||||
// otherwise come soon). Throttled in KeyframeRecovery.
|
// otherwise come soon). Throttled in KeyframeRecovery.
|
||||||
onDecodeError: { _ in recovery.request() })
|
onDecodeError: { _ in recovery.request() })
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Start pulling AUs into the decoder. MAIN THREAD. `onFrame` fires per AU at receipt (capture→client
|
/// Start pulling AUs into the decoder. MAIN THREAD. `onFrame` fires per AU at receipt (the
|
||||||
/// meter, exactly as stage-1); `onSessionEnd` on close. `clockOffsetNs` (host minus client) makes the
|
/// host+network / capture→received meter, exactly as stage-1); `onSessionEnd` on close.
|
||||||
/// present stamp cross-machine valid.
|
/// `clockOffsetNs` (host minus client) makes the end-to-end stamp cross-machine valid.
|
||||||
public func start(
|
public func start(
|
||||||
connection: PunktfunkConnection,
|
connection: PunktfunkConnection,
|
||||||
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
onFrame: (@Sendable (AccessUnit) -> Void)?,
|
||||||
@@ -174,14 +188,16 @@ public final class Stage2Pipeline {
|
|||||||
public func renderTick(targetPresentNs: Int64) {
|
public func renderTick(targetPresentNs: Int64) {
|
||||||
guard let frame = ring.take() else { return }
|
guard let frame = ring.take() else { return }
|
||||||
let offsetNs = offsetNs
|
let offsetNs = offsetNs
|
||||||
let presentMeter = presentMeter
|
let endToEndMeter = endToEndMeter
|
||||||
let presentTailMeter = presentTailMeter
|
let displayMeter = displayMeter
|
||||||
let rendered = presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) { presentedNs in
|
let rendered = presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) { presentedNs in
|
||||||
let atNs = presentedNs ?? targetPresentNs
|
let atNs = presentedNs ?? targetPresentNs
|
||||||
presentMeter?.record(ptsNs: frame.ptsNs, atNs: atNs, offsetNs: offsetNs)
|
// End-to-end = capture→on-glass, measured directly (skew-corrected via the
|
||||||
// Present tail = decode-completion → on-glass. Both instants are client
|
// connect-time clock offset) — the HUD headline.
|
||||||
// CLOCK_REALTIME, so no skew offset applies.
|
endToEndMeter?.record(ptsNs: frame.ptsNs, atNs: atNs, offsetNs: offsetNs)
|
||||||
presentTailMeter?.record(ptsNs: UInt64(frame.decodedNs), atNs: atNs, offsetNs: 0)
|
// Display stage = decoded → on-glass. Both instants are client CLOCK_REALTIME,
|
||||||
|
// so no skew offset applies.
|
||||||
|
displayMeter?.record(ptsNs: UInt64(frame.decodedNs), atNs: atNs, offsetNs: 0)
|
||||||
}
|
}
|
||||||
if !rendered { ring.putBack(frame) }
|
if !rendered { ring.putBack(frame) }
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -61,7 +61,7 @@ public enum Stage444Probe {
|
|||||||
guard created == noErr, let session else { return false }
|
guard created == noErr, let session else { return false }
|
||||||
defer { VTDecompressionSessionInvalidate(session) }
|
defer { VTDecompressionSessionInvalidate(session) }
|
||||||
|
|
||||||
let au = AccessUnit(data: data, ptsNs: 0, frameIndex: 0, flags: 0)
|
let au = AccessUnit(data: data, ptsNs: 0, frameIndex: 0, flags: 0, receivedNs: 0)
|
||||||
guard let sample = AnnexB.sampleBuffer(au: au, format: format, codec: .hevc) else { return false }
|
guard let sample = AnnexB.sampleBuffer(au: au, format: format, codec: .hevc) else { return false }
|
||||||
|
|
||||||
var produced: OSType = 0
|
var produced: OSType = 0
|
||||||
|
|||||||
@@ -15,6 +15,10 @@ import VideoToolbox
|
|||||||
public struct ReadyFrame: @unchecked Sendable {
|
public struct ReadyFrame: @unchecked Sendable {
|
||||||
/// Host capture clock (the AU's pts), in nanoseconds.
|
/// Host capture clock (the AU's pts), in nanoseconds.
|
||||||
public let ptsNs: UInt64
|
public let ptsNs: UInt64
|
||||||
|
/// Client `CLOCK_REALTIME` instant the AU was received (`AccessUnit.receivedNs`, threaded
|
||||||
|
/// through the decode via the frame refcon), in nanoseconds. 0 when unknown (a caller that
|
||||||
|
/// didn't stamp receipt) — the decode-stage meter then drops the sample via its sanity guard.
|
||||||
|
public let receivedNs: Int64
|
||||||
/// Client `CLOCK_REALTIME` instant decode completed, in nanoseconds.
|
/// Client `CLOCK_REALTIME` instant decode completed, in nanoseconds.
|
||||||
public let decodedNs: Int64
|
public let decodedNs: Int64
|
||||||
/// The decoded image — 8-bit NV12 biplanar (SDR) or 10-bit P010 biplanar (HDR), Metal-compatible.
|
/// The decoded image — 8-bit NV12 biplanar (SDR) or 10-bit P010 biplanar (HDR), Metal-compatible.
|
||||||
@@ -25,13 +29,16 @@ public struct ReadyFrame: @unchecked Sendable {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// The C output callback can't capture context, so VideoToolbox hands it the refcon we set at
|
/// The C output callback can't capture context, so VideoToolbox hands it the refcon we set at
|
||||||
/// session creation — a pointer back to the owning `VideoDecoder`.
|
/// session creation — a pointer back to the owning `VideoDecoder`. The per-frame refcon carries
|
||||||
|
/// the AU's `receivedNs` as a pointer bit pattern (a scalar smuggled through the C void*, never
|
||||||
|
/// dereferenced) so the decode stage can be computed against decode-completion.
|
||||||
private let decoderOutputCallback: VTDecompressionOutputCallback = {
|
private let decoderOutputCallback: VTDecompressionOutputCallback = {
|
||||||
refcon, _, status, _, imageBuffer, pts, _ in
|
refcon, frameRefcon, status, _, imageBuffer, pts, _ in
|
||||||
guard let refcon else { return }
|
guard let refcon else { return }
|
||||||
|
let receivedNs = frameRefcon.map { Int64(Int(bitPattern: $0)) } ?? 0
|
||||||
Unmanaged<VideoDecoder>.fromOpaque(refcon)
|
Unmanaged<VideoDecoder>.fromOpaque(refcon)
|
||||||
.takeUnretainedValue()
|
.takeUnretainedValue()
|
||||||
.handleDecoded(status: status, imageBuffer: imageBuffer, pts: pts)
|
.handleDecoded(status: status, imageBuffer: imageBuffer, pts: pts, receivedNs: receivedNs)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Owns a `VTDecompressionSession` rebuilt whenever the format description changes (every IDR /
|
/// Owns a `VTDecompressionSession` rebuilt whenever the format description changes (every IDR /
|
||||||
@@ -112,7 +119,9 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
session,
|
session,
|
||||||
sampleBuffer: sample,
|
sampleBuffer: sample,
|
||||||
flags: [._EnableAsynchronousDecompression],
|
flags: [._EnableAsynchronousDecompression],
|
||||||
frameRefcon: nil,
|
// The AU's receipt instant rides through as a bit pattern (nil for 0 — the output
|
||||||
|
// callback maps that back to 0); the callback needs it to stamp the decode stage.
|
||||||
|
frameRefcon: UnsafeMutableRawPointer(bitPattern: Int(au.receivedNs)),
|
||||||
infoFlagsOut: &infoOut)
|
infoFlagsOut: &infoOut)
|
||||||
lock.unlock()
|
lock.unlock()
|
||||||
if status != noErr {
|
if status != noErr {
|
||||||
@@ -218,8 +227,11 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
return true
|
return true
|
||||||
}
|
}
|
||||||
|
|
||||||
/// VT thread. Stamp decode-completion and enqueue, or report the error.
|
/// VT thread. Stamp decode-completion and enqueue, or report the error. `receivedNs` is the
|
||||||
fileprivate func handleDecoded(status: OSStatus, imageBuffer: CVImageBuffer?, pts: CMTime) {
|
/// AU's receipt instant threaded through the frame refcon (0 = unknown).
|
||||||
|
fileprivate func handleDecoded(
|
||||||
|
status: OSStatus, imageBuffer: CVImageBuffer?, pts: CMTime, receivedNs: Int64
|
||||||
|
) {
|
||||||
guard status == noErr, let imageBuffer else {
|
guard status == noErr, let imageBuffer else {
|
||||||
onDecodeError(status)
|
onDecodeError(status)
|
||||||
return
|
return
|
||||||
@@ -242,6 +254,8 @@ public final class VideoDecoder: @unchecked Sendable {
|
|||||||
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarVideoRange
|
||||||
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
|
|| fmt == kCVPixelFormatType_444YpCbCr10BiPlanarFullRange
|
||||||
onDecoded(
|
onDecoded(
|
||||||
ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
|
ReadyFrame(
|
||||||
|
ptsNs: ptsNs, receivedNs: receivedNs, decodedNs: decodedNs,
|
||||||
|
pixelBuffer: imageBuffer, isHDR: isHDR))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -85,39 +85,45 @@ public struct StreamView: NSViewRepresentable {
|
|||||||
private let onCaptureChange: ((Bool) -> Void)?
|
private let onCaptureChange: ((Bool) -> Void)?
|
||||||
private let onFrame: (@Sendable (AccessUnit) -> Void)?
|
private let onFrame: (@Sendable (AccessUnit) -> Void)?
|
||||||
private let onSessionEnd: (@Sendable () -> Void)?
|
private let onSessionEnd: (@Sendable () -> Void)?
|
||||||
private let presentMeter: LatencyMeter?
|
private let endToEndMeter: LatencyMeter?
|
||||||
private let presentTailMeter: LatencyMeter?
|
private let decodeMeter: LatencyMeter?
|
||||||
|
private let displayMeter: LatencyMeter?
|
||||||
|
|
||||||
/// `onFrame`/`onSessionEnd` fire on the pump thread — hop to the main actor for UI.
|
/// `onFrame`/`onSessionEnd` fire on the pump thread — hop to the main actor for UI.
|
||||||
/// `captureEnabled: false` disables input capture entirely while UI (e.g. a trust
|
/// `captureEnabled: false` disables input capture entirely while UI (e.g. a trust
|
||||||
/// prompt) is layered over the stream; flipping it to true auto-engages capture
|
/// prompt) is layered over the stream; flipping it to true auto-engages capture
|
||||||
/// once. `onCaptureChange` (main thread) reports engage/release — drive the HUD's
|
/// once. `onCaptureChange` (main thread) reports engage/release — drive the HUD's
|
||||||
/// "click to capture" / "⌘⎋ releases" hint with it. `presentMeter` records capture→present
|
/// "click to capture" / "⌘⎋ releases" hint with it. The meters record the unified latency
|
||||||
/// and `presentTailMeter` decode→present when the stage-2 presenter is active.
|
/// stages when the stage-2 presenter is active (design/stats-unification.md):
|
||||||
|
/// `endToEndMeter` capture→on-glass, `decodeMeter` received→decoded, `displayMeter`
|
||||||
|
/// decoded→on-glass.
|
||||||
public init(
|
public init(
|
||||||
connection: PunktfunkConnection,
|
connection: PunktfunkConnection,
|
||||||
captureEnabled: Bool = true,
|
captureEnabled: Bool = true,
|
||||||
onCaptureChange: ((Bool) -> Void)? = nil,
|
onCaptureChange: ((Bool) -> Void)? = nil,
|
||||||
onFrame: (@Sendable (AccessUnit) -> Void)? = nil,
|
onFrame: (@Sendable (AccessUnit) -> Void)? = nil,
|
||||||
onSessionEnd: (@Sendable () -> Void)? = nil,
|
onSessionEnd: (@Sendable () -> Void)? = nil,
|
||||||
presentMeter: LatencyMeter? = nil,
|
endToEndMeter: LatencyMeter? = nil,
|
||||||
presentTailMeter: LatencyMeter? = nil
|
decodeMeter: LatencyMeter? = nil,
|
||||||
|
displayMeter: LatencyMeter? = nil
|
||||||
) {
|
) {
|
||||||
self.connection = connection
|
self.connection = connection
|
||||||
self.captureEnabled = captureEnabled
|
self.captureEnabled = captureEnabled
|
||||||
self.onCaptureChange = onCaptureChange
|
self.onCaptureChange = onCaptureChange
|
||||||
self.onFrame = onFrame
|
self.onFrame = onFrame
|
||||||
self.onSessionEnd = onSessionEnd
|
self.onSessionEnd = onSessionEnd
|
||||||
self.presentMeter = presentMeter
|
self.endToEndMeter = endToEndMeter
|
||||||
self.presentTailMeter = presentTailMeter
|
self.decodeMeter = decodeMeter
|
||||||
|
self.displayMeter = displayMeter
|
||||||
}
|
}
|
||||||
|
|
||||||
public func makeNSView(context: Context) -> StreamLayerView {
|
public func makeNSView(context: Context) -> StreamLayerView {
|
||||||
let view = StreamLayerView()
|
let view = StreamLayerView()
|
||||||
view.onCaptureChange = onCaptureChange
|
view.onCaptureChange = onCaptureChange
|
||||||
view.captureEnabled = captureEnabled
|
view.captureEnabled = captureEnabled
|
||||||
view.presentMeter = presentMeter
|
view.endToEndMeter = endToEndMeter
|
||||||
view.presentTailMeter = presentTailMeter
|
view.decodeMeter = decodeMeter
|
||||||
|
view.displayMeter = displayMeter
|
||||||
view.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
view.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||||
return view
|
return view
|
||||||
}
|
}
|
||||||
@@ -125,8 +131,9 @@ public struct StreamView: NSViewRepresentable {
|
|||||||
public func updateNSView(_ view: StreamLayerView, context: Context) {
|
public func updateNSView(_ view: StreamLayerView, context: Context) {
|
||||||
view.onCaptureChange = onCaptureChange
|
view.onCaptureChange = onCaptureChange
|
||||||
view.captureEnabled = captureEnabled
|
view.captureEnabled = captureEnabled
|
||||||
view.presentMeter = presentMeter
|
view.endToEndMeter = endToEndMeter
|
||||||
view.presentTailMeter = presentTailMeter
|
view.decodeMeter = decodeMeter
|
||||||
|
view.displayMeter = displayMeter
|
||||||
// SwiftUI reuses the NSView across state changes — repoint the pump only when the
|
// SwiftUI reuses the NSView across state changes — repoint the pump only when the
|
||||||
// connection identity actually changed.
|
// connection identity actually changed.
|
||||||
if view.connection !== connection {
|
if view.connection !== connection {
|
||||||
@@ -141,10 +148,11 @@ public struct StreamView: NSViewRepresentable {
|
|||||||
|
|
||||||
public final class StreamLayerView: NSView {
|
public final class StreamLayerView: NSView {
|
||||||
private let displayLayer = AVSampleBufferDisplayLayer()
|
private let displayLayer = AVSampleBufferDisplayLayer()
|
||||||
/// Record capture→present / decode→present when the stage-2 presenter is active.
|
/// Record the unified latency stages (end-to-end / decode / display) when the stage-2
|
||||||
/// Consulted at start().
|
/// presenter is active. Consulted at start().
|
||||||
var presentMeter: LatencyMeter?
|
var endToEndMeter: LatencyMeter?
|
||||||
var presentTailMeter: LatencyMeter?
|
var decodeMeter: LatencyMeter?
|
||||||
|
var displayMeter: LatencyMeter?
|
||||||
/// The shared presenter stack: stage-2 (CAMetalLayer sublayer + display link) with the
|
/// The shared presenter stack: stage-2 (CAMetalLayer sublayer + display link) with the
|
||||||
/// stage-1 StreamPump → displayLayer path as the Metal-unavailable / DEBUG fallback.
|
/// stage-1 StreamPump → displayLayer path as the Metal-unavailable / DEBUG fallback.
|
||||||
private let presenter = SessionPresenter()
|
private let presenter = SessionPresenter()
|
||||||
@@ -571,8 +579,9 @@ public final class StreamLayerView: NSView {
|
|||||||
presenter.start(
|
presenter.start(
|
||||||
connection: connection,
|
connection: connection,
|
||||||
baseLayer: displayLayer,
|
baseLayer: displayLayer,
|
||||||
presentMeter: presentMeter,
|
endToEndMeter: endToEndMeter,
|
||||||
presentTailMeter: presentTailMeter,
|
decodeMeter: decodeMeter,
|
||||||
|
displayMeter: displayMeter,
|
||||||
makeDisplayLink: { displayLink(target: $0, selector: $1) },
|
makeDisplayLink: { displayLink(target: $0, selector: $1) },
|
||||||
onFrame: onFrame,
|
onFrame: onFrame,
|
||||||
onSessionEnd: onSessionEnd)
|
onSessionEnd: onSessionEnd)
|
||||||
|
|||||||
@@ -50,8 +50,9 @@ public struct StreamView: UIViewControllerRepresentable {
|
|||||||
private let onCaptureChange: ((Bool) -> Void)?
|
private let onCaptureChange: ((Bool) -> Void)?
|
||||||
private let onFrame: (@Sendable (AccessUnit) -> Void)?
|
private let onFrame: (@Sendable (AccessUnit) -> Void)?
|
||||||
private let onSessionEnd: (@Sendable () -> Void)?
|
private let onSessionEnd: (@Sendable () -> Void)?
|
||||||
private let presentMeter: LatencyMeter?
|
private let endToEndMeter: LatencyMeter?
|
||||||
private let presentTailMeter: LatencyMeter?
|
private let decodeMeter: LatencyMeter?
|
||||||
|
private let displayMeter: LatencyMeter?
|
||||||
|
|
||||||
public init(
|
public init(
|
||||||
connection: PunktfunkConnection,
|
connection: PunktfunkConnection,
|
||||||
@@ -59,24 +60,27 @@ public struct StreamView: UIViewControllerRepresentable {
|
|||||||
onCaptureChange: ((Bool) -> Void)? = nil,
|
onCaptureChange: ((Bool) -> Void)? = nil,
|
||||||
onFrame: (@Sendable (AccessUnit) -> Void)? = nil,
|
onFrame: (@Sendable (AccessUnit) -> Void)? = nil,
|
||||||
onSessionEnd: (@Sendable () -> Void)? = nil,
|
onSessionEnd: (@Sendable () -> Void)? = nil,
|
||||||
presentMeter: LatencyMeter? = nil,
|
endToEndMeter: LatencyMeter? = nil,
|
||||||
presentTailMeter: LatencyMeter? = nil
|
decodeMeter: LatencyMeter? = nil,
|
||||||
|
displayMeter: LatencyMeter? = nil
|
||||||
) {
|
) {
|
||||||
self.connection = connection
|
self.connection = connection
|
||||||
self.captureEnabled = captureEnabled
|
self.captureEnabled = captureEnabled
|
||||||
self.onCaptureChange = onCaptureChange
|
self.onCaptureChange = onCaptureChange
|
||||||
self.onFrame = onFrame
|
self.onFrame = onFrame
|
||||||
self.onSessionEnd = onSessionEnd
|
self.onSessionEnd = onSessionEnd
|
||||||
self.presentMeter = presentMeter
|
self.endToEndMeter = endToEndMeter
|
||||||
self.presentTailMeter = presentTailMeter
|
self.decodeMeter = decodeMeter
|
||||||
|
self.displayMeter = displayMeter
|
||||||
}
|
}
|
||||||
|
|
||||||
public func makeUIViewController(context: Context) -> StreamViewController {
|
public func makeUIViewController(context: Context) -> StreamViewController {
|
||||||
let controller = StreamViewController()
|
let controller = StreamViewController()
|
||||||
controller.onCaptureChange = onCaptureChange
|
controller.onCaptureChange = onCaptureChange
|
||||||
controller.captureEnabled = captureEnabled
|
controller.captureEnabled = captureEnabled
|
||||||
controller.presentMeter = presentMeter
|
controller.endToEndMeter = endToEndMeter
|
||||||
controller.presentTailMeter = presentTailMeter
|
controller.decodeMeter = decodeMeter
|
||||||
|
controller.displayMeter = displayMeter
|
||||||
controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||||
return controller
|
return controller
|
||||||
}
|
}
|
||||||
@@ -84,8 +88,9 @@ public struct StreamView: UIViewControllerRepresentable {
|
|||||||
public func updateUIViewController(_ controller: StreamViewController, context: Context) {
|
public func updateUIViewController(_ controller: StreamViewController, context: Context) {
|
||||||
controller.onCaptureChange = onCaptureChange
|
controller.onCaptureChange = onCaptureChange
|
||||||
controller.captureEnabled = captureEnabled
|
controller.captureEnabled = captureEnabled
|
||||||
controller.presentMeter = presentMeter
|
controller.endToEndMeter = endToEndMeter
|
||||||
controller.presentTailMeter = presentTailMeter
|
controller.decodeMeter = decodeMeter
|
||||||
|
controller.displayMeter = displayMeter
|
||||||
if controller.connection !== connection {
|
if controller.connection !== connection {
|
||||||
controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
controller.start(connection: connection, onFrame: onFrame, onSessionEnd: onSessionEnd)
|
||||||
}
|
}
|
||||||
@@ -101,10 +106,11 @@ public struct StreamView: UIViewControllerRepresentable {
|
|||||||
public final class StreamViewController: UIViewController {
|
public final class StreamViewController: UIViewController {
|
||||||
public private(set) var connection: PunktfunkConnection?
|
public private(set) var connection: PunktfunkConnection?
|
||||||
private var observers: [NSObjectProtocol] = []
|
private var observers: [NSObjectProtocol] = []
|
||||||
/// Record capture→present / decode→present when the stage-2 presenter is active.
|
/// Record the unified latency stages (end-to-end / decode / display) when the stage-2
|
||||||
/// Consulted at start().
|
/// presenter is active. Consulted at start().
|
||||||
var presentMeter: LatencyMeter?
|
var endToEndMeter: LatencyMeter?
|
||||||
var presentTailMeter: LatencyMeter?
|
var decodeMeter: LatencyMeter?
|
||||||
|
var displayMeter: LatencyMeter?
|
||||||
/// The shared presenter stack: stage-2 (CAMetalLayer sublayer + display link) with the
|
/// The shared presenter stack: stage-2 (CAMetalLayer sublayer + display link) with the
|
||||||
/// stage-1 StreamPump → displayLayer path as the Metal-unavailable / DEBUG fallback.
|
/// stage-1 StreamPump → displayLayer path as the Metal-unavailable / DEBUG fallback.
|
||||||
private let presenter = SessionPresenter()
|
private let presenter = SessionPresenter()
|
||||||
@@ -285,8 +291,9 @@ public final class StreamViewController: UIViewController {
|
|||||||
presenter.start(
|
presenter.start(
|
||||||
connection: connection,
|
connection: connection,
|
||||||
baseLayer: streamView.displayLayer,
|
baseLayer: streamView.displayLayer,
|
||||||
presentMeter: presentMeter,
|
endToEndMeter: endToEndMeter,
|
||||||
presentTailMeter: presentTailMeter,
|
decodeMeter: decodeMeter,
|
||||||
|
displayMeter: displayMeter,
|
||||||
makeDisplayLink: { CADisplayLink(target: $0, selector: $1) },
|
makeDisplayLink: { CADisplayLink(target: $0, selector: $1) },
|
||||||
onFrame: onFrame,
|
onFrame: onFrame,
|
||||||
onSessionEnd: onSessionEnd)
|
onSessionEnd: onSessionEnd)
|
||||||
|
|||||||
@@ -1,6 +1,10 @@
|
|||||||
// Unit tests for LatencyMeter: percentiles, the skew-corrected flag, reset-on-drain, and the
|
// Unit tests for LatencyMeter (one instance per unified-stats stage — see
|
||||||
// absurd-value guard. Latencies are constructed by stamping a pts a known interval in the past, so
|
// design/stats-unification.md): percentiles, the skew-corrected flag, reset-on-drain, the
|
||||||
// the result is that interval plus the (tiny) clock advance between reads — asserted with tolerance.
|
// absurd-value guard, and the explicit-instant stage form (record(ptsNs:atNs:offsetNs:), used for
|
||||||
|
// the client-local decode/display stages and the at-present end-to-end stamp). Receipt-path
|
||||||
|
// latencies are constructed by stamping a pts a known interval in the past, so the result is that
|
||||||
|
// interval plus the (tiny) clock advance between reads — asserted with tolerance; the explicit
|
||||||
|
// form is exact.
|
||||||
|
|
||||||
import Foundation
|
import Foundation
|
||||||
import XCTest
|
import XCTest
|
||||||
@@ -38,6 +42,26 @@ final class LatencyMeterTests: XCTestCase {
|
|||||||
XCTAssertEqual(m.drain()?.skewCorrected, true)
|
XCTAssertEqual(m.drain()?.skewCorrected, true)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func testExplicitStageRecordIsExact() {
|
||||||
|
let m = LatencyMeter()
|
||||||
|
// A client-local stage (decode: received→decoded) — start instant as ptsNs, offset 0.
|
||||||
|
let receivedNs: Int64 = 1_000_000_000_000
|
||||||
|
m.record(ptsNs: UInt64(receivedNs), atNs: receivedNs + 3_000_000, offsetNs: 0)
|
||||||
|
guard let s = m.drain() else { return XCTFail("expected a sample") }
|
||||||
|
XCTAssertEqual(s.count, 1)
|
||||||
|
XCTAssertEqual(s.p50Ms, 3.0, "explicit instants make the sample exact")
|
||||||
|
XCTAssertFalse(s.skewCorrected, "local stages record with offset 0")
|
||||||
|
}
|
||||||
|
|
||||||
|
func testExplicitStageDropsNonPositiveInterval() {
|
||||||
|
let m = LatencyMeter()
|
||||||
|
// A stage whose start stamp is missing (0) or after its end must not pollute the window.
|
||||||
|
let decodedNs: Int64 = 1_000_000_000_000
|
||||||
|
m.record(ptsNs: 0, atNs: decodedNs, offsetNs: 0) // "start unknown" → > 10 s → dropped
|
||||||
|
m.record(ptsNs: UInt64(decodedNs + 1), atNs: decodedNs, offsetNs: 0) // negative → dropped
|
||||||
|
XCTAssertNil(m.drain())
|
||||||
|
}
|
||||||
|
|
||||||
func testDropsAbsurdValues() {
|
func testDropsAbsurdValues() {
|
||||||
let m = LatencyMeter()
|
let m = LatencyMeter()
|
||||||
let now = nowRealtimeNs()
|
let now = nowRealtimeNs()
|
||||||
|
|||||||
@@ -31,7 +31,7 @@ final class Stage444Tests: XCTestCase {
|
|||||||
let data = Data(Probe444Blobs.au444_8bit)
|
let data = Data(Probe444Blobs.au444_8bit)
|
||||||
let format = try XCTUnwrap(
|
let format = try XCTUnwrap(
|
||||||
AnnexB.formatDescription(fromIDR: data, codec: .hevc), "the 4:4:4 blob must yield a format description")
|
AnnexB.formatDescription(fromIDR: data, codec: .hevc), "the 4:4:4 blob must yield a format description")
|
||||||
let au = AccessUnit(data: data, ptsNs: 7_000_000, frameIndex: 0, flags: 0)
|
let au = AccessUnit(data: data, ptsNs: 7_000_000, frameIndex: 0, flags: 0, receivedNs: 0)
|
||||||
|
|
||||||
let box = FrameBox()
|
let box = FrameBox()
|
||||||
let done = DispatchSemaphore(value: 0)
|
let done = DispatchSemaphore(value: 0)
|
||||||
|
|||||||
@@ -38,7 +38,8 @@ final class VideoToolboxRoundTripTests: XCTestCase {
|
|||||||
XCTAssertEqual(AnnexB.avcc(from: annexB, codec: .hevc), avccSample)
|
XCTAssertEqual(AnnexB.avcc(from: annexB, codec: .hevc), avccSample)
|
||||||
|
|
||||||
// 3) Sample buffer → real decoder → pixels.
|
// 3) Sample buffer → real decoder → pixels.
|
||||||
let au = AccessUnit(data: annexB, ptsNs: 1_000_000, frameIndex: 0, flags: 0)
|
let au = AccessUnit(
|
||||||
|
data: annexB, ptsNs: 1_000_000, frameIndex: 0, flags: 0, receivedNs: 0)
|
||||||
let sample = try XCTUnwrap(AnnexB.sampleBuffer(au: au, format: rebuilt, codec: .hevc))
|
let sample = try XCTUnwrap(AnnexB.sampleBuffer(au: au, format: rebuilt, codec: .hevc))
|
||||||
|
|
||||||
var session: VTDecompressionSession?
|
var session: VTDecompressionSession?
|
||||||
@@ -67,13 +68,14 @@ final class VideoToolboxRoundTripTests: XCTestCase {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Stage-2 decode half: the same known IDR through `VideoDecoder` — assert its async output
|
/// Stage-2 decode half: the same known IDR through `VideoDecoder` — assert its async output
|
||||||
/// callback fires with a CVPixelBuffer of the right dimensions, the pts round-trips, and
|
/// callback fires with a CVPixelBuffer of the right dimensions, the pts and the receipt stamp
|
||||||
/// decode-completion is stamped.
|
/// round-trip (the latter rides the frame refcon), and decode-completion is stamped.
|
||||||
func testVideoDecoderAsyncCallbackDeliversPixels() throws {
|
func testVideoDecoderAsyncCallbackDeliversPixels() throws {
|
||||||
let (formatDesc, avccSample) = try encodeOneHEVCKeyframe()
|
let (formatDesc, avccSample) = try encodeOneHEVCKeyframe()
|
||||||
let annexB = try annexBAU(formatDesc: formatDesc, avccSample: avccSample)
|
let annexB = try annexBAU(formatDesc: formatDesc, avccSample: avccSample)
|
||||||
let format = try XCTUnwrap(AnnexB.formatDescription(fromIDR: annexB, codec: .hevc))
|
let format = try XCTUnwrap(AnnexB.formatDescription(fromIDR: annexB, codec: .hevc))
|
||||||
let au = AccessUnit(data: annexB, ptsNs: 42_000_000, frameIndex: 0, flags: 0)
|
let au = AccessUnit(
|
||||||
|
data: annexB, ptsNs: 42_000_000, frameIndex: 0, flags: 0, receivedNs: 41_000_000)
|
||||||
|
|
||||||
let box = FrameBox()
|
let box = FrameBox()
|
||||||
let done = DispatchSemaphore(value: 0)
|
let done = DispatchSemaphore(value: 0)
|
||||||
@@ -100,6 +102,8 @@ final class VideoToolboxRoundTripTests: XCTestCase {
|
|||||||
XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), width)
|
XCTAssertEqual(CVPixelBufferGetWidth(ready.pixelBuffer), width)
|
||||||
XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), height)
|
XCTAssertEqual(CVPixelBufferGetHeight(ready.pixelBuffer), height)
|
||||||
XCTAssertEqual(ready.ptsNs, 42_000_000, "pts round-trips through the decoder")
|
XCTAssertEqual(ready.ptsNs, 42_000_000, "pts round-trips through the decoder")
|
||||||
|
XCTAssertEqual(
|
||||||
|
ready.receivedNs, 41_000_000, "receivedNs round-trips through the frame refcon")
|
||||||
XCTAssertGreaterThan(ready.decodedNs, 0, "decode-completion is stamped")
|
XCTAssertGreaterThan(ready.decodedNs, 0, "decode-completion is stamped")
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -45,18 +45,40 @@ pub struct SessionParams {
|
|||||||
pub connect_timeout: Duration,
|
pub connect_timeout: Duration,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// The session pump's share of the unified stats window (design/stats-unification.md):
|
||||||
|
/// stream facts plus the two stages measured before the presenter. The frame consumer in
|
||||||
|
/// `ui_stream` contributes the `display` stage and the end-to-end percentiles.
|
||||||
#[derive(Clone, Copy, Default)]
|
#[derive(Clone, Copy, Default)]
|
||||||
pub struct Stats {
|
pub struct Stats {
|
||||||
|
/// AUs received (reassembled) per second, actual-elapsed-time denominator.
|
||||||
pub fps: f32,
|
pub fps: f32,
|
||||||
|
/// Received payload bytes × 8 / elapsed (goodput, excludes FEC overhead).
|
||||||
pub mbps: f32,
|
pub mbps: f32,
|
||||||
|
/// p50 `host+network` stage: capture → received, host-clock corrected (ms).
|
||||||
|
pub host_net_ms: f32,
|
||||||
|
/// p50 `decode` stage: received → decoded, single-clock client-local (ms).
|
||||||
pub decode_ms: f32,
|
pub decode_ms: f32,
|
||||||
/// Median capture→decoded latency over the last window (host-clock corrected).
|
/// Unrecoverable network frame drops this window, and their share of
|
||||||
pub latency_ms: f32,
|
/// received+lost (%). The OSD renders the counter line only when nonzero.
|
||||||
|
pub lost: u32,
|
||||||
|
pub lost_pct: f32,
|
||||||
/// The decode path frames actually took this window (`"vaapi"`/`"software"`, empty
|
/// The decode path frames actually took this window (`"vaapi"`/`"software"`, empty
|
||||||
/// until the first frame) — the OSD's trailing tag; tracks a mid-session fallback.
|
/// until the first frame) — the OSD's trailing tag; tracks a mid-session fallback.
|
||||||
pub decoder: &'static str,
|
pub decoder: &'static str,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Sort a window of µs samples in place and return `(p50, p95)` per the spec's index
|
||||||
|
/// rules (`sorted[len/2]`, `sorted[min(len*95/100, len-1)]`); an empty window reads 0.
|
||||||
|
pub fn window_percentiles(samples: &mut [u64]) -> (u64, u64) {
|
||||||
|
if samples.is_empty() {
|
||||||
|
return (0, 0);
|
||||||
|
}
|
||||||
|
samples.sort_unstable();
|
||||||
|
let p50 = samples[samples.len() / 2];
|
||||||
|
let p95 = samples[(samples.len() * 95 / 100).min(samples.len() - 1)];
|
||||||
|
(p50, p95)
|
||||||
|
}
|
||||||
|
|
||||||
pub enum SessionEvent {
|
pub enum SessionEvent {
|
||||||
Connected {
|
Connected {
|
||||||
connector: Arc<NativeClient>,
|
connector: Arc<NativeClient>,
|
||||||
@@ -219,13 +241,17 @@ fn pump(
|
|||||||
let mut window_start = Instant::now();
|
let mut window_start = Instant::now();
|
||||||
let mut frames_n = 0u32;
|
let mut frames_n = 0u32;
|
||||||
let mut bytes_n = 0u64;
|
let mut bytes_n = 0u64;
|
||||||
let mut decode_us_sum = 0u64;
|
// Stage windows (µs samples): `host+network` = capture→received (host-clock
|
||||||
let mut lat_us: Vec<u64> = Vec::with_capacity(256);
|
// corrected), `decode` = received→decoded (client-local). p50 per 1 s window.
|
||||||
|
let mut hostnet_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
let mut decode_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
// What actually decoded the last frame — a VAAPI failure demotes mid-session, so
|
// What actually decoded the last frame — a VAAPI failure demotes mid-session, so
|
||||||
// this is read off each frame's image variant rather than fixed at startup.
|
// this is read off each frame's image variant rather than fixed at startup.
|
||||||
let mut dec_path: &'static str = "";
|
let mut dec_path: &'static str = "";
|
||||||
// Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs.
|
// Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs.
|
||||||
let mut last_dropped = connector.frames_dropped();
|
let mut last_dropped = connector.frames_dropped();
|
||||||
|
// The stats window keeps its own drop cursor — the OSD shows the per-window delta.
|
||||||
|
let mut window_dropped = last_dropped;
|
||||||
let mut last_kf_req: Option<Instant> = None;
|
let mut last_kf_req: Option<Instant> = None;
|
||||||
|
|
||||||
let end: Option<String> = loop {
|
let end: Option<String> = loop {
|
||||||
@@ -237,7 +263,11 @@ fn pump(
|
|||||||
// every ~8–16 ms at 60–120 Hz anyway, so this rarely times out mid-stream).
|
// every ~8–16 ms at 60–120 Hz anyway, so this rarely times out mid-stream).
|
||||||
match connector.next_frame(Duration::from_millis(20)) {
|
match connector.next_frame(Duration::from_millis(20)) {
|
||||||
Ok(frame) => {
|
Ok(frame) => {
|
||||||
let t0 = Instant::now();
|
// The `received` point: AU fully reassembled, in hand, before decode.
|
||||||
|
let received_ns = now_ns();
|
||||||
|
// fps / goodput count every received AU (spec), decoded or not.
|
||||||
|
frames_n += 1;
|
||||||
|
bytes_n += frame.data.len() as u64;
|
||||||
match decoder.decode(&frame.data) {
|
match decoder.decode(&frame.data) {
|
||||||
Ok(Some(image)) => {
|
Ok(Some(image)) => {
|
||||||
total_frames += 1;
|
total_frames += 1;
|
||||||
@@ -252,18 +282,21 @@ fn pump(
|
|||||||
};
|
};
|
||||||
tracing::info!(width = w, height = h, path, "first frame decoded");
|
tracing::info!(width = w, height = h, path, "first frame decoded");
|
||||||
}
|
}
|
||||||
// Latency: our wall clock expressed in the host's capture clock,
|
// The `decoded` point — travels with the frame so the presenter
|
||||||
// minus the host-stamped capture pts (same math as client-rs).
|
// can measure its `display` stage against it.
|
||||||
let lat = (now_ns() as i128 + clock_offset as i128 - frame.pts_ns as i128)
|
let decoded_ns = now_ns();
|
||||||
|
// `host+network` stage: received expressed in the host's capture
|
||||||
|
// clock, minus the host-stamped capture pts (clamped (0, 10 s)).
|
||||||
|
let hn = (received_ns as i128 + clock_offset as i128 - frame.pts_ns as i128)
|
||||||
.max(0) as u64;
|
.max(0) as u64;
|
||||||
if lat > 0 && lat < 10_000_000_000 {
|
if hn > 0 && hn < 10_000_000_000 {
|
||||||
lat_us.push(lat / 1000);
|
hostnet_us.push(hn / 1000);
|
||||||
}
|
}
|
||||||
decode_us_sum += t0.elapsed().as_micros() as u64;
|
// `decode` stage: received→decoded, single clock, no skew.
|
||||||
frames_n += 1;
|
decode_us.push(decoded_ns.saturating_sub(received_ns) / 1000);
|
||||||
bytes_n += frame.data.len() as u64;
|
|
||||||
let _ = frame_tx.force_send(DecodedFrame {
|
let _ = frame_tx.force_send(DecodedFrame {
|
||||||
pts_ns: frame.pts_ns,
|
pts_ns: frame.pts_ns,
|
||||||
|
decoded_ns,
|
||||||
image,
|
image,
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
@@ -295,30 +328,36 @@ fn pump(
|
|||||||
|
|
||||||
if window_start.elapsed() >= Duration::from_secs(1) {
|
if window_start.elapsed() >= Duration::from_secs(1) {
|
||||||
let secs = window_start.elapsed().as_secs_f32();
|
let secs = window_start.elapsed().as_secs_f32();
|
||||||
lat_us.sort_unstable();
|
let (hn_p50, _) = window_percentiles(&mut hostnet_us);
|
||||||
let p50 = lat_us.get(lat_us.len() / 2).copied().unwrap_or(0);
|
let (dec_p50, _) = window_percentiles(&mut decode_us);
|
||||||
|
let lost = dropped.saturating_sub(window_dropped) as u32;
|
||||||
|
window_dropped = dropped;
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
fps = frames_n,
|
fps = frames_n,
|
||||||
lat_p50_us = p50,
|
hostnet_p50_us = hn_p50,
|
||||||
|
decode_p50_us = dec_p50,
|
||||||
|
lost,
|
||||||
total_frames,
|
total_frames,
|
||||||
"stream window"
|
"stream window"
|
||||||
);
|
);
|
||||||
let _ = ev_tx.try_send(SessionEvent::Stats(Stats {
|
let _ = ev_tx.try_send(SessionEvent::Stats(Stats {
|
||||||
fps: frames_n as f32 / secs,
|
fps: frames_n as f32 / secs,
|
||||||
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
||||||
decode_ms: if frames_n > 0 {
|
host_net_ms: hn_p50 as f32 / 1000.0,
|
||||||
decode_us_sum as f32 / frames_n as f32 / 1000.0
|
decode_ms: dec_p50 as f32 / 1000.0,
|
||||||
|
lost,
|
||||||
|
lost_pct: if lost > 0 {
|
||||||
|
lost as f32 * 100.0 / (frames_n + lost) as f32
|
||||||
} else {
|
} else {
|
||||||
0.0
|
0.0
|
||||||
},
|
},
|
||||||
latency_ms: p50 as f32 / 1000.0,
|
|
||||||
decoder: dec_path,
|
decoder: dec_path,
|
||||||
}));
|
}));
|
||||||
window_start = Instant::now();
|
window_start = Instant::now();
|
||||||
frames_n = 0;
|
frames_n = 0;
|
||||||
bytes_n = 0;
|
bytes_n = 0;
|
||||||
decode_us_sum = 0;
|
hostnet_us.clear();
|
||||||
lat_us.clear();
|
decode_us.clear();
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
@@ -31,33 +31,63 @@ use std::time::{Duration, Instant};
|
|||||||
pub struct StreamPage {
|
pub struct StreamPage {
|
||||||
pub page: adw::NavigationPage,
|
pub page: adw::NavigationPage,
|
||||||
stats_label: gtk::Label,
|
stats_label: gtk::Label,
|
||||||
/// Median capture→paintable-set latency (ms) over the frame consumer's last 1 s
|
/// The frame consumer's share of the stats window (end-to-end percentiles + the
|
||||||
/// window — written there, folded into the OSD on each `Stats` event.
|
/// `display` stage) — written there each 1 s window, folded into the OSD on each
|
||||||
present_ms: Rc<Cell<f32>>,
|
/// `Stats` event.
|
||||||
|
presented: Rc<PresentedStats>,
|
||||||
/// The stream is HDR (PQ) right now — set by the frame consumer from each frame's
|
/// The stream is HDR (PQ) right now — set by the frame consumer from each frame's
|
||||||
/// signaling (the host can flip SDR↔HDR mid-session, in-band).
|
/// signaling (the host can flip SDR↔HDR mid-session, in-band).
|
||||||
hdr: Rc<Cell<bool>>,
|
hdr: Rc<Cell<bool>>,
|
||||||
|
/// `clock_offset_ns == 0`: the skew handshake didn't run (or same host) — the
|
||||||
|
/// end-to-end line carries the `(same-host clock)` flag (spec clock rules).
|
||||||
|
same_host: bool,
|
||||||
|
/// `W×H@Hz` for the OSD's first line — fixed at connect, per-session.
|
||||||
|
mode_line: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Presenter-side window results (design/stats-unification.md): end-to-end =
|
||||||
|
/// capture→displayed measured directly (p50 + p95), `display` stage = decoded→displayed
|
||||||
|
/// p50. All ms, refreshed once per 1 s window by the frame consumer.
|
||||||
|
#[derive(Default)]
|
||||||
|
struct PresentedStats {
|
||||||
|
e2e_p50_ms: Cell<f32>,
|
||||||
|
e2e_p95_ms: Cell<f32>,
|
||||||
|
display_ms: Cell<f32>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl StreamPage {
|
impl StreamPage {
|
||||||
|
/// Render the canonical unified-stats OSD (design/stats-unification.md — Linux
|
||||||
|
/// endpoint is paintable-set, headline reads `capture→displayed`).
|
||||||
pub fn update_stats(&self, s: Stats) {
|
pub fn update_stats(&self, s: Stats) {
|
||||||
let mut line = format!(
|
let mut line1 = format!("{} · {:.0} fps · {:.1} Mb/s", self.mode_line, s.fps, s.mbps);
|
||||||
"{:.0} fps · {:.1} Mbit/s · dec {:.1} ms · lat {:.1} ms · present {:.1} ms",
|
|
||||||
s.fps,
|
|
||||||
s.mbps,
|
|
||||||
s.decode_ms,
|
|
||||||
s.latency_ms,
|
|
||||||
self.present_ms.get()
|
|
||||||
);
|
|
||||||
// Which decoder actually ran this window (vaapi/software) — tracks a fallback.
|
// Which decoder actually ran this window (vaapi/software) — tracks a fallback.
|
||||||
if !s.decoder.is_empty() {
|
if !s.decoder.is_empty() {
|
||||||
line.push_str(" · ");
|
line1.push_str(" · ");
|
||||||
line.push_str(s.decoder);
|
line1.push_str(s.decoder);
|
||||||
}
|
}
|
||||||
if self.hdr.get() {
|
if self.hdr.get() {
|
||||||
line.push_str(" · HDR");
|
line1.push_str(" · HDR");
|
||||||
}
|
}
|
||||||
self.stats_label.set_text(&line);
|
let mut text = format!(
|
||||||
|
"{line1}\n\
|
||||||
|
end-to-end {:.1} ms p50 · {:.1} p95 · capture→displayed{}\n\
|
||||||
|
= host+network {:.1} + decode {:.1} + display {:.1}",
|
||||||
|
self.presented.e2e_p50_ms.get(),
|
||||||
|
self.presented.e2e_p95_ms.get(),
|
||||||
|
if self.same_host {
|
||||||
|
" (same-host clock)"
|
||||||
|
} else {
|
||||||
|
""
|
||||||
|
},
|
||||||
|
s.host_net_ms,
|
||||||
|
s.decode_ms,
|
||||||
|
self.presented.display_ms.get(),
|
||||||
|
);
|
||||||
|
// Counters — only rendered when nonzero this window.
|
||||||
|
if s.lost > 0 {
|
||||||
|
text.push_str(&format!("\nlost {} ({:.1}%)", s.lost, s.lost_pct));
|
||||||
|
}
|
||||||
|
self.stats_label.set_text(&text);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -206,6 +236,13 @@ pub fn new(args: StreamPageArgs) -> StreamPage {
|
|||||||
let w = build_widgets(&window, &title, chromeless, pad_connected);
|
let w = build_widgets(&window, &title, chromeless, pad_connected);
|
||||||
w.stats_label.set_visible(show_stats);
|
w.stats_label.set_visible(show_stats);
|
||||||
|
|
||||||
|
// OSD line-1 facts, fixed for the session (the mode is negotiated per-session).
|
||||||
|
let mode = connector.mode();
|
||||||
|
let mode_line = format!("{}×{}@{}", mode.width, mode.height, mode.refresh_hz);
|
||||||
|
// Offset 0 = the host didn't answer the skew handshake / same host — flagged on the
|
||||||
|
// end-to-end line so an uncorrected cross-machine number is never shown silently.
|
||||||
|
let same_host = clock_offset_ns == 0;
|
||||||
|
|
||||||
let capture = Rc::new(Capture {
|
let capture = Rc::new(Capture {
|
||||||
connector,
|
connector,
|
||||||
window: window.clone(),
|
window: window.clone(),
|
||||||
@@ -218,13 +255,13 @@ pub fn new(args: StreamPageArgs) -> StreamPage {
|
|||||||
held_buttons: RefCell::new(HashSet::new()),
|
held_buttons: RefCell::new(HashSet::new()),
|
||||||
});
|
});
|
||||||
|
|
||||||
let present_ms = Rc::new(Cell::new(0.0f32));
|
let presented = Rc::new(PresentedStats::default());
|
||||||
let hdr = Rc::new(Cell::new(false));
|
let hdr = Rc::new(Cell::new(false));
|
||||||
spawn_frame_consumer(
|
spawn_frame_consumer(
|
||||||
&w.picture,
|
&w.picture,
|
||||||
frames,
|
frames,
|
||||||
clock_offset_ns,
|
clock_offset_ns,
|
||||||
present_ms.clone(),
|
presented.clone(),
|
||||||
hdr.clone(),
|
hdr.clone(),
|
||||||
);
|
);
|
||||||
attach_keyboard(&w.overlay, &window, &capture, &stop, &w.stats_label);
|
attach_keyboard(&w.overlay, &window, &capture, &stop, &w.stats_label);
|
||||||
@@ -248,8 +285,10 @@ pub fn new(args: StreamPageArgs) -> StreamPage {
|
|||||||
StreamPage {
|
StreamPage {
|
||||||
page: w.page,
|
page: w.page,
|
||||||
stats_label: w.stats_label,
|
stats_label: w.stats_label,
|
||||||
present_ms,
|
presented,
|
||||||
hdr,
|
hdr,
|
||||||
|
same_host,
|
||||||
|
mode_line,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -456,12 +495,13 @@ fn attach_edge_reveal(
|
|||||||
/// then draws whatever paintable is current on its own frame clock. Ends itself when the
|
/// then draws whatever paintable is current on its own frame clock. Ends itself when the
|
||||||
/// channel closes or the picture is gone.
|
/// channel closes or the picture is gone.
|
||||||
///
|
///
|
||||||
/// Also the capture→present-ish measurement point: at each paintable set the frame's
|
/// Also the `displayed` measurement point (design/stats-unification.md): each paintable
|
||||||
/// host capture pts is compared against the local wall clock expressed in the host clock
|
/// set stamps the local wall clock, yielding end-to-end = capture→displayed (host-clock
|
||||||
/// (`clock_offset_ns`, same math as the session's decode latency). This is
|
/// corrected via `clock_offset_ns`, p50+p95, measured directly) and the client-local
|
||||||
/// capture→paintable-SET — GTK's own present adds one compositor cycle after this. The
|
/// `display` stage = decoded→displayed. This is capture→paintable-SET — GTK's own
|
||||||
/// 1 s p50 lands on the stats OSD (via `present_ms`) and in a "present window" debug
|
/// present adds one compositor cycle after this. The 1 s window results land on the
|
||||||
/// line for headless validation.
|
/// stats OSD (via `PresentedStats`) and in a "present window" debug line for headless
|
||||||
|
/// validation.
|
||||||
/// One-entry cache of `ColorDesc` → `GdkColorState` (signaling changes at most on an
|
/// One-entry cache of `ColorDesc` → `GdkColorState` (signaling changes at most on an
|
||||||
/// SDR↔HDR flip, never per frame).
|
/// SDR↔HDR flip, never per frame).
|
||||||
#[derive(Default)]
|
#[derive(Default)]
|
||||||
@@ -516,7 +556,7 @@ fn spawn_frame_consumer(
|
|||||||
picture: >k::Picture,
|
picture: >k::Picture,
|
||||||
frames: async_channel::Receiver<DecodedFrame>,
|
frames: async_channel::Receiver<DecodedFrame>,
|
||||||
clock_offset_ns: i64,
|
clock_offset_ns: i64,
|
||||||
present_ms: Rc<Cell<f32>>,
|
presented_stats: Rc<PresentedStats>,
|
||||||
hdr: Rc<Cell<bool>>,
|
hdr: Rc<Cell<bool>>,
|
||||||
) {
|
) {
|
||||||
let picture = picture.downgrade();
|
let picture = picture.downgrade();
|
||||||
@@ -528,7 +568,10 @@ fn spawn_frame_consumer(
|
|||||||
let mut yuv_state = ColorStateCache::default();
|
let mut yuv_state = ColorStateCache::default();
|
||||||
let mut rgb_state = ColorStateCache::default();
|
let mut rgb_state = ColorStateCache::default();
|
||||||
glib::spawn_future_local(async move {
|
glib::spawn_future_local(async move {
|
||||||
let mut win_lat_us: Vec<u64> = Vec::with_capacity(256);
|
// Window samples (µs): end-to-end capture→displayed (host-clock corrected) and
|
||||||
|
// the client-local display stage decoded→displayed.
|
||||||
|
let mut win_e2e_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
let mut win_disp_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
let mut win_start = Instant::now();
|
let mut win_start = Instant::now();
|
||||||
while let Ok(f) = frames.recv().await {
|
while let Ok(f) = frames.recv().await {
|
||||||
let Some(picture) = picture.upgrade() else {
|
let Some(picture) = picture.upgrade() else {
|
||||||
@@ -601,26 +644,34 @@ fn spawn_frame_consumer(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// Capture→paintable-set latency, host-clock corrected (same math and sanity
|
// The `displayed` stamp: end-to-end = capture→displayed host-clock corrected
|
||||||
// bound as the session's decode-latency window).
|
// (same clamp as the session's stage windows); display = decoded→displayed,
|
||||||
|
// single clock, no skew.
|
||||||
if presented {
|
if presented {
|
||||||
let lat = (crate::session::now_ns() as i128 + clock_offset_ns as i128
|
let displayed_ns = crate::session::now_ns();
|
||||||
- f.pts_ns as i128)
|
let e2e = (displayed_ns as i128 + clock_offset_ns as i128 - f.pts_ns as i128).max(0)
|
||||||
.max(0) as u64;
|
as u64;
|
||||||
if lat > 0 && lat < 10_000_000_000 {
|
if e2e > 0 && e2e < 10_000_000_000 {
|
||||||
win_lat_us.push(lat / 1000);
|
win_e2e_us.push(e2e / 1000);
|
||||||
}
|
}
|
||||||
|
win_disp_us.push(displayed_ns.saturating_sub(f.decoded_ns) / 1000);
|
||||||
}
|
}
|
||||||
if win_start.elapsed() >= Duration::from_secs(1) {
|
if win_start.elapsed() >= Duration::from_secs(1) {
|
||||||
win_lat_us.sort_unstable();
|
let frames = win_e2e_us.len();
|
||||||
let p50 = win_lat_us.get(win_lat_us.len() / 2).copied().unwrap_or(0);
|
let (e2e_p50, e2e_p95) = crate::session::window_percentiles(&mut win_e2e_us);
|
||||||
|
let (disp_p50, _) = crate::session::window_percentiles(&mut win_disp_us);
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
frames = win_lat_us.len(),
|
frames,
|
||||||
present_p50_us = p50,
|
e2e_p50_us = e2e_p50,
|
||||||
|
e2e_p95_us = e2e_p95,
|
||||||
|
display_p50_us = disp_p50,
|
||||||
"present window"
|
"present window"
|
||||||
);
|
);
|
||||||
present_ms.set(p50 as f32 / 1000.0);
|
presented_stats.e2e_p50_ms.set(e2e_p50 as f32 / 1000.0);
|
||||||
win_lat_us.clear();
|
presented_stats.e2e_p95_ms.set(e2e_p95 as f32 / 1000.0);
|
||||||
|
presented_stats.display_ms.set(disp_p50 as f32 / 1000.0);
|
||||||
|
win_e2e_us.clear();
|
||||||
|
win_disp_us.clear();
|
||||||
win_start = Instant::now();
|
win_start = Instant::now();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -24,11 +24,15 @@ use std::os::fd::RawFd;
|
|||||||
use std::ptr;
|
use std::ptr;
|
||||||
|
|
||||||
/// One decoded frame headed for the presenter, carrying the host capture timestamp so the
|
/// One decoded frame headed for the presenter, carrying the host capture timestamp so the
|
||||||
/// UI can measure capture→paintable-set latency at the moment it presents.
|
/// UI can measure capture→displayed latency at the moment it presents.
|
||||||
pub struct DecodedFrame {
|
pub struct DecodedFrame {
|
||||||
/// Host-clock capture pts (ns) of the AU this image decoded from — compare against
|
/// Host-clock capture pts (ns) of the AU this image decoded from — compare against
|
||||||
/// the local wall clock + `clock_offset_ns` at paintable-set time.
|
/// the local wall clock + `clock_offset_ns` at paintable-set time.
|
||||||
pub pts_ns: u64,
|
pub pts_ns: u64,
|
||||||
|
/// Local wall clock (ns) when the decoder emitted this image — the `decoded`
|
||||||
|
/// measurement point (design/stats-unification.md); the presenter subtracts it from
|
||||||
|
/// its paintable-set stamp for the client-local `display` stage.
|
||||||
|
pub decoded_ns: u64,
|
||||||
pub image: DecodedImage,
|
pub image: DecodedImage,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -14,7 +14,7 @@ example of driving the protocol end to end: QUIC control plane, UDP data plane,
|
|||||||
|
|
||||||
- **Receives a real stream**, writes a playable elementary stream (`.h265`/`.h264`/`.av1` — the
|
- **Receives a real stream**, writes a playable elementary stream (`.h265`/`.h264`/`.av1` — the
|
||||||
extension tracks the **negotiated codec**; the probe advertises all three and the host picks), and
|
extension tracks the **negotiated codec**; the probe advertises all three and the host picks), and
|
||||||
reports per-frame **capture→…→reassembled latency** percentiles (the host stamps each frame with
|
reports per-frame **capture→received latency** percentiles (the host stamps each frame with
|
||||||
its capture clock).
|
its capture clock).
|
||||||
- **Verification mode** against a synthetic host — byte-checks deterministic test frames.
|
- **Verification mode** against a synthetic host — byte-checks deterministic test frames.
|
||||||
- **Exercises every plane** with scripted test traffic:
|
- **Exercises every plane** with scripted test traffic:
|
||||||
|
|||||||
@@ -4,7 +4,7 @@
|
|||||||
//! * **verification** (`frames > 0`, synthetic host): byte-checks deterministic test frames;
|
//! * **verification** (`frames > 0`, synthetic host): byte-checks deterministic test frames;
|
||||||
//! * **stream** (`frames == 0`, virtual host): receives real encoded AUs, writes a playable
|
//! * **stream** (`frames == 0`, virtual host): receives real encoded AUs, writes a playable
|
||||||
//! elementary stream (the dump extension follows the negotiated codec — `.h265`/`.h264`/`.av1`;
|
//! elementary stream (the dump extension follows the negotiated codec — `.h265`/`.h264`/`.av1`;
|
||||||
//! the probe advertises all three), and reports per-frame **capture→…→reassembled latency**
|
//! the probe advertises all three), and reports per-frame **capture→received latency**
|
||||||
//! percentiles (the host stamps each frame with its capture wall clock; same-host runs share
|
//! percentiles (the host stamps each frame with its capture wall clock; same-host runs share
|
||||||
//! that clock).
|
//! that clock).
|
||||||
//!
|
//!
|
||||||
@@ -481,7 +481,7 @@ async fn session(args: Args) -> Result<()> {
|
|||||||
.await?;
|
.await?;
|
||||||
|
|
||||||
// Wall-clock skew handshake on the still-private control stream (before --remode/--speed-test
|
// Wall-clock skew handshake on the still-private control stream (before --remode/--speed-test
|
||||||
// take it): align our clock to the host's so the per-frame capture→reassembled latency is valid
|
// take it): align our clock to the host's so the per-frame capture→received latency is valid
|
||||||
// across machines. `None` ⇒ an old host that doesn't answer — fall back to a shared clock (0).
|
// across machines. `None` ⇒ an old host that doesn't answer — fall back to a shared clock (0).
|
||||||
let clock_offset_ns = match punktfunk_core::quic::clock_sync(&mut send, &mut recv).await {
|
let clock_offset_ns = match punktfunk_core::quic::clock_sync(&mut send, &mut recv).await {
|
||||||
Some(skew) => {
|
Some(skew) => {
|
||||||
@@ -1051,7 +1051,7 @@ async fn session(args: Args) -> Result<()> {
|
|||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
bytes += frame.data.len() as u64;
|
bytes += frame.data.len() as u64;
|
||||||
// capture→reassembled: our receive instant in the host clock (now + offset)
|
// capture→received: our receive instant in the host clock (now + offset)
|
||||||
// minus the host's capture pts. offset is 0 same-host / old host.
|
// minus the host's capture pts. offset is 0 same-host / old host.
|
||||||
let lat = (now_ns() as i128 + clock_offset as i128 - frame.pts_ns as i128)
|
let lat = (now_ns() as i128 + clock_offset as i128 - frame.pts_ns as i128)
|
||||||
.max(0) as u64;
|
.max(0) as u64;
|
||||||
@@ -1100,7 +1100,7 @@ async fn session(args: Args) -> Result<()> {
|
|||||||
lat_p99_us = pct(0.99),
|
lat_p99_us = pct(0.99),
|
||||||
lat_max_us = latencies_us.last().copied().unwrap_or(0),
|
lat_max_us = latencies_us.last().copied().unwrap_or(0),
|
||||||
skew_corrected,
|
skew_corrected,
|
||||||
"punktfunk/1 stream complete (capture→reassembled latency; skew_corrected=true ⇒ \
|
"punktfunk/1 stream complete (capture→received latency; skew_corrected=true ⇒ \
|
||||||
cross-machine valid, false ⇒ same-host clock)"
|
cross-machine valid, false ⇒ same-host clock)"
|
||||||
);
|
);
|
||||||
if expected > 0 {
|
if expected > 0 {
|
||||||
|
|||||||
@@ -2,7 +2,7 @@
|
|||||||
//! the UI thread, then handed — presenter and all — to the dedicated render thread
|
//! the UI thread, then handed — presenter and all — to the dedicated render thread
|
||||||
//! ([`crate::render`]), which presents decoded frames at stream cadence. The page itself only
|
//! ([`crate::render`]), which presents decoded frames at stream cadence. The page itself only
|
||||||
//! forwards panel size/DPI changes and draws the status-chip HUD overlay (mode · decode path ·
|
//! forwards panel size/DPI changes and draws the status-chip HUD overlay (mode · decode path ·
|
||||||
//! HDR · fps/throughput/latency · capture hint).
|
//! HDR · fps/goodput · end-to-end latency + stage equation · capture hint).
|
||||||
|
|
||||||
use super::style::{edges, uniform};
|
use super::style::{edges, uniform};
|
||||||
use super::Svc;
|
use super::Svc;
|
||||||
@@ -22,8 +22,9 @@ use windows_reactor::*;
|
|||||||
pub(crate) struct HudSample {
|
pub(crate) struct HudSample {
|
||||||
pub(crate) stats: Stats,
|
pub(crate) stats: Stats,
|
||||||
pub(crate) captured: bool,
|
pub(crate) captured: bool,
|
||||||
/// `(presents/s, skipped/s, capture→presented p50 ms)` — see [`crate::render::present_stats`].
|
/// The render thread's glass-side window (presents/s, skips, end-to-end p50/p95, display
|
||||||
pub(crate) present: (u32, u32, f32),
|
/// stage p50) — see [`crate::render::present_stats`].
|
||||||
|
pub(crate) present: crate::render::PresentStats,
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Props for the stream page: the services plus the live HUD sample that drives the overlay
|
/// Props for the stream page: the services plus the live HUD sample that drives the overlay
|
||||||
@@ -171,13 +172,15 @@ fn fmt_uptime(secs: u32) -> String {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// The streaming HUD overlay (top-right), mirroring the Apple client: a chip row (mode · codec ·
|
/// The streaming HUD overlay (top-right), unified stats vocabulary (design/stats-unification.md):
|
||||||
/// decode path · HDR), a stream line (decode fps / bitrate / decode time), a glass line (display
|
/// a chip row (mode · codec · decode path · HDR), a stream line (received fps · goodput ·
|
||||||
/// presents + end-to-end latency decoded vs on-glass), a session line (host · time · loss), and
|
/// presenter fps), the end-to-end headline (capture→on-glass p50/p95, host-clock corrected), the
|
||||||
/// the shortcut hints. Layered over the `SwapChainPanel` in the same grid cell.
|
/// stage equation (= host+network + decode + display, stage p50s), a session line
|
||||||
|
/// (host · time · loss/skips), and the shortcut hints. Layered over the `SwapChainPanel` in the
|
||||||
|
/// same grid cell.
|
||||||
fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
||||||
let stats = &hud.stats;
|
let stats = &hud.stats;
|
||||||
let (pfps, skipped, glass_ms) = hud.present;
|
let present = &hud.present;
|
||||||
let res = mode
|
let res = mode
|
||||||
.map(|m| format!("{}\u{00D7}{}@{}", m.width, m.height, m.refresh_hz))
|
.map(|m| format!("{}\u{00D7}{}@{}", m.width, m.height, m.refresh_hz))
|
||||||
.unwrap_or_else(|| "\u{2014}".into());
|
.unwrap_or_else(|| "\u{2014}".into());
|
||||||
@@ -193,25 +196,38 @@ fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
|||||||
if stats.hdr {
|
if stats.hdr {
|
||||||
chips.push(hud_chip("HDR", Color::rgb(255, 205, 90)).into());
|
chips.push(hud_chip("HDR", Color::rgb(255, 205, 90)).into());
|
||||||
}
|
}
|
||||||
|
// Received fps + goodput, plus the presenter's own rate (Moonlight's "Rendering frame rate"
|
||||||
|
// analog — how often the display actually gets a new frame).
|
||||||
let stream_line = format!(
|
let stream_line = format!(
|
||||||
"{:.0} fps \u{00B7} {:.1} Mb/s \u{00B7} decode {:.1} ms",
|
"{:.0} fps \u{00B7} {:.1} Mb/s \u{00B7} display {} fps",
|
||||||
stats.fps, stats.mbps, stats.decode_ms
|
stats.fps, stats.mbps, present.fps
|
||||||
);
|
);
|
||||||
// End-to-end latency (host-clock corrected): capture→decoded from the pump, capture→on-glass
|
// The headline: end-to-end capture→displayed, measured directly post-Present (never the sum
|
||||||
// from the render thread's post-Present stamp. `skipped` = newest-wins drops (expected when
|
// of the stage percentiles). `(same-host clock)` flags an uncorrected clock (offset == 0:
|
||||||
// the stream outpaces the display); `lost` = unrecoverable network drops.
|
// same host, or the host skipped the skew handshake).
|
||||||
let glass_line = format!(
|
let mut e2e_line = format!(
|
||||||
"display {pfps} fps \u{00B7} latency {:.1} ms decoded / {glass_ms:.1} ms on-glass",
|
"end-to-end {:.1} ms p50 \u{00B7} {:.1} p95 \u{00B7} capture\u{2192}on-glass",
|
||||||
stats.latency_ms
|
present.e2e_p50_ms, present.e2e_p95_ms
|
||||||
|
);
|
||||||
|
if stats.same_host {
|
||||||
|
e2e_line.push_str(" (same-host clock)");
|
||||||
|
}
|
||||||
|
// The equation: the three stages tile the headline interval per frame; the window p50s only
|
||||||
|
// approximately sum (percentiles aren't additive).
|
||||||
|
let stage_line = format!(
|
||||||
|
"= host+network {:.1} + decode {:.1} + display {:.1}",
|
||||||
|
stats.hostnet_ms, stats.decode_ms, present.display_p50_ms
|
||||||
);
|
);
|
||||||
let mut session_bits: Vec<String> = Vec::new();
|
let mut session_bits: Vec<String> = Vec::new();
|
||||||
if !host.is_empty() {
|
if !host.is_empty() {
|
||||||
session_bits.push(host.to_string());
|
session_bits.push(host.to_string());
|
||||||
}
|
}
|
||||||
|
// `lost` = unrecoverable network drops (session-cumulative); `skipped` = the render thread's
|
||||||
|
// newest-wins drops last window (expected when the stream outpaces the display).
|
||||||
session_bits.push(fmt_uptime(stats.uptime_secs));
|
session_bits.push(fmt_uptime(stats.uptime_secs));
|
||||||
session_bits.push(format!("{} lost", stats.dropped));
|
session_bits.push(format!("{} lost", stats.dropped));
|
||||||
if skipped > 0 {
|
if present.skipped > 0 {
|
||||||
session_bits.push(format!("{skipped} skipped"));
|
session_bits.push(format!("{} skipped", present.skipped));
|
||||||
}
|
}
|
||||||
let session_line = session_bits.join(" \u{00B7} ");
|
let session_line = session_bits.join(" \u{00B7} ");
|
||||||
let hint = if hud.captured {
|
let hint = if hud.captured {
|
||||||
@@ -228,7 +244,8 @@ fn hud_overlay(hud: &HudSample, mode: Option<Mode>, host: &str) -> Element {
|
|||||||
vstack((
|
vstack((
|
||||||
hstack(chips).spacing(6.0),
|
hstack(chips).spacing(6.0),
|
||||||
dim(&stream_line),
|
dim(&stream_line),
|
||||||
dim(&glass_line),
|
dim(&e2e_line),
|
||||||
|
dim(&stage_line),
|
||||||
dim(&session_line),
|
dim(&session_line),
|
||||||
text_block(hint)
|
text_block(hint)
|
||||||
.font_size(11.0)
|
.font_size(11.0)
|
||||||
|
|||||||
@@ -241,8 +241,8 @@ fn run_headless_cli(args: &[String], identity: (String, String)) {
|
|||||||
session::SessionEvent::Stats(s) => tracing::info!(
|
session::SessionEvent::Stats(s) => tracing::info!(
|
||||||
fps = format!("{:.0}", s.fps),
|
fps = format!("{:.0}", s.fps),
|
||||||
mbps = format!("{:.1}", s.mbps),
|
mbps = format!("{:.1}", s.mbps),
|
||||||
decode_ms = format!("{:.2}", s.decode_ms),
|
decode_p50_ms = format!("{:.2}", s.decode_ms),
|
||||||
lat_ms = format!("{:.2}", s.latency_ms),
|
hostnet_p50_ms = format!("{:.2}", s.hostnet_ms),
|
||||||
frames_seen,
|
frames_seen,
|
||||||
"stats"
|
"stats"
|
||||||
),
|
),
|
||||||
|
|||||||
@@ -10,27 +10,46 @@
|
|||||||
//! draw (and redraws the held frame after a resize — fresh back buffers are blank).
|
//! draw (and redraws the held frame after a resize — fresh back buffers are blank).
|
||||||
|
|
||||||
use crate::present::Presenter;
|
use crate::present::Presenter;
|
||||||
use crate::session::FrameRx;
|
use crate::session::{FrameRx, FrameTimes};
|
||||||
use crossbeam_channel::RecvTimeoutError;
|
use crossbeam_channel::RecvTimeoutError;
|
||||||
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
|
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use std::time::{Duration, Instant};
|
use std::time::{Duration, Instant};
|
||||||
|
|
||||||
/// The last 1-second render window, published for the HUD (one render thread at a time):
|
/// The last 1-second render window, published for the HUD (one render thread at a time):
|
||||||
/// presents/s, frames skipped by the newest-wins drain, and the capture→presented p50 in µs.
|
/// presents/s, frames skipped by the newest-wins drain, the end-to-end (capture→on-glass)
|
||||||
|
/// p50/p95 and the `display` stage (decoded→displayed) p50, all stamped post-`Present()`, in µs.
|
||||||
/// Zeroed when a render thread starts so a new session never shows the previous one's numbers.
|
/// Zeroed when a render thread starts so a new session never shows the previous one's numbers.
|
||||||
static PRESENT_FPS: AtomicU32 = AtomicU32::new(0);
|
static PRESENT_FPS: AtomicU32 = AtomicU32::new(0);
|
||||||
static PRESENT_SKIPPED: AtomicU32 = AtomicU32::new(0);
|
static PRESENT_SKIPPED: AtomicU32 = AtomicU32::new(0);
|
||||||
static PRESENT_P50_US: AtomicU64 = AtomicU64::new(0);
|
static E2E_P50_US: AtomicU64 = AtomicU64::new(0);
|
||||||
|
static E2E_P95_US: AtomicU64 = AtomicU64::new(0);
|
||||||
|
static DISPLAY_P50_US: AtomicU64 = AtomicU64::new(0);
|
||||||
|
|
||||||
/// `(presents/s, skipped/s, capture→presented p50 ms)` of the last render window — the HUD's
|
/// The last render window's glass-side numbers (see the statics above) — the HUD's headline
|
||||||
/// display-side line.
|
/// (end-to-end) and trailing stage (display) come from here.
|
||||||
pub fn present_stats() -> (u32, u32, f32) {
|
#[derive(Clone, Copy, Default, PartialEq)]
|
||||||
(
|
pub struct PresentStats {
|
||||||
PRESENT_FPS.load(Ordering::Relaxed),
|
/// Presents per second (includes resize redraws of a held frame).
|
||||||
PRESENT_SKIPPED.load(Ordering::Relaxed),
|
pub fps: u32,
|
||||||
PRESENT_P50_US.load(Ordering::Relaxed) as f32 / 1000.0,
|
/// Frames dropped by the newest-wins drain this window (client-side pacing skips).
|
||||||
)
|
pub skipped: u32,
|
||||||
|
/// End-to-end capture→displayed p50, ms (host-clock corrected, measured directly).
|
||||||
|
pub e2e_p50_ms: f32,
|
||||||
|
/// End-to-end capture→displayed p95, ms.
|
||||||
|
pub e2e_p95_ms: f32,
|
||||||
|
/// `display` stage p50, ms: decoded → displayed, single-clock client-local.
|
||||||
|
pub display_p50_ms: f32,
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn present_stats() -> PresentStats {
|
||||||
|
PresentStats {
|
||||||
|
fps: PRESENT_FPS.load(Ordering::Relaxed),
|
||||||
|
skipped: PRESENT_SKIPPED.load(Ordering::Relaxed),
|
||||||
|
e2e_p50_ms: E2E_P50_US.load(Ordering::Relaxed) as f32 / 1000.0,
|
||||||
|
e2e_p95_ms: E2E_P95_US.load(Ordering::Relaxed) as f32 / 1000.0,
|
||||||
|
display_p50_ms: DISPLAY_P50_US.load(Ordering::Relaxed) as f32 / 1000.0,
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// UI-thread → render-thread state. Size is packed into ONE atomic (w<<32|h) so a resize never
|
/// UI-thread → render-thread state. Size is packed into ONE atomic (w<<32|h) so a resize never
|
||||||
@@ -101,8 +120,9 @@ impl Drop for RenderThread {
|
|||||||
struct SendPresenter(Presenter);
|
struct SendPresenter(Presenter);
|
||||||
unsafe impl Send for SendPresenter {}
|
unsafe impl Send for SendPresenter {}
|
||||||
|
|
||||||
/// Spawn the render thread. `frames` carries `(frame, capture pts_ns)`; `clock_offset_ns` maps our
|
/// Spawn the render thread. `frames` carries `(frame, FrameTimes)`; `clock_offset_ns` maps our
|
||||||
/// wall clock onto the host's so the logged present latency is end-to-end (same math as the pump).
|
/// wall clock onto the host's so the end-to-end (capture→on-glass) number is cross-machine valid
|
||||||
|
/// (same math as the pump's host+network stage).
|
||||||
pub fn spawn(
|
pub fn spawn(
|
||||||
presenter: Presenter,
|
presenter: Presenter,
|
||||||
frames: FrameRx,
|
frames: FrameRx,
|
||||||
@@ -147,12 +167,17 @@ fn run(presenter: SendPresenter, frames: FrameRx, shared: Arc<RenderShared>, clo
|
|||||||
let mut applied = (0u32, 0u32, 0u32); // last (w, h, dpi) handed to the presenter
|
let mut applied = (0u32, 0u32, 0u32); // last (w, h, dpi) handed to the presenter
|
||||||
let mut presented = 0u32;
|
let mut presented = 0u32;
|
||||||
let mut dropped = 0u32;
|
let mut dropped = 0u32;
|
||||||
let mut lat_us: Vec<u64> = Vec::with_capacity(256);
|
// 1 s tumbling windows: end-to-end (capture→displayed) and the display stage
|
||||||
|
// (decoded→displayed), sampled post-Present. Percentiles only (spec: stats-unification.md).
|
||||||
|
let mut e2e_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
let mut display_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
let mut window_start = Instant::now();
|
let mut window_start = Instant::now();
|
||||||
let mut last_dpi_poll = Instant::now();
|
let mut last_dpi_poll = Instant::now();
|
||||||
PRESENT_FPS.store(0, Ordering::Relaxed);
|
PRESENT_FPS.store(0, Ordering::Relaxed);
|
||||||
PRESENT_SKIPPED.store(0, Ordering::Relaxed);
|
PRESENT_SKIPPED.store(0, Ordering::Relaxed);
|
||||||
PRESENT_P50_US.store(0, Ordering::Relaxed);
|
E2E_P50_US.store(0, Ordering::Relaxed);
|
||||||
|
E2E_P95_US.store(0, Ordering::Relaxed);
|
||||||
|
DISPLAY_P50_US.store(0, Ordering::Relaxed);
|
||||||
|
|
||||||
loop {
|
loop {
|
||||||
if shared.stop.load(Ordering::SeqCst) {
|
if shared.stop.load(Ordering::SeqCst) {
|
||||||
@@ -198,29 +223,55 @@ fn run(presenter: SendPresenter, frames: FrameRx, shared: Arc<RenderShared>, clo
|
|||||||
p.set_hdr_metadata(meta);
|
p.set_hdr_metadata(meta);
|
||||||
}
|
}
|
||||||
|
|
||||||
let pts_ns = newest.as_ref().map(|(_, pts)| *pts);
|
let times: Option<FrameTimes> = newest.as_ref().map(|(_, t)| *t);
|
||||||
p.present(newest.map(|(f, _)| f));
|
p.present(newest.map(|(f, _)| f));
|
||||||
presented += 1;
|
presented += 1;
|
||||||
if let Some(pts) = pts_ns {
|
if let Some(t) = times {
|
||||||
// Capture→presented, host-clock corrected — the glass-side companion to the pump's
|
// The `displayed` point: post-Present() on this thread (the honest best-effort
|
||||||
// capture→decoded p50.
|
// presentation instant on Windows — endpoint label `capture→on-glass`).
|
||||||
let lat = (now_ns() as i128 + clock_offset_ns as i128 - pts as i128).max(0) as u64;
|
let displayed_ns = now_ns();
|
||||||
if lat > 0 && lat < 10_000_000_000 {
|
// End-to-end = capture → displayed, host-clock corrected, measured directly
|
||||||
lat_us.push(lat / 1000);
|
// (never the sum of stage percentiles). Clamped (0, 10 s).
|
||||||
|
let e2e =
|
||||||
|
(displayed_ns as i128 + clock_offset_ns as i128 - t.pts_ns as i128).max(0) as u64;
|
||||||
|
if e2e > 0 && e2e < 10_000_000_000 {
|
||||||
|
e2e_us.push(e2e / 1000);
|
||||||
|
}
|
||||||
|
// `display` stage = decoded → displayed, single-clock client-local.
|
||||||
|
let disp = displayed_ns.saturating_sub(t.decoded_ns);
|
||||||
|
if disp < 10_000_000_000 {
|
||||||
|
display_us.push(disp / 1000);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if window_start.elapsed() >= Duration::from_secs(1) {
|
if window_start.elapsed() >= Duration::from_secs(1) {
|
||||||
lat_us.sort_unstable();
|
e2e_us.sort_unstable();
|
||||||
let p50 = lat_us.get(lat_us.len() / 2).copied().unwrap_or(0);
|
display_us.sort_unstable();
|
||||||
tracing::debug!(presented, dropped, present_p50_us = p50, "render window");
|
let p50 = |v: &[u64]| v.get(v.len() / 2).copied().unwrap_or(0);
|
||||||
|
// p95 = sorted[min(len*95/100, len-1)] — the empty-window case falls to 0 via `get`.
|
||||||
|
let p95 = |v: &[u64]| {
|
||||||
|
v.get((v.len() * 95 / 100).min(v.len().saturating_sub(1)))
|
||||||
|
.copied()
|
||||||
|
.unwrap_or(0)
|
||||||
|
};
|
||||||
|
tracing::debug!(
|
||||||
|
presented,
|
||||||
|
dropped,
|
||||||
|
e2e_p50_us = p50(&e2e_us),
|
||||||
|
e2e_p95_us = p95(&e2e_us),
|
||||||
|
display_p50_us = p50(&display_us),
|
||||||
|
"render window"
|
||||||
|
);
|
||||||
PRESENT_FPS.store(presented, Ordering::Relaxed);
|
PRESENT_FPS.store(presented, Ordering::Relaxed);
|
||||||
PRESENT_SKIPPED.store(dropped, Ordering::Relaxed);
|
PRESENT_SKIPPED.store(dropped, Ordering::Relaxed);
|
||||||
PRESENT_P50_US.store(p50, Ordering::Relaxed);
|
E2E_P50_US.store(p50(&e2e_us), Ordering::Relaxed);
|
||||||
|
E2E_P95_US.store(p95(&e2e_us), Ordering::Relaxed);
|
||||||
|
DISPLAY_P50_US.store(p50(&display_us), Ordering::Relaxed);
|
||||||
window_start = Instant::now();
|
window_start = Instant::now();
|
||||||
presented = 0;
|
presented = 0;
|
||||||
dropped = 0;
|
dropped = 0;
|
||||||
lat_us.clear();
|
e2e_us.clear();
|
||||||
|
display_us.clear();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
tracing::info!("render thread exiting");
|
tracing::info!("render thread exiting");
|
||||||
|
|||||||
@@ -46,11 +46,18 @@ pub struct SessionParams {
|
|||||||
|
|
||||||
#[derive(Clone, Copy, Default, PartialEq)]
|
#[derive(Clone, Copy, Default, PartialEq)]
|
||||||
pub struct Stats {
|
pub struct Stats {
|
||||||
|
/// AUs received (reassembled) per second — actual-elapsed-time denominator.
|
||||||
pub fps: f32,
|
pub fps: f32,
|
||||||
|
/// Received payload goodput (excludes FEC overhead).
|
||||||
pub mbps: f32,
|
pub mbps: f32,
|
||||||
|
/// `decode` stage p50 over the last 1 s window: received → decoded, client-local clock.
|
||||||
pub decode_ms: f32,
|
pub decode_ms: f32,
|
||||||
/// Median capture→decoded latency over the last window (host-clock corrected).
|
/// `host+network` stage p50 over the last 1 s window: capture (`pts_ns`) → received,
|
||||||
pub latency_ms: f32,
|
/// host-clock corrected via `clock_offset_ns`.
|
||||||
|
pub hostnet_ms: f32,
|
||||||
|
/// True when `clock_offset_ns == 0` (host didn't answer the skew handshake / same host) —
|
||||||
|
/// the HUD appends `(same-host clock)` to the end-to-end line.
|
||||||
|
pub same_host: bool,
|
||||||
/// True when decoding on the GPU (D3D11VA) vs. CPU (software).
|
/// True when decoding on the GPU (D3D11VA) vs. CPU (software).
|
||||||
pub hardware: bool,
|
pub hardware: bool,
|
||||||
/// True when the stream is BT.2020 PQ HDR10 (last decoded frame).
|
/// True when the stream is BT.2020 PQ HDR10 (last decoded frame).
|
||||||
@@ -81,9 +88,19 @@ pub enum SessionEvent {
|
|||||||
Stats(Stats),
|
Stats(Stats),
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Decoded frames + their host-capture `pts_ns`, session pump → render thread (crossbeam so that
|
/// Per-frame measurement points carried with a decoded frame to the render thread: the host
|
||||||
|
/// capture clock (`pts_ns`) and our local `decoded` stamp (wall-clock ns). Post-`Present()` the
|
||||||
|
/// render thread derives the `display` stage (displayed − decoded, single-clock) and the
|
||||||
|
/// end-to-end headline (displayed + clock_offset − pts) from them.
|
||||||
|
#[derive(Clone, Copy)]
|
||||||
|
pub struct FrameTimes {
|
||||||
|
pub pts_ns: u64,
|
||||||
|
pub decoded_ns: u64,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Decoded frames + their measurement points, session pump → render thread (crossbeam so that
|
||||||
/// thread can block with a timeout — async-channel has no `recv_timeout`).
|
/// thread can block with a timeout — async-channel has no `recv_timeout`).
|
||||||
pub type FrameRx = crossbeam_channel::Receiver<(DecodedFrame, u64)>;
|
pub type FrameRx = crossbeam_channel::Receiver<(DecodedFrame, FrameTimes)>;
|
||||||
|
|
||||||
pub struct SessionHandle {
|
pub struct SessionHandle {
|
||||||
pub events: async_channel::Receiver<SessionEvent>,
|
pub events: async_channel::Receiver<SessionEvent>,
|
||||||
@@ -205,7 +222,7 @@ impl AudioDec {
|
|||||||
fn pump(
|
fn pump(
|
||||||
params: SessionParams,
|
params: SessionParams,
|
||||||
ev_tx: async_channel::Sender<SessionEvent>,
|
ev_tx: async_channel::Sender<SessionEvent>,
|
||||||
frame_tx: crossbeam_channel::Sender<(DecodedFrame, u64)>,
|
frame_tx: crossbeam_channel::Sender<(DecodedFrame, FrameTimes)>,
|
||||||
frame_rx: FrameRx,
|
frame_rx: FrameRx,
|
||||||
stop: Arc<AtomicBool>,
|
stop: Arc<AtomicBool>,
|
||||||
) {
|
) {
|
||||||
@@ -310,8 +327,9 @@ fn pump(
|
|||||||
let mut window_start = Instant::now();
|
let mut window_start = Instant::now();
|
||||||
let mut frames_n = 0u32;
|
let mut frames_n = 0u32;
|
||||||
let mut bytes_n = 0u64;
|
let mut bytes_n = 0u64;
|
||||||
let mut decode_us_sum = 0u64;
|
// 1 s tumbling stage windows (spec: design/stats-unification.md — percentiles, never means).
|
||||||
let mut lat_us: Vec<u64> = Vec::with_capacity(256);
|
let mut hostnet_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
|
let mut decode_us: Vec<u64> = Vec::with_capacity(256);
|
||||||
let mut pcm = vec![0f32; 5760 * channels as usize]; // scratch: max Opus frame (120 ms) × channels
|
let mut pcm = vec![0f32; 5760 * channels as usize]; // scratch: max Opus frame (120 ms) × channels
|
||||||
// Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs.
|
// Loss recovery: watch the host→client unrecoverable-drop count and ask for an IDR when it climbs.
|
||||||
let mut last_dropped = connector.frames_dropped();
|
let mut last_dropped = connector.frames_dropped();
|
||||||
@@ -323,7 +341,18 @@ fn pump(
|
|||||||
}
|
}
|
||||||
match connector.next_frame(Duration::from_millis(4)) {
|
match connector.next_frame(Duration::from_millis(4)) {
|
||||||
Ok(frame) => {
|
Ok(frame) => {
|
||||||
let t0 = Instant::now();
|
// The `received` point: AU fully reassembled, handed to us, before decode.
|
||||||
|
let received_ns = now_ns();
|
||||||
|
// fps = AUs received per second, Mb/s = received goodput (spec: counted at the
|
||||||
|
// received point, not the decoded one).
|
||||||
|
frames_n += 1;
|
||||||
|
bytes_n += frame.data.len() as u64;
|
||||||
|
// `host+network` stage: capture → received, host-clock corrected. Clamped (0, 10 s).
|
||||||
|
let hostnet = (received_ns as i128 + clock_offset as i128 - frame.pts_ns as i128)
|
||||||
|
.max(0) as u64;
|
||||||
|
if hostnet > 0 && hostnet < 10_000_000_000 {
|
||||||
|
hostnet_us.push(hostnet / 1000);
|
||||||
|
}
|
||||||
// A D3D11VA→software demotion (see `Decoder::decode`) starts a FRESH decoder that
|
// A D3D11VA→software demotion (see `Decoder::decode`) starts a FRESH decoder that
|
||||||
// has none of the stream's parameter sets; under infinite GOP it would sit on
|
// has none of the stream's parameter sets; under infinite GOP it would sit on
|
||||||
// "PPS id out of range" forever. Detect the transition and force a new IDR so the
|
// "PPS id out of range" forever. Detect the transition and force a new IDR so the
|
||||||
@@ -336,6 +365,8 @@ fn pump(
|
|||||||
}
|
}
|
||||||
match decoded {
|
match decoded {
|
||||||
Ok(Some(decoded)) => {
|
Ok(Some(decoded)) => {
|
||||||
|
// The `decoded` point: decoder output frame available.
|
||||||
|
let decoded_ns = now_ns();
|
||||||
total_frames += 1;
|
total_frames += 1;
|
||||||
hdr = decoded.hdr();
|
hdr = decoded.hdr();
|
||||||
// The backend can demote D3D11VA → software mid-session on a hardware error.
|
// The backend can demote D3D11VA → software mid-session on a hardware error.
|
||||||
@@ -350,19 +381,17 @@ fn pump(
|
|||||||
"first frame decoded"
|
"first frame decoded"
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
// Latency: our wall clock expressed in the host's capture clock,
|
// `decode` stage: received → decoded, single-clock client-local.
|
||||||
// minus the host-stamped capture pts (same math as client-rs).
|
decode_us.push(decoded_ns.saturating_sub(received_ns) / 1000);
|
||||||
let lat = (now_ns() as i128 + clock_offset as i128 - frame.pts_ns as i128)
|
|
||||||
.max(0) as u64;
|
|
||||||
if lat > 0 && lat < 10_000_000_000 {
|
|
||||||
lat_us.push(lat / 1000);
|
|
||||||
}
|
|
||||||
decode_us_sum += t0.elapsed().as_micros() as u64;
|
|
||||||
frames_n += 1;
|
|
||||||
bytes_n += frame.data.len() as u64;
|
|
||||||
// Newest wins: displace the oldest queued frame when the renderer lags.
|
// Newest wins: displace the oldest queued frame when the renderer lags.
|
||||||
if let Err(crossbeam_channel::TrySendError::Full(item)) =
|
if let Err(crossbeam_channel::TrySendError::Full(item)) =
|
||||||
frame_tx.try_send((decoded, frame.pts_ns))
|
frame_tx.try_send((
|
||||||
|
decoded,
|
||||||
|
FrameTimes {
|
||||||
|
pts_ns: frame.pts_ns,
|
||||||
|
decoded_ns,
|
||||||
|
},
|
||||||
|
))
|
||||||
{
|
{
|
||||||
let _ = frame_rx.try_recv();
|
let _ = frame_rx.try_recv();
|
||||||
let _ = frame_tx.try_send(item);
|
let _ = frame_tx.try_send(item);
|
||||||
@@ -413,23 +442,23 @@ fn pump(
|
|||||||
|
|
||||||
if window_start.elapsed() >= Duration::from_secs(1) {
|
if window_start.elapsed() >= Duration::from_secs(1) {
|
||||||
let secs = window_start.elapsed().as_secs_f32();
|
let secs = window_start.elapsed().as_secs_f32();
|
||||||
lat_us.sort_unstable();
|
hostnet_us.sort_unstable();
|
||||||
let p50 = lat_us.get(lat_us.len() / 2).copied().unwrap_or(0);
|
decode_us.sort_unstable();
|
||||||
|
let p50 = |v: &[u64]| v.get(v.len() / 2).copied().unwrap_or(0);
|
||||||
|
let (hostnet_p50, decode_p50) = (p50(&hostnet_us), p50(&decode_us));
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
fps = frames_n,
|
fps = frames_n,
|
||||||
lat_p50_us = p50,
|
hostnet_p50_us = hostnet_p50,
|
||||||
|
decode_p50_us = decode_p50,
|
||||||
total_frames,
|
total_frames,
|
||||||
"stream window"
|
"stream window"
|
||||||
);
|
);
|
||||||
let _ = ev_tx.try_send(SessionEvent::Stats(Stats {
|
let _ = ev_tx.try_send(SessionEvent::Stats(Stats {
|
||||||
fps: frames_n as f32 / secs,
|
fps: frames_n as f32 / secs,
|
||||||
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
mbps: bytes_n as f32 * 8.0 / 1e6 / secs,
|
||||||
decode_ms: if frames_n > 0 {
|
decode_ms: decode_p50 as f32 / 1000.0,
|
||||||
decode_us_sum as f32 / frames_n as f32 / 1000.0
|
hostnet_ms: hostnet_p50 as f32 / 1000.0,
|
||||||
} else {
|
same_host: clock_offset == 0,
|
||||||
0.0
|
|
||||||
},
|
|
||||||
latency_ms: p50 as f32 / 1000.0,
|
|
||||||
hardware,
|
hardware,
|
||||||
hdr,
|
hdr,
|
||||||
codec: connector.codec,
|
codec: connector.codec,
|
||||||
@@ -439,8 +468,8 @@ fn pump(
|
|||||||
window_start = Instant::now();
|
window_start = Instant::now();
|
||||||
frames_n = 0;
|
frames_n = 0;
|
||||||
bytes_n = 0;
|
bytes_n = 0;
|
||||||
decode_us_sum = 0;
|
hostnet_us.clear();
|
||||||
lat_us.clear();
|
decode_us.clear();
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
@@ -390,7 +390,7 @@ pub struct ProbeResult {
|
|||||||
/// `client → host`, right after [`Start`]: one round of the wall-clock skew handshake. The client
|
/// `client → host`, right after [`Start`]: one round of the wall-clock skew handshake. The client
|
||||||
/// stamps `t1_ns` (its monotonic-since-epoch clock) and sends; the host echoes it in [`ClockEcho`]
|
/// stamps `t1_ns` (its monotonic-since-epoch clock) and sends; the host echoes it in [`ClockEcho`]
|
||||||
/// with its own receive/send stamps. A few rounds let the client estimate the host↔client clock
|
/// with its own receive/send stamps. A few rounds let the client estimate the host↔client clock
|
||||||
/// offset, so the per-frame `capture→reassembled` latency (the AU `pts_ns` is the host's capture
|
/// offset, so the per-frame `capture→received` latency (the AU `pts_ns` is the host's capture
|
||||||
/// clock) is meaningful across machines, not just same-host. An old host ignores it (the client
|
/// clock) is meaningful across machines, not just same-host. An old host ignores it (the client
|
||||||
/// times out and assumes a shared clock).
|
/// times out and assumes a shared clock).
|
||||||
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||||
|
|||||||
@@ -0,0 +1,142 @@
|
|||||||
|
# Unified streaming stats — one vocabulary across every client
|
||||||
|
|
||||||
|
Status: spec agreed 2026-07-03; implementation tracked per client below.
|
||||||
|
User-facing companion: `docs-site/content/docs/stats.md` (overlay guide + Moonlight matrix).
|
||||||
|
|
||||||
|
## Why
|
||||||
|
|
||||||
|
Prospective users compare our numbers against Moonlight/Sunshine's overlay. Before this
|
||||||
|
spec the clients disagreed with each other (same concept, different name, different
|
||||||
|
measurement point, mean vs median, missing skew flags), and none of them made clear
|
||||||
|
which numbers are *sequential stages that add up* versus *overlapping absolutes*. The
|
||||||
|
Apple HUD showed three arrow-notation lines (`capture→client`, `capture→present`,
|
||||||
|
`decode→present`) that looked like a pipeline but overlapped, left the decode interval
|
||||||
|
invisible, and mixed two clock bases. Meanwhile Moonlight shows *only* disjoint
|
||||||
|
client-side segments and has **no end-to-end number at all** — so our headline
|
||||||
|
end-to-end figure, presented without context, reads "worse" against a Moonlight line
|
||||||
|
that measures a fraction of the chain.
|
||||||
|
|
||||||
|
## The model
|
||||||
|
|
||||||
|
Four measurement points per video frame. Every stat on every HUD is a difference of
|
||||||
|
two of these points — nothing else.
|
||||||
|
|
||||||
|
| point | meaning | who stamps it |
|
||||||
|
|---|---|---|
|
||||||
|
| **capture** | host capture clock: the per-AU `pts_ns`. Native path anchors at PipeWire frame delivery (host queue age is *inside* it); GameStream uses the RTP 90 kHz clock. | host |
|
||||||
|
| **received** | AU fully reassembled (post-FEC), handed to the client, before decode | client |
|
||||||
|
| **decoded** | decoder output frame available | client |
|
||||||
|
| **displayed** | best-effort presentation instant (see per-client endpoint table) | client |
|
||||||
|
|
||||||
|
Three **stages** tile the interval exactly (per frame, same timestamps, no gaps, no
|
||||||
|
overlap):
|
||||||
|
|
||||||
|
- `host+network` = capture → received. Contains the whole host pipeline
|
||||||
|
(queue/capture/convert/encode/pace) **plus** wire time and reassembly; it cannot be
|
||||||
|
split client-side today (see Phase 2).
|
||||||
|
- `decode` = received → decoded. Pure client-local; no clock skew involved.
|
||||||
|
- `display` = decoded → displayed. Pacing/queue wait + render + vsync. Client-local.
|
||||||
|
|
||||||
|
Headline: **`end-to-end`** = capture → displayed, **measured directly** (not the sum of
|
||||||
|
stage percentiles). Per frame the stages sum to it exactly; per window the p50s only
|
||||||
|
approximately sum (percentiles aren't additive) — the doc says so, the HUD shows the
|
||||||
|
equation with the directly-measured total on the left.
|
||||||
|
|
||||||
|
### Clock rules
|
||||||
|
|
||||||
|
- `end-to-end` and `host+network` use client `CLOCK_REALTIME` + `clock_offset_ns`
|
||||||
|
(the ClockProbe/ClockEcho skew handshake). Cross-machine valid when the offset was
|
||||||
|
measured.
|
||||||
|
- `decode` and `display` are single-clock client-local (offset irrelevant).
|
||||||
|
- When `clock_offset_ns == 0` (host didn't answer the handshake / same host), the HUD
|
||||||
|
appends **`(same-host clock)`** once, on the end-to-end line — every client, no
|
||||||
|
exceptions. Windows and Linux previously showed possibly-uncorrected numbers
|
||||||
|
silently; that was a bug in presentation.
|
||||||
|
|
||||||
|
### Aggregation rules
|
||||||
|
|
||||||
|
- **Window: 1 s tumbling** (drain and reset), HUD refresh 1×/s. (Moonlight uses a
|
||||||
|
1–2 s sliding window — close enough for comparison; documented in the matrix.)
|
||||||
|
- **Percentiles only, never means**: end-to-end shows `p50` and `p95`; stages show
|
||||||
|
`p50`. Windows/Linux previously showed *mean* decode time next to median latencies —
|
||||||
|
banned.
|
||||||
|
- Latency samples clamped to `(0, 10 s)` as before.
|
||||||
|
- `fps` = AUs received (reassembled) per second, actual-elapsed-time denominator.
|
||||||
|
- `Mb/s` = received payload bytes × 8 / elapsed (goodput, excludes FEC overhead —
|
||||||
|
same basis as Moonlight's bitrate tracker).
|
||||||
|
|
||||||
|
## Canonical HUD layout
|
||||||
|
|
||||||
|
Line order and label strings are normative; platform chips vary.
|
||||||
|
|
||||||
|
```
|
||||||
|
1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
|
||||||
|
end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
|
||||||
|
= host+network 9.8 + decode 2.1 + display 2.3
|
||||||
|
lost 3 (0.1%) · skipped 1 · FEC 12
|
||||||
|
```
|
||||||
|
|
||||||
|
- Line 1 — stream facts. `{W}×{H}@{Hz} · {fps} fps · {mbps} Mb/s` plus whatever
|
||||||
|
chips the platform already knows (codec, bit depth, HDR, GPU/CPU decode).
|
||||||
|
- Line 2 — the headline. `end-to-end {p50} ms p50 · {p95} p95 · capture→{endpoint}`
|
||||||
|
(+ ` (same-host clock)` when applicable). The endpoint suffix is honest per
|
||||||
|
platform — see the endpoint table. Never a bare "latency".
|
||||||
|
- Line 3 — the equation. `= host+network {a} + decode {b} + display {c}` (stage p50s,
|
||||||
|
ms, one decimal). A platform that can't measure a trailing stage drops the term and
|
||||||
|
the headline endpoint moves accordingly — the equation always tiles the headline
|
||||||
|
interval.
|
||||||
|
- Line 4 — counters, only rendered when any value is nonzero.
|
||||||
|
`lost` = unrecoverable network frame drops in the window (`{n} ({pct}%)`,
|
||||||
|
pct = lost/(received+lost)); `skipped` = client-side newest-wins/pacing drops;
|
||||||
|
`FEC` = shards recovered this window (proof FEC is earning its keep).
|
||||||
|
|
||||||
|
### Per-client endpoints (v1)
|
||||||
|
|
||||||
|
| client | displayed point | headline reads | equation terms |
|
||||||
|
|---|---|---|---|
|
||||||
|
| Apple stage-2 | display-link target present instant | `capture→on-glass` | host+network + decode + display |
|
||||||
|
| Apple stage-1 (fallback presenter) | n/a (opaque `AVSampleBufferDisplayLayer`) | `capture→received` | host+network only |
|
||||||
|
| Windows | post-`Present()` on the render thread | `capture→on-glass` | all three |
|
||||||
|
| Linux GTK | paintable-set (GTK adds ~1 compositor cycle after) | `capture→displayed` | all three |
|
||||||
|
| Android | MediaCodec output released to surface | `capture→decoded` (v1) | host+network + decode |
|
||||||
|
| probe (headless log) | n/a | `capture→received` (was "reassembled") | logs p50/p95/p99/max µs, whole-session — measurement tool, exempt from the 1 s window rule but uses the canonical point names |
|
||||||
|
|
||||||
|
Host web console (`stats_recorder`) keeps its own additive stage vocabulary
|
||||||
|
(`queue → capture → submit → encode → send`, p50/p99, 1–2 s windows) — operator
|
||||||
|
deep-dive tool, already additive/stacked, out of scope here except that its stage names
|
||||||
|
must stay consistent with the docs. The sum of the host stages is our analogue of
|
||||||
|
Sunshine's "host processing latency" (capture→send).
|
||||||
|
|
||||||
|
## Moonlight comparison (summary — full matrix in docs-site/content/docs/stats.md)
|
||||||
|
|
||||||
|
- Moonlight's latency lines are **disjoint client segments** (decode, queue delay,
|
||||||
|
render) plus Sunshine's host capture→send; nothing measures the wire, and there is
|
||||||
|
**no end-to-end line**. Our `end-to-end` must never be compared against any single
|
||||||
|
Moonlight line. Fair approximation:
|
||||||
|
`punktfunk end-to-end ≈ ML host processing latency + ~½ RTT + ML decoding time +
|
||||||
|
ML frame queue delay + ML rendering time`.
|
||||||
|
- Moonlight "Average network latency" = **ENet control-channel RTT** — not frame
|
||||||
|
latency; we intentionally have no equivalent line.
|
||||||
|
- Moonlight "Video stream … FPS" *includes inferred-lost frames* (host-rate estimate
|
||||||
|
from frame-number gaps); our `fps` counts received only — equal at ~0 loss.
|
||||||
|
- Moonlight decode/queue/render times are **means**; ours are p50s.
|
||||||
|
|
||||||
|
## Phase 2 (specced, not in v1): split `host+network`
|
||||||
|
|
||||||
|
Carry the host's capture→send duration per AU (host stamps it at send, e.g. a
|
||||||
|
varint-µs field in the AU header or a 0.1 ms u16 à la Sunshine's frame header). Client
|
||||||
|
then displays `host {x} + network {y}` instead of `host+network`, where
|
||||||
|
`network = (received − capture) − host_reported` — and the Moonlight matrix gains a
|
||||||
|
direct "Host processing latency" counterpart. Requires a core wire/ABI bump
|
||||||
|
(`punktfunk_frame` gains `host_latency_us`), trailing-byte back-compat like the
|
||||||
|
compositor/gamepad preference bytes. Also consider surfacing the QUIC path RTT
|
||||||
|
(quinn exposes it) as a diagnostics line, clearly labelled control-plane RTT.
|
||||||
|
|
||||||
|
## Implementation status
|
||||||
|
|
||||||
|
- [ ] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse)
|
||||||
|
- [ ] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95)
|
||||||
|
- [ ] Linux (`ui_stream.rs` OSD, `session.rs` window meters)
|
||||||
|
- [ ] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`)
|
||||||
|
- [ ] probe (rename `capture→reassembled` → `capture→received` in the log line)
|
||||||
|
- [ ] docs-site stats page + matrix; link from `moonlight.md`
|
||||||
@@ -26,6 +26,7 @@
|
|||||||
"host-cli",
|
"host-cli",
|
||||||
"---Troubleshooting---",
|
"---Troubleshooting---",
|
||||||
"troubleshooting",
|
"troubleshooting",
|
||||||
|
"stats",
|
||||||
"forgot-password",
|
"forgot-password",
|
||||||
"---Project---",
|
"---Project---",
|
||||||
"roadmap",
|
"roadmap",
|
||||||
|
|||||||
@@ -52,3 +52,6 @@ it. Mouse, keyboard, and controllers flow back to the host.
|
|||||||
clients](/docs/clients) have a built-in speed test; with Moonlight, set the bitrate manually.
|
clients](/docs/clients) have a built-in speed test; with Moonlight, set the bitrate manually.
|
||||||
- Moonlight uses the GameStream protocol, not punktfunk's native FEC/encryption extensions. On a
|
- Moonlight uses the GameStream protocol, not punktfunk's native FEC/encryption extensions. On a
|
||||||
solid LAN this is fine; on a lossy link a [native client](/docs/clients) holds up better.
|
solid LAN this is fine; on a lossy link a [native client](/docs/clients) holds up better.
|
||||||
|
- Comparing Moonlight's performance overlay with a punktfunk client's stats HUD? The numbers
|
||||||
|
measure different slices of the pipeline — see [Understanding the Stats Overlay](/docs/stats)
|
||||||
|
for a line-by-line comparison matrix before drawing conclusions.
|
||||||
|
|||||||
@@ -0,0 +1,131 @@
|
|||||||
|
---
|
||||||
|
title: Understanding the Stats Overlay
|
||||||
|
description: What every number in the punktfunk stats HUD means, and how to compare them fairly with Moonlight/Sunshine.
|
||||||
|
---
|
||||||
|
|
||||||
|
Every punktfunk client has an in-stream stats overlay. All clients use **the same
|
||||||
|
vocabulary, the same measurement points, and the same math**, so a number on your
|
||||||
|
phone means exactly what the same number means on your desktop.
|
||||||
|
|
||||||
|
## The four measurement points
|
||||||
|
|
||||||
|
Every latency figure is the time between two of these four points in a video frame's
|
||||||
|
life:
|
||||||
|
|
||||||
|
1. **capture** — the host grabs the frame from the (virtual) display. Stamped on the
|
||||||
|
host's clock and carried with the frame.
|
||||||
|
2. **received** — your client has fully received and reassembled the frame from the
|
||||||
|
network (after any FEC recovery), before decoding.
|
||||||
|
3. **decoded** — the video decoder has produced the picture.
|
||||||
|
4. **displayed** — the picture is handed to the screen (as close to "photons" as the
|
||||||
|
platform lets us measure).
|
||||||
|
|
||||||
|
## Reading the overlay
|
||||||
|
|
||||||
|
```
|
||||||
|
1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
|
||||||
|
end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
|
||||||
|
= host+network 9.8 + decode 2.1 + display 2.3
|
||||||
|
lost 3 (0.1%) · skipped 1 · FEC 12
|
||||||
|
```
|
||||||
|
|
||||||
|
- **Line 1 — the stream.** Resolution@refresh, frames received per second, and the
|
||||||
|
received video bitrate (goodput — FEC overhead not counted), plus codec details.
|
||||||
|
- **Line 2 — the headline.** `end-to-end` is the *directly measured* time from host
|
||||||
|
capture to the endpoint named at the end of the line (`capture→on-glass` here).
|
||||||
|
`p50` = the typical frame (median), `p95` = the slow outliers. This is the one
|
||||||
|
number that summarizes your stream.
|
||||||
|
- **Line 3 — where the time goes.** The three stages **tile the end-to-end interval**
|
||||||
|
— each starts where the previous one ends, so they add up to the headline:
|
||||||
|
- `host+network` — capture → received: the host's capture/encode/send pipeline
|
||||||
|
*plus* the network flight and reassembly, in one number.
|
||||||
|
- `decode` — received → decoded, on your device.
|
||||||
|
- `display` — decoded → displayed: waiting for the right screen refresh, rendering,
|
||||||
|
and vsync.
|
||||||
|
|
||||||
|
(Stage values are per-stage medians, so they sum only *approximately* to the
|
||||||
|
headline median — percentiles aren't perfectly additive. The headline is measured
|
||||||
|
directly, never computed as a sum.)
|
||||||
|
- **Line 4 — reliability** (only shown when something is nonzero). `lost` = frames the
|
||||||
|
network dropped beyond FEC's ability to recover; `skipped` = frames your client
|
||||||
|
chose not to display because a newer one had already arrived; `FEC` = packet shards
|
||||||
|
the error correction recovered this second (loss that you *didn't* feel).
|
||||||
|
|
||||||
|
All values refresh once per second over the last second of frames.
|
||||||
|
|
||||||
|
### Clocks, and the `(same-host clock)` tag
|
||||||
|
|
||||||
|
`end-to-end` and `host+network` span two machines, so they need the two clocks to
|
||||||
|
agree: at connect, the client runs an NTP-style handshake with the host and corrects
|
||||||
|
for the measured clock offset. If that handshake wasn't possible, the overlay appends
|
||||||
|
**`(same-host clock)`** — the numbers are then only trustworthy when client and host
|
||||||
|
run on the same machine. `decode` and `display` are single-machine measurements and
|
||||||
|
are always exact.
|
||||||
|
|
||||||
|
### What each platform can measure
|
||||||
|
|
||||||
|
Not every platform exposes a true "displayed" instant, so the headline's endpoint is
|
||||||
|
always spelled out rather than pretending:
|
||||||
|
|
||||||
|
| client | headline | why |
|
||||||
|
|---|---|---|
|
||||||
|
| Windows, macOS/iOS (Metal presenter), Linux | `capture→on-glass` / `capture→displayed` | present instant available (GTK measures at hand-off to the compositor, which adds about one compositor cycle after it) |
|
||||||
|
| Android | `capture→decoded` | the display hand-off happens inside MediaCodec/SurfaceView where precise present timing isn't exposed |
|
||||||
|
| macOS/iOS fallback presenter | `capture→received` | the system video layer hides decode and present timing entirely |
|
||||||
|
|
||||||
|
A shorter chain means the number is **smaller because it measures less** — check the
|
||||||
|
endpoint before comparing two devices.
|
||||||
|
|
||||||
|
## Comparing with Moonlight / Sunshine
|
||||||
|
|
||||||
|
Moonlight's overlay and punktfunk's measure different slices of the pipeline, and the
|
||||||
|
single biggest difference is:
|
||||||
|
|
||||||
|
> **Moonlight has no end-to-end number.** Its overlay shows separate client-side
|
||||||
|
> segments (decode time, queue delay, render time) and — on Sunshine hosts — a
|
||||||
|
> host-side number. Nothing in Moonlight measures capture-to-glass, and nothing
|
||||||
|
> measures the network flight of video frames. punktfunk's `end-to-end` line has **no
|
||||||
|
> Moonlight counterpart** — never compare it against any single Moonlight line.
|
||||||
|
|
||||||
|
To compare fairly, reconstruct an approximate end-to-end from Moonlight's lines:
|
||||||
|
|
||||||
|
```
|
||||||
|
Moonlight ≈ host processing latency (avg)
|
||||||
|
+ ½ × average network latency
|
||||||
|
+ average decoding time
|
||||||
|
+ average frame queue delay
|
||||||
|
+ average rendering time
|
||||||
|
```
|
||||||
|
|
||||||
|
…and compare *that* against punktfunk's `end-to-end`. (It's still approximate:
|
||||||
|
Moonlight's segments are averages over a slightly different window, and the ½·RTT term
|
||||||
|
stands in for a one-way frame flight that Moonlight doesn't measure.)
|
||||||
|
|
||||||
|
### Line-by-line matrix
|
||||||
|
|
||||||
|
| Moonlight overlay line | What it actually measures | punktfunk equivalent | Comparable? |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `Video stream: WxH FPS` | Received **plus inferred-lost** frames/s (host-rate estimate from frame sequence gaps) | `fps` (line 1) | ≈ equal when loss is near zero; punktfunk counts received frames only |
|
||||||
|
| `Incoming frame rate from network` | Frames reassembled from the network per second | `fps` (line 1) | **Yes — direct** |
|
||||||
|
| `Decoding frame rate` (desktop only) | Frames leaving the decoder per second | not shown separately (equals `fps` unless the decoder is falling behind) | — |
|
||||||
|
| `Rendering frame rate` (desktop only) | Frames actually presented per second | `fps` minus `skipped` | Approximately |
|
||||||
|
| `Host processing latency min/max/avg` (Sunshine hosts) | Host capture → just-before-send, reported by Sunshine per frame | contained inside `host+network`; the host-side breakdown lives in the punktfunk web console (capture/encode/send stages) | Indirect — punktfunk's `host+network` additionally includes the network flight |
|
||||||
|
| `Frames dropped by your network connection` | Frame-sequence gaps ÷ total frames | `lost` (line 4) | **Yes — direct** |
|
||||||
|
| `Frames dropped due to network jitter` | Decoded frames the *client's pacer* chose to drop ÷ decoded frames | `skipped` (line 4) | Approximately (both are client-side pacing decisions, despite Moonlight's name) |
|
||||||
|
| `Average network latency` | The **control connection's round-trip time** (ENet RTT + variance) — not video frame latency | none, on purpose | **No.** An RTT is not a frame latency; punktfunk measures the actual per-frame path instead |
|
||||||
|
| `Average decoding time` | Mean time from decoder enqueue to picture out | `decode` (p50) | Yes (mean vs median; both include decoder queueing) |
|
||||||
|
| `Average frame queue delay` | Mean time a decoded frame waits for its vsync slot | inside `display` | Sum the two Moonlight lines → |
|
||||||
|
| `Average rendering time (incl. V-sync latency)` | Mean duration of the present call | inside `display` | …and compare against punktfunk's `display` |
|
||||||
|
| *(no equivalent)* | — | `end-to-end` — true capture→glass, clock-skew-corrected across machines | **punktfunk only** |
|
||||||
|
| *(no equivalent)* | — | `FEC` recovered shards (loss absorbed invisibly) | punktfunk only |
|
||||||
|
|
||||||
|
Other differences worth knowing when squinting at both overlays side by side:
|
||||||
|
|
||||||
|
- **Averages vs percentiles.** Moonlight's time values are means; punktfunk shows
|
||||||
|
medians (p50) with a p95 for the headline. Under jitter, a mean sits above the
|
||||||
|
median — Moonlight's numbers read slightly "worse" than an equivalent p50.
|
||||||
|
- **Windows.** Both refresh about once per second; Moonlight over a ~1–2 s sliding
|
||||||
|
window, punktfunk over the last full second.
|
||||||
|
- **Host frame rate.** Moonlight's headline FPS estimates what the *host* produced
|
||||||
|
(received + lost). punktfunk shows what your client actually received, and reports
|
||||||
|
loss separately.
|
||||||
@@ -89,7 +89,7 @@ Notable capabilities that have landed, newest first:
|
|||||||
limit), each with its own virtual output and encoder — e.g. stream the same desktop to a
|
limit), each with its own virtual output and encoder — e.g. stream the same desktop to a
|
||||||
laptop and a TV simultaneously.
|
laptop and a TV simultaneously.
|
||||||
- **Cross-machine latency HUD + wall-clock skew handshake.** A short NTP-style handshake
|
- **Cross-machine latency HUD + wall-clock skew handshake.** A short NTP-style handshake
|
||||||
aligns client and host clocks, making capture-to-reassembled latency valid across
|
aligns client and host clocks, making capture-to-received latency valid across
|
||||||
machines; the Apple client surfaces a skew-corrected capture-to-receipt p50/p95 in its
|
machines; the Apple client surfaces a skew-corrected capture-to-receipt p50/p95 in its
|
||||||
HUD.
|
HUD.
|
||||||
- **Native LAN auto-discovery.** Hosts advertise `_punktfunk._udp` over mDNS (with TXT
|
- **Native LAN auto-discovery.** Hosts advertise `_punktfunk._udp` over mDNS (with TXT
|
||||||
|
|||||||
@@ -8,7 +8,7 @@
|
|||||||
# scripts/bench/gpu-stream.sh 1920x1080x120 12 --update # (re)write scripts/bench/gpu-baseline.json
|
# scripts/bench/gpu-stream.sh 1920x1080x120 12 --update # (re)write scripts/bench/gpu-baseline.json
|
||||||
#
|
#
|
||||||
# Metrics (host PUNKTFUNK_PERF + client report): encode_us_p50/p99, tx_mbps, send_dropped, and the
|
# Metrics (host PUNKTFUNK_PERF + client report): encode_us_p50/p99, tx_mbps, send_dropped, and the
|
||||||
# client's capture→reassembled lat_p50/p95/p99_us. Lower is better for latency/encode/drops, higher
|
# client's capture→received lat_p50/p95/p99_us. Lower is better for latency/encode/drops, higher
|
||||||
# for throughput. Regressions are flagged ⚠ but the script exits 0 (gate decisions stay human).
|
# for throughput. Regressions are flagged ⚠ but the script exits 0 (gate decisions stay human).
|
||||||
set -uo pipefail
|
set -uo pipefail
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user