enricobuehler 09a5957c6d feat(clients): unified stats vocabulary across every client + Moonlight comparison docs

One stat model everywhere (design/stats-unification.md): four measurement
points (capture/received/decoded/displayed), three stages that tile the
interval exactly, and a HUD that shows the addition explicitly —

  end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
  = host+network 9.8 + decode 2.1 + display 2.3

replacing each client's ad-hoc mix of overlapping absolutes (the Apple HUD's
three arrow lines that looked sequential but weren't), mean-vs-median decode
times (Windows/Linux), missing same-host-clock flags (Windows/Linux), and
three different names for the same capture→received measurement (probe's
"reassembled", Apple/Android's "client", Windows/Linux's post-decode "lat").

Per client: Apple threads receivedNs through the VT decode via the frame
refcon bit pattern so the decode stage exists at all (stage-1 fallback
honestly degrades to a capture→received headline); Windows carries
FrameTimes through the existing frame channel to the render thread and adds
e2e p50/p95 post-Present; Linux stamps received at AU pop and rides
decoded_ns on DecodedFrame to the paintable-set site; Android pairs receipt
stamps with MediaCodec output buffers via the codec's pts round-trip (JNI
stats array 14→16 doubles, indexes 0-13 unchanged). fps now uniformly counts
received AUs; lost/(received+lost) per window, hidden at zero.

docs-site gains "Understanding the Stats Overlay": what each line means, why
the equation only approximately sums (percentiles), and a line-by-line
Moonlight/Sunshine matrix — including that Moonlight has no end-to-end
number and its "network latency" is an ENet control RTT, so punktfunk's
headline must not be compared against any single Moonlight line.

Verified here: linux client + probe + core check/clippy/fmt green, android
native cargo-ndk arm64 check green. Pending: Windows CI + on-glass, swift
test on the mac, on-device Android.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-07-03 21:01:29 +00:00


Unified streaming stats — one vocabulary across every client

Status: spec agreed 2026-07-03; implementation tracked per client below. User-facing companion: docs-site/content/docs/stats.md (overlay guide + Moonlight matrix).

Why

Prospective users compare our numbers against Moonlight/Sunshine's overlay. Before this spec the clients disagreed with each other (same concept, different name, different measurement point, mean vs median, missing skew flags), and none of them made clear which numbers are sequential stages that add up versus overlapping absolutes. The Apple HUD showed three arrow-notation lines (capture→client, capture→present, decode→present) that looked like a pipeline but overlapped, left the decode interval invisible, and mixed two clock bases. Meanwhile Moonlight shows only disjoint client-side segments and has no end-to-end number at all — so our headline end-to-end figure, presented without context, reads "worse" against a Moonlight line that measures a fraction of the chain.

The model

Four measurement points per video frame. Every stat on every HUD is a difference of two of these points — nothing else.

| point | meaning | who stamps it |
| --- | --- | --- |
| capture | host capture clock: the per-AU `pts_ns`. Native path anchors at PipeWire frame delivery (host queue age is inside it); GameStream uses the RTP 90 kHz clock. | host |
| received | AU fully reassembled (post-FEC), handed to the client, before decode | client |
| decoded | decoder output frame available | client |
| displayed | best-effort presentation instant (see per-client endpoint table) | client |

Three stages tile the interval exactly (per frame, same timestamps, no gaps, no overlap):

host+network = capture → received. Contains the whole host pipeline (queue/capture/convert/encode/pace) plus wire time and reassembly; it cannot be split client-side today (see Phase 2).
decode = received → decoded. Pure client-local; no clock skew involved.
display = decoded → displayed. Pacing/queue wait + render + vsync. Client-local.

Headline: end-to-end = capture → displayed, measured directly (not the sum of stage percentiles). Per frame the stages sum to it exactly; per window the stage p50s only approximately sum to the end-to-end p50, because percentiles aren't additive. The user-facing stats page says so, and the HUD shows the equation with the directly measured total on the left.
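
A minimal sketch of the model, for illustration only. `FramePoints`, `Stages` and `median` are hypothetical names, not the clients' actual types, and the sketch assumes all four points are already on one comparable timeline (the capture point has had the clock offset applied).

```rust
/// The four measurement points of one video frame, in nanoseconds.
/// Hypothetical type: assumes all points share one timeline.
#[derive(Clone, Copy, Debug)]
pub struct FramePoints {
    pub capture_ns: i64,
    pub received_ns: i64,
    pub decoded_ns: i64,
    pub displayed_ns: i64,
}

/// The three stages plus the directly measured headline.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Stages {
    pub host_network_ns: i64,
    pub decode_ns: i64,
    pub display_ns: i64,
    pub end_to_end_ns: i64,
}

impl FramePoints {
    /// Every stat is a difference of two points, nothing else.
    pub fn stages(&self) -> Stages {
        Stages {
            host_network_ns: self.received_ns - self.capture_ns,
            decode_ns: self.decoded_ns - self.received_ns,
            display_ns: self.displayed_ns - self.decoded_ns,
            // Measured directly, not summed from the stages.
            end_to_end_ns: self.displayed_ns - self.capture_ns,
        }
    }
}

/// Median of a non-empty sample set (upper median for even counts).
/// Illustration only; panics on an empty vector.
pub fn median(mut v: Vec<i64>) -> i64 {
    v.sort_unstable();
    v[v.len() / 2]
}
```

Per frame the three stage values always add up to `end_to_end_ns`. Across a window they need not: with three frames whose stages are (10, 2, 1), (8, 5, 2) and (12, 1, 6), the stage medians are 10, 2 and 2 (sum 14), while the median of the totals 13, 15 and 19 is 15.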

Clock rules

end-to-end and host+network use client CLOCK_REALTIME + clock_offset_ns (the ClockProbe/ClockEcho skew handshake). Cross-machine valid when the offset was measured.
decode and display are single-clock client-local (offset irrelevant).
When clock_offset_ns == 0 (host didn't answer the handshake / same host), the HUD appends (same-host clock) once, on the end-to-end line — every client, no exceptions. Windows and Linux previously showed possibly-uncorrected numbers silently; that was a bug in presentation.
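
The clock rules above could be sketched roughly as follows. Only `clock_offset_ns` comes from the spec; `SkewClock`, its methods and `endpoint_suffix` are hypothetical, and the exact spacing before `(same-host clock)` is an assumption.

```rust
/// Hypothetical wrapper around the skew measured by the
/// ClockProbe/ClockEcho handshake.
#[derive(Clone, Copy, Debug)]
pub struct SkewClock {
    /// Added to the client's CLOCK_REALTIME to land on the host timeline.
    /// Zero means "no offset measured" (no handshake answer, or same host).
    pub clock_offset_ns: i64,
}

impl SkewClock {
    /// capture -> (some client-side point), corrected for skew.
    /// Used for end-to-end and host+network only; decode and display
    /// are client-local differences and never touch the offset.
    pub fn since_capture_ns(&self, client_realtime_ns: i64, capture_pts_ns: i64) -> i64 {
        (client_realtime_ns + self.clock_offset_ns) - capture_pts_ns
    }

    /// True when the HUD must append the same-host flag.
    pub fn same_host_clock(&self) -> bool {
        self.clock_offset_ns == 0
    }
}

/// Tail of the end-to-end line: endpoint, plus the flag when the
/// offset is zero. The flag appears once, on this line only.
pub fn endpoint_suffix(endpoint: &str, clock: SkewClock) -> String {
    let mut s = format!("capture→{}", endpoint);
    if clock.same_host_clock() {
        s.push_str(" (same-host clock)");
    }
    s
}
```

For example, a client clock running 2 ms ahead of the host gives `clock_offset_ns = -2_000_000`, so a frame captured at host time 1.000 s and received at client time 1.012 s reads as 10 ms of host+network.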

Aggregation rules

Window: 1 s tumbling (drain and reset), HUD refresh 1×/s. (Moonlight uses a 1–2 s sliding window — close enough for comparison; documented in the matrix.)
Percentiles only, never means: end-to-end shows p50 and p95; stages show p50. Windows/Linux previously showed mean decode time next to median latencies — banned.
Latency samples clamped to (0, 10 s) as before.
fps = AUs received (reassembled) per second, actual-elapsed-time denominator.
Mb/s = received payload bytes × 8 / elapsed (goodput, excludes FEC overhead — same basis as Moonlight's bitrate tracker).
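
One way the aggregation rules could fit together, as a sketch rather than any client's actual meter. `Window`, `WindowReport` and the method names are hypothetical; the nearest-rank percentile method and the reading of Mb as 10^6 bits are assumptions the spec does not state.

```rust
use std::time::Duration;

const MAX_LATENCY_NS: i64 = 10_000_000_000; // 10 s

/// Keep only samples inside the open interval (0, 10 s).
pub fn clamp_sample(ns: i64) -> Option<i64> {
    (ns > 0 && ns < MAX_LATENCY_NS).then_some(ns)
}

/// Nearest-rank percentile (assumed method); sorts the slice in place.
pub fn percentile(samples: &mut [i64], p: f64) -> Option<i64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    Some(samples[rank.clamp(1, samples.len()) - 1])
}

/// One tumbling window: filled during the interval, drained once per refresh.
#[derive(Default)]
pub struct Window {
    e2e_ns: Vec<i64>,
    received_aus: u64,
    payload_bytes: u64,
}

pub struct WindowReport {
    pub e2e_p50_ns: Option<i64>,
    pub e2e_p95_ns: Option<i64>,
    pub fps: f64,
    pub mbps: f64,
}

impl Window {
    /// Count one fully reassembled AU and its payload (FEC overhead excluded).
    pub fn on_au_received(&mut self, payload_bytes: usize) {
        self.received_aus += 1;
        self.payload_bytes += payload_bytes as u64;
    }

    pub fn on_e2e_sample(&mut self, ns: i64) {
        if let Some(v) = clamp_sample(ns) {
            self.e2e_ns.push(v);
        }
    }

    /// Drain and reset. `elapsed` is the actual time since the last drain,
    /// not a nominal 1 s, so late HUD ticks do not inflate the rates.
    pub fn drain(&mut self, elapsed: Duration) -> WindowReport {
        let secs = elapsed.as_secs_f64();
        let mut samples = std::mem::take(&mut self.e2e_ns);
        let per_sec = |n: u64| if secs > 0.0 { n as f64 / secs } else { 0.0 };
        let report = WindowReport {
            e2e_p50_ns: percentile(&mut samples, 50.0),
            e2e_p95_ns: percentile(&mut samples, 95.0),
            fps: per_sec(self.received_aus),
            mbps: per_sec(self.payload_bytes * 8) / 1e6,
        };
        *self = Window::default();
        report
    }
}
```

The stage meters would follow the same pattern with a p50 only. Draining an empty window yields `None` percentiles, which a HUD could render as a placeholder instead of a stale value.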

Canonical HUD layout

Line order and label strings are normative; platform chips vary.

1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
= host+network 9.8 + decode 2.1 + display 2.3
lost 3 (0.1%) · skipped 1 · FEC 12

Line 1 — stream facts. {W}×{H}@{Hz} · {fps} fps · {mbps} Mb/s plus whatever chips the platform already knows (codec, bit depth, HDR, GPU/CPU decode).
Line 2 — the headline. end-to-end {p50} ms p50 · {p95} p95 · capture→{endpoint} (+ (same-host clock) when applicable). The endpoint suffix is honest per platform — see the endpoint table. Never a bare "latency".
Line 3 — the equation. = host+network {a} + decode {b} + display {c} (stage p50s, ms, one decimal). A platform that can't measure a trailing stage drops the term and the headline endpoint moves accordingly — the equation always tiles the headline interval.
Line 4 — counters, only rendered when any value is nonzero. lost = unrecoverable network frame drops in the window ({n} ({pct}%), pct = lost/(received+lost)); skipped = client-side newest-wins/pacing drops; FEC = shards recovered this window (proof FEC is earning its keep).
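
Lines 2 to 4 could be formatted along these lines. This is a sketch: `HudStats` and the three functions are hypothetical, only the label strings come from the layout above, and rendering all three counters whenever any one is nonzero is one reading of the line 4 rule.

```rust
/// Hypothetical per-window snapshot handed to the HUD.
pub struct HudStats {
    pub e2e_p50_ms: f64,
    pub e2e_p95_ms: f64,
    pub host_network_ms: f64,
    /// None when the platform cannot measure the stage.
    pub decode_ms: Option<f64>,
    pub display_ms: Option<f64>,
    /// "on-glass", "displayed", "decoded" or "received".
    pub endpoint: &'static str,
    pub same_host_clock: bool,
    pub received: u64,
    pub lost: u64,
    pub skipped: u64,
    pub fec_recovered: u64,
}

/// Line 2: the headline.
pub fn headline(s: &HudStats) -> String {
    let mut line = format!(
        "end-to-end {:.1} ms p50 · {:.1} p95 · capture→{}",
        s.e2e_p50_ms, s.e2e_p95_ms, s.endpoint
    );
    if s.same_host_clock {
        line.push_str(" (same-host clock)");
    }
    line
}

/// Line 3: the equation. Only trailing stages may be dropped,
/// so display is never shown without decode.
pub fn equation(s: &HudStats) -> String {
    let mut line = format!("= host+network {:.1}", s.host_network_ms);
    if let Some(decode) = s.decode_ms {
        line.push_str(&format!(" + decode {:.1}", decode));
        if let Some(display) = s.display_ms {
            line.push_str(&format!(" + display {:.1}", display));
        }
    }
    line
}

/// Line 4: counters, or None when every counter is zero.
pub fn counters(s: &HudStats) -> Option<String> {
    if s.lost == 0 && s.skipped == 0 && s.fec_recovered == 0 {
        return None;
    }
    let total = s.received + s.lost;
    let pct = if total > 0 { 100.0 * s.lost as f64 / total as f64 } else { 0.0 };
    Some(format!(
        "lost {} ({:.1}%) · skipped {} · FEC {}",
        s.lost, pct, s.skipped, s.fec_recovered
    ))
}
```

Keeping the stage terms as `Option` makes the "drop the trailing term" rule a property of the data: an Android v1 snapshot would carry `display_ms: None` and `endpoint: "decoded"`, and the equation still tiles the headline interval.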

Per-client endpoints (v1)

| client | displayed point | headline reads | equation terms |
| --- | --- | --- | --- |
| Apple stage-2 | display-link target present instant | `capture→on-glass` | host+network + decode + display |
| Apple stage-1 (fallback presenter) | n/a (opaque `AVSampleBufferDisplayLayer`) | `capture→received` | host+network only |
| Windows | post-`Present()` on the render thread | `capture→on-glass` | all three |
| Linux GTK | paintable-set (GTK adds ~1 compositor cycle after) | `capture→displayed` | all three |
| Android | MediaCodec output released to surface | `capture→decoded` (v1) | host+network + decode |
| probe (headless log) | n/a | `capture→received` (was "reassembled") | logs p50/p95/p99/max µs, whole-session; a measurement tool, exempt from the 1 s window rule but uses the canonical point names |

Host web console (stats_recorder) keeps its own additive stage vocabulary (queue → capture → submit → encode → send, p50/p99, 1–2 s windows) — operator deep-dive tool, already additive/stacked, out of scope here except that its stage names must stay consistent with the docs. The sum of the host stages is our analogue of Sunshine's "host processing latency" (capture→send).

Moonlight comparison (summary — full matrix in docs-site/content/docs/stats.md)

Moonlight's latency lines are disjoint client segments (decode, queue delay, render) plus Sunshine's host capture→send; nothing measures the wire, and there is no end-to-end line. Our end-to-end must never be compared against any single Moonlight line. Fair approximation: punktfunk end-to-end ≈ Moonlight host processing latency + ~½ RTT + Moonlight decoding time + Moonlight frame queue delay + Moonlight rendering time.
Moonlight "Average network latency" = ENet control-channel RTT — not frame latency; we intentionally have no equivalent line.
Moonlight "Video stream … FPS" includes inferred-lost frames (host-rate estimate from frame-number gaps); our fps counts received only — equal at ~0 loss.
Moonlight decode/queue/render times are means; ours are p50s.
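
The fair approximation above, written out as a small helper for anyone doing the comparison by hand. `MoonlightOverlay` and its field names are hypothetical labels for the values read off Moonlight's overlay, and the result stays approximate: the RTT is a control-channel figure and the Moonlight terms are means, not p50s.

```rust
/// Values read off Moonlight's overlay, all in milliseconds.
/// Hypothetical struct; field names are this sketch's own.
pub struct MoonlightOverlay {
    pub host_processing_ms: f64,
    /// "Average network latency": ENet control-channel RTT.
    pub rtt_ms: f64,
    pub decoding_ms: f64,
    pub frame_queue_delay_ms: f64,
    pub rendering_ms: f64,
}

/// Rough counterpart of the punktfunk end-to-end headline:
/// host processing + about half the RTT + decode + queue + render.
pub fn approx_end_to_end_ms(m: &MoonlightOverlay) -> f64 {
    m.host_processing_ms
        + m.rtt_ms / 2.0
        + m.decoding_ms
        + m.frame_queue_delay_ms
        + m.rendering_ms
}
```

With illustrative (not measured) inputs of 4.0, 6.0, 2.0, 1.0 and 0.5 ms, the helper returns 10.5 ms, the only figure that is reasonable to set next to an end-to-end p50.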

Phase 2 (specced, not in v1): split `host+network`

Carry the host's capture→send duration per AU (host stamps it at send, e.g. a varint-µs field in the AU header or a 0.1 ms u16 à la Sunshine's frame header). Client then displays host {x} + network {y} instead of host+network, where network = (received − capture) − host_reported — and the Moonlight matrix gains a direct "Host processing latency" counterpart. Requires a core wire/ABI bump (punktfunk_frame gains host_latency_us), trailing-byte back-compat like the compositor/gamepad preference bytes. Also consider surfacing the QUIC path RTT (quinn exposes it) as a diagnostics line, clearly labelled control-plane RTT.
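
A sketch of the client-side arithmetic for the split, assuming the 0.1 ms `u16` encoding option. The wire format is not decided; both function names are hypothetical, and only `host_latency_us` and the `network = (received − capture) − host_reported` formula come from the text above.

```rust
/// Decode a host-reported latency carried as tenths of a millisecond
/// (one of the two encodings under consideration) into microseconds.
pub fn host_latency_us_from_tenths_ms(raw: u16) -> i64 {
    i64::from(raw) * 100
}

/// Split capture→received into (host, network), all in microseconds.
/// `capture_us` and `received_us` must already be on one timeline,
/// i.e. the clock offset has been applied.
pub fn split_host_network_us(
    capture_us: i64,
    received_us: i64,
    host_latency_us: i64,
) -> (i64, i64) {
    let host_plus_network = received_us - capture_us;
    let network = host_plus_network - host_latency_us;
    (host_latency_us, network)
}
```

The two returned values still sum to the original host+network interval, so the equation line keeps tiling the headline after the split.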

Implementation status

Apple (StreamHUDView/SessionModel/Stage2Pipeline + LatencyMeter reuse)
Windows (app/stream.rs HUD rows, session.rs/render.rs meters → p50/p95)
Linux (ui_stream.rs OSD, session.rs window meters)
Android (stats.rs/decode.rs stage split, StatsOverlay.kt)
probe (rename capture→reassembled → capture→received in the log line)
docs-site stats page + matrix; link from moonlight.md
