feat(clients): unified stats vocabulary across every client + Moonlight comparison docs
One stat model everywhere (design/stats-unification.md): four measurement points (capture/received/decoded/displayed), three stages that tile the interval exactly, and a HUD that shows the addition explicitly — end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass = host+network 9.8 + decode 2.1 + display 2.3 replacing each client's ad-hoc mix of overlapping absolutes (the Apple HUD's three arrow lines that looked sequential but weren't), mean-vs-median decode times (Windows/Linux), missing same-host-clock flags (Windows/Linux), and three different names for the same capture→received measurement (probe's "reassembled", Apple/Android's "client", Windows/Linux's post-decode "lat"). Per client: Apple threads receivedNs through the VT decode via the frame refcon bit pattern so the decode stage exists at all (stage-1 fallback honestly degrades to a capture→received headline); Windows carries FrameTimes through the existing frame channel to the render thread and adds e2e p50/p95 post-Present; Linux stamps received at AU pop and rides decoded_ns on DecodedFrame to the paintable-set site; Android pairs receipt stamps with MediaCodec output buffers via the codec's pts round-trip (JNI stats array 14→16 doubles, indexes 0-13 unchanged). fps now uniformly counts received AUs; lost/(received+lost) per window, hidden at zero. docs-site gains "Understanding the Stats Overlay": what each line means, why the equation only approximately sums (percentiles), and a line-by-line Moonlight/Sunshine matrix — including that Moonlight has no end-to-end number and its "network latency" is an ENet control RTT, so punktfunk's headline must not be compared against any single Moonlight line. Verified here: linux client + probe + core check/clippy/fmt green, android native cargo-ndk arm64 check green. Pending: Windows CI + on-glass, swift test on the mac, on-device Android. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,142 @@
|
||||
# Unified streaming stats — one vocabulary across every client
|
||||
|
||||
Status: spec agreed 2026-07-03; implementation tracked per client below.
|
||||
User-facing companion: `docs-site/content/docs/stats.md` (overlay guide + Moonlight matrix).
|
||||
|
||||
## Why
|
||||
|
||||
Prospective users compare our numbers against Moonlight/Sunshine's overlay. Before this
|
||||
spec the clients disagreed with each other (same concept, different name, different
|
||||
measurement point, mean vs median, missing skew flags), and none of them made clear
|
||||
which numbers are *sequential stages that add up* versus *overlapping absolutes*. The
|
||||
Apple HUD showed three arrow-notation lines (`capture→client`, `capture→present`,
|
||||
`decode→present`) that looked like a pipeline but overlapped, left the decode interval
|
||||
invisible, and mixed two clock bases. Meanwhile Moonlight shows *only* disjoint
|
||||
client-side segments and has **no end-to-end number at all** — so our headline
|
||||
end-to-end figure, presented without context, reads "worse" against a Moonlight line
|
||||
that measures a fraction of the chain.
|
||||
|
||||
## The model
|
||||
|
||||
Four measurement points per video frame. Every stat on every HUD is a difference of
|
||||
two of these points — nothing else.
|
||||
|
||||
| point | meaning | who stamps it |
|
||||
|---|---|---|
|
||||
| **capture** | host capture clock: the per-AU `pts_ns`. Native path anchors at PipeWire frame delivery (host queue age is *inside* it); GameStream uses the RTP 90 kHz clock. | host |
|
||||
| **received** | AU fully reassembled (post-FEC), handed to the client, before decode | client |
|
||||
| **decoded** | decoder output frame available | client |
|
||||
| **displayed** | best-effort presentation instant (see per-client endpoint table) | client |
|
||||
|
||||
Three **stages** tile the interval exactly (per frame, same timestamps, no gaps, no
|
||||
overlap):
|
||||
|
||||
- `host+network` = capture → received. Contains the whole host pipeline
|
||||
(queue/capture/convert/encode/pace) **plus** wire time and reassembly; it cannot be
|
||||
split client-side today (see Phase 2).
|
||||
- `decode` = received → decoded. Pure client-local; no clock skew involved.
|
||||
- `display` = decoded → displayed. Pacing/queue wait + render + vsync. Client-local.
|
||||
|
||||
Headline: **`end-to-end`** = capture → displayed, **measured directly** (not the sum of
|
||||
stage percentiles). Per frame the stages sum to it exactly; per window the p50s only
|
||||
approximately sum (percentiles aren't additive) — the doc says so, the HUD shows the
|
||||
equation with the directly-measured total on the left.
|
||||
|
||||
### Clock rules
|
||||
|
||||
- `end-to-end` and `host+network` use client `CLOCK_REALTIME` + `clock_offset_ns`
|
||||
(the ClockProbe/ClockEcho skew handshake). Cross-machine valid when the offset was
|
||||
measured.
|
||||
- `decode` and `display` are single-clock client-local (offset irrelevant).
|
||||
- When `clock_offset_ns == 0` (host didn't answer the handshake / same host), the HUD
|
||||
appends **`(same-host clock)`** once, on the end-to-end line — every client, no
|
||||
exceptions. Windows and Linux previously showed possibly-uncorrected numbers
|
||||
silently; that was a bug in presentation.
|
||||
|
||||
### Aggregation rules
|
||||
|
||||
- **Window: 1 s tumbling** (drain and reset), HUD refresh 1×/s. (Moonlight uses a
|
||||
1–2 s sliding window — close enough for comparison; documented in the matrix.)
|
||||
- **Percentiles only, never means**: end-to-end shows `p50` and `p95`; stages show
|
||||
`p50`. Windows/Linux previously showed *mean* decode time next to median latencies —
|
||||
banned.
|
||||
- Latency samples clamped to `(0, 10 s)` as before.
|
||||
- `fps` = AUs received (reassembled) per second, actual-elapsed-time denominator.
|
||||
- `Mb/s` = received payload bytes × 8 / elapsed (goodput, excludes FEC overhead —
|
||||
same basis as Moonlight's bitrate tracker).
|
||||
|
||||
## Canonical HUD layout
|
||||
|
||||
Line order and label strings are normative; platform chips vary.
|
||||
|
||||
```
|
||||
1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
|
||||
end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
|
||||
= host+network 9.8 + decode 2.1 + display 2.3
|
||||
lost 3 (0.1%) · skipped 1 · FEC 12
|
||||
```
|
||||
|
||||
- Line 1 — stream facts. `{W}×{H}@{Hz} · {fps} fps · {mbps} Mb/s` plus whatever
|
||||
chips the platform already knows (codec, bit depth, HDR, GPU/CPU decode).
|
||||
- Line 2 — the headline. `end-to-end {p50} ms p50 · {p95} p95 · capture→{endpoint}`
|
||||
(+ ` (same-host clock)` when applicable). The endpoint suffix is honest per
|
||||
platform — see the endpoint table. Never a bare "latency".
|
||||
- Line 3 — the equation. `= host+network {a} + decode {b} + display {c}` (stage p50s,
|
||||
ms, one decimal). A platform that can't measure a trailing stage drops the term and
|
||||
the headline endpoint moves accordingly — the equation always tiles the headline
|
||||
interval.
|
||||
- Line 4 — counters, only rendered when any value is nonzero.
|
||||
`lost` = unrecoverable network frame drops in the window (`{n} ({pct}%)`,
|
||||
pct = lost/(received+lost)); `skipped` = client-side newest-wins/pacing drops;
|
||||
`FEC` = shards recovered this window (proof FEC is earning its keep).
|
||||
|
||||
### Per-client endpoints (v1)
|
||||
|
||||
| client | displayed point | headline reads | equation terms |
|
||||
|---|---|---|---|
|
||||
| Apple stage-2 | display-link target present instant | `capture→on-glass` | host+network + decode + display |
|
||||
| Apple stage-1 (fallback presenter) | n/a (opaque `AVSampleBufferDisplayLayer`) | `capture→received` | host+network only |
|
||||
| Windows | post-`Present()` on the render thread | `capture→on-glass` | all three |
|
||||
| Linux GTK | paintable-set (GTK adds ~1 compositor cycle after) | `capture→displayed` | all three |
|
||||
| Android | MediaCodec output released to surface | `capture→decoded` (v1) | host+network + decode |
|
||||
| probe (headless log) | n/a | `capture→received` (was "reassembled") | logs p50/p95/p99/max µs, whole-session — measurement tool, exempt from the 1 s window rule but uses the canonical point names |
|
||||
|
||||
Host web console (`stats_recorder`) keeps its own additive stage vocabulary
|
||||
(`queue → capture → submit → encode → send`, p50/p99, 1–2 s windows) — operator
|
||||
deep-dive tool, already additive/stacked, out of scope here except that its stage names
|
||||
must stay consistent with the docs. The sum of the host stages is our analogue of
|
||||
Sunshine's "host processing latency" (capture→send).
|
||||
|
||||
## Moonlight comparison (summary — full matrix in docs-site/content/docs/stats.md)
|
||||
|
||||
- Moonlight's latency lines are **disjoint client segments** (decode, queue delay,
|
||||
render) plus Sunshine's host capture→send; nothing measures the wire, and there is
|
||||
**no end-to-end line**. Our `end-to-end` must never be compared against any single
|
||||
Moonlight line. Fair approximation:
|
||||
`punktfunk end-to-end ≈ ML host processing latency + ~½ RTT + ML decoding time +
|
||||
ML frame queue delay + ML rendering time`.
|
||||
- Moonlight "Average network latency" = **ENet control-channel RTT** — not frame
|
||||
latency; we intentionally have no equivalent line.
|
||||
- Moonlight "Video stream … FPS" *includes inferred-lost frames* (host-rate estimate
|
||||
from frame-number gaps); our `fps` counts received only — equal at ~0 loss.
|
||||
- Moonlight decode/queue/render times are **means**; ours are p50s.
|
||||
|
||||
## Phase 2 (specced, not in v1): split `host+network`
|
||||
|
||||
Carry the host's capture→send duration per AU (host stamps it at send, e.g. a
|
||||
varint-µs field in the AU header or a 0.1 ms u16 à la Sunshine's frame header). Client
|
||||
then displays `host {x} + network {y}` instead of `host+network`, where
|
||||
`network = (received − capture) − host_reported` — and the Moonlight matrix gains a
|
||||
direct "Host processing latency" counterpart. Requires a core wire/ABI bump
|
||||
(`punktfunk_frame` gains `host_latency_us`), trailing-byte back-compat like the
|
||||
compositor/gamepad preference bytes. Also consider surfacing the QUIC path RTT
|
||||
(quinn exposes it) as a diagnostics line, clearly labelled control-plane RTT.
|
||||
|
||||
## Implementation status
|
||||
|
||||
- [ ] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse)
|
||||
- [ ] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95)
|
||||
- [ ] Linux (`ui_stream.rs` OSD, `session.rs` window meters)
|
||||
- [ ] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`)
|
||||
- [ ] probe (rename `capture→reassembled` → `capture→received` in the log line)
|
||||
- [ ] docs-site stats page + matrix; link from `moonlight.md`
|
||||
Reference in New Issue
Block a user