feat(clients): unified stats vocabulary across every client + Moonlight comparison docs

One stat model everywhere (design/stats-unification.md): four measurement points (capture/received/decoded/displayed), three stages that tile the interval exactly, and a HUD that shows the addition explicitly:

    end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
    = host+network 9.8 + decode 2.1 + display 2.3

This replaces each client's ad-hoc mix of:

- overlapping absolutes (the Apple HUD's three arrow lines that looked sequential but weren't),
- mean-vs-median decode times (Windows/Linux),
- missing same-host-clock flags (Windows/Linux), and
- three different names for the same capture→received measurement (probe's "reassembled", Apple/Android's "client", Windows/Linux's post-decode "lat").

Per client:

- Apple threads receivedNs through the VT decode via the frame refcon bit pattern so the decode stage exists at all (stage-1 fallback honestly degrades to a capture→received headline).
- Windows carries FrameTimes through the existing frame channel to the render thread and adds e2e p50/p95 post-Present.
- Linux stamps received at AU pop and rides decoded_ns on DecodedFrame to the paintable-set site.
- Android pairs receipt stamps with MediaCodec output buffers via the codec's pts round-trip (JNI stats array 14→16 doubles, indexes 0-13 unchanged).

fps now uniformly counts received AUs; lost/(received+lost) per window, hidden at zero.

docs-site gains "Understanding the Stats Overlay": what each line means, why the equation only approximately sums (percentiles), and a line-by-line Moonlight/Sunshine matrix, including that Moonlight has no end-to-end number and its "network latency" is an ENet control RTT, so punktfunk's headline must not be compared against any single Moonlight line.

Verified here: linux client + probe + core check/clippy/fmt green, android native cargo-ndk arm64 check green.
Pending: Windows CI + on-glass, swift test on the mac, on-device Android.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 21:01:29 +00:00
parent c7630ff5dc
commit 09a5957c6d
38 changed files with 1122 additions and 380 deletions
@@ -26,6 +26,7 @@
    "host-cli",
    "---Troubleshooting---",
    "troubleshooting",
+    "stats",
    "forgot-password",
    "---Project---",
    "roadmap",
@@ -52,3 +52,6 @@ it. Mouse, keyboard, and controllers flow back to the host.
  clients](/docs/clients) have a built-in speed test; with Moonlight, set the bitrate manually.
 - Moonlight uses the GameStream protocol, not punktfunk's native FEC/encryption extensions. On a
  solid LAN this is fine; on a lossy link a [native client](/docs/clients) holds up better.
+- Comparing Moonlight's performance overlay with a punktfunk client's stats HUD? The numbers
+  measure different slices of the pipeline — see [Understanding the Stats Overlay](/docs/stats)
+  for a line-by-line comparison matrix before drawing conclusions.
@@ -0,0 +1,131 @@
+---
+title: Understanding the Stats Overlay
+description: What every number in the punktfunk stats HUD means, and how to compare them fairly with Moonlight/Sunshine.
+---
+
+Every punktfunk client has an in-stream stats overlay. All clients use **the same
+vocabulary, the same measurement points, and the same math**, so a number on your
+phone means exactly what the same number means on your desktop.
+
+## The four measurement points
+
+Every latency figure is the time between two of these four points in a video frame's
+life:
+
+1. **capture** — the host grabs the frame from the (virtual) display. Stamped on the
+   host's clock and carried with the frame.
+2. **received** — your client has fully received and reassembled the frame from the
+   network (after any FEC recovery), before decoding.
+3. **decoded** — the video decoder has produced the picture.
+4. **displayed** — the picture is handed to the screen (as close to "photons" as the
+   platform lets us measure).
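
As a reading aid (not part of the commit), the four points can be sketched as one per-frame record whose consecutive differences tile the full capture-to-displayed span. The `FrameStamps` class and its field names are hypothetical, and the sketch assumes the host-stamped capture time has already been translated onto the client's clock.

```python
from dataclasses import dataclass


@dataclass
class FrameStamps:
    """Hypothetical per-frame record; all fields in ns on the client's clock."""
    capture_ns: int    # assumed already corrected for host/client clock offset
    received_ns: int
    decoded_ns: int
    displayed_ns: int

    def stages_ns(self):
        # Consecutive pairs: each stage starts where the previous one ends.
        return (
            self.received_ns - self.capture_ns,    # capture -> received
            self.decoded_ns - self.received_ns,    # received -> decoded
            self.displayed_ns - self.decoded_ns,   # decoded -> displayed
        )

    def end_to_end_ns(self):
        return self.displayed_ns - self.capture_ns


# Made-up timestamps for a single frame.
frame = FrameStamps(0, 9_800_000, 11_900_000, 14_200_000)
host_network, decode, display = frame.stages_ns()
print(host_network / 1e6, decode / 1e6, display / 1e6)
```

For any single frame the three differences add up exactly to the end-to-end interval, because they share their endpoints.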
+
+## Reading the overlay
+
+```
+1920×1080@120 · 119 fps · 38.2 Mb/s · HEVC 10-bit HDR · GPU decode
+end-to-end 14.2 ms p50 · 19.8 p95 · capture→on-glass
+= host+network 9.8 + decode 2.1 + display 2.3
+lost 3 (0.1%) · skipped 1 · FEC 12
+```
+
+- **Line 1 — the stream.** Resolution@refresh, frames received per second, and the
+  received video bitrate (goodput — FEC overhead not counted), plus codec details.
+- **Line 2 — the headline.** `end-to-end` is the *directly measured* time from host
+  capture to the endpoint named at the end of the line (`capture→on-glass` here).
+  `p50` is the median (the typical frame); `p95` is the value 95% of frames stay
+  under (the slow tail). This is the one number that summarizes your stream.
+- **Line 3 — where the time goes.** The three stages **tile the end-to-end interval**
+  — each starts where the previous one ends, so they add up to the headline:
+  - `host+network` — capture → received: the host's capture/encode/send pipeline
+    *plus* the network flight and reassembly, in one number.
+  - `decode` — received → decoded, on your device.
+  - `display` — decoded → displayed: waiting for the right screen refresh, rendering,
+    and vsync.
+
+  (Stage values are per-stage medians, so they sum only *approximately* to the
+  headline median — percentiles aren't perfectly additive. The headline is measured
+  directly, never computed as a sum.)
+- **Line 4 — reliability** (only shown when something is nonzero). `lost` = frames the
+  network dropped beyond FEC's ability to recover; `skipped` = frames your client
+  chose not to display because a newer one had already arrived; `FEC` = packet shards
+  the error correction recovered this second (loss that you *didn't* feel).
+
+All values refresh once per second and cover the last second of frames.
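
Why the stage medians only approximately add up to the headline median can be seen in a few lines. This is an illustrative sketch: the per-frame samples are invented, and `statistics.median` stands in for whatever percentile routine the clients actually use.

```python
from statistics import median

# Hypothetical per-frame stage times in ms (invented for illustration).
# Each tuple is one frame: (host+network, decode, display).
frames = [
    (9.0, 2.0, 4.0),
    (10.0, 3.0, 2.0),
    (14.0, 2.0, 2.0),
]

# Per frame, the stages tile the end-to-end interval exactly.
end_to_end = [hn + dec + disp for hn, dec, disp in frames]

# The headline is the median of the per-frame totals...
headline_p50 = median(end_to_end)
# ...while each stage value is the median of that stage alone.
stage_p50s = [median(stage) for stage in zip(*frames)]

print(f"headline p50:      {headline_p50:.1f} ms")
print(f"sum of stage p50s: {sum(stage_p50s):.1f} ms")
```

Every frame sums exactly, yet the two printed values differ, because each stage's median may come from a different frame.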
+
+### Clocks, and the `(same-host clock)` tag
+
+`end-to-end` and `host+network` span two machines, so they need the two clocks to
+agree: at connect, the client runs an NTP-style handshake with the host and corrects
+for the measured clock offset. If that handshake wasn't possible, the overlay appends
+**`(same-host clock)`** — the numbers are then only trustworthy when client and host
+run on the same machine. `decode` and `display` are measured on a single clock, so
+they need no offset correction and are valid either way.
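
The offset correction can be sketched with the textbook NTP four-timestamp estimate. This is an assumption about the general technique, not a description of punktfunk's actual handshake or wire format; the function and parameter names are hypothetical.

```python
def estimate_offset_ns(t0, t1, t2, t3):
    """Textbook NTP estimate of (host clock - client clock), in ns.

    t0: client send time (client clock)
    t1: host receive time (host clock)
    t2: host reply time (host clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) // 2
    rtt = (t3 - t0) - (t2 - t1)
    return offset, rtt


def capture_to_received_ns(capture_host_ns, received_client_ns, offset_ns):
    # Translate the host-stamped capture time onto the client's clock first.
    capture_on_client_clock = capture_host_ns - offset_ns
    return received_client_ns - capture_on_client_clock


# Made-up example: host clock 5 ms ahead of the client, 1 ms flight each way.
offset, rtt = estimate_offset_ns(
    t0=1_000_000, t1=7_000_000, t2=7_200_000, t3=3_200_000
)
latency = capture_to_received_ns(10_000_000, 14_800_000, offset)
print(offset, rtt, latency)
```

The estimate assumes the two network directions take equal time; any asymmetry shows up as error in the offset and therefore in every cross-machine figure.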
+
+### What each platform can measure
+
+Not every platform exposes a true "displayed" instant, so the headline always names
+the endpoint it actually measured:
+
+| client | headline | why |
+|---|---|---|
+| Windows, macOS/iOS (Metal presenter), Linux | `capture→on-glass` / `capture→displayed` | the present instant is available (GTK stamps the hand-off to the compositor; the compositor then adds about one more cycle before the frame is on glass) |
+| Android | `capture→decoded` | the display hand-off happens inside MediaCodec/SurfaceView where precise present timing isn't exposed |
+| macOS/iOS fallback presenter | `capture→received` | the system video layer hides decode and present timing entirely |
+
+A shorter chain means the number is **smaller because it measures less** — check the
+endpoint before comparing two devices.
+
+## Comparing with Moonlight / Sunshine
+
+Moonlight's overlay and punktfunk's measure different slices of the pipeline, and the
+single biggest difference is:
+
+> **Moonlight has no end-to-end number.** Its overlay shows separate client-side
+> segments (decode time, queue delay, render time) and — on Sunshine hosts — a
+> host-side number. Nothing in Moonlight measures capture-to-glass, and nothing
+> measures the network flight of video frames. punktfunk's `end-to-end` line has **no
+> Moonlight counterpart** — never compare it against any single Moonlight line.
+
+To compare fairly, reconstruct an approximate end-to-end from Moonlight's lines:
+
+```
+Moonlight ≈ host processing latency (avg)
+          + ½ × average network latency
+          + average decoding time
+          + average frame queue delay
+          + average rendering time
+```
+
+…and compare *that* against punktfunk's `end-to-end`. (It's still approximate:
+Moonlight's segments are averages over a slightly different window, and the ½·RTT term
+stands in for a one-way frame flight that Moonlight doesn't measure.)
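
The reconstruction above is plain arithmetic and can be written as a small helper. The function name, parameter names, and sample values are hypothetical; read the inputs off Moonlight's overlay by hand.

```python
def moonlight_end_to_end_ms(host_processing_avg, network_rtt_avg,
                            decode_avg, queue_delay_avg, render_avg):
    """Approximate capture-to-glass time from Moonlight's overlay averages (ms)."""
    # Half the control-channel RTT stands in for the one-way frame flight.
    return (host_processing_avg
            + 0.5 * network_rtt_avg
            + decode_avg
            + queue_delay_avg
            + render_avg)


# Made-up overlay readings, in ms.
estimate = moonlight_end_to_end_ms(
    host_processing_avg=4.0,
    network_rtt_avg=2.0,
    decode_avg=2.5,
    queue_delay_avg=1.0,
    render_avg=1.5,
)
print(f"Moonlight approx. end-to-end: {estimate:.1f} ms")
```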
+
+### Line-by-line matrix
+
+| Moonlight overlay line | What it actually measures | punktfunk equivalent | Comparable? |
+|---|---|---|---|
+| `Video stream: WxH FPS` | Received **plus inferred-lost** frames/s (host-rate estimate from frame sequence gaps) | `fps` (line 1) | ≈ equal when loss is near zero; punktfunk counts received frames only |
+| `Incoming frame rate from network` | Frames reassembled from the network per second | `fps` (line 1) | **Yes — direct** |
+| `Decoding frame rate` (desktop only) | Frames leaving the decoder per second | not shown separately (equals `fps` unless the decoder is falling behind) | — |
+| `Rendering frame rate` (desktop only) | Frames actually presented per second | `fps` minus `skipped` | Approximately |
+| `Host processing latency min/max/avg` (Sunshine hosts) | Host capture → just-before-send, reported by Sunshine per frame | contained inside `host+network`; the host-side breakdown lives in the punktfunk web console (capture/encode/send stages) | Indirect — punktfunk's `host+network` additionally includes the network flight |
+| `Frames dropped by your network connection` | Frame-sequence gaps ÷ total frames | `lost` (line 4) | **Yes — direct** |
+| `Frames dropped due to network jitter` | Decoded frames the *client's pacer* chose to drop ÷ decoded frames | `skipped` (line 4) | Approximately (both are client-side pacing decisions, despite Moonlight's name) |
+| `Average network latency` | The **control connection's round-trip time** (ENet RTT + variance) — not video frame latency | none, on purpose | **No.** An RTT is not a frame latency; punktfunk measures the actual per-frame path instead |
+| `Average decoding time` | Mean time from decoder enqueue to picture out | `decode` (p50) | Yes (mean vs median; both include decoder queueing) |
+| `Average frame queue delay` | Mean time a decoded frame waits for its vsync slot | inside `display` | Sum the two Moonlight lines → |
+| `Average rendering time (incl. V-sync latency)` | Mean duration of the present call | inside `display` | …and compare against punktfunk's `display` |
+| *(no equivalent)* | — | `end-to-end` — true capture→glass, clock-skew-corrected across machines | **punktfunk only** |
+| *(no equivalent)* | — | `FEC` recovered shards (loss absorbed invisibly) | punktfunk only |
+
+Other differences worth knowing when squinting at both overlays side by side:
+
+- **Averages vs percentiles.** Moonlight's time values are means; punktfunk shows
+  medians (p50) with a p95 for the headline. Under jitter, a few slow frames pull
+  the mean above the median, so Moonlight's numbers read slightly "worse" than an
+  equivalent p50.
+- **Measurement windows.** Both refresh about once per second; Moonlight averages
+  over a ~1–2 s sliding window, punktfunk over the last full second.
+- **Host frame rate.** Moonlight's headline FPS estimates what the *host* produced
+  (received + lost). punktfunk shows what your client actually received, and reports
+  loss separately.
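
Two of the differences above can be made concrete with invented numbers. This is a sketch only: the samples are hypothetical, and the loss ratio follows the `lost/(received+lost)` definition given in the commit message.

```python
from statistics import mean, median

# Hypothetical decode times in ms: four typical frames and one slow one.
decode_ms = [2.0, 2.0, 2.0, 2.0, 7.0]
moonlight_style = mean(decode_ms)    # an average is pulled up by the slow frame
punktfunk_style = median(decode_ms)  # the p50 ignores it

# Hypothetical one-second window of frame counts.
received, lost = 117, 3
host_rate_fps = received + lost      # what a received-plus-lost estimate shows
client_fps = received                # what a received-only counter shows
loss_pct = 100 * lost / (received + lost)

print(moonlight_style, punktfunk_style)
print(host_rate_fps, client_fps, f"{loss_pct:.1f}%")
```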
@@ -89,7 +89,7 @@ Notable capabilities that have landed, newest first:
  limit), each with its own virtual output and encoder — e.g. stream the same desktop to a
  laptop and a TV simultaneously.
 - **Cross-machine latency HUD + wall-clock skew handshake.** A short NTP-style handshake
-  aligns client and host clocks, making capture-to-reassembled latency valid across
+  aligns client and host clocks, making capture-to-received latency valid across
  machines; the Apple client surfaces a skew-corrected capture-to-receipt p50/p95 in its
  HUD.
 - **Native LAN auto-discovery.** Hosts advertise `_punktfunk._udp` over mDNS (with TXT