feat(clients): host/network split in every stats HUD (stats phase 2, client side)

Consumes the 0xCF host-timing plane (449a67c) on all four GUI clients: each
keeps a bounded pending ring of receipt samples keyed by pts, matches the
host's per-AU capture→sent reports against it, and the HUD equation becomes

  = host 3.1 + network 6.7 + decode 2.1 + display 2.3

falling back to the combined `= host+network …` term whenever no timing
matched the window (old host / datagram loss) — same total, one split
fewer, never a misleading zero. Apple additionally gains the split as the
only equation line under the stage-1 fallback presenter (receipt is
presenter-independent), a `nextHostTiming` wrapper with its own plane lock,
and a unit-tested `HostNetworkSplitter`; Android extends the JNI stats
array 16→18 doubles (0–15 unchanged); Windows/Linux thread the split
through `Stats` into the HUD and the headless/debug logs.

Docs updated: design/stats-unification.md Phase 2 → implemented (wire
format, fallback semantics), and the docs-site matrix's Sunshine "Host
processing latency" row is now a direct match (ours includes the paced
send; avg vs p50).

Verified here: linux client clippy -D warnings green on the live tree,
windows stub check + hand-verified diff, android cargo-ndk arm64 check
green, apple loopback test extended (needs the rebuilt xcframework + swift
test on the mac). On-glass: pending on all platforms.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
2026-07-03 21:31:49 +00:00
parent 8470419433
commit 69609945a3
19 changed files with 610 additions and 59 deletions
+34 -15
View File
@@ -121,22 +121,41 @@ Sunshine's "host processing latency" (capture→send).
from frame-number gaps); our `fps` counts received only — equal at ~0 loss.
- Moonlight decode/queue/render times are **means**; ours are p50s.
## Phase 2 (specced, not in v1): split `host+network`
## Phase 2 (implemented): split `host+network` via the 0xCF host-timing plane
Carry the host's capture→send duration per AU (host stamps it at send, e.g. a
varint-µs field in the AU header or a 0.1 ms u16 à la Sunshine's frame header). Client
then displays `host {x} + network {y}` instead of `host+network`, where
`network = (received capture) host_reported` — and the Moonlight matrix gains a
direct "Host processing latency" counterpart. Requires a core wire/ABI bump
(`punktfunk_frame` gains `host_latency_us`), trailing-byte back-compat like the
compositor/gamepad preference bytes. Also consider surfacing the QUIC path RTT
(quinn exposes it) as a diagnostics line, clearly labelled control-plane RTT.
Not an AU-header change after all — the hardened data-plane format stays untouched.
The host reports its share on the established QUIC side-plane pattern:
- **Cap bit**: `Hello::video_caps` gains `VIDEO_CAP_HOST_TIMING` (0x08). NativeClient
ORs it in unconditionally; the probe sets it explicitly. Old hosts ignore it.
- **Datagram**: `HOST_TIMING_MAGIC` 0xCF, 13 bytes — `[tag][pts_ns u64 LE][host_us
u32 LE]` (`quic::HostTiming`). Emitted once per AU by the send thread right after
the AU's last packet left the socket, so `host_us` = capture→fully-sent (capture
read/convert, encode, FEC+seal, paced send) against the same anchor as the wire
pts. Speed-test filler (FLAG_PROBE) is skipped. The synthetic host emits it too
(loopback protocol tests cover the plane).
- **Client math**: correlate by `pts_ns` (a bounded pending ring of receipt samples),
`host = host_us`, `network = hostnet_sample host_us` (saturating) — the two terms
tile the `host+network` stage per frame by construction. Equation line becomes
`= host {x} + network {y} + decode + display`; when no 0xCF arrived in the window
(old host / all datagrams lost) it falls back to the combined `host+network` term.
- **Surfaces**: `NativeClient::next_host_timing()` (Rust clients), C ABI
`PunktfunkHostTiming` + `punktfunk_connection_next_host_timing` (Apple), probe log
line `host/network latency split` (host_p50/p95_us · net_p50/p95_us).
- This is our direct analogue of Sunshine's "host processing latency" — ours
additionally includes the paced send (theirs stops just before the UDP send).
Still open for a later phase: surfacing the QUIC path RTT (quinn exposes it) as a
diagnostics line, clearly labelled control-plane RTT.
## Implementation status
- [ ] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse)
- [ ] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95)
- [ ] Linux (`ui_stream.rs` OSD, `session.rs` window meters)
- [ ] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`)
- [ ] probe (rename `capture→reassembled``capture→received` in the log line)
- [ ] docs-site stats page + matrix; link from `moonlight.md`
- [x] Apple (`StreamHUDView`/`SessionModel`/`Stage2Pipeline` + `LatencyMeter` reuse) — 09a5957
- [x] Windows (`app/stream.rs` HUD rows, `session.rs`/`render.rs` meters → p50/p95) — 09a5957
- [x] Linux (`ui_stream.rs` OSD, `session.rs` window meters) — 09a5957
- [x] Android (`stats.rs`/`decode.rs` stage split, `StatsOverlay.kt`) — 09a5957
- [x] probe (rename `capture→reassembled` → `capture→received` in the log line) — 09a5957
- [x] docs-site stats page + matrix; link from `moonlight.md` — 09a5957
- [x] Phase 2 wire layer (0xCF + cap bit + NativeClient/ABI + host emission + probe split) — 449a67c
- [ ] Phase 2 client HUDs (host/network equation terms on Apple/Windows/Linux/Android)
- [ ] On-glass validation everywhere (Mac swift test + glass, Windows CI + glass, Android device, Linux glass)