feat(host): web-console performance capture — record stream stats, graph them

Arm streaming-perf-stats capture from the web console, play, stop, and review the
run as graphs; finished captures are saved to disk as browsable/exportable
recordings. Covers both the native punktfunk/1 path and GameStream.

- stats_recorder.rs: one shared Arc<StatsRecorder> ring (created in gamestream::serve,
  shared with the mgmt API + both streaming loops, mirroring NativePairing). The
  hot-path gate is a runtime AtomicBool that replaces the startup-only PUNKTFUNK_PERF
  for *recording* (PERF stdout logging unchanged); bounded ring (~3 h); atomic
  temp+rename writes to ~/.config/punktfunk/captures/*.json; path-traversal-safe ids;
  poison-resilient locks.
- native (punktfunk1.rs) + GameStream (stream.rs) emit a StatsSample at their existing
  ~2 s / ~1 s aggregation boundary — per-stage latency p50/p99, fps new/repeat, goodput,
  loss/FEC deltas — with no new per-frame work beyond the cheap atomic check.
  FrameMsg.was_measured keeps pre-arm in-flight frames out of the first window's
  percentiles (without zeroing the Windows-relay path's fps/encode).
- mgmt.rs: 7 bearer-only /api/v1/stats/* endpoints (capture start/stop/status/live;
  recordings list/get/delete); api/openapi.json regenerated, in sync.
- web: new "Performance" page (recharts, rendered SSR-safe) — capture control, live
  graphs while armed, recordings table (view / download-JSON / delete), and a detail
  view with the latency stacked-area bottleneck breakdown (p50/p99 toggle) + throughput
  + health. Charts adapt to either path's stage set.

Design: design/stats-capture-plan.md. Built and adversarially reviewed via a multi-agent
workflow; workspace build/clippy(-D warnings)/fmt/tests green, OpenAPI no-drift. Not yet
on-glass validated against a live session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:59:39 +00:00
parent 0a6c9d8852
commit 5bf787eb2b
20 changed files with 2907 additions and 53 deletions
@@ -0,0 +1,246 @@
# Stats capture & graphing — design
Goal: let an operator **enable performance-stats capture from the web console**, play a
session, **stop**, and **review the captured time-series as graphs** in the web console.
Captures are **saved to disk** (browse/compare past sessions; survive host restart) and
cover **both** streaming paths: native punktfunk/1 (`virtual_stream`) and GameStream/Moonlight
(`gamestream/stream.rs`).
This builds on the existing per-stage instrumentation (today gated by `PUNKTFUNK_PERF=1`,
stdout-only, read once at startup). We make recording **runtime-toggleable**, route the same
aggregates into a **shared ring → on-disk recording**, and expose it over the mgmt REST API +
web console.
---
## 1. Host: shared `StatsRecorder`
New module `crates/punktfunk-host/src/stats_recorder.rs`. One `Arc<StatsRecorder>` is created
once in the unified host entry (`gamestream::serve`, the `serve` subcommand) alongside
`Arc<NativePairing>`, and shared with **both** the mgmt API (`MgmtState`) and the streaming
loops (threaded through `punktfunk1::serve` → `SessionContext` → `virtual_stream`/`send_loop`,
and into the GameStream encode loop). Mirror the existing `NativePairing` Arc-sharing pattern
exactly.
### Data model (serde + utoipa `ToSchema`; this is the wire + on-disk shape)
```rust
use serde::{Deserialize, Serialize};
use utoipa::ToSchema;

/// One pipeline stage's latency in a window (microseconds).
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct StageTiming {
    pub name: String, // "capture" | "submit" | "encode" | "packetize" | "send"
    pub p50_us: f32,
    pub p99_us: f32,
}

/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct StatsSample {
    pub t_ms: u64,                // ms since capture start (monotonic, from a stored Instant)
    pub session_id: u32,          // disambiguates concurrent sessions (usually constant)
    pub stages: Vec<StageTiming>, // ordered pipeline stages for this path
    pub fps: f32,                 // genuine NEW frames/s from the source
    pub repeat_fps: f32,          // re-encoded holds/s (source-starvation indicator)
    pub mbps: f32,                // tx goodput (Mb/s)
    pub bitrate_kbps: u32,        // configured target bitrate
    pub frames_dropped: u32,      // delta in this window
    pub packets_dropped: u32,     // delta (receiver-side / reassembler), where known
    pub send_dropped: u32,        // delta (host send-buffer overflow / EAGAIN)
    pub fec_recovered: u32,       // delta (shards recovered)
}

/// Metadata for one capture; also what the recordings list returns.
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct CaptureMeta {
    pub id: String, // "2026-06-26T20-14-03Z_5120x1440" — also the filename stem
    pub started_unix_ms: u64,
    pub duration_ms: u64,
    pub kind: String, // "native" | "gamestream"
    pub width: u32,
    pub height: u32,
    pub fps: u32,
    pub codec: String,  // "h264" | "hevc" | "av1"
    pub client: String, // short label / fingerprint prefix, or "" if unknown
    pub sample_count: u32,
}

/// A full recording: metadata plus its samples (wire and on-disk shape).
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct Capture {
    pub meta: CaptureMeta,
    pub samples: Vec<StatsSample>,
}

/// Snapshot of the recorder state, returned by start/status.
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct StatsStatus {
    pub armed: bool,          // capture currently running
    pub sample_count: u32,    // samples in the in-progress capture
    pub started_unix_ms: u64, // 0 if idle
    pub kind: String,         // path of the in-progress capture, "" if idle
}
```
Stage sets per path (ordered, roughly the per-frame critical path so stacking is meaningful):
- **native**: `capture` (try_latest ring read + color convert), `submit` (NVENC enqueue),
`encode` (lock_bitstream = NVENC schedule + ASIC — the dominant stage under GPU load),
`send` (paced_submit: seal + FEC + pace + sendmmsg).
- **gamestream**: `capture`, `encode`, `packetize` (poll+FEC+packetize), `send`.
> Native naming: today's vectors are `st_cap`→`capture`, `st_submit`→`submit`,
> `st_wait`→`encode`, `pace_us`→`send`. (`encode_us` total ≈ capture+submit+encode; we do not
> emit it as a stage to avoid double-counting — it's implied by the stack.)
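The per-path stage ordering can be sketched as a small lookup plus a helper that zips measured percentiles onto the names. This is an illustrative sketch only: `stage_names`, `build_stages` and `stacked_p50_us` are hypothetical helpers, not functions from the host crate, and the local `StageTiming` mirrors the data model without its serde/utoipa derives.

```rust
/// Local mirror of the data-model struct (derives omitted for the sketch).
#[derive(Debug, Clone, PartialEq)]
struct StageTiming {
    name: String,
    p50_us: f32,
    p99_us: f32,
}

/// Ordered stage names per path, as listed in the design.
/// Unknown kinds yield an empty slice (an assumption of this sketch).
fn stage_names(kind: &str) -> &'static [&'static str] {
    match kind {
        "native" => &["capture", "submit", "encode", "send"],
        "gamestream" => &["capture", "encode", "packetize", "send"],
        _ => &[],
    }
}

/// Pair per-stage (p50, p99) values with the path's stage names, in order.
fn build_stages(kind: &str, timings: &[(f32, f32)]) -> Vec<StageTiming> {
    stage_names(kind)
        .iter()
        .zip(timings)
        .map(|(name, &(p50, p99))| StageTiming {
            name: (*name).to_string(),
            p50_us: p50,
            p99_us: p99,
        })
        .collect()
}

/// Height of the stacked area for one sample: the sum of the stage p50s.
/// Percentiles do not add exactly, so this only approximates a total.
fn stacked_p50_us(stages: &[StageTiming]) -> f32 {
    stages.iter().map(|s| s.p50_us).sum()
}
```

Because the order is fixed per path, a chart can stack the stages in array order without knowing which path produced the recording.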
### Recorder API
```rust
pub struct StatsRecorder { /* dir, armed: AtomicBool, live: Mutex<Option<Live>>, next_sid: AtomicU32 */ }

impl StatsRecorder {
    pub fn new(dir: PathBuf) -> Arc<Self>; // creates dir (0700) if missing
    pub fn is_armed(&self) -> bool; // cheap Relaxed atomic load — called on the hot path

    /// Arm a new capture. No-op if already armed (returns current status).
    pub fn start(&self) -> StatsStatus;

    /// A streaming loop announces itself when it first records while armed.
    /// Seeds CaptureMeta (kind/w/h/fps/codec/client) on the FIRST registration. Returns session_id.
    pub fn register_session(&self, kind: &'static str, w: u32, h: u32, fps: u32, codec: &str, client: &str) -> u32;

    /// Append one aggregated sample (called from the loops' existing ~2 s/~1 s boundary).
    /// Bounded: cap at MAX_SAMPLES (e.g. 5400 ≈ 3 h @ 2 s). On overflow, stop appending and
    /// set a `truncated` flag (DO NOT drop oldest — a saved recording must keep its start).
    pub fn push_sample(&self, session_id: u32, sample: StatsSample);

    /// Disarm + finalize: write <dir>/<id>.json atomically, clear live, return saved meta.
    pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>>;

    pub fn status(&self) -> StatsStatus;
    pub fn live_snapshot(&self) -> Option<Capture>; // clone of the in-progress capture for live graphing
    pub fn list(&self) -> Vec<CaptureMeta>; // scan dir, parse meta only, newest first
    pub fn load(&self, id: &str) -> std::io::Result<Capture>;
    pub fn delete(&self, id: &str) -> std::io::Result<()>;
}
```
Invariants / safety:
- **No async on the per-frame path.** `is_armed()` is a `Relaxed` atomic load; sample
construction happens only at the existing 2 s / 1 s aggregation boundary, never per frame.
- **`id` is path-traversal-safe.** `load`/`delete` MUST reject any id not matching
`^[A-Za-z0-9._-]+$` (no `/`, no `..`, no `:` — keep it a valid Windows filename), and only ever
join `dir/<id>.json`. Return NotFound on reject. (Endpoints are bearer-authed, but defend in
depth.)
- **Bounded memory.** `MAX_SAMPLES` cap; once it is reached, stop appending and set `truncated` (the oldest samples are kept). Memory never grows unbounded.
- **Atomic disk write.** Write to `<id>.json.tmp` then rename, so a crash mid-write can't leave
a half file. Pretty-print not required; compact JSON is fine.
- Captures dir: `~/.config/punktfunk/captures/` (next to `cert.pem` etc.). Resolve via the same
config-dir helper the rest of the host uses.
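The id check and the temp+rename write from the invariants above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the helper names (`is_safe_id`, `capture_path`, `write_capture_atomic`) are hypothetical, and the explicit `sync_all` before the rename is an extra precaution not spelled out in the plan.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Accept only ids matching ^[A-Za-z0-9._-]+$, and additionally reject
/// any id containing ".." (the character class alone would allow it).
fn is_safe_id(id: &str) -> bool {
    !id.is_empty()
        && !id.contains("..")
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// Map an id to `<dir>/<id>.json`; rejected ids surface as NotFound.
fn capture_path(dir: &Path, id: &str) -> io::Result<PathBuf> {
    if !is_safe_id(id) {
        return Err(io::Error::new(io::ErrorKind::NotFound, "no such capture"));
    }
    Ok(dir.join(format!("{id}.json")))
}

/// Write to `<id>.json.tmp`, flush it to disk, then rename over the final
/// name, so a crash mid-write never leaves a half-written `<id>.json`.
fn write_capture_atomic(dir: &Path, id: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let final_path = capture_path(dir, id)?;
    let tmp_path = dir.join(format!("{id}.json.tmp"));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(bytes)?;
        file.sync_all()?; // data reaches the disk before the rename publishes it
    }
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Routing `load`, `delete` and the writer through one `capture_path`-style helper keeps the validation in a single place, which is the point of the "only ever join `dir/<id>.json`" rule.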
### Runtime gating change (the key behavioral change)
Today the loops measure per-stage timing only `if perf` (a startup bool). Change the per-frame
**measurement** predicate to `let measure = perf || recorder.is_armed();`, re-evaluated each
frame (cheap atomic). Then at the aggregation boundary:
- if `perf` → keep the existing `tracing::info!` log line (unchanged behavior);
- if `recorder.is_armed()` → also build a `StatsSample` and `push_sample`.
So `PUNKTFUNK_PERF=1` still works exactly as before, AND the web toggle now works at runtime
with zero startup flags.
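A minimal sketch of the gate, assuming a stripped-down recorder: `Recorder`, `Live` and `should_measure` are stand-ins for illustration (samples are plain `u64` values here), not the real `StatsRecorder`. It shows the `Relaxed` load on the hot path, the stop-appending overflow rule, and a lock that recovers from poisoning.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Cap from the plan's example: 5400 samples ≈ 3 h at one sample per 2 s.
const MAX_SAMPLES: usize = 5400;

#[derive(Default)]
struct Live {
    samples: Vec<u64>, // stand-in for Vec<StatsSample>
    truncated: bool,
}

#[derive(Default)]
struct Recorder {
    armed: AtomicBool,
    live: Mutex<Live>,
}

impl Recorder {
    /// Cheap check, safe to call once per frame.
    fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Recover the guard even if another thread panicked while holding it.
    fn lock_live(&self) -> MutexGuard<'_, Live> {
        self.live.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Arm a new capture; a no-op if one is already running.
    fn start(&self) {
        let mut live = self.lock_live();
        if self.is_armed() {
            return;
        }
        *live = Live::default();
        self.armed.store(true, Ordering::Relaxed);
    }

    fn stop(&self) {
        self.armed.store(false, Ordering::Relaxed);
    }

    /// Called only at the aggregation boundary, never per frame.
    fn push_sample(&self, sample: u64) {
        if !self.is_armed() {
            return;
        }
        let mut live = self.lock_live();
        if live.samples.len() >= MAX_SAMPLES {
            live.truncated = true; // keep the start of the recording
            return;
        }
        live.samples.push(sample);
    }
}

/// Per-frame measurement predicate: the startup flag OR the runtime toggle.
fn should_measure(perf: bool, recorder: &Recorder) -> bool {
    perf || recorder.is_armed()
}
```

With this shape, a host started without `PUNKTFUNK_PERF` pays one atomic load per frame while idle and only starts timing stages once the console arms a capture.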
### Where each loop emits the sample
- **native** (`punktfunk1.rs`): the cap/submit/encode(`st_wait`) splits live in the capture
thread; `mbps`/`send_dropped`/`bytes` and `session.stats()` live in the send thread. Emit the
complete sample from **one** place. Cleanest: carry the per-frame `cap_us/submit_us/wait_us`
(and a `repeat: bool`) on `FrameMsg` to the send thread (it already carries `encode_us`), so
`send_loop` builds the whole sample at its existing 2 s boundary where `session.stats()` is
already read. Compute `frames_dropped/packets_dropped/send_dropped/fec_recovered` as deltas vs
the previous window's `Session::stats()` snapshot (the loop already tracks `last_bytes` /
`last_send_dropped` — extend that bookkeeping). `register_session` is called once with the
negotiated mode/codec and the client label.
- **gamestream** (`gamestream/stream.rs`): the encode loop already tracks per-stage max each
1 s. Add p50/p99 accumulation (small per-stage `Vec<u32>` like the native path) and, when
`perf || recorder.is_armed()`, emit a `StatsSample` with stages
`[capture, encode, packetize, send]` + fps (unique new frames) + mbps + whatever loss/byte
counters that path exposes (use 0 where a counter doesn't exist; do NOT fabricate). Call
`register_session("gamestream", ...)` with the GameStream-negotiated mode/codec/client.
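The window math both loops need can be sketched as two small functions: a percentile over a per-stage `Vec<u32>` and a delta against the previous counter snapshot. The plan does not say which percentile definition the host uses, so nearest-rank is an assumption here, and `Counters` is a hypothetical stand-in for whatever `Session::stats()` returns.

```rust
/// Nearest-rank percentile (pct in 1..=100) of one window's samples, in µs.
/// Sorts the window in place; an empty window reports 0.0.
fn percentile_us(samples: &mut [u32], pct: usize) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    samples.sort_unstable();
    // rank = ceil(pct * n / 100), computed in integers to avoid float rounding
    let rank = (pct * samples.len() + 99) / 100;
    samples[rank.clamp(1, samples.len()) - 1] as f32
}

/// Cumulative counters as read at a window boundary (hypothetical shape).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct Counters {
    frames_dropped: u32,
    send_dropped: u32,
    fec_recovered: u32,
}

/// Per-window deltas versus the previous snapshot. `saturating_sub` keeps a
/// counter reset (for example a new session) from underflowing.
fn window_delta(prev: &Counters, now: &Counters) -> Counters {
    Counters {
        frames_dropped: now.frames_dropped.saturating_sub(prev.frames_dropped),
        send_dropped: now.send_dropped.saturating_sub(prev.send_dropped),
        fec_recovered: now.fec_recovered.saturating_sub(prev.fec_recovered),
    }
}
```

At each boundary the loop would compute p50/p99 per stage, take the delta, store the current snapshot as the new `prev`, and clear the per-stage vectors for the next window.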
Threading: add `stats: Arc<StatsRecorder>` to `SessionContext` and the GameStream stream
setup; the standalone `punktfunk1-host` subcommand (no mgmt) passes a fresh recorder (harmless,
just unused).
---
## 2. Host: mgmt REST API (`mgmt.rs`)
Add `stats: Arc<StatsRecorder>` to `MgmtState`. Register handlers in `api_router_parts()` via
`routes!()` with `#[utoipa::path]`. All under `/api/v1`, **bearer-token only** (operator
actions — do NOT add them to the mTLS `cert_may_access` read-only allowlist). All bodies/returns
derive `ToSchema`; errors use the `ApiJson`/`ApiError` envelope. Tag every operation `stats`.
| Method & path | fn (operationId) | body → returns |
|---------------------------------------|-------------------------|-------------------------------|
| POST `/api/v1/stats/capture/start` | `stats_capture_start` | — → `StatsStatus` |
| POST `/api/v1/stats/capture/stop` | `stats_capture_stop` | — → `CaptureMeta` (200) / 204-ish if nothing was recording |
| GET `/api/v1/stats/capture/status` | `stats_capture_status` | → `StatsStatus` |
| GET `/api/v1/stats/capture/live` | `stats_capture_live` | → `Capture` (in-progress; 404/empty if idle) |
| GET `/api/v1/stats/recordings` | `stats_recordings_list` | → `Vec<CaptureMeta>` |
| GET `/api/v1/stats/recordings/{id}` | `stats_recording_get` | → `Capture` |
| DELETE `/api/v1/stats/recordings/{id}`| `stats_recording_delete`| → `StatsStatus`/204 |
Register the new `ToSchema` types with the OpenApi derive's `components(schemas(...))` list.
Then regenerate the checked-in spec:
```
cargo run -p punktfunk-host -- openapi > api/openapi.json
```
CI fails on drift — the regenerated `api/openapi.json` MUST be committed.
---
## 3. Web console (`web/`)
New page **"Performance"** following the established route → section/index (fetch) →
section/view (presentational) pattern, registered in the `NAV` array (`app-shell.tsx`) with a
lucide icon (`Activity` or `LineChart`).
- Route: `web/src/routes/stats.tsx` → `createFileRoute('/stats')` → `SectionStats`.
- Section: `web/src/sections/Stats/index.tsx` (orval hooks) + `view.tsx` (presentational,
i18n via Paraglide `m.*`). Use `Section`, `QueryState`, `Card`/`CardHeader`/`CardTitle`/
`CardContent`, `Button`, `Badge` from `web/src/components/ui`.
- Charts: **add `recharts`** to `web/package.json` (no chart lib exists today). Render charts
**client-only** (a mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s 0-width
measure. Theme via existing CSS variables / brand violet, dark-mode aware.
Data hooks come from regenerated orval (`bun run api:gen` after the host's openapi.json is
updated): `useStatsCaptureStatus`, `useStatsCaptureStart`, `useStatsCaptureStop`,
`useStatsCaptureLive`, `useStatsRecordingsList`, `useStatsRecordingGet`,
`useStatsRecordingDelete` (exact names per orval's tag/operationId convention — verify against
generated output and adjust the view imports to match).
UI layout:
1. **Capture control card** — Start/Stop button (mutations; invalidate status query on
success), a "Recording…"/"Idle" `Badge`, elapsed time + live sample count
(`useStatsCaptureStatus`, `refetchInterval: 2000`). On Start, the live chart appears.
2. **Live chart** (visible while armed; `useStatsCaptureLive`, `refetchInterval: 2000`) — the
latency stage breakdown as a **stacked area** (capture/submit/encode/send in µs, the
"where does the time go" view), with fps and mbps as secondary line charts.
3. **Recordings card** — table from `useStatsRecordingsList`: time, kind badge, resolution,
codec, duration, sample count; row actions **View** (select → detail), **Download** (export
the `Capture` JSON via the recording GET), **Delete** (mutation, confirm).
4. **Recording detail** — when a recording (or the live capture) is selected, render the full
graph set from its `samples`:
- Latency stage breakdown (stacked area, µs) — primary bottleneck view; p99 overlay toggle.
- Throughput: fps (new vs repeat) + mbps.
- Health: frames_dropped / packets_dropped / send_dropped / fec_recovered over time.
i18n: add keys to `web/messages/en.json` + `de.json` (nav label, titles, button/labels) and
regenerate Paraglide. Keep both locales in sync.
---
## 4. Verification / done-criteria
- `cargo build -p punktfunk-host` (and `--workspace`), `cargo clippy --workspace --all-targets
-- -D warnings`, `cargo fmt --all --check` — green.
- `cargo run -p punktfunk-host -- openapi > api/openapi.json` — committed, no drift.
- `PUNKTFUNK_PERF=1` stdout behavior unchanged (no regression to the existing perf log).
- Web: orval regen clean, typecheck/build green, charts render client-side.
- CLAUDE.md status note + this plan updated.
- Adversarial review: hot-path stays sync + bounded; `id` path-traversal-safe; OpenAPI/orval no
drift; SSR-safe charts; both paths actually emit samples.