punktfunk/design/stats-capture-plan.md
enricobuehler 5bf787eb2b
feat(host): web-console performance capture — record stream stats, graph them
Arm streaming-perf-stats capture from the web console, play, stop, and review the
run as graphs; finished captures are saved to disk as browsable/exportable
recordings. Covers both the native punktfunk/1 path and GameStream.

- stats_recorder.rs: one shared Arc<StatsRecorder> ring (created in gamestream::serve,
  shared with the mgmt API + both streaming loops, mirroring NativePairing). The
  hot-path gate is a runtime AtomicBool that replaces the startup-only PUNKTFUNK_PERF
  for *recording* (PERF stdout logging unchanged); bounded ring (~3 h); atomic
  temp+rename writes to ~/.config/punktfunk/captures/*.json; path-traversal-safe ids;
  poison-resilient locks.
- native (punktfunk1.rs) + GameStream (stream.rs) emit a StatsSample at their existing
  ~2 s / ~1 s aggregation boundary — per-stage latency p50/p99, fps new/repeat, goodput,
  loss/FEC deltas — with no new per-frame work beyond the cheap atomic check.
  FrameMsg.was_measured keeps pre-arm in-flight frames out of the first window's
  percentiles (without zeroing the Windows-relay path's fps/encode).
- mgmt.rs: 7 bearer-only /api/v1/stats/* endpoints (capture start/stop/status/live;
  recordings list/get/delete); api/openapi.json regenerated, in sync.
- web: new "Performance" page (recharts, rendered SSR-safe) — capture control, live
  graphs while armed, recordings table (view / download-JSON / delete), and a detail
  view with the latency stacked-area bottleneck breakdown (p50/p99 toggle) + throughput
  + health. Charts adapt to either path's stage set.

Design: design/stats-capture-plan.md. Built and adversarially reviewed via a multi-agent
workflow; workspace build/clippy(-D warnings)/fmt/tests green, OpenAPI no-drift. Not yet
on-glass validated against a live session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:59:39 +00:00


Stats capture & graphing — design

Goal: let an operator enable performance-stats capture from the web console, play a session, stop the capture, and review the captured time series there as graphs. Captures are saved to disk, so past sessions can be browsed and compared and survive a host restart. They cover both streaming paths: native punktfunk/1 (virtual_stream) and GameStream/Moonlight (gamestream/stream.rs).

This builds on the existing per-stage instrumentation (today gated by PUNKTFUNK_PERF=1, stdout-only, read once at startup). We make recording runtime-toggleable, route the same aggregates into a shared ring → on-disk recording, and expose it over the mgmt REST API + web console.


1. Host: shared StatsRecorder

New module crates/punktfunk-host/src/stats_recorder.rs. One Arc<StatsRecorder> is created once in the unified host entry (gamestream::serve, the serve subcommand) alongside Arc<NativePairing>. It is shared with both the mgmt API (MgmtState) and the streaming loops: it is threaded through punktfunk1::serve → SessionContext → virtual_stream/send_loop, and into the GameStream encode loop. Mirror the existing NativePairing Arc-sharing pattern exactly.

Data model (serde + utoipa ToSchema; this is the wire + on-disk shape)

```rust
use serde::{Deserialize, Serialize};
use utoipa::ToSchema;

/// One pipeline stage's latency in a window (microseconds).
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct StageTiming {
    pub name: String, // "capture" | "submit" | "encode" | "packetize" | "send"
    pub p50_us: f32,
    pub p99_us: f32,
}

/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct StatsSample {
    pub t_ms: u64,                // ms since capture start (monotonic, from a stored Instant)
    pub session_id: u32,          // disambiguates concurrent sessions (usually constant)
    pub stages: Vec<StageTiming>, // ordered pipeline stages for this path
    pub fps: f32,                 // genuine NEW frames/s from the source
    pub repeat_fps: f32,          // re-encoded holds/s (source-starvation indicator)
    pub mbps: f32,                // tx goodput (Mb/s)
    pub bitrate_kbps: u32,        // configured target bitrate
    pub frames_dropped: u32,      // delta in this window
    pub packets_dropped: u32,     // delta (receiver-side / reassembler), where known
    pub send_dropped: u32,        // delta (host send-buffer overflow / EAGAIN)
    pub fec_recovered: u32,       // delta (shards recovered)
}

#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct CaptureMeta {
    pub id: String,      // "2026-06-26T20-14-03Z_5120x1440"; also the filename stem
    pub started_unix_ms: u64,
    pub duration_ms: u64,
    pub kind: String,    // "native" | "gamestream"
    pub width: u32,
    pub height: u32,
    pub fps: u32,
    pub codec: String,   // "h264" | "hevc" | "av1"
    pub client: String,  // short label / fingerprint prefix, or "" if unknown
    pub sample_count: u32,
}

#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct Capture {
    pub meta: CaptureMeta,
    pub samples: Vec<StatsSample>,
}

#[derive(Debug, Clone, Serialize, Deserialize, ToSchema)]
pub struct StatsStatus {
    pub armed: bool,          // capture currently running
    pub sample_count: u32,    // samples in the in-progress capture
    pub started_unix_ms: u64, // 0 if idle
    pub kind: String,         // path of the in-progress capture, "" if idle
}
```
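The CaptureMeta id doubles as the filename stem, and the example replaces the time's colons with hyphens. Here is a minimal sketch of deriving it from started_unix_ms. It assumes no date crate is available. capture_id and civil_from_days are hypothetical helpers, and the day-to-date step is Howard Hinnant's public-domain civil_from_days algorithm:

```rust
/// Hypothetical helper: builds an id such as "2026-06-26T20-14-03Z_5120x1440"
/// from the capture's start time (Unix ms, UTC) and resolution. The time uses
/// '-' instead of ':' so the id is also a valid Windows filename.
pub fn capture_id(started_unix_ms: u64, width: u32, height: u32) -> String {
    let secs = (started_unix_ms / 1000) as i64;
    let (y, m, d) = civil_from_days(secs.div_euclid(86_400));
    let sod = secs.rem_euclid(86_400); // seconds since UTC midnight
    format!(
        "{y:04}-{m:02}-{d:02}T{:02}-{:02}-{:02}Z_{width}x{height}",
        sod / 3600,
        (sod % 3600) / 60,
        sod % 60
    )
}

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar
/// (Howard Hinnant's civil_from_days algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365; // year of era [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based [0, 365]
    let mp = (5 * doy + 2) / 153; // month index, March = 0 [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    let y = yoe + era * 400 + i64::from(m <= 2); // Jan/Feb belong to the next civil year
    (y, m, d)
}
```

If the host already depends on a date/time crate, its UTC formatter could replace civil_from_days. The only requirement is that the stem contains no `:` or `/`.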

Stage sets per path (ordered, roughly the per-frame critical path so stacking is meaningful):

  • native: capture (try_latest ring read + color convert), submit (NVENC enqueue), encode (lock_bitstream = NVENC schedule + ASIC — the dominant stage under GPU load), send (paced_submit: seal + FEC + pace + sendmmsg).
  • gamestream: capture, encode, packetize (poll+FEC+packetize), send.

Native naming: today's vectors map as st_cap → capture, st_submit → submit, st_wait → encode, pace_us → send. (encode_us total ≈ capture + submit + encode. We do not emit it as a stage, to avoid double-counting; it is implied by the stack.)
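The following sketch turns the per-window vectors into StageTiming entries. It assumes nearest-rank percentiles over the window's raw per-frame µs values. percentile_us and native_stages are hypothetical names, and the struct is repeated so the sketch compiles on its own:

```rust
/// Repeated from the data model so this sketch compiles alone (serde derives omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct StageTiming {
    pub name: String,
    pub p50_us: f32,
    pub p99_us: f32,
}

/// Nearest-rank percentile (pct in 1..=100) over one window's per-frame µs values.
/// Sorts the slice in place; an empty window (nothing measured) reports 0.
fn percentile_us(samples: &mut [u32], pct: usize) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    samples.sort_unstable();
    let rank = (pct * samples.len() + 99) / 100; // ceil(pct/100 * n), 1-based
    samples[rank.clamp(1, samples.len()) - 1] as f32
}

/// Hypothetical mapping of the native loop's per-window vectors to emitted stages:
/// st_cap -> capture, st_submit -> submit, st_wait -> encode, pace_us -> send.
pub fn native_stages(
    st_cap: &mut [u32],
    st_submit: &mut [u32],
    st_wait: &mut [u32],
    pace_us: &mut [u32],
) -> Vec<StageTiming> {
    vec![("capture", st_cap), ("submit", st_submit), ("encode", st_wait), ("send", pace_us)]
        .into_iter()
        .map(|(name, v)| StageTiming {
            name: name.to_string(),
            p50_us: percentile_us(v, 50),
            p99_us: percentile_us(v, 99),
        })
        .collect()
}
```

Nearest-rank guarantees that every reported value is a frame time that was actually observed. For example, with 120 frames in a window, p99 is the second-slowest frame.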

Recorder API

```rust
// API sketch: signatures only, bodies omitted.
pub struct StatsRecorder { /* dir, armed: AtomicBool, live: Mutex<Option<Live>>, next_sid: AtomicU32 */ }

impl StatsRecorder {
    pub fn new(dir: PathBuf) -> Arc<Self>;   // creates dir (0700) if missing

    pub fn is_armed(&self) -> bool;          // cheap Relaxed atomic load; called on the hot path

    /// Arm a new capture. No-op if already armed (returns current status).
    pub fn start(&self) -> StatsStatus;

    /// A streaming loop announces itself when it first records while armed.
    /// Seeds CaptureMeta (kind/w/h/fps/codec/client) on the FIRST registration. Returns session_id.
    pub fn register_session(&self, kind: &'static str, w: u32, h: u32, fps: u32, codec: &str, client: &str) -> u32;

    /// Append one aggregated sample (called from the loops' existing ~2 s / ~1 s boundary).
    /// Bounded: cap at MAX_SAMPLES (e.g. 5400 ≈ 3 h @ 2 s native, but only ≈ 1.5 h @ 1 s GameStream).
    /// On overflow, stop appending and set a `truncated` flag (DO NOT drop the oldest: a saved
    /// recording must keep its start).
    pub fn push_sample(&self, session_id: u32, sample: StatsSample);

    /// Disarm + finalize: write <dir>/<id>.json atomically, clear live, return saved meta
    /// (Ok(None) if nothing was recording).
    pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>>;

    pub fn status(&self) -> StatsStatus;
    pub fn live_snapshot(&self) -> Option<Capture>;  // clone of the in-progress capture for live graphing

    pub fn list(&self) -> Vec<CaptureMeta>;          // scan dir, parse meta only, newest first
    pub fn load(&self, id: &str) -> std::io::Result<Capture>;
    pub fn delete(&self, id: &str) -> std::io::Result<()>;
}
```
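The bounded-append and no-op-restart rules can be sketched in isolation. The sketch assumes a trimmed-down Sample type and omits session registration and the disk write. Recorder, Sample and Live here are illustrative, not the real StatsRecorder. Locks recover from poisoning via PoisonError::into_inner, so a panicking streaming thread can't wedge the mgmt API:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Illustrative, trimmed-down sample (the real StatsSample carries more fields).
#[derive(Debug, Clone)]
pub struct Sample {
    pub t_ms: u64,
    pub fps: f32,
}

struct Live {
    samples: Vec<Sample>,
    truncated: bool,
}

pub struct Recorder {
    armed: AtomicBool,
    live: Mutex<Option<Live>>,
    max_samples: usize,
}

impl Recorder {
    pub fn new(max_samples: usize) -> Self {
        Recorder { armed: AtomicBool::new(false), live: Mutex::new(None), max_samples }
    }

    /// Poison-resilient lock: a panic on another thread must not wedge the recorder.
    fn live(&self) -> MutexGuard<'_, Option<Live>> {
        self.live.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Hot-path check: a single Relaxed load.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Arm a capture; calling it again while armed keeps the running capture.
    pub fn start(&self) {
        let mut live = self.live();
        if live.is_none() {
            *live = Some(Live { samples: Vec::new(), truncated: false });
        }
        self.armed.store(true, Ordering::Relaxed);
    }

    /// Append at the aggregation boundary; past the cap, keep the oldest and flag truncation.
    pub fn push_sample(&self, sample: Sample) {
        if !self.is_armed() {
            return;
        }
        let mut live = self.live();
        if let Some(live) = live.as_mut() {
            if live.samples.len() < self.max_samples {
                live.samples.push(sample);
            } else {
                live.truncated = true;
            }
        }
    }

    /// Disarm and hand back (samples, truncated); None if nothing was recording.
    /// The real stop() would serialize the capture to <dir>/<id>.json at this point.
    pub fn stop(&self) -> Option<(Vec<Sample>, bool)> {
        let mut live = self.live();
        self.armed.store(false, Ordering::Relaxed);
        live.take().map(|l| (l.samples, l.truncated))
    }
}
```

Because live sits behind the mutex, the Relaxed armed flag only needs to be a hint. A push that races stop() finds live already taken and is dropped.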

Invariants / safety:

  • No async on the per-frame path. is_armed() is a Relaxed atomic load; sample construction happens only at the existing 2 s / 1 s aggregation boundary, never per frame.
  • id is path-traversal-safe. load/delete MUST reject any id that does not match ^[A-Za-z0-9._-]+$ (no /, no :, so it stays a valid Windows filename). They MUST also reject any id containing .., which that character class would otherwise admit. Only ever join dir/<id>.json, and return NotFound on reject. (The endpoints are bearer-authed, but defend in depth.)
  • Bounded memory. MAX_SAMPLES cap; truncate (keep oldest), never unbounded.
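  • Atomic disk write. Write to <id>.json.tmp, sync_all() it, then rename it to <id>.json, so a crash mid-write can't leave a half-written file. Skipping the fsync can, on some filesystems, still surface an empty file after a power loss. Pretty-printing is not required; compact JSON is fine.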
  • Captures dir: ~/.config/punktfunk/captures/ (next to cert.pem etc.). Resolve via the same config-dir helper the rest of the host uses.
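Here is a sketch of the two disk-side invariants, assuming plain std I/O. recording_path and write_atomic are hypothetical names. The byte check is equivalent to ^[A-Za-z0-9._-]+$, with .. rejected on top:

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Map a recording id to `<dir>/<id>.json`, or NotFound if the id is unsafe.
/// Equivalent to ^[A-Za-z0-9._-]+$, plus an explicit ".." ban
/// (the character class alone would accept "..").
pub fn recording_path(dir: &Path, id: &str) -> io::Result<PathBuf> {
    let charset_ok = !id.is_empty()
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'));
    if !charset_ok || id.contains("..") {
        return Err(io::Error::new(io::ErrorKind::NotFound, "no such recording"));
    }
    Ok(dir.join(format!("{id}.json")))
}

/// Write `<path>.tmp` in the same directory, fsync it, then rename it over `path`.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp"); // "<id>.json" -> "<id>.json.tmp"
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // data is on disk before the final name points at it
    }
    fs::rename(&tmp, path)
}
```

The temp file sits in the same directory as the target, so the rename never crosses a filesystem boundary.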

Runtime gating change (the key behavioral change)

Today the loops measure per-stage timing only if perf (a bool read once at startup). Change the per-frame measurement predicate to `let measure = perf || recorder.is_armed();`, re-evaluated every frame (one cheap atomic load). Then, at the aggregation boundary:

  • if perf → keep the existing tracing::info! log line (unchanged behavior);
  • if recorder.is_armed() → also build a StatsSample and push_sample.

So PUNKTFUNK_PERF=1 still works exactly as before, AND the web toggle now works at runtime with zero startup flags.
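The predicate and the boundary branch reduce to two small functions. This sketch assumes a bare AtomicBool standing in for recorder.is_armed(). should_measure, at_boundary and BoundaryAction are hypothetical names:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// What the loop does when its ~2 s / ~1 s window closes (illustrative enum).
#[derive(Debug, PartialEq)]
pub enum BoundaryAction {
    Nothing,
    LogOnly,    // PUNKTFUNK_PERF=1 only: the existing tracing::info! line
    PushOnly,   // web capture armed only: build a StatsSample and push_sample it
    LogAndPush, // both
}

/// Per frame: `perf` is the startup bool, `armed` stands in for recorder.is_armed().
pub fn should_measure(perf: bool, armed: &AtomicBool) -> bool {
    perf || armed.load(Ordering::Relaxed)
}

/// At the aggregation boundary, logging and recording are decided independently.
pub fn at_boundary(perf: bool, armed: &AtomicBool) -> BoundaryAction {
    match (perf, armed.load(Ordering::Relaxed)) {
        (false, false) => BoundaryAction::Nothing,
        (true, false) => BoundaryAction::LogOnly,
        (false, true) => BoundaryAction::PushOnly,
        (true, true) => BoundaryAction::LogAndPush,
    }
}
```

Because `||` short-circuits, a host started with PUNKTFUNK_PERF=1 skips even the atomic load.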

Where each loop emits the sample

  • native (punktfunk1.rs): the cap/submit/encode(st_wait) splits live in the capture thread; mbps/send_dropped/bytes and session.stats() live in the send thread. Emit the complete sample from one place. Cleanest: carry the per-frame cap_us/submit_us/wait_us (and a repeat: bool) on FrameMsg to the send thread (it already carries encode_us), so send_loop builds the whole sample at its existing 2 s boundary where session.stats() is already read. Compute frames_dropped/packets_dropped/send_dropped/fec_recovered as deltas vs the previous window's Session::stats() snapshot (the loop already tracks last_bytes / last_send_dropped — extend that bookkeeping). register_session is called once with the negotiated mode/codec and the client label.
  • gamestream (gamestream/stream.rs): the encode loop already tracks per-stage max each 1 s. Add p50/p99 accumulation (small per-stage Vec<u32> like the native path) and, when perf || recorder.is_armed(), emit a StatsSample with stages [capture, encode, packetize, send] + fps (unique new frames) + mbps + whatever loss/byte counters that path exposes (use 0 where a counter doesn't exist; do NOT fabricate). Call register_session("gamestream", ...) with the GameStream-negotiated mode/codec/client.
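The window deltas can be sketched as follows. The sketch assumes a hypothetical Counters snapshot standing in for Session::stats() plus the byte counter the send loop already tracks. Counters, WindowDeltas and window_deltas are illustrative names:

```rust
/// Hypothetical cumulative counters read once per window (stand-in for
/// Session::stats() plus the send loop's byte counter).
#[derive(Debug, Clone, Copy, Default)]
pub struct Counters {
    pub frames_dropped: u64,
    pub packets_dropped: u64,
    pub send_dropped: u64,
    pub fec_recovered: u64,
    pub bytes_sent: u64,
}

/// The per-window values that would be copied into a StatsSample.
#[derive(Debug, PartialEq)]
pub struct WindowDeltas {
    pub frames_dropped: u32,
    pub packets_dropped: u32,
    pub send_dropped: u32,
    pub fec_recovered: u32,
    pub mbps: f32,
}

/// Counter delta; a counter that went backwards (reset) yields 0, not a wrapped spike.
fn delta(now: u64, prev: u64) -> u32 {
    now.saturating_sub(prev).min(u64::from(u32::MAX)) as u32
}

pub fn window_deltas(prev: &Counters, now: &Counters, window_ms: u64) -> WindowDeltas {
    let bits = now.bytes_sent.saturating_sub(prev.bytes_sent) as f64 * 8.0;
    // Mb/s = bits / (window_ms / 1000 s) / 1e6 = bits / (window_ms * 1000)
    let mbps = if window_ms == 0 { 0.0 } else { (bits / (window_ms as f64 * 1000.0)) as f32 };
    WindowDeltas {
        frames_dropped: delta(now.frames_dropped, prev.frames_dropped),
        packets_dropped: delta(now.packets_dropped, prev.packets_dropped),
        send_dropped: delta(now.send_dropped, prev.send_dropped),
        fec_recovered: delta(now.fec_recovered, prev.fec_recovered),
        mbps,
    }
}
```

The caller would then keep `now` as the next window's `prev`, extending the existing last_bytes / last_send_dropped bookkeeping. Where a path lacks a counter, its field simply stays 0.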

Threading: add stats: Arc<StatsRecorder> to SessionContext and the GameStream stream setup; the standalone punktfunk1-host subcommand (no mgmt) passes a fresh recorder (harmless, just unused).


2. Host: mgmt REST API (mgmt.rs)

Add stats: Arc<StatsRecorder> to MgmtState. Register handlers in api_router_parts() via routes!() with #[utoipa::path]. All under /api/v1, bearer-token only (operator actions — do NOT add them to the mTLS cert_may_access read-only allowlist). All bodies/returns derive ToSchema; errors use the ApiJson/ApiError envelope. Tag every operation stats.

| Method & path | fn (operationId) | Body | Returns |
|---|---|---|---|
| POST /api/v1/stats/capture/start | stats_capture_start | none | `StatsStatus` |
| POST /api/v1/stats/capture/stop | stats_capture_stop | none | `CaptureMeta` (200), or 204 No Content if nothing was recording |
| GET /api/v1/stats/capture/status | stats_capture_status | none | `StatsStatus` |
| GET /api/v1/stats/capture/live | stats_capture_live | none | `Capture` (in progress; 404 or empty if idle) |
| GET /api/v1/stats/recordings | stats_recordings_list | none | `Vec<CaptureMeta>` |
| GET /api/v1/stats/recordings/{id} | stats_recording_get | none | `Capture` |
| DELETE /api/v1/stats/recordings/{id} | stats_recording_delete | none | `StatsStatus` or 204 |

Register the new ToSchema types with the OpenApi derive's components(schemas(...)) list. Then regenerate the checked-in spec:

```sh
cargo run -p punktfunk-host -- openapi > api/openapi.json
```

CI fails on drift — the regenerated api/openapi.json MUST be committed.


3. Web console (web/)

New page "Performance" following the established route → section/index (fetch) → section/view (presentational) pattern, registered in the NAV array (app-shell.tsx) with a lucide icon (Activity or LineChart).

  • Route: web/src/routes/stats.tsx → createFileRoute('/stats') → SectionStats.
  • Section: web/src/sections/Stats/index.tsx (orval hooks) + view.tsx (presentational, i18n via Paraglide m.*). Use Section, QueryState, Card/CardHeader/CardTitle/CardContent, Button, Badge from web/src/components/ui.
  • Charts: add recharts to web/package.json (no chart lib exists today). Render charts client-only (a mounted guard) so SSR doesn't choke on ResponsiveContainer's 0-width measure. Theme via existing CSS variables / brand violet, dark-mode aware.

Data hooks come from regenerated orval (bun run api:gen after the host's openapi.json is updated): useStatsCaptureStatus, useStatsCaptureStart, useStatsCaptureStop, useStatsCaptureLive, useStatsRecordingsList, useStatsRecordingGet, useStatsRecordingDelete (exact names per orval's tag/operationId convention — verify against generated output and adjust the view imports to match).

UI layout:

  1. Capture control card — Start/Stop button (mutations; invalidate status query on success), a "Recording…"/"Idle" Badge, elapsed time + live sample count (useStatsCaptureStatus, refetchInterval: 2000). On Start, the live chart appears.
  2. Live chart (visible while armed; useStatsCaptureLive, refetchInterval: 2000): the latency stage breakdown as a stacked area in µs, the "where does the time go" view. It uses the path's own stages: capture/submit/encode/send for native, capture/encode/packetize/send for GameStream. fps and mbps appear as secondary line charts.
  3. Recordings card — table from useStatsRecordingsList: time, kind badge, resolution, codec, duration, sample count; row actions View (select → detail), Download (export the Capture JSON via the recording GET), Delete (mutation, confirm).
  4. Recording detail — when a recording (or the live capture) is selected, render the full graph set from its samples:
    • Latency stage breakdown (stacked area, µs) — primary bottleneck view; p99 overlay toggle.
    • Throughput: fps (new vs repeat) + mbps.
    • Health: frames_dropped / packets_dropped / send_dropped / fec_recovered over time.

i18n: add keys to web/messages/en.json + de.json (nav label, titles, button/labels) and regenerate Paraglide. Keep both locales in sync.


4. Verification / done-criteria

  • cargo build -p punktfunk-host (and --workspace), cargo clippy --workspace --all-targets -- -D warnings, cargo fmt --all --check: all green.
  • cargo run -p punktfunk-host -- openapi > api/openapi.json — committed, no drift.
  • PUNKTFUNK_PERF=1 stdout behavior unchanged (no regression to the existing perf log).
  • Web: orval regen clean, typecheck/build green, charts render client-side.
  • CLAUDE.md status note + this plan updated.
  • Adversarial review: hot-path stays sync + bounded; id path-traversal-safe; OpenAPI/orval no drift; SSR-safe charts; both paths actually emit samples.