Arm streaming-perf-stats capture from the web console, play, stop, and review the run as graphs; finished captures are saved to disk as browsable/exportable recordings. Covers both the native punktfunk/1 path and GameStream.
- stats_recorder.rs: one shared Arc<StatsRecorder> ring (created in gamestream::serve, shared with the mgmt API + both streaming loops, mirroring NativePairing). The hot-path gate is a runtime AtomicBool that replaces the startup-only PUNKTFUNK_PERF for *recording* (PERF stdout logging unchanged); bounded ring (~3 h); atomic temp+rename writes to ~/.config/punktfunk/captures/*.json; path-traversal-safe ids; poison-resilient locks.
- native (punktfunk1.rs) + GameStream (stream.rs) emit a StatsSample at their existing ~2 s / ~1 s aggregation boundary (per-stage latency p50/p99, fps new/repeat, goodput, loss/FEC deltas) with no new per-frame work beyond the cheap atomic check. FrameMsg.was_measured keeps pre-arm in-flight frames out of the first window's percentiles (without zeroing the Windows-relay path's fps/encode).
- mgmt.rs: 7 bearer-only /api/v1/stats/* endpoints (capture start/stop/status/live; recordings list/get/delete); api/openapi.json regenerated, in sync.
- web: new "Performance" page (recharts, rendered SSR-safe): capture control, live graphs while armed, recordings table (view / download-JSON / delete), and a detail view with the latency stacked-area bottleneck breakdown (p50/p99 toggle) + throughput + health. Charts adapt to either path's stage set.
Design: design/stats-capture-plan.md. Built and adversarially reviewed via a multi-agent workflow; workspace build/clippy(-D warnings)/fmt/tests green, OpenAPI no-drift. Not yet on-glass validated against a live session.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stats capture & graphing — design
Goal: let an operator enable performance-stats capture from the web console, play a
session, stop, and review the captured time-series as graphs in the web console.
Captures are saved to disk (browse/compare past sessions; survive host restart) and
cover both streaming paths: native punktfunk/1 (virtual_stream) and GameStream/Moonlight
(gamestream/stream.rs).
This builds on the existing per-stage instrumentation (today gated by PUNKTFUNK_PERF=1,
stdout-only, read once at startup). We make recording runtime-toggleable, route the same
aggregates into a shared ring → on-disk recording, and expose it over the mgmt REST API +
web console.
1. Host: shared StatsRecorder
New module crates/punktfunk-host/src/stats_recorder.rs. One Arc<StatsRecorder> is created
once in the unified host entry (gamestream::serve, the serve subcommand) alongside
Arc<NativePairing>, and shared with both the mgmt API (MgmtState) and the streaming
loops (threaded through punktfunk1::serve → SessionContext → virtual_stream/send_loop,
and into the GameStream encode loop). Mirror the existing NativePairing Arc-sharing pattern
exactly.
Data model (serde + utoipa ToSchema; this is the wire + on-disk shape)
/// One pipeline stage's latency in a window (microseconds).
pub struct StageTiming {
    pub name: String, // "capture" | "submit" | "encode" | "packetize" | "send"
    pub p50_us: f32,
    pub p99_us: f32,
}
/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
pub struct StatsSample {
    pub t_ms: u64, // ms since capture start (monotonic, from a stored Instant)
    pub session_id: u32, // disambiguates concurrent sessions (usually constant)
    pub stages: Vec<StageTiming>, // ordered pipeline stages for this path
    pub fps: f32, // genuine NEW frames/s from the source
    pub repeat_fps: f32, // re-encoded holds/s (source-starvation indicator)
    pub mbps: f32, // tx goodput (Mb/s)
    pub bitrate_kbps: u32, // configured target bitrate
    pub frames_dropped: u32, // delta in this window
    pub packets_dropped: u32, // delta (receiver-side / reassembler), where known
    pub send_dropped: u32, // delta (host send-buffer overflow / EAGAIN)
    pub fec_recovered: u32, // delta (shards recovered)
}
pub struct CaptureMeta {
    pub id: String, // "2026-06-26T20-14-03Z_5120x1440"; also the filename stem
    pub started_unix_ms: u64,
    pub duration_ms: u64,
    pub kind: String, // "native" | "gamestream"
    pub width: u32,
    pub height: u32,
    pub fps: u32,
    pub codec: String, // "h264" | "hevc" | "av1"
    pub client: String, // short label / fingerprint prefix, or "" if unknown
    pub sample_count: u32,
}
pub struct Capture {
    pub meta: CaptureMeta,
    pub samples: Vec<StatsSample>,
}
pub struct StatsStatus {
    pub armed: bool, // capture currently running
    pub sample_count: u32, // samples in the in-progress capture
    pub started_unix_ms: u64, // 0 if idle
    pub kind: String, // path of the in-progress capture, "" if idle
}
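The `CaptureMeta.id` example is a UTC timestamp with `-` in place of `:` plus the stream resolution. A std-only sketch of building it from `started_unix_ms`, assuming the host wants no date crate for this one string; `format_capture_id` and `civil_from_days` are hypothetical helpers (the latter is Howard Hinnant's days-to-civil algorithm), not existing host code. A crate such as `chrono` or `time` would do the same job.

```rust
/// Days since 1970-01-01 -> (year, month, day) in the proleptic Gregorian calendar.
/// Howard Hinnant's `civil_from_days`, restricted to dates >= 1970 (unsigned math).
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z / 146_097; // 400-year eras
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = doy - (153 * mp + 2) / 5 + 1; // [1, 31]
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // [1, 12]
    let year = era * 400 + yoe + u64::from(month <= 2); // Jan/Feb roll into the next year
    (year, month, day)
}

/// Hypothetical id builder: "<UTC timestamp>Z_<width>x<height>".
fn format_capture_id(started_unix_ms: u64, width: u32, height: u32) -> String {
    let secs = started_unix_ms / 1_000;
    let (year, month, day) = civil_from_days(secs / 86_400);
    let sod = secs % 86_400; // seconds since UTC midnight
    let (hh, mm, ss) = (sod / 3_600, sod % 3_600 / 60, sod % 60);
    // ':' is not allowed in Windows filenames, so the time part uses '-'.
    format!("{year:04}-{month:02}-{day:02}T{hh:02}-{mm:02}-{ss:02}Z_{width}x{height}")
}
```

The output uses only `[A-Za-z0-9_-]`, so it is a valid id and filename stem on every platform.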
Stage sets per path (ordered roughly along the per-frame critical path, so stacking is meaningful):
- native: `capture` (`try_latest` ring read + color convert), `submit` (NVENC enqueue), `encode` (`lock_bitstream` = NVENC schedule + ASIC; the dominant stage under GPU load), `send` (`paced_submit`: seal + FEC + pace + `sendmmsg`).
- gamestream: `capture`, `encode`, `packetize` (poll + FEC + packetize), `send`.
Native naming: today's vectors map as `st_cap` → `capture`, `st_submit` → `submit`, `st_wait` → `encode`, `pace_us` → `send`. (The `encode_us` total ≈ capture + submit + encode; we do not emit it as a stage to avoid double-counting, since the stack already implies it.)
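The stage sets and the native renaming can be pinned down as constants so the emitter and the charts agree on order. A sketch; `NATIVE_STAGES`, `GAMESTREAM_STAGES`, `stages_for` and `native_stage_name` are hypothetical names. One caveat when reading the stacked chart: the top of a stacked p50 area is the sum of per-stage medians, which approximates but does not equal the median end-to-end latency.

```rust
/// Ordered stage sets per path, bottom-to-top in the stacked chart.
pub const NATIVE_STAGES: &[&str] = &["capture", "submit", "encode", "send"];
pub const GAMESTREAM_STAGES: &[&str] = &["capture", "encode", "packetize", "send"];

/// Stage set for a `CaptureMeta.kind` value; `None` for an unknown kind.
pub fn stages_for(kind: &str) -> Option<&'static [&'static str]> {
    match kind {
        "native" => Some(NATIVE_STAGES),
        "gamestream" => Some(GAMESTREAM_STAGES),
        _ => None,
    }
}

/// Map today's native perf-vector names onto stage names. The `encode_us`
/// total is deliberately unmapped: it equals capture + submit + encode, so
/// emitting it as a stage would double-count in the stack.
pub fn native_stage_name(vector: &str) -> Option<&'static str> {
    match vector {
        "st_cap" => Some("capture"),
        "st_submit" => Some("submit"),
        "st_wait" => Some("encode"),
        "pace_us" => Some("send"),
        _ => None,
    }
}
```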
Recorder API
pub struct StatsRecorder { /* dir, armed: AtomicBool, live: Mutex<Option<Live>>, next_sid: AtomicU32 */ }
impl StatsRecorder {
    pub fn new(dir: PathBuf) -> Arc<Self>; // creates dir (0700) if missing
    pub fn is_armed(&self) -> bool; // cheap Relaxed atomic load; called on the hot path
    /// Arm a new capture. No-op if already armed (returns current status).
    pub fn start(&self) -> StatsStatus;
    /// A streaming loop announces itself when it first records while armed.
    /// Seeds CaptureMeta (kind/w/h/fps/codec/client) on the FIRST registration. Returns session_id.
    pub fn register_session(&self, kind: &'static str, w: u32, h: u32, fps: u32, codec: &str, client: &str) -> u32;
    /// Append one aggregated sample (called from the loops' existing ~2 s / ~1 s boundary).
    /// Bounded: cap at MAX_SAMPLES (e.g. 5400 ≈ 3 h at 2 s native, ≈ 1.5 h at 1 s GameStream).
    /// On overflow, stop appending and set a `truncated` flag (DO NOT drop oldest; a saved
    /// recording must keep its start).
    pub fn push_sample(&self, session_id: u32, sample: StatsSample);
    /// Disarm + finalize: write <dir>/<id>.json atomically, clear live, return saved meta.
    pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>>;
    pub fn status(&self) -> StatsStatus;
    pub fn live_snapshot(&self) -> Option<Capture>; // clone of the in-progress capture for live graphing
    pub fn list(&self) -> Vec<CaptureMeta>; // scan dir, parse meta only, newest first
    pub fn load(&self, id: &str) -> std::io::Result<Capture>;
    pub fn delete(&self, id: &str) -> std::io::Result<()>;
}
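A std-only sketch of the recorder core, to make the cap-and-flag rule and the disarm race concrete. Assumptions: the directory, disk write, `CaptureMeta` and most sample fields are omitted, and `Live`'s fields plus `stop()` returning it directly are hypothetical simplifications of the signatures above. `Relaxed` is enough for `armed` because the flag only decides whether work happens; the samples themselves are synchronized by the mutex.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::{Mutex, MutexGuard};

/// ≈ 3 h at one sample per 2 s (native), ≈ 1.5 h at one per 1 s (GameStream).
pub const MAX_SAMPLES: usize = 5_400;

/// Trimmed-down sample; the real StatsSample carries every field of the data model.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StatsSample {
    pub t_ms: u64,
    pub session_id: u32,
    pub fps: f32,
}

/// In-progress (or just-finished) capture; hypothetical field set.
#[derive(Debug, Default)]
pub struct Live {
    pub kind: Option<&'static str>,
    pub samples: Vec<StatsSample>,
    pub truncated: bool,
}

pub struct StatsRecorder {
    armed: AtomicBool,
    live: Mutex<Option<Live>>,
    next_sid: AtomicU32,
}

impl StatsRecorder {
    pub fn new() -> Self {
        Self { armed: AtomicBool::new(false), live: Mutex::new(None), next_sid: AtomicU32::new(0) }
    }

    /// Poison-resilient lock: a panic elsewhere must not wedge capture.
    fn live(&self) -> MutexGuard<'_, Option<Live>> {
        self.live.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Hot-path check: one Relaxed load, no lock.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Arm a capture; an already-running capture is kept as-is.
    pub fn start(&self) {
        let mut live = self.live();
        if live.is_none() {
            *live = Some(Live::default());
        }
        self.armed.store(true, Ordering::Relaxed);
    }

    /// Every loop gets a fresh id; only the first registration seeds `kind`.
    pub fn register_session(&self, kind: &'static str) -> u32 {
        let sid = self.next_sid.fetch_add(1, Ordering::Relaxed);
        if let Some(live) = self.live().as_mut() {
            live.kind.get_or_insert(kind);
        }
        sid
    }

    /// Append below the cap; past it, keep the start and flag truncation.
    pub fn push_sample(&self, session_id: u32, mut sample: StatsSample) {
        sample.session_id = session_id;
        let mut guard = self.live();
        let Some(live) = guard.as_mut() else { return }; // stopped after the is_armed() check
        if live.samples.len() < MAX_SAMPLES {
            live.samples.push(sample);
        } else {
            live.truncated = true;
        }
    }

    /// Disarm and hand back the finished capture (the disk write is omitted here).
    pub fn stop(&self) -> Option<Live> {
        self.armed.store(false, Ordering::Relaxed);
        self.live().take()
    }
}
```

A loop that saw `is_armed() == true` just before `stop()` finds `live` empty, so its late sample is dropped rather than leaking into the next capture.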
Invariants / safety:
- No async on the per-frame path. `is_armed()` is a `Relaxed` atomic load; sample construction happens only at the existing 2 s / 1 s aggregation boundary, never per frame.
- `id` is path-traversal-safe. `load`/`delete` MUST reject any id that does not match `^[A-Za-z0-9._-]+$` or that contains `..` (the character class alone admits `..`): no `/`, no `..`, no `:` (keep it a valid Windows filename). Only ever join `dir/<id>.json`. Return NotFound on reject. (Endpoints are bearer-authed, but defend in depth.)
- Bounded memory. `MAX_SAMPLES` cap; once it is hit, stop appending (keep the oldest samples); never grow unbounded.
- Atomic disk write. Write to `<id>.json.tmp`, then rename, so a crash mid-write can't leave a half-written file. Pretty-printing is not required; compact JSON is fine.
- Captures dir: `~/.config/punktfunk/captures/` (next to `cert.pem` etc.). Resolve it via the same config-dir helper the rest of the host uses.
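The id guard and the tmp+rename write are small enough to sketch whole with `std` only. `valid_id`, `capture_path`, `write_capture_atomic` and `delete_capture` are hypothetical names. The `sync_all` before the rename is an added precaution beyond the plan: on some filesystems a power loss can otherwise leave the renamed file empty or short.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// `^[A-Za-z0-9._-]+$` plus an explicit ".." ban (the class alone admits dots).
pub fn valid_id(id: &str) -> bool {
    !id.is_empty()
        && !id.contains("..")
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// Only ever `<dir>/<id>.json`; a rejected id is reported as NotFound.
pub fn capture_path(dir: &Path, id: &str) -> io::Result<PathBuf> {
    if !valid_id(id) {
        return Err(io::Error::new(io::ErrorKind::NotFound, "no such recording"));
    }
    Ok(dir.join(format!("{id}.json")))
}

/// Write `<id>.json.tmp`, fsync it, close it, then rename it over `<id>.json`.
pub fn write_capture_atomic(dir: &Path, id: &str, json: &[u8]) -> io::Result<PathBuf> {
    let final_path = capture_path(dir, id)?;
    let tmp_path = dir.join(format!("{id}.json.tmp"));
    let mut file = fs::File::create(&tmp_path)?;
    file.write_all(json)?;
    file.sync_all()?; // contents durable before the rename makes them visible
    drop(file); // close before rename
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}

/// `delete` with the same guard; a missing file surfaces as NotFound from the OS.
pub fn delete_capture(dir: &Path, id: &str) -> io::Result<()> {
    fs::remove_file(capture_path(dir, id)?)
}
```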
Runtime gating change (the key behavioral change)
Today the loops measure per-stage timing only if `perf` (a startup bool). Change the per-frame
measurement predicate to `let measure = perf || recorder.is_armed();`, re-evaluated each
frame (a cheap atomic load). Then, at the aggregation boundary:
- if `perf` → keep the existing `tracing::info!` log line (unchanged behavior);
- if `recorder.is_armed()` → also build a `StatsSample` and `push_sample` it.
So PUNKTFUNK_PERF=1 still works exactly as before, AND the web toggle now works at runtime
with zero startup flags.
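Written out per frame, the gating is a small truth table. A sketch under assumptions: `Window`, `Emitted` and the single `cap_us` vector are hypothetical stand-ins; in the real loops the `Log` arm is the existing `tracing::info!` line and the `Sample` arm builds a `StatsSample` and calls `push_sample`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// What a loop does at an aggregation boundary.
#[derive(Debug, PartialEq)]
pub enum Emitted {
    Nothing,
    Log,          // existing tracing::info! line (PUNKTFUNK_PERF=1)
    Sample,       // build a StatsSample and push_sample it
    LogAndSample, // both
}

/// Per-loop window state; only one stage vector is shown.
pub struct Window {
    start: Instant,
    interval: Duration, // ~2 s native, ~1 s GameStream
    cap_us: Vec<u32>,
}

impl Window {
    pub fn new(start: Instant, interval: Duration) -> Self {
        Self { start, interval, cap_us: Vec::new() }
    }

    /// Timings accumulated since the last boundary.
    pub fn pending(&self) -> usize {
        self.cap_us.len()
    }

    /// Called once per frame. `perf` is the startup flag; `armed` is the recorder's.
    pub fn on_frame(&mut self, perf: bool, armed: &AtomicBool, cap_us: u32, now: Instant) -> Emitted {
        let is_armed = armed.load(Ordering::Relaxed); // re-read every frame
        if perf || is_armed {
            self.cap_us.push(cap_us); // the `measure` predicate
        }
        if now.duration_since(self.start) < self.interval {
            return Emitted::Nothing; // not at the boundary yet
        }
        self.start = now;
        let emitted = match (perf, is_armed) {
            (false, false) => Emitted::Nothing,
            (true, false) => Emitted::Log,
            (false, true) => Emitted::Sample,
            (true, true) => Emitted::LogAndSample,
        };
        self.cap_us.clear(); // percentiles would be computed from cap_us before this
        emitted
    }
}
```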
Where each loop emits the sample
- native (`punktfunk1.rs`): the cap/submit/encode (`st_wait`) splits live in the capture thread; `mbps`/`send_dropped`/bytes and `session.stats()` live in the send thread. Emit the complete sample from one place. Cleanest: carry the per-frame `cap_us`/`submit_us`/`wait_us` (and a `repeat: bool`) on `FrameMsg` to the send thread (it already carries `encode_us`), so `send_loop` builds the whole sample at its existing 2 s boundary, where `session.stats()` is already read. Compute `frames_dropped`/`packets_dropped`/`send_dropped`/`fec_recovered` as deltas vs the previous window's `Session::stats()` snapshot (the loop already tracks `last_bytes`/`last_send_dropped`; extend that bookkeeping). `register_session` is called once with the negotiated mode/codec and the client label.
- gamestream (`gamestream/stream.rs`): the encode loop already tracks the per-stage max each 1 s. Add p50/p99 accumulation (a small per-stage `Vec<u32>`, like the native path), measure whenever `perf || recorder.is_armed()`, and, when `recorder.is_armed()`, emit a `StatsSample` with stages `[capture, encode, packetize, send]` + fps (unique new frames) + mbps + whatever loss/byte counters that path exposes (use 0 where a counter doesn't exist; do NOT fabricate). Call `register_session("gamestream", ...)` with the GameStream-negotiated mode/codec/client.
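Both loops need the same two bits of arithmetic at the boundary: p50/p99 from a window's per-stage `Vec<u32>`, and per-window deltas from cumulative `Session::stats()` counters. A std-only sketch; it uses nearest-rank percentiles via `slice::select_nth_unstable` (an assumed method; the existing native code may compute them differently), and the `FrameMsg`, `StageWindow` and `Counters` field sets are illustrative rather than the real types. `was_measured` follows the commit note: frames already in flight at arm time count toward fps but not toward percentiles.

```rust
/// Per-frame message from the capture thread to the send thread (illustrative fields).
pub struct FrameMsg {
    pub cap_us: u32,
    pub submit_us: u32,
    pub wait_us: u32,
    pub repeat: bool,
    pub was_measured: bool, // false for frames already in flight when capture was armed
}

/// One aggregation window on the send thread.
#[derive(Default)]
pub struct StageWindow {
    pub cap_us: Vec<u32>,
    pub submit_us: Vec<u32>,
    pub wait_us: Vec<u32>,
    pub new_frames: u32,
    pub repeats: u32,
}

impl StageWindow {
    /// fps counts every frame; timings only frames that were actually measured.
    pub fn push(&mut self, msg: &FrameMsg) {
        if msg.repeat {
            self.repeats += 1;
        } else {
            self.new_frames += 1;
        }
        if msg.was_measured {
            self.cap_us.push(msg.cap_us);
            self.submit_us.push(msg.submit_us);
            self.wait_us.push(msg.wait_us);
        }
    }
}

/// Nearest-rank percentile (p in 1..=100) via O(n) selection; reorders `v`.
pub fn percentile(v: &mut [u32], p: usize) -> Option<u32> {
    if v.is_empty() {
        return None;
    }
    let rank = (p * v.len() + 99) / 100; // ceil(p/100 * n), 1-based
    let idx = rank.clamp(1, v.len()) - 1;
    Some(*v.select_nth_unstable(idx).1)
}

/// Cumulative counters as read from Session::stats() (hypothetical field names).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Counters {
    pub frames_dropped: u64,
    pub packets_dropped: u64,
    pub send_dropped: u64,
    pub fec_recovered: u64,
}

impl Counters {
    /// Per-window delta; saturating, so a counter reset yields 0 instead of wrapping.
    pub fn delta_since(&self, prev: &Counters) -> Counters {
        Counters {
            frames_dropped: self.frames_dropped.saturating_sub(prev.frames_dropped),
            packets_dropped: self.packets_dropped.saturating_sub(prev.packets_dropped),
            send_dropped: self.send_dropped.saturating_sub(prev.send_dropped),
            fec_recovered: self.fec_recovered.saturating_sub(prev.fec_recovered),
        }
    }
}
```

The `u64` deltas still need narrowing to the sample's `u32` fields, e.g. `u32::try_from(d).unwrap_or(u32::MAX)`.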
Threading: add stats: Arc<StatsRecorder> to SessionContext and the GameStream stream
setup; the standalone punktfunk1-host subcommand (no mgmt) passes a fresh recorder (harmless,
just unused).
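The sharing pattern is plain `Arc` cloning: one recorder built at host entry, with one handle each for `MgmtState`, the native `SessionContext`, and the GameStream loop. A std-only sketch with a stand-in `StatsRecorder` (just the flag and a sample sink); `wire_up` and `run_loop` are hypothetical, and real loops would call `is_armed()`/`push_sample` rather than touch fields directly.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Stand-in recorder: only the armed flag and a sample sink.
#[derive(Default)]
pub struct StatsRecorder {
    armed: AtomicBool,
    samples: Mutex<Vec<(&'static str, u32)>>,
}

pub struct MgmtState {
    pub stats: Arc<StatsRecorder>,
}

pub struct SessionContext {
    pub stats: Arc<StatsRecorder>,
}

/// Build the recorder once at host entry and hand out handles.
pub fn wire_up() -> (MgmtState, SessionContext, Arc<StatsRecorder>) {
    let stats = Arc::new(StatsRecorder::default());
    let mgmt = MgmtState { stats: Arc::clone(&stats) };
    let native = SessionContext { stats: Arc::clone(&stats) };
    (mgmt, native, stats) // the last handle goes to the GameStream encode loop
}

/// A streaming loop that records one entry per window while armed.
pub fn run_loop(kind: &'static str, stats: Arc<StatsRecorder>, windows: u32) -> JoinHandle<()> {
    thread::spawn(move || {
        for w in 0..windows {
            if stats.armed.load(Ordering::Relaxed) {
                stats.samples.lock().unwrap().push((kind, w));
            }
        }
    })
}
```

Flipping `armed` through the mgmt handle is immediately visible to both loops, which is the whole point of sharing one `Arc` instead of per-subsystem flags.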
2. Host: mgmt REST API (mgmt.rs)
Add stats: Arc<StatsRecorder> to MgmtState. Register handlers in api_router_parts() via
routes!() with #[utoipa::path]. All under /api/v1, bearer-token only (operator
actions — do NOT add them to the mTLS cert_may_access read-only allowlist). All bodies/returns
derive ToSchema; errors use the ApiJson/ApiError envelope. Tag every operation with the `stats` tag.
| Method & path | fn (operationId) | body → returns |
|---|---|---|
| `POST /api/v1/stats/capture/start` | `stats_capture_start` | (none) → `StatsStatus` |
| `POST /api/v1/stats/capture/stop` | `stats_capture_stop` | (none) → `CaptureMeta` (200), or 204 No Content if nothing was recording |
| `GET /api/v1/stats/capture/status` | `stats_capture_status` | → `StatsStatus` |
| `GET /api/v1/stats/capture/live` | `stats_capture_live` | → `Capture` (in-progress; 404/empty if idle) |
| `GET /api/v1/stats/recordings` | `stats_recordings_list` | → `Vec<CaptureMeta>` |
| `GET /api/v1/stats/recordings/{id}` | `stats_recording_get` | → `Capture` |
| `DELETE /api/v1/stats/recordings/{id}` | `stats_recording_delete` | → `StatsStatus` / 204 |
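The table leaves a few status codes loose. One framework-free way to pin down the stop, get and delete cases, as a sketch: `Reply` and the helper names are hypothetical, the real handlers wrap errors in the ApiJson/ApiError envelope (whose shape isn't shown here), and 204 is just one of the two options the table lists for delete. Mapping every `NotFound` to 404 makes a rejected id and a missing file look identical to the client, which matches the "Return NotFound on reject" rule.

```rust
use std::io;

/// Hypothetical stand-in for CaptureMeta (only the id matters here).
#[derive(Debug, PartialEq)]
pub struct CaptureMeta {
    pub id: String,
}

/// What a handler sends back, before ApiJson/ApiError wrapping.
#[derive(Debug, PartialEq)]
pub enum Reply<T> {
    Json(T),    // 200 with a JSON body
    NoContent,  // 204
    Error(u16), // status code for the error envelope
}

fn io_status(e: &io::Error) -> u16 {
    match e.kind() {
        io::ErrorKind::NotFound => 404, // includes ids rejected by the guard
        _ => 500,
    }
}

/// POST /api/v1/stats/capture/stop: `stop()` result → 200 or 204.
pub fn stop_reply(r: io::Result<Option<CaptureMeta>>) -> Reply<CaptureMeta> {
    match r {
        Ok(Some(meta)) => Reply::Json(meta),
        Ok(None) => Reply::NoContent, // nothing was recording
        Err(e) => Reply::Error(io_status(&e)),
    }
}

/// GET /api/v1/stats/recordings/{id}: `load()` result → 200 or 404/500.
pub fn load_reply<T>(r: io::Result<T>) -> Reply<T> {
    match r {
        Ok(v) => Reply::Json(v),
        Err(e) => Reply::Error(io_status(&e)),
    }
}

/// DELETE /api/v1/stats/recordings/{id}: `delete()` result → 204 or 404/500.
pub fn delete_reply(r: io::Result<()>) -> Reply<()> {
    match r {
        Ok(()) => Reply::NoContent,
        Err(e) => Reply::Error(io_status(&e)),
    }
}
```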
Register the new ToSchema types with the OpenApi derive's components(schemas(...)) list.
Then regenerate the checked-in spec:
cargo run -p punktfunk-host -- openapi > api/openapi.json
CI fails on drift — the regenerated api/openapi.json MUST be committed.
3. Web console (web/)
New page "Performance" following the established route → section/index (fetch) →
section/view (presentational) pattern, registered in the NAV array (app-shell.tsx) with a
lucide icon (Activity or LineChart).
- Route: `web/src/routes/stats.tsx` → `createFileRoute('/stats')` → `SectionStats`.
- Section: `web/src/sections/Stats/index.tsx` (orval hooks) + `view.tsx` (presentational, i18n via Paraglide `m.*`). Use `Section`, `QueryState`, `Card`/`CardHeader`/`CardTitle`/`CardContent`, `Button`, `Badge` from `web/src/components/ui`.
- Charts: add `recharts` to `web/package.json` (no chart lib exists today). Render charts client-only (a mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s 0-width measure. Theme via existing CSS variables / brand violet, dark-mode aware.
Data hooks come from regenerated orval (bun run api:gen after the host's openapi.json is
updated): useStatsCaptureStatus, useStatsCaptureStart, useStatsCaptureStop,
useStatsCaptureLive, useStatsRecordingsList, useStatsRecordingGet,
useStatsRecordingDelete (exact names per orval's tag/operationId convention — verify against
generated output and adjust the view imports to match).
UI layout:
- Capture control card: Start/Stop button (mutations; invalidate the status query on success), a "Recording…"/"Idle" `Badge`, elapsed time + live sample count (`useStatsCaptureStatus`, `refetchInterval: 2000`). On Start, the live chart appears.
- Live chart (visible while armed; `useStatsCaptureLive`, `refetchInterval: 2000`): the latency stage breakdown as a stacked area (capture/submit/encode/send in µs, the "where does the time go" view), with fps and mbps as secondary line charts.
- Recordings card: a table from `useStatsRecordingsList` showing time, kind badge, resolution, codec, duration, and sample count; row actions View (select → detail), Download (export the `Capture` JSON via the recording GET), and Delete (mutation, confirm).
- Recording detail: when a recording (or the live capture) is selected, render the full graph set from its samples:
  - Latency stage breakdown (stacked area, µs): the primary bottleneck view; p99 overlay toggle.
  - Throughput: fps (new vs repeat) + mbps.
  - Health: `frames_dropped` / `packets_dropped` / `send_dropped` / `fec_recovered` over time.
i18n: add keys to web/messages/en.json + de.json (nav label, titles, button/labels) and
regenerate Paraglide. Keep both locales in sync.
4. Verification / done-criteria
- `cargo build -p punktfunk-host` (and `--workspace`), `cargo clippy --workspace --all-targets -- -D warnings`, `cargo fmt --all --check`: green.
- `cargo run -p punktfunk-host -- openapi > api/openapi.json`: committed, no drift.
- `PUNKTFUNK_PERF=1` stdout behavior unchanged (no regression to the existing perf log).
- Web: orval regen clean, typecheck/build green, charts render client-side.
- CLAUDE.md status note + this plan updated.
- Adversarial review: hot path stays sync + bounded; `id` path-traversal-safe; OpenAPI/orval no drift; SSR-safe charts; both paths actually emit samples.