punktfunk/design/stats-capture-plan.md
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00


# Stats capture & graphing — design
> **Status:** SHIPPED (commit `5bf787e`) — host `crates/punktfunk-host/src/stats_recorder.rs`,
> mgmt endpoints `/api/v1/stats/*` (`mgmt.rs`), web console Performance page
> (`web/src/sections/Stats/`). Implemented; not yet on-glass validated. This doc is trimmed to
> design rationale + open items; the shipped code is the source of truth (data models, recorder
> API, endpoint list, and UI layout all live there).

Goal: let an operator **enable performance-stats capture from the web console**, play a session,
**stop**, and **review the captured time-series as graphs**. Captures are **saved to disk**
(browse/compare past sessions; survive host restart) and cover **both** streaming paths: native
punktfunk/1 (`virtual_stream`) and GameStream/Moonlight (`gamestream/stream.rs`).
## Why / design rationale
- **Reuse the existing per-stage instrumentation** that was startup-gated by `PUNKTFUNK_PERF=1`
(stdout-only, read once at startup). The key behavioral change: make the per-frame
**measurement** predicate `perf || recorder.is_armed()`, re-evaluated each frame via a cheap
`Relaxed` atomic. `PUNKTFUNK_PERF=1` still emits its `tracing::info!` log line exactly as
before; the web toggle additionally builds a `StatsSample` at the aggregation boundary — so
the web toggle works at runtime with **zero startup flags**.
- **No async on the per-frame path.** `is_armed()` is a `Relaxed` atomic load; sample
construction happens only at the existing **~2 s native / ~1 s GameStream** aggregation
boundary, never per frame. One shared `Arc<StatsRecorder>` is created once in the unified host
entry and threaded into both streaming loops + `MgmtState`, mirroring the existing
`Arc<NativePairing>` sharing pattern.
- **Stage sets are the per-frame critical path so stacking is meaningful.** native:
`capture` / `submit` (NVENC enqueue) / `encode` (`lock_bitstream` = NVENC schedule + ASIC, the
dominant stage under GPU load) / `send` (paced_submit: seal + FEC + pace + sendmmsg).
gamestream: `capture` / `encode` / `packetize` / `send`. Native source vectors map
`st_cap` → `capture`, `st_submit` → `submit`, `st_wait` → `encode`, `pace_us` → `send`; `encode_us`
total ≈ capture+submit+encode and is **not** emitted as its own stage to avoid double-counting.
- **Gotchas / accepted-risk decisions:**
- **`id` is path-traversal-safe.** `load`/`delete` reject any id not matching
`^[A-Za-z0-9._-]+$` (no `/`, no `..`, no `:` — keep it a valid Windows filename) and only ever
join `dir/<id>.json`. Endpoints are bearer-authed, but defend in depth.
- **Bounded memory, keep the start.** `MAX_SAMPLES` cap (~5400 ≈ 3 h @ 2 s); on overflow stop
appending and set a `truncated` flag — **do NOT drop oldest**, a saved recording must keep
its start.
- **Atomic disk write.** Write `<id>.json.tmp` then rename so a crash mid-write can't leave a
half file. Captures dir `~/.config/punktfunk/captures/` (0700), next to `cert.pem`.
- Counters that a path doesn't expose are recorded as `0`; **do NOT fabricate** values for them.
- mgmt endpoints are **bearer-token only** (operator actions) — deliberately NOT in the mTLS
`cert_may_access` read-only allowlist.
- Charts render **client-only** (mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s
0-width measure.
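
The runtime-armable measurement predicate and the keep-the-start sample cap described above can be sketched in Rust roughly as follows. This is a minimal illustration, not the shipped `stats_recorder.rs`: the `StatsSample` fields, the `Mutex`-guarded buffer, and the names `arm`, `disarm`, `push`, `snapshot`, and `should_measure` are assumptions made for the example. Only `is_armed()`, `MAX_SAMPLES`, the `truncated` flag, and the `Relaxed` load come from the text.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Cap taken from the text: ~5400 samples is about 3 h at one sample per 2 s.
pub const MAX_SAMPLES: usize = 5400;

/// Hypothetical sample shape; the real `StatsSample` lives in the host crate.
#[derive(Debug, Clone, PartialEq)]
pub struct StatsSample {
    pub t_ms: u64,
    pub capture_us: u32,
    pub encode_us: u32,
    pub send_us: u32,
}

#[derive(Default)]
struct Buffer {
    samples: Vec<StatsSample>,
    truncated: bool,
}

pub struct StatsRecorder {
    armed: AtomicBool,
    buf: Mutex<Buffer>,
    max_samples: usize,
}

impl StatsRecorder {
    /// One shared instance, handed to both streaming loops and the mgmt state.
    pub fn new(max_samples: usize) -> Arc<Self> {
        Arc::new(Self {
            armed: AtomicBool::new(false),
            buf: Mutex::new(Buffer::default()),
            max_samples,
        })
    }

    pub fn arm(&self) {
        self.armed.store(true, Ordering::Relaxed);
    }

    pub fn disarm(&self) {
        self.armed.store(false, Ordering::Relaxed);
    }

    /// Cheap enough to call once per frame: a single relaxed atomic load.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Called only at the aggregation boundary, never per frame.
    /// On overflow the newest sample is discarded so the recording keeps its start.
    pub fn push(&self, sample: StatsSample) {
        let mut buf = self.buf.lock().unwrap();
        if buf.samples.len() >= self.max_samples {
            buf.truncated = true;
            return;
        }
        buf.samples.push(sample);
    }

    /// Copy of the samples recorded so far plus the truncation flag.
    pub fn snapshot(&self) -> (Vec<StatsSample>, bool) {
        let buf = self.buf.lock().unwrap();
        (buf.samples.clone(), buf.truncated)
    }
}

/// Per-frame measurement predicate: the startup env flag OR the runtime toggle.
pub fn should_measure(perf_env: bool, recorder: &StatsRecorder) -> bool {
    perf_env || recorder.is_armed()
}
```

In a streaming loop, `should_measure(perf, &recorder)` would be evaluated once per frame, while `push` would run only at the aggregation boundary, so the mutex stays off the per-frame path.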
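
The id check and the write-then-rename step from the gotchas above could look something like the sketch below. It is an illustration under stated assumptions rather than the shipped code: `is_safe_id`, `capture_path`, and `write_atomic` are hypothetical names, and the explicit `..` rejection is an extra check added here because the character class alone would admit consecutive dots.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hand-rolled equivalent of `^[A-Za-z0-9._-]+$`, plus an explicit `..` rejection.
/// (The `..` check is an assumption of this sketch; see the note above.)
pub fn is_safe_id(id: &str) -> bool {
    !id.is_empty()
        && !id.contains("..")
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// Only ever produces `dir/<id>.json`; returns `None` for a rejected id.
pub fn capture_path(dir: &Path, id: &str) -> Option<PathBuf> {
    if is_safe_id(id) {
        Some(dir.join(format!("{id}.json")))
    } else {
        None
    }
}

/// Write `<id>.json.tmp`, flush it to disk, then rename it over `<id>.json`,
/// so a crash mid-write cannot leave a half-written capture under the final name.
pub fn write_atomic(dir: &Path, id: &str, json: &[u8]) -> io::Result<PathBuf> {
    let final_path = capture_path(dir, id)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "unsafe capture id"))?;
    let tmp_path = dir.join(format!("{id}.json.tmp"));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(json)?;
        file.sync_all()?;
    }
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Validating before any path is built keeps `load`, `delete`, and the writer on the same single code path for turning an id into a filename, which is the defense-in-depth point made above.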
## Open items
- **On-glass validation.** Implemented but not yet validated on real hardware end-to-end (arm
from the console, play, stop, review graphs across both native + GameStream paths).