docs(design): trim shipped plans, consolidate cluster, add index

Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed (completed items cut, open items kept): implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00
parent 9ea2c17419
commit 7b99b41ede
27 changed files with 1322 additions and 3229 deletions
+43 -238
@@ -1,246 +1,51 @@
# Stats capture & graphing — design
Goal: let an operator **enable performance-stats capture from the web console**, play a
session, **stop**, and **review the captured time-series as graphs** in the web console.
Captures are **saved to disk** (browse/compare past sessions; survive host restart) and
cover **both** streaming paths: native punktfunk/1 (`virtual_stream`) and GameStream/Moonlight
(`gamestream/stream.rs`).
> **Status:** SHIPPED (commit `5bf787e`) — host `crates/punktfunk-host/src/stats_recorder.rs`,
> mgmt endpoints `/api/v1/stats/*` (`mgmt.rs`), web console Performance page
> (`web/src/sections/Stats/`). Implemented; not yet on-glass validated. This doc is trimmed to
> design rationale + open items; the shipped code is the source of truth (data models, recorder
> API, endpoint list, and UI layout all live there).
This builds on the existing per-stage instrumentation (today gated by `PUNKTFUNK_PERF=1`,
stdout-only, read once at startup). We make recording **runtime-toggleable**, route the same
aggregates into a **shared bounded buffer → on-disk recording**, and expose it over the mgmt REST API +
web console.
---
## 1. Host: shared `StatsRecorder`
New module `crates/punktfunk-host/src/stats_recorder.rs`. One `Arc<StatsRecorder>` is created
once in the unified host entry (`gamestream::serve`, the `serve` subcommand) alongside
`Arc<NativePairing>`, and shared with **both** the mgmt API (`MgmtState`) and the streaming
loops (threaded through `punktfunk1::serve` → `SessionContext` → `virtual_stream`/`send_loop`,
and into the GameStream encode loop). Mirror the existing `NativePairing` Arc-sharing pattern
exactly.
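
A minimal sketch of that sharing pattern, reduced to the `armed` flag. `Recorder`, `MgmtHandle`, `LoopHandle`, `wire_up` and `run_loop` are hypothetical stand-ins for `StatsRecorder`, `MgmtState`, `SessionContext` and the real wiring; the point is only that both sides hold clones of one `Arc`, so a toggle from the mgmt side is visible to a streaming thread without any channel.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `StatsRecorder`: only the armed flag.
#[derive(Default)]
pub struct Recorder {
    armed: AtomicBool,
}

impl Recorder {
    /// Cheap hot-path check: one Relaxed atomic load.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }
    pub fn set_armed(&self, on: bool) {
        self.armed.store(on, Ordering::Relaxed);
    }
}

/// Hypothetical stand-ins for `MgmtState` and `SessionContext`.
pub struct MgmtHandle {
    pub stats: Arc<Recorder>,
}
pub struct LoopHandle {
    pub stats: Arc<Recorder>,
}

/// Create ONE recorder at host entry and hand a clone to each consumer.
pub fn wire_up() -> (MgmtHandle, LoopHandle) {
    let stats = Arc::new(Recorder::default());
    (MgmtHandle { stats: Arc::clone(&stats) }, LoopHandle { stats })
}

/// Simulated streaming loop on its own thread: counts how many of `frames` were measured.
pub fn run_loop(ctx: LoopHandle, frames: u32) -> thread::JoinHandle<u32> {
    thread::spawn(move || (0..frames).filter(|_| ctx.stats.is_armed()).count() as u32)
}
```

Because every consumer clones the same `Arc`, the standalone subcommand can hand its loop a fresh, never-armed recorder and the loop code stays identical.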
### Data model (serde + utoipa `ToSchema`; this is the wire + on-disk shape)
```rust
/// One pipeline stage's latency in a window (microseconds).
pub struct StageTiming {
pub name: String, // "capture" | "submit" | "encode" | "packetize" | "send"
pub p50_us: f32,
pub p99_us: f32,
}
/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
pub struct StatsSample {
pub t_ms: u64, // ms since capture start (monotonic, from a stored Instant)
pub session_id: u32, // disambiguates concurrent sessions (usually constant)
pub stages: Vec<StageTiming>, // ordered pipeline stages for this path
pub fps: f32, // genuine NEW frames/s from the source
pub repeat_fps: f32, // re-encoded holds/s (source-starvation indicator)
pub mbps: f32, // tx goodput (Mb/s)
pub bitrate_kbps: u32, // configured target bitrate
pub frames_dropped: u32, // delta in this window
pub packets_dropped: u32, // delta (receiver-side / reassembler), where known
pub send_dropped: u32, // delta (host send-buffer overflow / EAGAIN)
pub fec_recovered: u32, // delta (shards recovered)
}
pub struct CaptureMeta {
pub id: String, // "2026-06-26T20-14-03Z_5120x1440" — also the filename stem
pub started_unix_ms: u64,
pub duration_ms: u64,
pub kind: String, // "native" | "gamestream"
pub width: u32,
pub height: u32,
pub fps: u32,
pub codec: String, // "h264" | "hevc" | "av1"
pub client: String, // short label / fingerprint prefix, or "" if unknown
pub sample_count: u32,
}
pub struct Capture {
pub meta: CaptureMeta,
pub samples: Vec<StatsSample>,
}
pub struct StatsStatus {
pub armed: bool, // capture currently running
pub sample_count: u32, // samples in the in-progress capture
pub started_unix_ms: u64, // 0 if idle
pub kind: String, // path of the in-progress capture, "" if idle
}
```
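
The `id` format shown in `CaptureMeta` (UTC start time with `-` in place of `:`, then the resolution) can be produced with std only. A sketch, assuming the id is derived from `started_unix_ms`; `capture_id` is a hypothetical helper, and the date math is Howard Hinnant's `civil_from_days` algorithm rather than anything the host is known to use.

```rust
/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar
/// (Howard Hinnant's `civil_from_days`).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2); // Jan/Feb belong to the next civil year
    (year, month, day)
}

/// Hypothetical helper: "2026-06-26T20-14-03Z_5120x1440" style id (no ':' so it is a valid
/// Windows filename and passes the `^[A-Za-z0-9._-]+$` check).
pub fn capture_id(started_unix_ms: u64, width: u32, height: u32) -> String {
    let secs = (started_unix_ms / 1_000) as i64;
    let (y, mo, d) = civil_from_days(secs.div_euclid(86_400));
    let sod = secs.rem_euclid(86_400); // seconds into the UTC day
    format!(
        "{y:04}-{mo:02}-{d:02}T{:02}-{:02}-{:02}Z_{width}x{height}",
        sod / 3_600,
        (sod / 60) % 60,
        sod % 60
    )
}
```
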
Stage sets per path (ordered, roughly the per-frame critical path so stacking is meaningful):
- **native**: `capture` (try_latest ring read + color convert), `submit` (NVENC enqueue),
`encode` (lock_bitstream = NVENC schedule + ASIC — the dominant stage under GPU load),
`send` (paced_submit: seal + FEC + pace + sendmmsg).
- **gamestream**: `capture`, `encode`, `packetize` (poll+FEC+packetize), `send`.
> Native naming: today's vectors are `st_cap`→`capture`, `st_submit`→`submit`,
> `st_wait`→`encode`, `pace_us`→`send`. (`encode_us` total ≈ capture+submit+encode; we do not
> emit it as a stage to avoid double-counting — it's implied by the stack.)
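
A sketch of the native name mapping, assuming per-window (p50, p99) pairs have already been computed for each source vector; `NATIVE_STAGES`, `native_stages` and `stacked_p50` are hypothetical names. It shows the ordering that makes the stacked chart meaningful and that `encode_us` is never emitted.

```rust
/// Mirrors the doc's `StageTiming` shape.
#[derive(Debug, Clone, PartialEq)]
pub struct StageTiming {
    pub name: String,
    pub p50_us: f32,
    pub p99_us: f32,
}

/// (source vector, emitted stage) in critical-path order for the native path.
/// `encode_us` is deliberately absent: it is ~capture+submit+encode and would double-count.
const NATIVE_STAGES: [(&str, &str); 4] = [
    ("st_cap", "capture"),
    ("st_submit", "submit"),
    ("st_wait", "encode"),
    ("pace_us", "send"),
];

/// Build the ordered stage list from (p50, p99) pairs indexed like `NATIVE_STAGES`.
pub fn native_stages(pcts: [(f32, f32); 4]) -> Vec<StageTiming> {
    NATIVE_STAGES
        .iter()
        .zip(pcts)
        .map(|(&(_src, name), (p50, p99))| StageTiming {
            name: name.to_string(),
            p50_us: p50,
            p99_us: p99,
        })
        .collect()
}

/// Height of the stacked p50 area for one sample: the sum of the stage p50s.
pub fn stacked_p50(stages: &[StageTiming]) -> f32 {
    stages.iter().map(|s| s.p50_us).sum()
}
```

The stack height is a sum of per-stage medians, which is not the median of end-to-end latency; it answers "where does the time go", not "what is the p50 frame time".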
### Recorder API
```rust
pub struct StatsRecorder { /* dir, armed: AtomicBool, live: Mutex<Option<Live>>, next_sid: AtomicU32 */ }
impl StatsRecorder {
pub fn new(dir: PathBuf) -> Arc<Self>; // creates dir (0700) if missing
pub fn is_armed(&self) -> bool; // cheap Relaxed atomic load — called on the hot path
/// Arm a new capture. No-op if already armed (returns current status).
pub fn start(&self) -> StatsStatus;
/// A streaming loop announces itself when it first records while armed.
/// Seeds CaptureMeta (kind/w/h/fps/codec/client) on the FIRST registration. Returns session_id.
pub fn register_session(&self, kind: &'static str, w: u32, h: u32, fps: u32, codec: &str, client: &str) -> u32;
/// Append one aggregated sample (called from the loops' existing ~2 s/~1 s boundary).
/// Bounded: cap at MAX_SAMPLES (e.g. 5400 ≈ 3 h @ 2 s). On overflow, stop appending and
/// set a `truncated` flag (DO NOT drop oldest — a saved recording must keep its start).
pub fn push_sample(&self, session_id: u32, sample: StatsSample);
/// Disarm + finalize: write <dir>/<id>.json atomically, clear live, return saved meta.
pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>>;
pub fn status(&self) -> StatsStatus;
pub fn live_snapshot(&self) -> Option<Capture>; // clone of the in-progress capture for live graphing
pub fn list(&self) -> Vec<CaptureMeta>; // scan dir, parse meta only, newest first
pub fn load(&self, id: &str) -> std::io::Result<Capture>;
pub fn delete(&self, id: &str) -> std::io::Result<()>;
}
```
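
A minimal sketch of the start/push/stop semantics, assuming a simplified sample type (a `u64` `t_ms`) and returning the samples from `stop` instead of writing `<id>.json`; `MiniRecorder` is hypothetical, not the shipped `StatsRecorder`. It shows the no-op re-arm and the cap that keeps the start of a recording.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical pared-down recorder; samples are just `t_ms` values.
pub struct MiniRecorder {
    armed: AtomicBool,
    live: Mutex<Option<Live>>,
    max_samples: usize,
}

struct Live {
    samples: Vec<u64>,
    truncated: bool,
}

impl MiniRecorder {
    pub fn new(max_samples: usize) -> Self {
        Self { armed: AtomicBool::new(false), live: Mutex::new(None), max_samples }
    }

    /// Hot-path check: a single Relaxed load, no lock.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Arm a new capture; a no-op if one is already running.
    pub fn start(&self) {
        let mut live = self.live.lock().unwrap();
        if live.is_none() {
            *live = Some(Live { samples: Vec::new(), truncated: false });
            self.armed.store(true, Ordering::Relaxed);
        }
    }

    /// Append one aggregated sample. Past the cap, keep the start and flag truncation.
    pub fn push_sample(&self, t_ms: u64) {
        if let Some(live) = self.live.lock().unwrap().as_mut() {
            if live.samples.len() < self.max_samples {
                live.samples.push(t_ms);
            } else {
                live.truncated = true;
            }
        }
    }

    /// Disarm and return (samples, truncated); `None` if nothing was recording.
    pub fn stop(&self) -> Option<(Vec<u64>, bool)> {
        self.armed.store(false, Ordering::Relaxed);
        self.live.lock().unwrap().take().map(|l| (l.samples, l.truncated))
    }
}
```
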
Invariants / safety:
- **Reuse the existing per-stage instrumentation** that was startup-gated by `PUNKTFUNK_PERF=1`
(stdout-only, read once at startup). The key behavioral change: make the per-frame
**measurement** predicate `perf || recorder.is_armed()`, re-evaluated each frame via a cheap
`Relaxed` atomic. `PUNKTFUNK_PERF=1` still emits its `tracing::info!` log line exactly as
before; the web toggle additionally builds a `StatsSample` at the aggregation boundary — so
the web toggle works at runtime with **zero startup flags**.
- **No async on the per-frame path.** `is_armed()` is a `Relaxed` atomic load; sample
construction happens only at the existing 2 s / 1 s aggregation boundary, never per frame.
- **`id` is path-traversal-safe.** `load`/`delete` MUST reject any id not matching
`^[A-Za-z0-9._-]+$` (no `/`, no `..`, no `:` — keep it a valid Windows filename), and only ever
join `dir/<id>.json`. Return NotFound on reject. (Endpoints are bearer-authed, but defend in
depth.)
- **Bounded memory.** `MAX_SAMPLES` cap; on overflow stop appending and set `truncated`, keeping
the oldest samples. Never unbounded.
- **Atomic disk write.** Write to `<id>.json.tmp` then rename, so a crash mid-write can't leave
a half file. Pretty-print not required; compact JSON is fine.
- Captures dir: `~/.config/punktfunk/captures/` (next to `cert.pem` etc.). Resolve via the same
config-dir helper the rest of the host uses.
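
The id and atomic-write rules can be sketched with std only. `is_valid_capture_id`, `capture_path` and `write_atomically` are hypothetical names, and the `sync_all` before the rename is an added durability step the design does not mandate.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Allow-list check equivalent to `^[A-Za-z0-9._-]+$`, plus an explicit `..` reject.
pub fn is_valid_capture_id(id: &str) -> bool {
    !id.is_empty()
        && !id.contains("..")
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// The only path ever built from an id: `<dir>/<id>.json`. Rejected ids map to NotFound.
pub fn capture_path(dir: &Path, id: &str) -> io::Result<PathBuf> {
    if !is_valid_capture_id(id) {
        return Err(io::Error::new(io::ErrorKind::NotFound, "unknown capture id"));
    }
    Ok(dir.join(format!("{id}.json")))
}

/// Write `<id>.json.tmp`, flush it to disk, then rename it over `<id>.json`.
pub fn write_atomically(dir: &Path, id: &str, json: &[u8]) -> io::Result<PathBuf> {
    let path = capture_path(dir, id)?; // validate before touching the filesystem
    let tmp = dir.join(format!("{id}.json.tmp"));
    let mut f = fs::File::create(&tmp)?;
    f.write_all(json)?;
    f.sync_all()?; // contents are on disk before the rename makes them visible
    fs::rename(&tmp, &path)?;
    Ok(path)
}
```

Validation runs before any path is built, so a rejected id never reaches the filesystem, and answering NotFound does not tell the caller whether the id was malformed or merely absent.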
### Runtime gating change (the key behavioral change)
Today the loops measure per-stage timing only `if perf` (a startup bool). Change the per-frame
**measurement** predicate to `let measure = perf || recorder.is_armed();`, re-evaluated each
frame (cheap atomic). Then at the aggregation boundary:
- if `perf` → keep the existing `tracing::info!` log line (unchanged behavior);
- if `recorder.is_armed()` → also build a `StatsSample` and `push_sample`.
So `PUNKTFUNK_PERF=1` still works exactly as before, AND the web toggle now works at runtime
with zero startup flags.
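
Condensed to two hypothetical helpers, assuming the armed flag is a plain `AtomicBool` and `perf` is the startup bool; `should_measure`, `at_boundary` and `WindowOutputs` are illustrative names, not host code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// What one aggregation window produces (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct WindowOutputs {
    pub log_line: bool,    // PUNKTFUNK_PERF=1: the existing tracing::info! line
    pub push_sample: bool, // recorder armed: build a StatsSample and push_sample it
}

/// Per-frame predicate, re-evaluated every frame with one Relaxed load.
pub fn should_measure(perf: bool, armed: &AtomicBool) -> bool {
    perf || armed.load(Ordering::Relaxed)
}

/// At the ~2 s / ~1 s boundary the two outputs are decided independently.
pub fn at_boundary(perf: bool, armed: &AtomicBool) -> WindowOutputs {
    WindowOutputs { log_line: perf, push_sample: armed.load(Ordering::Relaxed) }
}
```

Keeping the two decisions independent is what lets `PUNKTFUNK_PERF=1` behave exactly as before while the web toggle works without it.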
### Where each loop emits the sample
- **native** (`punktfunk1.rs`): the cap/submit/encode(`st_wait`) splits live in the capture
thread; `mbps`/`send_dropped`/`bytes` and `session.stats()` live in the send thread. Emit the
complete sample from **one** place. Cleanest: carry the per-frame `cap_us/submit_us/wait_us`
(and a `repeat: bool`) on `FrameMsg` to the send thread (it already carries `encode_us`), so
`send_loop` builds the whole sample at its existing 2 s boundary where `session.stats()` is
already read. Compute `frames_dropped/packets_dropped/send_dropped/fec_recovered` as deltas vs
the previous window's `Session::stats()` snapshot (the loop already tracks `last_bytes` /
`last_send_dropped` — extend that bookkeeping). `register_session` is called once with the
negotiated mode/codec and the client label.
- **gamestream** (`gamestream/stream.rs`): the encode loop already tracks per-stage max each
1 s. Add p50/p99 accumulation (small per-stage `Vec<u32>` like the native path) and, when
`perf || recorder.is_armed()`, emit a `StatsSample` with stages
`[capture, encode, packetize, send]` + fps (unique new frames) + mbps + whatever loss/byte
counters that path exposes (use 0 where a counter doesn't exist; do NOT fabricate). Call
`register_session("gamestream", ...)` with the GameStream-negotiated mode/codec/client.
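
One way to do the p50/p99 accumulation and the per-window counter deltas, assuming a nearest-rank percentile over the window's raw µs values (the design does not specify a percentile method) and cumulative counters in the stats snapshot. `percentile`, `Counters` and `window_delta` are hypothetical, and `Counters` covers only a subset of the sample's counters.

```rust
/// Nearest-rank percentile (`pct` in 0..=100) over one window's per-frame timings.
/// Sorts in place; returns 0 for an empty window.
pub fn percentile(samples: &mut [u32], pct: u32) -> u32 {
    if samples.is_empty() {
        return 0;
    }
    samples.sort_unstable();
    let n = samples.len();
    let rank = (pct.min(100) as usize * n + 99) / 100; // ceil(pct% of n), 1-based
    samples[rank.clamp(1, n) - 1]
}

/// Cumulative counters as read from a stats snapshot (hypothetical subset).
#[derive(Clone, Copy, Default)]
pub struct Counters {
    pub frames_dropped: u64,
    pub send_dropped: u64,
    pub fec_recovered: u64,
}

/// Per-window deltas vs the previous snapshot; `saturating_sub` guards against a counter reset.
pub fn window_delta(prev: Counters, now: Counters) -> (u32, u32, u32) {
    let d = |a: u64, b: u64| b.saturating_sub(a).min(u32::MAX as u64) as u32;
    (
        d(prev.frames_dropped, now.frames_dropped),
        d(prev.send_dropped, now.send_dropped),
        d(prev.fec_recovered, now.fec_recovered),
    )
}
```
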
Threading: add `stats: Arc<StatsRecorder>` to `SessionContext` and the GameStream stream
setup; the standalone `punktfunk1-host` subcommand (no mgmt) passes a fresh recorder (harmless,
just unused).
---
## 2. Host: mgmt REST API (`mgmt.rs`)
Add `stats: Arc<StatsRecorder>` to `MgmtState`. Register handlers in `api_router_parts()` via
`routes!()` with `#[utoipa::path]`. All under `/api/v1`, **bearer-token only** (operator
actions — do NOT add them to the mTLS `cert_may_access` read-only allowlist). All bodies/returns
derive `ToSchema`; errors use the `ApiJson`/`ApiError` envelope. Tag every operation `stats`.
| Method & path | fn (operationId) | body → returns |
|---------------------------------------|-------------------------|-------------------------------|
| POST `/api/v1/stats/capture/start` | `stats_capture_start` | — → `StatsStatus` |
| POST `/api/v1/stats/capture/stop`    | `stats_capture_stop`    | — → `CaptureMeta` (200), or an empty response (e.g. 204) if nothing was recording |
| GET `/api/v1/stats/capture/status` | `stats_capture_status` | → `StatsStatus` |
| GET `/api/v1/stats/capture/live` | `stats_capture_live` | → `Capture` (in-progress; 404/empty if idle) |
| GET `/api/v1/stats/recordings` | `stats_recordings_list` | → `Vec<CaptureMeta>` |
| GET `/api/v1/stats/recordings/{id}` | `stats_recording_get` | → `Capture` |
| DELETE `/api/v1/stats/recordings/{id}`| `stats_recording_delete`| → `StatsStatus`/204 |
Register the new `ToSchema` types with the OpenApi derive's `components(schemas(...))` list.
Then regenerate the checked-in spec:
```
cargo run -p punktfunk-host -- openapi > api/openapi.json
```
CI fails on drift — the regenerated `api/openapi.json` MUST be committed.
---
## 3. Web console (`web/`)
New page **"Performance"** following the established route → section/index (fetch) →
section/view (presentational) pattern, registered in the `NAV` array (`app-shell.tsx`) with a
lucide icon (`Activity` or `LineChart`).
- Route: `web/src/routes/stats.tsx` → `createFileRoute('/stats')` → `SectionStats`.
- Section: `web/src/sections/Stats/index.tsx` (orval hooks) + `view.tsx` (presentational,
i18n via Paraglide `m.*`). Use `Section`, `QueryState`, `Card`/`CardHeader`/`CardTitle`/
`CardContent`, `Button`, `Badge` from `web/src/components/ui`.
- Charts: **add `recharts`** to `web/package.json` (no chart lib exists today). Render charts
**client-only** (a mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s 0-width
measure. Theme via existing CSS variables / brand violet, dark-mode aware.
Data hooks come from regenerated orval (`bun run api:gen` after the host's openapi.json is
updated): `useStatsCaptureStatus`, `useStatsCaptureStart`, `useStatsCaptureStop`,
`useStatsCaptureLive`, `useStatsRecordingsList`, `useStatsRecordingGet`,
`useStatsRecordingDelete` (exact names per orval's tag/operationId convention — verify against
generated output and adjust the view imports to match).
UI layout:
1. **Capture control card** — Start/Stop button (mutations; invalidate status query on
success), a "Recording…"/"Idle" `Badge`, elapsed time + live sample count
(`useStatsCaptureStatus`, `refetchInterval: 2000`). On Start, the live chart appears.
2. **Live chart** (visible while armed; `useStatsCaptureLive`, `refetchInterval: 2000`) — the
latency stage breakdown as a **stacked area** (capture/submit/encode/send in µs, the
"where does the time go" view), with fps and mbps as secondary line charts.
3. **Recordings card** — table from `useStatsRecordingsList`: time, kind badge, resolution,
codec, duration, sample count; row actions **View** (select → detail), **Download** (export
the `Capture` JSON via the recording GET), **Delete** (mutation, confirm).
4. **Recording detail** — when a recording (or the live capture) is selected, render the full
graph set from its `samples`:
- Latency stage breakdown (stacked area, µs) — primary bottleneck view; p99 overlay toggle.
- Throughput: fps (new vs repeat) + mbps.
- Health: frames_dropped / packets_dropped / send_dropped / fec_recovered over time.
i18n: add keys to `web/messages/en.json` + `de.json` (nav label, titles, button/labels) and
regenerate Paraglide. Keep both locales in sync.
---
## 4. Verification / done-criteria
- `cargo build -p punktfunk-host` (and `--workspace`), `cargo clippy --workspace --all-targets -- -D warnings`,
  `cargo fmt --all --check`: all green.
- `cargo run -p punktfunk-host -- openapi > api/openapi.json` — committed, no drift.
- `PUNKTFUNK_PERF=1` stdout behavior unchanged (no regression to the existing perf log).
- Web: orval regen clean, typecheck/build green, charts render client-side.
- CLAUDE.md status note + this plan updated.
- Adversarial review: hot-path stays sync + bounded; `id` path-traversal-safe; OpenAPI/orval no
drift; SSR-safe charts; both paths actually emit samples.
- **On-glass validation.** Implemented but not yet validated on real hardware end-to-end (arm
from the console, play, stop, review graphs across both native + GameStream paths).