docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).
- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
host-latency, gpu-contention (fixed stale status table), game-library,
linux-setup (fixed m0->spike + stale zero-copy claim),
session-aware-host-followups, windows-client-bootstrap,
windows-dualsense-{scoping,game-detection}, windows-virtual-display,
security-review (per-finding status table; #12 still open),
apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
merged, M4 done); windows-secure-desktop.md archived (now a fallback
behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
+35
-88
@@ -1,5 +1,7 @@
|
||||
# punktfunk — security audit (2026-06-21)
|
||||
|
||||
> **Status:** AUDIT COMPLETE (2026-06-21). 11 of 12 confirmed findings are FIXED (`3526517` · `3c55ec3`) or accepted-risk DOCUMENTED-INHERENT (#5/#9); **one remains OPEN: #12** (global `NODE_TLS_REJECT_UNAUTHORIZED`, DEFERRED). This doc is trimmed to the audit trail (executive summary · threat model · per-finding status · cross-cutting themes · positives · refuted findings) plus the accepted-risk rationale and the open item. Per-finding remediation prose for the FIXED findings is collapsed to one line + commit — the shipped code is the source of truth and git history holds the full original.
|
||||
|
||||
Whole-project audit by a 10-surface multi-agent review; every finding adversarially verified (reachability, attacker-control, existing mitigation). **10 surfaces · 20 raw findings → 18 confirmed/partial, 2 refuted.** Threat model: a malicious network client (pre- and post-pairing) is the primary adversary; also an on-path MITM and a local unprivileged user (the host is privileged).
|
||||
|
||||
## Remediation status (2026-06-21)
|
||||
@@ -11,97 +13,65 @@ All 12 confirmed findings have been addressed — fixed, or documented where a f
|
||||
| #1 | high | **FIXED** (3526517) — secret files 0600 + dir 0700 / Windows icacls DACL |
|
||||
| #2 | high | **FIXED** (3526517) — single-use SPAKE2 PIN (consumed at the host key-confirmation) |
|
||||
| #3 | med | **FIXED** (3526517) — RTSP packetSize bounded + saturating packetizer math |
|
||||
| #4 | low | **FIXED** — mgmt mTLS-cert auth restricted to a read-only allowlist; admin/state-changing routes require the bearer token |
|
||||
| #4 | low | **FIXED** (3c55ec3) — mgmt mTLS-cert auth restricted to a read-only allowlist; admin/state-changing routes require the bearer token |
|
||||
| #5 | low | **DOCUMENTED (won't-fix on legacy)** — legacy GameStream GCM nonce reuse is inherent to Nvidia's old-style control encryption (Apollo/Moonlight identical); the GCM key is client-known. Real fix = V2 control-encryption negotiation; use punktfunk/1 for untrusted nets. Code comment at `control.rs` rumble loop. |
|
||||
| #6 | low | **FIXED** — RTSP Content-Length/header caps + per-read timeout + concurrent-connection cap |
|
||||
| #7 | low | **FIXED (GameStream) / DOCUMENTED (native)** — new `VirtualDisplay::set_launch_command` carries the launch command per-session (GameStream); native path keeps the env (safe under today's single-session model; plumb per-session with concurrent sessions) |
|
||||
| #8 | info | **FIXED** — constant-time GameStream phase-4 hash compare (`crypto::ct_eq`) |
|
||||
| #6 | low | **FIXED** (3c55ec3) — RTSP Content-Length/header caps + per-read timeout + concurrent-connection cap |
|
||||
| #7 | low | **FIXED (3c55ec3, GameStream) / DOCUMENTED (native)** — new `VirtualDisplay::set_launch_command` carries the launch command per-session (GameStream); native path keeps the env (safe under today's single-session model; plumb per-session with concurrent sessions) |
|
||||
| #8 | info | **FIXED** (3c55ec3) — constant-time GameStream phase-4 hash compare (`crypto::ct_eq`) |
|
||||
| #9 | info | **DOCUMENTED** — GameStream pairing over plain HTTP is inherent to GFE compat; steer untrusted networks to the SPAKE2 native plane |
|
||||
| #10 | info | **FIXED** — fixed ALPN (`pkf1`) on both QUIC endpoints (coordinated client+host upgrade) |
|
||||
| #11 | info | **FIXED** — FEC reconstruction failure is now a counted drop, not stream-fatal |
|
||||
| #10 | info | **FIXED** (3c55ec3) — fixed ALPN (`pkf1`) on both QUIC endpoints (coordinated client+host upgrade) |
|
||||
| #11 | info | **FIXED** (3c55ec3) — FEC reconstruction failure is now a counted drop, not stream-fatal |
|
||||
| #12 | low | **DEFERRED (fix ready, reverted)** — the scoped-dispatcher fix (undici `Agent` on `proxyRequest`'s `fetch` option) is designed and the mechanism verified sound (h3 honors the fetch option), but it needs `undici` added as a web dependency (`bun add undici` + lockfile regen), which requires the web build env — not available here. Reverted to keep the web build/proxy working. Latent-only: the loopback mgmt fetch is the web console's ONLY outbound TLS, so the global env weakens nothing today. Apply with: `cd web && bun add undici`, then scope `rejectUnauthorized:false` to the mgmt fetch and drop the global env. |
|
||||
|
||||
## Open items
|
||||
|
||||
Exactly **one** finding is still open — everything else is FIXED or accepted-risk DOCUMENTED (see the table above and #5/#9 below).
|
||||
|
||||
- **#12 [LOW] Web console sets `NODE_TLS_REJECT_UNAUTHORIZED=0` process-globally — DEFERRED.** Latent-only today: the only server-side outbound TLS hop is the loopback proxy to `https://127.0.0.1:47990`, which cannot be MITM'd, so impact is nil now. The fix is designed and verified sound (scope `rejectUnauthorized:false` to a per-request `https.Agent`/undici `Agent` pinned to the host cert on `proxyRequest`'s `fetch`, and drop the global env) but was reverted because it needs `bun add undici` (+ lockfile regen) in the web build env, not available here. **Guard: this MUST be fixed before the web app gains ANY off-loopback server-side outbound TLS** — an update check, webhook, metadata fetch, or pointing `PUNKTFUNK_MGMT_URL` off-loopback would make it silently exploitable. Full detail in the findings section below.
|
||||
|
||||
## Executive summary
|
||||
|
||||
Overall the punktfunk host is a security-conscious codebase with a strong cryptographic and wire-parsing core: the FEC/reassembler path bounds every attacker-controlled length field before allocation, AES-GCM is used correctly with per-direction nonce separation and seq-as-AAD on the native plane, and the native trust model (SPAKE2 PIN binding both cert fingerprints, fingerprint pinning that still verifies the real TLS handshake signature) is genuinely sound. The most serious real defects are (1) local secret-disclosure of the host's master private key (key.pem) — written with no restrictive mode/ACL while the far-less-sensitive mgmt token is carefully 0600 — which on Windows (%ProgramData% default Users-read ACL, LocalSystem service) is a near-certain cross-privilege host-impersonation primitive, and (2) the native SPAKE2 PIN ceremony permitting unlimited online guesses against a static, non-rotating 4-digit PIN (no disarm-on-failure, no lockout), which contradicts the documented "one online guess" guarantee and lets a pre-auth LAN attacker brute-force pairing of a fully-trusted rogue client in a few hours against the default standalone/CLI flow. Dominant themes: file-permission hygiene on secrets is inconsistent (the secure pattern exists but is applied selectively), pairing throttling relies on a single global rate-limit rather than attempt-bounding, and authorization is overbroad (any streaming-paired cert is also a full mgmt admin). The remaining findings are a contained pre-auth RTSP video-thread DoS (unbounded packetSize and Content-Length), a legacy GameStream control-stream GCM nonce-reuse that is muted by modern V2 negotiation and being key-gated, and several defense-in-depth nits (non-constant-time GameStream hash compare, no QUIC ALPN, cross-session env-var launch confusion, global NODE_TLS_REJECT_UNAUTHORIZED). No memory-unsafety or RCE was found on attacker wire bytes; panics are safe Rust and isolated by panic=unwind. Net: a solid foundation whose highest-leverage fixes are tightening secret file permissions and making the PIN single-use/lockout-bounded.
|
||||
|
||||
> The executive summary describes the **pre-fix** state captured by the audit; it is retained verbatim as the historical record. See the status table for what shipped.
|
||||
|
||||
## Findings (ranked by severity × exploitability)
|
||||
|
||||
FIXED findings are collapsed to one line + commit. The two accepted-risk findings (#5, #9) and the one open finding (#12) keep their full rationale.
|
||||
|
||||
### 🟠 #1 [HIGH] Host master private key (key.pem) written with no restrictive file mode / ACL — local secret disclosure enabling full host impersonation
|
||||
|
||||
**Surface:** `secrets-availability`
|
||||
**Refs:** `crates/punktfunk-host/src/gamestream/cert.rs:36-44`, `crates/punktfunk-host/src/gamestream/mod.rs:216-232`, `crates/punktfunk-host/src/mgmt_token.rs:58-70`, `crates/punktfunk-host/src/service.rs:605-627`, `crates/punktfunk-host/src/native_pairing.rs:116-126`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #1 because it is the highest verdict-adjusted severity (high, three corroborating findings merged) and the most reliably exploitable post-foothold: key.pem is the single trust root for ALL surfaces — GameStream TLS server cert, GameStream pairing signing key, the punktfunk/1 QUIC identity every client pins, and the mgmt HTTPS cert — so its disclosure yields full host impersonation/MITM that defeats client fingerprint pinning, plus the mgmt bearer token is likewise unprotected on Windows. ServerIdentity::load_or_create writes it with a bare fs::write (no mode) and create_dir_all (no DACL). On Windows the leak is near-certain and umask-independent: config_dir() is %ProgramData%\punktfunk, whose default ACL grants BUILTIN\Users read, and the host runs as LocalSystem — any local unprivileged user reads the SYSTEM service's key; the mgmt-token 0o600 hardening is #[cfg(unix)] so it is a no-op there. On Linux the file lands at umask (commonly 0664/0644, verified live as world-readable) and is reachable cross-user whenever the home/config chain is traversable. The project demonstrably knows the secure pattern (mgmt_token.rs uses OpenOptions::mode(0o600)+set_permissions) but applies it to the less-sensitive token and not the master key. Local-only (adversary #3), not pre-auth/network, which caps it below critical.
|
||||
|
||||
**Fix:** Write key.pem (and cert.pem) via OpenOptions::mode(0o600) + a follow-up set_permissions(0o600) on Unix, mirroring mgmt_token.rs; create config_dir() with DirBuilder::mode(0o700). On Windows set an explicit DACL granting only SYSTEM+Administrators on the punktfunk %ProgramData% subtree and per-file on key.pem / mgmt-token / *paired.json (or relocate the key under a SYSTEM-only path), since the default ProgramData ACL is Users-readable. Extend the same hardening to client-key.pem and the persisted trust stores. Add a regression test asserting 0600 on key.pem on Unix.
|
||||
**Surface:** `secrets-availability` · **FIXED (3526517).** key.pem is the single trust root for ALL surfaces (GameStream TLS cert + pairing signing key, the punktfunk/1 QUIC identity every client pins, the mgmt HTTPS cert); `ServerIdentity::load_or_create` wrote it with a bare `fs::write`/`create_dir_all` while the less-sensitive mgmt token used `OpenOptions::mode(0o600)`. On Windows (%ProgramData% Users-read ACL + LocalSystem service) this was a near-certain cross-privilege host-impersonation primitive; on Linux it landed at umask (verified world-readable). Fix: 0600 file + 0700 dir on Unix mirroring `mgmt_token.rs`, SYSTEM+Administrators-only DACL on the Windows subtree, extended to `client-key.pem`/trust stores, + a 0600 regression test. Refs `gamestream/cert.rs`, `gamestream/mod.rs`, `mgmt_token.rs`, `service.rs`, `native_pairing.rs`.
|
||||
|
||||
### 🟠 #2 [HIGH] Native SPAKE2 PIN ceremony allows unlimited online guesses against a static 4-digit PIN — pre-auth brute-force to a fully-trusted rogue client
|
||||
**Surface:** `pairing-pin` · **FIXED (3526517).** `pair_ceremony` logged+errored on a wrong PIN but never called `np.disarm()`/rotated; `current_pin()` returned the same value forever, throttled only by one process-wide 2s `PAIRING_COOLDOWN` over a 10,000-PIN space (~2.8h avg → permanently-pinned rogue cert with input/capture/launch). The standalone `punktfunk1-host` default (`--require-pairing` arms `expires_at:None`) made the indefinite static-PIN window the DEFAULT, not opt-in — contradicting the documented "one online guess". Fix: a failed confirmation now consumes/disarms the PIN (the actual "one online guess"); disarm after a successful pair too. Refs `punktfunk1.rs`, `native_pairing.rs`, `mgmt.rs`.
|
||||
|
||||
**Surface:** `pairing-pin`
|
||||
**Refs:** `crates/punktfunk-host/src/punktfunk1.rs:388-446`, `crates/punktfunk-host/src/punktfunk1.rs:475-491`, `crates/punktfunk-host/src/punktfunk1.rs:82`, `crates/punktfunk-host/src/native_pairing.rs:189-234`, `crates/punktfunk-host/src/native_pairing.rs:128-131`, `crates/punktfunk-host/src/mgmt.rs:841-842`
|
||||
### 🟡 #3 [MEDIUM] Pre-auth RTSP ANNOUNCE packetSize underflows/panics the GameStream video pipeline
|
||||
**Surface:** `gamestream-parsing` · **FIXED (3526517).** The RTSP listener on TCP 48010 does no TLS/pairing/auth; `x-nv-video[0].packetSize` flowed unbounded into `VideoPacketizer::new` where `payload_per_shard = packet_size - 16` → packetSize==16 div-by-zero, <16 underflow OOB slice, ==17 per-frame datagram flood; the safe-Rust panic unwound before `running.store(false)`, wedging the session until restart (not RCE — isolated by panic=unwind). Fix: clamp packetSize (floor ~64 / cap ~2048) + checked/saturating packetizer math + `store(false)` on the unwind path + a `{0,15,16,17}` regression test. Refs `gamestream/rtsp.rs`, `gamestream/video.rs`, `gamestream/stream.rs`.
|
||||
|
||||
**Why it ranks here / impact:** Ranked #2: high severity AND pre-auth + fully attacker-controlled, the strongest exploitability combination among the high-rated issues — gated only on pairing being armed and an hours-long active window. Merges the three pairing-pin brute-force findings (they share one root cause: no disarm/rotate-on-failure and no attempt budget). pair_ceremony logs a warning and returns Err on a wrong PIN but never calls np.disarm() or rotates the PIN; current_pin() returns the same value forever (cleared only by TTL or operator); the only throttle is one process-wide 2s PAIRING_COOLDOWN. The PIN space is 10,000. Critically the standalone punktfunk1-host default (--require-pairing forces allow_pairing) arms with expires_at:None at startup, so the indefinite static-PIN window is the DEFAULT for that binary, not an opt-in. At ~1 guess/2s the space exhausts in ~5.5h worst / ~2.8h avg, and on success the attacker's cert is permanently pinned, granting input injection, screen capture and app launch. This directly contradicts the documented 'one online guess, no offline dictionary' claim — the offline-dictionary resistance from SPAKE2 holds, but the online single-guess limit is simply not implemented. Mitigations partial: the web/mgmt arm path is TTL-bounded (15..600s), confining the worst case to the CLI/standalone mode.
|
||||
|
||||
**Fix:** Make a failed confirmation consume the PIN: on ok==false in pair_ceremony, call np.disarm() (or rotate to a fresh random PIN) so a single wrong guess closes the window — this is what actually delivers the documented 'one online guess'. Add a per-window failed-attempt budget (auto-disarm after N>=1 failures), give the CLI no-expiry arm path a default expiry, and disarm after a SUCCESSFUL pair too. Keep the 2s cooldown as defence-in-depth and raise the web-armed PIN to 6 digits.
|
||||
|
||||
### 🟡 #3 [MEDIUM] Pre-auth RTSP ANNOUNCE packetSize underflows/panics the GameStream video pipeline (div-by-zero / OOB slice / allocation amplification)
|
||||
|
||||
**Surface:** `gamestream-parsing`
|
||||
**Refs:** `crates/punktfunk-host/src/gamestream/rtsp.rs:275`, `crates/punktfunk-host/src/gamestream/video.rs:55-89`, `crates/punktfunk-host/src/gamestream/stream.rs:322`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #3: medium and fully pre-auth + attacker-controlled — the highest-exploitability of the medium-and-below tier. The RTSP listener on TCP 48010 performs no TLS/pairing/auth; an unauthenticated peer drives OPTIONS→ANNOUNCE→PLAY (+ a UDP ping to the video port) and the video thread starts on state.stream alone, no paired session required. x-nv-video[0].packetSize is read with no bound and flows into VideoPacketizer::new where payload_per_shard = packet_size - 16: packetSize==16 → pps==0 → div-by-zero panic; packetSize<16 → underflow → OOB slice panic; packetSize==17 → one byte/shard → per-frame datagram flood. Reliable remote pre-auth DoS of a privileged media service, made stickier because the panic unwinds before running.store(false) leaving the session wedged until restart. Calibrated medium (not higher) because it is a SAFE Rust panic (checked slice access, no memory corruption/UB) isolated to the punktfunk-video thread by panic=unwind — the host process and other listeners survive; not RCE.
|
||||
|
||||
**Fix:** Validate packet_size in stream_config() before building StreamConfig: reject packetSize below a sane floor (e.g. < 64) and clamp to a sane max (e.g. <= 2048). Additionally harden VideoPacketizer::new to use checked/saturating arithmetic and refuse construction (or fall back to a default) when packet_size < SHARD_HEADER-16 so the per-frame path never sees pps==0 or a wrapped payload_per_shard. Also store(false) on the unwind path so a panic doesn't wedge the session. Add a regression test over packetSize in {0,15,16,17}.
|
||||
|
||||
### 🔵 #4 [LOW] Any paired punktfunk/1 streaming client gets full management-API authority via the mTLS-paired-cert auth path (no streaming-vs-admin separation)
|
||||
|
||||
**Surface:** `authz-trust`
|
||||
**Refs:** `crates/punktfunk-host/src/mgmt.rs:459-488`, `crates/punktfunk-host/src/mgmt.rs:466-470`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #4: low but a genuine post-auth privilege over-broadening with concrete admin impact. require_auth grants any verified peer cert whose fingerprint is in the native paired store full unscoped access to every /api/v1 route — the SAME paired set that admits a device to stream. So a device paired purely to watch the screen can DELETE /clients/{fp} (unpair others), POST /native/pair/arm (open a pairing window and read the PIN), approve arbitrary knocking devices, DELETE /session, and CRUD the library; there is no role/scope check anywhere in the router. The native client presents its identity via TLS client auth on both ports, so the credential is genuinely usable against mgmt. Bounded to low because it requires being an already-paired (operator-trusted) device and the mgmt port binds loopback by default — remote reach needs an explicit routable --mgmt-bind (and the mTLS path then bypasses the token requirement).
|
||||
|
||||
**Fix:** Separate streaming trust from management trust: keep a distinct admin allow-list (or an admin flag on a paired entry) for the mTLS mgmt path, or restrict mTLS-cert auth to read-only endpoints and require the bearer token for state-changing/admin routes. At minimum gate the pairing-administration endpoints (arm/approve/unpair) and session/library mutation behind the bearer token only.
|
||||
### 🔵 #4 [LOW] Any paired punktfunk/1 streaming client gets full management-API authority via the mTLS-paired-cert auth path
|
||||
**Surface:** `authz-trust` · **FIXED (3c55ec3).** `require_auth` granted any verified peer cert in the native paired store full unscoped `/api/v1` access — the SAME set that admits a device to stream — so a watch-only device could `DELETE /clients/{fp}`, arm pairing + read the PIN, approve knocks, `DELETE /session`, CRUD the library. Fix: mTLS-cert auth restricted to a read-only allowlist; state-changing/admin/pairing-administration routes require the bearer token. Refs `mgmt.rs:459-488`.
|
||||
|
||||
### 🔵 #5 [LOW] GameStream legacy control-stream AES-GCM nonce reuse across directions (host rumble vs client input share key+nonce)
|
||||
|
||||
**Surface:** `crypto`
|
||||
**Surface:** `crypto` · **DOCUMENTED (won't-fix on legacy).**
|
||||
**Refs:** `crates/punktfunk-host/src/gamestream/control.rs:373-400`, `crates/punktfunk-host/src/gamestream/control.rs:257-266`, `crates/punktfunk-host/src/gamestream/control.rs:67,106-114`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #5: a real, correctly-identified catastrophic-class crypto defect (AES-GCM (key,nonce) reuse) but adjusted to low because reachability and impact are heavily muted. The legacy NonceKind branches apply no direction separation (other => other), so host rumble (rumble_seq from 0) and client control (seq from 0) under the shared rikey produce identical (key,nonce). BUT: (1) it only triggers on the legacy auto-detected scheme — modern moonlight-common-c negotiates the V2 scheme which flips marker[0] to 'H' and is direction-separated, so the default path is safe; the doc claim 'the legacy path — which we hit' is stale; (2) the rikey is delivered only over the mTLS /launch, so a pure MITM cannot derive the key — only a paired client can; (3) a paired client can already legitimately send any client→host control message (in-scope-by-design), so forgery is largely redundant and the only genuinely new gain is recovering low-value rumble keystream / forging rumble to its own client. Post-auth, conditional path.
|
||||
|
||||
**Fix:** Separate the two directions' nonce spaces for the legacy schemes too — set a reserved high bit/byte of the legacy IV for host-originated packets (mirror the V2 'H' marker), or better, HKDF-derive an independent host→client key from the rikey with a direction label so host and client never share a GCM key. Never let host rumble and client input share (key,nonce).
|
||||
**Fix:** Separate the two directions' nonce spaces for the legacy schemes too — set a reserved high bit/byte of the legacy IV for host-originated packets (mirror the V2 'H' marker), or better, HKDF-derive an independent host→client key from the rikey with a direction label so host and client never share a GCM key. Never let host rumble and client input share (key,nonce). (Inherent to the legacy Nvidia wire; the real fix is the V2 control-encryption / punktfunk/1 path. A code comment marks it at the `control.rs` rumble loop.)
|
||||
|
||||
### 🔵 #6 [LOW] RTSP request Content-Length / header size unbounded with no read timeout or connection cap — pre-auth slow-loris / memory-growth DoS
|
||||
**Surface:** `gamestream-parsing` · **FIXED (3c55ec3).** `read_message` computed `total = end+4+content_len` with no cap and looped `extend_from_slice`; the header scan was unbounded and one unbounded native thread spawned per connection with no global limit (slow exhaustion + thread/FD exhaustion on a privileged plaintext LAN listener). Fix: Content-Length/total-header caps + per-read timeout + concurrent-connection cap. Refs `gamestream/rtsp.rs`.
|
||||
|
||||
**Surface:** `gamestream-parsing`
|
||||
**Refs:** `crates/punktfunk-host/src/gamestream/rtsp.rs:82-106`, `crates/punktfunk-host/src/gamestream/rtsp.rs:24-48`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #6: low, pre-auth and attacker-controlled but a rate-limited resource DoS, not unsafety or auth bypass. read_message parses content-length and computes total = end+4+content_len with no cap, looping buf.extend_from_slice until buf.len()>=total; the header scan is likewise unbounded and there is no body/header cap, no read/write timeout, and one unbounded native thread is spawned per connection with no global limit. Growth is bounded by attacker send rate (no pre-allocation), so it is slow exhaustion rather than instant OOM; the stronger lever is thread/FD exhaustion from many idle slow-loris connections at near-zero bandwidth. On a privileged LAN-facing plaintext listener with zero defensive caps.
|
||||
|
||||
**Fix:** Cap Content-Length and total header size to small constants (e.g. reject content_len > 64 KiB, total header > 16 KiB) and close on violation. Add a read timeout so a slow-loris connection cannot pin a thread indefinitely, and bound concurrent RTSP connections.
|
||||
|
||||
### 🔵 #7 [LOW] Per-session launch command carried via process-global PUNKTFUNK_GAMESCOPE_APP env var, stomped under concurrent native sessions (cross-session launch confusion)
|
||||
|
||||
**Surface:** `privilege-process-launch`
|
||||
**Refs:** `crates/punktfunk-host/src/punktfunk1.rs:560-571`, `crates/punktfunk-host/src/punktfunk1.rs:140`, `crates/punktfunk-host/src/vdisplay/gamescope.rs:629-647`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #7: low, post-auth cross-session isolation bug, explicitly NOT command injection. serve_session does std::env::set_var(PUNKTFUNK_GAMESCOPE_APP) per accepted connection with a stale comment claiming 'one session at a time', but DEFAULT_MAX_CONCURRENT=4 sessions run concurrently and the var is read in gamescope::spawn during VirtualDisplay::create — a genuine TOCTOU where client B's launch overwrites what client A's bare-spawn reads, and the never-cleared value leaks into a later no-launch client. Impact is capped because cmd always resolves through library::launch_command (digit-validated Steam appids / operator-only custom store), so the worst case is launching a DIFFERENT operator-approved title or a stale title — and it only affects the gamescope bare-spawn backend (kwin/mutter/wlroots/attach ignore the var).
|
||||
|
||||
**Fix:** Stop carrying the per-session launch command in a process-global env var. Plumb the resolved command through the VirtualDisplay::create call / per-session context (e.g. a field on Mode or a per-session GamescopeDisplay), and on the bare-spawn path pass it explicitly to spawn(); clear/scope it so a stale value never leaks to the next client.
|
||||
### 🔵 #7 [LOW] Per-session launch command carried via process-global PUNKTFUNK_GAMESCOPE_APP env var, stomped under concurrent native sessions
|
||||
**Surface:** `privilege-process-launch` · **FIXED (3c55ec3, GameStream) / DOCUMENTED (native).** `serve_session` did `std::env::set_var(PUNKTFUNK_GAMESCOPE_APP)` per connection under `DEFAULT_MAX_CONCURRENT=4` — a TOCTOU where client B's launch overwrote what client A's gamescope bare-spawn read (and a never-cleared value leaked to a later no-launch client). NOT command injection: `cmd` always resolves through `library::launch_command` (digit-validated Steam appids / operator-only custom store), so the worst case is a different operator-approved title, and only the gamescope bare-spawn backend reads the var. Fix: `VirtualDisplay::set_launch_command` carries it per-session for GameStream; **the native path keeps the env — safe under today's single-session model, plumb per-session as concurrent sessions land** (still-open follow-up). Refs `punktfunk1.rs`, `vdisplay/gamescope.rs`.
|
||||
|
||||
### ⚪ #8 [INFO] GameStream pairing phase-4 hash compare is not constant-time
|
||||
|
||||
**Surface:** `pairing-pin`
|
||||
**Refs:** `crates/punktfunk-host/src/gamestream/pairing.rs:226-247`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #8: info / hardening only — a real variable-time `==` on attacker-influenced 32-byte SHA-256 digests, but not weaponizable. The compared `expected` mixes in host-random server_challenge that is never disclosed (so the attacker can neither compute nor aim at the target), the attacker cannot steer client_hash to a chosen value without the PIN key, and any mismatch removes the session (map.remove) forcing a fresh ceremony with new randomness — so there is no stable secret to recover prefix-by-prefix and no path from timing to PIN recovery or match forgery. Worth fixing for consistency since the codebase already has ct_eq for the native ceremony.
|
||||
|
||||
**Fix:** Use a constant-time comparator (subtle::ConstantTimeEq or the project's existing ct_eq) for hash_ok, matching the constant-time discipline already used in the native SPAKE2 ceremony.
|
||||
**Surface:** `pairing-pin` · **FIXED (3c55ec3).** Variable-time `==` on attacker-influenced 32-byte phase-4 SHA-256 digests; not weaponizable (`expected` mixes undisclosed host-random `server_challenge`, a mismatch `map.remove`s the session forcing fresh randomness — no stable secret, no timing→PIN path). Fixed for consistency with the native ceremony: `crypto::ct_eq`. Refs `gamestream/pairing.rs:226-247`.
|
||||
|
||||
### ⚪ #9 [INFO] GameStream pairing ceremony runs over plain HTTP — inherited GFE brute-forceable-PIN / MITM weakness
|
||||
|
||||
**Surface:** `authz-trust`
|
||||
**Surface:** `authz-trust` · **DOCUMENTED (inherent to GameStream compat).**
|
||||
**Refs:** `crates/punktfunk-host/src/gamestream/nvhttp.rs:33`, `crates/punktfunk-host/src/gamestream/nvhttp.rs:215-264`, `crates/punktfunk-host/src/gamestream/pairing.rs:102-247`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #9: info — real but intentional Moonlight-compat behavior, on record rather than a regression. The whole /pair flow (incl. phase-4 cert pinning) is on plain HTTP 47989 with no transport confidentiality and no rate-limiting; the AES key is pin_key(salt,pin) = SHA-256(salt||pin)[..16] feeding AES-128-ECB, so an on-path attacker observing a legitimate pairing can offline-brute-force the 4-digit PIN and forge a clientpairingsecret to get a cert pinned. This is the well-known GFE/Sunshine construction, fixed by interop, and is precisely why punktfunk/1's SPAKE2 path exists; it requires an active MITM during an operator-initiated pairing within the 300s window. A paired GameStream client is in-scope-by-design.
|
||||
@@ -109,31 +79,19 @@ Overall the punktfunk host is a security-conscious codebase with a strong crypto
|
||||
**Fix:** Inherent to GameStream compatibility — document it and steer users to punktfunk/1 (SPAKE2) for untrusted networks. Optionally rate-limit pairing sessions per uniqueid/IP and tighten/expire the awaiting-PIN window aggressively.
|
||||
|
||||
### ⚪ #10 [INFO] No ALPN configured on the native QUIC server/client (cross-protocol confusion hardening absent)
|
||||
|
||||
**Surface:** `cert-tls-identity`
|
||||
**Refs:** `crates/punktfunk-core/src/quic.rs:1335-1354`, `crates/punktfunk-core/src/quic.rs:1412-1448`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #10: info — factually correct (no alpn_protocols set on either endpoint; the cert.pem identity is shared with GameStream TLS) but no reachable confusion attack. ALPACA-style attacks need two TLS services sharing a cert on the SAME transport; here GameStream is TLS-over-TCP and punktfunk/1 is TLS-in-QUIC (UDP) — not cross-reachable — and there is exactly one QUIC server so ALPN would make no authorization decision. Trust is already enforced by fingerprint pinning + app-layer Hello/Welcome magic. Cheap future-proofing only.
|
||||
|
||||
**Fix:** Set a fixed ALPN on both endpoints (e.g. rustls_cfg.alpn_protocols = vec![b"pkf1".to_vec()]) so a mismatched protocol is rejected during the TLS handshake — defense-in-depth against ever multiplexing protocols on the QUIC endpoint.
|
||||
**Surface:** `cert-tls-identity` · **FIXED (3c55ec3).** No `alpn_protocols` was set on either endpoint, but no reachable confusion attack: ALPACA needs two TLS services sharing a cert on the SAME transport — GameStream is TLS-over-TCP, punktfunk/1 is TLS-in-QUIC (UDP), and there is exactly one QUIC server, with trust already enforced by fingerprint pinning + Hello/Welcome magic. Fixed as cheap future-proofing: fixed ALPN `pkf1` on both endpoints (coordinated client+host upgrade). Refs `quic.rs:1335-1448`.
|
||||
|
||||
### ⚪ #11 [INFO] FEC reconstruct error on the receive path is stream-fatal — code-contract inconsistency (not an exploitable DoS)
|
||||
|
||||
**Surface:** `core-wire-deser`
|
||||
**Refs:** `crates/punktfunk-core/src/packet.rs:411`, `crates/punktfunk-core/src/session.rs:283-289`, `clients/probe/src/main.rs:959`, `crates/punktfunk-host/src/spike.rs:251`
|
||||
|
||||
**Why it ranks here / impact:** Ranked last: info — a correctly-identified contract inconsistency with NO demonstrable exploit. Reassembler::push propagates coder.reconstruct(...)? and both real receive-side callers treat any non-NoFrame error as fatal, inconsistent with the surrounding 'malformed = silent drop, never fatal' discipline. But every Err arm was traced unreachable from hostile input: header firewall + block-geometry pinning guarantee equal-length, correctly-counted shards; reconstruct is only called once received>=data_shards; Config::validate rejects odd/zero shard_payload before any decode; and MDS Reed-Solomon decodes any data_shards distinct shards. Reaching the reassembler also requires an AES-GCM-decryptable packet, so it is the connected host (not a port-sprayer), and it is client-side only — the privileged host never runs the reassembler on attacker bytes. Pure defense-in-depth hardening.
|
||||
|
||||
**Fix:** Make a FEC reconstruction failure a counted drop rather than stream-fatal: in Reassembler::push match coder.reconstruct(...) and on Err bump packets_dropped (or a fec_failed counter), discard the block, and return Ok(None). Reserve poll_frame's Err for genuinely fatal conditions (role misuse, transport teardown), matching the discipline documented at packet.rs:298-300.
|
||||
**Surface:** `core-wire-deser` · **FIXED (3c55ec3).** `Reassembler::push` propagated `coder.reconstruct(...)?` and both receive-side callers treated any non-`NoFrame` error as fatal — inconsistent with the surrounding "malformed = silent drop, never fatal" discipline. Every Err arm was traced unreachable from hostile input (header firewall + block-geometry pinning guarantee equal-length correctly-counted shards; `Config::validate` rejects odd/zero `shard_payload`; MDS Reed-Solomon decodes any `data_shards` distinct shards; reaching the reassembler needs an AES-GCM-decryptable packet; client-side only). Fixed as defense-in-depth: a reconstruct failure is now a counted drop returning `Ok(None)`, reserving `Err` for genuinely fatal conditions. Refs `packet.rs`, `session.rs`, `clients/probe/src/main.rs`, `spike.rs`.
|
||||
|
||||
### 🔵 #12 [LOW] Web console sets NODE_TLS_REJECT_UNAUTHORIZED=0 process-globally — latent footgun disabling all outbound TLS verification
|
||||
|
||||
**Surface:** `deps-config-exposure`
|
||||
**Surface:** `deps-config-exposure` · **DEFERRED — THE ONE STILL-OPEN ITEM.**
|
||||
**Refs:** `web/.env.example:22-24`, `web/web.env.example:11-14`, `web/server/util/auth.ts:17-22`, `web/vite.config.ts:23`
|
||||
|
||||
**Why it ranks here / impact:** Ranked #12: low and not currently exploitable (attackerControlled false), included as a latent defense-in-depth defect. NODE_TLS_REJECT_UNAUTHORIZED=0 disables certificate validation for every outbound TLS connection the Node process makes, but the only current server-side outbound hop is the loopback proxy to https://127.0.0.1:47990 (CDN/art fetches are browser-side), and a loopback connection cannot be MITM'd — so impact is nil today. Real impact materializes silently if anyone later adds a server-side off-host HTTPS call (update check, webhook, metadata fetch) or points PUNKTFUNK_MGMT_URL off-loopback.
|
||||
|
||||
**Fix:** Do not disable TLS verification globally. Pin the host's self-signed cert for the single loopback fetch: pass an https.Agent with the host cert as `ca` (or rejectUnauthorized:false on that one Agent only) to the proxyRequest fetch in server/routes/api/[...].ts, and drop NODE_TLS_REJECT_UNAUTHORIZED from the deployment env.
|
||||
**Fix:** Do not disable TLS verification globally. Pin the host's self-signed cert for the single loopback fetch: pass an https.Agent with the host cert as `ca` (or rejectUnauthorized:false on that one Agent only) to the proxyRequest fetch in server/routes/api/[...].ts, and drop NODE_TLS_REJECT_UNAUTHORIZED from the deployment env. **Reverted** because it needs `undici` added as a web dependency (`bun add undici` + lockfile regen) in the web build env. Apply with `cd web && bun add undici`, then scope `rejectUnauthorized:false` to the mgmt fetch and drop the global env.
|
||||
|
||||
## Cross-cutting themes
|
||||
|
||||
@@ -144,17 +102,6 @@ Overall the punktfunk host is a security-conscious codebase with a strong crypto
|
||||
- Stale concurrency assumptions and process-global mutable state (legacy GCM nonce direction, PUNKTFUNK_GAMESCOPE_APP env var) that were safe under a since-removed 'one session at a time' invariant and now cause cross-session confusion / crypto reuse.
|
||||
- Strong, well-tested cryptographic and memory-safety core (bounded wire parsing, correct AEAD/SPAKE2/pinning, catch_unwind FFI, panic=unwind isolation) — the foundation is solid; the residual risk is in operational hardening and trust-tier granularity, not in unsafe/RCE.
|
||||
|
||||
## Prioritized remediation (do in this order)
|
||||
|
||||
1. Lock down secret files: write key.pem (and cert.pem) 0600 + create config_dir 0700 on Unix using the existing mgmt_token OpenOptions::mode pattern, and set an explicit SYSTEM+Administrators-only DACL on the punktfunk %ProgramData% subtree / key.pem / mgmt-token / *paired.json on Windows. Extend to client-key.pem; add a 0600 regression test.
|
||||
2. Make the native PIN single-use and lockout-bounded: disarm or rotate the PIN on a failed SPAKE2 confirmation, add a per-window failed-attempt budget, give the CLI no-expiry arm path a default expiry, and disarm after a successful pair — this is what delivers the documented 'one online guess'.
|
||||
3. Bound the RTSP video path: validate/clamp x-nv-video[0].packetSize (floor ~64, cap ~2048) in stream_config() and use checked/saturating arithmetic in VideoPacketizer::new so pps==0 / underflow can never occur; store(false) on the unwind path; add a {0,15,16,17} regression test.
|
||||
4. Cap RTSP request parsing: enforce a Content-Length and total-header-size limit, add a read timeout, and bound concurrent connections so a pre-auth peer cannot slow-loris exhaust threads/memory.
|
||||
5. Separate streaming trust from management trust: require the mgmt bearer token (not just a paired streaming cert) for state-changing and pairing-administration routes (arm/approve/unpair/session/library), or keep a distinct admin allow-list.
|
||||
6. Fix the legacy GameStream GCM nonce reuse: HKDF-derive an independent host→client key from the rikey (direction label), or mirror the V2 'H' direction marker into the legacy IV so host rumble and client input never share (key,nonce).
|
||||
7. Stop carrying the per-session gamescope launch command in a process-global env var: plumb it through the per-session VirtualDisplay::create/context and clear it when no launch is requested, eliminating cross-session stomping under concurrency.
|
||||
8. Apply the cheap hardening nits: constant-time compare for the GameStream phase-4 hash (use ct_eq), set a fixed ALPN ('pkf1') on both QUIC endpoints, make FEC reconstruct failures a counted drop instead of stream-fatal, and replace the global NODE_TLS_REJECT_UNAUTHORIZED with a cert-pinned https.Agent scoped to the loopback mgmt fetch.
|
||||
|
||||
## Security controls done right (positives)
|
||||
|
||||
- Defense-in-depth wire parsing: every attacker-controllable FEC/reassembler header field is bounded against negotiated limits BEFORE any allocation keyed on it (packet.rs:328-343) — shard_bytes exact-match, data/total/block counts in range, indices in bounds, frame_bytes<=max — with no integer overflow in the size math and regression tests (rejects_oversized_shard_counts, rejects_inconsistent_block_geometry_without_panicking).
|
||||
|
||||
Reference in New Issue
Block a user