punktfunk/design/session-aware-host-followups.md
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00

# Session-aware host — known limitations & follow-ups
> **Status:** Session detection SHIPPED — host auto-detects the live session (Gaming / KDE / GNOME /
> wlroots) **per connect** and routes video+input at it; opt-in mid-stream switch watcher
> (`PUNKTFUNK_SESSION_WATCH=1`). Code: `crates/punktfunk-host/src/vdisplay.rs` +
> `vdisplay/linux/{gamescope,kwin,mutter}.rs`. Items #2 + #3 resolved in code (`3363576`); this doc is
> trimmed to the still-open limitations + their design rationale.

The host auto-detects the live session per connect and routes both video and input at it — managed
gamescope at the client's resolution in Steam Gaming Mode, a KWin/Mutter virtual output at the
client's resolution on a Desktop. A watcher (opt-in: `PUNKTFUNK_SESSION_WATCH=1`) follows a
Gaming↔Desktop switch **mid-stream** and rebuilds the backend in place without a reconnect.

Live-validated on the Bazzite F44 box (`bazzite-deck-nvidia:testing`, RTX 4090): Desktop KDE at
5120×1440 + input; Gaming managed at 5120×1440; warm-session reuse on quick reconnect; Feature B
(the mid-stream switch watcher) video switch in both directions.

(Resolved: **#2 mid-stream-switch input** — `vdisplay::settle_desktop_portal()`; **#3 KWin/Mutter
virtual output primary** — `apply_session_env` defaults `PUNKTFUNK_KWIN_VIRTUAL_PRIMARY` /
`PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY` on for the auto desktop path. Both shipped in `3363576`; details in
git history.)
## Still parked
### 1. F44 gamescope teardown corrupts the GPU context
Every gamescope teardown on this box (stopping the autologin session on connect; stopping the managed
session on restore) risks leaking the NVIDIA GPU context. The leak surfaces as
`CUDA_ERROR_ILLEGAL_STATE` (401) from `cuCtxCreate`, or `VK_ERROR_INITIALIZATION_FAILED` (-3) from
`vkCreateDevice`, followed by a black screen that **needs a reboot**. The 5 s debounced restore + the
desktop restore-guard cut the teardown *count* but don't eliminate it. Options, in order of preference:
- **SIGKILL the gamescope on teardown** instead of `systemctl stop` (SIGTERM). Hypothesis: skipping
gamescope's buggy SIGTERM teardown handler (the part that SIGSEGVs, exit 139) lets the process die
hard and the driver reclaim its GPU resources cleanly via normal process exit — no half-torn-down
context. Change `stop_autologin_sessions` + `stop_session` (`vdisplay/linux/gamescope.rs`, both
still use `systemctl --user stop` = SIGTERM) to `systemctl --user kill --signal=SIGKILL <unit>`
(+ a follow-up `stop`/`reset-failed` to clear unit state). **Untested** — this is the first thing
to try; it would preserve "managed client-res gaming AND TV-shows-gaming-when-idle".
- **Keep the managed session warm** (no per-disconnect restore): spawn once, reuse forever, never
tear down → ~1 teardown per host lifetime. Tradeoff: the TV is blank/idle when no client is
connected (the autologin is never restored; return to gaming manually).
- Upstream gamescope/driver fix.
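
The first option can be sketched as a two-step `systemctl --user` sequence. This is a minimal illustration, not the code in `vdisplay/linux/gamescope.rs`: the helper names `hard_teardown_args` and `hard_teardown` are hypothetical, and the sketch picks `reset-failed` as the follow-up step. Like the option itself, it is untested against the F44 box.

```rust
use std::process::Command;

/// Build the `systemctl` argument lists for a hard teardown of `unit`:
/// SIGKILL every process in the unit, then clear any failed unit state.
/// Returned as data so the sequence can be inspected without running it.
/// (Hypothetical helper, not part of the punktfunk codebase.)
fn hard_teardown_args(unit: &str) -> [Vec<String>; 2] {
    fn owned(args: &[&str]) -> Vec<String> {
        args.iter().map(|s| s.to_string()).collect()
    }
    [
        owned(&["--user", "kill", "--signal=SIGKILL", unit]),
        owned(&["--user", "reset-failed", unit]),
    ]
}

/// Run the sequence. A non-zero exit is logged rather than treated as fatal,
/// since `reset-failed` can fail when the unit never entered a failed state.
#[allow(dead_code)]
fn hard_teardown(unit: &str) -> std::io::Result<()> {
    for args in hard_teardown_args(unit) {
        let status = Command::new("systemctl").args(&args).status()?;
        if !status.success() {
            eprintln!("systemctl {:?} exited with {}", args, status);
        }
    }
    Ok(())
}
```

Keeping the argument lists separate from the `Command` call makes it easy to check the exact signal and unit being targeted before trying this on a box where a bad teardown costs a reboot.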
## Lower priority / polish
### 4. Mid-stream-switch input loss window (~6 s)
During the libei portal setup on a switch, buffered input drops (`libei: DROP — no resumed device`,
hundreds of events). Polish: pre-warm the portal, or hold events instead of dropping during the
device-resume window.
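
The "hold events instead of dropping" idea could look like the bounded queue below. It is a sketch under assumptions: `HeldInput` and its methods are hypothetical names, `E` stands in for whatever event type the host forwards to libei, and the capacity policy (discard the oldest event when full) is one possible choice rather than a decided design.

```rust
use std::collections::VecDeque;

/// Holds input events while no device is resumed, instead of dropping them.
/// (Hypothetical type for illustration.)
struct HeldInput<E> {
    held: VecDeque<E>,
    cap: usize,
    resumed: bool,
}

impl<E> HeldInput<E> {
    /// `cap` bounds memory use during the resume window (minimum 1).
    fn new(cap: usize) -> Self {
        let cap = cap.max(1);
        Self { held: VecDeque::with_capacity(cap), cap, resumed: false }
    }

    /// Returns the event if it can be sent right away; otherwise holds it.
    /// When the queue is full, the oldest held event is discarded.
    fn push(&mut self, ev: E) -> Option<E> {
        if self.resumed {
            return Some(ev);
        }
        if self.held.len() == self.cap {
            self.held.pop_front();
        }
        self.held.push_back(ev);
        None
    }

    /// Call once the device has resumed: returns held events in arrival order.
    fn resume(&mut self) -> Vec<E> {
        self.resumed = true;
        self.held.drain(..).collect()
    }
}
```

Replaying several seconds of stale input is not always desirable (a held key press may no longer be wanted), so a real implementation would likely filter or age out what it replays.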
### 5. NVENC `InitializeEncoder failed: invalid param` (recovered)
At 5120×1440@240 the first NVENC open fails with `invalid param (8)` and **recovers** via the 2-way
split-encode path (the stream is live). Cosmetic but noisy — investigate the first-attempt failure /
silence the log.
### 6. NVENC HEVC bitrate cap (~800 Mbps on the RTX 4090)
HEVC opens at the GPU's max (~800 Mbps) when a higher rate is requested (e.g. 1600). Not a bug;
consider preferring AV1 when the client requests >~800 Mbps HEVC, and surface the cap in the
speed-test / bitrate UI.
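
A possible shape for that preference rule, as a sketch only: `Codec`, `Choice`, and `pick_codec` are hypothetical names, the HEVC cap is passed in rather than hard-coded, and any AV1 ceiling is deliberately not modeled because this doc does not state one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Hevc,
    Av1,
}

/// Result of the selection; `capped` is what a bitrate UI could surface.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Choice {
    codec: Codec,
    mbps: u32,
    capped: bool,
}

/// Prefer AV1 when the request exceeds the HEVC cap and AV1 is available;
/// otherwise stay on HEVC and clamp to the cap.
/// Assumption: AV1 can carry the requested rate (no AV1 cap is modeled).
fn pick_codec(requested_mbps: u32, hevc_cap_mbps: u32, av1_available: bool) -> Choice {
    let over_cap = requested_mbps > hevc_cap_mbps;
    if over_cap && av1_available {
        Choice { codec: Codec::Av1, mbps: requested_mbps, capped: false }
    } else {
        Choice {
            codec: Codec::Hevc,
            mbps: requested_mbps.min(hevc_cap_mbps),
            capped: over_cap,
        }
    }
}
```

With the numbers from this section, a 1600 Mbps request against an 800 Mbps cap selects AV1 when it is available, and otherwise falls back to HEVC at 800 Mbps with `capped` set.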
### 7. Restore-guard / keep-warm model interaction
`do_restore_tv_session`, when a desktop is active, still stops the idle managed gamescope (a teardown
— leak risk per #1) and consumes `STOPPED_AUTOLOGIN` (so a later return-to-gaming won't auto-restore
the TV session). Resolve together with the keep-warm decision in #1.
### 8. Feature B is opt-in
The mid-stream watcher is gated behind `PUNKTFUNK_SESSION_WATCH=1` pending broader validation. #2
(mid-stream input) has since landed in `3363576`, so promote to default-on once it's exercised on more boxes.
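
One way to make the later promotion a one-line change is to parse the gate with an explicit default. This is a sketch: `session_watch_enabled` is a hypothetical helper, and treating `0` as an explicit opt-out is an assumption (this doc only describes `=1`).

```rust
/// Decide whether the mid-stream watcher runs.
/// `value` is the raw `PUNKTFUNK_SESSION_WATCH` setting, if any.
/// Assumption: "0" opts out explicitly; anything else falls back to the default.
fn session_watch_enabled(value: Option<&str>, default_on: bool) -> bool {
    match value {
        Some("1") => true,
        Some("0") => false,
        _ => default_on,
    }
}
```

A caller could pass `std::env::var("PUNKTFUNK_SESSION_WATCH").ok().as_deref()` as `value`; flipping `default_on` from `false` to `true` is then the whole promotion, and users keep a way to turn the watcher off.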
## Open items
1. **F44 gamescope teardown corrupts the GPU context** (#1) — try SIGKILL on teardown
(`stop_autologin_sessions` / `stop_session` in `vdisplay/linux/gamescope.rs`), else keep the
managed session warm, else upstream fix.
2. **Mid-stream-switch input-loss window (~6 s)** (#4) — pre-warm the portal or buffer/hold events
instead of dropping during the device-resume window.
3. **NVENC `InitializeEncoder failed: invalid param` noise at 5120×1440@240** (#5) — recovers via
split-encode; investigate the first-attempt failure / silence the log.
4. **NVENC HEVC ~800 Mbps cap on the RTX 4090** (#6) — consider preferring AV1 above it + surface the
cap in the speed-test / bitrate UI.
5. **Restore-guard / keep-warm interaction** (#7) — couples to #1; resolve together.
6. **Feature B (`PUNKTFUNK_SESSION_WATCH`) still opt-in** (#8) — #2 has landed (`3363576`); promote
to default-on once it's exercised on more boxes.