punktfunk/design/session-aware-host-followups.md
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00

Session-aware host — known limitations & follow-ups

Status: Session detection SHIPPED — host auto-detects the live session (Gaming / KDE / GNOME / wlroots) per connect and routes video+input at it; opt-in mid-stream switch watcher (PUNKTFUNK_SESSION_WATCH=1). Code: crates/punktfunk-host/src/vdisplay.rs + vdisplay/linux/{gamescope,kwin,mutter}.rs. Items #2 + #3 resolved in code (3363576); this doc is trimmed to the still-open limitations + their design rationale.

The host auto-detects the live session per connect and routes both video and input at it — managed gamescope at the client's resolution in Steam Gaming Mode, a KWin/Mutter virtual output at the client's resolution on a Desktop. A watcher (opt-in: PUNKTFUNK_SESSION_WATCH=1) follows a Gaming↔Desktop switch mid-stream and rebuilds the backend in place without a reconnect.
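As a rough illustration of the routing described above, the session-to-backend mapping could be modeled as follows. `Session`, `Backend` and `backend_for` are hypothetical names chosen for this sketch, not the host's actual types. How wlroots sessions are routed is not covered in this document, so the sketch returns `None` for that case rather than guessing.

```rust
/// Live session kinds named in the status line (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Session {
    Gaming,
    Kde,
    Gnome,
    Wlroots,
}

/// Video backends named in this document (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// Managed gamescope at the client's resolution (Steam Gaming Mode).
    ManagedGamescope,
    /// KWin virtual output at the client's resolution.
    KwinVirtualOutput,
    /// Mutter virtual output at the client's resolution.
    MutterVirtualOutput,
}

/// Map a detected session to the backend this document describes for it.
/// Returns `None` where the document does not state the routing.
fn backend_for(session: Session) -> Option<Backend> {
    match session {
        Session::Gaming => Some(Backend::ManagedGamescope),
        Session::Kde => Some(Backend::KwinVirtualOutput),
        Session::Gnome => Some(Backend::MutterVirtualOutput),
        // Assumption: left unspecified because the routing is not described here.
        Session::Wlroots => None,
    }
}
```

With a mapping like this, a mid-stream switch amounts to re-running detection and rebuilding the backend when `backend_for` returns a different value.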

Live-validated on the Bazzite F44 box (bazzite-deck-nvidia:testing, RTX 4090): Desktop KDE at 5120×1440 + input; Gaming managed at 5120×1440; warm-session reuse on quick reconnect; Feature B video-switch both directions.

(Resolved: #2 mid-stream-switch input → vdisplay::settle_desktop_portal(); #3 KWin/Mutter virtual output primary → apply_session_env defaults PUNKTFUNK_KWIN_VIRTUAL_PRIMARY / PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY on for the auto desktop path. Both shipped in 3363576; details in git history.)

Still parked

1. F44 gamescope teardown corrupts the GPU context

Every gamescope teardown on this box (stopping the autologin on connect; stopping the managed session on restore) risks leaking the NVIDIA GPU context. The leak surfaces as CUDA_ERROR_ILLEGAL_STATE (401) from cuCtxCreate, or as VK_ERROR_INITIALIZATION_FAILED (-3) from vkCreateDevice, followed by a black screen that needs a reboot. The 5 s debounced restore and the desktop restore-guard reduce the number of teardowns but don't eliminate them. Options, in order of preference:

  • SIGKILL the gamescope on teardown instead of systemctl stop (SIGTERM). Hypothesis: skipping gamescope's buggy SIGTERM teardown handler (the part that SIGSEGVs, exit 139) lets the process die hard and the driver reclaim its GPU resources cleanly via normal process exit — no half-torn-down context. Change stop_autologin_sessions + stop_session (vdisplay/linux/gamescope.rs, both still use systemctl --user stop = SIGTERM) to systemctl --user kill --signal=SIGKILL <unit> (+ a follow-up stop/reset-failed to clear unit state). Untested — this is the first thing to try; it would preserve "managed client-res gaming AND TV-shows-gaming-when-idle".
  • Keep the managed session warm (no per-disconnect restore): spawn once, reuse forever, never tear down → ~1 teardown per host lifetime. Tradeoff: the TV is blank/idle when no client is connected (the autologin is never restored; return to gaming manually).
  • Upstream gamescope/driver fix.
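The first option above (SIGKILL instead of SIGTERM) could be sketched as below. This is a minimal, untested sketch and not the current `stop_session` code: `hard_kill_argv` and `hard_kill_unit` are hypothetical helpers, and the unit name is supplied by the caller. It uses `reset-failed` as the follow-up step; whether an extra `stop` is also needed is left open, as in the text above.

```rust
use std::io;
use std::process::Command;

/// Convert a slice of string literals into an owned argv.
fn to_argv(args: &[&str]) -> Vec<String> {
    args.iter().map(|s| s.to_string()).collect()
}

/// Build the two `systemctl --user` invocations for a hard teardown:
/// SIGKILL the unit's processes, then clear any failed state on the unit.
/// Returned as argv vectors so the sequence can be inspected without running it.
fn hard_kill_argv(unit: &str) -> Vec<Vec<String>> {
    vec![
        to_argv(&["systemctl", "--user", "kill", "--signal=SIGKILL", unit]),
        to_argv(&["systemctl", "--user", "reset-failed", unit]),
    ]
}

/// Run the sequence (hypothetical stand-in for the `systemctl --user stop` call).
/// A non-zero exit status is reported on stderr but not treated as fatal,
/// since the unit may already be gone or may not be in a failed state.
fn hard_kill_unit(unit: &str) -> io::Result<()> {
    for argv in hard_kill_argv(unit) {
        let status = Command::new(&argv[0]).args(&argv[1..]).status()?;
        if !status.success() {
            eprintln!("{} exited with {}", argv.join(" "), status);
        }
    }
    Ok(())
}
```

Keeping the argv construction separate from execution makes the command sequence easy to check in a unit test before trying it on the box where the leak reproduces.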

Lower priority / polish

4. Mid-stream-switch input loss window (~6 s)

During the libei portal setup on a switch, buffered input drops (libei: DROP — no resumed device, hundreds of events). Polish: pre-warm the portal, or hold events instead of dropping during the device-resume window.
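The "hold instead of drop" idea could look roughly like the bounded buffer below. This is a sketch under assumptions: `HoldBuffer` is a hypothetical type, `E` stands in for whatever event type the input path uses, and the capacity and eviction policy (drop the oldest) are illustrative choices rather than anything the host does today.

```rust
use std::collections::VecDeque;

/// Holds input events while no device is resumed, instead of dropping them.
struct HoldBuffer<E> {
    pending: VecDeque<E>,
    capacity: usize,
    resumed: bool,
}

impl<E> HoldBuffer<E> {
    /// Create a buffer that holds at most `capacity` events (minimum 1).
    fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1);
        Self {
            pending: VecDeque::with_capacity(capacity),
            capacity,
            resumed: false,
        }
    }

    /// Offer an event. Returns the events that should be forwarded right now:
    /// the event itself when the device is resumed, nothing while it is held.
    fn push(&mut self, event: E) -> Vec<E> {
        if self.resumed {
            return vec![event];
        }
        if self.pending.len() == self.capacity {
            // Bounded: evict the oldest held event to make room.
            self.pending.pop_front();
        }
        self.pending.push_back(event);
        Vec::new()
    }

    /// Device resumed: release everything held, in arrival order.
    fn resume(&mut self) -> Vec<E> {
        self.resumed = true;
        self.pending.drain(..).collect()
    }

    /// Device paused again, for example when another switch begins.
    fn pause(&mut self) {
        self.resumed = false;
    }
}
```

Replaying several seconds of held input verbatim may not be desirable (stale key presses, accumulated pointer motion), so a real implementation would likely filter or coalesce what `resume` returns.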

5. NVENC InitializeEncoder failed: invalid param (recovered)

At 5120×1440@240 the first NVENC open fails with invalid param (8) and recovers via the 2-way split-encode path (the stream is live). Cosmetic but noisy — investigate the first-attempt failure / silence the log.
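One way to quiet the log without hiding real failures is to decide the log level only after the fallback has been tried. The sketch below is illustrative only: `open_with_fallback`, `OpenPath`, `LogLevel` and `OpenOutcome` are hypothetical, and the two closures stand in for the single-encoder open and the 2-way split-encode open. It does not call NVENC.

```rust
/// Which open path ended up being used.
#[derive(Debug, PartialEq, Eq)]
enum OpenPath {
    Single,
    Split,
}

/// Severity of a recorded message.
#[derive(Debug, PartialEq, Eq)]
enum LogLevel {
    Debug,
    Error,
}

/// Result of the open attempt plus the messages it would log.
struct OpenOutcome {
    path: Option<OpenPath>,
    log: Vec<(LogLevel, String)>,
}

/// Try the single-encoder open, then the split-encode open.
/// A first-attempt failure is recorded at Debug when the fallback succeeds,
/// and at Error only when both attempts fail.
fn open_with_fallback<F, G>(open_single: F, open_split: G) -> OpenOutcome
where
    F: FnOnce() -> Result<(), String>,
    G: FnOnce() -> Result<(), String>,
{
    let mut log = Vec::new();
    let first_err = match open_single() {
        Ok(()) => return OpenOutcome { path: Some(OpenPath::Single), log },
        Err(e) => e,
    };
    match open_split() {
        Ok(()) => {
            log.push((
                LogLevel::Debug,
                format!("single open failed ({}); using split encode", first_err),
            ));
            OpenOutcome { path: Some(OpenPath::Split), log }
        }
        Err(e) => {
            log.push((LogLevel::Error, format!("single open failed: {}", first_err)));
            log.push((LogLevel::Error, format!("split-encode open failed: {}", e)));
            OpenOutcome { path: None, log }
        }
    }
}
```

This only addresses the noise; the cause of the first-attempt failure would still need investigating separately.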

6. NVENC HEVC bitrate cap (~800 Mbps on the RTX 4090)

HEVC opens at the GPU's max (~800 Mbps) when a higher rate is requested (e.g. 1600). Not a bug; consider preferring AV1 when the client requests >~800 Mbps HEVC, and surface the cap in the speed-test / bitrate UI.
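The suggested preference could be expressed as a small planning function, sketched below. All names here (`Codec`, `EncodePlan`, `plan_encode`, `HEVC_CAP_MBPS`) are hypothetical. The 800 Mbps constant is the approximate figure observed on the RTX 4090 above, hard-coded only for illustration; a real host would need to query or measure the ceiling per GPU. Whether AV1 has its own ceiling is not covered here, so the sketch passes the requested rate through unchanged.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Hevc,
    Av1,
}

/// Assumed HEVC ceiling in Mbps (approximate RTX 4090 observation from this doc).
const HEVC_CAP_MBPS: u32 = 800;

/// What the host would open, and whether the request was clamped.
#[derive(Debug, PartialEq, Eq)]
struct EncodePlan {
    codec: Codec,
    bitrate_mbps: u32,
    /// True when the bitrate was clamped; a UI could surface this to the user.
    capped: bool,
}

/// Prefer AV1 when an HEVC request exceeds the assumed cap and AV1 is available;
/// otherwise clamp HEVC to the cap and flag it.
fn plan_encode(requested: Codec, requested_mbps: u32, av1_available: bool) -> EncodePlan {
    let over_cap = requested == Codec::Hevc && requested_mbps > HEVC_CAP_MBPS;
    match (over_cap, av1_available) {
        (false, _) => EncodePlan { codec: requested, bitrate_mbps: requested_mbps, capped: false },
        (true, true) => EncodePlan { codec: Codec::Av1, bitrate_mbps: requested_mbps, capped: false },
        (true, false) => EncodePlan { codec: Codec::Hevc, bitrate_mbps: HEVC_CAP_MBPS, capped: true },
    }
}
```

The `capped` flag is the piece the speed-test / bitrate UI would read to show that the requested rate was not honored.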

7. Restore-guard / keep-warm model interaction

do_restore_tv_session, when a desktop is active, still stops the idle managed gamescope (a teardown — leak risk per #1) and consumes STOPPED_AUTOLOGIN (so a later return-to-gaming won't auto-restore the TV session). Resolve together with the keep-warm decision in #1.

8. Feature B is opt-in

The mid-stream watcher is gated behind PUNKTFUNK_SESSION_WATCH=1 pending broader validation. The mid-stream input fix (#2) has landed (3363576), so promote the watcher to default-on once it has been exercised on more boxes.
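A gate that can later flip to default-on without breaking existing setups might look like the sketch below. `session_watch_enabled` and `session_watch_from_env` are hypothetical helpers, and treating `0` as an explicit opt-out is an assumption of this sketch; the document only specifies `PUNKTFUNK_SESSION_WATCH=1` as the opt-in.

```rust
/// Decide whether the watcher runs, given the raw env value and the build's default.
/// "1" enables, "0" disables (assumed opt-out), anything else falls back to the default.
fn session_watch_enabled(value: Option<&str>, default_on: bool) -> bool {
    match value.map(str::trim) {
        Some("1") => true,
        Some("0") => false,
        _ => default_on,
    }
}

/// Read PUNKTFUNK_SESSION_WATCH from the process environment.
fn session_watch_from_env(default_on: bool) -> bool {
    let raw = std::env::var("PUNKTFUNK_SESSION_WATCH").ok();
    session_watch_enabled(raw.as_deref(), default_on)
}
```

With `default_on = false` this matches today's opt-in behavior; promoting the feature would then be a one-line change to the default while `PUNKTFUNK_SESSION_WATCH=1` keeps working.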

Open items

  1. F44 gamescope teardown corrupts the GPU context (#1) — try SIGKILL on teardown (stop_autologin_sessions / stop_session in vdisplay/linux/gamescope.rs), else keep the managed session warm, else upstream fix.
  2. Mid-stream-switch input-loss window (~6 s) (#4) — pre-warm the portal or buffer/hold events instead of dropping during the device-resume window.
  3. NVENC InitializeEncoder failed: invalid param noise at 5120×1440@240 (#5) — recovers via split-encode; investigate the first-attempt failure / silence the log.
  4. NVENC HEVC ~800 Mbps cap on the RTX 4090 (#6) — consider preferring AV1 above it + surface the cap in the speed-test / bitrate UI.
  5. Restore-guard / keep-warm interaction (#7) — couples to #1; resolve together.
  6. Feature B (PUNKTFUNK_SESSION_WATCH) still opt-in (#8) — #2 has landed (3363576); promote to default-on once it's exercised on more boxes.