7b99b41ede
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).
- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
host-latency, gpu-contention (fixed stale status table), game-library,
linux-setup (fixed m0->spike + stale zero-copy claim),
session-aware-host-followups, windows-client-bootstrap,
windows-dualsense-{scoping,game-detection}, windows-virtual-display,
security-review (per-finding status table; #12 still open),
apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
merged, M4 done); windows-secure-desktop.md archived (now a fallback
behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
83 lines
5.2 KiB
Markdown
# Session-aware host: known limitations & follow-ups

> **Status:** Session detection SHIPPED. The host auto-detects the live session (Gaming / KDE / GNOME /
> wlroots) **per connect** and routes video+input to it; opt-in mid-stream switch watcher
> (`PUNKTFUNK_SESSION_WATCH=1`). Code: `crates/punktfunk-host/src/vdisplay.rs` +
> `vdisplay/linux/{gamescope,kwin,mutter}.rs`. Items #2 + #3 resolved in code (`3363576`); this doc is
> trimmed to the still-open limitations + their design rationale.

The host auto-detects the live session per connect and routes both video and input to it: managed
gamescope at the client's resolution in Steam Gaming Mode, a KWin/Mutter virtual output at the
client's resolution on a Desktop. A watcher (opt-in: `PUNKTFUNK_SESSION_WATCH=1`) follows a
Gaming↔Desktop switch **mid-stream** and rebuilds the backend in place without a reconnect.

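
The per-connect routing described above can be pictured as a small mapping from detected session kind to backend. This is only a sketch: `SessionKind`, `Backend`, `RoutePlan` and `plan_route` are hypothetical names, not the types in `vdisplay.rs`, and the wlroots arm is left open because this doc does not describe that path.

```rust
/// Session kinds named in the status note (Gaming / KDE / GNOME / wlroots).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionKind {
    Gaming,
    Kde,
    Gnome,
    Wlroots,
}

/// Hypothetical backend labels; the real types live in `vdisplay.rs` and may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    ManagedGamescope,
    KwinVirtualOutput,
    MutterVirtualOutput,
    /// The wlroots path is not described in this doc, so the sketch leaves it open.
    Unspecified,
}

/// Hypothetical plan: which backend to build, and at what size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RoutePlan {
    pub backend: Backend,
    pub width: u32,
    pub height: u32,
}

/// Pick a backend for the detected session; every path targets the client's resolution.
pub fn plan_route(kind: SessionKind, client_width: u32, client_height: u32) -> RoutePlan {
    let backend = match kind {
        SessionKind::Gaming => Backend::ManagedGamescope,
        SessionKind::Kde => Backend::KwinVirtualOutput,
        SessionKind::Gnome => Backend::MutterVirtualOutput,
        SessionKind::Wlroots => Backend::Unspecified,
    };
    RoutePlan { backend, width: client_width, height: client_height }
}
```
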
Live-validated on the Bazzite F44 box (`bazzite-deck-nvidia:testing`, RTX 4090): Desktop KDE at
5120×1440 + input; Gaming managed at 5120×1440; warm-session reuse on quick reconnect; Feature B
(the mid-stream switch watcher) video-switch in both directions.

(Resolved: **#2 mid-stream-switch input** via `vdisplay::settle_desktop_portal()`; **#3 KWin/Mutter
virtual output primary**, where `apply_session_env` defaults `PUNKTFUNK_KWIN_VIRTUAL_PRIMARY` /
`PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY` on for the auto desktop path. Both shipped in `3363576`; details in
git history.)

## Still parked

### 1. F44 gamescope teardown corrupts the GPU context
Every gamescope teardown on this box (stop the autologin on connect; stop the managed session on
restore) risks leaking the NVIDIA GPU context. It surfaces as `CUDA_ERROR_ILLEGAL_STATE` (401) in
`cuCtxCreate` or `VK_ERROR_INITIALIZATION_FAILED` (-3) from `vkCreateDevice`, then a black screen that
**needs a reboot**. The 5 s debounced restore + the desktop restore-guard cut the teardown *count*
but don't eliminate it. Options, in order of preference:
- **SIGKILL gamescope on teardown** instead of `systemctl stop` (SIGTERM). Hypothesis: skipping
  gamescope's buggy SIGTERM teardown handler (the part that SIGSEGVs, exit 139) lets the process die
  hard and the driver reclaim its GPU resources cleanly via normal process exit, with no
  half-torn-down context. Change `stop_autologin_sessions` + `stop_session`
  (`vdisplay/linux/gamescope.rs`, both still use `systemctl --user stop` = SIGTERM) to
  `systemctl --user kill --signal=SIGKILL <unit>` (+ a follow-up `stop`/`reset-failed` to clear unit
  state). **Untested**; this is the first thing to try, and it would preserve "managed client-res
  gaming AND TV-shows-gaming-when-idle".
- **Keep the managed session warm** (no per-disconnect restore): spawn once, reuse forever, never
  tear down → ~1 teardown per host lifetime. Tradeoff: the TV is blank/idle when no client is
  connected (the autologin is never restored; return to gaming manually).
- Upstream gamescope/driver fix.

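
The first option can be sketched as a command sequence. This is an untested illustration, not the current code in `vdisplay/linux/gamescope.rs`: the helper names are hypothetical, and running both `stop` and `reset-failed` after the kill is an assumption (the text above only says one of them is needed to clear unit state).

```rust
use std::process::Command;

/// Turn a list of string slices into an owned argv.
fn argv(args: &[&str]) -> Vec<String> {
    args.iter().map(|s| s.to_string()).collect()
}

/// Build the commands for a SIGKILL-based teardown of one user unit.
/// Hypothetical helper: it only assembles argv lists, so it can be checked without systemd.
pub fn sigkill_teardown_commands(unit: &str) -> Vec<Vec<String>> {
    vec![
        // Kill hard instead of letting gamescope run its SIGTERM handler.
        argv(&["systemctl", "--user", "kill", "--signal=SIGKILL", unit]),
        // Follow-ups to clear unit state (assumption: both are harmless to run).
        argv(&["systemctl", "--user", "stop", unit]),
        argv(&["systemctl", "--user", "reset-failed", unit]),
    ]
}

/// Run the sequence in order. Exit codes are ignored in this sketch;
/// only a failure to spawn `systemctl` is reported.
pub fn run_sigkill_teardown(unit: &str) -> std::io::Result<()> {
    for cmd in sigkill_teardown_commands(unit) {
        let _status = Command::new(&cmd[0]).args(&cmd[1..]).status()?;
    }
    Ok(())
}
```

Keeping command construction separate from execution means the sequence can be asserted in a unit test, while the effect on the GPU context still has to be validated on the F44 box.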
## Lower priority / polish

### 4. Mid-stream-switch input loss window (~6 s)
During the libei portal setup on a switch, buffered input is dropped
(`libei: DROP — no resumed device`, hundreds of events). Polish: pre-warm the portal, or hold events
instead of dropping during the device-resume window.

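
The "hold instead of drop" option could look roughly like the bounded buffer below. It is a sketch under assumptions: `HoldBuffer` is a hypothetical type, not part of the existing libei input path, and the drop-oldest policy when the buffer is full is one choice among several.

```rust
use std::collections::VecDeque;

/// Holds input events while no device is resumed, then releases them in order.
pub struct HoldBuffer<T> {
    held: VecDeque<T>,
    capacity: usize,
    resumed: bool,
}

impl<T> HoldBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        Self { held: VecDeque::with_capacity(capacity), capacity, resumed: false }
    }

    /// Offer one event. Returns the events that should be forwarded right now:
    /// the event itself once resumed, nothing while it is being held.
    pub fn push(&mut self, event: T) -> Vec<T> {
        if self.resumed {
            return vec![event];
        }
        if self.capacity == 0 {
            return Vec::new(); // nothing can be held, so the event is dropped
        }
        if self.held.len() == self.capacity {
            self.held.pop_front(); // buffer full: drop the oldest event
        }
        self.held.push_back(event);
        Vec::new()
    }

    /// The device resumed: release everything held, oldest first.
    pub fn resume(&mut self) -> Vec<T> {
        self.resumed = true;
        self.held.drain(..).collect()
    }

    /// A new switch started: go back to holding.
    pub fn suspend(&mut self) {
        self.resumed = false;
    }
}
```

Whether replaying several seconds of held input is desirable is a separate question; a small capacity, or keeping only the latest state per axis or key, may suit a game better than a full replay.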
### 5. NVENC `InitializeEncoder failed: invalid param` (recovered)
At 5120×1440@240 the first NVENC open fails with `invalid param (8)` and **recovers** via the 2-way
split-encode path (the stream is live). Cosmetic but noisy: investigate the first-attempt failure /
silence the log.

### 6. NVENC HEVC bitrate cap (~800 Mbps on the RTX 4090)
HEVC opens at the GPU's max (~800 Mbps) when a higher rate is requested (e.g. 1600 Mbps). Not a bug;
consider preferring AV1 when the client requests >~800 Mbps HEVC, and surface the cap in the
speed-test / bitrate UI.

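
One way to express the "prefer AV1 above the cap" idea is sketched below. All names are hypothetical, the cap is passed in rather than hard-coded, and the sketch assumes AV1 can accept the requested rate, which this doc does not state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Hevc,
    Av1,
}

/// What the host would open, plus a flag the UI can use to surface the cap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EncodeChoice {
    pub codec: Codec,
    pub bitrate_mbps: u32,
    pub capped: bool,
}

/// Decide codec and bitrate for a client request.
/// `hevc_cap_mbps` is the GPU's HEVC ceiling (about 800 on the RTX 4090 per the text above).
pub fn choose_encode(
    requested: Codec,
    requested_mbps: u32,
    hevc_cap_mbps: u32,
    av1_available: bool,
) -> EncodeChoice {
    if requested == Codec::Hevc && requested_mbps > hevc_cap_mbps {
        if av1_available {
            // Assumption: AV1 can take the requested rate; its own cap is not modelled here.
            return EncodeChoice { codec: Codec::Av1, bitrate_mbps: requested_mbps, capped: false };
        }
        // No AV1: open HEVC at the cap and report that the request was clamped.
        return EncodeChoice { codec: Codec::Hevc, bitrate_mbps: hevc_cap_mbps, capped: true };
    }
    EncodeChoice { codec: requested, bitrate_mbps: requested_mbps, capped: false }
}
```

The `capped` flag is what a speed-test or bitrate UI would read to show that the requested rate was not honoured.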
### 7. Restore-guard / keep-warm model interaction
`do_restore_tv_session`, when a desktop is active, still stops the idle managed gamescope (a teardown,
so leak risk per #1) and consumes `STOPPED_AUTOLOGIN` (so a later return-to-gaming won't auto-restore
the TV session). Resolve together with the keep-warm decision in #1.

### 8. Feature B is opt-in
The mid-stream watcher is gated behind `PUNKTFUNK_SESSION_WATCH=1` pending broader validation. Promote
to default-on once it's exercised on more boxes; the other precondition, #2 (mid-stream input), has
since landed (`3363576`).

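
A gate that survives the move to default-on might look like the sketch below. The function names are hypothetical, and treating `0` as an explicit opt-out is an assumption; the doc only specifies that `PUNKTFUNK_SESSION_WATCH=1` enables the watcher today.

```rust
/// Decide whether the mid-stream watcher runs.
/// `value` is the raw content of `PUNKTFUNK_SESSION_WATCH`, if the variable is set.
/// `default_on` is false today; flipping it to true is the "promote" step.
pub fn session_watch_enabled(value: Option<&str>, default_on: bool) -> bool {
    match value.map(str::trim) {
        Some("1") => true,
        // Assumption: "0" acts as an opt-out once the default flips.
        Some("0") => false,
        _ => default_on,
    }
}

/// Read the variable from the process environment.
pub fn session_watch_from_env(default_on: bool) -> bool {
    session_watch_enabled(std::env::var("PUNKTFUNK_SESSION_WATCH").ok().as_deref(), default_on)
}
```

With `default_on = false` this matches the current behaviour: only an explicit `1` turns the watcher on.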
## Open items

1. **F44 gamescope teardown corrupts the GPU context** (#1): try SIGKILL on teardown
   (`stop_autologin_sessions` / `stop_session` in `vdisplay/linux/gamescope.rs`), else keep the
   managed session warm, else upstream fix.
2. **Mid-stream-switch input-loss window (~6 s)** (#4): pre-warm the portal or buffer/hold events
   instead of dropping during the device-resume window.
3. **NVENC `InitializeEncoder failed: invalid param` noise at 5120×1440@240** (#5): recovers via
   split-encode; investigate the first-attempt failure / silence the log.
4. **NVENC HEVC ~800 Mbps cap on the RTX 4090** (#6): consider preferring AV1 above it + surface the
   cap in the speed-test / bitrate UI.
5. **Restore-guard / keep-warm interaction** (#7): couples to #1; resolve together.
6. **Feature B (`PUNKTFUNK_SESSION_WATCH`) still opt-in** (#8): promote to default-on once it's
   exercised on more boxes (#2 has since landed in `3363576`).