punktfunk/docs/session-aware-host-followups.md
enricobuehler 9775794ba5
docs: known limitations + follow-ups for the session-aware host
Capture the deliberately-parked items after live-validating the session-aware
backend selector on the Bazzite F44 box (Desktop KDE + Gaming both at the
client's resolution, warm reuse, Feature B mid-stream switch both directions).

Top follow-ups: (1) F44 gamescope teardown corrupts the GPU context (try SIGKILL
teardown, else keep the managed session warm); (2) mid-stream-switch input is
flaky until a reconnect (portal opens before the systemd/D-Bus activation env
settles — fix: import-environment on switch); (3) the KWin virtual output isn't
set primary. Plus polish: input-loss window on switch, the recovered NVENC
invalid-param log, the 4090 HEVC ~800Mbps cap, restore-guard/keep-warm
interaction, and promoting Feature B from opt-in to default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 23:56:59 +00:00

# Session-aware host — known limitations & follow-ups
Status: 2026-06-14. The host auto-detects the live session (Gaming / KDE / GNOME / wlroots) **per
connect** and routes both video and input to it: managed gamescope at the client's resolution in
Steam Gaming Mode, or a KWin/Mutter virtual output at the client's resolution on a Desktop. A watcher
(opt-in: `PUNKTFUNK_SESSION_WATCH=1`) follows a Gaming↔Desktop switch **mid-stream** and rebuilds the
backend in place without a reconnect.
Live-validated on the Bazzite F44 box (`bazzite-deck-nvidia:testing`, RTX 4090): Desktop KDE at
5120×1440 with input; managed Gaming at 5120×1440; warm-session reuse on quick reconnect; Feature B
(the mid-stream switch) video switching in both directions. The items below are **deliberately
parked**: they have workarounds and/or are F44-specific.
## High priority
### 1. F44 gamescope teardown corrupts the GPU context
Every gamescope teardown on this box (stopping the autologin session on connect; stopping the managed
session on restore) risks leaking the NVIDIA GPU context. The leak surfaces as
`CUDA_ERROR_ILLEGAL_STATE` (401) from `cuCtxCreate`, or `VK_ERROR_INITIALIZATION_FAILED` (-3) from
`vkCreateDevice`, followed by a black screen that **needs a reboot**. The 5 s debounced restore and
the desktop restore-guard reduce the teardown *count* but don't eliminate the risk. Options, in order
of preference:
- **SIGKILL the gamescope on teardown** instead of `systemctl stop` (SIGTERM). Hypothesis: skipping
gamescope's buggy SIGTERM teardown handler (the part that SIGSEGVs, exit 139) lets the process die
hard and the driver reclaim its GPU resources cleanly via normal process exit — no half-torn-down
context. Change `stop_autologin_sessions` + `stop_session` (`vdisplay/gamescope.rs`) to
`systemctl --user kill --signal=SIGKILL <unit>` (+ a follow-up `stop`/`reset-failed` to clear unit
state). **Untested** — this is the first thing to try; it would preserve "managed client-res
gaming AND TV-shows-gaming-when-idle".
- **Keep the managed session warm** (no per-disconnect restore): spawn once, reuse forever, never
tear down → ~1 teardown per host lifetime. Tradeoff: the TV is blank/idle when no client is
connected (the autologin is never restored; return to gaming manually).
- Upstream gamescope/driver fix.
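
The SIGKILL option above could be sketched as follows. This is a minimal, untested illustration, not
the contents of `vdisplay/gamescope.rs`: the function names are hypothetical, and the unit name is
whatever `stop_autologin_sessions` / `stop_session` already pass to `systemctl`. The plan is built as
plain argument lists so it can be inspected before anything is executed.

```rust
use std::process::Command;

/// Build one `systemctl --user <verb...> <unit>` argument list.
fn systemctl_user_args(verb: &[&str], unit: &str) -> Vec<String> {
    let mut args = vec!["--user".to_string()];
    args.extend(verb.iter().map(|s| s.to_string()));
    args.push(unit.to_string());
    args
}

/// Hard teardown plan (hypothetical helper): SIGKILL the unit, then
/// `stop` and `reset-failed` to clear the leftover unit state.
fn hard_teardown_plan(unit: &str) -> Vec<Vec<String>> {
    vec![
        systemctl_user_args(&["kill", "--signal=SIGKILL"], unit),
        systemctl_user_args(&["stop"], unit),
        systemctl_user_args(&["reset-failed"], unit),
    ]
}

/// Run the plan in order. Non-zero exit codes are ignored on purpose:
/// `kill` fails when the unit is already gone, and `reset-failed` fails
/// when the unit never entered the failed state.
fn run_hard_teardown(unit: &str) -> std::io::Result<()> {
    for args in hard_teardown_plan(unit) {
        let _status = Command::new("systemctl").args(&args).status()?;
    }
    Ok(())
}
```

Whether skipping the SIGTERM handler actually avoids the context leak is still the open hypothesis;
the sketch only shows the command sequence.
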
### 2. Mid-stream-switch input (Gaming→Desktop) is flaky until a reconnect
After a mid-stream switch to a desktop, pointer/keyboard often don't land until a disconnect+reconnect.
Root cause: on the switch the host opens the KDE `RemoteDesktop` **portal** immediately, but the
portal is a D-Bus-activated service that reads the **systemd `--user` activation environment**, which
hasn't settled to the new session yet — so the portal session is created against a half-stale env. It
*accepts* events (no error, so the injector's reopen-on-failure never fires) but they don't reach
KDE. The host log even shows `libei: portal granted devices` + `device RESUMED`, yet input is dead.
**Workaround: reconnect once after switching** (re-opens the portal against the settled env → works).
Fix: on a session switch, push the new session env into the systemd/D-Bus activation environment
before reopening input — `systemctl --user import-environment WAYLAND_DISPLAY XDG_CURRENT_DESKTOP
DBUS_SESSION_BUS_ADDRESS` + `dbus-update-activation-environment` (exactly what a fresh KDE login does;
see `scripts/headless/run-headless-kde.sh:118-119`) — and/or delay/retry the input reopen until the
portal is settled.
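
A rough sketch of the environment push described above, assuming the host already knows the new
session's values for the three variables. The function names and the `session_env` input are
hypothetical. The values are set on the child processes, because both `systemctl --user
import-environment` and `dbus-update-activation-environment` read the named variables from their own
environment.

```rust
use std::process::Command;

/// Variables named in the fix above.
const SESSION_VARS: [&str; 3] = [
    "WAYLAND_DISPLAY",
    "XDG_CURRENT_DESKTOP",
    "DBUS_SESSION_BUS_ADDRESS",
];

/// Build (but do not run) the two commands that publish the new session's
/// environment. Pairs whose name is not in `SESSION_VARS` are skipped.
fn activation_env_commands(session_env: &[(&str, &str)]) -> Vec<Command> {
    let present: Vec<(&str, &str)> = session_env
        .iter()
        .copied()
        .filter(|(name, _)| SESSION_VARS.contains(name))
        .collect();
    let names: Vec<&str> = present.iter().map(|(name, _)| *name).collect();

    let mut systemd = Command::new("systemctl");
    systemd
        .args(["--user", "import-environment"])
        .args(&names)
        .envs(present.iter().copied());

    let mut dbus = Command::new("dbus-update-activation-environment");
    dbus.args(&names).envs(present.iter().copied());

    vec![systemd, dbus]
}

/// Run both commands; returns Ok(true) only if each one exited successfully.
fn publish_session_env(session_env: &[(&str, &str)]) -> std::io::Result<bool> {
    let mut all_ok = true;
    for mut cmd in activation_env_commands(session_env) {
        all_ok &= cmd.status()?.success();
    }
    Ok(all_ok)
}
```

The input reopen would then be called only after `publish_session_env` returns, optionally followed by
the delay/retry mentioned above.
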
### 3. KWin virtual output is not set as primary
The KWin virtual output isn't marked the primary display, so KDE panels / primary-screen content can
stay on the (now-absent) original output instead of the streamed virtual screen. Follow-up: set the
virtual output primary on create — `vdisplay/kwin.rs` already reads `PUNKTFUNK_KWIN_VIRTUAL_PRIMARY`;
make it default-on (or always promote the new output to primary via the KWin/kscreen API) so the
streamed screen is the primary desktop.
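
One way the default-on variant could look, as a sketch only: how `vdisplay/kwin.rs` parses
`PUNKTFUNK_KWIN_VIRTUAL_PRIMARY` today is not shown here, so the accepted spellings and both function
names below are assumptions.

```rust
/// Interpret a boolean flag with a default: an unset or unrecognised value
/// yields `default`; only explicit spellings override it.
fn flag_enabled(raw: Option<&str>, default: bool) -> bool {
    match raw.map(|s| s.trim().to_ascii_lowercase()) {
        None => default,
        Some(value) => match value.as_str() {
            "0" | "false" | "off" | "no" => false,
            "1" | "true" | "on" | "yes" => true,
            _ => default, // empty or unknown values fall back to the default
        },
    }
}

/// Default-on: the virtual output is promoted to primary unless the
/// variable explicitly disables it.
fn kwin_virtual_primary() -> bool {
    let raw = std::env::var("PUNKTFUNK_KWIN_VIRTUAL_PRIMARY").ok();
    flag_enabled(raw.as_deref(), true)
}
```

Keeping the variable as an opt-out leaves an escape hatch for setups where a physical output should
stay primary.
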
## Lower priority / polish
### 4. Mid-stream-switch input loss window (~6 s)
During the libei portal setup on a switch, buffered input drops (`libei: DROP — no resumed device`,
hundreds of events). Polish: pre-warm the portal, or hold events instead of dropping during the
device-resume window.
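
The "hold instead of drop" idea could be sketched as a small bounded queue. This is an illustration
under assumptions: `HoldBuffer` is a hypothetical type, `E` stands in for the host's input event type,
and the capacity is an arbitrary bound so a long setup window cannot grow memory without limit.

```rust
use std::collections::VecDeque;

/// Holds input events while no device is resumed, instead of dropping them.
struct HoldBuffer<E> {
    queue: VecDeque<E>,
    capacity: usize,
    resumed: bool,
}

impl<E> HoldBuffer<E> {
    fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1); // always keep room for at least one event
        Self { queue: VecDeque::with_capacity(capacity), capacity, resumed: false }
    }

    /// Returns the events to inject right now: the event itself once a
    /// device is resumed, nothing while events are being held.
    fn push(&mut self, event: E) -> Vec<E> {
        if self.resumed {
            return vec![event];
        }
        if self.queue.len() >= self.capacity {
            self.queue.pop_front(); // bounded: discard the oldest held event
        }
        self.queue.push_back(event);
        Vec::new()
    }

    /// Call when the device resumes: returns the held events in arrival order.
    fn resume(&mut self) -> Vec<E> {
        self.resumed = true;
        self.queue.drain(..).collect()
    }
}
```

Replaying several seconds of stale pointer motion may be worse than dropping it, so a real version
would probably keep only state-relevant events (for example key and button releases).
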
### 5. NVENC `InitializeEncoder failed: invalid param` (recovered)
At 5120×1440@240 the first NVENC open fails with `invalid param (8)` and **recovers** via the 2-way
split-encode path (the stream is live). Cosmetic but noisy — investigate the first-attempt failure /
silence the log.
### 6. NVENC HEVC bitrate cap (~800 Mbps on the RTX 4090)
HEVC opens at the GPU's maximum (~800 Mbps) when a higher rate is requested (e.g. 1600 Mbps). Not a
bug; consider preferring AV1 when the client requests more than ~800 Mbps of HEVC, and surface the
cap in the speed-test / bitrate UI.
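
The codec preference could be expressed as a small pure function. This is a sketch with hypothetical
names; the cap is passed in as a parameter rather than hard-coded, and any AV1 ceiling is not modelled
because it has not been measured here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Hevc,
    Av1,
}

/// Result of the decision, including whether the request was clamped so the
/// UI can surface the cap.
#[derive(Debug, PartialEq, Eq)]
struct EncodePlan {
    codec: Codec,
    bitrate_mbps: u32,
    capped: bool,
}

/// Prefer HEVC while the request fits under its cap; above the cap switch to
/// AV1 if available, otherwise clamp HEVC to the cap and flag it.
fn plan_encode(requested_mbps: u32, hevc_cap_mbps: u32, av1_available: bool) -> EncodePlan {
    if requested_mbps <= hevc_cap_mbps {
        EncodePlan { codec: Codec::Hevc, bitrate_mbps: requested_mbps, capped: false }
    } else if av1_available {
        EncodePlan { codec: Codec::Av1, bitrate_mbps: requested_mbps, capped: false }
    } else {
        EncodePlan { codec: Codec::Hevc, bitrate_mbps: hevc_cap_mbps, capped: true }
    }
}
```

With the numbers from this box, `plan_encode(1600, 800, false)` yields HEVC at 800 Mbps with
`capped: true`, which is the value the bitrate UI would show.
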
### 7. Restore-guard / keep-warm model interaction
`do_restore_tv_session`, when a desktop is active, still stops the idle managed gamescope (a teardown
— leak risk per #1) and consumes `STOPPED_AUTOLOGIN` (so a later return-to-gaming won't auto-restore
the TV session). Resolve together with the keep-warm decision in #1.
### 8. Feature B is opt-in
The mid-stream watcher is gated behind `PUNKTFUNK_SESSION_WATCH=1` pending broader validation. Promote
to default-on once #2 (mid-stream input) lands and it's exercised on more boxes.