Pre-existing working-tree changes committed to the branch on request: the
gpu-contention investigation doc, host-latency-plan additions, and small
pack-host-installer / stage-pf-vdisplay packaging-script edits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The verified, prioritized analysis is in docs/host-latency-plan.md
(multi-agent investigation plus adversarial verification). This commit
lands the two low-risk tiers:
Tier 2B — Linux scheduling hygiene:
- boost_thread_priority now nices the capture/encode (-10) and send (-5)
  threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE),
  and the incorrect "gamescope caps the game" doc comment is corrected.
- The CUDA context is created with CU_CTX_SCHED_BLOCKING_SYNC, which frees
  a core on the shared box instead of busy-spinning on completion.
- Copies move off the default stream onto a per-thread highest-priority
  CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback)
  with a per-stream sync that no longer blocks on the other worker
  thread's in-flight copies. Stream priority is measure-then-keep (the
  NVIDIA Linux driver may ignore it), so at worst it has no effect.
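
For illustration only, the best-effort nice step could look roughly like
the sketch below. It is not the code in this commit: the name
nice_current_thread is hypothetical, and the sketch assumes Linux, where
the nice value is per-thread and setpriority(PRIO_PROCESS, 0, ...) applies
to the calling thread. It declares setpriority over raw FFI rather than
pulling in a crate.

```rust
use std::io;

#[cfg(target_os = "linux")]
mod sys {
    use std::os::raw::c_int;

    pub const PRIO_PROCESS: c_int = 0;

    // `unsafe extern` needs Rust 1.82+; on older toolchains use `extern "C"`.
    unsafe extern "C" {
        // C prototype: int setpriority(int which, id_t who, int prio);
        pub fn setpriority(which: c_int, who: u32, prio: c_int) -> c_int;
    }
}

/// Hypothetical helper: set the nice value of the calling thread.
/// Returns the OS error (typically EACCES or EPERM) when the process is
/// not allowed to lower its nice value, so the caller can ignore it.
#[cfg(target_os = "linux")]
pub fn nice_current_thread(nice: i32) -> io::Result<()> {
    // who = 0 means "the caller"; on Linux that is the calling thread.
    let rc = unsafe { sys::setpriority(sys::PRIO_PROCESS, 0, nice) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Off Linux the sketch does nothing and reports success.
#[cfg(not(target_os = "linux"))]
pub fn nice_current_thread(_nice: i32) -> io::Result<()> {
    Ok(())
}
```

A caller that wants best-effort behaviour would discard the result, for
example `let _ = nice_current_thread(-10);` at the top of the
capture/encode thread.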
Tier 3A — Windows session tuning (new session_tuning.rs, raw C-ABI FFI,
no-op off Windows): once per process, a 1 ms timer resolution,
DwmEnableMMCSS, and the HIGH priority class; per thread, the MMCSS "Games"
task and keep-display-awake. Wired into both the native
(boost_thread_priority) and GameStream (stream.rs) paths. There was no
session tuning before this change; this brings parity with Apollo's
streaming_will_start.
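
As a rough sketch of the once-per-process part (not the contents of
session_tuning.rs): the names tune_process_once, apply_process_tuning and
tuning_runs are hypothetical, and the run counter exists only to make the
once-only behaviour observable. The Win32 calls are declared over raw FFI
and the whole thing compiles to a no-op off Windows. The per-thread MMCSS
and keep-display-awake steps are left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

#[cfg(windows)]
mod win {
    use std::ffi::c_void;

    pub const HIGH_PRIORITY_CLASS: u32 = 0x0000_0080;

    // `unsafe extern` needs Rust 1.82+; on older toolchains use `extern "system"`.
    #[link(name = "winmm")]
    unsafe extern "system" {
        pub fn timeBeginPeriod(period_ms: u32) -> u32;
    }
    #[link(name = "dwmapi")]
    unsafe extern "system" {
        pub fn DwmEnableMMCSS(enable: i32) -> i32;
    }
    #[link(name = "kernel32")]
    unsafe extern "system" {
        pub fn GetCurrentProcess() -> *mut c_void;
        pub fn SetPriorityClass(process: *mut c_void, class: u32) -> i32;
    }
}

static PROCESS_TUNING: Once = Once::new();
static TUNING_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Windows: request a 1 ms timer, enable DWM MMCSS, raise the priority class.
/// Every call is best-effort, so return values are deliberately ignored.
#[cfg(windows)]
fn apply_process_tuning() {
    unsafe {
        let _ = win::timeBeginPeriod(1);
        let _ = win::DwmEnableMMCSS(1);
        let _ = win::SetPriorityClass(win::GetCurrentProcess(), win::HIGH_PRIORITY_CLASS);
    }
}

/// Everywhere else: nothing to do.
#[cfg(not(windows))]
fn apply_process_tuning() {}

/// Safe to call from every session start; the body runs at most once.
pub fn tune_process_once() {
    PROCESS_TUNING.call_once(|| {
        TUNING_RUNS.fetch_add(1, Ordering::SeqCst);
        apply_process_tuning();
    });
}

/// Illustration only: how many times the tuning body actually ran.
pub fn tuning_runs() -> usize {
    TUNING_RUNS.load(Ordering::SeqCst)
}
```

Guarding with std::sync::Once lets both the native and GameStream paths
call tune_process_once unconditionally without applying the process-wide
settings twice.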
Tier 2A (Linux NV12 convert) is specified but intentionally not landed:
it is colour-correctness-critical and needs A/B validation on a GPU box
with a display (green-screen risk). Build, clippy, and fmt are green on
Linux.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>