perf(host): latency hardening for the game-vs-encode GPU contention collapse

Verified, prioritized analysis in docs/host-latency-plan.md (multi-agent
investigation + adversarial verification). Lands the two low-risk tiers:

Tier 2B — Linux scheduling hygiene:
- boost_thread_priority now nices the capture/encode (-10) and send (-5)
  threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE),
  and the wrong "gamescope caps the game" doc-comment is corrected.
- CUDA context created with CU_CTX_SCHED_BLOCKING_SYNC (frees a core on the
  shared box instead of busy-spinning on completion).
- Copies moved off the default stream onto a per-thread highest-priority
  CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback)
  with a per-stream sync that no longer blocks on the other worker thread's
  in-flight copies. Stream priority is measure-then-keep (NVIDIA Linux may
  ignore it); never regresses.
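
The nice change relies on Linux treating the nice value as a per-thread attribute when `setpriority(PRIO_PROCESS, 0, …)` is called from inside a worker. A minimal sketch of how that could be checked, assuming hand-declared libc prototypes instead of the `libc` crate the commit uses; `current_nice`, `try_set_nice` and `renice_in_worker` are hypothetical helper names, not part of the change:

```rust
use std::os::raw::c_int;
use std::thread;

// Hand-declared libc prototypes so the sketch needs no external crate
// (assumption: the commit itself calls these through the `libc` crate).
#[cfg(target_os = "linux")]
unsafe extern "C" {
    fn setpriority(which: c_int, who: u32, prio: c_int) -> c_int;
    fn getpriority(which: c_int, who: u32) -> c_int;
}

#[cfg(target_os = "linux")]
const PRIO_PROCESS: c_int = 0;

/// Nice value of the calling thread (who == 0). A return of -1 is ambiguous
/// (error, or a real nice of -1); this sketch does not disambiguate via errno.
#[cfg(target_os = "linux")]
fn current_nice() -> c_int {
    unsafe { getpriority(PRIO_PROCESS, 0) }
}

/// Best-effort renice of the calling thread; true if the kernel accepted it.
#[cfg(target_os = "linux")]
fn try_set_nice(nice: c_int) -> bool {
    unsafe { setpriority(PRIO_PROCESS, 0, nice) == 0 }
}

// Stubs so the sketch still compiles on other targets.
#[cfg(not(target_os = "linux"))]
fn current_nice() -> c_int {
    0
}

#[cfg(not(target_os = "linux"))]
fn try_set_nice(_nice: c_int) -> bool {
    false
}

/// Spawn a worker that renices itself and report
/// (accepted, nice seen inside the worker, nice seen by the caller afterwards).
fn renice_in_worker(nice: c_int) -> (bool, c_int, c_int) {
    let (ok, seen) = thread::spawn(move || {
        let ok = try_set_nice(nice);
        (ok, current_nice())
    })
    .join()
    .expect("worker panicked");
    (ok, seen, current_nice())
}
```

Calling `renice_in_worker(-10)` without CAP_SYS_NICE or a raised RLIMIT_NICE is expected to report `false`, which corresponds to the best-effort no-op described above; in either case the caller's own nice value should be unchanged.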

Tier 3A — Windows session tuning (new session_tuning.rs, raw C-ABI FFI,
no-op off Windows): once-per-process 1ms timer + DwmEnableMMCSS + HIGH
priority class; per-thread MMCSS "Games" + keep-display-awake. Wired into
both the native (boost_thread_priority) and GameStream (stream.rs) paths.
We had zero session tuning before (Apollo streaming_will_start parity).
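
The once-per-process versus per-thread split can be sketched with `std::sync::Once` plus a thread-local flag. This is an illustration under assumptions, not the contents of session_tuning.rs: the two `tune_*` functions are hypothetical placeholders for the Win32 calls (which would be gated on `cfg(target_os = "windows")`), the per-thread guard is an assumption, and the counters exist only to make the behaviour observable.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static PROCESS_TUNING: Once = Once::new();

// Counters exist only so the sketch's behaviour can be observed.
static PROCESS_INITS: AtomicUsize = AtomicUsize::new(0);
static THREAD_INITS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Assumption: each thread remembers whether it has already been tuned.
    static THREAD_TUNED: Cell<bool> = Cell::new(false);
}

/// Placeholder for the process-wide work (timer resolution, DWM MMCSS,
/// priority class). Hypothetical: the real calls are not shown in the commit.
fn tune_process_once() {
    PROCESS_INITS.fetch_add(1, Ordering::SeqCst);
}

/// Placeholder for the per-thread work (MMCSS task, keep-display-awake).
fn tune_current_thread() {
    THREAD_INITS.fetch_add(1, Ordering::SeqCst);
}

/// Safe to call from every hot thread, any number of times.
pub fn on_hot_thread() {
    // Runs exactly once per process, whichever thread gets here first.
    PROCESS_TUNING.call_once(tune_process_once);
    // Runs once per calling thread.
    THREAD_TUNED.with(|tuned| {
        if !tuned.replace(true) {
            tune_current_thread();
        }
    });
}
```

With this shape, both the capture/encode and send threads can call `on_hot_thread()` unconditionally, and the process-wide step stays idempotent without the callers coordinating.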

Tier 2A (Linux NV12 convert) is specified but intentionally not landed:
it is colour-correctness-critical and needs A/B validation on a GPU box
with a display (green-screen risk). Builds + clippy + fmt green on Linux.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 23:05:57 +00:00
parent 16d3b7767e
commit 112a054c35
6 changed files with 472 additions and 23 deletions
+28 -3
@@ -1831,10 +1831,15 @@ struct FrameMsg {
 /// capture/encode/send threads. This matters even though our GPU work is already HIGH priority: the
 /// GPU scheduler can only favour commands we've actually SUBMITTED, so if a normal-priority thread is
 /// descheduled by the game it submits the convert/encode late and the GPU priority never bites. Apollo
-/// does the same (capture thread CRITICAL, encoder ABOVE_NORMAL). Windows-only — the Linux host caps
-/// the game via gamescope, so its threads aren't starved. `critical` → highest non-realtime class
+/// does the same (capture thread CRITICAL, encoder ABOVE_NORMAL). The Linux host needs this too: an
+/// uncapped GPU-saturating title (e.g. CS2 direct on a virtual output, not capped by gamescope) is
+/// also a CPU hog and can deschedule our submit threads. `critical` → highest non-realtime class
 /// (the capture+encode loop); otherwise above-normal (the send/relay thread).
 pub(crate) fn boost_thread_priority(critical: bool) {
+    // Windows host-process/thread session tuning (timer 1ms, DWM MMCSS, HIGH class once; MMCSS +
+    // keep-display-awake per thread). No-op off Windows. Both stream threads call us, so this covers
+    // capture/encode (critical) and send (non-critical).
+    crate::session_tuning::on_hot_thread();
     #[cfg(target_os = "windows")]
     unsafe {
         use windows::Win32::System::Threading::{
@@ -1853,7 +1858,27 @@ pub(crate) fn boost_thread_priority(critical: bool) {
             }
         }
     }
-    #[cfg(not(target_os = "windows"))]
+    #[cfg(target_os = "linux")]
+    {
+        // Best-effort nice of the CALLING thread. On Linux `setpriority(PRIO_PROCESS, 0, …)` acts on
+        // the calling thread (the kernel resolves who==0 to the current task/tid), and both call
+        // sites run inside their worker thread — so this nices exactly the capture/encode (critical)
+        // and send (non-critical) threads, nothing else. Silently no-ops without CAP_SYS_NICE / a
+        // raised RLIMIT_NICE, which is fine. We deliberately do NOT use SCHED_RR/FIFO by default: a
+        // realtime CPU class can preempt the compositor AND the game's own render thread, adding the
+        // very frame-time we refuse to add (opt-in only — see PUNKTFUNK_SCHED_RR).
+        let nice = if critical { -10 } else { -5 };
+        let rc = unsafe { libc::setpriority(libc::PRIO_PROCESS, 0, nice) };
+        if rc == 0 {
+            tracing::debug!(critical, nice, "thread nice raised");
+        } else {
+            tracing::debug!(
+                critical,
+                "setpriority(nice) no-op (needs CAP_SYS_NICE / RLIMIT_NICE)"
+            );
+        }
+    }
+    #[cfg(not(any(target_os = "windows", target_os = "linux")))]
     {
         let _ = critical;
     }