perf(host): latency hardening for the game-vs-encode GPU contention collapse
Verified, prioritized analysis in docs/host-latency-plan.md (multi-agent
investigation + adversarial verification). Lands the two low-risk tiers:

Tier 2B: Linux scheduling hygiene
- boost_thread_priority now nices the capture/encode (-10) and send (-5)
  threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE),
  and the wrong "gamescope caps the game" doc-comment is corrected.
- CUDA context created with CU_CTX_SCHED_BLOCKING_SYNC (frees a core on the
  shared box instead of busy-spinning on completion).
- Copies moved off the default stream onto a per-thread highest-priority
  CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback)
  with a per-stream sync that no longer blocks on the other worker thread's
  in-flight copies. Stream priority is measure-then-keep (NVIDIA Linux may
  ignore it); never regresses.

Tier 3A: Windows session tuning (new session_tuning.rs, raw C-ABI FFI,
no-op off Windows)
- Once per process: 1ms timer + DwmEnableMMCSS + HIGH priority class.
- Per thread: MMCSS "Games" + keep-display-awake.
- Wired into both the native (boost_thread_priority) and GameStream
  (stream.rs) paths. We had zero session tuning before (Apollo
  streaming_will_start parity).

Tier 2A (Linux NV12 convert) is specified but intentionally not landed: it is
colour-correctness-critical and needs A/B validation on a GPU box with a
display (green-screen risk).

Builds + clippy + fmt green on Linux.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
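Note on Tier 3A: session_tuning.rs is not part of the hunks shown here, so the following is only a minimal sketch of the "once per process, once per thread, no-op off Windows" shape the message describes. Only the name on_hot_thread and the DwmEnableMMCSS call come from this commit. The module layout, the helper names (tune_process, tune_thread), the bool return value, and the choice of timeBeginPeriod, SetPriorityClass, AvSetMmThreadCharacteristicsW and SetThreadExecutionState are assumptions. It also assumes Rust 1.82 or newer for `unsafe extern`.

```rust
use std::cell::Cell;
use std::sync::Once;

// Hand-written Win32 declarations (raw C-ABI FFI). Compiled on Windows only.
#[cfg(windows)]
mod win {
    use std::ffi::c_void;

    pub const HIGH_PRIORITY_CLASS: u32 = 0x0000_0080;
    pub const ES_SYSTEM_REQUIRED: u32 = 0x0000_0001;
    pub const ES_DISPLAY_REQUIRED: u32 = 0x0000_0002;
    pub const ES_CONTINUOUS: u32 = 0x8000_0000;

    #[link(name = "winmm")]
    unsafe extern "system" {
        pub fn timeBeginPeriod(period_ms: u32) -> u32;
    }
    #[link(name = "dwmapi")]
    unsafe extern "system" {
        pub fn DwmEnableMMCSS(enable: i32) -> i32;
    }
    #[link(name = "avrt")]
    unsafe extern "system" {
        pub fn AvSetMmThreadCharacteristicsW(task: *const u16, index: *mut u32) -> *mut c_void;
    }
    #[link(name = "kernel32")]
    unsafe extern "system" {
        pub fn GetCurrentProcess() -> *mut c_void;
        pub fn SetPriorityClass(process: *mut c_void, class: u32) -> i32;
        pub fn SetThreadExecutionState(flags: u32) -> u32;
    }
}

static PROCESS_TUNED: Once = Once::new();

thread_local! {
    // Set after the first call on each thread so per-thread tuning runs once.
    static THREAD_TUNED: Cell<bool> = Cell::new(false);
}

#[cfg(windows)]
fn tune_process() {
    // Best-effort: return values are deliberately ignored in this sketch.
    unsafe {
        win::timeBeginPeriod(1);
        win::DwmEnableMMCSS(1);
        win::SetPriorityClass(win::GetCurrentProcess(), win::HIGH_PRIORITY_CLASS);
    }
}

#[cfg(windows)]
fn tune_thread() {
    // NUL-terminated UTF-16 task name for the MMCSS "Games" profile.
    let task: Vec<u16> = "Games".encode_utf16().chain(std::iter::once(0)).collect();
    let mut task_index: u32 = 0;
    unsafe {
        // A fuller version would keep this handle and revert it on thread exit.
        let _mmcss = win::AvSetMmThreadCharacteristicsW(task.as_ptr(), &mut task_index);
        win::SetThreadExecutionState(
            win::ES_CONTINUOUS | win::ES_SYSTEM_REQUIRED | win::ES_DISPLAY_REQUIRED,
        );
    }
}

#[cfg(not(windows))]
fn tune_process() {}

#[cfg(not(windows))]
fn tune_thread() {}

/// Hypothetical stand-in for `session_tuning::on_hot_thread`.
/// Returns true only on the first call made from a given thread.
pub fn on_hot_thread() -> bool {
    PROCESS_TUNED.call_once(tune_process);
    THREAD_TUNED.with(|done| {
        if done.replace(true) {
            return false;
        }
        tune_thread();
        true
    })
}
```

Because the process-wide step sits behind a Once and the per-thread step behind a thread-local flag, both stream threads can call on_hot_thread() unconditionally, which matches how boost_thread_priority invokes it in the diff.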
@@ -1831,10 +1831,15 @@ struct FrameMsg {
/// capture/encode/send threads. This matters even though our GPU work is already HIGH priority: the
/// GPU scheduler can only favour commands we've actually SUBMITTED, so if a normal-priority thread is
/// descheduled by the game it submits the convert/encode late and the GPU priority never bites. Apollo
/// does the same (capture thread CRITICAL, encoder ABOVE_NORMAL). Windows-only — the Linux host caps
/// the game via gamescope, so its threads aren't starved. `critical` → highest non-realtime class
/// does the same (capture thread CRITICAL, encoder ABOVE_NORMAL). The Linux host needs this too: an
/// uncapped GPU-saturating title (e.g. CS2 direct on a virtual output, not capped by gamescope) is
/// also a CPU hog and can deschedule our submit threads. `critical` → highest non-realtime class
/// (the capture+encode loop); otherwise above-normal (the send/relay thread).
pub(crate) fn boost_thread_priority(critical: bool) {
// Windows host-process/thread session tuning (timer 1ms, DWM MMCSS, HIGH class once; MMCSS +
// keep-display-awake per thread). No-op off Windows. Both stream threads call us, so this covers
// capture/encode (critical) and send (non-critical).
crate::session_tuning::on_hot_thread();
#[cfg(target_os = "windows")]
unsafe {
use windows::Win32::System::Threading::{
@@ -1853,7 +1858,27 @@ pub(crate) fn boost_thread_priority(critical: bool) {
}
}
}
#[cfg(not(target_os = "windows"))]
#[cfg(target_os = "linux")]
{
// Best-effort nice of the CALLING thread. On Linux `setpriority(PRIO_PROCESS, 0, …)` acts on
// the calling thread (the kernel resolves who==0 to the current task/tid), and both call
// sites run inside their worker thread — so this nices exactly the capture/encode (critical)
// and send (non-critical) threads, nothing else. Silently no-ops without CAP_SYS_NICE / a
// raised RLIMIT_NICE, which is fine. We deliberately do NOT use SCHED_RR/FIFO by default: a
// realtime CPU class can preempt the compositor AND the game's own render thread, adding the
// very frame-time we refuse to add (opt-in only — see PUNKTFUNK_SCHED_RR).
let nice = if critical { -10 } else { -5 };
let rc = unsafe { libc::setpriority(libc::PRIO_PROCESS, 0, nice) };
if rc == 0 {
tracing::debug!(critical, nice, "thread nice raised");
} else {
tracing::debug!(
critical,
"setpriority(nice) no-op (needs CAP_SYS_NICE / RLIMIT_NICE)"
);
}
}
#[cfg(not(any(target_os = "windows", target_os = "linux")))]
{
let _ = critical;
}
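Note on the Linux branch: the comment in the hunk claims that setpriority(PRIO_PROCESS, 0, …) nices only the calling thread. A minimal way to observe that is sketched below. The helper names (current_nice, try_set_nice, hot_thread_nice) are illustrative and not part of this commit, and the two libc functions are declared by hand only so the sketch needs no crates, whereas the real code goes through libc::setpriority. It assumes Linux and Rust 1.82 or newer for `unsafe extern`.

```rust
use std::ffi::{c_int, c_uint};
use std::io;

// Hand-declared libc symbols; std already links libc on Linux.
unsafe extern "C" {
    fn setpriority(which: c_int, who: c_uint, prio: c_int) -> c_int;
    fn getpriority(which: c_int, who: c_uint) -> c_int;
}

const PRIO_PROCESS: c_int = 0;

/// Nice value of the calling thread (Linux resolves who == 0 to the current task).
/// Note: -1 is a legal nice value, so robust code would also clear and check errno.
fn current_nice() -> c_int {
    unsafe { getpriority(PRIO_PROCESS, 0) }
}

/// Best-effort renice of the calling thread; the error says why it was refused.
fn try_set_nice(nice: c_int) -> io::Result<()> {
    let rc = unsafe { setpriority(PRIO_PROCESS, 0, nice) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Same mapping as the diff: -10 for capture/encode, -5 for send.
fn hot_thread_nice(critical: bool) -> c_int {
    if critical { -10 } else { -5 }
}
```

Calling try_set_nice(hot_thread_nice(true)) from inside a worker thread and then comparing current_nice() in that worker and in the spawning thread shows whether the boost took effect and that it stayed local to the worker. Without CAP_SYS_NICE or a raised RLIMIT_NICE the call returns an error and the thread keeps its previous nice value.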