perf(host): latency hardening for the game-vs-encode GPU contention collapse
Verified, prioritized analysis in docs/host-latency-plan.md (multi-agent investigation + adversarial verification). Lands the two low-risk tiers: Tier 2B — Linux scheduling hygiene: - boost_thread_priority now nices the capture/encode (-10) and send (-5) threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE), and the wrong "gamescope caps the game" doc-comment is corrected. - CUDA context created with CU_CTX_SCHED_BLOCKING_SYNC (frees a core on the shared box instead of busy-spinning on completion). - Copies moved off the default stream onto a per-thread highest-priority CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback) with a per-stream sync that no longer blocks on the other worker thread's in-flight copies. Stream priority is measure-then-keep (NVIDIA Linux may ignore it); never regresses. Tier 3A — Windows session tuning (new session_tuning.rs, raw C-ABI FFI, no-op off Windows): once-per-process 1ms timer + DwmEnableMMCSS + HIGH priority class; per-thread MMCSS "Games" + keep-display-awake. Wired into both the native (boost_thread_priority) and GameStream (stream.rs) paths. We had zero session tuning before (Apollo streaming_will_start parity). Tier 2A (Linux NV12 convert) is specified but intentionally not landed: it is colour-correctness-critical and needs A/B validation on a GPU box with a display (green-screen risk). Builds + clippy + fmt green on Linux. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -60,6 +60,8 @@ fn run(
|
||||
force_idr: &AtomicBool,
|
||||
video_cap: &std::sync::Mutex<Option<Box<dyn Capturer>>>,
|
||||
) -> Result<()> {
|
||||
// GameStream capture/encode thread: apply Windows session tuning (no-op off Windows).
|
||||
crate::session_tuning::on_hot_thread();
|
||||
// Reject an out-of-range client mode before allocating capture/encode buffers.
|
||||
encode::validate_dimensions(cfg.codec, cfg.width, cfg.height)
|
||||
.context("client-requested video mode")?;
|
||||
@@ -219,6 +221,8 @@ fn spawn_sender(
|
||||
std::thread::Builder::new()
|
||||
.name("punktfunk-send".into())
|
||||
.spawn(move || {
|
||||
// GameStream send thread: Windows session tuning + MMCSS (no-op off Windows).
|
||||
crate::session_tuning::on_hot_thread();
|
||||
// Chunk pacing: 16 packets per burst, bursts spread across the send budget.
|
||||
const PACE_CHUNK: usize = 16;
|
||||
let budget = frame_interval.mul_f32(0.75);
|
||||
|
||||
Reference in New Issue
Block a user