punktfunk

unom/punktfunk

Fork 0

Commit Graph

Author	SHA1	Message	Date
enricobuehler	112a054c35	perf(host): latency hardening for the game-vs-encode GPU contention collapse Verified, prioritized analysis in docs/host-latency-plan.md (multi-agent investigation + adversarial verification). Lands the two low-risk tiers: Tier 2B — Linux scheduling hygiene: - boost_thread_priority now nices the capture/encode (-10) and send (-5) threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE), and the wrong "gamescope caps the game" doc-comment is corrected. - CUDA context created with CU_CTX_SCHED_BLOCKING_SYNC (frees a core on the shared box instead of busy-spinning on completion). - Copies moved off the default stream onto a per-thread highest-priority CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback) with a per-stream sync that no longer blocks on the other worker thread's in-flight copies. Stream priority is measure-then-keep (NVIDIA Linux may ignore it); never regresses. Tier 3A — Windows session tuning (new session_tuning.rs, raw C-ABI FFI, no-op off Windows): once-per-process 1ms timer + DwmEnableMMCSS + HIGH priority class; per-thread MMCSS "Games" + keep-display-awake. Wired into both the native (boost_thread_priority) and GameStream (stream.rs) paths. We had zero session tuning before (Apollo streaming_will_start parity). Tier 2A (Linux NV12 convert) is specified but intentionally not landed: it is colour-correctness-critical and needs A/B validation on a GPU box with a display (green-screen risk). Builds + clippy + fmt green on Linux. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-18 23:05:57 +00:00

Author

SHA1

Message

Date

enricobuehler

112a054c35

perf(host): latency hardening for the game-vs-encode GPU contention collapse

Verified, prioritized analysis in docs/host-latency-plan.md (multi-agent
investigation + adversarial verification). Lands the two low-risk tiers:

Tier 2B — Linux scheduling hygiene:
- boost_thread_priority now nices the capture/encode (-10) and send (-5)
  threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE),
  and the wrong "gamescope caps the game" doc-comment is corrected.
- CUDA context created with CU_CTX_SCHED_BLOCKING_SYNC (frees a core on the
  shared box instead of busy-spinning on completion).
- Copies moved off the default stream onto a per-thread highest-priority
  CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback)
  with a per-stream sync that no longer blocks on the other worker thread's
  in-flight copies. Stream priority is measure-then-keep (NVIDIA Linux may
  ignore it); never regresses.

Tier 3A — Windows session tuning (new session_tuning.rs, raw C-ABI FFI,
no-op off Windows): once-per-process 1ms timer + DwmEnableMMCSS + HIGH
priority class; per-thread MMCSS "Games" + keep-display-awake. Wired into
both the native (boost_thread_priority) and GameStream (stream.rs) paths.
We had zero session tuning before (Apollo streaming_will_start parity).

Tier 2A (Linux NV12 convert) is specified but intentionally not landed:
it is colour-correctness-critical and needs A/B validation on a GPU box
with a display (green-screen risk). Builds + clippy + fmt green on Linux.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-18 23:05:57 +00:00

1 Commits