perf(host/windows): elevate capture/encode/send thread CPU priority (Apollo-parity)

Apollo runs its capture thread at CRITICAL and its encoder thread at ABOVE_NORMAL; we set
none. Our GPU work is already HIGH priority, but the GPU scheduler can only favour commands
we've SUBMITTED — a normal-priority thread descheduled by a CPU-heavy game submits the
convert/encode late, so the HIGH GPU priority never bites (consistent with the measured
"NVENC engine idle yet the encode waits ~15 ms"). Raise the WGC helper's capture+encode
loop and the single-process capture+encode loop to THREAD_PRIORITY_HIGHEST, and the
transmit thread to ABOVE_NORMAL, via a cross-platform boost_thread_priority() (Windows-only
effect — the Linux host caps the game via gamescope so its threads aren't starved).
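
As a rough illustration of the levels involved, the sketch below maps the `critical` flag onto Win32 thread priority values. The enum and the `priority_for` helper are hypothetical names used only here; the numeric values are those of the Windows SDK's THREAD_PRIORITY_* constants, and they are offsets within the process's priority class rather than absolute priorities.

```rust
/// Win32 thread priority levels, relative to the process's priority class.
/// The discriminants match the THREAD_PRIORITY_* constants in the Windows SDK.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(i32)]
pub enum Win32ThreadPriority {
    Normal = 0,
    AboveNormal = 1,
    Highest = 2,
    TimeCritical = 15,
}

/// Hypothetical helper mirroring the commit's `critical` flag:
/// the capture+encode loop gets Highest, the transmit thread AboveNormal.
pub fn priority_for(critical: bool) -> Win32ThreadPriority {
    if critical {
        Win32ThreadPriority::Highest
    } else {
        Win32ThreadPriority::AboveNormal
    }
}
```

Both levels sit above a normal-priority game thread in the same priority class, and both stay below TimeCritical.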

Not yet built/validated on the GPU box (it's down); the cross-platform side compiles
(cargo check) and the Windows calls are cross-checked against the windows-0.62 API.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 16:12:29 +00:00
parent 9771aa8815
commit 3b3e8b4ba9
2 changed files with 42 additions and 0 deletions
+36
@@ -1828,6 +1828,38 @@ struct FrameMsg {
/// speed-test probe bursts (which also need the Session). Decoupling the paced send from encoding
/// lets the encode of frame N+1 overlap the transmit of frame N instead of waiting behind its tail.
/// Runs until the encode thread drops the frame channel (end of stream) or `stop` is set.
/// Raise the current thread's OS scheduling priority so a CPU-heavy game can't deschedule our
/// capture/encode/send threads. This matters even though our GPU work is already HIGH priority: the
/// GPU scheduler can only favour commands we've actually SUBMITTED, so if a normal-priority thread is
/// descheduled by the game it submits the convert/encode late and the GPU priority never bites. Apollo
/// does the same (capture thread CRITICAL, encoder ABOVE_NORMAL). Windows-only: the Linux host caps
/// the game via gamescope, so its threads aren't starved. `critical` selects
/// `THREAD_PRIORITY_HIGHEST` (the capture+encode loop), which stays below
/// `THREAD_PRIORITY_TIME_CRITICAL`; otherwise `THREAD_PRIORITY_ABOVE_NORMAL` (the send/relay thread).
pub(crate) fn boost_thread_priority(critical: bool) {
    #[cfg(target_os = "windows")]
    unsafe {
        use windows::Win32::System::Threading::{
            GetCurrentThread, SetThreadPriority, THREAD_PRIORITY_ABOVE_NORMAL,
            THREAD_PRIORITY_HIGHEST,
        };
        let prio = if critical {
            THREAD_PRIORITY_HIGHEST
        } else {
            THREAD_PRIORITY_ABOVE_NORMAL
        };
        // Best effort: a failure is only logged and the thread keeps its current priority.
        match SetThreadPriority(GetCurrentThread(), prio) {
            Ok(()) => tracing::debug!(critical, "thread priority raised"),
            Err(e) => tracing::debug!(critical, error = ?e, "SetThreadPriority failed"),
        }
    }
    #[cfg(not(target_os = "windows"))]
    {
        let _ = critical;
    }
}
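
Because the helper acts on the calling thread (it passes `GetCurrentThread()`), each worker has to call it from inside its own thread body rather than from the spawning thread. The following is a minimal sketch of that call-on-entry pattern, assuming a simplified two-thread pipeline; `run_pipeline` is a hypothetical name, the frames are plain integers, and `boost_thread_priority` is replaced by a no-op stand-in so the sketch runs on any platform.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for the commit's `boost_thread_priority`; a no-op here so the
/// sketch compiles and runs without the `windows` crate.
fn boost_thread_priority(critical: bool) {
    let _ = critical;
}

/// Hypothetical pipeline: one thread produces frames, another sends them.
/// Each thread raises its own priority as the first thing it does.
/// Returns the number of frames the send thread received.
pub fn run_pipeline(frames: u32) -> u32 {
    let (tx, rx) = mpsc::channel::<u32>();

    let sender = thread::Builder::new()
        .name("send".into())
        .spawn(move || {
            boost_thread_priority(false); // transmit thread: above-normal
            let mut sent = 0u32;
            // The loop ends when the producer drops its end of the channel.
            for _frame in rx {
                sent += 1;
            }
            sent
        })
        .expect("failed to spawn send thread");

    let encoder = thread::Builder::new()
        .name("capture-encode".into())
        .spawn(move || {
            boost_thread_priority(true); // capture+encode loop: highest
            for n in 0..frames {
                if tx.send(n).is_err() {
                    break; // receiver is gone, stop producing
                }
            }
            // `tx` is dropped here, which closes the channel.
        })
        .expect("failed to spawn capture-encode thread");

    encoder.join().expect("capture-encode thread panicked");
    sender.join().expect("send thread panicked")
}
```

Calling `run_pipeline(5)` pushes five frames through the channel and returns 5. In the real host the boost is placed the same way: as the first statement of `send_loop` and `virtual_stream`.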
fn send_loop(
    mut session: Session,
    frame_rx: std::sync::mpsc::Receiver<FrameMsg>,
@@ -1837,6 +1869,7 @@ fn send_loop(
    perf: bool,
    burst_cap: usize,
) {
    boost_thread_priority(false); // transmit thread: above-normal (Apollo's encoder-thread level)
    let mut last_perf = std::time::Instant::now();
    let mut last_bytes = 0u64;
    let mut last_send_dropped = 0u64;
@@ -1995,6 +2028,9 @@ fn virtual_stream(
    probe_rx: std::sync::mpsc::Receiver<ProbeRequest>,
    probe_result_tx: tokio::sync::mpsc::UnboundedSender<ProbeResult>,
) -> Result<()> {
    // This thread runs the capture+encode loop (single-process: Linux / synthetic / NO_WGC DDA), or
    // tail-calls the relay below. Elevate it so a CPU-heavy game can't deschedule our GPU submission.
    boost_thread_priority(true);
    // Windows two-process secure-desktop path: when the host runs as SYSTEM (required for the secure
    // desktop + SendInput), WGC can't activate in-process, so we capture the normal desktop via a
    // helper spawned in the user session and relay its AUs. (Single-process WGC/DDA is used as the
+6
@@ -46,6 +46,12 @@ pub fn run(opts: HelperOptions) -> Result<()> {
        "WGC helper starting (user session)"
    );
    // This thread does WGC capture + video-processor convert + NVENC submit: the GPU-submitting hot
    // path. Elevate its OS priority so a CPU-heavy game can't deschedule it and delay submission (which
    // would leave our HIGH GPU priority with nothing queued to prioritise). Apollo's capture thread is
    // likewise CRITICAL.
    crate::m3::boost_thread_priority(true);
    // Capture the EXISTING SudoVDA output by GDI name / target id; do NOT create one (the host owns
    // the virtual output + its isolate/restore; a second topology owner breaks DDA recovery).
    let target = WinCaptureTarget {