refactor(windows-host): KeyedMutexGuard RAII for the IDD-push consume hot loop (Goal-3, hw-validated)

The IDD-push consume loop acquired the slot's keyed mutex by hand
(`AcquireSync(0,8)` … work … `ReleaseSync(0)`), with a comment warning that a
`?`-return between acquire and release would leak the lock and stall the driver
on that slot — the reason the HDR converter is built *before* the acquire.

Replace with a `KeyedMutexGuard` RAII (acquire → `ReleaseSync` on drop), scoped
to JUST the convert/copy block so the lock releases at the EXACT same point as
before (the driver gets the slot back immediately; not held across the rest of
`try_consume`). Now the release can't be skipped on any early return/panic — the
leak footgun is gone by construction, and the hot loop has no raw `ReleaseSync`.

Behavior/latency-equivalent (same acquire params, same release point). Windows-
only (CI + on-glass gated); to be validated on the RTX box (host clippy build +
a PERF=1 latency A/B vs the shipping binary — the change should show no delta).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-26 07:02:05 +00:00
parent 6b3cbce120
commit 658564353c
@@ -132,6 +132,41 @@ struct HostSlot {
srv: ID3D11ShaderResourceView, srv: ID3D11ShaderResourceView,
} }
/// RAII guard over an [`IDXGIKeyedMutex`]: [`acquire`](Self::acquire) does `AcquireSync(key, timeout)`,
/// `Drop` does `ReleaseSync(key)`. So the lock is released even if the work between acquire and the end
/// of the guard's scope `?`-returns or panics — the "leak the keyed-mutex lock → stall the driver on
/// that slot" footgun the consume loop guards against by hand. Keeps the hot loop free of a raw
/// `ReleaseSync` that a future early-return could skip.
struct KeyedMutexGuard<'a> {
mutex: &'a IDXGIKeyedMutex,
key: u64,
}
impl<'a> KeyedMutexGuard<'a> {
/// Acquire `mutex` at `key`, waiting up to `timeout_ms`. `None` if the acquire times out / errors
/// (the caller skips the frame), so the guard is only ever held when the lock is genuinely held.
fn acquire(
mutex: &'a IDXGIKeyedMutex,
key: u64,
timeout_ms: u32,
) -> Option<KeyedMutexGuard<'a>> {
// SAFETY: `mutex` is a live `IDXGIKeyedMutex` on this thread's immediate-context device.
if unsafe { mutex.AcquireSync(key, timeout_ms) }.is_err() {
return None;
}
Some(KeyedMutexGuard { mutex, key })
}
}
impl Drop for KeyedMutexGuard<'_> {
fn drop(&mut self) {
// SAFETY: we hold `mutex` at `key` (acquired in `acquire`, never released elsewhere); release it.
unsafe {
let _ = self.mutex.ReleaseSync(self.key);
}
}
}
/// Creates + owns the shared ring; yields the driver's frames as [`FramePayload::D3d11`]. /// Creates + owns the shared ring; yields the driver's frames as [`FramePayload::D3d11`].
pub struct IddPushCapturer { pub struct IddPushCapturer {
device: ID3D11Device, device: ID3D11Device,
@@ -783,20 +818,26 @@ impl IddPushCapturer {
// ~3 ms encode — NVENC reads the host out-ring slot, not the keyed-mutex slot), so the driver gets // ~3 ms encode — NVENC reads the host out-ring slot, not the keyed-mutex slot), so the driver gets
// the slot back immediately and the encode of the PREVIOUS frame overlaps this convert. // the slot back immediately and the encode of the PREVIOUS frame overlaps this convert.
let s = &self.slots[slot]; let s = &self.slots[slot];
if unsafe { s.mutex.AcquireSync(0, 8) }.is_err() { // Acquire the slot's keyed mutex via a RAII guard, scoped to JUST the convert/copy below so it
return Ok(None); // releases at the same point as the old hand-written `ReleaseSync` (the driver gets the slot back
} // immediately, NOT held across the rest of `try_consume`) — but now leak-proof on any early return.
unsafe { {
if self.display_hdr { let Some(_lock) = KeyedMutexGuard::acquire(&s.mutex, 0, 8) else {
// Sample the FP16 slot's SRV directly (no scratch copy) → BT.2020 PQ Rgb10a2. return Ok(None);
if let Some(conv) = self.hdr_conv.as_ref() { };
conv.convert(&self.context, &s.srv, &out_rtv, self.width, self.height); // SAFETY: convert/copy on the owning (encode) thread's immediate context, holding the slot lock.
unsafe {
if self.display_hdr {
// Sample the FP16 slot's SRV directly (no scratch copy) → BT.2020 PQ Rgb10a2.
if let Some(conv) = self.hdr_conv.as_ref() {
conv.convert(&self.context, &s.srv, &out_rtv, self.width, self.height);
}
} else {
// SDR: the slot is already 8-bit BGRA — one copy into the out-ring (hidden by pipelining).
self.context.CopyResource(&out, &s.tex);
} }
} else {
// SDR: the slot is already 8-bit BGRA — one copy into the out-ring (hidden by pipelining).
self.context.CopyResource(&out, &s.tex);
} }
let _ = s.mutex.ReleaseSync(0); // `_lock` drops here → `ReleaseSync(0)`.
} }
self.out_idx = (i + 1) % self.out_ring.len(); self.out_idx = (i + 1) % self.out_ring.len();
self.last_seq = seq; self.last_seq = seq;