fix(windows-drivers): reclaim pf-vdisplay monitor ids on REMOVE (P1, slot-reclaim)

The driver assigned each virtual monitor a monotonically-increasing NEXT_ID used
as the EDID serial / IddCx ConnectorIndex / container GUID, and never reclaimed
it on REMOVE. Under sustained ADD/REMOVE churn the connector index kept climbing,
so IddCx/PnP allocated a NEW OS target slot every cycle and orphaned the old one
(ghost "Generic Monitor (punktfunk)" nodes) until the adapter's target capacity
was exhausted and ADD failed 0x80070490 ERROR_NOT_FOUND.

Fix: `create_monitor` now allocates the LOWEST free id (`alloc_monitor_id`,
computed under the MONITOR_MODES lock with the push) instead of a counter, so a
departed monitor's id is reclaimed and a fresh ADD reuses its target slot rather
than orphaning it. With <= N live monitors the id stays bounded to 1..=N+1.
Deleted the now-unused NEXT_ID + AtomicU32/Ordering import.

CI-compile-gated only — the wedge reproduces solely under sustained churn on the
RTX box, so this needs an on-glass reconnect-storm A/B to confirm (box is
ephemeral/down). Marked on-glass-pending in windows-host-rewrite.md §4; keep
reset-pf-vdisplay.ps1 as the recovery until validated. NOT to be relied on (or
merged to main) until that A/B passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-25 22:11:36 +00:00
parent 0bf3984614
commit 8cde8621ce
2 changed files with 32 additions and 14 deletions
+8 -4
View File
@@ -211,10 +211,14 @@ These are expensive empirical wins; keep them intact when touching the code:
2. **Make IDD-push the default** — today it is gated behind `PUNKTFUNK_IDD_PUSH` (`config.rs` default 2. **Make IDD-push the default** — today it is gated behind `PUNKTFUNK_IDD_PUSH` (`config.rs` default
`false`); deployment sets it in `host.env`. Flip the code default (with the WGC/DDA fallback already in `false`); deployment sets it in `host.env`. Flip the code default (with the WGC/DDA fallback already in
place) so a fresh install runs the validated path, or document the `host.env` requirement explicitly. place) so a fresh install runs the validated path, or document the `host.env` requirement explicitly.
3. **pf-vdisplay slot reclaim on REMOVE** (driver robustness) — sustained ADD/REMOVE churn wedges the 3. **pf-vdisplay slot reclaim on REMOVE** (driver robustness) — 🟡 **fix landed, on-glass-validation
driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because IddCx monitor slots aren't reclaimed (ghost nodes pending.** Sustained ADD/REMOVE churn wedged the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because the
accumulate). Workaround today: `packaging/windows/reset-pf-vdisplay.ps1`. Real fix in `control.rs`/ monitor id (EDID serial / `ConnectorIndex` / container GUID) was a **monotonic** `NEXT_ID`, never
`monitor.rs`. On-glass-gated. reclaimed → IddCx accumulated a new OS target slot per cycle until exhaustion. `monitor.rs` now allocates
the **lowest free id** (`alloc_monitor_id`), reused on REMOVE, so a fresh ADD reuses the departed
monitor's target slot instead of orphaning it. CI-compile-gated; the wedge only reproduces under
sustained churn on the RTX box, so this needs an **on-glass reconnect-storm A/B** to confirm (the box is
ephemeral). Keep `packaging/windows/reset-pf-vdisplay.ps1` as the recovery until validated.
**P2 — hygiene / architecture completion** **P2 — hygiene / architecture completion**
4. **D1-host — host-crate P0 lints.** Add `#![deny(unsafe_op_in_unsafe_fn)]` + 4. **D1-host — host-crate P0 lints.** Add `#![deny(unsafe_op_in_unsafe_fn)]` +
@@ -5,7 +5,6 @@
//! `guid: u128` → `session_id: u64` for the owned `pf_vdisplay_proto` control plane. //! `guid: u128` → `session_id: u64` for the owned `pf_vdisplay_proto` control plane.
use std::sync::Mutex; use std::sync::Mutex;
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{Duration, Instant}; use std::time::{Duration, Instant};
use wdk_sys::iddcx; use wdk_sys::iddcx;
@@ -69,8 +68,6 @@ unsafe impl Send for MonitorObject {}
/// thread ([`crate::control::start_watchdog`]) races device cleanup — for no real gain. Cleanup of the /// thread ([`crate::control::start_watchdog`]) races device cleanup — for no real gain. Cleanup of the
/// heavy per-monitor resources on device removal is instead done explicitly ([`cleanup_for_device_removal`]). /// heavy per-monitor resources on device removal is instead done explicitly ([`cleanup_for_device_removal`]).
pub static MONITOR_MODES: Mutex<Vec<MonitorObject>> = Mutex::new(Vec::new()); pub static MONITOR_MODES: Mutex<Vec<MonitorObject>> = Mutex::new(Vec::new());
/// Monitor id / EDID-serial counter (unique per created monitor).
static NEXT_ID: AtomicU32 = AtomicU32::new(1);
/// True if any virtual monitor currently exists — the host-gone watchdog only reaps when there's /// True if any virtual monitor currently exists — the host-gone watchdog only reaps when there's
/// something to reap (see [`crate::control::start_watchdog`]). /// something to reap (see [`crate::control::start_watchdog`]).
@@ -323,8 +320,6 @@ pub fn create_monitor(
dbglog!("[pf-vd] create_monitor: session {session_id} already live — departing the stale monitor"); dbglog!("[pf-vd] create_monitor: session {session_id} already live — departing the stale monitor");
remove_monitor(session_id); remove_monitor(session_id);
} }
let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
let mut modes = vec![Mode { let mut modes = vec![Mode {
width, width,
height, height,
@@ -332,8 +327,17 @@ pub fn create_monitor(
}]; }];
modes.extend(default_modes()); modes.extend(default_modes());
// Register the (pending) monitor so the mode DDIs can find it by EDID-serial id before arrival. // Register the (pending) monitor so the mode DDIs can find it by EDID-serial id before arrival, under a
if let Ok(mut lock) = MONITOR_MODES.lock() { // REUSED id (the lowest not currently live). Reclaiming the id on REMOVE — instead of a monotonic
// counter — keeps the connector index / EDID serial / container GUID bounded, so IddCx reuses the same
// OS target slot on a fresh ADD rather than leaving a ghost monitor node behind (the slot-exhaustion
// wedge: sustained ADD/REMOVE churn eventually makes ADD fail 0x80070490 ERROR_NOT_FOUND). Allocated
// under the lock with the push so two concurrent ADDs can't pick the same id.
let id = {
let Ok(mut lock) = MONITOR_MODES.lock() else {
return None;
};
let id = alloc_monitor_id(&lock);
lock.push(MonitorObject { lock.push(MonitorObject {
object: None, object: None,
id, id,
@@ -345,9 +349,8 @@ pub fn create_monitor(
swap_chain_processor: None, swap_chain_processor: None,
created_at: Instant::now(), created_at: Instant::now(),
}); });
} else { id
return None; };
}
// EDID (serial = id) describes the monitor; the OS calls back into parse_monitor_description. // EDID (serial = id) describes the monitor; the OS calls back into parse_monitor_description.
let mut edid = crate::edid::Edid::generate_with(id); let mut edid = crate::edid::Edid::generate_with(id);
@@ -505,6 +508,17 @@ fn remove_by_id(id: u32) {
} }
} }
/// The lowest monitor id (≥1) not currently live. Reusing freed ids (instead of a monotonic counter) keeps
/// the connector index / EDID serial / container GUID bounded to the number of concurrent monitors, so a
/// fresh ADD reuses a departed monitor's OS target slot rather than allocating a new one and orphaning the
/// old (the ghost-monitor accumulation that wedges ADD at 0x80070490 ERROR_NOT_FOUND). Caller holds
/// `MONITOR_MODES`. With ≤ N live ids, a free one always exists in `1..=N+1` (pigeonhole).
fn alloc_monitor_id(modes: &[MonitorObject]) -> u32 {
(1u32..=modes.len() as u32 + 1)
.find(|id| !modes.iter().any(|m| m.id == *id))
.unwrap_or(1)
}
/// A deterministic, monitor-unique container GUID (groups targets into a physical device). Derived from /// A deterministic, monitor-unique container GUID (groups targets into a physical device). Derived from
/// `id` so it is stable + collision-free without a random source. /// `id` so it is stable + collision-free without a random source.
fn container_guid(id: u32) -> wdk_sys::GUID { fn container_guid(id: u32) -> wdk_sys::GUID {