feat(windows-host): pf-vdisplay — fix the ADD/REMOVE wedge + per-client display-config persistence

Two phases of pf-vdisplay (IddCx virtual display) lifecycle work, both validated on-glass on the RTX box.

Phase 1 — fix the long-standing IOCTL_ADD 0x80070490 (ERROR_NOT_FOUND) wedge that ghost-monitor
slot-budget exhaustion produced under ADD/REMOVE churn (the reset-script/reboot recurring failure).
Validated: 43 reconnect-churn cycles, 0 wedges, monitor-node count flat at 1.
  * driver: on IddCxMonitorArrival failure, tear the created-but-not-arrived monitor down with
    WdfObjectDelete + reclaim its id — the asymmetric-with-the-create-failure-path leak that exhausted
    the 16-monitor MaxMonitorsSupported budget; recover MONITOR_MODES from lock poisoning instead of
    failing closed (defensive; the driver builds panic=abort).
  * host: collapse the build-retry churn — hold ONE monitor lease across all build attempts and preempt
    only on Lingering (not Active), so a cold start does 1 ADD not 8; reap not-present "punktfunk"
    monitor PDOs on startup (the reset-script step-2 logic, in-process) and self-heal a detected
    0x80070490 by reaping + retrying ADD; force-preempt a stuck-Active prior monitor on the
    begin_idd_setup timeout (the safety net the Lingering-only preempt would otherwise drop).

Phase 2 — give each client (keyed by its cert FINGERPRINT) a STABLE virtual-monitor id (1..=15) so
Windows reapplies that client's saved per-monitor config (DPI SCALING) across reconnects, and two
clients never share/bleed config. Validated: distinct clients -> distinct ids (1, 2); the driver
honors the host's id (echoed resolved == preferred).
  * proto: rename AddRequest._reserved -> preferred_monitor_id (offset 20) and AddReply._reserved ->
    resolved_monitor_id (offset 12) — byte-compatible (offset asserts), NO PROTOCOL_VERSION bump, so a
    pre-Phase-2 driver degrades gracefully to auto-id (the host detects it via the resolved echo).
  * driver: create_monitor honors a host-supplied preferred id via resolve_id (range 1..=15, never
    collides with a live monitor) and seeds the EDID serial + IddCx ConnectorIndex + ContainerId from it.
  * host: a persisted LRU fingerprint->id map (%ProgramData%\punktfunk\pf-vdisplay-identity.json),
    threaded to add_monitor via a set_client_identity no-op trait method (Linux/GameStream unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-29 21:42:59 +02:00
parent 080c55dbf7
commit 0f798d62b6
8 changed files with 553 additions and 83 deletions
@@ -59,8 +59,9 @@ pub(crate) trait VdisplayDriver: Send + Sync {
/// # Safety
/// Issues setup-API + `DeviceIoControl` calls; runs in the caller's apartment.
unsafe fn open(&self) -> Result<(OwnedHandle, u32)>;
/// ADD a virtual monitor at `mode`, pinning the IDD render GPU to `render_luid` first if `Some`.
/// Returns the REMOVE key + target id + the adapter LUID the driver actually used.
/// ADD a virtual monitor at `mode`, pinning the IDD render GPU to `render_luid` first if `Some`, and
/// requesting `preferred_monitor_id` (the host's per-client stable id; `0` = auto). Returns the REMOVE
/// key + target id + the adapter LUID the driver actually used.
///
/// # Safety
/// `dev` must be the live control handle from [`open`](Self::open).
@@ -69,6 +70,7 @@ pub(crate) trait VdisplayDriver: Send + Sync {
dev: HANDLE,
mode: Mode,
render_luid: Option<LUID>,
preferred_monitor_id: u32,
) -> Result<AddedMonitor>;
/// REMOVE the monitor identified by `key`.
///
@@ -134,6 +136,10 @@ pub(crate) struct VirtualDisplayManager {
/// The current IDD-push session's stop flag; a new connection signals the prior one to release its
/// monitor before the fresh one is created (was the `IDD_SESSION_STOP` global in `punktfunk1`).
idd_session_stop: Mutex<Option<Arc<AtomicBool>>>,
/// Persistent per-client (cert-fingerprint) → stable monitor-id map. A monitor CREATE resolves the
/// connecting client's id here, so the client keeps the same EDID serial + IddCx ConnectorIndex across
/// reconnects and Windows reapplies its saved per-monitor config (DPI scaling). See [`super::identity`].
identity_map: Mutex<super::identity::MonitorIdentityMap>,
}
static VDM: OnceLock<VirtualDisplayManager> = OnceLock::new();
@@ -149,6 +155,7 @@ pub(crate) fn init(driver: Box<dyn VdisplayDriver>) -> &'static VirtualDisplayMa
state: Mutex::new(MgrState::Idle),
setup_lock: Mutex::new(()),
idd_session_stop: Mutex::new(None),
identity_map: Mutex::new(super::identity::MonitorIdentityMap::load()),
})
}
@@ -196,30 +203,40 @@ impl VirtualDisplayManager {
}
/// Acquire the shared monitor for a new session: preempt-recreate under IDD-push, join a live one
/// (refcount++), reuse a lingering one, or create one. The returned [`MonitorLease`] releases the
/// refcount on drop.
pub(crate) fn acquire(&'static self, mode: Mode) -> Result<VirtualOutput> {
/// (refcount++), reuse a lingering one, or create one. `client_fp` (the connecting client's cert
/// fingerprint; `None` = anonymous/GameStream) gives a freshly CREATED monitor a STABLE per-client id
/// (so Windows reapplies that client's saved per-monitor config); JOIN and lingering-reuse keep the
/// existing monitor's id. The returned [`MonitorLease`] releases the refcount on drop.
pub(crate) fn acquire(
&'static self,
mode: Mode,
client_fp: Option<[u8; 32]>,
) -> Result<VirtualOutput> {
self.ensure_linger_timer();
let mut state = self.state.lock().unwrap();
let dev = self.ensure_device()?;
// IDD-push: a new connection while a monitor is live is a single-client RECONNECT (the prior
// client is gone). A REUSED IddCx swap-chain is DEAD, so joining it hands a black screen —
// PREEMPT: tear the old monitor down (its key/topology are restored) and create a fresh one. The
// old session's lease is gen-stamped, so its later drop is a no-op and can't tear down the new one.
if idd_push_mode() && matches!(*state, MgrState::Active { .. } | MgrState::Lingering { .. })
{
if let MgrState::Active { mon, .. } | MgrState::Lingering { mon, .. } =
std::mem::replace(&mut *state, MgrState::Idle)
// IDD-push: a new connection while a monitor is LINGERING is a single-client RECONNECT (the
// prior session fully released). A REUSED IddCx swap-chain is DEAD, so reusing it hands a black
// screen — PREEMPT: tear the lingering monitor down (its key/topology are restored) and create a
// fresh one. The old session's lease is gen-stamped, so its later drop is a no-op.
//
// ONLY Lingering, NOT Active: an Active monitor still has a lease held — that's the build-retry
// path (`build_pipeline_with_retry` holds one lease across all attempts) or a concurrent session,
// NOT a reconnect. Preempting Active would tear a live session down AND churn REMOVE→ADD on every
// retry — the per-cold-start monitor churn that exhausts the IddCx slot pool and wedges ADD at
// 0x80070490. Active falls through to the JOIN path below (refcount++, no ADD).
if idd_push_mode() && matches!(*state, MgrState::Lingering { .. }) {
if let MgrState::Lingering { mon, .. } = std::mem::replace(&mut *state, MgrState::Idle)
{
tracing::info!(
old_target = mon.target_id,
"IDD-push reconnect — preempting the prior session, recreating a fresh monitor"
"IDD-push reconnect — preempting the lingering monitor, recreating a fresh one"
);
// SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is the value
// `ensure_device()` returned above (the device is cached in the `OnceLock` and never
// closed for the manager's lifetime). `mon` was moved out of the prior `Active`/
// `Lingering` state by `mem::replace`, so it is exclusively owned here — no aliasing.
// closed for the manager's lifetime). `mon` was moved out of the prior `Lingering`
// state by `mem::replace`, so it is exclusively owned here — no aliasing.
unsafe { self.teardown(dev, mon) };
// Let the OS finish the ASYNC monitor departure before the next ADD; a back-to-back
// REMOVE→ADD races the teardown and the ADD IOCTL is rejected under reconnect churn.
@@ -264,7 +281,7 @@ impl VirtualDisplayManager {
// SAFETY: `create_monitor` requires `dev` to be the live control handle; `dev` is the
// handle `ensure_device()` returned above (cached in the `OnceLock`, never closed for the
// manager's lifetime), and we hold the `state` lock.
MgrState::Idle => unsafe { self.create_monitor(dev, mode)? },
MgrState::Idle => unsafe { self.create_monitor(dev, mode, client_fp)? },
MgrState::Active { .. } => unreachable!("handled above"),
};
let out = self.output_for(&mon);
@@ -291,12 +308,26 @@ impl VirtualDisplayManager {
///
/// # Safety
/// `dev` must be the live control handle.
unsafe fn create_monitor(&'static self, dev: HANDLE, mode: Mode) -> Result<Monitor> {
unsafe fn create_monitor(
&'static self,
dev: HANDLE,
mode: Mode,
client_fp: Option<[u8; 32]>,
) -> Result<Monitor> {
// Resolve the connecting client's STABLE per-client monitor id (so Windows reapplies its saved
// per-monitor config — DPI scaling — on reconnect); `None`/anonymous → 0 = the driver
// auto-allocates the lowest-free id (the original slot-based behavior).
let preferred_id = client_fp
.map(|fp| self.identity_map.lock().unwrap().resolve(fp))
.unwrap_or(0);
// SAFETY: `create_monitor`'s own `# Safety` contract guarantees `dev` is the live control
// handle; we forward it unchanged to `add_monitor`, whose precondition is exactly that.
// `resolve_render_pin()` returns an `Option<LUID>` by value (plain `Copy`), so no borrowed
// memory crosses the call.
let added = unsafe { self.driver.add_monitor(dev, mode, resolve_render_pin())? };
let added = unsafe {
self.driver
.add_monitor(dev, mode, resolve_render_pin(), preferred_id)?
};
// Mandatory keepalive: ping inside the watchdog window or the driver tears all displays down.
// The pinger reaches the singleton for both the device + the driver — no raw-handle smuggle.
@@ -510,25 +541,62 @@ impl VirtualDisplayManager {
let prev = self.idd_session_stop.lock().unwrap().replace(stop);
if let Some(prev_stop) = prev {
prev_stop.store(true, Ordering::SeqCst);
self.wait_for_monitor_released(Duration::from_secs(3));
if !self.wait_for_monitor_released(Duration::from_secs(3)) {
// TIMEOUT: the prior session is STILL Active (a wedged/slow teardown). `acquire`'s preempt
// is now Lingering-only (so build-retries JOIN the held monitor instead of churning
// REMOVE→ADD), which means the upcoming `_retry_hold` acquire would JOIN this stuck monitor
// and reuse its DEAD IddCx swap-chain → a full-session black screen with no self-heal until
// this session disconnects. Force-preempt it HERE instead. This runs at most ONCE per
// session (we hold `setup_lock`), so — unlike preempting inside `acquire` — it does not
// reintroduce the per-retry churn. The next `acquire` then sees `Idle` and creates a fresh
// monitor; the stale session's gen-stamped lease release is a no-op.
if let Some(dev) = self.device_handle() {
let taken = {
let mut state = self.state.lock().unwrap();
match std::mem::replace(&mut *state, MgrState::Idle) {
MgrState::Active { mon, .. } => Some(mon),
// Raced to Lingering/Idle between the wait and here — restore + nothing stuck.
other => {
*state = other;
None
}
}
};
if let Some(mon) = taken {
tracing::warn!(
old_target = mon.target_id,
"IDD-push setup: force-preempting the stuck-Active prior monitor (its IddCx swap-chain is dead)"
);
// SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is the
// cached process-lifetime `OwnedHandle` from `device_handle()` (the `Some` checked
// above). `mon` was moved out of the `Active` state under the `state` lock, so it is
// exclusively owned here — no aliasing.
unsafe { self.teardown(dev, mon) };
// Let the OS finish the ASYNC departure before the next ADD (mirrors the acquire()
// Lingering-preempt settle).
thread::sleep(Duration::from_millis(400));
}
}
}
}
guard
}
/// Wait (up to `timeout`) for the active monitor to be RELEASED (the MGR is no longer `Active`).
/// Used by the IDD-push reconnect preempt: after signalling the old session to stop, wait here so it
/// tears its monitor down cleanly before we acquire a fresh one.
pub(crate) fn wait_for_monitor_released(&self, timeout: Duration) {
/// tears its monitor down cleanly before we acquire a fresh one. Returns `true` if it released, `false`
/// on timeout (the prior session is still `Active` — the caller force-preempts it).
pub(crate) fn wait_for_monitor_released(&self, timeout: Duration) -> bool {
let deadline = Instant::now() + timeout;
loop {
if !matches!(*self.state.lock().unwrap(), MgrState::Active { .. }) {
return;
return true;
}
if Instant::now() >= deadline {
tracing::warn!(
"IDD-push preempt: prior session didn't release the monitor within {timeout:?} — proceeding"
"IDD-push preempt: prior session didn't release the monitor within {timeout:?} — force-preempting"
);
return;
return false;
}
thread::sleep(Duration::from_millis(25));
}