feat(windows-host): pf-vdisplay — fix the ADD/REMOVE wedge + per-client display-config persistence

Two phases of pf-vdisplay (IddCx virtual display) lifecycle work, both validated on-glass on the RTX box.

Phase 1: fix the long-standing IOCTL_ADD 0x80070490 (ERROR_NOT_FOUND) wedge that ghost-monitor
slot-budget exhaustion produced under ADD/REMOVE churn (the recurring failure that previously needed
the reset script or a reboot).
Validated: 43 reconnect-churn cycles, 0 wedges, monitor-node count flat at 1.

* driver: on IddCxMonitorArrival failure, tear the created-but-not-arrived monitor down with
  WdfObjectDelete and reclaim its id. This path leaked where the create-failure path did not, and the
  leak exhausted the 16-monitor MaxMonitorsSupported budget. Also recover MONITOR_MODES from lock
  poisoning instead of failing closed (defensive; the driver builds panic=abort).
* host: collapse the build-retry churn. Hold ONE monitor lease across all build attempts and preempt
  only on Lingering (not Active), so a cold start does 1 ADD, not 8. Reap not-present "punktfunk"
  monitor PDOs on startup (the reset-script step-2 logic, in-process) and self-heal a detected
  0x80070490 by reaping and retrying ADD. Force-preempt a stuck-Active prior monitor on the
  begin_idd_setup timeout (the safety net the Lingering-only preempt would otherwise drop).

Phase 2: give each client (keyed by its cert FINGERPRINT) a STABLE virtual-monitor id (1..=15) so
Windows reapplies that client's saved per-monitor config (DPI SCALING) across reconnects, and two
clients never share or bleed config.
Validated: distinct clients -> distinct ids (1, 2); the driver honors the host's id (echoed
resolved == preferred).

* proto: rename AddRequest._reserved -> preferred_monitor_id (offset 20) and AddReply._reserved ->
  resolved_monitor_id (offset 12). Byte-compatible (offset asserts), NO PROTOCOL_VERSION bump, so a
  pre-Phase-2 driver degrades gracefully to auto-id (the host detects it via the resolved echo).
* driver: create_monitor honors a host-supplied preferred id via resolve_id (range 1..=15, never
  collides with a live monitor) and seeds the EDID serial + IddCx ConnectorIndex + ContainerId from it.
* host: a persisted LRU fingerprint->id map (%ProgramData%\punktfunk\pf-vdisplay-identity.json),
  threaded to add_monitor via a set_client_identity trait method whose default is a no-op
  (Linux/GameStream unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
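
The byte-compatibility claim for the proto rename can be pinned down with compile-time offset asserts. The following is a minimal sketch, not the real `pf_driver_proto` definitions: only the field names `preferred_monitor_id` / `resolved_monitor_id` and their offsets (20 and 12) come from the commit message; every other field, the struct sizes, and the helper `driver_honored_preferred` are assumptions made for illustration.

```rust
use std::mem::{offset_of, size_of};

/// HYPOTHETICAL layout. Only `preferred_monitor_id` at offset 20 is from the commit message.
#[repr(C)]
pub struct AddRequest {
    pub session_id: u64,           // assumed field, offsets 0..8
    pub width: u32,                // offset 8
    pub height: u32,               // offset 12
    pub refresh_hz: u32,           // offset 16
    pub preferred_monitor_id: u32, // offset 20 (formerly `_reserved`); 0 = auto
}

/// HYPOTHETICAL layout. Only `resolved_monitor_id` at offset 12 is from the commit message.
#[repr(C)]
pub struct AddReply {
    pub adapter_luid_low: u32,    // assumed field, offset 0
    pub adapter_luid_high: i32,   // assumed field, offset 4
    pub target_id: u32,           // assumed field, offset 8
    pub resolved_monitor_id: u32, // offset 12 (formerly `_reserved`)
}

// Renaming a reserved field must not move it: these fail the build if the layout drifts.
const _: () = assert!(offset_of!(AddRequest, preferred_monitor_id) == 20);
const _: () = assert!(offset_of!(AddReply, resolved_monitor_id) == 12);
const _: () = assert!(size_of::<AddRequest>() == 24);
const _: () = assert!(size_of::<AddReply>() == 16);

/// True when the driver echoed the id the host asked for. ASSUMPTION: a driver that predates
/// the rename leaves the reply field at a value that differs from a non-zero request.
pub fn driver_honored_preferred(preferred: u32, resolved: u32) -> bool {
    preferred != 0 && resolved == preferred
}
```

With this shape, a host that sends `preferred = 2` and reads back `resolved = 0` would treat the driver as auto-id and carry on, which is one way to read "degrades gracefully" above.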
@@ -2792,6 +2792,11 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
     // host-lifetime VirtualDisplayManager (§2.5). It does NO monitor work, so it must precede the IDD-push
     // preempt below (which reaches the manager) — otherwise `vdm()` is called before init and panics.
     let mut vd = crate::vdisplay::open(compositor)?;
+    // Per-client STABLE monitor identity (Phase 2): hand the backend the connecting client's cert
+    // fingerprint so a freshly CREATED virtual monitor gets this client's persistent id — Windows then
+    // reapplies the client's saved per-monitor config (DPI scaling) on reconnect. No-op on Linux backends
+    // and for anonymous/GameStream clients (no fingerprint → the driver auto-allocates).
+    vd.set_client_identity(endpoint::peer_fingerprint(&conn));
     // IDD-push reconnect preempt (the dance now lives in the manager, Goal-1 §2.5): serialize setup so a
     // reconnect FLOOD can't run concurrent monitor create/teardown, STOP the prior session + WAIT for it
     // to release its monitor (instead of tearing a monitor out from under a still-live session), and
@@ -3310,6 +3315,23 @@ fn build_pipeline_with_retry(
     // 30-60s to produce its first frame, and a first-connect timeout would tear down the warm
     // session (forcing another cold start on reconnect). A genuinely permanent failure still fails
     // fast via `is_permanent_build_error`; only transient "no frame yet" retries consume the budget.
+    // IDD-push only: HOLD one monitor lease across all build attempts. A failed attempt's capturer
+    // drop releases ITS lease, but this held lease keeps the shared monitor Active (refs >= 1), so the
+    // next attempt's `vd.create` JOINS it (refcount++) instead of finding it Lingering and tripping the
+    // IDD-push reconnect PREEMPT (teardown + recreate). That preempt-per-retry was the REMOVE→ADD churn
+    // that exhausts the IddCx monitor-slot pool and wedges ADD at 0x80070490 — one ADD per cold start
+    // now, not one per attempt. Non-IDD-push backends (Linux portal, WGC) don't use the refcount manager
+    // and aren't churn-wedge-prone, so they keep create-per-attempt (a held lease there would allocate a
+    // second virtual output). Dropped when this fn returns — on success the Pipeline's own lease keeps
+    // the monitor Active; on failure refs falls to 0 → Lingering → linger-timeout teardown.
+    let _retry_hold = if matches!(plan.capture, crate::session_plan::CaptureBackend::IddPush) {
+        Some(
+            vd.create(mode)
+                .context("acquire virtual output for the session (retry-hold lease)")?,
+        )
+    } else {
+        None
+    };
     const MAX_ATTEMPTS: u32 = 8;
     let mut backoff = std::time::Duration::from_millis(500);
     for attempt in 1..=MAX_ATTEMPTS {
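
The effect of the held lease plus the Lingering-only preempt can be checked with a toy model that only counts ADDs. This is a sketch under stated assumptions: `State`, `Manager`, and `cold_start` are hypothetical stand-ins for the real manager (which carries a monitor handle, a generation stamp, and a linger timer), and the `preempt_active` flag exists only to compare the old and new preempt rules.

```rust
/// Simplified stand-in for the manager state (the real `MgrState` carries a monitor).
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Active { refs: u32 },
    Lingering,
}

struct Manager {
    state: State,
    adds: u32, // number of ADDs issued so far
}

impl Manager {
    fn new() -> Self {
        Manager { state: State::Idle, adds: 0 }
    }

    /// Acquire a lease. `preempt_active = true` models the old rule (preempt Active and
    /// Lingering); `false` models the new rule (preempt Lingering only, JOIN Active).
    fn acquire(&mut self, preempt_active: bool) {
        match self.state {
            State::Active { refs } if !preempt_active => {
                self.state = State::Active { refs: refs + 1 }; // JOIN: no ADD
            }
            State::Active { .. } | State::Lingering | State::Idle => {
                self.adds += 1; // create, or teardown + recreate
                self.state = State::Active { refs: 1 };
            }
        }
    }

    /// Release a lease; the last release leaves the monitor Lingering. A release against a
    /// non-Active state is ignored (loosely mirroring the gen-stamped no-op drop).
    fn release(&mut self) {
        if let State::Active { refs } = self.state {
            self.state = if refs > 1 { State::Active { refs: refs - 1 } } else { State::Lingering };
        }
    }
}

/// Simulate one cold start whose first `failures` build attempts fail; returns the ADD count.
fn cold_start(failures: u32, hold_lease: bool, preempt_active: bool) -> u32 {
    let mut m = Manager::new();
    if hold_lease {
        m.acquire(preempt_active); // the retry-hold lease
    }
    for _ in 0..failures {
        m.acquire(preempt_active); // the attempt's own lease
        m.release(); // a failed attempt drops its capturer
    }
    m.acquire(preempt_active); // the attempt that succeeds keeps its lease
    if hold_lease {
        m.release(); // the retry-hold lease is dropped when the function returns
    }
    m.adds
}
```

In this model, seven failed attempts followed by a success cost eight ADDs without the held lease, and a single ADD with the held lease and the Lingering-only rule. Holding the lease while still preempting Active does not help, which is why the two changes land together.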
@@ -58,6 +58,12 @@ pub trait VirtualDisplay: Send {
     /// sessions can't stomp each other's launch target. Default: no-op (backends that attach to an
     /// existing session / don't spawn a nested command ignore it; only gamescope's spawn path uses it).
     fn set_launch_command(&mut self, _cmd: Option<String>) {}
+    /// Set the connecting client's cert fingerprint so the backend can give that client a STABLE virtual
+    /// monitor identity across reconnects — Windows then reapplies the client's saved per-monitor config
+    /// (notably DPI scaling). Carried on the backend instance; set once before [`create`](Self::create).
+    /// Default: no-op — only the Windows pf-vdisplay backend uses it (Linux compositors own their virtual
+    /// output identity). `None` = anonymous/unpaired/GameStream → the backend's auto (slot-based) identity.
+    fn set_client_identity(&mut self, _fingerprint: Option<[u8; 32]>) {}
 }

 /// Compositors punktfunk knows how to drive (plan §6).
@@ -641,6 +647,9 @@ pub fn start_restore_worker() -> std::sync::Arc<()> {
 #[cfg(target_os = "linux")]
 #[path = "vdisplay/linux/gamescope.rs"]
 mod gamescope;
+#[cfg(target_os = "windows")]
+#[path = "vdisplay/windows/identity.rs"]
+pub(crate) mod identity;
 #[cfg(target_os = "linux")]
 #[path = "vdisplay/linux/kwin.rs"]
 mod kwin;
@@ -0,0 +1,172 @@
+//! Per-client → stable monitor-id map for pf-vdisplay (Phase 2: per-client display-config persistence).
+//!
+//! Windows keys per-monitor config — notably DPI **scaling** (`HKCU\Control Panel\Desktop\PerMonitorSettings`)
+//! — on the monitor's EDID identity AND its OS device path (whose per-connector discriminator is the IddCx
+//! `ConnectorIndex` → target UID). The pf-vdisplay driver seeds BOTH the EDID serial and the `ConnectorIndex`
+//! from a single monitor `id`. So for Windows to REAPPLY a given client's saved scaling on reconnect, that
+//! client must get the SAME `id` every time. This map assigns each client (keyed by its cert fingerprint) a
+//! STABLE id and the host passes it as [`AddRequest::preferred_monitor_id`](pf_driver_proto::control::AddRequest).
+//!
+//! The id space is bounded to `1..=15` because the driver uses the id as the IddCx `ConnectorIndex`, which
+//! must stay `< MaxMonitorsSupported` (16). When more than 15 distinct clients are remembered, the
+//! LEAST-RECENTLY-USED entry is evicted and its id reused (that evicted client simply re-establishes its
+//! scaling once on its next connect). The map persists to `%ProgramData%\punktfunk\pf-vdisplay-identity.json`
+//! so ids — and therefore the client→config association — survive host restarts.
+//!
+//! Anonymous/TOFU and GameStream sessions have no fingerprint and resolve to id `0` (auto) upstream, never
+//! reaching this map — they keep the driver's lowest-free slot behavior unchanged.
+
+use std::path::PathBuf;
+
+use serde::{Deserialize, Serialize};
+
+/// Max stable id. The driver uses the id as the IddCx `ConnectorIndex`, which must stay
+/// `< MaxMonitorsSupported` (16) — so ids run `1..=15`.
+const MAX_ID: u32 = 15;
+
+#[derive(Serialize, Deserialize, Default)]
+struct Store {
+    /// Monotonic most-recently-used counter (the entry with the highest `seen` is the MRU). Persisted so
+    /// the LRU ordering survives host restarts.
+    tick: u64,
+    entries: Vec<Entry>,
+}
+
+#[derive(Serialize, Deserialize)]
+struct Entry {
+    /// Lower-hex client cert fingerprint (the map key).
+    fp: String,
+    /// The client's stable monitor id (`1..=15`).
+    id: u32,
+    /// MRU stamp (compared against [`Store::tick`]).
+    seen: u64,
+}
+
+/// Persistent fingerprint → stable-id map (see the module docs).
+pub(crate) struct MonitorIdentityMap {
+    path: PathBuf,
+    store: Store,
+}
+
+impl MonitorIdentityMap {
+    /// Load the persisted map (empty on first run / unreadable / parse failure — a fresh map just
+    /// re-derives ids, costing a client one scaling re-set the first time).
+    pub(crate) fn load() -> Self {
+        let path = crate::gamestream::config_dir().join("pf-vdisplay-identity.json");
+        let mut store = std::fs::read(&path)
+            .ok()
+            .and_then(|b| serde_json::from_slice::<Store>(&b).ok())
+            .unwrap_or_default();
+        // SANITIZE a hand-edited / corrupt / cross-version file before trusting it: resolve()'s found-entry
+        // branch returns the stored id verbatim, so an out-of-range id (0 = the "auto" sentinel, or
+        // > MAX_ID) or a duplicate id/fp would flow straight into preferred_monitor_id. Drop out-of-range
+        // ids and dedup by BOTH fp and id (keeping the most-recently-seen on a clash) so no two fingerprints
+        // can map to the same id. (The driver also rejects a live-colliding id as a backstop.)
+        store.entries.sort_by_key(|e| std::cmp::Reverse(e.seen));
+        let mut seen_fp = std::collections::HashSet::new();
+        let mut seen_id = std::collections::HashSet::new();
+        store.entries.retain(|e| {
+            (1..=MAX_ID).contains(&e.id) && seen_fp.insert(e.fp.clone()) && seen_id.insert(e.id)
+        });
+        Self { path, store }
+    }
+
+    /// The stable id (`1..=15`) for the client fingerprint `fp`: its remembered id, or a freshly assigned
+    /// one (lowest free, else LRU-evict at the cap). Bumps the entry to MRU and persists.
+    pub(crate) fn resolve(&mut self, fp: [u8; 32]) -> u32 {
+        let key: String = fp.iter().map(|b| format!("{b:02x}")).collect();
+        self.store.tick = self.store.tick.wrapping_add(1);
+        let now = self.store.tick;
+
+        if let Some(e) = self.store.entries.iter_mut().find(|e| e.fp == key) {
+            e.seen = now;
+            let id = e.id;
+            self.persist();
+            return id;
+        }
+
+        // New client: prefer the lowest free id in 1..=MAX_ID; if all are taken, evict the LRU entry and
+        // reuse its id (the evicted client re-establishes its scaling once on its next connect).
+        let id = (1..=MAX_ID)
+            .find(|i| !self.store.entries.iter().any(|e| e.id == *i))
+            .unwrap_or_else(|| {
+                let lru = self
+                    .store
+                    .entries
+                    .iter()
+                    .enumerate()
+                    .min_by_key(|(_, e)| e.seen)
+                    .map(|(i, _)| i)
+                    .expect("entries are non-empty whenever every id 1..=MAX_ID is taken");
+                let evicted = self.store.entries.remove(lru);
+                evicted.id
+            });
+        self.store.entries.push(Entry {
+            fp: key,
+            id,
+            seen: now,
+        });
+        self.persist();
+        id
+    }
+
+    /// Persist atomically (temp file + rename). Best-effort: a write failure just means a restart may
+    /// re-derive an id (one scaling re-set). Not a credential, so a plain (non-ACL'd) write is fine.
+    fn persist(&self) {
+        let Ok(bytes) = serde_json::to_vec_pretty(&self.store) else {
+            return;
+        };
+        if let Some(dir) = self.path.parent() {
+            let _ = std::fs::create_dir_all(dir);
+        }
+        let tmp = self.path.with_extension("json.tmp");
+        if std::fs::write(&tmp, &bytes).is_ok() {
+            let _ = std::fs::rename(&tmp, &self.path);
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn fp(n: u8) -> [u8; 32] {
+        let mut f = [0u8; 32];
+        f[0] = n;
+        f
+    }
+
+    #[test]
+    fn stable_across_calls_and_distinct_per_client() {
+        let mut m = MonitorIdentityMap {
+            path: std::env::temp_dir().join(format!("pf-id-test-{}.json", std::process::id())),
+            store: Store::default(),
+        };
+        let a1 = m.resolve(fp(1));
+        let b = m.resolve(fp(2));
+        let a2 = m.resolve(fp(1));
+        assert_eq!(a1, a2, "same client → same id");
+        assert_ne!(a1, b, "distinct clients → distinct ids");
+        assert!((1..=MAX_ID).contains(&a1) && (1..=MAX_ID).contains(&b));
+        let _ = std::fs::remove_file(&m.path);
+    }
+
+    #[test]
+    fn lru_eviction_reuses_an_id_at_the_cap() {
+        let mut m = MonitorIdentityMap {
+            path: std::env::temp_dir().join(format!("pf-id-lru-{}.json", std::process::id())),
+            store: Store::default(),
+        };
+        // Fill all 15 ids (clients 1..=15), then touch client 2 so client 1 is the LRU.
+        for n in 1..=15u8 {
+            m.resolve(fp(n));
+        }
+        let _ = m.resolve(fp(2));
+        // A 16th client evicts the LRU (client 1) and reuses its id; ids stay bounded.
+        let id16 = m.resolve(fp(16));
+        assert!((1..=MAX_ID).contains(&id16));
+        assert_eq!(m.store.entries.len(), 15, "cap holds at 15 entries");
+        assert!(m.store.entries.iter().all(|e| (1..=MAX_ID).contains(&e.id)));
+        let _ = std::fs::remove_file(&m.path);
+    }
+}
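
The two tests in the module exercise `resolve`, but not the load-time sanitization of a hand-edited or corrupt file. One way that step could be covered is to factor it into a pure function. The sketch below is an assumption about how that might look, not part of the commit: `sanitize` and the helper `entry` are hypothetical names, and `Entry` is a simplified copy without the serde derives.

```rust
use std::collections::HashSet;

const MAX_ID: u32 = 15;

/// Simplified copy of the persisted entry (serde derives omitted for the sketch).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    fp: String,
    id: u32,
    seen: u64,
}

/// Hypothetical helper for building entries in tests.
fn entry(fp: &str, id: u32, seen: u64) -> Entry {
    Entry { fp: fp.to_string(), id, seen }
}

/// Drop out-of-range ids, then keep at most one entry per fingerprint and per id. Entries are
/// visited most-recently-seen first, so the newer entry wins a clash.
fn sanitize(mut entries: Vec<Entry>) -> Vec<Entry> {
    entries.sort_by_key(|e| std::cmp::Reverse(e.seen));
    let mut seen_fp = HashSet::new();
    let mut seen_id = HashSet::new();
    entries.retain(|e| {
        (1..=MAX_ID).contains(&e.id) && seen_fp.insert(e.fp.clone()) && seen_id.insert(e.id)
    });
    entries
}
```

Fed a list containing id `0`, id `16`, two fingerprints sharing id `3`, and one fingerprint listed twice, this keeps only the most recently seen entry of each clash and nothing out of range. As in the original `retain` closure, a fingerprint is recorded as seen even when its entry is then rejected for a duplicate id.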
@@ -59,8 +59,9 @@ pub(crate) trait VdisplayDriver: Send + Sync {
     /// # Safety
     /// Issues setup-API + `DeviceIoControl` calls; runs in the caller's apartment.
     unsafe fn open(&self) -> Result<(OwnedHandle, u32)>;
-    /// ADD a virtual monitor at `mode`, pinning the IDD render GPU to `render_luid` first if `Some`.
-    /// Returns the REMOVE key + target id + the adapter LUID the driver actually used.
+    /// ADD a virtual monitor at `mode`, pinning the IDD render GPU to `render_luid` first if `Some`, and
+    /// requesting `preferred_monitor_id` (the host's per-client stable id; `0` = auto). Returns the REMOVE
+    /// key + target id + the adapter LUID the driver actually used.
     ///
     /// # Safety
     /// `dev` must be the live control handle from [`open`](Self::open).
@@ -69,6 +70,7 @@ pub(crate) trait VdisplayDriver: Send + Sync {
         dev: HANDLE,
         mode: Mode,
         render_luid: Option<LUID>,
+        preferred_monitor_id: u32,
     ) -> Result<AddedMonitor>;
     /// REMOVE the monitor identified by `key`.
     ///
@@ -134,6 +136,10 @@ pub(crate) struct VirtualDisplayManager {
     /// The current IDD-push session's stop flag; a new connection signals the prior one to release its
     /// monitor before the fresh one is created (was the `IDD_SESSION_STOP` global in `punktfunk1`).
     idd_session_stop: Mutex<Option<Arc<AtomicBool>>>,
+    /// Persistent per-client (cert-fingerprint) → stable monitor-id map. A monitor CREATE resolves the
+    /// connecting client's id here, so the client keeps the same EDID serial + IddCx ConnectorIndex across
+    /// reconnects and Windows reapplies its saved per-monitor config (DPI scaling). See [`super::identity`].
+    identity_map: Mutex<super::identity::MonitorIdentityMap>,
 }

 static VDM: OnceLock<VirtualDisplayManager> = OnceLock::new();
@@ -149,6 +155,7 @@ pub(crate) fn init(driver: Box<dyn VdisplayDriver>) -> &'static VirtualDisplayMa
         state: Mutex::new(MgrState::Idle),
         setup_lock: Mutex::new(()),
         idd_session_stop: Mutex::new(None),
+        identity_map: Mutex::new(super::identity::MonitorIdentityMap::load()),
     })
 }

@@ -196,30 +203,40 @@ impl VirtualDisplayManager {
     }

     /// Acquire the shared monitor for a new session: preempt-recreate under IDD-push, join a live one
-    /// (refcount++), reuse a lingering one, or create one. The returned [`MonitorLease`] releases the
-    /// refcount on drop.
-    pub(crate) fn acquire(&'static self, mode: Mode) -> Result<VirtualOutput> {
+    /// (refcount++), reuse a lingering one, or create one. `client_fp` (the connecting client's cert
+    /// fingerprint; `None` = anonymous/GameStream) gives a freshly CREATED monitor a STABLE per-client id
+    /// (so Windows reapplies that client's saved per-monitor config); JOIN and lingering-reuse keep the
+    /// existing monitor's id. The returned [`MonitorLease`] releases the refcount on drop.
+    pub(crate) fn acquire(
+        &'static self,
+        mode: Mode,
+        client_fp: Option<[u8; 32]>,
+    ) -> Result<VirtualOutput> {
         self.ensure_linger_timer();
         let mut state = self.state.lock().unwrap();
         let dev = self.ensure_device()?;

-        // IDD-push: a new connection while a monitor is live is a single-client RECONNECT (the prior
-        // client is gone). A REUSED IddCx swap-chain is DEAD, so joining it hands a black screen —
-        // PREEMPT: tear the old monitor down (its key/topology are restored) and create a fresh one. The
-        // old session's lease is gen-stamped, so its later drop is a no-op and can't tear down the new one.
-        if idd_push_mode() && matches!(*state, MgrState::Active { .. } | MgrState::Lingering { .. })
-        {
-            if let MgrState::Active { mon, .. } | MgrState::Lingering { mon, .. } =
-                std::mem::replace(&mut *state, MgrState::Idle)
+        // IDD-push: a new connection while a monitor is LINGERING is a single-client RECONNECT (the
+        // prior session fully released). A REUSED IddCx swap-chain is DEAD, so reusing it hands a black
+        // screen — PREEMPT: tear the lingering monitor down (its key/topology are restored) and create a
+        // fresh one. The old session's lease is gen-stamped, so its later drop is a no-op.
+        //
+        // ONLY Lingering, NOT Active: an Active monitor still has a lease held — that's the build-retry
+        // path (`build_pipeline_with_retry` holds one lease across all attempts) or a concurrent session,
+        // NOT a reconnect. Preempting Active would tear a live session down AND churn REMOVE→ADD on every
+        // retry — the per-cold-start monitor churn that exhausts the IddCx slot pool and wedges ADD at
+        // 0x80070490. Active falls through to the JOIN path below (refcount++, no ADD).
+        if idd_push_mode() && matches!(*state, MgrState::Lingering { .. }) {
+            if let MgrState::Lingering { mon, .. } = std::mem::replace(&mut *state, MgrState::Idle)
             {
                 tracing::info!(
                     old_target = mon.target_id,
-                    "IDD-push reconnect — preempting the prior session, recreating a fresh monitor"
+                    "IDD-push reconnect — preempting the lingering monitor, recreating a fresh one"
                 );
                 // SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is the value
                 // `ensure_device()` returned above (the device is cached in the `OnceLock` and never
-                // closed for the manager's lifetime). `mon` was moved out of the prior `Active`/
-                // `Lingering` state by `mem::replace`, so it is exclusively owned here — no aliasing.
+                // closed for the manager's lifetime). `mon` was moved out of the prior `Lingering`
+                // state by `mem::replace`, so it is exclusively owned here — no aliasing.
                 unsafe { self.teardown(dev, mon) };
                 // Let the OS finish the ASYNC monitor departure before the next ADD; a back-to-back
                 // REMOVE→ADD races the teardown and the ADD IOCTL is rejected under reconnect churn.
@@ -264,7 +281,7 @@ impl VirtualDisplayManager {
             // SAFETY: `create_monitor` requires `dev` to be the live control handle; `dev` is the
             // handle `ensure_device()` returned above (cached in the `OnceLock`, never closed for the
             // manager's lifetime), and we hold the `state` lock.
-            MgrState::Idle => unsafe { self.create_monitor(dev, mode)? },
+            MgrState::Idle => unsafe { self.create_monitor(dev, mode, client_fp)? },
             MgrState::Active { .. } => unreachable!("handled above"),
         };
         let out = self.output_for(&mon);
@@ -291,12 +308,26 @@ impl VirtualDisplayManager {
     ///
     /// # Safety
     /// `dev` must be the live control handle.
-    unsafe fn create_monitor(&'static self, dev: HANDLE, mode: Mode) -> Result<Monitor> {
+    unsafe fn create_monitor(
+        &'static self,
+        dev: HANDLE,
+        mode: Mode,
+        client_fp: Option<[u8; 32]>,
+    ) -> Result<Monitor> {
+        // Resolve the connecting client's STABLE per-client monitor id (so Windows reapplies its saved
+        // per-monitor config — DPI scaling — on reconnect); `None`/anonymous → 0 = the driver
+        // auto-allocates the lowest-free id (the original slot-based behavior).
+        let preferred_id = client_fp
+            .map(|fp| self.identity_map.lock().unwrap().resolve(fp))
+            .unwrap_or(0);
         // SAFETY: `create_monitor`'s own `# Safety` contract guarantees `dev` is the live control
         // handle; we forward it unchanged to `add_monitor`, whose precondition is exactly that.
         // `resolve_render_pin()` returns an `Option<LUID>` by value (plain `Copy`), so no borrowed
         // memory crosses the call.
-        let added = unsafe { self.driver.add_monitor(dev, mode, resolve_render_pin())? };
+        let added = unsafe {
+            self.driver
+                .add_monitor(dev, mode, resolve_render_pin(), preferred_id)?
+        };

         // Mandatory keepalive: ping inside the watchdog window or the driver tears all displays down.
         // The pinger reaches the singleton for both the device + the driver — no raw-handle smuggle.
@@ -510,25 +541,62 @@ impl VirtualDisplayManager {
         let prev = self.idd_session_stop.lock().unwrap().replace(stop);
         if let Some(prev_stop) = prev {
             prev_stop.store(true, Ordering::SeqCst);
-            self.wait_for_monitor_released(Duration::from_secs(3));
+            if !self.wait_for_monitor_released(Duration::from_secs(3)) {
+                // TIMEOUT: the prior session is STILL Active (a wedged/slow teardown). `acquire`'s preempt
+                // is now Lingering-only (so build-retries JOIN the held monitor instead of churning
+                // REMOVE→ADD), which means the upcoming `_retry_hold` acquire would JOIN this stuck monitor
+                // and reuse its DEAD IddCx swap-chain → a full-session black screen with no self-heal until
+                // this session disconnects. Force-preempt it HERE instead. This runs at most ONCE per
+                // session (we hold `setup_lock`), so — unlike preempting inside `acquire` — it does not
+                // reintroduce the per-retry churn. The next `acquire` then sees `Idle` and creates a fresh
+                // monitor; the stale session's gen-stamped lease release is a no-op.
+                if let Some(dev) = self.device_handle() {
+                    let taken = {
+                        let mut state = self.state.lock().unwrap();
+                        match std::mem::replace(&mut *state, MgrState::Idle) {
+                            MgrState::Active { mon, .. } => Some(mon),
+                            // Raced to Lingering/Idle between the wait and here — restore + nothing stuck.
+                            other => {
+                                *state = other;
+                                None
+                            }
+                        }
+                    };
+                    if let Some(mon) = taken {
+                        tracing::warn!(
+                            old_target = mon.target_id,
+                            "IDD-push setup: force-preempting the stuck-Active prior monitor (its IddCx swap-chain is dead)"
+                        );
+                        // SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is the
+                        // cached process-lifetime `OwnedHandle` from `device_handle()` (the `Some` checked
+                        // above). `mon` was moved out of the `Active` state under the `state` lock, so it is
+                        // exclusively owned here — no aliasing.
+                        unsafe { self.teardown(dev, mon) };
+                        // Let the OS finish the ASYNC departure before the next ADD (mirrors the acquire()
+                        // Lingering-preempt settle).
+                        thread::sleep(Duration::from_millis(400));
+                    }
+                }
+            }
         }
         guard
     }

     /// Wait (up to `timeout`) for the active monitor to be RELEASED (the MGR is no longer `Active`).
     /// Used by the IDD-push reconnect preempt: after signalling the old session to stop, wait here so it
-    /// tears its monitor down cleanly before we acquire a fresh one.
-    pub(crate) fn wait_for_monitor_released(&self, timeout: Duration) {
+    /// tears its monitor down cleanly before we acquire a fresh one. Returns `true` if it released, `false`
+    /// on timeout (the prior session is still `Active` — the caller force-preempts it).
+    pub(crate) fn wait_for_monitor_released(&self, timeout: Duration) -> bool {
         let deadline = Instant::now() + timeout;
         loop {
             if !matches!(*self.state.lock().unwrap(), MgrState::Active { .. }) {
-                return;
+                return true;
             }
             if Instant::now() >= deadline {
                 tracing::warn!(
-                    "IDD-push preempt: prior session didn't release the monitor within {timeout:?} — proceeding"
+                    "IDD-push preempt: prior session didn't release the monitor within {timeout:?} — force-preempting"
                 );
-                return;
+                return false;
             }
             thread::sleep(Duration::from_millis(25));
         }
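
The change to `wait_for_monitor_released` turns a fire-and-forget wait into one whose outcome the caller must handle. The pattern, stripped of the manager state, is a deadline-bounded poll that reports whether the condition was met. A minimal sketch follows; `wait_until` and its `poll` parameter are hypothetical names, and the real function checks `MgrState` under a mutex on a fixed 25 ms interval.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `released` until it returns true or `timeout` elapses.
/// Returns `true` if the condition was met, `false` on timeout.
fn wait_until(mut released: impl FnMut() -> bool, timeout: Duration, poll: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if released() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```

Because the condition is checked before the deadline, an already-released monitor returns `true` without sleeping, and a caller can branch on `false` to run its fallback (here, the force-preempt) exactly once.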
@@ -75,6 +75,65 @@ unsafe fn ioctl(h: HANDLE, code: u32, input: &[u8], output: &mut [u8]) -> Result
     Ok(returned)
 }

+/// Reap the ghost (NOT-present) "punktfunk" virtual-monitor device nodes that `IddCxMonitorDeparture`
+/// leaves behind. Each departed monitor leaves a not-present "Generic Monitor (punktfunk)" PDO that keeps
+/// pinning an OS VidPN target against the IddCx adapter's fixed monitor-slot budget; once ~16 accumulate,
+/// `IOCTL_ADD` wedges at 0x80070490 (`ERROR_NOT_FOUND`) and every session black-screens until a manual
+/// reset/reboot. Removing the not-present PDOs frees the slots — the in-process equivalent of
+/// `reset-pf-vdisplay.ps1` step 2 (proven on-box). Best-effort + idempotent: only NOT-present nodes
+/// (`Status != OK`) are removed, so the LIVE session's monitor (`Status OK`) is never touched; any
+/// failure is logged and swallowed. Returns the number removed.
+fn reap_ghost_monitors() -> u32 {
+    // Mirrors reset-pf-vdisplay.ps1 step 2. powershell is always present for the SYSTEM service; the
+    // matched tokens ('OK', 'punktfunk', the InstanceId) are locale-invariant, so this is safe on a
+    // non-English box (unlike a .ps1 *file* read in the machine codepage).
+    const REAP_PS: &str = "$ErrorActionPreference='SilentlyContinue'; \
+        $g = Get-PnpDevice -Class Monitor | Where-Object { $_.Status -ne 'OK' -and $_.FriendlyName -match 'punktfunk' }; \
+        $n = 0; foreach ($d in $g) { pnputil /remove-device $d.InstanceId *> $null; if ($LASTEXITCODE -eq 0) { $n++ } }; \
+        Write-Output $n";
+    // Resolve powershell by full path — the LocalSystem service's PATH is not guaranteed to include
+    // System32 — with a bare-name fallback.
+    let ps = std::env::var("SystemRoot")
+        .map(|r| format!(r"{r}\System32\WindowsPowerShell\v1.0\powershell.exe"))
+        .unwrap_or_else(|_| "powershell.exe".to_string());
+    match std::process::Command::new(&ps)
+        .args([
+            "-NoProfile",
+            "-NonInteractive",
+            "-ExecutionPolicy",
+            "Bypass",
+            "-Command",
+            REAP_PS,
+        ])
+        .output()
+    {
+        Ok(o) => {
+            let n = String::from_utf8_lossy(&o.stdout)
+                .trim()
+                .parse::<u32>()
+                .unwrap_or(0);
+            if n > 0 {
+                tracing::warn!(
+                    reaped = n,
+                    "pf-vdisplay: reaped ghost (not-present) virtual-monitor nodes — IddCx slot-exhaustion prevention"
+                );
+            }
+            n
+        }
+        Err(e) => {
+            tracing::warn!(error = %e, "pf-vdisplay: ghost-monitor reap could not spawn powershell");
+            0
+        }
+    }
+}
+
+/// True if `e`'s chain carries the IddCx monitor-slot-exhaustion wedge HRESULT (0x80070490,
+/// `ERROR_NOT_FOUND`) — the `IOCTL_ADD` failure that ghost-PDO accumulation produces. The hex code is
+/// locale-invariant (the OS message text is not), so we match on it.
+fn is_slot_exhaustion_wedge(e: &anyhow::Error) -> bool {
+    format!("{e:#}").contains("0x80070490")
+}
+
 /// Pin the pf-vdisplay IddCx's RENDER GPU to `luid` (the analogue of Apollo's `SetRenderAdapter`). No
 /// output buffer. Issued on the driver handle BEFORE `IOCTL_ADD` to steer which GPU the new target
 /// renders on — on a multi-adapter box this stops DXGI from reparenting the virtual output onto a
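
`is_slot_exhaustion_wedge` relies on the hex code appearing somewhere in the formatted error chain, which `anyhow`'s alternate `{:#}` format joins into one string. The same check can be illustrated with the standard library alone by walking `Error::source`. The sketch below is an assumption-laden stand-in: `IoctlError`, `WithContext`, and `chain_contains` are hypothetical types and names, and it assumes the leaf error prints its code as lowercase `0x`-prefixed hex, which is what the substring match in the diff needs in order to fire.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical leaf error: an ioctl failure carrying an HRESULT.
#[derive(Debug)]
struct IoctlError {
    hresult: u32,
}

impl fmt::Display for IoctlError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `{:#010x}` prints e.g. `0x80070490` (lowercase, zero-padded to 8 hex digits).
        write!(f, "DeviceIoControl failed: {:#010x}", self.hresult)
    }
}

impl Error for IoctlError {}

/// Hypothetical context wrapper, standing in for what `with_context` adds.
#[derive(Debug)]
struct WithContext {
    msg: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.source.as_ref())
    }
}

/// True if any error in the chain (outermost first) displays text containing `needle`.
fn chain_contains(e: &(dyn Error + 'static), needle: &str) -> bool {
    let mut cur: Option<&(dyn Error + 'static)> = Some(e);
    while let Some(err) = cur {
        if err.to_string().contains(needle) {
            return true;
        }
        cur = err.source();
    }
    false
}
```

A substring match like this stays locale-invariant for the reason given in the doc comment above, but it is also sensitive to formatting: an error that printed the code in uppercase hex or without the `0x` prefix would not be detected.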
@@ -193,6 +252,12 @@ impl VdisplayDriver for PfVdisplayDriver {
         } else {
             tracing::warn!("pf-vdisplay IOCTL_CLEAR_ALL failed on startup (continuing)");
         }
+        // CLEAR_ALL only departs the driver's own (in-process) monitor list; it can NOT remove the
+        // OS-side not-present "Generic Monitor (punktfunk)" PDOs that a previous host-run's monitor
+        // departures left behind. Reap those here so a fresh host start begins with a clean IddCx
+        // monitor-slot budget — prevents the 0x80070490 slot-exhaustion wedge from carrying across
+        // restarts (the reason a restart's CLEAR_ALL alone never recovered it before).
+        reap_ghost_monitors();
         Ok((
             // SAFETY: `device` is the valid handle from `open_device`, still owned here and NOT closed
             // on this success path (the error paths above close it and return). `from_raw_handle`'s
@@ -208,6 +273,7 @@ impl VdisplayDriver for PfVdisplayDriver {
         dev: HANDLE,
         mode: Mode,
         render_luid: Option<LUID>,
+        preferred_monitor_id: u32,
     ) -> Result<AddedMonitor> {
         let session_id = next_session_id();
         let add = control::AddRequest {
@@ -215,7 +281,7 @@ impl VdisplayDriver for PfVdisplayDriver {
             width: mode.width,
             height: mode.height,
             refresh_hz: mode.refresh_hz,
-            _reserved: 0,
+            preferred_monitor_id,
         };
         // SET_RENDER_ADAPTER (opt-in; pf-vdisplay IMPLEMENTS it). Non-fatal on failure: the driver reports
         // its real render LUID in the shared header, so the host binds correctly even if this is ignored.
@@ -238,13 +304,47 @@ impl VdisplayDriver for PfVdisplayDriver {
         // borrows the local `AddRequest` (alive across this synchronous call) as the input bytes, and
         // `out` is a stack `[u8; size_of::<AddReply>()]` whose length bounds the kernel's write — both
         // buffers outlive the call.
-        unsafe { ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out) }
-            .with_context(|| {
-                format!(
-                    "pf-vdisplay ADD {}x{}@{}",
-                    mode.width, mode.height, mode.refresh_hz
-                )
-            })?;
+        let add_res = unsafe { ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out) };
+        let add_res = match add_res {
+            Err(e) if is_slot_exhaustion_wedge(&e) => {
+                // The IddCx monitor-slot pool is exhausted by accumulated ghost (departed-but-not-present)
+                // virtual-monitor PDOs → ADD failed 0x80070490. Reap the ghosts in-process and retry ONCE
+                // so the wedge SELF-HEALS instead of hard-failing every session until a manual reset/reboot
+                // (the long-standing failure mode). pnputil removal is synchronous; a brief settle lets the
+                // OS recompute the adapter's monitor budget before the retry.
+                let reaped = reap_ghost_monitors();
+                tracing::warn!(
+                    reaped,
+                    "pf-vdisplay ADD wedged (0x80070490 ERROR_NOT_FOUND) — reaped ghost monitor nodes, retrying ADD"
+                );
+                // pnputil removal is durable (the ghosts are gone permanently), but the OS reclaims the
+                // IddCx VidPN-target slots via ASYNC PnP teardown that can lag the synchronous pnputil
+                // return. Retry the ADD a few times (300 ms apart, NO re-reap — the ghosts are already
+                // removed) to ride out that variable reclaim latency rather than guess one magic settle.
+                // ~1.5 s worst case, only on the rare wedge path.
+                let mut res = Err(anyhow::anyhow!("pf-vdisplay ADD retry loop did not run"));
+                for _ in 0..5 {
+                    std::thread::sleep(std::time::Duration::from_millis(300));
+                    // SAFETY: identical to the first IOCTL_ADD above — `dev` is the live control handle
+                    // (`add_monitor`'s contract), and `bytemuck::bytes_of(&add)` + `&mut out` borrow locals
+                    // that outlive this synchronous call.
+                    res = unsafe {
+                        ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out)
+                    };
+                    if res.is_ok() {
+                        break;
+                    }
+                }
+                res
+            }
+            other => other,
+        };
+        add_res.with_context(|| {
+            format!(
+                "pf-vdisplay ADD {}x{}@{}",
+                mode.width, mode.height, mode.refresh_hz
+            )
+        })?;
         // `pod_read_unaligned` (NOT `from_bytes`): `out` is a stack `[u8; N]` with no guaranteed 4-byte
         // alignment, and `from_bytes` PANICS on a mismatch. This copies into an aligned `AddReply`.
         let reply: control::AddReply =
@@ -261,6 +361,25 @@ impl VdisplayDriver for PfVdisplayDriver {
             reply.target_id,
             luid.LowPart
         );
+        // Per-client identity diagnostic: did the driver honor the host's preferred (stable) monitor id?
+        // A pre-Phase-2 driver leaves resolved_monitor_id=0 (it ignored the field); a current driver echoes
+        // the id it actually used. A mismatch means this session fell back to an auto id, so Windows won't
+        // reapply this client's saved per-monitor config (scaling) until it gets its stable id back.
+        if preferred_monitor_id != 0 {
+            if reply.resolved_monitor_id == preferred_monitor_id {
+                tracing::info!(
+                    monitor_id = preferred_monitor_id,
+                    "pf-vdisplay: per-client monitor id honored (stable identity → saved config persists)"
+                );
+            } else {
+                tracing::warn!(
+                    preferred = preferred_monitor_id,
+                    resolved = reply.resolved_monitor_id,
+                    "pf-vdisplay: preferred monitor id NOT honored (live-id collision, or a pre-Phase-2 \
+                     driver) — per-client config persistence degraded to auto identity this session"
+                );
+            }
+        }
         if let Some(pin) = render_luid {
             if luid.LowPart == pin.LowPart && luid.HighPart == pin.HighPart {
                 tracing::info!("pf-vdisplay ADD render adapter matches the pinned GPU (pin took)");
@@ -309,14 +428,19 @@ impl VdisplayDriver for PfVdisplayDriver {
     }
 }
 
-/// The Windows pf-vdisplay virtual-display backend. A marker — the lifecycle lives in the shared
-/// [`VirtualDisplayManager`](super::manager::VirtualDisplayManager).
-pub struct PfVdisplayDisplay;
+/// The Windows pf-vdisplay virtual-display backend. Near-stateless — the lifecycle lives in the shared
+/// [`VirtualDisplayManager`](super::manager::VirtualDisplayManager); it only carries the connecting
+/// client's fingerprint so the manager can assign a STABLE per-client monitor id (config persistence).
+pub struct PfVdisplayDisplay {
+    /// The connecting client's cert fingerprint (`None` = anonymous/GameStream → the manager's auto id).
+    /// Set by [`set_client_identity`](VirtualDisplay::set_client_identity) before `create`.
+    client_fp: Option<[u8; 32]>,
+}
 
 impl PfVdisplayDisplay {
     pub fn new() -> Result<Self> {
         super::manager::init(Box::new(PfVdisplayDriver)).open_backend()?;
-        Ok(Self)
+        Ok(Self { client_fp: None })
     }
 }
 
@@ -325,8 +449,12 @@ impl VirtualDisplay for PfVdisplayDisplay {
         "pf-vdisplay"
     }
 
+    fn set_client_identity(&mut self, fingerprint: Option<[u8; 32]>) {
+        self.client_fp = fingerprint;
+    }
+
     fn create(&mut self, mode: Mode) -> Result<VirtualOutput> {
-        super::manager::vdm().acquire(mode)
+        super::manager::vdm().acquire(mode, self.client_fp)
     }
 }
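
The bounded retry in the ADD self-heal path (sleep, retry, stop on the first success, at most five attempts) can be sketched in isolation. This is a minimal illustration and not code from the commit: `retry_with_settle` is a hypothetical helper name, and it returns `None` when zero attempts are requested rather than seeding the result with a placeholder error as the diff does.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of the commit): run `op` up to `attempts`
/// times, sleeping `delay` BEFORE each attempt so the OS has time to settle.
/// Returns the first `Ok`, otherwise the last `Err`; `None` if `attempts == 0`.
pub fn retry_with_settle<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Option<Result<T, E>> {
    let mut last = None;
    for _ in 0..attempts {
        thread::sleep(delay);
        let res = op();
        let succeeded = res.is_ok();
        last = Some(res);
        if succeeded {
            break; // stop as soon as one attempt works
        }
    }
    last
}
```

With `attempts = 5` and `delay = Duration::from_millis(300)` this matches the roughly 1.5 s worst case stated in the diff's comment. The closure would wrap the `IOCTL_ADD` call; the reap step stays outside the helper because the diff deliberately does not re-reap between retries.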
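
The preferred/resolved echo check added after the ADD reply reduces to a three-way classification, sketched below as a pure function so it can be tested without a driver. The `IdOutcome` enum and `classify_monitor_id` are hypothetical names introduced for illustration; the only facts taken from the diff are that a preferred id of `0` means no preference was requested and that a pre-Phase-2 driver leaves the resolved field at `0`.

```rust
/// Hypothetical classification of the host's preferred id against the id the
/// driver echoed back (names are illustrative, not from the commit).
#[derive(Debug, PartialEq, Eq)]
pub enum IdOutcome {
    /// The host asked for no specific id (preferred == 0).
    NotRequested,
    /// The driver echoed the preferred id: stable identity for this client.
    Honored(u32),
    /// The driver used another id, or left the field 0 (pre-Phase-2 driver).
    FellBack { preferred: u32, resolved: u32 },
}

pub fn classify_monitor_id(preferred: u32, resolved: u32) -> IdOutcome {
    if preferred == 0 {
        IdOutcome::NotRequested
    } else if resolved == preferred {
        IdOutcome::Honored(preferred)
    } else {
        IdOutcome::FellBack { preferred, resolved }
    }
}
```

In the diff the `Honored` case maps to the `tracing::info!` branch and `FellBack` to the `tracing::warn!` branch. The sketch does not distinguish a live-id collision from an old driver, which the diff's warning message also reports together.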