feat(windows-host): pf-vdisplay — fix the ADD/REMOVE wedge + per-client display-config persistence

Two phases of pf-vdisplay (IddCx virtual display) lifecycle work, both validated on-glass on the RTX box.

Phase 1: fix the long-standing IOCTL_ADD 0x80070490 (ERROR_NOT_FOUND) wedge that ghost-monitor slot-budget exhaustion produced under ADD/REMOVE churn (the reset-script/reboot recurring failure). Validated: 43 reconnect-churn cycles, 0 wedges, monitor-node count flat at 1.

* driver: on IddCxMonitorArrival failure, tear the created-but-not-arrived monitor down with WdfObjectDelete + reclaim its id. This is the leak, asymmetric with the create-failure path, that exhausted the 16-monitor MaxMonitorsSupported budget. Recover MONITOR_MODES from lock poisoning instead of failing closed (defensive; the driver builds panic=abort).
* host: collapse the build-retry churn: hold ONE monitor lease across all build attempts and preempt only on Lingering (not Active), so a cold start does 1 ADD, not 8; reap not-present "punktfunk" monitor PDOs on startup (the reset-script step-2 logic, in-process) and self-heal a detected 0x80070490 by reaping + retrying ADD; force-preempt a stuck-Active prior monitor on the begin_idd_setup timeout (the safety net the Lingering-only preempt would otherwise drop).

Phase 2: give each client (keyed by its cert FINGERPRINT) a STABLE virtual-monitor id (1..=15) so Windows reapplies that client's saved per-monitor config (DPI SCALING) across reconnects, and two clients never share/bleed config. Validated: distinct clients -> distinct ids (1, 2); the driver honors the host's id (echoed resolved == preferred).

* proto: rename AddRequest._reserved -> preferred_monitor_id (offset 20) and AddReply._reserved -> resolved_monitor_id (offset 12). Byte-compatible (offset asserts), NO PROTOCOL_VERSION bump, so a pre-Phase-2 driver degrades gracefully to auto-id (the host detects it via the resolved echo).
* driver: create_monitor honors a host-supplied preferred id via resolve_id (range 1..=15, never collides with a live monitor) and seeds the EDID serial + IddCx ConnectorIndex + ContainerId from it.
* host: a persisted LRU fingerprint->id map (%ProgramData%\punktfunk\pf-vdisplay-identity.json), threaded to add_monitor via a set_client_identity no-op trait method (Linux/GameStream unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
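
The byte-compatible rename described in the proto bullet can be illustrated with a minimal sketch. Only the field names `preferred_monitor_id` and `resolved_monitor_id` and their offsets (20 and 12) come from the commit message. The opaque `head` byte arrays, the field widths (`u32`), the convention that 0 means "no preference", and the helper `driver_honored_preference` are assumptions made for illustration and are not taken from the real protocol definition.

```rust
use std::mem::offset_of;

/// Hypothetical stand-in for the ADD request. The first 20 bytes are kept
/// opaque because the commit message does not describe them.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct AddRequest {
    pub head: [u8; 20],
    /// Formerly `_reserved`. Assumed convention: 0 = let the driver auto-allocate.
    pub preferred_monitor_id: u32,
}

/// Hypothetical stand-in for the ADD reply, with 12 opaque leading bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct AddReply {
    pub head: [u8; 12],
    /// Formerly `_reserved`. An older driver is assumed to leave this zeroed.
    pub resolved_monitor_id: u32,
}

// Compile-time layout checks: renaming a field must not move it, otherwise
// old and new binaries would disagree about the wire format.
const _: () = assert!(offset_of!(AddRequest, preferred_monitor_id) == 20);
const _: () = assert!(offset_of!(AddReply, resolved_monitor_id) == 12);

/// Returns true when the reply echoes the id the host asked for. A mismatch
/// (for example a zero echo) suggests the driver fell back to auto-id.
pub fn driver_honored_preference(preferred: u32, reply: &AddReply) -> bool {
    preferred != 0 && reply.resolved_monitor_id == preferred
}
```

Because the renamed fields occupy the same bytes as the old reserved words, a build of either side that predates the rename still parses the structures; the echo check is what lets the host notice that its preference was ignored.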
2026-06-29 21:42:59 +02:00
parent 080c55dbf7
commit 0f798d62b6
8 changed files with 553 additions and 83 deletions
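
As a rough illustration of the LRU fingerprint->id map mentioned in the Phase 2 host bullet, the sketch below hands out ids in 1..=15 and recycles the least recently used entry once all ids are taken. It is in-memory only; the JSON persistence, the type name `IdentityMap`, the method `id_for`, the logical clock, and the lowest-free-id allocation order are all assumptions for illustration and do not describe the actual host code. It also ignores whether the evicted client's monitor is still live, which the commit message says the driver's `resolve_id` guards against.

```rust
use std::collections::HashMap;

pub const MIN_ID: u32 = 1;
pub const MAX_ID: u32 = 15;

/// Hypothetical in-memory fingerprint -> monitor-id map with LRU recycling.
#[derive(Default)]
pub struct IdentityMap {
    /// fingerprint -> (assigned id, logical time of last use)
    entries: HashMap<String, (u32, u64)>,
    /// Monotonic counter used instead of wall-clock time.
    clock: u64,
}

impl IdentityMap {
    /// Returns the stable id for `fingerprint`, assigning one if needed.
    pub fn id_for(&mut self, fingerprint: &str) -> u32 {
        self.clock += 1;
        let now = self.clock;

        // Known client: refresh its recency and reuse its id.
        if let Some(entry) = self.entries.get_mut(fingerprint) {
            entry.1 = now;
            return entry.0;
        }

        // New client: take the lowest unused id, or recycle the LRU entry.
        let free = (MIN_ID..=MAX_ID)
            .find(|id| self.entries.values().all(|(used, _)| used != id));
        let id = match free {
            Some(id) => id,
            None => self.evict_lru(),
        };
        self.entries.insert(fingerprint.to_owned(), (id, now));
        id
    }

    /// Removes the least recently used entry and returns its id.
    /// Only called when every id is taken, so the map is non-empty.
    fn evict_lru(&mut self) -> u32 {
        let victim = self
            .entries
            .iter()
            .min_by_key(|(_, (_, last_used))| *last_used)
            .map(|(fp, _)| fp.clone())
            .expect("all ids are in use, so the map is non-empty");
        self.entries.remove(&victim).expect("victim was just found").0
    }
}
```

Keeping the id tied to the fingerprint rather than to connection order is what lets a returning client land on the same monitor identity, which is the property the commit relies on for Windows to reapply saved per-monitor settings.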
@@ -2792,6 +2792,11 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
    // host-lifetime VirtualDisplayManager (§2.5). It does NO monitor work, so it must precede the IDD-push
    // preempt below (which reaches the manager) — otherwise `vdm()` is called before init and panics.
    let mut vd = crate::vdisplay::open(compositor)?;
+    // Per-client STABLE monitor identity (Phase 2): hand the backend the connecting client's cert
+    // fingerprint so a freshly CREATED virtual monitor gets this client's persistent id — Windows then
+    // reapplies the client's saved per-monitor config (DPI scaling) on reconnect. No-op on Linux backends
+    // and for anonymous/GameStream clients (no fingerprint → the driver auto-allocates).
+    vd.set_client_identity(endpoint::peer_fingerprint(&conn));
    // IDD-push reconnect preempt (the dance now lives in the manager, Goal-1 §2.5): serialize setup so a
    // reconnect FLOOD can't run concurrent monitor create/teardown, STOP the prior session + WAIT for it
    // to release its monitor (instead of tearing a monitor out from under a still-live session), and
@@ -3310,6 +3315,23 @@ fn build_pipeline_with_retry(
    // 30-60s to produce its first frame, and a first-connect timeout would tear down the warm
    // session (forcing another cold start on reconnect). A genuinely permanent failure still fails
    // fast via `is_permanent_build_error`; only transient "no frame yet" retries consume the budget.
+    // IDD-push only: HOLD one monitor lease across all build attempts. A failed attempt's capturer
+    // drop releases ITS lease, but this held lease keeps the shared monitor Active (refs >= 1), so the
+    // next attempt's `vd.create` JOINS it (refcount++) instead of finding it Lingering and tripping the
+    // IDD-push reconnect PREEMPT (teardown + recreate). That preempt-per-retry was the REMOVE→ADD churn
+    // that exhausts the IddCx monitor-slot pool and wedges ADD at 0x80070490 — one ADD per cold start
+    // now, not one per attempt. Non-IDD-push backends (Linux portal, WGC) don't use the refcount manager
+    // and aren't churn-wedge-prone, so they keep create-per-attempt (a held lease there would allocate a
+    // second virtual output). Dropped when this fn returns — on success the Pipeline's own lease keeps
+    // the monitor Active; on failure refs falls to 0 → Lingering → linger-timeout teardown.
+    let _retry_hold = if matches!(plan.capture, crate::session_plan::CaptureBackend::IddPush) {
+        Some(
+            vd.create(mode)
+                .context("acquire virtual output for the session (retry-hold lease)")?,
+        )
+    } else {
+        None
+    };
    const MAX_ATTEMPTS: u32 = 8;
    let mut backoff = std::time::Duration::from_millis(500);
    for attempt in 1..=MAX_ATTEMPTS {