feat(windows-host): pf-vdisplay — fix the ADD/REMOVE wedge + per-client display-config persistence

Two phases of pf-vdisplay (IddCx virtual display) lifecycle work, both validated on-glass on the RTX box.

Phase 1 — fix the long-standing IOCTL_ADD 0x80070490 (ERROR_NOT_FOUND) wedge that ghost-monitor
slot-budget exhaustion produced under ADD/REMOVE churn (the reset-script/reboot recurring failure).
Validated: 43 reconnect-churn cycles, 0 wedges, monitor-node count flat at 1.
  * driver: on IddCxMonitorArrival failure, tear the created-but-not-arrived monitor down with
    WdfObjectDelete + reclaim its id — the asymmetric-with-the-create-failure-path leak that exhausted
    the 16-monitor MaxMonitorsSupported budget; recover MONITOR_MODES from lock poisoning instead of
    failing closed (defensive; the driver builds panic=abort).
  * host: collapse the build-retry churn — hold ONE monitor lease across all build attempts and preempt
    only on Lingering (not Active), so a cold start does 1 ADD not 8; reap not-present "punktfunk"
    monitor PDOs on startup (the reset-script step-2 logic, in-process) and self-heal a detected
    0x80070490 by reaping + retrying ADD; force-preempt a stuck-Active prior monitor on the
    begin_idd_setup timeout (the safety net the Lingering-only preempt would otherwise drop).

Phase 2 — give each client (keyed by its cert FINGERPRINT) a STABLE virtual-monitor id (1..=15) so
Windows reapplies that client's saved per-monitor config (DPI SCALING) across reconnects, and two
clients never share/bleed config. Validated: distinct clients -> distinct ids (1, 2); the driver
honors the host's id (echoed resolved == preferred).
  * proto: rename AddRequest._reserved -> preferred_monitor_id (offset 20) and AddReply._reserved ->
    resolved_monitor_id (offset 12) — byte-compatible (offset asserts), NO PROTOCOL_VERSION bump, so a
    pre-Phase-2 driver degrades gracefully to auto-id (the host detects it via the resolved echo).
  * driver: create_monitor honors a host-supplied preferred id via resolve_id (range 1..=15, never
    collides with a live monitor) and seeds the EDID serial + IddCx ConnectorIndex + ContainerId from it.
  * host: a persisted LRU fingerprint->id map (%ProgramData%\punktfunk\pf-vdisplay-identity.json),
    threaded to add_monitor via a set_client_identity no-op trait method (Linux/GameStream unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-29 21:42:59 +02:00
parent 080c55dbf7
commit 0f798d62b6
8 changed files with 553 additions and 83 deletions
+22
View File
@@ -2792,6 +2792,11 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// host-lifetime VirtualDisplayManager (§2.5). It does NO monitor work, so it must precede the IDD-push
// preempt below (which reaches the manager) — otherwise `vdm()` is called before init and panics.
let mut vd = crate::vdisplay::open(compositor)?;
// Per-client STABLE monitor identity (Phase 2): hand the backend the connecting client's cert
// fingerprint so a freshly CREATED virtual monitor gets this client's persistent id — Windows then
// reapplies the client's saved per-monitor config (DPI scaling) on reconnect. No-op on Linux backends
// and for anonymous/GameStream clients (no fingerprint → the driver auto-allocates).
vd.set_client_identity(endpoint::peer_fingerprint(&conn));
// IDD-push reconnect preempt (the dance now lives in the manager, Goal-1 §2.5): serialize setup so a
// reconnect FLOOD can't run concurrent monitor create/teardown, STOP the prior session + WAIT for it
// to release its monitor (instead of tearing a monitor out from under a still-live session), and
@@ -3310,6 +3315,23 @@ fn build_pipeline_with_retry(
// 30-60s to produce its first frame, and a first-connect timeout would tear down the warm
// session (forcing another cold start on reconnect). A genuinely permanent failure still fails
// fast via `is_permanent_build_error`; only transient "no frame yet" retries consume the budget.
// IDD-push only: HOLD one monitor lease across all build attempts. A failed attempt's capturer
// drop releases ITS lease, but this held lease keeps the shared monitor Active (refs >= 1), so the
// next attempt's `vd.create` JOINS it (refcount++) instead of finding it Lingering and tripping the
// IDD-push reconnect PREEMPT (teardown + recreate). That preempt-per-retry was the REMOVE→ADD churn
// that exhausts the IddCx monitor-slot pool and wedges ADD at 0x80070490 — one ADD per cold start
// now, not one per attempt. Non-IDD-push backends (Linux portal, WGC) don't use the refcount manager
// and aren't churn-wedge-prone, so they keep create-per-attempt (a held lease there would allocate a
// second virtual output). Dropped when this fn returns — on success the Pipeline's own lease keeps
// the monitor Active; on failure refs falls to 0 → Lingering → linger-timeout teardown.
let _retry_hold = if matches!(plan.capture, crate::session_plan::CaptureBackend::IddPush) {
Some(
vd.create(mode)
.context("acquire virtual output for the session (retry-hold lease)")?,
)
} else {
None
};
const MAX_ATTEMPTS: u32 = 8;
let mut backoff = std::time::Duration::from_millis(500);
for attempt in 1..=MAX_ATTEMPTS {