# pf-vdisplay: fullscreen game breaks video (IDD-push capture) — issue analysis

> **Status: FIXED ✅ (2026-06-25).** Resolved by the **resolution-listening recovery** — see
> [Resolution](#resolution-fixed-2026-06-25) below. The investigation that follows is kept as the record
> of how it was diagnosed. Companion to [`windows-host-rewrite.md`](./windows-host-rewrite.md).

## Resolution (fixed 2026-06-25)

The fix landed as the **recover-or-drop** design (host-only, **no protocol bump**), *not* the composing-capturer mid-session failover originally sketched in [Recommended fix](#recommended-fix-staged):

- **`c87bfe0` — IDD-push *recovers* from a game mode-set (the "resolution-listening" work).** The ring now **tracks the display's actual mode**. At open it is sized to the display's real resolution (new `win_display::active_resolution`, CCD/GDI). Mid-session the 250 ms poll — previously HDR-toggle-only — now also **follows the active resolution**; on *any* descriptor change (size **or** HDR) it recreates the ring at the new mode (`recreate_ring` generalized to a new size), the driver re-attaches via the existing `is_stale()` path, and frames resume at the game's mode. **No freeze, no reconnect.** If a change is genuinely unrecoverable (e.g. an exclusive flip the host can't follow) a `recovering_since` clock fires after 3 s and `try_consume` drops the session cleanly so the client reconnects, instead of freezing forever. A pure idle desktop (no mode change) never triggers it.
- **`f98ab07` — open-time first-frame failover to DDA (GB1 pt 1).** `wait_for_attach` now requires the driver to publish a *first frame* (not just `DRV_STATUS_OPENED`); a display the driver attaches to but whose frames its `publish()` guard rejects now fails `open()` within ~4 s → `capture.rs` falls back to DDA → the game is captured + visible after a reconnect.
  A normal/idle open (frame within ~1 s) is never false-failed, and DDA is itself a working path, so even a false positive degrades gracefully.
- **`789ad49` — driver `publish()` width/height guard + a process-lifetime flushed log appender** (GB3 groundwork): drops a surface whose descriptor no longer matches the host ring (`CopyResource` needs matching dims too, else garbage) and logs the actual descriptor once per mismatch episode, so the swap-chain WORKER-thread lines land (closing the bug-doc **S3** observability gap). Needs a driver rebuild + re-vendor to deploy (separate from the host-only GB1 fix).

**Why this instead of the composing capturer (original Stage 1):** the host reads the display's real resolution straight from Windows (CCD/GDI), so it doesn't need the driver to report it over a new `SharedHeader` field — the original **Stage 2's protocol bump is unnecessary**. In-place recovery keeps the fast IDD-push (zero-copy) path live *through* a game mode-set instead of permanently demoting to DDA; open-time DDA failover (`f98ab07`) covers the "display already in a broken mode at connect" case.

**Deferred (non-blocking):** Stage 3 (trim `default_modes`) — deprioritized (recovery handles mode-sets and trimming risks the live display-activation path); Stage S driver resilience (S1/S2) — gated on the `789ad49` logging once a fresh repro is captured.

Owner-confirmed the resolution-listening recovery fixes the user-visible bug (2026-06-25).

## Context

The all-Rust `pf-vdisplay` IddCx virtual-display driver (STEP 0–8 of the Windows host rewrite, now on `main`, on-glass-validated for plain desktop + HDR streaming) breaks when a **fullscreen game** runs on the stream.

**Reproduction (RTX 4090 box `192.168.1.158`):** launch *Doom: The Dark Ages* while streaming → the desktop image **flashes** (a display mode-set fired), the game is **never visible**, and **disconnect + reconnect yields a black screen with working audio**.
(The box was rebooted afterward, so live logs from the incident are gone.)

**Runtime config in play** (`C:\ProgramData\punktfunk\host.env`):

- `PUNKTFUNK_IDD_PUSH=1` → capture comes from the driver's **shared-memory frame ring**, not DDA/WGC.
- `PUNKTFUNK_10BIT=1` (+ `PUNKTFUNK_HDR_SHADER_P010=1`) → **HDR active**; the ring is FP16.
- `PUNKTFUNK_MONITOR_LINGER_MS=0` → every (re)connect builds a **fresh** monitor + ring.
- `PUNKTFUNK_VDISPLAY=pf`, `PUNKTFUNK_ENCODER=nvenc`, `PUNKTFUNK_SECURE_DDA=1`.

The driver log (`C:\Users\Public\pfvd-driver.log`) at inspection showed **8 fresh `IddCxMonitorCreate`/`Arrival` pairs (ids 1–8), all `0x0`, and ZERO swap-chain-processor lines** — so monitor creation is healthy and the break is entirely **downstream of monitor creation** (swap-chain drain / frame publish / host consume), exactly where a game-induced mode change lands.

## Root cause (one sentence)

The IDD-push ring is created **once** at session start with a **fixed format and fixed size** derived from session-start state, there is **no channel for the driver to report the actual acquired-surface descriptor** back to the host, and there is **no mid-session fallback** — so when a game forces a format and/or resolution change on the virtual display, the driver silently drops every frame, the host never learns it needs to adapt, and the stream goes black and then hard-crashes.

## How the symptom maps to the code

1. Game launches → forces a **mode set** on the virtual display (the "desktop flash"). This changes the OS-composed surface's **DXGI format and/or width/height**, and triggers a swap-chain unassign→reassign in the driver.
2. The driver's `publish()` copies the acquired surface into the host ring **only if formats match exactly** (`desc.Format` u32 compare) — and `CopyResource` *also* silently requires identical dimensions, which is never checked. → **every frame dropped.**
3. The host's only ring-recreate trigger is polling Windows' **HDR-enabled toggle**.
   A game-driven format/size change it can't observe → **host never recreates the ring** → driver re-attaches to the same mismatched ring → keeps dropping.
4. Once `PUNKTFUNK_IDD_PUSH=1`, the ring is the **sole** capture source (no DDA/WGC fallback). `next_frame()` repeats the last good frame, then **`bail!`s after a 20 s deadline → the stream dies.**
5. **Reconnect stays black** because the game is still holding the display in the changed state; the fresh ring is rebuilt at the **session-negotiated** format/size again and re-mismatches. Audio is a fully independent plane, so it survives — matching "black + audio."

---

## Identified issues

### Primary

**P1 — IDD-push ring format is fixed at session start; host can't observe a game-driven format change.**

- Host picks the ring format once: FP16 (`DXGI_FORMAT_R16G16B16A16_FLOAT`) if `advanced_color_enabled(target_id)` else `DXGI_FORMAT_B8G8R8A8_UNORM`. `crates/punktfunk-host/src/capture/idd_push.rs:340-361`
- Driver drops any frame whose `desc.Format` ≠ the ring format, silently. `packaging/windows/drivers/pf-vdisplay/src/frame_transport.rs:281-286`
- Host recreates the ring **only** on a Windows HDR-toggle poll (250 ms), never on a format change it can't see. `idd_push.rs:619-640` (`poll_display_hdr` → `recreate_ring` at `:582-617`).
- Driver re-attaches on a host generation bump (`is_stale`), but nothing bumps it for this case. `frame_transport.rs:259-270`.
- **No `SharedHeader` field carries the driver's actual acquired-surface format** — the driver only writes `driver_status`, `driver_status_detail`, `driver_render_luid_low/high` back.

**P2 — IDD-push ring size is fixed at session start; a resolution change is never detected.**

- `header.width/height` written once at `idd_push.rs:396-397`; ring slots sized once and never resized; consumed frames always report the session size (`idd_push.rs:744-745`).
- `publish()` guards **format only, not width/height** (`frame_transport.rs:284`).
  `CopyResource` requires identical dimensions, so a resolution change → silent no-op/garbage, no error logged.
- Driver never reports the acquired surface's real width/height to the host.

**P3 — No mid-session capture fallback; a 20 s hard crash instead of degrade.**

- `PUNKTFUNK_IDD_PUSH=1` returns the IDD-push capturer early with the keepalive moved into it — **no fall-through**. `crates/punktfunk-host/src/capture.rs:348-356`.
- `next_frame()` waits on the frame-ready event (16 ms), repeats the last frame, and **`bail!`s after a 20 s deadline** → the encode loop tears the session down. `idd_push.rs:819-847`.
- The WGC→DDA fallback that exists (`capture.rs:389-404`) is **open-time only** and on the **non**-IDD-push path; it does not help here.
- The `VirtualOutput` already carries a `WinCaptureTarget { adapter_luid, gdi_name, target_id }` (`vdisplay/pf_vdisplay.rs` `Monitor::target()`), so a DDA/WGC capturer **can** be opened on the same virtual output — the wiring just doesn't exist for IDD-push.

### Secondary (verify during the fix; not the proven primary cause)

**S1 — Driver `run_core` exits permanently on a swap-chain error, with no clear re-arm.**

- On a `ReleaseAndAcquireBuffer2` error (e.g. `DXGI_ERROR_ACCESS_LOST` when a game grabs the display), `run_core` `break`s and returns; the worker exits and deletes the swap-chain object. `packaging/windows/drivers/pf-vdisplay/src/swap_chain_processor.rs:359-362` (+ delete at `:141-143`).
- A mode change drives unassign→assign which **does** respawn a fresh processor (`callbacks.rs:309-318`, `:249-305`), so a clean mode change recovers. **Open question:** whether the OS reliably re-assigns after a bare `ACCESS_LOST` exit (no unassign), or whether the monitor stalls with a dead-but-installed processor. Confirm against the IddCx contract / upstream `virtual-display-rs`. The standard IddCx model expects the OS to re-assign, but this needs proof.
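
To make the open question in S1 concrete, here is a toy state model. It is a sketch only: `ProcessorSlot`, `Acquire`, and `SlotState` are hypothetical names, not the driver's types. It assumes the worker simply stops on a fatal acquire error and that only an unassign→assign cycle installs a fresh worker, which is exactly the assumption that still needs proof against the IddCx contract.

```rust
// Toy model of the "dead-but-installed" concern.
// All names are hypothetical; this is not the driver's code.

/// What one buffer-acquire attempt can yield in this model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Acquire {
    Frame,      // a surface was acquired and published
    Pending,    // nothing new yet; keep waiting
    AccessLost, // fatal for this worker (stands in for DXGI_ERROR_ACCESS_LOST)
}

#[derive(Debug, PartialEq, Eq)]
pub enum SlotState {
    Empty,            // no processor installed
    Live,             // installed and the worker is running
    DeadButInstalled, // installed, but the worker has exited
}

#[derive(Default)]
pub struct ProcessorSlot {
    installed: bool,
    worker_alive: bool,
}

impl ProcessorSlot {
    /// The OS assigns a swap chain: a fresh processor and worker are installed.
    pub fn assign(&mut self) {
        self.installed = true;
        self.worker_alive = true;
    }

    /// The OS unassigns the swap chain: the slot is cleared.
    pub fn unassign(&mut self) {
        self.installed = false;
        self.worker_alive = false;
    }

    /// Drain acquire results until a fatal error; returns frames published.
    pub fn run_core(&mut self, results: &[Acquire]) -> usize {
        if !self.worker_alive {
            return 0; // a dead worker publishes nothing
        }
        let mut published = 0;
        for r in results {
            match r {
                Acquire::Frame => published += 1,
                Acquire::Pending => {}
                Acquire::AccessLost => {
                    // The worker exits here and never re-arms itself.
                    self.worker_alive = false;
                    break;
                }
            }
        }
        published
    }

    pub fn state(&self) -> SlotState {
        match (self.installed, self.worker_alive) {
            (false, _) => SlotState::Empty,
            (true, true) => SlotState::Live,
            (true, false) => SlotState::DeadButInstalled,
        }
    }
}
```

In this model a bare `AccessLost` leaves the slot `DeadButInstalled` until `unassign()` followed by `assign()` runs; whether the OS actually drives that cycle is the part to verify.
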

**S2 — `IddCxSwapChainSetDevice` give-up leaves a dead-but-installed processor.**

- `assign_swap_chain` returns `STATUS_SUCCESS` and installs the processor **before** the worker's `SetDevice` retries run; if all 60 retries (≈3 s) fail during a mode flap, the worker returns and the processor is dead, but the OS believes the swap chain is assigned → potential permanent stall. `swap_chain_processor.rs:191-226`, `callbacks.rs:279-293`.

**S3 — Driver worker-thread diagnostics are not landing (impairs root-causing).**

- `dbglog!` → `log.rs` opens/appends/closes the file per call with **no explicit flush**, and the observed log had only control-plane (IOCTL-thread) lines, no swap-chain-processor lines. `packaging/windows/drivers/pf-vdisplay/src/log.rs:9-22`.
- Whatever the exact reason (write race / token / interleave), the practical effect is that the swap-chain processor's behavior during the break is **invisible**, which is why the cause can't be pinned from logs alone today. **Fix this first** so the next repro is conclusive.

---

## Verified facts that de-risk the fix

- **The encoder already adapts to a mid-session size/format change.** `encode/nvenc.rs:580-618`: `submit` detects `size_changed`/`hdr_changed`/device change per frame, tears down, and re-inits adopting the new frame's geometry + pixel format. So a capturer that changes resolution/format mid-session is handled downstream — **no encoder API change is needed** for either fix direction.
- **The stream loop relays per-frame geometry.** `CapturedFrame` carries `width`/`height`/`format` (`capture.rs:50-57`); the loop reads `pipeline_depth()` live and forwards whatever `try_latest()` returns.
- **WGC and DDA emit the same pixel formats the IDD-push path emits** (`Bgra` / `Rgb10a2`), so a failover capturer feeds the encoder compatible frames.
- **A failover capturer fits the existing `Capturer` trait** (`next_frame` + `try_latest`, `capture.rs:120-155`) — a composing capturer that owns the ring capturer + a lazily-opened WGC/DDA capturer and switches between them is a clean drop-in.

---

## Recommended fix (staged)

> **Superseded — see [Resolution](#resolution-fixed-2026-06-25).** This was the original plan; the bug
> was fixed by the simpler **recover-or-drop** approach (host follows the OS resolution + open-time DDA
> failover), so Stage 1's composing capturer and Stage 2's protocol bump were not needed. Kept for context.

Defense-in-depth. Stages 0–1 are **host-only** (no driver rebuild, no protocol bump) and are the fast, robust, user-visible fix. Stages 2–3 harden the fast path and need the driver re-vendor loop.

- **Stage 0 — Diagnostics first (land before anything else).**
  - `log.rs`: flush after each write (or keep a process-lifetime appender) and confirm worker-thread writes land. (S3)
  - Driver: in `publish()`, log/record the acquired surface's **actual format + width + height** even on the drop path, so a repro shows exactly what changed.
  - Host: replace the silent 20 s wait with a `tracing::warn!` at ~2 s of no fresh frame, including `driver_status`/`driver_status_detail` and the host's expected ring format/size.
  - Goal: the next Doom-launch repro definitively classifies the cause (format mismatch vs size mismatch vs `run_core` exit vs no-reassign).
- **Stage 1 — Mid-session fallback IDD-push → WGC/DDA (robust to ALL failure modes).** (P3)
  - Add a composing `Capturer` that owns the IDD-push capturer and, when it yields no fresh frame for a **short** window (~1.5 s, not 20 s), opens a DDA/WGC capturer on the same `WinCaptureTarget` and serves from it for the rest of the session (optionally probing the ring for recovery). Encoder follows the new format/size automatically (verified above).
  - This alone guarantees the session never goes permanently black again and makes Doom playable via WGC/DDA when the ring path is defeated — independent of the *why*.
  - Touch points: `capture.rs:334-356` (wire the composing capturer behind `PUNKTFUNK_IDD_PUSH`), `idd_push.rs` (expose a "stalled?" signal + shorten the deadline), reuse `dxgi.rs`/`wgc.rs`.
- **Stage 2 — Adaptive ring (makes the fast IDD-push path itself survive a game mode change).** (P1, P2)
  - Driver writes the **actual acquired-surface format + width + height** into new `SharedHeader` fields, in `publish()`, **even when about to drop the frame**.
  - Host watches those fields and, on any change vs the ring's current format/size, **recreates the ring at the new descriptor + bumps `generation`** (generalize `recreate_ring`/`poll_display_hdr` from "HDR toggled" to "descriptor changed"). Driver re-attaches via existing `is_stale()`.
  - Driver `publish()` gains a **width/height guard** alongside the format guard.
  - **Implications:** bump `pf_vdisplay_proto::PROTOCOL_VERSION` (host does a HARD version check in `pf_vdisplay.rs::mgr_ensure_device`), update the `const` size/offset asserts in `crates/pf-vdisplay-proto/src/frame.rs`, and deploy host + driver **in lockstep** (rebuild + re-sign + re-vendor `packaging/windows/pf-vdisplay/{dll,inf,cat}` on the RTX box, WUDFHost reload).
- **Stage 3 — Prevention (frequency reducer, not a standalone fix).** (reduces P1/P2 triggers)
  - Trim `monitor.rs::default_modes()` so the IDD advertises essentially only the negotiated mode, so a game can't pick a different fullscreen resolution. Verify it doesn't break mid-stream `Reconfigure`. Optionally re-assert the active mode after a detected mode change.
- **Stage S — Driver resilience (address S1/S2 once Stage 0 reveals if they fire).**
  - If logs show a permanent stall after `ACCESS_LOST`/SetDevice-give-up, add a re-arm path (e.g.
    delete the swap chain so the OS re-assigns, or signal `assign_swap_chain` to retry) and avoid installing a processor that has already failed `SetDevice`.

## Validation plan (RTX box `ssh "Enrico Bühler@192.168.1.158"`)

1. Deploy the Stage-0 host (+ driver if rebuilt); `punktfunk-host service stop/start`.
2. Connect a client, confirm normal stream. `type C:\Users\Public\pfvd-driver.log` to baseline.
3. Launch *Doom: The Dark Ages* (or any fullscreen/HDR game). Capture: driver log + host service log (find where the in-session `serve` logs land; `RUST_LOG=info`).
4. Read which mechanism fired (format/size/exit/no-reassign) from the Stage-0 diagnostics.
5. **Success:** game is visible, the stream survives the mode-set flash, no 20 s crash, reconnect restores video. With Stage 1: the failover to WGC/DDA is logged and frames keep flowing. With Stage 2: the ring recreates at the new descriptor and the fast path resumes.

## File map

| Area | Path |
|---|---|
| Host ring consumer | `crates/punktfunk-host/src/capture/idd_push.rs` |
| Capture selection / trait | `crates/punktfunk-host/src/capture.rs` |
| NVENC re-init (no change needed) | `crates/punktfunk-host/src/encode/nvenc.rs:564-618` |
| DDA / WGC capturers (failover targets) | `crates/punktfunk-host/src/capture/{dxgi,wgc}.rs` |
| Host monitor lifecycle / capture target | `crates/punktfunk-host/src/vdisplay/pf_vdisplay.rs` |
| Shared contract (Stage 2 fields + version) | `crates/pf-vdisplay-proto/src/{lib,frame}.rs` |
| Driver frame publisher (guards + reporting) | `packaging/windows/drivers/pf-vdisplay/src/frame_transport.rs` |
| Driver swap-chain lifecycle (S1/S2) | `packaging/windows/drivers/pf-vdisplay/src/swap_chain_processor.rs`, `callbacks.rs` |
| Driver logging (S3) | `packaging/windows/drivers/pf-vdisplay/src/log.rs` |
| Advertised modes (Stage 3) | `packaging/windows/drivers/pf-vdisplay/src/monitor.rs` (`default_modes`) |
| Vendored signed driver (Stage 2 re-vendor) | `packaging/windows/pf-vdisplay/{pf_vdisplay.dll,.inf,.cat}` |

## Notes / caveats

- Doc lag (unrelated to the fix, worth flagging): `stage-pf-vdisplay.ps1` / packaging comments still reference the OLD `packaging/windows/vdisplay-driver/` tree; the active driver source is the NEW `packaging/windows/drivers/pf-vdisplay/` tree (re-vendored in commit `a11b0dd`).
- The exact trigger (format vs resolution vs exclusive-flip vs processor-death) is **not yet proven from logs** — Stage 0 exists to pin it. Stage 1 fixes the user-visible symptom regardless.
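
The recover-or-drop behavior summarized under [Resolution](#resolution-fixed-2026-06-25) can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the host's code: `Descriptor`, `RingWatch`, `Action`, and `poll` are hypothetical names, and starting the clock at the first observed descriptor change is an assumption. Only the 3 s window and the idea of a `recovering_since` clock come from the text.

```rust
use std::time::{Duration, Instant};

/// The display mode the ring must match (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Descriptor {
    pub width: u32,
    pub height: u32,
    pub hdr: bool,
}

/// What the host should do after one poll tick (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Serve,                    // keep serving (fresh or repeated frame)
    RecreateRing(Descriptor), // display mode changed: rebuild the ring
    Wait,                     // recovering, still inside the window
    DropSession,              // recovery timed out: let the client reconnect
}

pub struct RingWatch {
    ring: Descriptor,
    recovering_since: Option<Instant>,
    recover_window: Duration,
}

impl RingWatch {
    pub fn new(ring: Descriptor) -> Self {
        Self {
            ring,
            recovering_since: None,
            recover_window: Duration::from_secs(3), // 3 s, per the text
        }
    }

    /// `display` is the mode the OS reports now; `fresh_frame` says whether
    /// the driver published a frame since the previous tick.
    pub fn poll(&mut self, now: Instant, display: Descriptor, fresh_frame: bool) -> Action {
        if display != self.ring {
            // Size or HDR changed: follow the display and start the clock
            // (assumption: the clock keeps its earliest start time).
            self.ring = display;
            if self.recovering_since.is_none() {
                self.recovering_since = Some(now);
            }
            return Action::RecreateRing(display);
        }
        match self.recovering_since {
            // No mode change pending: an idle desktop never triggers a drop.
            None => Action::Serve,
            Some(_) if fresh_frame => {
                self.recovering_since = None; // the driver re-attached
                Action::Serve
            }
            Some(t0) if now.duration_since(t0) >= self.recover_window => Action::DropSession,
            Some(_) => Action::Wait,
        }
    }
}
```

Calling `poll` on every 250 ms tick with the OS-reported mode would yield `RecreateRing` once per mode-set, then `Serve` as soon as a frame lands, or `DropSession` if none arrives within the window.
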