pf-vdisplay: fullscreen game breaks video (IDD-push capture) — issue analysis
Status: FIXED ✅ (2026-06-25). Resolved by the resolution-listening recovery — see Resolution below. The investigation that follows is kept as the record of how it was diagnosed. Companion to
windows-host-rewrite.md.
Resolution (fixed 2026-06-25)
The fix landed as the recover-or-drop design (host-only, no protocol bump), not the composing-capturer mid-session failover originally sketched in Recommended fix:
- c87bfe0: IDD-push recovers from a game mode-set (the "resolution-listening" work). The ring now tracks the display's actual mode. At open it is sized to the display's real resolution (new win_display::active_resolution, CCD/GDI). Mid-session, the 250 ms poll (previously HDR-toggle-only) now also follows the active resolution; on any descriptor change (size or HDR) it recreates the ring at the new mode (recreate_ring generalized to a new size), the driver re-attaches via the existing is_stale() path, and frames resume at the game's mode. No freeze, no reconnect. If a change is genuinely unrecoverable (e.g. an exclusive flip the host can't follow), a recovering_since clock fires after 3 s and try_consume drops the session cleanly so the client reconnects, instead of freezing forever. A pure idle desktop (no mode change) never triggers it.
- f98ab07: open-time first-frame failover to DDA (GB1 pt 1). wait_for_attach now requires the driver to publish a first frame (not just DRV_STATUS_OPENED); a display the driver attaches to but whose frames its publish() guard rejects now fails open() within ~4 s → capture.rs falls back to DDA → the game is captured + visible after a reconnect. A normal/idle open (frame within ~1 s) is never false-failed, and DDA is itself a working path, so even a false positive degrades gracefully.
- 789ad49: driver publish() width/height guard + a process-lifetime flushed log appender (GB3 groundwork). It drops a surface whose descriptor no longer matches the host ring (CopyResource needs matching dims too, else garbage) and logs the actual descriptor once per mismatch episode, so the swap-chain WORKER-thread lines land (closing the bug-doc S3 observability gap). Needs a driver rebuild + re-vendor to deploy (separate from the host-only GB1 fix).
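The recover-or-drop behaviour described for c87bfe0 can be sketched as a small state machine. This is a minimal illustration, not the code in idd_push.rs: the Descriptor, PollAction and Recovery types and the poll signature are hypothetical, and only the 3 s deadline and the recovering_since name come from the text above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical snapshot of what the OS reports for the display.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Descriptor {
    pub width: u32,
    pub height: u32,
    pub hdr: bool,
}

/// Result of one poll tick (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PollAction {
    /// Nothing changed, or we are still waiting inside the deadline.
    Steady,
    /// Descriptor changed: recreate the ring at this mode.
    Recreate(Descriptor),
    /// No frame arrived within the deadline after a change: drop the session.
    DropSession,
}

pub struct Recovery {
    ring: Descriptor,
    recovering_since: Option<Instant>,
    deadline: Duration,
}

impl Recovery {
    pub fn new(ring: Descriptor) -> Self {
        Self { ring, recovering_since: None, deadline: Duration::from_secs(3) }
    }

    /// `observed` is the mode the OS reports now; `fresh_frame` says whether
    /// the driver published a frame since the previous tick.
    pub fn poll(&mut self, now: Instant, observed: Descriptor, fresh_frame: bool) -> PollAction {
        if observed != self.ring {
            self.ring = observed;
            // Start the clock only once per recovery episode.
            self.recovering_since.get_or_insert(now);
            return PollAction::Recreate(observed);
        }
        match self.recovering_since {
            // A frame arrived at the new mode: recovery succeeded.
            Some(_) if fresh_frame => {
                self.recovering_since = None;
                PollAction::Steady
            }
            // Still no frame after the deadline: give up cleanly.
            Some(t0) if now.duration_since(t0) >= self.deadline => PollAction::DropSession,
            // Idle desktop or still inside the deadline.
            _ => PollAction::Steady,
        }
    }
}
```

Because the clock is armed only on a descriptor change, an idle desktop that publishes no frames for a long time never reaches DropSession in this sketch.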
Why this instead of the composing capturer (original Stage 1): the host reads the display's real
resolution straight from Windows (CCD/GDI), so it doesn't need the driver to report it over a new
SharedHeader field — the original Stage 2's protocol bump is unnecessary. In-place recovery keeps the
fast IDD-push (zero-copy) path live through a game mode-set instead of permanently demoting to DDA;
open-time DDA failover (f98ab07) covers the "display already in a broken mode at connect" case.
Deferred (non-blocking): Stage 3 (trim default_modes) — deprioritized (recovery handles mode-sets and
trimming risks the live display-activation path); Stage S driver resilience (S1/S2) — gated on the
789ad49 logging once a fresh repro is captured. Owner-confirmed the resolution-listening recovery fixes
the user-visible bug (2026-06-25).
Context
The all-Rust pf-vdisplay IddCx virtual-display driver (STEP 0–8 of the Windows host rewrite,
now on main, on-glass-validated for plain desktop + HDR streaming) breaks when a fullscreen
game runs on the stream.
Reproduction (RTX 4090 box 192.168.1.158): launch Doom the Dark Ages while streaming → the
desktop image flashes (a display mode-set fired), the game is never visible, and **disconnect
+ reconnect yields a black screen with working audio**. (The box was rebooted afterward, so live logs from the incident are gone.)
Runtime config in play (C:\ProgramData\punktfunk\host.env):
- PUNKTFUNK_IDD_PUSH=1 → capture comes from the driver's shared-memory frame ring, not DDA/WGC.
- PUNKTFUNK_10BIT=1 (+ PUNKTFUNK_HDR_SHADER_P010=1) → HDR active; the ring is FP16.
- PUNKTFUNK_MONITOR_LINGER_MS=0 → every (re)connect builds a fresh monitor + ring.
- PUNKTFUNK_VDISPLAY=pf, PUNKTFUNK_ENCODER=nvenc, PUNKTFUNK_SECURE_DDA=1.
The driver log (C:\Users\Public\pfvd-driver.log) at inspection showed 8 fresh
IddCxMonitorCreate/Arrival pairs (ids 1–8), all 0x0, and ZERO swap-chain-processor lines —
so monitor creation is healthy and the break is entirely downstream of monitor creation
(swap-chain drain / frame publish / host consume), exactly where a game-induced mode change lands.
Root cause (one sentence)
The IDD-push ring is created once at session start with a fixed format and fixed size derived from session-start state, there is no channel for the driver to report the actual acquired-surface descriptor back to the host, and there is no mid-session fallback — so when a game forces a format and/or resolution change on the virtual display, the driver silently drops every frame, the host never learns it needs to adapt, and the stream goes black and then hard-crashes.
How the symptom maps to the code
- Game launches → forces a mode set on the virtual display (the "desktop flash"). This changes the OS-composed surface's DXGI format and/or width/height, and triggers a swap-chain unassign→reassign in the driver.
- The driver's publish() copies the acquired surface into the host ring only if formats match exactly (desc.Format u32 compare), and CopyResource also silently requires identical dimensions, which is never checked. → every frame dropped.
- The host's only ring-recreate trigger is polling Windows' HDR-enabled toggle. A game-driven format/size change it can't observe → host never recreates the ring → driver re-attaches to the same mismatched ring → keeps dropping.
- Once PUNKTFUNK_IDD_PUSH=1, the ring is the sole capture source (no DDA/WGC fallback). next_frame() repeats the last good frame, then bail!s after a 20 s deadline → the stream dies.
- Reconnect stays black because the game is still holding the display in the changed state; the fresh ring is rebuilt at the session-negotiated format/size again and re-mismatches. Audio is a fully independent plane, so it survives, matching "black + audio."
Identified issues
Primary
P1 — IDD-push ring format is fixed at session start; host can't observe a game-driven format change.
- Host picks the ring format once: FP16 (DXGI_FORMAT_R16G16B16A16_FLOAT) if advanced_color_enabled(target_id) else DXGI_FORMAT_B8G8R8A8_UNORM. crates/punktfunk-host/src/capture/idd_push.rs:340-361
- Driver drops any frame whose desc.Format ≠ the ring format, silently. packaging/windows/drivers/pf-vdisplay/src/frame_transport.rs:281-286
- Host recreates the ring only on a Windows HDR-toggle poll (250 ms), never on a format change it can't see. idd_push.rs:619-640 (poll_display_hdr → recreate_ring at :582-617).
- Driver re-attaches on a host generation bump (is_stale), but nothing bumps it for this case. frame_transport.rs:259-270.
- No SharedHeader field carries the driver's actual acquired-surface format; the driver only writes driver_status, driver_status_detail, driver_render_luid_low/high back.
P2 — IDD-push ring size is fixed at session start; a resolution change is never detected.
- header.width/height written once at idd_push.rs:396-397; ring slots sized once and never resized; consumed frames always report the session size (idd_push.rs:744-745).
- publish() guards format only, not width/height (frame_transport.rs:284). CopyResource requires identical dimensions, so a resolution change → silent no-op/garbage, no error logged.
- Driver never reports the acquired surface's real width/height to the host.
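The gap in P1/P2 is that the publish path compares format only. A guard that compares the whole descriptor and logs once per mismatch episode (the behaviour 789ad49 is described as adding) could look roughly like the sketch below. SurfaceDesc, PublishGuard and admit are hypothetical names, and the Vec<String> stands in for the driver log; only the two DXGI format values are real constants.

```rust
/// Numeric values of the two DXGI formats the ring uses.
pub const DXGI_FORMAT_R16G16B16A16_FLOAT: u32 = 10;
pub const DXGI_FORMAT_B8G8R8A8_UNORM: u32 = 87;

/// Hypothetical descriptor: what the ring was created with, or what the
/// acquired surface actually is.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SurfaceDesc {
    pub format: u32,
    pub width: u32,
    pub height: u32,
}

/// Decides whether an acquired surface may be copied into the ring.
pub struct PublishGuard {
    ring: SurfaceDesc,
    in_mismatch: bool,
    /// Stand-in for the driver log (assumption for this sketch).
    pub log: Vec<String>,
}

impl PublishGuard {
    pub fn new(ring: SurfaceDesc) -> Self {
        Self { ring, in_mismatch: false, log: Vec::new() }
    }

    /// Returns true only when format AND dimensions match the ring.
    /// On a mismatch the actual descriptor is logged once per episode.
    pub fn admit(&mut self, surface: SurfaceDesc) -> bool {
        if surface == self.ring {
            self.in_mismatch = false; // episode over, re-arm the log
            return true;
        }
        if !self.in_mismatch {
            self.in_mismatch = true;
            self.log.push(format!(
                "publish drop: surface {}x{} fmt {} != ring {}x{} fmt {}",
                surface.width, surface.height, surface.format,
                self.ring.width, self.ring.height, self.ring.format
            ));
        }
        false
    }
}
```

Logging once per episode keeps a 60 fps drop storm from flooding the log while still recording exactly which descriptor the game forced.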
P3 — No mid-session capture fallback; a 20 s hard crash instead of degrade.
- PUNKTFUNK_IDD_PUSH=1 returns the IDD-push capturer early with the keepalive moved into it, with no fall-through. crates/punktfunk-host/src/capture.rs:348-356.
- next_frame() waits on the frame-ready event (16 ms), repeats the last frame, and bail!s after a 20 s deadline → the encode loop tears the session down. idd_push.rs:819-847.
- The WGC→DDA fallback that exists (capture.rs:389-404) is open-time only and on the non-IDD-push path; it does not help here.
- The VirtualOutput already carries a WinCaptureTarget { adapter_luid, gdi_name, target_id } (vdisplay/pf_vdisplay.rs Monitor::target()), so a DDA/WGC capturer can be opened on the same virtual output; the wiring just doesn't exist for IDD-push.
Secondary (verify during the fix; not the proven primary cause)
S1 — Driver run_core exits permanently on a swap-chain error, with no clear re-arm.
- On a ReleaseAndAcquireBuffer2 error (e.g. DXGI_ERROR_ACCESS_LOST when a game grabs the display), run_core breaks and returns; the worker exits and deletes the swap-chain object. packaging/windows/drivers/pf-vdisplay/src/swap_chain_processor.rs:359-362 (+ delete at :141-143).
- A mode change drives unassign→assign which does respawn a fresh processor (callbacks.rs:309-318, :249-305), so a clean mode change recovers. Open question: whether the OS reliably re-assigns after a bare ACCESS_LOST exit (no unassign), or whether the monitor stalls with a dead-but-installed processor. Confirm against the IddCx contract / upstream virtual-display-rs. The standard IddCx model expects the OS to re-assign, but this needs proof.
S2 — IddCxSwapChainSetDevice give-up leaves a dead-but-installed processor.
- assign_swap_chain returns STATUS_SUCCESS and installs the processor before the worker's SetDevice retries run; if all 60 retries (≈3 s) fail during a mode flap, the worker returns and the processor is dead, but the OS believes the swap chain is assigned → potential permanent stall. swap_chain_processor.rs:191-226, callbacks.rs:279-293.
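One way to avoid a dead-but-installed processor is to make the retry loop return a result the caller can act on before anything is installed. The sketch below is an illustration under assumptions: retry_set_device is a hypothetical helper, the closure stands in for the real IddCxSwapChainSetDevice call, and the i32 error code is a placeholder for an NTSTATUS/HRESULT. The 50 ms spacing in the usage note is inferred from "60 retries ≈ 3 s", not taken from the driver source.

```rust
use std::thread;
use std::time::Duration;

/// Calls `set_device` up to `attempts` times, sleeping `delay` between
/// failures. Returns the 1-based attempt that succeeded, or the last error
/// code. With `attempts == 0` nothing is called and `Err(0)` is returned.
pub fn retry_set_device<F>(mut set_device: F, attempts: u32, delay: Duration) -> Result<u32, i32>
where
    F: FnMut() -> Result<(), i32>,
{
    let mut last_err = 0;
    for attempt in 1..=attempts {
        match set_device() {
            Ok(()) => return Ok(attempt),
            Err(code) => {
                last_err = code;
                // No point sleeping after the final failed attempt.
                if attempt < attempts {
                    thread::sleep(delay);
                }
            }
        }
    }
    Err(last_err)
}
```

A caller would run something like retry_set_device(call, 60, Duration::from_millis(50)) and install the processor only on Ok, so a give-up surfaces as an error instead of a silently dead worker.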
S3 — Driver worker-thread diagnostics are not landing (impairs root-causing).
- dbglog! → log.rs opens/appends/closes the file per call with no explicit flush, and the observed log had only control-plane (IOCTL-thread) lines, no swap-chain-processor lines. packaging/windows/drivers/pf-vdisplay/src/log.rs:9-22.
- Whatever the exact reason (write race / token / interleave), the practical effect is the swap-chain processor's behavior during the break is invisible, which is why the cause can't be pinned from logs alone today. Fix this first so the next repro is conclusive.
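A process-lifetime appender, as opposed to open/append/close per call, might look like the following sketch. FlushedLog is a hypothetical type, not the contents of log.rs. It keeps one handle open behind a mutex, writes each line with a single write_all so lines from different threads cannot interleave, and flushes before returning.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

/// One long-lived append handle shared by all threads (hypothetical type).
pub struct FlushedLog {
    file: Mutex<File>,
}

impl FlushedLog {
    /// Opens (or creates) the log once; keep the value for the process lifetime.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file: Mutex::new(file) })
    }

    /// Appends one line tagged with the calling thread's id.
    pub fn line(&self, msg: &str) -> io::Result<()> {
        // Build the whole line first so it goes out in one write.
        let text = format!("[{:?}] {}\n", std::thread::current().id(), msg);
        // Keep logging even if another thread panicked while holding the lock.
        let mut f = self.file.lock().unwrap_or_else(|e| e.into_inner());
        f.write_all(text.as_bytes())?;
        // std's File has no userspace buffer, so this is cheap; use
        // sync_data() instead if lines must survive a hard crash.
        f.flush()
    }
}
```

Storing the instance in a static std::sync::OnceLock would give both the IOCTL thread and the swap-chain worker the same handle.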
Verified facts that de-risk the fix
- The encoder already adapts to a mid-session size/format change. encode/nvenc.rs:580-618: submit detects size_changed / hdr_changed / device change per frame, tears down, and re-inits adopting the new frame's geometry + pixel format. So a capturer that changes resolution/format mid-session is handled downstream; no encoder API change is needed for either fix direction.
- The stream loop relays per-frame geometry. CapturedFrame carries width/height/format (capture.rs:50-57); the loop reads pipeline_depth() live and forwards whatever try_latest() returns.
- WGC and DDA emit the same pixel formats the IDD-push path emits (Bgra / Rgb10a2), so a failover capturer feeds the encoder compatible frames.
- A failover capturer fits the existing Capturer trait (next_frame + try_latest, capture.rs:120-155): a composing capturer that owns the ring capturer + a lazily-opened WGC/DDA capturer and switches between them is a clean drop-in.
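To make the "clean drop-in" claim concrete, here is a minimal sketch of a composing capturer. It is not the real Capturer trait from capture.rs: the trait below is a simplified, hypothetical stand-in with a single try_latest method that takes the current time explicitly so the stall logic can be exercised deterministically. Frame, Failover and OpenFallback are likewise illustrative names.

```rust
use std::time::{Duration, Instant};

/// Simplified frame: only the fields needed to show the hand-over.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Frame {
    pub width: u32,
    pub height: u32,
    pub source: &'static str,
}

/// Hypothetical, reduced capturer interface (the real trait differs).
pub trait Capturer {
    /// Returns a frame if a fresh one is available at `now`.
    fn try_latest(&mut self, now: Instant) -> Option<Frame>;
}

/// Opens the fallback capturer lazily, at most once.
pub type OpenFallback = Box<dyn FnOnce() -> Box<dyn Capturer>>;

/// Serves from `primary` until it stalls, then from the fallback for the
/// rest of the session.
pub struct Failover<P: Capturer> {
    primary: P,
    open_fallback: Option<OpenFallback>,
    fallback: Option<Box<dyn Capturer>>,
    last_fresh: Instant,
    stall_after: Duration,
}

impl<P: Capturer> Failover<P> {
    pub fn new(primary: P, open_fallback: OpenFallback, now: Instant, stall_after: Duration) -> Self {
        Self { primary, open_fallback: Some(open_fallback), fallback: None, last_fresh: now, stall_after }
    }
}

impl<P: Capturer> Capturer for Failover<P> {
    fn try_latest(&mut self, now: Instant) -> Option<Frame> {
        // Once failed over, stay on the fallback.
        if let Some(fb) = self.fallback.as_mut() {
            return fb.try_latest(now);
        }
        if let Some(frame) = self.primary.try_latest(now) {
            self.last_fresh = now;
            return Some(frame);
        }
        // Primary has been silent for too long: open the fallback once.
        if now.duration_since(self.last_fresh) >= self.stall_after {
            if let Some(open) = self.open_fallback.take() {
                let mut fb = open();
                let frame = fb.try_latest(now);
                self.fallback = Some(fb);
                return frame;
            }
        }
        None
    }
}
```

Since the encoder already re-initialises on a per-frame geometry change, the frames coming out of the fallback may differ in size from the ring's without further plumbing in this model.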
Recommended fix (staged)
Superseded — see Resolution. This was the original plan; the bug was fixed by the simpler recover-or-drop approach (host follows the OS resolution + open-time DDA failover), so Stage 1's composing capturer and Stage 2's protocol bump were not needed. Kept for context.
Defense-in-depth. Stages 0–1 are host-only (no driver rebuild, no protocol bump) and are the fast, robust, user-visible fix. Stages 2–3 harden the fast path and need the driver re-vendor loop.
- Stage 0: Diagnostics first (land before anything else).
  - log.rs: flush after each write (or keep a process-lifetime appender) and confirm worker-thread writes land. (S3)
  - Driver: in publish(), log/record the acquired surface's actual format + width + height even on the drop path, so a repro shows exactly what changed.
  - Host: replace the silent 20 s wait with a tracing::warn! at ~2 s of no fresh frame, including driver_status / driver_status_detail and the host's expected ring format/size.
  - Goal: the next Doom-launch repro definitively classifies the cause (format mismatch vs size mismatch vs run_core exit vs no-reassign).
- Stage 1: Mid-session fallback IDD-push → WGC/DDA (robust to ALL failure modes). (P3)
  - Add a composing Capturer that owns the IDD-push capturer and, when it yields no fresh frame for a short window (~1.5 s, not 20 s), opens a DDA/WGC capturer on the same WinCaptureTarget and serves from it for the rest of the session (optionally probing the ring for recovery). Encoder follows the new format/size automatically (verified above).
  - This alone guarantees the session never goes permanently black again and makes Doom playable via WGC/DDA when the ring path is defeated, independent of the why.
  - Touch points: capture.rs:334-356 (wire the composing capturer behind PUNKTFUNK_IDD_PUSH), idd_push.rs (expose a "stalled?" signal + shorten the deadline), reuse dxgi.rs / wgc.rs.
- Stage 2: Adaptive ring (makes the fast IDD-push path itself survive a game mode change). (P1, P2)
  - Driver writes the actual acquired-surface format + width + height into new SharedHeader fields, in publish(), even when about to drop the frame.
  - Host watches those fields and, on any change vs the ring's current format/size, recreates the ring at the new descriptor + bumps generation (generalize recreate_ring / poll_display_hdr from "HDR toggled" to "descriptor changed"). Driver re-attaches via existing is_stale().
  - Driver publish() gains a width/height guard alongside the format guard.
  - Implications: bump pf_vdisplay_proto::PROTOCOL_VERSION (host does a HARD version check in pf_vdisplay.rs::mgr_ensure_device), update the const size/offset asserts in crates/pf-vdisplay-proto/src/frame.rs, and deploy host + driver in lockstep (rebuild + re-sign + re-vendor packaging/windows/pf-vdisplay/{dll,inf,cat} on the RTX box, WUDFHost reload).
- Stage 3: Prevention (frequency reducer, not a standalone fix). (reduces P1/P2 triggers)
  - Trim monitor.rs::default_modes() so the IDD advertises essentially only the negotiated mode, so a game can't pick a different fullscreen resolution. Verify it doesn't break mid-stream Reconfigure. Optionally re-assert the active mode after a detected mode change.
- Stage S: Driver resilience (address S1/S2 once Stage 0 reveals if they fire).
  - If logs show a permanent stall after ACCESS_LOST / SetDevice-give-up, add a re-arm path (e.g. delete the swap chain so the OS re-assigns, or signal assign_swap_chain to retry) and avoid installing a processor that has already failed SetDevice.
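For the record, the protocol side of the superseded Stage 2 would have amounted to something like the sketch below. Every name and the layout are hypothetical: the real SharedHeader in pf-vdisplay-proto has other fields (driver_status, driver_render_luid_low/high, and so on) that are omitted here, and the version number is made up. The point is only to show the three pieces the stage lists: new driver-written fields, a compile-time size assert, and a hard version check.

```rust
/// Hypothetical bumped protocol version.
pub const PROTOCOL_VERSION: u32 = 2;

/// Hypothetical, reduced header layout shared between host and driver.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SharedHeaderV2 {
    pub protocol_version: u32,
    pub generation: u32,
    // Written by the host when it creates the ring.
    pub width: u32,
    pub height: u32,
    pub format: u32,
    // New in this sketch: written by the driver on every publish attempt,
    // even when the frame is about to be dropped. Zero means "not yet reported".
    pub acquired_format: u32,
    pub acquired_width: u32,
    pub acquired_height: u32,
}

// Compile-time layout check: eight u32 fields, no padding.
const _: () = assert!(std::mem::size_of::<SharedHeaderV2>() == 32);

/// Hard version check: host and driver must agree exactly.
pub fn check_version(driver_version: u32) -> Result<(), String> {
    if driver_version == PROTOCOL_VERSION {
        Ok(())
    } else {
        Err(format!("protocol mismatch: host {PROTOCOL_VERSION}, driver {driver_version}"))
    }
}

/// True when the driver has reported a surface that differs from the ring.
pub fn descriptor_changed(h: &SharedHeaderV2) -> bool {
    h.acquired_width != 0
        && (h.acquired_format, h.acquired_width, h.acquired_height)
            != (h.format, h.width, h.height)
}
```

The lockstep-deploy cost implied by the hard version check is the main reason the resolution section prefers reading the mode from Windows on the host side.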
Validation plan (RTX box ssh "Enrico Bühler@192.168.1.158")
- Deploy the Stage-0 host (+ driver if rebuilt); punktfunk-host service stop/start.
- Connect a client, confirm normal stream. type C:\Users\Public\pfvd-driver.log to baseline.
- Launch Doom the Dark Ages (or any fullscreen/HDR game). Capture: driver log + host service log (find where the in-session serve logs land; RUST_LOG=info).
- Read which mechanism fired (format/size/exit/no-reassign) from the Stage-0 diagnostics.
- Success: game is visible, the stream survives the mode-set flash, no 20 s crash, reconnect restores video. With Stage 1: the failover to WGC/DDA is logged and frames keep flowing. With Stage 2: the ring recreates at the new descriptor and the fast path resumes.
File map
| Area | Path |
|---|---|
| Host ring consumer | crates/punktfunk-host/src/capture/idd_push.rs |
| Capture selection / trait | crates/punktfunk-host/src/capture.rs |
| NVENC re-init (no change needed) | crates/punktfunk-host/src/encode/nvenc.rs:564-618 |
| DDA / WGC capturers (failover targets) | crates/punktfunk-host/src/capture/{dxgi,wgc}.rs |
| Host monitor lifecycle / capture target | crates/punktfunk-host/src/vdisplay/pf_vdisplay.rs |
| Shared contract (Stage 2 fields + version) | crates/pf-vdisplay-proto/src/{lib,frame}.rs |
| Driver frame publisher (guards + reporting) | packaging/windows/drivers/pf-vdisplay/src/frame_transport.rs |
| Driver swap-chain lifecycle (S1/S2) | packaging/windows/drivers/pf-vdisplay/src/swap_chain_processor.rs, callbacks.rs |
| Driver logging (S3) | packaging/windows/drivers/pf-vdisplay/src/log.rs |
| Advertised modes (Stage 3) | packaging/windows/drivers/pf-vdisplay/src/monitor.rs (default_modes) |
| Vendored signed driver (Stage 2 re-vendor) | packaging/windows/pf-vdisplay/{pf_vdisplay.dll,.inf,.cat} |
Notes / caveats
- Doc lag (unrelated to the fix, worth flagging): stage-pf-vdisplay.ps1 / packaging comments still reference the OLD packaging/windows/vdisplay-driver/ tree; the active driver source is the NEW packaging/windows/drivers/pf-vdisplay/ tree (re-vendored in commit a11b0dd).
- The exact trigger (format vs resolution vs exclusive-flip vs processor-death) is not yet proven from logs; Stage 0 exists to pin it. Stage 1 fixes the user-visible symptom regardless.