punktfunk/design/gamepad-driver-health.md
enricobuehler efb1ba26d7 fix(windows): opt-in pad-driver file logs + size-capped service log rotation
Two disk-write fixes:

- pf-xusb/pf-dualsense no longer write C:\Users\Public\pf*-driver.log
  unconditionally — the file log is now opt-in (debug builds, or the
  PFXUSB_DEBUG_LOG / PFDS_DEBUG_LOG system env var), mirroring the audit-§4.4
  fix pf-vdisplay already got: a release driver never writes the world-writable
  Public file (info-leak/DoS surface), and the per-report OUTPUT/SET_STATE hex
  dumps stop being a sustained per-rumble disk-write path during gameplay.
  OutputDebugStringA stays unconditional; the host's driver-silence WARN and
  the gamepad-driver-health failure-mode table now say the log is opt-in.

- service.log/host.log get one-generation rotation: at each (re)open a file
  over 10 MB is renamed to .old, so a crash-restart loop or a RUST_LOG=debug
  left in host.env can't grow the append-forever logs without bound. Rotation
  runs only before an open (never under a live appender — host.log's handle
  lacks FILE_SHARE_DELETE, so a racing rename harmlessly fails).
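
The rotation rule in the second bullet can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the host's code. The names rotate_before_open and old_path are hypothetical. The sketch assumes the .old file sits next to the original with ".old" appended, and it reads "over 10 MB" as 10 * 1024 * 1024 bytes.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed cap: "over 10 MB" read as 10 MiB.
const ROTATE_LIMIT: u64 = 10 * 1024 * 1024;

/// Hypothetical naming: `service.log` -> `service.log.old`.
fn old_path(path: &Path) -> PathBuf {
    let mut name: OsString = path.as_os_str().to_owned();
    name.push(".old");
    PathBuf::from(name)
}

/// One-generation rotation, meant to run only before the appender opens the
/// file. Returns Ok(true) if the file was moved aside.
fn rotate_before_open(path: &Path, limit: u64) -> io::Result<bool> {
    let len = match fs::metadata(path) {
        Ok(meta) => meta.len(),
        // No log yet: nothing to rotate.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    if len <= limit {
        return Ok(false);
    }
    // fs::rename replaces an existing `.old`, so at most one old generation survives.
    fs::rename(path, old_path(path))?;
    Ok(true)
}

fn main() {
    let log = std::env::temp_dir().join("pf-rotate-demo.log");
    fs::write(&log, b"small log\n").expect("write demo log");
    // A failed rotation is non-fatal: the caller just keeps appending.
    let rotated = rotate_before_open(&log, ROTATE_LIMIT).unwrap_or(false);
    println!("rotated: {rotated}");
}
```

In this sketch a failed rename is deliberately not an error for the caller. On Windows, renaming a file that another process holds open without FILE_SHARE_DELETE fails, and the existing log simply stays in use.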

Windows CI compile/clippy pending (drivers workspace + host are not
Linux-cross-checkable); rides along with the next pad-driver redeploy.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:03:32 +00:00


Windows gamepad-driver health: failure modes and how each is surfaced

Written for the "host doesn't see the client's gamepad" class of bug report (2026-07-02). The Windows virtual pads can fail silently in many ways, because the stack spans the host process, a named shared-memory section, a PnP software devnode, a UMDF driver running in its own WUDFHost.exe, and finally the game's input API (XInput / HID / SDL). Before this work the host logged only its own create calls, so a pad could "exist" with no driver installed and nothing was ever logged. This document enumerates the failure modes and states, for each, how it is now detected and what the log line says. All host lines go to stderr (or to %ProgramData%\punktfunk\logs\host.log when running as the service) and to the in-memory ring served at GET /api/v1/logs, which backs the web console's Logs page.

The health signals

The gamepad drivers have no IOCTL plane (hidclass gates the device stack), so the only cross-process channel is the shared section itself. Two fields were carved out of reserved space. The change is layout-compatible (old drivers simply never write them), and pf-driver-proto pins the offsets:

| field | XusbShm | PadShm | writer | meaning |
|---|---|---|---|---|
| driver_proto | @32 | @144 | driver | GAMEPAD_PROTO_VERSION once attached; 0 = no driver on this section |
| driver_heartbeat | @36 | @148 | driver | XUSB: +1 per serviced XInput IOCTL (game-visible path). DS/DS4: +1 per ~8 ms timer tick (liveness) |
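
A minimal sketch of how such offset pins can look. It assumes both fields are 32-bit counters (consistent with the 4-byte spacing of the offsets) and that the offsets are in bytes. The structs XusbShmHead and PadShmHead and the helper attached_proto are hypothetical stand-ins, not pf-driver-proto's actual definitions. Only the health fields are modeled, with the preceding bytes treated as opaque.

```rust
use std::mem::offset_of;
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for the start of XusbShm: the first 32 bytes are
/// opaque here; only the two health fields are modeled.
#[repr(C)]
struct XusbShmHead {
    _before: [u8; 32],
    driver_proto: AtomicU32,     // @32: GAMEPAD_PROTO_VERSION, 0 = no driver
    driver_heartbeat: AtomicU32, // @36: bumped by the driver
}

/// Same idea for PadShm (DualSense / DS4), whose health fields sit at 144/148.
#[allow(dead_code)]
#[repr(C)]
struct PadShmHead {
    _before: [u8; 144],
    driver_proto: AtomicU32,     // @144
    driver_heartbeat: AtomicU32, // @148
}

// Compile-time pins: moving a field fails the build instead of silently
// desynchronizing host and driver.
const _: () = assert!(offset_of!(XusbShmHead, driver_proto) == 32);
const _: () = assert!(offset_of!(XusbShmHead, driver_heartbeat) == 36);
const _: () = assert!(offset_of!(PadShmHead, driver_proto) == 144);
const _: () = assert!(offset_of!(PadShmHead, driver_heartbeat) == 148);

/// Host-side read: `None` while no driver has attached to the section.
fn attached_proto(proto: &AtomicU32) -> Option<u32> {
    match proto.load(Ordering::Acquire) {
        0 => None,
        v => Some(v),
    }
}

fn main() {
    let head = XusbShmHead {
        _before: [0; 32],
        driver_proto: AtomicU32::new(0),
        driver_heartbeat: AtomicU32::new(0),
    };
    println!("before attach: {:?}", attached_proto(&head.driver_proto));
    // What an attaching driver would do (simulated in-process here).
    head.driver_proto.store(1, Ordering::Release);
    head.driver_heartbeat.fetch_add(1, Ordering::Relaxed);
    println!("after attach: {:?}", attached_proto(&head.driver_proto));
}
```

Because an old driver never touches the reserved bytes, they stay at their zero-initialized value, which is why 0 can double as "no driver on this section".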

On the host side, every pad owns a DriverAttach watcher (inject/windows/gamepad_raii.rs), fed from the existing service() poll. It is a small state machine in which each transition logs exactly once:

  • driver_proto != 0 → INFO gamepad driver attached to the shared section (with late=true if it came after the warning); WARN on a proto/host version mismatch.
  • 3 s without a driver (driver_proto still 0, the attach timeout) → one diagnosis WARN combining: the driver-store check (pnputil /enum-drivers, cached once per process and run only on the failure path), the devnode PnP status (CM_Locate_DevNodeW + CM_Get_DevNode_Status on the instance id captured from the SwDeviceCreate callback, with a plain-language hint per CM problem code), and the path of the driver's own opt-in debug log.
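
The transitions above can be sketched as a small state machine. This is a hypothetical model, not the code in gamepad_raii.rs: the type names are invented, the caller passes the elapsed time instead of the watcher reading a clock (which keeps it testable), and the mismatch WARN is folded into the attach event as a flag.

```rust
use std::time::Duration;

/// Attach timeout from the design: one diagnosis WARN after 3 s without a driver.
const ATTACH_TIMEOUT: Duration = Duration::from_secs(3);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttachState {
    Waiting,  // no driver_proto yet, timeout not reached
    Warned,   // diagnosis WARN already emitted
    Attached, // terminal for this pad
}

#[derive(Debug, PartialEq, Eq)]
enum AttachEvent {
    /// Log the diagnosis WARN (driver store, devnode status, driver log path).
    Diagnose,
    /// Log the attach INFO; `mismatch` additionally triggers the version WARN.
    Attached { late: bool, mismatch: bool },
}

/// Hypothetical model of the per-pad watcher.
struct DriverAttach {
    state: AttachState,
    host_proto: u32,
}

impl DriverAttach {
    fn new(host_proto: u32) -> Self {
        Self { state: AttachState::Waiting, host_proto }
    }

    /// Called on every service() poll with the section's driver_proto and the
    /// time since the pad was created. Each transition yields its event once.
    fn poll(&mut self, driver_proto: u32, since_create: Duration) -> Option<AttachEvent> {
        let current = self.state;
        match current {
            AttachState::Attached => None,
            _ if driver_proto != 0 => {
                self.state = AttachState::Attached;
                Some(AttachEvent::Attached {
                    late: current == AttachState::Warned,
                    mismatch: driver_proto != self.host_proto,
                })
            }
            AttachState::Waiting if since_create >= ATTACH_TIMEOUT => {
                self.state = AttachState::Warned;
                Some(AttachEvent::Diagnose)
            }
            _ => None,
        }
    }
}

fn main() {
    let mut watcher = DriverAttach::new(1);
    // (driver_proto, seconds since the pad was created): the driver shows up late.
    let polls = [(0, 1), (0, 3), (0, 4), (1, 5), (1, 6)];
    for (proto, secs) in polls {
        if let Some(event) = watcher.poll(proto, Duration::from_secs(secs)) {
            println!("t={secs}s: {event:?}");
        }
    }
}
```

With these inputs the sketch prints the Diagnose event at t=3s and then Attached { late: true, mismatch: false } at t=5s. Every other poll returns None, which is what makes "each transition logs exactly once" hold regardless of the poll rate.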

Failure modes

| # | failure | cause examples | detection | surfaced as |
|---|---|---|---|---|
| 1 | Driver package not installed | fresh box, installer's driver install --gamepad skipped/failed, package pruned | attach timeout → pnputil /enum-drivers misses pf_xusb.inf/pf_dualsense.inf | WARN driver package NOT in the driver store — run: punktfunk-host.exe driver install --gamepad |
| 2 | Package present but binding failed | certificate not in Root/TrustedPublisher, Memory Integrity (HVCI) rejects it, stale DriverVer kept the old binary | attach timeout → devnode problem code (28 = drivers not installed, 52 = signature rejected, 31/39 = load failure) | WARN with the CM problem code + hint |
| 3 | Driver bound but crashed / never started | WUDFHost crash, WdfDeviceCreate/queue failure inside the driver | attach timeout → devnode status shows the driver_loaded/started flags; the driver's own log (C:\Users\Public\pf*-driver.log) has the failing WDF call. That log is opt-in like pf-vdisplay's (debug builds, or PFXUSB_DEBUG_LOG/PFDS_DEBUG_LOG set system-wide + device restart) | WARN referencing both |
| 4 | SwDeviceCreate fails outright | not Administrator/SYSTEM, PnP wedged, _ in enumerator (E_INVALIDARG) | existing error path (unchanged) | WARN SwDeviceCreate failed; … devnode unavailable, pad continues on the out-of-band fallback |
| 5 | SwDeviceCreate callback never fires | PnP service hung | was silently mis-read as success (zero-init HRESULT(0) + ignored WaitForSingleObject return). Fixed: the result initializes to E_FAIL and the wait result is checked | ERROR enumeration callback never fired (10s) — PnP may be wedged |
| 6 | Driver attached, then WUDFHost died mid-session | crash, killed | driver_heartbeat freezes (DS/DS4: timer-driven, so a freeze is conclusive; XUSB: only advances while a game polls, so absence is not an error) | not auto-warned yet; the field exists for a future stall check (XUSB semantics make a generic rule false-positive-prone) |
| 7 | Version skew host↔driver | new host + old installed driver (or vice versa) | driver_proto ≠ host's GAMEPAD_PROTO_VERSION; pre-health drivers read as never-attached | WARN driver/host protocol mismatch — update the drivers (mismatch); for pre-health drivers, the mode-1 diagnosis text notes the case |
| 8 | Whole backend latched off | first pad creation failed → broken latch disables pads for the session | existing behaviour, now with remedy text | ERROR …controller input disabled until the next client connect (install/repair: punktfunk-host.exe driver install --gamepad) |
| 9 | Section created but game can't see the pad | XInput slot ordering, HidHide-style HID filters on the game process, RPCS3 pad-handler config, GameInput's instance-path VID/PID parse | not host-detectable: outside our process and the driver's stack. An advancing XUSB driver_heartbeat proves "some XInput client polls us", which brackets the problem to the game's side | diagnosis text points at the driver log; the client-side controller view (Android "Connected controllers") covers the other end of the chain |
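
Mode 5 is a classic completion-wait bug, and the fixed pattern can be sketched portably. In this sketch, std's Mutex and Condvar stand in for the SwDeviceCreate completion callback and WaitForSingleObject; that substitution is an assumption made for illustration, since the real code uses the Win32 calls. Slot, new_slot, complete, wait_for_callback and CreateOutcome are hypothetical names. The sketch shows the two defenses from the table: the result starts at E_FAIL rather than 0 (S_OK), and a timed-out wait is reported as its own outcome instead of being ignored.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

const S_OK: i32 = 0;
/// E_FAIL (0x80004005) as a signed HRESULT.
const E_FAIL: i32 = 0x8000_4005_u32 as i32;

/// (callback fired?, HRESULT it reported). Hypothetical stand-in for the
/// context the creation callback writes into.
type Slot = (Mutex<(bool, i32)>, Condvar);

#[derive(Debug, PartialEq, Eq)]
enum CreateOutcome {
    Created,
    Failed(i32),
    CallbackNeverFired,
}

/// Start pessimistic: E_FAIL, not a zero-initialized HRESULT that reads as S_OK.
fn new_slot() -> Slot {
    (Mutex::new((false, E_FAIL)), Condvar::new())
}

/// What the completion callback does: record the result, then wake the waiter.
fn complete(slot: &Slot, hr: i32) {
    let (lock, cvar) = slot;
    *lock.lock().unwrap() = (true, hr);
    cvar.notify_all();
}

/// The waiter: a timed-out wait is its own outcome, never folded into success.
fn wait_for_callback(slot: &Slot, timeout: Duration) -> CreateOutcome {
    let (lock, cvar) = slot;
    let guard = lock.lock().unwrap();
    let (guard, wait) = cvar
        .wait_timeout_while(guard, timeout, |state| !state.0)
        .unwrap();
    if wait.timed_out() {
        return CreateOutcome::CallbackNeverFired;
    }
    match guard.1 {
        S_OK => CreateOutcome::Created,
        hr => CreateOutcome::Failed(hr),
    }
}

fn main() {
    // Hung PnP: nobody ever calls `complete`.
    let hung = new_slot();
    println!("{:?}", wait_for_callback(&hung, Duration::from_millis(50)));

    // Normal path: the callback fires on another thread.
    let slot = Arc::new(new_slot());
    let cb = Arc::clone(&slot);
    let worker = thread::spawn(move || complete(&cb, S_OK));
    println!("{:?}", wait_for_callback(&slot, Duration::from_secs(10)));
    worker.join().unwrap();
}
```

Either defense alone would have caught the original bug. Keeping both means a future refactor that drops the timeout check still fails closed rather than reporting a devnode that was never created.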

What deliberately did NOT change

  • The broken latch stays one-way per session (retry loops against a missing driver would spam PnP); the log line now says so and gives the remedy.
  • No mgmt-API health endpoint yet — the log ring is the surfacing channel. If the web console ever grows a "gamepad health" card, DriverAttach is the state to expose.
  • The DS/DS4 heartbeat is not yet watched for mid-session stalls (mode 6): worth adding once the XUSB/DS semantics split is encoded (DS freeze = conclusive, XUSB freeze = normal when no game polls).
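
One way the XUSB/DS semantics split could be encoded if the mode-6 stall check is added later. This is a hypothetical sketch, not planned code: HeartbeatWatch, PadKind and the 2 s STALL_AFTER threshold are all assumptions. The caller is assumed to feed the current driver_heartbeat value together with the time elapsed since the previous call.

```rust
use std::time::Duration;

/// Assumed stall threshold (not from the design): ~250 missed DS/DS4 ticks at ~8 ms.
const STALL_AFTER: Duration = Duration::from_secs(2);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PadKind {
    Xusb,
    DualSense,
    DualShock4,
}

impl PadKind {
    /// Whether a frozen driver_heartbeat is conclusive for this pad type.
    fn freeze_is_conclusive(self) -> bool {
        match self {
            // Timer-driven heartbeat: it advances even with no game running.
            PadKind::DualSense | PadKind::DualShock4 => true,
            // Advances only per serviced XInput IOCTL: idle is normal.
            PadKind::Xusb => false,
        }
    }
}

/// Hypothetical mid-session stall watcher over driver_heartbeat.
struct HeartbeatWatch {
    kind: PadKind,
    last: u32,
    unchanged_for: Duration,
    reported: bool,
}

impl HeartbeatWatch {
    fn new(kind: PadKind, initial: u32) -> Self {
        Self { kind, last: initial, unchanged_for: Duration::ZERO, reported: false }
    }

    /// Feed the current counter and the time since the previous call.
    /// Returns true exactly once per stall, and only for conclusive pad kinds.
    fn observe(&mut self, value: u32, elapsed: Duration) -> bool {
        if value != self.last {
            // Any change (including u32 wrap-around) counts as alive; re-arm.
            self.last = value;
            self.unchanged_for = Duration::ZERO;
            self.reported = false;
            return false;
        }
        self.unchanged_for += elapsed;
        if self.kind.freeze_is_conclusive() && !self.reported && self.unchanged_for >= STALL_AFTER {
            self.reported = true;
            return true;
        }
        false
    }
}

fn main() {
    let step = Duration::from_millis(500);
    let mut ds = HeartbeatWatch::new(PadKind::DualSense, 100);
    let mut xusb = HeartbeatWatch::new(PadKind::Xusb, 100);
    for _ in 0..6 {
        // Both counters are frozen at 100.
        if ds.observe(100, step) {
            println!("WARN: DualSense driver heartbeat stalled");
        }
        if xusb.observe(100, step) {
            println!("unreachable: a frozen XUSB heartbeat is not an error");
        }
    }
}
```

Putting the split in a per-kind predicate keeps the stall rule itself generic. An XUSB pad never warns here; a frozen XUSB counter remains a diagnostic hint (mode 9) rather than an alarm.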

Validation status

  • Linux-side: workspace build/tests/clippy green; pf-driver-proto layout asserts pin the new offsets (compile-time).
  • Windows host code + both drivers: compile-checked in CI only (this box cannot cross-build the native deps); not yet on-box validated. On-box test recipe: stop the service, pnputil /delete-driver the gamepad package, connect a client with a pad → expect the mode-1 WARN in the console Logs page within ~3 s of the pad arriving; reinstall drivers → expect the attached (late=true) INFO on the next session.