feat(host/windows,drivers): gamepad driver attach/heartbeat health surfaced in logs

The gamepad drivers have no IOCTL plane (hidclass gates the stack), so
until now the host had ZERO visibility into whether a driver ever
bound: a pad could be "created" with no driver installed and nothing
was logged. Two health fields are carved from reserved shm space
(layout-compatible; pf-driver-proto pins the offsets): driver_proto —
stamped by pf-xusb at device add + per serviced XInput IOCTL (movement
= the game-visible path) and by pf-dualsense/DS4 from its ~125Hz timer
— and driver_heartbeat. Host-side, every pad owns a DriverAttach
watcher fed from the existing service() poll: INFO on attach (WARN on
proto mismatch), and after 3s of silence ONE diagnosis WARN combining
a cached pnputil /enum-drivers store check, the devnode's CM problem
code (CM_Locate_DevNodeW/CM_Get_DevNode_Status on the instance id now
captured from the create callback, with plain-language hints: 28 = not
installed, 52 = signature/Memory Integrity, …) and the driver's debug
log path. Also fixes a real bug both SwDeviceCreate wrappers shared:
the 10s WaitForSingleObject result was ignored and the callback
HRESULT zero-initialised, so a PnP timeout read as SUCCESS (now E_FAIL
init + explicit timeout error). Failure-mode table:
design/gamepad-driver-health.md.

Linux workspace green; Windows host + drivers CI-compile only, on-box
recipe at the bottom of the design doc.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 16:33:31 +00:00
parent 8af1a15aa6
commit c21549c136
9 changed files with 495 additions and 51 deletions
@@ -0,0 +1,65 @@
# Windows gamepad-driver health: failure modes and how each is surfaced
Written for the "host doesn't see the client's gamepad" class of bug report (2026-07-02). The
Windows virtual pads have many silent ways to fail: the stack spans the host process, a named
shared-memory section, a PnP software devnode, a UMDF driver in its own WUDFHost.exe, and finally
the game's input API (XInput / HID / SDL). Before this work the host logged only its *own* create
calls — a pad could "exist" with no driver installed and nothing was ever logged. This document
enumerates the failure modes and states, for each, how it is now detected and what the log line
says. All host lines land in stderr / `%ProgramData%\punktfunk\logs\host.log` (service) **and** the
in-memory ring served at `GET /api/v1/logs` → the web console **Logs** page.
## The health signals
The gamepad drivers have no IOCTL plane (`hidclass` gates the device stack), so the only
cross-process channel is the shared section itself. Two fields were carved out of reserved space
(layout-compatible: old drivers simply never write them; `pf-driver-proto` pins the offsets):
| field | XusbShm | PadShm | writer | meaning |
|---|---|---|---|---|
| `driver_proto` | @32 | @144 | driver | `GAMEPAD_PROTO_VERSION` once attached; `0` = no driver on this section |
| `driver_heartbeat` | @36 | @148 | driver | XUSB: +1 per serviced XInput IOCTL (game-visible path). DS/DS4: +1 per ~8 ms timer tick (liveness) |
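
A minimal sketch of how offsets like those in the table can be pinned at compile time. Only the field names and the four offsets come from the table; the opaque `_existing` padding, the `u32` field type, and the struct bodies are assumptions for illustration, not the real `pf-driver-proto` definitions.

```rust
use std::mem::offset_of;

/// Hypothetical stand-in for the XUSB section. Everything before the health
/// fields is modelled as opaque bytes, since the real layout is not shown here.
#[repr(C)]
pub struct XusbShm {
    _existing: [u8; 32],       // pre-existing fields (not modelled)
    pub driver_proto: u32,     // @32, assumed 4 bytes wide
    pub driver_heartbeat: u32, // @36
}

/// Hypothetical stand-in for the DS/DS4 section, same idea.
#[repr(C)]
pub struct PadShm {
    _existing: [u8; 144],      // pre-existing fields (not modelled)
    pub driver_proto: u32,     // @144
    pub driver_heartbeat: u32, // @148
}

// Compile-time layout asserts: moving a field breaks the build instead of
// silently desynchronising host and driver.
const _: () = assert!(offset_of!(XusbShm, driver_proto) == 32);
const _: () = assert!(offset_of!(XusbShm, driver_heartbeat) == 36);
const _: () = assert!(offset_of!(PadShm, driver_proto) == 144);
const _: () = assert!(offset_of!(PadShm, driver_heartbeat) == 148);
```

Because the asserts are evaluated in `const` context, they also run on a Linux build, which matches the claim that the offsets are checked without a Windows toolchain.
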
Host side, every pad owns a `DriverAttach` watcher (`inject/windows/gamepad_raii.rs`), fed from the
existing `service()` poll. The state machine logs each transition exactly once:
- `driver_proto != 0` → INFO `gamepad driver attached to the shared section` (with `late=true` if
it came after the warning); WARN on a proto/host version mismatch.
- 3 s of silence → one diagnosis WARN combining: **driver-store check** (`pnputil /enum-drivers`,
cached once per process, only run on the failure path), **devnode PnP status** (`CM_Locate_DevNodeW`
+ `CM_Get_DevNode_Status` on the instance id captured from the SwDeviceCreate callback, with a
plain-language hint per CM problem code), and the driver's own debug log path.
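
The state machine described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in `gamepad_raii.rs`: the `Event` enum, the `poll` signature, the placeholder value of `GAMEPAD_PROTO_VERSION`, and the choice to report a mismatch in place of the attach INFO are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Placeholder value; the real constant lives in `pf-driver-proto`.
pub const GAMEPAD_PROTO_VERSION: u32 = 1;
const ATTACH_TIMEOUT: Duration = Duration::from_secs(3);

/// What the caller should log for a transition (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Attached { late: bool },       // INFO
    ProtoMismatch { driver: u32 }, // WARN
    AttachTimeout,                 // the single diagnosis WARN
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Waiting,
    Warned,
    Attached,
}

pub struct DriverAttach {
    created: Instant,
    state: State,
}

impl DriverAttach {
    pub fn new(now: Instant) -> Self {
        Self { created: now, state: State::Waiting }
    }

    /// Feed the current `driver_proto` value from the service poll.
    /// Returns `Some` only on a transition, so each one is logged once.
    pub fn poll(&mut self, driver_proto: u32, now: Instant) -> Option<Event> {
        match self.state {
            State::Attached => None,
            State::Waiting | State::Warned if driver_proto != 0 => {
                // Attaching after the warning was already emitted is "late".
                let late = self.state == State::Warned;
                self.state = State::Attached;
                Some(if driver_proto == GAMEPAD_PROTO_VERSION {
                    Event::Attached { late }
                } else {
                    Event::ProtoMismatch { driver: driver_proto }
                })
            }
            State::Waiting if now.duration_since(self.created) >= ATTACH_TIMEOUT => {
                self.state = State::Warned;
                Some(Event::AttachTimeout)
            }
            _ => None,
        }
    }
}
```

Passing `now` in explicitly keeps the watcher testable without sleeping. The expensive diagnosis (driver-store check, devnode status) would only run when `AttachTimeout` is returned.
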
## Failure modes
| # | failure | cause examples | detection | surfaced as |
|---|---|---|---|---|
| 1 | Driver package not installed | fresh box, installer's `driver install --gamepad` skipped/failed, package pruned | attach timeout → `pnputil /enum-drivers` misses `pf_xusb.inf`/`pf_dualsense.inf` | WARN `driver package NOT in the driver store — run: punktfunk-host.exe driver install --gamepad` |
| 2 | Package present but binding failed | certificate not in Root/TrustedPublisher, Memory Integrity (HVCI) rejects it, stale DriverVer kept the old binary | attach timeout → devnode problem code (28 = drivers not installed, 52 = signature rejected, 31/39 = load failure) | WARN with the CM problem code + hint |
| 3 | Driver bound but crashed / never started | WUDFHost crash, `WdfDeviceCreate`/queue failure inside the driver | attach timeout → devnode status shows `driver_loaded`/`started` flags; the driver's own log (`C:\Users\Public\pf*-driver.log`) has the failing WDF call | WARN referencing both |
| 4 | `SwDeviceCreate` fails outright | not Administrator/SYSTEM, PnP wedged, `_` in enumerator (E_INVALIDARG) | existing error path (unchanged) | WARN `SwDeviceCreate failed; … devnode unavailable`, pad continues on the out-of-band fallback |
| 5 | `SwDeviceCreate` callback never fires | PnP service hung | **was silently mis-read as success** (zero-init `HRESULT(0)` + ignored `WaitForSingleObject` return). Fixed: `result` inits to `E_FAIL`, the wait result is checked | ERROR `enumeration callback never fired (10s) — PnP may be wedged` |
| 6 | Driver attached, then WUDFHost died mid-session | crash, killed | `driver_heartbeat` freezes (DS/DS4: timer-driven, so a freeze is conclusive; XUSB: only advances while a game polls, so absence is *not* an error) | field exists for a future stall check; not auto-warned yet (XUSB semantics make a generic rule false-positive-prone) |
| 7 | Version skew host↔driver | new host + old installed driver (or vice versa) | `driver_proto` ≠ host's `GAMEPAD_PROTO_VERSION`; pre-health drivers read as never-attached | WARN `driver/host protocol mismatch — update the drivers` (mismatch) / the mode-1 diagnosis text notes the pre-health case |
| 8 | Whole backend latched off | first pad creation failed → `broken` latch disables pads for the session | existing behaviour, now with remedy text | ERROR `…controller input disabled until the next client connect (install/repair: punktfunk-host.exe driver install --gamepad)` |
| 9 | Section created but game can't see the pad | XInput slot ordering, HidHide-style HID filters on the game process, RPCS3 pad-handler config, GameInput's instance-path VID/PID parse | **not host-detectable** — outside our process and the driver's stack. XUSB `driver_heartbeat` advancing proves "some XInput client polls us", which brackets the problem to the game's side | diagnosis text points at the driver log; the client-side controller view (Android "Connected controllers") covers the other end of the chain |
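
The mode-5 fix (pessimistic result initialisation plus a checked wait) can be illustrated with a portable sketch. It does not call the Win32 API: a `Mutex`/`Condvar` pair stands in for the event and `WaitForSingleObject`, and `CreateCtx`, `CreateError`, and `complete` are hypothetical names. The two HRESULT constants carry their real numeric values.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

pub const S_OK: i32 = 0;
pub const E_FAIL: i32 = 0x8000_4005u32 as i32;

#[derive(Debug, PartialEq, Eq)]
pub enum CreateError {
    /// The wait timed out: the callback never ran.
    CallbackNeverFired,
    /// The callback ran and reported a failing HRESULT.
    Failed(i32),
}

/// Shared between the caller and the completion callback (hypothetical type).
pub struct CreateCtx {
    // (callback fired?, HRESULT written by the callback)
    slot: Mutex<(bool, i32)>,
    event: Condvar,
}

impl CreateCtx {
    pub fn new() -> Self {
        // Pessimistic init: an untouched slot reads as failure, never as S_OK (0).
        Self { slot: Mutex::new((false, E_FAIL)), event: Condvar::new() }
    }

    /// What the completion callback would do.
    pub fn complete(&self, hr: i32) {
        *self.slot.lock().unwrap() = (true, hr);
        self.event.notify_all();
    }

    /// Waits for the callback and checks the wait result explicitly.
    pub fn wait(&self, timeout: Duration) -> Result<(), CreateError> {
        let guard = self.slot.lock().unwrap();
        let (guard, wait) = self
            .event
            .wait_timeout_while(guard, timeout, |s| !s.0)
            .unwrap();
        if wait.timed_out() {
            // Previously this case fell through and the zeroed result read as success.
            return Err(CreateError::CallbackNeverFired);
        }
        let hr = guard.1;
        // Same convention as the SUCCEEDED macro: non-negative means success.
        if hr >= 0 { Ok(()) } else { Err(CreateError::Failed(hr)) }
    }
}
```

Either safeguard alone would have caught the original bug; keeping both means a later refactor that drops one still cannot turn a timeout into a success.
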
## What deliberately did NOT change
- The `broken` latch stays one-way per session (retry loops against a missing driver would spam
PnP); the log line now says so and gives the remedy.
- No mgmt-API health endpoint yet — the log ring is the surfacing channel. If the web console ever
grows a "gamepad health" card, `DriverAttach` is the state to expose.
- The DS/DS4 heartbeat is not yet watched for mid-session stalls (mode 6): worth adding once the
XUSB/DS semantics split is encoded (DS freeze = conclusive, XUSB freeze = normal when no game
polls).
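
For reference, one way the XUSB/DS semantics split could be encoded if the stall check is added later. This is a hypothetical sketch, not something this commit implements: `PadKind`, `HeartbeatWatch`, and the 1 s `STALL_AFTER` threshold are invented for illustration.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PadKind {
    /// Heartbeat advances only while a game polls XInput.
    Xusb,
    /// DS/DS4: heartbeat advances from the driver's own timer.
    TimerDriven,
}

/// Hypothetical threshold; far above the ~8 ms tick to avoid false positives.
const STALL_AFTER: Duration = Duration::from_secs(1);

pub struct HeartbeatWatch {
    kind: PadKind,
    last: u32,
    last_change: Instant,
}

impl HeartbeatWatch {
    pub fn new(kind: PadKind, heartbeat: u32, now: Instant) -> Self {
        Self { kind, last: heartbeat, last_change: now }
    }

    /// Returns true only when a frozen heartbeat is conclusive evidence
    /// that the driver stopped running.
    pub fn stalled(&mut self, heartbeat: u32, now: Instant) -> bool {
        if heartbeat != self.last {
            // Any change counts as liveness (a wrapping counter still differs).
            self.last = heartbeat;
            self.last_change = now;
            return false;
        }
        match self.kind {
            // No game polling is a normal idle state, never a stall.
            PadKind::Xusb => false,
            PadKind::TimerDriven => now.duration_since(self.last_change) >= STALL_AFTER,
        }
    }
}
```

Encoding the pad kind in the watcher keeps the false-positive-prone XUSB case out of the generic rule, which is the concern noted for mode 6.
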
## Validation status
- Linux-side: workspace build/tests/clippy green; `pf-driver-proto` layout asserts pin the new
offsets (compile-time).
- Windows host code + both drivers: compile-checked in CI only (this box cannot cross-build the
native deps); **not yet on-box validated**. On-box test recipe: stop the service, `pnputil
/delete-driver` the gamepad package, connect a client with a pad → expect the mode-1 WARN in the
console Logs page within ~3 s of the pad arriving; reinstall drivers → expect the
`attached (late=true)` INFO on the next session.