feat(host/windows,drivers): gamepad driver attach/heartbeat health surfaced in logs
apple / swift (push) Successful in 1m12s
windows-drivers / probe-and-proto (push) Successful in 14s
windows-drivers / driver-build (push) Successful in 1m15s
apple / screenshots (push) Successful in 5m30s
android / android (push) Successful in 3m35s
ci / web (push) Successful in 51s
ci / rust (push) Successful in 1m44s
ci / docs-site (push) Successful in 58s
deb / build-publish (push) Successful in 4m6s
ci / bench (push) Successful in 4m50s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 7s
decky / build-publish (push) Successful in 13s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 8s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 7s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 35s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 51s
windows-host / package (push) Failing after 2m28s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m40s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m40s
docker / deploy-docs (push) Successful in 5s
The gamepad drivers have no IOCTL plane (hidclass gates the stack), so until now the host had ZERO visibility into whether a driver ever bound: a pad could be "created" with no driver installed and nothing was logged.

Two health fields are carved from reserved shm space (layout-compatible; pf-driver-proto pins the offsets): driver_proto (stamped by pf-xusb at device add + per serviced XInput IOCTL, movement = the game-visible path, and by pf-dualsense/DS4 from its ~125Hz timer) and driver_heartbeat.

Host-side, every pad owns a DriverAttach watcher fed from the existing service() poll: INFO on attach (WARN on proto mismatch), and after 3s of silence ONE diagnosis WARN combining a cached pnputil /enum-drivers store check, the devnode's CM problem code (CM_Locate_DevNodeW/CM_Get_DevNode_Status on the instance id now captured from the create callback, with plain-language hints: 28 = not installed, 52 = signature/Memory Integrity, …) and the driver's debug log path.

Also fixes a real bug both SwDeviceCreate wrappers shared: the 10s WaitForSingleObject result was ignored and the callback HRESULT zero-initialised, so a PnP timeout read as SUCCESS (now E_FAIL init + explicit timeout error).

Failure-mode table: design/gamepad-driver-health.md. Linux workspace green; Windows host + drivers CI-compile only, on-box recipe at the bottom of the design doc.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -0,0 +1,65 @@
# Windows gamepad-driver health: failure modes and how each is surfaced

Written for the "host doesn't see the client's gamepad" class of bug report (2026-07-02). The
Windows virtual pads have many silent ways to fail: the stack spans the host process, a named
shared-memory section, a PnP software devnode, a UMDF driver in its own WUDFHost.exe, and finally
the game's input API (XInput / HID / SDL). Before this work the host logged only its *own* create
calls, so a pad could "exist" with no driver installed and nothing was ever logged. This document
enumerates the failure modes and states, for each, how it is now detected and what the log line
says. All host lines land in stderr / `%ProgramData%\punktfunk\logs\host.log` (service) **and** the
in-memory ring served at `GET /api/v1/logs` → the web console **Logs** page.

## The health signals

The gamepad drivers have no IOCTL plane (`hidclass` gates the device stack), so the only
cross-process channel is the shared section itself. Two fields were carved out of reserved space
(layout-compatible; old drivers simply never write them, `pf-driver-proto` pins the offsets):

| field | `XusbShm` offset | `PadShm` offset | writer | meaning |
|---|---|---|---|---|
| `driver_proto` | @32 | @144 | driver | `GAMEPAD_PROTO_VERSION` once attached; `0` = no driver on this section |
| `driver_heartbeat` | @36 | @148 | driver | XUSB: +1 per serviced XInput IOCTL (game-visible path). DS/DS4: +1 per ~8 ms timer tick (liveness) |

On the host side, every pad owns a `DriverAttach` watcher (`inject/windows/gamepad_raii.rs`), fed
from the existing `service()` poll. It is a small state machine; each transition logs exactly once:

- `driver_proto != 0` → INFO `gamepad driver attached to the shared section` (with `late=true` if
  it came after the warning); WARN on a proto/host version mismatch.
- 3 s of silence → one diagnosis WARN combining: **driver-store check** (`pnputil /enum-drivers`,
  cached once per process, only run on the failure path), **devnode PnP status** (`CM_Locate_DevNodeW`
  + `CM_Get_DevNode_Status` on the instance id captured from the SwDeviceCreate callback, with a
  plain-language hint per CM problem code), and the driver's own debug log path.
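
The two transitions above can be sketched as a small poll-driven state machine. This is a minimal illustration, not the code in `gamepad_raii.rs`: the `Event` and `State` enums, the `poll` signature, and the value given to `GAMEPAD_PROTO_VERSION` are assumptions made for the example. Only the 3 s window, the `late` flag, and the log-once-per-transition rule come from the text.

```rust
use std::time::{Duration, Instant};

/// Placeholder value for illustration; the real constant lives in `pf-driver-proto`.
pub const GAMEPAD_PROTO_VERSION: u32 = 1;
/// Silence window before the one-shot diagnosis (3 s, per the text above).
pub const ATTACH_TIMEOUT: Duration = Duration::from_secs(3);

/// What the caller should log. Returned only on a transition, so each fires once.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// `driver_proto` became non-zero. `late` = it happened after the timeout warning.
    Attached { late: bool, proto_mismatch: bool },
    /// No driver wrote the section within `ATTACH_TIMEOUT`: run the diagnosis.
    AttachTimeout,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Waiting,
    Warned,
    Attached,
}

pub struct DriverAttach {
    created_at: Instant,
    state: State,
}

impl DriverAttach {
    pub fn new(now: Instant) -> Self {
        Self { created_at: now, state: State::Waiting }
    }

    /// Feed one sample of `driver_proto` from the poll loop.
    pub fn poll(&mut self, driver_proto: u32, now: Instant) -> Option<Event> {
        match self.state {
            // Terminal for this sketch: nothing more to report.
            State::Attached => None,
            // Attach wins from either earlier state; remember whether we had warned.
            State::Waiting | State::Warned if driver_proto != 0 => {
                let late = self.state == State::Warned;
                self.state = State::Attached;
                Some(Event::Attached {
                    late,
                    proto_mismatch: driver_proto != GAMEPAD_PROTO_VERSION,
                })
            }
            // Still silent after the window: warn exactly once.
            State::Waiting if now.duration_since(self.created_at) >= ATTACH_TIMEOUT => {
                self.state = State::Warned;
                Some(Event::AttachTimeout)
            }
            _ => None,
        }
    }
}
```

In this shape the caller would map `Attached` to the INFO line (plus a WARN when `proto_mismatch` is set) and `AttachTimeout` to the diagnosis WARN, which keeps the expensive checks on the failure path only.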

## Failure modes

| # | failure | cause examples | detection | surfaced as |
|---|---|---|---|---|
| 1 | Driver package not installed | fresh box, installer's `driver install --gamepad` skipped/failed, package pruned | attach timeout → `pnputil /enum-drivers` misses `pf_xusb.inf`/`pf_dualsense.inf` | WARN `driver package NOT in the driver store — run: punktfunk-host.exe driver install --gamepad` |
| 2 | Package present but binding failed | certificate not in Root/TrustedPublisher, Memory Integrity (HVCI) rejects it, stale DriverVer kept the old binary | attach timeout → devnode problem code (28 = drivers not installed, 52 = signature rejected, 31/39 = load failure) | WARN with the CM problem code + hint |
| 3 | Driver bound but crashed / never started | WUDFHost crash, `WdfDeviceCreate`/queue failure inside the driver | attach timeout → devnode status shows `driver_loaded`/`started` flags; the driver's own log (`C:\Users\Public\pf*-driver.log`) has the failing WDF call | WARN referencing both |
| 4 | `SwDeviceCreate` fails outright | not Administrator/SYSTEM, PnP wedged, `_` in enumerator (E_INVALIDARG) | existing error path (unchanged) | WARN `SwDeviceCreate failed; … devnode unavailable`, pad continues on the out-of-band fallback |
| 5 | `SwDeviceCreate` callback never fires | PnP service hung | **was silently mis-read as success** (zero-init `HRESULT(0)` + ignored `WaitForSingleObject` return). Fixed: `result` inits to `E_FAIL`, the wait result is checked | ERROR `enumeration callback never fired (10s) — PnP may be wedged` |
| 6 | Driver attached, then WUDFHost died mid-session | crash, killed | `driver_heartbeat` freezes (DS/DS4: timer-driven, so a freeze is conclusive; XUSB: only advances while a game polls, so absence is *not* an error) | field exists for a future stall check; not auto-warned yet (XUSB semantics make a generic rule false-positive-prone) |
| 7 | Version skew host↔driver | new host + old installed driver (or vice versa) | `driver_proto` ≠ host's `GAMEPAD_PROTO_VERSION`; pre-health drivers read as never-attached | WARN `driver/host protocol mismatch — update the drivers` (mismatch) / the mode-1 diagnosis text notes the pre-health case |
| 8 | Whole backend latched off | first pad creation failed → `broken` latch disables pads for the session | existing behaviour, now with remedy text | ERROR `…controller input disabled until the next client connect (install/repair: punktfunk-host.exe driver install --gamepad)` |
| 9 | Section created but game can't see the pad | XInput slot ordering, HidHide-style HID filters on the game process, RPCS3 pad-handler config, GameInput's instance-path VID/PID parse | **not host-detectable** — outside our process and the driver's stack. XUSB `driver_heartbeat` advancing proves "some XInput client polls us", which brackets the problem to the game's side | diagnosis text points at the driver log; the client-side controller view (Android "Connected controllers") covers the other end of the chain |
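
Mode 5 is the subtle one, so here is a portable sketch of the fixed wait logic. It models the enumeration callback with a channel rather than the real `SwDeviceCreate` event and `WaitForSingleObject`; `wait_for_create`, `CreateError`, and the channel itself are hypothetical stand-ins. What it is meant to show is the shape of the fix: a timeout is reported as its own error, and "no result was ever written" reads as `E_FAIL` rather than as `HRESULT(0)`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Generic failure HRESULT (0x80004005), the pessimistic default.
pub const E_FAIL: i32 = 0x8000_4005_u32 as i32;

#[derive(Debug, PartialEq, Eq)]
pub enum CreateError {
    /// The callback did not fire within the timeout (PnP may be wedged).
    Timeout,
    /// The callback fired, or its context vanished, with a failing HRESULT.
    Failed(i32),
}

/// Wait for the HRESULT the creation callback sends on `rx`.
pub fn wait_for_create(rx: &Receiver<i32>, timeout: Duration) -> Result<(), CreateError> {
    let hr = match rx.recv_timeout(timeout) {
        Ok(hr) => hr,
        // The old code ignored this case and fell through with a zeroed result.
        Err(RecvTimeoutError::Timeout) => return Err(CreateError::Timeout),
        // Sender dropped without reporting: treat it as a failure, never as success.
        Err(RecvTimeoutError::Disconnected) => E_FAIL,
    };
    // Same rule as the SUCCEEDED() macro: non-negative HRESULTs are success.
    if hr >= 0 {
        Ok(())
    } else {
        Err(CreateError::Failed(hr))
    }
}
```

The real wrappers wait 10 s; the timeout is a parameter here only so the behaviour can be exercised quickly.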

## What deliberately did NOT change

- The `broken` latch stays one-way per session (retry loops against a missing driver would spam
  PnP); the log line now says so and gives the remedy.
- No mgmt-API health endpoint yet: the log ring is the surfacing channel. If the web console ever
  grows a "gamepad health" card, `DriverAttach` is the state to expose.
- The DS/DS4 heartbeat is not yet watched for mid-session stalls (mode 6): worth adding once the
  XUSB/DS semantics split is encoded (DS freeze = conclusive, XUSB freeze = normal when no game
  polls).
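
One way the XUSB/DS split from the last bullet could be encoded is sketched below. Nothing like this exists in the host yet; `PadKind`, `HeartbeatWatch`, and the caller-supplied stall window are all hypothetical. The only rule taken from this document is that a frozen heartbeat is conclusive for the timer-driven DS/DS4 drivers and meaningless for XUSB.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PadKind {
    /// Heartbeat advances only while a game issues XInput IOCTLs.
    Xusb,
    /// DS/DS4: heartbeat advances from the driver's own timer.
    TimerDriven,
}

pub struct HeartbeatWatch {
    kind: PadKind,
    last_value: u32,
    last_change: Instant,
}

impl HeartbeatWatch {
    pub fn new(kind: PadKind, initial: u32, now: Instant) -> Self {
        Self { kind, last_value: initial, last_change: now }
    }

    /// Feed one `driver_heartbeat` sample; returns true only for a conclusive stall.
    pub fn is_stalled(&mut self, heartbeat: u32, now: Instant, window: Duration) -> bool {
        if heartbeat != self.last_value {
            // Any movement (including wrap-around) resets the clock.
            self.last_value = heartbeat;
            self.last_change = now;
            return false;
        }
        match self.kind {
            // A timer-driven counter that stops means the driver host is gone.
            PadKind::TimerDriven => now.duration_since(self.last_change) >= window,
            // An idle XUSB counter only means no game is polling right now.
            PadKind::Xusb => false,
        }
    }
}
```

With a ~8 ms tick, a window on the order of a second would leave generous slack for scheduling jitter, but the right value is a tuning decision this sketch does not settle.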

## Validation status

- Linux-side: workspace build/tests/clippy green; `pf-driver-proto` layout asserts pin the new
  offsets (compile-time).
- Windows host code + both drivers: compile-checked in CI only (this box cannot cross-build the
  native deps); **not yet on-box validated**. On-box test recipe: stop the service,
  `pnputil /delete-driver` the gamepad package, connect a client with a pad → expect the mode-1
  WARN in the console Logs page within ~3 s of the pad arriving; reinstall drivers → expect the
  `attached (late=true)` INFO on the next session.
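
For readers unfamiliar with compile-time layout pinning (the first bullet above), the idea looks roughly like this. The struct bodies are invented for the example: the real `XusbShm` and `PadShm` have named fields where this sketch uses an opaque byte prefix, and the use of `AtomicU32` is an assumption. Only the four offsets come from the health-signals table.

```rust
use std::mem::offset_of;
use std::sync::atomic::AtomicU32;

/// Hypothetical stand-in: the pre-existing 32 bytes are opaque in this sketch.
#[repr(C)]
pub struct XusbShm {
    _existing: [u8; 32],
    pub driver_proto: AtomicU32,
    pub driver_heartbeat: AtomicU32,
}

/// Hypothetical stand-in: the pre-existing 144 bytes are opaque in this sketch.
#[repr(C)]
pub struct PadShm {
    _existing: [u8; 144],
    pub driver_proto: AtomicU32,
    pub driver_heartbeat: AtomicU32,
}

// Evaluated at compile time: moving a field breaks the build, not a running session.
const _: () = assert!(offset_of!(XusbShm, driver_proto) == 32);
const _: () = assert!(offset_of!(XusbShm, driver_heartbeat) == 36);
const _: () = assert!(offset_of!(PadShm, driver_proto) == 144);
const _: () = assert!(offset_of!(PadShm, driver_heartbeat) == 148);
```

`offset_of!` needs Rust 1.77 or newer. Because host and drivers share the section across processes, an assert like this is what lets the Linux-side build vouch for a layout it cannot exercise on Windows.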