feat(apple): explicit input-capture state machine — no more cursor grabs on window chrome
ci / rust (push) Has been cancelled

Capture used to engage whenever the app became active, so the click that activates the
window — on the title bar (a drag) or a resize edge — got the cursor warped away
mid-gesture, and raw deltas kept streaming to the host while the user fought the window.
Reworked Moonlight-style, with capture as a deliberate, reversible state owned by
StreamLayerView:

- Engage: automatically once when the stream starts / trust is confirmed (one-shot, can
  never fire surprisingly later), or by clicking into the video (that click's
  press/release are suppressed toward the host; acceptsFirstMouse makes it one click
  from another app). NEVER on app re-activation.
- Release: ⌘⎋ (toggles, key-window-scoped), focus loss — now including same-app window
  switches (⌘, / ⌘N / ⌘M resign key without resigning the app; previously the new
  window inherited a hidden frozen cursor and its typing was double-delivered to the
  host) — and disconnect.
- While released: nothing is forwarded (InputCapture.forwarding gates the GC handlers;
  held keys/buttons are flushed host-side so nothing sticks), the cursor is free, and
  the HUD (now showing the capture state) is clickable.
- The no-beep behavior moved from the NSEvent monitor to first-responder key
  consumption — swallowing at the monitor risked starving GC's own delivery (the
  "input broken altogether" report). The monitor now only intercepts ⌘⎋.
- Adversarial-review fixes: a second session preempts the previous one cleanly instead
  of leaving it captured with dead GC handlers (onPreempted); the engage click's
  suppression latch can't outlive the click (mouseUp backstop); ⌘⎋'s physical Esc can't
  type into the host in either toggle direction (suppressedVK latch + Esc-while-⌘
  guard); capture callbacks defer out of the SwiftUI update pass.

Validated live against the box: 16185 input datagrams injected during a captured
session (gamescope EIS), title-bar drag/resize free while released, and visible
cursor + typing on a streamed KWin desktop, all user-confirmed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
2026-06-10 22:42:44 +02:00
parent acf44eed5f
commit a4eacabecd
5 changed files with 299 additions and 67 deletions
+13 -10
View File
@@ -140,16 +140,19 @@ signing, bundle id `io.unom.punktfunk`. Notes:
the host rebuilds at the new mode in ~90 ms; the first new-mode AU is an IDR with
fresh parameter sets (the refresh-on-IDR decode flow handles it untouched) and
`currentMode()` reflects the switch. Wire it to window-resize events.
8. **Input capture caveats** (stage 1): GC handlers only fire while the app has focus —
on focus loss `InputCapture` auto-releases everything still held (keys + buttons) so
nothing sticks down host-side. While the stream has focus the LOCAL cursor is hidden
and frozen mid-view (`CursorCapture` in StreamView.swift — the host renders its own
cursor; the local one diverges from it and a stray click would focus another app);
Cmd+Tab frees it, ⌘D disconnects. While captured, key NSEvents are swallowed by a
local event monitor (GC reads HID directly; without it every keystroke bubbles up the
responder chain unhandled and NSWindow beeps) except ⌘-combos, which still reach
the local app (⌘D/⌘Q) in addition to the host; a capture toggle is a small
follow-up. One live capture per process (the GC
8. **Input capture** (stage 1): capture is a deliberate, reversible STATE owned by
`StreamLayerView`, Moonlight-style. Engaged when the stream starts / trust is
confirmed and when the user clicks into the video (that click is suppressed toward
the host); released by ⌘⎋ (toggles) or focus loss; NEVER engaged by mere app
activation — activating clicks may be title-bar drags or resizes, which used to get
their cursor warped away mid-drag. While captured: the local cursor is hidden +
frozen mid-view (the host renders its own), all input is forwarded, and the view
consumes key events as first responder so unhandled keyDowns don't beep — ⌘-combos
still work locally (⌘D disconnect, ⌘Q) *and* reach the host via GC. While released:
nothing is forwarded (`InputCapture.forwarding` gates the GC handlers; held
keys/buttons are flushed host-side on release so nothing sticks down), the cursor is
free, and the HUD shows "Click the stream to capture input". GC handlers only fire
while the app has focus, and focus loss also auto-releases everything held. One live capture per process (the GC
mouse/keyboard singletons have a single handler slot — ownership is tracked so a stale
capture's stop() can't clobber a newer one).
9. **iOS**: same package (`BUILD_IOS=1` for the xcframework slice); `StreamView` needs the