Windows secure-desktop capture — two-process design
Status: all steps (1–6) implemented and live-validated on the RTX 4090 (2026-06-16). The two-process path works end to end (host as SYSTEM): the user-session WGC helper relays video, the mux switches to the host's DDA on the secure desktop, a dead helper is rebuilt automatically, and the SendInput injector follows desktop switches lazily. Only a real UAC/lock smoke test remains (can't be triggered headless over SSH). The earlier user-mode WGC animation fix still ships; this is the SYSTEM-mode design that adds secure-desktop (UAC/lock/login) coverage, since WGC and the secure desktop need conflicting process tokens.
Implemented so far:
- Step 1: DesktopWatcher (capture/desktop_watch.rs): polls the input-desktop name → atomic Default/Winlogon. Committed 80e222d.
- Step 3: WGC helper subcommand (wgc_helper.rs, m3-host wgc-helper): WGC → NVENC → framed AUs on stdout, with keyframe control on stdin. Committed a0f6cdd.
- Step 4: spawn + relay (capture/wgc_relay.rs, m3::virtual_stream_relay): the SYSTEM host spawns the helper via CreateProcessAsUserW into winsta0\default, relays its stdout AUs to the QUIC send thread, forwards keyframe requests, and surfaces helper stderr in host tracing. Committed 9f50b39.
- Step 5: source mux (m3::virtual_stream_relay): the DesktopWatcher switches the AU source (helper relay on Default, the host's own DDA capturer+encoder on Winlogon); every switch latches "wait for IDR" and forces the now-active source to emit a keyframe.
Live-validated on the RTX 4090 (2026-06-16, host as SYSTEM):
- Step 4: the helper spawns via CreateProcessAsUserW, runs WGC with no hang (HDR FP16 BT.2020 PQ), opens NVENC (D3D11 Main10), and relays AUs; client-rs over the LAN decoded 411 HEVC Main-10 frames. (Bug found and fixed: CreateProcessAsUserW gave the helper the user's env, dropping PUNKTFUNK_ENCODER=nvenc → software-encoder fallback; fixed by merged_env_block.)
- Step 5: with PUNKTFUNK_SECURE_TEST_PERIOD_MS=4000 driving a square-wave toggle, the source mux switched secure(DDA) ↔ normal(WGC relay) cleanly 5× in one session; the client decoded 308 frames continuously across every switch (the wait-for-IDR latch held, so there was no decode break). The real Winlogon DDA capture itself is pre-proven by the single-process secure path (commit f4b4a6c); step 5's new surface is the mux, which the toggle exercises directly.
- Step 6: the helper relaunch watchdog. Force-killing the helper PID mid-stream triggered exactly one "WGC helper exited — rebuilt output + helper fails=1" and the stream recovered: client-rs decoded 645 frames continuously across the kill. A ~30s mux soak (2s toggle) ran 16 switches with 0 rebuilds / 0 early-ends / 465 frames decoded. (Recovery rebuilds the whole output rather than respawning against the same target, which storm-failed with "no DXGI output for target N yet" after an abrupt kill.)
- Step 2: SendInput now uses the retry-on-failure model (inject/sendinput.rs): the thread stays bound to its desktop and only reattaches (OpenInputDesktop/SetThreadDesktop) on a SendInput short write (desktop switched), instead of making two syscalls per event. Validated: client-rs --input-test injected for ~6s with no "blocked desktop" errors (steady-state path); the reattach-on-switch path is the same OpenInputDesktop call the old per-event code used, now lazy.
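The retry-on-failure control flow from Step 2 can be sketched without any Win32 calls. This is a minimal illustration, not the code in inject/sendinput.rs: the InputBackend trait, the InjectError enum, and the SwitchedDesktop test double are hypothetical names, u32 stands in for the Win32 INPUT struct, and retrying only the events that were not injected is an assumption.

```rust
/// Hypothetical seam over the Win32 calls so the control flow can run anywhere.
pub trait InputBackend {
    /// Stand-in for SendInput: returns how many events were injected.
    fn send(&mut self, events: &[u32]) -> usize;
    /// Stand-in for OpenInputDesktop + SetThreadDesktop; false if the rebind failed.
    fn reattach(&mut self) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
pub enum InjectError {
    ReattachFailed,
    BlockedDesktop,
}

pub fn inject<B: InputBackend>(backend: &mut B, events: &[u32]) -> Result<(), InjectError> {
    // Steady state: no desktop syscalls, just the send.
    let sent = backend.send(events);
    if sent >= events.len() {
        return Ok(());
    }
    // Short write: assume the input desktop switched, rebind, and retry once.
    if !backend.reattach() {
        return Err(InjectError::ReattachFailed);
    }
    let rest = &events[sent..]; // assumption: only the uninjected tail is retried
    if backend.send(rest) == rest.len() {
        Ok(())
    } else {
        Err(InjectError::BlockedDesktop)
    }
}

/// Test double: while `stale` is set, sends inject nothing, as if the thread
/// were still bound to a desktop that no longer receives input.
pub struct SwitchedDesktop {
    pub stale: bool,
    pub reattaches: u32,
}

impl InputBackend for SwitchedDesktop {
    fn send(&mut self, events: &[u32]) -> usize {
        if self.stale { 0 } else { events.len() }
    }
    fn reattach(&mut self) -> bool {
        self.stale = false;
        self.reattaches += 1;
        true
    }
}
```

With this shape the steady-state path costs one send call per batch, and the reattach counter in the test double makes it easy to assert that a desktop switch causes exactly one rebind.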
Remaining: a final user-driven smoke test — trigger a real UAC/lock on the box during a session and confirm the dialog appears on the client AND that clicking/typing on it lands (the box's UAC auto-elevates admins, so a real prompt can't be triggered headless over SSH; the mux switch itself is proven by the timed toggle, and DDA-on-Winlogon capture + input by the single-process secure path).
Note: the two-process path requires the host to run as SYSTEM (run.cmd.sysbak → -s -i 1). As SYSTEM, WASAPI loopback audio (session 0) does not capture the user session's audio; this is a known limitation of SYSTEM-mode capture, separate from this work.
The constraint (verified live on the RTX 4090)
- WGC (the composed-desktop capture that fixes frozen HDR animations) will not activate under the SYSTEM account: CreateForMonitor → 0x80070424. Thread-level ImpersonateLoggedOnUser is insufficient (tested: impersonated=true, still 0x80070424). WGC needs the process to run as the interactive user.
- DDA + SendInput on the secure desktop (Winlogon: UAC/lock/login) require LOCAL_SYSTEM (to attach to the Winlogon desktop). This is already shipped (task #17) when the host runs as SYSTEM.
- Therefore one process can't do both. Single-process (the simpler design) is out.
Architecture: SYSTEM host + USER-session WGC helper, AU-relay (no shared GPU texture)
- SYSTEM host (the existing m3-host, launched as SYSTEM in interactive Session 1 via the scheduled task → PsExec -s -i 1): owns the punktfunk/1 QUIC session, the single SudoVDA virtual output (+ isolate/restore RAII; the only topology owner), the DDA capture + NVENC encoder for the secure desktop, the single SendInput injector (serves both desktops), and the AU source mux that feeds the QUIC data plane.
- USER-session WGC helper (a new m3-host subcommand, spawned by the SYSTEM host via WTSQueryUserToken(activeConsoleSessionId) → DuplicateTokenEx(TokenPrimary) → CreateProcessAsUserW(lpDesktop="winsta0\\default", CREATE_NO_WINDOW)): runs the existing WGC → scRGB/PQ → NVENC pipeline and ships Annex-B AUs ({data, pts_ns, keyframe}) to the SYSTEM host over a named pipe. It captures the SAME SudoVDA output by GDI name only; it must NOT create its own virtual output or touch display topology (a second topology owner re-triggers the ACCESS_LOST born-lost storm).
- Mux: the SYSTEM host relays the helper's AUs onto QUIC while the input desktop is Default (normal: WGC, HDR/animation-correct), and switches to its own DDA encoder while it's Winlogon (secure: UAC/lock/login). The client sees one continuous stream; the encoder/FEC/AES-GCM/QUIC send path is untouched (same EncodedFrame flow). NVENC re-inits only on a size/format change across the swap (already handled); same-mode is a pointer re-register.
- Input: stays entirely in the SYSTEM host (only it can attach to Winlogon). One windowless SendInput thread using Sunshine's retry-on-failure-only model (cache HDESK thread-local; SendInput first; only on 0-injected re-OpenInputDesktop + SetThreadDesktop and retry once) serves both desktops with no per-event reattach. (Ctrl+Alt+Del/SAS needs SendSAS, out of scope; clicking UAC Yes/No + typing the login password are plain SendInput on Winlogon.)
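The mux rule described above (one active source, with a wait-for-IDR latch on every switch) can be sketched as a small state machine. This is an illustrative sketch only: Source, SourceMux, set_secure, and admit are hypothetical names rather than the types in m3::virtual_stream_relay, and starting a fresh mux with the latch set is an assumption.

```rust
/// Which producer currently feeds the QUIC send path.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Source {
    HelperRelay, // WGC helper AUs, normal desktop (Default)
    HostDda,     // host's own DDA encoder, secure desktop (Winlogon)
}

pub struct SourceMux {
    active: Source,
    wait_for_idr: bool,
}

impl SourceMux {
    pub fn new(active: Source) -> Self {
        // Assumption: a fresh stream must also start on a keyframe.
        SourceMux { active, wait_for_idr: true }
    }

    /// Apply the watcher's verdict. Returns true when the source changed, in
    /// which case the caller should ask the newly active source for a keyframe.
    pub fn set_secure(&mut self, secure: bool) -> bool {
        let wanted = if secure { Source::HostDda } else { Source::HelperRelay };
        if wanted == self.active {
            return false;
        }
        self.active = wanted;
        self.wait_for_idr = true;
        true
    }

    /// Decide whether an AU produced by `from` may go onto the wire.
    pub fn admit(&mut self, from: Source, keyframe: bool) -> bool {
        if from != self.active {
            return false; // stale AU from the inactive source
        }
        if self.wait_for_idr {
            if !keyframe {
                return false; // hold until the new source delivers an IDR
            }
            self.wait_for_idr = false;
        }
        true
    }
}
```

Dropping delta frames until the first keyframe from the new source is what keeps the client's decoder from ever seeing a frame that references the other encoder's history.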
Rejected: a shared NT-handle GPU texture (MIC/SDDL pain SYSTEM→user, keyed-mutex ring at 240 Hz, nvenc pointer-cache churn — all for a static lock dialog). AU bytes over a pipe are far simpler.
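To make "AU bytes over a pipe" concrete, here is one possible length-prefixed framing for the {data, pts_ns, keyframe} tuple. The wire layout (u32 length, u64 pts_ns, one flag byte, all little-endian), the 64 MiB cap, and the names AccessUnit, write_au, and read_au are assumptions for illustration; the actual framing used by the helper is not specified here.

```rust
use std::io::{self, Read, Write};

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AccessUnit {
    pub data: Vec<u8>,
    pub pts_ns: u64,
    pub keyframe: bool,
}

/// Upper bound on one AU, so a corrupt length cannot trigger a huge allocation.
const MAX_AU_BYTES: usize = 64 * 1024 * 1024;

/// Frame layout (hypothetical): len: u32 LE | pts_ns: u64 LE | flags: u8 | payload.
pub fn write_au(w: &mut impl Write, au: &AccessUnit) -> io::Result<()> {
    if au.data.len() > MAX_AU_BYTES {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "AU too large"));
    }
    w.write_all(&(au.data.len() as u32).to_le_bytes())?;
    w.write_all(&au.pts_ns.to_le_bytes())?;
    w.write_all(&[au.keyframe as u8])?;
    w.write_all(&au.data)
}

/// Returns Ok(None) on a clean end of stream (the writer closed between frames).
pub fn read_au(r: &mut impl Read) -> io::Result<Option<AccessUnit>> {
    let mut len_buf = [0u8; 4];
    // Read the first byte on its own to tell clean EOF from a truncated header.
    if r.read(&mut len_buf[..1])? == 0 {
        return Ok(None);
    }
    r.read_exact(&mut len_buf[1..])?;
    let len = u32::from_le_bytes(len_buf) as usize;
    if len > MAX_AU_BYTES {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "AU length out of range"));
    }
    let mut pts_buf = [0u8; 8];
    r.read_exact(&mut pts_buf)?;
    let mut flags = [0u8; 1];
    r.read_exact(&mut flags)?;
    let mut data = vec![0u8; len];
    r.read_exact(&mut data)?;
    Ok(Some(AccessUnit {
        data,
        pts_ns: u64::from_le_bytes(pts_buf),
        keyframe: flags[0] & 1 == 1,
    }))
}
```

Because both functions only need Read and Write, the same code would work over a named pipe, a child process's stdout, or an in-memory buffer in tests. A truncated frame surfaces as an UnexpectedEof error, which a relay could treat as the helper having died mid-frame.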
Detection
DesktopWatcher: a dedicated thread polling the input-desktop NAME at 30–60 Hz —
OpenInputDesktop(0,FALSE,0) + GetUserObjectInformationW(UOI_NAME) == "Winlogon" (secure) vs
"Default" (normal) → Arc<AtomicU8>. This is the authoritative signal; WTS session notifications
miss UAC entirely. (May also register WTSRegisterSessionNotification to short-circuit lock/unlock.)
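The publish side of the watcher can be sketched independently of the Win32 calls. In this minimal example the name lookup is passed in as a closure standing in for the OpenInputDesktop + GetUserObjectInformationW pair; InputDesktop, poll_once, and the choices to treat any name other than "Winlogon" as normal and to keep the previous state when the query fails are assumptions, not the contents of capture/desktop_watch.rs.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Which desktop currently receives input.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum InputDesktop {
    Default = 0,
    Winlogon = 1,
}

impl InputDesktop {
    /// Assumption: every name other than "Winlogon" counts as the normal desktop.
    pub fn from_name(name: &str) -> Self {
        if name == "Winlogon" {
            InputDesktop::Winlogon
        } else {
            InputDesktop::Default
        }
    }

    /// Read the state published by the watcher thread.
    pub fn load(state: &AtomicU8) -> Self {
        match state.load(Ordering::Acquire) {
            1 => InputDesktop::Winlogon,
            _ => InputDesktop::Default,
        }
    }
}

/// One poll iteration: query the name, publish the state, and report whether
/// it changed. A failed query (None) leaves the previous state in place.
pub fn poll_once(state: &AtomicU8, query_name: impl FnOnce() -> Option<String>) -> bool {
    let name = match query_name() {
        Some(name) => name,
        None => return false,
    };
    let next = InputDesktop::from_name(&name) as u8;
    state.swap(next, Ordering::AcqRel) != next
}
```

A watcher thread would call poll_once in a loop with a sleep of roughly 16 to 33 ms to hit the 30–60 Hz rate, sharing the AtomicU8 through an Arc; the boolean return gives the mux an edge to react to without a second comparison.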
Implementation steps (each independently buildable/testable on the 4090)
- Step 1: DesktopWatcher (capture/desktop_watch.rs, ~40 lines): the poll + atomic. Test: lock / trigger UAC over the existing stream, confirm the atomic flips Default ↔ Winlogon within a poll interval.
- Step 2: SendInput retry-on-failure model (inject/sendinput.rs): replace per-event reattach with the cached-HDESK + retry-once model. Test: normal input unchanged; clicking UAC and typing the lock password still land (this works today via per-event reattach, so the step is a refactor).
- Step 3: WGC helper subcommand (m3-host wgc-helper or similar): the existing WGC pipeline → NVENC → Annex-B AUs over a named-pipe server. Test standalone: as the user it writes a valid .h265 to the pipe (capturing the SudoVDA output by GDI name, no topology changes).
- Step 4: Spawn + relay: the SYSTEM host spawns the helper (CreateProcessAsUserW), connects the pipe, and relays its AUs onto the live QUIC session. Test: normal-desktop stream sourced via the helper relay.
- Step 5: Source mux: relay helper AUs while Default, switch to the host's own DDA encoder while Winlogon (reusing the DesktopWatcher). Test: normal (WGC, HDR) → trigger UAC → stream shows the UAC dialog (DDA) → dismiss → back to WGC; the QUIC session stays up throughout. Full-coverage milestone.
- Step 6: Relaunch watchdog + soak: SERVICE_CONTROL_SESSIONCHANGE-style relaunch of the helper on console connect/disconnect; soak a few hundred lock/unlock+UAC switches (cf. task #17's 1012-switch run) with no leak / black / disconnect.

Cargo features for the fallback: Win32_System_Threading, Win32_System_Pipes, Win32_System_RemoteDesktop.
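Step 4's spawn needs an explicit environment block if host-side variables such as PUNKTFUNK_ENCODER are to reach the helper. The sketch below shows one way to build such a block: overlay selected variables on the user's environment and encode the result as NUL-terminated UTF-16 NAME=VALUE strings ending in an extra NUL, which is the format CreateProcessAsUserW expects when CREATE_UNICODE_ENVIRONMENT is set. The function name merged_env_block_sketch, its signature, and its merge policy are assumptions; the real merged_env_block is not shown in this document.

```rust
use std::collections::BTreeMap;

/// Merge `overrides` over `base` (names compared case-insensitively, as Windows
/// does) and encode the result as a UTF-16 environment block.
pub fn merged_env_block_sketch(base: &[(&str, &str)], overrides: &[(&str, &str)]) -> Vec<u16> {
    // Key by upper-cased name so "Path" and "PATH" collide; later entries win.
    // The BTreeMap also yields entries sorted by that upper-cased name.
    let mut merged: BTreeMap<String, (String, String)> = BTreeMap::new();
    for (name, value) in base.iter().chain(overrides.iter()) {
        merged.insert(name.to_uppercase(), (name.to_string(), value.to_string()));
    }

    let mut block: Vec<u16> = Vec::new();
    for (name, value) in merged.values() {
        block.extend(format!("{}={}", name, value).encode_utf16());
        block.push(0); // terminator for this NAME=VALUE string
    }
    if block.is_empty() {
        block.push(0); // an empty block is still two NULs
    }
    block.push(0); // terminator for the whole block
    block
}
```

A caller would pass the returned buffer's pointer as lpEnvironment and keep the Vec alive until the call returns.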
Risks / notes
- Validate on the real 4090 only (ssh "Enrico Bühler"@192.168.1.174, Session 1 via the Interactive scheduled task); the headless build VM can't reproduce Winlogon-on-virtual-display or WGC.
- The helper MUST capture the SudoVDA by GDI name and never create a second virtual output (avoids the ACCESS_LOST born-lost storm; one isolate owner = the SYSTEM host).
- Confirm reisolate fires on a FRESH mid-session DDA open at the desktop boundary (task #17 only validated DDA recovery within an already-DDA session).
- Brief one-frame repeat/flicker at the WGC↔DDA boundary is acceptable (the local lock/UAC transition flickers too); never starve the encoder (repeat last frame across the swap gap).
- Pragmatic alternative if full coverage isn't worth the build: PromptOnSecureDesktop=0 (UAC renders on the normal desktop → WGC captures it) covers UAC (not lock/login) with one reversible registry change.