Commit Graph

178 Commits

Author SHA1 Message Date
enricobuehler 3e6c9f6060 feat(gamepad): add virtual Xbox One/Series + DualShock 4 pad types
Extends virtual-controller support beyond Xbox 360 + DualSense. Goal: a
physical Xbox One or PS4 pad on the client gets a near-native matching virtual
pad on the host, auto-resolved from the controller type.

Protocol/core:
- GamepadPref gains XboxOne (wire 3) + DualShock4 (wire 4); to_u8/from_u8/
  from_name/as_str + C ABI PUNKTFUNK_GAMEPAD_XBOXONE/_DUALSHOCK4 constants
  (compile-time guard ties them to the enum). Single-byte wire form is
  unchanged, so it's forward-compatible (older peers degrade to Auto).

Host (Linux):
- New UHID DualShock 4 backend (inject/dualshock4.rs) bound by hid-playstation:
  lightbar, touchpad, motion, rumble — DualSense minus adaptive triggers /
  player LEDs / mute. Reuses the DualSense pure state + button mapping; only the
  report byte layout, the real-DS4 HID descriptor, the GET_REPORT handshake
  (0x12 MAC mandatory; 0x02 calibration; 0xa3 firmware) and the touchpad
  resolution (1920x942) differ. Touchpad/motion ride the existing 0xCC plane,
  lightbar the 0xCD Led plane (deduped); rumble the universal 0xCA plane.
- Xbox One/Series is the uinput Xbox-360 backend parameterized with the One S
  USB identity (045e:02ea) for matching glyphs — XInput-identical otherwise.
- PadBackend dispatch + resolver handle both; off Linux the UHID pads and
  One/Series fold into Xbox 360. Windows-host DS4 (ViGEm) deferred.

Clients (auto-resolve physical pad -> virtual type, plus manual settings):
- Linux/Windows (SDL3): SDL_GAMEPAD_TYPE_PS4 -> DualShock 4, _XBOXONE ->
  Xbox One; PadInfo carries the resolved pref; DS4 touchpad/motion capture +
  lightbar already type-agnostic. Linux settings combo + label updated.
- Apple (GameController): GCDualShockGamepad/GCXboxGamepad detection, DS4
  touchpad capture, settings picker entries.
- Android (Kotlin): InputDevice VID/PID auto-detect (matching the other
  clients) + settings entries.
- probe: --gamepad help/aliases.

Also hardens the Android JNI boundary: wrap the teardown + poll-thread shims in
catch_unwind so a panic degrades to a logged no-op instead of aborting the app.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 13:34:44 +00:00
enricobuehler 54b75c9be4 feat(host): GameStream/Moonlight compat is now opt-in (--gamestream) — secure native-only by default
apple / swift (push) Successful in 55s
windows-host / package (push) Successful in 2m31s
android / android (push) Successful in 4m40s
ci / rust (push) Successful in 4m43s
ci / web (push) Successful in 30s
ci / docs-site (push) Successful in 34s
deb / build-publish (push) Successful in 2m9s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 14s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
ci / bench (push) Successful in 4m44s
docker / deploy-docs (push) Successful in 19s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m19s
Follows the security audit (#5/#9): the GameStream-compat plane carries inherent on-path weaknesses
that can't be fixed on the wire without breaking stock Moonlight — its pairing runs over plain HTTP
(#9, MITM-able during the pairing window) and its legacy control encryption can reuse GCM nonces (#5,
a passive eavesdropper can recover/forge input). The native punktfunk/1 plane (SPAKE2 PIN pairing +
per-direction AEAD nonces) has neither. So flip the default to secure-by-default:

- `serve`              → native punktfunk/1 plane + management API ONLY (no GameStream surface).
- `serve --gamestream` → ALSO the GameStream/Moonlight-compat planes (nvhttp pairing, RTSP, ENet
  control, _nvstream mDNS). Opt-in, logged with a trusted-LAN caveat. `--moonlight` is an alias.
- The native plane is now ALWAYS on in `serve` (`--native` is a kept-for-compat no-op); the unified
  GameStream+native host is `serve --gamestream`.

`gamestream::serve` gates the GameStream spawns (nvhttp/rtsp/control/mdns) on the flag; the native
plane + mgmt + native-pairing handle always run.

To avoid silently regressing validated Moonlight deployments, the explicit deployment configs PRESERVE
Moonlight via `--gamestream` (each documents dropping it for a secure native-only host): the Linux
systemd unit, the Steam Deck installer, and the Windows service default (DEFAULT_HOST_CMD). The bare
`serve` default (new/manual use) is secure.

Docs swept to match (host-cli, moonlight, quickstart, install, packaging READMEs, CLAUDE.md, README,
…): Moonlight setup now instructs `--gamestream`; native/console refs use bare `serve`. OpenAPI
regenerated (a stale "run `serve --native`" string). fmt + clippy clean; 94 host tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 10:19:40 +00:00
enricobuehler 3c55ec37fa fix(security): remaining audit findings — mgmt admin gate, RTSP DoS bounds, FEC drop, ALPN, ct-compare
apple / swift (push) Successful in 56s
windows-host / package (push) Successful in 2m25s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m8s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m10s
android / android (push) Successful in 4m42s
ci / rust (push) Successful in 4m44s
ci / web (push) Successful in 30s
ci / docs-site (push) Successful in 35s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 57s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m0s
deb / build-publish (push) Successful in 2m10s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m43s
flatpak / build-publish (push) Successful in 3m59s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m28s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m13s
Addresses the lower-severity findings from docs/security-review.md (#4-#12). Each fix was
adversarially re-reviewed (5-agent pass); two review catches folded in (the Apple client's
GET /library cert path; an RTSP header-cap bypass + a spawn-panic counter leak).

- #4 [low] mgmt mTLS-paired-cert no longer grants full admin. A paired STREAMING cert authorizes
  only a read-only allowlist (GET /host,/compositors,/status,/clients,/native/clients,/library);
  every state-changing route and every PIN-exposing route (/pair, /native/pair) requires the
  operator's bearer token. New cert_auth_is_a_read_only_allowlist test. (/library kept on the
  allowlist — the native clients browse it cert-only; its mutations stay token-only.)
- #6 [low] RTSP pre-auth DoS bounds: a concurrent-connection cap (RAII slot guard), a per-read
  timeout (slow-loris), and Content-Length/header/message size caps — closing an unauthenticated
  slow-loris / memory-growth / thread-exhaustion vector on TCP 48010.
- #11 [info] A FEC reconstruction failure is now a counted drop (discard the block, keep the
  session) instead of being stream-fatal — a lossy link can't be torn down by one bad block.
- #10 [info] Fixed ALPN ("pkf1") on both native QUIC endpoints (defense-in-depth; a deliberate
  coordinated client+host upgrade — a new host rejects an ALPN-less old client).
- #8 [info] Constant-time GameStream pairing phase-4 hash compare (crypto::ct_eq).
- #7 [low] New VirtualDisplay::set_launch_command carries the launch command per-session on the
  GameStream path (no process-global env stomp under concurrent sessions); native path keeps the
  env under today's single-session model (documented; plumb per-session with concurrent sessions).
- #5 [low] Legacy GameStream GCM nonce reuse: documented as inherent to Nvidia's old-style control
  encryption (Apollo/Moonlight identical; key is client-known) — unfixable on the legacy wire; the
  real fix is V2 control-encryption negotiation. Code comment at control.rs.
- #9 [info] GameStream plain-HTTP pairing: documented (inherent to GFE compat; use punktfunk/1).
- #12 [low] Web global NODE_TLS_REJECT_UNAUTHORIZED: fix designed (undici dispatcher scoped to the
  loopback mgmt fetch) but DEFERRED — needs `bun add undici` in the web build env; reverted to keep
  the web working. Latent-only (the loopback mgmt fetch is the console's only outbound TLS).

fmt + clippy -D warnings clean; 94 host + core tests green; no C-ABI/OpenAPI drift. (The HDR
Steps 1-2 client work in the tree is the user's parallel WIP — deliberately NOT included here.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 09:50:24 +00:00
enricobuehler 3526517eb1 feat: HDR Step-0 colour-metadata transport + security-audit hardening
ci / rust (push) Failing after 45s
apple / swift (push) Successful in 57s
ci / web (push) Successful in 39s
ci / docs-site (push) Successful in 38s
windows-host / package (push) Successful in 3m26s
android / android (push) Successful in 3m40s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m24s
deb / build-publish (push) Successful in 2m10s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m22s
decky / build-publish (push) Successful in 25s
ci / bench (push) Successful in 4m44s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 1m4s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m7s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m45s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 30s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
flatpak / build-publish (push) Successful in 4m17s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m30s
docker / deploy-docs (push) Successful in 23s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m53s
Two strands, entangled in punktfunk1.rs, committed together (one builds-green tree).

HDR pipeline Step 0 — glass-to-glass colour-metadata transport (docs/hdr-pipeline-plan.md):
- Protocol/ABI: ColorInfo on the Welcome + a 0xCE HdrMeta datagram carry the source colour
  space + HDR10 static mastering metadata (quic.rs, abi.rs connect_ex5 fixing caps=0).
- New platform-independent, unit-tested HDR static-metadata helpers (hdr.rs): chromaticities
  (1/50000), mastering luminance (0.0001 cd/m2), MaxCLL/MaxFALL in HDR10/ST.2086 units.
- Capture/encode hooks (capture.rs, encode.rs set_hdr_meta) + Linux client / probe plumbing.

Security-audit hardening — top 3 from docs/security-review.md, each adversarially verified:
- #1 [HIGH] Secret file permissions. The host key.pem/cert.pem and both trust stores are now
  written owner-only: 0600 + dir 0700 on Unix (mirrors mgmt_token), best-effort
  SYSTEM/Administrators/OWNER-only icacls DACL on Windows (%ProgramData% is Users-readable).
  Closes a local key-disclosure -> host-impersonation gap. New gamestream::{create_private_dir,
  write_secret_file} + a 0600 regression test.
- #2 [HIGH] Native SPAKE2 PIN is single-use. The PIN is consumed the moment the host sends its
  key-confirmation (which lets the client test its one guess), before reading the proof, so any
  completed attempt -- right OR wrong -- disarms the window. A wrong PIN isn't observable
  host-side (the client aborts before sending its proof), so consuming on first attempt is what
  delivers the documented "one online guess" instead of an unbounded brute-force of the static
  4-digit PIN. Test verifies single-use.
- #3 [MEDIUM] RTSP packetSize is bounded ([64,2048] in stream_config) and VideoPacketizer::new
  uses saturating .max(1), killing a PRE-AUTH div-by-zero/underflow panic of the video thread.
  Tests for {0,15,16,17} + out-of-range rejection.

fmt + clippy -D warnings clean; full workspace test suite green (93 host tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 09:07:59 +00:00
enricobuehler 450bcf1e7b feat(host): Apollo-backlog hardening — cert gate, NVENC RFI, media QoS, async injector
A pass over the apollo-comparison backlog (re-verified against current code).
Lands four items end-to-end plus a Windows-DualSense scoping doc.

- #5/#92/#26 — GameStream paired-cert allow-list. tls.rs surfaces the verified
  peer cert to handlers (serve_https + PeerCertFingerprint, now shared with the
  mgmt API instead of duplicated); nvhttp gates /launch /resume /applist /cancel
  on AppState.paired and reports a real PairStatus; save_paired writes atomically
  (temp+rename). Closes the "mTLS accepts any client cert" hole. + regression test.

- #6/#51/#19/#22 — NVENC caps query -> reference-frame invalidation. nvenc.rs
  query_caps probes nvEncGetEncodeCaps (max dims / 10-bit / custom-VBV / RFI),
  rejecting over-range modes and degrading 10-bit->8-bit instead of an opaque
  InvalidParam. New Encoder::invalidate_ref_frames (default false -> caller
  keyframes); the Windows NVENC path implements real RFI (multi-ref DPB +
  nvEncInvalidateRefFrames, dedup + IDR-on-overflow). control.rs decodes the
  0x0301 lost-frame range (Apollo's IDX_INVALIDATE_REF_FRAMES) -> AppState.rfi_range
  -> encode loop, falling back to a keyframe. NOTE: the Windows NVENC impl is
  RTX-box/CI-pending (can't compile on Linux); adversarially reviewed vs the SDK.

- #43/#72 — media socket QoS + buffer growth. New punktfunk_core::transport::qos:
  grow_socket_buffers (factored out the native plane's 32MB SO_SNDBUF growth so the
  GameStream sockets reuse it) + set_media_qos (opt-in PUNKTFUNK_DSCP=1: DSCP CS5
  video / CS6 audio + Linux SO_PRIORITY, Apollo's scheme). Wired into UdpTransport
  and the GameStream video/audio sockets. Windows IP_TOS needs qWAVE (follow-up).

- #8/#45 — GameStream input injection off the ENet service thread. on_receive no
  longer injects inline (a slow inject head-blocked ENet keepalive/retransmit); it
  forwards to a dedicated injector thread. The hardened InjectorService moved from
  punktfunk1 into crate::inject (shared by both planes) + a coalesce step that sums
  adjacent relative-mouse/scroll deltas while preserving button/key/abs ordering.

Docs: re-verified apollo-comparison.md status (22 items already done/obsolete since
the snapshot) + windows-dualsense-scoping.md (ViGEm can't emulate a DualSense; real
DS5 on Windows needs a VHF virtual-HID driver — web-research pass pending).

fmt + clippy -D warnings clean; full workspace test suite green; no C-ABI/OpenAPI drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 00:06:30 +00:00
enricobuehler 516efcc3a3 feat(core/fec): adaptive FEC — size recovery to measured loss, not a flat 20%
On a clean link the flat 20% FEC is pure waste: extra wire bytes AND extra
packets. On a packet-rate-bound uplink (the Steam Deck's WiFi tx caps ~22k pps
regardless of bitrate) those extra packets directly cost goodput — measured at
200 Mbps goodput, 20% FEC drove ~10% loss vs ~2.6% at 0% (it saturated the link).

Adaptive FEC closes the loop:
- Client measures the loss FEC is absorbing each ~750 ms window from session stats
  (recovered shards / received, + a bump when a frame went unrecoverable) and sends
  a periodic `LossReport { loss_ppm }` on the control stream (new message;
  `window_loss_ppm` helper, shared + unit-tested). Connector (Apple/Linux/Windows)
  and probe both report; suppressed during a speed test so its filler can't skew it.
- Host maps loss → recovery % (`adapt_fec`: ≈ loss×1.4 + 1pt, clamped 1..50) and
  applies it live via `Session::set_fec_percent` (the wire is self-describing — each
  packet carries its block's data/recovery counts, so the receiver needs no notice).
  A clean link decays to ~1%; loss ramps it up and converges.
- `PUNKTFUNK_FEC_PCT`, when set, now PINS FEC static (disables adaptation) so
  speed-test / measurement runs keep a fixed, known overhead. Unset ⇒ adaptive,
  starting at 10%.

An older host ignores LossReport (unknown control message) and keeps static FEC;
an older client simply never reports and the host holds its start value. Builds +
clippy + fmt + tests green (adapt_fec / window_loss_ppm / loss_report unit tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 21:31:07 +00:00
enricobuehler f37a304fba fix(core/speed-test): packet-level throughput + paced burst (kill the 0/100% cliff)
The punktfunk/1 speed test was unusable across every client/host: at the start of
a burst a little data got through, then everything read as dropped (~10 MB total).
Two compounding bugs:

1. Receive side measured throughput from fully-reassembled FLAG_PROBE *access
   units* only. The instant loss crossed the 20% FEC budget no AU completed, so the
   figure cliffed to 0 / 100% loss even though most bytes still arrived — a binary
   cliff, not a graded measurement.
2. Send side blasted each filler AU (up to 256 KB ≈ 200 packets) into the socket
   buffer in one unpaced batch, unlike the real video path which paces. On a small
   buffer (e.g. the Steam Deck's 416 KB) a single AU overflowed it, so the test
   measured self-inflicted buffer overflow instead of the link.

Fixes:
- Host `run_probe_burst` keeps each AU a small (~16 KB) burst and paces by the byte
  budget, mirroring `paced_submit`; reports the WIRE packets the kernel accepted and
  the ones the send buffer dropped (stat deltas), separating host-side drops from
  link loss.
- `ProbeResult` gains `wire_packets_sent` + `send_dropped` (back-compat decode: a
  21-byte pre-wire-stats result still decodes, new fields 0).
- Clients (probe + connector) count delivered traffic at the packet level via
  `session.stats()` deltas over the burst window, so throughput/loss degrade
  gracefully. Connector freezes the delivered figure when the host report lands so
  resumed video can't inflate it. New `ProbeOutcome`/`PunktfunkProbeResult` fields:
  `host_drop_pct`, `wire_packets_sent`, `send_dropped`.

Validated on loopback (graded 142→1391 Mbps, host_drop/link_loss split correctly,
no cliff) and live against the Deck: clean to ~200 Mbps goodput / 273 Mbps wire at
0% link loss, host send buffer the wall above that (the lever-#1 target).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 17:46:17 +00:00
enricobuehler 480dee863d feat(host/gamescope): custom-resolution Game-Mode streaming on the Steam Deck
The Steam Deck (SteamOS) ships its OWN gaming session — `gamescope-session.target`
driven by `/usr/lib/steamos/gamescope-session`, not Bazzite's `gamescope-session-plus`.
That script `exec gamescope`s with HARDCODED physical-panel args (`-w 1280 -h 800 -O
'*',eDP-1`) and launches Steam via a SEPARATE `steam-launcher.service`, so the existing
managed-session path (which assumes session-plus) couldn't honor the client's mode — an
attach captured the panel's native 1280x800 instead.

Add a SteamOS branch to the managed-session path: detect it, write a `gamescope` PATH-shim
that rewrites the hardcoded args to `--backend headless -W <client> -H <client> -r <hz>`,
drop a transient user `gamescope-session.service.d` override pointing PATH at the shim +
the mode, then RESTART the whole target so `steam-launcher.service` brings Steam up IN the
headless gamescope at the client's resolution. Attach to the one fresh node (the restart
kills any prior gamescope, so no stale-node attach). Restore-on-disconnect removes the
override + restarts the target back to the physical panel (debounced; skipped if the user
switched to a desktop session). All user-level (`systemctl --user`) — no root.

Also widen `build_pipeline_with_retry` to 8 attempts (~90s): a host-managed gamescope
session cold-starting Steam Big Picture takes 30-60s to first frame, and a first-connect
timeout would tear down the warm session (forcing another cold start on reconnect).
Permanent failures still fail fast via `is_permanent_build_error`.

Validated live on a Steam Deck: Game Mode auto-detected, host takes over headless at the
client's mode (720p / 1080p), Steam Big Picture streamed glass-to-glass to the Mac at the
requested resolution. Single-tenant (concurrent clients at different modes still thrash —
a follow-up).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 16:30:24 +00:00
enricobuehler 0f7f1be3c3 fix(core/transport): treat ENOBUFS as a transient drop, not a fatal error
WiFi drivers (e.g. ath11k on the Steam Deck) return ENOBUFS — not
EAGAIN/EWOULDBLOCK — when the tx queue is momentarily full. Rust maps
ENOBUFS to ErrorKind::Uncategorized, so `is_transient_io` (which only
matched WouldBlock/ConnRefused/ConnReset) treated it as a real error and
tore the whole stream down on a single transient burst.

This presented as a vicious Heisenbug on the Deck: the native host
streamed flawlessly on loopback and under a debugger (anything slow
enough not to fill the small ~416 KB wlan0 buffer), but died at full rate
cross-machine over WiFi — flaky hang-or-SIGKILL because tx-queue-full is
probabilistic. Diagnosed live via a forced core dump (gdb on the hung
core): the data-plane thread had bailed on a fatal send error.

Treat ENOBUFS (and asynchronous network-path blips ENETUNREACH /
EHOSTUNREACH / ENETDOWN / EHOSTDOWN) as a lossy drop like WouldBlock —
FEC + the next frame recover. Validated: 6/6 back-to-back cross-machine
streams over the Deck's WiFi, host stable, p50 ~4.4 ms (one run dropped
4/300 frames *gracefully*, 0 mismatched — the fix working as intended).

Also surface a data-plane bind/hole-punch failure directly in punktfunk1
(it was previously only reported after teardown, which a stall could
swallow entirely).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 14:49:59 +00:00
enricobuehler 333f66b45b fix(host/serverinfo): don't advertise an empty codec mask when the VAAPI probe finds nothing
apple / swift (push) Successful in 54s
windows-host / package (push) Successful in 2m21s
android / android (push) Successful in 3m30s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 27s
ci / rust (push) Successful in 5m56s
deb / build-publish (push) Successful in 3m9s
ci / bench (push) Successful in 4m40s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
decky / build-publish (push) Successful in 11s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m47s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m51s
docker / deploy-docs (push) Successful in 17s
The Phase 3 GPU-aware codec mask (6922e1c) probes VAAPI on any non-NVIDIA host.
On a GPU-less box (CI container: no /dev/nvidia* -> `auto` picks VAAPI, but there's
no VA display) the probe returns all-false, so the mask was 0 -- the host
advertised NO codecs, and the serverinfo unit test failed.

Fall back to the static superset when the probe yields nothing (VAAPI wasn't
usable, not "the GPU encodes nothing"); quiet ffmpeg's expected "No VA display"
error during the probe; and assert the test against codec_mode_support() rather
than a hardcoded 65793 so it's deterministic regardless of the build host's GPU.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 11:52:17 +00:00
enricobuehler 6922e1c467 feat(host): VAAPI codec probe + AMD/Intel packaging + neutral logs (Phase 3)
apple / swift (push) Successful in 55s
ci / rust (push) Failing after 1m35s
ci / web (push) Successful in 28s
windows-host / package (push) Successful in 2m23s
ci / docs-site (push) Successful in 30s
android / android (push) Successful in 3m24s
deb / build-publish (push) Successful in 3m22s
decky / build-publish (push) Successful in 14s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m48s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m50s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m51s
docker / deploy-docs (push) Successful in 18s
Polish for AMD/Intel support:
- GameStream serverinfo advertises only codecs the GPU can ACTUALLY encode on
  the VAAPI backend (probed once by opening a tiny encoder per codec). AV1
  encode is narrow (Intel Arc/Xe2+, AMD RDNA3+/RDNA4) and an old iGPU may lack
  HEVC, so a Moonlight client never negotiates a codec the encoder can't open.
  NVENC/Windows keep the Moonlight-validated static mask. Validated on a Radeon
  780M: h264/h265/av1 all probe true -> mask unchanged (65793).
- Packaging: Recommends mesa-va-drivers + intel-media-va-driver (deb) /
  mesa-va-drivers + intel-media-driver (rpm) so the auto-selected VAAPI backend
  works out of the box on AMD/Intel; NVIDIA boxes can --no-install-recommends.
  (Fedora note: stock mesa-va-drivers disables HEVC/AV1 -- needs the freeworld
  variant from RPM Fusion.)
- De-NVIDIA-fy the user-facing encoder log/context strings ("open NVENC" ->
  "open video encoder") now that VAAPI is a first-class backend.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 10:41:37 +00:00
enricobuehler 708c62788d feat(host/encode): VAAPI zero-copy dmabuf import (AMD/Intel GPU CSC)
apple / swift (push) Successful in 57s
ci / rust (push) Successful in 1m39s
ci / web (push) Successful in 32s
ci / docs-site (push) Successful in 31s
android / android (push) Successful in 3m29s
windows-host / package (push) Successful in 3m39s
deb / build-publish (push) Successful in 3m7s
decky / build-publish (push) Successful in 22s
ci / bench (push) Successful in 4m43s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m24s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m18s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m22s
docker / deploy-docs (push) Successful in 21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m53s
Phase 2 of AMD/Intel support: the VAAPI encoder now takes the capture dmabuf
directly and does the RGB->NV12 colour conversion on the GPU's video engine,
eliminating the host-side de-pad + swscale CSC + upload the CPU path pays.

- capture: a vendor-neutral FramePayload::Dmabuf (dup'd fd + fourcc/modifier/
  layout). When zero-copy is on, the EGL->CUDA importer is unavailable (any
  non-NVIDIA host), and the backend is VAAPI, the capturer advertises LINEAR
  dmabuf and hands the raw buffer to the encoder instead of CPU-copying it.
- encode/vaapi: the encoder self-configures from the first frame's payload (no
  open_video signature change). The dmabuf arm wraps the buffer as an
  AV_PIX_FMT_DRM_PRIME frame and pushes it through a filter graph
  buffer(drm_prime) -> hwmap(vaapi) -> scale_vaapi=nv12 -> buffersink; the
  encoder takes NV12 surfaces straight from the sink. The Phase 1 CPU-upload
  path is kept as the other arm (used when capture produces CPU frames).

Live-validated on a Radeon 780M (real Sway/xdpw desktop capture): correct,
pixel-perfect HEVC, and ~10x less host CPU at 1440p (4.2s -> 0.4s of CPU for
300 frames) -- the de-pad/CSC/upload moves to the GPU. NVIDIA unchanged
(zero-copy still imports to CUDA; the passthrough path only engages on
non-NVIDIA hosts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:57:00 +00:00
enricobuehler 5e27f65f2e fix(host/capture): mmap the buffer fd ourselves — xdpw MemFd over-reads MAP_BUFFERS
apple / swift (push) Successful in 55s
windows-host / package (push) Successful in 2m28s
android / android (push) Successful in 10m10s
ci / web (push) Successful in 32s
ci / docs-site (push) Successful in 29s
ci / rust (push) Successful in 11m44s
deb / build-publish (push) Successful in 3m7s
decky / build-publish (push) Successful in 34s
ci / bench (push) Successful in 4m44s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 15s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m57s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m51s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m21s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m8s
docker / deploy-docs (push) Successful in 20s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m54s
The CPU de-pad path trusted PipeWire's MAP_BUFFERS slice (`d.data()`, length =
`data.maxsize`). xdg-desktop-portal-wlr hands MemFd ScreenCast buffers whose
maxsize exceeds the bytes PipeWire actually maps into our process, so reading to
maxsize ran off the end of the mapping and SIGSEGV'd the capture thread —
crashing every CPU-path capture on Sway/wlroots (and thus any non-NVIDIA host,
which has no CUDA zero-copy importer and always falls back to this path).

mmap the fd ourselves, sized to its real length (fstat), for any fd-backed
buffer (MemFd SHM or DmaBuf); fall back to `d.data()` then drop. The existing
`needed > avail` guard now drops cleanly instead of over-reading. This also
subsumes the original "MAP_BUFFERS didn't map a Vulkan dmabuf" fallback.

Verified: fixes real Sway-desktop portal capture -> VAAPI HEVC on a Radeon 780M
(correct image + colours); the NVIDIA zero-copy path (returns before this code)
and the NVIDIA/KWin CPU path (self-mmap, fd_len == maxsize) both still work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 21:48:49 +00:00
enricobuehler f96e4ec9f8 refactor(host/zerocopy): dlopen libcuda instead of a link-time #[link]
apple / swift (push) Successful in 54s
windows-host / package (push) Successful in 2m15s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m18s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m14s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 55s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 58s
android / android (push) Successful in 4m10s
audit / cargo-audit (push) Failing after 1m5s
ci / web (push) Successful in 28s
ci / docs-site (push) Successful in 28s
ci / rust (push) Successful in 5m41s
ci / bench (push) Successful in 5m53s
decky / build-publish (push) Successful in 11s
deb / build-publish (push) Successful in 3m24s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 35s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m7s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m16s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3m50s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
flatpak / build-publish (push) Successful in 4m9s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m23s
docker / deploy-docs (push) Successful in 5s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m51s
The host hard-linked libcuda.so.1 on Linux (`#[link(name="cuda")]` in
`zerocopy::cuda`), so the binary wouldn't even *start* on a non-NVIDIA box —
the dynamic loader can't resolve the NEEDED libcuda. That blocked running the
new VAAPI (AMD/Intel) path on a machine without the NVIDIA driver.

Resolve the 18 CUDA Driver API symbols at runtime via `libloading` instead.
Same-named wrapper fns forward to the dlopen'd table (call sites unchanged);
when libcuda is absent they return a non-zero CUresult so `context()` fails
cleanly and the capturer falls back to the CPU path. The library handle is
leaked (process-lifetime, like the shared context).

One Linux binary now runs on NVIDIA (CUDA zero-copy -> NVENC) and on AMD/Intel
(VAAPI, no NVIDIA driver). Verified: the NVIDIA dev box still does dmabuf->CUDA
zero-copy; on a Radeon 780M box the host builds with no libcuda present, the
binary has no NEEDED libcuda entry, and VAAPI encode runs with no stub.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:44:57 +00:00
enricobuehler b390dd883b feat(host/encode): VAAPI encode backend for AMD/Intel GPUs (Linux)
apple / swift (push) Successful in 54s
windows-host / package (push) Successful in 2m52s
android / android (push) Successful in 3m4s
ci / rust (push) Successful in 1m18s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 27s
ci / bench (push) Successful in 4m32s
deb / build-publish (push) Successful in 2m56s
decky / build-publish (push) Successful in 22s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m59s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 15s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m41s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 7m27s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m10s
docker / deploy-docs (push) Successful in 39s
The Linux host was NVENC/CUDA-only. Add a VAAPI encoder — one libavcodec
backend (h264/hevc/av1_vaapi) covering both AMD (Mesa radeonsi) and Intel
(iHD) — behind the existing `Encoder` trait, and turn `open_video`'s Linux
arm into a vendor dispatcher: `PUNKTFUNK_ENCODER=auto|nvenc|vaapi` (default
auto: NVENC when a CUDA frame or /dev/nvidia* is present, else VAAPI). The
NVIDIA path is unchanged — auto resolves to NVENC on an NVIDIA box and the
bitrate-probe loop moved verbatim into `open_nvenc_probed`.

`VaapiEncoder` mirrors the NVENC hwframes pattern with AV_HWDEVICE_TYPE_VAAPI.
The CPU-input path swscales packed RGB -> NV12 (BT.709 limited, VUI signalled)
and uploads into a pooled VA surface (av_hwframe_transfer_data), preserving the
low-latency model (infinite GOP, on-demand forced IDR, async_depth=1, CBR when
the driver supports it). It works on a non-NVIDIA box with no capture changes:
the capturer already falls back to CPU frames when its EGL->CUDA importer can't
initialise (no libcuda).

Live-validated on a Radeon 780M (RDNA3): hevc/h264/av1_vaapi all encode,
HEVC/H264 decode cleanly with correct BT.709-limited colours, infinite GOP
preserved. Zero-copy dmabuf import (the high-res perf lever) is next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:35:49 +00:00
enricobuehler aef552f04a feat(host/windows): HDR scRGB→P010 in a shader — NVENC native P010, off the SM
apple / swift (push) Successful in 55s
deb / build-publish (push) Successful in 3m9s
decky / build-publish (push) Successful in 13s
ci / rust (push) Successful in 1m14s
ci / web (push) Successful in 30s
ci / docs-site (push) Successful in 30s
windows-host / package (push) Failing after 2m19s
android / android (push) Successful in 3m12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
ci / bench (push) Successful in 4m38s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m42s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m47s
docker / deploy-docs (push) Successful in 18s
On the Windows WGC HDR path the FP16 scRGB capture was fed to NVENC as
R10G10B10A2 (BT.2020 PQ), and NVENC did the RGB→YUV CSC internally on the
contended SM — adding to the encode_ms wall under a GPU-saturating game.
(NVIDIA's D3D11 VideoProcessor can't do RGB→P010 for HDR; that path renders
green, confirmed live — so the convert must be ours.)

New `HdrP010Converter` fuses the tone-map with the BT.2020 RGB→YUV matrix and
emits P010 (10-bit limited range) directly: a luma pass → an R16_UNORM plane
RTV (full-res) and a chroma pass → an R16G16_UNORM plane RTV (half-res, 2x2
box average) of a DXGI_FORMAT_P010 texture. NVENC then takes native P010 and
skips its SM-side convert.

Gated behind PUNKTFUNK_HDR_SHADER_P010 (default OFF → the existing
R10→NVENC path is byte-for-byte unchanged). Colour validated by a new
`hdr-p010-selftest` subcommand: a synthetic scRGB pattern → P010 → readback,
compared to a BT.2020 PQ 10-bit reference — max abs error Y=0.99 / Cb=0.82 /
Cr=0.75 codes on an RTX 4090. Live-validated HDR colours correct (no green).
Build + clippy (--features nvenc -D warnings) green on x86_64-pc-windows-msvc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:54:00 +00:00
enricobuehler 1fc6f73784 perf(host/linux): NV12 GPU convert — feed NVENC native YUV, off the contended SM (Tier 2A)
apple / swift (push) Successful in 54s
windows-host / package (push) Failing after 2m18s
ci / web (push) Successful in 32s
ci / rust (push) Failing after 5m2s
decky / build-publish (push) Successful in 11s
android / android (push) Failing after 49s
ci / docs-site (push) Successful in 35s
ci / bench (push) Failing after 3m15s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m49s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 15s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 40s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Failing after 28s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Successful in 5m54s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 11s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 1m36s
The Linux zero-copy tiled-GL path can now produce NV12 (BT.709 limited range)
on the GPU and feed NVENC native YUV, deleting NVENC's internal RGB->YUV CSC —
which runs on the SM/3D-compute engine a saturating game pins at 100% (the
game-vs-encode contention headache). Windows already does this via the D3D11
video processor; this closes the Linux gap. See docs/host-latency-plan.md §2A.

Gated behind PUNKTFUNK_NV12 (default OFF → the RGB/BGRx path is byte-for-byte
unchanged; zero regression). Only the tiled EGL/GL path converts; the
LINEAR/Vulkan-bridge (gamescope) path stays RGB.

- zerocopy/egl.rs: Nv12Blit — BT.709 limited Y pass (R8, full-res) + UV pass
  (RG8, half-res, GL_LINEAR 2x2 average); both CUDA-registered; import_nv12.
- zerocopy/cuda.rs: two-plane DeviceBuffer (Y W*H@1B + interleaved UV
  (W/2)*2 x H/2), paired Y+UV pool, copy_mapped_nv12 + copy_nv12_to_device,
  on the per-thread priority stream (dmabuf-recycle sync preserved).
- encode/linux.rs: nvenc_input(Nv12)->NV12; submit_cuda copies two planes into
  NVENC's surface; VUI signalled BT.709 limited (colorspace/range/primaries/trc).
- capture/linux.rs: gate (PUNKTFUNK_NV12 && tiled), report format Nv12.
- main.rs + zerocopy/mod.rs: `nv12-selftest` subcommand.

Validated on RTX 5070 Ti two ways: (1) nv12-selftest — synthetic RGBA->NV12
round-trip vs a BT.709 reference, max abs error Y=0.56/U=0.33/V=0.26 LSB;
(2) live capture->NV12->NVENC->decode of animated red content matches the RGB
path's colour (avg RGB 230,18,18 vs 231,18,20). build/clippy/fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 23:39:11 +00:00
enricobuehler 112a054c35 perf(host): latency hardening for the game-vs-encode GPU contention collapse
Verified, prioritized analysis in docs/host-latency-plan.md (multi-agent
investigation + adversarial verification). Lands the two low-risk tiers:

Tier 2B — Linux scheduling hygiene:
- boost_thread_priority now nices the capture/encode (-10) and send (-5)
  threads on Linux (setpriority, best-effort; no-op without CAP_SYS_NICE),
  and the wrong "gamescope caps the game" doc-comment is corrected.
- CUDA context created with CU_CTX_SCHED_BLOCKING_SYNC (frees a core on the
  shared box instead of busy-spinning on completion).
- Copies moved off the default stream onto a per-thread highest-priority
  CUDA stream (cuStreamCreateWithPriority, graceful NULL-stream fallback)
  with a per-stream sync that no longer blocks on the other worker thread's
  in-flight copies. Stream priority is measure-then-keep (NVIDIA Linux may
  ignore it); never regresses.

Tier 3A — Windows session tuning (new session_tuning.rs, raw C-ABI FFI,
no-op off Windows): once-per-process 1ms timer + DwmEnableMMCSS + HIGH
priority class; per-thread MMCSS "Games" + keep-display-awake. Wired into
both the native (boost_thread_priority) and GameStream (stream.rs) paths.
We had zero session tuning before (Apollo streaming_will_start parity).

Tier 2A (Linux NV12 convert) is specified but intentionally not landed:
it is colour-correctness-critical and needs A/B validation on a GPU box
with a display (green-screen risk). Builds + clippy + fmt green on Linux.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 23:05:57 +00:00
enricobuehler 9c8fa9340c refactor: drop milestone names + consolidate clients; loss-recovery & rumble fixes
apple / swift (push) Failing after 40s
audit / cargo-audit (push) Failing after 1m12s
windows-msix / package (push) Successful in 1m37s
windows / build (push) Successful in 1m14s
android / android (push) Successful in 4m48s
ci / web (push) Successful in 27s
ci / rust (push) Successful in 4m21s
ci / docs-site (push) Successful in 31s
ci / bench (push) Successful in 4m39s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 19s
deb / build-publish (push) Successful in 6m3s
flatpak / build-publish (push) Successful in 4m13s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m16s
docker / deploy-docs (push) Successful in 18s
Two bodies of work in one commit (the rename moved files the fixes also touched).

Naming/structure cleanup (pre-launch):
- Host modules m3.rs->punktfunk1.rs, m0.rs->spike.rs; CLI m3-host->punktfunk1-host,
  m0->spike; bare `punktfunk-host` now prints help. Types M3Options/M3Source->
  Punktfunk1Options/Punktfunk1Source.
- Clients consolidated out of crates/ into clients/: punktfunk-client-rs->
  clients/probe (crate punktfunk-probe), client-linux->clients/linux,
  client-windows->clients/windows, punktfunk-android->clients/android/native
  (crate punktfunk-client-android; kept [lib] name=punktfunk_android so the JNI
  contract is unchanged). crates/ now holds only core + host.
- Milestone codes M0-M4 purged from code/CLI/CLAUDE.md/README/docs/docs-site,
  kept only in docs/implementation-plan.md. docs/m2-plan.md->
  docs/gamestream-host-plan.md. CI/gradle/flatpak paths updated.

Client loss-recovery (video froze and never recovered after a brief drop):
- Export punktfunk_connection_frames_dropped through the C ABI (the core already
  tracked it for the client keyframe-recovery loop; it was never reachable from
  the ABI clients). Regenerated punktfunk_core.h.
- Apple (StreamPump + Stage2Pipeline) and Android (decode.rs) now poll
  frames_dropped and request a keyframe when it climbs -- the same loss-driven
  recovery Linux/Windows already had. Under infinite GOP the decoder silently
  conceals reference-missing frames, so the decode-error trigger rarely fires.

Apple rumble robustness (worked then went spotty -- DualSense + Xbox):
- Add CHHapticEngine stopped/reset handlers (rebuild on app background / audio
  interruption / server reset) and drop the permanent `broken` latch on a
  transient drive failure; latch only when the controller truly has no haptics.
- Surface swallowed SDL set_rumble errors on Linux/Windows + diagnostic logging.

Verified: cargo build/clippy/fmt --workspace, C-ABI harness, header drift.
Not runnable on this box (verify in CI): Gitea workflows, gradle/Android,
flatpak, Swift/decky.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:05:58 +00:00
enricobuehler 1f7b8eba66 feat(host/windows): auto-install a virtual mic device (Steam Streaming Microphone)
apple / swift (push) Successful in 54s
android / android (push) Successful in 1m56s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
deb / build-publish (push) Successful in 2m31s
ci / rust (push) Successful in 1m40s
decky / build-publish (push) Successful in 19s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / bench (push) Successful in 4m32s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m5s
docker / deploy-docs (push) Successful in 20s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m16s
So Windows mic passthrough works without the user installing anything: when no virtual-mic
device is present, install Steam Remote Play's SteamStreamingMicrophone.inf (ships under
Steam\drivers\Windows10\{arch}\ next to the speakers INF Apollo uses) via DiInstallDriverW
loaded from newdev.dll — the same mechanism Apollo uses for Steam Streaming Speakers — then
re-find the device. Needs admin (the host runs as SYSTEM); best-effort and safe (no-op if
Steam absent / INF not found / PUNKTFUNK_NO_MIC_INSTALL), falling back to the manual-install
guidance (VB-Audio Cable) otherwise.

Not yet built/validated on the box (down); FFI cross-checked against windows-0.62. Whether
Steam ships SteamStreamingMicrophone.inf at that path is to be confirmed on the box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 17:22:48 +00:00
enricobuehler a7daed5797 feat(host/windows): client→host mic passthrough via a virtual audio device
apple / swift (push) Successful in 55s
ci / web (push) Successful in 27s
ci / rust (push) Successful in 1m40s
android / android (push) Successful in 1m57s
ci / docs-site (push) Successful in 29s
deb / build-publish (push) Successful in 2m30s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m30s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m12s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m2s
docker / deploy-docs (push) Successful in 18s
The host received the client's mic uplink (0xCB Opus) but dropped it on Windows ("requires
Linux"). Windows has no user-mode way to CREATE a capture endpoint, so target an existing
virtual audio device and write the decoded mic PCM into its RENDER endpoint — the device's
CAPTURE endpoint then surfaces as a microphone host apps record from (the inverse of a
virtual cable). New audio::wasapi_mic::WasapiVirtualMic: finds the device by friendly-name
(Steam Streaming Microphone / VB-Audio CABLE Input / VoiceMeeter / "virtual", override with
PUNKTFUNK_MIC_DEVICE), opens a WASAPI shared event-driven RENDER client (48 kHz stereo f32,
autoconvert), and a dedicated COM thread writes a bounded (~80 ms drop-oldest) inject queue
with silence-fill. open_virtual_mic() gets a Windows arm; mic_service_thread (Opus decode →
push) now compiles for windows too (opus is already a windows dep). Clear error + install
guidance when no virtual device is present.

Linux/cross-platform side cargo-checks; the Windows path is built/validated when the box is
back (the wasapi render API was cross-checked against the docs + the existing capture path).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 17:15:41 +00:00
enricobuehler 3b3e8b4ba9 perf(host/windows): elevate capture/encode/send thread CPU priority (Apollo-parity)
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m36s
android / android (push) Successful in 2m5s
ci / web (push) Successful in 29s
ci / docs-site (push) Successful in 29s
deb / build-publish (push) Successful in 2m31s
decky / build-publish (push) Successful in 15s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m28s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m20s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m58s
Apollo runs its capture thread at CRITICAL and its encoder thread at ABOVE_NORMAL; we set
none. Our GPU work is already HIGH priority, but the GPU scheduler can only favour commands
we've SUBMITTED — a normal-priority thread descheduled by a CPU-heavy game submits the
convert/encode late, so the HIGH GPU priority never bites (consistent with the measured
"NVENC engine idle yet the encode waits ~15 ms"). Raise the WGC helper's capture+encode
loop and the single-process capture+encode loop to THREAD_PRIORITY_HIGHEST, and the
transmit thread to ABOVE_NORMAL, via a cross-platform boost_thread_priority() (Windows-only
effect — the Linux host caps the game via gamescope so its threads aren't starved).

Not yet built/validated on the GPU box (it's down); the cross-platform side compiles
(cargo check) and the Windows calls are cross-checked against the windows-0.62 API.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 16:12:29 +00:00
enricobuehler 9771aa8815 fix(host/windows): binary-search clamp NVENC bitrate to the codec-level max (not ×¾ step-down)
ci / web (push) Successful in 28s
ci / rust (push) Successful in 1m42s
ci / docs-site (push) Successful in 28s
apple / swift (push) Successful in 55s
android / android (push) Successful in 1m55s
deb / build-publish (push) Successful in 2m29s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
ci / bench (push) Successful in 4m27s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m5s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m54s
When a client requests a bitrate above the GPU's HEVC/AV1 level ceiling, NVENC rejects
initialize_encoder. The old probe stepped the rate down by ×¾ each retry, undershooting
the real ceiling badly (a 1 Gbps request landed ~300 Mbps even with the level cap near
800). Replace it with a binary search over [floor, requested] that converges (±20 Mbps)
on the HIGHEST rate NVENC accepts and clamps to that — so the stream uses the full
codec-level bitrate. Factored the session open/config/init into try_open_session() for
the probe; split-encode rejection is disambiguated from a bitrate-cap rejection (retry
once with split disabled) and the floor fallback also tries split-disabled.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:42:00 +00:00
enricobuehler a4df75132a fix(host/windows): HEVC/AV1 HIGH tier so high client bitrates aren't quartered
android / android (push) Successful in 1m56s
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m35s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 29s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m36s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m8s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m8s
docker / deploy-docs (push) Successful in 18s
NVENC defaulted to Main tier, whose per-level bitrate ceiling at 5K (HEVC Level 6.2
Main ≈ 240 Mbps) made initialize_encoder reject a high client bitrate; the existing
probe-and-step-down then silently dropped a ~1 Gbps request by ×¾ to ~240-320 Mbps —
visible color/motion compression on fast scenes. Set HIGH tier (≈800 Mbps for HEVC,
higher for AV1) + autoselect level so the requested bitrate goes through. `tier`/`level`
are u32 (HIGH=1, AUTOSELECT=0) shared across the HEVC/AV1 union offset; the step-down
remains as a safety net. Not yet built/validated on-box (box offline).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:33:31 +00:00
enricobuehler 4cc57d5c39 perf(host/windows): move capture→encode off the 3D engine (NV12/P010 video-processor path, zero-copy, GPU priority)
apple / swift (push) Successful in 56s
ci / rust (push) Successful in 1m36s
android / android (push) Successful in 1m56s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m33s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m58s
The Windows host capped at ~60 fps with 35-40 ms latency on a GPU-heavy game:
the per-frame capture→encode path shared the 3D engine with the game and got
scheduled behind it. Rework to minimize 3D-engine work per frame:

- VideoConverter (D3D11 video processor): capture → NVENC-native NV12/P010 so
  NVENC skips its internal RGB→YUV (a 3D/compute step). Wired into both DDA
  (dxgi.rs) and WGC (wgc.rs). New PixelFormat::Nv12/P010 + NVENC YUV input.
- GPU scheduling hardening (Apollo-style): D3DKMTSetProcessSchedulingPriorityClass
  HIGH, absolute SetGPUThreadPriority, SetMaximumFrameLatency(1).
- WGC SDR zero-copy (hold pool frames; no CopyResource). DDA keeps a fast
  CopyResource to decouple its single-frame acquire/release from the async convert.
- Pipelined helper encode loop (PUNKTFUNK_ENCODE_DEPTH, default 1) + perf split
  (cap_wait / encode / write).

Live on the RTX 4090: hard 60 fps ceiling removed (now scene-scaling 40-200+),
latency much reduced. Residual cap in GPU-pinned scenes is the irreducible RGB→YUV
convert (no fixed-function unit on NVIDIA — VideoProcessing engine reads 0%) waiting
behind an uncapped game under WDDM context time-slicing; Linux avoids it via
gamescope capping the game to the display refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:08:03 +00:00
enricobuehler d5757980f8 style(host): rustfmt — align video_caps comment in m3 test call-sites
apple / swift (push) Successful in 53s
ci / web (push) Successful in 32s
ci / rust (push) Successful in 1m36s
ci / docs-site (push) Successful in 29s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 22s
ci / bench (push) Successful in 4m31s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 17s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m3s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m40s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m19s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m17s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m17s
android / android (push) Has been cancelled
docker / deploy-docs (push) Successful in 17s
cargo fmt --all over the merged connect() call-sites (the video_caps/
launch args landed without a fmt pass). Comment-alignment only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 07:48:23 +00:00
enricobuehler 9537efdcd5 feat(client/windows): HDR10 (BT.2020 PQ) decode + present
apple / swift (push) Successful in 54s
windows-msix / package (push) Successful in 1m8s
windows / build (push) Successful in 1m14s
android / android (push) Failing after 1m43s
ci / rust (push) Failing after 48s
ci / web (push) Successful in 28s
ci / docs-site (push) Successful in 29s
deb / build-publish (push) Successful in 3m5s
decky / build-publish (push) Successful in 14s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 3s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m35s
flatpak / build-publish (push) Failing after 4m27s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 3m54s
docker / deploy-docs (push) Successful in 6s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m12s
Light up the dormant 10-bit/HDR path end to end on the Windows client.

- core: NativeClient::connect gains a video_caps param threaded into the Hello. The Windows
  client advertises VIDEO_CAP_10BIT | VIDEO_CAP_HDR; every other caller (the C ABI shim,
  Linux, Android, host test connects) passes 0, so the 8-bit BT.709 path is unchanged. The
  host already gates a Main10/PQ encode on these bits + PUNKTFUNK_10BIT.
- video.rs: a PQ frame (color_trc == SMPTE2084) converts 10-bit YUV → X2BGR10 (== DXGI
  R10G10B10A2) with the BT.2020 matrix via sws_setColorspaceDetails; swscale applies only
  the matrix + range, so the PQ-encoded samples pass through untouched.
- present.rs: on an HDR frame the swapchain flips in place (ResizeBuffers) to R10G10B10A2 +
  DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020 + HDR10 metadata; the passthrough shader is
  unchanged and the compositor maps PQ→display. Switched to ALPHA_MODE_IGNORE so the 10-bit
  padding bits don't render transparent. SDR stays 8-bit B8G8R8A8.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 00:18:30 +02:00
enricobuehler ad0cb1b582 feat(host/windows): capture the secure desktop in HDR via DDA (no SDR drop)
ci / web (push) Successful in 32s
ci / rust (push) Successful in 1m26s
android / android (push) Failing after 43s
apple / swift (push) Successful in 55s
deb / build-publish (push) Successful in 2m24s
decky / build-publish (push) Successful in 22s
ci / bench (push) Successful in 4m30s
ci / docs-site (push) Successful in 28s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4m1s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m31s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m15s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 7m46s
docker / deploy-docs (push) Successful in 21s
The secure-desktop DDA leg went black with HDR on: legacy DuplicateOutput (the SDR-era
API) can't capture an FP16/HDR desktop, and dropping the SudoVDA out of HDR is denied on
the Winlogon desktop (so the SDR-drop attempt just churned and stayed black).

Instead capture HDR natively on the DDA path — the capturer already has the full
FP16→BT.2020 PQ→R10G10B10A2 conversion (hdr_fp16 path), it just never requested FP16.
Thread a want_hdr flag into duplicate_output: for an HDR session request
DuplicateOutput1 with FP16 first and retry it (5×) instead of bailing to the
HDR-incapable legacy fallback. The secure-desktop mux now reads the monitor's real HDR
state and opens DDA in HDR when set — no advanced-color toggling at all. The
normal-desktop DDA overlay/flip issues that pushed us to WGC don't apply to the composed
Winlogon UI. want_hdr is threaded through every (re)duplication incl. ACCESS_LOST recovery.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 22:11:07 +00:00
enricobuehler 69765bad93 fix(host/windows): drop the SudoVDA to SDR for the secure DDA leg, verified
Keep HDR OFF for the DDA (secure-desktop) path rather than bailing to WGC: the DDA
capturer is SDR-only (BGRA8), so an HDR SudoVDA makes the Winlogon capture black.
On the secure transition, drop the monitor out of HDR and VERIFY it took (re-read
advanced_color_enabled, retry up to 6×200ms) before opening DDA — the CCD toggle can
transiently fail (rc=5) or lag. Restore HDR on return to the WGC normal-desktop leg.
Logs clearly if the drop can't be applied (e.g. denied on the Winlogon desktop).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 21:56:11 +00:00
enricobuehler af6787c0bd fix(host/windows): honor the SudoVDA's real HDR state (stop wiping the user's HDR toggle)
HDR streamed nothing and "didn't persist" because build() forced the SudoVDA's
advanced-color state to match the handshake bit_depth on every build — with an
8-bit-negotiated session (the common case: clients advertise no 10-bit cap) that
meant set_advanced_color(false) on every connect, wiping a user's deliberate
Windows HDR toggle on the virtual display.

But the whole pipeline already follows the monitor's REAL HDR state: WGC captures
FP16 when HDR is on, NVENC forces Main10 + BT.2020 PQ from the 10-bit capture
format regardless of the negotiated depth (encode/nvenc.rs), and the client
auto-detects PQ from the HEVC VUI. So the negotiated bit_depth must NOT drive the
monitor's colorspace.

- build(): only ever ENABLE HDR (proactively, for a negotiated 10-bit session);
  never force it off. A user-enabled HDR session now persists and flows end-to-end.
- secure-desktop mux: gate the HDR→SDR drop (for the DDA leg) on the monitor's
  ACTUAL advanced-color state at switch time, not bit_depth — so an HDR session
  with an 8-bit handshake still drops correctly for Winlogon and restores after.
- sudovda: add advanced_color_enabled() reader (DISPLAYCONFIG_GET_ADVANCED_COLOR_INFO).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 21:37:04 +00:00
enricobuehler 0ce2e37faf refactor(host/windows): clean up DDA path + add a proper Windows service
Final cleanup after the DDA-parity work, plus an end-user service to replace the
PsExec/VBS/scheduled-task launch chain.

Cleanup (behavior-preserving):
- sudovda.rs: drop the dead legacy GDI isolate_displays/restore_displays (CCD is
  the sole isolation path), the always-empty Monitor.isolated field, and the
  vestigial reassert_isolation + PUNKTFUNK_ISOLATE_DISPLAYS knob; fix stale comments.
- dxgi.rs: downgrade leftover debug warns/infos (DuplicateOutput1 retry, FALLBACKS,
  hook-hits, AcquireNextFrame idle timeout) to debug!; remove the PUNKTFUNK_NO_CURSOR
  per-frame test knob.

Windows service (src/service.rs, `punktfunk-host service`):
- SCM supervisor (windows-service crate) that duplicates its LocalSystem token,
  retargets it to the active console session, and CreateProcessAsUserW's the host
  there (Sunshine/Apollo model) — relaunching on exit and console session switch,
  inside a kill-on-close job object so a service crash never orphans the host.
- install/uninstall/start/stop/status subcommands: one elevated `service install`
  registers an auto-start LocalSystem service + firewall rules + a default host.env.
- Config moves to %ProgramData%\punktfunk\host.env; config_dir() now resolves to
  %ProgramData%\punktfunk on Windows (replacing the APPDATA=C:\Users\Public hack),
  with a PUNKTFUNK_CONFIG_DIR override. Logs land in %ProgramData%\punktfunk\logs\.
- merged_env_block (shared with the WGC helper) now also carries RUST_LOG.
- docs/windows-service.md + scripts/windows/host.env.example; windows-host.md updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 18:44:15 +00:00
enricobuehler 6d611cf889 feat(host/windows): reference-counted SudoVDA monitor lifecycle (reuse on quick reconnect, teardown when idle)
User: tearing down + recreating the monitor per session is wrong both ways — a
fixed GUID collides on overlapping sessions, but a per-session GUID makes a new
screen on every reconnect; host-lifetime would leave a phantom display for
physical-screen users. Correct model = rock-solid state machine.

Replace the per-session create/REMOVE with a host-level reference-counted
manager (global MGR):
- States: Idle / Active{refs} / Lingering{until}.
- Connect (acquire): Idle→create; Lingering→reuse (cancel teardown, reconfigure
  if the mode changed) — the quick-reconnect reuse, no new screen/PnP chime;
  Active→refs++ (concurrent / Reconfigure-overlap), reconfigure on a mode change.
- Disconnect (release, via the MonitorLease keepalive Drop): refs-- ; at 0 →
  Lingering(now + PUNKTFUNK_MONITOR_LINGER_MS, default 10s).
- Background timer: Lingering past its deadline → REMOVE the monitor → Idle, so a
  physical screen returns ~10s after streaming stops.

Eliminates BOTH the cross-session REMOVE collision (teardown only at refs==0 +
expired grace) and the new-screen-on-reconnect, without a persistent phantom
display. The control-device handle is opened once (host-level) — a handle, not a
screen. SudoVdaDisplay is now a marker; the old create() body is create_monitor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:53:21 +00:00
enricobuehler ca375c7ce8 fix(host/windows): WGC mux — reuse the SudoVDA monitor + helper across secure switches (no teardown/recreate)
User: re-adding WGC brought back the teardown/recreate bug (audible disconnect/
connect on the secure<->normal switch). Cause: the secure->normal switch called
build() = vd.create() = IOCTL_REMOVE old SudoVDA monitor + IOCTL_ADD new one +
respawn the helper — the same teardown/recreate kernel stress we just eliminated
from DDA, now on the mux path.

Apply the same learning (reuse, don't tear down): the SudoVDA monitor and WGC
helper persist for the whole session; only the host-DDA leg opens (on secure)
and closes (on normal). On returning to normal, RESUME the still-alive helper
(drain its secure-dwell backlog + request a keyframe) instead of rebuilding.
The HDR-session colorspace restore (set_advanced_color(true) + helper rebuild)
is kept ONLY for bit_depth>=10 — an SDR session never changed the colorspace, so
it needs no rebuild at all. The secure switch already reuses the monitor
(open_dda on the existing target).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:27:50 +00:00
enricobuehler e8d885fb4f fix(host/windows): WGC relay — set SudoVDA color to match session bit depth at build (kill persisted HDR)
Re-test still broken: the WGC helper captured HDR FP16 BT.2020 PQ from the FIRST
frame (before any switch), feeding the 8-bit SDR encoder → broken normal-desktop
image. Root cause: the SudoVDA's advanced-color (HDR) state PERSISTS on the
monitor across sessions, so the 8-bit session inherited HDR left enabled by the
earlier broken toggle — and gating the per-switch toggles can't undo a state
that's already on at start.

Fix: in build() (runs on initial create + every mode-switch/return-from-secure
rebuild), force set_advanced_color(target, bit_depth>=10) BEFORE spawning the
WGC helper, with a 250ms settle if it changed. An 8-bit session now always
captures SDR via WGC (matching the encoder); 10-bit keeps HDR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:18:41 +00:00
enricobuehler d2e536d299 fix(host/windows): WGC relay — don't force HDR on SDR sessions across the secure mux
Re-enabling the WGC relay brought back a broken image on the secure->normal
switch. Log root cause: on returning to the normal desktop the relay called
set_advanced_color(target, true) to 'restore HDR', so the rebuilt WGC helper
captured HDR FP16 BT.2020 PQ while the session encoder is 8-bit SDR -> format
mismatch (the 'HDR gets restored when flipping back to WGC' bug).

Gate BOTH set_advanced_color toggles on bit_depth>=10. An SDR (8-bit) session
now stays SDR across WGC<->DDA switches (no HDR force, no needless topology
change); HDR sessions keep the drop-on-secure / restore-on-normal behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:13:02 +00:00
enricobuehler f469dfcc76 chore(host/windows): clean up DDA capture — fix unused imports, quiet secure-desktop log, sane retry default
- Remove 4 unused imports (PCWSTR in composed_flip, anyhow macro + SizeInt32 in
  wgc, Write in wgc_relay).
- DuplicateOutput1 retry defaults to N=1 (immediate legacy): on the secure
  desktop DuplicateOutput1 is LOGON_UI-only so it always refuses, and the
  release-before-reduplicate + gentle recovery keep the legacy dup stable;
  retrying there only blocked. Still env-tunable (PUNKTFUNK_DUP_RETRY_N/_MS).
- Throttle the 'using legacy DuplicateOutput' warning (expected + once-per-gentle-
  recovery on secure) so a lock dwell doesn't flood the log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:05:02 +00:00
enricobuehler dc734c711b fix(host/windows): re-sync thread desktop on EVERY recovery (symmetric enter/leave secure)
User's observation: entering UAC/lock works instantly, but clicking OUT of it
breaks (with the disconnect sound) — Apollo's enter and leave are symmetric.
Root cause: attach_input_desktop() (SetThreadDesktop to the current input
desktop) was gated behind is_secure_desktop() in recreate_dupl, so:
- Default->Winlogon (enter): is_secure==true -> re-attach to Winlogon -> works.
- Winlogon->Default (leave): is_secure==false -> SKIP re-attach -> the capture
  thread stays stuck on the now-gone Winlogon desktop -> every rebuild fails ->
  no frames -> client timeout -> session ends -> SudoVDA removed (the disconnect
  sound).

Fix: call attach_input_desktop() UNCONDITIONALLY on every rebuild (Apollo calls
syncThreadDesktop before every duplicate), so leaving secure re-attaches to the
returned desktop. reassert_isolation stays secure-only. Also stop leaking the
HDESK (CloseDesktop right after SetThreadDesktop, like Apollo) so calling it on
every recovery is safe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 16:57:20 +00:00
enricobuehler 9a9214a2d8 fix(host/windows): gentle DDA recovery — stop the tight teardown/recreate loop
Per the user's insight: on the secure (Winlogon) desktop the duplication dies on
every independent-flip, and our tight recovery loop tore it down + recreated it
hundreds of times/sec — that release/recreate cycle is the real kernel stress,
and it stalled the send thread long enough that the client timed out ('display
disconnected'). Normal-desktop streaming is already solid (per-session GUID
killed the collision); this only changes the loss-recovery cadence.

Gentle recovery (user chose 'keep session alive'):
- cap the cheap re-duplicate to PUNKTFUNK_RECOVER_MS (default 250ms, was 5ms)
- cap the heavy new-device rebuild to PUNKTFUNK_REBUILD_MS (default 1500ms, was
  250ms) — it's the costliest teardown, throttled hardest
- repeat the last frame between attempts (no busy-spin, no 8ms sleep)

~200/s -> ~4/s teardown/recreate during a secure dwell. The session survives
lock/UAC (frozen/laggy secure screen, then clean resume on unlock) instead of
churning the kernel into a disconnect. Both cadences env-tunable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 16:41:03 +00:00
enricobuehler 2f7c021cac fix(host/windows): per-session SudoVDA monitor GUID (stop overlapping-session monitor teardown)
User observed: 'display disconnected' + freeze with NO context change, and
'first switch happy, subsequent slower, then chaos under stress'. Log shows the
cause: MONITOR_GUID was a FIXED constant, so overlapping sessions (a client
RECONNECTING after a freeze before the old session tore down, or concurrent
sessions) all map to the SAME SudoVDA monitor (same GUID -> IOCTL_ADD reuses
target 257). When the old session ends, its IOCTL_REMOVE tears the monitor down
OUT FROM UNDER the live session -> 'display disconnected' + the late
E_INVALIDARG/MODE_CHANGE failures (output vanished mid-session) -> cascade.

Fix: next_monitor_guid() returns a unique GUID per (process, session) [base GUID
with low 48-bit node = pid<<16 | session#]; create() threads it into AddParams
AND the keepalive (which REMOVEs by it). Each session now owns its own monitor;
one ending can't kill another. (The 200ms DuplicateOutput1 retry confirmed
working — 'succeeded on retry' logged; the residual failures were this
collision, not the race.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 16:20:26 +00:00
enricobuehler ce84861e3a fix(host/windows): DuplicateOutput1 retry wait 200ms (Apollo's value), env-tunable
The old-dup kernel teardown takes ~200ms (Apollo waits exactly that), so the
previous 2-16ms retries were too short and still fell through to the churning
legacy dup. Bump to PUNKTFUNK_DUP_RETRY_MS (default 200) x PUNKTFUNK_DUP_RETRY_N
(default 6) so the robust DuplicateOutput1 dup wins the race. Env-tunable for
on-box dialing without a rebuild.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 16:07:52 +00:00
enricobuehler eb451d8bc6 fix(host/windows): retry DuplicateOutput1 to ride out the old-dup teardown race
User's insight, and it fits the evidence exactly: in duplicate_output the FIRST
DuplicateOutput1 (called microseconds after the caller releases the old
duplication via self.dupl=None) returns E_ACCESSDENIED, but the legacy
DuplicateOutput a beat later SUCCEEDS — the only difference is TIMING. The
kernel-side teardown of the just-released duplication is async, so the immediate
DuplicateOutput1 races it ('output still duplicated' -> E_ACCESSDENIED). We then
fell straight through to legacy DuplicateOutput, which 'succeeds' into a FRAGILE
dup that churns ACCESS_LOST/MODE_CHANGE every few ms on this cross-GPU IDD
(causing the post-login freeze + UAC-confirm drop).

Fix: retry DuplicateOutput1 up to 5x with escalating 2/4/8/16 ms waits before
falling back to legacy, so the teardown finishes and the ROBUST DuplicateOutput1
dup succeeds (no churn). Bounded (~30 ms worst case) so a genuine failure still
falls back quickly. This is exactly Apollo's 2x/200ms retry rationale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 16:02:22 +00:00
enricobuehler 1e1e5ce9b5 fix(host/windows): Option-handle the multi-line dupl.GetFramePointerShape call too
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 15:41:41 +00:00
enricobuehler da43b5e8d3 fix(host/windows): release the old duplication before re-duplicating (THE born-lost bug)
DuplicateOutput1 returned E_ACCESSDENIED ~8815x even with PER_MONITOR_AWARE_V2
confirmed on the capture thread (thread_is_v2=true) — so DPI was NOT the cause.
The real cause: DXGI permits only ONE IDXGIOutputDuplication per output, and on
ACCESS_LOST you MUST release the old one before re-duplicating. Our recovery
(try_reduplicate / recreate_dupl) created the NEW duplication while the OLD
self.dupl was still alive → the output stayed held → DuplicateOutput1
E_ACCESSDENIED and the legacy fallback returned a BORN-LOST dup. It never
converged because there was always exactly one stale dup alive at creation
time. The initial open() works precisely because there's no prior dup; Apollo
is clean because it releases (dup.reset()) before every re-DuplicateOutput.

Fix: make self.dupl an Option and set it to None (drop → release the output)
BEFORE duplicate_output in try_reduplicate and before reopen_duplication in
recreate_dupl, then Some(new). acquire() gets a None-guard that synthesizes
ACCESS_LOST (routes into recovery) so a transient None can't panic. All
ReleaseFrame/AcquireNextFrame sites updated for the Option.

This is the documented DDA recovery requirement and the one thing that
distinguished our failing DuplicateOutput1 from Apollo's working one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 15:40:50 +00:00
enricobuehler c8fb4822a2 fix(host/windows): per-thread Per-Monitor-V2 DPI awareness so DuplicateOutput1 succeeds
The remaining born-lost ACCESS_LOST storm traces to ONE thing: our
IDXGIOutput5::DuplicateOutput1 returns E_ACCESSDENIED (0x80070005) ~4370x, so
we fall back to legacy DuplicateOutput, which yields a BORN-LOST duplication on
this hybrid box. Apollo's DuplicateOutput1 SUCCEEDS on the identical
desktop/output/4090-device → a working dup, clean capture.

Root cause: DuplicateOutput1 REQUIRES Per-Monitor-Aware-V2. At startup our
SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2) FAILS with E_ACCESSDENIED
('already set' — a manifest/runtime locked the process to a lower awareness),
and GetAwarenessFromDpiAwarenessContext reports 2 for BOTH Per-Monitor V1 and
V2, so the earlier 'awareness=2' was misleading — the process is likely V1,
which DuplicateOutput1 rejects with E_ACCESSDENIED. (Legacy DuplicateOutput has
no V2 requirement, so it 'worked' but born-lost.)

Fix: SetThreadDpiAwarenessContext(PER_MONITOR_AWARE_V2) on the capture thread
in open() — a per-thread override that takes regardless of the process default,
so DuplicateOutput1 can succeed (the working dup Apollo gets). Logs set_ok +
thread_is_v2 (via AreDpiAwarenessContextsEqual) to confirm V2 actually applied.
Topology fixes (sole display, no MODE_CHANGE) and the recovery backstops stay.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 15:29:17 +00:00
enricobuehler c60a05dbe9 fix(host/windows): make SudoVDA the sole display via clean CCD (the IDD needs to be primary/composited)
Live result of the previous build: the MODE_CHANGE_IN_PROGRESS storm was FIXED
(0 occurrences) by dropping primary-promotion — but it exposed the regression
the review predicted: a non-primary EXTENDED SudoVDA is NOT DWM-composited on
this box, so DDA gets born-lost ACCESS_LOST (0x887a0026) + black frames. The
IDD genuinely must be the sole/primary/composited display here.

Apollo reaches that end state ('Virtual Desktop: 5120x1440', sole display) via
Windows AUTO-promoting the real WDDM display over the box's leftover 1024x768
basic display — but Windows does NOT auto-promote for us, leaving the IDD
extended. So make it sole explicitly, the clean way:
- create(): deactivate the other display(s) via the atomic CCD path
  (isolate_displays_ccd) by DEFAULT (opt out with PUNKTFUNK_NO_ISOLATE). Drop
  the legacy per-device GDI detach from the path (it misses iGPU-attached
  monitors and churns; kept #[allow(dead_code)] for reference).
- set_active_mode(): CDS_UPDATEREGISTRY only — set the mode in place, NO
  CDS_SET_PRIMARY / CDS_GLOBAL / DM_POSITION. A sole display is already primary,
  so there's nothing to contest → no MODE_CHANGE storm (that storm came from
  promoting primary at (0,0) WHILE the basic display was still active).

Net: sole SudoVDA → primary → composited → capturable, with no topology
contest. Keeps the prior MODE_CHANGE-as-transient handling + removed born-lost
escape as backstops.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 15:12:31 +00:00
enricobuehler 769fd96b87 fix(host/windows): stop SudoVDA MODE_CHANGE_IN_PROGRESS storm — don't force IDD primary by default
ROOT CAUSE (verified by multi-agent compare vs Apollo + adversarial review):
set_active_mode() applied the SudoVDA mode with CDS_UPDATEREGISTRY | CDS_GLOBAL
| CDS_SET_PRIMARY + DM_POSITION(0,0) — promoting the freshly-added IDD to
PRIMARY at the virtual-screen origin and persisting it globally. On this box
(baseline active display = a 1024x768 basic 'WinDisc') that primary-promotion
contests the existing display so the desktop topology never reaches a stable
fixed point → every DuplicateOutput/AcquireNextFrame during the unending
settle returns DXGI_ERROR_MODE_CHANGE_IN_PROGRESS (0x887A0025). Apollo, live
on this EXACT box with an empty config, never promotes primary and captures
the same SudoVDA at 5120x1440 with zero DXGI errors. (Ruled out earlier on the
live box: win32u hook, DPI, independent-flip/overlay, isolation, render pin.)

Fixes (subtractive, gated per adversarial review):
- sudovda.rs set_active_mode: default to CDS_UPDATEREGISTRY only (no primary
  promotion, no GLOBAL, no DM_POSITION) = Apollo-parity for the multi-display
  default. Promote to primary (CDS_GLOBAL|CDS_SET_PRIMARY+DM_POSITION) ONLY
  when PUNKTFUNK_ISOLATE_DISPLAYS=1 (sole display, where a blank extended IDD
  would otherwise yield no frames). Avoids regressing headless/isolated +
  mid-stream Reconfigure.
- dxgi.rs acquire: treat MODE_CHANGE_IN_PROGRESS (0x887A0025) as a TRANSIENT
  (Ok(None), repeat last frame, wait it out) instead of falling through to the
  fatal Err arm → cold-rebuild → create()→set_active_mode (which re-issued the
  mode change and amplified the storm).
- dxgi.rs acquire: remove the born-lost cold-rebuild escape — it re-created the
  SudoVDA (IOCTL REMOVE/ADD = the audible PnP chime the user heard) and never
  converged; now repeat last frame in-process (never tear the IDD down mid-
  session, like Apollo). Overlay + cheap-spin/HDR recovery left intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:59:42 +00:00
enricobuehler 900089c44c fix(host/windows): don't pin SudoVDA render adapter by default (Apollo parity)
GROUND TRUTH from Apollo streaming live on this exact box (empty config):
captures the SudoVDA at 5120x1440@240 on the RTX 4090 with ZERO ACCESS_LOST /
born-lost / MODE_CHANGE -- clean, no overlay, no isolation, no render pin. That
disproves the independent-flip theory (a sole SudoVDA captures fine here) and
points at something WE do that Apollo doesn't.

The concrete culprit: we call SET_RENDER_ADAPTER, which this driver IGNORES
(logs 'render adapter DIFFERS from pinned add=0x23664 pinned=0x15768') and the
IDD ends up rendering on adapter 0x23664 while its DXGI output is enumerated
under the 4090 (0x15768) where we create the capture device -- a cross-GPU
mismatch that is the real source of the perpetual ACCESS_LOST +
MODE_CHANGE_IN_PROGRESS (0x887A0025) storm. Apollo never pins (empty config),
so its IDD stays on its natural adapter, aligned with capture.

Make the render pin OPT-IN (PUNKTFUNK_RENDER_ADAPTER=<name>); default to NOT
pinning, matching Apollo. The startup log now shows the resulting AddOut LUID
so we can confirm the IDD lands on the 4090.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:37:31 +00:00
enricobuehler cd72164db2 fix(host/windows): keep multi-display (Apollo parity) instead of sole-display isolation
CONFIRMED on the live RTX4090+iGPU box: hook fires+verified, DPI=2, overlay
running, yet the stream STILL freezes -- born-lost dropped but MODE_CHANGE_IN_
PROGRESS (0x887A0025) churn took over (2284x) and frames go stale. Root cause
is the topology itself: create() makes SudoVDA the SOLE active display
(CDS_SET_PRIMARY + isolate_displays + isolate_displays_ccd), and a sole display
on a hybrid box goes into fullscreen independent-flip / MPO that Desktop
Duplication cannot capture.

Apollo is rock solid on this EXACT box because it does the opposite: it keeps
the physical monitor ACTIVE and arranges the virtual display alongside it
(rearrangeVirtualDisplayForLowerRight, 'Do not change the primary'). Multi-
display is DWM-composited, so the output never independent-flips.

Make isolation OPT-IN (PUNKTFUNK_ISOLATE_DISPLAYS=1) and default to NOT
isolating -- match Apollo's multi-display topology. SudoVDA stays primary (so
it carries the shell -> frames) but other monitors stay active, which disables
independent-flip. reassert_isolation honors the same flag (re-isolating mid-
stream would itself trigger the storm). Keeps the overlay + born-lost escape
as belt-and-suspenders.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:23:20 +00:00
enricobuehler 5f84c5785c fix(host/windows): force-composed-flip overlay in the single-process DDA path
CONFIRMED root cause via instrumented build: hook_hits=1+ (win32u hook fires,
verified-patched) and DPI awareness=2 (PER_MONITOR), yet the born-lost
ACCESS_LOST storm persists with 100% DuplicateOutput1 E_ACCESSDENIED. That
rules out reparenting (the hook works) and DPI -> it is fullscreen
independent-flip / MPO: the SudoVDA virtual display, isolated as the SOLE
active output, scans out one plane on one display, bypassing DWM composition,
so Desktop Duplication gets a born-lost duplication.

Apollo never hits this because it runs WITH a physical monitor attached
(multi-display is already DWM-composited); we isolate to sole-display, so we
must force composition ourselves. The fix already existed (ForceComposedFlip,
a tiny topmost layered overlay that disqualifies independent-flip) but was
only wired into the WGC relay path's secure branch, which PUNKTFUNK_NO_WGC=1
disables. Wire it into virtual_stream unconditionally (DDA owns the normal
desktop here, where the storm is). Held for the session; Drop tears it down;
PUNKTFUNK_FORCE_COMPOSED=0 disables.

Keeps the prior build's born-lost escape as a safety net.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:08:59 +00:00
enricobuehler 63b63a4010 fix(host/windows): instrument + harden DDA against the born-lost ACCESS_LOST storm
The hybrid RTX4090+iGPU box storms DXGI_ERROR_ACCESS_LOST (0x887A0026) +
MODE_CHANGE_IN_PROGRESS (0x887A0025) ~3s after first frame: every rebuilt
duplication is born-lost (created OK, first AcquireNextFrame instantly
ACCESS_LOST), seeds black, retries forever. The steady-state m3 loop calls
try_latest()->acquire() which returns Ok(None) on every recovery, so the
cold-rebuild escape (MAX_CAPTURE_REBUILDS) was unreachable -> frozen stream.

Multi-agent root-cause + adversarial review point at the win32u GPU-pref hook
being ineffective (patched on the main thread, no FlushInstructionCache, never
verified) rather than the synthesis's independent-flip theory (Apollo has no
overlay yet is stable on this exact box).

This build instruments + applies the safe, high-probability fixes:
- Hook: FlushInstructionCache after the inline patch (cross-thread i-cache);
  read back the 12 patched bytes and error! if they didn't land; per-call hit
  counter (hybrid_hook_hits) logged after open -- hits==0 proves the hook is
  off DXGI's reparent path.
- DPI: log SetProcessDpiAwarenessContext result + effective awareness (need
  2=PER_MONITOR for DuplicateOutput1; explains the 100% E_ACCESSDENIED).
- SetThreadExecutionState(ES_CONTINUOUS|ES_DISPLAY_REQUIRED|ES_SYSTEM_REQUIRED)
  at capture open, restored on Drop -- stop IDD idle-invalidation (Apollo does
  this too).
- Born-lost escape: count consecutive born-lost rebuilds; on the NORMAL desktop
  (never the secure/Winlogon dwell) escalate to Err after ~5s so the m3 loop
  cold-rebuilds the whole pipeline instead of freezing on the last frame.

Diagnostic-forward: one test now tells us hook-hits + DPI awareness + whether
ExecutionState/desktop-sync alone fixes it, and the stream self-recovers
instead of wedging.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:02:55 +00:00