refactor: drop milestone names + consolidate clients; loss-recovery & rumble fixes
apple / swift (push) Failing after 40s
audit / cargo-audit (push) Failing after 1m12s
windows-msix / package (push) Successful in 1m37s
windows / build (push) Successful in 1m14s
android / android (push) Successful in 4m48s
ci / web (push) Successful in 27s
ci / rust (push) Successful in 4m21s
ci / docs-site (push) Successful in 31s
ci / bench (push) Successful in 4m39s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 19s
deb / build-publish (push) Successful in 6m3s
flatpak / build-publish (push) Successful in 4m13s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m16s
docker / deploy-docs (push) Successful in 18s

Two bodies of work in one commit (the rename moved files the fixes also touched).

Naming/structure cleanup (pre-launch):
- Host modules m3.rs->punktfunk1.rs, m0.rs->spike.rs; CLI m3-host->punktfunk1-host,
  m0->spike; bare `punktfunk-host` now prints help. Types M3Options/M3Source->
  Punktfunk1Options/Punktfunk1Source.
- Clients consolidated out of crates/ into clients/: punktfunk-client-rs->
  clients/probe (crate punktfunk-probe), client-linux->clients/linux,
  client-windows->clients/windows, punktfunk-android->clients/android/native
  (crate punktfunk-client-android; kept [lib] name=punktfunk_android so the JNI
  contract is unchanged). crates/ now holds only core + host.
- Milestone codes M0-M4 purged from code/CLI/CLAUDE.md/README/docs/docs-site,
  kept only in docs/implementation-plan.md. docs/m2-plan.md->
  docs/gamestream-host-plan.md. CI/gradle/flatpak paths updated.

Client loss-recovery (video froze and never recovered after a brief drop):
- Export punktfunk_connection_frames_dropped through the C ABI (the core already
  tracked it for the client keyframe-recovery loop; it was never reachable from
  the ABI clients). Regenerated punktfunk_core.h.
- Apple (StreamPump + Stage2Pipeline) and Android (decode.rs) now poll
  frames_dropped and request a keyframe whenever the counter increases -- the
  same loss-driven recovery Linux/Windows already had. With an infinite GOP the
  decoder silently conceals frames whose references are missing, so the
  decode-error trigger rarely fires.
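The commit does not show the exported signature or the Apple/Android polling code. The Rust sketch below is a rough illustration only and rests on several assumptions:

- The counter is a monotonically increasing `u64` held in an atomic.
- `punktfunk_connection_frames_dropped` takes a connection pointer and returns 0 for null. This signature is assumed.
- The `Connection` struct and the client-side `KeyframeRecovery` helper are hypothetical.
- The cooldown between requests is an added assumption, not something the commit states.
- `#[unsafe(no_mangle)]` needs Rust 1.82 or later.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the core connection; the real struct is not
/// shown in this commit.
#[derive(Default)]
pub struct Connection {
    frames_dropped: AtomicU64,
}

impl Connection {
    /// Receive path: a frame could not be reassembled or recovered.
    pub fn note_dropped_frame(&self) {
        self.frames_dropped.fetch_add(1, Ordering::Relaxed);
    }
}

/// Assumed C-ABI shape: cumulative dropped-frame count, 0 for a null handle.
///
/// # Safety
/// `conn` must be null or point to a live `Connection` owned by the library.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn punktfunk_connection_frames_dropped(conn: *const Connection) -> u64 {
    match unsafe { conn.as_ref() } {
        Some(c) => c.frames_dropped.load(Ordering::Relaxed),
        None => 0,
    }
}

/// Hypothetical client-side policy: request a keyframe when the counter climbs.
pub struct KeyframeRecovery {
    last_dropped: u64,
    last_request: Option<Instant>,
    min_interval: Duration, // assumed cooldown, not stated in the commit
}

impl KeyframeRecovery {
    pub fn new(min_interval: Duration) -> Self {
        KeyframeRecovery { last_dropped: 0, last_request: None, min_interval }
    }

    /// Returns true when a keyframe request should be sent now.
    pub fn poll(&mut self, dropped: u64, now: Instant) -> bool {
        if dropped < self.last_dropped {
            // Counter went backwards (e.g. a new connection): resync, no request.
            self.last_dropped = dropped;
            return false;
        }
        if dropped == self.last_dropped {
            return false; // nothing new since the last request or resync
        }
        if let Some(prev) = self.last_request {
            if now.duration_since(prev) < self.min_interval {
                // last_dropped is left untouched, so this loss stays pending
                // and triggers a request once the cooldown expires.
                return false;
            }
        }
        self.last_dropped = dropped;
        self.last_request = Some(now);
        true
    }
}
```

In this sketch, reading the counter is a single relaxed atomic load. A client can therefore call it once per decoded frame without measurable cost. Relaxed ordering is enough because nothing else is synchronized through the counter.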

Apple rumble robustness (rumble worked at first, then became intermittent -- DualSense + Xbox):
- Add CHHapticEngine stopped/reset handlers (rebuild on app background / audio
  interruption / server reset) and drop the permanent `broken` latch on a
  transient drive failure; latch only when the controller truly has no haptics.
- Surface previously swallowed SDL set_rumble errors on Linux/Windows and add
  diagnostic logging.
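The Apple fix itself is Swift code built on `CHHapticEngine`, and the commit does not show it. The latch policy it describes is language-neutral, so here is a small Rust sketch of that policy. `DriveError` and `RumbleState` are hypothetical names. The policy works as follows:

- A stop or reset event requests an engine rebuild.
- A transient failure is logged and retried on the next rumble, but never latched.
- Only a controller with no haptics at all is marked permanently unsupported.

```rust
/// Hypothetical classification of a failed attempt to play a rumble effect.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriveError {
    /// May succeed on a later attempt (e.g. the engine is mid-restart).
    Transient,
    /// The controller reports no haptics capability at all.
    NoHaptics,
}

/// Hypothetical per-controller rumble state.
#[derive(Debug, Default)]
pub struct RumbleState {
    unsupported: bool,       // the only permanent latch
    needs_rebuild: bool,     // set by the engine stopped/reset handlers
    transient_failures: u32, // kept for diagnostic logging
}

impl RumbleState {
    /// Engine stopped or reset (app backgrounded, audio interruption, server reset).
    pub fn on_engine_stopped_or_reset(&mut self) {
        self.needs_rebuild = true;
    }

    /// Returns true once per stop/reset; the caller then recreates the engine.
    pub fn take_rebuild(&mut self) -> bool {
        std::mem::take(&mut self.needs_rebuild)
    }

    pub fn can_rumble(&self) -> bool {
        !self.unsupported
    }

    pub fn on_drive_result(&mut self, result: Result<(), DriveError>) {
        match result {
            Ok(()) => self.transient_failures = 0,
            // Do not latch: log it and let the next rumble try again.
            Err(DriveError::Transient) => {
                self.transient_failures += 1;
                eprintln!("rumble: transient failure #{}", self.transient_failures);
            }
            Err(DriveError::NoHaptics) => self.unsupported = true,
        }
    }
}
```

In a Swift port, `on_engine_stopped_or_reset` would be called from the engine's `stoppedHandler` and `resetHandler`.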

Verified: cargo build/clippy/fmt --workspace, C-ABI harness, header drift.
Not runnable on this box (verify in CI): Gitea workflows, gradle/Android,
flatpak, Swift/decky.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:03:55 +00:00
parent 1faa6c6ad4
commit 9c8fa9340c
110 changed files with 534 additions and 341 deletions
+40 -40
@@ -37,12 +37,12 @@ Apollo is host-only. A stream flows: **nvhttp** (HTTPS pairing + serverinfo/appl
 | Apollo subsystem | Apollo key files | punktfunk counterpart |
 |---|---|---|
-| Apollo — Protocol & streaming (RTP/FEC/ENet/RTSP/crypto) | `stream.cpp`; `rtsp.cpp`; `crypto.cpp`; `network.cpp`; `rtsp.h`; `stream.h` | `gamestream/stream.rs`; `gamestream/video.rs`; `gamestream/rtsp.rs`; `gamestream/control.rs`; `gamestream/audio.rs`; `m3.rs` |
+| Apollo — Protocol & streaming (RTP/FEC/ENet/RTSP/crypto) | `stream.cpp`; `rtsp.cpp`; `crypto.cpp`; `network.cpp`; `rtsp.h`; `stream.h` | `gamestream/stream.rs`; `gamestream/video.rs`; `gamestream/rtsp.rs`; `gamestream/control.rs`; `gamestream/audio.rs`; `punktfunk1.rs` |
 | Apollo (Sunshine fork) — GameStream HTTP server, pairing ceremony, and discovery/UPnP | `nvhttp.cpp`; `nvhttp.h`; `httpcommon.cpp`; `upnp.cpp`; `crypto.h`; `crypto.cpp` | `gamestream/nvhttp.rs`; `gamestream/pairing.rs`; `gamestream/tls.rs`; `gamestream/cert.rs`; `gamestream/serverinfo.rs`; `gamestream/mod.rs` |
 | Apollo — Video encode pipeline & codec config | `video.cpp`; `nvenc_base.cpp`; `video_colorspace.cpp`; `cbs.cpp`; `nvenc_config.h`; `video.h` | `encode.rs`; `encode/nvenc.rs`; `encode/sw.rs`; `encode/linux.rs`; `zerocopy/mod.rs`; `gamestream/control.rs` |
-| Apollo — Audio capture, encode, transport (Windows host) | `audio.cpp`; `audio.h`; `audio.cpp`; `common.h`; `stream.cpp` | `audio.rs`; `audio/wasapi_cap.rs`; `audio/linux.rs`; `gamestream/audio.rs`; `m3.rs` |
+| Apollo — Audio capture, encode, transport (Windows host) | `audio.cpp`; `audio.h`; `audio.cpp`; `common.h`; `stream.cpp` | `audio.rs`; `audio/wasapi_cap.rs`; `audio/linux.rs`; `gamestream/audio.rs`; `punktfunk1.rs` |
 | Apollo (Sunshine fork) — Input handling & injection | `input.cpp`; `input.cpp`; `keylayout.h`; `misc.cpp` | — |
-| Apollo: App/process launch & display configuration (Windows host) | `process.cpp`; `display_device.cpp`; `process.h`; `virtual_display.h`; `misc.cpp`; `utils.cpp` | `vdisplay/sudovda.rs`; `vdisplay.rs`; `gamestream/apps.rs`; `library.rs`; `m3.rs`; `capture/wgc_relay.rs` |
+| Apollo: App/process launch & display configuration (Windows host) | `process.cpp`; `display_device.cpp`; `process.h`; `virtual_display.h`; `misc.cpp`; `utils.cpp` | `vdisplay/sudovda.rs`; `vdisplay.rs`; `gamestream/apps.rs`; `library.rs`; `punktfunk1.rs`; `capture/wgc_relay.rs` |
 | Apollo: Config, management/web UI, system tray | `config.h`; `config.cpp`; `confighttp.cpp`; `confighttp.h`; `system_tray.cpp`; `system_tray.h` | `mgmt.rs`; `mgmt_token.rs`; `main.rs`; `native_pairing.rs`; `library.rs`; `docs/windows-host.md` |
 ### Apollo — Protocol & streaming (RTP/FEC/ENet/RTSP/crypto)
@@ -523,7 +523,7 @@ For the **shared GameStream wire protocol**, punktfunk is at par with Apollo and
 **How punktfunk does it.**
-punktfunk runs **two protocol planes** that share one crypto/FEC core (`punktfunk-core`), with the streaming logic split across the GameStream-compat host (`crates/punktfunk-host/src/gamestream/`) and the native punktfunk/1 host (`m3.rs`).
+punktfunk runs **two protocol planes** that share one crypto/FEC core (`punktfunk-core`), with the streaming logic split across the GameStream-compat host (`crates/punktfunk-host/src/gamestream/`) and the native punktfunk/1 host (`punktfunk1.rs`).
 #### GameStream-compat plane (mirrors Apollo's wire protocol)
 - **RTSP** (`gamestream/rtsp.rs`): a hand-rolled TCP/48010 handler, one request per connection (`handle_conn` rtsp.rs:58, closes the socket so moonlight-common-c sees end-of-response). Maps OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY/TEARDOWN (rtsp.rs:124). DESCRIBE (`describe_sdp` rtsp.rs:224) emits HEVC/AV1 indicators and the six Opus `surround-params` lines in Sunshine's normal-before-HQ order (rtsp.rs:239-251). ANNOUNCE parses `x-nv-*` keys into a `StreamConfig` (rtsp.rs:270) + `AudioParams` (rtsp.rs:313), snapping Opus packet duration to 5/10 ms (rtsp.rs:327). PLAY launches video (`stream::start`) and audio (`audio::start`). **Notable: it advertises `encryptionSupported:0` (rtsp.rs:228) — streams are plaintext; only the ENet control plane is encrypted.**
@@ -533,16 +533,16 @@ punktfunk runs **two protocol planes** that share one crypto/FEC core (`punktfun
 - **Pairing crypto** (`gamestream/crypto.rs`): AES-128-ECB-no-pad + SHA-256 + RSA, the GameStream pairing primitives.
 #### Native punktfunk/1 plane (a strict superset, no Apollo equivalent)
-- `m3.rs` drives QUIC control (Hello/Welcome/Start/Reconfigure/ClockProbe) + the hardened M1 `Session` data plane over raw UDP.
+- `punktfunk1.rs` drives QUIC control (Hello/Welcome/Start/Reconfigure/ClockProbe) + the hardened core `Session` data plane over raw UDP.
 - The data plane (`punktfunk-core/src/session.rs`, `crypto.rs`, `fec/`, `transport/udp.rs`) uses **GF(2¹⁶) Leopard FEC** (65535-shard ceiling, fec/mod.rs:5) + per-direction-salted AES-GCM with seq-as-AAD (crypto.rs:1-20).
-- Transport (`transport/udp.rs`) has batched `sendmmsg`/`recvmmsg` AND **UDP GSO on both Linux (`UDP_SEGMENT`, udp.rs:97) and Windows (`WSASendMsg`+`UDP_SEND_MSG_SIZE`/USO, on by default, udp.rs:135-241)** with auto-fallback latching. The host send path is a dedicated `send_loop` (m3.rs:1804) doing FEC+seal+microburst-paced send (`paced_submit` m3.rs:1720) off the encode thread.
-- Adds mid-stream `Reconfigure`, an 8-round NTP clock-skew handshake, and `ProbeRequest`/`run_probe_burst` bandwidth probing (m3.rs:1629) — none of which exist in GameStream/Apollo.
+- Transport (`transport/udp.rs`) has batched `sendmmsg`/`recvmmsg` AND **UDP GSO on both Linux (`UDP_SEGMENT`, udp.rs:97) and Windows (`WSASendMsg`+`UDP_SEND_MSG_SIZE`/USO, on by default, udp.rs:135-241)** with auto-fallback latching. The host send path is a dedicated `send_loop` (punktfunk1.rs:1804) doing FEC+seal+microburst-paced send (`paced_submit` punktfunk1.rs:1720) off the encode thread.
+- Adds mid-stream `Reconfigure`, an 8-round NTP clock-skew handshake, and `ProbeRequest`/`run_probe_burst` bandwidth probing (punktfunk1.rs:1629) — none of which exist in GameStream/Apollo.
 **Intentional divergences (by design, not gaps):**
-- Two protocol planes from one core: punktfunk keeps the GameStream wire protocol AND its own punktfunk/1 (QUIC control + raw-UDP GF(2^16) data plane), where Apollo is GameStream-only. The native plane intentionally expresses things GameStream cannot: GF(2^16) Leopard FEC (65535-shard ceiling vs Apollo's 255-shard ~1 Gbps wall, fec/mod.rs:5), mid-stream Reconfigure, an NTP clock-skew handshake, and active bandwidth probing (m3.rs:1629).
-- Zero-copy by design: punktfunk's video packetizer and the native seal path are built around reusable buffers and a wire-buffer pool (m3.rs `reclaim_wires`), and the capture→encoder path is GPU zero-copy. Apollo's stream.cpp:676 also does zero-copy shard pointers, but punktfunk's choice is a core invariant, not just an optimization.
+- Two protocol planes from one core: punktfunk keeps the GameStream wire protocol AND its own punktfunk/1 (QUIC control + raw-UDP GF(2^16) data plane), where Apollo is GameStream-only. The native plane intentionally expresses things GameStream cannot: GF(2^16) Leopard FEC (65535-shard ceiling vs Apollo's 255-shard ~1 Gbps wall, fec/mod.rs:5), mid-stream Reconfigure, an NTP clock-skew handshake, and active bandwidth probing (punktfunk1.rs:1629).
+- Zero-copy by design: punktfunk's video packetizer and the native seal path are built around reusable buffers and a wire-buffer pool (punktfunk1.rs `reclaim_wires`), and the capture→encoder path is GPU zero-copy. Apollo's stream.cpp:676 also does zero-copy shard pointers, but punktfunk's choice is a core invariant, not just an optimization.
 - Native-thread hot path with QUIC only on the control plane: punktfunk forbids async on the per-frame path (tokio/quinn live only behind the quic feature, used for control/m3 orchestration). The per-frame video send loops are plain std::thread + blocking sockets, matching Apollo's threaded model rather than introducing async I/O on the data plane.
 - Encrypt-by-default native plane vs plaintext GameStream: punktfunk advertises encryptionSupported:0 for GameStream (plaintext video/audio, rtsp.rs:228) to match stock Moonlight expectations, but the native plane mandates per-direction-salted AES-GCM with seq-as-AAD (crypto.rs) — a deliberate security upgrade GameStream's optional/legacy crypto cannot match.
 - Control-plane GCM scheme auto-detection: rather than hardcoding one of Moonlight's nonce/tag/key permutations like Apollo does per negotiated flag, punktfunk brute-forces the authenticating combination once per connection (control.rs:289) — more robust across moonlight-common-c variants, a Rust-idiomatic divergence.
@@ -625,7 +625,7 @@ punktfunk has **three encoder back-ends behind one trait**, `Encoder` (`encode.r
 **10-bit HEVC Main10** on Windows: upconverts 8-bit ARGB via `pixelBitDepthMinus8=2` + Main10 profile GUID (`nvenc.rs:233-237`), and sets a BT.2020/PQ VUI when the capturer hands a 10-bit `Rgb10a2` frame (`nvenc.rs:243-254`). `validate_dimensions` (`encode.rs:83-99`) is the up-front gate on attacker/typo dimensions (zero/odd/over-max).
-**Bit-depth/HDR negotiation** happens at the protocol layer (`m3.rs:563-569`), not in a colorspace module; there is no separate colorspace negotiation file.
+**Bit-depth/HDR negotiation** happens at the protocol layer (`punktfunk1.rs:563-569`), not in a colorspace module; there is no separate colorspace negotiation file.
 **Intentional divergences (by design, not gaps):**
@@ -660,20 +660,20 @@ Richer than the Windows path: opens a sink-monitor capture (`STREAM_CAPTURE_SINK
 ##### Transport — GameStream path (`crates/punktfunk-host/src/gamestream/audio.rs`)
 Learns the client audio endpoint from a port-learning ping (audio.rs:305-315), reuses or reopens the persistent capturer when the channel count changes (audio.rs:318-335), and builds a per-session `SessionEncoder`: plain `opus` crate for stereo (RESTRICTED_LOWDELAY + hard CBR, audio.rs:357-365), or a hand-wrapped `audiopus_sys` multistream encoder for 5.1/7.1 (audio.rs:404-465, **Linux-only — Windows surround `bail!`s** at audio.rs:371-378). Each frame is AES-128-CBC encrypted under the `/launch` rikey with a per-packet IV (audio.rs:540-544), wrapped in a 12-byte RTP header (payload type 97, audio.rs:213-222), and paced to its packet-duration slot (audio.rs:592-598). Surround sessions add Sunshine-compatible RS(4,2) Reed-Solomon audio FEC (audio.rs:194-248, 553-583) with a verbatim OpenFEC matrix.
-##### Transport — native punktfunk/1 path (`crates/punktfunk-host/src/m3.rs:1379-1453`)
-The same capturer feeds an Opus stereo encoder (128 kbps, LOWDELAY, CBR — m3.rs:1401-1414) whose output goes straight into `encode_audio_datagram` over QUIC (0xC9, m3.rs:1437-1441). QUIC already encrypts, so no AES-CBC/RTP/FEC layer. Stereo-only.
+##### Transport — native punktfunk/1 path (`crates/punktfunk-host/src/punktfunk1.rs:1379-1453`)
+The same capturer feeds an Opus stereo encoder (128 kbps, LOWDELAY, CBR — punktfunk1.rs:1401-1414) whose output goes straight into `encode_audio_datagram` over QUIC (0xC9, punktfunk1.rs:1437-1441). QUIC already encrypts, so no AES-CBC/RTP/FEC layer. Stereo-only.
 ##### Shared lifecycle
-Both transports use the persistent `AudioCapSlot` (gamestream/audio.rs:251-257) and the "audio is best-effort" convention: a capture-open failure logs a warning and the session continues without sound (m3.rs:1395-1399, gamestream/audio.rs:332-334).
+Both transports use the persistent `AudioCapSlot` (gamestream/audio.rs:251-257) and the "audio is best-effort" convention: a capture-open failure logs a warning and the session continues without sound (punktfunk1.rs:1395-1399, gamestream/audio.rs:332-334).
 **Intentional divergences (by design, not gaps):**
-- Native protocol transport is fundamentally different and BETTER, not a gap: punktfunk/1 ships Opus as QUIC datagrams (0xC9, m3.rs:1437-1441) — QUIC provides encryption + the data plane carries GF(2^16) Leopard FEC, so there is no AES-CBC/RTP/Reed-Solomon-RS(4,2) layer like GameStream/Apollo. This is inexpressible in Apollo's Moonlight-only world.
+- Native protocol transport is fundamentally different and BETTER, not a gap: punktfunk/1 ships Opus as QUIC datagrams (0xC9, punktfunk1.rs:1437-1441) — QUIC provides encryption + the data plane carries GF(2^16) Leopard FEC, so there is no AES-CBC/RTP/Reed-Solomon-RS(4,2) layer like GameStream/Apollo. This is inexpressible in Apollo's Moonlight-only world.
 - One C-ABI core, cfg-dispatched backends: a single `AudioCapturer` trait (audio.rs:17-30) with PipeWire (Linux) and WASAPI (Windows) impls behind one `open_audio_capture` factory — vs Apollo's separate src/audio.cpp + per-platform src/platform/*/audio.cpp. punktfunk's capture, encode, and transport all stay synchronous on native threads (no async on the per-frame path).
 - GameStream surround is more correct than Apollo: punktfunk rotates the surround-params mapping over [3, channels) (gamestream/audio.rs:159-171) so 7.1 LFE/SL/SR round-trip after the client's GFE-order swap, where Sunshine/Apollo only rotate [3,6) and scramble 7.1 — verified by a real-codec round-trip test (gamestream/audio.rs:668-688, 750-818).
 - WASAPI silent-flag + buffer-straddle handling that Apollo does by hand is provided by the `wasapi` 0.23 crate's `read_from_device_to_deque` (wasapi_cap.rs:146) + a byte VecDeque carrying the remainder (wasapi_cap.rs:135,152-156). Not a missing behavior — it is delegated to the dependency, so Apollo's manual silent-flag lesson is largely already covered on the capture-correctness axis.
-- Audio is explicitly best-effort and decoupled: a failed open just logs and the session streams video-only (m3.rs:1395-1399, gamestream/audio.rs:332-334) — the same hard-won lesson Apollo encodes with its fail_guard, already adopted.
+- Audio is explicitly best-effort and decoupled: a failed open just logs and the session streams video-only (punktfunk1.rs:1395-1399, gamestream/audio.rs:332-334) — the same hard-won lesson Apollo encodes with its fail_guard, already adopted.
 **Transfer candidates from Apollo (6):** _Detect default-render-device changes and reinit WASAPI capture_, _Raise the WASAPI capture thread to MMCSS Pro Audio priority_, _Support 5.1/7.1 WASAPI loopback + multistream encode on Windows_, _Implement client-mic passthrough on Windows_, _Surface WASAPI data-discontinuity as a glitch diagnostic_, _Recover when there is no render endpoint at session start_ — see Part 4.
@@ -707,16 +707,16 @@ One virtual Xbox 360 pad per client index, lazily plugged on first `State` (`gam
 #### Threading — two different models
 - **GameStream**: injection happens **inline on the ENet service thread**; `on_receive` calls `inj.inject(&ev)` directly inside the `host.service()` loop (`gamestream/control.rs:84-91, 207-211`). Rumble is pumped each 2 ms tick (`control.rs:103-124`).
-- **punktfunk/1 (m3)**: a per-session `input_thread` (`m3.rs:1245-1344`) receives decoded events over an mpsc channel, routes pointer/keyboard to a **host-lifetime `injector_service_thread`** (`m3.rs:1011-1064`, `inj_tx.send(ev)` at `m3.rs:1300`) and gamepad events to the session `PadBackend`. It tracks `held_buttons`/`held_keys` and **synthesizes release events at session end** to avoid latched-button drag (`m3.rs:1267-1301, 1345-1368`). The injector service auto-reopens when the resolved backend changes mid-session (`m3.rs:1020-1029`).
+- **punktfunk/1 (m3)**: a per-session `input_thread` (`punktfunk1.rs:1245-1344`) receives decoded events over an mpsc channel, routes pointer/keyboard to a **host-lifetime `injector_service_thread`** (`punktfunk1.rs:1011-1064`, `inj_tx.send(ev)` at `punktfunk1.rs:1300`) and gamepad events to the session `PadBackend`. It tracks `held_buttons`/`held_keys` and **synthesizes release events at session end** to avoid latched-button drag (`punktfunk1.rs:1267-1301, 1345-1368`). The injector service auto-reopens when the resolved backend changes mid-session (`punktfunk1.rs:1020-1029`).
 **Intentional divergences (by design, not gaps):**
-- One core, one event struct: both protocols decode to the same fixed-size punktfunk_core::input::InputEvent (input.rs:108-150) that crosses the C ABI unchanged, and all backends implement one InputInjector trait (inject.rs:18-20). Apollo bridges through the platf:: abstraction but has no cross-process ABI — punktfunk's event is the same shape host-side and client-side (the embeddable NativeClient sends it back via punktfunk_connection_send_input, m3.rs:2798).
+- One core, one event struct: both protocols decode to the same fixed-size punktfunk_core::input::InputEvent (input.rs:108-150) that crosses the C ABI unchanged, and all backends implement one InputInjector trait (inject.rs:18-20). Apollo bridges through the platf:: abstraction but has no cross-process ABI — punktfunk's event is the same shape host-side and client-side (the embeddable NativeClient sends it back via punktfunk_connection_send_input, punktfunk1.rs:2798).
 - Runtime VK→scancode via MapVirtualKeyExW (sendinput.rs:209) instead of Apollo's compile-time static US-English table (keylayout.h). This intentionally respects the host's live keyboard layout rather than forcing 0x409; the trade-off (no fixed canonical mapping for games that demand exact set-1 scancodes) is the candidate improvement below, not a bug.
-- punktfunk/1 native path runs gamepads as a per-session PadBackend that can be a virtual DualSense (UHID hid-playstation) on Linux — a game sees a REAL DualSense with adaptive triggers/lightbar/touchpad/motion, with rich feedback flowing back over a dedicated 0xCD HID-output plane (m3.rs:1169-1235). Apollo emulates DS4 via ViGEm on Windows; punktfunk's UHID approach is a deliberately different, higher-fidelity Linux design (not applicable to the Windows host).
-- Native threading model: the m3 path decouples ingest from injection via an mpsc channel to a host-lifetime injector service thread (m3.rs:1011-1064, 1300) — same anti-head-of-line-blocking intent as Apollo's task-pool queue but realized with Rust channels + native threads (no async, honoring the no-async-on-the-frame-path invariant). It also auto-reopens the injector when the active session/backend changes mid-stream (m3.rs:1020-1029), which Apollo has no analogue for.
-- Session-end safety: the m3 input thread tracks held buttons/keys and synthesizes release events when the client vanishes mid-press (m3.rs:1267-1301, 1350-1368), preventing latched-button drag across reconnects — a robustness feature Apollo's per-stream input_t does not implement this way.
+- punktfunk/1 native path runs gamepads as a per-session PadBackend that can be a virtual DualSense (UHID hid-playstation) on Linux — a game sees a REAL DualSense with adaptive triggers/lightbar/touchpad/motion, with rich feedback flowing back over a dedicated 0xCD HID-output plane (punktfunk1.rs:1169-1235). Apollo emulates DS4 via ViGEm on Windows; punktfunk's UHID approach is a deliberately different, higher-fidelity Linux design (not applicable to the Windows host).
+- Native threading model: the m3 path decouples ingest from injection via an mpsc channel to a host-lifetime injector service thread (punktfunk1.rs:1011-1064, 1300) — same anti-head-of-line-blocking intent as Apollo's task-pool queue but realized with Rust channels + native threads (no async, honoring the no-async-on-the-frame-path invariant). It also auto-reopens the injector when the active session/backend changes mid-stream (punktfunk1.rs:1020-1029), which Apollo has no analogue for.
+- Session-end safety: the m3 input thread tracks held buttons/keys and synthesizes release events when the client vanishes mid-press (punktfunk1.rs:1267-1301, 1350-1368), preventing latched-button drag across reconnects — a robustness feature Apollo's per-stream input_t does not implement this way.
 **Transfer candidates from Apollo (6):** _Switch SendInput to retry-on-failure desktop reattach (drop per-event OpenInputDesktop)_, _Move GameStream input injection off the ENet service thread_, _Coalesce relative-mouse/scroll/controller spam before injection_, _Add virtual touch + pen on the Windows host via synthetic pointer devices_, _Map absolute mouse to the target output rect, not the whole virtual desktop_, _Add a static canonical VK→scancode table as a layout-independent fallback_ — see Part 4.
@@ -737,7 +737,7 @@ For the Windows host specifically, Apollo is clearly ahead on this subsystem. Ap
 **Layer 2 — virtual display (`vdisplay.rs` + `vdisplay/sudovda.rs`).** The `VirtualDisplay` trait (`vdisplay.rs:47-53`) is RAII: `create(mode) -> VirtualOutput { node_id, win_capture, keepalive }`, teardown by dropping `keepalive`. On Windows `open()` always returns the single `SudoVdaDisplay` (the compositor arg is moot, `vdisplay.rs:525-530`).
 #### Windows launch + display-config flow
-`m3.rs:532-543` (and the GameStream twin `stream.rs:93-97`) resolve the launch id to a command and do `std::env::set_var("PUNKTFUNK_GAMESCOPE_APP", &cmd)`, **but that env var is read only by the Linux gamescope backend** (`vdisplay/gamescope.rs:441`). On Windows there is **no process launch at all**: SudoVDA's `create()` (`sudovda/sudovda.rs:448-543`) never spawns the app, and the only `CreateProcessAsUserW` in the crate is the WGC capture helper (`capture/wgc_relay.rs:6`), not app launch. So on Windows a session shows the bare desktop; apps.json `cmd`/library titles are effectively dead.
+`punktfunk1.rs:532-543` (and the GameStream twin `stream.rs:93-97`) resolve the launch id to a command and do `std::env::set_var("PUNKTFUNK_GAMESCOPE_APP", &cmd)`, **but that env var is read only by the Linux gamescope backend** (`vdisplay/gamescope.rs:441`). On Windows there is **no process launch at all**: SudoVDA's `create()` (`sudovda/sudovda.rs:448-543`) never spawns the app, and the only `CreateProcessAsUserW` in the crate is the WGC capture helper (`capture/wgc_relay.rs:6`), not app launch. So on Windows a session shows the bare desktop; apps.json `cmd`/library titles are effectively dead.
 `SudoVdaDisplay::create` does the display-config work Apollo splits into a separate library, inline: ADD-IOCTL at the client's exact WxH@Hz (`sudovda.rs:452-469`), watchdog ping thread keyed to the driver's reported timeout (`sudovda.rs:481-494`), resolve `\\.\DisplayN` via CCD with a 15×200ms retry (`sudovda.rs:499-505`), `set_active_mode()` which enumerates advertised modes and falls back to the best refresh AT THE SAME RESOLUTION + `CDS_TEST` before `CDS_SET_PRIMARY` (`sudovda.rs:146-265`), then `isolate_displays()` detaches every physical display so the secure desktop renders on the VD (`sudovda.rs:276-329`). Teardown (`sudovda.rs:559-580`) stops the pinger, `restore_displays()`, then REMOVE by a **fixed** `MONITOR_GUID` (`sudovda.rs:59`, "one session at a time today").
@@ -899,7 +899,7 @@ QPC values from `LastPresentTime`/`LastMouseUpdateTime` are translated to `stead
 - **Treat S_OK-with-no-change frames as timeouts via DXGI update flags** (sev high, medium) — In dxgi.rs acquire(), after a successful AcquireNextFrame, compute frame_update_flag = info.LastPresentTime != 0 (and/or info.AccumulatedFrames != 0) and mouse_update_flag from LastMouseUpdateTime/PointerShapeBufferSize. Always call update_cursor (mouse). If !frame_update_flag, ReleaseFrame and return Ok(None) (so next_frame repeats last_present) UNLESS the cursor moved and we need a recomposite — in which case recomposite onto the existing last_present texture instead of CopyResource'ing the source. This cuts idle/cursor-only GPU load and avoids re-encoding unchanged content.
 - **Detect resolution/format change on the acquire hot path, not only during rebuild** (sev high, small) — In acquire(), after res.cast::<ID3D11Texture2D>(), call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution + HDR-toggle changes robust even when DDA doesn't fault.
 - **Release the duplication device lock during idle to avoid encoder starvation** (sev medium, small) — Cap the per-acquire DDA timeout to a small value (e.g. 8-16ms) and, when it returns WAIT_TIMEOUT, std::thread::sleep a few ms with no outstanding AcquireNextFrame before retrying — so the encode thread can grab the device for NVENC setup/reinit. Keep the generous timeout only for first_frame. Low risk, directly mirrors Apollo's documented fix.
-- **Add client-framerate frame pacing with a high-precision timer** (sev medium, large) — Add an optional pacing layer (in dxgi.rs or the encode-loop caller in m3.rs/encode.rs) keyed on the negotiated client framerate: track a group start from the frame pts, sleep to the computed target with a Windows high-resolution timer (timeBeginPeriod or CREATE_WAITABLE_TIMER_HIGH_RESOLUTION), and snap near-integral refresh to integer divisors. This is the lever for steady pacing on odd refresh rates without changing the zero-copy design.
+- **Add client-framerate frame pacing with a high-precision timer** (sev medium, large) — Add an optional pacing layer (in dxgi.rs or the encode-loop caller in punktfunk1.rs/encode.rs) keyed on the negotiated client framerate: track a group start from the frame pts, sleep to the computed target with a Windows high-resolution timer (timeBeginPeriod or CREATE_WAITABLE_TIMER_HIGH_RESOLUTION), and snap near-integral refresh to integer divisors. This is the lever for steady pacing on odd refresh rates without changing the zero-copy design.
 - **Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance** (sev medium, medium) — After D3D11CreateDevice in dxgi.rs (and the NVENC encoder device wherever it's built), query IDXGIDevice1::SetMaximumFrameLatency(1) and SetGPUThreadPriority; load gdi32 D3DKMTSetProcessSchedulingPriorityClass and request HIGH (not REALTIME) when the adapter is NVIDIA (VendorId 0x10DE) with HAGS on, REALTIME otherwise. Mirror the privilege-enable. Guard behind admin/SYSTEM (host already relaunches as SYSTEM).
 - **Retry DuplicateOutput at startup and request encoder-supported formats via Output5** (sev medium, small) — In open() wrap DuplicateOutput in a short retry (2-3 tries, ~200ms apart, re-attach_input_desktop between) before bailing. Optionally cast the output to IDXGIOutput5 and call DuplicateOutput1 with an explicit format list (BGRA8 for SDR, R16G16B16A16_FLOAT for HDR) so the capture format is intentional rather than incidental, falling back to DuplicateOutput when Output5 is absent.
@@ -1173,7 +1173,7 @@ punktfunk actually has a **surprisingly complete** HDR capture+encode path on Wi
2. **NVENC ingests RGB (ABGR10), not YUV.** punktfunk feeds R10G10B10A2 RGB and relies on **NVENC's internal RGB→YUV** with a hardcoded VUI. Apollo converts to P010 in its own shader so it controls the exact ITU-T H.273 matrix/range. punktfunk has `video_colorspace`-equivalent math nowhere; the rich `new_color_vectors_from_colorspace` machinery has no analogue.
3. **SDR VUI is never signaled.** For the SDR 8-bit path the NVENC VUI block (`nvenc.rs:243`) is only set when `hdr`; SDR streams carry NVENC's default VUI (effectively unsignaled / BT.709 by luck), with no full/limited-range or rec601/709 control. Apollo always sets all four fields for SDR too.
4. **HDR detection is a single hardcoded colorspace constant** (`G2084_NONE_P2020`); no logging of the full `DXGI_OUTPUT_DESC1` like Apollo's `display_base.cpp:722-732`, making field diagnosis hard.
5. **No client-request AND display-HDR gate.** The protocol has `VIDEO_CAP_HDR` (`quic.rs:86`) but HDR is driven purely by the capture surface format; there's no explicit "client wanted HDR but desktop is SDR → signal SDR" decision like Apollo's `:29`. `m3.rs:558-563` only gates *bit depth*, and the doc comment (`quic.rs:128`) admits "BT.2020 PQ HDR signaling is added alongside HDR support" — i.e. the Welcome doesn't carry colorspace.
5. **No client-request AND display-HDR gate.** The protocol has `VIDEO_CAP_HDR` (`quic.rs:86`) but HDR is driven purely by the capture surface format; there's no explicit "client wanted HDR but desktop is SDR → signal SDR" decision like Apollo's `:29`. `punktfunk1.rs:558-563` only gates *bit depth*, and the doc comment (`quic.rs:128`) admits "BT.2020 PQ HDR signaling is added alongside HDR support" — i.e. the Welcome doesn't carry colorspace.
6. **Split-encode disabled for 10-bit** (`nvenc.rs:166`) is a sensible measured choice, but it means HDR throughput is single-engine-bound — a known limitation, not a bug.
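The conversion Apollo's shader performs is a per-matrix affine transform. Below is a minimal sketch of what a Rust port of that matrix generation could compute. It uses the standard luma coefficients: BT.709 has Kr = 0.2126 and Kb = 0.0722; BT.2020 non-constant-luminance has Kr = 0.2627 and Kb = 0.0593. The names are hypothetical. The sketch assumes the input is already non-linear R'G'B' (PQ-encoded for HDR10). Linear scRGB from an FP16 capture surface would need the PQ transfer applied first.

```rust
/// Luma coefficients (Kr, Kb) for the two matrices discussed above.
#[derive(Debug, Clone, Copy)]
enum Matrix {
    Bt709,
    Bt2020Ncl,
}

impl Matrix {
    fn kr_kb(self) -> (f64, f64) {
        match self {
            Matrix::Bt709 => (0.2126, 0.0722),
            Matrix::Bt2020Ncl => (0.2627, 0.0593),
        }
    }
}

/// Build a 3x4 affine transform. Each row is [r, g, b, offset] and produces the
/// Y, Cb or Cr code value for `bits`-bit output (bits >= 8).
fn rgb_to_ycbcr(matrix: Matrix, bits: u32, full_range: bool) -> [[f64; 4]; 3] {
    let (kr, kb) = matrix.kr_kb();
    let kg = 1.0 - kr - kb;
    // Normalized rows: Y' in [0, 1], Cb/Cr in [-0.5, 0.5].
    let y = [kr, kg, kb];
    let cb = [-kr / (2.0 * (1.0 - kb)), -kg / (2.0 * (1.0 - kb)), 0.5];
    let cr = [0.5, -kg / (2.0 * (1.0 - kr)), -kb / (2.0 * (1.0 - kr))];
    // Limited range scales the 8-bit 16..235 (luma) / 16..240 (chroma) ranges by 2^(bits-8).
    let step = f64::from(1u32 << (bits - 8));
    let (y_scale, y_offset, c_scale) = if full_range {
        let max = f64::from((1u32 << bits) - 1);
        (max, 0.0, max)
    } else {
        (219.0 * step, 16.0 * step, 224.0 * step)
    };
    let c_offset = f64::from(1u32 << (bits - 1));
    let row = |k: [f64; 3], scale: f64, offset: f64| [k[0] * scale, k[1] * scale, k[2] * scale, offset];
    [row(y, y_scale, y_offset), row(cb, c_scale, c_offset), row(cr, c_scale, c_offset)]
}

/// Apply the transform to one non-linear R'G'B' pixel with components in [0, 1].
fn apply(m: &[[f64; 4]; 3], rgb: [f64; 3]) -> [f64; 3] {
    let mut out = [0.0; 3];
    for (o, r) in out.iter_mut().zip(m.iter()) {
        *o = r[0] * rgb[0] + r[1] * rgb[1] + r[2] * rgb[2] + r[3];
    }
    out
}
```

For 10-bit limited range this gives the familiar P010 code values: white maps to (940, 512, 512) and black to (64, 512, 512). The resulting rows are what a constant buffer for the RGB→P010 pass would carry.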
@@ -1182,7 +1182,7 @@ punktfunk actually has a **surprisingly complete** HDR capture+encode path on Wi
- **Plumb HDR10 static metadata (mastering display + MaxCLL/MaxFALL) into NVENC** (sev high, medium) — In capture/dxgi.rs read IDXGIOutput6::GetDesc1 (MaxLuminance/MinLuminance/MaxFullFrameLuminance + primaries) when entering the HDR path, carry it on D3d11Frame/CapturedFrame, and in encode/nvenc.rs populate NV_ENC_MASTERING_DISPLAY_INFO + NV_ENC_CONTENT_LIGHT_LEVEL (set via NV_ENC_HEVC_PROFILE_MAIN10 SEI fields / the per-pic HDR metadata API) on the keyframe. Override primaries to Rec.2020/D65 like Apollo.
- **Always signal the SDR VUI (primaries/transfer/matrix/range), not just HDR** (sev medium, small) — In encode/nvenc.rs always set videoSignalTypePresentFlag + colourDescriptionPresentFlag, defaulting SDR to BT.709 primaries/transfer/matrix and limited range, and thread a colorspace/range choice down from the Welcome so the client and decoder agree.
- **Convert to P010 in a D3D11 shader and feed NVENC YUV instead of ABGR10 RGB** (sev medium, large) — Add an optional GPU RGB->P010 conversion pass in capture/dxgi.rs (mirroring HdrConverter) producing DXGI_FORMAT_P010, and switch the NVENC buffer format to NV_ENC_BUFFER_FORMAT_YUV420_10BIT. Port video_colorspace's matrix generation to Rust as the conversion's constant buffer. Keep the current RGB path as a fallback behind an env knob.
- **Gate HDR on (client requested HDR) AND (desktop is actually HDR), and signal the result in Welcome** (sev medium, medium) — Add a colorspace/dynamic-range field to the Welcome (punktfunk1.rs around :562-612), resolve HDR = host_wants AND client VIDEO_CAP_HDR AND capture-surface-is-FP16, and send the resolved colorspace so the client decoder/presenter sets the right transfer/primaries.
- **Log the full DXGI_OUTPUT_DESC1 + capture format on HDR setup** (sev low, small) — When hdr_fp16 is detected in capture/dxgi.rs, QueryInterface IDXGIOutput6, GetDesc1, and tracing::info! the colorspace name, BitsPerColor, primaries, white point, and Max/Min/MaxFullFrame luminance — porting Apollo's colorspace_to_string table.
- **Derive HDR cursor/graphics white from the display, not a fixed 203 nits** (sev low, small) — Once GetDesc1 luminance is read (per hdr10-static-metadata), use MaxFullFrameLuminance (or the SDR-white registry value if exposed) as the cursor white target instead of a constant, falling back to 203; document the limitation like Apollo.
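The SDR-VUI and HDR-gate items above reduce to a small decision table. This is a minimal sketch assuming a hypothetical `DynamicRange` field in the Welcome. The code points are the H.273 values: BT.709 is 1 for primaries, transfer and matrix; BT.2020 primaries are 9, SMPTE ST 2084 (PQ) transfer is 16, and the BT.2020 non-constant-luminance matrix is 9. The `VIDEO_CAP_HDR` value below is a placeholder, since the real constant lives in `quic.rs`.

```rust
/// Hypothetical resolved dynamic range carried in the Welcome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DynamicRange {
    Sdr,
    Hdr10,
}

/// H.273 code points for the VUI, plus the range flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Vui {
    colour_primaries: u8,
    transfer_characteristics: u8,
    matrix_coefficients: u8,
    full_range: bool,
}

/// Placeholder bit: the real VIDEO_CAP_HDR constant is defined in quic.rs.
const VIDEO_CAP_HDR: u32 = 1 << 0;

/// HDR only when all three agree: the host allows it, the client advertised
/// VIDEO_CAP_HDR, and the capture surface is actually FP16 (desktop in HDR).
fn resolve_dynamic_range(host_wants_hdr: bool, client_caps: u32, capture_is_fp16: bool) -> DynamicRange {
    let client_hdr = (client_caps & VIDEO_CAP_HDR) != 0;
    if host_wants_hdr && client_hdr && capture_is_fp16 {
        DynamicRange::Hdr10
    } else {
        DynamicRange::Sdr
    }
}

/// Always produce a VUI: BT.709 limited range for SDR, BT.2020 + PQ limited range for HDR10.
fn vui_for(range: DynamicRange) -> Vui {
    match range {
        DynamicRange::Sdr => Vui { colour_primaries: 1, transfer_characteristics: 1, matrix_coefficients: 1, full_range: false },
        DynamicRange::Hdr10 => Vui { colour_primaries: 9, transfer_characteristics: 16, matrix_coefficients: 9, full_range: false },
    }
}
```

The host would send the resolved `DynamicRange` in the Welcome and derive the NVENC VUI from it. Because both sides use one value, the client decoder and presenter cannot disagree with the bitstream.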
@@ -1268,7 +1268,7 @@ Strict reverse order with logged-but-non-fatal failures: destroy bitstream buffe
punktfunk drives the **raw NVENC API** via `nvidia_video_codec_sdk::{sys, ENCODE_API}` (the safe wrapper is CUDA-only) in a single struct `NvencD3d11Encoder` (`nvenc.rs:38`). It implements the same `Encoder` trait (`submit`/`request_keyframe`/`poll`/`flush`, `encode.rs:41-51`) as the Linux/SW encoders. What it does well:
- **True zero-copy, register-in-place.** Unlike Apollo (which owns a dedicated input texture and color-converts into it), punktfunk registers the **capturer's own** texture by raw pointer, caches the registration in `regs: HashMap<isize,...>`, and `encode_picture`s it directly — no per-frame `CopyResource` (`nvenc.rs:404-423`, comment `:1-12`). This is a genuinely tighter zero-copy path than Apollo's, valid because the encode loop is synchronous (`gamestream/stream.rs:338-344`, `punktfunk1.rs`).
- **Shared device.** Session opened on the capturer's `ID3D11Device` carried on `FramePayload::D3d11` (`nvenc.rs:182-192, 394`); re-inits on device/size/HDR change (`:366-397`) — handles the secure-desktop device recreate.
- **Low-latency RC mirrors Apollo's intent:** P1 + `ULTRA_LOW_LATENCY` (`:206-207, 260-261`), infinite GOP (`gopLength = NVENC_INFINITE_GOPLENGTH`, `:218`), `frameIntervalP=1` (no B-frames, `:219`), CBR (`:220`), ~1-frame VBV (`vbvBufferSize=vbvInitialDelay=bitrate/fps`, `:225-227`).
- **Bitrate probe-and-step-down** on `initialize_encoder` InvalidParam, floor 10 Mbps (`:140-314`) — equivalent to Apollo's level-cap handling, done by retry.
@@ -1278,7 +1278,7 @@ punktfunk drives the **raw NVENC API** via `nvidia_video_codec_sdk::{sys, ENCODE
##### Where punktfunk is weaker / missing / fragile
1. **No reference-frame invalidation at all.** `request_keyframe()` only sets `force_kf` → a full **IDR** (`nvenc.rs:437-442, 465-467`). There is no `nvEncInvalidateRefFrames`, no DPB depth, no RFI-range handling, no "rfi_needs_confirmation" tagging. Apollo's entire `invalidate_ref_frames` (`nvenc_base.cpp:574-610`) and deep-DPB-with-L0=1 design (`:268-281`) is absent. punktfunk relies wholly on FEC + IDR for loss recovery — every recovery event is a costly keyframe spike (the exact thing the infinite-GOP design was meant to avoid). Note `punktfunk1.rs:2153`/`gamestream/stream.rs:336` call `request_keyframe()` for "RFI", but it's always a full IDR.
2. **No async encode.** punktfunk is synchronous map→encode→lock with a `pending` VecDeque and `POOL=4` bitstreams (`:28, 459-460, 469-503`), but never uses a completion event / `enableEncodeAsync` / `doNotWait`. Apollo overlaps GPU encode with CPU via a Win32 event + 100 ms timeout (`nvenc_d3d11.cpp:15,64-66`, `nvenc_base.cpp:525,534-539`). punktfunk's `lock_bitstream` blocks with no timeout — a hung encode wedges the encode thread forever.
@@ -1295,7 +1295,7 @@ punktfunk drives the **raw NVENC API** via `nvidia_video_codec_sdk::{sys, ENCODE
#### Transfer opportunities
- **Add real reference-frame invalidation (RFI) instead of always forcing IDR** (sev high, large) — In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.
- **Query nvEncGetEncodeCaps and gate config on real GPU capabilities** (sev medium, medium) — Add a `get_cap(cap: NV_ENC_CAPS) -> i32` helper in nvenc.rs after open_encode_session_ex (using API.get_encode_caps), verify codec_guid is in get_encode_guids, reject out-of-range WxH up front, and use SUPPORT_10BIT_ENCODE / SUPPORT_REF_PIC_INVALIDATION / SUPPORT_CUSTOM_VBV_BUF_SIZE to gate the corresponding config rather than assuming support. Surfaces clear errors instead of opaque InvalidParam.
- **Use async encode with a Win32 completion event + timeout** (sev medium, medium) — In nvenc.rs, gate on NV_ENC_CAPS_ASYNC_ENCODE_SUPPORT, create a per-bitstream Win32 Event (windows::Win32::System::Threading::CreateEventW), set init.enableEncodeAsync=1, store the event in `pending`, set pic.completionEvent + lock.doNotWait=1, and in poll() WaitForSingleObject(ev, 100ms) before lock_bitstream — returning a clear timeout error instead of blocking forever.
- **Minimize NvEnc API/struct versions per codec for older-driver compatibility** (sev medium, medium) — Add a `min_api_version(codec)` (v11 for H264/HEVC, v12 for AV1) and a helper that rewrites the version word (and optionally the struct-revision byte) before each NvEnc struct is passed, mirroring nvenc_base.cpp:666-680. Set apiVersion in open_encode_session_ex (nvenc.rs:186) from it. Maximizes driver compatibility for the field.
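This is a minimal sketch of the dedup/expand/escalate bookkeeping described for Apollo. It is written from that description rather than transcribed from `nvenc_base.cpp`. `RfiState` and `RfiOutcome` are hypothetical names, and the `invalidate` closure stands in for one `nvEncInvalidateRefFrames` call per frame. That NVENC call identifies a frame by its input timestamp, so the sketch assumes punktfunk sets each picture's `inputTimeStamp` to its frame index.

```rust
/// What the caller should do after an RFI request.
#[derive(Debug, PartialEq, Eq)]
enum RfiOutcome {
    AlreadyCovered, // subset of the last handled range: nothing to do
    Invalidated,    // refs dropped; encoder predicts from an older valid frame
    NeedIdr,        // fall back to request_keyframe()
}

struct RfiState {
    dpb_depth: u64, // maxNumRefFramesInDPB as configured
    last_encoded_frame_index: u64,
    last_rfi_range: Option<(u64, u64)>,
    rfi_needs_confirmation: bool, // tag for the next encoded frame
}

impl RfiState {
    /// `invalidate(i)` stands in for one nvEncInvalidateRefFrames call and
    /// returns false on failure.
    fn invalidate_ref_frames(&mut self, first: u64, last: u64, mut invalidate: impl FnMut(u64) -> bool) -> RfiOutcome {
        // Dedup: a repeat (or subset) of the last range was already handled.
        if let Some((lo, hi)) = self.last_rfi_range {
            if first >= lo && last <= hi {
                return RfiOutcome::AlreadyCovered;
            }
        }
        self.rfi_needs_confirmation = true;
        // Malformed request, or one naming frames that were never encoded.
        if last < first || first > self.last_encoded_frame_index {
            return RfiOutcome::NeedIdr;
        }
        // Every frame encoded since `first` may reference lost data: expand the range.
        let last = self.last_encoded_frame_index;
        self.last_rfi_range = Some((first, last));
        // If the range spans the whole DPB, no valid reference is left: escalate.
        if last - first + 1 >= self.dpb_depth {
            return RfiOutcome::NeedIdr;
        }
        for idx in first..=last {
            if !invalidate(idx) {
                return RfiOutcome::NeedIdr;
            }
        }
        RfiOutcome::Invalidated
    }
}
```

`AlreadyCovered` and `Invalidated` map to returning `true` from the proposed trait method. `NeedIdr` is the case where the caller falls back to `request_keyframe()`.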
@@ -1489,7 +1489,7 @@ punktfunk's SudoVDA backend lives in `crates/punktfunk-host/src/vdisplay/sudovda
#### Transfer opportunities
- **Detect watchdog ping failures and escalate (re-open the device)** (sev high, medium) — In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
- **Gate on SudoVDA protocol-version compatibility instead of only logging it** (sev medium, small) — In SudoVdaDisplay::new (sudovda.rs:412-432) parse {Major,Minor,Incremental} and compare against a compiled-in EXPECTED_PROTOCOL {Major:0,Minor:2}. If Major differs or our Minor > driver Minor, return Err with a 'driver too old / incompatible — update SudoVDA' message (and a distinct error variant the mgmt API can surface, like Apollo's VirtualDisplayDriverReady in nvhttp.cpp:936).
- **Retry device open with exponential backoff** (sev medium, small) — Wrap open_device in SudoVdaDisplay::new (sudovda.rs:412-413) in a 20→320ms backoff loop matching Apollo; on a session-time re-open after watchdog failure, allow a few retries with ~1s spacing.
- **Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU** (sev high, medium) — Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs. Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter.
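A minimal sketch of the version gate and the open-device backoff. `ProtocolVersion`, `check_protocol` and `open_with_backoff` are hypothetical, and so are the field widths. `EXPECTED_PROTOCOL` uses the {Major: 0, Minor: 2} from the proposal. The `sleep` parameter is injected so the schedule can be checked without waiting; production code would pass `std::thread::sleep`.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ProtocolVersion {
    major: u16,
    minor: u16,
    incremental: u16,
}

/// Compiled-in expectation taken from the proposal ({Major: 0, Minor: 2}).
const EXPECTED_PROTOCOL: ProtocolVersion = ProtocolVersion { major: 0, minor: 2, incremental: 0 };

/// Reject a driver whose major differs or whose minor is older than ours.
fn check_protocol(driver: ProtocolVersion) -> Result<(), String> {
    if driver.major != EXPECTED_PROTOCOL.major || EXPECTED_PROTOCOL.minor > driver.minor {
        return Err(format!(
            "SudoVDA protocol {}.{}.{} is incompatible with {}.{}: driver too old or incompatible, update SudoVDA",
            driver.major, driver.minor, driver.incremental, EXPECTED_PROTOCOL.major, EXPECTED_PROTOCOL.minor
        ));
    }
    Ok(())
}

/// Retry `open` with 20 -> 40 -> 80 -> 160 -> 320 ms sleeps, then give up.
fn open_with_backoff<T, E>(mut open: impl FnMut() -> Result<T, E>, mut sleep: impl FnMut(Duration)) -> Result<T, E> {
    let max_delay = Duration::from_millis(320);
    let mut delay = Duration::from_millis(20);
    loop {
        match open() {
            Ok(device) => return Ok(device),
            Err(err) if delay > max_delay => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

The schedule makes six open attempts and sleeps about 620 ms in total before the error surfaces.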
@@ -1570,7 +1570,7 @@ punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely matu
- **Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change** (sev high, large) — Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.
- **Add an NvAPI driver-settings manager (PREFERRED_PSTATE_MAX + OGL_CPL_PREFER_DXPRESENT) with a crash-safe undo file** (sev medium, large) — Add a windows-only nvprefs module wrapping NvAPI DRS (load nvapi64 dynamically, treat NvAPI_Initialize failure as 'no NVIDIA, skip'). Create a 'punktfunk' app profile with PREFERRED_PSTATE_PREFER_MAX, set OGL_CPL_PREFER_DXPRESENT_ENABLED on the base profile behind a config flag, write an undo file under %ProgramData%\\punktfunk before global changes, and call it on session start (the new stream_will_start hook below).
- **Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs** (sev medium, medium) — Add a once-init in the Windows capture path (capture/dxgi.rs open) that installs the same hook via a minhook-rs/detour crate (or a manual IAT/inline hook) on NtGdiDdDDIGetCachedHybridQueryValue forcing STATE_UNSPECIFIED, plus SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2). Gate it to NVIDIA/hybrid boxes; it's process-lifetime so no teardown needed.
- **Add a Windows stream_will_start/stop hook: timer resolution, MMCSS, HIGH_PRIORITY_CLASS, display-required, headless Mouse Keys** (sev medium, medium) — Add a windows-only RAII guard invoked when a session starts (punktfunk1.rs/pipeline session setup) that raises timer resolution (NtSetTimerResolution or timeBeginPeriod(1)), DwmEnableMMCSS(true), SetPriorityClass(HIGH_PRIORITY_CLASS), and wraps the DXGI capture loop in SetThreadExecutionState(ES_CONTINUOUS|ES_DISPLAY_REQUIRED) (capture/dxgi.rs next_frame loop), reverting on drop. Optionally the headless Mouse-Keys trick for cursor visibility.
- **Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host** (sev low, medium) — Either (a) verify mdns-sd always emits an RFC-1035-valid TXT (never zero strings) and add a regression test, or (b) add a windows-only discovery backend using DnsServiceRegister via the windows crate's DNS APIs mirroring publish.cpp, including the single-empty-TXT workaround, so Apple NWBrowser/Moonlight discover the host reliably.
- **Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime** (sev medium, small) — In capture/dxgi.rs next_frame, query the cached IDXGIFactory's IsCurrent() once per loop and trigger the existing recreate path when it goes false (catches HDR/topology changes cleanly). Replace now_ns() on Windows with GetSystemTimePreciseAsFileTime converted to Unix-epoch ns so ClockProbe/ClockEcho skew correction stays accurate cross-machine.
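The clock change is mostly epoch arithmetic. A FILETIME counts 100-ns ticks since 1601-01-01 UTC, and the Unix epoch is 11,644,473,600 s (116,444,736,000,000,000 ticks) later. A minimal sketch follows; the function name is hypothetical. On Windows, the two halves would come from the `dwHighDateTime` and `dwLowDateTime` fields of the FILETIME that `GetSystemTimePreciseAsFileTime` fills in.

```rust
/// 100-ns ticks from 1601-01-01 (FILETIME epoch) to 1970-01-01 (Unix epoch):
/// 11_644_473_600 s * 10_000_000 ticks/s.
const FILETIME_TO_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;

/// Convert a FILETIME, given as its high/low 32-bit halves, to nanoseconds since
/// the Unix epoch. Times before 1970 clamp to 0.
fn filetime_to_unix_ns(high: u32, low: u32) -> u64 {
    let ticks = (u64::from(high) << 32) | u64::from(low);
    ticks.saturating_sub(FILETIME_TO_UNIX_EPOCH_TICKS) * 100
}
```

A `u64` count of nanoseconds since 1970 stays in range until roughly the year 2554, so ClockProbe/ClockEcho need no wider type.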
@@ -1769,14 +1769,14 @@ adversarial-verify pass. *Area* is the investigation that surfaced it.
*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** The control thread only enqueues bytes + schedules a task; a pool thread pops one packet, batches later same-type packets while holding the queue lock, then RELEASES the lock before the (slow) SendInput/ViGEm call — src/input.cpp:1481-1520, 1639-1643. A slow OS input call never stalls the network thread.
- **punktfunk gap:** on_receive() calls inj.inject(&ev) synchronously inside the host.service() ENet loop — crates/punktfunk-host/src/gamestream/control.rs:84-91,207-211. A SendInput that blocks crossing a desktop switch (or a slow ViGEm update) head-blocks ENet handshake/keepalive/retransmit servicing. The native punktfunk1 path already does this correctly (punktfunk1.rs:1300 → injector_service_thread).
- **Proposal:** Mirror the native punktfunk1 design in the GameStream control thread: push decoded InputEvents onto an mpsc channel drained by a dedicated injector thread (reuse injector_service_thread or a sibling), so the ENet thread never blocks on SendInput/ViGEm. No async needed: a native thread + std::sync::mpsc, consistent with the invariant.
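A minimal sketch of the channel hand-off. The `InputEvent` shape and the thread name are hypothetical, and the `inject` closure stands in for the existing injector (SendInput/ViGEm). The ENet loop keeps only the `Sender`; `send` on an unbounded `std::sync::mpsc` channel never blocks.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for punktfunk's decoded input event (hypothetical shape).
#[derive(Debug)]
enum InputEvent {
    Key { code: u16, down: bool },
    MouseMove { dx: i32, dy: i32 },
}

/// Spawn a dedicated injector thread. The ENet control loop would call
/// `tx.send(ev)` in on_receive() instead of injecting inline.
fn spawn_injector<F>(mut inject: F) -> (mpsc::Sender<InputEvent>, thread::JoinHandle<()>)
where
    F: FnMut(InputEvent) + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<InputEvent>();
    let handle = thread::Builder::new()
        .name("punktfunk-inject".into())
        .spawn(move || {
            // A slow SendInput/ViGEm call blocks only this thread.
            // The loop ends once every Sender has been dropped.
            for ev in rx {
                inject(ev);
            }
        })
        .expect("failed to spawn injector thread");
    (tx, handle)
}
```

Dropping the last `Sender` ends the `for` loop, so the injector thread exits cleanly on session teardown. If memory growth behind a wedged SendInput is a concern, `mpsc::sync_channel` with `try_send` would let the control thread drop input rather than queue it without bound.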
#### 9. Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)
*Area:* `cmp:process-launch` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** Apollo runs prep/detached/main commands via platf::run_command = retrieve_users_token + impersonate_current_user + CreateProcessAsUserW, launching apps from a SYSTEM service into the interactive user session (process.cpp execute() l430-494; platform/windows/misc.cpp run_command).
- **punktfunk gap:** On Windows the resolved command is only written to PUNKTFUNK_GAMESCOPE_APP (punktfunk1.rs:535-536, stream.rs:96), which is read solely by the Linux gamescope backend (vdisplay/gamescope.rs:441). SudoVdaDisplay::create (sudovda.rs:448-543) never spawns anything, so a Windows session always shows the bare desktop — apps.json cmd and library titles are dead on Windows.
- **Proposal:** Add a Windows app-launch path: resolve the AppEntry.cmd / library launch_command, then CreateProcessAsUserW into the interactive session (the token+pipe plumbing already exists in capture/wgc_relay.rs — reuse retrieve-token + CreateProcessAsUserW). Track the launched process for lifetime/teardown. Make library.rs Windows launch resolve to a real command (steam steam://rungameid works on Windows) instead of the gamescope env var.
#### 10. Native system tray with state-driven icon + notifications
@@ -1826,7 +1826,7 @@ adversarial-verify pass. *Area* is the investigation that surfaced it.
- **Apollo does:** Apollo's ping thread counts consecutive PingDriver failures and after >3 calls failCb, which sets WATCHDOG_FAILED + closeVDisplayDevice; sessions then re-init the driver (virtual_display.cpp:603-616, process.cpp:65-78, process.cpp:243-246).
- **punktfunk gap:** punktfunk's pinger discards the ioctl Result entirely (`let _ = ioctl(...)`, sudovda.rs:485-494) — a dead driver is never noticed; there is no WATCHDOG_FAILED state and no re-init path.
- **Proposal:** In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
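A minimal sketch of the per-ping bookkeeping, with hypothetical names (`PingWatchdog`, `DriverStatus`). Instead of the current `let _ =`, `ping_ok` would be `ioctl(...).is_ok()`. The pinger stops once `record` returns `WatchdogFailed`, and the session loop polls the shared `driver_dead` flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Mirrors the idea of Apollo's DRIVER_STATUS (variant names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DriverStatus {
    Ok,
    WatchdogFailed,
}

const MAX_CONSECUTIVE_PING_FAILURES: u32 = 3;

/// Bookkeeping for the pinger thread; `driver_dead` is shared with the session loop.
struct PingWatchdog {
    consecutive_failures: u32,
    driver_dead: Arc<AtomicBool>,
}

impl PingWatchdog {
    fn new(driver_dead: Arc<AtomicBool>) -> Self {
        Self { consecutive_failures: 0, driver_dead }
    }

    /// Record one keepalive result. Any success resets the streak. The third
    /// consecutive failure publishes `driver_dead` and tells the pinger to stop.
    fn record(&mut self, ping_ok: bool) -> DriverStatus {
        if ping_ok {
            self.consecutive_failures = 0;
            return DriverStatus::Ok;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= MAX_CONSECUTIVE_PING_FAILURES {
            // The session loop treats this like ACCESS_LOST and re-opens the device.
            self.driver_dead.store(true, Ordering::Release);
            DriverStatus::WatchdogFailed
        } else {
            DriverStatus::Ok
        }
    }
}
```

Resetting on any success means only a genuine streak trips the watchdog. A single dropped ping during a driver hiccup does not tear the session down.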
#### 16. Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU
*Area:* `win:virtual-display-sudovda` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1885,8 +1885,8 @@ adversarial-verify pass. *Area* is the investigation that surfaced it.
*Area:* `win:nvenc-d3d11` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** Apollo keeps a deep DPB (maxNumRefFrames 5/HEVC, 8/AV1) but pins L0 ref to 1 (nvenc_base.cpp:268-281), then on a loss event calls nvEncInvalidateRefFrames per-frame over the requested range, dedups against the last range, expands to the last-encoded index, escalates to IDR only if the range exceeds DPB depth, and tags the next frame rfi_needs_confirmation (nvenc_base.cpp:574-610). This lets the encoder re-reference an older still-valid frame rather than emit a multi-millisecond keyframe.
- **punktfunk gap:** punktfunk has NO invalidate path — request_keyframe() always forces a full IDR (nvenc.rs:437-442,465-467); punktfunk1.rs:2153 / gamestream/stream.rs:336 wire 'RFI' straight to a keyframe. Every recovery is a costly IDR spike, defeating the infinite-GOP design.
- **Proposal:** In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.
#### 23. Add a DS4 (DualShock4) ViGEm target on Windows with type auto-selection, motion, touchpad, battery and timestamp pump
*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* large
@@ -1906,10 +1906,10 @@ adversarial-verify pass. *Area* is the investigation that surfaced it.
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* small · **✓ verified**
- **Apollo does:** Apollo raises the transmit/capture thread priority: platf::adjust_thread_priority(thread_priority_e::critical) in the control broadcast thread (stream.cpp:1122) and ::high in the video-send and audio paths (stream.cpp:1333, 1672); the Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) (platform/windows/misc.cpp:1081-1102).
- **punktfunk gap:** punktfunk names its hot-path threads (stream.rs:44 video, stream.rs:204 send, punktfunk1.rs:1804 send_loop, punktfunk1.rs:2017/2328 send threads) but never sets a scheduling priority — every host capture/encode/send thread runs at default priority. Only the macOS client elevates (client.rs:169). On a loaded Windows desktop the encode/send thread can be preempted, adding jitter the frame-pacing logic can't recover.
- **Proposal:** Add a cross-platform raise_current_thread_priority() helper (SetThreadPriority on Windows, optionally AvSetMmThreadCharacteristics for MMCSS; sched/nice on Linux) and call it at the top of the GameStream send thread, the native send_loop, and the encode thread. Cheap, high-value jitter reduction, no design impact.
- **Verify verdict:** `confirmed_gap` — punktfunk: NO thread-priority call exists anywhere in the workspace (grep for SetThreadPriority/sched_setscheduler/setpriority/AvSetMm/THREAD_PRIORITY across crates/ returned zero hits). Hot-path threads are named-only at default priority: GameStream video thread crates/punktfunk-host/src/gamestream/stream.rs:44-53 (thread::Builder name "punktfunk-video") and GameStream send thread stream.rs:204-206 ("punktfunk-send"); native send threads crates/punktfunk-host/src/punktfunk1.rs:2017-2033 and punktfunk1.rs:2328-2333 ("punktfunk-send"), and the native send_loop at punktfunk1.rs:1804 — all spawned with no priority set. The encode work shares the capture thread (punktfunk1.rs:2011-2013 "this thread captures+encodes ... and hands each AU to a dedicated send thread"), also default priority. The windows crate is ALREADY a dependency with the needed feature: crates/punktfunk-host/Cargo.toml:141 enables "Win32_System_Threading" (SetThreadPriority/GetCurrentThread available, zero new deps). Apollo: confirmed it raises priority on every hot-path thread — capture src/video.cpp:1295 (critical), encode src/video.cpp:2359 and 2396 (high), video send src/stream.cpp:1333 (high), control src/stream.cpp:1122 (critical), audio src/stream.cpp:1672 + src/audio.cpp:94/208. Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) at src/platform/windows/misc.cpp:1081-1102, plus DwmEnableMMCSS(true) (misc.cpp:1139) and AvSetMmThreadCharacteristics("Pro Audio") for the audio-capture thread (src/platform/windows/audio.cpp:540). CRITICAL NUANCE: Apollo's adjust_thread_priority is effectively Windows-only — src/platform/linux/misc.cpp:362-364 is "// Unimplemented" and src/platform/macos/misc.mm:218-220 is "// Unimplemented".
- **Refined:** Add a small cross-platform helper raise_current_thread_priority(level) and call it at the TOP of each hot-path thread body (so the calling thread itself is elevated): the GameStream send thread (stream.rs:206), the GameStream video/capture+encode thread (stream.rs:46), the native send threads (punktfunk1.rs:2021 and punktfunk1.rs:2331 closures, before/at the start of send_loop), and the native capture+encode thread (the punktfunk1.rs run body that owns capture+encode, punktfunk1.rs ~2011+). Windows: SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST) for the send/network thread and THREAD_PRIORITY_ABOVE_NORMAL for capture+encode, using the windows crate already on Cargo.toml:141 (no new deps). HIGHEST is one step above Apollo's video-send=high (which maps to ABOVE_NORMAL), but it is defensible because the punktfunk send thread is latency-critical and also does FEC+seal. Optionally associate the network/encode thread with MMCSS via AvSetMmThreadCharacteristics, using a task name such as "Games" or "Pro Audio", for higher-fidelity scheduling under DWM load; the function lives in avrt.dll, so confirm which windows-crate feature exposes it. Treat MMCSS as a follow-up, not the first cut. Linux (net-new beyond Apollo, since Apollo leaves it unimplemented and punktfunk is Linux-first): a best-effort niceness bump (e.g. setpriority to -10) on the send+encode threads. Linux keeps the nice value per thread, so this elevates only the calling thread. Do NOT default to realtime: SCHED_FIFO/RR requires CAP_SYS_NICE or an rtprio limit the host won't have by default. Even the niceness bump needs CAP_SYS_NICE or a sufficient RLIMIT_NICE, and without them it fails with EACCES. That is acceptable because every priority call is best-effort (log-and-continue on failure, exactly as Apollo does at misc.cpp:1104). No async, no per-frame allocation, no ABI surface change: this is purely thread setup, so no design invariant is touched.
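A minimal sketch of the best-effort helper for the Linux side. It declares glibc's `setpriority(int which, id_t who, int prio)` directly, so no crate is needed. The level names and the -5/-10 niceness mapping are assumptions. The Windows arm is only described in a comment, because its exact form depends on the `windows` crate version in use.

```rust
use std::io;

/// Coarse levels from the proposal (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadPriority {
    AboveNormal, // capture+encode
    Highest,     // send / network
}

impl ThreadPriority {
    /// Assumed niceness mapping for the Linux best-effort bump.
    fn linux_nice(self) -> i32 {
        match self {
            ThreadPriority::AboveNormal => -5,
            ThreadPriority::Highest => -10,
        }
    }
}

#[cfg(target_os = "linux")]
fn set_current_thread_priority(level: ThreadPriority) -> io::Result<()> {
    // glibc: int setpriority(int which, id_t who, int prio); PRIO_PROCESS == 0.
    unsafe extern "C" {
        fn setpriority(which: i32, who: u32, prio: i32) -> i32;
    }
    const PRIO_PROCESS: i32 = 0;
    // Linux keeps nice per thread, so who == 0 affects only the calling thread.
    let rc = unsafe { setpriority(PRIO_PROCESS, 0, level.linux_nice()) };
    if rc == 0 { Ok(()) } else { Err(io::Error::last_os_error()) }
}

#[cfg(not(target_os = "linux"))]
fn set_current_thread_priority(_level: ThreadPriority) -> io::Result<()> {
    // Windows would call SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST
    // or THREAD_PRIORITY_ABOVE_NORMAL) through the windows crate; omitted here.
    Err(io::Error::new(io::ErrorKind::Unsupported, "no priority backend in this sketch"))
}

/// Best-effort: log and continue, never fail the stream over a priority call.
fn raise_current_thread_priority(level: ThreadPriority) -> bool {
    match set_current_thread_priority(level) {
        Ok(()) => true,
        Err(err) => {
            // Stand-in for tracing::warn! in the real host.
            eprintln!("thread priority {level:?} not applied: {err}");
            false
        }
    }
}
```

Without privilege the Linux call returns EACCES, and the wrapper just logs it. Call sites therefore need no platform checks and no error handling.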
#### 43. Socket QoS / DSCP marking on the media sockets
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* medium · **✓ verified**
@@ -1924,10 +1924,10 @@ adversarial-verify pass. *Area* is the investigation that surfaced it.
*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* medium · *Effort:* medium · **✓ verified**
- **Apollo does:** Apollo paces each frame's packets at the *negotiated bitrate*: ratecontrol_packets_in_1ms = giga*80/100/1000/blocksize/8 (stream.cpp:1464) and sleeps the send loop to that per-millisecond budget across the frame (stream.cpp:1578-1627), so the sender shapes to the link's allotted rate, not just the frame deadline.
-- **punktfunk gap:** Both punktfunk send pacers spread purely over the FRAME INTERVAL: the GameStream sender uses budget = frame_interval * 0.75 (stream.rs:209) and the native paced_submit uses budget to next frame's deadline * 0.9 (m3.rs:1752) — neither derives a packets-per-ms budget from cfg.bitrate_kbps (the bitrate is only used to open NVENC, stream.rs:275). A spiky IDR or VBR overshoot can still microburst above the negotiated rate within its frame window.
+- **punktfunk gap:** Both punktfunk send pacers spread purely over the FRAME INTERVAL: the GameStream sender uses budget = frame_interval * 0.75 (stream.rs:209) and the native paced_submit uses budget to next frame's deadline * 0.9 (punktfunk1.rs:1752) — neither derives a packets-per-ms budget from cfg.bitrate_kbps (the bitrate is only used to open NVENC, stream.rs:275). A spiky IDR or VBR overshoot can still microburst above the negotiated rate within its frame window.
- **Proposal:** Compute a bitrate-derived per-millisecond send budget (like Apollo's ratecontrol_packets_in_1ms) from the negotiated bitrate and pace overflow to THAT rate inside paced_submit / spawn_sender, taking the min of the frame-interval budget and the bitrate budget. Smooths VBR bursts on rate-limited links without breaking the existing microburst fast-path.
-- **Verify verdict:** `partial` — PUNKTFUNK gap is real: both pacers spread over the FRAME INTERVAL only, never the bitrate. GameStream sender: `let budget = frame_interval.mul_f32(0.75)` (crates/punktfunk-host/src/gamestream/stream.rs:209). Native paced_submit: `let budget = deadline.checked_duration_since(pace_start)...mul_f32(0.9)` (crates/punktfunk-host/src/m3.rs:1752-1755) where deadline = `next += interval` (m3.rs:2162) and `interval = Duration::from_secs_f64(1.0 / effective_hz...)` (m3.rs:2357). bitrate_kbps only configures NVENC (stream.rs:275; m3.rs:2306, 2694) and is never fed to the pacer. So far the gap claim holds. BUT the Apollo characterization in the proposal is FACTUALLY WRONG: Apollo's `size_t ratecontrol_packets_in_1ms = std::giga::num * 80 / 100 / 1000 / blocksize / 8;` (/home/enricobuehler/Apollo/src/stream.cpp:1464) is a HARDCODED 80% of 1 Gigabit/sec — a fixed constant. grep across stream.cpp shows the negotiated/session bitrate never enters this formula (only std::giga::num, blocksize, and the 80/100 constant appear at lines 1464/1578-1582/1625-1627). Apollo paces to a FIXED ~800 Mbps link ceiling regardless of negotiated bitrate; it is NOT "negotiated-bitrate pacing." punktfunk's own design notes deliberately reject clamping to negotiated bitrate: "The encoder is pixel-rate bound, not bitrate bound" (m3.rs:321) and the whole 1Gbps+ effort raised the ceiling (m3.rs:1617-1619, MAX_BITRATE_KBPS ~2 Gbps).
-- **Refined:** Reject the proposal AS WRITTEN — its premise ("Apollo paces to the negotiated bitrate") is false; Apollo paces to a hardcoded 80%-of-1Gbps fixed link ceiling (stream.cpp:1464), and pacing to negotiated bitrate would actively regress punktfunk (VBR/IDR spikes legitimately exceed average bitrate, and punktfunk explicitly treats the encoder as pixel-rate-bound, not bitrate-bound — m3.rs:321). If anything is worth porting, it is the FIXED per-millisecond link-rate ceiling concept, not bitrate-derived pacing: optionally compute a fixed packets-per-ms budget from a configurable link-rate ceiling (default high, e.g. matching MAX_BITRATE_KBPS, env-overridable like PUNKTFUNK_PACE_BURST_KB) and take min(frame-interval budget, link-ceiling budget) inside paced_submit/spawn_sender — purely as a microburst smoother for rate-limited links, NOT tied to cfg.bitrate_kbps. Note punktfunk already has the microburst fast-path (burst_cap, m3.rs:2005-2009 / paced_submit:1734-1743) and frame-interval spreading, which together already address the "spiky IDR microburst" symptom the proposal cites. Recommend deferring unless a measured rate-limited-link regression appears; the current frame-interval + burst-cap pacing covers the cited risk.
+- **Verify verdict:** `partial` — PUNKTFUNK gap is real: both pacers spread over the FRAME INTERVAL only, never the bitrate. GameStream sender: `let budget = frame_interval.mul_f32(0.75)` (crates/punktfunk-host/src/gamestream/stream.rs:209). Native paced_submit: `let budget = deadline.checked_duration_since(pace_start)...mul_f32(0.9)` (crates/punktfunk-host/src/punktfunk1.rs:1752-1755) where deadline = `next += interval` (punktfunk1.rs:2162) and `interval = Duration::from_secs_f64(1.0 / effective_hz...)` (punktfunk1.rs:2357). bitrate_kbps only configures NVENC (stream.rs:275; punktfunk1.rs:2306, 2694) and is never fed to the pacer. So far the gap claim holds. BUT the Apollo characterization in the proposal is FACTUALLY WRONG: Apollo's `size_t ratecontrol_packets_in_1ms = std::giga::num * 80 / 100 / 1000 / blocksize / 8;` (/home/enricobuehler/Apollo/src/stream.cpp:1464) is a HARDCODED 80% of 1 Gigabit/sec — a fixed constant. grep across stream.cpp shows the negotiated/session bitrate never enters this formula (only std::giga::num, blocksize, and the 80/100 constant appear at lines 1464/1578-1582/1625-1627). Apollo paces to a FIXED ~800 Mbps link ceiling regardless of negotiated bitrate; it is NOT "negotiated-bitrate pacing." punktfunk's own design notes deliberately reject clamping to negotiated bitrate: "The encoder is pixel-rate bound, not bitrate bound" (punktfunk1.rs:321) and the whole 1Gbps+ effort raised the ceiling (punktfunk1.rs:1617-1619, MAX_BITRATE_KBPS ~2 Gbps).
+- **Refined:** Reject the proposal AS WRITTEN — its premise ("Apollo paces to the negotiated bitrate") is false; Apollo paces to a hardcoded 80%-of-1Gbps fixed link ceiling (stream.cpp:1464), and pacing to negotiated bitrate would actively regress punktfunk (VBR/IDR spikes legitimately exceed average bitrate, and punktfunk explicitly treats the encoder as pixel-rate-bound, not bitrate-bound — punktfunk1.rs:321). If anything is worth porting, it is the FIXED per-millisecond link-rate ceiling concept, not bitrate-derived pacing: optionally compute a fixed packets-per-ms budget from a configurable link-rate ceiling (default high, e.g. matching MAX_BITRATE_KBPS, env-overridable like PUNKTFUNK_PACE_BURST_KB) and take min(frame-interval budget, link-ceiling budget) inside paced_submit/spawn_sender — purely as a microburst smoother for rate-limited links, NOT tied to cfg.bitrate_kbps. Note punktfunk already has the microburst fast-path (burst_cap, punktfunk1.rs:2005-2009 / paced_submit:1734-1743) and frame-interval spreading, which together already address the "spiky IDR microburst" symptom the proposal cites. Recommend deferring unless a measured rate-limited-link regression appears; the current frame-interval + burst-cap pacing covers the cited risk.
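One reading of "take min(frame-interval budget, link-ceiling budget)", as a sketch: express both budgets as packets per 1 ms slice and send the smaller quota. `ceiling_packets_per_ms` mirrors the left-to-right integer arithmetic of Apollo's `ratecontrol_packets_in_1ms` quoted in the verdict, with the link rate made a parameter; `frame_packets_per_ms`, `per_ms_quota` and the 1400-byte block size used in the checks are hypothetical, not taken from either codebase.

```rust
use std::time::Duration;

/// Headroom on the link ceiling, as in Apollo's `* 80 / 100`.
const LINK_HEADROOM_PCT: u64 = 80;

/// Packets per 1 ms slice allowed by a fixed link-rate ceiling, using the same
/// integer order as Apollo: bits/s * pct / 100 / 1000 / blocksize / 8.
pub fn ceiling_packets_per_ms(link_bits_per_sec: u64, pct: u64, blocksize: u64) -> u64 {
    (link_bits_per_sec * pct / 100 / 1000 / blocksize / 8).max(1)
}

/// Packets per 1 ms slice needed to spread `packets` evenly over the existing
/// frame-interval `budget`, rounded up so the frame still fits its window.
pub fn frame_packets_per_ms(packets: u64, budget: Duration) -> u64 {
    let budget_us = (budget.as_micros() as u64).max(1);
    ((packets * 1000 + budget_us - 1) / budget_us).max(1)
}

/// Per-slice quota: the frame-interval spread, capped by the fixed link ceiling.
/// Deliberately independent of the negotiated bitrate.
pub fn per_ms_quota(packets: u64, budget: Duration, link_bits_per_sec: u64, blocksize: u64) -> u64 {
    frame_packets_per_ms(packets, budget)
        .min(ceiling_packets_per_ms(link_bits_per_sec, LINK_HEADROOM_PCT, blocksize))
}
```

With a 1 Gbps ceiling, a 500-packet IDR over a 12.5 ms budget (0.75 × a 60 Hz interval) is governed by the frame spread at 40 packets/ms. At a 100 Mbps ceiling the quota drops to 7 packets/ms and the same IDR needs about 72 ms, far past the 16.7 ms frame interval, which is the regression the verdict warns about if the ceiling is set low or tied to the negotiated bitrate.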
#### 94. Consume the GameStream client loss-stats report
*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* low · *Effort:* small · **✓ verified**
@@ -1,4 +1,4 @@
-# M2 — P1 host: stream to a stock Moonlight client
+# GameStream host: stream to a stock Moonlight client
The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host,
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display.
@@ -68,13 +68,13 @@ Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](
handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
*Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
-layout in punktfunk-core; wire M0's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
+layout in punktfunk-core; wire the spike's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
(uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
*Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
-backend for the user's KDE box. *Acceptance: full game stream with sound — the M2 goal.*
+backend for the user's KDE box. *Acceptance: full game stream with sound — the GameStream-host goal.*
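The RTP half of the P1.3 "RTP+NV framing" is the standard RFC 3550 fixed header, sketched below as a 12-byte serializer. The NV video header and FEC shard layout that GameStream places after it are not reproduced, the 90 kHz timestamp clock is the usual RTP video convention and is assumed here rather than confirmed for GameStream, and `RtpHeader`/`rtp_ts_90khz` are hypothetical names.

```rust
/// RFC 3550 fixed RTP header: V=2, no padding, no extension, no CSRCs.
#[derive(Clone, Copy, Debug)]
pub struct RtpHeader {
    pub marker: bool,
    pub payload_type: u8,
    pub sequence: u16,
    pub timestamp: u32,
    pub ssrc: u32,
}

impl RtpHeader {
    /// Serializes the 12-byte header in network (big-endian) byte order.
    pub fn to_bytes(&self) -> [u8; 12] {
        let mut b = [0u8; 12];
        b[0] = 2 << 6; // version 2; P, X and CC all zero
        b[1] = ((self.marker as u8) << 7) | (self.payload_type & 0x7f);
        b[2..4].copy_from_slice(&self.sequence.to_be_bytes());
        b[4..8].copy_from_slice(&self.timestamp.to_be_bytes());
        b[8..12].copy_from_slice(&self.ssrc.to_be_bytes());
        b
    }
}

/// Converts a nanosecond PTS to a 90 kHz RTP timestamp (assumed clock rate);
/// the `as u32` truncation gives the modulo-2^32 wrap RTP expects.
pub fn rtp_ts_90khz(pts_ns: u64) -> u32 {
    (pts_ns as u128 * 90_000 / 1_000_000_000) as u32
}
```

Per packet, a sender would advance `sequence` with `wrapping_add(1)`, while every packet of one AU shares the AU's timestamp.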
## Crates (verified available)
+7 -7
@@ -1,7 +1,7 @@
-# Linux host setup — NVIDIA GPU VM (M0/M2)
+# Linux host setup — NVIDIA GPU VM (pipeline spike + GameStream host)
How to bring up the build environment for the punktfunk Linux host on an NVIDIA-GPU Ubuntu VM
-and run the **M0** capture→encode spike. `punktfunk-core` already builds and is tested
+and run the **pipeline spike** (capture→encode). `punktfunk-core` already builds and is tested
cross-platform; this is about the platform backends in `crates/punktfunk-host`.
> Target **Ubuntu 24.04 (noble)**: Sway 1.9, FFmpeg 6.1.1, xdg-desktop-portal 1.18.
@@ -77,7 +77,7 @@ ffprobe /tmp/punktfunk-headless-test.mkv # confirm a real H.265 stream
`wf-recorder` uses `wlr-screencopy` directly (no portal/D-Bus) — the fastest way to
de-risk the GPU encode path. **Note:** screencopy encodes straight to a file and *cannot*
-feed PipeWire; the real integration uses the ScreenCast portal (see M0). If shell 1 logged
+feed PipeWire; the real integration uses the ScreenCast portal (see the pipeline spike). If shell 1 logged
a Mesa/EGL fallback (or Sway dropped to pixman) instead of `EGL vendor: NVIDIA`, install the
NVIDIA GL userspace (§2) — the portal cannot capture a pixman output.
@@ -89,13 +89,13 @@ The wlroots-on-NVIDIA env workarounds (`WLR_RENDERER=gles2`, `WLR_NO_HARDWARE_CU
`GBM_BACKEND=nvidia-drm`, `sway --unsupported-gpu`, …) live in
`scripts/headless/env.sh`; `source` it before launching anything Wayland.
-## 4. M0 proper — wire it into `punktfunk-core`
+## 4. The spike proper — wire it into `punktfunk-core`
Goal (plan §8): headless output → PipeWire ScreenCast → NVENC → a playable file, then feed
the encoded access units into a `punktfunk_core::Session` (host role). The module seams exist
in `crates/punktfunk-host/src/{vdisplay,capture,encode,inject,pipeline}.rs`.
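The capture → encode → session flow behind those seams can be sketched as one trait-generic loop. The `Capturer`/`Encoder` shapes here (`next_frame`, `submit`, `poll`) are simplified assumptions modelled on the calls these docs attribute to the orchestration code, not the real signatures in `crates/punktfunk-host`; `run_pipeline`, the fakes, and the `sink` closure (a stand-in for handing AUs to a host-role `punktfunk_core::Session`) are hypothetical.

```rust
use std::collections::VecDeque;

/// A captured frame (simplified; real frames carry GPU or CPU pixel data).
pub struct Frame {
    pub pts_ns: u64,
}

/// One encoded access unit, the unit handed to the session.
#[derive(Debug, PartialEq)]
pub struct AccessUnit {
    pub pts_ns: u64,
    pub keyframe: bool,
    pub data: Vec<u8>,
}

pub trait Capturer {
    fn next_frame(&mut self) -> Option<Frame>;
}

pub trait Encoder {
    fn submit(&mut self, frame: Frame);
    fn poll(&mut self) -> Option<AccessUnit>;
}

/// Trait-generic capture -> encode -> sink loop; returns the number of AUs sent.
pub fn run_pipeline<C: Capturer, E: Encoder>(
    capturer: &mut C,
    encoder: &mut E,
    mut sink: impl FnMut(AccessUnit),
) -> usize {
    let mut sent = 0;
    while let Some(frame) = capturer.next_frame() {
        encoder.submit(frame);
        // Drain everything ready: submit/poll does not promise one AU per frame.
        while let Some(au) = encoder.poll() {
            sink(au);
            sent += 1;
        }
    }
    sent
}

/// Test double: yields `remaining` frames spaced one 60 Hz interval apart.
pub struct FakeCapturer {
    pub remaining: u32,
    pub next_pts_ns: u64,
}

impl Capturer for FakeCapturer {
    fn next_frame(&mut self) -> Option<Frame> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let frame = Frame { pts_ns: self.next_pts_ns };
        self.next_pts_ns += 16_666_667;
        Some(frame)
    }
}

/// Test double: one AU per frame, a keyframe every `gop` frames.
pub struct FakeEncoder {
    pending: VecDeque<AccessUnit>,
    frames: u64,
    gop: u64,
}

impl FakeEncoder {
    pub fn new(gop: u64) -> Self {
        FakeEncoder { pending: VecDeque::new(), frames: 0, gop }
    }
}

impl Encoder for FakeEncoder {
    fn submit(&mut self, frame: Frame) {
        let keyframe = self.frames % self.gop == 0;
        self.frames += 1;
        // Placeholder payload: just an Annex-B start code.
        self.pending.push_back(AccessUnit { pts_ns: frame.pts_ns, keyframe, data: vec![0, 0, 0, 1] });
    }
    fn poll(&mut self) -> Option<AccessUnit> {
        self.pending.pop_front()
    }
}
```

Because the loop only sees the traits, swapping a PipeWire capturer for a DXGI one, or a software encoder for NVENC, leaves the orchestration untouched.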
-**Status: implemented and verified end-to-end** in `crates/punktfunk-host` (`m0.rs`,
+**Status: implemented and verified end-to-end** in `crates/punktfunk-host` (`spike.rs`,
`capture/linux.rs`, `encode/linux.rs`). After the §3 bring-up:
```sh
@@ -139,10 +139,10 @@ Crate choices, verified current:
**Start with the CPU-copy fallback** (download frame → `hwupload_cuda` → `hevc_nvenc`)
to get an end-to-end stream, then chase true dmabuf zero-copy. The plan flags this
(§9) and the `capture` module already has a `cpu_bytes` fallback field.
-- **Input (M2):** [`reis`](https://crates.io/crates/reis) (pure-Rust libei — no native
+- **Input (GameStream host):** [`reis`](https://crates.io/crates/reis) (pure-Rust libei — no native
`libei` needed) with `input-linux`/uinput as the universal fallback.
-Then continue toward **M2**: `serverinfo`/RTSP/pairing enough for a stock Moonlight client
+Then continue toward the **GameStream host**: `serverinfo`/RTSP/pairing enough for a stock Moonlight client
to connect, a KWin virtual output created on connect, input via reis/uinput — the
shippable milestone.
+8 -8
@@ -7,7 +7,7 @@ the gotchas. Read it top to bottom, then start at **Phase 1** (de-risk Reactor f
## Status — WinUI 3 client landed (2026-06-15)
-The client is implemented in `crates/punktfunk-client-windows` (binary `punktfunk-client`) and is
+The client is implemented in `clients/windows` (binary `punktfunk-client`) and is
**build + clippy + fmt green on `x86_64-pc-windows-msvc`** (built on the dev VM). It is the **WinUI 3**
client this doc planned: native chrome (host list, settings, in-app SPAKE2 PIN pairing) + the video on
a **`SwapChainPanel`**, all in pure Rust.
@@ -44,9 +44,9 @@ a **`SwapChainPanel`**, all in pure Rust.
## What we're building
-A native Windows client that connects to a punktfunk/1 host (`serve --native` / `m3-host`), decodes
+A native Windows client that connects to a punktfunk/1 host (`serve --native` / `punktfunk1-host`), decodes
HEVC, presents it low-latency, plays Opus audio, and captures local mouse/keyboard/gamepad to send
-back — i.e. the Windows analogue of the **GTK4 Linux client** (`crates/punktfunk-client-linux`),
+back — i.e. the Windows analogue of the **GTK4 Linux client** (`clients/linux`),
which is the architectural template. The Windows client is close to a 1:1 port of the Linux client
with the platform layers swapped.
@@ -73,7 +73,7 @@ with the platform layers swapped.
- **Trust = shared client identity + SPAKE2 PIN pairing + TOFU** (port `trust.rs`; same identity
files/logic as the other native clients).
-## The reference: `crates/punktfunk-client-linux/src/`
+## The reference: `clients/linux/src/`
Port these files (near 1:1; only the platform layers change):
@@ -144,11 +144,11 @@ Windows client should mirror it:
`SwapChainPanel` and presents a cleared D3D11 swapchain into it. Confirm the windows-rs Reactor
version/API (PR #4479) and `ISwapChainPanelNative::SetSwapChain` interop. If Reactor proves too
raw, the fallback is `winit` + a child HWND swapchain, but try Reactor first per the decision.
-2. **Crate scaffold.** `crates/punktfunk-client-windows`, `[target.'cfg(windows)'.dependencies]`:
+2. **Crate scaffold.** `clients/windows`, `[target.'cfg(windows)'.dependencies]`:
`punktfunk-core { path, features=["quic"] }`, `windows`, the Reactor crate, `ffmpeg-next`, `opus`,
-`sdl3`, `mdns-sd`, `anyhow`, `tracing`. Mirror `crates/punktfunk-client-linux/Cargo.toml`.
+`sdl3`, `mdns-sd`, `anyhow`, `tracing`. Mirror `clients/linux/Cargo.toml`.
3. **Connect + control plane.** Port `session.rs` + `trust.rs`; validate headless against the 4090
-box (`m3-host`/`serve --native`) — handshake, PIN/TOFU, plane counters — before any UI/decode.
+box (`punktfunk1-host`/`serve --native`) — handshake, PIN/TOFU, plane counters — before any UI/decode.
4. **Decode + present.** FFmpeg D3D11VA → `SwapChainPanel`. SDR (8-bit BGRA) first, then **P010 +
HDR colorspace** (see the HDR section).
5. **Audio.** WASAPI render + Opus decode (port `audio.rs`).
@@ -158,7 +158,7 @@ Windows client should mirror it:
## Key references
-- **Template:** `crates/punktfunk-client-linux/src/*` (the client to port).
+- **Template:** `clients/linux/src/*` (the client to port).
- **Apple HDR present** (the pattern to mirror): `clients/apple/Sources/PunktfunkKit/{VideoDecoder,
MetalVideoPresenter,Stage2Pipeline}.swift` — in-band PQ detection, P010 decode, EDR present.
- **Core client API:** `crates/punktfunk-core/src/client.rs` (`NativeClient`).
+7 -7
@@ -15,7 +15,7 @@ plan + dev box + SudoVDA protocol + no-GPU strategy added 2026-06-14 (12-agent r
## Status (2026-06-15) — full pipeline live-validated on an RTX 4090
Every OS-touching backend is implemented behind the existing traits and **builds clean on
-`x86_64-pc-windows-msvc`** (and Linux unaffected). `serve --native` / `m3-host` **run on Windows**
+`x86_64-pc-windows-msvc`** (and Linux unaffected). `serve --native` / `punktfunk1-host` **run on Windows**
(identity in `%APPDATA%`, QUIC bound, mDNS advertising, accepting sessions). The **full native
pipeline is validated live on a real RTX 4090** (Windows 11): SudoVDA virtual display → DXGI
Desktop Duplication (D3D11 zero-copy) → **NVENC HEVC** → punktfunk/1 → Rust reference client, at
@@ -30,7 +30,7 @@ coexisting with a running Apollo (two concurrent NVENC sessions).
| Audio (WASAPI loopback) | ✅ done | live: init chain opens (silent VM → no samples) |
| Capture (DXGI Desktop Duplication) | ✅ **live on RTX 4090** | SudoVDA monitor → D3D11 zero-copy duplication; output is enumerated under the *rendering* GPU, not the SudoVDA LUID (search all adapters) |
| NVENC (D3D11, `--features nvenc`) | ✅ **live on RTX 4090** | 720p60 + 1080p60 HEVC end-to-end to the Rust client; ffmpeg-decodes clean; ran alongside Apollo (2 NVENC sessions) |
-| Run host (serve/m3-host) | ✅ live | m3-host starts + listens; `c_abi_connection_roundtrip` passes |
+| Run host (serve/punktfunk1-host) | ✅ live | punktfunk1-host starts + listens; `c_abi_connection_roundtrip` passes |
| Gamepad (ViGEm) | ✅ done | compiles incl. rumble back-channel; live needs ViGEmBus + a physical pad |
| Host→client audio wiring | ✅ done | builds on MSVC; `m3` `audio_thread` active on Windows (silent VM → no samples to send) |
| GameStream (Moonlight) audio | ✅ done | stereo path active on Windows (WASAPI→Opus→RTP/FEC); surround stays Linux-only (libopus multistream / `audiopus_sys`) |
@@ -146,7 +146,7 @@ rustc 1.96 clippy is stricter than the Linux CI image on shared code, e.g. `need
3. `cargo build -p punktfunk-host --features nvenc` (needs NASM + CMake for aws-lc-rs; libclang for
any ffmpeg-using client). Default build (no feature) uses the openh264 software encoder.
4. Run in the **interactive session** (not a Session-0 service / not over SSH — SendInput + DXGI
-Desktop Duplication need a desktop): `serve --native` or `m3-host --source virtual`. Set
+Desktop Duplication need a desktop): `serve --native` or `punktfunk1-host --source virtual`. Set
`PUNKTFUNK_ENCODER=nvenc` to select NVENC (the DXGI capturer switches to zero-copy D3D11 output to
match). The SudoVDA monitor activates once a real GPU drives WDDM, so capture + NVENC then work.
@@ -186,7 +186,7 @@ CMake, libclang.
| `punktfunk-core` (protocol, FEC, crypto, session, transport, QUIC control plane, C ABI) | Zero platform deps; already compiles on Windows MSVC |
| GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet) *except* the capture/encode/audio backends | pure protocol |
| Management REST API (`mgmt.rs`) + OpenAPI, `native_pairing`, `discovery` | axum/tokio/quinn — portable |
-| `m3.rs` / `m0.rs` / `pipeline.rs` orchestration | trait-generic: call `capturer.next_frame()`, `encoder.submit/poll()`, `vd.create(mode)` — no changes |
+| `punktfunk1.rs` / `spike.rs` / `pipeline.rs` orchestration | trait-generic: call `capturer.next_frame()`, `encoder.submit/poll()`, `vd.create(mode)` — no changes |
| The trait boundaries: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral; Linux deps already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |
## Step 0 — make `punktfunk-host` compile on `x86_64-pc-windows-msvc` — ✅ DONE (2026-06-14)
@@ -230,7 +230,7 @@ This is the highest-value first move and is **fully doable GPU-less**.
| **Audio capture** | PipeWire sink monitor | **WASAPI loopback** via the `wasapi` crate (48 kHz stereo f32 → existing Opus) | ⚠️ needs an audio endpoint |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio driver (`Virtual-Audio-Driver`) or defer | ❌ second driver — defer |
-`m3.rs`/`m0.rs`/`pipeline.rs` are unchanged. Note: the Windows capture needs its own
+`punktfunk1.rs`/`spike.rs`/`pipeline.rs` are unchanged. Note: the Windows capture needs its own
`capture_virtual_output` entry point (the SudoVDA identity is a DXGI adapter LUID + DisplayConfig
TargetId → GDI `\\.\DisplayN`, which doesn't fit the PipeWire `node_id: u32` field — carry it inside
the `keepalive` / a Windows-specific seam rather than overloading `node_id`).
@@ -311,7 +311,7 @@ glass-to-glass / throughput numbers (no perf claim transfers from Linux).
`\\.\DisplayN` resolution (needs a GPU), then `SetDisplayConfig` mid-stream `Reconfigure`.
2. **Capture + SW encode** — DXGI Desktop Duplication (or WGC) → `ID3D11Texture2D` → CPU staging →
openh264 → existing FEC/transport. First end-to-end Windows session, GPU-less, against the Linux
-`punktfunk-client-rs` or the new Windows client.
+`punktfunk-probe` or the new Windows client.
3. **Input** — SendInput (kbd/mouse, VIRTUALDESK mapping) + interactive-session/desktop-reattach.
4. **Gamepad + audio** — ViGEm + rumble; WASAPI loopback.
5. **HW encode (real-GPU box)** — `nvidia-video-codec-sdk` D3D11 zero-copy; DDA-vs-WGC bake-off;
@@ -320,7 +320,7 @@ glass-to-glass / throughput numbers (no perf claim transfers from Linux).
## The Windows client (separate track, pure Rust)
-Structurally a sibling of `crates/punktfunk-client-linux` (GTK4) — same shape, different toolkit:
+Structurally a sibling of `clients/linux` (GTK4) — same shape, different toolkit:
- **UI**: `windows-rs` + **Windows Reactor** (WinUI 3) for native chrome. Link `punktfunk-core`
directly (no C ABI). **De-risk early**: a Reactor window with a `SwapChainPanel` presenting a
+4 -4
@@ -11,7 +11,7 @@ need conflicting process tokens.
Implemented so far:
- **Step 1 — DesktopWatcher** (`capture/desktop_watch.rs`): polls the input-desktop name → atomic
`Default`/`Winlogon`. Committed `80e222d`.
-- **Step 3 — WGC helper subcommand** (`wgc_helper.rs`, `m3-host wgc-helper`): WGC→NVENC→framed AUs on
+- **Step 3 — WGC helper subcommand** (`wgc_helper.rs`, `punktfunk1-host wgc-helper`): WGC→NVENC→framed AUs on
stdout, stdin keyframe control. Committed `a0f6cdd`.
- **Step 4 — spawn + relay** (`capture/wgc_relay.rs`, `m3::virtual_stream_relay`): SYSTEM host spawns
the helper via `CreateProcessAsUserW` into `winsta0\default`, relays its stdout AUs to the QUIC send
@@ -64,12 +64,12 @@ proven by the timed toggle, and DDA-on-Winlogon capture + input by the single-pr
## Architecture: SYSTEM host + USER-session WGC helper, AU-relay (no shared GPU texture)
-- **SYSTEM host** (the existing `m3-host`, launched as SYSTEM in interactive Session 1 via the
+- **SYSTEM host** (the existing `punktfunk1-host`, launched as SYSTEM in interactive Session 1 via the
scheduled task → PsExec `-s -i 1`): owns the punktfunk/1 QUIC session, the single SudoVDA virtual
output (+ isolate/restore RAII — the *only* topology owner), the **DDA capture + NVENC encoder for
the secure desktop**, the **single SendInput injector** (serves *both* desktops), and the **AU
source mux** that feeds the QUIC data plane.
-- **USER-session WGC helper** (a new `m3-host` subcommand, spawned by the SYSTEM host via
+- **USER-session WGC helper** (a new `punktfunk1-host` subcommand, spawned by the SYSTEM host via
`WTSQueryUserToken(activeConsoleSessionId)` → `DuplicateTokenEx(TokenPrimary)` →
`CreateProcessAsUserW(lpDesktop="winsta0\\default", CREATE_NO_WINDOW)`): runs the existing
**WGC → scRGB/PQ → NVENC** pipeline and ships **Annex-B AUs** (`{data, pts_ns, keyframe}`) to the
@@ -104,7 +104,7 @@ miss UAC entirely. (May also register `WTSRegisterSessionNotification` to short-
2. **SendInput retry-on-failure model** (`inject/sendinput.rs`): replace per-event reattach with the
cached-HDESK + retry-once model. Test: normal input unchanged; click UAC + type the lock password
land (works today via per-event reattach — this is a refactor).
-3. **WGC helper subcommand** (`m3-host wgc-helper` or similar): the existing WGC pipeline → NVENC →
+3. **WGC helper subcommand** (`punktfunk1-host wgc-helper` or similar): the existing WGC pipeline → NVENC →
Annex-B AUs over a named-pipe server. Test standalone: as the user it writes a valid `.h265` to the
pipe (capturing the SudoVDA output by GDI name, no topology changes).
4. **Spawn + relay**: SYSTEM host spawns the helper (`CreateProcessAsUserW`), connects the pipe,