docs(design): trim shipped plans, consolidate cluster, add index

Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00
parent 9ea2c17419
commit 7b99b41ede
27 changed files with 1322 additions and 3229 deletions
+39 -136
@@ -1,5 +1,7 @@
# Apollo vs punktfunk — architecture map & transferable improvements
> **Status:** Reference doc — an Apollo↔punktfunk architecture map plus a 96-item transferable-improvement backlog. About a third of the backlog has since shipped or gone obsolete (those items are collapsed to one-liners below); the rest is still open with full citations. The **Re-verified status (2026-06-20)** section is the authoritative shipped-status record.
> Generated 2026-06-16 by the `apollo-vs-punktfunk` multi-agent workflow, then reconstructed from
> the run journal after the live run was interrupted. **Apollo** = `~/Apollo` (commit `adc5c5a0`),
> a C++ fork of Sunshine — a Moonlight-compatible streaming **host only** (no client of its own).
@@ -680,7 +682,7 @@ Both transports use the persistent `AudioCapSlot` (gamestream/audio.rs:251-257)
### Input handling & injection — 🔴 Apollo ahead
For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp). punktfunk's Windows host covers mouse/keyboard/scroll/X360-only; touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and only the Xbox 360 virtual pad exists on Windows. Apollo also has the more efficient secure-desktop model (retry-only) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread — punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo, but those are punktfunk/1-specific; on the shared Windows-host injection surface Apollo is the more complete, battle-tested implementation. punktfunk's design/windows-secure-desktop.md already flags the retry-only refactor as planned-but-unshipped, confirming the gap.
For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp). punktfunk's Windows host covers mouse/keyboard/scroll/X360-only; touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and only the Xbox 360 virtual pad exists on Windows. Apollo also has the more efficient secure-desktop model (retry-only) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread — punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo, but those are punktfunk/1-specific; on the shared Windows-host injection surface Apollo is the more complete, battle-tested implementation. punktfunk's design/archive/windows-secure-desktop.md already flags the retry-only refactor as planned-but-unshipped, confirming the gap.
**How punktfunk does it.**
@@ -748,7 +750,7 @@ For the Windows host specifically, Apollo is clearly ahead on this subsystem. Ap
- punktfunk has TWO app surfaces by design: the GameStream apps.json catalog (Moonlight compat) AND a richer punktfunk/1 library (Steam local scan + custom store + CDN art + uniform GameEntry grid). Apollo has only the apps.json catalog because it ships no client.
- punktfunk's launch security model is deliberately client-can't-inject: the client sends only a store-qualified id and the host resolves it against its OWN library (library.rs:394-412), with steam appid validated digits-only. Apollo trusts its own apps.json cmds (it has no untrusted remote launch id).
- punktfunk keeps NO async on the per-frame path; the SudoVDA watchdog pinger and capture are native threads. Apollo's libdisplaydevice RetryScheduler is its own machinery; punktfunk has no equivalent scheduler by choice (yet — see candidate improvements).
- punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically to capture the secure/Winlogon desktop — a deliberate, documented design (design/windows-secure-desktop.md) that goes beyond what stock Apollo needs.
- punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically to capture the secure/Winlogon desktop — a deliberate, documented design (design/archive/windows-secure-desktop.md) that goes beyond what stock Apollo needs.
**Transfer candidates from Apollo (6):** _Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)_, _Display-config apply/revert with a retry scheduler and guaranteed revert on disconnect_, _Set HDR on the virtual display and advertise IsHdrSupported when the client requests it_, _Per-(app,client) stable virtual-display GUID instead of one fixed MONITOR_GUID_, _Inject per-app launch env (client res/fps/HDR/audio + status) for launch scripts_, _auto_detach heuristic for launcher-style apps (Steam/UWP) that exit immediately_ — see Part 4.
@@ -897,11 +899,11 @@ QPC values from `LastPresentTime`/`LastMouseUpdateTime` are translated to `stead
#### Transfer opportunities
- **Treat S_OK-with-no-change frames as timeouts via DXGI update flags** (sev high, medium) — In dxgi.rs acquire(), after a successful AcquireNextFrame, compute frame_update_flag = info.LastPresentTime != 0 (and/or info.AccumulatedFrames != 0) and mouse_update_flag from LastMouseUpdateTime/PointerShapeBufferSize. Always call update_cursor (mouse). If !frame_update_flag, ReleaseFrame and return Ok(None) (so next_frame repeats last_present) UNLESS the cursor moved and we need a recomposite — in which case recomposite onto the existing last_present texture instead of CopyResource'ing the source. This cuts idle/cursor-only GPU load and avoids re-encoding unchanged content.
- **Detect resolution/format change on the acquire hot path, not only during rebuild** (sev high, small) — In acquire(), after res.cast::<ID3D11Texture2D>(), call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution + HDR-toggle changes robust even when DDA doesn't fault.
- **Release the duplication device lock during idle to avoid encoder starvation** (sev medium, small) — Cap the per-acquire DDA timeout to a small value (e.g. 8-16ms) and, when it returns WAIT_TIMEOUT, std::thread::sleep a few ms with no outstanding AcquireNextFrame before retrying — so the encode thread can grab the device for NVENC setup/reinit. Keep the generous timeout only for first_frame. Low risk, directly mirrors Apollo's documented fix.
- **Detect resolution/format change on the acquire hot path, not only during rebuild** — SHIPPED (2026-06-20). [#2]
- **Release the duplication device lock during idle to avoid encoder starvation** — OBSOLETE / not-a-bug (2026-06-20). [#34]
- **Add client-framerate frame pacing with a high-precision timer** (sev medium, large) — Add an optional pacing layer (in dxgi.rs or the encode-loop caller in punktfunk1.rs/encode.rs) keyed on the negotiated client framerate: track a group start from the frame pts, sleep to the computed target with a Windows high-resolution timer (timeBeginPeriod or CREATE_WAITABLE_TIMER_HIGH_RESOLUTION), and snap near-integral refresh to integer divisors. This is the lever for steady pacing on odd refresh rates without changing the zero-copy design.
- **Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance** (sev medium, medium) — After D3D11CreateDevice in dxgi.rs (and the NVENC encoder device wherever it's built), query IDXGIDevice1::SetMaximumFrameLatency(1) and SetGPUThreadPriority; load gdi32 D3DKMTSetProcessSchedulingPriorityClass and request HIGH (not REALTIME) when the adapter is NVIDIA (VendorId 0x10DE) with HAGS on, REALTIME otherwise. Mirror the privilege-enable. Guard behind admin/SYSTEM (host already relaunches as SYSTEM).
- **Retry DuplicateOutput at startup and request encoder-supported formats via Output5** (sev medium, small) — In open() wrap DuplicateOutput in a short retry (2-3 tries, ~200ms apart, re-attach_input_desktop between) before bailing. Optionally cast the output to IDXGIOutput5 and call DuplicateOutput1 with an explicit format list (BGRA8 for SDR, R16G16B16A16_FLOAT for HDR) so the capture format is intentional rather than incidental, falling back to DuplicateOutput when Output5 is absent.
- **Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance** — SHIPPED (2026-06-20). [#47]
- **Retry DuplicateOutput at startup and request encoder-supported formats via Output5** — SHIPPED (2026-06-20). [#35]
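The first item in the list above (treating S_OK-with-no-change frames as timeouts) boils down to a three-way decision per acquire. The following is a minimal sketch of that decision only, assuming a plain struct that mirrors four fields of `DXGI_OUTDUPL_FRAME_INFO`; `FrameInfo`, `AcquireOutcome`, and `classify_acquire` are hypothetical names, not identifiers from dxgi.rs, and the "cursor visible" condition is an assumption about when a recomposite is worthwhile.

```rust
/// Hypothetical mirror of the DXGI_OUTDUPL_FRAME_INFO fields used for change detection.
#[derive(Debug, Clone, Copy, Default)]
pub struct FrameInfo {
    pub last_present_time: i64,
    pub accumulated_frames: u32,
    pub last_mouse_update_time: i64,
    pub pointer_shape_buffer_size: u32,
}

/// What the acquire path should do with a successfully acquired frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AcquireOutcome {
    /// Desktop content changed: copy the source and encode it.
    NewFrame,
    /// Only the pointer changed: recomposite onto the existing last-present texture.
    RecompositeCursor,
    /// Nothing changed: release the frame and repeat the last presented one.
    RepeatLast,
}

pub fn classify_acquire(info: &FrameInfo, cursor_visible: bool) -> AcquireOutcome {
    // A zero LastPresentTime with no accumulated frames means no desktop update.
    let frame_update = info.last_present_time != 0 || info.accumulated_frames != 0;
    // A non-zero mouse timestamp or a new shape buffer means the pointer changed.
    let mouse_update =
        info.last_mouse_update_time != 0 || info.pointer_shape_buffer_size != 0;

    if frame_update {
        AcquireOutcome::NewFrame
    } else if mouse_update && cursor_visible {
        // Assumption: an invisible cursor never needs a recomposite.
        AcquireOutcome::RecompositeCursor
    } else {
        AcquireOutcome::RepeatLast
    }
}
```

In this sketch the cursor state would still be updated on every acquire; only the copy and encode work is skipped for `RepeatLast`.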
### Windows.Graphics.Capture (WGC) path — Apollo vs punktfunk
@@ -1099,10 +1101,10 @@ punktfunk's cursor handling lives in `crates/punktfunk-host/src/capture/dxgi.rs`
#### Transfer opportunities
- ✅ **DONE (2026-06-16)** **Split every cursor shape into an alpha image + an XOR image (two-pass composite)** (sev high, medium) — Refactor convert_pointer_shape in dxgi.rs to return two optional images (alpha, xor) mirroring Apollo's split. Store cursor_shape as Option<(alpha, xor)>, upload up to two SRVs in CursorCompositor, and in composite_cursor_gpu run the alpha pass with self.blend then the xor pass with self.blend_invert (skip empties). Drop the single cursor_invert flag.
- **Render the monochrome 'inverse of screen' pixels via the XOR pass instead of dropping them** (sev medium, small) — In convert_pointer_shape's monochrome branch (dxgi.rs:628-654), once the dual-pass split (above) exists, route code (1,1) to the XOR image as white and codes (0,0)/(0,1) to the alpha image as opaque black/white, matching Apollo's case mapping.
- ⊘ **ALREADY-HANDLED (2026-06-16; premise incorrect — DDA returns S_OK on pointer-only updates, punktfunk recomposites)** **Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame** (sev high, large) — Keep a clean intermediate copy of the last desktop frame (an extra DEFAULT texture). In acquire (dxgi.rs:1341), when AcquireNextFrame times out but update_cursor saw a position change (LastMouseUpdateTime changed) and the cursor is visible, copy the clean intermediate into gpu_copy and re-run composite_cursor_gpu, then return that as a fresh frame instead of repeating last_present.
- **Stop baking the cursor destructively into the repeated gpu_copy texture** (sev medium, medium) — Add a clean base texture: CopyResource(duplication -> clean_base), then CopyResource(clean_base -> gpu_copy) and composite onto gpu_copy. Repeat clean_base (cursor-free) plus a re-composite on repeats. Also create the cursor RTV once per gpu_copy and cache it rather than CreateRenderTargetView every composite (dxgi.rs:1181-1184).
- ✅ **Split every cursor shape into an alpha image + an XOR image (two-pass composite)** — SHIPPED (2026-06-16; capture/dxgi.rs). [#13]
- **Render the monochrome 'inverse of screen' pixels via the XOR pass instead of dropping them** — SHIPPED (2026-06-20). [#37]
- ⊘ **Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame** — NOT-A-BUG (2026-06-16; DDA returns S_OK on pointer-only updates and punktfunk recomposites). [#21]
- **Stop baking the cursor destructively into the repeated gpu_copy texture** — SHIPPED (2026-06-20). [#49]
- **Handle rotated outputs in cursor positioning** (sev low, medium) — Read rotation from DXGI_OUTDUPL_DESC.Rotation when opening/rebuilding the duplication (around dxgi.rs:888 and 1298), store it on DuplCapturer, and apply Apollo's rotation transform when computing the NDC rect in CursorCompositor::draw and when sampling the cursor texture in the VS.
- **Validate masked-color mask bytes and log illegal values** (sev low, small) — In the MASKED_COLOR branch of convert_pointer_shape (dxgi.rs:594-627), branch explicitly on mask==0x00 vs mask==0xFF and emit a tracing::warn! once for any other value, matching Apollo's guard, so future cursor-render bugs are observable.
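The monochrome case mapping mentioned in the list above (codes (0,0)/(0,1) to the alpha image, code (1,1) to the XOR image) can be sketched per pixel as follows. This is an illustration under stated assumptions: the code pair is taken as (AND bit, XOR bit) per the standard Windows monochrome-cursor semantics, pixels are BGRA, and `split_monochrome_pixel` plus the colour constants are hypothetical names rather than anything in convert_pointer_shape.

```rust
/// One BGRA pixel. The layout is an assumption for this sketch.
pub type Bgra = [u8; 4];

pub const TRANSPARENT: Bgra = [0, 0, 0, 0];
pub const OPAQUE_BLACK: Bgra = [0, 0, 0, 255];
pub const OPAQUE_WHITE: Bgra = [255, 255, 255, 255];

/// Split one monochrome cursor pixel into (alpha-pass pixel, xor-pass pixel).
pub fn split_monochrome_pixel(and_bit: bool, xor_bit: bool) -> (Bgra, Bgra) {
    match (and_bit, xor_bit) {
        // AND=0, XOR=0: opaque black, drawn by the alpha pass.
        (false, false) => (OPAQUE_BLACK, TRANSPARENT),
        // AND=0, XOR=1: opaque white, drawn by the alpha pass.
        (false, true) => (OPAQUE_WHITE, TRANSPARENT),
        // AND=1, XOR=0: screen shows through, so neither pass draws.
        (true, false) => (TRANSPARENT, TRANSPARENT),
        // AND=1, XOR=1: inverse of screen, i.e. white in the XOR pass.
        (true, true) => (TRANSPARENT, OPAQUE_WHITE),
    }
}
```

A shape whose XOR image ends up fully transparent could then skip the second pass, matching the "skip empties" note in the two-pass item.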
@@ -1295,10 +1297,10 @@ punktfunk drives the **raw NVENC API** via `nvidia_video_codec_sdk::{sys, ENCODE
#### Transfer opportunities
- **Add real reference-frame invalidation (RFI) instead of always forcing IDR** (sev high, large) — In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.
- **Query nvEncGetEncodeCaps and gate config on real GPU capabilities** (sev medium, medium) — Add a `get_cap(cap: NV_ENC_CAPS) -> i32` helper in nvenc.rs after open_encode_session_ex (using API.get_encode_caps), verify codec_guid is in get_encode_guids, reject out-of-range WxH up front, and use SUPPORT_10BIT_ENCODE / SUPPORT_REF_PIC_INVALIDATION / SUPPORT_CUSTOM_VBV_BUF_SIZE to gate the corresponding config rather than assuming support. Surfaces clear errors instead of opaque InvalidParam.
- **Add real reference-frame invalidation (RFI) instead of always forcing IDR** — SHIPPED (2026-06-20; NVENC impl CI-pending). [#22]
- **Query nvEncGetEncodeCaps and gate config on real GPU capabilities** — SHIPPED (2026-06-20; CI-pending). [#51]
- **Use async encode with a Win32 completion event + timeout** (sev medium, medium) — In nvenc.rs, gate on NV_ENC_CAPS_ASYNC_ENCODE_SUPPORT, create a per-bitstream Win32 Event (windows::Win32::System::Threading::CreateEventW), set init.enableEncodeAsync=1, store the event in `pending`, set pic.completionEvent + lock.doNotWait=1, and in poll() WaitForSingleObject(ev, 100ms) before lock_bitstream — returning a clear timeout error instead of blocking forever.
- **Minimize NvEnc API/struct versions per codec for older-driver compatibility** (sev medium, medium) — Add a `min_api_version(codec)` (v11 for H264/HEVC, v12 for AV1) and a helper that rewrites the version word (and optionally the struct-revision byte) before each NvEnc struct is passed, mirroring nvenc_base.cpp:666-680. Set apiVersion in open_encode_session_ex (nvenc.rs:186) from it. Maximizes driver compatibility for the field.
- **Minimize NvEnc API/struct versions per codec for older-driver compatibility** — OBSOLETE (2026-06-20; handled by the SDK crate). [#53]
- **Add zeroReorderDelay/lookahead-off/lowDelayKeyFrameScale and always emit SDR VUI** (sev low, small) — In init_session set cfg.rcParams.zeroReorderDelay=1, enableLookahead=0, lowDelayKeyFrameScale=1 right after the CBR/VBV block (nvenc.rs:220-227). Add an SDR VUI branch (BT.709 primaries/transfer/matrix, limited range) alongside the existing HDR branch (:243) so every HEVC/H264 stream signals its colorspace.
- **Honor client slices-per-frame and offer NVENC intra-refresh** (sev low, medium) — Thread a slices-per-frame value from session negotiation into NvencD3d11Encoder::open and set hevcConfig/h264Config sliceMode=3 + sliceModeData in init_session; for AV1 set numTileRows/numTileColumns as nearest powers of two. Optionally add an intra-refresh config branch gated on NV_ENC_CAPS_SUPPORT_INTRA_REFRESH as an alternative recovery mode to RFI.
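The "dedup/escalate-to-IDR-on-overflow" bookkeeping named in the RFI item above is independent of the NVENC calls themselves. The sketch below is one plausible reading of it, not Apollo's or punktfunk's actual logic: `RfiTracker`, `RfiOutcome`, and the rule that a range at least as wide as the DPB forces an IDR are all assumptions made for illustration.

```rust
/// Decision for one client RFI request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RfiOutcome {
    /// Invalidate these frame indices in the encoder.
    Invalidate { first: u64, last: u64 },
    /// The range was already invalidated by an earlier request.
    AlreadyCovered,
    /// The request cannot be served by RFI; fall back to a keyframe.
    ForceIdr,
}

/// Hypothetical tracker for the last encoded frame and the last invalidated range.
#[derive(Debug)]
pub struct RfiTracker {
    dpb_frames: u64,
    last_encoded: u64,
    last_rfi: Option<(u64, u64)>,
}

impl RfiTracker {
    pub fn new(dpb_frames: u64) -> Self {
        Self { dpb_frames, last_encoded: 0, last_rfi: None }
    }

    /// Record the index of the most recently encoded frame.
    pub fn on_frame_encoded(&mut self, index: u64) {
        self.last_encoded = index;
    }

    pub fn request(&mut self, first: u64, last: u64) -> RfiOutcome {
        // Malformed ranges or frames we never encoded: be conservative.
        if first > last || last > self.last_encoded {
            return RfiOutcome::ForceIdr;
        }
        // Dedup: a range inside the previous one needs no new work.
        if let Some((f, l)) = self.last_rfi {
            if f <= first && last <= l {
                return RfiOutcome::AlreadyCovered;
            }
        }
        // Overflow (assumed rule): a range as wide as the DPB leaves no clean reference.
        if last - first + 1 >= self.dpb_frames {
            return RfiOutcome::ForceIdr;
        }
        self.last_rfi = Some((first, last));
        RfiOutcome::Invalidate { first, last }
    }
}
```

A caller would map `ForceIdr` to the existing `request_keyframe()` fallback and `Invalidate` to per-index invalidation calls.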
@@ -1492,8 +1494,8 @@ punktfunk's SudoVDA backend lives in `crates/punktfunk-host/src/vdisplay/sudovda
- **Detect watchdog ping failures and escalate (re-open the device)** (sev high, medium) — In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
- **Gate on SudoVDA protocol-version compatibility instead of only logging it** (sev medium, small) — In SudoVdaDisplay::new (sudovda.rs:412-432) parse {Major,Minor,Incremental} and compare against a compiled-in EXPECTED_PROTOCOL {Major:0,Minor:2}. If Major differs or our Minor > driver Minor, return Err with a 'driver too old / incompatible — update SudoVDA' message (and a distinct error variant the mgmt API can surface, like Apollo's VirtualDisplayDriverReady in nvhttp.cpp:936).
- **Retry device open with exponential backoff** (sev medium, small) — Wrap open_device in SudoVdaDisplay::new (sudovda.rs:412-413) in a 20→320ms backoff loop matching Apollo; on a session-time re-open after watchdog failure, allow a few retries with ~1s spacing.
- **Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU** (sev high, medium) — Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs. Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter.
- **Derive a stable per-client MonitorGuid instead of one global constant** (sev medium, medium) — Pass a client/session identifier into create() (thread it from the m3 handshake) and derive the GUID deterministically from it (e.g. hash the client cert fingerprint into a u128), replacing the constant at sudovda.rs:452-456 and the RemoveParams guid at sudovda.rs:568. Keep a fixed probe GUID for the startup encoder probe like Apollo's PROBE_DISPLAY_UUID.
- **Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU** — SHIPPED (2026-06-20). [#16]
- **Derive a stable per-client MonitorGuid instead of one global constant** — SHIPPED (2026-06-20). [#55]
- **Add millihertz CCD mode-set with ±1 Hz fallback and SDC_SAVE_TO_DATABASE persistence** (sev medium, medium) — In set_active_mode (sudovda.rs:146-265), after the integer DEVMODE attempt add a CCD path: QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS), match the path by GDI name, set sourceMode width/height and targetInfo.refreshRate = {hz,1000}, and call SetDisplayConfig with SDC_APPLY|SDC_USE_SUPPLIED_DISPLAY_CONFIG|SDC_SAVE_TO_DATABASE. Add an alt-rate (±1) retry mirroring virtual_display.cpp:294-300.
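The protocol-version gate described in the list above (reject when Major differs or when the expected Minor exceeds the driver's Minor) is small enough to sketch in full. The field widths, the `ProtocolError` variants, and the function name `check_protocol` are assumptions; only the `{Major:0, Minor:2}` expectation and the comparison rule come from the item itself.

```rust
/// Version triple reported by the driver. Field widths are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProtocolVersion {
    pub major: u8,
    pub minor: u8,
    pub incremental: u8,
}

/// Compiled-in expectation, per the item above.
pub const EXPECTED_PROTOCOL: ProtocolVersion =
    ProtocolVersion { major: 0, minor: 2, incremental: 0 };

/// Hypothetical error variants a management API could surface.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolError {
    MajorMismatch { expected: u8, found: u8 },
    DriverTooOld { required_minor: u8, found_minor: u8 },
}

pub fn check_protocol(driver: ProtocolVersion) -> Result<(), ProtocolError> {
    // A different major version is treated as incompatible in either direction.
    if driver.major != EXPECTED_PROTOCOL.major {
        return Err(ProtocolError::MajorMismatch {
            expected: EXPECTED_PROTOCOL.major,
            found: driver.major,
        });
    }
    // A driver with a lower minor than we expect lacks features we rely on.
    if EXPECTED_PROTOCOL.minor > driver.minor {
        return Err(ProtocolError::DriverTooOld {
            required_minor: EXPECTED_PROTOCOL.minor,
            found_minor: driver.minor,
        });
    }
    // The incremental component is informational only in this sketch.
    Ok(())
}
```

A newer driver minor is accepted here on the assumption that minor bumps are backward compatible.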
### Windows host: running as SYSTEM, secure-desktop capture, session/desktop switching + D3D recreation, NVIDIA driver prefs (nvprefs), GPU/adapter preference, display isolation, mDNS publish
@@ -1555,7 +1557,7 @@ punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely matu
##### Where punktfunk is weaker / missing / fragile
1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`design/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`design/windows-secure-desktop.md:89`). PsExec is a 3rd-party SysInternals tool, not redistributable cleanly, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`.
1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`design/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`design/archive/windows-secure-desktop.md:89`). PsExec is a 3rd-party SysInternals tool, not redistributable cleanly, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`.
2. **No nvprefs / NvAPI at all.** `grep` for `nvprefs|NvAPI|DRS_|PREFERRED_PSTATE|DXPRESENT` across the host returns nothing. No PREFERRED_PSTATE_MAX for the encoder, no OGL_CPL_PREFER_DXPRESENT (so GL/Vulkan fullscreen apps may not be capturable via WGC/DDA), and no undo-file crash safety.
3. **No DXGI GPU-preference / output-reparenting hook.** No MinHook of `NtGdiDdDDIGetCachedHybridQueryValue`. On a hybrid/Optimus box DXGI can reparent the SudoVDA output onto the render GPU and break DDA. punktfunk's "search all adapters" partly papers over this but does not prevent the reparenting itself.
4. **mDNS uses the cross-platform `mdns-sd` crate, not Windows-native `DnsServiceRegister`** (`discovery.rs:17`). It works, but it does NOT carry Apollo's RFC-1035 empty-TXT fix — and the GameStream/Moonlight mDNS path on Windows is unverified (`design/windows-host.md:46`). A non-RFC-compliant TXT can be rejected by Apple's resolver.
@@ -1567,12 +1569,12 @@ punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely matu
#### Transfer opportunities
- **Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change** (sev high, large) — Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.
- **Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change** — SHIPPED (2026-06-20). [#24]
- **Add an NvAPI driver-settings manager (PREFERRED_PSTATE_MAX + OGL_CPL_PREFER_DXPRESENT) with a crash-safe undo file** (sev medium, large) — Add a windows-only nvprefs module wrapping NvAPI DRS (load nvapi64 dynamically, treat NvAPI_Initialize failure as 'no NVIDIA, skip'). Create a 'punktfunk' app profile with PREFERRED_PSTATE_PREFER_MAX, set OGL_CPL_PREFER_DXPRESENT_ENABLED on the base profile behind a config flag, write an undo file under %ProgramData%\\punktfunk before global changes, and call it on session start (the new stream_will_start hook below).
- **Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs** (sev medium, medium) — Add a once-init in the Windows capture path (capture/dxgi.rs open) that installs the same hook via a minhook-rs/detour crate (or a manual IAT/inline hook) on NtGdiDdDDIGetCachedHybridQueryValue forcing STATE_UNSPECIFIED, plus SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2). Gate it to NVIDIA/hybrid boxes; it's process-lifetime so no teardown needed.
- **Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs** — SHIPPED (2026-06-20). [#57]
- **Add a Windows stream_will_start/stop hook: timer resolution, MMCSS, HIGH_PRIORITY_CLASS, display-required, headless Mouse Keys** (sev medium, medium) — Add a windows-only RAII guard invoked when a session starts (punktfunk1.rs/pipeline session setup) that raises timer resolution (NtSetTimerResolution or timeBeginPeriod(1)), DwmEnableMMCSS(true), SetPriorityClass(HIGH_PRIORITY_CLASS), and wraps the DXGI capture loop in SetThreadExecutionState(ES_CONTINUOUS|ES_DISPLAY_REQUIRED) (capture/dxgi.rs next_frame loop), reverting on drop. Optionally the headless Mouse-Keys trick for cursor visibility.
- **Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host** (sev low, medium) — Either (a) verify mdns-sd always emits an RFC-1035-valid TXT (never zero strings) and add a regression test, or (b) add a windows-only discovery backend using DnsServiceRegister via the windows crate's DNS APIs mirroring publish.cpp, including the single-empty-TXT workaround, so Apple NWBrowser/Moonlight discover the host reliably.
- **Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime** (sev medium, small) — In capture/dxgi.rs next_frame, query the cached IDXGIFactory's IsCurrent() once per loop and trigger the existing recreate path when it goes false (catches HDR/topology changes cleanly). Replace now_ns() on Windows with GetSystemTimePreciseAsFileTime converted to Unix-epoch ns so ClockProbe/ClockEcho skew correction stays accurate cross-machine.
- **Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host** — SHIPPED (2026-06-20). [#87]
- **Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime** — SHIPPED (2026-06-20). [#42]
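The clock half of the last item above depends on converting a `FILETIME` (100 ns ticks since 1601-01-01 UTC) into Unix-epoch nanoseconds. A minimal sketch of that arithmetic follows; it takes the two `FILETIME` halves as plain integers so it does not depend on the Windows API, and `filetime_to_unix_ns` is a hypothetical helper name.

```rust
/// Number of 100 ns ticks between 1601-01-01 and 1970-01-01 (11_644_473_600 seconds).
pub const FILETIME_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;

/// Convert FILETIME halves (dwHighDateTime, dwLowDateTime) to Unix-epoch nanoseconds.
/// Returns None for timestamps before 1970 or on overflow.
pub fn filetime_to_unix_ns(high: u32, low: u32) -> Option<u64> {
    // Reassemble the 64-bit tick count from its two 32-bit halves.
    let ticks = ((high as u64) << 32) | (low as u64);
    // Shift the epoch, then scale 100 ns ticks to nanoseconds.
    ticks
        .checked_sub(FILETIME_UNIX_EPOCH_TICKS)?
        .checked_mul(100)
}
```

On Windows the inputs would come from `GetSystemTimePreciseAsFileTime`; the checked arithmetic keeps a bogus timestamp from wrapping silently.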
### Completeness critic — areas flagged as under-covered
@@ -1769,18 +1771,10 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
#### 1. Switch SendInput to retry-on-failure desktop reattach (drop per-event OpenInputDesktop)
*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* small
- **Apollo does:** send_input() / inject_synthetic_pointer_input() call SendInput FIRST, and only on failure (0 injected) re-run syncThreadDesktop() (OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) and retry once, tracking the desktop in a thread_local _lastKnownInputDesktop — src/platform/windows/input.cpp:477,499 + src/platform/windows/misc.cpp:251
- **punktfunk gap:** SendInputInjector::inject() calls reattach_input_desktop() (an OpenInputDesktop+SetThreadDesktop+CloseDesktop) at the TOP of EVERY event — crates/punktfunk-host/src/inject/sendinput.rs:97,50-69. This is a syscall triple per mouse-move; punktfunk's own design/windows-secure-desktop.md:78-80 lists this exact refactor (step 2) as planned but unshipped.
- **Proposal:** Inject first; cache the HDESK thread-local; only on a 0/partial SendInput result call reattach_input_desktop() and retry once. Use DF_ALLOWOTHERACCOUNTHOOK in the OpenInputDesktop access (sendinput.rs:52-56 currently passes DESKTOP_CONTROL_FLAGS(0)) so the secure desktop is reachable. Keeps the steady-state hot path to a single SendInput call.
**SHIPPED (2026-06-20)** — per-event OpenInputDesktop dropped for inject-first + retry-on-failure desktop reattach.
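The inject-first, retry-once shape described for item 1 can be sketched without any Windows calls by hiding `SendInput` and the desktop reattach behind a trait. Everything named here (`InputBackend`, `inject_with_retry`, `FakeBackend`) is hypothetical and stands in for the real injector; it illustrates the control flow only, not the shipped code.

```rust
/// Hypothetical abstraction over SendInput plus the desktop reattach.
pub trait InputBackend {
    /// Try to inject `count` events; returns how many the OS accepted.
    fn send(&mut self, count: u32) -> u32;
    /// Re-bind the calling thread to the current input desktop.
    fn reattach(&mut self) -> bool;
}

/// Inject first; only on a zero or partial result reattach and retry once.
pub fn inject_with_retry<B: InputBackend>(
    backend: &mut B,
    count: u32,
) -> Result<u32, &'static str> {
    let sent = backend.send(count);
    if sent >= count {
        // Steady state: a single send call, no desktop syscalls.
        return Ok(count);
    }
    if !backend.reattach() {
        return Err("desktop reattach failed");
    }
    // Retry only the events that were not accepted the first time.
    let retried = backend.send(count - sent);
    if sent + retried >= count {
        Ok(count)
    } else {
        Err("injection still blocked after reattach")
    }
}

/// Test double: rejects everything until reattach() is called.
pub struct FakeBackend {
    pub blocked: bool,
    pub sends: u32,
    pub reattaches: u32,
}

impl InputBackend for FakeBackend {
    fn send(&mut self, count: u32) -> u32 {
        self.sends += 1;
        if self.blocked { 0 } else { count }
    }
    fn reattach(&mut self) -> bool {
        self.reattaches += 1;
        self.blocked = false;
        true
    }
}
```

With an unblocked backend the function performs one `send` and no `reattach`, which is the hot-path saving the item describes.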
#### 2. Detect resolution/format change on the acquire hot path, not only during rebuild
*Area:* `win:capture-dxgi-dd` · *Windows-host:* yes · *Severity:* high · *Effort:* small
- **Apollo does:** Every frame Apollo reads src->GetDesc() and reinits if desc.Width/Height != width_before_rotation/height_before_rotation or capture_format != desc.Format (display_vram.cpp:1215-1236, display_ram.cpp:253-265, wgc 1662-1674).
- **punktfunk gap:** punktfunk only re-reads dimensions inside recreate_dupl (dxgi.rs:1298-1313). On the normal acquire path (dxgi.rs:1426-1492) it never validates the acquired texture's desc, so a mode change that doesn't raise ACCESS_LOST leads to CopyResource of a mismatched-size/format source into a stale gpu_copy/staging/fp16_src — silent corruption or a hard copy failure.
- **Proposal:** In acquire(), after res.cast::<ID3D11Texture2D>(), call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution + HDR-toggle changes robust even when DDA doesn't fault.
**SHIPPED (2026-06-20)** — acquire-path GetDesc check now catches resolution/format changes that don't raise ACCESS_LOST.
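The acquire-path check from item 2 amounts to comparing the acquired texture's description against what the capturer expects. A minimal sketch, assuming simplified stand-ins for the D3D11 types (`FrameDesc`, `PixelFormat`, and `check_acquired` are hypothetical names):

```rust
/// Simplified stand-in for the two capture formats named in the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PixelFormat {
    Bgra8,
    Rgba16Float,
}

/// Simplified stand-in for the fields read from the texture description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameDesc {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AcquireAction {
    /// Description matches: safe to copy into the existing textures.
    Copy,
    /// Size or format changed: release the frame and recreate the duplication.
    Recreate,
}

pub fn check_acquired(expected: &FrameDesc, acquired: &FrameDesc) -> AcquireAction {
    // Any difference in width, height, or format invalidates the cached textures.
    if expected == acquired {
        AcquireAction::Copy
    } else {
        AcquireAction::Recreate
    }
}
```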
#### 3. Per-frame IsCurrent() check to catch HDR/GPU/mode changes
*Area:* `win:capture-wgc` · *Windows-host:* yes · *Severity:* high · *Effort:* small
@@ -1790,36 +1784,13 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** Hold an IDXGIFactory1 in WgcCapturer (from the same adapter as make_device) and call IsCurrent() at the top of next_frame/wait_and_drain; on false, return the reinit signal. This pairs with wgc-size-format-reinit to give a complete change-detection story.
#### 4. Batched/GSO send for the GameStream video plane on Windows
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* high · *Effort:* medium · **✓ verified · ✅ DONE (2026-06-16)**
> **Resolution:** Implemented per the refined proposal. Added a reusable Windows-only
> `punktfunk_core::transport::send_uso_all(&UdpSocket, &[&[u8]]) -> io::Result<usize>` that reuses the
> native plane's proven `send_one_uso` + `uso` on/off latch + `uso_unsupported`, with the same
> uniform-size guard and ≤512-segment chunking. `gamestream/stream.rs` `sendmmsg_all` now has a
> `#[cfg(target_os="windows")]` arm that calls it per 16-packet paced burst (one `WSASendMsg` instead
> of 16 `send`s) and sends any remainder scalar; the Linux `sendmmsg` arm and a generic scalar arm are
> unchanged. PUNKTFUNK_GSO=0 kill-switch + auto-fallback inherited. Linux build unaffected;
> punktfunk-core type-checks for x86_64-pc-windows-msvc. Host Windows compile deferred to CI/dev box.
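The uniform-size guard and the ≤512-segment chunking mentioned in the resolution note can be sketched as a pure planning step that runs before any socket call. This is an illustration only: `plan_uso_batches` and `MAX_SEGMENTS` are hypothetical names, and the real `send_uso_all` may organise the work differently.

```rust
/// Upper bound on segments per batched send, per the resolution note.
pub const MAX_SEGMENTS: usize = 512;

/// Split uniformly sized datagrams into batches of at most MAX_SEGMENTS.
/// Returns None when the input is empty, contains a zero-length packet size,
/// or mixes sizes, in which case the caller would fall back to scalar sends.
pub fn plan_uso_batches<'a, 'b>(
    packets: &'a [&'b [u8]],
) -> Option<Vec<&'a [&'b [u8]]>> {
    let segment_len = packets.first()?.len();
    // Segmentation offload needs one fixed segment size for the whole batch.
    if segment_len == 0 || packets.iter().any(|p| p.len() != segment_len) {
        return None;
    }
    // Each chunk would map to one batched send call.
    Some(packets.chunks(MAX_SEGMENTS).collect())
}
```

For the 16-packet paced bursts described in the note, the plan would always be a single batch; the chunking only matters for larger callers.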
- **Apollo does:** Apollo sends every plane through platf::send_batch / send (one code path for all OSes; on Windows it uses real batched socket writes), and the video broadcast thread is the single transmit path (stream.cpp:1327, send batching at stream.cpp:1337 send_batch latency logger).
- **punktfunk gap:** The GameStream video sender's batched path is Linux-only: sendmmsg_all has a #[cfg(target_os="linux")] real implementation (stream.rs:147) and a #[cfg(not(target_os="linux"))] fallback that does one sock.send() per packet (stream.rs:185-191). On a Windows GameStream-compat host (capture IS wired for Windows via DXGI/WGC, capture.rs:261) every video datagram is an individual syscall — the native punktfunk/1 plane got Windows USO (transport/udp.rs:135) but the GameStream plane did not.
- **Proposal:** Route the GameStream video send thread through the same Windows WSASendMsg/USO + WSASend-batch path the native plane already implements in punktfunk-core transport/udp.rs (or factor that send helper into a shared module and call it from gamestream/stream.rs). Keeps GameStream-on-Windows from being syscall-bound at high bitrate.
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK gap is real. The GameStream video send path uses a private `sendmmsg_all`: real `sendmmsg` only under `#[cfg(target_os="linux")]` (crates/punktfunk-host/src/gamestream/stream.rs:147-181), and a `#[cfg(not(target_os="linux"))]` fallback that does one `sock.send(p)` per packet (stream.rs:185-191). The paced sender calls it in PACE_CHUNK=16 bursts (stream.rs:230). It operates on a raw `std::net::UdpSocket` (stream.rs:66, cloned at :310), NOT the core `Transport` trait, so it does NOT pick up the native plane's USO. The GameStream host genuinely runs on Windows: `serve`/`gamestream` are not OS-gated (main.rs:81-83 dispatch is uncfg'd; gamestream/mod.rs declares `mod stream;` with no cfg), capture is wired for Windows (capture.rs:261-279 `capture_virtual_output` via SudoVDA+WGC/DXGI), and the module has explicit Windows handling (gamestream/mod.rs:209-210 APPDATA, :216-217 COMPUTERNAME). So on a Windows GameStream-compat host every video datagram is its own syscall. Meanwhile the native plane already has the answer: crates/punktfunk-core/src/transport/udp.rs:141-246 (`uso` state + `send_one_uso` via `WSASendMsg`+`UDP_SEND_MSG_SIZE`), wired default-on at udp.rs:610-647 (`send_gso`), called by session.rs:182. Also note GameStream video datagrams are uniform `blocksize` (= packet_size+16): data shards, the zero-padded last data shard, and FEC parity shards are all full blocksize (gamestream/video.rs:41-42,76,111-166) — the exact uniform-size precondition USO/GSO needs. APOLLO confirms the claimed unified path: `platf::send_batch` (src/platform/common.h:697) is the single video transmit call (src/stream.cpp:1598, in videoBroadcastThread, latency-logged at stream.cpp:1337); its Windows impl is real USO — `WSASendMsg` with a `UDP_SEND_MSG_SIZE` cmsg of `header_size+payload_size` (src/platform/windows/misc.cpp:1408,1499,1508), with a per-packet `send()` fallback (misc.cpp:1510-1587) "if USO is not supported ... 
caller will fall back to unbatched sends" (misc.cpp:1504-1505).
- **Refined:** Route the GameStream Windows video send through USO instead of per-packet `send`. Do NOT duplicate the WSASendMsg code — factor the native plane's USO helper out of `UdpTransport`. Extract `send_one_uso` + the `uso` enable/latch state + `uso_unsupported` + the uniform-size chunking loop (currently udp.rs:185-246 and the `send_gso` Windows body udp.rs:610-647) into a small `pub(crate)` free function in punktfunk-core, e.g. `transport::udp::send_packets_uso(socket: &UdpSocket, packets: &[&[u8]]) -> io::Result<usize>` that takes a raw connected `std::net::UdpSocket` (the GameStream sender already owns one) and applies USO with the same default-on + auto-fallback-to-per-packet + PUNKTFUNK_GSO=0 kill-switch semantics. Then rewrite gamestream/stream.rs `sendmmsg_all` so the `#[cfg(target_os="windows")]` arm calls that helper (the Linux arm keeps its sendmmsg; a `not(any(linux,windows))` arm keeps the scalar loop). GameStream packets are already uniform blocksize per the packetizer, so the USO uniform-size guard passes; the existing PACE_CHUNK=16 microburst pacing is unaffected (each chunk becomes one WSASendMsg). Add a Linux GSO arm too while there (same helper pattern) for parity, but USO/Windows is the point of this item. Keep the change inside punktfunk-core for the helper (one core, C-ABI-stable — no new public ABI surface needed, it's pub(crate)) and a ~10-line edit in the host. This respects: no async on frame path (native sockets only), no protocol change, no scaling change.
**SHIPPED (2026-06-16)** — Windows USO batched send for the GameStream video plane via the reusable `punktfunk_core::transport::send_uso_all` helper (one WSASendMsg per 16-packet paced burst, PUNKTFUNK_GSO=0 kill-switch + auto-fallback); Host Windows compile CI-pending.
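
The uniform-size guard and segment cap described above reduce to a small planning step before any socket call. The sketch below only shows that planning step under assumed names (`plan_uso_batches` and `MAX_USO_SEGMENTS` are hypothetical, and the real `send_uso_all` may be structured differently); it issues no `WSASendMsg` itself.

```rust
use std::ops::Range;

/// Segment cap per batched send, taken from the "≤512-segment chunking" note above.
pub const MAX_USO_SEGMENTS: usize = 512;

/// Split `packets` into index ranges where every packet in a range has the
/// same length and no range holds more than `max_segments` packets.
/// Each range could then be handed to one batched send; a one-packet range
/// is effectively a scalar send.
pub fn plan_uso_batches(packets: &[&[u8]], max_segments: usize) -> Vec<Range<usize>> {
    assert!(max_segments > 0, "max_segments must be non-zero");
    let mut batches = Vec::new();
    let mut start = 0;
    while start < packets.len() {
        let seg_len = packets[start].len();
        let mut end = start + 1;
        // Extend the run while the size stays uniform and the cap is not hit.
        while end < packets.len()
            && end - start < max_segments
            && packets[end].len() == seg_len
        {
            end += 1;
        }
        batches.push(start..end);
        start = end;
    }
    batches
}
```

Because GameStream video datagrams are uniform `blocksize`, a 16-packet paced burst would normally plan as a single range.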
#### 5. Gate the GameStream HTTPS plane on the paired-cert allow-list
*Area:* `cmp:gamestream-http-pairing` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** Apollo defers TLS verification (nvhttp.cpp:88 sets verify_peer|verify_fail_if_no_peer_cert with a permissive OpenSSL cb, then the accept() override runs cert_chain.verify() post-handshake and stashes the matched named_cert_t into request->userp; every authenticated handler calls get_verified_cert(request) — nvhttp.cpp:665-667,915,1086,1172,1360 — so an unpaired cert is rejected with a proper XML body, not just accepted).
- **punktfunk gap:** punktfunk pins the client cert at pairing (pairing.rs:230-236) and loads it into AppState.paired (mod.rs:134) but NEVER consults it: tls.rs:38-45 verify_client_cert always returns assertion(), and /launch (nvhttp.rs:87-109) does no identity check. Any client that completed a TLS handshake — paired or not — can launch a session.
- **Proposal:** After the handshake, recover the peer cert (axum_server exposes the rustls connection / peer certs), SHA-256 it, and check it against AppState.paired in /launch, /resume, /applist, /cancel (and reflect the real result in serverinfo PairStatus). Keep verify_client_cert lenient for the handshake but reject unpaired identities at the handler with an XML error, mirroring Apollo's get_verified_cert pattern. This is the single highest-value GameStream-compat hardening item and applies equally to the Windows host.
**SHIPPED (2026-06-20)** — gamestream/tls.rs surfaces the verified peer cert (PeerCertFingerprint) and nvhttp.rs gates /launch /resume /applist /cancel on the paired-fingerprint set (closes the "any TLS client can launch" hole).
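
The handler-side gate can be illustrated with plain standard-library types. This is a sketch under assumptions: `Fingerprint`, `Gate`, `requires_pairing`, and `gate` are hypothetical names, and the fingerprint is assumed to be a SHA-256 digest computed elsewhere from the peer certificate.

```rust
use std::collections::HashSet;

/// SHA-256 of the client certificate (assumed to be computed by the TLS layer).
pub type Fingerprint = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Allow,
    /// The handshake produced no client certificate.
    RejectNoCert,
    /// A certificate was presented but it is not in the paired set.
    RejectUnpaired,
}

/// Endpoints listed in the proposal as requiring a paired identity.
pub fn requires_pairing(path: &str) -> bool {
    matches!(path, "/launch" | "/resume" | "/applist" | "/cancel")
}

/// Decide whether a request may proceed, given the peer fingerprint (if any)
/// and the set of fingerprints pinned at pairing time.
pub fn gate(path: &str, peer: Option<&Fingerprint>, paired: &HashSet<Fingerprint>) -> Gate {
    if !requires_pairing(path) {
        return Gate::Allow;
    }
    match peer {
        None => Gate::RejectNoCert,
        Some(fp) if paired.contains(fp) => Gate::Allow,
        Some(_) => Gate::RejectUnpaired,
    }
}
```

A handler would map either reject variant to an XML error body, keeping the handshake itself lenient as the proposal describes.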
#### 6. Query NVENC encode capabilities before init and degrade gracefully
*Area:* `cmp:video-encode` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** nvenc_base.cpp:175-220 builds a get_encoder_cap lambda over nvEncGetEncodeCaps and checks NV_ENC_CAPS_WIDTH_MAX/HEIGHT_MAX (rejects with a clear message), SUPPORT_10BIT_ENCODE, SUPPORT_YUV444_ENCODE, SUPPORT_REF_PIC_INVALIDATION (toggles encoder_params.rfi), SUPPORT_CUSTOM_VBV_BUF_SIZE (nvenc_base.cpp:250-255), SUPPORT_CABAC (nvenc_base.cpp:311-315), SUPPORT_WEIGHTED_PREDICTION (nvenc_base.cpp:220), and SUPPORT_INTRA_REFRESH/SINGLE_SLICE_INTRA_REFRESH (nvenc_base.cpp:334-345). Each missing cap downgrades a feature instead of failing.
- **punktfunk gap:** crates/punktfunk-host/src/encode/nvenc.rs:131-323 init_session never calls nvEncGetEncodeCaps. Max W/H is only checked against a static per-codec constant (encode.rs:57-62) not the GPU's real cap; 10-bit Main10 is forced (nvenc.rs:233-237) without checking SUPPORT_10BIT_ENCODE; custom VBV (nvenc.rs:224-227) is set without checking SUPPORT_CUSTOM_VBV_BUF_SIZE. On an unsupported card these surface as opaque InvalidParam handled only by bitrate step-down, which masks the real cause.
- **Proposal:** Add a caps query in NvencD3d11Encoder::init_session right after open_encode_session_ex: build a get_cap(NV_ENC_CAPS) helper over nvEncGetEncodeCaps, validate encodeWidth/Height against WIDTH_MAX/HEIGHT_MAX with a clear error, gate the 10-bit path on SUPPORT_10BIT_ENCODE (fall back to 8-bit with a warning instead of failing), gate custom VBV on SUPPORT_CUSTOM_VBV_BUF_SIZE, and record an rfi-supported flag for the RFI work below.
**SHIPPED (2026-06-20)** — encode/nvenc.rs query_caps probes nvEncGetEncodeCaps and degrades gracefully (over-range reject, 10-bit→8-bit fallback, custom-VBV gate, RFI flag); Windows compile CI-pending.
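
The "downgrade instead of fail" policy can be sketched as a pure function over already-queried capability values. All names here (`EncoderCaps`, `SessionRequest`, `SessionPlan`, `plan_session`) are hypothetical; the real `query_caps` reads these values through nvEncGetEncodeCaps, which this sketch does not call.

```rust
/// Capability values as they might look after querying the encoder (hypothetical struct).
#[derive(Clone, Copy, Debug)]
pub struct EncoderCaps {
    pub width_max: u32,
    pub height_max: u32,
    pub ten_bit: bool,
    pub custom_vbv: bool,
    pub ref_pic_invalidation: bool,
}

#[derive(Clone, Copy, Debug)]
pub struct SessionRequest {
    pub width: u32,
    pub height: u32,
    pub want_10bit: bool,
    pub want_custom_vbv: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct SessionPlan {
    pub bit_depth: u8,
    pub custom_vbv: bool,
    pub rfi_supported: bool,
    pub warnings: Vec<String>,
}

/// Reject only what cannot be downgraded (size); degrade everything else with a warning.
pub fn plan_session(caps: EncoderCaps, req: SessionRequest) -> Result<SessionPlan, String> {
    if req.width > caps.width_max || req.height > caps.height_max {
        return Err(format!(
            "requested {}x{} exceeds encoder maximum {}x{}",
            req.width, req.height, caps.width_max, caps.height_max
        ));
    }
    let mut warnings = Vec::new();
    let bit_depth = if req.want_10bit && caps.ten_bit {
        10
    } else {
        if req.want_10bit {
            warnings.push("10-bit encode unsupported; falling back to 8-bit".to_string());
        }
        8
    };
    let custom_vbv = req.want_custom_vbv && caps.custom_vbv;
    if req.want_custom_vbv && !custom_vbv {
        warnings.push("custom VBV size unsupported; using encoder default".to_string());
    }
    Ok(SessionPlan {
        bit_depth,
        custom_vbv,
        rfi_supported: caps.ref_pic_invalidation,
        warnings,
    })
}
```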
#### 7. Detect default-render-device changes and reinit WASAPI capture
*Area:* `cmp:audio` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1829,11 +1800,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In wasapi_cap.rs, register a device-notification callback on the DeviceEnumerator; on default-render change, break the capture loop and reopen get_default_device(Render) + a fresh loopback IAudioClient (re-running the init block at wasapi_cap.rs:105-133). Surface it through the existing thread without tearing down the WasapiLoopbackCapturer handle so the session keeps streaming.
#### 8. Move GameStream input injection off the ENet service thread
*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** The control thread only enqueues bytes + schedules a task; a pool thread pops one packet, batches later same-type packets while holding the queue lock, then RELEASES the lock before the (slow) SendInput/ViGEm call — src/input.cpp:1481-1520, 1639-1643. A slow OS input call never stalls the network thread.
- **punktfunk gap:** on_receive() calls inj.inject(&ev) synchronously inside the host.service() ENet loop — crates/punktfunk-host/src/gamestream/control.rs:84-91,207-211. A SendInput that blocks crossing a desktop switch (or a slow ViGEm update) head-blocks ENet handshake/keepalive/retransmit servicing. The m3 path already does this right (punktfunk1.rs:1300 → injector_service_thread).
- **Proposal:** Mirror the m3 design in the GameStream control thread: push decoded InputEvents onto an mpsc channel drained by a dedicated injector thread (reuse injector_service_thread or a sibling), so the ENet thread never blocks on SendInput/ViGEm. No async needed — native thread + std::sync::mpsc, consistent with the invariant.
**SHIPPED (2026-06-20)** — on_receive forwards to a shared crate::inject InjectorService thread (+ relative-mouse/scroll coalescing, #45); the ENet thread no longer blocks on injection.
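
The channel-plus-dedicated-thread shape described in the proposal looks roughly like the following, using only `std`. It is a sketch, not the shipped `InjectorService`: `InputEvent`, `coalesce`, and `spawn_injector` are hypothetical names, and the `inject` closure stands in for the slow SendInput/ViGEm call.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InputEvent {
    MouseRel { dx: i32, dy: i32 },
    Key { code: u16, down: bool },
}

/// Take `first` plus everything already queued, merging adjacent relative
/// mouse moves into one event. Never blocks: it only uses `try_recv`.
pub fn coalesce(first: InputEvent, rx: &Receiver<InputEvent>) -> Vec<InputEvent> {
    let mut out = vec![first];
    while let Ok(next) = rx.try_recv() {
        let merged = match (out.last_mut(), next) {
            (Some(InputEvent::MouseRel { dx, dy }), InputEvent::MouseRel { dx: ndx, dy: ndy }) => {
                *dx += ndx;
                *dy += ndy;
                true
            }
            _ => false,
        };
        if !merged {
            out.push(next);
        }
    }
    out
}

/// Spawn the injector thread. The network thread keeps the `Sender` and only
/// ever does a non-blocking `send`; the slow `inject` call runs on this thread.
pub fn spawn_injector<F>(mut inject: F) -> (Sender<InputEvent>, JoinHandle<()>)
where
    F: FnMut(InputEvent) + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .name("injector".into())
        .spawn(move || {
            // `recv` returns Err once every Sender is dropped, which ends the thread.
            while let Ok(first) = rx.recv() {
                for ev in coalesce(first, &rx) {
                    inject(ev);
                }
            }
        })
        .expect("failed to spawn injector thread");
    (tx, handle)
}
```

Dropping the `Sender` on session teardown is enough to stop the thread; the caller can then `join` the handle.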
#### 9. Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)
*Area:* `cmp:process-launch` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1864,18 +1831,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In WgcCapturer::process_frame, call src.GetDesc() and compare Width/Height/Format against self.width/height and the expected format. On mismatch, return a Reinit error (add a capture_e::Reinit-equivalent to the Capturer contract or bail with a recognizable error the m3/stream loop maps to a capturer rebuild). Drop and re-create fp16_src/hdr10_out/bgra_copy when size changes.
#### 13. Split every cursor shape into an alpha image + an XOR image (two-pass composite)
*Area:* `win:cursor-compositing` · *Windows-host:* yes · *Severity:* high · *Effort:* medium · **✅ DONE (2026-06-16)**
> **Resolution:** Implemented in `capture/dxgi.rs`. `convert_pointer_shape` now returns a `CursorShape`
> with optional `alpha`/`xor` layers; `CursorCompositor` holds `tex_alpha`/`tex_xor` and `draw_layer`
> renders each with its own blend (alpha = src-over + HDR scale; XOR = inversion, unscaled). MASKED_COLOR
> opaque pixels now go through the alpha pass (not the invert blend), and MONOCHROME `(1,1)` invert pixels
> now feed the XOR layer (previously approximated as solid black). CPU path blends both layers too.
> The `cursor_invert` flag was removed. Independently reviewed (ship); pending Windows CI/dev-VM compile.
- **Apollo does:** Apollo emits two BGRA images per shape — make_cursor_alpha_image (display_vram.cpp:279) and make_cursor_xor_image (display_vram.cpp:210) — and runs both an alpha-blend pass and an invert-blend pass in blend_cursor (display_vram.cpp:1448-1469), each skipped if its image is empty. MASKED_COLOR and MONOCHROME shapes legitimately need both.
- **punktfunk gap:** convert_pointer_shape (dxgi.rs:566) produces ONE image and cursor_invert (dxgi.rs:1133-1134) picks ONE blend for the whole shape, so a cursor mixing opaque and screen-inverting pixels (common I-beams and themed arrows) renders wrong; masked-color opaque pixels are even forced through the invert blend (dxgi.rs:612-624 + 1205).
- **Proposal:** Refactor convert_pointer_shape in dxgi.rs to return two optional images (alpha, xor) mirroring Apollo's split. Store cursor_shape as Option<(alpha, xor)>, upload up to two SRVs in CursorCompositor, and in composite_cursor_gpu run the alpha pass with self.blend then the xor pass with self.blend_invert (skip empties). Drop the single cursor_invert flag.
**SHIPPED (2026-06-16)** — two-pass cursor composite in capture/dxgi.rs (CursorShape alpha/xor layers, CursorCompositor draw_layer; MASKED_COLOR→alpha, MONOCHROME (1,1)→XOR; cursor_invert flag removed). Windows CI/dev-VM compile pending.
#### 14. Map absolute mouse through the real virtual-desktop / output rect, not a blind 0..65535 normalize
*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1892,11 +1848,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
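
The consecutive-failure counter from the proposal above can be sketched as a small state holder around a shared flag. `PingMonitor` and `MAX_CONSECUTIVE_FAILURES` are hypothetical names; the threshold of 3 is the value suggested in the proposal, and the actual ping IOCTL is left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Threshold suggested in the proposal ("after N (3) failures").
pub const MAX_CONSECUTIVE_FAILURES: u32 = 3;

pub struct PingMonitor {
    consecutive_failures: u32,
    /// Shared with the session loop, which treats `true` like a lost capture.
    driver_dead: Arc<AtomicBool>,
}

impl PingMonitor {
    pub fn new(driver_dead: Arc<AtomicBool>) -> Self {
        Self { consecutive_failures: 0, driver_dead }
    }

    /// Record one ping result. Returns `true` while the pinger should keep
    /// running and `false` once the driver has been declared dead.
    pub fn record(&mut self, ping_ok: bool) -> bool {
        if ping_ok {
            // Any success resets the streak.
            self.consecutive_failures = 0;
            return true;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES {
            self.driver_dead.store(true, Ordering::Release);
            return false;
        }
        true
    }
}
```

The session loop would poll the flag with `Ordering::Acquire` and reopen the virtual display when it flips.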
#### 16. Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU
*Area:* `win:virtual-display-sudovda` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** setRenderAdapterByName enumerates DXGI adapters, matches desc.Description, and issues SET_RENDER_ADAPTER with that adapter's LUID before every create (virtual_display.cpp:624-654, sudovda.h:109-128, called at main.cpp:369-371 and process.cpp:250-252).
- **punktfunk gap:** punktfunk defines no IOCTL_SET_RENDER_ADAPTER and never binds the render adapter (sudovda.rs:47-54). On a hybrid/multi-GPU box the IDD may render on the iGPU while NVENC + Desktop Duplication run on the dGPU, breaking or slowing zero-copy.
- **Proposal:** Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs. Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter.
**SHIPPED (2026-06-20)** — SET_RENDER_ADAPTER (IOCTL 0x802) now binds the IDD render GPU to the capture/encode adapter on hybrid/multi-GPU boxes.
#### 17. Add streaming_will_start/stop session-level latency tuning on Windows
*Area:* `win:critic` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1913,43 +1865,16 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** On the capture thread, register an IMMNotificationClient (or poll GetDefaultAudioEndpoint) and treat a default-render change OR a device-invalidated error as a re-open: tear down the IAudioClient and re-acquire the new default endpoint in-place, like the Linux PipeWire reconnect discipline. Lives entirely in audio/wasapi_cap.rs.
#### 19. Implement true reference-frame invalidation with a multi-ref DPB instead of always-full-IDR
*Area:* `cmp:video-encode` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** nvenc_base.cpp:268-281 sets maxNumRefFrames/maxNumRefFramesInDPB to 5 (HEVC/H264) and L0 to 1, enabling a deep DPB; invalidate_ref_frames (nvenc_base.cpp:574-610) calls nvEncInvalidateRefFrames per lost frame range, dedupes already-done ranges, falls back to IDR only when the range exceeds the DPB, and sets rfi_needs_confirmation so the next encoded frame is marked as the RFI fulfilment (nvenc_base.cpp:551-557, 490-491).
- **punktfunk gap:** crates/punktfunk-host/src/encode/nvenc.rs leaves ref frames at the preset default and exposes only request_keyframe (nvenc.rs:465-467) which always emits a full FORCE_IDR. gamestream/control.rs:163-177 collapses both RFI (0x0301) and request-IDR (0x0302) into the same full-IDR. A full IDR at high resolution is the multi-millisecond spike punktfunk's own infinite-GOP comments call out (linux.rs:197-201) — true RFI avoids it for recoverable loss.
- **Proposal:** Extend the Encoder trait with an invalidate_ref_frames(first,last) method (default: fall back to request_keyframe). In the Windows NVENC config set maxNumRefFramesInDPB/maxNumRefFrames>1 (and numRefL0=1) gated on SUPPORT_MULTIPLE_REF_FRAMES, implement invalidate_ref_frames via nvEncInvalidateRefFrames with the dedupe + IDR-fallback logic, and route control.rs 0x0301 to invalidate (carrying the lost frame range) while 0x0302 stays full-IDR.
**SHIPPED (2026-06-20)** — Encoder::invalidate_ref_frames added (Windows NVENC multi-ref DPB + nvEncInvalidateRefFrames; GameStream 0x0301 routes to invalidate); Linux degrades to IDR; NVENC impl CI-pending. See also #22.
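
The dedupe and IDR-fallback decision can be separated from the NVENC call itself. The sketch below is a simplified reading of the logic described above, not Apollo's or punktfunk's exact code: `RfiTracker` and `RfiDecision` are hypothetical names, and the "range exceeds the DPB" test is assumed to be a simple span comparison.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RfiDecision {
    /// The requested range was already covered by the previous invalidation.
    AlreadyInvalidated,
    /// Invalidate these frame indices (inclusive) via the encoder API.
    Invalidate { first: u64, last: u64 },
    /// The damage is too old or too wide for the DPB; emit a full IDR.
    ForceIdr,
}

pub struct RfiTracker {
    dpb_depth: u64,
    last_encoded: u64,
    last_range: Option<(u64, u64)>,
}

impl RfiTracker {
    pub fn new(dpb_depth: u64) -> Self {
        Self { dpb_depth, last_encoded: 0, last_range: None }
    }

    pub fn on_frame_encoded(&mut self, index: u64) {
        self.last_encoded = index;
    }

    /// Decide how to react to a client report that frames `first..=last` were lost.
    pub fn on_loss(&mut self, first: u64, last: u64) -> RfiDecision {
        if first > last {
            return RfiDecision::ForceIdr; // malformed range: take the safe path
        }
        if let Some((f, l)) = self.last_range {
            if first >= f && last <= l {
                return RfiDecision::AlreadyInvalidated;
            }
        }
        // Frames encoded after the loss may reference damaged data, so the
        // range is expanded up to the most recently encoded frame.
        let last = last.max(self.last_encoded);
        if last - first + 1 > self.dpb_depth {
            self.last_range = None;
            return RfiDecision::ForceIdr;
        }
        self.last_range = Some((first, last));
        RfiDecision::Invalidate { first, last }
    }
}
```

With this split, the `Encoder::invalidate_ref_frames` implementation only has to act on the decision: call the encoder's invalidate API for `Invalidate`, do nothing for `AlreadyInvalidated`, and fall back to `request_keyframe` for `ForceIdr`.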
#### 20. In-binary Windows service install + interactive-session launch
*Area:* `cmp:config-management` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** config.cpp:1490-1534 handles the Windows shortcut/service launch dance inside the binary: --shortcut/--shortcut-admin handling, ShellExecuteExW(runas, --shortcut-admin) to self-elevate when the service isn't running, waits for the service, wait_for_ui_ready(), launch_ui(), then returns 1 so the foreground process does NOT also start a stream host. This is Sunshine/Apollo's mature service<->UI two-process split that makes one-click launch work.
- **punktfunk gap:** punktfunk has no service-install / self-elevation / interactive-session bring-up in the binary. Deployment is documented as a manual chain of external scripts — scheduled task -> PsExec64 -i 1 -> launch.vbs -> host-run.cmd (design/windows-host.md:77-96) — fragile and operator-hostile. main.rs has no install/service subcommand.
- **Proposal:** Add `punktfunk-host install`/`uninstall`/`service` subcommands (Windows-gated) that register a service or an Interactive/Highest scheduled task to launch the host in Session 1 (the documented requirement for DXGI duplication + SendInput), and the self-elevate-if-not-running shortcut path. Reuse the existing capture/wgc_relay CreateProcessAsUserW machinery already in the crate. This codifies the script chain into the binary without touching the per-frame path or core.
**SHIPPED (2026-06-20)** — in-binary punktfunk-host service subcommand installs/launches the host into the interactive session (PsExec chain dropped). See also #24.
#### 21. Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame
*Area:* `win:cursor-compositing` · *Windows-host:* yes · *Severity:* high · *Effort:* large · **⊘ ALREADY-HANDLED (2026-06-16)**
> **Resolution — not a bug for punktfunk.** The gap below assumes a cursor moving over a static screen
> produces `AcquireNextFrame` **timeouts**. It does not: DXGI returns **S_OK for pointer-only updates**
> (`FrameInfo.LastMouseUpdateTime != 0`, `LastPresentTime == 0`), with the resource holding the
> (unchanged) desktop. `acquire()` always re-runs `present_acquired` on S_OK (`dxgi.rs:1407,1474`), which
> re-copies the desktop and recomposites the cursor at its new position. `last_present` is repeated only
> on a genuine `WAIT_TIMEOUT` (nothing changed) or a mid-rebuild gap — correct. The agent that raised this
> didn't account for DDA's pointer-update S_OK semantics, and the run was killed before the verify phase
> reached it. The only real delta from Apollo is a **perf** micro-opt (Apollo retains a clean copy and
> re-blends just the cursor rect, avoiding a full ~29 MB `CopyResource` per pointer update) — deferred as
> optional, pending evidence of GPU-copy pressure.
- **Apollo does:** Apollo treats a mouse-only update as a real update (display_vram.cpp:1162-1168) and keeps an intermediate D3D surface of the last desktop frame so it can copy surface->fresh image and re-blend the cursor at its new position with no new DDA frame (last_frame_variant state machine, display_vram.cpp:1239-1306).
- **punktfunk gap (as originally filed — see Resolution above; premise incorrect):** punktfunk only composites on a fresh AcquireNextFrame (dxgi.rs:1477); on timeout it repeats last_present (dxgi.rs:1547-1561) which has the OLD cursor position baked in, so a cursor moving over a static screen stutters/lags.
- **Proposal (superseded; only the perf variant remains):** Keep a clean intermediate copy of the last desktop frame (an extra DEFAULT texture). In acquire (dxgi.rs:1341), when AcquireNextFrame times out but update_cursor saw a position change (LastMouseUpdateTime changed) and the cursor is visible, copy the clean intermediate into gpu_copy and re-run composite_cursor_gpu, then return that as a fresh frame instead of repeating last_present.
**NOT-A-BUG (2026-06-16)** — premise incorrect: DXGI returns S_OK for pointer-only updates (LastMouseUpdateTime != 0, LastPresentTime == 0) and acquire() recomposites the cursor at its new position; last_present is repeated only on a genuine WAIT_TIMEOUT. Only an optional perf micro-opt remains (Apollo re-blends just the cursor rect to avoid a full CopyResource per pointer update).
#### 22. Add real reference-frame invalidation (RFI) instead of always forcing IDR
*Area:* `win:nvenc-d3d11` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** Apollo keeps a deep DPB (maxNumRefFrames 5/HEVC, 8/AV1) but pins L0 ref to 1 (nvenc_base.cpp:268-281), then on a loss event calls nvEncInvalidateRefFrames per-frame over the requested range, dedups against the last range, expands to the last-encoded index, escalates to IDR only if the range exceeds DPB depth, and tags the next frame rfi_needs_confirmation (nvenc_base.cpp:574-610). This lets the encoder re-reference an older still-valid frame rather than emit a multi-millisecond keyframe.
- **punktfunk gap:** punktfunk has NO invalidate path — request_keyframe() always forces a full IDR (nvenc.rs:437-442,465-467); punktfunk1.rs:2153 / gamestream/stream.rs:336 wire 'RFI' straight to a keyframe. Every recovery is a costly IDR spike, defeating the infinite-GOP design.
- **Proposal:** In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.
**SHIPPED (2026-06-20)** — real RFI via nvEncInvalidateRefFrames with dedup + IDR-on-overflow; control plane 0x0301 routes to invalidate. NVENC impl CI-pending. See #19.
#### 23. Add a DS4 (DualShock4) ViGEm target on Windows with type auto-selection, motion, touchpad, battery and timestamp pump
*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* large
@@ -1959,38 +1884,16 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In gamepad_windows.rs, add a DS4Wired branch via vigem_client::DualShock4Wired with a union/enum PadEntry. Resolve type from the decoded Arrival (precedence: explicit env/client choice > PS type > motion/touchpad caps > X360), mirroring the existing GAMEPAD-preference negotiation. Port Apollo's wTimestamp pump (5.333us units, re-send every 100ms), motion calibration constants (:157-170), and the touchpad byte packing (:1604-1608). Surface the LED color via the existing 0xCA/feedback plane.
#### 24. Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change
*Area:* `win:system-secure-desktop` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** SunshineSvc.exe runs as LocalSystem in Session 0, loops on WTSGetActiveConsoleSessionId, clones its own token with DuplicateTokenEx(TokenPrimary)+SetTokenInformation(TokenSessionId) and CreateProcessAsUserW into winsta0\\default inside a per-session job object (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE|BREAKAWAY_OK); opts into SERVICE_ACCEPT_SESSIONCHANGE and on WTS_CONSOLE_CONNECT terminates+relaunches the host in the new session (tools/sunshinesvc.cpp:95,111,239,256,267,276-294).
- **punktfunk gap:** punktfunk has no Windows service; launch is a PsExec64 -s -i 1 scheduled task hard-coded to session 1 (design/windows-host.md:78-84), with the SERVICE_CONTROL_SESSIONCHANGE relaunch listed as unimplemented step 6 (design/windows-secure-desktop.md:89). Launch scripts are not even in the repo.
- **Proposal:** Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.
**SHIPPED (2026-06-20)** — real Windows service relaunches the host on console-session change (SERVICE_ACCEPT_SESSIONCHANGE); PsExec scheduled-task dropped. See also #20.
#### 25. Elevate capture/encode/send thread priority on the host hot path
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* small · **✓ verified**
- **Apollo does:** Apollo raises the transmit/capture thread priority: platf::adjust_thread_priority(thread_priority_e::critical) in the video broadcast thread (stream.cpp:1122) and ::high in the audio/control paths (stream.cpp:1333, 1672); the Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) (platform/windows/misc.cpp:1081-1102).
- **punktfunk gap:** punktfunk names its hot-path threads (stream.rs:44 video, stream.rs:204 send, punktfunk1.rs:1804 send_loop, punktfunk1.rs:2017/2328 send threads) but never sets a scheduling priority — every host capture/encode/send thread runs at default priority. Only the macOS client elevates (client.rs:169). On a loaded Windows desktop the encode/send thread can be preempted, adding jitter the frame-pacing logic can't recover.
- **Proposal:** Add a cross-platform raise_current_thread_priority() helper (SetThreadPriority on Windows, optionally AvSetMmThreadCharacteristics for MMCSS; sched/nice on Linux) and call it at the top of the GameStream send thread, the native send_loop, and the encode thread. Cheap, high-value jitter reduction, no design impact.
- **Verify verdict:** `confirmed_gap` — punktfunk: NO thread-priority call exists anywhere in the workspace (grep for SetThreadPriority/sched_setscheduler/setpriority/AvSetMm/THREAD_PRIORITY across crates/ returned zero hits). Hot-path threads are named-only at default priority: GameStream video thread crates/punktfunk-host/src/gamestream/stream.rs:44-53 (thread::Builder name "punktfunk-video") and GameStream send thread stream.rs:204-206 ("punktfunk-send"); native send threads crates/punktfunk-host/src/punktfunk1.rs:2017-2033 and punktfunk1.rs:2328-2333 ("punktfunk-send"), and the native send_loop at punktfunk1.rs:1804 — all spawned with no priority set. The encode work shares the capture thread (punktfunk1.rs:2011-2013 "this thread captures+encodes ... and hands each AU to a dedicated send thread"), also default priority. The windows crate is ALREADY a dependency with the needed feature: crates/punktfunk-host/Cargo.toml:141 enables "Win32_System_Threading" (SetThreadPriority/GetCurrentThread available, zero new deps). Apollo: confirmed it raises priority on every hot-path thread — capture src/video.cpp:1295 (critical), encode src/video.cpp:2359 and 2396 (high), video send src/stream.cpp:1333 (high), control src/stream.cpp:1122 (critical), audio src/stream.cpp:1672 + src/audio.cpp:94/208. Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) at src/platform/windows/misc.cpp:1081-1102, plus DwmEnableMMCSS(true) (misc.cpp:1139) and AvSetMmThreadCharacteristics("Pro Audio") for the audio-capture thread (src/platform/windows/audio.cpp:540). CRITICAL NUANCE: Apollo's adjust_thread_priority is effectively Windows-only — src/platform/linux/misc.cpp:362-364 is "// Unimplemented" and src/platform/macos/misc.mm:218-220 is "// Unimplemented".
- **Refined:** Add a small cross-platform helper raise_current_thread_priority(level) and call it at the TOP of each hot-path thread body (so the calling thread itself is elevated): the GameStream send thread (stream.rs:206), the GameStream video/capture+encode thread (stream.rs:46), the native send threads (punktfunk1.rs:2021 and punktfunk1.rs:2331 closures, before/at the start of send_loop), and the native capture+encode thread (the punktfunk1.rs run body that owns capture+encode, punktfunk1.rs ~2011+). Windows: SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST) for the send/network thread (latency-critical, matches Apollo's video-send=high but the punktfunk send thread also does FEC+seal so HIGHEST is defensible) and THREAD_PRIORITY_ABOVE_NORMAL for capture+encode — using the windows crate already on Cargo.toml:141, no new deps. Optionally associate the network/encode thread with MMCSS via AvSetMmThreadCharacteristics (needs the Win32_System_Threading "Games"/"Pro Audio" task + AVRT feature) for higher-fidelity scheduling under DWM load; treat as a follow-up, not the first cut. Linux (net-new beyond Apollo, since Apollo leaves it unimplemented and punktfunk is Linux-first): best-effort nice(-10)/setpriority on the send+encode threads — note SCHED_FIFO/RR requires CAP_SYS_NICE/rtprio limits the host won't have by default, so do NOT default to realtime; a plain niceness bump is the safe portable choice and silently no-ops without privilege. Make every priority call best-effort (log-and-continue on failure, exactly as Apollo does at misc.cpp:1104). No async, no per-frame allocation, no ABI surface change — purely thread-setup, so no design invariant is touched.
**SHIPPED (2026-06-20)** — hot-path capture/encode/send threads now elevate priority (Windows SetThreadPriority HIGHEST for send / ABOVE_NORMAL for capture+encode; best-effort niceness on Linux, no-ops without privilege), per the verified plan.
#### 43. Socket QoS / DSCP marking on the media sockets
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* medium · **✓ verified**
- **Apollo does:** Apollo tags video and audio sockets for prioritized delivery: enable_socket_qos(...qos_data_type_e::video...) and (...audio...) called per session (stream.cpp:1917, stream.cpp:1938); the Windows impl uses qWAVE QOSCreateHandle/QOSAddSocketToFlow with DSCP tagging (platform/windows/misc.cpp:1616-1652), with Linux/macOS equivalents.
- **punktfunk gap:** punktfunk sets NO QoS/DSCP anywhere — grep for qos/DSCP/IP_TOS across crates/punktfunk-host and crates/punktfunk-core finds only the x-nv-vqos ANNOUNCE keys (rtsp.rs:278) and a macOS *client* pthread QoS (client.rs:169). Neither the GameStream sockets (stream.rs:66 bind, audio) nor the native data socket (transport/udp.rs) request link-layer/router priority.
- **Proposal:** Add a small per-OS helper to mark the video/audio/data UDP sockets: DSCP EF/AF41 via IP_TOS/IPV6_TCLASS on Linux/macOS, qWAVE QOSAddSocketToFlow on Windows (gated behind an env/config opt-in). Wire it into stream.rs socket setup and the native transport socket creation. Directly improves latency under contended Wi-Fi / shared uplink.
- **Verify verdict:** `confirmed_gap`. PUNKTFUNK: the gap is real on every media socket. Native data plane crates/punktfunk-core/src/transport/udp.rs:359-365 (UdpTransport::connect) and :374-414 (connect_via_punch) grow SO_SNDBUF/SO_RCVBUF (:431-447 grow_buffers via socket2::SockRef) and set GSO/USO, but never set IP_TOS/IPV6_TCLASS/SO_PRIORITY/qWAVE. GameStream sockets are bare std UdpSocket with no QoS: video crates/punktfunk-host/src/gamestream/stream.rs:66, audio audio.rs:305, control control.rs:36. RTSP does NOT parse the GameStream qosTrafficType keys at all (grep qosTrafficType in crates/punktfunk-host → exit 1), and rtsp.rs only reads x-nv-vqos bitrate/fec/codec (rtsp.rs:278). The only QoS in the tree is a macOS *client* pthread QoS-class (core/src/client.rs:156-169), which is unrelated to network-level packet marking. socket2 is already a punktfunk-core dep (Cargo.toml:34), so DSCP via SockRef::set_tos is trivial to add. APOLLO: confirmed it does exactly this, on by default. Per-session calls: src/stream.cpp:1917 (video) and :1938 (audio) → platf::enable_socket_qos(..., videoQosType/audioQosType != 0). Those flags come from RTSP src/rtsp.cpp:1005-1006 and are DEFAULTED non-zero at src/rtsp.cpp:982-983 (x-nv-vqos qosTrafficType="5", x-nv-aqos="4"), so QoS is on for stock Moonlight. Linux impl: src/platform/linux/misc.cpp:797-851 sets IP_TOS/IPV6_TCLASS (DSCP 40=CS5 video, 48=CS6 audio, shifted <<2) plus SO_PRIORITY 5/6. Windows impl: src/platform/windows/misc.cpp:1616-1722 dynamically loads qwave.dll and uses QOSCreateHandle/QOSAddSocketToFlow with QOSTrafficTypeAudioVideo/Voice, and crucially returns nullptr (no-op) unless dscp_tagging is set (:1622-1625). macOS: src/platform/macos/misc.mm:446.
- **Refined:** Add a per-OS set_media_qos(socket, kind) helper. Linux/macOS: use the already-present socket2 — SockRef::set_tos(AF41<<2) for IPv4 / set_tclass_v6 for IPv6, plus SO_PRIORITY on Linux (video=5, audio=6, the max without CAP_NET_ADMIN; set AFTER TOS since TOS resets it — Apollo linux/misc.cpp:841-845). Wire it into UdpTransport::connect / connect_via_punch (the native punktfunk/1 data plane — the primary, highest-value target) behind an opt-in env (PUNKTFUNK_DSCP=1) and optionally a Config field, plus the GameStream stream.rs:66 / audio.rs:305 / control.rs:36 sockets. IMPORTANT Windows-host caveat (this is the user's focus and where the naive version fails): on Windows, plain IP_TOS setsockopt is silently stripped by the OS unless a registry/group-policy QoS policy ('Do not use NLA') is configured — which is exactly why Apollo uses qWAVE (QOSAddSocketToFlow) instead. So a one-line socket2 set_tos does NOT tag on the wire on Windows. To actually deliver value on the Windows host, port Apollo's qWAVE path (runtime LoadLibraryExA qwave.dll, QOSCreateHandle once, QOSAddSocketToFlow per socket with QOSTrafficTypeAudioVideo/Voice) including the dual-stack v4-mapped connect() workaround (windows/misc.cpp:1675-1700) — note our data socket is already connect()ed (udp.rs:361), which sidesteps most of that hack. Keep RAII teardown (QOSRemoveSocketFromFlow on drop) like Apollo's qos_t/deinit_t. This is purely socket-setup, off the per-frame path, no core C-ABI change, no async — fully compatible with all three design invariants.
**SHIPPED (2026-06-20)** — punktfunk_core::transport::qos set_media_qos marks the native + GameStream media sockets (DSCP CS5 video / CS6 audio via IP_TOS + Linux SO_PRIORITY 5/6, opt-in PUNKTFUNK_DSCP=1). Windows caveat: plain IP_TOS is a no-op on the wire without a qWAVE policy — porting Apollo's qWAVE path (QOSAddSocketToFlow) remains a documented follow-up.
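
The DSCP arithmetic behind the note above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `punktfunk_core::transport::qos`: `MediaKind`, `dscp_for`, `tos_byte`, `so_priority_for`, and `dscp_opt_in` are hypothetical names. The code points (CS5 = 40, CS6 = 48), the SO_PRIORITY values 5/6, and the `PUNKTFUNK_DSCP=1` opt-in are taken from the text. Applying the byte to a socket (for example through socket2) is deliberately left out so the block stays std-only.

```rust
/// Which media flow a socket carries (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MediaKind {
    Video,
    Audio,
}

/// Standard DSCP class-selector code points.
pub const DSCP_CS5: u8 = 40;
pub const DSCP_CS6: u8 = 48;

/// DSCP per flow, following the SHIPPED note (CS5 video / CS6 audio).
pub fn dscp_for(kind: MediaKind) -> u8 {
    match kind {
        MediaKind::Video => DSCP_CS5,
        MediaKind::Audio => DSCP_CS6,
    }
}

/// DSCP occupies the upper six bits of the IPv4 TOS / IPv6 traffic-class
/// byte; the lower two bits are ECN and are left at zero here.
pub fn tos_byte(dscp: u8) -> u8 {
    (dscp & 0x3f) << 2
}

/// Linux SO_PRIORITY per flow (5 video / 6 audio, as quoted in the text).
pub fn so_priority_for(kind: MediaKind) -> i32 {
    match kind {
        MediaKind::Video => 5,
        MediaKind::Audio => 6,
    }
}

/// Opt-in gate: only the exact value "1" enables marking.
/// Pass `std::env::var("PUNKTFUNK_DSCP").ok().as_deref()` in real code.
pub fn dscp_opt_in(value: Option<&str>) -> bool {
    value == Some("1")
}
```

With these values the byte handed to `IP_TOS` is `0xA0` for video and `0xC0` for audio. On Linux the SO_PRIORITY call would follow the TOS call, since setting TOS resets the priority.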
#### 90. Bitrate-derived rate-control pacing (vs frame-interval-only)
*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* medium · *Effort:* medium · **✓ verified**
- **Apollo does:** Apollo paces each frame's packets at the *negotiated bitrate*: ratecontrol_packets_in_1ms = giga*80/100/1000/blocksize/8 (stream.cpp:1464) and sleeps the send loop to that per-millisecond budget across the frame (stream.cpp:1578-1627), so the sender shapes to the link's allotted rate, not just the frame deadline.
- **punktfunk gap:** Both punktfunk send pacers spread purely over the FRAME INTERVAL: the GameStream sender uses budget = frame_interval * 0.75 (stream.rs:209) and the native paced_submit uses budget to next frame's deadline * 0.9 (punktfunk1.rs:1752) — neither derives a packets-per-ms budget from cfg.bitrate_kbps (the bitrate is only used to open NVENC, stream.rs:275). A spiky IDR or VBR overshoot can still microburst above the negotiated rate within its frame window.
- **Proposal:** Compute a bitrate-derived per-millisecond send budget (like Apollo's ratecontrol_packets_in_1ms) from the negotiated bitrate and pace overflow to THAT rate inside paced_submit / spawn_sender, taking the min of the frame-interval budget and the bitrate budget. Smooths VBR bursts on rate-limited links without breaking the existing microburst fast-path.
- **Verify verdict:** `partial`. PUNKTFUNK gap is real: both pacers spread over the FRAME INTERVAL only, never the bitrate. GameStream sender: `let budget = frame_interval.mul_f32(0.75)` (crates/punktfunk-host/src/gamestream/stream.rs:209). Native paced_submit: `let budget = deadline.checked_duration_since(pace_start)...mul_f32(0.9)` (crates/punktfunk-host/src/punktfunk1.rs:1752-1755) where deadline = `next += interval` (punktfunk1.rs:2162) and `interval = Duration::from_secs_f64(1.0 / effective_hz...)` (punktfunk1.rs:2357). bitrate_kbps only configures NVENC (stream.rs:275; punktfunk1.rs:2306, 2694) and is never fed to the pacer. So far the gap claim holds. BUT the Apollo characterization in the proposal is FACTUALLY WRONG: Apollo's `size_t ratecontrol_packets_in_1ms = std::giga::num * 80 / 100 / 1000 / blocksize / 8;` (src/stream.cpp:1464) is a HARDCODED 80% of 1 Gigabit/sec, i.e. a fixed constant. grep across stream.cpp shows the negotiated/session bitrate never enters this formula (only std::giga::num, blocksize, and the 80/100 constant appear at lines 1464/1578-1582/1625-1627). Apollo paces to a FIXED ~800 Mbps link ceiling regardless of negotiated bitrate; it is NOT "negotiated-bitrate pacing." punktfunk's own design notes deliberately reject clamping to negotiated bitrate: "The encoder is pixel-rate bound, not bitrate bound" (punktfunk1.rs:321) and the whole 1Gbps+ effort raised the ceiling (punktfunk1.rs:1617-1619, MAX_BITRATE_KBPS ~2 Gbps).
- **Refined:** Reject the proposal AS WRITTEN — its premise ("Apollo paces to the negotiated bitrate") is false; Apollo paces to a hardcoded 80%-of-1Gbps fixed link ceiling (stream.cpp:1464), and pacing to negotiated bitrate would actively regress punktfunk (VBR/IDR spikes legitimately exceed average bitrate, and punktfunk explicitly treats the encoder as pixel-rate-bound, not bitrate-bound — punktfunk1.rs:321). If anything is worth porting, it is the FIXED per-millisecond link-rate ceiling concept, not bitrate-derived pacing: optionally compute a fixed packets-per-ms budget from a configurable link-rate ceiling (default high, e.g. matching MAX_BITRATE_KBPS, env-overridable like PUNKTFUNK_PACE_BURST_KB) and take min(frame-interval budget, link-ceiling budget) inside paced_submit/spawn_sender — purely as a microburst smoother for rate-limited links, NOT tied to cfg.bitrate_kbps. Note punktfunk already has the microburst fast-path (burst_cap, punktfunk1.rs:2005-2009 / paced_submit:1734-1743) and frame-interval spreading, which together already address the "spiky IDR microburst" symptom the proposal cites. Recommend deferring unless a measured rate-limited-link regression appears; the current frame-interval + burst-cap pacing covers the cited risk.
**REJECTED / OBSOLETE (2026-06-20)** — proposal premise is false: Apollo paces to a hardcoded ~80%-of-1Gbps FIXED link ceiling (stream.cpp:1464), NOT the negotiated bitrate, and punktfunk is pixel-rate-bound by design (VBR/IDR spikes legitimately exceed average bitrate). Existing frame-interval + burst-cap pacing already covers the cited microburst risk; defer unless a measured rate-limited-link regression appears. (If anything, port the FIXED link-ceiling concept via an env knob like PUNKTFUNK_PACE_BURST_KB, not bitrate-derived pacing.)
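
To make the fixed-ceiling arithmetic concrete, here is a small sketch of the formula quoted from Apollo (`giga * 80 / 100 / 1000 / blocksize / 8`), using the same integer division order. The function names and the block sizes in the usage note are assumptions for illustration; the document does not state Apollo's actual `blocksize`, and nothing here reflects punktfunk's pacer.

```rust
use std::time::Duration;

/// Packets per millisecond at 80% of a link rate, mirroring the quoted
/// formula: rate * 80 / 100 (80% headroom) / 1000 (per ms)
/// / block_size_bytes / 8 (bits to bytes), all in integer arithmetic.
pub fn packets_per_ms(link_bits_per_sec: u64, block_size_bytes: u64) -> u64 {
    link_bits_per_sec * 80 / 100 / 1000 / block_size_bytes / 8
}

/// Shortest time in which `packets` may be sent without exceeding the
/// ceiling, rounded up to whole milliseconds.
pub fn min_send_time(packets: u64, per_ms: u64) -> Duration {
    let per_ms = per_ms.max(1); // guard against a zero budget
    Duration::from_millis((packets + per_ms - 1) / per_ms)
}
```

For a 1 Gbit/s link and an illustrative 1000-byte block this gives 100 packets per millisecond, so a 250-packet frame needs at least 3 ms. The negotiated bitrate appears nowhere in the calculation, which is the point the verdict above makes.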
#### 94. Consume the GameStream client loss-stats report
*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* low · *Effort:* small · **✓ verified**
@@ -2001,5 +1904,5 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK gap is real. crates/punktfunk-host/src/gamestream/control.rs:165-177 — after decrypt, the only inner-type dispatch is `if matches!(inner, 0x0301 | 0x0302 | 0x0305)` → force_idr; everything else falls through to gamepad::decode (returns None for non-controller) then input::decode, which at crates/punktfunk-host/src/gamestream/input.rs:35 returns empty unless `type == 0x0206`. So a loss-stats packet (`0x0201`) is silently dropped — `on_receive` has no branch for it. A broad grep across crates/ for loss-stats/last-good-frame/0x0201 found nothing (only DXGI's unrelated "last good frame" comment at capture/dxgi.rs:751). The native plane has only end-of-burst ProbeResult bandwidth/loss telemetry (crates/punktfunk-core/src/client.rs:436, abi.rs:1499) — a one-shot speed test, NOT continuous in-stream loss feedback. APOLLO confirms the claim: src/stream.cpp:41 `#define IDX_LOSS_STATS 3`, src/stream.cpp:61 maps it to wire type `0x0201`, and src/stream.cpp:943-957 reads `int32_t *stats` with stats[0]=count, stats[1]=time-window ms, stats[3]=lastGoodFrame (logged at BOOST verbose). Wire offset confirmed: the map callback receives `next_payload = plaintext.data()+4` (src/stream.cpp:1104), i.e. the body AFTER the 4-byte `[type][payloadLength]` header — so stats[0..] is at body offset 0. Note: Apollo only LOGS it; it does not yet drive adaptive FEC/bitrate off it either.
- **Refined:** Add one branch to control.rs `on_receive`: when the decrypted `pt` inner type (LE u16 at pt[0..2]) == 0x0201 and pt.len() >= 20, decode the body as four LE i32 — pt[4..8]=loss_count, pt[8..12]=time_window_ms, pt[16..20]=last_good_frame (mirroring Apollo's stats[0]/stats[1]/stats[3]; verify endianness against a real Moonlight capture — moonlight-common-c writes these as host-order/LE, and punktfunk already treats control inner fields as LE). Initially log at debug/trace and optionally surface via an AtomicU32 in AppState or the mgmt API so the web console can show client-observed loss. Keep it read-only first. Caveat for the backlog: this is a low-value telemetry hook, NOT adaptive control. The actual lever (adaptive FEC % / bitrate de-rating) is a separate, larger piece of work that Apollo itself does not implement off this signal — do not over-scope. Place it next to the existing 0x0301/0x0302/0x0305 dispatch so the control hot path stays a single decrypt + cheap type match. windowsHost=false is correct: this is GameStream-plane, OS-independent, and the punktfunk/1 native plane is the higher-priority protocol — so prioritize accordingly.
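
The decode step described in the refined plan could look roughly like this. It is a sketch, not code from control.rs: `LossStats` and `decode_loss_stats` are hypothetical names, and the little-endian layout is the one the text proposes, which still needs checking against a real Moonlight capture.

```rust
/// Client-reported loss statistics (hypothetical struct for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LossStats {
    pub loss_count: i32,
    pub time_window_ms: i32,
    pub last_good_frame: i32,
}

/// Inner control type for the loss-stats report, as cited from Apollo.
pub const LOSS_STATS_TYPE: u16 = 0x0201;

/// Decode a decrypted control payload `pt` laid out as
/// [type: u16 LE][payloadLength: u16 LE][four i32 LE words].
/// Returns None for any other type or a short buffer.
pub fn decode_loss_stats(pt: &[u8]) -> Option<LossStats> {
    if pt.len() < 20 {
        return None;
    }
    if u16::from_le_bytes([pt[0], pt[1]]) != LOSS_STATS_TYPE {
        return None;
    }
    // Read one little-endian i32 at a byte offset within `pt`.
    let word = |off: usize| {
        i32::from_le_bytes([pt[off], pt[off + 1], pt[off + 2], pt[off + 3]])
    };
    Some(LossStats {
        loss_count: word(4),       // stats[0]
        time_window_ms: word(8),   // stats[1]
        last_good_frame: word(16), // stats[3]; stats[2] is not used here
    })
}
```

Returning `Option` lets the existing dispatch fall through to the gamepad and input decoders unchanged when the packet is not a loss-stats report.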
_(28 detailed; remaining 68 medium/low items are in the table above with citations available in Parts 2-3.)_
_(28 items had detail subsections: 16 shipped/obsolete ones are now collapsed to one-liners above, 12 still-open ones keep full citations; the remaining 68 medium/low items are in the table above with citations available in Parts 2-3.)_