# Apollo vs punktfunk — architecture map & transferable improvements

> Generated 2026-06-16 by the `apollo-vs-punktfunk` multi-agent workflow, then reconstructed from
> the run journal after the live run was interrupted. **Apollo** = `~/Apollo` (commit `adc5c5a0`),
> a C++ fork of Sunshine — a Moonlight-compatible streaming **host only** (no client of its own).
> **punktfunk** = this repo. **Focus: the Windows host.**

## How to use this doc (for future agents)

- **Part 1 — Apollo architecture map**: what each Apollo subsystem does + an Apollo→punktfunk file index.
- **Part 2 — Subsystem comparison**: where each project leads, and punktfunk's *intentional* divergences.
- **Part 3 — Windows host deep-dive** (the centerpiece): reference-grade detail on Apollo's Windows implementation vs punktfunk's, per topic.
- **Part 4 — Prioritized improvement backlog**: 96 candidate transfers, Windows-host first.

**Provenance / trust:** the 7 subsystem maps, 7 comparisons, and 9 Windows investigations completed fully; every claim carries `file:line` citations into *both* codebases. The maintainer spot-verified the four highest-impact Windows claims against live code (all confirmed). Of 96 backlog items, 5 went through the workflow's independent adversarial-verify pass (flagged ✓V); the rest are agent-proposed with citations but not yet independently re-verified — treat as high-quality leads, confirm before large work.

---

## Part 1 — Apollo architecture map

Apollo is host-only. A stream flows: **nvhttp** (HTTPS pairing + serverinfo/applist/launch) → **rtsp** (SDP negotiation) → **process/display_device** (launch app, set up display) → **platform capture** (DXGI/WGC/kms/X11) → **video** (color convert + NVENC/VAAPI/etc.) + **audio** → **stream** (RTP + Reed-Solomon FEC + control over ENet) → Moonlight.
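Each network-facing stage in this flow listens on a port derived from one base port. Per the pairing section (`net::map_port`, network.cpp:187), every Apollo port is `config.port` plus a signed offset.

The Rust sketch below reproduces that arithmetic for a GameStream-compat host. It makes these assumptions:

- `map_port`, `Service` and `DEFAULT_BASE_PORT` are hypothetical names. They are not existing punktfunk or Apollo code.
- The default base of 47989 is inferred from the documented `PORT_HTTP=0→47989` mapping.
- Returning `None` on overflow is a choice made by this sketch.

```rust
/// Assumed default base port: PORT_HTTP (offset 0) maps to 47989 per the
/// documented `PORT_HTTP=0→47989` example.
pub const DEFAULT_BASE_PORT: u16 = 47989;

/// Network-facing services that derive their port from the base port.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Service {
    Http,
    Https,
    Rtsp,
    Video,
    Control,
    Audio,
}

impl Service {
    /// Signed offsets as documented for Apollo's `net::map_port`.
    pub const fn offset(self) -> i32 {
        match self {
            Service::Http => 0,
            Service::Https => -5,
            Service::Rtsp => 21,
            Service::Video => 9,
            Service::Control => 10,
            Service::Audio => 11,
        }
    }
}

/// Hypothetical Rust analogue of `net::map_port`: `base + offset`, returning
/// `None` (instead of wrapping) when the result leaves the range 1..=65535.
pub fn map_port(base: u16, svc: Service) -> Option<u16> {
    let port = i32::from(base) + svc.offset();
    if (1..=i32::from(u16::MAX)).contains(&port) {
        Some(port as u16)
    } else {
        None
    }
}
```

With the default base this yields the familiar GameStream set: 47984 (HTTPS), 47989 (HTTP), 47998 (video), 47999 (control), 48000 (audio) and 48010 (RTSP). Changing the one base value shifts the whole block together.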
### Apollo → punktfunk file index

| Apollo subsystem | Apollo key files | punktfunk counterpart |
|---|---|---|
| Apollo — Protocol & streaming (RTP/FEC/ENet/RTSP/crypto) | `stream.cpp`; `rtsp.cpp`; `crypto.cpp`; `network.cpp`; `rtsp.h`; `stream.h` | `gamestream/stream.rs`; `gamestream/video.rs`; `gamestream/rtsp.rs`; `gamestream/control.rs`; `gamestream/audio.rs`; `punktfunk1.rs` |
| Apollo (Sunshine fork) — GameStream HTTP server, pairing ceremony, and discovery/UPnP | `nvhttp.cpp`; `nvhttp.h`; `httpcommon.cpp`; `upnp.cpp`; `crypto.h`; `crypto.cpp` | `gamestream/nvhttp.rs`; `gamestream/pairing.rs`; `gamestream/tls.rs`; `gamestream/cert.rs`; `gamestream/serverinfo.rs`; `gamestream/mod.rs` |
| Apollo — Video encode pipeline & codec config | `video.cpp`; `nvenc_base.cpp`; `video_colorspace.cpp`; `cbs.cpp`; `nvenc_config.h`; `video.h` | `encode.rs`; `encode/nvenc.rs`; `encode/sw.rs`; `encode/linux.rs`; `zerocopy/mod.rs`; `gamestream/control.rs` |
| Apollo — Audio capture, encode, transport (Windows host) | `audio.cpp`; `audio.h`; `audio.cpp`; `common.h`; `stream.cpp` | `audio.rs`; `audio/wasapi_cap.rs`; `audio/linux.rs`; `gamestream/audio.rs`; `punktfunk1.rs` |
| Apollo (Sunshine fork) — Input handling & injection | `input.cpp`; `input.cpp`; `keylayout.h`; `misc.cpp` | — |
| Apollo: App/process launch & display configuration (Windows host) | `process.cpp`; `display_device.cpp`; `process.h`; `virtual_display.h`; `misc.cpp`; `utils.cpp` | `vdisplay/sudovda.rs`; `vdisplay.rs`; `gamestream/apps.rs`; `library.rs`; `punktfunk1.rs`; `capture/wgc_relay.rs` |
| Apollo: Config, management/web UI, system tray | `config.h`; `config.cpp`; `confighttp.cpp`; `confighttp.h`; `system_tray.cpp`; `system_tray.h` | `mgmt.rs`; `mgmt_token.rs`; `main.rs`; `native_pairing.rs`; `library.rs`; `docs/windows-host.md` |

### Apollo — Protocol & streaming (RTP/FEC/ENet/RTSP/crypto)

**Purpose.** Apollo's GameStream-compatible streaming transport: the host side of the Moonlight wire protocol.
RTSP (TCP) negotiates one session's parameters. Three UDP-based planes then carry the session:

- video (RTP + Reed-Solomon FEC over raw UDP);
- audio (RTP + fixed RS FEC over raw UDP);
- a reliable control plane over ENet (on its own UDP socket).

crypto.cpp supplies the AES-GCM/CBC/ECB primitives plus X.509 cert handling used across all planes. network.cpp is small networking glue (net-type classification, ENet host creation, port mapping, mDNS naming). This is the subsystem punktfunk's GameStream-compat host must mirror byte-for-byte to keep stock Moonlight clients working; it is NOT the punktfunk/1 native protocol.

#### Control flow (one session, host side)

1. **RTSP handshake (TCP, port base+21)** — `rtsp.cpp`. A singleton `rtsp_server_t server` (rtsp.cpp:643) runs a boost::asio io_context on one thread (`rtsp.cpp:1202` `start()`). It maps 5 verbs (`rtsp.cpp:1188-1192`): OPTIONS, DESCRIBE, SETUP, ANNOUNCE, PLAY.
   - Before RTSP, the HTTPS pairing/launch code calls `launch_session_raise()` (rtsp.cpp:645). This arms a single pending `launch_session_t` on `launch_event`, plus a `ping_timeout` discard timer (rtsp.cpp:492-511). **Only one pending RTSP session at a time** (rtsp.cpp:493-496, 517-529).
   - `handle_accept` (rtsp.cpp:448) pops that pending launch_session, attaches it to the socket, and starts reading. With no pending session, the connection is closed immediately (rtsp.cpp:464-471). This is explicitly so port scanners don't spam logs.
   - DESCRIBE (`cmd_describe`, rtsp.cpp:779) emits the SDP-ish capability blob:
     - feature flags;
     - **encryption supported/requested negotiation** (rtsp.cpp:794-812);
     - HEVC/AV1 availability;
     - surround-sound channel-mapping params, including the GFE channel-rotation workaround (rtsp.cpp:838-848).
   - SETUP (rtsp.cpp:862) just echoes the UDP port numbers and the X-SS-Ping-Payload / X-SS-Connect-Data identifiers. These identifiers are used later to bind the UDP/ENet peers to this session.
   - ANNOUNCE (`cmd_announce`, rtsp.cpp:921) is the heavy one. In order, it:
     - parses the client's `a=` attribute lines into a map (rtsp.cpp:956-971);
     - fills `stream::config_t` with resolution/fps/bitrate/packetsize/FEC-min/codec/HDR/encryption flags (rtsp.cpp:993-1065);
     - runs **bitrate de-rating math**, subtracting FEC overhead, audio bandwidth, and 500 Kbps of A/V+control overhead (rtsp.cpp:1113-1133);
     - rejects clients that request HEVC/AV1 while that codec is disabled, and clients that don't comply with mandatory encryption;
     - calls `stream::session::alloc()` + `stream::session::start()` (rtsp.cpp:1159-1168).
2. **Broadcast subsystem (UDP + ENet)** — `stream.cpp`. A single process-wide `broadcast_ctx_t` (stream.cpp:334, lazily created via `safe::make_shared` at stream.cpp:488) owns 4 long-lived threads started in `start_broadcast` (stream.cpp:1755): `recvThread`, `videoBroadcastThread`, `audioBroadcastThread`, `controlBroadcastThread`. The context also holds two bound `udp::socket`s (video port base+9, audio base+11) and one ENet host (control, base+10). **The broadcast threads are shared across all sessions**; per-session routing happens by peer/session-id lookup.
3. **Per-session capture→send.** `session::start` (stream.cpp:2089) spawns per-session `videoThread`/`audioThread` (stream.cpp:2115-2116).
   - Each thread blocks in `recv_ping` (stream.cpp:1849) until the client's first UDP PING arrives. This is NAT/port discovery: it is how the host learns the client's source ip:port.
   - The thread then calls `video::capture`/`audio::capture`. The capture pipeline pushes encoded `video::packet_t`/`audio::packet_t` onto a `mail` queue.
   - `videoBroadcastThread` (stream.cpp:1327) and `audioBroadcastThread` (stream.cpp:1651) pop from those queues and do the RTP/FEC/encrypt/pace/send work.

   So **capture/encode and transmit are decoupled by a queue and run on different threads** (one transmit thread per plane, shared by all sessions).
4. **Control plane.** `controlBroadcastThread` (stream.cpp:930) registers handlers in a `type → callback` map (stream.cpp:931-1119). It runs `control_server.iterate(150ms)` (stream.cpp:1198), which wraps `enet_host_service`.
   - Inbound: PING/loss-stats/IDR-request/ref-frame-invalidation/input/encrypted-wrapper/server-cmd.
   - Outbound, driven by per-session queues drained in the same loop (stream.cpp:1173-1186): rumble, rumble-triggers, motion-event, RGB LED, adaptive triggers, HDR-mode, termination.
   - Sessions are matched to ENet peers via `control_server_t::get_session` (stream.cpp:490). The fast path is an O(1) map lookup by `ENetPeer*`. The slow path matches by `connect_data` (ML_FF_SESSION_ID_V1 clients) or by source IP (legacy).

#### Plane summary

- **Video** (stream.cpp:1327): raw UDP, RTP header + `NV_VIDEO_PACKET`, GF(2^8) Reed-Solomon FEC (≤255 shards/block, up to 4 blocks), optional per-packet AES-GCM, in-frame rate-control pacing.
- **Audio** (stream.cpp:1651): raw UDP, RTP, **fixed** RS FEC geometry (RTPA_DATA_SHARDS data + RTPA_FEC_SHARDS parity), optional AES-CBC per packet, FEC parity emitted at block boundaries.
- **Control** (stream.cpp:930): ENet (reliable, ordered) over its own UDP socket, with an *application-layer* AES-GCM wrapper (type 0x0001) layered on top.
- **RTSP** (rtsp.cpp): TCP, optional AES-GCM whole-message encryption with a length-prefixed framed header.

**Key techniques**

- **Video FEC: per-frame RS blocks aligned to packet size, up to 4 blocks** — fec::encode (stream.cpp:649) computes `data_shards = ceil(payload/blocksize)` and `parity_shards = ceil(data*fec%/100)`. Parity is then raised to at least `session->config.minRequiredFecPackets` (stream.cpp:659-664).
  videoBroadcastThread (stream.cpp:1411-1460) then does the following:
  - It caps blocks at MAX_FEC_BLOCKS=4, which is 2 bits in the protocol's multiFecBlocks field.
  - It computes `max_data_shards_per_fec_block = 255*100/(100+fec%)`, because GF(2^8) caps total shards at 255 (stream.cpp:1420).
  - It splits the frame into blocksize-aligned blocks.
  - It turns FEC off entirely for frames too large to fit in 4 blocks (stream.cpp:1428-1432).

  Reed-Solomon encoding goes through the Rust-wrapped reed_solomon_new/encode (rswrapper). — _This is the GF(2^8)/≤255-shard ceiling that punktfunk's CLAUDE.md explicitly calls out as the ~1 Gbps wall and the reason punktfunk/1 uses GF(2^16) Leopard. But punktfunk's GameStream-compat path MUST replicate this exact 4-block / 255-shard / multiFecBlocks geometry to interop with stock Moonlight. The min-parity-shard bump and 'disable FEC for giant frames' fallback are non-obvious correctness details a Rust reimplementation will get wrong without reading this._
- **Zero-copy FEC via shard pointers into the encoder's own payload buffer** — fec::encode (stream.cpp:676-700) does NOT copy the encoded frame.
  - It points shards_p[x] directly into the payload buffer for all aligned data shards.
  - It only allocates a separate buffer for (a) the single zero-padded trailing data shard and (b) the parity shards.
  - payload_buffers (a vector of {ptr,len} descriptors, stream.cpp:634) describes the original payload + the shard buffer, so the batched-send layer can scatter-gather both without a merge copy.
  - The pad shard uses a deliberate memcpy/memset instead of std::copy_n, because GCC compiled the latter to an element-by-element loop (stream.cpp:688-696).

  — _Directly relevant to punktfunk's zero-copy invariant: Apollo avoids a full-frame copy on the hot path even though FEC normally implies materializing all shards. The pattern (original payload stays put; only padding+parity are freshly allocated; sends are scatter-gather over buffer descriptors) is transferable to punktfunk's send path.
  The GCC codegen footgun is a real lesson._
- **In-frame rate-control pacing targeting ~80% of 1 Gbps** — videoBroadcastThread (stream.cpp:1462-1627) computes `ratecontrol_packets_in_1ms = 1Gbps*0.8/1000/blocksize/8` (stream.cpp:1464).
  - It paces sends WITHIN a frame. After each send batch, if the 1ms packet budget is exhausted, it sleeps on a high-precision timer until the computed 'due' time (stream.cpp:1578-1590).
  - It carries ratecontrol_next_frame_start across frames so back-to-back frames don't burst (stream.cpp:1477, 1625-1627).
  - Batch size is `min(64KB/blocksize, 64 packets)` (stream.cpp:1468-1474). The 64K cap exists because Windows bypasses SO_SNDBUF above that size and stalls in 'Other I/O'. The 64-packet cap exists because Linux GSO maxes out at 64 segments.

  — _Sub-frame pacing prevents the receiver's UDP socket buffer from overflowing on bursty large IDR frames — a major real-world packet-loss source. The 64KB-Windows / 64-segment-Linux batch caps are hard-won platform facts directly applicable to punktfunk's Windows host (which already does sendmmsg batching). punktfunk's CLAUDE.md mentions sendmmsg + paced send thread as a port from GameStream — this is the exact source._
- **Deterministic AES-GCM IV construction (NIST SP 800-38D 8.2.1) per plane** — Every encrypted plane builds a 12-byte GCM IV from two parts: a monotonic 32- or 64-bit counter in the low bytes, and a 2-byte 'fixed field' tag in the high bytes that uniquely identifies the plane+direction.
  - Video uses iv[11]='V' + a 64-bit counter (stream.cpp:1554-1564).
  - Control out uses 'HC' (Host/Control, stream.cpp:459-460); control in uses 'CC' (stream.cpp:1084-1085).
  - RTSP in uses 'CR' (rtsp.cpp:215-216); RTSP out uses 'HR' (rtsp.cpp:721-722).
  - Legacy Nvidia clients fall back to a 16-byte IV with just seq in byte 0 (stream.cpp:461-466, 1086-1091).

  Each client supplies its own unique key, so the fixed field only needs to disambiguate independent uses of THAT key. — _This is the precise IV scheme a punktfunk GameStream-compat host must reproduce or Moonlight's decrypt fails.
  The per-direction/per-plane nonce-fixed-field idea mirrors punktfunk-core's 'per-direction nonce salts + seq-as-AAD' invariant — same threat model, different encoding. These fixed-field tags are not documented anywhere but here._
- **Layered reliable+encrypted control protocol over ENet** — Control uses ENet (reliable, ordered UDP), but adds an AES-GCM layer on top for controlProtocolType==13 clients.
  - Outbound, encode_control (stream.cpp:434) prepends a control_encrypted_t {type=0x0001, length, seq} header. It then encrypts the inner control_header_v2+payload and bumps a per-session seq that serves as the GCM invocation counter.
  - Inbound, the IDX_ENCRYPTED handler (stream.cpp:1055) decrypts, then re-dispatches via server->call(...,reinjected=true).
  - call() (stream.cpp:557) DROPS any non-encrypted message on an encrypted stream (anti-downgrade).
  - Input data is special-cased, because the input handler does its own legacy decryption (stream.cpp:1112-1115).

  — _Shows how to add confidentiality on top of an off-the-shelf reliable-UDP lib without forking it, and the reinjection pattern for transparent decrypt-then-dispatch. The anti-downgrade drop (reject plaintext on an encrypted channel) is a security property punktfunk should mirror. ENet here is the analog of punktfunk/1's QUIC datagram side-plane demux-by-first-byte._
- **Per-client X509_STORE to dodge the OpenSSL multi-client verify bug** — cert_chain_t::verify (crypto.cpp:55) keeps a SEPARATE X509_STORE per added client cert and tries each in turn. It has to: X509_verify_cert against a store holding multiple self-signed certs only ever verifies ONE of them, so only a single Moonlight client could ever authenticate (comment crypto.cpp:45-52).
  It also:
  - sets X509_V_FLAG_PARTIAL_CHAIN, so the full chain is not validated for client auth (crypto.cpp:68);
  - tolerates expired/not-yet-valid certs via a custom verify_cb (crypto.cpp:29-43), matching GFE behavior for embedded clients with bad clocks;
  - treats self-signed/invalid-CA as 'try next' rather than fatal (crypto.cpp:79).

  — _A subtle, easy-to-miss OpenSSL gotcha that would silently limit a host to one paired client. The expired-cert tolerance and partial-chain flag are interop requirements for real-world Moonlight clients. Any punktfunk pairing/trust code that uses OpenSSL-style stores for multiple pinned client certs hits the same trap._
- **PING-based UDP peer discovery + dual session-binding (connect_data vs IP)** — The host doesn't know the client's UDP source ports until the client sends them.
  - recv_ping (stream.cpp:1849) registers a message queue keyed by the X-SS-Ping-Payload (and, for legacy clients only, by peer IP). It blocks until a matching PING arrives, then records the peer ip:port.
  - recvThread (stream.cpp:1234) demuxes inbound UDP (stream.cpp:1296-1312). 4-byte legacy PINGs are matched by source address; PINGs of >=sizeof(SS_PING) bytes are matched by their embedded payload identifier.
  - Control sessions bind via ML_FF_SESSION_ID_V1 connect_data when available, else by source IP (stream.cpp:511-525).
  - The local source address is captured from the control connection, for correct routing on multi-homed hosts (stream.cpp:531-536).

  — _This is how a single shared video/audio/control socket triplet serves multiple sessions and survives NAT — the binding can't be done at SETUP time.
  The payload-vs-IP dual scheme (v2 vs legacy) and the multi-homed source-address capture are exactly the edge cases that make a naive reimplementation fail on real networks._

**Hard-won lessons (edge cases handled)**

- Frames needing >4 FEC blocks (roughly >800 packets at 20% FEC) silently disable FEC for that frame rather than corrupt the protocol. Frames >4096 packets (10-bit FEC index overflow) are logged as unrecoverable but still sent (stream.cpp:1428-1432, 1445-1449). In practice this doubles as a broken-encoder detector.
- minRequiredFecPackets forces a parity-shard floor even when the % formula yields fewer shards, recomputing the effective FEC% upward (stream.cpp:659-664). Clients request this for tiny frames, where percentage-based parity would otherwise round down to almost nothing.
- The audio RS parity matrix from the RS lib does NOT match Nvidia's.
  - Apollo overwrites rs.get()->p with a hardcoded 8-byte matrix from OpenFEC (stream.cpp:1659-1665).
  - This is only possible because audio shard counts are fixed/known.
  - A reimplementation using a different RS lib must replicate Nvidia's exact parity matrix, or Moonlight's audio FEC recovery fails.
- Send batches must stay under two different OS limits, collapsed into one min() (stream.cpp:1468-1474):
  - under 64KB on Windows (above that, sends bypass SO_SNDBUF, land in 'Other I/O', and wait on interrupts → inconsistent latency);
  - under 64 segments on Linux (the GSO segment cap).
- Video SO_SNDBUF is bumped to 1MB and failure is non-fatal (stream.cpp:1776-1781). Mirrors punktfunk's documented net.core.wmem_max wall.
- Encrypted control stream rejects any plaintext message (anti-downgrade). A control_encrypted message whose decrypted inner type is itself IDX_ENCRYPTED is treated as an attack, and the session is killed (stream.cpp:558-562, 1106-1110).
- GCM decrypt/tag-verify failure on control or input immediately stops the session (stream.cpp:993, 1099). There is no silent drop: a forged packet tears the session down.
- RTSP read paths bound the 2048-byte msg_buf before reading and 400-reject oversize messages. They do this separately for plaintext and for the length-prefixed encrypted header (rtsp.cpp:99-109, 162-168, 244-254).
  - Encrypted RTSP rejects any message without the MSB-set encrypted-type bit (rtsp.cpp:152-158).
  - Maps to punktfunk's 'bound attacker-controlled fields before allocating' invariant.
- Only ONE pending RTSP launch session is allowed (rtsp.cpp:492-511).
  - A second session_raise while one is pending is silently ignored.
  - A ping_timeout timer discards the pending session if the client never completes the handshake.
  - Apollo (like punktfunk today) is effectively one-session-at-a-time on the negotiation path.
- Stereo audio 'high quality' is signaled by whether the RTSP Host header matches a local IP (rtsp.cpp:1072-1081). Moonlight sends 0.0.0.0 for low quality, so Apollo just substring-checks for 0.0.0.0 instead of enumerating interfaces. A genuinely weird GFE protocol quirk.
- GFE advertises an incorrect surround channel mapping. Moonlight compensates by rotating channels right from index 3, so Apollo pre-rotates LEFT from index 3 to cancel it (rtsp.cpp:838-848). Must be replicated for correct 5.1/7.1.
- session::join arms a 10-second watchdog that hard-traps the process if a session won't terminate (stream.cpp:2025-2038). It exists specifically to recover from an NVENC deadlock bug under hardware-accelerated GPU scheduling. Pragmatic 'crash-to-recover' for a driver bug.
- RTP video timestamps use a 90kHz clock derived from frame_timestamp. Duplicate frames have no timestamp, so the rate-control next-frame-start is substituted (stream.cpp:1526-1534).
- Per-packet video encryption is done IN PLACE into the FEC shard buffer (stream.cpp:1507, 1552-1571).
  - Each shard gets a prefix header (12B IV + frameNumber + 16B tag).
  - The space for that header is carved out by giving fec::encode a non-zero prefixsize.
  - The FEC and crypto layers are interleaved, not sequential.
- RFC 6763 mDNS instance names are sanitized (network.cpp:204-225):
  - truncated to 63 chars;
  - spaces→dashes;
  - cut off at the first non-alnum/non-dash char;
  - empty → 'Apollo'.
- Encryption mode is chosen per remote address (network.cpp:147-154). PC/LAN addresses use lan_encryption_mode; WAN addresses use wan_encryption_mode. MANDATORY mode rejects non-compliant clients during ANNOUNCE with 403 (rtsp.cpp:1149-1157).

*External deps:*

- moonlight-common-c, vendored:
  - Limelight-internal.h, Rtsp.h;
  - RTP_PACKET/NV_VIDEO_PACKET/SS_PING/AUDIO_FEC_HEADER structs;
  - parseRtspMessage/serializeRtspMessage/createRtspResponse;
  - `RTPA_*_SHARDS` + DATA_SHARDS_MAX + MAX_RTP_HEADER_SIZE constants;
  - `FLAG_*` and `SS_ENC_*` flags.
- ENet (reliable-UDP control plane: enet_host_create/host_service/peer_send/packet_create).
- boost::asio (RTSP TCP server + UDP recv io_context; boost::endian for wire byte order).
- OpenSSL (EVP AES-128 GCM/CBC/ECB, X.509 gen/parse/sign/verify, RAND, SHA-256).
- reed_solomon (RS FEC, via rswrapper.h — a Rust-backed wrapper around the RS encoder).
- platf:: platform layer (platf::send / platf::send_batch / batched_send_info_t, enable_socket_qos, high-precision timer, thread priority, from_sockaddr).

### Apollo (Sunshine fork) — GameStream HTTP server, pairing ceremony, and discovery/UPnP

**Purpose.** The GameStream-compatible front door: an HTTPS+HTTP server speaking the NVIDIA GameStream/nvhttp protocol that stock Moonlight clients talk to. It owns:

1. the unauthenticated HTTP plane (serverinfo + pairing) and the mTLS HTTPS plane (everything authenticated);
2. the 4-phase Moonlight PIN pairing crypto ceremony, plus a persistent paired-client cert store with a fine-grained per-device permission model;
3. session launch/resume/cancel, which constructs the RTSP launch_session and triggers display reconfig + encoder probing;
4. reachability plumbing: UPnP/PCP port mapping + IPv6 pinholes (upnp.cpp), and the credential/CA-cert bootstrap + curl helpers (httpcommon.cpp).
This is the directly transferable reference for punktfunk's GameStream-compat host. It is the same wire protocol punktfunk's gamestream/ subsystem already implements, but Apollo's is the battle-tested Windows lineage with the edge cases worked out.

#### Two listeners, one verify hook (nvhttp.cpp start(), lines 1636-1771)

`start()` spins up TWO servers, each on a base port + offset. This uses `net::map_port` (network.cpp:187): every Apollo port is `config.port + offset`, so PORT_HTTP=0→47989, PORT_HTTPS=-5→47984, RTSP=21, Video=9, Control=10, Audio=11.

- **Plain HTTP** (`http_server_t`, port 47989): exposes ONLY `/serverinfo` and `/pair`.
  - It is unauthenticated; Moonlight uses it before a client cert exists.
  - serverinfo over HTTP returns a placeholder MAC `00:00:00:00:00:00` and Permission=0 (nvhttp.cpp:942-943).
- **HTTPS w/ mutual TLS** (`SunshineHTTPSServer`, port 47984): everything authenticated.
  - The custom subclass (nvhttp.cpp:68-139) overrides `after_bind()` to set `verify_peer | verify_fail_if_no_peer_cert | verify_client_once`, with a permissive OpenSSL callback that ALWAYS returns 1.
  - Real verification is deferred to AFTER the TLS handshake, so the server can reply with a proper XML 401 error body instead of a TCP reset.
  - The `accept()` override (nvhttp.cpp:97-138) sets TCP_NODELAY, does the async handshake, then calls the user `verify` lambda.
  - The `verify` lambda (nvhttp.cpp:1661-1700) pulls the peer X509 and runs `cert_chain.verify()`. On success it stashes the matched `named_cert_t` shared_ptr into `request->userp`.
  - Every authenticated handler then calls `get_verified_cert(request)` (nvhttp.cpp:665-667) to recover the device identity + permissions.

  This is how Apollo binds a TLS connection to a paired-device record + permission set for the lifetime of the request.

#### Handler routing (nvhttp.cpp:1717-1730)

HTTPS: `/serverinfo /pair /applist /appasset /launch /resume /cancel /actions/clipboard(GET|POST)`. HTTP: `/serverinfo /pair` only.
`launch` and `resume` capture a shared `bool host_audio` by reference. resume often omits localAudioPlayMode, so the value launch stored is reused (nvhttp.cpp:1392-1398).

#### The 4-phase PIN pairing FSM (nvhttp.cpp:485-650, header docs nvhttp.h:137-185)

All phases go over the same `/pair` endpoint. The FSM is driven by the `phrase`/argument present in the GET query. Progress is tracked by `pair_session_t.last_phase` (PAIR_PHASE enum), keyed by client `uniqueid` in `map_id_sess`:

1. **getservercert** (phrase=getservercert): client sends salt + clientcert. Server derives `AES = gen_aes_key(salt, PIN)`, returns its plaincert. The PIN is obtained out-of-band (web UI / system tray / stdin / OTP) — see PIN delivery below.
2. **clientchallenge**: AES-ECB decrypt challenge, hash(challenge + server-cert-signature + serversecret), encrypt hash+serverchallenge back.
3. **serverchallengeresp**: decrypt client hash, sign serversecret with host privkey, return as pairingsecret.
4. **clientpairingsecret**: two checks must both pass.
   - Recompute hash(serverchallenge + client-cert-sig + client-secret) and compare it to the stored client hash.
   - verify256 the secret's signature against the client cert.

   If both pass, the exchange is MITM-free. The host then:
   - mints a `named_cert_t`;
   - assigns PERM: `_all` if this is the first-ever device, else `_default`=view|list;
   - calls `add_authorized_client()`.

Each phase guards `last_phase` ordering and calls `fail_pair()` (nvhttp.cpp:477-483) on any mismatch. That deletes the whole session and forces a clean re-pair. This is a strict state machine, not a stateless one.

#### PIN delivery (3 modes, nvhttp.cpp:752-800, 831-880, 1773-1784)

- **OTP** (`otpauth` arg): a pre-armed 4-digit one-time PIN (`request_otp`), bound by `hash(pin + salt + passphrase)`. It expires after 180s. Crucially, on OTP mismatch the host STILL calls getservercert, with a random PIN (nvhttp.cpp:781). The attacker then fails silently in later phases (no oracle leak).
- **Async web/tray PIN**: getservercert stashes the HTTP `response` object into `async_insert_pin.response` and DISABLES its fail_guard (nvhttp.cpp:796-799).
  - The HTTP request then hangs open until an operator calls the out-of-band `pin(pin,name)` API (nvhttp.cpp:831).
  - `pin()` checks that the PIN is 4 numeric digits, finishes getservercert, and writes the deferred response.
- **PIN_STDIN**: synchronous console prompt.

#### Paired-client persistence (load_state/save_state, nvhttp.cpp:206-366)

The JSON state file holds the host `uniqueid` + a `named_devices` array. Each entry carries:

- name, cert PEM, uuid;
- display_mode, perm;
- do/undo command lists;
- legacy_ordering / allow_client_commands / always_use_virtual_display flags.

load_state migrates an OLD `devices[].certs[]` format (nvhttp.cpp:320-338). save_state dedups certs and disambiguates duplicate names with " (2)" suffixes. After every mutation the pattern is save_state()→load_state(), which rebuilds the in-memory `cert_chain` from disk (nvhttp.cpp:376-379, 1791-1792).

#### Launch/resume/cancel (nvhttp.cpp:1151-1493)

`make_launch_session` (382-471):

- parses rikey/rikeyid into the GCM key+IV;
- enables encrypted RTSP when client `corever>=1`;
- parses the "WxHxFPS" mode (default 1920x1080x60000, with a `fps*1000` denominator);
- resolves the per-device display_mode override.

launch enforces permissions, the single-app-at-a-time rule, and mandatory-encryption rejection (`encryption_mode_for_address`). It then runs `configure_display()` + `probe_encoders()` BEFORE returning the rtsp:// sessionUrl0 and raising the session. cancel terminates RTSP sessions and the app, then reverts the display config.

#### Discovery + reachability

- **mDNS** is NOT in these files.
  - It lives in platform `publish.cpp` (Avahi on Linux, DNS-SD on Win/macOS), which advertises the GameStream service.
  - network.cpp only sanitizes the instance name (mdns_instance_name, RFC 6763 §7.2): ≤63 chars, spaces→dashes, stop at the first invalid char.
- **upnp.cpp** runs a background thread (deinit_t, RAII) that:
  - discovers an IGD (miniupnpc, 2s timeout);
  - maps all six stream ports, plus the Web-UI port only if it is WAN-exposed;
  - opens IPv6 firewall pinholes if AF=BOTH;
  - refreshes on REFRESH_INTERVAL, because router reboots/WAN-IP changes drop rules;
  - unmaps everything on shutdown.

**Key techniques**

- **Deferred post-handshake TLS verification (reply, don't reset)** — SunshineHTTPSServer::after_bind installs a permissive OpenSSL verify_callback that always returns 1 (nvhttp.cpp:89-92). The real check runs in the user `verify` lambda AFTER the handshake completes (nvhttp.cpp:124-129). On failure it writes an XML 401 body (on_verify_failed, 1702-1715) rather than aborting the TLS handshake. — _Moonlight expects a structured GameStream XML error, not a TCP RST, when its cert isn't paired — this is what makes 'this host needs pairing' show correctly in the client instead of a generic connection failure. punktfunk's GameStream host must reproduce this exact behavior for client-side UX parity._
- **Per-cert X509_STORE verification** — cert_chain_t keeps a SEPARATE X509_STORE per paired client cert and loops verifying against each (crypto.cpp:55-85). It uses X509_V_FLAG_PARTIAL_CHAIN and a callback that tolerates expired/not-yet-valid certs. — _OpenSSL only verifies ONE self-signed cert per store (documented at crypto.cpp:47-51) — naively adding all client certs to one store silently breaks all-but-one client. This is a non-obvious correctness trap any GameStream host with multiple paired clients hits._
- **TLS-connection→identity binding via request->userp** — On successful verify, the matched named_cert_t shared_ptr is stored in request->userp (nvhttp.cpp:1697). Every authenticated handler recovers it via get_verified_cert and gates on PERM bitflags (nvhttp.cpp:665).
  — _Gives each request a strongly-typed, pre-authenticated device identity + permission set with zero re-lookup — the clean way to attach per-device policy (display_mode override, command allow-list, virtual-display forcing) to a GameStream request._
- **Fine-grained PERM bitflag permission model** — crypto.h:45-74 defines three groups: input (controller/touch/pen/mouse/kbd), operation (clipboard/file/server_cmd), and action (list/view/launch).
  - The first-ever paired device gets `_all`; subsequent devices get `_default`=view|list (nvhttp.cpp:629-633).
  - Handlers check flags such as `PERM::_allow_view`, `PERM::launch`, `PERM::clipboard_read`.

  — _Far beyond Moonlight's binary paired/unpaired. Enables view-only spectators, locked-down clients, and per-device app visibility — a concrete feature punktfunk's host lacks and could adopt wholesale since it rides on the same named-cert store._
- **Async-deferred HTTP response for out-of-band PIN** — getservercert stashes the live response object and disables its fail_guard (nvhttp.cpp:796-799). The request hangs until the operator supplies the PIN via the pin() API, which writes the held response (nvhttp.cpp:867-874). — _Lets the PIN be entered in a web UI / tray long after Moonlight sent phase 1, without a polling protocol — the HTTP request itself is the wait primitive. Maps directly onto punktfunk's 'arm pairing → display PIN' mgmt flow._
- **Single base port + signed offsets** — net::map_port (network.cpp:187) computes every port as config.port + a compile-time offset (PORT_HTTPS=-5, RTSP=21, etc.). — _One config knob shifts the entire port block, and UPnP/mDNS/RTSP all derive from it consistently. Avoids the port-config drift bugs that plague multi-port streaming hosts._
- **LAN/WAN-conditional mandatory encryption** — encryption_mode_for_address (network.cpp:147) classifies the source address and picks lan_ vs wan_encryption_mode accordingly. When the policy is MANDATORY, launch/resume reject clients that can't do encrypted RTSP (nvhttp.cpp:1244, 1430).
  — _Encrypted RTSP costs latency, so it's enforced only where needed (WAN) and optional on trusted LANs — directly relevant to a low-latency host that doesn't want to pay crypto cost on a trusted local link._

**Hard-won lessons (edge cases handled)**

- MITM detection without an oracle:
  - Phase 4 requires BOTH the recomputed hash to match the stored client hash AND a verify256 signature check against the client cert (nvhttp.cpp:615-617).
  - On OTP PIN mismatch, the server proceeds with a RANDOM PIN (nvhttp.cpp:781). An attacker therefore can't distinguish a wrong PIN from a wrong later phase: no timing/branch oracle.
- Out-of-order pairing calls are a security event, not just an error. fail_pair() deletes the entire pair_session_t (nvhttp.cpp:481), forcing a full re-pair rather than letting a half-completed session be resumed/replayed.
- OpenSSL multi-client store trap (documented at crypto.cpp:47-51): one X509_STORE verifies only a single self-signed cert. Each paired client therefore needs its own store, else only one Moonlight install works.
- Embedded-client clock skew: the verify callback explicitly accepts X509_V_ERR_CERT_NOT_YET_VALID and CERT_HAS_EXPIRED (crypto.cpp:36-38). Moonlight on devices without RTCs produces certs with bad validity windows; this matches GFE behavior.
- Moonlight-Embedded forks produce client certs that OpenSSL won't recognize as self-signed, due to X509v3 extensions. This is handled with X509_V_FLAG_PARTIAL_CHAIN (crypto.cpp:68), so client-auth doesn't require full chain validation.
- IPv6 LocalIP clobber: serverinfo returns 127.0.0.1 as root.LocalIP for non-v4-mapped IPv6 requests (nvhttp.cpp:955-956). Moonlight tracks LAN IPv6 separately, and a real IPv6 in LocalIP would overwrite the client's stored IPv4. This copies the GFE+GS-IPv6-Forwarder hack.
- The MAC address is sent only over HTTPS to paired clients. HTTP serverinfo returns 00:00:00:00:00:00, which Moonlight knows to ignore (nvhttp.cpp:910-913, 942). This avoids leaking the MAC to unpaired/unauthenticated probers.
- UPnP must tolerate broken IGDs: on a strange GetSpecificPortMappingEntry error (not 606/Unauthorized) it falls back to indefinite (lease=0) static mappings (upnp.cpp:230-234), and retries AddPortMapping with lease 0 if a timed mapping fails (upnp.cpp:251-265). It also detects and tries to delete conflicting mappings owned by other LAN clients (upnp.cpp:210-225).
- UPnP rules are refreshed on an interval, not set-once: routers drop them on reboot / WAN-IP change (upnp.cpp:309-310), and ports are unmapped on clean shutdown via the RAII `deinit_t` (upnp.cpp:358-362).
- `resume` requests often lack `localAudioPlayMode`: launch stashes `host_audio` in a shared bool that resume reuses, and only re-reads the arg when there are no active sessions to disturb (nvhttp.cpp:1392-1398).
- Device-name sanitization: parens in client names are rewritten to brackets at pair time (nvhttp.cpp:622-625) because `save_state` uses ' (N)' suffixes for duplicate-name disambiguation (nvhttp.cpp:237-246) — a literal paren in a name would corrupt that scheme.
- Legacy Moonlight clients send devicename 'roth', which is relabeled 'Legacy Moonlight Client' (nvhttp.cpp:741); the pairing-disabled and missing-uniqueid paths return distinct XML status codes (403/400) that Moonlight surfaces to the user.
- Persistence rebuild discipline: every mutation does `save_state()` then `load_state()` to rebuild the in-memory cert_chain from disk (nvhttp.cpp:376-379) — guarded by a FRESH_STATE flag so a clean-slate run never persists.
- Encoder probing happens at launch/resume time, not just at startup: `probe_encoders()` is re-run before each stream so the chosen NVENC encoder matches the CURRENT active GPU after hotplug/driver-crash/primary-monitor change (nvhttp.cpp:1416-1426); a probe failure returns HTTP 503 with an 'is a display connected?' message.
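
The PERM bitflag model from the key techniques above is small enough to sketch directly. This is a minimal Rust sketch under stated assumptions: the `Perm` type, every bit position, and the `initial_perms` / `may_launch` helpers are hypothetical (Apollo's real values live in crypto.h:45-74 and are not reproduced here). Only the three groups and the pair-time policy (first device gets everything, later devices get `view|list`) come from the text. A plain `u32` newtype keeps it dependency-free; a real port might prefer the `bitflags` crate.

```rust
/// Permission bitset modeled on Apollo's PERM groups.
/// HYPOTHETICAL bit layout: Apollo's real values are in crypto.h:45-74.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Perm(pub u32);

impl Perm {
    // Input group (controller/touch/pen/mouse/kbd).
    pub const CONTROLLER: Perm = Perm(1 << 0);
    pub const TOUCH: Perm = Perm(1 << 1);
    pub const PEN: Perm = Perm(1 << 2);
    pub const MOUSE: Perm = Perm(1 << 3);
    pub const KBD: Perm = Perm(1 << 4);
    // Operation group (clipboard/file/server_cmd).
    pub const CLIPBOARD_READ: Perm = Perm(1 << 8);
    pub const FILE: Perm = Perm(1 << 9);
    pub const SERVER_CMD: Perm = Perm(1 << 10);
    // Action group (list/view/launch).
    pub const LIST: Perm = Perm(1 << 16);
    pub const VIEW: Perm = Perm(1 << 17);
    pub const LAUNCH: Perm = Perm(1 << 18);

    /// Every defined bit: granted to the first-ever paired device.
    pub const ALL: Perm = Perm(
        Self::CONTROLLER.0 | Self::TOUCH.0 | Self::PEN.0 | Self::MOUSE.0 | Self::KBD.0
            | Self::CLIPBOARD_READ.0 | Self::FILE.0 | Self::SERVER_CMD.0
            | Self::LIST.0 | Self::VIEW.0 | Self::LAUNCH.0,
    );
    /// Granted to every later device: view | list.
    pub const DEFAULT: Perm = Perm(Self::VIEW.0 | Self::LIST.0);

    /// True if every bit in `needed` is granted.
    pub const fn contains(self, needed: Perm) -> bool {
        (self.0 & needed.0) == needed.0
    }
}

/// Pair-time policy: the first device gets ALL, later devices get DEFAULT.
pub fn initial_perms(previously_paired: usize) -> Perm {
    if previously_paired == 0 {
        Perm::ALL
    } else {
        Perm::DEFAULT
    }
}

/// Handler-side gate, analogous to a PERM::launch check before launching an app.
pub fn may_launch(granted: Perm) -> bool {
    granted.contains(Perm::LAUNCH)
}
```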
*External deps:* Simple-Web-Server (header-only Boost.Asio HTTP/HTTPS server — the request/response/routing layer), Boost.Asio + Boost.Asio.SSL (TLS context, async accept/handshake), Boost.PropertyTree (pt::ptree → GameStream XML responses via write_xml), OpenSSL (X509 cert verify, AES-ECB pairing cipher, AES-GCM RTSP cipher, RSA-2048 host cert gen, SHA-256, sign256/verify256), nlohmann::json (paired-client state file + web-creds persistence), libcurl (httpcommon: download_file with CURLSSLOPT_NATIVE_CA on Windows, url_escape, url_get_host), miniupnpc (upnp.cpp: IGD discovery, port mapping, IPv6 pinholes; API-version-gated UPNP_GetValidIGD signature), ENet (network.cpp host_create for the UDP control channel — adjacent, not in the 3 target files)

### Apollo — Video encode pipeline & codec config

**Purpose.** The host-side video pipeline that turns a captured display surface into a GameStream/Moonlight-compatible compressed bitstream: it owns encoder discovery & probing, per-codec/per-colorspace encoder configuration, the capture→convert→encode→packetize loop (sync and parallel variants), low-latency rate-control/GOP/slice tuning, HDR & YUV444 negotiation, on-demand IDR + reference-frame invalidation, and SPS/VPS header rewriting to satisfy Moonlight's parser. Two encoder back-ends share one config table: libavcodec (used for QSV/AMF via D3D11VA on Windows, and for NVENC on Linux via CUDA) and a standalone hand-written NVENC wrapper (preferred for NVIDIA on Windows).

#### Big picture: a declarative encoder table + a generic configurator

Apollo describes every encoder as a static `encoder_t` (video.cpp:486-1042) holding three `codec_t` sub-tables (av1/hevc/h264). Each `codec_t` carries SIX option vectors — `common_options`, `sdr_options`, `hdr_options`, `sdr444_options`, `hdr444_options`, `fallback_options` (video.h:170-175) — plus the libavcodec encoder name and a runtime-filled capability bitset.
Options are `option_t{name, std::variant}` (video.h:154-165); the variant lets a value be a literal, a *pointer to a config field* (live-bound to user settings), or a *lambda* evaluated against the `config_t` at open time (e.g. AMD's level picker that computes "5.1"/"5.2"/"auto" from `width * height * framerate`, video.cpp:788-799). This is the central design lever: adding/altering encoder behavior is data, not code.

`encoder_t.flags` (flag_e, video.cpp:300-313) carries cross-cutting quirks: PARALLEL_ENCODING, H264_ONLY, LIMITED_GOP_SIZE (VAAPI), SINGLE_SLICE_ONLY (old Intel iGPUs), CBR_WITH_VBR (QSV), RELAXED_COMPLIANCE, NO_RC_BUF_LIMIT, REF_FRAMES_INVALIDATION, ALWAYS_REPROBE (software), YUV444_SUPPORT, ASYNC_TEARDOWN (NVENC). The encoder ordering list (video.cpp:1044-1059) is the preference order: on Windows {nvenc, quicksync, amdvce, software}; software is always last.

#### Two encode back-ends behind one table

`encode_session_t` is the abstract session (video.h:206). Two concretions:

- `avcodec_encode_session_t` (video.cpp:315-390): wraps an AVCodecContext + a `platf::avcodec_encode_device_t`. Used for QSV/AMF (D3D11VA) and NVENC-on-Linux (CUDA), VAAPI, VideoToolbox, and software. Carries SPS/VPS NAL rewrite state and `replacements`.
- `nvenc_encode_session_t` (video.cpp:392-436): wraps the standalone `nvenc::nvenc_base`. Windows NVENC path. Holds a `force_idr` bool toggled by request_idr_frame.

`make_encode_session` (video.cpp:1898) dispatches by encode-device dynamic type. The encode device is created earlier by `make_encode_device` (video.cpp:2096), which the *display/capture backend* fulfils via `disp.make_avcodec_encode_device(pix_fmt)` or `disp.make_nvenc_encode_device(pix_fmt)` — i.e. the capture backend hands the encoder a GPU surface/texture; that is the zero-copy seam.

#### The libavcodec configurator — make_avcodec_encode_session (video.cpp:1518-1888)

This is the single most important function. Control flow:

1. Resolve `codec_from_config(config)` → the right codec_t; bail if not PASSED or `disp->is_codec_supported` is false (1533-1547). Reject HDR if the DYNAMIC_RANGE capability is missing, and reject YUV444 if the YUV444 capability is missing.
2. Pick the sw pixel format from (bit_depth × chromaSamplingType) → one of the four encoder_platform_formats fields (1557-1561).
3. **Retry loop (2 iterations)** so fallback_options can be applied on a second avcodec_open2 attempt (1569). Inside:
   - set width/height/time_base/framerate; set profile per videoFormat (1576-1597);
   - `max_b_frames = 0` (B-frames add decoder latency, never used, 1600);
   - **infinite GOP**: gop_size = INT16_MAX if LIMITED_GOP_SIZE else INT_MAX, keyint_min = INT_MAX (1602-1607) — keyframes are produced on demand only;
   - flags = CLOSED_GOP | LOW_DELAY, flags2 |= FAST (1620-1622);
   - color_range/primaries/trc/colorspace from `avcodec_colorspace_from_sunshine_colorspace` (1624-1629); sw_pix_fmt set for cbs (1632);
   - **hardware branch (1634-1687)**: create base hwdevice ctx via the per-API initializer (dxgi/cuda/vaapi/vt), optionally derive a second ctx (QSV derives QSV from D3D11VA), then alloc+init an AVHWFramesContext (initial_pool_size=0, encode device may tweak via init_hwframes) and attach as hw_frames_ctx. slices = config.slicesPerFrame.
   - **software branch (1688-1695)**: slices = max(requested, min_threads) for parallelism.
   - SINGLE_SLICE_ONLY forces slices=1; thread_type=SLICE, thread_count=slices (1697-1702).
   - apply options in layers: common → (hdr|sdr) → (hdr444|sdr444 if YUV444) → fallback (only on retry) (1738-1754), via the `handle_option` std::visit dispatcher (1705-1736).
   - **rate control (1756-1787)**: `bit_rate = rc_max_rate = bitrate * 1000`. CBR by default (rc_min_rate=bitrate); CBR_WITH_VBR encoders set bit_rate-- to force libavcodec into VBR-simulating-CBR. rc_buffer_size = one frame's worth (bitrate/framerate) unless NO_RC_BUF_LIMIT; software with slices or HEVC uses a 1.5× buffer; NVENC adds vbv_percentage_increase.
   - `encode_device->init_codec_options(ctx, &options)` gives the platform device the last word (1790).
   - avcodec_open2; on failure with fallback_options on retry 0 → continue, else return null.
4. Build the AVFrame with matching color metadata; attach HDR mastering-display + content-light side data if HDR (1815-1856) from `disp->get_hdr_metadata`.
5. If the encode device has no GPU surface (`!encode_device->data`), build an `avcodec_software_encode_device_t` (sws scaler) to do CPU color-convert + aspect-padding (1858-1871).
6. Compute the `inject` value: for H264/HEVC, `inject = (1 - VUI_PARAMETERS cap) * (1 + videoFormat)` → 0/1/2, deciding whether the encoder needs SPS/VPS rewriting (1884).

#### NVENC standalone path — nvenc_base::create_encoder (nvenc_base.cpp:100-457)

Bypasses libavcodec entirely, talking to nvEncodeAPI directly. Picks the minimum API version (11 for H264/HEVC, 12 for AV1) to maximize driver compat (103). Opens a session, enumerates supported encode GUIDs and rejects an unsupported codec (130-173). Queries caps (max W/H, 10-bit, YUV444, async, RFI, custom-VBV, CABAC, weighted-pred, intra-refresh) via `get_encoder_cap` and gracefully degrades (191-220).

Config: preset P1-P7, NV_ENC_TUNING_INFO_ULTRA_LOW_LATENCY, enablePTD=1, CBR, zeroReorderDelay=1, no lookahead, lowDelayKeyFrameScale=1, multipass per config, AQ, NVENC_INFINITE_GOPLENGTH, frameIntervalP=1 (235-255). Per-codec: repeatSPSPPS/repeatSeqHdr=1, idrPeriod=infinite, sliceMode=3 + sliceModeData=slicesPerFrame for H264/HEVC, power-of-two tile rows/cols for AV1, chromaFormatIDC=3 for YUV444, pixelBitDepthMinus8=2 for 10-bit, full VUI fill, intra-refresh (period 300/cnt 299 + single-slice if capable), min-QP, ref-frame setup with L0 pinned to 1 ref but a larger DPB so RFI fallback has frames to use (257-380).

encode_frame (490-572) maps the registered input buffer, encodes with the FORCEIDR flag if requested, optionally waits on an async event, locks/copies the bitstream, and reports the idr flag + rfi confirmation.
invalidate_ref_frames (574-610) implements per-frame nvEncInvalidateRefFrames with dedup and an IDR fallback when the range is too large.

#### Capture/encode threading

`capture()` (video.cpp:2450) chooses, by the PARALLEL_ENCODING flag, between:

- **async/parallel** (capture_async → captureThread + encode_run): a single shared capture thread (priority critical, video.cpp:1295) feeds an image event queue; one encoder thread per session runs encode_run (video.cpp:1910). Capture and encode overlap.
- **sync** (captureThreadSync → encode_run_sync, video.cpp:2187): capture and encode on one thread (priority high) — used by encoders without PARALLEL_ENCODING.

`captureThread` (video.cpp:1148) owns a self-trimming image pool (12 buffers, LRU-trimmed after 3s of disuse, video.cpp:1197-1292) and the reinit dance: on `capture_e::reinit` it raises reinit_event, drops all images, waits for display use_count to drop to 1 (so capture surfaces are fully released), refreshes the display list, and re-opens the display, honoring switch_display events (video.cpp:1340-1401).

`encode_run`'s loop (video.cpp:1989-2063) implements **minimum-FPS pacing** (encode a frame even with static content to avoid quality decay), drops frames arriving faster than ~3/4 of the frame interval, services idr_events and invalidate_ref_frames_events, and on shutdown flushes via avcodec_send_frame(null) in the session dtor (video.cpp:327-337).

#### Encode → packet → SPS rewrite

`encode_avcodec` (video.cpp:1414) sends the frame, drains packets, and on the first IDR (when `session.inject`) calls cbs::make_sps_h264/make_sps_hevc to produce rewritten SPS/VPS and queues `replace_t{old,new}` byte-substitutions on the packet (video.cpp:1451-1474) — the RTP/packetization layer applies these to fix up Moonlight-incompatible headers without re-encoding. `encode_nvenc` (video.cpp:1488) just wraps the already-complete bitstream into a generic packet with idr + after_ref_frame_invalidation flags.
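
The `replace_t{old,new}` mechanism described above amounts to a byte-level splice on the outgoing packet, which is why no re-encode is needed. A minimal Rust sketch, assuming each substitution targets the first occurrence of the encoder's original SPS/VPS bytes; `Replace` and `apply_replacements` are hypothetical names, and Apollo's actual search strategy is not described in this section. A pattern that is not found is skipped rather than treated as an error.

```rust
/// One queued header substitution: the bytes the encoder emitted (`old`)
/// and the rewritten SPS/VPS that Moonlight can parse (`new`).
pub struct Replace {
    pub old: Vec<u8>,
    pub new: Vec<u8>,
}

/// Index of the first occurrence of `needle` in `hay`, if any.
fn find(hay: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, and a longer needle can never match.
    if needle.is_empty() || needle.len() > hay.len() {
        return None;
    }
    hay.windows(needle.len()).position(|w| w == needle)
}

/// Applies each substitution once, in order, leaving every other byte of
/// the encoded packet untouched.
pub fn apply_replacements(packet: &[u8], reps: &[Replace]) -> Vec<u8> {
    let mut out = packet.to_vec();
    for r in reps {
        if let Some(pos) = find(&out, &r.old) {
            // Rebuild as: prefix + new header + suffix.
            let mut next = Vec::with_capacity(out.len() - r.old.len() + r.new.len());
            next.extend_from_slice(&out[..pos]);
            next.extend_from_slice(&r.new);
            next.extend_from_slice(&out[pos + r.old.len()..]);
            out = next;
        }
    }
    out
}
```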
#### Probing & capability discovery (startup)

`probe_encoders` (video.cpp:2717) walks the preference list, runs `validate_encoder` on each, and selects the first that passes — honoring a user-forced encoder and HEVC/AV1 codec-mode requirements, with ALWAYS_REPROBE making a software (last-resort) selection keep re-probing for something better.

`validate_encoder` (video.cpp:2536) actually opens each codec at 1080p60 with two probe configs (`config_max_ref_frames` with numRefFrames=1, `config_autoselect` with 0) to learn REF_FRAMES_RESTRICT, then probes HEVC/AV1, then HDR (1080p Main10) and YUV444 — setting the per-codec capability bitset (PASSED, DYNAMIC_RANGE, YUV444, VUI_PARAMETERS) consumed later by make_avcodec_encode_session. `validate_config` (video.cpp:2484) actually encodes one IDR frame and runs cbs::validate_sps to detect whether the encoder emits VUI (deciding the inject path).

**Hard-won lessons (edge cases handled)**

- B-frames are never used (max_b_frames=0) because they delay decoder output — a hard latency rule, set in both back-ends (video.cpp:1600; nvenc_base.cpp implicitly via frameIntervalP=1).
- x265's Info SEI is so long it pushes the IDR picture into the 2nd packet and breaks Moonlight's parser → software HEVC forces 'info=0:keyint=-1' x265-params; libx265 also ignores gop_size so keyint must be set manually (video.cpp:885-901).
- AMD (and VAAPI) encoders omit the SPS VUI parameters (an H.264 AMF/VAAPI quirk); this is captured by the VUI_PARAMETERS capability flag, which stays unset when the probe finds no VUI and so triggers SPS rewriting (video.h:133; video.cpp:2706-2711). A FORCE_VIDEO_HEADER_REPLACE flag can force rewriting even when VUI is present.
- QSV must simulate CBR via VBR: CBR_WITH_VBR sets bit_rate = rc_max_rate-1 to force libavcodec out of true CBR, and RELAXED_COMPLIANCE (FF_COMPLIANCE_UNOFFICIAL) + NO_RC_BUF_LIMIT are needed for it to open at all (video.cpp:730,1760-1769).
- Old/low-end Intel iGPUs: SINGLE_SLICE_ONLY forces slices=1 'because older Intel iGPUs ruin it for everyone else'; H.264 QSV low_power must fall back to 0 on some GPUs (fallback_options, video.cpp:726).
- NVENC can hang on teardown: ASYNC_TEARDOWN moves session.reset() to a detached thread so streaming can restart immediately even if the NVENC driver thread never returns (video.cpp:1926-1941).
- RFI needs a deliberately larger DPB than the 1-ref L0 limit: NVENC pins numRefL0 to NV_ENC_NUM_REF_FRAMES_1 but keeps maxNumRefFramesInDPB at the requested/default count so invalidated frames have older fallbacks; if the invalidation range >= DPB size it must fall back to IDR (nvenc_base.cpp:268-281,597-600).
- Windows 11 24H2 can report zero active display devices; probing encoders in that state locks/breaks DXGI, so allow_encoder_probing requires >=1 active device (or a temp virtual display) before probing (video.cpp:58-82).
- VAAPI dislikes a truly infinite GOP, so LIMITED_GOP_SIZE caps gop_size at INT16_MAX instead of INT_MAX; it also needs idr_interval=INT_MAX and sei=0 set as options (video.cpp:1603-1605,933-963).
- 10-bit H.264 is unsupported by the streaming protocol — asserted, not just rejected (video.cpp:1579). HEVC uses the same RExt profile for both 8- and 10-bit YUV444; AV1 uses Main for 4:2:0 and High for 4:4:4 (video.cpp:1584-1595).
- Frame pacing: encode_run drops frames arriving faster than ~3/4 of the target interval and snaps near-on-time frames to the expected timestamp, while still encoding a minimum-FPS frame on static content to avoid quality decay (video.cpp:2021-2055).
- Display reinit must wait for the display shared_ptr's use_count to fall to 1 before re-creating it — capture surfaces hold references and some backends allow only one session per device; images are dropped on the capture thread (not the encoder thread) to avoid a race that frees a good frame after reinit (video.cpp:1344-1371).
- libsvtav1 is intentionally disabled by default (ENABLE_BROKEN_AV1_ENCODER) because on-demand IDR is broken and real-time perf is poor; svtav1-params force keyint=-1:pred-struct=1:force-key-frames=1:mbr=0 to work around an FFmpeg CBR bug (video.cpp:861-882).
- The D3D11VA hwdevice context for Windows encoders installs no-op lock/unlock callbacks and AddRefs the caller's ID3D11Device (video.cpp:2967-2991) — the device is shared with the capture/zero-copy path rather than created anew, which is the zero-copy seam to preserve.

*External deps:* FFmpeg libavcodec/libavutil/libswscale (avcodec_send_frame/receive_packet, AVHWFramesContext, sws_scale_frame, mastering-display/content-light side data), FFmpeg Coded Bitstream API (cbs_h264.h/cbs_h265.h, `ff_cbs_*` — used to parse/rewrite SPS/VPS VUI), NVIDIA Video Codec SDK / nvEncodeAPI v12 (`NV_ENC_*` structs, presets P1-P7, RFI, intra-refresh), D3D11VA (AVD3D11VADeviceContext, hwcontext_d3d11va.h) on Windows; CUDA hwdevice on Linux NVENC; VAAPI on Linux; VideoToolbox on macOS, Boost (pointer_cast, BOOST_LOG), Apollo internal: platform/common (display_t, capture, encode_device_t, img_t), display_device (enumerate/map_output_name), config, sync/safe (mail/queue/signal), platform/windows/virtual_display (SudoVDA, the same IDD punktfunk ships)

### Apollo — Audio capture, encode, transport (Windows host)

**Purpose.** How Apollo (a Sunshine fork, Moonlight-compatible HOST) captures host audio, encodes it with Opus, and ships it over the GameStream RTP/FEC transport to a Moonlight client. The cross-platform glue lives in src/audio.cpp; the Windows-specific capture/sink-management lives in src/platform/windows/audio.cpp (WASAPI loopback). This map is for a punktfunk agent who wants to harvest the battle-tested Windows audio behaviors (loopback capture format negotiation, default-device-change handling, Steam Streaming Speakers virtual sink, MMCSS thread priority, sink save/restore) for punktfunk's own GameStream-compat host.
Note punktfunk's NATIVE protocol uses Opus over QUIC datagrams (0xC9) — totally different transport — but the CAPTURE + ENCODE half of this map is directly relevant to both punktfunk host paths.

#### Three layers, two threads per session

##### Layer 1 — Cross-platform orchestration (src/audio.cpp, 331 LOC)

`audio::capture(mail, config, channel_data)` is the entry point spawned per streaming session (src/audio.cpp:130). It runs on the **capture thread** and itself spawns one **encode thread**:

```
session start
  audio::capture()   [capture thread, priority=critical, src/audio.cpp:208]
    |- get_audio_ctx_ref() -> lazily-inited shared audio_ctx_t          (src/audio.cpp:253)
    |- pick sink (virtual > config > host)                              (src/audio.cpp:160-185)
    |- control->set_sink(sink)  (only the FIRST session may switch)     (src/audio.cpp:188-196)
    |- control->microphone(mapping, channels, 48000, frame_size) -> mic_t (src/audio.cpp:199)
    |- spawn std::thread{encodeThread, samples_queue, ...}              (src/audio.cpp:211)
    \- loop: mic->sample(buf) -> samples->raise(buf)                    (src/audio.cpp:222-250)

  encodeThread()     [encode thread, priority=high, src/audio.cpp:86]
    |- opus_multistream_encoder_create(..., OPUS_APPLICATION_RESTRICTED_LOWDELAY) (src/audio.cpp:96)
    |- OPUS_SET_BITRATE + OPUS_SET_VBR(0)                               (src/audio.cpp:106-107)
    \- loop: samples->pop() -> opus_multistream_encode_float() -> packets->raise() (src/audio.cpp:114-127)
```

The two threads communicate via `safe::queue_t>` of depth 30 (src/audio.cpp:210). Encoded Opus packets are pushed into the **`mail::audio_packets`** mailbox queue (src/audio.cpp:87,126) — a global, decoupling encode from transport.

##### Layer 2 — Transport (src/stream.cpp, `audioBroadcastThread`, 1651-1753)

A **third thread**, started once in `start_broadcast` (src/stream.cpp:1807), drains `mail::audio_packets` and turns each Opus packet into a GameStream RTP/FEC datagram. This is shared across platforms.
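
The capture/encode split of Layer 1 above maps naturally onto a bounded channel in Rust. A minimal sketch under stated assumptions: `std::sync::mpsc::sync_channel` stands in for Apollo's depth-30 sample queue, caller-supplied closures stand in for WASAPI capture and the Opus encoder, and `run_audio_pipeline` is a hypothetical helper that returns packets instead of posting them to a mailbox. One deliberate design choice: the capture loop uses `try_send` and drops a frame when the queue is full instead of blocking, so a stalled encoder cannot back up the real-time capture thread (Apollo's own overflow policy is not described in this section).

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Depth of the capture -> encode queue (src/audio.cpp:210 uses 30).
const QUEUE_DEPTH: usize = 30;

/// Runs the capture loop on the calling thread and the encode loop on a
/// spawned thread. `capture` yields one frame of interleaved f32 samples per
/// call (None = stop); `encode` turns one frame into one packet.
/// Returns (encoded packets, frames dropped because the queue was full).
pub fn run_audio_pipeline<C, E>(mut capture: C, mut encode: E) -> (Vec<Vec<u8>>, usize)
where
    C: FnMut() -> Option<Vec<f32>>,
    E: FnMut(&[f32]) -> Vec<u8> + Send + 'static,
{
    let (tx, rx) = sync_channel::<Vec<f32>>(QUEUE_DEPTH);

    // Encode thread: drains the queue until every sender is dropped.
    let encoder = thread::spawn(move || {
        let mut packets = Vec::new();
        for frame in rx {
            packets.push(encode(&frame[..]));
        }
        packets
    });

    // Capture loop: never blocks on a slow encoder; drops the frame instead.
    let mut dropped = 0;
    while let Some(frame) = capture() {
        match tx.try_send(frame) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break, // encode thread is gone
        }
    }
    drop(tx); // closes the channel so the encode loop ends
    let packets = encoder.join().expect("encode thread panicked");
    (packets, dropped)
}
```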
##### Layer 3 — Windows platform impl (src/platform/windows/audio.cpp, 1196 LOC)

Implements `platf::mic_t` (WASAPI loopback capture, class `mic_wasapi_t`, src/platform/windows/audio.cpp:427) and `platf::audio_control_t` (sink enumeration / switching / virtual-sink management, src/platform/windows/audio.cpp:685). `platf::audio_control()` (1162) is the factory called by `start_audio_control` in Layer 1.

#### Control-flow contract (platform interface, src/platform/common.h)

- `capture_e` enum {ok, reinit, timeout, interrupted, error} (common.h:457) is the signaling protocol between `mic->sample()` and the capture loop. `reinit` triggers full mic teardown + recreate (src/audio.cpp:232-244); `timeout` just continues; anything else terminates audio (the stream survives without sound).
- `mic_t::sample(vector&)` (common.h:550) fills exactly one frame of interleaved float samples.
- `audio_control_t` (common.h:555): `sink_info()`, `microphone(mapping, channels, rate, frame_size)`, `set_sink()`, `is_sink_available()`.
- `sink_t` (common.h:387) has `.host` (current default device id) and optional `.null` { stereo / surround51 / surround71 } — the virtual sink device ids per channel layout. On Windows `.null` is only populated when a virtual sink (Steam Streaming Speakers or user-configured) is found.

#### Format negotiation (the heart of the Windows path)

`make_audio_client` (src/platform/windows/audio.cpp:279): WASAPI loopback `IAudioClient` opened in SHARED mode with `AUDCLNT_STREAMFLAGS_LOOPBACK | EVENTCALLBACK | AUTOCONVERTPCM | SRC_DEFAULT_QUALITY` (320-321) — Windows resamples to 48 kHz for us, so capture is always float32 @ 48 kHz regardless of the device's native rate (313-315). Capture format is always f32 (line 294); only the **channel mask** is borrowed from the device's mix format when channel counts match (305-311), to preserve the host's real speaker layout. The `formats[]` table (258-277) maps 2/6/8 channels to Stereo/5.1/7.1 with the GameStream-required masks.
`init()` (450) iterates formats, picks the one whose channel count matches the requested output, then sizes `sample_buf` to `max(device_buffer, frame_size)*2*channels` (521) — the `*2` lets a frame straddle two WASAPI buffers.

**Hard-won lessons (edge cases handled)**

- Audio init failure must NOT kill the video stream: a fail_guard waits on shutdown_event so the session continues silently if audio can't init (src/audio.cpp:146-153, 213-218). punktfunk should adopt this — audio is best-effort.
- Sample buffer must be sized for a frame straddling two WASAPI buffers: `sample_buf = max(device_frames, frame_size) * 2 * channels` (src/platform/windows/audio.cpp:521). The mic_t::sample refill loop (429-448) carries leftover samples to the front of the buffer between calls; overflow is logged, not fatal (639-641).
- WASAPI silence flag must be honored explicitly: AUDCLNT_BUFFERFLAGS_SILENT means GetBuffer returns a buffer you must treat as zeros, NOT copy (src/platform/windows/audio.cpp:643-647). Missing this leaks stale/garbage audio during silent passages.
- Data-discontinuity flag is detected and logged (src/platform/windows/audio.cpp:632-634) — glitch diagnosis hook.
- Switching a virtual sink's bit depth glitches some users: set_format reads the current default device's bits-per-sample and only picks a virtual waveformat that matches it (src/platform/windows/audio.cpp:813-840) — '16->24 bit known to cause glitches'.
- Stereo virtual formats deliberately deprioritize 32-bit: using a 32-bit format disables Dolby/DTS spatial audio if the user enabled it on the Steam speaker (src/platform/windows/audio.cpp:133-141) — the waveformat preference list ordering encodes this.
- IPolicyConfig is an UNDOCUMENTED Windows API (PolicyConfig.h, included last, src/platform/windows/audio.cpp:26). SetDeviceFormat/SetDefaultEndpoint/SetEndpointVisibility have no official contract — Apollo copies args before passing (834-836). A pure-Rust port must reimplement this COM vtable by hand.
- MinGW's libnewdev.a lacks DiInstallDriverW despite the header declaring it — loaded at runtime via LoadLibraryExW/GetProcAddress (src/platform/windows/audio.cpp:1052-1065). Toolchain landmine for any cross-compile.
- SetDefaultEndpoint is applied across ALL ERole values in a loop (eConsole/eMultimedia/eCommunications), counting failures (src/platform/windows/audio.cpp:860-872) — partial failure is tolerated.
- On Apollo STARTUP, if Steam Streaming Speakers are currently the default device, it forcibly picks a different default by toggling endpoint visibility off/on so the OS reselects (reset_default_device, src/platform/windows/audio.cpp:987-1042; called from platf::init, 1191). Avoids the host's own virtual device being a confusing leftover default.
- After installing the Steam driver, Apollo calls Sleep(5000) before touching the audio subsystem again (src/platform/windows/audio.cpp:1078) and restores the prior default device (1082-1089) — driver install races the audio stack.
- Sink priority order is virtual > config-specified > host, but Apollo ONLY uses the virtual sink when host audio is disabled or there's no other sink (src/audio.cpp:160-183) — so a user with real speakers and host_audio on still hears it.
- FEC parity matrix mismatch: Apollo's own Reed-Solomon produces a different matrix than Nvidia's, so for the fixed RTPA(4,2) audio case the parity matrix is memcpy'd from 8 hardcoded OpenFEC-derived bytes (src/stream.cpp:1659-1665). Pure interop hack — only works because shard counts are constant.
- GFE advertises an incorrect channel mapping for normal-quality surround in SDP, so rtsp.cpp rotates the mapping (std::rotate) for non-high-quality configs (src/rtsp.cpp:839-847) — wire-compat quirk to replicate for Moonlight surround.
- Video bitrate is reduced to budget for audio: high-quality=256 vs normal=96 kbps per channel, capped at 20% reduction (src/rtsp.cpp:1122-1131).
  The comment at src/audio.cpp:33-34 warns these bitrate constants are duplicated in rtsp_stream::cmd_announce() and must be kept in sync — a footgun.

*External deps:* libopus (opus/opus_multistream.h) — OpusMSEncoder multistream LOWDELAY CBR encoder, Windows WASAPI: Audioclient.h, mmdeviceapi.h (IAudioClient loopback, IAudioCaptureClient, IMMDeviceEnumerator, IMMNotificationClient), avrt.h — AvSetMmThreadCharacteristics MMCSS 'Pro Audio', PolicyConfig.h (undocumented IPolicyConfig: SetDeviceFormat/SetDefaultEndpoint/SetEndpointVisibility), newdev.dll DiInstallDriverW (runtime-loaded) — Steam Streaming Speakers driver install, Steam Streaming Speakers virtual audio driver (SteamStreamingSpeakers.inf, ships with Steam), moonlight-common-c (Limelight-internal.h): RTP_PACKET, AUDIO_FEC_HEADER, RTPA_DATA/FEC/TOTAL_SHARDS, reed_solomon, Boost.Asio udp::socket (transport sockets in stream.cpp), Apollo internal: safe::queue_t/mail mailbox, util::safe_ptr RAII, crypto::cipher::cbc_t (AES-CBC)

### Apollo (Sunshine fork) — Input handling & injection

**Purpose.** Apollo's host-side input subsystem receives Moonlight/GameStream input packets over the encrypted control stream and replays them as synthetic OS input on Windows: mouse (relative + absolute), keyboard (VK→scancode), Unicode text, vertical/horizontal scroll, touch, pen, and gamepads (Xbox 360 + DualShock 4 via ViGEm, including rumble, RGB LED, motion, touchpad, and battery feedback). It is split into a platform-agnostic protocol/state layer (src/input.cpp) and a Windows injection backend (src/platform/windows/input.cpp), bridged by the platf:: abstraction declared in src/platform/common.h. It is directly relevant to punktfunk's Windows host (SendInput + ViGEm path), the focus of this document.

#### Control flow (top to bottom)

1. **Ingest (control-stream thread).** `input::passthrough(input, vector&&, const crypto::PERM&)` at src/input.cpp:1579 is the entry point.
   It (a) drops the packet if the stream lacks `_all_inputs`; (b) if the stream has *partial* input permission, peeks the packet `magic` and gates by category — controller/mouse/keyboard/touch/pen each map to a distinct `crypto::PERM` bit (src/input.cpp:1591-1636); (c) pushes the raw bytes onto `input->input_queue` under `input_queue_lock` and schedules `passthrough_next_message` on the shared `task_pool` (src/input.cpp:1639-1643). The control thread never touches the OS — it just enqueues.
2. **Batch + dispatch (task-pool thread).** `passthrough_next_message` (src/input.cpp:1481) locks the queue, pops the front entry, then walks the remaining queue trying to `batch()` later same-type packets into the popped one (src/input.cpp:1502-1519). **The lock is released before injection** (comment at 1486-1488) so the control thread is never blocked behind a slow OS call. It then switches on `magic` and calls the typed `passthrough(input, PNV_*_PACKET)` overload (src/input.cpp:1526-1571).
3. **Per-type translation (src/input.cpp).** Each `passthrough` overload checks the relevant `config::input.{mouse,keyboard,controller}` toggle, byte-swaps wire fields (`util::endian::big`), applies stateful logic, and calls the `platf::` primitive (`move_mouse`, `abs_mouse`, `button_mouse`, `keyboard_update`, `scroll`/`hscroll`, `unicode`, `gamepad_update`, `touch_update`, `pen_update`, `gamepad_*`).
4. **OS injection (src/platform/windows/input.cpp).** The `platf::` primitives build Win32 `INPUT` structs and call the central `send_input()` (line 477), or drive ViGEm / synthetic-pointer APIs. `send_input` retries once after `syncThreadDesktop()` if the active input desktop changed.

#### State ownership

- **`input_t`** (src/input.cpp:155) is per-stream: shortcut-modifier flags, `std::vector gamepads`, the `client_input_t` pointer (per-client pen/touch device context), the input queue+lock, the mouse-left-button delay task id, the touch_port, and vscroll/hscroll accumulators.
- **File-static globals** (src/input.cpp:111-116): `key_press` (kpid→bool down-state map), `mouse_press[5]`, `gamepadMask` bitset, and the single `platf_input` backend handle. These are process-wide (one active stream assumed for keyboard/mouse).
- **`vigem_t`** (src/platform/windows/input.cpp:198) owns the ViGEm client connection (lazily connected on first gamepad, disconnected when the last detaches — lines 232-242, 308-320) and a `vector` of size MAX_GAMEPADS. Each `gamepad_context_t` holds the target handle, the `union { XUSB_REPORT x360; DS4_REPORT_EX ds4; }`, pointer-id→index map for touchpad, feedback queue, last rumble/LED (for de-dup), and the DS4 timestamp-resend `repeat_task`.
- **`client_input_raw_t`** (src/platform/windows/input.cpp:663) is per-client: separate synthetic pen + touch device handles, the 10-slot `touchInfo[]` array, active-slot count, and the pen/touch refresh tasks — so multiple clients inject pen/touch independently.

**Key techniques**

- **Decouple ingest from injection via task-pool queue with lock-then-release batching** — The control-stream thread only enqueues bytes and schedules a task (src/input.cpp:1639-1643). A pool thread pops one packet, coalesces later same-type packets into it while holding the queue lock, then RELEASES the lock before the (potentially slow) SendInput/ViGEm call (src/input.cpp:1486-1520). — _For a low-latency streaming host this is the core anti-head-of-line-blocking pattern: a slow OS input call (e.g. SendInput crossing a desktop switch) never stalls the network/control thread, and bursts of mouse/scroll/controller packets collapse to one OS event per drain.
  punktfunk should mirror this: never call SendInput on the QUIC/control thread._
- **Type-aware packet batching with batch_result_e (batched / not_batchable / terminate_batch)** — batch() overloads (src/input.cpp:1208-1475) sum relative-mouse deltas and scroll amounts (with `__builtin_add_overflow` guards that terminate the batch on 16-bit overflow), take the latest absolute position, and collapse controller/touch/pen move/hover runs. terminate_batch stops at a state-changing event (button change, eventType change, active-mask change) so ordering semantics are preserved; not_batchable skips a non-matching controller but keeps scanning. — _Moonlight 'spams controller packets even when not necessary' (src/input.cpp:282). Batching cuts injected-event count under load without dropping state transitions — directly reduces input-to-screen jitter and OS overhead._
- **VK→scancode injection with normalization fallback ladder** — keyboard_update (src/platform/windows/input.cpp:608) prefers KEYEVENTF_SCANCODE using the static US-English VK_TO_SCANCODE_MAP (keylayout.h). If the client flagged the VK as non-normalized (SS_KBE_FLAG_NON_NORMALIZED) it falls back to MapVirtualKey under config::input.always_send_scancodes (excluding VK_LWIN/RWIN/PAUSE which misbehave), else sends a raw VK event. A curated switch adds KEYEVENTF_EXTENDEDKEY for the extended-key set (arrows, nav cluster, RWIN/RMENU/RCONTROL, numpad divide, apps). — _Many games read DirectInput/raw scancodes, not VK events; sending scancodes is essential for in-game key compatibility. The extended-key flag is required or arrow keys / right-modifiers misfire.
This is a concrete table+logic punktfunk's Windows VK path can adopt verbatim._ - **Desktop-switch retry on every SendInput / InjectSyntheticPointerInput** — send_input (src/platform/windows/input.cpp:477) and inject_synthetic_pointer_input (line 499) retry once after calling syncThreadDesktop() (misc.cpp:251 — OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) when the call fails and the input desktop handle changed, tracked in a thread_local _lastKnownInputDesktop. — _On Windows the input desktop changes on UAC prompts, lock screen, and Ctrl+Alt+Del (secure desktop / Winlogon). Without re-binding the thread to the new desktop, all injected input silently fails. This is exactly the secure-desktop problem area called out in punktfunk's docs/memory — Apollo solves it cheaply per-call rather than with a second process._ - **ViGEm dual-target gamepad with client-negotiated type selection** — alloc_gamepad (src/platform/windows/input.cpp:1175) picks X360 vs DS4 by precedence: explicit config (x360/ds4) > client-reported LI_CTYPE_PS/XBOX > motion_as_ds4 if accel/gyro present > touchpad_as_ds4 > default X360. It warns when capabilities (motion/touchpad/RGB) will be lost on X360. DS4 path packs motion, touchpad, and battery into DS4_REPORT_EX. — _DS4 is the only ViGEm target that carries gyro/accel, touchpad, and lightbar; X360 is the safe default. punktfunk already does client-negotiated pad type — Apollo's capability-driven auto-selection (motion/touchpad presence → DS4) and the explicit 'feature will be lost' warnings are a more refined policy worth porting._ - **DS4 timestamped resend loop (ds4_update_ts_and_send)** — Every DS4 report advances wTimestamp by elapsed time in 5.333µs units and re-arms a 100ms repeat_task (src/platform/windows/input.cpp:1454-1481), so the 16-bit timestamp never stalls/overflows even when no new input arrives. — _'Some applications require updated timestamp values to register DS4 input' (line 1450). 
Without the heartbeat, motion-aware games ignore a held DS4. Non-obvious gotcha that any DS4-emulating host must replicate._ - **Synthetic pen/touch via InjectSyntheticPointerInput with periodic refresh and slot compaction** — Per-client synthetic pointer devices (CreateSyntheticPointerDevice, Win10 1809+). Touch slots are kept contiguous via perform_touch_compaction (line 715, required by the API), edge-triggered flags (DOWN/UP/CANCELED/UPDATE) are cleared after each frame (line 900/1020), and a 50ms repeat task (ISPI_REPEAT_INTERVAL) re-injects held state because Windows auto-cancels untouched interactions after ~1s. — _Touch/pen are stateful, slot-indexed, and self-cancelling — a fundamentally different injection model than mouse/keyboard. If punktfunk grows touch/pen, this is the reference for the Windows-specific contiguity + refresh requirements._ **Hard-won lessons (edge cases handled)** - Absolute-mouse left/right click ordering: when Moonlight sends absolute coords, a RIGHT-down can arrive immediately after a LEFT-up, causing a stray left-click on browser hyperlinks before the right-click. Apollo delays BUTTON_LEFT release by 10ms ONLY in absolute-mouse mode (DISABLE_LEFT_BUTTON_DELAY sentinel flips to ENABLE on the next abs move), and if a RIGHT-down arrives during the window it synthesizes right down+up immediately so right beats the delayed left release (src/input.cpp:524-614, 558-617). Relative mouse never gets the delay (gaming must stay zero-latency). - Synthetic modifier injection: a keyboard packet can request SHIFT/CTRL/ALT modifiers that aren't actually held. For non-modifier keys, Apollo presses the missing modifier, sends the key, then releases the modifier — only the ones not already in shortcutFlags (src/input.cpp:680-708, 730-743). Prevents capital letters / shortcuts breaking when the client's modifier state diverges from the host. 
- Gamepad BACK→HOME emulation with state reconciliation: holding BACK for back_button_timeout synthesizes a Guide/HOME press (force BACK up, press HOME, sleep 100ms, release HOME) on a delayed task (src/input.cpp:1155-1189). Because this desyncs host state from Moonlight, gamepad_t.back_button_state (UP/DOWN/NONE) FORCES the BACK bit in subsequent reports until the client's reported state matches again (src/input.cpp:1135-1150); see the long comment at src/input.cpp:147-152.
- Low-res scroll accumulation: when high_resolution_scrolling is off, fractional wheel deltas are accumulated and only whole WHEEL_DELTA (120) ticks are emitted, with the remainder carried over (src/input.cpp:787-820). This prevents lost or jittery scrolling from sub-tick deltas.
- Legacy gamepad arrival: old Moonlight clients never send SS_CONTROLLER_ARRIVAL, so the multi-controller packet itself lazily allocates a pad when activeGamepadMask sets its bit and frees it when the bit is cleared (src/input.cpp:1094-1113). New clients use the explicit arrival packet (src/input.cpp:837).
- The touch-port coordinate transform handles offset/scaling, clamps to the surface, and bails with an 'early absolute input without a touch port' log if the touch-port event hasn't arrived yet (src/input.cpp:460-484); contact area uses polar→cartesian scaling with a 45° assumption when rotation is LI_ROT_UNKNOWN (src/input.cpp:506-517, mirrored in windows/input.cpp:980-1000).
- reset() on stream teardown synchronously releases every still-down mouse button and key via the task_pool (src/input.cpp:1646-1668) and cancels repeat/delay tasks, so a dropped client never leaves keys or buttons stuck down on the host.
- DS4 motion requires inverting ViGEmBus's own calibration: raw m/s² and deg/s values are converted to DS4 scale, then run through APPLY_CALIBRATION with hardcoded per-axis bias/scale constants measured against ViGEmBus (src/platform/windows/input.cpp:95-99, 143-196), plus int16 clamping. Naive scaling produces wrong motion in games.
- get_mouse_loc is unimplemented on Windows and throws (src/platform/windows/input.cpp:547-548), a known gap (TODO): relative-mouse capture-back features that need the cursor position aren't supported on the Windows host.
- ViGEm absence is surfaced as a fatal log line at the startup probe (src/platform/windows/input.cpp:200-210) specifically so the web UI shows 'install ViGEmBus' before the user ever tries to stream, and supported_gamepads() reports the 'gamepads.vigem-not-available' reason (line 1728).

*External deps:* ViGEm/Client.h (ViGEmBus), the virtual Xbox 360/DS4 gamepad bus driver, which must be installed on the host; Win32 user32: SendInput, CreateSyntheticPointerDevice/InjectSyntheticPointerInput/DestroySyntheticPointerDevice (via GetProcAddress, Win10 1809+), OpenInputDesktop/SetThreadDesktop, MapVirtualKey, MultiByteToWideChar; moonlight-common-c (Input.h/Limelight.h): packet structs (PNV_*, PSS_*) and LI_* enums for the GameStream input wire format; boost::endian: netfloat little-endian loads for touch/pen/motion floats; Apollo internals: thread_pool (task_pool), safe::mail event/queue, crypto::PERM permission bits, config::input.* toggles.

### Apollo: App/process launch & display configuration (Windows host)

**Purpose.** When a Moonlight client launches an app, Apollo (a Sunshine fork) must, in one synchronous flow: pick or derive the target output; optionally spin up a SudoVDA virtual display at the client's WxH@Hz; push display settings (resolution/refresh/HDR) through a separate display-device library with retry+revert semantics; run the app's prep/detached/main commands with the right user token and environment; track the process lifetime; and tear everything back down on disconnect, reverting display state and removing the virtual display. process.cpp owns the orchestration and app-catalog parsing; display_device.cpp is a thin, retry-scheduled adapter over the bundled libdisplaydevice settings manager.
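The client's WxH@Hz is not used verbatim: step 2 of execute() derives an even render size from a scale factor, and step 5 normalizes fps to milli-fps. The Rust sketch below restates those rules for a possible punktfunk port. The function names and the `Option`-for-unset modelling are hypothetical, and integer truncation in the scale step (plus treating an fps of 0 as unset) are assumptions; Apollo's exact rounding is not documented here.

```rust
// Hypothetical helpers mirroring Apollo's execute() launch-parameter rules.
// Names and the Option-for-unset modelling are illustrative assumptions.

/// Step 2: render size = client size * scale / 100, forced even with `& !1`.
/// An app-level scale factor overrides the session's. Integer truncation is
/// an assumption; Apollo's exact rounding may differ.
pub fn render_size(
    client: Option<(u32, u32)>,
    session_scale_pct: u32,
    app_scale_pct: Option<u32>,
) -> (u32, u32) {
    let (w, h) = client.unwrap_or((1920, 1080)); // documented default size
    let scale = app_scale_pct.unwrap_or(session_scale_pct) as u64;
    // Clearing bit 0 rounds down to an even dimension.
    let scaled = |v: u32| ((v as u64 * scale / 100) as u32) & !1;
    (scaled(w), scaled(h))
}

/// Step 5: fps is carried as milli-fps. Unset (or 0, an assumption) => 60000;
/// values below 1000 are treated as plain Hz; `double_refreshrate` doubles it.
pub fn normalize_milli_fps(requested: Option<u32>, double_refreshrate: bool) -> u32 {
    let mut milli = requested.filter(|&f| f != 0).unwrap_or(60_000);
    if milli < 1000 {
        milli *= 1000;
    }
    if double_refreshrate {
        milli = milli.saturating_mul(2);
    }
    milli
}

/// Env exports divide by 1000 and format with three decimals ("%.3f").
pub fn fps_env_value(milli_fps: u32) -> String {
    format!("{:.3}", milli_fps as f64 / 1000.0)
}
```

For example, a 1366x768 client at a 75% app scale renders at 1024x576, and a client asking for 60 Hz ends up at 60000 milli-fps, exported as `60.000`. Keeping milli-fps as the single internal unit avoids the Hz/milli-fps mix-ups the lessons below warn about.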
#### Two layers

**Layer 1 — `proc::proc_t` (process.cpp / process.h): the session app manager.** A single global `proc::proc` (process.cpp:55) holds the currently running app. The state machine is `execute()` -> `running()` (polled) -> `pause()`/`resume()` (client disconnect/reconnect) -> `terminate()`.

**Layer 2 — `display_device::*` (display_device.cpp): the display settings adapter.** A file-scoped singleton `DD_DATA` (display_device.cpp:37-41) holds a mutex + a `RetryScheduler`. On Windows the manager is the bundled libdisplaydevice `SettingsManager` (WinDisplayDevice over WinApiLayer) (display_device.cpp:615-630). On non-Windows `make_settings_manager` returns nullptr and every public function becomes a safe no-op (the `if (!DD_DATA.sm_instance) return;` guard appears in every entry point).

#### execute() control flow (process.cpp:178-561) — the load-bearing path

1. **Clean slate** — `terminate(false,false)` runs first (process.cpp:183-185); the input-only path sleeps an extra 1s.
2. **Scale-factor render-size derivation** (193-216): client WxH (default 1920x1080) -> `render = client * scale / 100`, then `&= ~1` to force even dimensions ("Most odd resolutions won't work well"). An app-level scale_factor overrides the session's. The render size is written back into `launch_session->width/height`.
3. **fail_guard armed** (219-224): on any early return it restores `config::video.output_name`, calls `terminate()`, and calls `display_device::revert_configuration()`. It is disarmed only at line 554, after success.
4. **Per-app gamepad override** (226-234): snapshots `config::input` into `_saved_input_config` and overrides controller enable/type; restored in terminate().
5. **[WIN] Virtual display decision** (236-335): a virtual display is created if `headless_mode || launch_session->virtual_display || _app.virtual_display || !video::allow_encoder_probing()` (the last term means no active display is present). The SudoVDA driver is lazily (re)initialized. A **stable per-device GUID** is derived: with `use_app_identity`, from the app name+uuid, optionally XOR-mixed with the client's `unique_id` for `per_client_app_identity` (264-268); otherwise from the client's identity. It then calls `VDISPLAY::createVirtualDisplay(uuid, name, w, h, fps, guid)`. fps defaults to 60000 (milli-fps); values `<1000` are treated as Hz and multiplied by 1000; `double_refreshrate` doubles the result (281-289). On success it sets `config::video.output_name = display_device::map_display_name(this->display_name)`, i.e. it *retargets the capture/encoder to the brand-new virtual output*, ignoring the user's configured display (322-327).
6. **`display_device::configure_display(config::video, *launch_session)`** (337) pushes resolution/refresh/HDR through Layer 2. With a virtual display it then calls `reset_persistence()` (341-343) because Windows already owns the VD's lifetime.
7. **Encoder re-probe** (355-361): only when `session_count()==0`; if `video::probe_encoders()` fails, the launch returns HTTP 503 unless `ignore_encoder_probe_failure` is set. Per the code comment, the GPU may have changed via hotplug, a driver crash, or a primary-monitor change.
8. **Environment** (368-413): SUNSHINE_* (compat) + APOLLO_* vars: app id/name/uuid, client uuid/name, render vs logical size, fps (milli-fps -> "%.3f"), HDR, gcmap, host-audio, sops, and audio config (2.0/5.1/7.1 from `surround_info & 65535`). `APOLLO_APP_STATUS` is mutated STARTING->RUNNING->RESUMING/PAUSING->TERMINATING through the lifecycle and read by user scripts.
9. **Command execution** (430-494): **prep_cmds** run sequentially and synchronously (`child.wait()`); a nonzero exit aborts the launch with -1, with one deliberate carve-out: a permission_denied on an empty-cmd desktop session is tolerated so a just-booted, not-yet-logged-in machine can still stream (451-456). **detached** cmds are spawned and detached (failures only warn). **`_app.cmd`** is the long-lived process launched into `_process`/`_process_group`; an empty cmd => `placebo=true` (= "show desktop"). All commands go through `platf::run_command(elevated, interactive=true, cmd, working_dir, _env, _pipe, ec, group)`.
10. **[WIN] Detached HDR-fix thread** (499-551): with exponential backoff (200ms..2s) it waits until `!is_changing_settings_going_to_fail()` and the display_name is known, then in `automatic` mode forces HDR OFF then ON (or just toggles OFF/ON if it was already on). This works around Windows failing to set HDR right after a display connect/mode change.

#### terminate() teardown (process.cpp:706-815)

Graceful process-group exit (terminate_process_group, see lessons) -> reset child/group -> run **undo** cmds in *reverse* prep order (719-742) -> [WIN] revert HDR to `initial_hdr` on `mode_changed_display` -> remove the virtual display by its tracked GUID (`removeVirtualDisplay(_launch_session->display_guid)`) -> branch: if a virtual display was used, call `display_device::reset_persistence()` (Windows owns it), else `display_device::revert_configuration()` -> restore `config::video.output_name` to `initial_display` -> restore `config::input` from `_saved_input_config` -> optional `refresh()`.

#### running() poll (process.cpp:563-602)

`placebo` => alive; else if `wait_all` => alive while *any* group member runs; else alive only while the *initial* `_process` runs. **auto_detach heuristic** (582-594): if the process exited 0 within 5s of launch, treat it as a detached launcher (Steam/UWP pattern) and flip to placebo. On POSIX, zombies are reaped via WNOHANG (565-571).

#### display_device::configure_display chain

`configure_display(video_config, session)` -> `parse_configuration()` (848-875) builds a `SingleDisplayConfiguration` from `parse_device_prep_option` / `parse_resolution_option` / `parse_refresh_rate_option` / `parse_hdr_option` / `remap_display_mode_if_needed`, returning a `std::variant`. `configure_display(config)` then applies it via `DD_DATA.sm_instance->schedule(...)`, retrying only on `ApiTemporarilyUnavailable` (806-814).
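The "retry only on `ApiTemporarilyUnavailable`" rule is the part of this chain most worth porting to an off-path display or vdisplay step. Below is a minimal Rust sketch, assuming a hypothetical `ApplyError` with just a transient and a fatal variant. The 200ms doubling backoff capped at 2s is borrowed from the HDR-fix thread (step 10) and is an assumption, not RetryScheduler's documented schedule; the sleep is injected so the policy can be tested without real waits.

```rust
use std::time::Duration;

/// Hypothetical error classification for a display-settings apply call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApplyError {
    /// Transient: e.g. the CCD API is briefly unavailable after a (dis)connect.
    ApiTemporarilyUnavailable,
    /// Anything else: retrying would not help, so give up immediately.
    Fatal,
}

/// Calls `apply` until it succeeds, retrying ONLY transient failures.
/// Returns the number of attempts used on success.
/// The 200ms doubling backoff capped at 2s is an assumed schedule;
/// `sleep` is injected so tests need not actually wait.
pub fn apply_with_retry<F, S>(mut apply: F, max_attempts: u32, mut sleep: S) -> Result<u32, ApplyError>
where
    F: FnMut() -> Result<(), ApplyError>,
    S: FnMut(Duration),
{
    let mut delay = Duration::from_millis(200);
    let mut attempt = 0;
    loop {
        attempt += 1;
        match apply() {
            Ok(()) => return Ok(attempt),
            Err(ApplyError::ApiTemporarilyUnavailable) if attempt < max_attempts => {
                sleep(delay);
                delay = (delay * 2).min(Duration::from_secs(2));
            }
            // Fatal errors, or transient ones once attempts run out.
            Err(e) => return Err(e),
        }
    }
}
```

In a real host the `sleep` argument would simply be `std::thread::sleep`, called from a background thread so the per-frame path never waits on display reconfiguration.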
**Key techniques**

- **Stable per-(app,client) virtual-display GUID via UUID XOR-mix** — createVirtualDisplay is keyed by a GUID derived from the app uuid, optionally XOR-mixed with the client unique_id (process.cpp:258-279). The GUID is stored in launch_session->display_guid and is the SOLE handle used to removeVirtualDisplay on teardown (process.cpp:761). — _A reconnecting client gets the SAME virtual monitor identity, so Windows preserves per-monitor arrangement/state across sessions, and the host can deterministically reclaim the exact display it created even after a crash. punktfunk's VirtualDisplay trait could adopt a deterministic session GUID to make teardown crash-safe and reconnects stable._
- **Retarget capture/encoder onto the freshly created virtual output** — After creating the VD, execute() overwrites config::video.output_name with map_display_name(display_name) (process.cpp:322-327) and re-probes encoders (355). initial_display is saved and restored in terminate(). — _Guarantees the encoder binds to the right output even though the user never configured it. This is directly relevant to punktfunk's 'native client resolution, no scaling' invariant and to the Windows-host bug class where DXGI duplicates the wrong/rendering GPU's output (see MEMORY 7654b20)._
- **Synchronous, ordered prep/undo with a reverse-iterator unwind** — prep_cmds run forward with child.wait() and an iterator pair (_app_prep_begin/_app_prep_it); terminate() walks that iterator backward, running undo_cmds only for prep steps that actually ran (process.cpp:431-464, 719-742). — _Gives transactional setup/teardown semantics: if launch aborts at prep step N, only steps 0..N-1 are undone. A streaming host that mutates global display/audio state needs exactly this to avoid leaking state on partial failure._
- **RetryScheduler with stop-token + transient-vs-fatal classification** — All display mutations run on a RetryScheduler; apply retries ONLY on ApiTemporarilyUnavailable (display_device.cpp:806-814); revert retries indefinitely but is gated on device-set changes (658-700); revert can be delayed (try_indefinitely_with_delay). — _Display/CCD APIs transiently fail right after a monitor (dis)connect. Distinguishing 'retry' from 'give up', and only re-attempting revert when devices actually changed, avoids a busy spin and avoids fighting the user. punktfunk's per-frame path must stay async-free, but this off-path retry pattern is portable to its vdisplay teardown._
- **Audio-context capture/restore tied to display deactivation** — sunshine_audio_context_t (display_device.cpp:50-116) captures the default audio session before a display is deactivated and holds it (15-retry, 2s) until the sink reappears, so audio routing survives EnsureOnlyDisplay-style reconfigs. — _On Windows, deactivating a monitor can move or destroy the default audio endpoint. A streaming host that disables physical displays must preserve the audio sink or lose host audio, a concrete edge case punktfunk's Windows WASAPI path will hit._
- **HDR connect-race workaround via detached exponential-backoff thread** — After launch, a detached thread waits (200ms..2s backoff) for !is_changing_settings_going_to_fail() (session not locked + CCD API reachable), then forces HDR OFF then ON (process.cpp:499-551). — _Windows reliably fails to set HDR immediately after a display connect/mode change; the off-then-on toggle is the empirically found fix. Captured here as a hard-won fact for punktfunk's deferred HDR/10-bit work._
- **Token impersonation launch (CreateProcessAsUserW)** — platf::run_command on Windows retrieves the interactive user's (optionally elevated) token and impersonates it before CreateProcessAsUserW (misc.cpp:312, 543, 1032). — _Apollo runs as a SYSTEM service in Session 0 but must launch games into the interactive desktop session, exactly punktfunk's open Windows item ('launch in Session 1 via Interactive scheduled task', SendInput-in-session). Apollo solves it inline with token impersonation rather than a scheduled task._
- **Driver watchdog ping thread with lazy re-init** — initVDisplayDriver starts a ping thread; on watchdog failure vDisplayDriverStatus flips to WATCHDOG_FAILED and the device is closed (process.cpp:65-78). execute() and refresh() lazily re-open the driver (with up to 5 retries in refresh, process.cpp:1570-1581). — _IDD drivers can wedge; a liveness ping + lazy re-open keeps the host serviceable without a restart. punktfunk ships the SAME SudoVDA IDD, so this watchdog/re-init discipline is directly transferable._

**Hard-won lessons (edge cases handled)**

- terminate_process_group (process.cpp:92-128): boost group::wait_for() is 'broken and deprecated', so it uses a 1s polling loop with platf::process_group_running; it ALWAYS calls group.terminate()+detach() even after a clean wait, to keep boost's group state consistent with the OS, and detaches the child to avoid a zombie.
- Prep-cmd permission_denied is tolerated ONLY for an empty-cmd (desktop) app (process.cpp:451-456): this is the 'user rebooted and hasn't logged in yet' case where user impersonation fails; without this carve-out you could never stream the login desktop.
- auto_detach heuristic (process.cpp:582-594): apps that exit 0 within 5s of launch (Steam, UWP, indirect launchers) are reclassified as detached and the session goes to placebo, because their real process lives in another process group that Apollo can't track via wait_all.
- Log file opened with _wfsopen(..., L"a", _SH_DENYNO) on Windows (process.cpp:419-424): _SH_DENYNO lets a re-launched detached process reopen the same log while the previous instance still has it open; fopen() would interpret the path as ANSI, so the UTF-16 wide-char variant is required for Unicode paths.
- Even-dimension clamp: render_width/height &= ~1 after scaling (process.cpp:210-211), because odd resolutions misbehave with encoders/displays.
- fps is milli-fps everywhere: default 60000, and values <1000 are treated as Hz and multiplied by 1000 (process.cpp:281-285); env exports divide by 1000 and format with '%.3f'. Easy to get wrong when porting.
- Virtual-display teardown is keyed strictly by the tracked display_guid, and launch_session->virtual_display is set true the moment createVirtualDisplay is *attempted* (process.cpp:300-302), even if it returns an empty name, so a partially created display is still cleaned up. terminate() distinguishes 'remove failed but was never created right' from a real failure (process.cpp:763-767).
- With a virtual display, terminate() calls reset_persistence() instead of revert_configuration() (process.cpp:341-343, 774): Windows already manages the VD's lifecycle, so re-applying the saved physical-display config would be wrong.
- Encoder re-probe is gated on session_count()==0 (process.cpp:355): only the first session re-probes, because the active GPU could have changed (hotplug, driver crash, primary-monitor change). A probe failure returns HTTP 503 to the client unless explicitly ignored.
- apps.json is self-healing: parse() loops up to 3 times; on exception it forces tree["version"]=0 and re-migrates from scratch (process.cpp:1383-1412); if every attempt fails it injects a 'Desktop (fallback)' app so the host is never appless. Built-in pseudo-apps (Virtual Display [WIN only, gated on driver OK], Remote Input, Terminate) are appended programmatically, not stored.
- Legacy migration (migration_v2, process.cpp:1117-1214) coerces string 'true'/'on'/'yes' and array-wrapped legacy values into real bools/ints, because old Sunshine stored everything as strings; a port reading old configs must replicate this.
- The app id is the CRC32 of the name (+ image sha256), truncated to a positive int32 'due to client limitations' (process.cpp:1012-1041); the index is folded in only on collision, so ids stay stable across reorders.
- map_display_name does a linear scan of enumAvailableDevices matching m_display_name->m_device_id (display_device.cpp:766-781) and returns empty if nothing matches. The VD may not be enumerable immediately after creation, hence the HDR thread's separate wait-for-display loop.
- On non-Windows the entire display_device layer is a no-op (make_settings_manager returns nullptr); every public function early-returns on !sm_instance, so callers need no #ifdef. A clean pattern for punktfunk's cross-platform display layer.
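The forward-prep / reverse-undo discipline listed under Key techniques (only steps that actually ran are undone, newest first) can be sketched in Rust as follows. `PrepCmd` and `PrepRunner` are hypothetical names; closures stand in for the shell commands Apollo runs through `platf::run_command`, and, matching "if launch aborts at prep step N, only steps 0..N-1 are undone", the failing step itself is not undone.

```rust
// Hypothetical sketch of Apollo's forward-prep / reverse-undo discipline.

/// One prep step: a fallible "do" action plus an optional "undo" action.
pub struct PrepCmd<'a> {
    pub do_cmd: Box<dyn FnMut() -> Result<(), String> + 'a>,
    pub undo_cmd: Option<Box<dyn FnMut() + 'a>>,
}

/// Tracks how many prep steps completed, like Apollo's _app_prep_it iterator.
pub struct PrepRunner<'a> {
    cmds: Vec<PrepCmd<'a>>,
    ran: usize,
}

impl<'a> PrepRunner<'a> {
    pub fn new(cmds: Vec<PrepCmd<'a>>) -> Self {
        Self { cmds, ran: 0 }
    }

    /// Runs steps in order, synchronously; the first error aborts the launch.
    /// The failing step does not count as "ran", so it will not be undone.
    pub fn run_all(&mut self) -> Result<(), String> {
        while self.ran < self.cmds.len() {
            let step = &mut self.cmds[self.ran];
            (step.do_cmd)()?;
            self.ran += 1;
        }
        Ok(())
    }

    /// Number of steps that completed successfully.
    pub fn completed(&self) -> usize {
        self.ran
    }

    /// Undoes only the completed steps, newest first (reverse prep order).
    pub fn unwind(&mut self) {
        while self.ran > 0 {
            self.ran -= 1;
            if let Some(undo) = self.cmds[self.ran].undo_cmd.as_mut() {
                undo();
            }
        }
    }
}
```

The same `unwind()` serves both the failed-launch path and normal teardown, which is what makes the setup effectively transactional.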
*External deps:* boost::process v1 (child/group/environment/search_path) — process launch & grouping, boost::program_options split_winmain/split_unix — command-line tokenization for working-dir resolution, nlohmann::json — apps.json parse + migration, SudoVDA IDD driver via VDISPLAY wrapper (ddk/d4iface.h, d4drvif.h, sudovda/sudovda.h) — the SAME indirect-display driver punktfunk ships, bundled libdisplaydevice (display_device/windows/settings_manager.h, win_display_device.h, win_api_layer.h, retry_scheduler.h, file_settings_persistence.h, json.h) — Windows CCD display config, Windows CreateProcessAsUserW + token impersonation (platform/windows/misc.cpp) — launch into interactive session from SYSTEM, OpenSSL EVP/SHA — app image sha256 + app-id hashing, boost::crc — CRC32 app-id ### Apollo: Config, management/web UI, system tray **Purpose.** Apollo (C++ Sunshine fork, Moonlight-compatible host) loads and persists all host settings from a flat key=value text config + a JSON apps file, exposes a self-contained HTTPS management web UI (config editor, app/library CRUD, paired-client management, PIN pairing, logs, restart/quit) served by an embedded Simple-Web-Server, and presents a native OS system tray (icon + balloon notifications + menu) that drives the same control surface. This is the operator/management plane that punktfunk's Windows host (crates/punktfunk-host/src/mgmt.rs + native_pairing.rs + web/) parallels — Apollo's is far more mature on Windows, particularly around the service/desktop launch dance, tray-icon survival, and session-cookie auth. #### Three loosely-coupled planes over one set of global config singletons `config::video / audio / stream / nvhttp / input / sunshine` are process-global `extern` structs (config.h l.293-298, defined config.cpp l.439-604). Everything else reads them; the web UI mutates the *file*, not the live structs (see "live reload" below). ##### 1. 
Config load path (config.cpp) `config::parse(argc, argv)` (l.1387) is called once at startup: 1. Walks argv: `--help`, Windows `--shortcut`/`--shortcut-admin`, single-dash flags (`apply_flags` l.1053 toggles a `std::bitset`), `--cmd ...` tail captured into `sunshine.cmd`, bare `key=value` overrides collected into `cmd_vars`, and a bare token = path to the config file (l.1425). 2. Ensures appdata dir + creates an empty config file if missing (l.1448-1456; on Windows seeds a default `server_cmd` Bubbles screensaver entry). 3. `parse_config(read_file(config_file))` → `unordered_map` via a hand-rolled parser (NOT a library): `parse_option` (l.650) strips `#` comments, trims whitespace, splits on first `=`, and—crucially—handles multi-line `[...]` JSON list values by bracket-counting (`skip_list` l.633). 4. argv overrides are merged over file values (l.1461). 5. `apply_config(std::move(vars))` (l.1090) is the giant binding table: each setting is consumed by a typed helper that *erases* the key from the map on success, so any keys left over at the end are logged as "Unrecognized configurable option" (l.1377-1381). Every consumed key is also mirrored into `modified_config_settings` (config.h l.20, config.cpp l.1104) so the UI can show only non-default values. 6. Windows-only tail (l.1490-1534): if launched via Start-Menu shortcut and the service isn't running, it `ShellExecuteExW(runas, --shortcut-admin)` to self-elevate, waits for the service, `wait_for_ui_ready()`, then `launch_ui()` — then returns 1 so the foreground process does NOT also start a stream host. This is Apollo's service↔UI two-process split. ##### 2. 
Web management plane (confighttp.cpp) `confighttp::start()` (l.1507) builds a `SimpleWeb::Server` initialized with the *same* `nvhttp.cert`/`nvhttp.pkey` the GameStream host uses (l.1511), binds to `address_family`-derived any-address on `map_port(PORT_HTTPS)` (offset 1 from base port 47989 → 47990), registers routes (l.1525-1558), runs `server->start()` on a detached `std::thread` (l.1577), then blocks on the shutdown mail event (l.1580) and calls `server.stop()` + `tcp.join()`. Request lifecycle for protected routes: `authenticate()` (l.179) → `checkIPOrigin()` (compares `net::from_address` against `http::origin_web_ui_allowed` = pc/lan/wan, l.159) → if no username set, redirect to `/welcome` → else validate the `auth` cookie: `hex(hash(cookieValue + salt)) == sessionCookie` AND not older than `SESSION_EXPIRE_DURATION` (15 days). Login (l.1469) checks username (case-insensitive) + `hash(password+salt)`, then sets an in-memory `sessionCookie` and a `Set-Cookie: auth=...; Secure; SameSite=Strict; Max-Age=2592000; Path=/`. There is exactly ONE session cookie globally (a single `std::string sessionCookie` l.64) — logging in elsewhere invalidates the prior session. Config round-trip: `getConfig` (l.997) re-parses the config FILE fresh and returns all key=values as JSON plus platform/version/`vdisplayStatus`; `saveConfig` (l.1049) rewrites the WHOLE config file from the posted JSON (skipping null/empty values, l.1064) as `key = value` lines. **Saving config does NOT hot-apply** — it requires a restart (`/api/restart` l.1327 → `proc::proc.terminate()` + `platf::restart()`). App edits, by contrast, DO live-apply via `proc::refresh(file_apps)` (saveApp l.653, deleteApp l.843, reorderApps l.790). ##### 3. System tray plane (system_tray.cpp) A single static `struct tray` (l.112) holds icon path, tooltip, a fixed menu array, and an `allIconPaths` list (4 icon states). 
`init_tray_threaded()` (l.444) sets the menu[0] label to "Open Apollo (:)", optionally hides controls (`hide_tray_controls`), spawns `tray_thread_worker` (l.415) which calls `init_tray()` then loops `tray_loop(0)` (non-blocking) every 50 ms until `tray_thread_should_exit`. State transitions from the streaming pipeline call `update_tray_playing/pausing/stopped/launch_error/require_pin/paired/client_connected`, each of which swaps the icon, sets a balloon `notification_title/text/icon`, optionally a `notification_cb` (e.g. require_pin → `launch_ui("/pin#PIN")` l.362; launch_error → terminate stream l.337), and calls `tray_update(&tray)`. Menu callbacks call into the same primitives as the web API (`proc::proc.terminate()`, `platf::restart()`, `display_device::reset_persistence()`, `lifetime::exit_sunshine()`). ##### Cross-plane glue - `launch_ui(path)` (entry_handler.cpp l.29) is the universal "open the web UI" entry — used by the tray, by Windows shortcut launch, and by PIN-pairing notifications. It just `platf::open_url("https://localhost:")`. - Credentials live in a separate JSON file (`credentials_file`, defaulted to `file_state` then overridable, config.cpp l.1220-1221), not the main config. `httpcommon::reload_user_creds` loads username/password/salt into `config::sunshine`; `savePassword` (confighttp l.1175) re-hashes and clears the session cookie to force re-login. **Key techniques** - **Erase-on-consume parsing with leftover detection** — apply_config() (config.cpp l.1090) consumes the vars map via helpers that erase each recognized key; whatever remains is logged as unrecognized (l.1377). Combined with modified_config_settings mirroring (l.1104), the host always knows exactly which settings deviate from defaults. 
— _punktfunk-host config could adopt the same 'only persist deltas + warn on unknown keys' discipline so a config schema bump doesn't silently swallow typos, and the web console can show a clean non-default view._ - **Per-vendor encoder enum string translators** — Whole namespaces (nv/amd/qsv/vt/sw, config.cpp l.53-357) map human strings ('ultralowlatency','cqp','superfast') to encoder SDK integer constants, with low-latency presets as the DEFAULTS (e.g. amd usage = ultralowlatency l.469-471, sw preset 'superfast'/'zerolatency' l.451-453, nvenc realtime HAGS + high-power mode on by default l.457-459). — _Defaults are explicitly tuned for latency, not quality — the encoder is configured ultra-low-latency out of the box. A low-latency host's config defaults should bias the same way; this is the concrete table punktfunk can port for AMD/QSV/VT vendor parity._ - **Embedded HTTPS server sharing the host TLS identity** — confighttp uses SimpleWeb::Server seeded with nvhttp.cert/pkey (confighttp.cpp l.1511) — the SAME cert the Moonlight/GameStream pairing uses — on a fixed port offset (PORT_HTTPS=1 → base+1). — _One identity, one cert, management UI and stream control on adjacent ports. punktfunk already shares its cert.pem between GameStream pairing and punktfunk/1; the lesson is the web console can reuse it rather than carrying a separate mgmt TLS story._ - **Single-string session cookie with salted-hash validation** — authenticate() (l.179) validates hex(hash(cookie + salt)) against an in-memory sessionCookie with a 15-day steady_clock expiry; login (l.1469) rand_alphabet(64) the raw cookie and stores only its hash. checkIPOrigin gates by pc/lan/wan BEFORE auth. — _Contrast with punktfunk's mgmt API (bearer token in ~/.config/punktfunk/mgmt-token + web login gate). 
Apollo's cookie+IP-origin model is simpler for a desktop single-operator host and avoids a static long-lived token; worth considering for the web console's UX._ - **Windows service↔UI self-elevation handshake** — config::parse (l.1490-1534): a non-admin Start-Menu shortcut self-relaunches as admin (ShellExecuteExW 'runas' --shortcut-admin l.1511), starts the service, wait_for_ui_ready() polls the Win32 TCP table for the LISTEN socket (entry_handler.cpp l.236), then launch_ui(), and returns 1 so the shortcut process never starts a stream. — _This is the mature answer to the exact problem punktfunk's Windows host hit (docs/windows-host.md 'secure-desktop two-process design', Session-0 vs interactive session). Apollo solves UI-launch-from-service cleanly; the TCP-table readiness poll is directly portable._ - **Tray thread DACL hardening for SYSTEM-context survival** — init_tray() (l.143-197) adds an EXPLICIT_ACCESS ACE granting SYNCHRONIZE to Everyone on the current thread handle before registering the icon, and busy-waits for GetShellWindow() (l.201) so the icon registers reliably across logoff/logon. — _When the host runs as a Windows service (SYSTEM), Explorer can't open the thread to detect termination → ghost tray icons forever. punktfunk's Windows host, if it ever runs as a service with a tray, needs this exact DACL fix._ - **JSON-list config values parsed via ptree wrapping** — Multi-line bracketed values (global_prep_cmd, server_cmd, dd_mode_remapping) are extracted as raw strings by the flat parser, wrapped in a synthetic JSON object, then parsed by boost ptree (list_prep_cmd_f l.949, mode_remapping_from_view l.411). — _A pragmatic hybrid: flat key=value for the human-editable 90%, embedded JSON for structured fields, without committing to full-JSON config. 
Shows how to grow a flat config without a rewrite._

**Hard-won lessons (edge cases handled)**

- Saving config via the web UI does NOT hot-reload — saveConfig (confighttp.cpp l.1049) only rewrites the file; the live config::* structs are unchanged until /api/restart (l.1327) terminates and re-launches the process. App-list edits (saveApp/deleteApp/reorderApps) are the exception and DO apply live via proc::refresh. Mixing these two semantics is a known footgun.
- saveConfig rewrites the ENTIRE config file from the posted JSON, dropping any key whose value is null or an empty string (l.1064). The frontend is expected to send only non-default settings (the @attention note l.1045); sending a partial body silently deletes everything omitted.
- Config-list options MUST be cleared before repopulating from file: list_int_f (l.1013) and the framerate list comment (l.1010-1012) note that defaults like [10,30,60,90,120] fps leak through if you only append, so setting just '30' in config has no effect unless the list is cleared first.
- The flat parser handles bracketed list values that span newlines by bracket-counting (skip_list l.633); an unterminated '[' at EOF is detected and logged as a warning (l.679) rather than hanging.
- A Windows UCRT64 non-admin shortcut launch can hit access-denied when creating the config folder; config::parse defers (config_loaded=false but shortcut_launch=true → sleep 10s and let the service do it, l.1481-1483) rather than failing hard.
- The tray icon leaks on Windows if the host runs as SYSTEM and terminates unexpectedly, because Explorer can't open the thread handle — fixed by granting Everyone SYNCHRONIZE on the thread DACL (init_tray l.143-197). init_tray also waits for GetShellWindow()!=null (l.201) so the icon survives a logoff/logon cycle.
- Tray notification message buffers: update_tray_playing stores force_close_msg in a STATIC char[256] (l.253) because the tray lib keeps the pointer; a non-static buffer would dangle.
All Windows tray strings are utf8ToAcp-converted in place (l.257-258) for the ANSI shell APIs.
- PIN/pairing notification clicks deep-link into the web UI: update_tray_require_pin sets notification_cb to launch_ui('/pin#PIN') (l.362), so the tray and the web UI are wired together for the pairing ceremony rather than having the tray handle the PIN itself.
- checkIPOrigin runs BEFORE authentication and BEFORE the username-empty check (authenticate l.180), so a WAN client is 403'd before any credential logic; origin_web_ui_allowed defaults to 'lan' (config.cpp l.542), not 'pc'.
- getNodeModules (confighttp.cpp l.528) guards against path traversal: weakly_canonical + isChildPath(filePath, assets) (l.519) means a request path that uses '..' to escape the assets directory is rejected before any static asset is served, and unknown extensions (no MIME match) are bad_request'd.
- uploadCover (l.1088) only allows downloads from images.igdb.com (host check l.1109), an SSRF guard on a feature that fetches arbitrary URLs server-side.
- quit (l.1347) and tray_quit (l.94) both special-case running headless: if GetConsoleWindow()==null (i.e. running as a Windows service) they call exit_sunshine(ERROR_SHUTDOWN_IN_PROGRESS) so the service manager tears the service down instead of respawning the process.
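Apollo's static-asset guard (weakly_canonical + isChildPath, then MIME-or-reject) is straightforward to mirror on the punktfunk web console. The following is a minimal, std-only Rust sketch under stated assumptions: `resolve_asset` and `mime_for` are hypothetical names, the request path is assumed to be already percent-decoded, the check is purely lexical (it refuses any `..` outright instead of canonicalizing, and does not resolve symlinks the way `weakly_canonical` does), and the extension table is illustrative rather than Apollo's.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a (percent-decoded) request path under `root`, or return None if
/// it could escape. Purely lexical: any `..`, absolute root, or drive prefix
/// is refused, and symlinks inside `root` are NOT resolved (unlike Apollo's
/// weakly_canonical-based check).
fn resolve_asset(root: &Path, request: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    // Request paths arrive URL-style ("/js/app.js"); treat them as relative.
    for component in Path::new(request.trim_start_matches('/')).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}

/// Map an extension to a MIME type; unknown extensions yield None so the
/// caller can answer 400, mirroring Apollo's bad_request behaviour.
/// The table is illustrative (hypothetical), not Apollo's actual list.
fn mime_for(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "html" => Some("text/html"),
        "css" => Some("text/css"),
        "js" => Some("application/javascript"),
        "svg" => Some("image/svg+xml"),
        "woff2" => Some("font/woff2"),
        _ => None,
    }
}
```

Refusing `..` lexically is stricter than Apollo (which accepts a `..` that still lands inside the assets root), but it needs no filesystem access and cannot be fooled by a path that does not exist yet.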
*External deps:* Simple-Web-Server (SimpleWeb::Server + Crypto::Base64) — embedded HTTPS server for the web UI, nlohmann/json — all `/api/*` request/response bodies and apps/credentials file I/O, boost::property_tree (json_parser) — parsing embedded-JSON config values (prep_cmd, server_cmd, dd_mode_remapping) and credentials file, boost::asio / boost::asio::ssl — server transport + TLS context, boost::algorithm::string — content-type trimming/case-folding, tray (dmikushin/zserge 'tray' C library, vendored at tray/src/tray.h) — cross-platform system tray icon/menu/notifications, OpenSSL via crypto.h (crypto::hash, crypto::rand_alphabet, crypto::gen_creds) — password salting + TLS cert generation, ffnvcodec/nvEncodeAPI.h, `AMF/*` headers — encoder enum constants resolved at config parse time, Win32: shellapi (ShellExecuteExW self-elevation), iphlpapi (GetTcpTable UI-ready poll), aclapi/accctrl (tray thread DACL)

---

## Part 2 — Subsystem comparison (punktfunk vs Apollo)

| Subsystem | Verdict |
|---|---|
| Protocol & streaming (RTP/FEC/ENet/RTSP/crypto) | ⚪ par |
| GameStream HTTP, pairing, discovery | 🔵 divergent designs |
| Video encode pipeline & codec config | 🔴 Apollo ahead |
| Audio capture, encode, transport (Windows host) | 🔴 Apollo ahead |
| Input handling & injection | 🔴 Apollo ahead |
| App/process launch & display configuration (Windows host) | 🔴 Apollo ahead |
| Config, management/web UI, system tray | 🔴 Apollo ahead |

### Protocol & streaming (RTP/FEC/ENet/RTSP/crypto) — ⚪ par

For the **shared GameStream wire protocol**, punktfunk is at par with Apollo and arguably cleaner: the video FEC geometry (4-block/255-shard/min-parity/RS-over-full-datagram), audio RS(4,2) with the exact OpenFEC matrix, the ENet AES-GCM control wrapper (punktfunk even auto-detects the nonce scheme rather than hardcoding it), and the RTSP surround-params ordering are all faithfully reproduced and unit-tested.
For the **native protocol**, punktfunk is far ahead — GF(2¹⁶) FEC, QUIC control, clock-skew correction, mid-stream reconfigure, bandwidth probing — capabilities Apollo's GameStream-only lineage cannot express. The reason it nets to "par" rather than "ahead" is a cluster of **production-hardening details Apollo's mature Windows lineage has that punktfunk's host lacks regardless of plane**: socket QoS/DSCP tagging, hot-path thread-priority elevation, and bitrate-derived rate-control pacing. On Windows specifically, the GameStream video plane also still uses a per-packet `send()` fallback (stream.rs:185), while only the native plane got the Windows USO/batch send, so a Windows host sends GameStream video less efficiently than it could. These are transferable, well-scoped wins, not architectural deficits.

**How punktfunk does it.** punktfunk runs **two protocol planes** that share one crypto/FEC core (`punktfunk-core`), with the streaming logic split across the GameStream-compat host (`crates/punktfunk-host/src/gamestream/`) and the native punktfunk/1 host (`punktfunk1.rs`).

#### GameStream-compat plane (mirrors Apollo's wire protocol)

- **RTSP** (`gamestream/rtsp.rs`): a hand-rolled TCP/48010 handler, one request per connection (`handle_conn` rtsp.rs:58, which closes the socket so moonlight-common-c sees end-of-response). Handles OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY/TEARDOWN (rtsp.rs:124). DESCRIBE (`describe_sdp` rtsp.rs:224) emits HEVC/AV1 indicators and the six Opus `surround-params` lines in Sunshine's normal-before-HQ order (rtsp.rs:239-251). ANNOUNCE parses `x-nv-*` keys into a `StreamConfig` (rtsp.rs:270) + `AudioParams` (rtsp.rs:313), snapping Opus packet duration to 5/10 ms (rtsp.rs:327). PLAY launches video (`stream::start`) and audio (`audio::start`).
**Notable: it advertises `encryptionSupported:0` (rtsp.rs:228) — streams are plaintext; only the ENet control plane is encrypted.**
- **Video plane** (`gamestream/stream.rs` + `gamestream/video.rs`): learns the client endpoint from its first UDP ping (stream.rs:74-80), then runs a capture→NVENC→`VideoPacketizer`→UDP loop on a native thread, with a **dedicated send thread** (`spawn_sender` stream.rs:197) decoupled by a depth-2 `sync_channel` (stream.rs:308) — newest-frame-drop under backpressure. `VideoPacketizer` (video.rs:67) replicates Apollo's geometry exactly: ≤4 FEC blocks, ≤255 shards/block, `max_data = 255*100/(100+pct)` cap (video.rs:92), min-parity floor (video.rs:141), and RS run over the **full datagram** so the NV header is recoverable too (video.rs:113-135, proven by the round-trip test video.rs:294). FEC via `punktfunk_core::fec::Gf8Coder`.
- **Audio plane** (`gamestream/audio.rs`): Opus (multistream for 5.1/7.1) + the hardcoded RS(4,2) OpenFEC matrix (`FEC_MATRIX` audio.rs:54, replicated in `audio_parity` audio.rs:194), byte-matching moonlight-common-c.
- **Control plane** (`gamestream/control.rs`): an ENet host (rusty_enet) on UDP/47999 with 0x30 channels (control.rs:44). Carries input, gamepads, and the AES-128-GCM application wrapper. Its standout feature: `decrypt_control` (control.rs:289) **auto-detects the exact GCM scheme** (nonce construction × key byte-order × tag position × AAD) by brute-forcing the cross-product on the first authenticating packet (control.rs:325-351), then `encrypt_control` (control.rs:358) mirrors it with the direction flipped for host→client rumble.
- **Pairing crypto** (`gamestream/crypto.rs`): AES-128-ECB-no-pad + SHA-256 + RSA, the GameStream pairing primitives.

#### Native punktfunk/1 plane (a strict superset, no Apollo equivalent)

- `punktfunk1.rs` drives QUIC control (Hello/Welcome/Start/Reconfigure/ClockProbe) + the hardened core `Session` data plane over raw UDP.
- The data plane (`punktfunk-core/src/session.rs`, `crypto.rs`, `fec/`, `transport/udp.rs`) uses **GF(2¹⁶) Leopard FEC** (65535-shard ceiling, fec/mod.rs:5) + per-direction-salted AES-GCM with seq-as-AAD (crypto.rs:1-20).
- Transport (`transport/udp.rs`) has batched `sendmmsg`/`recvmmsg` AND **UDP GSO on both Linux (`UDP_SEGMENT`, udp.rs:97) and Windows (`WSASendMsg`+`UDP_SEND_MSG_SIZE`/USO, on by default, udp.rs:135-241)** with auto-fallback latching. The host send path is a dedicated `send_loop` (punktfunk1.rs:1804) doing FEC+seal+microburst-paced send (`paced_submit` punktfunk1.rs:1720) off the encode thread.
- Adds mid-stream `Reconfigure`, an 8-round NTP clock-skew handshake, and `ProbeRequest`/`run_probe_burst` bandwidth probing (punktfunk1.rs:1629), none of which exist in GameStream/Apollo.

**Intentional divergences (by design, not gaps):**

- Two protocol planes from one core: punktfunk keeps the GameStream wire protocol AND its own punktfunk/1 (QUIC control + raw-UDP GF(2¹⁶) data plane), where Apollo is GameStream-only. The native plane intentionally expresses things GameStream cannot: GF(2¹⁶) Leopard FEC (65535-shard ceiling vs Apollo's 255-shard ~1 Gbps wall, fec/mod.rs:5), mid-stream Reconfigure, an NTP clock-skew handshake, and active bandwidth probing (punktfunk1.rs:1629).
- Zero-copy by design: punktfunk's video packetizer and the native seal path are built around reusable buffers and a wire-buffer pool (punktfunk1.rs `reclaim_wires`), and the capture→encoder path is GPU zero-copy. Apollo's stream.cpp:676 also does zero-copy shard pointers, but punktfunk's choice is a core invariant, not just an optimization.
- Native-thread hot path with QUIC only on the control plane: punktfunk forbids async on the per-frame path (tokio/quinn live only behind the quic feature, used for control/m3 orchestration).
The per-frame video send loops are plain std::thread + blocking sockets, matching Apollo's threaded model rather than introducing async I/O on the data plane.
- Encrypt-by-default native plane vs plaintext GameStream: punktfunk advertises encryptionSupported:0 for GameStream (plaintext video/audio, rtsp.rs:228) to match stock Moonlight expectations, but the native plane mandates per-direction-salted AES-GCM with seq-as-AAD (crypto.rs), a deliberate security upgrade that GameStream's optional/legacy crypto cannot match.
- Control-plane GCM scheme auto-detection: rather than hardcoding one of Moonlight's nonce/tag/key permutations per negotiated flag as Apollo does, punktfunk brute-forces the authenticating combination once per connection (control.rs:289) — more robust across moonlight-common-c variants, a Rust-idiomatic divergence.

**Transfer candidates from Apollo (6):** _Batched/GSO send for the GameStream video plane on Windows_, _Socket QoS / DSCP marking on the media sockets_, _Elevate capture/encode/send thread priority on the host hot path_, _Bitrate-derived rate-control pacing (vs frame-interval-only)_, _Consume the GameStream client loss-stats report_, _Grow SO_SNDBUF on the GameStream video/audio sockets_ — see Part 4.

### GameStream HTTP, pairing, discovery — 🔵 divergent designs

On the GameStream-compat surface specifically, Apollo is meaningfully ahead: it binds each TLS request to a verified device identity and enforces it (deferred post-handshake verify, per-cert X509_STORE with embedded-clock tolerance, `get_verified_cert` + PERM checks on every authenticated handler), ships a fine-grained per-device permission model and rich paired-device records (names, display-mode overrides, do/undo command lists), has an anti-oracle OTP PIN path plus stdin/web modes, and owns WAN reachability via UPnP+PCP+IPv6 pinholes.
punktfunk's GameStream path pins client certs but never consults the allow-list to gate access (`tls.rs:38-45` accepts any cert; `/launch` at `nvhttp.rs:87-109` checks nothing), advertises PairStatus from the listener rather than the store, and has no UPnP and no permission model. BUT punktfunk is not merely a lesser Sunshine: it carries an entirely separate, MORE advanced native trust subsystem (`native_pairing.rs`) — atomic persistence with rollback, name sanitization, delegated-approval queue, SPAKE2 PIN ceremony — that Apollo has no analogue for, plus pure-Rust in-process mDNS and a TXT-rich native discovery record. The two front doors have genuinely different maturity profiles, so the verdict is divergent designs, with Apollo clearly ahead on the GameStream-compat half and punktfunk ahead on its native half.

**How punktfunk does it.** punktfunk's GameStream front door lives in `crates/punktfunk-host/src/gamestream/` and is cross-platform (it runs on the Windows host too: `serve` dispatches into `gamestream::serve` unconditionally, `main.rs:81-83`). It is a faithful, leaner re-implementation of the same NVIDIA nvhttp wire protocol Apollo speaks, built on **axum + axum_server + rustls** instead of Simple-Web-Server + OpenSSL.

#### Two listeners (`nvhttp.rs:23-55`)

`run()` binds a **plain HTTP** server on 47989 and a **mutual-TLS HTTPS** server on 47984, both serving the identical router (`router()`, `nvhttp.rs:44-55`): `/serverinfo /pair /pin /applist /launch /resume /cancel`. The listener is distinguished by an `Extension(Https(bool))` layer rather than two divergent route tables. Ports are fixed module constants (`mod.rs:33-38`: HTTP 47989 / HTTPS 47984 / RTSP 48010 / Video 47998 / Control 47999 / Audio 48000), not derived by offset from a single base like Apollo's `net::map_port`.

#### mTLS handshake (`tls.rs`)

Moonlight does mutual TLS, so the HTTPS config requests the client cert and checks its signature via a custom `ClientCertVerifier` (`tls.rs:25-80`).
Crucially, `verify_client_cert` **always returns `ClientCertVerified::assertion()`** (`tls.rs:38-45`) — it proves key possession but does NOT check the cert against the paired allow-list. The `mandatory` flag distinguishes the nvhttp path (client cert required) from the mgmt API path (optional, bearer-token fallback) (`tls.rs:82-119`). There is no deferred post-handshake verify hook and no binding of the TLS connection to a device identity for the request lifetime.

#### 4-phase PIN pairing (`pairing.rs`)

A strict per-`uniqueid` state machine across 4 HTTP GETs to `/pair`, dispatched by query phrase/key in `h_pair` (`nvhttp.rs:173-222`): getservercert (`pairing.rs:102-150`) → clientchallenge (`pairing.rs:153-181`) → serverchallengeresp (`pairing.rs:184-206`) → clientpairingsecret (`pairing.rs:210-247`). Crypto in `crypto.rs`: AES-128-ECB no-pad, SHA-256, RSA-PKCS1v15-SHA256; PIN key = `SHA-256(salt||pin)[..16]` (`crypto.rs:29-34`). Phase 4 checks both the hash match AND `verify256` of the secret signature against the client cert, then pins the client cert DER. On any failure it removes the session (`pairing.rs:244`). The final `pairchallenge` is over HTTPS and just acknowledges success because the pinned-cert TLS handshake is the proof (`nvhttp.rs:203-206`).

#### PIN delivery (`pairing.rs:23-74`)

ONE async mode: `PinGate` parks the getservercert handler on a tokio `Notify` for up to 300s (`pairing.rs:122-126`); an operator submits the PIN out-of-band via `GET /pin?pin=NNNN` (`nvhttp.rs:69-80`) or the mgmt API. `awaiting_pin()`/`waiters` drive the web-console "prompt now" UX. The PIN direction is host-enters-what-Moonlight-shows: the operator types on the host side the PIN that Moonlight displays.

#### Persistence (`mod.rs:255-295`)

GameStream paired certs persist as a flat JSON list of pinned client certs (`paired.json`) via `load_paired`/`save_paired`. No device names, no per-device permissions, no display-mode overrides, no command lists.
`config_dir()` resolves XDG/HOME on Linux and **falls back to %APPDATA% on Windows** (`mod.rs:205-213`), which matters for a Windows service with no HOME. Note that `save_paired` is a plain `fs::write` (not atomic temp+rename).

#### serverinfo (`serverinfo.rs`)

Builds the capability XML; over HTTPS it reports `PairStatus=1` and a placeholder MAC `01:02:03:04:05:06`, over HTTP `PairStatus=0` and `00:00:00:00:00:00` (`serverinfo.rs:8-42`). `PairStatus` is derived purely from which listener served the request, NOT from the actual paired store (`serverinfo.rs:16-17`). The codec mask is computed honestly (`mod.rs:46-56`, no false 10-bit/HDR claim).

#### launch/resume/cancel (`nvhttp.rs:87-164`)

`launch` parses rikey→AES-128 key + rikeyid(i32) + "WxHxFPS" mode (default 1920x1080x60) and stores a `LaunchSession`; resume reuses it; cancel clears it and flips the streaming atomics for RAII teardown. **No permission enforcement, no single-app mandatory-encryption check, no encoder probe before returning the RTSP URL.**

#### Discovery (`mdns.rs` + `discovery.rs`)

GameStream mDNS advertises `_nvstream._tcp.local.` via `mdns-sd` (pure-Rust, in-process, unlike Apollo's platform Avahi/DNS-SD in publish.cpp) with NO TXT records (`mdns.rs:14-37`). The native punktfunk/1 protocol has its own richer advert `_punktfunk._udp.local.` with `proto`/`fp`(cert SHA-256 to pin)/`pair`(required|optional)/`id` TXT records (`discovery.rs:34-70`).
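On the client side, that TXT record is what lets a client pre-pin the host before it ever connects. A minimal sketch, assuming the TXT pairs have already been collected into a `HashMap<String, String>` (the real `mdns-sd` API is not used here) and assuming `fp` is the cert SHA-256 encoded as 64 hex characters; `NativeAdvert`, `PairPolicy`, `parse_hex32`, and `parse_advert` are hypothetical names. Only the key names `proto`/`fp`/`pair`/`id` and the `required|optional` values come from the description above.

```rust
use std::collections::HashMap;

/// Value of the `pair` TXT key.
#[derive(Debug, PartialEq)]
enum PairPolicy {
    Required,
    Optional,
}

/// What a client learns from the `_punktfunk._udp.local.` advert before
/// connecting (hypothetical struct; fields mirror proto/fp/pair/id).
#[derive(Debug, PartialEq)]
struct NativeAdvert {
    proto: String,
    fp_sha256: [u8; 32], // cert fingerprint to pin on the first TLS handshake
    pair: PairPolicy,
    id: String,
}

/// Decode exactly 64 hex characters into a SHA-256 digest (assumed encoding).
fn parse_hex32(s: &str) -> Option<[u8; 32]> {
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

fn parse_advert(txt: &HashMap<String, String>) -> Result<NativeAdvert, String> {
    let get = |key: &str| txt.get(key).ok_or_else(|| format!("missing TXT key `{key}`"));
    let fp_sha256 = parse_hex32(get("fp")?).ok_or("`fp` is not 64 hex characters")?;
    let pair = match get("pair")?.as_str() {
        "required" => PairPolicy::Required,
        "optional" => PairPolicy::Optional,
        other => return Err(format!("unknown pair policy `{other}`")),
    };
    Ok(NativeAdvert {
        proto: get("proto")?.clone(),
        fp_sha256,
        pair,
        id: get("id")?.clone(),
    })
}
```

Rejecting an unknown `pair` value instead of defaulting it keeps a malformed or spoofed advert from silently downgrading a client to "pairing optional".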
#### Native pairing (`native_pairing.rs`) — punktfunk's own protocol, ahead of Apollo here

The punktfunk/1 SPAKE2 path has a far more complete trust store than the GameStream side: persistent `PairedClients` with names + fingerprints, **atomic temp+rename persistence with in-memory rollback on save failure** (`native_pairing.rs:116-126, 260-283`), untrusted-device-name **sanitization** stripping C0/C1 controls + Unicode bidi overrides (`native_pairing.rs:139-170`), a timed arming window with expiry, and a full **delegated-approval pending-knock queue** with TTL + cap + LRU eviction (`native_pairing.rs:309-405`).

#### Reachability

There is NO UPnP/IGD/PCP port-mapping or IPv6-pinhole code anywhere in punktfunk; WAN reachability is entirely manual.

**Intentional divergences (by design, not gaps):**

- Two independent trust subsystems by design: GameStream pairing (gamestream/pairing.rs, Moonlight-compatible 4-phase RSA/AES-ECB) and the native punktfunk/1 SPAKE2 PIN ceremony with its own store (native_pairing.rs). Apollo only has the GameStream one. punktfunk's native side is deliberately richer (delegated approval, name sanitization, atomic persistence).
- Pure-Rust in-process mDNS via mdns-sd (mdns.rs/discovery.rs) instead of Apollo's per-platform Avahi (Linux) / DNS-SD (Win/macOS) in publish.cpp: one code path on every OS, no system daemon dependency.
- Native discovery (discovery.rs:34-70) advertises proto/fp(cert-fingerprint-to-pin)/pair/id TXT records so a client can pre-pin a fingerprint and learn up front whether pairing is required, a capability that has no place in the GameStream protocol Apollo targets.
- axum + rustls + tokio for the control plane vs Apollo's Simple-Web-Server + OpenSSL, consistent with punktfunk's invariant that async/tokio stays on the I/O-bound control plane only, never the per-frame path.
- Async-Notify PIN delivery (PinGate, pairing.rs:23-74) as the single mode, fitting the web-console-armed UX, vs Apollo's three modes (OTP/web/stdin).
The native side mints and displays the PIN rather than accepting it (SPAKE2 direction).
- Fixed port constants (mod.rs:33-38) rather than Apollo's single-base-port + offset scheme (net::map_port); functionally equivalent for the default GameStream ports.

**Transfer candidates from Apollo (6):** _Gate the GameStream HTTPS plane on the paired-cert allow-list_, _UPnP/IGD + PCP port mapping and IPv6 pinholes for WAN reach_, _Named, permissioned paired-device records for the GameStream store_, _Atomic temp+rename persistence for the GameStream paired store_, _OTP pre-armed PIN with silent-failure anti-oracle on the GameStream path_, _Tolerate not-yet-valid/expired client certs during verification_ — see Part 4.

### Video encode pipeline & codec config — 🔴 Apollo ahead

The two designs converge on the same low-latency NVENC core (P1/ULL, infinite GOP, P-only, CBR, ~1-frame VBV, forced-IDR, in-place texture encode), and punktfunk is genuinely AHEAD on bitrate auto-probing and arguably on split-encode polish (10-bit-aware, fallback).
But Apollo's encoder is more complete and battle-tested on exactly the dimensions that matter for the Windows host:

1. It queries `nvEncGetEncodeCaps` for every capability before init (max W/H, 10-bit, YUV444, RFI, CABAC, weighted-pred, custom-VBV, intra-refresh) and degrades gracefully instead of failing the session (nvenc_base.cpp:175-220, 250-255, 274-281, 311-315, 334-345).
2. It implements TRUE reference-frame invalidation via nvEncInvalidateRefFrames with a multi-frame DPB (maxNumRefFrames=5, RFI confirmation flag) rather than punktfunk's always-full-IDR (nvenc_base.cpp:268-281, 574-610).
3. It has a dedicated colorspace negotiation module that derives full VUI signalling (primaries/transfer/matrix/range) for SDR AND HDR and always writes it into the VUI (video_colorspace.cpp:22-120, nvenc_base.cpp:291-302), plus SPS/VPS rewriting via the FFmpeg CBS API for encoders that don't emit VUI (cbs.cpp).
4. HDR10 static metadata, YUV444, two-pass, AQ, slices, weighted prediction, and min-QP are all wired up.

punktfunk's Windows path performs none of the capability checks and has no SDR color VUI, no slices, and no true RFI; HDR10 metadata is explicitly deferred (nvenc.rs:242). These are real robustness/quality gaps on the very GPU/codec matrix Apollo has hardened.

**How punktfunk does it.**

#### punktfunk's encode pipeline

punktfunk has **three encoder back-ends behind one trait**, `Encoder` (`encode.rs:41-51`): `submit`/`request_keyframe`/`poll`/`flush`. The trait is intentionally minimal: synchronous submit→poll, no async, and it lives on the dedicated encode thread (`encode.rs:40`). `open_video` (`encode.rs:106-221`) is the OS-conditional factory:

- **Linux** = `hevc_nvenc`/`h264_nvenc`/`av1_nvenc` via `ffmpeg-next` (`encode/linux.rs`). Takes CPU packed RGB frames OR `AV_PIX_FMT_CUDA` zero-copy surfaces (`linux.rs:209-221`, `submit_cuda` at `linux.rs:394-440`).
- **Windows NVENC** = a **hand-written raw-SDK wrapper** over `nvidia_video_codec_sdk::ENCODE_API` (`encode/nvenc.rs`), the direct analogue of Apollo's `nvenc_base`. It opens a session bound to the DXGI capturer's `ID3D11Device` and **encodes the capturer's texture in place**: it registers each input texture once, cached by raw pointer (`nvenc.rs:404-423`), then calls `map_input_resource`/`encode_picture` with **no per-frame `CopyResource`** (`nvenc.rs:400-461`). This is true zero-copy and is correct because the loop is synchronous (capture→submit→poll-blocks) (`nvenc.rs:6-11`).
- **Windows software** = openh264 Baseline, screen-content RT (`encode/sw.rs`), the GPU-less fallback (H.264 only).

**Codec config (all back-ends mirror the Moonlight/Sunshine low-latency model):** P1 + `NV_ENC_TUNING_INFO_ULTRA_LOW_LATENCY` (`nvenc.rs:206-207`), CBR (`nvenc.rs:220`), **infinite GOP** `NVENC_INFINITE_GOPLENGTH` (`nvenc.rs:218`), `frameIntervalP=1` (P-frames only, `nvenc.rs:219`), ~1-frame VBV (`nvenc.rs:225-227`), no B-frames. Recovery is **forced-IDR on demand**: `request_keyframe` sets `force_kf`, and the next submit ORs `FORCE_IDR | OUTPUT_SPSPPS` into the picture flags (`nvenc.rs:437-442`); the GameStream control plane maps RFI/request-IDR types `0x0301/0x0302/0x0305` to this (`gamestream/control.rs:163-177`).

**Split-encode** is well developed on Windows: it is env-overridable, forces 2-way above ~1 Gpix/s, disables itself for 10-bit (measured slower on Ada), and has an init-failure fallback that disables split and retries single-engine (`nvenc.rs:150-173`, `nvenc.rs:295-306`).

**Bitrate probing** is a strength: rather than hard-capping, both Windows (`nvenc.rs:131-323`) and Linux (`encode.rs:117-159`) open at the requested rate and, only on InvalidParam, step the rate down to 3/4 and retry, so each GPU runs at its own real ceiling (`nvenc.rs:133-140`, comment `encode.rs:119-125`).
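The probe-and-step-down loop can be sketched as follows. This is an illustration, not punktfunk's code: `open_with_probe`, `OpenError`, and the `floor_kbps` stop condition are hypothetical, and the `open` closure stands in for the real NVENC or libav session open. Only the shape (start at the requested rate, retry at 3/4 on InvalidParam, abort on any other error) follows the description above.

```rust
/// Hypothetical error split: the real wrapper distinguishes NVENC's
/// InvalidParam status from every other failure.
#[derive(Debug, PartialEq)]
enum OpenError {
    InvalidParam,
    Other(String),
}

/// Try `open` at `requested_kbps`; on InvalidParam retry at 3/4 of the last
/// rate, never going below `floor_kbps`. Returns the encoder and the rate
/// that finally worked. Any non-InvalidParam error aborts immediately.
fn open_with_probe<E>(
    requested_kbps: u32,
    floor_kbps: u32,
    mut open: impl FnMut(u32) -> Result<E, OpenError>,
) -> Result<(E, u32), OpenError> {
    let mut kbps = requested_kbps;
    loop {
        match open(kbps) {
            Ok(encoder) => return Ok((encoder, kbps)),
            // Step down only while above the floor; the last attempt is at the floor.
            Err(OpenError::InvalidParam) if kbps > floor_kbps => {
                // Divide first so large rates cannot overflow u32.
                kbps = (kbps / 4 * 3).max(floor_kbps);
            }
            Err(e) => return Err(e),
        }
    }
}
```

For example, with a GPU whose real ceiling is 80 Mbps, a 150 Mbps request is tried at 150 000, 112 500 and 84 375 kbps before opening at 63 279 kbps. Keeping every other error fatal matters: a driver or device failure should not be masked by silently retrying at ever lower rates.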
**10-bit HEVC Main10** on Windows: upconverts 8-bit ARGB via `pixelBitDepthMinus8=2` + the Main10 profile GUID (`nvenc.rs:233-237`), and sets a BT.2020/PQ VUI when the capturer hands over a 10-bit `Rgb10a2` frame (`nvenc.rs:243-254`). `validate_dimensions` (`encode.rs:83-99`) is the up-front gate on attacker/typo dimensions (zero/odd/over-max). **Bit-depth/HDR negotiation** happens at the protocol layer (`punktfunk1.rs:563-569`), not in a colorspace module; there is no separate colorspace negotiation file.

**Intentional divergences (by design, not gaps):**

- punktfunk uses a hand-written raw-NVENC-SDK wrapper on Windows (encode/nvenc.rs) talking to nvEncodeAPI directly, exactly like Apollo's nvenc_base — but on LINUX it deliberately goes through ffmpeg-next/libavcodec (encode/linux.rs) for the same NVENC, whereas Apollo uses libavcodec only for QSV/AMF/VAAPI and prefers its own wrapper for NVENC everywhere. punktfunk's split (raw SDK on Windows, libav on Linux) is a pragmatic Rust-ecosystem choice, not a gap.
- Zero-copy on Windows is achieved by registering and encoding the DXGI capturer's texture IN PLACE with no encoder-owned input pool and no CopyResource (nvenc.rs:400-461), a cleaner zero-copy seam than Apollo's create_and_register_input_buffer pool. This is a deliberate punktfunk design win enabled by the synchronous single-thread loop invariant, not something to import from Apollo.
- Bitrate handling is intentionally divergent: punktfunk's probe-and-step-down lets each GPU run at its real ceiling (nvenc.rs:131-323, encode.rs:117-159) instead of Apollo's static codec-level table; this is a punktfunk improvement, not a gap.
- punktfunk has NO declarative encoder descriptor table / option-vector layering (Apollo's encoder_t/codec_t with common/sdr/hdr/444 option vectors). punktfunk hardcodes config per back-end in Rust.
Given that punktfunk targets a narrower encoder set (NVENC + openh264) and the design invariant of one core + native threads, the table abstraction is not obviously worth importing; it is flagged as a divergent design rather than a required fix.
- The Encoder trait is intentionally synchronous (submit→poll→flush, native thread only, encode.rs:41-51) to honor the 'no async on the per-frame path' invariant; Apollo's parallel/async encode loops (PARALLEL_ENCODING flag) are NOT a model to copy onto the per-frame path.

**Transfer candidates from Apollo (6):** _Query NVENC encode capabilities before init and degrade gracefully_, _Implement true reference-frame invalidation with a multi-ref DPB instead of always-full-IDR_, _Always emit explicit SDR color VUI (primaries/transfer/matrix/range), not just HDR_, _Plumb HDR10 static metadata (mastering display + MaxCLL/MaxFALL)_, _Set repeatSPSPPS=1 and wire slicesPerFrame for the Windows NVENC config_, _Decode NVENCSTATUS into readable names and detect InvalidParam structurally_ — see Part 4.

### Audio capture, encode, transport (Windows host) — 🔴 Apollo ahead

The CAPTURE+ENCODE halves are at near-parity: both default to the render/sink endpoint, both rely on shared-mode autoconvert/channel-mixer to reach 48 kHz f32 (no manual SRC), both use Opus RESTRICTED_LOWDELAY + hard CBR, and punktfunk actually goes further on transport (GF(2¹⁶) FEC over QUIC on the native path; verbatim Sunshine RS(4,2) on the GameStream path with a 7.1 mapping-rotation bug fix Apollo lacks, gamestream/audio.rs:159-171). But on the WINDOWS host specifically Apollo is clearly ahead: punktfunk's WASAPI backend is stereo-only (wasapi_cap.rs:29-32) and surround `bail!`s on Windows (gamestream/audio.rs:371-378), whereas Apollo's WASAPI path negotiates 2/6/8 channels with real device channel masks and runs the multistream encoder everywhere.
More importantly, Apollo carries a large body of battle-tested Windows endpoint-management behavior, none of which punktfunk has: default-device-change detection (IMMNotificationClient → reinit), Steam Streaming Speakers virtual-sink install/match/format, IPolicyConfig default-endpoint switching + save/restore, reset_default_device on startup, and MMCSS Pro-Audio thread priority. punktfunk's WASAPI capture also runs on a plain, default-priority thread with no MMCSS registration and no device-change recovery: a default-device switch mid-session silently goes quiet until the 5 s read timeout fires.

**How punktfunk does it.**

#### punktfunk audio: capture trait + per-path transport, both fed by one Opus tuning

##### Capture abstraction (`crates/punktfunk-host/src/audio.rs`)

A single `AudioCapturer` trait (audio.rs:17-30) abstracts the OS backend: `next_chunk()` blocks for a variable-size interleaved-f32 chunk, `channels()` reports the opened layout, and `drain()` discards stale buffers when a persistent capturer is reused. The factory `open_audio_capture(channels)` (audio.rs:36-49) is `cfg`-dispatched: Linux → PipeWire (`linux::PwAudioCapturer`), Windows → `wasapi_cap::WasapiLoopbackCapturer`, other → `bail!`. A parallel `VirtualMic` trait (audio.rs:56-76) is the mic-passthrough inverse, but it is **Linux-only** (audio.rs:73-76 returns `bail!` off Linux).

##### Windows WASAPI loopback (`crates/punktfunk-host/src/audio/wasapi_cap.rs`)

COM objects are apartment-bound and `!Send`, so capture runs on a dedicated `punktfunk-wasapi-audio` thread (wasapi_cap.rs:39-46); only the receiver/stop-flag/join-handle live in the struct. A bring-up handshake (`ready_tx`/`ready_rx`, 3s timeout, wasapi_cap.rs:37,47-63) surfaces a missing render endpoint as `Err` so the caller continues without audio instead of leaving a dead thread.
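The bring-up handshake is a general pattern for thread-affine (`!Send`) resources. A std-only sketch, assuming a hypothetical `open_device` closure in place of the COM/WASAPI setup and a simulated capture loop that emits silent chunks; `spawn_capture` and `Capturer` are illustrative names, and the 480-sample chunk size is arbitrary.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Only the channel end, the stop flag and the join handle live here; the
/// (!Send) device objects stay on the capture thread.
struct Capturer {
    chunks: Receiver<Vec<f32>>,
    stop: Arc<AtomicBool>,
    join: Option<JoinHandle<()>>,
}

impl Drop for Capturer {
    fn drop(&mut self) {
        self.stop.store(true, Ordering::Relaxed);
        if let Some(handle) = self.join.take() {
            let _ = handle.join();
        }
    }
}

fn spawn_capture<F>(open_device: F, ready_timeout: Duration) -> Result<Capturer, String>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (ready_tx, ready_rx) = mpsc::sync_channel::<Result<(), String>>(1);
    let (chunk_tx, chunks) = mpsc::sync_channel::<Vec<f32>>(8);
    let stop = Arc::new(AtomicBool::new(false));
    let stop_thread = Arc::clone(&stop);
    let join = thread::Builder::new()
        .name("punktfunk-wasapi-audio".into())
        .spawn(move || {
            // Device bring-up happens on this thread (hypothetical open_device).
            if let Err(e) = open_device() {
                let _ = ready_tx.send(Err(e)); // report and exit: no dead thread
                return;
            }
            let _ = ready_tx.send(Ok(()));
            while !stop_thread.load(Ordering::Relaxed) {
                // Stand-in for the event wait + buffer read.
                thread::sleep(Duration::from_millis(10));
                let _ = chunk_tx.try_send(vec![0.0; 480]); // lossy: drop if behind
            }
        })
        .map_err(|e| e.to_string())?;
    match ready_rx.recv_timeout(ready_timeout) {
        Ok(Ok(())) => Ok(Capturer { chunks, stop, join: Some(join) }),
        Ok(Err(e)) => {
            let _ = join.join(); // thread already returned after reporting
            Err(e)
        }
        Err(RecvTimeoutError::Timeout) => {
            stop.store(true, Ordering::Relaxed); // a slow bring-up exits once it finishes
            Err("audio bring-up timed out".into())
        }
        Err(RecvTimeoutError::Disconnected) => Err("capture thread exited during bring-up".into()),
    }
}
```

Because the device is opened on the capture thread itself, the `!Send` objects never cross threads; the caller only ever holds channel ends and a join handle, and a bring-up failure comes back as an `Err` it can log before continuing without audio.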
Format negotiation is deliberately thin: it always opens the **default Render** endpoint as a loopback capture (wasapi_cap.rs:108-111), asks for a fixed `WaveFormat::new(32,32,Float,48000,2)` (wasapi_cap.rs:116), and uses `StreamMode::EventsShared { autoconvert: true, .. }` (wasapi_cap.rs:119-122) so WASAPI's shared-mode SRC resamples to 48 kHz f32 stereo — **no resampling in Rust**. The capture loop event-waits with a 100 ms timeout to stay `stop`-responsive (wasapi_cap.rs:138), drains only whole `BLOCK_ALIGN`-sized frames (wasapi_cap.rs:152), reinterprets them as f32, and `try_send`s lossily (wasapi_cap.rs:161: drop-if-behind, the same discipline as PipeWire). **It is stereo-only and hard-rejects any other channel count** (wasapi_cap.rs:29-32).

##### Linux PipeWire (`crates/punktfunk-host/src/audio/linux.rs`)

Richer than the Windows path: it opens a sink-monitor capture (`STREAM_CAPTURE_SINK=true`, linux.rs:366) at the session channel count (1/2/6/8, linux.rs:34-37), with explicit SPA channel positions matching the GameStream FL FR FC LFE RL RR [SL SR] order (`spa_positions`, linux.rs:87-107) and a ~5 ms quantum (linux.rs:369). PipeWire's channel-mixer up/downmixes, so a stereo desktop still yields a valid 5.1/7.1 capture. Teardown is RAII via a `pipewire::channel` Terminate message (linux.rs:56-62), which is required so a wedged consumer doesn't head-block the daemon. It also implements the full virtual-mic source with an adaptive jitter buffer (linux.rs:160-333).

##### Transport — GameStream path (`crates/punktfunk-host/src/gamestream/audio.rs`)

Learns the client audio endpoint from a port-learning ping (audio.rs:305-315), reuses the persistent capturer and reopens it when the channel count changes (audio.rs:318-335), and builds a per-session `SessionEncoder`: the plain `opus` crate for stereo (RESTRICTED_LOWDELAY + hard CBR, audio.rs:357-365), or a hand-wrapped `audiopus_sys` multistream encoder for 5.1/7.1 (audio.rs:404-465, **Linux-only — Windows surround `bail!`s** at audio.rs:371-378).
Each frame is AES-128-CBC encrypted under the `/launch` rikey with a per-packet IV (audio.rs:540-544), wrapped in a 12-byte RTP header (payload type 97, audio.rs:213-222), and paced to its packet-duration slot (audio.rs:592-598). Surround sessions add Sunshine-compatible RS(4,2) Reed–Solomon audio FEC (audio.rs:194-248, 553-583) using a verbatim copy of the OpenFEC matrix.

##### Transport — native punktfunk/1 path (`crates/punktfunk-host/src/punktfunk1.rs:1379-1453`)

The same capturer feeds an Opus stereo encoder (128 kbps, LOWDELAY, CBR; punktfunk1.rs:1401-1414) whose output goes straight into `encode_audio_datagram` over QUIC (0xC9, punktfunk1.rs:1437-1441). QUIC already encrypts, so there is no AES-CBC/RTP/FEC layer. Stereo-only.

##### Shared lifecycle

Both transports use the persistent `AudioCapSlot` (gamestream/audio.rs:251-257) and the "audio is best-effort" convention: a capture-open failure logs a warning and the session continues without sound (punktfunk1.rs:1395-1399, gamestream/audio.rs:332-334).

**Intentional divergences (by design, not gaps):**

- Native protocol transport is fundamentally different and BETTER, not a gap: punktfunk/1 ships Opus as QUIC datagrams (0xC9, punktfunk1.rs:1437-1441). QUIC provides encryption and the data plane carries GF(2^16) Leopard FEC, so there is no AES-CBC/RTP/RS(4,2) Reed–Solomon layer as in GameStream/Apollo. This is inexpressible in Apollo's Moonlight-only world.
- One C-ABI core, cfg-dispatched backends: a single `AudioCapturer` trait (audio.rs:17-30) with PipeWire (Linux) and WASAPI (Windows) impls behind one `open_audio_capture` factory, vs Apollo's separate `src/audio.cpp` + per-platform `src/platform/*/audio.cpp`. punktfunk's capture, encode, and transport all stay synchronous on native threads (no async on the per-frame path).
- GameStream surround is more correct than Apollo: punktfunk rotates the surround-params mapping over [3, channels) (gamestream/audio.rs:159-171) so that 7.1 LFE/SL/SR round-trip after the client's GFE-order swap, whereas Sunshine/Apollo only rotate [3,6) and scramble 7.1. This is verified by a real-codec round-trip test (gamestream/audio.rs:668-688, 750-818).
- WASAPI silent-flag and buffer-straddle handling, which Apollo does by hand, is provided by the `wasapi` 0.23 crate's `read_from_device_to_deque` (wasapi_cap.rs:146) plus a byte `VecDeque` carrying the remainder (wasapi_cap.rs:135,152-156). This is not a missing behavior; it is delegated to the dependency, so Apollo's manual silent-flag lesson is largely covered already on the capture-correctness axis.
- Audio is explicitly best-effort and decoupled: a failed open just logs and the session streams video-only (punktfunk1.rs:1395-1399, gamestream/audio.rs:332-334). This is the same hard-won lesson Apollo encodes with its fail_guard, already adopted.

**Transfer candidates from Apollo (6):** _Detect default-render-device changes and reinit WASAPI capture_, _Raise the WASAPI capture thread to MMCSS Pro Audio priority_, _Support 5.1/7.1 WASAPI loopback + multistream encode on Windows_, _Implement client-mic passthrough on Windows_, _Surface WASAPI data-discontinuity as a glitch diagnostic_, _Recover when there is no render endpoint at session start_. See Part 4.

### Input handling & injection — 🔴 Apollo ahead

For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp).
punktfunk's Windows host covers mouse/keyboard/scroll/X360 only: touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and the Xbox 360 virtual pad is the only gamepad on Windows. Apollo also has the more efficient secure-desktop model (retry only on failure) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread, whereas punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner, and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo's, but those are punktfunk/1-specific; on the shared Windows-host injection surface, Apollo is the more complete, battle-tested implementation. punktfunk's docs/windows-secure-desktop.md already flags the retry-only refactor as planned but unshipped, confirming the gap.

**How punktfunk does it.**

#### Two protocol fronts, one platform-agnostic event, pluggable backends

punktfunk splits input cleanly. A **wire-decode layer** turns each protocol's bytes into the C-ABI-shaped `punktfunk_core::input::InputEvent` (`crates/punktfunk-core/src/input.rs:108-150`, a fixed-size `{kind:u8, code:u32, x:i32, y:i32, flags:u32}` struct), and a **backend layer** (`crate::inject::InputInjector` trait, `inject.rs:18-20`) replays that event into the OS. `inject::open(backend)` (`inject.rs:38-88`) dispatches to one of: wlroots virtual protocols, libei/EIS, gamescope EIS, uinput, or, on Windows, `SendInput` (`inject.rs:76-85`). `default_backend()` auto-picks `SendInput` on Windows (`inject.rs:109-112`).
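
A minimal sketch of the two layers, assuming the field order listed above and a `#[repr(C)]` layout; the real definition lives in `crates/punktfunk-core/src/input.rs`, and the trait signature here is illustrative rather than punktfunk's. `RecordingInjector` is a hypothetical test double standing in for a real backend such as `SendInput`:

```rust
/// Hypothetical mirror of the fixed-size event described above.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InputEvent {
    pub kind: u8,   // event discriminant (relative mouse, key down, ...)
    pub code: u32,  // key code or button id
    pub x: i32,
    pub y: i32,
    pub flags: u32, // e.g. the reference extent for absolute mouse
}

/// Backend layer: anything that can replay an event into the OS.
pub trait InputInjector {
    fn inject(&mut self, ev: &InputEvent) -> Result<(), String>;
}

/// Test double that records what it was asked to inject.
#[derive(Default)]
pub struct RecordingInjector {
    pub seen: Vec<InputEvent>,
}

impl InputInjector for RecordingInjector {
    fn inject(&mut self, ev: &InputEvent) -> Result<(), String> {
        self.seen.push(*ev);
        Ok(())
    }
}
```

With `#[repr(C)]`, the compiler inserts three padding bytes after `kind`, so this struct is 20 bytes with 4-byte alignment, and a C caller on the other side of the ABI sees the same layout.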
#### GameStream (Moonlight-compat) decode

`gamestream/input.rs:34-86` parses `NV_INPUT_HEADER`-framed packets: relative/absolute mouse, buttons, vertical (`0x0A`) + horizontal (`0x55000001`) scroll, and key down/up, masking off the GameStream `0x80` high byte to leave a low-byte VK (`input.rs:74`). `MAGIC_UTF8` is recognized as a constant (`input.rs:26`), but its body **falls through to `None`** (`input.rs:83-84`), so there is no Unicode text injection. `gamestream/gamepad.rs:72-111` decodes the Gen5+ `MULTI_CONTROLLER` packet (incl. `buttonFlags2` paddles/touchpad-click) and Sunshine's `CONTROLLER_ARRIVAL`, plus `rumble_plaintext()` (`gamepad.rs:116-125`) for the host→client back-channel.

#### Windows SendInput backend (`inject/sendinput.rs`)

- **Absolute mouse**: maps the client's `(0..w, 0..h)` reference extent (carried in `flags`) to `0..65535` with `MOUSEEVENTF_ABSOLUTE|MOUSEEVENTF_VIRTUALDESK` (`sendinput.rs:110-132`). It queries the virtual-desktop rect (`sendinput.rs:255-264`) but **only uses `w/h`**: the `(_vx,_vy,vw,vh)` values are discarded (`sendinput.rs:116,122`), so it assumes a single output spanning the whole virtual desktop.
- **Keyboard**: VK → scancode at runtime via `MapVirtualKeyExW(MAPVK_VK_TO_VSC_EX)`, sent with `KEYEVENTF_SCANCODE`; `KEYEVENTF_EXTENDEDKEY` is derived from the `0xE000` high bits plus a `forced_extended()` safety-net set (`sendinput.rs:206-230, 270-275`). There is no static layout table; it relies on the live OS layout.
- **Scroll/buttons**: WHEEL/HWHEEL with no sign flip (`sendinput.rs:188-205`), X1/X2 side buttons (`sendinput.rs:160-175`).
- **Secure-desktop survival**: `reattach_input_desktop()` (`OpenInputDesktop` + `SetThreadDesktop`) is called **at the top of every `inject()`** (`sendinput.rs:97`), and `send()` surfaces a partial/zero injection as `Err` so the host drops and reopens the injector (`sendinput.rs:71-82`).
- **Touch/pen**: explicit no-ops (`sendinput.rs:231-237`).
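
The candidate fix for the discarded virtual-desktop rect (map into the streamed output's rect, then normalize against the whole virtual desktop) is mostly arithmetic. A sketch under stated assumptions: `Rect` and `to_absolute` are hypothetical, the target rect would come from the output actually being captured, and the `(len - 1)` normalization is one common convention for `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK`, not something taken from either codebase:

```rust
/// A rectangle in virtual-desktop pixels (the origin may be negative).
#[derive(Clone, Copy)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Map a client point in its (0..ref_w, 0..ref_h) reference extent onto
/// `target` (the streamed output), then normalize against the whole virtual
/// desktop `desk` to SendInput's 0..=65535 absolute range.
pub fn to_absolute(px: i32, py: i32, ref_w: i32, ref_h: i32, target: Rect, desk: Rect) -> (i32, i32) {
    let axis = |p: i32, ref_len: i32, t_off: i32, t_len: i32, d_off: i32, d_len: i32| -> i32 {
        let p = p.clamp(0, ref_len - 1) as i64;
        // client pixel -> virtual-desktop pixel inside the target output
        let desk_px = t_off as i64 + p * t_len as i64 / ref_len as i64;
        // virtual-desktop pixel -> 0..=65535 across the whole virtual desktop
        ((desk_px - d_off as i64) * 65535 / (d_len as i64 - 1)) as i32
    };
    (
        axis(px, ref_w, target.x, target.w, desk.x, desk.w),
        axis(py, ref_h, target.y, target.h, desk.y, desk.h),
    )
}
```

With a second 1920-wide monitor to the left of the streamed one (virtual desktop x = -1920..1920), client x = 0 lands near 32776, on the streamed output, instead of at 0 on the far-left monitor.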
#### Windows ViGEm gamepad (`inject/gamepad_windows.rs`)

One virtual Xbox 360 pad per client index, lazily plugged on the first `State` (`gamepad_windows.rs:59-100`). XInput conventions map ~1:1, so `handle()` packs `XGamepad` directly (`gamepad_windows.rs:102-121`). Rumble flows back via ViGEm's notification API on a background thread into an `AtomicU32`, which `pump_rumble()` drains on change, scaling 255→65535 (`gamepad_windows.rs:71-88, 127-137`). Only X360 is supported: **no DualShock 4 / DualSense on Windows** (DualSense is Linux-UHID-only, `inject.rs:298-304`).

#### Threading — two different models

- **GameStream**: injection happens **inline on the ENet service thread**. `on_receive` calls `inj.inject(&ev)` directly inside the `host.service()` loop (`gamestream/control.rs:84-91, 207-211`). Rumble is pumped on each 2 ms tick (`control.rs:103-124`).
- **punktfunk/1 (m3)**: a per-session `input_thread` (`punktfunk1.rs:1245-1344`) receives decoded events over an mpsc channel and routes pointer/keyboard events to a **host-lifetime `injector_service_thread`** (`punktfunk1.rs:1011-1064`, `inj_tx.send(ev)` at `punktfunk1.rs:1300`) and gamepad events to the session `PadBackend`. It tracks `held_buttons`/`held_keys` and **synthesizes release events at session end** to avoid latched-button drag (`punktfunk1.rs:1267-1301, 1345-1368`). The injector service auto-reopens when the resolved backend changes mid-session (`punktfunk1.rs:1020-1029`).

**Intentional divergences (by design, not gaps):**

- One core, one event struct: both protocols decode to the same fixed-size `punktfunk_core::input::InputEvent` (input.rs:108-150), which crosses the C ABI unchanged, and all backends implement one `InputInjector` trait (inject.rs:18-20). Apollo bridges through the `platf::` abstraction but has no cross-process ABI, whereas punktfunk's event has the same shape host-side and client-side (the embeddable `NativeClient` sends it to the host via `punktfunk_connection_send_input`, punktfunk1.rs:2798).
- Runtime VK→scancode via `MapVirtualKeyExW` (sendinput.rs:209) instead of Apollo's compile-time static US-English table (keylayout.h). This intentionally respects the host's live keyboard layout rather than forcing 0x409; the trade-off (no fixed canonical mapping for games that demand exact set-1 scancodes) is the candidate improvement listed below, not a bug.
- The punktfunk/1 native path runs gamepads as a per-session `PadBackend` that can be a virtual DualSense (UHID hid-playstation) on Linux: a game sees a REAL DualSense with adaptive triggers/lightbar/touchpad/motion, with rich feedback flowing back over a dedicated 0xCD HID-output plane (punktfunk1.rs:1169-1235). Apollo emulates DS4 via ViGEm on Windows; punktfunk's UHID approach is a deliberately different, higher-fidelity Linux design (not applicable to the Windows host).
- Native threading model: the m3 path decouples ingest from injection via an mpsc channel to a host-lifetime injector service thread (punktfunk1.rs:1011-1064, 1300). This has the same anti-head-of-line-blocking intent as Apollo's task-pool queue, but is realized with Rust channels + native threads (no async, honoring the no-async-on-the-frame-path invariant). It also auto-reopens the injector when the active session/backend changes mid-stream (punktfunk1.rs:1020-1029), which Apollo has no analogue for.
- Session-end safety: the m3 input thread tracks held buttons/keys and synthesizes release events when the client vanishes mid-press (punktfunk1.rs:1267-1301, 1350-1368), preventing latched-button drag across reconnects. Apollo's per-stream `input_t` does not implement this robustness feature in the same way.
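
Release synthesis at session end can be illustrated with two sets. This is a hypothetical sketch (`Ev`, `HeldState`, and the release ordering are not punktfunk's types): every injected event updates the held sets, and when the client vanishes the host drains them into synthetic "up" events.

```rust
use std::collections::BTreeSet;

/// Hypothetical key/button event, reduced to what release synthesis needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ev {
    KeyDown(u32),
    KeyUp(u32),
    ButtonDown(u8),
    ButtonUp(u8),
}

/// Tracks what the client currently holds, so the host can release it if
/// the session ends mid-press.
#[derive(Default)]
pub struct HeldState {
    keys: BTreeSet<u32>,
    buttons: BTreeSet<u8>,
}

impl HeldState {
    /// Update the held sets from an event that is about to be injected.
    pub fn observe(&mut self, ev: Ev) {
        match ev {
            Ev::KeyDown(k) => { self.keys.insert(k); }
            Ev::KeyUp(k) => { self.keys.remove(&k); }
            Ev::ButtonDown(b) => { self.buttons.insert(b); }
            Ev::ButtonUp(b) => { self.buttons.remove(&b); }
        }
    }

    /// Drain everything still held into synthetic release events
    /// (buttons first, then keys; the ordering is an arbitrary choice).
    pub fn release_all(&mut self) -> Vec<Ev> {
        let mut out: Vec<Ev> = self.buttons.iter().map(|&b| Ev::ButtonUp(b)).collect();
        out.extend(self.keys.iter().map(|&k| Ev::KeyUp(k)));
        self.buttons.clear();
        self.keys.clear();
        out
    }
}
```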

**Transfer candidates from Apollo (6):** _Switch SendInput to retry-on-failure desktop reattach (drop per-event OpenInputDesktop)_, _Move GameStream input injection off the ENet service thread_, _Coalesce relative-mouse/scroll/controller spam before injection_, _Add virtual touch + pen on the Windows host via synthetic pointer devices_, _Map absolute mouse to the target output rect, not the whole virtual desktop_, _Add a static canonical VK→scancode table as a layout-independent fallback_. See Part 4.

### App/process launch & display configuration (Windows host) — 🔴 Apollo ahead

For the Windows host specifically, Apollo is clearly ahead on this subsystem. Apollo runs a full synchronous app lifecycle on Windows: prep/detached/main commands launched into the interactive user session via token impersonation + CreateProcessAsUserW (process.cpp run_command path), process-group lifetime tracking, env-var injection (`SUNSHINE_*`/`APOLLO_*`), an auto_detach heuristic for Steam/UWP launchers, and a separate libdisplaydevice settings manager that applies resolution/refresh/HDR on a RetryScheduler with revert-on-disconnect and a dedicated HDR-toggle workaround thread. punktfunk's Windows host has NO process launch at all (the cmd/library command only sets PUNKTFUNK_GAMESCOPE_APP, which only the Linux gamescope backend reads), no env injection for launched apps, no HDR set on the virtual display, no display-state revert beyond re-attaching physical monitors, and a single fixed MONITOR_GUID rather than a per-(app,client) stable identity. punktfunk's SudoVDA mode-setting (advertised-mode fallback, CDS_TEST, CDS_SET_PRIMARY, isolate-displays for the secure desktop) is genuinely solid and on par with Apollo's VD handling, but its launch + display-config breadth is well behind. Note that punktfunk is ahead of Apollo where Apollo can't go at all (its own punktfunk/1 protocol, mDNS discovery, native clients), but those are out of this subsystem's scope.
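
Per-app launch env injection, which Apollo does with its `SUNSHINE_*`/`APOLLO_*` variables, needs nothing beyond `std::process::Command` to build. A sketch only: the `PUNKTFUNK_CLIENT_*` names, `LaunchCtx`, and `command_with_session_env` are hypothetical. Note that `Command::spawn` creates the process in the caller's own session, so a host running as a service would still need `CreateProcessAsUserW` (or an equivalent) to reach the interactive desktop:

```rust
use std::process::Command;

/// Hypothetical per-session facts a launch script might want.
pub struct LaunchCtx {
    pub width: u32,
    pub height: u32,
    pub fps: u32,
    pub hdr: bool,
}

/// Build (but do not spawn) a command carrying per-session env vars. The
/// variable names are illustrative; punktfunk defines none of them today.
pub fn command_with_session_env(program: &str, args: &[&str], ctx: &LaunchCtx) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(args)
        .env("PUNKTFUNK_CLIENT_WIDTH", ctx.width.to_string())
        .env("PUNKTFUNK_CLIENT_HEIGHT", ctx.height.to_string())
        .env("PUNKTFUNK_CLIENT_FPS", ctx.fps.to_string())
        .env("PUNKTFUNK_CLIENT_HDR", if ctx.hdr { "1" } else { "0" });
    cmd
}
```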

**How punktfunk does it.**

#### Two-layer split, like Apollo, but thinner on Windows

**Layer 1: app catalog + library (`gamestream/apps.rs`, `library.rs`).** Two catalogs feed launches:

- `apps.rs` is the GameStream `/applist`/`/launch` catalog: `AppEntry { id, title, compositor: Option, cmd: Option }` loaded from `~/.config/punktfunk/apps.json` (`apps.rs:40-95`), defaulting to a "Desktop" entry plus gamescope `Steam`/`vkcube` entries gated on `which()` (`apps.rs:76-93`). Compositor strings parse via `parse_compositor` (`apps.rs:27-36`). `applist_xml()` hard-codes `IsHdrSupported=0` (`apps.rs:107`).
- `library.rs` is the punktfunk/1 unified game library: a `LibraryProvider` trait with a `SteamProvider` that reads local `appmanifest_*.acf` + `libraryfolders.vdf` (no Web API key) and a CRUD'd custom store. `steam_roots()` is `cfg`-split for Windows (`library.rs:147-156`, Program Files) and reads `libraryfolders.vdf` for off-drive libraries (`library.rs:174-194`). `launch_command()` resolves a client-sent store id to a shell command in the host's OWN library, so a client can never inject a command (`library.rs:394-412`); `steam_appid` is validated as digits-only before being appended to `steam steam://rungameid/`.

**Layer 2: virtual display (`vdisplay.rs` + `vdisplay/sudovda.rs`).** The `VirtualDisplay` trait (`vdisplay.rs:47-53`) is RAII: `create(mode) -> VirtualOutput { node_id, win_capture, keepalive }`, with teardown by dropping `keepalive`. On Windows, `open()` always returns the single `SudoVdaDisplay` (the compositor arg is moot, `vdisplay.rs:525-530`).

#### Windows launch + display-config flow

`punktfunk1.rs:532-543` (and its GameStream twin `stream.rs:93-97`) resolve the launch id to a command and call `std::env::set_var("PUNKTFUNK_GAMESCOPE_APP", &cmd)`, **but that env var is read only by the Linux gamescope backend** (`vdisplay/gamescope.rs:441`).
On Windows there is **no process launch at all**: SudoVDA's `create()` (`sudovda/sudovda.rs:448-543`) never spawns the app, and the only `CreateProcessAsUserW` in the crate is the WGC capture helper (`capture/wgc_relay.rs:6`), not app launch. So on Windows a session shows the bare desktop, and apps.json `cmd`/library titles are effectively dead.

`SudoVdaDisplay::create` does inline the display-config work that Apollo splits into a separate library:

- ADD-IOCTL at the client's exact WxH@Hz (`sudovda.rs:452-469`);
- a watchdog ping thread keyed to the driver's reported timeout (`sudovda.rs:481-494`);
- resolve `\\.\DisplayN` via CCD with a 15×200 ms retry (`sudovda.rs:499-505`);
- `set_active_mode()`, which enumerates advertised modes, falls back to the best refresh AT THE SAME RESOLUTION, and runs `CDS_TEST` before `CDS_SET_PRIMARY` (`sudovda.rs:146-265`);
- `isolate_displays()`, which detaches every physical display so the secure desktop renders on the VD (`sudovda.rs:276-329`).

Teardown (`sudovda.rs:559-580`) stops the pinger, calls `restore_displays()`, then REMOVEs the display by a **fixed** `MONITOR_GUID` (`sudovda.rs:59`, "one session at a time today").

**Intentional divergences (by design, not gaps):**

- punktfunk intentionally uses a per-compositor `VirtualDisplay` trait (KWin/gamescope/Mutter/wlroots on Linux, SudoVDA on Windows) with RAII teardown, rather than Apollo's single Windows-only IDD + a settings-manager library. It is the same SudoVDA IDD on Windows, but behind a unified Rust trait across OSes.
- punktfunk has TWO app surfaces by design: the GameStream apps.json catalog (Moonlight compat) AND a richer punktfunk/1 library (Steam local scan + custom store + CDN art + uniform GameEntry grid). Apollo has only the apps.json catalog because it ships no client.
- punktfunk's launch security model is deliberately client-can't-inject: the client sends only a store-qualified id and the host resolves it against its OWN library (library.rs:394-412), with the Steam appid validated as digits-only.
Apollo trusts its own apps.json cmds (it has no untrusted remote launch id).
- punktfunk keeps NO async on the per-frame path; the SudoVDA watchdog pinger and capture are native threads. Apollo's libdisplaydevice RetryScheduler is its own machinery; punktfunk has no equivalent scheduler by choice (so far; see the candidate improvements).
- punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically so that it can capture the secure/Winlogon desktop. This is a deliberate, documented design (docs/windows-secure-desktop.md) that goes beyond what stock Apollo needs.

**Transfer candidates from Apollo (6):** _Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)_, _Display-config apply/revert with a retry scheduler and guaranteed revert on disconnect_, _Set HDR on the virtual display and advertise IsHdrSupported when the client requests it_, _Per-(app,client) stable virtual-display GUID instead of one fixed MONITOR_GUID_, _Inject per-app launch env (client res/fps/HDR/audio + status) for launch scripts_, _auto_detach heuristic for launcher-style apps (Steam/UWP) that exit immediately_. See Part 4.

### Config, management/web UI, system tray — 🔴 Apollo ahead

On the API itself punktfunk is arguably ahead (versioned `/api/v1`, compile-time OpenAPI doc with a drift test, HTTPS-always, dual mTLS-or-bearer auth, constant-time token compare, delegated pairing-approval queue); all of this is cleaner than Apollo's hand-rolled SimpleWeb routes with one global session cookie.
But this subsystem is the whole operator/management *plane*, and on the parts that make a host pleasant to *operate* on a desktop OS, particularly Windows, Apollo is materially more mature:

- a native system tray with a state-driven icon + balloon notifications and a menu that drives the same primitives (`system_tray.cpp:238-412`), giving headless/desktop users a visible, clickable control surface that punktfunk completely lacks;
- a universal "launch the UI in the default browser" entry (`entry_handler.cpp:29` `launch_ui`), wired to the tray, the Windows shortcut, and PIN notifications;
- the in-binary Windows service<->UI two-process elevation dance (`config.cpp:1490-1534`: self-`runas` + `wait_for_ui_ready` + `launch_ui`), so a one-click Start-Menu launch Just Works;
- a self-contained, persisted, human-editable config file with a web config editor (`getConfig`/`saveConfig`, `confighttp.cpp:997/1049`), versus punktfunk's env-vars-everywhere approach with no host config file and no config endpoint.

punktfunk leans on external scripts and a separate web app for all of this. That is fine for a Linux-first daemon, but it is exactly the gap the user is unhappy about on the Windows host.

**How punktfunk does it.**

#### punktfunk's operator/management plane

punktfunk splits the control surface into three pieces and deliberately keeps them OUT of the host binary, where Apollo bundles them in.

##### 1. Management plane = a versioned REST API only (`crates/punktfunk-host/src/mgmt.rs`)

- An axum `Router` (`mgmt.rs:166` `fn app`) under `/api/v1`: a single source of truth shared between the live server and the `openapi` subcommand (`mgmt.rs:195` `api_router_parts`, `main.rs:86`). The OpenAPI 3.1 doc is generated at compile time with `utoipa`, and a checked-in copy is drift-tested against `docs/api/openapi.json` (`mgmt.rs:1582` `openapi_document_is_complete_and_checked_in`). This is a real maturity advantage over Apollo, which has no machine-readable API spec.
- Routes: host info/capabilities/port map (`mgmt.rs:590`), live status (`mgmt.rs:671`), paired GameStream clients list/unpair (`mgmt.rs:707`,`752`), the GameStream PIN flow (`mgmt.rs:789`,`814`), the native punktfunk/1 pairing surface (arm/disarm/status/list/unpair, `mgmt.rs:870`-`994`), **delegated pairing approval** via a pending-device queue (`mgmt.rs:1011`,`1049`,`1094`), session stop + force-IDR (`mgmt.rs:1120`,`1144`), and game-library CRUD (`mgmt.rs:1171`-`1252`).
- **HTTPS always, even on loopback** (`mgmt.rs:75` `run`): it runs the rustls handshake itself via tokio-rustls so it can surface the verified peer cert to handlers (`mgmt.rs:115` `serve_https`), reusing the host's persistent identity cert that clients already pin (`mgmt.rs:90`).
- **Dual auth** (`mgmt.rs:518` `require_auth`): a paired native client authenticates by its **mTLS certificate fingerprint** (matched against the native paired store, no token needed); everyone else (the web console / admin) uses a bearer token compared in constant time (`mgmt.rs:551` `token_eq`, which compares SHA-256 digests). `/api/v1/health` is the only unauthenticated route. This is stronger than Apollo's single-global-session-cookie scheme (Apollo's `confighttp.cpp` has exactly one `std::string sessionCookie`).

##### 2. Token bootstrap (`crates/punktfunk-host/src/mgmt_token.rs`)

A token always exists with zero operator steps: the env var `PUNKTFUNK_MGMT_TOKEN` wins, else the persisted `~/.config/punktfunk/mgmt-token`, else a freshly generated 32-byte hex token persisted with mode 0600 (`mgmt_token.rs:24` `load_or_generate`, `mgmt_token.rs:59` `write_token`). It is written in `KEY=VALUE` form so the bundled web console can source it as a systemd `EnvironmentFile`: a single shared secret with no copying.

##### 3. Config = env vars + CLI flags + JSON files (NO config file, NO config-editor endpoint)

- The whole host config surface is 50+ `PUNKTFUNK_*` environment variables (read at scattered points across the codebase) plus CLI flags parsed by hand in `main.rs` (`parse_serve` `main.rs:294`, `parse_m0` `main.rs:358`). There is **no persisted host config file** and **no `/api/config` get/set endpoint**; Apollo's `getConfig`/`saveConfig` have no punktfunk analog.
- The only persisted user state is a set of small JSON files in the config dir: `apps.json` (the GameStream app catalog, `gamestream/apps.rs:24`), `library.json` (custom game entries, written atomically via temp+rename, `library.rs:321` `save_custom`), `mgmt-token`, and the paired stores (`paired.json`, `punktfunk1-paired.json`). The library is the one config surface that hot-applies through the API; app-catalog editing is file-only.

##### 4. Web UI = a SEPARATE app, not embedded in the host

`web/` is a standalone TanStack Start console (Bun/Nitro) that talks to the API over a proxy; the host serves no HTML and embeds no assets (no `rust-embed`/`include_dir`/`ServeDir` anywhere in the crate). Apollo, by contrast, embeds a SimpleWeb HTTPS server + static assets + page routes directly in the host process.

##### 5. System tray = none

There is no system tray, no balloon notifications, and no "open the UI in the browser" entry point anywhere in `crates/punktfunk-host`. Apollo has a full cross-platform tray (`system_tray.cpp`) with state-driven icon/notification updates and menu callbacks.

##### 6. Windows launch story = scripts, not in-binary

The two-process secure-desktop design exists for *capture* (`main.rs:204` `wgc-helper` subcommand + `capture/wgc_relay.rs` `CreateProcessAsUserW`), but the service/desktop launch dance is handled by external scripts (scheduled task -> PsExec64 -> launch.vbs -> host-run.cmd; `docs/windows-host.md:77-96`).
punktfunk has no in-binary service install, no self-elevation, no "launch UI in browser", and no tray, all of which Apollo bakes into `config.cpp`/`entry_handler.cpp`/`system_tray.cpp`.

**Intentional divergences (by design, not gaps):**

- The web console is a SEPARATE TanStack Start app (web/) over the REST API, not assets embedded in the host process; Apollo embeds a SimpleWeb HTTPS server + static files in-binary (confighttp.cpp:1507). punktfunk's split keeps tokio/axum confined to the control plane and lets the console deploy/scale independently (it even deploys to a different VM). This is an intentional architecture choice, not a gap.
- A compile-time OpenAPI 3.1 document is generated from the route handlers themselves (mgmt.rs:195/227), with a checked-in copy drift-tested in CI (mgmt.rs:1582). Apollo has no machine-readable API spec. This is punktfunk ahead, kept deliberately as the codegen contract for clients.
- Configuration is env vars + CLI + small JSON files, with NO persisted host config file and NO config get/set endpoint; Apollo persists a flat key=value config and exposes a web config editor (confighttp.cpp:997/1049). punktfunk treats the host as a 12-factor-style daemon configured by environment. This is a design stance, though it costs the operator-friendly editor (see candidate improvements).
- Dual auth: a paired native client authenticates by its punktfunk/1 mTLS certificate fingerprint (mgmt.rs:525), reusing the exact identity + trust store the QUIC data plane uses, so paired devices need no separate web credential. Apollo only has username/password + a single global session cookie. This stems directly from punktfunk's native-protocol identity design.
- PIN-pairing semantics are inverted for the native protocol: the HOST mints + displays the PIN (SPAKE2 needs it client-side first), surfaced via `POST /native/pair/arm` returning the PIN (mgmt.rs:890), plus a delegated-approval pending-device queue (mgmt.rs:1011/1049).
Apollo's PIN flow is the GameStream client-displays-PIN model only. This follows from the punktfunk/1 SPAKE2 ceremony, not a deficiency.

**Transfer candidates from Apollo (6):** _Native system tray with state-driven icon + notifications_, _"Open management UI" browser-launch entry point_, _In-binary Windows service install + interactive-session launch_, _Persisted host config + read/write config API endpoint_, _Logs read endpoint for the console_, _Actually reject unpaired GameStream client certs (close the unpair gap)_. See Part 4.

---

## Part 3 — Windows host deep-dive (the focus)

Apollo's Windows host is the mature reference (Sunshine lineage). Each topic covers how **Apollo** does it, how **punktfunk** does it today and where it is weaker, and then the concrete transfer opportunities.

### DXGI Desktop Duplication capture pipeline & robustness (Windows host)

#### Apollo

#### Apollo DXGI Desktop Duplication — reference map

Apollo's Windows capture is a class hierarchy in `src/platform/windows/display.h` and three `.cpp` files. The shared base is `display_base_t` (display.h:157); the concrete leaves are `display_ddup_vram_t` (HW encode, display.h:316) and `display_ddup_ram_t` (SW encode, display.h:303), plus WGC siblings. `duplication_t` (display.h:286) wraps the actual `IDXGIOutputDuplication`. The `display()` factory (display_base.cpp:996) selects DDA-vram / DDA-ram / WGC-vram / WGC-ram by the capture-method config + `mem_type_e`.

##### Device / context / adapter+output enumeration (display_base.cpp:431-741)

- One-time process init under `std::call_once`: `SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2)` (display_base.cpp:443) and, critically, a **MinHook of `win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue`** (display_base.cpp:452, hook at :417) that forces the GPU preference to `UNSPECIFIED`.
**This is the single most important hard-won detail**: it prevents DXGI "output reparenting", where a hybrid/multi-GPU system moves an output to the render GPU, which **breaks DDA** (comment at display_base.cpp:418-422).
- `CreateDXGIFactory1` (display_base.cpp:463), then an `EnumAdapters1` adapter loop and an `EnumOutputs` output loop filtered by the configured adapter/output names (display_base.cpp:474-526). For each candidate it checks `desc.AttachedToDesktop` AND runs `test_dxgi_duplication()` (display_base.cpp:347), a probe that actually creates a throwaway device and calls `DuplicateOutput`, so an output that *can't* be duplicated is never selected.
- **Two-try outer loop** (display_base.cpp:473): if no output is found on the first pass, it calls `SetThreadExecutionState(ES_DISPLAY_REQUIRED); Sleep(500)` to **power the display on** and retries (display_base.cpp:533).
- Captures geometry incl. `display_rotation` and swaps width/height for 90/270 rotation (display_base.cpp:503-511); normalizes offsets to the virtual-desktop origin (display_base.cpp:515).
- Real device: `D3D11CreateDevice` with `D3D_DRIVER_TYPE_UNKNOWN` + an explicit adapter and a full `featureLevels` ladder 11_1→9_1 (display_base.cpp:544-571).
- **GPU scheduling priority hardening**: enables the `SE_INC_BASE_PRIORITY_NAME` privilege (display_base.cpp:599), queries HAGS via `D3DKMTQueryAdapterInfo`/`WDDM_2_7_CAPS` (display_base.cpp:620-657), and sets `D3DKMTSetProcessSchedulingPriorityClass` to REALTIME, but **downgrades NVIDIA+HAGS to HIGH** because realtime can cause an unrecoverable NVENC freeze/driver crash (display_base.cpp:663-669). It also calls `SetGPUThreadPriority(0x4000001E)` with a relative-priority fallback (display_base.cpp:688), and `SetMaximumFrameLatency(1)` (display_base.cpp:709).
- Logs the full HDR `DXGI_OUTPUT_DESC1` via `IDXGIOutput6` (display_base.cpp:716-733).
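
Two of the enumeration-time decisions above reduce to small pure functions. A sketch with hypothetical names (`GpuPriorityClass`, `gpu_priority`, `capture_size`); the real code makes these calls through D3DKMT and DXGI, and the constants here are the documented `DXGI_MODE_ROTATION` values and NVIDIA's PCI vendor id:

```rust
/// The two D3DKMT scheduling priority classes the text mentions.
#[derive(Debug, PartialEq, Eq)]
pub enum GpuPriorityClass {
    High,
    Realtime,
}

/// PCI vendor id reported for NVIDIA adapters (DXGI_ADAPTER_DESC1::VendorId).
const VENDOR_NVIDIA: u32 = 0x10DE;

/// Realtime GPU scheduling, except NVIDIA with HAGS enabled, where realtime
/// is reported to freeze NVENC; fall back to High there.
pub fn gpu_priority(vendor_id: u32, hags_enabled: bool) -> GpuPriorityClass {
    if vendor_id == VENDOR_NVIDIA && hags_enabled {
        GpuPriorityClass::High
    } else {
        GpuPriorityClass::Realtime
    }
}

/// DXGI_MODE_ROTATION: UNSPECIFIED=0, IDENTITY=1, ROTATE90=2, ROTATE180=3,
/// ROTATE270=4. A 90 or 270 degree rotation swaps width and height.
pub fn capture_size(width: u32, height: u32, rotation: u32) -> (u32, u32) {
    match rotation {
        2 | 4 => (height, width),
        _ => (width, height),
    }
}
```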
##### Duplication init (display_base.cpp:47-125)

- Prefers `IDXGIOutput5::DuplicateOutput1` with a list of **encoder-supported formats** (`get_supported_capture_formats()`, display.h:236) so it can request wide-gamut/HDR formats; falls back to `IDXGIOutput1::DuplicateOutput` if Output5 is unavailable (display_base.cpp:57-110).
- `capture_format` starts as `DXGI_FORMAT_UNKNOWN` and is only learned from the **first acquired frame** (display_base.cpp:51, learned in display_vram.cpp:1226).
- **Each DuplicateOutput is wrapped in a 2-iteration retry with `syncThreadDesktop()` + a 200 ms sleep** (display_base.cpp:67-76) to survive racing a desktop/mode change at startup. It deliberately does NOT fall back from Output5 to Output1 on failure, to avoid silently degrading capture during a mode-change race (comment display_base.cpp:78).

##### AcquireNextFrame contract (display_base.cpp:127-157)

`duplication_t::next_frame` first **releases the previous frame** (`release_frame`, which must precede every Acquire), then calls `AcquireNextFrame` and maps the HRESULT → `capture_e`:

- `S_OK` → ok; tracks `ProtectedContentMaskedOut` and rate-limits the DRM warning to once per 10 s (display_base.cpp:140).
- `DXGI_ERROR_WAIT_TIMEOUT` → `timeout`.
- `WAIT_ABANDONED`, `DXGI_ERROR_ACCESS_LOST`, `DXGI_ERROR_ACCESS_DENIED` → `reinit`.
- anything else → `error`.

`release_frame` treats `DXGI_ERROR_INVALID_CALL` (already released) as OK and `ACCESS_LOST` as reinit (display_base.cpp:167-189).

##### The capture loop & frame pacing (display_base.cpp:195-338) — the crown jewel

- `SetThreadExecutionState(ES_CONTINUOUS | ES_DISPLAY_REQUIRED)` is held for the whole loop and restored on exit (display_base.cpp:224), keeping the display awake; the comment explains that an idle machine could otherwise oscillate sleep↔reinit forever (display_base.cpp:220-223).
- Calls `factory->IsCurrent()` **once per frame** (display_base.cpp:235); if false → `reinit` (HDR toggles, GPU/driver changes).
- **"Frame pacing group"** algorithm: it computes an exact `sleep_target` from a group start timestamp + accumulated frame count, using rational arithmetic over the **client** framerate (display_base.cpp:243-249), sleeps with the high-precision timer down to 2 ms, then calls `snapshot()` with a tiny (2 ms "elastic") timeout. This decouples capture cadence from display refresh and prevents drift. `adjust_client_frame_rate()` (display_base.cpp:196) snaps the request to an integer divisor of a non-integral refresh (e.g. 59.94) and only ever *lowers* fps, so the client doesn't accumulate latency.
- **Idle/static-content handling**: when a pacing group can't continue, it falls back to a long-timeout acquire, `snapshot(..., 200ms, ...)` (display_base.cpp:278). On `timeout` it sleeps 10 ms (`sleep_for(10ms)`) **without the duplication lock held** (display_base.cpp:307). A multi-paragraph comment (display_base.cpp:289-306) explains why: `AcquireNextFrame` holds an **unfair D3D11 device lock** the whole time it blocks, which starves the encode thread (it needs the same `ID3D11Device` to build shared textures) and turns encoder reinit from milliseconds into *seconds*. On a timeout it still pushes the (null) image with `frame_captured=false` so the encoder repeats the previous frame (display_base.cpp:316).

##### Empty-frame / update-flag detection (display_vram.cpp:1162-1168, display_ram.cpp:183-189)

`AcquireNextFrame` can return `S_OK` with **no actual change** (a mouse-only update, or even nothing at all). Apollo computes `mouse_update_flag` (`LastMouseUpdateTime != 0 || PointerShapeBufferSize > 0`) and `frame_update_flag` (`LastPresentTime != 0`; the RAM path also checks `AccumulatedFrames`), and if neither is set it returns `capture_e::timeout`. In other words, an S_OK-but-empty frame is treated like a timeout (repeat the last frame).
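
The `AcquireNextFrame` contract, the empty-frame rule, and the pacing-group target can be sketched together. Everything here is illustrative Rust, not Apollo's C++: `Capture`, `FrameInfo`, `classify`, and `sleep_target` are hypothetical names, the HRESULTs are written as `u32` with their Windows SDK values, and the client frame rate is passed as a rational `num/den` (60000/1001 for 59.94):

```rust
use std::time::{Duration, Instant};

/// Result kinds mirroring the `capture_e` values named above.
#[derive(Debug, PartialEq, Eq)]
pub enum Capture { Ok, Timeout, Reinit, Error }

const S_OK: u32 = 0x0000_0000;
const WAIT_ABANDONED: u32 = 0x0000_0080;
const DXGI_ERROR_ACCESS_LOST: u32 = 0x887A_0026;
const DXGI_ERROR_WAIT_TIMEOUT: u32 = 0x887A_0027;
const DXGI_ERROR_ACCESS_DENIED: u32 = 0x887A_002B;

/// The subset of DXGI_OUTDUPL_FRAME_INFO the empty-frame rule reads.
pub struct FrameInfo {
    pub last_present_time: i64,
    pub last_mouse_update_time: i64,
    pub accumulated_frames: u32,
    pub pointer_shape_buffer_size: u32,
}

/// Classify one AcquireNextFrame result; `ram_path` adds the
/// AccumulatedFrames check that the RAM path performs.
pub fn classify(hr: u32, info: &FrameInfo, ram_path: bool) -> Capture {
    match hr {
        S_OK => {}
        DXGI_ERROR_WAIT_TIMEOUT => return Capture::Timeout,
        WAIT_ABANDONED | DXGI_ERROR_ACCESS_LOST | DXGI_ERROR_ACCESS_DENIED => return Capture::Reinit,
        _ => return Capture::Error,
    }
    let mouse_update = info.last_mouse_update_time != 0 || info.pointer_shape_buffer_size > 0;
    let frame_update = info.last_present_time != 0 || (ram_path && info.accumulated_frames > 0);
    // S_OK with neither flag set is treated like a timeout: repeat the last frame.
    if mouse_update || frame_update { Capture::Ok } else { Capture::Timeout }
}

/// Pacing-group target: group start + `frames` frame intervals at
/// `fps_num/fps_den` fps, computed exactly in integer nanoseconds.
pub fn sleep_target(group_start: Instant, frames: u64, fps_num: u64, fps_den: u64) -> Instant {
    let nanos = frames as u128 * 1_000_000_000u128 * fps_den as u128 / fps_num as u128;
    group_start + Duration::from_nanos(nanos as u64)
}
```

Computing each target from the group start, rather than adding a rounded per-frame interval to the previous target, keeps rounding error from accumulating: frame 60 at 60000/1001 fps lands exactly 1.001 s after the group start.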
##### Mode-change / format-change mid-stream (display_vram.cpp:1215-1236, display_ram.cpp:253-265, wgc 1662-1674)

On every frame it reads the acquired texture's `D3D11_TEXTURE2D_DESC` and:

- if `desc.Width/Height != width_before_rotation/height_before_rotation` → logs "Capture size changed" → `reinit` (race between enumeration and mode change).
- if `capture_format != desc.Format` (after the format is first learned) → "Capture format changed" → `reinit`.

This is the explicit path for **resolution and bit-depth (HDR-toggle) changes mid-stream**.

##### Static-content cursor compositing & the surface state machine (display_vram.cpp:1239-1556)

The VRAM path has a 5-state `last_frame_action` × 3-state `out_frame_action` machine (display_vram.cpp:1239-1306). Because cursor updates can arrive *without* a new desktop frame, Apollo keeps the last frame either as a shared `img_t` (forwardable with zero copy) or as an intermediate `texture2d_t` "surface" onto which it re-blends the cursor each time the pointer moves; the two forms (plus an empty `std::monostate`) are held in a `std::variant` (display.h:337). It **delays destruction of the intermediate surface by 10 s** (display_vram.cpp:1383, reaped at :1548) in case the cursor reappears, to avoid churning a 5K texture allocation. Cursor blending is done on the GPU via alpha-blend + invert-blend states and dedicated VS/PS (display_vram.cpp:1448-1478); cursor shapes (color / masked-color / monochrome) are decoded in `make_cursor_alpha_image`/`make_cursor_xor_image` (display_vram.cpp:210-350).

##### HW path: shared keyed-mutex texture pool (display_vram.cpp:1714-1786)

Capture images (`img_d3d_t`) are `D3D11_USAGE_DEFAULT` textures created with `D3D11_RESOURCE_MISC_SHARED_NTHANDLE | SHARED_KEYEDMUTEX` (display_vram.cpp:1766). The capture device and the *separate* encoder device synchronize via an `IDXGIKeyedMutex` per image (`AcquireSync(0, INFINITE)`, display_vram.cpp:200, 408); the encoder opens the texture with `OpenSharedResource1` (display_vram.cpp:869).
`complete_img` is explicitly **safe to call from the encode thread** (display_vram.cpp:1726) and recreates the texture when the format becomes known or the dummy↔real state flips. Before the format is known, capture emits a **black dummy image** (display_vram.cpp:1256, `dummy_img`) so the encoder pipeline is alive immediately.

##### RAM path (display_ram.cpp)

A single `D3D11_USAGE_STAGING` + `CPU_ACCESS_READ` texture is created lazily once the format is known (display_ram.cpp:235-251). The path is `CopyResource` GPU→staging, `Map`, then `std::copy_n` of `height * RowPitch` bytes (display_ram.cpp:300). Note that it copies the **padded** rows (keeps `RowPitch`), reallocating `img->data` only when the pitch changes (display_ram.cpp:347). The cursor is blended on the CPU (`blend_cursor`, display_ram.cpp:155).

##### Enumeration discipline & re-enumeration (display_base.cpp:1033-1127)

`display_names()` calls `syncThreadDesktop()` **once** before enumerating all GPUs (comment display_base.cpp:1040-1044: "either fully succeed or fully fail, otherwise the capture code switches monitors unexpectedly"), and uses `test_dxgi_duplication(..., enumeration_only=true)`, which skips the per-output desktop resync and bails immediately on `E_ACCESSDENIED` (display_base.cpp:401). `needs_encoder_reenumeration()` (display_base.cpp:1103) keeps a static factory and uses `IsCurrent()` to detect GPU/driver hot-changes between sessions; it always returns true on the first session to absorb boot-time races.

##### Frame timestamps (display_vram.cpp:1170-1174)

QPC values from `LastPresentTime`/`LastMouseUpdateTime` are translated to `steady_clock` via `qpc_time_difference` and stamped on the image (`frame_timestamp`). This is what feeds the pacing group and downstream latency math.

#### punktfunk today

#### punktfunk DXGI today (crates/punktfunk-host/src/capture/dxgi.rs, 1647 lines)

`DuplCapturer` is a single struct, not a class hierarchy.
It is purpose-built for **one SudoVDA virtual output** (not arbitrary monitors), which simplifies a lot but also means several Apollo robustness behaviors were never ported.

**What it does well (often *better* than Apollo for its niche):**

- Output enumeration searches **all adapters by GDI name** with a 2 s settle-retry, because the SudoVDA monitor is reparented under the *render* GPU (dxgi.rs:781-861). punktfunk handles the multi-GPU reparenting case by *searching* rather than by Apollo's MinHook approach.
- Secure-desktop survival is more advanced than Apollo's: `attach_input_desktop` (dxgi.rs:166), re-resolving the GDI name from a stable `target_id` (dxgi.rs:1259), `reassert_isolation` (dxgi.rs:1267), and a **born-lost probe** after every rebuild (dxgi.rs:1278-1294, 1232) to reject a duplication for which DuplicateOutput returns S_OK but which dies on the first Acquire.
- **Tiered recovery**: a cheap `try_reduplicate` on the same device for ACCESS_LOST churn (dxgi.rs:1219) vs. a throttled full `recreate_dupl` for output loss (dxgi.rs:1409-1420). Handles `ACCESS_LOST | INVALID_CALL | DEVICE_REMOVED | DEVICE_RESET` (dxgi.rs:1378-1382); INVALID_CALL recovery is something Apollo's DDA path does *not* special-case.
- A mid-stream **size** change is adopted across a rebuild (dxgi.rs:1304-1313), and HDR FP16 vs BGRA format is re-detected (dxgi.rs:1323). Full HDR scRGB-FP16 → BT.2020 PQ 10-bit conversion path (dxgi.rs:469-563, 1428-1467).
- Cursor compositing on the GPU (SDR + HDR-aware) and the CPU, with color/masked/monochrome decode (dxgi.rs:566-655).
- The first frame gets a generous 2000 ms timeout (dxgi.rs:1348), and a `nudge_cursor_onto` kick wakes a blank virtual display (dxgi.rs:184, 886).

**Where it is weaker / missing / fragile:**

1. **No frame pacing.** The only cadence control is `AcquireNextFrame(timeout_ms)` where `timeout_ms = 2000/refresh` (dxgi.rs:899-902).
   There is no client-framerate pacing group, no rational refresh-rate snapping (Apollo display_base.cpp:196-216, 243-274), and no high-precision timer. Cadence is whatever the compositor presents at; any pacing happens elsewhere, in the encode loop.
2. **The unfair-lock starvation problem is not addressed.** On a static desktop punktfunk sits in `AcquireNextFrame(timeout_ms)` holding the device lock; there is no "sleep 10 ms without the lock held" mitigation (Apollo display_base.cpp:289-307). punktfunk uses one device for both capture and (zero-copy) NVENC, so encoder init can be starved the same way Apollo describes.
3. **Empty-frame detection is missing.** `acquire` (dxgi.rs:1353-1360) treats any `S_OK` as a real frame and copies/encodes it. It never checks `LastPresentTime`/`LastMouseUpdateTime`/`AccumulatedFrames`, so a mouse-only or empty S_OK frame still triggers a full `CopyResource`+encode (wasted work plus a possibly stale "new" frame). Compare Apollo display_vram.cpp:1162-1168.
4. **No mode-change-by-desc detection on the hot path.** punktfunk only re-reads the size *during a rebuild* (dxgi.rs:1298). On the normal acquire path it never compares the acquired texture's `GetDesc().Width/Height/Format` against its own `width/height` (Apollo display_vram.cpp:1215-1236). If a live mode change does *not* fire ACCESS_LOST, punktfunk would `CopyResource` a mismatched-size source into a stale-sized `gpu_copy`/`staging` (silent corruption or a copy error) instead of cleanly reinitializing.
5. **No display keepalive.** Nothing sets `ES_DISPLAY_REQUIRED` for the capture lifetime (Apollo display_base.cpp:224). On a virtual output this is usually moot, but idle physical-monitor capture is exactly the wake/sleep oscillation Apollo specifically guards against.
6. **No per-frame `factory.IsCurrent()` check.** GPU/driver hot-changes and some HDR transitions are only caught reactively via ACCESS_LOST, not proactively (Apollo display_base.cpp:235).
7. **No GPU scheduling priority / SetMaximumFrameLatency / HAGS-aware NVENC realtime avoidance.** None of display_base.cpp:599-713 is ported, notably the NVIDIA+HAGS realtime→high downgrade that prevents NVENC freezes.
8. **No DRM `ProtectedContentMaskedOut` handling/log** (Apollo display_base.cpp:140).

#### Transfer opportunities

- **Treat S_OK-with-no-change frames as timeouts via DXGI update flags** (sev high, medium) — In dxgi.rs acquire(), after a successful AcquireNextFrame, compute frame_update_flag = info.LastPresentTime != 0 (and/or info.AccumulatedFrames != 0) and mouse_update_flag from LastMouseUpdateTime/PointerShapeBufferSize. Always call update_cursor (mouse). If !frame_update_flag, ReleaseFrame and return Ok(None) (so next_frame repeats last_present) UNLESS the cursor moved and needs a recomposite; in that case, recomposite onto the existing last_present texture instead of CopyResource'ing the source. This cuts idle/cursor-only GPU load and avoids re-encoding unchanged content.
- **Detect resolution/format change on the acquire hot path, not only during rebuild** (sev high, small) — In acquire(), after casting the acquired resource to ID3D11Texture2D, call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop the gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution and HDR-toggle changes robust even when DDA doesn't fault.
- **Release the duplication device lock during idle to avoid encoder starvation** (sev medium, small) — Cap the per-acquire DDA timeout to a small value (e.g. 8-16 ms) and, when it returns WAIT_TIMEOUT, std::thread::sleep a few ms with no outstanding AcquireNextFrame before retrying, so the encode thread can grab the device for NVENC setup/reinit. Keep the generous timeout only for first_frame. Low risk; directly mirrors Apollo's documented fix.
- **Add client-framerate frame pacing with a high-precision timer** (sev medium, large) — Add an optional pacing layer (in dxgi.rs or the encode-loop caller in punktfunk1.rs/encode.rs) keyed on the negotiated client framerate: track a group start from the frame pts, sleep to the computed target with a Windows high-resolution timer (timeBeginPeriod or CREATE_WAITABLE_TIMER_HIGH_RESOLUTION), and snap near-integral refresh rates to integer divisors. This is the lever for steady pacing on odd refresh rates without changing the zero-copy design.
- **Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance** (sev medium, medium) — After D3D11CreateDevice in dxgi.rs (and wherever the NVENC encoder device is built), query IDXGIDevice1 and call SetMaximumFrameLatency(1) and SetGPUThreadPriority; load D3DKMTSetProcessSchedulingPriorityClass from gdi32 and request HIGH (not REALTIME) when the adapter is NVIDIA (VendorId 0x10DE) with HAGS on, REALTIME otherwise. Mirror Apollo's privilege enabling. Guard all of this behind admin/SYSTEM (the host already relaunches as SYSTEM).
- **Retry DuplicateOutput at startup and request encoder-supported formats via Output5** (sev medium, small) — In open(), wrap DuplicateOutput in a short retry (2-3 tries, ~200 ms apart, re-running attach_input_desktop between tries) before bailing. Optionally cast the output to IDXGIOutput5 and call DuplicateOutput1 with an explicit format list (BGRA8 for SDR, R16G16B16A16_FLOAT for HDR) so the capture format is intentional rather than incidental, falling back to DuplicateOutput when Output5 is absent.

### Windows.Graphics.Capture (WGC) path — Apollo vs punktfunk

#### Apollo

#### Apollo's WGC implementation (`src/platform/windows/display_wgc.cpp`, `display_base.cpp`, `display_vram.cpp`)

Apollo (Sunshine lineage) treats WGC as a **first-class peer backend to DDA**, sharing the entire `display_base_t` capture loop, image pool, and frame-pacing machinery.
The WGC-specific code is small (`display_wgc.cpp` is 347 lines) because the hard-won infrastructure lives in `display_base_t::capture()` and is reused unchanged.

##### Backend selection (DDA vs WGC) — `display_base.cpp:996`

`platf::display(...)` is config-driven, with **DDA preferred and WGC as the fallback**:

- `config::video.capture == "ddx" || empty` → try `display_ddup_vram_t`/`display_ddup_ram_t` first (`display_base.cpp:997-1011`).
- `config::video.capture == "wgc" || empty` → try `display_wgc_vram_t`/`display_wgc_ram_t` (`display_base.cpp:1013-1027`).
- If both `init()` calls fail, return `nullptr` (`:1030`).

So the default ("empty") order is **DDA first, then WGC**, the opposite of punktfunk's default. WGC is the fallback for the cases DDA can't handle (overlay/MPO HDR animations, ACCESS_LOST churn), but DDA is preferred for its lower overhead and secure-desktop ability.

- Note: both WGC backends still call `display_base_t::init()` (`display_vram.cpp:1707`), which itself runs `test_dxgi_duplication()` during output enumeration (`display_base.cpp:495`). In other words, Apollo's display selection is gated on DDA being *enumerable* even for the WGC path; WGC reuses the same `output`/`device`/`adapter` chosen by the DDA-oriented init.

##### Device + WinRT bring-up — `display_wgc.cpp:84-158`

1. `GraphicsCaptureSession::IsSupported()` gate (`:89`), which bails cleanly on old Windows.
2. Reuses the **already-created** `display->device` (the D3D11 device chosen in `display_base_t::init`), calls `QueryInterface(IID_IDXGIDevice)` (`:93`), then `CreateDirect3D11DeviceFromDXGIDevice` → WinRT `IDirect3DDevice` (`:97,107`). This guarantees the frame-pool textures land **on the same D3D11 device that owns the shared capture images**, so handing a frame off is a single GPU-side `CopyResource` (`display_vram.cpp:1686`) with no CPU round-trip.
3. `GetDesc()` for the monitor HMONITOR (`:108`), then `get_activation_factory()` for `IGraphicsCaptureItemInterop` → `CreateForMonitor(output_desc.Monitor, ...)` (`:110-112`). One capture item per monitor.
4. The capture format is chosen from `config.dynamicRange`: `R16G16B16A16_FLOAT` (HDR scRGB), else `B8G8R8A8_UNORM` (`:117-121`).

##### Frame pool — `display_wgc.cpp:124`

`Direct3D11CaptureFramePool::CreateFreeThreaded(uwp_device, format, 2, item.Size())`. **Two buffers**, free-threaded: the callback fires on a WinRT threadpool thread, not a UI/DispatcherQueue thread, which is critical for a headless host with no message pump. `CreateCaptureSession(item)` then `FrameArrived(...)` registers the producer callback (`:125-126`).

##### Border removal — `display_wgc.cpp:131-139`

`IsBorderRequired(false)` kills the yellow capture rectangle, but it is **guarded by `ApiInformation::IsPropertyPresent("...GraphicsCaptureSession", "IsBorderRequired")`** (`:132`) and wrapped in try/catch, because older Windows builds lack the property and would throw. It logs a warning instead of failing. This is the canonical hard-won pattern: every optional WGC property is feature-probed, never assumed.

##### MinUpdateInterval — `display_wgc.cpp:140-150`

Also `ApiInformation`-probed. Apollo sets it to `10000000 / (config.framerate * 2)` 100 ns ticks, i.e. it allows WGC to deliver up to **2× the client framerate** so the consumer pacing loop has fresh frames to choose from, rather than capping at exactly the client rate. Wrapped in try/catch.

##### Producer/consumer threading — `display_wgc.cpp:165-221`

- **Producer** (`on_frame_arrived`, runs on a WGC threadpool thread): `TryGetNextFrame()`, take the `SRWLOCK` exclusively, **`Close()` the previous un-consumed `produced_frame`** (`:177`; drops stale frames, keeps latency at one frame), store the new frame, release the lock, `WakeConditionVariable` (`:165-183`).
- **Consumer** (`next_frame`, runs on the capture thread): `release_frame()` first to free the last consumed frame, then take the lock and `SleepConditionVariableSRW` with a timeout (`:196-204`); distinguishes `ERROR_TIMEOUT` → `capture_e::timeout` from other errors → `error`.
  Handles **spurious wakeups** (`consumed_frame == nullptr` → timeout, `:210`). Then it casts `Surface()` to `IDirect3DDxgiInterfaceAccess` and calls `GetInterface(IID_ID3D11Texture2D, out)` to extract the texture (`:214-218`).
- Timestamp: `consumed_frame.SystemRelativeTime().count()`, i.e. **raw QPC ticks** (`:219`), later converted to steady_clock via `qpc_time_difference` in the snapshot (`display_vram.cpp:1658`). This is the real per-frame presentation time, not wall-clock "now".

##### Cursor toggle — `display_wgc.cpp:231-240`

`set_cursor_visible(bool)` reads `IsCursorCaptureEnabled()` and only writes if the value changed (avoiding redundant COM calls), wrapped in try/catch. Crucially, it is called **every snapshot** from `display_*_t::snapshot(... cursor_visible)` (`display_vram.cpp:1652`), so the host can toggle OS cursor compositing live mid-stream (Moonlight tells the host whether the client renders its own cursor). When enabled, the OS composites the cursor into the WGC frame, so no manual blend pass is needed.

##### VRAM (zero-copy) snapshot — `display_vram.cpp:1649-1700`

- `next_frame` → src texture, then `GetDesc` (`:1660`).
- **Size change → `capture_e::reinit`**, comparing against `width_before_rotation`/`height_before_rotation` (`:1664`). Note that WGC textures are pre-rotation; Apollo's `display_base_t::init` tracks rotation (`display_base.cpp:503-511`).
- **Format change → `reinit`** (`:1671`).
- `pull_free_image_cb` for a recycled image, then a **`texture_lock_helper` (a keyed mutex)** around `CopyResource` into the encoder-visible `capture_texture` (`:1684-1686`). The capture and encode D3D11 devices are different, so the shared texture is guarded by a keyed mutex (`complete_img` allocates it).

##### Frame pacing & reinit loop (shared, in `display_base.cpp:195-338`) — the real value

This is the machinery WGC inherits for free:

- **`factory->IsCurrent()` checked once per frame** (`:235`) → `reinit` on any GPU/display/HDR-state change. This is how Apollo notices HDR toggles, mode changes, and GPU hotplug.
- **Frame-pacing groups** (`:217-309`): tracks a group start timestamp + frame count, computes a precise sleep target from the (rational) adjusted client framerate, sleeps via a high-precision `timer` to within 2 ms, then does an elastic 2 ms `snapshot`. This avoids accumulating latency while matching display cadence. `adjust_client_frame_rate()` (`:196`) snaps to an integer multiple or divisor of the real refresh for non-integral rates.
- **`SetThreadExecutionState(ES_DISPLAY_REQUIRED)`** during capture (`:224`) keeps the display awake so capture doesn't stall or oscillate between sleep and wake.
- The famous **unfair-lock starvation comment** (`:289-308`): when idle (DDA path), `AcquireNextFrame` holds a lock that starves the encode thread; Apollo sleeps 10 ms unlocked at max timeout. (WGC doesn't hold that lock, but shares the loop.)

##### HDR plumbing — `display_base.cpp:743-809`

`is_hdr()` checks `IDXGIOutput6::GetDesc1().ColorSpace == G2084_NONE_P2020` (`:755`). `get_hdr_metadata()` (`:758`) reports **Rec.2020 primaries + D65** (hard-overridden, `:780-787`) regardless of the reported scRGB primaries, with a long comment explaining the reasoning (clients mostly ignore primaries; luminance is what matters). DRM-protected content: the DDA path logs `ProtectedContentMaskedOut`, rate-limited (`display_base.cpp:140`).

##### GPU-preference hook (affects WGC too) — `display_base.cpp:417-454`

A `MinHook` of `NtGdiDdDDIGetCachedHybridQueryValue` in win32u.dll forces `D3DKMT_GPU_PREFERENCE_STATE_UNSPECIFIED`, preventing DXGI **output reparenting** (moving an output onto the render GPU), which breaks capture on hybrid-GPU laptops. Apollo also sets realtime GPU scheduling priority with an NVIDIA-HAGS carve-out (`:663-672`) and `SetMaximumFrameLatency(1)` (`:709`).

##### Hard-won edge cases (the years of Sunshine)

- Every optional WGC property feature-probed via `ApiInformation` (border, MinUpdateInterval).
- MinGW GUID workaround for `IDirect3DDxgiInterfaceAccess` (`:40-61`).
- Free-threaded pool (no DispatcherQueue) for headless service operation.
- Drop-stale-on-produce keeps exactly one frame of latency.
- Spurious-wakeup guard on the condvar.
- Size/format change → reinit (mode-change race).
- `IsCurrent()` per-frame reinit catches HDR/GPU/mode transitions.
- Keyed mutex around the cross-device copy.
- DDA preferred, WGC as the overlay/MPO/HDR-animation-correct fallback.

#### punktfunk today

#### punktfunk's WGC today (`crates/punktfunk-host/src/capture/wgc.rs`, `capture.rs`)

punktfunk has a **competent, in some ways more advanced** WGC backend (`wgc.rs`, 520 lines) that already does several things Apollo doesn't, but it is **WGC-default with a watchdog DDA fallback** and carries an explicit hang workaround that signals fragility on the SudoVDA/IddCx virtual display.

##### What it does well (often ahead of Apollo)

- **Drain-to-newest with an atomic frame counter** (`wgc.rs:90-94, 296-326`): `WgcSignal{available: AtomicU64}` is incremented in the FrameArrived callback (`:221`); `wait_and_drain` calls `TryGetNextFrame` exactly `available - consumed` times and keeps only the last → one-frame latency, no empty-pool ambiguity. This is cleaner than Apollo's close-the-previous-frame approach and avoids the empty-`TryGetNextFrame` race.
- **3 frame-pool buffers** for 240 Hz headroom (`wgc.rs:208-211`) vs Apollo's 2.
- **In-pipeline HDR conversion** (`wgc.rs:395-418`): FP16 scRGB → BT.2020 PQ R10G10B10A2 via the `HdrConverter` shader, with the output handed to NVENC zero-copy as `FramePayload::D3d11`. Apollo also does scRGB→PQ, but punktfunk keeps it on the capture device.
- **HDR detection** via `IDXGIOutput6::GetDesc1().ColorSpace == G2084` (`wgc.rs:171-176`), matching Apollo.
- **Border removal** `SetIsBorderRequired(false)` (`wgc.rs:235`), **cursor compositing** `SetIsCursorCaptureEnabled(true)` (`:233`), and **MinUpdateInterval** set to the client's full refresh (`:238-243`).
- **`nudge_cursor_onto`** to force the first composition on a static desktop (`wgc.rs:249`), a real fix WGC needs (FrameArrived only fires on change).
- **SYSTEM-account impersonation** (`wgc.rs:58-84, 143-144`): WGC won't activate under SYSTEM (0x80070424), so it uses `WTSQueryUserToken` + `ImpersonateLoggedOnUser` for the activation, reverted via RAII. This is a genuinely hard-won fix that Apollo's codebase doesn't need (Sunshine runs differently), and it is well done.
- **`last_present` repeat-frame on a static desktop** (`wgc.rs:118-119, 452-454`), so a no-change desktop still produces frames.

##### Where it is weaker / missing / fragile

1. **No size-change or format-change → reinit.** `process_frame` (`wgc.rs:385-428`) never re-reads the source `GetDesc()`; `width`/`height` are frozen at `open()` (`:196`). If the SudoVDA monitor mode changes mid-session (punktfunk's whole premise is per-session mode + mid-stream `Reconfigure`), the CopyResource into a fixed-size `bgra_copy`/`hdr10_out` will mismatch. Apollo explicitly handles this (`display_vram.cpp:1664-1674`, `display_wgc.cpp:297-306`).
2. **No "display/GPU/HDR changed" detection.** Apollo's per-frame `factory->IsCurrent()` (`display_base.cpp:235`) is absent. punktfunk cannot notice an HDR toggle, mode change, or GPU reparent and reinit; it will keep capturing stale geometry/colorspace.
3. **`SetIsBorderRequired`/`SetMinUpdateInterval` are not `ApiInformation`-probed.** They are `let _ =` best-effort calls (`wgc.rs:235,243`), which swallow the error but don't log a warning in the unsupported case (common on older builds). That is harder to diagnose than Apollo's probed-and-warned path.
4. **`cursor_visible` is hard-wired to `true` at open** (`wgc.rs:233`); there is **no live per-frame cursor toggle**. Apollo calls `set_cursor_visible` every snapshot (`display_vram.cpp:1652`). A client that renders its own cursor will get a doubled cursor and cannot turn off host compositing mid-stream.
5. **The `CreateFreeThreaded` hang workaround** (`capture.rs:282-315`): WGC open runs on a throwaway thread with a 5 s `recv_timeout`; a hang abandons the thread and falls back to DDA. This is a real observed bug on IddCx virtual displays, but the workaround is a blunt instrument: there's no root-cause fix, and an abandoned thread leaks.
6. **Per-frame allocation in the HDR path**: `CreateRenderTargetView` is called **every frame** (`wgc.rs:407-409`) instead of being cached like `fp16_srv`/`hdr10_out`, which is needless churn at 240 fps.
7. **No QPC frame timestamp.** `pts_ns` uses `SystemTime::now()` (`wgc.rs:436,514`) at process time, not `Direct3D11CaptureFrame::SystemRelativeTime()`. Apollo uses the real presentation QPC (`display_wgc.cpp:219`) → accurate frame pacing and latency math. punktfunk's whole clock-skew/latency story would benefit from the true capture timestamp.
8. **No rotation handling** (Apollo tracks `width_before_rotation`); minor for virtual displays, but a gap.

#### Transfer opportunities

- **Detect size/format change in WGC and signal reinit** (sev high, medium) — In WgcCapturer::process_frame, call src.GetDesc() and compare Width/Height/Format against self.width/height and the expected format. On mismatch, return a Reinit error (add a capture_e::Reinit equivalent to the Capturer contract, or bail with a recognizable error that the m3/stream loop maps to a capturer rebuild). Drop and re-create fp16_src/hdr10_out/bgra_copy when the size changes.
- **Per-frame IsCurrent() check to catch HDR/GPU/mode changes** (sev high, small) — Hold an IDXGIFactory1 in WgcCapturer (from the same adapter as make_device) and call IsCurrent() at the top of next_frame/wait_and_drain; on false, return the reinit signal. This pairs with wgc-size-format-reinit to give a complete change-detection story.
- **Live per-frame cursor-capture toggle** (sev medium, small) — Add a set_cursor_visible(&self, bool) to the Capturer trait (default no-op; DDA already composites the cursor itself) and implement it on WgcCapturer to flip session.SetIsCursorCaptureEnabled only when the value changes. Wire the stream loop's client cursor mode to it.
- **Use SystemRelativeTime (QPC) as the frame timestamp** (sev medium, medium) — In process_frame, read frame.SystemRelativeTime() and convert the QPC ticks to a monotonic ns timestamp (calling QueryPerformanceFrequency once); set CapturedFrame.pts_ns from that instead of now_ns(). Mirror this in the DDA path so both backends report consistent presentation time.
- **Cache the HDR RTV and feature-probe optional session properties** (sev low, small) — Cache the hdr10_out RTV in a struct field alongside fp16_srv (create it in ensure_hdr10_out, recreate it on resize). Optionally probe the two session properties via ApiInformation and log a one-time warning when absent, matching Apollo's diagnosability.
- **Root-cause the CreateFreeThreaded hang instead of the watchdog/abandon** (sev medium, large) — Before CreateFreeThreaded, ensure the SudoVDA output is fully attached/lit by confirming a successful DXGI DuplicateOutput probe (the equivalent of Apollo's test_dxgi_duplication) and that item.Size() is non-zero; gate WGC on that probe. Keep the watchdog as a last resort, but verify the hang disappears once the output is confirmed enumerable, eliminating the abandoned-thread leak.

### Windows host: cursor capture & compositing (DXGI Desktop Duplication)

#### Apollo

Apollo composites the hardware cursor entirely on the GPU and splits the problem into two independent overlay passes, an **alpha pass** and an **XOR/invert pass**, because a single DXGI cursor shape can require both at once. This dual-pass decomposition is the single most important hard-won insight in the file.

#### 1. Where the cursor data comes from

DXGI Desktop Duplication never bakes the HW cursor into the captured surface; the OS composites it separately. Apollo pulls it out of `DXGI_OUTDUPL_FRAME_INFO` each frame in `display_ddup_vram_t::snapshot` (`display_vram.cpp:1150`):

- `mouse_update_flag = frame_info.LastMouseUpdateTime.QuadPart != 0 || frame_info.PointerShapeBufferSize > 0` (`display_vram.cpp:1162`). Crucially, a **mouse-only update with no new desktop frame is still a real update**: `update_flag = mouse_update_flag || frame_update_flag`, and if neither is set it returns `capture_e::timeout` (`display_vram.cpp:1164-1168`). This means a moving cursor over a static desktop still produces output frames.
- The shape is fetched only when `frame_info.PointerShapeBufferSize > 0`, via `dup.dup->GetFramePointerShape(...)` into a sized buffer (`display_vram.cpp:1176-1187`). Shape and position are decoupled; position (`PointerPosition`) updates far more often than shape.
- Position/visibility are applied only when `frame_info.LastMouseUpdateTime.QuadPart` is non-zero (`display_vram.cpp:1198-1202`), via `gpu_cursor_t::set_pos(...)`.

#### 2. The three DXGI shape types and the dual-image decomposition

`make_cursor_alpha_image` (`display_vram.cpp:279`) and `make_cursor_xor_image` (`display_vram.cpp:210`) each take the raw shape and emit a BGRA image destined for one of the two passes. The key idea: **every shape is decomposed into an alpha-blended component AND an XOR-blended component**, and either can be empty.

- **COLOR** (`DXGI_OUTDUPL_POINTER_SHAPE_TYPE_COLOR`): a straight premultiplied-ish ARGB bitmap. Alpha image = the bitmap as-is (`display_vram.cpp:304-306`); XOR image = empty (`display_vram.cpp:215-217`). One alpha pass, done.
- **MASKED_COLOR** (`display_vram.cpp:218`, `285`): the alpha byte is a *mask selector*, legal only as `0xFF` or `0x00`.
  The genius is splitting one masked-color shape across BOTH passes:
  - Alpha image: pixels with `alpha==0x00` become opaque (`pixel |= 0xFF000000`), pixels with `alpha==0xFF` become transparent (`display_vram.cpp:288-301`). So the "copy these RGB pixels" set goes through alpha blending.
  - XOR image: pixels with `alpha==0xFF` are kept as-is to be XOR-blended, pixels with `alpha==0x00` become transparent (`display_vram.cpp:221-233`). So the "invert screen under these pixels" set goes through invert blending.
  - Illegal alpha values are logged as a warning and left (`display_vram.cpp:231`, `299`).
- **MONOCHROME** (`display_vram.cpp:236`, `307`): 1-bpp AND mask (top half) + XOR mask (bottom half), `shape_info.Height /= 2`. The 2-bit code per pixel (`(AND?1:0) + (XOR?2:0)`) maps to four Windows cursor semantics, again split across both passes:
  - Alpha image (`display_vram.cpp:330-341`): code 0 → opaque **black**, code 2 → opaque **white**, codes 1 and 3 → transparent.
  - XOR image (`display_vram.cpp:259-268`): code 3 ("inverse of screen") → `0xFFFFFFFF` (full white, fed to invert blend), all others → transparent.
  - So a monochrome cursor with both opaque pixels (arrow body) and screen-inverting pixels (some I-beams) is rendered correctly by running both passes. This is the case a naive single-pass implementation silently gets wrong.

#### 3. GPU upload and positioning

`set_cursor_texture` (`display_vram.cpp:1106`) uploads each cursor image to an **IMMUTABLE** `B8G8R8A8_UNORM` texture (`display_vram.cpp:1127-1128`) with `SysMemPitch = 4 * shape_info.Width` and creates an SRV. Empty images reset the SRV/texture so the corresponding pass is skipped (`display_vram.cpp:1108-1112`). Height is derived as `cursor_img.size() / data.SysMemPitch` (`display_vram.cpp:1123`) — robust against the monochrome half-height.
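
The MONOCHROME decomposition from section 2 can be sketched in plain Rust as follows. This assumes the buffer layout described there (AND-mask rows followed by XOR-mask rows, one bit per pixel with the most significant bit as the leftmost pixel) and a row `pitch` in bytes as reported by DXGI. `split_monochrome` and the pixel constants are hypothetical names, not Apollo's or punktfunk's code; pixels are `0xAARRGGBB` values, which is `B8G8R8A8` byte order in little-endian memory. Indexing by (x, y) rather than by raw bit offset means any padding bits at the end of a row are ignored. Running the alpha image through the alpha-blend pass and the XOR image through the invert-blend pass then covers all four Windows monochrome semantics.

```rust
/// Pixel values as 0xAARRGGBB (B8G8R8A8 byte order in little-endian memory).
pub const TRANSPARENT: u32 = 0x0000_0000;
pub const OPAQUE_BLACK: u32 = 0xFF00_0000;
pub const OPAQUE_WHITE: u32 = 0xFFFF_FFFF;
/// Full white fed to the invert-blend pass ("inverse of screen").
pub const INVERT: u32 = 0xFFFF_FFFF;

/// Split a DXGI MONOCHROME pointer shape into (alpha_image, xor_image).
///
/// `mask` holds `full_height / 2` rows of the 1-bpp AND mask followed by the
/// same number of rows of the 1-bpp XOR mask; each row is `pitch` bytes long.
pub fn split_monochrome(mask: &[u8], width: usize, full_height: usize, pitch: usize) -> (Vec<u32>, Vec<u32>) {
    let height = full_height / 2; // the reported height covers both masks
    let xor_base = pitch * height;
    let mut alpha = vec![TRANSPARENT; width * height];
    let mut xor = vec![TRANSPARENT; width * height];
    for y in 0..height {
        for x in 0..width {
            let byte = y * pitch + x / 8;
            let bit = 0x80u8 >> (x % 8); // most significant bit = leftmost pixel
            let and_set = (mask[byte] & bit) != 0;
            let xor_set = (mask[xor_base + byte] & bit) != 0;
            let i = y * width + x;
            // 2-bit code: (AND ? 1 : 0) + (XOR ? 2 : 0)
            match (and_set as u8) + ((xor_set as u8) << 1) {
                0 => alpha[i] = OPAQUE_BLACK, // AND=0, XOR=0: black
                2 => alpha[i] = OPAQUE_WHITE, // AND=0, XOR=1: white
                3 => xor[i] = INVERT,         // AND=1, XOR=1: invert the screen
                _ => {}                       // AND=1, XOR=0: screen shows through
            }
        }
    }
    (alpha, xor)
}
```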
Positioning uses the **rasterizer viewport** rather than vertex math: `gpu_cursor_t::cursor_view` is a `D3D11_VIEWPORT` whose TopLeft/Width/Height are set in `update_viewport` (`display.h:106-137`). The cursor VS (`cursor_vs.hlsl`) is a fullscreen triangle (`generate_fullscreen_triangle_vertex`) — the viewport clips it down to exactly the cursor rect at the cursor position. `update_viewport` handles **all four display rotations** (UNSPECIFIED/IDENTITY, ROTATE90/180/270) by swapping width/height and recomputing TopLeft (`display.h:107-136`) — a real shipping concern for tablets/rotated monitors. A `rotate_texture_steps` constant buffer (`b2`, `display_vram.cpp:1589-1598`, `cursor_vs.hlsl:1-3`) also rotates the texture sampling so the cursor bitmap itself is oriented correctly.

#### 4. The two blend states and the actual draw

`make_blend(device, enable, invert)` (`display_vram.cpp:71`):

- alpha blend: `SrcBlend=SRC_ALPHA, DestBlend=INV_SRC_ALPHA, Op=ADD`, alpha channel zeroed (`SrcBlendAlpha=ZERO, DestBlendAlpha=ZERO`) (`display_vram.cpp:87-92`).
- invert blend: `SrcBlend=INV_DEST_COLOR, DestBlend=INV_SRC_COLOR` (`display_vram.cpp:83-84`). This is the I-beam / "inverse of screen" effect.

`blend_cursor` lambda (`display_vram.cpp:1448`): binds the cursor VS+PS, then for the alpha texture sets `blend_alpha` with write-mask `0xFFFFFFFFu` and draws 3 verts (`display_vram.cpp:1453-1460`); for the XOR texture sets `blend_invert` but with **sample mask `0x00FFFFFFu`** — i.e. it deliberately does NOT touch the alpha channel during the invert pass (`display_vram.cpp:1462-1469`). It restores `blend_disable` and unbinds the RTV/viewport/SRV afterward (`display_vram.cpp:1471-1477`). `blend_alpha`/`blend_invert`/`blend_disable` are created once in init (`display_vram.cpp:1627-1635`).

#### 5. The hard-won state machine: cursor-without-new-frame

The deepest edge case is "the cursor moved/changed but DesktopDuplication gave us no new desktop frame."
Apollo solves this with an explicit **last-frame state machine** (`display_vram.cpp:1239-1306`) over a `last_frame_variant` that is either a shared image, an intermediate D3D surface, or monostate:

- New frame + need cursor blend → copy the frame to an **intermediate D3D surface** (lighter than a shared image; "surface sharing between direct3d devices produce significant memory overhead", `display_vram.cpp:1267`), then copy surface→fresh image and blend the cursor (`copy_src_to_surface` + `copy_last_surface_and_blend_cursor`, `display_vram.cpp:1269-1271`).
- New frame + no cursor → copy straight to an image and forward it (`copy_src_to_img`/`forward_last_img`).
- **No new frame + need cursor** → reuse the saved intermediate surface (or convert the last image into a surface, `replace_img_with_surface`) and blend the cursor onto a fresh copy (`display_vram.cpp:1282-1291`). This is what makes the cursor track smoothly over a static screen.
- No new frame + no cursor (cursor disappeared/disabled) → convert the intermediate surface back to a plain image to free surface memory (`replace_surface_with_img`) and forward the last image (`display_vram.cpp:1293-1303`).
- `blend_mouse_cursor_flag = (cursor_alpha.visible || cursor_xor.visible) && cursor_visible` (`display_vram.cpp:1204`) gates on both DXGI visibility AND the host's per-session cursor toggle.
- A subtle correctness fix: the dummy-fallback path re-clears a previously-used dummy image because it could have a stale cursor blended onto it (`reclear_dummy`, `display_vram.cpp:1526-1537`).
- The intermediate surface is destroyed **lazily** (10 s delay, `old_surface_delayed_destruction`/`old_surface_timestamp`, `display_vram.cpp:1308-1313`, `1383-1385`, `1548-1550`), so a cursor that vanishes and reappears doesn't thrash surface allocation.

#### 6. HDR cursor brightness

In HDR (`config.dynamicRange && is_hdr()`), Apollo swaps in `cursor_ps_normalize_white.hlsl` and a white-multiplier constant buffer set to `300.0f / 80.f` (`display_vram.cpp:1600-1618`). On the scRGB FP16 surface, 1.0 corresponds to 80 nits, so this multiplier of 3.75 maps the SDR-white cursor to ~300 nits instead of leaving it dim. The code comment explicitly notes there is no Win32 API to read the user's SDR white level, so 300 nits is hard-coded.

#### 7. WGC path — let the OS do it

For the Windows.Graphics.Capture backend, Apollo does **not** composite at all: `wgc_capture_t::set_cursor_visible` just toggles `capture_session.IsCursorCaptureEnabled(x)` (`display_wgc.cpp:231-239`), called every snapshot (`display_vram.cpp:1652`, `display_wgc.cpp:262`). WGC bakes the cursor in for you. The manual two-pass machine is only for the DDA path.

#### 8. Frame timing of cursor updates

QPC timestamps use `std::max(LastPresentTime, LastMouseUpdateTime)` (`display_vram.cpp:1171`), so a cursor-only update still carries a valid timestamp into the frame-pacing group in `display_base_t::capture` (`display_base.cpp:243-284`).

#### punktfunk today

punktfunk's cursor handling lives in `crates/punktfunk-host/src/capture/dxgi.rs`. It is a competent single-pass design that is materially weaker than Apollo's in several specific, verifiable ways.

**What it does today (and does well):**

- Pulls cursor state from `DXGI_OUTDUPL_FRAME_INFO` in `update_cursor` (`dxgi.rs:1104-1140`): position/visibility gated on `LastMouseUpdateTime != 0`, shape fetched via `GetFramePointerShape` only when `PointerShapeBufferSize > 0`. Shape and position are decoupled, as in Apollo. The shape is cached device-independently (`cursor_shape: Option<(Vec,u32,u32)>`, `dxgi.rs:761`) so it survives a device recreate, with a `cursor_dirty` re-upload flag (`dxgi.rs:764-765`, `1170-1180`), which is a nice touch.
- GPU composite in `composite_cursor_gpu` (`dxgi.rs:1146`) using a `CursorCompositor` (`dxgi.rs:263`) with an alpha blend (`SRC_ALPHA`/`INV_SRC_ALPHA`) and a separate invert blend (`INV_DEST_COLOR`/`INV_SRC_ALPHA`, `dxgi.rs:310-322`; note the dest factor differs from Apollo's `INV_SRC_COLOR`), selected per frame by `cursor_invert`. Position comes from a constant-buffer rect computed in NDC in `draw` (`dxgi.rs:405-408`) rather than from a viewport.
- HDR cursor handling is actually MORE complete than Apollo's: configurable nits via `PUNKTFUNK_HDR_CURSOR_NITS` (default 203, BT.2408) and a proper sRGB→linear decode in the PS (`dxgi.rs:216-226`, `1186-1208`), vs Apollo's hard-coded 300/80.
- CPU fallback blend (`blend_cursor_cpu`, `dxgi.rs:660`) for the software-encode path, with true XOR inversion.

**Where it is weaker / fragile / missing:**

1. **Single-pass, not dual-pass — the core correctness gap.** `convert_pointer_shape` (`dxgi.rs:566`) produces ONE BGRA image, and `cursor_invert` picks ONE blend for the whole shape (`dxgi.rs:1133-1134`, `1205`). A MASKED_COLOR or MONOCHROME cursor that mixes opaque pixels with screen-inverting pixels cannot be rendered correctly. Apollo runs both an alpha pass and an XOR pass for exactly this reason. punktfunk's masked-color path even forces opaque (`mask==0`) pixels through the same invert blend as the XOR pixels when `cursor_invert` is set (`dxgi.rs:612-624`), which mis-renders the opaque component.
2. **MONOCHROME "inverse of screen" is dropped entirely.** `convert_pointer_shape` maps the `(1,1)` AND/XOR code to "approximate as black" (`dxgi.rs:644`) with an explicit comment. Monochrome shapes also never set `cursor_invert` (it's only set for MASKED_COLOR, `dxgi.rs:1133`). So screen-inverting monochrome cursors are rendered as opaque black, whereas Apollo handles this via its XOR pass (`display_vram.cpp:265-266`).
3. **No display-rotation support.** punktfunk has no `display_rotation` analogue anywhere; the cursor rect is computed assuming identity orientation (`dxgi.rs:405-408`).
   Apollo's `gpu_cursor_t::update_viewport` covers all four rotations (`display.h:107-136`).
4. **No cursor-without-new-frame smoothness.** punktfunk only composites the cursor when it gets a fresh duplication frame in `acquire` (`dxgi.rs:1477`, `1504-1518`). On `AcquireNextFrame` timeout it repeats `last_present` verbatim (`dxgi.rs:1547-1561`), which still has the OLD cursor position baked in. A cursor moving over a static desktop would therefore appear to stutter/lag, because DDA emits mouse-only updates that punktfunk's pacing turns into repeats of a stale composite. Apollo's whole `last_frame_variant` state machine (`display_vram.cpp:1239-1306`) exists to fix this by compositing the moved cursor onto an intermediate copy of the last surface even with no new desktop frame.

   (The ALREADY-HANDLED transfer item below records that this premise turned out to be incorrect: DDA returns S_OK on pointer-only updates, and punktfunk recomposites on them.)
5. **Cursor baked destructively into the reused `gpu_copy` texture.** `composite_cursor_gpu` draws straight onto `self.gpu_copy` (`dxgi.rs:1474-1477`), which is then repeated as `last_present`. There is no intermediate clean surface. The next frame's `CopyResource` overwrites it, but any repeat-frame path serves a frame with a stale-position cursor burned in. Apollo deliberately keeps a clean intermediate surface and blends onto a *fresh* image each time (`display_vram.cpp:1265-1271`).
6. **Illegal masked-color alpha values silently mishandled.** punktfunk treats any non-zero mask byte as "invert" (`dxgi.rs:612` else-branch) without validating that it is `0xFF`; Apollo logs illegal alphas (`display_vram.cpp:231`, `299`).
7. **`PUNKTFUNK_NO_CURSOR` kill-switch** (`dxgi.rs:1150`) suggests the per-frame RTV-create + draw was suspected of high 3D-engine cost. Apollo creates RTVs from `img.capture_rt` held on the image, not freshly on each composite.

#### Transfer opportunities

- ✅ **DONE (2026-06-16)** — **Split every cursor shape into an alpha image + an XOR image (two-pass composite)** (sev high, medium) — Refactor convert_pointer_shape in dxgi.rs to return two optional images (alpha, xor) mirroring Apollo's split.
  Store cursor_shape as Option<(alpha, xor)>, upload up to two SRVs in CursorCompositor, and in composite_cursor_gpu run the alpha pass with self.blend and then the xor pass with self.blend_invert (skipping empty images). Drop the single cursor_invert flag.
- **Render the monochrome 'inverse of screen' pixels via the XOR pass instead of dropping them** (sev medium, small) — In convert_pointer_shape's monochrome branch (dxgi.rs:628-654), once the dual-pass split (above) exists, route code (1,1) to the XOR image as white and codes (0,0)/(0,1) to the alpha image as opaque black/white, matching Apollo's case mapping.
- ⊘ **ALREADY-HANDLED (2026-06-16; premise incorrect — DDA returns S_OK on pointer-only updates, punktfunk recomposites)** — **Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame** (sev high, large) — Keep a clean intermediate copy of the last desktop frame (an extra DEFAULT texture). In acquire (dxgi.rs:1341), when AcquireNextFrame times out but update_cursor saw a position change (LastMouseUpdateTime changed) and the cursor is visible, copy the clean intermediate into gpu_copy and re-run composite_cursor_gpu. Then return that as a fresh frame instead of repeating last_present.
- **Stop baking the cursor destructively into the repeated gpu_copy texture** (sev medium, medium) — Add a clean base texture: CopyResource(duplication -> clean_base), then CopyResource(clean_base -> gpu_copy) and composite onto gpu_copy. On repeat frames, copy the cursor-free clean_base into gpu_copy again and re-composite. Also create the cursor RTV once per gpu_copy and cache it, rather than calling CreateRenderTargetView on every composite (dxgi.rs:1181-1184).
- **Handle rotated outputs in cursor positioning** (sev low, medium) — Read rotation from DXGI_OUTDUPL_DESC.Rotation when opening/rebuilding the duplication (around dxgi.rs:888 and 1298), store it on DuplCapturer, and apply Apollo's rotation transform when computing the NDC rect in CursorCompositor::draw and when sampling the cursor texture in the VS.
- **Validate masked-color mask bytes and log illegal values** (sev low, small) — In the MASKED_COLOR branch of convert_pointer_shape (dxgi.rs:594-627), branch explicitly on mask==0x00 vs mask==0xFF and emit a tracing::warn! once for any other value, matching Apollo's guard, so future cursor-render bugs are observable.

### Windows host HDR / 10-bit / colorspace plumbing

#### Apollo

#### Apollo's Windows HDR / 10-bit / colorspace pipeline

Apollo's design cleanly separates **(a)** a protocol-level colorspace decision, **(b)** a GPU **RGB→YUV** conversion into a native encoder pixel format (NV12 / P010 / AYUV / Y410), and **(c)** encoder VUI signaling + HDR10 static metadata. Capture is always **scRGB FP16** when the desktop is in advanced-color mode; conversion to the codec's YUV format happens in Apollo's own D3D11 shader, *not* inside NVENC.

##### 1. The colorspace decision (`src/video_colorspace.cpp`)

`colorspace_from_client_config()` (`video_colorspace.cpp:22`) is the single brain. Inputs: `config.dynamicRange` (0/1 from the client), `hdr_display` (does the physical/virtual output actually have HDR on), `config.encoderCscMode`, `config.chromaSamplingType`.

- HDR is chosen ONLY when `dynamicRange>0 AND hdr_display` (`:29`) → `colorspace_e::bt2020` (Rec.2020 + ST2084 PQ). This is the crucial **AND**: a client asking for HDR against an SDR desktop does NOT get a PQ stream.
- Otherwise SDR: `encoderCscMode>>1` selects rec601 / rec709 / bt2020sdr (`:33-53`), and `encoderCscMode & 0x1` is the **full/limited range** bit (`:56`).
- `bit_depth` derives from `dynamicRange` (0→8, 1→10) (`:58-71`), with a guard that `bt2020sdr` requires 10-bit or it falls back to rec709 (`:73-76`).
- `colorspace_is_hdr()` is just `colorspace == bt2020` (`:18`).

`avcodec_colorspace_from_sunshine_colorspace()` (`:81`) maps each enum to the **four** ffmpeg color fields that must agree (primaries, transfer, matrix, range), plus the swscale input format: `AVCOL_PRI_*` primaries, `AVCOL_TRC_*` transfer (note `AVCOL_TRC_BT2020_10` for SDR-2020 vs `AVCOL_TRC_SMPTE2084` for HDR, `:105`/`:114`), `AVCOL_SPC_*` matrix (`BT2020_NCL`), `software_format` for swscale, and `range = full?JPEG:MPEG` (`:120`). The hard-won detail: **transfer ≠ matrix ≠ primaries**. They are set independently, and PQ is only the *transfer*; the matrix stays `BT2020_NCL`.

`new_color_vectors_from_colorspace()` (`:183`) generates the actual YUV conversion matrix per ITU-T H.273, parameterized by bit_depth and full_range. It produces the `y_mult/y_add/uv_mult/uv_add` scale/offset for limited vs full range at 8 or 10 bit (`:208-218`). There are 12 precomputed matrices (rec601/709/2020 × full/limited × 8/10), selected by index arithmetic (`:267-288`). This is the precise math punktfunk currently delegates to NVENC's internal RGB→YUV.

##### 2. GPU capture format + RGB→YUV conversion (`src/platform/windows/display_vram.cpp`)

- `get_supported_capture_formats()` (`:1814`) returns, in priority order: `R16G16B16A16_FLOAT` (scRGB FP16 — "ideal for WCG and Advanced Color, both SDR and HDR", `:1816`), then the 8-bit BGRA/BGRX/RGBA fallbacks for plain SDR desktops. It deliberately does **not** capture `R10G10B10A2_UNORM`, because DWM just clamps the scRGB values into it (`:1821-1827`), a real edge case they learned.
- The encoder target format is chosen from the negotiated pix_fmt (`init()`, `:711`): `NV12` (8-bit 420), `P010` (10-bit 420), `AYUV` (8-bit 444), `R16_UINT` (10-bit 444 packed), `Y410` (10-bit 444). For each, the RTV formats for the Y and UV planes are set (`:636-666`, e.g. P010 → `R16_UNORM` Y + `R16G16_UNORM` UV).
- `convert()` (`:388`) runs a D3D11 pixel shader pass. It picks **`convert_Y_or_YUV_fp16_ps` when the source is FP16**, vs the normal `convert_Y_or_YUV_ps` for sRGB UNORM sources (`:420`/`:431`). So the linear-scRGB→PQ and RGB→YUV math lives in Apollo's HLSL, producing the P010 surface the encoder consumes directly. The Y and UV planes are separate draws with separate viewports (`:414-434`).
- `apply_colorspace()` (`:459`) uploads the color matrix as a VS constant buffer (slot 3). It switches to `new_color_vectors_from_colorspace` for the packed 444 formats AYUV/R16_UINT/Y410 (`:462-466`).
- HDR cursor handling (`:1600-1618`): when `dynamicRange && is_hdr()`, it swaps in `cursor_ps_normalize_white` and multiplies cursor white by `300.0/80.0` (300-nit target), because there is **no Win32 API for the user's SDR white level** (`:1608-1610`). punktfunk does the same trick with 203 nits.

##### 3. HDR detection + static metadata (`src/platform/windows/display_base.cpp`)

- `is_hdr()` (`:743`): `output->QueryInterface(IID_IDXGIOutput6)` → `GetDesc1()` → `desc1.ColorSpace == DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020` (`:755`).
- `get_hdr_metadata()` (`:758`) is the piece punktfunk is **entirely missing**. It reads `DXGI_OUTPUT_DESC1` (`MaxLuminance`, `MinLuminance`, `MaxFullFrameLuminance`, primaries, white point) and packs it into `SS_HDR_METADATA`.
  - Hard-won detail (`:772-779`): the primaries reported by `GetDesc1` are scRGB/Rec.709. Since Apollo converts to Rec.2020, it **overwrites them with the canonical Rec.2020 primaries + D65 white** (`:780-787`) to avoid confusing clients; it notes most clients only care about the luminance range.
  - MaxCLL/MaxFALL are zeroed because DXGI doesn't expose content-specific values (`:802-804`).
- On init it logs the full `DXGI_OUTPUT_DESC1` (colorspace string, BitsPerColor, all primaries, luminances) (`:722-732`), which is invaluable for diagnostics.

##### 4. Encoder VUI + HDR10 SEI metadata (`src/video.cpp`)

- The avcodec context gets the four color fields from the colorspace (`ctx->color_range/color_primaries/color_trc/colorspace`, `:1626-1629`), and `sw_pix_fmt` is selected from bit_depth × chroma (`:1557-1561`).
- Profile selection per codec+depth: HEVC → `MAIN_10` when `dynamicRange` (`:1588`), AV1 Main for both (`:1595`), HEVC RExt for 444 (`:1586`). H.264 10-bit is explicitly unsupported by the protocol (`:1578`).
- **The HDR10 static metadata attach** (`:1825-1856`): when `colorspace_is_hdr`, it calls `disp->get_hdr_metadata()` and builds two AVFrame side-data blocks:
  - `av_mastering_display_metadata_create_side_data` (display primaries scaled /50000, white point /50000, min_luminance /10000, max_luminance /1; with `has_luminance`/`has_primaries` flags, `:1829-1845`);
  - and, only if MaxCLL/MaxFALL are nonzero, `av_content_light_metadata_create_side_data` (`:1847-1852`).

  This rides into the bitstream as the mastering-display and content-light-level metadata that real HDR TVs need (SEI messages in HEVC, metadata OBUs in AV1).
- The NVENC path uses `nvenc_colorspace_from_sunshine_colorspace()` (`nvenc_utils.cpp:53`) to fill `nvenc_colorspace_t{primaries, transfer, matrix, full_range}` for the VUI, with a mapping identical to the avcodec one. `nvenc_format_from_sunshine_format()` (`:34`) maps the sunshine pix_fmt → `NV_ENC_BUFFER_FORMAT` (nv12 / **YUV420_10BIT** for P010 / ayuv / yuv444_10bit). In other words, Apollo always hands NVENC **already-converted YUV**, never RGB.

##### Summary of Apollo's hard-won edges

1. HDR requires client-request AND HDR-display both true (no fake PQ on an SDR desktop).
2. Capture scRGB FP16, never R10G10B10A2 (DWM clamps it).
3. RGB→YUV is done in Apollo's own shader into P010/NV12, so it controls the exact ITU-T H.273 matrix, range, and bit depth. NVENC only sees YUV.
4. HDR10 mastering-display metadata is read from `IDXGIOutput6::GetDesc1` and overridden to canonical Rec.2020 primaries (MaxCLL/MaxFALL stay zero because DXGI doesn't provide them).
5. Transfer/matrix/primaries/range are four independent fields and must all be set on both the codec ctx and the encoder VUI.
6. Cursor white must be scaled to a graphics-white nit target in HDR (no API for the SDR white level).

#### punktfunk today

#### punktfunk's Windows HDR state today

punktfunk actually has a **surprisingly complete** HDR capture+encode path on Windows, far more than CLAUDE.md ("8-bit SDR BGRx, no HDR metadata plumbing") and the GameStream-side comments suggest. The deferral is real on the GameStream/Moonlight side and on the punktfunk/1 protocol side, but the DXGI+NVENC plumbing exists.

**What works (DXGI path, `capture/dxgi.rs`):**

- HDR detection: the duplication surface format `R16G16B16A16_FLOAT` is treated as "HDR on" (`dxgi.rs:940`, `hdr_fp16`). The WGC path queries `IDXGIOutput6::GetDesc1().ColorSpace == DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020` (`wgc.rs:171-176`).
- A real scRGB-FP16 → BT.2020 → PQ shader (`HDR_PS`, `dxgi.rs:469-495`) writes into an `R10G10B10A2_UNORM` texture. This is the standard OBS/Sunshine conversion, with the BT709→BT2020 matrix and the ST2084 inverse EOTF (PQ encode) inline. The cursor is composited onto the linear FP16 surface with sRGB→linear decode + 203-nit white scaling (`dxgi.rs:1186-1208`), matching Apollo's 300-nit trick.
- Full mid-session HDR-toggle handling: toggling HDR fires ACCESS_LOST, and `recreate_dupl` re-detects the format and rebuilds the FP16/HDR10 textures + converter (`dxgi.rs:1321-1327`). Tiered recovery for the HDR overlay/MPO-flip churn (`dxgi.rs:1385-1407`) is genuinely sophisticated.

**What works (NVENC path, `encode/nvenc.rs`):**

- HDR is inferred from `PixelFormat::Rgb10a2` (`nvenc.rs:366`). It sets buffer format `ABGR10` (`:390`), forces HEVC Main10 + `pixelBitDepthMinus8(2)` (`:233-237`), and sets the HEVC VUI to BT.2020 primaries / SMPTE2084 transfer / BT2020_NCL matrix, limited range (`:243-254`).
- 10-bit-without-HDR also works (Main10 from 8-bit ARGB, `:233`).

**Where it is weaker / missing / fragile vs Apollo:**

1. **No HDR10 static metadata at all.** `nvenc.rs:242` literally says "HDR10 static metadata — mastering display + MaxCLL/MaxFALL — is added in a follow-up." No `IDXGIOutput6::GetDesc1` luminance read is fed anywhere, and there is no NVENC `NV_ENC_MASTERING_DISPLAY_INFO`/`NV_ENC_CONTENT_LIGHT_LEVEL` SEI. Without it, HDR TVs have to guess the mastering range and can tone-map incorrectly.
2. **NVENC ingests RGB (ABGR10), not YUV.** punktfunk feeds R10G10B10A2 RGB and relies on **NVENC's internal RGB→YUV** with a hardcoded VUI. Apollo converts to P010 in its own shader, so it controls the exact ITU-T H.273 matrix/range. punktfunk has no counterpart to `video_colorspace`, including the `new_color_vectors_from_colorspace` matrix generation.
3. **SDR VUI is never signaled.** For the SDR 8-bit path, the NVENC VUI block (`nvenc.rs:243`) is only set when `hdr`. SDR streams carry NVENC's default VUI (effectively unsignaled / BT.709 by luck), with no full/limited-range or rec601/709 control. Apollo always sets all four fields for SDR too.
4. **HDR detection is a single hardcoded colorspace constant** (`G2084_NONE_P2020`). There is no logging of the full `DXGI_OUTPUT_DESC1` like Apollo's `display_base.cpp:722-732`, which makes field diagnosis hard.
5. **No client-request AND display-HDR gate.** The protocol has `VIDEO_CAP_HDR` (`quic.rs:86`), but HDR is driven purely by the capture surface format. There's no explicit "client wanted HDR but desktop is SDR → signal SDR" decision like Apollo's `:29`. `punktfunk1.rs:558-563` only gates *bit depth*, and the doc comment (`quic.rs:128`) admits "BT.2020 PQ HDR signaling is added alongside HDR support", i.e. the Welcome doesn't carry colorspace.
6. **Split-encode disabled for 10-bit** (`nvenc.rs:166`) is a sensible measured choice, but it means HDR throughput is single-engine-bound. This is a known limitation, not a bug.
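Item 1 is mostly unit bookkeeping once the luminance values are in hand. The minimal Rust sketch below assumes the HEVC mastering-display SEI units:

- chromaticity in steps of 0.00002, the same /50000 scaling Apollo applies;
- both luminances in steps of 0.0001 cd/m².

`Chromaticity`, `Hdr10Metadata`, and the helper functions are hypothetical types for illustration. How the result maps onto NVENC's HDR10 SEI structures depends on the SDK version in use. Like Apollo, the sketch discards the reported primaries in favor of canonical Rec.2020 + D65 and leaves MaxCLL/MaxFALL unspecified.

```rust
/// CIE 1931 xy chromaticity in HEVC SEI units (increments of 0.00002).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Chromaticity {
    x: u16,
    y: u16,
}

/// Hypothetical HDR10 static metadata, laid out like the HEVC mastering-display
/// and content-light-level SEI payloads (not an SDK type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Hdr10Metadata {
    red: Chromaticity,
    green: Chromaticity,
    blue: Chromaticity,
    white_point: Chromaticity,
    /// Units of 0.0001 cd/m^2.
    max_luminance: u32,
    /// Units of 0.0001 cd/m^2.
    min_luminance: u32,
    /// MaxCLL / MaxFALL in cd/m^2; 0 means "not specified".
    max_cll: u16,
    max_fall: u16,
}

fn chromaticity(x: f64, y: f64) -> Chromaticity {
    Chromaticity {
        x: (x * 50_000.0).round() as u16,
        y: (y * 50_000.0).round() as u16,
    }
}

/// Converts a DXGI luminance (nits, f32) to SEI units; negative or NaN input maps to 0.
fn nits_to_sei(nits: f32) -> u32 {
    (f64::from(nits.max(0.0)) * 10_000.0).round() as u32
}

/// Builds metadata from DXGI_OUTPUT_DESC1's MinLuminance / MaxLuminance (nits).
fn hdr10_from_output_luminance(min_nits: f32, max_nits: f32) -> Hdr10Metadata {
    Hdr10Metadata {
        // Canonical Rec.2020 primaries + D65, since the stream is Rec.2020
        // regardless of the (scRGB/Rec.709) primaries GetDesc1 reports.
        red: chromaticity(0.708, 0.292),
        green: chromaticity(0.170, 0.797),
        blue: chromaticity(0.131, 0.046),
        white_point: chromaticity(0.3127, 0.3290),
        max_luminance: nits_to_sei(max_nits),
        min_luminance: nits_to_sei(min_nits),
        // DXGI exposes no content light levels, so leave them unspecified.
        max_cll: 0,
        max_fall: 0,
    }
}
```

The result would be attached on keyframes, the same point at which the transfer item below proposes populating NVENC's structures.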
#### Transfer opportunities

- **Plumb HDR10 static metadata (mastering display + MaxCLL/MaxFALL) into NVENC** (sev high, medium) — In capture/dxgi.rs, read IDXGIOutput6::GetDesc1 (MaxLuminance/MinLuminance/MaxFullFrameLuminance + primaries) when entering the HDR path and carry it on D3d11Frame/CapturedFrame. Then, in encode/nvenc.rs, populate NV_ENC_MASTERING_DISPLAY_INFO + NV_ENC_CONTENT_LIGHT_LEVEL on the keyframe. Whether these are attached via encoder-config SEI fields or a per-picture HDR metadata API depends on the NVENC SDK version. Override primaries to Rec.2020/D65 like Apollo.
- **Always signal the SDR VUI (primaries/transfer/matrix/range), not just HDR** (sev medium, small) — In encode/nvenc.rs, always set videoSignalTypePresentFlag + colourDescriptionPresentFlag, defaulting SDR to BT.709 primaries/transfer/matrix and limited range. Thread a colorspace/range choice down from the Welcome so the client and decoder agree.
- **Convert to P010 in a D3D11 shader and feed NVENC YUV instead of ABGR10 RGB** (sev medium, large) — Add an optional GPU RGB->P010 conversion pass in capture/dxgi.rs (mirroring HdrConverter) producing DXGI_FORMAT_P010, and switch the NVENC buffer format to NV_ENC_BUFFER_FORMAT_YUV420_10BIT. Port video_colorspace's matrix generation to Rust as the conversion's constant buffer. Keep the current RGB path as a fallback behind an env knob.
- **Gate HDR on (client requested HDR) AND (desktop is actually HDR), and signal the result in Welcome** (sev medium, medium) — Add a colorspace/dynamic-range field to the Welcome (punktfunk1.rs around :562-612). Resolve HDR = host_wants AND client VIDEO_CAP_HDR AND capture-surface-is-FP16, and send the resolved colorspace so the client decoder/presenter sets the right transfer/primaries.
- **Log the full DXGI_OUTPUT_DESC1 + capture format on HDR setup** (sev low, small) — When hdr_fp16 is detected in capture/dxgi.rs, QueryInterface IDXGIOutput6, call GetDesc1, and tracing::info!
  the colorspace name, BitsPerColor, primaries, white point, and Max/Min/MaxFullFrame luminance, porting Apollo's colorspace_to_string table.
- **Derive HDR cursor/graphics white from the display, not a fixed 203 nits** (sev low, small) — Once GetDesc1 luminance is read (per the HDR10 static metadata item above), use MaxFullFrameLuminance (or the SDR-white registry value if exposed) as the cursor white target instead of a constant, falling back to 203. Document the limitation like Apollo.

### NVENC on D3D11 (encoder integration) — Windows host

#### Apollo

#### Apollo's NVENC-on-D3D11 architecture

Apollo wraps the **raw NVENC C API** (`ffnvcodec/nvEncodeAPI.h`) in a clean three-layer class hierarchy, all under `src/nvenc/`:

- `nvenc_base` (`nvenc_base.h:26`, impl `nvenc_base.cpp`) — platform-agnostic session lifecycle, all RC/codec config, encode loop, RFI. Pure-virtual hooks: `init_library()`, `create_and_register_input_buffer()`, `synchronize_input_buffer()`, `wait_for_async_event()` (`nvenc_base.h:78-105`).
- `nvenc_d3d11` (`nvenc_d3d11.h`/.cpp) — the Windows D3D11 specialization: DLL load + async event.
- `nvenc_d3d11_native` (`nvenc_d3d11_native.cpp`) — owns the actual encoder-input `ID3D11Texture2D` and registers it.

##### Library load (hardened)

`nvenc_d3d11::init_library()` (`nvenc_d3d11.cpp:28-62`) loads `nvEncodeAPI64.dll` with **`LOAD_LIBRARY_SEARCH_SYSTEM32`** (defends against DLL-planting). It then calls `GetProcAddress("NvEncodeAPICreateInstance")` and frees the DLL on any failure path. `device_type = NV_ENC_DEVICE_TYPE_DIRECTX` is fixed in the ctor (`nvenc_d3d11_native.cpp:14-18`), and `device = d3d_device` (the **caller's** D3D11 device, shared with capture).

##### API-version compatibility (a years-hardened detail)

`nvenc_base.cpp:103`: it picks the **minimum** NvEnc API version per codec, `MAKE_NVENC_VER(11,0)` for H.264/HEVC and `(12,0)` for AV1, to **maximize driver compatibility**. An older driver that only speaks API 11 still runs H.264/HEVC.
`min_struct_version()` (`nvenc_base.cpp:666-680`) then rewrites every struct's version word to that minimum and can substitute a *lower struct revision* per major version (e.g. `NV_ENC_CONFIG_VER,7,8` at `:229`, `NV_ENC_PIC_PARAMS_VER,4,6` at `:516`). This is how Apollo runs one binary across many driver vintages without `NV_ENC_ERR_INVALID_VERSION`. A `#error` guard at `:23` forces a human to re-audit when the SDK is bumped.

##### Session init + capability gating (`create_encoder`, nvenc_base.cpp:100-457)

1. `nvEncOpenEncodeSessionEx` with the shared device (`:121-128`).
2. Enumerate supported encode GUIDs (`nvEncGetEncodeGUIDCount`/`...GUIDs`, `:130-140`) and **verify the requested codec is in the list** (`:165-173`) before doing anything else.
3. A `get_encoder_cap` lambda (`:175-180`) queries `nvEncGetEncodeCaps`. Apollo gates on real caps, not assumptions:
   - `NV_ENC_CAPS_WIDTH_MAX`/`HEIGHT_MAX` — rejects modes the GPU can't encode (`:191-197`).
   - `SUPPORT_10BIT_ENCODE`, `SUPPORT_YUV444_ENCODE` (`:199-207`).
   - `ASYNC_ENCODE_SUPPORT` — **silently downgrades to sync** if async is unsupported (`:209-212`).
   - `SUPPORT_REF_PIC_INVALIDATION` → drives `encoder_params.rfi` (`:214`).
   - `SUPPORT_CUSTOM_VBV_BUF_SIZE` (`:250`), `SUPPORT_CABAC` (`:311`), `SUPPORT_MULTIPLE_REF_FRAMES` (`:274`), `SUPPORT_INTRA_REFRESH`/`SINGLE_SLICE_INTRA_REFRESH` (`:334-342`), `SUPPORT_WEIGHTED_PREDICTION` (`:220`).

##### Rate control / low-latency tuning (`nvenc_base.cpp:216-255`)

- Preset = `P1..P7` from a number (`:216`, `quality_preset_guid_from_number` `:29`), **`tuningInfo = NV_ENC_TUNING_INFO_ULTRA_LOW_LATENCY`** (`:217`), `enablePTD = 1` (`:218`).
- It seeds the full config from `nvEncGetEncodePresetConfigEx` (`:229-233`) and then overrides; it does **not** hand-build `NV_ENC_CONFIG`.
- `gopLength = NVENC_INFINITE_GOPLENGTH` (`:237`), `frameIntervalP = 1` (no B-frames, `:238`).
- `rateControlMode = NV_ENC_PARAMS_RC_CBR` (`:239`), `zeroReorderDelay = 1` (`:240`), `enableLookahead = 0` (`:241`), `lowDelayKeyFrameScale = 1` (`:242`). The last sets the I-frame-to-P-frame bit ratio (used with CBR and a single-frame VBV) to 1, so an IDR is budgeted like a P frame instead of blowing latency.
- `multiPass` (two-pass quarter/full res) from config (`:243-245`), `enableAQ` spatial AQ (`:247`).
- `averageBitRate = bitrate*1000` (`:248`).
- **VBV/HRD:** only if `SUPPORT_CUSTOM_VBV_BUF_SIZE`: `vbvBufferSize = bitrate*1000/framerate` (one frame), plus an optional `vbv_percentage_increase` knob (`:250-255`), a deliberate "low-latency VBR" lever.

##### Per-codec handling (`nvenc_base.cpp:304-380`)

A shared `set_h264_hevc_common_format_config` lambda (`:257-266`) sets `repeatSPSPPS=1` (in-band parameter sets every IDR), `idrPeriod = NVENC_INFINITE_GOPLENGTH`, `sliceMode=3` + `sliceModeData = slicesPerFrame` (**slices-per-frame from the client**), YUV444 `chromaFormatIDC=3`, and optional filler data.

- **H.264** (`:305-320`): High / High444 profile by buffer format; CABAC vs CAVLC gated on `SUPPORT_CABAC`; ref frames default 5.
- **HEVC** (`:322-348`): `pixelBitDepthMinus8=2` for 10-bit; ref frames default 5; **intra-refresh** when the client asks (`enableIntraRefresh=1`, `intraRefreshPeriod=300`, `intraRefreshCnt=299`, `singleSliceIntraRefresh=1` if supported, `:333-346`).
- **AV1** (`:350-379`): `repeatSeqHdr=1`, infinite `idrPeriod`, full color signaling on the bitstream (`colorPrimaries`/`transferCharacteristics`/`matrixCoefficients`/`colorRange`, `:364-368`), and **slices→tiles** as nearest powers of two biased to rows (`numTileRows`/`numTileColumns`, `:372-377`), since NVENC AV1 only takes power-of-two tile counts.
- VUI is filled for H.264/HEVC via `fill_h264_hevc_vui` (`:291-302`) from the `nvenc_colorspace_t` struct (`nvenc_colorspace.h`), populated by `nvenc_colorspace_from_sunshine_colorspace` (`nvenc_utils.cpp:53-91`) covering rec601/709/2020-SDR/2020-PQ.
- `profileGUID = NV_ENC_CODEC_PROFILE_AUTOSELECT_GUID` by default (`:236`).

##### Ref frames + RFI buffer sizing (the subtle one)

`set_ref_frames` (`:268-281`) sets `maxNumRefFrames`/`maxNumRefFramesInDPB` to a **large** value (5 for H.264/HEVC, 8 for AV1) so the DPB is deep, **but pins L0 to `NV_ENC_NUM_REF_FRAMES_1`** (`:280`). Each frame therefore references only the previous frame, yet the deep DPB gives RFI room to fall back to an older valid frame when recent ones are invalidated. If `SUPPORT_MULTIPLE_REF_FRAMES` is false, it forces 1 and disables RFI (`:274-277`).

##### Async encode (overlap submit and result, Windows-specific)

- The async event is a Win32 `CreateEvent` in `nvenc_d3d11`'s ctor (`nvenc_d3d11.cpp:15`), closed in the dtor (`:23-25`).
- `enableEncodeAsync` is set from the handle (`nvenc_base.cpp:219`), registered via `nvEncRegisterAsyncEvent` (`:389-396`), and unregistered in `destroy_encoder` (`:466-472`).
- In `encode_frame`: `pic_params.completionEvent = async_event_handle` (`:525`), `lock_bitstream.doNotWait = 1` (`:534`), then **`wait_for_async_event(100)`** (`:536`, = `WaitForSingleObject(ev,100)` `nvenc_d3d11.cpp:64-66`) before `nvEncLockBitstream`. This is how the GPU encode overlaps the CPU instead of a busy spin; the 100 ms timeout protects against a hung encode.

##### Encode loop (`encode_frame`, nvenc_base.cpp:490-572)

`synchronize_input_buffer()` hook (interop copy point) → `nvEncMapInputResource` (with a `fail_guard` that always unmaps, `:510-514`) → build `NV_ENC_PIC_PARAMS` (`:516-525`, `encodePicFlags = FORCEIDR` when forced, `inputTimeStamp = frame_index`, `PIC_STRUCT_FRAME`) → `nvEncEncodePicture` → `nvEncLockBitstream` → copy out the bytes → `nvEncUnlockBitstream`. The returned `nvenc_encoded_frame` carries `idr` (from `pictureType==IDR`) and **`rfi_needs_confirmation`**, so the network layer can mark the packet (`:547-557`). A periodic min/max/avg frame-size logger runs (`:569`).
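The property worth copying here is the bound on the wait, more than the Win32 event itself. The minimal, portable Rust sketch below assumes a hypothetical design in which a worker thread owns the NVENC calls and reports each finished frame over a `std::sync::mpsc` channel. `Receiver::recv_timeout` stands in for `WaitForSingleObject(ev, 100)`. `EncodedFrame`, `WaitError`, and `wait_for_frame` are illustrative names, not NVENC or Apollo types.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Same bound Apollo uses when waiting on the NVENC completion event.
const ENCODE_TIMEOUT: Duration = Duration::from_millis(100);

/// Illustrative stand-in for one locked NVENC bitstream.
#[derive(Debug, PartialEq)]
struct EncodedFrame {
    frame_index: u64,
    idr: bool,
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum WaitError {
    /// Nothing completed within the bound: treat the session as hung
    /// (for example, tear it down and re-initialize) instead of blocking forever.
    TimedOut,
    /// The encoder side has shut down.
    Disconnected,
}

/// Waits at most `timeout` for the next completed frame.
fn wait_for_frame(rx: &Receiver<EncodedFrame>, timeout: Duration) -> Result<EncodedFrame, WaitError> {
    rx.recv_timeout(timeout).map_err(|e| match e {
        RecvTimeoutError::Timeout => WaitError::TimedOut,
        RecvTimeoutError::Disconnected => WaitError::Disconnected,
    })
}
```

What to do on `TimedOut` is a policy choice. The point is that the encode thread regains control after 100 ms instead of sitting in a lock call indefinitely.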
##### Reference-frame invalidation (`invalidate_ref_frames`, nvenc_base.cpp:574-610)

This is the **key to recovering from packet loss without a full IDR**, a feature Sunshine hardened over years:

- No-op if RFI is unsupported (`:575`).
- **Dedup:** if the requested range is already inside the last invalidation range, skip (`:579-583`).
- Sets `rfi_needs_confirmation = true` (`:585`) so the *next* encoded frame is tagged.
- Expands `last_frame` to the last-encoded index (`:592-593`).
- **Escalates to IDR** (returns false) if the range exceeds the DPB depth (`:597-600`); you can't invalidate more frames than the DPB holds.
- Otherwise calls `nvEncInvalidateRefFrames(encoder, i)` per frame in the range (`:602-607`). The encoder then re-references an older still-valid frame instead of emitting a costly IDR.

##### Input texture: register-once, zero-copy (`nvenc_d3d11_native.cpp:31-71`)

Apollo creates **its own** encoder-input `ID3D11Texture2D` (`D3D11_USAGE_DEFAULT`, `BIND_RENDER_TARGET`, format from `dxgi_format_from_nvenc_format`: NV12/P010/AYUV/R16, `nvenc_utils.cpp:14-31`). It then registers it once via `nvEncRegisterResource` (`resourceType=DIRECTX`, `bufferUsage=NV_ENC_INPUT_IMAGE`, `:53-67`). The capture/convert pipeline color-converts **into** `get_input_texture()` (`:26-29`), and the registration is reused for the whole session (the `synchronize_input_buffer` hook is the copy/convert point). Note that it rejects 10-bit 4:4:4 over D3D11 (`:32-35`); that path needs CUDA interop.

##### Resource teardown discipline (`destroy_encoder`, nvenc_base.cpp:459-488)

Teardown runs in strict reverse order, with failures logged but non-fatal: destroy bitstream buffer → unregister async event → `nvEncUnregisterResource` → `nvEncDestroyEncoder`. `create_encoder` itself is wrapped in a `util::fail_guard` that calls `destroy_encoder` on any early return (`:112-114`, disabled on success `:455`), so a half-built session never leaks.
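The decision half of the invalidation steps above is pure bookkeeping and can be kept separate from the NVENC call. The minimal Rust sketch below assumes a hypothetical `RfiState` and follows the listed order: unsupported, dedup, confirmation flag, expand to the last encoded frame, escalate, invalidate. Two details are this sketch's own choices rather than confirmed Apollo behavior:

- It escalates when the range length is `>=` the DPB depth, because invalidating every DPB slot would leave nothing valid to reference.
- It treats a reversed range as a request for an IDR.

```rust
/// What the caller should do after a loss report (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum RfiOutcome {
    /// An earlier request already invalidated this range.
    AlreadyCovered,
    /// Invalidate every frame index in first..=last (one encoder call per index).
    Invalidate { first: u64, last: u64 },
    /// RFI cannot help; fall back to a forced IDR.
    ForceIdr,
}

/// Hypothetical per-session RFI bookkeeping.
struct RfiState {
    supported: bool,          // from a SUPPORT_REF_PIC_INVALIDATION-style capability query
    dpb_depth: u64,           // e.g. 5 for H.264/HEVC, 8 for AV1, per the sizing above
    last_encoded: u64,        // index of the newest frame handed to the encoder
    last_range: Option<(u64, u64)>,
    needs_confirmation: bool, // tag the next encoded frame
}

impl RfiState {
    fn invalidate(&mut self, first: u64, last: u64) -> RfiOutcome {
        if !self.supported {
            return RfiOutcome::ForceIdr;
        }
        // Dedup: a range inside the previous one needs no new work.
        if let Some((lo, hi)) = self.last_range {
            if first >= lo && last <= hi {
                return RfiOutcome::AlreadyCovered;
            }
        }
        self.needs_confirmation = true;
        if last < first {
            return RfiOutcome::ForceIdr; // malformed request (defensive guard)
        }
        // Every frame encoded since `first` may reference a lost frame, so extend to the newest.
        let last = last.max(self.last_encoded);
        self.last_range = Some((first, last));
        // If the whole DPB would be invalid, nothing is left to predict from.
        if last - first + 1 >= self.dpb_depth {
            return RfiOutcome::ForceIdr;
        }
        RfiOutcome::Invalidate { first, last }
    }
}
```

The `Invalidate` arm is where `nvEncInvalidateRefFrames(encoder, i)` would be called once per index. Keeping the decision separate makes the dedup and escalation rules testable without a GPU.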
##### Error reporting

`nvenc_failed` (`:612-664`) maps every `NVENCSTATUS` to a readable string (`nvEncGetLastErrorString` is deliberately *avoided*; the comment at `:653` notes it returns garbage).

#### punktfunk today

#### punktfunk's NVENC-D3D11 today (`crates/punktfunk-host/src/encode/nvenc.rs`, `encode.rs`)

punktfunk drives the **raw NVENC API** via `nvidia_video_codec_sdk::{sys, ENCODE_API}` (the safe wrapper is CUDA-only) in a single struct `NvencD3d11Encoder` (`nvenc.rs:38`). It implements the same `Encoder` trait (`submit`/`request_keyframe`/`poll`/`flush`, `encode.rs:41-51`) as the Linux/SW encoders.

What it does well:

- **True zero-copy, register-in-place.** Unlike Apollo (which owns a dedicated input texture and color-converts into it), punktfunk registers the **capturer's own** texture by raw pointer, caches the registration in `regs: HashMap`, and calls `encode_picture` on it directly, with no per-frame `CopyResource` (`nvenc.rs:404-423`, comment `:1-12`). This is a genuinely tighter zero-copy path than Apollo's, valid because the encode loop is synchronous (`gamestream/stream.rs:338-344`, `punktfunk1.rs`).
- **Shared device.** The session is opened on the capturer's `ID3D11Device` carried on `FramePayload::D3d11` (`nvenc.rs:182-192, 394`) and re-initialized on device/size/HDR change (`:366-397`), which handles the secure-desktop device recreate.
- **Low-latency RC mirrors Apollo's intent:** P1 + `ULTRA_LOW_LATENCY` (`:206-207, 260-261`), infinite GOP (`gopLength = NVENC_INFINITE_GOPLENGTH`, `:218`), `frameIntervalP=1` (no B-frames, `:219`), CBR (`:220`), ~1-frame VBV (`vbvBufferSize=vbvInitialDelay=bitrate/fps`, `:225-227`).
- **Bitrate probe-and-step-down** on `initialize_encoder` InvalidParam, floor 10 Mbps (`:140-314`), equivalent to Apollo's level-cap handling but done by retry.
- **Split-encode** (Ada dual-NVENC) auto/forced with a pixel-rate threshold and a 10-bit carve-out, env `PUNKTFUNK_SPLIT_ENCODE` (`:150-173, 273`), plus a fallback that disables split if init rejects it (`:295-306`).
- **HDR / 10-bit:** Main10 profile + `pixelBitDepthMinus8(2)` + BT.2020/PQ VUI (`:233-254, 387-393`).
- Forced-IDR keyframe with in-band SPS/PPS via `FORCEIDR | OUTPUT_SPSPPS` (`:437-442`).

##### Where punktfunk is weaker / missing / fragile

1. **No reference-frame invalidation at all.** `request_keyframe()` only sets `force_kf`, which produces a full **IDR** (`nvenc.rs:437-442, 465-467`). There is no `nvEncInvalidateRefFrames`, no DPB depth, no RFI-range handling, and no `rfi_needs_confirmation` tagging. Apollo's entire `invalidate_ref_frames` (`nvenc_base.cpp:574-610`) and its deep-DPB-with-L0=1 design (`:268-281`) are absent. punktfunk relies wholly on FEC + IDR for loss recovery, so every recovery event is a costly keyframe spike (the exact thing the infinite-GOP design was meant to avoid). Note that `punktfunk1.rs:2153`/`gamestream/stream.rs:336` call `request_keyframe()` for "RFI", but it is always a full IDR.
2. **No async encode.** punktfunk runs a synchronous map→encode→lock with a `pending` VecDeque and `POOL=4` bitstreams (`:28, 459-460, 469-503`), but never uses a completion event, `enableEncodeAsync`, or `doNotWait`. Apollo overlaps GPU encode with CPU work via a Win32 event + 100 ms timeout (`nvenc_d3d11.cpp:15,64-66`, `nvenc_base.cpp:525,534-539`). punktfunk's `lock_bitstream` blocks with no timeout, so a hung encode wedges the encode thread forever.
3. **No capability gating.** punktfunk never calls `nvEncGetEncodeCaps`. It does not check `WIDTH_MAX`/`HEIGHT_MAX`, `SUPPORT_10BIT_ENCODE`, `SUPPORT_REF_PIC_INVALIDATION`, `SUPPORT_INTRA_REFRESH`, `SUPPORT_CUSTOM_VBV_BUF_SIZE`, or even verify that the codec GUID is supported (`nvEncGetEncodeGUIDs`). Apollo gates on all of these (`nvenc_base.cpp:130-255, 334`). punktfunk discovers limits only when `initialize_encoder` fails.
4. **No API-version/struct-version minimization.** punktfunk hard-codes `NVENCAPI_VERSION` and `*_VER` everywhere (`:186, 197, 258, ...`). On an older driver this risks `NV_ENC_ERR_INVALID_VERSION` for the whole session. Apollo's per-codec minimum-version + `min_struct_version` trick (`nvenc_base.cpp:103,666-680`) is the years-hardened compatibility layer punktfunk lacks.
5. **No slices/intra-refresh.** `sliceMode`/`slicesPerFrame` is never set, so frames are single-slice only: no per-slice FEC granularity and no NVENC intra-refresh option (Apollo `nvenc_base.cpp:257-266, 333-346`). The Linux path also has no intra-refresh.
6. **No VUI for the SDR path / limited colorspace signaling.** Only the HDR branch sets VUI (`:243-254`); SDR streams carry no `colourDescriptionPresentFlag` (BT.709 is left implicit). Apollo always writes VUI for H.264/HEVC (`nvenc_base.cpp:291-302`) and full color signaling for AV1.
7. **No AQ / two-pass / weighted-prediction / lookahead-off / zeroReorderDelay / lowDelayKeyFrameScale knobs.** punktfunk sets none of `zeroReorderDelay`, `enableLookahead=0`, `lowDelayKeyFrameScale=1` (Apollo `nvenc_base.cpp:240-242`), so the preset defaults apply. `lowDelayKeyFrameScale` in particular reduces the IDR latency spike.

#### Transfer opportunities

- **Add real reference-frame invalidation (RFI) instead of always forcing IDR** (sev high, large) — In nvenc.rs, add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query for NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.
- **Query nvEncGetEncodeCaps and gate config on real GPU capabilities** (sev medium, medium) — Add a `get_cap(cap: NV_ENC_CAPS) -> i32` helper in nvenc.rs after open_encode_session_ex (using API.get_encode_caps), verify codec_guid is in get_encode_guids, reject out-of-range WxH up front, and use SUPPORT_10BIT_ENCODE / SUPPORT_REF_PIC_INVALIDATION / SUPPORT_CUSTOM_VBV_BUF_SIZE to gate the corresponding config rather than assuming support. This surfaces clear errors instead of an opaque InvalidParam.
- **Use async encode with a Win32 completion event + timeout** (sev medium, medium) — In nvenc.rs, gate on NV_ENC_CAPS_ASYNC_ENCODE_SUPPORT, create a per-bitstream Win32 Event (windows::Win32::System::Threading::CreateEventW), set init.enableEncodeAsync=1, store the event in `pending`, set pic.completionEvent + lock.doNotWait=1, and in poll() call WaitForSingleObject(ev, 100ms) before lock_bitstream, returning a clear timeout error instead of blocking forever.
- **Minimize NvEnc API/struct versions per codec for older-driver compatibility** (sev medium, medium) — Add a `min_api_version(codec)` (v11 for H264/HEVC, v12 for AV1) and a helper that rewrites the version word (and optionally the struct-revision byte) before each NvEnc struct is passed, mirroring nvenc_base.cpp:666-680. Set apiVersion in open_encode_session_ex (nvenc.rs:186) from it. This maximizes driver compatibility in the field.
- **Add zeroReorderDelay/lookahead-off/lowDelayKeyFrameScale and always emit SDR VUI** (sev low, small) — In init_session set cfg.rcParams.zeroReorderDelay=1, enableLookahead=0, lowDelayKeyFrameScale=1 right after the CBR/VBV block (nvenc.rs:220-227). Add an SDR VUI branch (BT.709 primaries/transfer/matrix, limited range) alongside the existing HDR branch (:243) so every HEVC/H264 stream signals its colorspace.
- **Honor client slices-per-frame and offer NVENC intra-refresh** (sev low, medium) — Thread a slices-per-frame value from session negotiation into NvencD3d11Encoder::open and set hevcConfig/h264Config sliceMode=3 + sliceModeData in init_session; for AV1 set numTileRows/numTileColumns as nearest powers of two. Optionally add an intra-refresh config branch gated on NV_ENC_CAPS_SUPPORT_INTRA_REFRESH as an alternative recovery mode to RFI.

### Windows host input injection: SendInput / raw input / ViGEm gamepad (mouse, keyboard, scroll, touch/pen, X360/DS4, rumble/LED)

#### Apollo

#### Apollo Windows input — reference map

Two layers: the protocol-side dispatcher `src/input.cpp` (queue, batching, gamepad index allocation, edge-case workarounds) and the platform backend `src/platform/windows/input.cpp` (the actual `SendInput` / `InjectSyntheticPointerInput` / ViGEm calls). Future agents should treat `src/input.cpp` as "what to do" and `platform/windows/input.cpp` as "how Windows does it."

##### Desktop re-attach (the #1 hard-won robustness trick)

`send_input(INPUT&)` at `platform/windows/input.cpp:477-488` calls `SendInput(1,&i,sizeof)`. If it returns != 1, it calls `syncThreadDesktop()` and, only if the input desktop **changed** since last time (`_lastKnownInputDesktop`, a `thread_local` at :35), retries **once** via `goto retry`. The same pattern applies to pointer injection in `inject_synthetic_pointer_input` (:499-510). Key subtlety: it gates the retry on an *actual desktop change* to avoid an infinite retry loop when the failure has another cause, and the desktop handle is cached thread-local so it is not re-opened on every event.

##### Mouse — absolute

`abs_mouse` (:512-532): `MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK`.
The protocol layer (`src/input.cpp:519-555`, `client_to_touchport` :460-484) does the real coordinate work: it maps client coords through a **touch_port** (offset_x/y, width/height, env_width/env_height, scalar_inv) so multi-monitor / aspect-ratio-corrected setups land correctly, clamps to `[0,size]`, and applies client offsets; the platform layer then scales into the fixed `target_touch_port {0,0,65535,65535}` (:37-42). Note: the scaling at :525-526 is `(x+offset)*65535/width`, so coordinates are normalized to the **selected output region**, not blindly to the whole virtual desktop.

##### Mouse — relative

`move_mouse` (:534-545): plain `MOUSEEVENTF_MOVE` with raw deltas, no `ABSOLUTE`. The protocol layer accumulates: `passthrough(PNV_REL_MOUSE_MOVE_PACKET)` (:444-451) sets `mouse_left_button_timeout = DISABLE_LEFT_BUTTON_DELAY` to mark "we are in relative mode."

##### Mouse buttons — the left/right ordering workaround

`button_mouse` (:561-582) maps 1=L, 2=M, 3=R, 4=X1, 5=X2 with the `MOUSEEVENTF_*DOWN/UP` flags and `XBUTTON1/2` in `mouseData`. The clever part is in `src/input.cpp:573-616`: in **absolute** mode, Moonlight can send R-down right after L-up, causing a stray left-click on hyperlinks before the right-click registers. Apollo delays the `BUTTON_LEFT` release by **10 ms** via the task pool (:600), and if `BUTTON_RIGHT` arrives during that window it flushes a synthetic R down+up immediately (:604-614). This delay is applied **only** in absolute mode, so relative gaming input stays zero-latency. It also dedupes redundant button states via `mouse_press[]` (:565-571).

##### Scroll — vertical, horizontal, and high-res

`scroll`/`hscroll` (:584-606): `MOUSEEVENTF_WHEEL`/`MOUSEEVENTF_HWHEEL`, with the distance passed straight into `mouseData`.
The accumulation lives in `src/input.cpp:780-820`: if `high_resolution_scrolling` is off, deltas are **accumulated** (`accumulated_vscroll_delta`/`accumulated_hscroll_delta`) and only whole `WHEEL_DELTA` (120) ticks are emitted, with the remainder carried forward (:790-795, :813-818). This prevents apps that don't understand fractional wheel deltas from misbehaving, while a config flag passes raw high-res deltas through for apps that do.

##### Keyboard — scancode vs VK, normalization, extended keys

`keyboard_update` (:608-661) is the crown jewel:

- It prefers **scancodes** (`KEYEVENTF_SCANCODE`) over VK for maximum game compatibility (DirectInput games read scancodes).
- VK→scancode uses the checked-in US-English table `VK_TO_SCANCODE_MAP` in `keylayout.h` (256 entries, indexed by `modcode & 0xFF`, :618), GameStream's canonical layout. The client normalizes to US-English VK; the table is the inverse.
- If the client flagged the key as **non-normalized** (`SS_KBE_FLAG_NON_NORMALIZED`, :616) AND the host config `always_send_scancodes` is set, it falls back to `MapVirtualKey(modcode, MAPVK_VK_TO_VSC)` (:621), *assuming host==client layout*, and explicitly excludes `VK_LWIN/RWIN/PAUSE` because `MapVirtualKey` is broken for those (:619, comment :620).
- If no scancode maps, it sends the VK directly via `ki.wVk` (:629).
- A hardcoded **extended-key list** (:633-654: LWIN, RWIN, RMENU, RCONTROL, INSERT, DELETE, HOME, END, PRIOR, NEXT, arrows, DIVIDE, APPS) ORs in `KEYEVENTF_EXTENDEDKEY` because the table can't express the E0 prefix.

##### Unicode / dead keys

`unicode` (:1147-1173): `MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,...)`, then for the resulting UTF-16 units it sends `KEYEVENTF_UNICODE` down for all chars, then up for all chars (separate loops). This bypasses the keyboard layout entirely and is how non-US characters / dead-key results get injected.
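A small Rust sketch of the `unicode()` ordering above, kept platform-neutral so it runs anywhere. `UnicodeKey` and `unicode_events` are hypothetical names; each `UnicodeKey` models one `INPUT` record whose `wScan` carries a UTF-16 code unit with `KEYEVENTF_UNICODE` set, and Rust's `str::encode_utf16` plays the role of `MultiByteToWideChar(CP_UTF8, ...)`.

```rust
/// Platform-neutral stand-in for one `INPUT` record with `KEYEVENTF_UNICODE`:
/// the UTF-16 code unit goes in `wScan`, and `key_up` means `KEYEVENTF_KEYUP` is also set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UnicodeKey {
    pub w_scan: u16,
    pub key_up: bool,
}

/// Builds the event sequence in Apollo's order: every UTF-16 unit down, then every unit up.
/// Returns `None` for invalid UTF-8, analogous to `MB_ERR_INVALID_CHARS` rejecting the input.
pub fn unicode_events(utf8: &[u8]) -> Option<Vec<UnicodeKey>> {
    let text = std::str::from_utf8(utf8).ok()?;
    let units: Vec<u16> = text.encode_utf16().collect();
    let downs = units.iter().map(|&u| UnicodeKey { w_scan: u, key_up: false });
    let ups = units.iter().map(|&u| UnicodeKey { w_scan: u, key_up: true });
    Some(downs.chain(ups).collect())
}
```

Working on UTF-16 units rather than `char`s matters for characters outside the Basic Multilingual Plane, which arrive as surrogate pairs: an emoji yields two down events followed by two up events. On Windows each `UnicodeKey` would become an `INPUT` with `wVk = 0` passed to `SendInput`.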
##### Touch & pen — SyntheticPointerDevice (Win10 1809+)

Dynamically resolves `CreateSyntheticPointerDevice`/`InjectSyntheticPointerInput`/`DestroySyntheticPointerDevice` from user32.dll at startup (:466-468); the capability is advertised only if they are present (`get_capabilities` :1763-1781). Per-client device + state (`client_input_raw_t` :663-699): separate pen and touch devices per connected client. Hard-won details:

- **Contiguity**: Windows requires active touch slots to be contiguous; `perform_touch_compaction` (:715-737) swaps slots down and recomputes `activeTouchSlots` before every operation.
- **Slot allocation by pointer ID**: `pointer_by_id` (:746-776) maps client pointer IDs to slots and warns on missing down/up events.
- **Auto-cancel timeout**: Windows cancels injected pointer interactions if they are not refreshed within ~1 s; Apollo re-injects every **50 ms** (`ISPI_REPEAT_INTERVAL` :840) via task-pool `repeat_touch`/`repeat_pen` (:846-866).
- **Edge-triggered flags** (`POINTER_FLAG_DOWN|UP|CANCELED|UPDATE`, :900) are cleared after one frame (:1020, :1139).
- Pressure 0..1024, contact area computed from major/minor axis + rotation into `rcContact` (:980-1000), pen tilt converted from polar rotation+tilt to tiltX/tiltY (:1117-1130), barrel button + eraser flags (:1077-1094).

##### ViGEm gamepad — X360 + DS4

`vigem_t` (:198-407). `init()` (:200-215) probes ViGEm at startup so the web UI shows a fatal error immediately if ViGEmBus isn't installed (:207). It connects lazily on the first pad (:231-242). `gamepads` is a fixed `MAX_GAMEPADS` vector indexed by **globalIndex** while the wire uses **clientRelativeIndex** (:84, :228); feedback always goes back keyed by the client-relative index (:346-351).

- **Type selection** `alloc_gamepad` (:1175-1227): config override `x360`/`ds4`, else client-reported `LI_CTYPE_PS`→DS4 / `LI_CTYPE_XBOX`→X360, else `motion_as_ds4` (has accel/gyro)→DS4, else `touchpad_as_ds4`→DS4, else X360. Warns when capabilities will be lost (motion/touchpad/RGB on X360, :1207-1224).
- **X360** report: `x360_buttons` (:1244-1294), `x360_update_state` (:1302-1312), pushed with `vigem_target_x360_update`.
- **DS4** is much richer: `ds4_update_state` (:1432-1446), including stick conversion `to_ds4_triggerX/Y` (:1417-1425; note the Y axis is inverted and 0 maps to 0xFF), D-pad as a direction enum (:1314-1345), and the special PS/touchpad buttons (:1397-1415).
- **DS4 timestamp pump** `ds4_update_ts_and_send` (:1454-1481): some apps require a monotonically advancing `wTimestamp` (units of 5.333 µs, :1467-1468) to register DS4 input, so Apollo re-sends the report at least every 100 ms via the task pool (:1479). Without this, DS4 input silently stops registering.
- **DS4 motion** `ds4_update_motion` (:143-196): converts m/s² accel and deg/s gyro into DS4 scale and applies the **inverse of ViGEmBus's hardcoded calibration biases** (:157-170) so games read correct values; requests 100 Hz motion from the client at alloc (:258-259).
- **DS4 touchpad** `gamepad_touch` (:1521-1620): 2 pointers on a 1920×943 touchpad; packs X/Y into the DS4 touch byte layout (:1604-1608) and keeps per-pointer "is up" tracking numbers.
- **DS4 battery** `gamepad_battery` (:1654-1720): maps Moonlight battery state to the DS4 `bBatteryLvlSpecial` nibble layout.

##### Rumble / LED back-channel

`vigem_target_x360_register_notification`/`vigem_target_ds4_register_notification` (:275-278). The callbacks `x360_notify`/`ds4_notify` (:409-441) **push work to the task pool** so they don't block the driver callback thread (:421, :439-440). `rumble()` (:329-358) converts 8→16-bit, **dedupes duplicate rumble** (:344-345), and raises a `gamepad_feedback_msg_t` keyed by client_relative_index. `set_rgb_led()` (:367-384) does the same, with dedup. `forward_rumble` is the config gate (:331).

##### Gamepad index lifecycle & batching (`src/input.cpp`)

- A global `std::bitset gamepadMask` (:116) maps the client's controllerNumber → a free globalIndex via `alloc_id`/`free_id`.
- Created on `SS_CONTROLLER_ARRIVAL_MAGIC` (:852-869), or lazily on a multi-controller packet when `activeGamepadMask` newly sets the bit (:1096-1107); freed when the mask clears the bit (:1108-1112).
- **Back→Home emulation**: holding BACK for `back_button_timeout` synthesizes a Guide/Home press (:1135-1172), with a `back_button_state` state machine so the real BACK is suppressed/forced correctly.
- **Input queue + batching**: a dedicated thread drains a queue and coalesces consecutive same-type packets before injecting, via the `batch()` overloads (:1208-1414): rel-mouse sums deltas with an overflow guard, abs-mouse takes the latest (only if the reference w/h match), scroll sums, controller takes the latest only if buttons are unchanged, and touch/pen only batch move/hover for the same pointer. This decouples network jitter from injection and reduces SendInput call volume.

#### punktfunk today

#### punktfunk Windows input — current state

##### Mouse — `inject/sendinput.rs`

- **Relative** (`MouseMove`, :99-109): plain `MOUSEEVENTF_MOVE` with raw `event.x/y`. Equivalent to Apollo.
- **Absolute** (`MouseMoveAbs`, :110-132): normalizes client `(x,y)` over the client's reference extent (carried in `flags` as `w<<16|h` from `gamestream/input.rs:58`) to 0..65535 with `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK`. **Weaker than Apollo**: it ignores the actual virtual-desktop rect. `virtual_desktop_rect()` is fetched (:116) and then explicitly discarded (`let _ = (vw,vh)`, :122). There is no `touch_port`/offset/aspect-correction concept, so on a multi-monitor host, or where the virtual output is not at the virtual-desktop origin, the cursor lands in the wrong place. The comment at :117 even admits it assumes a single output spanning the desktop.
- **Buttons** (:133-187): 1..5 with X1/X2. **Missing the left/right ordering workaround**: there is no 10 ms left-up delay in absolute mode and no `mouse_press[]` dedup, so the hyperlink-misclick bug Apollo fixed is present.
- **Scroll** (`MouseScroll`, :188-205): `MOUSEEVENTF_WHEEL`/`HWHEEL`, passes `event.x` (already 120-scaled) straight through. **No high-res accumulation / WHEEL_DELTA quantization**: apps that can't handle fractional/large deltas may misbehave, and there is no config to choose high-res vs ticked.

##### Keyboard (:206-230)

- Uses `MapVirtualKeyExW(vk, MAPVK_VK_TO_VSC_EX, None)` to get a scancode including the E0 extended bits, and sends `KEYEVENTF_SCANCODE` (+ `EXTENDEDKEY` when `0xe000` is set or `forced_extended()` :270-275). This is actually *cleaner* than Apollo's static table in one way (it uses the live layout) but **weaker in others**: (1) it maps through the live host layout rather than a canonical US-English table, so the client's normalized US-English VK can map to the wrong scancode under a non-US host layout (Apollo deliberately uses the fixed US table for normalized keys and only falls back to `MapVirtualKey` for non-normalized ones); (2) if `MapVirtualKeyExW` returns 0, the key is silently dropped (:210-211) with **no VK fallback** (Apollo falls back to `ki.wVk`); (3) it ignores the client's normalization flag entirely (the GameStream decoder at `gamestream/input.rs:74` masks the VK but never surfaces `SS_KBE_FLAG_NON_NORMALIZED`).
- **No Unicode path**: `MAGIC_UTF8` is decoded to nothing (`gamestream/input.rs:83` lists UTF-8 as "not yet injected"), so dead keys / non-US text typed on the client never reach the host. Apollo's `unicode()` handles this.

##### Touch / pen

- **Entirely absent.** `TouchDown/Move/Up` are explicit no-ops (`sendinput.rs:232-236`) and pen has no `InputKind` at all. No `CreateSyntheticPointerDevice`, no capability advertisement.

##### Gamepad — `inject/gamepad_windows.rs`

- **X360 only.** `Xbox360Wired` via `vigem_client` (:17, :62). No DS4/DualSense on Windows (the `dualsense` module is `#[cfg(target_os = "linux")]`, `inject.rs:297-298`). So PS-controller clients lose motion, touchpad, lightbar, adaptive triggers, and the correct face-button glyphs on Windows.
- **No client-driven type selection**: always X360 regardless of `LI_CTYPE_PS` / motion / touchpad caps (the `Arrival` event is decoded but ignored; `gamepad_windows.rs:103` returns on anything but `State`).
- **Lazy plug on first State** (:59-100) with a rumble notification thread writing `large<<8|small` into an atomic; `pump_rumble` (:127-137) emits on change and scales 0..255→0..65535 by ×257. Functionally fine for X360 rumble. **No LED back-channel** (X360 has none, and with no DS4 there is also no lightbar relay). **No DS4 timestamp pump, no motion calibration, no battery, no touchpad**: none of these apply to X360, but all are missing because DS4 isn't implemented.
- Index handling is a plain `HashMap` (:33) keyed by `f.index.max(0)` (:106); no global/client-relative split, no activeGamepadMask add/remove lifecycle, no Back→Home emulation.

##### Dispatch / batching

- The GameStream decoder `gamestream/input.rs` produces one `InputEvent` per packet; there is **no input queue and no batching/coalescing** of rel-mouse/scroll/controller packets on the Windows path. Under jitter this means more SendInput calls and no delta-summing.

##### Desktop re-attach (a relative strength)

- `reattach_input_desktop()` is called **before every event** (:97): robust, but heavier than Apollo's lazy "only on send failure + only when the desktop actually changed" (`platform/windows/input.cpp:481-485`). punktfunk also surfaces a 0-count send as `Err` so the host drops and reopens the injector (:75-81), which is a reasonable self-heal.

#### Transfer opportunities

- **Map absolute mouse through the real virtual-desktop / output rect, not a blind 0..65535 normalize** (sev high, medium) — In sendinput.rs::MouseMoveAbs, query the target virtual output's rect (it is created at a known position by the SudoVDA display backend) and compute the absolute coords relative to the full virtual desktop: `ax = (out_x + cx*out_w - vx) / vw * 65535` (and the same for y, with `cx` the client x normalized to 0..1), using SM_X/Y/CX/CYVIRTUALSCREEN.
Carry the output origin/size from the display backend into the injector instead of dropping vw/vh.
- **Add a DS4 (DualShock4) ViGEm target on Windows with type auto-selection, motion, touchpad, battery and timestamp pump** (sev high, large) — In gamepad_windows.rs, add a DS4Wired branch via vigem_client::DualShock4Wired with a union/enum PadEntry. Resolve type from the decoded Arrival (precedence: explicit env/client choice > PS type > motion/touchpad caps > X360), mirroring the existing GAMEPAD-preference negotiation. Port Apollo's wTimestamp pump (5.333 µs units, re-send every 100 ms), motion calibration constants (:157-170), and the touchpad byte packing (:1604-1608). Surface the LED color via the existing 0xCA/feedback plane.
- **Use a canonical US-English VK→scancode table for normalized keys, and fall back to VK when no scancode maps** (sev medium, medium) — Port keylayout.h as a Rust const [u8;256] and index it for normalized keys; thread the non-normalized flag from gamestream/input.rs (decode the keyboard flags byte) into InputEvent.flags so sendinput.rs can choose table vs MapVirtualKeyExW. Add a VK fallback (set wVk, clear SCANCODE) when the scancode is 0 instead of dropping.
- **Inject UTF-8 text via KEYEVENTF_UNICODE (dead keys / non-US characters)** (sev medium, small) — Add a text InputKind (or a side channel) carrying the UTF-8 bytes; decode MAGIC_UTF8 in gamestream/input.rs; in sendinput.rs convert to UTF-16 (e.g. char::encode_utf16) and emit paired KEYEVENTF_UNICODE down/up INPUTs. Mirror Apollo's two-loop (all-down then all-up) ordering.
- **Add wheel-delta accumulation/quantization for low-res scroll consumers** (sev low, small) — Add accumulated_vscroll/hscroll fields to SendInputInjector and, behind a config/env toggle (default: ticked), sum deltas and emit only multiples of 120, carrying the remainder. Keep the raw-passthrough path for high-res-aware apps.
- **Add the absolute-mode left-button-release delay + button-state dedup** (sev low, medium) — Track whether the last mouse event was absolute (set on MouseMoveAbs, cleared on MouseMove) and a mouse_press[6] bitset in SendInputInjector. In absolute mode, defer left-up ~10 ms (a small timer/queued action on the injector thread) and short-circuit a pending right-down; dedupe identical button states. Guard so relative gaming input is never delayed.

### SudoVDA virtual display driver lifecycle (Windows host)

#### Apollo

#### Apollo's SudoVDA virtual-display lifecycle on Windows

Apollo drives the **SudoVDA** Indirect Display Driver (IDD) entirely from user mode over a device-interface-GUID + `DeviceIoControl` IOCTL protocol. There is no DLL link and no named pipe; the contract is the structs/IOCTLs in `third-party/sudovda/sudovda-ioctl.h`. There are two layers: the thin IOCTL wrappers in `sudovda.h`, and the lifecycle/policy layer in `src/platform/windows/virtual_display.cpp`, orchestrated by `src/process.cpp` and `src/main.cpp`.

##### IOCTL control protocol

`sudovda-ioctl.h:10-15` defines six `CTL_CODE(FILE_DEVICE_UNKNOWN, func, METHOD_BUFFERED, FILE_ANY_ACCESS)` codes: ADD=0x800, REMOVE=0x801, SET_RENDER_ADAPTER=0x802, GET_WATCHDOG=0x803, DRIVER_PING=0x888, GET_PROTOCOL_VERSION=0x8FF.

Structs (`sudovda-ioctl.h:35-64`):

- `VIRTUAL_DISPLAY_ADD_PARAMS { UINT Width, Height, RefreshRate; GUID MonitorGuid; CHAR DeviceName[14]; CHAR SerialNumber[14]; }` → out `VIRTUAL_DISPLAY_ADD_OUT { LUID AdapterLuid; UINT TargetId; }`.
- `VIRTUAL_DISPLAY_REMOVE_PARAMS { GUID MonitorGuid; }`; `SET_RENDER_ADAPTER { LUID AdapterLuid; }`; `GET_WATCHDOG_OUT { UINT Timeout, Countdown; }`; `PROTOCOL_VERSION { u8 Major, Minor, Incremental; bool TestBuild; }`.

`DeviceName`/`SerialNumber` are copied with `strncpy(..., 13)` (`sudovda.h:66-67`): 13 chars into a 14-byte field, which guarantees a NUL terminator provided the struct is zero-initialized (`strncpy` never writes the 14th byte).
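Because the control codes are plain `CTL_CODE` arithmetic, they are easy to sanity-check. The Rust sketch below assumes the standard Windows `CTL_CODE` bit layout (device type shifted left by 16, access by 14, function by 2, method in the low bits) with the `FILE_DEVICE_UNKNOWN`/`METHOD_BUFFERED`/`FILE_ANY_ACCESS` values named above; the `IOCTL_*` constant names and the `fixed_name` helper, which mimics `strncpy(..., 13)` into a zero-initialized 14-byte field, are hypothetical.

```rust
// Standard Windows constants used by SudoVDA's control codes.
const FILE_DEVICE_UNKNOWN: u32 = 0x22;
const METHOD_BUFFERED: u32 = 0;
const FILE_ANY_ACCESS: u32 = 0;

/// Same bit layout as the Windows `CTL_CODE` macro.
pub const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

/// SudoVDA control code for a given function number.
pub const fn sudovda_ioctl(function: u32) -> u32 {
    ctl_code(FILE_DEVICE_UNKNOWN, function, METHOD_BUFFERED, FILE_ANY_ACCESS)
}

// Hypothetical constant names; the function numbers are the ones listed in the header.
pub const IOCTL_ADD: u32 = sudovda_ioctl(0x800);
pub const IOCTL_REMOVE: u32 = sudovda_ioctl(0x801);
pub const IOCTL_SET_RENDER_ADAPTER: u32 = sudovda_ioctl(0x802);
pub const IOCTL_GET_WATCHDOG: u32 = sudovda_ioctl(0x803);
pub const IOCTL_DRIVER_PING: u32 = sudovda_ioctl(0x888);
pub const IOCTL_GET_PROTOCOL_VERSION: u32 = sudovda_ioctl(0x8FF);

/// Copies at most 13 bytes into a zeroed 14-byte field, so byte 13 always stays NUL,
/// like `strncpy(dst, src, 13)` into a zero-initialized `CHAR[14]`.
/// Like `strncpy`, it stops at an embedded NUL. Assumes ASCII input.
pub fn fixed_name(src: &str) -> [u8; 14] {
    let mut out = [0u8; 14];
    for (dst, &b) in out.iter_mut().zip(src.as_bytes().iter().take(13)) {
        if b == 0 {
            break;
        }
        *dst = b;
    }
    out
}
```

A check like this is a cheap way to confirm that punktfunk's own codes (`sudovda.rs:47-54`) agree with the header.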
##### Device detection / open (with retry/backoff)

`OpenDevice` (`sudovda.h:21-62`) uses `SetupDiGetClassDevs(SUVDA_INTERFACE_GUID, DIGCF_PRESENT|DIGCF_DEVICEINTERFACE)`, enumerates interfaces, gets the detail path, then calls `CreateFileA(path, GENERIC_READ|GENERIC_WRITE, FILE_SHARE_READ|WRITE, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED | FILE_FLAG_WRITE_THROUGH)`. Note the **OVERLAPPED + NO_BUFFERING + WRITE_THROUGH** flags: IDD control I/O is unbuffered and async-capable.

`openVDisplayDevice` (`virtual_display.cpp:562-586`) wraps this in a **retry loop with exponential backoff**: 20 ms doubling up to 320 ms before giving up with `DRIVER_STATUS::FAILED`. The driver's WUDF service can be slow to come up after boot/install, so a single open attempt is unreliable.

##### Protocol version compatibility gate

After open, `CheckProtocolCompatible` (`sudovda.h:186-193`) calls GET_PROTOCOL_VERSION and `isProtocolCompatible` (`sudovda.h:170-184`): **reject if `Major` differs** (breaking IOCTL change), and **reject if our compiled `Minor` is newer than the driver's** (we'd issue IOCTLs the driver lacks). `openVDisplayDevice` returns `DRIVER_STATUS::VERSION_INCOMPATIBLE` and closes the handle if this fails (`virtual_display.cpp:579-583`). The compiled-against version is `VDAProtocolVersion = {0,2,1,true}` (`sudovda-ioctl.h:25`).

##### Watchdog / keepalive with failure escalation

SudoVDA tears down **all** virtual displays if it isn't pinged inside its watchdog window. `startPingThread` (`virtual_display.cpp:588-622`) reads `GetWatchdogTimeout`; if `Timeout != 0`, it spawns a detached thread that sleeps `Timeout*1000/3` ms between calls to `PingDriver` (the DRIVER_PING IOCTL). Crucially, it **counts consecutive failures**: once `fail_count > 3`, it invokes the `failCb` and exits the thread.
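The three robustness mechanisms above (open retry with backoff, the protocol-version gate, and watchdog failure counting) are the same ones listed as punktfunk gaps further down, so here is a minimal Rust sketch of all three as pure, testable functions. It is illustrative only: every name is hypothetical, the closures stand in for the SetupDi/`CreateFile` open, the DRIVER_PING IOCTL and `thread::sleep`, and the exact number of open attempts (one final try after the 320 ms sleep) is an assumption of the sketch.

```rust
use std::time::Duration;

/// SudoVDA protocol version triple (the `TestBuild` flag is ignored in this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ProtocolVersion {
    pub major: u8,
    pub minor: u8,
    pub incremental: u8,
}

/// Same Major, and the driver's Minor is at least the Minor we compiled against
/// (otherwise we might issue IOCTLs the driver lacks).
pub fn is_protocol_compatible(compiled: ProtocolVersion, driver: ProtocolVersion) -> bool {
    compiled.major == driver.major && compiled.minor <= driver.minor
}

/// Retries `open` with a 20 ms delay that doubles after each failure, giving up once
/// the next delay would exceed 320 ms. `sleep` is injected so tests don't block.
pub fn open_with_backoff<T, E>(
    mut open: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut delay_ms = 20u64;
    loop {
        match open() {
            Ok(handle) => return Ok(handle),
            Err(e) if delay_ms > 320 => return Err(e),
            Err(_) => {
                sleep(Duration::from_millis(delay_ms));
                delay_ms *= 2;
            }
        }
    }
}

/// Why the keepalive loop ended.
#[derive(Debug, PartialEq, Eq)]
pub enum PingExit {
    Stopped,
    WatchdogFailed,
}

/// Pings every `watchdog_timeout_secs / 3`, resets the counter on success, and reports
/// `WatchdogFailed` after more than 3 consecutive failures.
pub fn run_pinger(
    watchdog_timeout_secs: u32,
    mut ping: impl FnMut() -> bool,
    mut should_stop: impl FnMut() -> bool,
    mut sleep: impl FnMut(Duration),
) -> PingExit {
    if watchdog_timeout_secs == 0 {
        return PingExit::Stopped; // no watchdog, so no ping thread is needed
    }
    let interval = Duration::from_millis(u64::from(watchdog_timeout_secs) * 1000 / 3);
    let mut fail_count = 0u32;
    while !should_stop() {
        if ping() {
            fail_count = 0;
        } else {
            fail_count += 1;
            if fail_count > 3 {
                return PingExit::WatchdogFailed; // caller closes the device and re-inits
            }
        }
        sleep(interval);
    }
    PingExit::Stopped
}
```

On `WatchdogFailed`, a port would mirror Apollo's callback: mark the driver status as failed, close the device handle, and re-open it before the next session.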
The callback `onVDisplayWatchdogFailed` (`process.cpp:65-68`) sets `vDisplayDriverStatus = WATCHDOG_FAILED` and calls `closeVDisplayDevice()`, so the host knows the IDD is dead and re-inits it on the next session (`process.cpp:243-246`) or in `refresh()`'s 5-try re-init loop (`process.cpp:1571-1579`).

##### Render-adapter binding (multi-GPU)

`setRenderAdapterByName` (`virtual_display.cpp:624-654`) enumerates DXGI adapters, matches `desc.Description` to the user-configured `adapter_name`, and issues SET_RENDER_ADAPTER with that adapter's LUID. This binds the IDD's *render* GPU to the same GPU the encoder/capture uses, which is essential on hybrid/multi-GPU boxes (otherwise the IDD renders on the iGPU while NVENC sits on the dGPU). It is called before every create in `main.cpp:369-371` and `process.cpp:250-252`.

##### Per-client monitor identity

The `MonitorGuid` is derived **per client / per app**, not a constant (`process.cpp:256-279`): for app-identity displays it XORs the client UUID with the app UUID; otherwise it uses the launch session's `unique_id`. The GUID is then memcpy'd into `launch_session->display_guid` (`process.cpp:279`), and REMOVE keys off it. Windows persists each GUID's monitor layout/position independently, so distinct clients get stable, non-colliding monitors. A separate fixed `PROBE_DISPLAY_UUID` is used for the startup encoder-probe display.

##### Add → resolve GDI name → mode-set flow

`createVirtualDisplay` (`virtual_display.cpp:656-689`) issues the AddVirtualDisplay IOCTL, then calls `GetAddedDisplayName` in a **20→320 ms backoff loop** (`virtual_display.cpp:674-683`). `GetAddedDisplayName` (`sudovda.h:215-247`) matches on the **stable `TargetId`** from the ADD output via `QueryDisplayConfig`, then resolves the `\\.\DisplayN` GDI name via `DISPLAYCONFIG_SOURCE_DEVICE_NAME`. The mode in ADD only *advertises* the timing; the desktop is then forced to it by `changeDisplaySettings` (`virtual_display.cpp:265-309`), which:

1. First applies a **baseline** via the legacy `ChangeDisplaySettingsExW`/`DEVMODEW` path with refresh-rate rounding, including an **alt refresh rate** fallback (±1 Hz, `virtual_display.cpp:271-300`), because IDDs frequently advertise e.g. 60 vs 59 and the exact integer rate fails.
2. Then fine-tunes via the modern CCD path `changeDisplaySettings2` (`virtual_display.cpp:64-263`): `QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS)`, find the path whose source GDI name matches, set `sourceMode.width/height` and `targetInfo.refreshRate = {rate,1000}` (millihertz, allowing sub-integer refresh rates), then `SetDisplayConfig(SDC_APPLY | SDC_USE_SUPPLIED_DISPLAY_CONFIG | SDC_SAVE_TO_DATABASE)`. `SDC_SAVE_TO_DATABASE` **persists** the mode across reconnects.

##### Multi-monitor isolation ("isolated virtual display")

The most hard-won code. `changeDisplaySettings2(..., bApplyIsolated=true)` (`virtual_display.cpp:86-201`) positions the virtual display to the **lower-right of the (0,0)-anchored primary** and rearranges every other display so the desktop topology stays gap-free and connected. `rearrangeVirtualDisplayForLowerRight` (`virtual_display.cpp:914-1169`) and `moveToBeConnected` (`virtual_display.cpp:732-910`) implement a geometric box-packing algorithm: it computes corner/vertical/horizontal adjacency between display rectangles and iteratively slides disconnected displays until every monitor "touches" (Windows rejects a topology with gaps/overlaps). The primary-display offset is preserved so the user's primary doesn't move (`virtual_display.cpp:151-174`). This took years because Windows silently rejects `SetDisplayConfig` for any non-contiguous arrangement.

##### HDR control keyed to a display

`findDisplayIds` (`virtual_display.cpp:386-423`) resolves `(adapterId LUID, targetId)` from a GDI name.
`getDisplayHDR` (`virtual_display.cpp:425-516`) matches the adapter LUID across DXGI factory adapters, then matches the output's HMONITOR GDI name and reads `IDXGIOutput6::GetDesc1().ColorSpace == RGB_FULL_G2084_NONE_P2020`. `setDisplayHDR` (`virtual_display.cpp:518-527`) uses `DisplayConfigSetDeviceInfo(SET_ADVANCED_COLOR_STATE)`. Apollo saves the initial HDR state and **reverts it on teardown** (`process.cpp:526-543`, `process.cpp:749-757`).

##### Teardown

`removeVirtualDisplay` (`virtual_display.cpp:691-702`) issues REMOVE by GUID. The host only removes if `vDisplayDriverStatus == OK && launch_session->virtual_display` (`process.cpp:759-768`), and crucially sets `launch_session->virtual_display = true` immediately after ADD **regardless of whether the name resolved** (`process.cpp:300-302`) — "the display might still exist even if we couldn't get its name, so we must still track it for removal." After teardown of a virtual-display session it calls `display_device::reset_persistence()` (`process.cpp:341-343`) since Windows already restored the prior topology.

##### Driver install

Apollo does **not** install SudoVDA programmatically — the INF (`src_assets/windows/drivers/sudovda/SudoVDA.inf`: `Root\SudoMaker\SudoVDA`, `UmdfExtensions=IddCx0102`, signed via `sudovda.cat`/`sudovda.cer`) is installed by the MSI installer. (Apollo's only programmatic `DiInstallDriverW` use is for Steam audio, `audio.cpp:1048-1103`.)

#### punktfunk today

#### What punktfunk does today

punktfunk's SudoVDA backend lives in `crates/punktfunk-host/src/vdisplay/sudovda.rs` behind the `VirtualDisplay` trait (`vdisplay.rs:47-53`), opened via `vdisplay::open` (`vdisplay.rs:525-530`, compositor arg ignored on Windows). It is a faithful but **thinner** port of Apollo.

**What it has:**

- IOCTL codes computed correctly: ADD/REMOVE/GET_WATCHDOG/DRIVER_PING/GET_VERSION (`sudovda.rs:47-54`). Device open via SetupDi + `CreateFileW` (`sudovda.rs:364-400`).
- ADD with mode baked in, AddOut → `target_id`/`luid` (`sudovda.rs:448-478`).
- GDI-name resolution from the **stable target id** via CCD (`resolve_gdi_name`, `sudovda.rs:106-140`) — re-resolved on ACCESS_LOST by the DXGI capture path (`capture/dxgi.rs:1259`).
- `set_active_mode` (`sudovda.rs:146-265`) enumerates advertised modes, picks exact-or-highest-≤-requested refresh at the requested resolution, `CDS_TEST` before apply, and applies as **PRIMARY** (`CDS_UPDATEREGISTRY|CDS_GLOBAL|CDS_SET_PRIMARY`) so DWM composites it.
- Multi-monitor isolation via `isolate_displays`/`restore_displays`/`reassert_isolation` (`sudovda.rs:276-362`) — detach-all-but-virtual for secure-desktop capture, with saved-mode restore on teardown. (This is a *different, simpler* approach than Apollo's box-packing — it detaches rather than rearranges.)
- RAII teardown: pinger stop + restore displays + REMOVE by GUID (`SudoVdaKeepalive::drop`, `sudovda.rs:559-580`).
- `probe`/`is_available` open-and-close checks (`sudovda.rs:583-594`).

**Where it is weaker / missing / fragile:**

1. **Protocol version is read but never gated.** `SudoVdaDisplay::new` logs `ver[0..3]` (`sudovda.rs:414-423`) but there is no `isProtocolCompatible` equivalent — a Major-version-incompatible or older-than-compiled driver is used anyway, where Apollo refuses (`sudovda.h:170-193`). No `VERSION_INCOMPATIBLE` status exists.
2. **Watchdog pinger never detects failure.** The ping thread (`sudovda.rs:485-494`) discards the `ioctl` Result (`let _ = ...`) and never counts failures or invokes a callback. Apollo escalates after >3 misses → closes the device + marks WATCHDOG_FAILED → re-inits (`virtual_display.cpp:603-616`, `process.cpp:65-78`). punktfunk has no `WATCHDOG_FAILED` concept and no re-init path.
3. **No open retry/backoff.** `open_device` (`sudovda.rs:364-400`) tries exactly once; Apollo retries 20→320ms (`virtual_display.cpp:562-577`).
Right after boot/driver-install the WUDF service may not be ready, so a single attempt can spuriously fail.
4. **No SET_RENDER_ADAPTER (IOCTL 0x802).** punktfunk omits the code entirely; there is no `setRenderAdapterByName` equivalent. On a hybrid/multi-GPU box the IDD may render on the wrong GPU relative to NVENC. Apollo binds it before every create (`virtual_display.cpp:624-654`, `process.cpp:250-252`).
5. **Single hardcoded MonitorGuid.** `MONITOR_GUID` is one constant (`sudovda.rs:56-59`) with an explicit TODO for concurrency. Apollo derives a per-client/per-app GUID (`process.cpp:256-279`), so distinct clients persist independent monitor layouts and don't collide.
6. **Refresh-rate handling is integer-only.** `set_active_mode` uses `DEVMODEW.dmDisplayFrequency` (integer Hz) only; it lacks Apollo's millihertz CCD path (`targetInfo.refreshRate = {rate,1000}`, `virtual_display.cpp:234`) and the ±1 Hz baseline alt-rate fallback (`virtual_display.cpp:271-300`). Fractional/near-miss rates (59.94, 119.88) and exact high rates can fail to apply.
7. **Mode is not persisted to the CCD database.** punktfunk uses `ChangeDisplaySettingsExW` (`sudovda.rs:237-248`); Apollo's `changeDisplaySettings2` uses `SetDisplayConfig(... SDC_SAVE_TO_DATABASE)` (`virtual_display.cpp:178-186`) so the mode survives reconnects.

#### Transfer opportunities

- **Detect watchdog ping failures and escalate (re-open the device)** (sev high, medium) — In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
- **Gate on SudoVDA protocol-version compatibility instead of only logging it** (sev medium, small) — In SudoVdaDisplay::new (sudovda.rs:412-432) parse {Major,Minor,Incremental} and compare against a compiled-in EXPECTED_PROTOCOL {Major:0,Minor:2}. If Major differs or our Minor > driver Minor, return Err with a 'driver too old / incompatible — update SudoVDA' message (and a distinct error variant the mgmt API can surface, like Apollo's VirtualDisplayDriverReady in nvhttp.cpp:936).
- **Retry device open with exponential backoff** (sev medium, small) — Wrap open_device in SudoVdaDisplay::new (sudovda.rs:412-413) in a 20→320ms backoff loop matching Apollo; on a session-time re-open after watchdog failure, allow a few retries with ~1s spacing.
- **Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU** (sev high, medium) — Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs. Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter.
- **Derive a stable per-client MonitorGuid instead of one global constant** (sev medium, medium) — Pass a client/session identifier into create() (thread it from the m3 handshake) and derive the GUID deterministically from it (e.g. hash the client cert fingerprint into a u128), replacing the constant at sudovda.rs:452-456 and the RemoveParams guid at sudovda.rs:568. Keep a fixed probe GUID for the startup encoder probe like Apollo's PROBE_DISPLAY_UUID.
- **Add millihertz CCD mode-set with ±1 Hz fallback and SDC_SAVE_TO_DATABASE persistence** (sev medium, medium) — In set_active_mode (sudovda.rs:146-265), after the integer DEVMODE attempt add a CCD path: QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS), match the path by GDI name, set sourceMode width/height and targetInfo.refreshRate = {millihertz,1000} (e.g. {59940,1000} for 59.94 Hz; with a denominator of 1000 the numerator must be in mHz, not Hz), and call SetDisplayConfig with SDC_APPLY|SDC_USE_SUPPLIED_DISPLAY_CONFIG|SDC_SAVE_TO_DATABASE. Add an alt-rate (±1) retry mirroring virtual_display.cpp:294-300.

### Windows host: running as SYSTEM, secure-desktop capture, session/desktop switching + D3D recreation, NVIDIA driver prefs (nvprefs), GPU/adapter preference, display isolation, mDNS publish

#### Apollo

#### How Apollo (Sunshine lineage) solves the Windows SYSTEM / secure-desktop / GPU-pref problem

Apollo splits the problem across **two processes**: a tiny **Windows service** (`SunshineSvc.exe`, `tools/sunshinesvc.cpp`) that runs as LocalSystem in Session 0 and exists only to *launch* the real host into the interactive console session, and **Sunshine.exe** itself (the host), which runs as SYSTEM in the interactive session and does capture/encode/input. This is the model that took Sunshine years to harden.

##### 1. The service: launching the host as SYSTEM into the interactive session (`tools/sunshinesvc.cpp`)

- Registers a control handler with `RegisterServiceCtrlHandlerEx` (`sunshinesvc.cpp:170`) and **opts into `SERVICE_ACCEPT_SESSIONCHANGE`** plus stop/preshutdown (`:239`). This is the load-bearing detail: the service is notified on `WTS_CONSOLE_CONNECT` (`:31`) and signals a `session_change_event`.
- It loops every 3 s (`:244`): `WTSGetActiveConsoleSessionId()` (`:245`) → `DuplicateTokenForSession()` (`:95`). That helper does `OpenProcessToken(TOKEN_DUPLICATE)` → `DuplicateTokenEx(TOKEN_ALL_ACCESS, SecurityImpersonation, TokenPrimary)` → **`SetTokenInformation(TokenSessionId, console_session_id)`** (`:111`). In other words,
it clones its own LocalSystem token and re-stamps it into the *console* session; this is how you get a SYSTEM process to appear in Session 1.
- Launch: `CreateProcessAsUserW(console_token, "Sunshine.exe", …, EXTENDED_STARTUPINFO_PRESENT, …, startup_info)` with **`startup_info.StartupInfo.lpDesktop = "winsta0\\default"`** (`:219`, `:267`), explicitly targeting the interactive window station + default desktop.
- **Job object per session** (`CreateJobObjectForChildProcess`, `:54`): `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE` (kill the host if the service dies, so no orphan holds the log handle) + `JOB_OBJECT_LIMIT_BREAKAWAY_OK` (so the host can spawn games that outlive it). Job objects cannot span sessions, so it builds a new one each loop (`:256`).
- **Session-change handling = restart, not in-process fixup** (`:276`): the service waits on `{stop_event, host_process, session_change_event}`. On a real console-session change (it re-checks `WTSGetActiveConsoleSessionId() != console_session_id`, `:279`) it **terminates and re-launches the host into the new session**. Graceful termination is itself a child-process trick: `RunTerminationHelper` (`:133`) re-launches the service binary with `--terminate ` into the user session (`lpDesktop=winsta0\default`, `:145`); the helper calls `AttachConsole(pid)` + `GenerateConsoleCtrlEvent(CTRL_C_EVENT)` (`:325`) to send a clean Ctrl-C, with a 20 s timeout before falling back to `TerminateProcess` (`:291`).
- The log handle is created inheritable in `%TEMP%\sunshine.log` and passed via `PROC_THREAD_ATTRIBUTE_HANDLE_LIST` so only that one handle is inherited (`:236`).

##### 2. Capture attaches to the input desktop, not the service model — `syncThreadDesktop()` (`misc.cpp:251`)

The single most-reused primitive: `OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK, FALSE, GENERIC_ALL)` → `SetThreadDesktop(hDesk)` → `CloseDesktop` (`misc.cpp:251-268`). The capture thread calls this to bind to whatever desktop currently has input focus (Default ↔ Winlogon/secure).
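Because `SetThreadDesktop` only changes the calling thread's desktop, and the input desktop can switch between two attempts, the safe pattern is "re-bind, then try, then sleep briefly and repeat". A std-only sketch of that shape, assuming a hypothetical helper name `retry_with_resync`; `resync` stands in for `syncThreadDesktop()` / punktfunk's `attach_input_desktop()`, and `attempt` for a `DuplicateOutput` call. Apollo's values (described next) are 2 tries and a 200 ms sleep.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `resync` (e.g. OpenInputDesktop + SetThreadDesktop) and then `attempt`
/// (e.g. DuplicateOutput), up to `tries` times, sleeping `delay` after each
/// failed try. `attempt` always runs at least once; the last error is returned.
fn retry_with_resync<T, E>(
    tries: u32,
    delay: Duration,
    mut resync: impl FnMut(),
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut failures = 0;
    loop {
        // Re-bind on every try: the input desktop may have switched
        // (Default <-> Winlogon) since the previous attempt.
        resync();
        match attempt() {
            Ok(value) => return Ok(value),
            Err(err) => {
                failures += 1;
                if failures >= tries {
                    return Err(err);
                }
                sleep(delay);
            }
        }
    }
}
```

With Apollo's numbers this would be called as `retry_with_resync(2, Duration::from_millis(200), resync, attempt)`; keeping the resync inside the helper makes it impossible to retry without re-binding first.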
Hard-won edge cases in how `syncThreadDesktop()` is used:

- Called **before every `DuplicateOutput`/`DuplicateOutput1` attempt**, inside a 2-iteration retry with a 200 ms sleep (`display_base.cpp:67-76`, `:95-104`) — the secure-desktop switch races mode changes, so one attempt is not enough.
- During **display enumeration** it is called **exactly once up front** (`display_base.cpp:1045`) and `test_dxgi_duplication(..., enumeration_only=true)` deliberately does NOT re-sync per-output (`:387-392`), with a comment that enumeration must "either fully succeed or fully fail, otherwise it can lead to the capture code switching monitors unexpectedly."

##### 3. WinSta/desktop switch detection + D3D recreation — `IDXGIFactory::IsCurrent` + `capture_e::reinit` (`display_base.cpp`)

- The capture loop calls **`factory->IsCurrent()` once per frame** (`display_base.cpp:235`); if false (HDR toggle, GPU/display topology change) it returns `capture_e::reinit`, which tears down and recreates the whole DXGI/D3D stack.
- `AcquireNextFrame` maps `WAIT_ABANDONED | DXGI_ERROR_ACCESS_LOST | DXGI_ERROR_ACCESS_DENIED → capture_e::reinit` (`:149-152`); `ReleaseFrame` maps `DXGI_ERROR_ACCESS_LOST → reinit` and treats `DXGI_ERROR_INVALID_CALL` (frame already released) as benign (`:178-183`).
- **The unfair-lock starvation edge case** (`:289-308`): the D3D11 device lock is held the whole time `AcquireNextFrame` blocks; when nothing is updating the screen it spins ~100% in that call and starves the encoder thread, making reinit take *seconds*. Fix: on timeout, `sleep_for(10ms)` *without* the lock held.
- `SetThreadExecutionState(ES_CONTINUOUS | ES_DISPLAY_REQUIRED)` keeps the display awake during capture, with a comment about the "neverending wake/sleep cycle" if the display sleeps and triggers reinit (`:224`). On output-not-found it powers the display on and retries (`:532-536`).

##### 4. GPU/adapter preference — the DXGI output-reparenting hook (`display_base.cpp:417-454`)

This is the deepest hard-won trick.
On Optimus/hybrid laptops and multi-GPU desktops, DXGI performs "GPU preference resolution" (registry + power settings + hybrid-adapter DDI) and **reparents outputs onto the render GPU**, which **breaks Desktop Duplication** (the output is no longer where DDA expects it). Apollo uses **MinHook** to hook `win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue` (`:451-453`) and force `D3DKMT_GPU_PREFERENCE_STATE_UNSPECIFIED` (`:424`), preventing the reparenting entirely. It also sets `SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2)` once (`:443`).

Adapter selection itself is a simple name-match loop (`config::video.adapter_name`) over `EnumAdapters1`/`EnumOutputs`, taking the first output that is `AttachedToDesktop` AND passes `test_dxgi_duplication` (`:480-526`).

##### 5. NVIDIA driver prefs — nvprefs (`nvprefs_interface.cpp`, `driver_settings.cpp`)

A full NvAPI DRS (Driver Settings) manager. `streaming_will_start()` (`misc.cpp:1153-1161`) re-applies it on every stream start (in case the user changed panel settings):

- **Application profile** "SunshineStream" for `sunshine.exe` (`driver_settings.cpp:216`): creates the profile + application entry if missing, sets **`PREFERRED_PSTATE = PREFERRED_PSTATE_PREFER_MAX`** (max GPU clocks for the encoder, `:288-308`), removing it again if the user disables high-power mode.
- **Global/base profile**: sets **`OGL_CPL_PREFER_DXPRESENT = ENABLED`** (`driver_settings.cpp:184-206`) so OpenGL/Vulkan apps present through a DXGI swapchain — required for WGC/DDA to capture GL/Vulkan fullscreen apps. This is gated on `opengl_vulkan_on_dxgi` (`:166`).
- **Crash-safety via an undo file** (`nvprefs_interface.cpp`): before changing the *global* profile it writes a JSON undo file in `%ProgramData%\Sunshine\nvprefs_undo.json` (`:53`, `:142-183`), so if the process dies mid-stream the next launch restores the user's panel state (`restore_from_and_delete_undo_file_if_exists`, `:71`).
Application-profile changes don't need undo (they're Apollo's own profile).

NvAPI is loaded dynamically and `NvAPI_Initialize()` failing is treated as "no NVIDIA card, ignore" (`driver_settings.cpp:44-48`).

##### 6. mDNS service publish — Windows-native `DnsServiceRegister` (`publish.cpp`)

Loads `dnsapi.dll` dynamically (`:218`) and uses **`DnsServiceRegister` / `DnsServiceDeRegister` / `DnsServiceFreeInstance`** (`:92`): the OS's own mDNS responder, not a bundled library. RAII `mdns_registration_t` registers on construct and deregisters on destruct (`:172-197`). The async API is bridged to sync with a `safe::alarm_t` waited in `service()` (`:157`).

**Hard-won RFC-1035 edge case** (`:120-133`): a TXT record with zero strings is illegal; Windows emits exactly that unless you set one empty key/value pair, and **Apple's mDNS resolver rejects the whole answer otherwise**. If `dnsapi.dll` can't load, it logs "add PC manually from Moonlight" and continues (`:221`).

##### 7. Process/stream tuning at stream start (`misc.cpp:1108-1253`)

`streaming_will_start()`: `DwmEnableMMCSS(true)` (`:1139`); **`NtSetTimerResolution` to max** (0.5 ms), falling back to `timeBeginPeriod(1)` (`:1142`); `SetPriorityClass(HIGH_PRIORITY_CLASS)` (`:1151`); re-applies nvprefs (`:1154`); enables **WLAN media-streaming low-latency mode** on connected Wi-Fi NICs via dynamically-loaded `wlanapi.dll` (`:1164-1192`); and, cleverly, if `!GetSystemMetrics(SM_MOUSEPRESENT)` it **enables Mouse Keys** to force the cursor to render on a headless box (`:1195-1219`). `streaming_will_stop()` reverts every one of these.

There is also an `is_user_session_locked()` / `test_no_access_to_ccd_api()` pre-flight (`utils.cpp:184-237`) using `WTSQuerySessionInformationW(WTSSessionInfoEx)` and a `SetDisplayConfig(SDC_VALIDATE)` probe to detect "we can't change display settings right now" before trying.
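The `streaming_will_start()`/`streaming_will_stop()` pair is an apply list with a matching revert list. In Rust that pairing maps naturally onto an RAII guard that records one undo action per step that succeeded and replays them in reverse order on drop, so a session that ends early still reverts. A std-only sketch; `TuningGuard` is a hypothetical name, and the closures stand in for the real calls (for example `timeBeginPeriod(1)`/`timeEndPeriod(1)`, `DwmEnableMMCSS(true)`/`(false)`, `SetPriorityClass`). Logging and skipping a failed step instead of aborting is this sketch's own choice, not documented Apollo behaviour.

```rust
/// Collects undo actions for session-scoped system tweaks and runs them in
/// reverse order when the guard is dropped (end of session or early return).
struct TuningGuard {
    undo: Vec<(&'static str, Box<dyn FnOnce()>)>,
}

impl TuningGuard {
    fn new() -> Self {
        TuningGuard { undo: Vec::new() }
    }

    /// Run `apply`; only if it succeeds is `revert` remembered for teardown.
    /// A failed step is logged and skipped rather than failing the session.
    fn step<E: std::fmt::Debug>(
        &mut self,
        name: &'static str,
        apply: impl FnOnce() -> Result<(), E>,
        revert: impl FnOnce() + 'static,
    ) {
        match apply() {
            Ok(()) => {
                let revert: Box<dyn FnOnce()> = Box::new(revert);
                self.undo.push((name, revert));
            }
            Err(e) => eprintln!("stream tuning: {name} skipped: {e:?}"),
        }
    }
}

impl Drop for TuningGuard {
    fn drop(&mut self) {
        // Last applied, first reverted (the streaming_will_stop() role).
        while let Some((name, revert)) = self.undo.pop() {
            eprintln!("stream tuning: reverting {name}");
            revert();
        }
    }
}
```

Only steps that actually took effect get reverted, which avoids, for example, calling `timeEndPeriod` without a matching successful `timeBeginPeriod`.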
#### punktfunk today

#### What punktfunk does today (with file:line) and where it is weaker

punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely mature** — in the DXGI-on-virtual-display case it is arguably more aggressive than Apollo's, because it has to keep a *virtual* output as the only display through Winlogon switches:

- **Desktop attach**: `attach_input_desktop()` (`capture/dxgi.rs:166`) = `OpenInputDesktop` + `SetThreadDesktop`, called at open (`:882`) and on every recovery (`:1262`) — punktfunk's equivalent of Apollo's `syncThreadDesktop()`.
- **Desktop-switch detection**: `DesktopWatcher` (`capture/desktop_watch.rs`) polls the input-desktop NAME at 50 Hz and publishes `Default`/`Winlogon` to an `AtomicU8` (`:41-58`). A code comment correctly notes that WTS session notifications miss UAC (`:7`). This is *more reliable* than Apollo's WTS-only signal for the UAC case.
- **D3D recreation on switch**: `next_frame` treats `ACCESS_LOST | INVALID_CALL | DEVICE_REMOVED | DEVICE_RESET` as recoverable (`dxgi.rs:1378-1382`), with a **tiered recovery**: cheap `try_reduplicate()` on the same device for HDR/overlay churn (`:1219`, avoids encoder re-init), full `recreate_dupl()` (new device on the current desktop) only when the output is gone (`:1252`), throttled to 250 ms with last-frame repeat so a long secure dwell doesn't starve the client (`:1409-1421`). The `INVALID_CALL`-on-post-login case is explicitly handled (`:1374`).
- **GDI re-resolution from a stable id**: `recreate_dupl` re-resolves the GDI name from the stable SudoVDA `target_id` (`:1259`, `vdisplay/sudovda.rs:106 resolve_gdi_name` via `QueryDisplayConfig`) — fixes the "hunting the stale `\\.\DISPLAYn`" bug Apollo never had to solve (Apollo captures physical outputs).
- **Display isolation so Winlogon renders to the virtual output**: `isolate_displays` (`sudovda.rs:276`) detaches all other monitors via `ChangeDisplaySettingsExW(0x0 mode, CDS_NORESET|CDS_GLOBAL)`, `reassert_isolation` re-runs it on every recovery tick (`:358`), and `restore_displays` runs on teardown (`:333`). Apollo has no detach-based equivalent: its "isolated virtual display" mode (see the SudoVDA section above) rearranges the other displays around the virtual one instead of detaching them.
- **Adapter selection**: searches ALL adapters for the output's GDI name because the SudoVDA monitor is enumerated under the *rendering* GPU, not the SudoVDA LUID (`dxgi.rs:790-855`); `adapter_luid` is kept only as a tie-break (`:788`).

##### Where punktfunk is weaker / missing / fragile

1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`docs/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`docs/windows-secure-desktop.md:89`). PsExec is a third-party Sysinternals tool that is not cleanly redistributable, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`.
2. **No nvprefs / NvAPI at all.** `grep` for `nvprefs|NvAPI|DRS_|PREFERRED_PSTATE|DXPRESENT` across the host returns nothing. No PREFERRED_PSTATE_MAX for the encoder, no OGL_CPL_PREFER_DXPRESENT (so GL/Vulkan fullscreen apps may not be capturable via WGC/DDA), and no undo-file crash safety.
3. **No DXGI GPU-preference / output-reparenting hook.** No MinHook of `NtGdiDdDDIGetCachedHybridQueryValue`. On a hybrid/Optimus box DXGI can reparent the SudoVDA output onto the render GPU and break DDA. punktfunk's "search all adapters" partly papers over this but does not prevent the reparenting itself.
4. **mDNS uses the cross-platform `mdns-sd` crate, not Windows-native `DnsServiceRegister`** (`discovery.rs:17`). It works, but it does NOT carry Apollo's RFC-1035 empty-TXT fix, and the GameStream/Moonlight mDNS path on Windows is unverified (`docs/windows-host.md:46`). A non-RFC-compliant TXT can be rejected by Apple's resolver.
5. **No stream-start system tuning.** No `NtSetTimerResolution`/`timeBeginPeriod`, no `DwmEnableMMCSS`, no `SetPriorityClass(HIGH_PRIORITY_CLASS)`, no `SetThreadExecutionState(ES_DISPLAY_REQUIRED)`, no WLAN media-streaming mode, no Mouse-Keys-on-headless trick. (Linux has none of this either, but on Windows these are real latency/jitter levers Apollo proves out.)
6. **No `factory->IsCurrent()` per-frame check.** punktfunk reacts to errors from `AcquireNextFrame` but does not proactively detect HDR/topology changes the way Apollo does each frame (`display_base.cpp:235`); it relies on ACCESS_LOST firing, which it usually does, but IsCurrent is the cleaner signal.
7. **No `is_user_session_locked()` / CCD pre-flight.** Before a mode-set or isolation, Apollo checks `WTSQuerySessionInformationW` + `SetDisplayConfig(SDC_VALIDATE)` (`utils.cpp:184-237`); punktfunk just attempts and handles failure, which can thrash the display during a lock.
8. **Clock epoch is `SystemTime::now()` (`dxgi.rs:1530`), not `GetSystemTimePreciseAsFileTime`.** The punktfunk doc itself flags this as a cross-machine-latency risk (`docs/windows-host.md:284-286`); std SystemTime on Windows historically has coarser (~1–15 ms) resolution than the precise FILETIME API, which can corrupt the ClockProbe/ClockEcho skew handshake. Note that current Rust std implements `SystemTime::now()` on Windows 8 and later with `GetSystemTimePreciseAsFileTime`, so confirm the toolchain and the actual `now_ns()` path before treating this as a live bug.
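Whichever API ends up supplying the time, the one non-obvious step in item 8 is the epoch conversion: a Windows `FILETIME` counts 100 ns ticks since 1601-01-01 UTC, so it must be shifted by the 11,644,473,600 s between 1601 and 1970 before it can be compared with Unix-epoch nanoseconds. A std-only sketch of that conversion (the function name is hypothetical; the Win32 call itself is left out so the arithmetic can be tested on any OS):

```rust
/// 100 ns FILETIME ticks between 1601-01-01 and 1970-01-01 (UTC).
const FILETIME_UNIX_EPOCH_TICKS: u64 = 11_644_473_600 * 10_000_000;

/// Convert a FILETIME, given as its dwHighDateTime/dwLowDateTime halves,
/// to nanoseconds since the Unix epoch. Returns None for pre-1970 times
/// (or on overflow, roughly 584 years after 1970).
fn filetime_to_unix_ns(high: u32, low: u32) -> Option<u64> {
    let ticks = (u64::from(high) << 32) | u64::from(low);
    ticks
        .checked_sub(FILETIME_UNIX_EPOCH_TICKS)
        .and_then(|since_1970| since_1970.checked_mul(100))
}
```

On Windows the two halves would come from the `FILETIME` that `GetSystemTimePreciseAsFileTime` fills in.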
#### Transfer opportunities

- **Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change** (sev high, large) — Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.
- **Add an NvAPI driver-settings manager (PREFERRED_PSTATE_MAX + OGL_CPL_PREFER_DXPRESENT) with a crash-safe undo file** (sev medium, large) — Add a windows-only nvprefs module wrapping NvAPI DRS (load nvapi64 dynamically, treat NvAPI_Initialize failure as 'no NVIDIA, skip'). Create a 'punktfunk' app profile with PREFERRED_PSTATE_PREFER_MAX, set OGL_CPL_PREFER_DXPRESENT_ENABLED on the base profile behind a config flag, write an undo file under %ProgramData%\\punktfunk before global changes, and call it on session start (the new stream_will_start hook below).
- **Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs** (sev medium, medium) — Add a once-init in the Windows capture path (capture/dxgi.rs open) that installs the same hook via a minhook-rs/detour crate (or a manual IAT/inline hook) on NtGdiDdDDIGetCachedHybridQueryValue forcing STATE_UNSPECIFIED, plus SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2). Gate it to NVIDIA/hybrid boxes; it's process-lifetime so no teardown needed.
- **Add a Windows stream_will_start/stop hook: timer resolution, MMCSS, HIGH_PRIORITY_CLASS, display-required, headless Mouse Keys** (sev medium, medium) — Add a windows-only RAII guard invoked when a session starts (punktfunk1.rs/pipeline session setup) that raises timer resolution (NtSetTimerResolution or timeBeginPeriod(1)), DwmEnableMMCSS(true), SetPriorityClass(HIGH_PRIORITY_CLASS), and wraps the DXGI capture loop in SetThreadExecutionState(ES_CONTINUOUS|ES_DISPLAY_REQUIRED) (capture/dxgi.rs next_frame loop), reverting on drop. Optionally add the headless Mouse-Keys trick for cursor visibility.
- **Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host** (sev low, medium) — Either (a) verify mdns-sd always emits an RFC-1035-valid TXT (never zero strings) and add a regression test, or (b) add a windows-only discovery backend using DnsServiceRegister via the windows crate's DNS APIs mirroring publish.cpp, including the single-empty-TXT workaround, so Apple NWBrowser/Moonlight discover the host reliably.
- **Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime** (sev medium, small) — In capture/dxgi.rs next_frame, query the cached IDXGIFactory's IsCurrent() once per loop and trigger the existing recreate path when it goes false (catches HDR/topology changes cleanly). Replace now_ns() on Windows with GetSystemTimePreciseAsFileTime converted to Unix-epoch ns so ClockProbe/ClockEcho skew correction stays accurate cross-machine.
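The first transfer item above (a real service that relaunches the host on session change) is mostly Win32 plumbing around a small decision loop. A std-only sketch of that loop, so it can be unit-tested off Windows: `HostPlatform`, `Supervisor` and `Wake` are hypothetical names, and a real implementation would back the trait with `WTSGetActiveConsoleSessionId`, `DuplicateTokenEx` + `SetTokenInformation` + `CreateProcessAsUserW`, the Ctrl-C helper, and `TerminateProcess`. The 3 s poll and 20 s grace come from the Apollo description above; treating `0xFFFFFFFF` (no session attached to the console) as "wait, don't restart" is this sketch's own assumption.

```rust
use std::time::Duration;

/// WTSGetActiveConsoleSessionId returns this while no session is attached to
/// the physical console (e.g. in the middle of a switch).
const NO_SESSION: u32 = 0xFFFF_FFFF;

/// How long a graceful Ctrl-C exit may take before TerminateProcess.
const GRACE: Duration = Duration::from_secs(20);

/// Hypothetical seam over the Win32 calls a real service would make.
trait HostPlatform {
    fn active_console_session(&mut self) -> u32;
    /// Re-stamp the token into `session`, then CreateProcessAsUserW the host
    /// (lpDesktop = winsta0\default) inside a kill-on-close job object.
    fn launch_host(&mut self, session: u32);
    /// Send Ctrl-C via a helper; true if the host exited within `grace`.
    fn request_graceful_exit(&mut self, grace: Duration) -> bool;
    fn terminate_host(&mut self);
}

/// Why the service's wait returned.
enum Wake {
    PollTimeout,   // periodic tick (3 s in Apollo)
    SessionChange, // SERVICE_CONTROL_SESSIONCHANGE was delivered
    HostExited,    // the host process handle was signalled
    Stop,          // stop / preshutdown control
}

struct Supervisor {
    /// Console session the running host was launched into, if any.
    running_in: Option<u32>,
}

impl Supervisor {
    /// Handle one wake-up. Returns false once the service should exit.
    fn step<P: HostPlatform>(&mut self, p: &mut P, wake: Wake) -> bool {
        match wake {
            Wake::Stop => {
                if self.running_in.take().is_some() {
                    Self::stop_host(p);
                }
                return false;
            }
            Wake::HostExited => self.running_in = None,
            Wake::SessionChange | Wake::PollTimeout => {
                // Notifications are hints: restart only if the console really
                // moved to a different, attached session.
                let active = p.active_console_session();
                if let Some(cur) = self.running_in {
                    if active != NO_SESSION && active != cur {
                        Self::stop_host(p);
                        self.running_in = None;
                    }
                }
            }
        }
        // (Re)launch into the current console session once one is attached.
        if self.running_in.is_none() {
            let active = p.active_console_session();
            if active != NO_SESSION {
                p.launch_host(active);
                self.running_in = Some(active);
            }
        }
        true
    }

    fn stop_host<P: HostPlatform>(p: &mut P) {
        if !p.request_graceful_exit(GRACE) {
            p.terminate_host();
        }
    }
}
```

In a real service each `Wake` would come from waiting on the stop event, the host process handle and the session-change event (as `sunshinesvc.cpp` does), with the 3 s poll as the timeout; keeping the decision logic behind a trait is what makes the re-check-before-restart rule testable.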
### Completeness critic — areas flagged as under-covered

- Win32 input injection robustness (keyboard scancode normalization, KEYEVENTF_UNICODE text injection, desktop-retry loop semantics, extended-key set) — the 8 prior investigations were all capture/HDR; input.cpp vs sendinput.rs was never compared
- Session-level Windows latency tuning at stream start/stop (NtSetTimerResolution 0.5ms, DwmEnableMMCSS, HIGH_PRIORITY_CLASS, WLAN media-streaming low-latency mode, SPI_SETMOUSEKEYS cursor-forcing on headless boxes) — Apollo's streaming_will_start/stop; punktfunk has none of it
- WASAPI audio robustness (default-device-change mid-session reinit via IMMNotificationClient, AUDCLNT_E_DEVICE_INVALIDATED recovery, AUDCLNT_BUFFERFLAGS_SILENT handling, virtual audio sink install for headless hosts, 5.1/7.1 surround channel masks) — wasapi_cap.rs is stereo-only with no recovery
- Exact/fractional refresh-rate mode setting (SetDisplayConfig with {rate,1000} millihertz vs ChangeDisplaySettingsExW integer-Hz) — clients commonly request 59.94/119.88 Hz
- Windows virtual gamepad type negotiation + DualSense/DS4 ViGEm target and full feedback (lightbar, adaptive triggers, touchpad, motion, battery) — gamepad_windows.rs is hardcoded Xbox360Wired despite the protocol negotiating pad type
- SendInput event batching (single-call multi-event submission) and synthetic touch/pen injection (CreateSyntheticPointerDevice) — punktfunk injects one event per SendInput call and no-ops touch

---

## Part 4 — Prioritized transferable-improvement backlog

96 candidates, Windows-host first, then severity, then effort. **✓V** = passed the independent adversarial-verify pass. *Area* is the investigation that surfaced it.

> **Status updates**
> - **2026-06-16 — #13 ✅ DONE.** Two-pass cursor compositing (alpha + XOR layers) implemented in
> `capture/dxgi.rs` (`CursorShape`, `convert_pointer_shape` decomposition, `CursorCompositor::set_shapes`/
> `draw_layer`, two-pass GPU + CPU composite).
> Independently reviewed (ship). Pending: Windows CI/dev-VM compile.
> - **2026-06-16 — #21 ⊘ ALREADY-HANDLED (not a bug).** The premise is wrong for punktfunk: DXGI
> `AcquireNextFrame` returns **S_OK for pointer-only updates** (`LastMouseUpdateTime != 0`,
> `LastPresentTime == 0`), and `acquire()` always re-runs `present_acquired` on S_OK (`dxgi.rs:1407,1474`),
> re-copying the desktop and recompositing the cursor at its new position. `last_present` is repeated
> only on a genuine `WAIT_TIMEOUT` (nothing changed) or a rebuild gap — correct. No stutter from this
> cause. The only real (perf-only) delta is the redundant full-surface copy per pointer update; deferred.

| # | Improvement | Area | Win | Sev | Eff | ✓V |
|---|---|---|---|---|---|---|
| 1 | Switch SendInput to retry-on-failure desktop reattach (drop per-event OpenInputDesktop) | cmp:input | Y | high | small | |
| 2 | Detect resolution/format change on the acquire hot path, not only during rebuild | win:capture-dxgi-dd | Y | high | small | |
| 3 | Per-frame IsCurrent() check to catch HDR/GPU/mode changes | win:capture-wgc | Y | high | small | |
| 4 | ✅ **DONE** — Batched/GSO send for the GameStream video plane on Windows | cmp:protocol-streaming | Y | high | medium | ✓ |
| 5 | Gate the GameStream HTTPS plane on the paired-cert allow-list | cmp:gamestream-http-pairing | Y | high | medium | |
| 6 | Query NVENC encode capabilities before init and degrade gracefully | cmp:video-encode | Y | high | medium | |
| 7 | Detect default-render-device changes and reinit WASAPI capture | cmp:audio | Y | high | medium | |
| 8 | Move GameStream input injection off the ENet service thread | cmp:input | Y | high | medium | |
| 9 | Actually launch the app/game on Windows (CreateProcessAsUserW into the user session) | cmp:process-launch | Y | high | medium | |
| 10 | Native system tray with state-driven icon + notifications | cmp:config-management | Y | high | medium | |
| 11 | Treat S_OK-with-no-change frames as timeouts via DXGI update flags | win:capture-dxgi-dd | Y | high | medium | |
| 12 | Detect size/format change in WGC and signal reinit | win:capture-wgc | Y | high | medium | |
| 13 | ✅ **DONE** — Split every cursor shape into an alpha image + an XOR image (two-pass composite) | win:cursor-compositing | Y | high | medium | |
| 14 | Map absolute mouse through the real virtual-desktop / output rect, not a blind 0..65535 normalize | win:input-sendinput-vigem | Y | high | medium | |
| 15 | Detect watchdog ping failures and escalate (re-open the device) | win:virtual-display-sudovda | Y | high | medium | |
| 16 | Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU | win:virtual-display-sudovda | Y | high | medium | |
| 17 | Add streaming_will_start/stop session-level latency tuning on Windows | win:critic | Y | high | medium | |
| 18 | Recover WASAPI loopback from default-device change and AUDCLNT_E_DEVICE_INVALIDATED | win:critic | Y | high | medium | |
| 19 | Implement true reference-frame invalidation with a multi-ref DPB instead of always-full-IDR | cmp:video-encode | Y | high | large | |
| 20 | In-binary Windows service install + interactive-session launch | cmp:config-management | Y | high | large | |
| 21 | ⊘ **ALREADY-HANDLED** — Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame | win:cursor-compositing | Y | high | large | |
| 22 | Add real reference-frame invalidation (RFI) instead of always forcing IDR | win:nvenc-d3d11 | Y | high | large | |
| 23 | Add a DS4 (DualShock4) ViGEm target on Windows with type auto-selection, motion, touchpad, battery and timestamp pump | win:input-sendinput-vigem | Y | high | large | |
| 24 | Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change | win:system-secure-desktop | Y | high | large | |
| 25 | Elevate capture/encode/send thread priority on the host hot path | cmp:protocol-streaming | Y | medium | small | ✓ |
| 26 | Atomic temp+rename persistence for the GameStream paired store | cmp:gamestream-http-pairing | Y | medium | small | |
| 27 | Always emit explicit SDR color VUI (primaries/transfer/matrix/range), not just HDR | cmp:video-encode | Y | medium | small | |
| 28 | Set repeatSPSPPS=1 and wire slicesPerFrame for the Windows NVENC config | cmp:video-encode | Y | medium | small | |
| 29 | Raise the WASAPI capture thread to MMCSS Pro Audio priority | cmp:audio | Y | medium | small | |
| 30 | Map absolute mouse to the target output rect, not the whole virtual desktop | cmp:input | Y | medium | small | |
| 31 | Per-(app,client) stable virtual-display GUID instead of one fixed MONITOR_GUID | cmp:process-launch | Y | medium | small | |
| 32 | "Open management UI" browser-launch entry point | cmp:config-management | Y | medium | small | |
| 33 | Logs read endpoint for the console | cmp:config-management | Y | medium | small | |
| 34 | Release the duplication device lock during idle to avoid encoder starvation | win:capture-dxgi-dd | Y | medium | small | |
| 35 | Retry DuplicateOutput at startup and request encoder-supported formats via Output5 | win:capture-dxgi-dd | Y | medium | small | |
| 36 | Live per-frame cursor-capture toggle | win:capture-wgc | Y | medium | small | |
| 37 | Render the monochrome 'inverse of screen' pixels via the XOR pass instead of dropping them | win:cursor-compositing | Y | medium | small | |
| 38 | Always signal the SDR VUI (primaries/transfer/matrix/range), not just HDR | win:hdr-colorspace | Y | medium | small | |
| 39 | Inject UTF-8 text via KEYEVENTF_UNICODE (dead keys / non-US characters) | win:input-sendinput-vigem | Y | medium | small | |
| 40 | Gate on SudoVDA protocol-version compatibility instead of only logging it | win:virtual-display-sudovda | Y | medium | small | |
| 41 | Retry device open with exponential backoff | win:virtual-display-sudovda | Y | medium | small | |
| 42 | Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime | win:system-secure-desktop | Y | medium | small | |
| 43 | Socket QoS / DSCP marking on the media sockets | cmp:protocol-streaming | Y | medium | medium | ✓ |
| 44 | Plumb HDR10 static metadata (mastering display + MaxCLL/MaxFALL) | cmp:video-encode | Y | medium | medium | |
| 45 | Coalesce relative-mouse/scroll/controller spam before injection | cmp:input | Y | medium | medium | |
| 46 | Display-config apply/revert with a retry scheduler and guaranteed revert on disconnect | cmp:process-launch | Y | medium | medium | |
| 47 | Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance | win:capture-dxgi-dd | Y | medium | medium | |
| 48 | Use SystemRelativeTime (QPC) as the frame timestamp | win:capture-wgc | Y | medium | medium | |
| 49 | Stop baking the cursor destructively into the repeated gpu_copy texture | win:cursor-compositing | Y | medium | medium | |
| 50 | Gate HDR on (client requested HDR) AND (desktop is actually HDR), and signal the result in Welcome | win:hdr-colorspace | Y | medium | medium | |
| 51 | Query nvEncGetEncodeCaps and gate config on real GPU capabilities | win:nvenc-d3d11 | Y | medium | medium | |
| 52 | Use async encode with a Win32 completion event + timeout | win:nvenc-d3d11 | Y | medium | medium | |
| 53 | Minimize NvEnc API/struct versions per codec for older-driver compatibility | win:nvenc-d3d11 | Y | medium | medium | |
| 54 | Use a canonical US-English VK→scancode table for normalized keys, and fall back to VK when no scancode maps | win:input-sendinput-vigem | Y | medium | medium | |
| 55 | Derive a stable per-client MonitorGuid instead of one global constant | win:virtual-display-sudovda | Y | medium | medium | |
| 56 | Add millihertz CCD mode-set with ±1 Hz fallback and SDC_SAVE_TO_DATABASE persistence | win:virtual-display-sudovda | Y | medium | medium | |
| 57 | Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs | win:system-secure-desktop | Y | medium | medium | |
| 58 | Add a Windows stream_will_start/stop hook: timer resolution, MMCSS, HIGH_PRIORITY_CLASS, display-required, headless Mouse Keys | win:system-secure-desktop | Y | medium | medium | |
| 59 | Honor AUDCLNT_BUFFERFLAGS_SILENT and emit silence keepalive while no audio renders | win:critic | Y | medium | medium | |
| 60 | Set the SudoVDA active mode at exact fractional refresh via SetDisplayConfig | win:critic | Y | medium | medium | |
| 61 | Handle non-normalized VK codes and add KEYEVENTF_UNICODE text fallback | win:critic | Y | medium | medium | |
| 62 | UPnP/IGD + PCP port mapping and IPv6 pinholes for WAN reach | cmp:gamestream-http-pairing | Y | medium | large | |
| 63 | Support 5.1/7.1 WASAPI loopback + multistream encode on Windows | cmp:audio | Y | medium | large | |
| 64 | Implement client-mic passthrough on Windows | cmp:audio | Y | medium | large | |
| 65 | Add virtual touch + pen on the Windows host via synthetic pointer devices | cmp:input | Y | medium | large | |
| 66 | Set HDR on the virtual display and advertise IsHdrSupported when the client requests it | cmp:process-launch | Y | medium | large | |
| 67 | Add client-framerate frame pacing with a high-precision timer | win:capture-dxgi-dd | Y | medium | large | |
| 68 | Root-cause the CreateFreeThreaded hang instead of the watchdog/abandon | win:capture-wgc | Y | medium | large | |
| 69 | Convert to P010 in a D3D11 shader and feed NVENC YUV instead of ABGR10 RGB | win:hdr-colorspace | Y | medium | large | |
| 70 | Add an NvAPI driver-settings manager (PREFERRED_PSTATE_MAX + OGL_CPL_PREFER_DXPRESENT) with a crash-safe undo file | win:system-secure-desktop | Y | medium | large | |
| 71 | Install/select a virtual audio sink so a headless Windows host has audio with no physical device | win:critic | Y | medium | large | |
| 72 | Grow SO_SNDBUF on the GameStream video/audio sockets | 
cmp:protocol-streaming | Y | low | small | | | 73 | Decode NVENCSTATUS into readable names and detect InvalidParam structurally | cmp:video-encode | Y | low | small | | | 74 | Surface WASAPI data-discontinuity as a glitch diagnostic | cmp:audio | Y | low | small | | | 75 | Inject per-app launch env (client res/fps/HDR/audio + status) for launch scripts | cmp:process-launch | Y | low | small | | | 76 | auto_detach heuristic for launcher-style apps (Steam/UWP) that exit immediately | cmp:process-launch | Y | low | small | | | 77 | Cache the HDR RTV and feature-probe optional session properties | win:capture-wgc | Y | low | small | | | 78 | Validate masked-color mask bytes and log illegal values | win:cursor-compositing | Y | low | small | | | 79 | Log the full DXGI_OUTPUT_DESC1 + capture format on HDR setup | win:hdr-colorspace | Y | low | small | | | 80 | Derive HDR cursor/graphics white from the display, not a fixed 203 nits | win:hdr-colorspace | Y | low | small | | | 81 | Add zeroReorderDelay/lookahead-off/lowDelayKeyFrameScale and always emit SDR VUI | win:nvenc-d3d11 | Y | low | small | | | 82 | Add wheel-delta accumulation/quantization for low-res scroll consumers | win:input-sendinput-vigem | Y | low | small | | | 83 | Add a static canonical VK→scancode table as a layout-independent fallback | cmp:input | Y | low | medium | | | 84 | Handle rotated outputs in cursor positioning | win:cursor-compositing | Y | low | medium | | | 85 | Honor client slices-per-frame and offer NVENC intra-refresh | win:nvenc-d3d11 | Y | low | medium | | | 86 | Add the absolute-mode left-button-release delay + button-state dedup | win:input-sendinput-vigem | Y | low | medium | | | 87 | Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host | win:system-secure-desktop | Y | low | medium | | | 88 | Recover when there is no render endpoint at session start | cmp:audio | Y | low | large | | | 89 | Support DualSense/DS4 ViGEm target + 
feedback on Windows, honoring negotiated pad type | win:critic | Y | low | large | | | 90 | Bitrate-derived rate-control pacing (vs frame-interval-only) | cmp:protocol-streaming | | medium | medium | ✓ | | 91 | Named, permissioned paired-device records for the GameStream store | cmp:gamestream-http-pairing | | medium | medium | | | 92 | Actually reject unpaired GameStream client certs (close the unpair gap) | cmp:config-management | | medium | medium | | | 93 | Persisted host config + read/write config API endpoint | cmp:config-management | | medium | large | | | 94 | Consume the GameStream client loss-stats report | cmp:protocol-streaming | | low | small | ✓ | | 95 | Tolerate not-yet-valid/expired client certs during verification | cmp:gamestream-http-pairing | | low | small | | | 96 | OTP pre-armed PIN with silent-failure anti-oracle on the GameStream path | cmp:gamestream-http-pairing | | low | medium | | ### Details — high-severity items (and all ✓V-verified) #### 1. Switch SendInput to retry-on-failure desktop reattach (drop per-event OpenInputDesktop) *Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* small - **Apollo does:** send_input() / inject_synthetic_pointer_input() call SendInput FIRST, and only on failure (0 injected) re-run syncThreadDesktop() (OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) and retry once, tracking the desktop in a thread_local _lastKnownInputDesktop — src/platform/windows/input.cpp:477,499 + src/platform/windows/misc.cpp:251 - **punktfunk gap:** SendInputInjector::inject() calls reattach_input_desktop() (an OpenInputDesktop+SetThreadDesktop+CloseDesktop) at the TOP of EVERY event — crates/punktfunk-host/src/inject/sendinput.rs:97,50-69. This is a syscall triple per mouse-move; punktfunk's own docs/windows-secure-desktop.md:78-80 lists this exact refactor (step 2) as planned but unshipped. 
- **Proposal:** Inject first; cache the HDESK thread-local; only on a 0/partial SendInput result call reattach_input_desktop() and retry once. Use DF_ALLOWOTHERACCOUNTHOOK in the OpenInputDesktop access (sendinput.rs:52-56 currently passes DESKTOP_CONTROL_FLAGS(0)) so the secure desktop is reachable. This keeps the steady-state hot path to a single SendInput call.

#### 2. Detect resolution/format change on the acquire hot path, not only during rebuild

*Area:* `win:capture-dxgi-dd` · *Windows-host:* yes · *Severity:* high · *Effort:* small

- **Apollo does:** Every frame Apollo reads src->GetDesc() and reinits if desc.Width/Height != width_before_rotation/height_before_rotation or capture_format != desc.Format (display_vram.cpp:1215-1236, display_ram.cpp:253-265, wgc 1662-1674).
- **punktfunk gap:** punktfunk only re-reads dimensions inside recreate_dupl (dxgi.rs:1298-1313). On the normal acquire path (dxgi.rs:1426-1492) it never validates the acquired texture's desc, so a mode change that doesn't raise ACCESS_LOST leads to CopyResource of a mismatched-size/format source into a stale gpu_copy/staging/fp16_src — silent corruption or a hard copy failure.
- **Proposal:** In acquire(), after res.cast::(), call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution + HDR-toggle changes robust even when DDA doesn't fault.

#### 3. Per-frame IsCurrent() check to catch HDR/GPU/mode changes

*Area:* `win:capture-wgc` · *Windows-host:* yes · *Severity:* high · *Effort:* small

- **Apollo does:** display_base_t::capture checks factory->IsCurrent() once per frame and returns reinit on any DXGI factory invalidation — HDR toggle, mode change, GPU hotplug/reparent (display_base.cpp:235).
- **punktfunk gap:** wgc.rs has no equivalent; it cannot notice an HDR enable/disable or GPU change and will keep capturing with a stale colorspace/geometry assumption.
- **Proposal:** Hold an IDXGIFactory1 in WgcCapturer (from the same adapter as make_device) and call IsCurrent() at the top of next_frame/wait_and_drain; on false, return the reinit signal. This pairs with wgc-size-format-reinit to give a complete change-detection story.

#### 4. Batched/GSO send for the GameStream video plane on Windows

*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* high · *Effort:* medium · **✓ verified · ✅ DONE (2026-06-16)**

> **Resolution:** Implemented per the refined proposal. Added a reusable Windows-only
> `punktfunk_core::transport::send_uso_all(&UdpSocket, &[&[u8]]) -> io::Result` that reuses the
> native plane's proven `send_one_uso` + `uso` on/off latch + `uso_unsupported`, with the same
> uniform-size guard and ≤512-segment chunking. `gamestream/stream.rs` `sendmmsg_all` now has a
> `#[cfg(target_os="windows")]` arm that calls it per 16-packet paced burst (one `WSASendMsg` instead
> of 16 `send`s) and sends any remainder scalar; the Linux `sendmmsg` arm and a generic scalar arm are
> unchanged. PUNKTFUNK_GSO=0 kill-switch + auto-fallback inherited. Linux build unaffected;
> punktfunk-core type-checks for x86_64-pc-windows-msvc. Host Windows compile deferred to CI/dev box.

- **Apollo does:** Apollo sends every plane through platf::send_batch / send (one code path for all OSes; on Windows it uses real batched socket writes), and the video broadcast thread is the single transmit path (stream.cpp:1327, send batching at stream.cpp:1337 send_batch latency logger).
- **punktfunk gap:** The GameStream video sender's batched path is Linux-only: sendmmsg_all has a #[cfg(target_os="linux")] real implementation (stream.rs:147) and a #[cfg(not(target_os="linux"))] fallback that does one sock.send() per packet (stream.rs:185-191).
On a Windows GameStream-compat host (capture IS wired for Windows via DXGI/WGC, capture.rs:261) every video datagram is an individual syscall — the native punktfunk/1 plane got Windows USO (transport/udp.rs:135) but the GameStream plane did not.
- **Proposal:** Route the GameStream video send thread through the same Windows WSASendMsg/USO + WSASend-batch path the native plane already implements in punktfunk-core transport/udp.rs (or factor that send helper into a shared module and call it from gamestream/stream.rs). This keeps GameStream-on-Windows from being syscall-bound at high bitrate.
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK gap is real. The GameStream video send path uses a private `sendmmsg_all`: real `sendmmsg` only under `#[cfg(target_os="linux")]` (crates/punktfunk-host/src/gamestream/stream.rs:147-181), and a `#[cfg(not(target_os="linux"))]` fallback that does one `sock.send(p)` per packet (stream.rs:185-191). The paced sender calls it in PACE_CHUNK=16 bursts (stream.rs:230). It operates on a raw `std::net::UdpSocket` (stream.rs:66, cloned at :310), NOT the core `Transport` trait, so it does NOT pick up the native plane's USO. The GameStream host genuinely runs on Windows: `serve`/`gamestream` are not OS-gated (main.rs:81-83 dispatch is uncfg'd; gamestream/mod.rs declares `mod stream;` with no cfg), capture is wired for Windows (capture.rs:261-279 `capture_virtual_output` via SudoVDA+WGC/DXGI), and the module has explicit Windows handling (gamestream/mod.rs:209-210 APPDATA, :216-217 COMPUTERNAME). So on a Windows GameStream-compat host every video datagram is its own syscall. Meanwhile the native plane already has the answer: crates/punktfunk-core/src/transport/udp.rs:141-246 (`uso` state + `send_one_uso` via `WSASendMsg`+`UDP_SEND_MSG_SIZE`), wired default-on at udp.rs:610-647 (`send_gso`), called by session.rs:182.
Also note GameStream video datagrams are uniform `blocksize` (= packet_size+16): data shards, the zero-padded last data shard, and FEC parity shards are all full blocksize (gamestream/video.rs:41-42,76,111-166) — the exact uniform-size precondition USO/GSO needs. APOLLO confirms the claimed unified path: `platf::send_batch` (src/platform/common.h:697) is the single video transmit call (src/stream.cpp:1598, in videoBroadcastThread, latency-logged at stream.cpp:1337); its Windows impl is real USO — `WSASendMsg` with a `UDP_SEND_MSG_SIZE` cmsg of `header_size+payload_size` (src/platform/windows/misc.cpp:1408,1499,1508), with a per-packet `send()` fallback (misc.cpp:1510-1587) "if USO is not supported ... caller will fall back to unbatched sends" (misc.cpp:1504-1505).
- **Refined:** Route the GameStream Windows video send through USO instead of per-packet `send`. Do NOT duplicate the WSASendMsg code — factor the native plane's USO helper out of `UdpTransport`. Extract `send_one_uso` + the `uso` enable/latch state + `uso_unsupported` + the uniform-size chunking loop (currently udp.rs:185-246 and the `send_gso` Windows body udp.rs:610-647) into a small `pub(crate)` free function in punktfunk-core, e.g. `transport::udp::send_packets_uso(socket: &UdpSocket, packets: &[&[u8]]) -> io::Result` that takes a raw connected `std::net::UdpSocket` (the GameStream sender already owns one) and applies USO with the same default-on + auto-fallback-to-per-packet + PUNKTFUNK_GSO=0 kill-switch semantics. Then rewrite gamestream/stream.rs `sendmmsg_all` so the `#[cfg(target_os="windows")]` arm calls that helper (the Linux arm keeps its sendmmsg; a `not(any(linux,windows))` arm keeps the scalar loop). GameStream packets are already uniform blocksize per the packetizer, so the USO uniform-size guard passes; the existing PACE_CHUNK=16 microburst pacing is unaffected (each chunk becomes one WSASendMsg).
Add a Linux GSO arm too while there (same helper pattern) for parity, but USO/Windows is the point of this item. Keep the change inside punktfunk-core for the helper (one core, C-ABI-stable — no new public ABI surface needed, it's pub(crate)) and a ~10-line edit in the host. This respects: no async on frame path (native sockets only), no protocol change, no scaling change.

#### 5. Gate the GameStream HTTPS plane on the paired-cert allow-list

*Area:* `cmp:gamestream-http-pairing` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo defers TLS verification (nvhttp.cpp:88 sets verify_peer|verify_fail_if_no_peer_cert with a permissive OpenSSL cb, then the accept() override runs cert_chain.verify() post-handshake and stashes the matched named_cert_t into request->userp; every authenticated handler calls get_verified_cert(request) — nvhttp.cpp:665-667,915,1086,1172,1360 — so an unpaired cert is rejected with a proper XML body, not just accepted).
- **punktfunk gap:** punktfunk pins the client cert at pairing (pairing.rs:230-236) and loads it into AppState.paired (mod.rs:134) but NEVER consults it: tls.rs:38-45 verify_client_cert always returns assertion(), and /launch (nvhttp.rs:87-109) does no identity check. Any client that completed a TLS handshake — paired or not — can launch a session.
- **Proposal:** After the handshake, recover the peer cert (axum_server exposes the rustls connection / peer certs), SHA-256 it, and check it against AppState.paired in /launch, /resume, /applist, /cancel (and reflect the real result in serverinfo PairStatus). Keep verify_client_cert lenient for the handshake but reject unpaired identities at the handler with an XML error, mirroring Apollo's get_verified_cert pattern. This is the single highest-value GameStream-compat hardening item and applies equally to the Windows host.

#### 6. Query NVENC encode capabilities before init and degrade gracefully

*Area:* `cmp:video-encode` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** nvenc_base.cpp:175-220 builds a get_encoder_cap lambda over nvEncGetEncodeCaps and checks NV_ENC_CAPS_WIDTH_MAX/HEIGHT_MAX (rejects with a clear message), SUPPORT_10BIT_ENCODE, SUPPORT_YUV444_ENCODE, SUPPORT_REF_PIC_INVALIDATION (toggles encoder_params.rfi), SUPPORT_CUSTOM_VBV_BUF_SIZE (nvenc_base.cpp:250-255), SUPPORT_CABAC (nvenc_base.cpp:311-315), SUPPORT_WEIGHTED_PREDICTION (nvenc_base.cpp:220), and SUPPORT_INTRA_REFRESH/SINGLE_SLICE_INTRA_REFRESH (nvenc_base.cpp:334-345). Each missing cap downgrades a feature instead of failing.
- **punktfunk gap:** crates/punktfunk-host/src/encode/nvenc.rs:131-323 init_session never calls nvEncGetEncodeCaps. Max W/H is only checked against a static per-codec constant (encode.rs:57-62), not the GPU's real cap; 10-bit Main10 is forced (nvenc.rs:233-237) without checking SUPPORT_10BIT_ENCODE; custom VBV (nvenc.rs:224-227) is set without checking SUPPORT_CUSTOM_VBV_BUF_SIZE. On an unsupported card these surface as opaque InvalidParam errors handled only by bitrate step-down, which masks the real cause.
- **Proposal:** Add a caps query in NvencD3d11Encoder::init_session right after open_encode_session_ex: build a get_cap(NV_ENC_CAPS) helper over nvEncGetEncodeCaps, validate encodeWidth/Height against WIDTH_MAX/HEIGHT_MAX with a clear error, gate the 10-bit path on SUPPORT_10BIT_ENCODE (fall back to 8-bit with a warning instead of failing), gate custom VBV on SUPPORT_CUSTOM_VBV_BUF_SIZE, and record an rfi-supported flag for the RFI work below.

#### 7. Detect default-render-device changes and reinit WASAPI capture

*Area:* `cmp:audio` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo registers an IMMNotificationClient and, on a default-device change, returns capture_e::reinit, which triggers full mic teardown + recreate (src/platform/windows/audio.cpp default-device-change detection; signaling contract src/audio.cpp:232-244, capture_e enum src/platform/common.h:457).
- **punktfunk gap:** punktfunk's WASAPI loopback opens the default Render endpoint exactly once (wasapi_cap.rs:108-111) and has no device-change listener. If the user switches default output mid-session (headphones plugged in, HDMI sink appears), the captured endpoint silently stops producing events; the only recovery is the 5 s next_chunk read timeout (wasapi_cap.rs:78), which kills audio for the rest of the session. Add an IMMNotificationClient (or wasapi-crate equivalent) that sets the stop flag / signals reinit so the capture thread reopens the new default endpoint.
- **Proposal:** In wasapi_cap.rs, register a device-notification callback on the DeviceEnumerator; on default-render change, break the capture loop and reopen get_default_device(Render) + a fresh loopback IAudioClient (re-running the init block at wasapi_cap.rs:105-133). Surface it through the existing thread without tearing down the WasapiLoopbackCapturer handle so the session keeps streaming.

#### 8. Move GameStream input injection off the ENet service thread

*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** The control thread only enqueues bytes + schedules a task; a pool thread pops one packet, batches later same-type packets while holding the queue lock, then RELEASES the lock before the (slow) SendInput/ViGEm call — src/input.cpp:1481-1520, 1639-1643. A slow OS input call never stalls the network thread.
- **punktfunk gap:** on_receive() calls inj.inject(&ev) synchronously inside the host.service() ENet loop — crates/punktfunk-host/src/gamestream/control.rs:84-91,207-211. A SendInput call that blocks while crossing a desktop switch (or a slow ViGEm update) head-blocks ENet handshake/keepalive/retransmit servicing. The m3 path already does this right (punktfunk1.rs:1300 → injector_service_thread).
- **Proposal:** Mirror the m3 design in the GameStream control thread: push decoded InputEvents onto an mpsc channel drained by a dedicated injector thread (reuse injector_service_thread or a sibling), so the ENet thread never blocks on SendInput/ViGEm. No async needed — native thread + std::sync::mpsc, consistent with the invariant.

#### 9. Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)

*Area:* `cmp:process-launch` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo runs prep/detached/main commands via platf::run_command = retrieve_users_token + impersonate_current_user + CreateProcessAsUserW, launching apps from a SYSTEM service into the interactive user session (process.cpp execute() l430-494; platform/windows/misc.cpp run_command).
- **punktfunk gap:** On Windows the resolved command is only written to PUNKTFUNK_GAMESCOPE_APP (punktfunk1.rs:535-536, stream.rs:96), which is read solely by the Linux gamescope backend (vdisplay/gamescope.rs:441). SudoVdaDisplay::create (sudovda.rs:448-543) never spawns anything, so a Windows session always shows the bare desktop — apps.json cmd and library titles are dead on Windows.
- **Proposal:** Add a Windows app-launch path: resolve the AppEntry.cmd / library launch_command, then CreateProcessAsUserW into the interactive session (the token+pipe plumbing already exists in capture/wgc_relay.rs — reuse retrieve-token + CreateProcessAsUserW). Track the launched process for lifetime/teardown.
Make library.rs Windows launch resolve to a real command (steam steam://rungameid works on Windows) instead of the gamescope env var.

#### 10. Native system tray with state-driven icon + notifications

*Area:* `cmp:config-management` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** system_tray.cpp builds a single static tray struct with a menu (Open/Force-stop/Reset-display/Restart/Quit, l.112-141) and pushes state changes from the streaming pipeline — update_tray_playing/pausing/stopped/launch_error/require_pin/paired/client_connected (l.238-412) each swap the icon + raise a balloon notification; init_tray hardens the thread DACL so the icon survives running as SYSTEM (l.143-204); a 50 ms polling thread drives it (tray_thread_worker l.415).
- **punktfunk gap:** No tray code exists anywhere in crates/punktfunk-host (grep for tray/notify-rust/balloon returns nothing). On Windows the host runs windowless as SYSTEM in Session 1 via external scripts (docs/windows-host.md:77-84) with the only operator feedback being a redirected log file — there is no visible, clickable status/control surface for a desktop user.
- **Proposal:** Add an optional system-tray plane behind a feature/flag using a Rust tray crate (e.g. tray-icon) spawned on its own native thread (no async on the per-frame path). Drive it from the existing AppState atomics/locks already exposed by mgmt.rs get_status (streaming/audio_streaming/pin_pending/session) — poll or push on state change to swap icon + show balloons (connected, pairing PIN, launch error). Menu items call the SAME primitives the API uses (stop_session, force_idr, native arm-pairing, quit). On Windows replicate Apollo's thread-DACL hardening so the icon shows when launched as SYSTEM in the interactive session.

#### 11. Treat S_OK-with-no-change frames as timeouts via DXGI update flags

*Area:* `win:capture-dxgi-dd` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo computes mouse_update_flag (LastMouseUpdateTime!=0 || PointerShapeBufferSize>0) and frame_update_flag (LastPresentTime!=0; RAM path also AccumulatedFrames!=0); if neither is set it returns capture_e::timeout and repeats the last frame (display_vram.cpp:1162-1168, display_ram.cpp:183-189).
- **punktfunk gap:** acquire() (dxgi.rs:1353-1360) treats every Ok(()) as a real new frame and always does CopyResource + (NVENC) encode + cursor composite, even when DDA reported a mouse-only or empty update. The result is wasted GPU work and a possibly stale frame re-encoded as 'new'.
- **Proposal:** In dxgi.rs acquire(), after a successful AcquireNextFrame, compute frame_update_flag = info.LastPresentTime != 0 (and/or info.AccumulatedFrames != 0) and mouse_update_flag from LastMouseUpdateTime/PointerShapeBufferSize. Always call update_cursor (mouse). If !frame_update_flag, ReleaseFrame and return Ok(None) (so next_frame repeats last_present) UNLESS the cursor moved and we need a recomposite — in which case recomposite onto the existing last_present texture instead of CopyResource'ing the source. This cuts idle/cursor-only GPU load and avoids re-encoding unchanged content.

#### 12. Detect size/format change in WGC and signal reinit

*Area:* `win:capture-wgc` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo re-reads the source desc every snapshot and returns capture_e::reinit on width/height/format mismatch (display_wgc.cpp:297-306, display_vram.cpp:1664-1674), so a mid-session mode change rebuilds the pipeline instead of corrupting it.
- **punktfunk gap:** wgc.rs process_frame (wgc.rs:385-428) never re-reads the source GetDesc; width/height are frozen at open() (wgc.rs:196).
A SudoVDA mode change or mid-stream Reconfigure (punktfunk's core feature) silently mismatches the fixed-size bgra_copy/hdr10_out.
- **Proposal:** In WgcCapturer::process_frame, call src.GetDesc() and compare Width/Height/Format against self.width/height and the expected format. On mismatch, return a Reinit error (add a capture_e::Reinit-equivalent to the Capturer contract or bail with a recognizable error the m3/stream loop maps to a capturer rebuild). Drop and re-create fp16_src/hdr10_out/bgra_copy when size changes.

#### 13. Split every cursor shape into an alpha image + an XOR image (two-pass composite)

*Area:* `win:cursor-compositing` · *Windows-host:* yes · *Severity:* high · *Effort:* medium · **✅ DONE (2026-06-16)**

> **Resolution:** Implemented in `capture/dxgi.rs`. `convert_pointer_shape` now returns a `CursorShape`
> with optional `alpha`/`xor` layers; `CursorCompositor` holds `tex_alpha`/`tex_xor` and `draw_layer`
> renders each with its own blend (alpha = src-over + HDR scale; XOR = inversion, unscaled). MASKED_COLOR
> opaque pixels now go through the alpha pass (not the invert blend), and MONOCHROME `(1,1)` invert pixels
> now feed the XOR layer (previously approximated as solid black). CPU path blends both layers too.
> The `cursor_invert` flag was removed. Independently reviewed (ship); pending Windows CI/dev-VM compile.

- **Apollo does:** Apollo emits two BGRA images per shape — make_cursor_alpha_image (display_vram.cpp:279) and make_cursor_xor_image (display_vram.cpp:210) — and runs both an alpha-blend pass and an invert-blend pass in blend_cursor (display_vram.cpp:1448-1469), each skipped if its image is empty. MASKED_COLOR and MONOCHROME shapes legitimately need both.
- **punktfunk gap:** convert_pointer_shape (dxgi.rs:566) produces ONE image and cursor_invert (dxgi.rs:1133-1134) picks ONE blend for the whole shape, so a cursor mixing opaque and screen-inverting pixels (common I-beams and themed arrows) renders wrong; masked-color opaque pixels are even forced through the invert blend (dxgi.rs:612-624 + 1205).
- **Proposal:** Refactor convert_pointer_shape in dxgi.rs to return two optional images (alpha, xor) mirroring Apollo's split. Store cursor_shape as Option<(alpha, xor)>, upload up to two SRVs in CursorCompositor, and in composite_cursor_gpu run the alpha pass with self.blend then the xor pass with self.blend_invert (skip empties). Drop the single cursor_invert flag.

#### 14. Map absolute mouse through the real virtual-desktop / output rect, not a blind 0..65535 normalize

*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo maps client coords through a touch_port (offset_x/y, width/height, env_width/env_height, scalar_inv) then scales into the per-output region before the 0..65535 ABSOLUTE|VIRTUALDESK SendInput: client_to_touchport src/input.cpp:460-484 and abs_mouse platform/windows/input.cpp:512-532.
- **punktfunk gap:** sendinput.rs:110-132 fetches virtual_desktop_rect() then discards it (`let _ = (vw,vh)` at :122) and normalizes purely to the client extent, assuming one output spans the whole desktop (admitted in the comment at :117). This puts the cursor in the wrong place on multi-monitor hosts or whenever the SudoVDA output is not at the virtual-desktop origin.
- **Proposal:** In sendinput.rs::MouseMoveAbs, query the target virtual output's rect (it is created at a known position by the SudoVDA display backend) and compute the absolute coords relative to the full virtual desktop: `ax = (out_x + cx*out_w - vx)/vw * 65535` (and the same for y), where `cx` is the client position normalized to 0..1, using SM_X/Y/CX/CYVIRTUALSCREEN. Carry the output origin/size from the display backend into the injector instead of dropping vw/vh.
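
The item 14 formula can be sketched in a few lines of Rust. The sketch assumes `cx`/`cy` arrive normalized to 0.0..=1.0 across the streamed output, and that both rects are already known in virtual-desktop pixels. On Windows the desktop rect would come from `GetSystemMetrics` with `SM_XVIRTUALSCREEN`/`SM_YVIRTUALSCREEN`/`SM_CXVIRTUALSCREEN`/`SM_CYVIRTUALSCREEN`, and the output rect from the SudoVDA backend. `Rect`, `axis` and `to_virtual_desk_abs` are hypothetical names, not existing punktfunk code. The round-then-clamp convention is also an assumption: it follows the proposal's `* 65535` formula, not a documented Windows rounding rule.

```rust
/// Rectangle in virtual-desktop pixels (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
struct Rect {
    x: i32,
    y: i32,
    w: i32,
    h: i32,
}

/// Map one axis: client-normalized `c` (0.0..=1.0 across the streamed output)
/// into the 0..=65535 range used with MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK.
fn axis(c: f64, out_origin: i32, out_len: i32, desk_origin: i32, desk_len: i32) -> i32 {
    let c = c.clamp(0.0, 1.0);
    // Pixel position of the point on the virtual desktop.
    let px = f64::from(out_origin) + c * f64::from(out_len);
    // Normalize against the whole virtual desktop, not just the target output.
    let norm = (px - f64::from(desk_origin)) / f64::from(desk_len.max(1));
    (norm * 65535.0).round().clamp(0.0, 65535.0) as i32
}

/// Hypothetical replacement for the blind 0..65535 normalize in MouseMoveAbs.
fn to_virtual_desk_abs(cx: f64, cy: f64, output: Rect, desk: Rect) -> (i32, i32) {
    (
        axis(cx, output.x, output.w, desk.x, desk.w),
        axis(cy, output.y, output.h, desk.y, desk.h),
    )
}

fn main() {
    // Assumed layout: a 1920x1080 physical monitor at the origin and the
    // SudoVDA output placed to its right.
    let desk = Rect { x: 0, y: 0, w: 3840, h: 1080 };
    let vdisplay = Rect { x: 1920, y: 0, w: 1920, h: 1080 };
    // The output's top-left lands mid-desktop instead of at 0.
    println!("{:?}", to_virtual_desk_abs(0.0, 0.0, vdisplay, desk)); // (32768, 0)
    println!("{:?}", to_virtual_desk_abs(1.0, 1.0, vdisplay, desk)); // (65535, 65535)
}
```

When the output covers the whole virtual desktop, the mapping degenerates to today's plain normalize. In every other layout, the output's offset and the desktop extent both shift the result.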
#### 15. Detect watchdog ping failures and escalate (re-open the device)

*Area:* `win:virtual-display-sudovda` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** Apollo's ping thread counts consecutive PingDriver failures and, once more than 3 have accumulated, calls failCb, which sets WATCHDOG_FAILED + closeVDisplayDevice; sessions then re-init the driver (virtual_display.cpp:603-616, process.cpp:65-78, process.cpp:243-246).
- **punktfunk gap:** punktfunk's pinger discards the ioctl Result entirely (`let _ = ioctl(...)`, sudovda.rs:485-494) — a dead driver is never noticed; there is no WATCHDOG_FAILED state and no re-init path.
- **Proposal:** In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.

#### 16. Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU

*Area:* `win:virtual-display-sudovda` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** setRenderAdapterByName enumerates DXGI adapters, matches desc.Description, and issues SET_RENDER_ADAPTER with that adapter's LUID before every create (virtual_display.cpp:624-654, sudovda.h:109-128, called at main.cpp:369-371 and process.cpp:250-252).
- **punktfunk gap:** punktfunk defines no IOCTL_SET_RENDER_ADAPTER and never binds the render adapter (sudovda.rs:47-54). On a hybrid/multi-GPU box the IDD may render on the iGPU while NVENC + Desktop Duplication run on the dGPU, breaking or slowing zero-copy.
- **Proposal:** Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs.
Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse the capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU, and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter.

#### 17. Add streaming_will_start/stop session-level latency tuning on Windows

*Area:* `win:critic` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** misc.cpp:1108-1253: on stream start it raises timer resolution to 0.5 ms (NtSetTimerResolution, timeBeginPeriod fallback), calls DwmEnableMMCSS(true) and SetPriorityClass(HIGH_PRIORITY_CLASS), enables WLAN media-streaming low-latency mode on connected Wi-Fi NICs (wlan_intf_opcode_media_streaming_mode), and force-enables Mouse Keys when SM_MOUSEPRESENT is false so the cursor appears on a headless box. All of this is restored on stop.
- **punktfunk gap:** No equivalent exists anywhere in the host: grep for NtSetTimerResolution/timeBeginPeriod/DwmEnableMMCSS/SetPriorityClass/SPI_SETMOUSEKEYS/WlanSetInterface across crates/ returns zero hits. The Windows host runs at the default 15.6 ms timer granularity and NORMAL priority, hurting frame pacing and the encode/send thread split mentioned in CLAUDE.md.
- **Proposal:** Add a Windows-only session-hardening module called once around a session: NtSetTimerResolution(5000) (the unit is 100 ns, so 0.5 ms) with a timeBeginPeriod(1) fallback, DwmEnableMMCSS(true), SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS), and (optionally) WLAN media-streaming mode; restore all of it on session teardown. This is pure host-process tuning with no per-frame-path or core/ABI impact.

#### 18. Recover WASAPI loopback from default-device change and AUDCLNT_E_DEVICE_INVALIDATED

*Area:* `win:critic` · *Windows-host:* yes · *Severity:* high · *Effort:* medium

- **Apollo does:** audio.cpp:586-594 checks endpt_notification.check_default_render_device_changed() on each fill and returns capture_e::reinit; audio.cpp:625-656 maps AUDCLNT_E_DEVICE_INVALIDATED (and GetBuffer failures) to reinit, so switching the default output device or sleeping/waking the audio device transparently rebuilds the capture client.
- **punktfunk gap:** wasapi_cap.rs:136-162 only loops on the event handle; there is no IMMNotificationClient and no AUDCLNT_E_DEVICE_INVALIDATED branch (read_from_device_to_deque errors just kill the thread, wasapi_cap.rs:147), so a mid-session output-device switch permanently silences the stream until the session restarts.
- **Proposal:** On the capture thread, register an IMMNotificationClient (or poll GetDefaultAudioEndpoint) and treat a default-render change OR a device-invalidated error as a re-open: tear down the IAudioClient and re-acquire the new default endpoint in place, like the Linux PipeWire reconnect discipline. The change lives entirely in audio/wasapi_cap.rs.

#### 19. Implement true reference-frame invalidation with a multi-ref DPB instead of always-full-IDR

*Area:* `cmp:video-encode` · *Windows-host:* yes · *Severity:* high · *Effort:* large

- **Apollo does:** nvenc_base.cpp:268-281 sets maxNumRefFrames/maxNumRefFramesInDPB to 5 (HEVC/H264) and L0 to 1, enabling a deep DPB; invalidate_ref_frames (nvenc_base.cpp:574-610) calls nvEncInvalidateRefFrames per lost frame range, dedupes already-done ranges, falls back to IDR only when the range exceeds the DPB, and sets rfi_needs_confirmation so the next encoded frame is marked as the RFI fulfilment (nvenc_base.cpp:551-557, 490-491).
- **punktfunk gap:** crates/punktfunk-host/src/encode/nvenc.rs leaves ref frames at the preset default and exposes only request_keyframe (nvenc.rs:465-467), which always emits a full FORCE_IDR. gamestream/control.rs:163-177 collapses both RFI (0x0301) and request-IDR (0x0302) into the same full IDR. A full IDR at high resolution is the multi-millisecond spike punktfunk's own infinite-GOP comments call out (linux.rs:197-201) — true RFI avoids it for recoverable loss.
- **Proposal:** Extend the Encoder trait with an invalidate_ref_frames(first, last) method (default: fall back to request_keyframe). In the Windows NVENC config set maxNumRefFramesInDPB/maxNumRefFrames > 1 (and numRefL0 = 1) gated on SUPPORT_MULTIPLE_REF_FRAMES, implement invalidate_ref_frames via nvEncInvalidateRefFrames with the dedupe + IDR-fallback logic, and route control.rs 0x0301 to invalidate (carrying the lost frame range) while 0x0302 stays full-IDR.

#### 20. In-binary Windows service install + interactive-session launch

*Area:* `cmp:config-management` · *Windows-host:* yes · *Severity:* high · *Effort:* large

- **Apollo does:** config.cpp:1490-1534 handles the Windows shortcut/service launch dance inside the binary: `--shortcut`/`--shortcut-admin` handling, ShellExecuteExW(runas, `--shortcut-admin`) to self-elevate when the service isn't running, waits for the service, wait_for_ui_ready(), launch_ui(), then returns 1 so the foreground process does NOT also start a stream host. This is Sunshine/Apollo's mature service/UI two-process split that makes one-click launch work.
- **punktfunk gap:** punktfunk has no service install, self-elevation, or interactive-session bring-up in the binary. Deployment is documented as a manual chain of external scripts — scheduled task -> PsExec64 -i 1 -> launch.vbs -> host-run.cmd (docs/windows-host.md:77-96) — fragile and operator-hostile. main.rs has no install/service subcommand.
- **Proposal:** Add `punktfunk-host install`/`uninstall`/`service` subcommands (Windows-gated) that register a service or an Interactive/Highest scheduled task to launch the host in Session 1 (the documented requirement for DXGI duplication + SendInput), plus the self-elevate-if-not-running shortcut path. Reuse the CreateProcessAsUserW machinery already in the crate (capture/wgc_relay). This codifies the script chain into the binary without touching the per-frame path or core.

#### 21. Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame

*Area:* `win:cursor-compositing` · *Windows-host:* yes · *Severity:* high · *Effort:* large · **⊘ ALREADY-HANDLED (2026-06-16)**

> **Resolution — not a bug for punktfunk.** The gap below assumes a cursor moving over a static screen
> produces `AcquireNextFrame` **timeouts**. It does not: DXGI returns **S_OK for pointer-only updates**
> (`FrameInfo.LastMouseUpdateTime != 0`, `LastPresentTime == 0`), with the resource holding the
> (unchanged) desktop. `acquire()` always re-runs `present_acquired` on S_OK (`dxgi.rs:1407,1474`), which
> re-copies the desktop and recomposites the cursor at its new position. `last_present` is repeated only
> on a genuine `WAIT_TIMEOUT` (nothing changed) or a mid-rebuild gap — correct. The agent that raised this
> didn't account for DDA's pointer-update S_OK semantics, and the run was killed before the verify phase
> reached it. The only real delta from Apollo is a **perf** micro-opt (Apollo retains a clean copy and
> re-blends just the cursor rect, avoiding a full ~29 MB `CopyResource` per pointer update) — deferred as
> optional, pending evidence of GPU-copy pressure.
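
To make the pointer-only S_OK case in the resolution concrete, here is a hypothetical classifier over the two timestamps in `DXGI_OUTDUPL_FRAME_INFO` (`LastPresentTime` and `LastMouseUpdateTime`, where zero means "not updated since the previous frame was acquired"). `FrameKind` and `classify` are illustrative names, not code from dxgi.rs, and a timed-out acquire is modelled simply as `acquired == false`.

```rust
/// What a single AcquireNextFrame call delivered (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum FrameKind {
    /// WAIT_TIMEOUT: nothing changed, so repeating the last presented frame is correct.
    Timeout,
    /// S_OK with LastPresentTime == 0 but LastMouseUpdateTime != 0:
    /// the desktop image is unchanged, only the pointer moved or changed shape.
    PointerOnly,
    /// S_OK with a new desktop image (the pointer may also have changed).
    Desktop,
    /// S_OK with neither timestamp set.
    NoVisibleChange,
}

pub fn classify(acquired: bool, last_present_time: i64, last_mouse_update_time: i64) -> FrameKind {
    if !acquired {
        return FrameKind::Timeout;
    }
    match (last_present_time != 0, last_mouse_update_time != 0) {
        (true, _) => FrameKind::Desktop,
        (false, true) => FrameKind::PointerOnly,
        (false, false) => FrameKind::NoVisibleChange,
    }
}
```

Per the resolution, punktfunk sends every S_OK result (including `PointerOnly`) through `present_acquired`, so only `Timeout` reuses `last_present`.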
- **Apollo does:** Apollo treats a mouse-only update as a real update (display_vram.cpp:1162-1168) and keeps an intermediate D3D surface of the last desktop frame, so it can copy surface -> fresh image and re-blend the cursor at its new position with no new DDA frame (last_frame_variant state machine, display_vram.cpp:1239-1306).
- **punktfunk gap (as originally filed — see Resolution above; premise incorrect):** punktfunk only composites on a fresh AcquireNextFrame (dxgi.rs:1477); on timeout it repeats last_present (dxgi.rs:1547-1561), which has the OLD cursor position baked in, so a cursor moving over a static screen stutters/lags.
- **Proposal (superseded; only the perf variant remains):** Keep a clean intermediate copy of the last desktop frame (an extra DEFAULT texture). In acquire (dxgi.rs:1341), when AcquireNextFrame times out but update_cursor saw a position change (LastMouseUpdateTime changed) and the cursor is visible, copy the clean intermediate into gpu_copy and re-run composite_cursor_gpu, then return that as a fresh frame instead of repeating last_present.

#### 22. Add real reference-frame invalidation (RFI) instead of always forcing IDR

*Area:* `win:nvenc-d3d11` · *Windows-host:* yes · *Severity:* high · *Effort:* large

- **Apollo does:** Apollo keeps a deep DPB (maxNumRefFrames 5 for HEVC, 8 for AV1) but pins L0 ref to 1 (nvenc_base.cpp:268-281), then on a loss event calls nvEncInvalidateRefFrames per frame over the requested range, dedups against the last range, expands to the last-encoded index, escalates to IDR only if the range exceeds DPB depth, and tags the next frame rfi_needs_confirmation (nvenc_base.cpp:574-610). This lets the encoder re-reference an older still-valid frame rather than emit a multi-millisecond keyframe.
- **punktfunk gap:** punktfunk has NO invalidate path — request_keyframe() always forces a full IDR (nvenc.rs:437-442, 465-467); punktfunk1.rs:2153 / gamestream/stream.rs:336 wire 'RFI' straight to a keyframe.
Every recovery is a costly IDR spike, defeating the infinite-GOP design.
- **Proposal:** In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.

#### 23. Add a DS4 (DualShock4) ViGEm target on Windows with type auto-selection, motion, touchpad, battery and timestamp pump

*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* large

- **Apollo does:** Apollo selects X360 vs DS4 from client type / motion / touchpad caps (alloc_gamepad input.cpp:1175-1227), implements the full DS4 report including motion with inverse ViGEmBus calibration (ds4_update_motion :143-196), touchpad (gamepad_touch :1521-1620) and battery (:1654-1720), and, crucially, runs a 100 ms timestamp pump (ds4_update_ts_and_send :1454-1481), because apps stop registering DS4 input without an advancing wTimestamp.
- **punktfunk gap:** gamepad_windows.rs is X360-only (:17, :62); the Arrival metadata (LI_CTYPE_PS / motion / touchpad caps) is decoded but ignored (:103), and the dualsense module is Linux-only (inject.rs:297-298). PS clients on Windows lose motion, touchpad, lightbar, adaptive triggers and correct glyphs.
- **Proposal:** In gamepad_windows.rs, add a DS4Wired branch via vigem_client::DualShock4Wired, with a PadEntry enum covering both pad types. Resolve the type from the decoded Arrival (precedence: explicit env/client choice > PS type > motion/touchpad caps > X360), mirroring the existing GAMEPAD-preference negotiation.
Port Apollo's wTimestamp pump (5.333 µs units, re-sent every 100 ms), motion calibration constants (:157-170), and the touchpad byte packing (:1604-1608). Surface the LED color via the existing 0xCA/feedback plane.

#### 24. Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change

*Area:* `win:system-secure-desktop` · *Windows-host:* yes · *Severity:* high · *Effort:* large

- **Apollo does:** SunshineSvc.exe runs as LocalSystem in Session 0, loops on WTSGetActiveConsoleSessionId, clones its own token with DuplicateTokenEx(TokenPrimary) + SetTokenInformation(TokenSessionId), and calls CreateProcessAsUserW into `winsta0\default` inside a per-session job object (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE | BREAKAWAY_OK); it opts into SERVICE_ACCEPT_SESSIONCHANGE and on WTS_CONSOLE_CONNECT terminates and relaunches the host in the new session (tools/sunshinesvc.cpp:95,111,239,256,267,276-294).
- **punktfunk gap:** punktfunk has no Windows service; launch is a `PsExec64 -s -i 1` scheduled task hard-coded to session 1 (docs/windows-host.md:78-84), with the SERVICE_CONTROL_SESSIONCHANGE relaunch listed as unimplemented step 6 (docs/windows-secure-desktop.md:89). The launch scripts are not even in the repo.
- **Proposal:** Add a small Rust service binary (new crate or a punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx + SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop = `winsta0\default`) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.

#### 25. Elevate capture/encode/send thread priority on the host hot path

*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* small · **✓ verified**

- **Apollo does:** Apollo raises hot-path thread priority via platf::adjust_thread_priority: thread_priority_e::critical in the control thread (stream.cpp:1122) and ::high in the video-send and audio paths (stream.cpp:1333, 1672); the Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) (platform/windows/misc.cpp:1081-1102).
- **punktfunk gap:** punktfunk names its hot-path threads (stream.rs:44 video, stream.rs:204 send, punktfunk1.rs:1804 send_loop, punktfunk1.rs:2017/2328 send threads) but never sets a scheduling priority — every host capture/encode/send thread runs at default priority. Only the macOS client elevates (client.rs:169). On a loaded Windows desktop the encode/send thread can be preempted, adding jitter the frame-pacing logic can't recover.
- **Proposal:** Add a cross-platform raise_current_thread_priority() helper (SetThreadPriority on Windows, optionally AvSetMmThreadCharacteristics for MMCSS; sched/nice on Linux) and call it at the top of the GameStream send thread, the native send_loop, and the encode thread. Cheap, high-value jitter reduction, no design impact.
- **Verify verdict:** `confirmed_gap` — punktfunk: NO thread-priority call exists anywhere in the workspace (grep for SetThreadPriority/sched_setscheduler/setpriority/AvSetMm/THREAD_PRIORITY across crates/ returned zero hits). Hot-path threads are named-only at default priority: GameStream video thread crates/punktfunk-host/src/gamestream/stream.rs:44-53 (thread::Builder name "punktfunk-video") and GameStream send thread stream.rs:204-206 ("punktfunk-send"); native send threads crates/punktfunk-host/src/punktfunk1.rs:2017-2033 and punktfunk1.rs:2328-2333 ("punktfunk-send"), and the native send_loop at punktfunk1.rs:1804 — all spawned with no priority set.
The encode work shares the capture thread (punktfunk1.rs:2011-2013 "this thread captures+encodes ... and hands each AU to a dedicated send thread"), also at default priority. The windows crate is ALREADY a dependency with the needed feature: crates/punktfunk-host/Cargo.toml:141 enables "Win32_System_Threading" (SetThreadPriority/GetCurrentThread available, zero new deps). Apollo: confirmed it raises priority on every hot-path thread — capture src/video.cpp:1295 (critical), encode src/video.cpp:2359 and 2396 (high), video send src/stream.cpp:1333 (high), control src/stream.cpp:1122 (critical), audio src/stream.cpp:1672 + src/audio.cpp:94/208. The Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) at src/platform/windows/misc.cpp:1081-1102, plus DwmEnableMMCSS(true) (misc.cpp:1139) and AvSetMmThreadCharacteristics("Pro Audio") for the audio-capture thread (src/platform/windows/audio.cpp:540). CRITICAL NUANCE: Apollo's adjust_thread_priority is effectively Windows-only — src/platform/linux/misc.cpp:362-364 is "// Unimplemented" and src/platform/macos/misc.mm:218-220 is "// Unimplemented".
- **Refined:** Add a small cross-platform helper raise_current_thread_priority(level) and call it at the TOP of each hot-path thread body (so the calling thread itself is elevated): the GameStream send thread (stream.rs:206), the GameStream video/capture+encode thread (stream.rs:46), the native send threads (punktfunk1.rs:2021 and punktfunk1.rs:2331 closures, before/at the start of send_loop), and the native capture+encode thread (the punktfunk1.rs run body that owns capture+encode, punktfunk1.rs ~2011+). Windows: SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST) for the send/network thread (latency-critical, matches Apollo's video-send=high but the punktfunk send thread also does FEC+seal so HIGHEST is defensible) and THREAD_PRIORITY_ABOVE_NORMAL for capture+encode — using the windows crate already on Cargo.toml:141, no new deps.
Optionally associate the network/encode thread with MMCSS via AvSetMmThreadCharacteristics (registering an MMCSS task such as "Games" or "Pro Audio"; check whether the avrt binding needs a windows-crate feature beyond Win32_System_Threading) for higher-fidelity scheduling under DWM load; treat this as a follow-up, not the first cut. Linux (net-new beyond Apollo, since Apollo leaves it unimplemented and punktfunk is Linux-first): best-effort nice(-10)/setpriority on the send+encode threads. SCHED_FIFO/RR requires CAP_SYS_NICE or a non-zero RLIMIT_RTPRIO, which the host won't have by default, so do NOT default to realtime. A plain niceness bump is the safe portable choice, but lowering niceness also needs CAP_SYS_NICE (or a permissive RLIMIT_NICE), so without privilege the call fails with EPERM/EACCES and should simply be logged. Make every priority call best-effort (log-and-continue on failure, exactly as Apollo does at misc.cpp:1104). No async, no per-frame allocation, no ABI surface change — purely thread setup, so no design invariant is touched.

#### 43. Socket QoS / DSCP marking on the media sockets

*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* medium · **✓ verified**

- **Apollo does:** Apollo tags video and audio sockets for prioritized delivery: enable_socket_qos(...qos_data_type_e::video...) and (...audio...) are called per session (stream.cpp:1917, stream.cpp:1938); the Windows impl uses qWAVE QOSCreateHandle/QOSAddSocketToFlow with DSCP tagging (platform/windows/misc.cpp:1616-1652), with Linux/macOS equivalents.
- **punktfunk gap:** punktfunk sets NO QoS/DSCP anywhere — grep for qos/DSCP/IP_TOS across crates/punktfunk-host and crates/punktfunk-core finds only the x-nv-vqos ANNOUNCE keys (rtsp.rs:278) and a macOS *client* pthread QoS (client.rs:169). Neither the GameStream sockets (stream.rs:66 bind, audio) nor the native data socket (transport/udp.rs) request link-layer/router priority.
- **Proposal:** Add a small per-OS helper to mark the video/audio/data UDP sockets: DSCP EF/AF41 via IP_TOS/IPV6_TCLASS on Linux/macOS, qWAVE QOSAddSocketToFlow on Windows (gated behind an env/config opt-in).
Wire it into stream.rs socket setup and the native transport socket creation. Directly improves latency under contended Wi-Fi / shared uplink.
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK — gap is real on every media socket. Native data plane crates/punktfunk-core/src/transport/udp.rs:359-365 (UdpTransport::connect) and :374-414 (connect_via_punch) grow SO_SNDBUF/SO_RCVBUF (:431-447 grow_buffers via socket2::SockRef) and set GSO/USO, but never set IP_TOS/IPV6_TCLASS/SO_PRIORITY/qWAVE. GameStream sockets are bare std UdpSocket with no QoS: video crates/punktfunk-host/src/gamestream/stream.rs:66, audio audio.rs:305, control control.rs:36. RTSP does NOT parse the GameStream qosTrafficType keys at all (grep qosTrafficType in crates/punktfunk-host → exit 1), and rtsp.rs only reads x-nv-vqos bitrate/fec/codec (rtsp.rs:278). The only QoS in the tree is a macOS *client* pthread QoS class (core/src/client.rs:156-169) — unrelated to link-layer marking. socket2 is already a punktfunk-core dep (Cargo.toml:34), so DSCP via SockRef::set_tos is trivial to add. APOLLO — confirmed it does exactly this, on by default. Per-session calls: src/stream.cpp:1917 (video) and :1938 (audio) → platf::enable_socket_qos(..., videoQosType/audioQosType != 0). Those flags come from RTSP src/rtsp.cpp:1005-1006 and are DEFAULTED non-zero at src/rtsp.cpp:982-983 (x-nv-vqos qosTrafficType="5", x-nv-aqos="4"), so QoS is on for stock Moonlight. Linux impl: src/platform/linux/misc.cpp:797-851 sets IP_TOS/IPV6_TCLASS (DSCP 40 (CS5) for video, 48 (CS6) for audio, shifted `<< 2`) plus SO_PRIORITY 5/6. Windows impl: src/platform/windows/misc.cpp:1616-1722 dynamically loads qwave.dll and uses QOSCreateHandle/QOSAddSocketToFlow with QOSTrafficTypeAudioVideo/Voice — and crucially returns nullptr (no-op) unless dscp_tagging is set (:1622-1625). macOS: src/platform/macos/misc.mm:446.
- **Refined:** Add a per-OS set_media_qos(socket, kind) helper.
Linux/macOS: use the already-present socket2 — SockRef::set_tos(dscp << 2) for IPv4 / set_tclass_v6 for IPv6 (for example AF41 = 34, or Apollo's 40/CS5 for video and 48/CS6 for audio), plus SO_PRIORITY on Linux (video=5, audio=6; 6 is the highest value allowed without CAP_NET_ADMIN; set it AFTER TOS, since setting TOS resets it — Apollo linux/misc.cpp:841-845). Wire it into UdpTransport::connect / connect_via_punch (the native punktfunk/1 data plane — the primary, highest-value target) behind an opt-in env (PUNKTFUNK_DSCP=1) and optionally a Config field, plus the GameStream stream.rs:66 / audio.rs:305 / control.rs:36 sockets. IMPORTANT Windows-host caveat (this doc's focus, and where the naive version fails): on Windows, plain IP_TOS setsockopt is silently stripped by the OS unless a registry/group-policy QoS policy ('Do not use NLA') is configured — which is exactly why Apollo uses qWAVE (QOSAddSocketToFlow) instead. So a one-line socket2 set_tos does NOT tag on the wire on Windows. To actually deliver value on the Windows host, port Apollo's qWAVE path (runtime LoadLibraryExA qwave.dll, QOSCreateHandle once, QOSAddSocketToFlow per socket with QOSTrafficTypeAudioVideo/Voice) including the dual-stack v4-mapped connect() workaround (windows/misc.cpp:1675-1700) — note our data socket is already connect()ed (udp.rs:361), which sidesteps most of that hack. Keep RAII teardown (QOSRemoveSocketFromFlow on drop) like Apollo's qos_t/deinit_t. This is purely socket setup, off the per-frame path, with no core C-ABI change and no async — fully compatible with all three design invariants.

#### 90. Bitrate-derived rate-control pacing (vs frame-interval-only)

*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* medium · *Effort:* medium · **✓ verified**

- **Apollo does:** Apollo paces each frame's packets at the *negotiated bitrate*: `ratecontrol_packets_in_1ms = giga*80/100/1000/blocksize/8` (stream.cpp:1464), and sleeps the send loop to that per-millisecond budget across the frame (stream.cpp:1578-1627), so the sender shapes to the link's allotted rate, not just the frame deadline.
- **punktfunk gap:** Both punktfunk send pacers spread purely over the FRAME INTERVAL: the GameStream sender uses `budget = frame_interval * 0.75` (stream.rs:209) and the native paced_submit uses 0.9 × the time remaining until the next frame's deadline (punktfunk1.rs:1752) — neither derives a packets-per-ms budget from cfg.bitrate_kbps (the bitrate is only used to open NVENC, stream.rs:275). A spiky IDR or VBR overshoot can still microburst above the negotiated rate within its frame window.
- **Proposal:** Compute a bitrate-derived per-millisecond send budget (like Apollo's ratecontrol_packets_in_1ms) from the negotiated bitrate and pace overflow to THAT rate inside paced_submit / spawn_sender, taking the min of the frame-interval budget and the bitrate budget. Smooths VBR bursts on rate-limited links without breaking the existing microburst fast path.
- **Verify verdict:** `partial` — PUNKTFUNK gap is real: both pacers spread over the FRAME INTERVAL only, never the bitrate. GameStream sender: `let budget = frame_interval.mul_f32(0.75)` (crates/punktfunk-host/src/gamestream/stream.rs:209). Native paced_submit: `let budget = deadline.checked_duration_since(pace_start)...mul_f32(0.9)` (crates/punktfunk-host/src/punktfunk1.rs:1752-1755) where deadline = `next += interval` (punktfunk1.rs:2162) and `interval = Duration::from_secs_f64(1.0 / effective_hz...)` (punktfunk1.rs:2357). bitrate_kbps only configures NVENC (stream.rs:275; punktfunk1.rs:2306, 2694) and is never fed to the pacer.
So far the gap claim holds. BUT the Apollo characterization in the proposal is FACTUALLY WRONG: Apollo's `size_t ratecontrol_packets_in_1ms = std::giga::num * 80 / 100 / 1000 / blocksize / 8;` (src/stream.cpp:1464) is a HARDCODED 80% of 1 Gigabit/sec — a fixed constant (800,000 bits, i.e. 100,000 bytes, per millisecond, expressed in blocksize-sized packets). grep across stream.cpp shows the negotiated/session bitrate never enters this formula (only std::giga::num, blocksize, and the 80/100 constant appear at lines 1464/1578-1582/1625-1627). Apollo paces to a FIXED ~800 Mbps link ceiling regardless of negotiated bitrate; it is NOT "negotiated-bitrate pacing." punktfunk's own design notes deliberately reject clamping to negotiated bitrate: "The encoder is pixel-rate bound, not bitrate bound" (punktfunk1.rs:321), and the whole 1 Gbps+ effort raised the ceiling (punktfunk1.rs:1617-1619, MAX_BITRATE_KBPS ~2 Gbps).
- **Refined:** Reject the proposal AS WRITTEN — its premise ("Apollo paces to the negotiated bitrate") is false; Apollo paces to a hardcoded 80%-of-1Gbps fixed link ceiling (stream.cpp:1464), and pacing to negotiated bitrate would actively regress punktfunk (VBR/IDR spikes legitimately exceed average bitrate, and punktfunk explicitly treats the encoder as pixel-rate-bound, not bitrate-bound — punktfunk1.rs:321). If anything is worth porting, it is the FIXED per-millisecond link-rate ceiling concept, not bitrate-derived pacing: optionally compute a fixed packets-per-ms budget from a configurable link-rate ceiling (default high, e.g. matching MAX_BITRATE_KBPS, env-overridable like PUNKTFUNK_PACE_BURST_KB) and take min(frame-interval budget, link-ceiling budget) inside paced_submit/spawn_sender — purely as a microburst smoother for rate-limited links, NOT tied to cfg.bitrate_kbps. Note punktfunk already has the microburst fast path (burst_cap, punktfunk1.rs:2005-2009 / paced_submit:1734-1743) and frame-interval spreading, which together already address the "spiky IDR microburst" symptom the proposal cites.
Recommend deferring unless a measured rate-limited-link regression appears; the current frame-interval + burst-cap pacing covers the cited risk.

#### 94. Consume the GameStream client loss-stats report

*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* low · *Effort:* small · **✓ verified**

- **Apollo does:** Apollo maps the IDX_LOSS_STATS control message (stream.cpp:943) and reads loss count, time window, and last-good-frame from the client's periodic report — the hook for loss-driven decisions (e.g. adaptive FEC/bitrate).
- **punktfunk gap:** punktfunk's GameStream control handler only acts on IDR/RFI requests (control.rs:163-177) and decodes input/gamepad; it has no handler for the periodic loss-stats packet the client sends, so the host has zero visibility into client-observed packet loss on the GameStream plane (the native plane has ProbeResult bandwidth feedback but no in-stream loss telemetry either).
- **Proposal:** Decode the loss-stats control message in control.rs on_receive and log/expose it (loss count, window, last-good-frame). Even read-only, it enables future adaptive FEC percentage or bitrate de-rating; it is the prerequisite signal Apollo already captures.
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK gap is real. crates/punktfunk-host/src/gamestream/control.rs:165-177 — after decrypt, the only inner-type dispatch is `if matches!(inner, 0x0301 | 0x0302 | 0x0305)` → force_idr; everything else falls through to gamepad::decode (returns None for non-controller) then input::decode, which at crates/punktfunk-host/src/gamestream/input.rs:35 returns empty unless `type == 0x0206`. So a loss-stats packet (`0x0201`) is silently dropped — `on_receive` has no branch for it. A broad grep across crates/ for loss-stats/last-good-frame/0x0201 found nothing (only DXGI's unrelated "last good frame" comment at capture/dxgi.rs:751).
The native plane has only end-of-burst ProbeResult bandwidth/loss telemetry (crates/punktfunk-core/src/client.rs:436, abi.rs:1499) — a one-shot speed test, NOT continuous in-stream loss feedback. APOLLO confirms the claim: src/stream.cpp:41 `#define IDX_LOSS_STATS 3`, src/stream.cpp:61 maps it to wire type `0x0201`, and src/stream.cpp:943-957 reads `int32_t *stats` with stats[0]=count, stats[1]=time-window ms, stats[3]=lastGoodFrame (logged at BOOST verbose). Wire offset confirmed: the map callback receives `next_payload = plaintext.data()+4` (src/stream.cpp:1104), i.e. the body AFTER the 4-byte `[type][payloadLength]` header, so stats[0..] is at body offset 0. Note: Apollo only LOGS it; it does not yet drive adaptive FEC/bitrate off it either.
- **Refined:** Add one branch to control.rs `on_receive`: when the decrypted `pt` inner type (LE u16 at pt[0..2]) == 0x0201 and pt.len() >= 20, decode the body, which holds four LE i32 (stats[0..3]): pt[4..8]=loss_count, pt[8..12]=time_window_ms, pt[16..20]=last_good_frame, skipping stats[2] at pt[12..16] (mirroring Apollo's stats[0]/stats[1]/stats[3]; verify endianness against a real Moonlight capture — moonlight-common-c writes these as host-order/LE, and punktfunk already treats control inner fields as LE). Initially log at debug/trace and optionally surface via an AtomicU32 in AppState or the mgmt API so the web console can show client-observed loss. Keep it read-only first. Caveat for the backlog: this is a low-value telemetry hook, NOT adaptive control. The actual lever (adaptive FEC % / bitrate de-rating) is a separate, larger piece of work that Apollo itself does not implement off this signal — do not over-scope. Place it next to the existing 0x0301/0x0302/0x0305 dispatch so the control hot path stays a single decrypt + cheap type match. *Windows-host:* no is correct: this is GameStream-plane and OS-independent, and the punktfunk/1 native plane is the higher-priority protocol, so prioritize accordingly.
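
A minimal sketch of the refined decode, assuming `pt` is the decrypted control payload starting at the 4-byte `[type][payloadLength]` header and that the fields are little-endian (still to be confirmed against a real Moonlight capture, as noted above). `LossStats` and `decode_loss_stats` are hypothetical names; the real branch would sit beside the 0x0301/0x0302/0x0305 dispatch in control.rs `on_receive`. Being a pure function over the already-decrypted buffer, it adds no allocation to the control path.

```rust
/// Client-observed loss report carried by the 0x0201 control message (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct LossStats {
    pub loss_count: i32,
    pub time_window_ms: i32,
    pub last_good_frame: i32,
}

const LOSS_STATS_TYPE: u16 = 0x0201;

/// Read a little-endian i32 from the first four bytes of `b`.
fn le_i32(b: &[u8]) -> i32 {
    i32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Returns `Some` only for a well-formed loss-stats message; anything else
/// should fall through to the existing gamepad/input decoders.
pub fn decode_loss_stats(pt: &[u8]) -> Option<LossStats> {
    // 4-byte header + stats[0..=3]; stats[2] (pt[12..16]) is skipped, as in Apollo.
    if pt.len() < 20 || u16::from_le_bytes([pt[0], pt[1]]) != LOSS_STATS_TYPE {
        return None;
    }
    Some(LossStats {
        loss_count: le_i32(&pt[4..8]),
        time_window_ms: le_i32(&pt[8..12]),
        last_good_frame: le_i32(&pt[16..20]),
    })
}
```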
_(28 detailed; remaining 68 medium/low items are in the table above with citations available in Parts 2–3.)_