feat(host): Apollo-backlog hardening — cert gate, NVENC RFI, media QoS, async injector

A pass over the apollo-comparison backlog (re-verified against current code).
Lands four items end-to-end plus a Windows-DualSense scoping doc.

- #5/#92/#26 — GameStream paired-cert allow-list. tls.rs surfaces the verified
  peer cert to handlers (serve_https + PeerCertFingerprint, now shared with the
  mgmt API instead of duplicated); nvhttp gates /launch /resume /applist /cancel
  on AppState.paired and reports a real PairStatus; save_paired writes atomically
  (temp+rename). Closes the "mTLS accepts any client cert" hole; adds a regression test.

- #6/#51/#19/#22 — NVENC caps query -> reference-frame invalidation. nvenc.rs
  query_caps probes nvEncGetEncodeCaps (max dims / 10-bit / custom-VBV / RFI),
  rejecting over-range modes and degrading 10-bit->8-bit instead of an opaque
  InvalidParam. New Encoder::invalidate_ref_frames (the default impl returns false, so the
  caller forces a keyframe); the Windows NVENC path implements real RFI (multi-ref DPB +
  nvEncInvalidateRefFrames, dedup + IDR-on-overflow). control.rs decodes the
  0x0301 lost-frame range (Apollo's IDX_INVALIDATE_REF_FRAMES) -> AppState.rfi_range
  -> encode loop, falling back to a keyframe. NOTE: the Windows NVENC impl is still
  pending an RTX-box/CI build (it can't be compiled on Linux); it was adversarially reviewed
  against the SDK.

- #43/#72 — media socket QoS + buffer growth. New punktfunk_core::transport::qos:
  grow_socket_buffers (factored out the native plane's 32MB SO_SNDBUF growth so the
  GameStream sockets reuse it) + set_media_qos (opt-in PUNKTFUNK_DSCP=1: DSCP CS5
  video / CS6 audio + Linux SO_PRIORITY, Apollo's scheme). Wired into UdpTransport
  and the GameStream video/audio sockets. Windows IP_TOS needs qWAVE (follow-up).

- #8/#45 — GameStream input injection off the ENet service thread. on_receive no
  longer injects inline (a slow inject head-blocked ENet keepalive/retransmit); it
  forwards to a dedicated injector thread. The hardened InjectorService moved from
  punktfunk1 into crate::inject (shared by both planes) + a coalesce step that sums
  adjacent relative-mouse/scroll deltas while preserving button/key/abs ordering.
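The save_paired change swaps an in-place write for temp+rename, so a crash mid-write can never leave a truncated pairing file. A minimal std-only sketch of that pattern follows; the helper name `write_atomically` and the `.tmp` sibling naming are hypothetical, not taken from the commit, and the real code may differ (for example in whether it also fsyncs the parent directory).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `path` with `contents` so a reader sees either the old file or
/// the complete new one, never a torn write. POSIX guarantees the rename is atomic within one
/// filesystem; `fs::rename` also replaces an existing target on Windows.
pub fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Keep the temp file next to the target so the rename never crosses filesystems.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = Path::new(&tmp_name);
    {
        let mut f = File::create(tmp)?;
        f.write_all(contents)?;
        // Push the data to disk before the rename makes it visible under the real name.
        f.sync_all()?;
    }
    fs::rename(tmp, path)
}
```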
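The RFI bullet describes three steps: decode a lost-frame range, drop duplicate reports, and fall back to an IDR when invalidation can't recover. The sketch below illustrates that decision under stated assumptions. It assumes the 0x0301 payload begins with two little-endian `i64` frame indices (first, last), and it models the DPB as the `dpb_size` most recently encoded frames. `RfiAction`, `RfiPolicy` and `decode_lost_range` are hypothetical names; the commit's control.rs/nvenc.rs code is not shown here.

```rust
use std::convert::TryInto;

/// What the encode loop should do with a lost-frame report (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum RfiAction {
    /// Already covered by an earlier invalidation or keyframe; nothing to do.
    Skip,
    /// Invalidate these frames; the encoder predicts from an older, clean reference.
    Invalidate { first: i64, last: i64 },
    /// No clean reference survives in the DPB; send an IDR keyframe instead.
    Keyframe,
}

/// ASSUMED wire format: two little-endian i64s (first, last). None if short or inverted.
pub fn decode_lost_range(payload: &[u8]) -> Option<(i64, i64)> {
    let first = i64::from_le_bytes(payload.get(0..8)?.try_into().ok()?);
    let last = i64::from_le_bytes(payload.get(8..16)?.try_into().ok()?);
    (first <= last).then_some((first, last))
}

/// Hypothetical dedup + overflow policy over a DPB of `dpb_size` reference frames.
pub struct RfiPolicy {
    dpb_size: i64,
    /// Highest frame index already invalidated (or superseded by a keyframe).
    invalidated_through: i64,
}

impl RfiPolicy {
    pub fn new(dpb_size: i64) -> Self {
        Self { dpb_size, invalidated_through: -1 }
    }

    /// `current_frame` is the index of the most recently encoded frame.
    pub fn decide(&mut self, first: i64, last: i64, current_frame: i64) -> RfiAction {
        // Dedup: a report fully inside an already-handled range needs no action.
        if last <= self.invalidated_through {
            return RfiAction::Skip;
        }
        // The DPB holds frames (current_frame - dpb_size, current_frame]. Recovery needs frame
        // `first - 1` (the last good one) to still be a reference; otherwise it overflowed.
        let oldest_ref = current_frame - self.dpb_size + 1;
        if first - 1 < oldest_ref {
            // An IDR resets prediction, so every loss report up to now becomes moot.
            self.invalidated_through = current_frame;
            return RfiAction::Keyframe;
        }
        self.invalidated_through = last;
        RfiAction::Invalidate { first, last }
    }
}
```

Invalidation only helps while some older, uncorrupted frame is still held as a reference; once the loss reaches past the DPB window, a keyframe is the only recovery, which is what the `Keyframe` arm models.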
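For the injector bullet: the ENet receive path only enqueues, while a dedicated thread drains, coalesces and injects, so a slow injection can't delay keepalive/retransmit servicing. The sketch below shows both halves with std channels. `InputEvent`, `coalesce` and `spawn_injector` are hypothetical stand-ins for the crate::inject types, and `inject` is a caller-supplied closure standing in for the OS injection call.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical event type; the real crate::inject types are not shown in this commit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InputEvent {
    MouseRel { dx: i32, dy: i32 },
    Scroll { delta: i32 },
    MouseAbs { x: i32, y: i32 },
    Button { id: u8, down: bool },
    Key { code: u16, down: bool },
}

/// Sum runs of adjacent relative-mouse events and runs of adjacent scroll events. Any other
/// event is a barrier, so button/key/absolute ordering relative to motion is preserved.
pub fn coalesce(events: Vec<InputEvent>) -> Vec<InputEvent> {
    let mut out: Vec<InputEvent> = Vec::with_capacity(events.len());
    for ev in events {
        let merged = match (out.last_mut(), ev) {
            (Some(InputEvent::MouseRel { dx, dy }), InputEvent::MouseRel { dx: ndx, dy: ndy }) => {
                *dx = dx.saturating_add(ndx);
                *dy = dy.saturating_add(ndy);
                true
            }
            (Some(InputEvent::Scroll { delta }), InputEvent::Scroll { delta: nd }) => {
                *delta = delta.saturating_add(nd);
                true
            }
            _ => false,
        };
        if !merged {
            out.push(ev);
        }
    }
    out
}

/// Spawn the injector thread. The network thread keeps only the cheap `Sender::send`.
pub fn spawn_injector<F>(mut inject: F) -> (Sender<InputEvent>, JoinHandle<()>)
where
    F: FnMut(InputEvent) + Send + 'static,
{
    let (tx, rx): (Sender<InputEvent>, Receiver<InputEvent>) = mpsc::channel();
    let handle = thread::spawn(move || {
        // Block for one event, then drain whatever queued behind it and coalesce the batch.
        while let Ok(first) = rx.recv() {
            let mut batch = vec![first];
            batch.extend(rx.try_iter());
            for ev in coalesce(batch) {
                inject(ev);
            }
        }
        // recv() fails once every Sender is dropped, i.e. the session ended.
    });
    (tx, handle)
}
```

Only strictly adjacent events of the same relative kind merge, so a button press between two motions still lands between them. Draining with `try_iter` after a blocking `recv` lets a backlog collapse into one batch without busy-waiting.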

Docs: re-verified apollo-comparison.md status (22 items already done/obsolete since
the snapshot) + windows-dualsense-scoping.md (ViGEm can't emulate a DualSense; real
DS5 on Windows needs a VHF virtual-HID driver — web-research pass pending).

fmt + clippy -D warnings clean; full workspace test suite green; no C-ABI/OpenAPI drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 00:06:30 +00:00
parent a2a6b858f7
commit 450bcf1e7b
20 changed files with 1060 additions and 281 deletions
@@ -2,9 +2,11 @@
 //! directly — no async runtime is involved.
 
 mod loopback;
+mod qos;
 mod udp;
 
 pub use loopback::{loopback_pair, LoopbackTransport};
+pub use qos::{grow_socket_buffers, set_media_qos, MediaClass};
 /// Windows-only: reusable USO (UDP Send Offload) batch send for callers that own their own connected
 /// socket (the GameStream video sender) rather than going through [`UdpTransport`].
 #[cfg(target_os = "windows")]
@@ -0,0 +1,145 @@
+//! Shared UDP socket tuning for the media planes: send/recv buffer growth + best-effort link-layer
+//! QoS.
+//!
+//! [`grow_socket_buffers`] is the `SO_SNDBUF`/`SO_RCVBUF` growth the native data plane applies; the
+//! GameStream video/audio sockets reuse it so they don't go ENOBUFS-bound at high bitrate.
+//!
+//! [`set_media_qos`] DSCP-tags the latency-sensitive video/audio traffic (+ Linux `SO_PRIORITY`) so a
+//! QoS-aware path (Wi-Fi WMM access categories, a managed switch, a shaped uplink) can prioritize it
+//! over bulk flows. Mirrors what Apollo/Sunshine tag — DSCP **CS5** for video, **CS6** for audio. It
+//! is **opt-in** (`PUNKTFUNK_DSCP=1`): DSCP can interact badly with some consumer ISPs/routers, and on
+//! Windows a plain `IP_TOS` is silently stripped unless a qWAVE policy is active (Apollo uses the
+//! qWAVE API there — that port is a follow-up; today this is a no-op on the wire on Windows).
+
+use std::net::UdpSocket;
+
+/// Target kernel socket-buffer size (`SO_SNDBUF`/`SO_RCVBUF`). A high-resolution frame is a burst (a
+/// 5120×1440 keyframe is ~130 packets the send thread hands to `sendmmsg` at once); the default UDP
+/// buffer (~208 KB on Linux) overflows on it, which EAGAINs the host send (dropping packets) or drops
+/// on the client recv — and with infinite-GOP a single lost frame freezes the decode until the next
+/// RFI refresh. Requested large; the OS clamps to `net.core.{wmem,rmem}_max` (Linux) /
+/// `kern.ipc.maxsockbuf` (macOS).
+///
+/// Sized for 1 Gbps+: at ~1.2 Gbps on the wire an 8 MB buffer is only ~49 ms of steady state, and a
+/// single multi-MB IDR keyframe (~4 MB ≈ 3300 packets) instantly fills most of it. 32 MB gives ~200 ms
+/// of headroom and absorbs a keyframe burst without EAGAIN/ENOBUFS drops. (Paced sending —
+/// `punktfunk1.rs::paced_submit` — spreads a big frame's overflow, so this buffer mostly absorbs the
+/// immediate microburst rather than a whole unpaced frame.)
+pub(crate) const TARGET_SOCKBUF: usize = 32 * 1024 * 1024;
+
+/// Best-effort grow of `SO_SNDBUF`/`SO_RCVBUF` to [`TARGET_SOCKBUF`]. A failure isn't fatal (the
+/// stream just runs lossier); a grant far below the request means the OS cap is too low for clean
+/// 4K/5K streaming, so warn with the knob to raise.
+pub fn grow_socket_buffers(socket: &UdpSocket) {
+    let sock = socket2::SockRef::from(socket);
+    let _ = sock.set_send_buffer_size(TARGET_SOCKBUF);
+    let _ = sock.set_recv_buffer_size(TARGET_SOCKBUF);
+    // The kernel reports back the (possibly clamped, Linux-doubled) granted size.
+    let granted = sock
+        .send_buffer_size()
+        .unwrap_or(0)
+        .min(sock.recv_buffer_size().unwrap_or(0));
+    if granted < TARGET_SOCKBUF / 4 {
+        tracing::warn!(
+            granted_kb = granted / 1024,
+            "UDP socket buffer capped well below target — high-resolution streaming may drop \
+             frames; raise net.core.wmem_max / net.core.rmem_max (Linux) for clean 4K/5K"
+        );
+    }
+}
+
+/// Media class of a socket — selects the DSCP code point (and Linux `SO_PRIORITY`), matching Apollo's
+/// mapping: video = CS5, audio = CS6.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum MediaClass {
+    Video,
+    Audio,
+}
+
+impl MediaClass {
+    /// DSCP code point (the high 6 bits of the IPv4 TOS / IPv6 traffic-class byte).
+    const fn dscp(self) -> u32 {
+        match self {
+            MediaClass::Video => 40, // CS5
+            MediaClass::Audio => 48, // CS6
+        }
+    }
+}
+
+/// Whether DSCP/QoS marking is enabled (`PUNKTFUNK_DSCP=1`). Off by default.
+pub(crate) fn dscp_enabled() -> bool {
+    matches!(
+        std::env::var("PUNKTFUNK_DSCP").as_deref(),
+        Ok("1") | Ok("true") | Ok("on")
+    )
+}
+
+/// Best-effort: tag `socket`'s outgoing packets for prioritized delivery of its media class. A no-op
+/// unless `PUNKTFUNK_DSCP=1`. Every step is best-effort (failures logged at debug, never fatal) — QoS
+/// is a nicety, not required for correctness.
+///
+/// IPv4 only (all current media sockets bind `0.0.0.0`); a v6 socket simply isn't tagged. On Windows
+/// the `IP_TOS` set succeeds but the OS doesn't tag the wire without a qWAVE policy (follow-up).
+pub fn set_media_qos(socket: &UdpSocket, class: MediaClass) {
+    if dscp_enabled() {
+        apply_media_qos(socket, class);
+    }
+}
+
+/// The unconditional QoS application, factored out of [`set_media_qos`] so it is directly testable
+/// without touching the process-global `PUNKTFUNK_DSCP` env. Best-effort (every step logs-and-continues).
+fn apply_media_qos(socket: &UdpSocket, class: MediaClass) {
+    let sock = socket2::SockRef::from(socket);
+    // DSCP occupies the high 6 bits of the TOS byte → shift left 2.
+    if let Err(e) = sock.set_tos_v4(class.dscp() << 2) {
+        tracing::debug!(error = %e, ?class, "set IP_TOS (DSCP) failed — QoS marking skipped");
+    }
+    // SO_PRIORITY must be set AFTER IP_TOS (setting TOS resets SO_PRIORITY to 0 on Linux). Linux-only;
+    // 6 is the highest priority allowed without CAP_NET_ADMIN, so video=5 / audio=6 (Apollo's scheme).
+    #[cfg(target_os = "linux")]
+    {
+        let prio = match class {
+            MediaClass::Video => 5,
+            MediaClass::Audio => 6,
+        };
+        if let Err(e) = sock.set_priority(prio) {
+            tracing::debug!(error = %e, "set SO_PRIORITY failed");
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn dscp_code_points_match_apollo() {
+        // CS5 video / CS6 audio, shifted into the TOS byte (high 6 bits).
+        assert_eq!(MediaClass::Video.dscp(), 40);
+        assert_eq!(MediaClass::Audio.dscp(), 48);
+        assert_eq!(MediaClass::Video.dscp() << 2, 0xA0);
+        assert_eq!(MediaClass::Audio.dscp() << 2, 0xC0);
+    }
+
+    #[test]
+    fn qos_and_buffer_growth_are_best_effort_and_never_panic() {
+        let sock = UdpSocket::bind("127.0.0.1:0").unwrap();
+        // No PUNKTFUNK_DSCP in the test env → early return; must not panic regardless.
+        set_media_qos(&sock, MediaClass::Video);
+        set_media_qos(&sock, MediaClass::Audio);
+        grow_socket_buffers(&sock);
+    }
+
+    #[test]
+    fn apply_qos_tags_the_socket() {
+        // Exercise the enabled path directly (no env), and read the options back where we can.
+        let sock = UdpSocket::bind("127.0.0.1:0").unwrap();
+        apply_media_qos(&sock, MediaClass::Video);
+        #[cfg(target_os = "linux")]
+        {
+            let s = socket2::SockRef::from(&sock);
+            assert_eq!(s.tos_v4().unwrap(), 0xA0, "video → CS5 in the TOS byte");
+            assert_eq!(s.priority().unwrap(), 5, "video → SO_PRIORITY 5");
+        }
+    }
+}
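The `class.dscp() << 2` in apply_media_qos follows from the layout of the IPv4 TOS byte: the 6-bit DSCP occupies bits 7..2 and the 2-bit ECN field bits 1..0 (RFC 2474 / RFC 3168), and a class-selector code point CSn is just n << 3. A small worked sketch of that arithmetic; `class_selector` and `tos_byte` are illustrative helpers, not part of qos.rs.

```rust
/// Class Selector code point CSn (RFC 2474): the 3-bit class n in the top bits of the 6-bit DSCP.
pub const fn class_selector(n: u8) -> u8 {
    (n & 0b111) << 3
}

/// IPv4 TOS byte: DSCP in bits 7..2, ECN in bits 1..0 (RFC 3168).
pub const fn tos_byte(dscp: u8, ecn: u8) -> u8 {
    ((dscp & 0b11_1111) << 2) | (ecn & 0b11)
}
```

Because apply_media_qos passes the shifted DSCP as the whole byte, the ECN bits it requests are 0, and CS5/CS6 come out as 0xA0/0xC0, matching the `dscp_code_points_match_apollo` test.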
@@ -413,26 +413,15 @@ pub struct UdpTransport {
 }
 
 impl UdpTransport {
-    /// Target kernel socket-buffer size. A high-resolution frame is a burst (a 5120×1440
-    /// keyframe is ~130 packets the send thread hands to `sendmmsg` at once); the default
-    /// UDP buffer (~208 KB on Linux) overflows on it, which EAGAINs the host send (dropping
-    /// packets) or drops on the client recv — and with infinite-GOP a single lost frame
-    /// freezes the decode until the next RFI refresh. Requested large; the OS clamps to
-    /// `net.core.{wmem,rmem}_max` (Linux) / `kern.ipc.maxsockbuf` (macOS).
-    ///
-    /// Sized for 1 Gbps+: at ~1.2 Gbps on the wire an 8 MB buffer is only ~49 ms of steady state,
-    /// and a single multi-MB IDR keyframe (~4 MB ≈ 3300 packets) instantly fills most of it. 32 MB
-    /// gives ~200 ms of headroom and absorbs a keyframe burst without EAGAIN drops. (Paced sending
-    /// — `punktfunk1.rs::paced_submit` — now spreads a big frame's overflow, so this buffer mostly absorbs
-    /// the immediate microburst rather than a whole unpaced frame.)
-    const TARGET_SOCKBUF: usize = 32 * 1024 * 1024;
-
     /// Bind `local` and `connect` to `peer`, so `send`/`recv` need no address and the
     /// kernel filters to this peer. Non-blocking, matching the [`Transport`] contract.
     pub fn connect(local: &str, peer: &str) -> std::io::Result<Self> {
         let socket = UdpSocket::bind(local)?;
         socket.connect(peer)?;
-        Self::grow_buffers(&socket);
+        super::qos::grow_socket_buffers(&socket);
+        // The native data plane is video-dominant — tag it as the video class (opt-in via
+        // PUNKTFUNK_DSCP). Each end marks its own egress.
+        super::qos::set_media_qos(&socket, super::qos::MediaClass::Video);
         socket.set_nonblocking(true)?;
         Ok(UdpTransport { socket })
     }
@@ -481,7 +470,8 @@ impl UdpTransport {
         let target = observed.map(|s| s.to_string());
         socket.connect(target.as_deref().unwrap_or(fallback_peer))?;
         socket.set_read_timeout(None)?;
-        Self::grow_buffers(&socket);
+        super::qos::grow_socket_buffers(&socket);
+        super::qos::set_media_qos(&socket, super::qos::MediaClass::Video);
         socket.set_nonblocking(true)?;
         Ok((UdpTransport { socket }, punched))
     }
@@ -498,27 +488,6 @@ impl UdpTransport {
         self.socket.local_addr()
     }
 
-    /// Best-effort grow of SO_SNDBUF/SO_RCVBUF (see [`TARGET_SOCKBUF`]). A failure isn't fatal
-    /// (the stream just runs lossier); a grant far below the request means the OS cap is too
-    /// low for clean 4K/5K streaming, so warn once with the knob to raise.
-    fn grow_buffers(socket: &UdpSocket) {
-        let sock = socket2::SockRef::from(socket);
-        let _ = sock.set_send_buffer_size(Self::TARGET_SOCKBUF);
-        let _ = sock.set_recv_buffer_size(Self::TARGET_SOCKBUF);
-        // The kernel reports back the (possibly clamped, Linux-doubled) granted size.
-        let granted = sock
-            .send_buffer_size()
-            .unwrap_or(0)
-            .min(sock.recv_buffer_size().unwrap_or(0));
-        if granted < Self::TARGET_SOCKBUF / 4 {
-            tracing::warn!(
-                granted_kb = granted / 1024,
-                "UDP socket buffer capped well below target — high-resolution streaming may drop \
-                 frames; raise net.core.wmem_max / net.core.rmem_max (Linux) for clean 4K/5K"
-            );
-        }
-    }
-
     /// Apple batched receive via `recvmsg_x` — drains up to `out.len()` datagrams in one syscall into
     /// the caller's reused buffers (the recv counterpart of Linux `recvmmsg`, which Darwin lacks).
     /// SAFETY: each `MsghdrX` holds a raw pointer into `iovs`, which holds raw pointers into `out`'s
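The TARGET_SOCKBUF sizing and the "granted < target / 4" warning in grow_socket_buffers both rest on a little arithmetic. A minimal sketch to check it follows. The helper names are hypothetical; `linux_granted` models the documented socket(7) behavior (the request is capped at net.core.wmem_max/rmem_max, then doubled for bookkeeping overhead) and ignores the kernel's small minimum-size floor.

```rust
/// Same value as the moved constant: 32 MiB.
pub const TARGET_SOCKBUF: usize = 32 * 1024 * 1024;

/// Milliseconds of steady-state traffic a buffer of `bytes` holds at `bits_per_sec`.
pub fn headroom_ms(bytes: usize, bits_per_sec: f64) -> f64 {
    bytes as f64 * 8.0 / bits_per_sec * 1000.0
}

/// Approximate Linux grant: clamp the request to the sysctl cap, then double it
/// (socket(7): the doubled value is what getsockopt reports back).
pub fn linux_granted(requested: usize, sysctl_max: usize) -> usize {
    requested.min(sysctl_max) * 2
}

/// Mirrors grow_socket_buffers' warning condition.
pub fn would_warn(granted: usize) -> bool {
    granted < TARGET_SOCKBUF / 4
}
```

At 1.2 Gbps, 32 MiB holds roughly 224 ms of traffic, in line with the comment's "~200 ms". With a 212992-byte cap (the ~208 KB the comment cites), a 32 MiB request comes back as 416 KiB, far under the 8 MiB threshold, which is exactly the case the warning's sysctl hint targets.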