punktfunk/crates/punktfunk-core/src/transport/mod.rs
enricobuehler 0324719b6e feat(host/windows): USO batched send for the GameStream video plane
The GameStream video sender did one send() syscall per packet on Windows
(the #[cfg(not(target_os="linux"))] sendmmsg_all fallback), capping
throughput at high packet rates. Wire it to UDP Send Offload (the Windows
analogue of Linux GSO) so each paced 16-packet burst goes out in one
WSASendMsg(UDP_SEND_MSG_SIZE) syscall instead of 16, preserving the
microburst pacing.

Expose a reusable punktfunk_core::transport::send_uso_all (Windows-only)
that reuses the proven native-plane USO primitive (send_one_uso + the uso
on/off latch + uso_unsupported), with the same uniform-size guard and
≤512-segment chunking as UdpTransport::send_gso. It returns how many leading
packets it sent via USO; the GameStream sendmmsg_all sends any remainder
(USO off via PUNKTFUNK_GSO=0, a size-mixed burst, or a frame's short final
packet) with per-packet send. On-wire packet boundaries are unchanged.

Resolves #4 in docs/apollo-comparison.md. Linux build unaffected;
punktfunk-core type-checks for x86_64-pc-windows-msvc. Host Windows compile
deferred to CI / dev box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 10:21:33 +00:00
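
The batching rule described in the commit message (uniform-size guard, chunks of at most 512 segments, remainder left to per-packet send) can be sketched as a pure planning step. This is a minimal illustration, not the code in `udp.rs`: the names `MAX_SEGMENTS`, `uniform_prefix_len`, and `plan_chunks` are hypothetical, and the actual `WSASendMsg` call is left out entirely.

```rust
use std::ops::Range;

/// Hypothetical constant mirroring the "≤512-segment chunking" in the commit message.
const MAX_SEGMENTS: usize = 512;

/// Number of leading packets that share the first packet's size.
/// Only this prefix is eligible for a single segmented send; the rest
/// (a size-mixed tail or a short final packet) goes out one by one.
fn uniform_prefix_len(packets: &[&[u8]]) -> usize {
    match packets.first() {
        None => 0,
        Some(first) if first.is_empty() => 0,
        Some(first) => packets
            .iter()
            .take_while(|p| p.len() == first.len())
            .count(),
    }
}

/// Split the uniform prefix into index ranges of at most MAX_SEGMENTS packets,
/// one range per offloaded send.
fn plan_chunks(packets: &[&[u8]]) -> Vec<Range<usize>> {
    let n = uniform_prefix_len(packets);
    (0..n)
        .step_by(MAX_SEGMENTS)
        .map(|start| start..(start + MAX_SEGMENTS).min(n))
        .collect()
}

fn main() {
    let full = [0u8; 1200];
    let tail = [0u8; 300];

    // A 16-packet burst whose final packet is shorter than the rest.
    let mut burst: Vec<&[u8]> = vec![&full[..]; 15];
    burst.push(&tail[..]);

    let offloaded = uniform_prefix_len(&burst);
    println!(
        "{} packets via offload, {} via per-packet send",
        offloaded,
        burst.len() - offloaded
    );
    println!("chunks: {:?}", plan_chunks(&burst));
}
```

Under these assumptions the caller would issue one offloaded send per returned range and fall back to the scalar path for `packets[offloaded..]`, which matches the "leading packets" return value the commit message describes.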

75 lines
4.0 KiB
Rust
//! Pluggable packet I/O. The hot path calls [`Transport::send`] / [`Transport::recv`]
//! directly — no async runtime is involved.
mod loopback;
mod udp;
pub use loopback::{loopback_pair, LoopbackTransport};
/// Windows-only: reusable USO (UDP Send Offload) batch send for callers that own their own connected
/// socket (the GameStream video sender) rather than going through [`UdpTransport`].
#[cfg(target_os = "windows")]
pub use udp::send_uso_all;
pub use udp::{spawn_data_punch, UdpTransport, PUNCH_MAGIC};
/// A datagram transport. `recv` is non-blocking: it returns `Ok(None)` when no packet
/// is currently available, so the caller (decode/present thread) never blocks here.
pub trait Transport: Send + Sync {
    /// Send one packet. `Ok(true)` = handed to the kernel; `Ok(false)` = dropped locally because
    /// the send buffer was momentarily full (WouldBlock), a non-fatal loss the FEC/keyframe path
    /// recovers, surfaced so the caller can count it (`packets_send_dropped`) instead of it being
    /// invisible. `Err` = a real send failure.
    fn send(&self, packet: &[u8]) -> std::io::Result<bool>;
    /// Send a whole frame's packets in as few syscalls as possible, returning how many were
    /// handed to the kernel (the caller counts `packets.len() - sent` as send-buffer drops). This
    /// is the 1 Gbps+ lever: the [`UdpTransport`](super::UdpTransport) override uses `sendmmsg`
    /// (~64 packets/syscall) instead of one `send` each; at ~125k pkt/s that is the difference
    /// between ~2k and ~125k syscalls/sec. The default is the scalar `send` loop (correct for the
    /// loopback transport and non-Linux builds). On a full send buffer it stops early and reports
    /// the partial count rather than blocking: same lossy, FEC-protected contract as `send`.
    fn send_batch(&self, packets: &[&[u8]]) -> std::io::Result<usize> {
        let mut sent = 0;
        for p in packets {
            if self.send(p)? {
                sent += 1;
            }
        }
        Ok(sent)
    }
    /// Send a frame's equal-size packets using UDP Generic Segmentation Offload where available:
    /// one `sendmsg` hands the kernel a big buffer it splits into `gso_size` UDP datagrams, building
    /// ~1 GSO skb per ≤64 segments instead of one skb per packet. This is the multi-Gbps lever:
    /// research shows ~2.4× throughput at equal CPU and ~40× fewer syscalls, and that `sendmmsg`
    /// batching alone is insufficient (it still builds one skb per datagram). The
    /// [`UdpTransport`](super::UdpTransport) Linux override implements it (opt-in via `PUNKTFUNK_GSO`,
    /// auto-fallback on any GSO error); the default just delegates to [`send_batch`](Self::send_batch),
    /// correct for loopback and non-Linux. Same lossy, FEC-protected short-count contract as `send_batch`.
    fn send_gso(&self, packets: &[&[u8]]) -> std::io::Result<usize> {
        self.send_batch(packets)
    }
    fn recv(&self) -> std::io::Result<Option<Vec<u8>>>;
    /// Receive up to `out.len()` datagrams in as few syscalls as possible, writing each into its
    /// `out[i]` buffer (sized ≥ a max datagram) and its byte count into `lens[i]`; returns how many
    /// arrived (`0` = none available; non-blocking). The recv counterpart of
    /// [`send_batch`](Self::send_batch): the [`UdpTransport`](super::UdpTransport) override uses
    /// `recvmmsg` into a caller-owned, reused buffer ring, so there is no per-packet allocation or
    /// syscall at line rate. The default does a single scalar [`recv`](Self::recv) into `out[0]`
    /// (correct for the loopback transport and non-Linux builds).
    fn recv_batch(&self, out: &mut [Vec<u8>], lens: &mut [usize]) -> std::io::Result<usize> {
        // Guard both slices: writing `lens[0]` would panic if the caller passed an empty `lens`.
        if out.is_empty() || lens.is_empty() {
            return Ok(0);
        }
        match self.recv()? {
            Some(pkt) => {
                // Copy at most `out[0].len()` bytes; a longer datagram is truncated to fit.
                let n = pkt.len().min(out[0].len());
                out[0][..n].copy_from_slice(&pkt[..n]);
                lens[0] = n;
                Ok(1)
            }
            None => Ok(0),
        }
    }
}
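
To make the short-count contract of `send` and `send_batch` concrete, here is a minimal sketch of a bounded in-memory transport. It assumes a trimmed copy of the trait (only `send`, the default `send_batch`, and `recv`) so that it compiles on its own; `BoundedQueue` and its `cap` field are hypothetical stand-ins for a socket with a small send buffer and are not part of punktfunk-core.

```rust
use std::collections::VecDeque;
use std::io;
use std::sync::Mutex;

// Trimmed copy of the trait above so this sketch is self-contained.
trait Transport: Send + Sync {
    fn send(&self, packet: &[u8]) -> io::Result<bool>;

    // Same default as above: scalar loop, counting only accepted packets.
    fn send_batch(&self, packets: &[&[u8]]) -> io::Result<usize> {
        let mut sent = 0;
        for p in packets {
            if self.send(p)? {
                sent += 1;
            }
        }
        Ok(sent)
    }

    fn recv(&self) -> io::Result<Option<Vec<u8>>>;
}

/// Hypothetical transport: a queue holding at most `cap` packets,
/// standing in for a kernel send buffer that can fill up.
struct BoundedQueue {
    cap: usize,
    q: Mutex<VecDeque<Vec<u8>>>,
}

impl Transport for BoundedQueue {
    fn send(&self, packet: &[u8]) -> io::Result<bool> {
        let mut q = self.q.lock().unwrap();
        if q.len() >= self.cap {
            // The WouldBlock analogue: drop locally, report it, do not fail.
            return Ok(false);
        }
        q.push_back(packet.to_vec());
        Ok(true)
    }

    fn recv(&self) -> io::Result<Option<Vec<u8>>> {
        // Non-blocking: `None` when nothing is queued.
        Ok(self.q.lock().unwrap().pop_front())
    }
}

fn main() -> io::Result<()> {
    let t = BoundedQueue { cap: 3, q: Mutex::new(VecDeque::new()) };
    let pkt = [0xABu8; 8];
    let burst: Vec<&[u8]> = vec![&pkt[..]; 5];

    let sent = t.send_batch(&burst)?;
    // The caller derives the drop count from the short return value.
    let dropped = burst.len() - sent;
    println!("sent={sent} dropped={dropped}"); // prints "sent=3 dropped=2"

    // Draining one packet frees a slot, so the next send is accepted again.
    assert!(t.recv()?.is_some());
    assert!(t.send(&pkt)?);
    Ok(())
}
```

Note that in this sketch the default loop attempts every packet and simply skips the ones that were dropped; a batched override is free to stop at the first full-buffer result, as long as it returns the count actually handed off.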