First increment of the 1 Gbps send-path rework (the measured bottleneck): the native data plane did one send() syscall per packet, and at ~125k pkt/s (1 Gbps wire) that burns a core on syscalls. Port the proven GameStream sendmmsg path into the core Transport seam.

- Transport gains `send_batch(&[&[u8]]) -> usize` (count handed to the kernel; caller counts the rest as send-buffer drops). Default = the scalar send loop (loopback transport + non-Linux).
- UdpTransport overrides it on Linux with `sendmmsg` (64 datagrams/syscall); the connected socket needs no per-message address. Non-blocking-aware: a full send buffer yields a short count / EAGAIN, and we stop + report what went out rather than block or retry (same lossy, FEC-protected contract as send()).
- Session::submit_frame seals every shard, then hands the whole frame to send_batch in ONE call instead of looping send(): up to ~64x fewer syscalls per frame on the native + GameStream-over-core paths; send_dropped accounting preserved (total - sent).

~125k → ~2k syscalls/sec at 1 Gbps line rate.

Verified: new loopback-UDP test send_batch_delivers_over_loopback (100 batched packets arrive intact, datagram boundaries preserved); full core suite + clippy + fmt green.

Next increments: a paced send thread (microburst shaping so a real NIC doesn't drop line-rate bursts) and recvmmsg on the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
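The default `send_batch` described in the message (a scalar loop over `send()`) could look roughly like the sketch below. This is not the repository's code: the trait shape, the `io::Result` error type, the stop-at-first-refusal behaviour of the default, and the `BoundedLoopback` mock are all assumptions made for illustration. Only the names `Transport`, `send` and `send_batch` come from the commit.

```rust
use std::cell::RefCell;
use std::io;

/// Hypothetical stand-in for the core `Transport` seam (assumed shape).
trait Transport {
    /// Ok(true) if the packet was handed off, Ok(false) if the send buffer was full.
    fn send(&self, packet: &[u8]) -> io::Result<bool>;

    /// Default batch path: a plain loop over `send`. Assumption: it stops at
    /// the first refused packet and reports how many were accepted, so the
    /// caller can count the remainder as send-buffer drops.
    fn send_batch(&self, packets: &[&[u8]]) -> io::Result<usize> {
        let mut sent = 0;
        for pkt in packets {
            if !self.send(pkt)? {
                break;
            }
            sent += 1;
        }
        Ok(sent)
    }
}

/// Hypothetical in-memory transport that accepts at most `capacity` packets,
/// imitating a full non-blocking send buffer.
struct BoundedLoopback {
    capacity: usize,
    queue: RefCell<Vec<Vec<u8>>>,
}

impl Transport for BoundedLoopback {
    fn send(&self, packet: &[u8]) -> io::Result<bool> {
        let mut queue = self.queue.borrow_mut();
        if queue.len() >= self.capacity {
            return Ok(false); // "buffer full": drop instead of blocking
        }
        queue.push(packet.to_vec());
        Ok(true)
    }
}

fn main() {
    let transport = BoundedLoopback { capacity: 3, queue: RefCell::new(Vec::new()) };
    let wires: Vec<Vec<u8>> = (0u8..5).map(|i| vec![i; 4]).collect();
    let refs: Vec<&[u8]> = wires.iter().map(|w| w.as_slice()).collect();

    let sent = transport.send_batch(&refs).expect("the mock never returns an error");
    println!("sent {} of {}, dropped {}", sent, refs.len(), refs.len() - sent);
}
```

A Linux `UdpTransport` would override `send_batch` with a real `sendmmsg` call. That part is left out here because it needs platform FFI.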
@@ -118,13 +118,22 @@ impl Session {
             .packetizer
             .packetize(data, pts_ns, user_flags, self.coder.as_ref())?;
         StatsCounters::add(&self.stats.frames_submitted, 1);
-        for pkt in packets {
-            let wire = self.seal_for_wire(&pkt)?;
-            StatsCounters::add(&self.stats.packets_sent, 1);
-            StatsCounters::add(&self.stats.bytes_sent, wire.len() as u64);
-            if !self.transport.send(&wire)? {
-                StatsCounters::add(&self.stats.packets_send_dropped, 1);
-            }
+        // Seal every shard (the nonce counter advances per packet, in order), then hand the whole
+        // frame to the transport in ONE batched call — `sendmmsg` does ~64 packets/syscall instead
+        // of a `send` each, the dominant 1 Gbps+ lever. (Sealing must finish before the immutable
+        // `send_batch` borrow; collecting the wires also keeps them alive for the batch's iovecs.)
+        let mut wires: Vec<Vec<u8>> = Vec::with_capacity(packets.len());
+        for pkt in &packets {
+            wires.push(self.seal_for_wire(pkt)?);
+        }
+        let total = wires.len();
+        let bytes: u64 = wires.iter().map(|w| w.len() as u64).sum();
+        StatsCounters::add(&self.stats.packets_sent, total as u64);
+        StatsCounters::add(&self.stats.bytes_sent, bytes);
+        let refs: Vec<&[u8]> = wires.iter().map(|w| w.as_slice()).collect();
+        let sent = self.transport.send_batch(&refs)?;
+        if sent < total {
+            StatsCounters::add(&self.stats.packets_send_dropped, (total - sent) as u64);
         }
         Ok(())
     }
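The accounting in this hunk, and the syscall arithmetic quoted in the commit message, can be checked in isolation with a small model. The `Stats` struct, `account` and `syscalls_batched` below are hypothetical helpers, not code from the repository. The batch size of 64 and the 125k pkt/s figure are taken from the commit message, while the 200-packet frame and 1000-byte packets are made-up inputs.

```rust
/// Batch size quoted in the commit message (64 datagrams per `sendmmsg`).
const BATCH: usize = 64;

/// Hypothetical plain counters mirroring the three stats touched in the hunk.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    packets_sent: u64,
    bytes_sent: u64,
    packets_send_dropped: u64,
}

/// Same bookkeeping as the new code path: every sealed packet is counted as
/// sent (attempted), and any shortfall reported by the transport is a drop.
fn account(stats: &mut Stats, wires: &[Vec<u8>], sent: usize) {
    let total = wires.len();
    stats.packets_sent += total as u64;
    stats.bytes_sent += wires.iter().map(|w| w.len() as u64).sum::<u64>();
    if sent < total {
        stats.packets_send_dropped += (total - sent) as u64;
    }
}

/// Syscalls needed to push `packets` datagrams at BATCH per call (ceiling division).
fn syscalls_batched(packets: usize) -> usize {
    (packets + BATCH - 1) / BATCH
}

fn main() {
    // Made-up frame: 200 sealed packets of 1000 bytes, of which the
    // transport reports only 150 handed to the kernel.
    let wires: Vec<Vec<u8>> = vec![vec![0u8; 1000]; 200];
    let mut stats = Stats::default();
    account(&mut stats, &wires, 150);
    println!("{:?}", stats);

    // Scalar path: one syscall per packet. Batched path: one per 64 packets.
    println!("per frame: {} -> {} syscalls", wires.len(), syscalls_batched(wires.len()));
    println!("per second at 125k pkt/s: 125000 -> {}", syscalls_batched(125_000));
}
```

Note that `packets_sent` counts packets attempted, not packets delivered, which matches the pre-change loop: delivery shortfalls show up only in `packets_send_dropped`.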