First increment of the 1 Gbps send-path rework (the measured bottleneck). The native data plane did one send() syscall per packet, and at ~125k pkt/s (1 Gbps on the wire at ~1000-byte packets) that burns a core on syscalls alone. Port the proven GameStream sendmmsg path into the core Transport seam.

- Transport gains `send_batch(&[&[u8]]) -> io::Result<usize>`, which returns the number of packets handed to the kernel; the caller counts the rest as send-buffer drops. The default is the scalar send loop (used by the loopback transport and on non-Linux builds).
- UdpTransport overrides it on Linux with `sendmmsg` (64 datagrams per syscall); the connected socket needs no per-message address. The override is non-blocking-aware: a full send buffer yields a short count or EAGAIN, and we stop and report what went out rather than block or retry (the same lossy, FEC-protected contract as send()).
- Session::submit_frame seals every shard, then hands the whole frame to send_batch in ONE call instead of looping send(). That means ~64x fewer syscalls per frame on the native and GameStream-over-core paths, and send_dropped accounting is preserved (total - sent).

At 1 Gbps line rate this takes the send path from ~125k to ~2k syscalls/sec.

Verified: new loopback-UDP test send_batch_delivers_over_loopback (100 batched packets arrive intact, datagram boundaries preserved); full core suite + clippy + fmt green.

Next increments: a paced send thread (microburst shaping so a real NIC doesn't drop line-rate bursts) and recvmmsg on the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
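The syscall figures in the message follow from simple arithmetic. This is a minimal sketch that assumes ~1000-byte packets (the size that makes 1 Gbps come out at 125k pkt/s, ignoring header overhead) and the 64-datagram batch quoted for the `sendmmsg` override. The helper names are hypothetical.

```rust
/// Packets per second needed to fill `line_rate_bps` with `packet_bytes`-byte packets.
fn packets_per_sec(line_rate_bps: u64, packet_bytes: u64) -> u64 {
    line_rate_bps / 8 / packet_bytes
}

/// Syscalls per second when each syscall carries up to `batch` packets.
/// Ceiling division: a partial final batch still costs one syscall.
fn syscalls_per_sec(pkts_per_sec: u64, batch: u64) -> u64 {
    pkts_per_sec.div_ceil(batch)
}

fn main() {
    let pps = packets_per_sec(1_000_000_000, 1_000);
    let scalar = syscalls_per_sec(pps, 1); // one send() per packet
    let batched = syscalls_per_sec(pps, 64); // up to 64 datagrams per sendmmsg()
    println!("{pps} pkt/s: {scalar} send() calls/s vs {batched} sendmmsg() calls/s");
}
```

The calculation treats a whole second of traffic as full batches, which gives 1,954, the "~2k" above. The real code batches per frame, so any frame whose shard count is not a multiple of 64 adds one partial batch. The true figure is therefore somewhat higher.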
@@ -15,5 +15,23 @@ pub trait Transport: Send + Sync {
     /// recovers, surfaced so the caller can count it (`packets_send_dropped`) instead of it being
     /// invisible. `Err` = a real send failure.
     fn send(&self, packet: &[u8]) -> std::io::Result<bool>;
+
+    /// Send a whole frame's packets in as few syscalls as possible, returning how many were
+    /// handed to the kernel (the caller counts `packets.len() - sent` as send-buffer drops). This
+    /// is the 1 Gbps+ lever: the [`UdpTransport`](super::UdpTransport) override uses `sendmmsg`
+    /// (~64 packets/syscall) instead of one `send` each — at ~125k pkt/s that is the difference
+    /// between ~2k and ~125k syscalls/sec. The default is the scalar `send` loop (correct for the
+    /// loopback transport and non-Linux builds). On a full send buffer it stops early and reports
+    /// the partial count rather than blocking — same lossy, FEC-protected contract as `send`.
+    fn send_batch(&self, packets: &[&[u8]]) -> std::io::Result<usize> {
+        let mut sent = 0;
+        for p in packets {
+            if self.send(p)? {
+                sent += 1;
+            }
+        }
+        Ok(sent)
+    }
+
     fn recv(&self) -> std::io::Result<Option<Vec<u8>>>;
 }
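The default `send_batch` above can be exercised without a socket. This minimal sketch rests on three assumptions:

- A trimmed copy of the trait containing only `send` and `send_batch`.
- A hypothetical `FillingBuffer` test double that drops every packet once its capacity is reached.
- A hypothetical `submit_frame` helper that mirrors only the commit's `total - sent` drop accounting, with shard sealing omitted.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Trimmed copy of the trait from the diff (`recv` omitted). `send` returns
/// Ok(false) when a packet is dropped on a full send buffer.
trait Transport: Send + Sync {
    fn send(&self, packet: &[u8]) -> io::Result<bool>;

    /// Default from the diff: one `send` per packet, counting the ones accepted.
    fn send_batch(&self, packets: &[&[u8]]) -> io::Result<usize> {
        let mut sent = 0;
        for p in packets {
            if self.send(p)? {
                sent += 1;
            }
        }
        Ok(sent)
    }
}

/// Hypothetical test double: accepts the first `capacity` packets, then reports
/// every later one as dropped, like a send buffer that has filled up.
struct FillingBuffer {
    capacity: usize,
    attempts: AtomicUsize,
}

impl Transport for FillingBuffer {
    fn send(&self, _packet: &[u8]) -> io::Result<bool> {
        let prior = self.attempts.fetch_add(1, Ordering::Relaxed);
        Ok(prior < self.capacity)
    }
}

/// Hypothetical stand-in for the accounting in Session::submit_frame (sealing
/// omitted): one send_batch call per frame, with the shortfall counted as dropped.
fn submit_frame(
    t: &dyn Transport,
    shards: &[Vec<u8>],
    send_dropped: &mut u64,
) -> io::Result<usize> {
    let packets: Vec<&[u8]> = shards.iter().map(Vec::as_slice).collect();
    let sent = t.send_batch(&packets)?;
    *send_dropped += (packets.len() - sent) as u64;
    Ok(sent)
}

fn main() -> io::Result<()> {
    let transport = FillingBuffer { capacity: 7, attempts: AtomicUsize::new(0) };
    let shards: Vec<Vec<u8>> = (0..10u8).map(|i| vec![i; 1200]).collect();
    let mut send_dropped = 0;
    let sent = submit_frame(&transport, &shards, &mut send_dropped)?;
    println!("sent {sent}, send_dropped {send_dropped}");
    Ok(())
}
```

The default loop does not stop at the first drop, unlike the `sendmmsg` override described in the commit message. It keeps offering the remaining packets and counts only those `send` accepted, so the `total - sent` accounting holds either way.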
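The commit message describes the Linux override's control flow: up to 64 datagrams per `sendmmsg`, stopping on a short count or EAGAIN. This sketch shows only that flow, using a plain closure in place of the `sendmmsg` call so it runs without `libc` or a socket. It is not the actual `UdpTransport` code, and `send_chunked` and `MAX_BATCH` are hypothetical names.

```rust
use std::io;

/// Datagrams per call, matching the 64/syscall figure quoted for the `sendmmsg` override.
const MAX_BATCH: usize = 64;

/// Hypothetical helper: offers `packets` to `syscall` in chunks of at most MAX_BATCH.
/// `syscall` stands in for one `sendmmsg` call and returns how many datagrams the
/// kernel took. A short count or WouldBlock (EAGAIN) means the send buffer is full,
/// so we stop and report what went out instead of blocking or retrying.
fn send_chunked<F>(packets: &[&[u8]], mut syscall: F) -> io::Result<usize>
where
    F: FnMut(&[&[u8]]) -> io::Result<usize>,
{
    let mut sent = 0;
    for chunk in packets.chunks(MAX_BATCH) {
        match syscall(chunk) {
            Ok(n) => {
                sent += n;
                if n < chunk.len() {
                    break; // partial batch: the buffer filled mid-chunk
                }
            }
            // Nothing from this chunk went out; keep what earlier chunks sent.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
            // Any other error is a real send failure.
            Err(e) => return Err(e),
        }
    }
    Ok(sent)
}

fn main() -> io::Result<()> {
    let payload = [0u8; 1200];
    let packets: Vec<&[u8]> = vec![&payload[..]; 200];
    // Pretend the send buffer has room for 150 datagrams.
    let mut room = 150usize;
    let mut calls = 0;
    let sent = send_chunked(&packets, |chunk| {
        calls += 1;
        let n = chunk.len().min(room);
        room -= n;
        if n == 0 {
            Err(io::ErrorKind::WouldBlock.into())
        } else {
            Ok(n)
        }
    })?;
    println!("{sent} of {} sent in {calls} calls", packets.len());
    Ok(())
}
```

A real error after earlier chunks went out returns `Err` without the partial count. This matches the `?` in the trait's default loop.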