feat(1gbps): batched client recv via recvmmsg (increment C)

Final increment of the 1 Gbps data-plane rework — the recv counterpart of the
sendmmsg work. The client recv path did one recvfrom + one Vec allocation per
packet (and the pump's 300µs idle sleep could let packets pile up at line rate).

- Transport gains recv_batch(&mut [Vec<u8>], &mut [usize]) -> count; default is
  a single scalar recv into out[0] (loopback + non-Linux).
- UdpTransport overrides it on Linux with recvmmsg (MSG_DONTWAIT) draining up to
  N datagrams per syscall into the caller's reused buffers — no per-packet alloc.
- Session::poll_frame owns a lazily-allocated recv ring (RECV_BATCH=32) and
  consumes it one packet at a time across calls, refilling with one recvmmsg when
  drained. Encapsulated: the punktfunk-client-rs + NativeClient pumps are
  unchanged, and draining a batch per syscall means the 300µs sleep no longer
  underdrains. Added UdpTransport::local_addr (used by the test, generally handy).
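
The ring behaviour described in the last bullet can be sketched roughly as follows. This is an illustration only, not the code in this commit: `RecvRing`, its field names, `MAX_DATAGRAM`, and the `fill` closure (standing in for `Transport::recv_batch`) are all assumptions; only `RECV_BATCH = 32` comes from the message above.

```rust
use std::io;

const RECV_BATCH: usize = 32; // batch size named in the commit message
const MAX_DATAGRAM: usize = 2048; // assumption: any size >= the largest datagram

/// Hypothetical recv ring: buffers are allocated on first use and reused afterwards.
#[derive(Default)]
pub struct RecvRing {
    bufs: Vec<Vec<u8>>, // empty until the first poll (lazy allocation)
    lens: Vec<usize>,   // byte count of each filled slot
    filled: usize,      // number of valid slots from the last refill
    next: usize,        // next slot to hand out
}

impl RecvRing {
    pub fn new() -> Self {
        Self::default()
    }

    /// Hand out one packet per call; invoke `fill` (one batched recv) only when drained.
    pub fn poll<F>(&mut self, mut fill: F) -> io::Result<Option<&[u8]>>
    where
        F: FnMut(&mut [Vec<u8>], &mut [usize]) -> io::Result<usize>,
    {
        if self.next == self.filled {
            if self.bufs.is_empty() {
                self.bufs = vec![vec![0u8; MAX_DATAGRAM]; RECV_BATCH];
                self.lens = vec![0; RECV_BATCH];
            }
            self.filled = fill(&mut self.bufs, &mut self.lens)?;
            self.next = 0;
            if self.filled == 0 {
                return Ok(None); // nothing available right now
            }
        }
        let i = self.next;
        self.next += 1;
        Ok(Some(&self.bufs[i][..self.lens[i]]))
    }
}
```

Because the caller still sees one packet per call, a pump loop written against the scalar path would not need to change in a design like this.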

~125k → ~4k recv syscalls/sec at line rate, zero per-packet recv allocation.
Verified: new recv_batch_drains_over_loopback test (50 datagrams drained intact
via recvmmsg) + the existing loopback round-trip now runs through the batched
poll_frame; full suite (35 + round-trip + 6) + clippy + fmt green.
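
For reference, the syscall figures above are consistent with simple arithmetic under two assumptions that the message does not state: datagrams of roughly 1000 bytes, and every recvmmsg call returning a full batch of 32 (the best case). A small sketch with hypothetical helper names:

```rust
/// Packets per second at a given bit rate and datagram size in bytes.
pub fn packets_per_sec(bits_per_sec: u64, datagram_bytes: u64) -> u64 {
    bits_per_sec / (8 * datagram_bytes)
}

/// Recv syscalls per second if every call drains a full batch (rounded up).
pub fn batched_syscalls_per_sec(pps: u64, batch: u64) -> u64 {
    (pps + batch - 1) / batch
}
```

With 1 Gbps and 1000-byte datagrams this gives 125,000 packets/sec, and 125,000 / 32 rounds up to 3,907 syscalls/sec. Partially filled batches would raise the real number.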

Decode-in-place (kill the per-packet open_from_wire alloc) is a separate later
optimization. With A (sendmmsg) + B (paced send) + C (recvmmsg), the native data
plane is batched + paced end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:39:51 +00:00
parent 10a932d013
commit 2f4f92a804
3 changed files with 166 additions and 10 deletions
@@ -34,4 +34,25 @@ pub trait Transport: Send + Sync {
    }
    fn recv(&self) -> std::io::Result<Option<Vec<u8>>>;
    /// Receive up to `out.len()` datagrams in as few syscalls as possible, writing each into its
    /// `out[i]` buffer (sized ≥ a max datagram) and its byte count into `lens[i]`; returns how many
    /// arrived (`0` = none available; non-blocking). The recv counterpart of [`send_batch`]: the
    /// [`UdpTransport`](super::UdpTransport) override uses `recvmmsg` into a caller-owned, reused
    /// buffer ring — no per-packet allocation or syscall at line rate. The default does a single
    /// scalar [`recv`](Self::recv) into `out[0]` (correct for the loopback transport + non-Linux).
    fn recv_batch(&self, out: &mut [Vec<u8>], lens: &mut [usize]) -> std::io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        match self.recv()? {
            Some(pkt) => {
                let n = pkt.len().min(out[0].len());
                out[0][..n].copy_from_slice(&pkt[..n]);
                lens[0] = n;
                Ok(1)
            }
            None => Ok(0),
        }
    }
}
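
A caller-side sketch of the default `recv_batch` shown in the hunk above. The trait here is a reduced copy (only `recv` and the default `recv_batch`), and `QueueTransport` and `drain_all` are hypothetical names introduced for illustration; they are not part of this commit. Note two properties of the default as written: it truncates silently when a packet is longer than `out[0]`, and it indexes `lens[0]`, so `lens` must be at least as long as `out`.

```rust
use std::collections::VecDeque;
use std::io;
use std::sync::Mutex;

/// Reduced copy of the trait: `recv` plus the default `recv_batch` from the diff.
pub trait Transport: Send + Sync {
    fn recv(&self) -> io::Result<Option<Vec<u8>>>;

    fn recv_batch(&self, out: &mut [Vec<u8>], lens: &mut [usize]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        match self.recv()? {
            Some(pkt) => {
                let n = pkt.len().min(out[0].len());
                out[0][..n].copy_from_slice(&pkt[..n]);
                lens[0] = n;
                Ok(1)
            }
            None => Ok(0),
        }
    }
}

/// Hypothetical in-memory transport that relies on the default `recv_batch`.
pub struct QueueTransport {
    queue: Mutex<VecDeque<Vec<u8>>>,
}

impl QueueTransport {
    pub fn new(packets: Vec<Vec<u8>>) -> Self {
        Self { queue: Mutex::new(packets.into()) }
    }
}

impl Transport for QueueTransport {
    fn recv(&self) -> io::Result<Option<Vec<u8>>> {
        Ok(self.queue.lock().unwrap().pop_front())
    }
}

/// Drain everything currently available, reusing one set of buffers across calls.
pub fn drain_all(t: &dyn Transport, batch: usize, max_datagram: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut bufs = vec![vec![0u8; max_datagram]; batch];
    let mut lens = vec![0usize; batch];
    let mut packets = Vec::new();
    loop {
        let n = t.recv_batch(&mut bufs, &mut lens)?;
        if n == 0 {
            return Ok(packets); // 0 means nothing available right now
        }
        for (buf, &len) in bufs.iter().zip(&lens).take(n) {
            packets.push(buf[..len].to_vec()); // copy only the valid prefix
        }
    }
}
```

The same loop would work unchanged against an override that fills several slots per call, since it only reads the first `n` buffers and their lengths.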