perf(core): batched non-allocating recv on Apple targets (macOS client wall)
apple / swift (push) Failing after 28s
ci / rust (push) Failing after 1m18s
ci / web (push) Failing after 47s
ci / docs-site (push) Failing after 35s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / deploy-docs (push) Has been skipped
rpm / build-publish (push) Failing after 16s
deb / build-publish (push) Failing after 43s
The batched `recvmmsg` recv path was Linux-only; macOS fell back to the trait default, which calls the scalar `recv`: a fresh `vec![0u8; 2049]` allocation (plus zeroing and a copy) per packet on the single receive thread. At line rate that alloc/free churn, not the syscall, was the single-core wall. Measured: the real Mac client topped out at ~315 Mbps and dropped the session at 800 Mbps, while a Linux client (`recvmmsg`) held a clean 1 Gbps against the same host, and Moonlight (batched recv) does 900 Mbps on the same Mac.

Add a `cfg(all(unix, not(target_os = "linux")))` `recv_batch` that drains up to RECV_BATCH datagrams per call with `libc::recv(MSG_DONTWAIT)` straight into the caller's reused ring buffers, with no per-packet allocation or copy. This is still one syscall per datagram (a future `recvmsg_x` batch would cut that too), but it removes the dominant cost. The Linux `recvmmsg` path and the Windows/loopback default are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
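To make the cost described above concrete, here is a minimal sketch of how a trait-default `recv_batch` layered on a scalar `recv` ends up allocating and copying once per packet. The `Transport` trait shape shown here, the `Option<Vec<u8>>` return type, and `MockTransport` are assumptions for illustration only; they are not taken from the crate.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::io;

/// Hypothetical, simplified stand-in for the crate's `Transport` trait.
trait Transport {
    /// Scalar receive: hands back a freshly allocated buffer per datagram.
    fn recv(&self) -> io::Result<Option<Vec<u8>>>;

    /// Assumed shape of a default batch: one scalar `recv` (one allocation)
    /// plus one copy for every datagram.
    fn recv_batch(&self, out: &mut [Vec<u8>], lens: &mut [usize]) -> io::Result<usize> {
        let n_bufs = out.len().min(lens.len());
        let mut got = 0;
        while got < n_bufs {
            let Some(pkt) = self.recv()? else { break };
            let n = pkt.len().min(out[got].len());
            out[got][..n].copy_from_slice(&pkt[..n]);
            lens[got] = n;
            got += 1;
        }
        Ok(got)
    }
}

/// Hypothetical in-memory transport: a queue of pending datagrams.
struct MockTransport {
    queue: RefCell<VecDeque<Vec<u8>>>,
}

impl Transport for MockTransport {
    fn recv(&self) -> io::Result<Option<Vec<u8>>> {
        let Some(data) = self.queue.borrow_mut().pop_front() else {
            return Ok(None);
        };
        // Mirrors the pattern from the commit message: allocate and zero a
        // buffer, then copy the datagram into it.
        let mut buf = vec![0u8; 2049];
        let n = data.len().min(buf.len());
        buf[..n].copy_from_slice(&data[..n]);
        buf.truncate(n);
        Ok(Some(buf))
    }
}
```

In this sketch every datagram pays for one `vec!` allocation, one zeroing pass, and two copies before it reaches the caller's buffer; a platform override that reads directly into `out[i]` avoids all of that.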
@@ -165,7 +165,8 @@ impl Transport for UdpTransport {
     /// caller's reused buffers (no per-packet allocation). `MSG_DONTWAIT` keeps it non-blocking
     /// (the socket already is); `EAGAIN` → `0`. A datagram larger than a buffer is truncated and
     /// `lens[i]` reaches the buffer size — the reassembler then rejects it as malformed, matching
-    /// `recv`'s oversized-drop. Non-Linux falls back to the trait's scalar `recv` default.
+    /// `recv`'s oversized-drop. Apple/BSD use the `recv`-loop override below; other non-unix the
+    /// trait's scalar default.
     #[cfg(target_os = "linux")]
     fn recv_batch(&self, out: &mut [Vec<u8>], lens: &mut [usize]) -> std::io::Result<usize> {
         use std::os::fd::AsRawFd;
@@ -204,6 +205,46 @@ impl Transport for UdpTransport {
         }
         Ok(n as usize)
     }
+
+    /// Batched receive for Apple/BSD targets, which have no `recvmmsg(2)`. Drains up to `out.len()`
+    /// datagrams per call with `libc::recv(MSG_DONTWAIT)` straight into the caller's reused `out[i]`
+    /// buffers — eliminating the per-packet 2 KB `vec!` allocation (and its zeroing + a copy) that
+    /// the scalar `recv` + trait-default `recv_batch` incur. THIS is the macOS-client throughput
+    /// fix: at line rate the alloc/free churn — not the syscall — was the single-core wall (Moonlight
+    /// batches; our client per-packet-allocated). It is still one syscall per datagram (a future
+    /// `recvmsg_x` batch would cut that too); `EAGAIN` ends the drain. Oversized datagrams set
+    /// `lens[i] == buf.len()` and the caller (`poll_frame`) drops them — same contract as `recvmmsg`.
+    #[cfg(all(unix, not(target_os = "linux")))]
+    fn recv_batch(&self, out: &mut [Vec<u8>], lens: &mut [usize]) -> std::io::Result<usize> {
+        use std::os::fd::AsRawFd;
+        let fd = self.socket.as_raw_fd();
+        let n_bufs = out.len().min(lens.len());
+        let mut got = 0usize;
+        while got < n_bufs {
+            let buf = &mut out[got];
+            let r = unsafe {
+                libc::recv(
+                    fd,
+                    buf.as_mut_ptr() as *mut libc::c_void,
+                    buf.len(),
+                    libc::MSG_DONTWAIT,
+                )
+            };
+            if r < 0 {
+                let err = std::io::Error::last_os_error();
+                if err.kind() == std::io::ErrorKind::WouldBlock {
+                    break; // socket drained
+                }
+                if got > 0 {
+                    break; // report what we have; surface the error on the next empty poll
+                }
+                return Err(err);
+            }
+            lens[got] = r as usize;
+            got += 1;
+        }
+        Ok(got)
+    }
 }
 
 #[cfg(test)]
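The drain-loop contract in the second hunk can be exercised without a socket. The following is a minimal sketch, not the crate's code: `drain_into`, `recv_one`, and `valid_payloads` are hypothetical names, and the closure stands in for one non-blocking `recv` call. It keeps the same three outcomes as the diff (`WouldBlock` ends the drain, an error after partial progress returns what was received, an error on the first read is returned) and shows one way a caller might skip entries where `lens[i]` equals the buffer size.

```rust
use std::io;

/// Drain up to `out.len()` datagrams into caller-owned buffers.
/// `recv_one` is a hypothetical stand-in for one non-blocking socket read.
fn drain_into<F>(mut recv_one: F, out: &mut [Vec<u8>], lens: &mut [usize]) -> io::Result<usize>
where
    F: FnMut(&mut [u8]) -> io::Result<usize>,
{
    let n_bufs = out.len().min(lens.len());
    let mut got = 0;
    while got < n_bufs {
        match recv_one(out[got].as_mut_slice()) {
            Ok(n) => {
                lens[got] = n;
                got += 1;
            }
            // Nothing left to read: the source is drained.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
            // Keep the datagrams already received; this error is dropped
            // here and would have to resurface on a later call.
            Err(_) if got > 0 => break,
            Err(e) => return Err(e),
        }
    }
    Ok(got)
}

/// Hypothetical caller-side filter: a length equal to the buffer size is
/// treated as a truncated (oversized) datagram and skipped.
fn valid_payloads<'a>(
    out: &'a [Vec<u8>],
    lens: &'a [usize],
    got: usize,
) -> impl Iterator<Item = &'a [u8]> + 'a {
    out.iter()
        .zip(lens.iter())
        .take(got)
        .filter(|(buf, len)| **len < buf.len())
        .map(|(buf, len)| &buf[..*len])
}
```

With a non-blocking `std::net::UdpSocket`, a closure such as `|buf| socket.recv(buf)` could play the role of `recv_one`, since it reports an empty queue as `WouldBlock`. Treating `len == buf.len()` as oversized also rejects a datagram that fills the buffer exactly, which is presumably why the buffers are sized one byte above the largest valid packet; that reading of the 2049-byte size is an assumption.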