perf(core): UDP GSO send path (the multi-Gbps lever)
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
sendmmsg already batches syscalls but still builds one sk_buff per datagram — the kernel-side wall above ~1 Gbps. UDP Generic Segmentation Offload hands the kernel one big buffer it splits into gso_size datagrams, building ~1 GSO skb per ≤64 segments. Research (LWN/Cloudflare/Tailscale) measures ~2.4x throughput at equal CPU and 17-44x fewer syscalls, and that sendmmsg batching alone is insufficient — you need true segmentation offload. Adds Transport::send_gso (default = send_batch) + a UdpTransport Linux override: coalesces a frame's equal-size wire packets (shards are zero-padded to a constant size, so a whole frame is one gso_size) into ≤64-segment sendmsg(UDP_SEGMENT) calls. seal/send routes through it. Opt-in via PUNKTFUNK_GSO (new unsafe hot-path code) with automatic fallback to sendmmsg on any GSO error (unsupported kernel/ path), latched per process. Loopback unit test validates the cmsg segmentation; full session over loopback streams clean (0% loss). Linux-only; loopback/non-Linux keep sendmmsg/scalar. Next levers: in-place AES-GCM seal (kill per-packet allocs), UDP GRO on recv, drop the sleep-pacing in favor of the kernel qdisc, jumbo MTU. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -33,6 +33,18 @@ pub trait Transport: Send + Sync {
|
||||
Ok(sent)
|
||||
}
|
||||
|
||||
/// Send a frame's equal-size packets using UDP Generic Segmentation Offload where available:
|
||||
/// one `sendmsg` hands the kernel a big buffer it splits into `gso_size` UDP datagrams, building
|
||||
/// ~1 GSO skb per ≤64 segments instead of one skb per packet. This is the multi-Gbps lever —
|
||||
/// research shows ~2.4× throughput at equal CPU and ~40× fewer syscalls, and that `sendmmsg`
|
||||
/// batching alone is insufficient (it still builds one skb per datagram). The
|
||||
/// [`UdpTransport`](super::UdpTransport) Linux override implements it (opt-in via `PUNKTFUNK_GSO`,
|
||||
/// auto-fallback on any GSO error); the default just delegates to [`send_batch`](Self::send_batch),
|
||||
/// correct for loopback and non-Linux. Same lossy, FEC-protected short-count contract as `send_batch`.
|
||||
fn send_gso(&self, packets: &[&[u8]]) -> std::io::Result<usize> {
|
||||
self.send_batch(packets)
|
||||
}
|
||||
|
||||
fn recv(&self) -> std::io::Result<Option<Vec<u8>>>;
|
||||
|
||||
/// Receive up to `out.len()` datagrams in as few syscalls as possible, writing each into its
|
||||
|
||||
Reference in New Issue
Block a user