perf(core): UDP GSO send path (the multi-Gbps lever)
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
sendmmsg already batches syscalls but still builds one sk_buff per datagram, which is the kernel-side wall above ~1 Gbps. UDP Generic Segmentation Offload hands the kernel one big buffer that it splits into gso_size datagrams, building ~1 GSO skb per ≤64 segments. Research (LWN/Cloudflare/Tailscale) measures ~2.4x throughput at equal CPU and 17-44x fewer syscalls, and finds that sendmmsg batching alone is insufficient: you need true segmentation offload.

Adds Transport::send_gso (default = send_batch) + a UdpTransport Linux override that coalesces a frame's equal-size wire packets (shards are zero-padded to a constant size, so a whole frame is one gso_size) into ≤64-segment sendmsg(UDP_SEGMENT) calls. seal/send routes through it. Opt-in via PUNKTFUNK_GSO (new unsafe hot-path code) with automatic fallback to sendmmsg on any GSO error (unsupported kernel/path), latched per process.

Loopback unit test validates the cmsg segmentation; full session over loopback streams clean (0% loss). Linux-only; loopback/non-Linux keep sendmmsg/scalar.

Next levers: in-place AES-GCM seal (kill per-packet allocs), UDP GRO on recv, drop the sleep-pacing in favor of the kernel qdisc, jumbo MTU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
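
The coalescing step described in the message can be sketched in std-only Rust. This is an illustration, not the code from this commit: `GsoBatch`, `coalesce`, and both constants are hypothetical names, the 64-segment cap is taken from the message, and the 65,507-byte ceiling is an assumption about the largest UDP payload one send may carry over IPv4. The sketch only builds the buffers; the actual `sendmsg(UDP_SEGMENT)` call is left out.

```rust
/// Hypothetical limits for illustration only.
const MAX_GSO_SEGMENTS: usize = 64; // segments per GSO send, per the commit message
const MAX_UDP_PAYLOAD: usize = 65_507; // assumed per-send payload ceiling (IPv4)

/// One coalesced send: a contiguous buffer plus the segment size the kernel
/// would be asked to split it by (the value a UDP_SEGMENT cmsg would carry).
#[derive(Debug, PartialEq)]
struct GsoBatch {
    buf: Vec<u8>,
    gso_size: u16,
    segments: usize,
}

/// Groups packets into batches in which every segment has the same length,
/// except that the final segment of a batch may be shorter.
/// Returns None for inputs that cannot be expressed as GSO sends, so a
/// caller could fall back to its non-GSO path.
fn coalesce(packets: &[&[u8]]) -> Option<Vec<GsoBatch>> {
    let mut out: Vec<GsoBatch> = Vec::new();
    let mut open = false; // whether the last batch may still accept packets

    for p in packets {
        if p.is_empty() || p.len() > MAX_UDP_PAYLOAD {
            return None;
        }
        let fits = match out.last() {
            Some(b) if open => {
                p.len() <= b.gso_size as usize
                    && b.segments < MAX_GSO_SEGMENTS
                    && b.buf.len() + p.len() <= MAX_UDP_PAYLOAD
            }
            _ => false,
        };
        if !fits {
            // Start a new batch whose segment size is this packet's length.
            out.push(GsoBatch {
                buf: Vec::new(),
                gso_size: p.len() as u16, // safe: len <= 65_507 < 65_536
                segments: 0,
            });
        }
        let b = out.last_mut().expect("a batch exists at this point");
        b.buf.extend_from_slice(p);
        b.segments += 1;
        // A packet shorter than gso_size must be the last segment of its batch.
        open = p.len() == b.gso_size as usize;
    }
    Some(out)
}
```

With constant-size shards, a whole frame collapses into a handful of batches that share one `gso_size`. Note that under the assumed byte ceiling, packets larger than about 1,023 bytes hit the size limit before the 64-segment limit.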
@@ -158,7 +158,8 @@ impl Session {
     /// batched `sendmmsg`, returning how many the kernel accepted. The rest (`packets.len() - n`)
     /// are counted as send-buffer drops. Call once for the whole frame, or per paced chunk.
     pub fn send_sealed(&self, packets: &[&[u8]]) -> Result<usize> {
-        let sent = self.transport.send_batch(packets)?;
+        // GSO when enabled (UdpTransport/Linux), else sendmmsg — same short-count drop contract.
+        let sent = self.transport.send_gso(packets)?;
         if sent < packets.len() {
             StatsCounters::add(
                 &self.stats.packets_send_dropped,
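
The call-site change is safe for every transport because `send_gso` defaults to `send_batch`. A minimal sketch of that shape, together with the per-process fallback latch mentioned in the commit message, might look like the following. Only `Transport`, `send_batch`, and `send_gso` are names from the commit; `MockUdp`, `try_gso`, `gso_enabled`, and `GSO_FAILED` are hypothetical, the real signatures may differ, and the GSO attempt here always fails to stand in for an unsupported kernel.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

trait Transport {
    /// Batched send; returns how many packets were accepted.
    fn send_batch(&self, packets: &[&[u8]]) -> io::Result<usize>;

    /// Default: no GSO available, so use the batched path unchanged.
    fn send_gso(&self, packets: &[&[u8]]) -> io::Result<usize> {
        self.send_batch(packets)
    }
}

/// Process-wide latch (hypothetical name): set after the first GSO failure
/// so later sends skip the GSO attempt entirely.
static GSO_FAILED: AtomicBool = AtomicBool::new(false);

/// Hypothetical stand-in for a UDP transport.
struct MockUdp {
    gso_enabled: bool,         // in the commit this opt-in comes from PUNKTFUNK_GSO
    gso_attempts: AtomicUsize, // counts attempts so the latch can be observed
}

impl MockUdp {
    /// Placeholder for the real sendmsg(UDP_SEGMENT) path; always errors here.
    fn try_gso(&self, _packets: &[&[u8]]) -> io::Result<usize> {
        self.gso_attempts.fetch_add(1, Ordering::Relaxed);
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "UDP_SEGMENT not available",
        ))
    }
}

impl Transport for MockUdp {
    fn send_batch(&self, packets: &[&[u8]]) -> io::Result<usize> {
        Ok(packets.len()) // pretend every packet was accepted
    }

    fn send_gso(&self, packets: &[&[u8]]) -> io::Result<usize> {
        if self.gso_enabled && !GSO_FAILED.load(Ordering::Relaxed) {
            match self.try_gso(packets) {
                Ok(n) => return Ok(n),
                // Any GSO error latches the fallback for the rest of the process.
                Err(_) => GSO_FAILED.store(true, Ordering::Relaxed),
            }
        }
        self.send_batch(packets)
    }
}
```

Because the fallback returns the same accepted-packet count as `send_batch`, the caller's short-count drop accounting does not need to know which path was taken.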