feat(net/mac): default-on recvmsg_x batched Mac recv + GSO host + longer probe
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 31s
ci / rust (push) Successful in 2m6s
ci / bench (push) Successful in 1m35s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
apple / swift (push) Successful in 1m17s
docker / deploy-docs (push) Successful in 17s
deb / build-publish (push) Successful in 2m18s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 4m50s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m27s

The Mac/iOS client's wall around ~380 Mbps on a 2.5 G path is the receive
drain, not the transport: a loopback speed-test pushes 380/600/1000 Mbps at
0.0% loss, but Darwin has no recvmmsg(2), so the macOS client was doing one
recv() syscall per packet — ~40-90k syscalls/s on one core. When the recv loop
can't drain fast enough the kernel socket buffer backs up and drops, which the
client sees as a sustained stream stalling/freezing in the 300-400 Mbps range
(and an immediate "session ended" when a 500 Mbps+ first keyframe bursts in).

- core/transport: flip recvmsg_x (the batched Darwin recv, ~30x fewer syscalls)
  from opt-in to default ON, opt-out via PUNKTFUNK_RECVMSG_X=0. Keeps the
  auto-fallback to the scalar loop on any unexpected syscall error. The Apple CI
  swift-test loopback now exercises this path by default.
- packaging/kde host.env: enable PUNKTFUNK_GSO=1 — UDP segmentation offload on
  the host send path (one sendmsg per ~64 packets), the dominant lever above
  ~1 Gbps. Already wired (send_sealed -> send_gso) with sendmmsg auto-fallback.
- apple SpeedTestSheet: lengthen the bandwidth probe 2 s -> 5 s so the measured
  number stops swinging wildly (50 vs 900 Mbps on the same link) — long enough
  for steady-state send + recv drain to settle. Matches host MAX_PROBE_MS.
- host capture: PUNKTFUNK_SYNTH_NOISE synthetic high-entropy source for
  reproducible throughput testing of the encode->FEC->send->recv path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-14 00:35:26 +00:00
parent c7c08b2855
commit c2ae40ef9e
4 changed files with 55 additions and 16 deletions
+32 -8
View File
@@ -165,6 +165,12 @@ pub struct FastSyntheticCapturer {
height: u32,
frame_idx: u64,
buf: Vec<u8>,
/// PUNKTFUNK_SYNTH_NOISE: every frame is fresh high-entropy noise NVENC can't compress or
/// predict, so the encoder hits its (CBR) bitrate target — a throughput test of the real
/// encode→FEC→send→recv path. The default flat/band content compresses to ~nothing, so it
/// can't generate real Mbps (the encoder is content-driven). xorshift over u64 chunks.
noise: bool,
rng: u64,
}
impl FastSyntheticCapturer {
@@ -175,20 +181,38 @@ impl FastSyntheticCapturer {
height,
frame_idx: 0,
buf: vec![0u8; width as usize * height as usize * 4],
noise: std::env::var_os("PUNKTFUNK_SYNTH_NOISE").is_some(),
rng: 0x9e3779b97f4a7c15,
}
}
}
impl Capturer for FastSyntheticCapturer {
fn next_frame(&mut self) -> Result<CapturedFrame> {
let (w, h) = (self.width as usize, self.height as usize);
let row = w * 4;
let shade = (self.frame_idx % 256) as u8;
self.buf.fill(shade);
let band_h = (h / 20).max(1);
let band_y = (self.frame_idx as usize * 6) % h;
for y in band_y..(band_y + band_h).min(h) {
self.buf[y * row..(y + 1) * row].fill(0xff);
if self.noise {
// Fresh, every-frame-decorrelated noise: reseed from the frame index so consecutive
// frames share no structure (forces large P-frames too, not just the keyframe).
let mut s = self
.rng
.wrapping_add(self.frame_idx.wrapping_mul(0x2545F491_4F6CDD1D))
| 1;
for c in self.buf.chunks_exact_mut(8) {
s ^= s << 13;
s ^= s >> 7;
s ^= s << 17;
c.copy_from_slice(&s.to_le_bytes());
}
self.rng = s;
} else {
let (w, h) = (self.width as usize, self.height as usize);
let row = w * 4;
let shade = (self.frame_idx % 256) as u8;
self.buf.fill(shade);
let band_h = (h / 20).max(1);
let band_y = (self.frame_idx as usize * 6) % h;
for y in band_y..(band_y + band_h).min(h) {
self.buf[y * row..(y + 1) * row].fill(0xff);
}
}
self.frame_idx += 1;
Ok(CapturedFrame {