docs(roadmap): §11 1 Gbps+ data plane — foundation landed, batched send next

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 20:47:36 +00:00
parent b8a33e21a2
commit 2557ce1ee5
1 changed files with 39 additions and 0 deletions
@@ -224,3 +224,42 @@ build the moment a producer lands.
  zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA
  path); `hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple
  VideoToolbox Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.
+
+## 11. 1 Gbps+ data plane *(foundation landed — the real work is batched/paced send)*
+
+Support 1 Gbps+ video bitrate end to end — **the whole point of the GF(2¹⁶) Leopard FEC** (it breaks
+the GF(2⁸)/Moonlight ~1 Gbps wall). A 6-way subagent investigation (2026-06-11) mapped every ceiling.
+
+**Verdict: ~halfway, and it's mostly clamps + ONE real piece of work.** Already 1 Gbps-ready and
+untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
+only ~434–874 data shards = a single GF(2¹⁶) block, ~2 orders of magnitude under the 65535 ceiling);
+AES-GCM (RustCrypto auto-detects AES-NI, ~10–25× headroom on x86_64); the u64 sequence/nonce space; and the **M1
+`ReassemblerLimits`** — fully *derived* from the negotiated `FecConfig`, so they already admit every
+legit high-bitrate frame with nothing to relax. Security invariant to keep: every allocation size
+must trace to a host-negotiated parameter clamped to a scheme ceiling — scale via the negotiated
+params (`max_data_per_block`, `shard_payload`), never by widening a bound by hand.
+
+- **Done & live (`b8a33e2`) — make 1 Gbps configurable + its failure mode observable:** raised the
+  clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
+  headroom above the session cap); `TARGET_SOCKBUF` 8 → 32 MB (+ matching `99-punktfunk-net.conf`)
+  so a multi-MB IDR burst doesn't fill the buffer; and surfaced the previously-silent WouldBlock
+  send-buffer drop — `Transport::send` → `Result<bool>`, a new `packets_send_dropped` stat (Stats +
+  C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF` wire-Mbps/drop dump in `virtual_stream`, and the probe
+  completion log. Loopback-verified the clamp no longer truncates a 1.2 Gbps probe.
+- **The real bottleneck (next):** the native data plane is single-threaded with one `send()` syscall
+  per packet — at ~125k pkt/s (1 Gbps wire) it burns a core on syscalls and mass-drops keyframe
+  bursts. The fix is a **port, not invention**: lift the GameStream path's proven `sendmmsg_all`
+  (64/call) + paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`, Linux
+  `sendmmsg`, scalar default), move FEC+seal+send onto a dedicated paced send thread, and mirror with
+  `recvmmsg` + a reused buffer ring on the client (kills the per-recv alloc + the 300 µs-sleep
+  underdrain). At 64 packets per call that is up to ~64× fewer send syscalls.
+- **Then refine as profiling shows:** add a FEC throughput-bench to `loss-harness`; reuse the
+  reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds burst-drop
+  blast radius + enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
+  `shard_payload` 1200 → ~1452 (or jumbo after a path-MTU probe) for ~17% (or ~6×) fewer packets.
+- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB,
+  never set by `session_config`) from the negotiated mode/bitrate — strictly *tightens* the surface.
+- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path):
+  `punktfunk-client-rs --speed-test KBPS:MS`, RELEASE build (a debug build is CPU-bound at ~30 Mbps),
+  watching `packets_send_dropped`. Open questions: NVENC CBR rate-tracking at 0.5–1 Gbps (no explicit
+  `rc_buffer_size`); LAN/QEMU-NIC jumbo/GSO support; any `web/` bitrate slider hardcoding 500 Mbps.
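
The batched-send seam from the "real bottleneck" item could look roughly like the sketch below. It is illustrative only, not the project's code: the roadmap names `Transport`, `send` returning `Result<bool>`, and `send_batch(&[&[u8]])`, but the trait shape, the meaning of the `bool` (assumed here: `true` = sent, `false` = dropped on a full socket buffer), the drop-count return value, `MockTransport`, and `syscalls_for` are all assumptions. A Linux backend would presumably override `send_batch` with a `sendmmsg`-based implementation, while other platforms keep the scalar default.

```rust
use std::io;

/// Hypothetical stand-in for the core transport seam.
trait Transport {
    /// Assumed semantics: Ok(true) = packet sent, Ok(false) = packet dropped
    /// because the socket send buffer was full (WouldBlock).
    fn send(&mut self, packet: &[u8]) -> io::Result<bool>;

    /// Scalar default: one `send` call per packet.
    /// Returns how many packets in the batch were dropped.
    fn send_batch(&mut self, packets: &[&[u8]]) -> io::Result<usize> {
        let mut dropped = 0;
        for packet in packets {
            if !self.send(packet)? {
                dropped += 1;
            }
        }
        Ok(dropped)
    }
}

/// Test double (hypothetical): accepts `capacity` packets, then reports drops,
/// mimicking a socket buffer that fills up during a keyframe burst.
struct MockTransport {
    capacity: usize,
    sent: usize,
    calls: usize,
}

impl Transport for MockTransport {
    fn send(&mut self, _packet: &[u8]) -> io::Result<bool> {
        self.calls += 1;
        if self.sent < self.capacity {
            self.sent += 1;
            Ok(true)
        } else {
            Ok(false)
        }
    }
}

/// Syscalls needed to send `packets` packets when each call carries up to
/// `batch` of them (ceiling division). `batch` must be non-zero.
fn syscalls_for(packets: u64, batch: u64) -> u64 {
    (packets + batch - 1) / batch
}
```

With the scalar default, 125,000 packets cost 125,000 calls; `syscalls_for(125_000, 64)` gives the per-second call count if every batch is full, which is where the roughly 64× figure comes from. Returning the drop count from `send_batch` is one way to keep feeding a `packets_send_dropped`-style counter without a second pass over the batch.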