From 2557ce1ee526627bb0603e10df754fc1663b9d29 Mon Sep 17 00:00:00 2001
From: enricobuehler
Date: Thu, 11 Jun 2026 20:47:36 +0000
Subject: [PATCH] docs(roadmap): §11 1 Gbps+ data plane — foundation landed, batched send next
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/roadmap.md | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/docs/roadmap.md b/docs/roadmap.md
index a675eda..901be93 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -224,3 +224,42 @@ build the moment a producer lands.
zero-copy import (`GL_RGB10_A2`/float dest for RGB10, or P010 straight through the Vulkan→CUDA path);
`hevc_nvenc -profile main10` + color/SEI metadata; opt-in Hello/Welcome + C ABI; Apple VideoToolbox
Main10 decode + `wantsExtendedDynamicRangeContent` EDR present + SDR fallback.

## 11. 1 Gbps+ data plane *(foundation landed; the real work is batched/paced send)*

Support 1 Gbps+ video bitrate end to end. That is **the whole point of the GF(2¹⁶) Leopard FEC**: it
breaks the ~1 Gbps wall of the GF(2⁸) FEC that Moonlight uses. A 6-way subagent investigation
(2026-06-11) mapped every ceiling.

**Verdict: ~halfway there, and it's mostly clamps plus ONE real piece of work.** Already 1 Gbps-ready
and untouched: the integer/type path (u32 kbps → u64 → int64_t, no truncation); FEC (a 1 Gbps frame is
only ~434–874 data shards, i.e. a single GF(2¹⁶) block, roughly two orders of magnitude under the 65535
ceiling); AES-GCM (RustCrypto picks up AES-NI automatically, ~10–25× headroom on x86_64); the u64
sequence/nonce space; and the **M1 `ReassemblerLimits`**, which are fully *derived* from the negotiated
`FecConfig` and therefore already admit every legitimate high-bitrate frame with nothing to relax.
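
The shard figures above come down to simple arithmetic, sketched below. The sketch assumes every frame is exactly bitrate ÷ fps and uses the current 1200-byte `shard_payload`. Real IDR frames are larger. The helper names, the 120/240 fps frame rates and the parity count in the usage are illustrative assumptions, not `FecConfig` API. With those rates the results land close to the quoted 434–874 range but not exactly on it.

```rust
/// Hypothetical helper (not part of `FecConfig`): FEC data shards needed for one
/// frame, assuming every frame is exactly bitrate / fps bits.
pub fn data_shards_per_frame(bitrate_kbps: u32, fps: u32, shard_payload: u32) -> u64 {
    // Widen to u64 before converting kbps to bits per second.
    let bits_per_frame = (u64::from(bitrate_kbps) * 1000).div_ceil(u64::from(fps));
    let bytes_per_frame = bits_per_frame.div_ceil(8);
    // A partially filled last shard still costs a whole shard.
    bytes_per_frame.div_ceil(u64::from(shard_payload))
}

/// The GF(2^16) shard ceiling quoted in the roadmap.
pub const GF16_MAX_SHARDS: u64 = 65_535;

/// Hypothetical check: does a frame fit in a single FEC block? Data shards must
/// stay within the negotiated per-block limit, and data + parity within the
/// GF(2^16) ceiling.
pub fn fits_single_block(data: u64, parity: u64, max_data_per_block: u64) -> bool {
    data <= max_data_per_block && data + parity <= GF16_MAX_SHARDS
}
```

With these inputs, 1 Gbps needs 869 data shards per frame at 120 fps and 435 at 240 fps. Both are far below a `max_data_per_block` of 4096. Raising `shard_payload` to 1452 bytes cuts the 120 fps frame to 718 shards. That is the ~17% packet saving the `shard_payload` refinement below refers to.
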
Security invariant to keep: every allocation size must trace to a host-negotiated parameter clamped to
a scheme ceiling. Scale via the negotiated params (`max_data_per_block`, `shard_payload`), never by
widening a bound by hand.

- **Done & live (`b8a33e2`): make 1 Gbps configurable and its failure mode observable.** Raised the
  clamps (`MAX_BITRATE_KBPS` 500 Mbps → 2 Gbps; `MAX_PROBE_KBPS` 1 → 3 Gbps so the probe can show
  headroom above the session cap). Raised `TARGET_SOCKBUF` 8 → 32 MB (with a matching
  `99-punktfunk-net.conf`) so a multi-MB IDR burst doesn't fill the buffer. Surfaced the previously
  silent WouldBlock send-buffer drop (`Transport::send` now returns `Result`) through a new
  `packets_send_dropped` stat (in Stats and the C ABI `PunktfunkStats`), a `PUNKTFUNK_PERF`
  wire-Mbps/drop dump in `virtual_stream`, and the probe completion log. Verified on loopback that the
  clamp no longer truncates a 1.2 Gbps probe.
- **The real bottleneck (next):** the native data plane is single-threaded, with one `send()` syscall
  per packet. At ~125k pkt/s (1 Gbps on the wire) it burns a core on syscalls and mass-drops keyframe
  bursts. The fix is a **port, not invention**. Lift the GameStream path's proven `sendmmsg_all`
  (64 packets per call) and paced `spawn_sender` into the core `Transport` seam (`send_batch(&[&[u8]])`,
  Linux `sendmmsg`, scalar default). Move FEC+seal+send onto a dedicated paced send thread. Mirror it on
  the client with `recvmmsg` plus a reused buffer ring. That removes the per-recv allocation and the
  300 µs sleep that leaves the socket underdrained. The result is up to ~64× fewer syscalls.
- **Then refine as profiling shows:** add a FEC throughput benchmark to `loss-harness`; reuse the
  reed-solomon engine in `Gf16Coder`; lower `max_data_per_block` 4096 → 256–1024 (bounds the blast
  radius of a burst drop and enables per-block FEC parallelism); seal in place via `AeadInPlace`; bump
  `shard_payload` 1200 → ~1452 (or jumbo frames after a path-MTU probe) for ~17% fewer packets (or ~6×
  fewer with jumbo frames).
- **DoS hygiene (last):** derive the one hardcoded reassembler field (`max_frame_bytes` = 64 MiB, never
  set by `session_config`) from the negotiated mode/bitrate. This strictly *tightens* the attack surface.
- **Validate with the speed-test probe** (it reuses the real `submit_frame`→FEC+crypto+send path): run
  `punktfunk-client-rs --speed-test KBPS:MS` from a release build (debug builds are CPU-bound at
  ~30 Mbps) and watch `packets_send_dropped`.
- **Open questions:** NVENC CBR rate tracking at 0.5–1 Gbps (no explicit `rc_buffer_size`); jumbo-frame
  and GSO support on the LAN and the QEMU NIC; any `web/` bitrate slider that hardcodes 500 Mbps.
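
One possible shape for the `send_batch(&[&[u8]])` seam from "The real bottleneck" item is sketched below. It assumes the scalar default simply loops over `send` and counts WouldBlock as a drop, which is the condition the `packets_send_dropped` stat tracks. `BatchStats`, `FullAfter` and the `io::Result<BatchStats>` return type are hypothetical. The Linux override, which would call `sendmmsg` through the `libc` crate, is left out so the sketch stays dependency-free.

```rust
use std::cell::Cell;
use std::io;

/// Hypothetical per-batch outcome; `dropped` would feed `packets_send_dropped`.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct BatchStats {
    pub sent: u64,
    pub dropped: u64,
}

pub trait Transport {
    /// One datagram; errors (including WouldBlock) are reported, not swallowed.
    fn send(&self, pkt: &[u8]) -> io::Result<()>;

    /// Scalar default: one `send` per packet. A full send buffer (WouldBlock)
    /// counts as a drop; any other error aborts the batch.
    fn send_batch(&self, pkts: &[&[u8]]) -> io::Result<BatchStats> {
        let mut stats = BatchStats::default();
        for &pkt in pkts {
            match self.send(pkt) {
                Ok(()) => stats.sent += 1,
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => stats.dropped += 1,
                Err(e) => return Err(e),
            }
        }
        Ok(stats)
    }
}

/// Test double: accepts `budget` packets, then behaves like a full send buffer.
pub struct FullAfter {
    pub budget: Cell<usize>,
}

impl Transport for FullAfter {
    fn send(&self, _pkt: &[u8]) -> io::Result<()> {
        let left = self.budget.get();
        if left == 0 {
            return Err(io::Error::from(io::ErrorKind::WouldBlock));
        }
        self.budget.set(left - 1);
        Ok(())
    }
}
```

The trait keeps platform code out of the callers. In this sketch a Linux `Transport` would override only `send_batch`. `sendmmsg` returns how many messages it actually sent, so a batched override would also have to decide whether the unsent tail counts as dropped or gets retried.
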
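
The roadmap does not say how the GameStream `spawn_sender` paces, so the following shows only one common approach. It is a token bucket that tells a dedicated send thread how long to wait before each batch. Time is passed in as a `Duration` since session start, which keeps the logic deterministic and testable. `Pacer` and `delay_for` are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical token-bucket pacer: spreads a multi-MB IDR burst over time
/// instead of handing it to the socket all at once.
pub struct Pacer {
    rate_bytes_per_sec: f64,
    burst_bytes: f64,
    tokens: f64,    // may go negative: that is the debt the caller must wait off
    last: Duration, // caller-supplied time of the previous call
}

impl Pacer {
    pub fn new(rate_bps: u64, burst_bytes: u64) -> Self {
        assert!(rate_bps > 0, "pacing rate must be non-zero");
        Pacer {
            rate_bytes_per_sec: rate_bps as f64 / 8.0,
            burst_bytes: burst_bytes as f64,
            tokens: burst_bytes as f64, // start full so the first batch goes out at once
            last: Duration::ZERO,
        }
    }

    /// `now` is elapsed time since session start. Returns how long to sleep
    /// before sending a batch of `bytes`.
    pub fn delay_for(&mut self, now: Duration, bytes: u64) -> Duration {
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.last = now;
        // Refill for the time that passed, capped at the burst size.
        self.tokens = (self.tokens + elapsed * self.rate_bytes_per_sec).min(self.burst_bytes);
        self.tokens -= bytes as f64;
        if self.tokens >= 0.0 {
            Duration::ZERO
        } else {
            // Wait until the deficit has been refilled at the pacing rate.
            Duration::from_secs_f64(-self.tokens / self.rate_bytes_per_sec)
        }
    }
}
```

Take a 1.2 Gbps rate and a bucket holding one 64 × 1200-byte batch. The first batch goes out immediately, and each following one waits about 512 µs. That works out to roughly 1,950 `sendmmsg` calls per second instead of one syscall per packet.
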
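
For the DoS-hygiene item, a derivation that respects the security invariant could work in four steps:

1. Clamp the negotiated bitrate to the scheme ceiling.
2. Size an average frame from the negotiated mode.
3. Allow headroom for keyframes.
4. Cap the result at today's 64 MiB, so the change can only tighten the limit.

Below is a minimal sketch of this. `derive_max_frame_bytes` and the `burst_factor` knob are hypothetical, and the sketch assumes the 2 Gbps `MAX_BITRATE_KBPS` clamp is exactly 2_000_000 kbps.

```rust
/// Session clamp from the roadmap (2 Gbps); the exact constant is an assumption.
pub const MAX_BITRATE_KBPS: u32 = 2_000_000;
/// Today's hardcoded reassembler limit (64 MiB); the derived value never exceeds it.
pub const HARDCODED_MAX_FRAME_BYTES: u64 = 64 * 1024 * 1024;

/// Hypothetical derivation of `max_frame_bytes` from negotiated parameters.
/// `burst_factor` is how many average frames one keyframe may span (assumed knob).
pub fn derive_max_frame_bytes(bitrate_kbps: u32, fps: u32, burst_factor: u64) -> u64 {
    // 1. Clamp the negotiated bitrate to the scheme ceiling first.
    let kbps = u64::from(bitrate_kbps.min(MAX_BITRATE_KBPS));
    // 2. Average frame size for the negotiated mode (an fps of 0 is treated as 1).
    let fps = u64::from(fps.max(1));
    let avg_frame_bytes = (kbps * 1000).div_ceil(fps).div_ceil(8);
    // 3 and 4. Add keyframe headroom, but never loosen today's 64 MiB bound.
    avg_frame_bytes
        .saturating_mul(burst_factor)
        .min(HARDCODED_MAX_FRAME_BYTES)
}
```

At 1 Gbps and 60 fps with an 8× keyframe allowance, this yields 16,666,672 bytes, about a quarter of 64 MiB. Requests above the bitrate clamp, or with very large headroom, hit a ceiling instead of growing the allocation.
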
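
On the client side, the reused buffer ring from "The real bottleneck" item can be sketched without any platform code. The sketch assumes a non-blocking socket whose single read is passed in as a closure. For example, `|buf| socket.recv(buf)` on a `std::net::UdpSocket` after `set_nonblocking(true)`. A Linux build would instead fill several slots per `recvmmsg` call. `RecvRing` and its methods are hypothetical.

```rust
use std::io;

/// Hypothetical receive ring: slot buffers are allocated once and reused for
/// every datagram, so the hot receive loop never allocates.
pub struct RecvRing {
    slots: Vec<Vec<u8>>,
    lens: Vec<usize>,
}

impl RecvRing {
    pub fn new(slots: usize, slot_bytes: usize) -> Self {
        RecvRing {
            slots: vec![vec![0u8; slot_bytes]; slots],
            lens: vec![0; slots],
        }
    }

    /// Read until the socket reports WouldBlock or the ring is full, instead of
    /// sleeping between reads. Returns how many slots were filled, in order.
    pub fn drain<F>(&mut self, mut recv: F) -> io::Result<usize>
    where
        F: FnMut(&mut [u8]) -> io::Result<usize>,
    {
        let mut filled = 0;
        while filled < self.slots.len() {
            match recv(self.slots[filled].as_mut_slice()) {
                Ok(n) => {
                    self.lens[filled] = n;
                    filled += 1;
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                Err(e) => return Err(e),
            }
        }
        Ok(filled)
    }

    /// Datagram `i`; only meaningful for `i` below the count from the last `drain`.
    pub fn get(&self, i: usize) -> &[u8] {
        &self.slots[i][..self.lens[i]]
    }
}
```

Buffers are allocated once in `new`, and `drain` keeps reading until WouldBlock. What the thread does once the socket is empty, such as polling or a blocking read with a timeout, is outside this sketch.
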