Commit Graph

7 Commits

Author SHA1 Message Date
enricobuehler 448986f41c perf(core): UDP GSO send path (the multi-Gbps lever)
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
sendmmsg already batches syscalls but still builds one sk_buff per datagram —
the kernel-side wall above ~1 Gbps. UDP Generic Segmentation Offload hands the
kernel one big buffer it splits into gso_size datagrams, building ~1 GSO skb per
≤64 segments. Research (LWN/Cloudflare/Tailscale) measures ~2.4x throughput at
equal CPU and 17-44x fewer syscalls, and that sendmmsg batching alone is
insufficient — you need true segmentation offload.

Adds Transport::send_gso (default = send_batch) + a UdpTransport Linux override:
coalesces a frame's equal-size wire packets (shards are zero-padded to a constant
size, so a whole frame is one gso_size) into ≤64-segment sendmsg(UDP_SEGMENT)
calls. seal/send routes through it. Opt-in via PUNKTFUNK_GSO (new unsafe hot-path
code) with automatic fallback to sendmmsg on any GSO error (unsupported kernel/
path), latched per process. Loopback unit test validates the cmsg segmentation;
full session over loopback streams clean (0% loss). Linux-only; loopback/non-Linux
keep sendmmsg/scalar.

Next levers: in-place AES-GCM seal (kill per-packet allocs), UDP GRO on recv,
drop the sleep-pacing in favor of the kernel qdisc, jumbo MTU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:29:51 +00:00
enricobuehler 86f463cf71 fix(housekeeping): unaligned read UB + recv-drop parity; dedup mmsghdr; doc fixes
ci / rust (push) Has been cancelled
From a bug-hunt + unsafe-audit pass (4 reviewers + adversarial verify). It
confirmed ZERO real bugs in the recent batched/paced data-plane work — these are
the surfaced cleanups + one genuine soundness fix:

- SOUNDNESS (reduce unsafe): inject/gamepad.rs::pump_ff did `ptr::read` of an
  InputEventRaw (align 8, holds a timeval) out of a 1-aligned [u8; N] buffer — UB
  per the reference (x86_64 tolerates it, but it can miscompile under LTO). Use
  ptr::read_unaligned + a SAFETY note. Zero behavior change.
- recv parity: recv_batch (recvmmsg) didn't drop an oversized/truncated datagram
  the way scalar recv does — poll_frame now skips a message whose len fills the
  buffer (> MAX_DATAGRAM_BYTES), matching recv's `n >= RECV_BUF` drop. (AEAD
  already rejected these on encrypted sessions; this restores the documented
  invariant on the batched path.)
- dedup unsafe FFI: factor the identical mmsghdr-from-iovec construction out of
  send_batch + recv_batch into one `mmsghdrs()` helper — the raw-pointer
  scaffolding + its lifetime SAFETY note now live in one place.
- docs: TARGET_SOCKBUF no longer calls paced sending future work (it landed,
  m3.rs::paced_submit); gamescope.rs input is no longer "(TODO)" (wired +
  live-validated); the PUNKTFUNK_PERF `wire_mbps` field is renamed `tx_mbps` and
  noted as attempted/sealed bytes (send_dropped shows what didn't reach the wire).

Full suite (35 + loopback round-trip + 6) + clippy + fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 23:04:23 +00:00
enricobuehler 2f4f92a804 feat(1gbps): batched client recv via recvmmsg (increment C)
ci / rust (push) Has been cancelled
Final increment of the 1 Gbps data-plane rework — the recv counterpart of the
sendmmsg work. The client recv path did one recvfrom + one Vec allocation per
packet (and the pump's 300µs idle sleep could let packets pile up at line rate).

- Transport gains recv_batch(&mut [Vec<u8>], &mut [usize]) -> count; default is
  a single scalar recv into out[0] (loopback + non-Linux).
- UdpTransport overrides it on Linux with recvmmsg (MSG_DONTWAIT) draining up to
  N datagrams per syscall into the caller's reused buffers — no per-packet alloc.
- Session::poll_frame owns a lazily-allocated recv ring (RECV_BATCH=32) and
  consumes it one packet at a time across calls, refilling with one recvmmsg when
  drained. Encapsulated: the punktfunk-client-rs + NativeClient pumps are
  unchanged, and draining a batch per syscall means the 300µs sleep no longer
  underdrains. Added UdpTransport::local_addr (used by the test, generally handy).

~125k → ~4k recv syscalls/sec at line rate, zero per-packet recv allocation.
Verified: new recv_batch_drains_over_loopback test (50 datagrams drained intact
via recvmmsg) + the existing loopback round-trip now runs through the batched
poll_frame; full suite (35 + round-trip + 6) + clippy + fmt green.

Decode-in-place (kill the per-packet open_from_wire alloc) is a separate later
optimization. With A (sendmmsg) + B (paced send) + C (recvmmsg), the native data
plane is batched + paced end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:39:51 +00:00
enricobuehler 10a932d013 feat(1gbps): pace per-frame sends so high-bitrate frames don't burst-drop
ci / rust (push) Has been cancelled
Increment B of the send-path rework — the actual fix for "freezes get more
common over ~150 Mbps, no image at all at 400 Mbps" on the native path. Cause:
the encoder emits a frame and submit_frame blasted ALL its packets at once into
the NIC; a real link drops the line-rate burst (host send buffer EAGAINs), and
under infinite GOP one dropped frame freezes the decode until the next keyframe.
(The speed-test probe showed 0 drops at 400 Mbps because the probe is self-paced;
real video wasn't.)

Adaptive pacing, no extra thread, no regression:
- Session splits into seal_frame (FEC + packetize + seal → wire packets, no
  send) and send_sealed (one batched sendmmsg of a chunk, counts drops);
  submit_frame is now their composition (synthetic + probe paths unchanged).
- virtual_stream's paced_submit seals a frame then sends it in 16-packet chunks
  spread over ~90% of the time until the next frame is due. At 60 fps desktop
  (fast encode → lots of slack) the frame spreads across the interval → no NIC
  burst → no freeze. At 240 fps@5K (encode ≈ interval → ~0 slack) the budget
  collapses and every chunk goes out immediately → never slower than before.

Core suite (34 + loopback round-trip + 6) + clippy + fmt green. The seal/send
split is covered by the existing loopback tests; the pacing is host timing,
verified by review (live-test needs a real NIC — your Mac at a raised bitrate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:15:52 +00:00
enricobuehler c24b571e37 feat(1gbps): batched send via sendmmsg (Transport::send_batch)
ci / rust (push) Has been cancelled
First increment of the 1 Gbps send-path rework (the measured bottleneck): the
native data plane did one send() syscall per packet — at ~125k pkt/s (1 Gbps
wire) that burns a core on syscalls. Port the proven GameStream sendmmsg path
into the core Transport seam.

- Transport gains `send_batch(&[&[u8]]) -> usize` (count handed to the kernel;
  caller counts the rest as send-buffer drops). Default = the scalar send loop
  (loopback transport + non-Linux).
- UdpTransport overrides it on Linux with `sendmmsg` (64 datagrams/syscall);
  the connected socket needs no per-message address. Non-blocking-aware: a full
  send buffer yields a short count / EAGAIN, and we stop + report what went out
  rather than block or retry (same lossy, FEC-protected contract as send()).
- Session::submit_frame seals every shard then hands the whole frame to
  send_batch in ONE call instead of looping send() — ~64x fewer syscalls per
  frame on the native + GameStream-over-core paths; send_dropped accounting
  preserved (total - sent).

~125k → ~2k syscalls/sec at 1 Gbps line rate. Verified: new loopback-UDP test
send_batch_delivers_over_loopback (100 batched packets arrive intact, datagram
boundaries preserved); full core suite + clippy + fmt green.

Next increments: a paced send thread (microburst shaping so a real NIC doesn't
drop line-rate bursts) and recvmmsg on the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 21:55:22 +00:00
enricobuehler b8a33e21a2 feat(1gbps): raise bitrate/probe clamps + socket buffers, count send-buffer drops
ci / rust (push) Has been cancelled
First step of 1 Gbps+ readiness (the whole point of the GF(2^16) Leopard FEC):
make 1 Gbps configurable and its dominant failure mode observable, before the
real transport work (sendmmsg + paced encode|send split) lands.

Investigation (6-way) verdict: we're ~halfway, and it's mostly clamps plus one
real piece of work. The integer/type path, FEC (a 1 Gbps frame is only a few
hundred shards in one GF(2^16) block, far under the 65535 ceiling), AES-GCM
(AES-NI, ~10-25x headroom), and the M1 reassembler bounds (fully derived from
the negotiated FecConfig) are ALL already 1 Gbps-ready and untouched.

This commit (the configurable + observable foundation):
- m3.rs: MAX_BITRATE_KBPS 500_000 -> 2_000_000 (2 Gbps headroom over the 1 Gbps+
  target); MAX_PROBE_KBPS 1_000_000 -> 3_000_000 (probe can demonstrate headroom
  ABOVE the session cap so a client can confidently pick a 1 Gbps+ bitrate).
- transport/udp.rs: TARGET_SOCKBUF 8 MB -> 32 MB (a multi-MB IDR keyframe burst
  no longer fills the buffer); scripts/99-punktfunk-net.conf bumped to match.
- Observability: Transport::send now returns Ok(true|false) (false = WouldBlock
  send-buffer drop, previously a silent Ok(())). Session counts these as a new
  `packets_send_dropped` stat (distinct from recv-side packets_dropped) — in
  Stats, the C ABI PunktfunkStats (header regenerated), a PUNKTFUNK_PERF periodic
  wire-Mbps + drop dump in virtual_stream, and the speed-test probe completion
  log. This is the dominant 1 Gbps+ loss mode and was invisible.

Loopback-verified: a probe now runs at 1.2 Gbps target (no longer truncated to
1 Gbps) with the drop counter live. NOT yet a sustained-1-Gbps proof — the
single-send()-per-packet native path is the next, real piece of work (port the
proven GameStream sendmmsg + paced send thread into the core Transport).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 20:45:49 +00:00
enricobuehler bfd64ce871 rename: lumen → punktfunk, everywhere
ci / rust (push) Has been cancelled
Full project rename, decided 2026-06-10:
- Crates/binaries: punktfunk-core / punktfunk-host / punktfunk-client-rs.
- C ABI: punktfunk_* symbols, Punktfunk* types, include/punktfunk_core.h,
  PUNKTFUNK_FEATURE_QUIC guard (header regenerated; cbindgen renames updated, incl.
  PUNKTFUNK_BTN_*/PUNKTFUNK_AXIS_* wire constants).
- Protocol: punktfunk/1 — control-plane magic LMN1 → PKF1, nonce salt lmn1 → pkf1.
  WIRE BREAK: clients must be rebuilt from this revision.
- Env knobs: PUNKTFUNK_VIDEO_SOURCE / PUNKTFUNK_COMPOSITOR / PUNKTFUNK_ZEROCOPY / ….
- Host config dir: ~/.config/punktfunk (the box's dir was migrated in place — the
  persistent identity is unchanged, pinned fingerprints stay valid).
- Swift package: PunktfunkKit + PunktfunkCore.xcframework + PunktfunkConnection
  (Sources/PunktfunkClient app + tests renamed with it); build-xcframework.sh updated.
- scripts/: 60-punktfunk.rules, punktfunk-host.service; OpenAPI doc regenerated.

Also: scripts/headless/run-headless-kde.sh — full headless Plasma bringup. Root cause of
"desktop but no apps/settings" over the stream: plasmashell launched without
XDG_MENU_PREFIX=plasma-, so the launcher resolved a nonexistent applications.menu and
rendered an empty menu. The script sets the complete KDE session env (menu prefix,
KDE_FULL_SESSION, session version) and rebuilds ksycoca before starting plasmashell.

Gate: 97/97 tests, clippy -D warnings (both feature sets), fmt, C-ABI harness PASS,
zero lumen references left outside .git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:11:59 +00:00