Commit Graph

12 Commits

Author SHA1 Message Date
enricobuehler 9338a8797d style: rustfmt the connect_via_punch match guard
ci / web (push) Successful in 29s
apple / swift (push) Successful in 1m21s
ci / docs-site (push) Successful in 33s
ci / rust (push) Successful in 2m4s
ci / bench (push) Successful in 1m39s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
deb / build-publish (push) Successful in 2m18s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 4m53s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m28s
cargo fmt --all --check failed CI on the long match-arm guard in UdpTransport::connect_via_punch;
apply the formatter's wrapping. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 19:56:25 +00:00
enricobuehler 7ec91aec2d feat(punktfunk/1): cross-VLAN/NAT video via data-plane hole-punching
ci / web (push) Successful in 29s
ci / rust (push) Failing after 38s
ci / docs-site (push) Successful in 30s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 6s
apple / swift (push) Successful in 1m17s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 6s
deb / build-publish (push) Successful in 3m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 4m58s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m17s
The video data plane is a raw UDP socket separate from the QUIC control connection. On a flat LAN
the host can send straight to the client, but across NAT or a stateful inter-VLAN firewall the
unsolicited host→client video is rejected (ICMP port-unreachable → the session dies immediately,
while control/audio/input keep working since they ride the client-initiated QUIC). Observed live:
a client on 192.168.6.2 streaming from a host on 192.168.1.48.

Fix: client-initiated hole-punching. The client sends PUNCH_MAGIC datagrams from its data socket
to the host's advertised data port (Welcome.udp_port); that opens the firewall/NAT return path and
lets the host learn the client's OBSERVED source (the NAT-translated address, not the client's
reported private one). The host (UdpTransport::connect_via_punch) waits ≤2.5s for the first punch
and streams there, falling back to the client-reported address for clients that don't punch
(flat-LAN behaviour unchanged). The client keeps a low-rate keepalive so a stateful firewall's idle
timeout can't close the path during a static, low-bitrate scene. Wired into client-rs and the
NativeClient connector (covers the Linux + Apple clients; the Apple app needs an xcframework rebuild
to pick up the new core).
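A minimal sketch of the host-side wait, using only std::net. The PUNCH_MAGIC bytes and the name `wait_for_punch` are hypothetical, because the log shows neither the real constant nor the body of UdpTransport::connect_via_punch. The bounded wait window and the fallback to the client-reported address follow the description above.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::{Duration, Instant};

/// Hypothetical punch marker; the real PUNCH_MAGIC value is not shown in the log.
pub const PUNCH_MAGIC: &[u8] = b"PKFPUNCH";

/// Wait up to `window` for the first punch datagram and return the client's
/// OBSERVED source address (NAT-translated if there is a NAT in the path).
/// Falls back to the address the client reported over the control channel.
pub fn wait_for_punch(
    sock: &UdpSocket,
    reported: SocketAddr,
    window: Duration,
) -> io::Result<SocketAddr> {
    let deadline = Instant::now() + window;
    let mut buf = [0u8; 64];
    loop {
        let now = Instant::now();
        if now >= deadline {
            return Ok(reported); // client never punched: flat-LAN behaviour
        }
        sock.set_read_timeout(Some(deadline - now))?;
        match sock.recv_from(&mut buf) {
            Ok((n, from)) if buf[..n].starts_with(PUNCH_MAGIC) => return Ok(from),
            Ok(_) => continue, // stray datagram: keep waiting until the deadline
            // Unix reports a read timeout as WouldBlock, Windows as TimedOut.
            Err(e)
                if e.kind() == io::ErrorKind::WouldBlock
                    || e.kind() == io::ErrorKind::TimedOut =>
            {
                return Ok(reported)
            }
            Err(e) => return Err(e),
        }
    }
}
```

Once an address is chosen, the host would typically `connect()` its data socket to it so plain `send()` works. The client side is the mirror image: send PUNCH_MAGIC from its data socket to Welcome.udp_port, then repeat at a low rate as the keepalive.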

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 18:46:15 +00:00
enricobuehler 2ebffe3457 perf(core): recvmsg_x batched receive on Apple (macOS client)
apple / swift (push) Failing after 1m2s
ci / rust (push) Failing after 1m11s
ci / web (push) Failing after 39s
ci / docs-site (push) Failing after 41s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
deb / build-publish (push) Successful in 3m5s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (push) Successful in 4m30s
macOS/iOS have no recvmmsg(2), so the Mac client did one recv() syscall per
packet (non-allocating after the earlier fix, but still a syscall each — a
single-core wall at line rate that Moonlight avoids). Add the Darwin recvmsg_x(2)
batched-receive path (the recv counterpart of Linux recvmmsg): one syscall drains
up to RECV_BATCH datagrams into the reused ring. struct msghdr_x and the recvmsg_x
extern aren't in the libc crate, so they are declared here (cfg target_vendor=apple).

Opt-in via PUNKTFUNK_RECVMSG_X (it's FFI we can't exercise off-Apple) with
auto-fallback to the tested scalar recv-loop on any unexpected error. Linux
recvmmsg + the non-Apple scalar loop are unchanged; apple.yml compiles the path.
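The opt-in plus latched-fallback pattern can be sketched generically, without the Darwin FFI. This is an illustration under assumptions, not the core's code: the knob value "1", the latch name and the closure-based signature are invented so the control flow can be shown and tested anywhere. In a real build the value would come from `std::env::var("PUNKTFUNK_RECVMSG_X")` and `fast` would wrap the recvmsg_x call.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide latch: once the fast path fails unexpectedly, never retry it.
static FAST_PATH_DISABLED: AtomicBool = AtomicBool::new(false);

/// The fast path runs only when opted in (assumed value "1") and not yet latched off.
pub fn fast_path_enabled(knob: Option<&str>) -> bool {
    matches!(knob, Some("1")) && !FAST_PATH_DISABLED.load(Ordering::Relaxed)
}

/// Try the batched path, falling back to the tested scalar loop on any unexpected error.
pub fn recv_with_fallback<F, S>(knob: Option<&str>, fast: F, scalar: S) -> io::Result<usize>
where
    F: FnOnce() -> io::Result<usize>,
    S: FnOnce() -> io::Result<usize>,
{
    if fast_path_enabled(knob) {
        match fast() {
            Ok(n) => return Ok(n),
            // An empty queue is not a failure: report zero packets.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(0),
            // Anything else: latch the fast path off for the rest of the process.
            Err(_) => FAST_PATH_DISABLED.store(true, Ordering::Relaxed),
        }
    }
    scalar()
}
```

Latching per process keeps a broken fast path from costing an extra failing syscall on every receive.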

Re GRO: Linux recv already batches via recvmmsg (32/syscall), so UDP GRO is only a
marginal add there and needs a recv-path redesign to split coalesced buffers —
deferred as low-ROI vs the Mac, which had no batching at all.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:52:39 +00:00
enricobuehler 448986f41c perf(core): UDP GSO send path (the multi-Gbps lever)
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
sendmmsg already batches syscalls but still builds one sk_buff per datagram —
the kernel-side wall above ~1 Gbps. UDP Generic Segmentation Offload hands the
kernel one big buffer that it splits into gso_size datagrams, building roughly one
GSO skb per batch of up to 64 segments. Research (LWN/Cloudflare/Tailscale) reports
~2.4x throughput at equal CPU and 17-44x fewer syscalls, and finds that sendmmsg
batching alone is insufficient: you need true segmentation offload.

Adds Transport::send_gso (default = send_batch) + a UdpTransport Linux override:
coalesces a frame's equal-size wire packets (shards are zero-padded to a constant
size, so a whole frame is one gso_size) into ≤64-segment sendmsg(UDP_SEGMENT)
calls. seal/send routes through it. Opt-in via PUNKTFUNK_GSO (new unsafe hot-path
code) with automatic fallback to sendmmsg on any GSO error (unsupported kernel/
path), latched per process. Loopback unit test validates the cmsg segmentation;
full session over loopback streams clean (0% loss). Linux-only; loopback/non-Linux
keep sendmmsg/scalar.
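A sketch of the coalescing step in pure Rust, leaving out the sendmsg call and its UDP_SEGMENT control message. `coalesce_for_gso` and MAX_GSO_BYTES are hypothetical. Besides the ≤64-segment bound from the text, the sketch also caps each buffer below the ~64 KB IP datagram limit; that cap is an extra precaution of this sketch, since the log does not say how the real code handles it.

```rust
/// Upper bound on segments per UDP_SEGMENT send, as in the text.
pub const MAX_GSO_SEGMENTS: usize = 64;
/// Hypothetical byte cap keeping one coalesced buffer under the ~64 KB IP datagram limit.
pub const MAX_GSO_BYTES: usize = 65_000;

/// Group consecutive equal-size packets into (gso_size, coalesced buffer) chunks.
/// Each chunk would become one sendmsg whose UDP_SEGMENT cmsg carries gso_size.
pub fn coalesce_for_gso(packets: &[&[u8]]) -> Vec<(usize, Vec<u8>)> {
    let mut chunks: Vec<(usize, Vec<u8>)> = Vec::new();
    let mut segs_in_last = 0usize;
    for &pkt in packets {
        let start_new = match chunks.last() {
            Some((gso_size, buf)) => {
                *gso_size != pkt.len()
                    || segs_in_last == MAX_GSO_SEGMENTS
                    || buf.len() + pkt.len() > MAX_GSO_BYTES
            }
            None => true,
        };
        if start_new {
            chunks.push((pkt.len(), Vec::new()));
            segs_in_last = 0;
        }
        if let Some((_, buf)) = chunks.last_mut() {
            buf.extend_from_slice(pkt);
        }
        segs_in_last += 1;
    }
    chunks
}
```

The kernel also accepts a shorter final segment, but this sketch keeps every chunk strictly equal-size, which is what the zero-padded shards already guarantee.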

Next levers: in-place AES-GCM seal (kill per-packet allocs), UDP GRO on recv,
drop the sleep-pacing in favor of the kernel qdisc, jumbo MTU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:29:51 +00:00
enricobuehler 6b5ee9f47b perf(core): batched non-allocating recv on Apple targets (macOS client wall)
apple / swift (push) Failing after 28s
ci / rust (push) Failing after 1m18s
ci / web (push) Failing after 47s
ci / docs-site (push) Failing after 35s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / deploy-docs (push) Has been skipped
rpm / build-publish (push) Failing after 16s
deb / build-publish (push) Failing after 43s
The batched `recvmmsg` recv path was Linux-only; macOS fell back to the trait
default, which calls the scalar `recv` — a fresh `vec![0u8; 2049]` allocation
(plus zeroing and a copy) PER PACKET on the single receive thread. At line rate
that alloc/free churn, not the syscall, was the single-core wall: the real Mac
client was measured topping out at ~315 Mbps and dropping the session at 800 Mbps,
while a Linux client (recvmmsg) held a clean 1 Gbps against the same host, and
Moonlight (batched recv) does 900 Mbps on the same Mac.

Add a `cfg(all(unix, not(linux)))` `recv_batch` that drains up to RECV_BATCH
datagrams per call with `libc::recv(MSG_DONTWAIT)` straight into the caller's
reused ring buffers — no per-packet allocation or copy. Still one syscall per
datagram (a future `recvmsg_x` batch would cut that too), but it removes the
dominant cost. Linux recvmmsg path and the Windows/loopback default unchanged.
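A portable sketch of the drain loop using std::net. Assumptions: `recv_batch` and RECV_BUF here are illustrative, and instead of passing MSG_DONTWAIT per call (std does not expose recv flags) it relies on the socket already being non-blocking, which likewise makes `recv` return WouldBlock once the queue is empty.

```rust
use std::io;
use std::net::UdpSocket;

/// Illustrative slot size, matching the 2049-byte buffer of the old scalar path.
pub const RECV_BUF: usize = 2049;

/// Drain up to `bufs.len()` datagrams into caller-owned, reused buffers.
/// `sock` must be connected and already in non-blocking mode.
pub fn recv_batch(sock: &UdpSocket, bufs: &mut [Vec<u8>], lens: &mut [usize]) -> io::Result<usize> {
    let max = bufs.len().min(lens.len());
    let mut n = 0;
    while n < max {
        // One syscall per datagram, but no allocation and no extra copy.
        match sock.recv(&mut bufs[n]) {
            Ok(len) => {
                lens[n] = len;
                n += 1;
            }
            // Queue empty: return what we have (possibly zero).
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
            // Nothing received yet: surface the error to the caller.
            Err(e) if n == 0 => return Err(e),
            // Keep the packets already drained; a persistent error resurfaces next call.
            Err(_) => break,
        }
    }
    Ok(n)
}
```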

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:05:54 +00:00
enricobuehler 86f463cf71 fix(housekeeping): unaligned read UB + recv-drop parity; dedup mmsghdr; doc fixes
ci / rust (push) Has been cancelled
From a bug-hunt + unsafe-audit pass (4 reviewers + adversarial verify). It
confirmed ZERO real bugs in the recent batched/paced data-plane work — these are
the surfaced cleanups + one genuine soundness fix:

- SOUNDNESS (reduce unsafe): inject/gamepad.rs::pump_ff did `ptr::read` of an
  InputEventRaw (align 8, holds a timeval) out of a 1-aligned [u8; N] buffer — UB
  per the reference (x86_64 tolerates it, but it can miscompile under LTO). Use
  ptr::read_unaligned + a SAFETY note. Zero behavior change.
- recv parity: recv_batch (recvmmsg) didn't drop an oversized/truncated datagram
  the way scalar recv does — poll_frame now skips a message whose len fills the
  buffer (> MAX_DATAGRAM_BYTES), matching recv's `n >= RECV_BUF` drop. (AEAD
  already rejected these on encrypted sessions; this restores the documented
  invariant on the batched path.)
- dedup unsafe FFI: factor the identical mmsghdr-from-iovec construction out of
  send_batch + recv_batch into one `mmsghdrs()` helper — the raw-pointer
  scaffolding + its lifetime SAFETY note now live in one place.
- docs: TARGET_SOCKBUF no longer calls paced sending future work (it landed,
  m3.rs::paced_submit); gamescope.rs input is no longer "(TODO)" (wired +
  live-validated); the PUNKTFUNK_PERF `wire_mbps` field is renamed `tx_mbps` and
  noted as attempted/sealed bytes (send_dropped shows what didn't reach the wire).
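The soundness fix in miniature. `RawInputEvent` is a hypothetical stand-in laid out like Linux's 24-byte `struct input_event` on 64-bit targets (two 8-byte time fields, then type, code, value); the point is only that a byte buffer guarantees alignment 1, so `ptr::read_unaligned` is required where `ptr::read` would be UB.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical mirror of a 64-bit Linux `struct input_event`: 24 bytes, align 8.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawInputEvent {
    pub sec: i64,
    pub usec: i64,
    pub kind: u16,
    pub code: u16,
    pub value: i32,
}

/// Decode the first event from a byte buffer with no alignment guarantee.
pub fn first_event(buf: &[u8]) -> Option<RawInputEvent> {
    if buf.len() < size_of::<RawInputEvent>() {
        return None;
    }
    // SAFETY: the length check guarantees enough bytes, every bit pattern is a valid
    // RawInputEvent, and read_unaligned (unlike ptr::read) does not require the
    // pointer to be 8-aligned; `buf.as_ptr()` only guarantees alignment 1.
    Some(unsafe { ptr::read_unaligned(buf.as_ptr().cast::<RawInputEvent>()) })
}
```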

Full suite (35 + loopback round-trip + 6) + clippy + fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 23:04:23 +00:00
enricobuehler 2f4f92a804 feat(1gbps): batched client recv via recvmmsg (increment C)
ci / rust (push) Has been cancelled
Final increment of the 1 Gbps data-plane rework — the recv counterpart of the
sendmmsg work. The client recv path did one recvfrom + one Vec allocation per
packet (and the pump's 300µs idle sleep could let packets pile up at line rate).

- Transport gains recv_batch(&mut [Vec<u8>], &mut [usize]) -> count; default is
  a single scalar recv into out[0] (loopback + non-Linux).
- UdpTransport overrides it on Linux with recvmmsg (MSG_DONTWAIT) draining up to
  N datagrams per syscall into the caller's reused buffers — no per-packet alloc.
- Session::poll_frame owns a lazily-allocated recv ring (RECV_BATCH=32) and
  consumes it one packet at a time across calls, refilling with one recvmmsg when
  drained. Encapsulated: the punktfunk-client-rs + NativeClient pumps are
  unchanged, and draining a batch per syscall means the 300µs sleep no longer
  underdrains. Added UdpTransport::local_addr (used by the test, generally handy).
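A sketch of the lazily-allocated ring consumed one packet per call. `RecvRing`, SLOT_BYTES and the closure parameter are assumptions standing in for Session::poll_frame's internals and Transport::recv_batch.

```rust
use std::io;

pub const RECV_BATCH: usize = 32;
/// Hypothetical per-slot capacity.
pub const SLOT_BYTES: usize = 2048;

/// Ring of reused receive buffers; refilled by one batch call only when drained.
#[derive(Default)]
pub struct RecvRing {
    bufs: Vec<Vec<u8>>, // allocated on first use, then reused forever
    lens: Vec<usize>,
    filled: usize,
    next: usize,
}

impl RecvRing {
    /// Return the next buffered packet, calling `recv_batch` only when the ring is empty.
    pub fn next_packet<F>(&mut self, mut recv_batch: F) -> io::Result<Option<&[u8]>>
    where
        F: FnMut(&mut [Vec<u8>], &mut [usize]) -> io::Result<usize>,
    {
        if self.next == self.filled {
            if self.bufs.is_empty() {
                self.bufs = vec![vec![0u8; SLOT_BYTES]; RECV_BATCH];
                self.lens = vec![0; RECV_BATCH];
            }
            self.filled = recv_batch(&mut self.bufs, &mut self.lens)?;
            self.next = 0;
            if self.filled == 0 {
                return Ok(None); // nothing queued right now
            }
        }
        let i = self.next;
        self.next += 1;
        Ok(Some(&self.bufs[i][..self.lens[i]]))
    }
}
```

At ~125k packets/s, one refill per 32 packets works out to 125000 / 32 ≈ 3.9k batch calls per second, consistent with the ~4k recv syscalls/sec quoted for this commit.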

~125k → ~4k recv syscalls/sec at line rate, zero per-packet recv allocation.
Verified: new recv_batch_drains_over_loopback test (50 datagrams drained intact
via recvmmsg) + the existing loopback round-trip now runs through the batched
poll_frame; full suite (35 + round-trip + 6) + clippy + fmt green.

Decode-in-place (kill the per-packet open_from_wire alloc) is a separate later
optimization. With A (sendmmsg) + B (paced send) + C (recvmmsg), the native data
plane is batched + paced end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:39:51 +00:00
enricobuehler c24b571e37 feat(1gbps): batched send via sendmmsg (Transport::send_batch)
ci / rust (push) Has been cancelled
First increment of the 1 Gbps send-path rework (the measured bottleneck): the
native data plane did one send() syscall per packet — at ~125k pkt/s (1 Gbps
wire) that burns a core on syscalls. Port the proven GameStream sendmmsg path
into the core Transport seam.

- Transport gains `send_batch(&[&[u8]]) -> usize` (count handed to the kernel;
  caller counts the rest as send-buffer drops). Default = the scalar send loop
  (loopback transport + non-Linux).
- UdpTransport overrides it on Linux with `sendmmsg` (64 datagrams/syscall);
  the connected socket needs no per-message address. Non-blocking-aware: a full
  send buffer yields a short count / EAGAIN, and we stop + report what went out
  rather than block or retry (same lossy, FEC-protected contract as send()).
- Session::submit_frame seals every shard then hands the whole frame to
  send_batch in ONE call instead of looping send() — ~64x fewer syscalls per
  frame on the native + GameStream-over-core paths; send_dropped accounting
  preserved (total - sent).
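A sketch of the scalar default and the caller's drop accounting, assuming a connected, non-blocking std::net::UdpSocket. `send_batch_scalar` and `submit` are hypothetical names; the Linux override described above would replace the loop with one sendmmsg per 64 datagrams while keeping the same short-count contract.

```rust
use std::io;
use std::net::UdpSocket;

/// Hand packets to the kernel in order; stop at the first full send buffer
/// (WouldBlock) and return how many went out instead of blocking or retrying.
pub fn send_batch_scalar(sock: &UdpSocket, packets: &[&[u8]]) -> io::Result<usize> {
    for (i, pkt) in packets.iter().enumerate() {
        match sock.send(pkt) {
            Ok(_) => {}
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(i),
            Err(e) => return Err(e),
        }
    }
    Ok(packets.len())
}

/// Caller side: whatever was not handed over counts as send-buffer drops (total - sent).
pub fn submit(sock: &UdpSocket, packets: &[&[u8]], send_dropped: &mut u64) -> io::Result<()> {
    let sent = send_batch_scalar(sock, packets)?;
    *send_dropped += (packets.len() - sent) as u64;
    Ok(())
}
```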

~125k → ~2k syscalls/sec at 1 Gbps line rate. Verified: new loopback-UDP test
send_batch_delivers_over_loopback (100 batched packets arrive intact, datagram
boundaries preserved); full core suite + clippy + fmt green.

Next increments: a paced send thread (microburst shaping so a real NIC doesn't
drop line-rate bursts) and recvmmsg on the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 21:55:22 +00:00
enricobuehler b8a33e21a2 feat(1gbps): raise bitrate/probe clamps + socket buffers, count send-buffer drops
ci / rust (push) Has been cancelled
First step of 1 Gbps+ readiness (the whole point of the GF(2^16) Leopard FEC):
make 1 Gbps configurable and its dominant failure mode observable, before the
real transport work (sendmmsg + paced encode|send split) lands.

Investigation (6-way) verdict: we're ~halfway, and it's mostly clamps plus one
real piece of work. The integer/type path, FEC (a 1 Gbps frame is only a few
hundred shards in one GF(2^16) block, far under the 65535 ceiling), AES-GCM
(AES-NI, ~10-25x headroom), and the M1 reassembler bounds (fully derived from
the negotiated FecConfig) are ALL already 1 Gbps-ready and untouched.

This commit (the configurable + observable foundation):
- m3.rs: MAX_BITRATE_KBPS 500_000 -> 2_000_000 (2 Gbps headroom over the 1 Gbps+
  target); MAX_PROBE_KBPS 1_000_000 -> 3_000_000 (probe can demonstrate headroom
  ABOVE the session cap so a client can confidently pick a 1 Gbps+ bitrate).
- transport/udp.rs: TARGET_SOCKBUF 8 MB -> 32 MB (a multi-MB IDR keyframe burst
  no longer fills the buffer); scripts/99-punktfunk-net.conf bumped to match.
- Observability: Transport::send now returns Ok(true|false) (false = WouldBlock
  send-buffer drop, previously a silent Ok(())). Session counts these as a new
  `packets_send_dropped` stat (distinct from recv-side packets_dropped) — in
  Stats, the C ABI PunktfunkStats (header regenerated), a PUNKTFUNK_PERF periodic
  wire-Mbps + drop dump in virtual_stream, and the speed-test probe completion
  log. This is the dominant 1 Gbps+ loss mode and was invisible.

Loopback-verified: a probe now runs at a 1.2 Gbps target (no longer truncated to
1 Gbps) with the drop counter live. NOT yet a sustained-1-Gbps proof — the
single-send()-per-packet native path is the next, real piece of work (port the
proven GameStream sendmmsg + paced send thread into the core Transport).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 20:45:49 +00:00
enricobuehler 0f333460ec fix(core): grow UDP socket buffers — fixes 4K/5K video freezing on one frame
The data-plane UDP sockets used the OS default buffer (~208 KB on Linux, similar
on macOS), which is smaller than a single high-resolution frame burst: a
5120×1440 keyframe is ~130 packets the encode|send thread hands to sendmmsg at
once. The burst overflows the buffer — EAGAIN on the host send (now dropped, was
fatal) or a silent drop on the client recv — and because the data plane runs
infinite-GOP, one lost frame breaks every subsequent reference and the decode
freezes on the last good frame until an RFI refresh that may never catch up.
Symptom: connect at 5120×1440, see ONE frame, then a frozen image (audio + input
keep working — those ride QUIC, not this socket).

Set SO_SNDBUF/SO_RCVBUF to 8 MB (clamped by the OS to net.core.{w,r}mem_max on
Linux / kern.ipc.maxsockbuf on macOS); warn if the grant lands far below target so
an undersized host is diagnosable. The client side matters most — the SAME
UdpTransport backs the Apple client's data plane via the C ABI, and macOS grants
multi-MB buffers without any sysctl, so a rebuilt client stops losing frames.
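A sketch of the "far below target" check, reusing the TARGET_SOCKBUF name with this commit's 8 MB value. The threshold (half the target) and the helper names are assumptions. One Linux detail matters when reading the grant back: per socket(7), the kernel doubles the SO_SNDBUF/SO_RCVBUF value when it is set, and getsockopt returns the doubled figure, so the readback is halved before comparing.

```rust
/// Target buffer size from this commit (8 MB).
pub const TARGET_SOCKBUF: usize = 8 * 1024 * 1024;

/// Convert a getsockopt(SO_SNDBUF/SO_RCVBUF) readback into the size actually granted:
/// Linux reports double the set value (bookkeeping overhead); assume other OSes do not.
pub fn effective_grant(reported: usize, is_linux: bool) -> usize {
    if is_linux {
        reported / 2
    } else {
        reported
    }
}

/// Hypothetical "far below target" rule: warn when less than half the target was granted.
pub fn should_warn(reported: usize, is_linux: bool) -> bool {
    effective_grant(reported, is_linux) < TARGET_SOCKBUF / 2
}
```

In practice `is_linux` would be `cfg!(target_os = "linux")`. The 416 KB host send buffer mentioned in the validation note is consistent with that doubling: a 212992-byte (208 KB) wmem_max clamp reads back as 425984 bytes, exactly 416 KB.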

Validated live, bazzite→client at 5120×1440: was 1319/1500 frames (12% loss →
freeze), now 1500/1500 @60 and 5279/5279 @240 (split-encode active), zero
mismatches, p50 1.9–3.4 ms. Host send buffer was still capped at 416 KB and lost
nothing — the loss was purely the client recv buffer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:28:02 +00:00
enricobuehler b6f4164454 fix(core): drop video packets on a full UDP send buffer, don't fail the session
UdpTransport sockets are non-blocking, so a momentarily-full kernel send buffer
makes socket.send return WouldBlock (EAGAIN). submit_frame propagated that as a
fatal error, tearing the whole punktfunk/1 session down — observed when attaching
to an already-running source (a headless Steam session) that emits frames at full
rate the instant capture connects: the first burst saturates the tx queue and the
session dies before a single frame reaches the client.

The data plane is lossy + Leopard-FEC-protected and runs infinite-GOP with RFI
keyframes, so the real-time-correct response to a full tx queue is to DROP the
packet (the next frame / FEC recovers) — exactly what the recv path already does
for WouldBlock. Blocking would queue stale frames and add latency. Loopback/M1
paths are unaffected (LoopbackTransport never blocks; M1 tests stay green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 12:55:12 +00:00
enricobuehler bfd64ce871 rename: lumen → punktfunk, everywhere
ci / rust (push) Has been cancelled
Full project rename, decided 2026-06-10:
- Crates/binaries: punktfunk-core / punktfunk-host / punktfunk-client-rs.
- C ABI: punktfunk_* symbols, Punktfunk* types, include/punktfunk_core.h,
  PUNKTFUNK_FEATURE_QUIC guard (header regenerated; cbindgen renames updated, incl.
  PUNKTFUNK_BTN_*/PUNKTFUNK_AXIS_* wire constants).
- Protocol: punktfunk/1 — control-plane magic LMN1 → PKF1, nonce salt lmn1 → pkf1.
  WIRE BREAK: clients must be rebuilt from this revision.
- Env knobs: PUNKTFUNK_VIDEO_SOURCE / PUNKTFUNK_COMPOSITOR / PUNKTFUNK_ZEROCOPY / ….
- Host config dir: ~/.config/punktfunk (the box's dir was migrated in place — the
  persistent identity is unchanged, pinned fingerprints stay valid).
- Swift package: PunktfunkKit + PunktfunkCore.xcframework + PunktfunkConnection
  (Sources/PunktfunkClient app + tests renamed with it); build-xcframework.sh updated.
- scripts/: 60-punktfunk.rules, punktfunk-host.service; OpenAPI doc regenerated.

Also: scripts/headless/run-headless-kde.sh — full headless Plasma bringup. Root cause of
"desktop but no apps/settings" over the stream: plasmashell launched without
XDG_MENU_PREFIX=plasma-, so the launcher resolved a nonexistent applications.menu and
rendered an empty menu. The script sets the complete KDE session env (menu prefix,
KDE_FULL_SESSION, session version) and rebuilds ksycoca before starting plasmashell.

Gate: 97/97 tests, clippy -D warnings (both feature sets), fmt, C-ABI harness PASS,
zero lumen references left outside .git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:11:59 +00:00