Commit Graph

10 Commits

Author SHA1 Message Date
enricobuehler 516efcc3a3 feat(core/fec): adaptive FEC — size recovery to measured loss, not a flat 20%
On a clean link the flat 20% FEC is pure waste: extra wire bytes AND extra
packets. On a packet-rate-bound uplink (the Steam Deck's WiFi tx caps ~22k pps
regardless of bitrate) those extra packets directly cost goodput — measured at
200 Mbps goodput, 20% FEC drove ~10% loss vs ~2.6% at 0% (it saturated the link).

Adaptive FEC closes the loop:
- Client measures the loss FEC is absorbing each ~750 ms window from session stats
  (recovered shards / received, + a bump when a frame went unrecoverable) and sends
  a periodic `LossReport { loss_ppm }` on the control stream (new message;
  `window_loss_ppm` helper, shared + unit-tested). Connector (Apple/Linux/Windows)
  and probe both report; suppressed during a speed test so its filler can't skew it.
- Host maps loss → recovery % (`adapt_fec`: ≈ loss×1.4 + 1pt, clamped 1..50) and
  applies it live via `Session::set_fec_percent` (the wire is self-describing — each
  packet carries its block's data/recovery counts, so the receiver needs no notice).
  A clean link decays to ~1%; loss ramps it up and converges.
- `PUNKTFUNK_FEC_PCT`, when set, now PINS FEC static (disables adaptation) so
  speed-test / measurement runs keep a fixed, known overhead. Unset ⇒ adaptive,
  starting at 10%.

An older host ignores LossReport (unknown control message) and keeps static FEC;
an older client simply never reports and the host holds its start value. Builds +
clippy + fmt + tests green (adapt_fec / window_loss_ppm / loss_report unit tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 21:31:07 +00:00
enricobuehler 9c8fa9340c refactor: drop milestone names + consolidate clients; loss-recovery & rumble fixes
apple / swift (push) Failing after 40s
audit / cargo-audit (push) Failing after 1m12s
windows-msix / package (push) Successful in 1m37s
windows / build (push) Successful in 1m14s
android / android (push) Successful in 4m48s
ci / web (push) Successful in 27s
ci / rust (push) Successful in 4m21s
ci / docs-site (push) Successful in 31s
ci / bench (push) Successful in 4m39s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 19s
deb / build-publish (push) Successful in 6m3s
flatpak / build-publish (push) Successful in 4m13s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m16s
docker / deploy-docs (push) Successful in 18s
Two bodies of work in one commit (the rename moved files the fixes also touched).

Naming/structure cleanup (pre-launch):
- Host modules m3.rs->punktfunk1.rs, m0.rs->spike.rs; CLI m3-host->punktfunk1-host,
  m0->spike; bare `punktfunk-host` now prints help. Types M3Options/M3Source->
  Punktfunk1Options/Punktfunk1Source.
- Clients consolidated out of crates/ into clients/: punktfunk-client-rs->
  clients/probe (crate punktfunk-probe), client-linux->clients/linux,
  client-windows->clients/windows, punktfunk-android->clients/android/native
  (crate punktfunk-client-android; kept [lib] name=punktfunk_android so the JNI
  contract is unchanged). crates/ now holds only core + host.
- Milestone codes M0-M4 purged from code/CLI/CLAUDE.md/README/docs/docs-site,
  kept only in docs/implementation-plan.md. docs/m2-plan.md->
  docs/gamestream-host-plan.md. CI/gradle/flatpak paths updated.

Client loss-recovery (video froze and never recovered after a brief drop):
- Export punktfunk_connection_frames_dropped through the C ABI (the core already
  tracked it for the client keyframe-recovery loop; it was never reachable from
  the ABI clients). Regenerated punktfunk_core.h.
- Apple (StreamPump + Stage2Pipeline) and Android (decode.rs) now poll
  frames_dropped and request a keyframe when it climbs -- the same loss-driven
  recovery Linux/Windows already had. Under infinite GOP the decoder silently
  conceals reference-missing frames, so the decode-error trigger rarely fires.

Apple rumble robustness (worked then went spotty -- DualSense + Xbox):
- Add CHHapticEngine stopped/reset handlers (rebuild on app background / audio
  interruption / server reset) and drop the permanent `broken` latch on a
  transient drive failure; latch only when the controller truly has no haptics.
- Surface swallowed SDL set_rumble errors on Linux/Windows + diagnostic logging.

Verified: cargo build/clippy/fmt --workspace, C-ABI harness, header drift.
Not runnable on this box (verify in CI): Gitea workflows, gradle/Android,
flatpak, Swift/decky.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 21:05:58 +00:00
enricobuehler 9c86f667ca perf(core): in-place AES-GCM seal + reused wire-buffer pool (host send)
ci / web (push) Failing after 39s
ci / docs-site (push) Failing after 33s
apple / swift (push) Successful in 1m16s
ci / rust (push) Successful in 1m20s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
deb / build-publish (push) Successful in 3m3s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (push) Successful in 4m35s
The host sealed every packet with ~3 heap allocations: aes-gcm's convenience
encrypt() allocates the ciphertext Vec, seal_for_wire allocates the seq||ct||tag
wire Vec, and seal_frame allocated a fresh Vec<Vec<u8>> per frame. At line rate
(~250k–500k pkt/s for 2.5–5 Gbps) that's the single-core allocator wall.

- SessionCrypto::seal_in_place uses AeadInPlace::encrypt_in_place_detached to
  encrypt into the caller's buffer and write the detached tag at the end —
  byte-identical to seal's ciphertext||tag, no allocation (unit-tested for byte
  equality + decrypt).
- Session keeps a wire_pool the caller returns via reclaim_wires; seal_frame
  seals each packet in place into the reused buffers (clear() keeps capacity), so
  after warmup there's no per-packet ciphertext/wire allocation. paced_submit and
  submit_frame reclaim the pool after sending.

End-to-end encrypted/lossless multi-frame tests stay green (validates the pool
reuse doesn't corrupt across frames). Next: write packetize directly into a
contiguous send buffer (kills the remaining shard allocs + GSO's coalescing copy).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:47:38 +00:00
enricobuehler 448986f41c perf(core): UDP GSO send path (the multi-Gbps lever)
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
sendmmsg already batches syscalls but still builds one sk_buff per datagram —
the kernel-side wall above ~1 Gbps. UDP Generic Segmentation Offload hands the
kernel one big buffer it splits into gso_size datagrams, building ~1 GSO skb per
≤64 segments. Research (LWN/Cloudflare/Tailscale) measures ~2.4x throughput at
equal CPU and 17-44x fewer syscalls, and that sendmmsg batching alone is
insufficient — you need true segmentation offload.

Adds Transport::send_gso (default = send_batch) + a UdpTransport Linux override:
coalesces a frame's equal-size wire packets (shards are zero-padded to a constant
size, so a whole frame is one gso_size) into ≤64-segment sendmsg(UDP_SEGMENT)
calls. seal/send routes through it. Opt-in via PUNKTFUNK_GSO (new unsafe hot-path
code) with automatic fallback to sendmmsg on any GSO error (unsupported kernel/
path), latched per process. Loopback unit test validates the cmsg segmentation;
full session over loopback streams clean (0% loss). Linux-only; loopback/non-Linux
keep sendmmsg/scalar.

Next levers: in-place AES-GCM seal (kill per-packet allocs), UDP GRO on recv,
drop the sleep-pacing in favor of the kernel qdisc, jumbo MTU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:29:51 +00:00
enricobuehler 86f463cf71 fix(housekeeping): unaligned read UB + recv-drop parity; dedup mmsghdr; doc fixes
ci / rust (push) Has been cancelled
From a bug-hunt + unsafe-audit pass (4 reviewers + adversarial verify). It
confirmed ZERO real bugs in the recent batched/paced data-plane work — these are
the surfaced cleanups + one genuine soundness fix:

- SOUNDNESS (reduce unsafe): inject/gamepad.rs::pump_ff did `ptr::read` of an
  InputEventRaw (align 8, holds a timeval) out of a 1-aligned [u8; N] buffer — UB
  per the reference (x86_64 tolerates it, but it can miscompile under LTO). Use
  ptr::read_unaligned + a SAFETY note. Zero behavior change.
- recv parity: recv_batch (recvmmsg) didn't drop an oversized/truncated datagram
  the way scalar recv does — poll_frame now skips a message whose len fills the
  buffer (> MAX_DATAGRAM_BYTES), matching recv's `n >= RECV_BUF` drop. (AEAD
  already rejected these on encrypted sessions; this restores the documented
  invariant on the batched path.)
- dedup unsafe FFI: factor the identical mmsghdr-from-iovec construction out of
  send_batch + recv_batch into one `mmsghdrs()` helper — the raw-pointer
  scaffolding + its lifetime SAFETY note now live in one place.
- docs: TARGET_SOCKBUF no longer calls paced sending future work (it landed,
  m3.rs::paced_submit); gamescope.rs input is no longer "(TODO)" (wired +
  live-validated); the PUNKTFUNK_PERF `wire_mbps` field is renamed `tx_mbps` and
  noted as attempted/sealed bytes (send_dropped shows what didn't reach the wire).

Full suite (35 + loopback round-trip + 6) + clippy + fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 23:04:23 +00:00
enricobuehler 2f4f92a804 feat(1gbps): batched client recv via recvmmsg (increment C)
ci / rust (push) Has been cancelled
Final increment of the 1 Gbps data-plane rework — the recv counterpart of the
sendmmsg work. The client recv path did one recvfrom + one Vec allocation per
packet (and the pump's 300µs idle sleep could let packets pile up at line rate).

- Transport gains recv_batch(&mut [Vec<u8>], &mut [usize]) -> count; default is
  a single scalar recv into out[0] (loopback + non-Linux).
- UdpTransport overrides it on Linux with recvmmsg (MSG_DONTWAIT) draining up to
  N datagrams per syscall into the caller's reused buffers — no per-packet alloc.
- Session::poll_frame owns a lazily-allocated recv ring (RECV_BATCH=32) and
  consumes it one packet at a time across calls, refilling with one recvmmsg when
  drained. Encapsulated: the punktfunk-client-rs + NativeClient pumps are
  unchanged, and draining a batch per syscall means the 300µs sleep no longer
  underdrains. Added UdpTransport::local_addr (used by the test, generally handy).

~125k → ~4k recv syscalls/sec at line rate, zero per-packet recv allocation.
Verified: new recv_batch_drains_over_loopback test (50 datagrams drained intact
via recvmmsg) + the existing loopback round-trip now runs through the batched
poll_frame; full suite (35 + round-trip + 6) + clippy + fmt green.

Decode-in-place (kill the per-packet open_from_wire alloc) is a separate later
optimization. With A (sendmmsg) + B (paced send) + C (recvmmsg), the native data
plane is batched + paced end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:39:51 +00:00
enricobuehler 10a932d013 feat(1gbps): pace per-frame sends so high-bitrate frames don't burst-drop
ci / rust (push) Has been cancelled
Increment B of the send-path rework — the actual fix for "freezes get more
common over ~150 Mbps, no image at all at 400 Mbps" on the native path. Cause:
the encoder emits a frame and submit_frame blasted ALL its packets at once into
the NIC; a real link drops the line-rate burst (host send buffer EAGAINs), and
under infinite GOP one dropped frame freezes the decode until the next keyframe.
(The speed-test probe showed 0 drops at 400 Mbps because the probe is self-paced;
real video wasn't.)

Adaptive pacing, no extra thread, no regression:
- Session splits into seal_frame (FEC + packetize + seal → wire packets, no
  send) and send_sealed (one batched sendmmsg of a chunk, counts drops);
  submit_frame is now their composition (synthetic + probe paths unchanged).
- virtual_stream's paced_submit seals a frame then sends it in 16-packet chunks
  spread over ~90% of the time until the next frame is due. At 60 fps desktop
  (fast encode → lots of slack) the frame spreads across the interval → no NIC
  burst → no freeze. At 240 fps@5K (encode ≈ interval → ~0 slack) the budget
  collapses and every chunk goes out immediately → never slower than before.

Core suite (34 + loopback round-trip + 6) + clippy + fmt green. The seal/send
split is covered by the existing loopback tests; the pacing is host timing,
verified by review (live-test needs a real NIC — your Mac at a raised bitrate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:15:52 +00:00
enricobuehler c24b571e37 feat(1gbps): batched send via sendmmsg (Transport::send_batch)
ci / rust (push) Has been cancelled
First increment of the 1 Gbps send-path rework (the measured bottleneck): the
native data plane did one send() syscall per packet — at ~125k pkt/s (1 Gbps
wire) that burns a core on syscalls. Port the proven GameStream sendmmsg path
into the core Transport seam.

- Transport gains `send_batch(&[&[u8]]) -> usize` (count handed to the kernel;
  caller counts the rest as send-buffer drops). Default = the scalar send loop
  (loopback transport + non-Linux).
- UdpTransport overrides it on Linux with `sendmmsg` (64 datagrams/syscall);
  the connected socket needs no per-message address. Non-blocking-aware: a full
  send buffer yields a short count / EAGAIN, and we stop + report what went out
  rather than block or retry (same lossy, FEC-protected contract as send()).
- Session::submit_frame seals every shard then hands the whole frame to
  send_batch in ONE call instead of looping send() — ~64x fewer syscalls per
  frame on the native + GameStream-over-core paths; send_dropped accounting
  preserved (total - sent).

~125k → ~2k syscalls/sec at 1 Gbps line rate. Verified: new loopback-UDP test
send_batch_delivers_over_loopback (100 batched packets arrive intact, datagram
boundaries preserved); full core suite + clippy + fmt green.

Next increments: a paced send thread (microburst shaping so a real NIC doesn't
drop line-rate bursts) and recvmmsg on the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 21:55:22 +00:00
enricobuehler b8a33e21a2 feat(1gbps): raise bitrate/probe clamps + socket buffers, count send-buffer drops
ci / rust (push) Has been cancelled
First step of 1 Gbps+ readiness (the whole point of the GF(2^16) Leopard FEC):
make 1 Gbps configurable and its dominant failure mode observable, before the
real transport work (sendmmsg + paced encode|send split) lands.

Investigation (6-way) verdict: we're ~halfway, and it's mostly clamps plus one
real piece of work. The integer/type path, FEC (a 1 Gbps frame is only a few
hundred shards in one GF(2^16) block, far under the 65535 ceiling), AES-GCM
(AES-NI, ~10-25x headroom), and the M1 reassembler bounds (fully derived from
the negotiated FecConfig) are ALL already 1 Gbps-ready and untouched.

This commit (the configurable + observable foundation):
- m3.rs: MAX_BITRATE_KBPS 500_000 -> 2_000_000 (2 Gbps headroom over the 1 Gbps+
  target); MAX_PROBE_KBPS 1_000_000 -> 3_000_000 (probe can demonstrate headroom
  ABOVE the session cap so a client can confidently pick a 1 Gbps+ bitrate).
- transport/udp.rs: TARGET_SOCKBUF 8 MB -> 32 MB (a multi-MB IDR keyframe burst
  no longer fills the buffer); scripts/99-punktfunk-net.conf bumped to match.
- Observability: Transport::send now returns Ok(true|false) (false = WouldBlock
  send-buffer drop, previously a silent Ok(())). Session counts these as a new
  `packets_send_dropped` stat (distinct from recv-side packets_dropped) — in
  Stats, the C ABI PunktfunkStats (header regenerated), a PUNKTFUNK_PERF periodic
  wire-Mbps + drop dump in virtual_stream, and the speed-test probe completion
  log. This is the dominant 1 Gbps+ loss mode and was invisible.

Loopback-verified: a probe now runs at 1.2 Gbps target (no longer truncated to
1 Gbps) with the drop counter live. NOT yet a sustained-1-Gbps proof — the
single-send()-per-packet native path is the next, real piece of work (port the
proven GameStream sendmmsg + paced send thread into the core Transport).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 20:45:49 +00:00
enricobuehler bfd64ce871 rename: lumen → punktfunk, everywhere
ci / rust (push) Has been cancelled
Full project rename, decided 2026-06-10:
- Crates/binaries: punktfunk-core / punktfunk-host / punktfunk-client-rs.
- C ABI: punktfunk_* symbols, Punktfunk* types, include/punktfunk_core.h,
  PUNKTFUNK_FEATURE_QUIC guard (header regenerated; cbindgen renames updated, incl.
  PUNKTFUNK_BTN_*/PUNKTFUNK_AXIS_* wire constants).
- Protocol: punktfunk/1 — control-plane magic LMN1 → PKF1, nonce salt lmn1 → pkf1.
  WIRE BREAK: clients must be rebuilt from this revision.
- Env knobs: PUNKTFUNK_VIDEO_SOURCE / PUNKTFUNK_COMPOSITOR / PUNKTFUNK_ZEROCOPY / ….
- Host config dir: ~/.config/punktfunk (the box's dir was migrated in place — the
  persistent identity is unchanged, pinned fingerprints stay valid).
- Swift package: PunktfunkKit + PunktfunkCore.xcframework + PunktfunkConnection
  (Sources/PunktfunkClient app + tests renamed with it); build-xcframework.sh updated.
- scripts/: 60-punktfunk.rules, punktfunk-host.service; OpenAPI doc regenerated.

Also: scripts/headless/run-headless-kde.sh — full headless Plasma bringup. Root cause of
"desktop but no apps/settings" over the stream: plasmashell launched without
XDG_MENU_PREFIX=plasma-, so the launcher resolved a nonexistent applications.menu and
rendered an empty menu. The script sets the complete KDE session env (menu prefix,
KDE_FULL_SESSION, session version) and rebuilds ksycoca before starting plasmashell.

Gate: 97/97 tests, clippy -D warnings (both feature sets), fmt, C-ABI harness PASS,
zero lumen references left outside .git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:11:59 +00:00