Commit Graph

113 Commits

Author SHA1 Message Date
enricobuehler 7ecf2d8dfd fix(inject/libei): emit the continuous scroll axis so small scrolls register
ci / rust (push) Failing after 40s
ci / web (push) Failing after 37s
apple / swift (push) Successful in 1m23s
ci / docs-site (push) Failing after 41s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 7s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 6s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
deb / build-publish (push) Successful in 3m0s
docker / deploy-docs (push) Successful in 19s
rpm / build-publish (push) Successful in 4m18s
The libei backend forwarded mouse wheel only via scroll_discrete (120-per-detent).
Mutter floors a sub-detent delta — a trackpad, a precise/high-res wheel, or a
fractional smooth-scroll event — to zero whole clicks, so small scrolls never land and
you have to spin the wheel a lot before anything moves. Emit the continuous `scroll`
axis (logical px, ~15 px/detent) alongside the discrete steps, matching the wlroots
backend's 15-px/notch behaviour, so every delta moves proportionally while full
detents still drive line/page scrolling.
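
The arithmetic behind this fix can be sketched as follows. This is a minimal illustration, not the backend's code: the function names are hypothetical, and the constants are the 120-units-per-detent and ~15 px/detent figures quoted above. It shows why a sub-detent delta vanishes on the discrete axis but survives on the continuous one.

```rust
/// Discrete scroll units per wheel detent (the 120-per-detent figure from the text).
const UNITS_PER_DETENT: i32 = 120;
/// Logical pixels per detent (the ~15 px/notch figure from the text).
const PX_PER_DETENT: f32 = 15.0;

/// Whole clicks a consumer sees if it only looks at the discrete axis.
/// Integer division truncates toward zero, so any |delta| < 120 yields 0.
fn whole_detents(delta_v120: i32) -> i32 {
    delta_v120 / UNITS_PER_DETENT
}

/// Continuous scroll distance in logical pixels for the same delta.
fn continuous_px(delta_v120: i32) -> f32 {
    delta_v120 as f32 * PX_PER_DETENT / UNITS_PER_DETENT as f32
}
```

A quarter-detent delta of 30 units gives zero whole clicks but 3.75 px of continuous motion, so emitting both axes keeps small scrolls proportional.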

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 12:37:07 +00:00
enricobuehler 55dfb4800f fix(vdisplay/mutter): stop the teardown layout-restore from SIGSEGVing gnome-shell
After a session ends, the Mutter backend (with PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY=1)
re-asserted the physical monitor layout with an explicit ApplyMonitorsConfig. On
Mutter 50 + NVIDIA that monitor reconfig — issued while the just-removed high-refresh
virtual output is still tearing down — SIGSEGVs gnome-shell. Observed live on
home-worker-3: the teardown ApplyMonitorsConfig returns "recipient disconnected from
message bus" (the shell died mid-call), GDM's crash-loop guard then drops to the
greeter and STAYS there, so org.gnome.Mutter.RemoteDesktop/DisplayConfig vanish and
every subsequent reconnect fails with RemoteDesktop.CreateSession ServiceUnknown —
i.e. "after a disconnect I can't reconnect anymore."

make_virtual_primary applies an APPLY_TEMPORARY config, which Mutter reverts on its
own once the virtual output disappears and our DisplayConfig connection closes. So the
explicit restore was both redundant and the crash trigger: drop it, drop the dc_pre
connection at teardown, and let Mutter revert the temporary config itself. Setup is
unchanged (the virtual output is still made primary so the desktop lands on the
streamed surface). Removes the now-unused to_apply_logicals/apply_config helpers.

Verified live on home-worker-3 (5120x1440@240, VIRTUAL_PRIMARY=1): 6/6 back-to-back
connect/disconnect cycles streamed cleanly with gnome-shell holding the same PID
throughout (previously it crashed within the first few disconnects).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 12:37:07 +00:00
enricobuehler dad5a08c1f chore(capture): tidy the GNOME flash diagnostic — it's the CORRUPTED skip
ci / docs-site (push) Failing after 40s
apple / swift (push) Successful in 1m17s
ci / rust (push) Successful in 1m20s
ci / web (push) Failing after 34s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
deb / build-publish (push) Successful in 2m52s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (push) Successful in 5m12s
Live confirmation on worker-3: the flash was Mutter's CORRUPTED, size-0
cursor-update buffers (chunk_flags=CORRUPTED) carrying recycled old frames —
drained=1 always, so latest-frame-only draining wasn't the lever, the CORRUPTED
skip was (OBS issue 8630). Demote the verbose drain diagnostic to a rate-limited
debug line and document the root cause inline. Validated: zero-copy back on GNOME
(dmabuf->CUDA, 5120x1440) AND flash-free with FORCE_SHM off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 11:28:11 +00:00
enricobuehler d8da12bbbd fix(capture/mutter): latest-frame-only dequeue (the real GNOME flash fix)
ci / web (push) Failing after 39s
apple / swift (push) Successful in 1m18s
ci / rust (push) Successful in 1m22s
ci / docs-site (push) Failing after 44s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
deb / build-publish (push) Successful in 3m17s
docker / deploy-docs (push) Successful in 19s
rpm / build-publish (push) Successful in 4m42s
Deep research (OBS Studio's linux-pipewire, Mutter bug tracker) found the GNOME
stale-frame flash is a buffer-RECYCLING race, not damage (Mutter sends whole
frames, no SPA_META_VideoDamage) and not buffer count. OBS's proven fix is
latest-frame-only dequeue: each process callback, drain ALL queued PipeWire
buffers, requeue the older ones, and consume only the NEWEST — plus skip
CORRUPTED buffers. Our code dequeued one buffer per callback (oldest-first) and
the bounded channel dropped the NEWEST when full, so during Mutter's bursty
delivery the encoder got stale frames → the flash.

Switch the process callback to raw dequeue_raw_buffer + drain-to-newest (requeue
older), extract the consume logic into consume_frame(spa_buf) sourcing datas via
the transparent Data cast, skip SPA_META_HEADER_FLAG_CORRUPTED / CORRUPTED-chunk
buffers (size-0 skip kept SHM-only so dmabuf isn't regressed), and remove the
earlier content-hash drop heuristic (it couldn't tell stale re-deliveries from
legit repeating content). Diagnostic logs drain depth + chunk/header flags.
Reverts none of the FORCE_SHM / dmabuf_fence work.
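
The drain-to-newest idea can be modelled without PipeWire at all. The sketch below is an assumption-laden stand-in: `Buf`, `take_latest`, and the two queues are hypothetical and do not correspond to the pipewire-rs API or to consume_frame; they only illustrate "drain all, requeue the older ones, consume the newest unless it is flagged CORRUPTED".

```rust
use std::collections::VecDeque;

/// Stand-in for a dequeued buffer (hypothetical type, not a PipeWire struct).
#[derive(Debug, Clone, PartialEq)]
struct Buf {
    seq: u64,
    corrupted: bool,
}

/// Drain everything currently queued, hand all but the newest straight back
/// to the producer, and return the newest only if it is not flagged corrupted.
fn take_latest(queued: &mut VecDeque<Buf>, requeued: &mut Vec<Buf>) -> Option<Buf> {
    let mut newest: Option<Buf> = None;
    while let Some(buf) = queued.pop_front() {
        // Whatever we were holding is now older than `buf`: give it back.
        if let Some(older) = newest.replace(buf) {
            requeued.push(older);
        }
    }
    match newest {
        Some(buf) if buf.corrupted => {
            // Skip the frame, but still return the buffer to the pool.
            requeued.push(buf);
            None
        }
        other => other,
    }
}
```

With three buffers queued, only the third is consumed and the first two go back immediately, which is what keeps the encoder from seeing stale frames during bursty delivery.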

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 11:15:01 +00:00
enricobuehler 79508b2666 fix(capture/mutter): drop stale re-delivered frames (the GNOME flash)
ci / web (push) Failing after 40s
apple / swift (push) Successful in 1m17s
ci / docs-site (push) Failing after 37s
ci / rust (push) Successful in 1m20s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
deb / build-publish (push) Successful in 2m53s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (push) Successful in 5m11s
Instrumented worker-3: even on the ordered FORCE_SHM download path, Mutter
re-delivers COMPLETE OLD pool buffers — 655 frames in a 15 s session whose content
exactly matched an earlier frame (not damage-incremental; full old frames, in
runs, ~45% during motion). NVIDIA gives no fence to prevent it, so the producer
delivery can't be made clean from our side.

Detect it and drop it: hash a spatial sample of each captured frame; a frame whose
content equals an EARLIER distinct frame (vs the current one, whose duplicates pass
through) is a stale re-delivery — skip it so the encoder never emits the flash
(try_latest re-sends the last good frame; brief hold instead of a backward jump).
Runs on the CPU/SHM path (where Mutter+NVIDIA capture lives); never triggers on
static content or non-Mutter compositors (no reverts). PUNKTFUNK_KEEP_STALE=1
disables it for A/B.
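
A rough sketch of the detection rule described above, under stated assumptions: `sample_hash`, `StaleDetector`, the stride, and the bounded history are all hypothetical choices for illustration, and the real sampling pattern is not given in this message.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::Hasher;

/// Hash every `stride`-th byte of a frame (a cheap spatial sample).
fn sample_hash(frame: &[u8], stride: usize) -> u64 {
    let mut h = DefaultHasher::new();
    for b in frame.iter().step_by(stride.max(1)) {
        h.write_u8(*b);
    }
    h.finish()
}

/// Remembers the current frame's hash plus a bounded history of earlier distinct ones.
struct StaleDetector {
    current: Option<u64>,
    earlier: VecDeque<u64>,
    cap: usize,
}

impl StaleDetector {
    fn new(cap: usize) -> Self {
        Self { current: None, earlier: VecDeque::new(), cap: cap.max(1) }
    }

    /// Returns true if the frame should be dropped as a stale re-delivery.
    fn is_stale(&mut self, hash: u64) -> bool {
        if self.current == Some(hash) {
            return false; // duplicate of the current frame (static content): pass
        }
        if self.earlier.contains(&hash) {
            return true; // equals an earlier distinct frame: backward jump, drop
        }
        // Genuinely new content: the old current frame moves into the history.
        if let Some(prev) = self.current.replace(hash) {
            if self.earlier.len() == self.cap {
                self.earlier.pop_front();
            }
            self.earlier.push_back(prev);
        }
        false
    }
}
```

Note the limitation this model makes visible: content that legitimately returns to an earlier state also matches the history, which is the reason given in d8da12bbbd for removing the heuristic.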

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:46:27 +00:00
enricobuehler 4098b252bc fix(abi): exclude internal Apple recvmsg_x FFI from the C header
ci / web (push) Failing after 46s
apple / swift (push) Successful in 1m17s
ci / docs-site (push) Failing after 32s
ci / rust (push) Successful in 1m20s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
deb / build-publish (push) Successful in 3m16s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (push) Successful in 4m43s
cbindgen swept transport/udp.rs's `recvmsg_x` foreign import and its `MsghdrX`
#[repr(C)] struct into the generated C header — they're internal Apple-only FFI,
not part of the public C ABI, and reference socklen_t/ssize_t/iovec which the C
ABI harness doesn't include, so c_abi_harness_round_trips failed to compile.
Add them to cbindgen.toml export.exclude and regenerate the header.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:44:03 +00:00
enricobuehler f9b857aac2 feat(capture): true SHM path (PUNKTFUNK_FORCE_SHM) for race-free Mutter+NVIDIA
ci / web (push) Failing after 37s
apple / swift (push) Failing after 1m3s
ci / rust (push) Failing after 1m11s
ci / docs-site (push) Failing after 43s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
deb / build-publish (push) Successful in 2m55s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (push) Successful in 5m17s
Empirically, Mutter+NVIDIA dmabuf capture has NO working GPU sync — confirmed on
worker-3: explicit sync fails buffer alloc (EINVAL, no cogl sync_fd), and the
dmabuf carries no implicit fence (EXPORT_SYNC_FILE waited=false). So any dmabuf
read — zero-copy import OR mmap — races Mutter's render and flashes the buffer's
previous frame. The prior "CPU fallback" still listed DmaBuf in its buffer types,
so Mutter kept handing over dmabufs and the fallback never fixed anything (the flashing got worse).

PUNKTFUNK_FORCE_SHM=1 offers MemPtr+MemFd ONLY (no DmaBuf), forcing Mutter to
glReadPixels-download into mappable memory — which orders against its render, so
the frame is complete + current by construction (race-free). The cost is the download
(~3 ms) and the loss of zero-copy; correct at 1080p/4K60. KWin/gamescope are unaffected (they
blit into the buffer, no read-before-render race) and keep zero-copy.
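
The buffer-type selection can be pictured like this. It is only a sketch: `BufferKind`, `offered_kinds`, and `force_shm_from` are hypothetical names, and the real code negotiates SPA data types with PipeWire rather than returning a Vec.

```rust
/// Buffer kinds a capture stream may advertise (illustrative enum, not the SPA constants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BufferKind {
    MemPtr,
    MemFd,
    DmaBuf,
}

/// Interpret the raw PUNKTFUNK_FORCE_SHM value; only "1" enables the SHM-only path.
fn force_shm_from(raw: Option<&str>) -> bool {
    raw == Some("1")
}

/// With force_shm set, DmaBuf is left out entirely so the producer has to
/// download into mappable memory; otherwise all three kinds are offered.
fn offered_kinds(force_shm: bool) -> Vec<BufferKind> {
    let mut kinds = vec![BufferKind::MemPtr, BufferKind::MemFd];
    if !force_shm {
        kinds.push(BufferKind::DmaBuf);
    }
    kinds
}
```

The key point is that the SHM path must omit DmaBuf from the list, not merely prefer the other kinds; the earlier fallback failed because DmaBuf was still offered.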

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:35:28 +00:00
enricobuehler 92c6da9546 fix(capture/mutter): restore zero-copy + sync via dmabuf implicit fence
ci / web (push) Failing after 42s
apple / swift (push) Failing after 1m5s
ci / rust (push) Failing after 1m10s
ci / docs-site (push) Failing after 44s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
deb / build-publish (push) Successful in 2m54s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (push) Successful in 5m13s
The previous attempt (8531135) dropped zero-copy on Mutter+NVIDIA for a sticky
CPU/SHM fallback that (a) still listed SPA_DATA_DmaBuf in its buffer types, so
Mutter kept handing dmabufs that got mmap-read UNsynced — making the flashing
worse, not better — and (b) hinged on producer explicit sync, which Mutter+NVIDIA
cannot do (`error alloc buffers` / no cogl sync_fd, confirmed in worker-3 logs).

Revert the capture restructure to the original zero-copy dmabuf path, and fix the
NVIDIA stale-frame race the RIGHT way for a producer that can't do explicit sync:
the consumer snapshots the dmabuf's implicit fence (DMA_BUF_IOCTL_EXPORT_SYNC_FILE)
and waits for the producer's render before sampling (new dmabuf_fence module, ioctl
number unit-tested). Covers the GPU import and the CPU mmap read. Logs once whether
a render was actually in flight (waited=true → the driver fences and the race is
closed; false → no implicit fence, so we learn zero-copy still needs SHM here).
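
For reference, the ioctl number mentioned above can be derived by hand. This is a sketch assuming the generic Linux `_IOC` bit layout (as used on x86 and arm64; some architectures differ) and the `struct dma_buf_export_sync_file { __u32 flags; __s32 fd; }` definition from <linux/dma-buf.h>. The helper names are illustrative and are not taken from the dmabuf_fence module.

```rust
/// Direction bits of the generic Linux `_IOC` encoding.
const IOC_WRITE: u32 = 1;
const IOC_READ: u32 = 2;

/// Generic layout: dir in bits 30..32, size in bits 16..30,
/// type in bits 8..16, nr in bits 0..8.
const fn ioc(dir: u32, ty: u8, nr: u8, size: u32) -> u32 {
    (dir << 30) | (size << 16) | ((ty as u32) << 8) | nr as u32
}

/// Mirrors `struct dma_buf_export_sync_file` (flags in, sync-file fd out).
#[repr(C)]
struct DmaBufExportSyncFile {
    flags: u32,
    fd: i32,
}

/// `_IOWR('b', 2, struct dma_buf_export_sync_file)`.
const DMA_BUF_IOCTL_EXPORT_SYNC_FILE: u32 = ioc(
    IOC_READ | IOC_WRITE,
    b'b',
    2,
    std::mem::size_of::<DmaBufExportSyncFile>() as u32,
);
```

Pinning the computed value in a unit test guards against a silently wrong struct size or direction bit, which would otherwise only show up as an EINVAL or ENOTTY at runtime.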

drm_sync (the explicit-sync primitive) is kept and verified but marked unused —
no targeted compositor produces a usable sync_fd today; ready to wire in when one
does. The Bug-2 input fix (held-key release on disconnect) from 8531135 is kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:28:17 +00:00
enricobuehler 8531135bb7 fix(capture/mutter): stale-frame flashes + stuck input after disconnect on GNOME
ci / web (push) Failing after 49s
apple / swift (push) Failing after 1m4s
ci / rust (push) Failing after 1m9s
ci / docs-site (push) Failing after 42s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 6s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
deb / build-publish (push) Successful in 2m58s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (push) Successful in 4m17s
Deep dive into the two GNOME-only host bugs (KWin/gamescope clean):

1. Stale-frame flashes (windows at old positions, typed text reverting):
   Mutter renders its virtual monitors DIRECTLY into the PipeWire buffer
   pool, and NVIDIA has no implicit dmabuf fencing — our zero-copy
   import raced the render and encoded each pool buffer's PREVIOUS
   contents. Fix, in order of preference:
   - Consumer-side PipeWire explicit sync (SPA_META_SyncTimeline): new
     drm_sync module (DRM timeline-syncobj wait/signal via raw ioctls,
     unit-tested incl. a live signal->wait round trip); announced
     post-format via update_params (the OBS pattern — at connect time
     the meta makes producers fail allocation, observed on KWin), with
     a blocks=3 Buffers filter so the producer's sync pod wins; acquire
     point awaited before any read (GPU import or CPU mmap), release
     point signaled on every path.
   - Where the producer can't do explicit sync (Mutter on NVIDIA today:
     no cogl sync_fd, "error alloc buffers"), a sticky fallback flips
     the capture to the synchronous CPU/shm path — Mutter's glReadPixels
     download orders against its render, so frames are correct by
     construction. First session pays one ~10 s probe+retry; later
     sessions go straight there. Validated live on home-worker-3
     (GNOME 50 + RTX 4090): clean fallback, 30 MB HEVC streamed.
   - Sync is only announced on Mutter sessions (new VirtualOutput.mutter
     tag): KWin+NVIDIA fails allocation when merely asked, and doesn't
     need it (verified unchanged: zero-copy CUDA import + 1.1 MB/10 s).
   PUNKTFUNK_EXPLICIT_SYNC=0 disables the probe outright.

2. Clicks wedged in the focused app after disconnect+reconnect: a client
   vanishing mid-press left keys/buttons latched in the compositor —
   Mutter keeps the destroyed EIS device's implicit grab and the focused
   app stops taking clicks until restarted. EiState now tracks held
   keys/buttons/touches (wire codes) and synthesizes releases through
   the normal inject path before the EIS connection goes away.
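
The held-input bookkeeping in item 2 might look roughly like the sketch below. `Input` and `HeldInputs` are hypothetical stand-ins for EiState; the wire codes in any usage are arbitrary, and the actual inject call is left to the caller.

```rust
use std::collections::BTreeSet;

/// A latched input, identified by its wire code (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Input {
    Key(u32),
    Button(u32),
    Touch(u32),
}

/// Tracks what is currently pressed so it can be released on disconnect.
#[derive(Default)]
struct HeldInputs {
    held: BTreeSet<Input>,
}

impl HeldInputs {
    fn press(&mut self, input: Input) {
        self.held.insert(input);
    }

    fn release(&mut self, input: Input) {
        self.held.remove(&input);
    }

    /// Drain everything still held; the caller turns each entry into a
    /// synthetic release event before the connection goes away.
    fn release_all(&mut self) -> Vec<Input> {
        std::mem::take(&mut self.held).into_iter().collect()
    }
}
```

Calling `release_all` on teardown and feeding the result through the normal release path leaves nothing latched in the compositor, and a second call returns an empty list.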

GNOME hosts on NVIDIA temporarily lose zero-copy (correctness over
throughput); the moment Mutter+driver gain working explicit sync, the
sync path engages automatically and zero-copy returns.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:34:42 +00:00
enricobuehler 2ebffe3457 perf(core): recvmsg_x batched receive on Apple (macOS client)
apple / swift (push) Failing after 1m2s
ci / rust (push) Failing after 1m11s
ci / web (push) Failing after 39s
ci / docs-site (push) Failing after 41s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
deb / build-publish (push) Successful in 3m5s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (push) Successful in 4m30s
macOS/iOS have no recvmmsg(2), so the Mac client did one recv() syscall per
packet (non-allocating after the earlier fix, but still a syscall each — a
single-core wall at line rate that Moonlight avoids). Add the Darwin recvmsg_x(2)
batched-receive path (the recv counterpart of Linux recvmmsg): one syscall drains
up to RECV_BATCH datagrams into the reused ring. struct msghdr_x + the extern
aren't in the libc crate, so declared here (cfg target_vendor=apple).

Opt-in via PUNKTFUNK_RECVMSG_X (it's FFI we can't exercise off-Apple) with
auto-fallback to the tested scalar recv-loop on any unexpected error. Linux
recvmmsg + the non-Apple scalar loop are unchanged; apple.yml compiles the path.

Re GRO: Linux recv already batches via recvmmsg (32/syscall), so UDP GRO is only a
marginal add there and needs a recv-path redesign to split coalesced buffers —
deferred as low-ROI vs the Mac, which had no batching at all.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:52:39 +00:00
enricobuehler 9c86f667ca perf(core): in-place AES-GCM seal + reused wire-buffer pool (host send)
ci / web (push) Failing after 39s
ci / docs-site (push) Failing after 33s
apple / swift (push) Successful in 1m16s
ci / rust (push) Successful in 1m20s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
deb / build-publish (push) Successful in 3m3s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (push) Successful in 4m35s
The host sealed every packet with ~3 heap allocations: aes-gcm's convenience
encrypt() allocates the ciphertext Vec, seal_for_wire allocates the seq||ct||tag
wire Vec, and seal_frame allocated a fresh Vec<Vec<u8>> per frame. At line rate
(~250k–500k pkt/s for 2.5–5 Gbps) that's the single-core allocator wall.

- SessionCrypto::seal_in_place uses AeadInPlace::encrypt_in_place_detached to
  encrypt into the caller's buffer and write the detached tag at the end —
  byte-identical to seal's ciphertext||tag, no allocation (unit-tested for byte
  equality + decrypt).
- Session keeps a wire_pool the caller returns via reclaim_wires; seal_frame
  seals each packet in place into the reused buffers (clear() keeps capacity), so
  after warmup there's no per-packet ciphertext/wire allocation. paced_submit and
  submit_frame reclaim the pool after sending.
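
The pool-reuse half of this change can be sketched on its own. Assumptions are loud here: `WirePool` and `frame_into` are hypothetical, the 8-byte big-endian sequence prefix is a guess at the wire layout, and `frame_into` performs NO encryption (it writes 16 zero bytes where the GCM tag would go). Only the allocation behaviour is being illustrated.

```rust
/// A free list of wire buffers that keep their capacity between frames.
#[derive(Default)]
struct WirePool {
    free: Vec<Vec<u8>>,
}

impl WirePool {
    /// Take a buffer from the pool (or a fresh empty one on a cold start).
    fn take(&mut self) -> Vec<u8> {
        let mut buf = self.free.pop().unwrap_or_default();
        buf.clear(); // drops the contents but keeps the allocation
        buf
    }

    /// Return sent buffers so the next frame can reuse them.
    fn reclaim(&mut self, wires: impl IntoIterator<Item = Vec<u8>>) {
        self.free.extend(wires);
    }
}

/// Placeholder framing only: seq || payload || 16-byte tag slot. Not a cipher.
fn frame_into(buf: &mut Vec<u8>, seq: u64, payload: &[u8]) {
    buf.extend_from_slice(&seq.to_be_bytes());
    buf.extend_from_slice(payload);
    buf.extend_from_slice(&[0u8; 16]);
}
```

After one frame has warmed the pool, `take` hands back buffers whose capacity already fits a packet, so filling them does not touch the allocator.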

End-to-end encrypted/lossless multi-frame tests stay green (validates the pool
reuse doesn't corrupt across frames). Next: write packetize directly into a
contiguous send buffer (kills the remaining shard allocs + GSO's coalescing copy).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:47:38 +00:00
enricobuehler 448986f41c perf(core): UDP GSO send path (the multi-Gbps lever)
apple / swift (push) Successful in 1m16s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / rust (push) Successful in 1m31s
deb / build-publish (push) Successful in 2m36s
ci / web (push) Failing after 36s
ci / docs-site (push) Failing after 32s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
rpm / build-publish (push) Successful in 4m38s
docker / deploy-docs (push) Successful in 17s
sendmmsg already batches syscalls but still builds one sk_buff per datagram —
the kernel-side wall above ~1 Gbps. UDP Generic Segmentation Offload hands the
kernel one big buffer it splits into gso_size datagrams, building ~1 GSO skb per
≤64 segments. Research (LWN/Cloudflare/Tailscale) measures ~2.4x throughput at
equal CPU and 17-44x fewer syscalls, and that sendmmsg batching alone is
insufficient — you need true segmentation offload.

Adds Transport::send_gso (default = send_batch) + a UdpTransport Linux override:
coalesces a frame's equal-size wire packets (shards are zero-padded to a constant
size, so a whole frame is one gso_size) into ≤64-segment sendmsg(UDP_SEGMENT)
calls. seal/send routes through it. Opt-in via PUNKTFUNK_GSO (new unsafe hot-path
code) with automatic fallback to sendmmsg on any GSO error (unsupported kernel/
path), latched per process. Loopback unit test validates the cmsg segmentation;
full session over loopback streams clean (0% loss). Linux-only; loopback/non-Linux
keep sendmmsg/scalar.
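
The coalescing step can be sketched as pure buffer planning, with no sockets involved. `GsoBatch` and `plan_gso` are hypothetical names; the sketch assumes every packet in a frame has the same length, as stated above, and leaves out the actual sendmsg(UDP_SEGMENT) call and any cap on the total buffer size that a real sender may also need.

```rust
/// One planned send: a contiguous buffer plus the segment size the kernel splits on.
struct GsoBatch {
    buf: Vec<u8>,
    gso_size: usize,
}

/// Group equal-size packets into batches of at most `max_segments` each.
/// Returns None when the input is empty or the packets are not all the same
/// size, in which case a caller would fall back to per-datagram sends.
fn plan_gso(packets: &[Vec<u8>], max_segments: usize) -> Option<Vec<GsoBatch>> {
    let gso_size = packets.first()?.len();
    if gso_size == 0 || max_segments == 0 || packets.iter().any(|p| p.len() != gso_size) {
        return None;
    }
    Some(
        packets
            .chunks(max_segments)
            .map(|chunk| GsoBatch { buf: chunk.concat(), gso_size })
            .collect(),
    )
}
```

For 130 packets and a 64-segment limit this plans three sends (64, 64, and 2 segments) instead of 130 datagram submissions. The `concat` here is the coalescing copy that the in-place seal work aims to remove.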

Next levers: in-place AES-GCM seal (kill per-packet allocs), UDP GRO on recv,
drop the sleep-pacing in favor of the kernel qdisc, jumbo MTU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:29:51 +00:00
enricobuehler 4b1bbfdf0e feat(client-linux): VAAPI hardware decode — zero-copy dmabuf into GraphicsOffload
ci / docs-site (push) Failing after 45s
ci / web (push) Failing after 32s
apple / swift (push) Successful in 1m16s
ci / rust (push) Failing after 1m18s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 7s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 7s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Failing after 1m38s
rpm / build-publish (push) Successful in 4m10s
Stage 1.5: on Intel/AMD clients libavcodec's VAAPI hwaccel decodes on
the GPU; frames map to DRM-PRIME dmabufs (av_hwframe_map, zero copy)
and reach GTK as GdkDmabufTexture (BT.709 limited CICP color state —
GDK's dmabuf default is BT.601). Inside GtkGraphicsOffload that is the
decoder-to-subsurface path, direct-scanout eligible when fullscreen.

Fallback ladder, live-verified on the NVIDIA dev box: no VAAPI device
-> software decode at session start (logged reason); a mid-session
VAAPI error (e.g. broken nvidia-vaapi-driver) demotes to software and
the host's IDR/RFI recovery resynchronizes; a rejected dmabuf import
logs and the stream continues. PUNKTFUNK_DECODER=software|vaapi
overrides; the first-frame log now names the active path.

The hwaccel path is raw ffmpeg-sys FFI (ffmpeg-next wraps none of it):
hw device ctx + get_format pinned to AV_PIX_FMT_VAAPI (NONE on
mismatch so cpu-fallback never silently engages inside libavcodec),
thread_count=1, LOW_DELAY. Surface lifetime rides DrmFrameGuard into
the texture's release func — GDK runs it on both success and failure.

Needs an Intel/AMD client box (Steam Deck/Bazzite) to live-verify the
hardware path; the software path is unchanged and revalidated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:26:59 +00:00
enricobuehler b5c30dff4f perf(host): lift bitrate cap to 8G, raise MTU to 1452, FEC env knob
Groundwork for multi-Gbps (2.5G link here, 5G to the Mac Studio). The encoder is
pixel-rate bound, not bitrate bound, so these unblock the transport:
- MAX_BITRATE_KBPS 2G -> 8G, MAX_PROBE_KBPS 3G -> 10G (the cap was policy, not a
  hardware limit — NVENC emits multi-Gbps trivially with the 2-way split).
- Welcome shard_payload 1200 -> 1452: fills a 1500 MTU, ~17% fewer packets for
  free (even size, FEC-safe; negotiated so the client follows).
- PUNKTFUNK_FEC_PCT env overrides the 20% FEC default — a clean wired LAN can drop
  it (every recovery shard is wire bytes+packets); 0 disables FEC.
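
The "~17% fewer packets" figure follows from 1 - 1200/1452 ≈ 0.174. A small sketch of that arithmetic and of the FEC override is below; `shards_for` and `fec_pct` are hypothetical helpers, and falling back to 20 on an unparsable value is an assumption (the message only says the env var overrides the 20% default and that 0 disables FEC).

```rust
/// Number of data shards needed to carry `frame_bytes` (ceiling division).
fn shards_for(frame_bytes: usize, shard_payload: usize) -> usize {
    (frame_bytes + shard_payload - 1) / shard_payload
}

/// Resolve the FEC percentage from the raw PUNKTFUNK_FEC_PCT value.
/// Unset or unparsable input falls back to the 20% default (assumed behaviour).
fn fec_pct(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok()).unwrap_or(20)
}
```

For a 1,000,000-byte frame this gives 834 shards at 1200 bytes and 689 at 1452 bytes, a reduction of about 17.4%.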

Next: UDP GSO (the dominant lever — research shows ~2.4x throughput / ~40x fewer
syscalls; sendmmsg batching alone is insufficient) + in-place AES-GCM seal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:20:46 +00:00
enricobuehler 11fc3be726 fix(core): libc is a unix-wide dep — unbreak iOS/tvOS xcframework slices
ci / web (push) Failing after 37s
ci / docs-site (push) Failing after 36s
apple / swift (push) Successful in 1m17s
deb / build-publish (push) Failing after 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Failing after 1s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 1s
ci / rust (push) Failing after 1m22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
docker / deploy-docs (push) Has been skipped
rpm / build-publish (push) Failing after 56s
6b5ee9f added a libc-based batched recv_batch for the Apple/BSD targets
(cfg(all(unix, not(target_os = "linux")))) but left libc declared only under
cfg(target_os = "linux"). The macOS host build pulls libc in transitively so it
compiled, but the iOS/tvOS cross-compiles (no transitive libc, dev-deps off) failed
with E0433 "cannot find crate libc", breaking the full xcframework build. Widen the
gate to cfg(unix): libc is now used by sendmmsg/recvmmsg on Linux AND recv() on the
other unix (Apple/BSD) targets.

Verified: cargo build --release -p punktfunk-core --features quic for
aarch64-apple-ios, x86_64-apple-ios, and aarch64-apple-tvos (-Z build-std) all link.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 01:12:56 +02:00
enricobuehler 6b5ee9f47b perf(core): batched non-allocating recv on Apple targets (macOS client wall)
apple / swift (push) Failing after 28s
ci / rust (push) Failing after 1m18s
ci / web (push) Failing after 47s
ci / docs-site (push) Failing after 35s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / deploy-docs (push) Has been skipped
rpm / build-publish (push) Failing after 16s
deb / build-publish (push) Failing after 43s
The batched `recvmmsg` recv path was Linux-only; macOS fell back to the trait
default, which calls the scalar `recv` — a fresh `vec![0u8; 2049]` allocation
(plus zeroing and a copy) PER PACKET on the single receive thread. At line rate
that alloc/free churn, not the syscall, was the single-core wall: measured the
real Mac client topping out at ~315 Mbps and dropping the session at 800 Mbps, while a
Linux client (recvmmsg) held a clean 1 Gbps against the same host, and Moonlight
(batched recv) does 900 Mbps on the same Mac.

Add a `cfg(all(unix, not(linux)))` `recv_batch` that drains up to RECV_BATCH
datagrams per call with `libc::recv(MSG_DONTWAIT)` straight into the caller's
reused ring buffers — no per-packet allocation or copy. Still one syscall per
datagram (a future `recvmsg_x` batch would cut that too), but it removes the
dominant cost. Linux recvmmsg path and the Windows/loopback default unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:05:54 +00:00
enricobuehler c56b1b455a feat(punktfunk/1): request-IDR recovery for a wedged client decode
apple / swift (push) Successful in 1m17s
ci / rust (push) Failing after 31s
ci / web (push) Failing after 42s
ci / docs-site (push) Failing after 40s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 10s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 6s
docker / deploy-docs (push) Has been skipped
rpm / build-publish (push) Failing after 15s
deb / build-publish (push) Failing after 43s
Fixes the intermittent first-connect freeze. The host streams infinite GOP — one
opening IDR, then P-frames only (recovery keyframes just on loss) — so when the
client's decoder wedges on the cold first session (a lost/corrupt opening IDR, a
bad early P-frame) the picture stays frozen until the far-off next keyframe. The
client had no way to ask for one; now it does.

Add a RequestKeyframe control message (client -> host, reliable control stream),
mirroring Reconfigure:
- core: quic.rs RequestKeyframe (type 0x03) + roundtrip test; client.rs
  CtrlRequest::Keyframe + NativeClient::request_keyframe; abi.rs
  punktfunk_connection_request_keyframe (header regenerated).
- host: m3.rs decodes it in the control loop and signals the encode loop, which
  coalesces a burst and calls enc.request_keyframe() — wiring the existing
  NvencEncoder hook (force_kf -> next frame pict_type=I), the same recovery the
  GameStream path already had via force_idr.
- apple: PunktfunkConnection.requestKeyframe(); StreamPump (stage-1) requests on
  layer.status==.failed; Stage2Pipeline (stage-2) on a sync submit failure and on
  the async decode-error callback via a thread-safe KeyframeRecovery. All
  throttled to <=1/250ms (the decode stays wedged for several frames until the IDR
  lands, so per-frame requests would flood the control stream).
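
The client-side throttle (at most one request per 250 ms) could be sketched like this. `KeyframeThrottle` is a hypothetical name and is written in Rust for illustration only; the Swift KeyframeRecovery type is not shown in this message.

```rust
use std::time::{Duration, Instant};

/// Allows at most one keyframe request per `min_gap`.
struct KeyframeThrottle {
    min_gap: Duration,
    last: Option<Instant>,
}

impl KeyframeThrottle {
    fn new(min_gap: Duration) -> Self {
        Self { min_gap, last: None }
    }

    /// Returns true if a request may be sent at `now`, and records it if so.
    fn allow(&mut self, now: Instant) -> bool {
        match self.last {
            // Still inside the quiet window after the previous request.
            Some(prev) if now.duration_since(prev) < self.min_gap => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}
```

Because a suppressed call does not reset the window, a decoder that stays wedged gets another request through once 250 ms have passed, which matches the self-healing behaviour described for a lost recovery IDR.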

Self-healing: a lost recovery IDR is re-requested after the throttle; the host
coalesces bursts into a single IDR.

Validated: cargo fmt + clippy clean; core + host test suites green (incl. new
request_keyframe_roundtrip); swift build + test (39 passed); xcframework rebuilt
(all 5 slices), header regenerated with no unrelated drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 00:48:24 +02:00
enricobuehler 184f94e867 Merge remote-tracking branch 'origin/main'
ci / web (push) Failing after 36s
ci / rust (push) Successful in 1m8s
ci / docs-site (push) Failing after 34s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
apple / swift (push) Successful in 1m17s
docker / deploy-docs (push) Successful in 16s
2026-06-12 21:12:02 +00:00
enricobuehler a95984bb4f feat(client-linux): feature parity with the Swift client
Everything the macOS app does that stage 1 lacked, before any new
feature work (user directive):

- Input capture is now a deliberate, reversible STATE (Moonlight-
  style): engaged on stream start and click-into-video (the engaging
  click is suppressed), released by Ctrl+Alt+Shift+Q (toggles) or
  focus loss; held keys/buttons are flushed host-side on release;
  cursor hiding + shortcut inhibition follow the state; HUD hint when
  released. Per-session window handlers disconnect with the page.
- Gamepads: app-lifetime SDL service (GamepadManager parity) — pad
  list + "Forwarded controller" pin in Settings (auto = most recent),
  "Automatic" pad TYPE resolves from the physical pad at connect;
  DualSense touchpad contacts + ~250 Hz motion samples on the 0xCC
  plane (Swift GamepadWire scale constants); feedback grows adaptive-
  trigger replay and player LEDs via raw DS5 effects packets (the
  wire's 11-byte blocks drop into SDL_SendGamepadEffect verbatim);
  held pad state zeroed on pad switch/detach. sdl3 "hidapi" feature.
- Microphone uplink: PipeWire capture -> Opus 20 ms -> 0xCB datagrams
  (validated live: host received 711 mic packets), Settings toggle.
- Speed test per saved host (Swift's "Test Network Speed…"): 2 s
  probe burst, goodput/loss + recommended ~70 % bitrate, one-tap apply.
- Settings: host compositor preference (sent in the Hello), native-
  display resolution/refresh resolved from the window's monitor at
  connect (new default), bitrate ceiling to 3 Gbit/s.
- Hosts page: saved/trusted hosts section for direct pinned reconnect
  (mDNS not required), rebuilt on every page return.

Deliberately not ported: audio device pickers (PipeWire routing owns
this on Linux), resize-to-request_mode (not wired in Swift either),
pointer-lock relative mouse (stage-2 presenter, needs raw Wayland).
DualSense fidelity needs a physical pad to live-verify.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 21:11:52 +00:00
enricobuehler dea749186d fix(quic/apple): QUIC keep-alive + reconnect input re-engage
ci / rust (push) Failing after 36s
ci / web (push) Failing after 51s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / docs-site (push) Failing after 40s
apple / swift (push) Successful in 1m16s
docker / deploy-docs (push) Successful in 17s
Three native-client bugs isolated against a stock Moonlight client (which
stays connected / keeps input working under the same actions):

- Connection drops mid-stream: the quinn endpoints (host + client) ran with
  default transport config, so keep_alive_interval was OFF. Any quiet stretch
  (no input, audio muted/stalled, a capture hiccup, a mode change) let the
  idle timer expire and quinn closed the session -> next_au=Closed -> "Session
  ended". Moonlight's ENet sends keepalive pings; we sent nothing. Add a shared
  TransportConfig (keep-alive 4s under an explicit 20s idle timeout) to both
  endpoint::server_from_der and endpoint::client_pinned_with_identity.

- Reconnect input dead (macOS): the session-start auto-capture one-shot was
  consumed even when engageCapture(fromClick:false) was refused (window not key
  yet at the instant of reconnect), with no retry -> capture stayed off and
  input never forwarded. Clear the one-shot only on a successful engage, and
  retry on NSWindow.didBecomeKey. Stays scoped to session start, so it does not
  resurrect the rejected auto-grab-on-activation behavior.

- Reconnect input dead (iOS): wasCapturedOnResign leaked stale state across
  sessions and the foreground-restore could fire before this session's
  InputCapture was wired (setForwarding no-ops on nil). Reset it per session in
  start() and guard the didBecomeActive restore on inputCapture != nil.

Validated: cargo build -p punktfunk-core --features quic; swift build;
swift test (39 passed, 0 failures); xcframework rebuilt (all 5 slices), no
ABI/header drift.
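
A toy model of the idle-timer arithmetic behind the first fix, for illustration only. It does not use quinn's API; `longest_silence` and `survives` are hypothetical helpers that assume a keep-alive ping resets the peer's idle timer.

```rust
use std::time::Duration;

/// Longest silence the peer observes during a stretch with no application
/// traffic. With keep-alive enabled, a ping bounds it to the ping interval.
pub fn longest_silence(quiet: Duration, keep_alive: Option<Duration>) -> Duration {
    match keep_alive {
        Some(interval) => quiet.min(interval),
        None => quiet,
    }
}

/// The connection survives the quiet stretch only if the observed silence
/// stays below the idle timeout.
pub fn survives(quiet: Duration, keep_alive: Option<Duration>, idle_timeout: Duration) -> bool {
    longest_silence(quiet, keep_alive) < idle_timeout
}
```

With a 20 s idle timeout, a 30 s quiet stretch closes the session when keep-alive is off and survives with a 4 s keep-alive.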

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:07:54 +02:00
enricobuehler a8a6224fd8 fix(encode): bound per-frame size with a tight VBV buffer
ci / rust (push) Failing after 36s
ci / web (push) Failing after 36s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Successful in 17s
ci / docs-site (push) Failing after 39s
apple / swift (push) Successful in 1m16s
NVENC ran CBR (bit_rate == max_bit_rate, rc=cbr) but never set rc_buffer_size,
so it used a loose default VBV. A high-motion P-frame was then allowed to spike
to many times the average frame size; the extra packets overflow the depth-2
send queue (newest frame dropped) and the kernel UDP buffer (WouldBlock drops),
which the client sees as frame drops/jitter and, on the infinite-GOP GameStream
path, as old/stale frames flashing until the next RFI.

Set a tight ~1-frame VBV (rc_buffer_size = bitrate/fps) so the encoder holds
frame size roughly constant and absorbs motion as a momentary QP/quality dip
instead — the Sunshine/Moonlight low-latency model. Tunable via
PUNKTFUNK_VBV_FRAMES (default 1.0); larger values permit bigger per-frame bursts in
exchange for better motion quality. Fixes both the punktfunk/1 and GameStream paths
(shared encoder).
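
The sizing rule can be written out as a small sketch. The helper names below are hypothetical and the environment-variable handling (fall back to 1.0 on a missing, unparsable, or non-positive value) is an assumption, not the encoder's actual code.

```rust
/// VBV buffer size in bits: one frame's share of the bitrate, scaled by a
/// PUNKTFUNK_VBV_FRAMES-style multiplier (hypothetical helper).
pub fn vbv_buffer_bits(bitrate_bps: u64, fps: u32, vbv_frames: f64) -> u64 {
    // Guard against fps == 0 so the division is always defined.
    let per_frame = bitrate_bps as f64 / fps.max(1) as f64;
    (per_frame * vbv_frames).round() as u64
}

/// Parse the multiplier from an optional raw environment value.
/// Assumption: anything missing, unparsable, or <= 0 falls back to 1.0.
pub fn vbv_frames_from_env(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(1.0)
}
```

For example, 60 Mbit/s at 60 fps with the default multiplier gives a 1 Mbit buffer, so a single frame cannot grow far beyond the average frame size.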

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 20:58:46 +00:00
enricobuehler 5f088c6f56 fix(client-linux): absolute mouse was dropped — pack the surface size in flags
ci / web (push) Failing after 45s
ci / rust (push) Successful in 1m1s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
apple / swift (push) Successful in 1m18s
ci / docs-site (push) Failing after 42s
docker / deploy-docs (push) Successful in 17s
The MouseMoveAbs wire contract packs the client coordinate-space size
as (width << 16) | height in `flags` (same as touch); injectors
normalize against it and drop the event when it is zero. The GTK
client sent flags=0, so KWin's libei path refused every motion
(`emitted=false`) — found via the first real user test from
home-worker-3.
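
A minimal sketch of the packing contract described above. The function names are hypothetical, and treating a zero width or zero height (not only an all-zero `flags`) as "drop the event" is an assumption.

```rust
/// Pack the client coordinate-space size as (width << 16) | height.
pub fn pack_surface_size(width: u16, height: u16) -> u32 {
    ((width as u32) << 16) | height as u32
}

/// Unpack the size; None means the sender did not provide a usable size.
pub fn unpack_surface_size(flags: u32) -> Option<(u16, u16)> {
    let (w, h) = ((flags >> 16) as u16, (flags & 0xFFFF) as u16);
    if w == 0 || h == 0 { None } else { Some((w, h)) }
}

/// Normalize an absolute position to 0.0..=1.0 against the packed size,
/// mirroring an injector that drops the event when the size is missing.
pub fn normalize(x: u32, y: u32, flags: u32) -> Option<(f64, f64)> {
    let (w, h) = unpack_surface_size(flags)?;
    Some((x as f64 / w as f64, y as f64 / h as f64))
}
```

With `flags = 0`, `normalize` returns `None`, which is the failure mode the GTK client hit.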

- ui_stream: send_abs() packs the negotiated mode into flags for
  motion + click-position events.
- core input.rs: document the contract on MouseMoveAbs itself (it was
  only implied by TouchDown's doc).
- client-rs --input-test: add a MouseMoveAbs sweep so the absolute
  path stays covered — Moonlight and the Mac client only send relative
  motion, which is why this gap survived every prior live test.

Validated live against serve --native: kind=MouseMoveAbs emitted=true.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 20:50:53 +00:00
enricobuehler 96a35ca84c feat(client-linux): native GTK4 client — stage 1, first light at 1080p60
ci / rust (push) Failing after 29s
ci / web (push) Failing after 35s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 3s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 18s
ci / docs-site (push) Failing after 38s
apple / swift (push) Successful in 1m15s
docker / deploy-docs (push) Successful in 17s
New crate crates/punktfunk-client-linux (binary punktfunk-client), the
native Linux client on the Option A architecture (2026-06-12 research):

- GTK4/libadwaita shell linking punktfunk-core directly (no C ABI):
  mDNS host list, TOFU fingerprint prompt, SPAKE2 PIN pairing dialog,
  preferences (mode/bitrate/gamepad/shortcut capture), stats overlay,
  --connect host[:port] for scripting.
- Video: FFmpeg software HEVC decode (LOW_DELAY, slice threads) ->
  RGBA -> GdkMemoryTexture inside GtkGraphicsOffload (the dmabuf
  subsurface path lights up when VAAPI lands; black-background keeps
  fullscreen scanout-eligible).
- Audio: Opus -> PipeWire playback stream, the host virtual-mic's
  adaptive jitter ring inverted.
- Input: keyboard as the exact inverse of the host VK table (evdev
  keycodes, layout-independent; unit-tested), absolute mouse through
  the Contain-fit transform, WHEEL_DELTA(120) scroll, compositor
  shortcut inhibition while streaming, Ctrl+Alt+Shift+Q release chord,
  F11 fullscreen. SDL3 gamepad capture (single pad-0 model) + rumble
  and DualSense lightbar feedback on the same thread.
- Session pump owns video+audio pulls; the gamepad thread owns
  rumble+hidout — possible because NativeClient's plane receivers are
  now mutexed, making it Sync (Arc-shared, compiler-verified per-plane
  contract instead of the ABI's manual assertion).
- Linux-gated deps + a stub main keep cargo build --workspace green on
  macOS.

Validated live against serve --native on this box: 1920x1080@60,
locked 60 fps, capture->decoded p50 ~6.4 ms (software decode, debug
build). Teardown keys off AdwNavigationPage::hidden — NavigationView
push fires a transient unmap/map cycle that must not end the session.
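
For readers unfamiliar with a contain (letterbox) fit, the absolute-mouse mapping can be sketched as below. This is a generic illustration, not the client's code; `widget_to_video` is a hypothetical name, and returning `None` over the letterbox bars (rather than clamping) is an assumption.

```rust
/// Map a pointer position in widget coordinates to video coordinates under
/// a contain fit: the video is scaled uniformly to fit and centered.
pub fn widget_to_video(
    px: f64,
    py: f64,
    widget: (f64, f64),
    video: (f64, f64),
) -> Option<(f64, f64)> {
    let (ww, wh) = widget;
    let (vw, vh) = video;
    if ww <= 0.0 || wh <= 0.0 || vw <= 0.0 || vh <= 0.0 {
        return None;
    }
    // Uniform scale that makes the whole video fit inside the widget.
    let scale = (ww / vw).min(wh / vh);
    // Top-left corner of the displayed video inside the widget.
    let (ox, oy) = ((ww - vw * scale) / 2.0, (wh - vh * scale) / 2.0);
    let (x, y) = ((px - ox) / scale, (py - oy) / scale);
    // Positions over the bars fall outside the video rectangle.
    if x < 0.0 || y < 0.0 || x > vw || y > vh {
        None
    } else {
        Some((x, y))
    }
}
```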

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 20:16:30 +00:00
enricobuehler 99b4de32ee feat(pairing): delegated approval (§8b-1) — approve an unpaired device from the console
ci / web (push) Failing after 40s
ci / rust (push) Successful in 1m6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 13s
apple / swift (push) Successful in 1m20s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
ci / docs-site (push) Failing after 46s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 18s
docker / deploy-docs (push) Successful in 16s
An identified-but-unpaired device that knocks on a pairing-required host is now
held as a pending request the operator approves from the web console — pairing it
with no PIN fetched out of band — instead of a flat reject.

- core: Hello gains an optional trailing device name (len u8 || UTF-8, ≤64,
  same trailing-back-compat pattern as compositor/gamepad/bitrate). client-rs
  --name sends it; the connector sends None (fingerprint-derived label).
- native_pairing: in-memory pending queue (note_pending dedups by fingerprint,
  evicts the least-recently-active past a 32 cap, 10-min TTL); approve_pending
  pins the fingerprint, deny drops it. Names are sanitized (strip control/ANSI/
  bidi — untrusted wire input); add()/remove() roll back in-memory on a persist
  failure; pairing clears any stale pending knock.
- m3: the require_pairing gate records the knock (sanitized label) before
  rejecting; anonymous (certless) clients record nothing.
- mgmt: GET /native/pending, POST /native/pending/{id}/approve (optional {name})
  and /deny; OpenAPI + tests; docs/api/openapi.json regenerated.
- web: a "Waiting for approval" section on the Pairing page (live-poll, Approve/
  Deny, error-surfaced via QueryState); en+de strings.
- Also completes an in-progress NativeClient Sync refactor (receivers behind
  per-plane mutexes) that was left half-applied in the tree.
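
The pending-queue policy in the native_pairing bullet (dedup by fingerprint, a cap of 32 with least-recently-active eviction, 10-minute TTL) might look roughly like this. It is a sketch under stated assumptions: the `PendingQueue` type, its fields, and `take` are hypothetical, and expiry is assumed to run lazily on each knock.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const MAX_PENDING: usize = 32;
const PENDING_TTL: Duration = Duration::from_secs(10 * 60);

struct Pending {
    label: String,
    last_seen: Instant,
}

#[derive(Default)]
pub struct PendingQueue {
    by_fingerprint: HashMap<String, Pending>,
}

impl PendingQueue {
    /// Record a knock. Repeated knocks from one fingerprint refresh the entry.
    pub fn note_pending(&mut self, fingerprint: &str, label: &str, now: Instant) {
        // Drop entries whose last activity is older than the TTL.
        self.by_fingerprint
            .retain(|_, p| now.duration_since(p.last_seen) < PENDING_TTL);
        if let Some(p) = self.by_fingerprint.get_mut(fingerprint) {
            p.last_seen = now;
            p.label = label.to_string();
            return;
        }
        // At the cap: evict the least-recently-active entry first.
        if self.by_fingerprint.len() >= MAX_PENDING {
            let oldest = self
                .by_fingerprint
                .iter()
                .min_by_key(|(_, p)| p.last_seen)
                .map(|(fp, _)| fp.clone());
            if let Some(fp) = oldest {
                self.by_fingerprint.remove(&fp);
            }
        }
        self.by_fingerprint.insert(
            fingerprint.to_string(),
            Pending { label: label.to_string(), last_seen: now },
        );
    }

    /// Remove a request (approve or deny) and return its label.
    pub fn take(&mut self, fingerprint: &str) -> Option<String> {
        self.by_fingerprint.remove(fingerprint).map(|p| p.label)
    }

    pub fn contains(&self, fingerprint: &str) -> bool {
        self.by_fingerprint.contains_key(fingerprint)
    }

    pub fn len(&self) -> usize {
        self.by_fingerprint.len()
    }
}
```

Bounding the queue matters because the knocks come from unauthenticated peers; without the cap and TTL an attacker could grow host memory by cycling identities.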

Adversarially reviewed (4 lenses + 3-vote verify); the confirmed findings are
fixed here. Validated live on the GNOME box: knock (with a wire name, and a
malicious ANSI/bidi name that got neutralized) → pending → approve → the same
identity streams real video. Full workspace tests + clippy + fmt green; web tsc
clean. Roadmap §8b-1 marked done; §8b-2 (peer-push approval) is the client
follow-up. See docs-site pairing page.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 19:14:05 +00:00
enricobuehler c8099c0125 fix(vdisplay/mutter): stop screencast before monitor reconfig — fixes >60Hz teardown crash
ci / web (push) Failing after 45s
ci / rust (push) Successful in 57s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / docs-site (push) Failing after 29s
docker / deploy-docs (push) Successful in 16s
apple / swift (push) Successful in 1m19s
The high-refresh teardown SIGSEGV was caused by ApplyMonitorsConfig disabling the
still-actively-captured high-refresh virtual output. Reorder teardown: Stop the screencast
FIRST (Mutter removes the virtual + auto-reverts the temporary config), then re-assert the
physical layout once the virtual is gone. Never reconfigure a live virtual CRTC.

With this, PUNKTFUNK_MUTTER_VIRTUAL_REFRESH=1 is stable: validated at 5120x1440@240 on
Mutter 50 + NVIDIA — virtual output Meta-0@240, real 240fps, gnome-shell survives back-to-back
sessions + teardowns, physical (HDMI-1) restored each time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 14:08:05 +00:00
enricobuehler 015f2ee47b fix(vdisplay/mutter): gate >60Hz virtual mode behind an env flag (teardown SIGSEGV)
ci / web (push) Failing after 34s
ci / rust (push) Successful in 55s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / docs-site (push) Failing after 41s
docker / deploy-docs (push) Successful in 17s
apple / swift (push) Successful in 1m20s
Pinning the virtual output to a high client refresh via RecordVirtual "modes" works
mid-stream, but a high-refresh virtual CRTC SIGSEGVs gnome-shell on session TEARDOWN
(observed at 5120x1440@240) — taking down the whole GNOME session, so subsequent connects
fail with RemoteDesktop ServiceUnknown.

Gate it behind PUNKTFUNK_MUTTER_VIRTUAL_REFRESH, default OFF — Mutter then derives the
virtual monitor's refresh from the PipeWire framerate (60Hz, stable). The >60Hz path stays
in-tree for investigation; re-enable once the teardown crash is understood.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 13:59:29 +00:00
enricobuehler f6a7f3c12d feat(vdisplay/mutter): pin the virtual output to the client's refresh (>60 Hz)
RecordVirtual without a "modes" property makes Mutter derive the virtual monitor's refresh
from the PipeWire stream framerate and default to 60 Hz — so a 240 Hz client mode rendered
at 60 (the encoder just padded to 240 with duplicate frames). Pass an explicit "modes" entry
(size + refresh-rate + is-preferred) so Mutter creates the virtual monitor at the client's
exact WxH@Hz. Mutter >= 47; older Mutter ignores the unknown key (60 Hz fallback, no regression).

Confirmed first via raw D-Bus on the box, then validated end-to-end: the virtual output
Meta-0 reports 1920x1080@240.00 and the host encodes 480 *immediate* (real, not paced)
frames per 2 s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 13:45:29 +00:00
enricobuehler 2ed755f0c3 fix(vdisplay/mutter): make the virtual output the SOLE display, not primary + secondary
ci / web (push) Failing after 38s
ci / rust (push) Successful in 55s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 3s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / docs-site (push) Failing after 43s
docker / deploy-docs (push) Successful in 16s
apple / swift (push) Successful in 1m13s
Keeping the physical monitor enabled as a secondary let the cursor, windows, and keyboard
focus land on it — relative pointer motion wandered off the streamed surface, so on the
client the cursor "disappeared" and clicks/keys went nowhere visible. Omit the physical
outputs from ApplyMonitorsConfig so Mutter disables them for the session; everything is
confined to the streamed virtual output. Restored on teardown.

Validated on-box: mid-session DisplayConfig shows only the virtual output (Meta-0) as the
sole primary; the physical (HDMI-1) is restored after the session ends.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 13:05:02 +00:00
enricobuehler 0c4cfa40be fix(inject/mutter): GNOME input via Mutter's direct EIS, not the xdg portal
ci / rust (push) Successful in 56s
ci / web (push) Failing after 35s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 3s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / docs-site (push) Failing after 38s
apple / swift (push) Successful in 1m15s
On a headless GNOME host the xdg-desktop-portal RemoteDesktop Start() blocks on an
interactive "Allow remote control?" approval nobody can click, so libei input timed out
("EIS setup timed out") and neither mouse nor keyboard worked — even though video worked
(it uses Mutter's direct RemoteDesktop API).

Add EiSource::MutterEis: obtain the EIS fd from
org.gnome.Mutter.RemoteDesktop.Session.ConnectToEIS (CreateSession → Start → ConnectToEIS),
no portal and no approval. Selected for GNOME/Mutter; KWin keeps the RemoteDesktop portal,
gamescope keeps its own EIS socket.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 12:45:10 +00:00
enricobuehler 94552331ef feat(host): concurrent punktfunk/1 sessions (bounded by --max-concurrent)
ci / web (push) Failing after 32s
ci / docs-site (push) Failing after 34s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 17s
ci / rust (push) Successful in 5m25s
apple / swift (push) Successful in 1m23s
The accept loop no longer awaits each session inline — it spawns each onto a
JoinSet, bounded by a semaphore (--max-concurrent, default 4: an NVENC session
bound; overflow clients wait in QUIC's accept backlog until a slot frees). The
QUIC handshake stays in the accept loop so a failed handshake (e.g. a pin
mismatch where the client aborts) doesn't consume a session slot or block
accepting the next client; the slow part (control handshake, pairing, the
capture/encode pipeline) runs in the spawned task.

Each session already had its own virtual output + NVENC encoder; the
host-lifetime input/audio/mic services stay shared — the natural "multiple
devices viewing/controlling the same desktop" semantic on kwin/mutter/wlroots.
gamescope's independent-desktops (per-session input/audio) isolation is a
follow-up. New M3Options.max_concurrent + the `--max-concurrent` CLI flag.

Validated live (GNOME box): two clients connected at once -> two independent
Mutter virtual outputs (720p60 + 1080p60) streaming simultaneously (39 MB +
48 MB). All 61 host tests green (the c_abi/pairing tests exercise the new loop +
the failed-handshake-doesn't-count semantics).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 12:42:09 +00:00
enricobuehler 60ccbfdcf7 style: cargo fmt --all under rustfmt 1.9 (Rust 1.96)
Comment reflow only — the pinned "stable" channel moved and CI checks
formatting with the current toolchain.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 12:28:13 +00:00
enricobuehler bdcc88f5fc Merge remote-tracking branch 'origin/main'
ci / rust (push) Has been cancelled
2026-06-12 12:18:32 +00:00
enricobuehler 9fe7b7877f feat(vdisplay/mutter): optional virtual-output-as-primary for monitored GNOME hosts
PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY=1: after RecordVirtual, promote the per-session
virtual output to the primary monitor (physical kept on, secondary) via
org.gnome.Mutter.DisplayConfig.ApplyMonitorsConfig, restoring on teardown.

Without it, a GNOME host that also has a physical monitor attached keeps the physical
primary, so the virtual output is an empty extended desktop — the client streams only
the wallpaper. (The backend was validated on headless GNOME, where the virtual output
is the only display.)

Best-effort + opt-in: default behavior is unchanged; any DisplayConfig failure just
logs and streaming continues. method=temporary, so nothing is written to monitors.xml
and Mutter auto-reverts the layout when the virtual output is torn down.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 12:18:32 +00:00
enricobuehler 6b4de5d738 feat(client/speedtest): request the host's full 3 Gbps probe ceiling
The Apple speed test asked for only 400 Mbps, capping the measured throughput
there and hiding the link's real headroom. Request the host's full
MAX_PROBE_KBPS (3 Gbps) instead, and raise the recommended-bitrate clamp from
500 Mbps to the host's 2 Gbps session ceiling so a fast measurement yields a
usable recommendation.

Also fix the stale caps left when the host clamps were raised (b8a33e2): the
resolved-bitrate range and the probe doc comments (abi.rs, client.rs,
regenerated header), plus the section 9 roadmap copy, now read 3 Gbps probe /
2 Gbps session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 14:08:19 +02:00
enricobuehler 1c94f46be8 style(quic): format-stable clock test assert (message to comment)
ci / rust (push) Has been cancelled
The clock_offset test's assert_eq! carried an inline message that newer rustfmt
wants to wrap while the repo's committed style keeps such asserts on one line.
Move the message to a comment and use bare assert_eq! so it formats identically
under any rustfmt version — no new fmt-check ambiguity from this addition.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 12:01:56 +00:00
enricobuehler 7eb9a927cf feat(connector): expose host clock offset over the C ABI for glass-to-glass
ci / rust (push) Has been cancelled
Factor the client-side skew handshake into a shared core helper (quic::clock_sync
-> ClockSkew) so both the reference client and the embeddable connector use one
implementation. NativeClient now runs the handshake at connect (right after Start,
before the control task takes the stream) and stores the host-client offset; it's
read over the C ABI via punktfunk_connection_clock_offset_ns (i64 ns, host minus
client; 0 = no correction / old host).

This is the substrate the Apple client needs for the decode->present (glass-to-
glass) term: stamp present time, add the offset to express it in the host's
capture clock, subtract the AU pts_ns. client-rs drops its local clock_sync copy
and uses the shared helper (behavior unchanged; validated locally).
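
The glass-to-glass term described above reduces to one line of arithmetic. A sketch with a hypothetical helper name, assuming all three values are nanoseconds and the offset is host minus client as stated:

```rust
/// Capture-to-present latency in nanoseconds: move the client's present
/// timestamp into the host's capture clock, then subtract the AU's pts.
pub fn glass_to_glass_ns(present_client_ns: i64, clock_offset_ns: i64, pts_ns: i64) -> i64 {
    // offset = host - client, so client + offset is the same instant in host time.
    present_client_ns + clock_offset_ns - pts_ns
}
```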

Regenerates include/punktfunk_core.h. Roadmap section 12 + status updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 11:44:54 +00:00
enricobuehler 05bc9ab22c feat(latency): wall-clock skew handshake for cross-machine latency measurement
ci / rust (push) Has been cancelled
ClockProbe/ClockEcho on the QUIC control stream — 8 NTP-style rounds right after
Start; the min-RTT sample gives the host-client clock offset (clock_offset_ns
estimator in punktfunk-core). The client adds the offset to its receive instant
before differencing against the AU pts_ns, so the capture->reassembled latency
percentiles are valid across machines (skew_corrected=true), not just same-host.
Back-compat: an old host that doesn't answer the probe times out and the client
falls back to a shared-clock assumption (skew_corrected=false).

Host adds one ClockProbe dispatch arm in the control task; the client runs
clock_sync after Start, before the --remode/--speed-test tasks take the stream.

Validated cross-LAN (GNOME box -> dev box): offset ~ -1.57 ms (reproducible),
rtt ~140 us, p50 1.30 ms skew-corrected capture->reassembled — the offset is
exactly the systematic error the handshake removes. Unit tests for the message
codecs and the min-RTT offset estimator.
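
A min-RTT offset estimator of the kind described can be sketched as follows. This is not the clock_offset_ns implementation; the `ProbeSample` layout (one host timestamp per round, symmetric path delay assumed) and all names are assumptions for illustration.

```rust
/// One probe round: client send time, host timestamp in the echo, client
/// receive time. Client times are on the client clock, host time on the host's.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ProbeSample {
    pub t_send_ns: i64,
    pub t_host_ns: i64,
    pub t_recv_ns: i64,
}

impl ProbeSample {
    pub fn rtt_ns(&self) -> i64 {
        self.t_recv_ns - self.t_send_ns
    }

    /// Host-minus-client offset, assuming the host stamped the echo at the
    /// midpoint of the round trip.
    pub fn offset_ns(&self) -> i64 {
        self.t_host_ns - (self.t_send_ns + self.rtt_ns() / 2)
    }
}

/// Pick the offset from the round with the smallest RTT: it has the least
/// queueing delay, so its midpoint assumption is the most trustworthy.
pub fn estimate_offset_ns(samples: &[ProbeSample]) -> Option<i64> {
    samples
        .iter()
        .filter(|s| s.rtt_ns() >= 0)
        .min_by_key(|s| s.rtt_ns())
        .map(|s| s.offset_ns())
}
```

An empty sample set yields `None`, which corresponds to the fallback case where an old host never answers the probe.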

Roadmap §12: skew handshake done; remaining for true glass-to-glass is the Apple
client present-stamp (decode->present) plus the host render->capture term.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 11:20:20 +00:00
enricobuehler 4fff4641bb feat(discovery): native-protocol LAN auto-discovery over mDNS
ci / rust (push) Has been cancelled
Both the unified host (serve --native) and standalone m3-host now advertise the
native punktfunk/1 service over mDNS (_punktfunk._udp) — the analogue of the
GameStream _nvstream._tcp advert. TXT records carry proto, the host cert
fingerprint (fp, the value clients pin), the pairing requirement
(pair=required|optional), and the host id. New crate::discovery module, wired
into m3::serve so both host entry points get it; best-effort, never blocks
streaming (--connect always works).

Client gains `punktfunk-client-rs --discover [SECS]`: browses the LAN and prints
each host (name, addr:port, pairing, fingerprint), then exits. Apple clients
browse the same service natively via NWBrowser (service type + TXT keys are the
contract).

Validated cross-LAN: the dev box discovered the GNOME-box appliance
(pair=required) and a standalone synthetic host (pair=optional); fingerprint and
pairing state correct in both.
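
Since the service type and TXT keys are the contract, a client-side parse might look like the sketch below. The key names `proto`, `fp`, and `pair` come from the message above; the `Advert` type, the `key=value` string input, and rejecting an unknown `pair` value are assumptions. The host id key is omitted because its name is not given here.

```rust
#[derive(Debug, PartialEq)]
pub enum Pairing {
    Required,
    Optional,
}

#[derive(Debug, PartialEq)]
pub struct Advert {
    pub proto: String,
    pub fingerprint: String,
    pub pairing: Pairing,
}

/// Parse "key=value" TXT entries; returns None if a required key is missing.
pub fn parse_txt(entries: &[&str]) -> Option<Advert> {
    let (mut proto, mut fp, mut pairing) = (None, None, None);
    for entry in entries {
        // Entries without '=' are ignored rather than treated as errors.
        let Some((key, value)) = entry.split_once('=') else { continue };
        match key {
            "proto" => proto = Some(value.to_string()),
            "fp" => fp = Some(value.to_string()),
            "pair" => {
                pairing = match value {
                    "required" => Some(Pairing::Required),
                    "optional" => Some(Pairing::Optional),
                    _ => None,
                }
            }
            _ => {} // unknown keys are skipped for forward compatibility
        }
    }
    Some(Advert { proto: proto?, fingerprint: fp?, pairing: pairing? })
}
```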

Also refresh the now-stale sendmmsg caveat in the bitrate doc (batched/paced send
landed + validated to 1 Gbps) and mark the encode|send thread split done in §12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 10:37:12 +00:00
enricobuehler b295a5b7a9 perf(latency): encode|send thread split on the native path
ci / rust (push) Has been cancelled
Bigger-bet #1 from the latency plan. virtual_stream ran capture+encode+seal+
paced-send on ONE thread, so frame N+1's capture/encode couldn't start until
frame N's entire paced tail had left the wire — the pacing budget (~0.9×interval)
was serialized in front of the next encode. Port GameStream's spawn_sender model
to the native path:

- A dedicated send thread (`send_loop`) owns the WHOLE Session (so no socket
  clone or shared/Arc stats needed — `seal_frame` mutates the nonce, `send_sealed`
  + the probe bursts all live there) and does FEC+seal + microburst-paced send.
- The encode thread captures+encodes + handles reconfig and hands each AU over a
  bounded sync_channel(3) as a FrameMsg (data, capture_ns, flags, deadline,
  encode_us). It BLOCKS on backpressure if the send thread falls behind: frames slow
  down rather than a dropped frame freezing the infinite-GOP stream (we don't
  drop). Clean shutdown: drop the channel → send thread drains/exits → join.
- Probes (run_probe_burst) move to the send thread since they need the Session; a
  burst naturally pauses video (the encode thread blocks on the full channel).
- Per-frame encode_us/pace_us histogram moved to the send thread (carries
  encode_us in the FrameMsg) and now reflects the overlap.

Removes the encode↔paced-tail serialization (~2-8 ms @60-120 fps), independent of
the pacing policy, no quality cost. Substrate for the future NVENC slice wrapper.
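
The hand-off pattern above can be reduced to a standard-library sketch. It keeps the bounded `sync_channel(3)` and the shutdown order from the message; the `FrameMsg` fields are trimmed, the frames are synthetic, and `run_pipeline` is a hypothetical name.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Trimmed stand-in for the real FrameMsg.
pub struct FrameMsg {
    pub data: Vec<u8>,
    pub capture_ns: u64,
    pub encode_us: u32,
}

/// Push `frames` synthetic frames through an encode -> send hand-off and
/// return how many frames and bytes the send thread saw.
pub fn run_pipeline(frames: u32) -> (u32, usize) {
    // Bounded channel: at most 3 frames queued between the two threads.
    let (tx, rx) = sync_channel::<FrameMsg>(3);

    // Send thread: owns the receiving end (the real one owns the Session).
    let sender = thread::spawn(move || {
        let (mut count, mut bytes) = (0u32, 0usize);
        for msg in rx {
            count += 1;
            bytes += msg.data.len();
        }
        (count, bytes)
    });

    // Encode thread (here: the caller). send() blocks while the channel is
    // full, so frames slow down instead of being dropped.
    for i in 0..frames {
        let msg = FrameMsg {
            data: vec![0u8; 1200],
            capture_ns: u64::from(i) * 16_666_667,
            encode_us: 900,
        };
        if tx.send(msg).is_err() {
            break; // send thread is gone
        }
    }

    // Clean shutdown: drop the channel, the send thread drains and exits, join.
    drop(tx);
    sender.join().expect("send thread panicked")
}
```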

Verified live on this box (appliance restarted onto it): a client streamed the
KWin desktop (1.49 MB H.265, clean, no panic) and a 200 Mbps speed-test probe
completed through the send thread (0 drops). Build + clippy + fmt green.
Real-NIC sustained soak (reconfig under load, line-rate, mode switches) pending
the Ubuntu third host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 09:42:16 +00:00
enricobuehler 86f463cf71 fix(housekeeping): unaligned read UB + recv-drop parity; dedup mmsghdr; doc fixes
ci / rust (push) Has been cancelled
From a bug-hunt + unsafe-audit pass (4 reviewers + adversarial verify). It
confirmed ZERO real bugs in the recent batched/paced data-plane work — these are
the surfaced cleanups + one genuine soundness fix:

- SOUNDNESS (reduce unsafe): inject/gamepad.rs::pump_ff did `ptr::read` of an
  InputEventRaw (align 8, holds a timeval) out of a 1-aligned [u8; N] buffer — UB
  per the reference (x86_64 tolerates it, but it can miscompile under LTO). Use
  ptr::read_unaligned + a SAFETY note. Zero behavior change.
- recv parity: recv_batch (recvmmsg) didn't drop an oversized/truncated datagram
  the way scalar recv does — poll_frame now skips a message whose len fills the
  buffer (> MAX_DATAGRAM_BYTES), matching recv's `n >= RECV_BUF` drop. (AEAD
  already rejected these on encrypted sessions; this restores the documented
  invariant on the batched path.)
- dedup unsafe FFI: factor the identical mmsghdr-from-iovec construction out of
  send_batch + recv_batch into one `mmsghdrs()` helper — the raw-pointer
  scaffolding + its lifetime SAFETY note now live in one place.
- docs: TARGET_SOCKBUF no longer calls paced sending future work (it landed,
  m3.rs::paced_submit); gamescope.rs input is no longer "(TODO)" (wired +
  live-validated); the PUNKTFUNK_PERF `wire_mbps` field is renamed `tx_mbps` and
  noted as attempted/sealed bytes (send_dropped shows what didn't reach the wire).
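
The soundness fix in the first bullet follows a common pattern, sketched here. The `InputEventRaw` layout below is an assumption modeled on a 64-bit Linux `input_event`, and `read_event` is a hypothetical helper, not the code in inject/gamepad.rs.

```rust
use std::{mem, ptr};

/// Assumed layout: two 64-bit timeval fields followed by type/code/value.
/// Alignment is 8, which a byte buffer does not guarantee.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InputEventRaw {
    pub tv_sec: i64,
    pub tv_usec: i64,
    pub kind: u16,
    pub code: u16,
    pub value: i32,
}

/// Read one event from the start of a byte buffer of arbitrary alignment.
pub fn read_event(buf: &[u8]) -> Option<InputEventRaw> {
    if buf.len() < mem::size_of::<InputEventRaw>() {
        return None;
    }
    // SAFETY: the length check guarantees size_of::<InputEventRaw>() readable
    // bytes; the struct is repr(C) with only integer fields and no padding,
    // so every bit pattern is valid; read_unaligned has no alignment
    // requirement on the source pointer.
    Some(unsafe { ptr::read_unaligned(buf.as_ptr().cast::<InputEventRaw>()) })
}
```

Using `ptr::read` here instead would be undefined behavior whenever the buffer address is not a multiple of 8, even on hardware that tolerates unaligned loads.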

Full suite (35 + loopback round-trip + 6) + clippy + fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 23:04:23 +00:00
enricobuehler 99f60b5b08 perf(latency): microburst-cap pacing + per-frame latency histogram
ci / rust (push) Has been cancelled
From the latency investigation: the freeze-fix pacing (paced_submit) was the
single biggest software-controllable latency term — it unconditionally spread
EVERY multi-chunk frame over ~90% of the frame interval, adding up to ~7.5 ms
@120 / ~15 ms @60 to a frame's last packet even when the frame was small or the
link idle. Recover that on the common case while keeping the freeze fix:

- Microburst-cap pacing: a frame whose sealed size is <= a cap (default 128 KB,
  PUNKTFUNK_PACE_BURST_KB) goes out in ONE immediate burst — no pacing latency.
  Only the OVERFLOW of a bigger frame (IDR / sustained high bitrate, the bursts
  that actually overran the tx buffer and froze) is spread. 128 KB is well under
  the ~150 Mbps@60 frame size where drops began, so the default is safe; raise it
  after confirming send_dropped stays 0 on a given link. Still never slower than
  unpaced (budget collapses to 0 with no slack). seal-once/in-order nonce
  preserved — chunks are split, never reordered or re-sealed.
- Per-frame instrumentation (PUNKTFUNK_PERF, zero-cost off): encode_us +
  pace_us (the pacing tail) p50/p99/max histograms + immediate-vs-paced frame
  counts in the periodic perf line, so the pacing tail is finally visible and the
  cap is tunable against real numbers.
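
The microburst cap amounts to splitting a frame's sealed packets into an immediate prefix and a paced overflow without reordering. A sketch under assumptions: `split_frame` is a hypothetical name, 128 KB is taken as 128 * 1024 bytes, and the cap is applied at packet granularity.

```rust
pub const DEFAULT_BURST_CAP_BYTES: usize = 128 * 1024;

/// Split sealed packets into (immediate burst, paced overflow). Order is
/// preserved; packets are only partitioned, never re-sealed.
pub fn split_frame(packets: &[Vec<u8>], cap_bytes: usize) -> (&[Vec<u8>], &[Vec<u8>]) {
    let mut total = 0usize;
    let mut immediate = packets.len();
    for (i, pkt) in packets.iter().enumerate() {
        total = total.saturating_add(pkt.len());
        if total > cap_bytes {
            // Packet i is the first one that would exceed the cap.
            immediate = i;
            break;
        }
    }
    packets.split_at(immediate)
}
```

A frame at or under the cap yields an empty overflow slice, so it pays no pacing latency at all.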

Host builds + clippy + fmt green. NOT yet deployed to the running hosts (still on
the safe full-pacing A+B build) — needs the user's LAN soak to validate the cap
doesn't reintroduce send_dropped before raising it. Deferred bigger bets (need
real-NIC/GPU/Mac validation): encode|send thread split on the native path,
CUDA stream+event (one redundant sync), NVENC slice wrapper, stage-2 Apple
presenter, glass-to-glass probe — see docs/roadmap.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:53:52 +00:00
enricobuehler 2f4f92a804 feat(1gbps): batched client recv via recvmmsg (increment C)
ci / rust (push) Has been cancelled
Final increment of the 1 Gbps data-plane rework — the recv counterpart of the
sendmmsg work. The client recv path did one recvfrom + one Vec allocation per
packet (and the pump's 300µs idle sleep could let packets pile up at line rate).

- Transport gains recv_batch(&mut [Vec<u8>], &mut [usize]) -> count; default is
  a single scalar recv into out[0] (loopback + non-Linux).
- UdpTransport overrides it on Linux with recvmmsg (MSG_DONTWAIT) draining up to
  N datagrams per syscall into the caller's reused buffers — no per-packet alloc.
- Session::poll_frame owns a lazily-allocated recv ring (RECV_BATCH=32) and
  consumes it one packet at a time across calls, refilling with one recvmmsg when
  drained. Encapsulated: the punktfunk-client-rs + NativeClient pumps are
  unchanged, and draining a batch per syscall means the 300µs sleep no longer
  underdrains. Added UdpTransport::local_addr (used by the test, generally handy).
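
The trait shape described in the first bullet, a batched receive whose default falls back to one scalar receive, can be sketched like this. The `io::Result` return types and the in-memory `QueueTransport` are assumptions for illustration; the real trait and the recvmmsg override are not shown here.

```rust
use std::collections::VecDeque;
use std::io;

pub trait Transport {
    /// Scalar receive of one datagram into `buf`.
    fn recv(&mut self, buf: &mut [u8]) -> io::Result<usize>;

    /// Batched receive into caller-owned buffers; returns the datagram count.
    /// Default: a single scalar recv into bufs[0]. A Linux UDP transport
    /// would override this with recvmmsg.
    fn recv_batch(&mut self, bufs: &mut [Vec<u8>], lens: &mut [usize]) -> io::Result<usize> {
        if bufs.is_empty() || lens.is_empty() {
            return Ok(0);
        }
        match self.recv(&mut bufs[0]) {
            Ok(n) => {
                lens[0] = n;
                Ok(1)
            }
            // Nothing queued right now is not an error for a batch drain.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(0),
            Err(e) => Err(e),
        }
    }
}

/// In-memory transport used only to exercise the default implementation.
pub struct QueueTransport {
    pub queue: VecDeque<Vec<u8>>,
}

impl Transport for QueueTransport {
    fn recv(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let Some(pkt) = self.queue.pop_front() else {
            return Err(io::ErrorKind::WouldBlock.into());
        };
        let n = pkt.len().min(buf.len());
        buf[..n].copy_from_slice(&pkt[..n]);
        Ok(n)
    }
}
```

Because the caller owns and reuses `bufs`, neither the default nor an overriding implementation needs to allocate per packet.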

~125k → ~4k recv syscalls/sec at line rate, zero per-packet recv allocation.
Verified: new recv_batch_drains_over_loopback test (50 datagrams drained intact
via recvmmsg) + the existing loopback round-trip now runs through the batched
poll_frame; full suite (35 + round-trip + 6) + clippy + fmt green.

Decode-in-place (kill the per-packet open_from_wire alloc) is a separate later
optimization. With A (sendmmsg) + B (paced send) + C (recvmmsg), the native data
plane is batched + paced end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:39:51 +00:00
enricobuehler 10a932d013 feat(1gbps): pace per-frame sends so high-bitrate frames don't burst-drop
ci / rust (push) Has been cancelled
Increment B of the send-path rework — the actual fix for "freezes get more
common over ~150 Mbps, no image at all at 400 Mbps" on the native path. Cause:
the encoder emits a frame and submit_frame blasted ALL its packets at once into
the NIC; a real link drops the line-rate burst (host send buffer EAGAINs), and
under infinite GOP one dropped frame freezes the decode until the next keyframe.
(The speed-test probe showed 0 drops at 400 Mbps because the probe is self-paced;
real video wasn't.)

Adaptive pacing, no extra thread, no regression:
- Session splits into seal_frame (FEC + packetize + seal → wire packets, no
  send) and send_sealed (one batched sendmmsg of a chunk, counts drops);
  submit_frame is now their composition (synthetic + probe paths unchanged).
- virtual_stream's paced_submit seals a frame then sends it in 16-packet chunks
  spread over ~90% of the time until the next frame is due. At 60 fps desktop
  (fast encode → lots of slack) the frame spreads across the interval → no NIC
  burst → no freeze. At 240 fps@5K (encode ≈ interval → ~0 slack) the budget
  collapses and every chunk goes out immediately → never slower than before.
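
The pacing arithmetic above can be sketched as a pure function. This is only one plausible reading of "16-packet chunks spread over ~90% of the time until the next frame is due": the name `chunk_gap`, the even division of the budget across chunks, and the integer 9/10 scaling are assumptions, not the actual `paced_submit` code.

```rust
use std::time::Duration;

/// Packets per chunk, as stated in the commit message.
pub const CHUNK_PACKETS: usize = 16;

/// Delay to insert after each chunk so a frame's packets spread over
/// roughly 90% of the slack before the next frame is due.
/// (Hypothetical helper; the real scheduling may differ.)
pub fn chunk_gap(packet_count: usize, time_until_next_frame: Duration) -> Duration {
    let chunks = (packet_count + CHUNK_PACKETS - 1) / CHUNK_PACKETS;
    if chunks <= 1 {
        // A single chunk has nothing to spread: send it immediately.
        return Duration::ZERO;
    }
    // Integer math avoids float rounding: budget = 90% of the slack.
    let budget = time_until_next_frame * 9 / 10;
    // With zero slack the budget collapses and every gap is zero,
    // so the frame goes out as fast as it did before pacing.
    budget / chunks as u32
}
```

For example, a 160-packet frame with 10 ms of slack yields 10 chunks and a 900 µs gap, while the same frame with no slack yields a zero gap.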

Core suite (34 + loopback round-trip + 6) + clippy + fmt green. The seal/send
split is covered by the existing loopback tests; the pacing is host timing,
verified by review (live-test needs a real NIC — your Mac at a raised bitrate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 22:15:52 +00:00
enricobuehler c24b571e37 feat(1gbps): batched send via sendmmsg (Transport::send_batch)
ci / rust (push) Has been cancelled
First increment of the 1 Gbps send-path rework (the measured bottleneck): the
native data plane did one send() syscall per packet — at ~125k pkt/s (1 Gbps
wire) that burns a core on syscalls. Port the proven GameStream sendmmsg path
into the core Transport seam.

- Transport gains `send_batch(&[&[u8]]) -> usize` (count handed to the kernel;
  caller counts the rest as send-buffer drops). Default = the scalar send loop
  (loopback transport + non-Linux).
- UdpTransport overrides it on Linux with `sendmmsg` (64 datagrams/syscall);
  the connected socket needs no per-message address. Non-blocking-aware: a full
  send buffer yields a short count / EAGAIN, and we stop + report what went out
  rather than block or retry (same lossy, FEC-protected contract as send()).
- Session::submit_frame seals every shard then hands the whole frame to
  send_batch in ONE call instead of looping send() — ~64x fewer syscalls per
  frame on the native + GameStream-over-core paths; send_dropped accounting
  preserved (total - sent).
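
A minimal sketch of the default (scalar-loop) `send_batch` and the `total - sent` drop accounting described above. The trait is a simplified stand-in with assumed signatures, and `CappedTransport` is a hypothetical test double that simulates a send buffer filling up.

```rust
use std::io;

/// Simplified stand-in for the transport seam (assumed signatures).
pub trait Transport {
    /// Ok(true) = handed to the kernel, Ok(false) = send buffer full.
    fn send(&mut self, packet: &[u8]) -> io::Result<bool>;

    /// Default: scalar send loop. Returns how many packets went out and
    /// stops at the first failure rather than blocking or retrying.
    /// A Linux UDP transport would override this with sendmmsg.
    fn send_batch(&mut self, packets: &[&[u8]]) -> usize {
        let mut sent = 0;
        for p in packets {
            match self.send(p) {
                Ok(true) => sent += 1,
                _ => break,
            }
        }
        sent
    }
}

/// Hypothetical transport that accepts `capacity` packets, then reports
/// a full send buffer for everything after that.
pub struct CappedTransport {
    pub capacity: usize,
    pub delivered: Vec<Vec<u8>>,
}

impl Transport for CappedTransport {
    fn send(&mut self, packet: &[u8]) -> io::Result<bool> {
        if self.delivered.len() >= self.capacity {
            return Ok(false);
        }
        self.delivered.push(packet.to_vec());
        Ok(true)
    }
}
```

The caller derives the drop count as `packets.len() - sent`, so the batched and scalar paths share one accounting rule.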

~125k → ~2k syscalls/sec at 1 Gbps line rate. Verified: new loopback-UDP test
send_batch_delivers_over_loopback (100 batched packets arrive intact, datagram
boundaries preserved); full core suite + clippy + fmt green.
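
The syscall-rate figures can be reproduced with a little arithmetic. The sketch assumes roughly 1000-byte wire packets, which is what makes 1 Gbps come out at about 125k packets per second; the actual packet size is not stated in the commit, and both helper names are hypothetical.

```rust
/// Packets per second at a given wire bitrate.
/// `packet_bytes` is an assumed average wire packet size.
pub fn packets_per_sec(bitrate_bps: u64, packet_bytes: u64) -> u64 {
    bitrate_bps / (packet_bytes * 8)
}

/// Syscalls per second when each syscall moves up to `batch` datagrams
/// (rounded up, since a partial batch still costs one syscall).
pub fn syscalls_per_sec(pps: u64, batch: u64) -> u64 {
    (pps + batch - 1) / batch
}
```

With these assumptions, `packets_per_sec(1_000_000_000, 1000)` gives 125,000, and batching 64 datagrams per call gives 1,954 syscalls per second, in line with the "~2k" figure.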

Next increments: a paced send thread (microburst shaping so a real NIC doesn't
drop line-rate bursts) and recvmmsg on the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 21:55:22 +00:00
enricobuehler b8a33e21a2 feat(1gbps): raise bitrate/probe clamps + socket buffers, count send-buffer drops
ci / rust (push) Has been cancelled
First step of 1 Gbps+ readiness (the whole point of the GF(2^16) Leopard FEC):
make 1 Gbps configurable and its dominant failure mode observable, before the
real transport work (sendmmsg + paced encode|send split) lands.

Investigation (6-way) verdict: we're ~halfway, and it's mostly clamps plus one
real piece of work. The integer/type path, FEC (a 1 Gbps frame is only a few
hundred shards in one GF(2^16) block, far under the 65535 ceiling), AES-GCM
(AES-NI, ~10-25x headroom), and the M1 reassembler bounds (fully derived from
the negotiated FecConfig) are ALL already 1 Gbps-ready and untouched.

This commit (the configurable + observable foundation):
- m3.rs: MAX_BITRATE_KBPS 500_000 -> 2_000_000 (2 Gbps headroom over the 1 Gbps+
  target); MAX_PROBE_KBPS 1_000_000 -> 3_000_000 (probe can demonstrate headroom
  ABOVE the session cap so a client can confidently pick a 1 Gbps+ bitrate).
- transport/udp.rs: TARGET_SOCKBUF 8 MB -> 32 MB (a multi-MB IDR keyframe burst
  no longer fills the buffer); scripts/99-punktfunk-net.conf bumped to match.
- Observability: Transport::send now returns Ok(true|false) (false = WouldBlock
  send-buffer drop, previously a silent Ok(())). Session counts these as a new
  `packets_send_dropped` stat (distinct from recv-side packets_dropped) — in
  Stats, the C ABI PunktfunkStats (header regenerated), a PUNKTFUNK_PERF periodic
  wire-Mbps + drop dump in virtual_stream, and the speed-test probe completion
  log. This is the dominant 1 Gbps+ loss mode and was invisible.
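
The `Ok(true|false)` contract from the Observability item could be sketched as follows with a plain non-blocking `std::net::UdpSocket`. The free function `send_lossy` and the trimmed-down `Stats` struct are hypothetical; only the `packets_send_dropped` field name comes from the commit message.

```rust
use std::io::{self, ErrorKind};
use std::net::UdpSocket;

/// Send one datagram on a connected, non-blocking socket.
/// Ok(true)  = handed to the kernel.
/// Ok(false) = WouldBlock: the send buffer was full and the packet is dropped.
/// Any other error is still returned to the caller.
pub fn send_lossy(sock: &UdpSocket, packet: &[u8]) -> io::Result<bool> {
    match sock.send(packet) {
        Ok(_) => Ok(true),
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(false),
        Err(e) => Err(e),
    }
}

/// Trimmed-down stats holder (hypothetical shape).
#[derive(Default, Debug)]
pub struct Stats {
    pub packets_sent: u64,
    /// Send-side drops, kept separate from recv-side packets_dropped.
    pub packets_send_dropped: u64,
}

impl Stats {
    pub fn record(&mut self, sent: bool) {
        if sent {
            self.packets_sent += 1;
        } else {
            self.packets_send_dropped += 1;
        }
    }
}
```

Returning a boolean instead of swallowing `WouldBlock` is what lets the session count send-buffer drops without treating them as fatal errors.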

Loopback-verified: a probe now runs at 1.2 Gbps target (no longer truncated to
1 Gbps) with the drop counter live. NOT yet a sustained-1-Gbps proof — the
single-send()-per-packet native path is the next, real piece of work (port the
proven GameStream sendmmsg + paced send thread into the core Transport).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 20:45:49 +00:00
enricobuehler 7cac1eb663 feat(host): log virtual DualSense pad creation (match the Xbox path)
ci / rust (push) Has been cancelled
The uinput Xbox 360 backend logs "virtual gamepad created" on success, but
the UHID DualSense backend logged only on failure — so a working DualSense
session was silent and indistinguishable in the logs from one where no pad
was ever created. Add the matching success log.

This makes a DualSense-not-working report self-diagnosing: the host now logs
either "virtual DualSense created (UHID hid-playstation)" or the existing
"virtual DualSense creation failed — controller input disabled" (which fires
when /dev/uhid isn't writable — i.e. the 60-punktfunk.rules uhid rule isn't
installed or the user isn't in the 'input' group).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 18:51:23 +00:00
enricobuehler 74819b1be8 feat(punktfunk/1): negotiable encoder bitrate + bandwidth speed-test probe
ci / rust (push) Has been cancelled
Two related additions to the native protocol, host-side (the client side of
each is exposed over the C ABI so the platform clients can wire it up).

Bitrate negotiation
- Hello/Welcome carry `bitrate_kbps` (appended trailing-byte field, back-compat:
  old peers decode 0 = host default). The client requests a rate; the host
  clamps it to [500 kbps, 500 Mbps] (or its 20 Mbps default when 0) and echoes
  the resolved value in Welcome. Replaces the hardcoded 20 Mbps NVENC bitrate in
  m3.rs — threaded through virtual_stream → build_pipeline → open_video, applied
  on the initial mode and every reconfigure rebuild.
- C ABI: punktfunk_connect_ex3(..., bitrate_kbps, ...) (ex2 delegates with 0);
  punktfunk_connection_bitrate() reads the resolved value.
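
The clamp-or-default rule described above can be sketched as a small pure function. `MAX_BITRATE_KBPS` is named elsewhere in this log; `MIN_BITRATE_KBPS`, `DEFAULT_BITRATE_KBPS`, and `resolve_bitrate_kbps` are hypothetical names, with values taken from this commit message.

```rust
/// Lower clamp: 500 kbps (hypothetical constant name).
pub const MIN_BITRATE_KBPS: u32 = 500;
/// Upper clamp at the time of this commit: 500 Mbps.
pub const MAX_BITRATE_KBPS: u32 = 500_000;
/// Host default used when the client sends 0 (hypothetical constant name).
pub const DEFAULT_BITRATE_KBPS: u32 = 20_000;

/// Resolve the client's requested bitrate into the value echoed in Welcome.
/// 0 means "no preference" (also what an old peer decodes to).
pub fn resolve_bitrate_kbps(requested: u32) -> u32 {
    if requested == 0 {
        DEFAULT_BITRATE_KBPS
    } else {
        requested.clamp(MIN_BITRATE_KBPS, MAX_BITRATE_KBPS)
    }
}
```

Treating 0 as "use the host default" is what keeps old clients working: they never send the field, so the host decodes 0 and behaves as before.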

Speed test (bandwidth probe)
- New typed control messages ProbeRequest{target_kbps,duration_ms} (0x20) /
  ProbeResult{bytes_sent,packets_sent,duration_ms} (0x21), plus a FLAG_PROBE
  packet flag. The client asks the host to burst zero-filled, FLAG_PROBE-tagged
  access units over the data plane at a target goodput for a duration (clamped
  ≤ 1 Gbps / ≤ 5 s), pacing by a bytes-allowed budget; video pauses for the
  burst. The host reports what it actually sent; the client measures received
  bytes + window → goodput and loss. Probe filler is never fed to the decoder
  (diverted in the connector pump and the reference client's poll loop).
- The host control task now multiplexes Reconfigure + ProbeRequest (inbound)
  and ProbeResult (outbound) over select!; a probe channel reaches the
  data-plane thread (both virtual and synthetic sources).
- Connector: NativeClient::request_probe()/probe_result() with an internal
  accumulator; C ABI punktfunk_connection_speed_test() +
  punktfunk_connection_probe_result() → PunktfunkProbeResult.
- punktfunk-client-rs gains `--bitrate KBPS` and `--speed-test KBPS:MS` (its own
  loop measures + logs goodput/loss) for loopback verification.
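
The client-side measurement step (received bytes plus window, yielding goodput and loss) might be computed along these lines. The `ProbeResult` fields mirror the message described above; `ProbeMeasurement`, `measure`, and the byte-based loss definition are assumptions for illustration.

```rust
/// What the host reports after the burst (fields as listed in the commit).
pub struct ProbeResult {
    pub bytes_sent: u64,
    pub packets_sent: u64,
    pub duration_ms: u32,
}

/// Client-side view of the probe (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ProbeMeasurement {
    pub goodput_kbps: u64,
    pub loss_pct: f64,
}

/// Combine the host's report with what the client actually received.
/// Returns None when there is nothing to measure.
pub fn measure(host: &ProbeResult, bytes_received: u64, window_ms: u64) -> Option<ProbeMeasurement> {
    if window_ms == 0 || host.bytes_sent == 0 {
        return None;
    }
    // bits per millisecond is numerically the same as kbit/s.
    let goodput_kbps = bytes_received * 8 / window_ms;
    // Loss is assumed here to be byte-based: the share of sent bytes
    // that never arrived.
    let lost = host.bytes_sent.saturating_sub(bytes_received);
    let loss_pct = lost as f64 * 100.0 / host.bytes_sent as f64;
    Some(ProbeMeasurement { goodput_kbps, loss_pct })
}
```

Measuring against the client's own receive window, rather than the host's `duration_ms`, keeps the goodput figure honest when packets arrive late or bunched.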

Validated on loopback (synthetic source): a 20 Mbps / 2 s probe measured
20050 kbps at 0% loss, bitrate negotiated (0→20000 and 50000→50000), and the
interleaved probe AUs were correctly excluded from frame verification
(mismatched=0). Wire codecs + trailing-byte back-compat have unit tests. C
header regenerated.
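
The trailing-field back-compat pattern mentioned above can be illustrated with a short decoder. The wire layout here is an assumption: the sketch treats `bitrate_kbps` as a little-endian `u32` appended after a fixed-length prefix, and `decode_trailing_bitrate_kbps` is a hypothetical name, not the project's codec.

```rust
/// Decode an optional trailing bitrate field.
/// `fixed_len` is the length of the message as old peers send it.
/// A message that ends at `fixed_len` (an old peer) decodes to 0,
/// which the host treats as "use the default".
pub fn decode_trailing_bitrate_kbps(msg: &[u8], fixed_len: usize) -> u32 {
    match msg.get(fixed_len..fixed_len + 4) {
        // Assumed encoding: little-endian u32.
        Some(b) => u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        // Field absent or truncated: fall back to 0.
        None => 0,
    }
}
```

New fields only ever go at the end, so an old decoder simply ignores bytes it does not know about and a new decoder tolerates their absence.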

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 18:44:47 +00:00
enricobuehler c894c6f897 feat(host): host-managed gamescope session at the client's mode (dynamic res + refresh)
ci / rust (push) Has been cancelled
Nested games on the Bazzite host saw the wrong display: refresh capped at 60 Hz,
the EDID modes of the TV connected to the box leaking in (DOOM landed on 2560×1440@60), and
the resolution fixed at whatever the always-on session was launched at — the
client's requested mode never reached the game. Root causes: the session-plus
gamescope command has no --nested-refresh (Xwayland advertises 59.96 Hz for every
mode), --prefer-output HDMI-A-1 makes gamescope read the TV EDID, and the ATTACH
model launches one fixed-resolution session.

New vdisplay path: PUNKTFUNK_GAMESCOPE_SESSION=<client> — the host LAUNCHES
gamescope-session-plus headless AT THE CLIENT'S mode and relaunches it when the
mode changes. Injected via a host-written GAMESCOPE_BIN wrapper (--nested-refresh
$PF_HZ, the flag session-plus doesn't expose) + DRM_MODE=cvt (gamescope generates
clean CVT modes at that refresh instead of the TV's EDID). The session runs as a
transient `systemd-run --user` unit (clean cgroup teardown of the Steam tree);
state lives in a host-lifetime static (MANAGED_SESSION), NOT in GamescopeDisplay
(which is per-client-session) — so a same-mode reconnect REUSES the running
session instantly (no Steam restart) while a different mode RELAUNCHES it (games
can't change output mode live; a game/Steam restart on a mode change is
unavoidable and acceptable). Reuses the existing node + EIS auto-discovery
(find_gamescope_node / find_gamescope_eis_socket, factored into
point_injector_at_eis) and the existing mid-stream Reconfigure → vd.create(mode)
machinery — no protocol or m3 control-flow change.
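
The reuse-versus-relaunch decision around the host-lifetime static can be sketched as below. Only the name `MANAGED_SESSION` comes from the commit message; the `Mode` and `SessionAction` types and the `ensure_session` function are hypothetical, and the actual gamescope launch and teardown are omitted.

```rust
use std::sync::Mutex;

/// Client-requested display mode (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SessionAction {
    /// No session was running: start one.
    Launch,
    /// Same mode as the running session: keep it, no Steam restart.
    Reuse,
    /// Different mode: tear down and start again at the new mode.
    Relaunch,
}

/// Host-lifetime state, deliberately outside any per-client object so it
/// survives client disconnects.
static MANAGED_SESSION: Mutex<Option<Mode>> = Mutex::new(None);

/// Decide what to do for a connecting client and record the new mode.
/// The real code would launch or stop the session here; this sketch
/// only tracks the decision.
pub fn ensure_session(requested: Mode) -> SessionAction {
    let mut current = MANAGED_SESSION.lock().unwrap();
    let action = match *current {
        Some(mode) if mode == requested => SessionAction::Reuse,
        Some(_) => SessionAction::Relaunch,
        None => SessionAction::Launch,
    };
    *current = Some(requested);
    action
}
```

Keeping the state in a static rather than in the per-client display object is what lets a same-mode reconnect find the session still running.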

Validated live on bazzite (RTX 4090): games' Xwayland now advertises 5120×1440 @
239.90 Hz as the preferred mode (was 59.96), the TV's 3840×2160/4096×2160@60 modes
are gone, frames stream; reconnect at 1920×1080@120 relaunches and games see that;
same-mode reconnect reuses with no restart and frames flow instantly.

scripts: host.env.example documents PUNKTFUNK_GAMESCOPE_SESSION (mutually exclusive
with the legacy NODE=auto attach); punktfunk-steam-session.service marked
deprecated (superseded — must not run alongside the host-managed path).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 16:14:10 +00:00
enricobuehler 1d605fb781 feat(gamepad): controller discovery + client-negotiated pad type + rich DualSense end to end
The Apple client grows full gamepad support and punktfunk/1 learns to negotiate
the virtual pad type:

- Protocol: Hello carries a GamepadPref byte (offset 21, the same trailing-byte
  back-compat pattern as the compositor; echoed resolved in Welcome at 54).
  Host precedence: explicit client choice > PUNKTFUNK_GAMEPAD env > Xbox 360,
  DualSense (UHID) only where available. ABI: punktfunk_connect_ex2 +
  punktfunk_connection_gamepad (connect_ex delegates; ABI_VERSION stays 2 — the
  trailing byte IS the compat mechanism). punktfunk-client-rs gets --gamepad.

- Swift client: GamepadManager (app-lifetime discovery + selection — Settings
  lists every controller with capabilities/battery/"In use"; exactly ONE pad
  forwards as pad 0, auto = most recently connected, or pinned), GamepadCapture
  (snapshot-diff button/axis events, DualSense touchpad + ~250 Hz motion on the
  rich-input plane, held state released on switch/deactivate/stop),
  GamepadFeedback (rumble → CoreHaptics per-handle engines; lightbar →
  GCDeviceLight; player LEDs → playerIndex; adaptive-trigger blocks → the
  table-driven DualSenseTriggerEffect parser → GCDualSenseAdaptiveTrigger,
  exact for the 10-zone positional modes). The pad type auto-resolves from the
  physical controller at connect time, user-overridable in Settings.

- Host DualSense fixes surfaced by adversarial review against hid-playstation /
  SDL / Nielk1 ground truth: input-report sensor/touch offsets were off by one
  (the kernel read garbage motion + phantom touches), the L2/R2 trigger blocks
  were swapped (the report is right-trigger-first), feedback now gates on the
  report's valid-flags (a plain rumble write no longer blanks lightbar/
  triggers), and the touchpad rescale clamps to the advertised ABS_MT extents.

- Tests: Hello/Welcome trailing-byte back-compat, pick_gamepad precedence,
  byte-exact input-report layout, valid-flag gating, per-mode trigger-parser
  table (incl. packed 3-bit zones), wire conversions, and a scripted loopback
  feedback burst (PUNKTFUNK_TEST_FEEDBACK=1) asserted through the xcframework
  on the rumble + HID-output planes.
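
The host precedence rule from the Protocol item (explicit client choice, then the PUNKTFUNK_GAMEPAD environment setting, then Xbox 360, with DualSense only where UHID is available) could be sketched like this. `pick_gamepad` is named in the test list above, but its signature and the `PadType` enum here are assumptions.

```rust
/// Virtual pad types the host can create (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PadType {
    Xbox360,
    DualSense,
}

/// Resolve the pad type to create for a session.
/// `client` is the explicit choice from Hello (None = no preference),
/// `env` is the parsed PUNKTFUNK_GAMEPAD value (None = unset),
/// `dualsense_available` says whether the UHID backend can be used.
pub fn pick_gamepad(
    client: Option<PadType>,
    env: Option<PadType>,
    dualsense_available: bool,
) -> PadType {
    // Precedence: client choice > environment > Xbox 360 default.
    let wanted = client.or(env).unwrap_or(PadType::Xbox360);
    match wanted {
        // Fall back when the DualSense backend cannot be used.
        PadType::DualSense if !dualsense_available => PadType::Xbox360,
        other => other,
    }
}
```

The resolved value, not the requested one, is what would be echoed back in Welcome, so the client always knows which pad the host actually created.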

Validated: cargo test/clippy/fmt green on macOS + Linux (61 host tests), swift
build/test green, test-loopback.sh green, tvOS/iOS targets compile. DualSense
motion sign/scale is derived from the calibration blob, not yet live-verified
(constants isolated in GamepadWire).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 16:28:33 +02:00
enricobuehler 0f333460ec fix(core): grow UDP socket buffers — fixes 4K/5K video freezing on one frame
The data-plane UDP sockets used the OS default buffer (~208 KB on Linux, similar
on macOS), which is smaller than a single high-resolution frame burst: a
5120×1440 keyframe is ~130 packets the encode|send thread hands to sendmmsg at
once. The burst overflows the buffer — EAGAIN on the host send (now dropped, was
fatal) or a silent drop on the client recv — and because the data plane runs
infinite-GOP, one lost frame breaks every subsequent reference and the decode
freezes on the last good frame until an RFI refresh that may never catch up.
Symptom: connect at 5120×1440, see ONE frame, then a frozen image (audio + input
keep working — those ride QUIC, not this socket).

Set SO_SNDBUF/SO_RCVBUF to 8 MB (clamped by the OS to net.core.{w,r}mem_max on
Linux / kern.ipc.maxsockbuf on macOS); warn if the grant lands far below target so
an undersized host is diagnosable. The client side matters most — the SAME
UdpTransport backs the Apple client's data plane via the C ABI, and macOS grants
multi-MB buffers without any sysctl, so a rebuilt client stops losing frames.
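
The "warn if the grant lands far below target" check might look like the following sketch. The 8 MB target is from this commit and `TARGET_SOCKBUF` is named elsewhere in this log; the function name and the "less than half the target" threshold are assumptions, since the commit does not say what counts as far below.

```rust
/// Requested SO_SNDBUF / SO_RCVBUF size at the time of this commit: 8 MB.
pub const TARGET_SOCKBUF: usize = 8 * 1024 * 1024;

/// Return a warning message when the OS granted much less than requested.
/// "Much less" is assumed here to mean under half the target.
pub fn undersized_grant_warning(granted: usize) -> Option<String> {
    if granted < TARGET_SOCKBUF / 2 {
        Some(format!(
            "socket buffer grant {} KB is far below the {} KB target; raise the OS limit",
            granted / 1024,
            TARGET_SOCKBUF / 1024
        ))
    } else {
        None
    }
}
```

With this threshold, a 416 KB grant like the one mentioned in the validation notes below would produce a warning, while a full 8 MB grant would not.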

Validated live, bazzite→client at 5120×1440: was 1319/1500 frames (12% loss →
freeze), now 1500/1500 @60 and 5279/5279 @240 (split-encode active), zero
mismatches, p50 1.9–3.4 ms. Host send buffer was still capped at 416 KB and lost
nothing — the loss was purely the client recv buffer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:28:02 +00:00