punktfunk/CLAUDE.md
enricobuehler bf8a974e8b
feat: M4 stage 1 — the SwiftUI client is real: compiles, tested, first light on glass
The clients/apple scaffold is now a working macOS client, validated live against this
repo's host across the LAN: gamescope virtual output → NVENC HEVC → lumen/1 (GF(2¹⁶) FEC +
AES-GCM over UDP, QUIC control) → VideoToolbox → AVSampleBufferDisplayLayer at 720p60,
mouse/keyboard flowing back as QUIC datagrams into the host's gamescope EIS injector
(~3.7k events injected in one session).

LumenKit:
- LumenConnection: the predicted cbindgen compile fixes (C17 header spells the typedefs as
  integers while the enum constants import as a distinct Swift type — bridge by rawValue);
  close() is now safe from any thread (a close flag + pumpLock held across the blocking
  poll enforce the C contract "never close with a next_au in flight"; flag prevents
  lock-starvation by back-to-back polls).
- StreamView: per-pump cancellation token (reconnects can't double-pump), flush + re-gate
  on the next in-band parameter sets when the layer fails, no stale enqueue after restart.
- InputCapture: fractional-delta accumulation (sub-pixel motion isn't truncated away),
  pressed-state tracking with release-all on focus loss and stop() (nothing sticks down
  host-side), global-singleton ownership guard (GC has one handler slot per process),
  X1/X2 buttons, horizontal scroll, full keypad/CapsLock/ISO-102nd/PrintScreen/Menu VKs.
- LumenClient app shell (swift run LumenClient): connect form, fps/Mb-s HUD,
  LUMEN_AUTOCONNECT/LUMEN_MODE for scripted first-light runs.
- Tests: Annex-B byte-level units; real-codec round trip (VTCompressionSession-encoded
  HEVC rebuilt as the host's wire shape → AnnexB → VTDecompressionSession → pixels);
  test-loopback.sh (Swift client vs a real local m3-host over loopback — the Swift twin of
  c_abi_connection_roundtrip); RemoteFirstLightTests (full pipeline over the LAN).

Host/build fixes that fell out:
- The workspace builds on non-Linux again: gamestream audio (opus) and sendmmsg batching
  are now platform-gated with stubs/fallback, per the crate's "compiles everywhere" rule.
- Horizontal scroll was inverted end-to-end: the injectors negated BOTH axes onto the
  ei/wl axes, but GameStream's horizontal convention is positive = right
  (moonlight-qt/Sunshine pass it through unnegated) — only vertical flips now. This also
  un-inverts real Moonlight clients.
- AnnexB drops all zeros preceding a start code (trailing_zero_8bits padding), ffmpeg's
  policy, instead of leaking them into the preceding NAL.
- build-xcframework.sh: deployment targets pinned to the package floor + an otool guard —
  cargo does not fingerprint MACOSX_DEPLOYMENT_TARGET, so warm caches can silently ship
  too-new minos objects.

Adversarially reviewed (5-dimension multi-agent pass, every finding refutation-verified):
14 confirmed findings, all fixed above; the send-while-polling core-contract gap flagged
here is closed by the lumen/1 session-planes work (&self pulls + per-plane borrow slots).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:46:45 +02:00


CLAUDE.md — lumen

Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protocol core (lumen-core) exposed over a C ABI and native clients per platform. Full design: docs/implementation-plan.md. Status table: README.md.

Where the work stands

  • M1 (lumen-core + C ABI): complete and hardened. FEC recovery, loopback-under-loss, proptests, C ABI harness all green; 13 adversarial-review findings fixed + regression-tested (a913042).
  • M2 (GameStream host): working end-to-end with a stock Moonlight client. Validated live on this box: pairing (persists across restarts), serverinfo/applist (app catalog from ~/.config/lumen/apps.json → each entry picks a compositor + nested command), RTSP, ENet control, audio, and video at the client's native resolution and refresh — the host creates a per-session virtual output via per-compositor VirtualDisplay backends: KWin (zkde_screencast stream_virtual_output, needs KWin ≥ 6.5.6 headless; >60 Hz via custom modes), gamescope (spawned headless at WxH@Hz, its PipeWire node captured, needs gamescope ≥ 3.16.22 — older deadlocks on PipeWire ≥ 1.6), Mutter (D-Bus RecordVirtual; implemented, live validation pending a gnome-shell install). Performance work landed and measured: GPU zero-copy on all paths (tiled dmabuf → EGL/GL → CUDA; LINEAR dmabuf → Vulkan bridge → CUDA → NVENC), auto 2-way NVENC split-encode above ~1 Gpix/s (5K@240), infinite GOP + RFI keyframes (killed the periodic freeze), encode|send thread split with sendmmsg batching. Stable 240 fps at 5120×1440. Input: mouse/keyboard (libei via RemoteDesktop portal on KWin/GNOME, gamescope's own EIS socket, wlr protocols on Sway) and gamepads (uinput X-Box-360 pads + rumble back-channel; live validation pending the udev rule below). Management REST API + checked-in OpenAPI doc (mgmt.rs).
  • M3 (lumen/1, the native protocol): full session planes, validated live. QUIC control plane (lumen-core quic feature: Hello{mode}/Welcome{full Config}/Start), data plane = the hardened M1 Session over raw UDP with GF(2¹⁶) Leopard FEC + AES-GCM (inexpressible in GameStream), host creates the native virtual output at the client's requested mode. m3-host is a persistent listener (sessions back to back; --max-sessions). QUIC datagrams carry the side planes, demuxed by first byte: input 0xC8 (incl. gamepads — incremental events accumulated into the uinput xpad), Opus audio 0xC9 (48 kHz stereo, 5 ms, host→client), rumble 0xCA (host→client). Trust: host serves its persistent identity (~/.config/lumen/cert.pem, shared with GameStream pairing) and logs the SHA-256 fingerprint; clients pin it (TOFU on first connect — endpoint::client_pinned). Measured on-box at 720p120: 1680/1680 frames, p50 0.83 ms capture→…→reassembled; audio measured live (~200 pkts/s). lumen-client-rs is the working reference client (--pin, datagram counters, --input-test incl. gamepad). The embeddable connector (NativeClient) exposes it all over the C ABI: lumen_connect (pin/TOFU) + next_au/next_audio/next_rumble/send_input.
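The side-plane demux from the M3 entry above (QUIC datagrams routed by their first byte) can be sketched as follows. The tag values 0xC8/0xC9/0xCA are the ones listed there; `SidePlane` and `demux` are hypothetical names for illustration and are not claimed to match lumen-core's actual API.

```rust
/// Side planes carried in QUIC datagrams, keyed by the first byte.
/// Names are illustrative; only the tag values come from the text above.
#[derive(Debug, PartialEq, Eq)]
enum SidePlane<'a> {
    /// 0xC8: input events (mouse, keyboard, gamepads).
    Input(&'a [u8]),
    /// 0xC9: Opus audio, host to client.
    Audio(&'a [u8]),
    /// 0xCA: rumble, host to client.
    Rumble(&'a [u8]),
}

const TAG_INPUT: u8 = 0xC8;
const TAG_AUDIO: u8 = 0xC9;
const TAG_RUMBLE: u8 = 0xCA;

/// Split one datagram into its plane and payload.
/// Returns None for an empty datagram or an unknown tag, so the caller
/// can count and drop it instead of panicking on untrusted input.
fn demux(datagram: &[u8]) -> Option<SidePlane<'_>> {
    let (&tag, payload) = datagram.split_first()?;
    match tag {
        TAG_INPUT => Some(SidePlane::Input(payload)),
        TAG_AUDIO => Some(SidePlane::Audio(payload)),
        TAG_RUMBLE => Some(SidePlane::Rumble(payload)),
        _ => None,
    }
}
```

Borrowing the payload rather than copying it keeps the demux step allocation-free; decoding each plane's payload is left to whatever consumes that plane.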

What's left

  1. M4 — client decode + present: macOS stage 1 done, first light achieved (2026-06-10). LumenKit compiles and is tested on macOS (AnnexB → VideoToolbox → AVSampleBufferDisplayLayer, GCMouse/GCKeyboard capture, LumenClient app shell); validated live Mac ↔ this box at 720p60 — vkcube on glass, input injected via gamescope EIS. Tests: swift test in clients/apple (unit + real-codec round trip), test-loopback.sh (Swift client vs synthetic m3-host on loopback — runs on macOS), RemoteFirstLightTests (full pipeline over the LAN). See clients/apple/README.md. Next: stage 2 presenter (VTDecompressionSession + CAMetalLayer frame pacing), glass-to-glass numbers via tools/latency-probe (scaffold), iOS variant. The Linux reference client (lumen-client-rs) gets VAAPI + wgpu on the same connector later.
  2. Sub-frame pipelining: overlap encode and transmit within a frame. Requires a direct NVENC SDK wrapper (libavcodec only emits whole AUs) — the next big latency lever (~24 ms at high res).
  3. lumen/1 protocol growth: a PIN-style pairing ceremony on top of fingerprint pinning, mid-stream mode renegotiation (the Welcome is one-shot today), concurrent sessions (today: one at a time, extras wait in the accept queue).
  4. M2 polish: wlroots/Sway VirtualDisplay backend (deferred; swaymsg create_output), GNOME live validation, gamepad live validation (blocked on the udev rule below), HDR/10-bit/AV1 negotiation, surround audio, reconnect-at-new-mode robustness.
  5. Native clients consuming lumen_core.h: Android (clients/android scaffold) plus the iOS variant of clients/apple; macOS stage 1 is tracked in item 1.
  6. This box, one-time setup still pending: sudo cp scripts/60-lumen.rules /etc/udev/rules.d/ + user into input group (gamepads); apt install gnome-shell (Mutter validation). Done since last update: gamescope 3.16.22 is installed at /usr/local/bin — the PATH=/tmp/gamescope-src/... override is no longer needed.

Build / test / run

cargo build --workspace          # green on Linux and macOS
cargo test  --workspace          # unit + loopback + proptest + C ABI harness (~97 tests)
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all --check

cargo run -p loss-harness        # FEC loss-resilience sweep (no network needed)
bash crates/lumen-core/tests/c/run.sh   # standalone C-ABI link + round-trip proof

Generated artifacts are checked in and CI fails on drift: include/lumen_core.h (cbindgen from lumen-core/src/abi.rs) and docs/api/openapi.json (regenerate with cargo run -p lumen-host -- openapi > docs/api/openapi.json; spec lives in mgmt.rs).

Layout

crates/lumen-core/        protocol · FEC · crypto · quic (lumen/1 control plane, feature-gated)
crates/lumen-host/
  gamestream/             Moonlight compat: nvhttp · pairing · rtsp · control · stream · gamepad · apps
  vdisplay/{kwin,gamescope,mutter}.rs   per-compositor client-sized virtual outputs
  zerocopy/{egl,cuda,vulkan}.rs         dmabuf → CUDA → NVENC (tiled via EGL/GL, LINEAR via Vulkan)
  inject/{libei,wlr,gamepad}.rs         input backends (+ uinput virtual gamepads)
  capture.rs · encode.rs · audio.rs · m0.rs · m3.rs · mgmt.rs
crates/lumen-client-rs/   lumen/1 reference client (M3 headless; M4 adds decode+present)
tools/{loss-harness,latency-probe}/     measurement (plan §10)
scripts/                  60-lumen.rules · lumen-host.service · host.env.example · headless/
include/lumen_core.h      generated C header

Design invariants — do not regress

  • One core, linked everywhere. Protocol/FEC/crypto live only in lumen-core, behind a stable, versioned C ABI. tokio/quinn exist only behind the quic feature (control plane); no async on the per-frame path — native threads only.
  • Native client resolution, no scaling. A session gets a virtual output at exactly the client's WxH@Hz via the VirtualDisplay trait (create(mode) → VirtualOutput { node_id, remote_fd, preferred_mode, keepalive }, RAII teardown). There is no cross-compositor protocol for this — each compositor keeps its own backend.
  • FEC is the wall-breaker. GF(2⁸) (≤255 shards/block, Moonlight-compatible) and GF(2¹⁶) Leopard (≤65535 shards/block) — lumen/1 negotiates the latter, removing the ~1 Gbps ceiling.
  • M1 security hardening stays intact: reassembler bounds attacker-controlled fields before allocating (ReassemblerLimits); AES-GCM per-direction nonce salts + seq-as-AAD; ABI struct_size checks. Regression tests exist — keep them green.
  • PipeWire consumer discipline: our capture streams set node.dont-reconnect and tear down promptly on negotiation timeout — one wedged link head-blocks the daemon's shared work queue system-wide.
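The VirtualDisplay contract from the second invariant above (create(mode) returns a VirtualOutput whose teardown is RAII) can be sketched like this. The field names follow the invariant; the field types, the `Mode` struct, and the `MockBackend`/`MockGuard` pair are assumptions made for illustration, standing in for the real kwin/gamescope/mutter backends.

```rust
use std::any::Any;
use std::cell::RefCell;
use std::rc::Rc;

/// Client-requested mode: width x height @ refresh (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Mode {
    width: u32,
    height: u32,
    hz: u32,
}

/// What a backend hands back. Field names are from the invariant above;
/// the concrete types here are assumptions.
struct VirtualOutput {
    node_id: u32,             // PipeWire node to capture
    remote_fd: Option<i32>,   // assumed: raw fd of a PipeWire remote, if any
    preferred_mode: Mode,
    keepalive: Box<dyn Any>,  // dropping this tears the output down
}

trait VirtualDisplay {
    fn create(&mut self, mode: Mode) -> Result<VirtualOutput, String>;
}

/// Hypothetical in-memory backend that records create/destroy calls.
struct MockBackend {
    log: Rc<RefCell<Vec<String>>>,
}

/// Guard stored in `keepalive`; its Drop impl is the teardown.
struct MockGuard {
    log: Rc<RefCell<Vec<String>>>,
    node_id: u32,
}

impl Drop for MockGuard {
    fn drop(&mut self) {
        self.log
            .borrow_mut()
            .push(format!("destroy node {}", self.node_id));
    }
}

impl VirtualDisplay for MockBackend {
    fn create(&mut self, mode: Mode) -> Result<VirtualOutput, String> {
        if mode.width == 0 || mode.height == 0 || mode.hz == 0 {
            return Err(format!("invalid mode {mode:?}"));
        }
        let node_id = 42; // placeholder id for the sketch
        self.log.borrow_mut().push(format!("create node {node_id}"));
        Ok(VirtualOutput {
            node_id,
            remote_fd: None,
            // Exactly the requested mode: no scaling, no nearest-match.
            preferred_mode: mode,
            keepalive: Box::new(MockGuard {
                log: Rc::clone(&self.log),
                node_id,
            }),
        })
    }
}
```

Tying teardown to a dropped guard means a session that exits early, including by error or panic unwind, still releases its output without an explicit destroy call.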

Running on this box

Headless QEMU VM (Ubuntu 26.04, kernel 7.0), passthrough RTX 5070 Ti (driver 595 open module — a kernel update silently drops it; reinstall nvidia-driver-595-open), no KMS scanout → KWin --drm impossible; everything renders offscreen via renderD128.

# compositor session (shell 1, or the systemd unit in scripts/):
XDG_RUNTIME_DIR=/run/user/1000 DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus \
XDG_CURRENT_DESKTOP=KDE KWIN_WAYLAND_NO_PERMISSION_CHECKS=1 \
kwin_wayland --virtual --width 1920 --height 1080 --no-lockscreen --socket wayland-kde \
  --exit-with-session wev

# host (shell 2; gamescope 3.16.22 in /usr/local/bin is found on the normal PATH, no override needed):
WAYLAND_DISPLAY=wayland-kde XDG_CURRENT_DESKTOP=KDE LUMEN_VIDEO_SOURCE=virtual \
LUMEN_ZEROCOPY=1 cargo run -rp lumen-host -- serve

# lumen/1 native loopback test (no Moonlight needed; same env as serve, listener persists
# across sessions — bound it with --max-sessions):
cargo run -rp lumen-host -- m3-host --source virtual --seconds 10 --max-sessions 1
cargo run -rp lumen-client-rs -- --mode 1280x720x120 --out /tmp/a.h265 --input-test  # + --pin HEX
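
The `--mode 1280x720x120` argument above reads as width x height x refresh. A minimal sketch of parsing that shape, assuming exactly three non-zero decimal fields separated by `x`; `Mode` and `parse_mode` are hypothetical names, and the real client's parser may accept more or fewer forms.

```rust
/// A client-requested mode, e.g. "1280x720x120" (assumed WxHxHz layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Mode {
    width: u32,
    height: u32,
    hz: u32,
}

/// Parse "WxHxHz". Rejects missing fields, extra fields, non-numeric
/// fields, and zeros, since a zero-sized or 0 Hz output is meaningless.
fn parse_mode(s: &str) -> Result<Mode, String> {
    let parts: Vec<&str> = s.split('x').collect();
    let [w, h, hz] = parts.as_slice() else {
        return Err(format!("expected WxHxHz, got {s:?}"));
    };
    // Shared field parser so every field gets the same validation.
    let num = |name: &str, v: &str| -> Result<u32, String> {
        match v.parse::<u32>() {
            Ok(0) => Err(format!("{name} must be non-zero")),
            Ok(n) => Ok(n),
            Err(e) => Err(format!("bad {name} {v:?}: {e}")),
        }
    };
    Ok(Mode {
        width: num("width", w)?,
        height: num("height", h)?,
        hz: num("refresh", hz)?,
    })
}
```

Validating before the mode reaches a VirtualDisplay backend keeps malformed client input from surfacing as a compositor-side failure.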

Pinned crate facts: ashpd 0.13 + pipewire 0.9 (must match ashpd's) + ffmpeg-next 8.x (system FFmpeg 8 / libavcodec 62). Env knobs: LUMEN_VIDEO_SOURCE=virtual|portal, LUMEN_COMPOSITOR=kwin|gamescope|mutter, LUMEN_ZEROCOPY=1, LUMEN_GAMESCOPE_APP=..., LUMEN_INPUT_BACKEND=..., LUMEN_PERF=1 (per-stage timing), LUMEN_VIDEO_DROP=N (FEC test), LUMEN_FEC_PCT=N.
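
A hedged sketch of reading a few of the env knobs listed above into typed values. The variable names and the allowed values for LUMEN_VIDEO_SOURCE and LUMEN_COMPOSITOR are taken from that list; everything else is an assumption: the `Knobs` struct, treating only the literal "1" as enabled, and leaving unset knobs as None rather than guessing the host's defaults.

```rust
#[derive(Debug, PartialEq, Eq)]
enum VideoSource {
    Virtual,
    Portal,
}

#[derive(Debug, PartialEq, Eq)]
enum Compositor {
    Kwin,
    Gamescope,
    Mutter,
}

/// Hypothetical typed view of a subset of the knobs.
#[derive(Debug, PartialEq, Eq)]
struct Knobs {
    video_source: Option<VideoSource>,
    compositor: Option<Compositor>,
    zerocopy: bool,
    perf: bool,
    fec_pct: Option<u32>,
}

/// `get` abstracts the environment so the parser can be tested without
/// touching process state; pass `|k| std::env::var(k).ok()` in real use.
fn read_knobs(get: impl Fn(&str) -> Option<String>) -> Result<Knobs, String> {
    let video_source = match get("LUMEN_VIDEO_SOURCE").as_deref() {
        None => None,
        Some("virtual") => Some(VideoSource::Virtual),
        Some("portal") => Some(VideoSource::Portal),
        Some(other) => return Err(format!("LUMEN_VIDEO_SOURCE: unknown value {other:?}")),
    };
    let compositor = match get("LUMEN_COMPOSITOR").as_deref() {
        None => None,
        Some("kwin") => Some(Compositor::Kwin),
        Some("gamescope") => Some(Compositor::Gamescope),
        Some("mutter") => Some(Compositor::Mutter),
        Some(other) => return Err(format!("LUMEN_COMPOSITOR: unknown value {other:?}")),
    };
    // Assumption: a flag is on only when set to exactly "1".
    let flag = |name: &str| get(name).as_deref() == Some("1");
    let fec_pct = match get("LUMEN_FEC_PCT") {
        None => None,
        Some(v) => Some(
            v.parse::<u32>()
                .map_err(|e| format!("LUMEN_FEC_PCT: {e}"))?,
        ),
    };
    Ok(Knobs {
        video_source,
        compositor,
        zerocopy: flag("LUMEN_ZEROCOPY"),
        perf: flag("LUMEN_PERF"),
        fec_pct,
    })
}
```

Rejecting unknown values instead of silently falling back makes a typo such as LUMEN_COMPOSITOR=kwim fail at startup rather than select an unintended backend.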

Conventions

  • Rust 2021, rustfmt + clippy -D warnings clean before commit.
  • Match the surrounding code's comment density and naming.
  • Commit messages end with the Co-Authored-By trailer (see git log).
  • pkill caution on this box: match exact comm names (pkill -x gamescope-wl, pkill -x lumen-host) — pkill -f self-matches the invoking shell.