punktfunk

Author	SHA1	Message	Date
enricobuehler	3cc3c02b42	feat(gamestream): AV1 negotiation + 5.1/7.1 surround audio Codec negotiation (M2 polish): - ServerCodecModeSupport now advertises what we encode: H264|HEVC|AV1_MAIN8 = 65793 (flags verified against moonlight-common-c Limelight.h). The old placeholder 3843 wrongly claimed HEVC Main10 + 4:4:4 and no AV1. Main10 bits stay off on purpose: Moonlight ties 10-bit to HDR, and capture is 8-bit SDR BGRx with no HDR metadata path (av1_nvenc -highbitdepth was validated working for later). - RTSP ANNOUNCE: bitStreamFormat 0/1/2 -> H264/HEVC/AV1 (already plumbed to av1_nvenc; validated e2e via `m0 --codec av1` + ffprobe av01), and a dynamicRangeMode!=0 request now logs + falls back to 8-bit SDR. Surround audio (M2 polish): - ANNOUNCE x-nv-audio.surround.{numChannels,AudioQuality} + x-nv-aqos.packetDuration -> per-session AudioParams; DESCRIBE advertises all six Opus configs (normal before HQ per channel count). Normal-quality mappings are pre-rotated for the client's GFE-order LFE swap (RtspConnection.c, verified verbatim) so its derived decoder mapping equals our encoder mapping — including 7.1, where Sunshine's rotate only covers [3,6) and scrambles LFE/SL/SR. - 5.1/7.1 encode via libopus multistream (audiopus_sys, the sys layer the opus crate already links) with Sunshine's layouts/bitrates, RAII wrapper; the live-validated stereo wire is byte-identical (plain Opus, no FEC). - Surround sessions add Sunshine-style RS(4,2) audio FEC (packetType 127 + AUDIO_FEC_HEADER, the OpenFEC parity matrix both ends hardcode, nanors gemm semantics verified from nanors/rs.c). - PipeWire capture generalized to the negotiated channel count with explicit FL FR FC LFE RL RR [SL SR] positions; missing sink channels are zero-filled by the channel-mixer. PwAudioCapturer now tears down cleanly on Drop (pipewire channel -> loop quit), so a channel-count change can reopen without leaking a capture stream. 
Tests: serverinfo mask, RTSP codec/audio param parsing, DESCRIBE contents, surround-params strings + client-swap round trip, FEC parity self-recovery and packet layout, real-codec 5.1 channel-identity round trip, and an ignored live test (ran green against a 6ch null sink monitor). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 15:41:15 +00:00
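The ServerCodecModeSupport arithmetic in the 3cc3c02b42 entry can be checked with a small sketch. The constant names and bit values here are assumptions recalled from moonlight-common-c's Limelight.h and should be verified against that header; the helper functions are hypothetical and not taken from the punktfunk source.

```rust
// Assumed SCM_* bit values (recalled from moonlight-common-c Limelight.h;
// verify against the header before relying on them).
pub const SCM_H264: u32 = 0x00001;
pub const SCM_HEVC: u32 = 0x00100;
pub const SCM_HEVC_MAIN10: u32 = 0x00200;
pub const SCM_AV1_MAIN8: u32 = 0x10000;

/// Mask for a host that encodes 8-bit SDR H.264, HEVC and AV1 only.
/// Hypothetical helper, named for illustration.
pub fn server_codec_mode_support() -> u32 {
    SCM_H264 | SCM_HEVC | SCM_AV1_MAIN8
}

/// True if `mask` advertises the capability bit `flag`.
pub fn advertises(mask: u32, flag: u32) -> bool {
    mask & flag != 0
}
```

Under these assumed values the new mask is 0x10101 (65793), while the old placeholder 3843 (0xF03) has the Main10 bit set and the AV1 bit clear, which matches the description in the commit message.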
enricobuehler	bfd64ce871	rename: lumen → punktfunk, everywhere Full project rename, decided 2026-06-10: - Crates/binaries: punktfunk-core / punktfunk-host / punktfunk-client-rs. - C ABI: punktfunk_* symbols, Punktfunk* types, include/punktfunk_core.h, PUNKTFUNK_FEATURE_QUIC guard (header regenerated; cbindgen renames updated, incl. PUNKTFUNK_BTN_/PUNKTFUNK_AXIS_ wire constants). - Protocol: punktfunk/1 — control-plane magic LMN1 → PKF1, nonce salt lmn1 → pkf1. WIRE BREAK: clients must be rebuilt from this revision. - Env knobs: PUNKTFUNK_VIDEO_SOURCE / PUNKTFUNK_COMPOSITOR / PUNKTFUNK_ZEROCOPY / …. - Host config dir: ~/.config/punktfunk (the box's dir was migrated in place — the persistent identity is unchanged, pinned fingerprints stay valid). - Swift package: PunktfunkKit + PunktfunkCore.xcframework + PunktfunkConnection (Sources/PunktfunkClient app + tests renamed with it); build-xcframework.sh updated. - scripts/: 60-punktfunk.rules, punktfunk-host.service; OpenAPI doc regenerated. Also: scripts/headless/run-headless-kde.sh — full headless Plasma bringup. Root cause of "desktop but no apps/settings" over the stream: plasmashell launched without XDG_MENU_PREFIX=plasma-, so the launcher resolved a nonexistent applications.menu and rendered an empty menu. The script sets the complete KDE session env (menu prefix, KDE_FULL_SESSION, session version) and rebuilds ksycoca before starting plasmashell. Gate: 97/97 tests, clippy -D warnings (both feature sets), fmt, C-ABI harness PASS, zero lumen references left outside .git. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 13:11:59 +00:00
enricobuehler	bf8a974e8b	feat: M4 stage 1 — the SwiftUI client is real: compiles, tested, first light on glass The clients/apple scaffold is now a working macOS client, validated live against this repo's host across the LAN: gamescope virtual output → NVENC HEVC → lumen/1 (GF(2¹⁶) FEC + AES-GCM over UDP, QUIC control) → VideoToolbox → AVSampleBufferDisplayLayer at 720p60, mouse/keyboard flowing back as QUIC datagrams into the host's gamescope EIS injector (~3.7k events injected in one session). LumenKit: - LumenConnection: the predicted cbindgen compile fixes (C17 header spells the typedefs as integers while the enum constants import as a distinct Swift type — bridge by rawValue); close() is now safe from any thread (a close flag + pumpLock held across the blocking poll enforce the C contract "never close with a next_au in flight"; flag prevents lock-starvation by back-to-back polls). - StreamView: per-pump cancellation token (reconnects can't double-pump), flush + re-gate on the next in-band parameter sets when the layer fails, no stale enqueue after restart. - InputCapture: fractional-delta accumulation (sub-pixel motion isn't truncated away), pressed-state tracking with release-all on focus loss and stop() (nothing sticks down host-side), global-singleton ownership guard (GC has one handler slot per process), X1/X2 buttons, horizontal scroll, full keypad/CapsLock/ISO-102nd/PrintScreen/Menu VKs. - LumenClient app shell (swift run LumenClient): connect form, fps/Mb-s HUD, LUMEN_AUTOCONNECT/LUMEN_MODE for scripted first-light runs. - Tests: Annex-B byte-level units; real-codec round trip (VTCompressionSession-encoded HEVC rebuilt as the host's wire shape → AnnexB → VTDecompressionSession → pixels); test-loopback.sh (Swift client vs a real local m3-host over loopback — the Swift twin of c_abi_connection_roundtrip); RemoteFirstLightTests (full pipeline over the LAN). 
Host/build fixes that fell out: - The workspace builds on non-Linux again: gamestream audio (opus) and sendmmsg batching are now platform-gated with stubs/fallback, per the crate's "compiles everywhere" rule. - Horizontal scroll was inverted end-to-end: the injectors negated BOTH axes onto the ei/wl axes, but GameStream's horizontal convention is positive = right (moonlight-qt/Sunshine pass it through unnegated) — only vertical flips now. This also un-inverts real Moonlight clients. - AnnexB drops all zeros preceding a start code (trailing_zero_8bits padding), ffmpeg's policy, instead of leaking them into the preceding NAL. - build-xcframework.sh: deployment targets pinned to the package floor + an otool guard — cargo does not fingerprint MACOSX_DEPLOYMENT_TARGET, so warm caches can silently ship too-new minos objects. Adversarially reviewed (5-dimension multi-agent pass, every finding refutation-verified): 14 confirmed findings, all fixed above; the send-while-polling core-contract gap flagged here is closed by the lumen/1 session-planes work (&self pulls + per-plane borrow slots). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 14:46:45 +02:00
enricobuehler	520d7342dd	feat: M3 — full lumen/1 session planes: audio, gamepads+rumble, pinned trust, persistent listener m3-host is now a real host, not a one-shot demo. Everything validated live on this box (two back-to-back sessions, pinned + TOFU, ~200 audio pkts/s, p50 0.84 ms at 720p60). lumen-core: - quic.rs: QUIC-datagram side planes demuxed by first byte — Opus audio 0xC9 ([magic][u32 seq][u64 pts_ns][opus], host→client) and rumble 0xCA ([magic][pad][low][high]). - Trust: endpoint::server_with_identity (persistent PEM identity) and endpoint::client_pinned — SHA-256 cert-fingerprint pinning with TOFU (observed fingerprint reported back for persisting). The verifier checks the TLS 1.3 CertificateVerify signature for real (an MITM replaying the host's public cert without its key is rejected; cert pinning alone would not prove key possession). - client.rs: NativeClient gains pin + host_fingerprint, audio/rumble receivers (next_audio / next_rumble); pull methods take &self so the C ABI's per-plane threads never alias a &mut (per-plane mutexed borrow slots in abi.rs). - abi.rs: lumen_connect(pin_sha256, observed_sha256_out) + lumen_connection_next_audio / next_rumble. input.rs: documented gamepad wire contract (GameStream buttonFlags bits, XInput axis conventions, +y = up) — exported as LUMEN_BTN_/LUMEN_AXIS_ (bare BTN_* collides with <linux/input-event-codes.h> at different values). lumen-host (m3): - Persistent accept loop: sessions back to back on one endpoint (--max-sessions, 0 = forever); per-session failures log and the loop keeps serving; 10 s handshake deadline so a silent client can't wedge the sequential accept queue; teardown on every exit path (stop flag → conn.close → join audio+input threads). 
- Audio plane: desktop PipeWire capture → Opus 48 kHz stereo 5 ms CBR → datagrams; ONE capturer reused across sessions via an AudioCapSlot (PipeWire streams have no cheap teardown — per-session opens would leak a thread + core connection + live node each). - Gamepad routing: incremental GamepadButton/GamepadAxis datagrams accumulate into per-pad state feeding the uinput xpad manager; force feedback returns as rumble datagrams, with current state re-sent every 500 ms (idempotent-state healing for the lossy channel). QUIC endpoint serves the persistent ~/.config/lumen identity and logs the pinnable fingerprint. lumen-client-rs: --pin (malformed values abort — never silently downgrade to TOFU), TOFU fingerprint logging, audio/rumble datagram counters, gamepad events in --input-test. clients/apple: scaffold synced — pinSHA256/hostFingerprint (wrong-size pin throws, fail-closed), nextAudio/nextRumble, gamepad event constructors; README handoff updated (persistent listener, audio decode notes, trust UX). Adversarially reviewed (5-dimension multi-agent pass over the diff, 2-skeptic verification): fixed the MITM signature-check gap, a Y-axis contract inversion, header macro collisions, ABI aliasing UB, the PipeWire per-session leak, the missing handshake deadline, fail-open pin parsing, and teardown-on-error paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 12:26:18 +00:00
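The audio side-plane datagram from the 520d7342dd entry, [magic][u32 seq][u64 pts_ns][opus] with magic 0xC9, might be packed and parsed roughly as below. The commit message does not state the byte order of the integer fields; little-endian is an assumption here (chosen because the lumen/1 handshake is described as LE binary), and the function names are hypothetical.

```rust
pub const AUDIO_MAGIC: u8 = 0xC9;
// 1 byte magic + 4 bytes seq + 8 bytes pts_ns.
const HEADER_LEN: usize = 1 + 4 + 8;

/// Build one audio datagram. Byte order is ASSUMED little-endian.
pub fn encode_audio(seq: u32, pts_ns: u64, opus: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + opus.len());
    out.push(AUDIO_MAGIC);
    out.extend_from_slice(&seq.to_le_bytes());
    out.extend_from_slice(&pts_ns.to_le_bytes());
    out.extend_from_slice(opus);
    out
}

/// Parse a datagram; returns (seq, pts_ns, opus payload), or None when the
/// datagram is too short or belongs to another plane (different first byte).
pub fn decode_audio(datagram: &[u8]) -> Option<(u32, u64, &[u8])> {
    if datagram.len() < HEADER_LEN || datagram[0] != AUDIO_MAGIC {
        return None;
    }
    let mut seq = [0u8; 4];
    seq.copy_from_slice(&datagram[1..5]);
    let mut pts = [0u8; 8];
    pts.copy_from_slice(&datagram[5..13]);
    Some((
        u32::from_le_bytes(seq),
        u64::from_le_bytes(pts),
        &datagram[HEADER_LEN..],
    ))
}
```

Checking the first byte before anything else mirrors the "demuxed by first byte" design: a rumble datagram (0xCA) handed to this parser is rejected rather than misread.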
enricobuehler	3ea096ace9	feat: M4 groundwork — lumen/1 client connector in the C ABI + SwiftUI client scaffold The shared-core architecture pays off: platform clients now link ONE Rust library that does the entire lumen/1 protocol, and only add decode/present/input on top. lumen-core: - client.rs (quic feature): NativeClient — QUIC handshake + UDP data plane + input datagrams on internal threads; embedder surface = connect / next_frame / send_input. - abi.rs: lumen_connect / lumen_connection_next_au (borrow-until-next-call, matching lumen_client_poll_frame semantics) / lumen_connection_send_input / lumen_connection_mode / lumen_connection_close. Guarded in the generated header by LUMEN_FEATURE_QUIC (cbindgen [defines] mapping), so the checked-in header is stable across feature sets. - error.rs: append-only LumenStatus additions Timeout (-9) and Closed (-10). - TESTED end-to-end through the C ABI: in-process lumen/1 host, lumen_connect pulls 25 byte-verified frames, sends input, closes (m3.rs::c_abi_connection_roundtrip). Apple client (clients/apple — SCAFFOLD, written on Linux, first Xcode build pending): - scripts/build-xcframework.sh: cargo per Apple target → universal staticlib + header (LUMEN_FEATURE_QUIC pre-defined) + modulemap → LumenCore.xcframework. - Package.swift (LumenKit) + Swift sources: LumenConnection (ABI wrapper), AnnexB (in-band VPS/SPS/PPS → CMVideoFormatDescription, Annex-B → AVCC CMSampleBuffers with DisplayImmediately), StreamView (SwiftUI over AVSampleBufferDisplayLayer — stage-1 presenter that hardware-decodes compressed HEVC itself), InputCapture (GCMouse raw deltas + GCKeyboard HID→VK). - README.md is the full handoff for the next (Mac-side) agent: build steps, ABI contract, first-light test recipe against the Linux host, stage-2 (VT+Metal pacing) plan, and the known host-side gaps (single-session m3-host, no lumen/1 audio yet, gamepad kinds not yet routed in m3's injector, seed-stage trust). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 07:28:41 +00:00
enricobuehler	68f2b19cca	Merge main (management REST API) into m1-lumen-core Resolutions: serve() keeps main's AppState::new() with our persisted-pairing load folded into it; main.rs keeps both the m3 and mgmt modules; mgmt's test LaunchSessions gain the new appid field; Cargo.lock re-resolved. Full gate green (92 tests, clippy, fmt). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 07:03:41 +00:00
enricobuehler	5b0d84acd0	feat: M3 — lumen/1 native streaming: real video at client mode + input over QUIC datagrams The native protocol now does the real thing, end to end: - Hello carries the client's requested mode; the host creates a NATIVE virtual output at exactly that size/refresh (same vdisplay backends as the GameStream path) and streams NVENC HEVC through the M1 Session (GF(2^16) Leopard FEC + AES-GCM, QUIC-negotiated). - Input rides QUIC DATAGRAMS — encrypted, congestion-managed, no ENet retransmission spikes — decoded into lumen_core InputEvents and fed to the session's input injector. - Frames are stamped with the capture wall clock; the reference client computes per-frame capture→reassembled latency percentiles and writes a playable .h265. - m3-host gains --source synthetic|virtual + --seconds; the client gains --mode WxHxFPS, --out, --input-test (scripted mouse/keyboard datagrams). VALIDATED live (gamescope session, xev nested): client requested 1280x720@120 → host created gamescope at that mode → 1680/1680 frames over 14s, zero loss, valid HEVC; pipeline latency p50 0.83ms / p95 1.2ms / p99 1.3ms (capture→encode→FEC→crypto→UDP→ reassembled, same-host clock); 176 input datagrams sent → injector (GamescopeEi) → 164 X events observed inside the nested session. Known follow-on: slice-level sub-frame pipelining needs the NVENC SDK directly (libavcodec emits whole AUs only) — the next big latency lever. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 06:56:47 +00:00
enricobuehler	de3123038f	feat: M3 seed — the lumen/1 native protocol: QUIC control plane + reference client (Phase 5) The first end-to-end run of lumen's own protocol, past the GameStream compatibility layer. - lumen-core/src/quic.rs (behind the `quic` feature): the lumen/1 handshake — Hello/Welcome/ Start as length-prefixed LE binary on one QUIC bi-stream. Welcome carries the COMPLETE data-plane Config: mode, FEC scheme incl. GF(2^16) Leopard (inexpressible in GameStream), shard sizing, AES-GCM key + per-direction salt, data UDP port. Plus quinn endpoint helpers (self-signed server; accepts-any client — pinning lands with the trust model) and framed async IO. Round-trip unit-tested. - lumen-host m3-host: serves one lumen/1 session — QUIC handshake, then a NATIVE thread (no async on the frame path — design invariant) streams deterministic 64KB test frames through the hardened M1 Session over UdpTransport. - lumen-client-rs: from scaffold to working reference client — connects, negotiates, brings up the client Session over UDP, reassembles + FEC-recovers + byte-verifies every frame. VALIDATED END-TO-END on localhost: 300/300 frames verified, 0 mismatches, through QUIC-negotiated GF(2^16) FEC + AES-GCM over real UDP sockets. M4 (decode+present) builds on this exact client skeleton. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 23:33:40 +00:00
enricobuehler	1eeb35a723	feat: M2 — host productionization: app catalog, persistent pairing, quit semantics, systemd (Phase 4) - gamestream/apps.rs: an app catalog (loaded from ~/.config/lumen/apps.json, with defaults: Desktop + gamescope entries when gamescope/steam/vkcube are installed). /applist renders it; /launch?appid=N selects the entry; RTSP PLAY resolves it and the stream honors the app's compositor + nested command — so a Moonlight client picks "Steam" and gets a gamescope session at its native resolution, or "Desktop" for the KWin/GNOME desktop. - Persistent pairing: the paired-client cert allow-list now survives restarts (~/.config/lumen/paired.json), saved on each successful pairing, loaded at boot. - Quit semantics: /cancel now actually stops the media threads (streaming/audio flags), tearing down the per-session virtual output / gamescope process via the capturer's RAII. - scripts/lumen-host.service (systemd user unit) + scripts/host.env.example (config file consumed by it) — the host runs as a managed service instead of an SSH shell. Smoke-tested: serve boots, /applist serves the catalog (Desktop + vkcube gamescope entry auto-detected on this box). GNOME backend validation still pending gnome-shell install; wlroots vdisplay backend deliberately deferred (not in the priority compositor trio). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 23:23:53 +00:00
enricobuehler	826da9968e	feat: M2 — Vulkan bridge: TRUE zero-copy for gamescope's LINEAR dmabufs (Phase 3) The missing zero-copy path is closed. NVIDIA's EGL won't sample LINEAR and the CUDA driver rejects raw dmabuf fds — but Vulkan imports dmabufs (VK_EXT_external_memory_dma_buf) and exports OPAQUE_FD memory that CUDA officially imports. zerocopy/vulkan.rs (ash): dmabuf fd → VkBuffer (import cached per fd) → vkCmdCopyBuffer (GPU) → exportable VkBuffer → vkGetMemoryFdKHR(OPAQUE_FD) → cuImportExternalMemory → CUdeviceptr The exportable buffer + CUDA mapping are per-resolution; per frame it's one GPU buffer copy (fence-waited) + one pitched CUDA copy into the encoder's pool. No CPU touches pixels. EglImporter::import_linear now routes through the bridge (lazy init; any failure still falls back to the CPU mmap path). cuda::ExternalDmabuf gained import_owned_fd for the Vulkan-exported fd. Validated live: gamescope 720p120 → "Vulkan→CUDA exportable staging buffer ready size=3686400" (exactly 1280*720*4), full-rate 122.7 fps, decoded frame pixel-correct (vkcube). KWin's tiled EGL path regression-tested intact. NV12 negotiation dropped — moot now that BGRx is fully zero-copy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 23:18:38 +00:00
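The staging-buffer size logged in the 826da9968e entry (size=3686400 for a 720p stream) is what a tightly packed 4-byte-per-pixel BGRx frame works out to. A minimal sketch of that arithmetic, assuming no row padding in the linear buffer; the function name is hypothetical:

```rust
/// Bytes in one tightly packed BGRx frame (4 bytes per pixel, no pitch padding).
/// Assumption: the LINEAR staging buffer has stride == width * 4.
pub fn bgrx_frame_bytes(width: u32, height: u32) -> u64 {
    u64::from(width) * u64::from(height) * 4
}
```

For 1280x720 this gives 3,686,400 bytes, the value reported in the log line.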
enricobuehler	c39615d7d1	perf: M2 — split the data plane into encode | send threads with batched, paced sends (Phase 2) stream_body no longer sends: each frame's packet batch goes over a depth-2 bounded queue to a dedicated send thread, so a send spike can never stall capture/encode (a full queue drops the NEWEST batch — FEC/RFI covers the client — rather than ever blocking). The sender ships packets with sendmmsg (≤64/syscall: ~375 syscalls/s instead of ~24k at 5K@240) in 16-packet chunks paced across ~3/4 of the frame interval — microburst shaping for real links without per-packet sleep jitter. Client-gone detection moved to the sender (clears `running`); the LUMEN_VIDEO_DROP FEC test knob moved with the send path. Loopback-tested: batches arrive complete and byte-identical through the paced sendmmsg path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 23:13:34 +00:00
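The batching figures in the c39615d7d1 entry (up to 64 packets per sendmmsg call, 16-packet chunks paced across about 3/4 of the frame interval) can be illustrated with the sketch below. The function names are hypothetical, and the even spacing of chunks inside the pacing window is an assumption; the commit message only states the chunk size and the window fraction.

```rust
use std::time::Duration;

/// Syscalls per second when up to `batch` packets go out per sendmmsg call
/// (ceiling division).
pub fn syscalls_per_second(packets_per_second: u64, batch: u64) -> u64 {
    (packets_per_second + batch - 1) / batch
}

/// Send offsets, relative to the start of the frame, for each 16-packet chunk.
/// ASSUMPTION: chunks are spaced evenly across 3/4 of the frame interval.
pub fn chunk_send_offsets(packet_count: u32, frame_interval: Duration) -> Vec<Duration> {
    const CHUNK: u32 = 16;
    let chunks = (packet_count + CHUNK - 1) / CHUNK;
    let window = frame_interval * 3 / 4;
    // An empty frame yields no chunks, so the division below is never by zero.
    (0..chunks).map(|i| window * i / chunks).collect()
}
```

With roughly 24,000 packets per second and a batch of 64, the first function gives the ~375 syscalls/s quoted in the message. Leaving the last quarter of the interval unused keeps a frame's tail from colliding with the next frame's first chunk.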
enricobuehler	6521146abc	feat: M2 — gamepad input: uinput virtual X-Box pads + rumble back-channel (Phase 1) Full controller path for the SteamOS-like session, mirroring Sunshine byte-for-byte (wire formats verified against moonlight-common-c + Sunshine source; ioctl numbers and struct layouts verified by compiling against this box's <linux/uinput.h>, locked in with const asserts): - gamestream/gamepad.rs: decode MULTI_CONTROLLER (magic 0x0C, mixed BE-size/LE-body) incl. the Sunshine buttonFlags2 extension (paddles/touchpad/Misc — our appversion already advertises Sunshine, so clients send it) and CONTROLLER_ARRIVAL (0x55000004); build the 0x010B rumble plaintext (with the mandatory 4-byte filler). Unit-tested. - inject/gamepad.rs: VirtualPad clones the kernel xpad identity ("Microsoft X-Box 360 pad", 045e:028e, exact button/axis codes + absinfo) so SDL/Steam/Proton match their built-in mapping with zero config. GamepadManager creates/destroys pads from activeGamepadMask (hotplug), emits button transitions + axes (+Y-up → evdev +Y-down negation, D-pad as HAT0X/Y) per frame. Rumble: non-blocking FF pump answers UI_BEGIN/END_FF_UPLOAD/ERASE (games block in EVIOCSFF until answered), tracks effects with replay expiry + FF_GAIN, mixes to (low, high) motor levels, dedups. - control.rs: channel_limit 8 → 0x30 — Moonlight sends gamepad input on ENet channel 0x10+n, so the old limit silently discarded ALL controller input. Gamepad events route to the manager; rumble is sealed with the client's detected GCM scheme direction-flipped (V2 marker 'H?', own seq counter) and sent on the control peer every service tick. - scripts/60-lumen.rules: udev rule (Sunshine-style) granting the input group /dev/uinput. Live validation needs the udev rule installed (root-only /dev/uinput on this box) + a Moonlight client with a controller; everything else is gated and unit/static-checked. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 23:09:16 +00:00
enricobuehler	be18a782b1	feat: M2 — GNOME/Mutter virtual-display backend (RecordVirtual) + preferred-mode negotiation Third compositor on the VirtualDisplay seam, via Mutter's direct D-Bus APIs (the gnome-remote-desktop headless model, no portal grant): RemoteDesktop.CreateSession → ScreenCast.CreateSession({remote-desktop-session-id}) → Session.RecordVirtual (creates a virtual monitor) → Start → PipeWireStreamAdded(node_id). A keepalive thread owns the zbus connection (sessions die with it — RAII teardown); select with LUMEN_COMPOSITOR=mutter or XDG_CURRENT_DESKTOP=GNOME. Mutter sizes its virtual monitor FROM the PipeWire format negotiation, so VirtualOutput gains preferred_mode (w, h, refresh_hz), threaded into the consumer's format pods as the default size/framerate. KWin/gamescope set it too (their outputs are already exact-size; the preference just confirms it) — both regression-tested intact. Compile/clippy/test clean; live validation needs gnome-shell installed (then `gnome-shell --headless` + m0 --source kwin-virtual with LUMEN_COMPOSITOR=mutter). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:48:53 +00:00
enricobuehler	751789f932	feat: M2 — LINEAR-dmabuf CUDA import attempt + graceful zero-copy fallback (gamescope) gamescope only offers LINEAR dmabufs, which the EGL/GL interop path can't handle (NVIDIA's EGL lists no LINEAR modifier for sampling). Attempt a direct CUDA external-memory import (cuImportExternalMemory OPAQUE_FD, cached per buffer fd, one DtoD copy per frame into the pooled buffer): the FFI + plumbing are in place, and LINEAR(0) is now advertised alongside the tiled EGL modifiers (tiled first, so KWin still prefers it — regression-tested). Empirically the 595 desktop driver rejects raw dmabuf fds as OPAQUE_FD (CUDA_ERROR_UNKNOWN), matching the documented limitation — true LINEAR GPU import needs a Vulkan interop bridge (import dmabuf via VK_EXT_external_memory_dma_buf, GPU-copy into an exportable allocation, hand that to CUDA), noted as future work. So the importer now degrades instead of dying: on GPU-import failure it logs once, disables itself, and falls through to the CPU mmap path. Validated: gamescope + LUMEN_ZEROCOPY=1 runs full-rate (122.9 fps @720p120, valid HEVC) via the fallback; KWin keeps real zero-copy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:43:35 +00:00
enricobuehler	7f3897e0d3	feat: M2 — gamescope input via its EIS socket (SteamOS-like input path) gamescope runs its own EIS server and exports the socket to its children as LIBEI_SOCKET — no portal involved. The gamescope backend now launches the nested app through a tiny shell wrapper that relays that value to /tmp/lumen-gamescope-ei; the libei injector gains an EiSource enum (Portal | SocketPathFile) and connects a UnixStream directly to gamescope's socket (polling until the app has started), then runs the identical reis sender flow. Backend::GamescopeEi is auto-selected when LUMEN_COMPOSITOR=gamescope (LUMEN_INPUT_BACKEND=gamescope overrides). Validated end-to-end: input-test against a headless gamescope running xev — 129 MotionNotify/KeyPress/ButtonPress events delivered into the nested X app ("Gamescope Virtual Input" device bound, sender handshake + emulation working). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:34:48 +00:00
enricobuehler	c8f9032dec	feat: M2 — harden gamescope capture path (blocked on gamescope ≥3.16.22 upstream fix) Deep investigation (gdb + daemon traces) proved the gamescope capture stall is a gamescope 3.16.20 bug, not ours: it calls pw_loop_iterate() without pw_loop_enter()/leave(), and under PipeWire 1.6's loop locking its main thread permanently holds the loop mutex — the pw thread deadlocks, gamescope never acks the daemon's port_set_param(Format), and the link parks in "negotiating" silently. Stock gst pipewiresrc fails identically. Fixed upstream by gamescope commit e3ed1ea7 ("pipewire: Fix pipewire loop locking", pipewire#5148); first release 3.16.22. Ubuntu 26.04 ships 3.16.20 (built ten days before the fix) — patch/upgrade required. Consumer-side improvements from the investigation (all verified correct vs gamescope's pods, and needed once the producer is fixed): - discover the node from gamescope's own "stream available on node ID: N" log line (its node.name appears on two objects; the advertised id is authoritative); pw-dump fallback - CPU path accepts mappable dmabufs: Buffers param now offers MemPtr|MemFd|DmaBuf (gamescope counter-offers exactly DmaBuf when its modifier pod wins, never MemPtr), mmap the fd ourselves when MAP_BUFFERS didn't (Vulkan-exported dmabufs aren't flagged mappable), and treat chunk.size==0 as the computed span - warn_once on every silent frame-drop path in the process callback - node.dont-reconnect on our capture streams: an orphaned stream re-targeted by wireplumber onto a fresh node wedges it — and a stuck link head-blocks the daemon's shared work queue, stalling ALL new link negotiation system-wide (this poisoned whole test sessions) - LUMEN_GAMESCOPE_NODE (attach to an existing gamescope) + LUMEN_PW_FIXED_POD (negotiation bisection) debug knobs KWin path regression-tested (zero-copy intact). gamescope end-to-end validation pending the patched gamescope build. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:16:53 +00:00
enricobuehler	bd25f5e02f	fix: M2 — harden the management API after adversarial review Five confirmed findings from a 46-agent review panel: - Empty --mgmt-token no longer satisfies the non-loopback token gate (critical: 'Bearer ' with an empty token authenticated; parse_serve now bails on blank tokens and mgmt::run treats blank as none) - axum's built-in body rejections (400/415/422) now wear the documented ApiError envelope via an ApiJson extractor, and the spec documents them - GET /health carries security([{}]) in the spec, matching the server's auth exemption - unpairClient's description no longer claims revocation the TLS layer doesn't enforce yet (gamestream/tls.rs accepts any cert — known gap) - CLAUDE.md/README.md no longer reference the deleted web.rs Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:00:22 +00:00
enricobuehler	a339a0466e	feat: M2 — management REST API with OpenAPI doc (control-pane groundwork) A versioned control-plane REST API (/api/v1) on its own port (default 127.0.0.1:47990) serving host info, runtime status, paired-client management, the pairing PIN flow, and session control (stop / force-IDR). The OpenAPI 3.1 document is generated from the handlers by utoipa, served live at /api/v1/openapi.json (+ Scalar docs at /api/docs), printable via `lumen-host openapi`, and checked in at docs/api/openapi.json for client codegen — a test fails if it drifts, mirroring the cbindgen header rule. Auth: optional bearer token (--mgmt-token / LUMEN_MGMT_TOKEN), enforced on everything but /health, and mandatory for non-loopback binds. PinGate gains a waiter count so the API can report pin_pending; logs moved to stderr so stdout stays machine-readable. Supersedes the web.rs stub. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 21:35:43 +00:00
enricobuehler	20bd76ae50	feat: M2 — gamescope virtual-display backend (spawn headless, capture its PipeWire node) Third compositor on the VirtualDisplay seam. gamescope's model differs from KWin/Mutter: it's not a runtime protocol but a micro-compositor we spawn — `gamescope --backend headless -W -H -r -- <app>` — which composites at the client's size AND refresh natively (so no separate refresh step), runs the app nested, and exports a built-in PipeWire node named "gamescope". The backend spawns it, discovers that node via pw-dump, and returns a VirtualOutput whose keepalive owns the process (drop = kill = teardown). App via LUMEN_GAMESCOPE_APP. Select with LUMEN_COMPOSITOR=gamescope; m0's virtual source now honors LUMEN_COMPOSITOR so any backend is testable without a client. Input (gamescope's libei/EIS socket) is a follow-up. Builds/clippy/fmt clean. Needs gamescope installed to validate; headless capture on the proprietary NVIDIA driver is plausible-by-architecture but unproven — validate empirically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 21:23:52 +00:00
enricobuehler	22a982a1cb	feat: M2 — drive KWin virtual output above 60Hz (custom mode at the client's refresh) KWin creates virtual outputs at a hardcoded 60 Hz and zkde stream_virtual_output has no refresh argument, so the source composited at 60 Hz even when the client asked for 120/240 (confirmed live: stream paced a stable 240 fps but only ~60 unique frames/s). KWin 6.6+ allows custom modes on virtual outputs, so after creating the output we install + select a mode at the client's refresh, before capture connects PipeWire. First cut shells out to kscreen-doctor (output is "Virtual-<name>"); the in-process kde_output_management_v2 client is a follow-up. Best-effort — failure leaves the source at 60 Hz (stream still works). Verified the mode is applied (Virtual-lumen -> 1280x720@120). Empirically de-risked that this headless QEMU VM's software vsync accepts >60 Hz. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 21:14:16 +00:00
enricobuehler	37ae26b4be	perf: M2 — auto 2-way NVENC split-encode for high pixel rates (5K@240) GB203 has two NVENC engines. A single HEVC p1 session tops out ~1 Gpix/s, so 5120x1440@240 (1.77 Gpix/s) is encoder-bound on one engine; split-frame encode runs it across both (~1.8x, latency-neutral, output is standard HEVC the client decodes normally). NVENC's AUTO split won't engage below ~2112px height, so force split_encode_mode=2 when the pixel rate exceeds ~1 Gpix/s (HEVC/AV1 only — not H.264). Below that (e.g. 5K@120) stay single-engine to avoid the ~2% BD-rate cost. Override with LUMEN_SPLIT_ENCODE. Verified: engages at 240, not at 120. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:42:41 +00:00
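The split-encode rule in the 37ae26b4be entry (force 2-way split above roughly 1 Gpix/s, HEVC/AV1 only) comes down to a pixel-rate comparison. The sketch below is illustrative only: the enum, function names, and the exact threshold of 1,000,000,000 pixels/s are assumptions, and the LUMEN_SPLIT_ENCODE override mentioned in the message is not modeled.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Codec {
    H264,
    Hevc,
    Av1,
}

/// ASSUMED threshold; the commit message only says "~1 Gpix/s".
pub const SPLIT_THRESHOLD_PIX_PER_SEC: u64 = 1_000_000_000;

/// Pixels per second the encoder must sustain at this mode.
pub fn pixel_rate(width: u32, height: u32, fps: u32) -> u64 {
    u64::from(width) * u64::from(height) * u64::from(fps)
}

/// True when a forced 2-way split would be requested under the assumed rule:
/// never for H.264, otherwise only above the pixel-rate threshold.
pub fn wants_split_encode(codec: Codec, width: u32, height: u32, fps: u32) -> bool {
    codec != Codec::H264 && pixel_rate(width, height, fps) > SPLIT_THRESHOLD_PIX_PER_SEC
}
```

5120x1440@240 is 1,769,472,000 pixels/s (the 1.77 Gpix/s in the message) and crosses the threshold, while 5120x1440@120 at 884,736,000 pixels/s stays single-engine, consistent with "engages at 240, not at 120".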
enricobuehler	a473f4a926	perf: M2 — amortize per-frame zero-copy overhead (pool buffers + register once) The zero-copy import did real per-frame GPU churn that capped high-fps throughput: a fresh ~29MB cuMemAllocPitch + cuMemFree, a cuGraphicsGLRegisterImage/unregister, and a map of the same persistent blit texture — every frame. Two fixes: - BufferPool: a recycled free-list of pitched device buffers per resolution. DeviceBuffer returns its allocation to the pool on drop (after the encoder synchronized) instead of freeing — kills the per-frame 29MB alloc/free that took the device allocator lock and serialized against the GPU. - RegisteredTexture: register the (reused) GL_RGBA8 blit destination with CUDA ONCE when the GlBlit is built; each frame only maps → copies the array → unmaps, instead of registering/unregistering every frame. This is the "zero-copy should be overhead-free" cleanup. Verified the import still produces correct frames; the remaining per-frame cuCtxSynchronize pair (shared-context coupling) is the next step (CUDA stream + events). lumen-host builds, clippy/fmt/tests clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:38:26 +00:00
enricobuehler	0e1853e070	fix: M2 — eliminate the periodic high-res stream freeze (infinite GOP + single-deadline pacing) At 5120x1440 the stream froze on a ~2s cadence. Two compounding causes (confirmed by a profiling pass + adversarial review): 1. Periodic IDR every 2s (set_gop(fps*2)). A keyframe at 5K is ~20-40x a P-frame — a recurring multi-millisecond encode+packetize+send spike. Fix: infinite GOP (gop_size=-1), one IDR at stream start, P-frames only; forced-idr makes a client recovery request (RFI via request_keyframe) emit an IDR on demand — the Moonlight/Sunshine low-latency model. 2. Two pacing timers summing on the capture/encode thread: a per-packet thread::sleep pacer (spread a frame's packets across a whole frame interval) PLUS a backstop sleep on top, so every frame cost 1-2x the interval and the big IDR blew through it (the 2->120 oscillation). Fix: delete both; send at line rate and drive cadence from a single absolute deadline. (Proper microburst pacing belongs on a dedicated send thread — a follow-up.) Also: honor the client's fps (pacing clamp 60->240) and add an env-gated (LUMEN_PERF) per-stage timing log (enc/pkt/send µs + unique-vs-reencoded frames + max packet burst) for diagnosing the remaining throughput ceiling. Verified live: freeze gone at 5120x1440. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:29:48 +00:00
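
"A single absolute deadline" means the next wake-up time is advanced by a fixed interval rather than derived from per-packet or per-frame sleeps, so delays do not accumulate. A minimal sketch, assuming a hypothetical `FramePacer` type; the resync-when-late rule is an illustrative choice, not something the commit describes:

```rust
use std::time::{Duration, Instant};

/// Drives frame cadence from one absolute deadline (hypothetical helper).
pub struct FramePacer {
    interval: Duration,
    next: Instant,
}

impl FramePacer {
    /// `fps` is the client's requested rate; `start` is the stream start time.
    pub fn new(fps: u32, start: Instant) -> Self {
        let interval = Duration::from_secs(1) / fps.max(1);
        Self { interval, next: start + interval }
    }

    /// Returns how long the caller should sleep at `now` to hit the current
    /// deadline, then advances the deadline by exactly one interval. If the
    /// new deadline is already in the past, resynchronize to `now` instead
    /// of bursting frames to catch up.
    pub fn wait_for(&mut self, now: Instant) -> Duration {
        let wait = self.next.saturating_duration_since(now);
        self.next += self.interval;
        if self.next < now {
            self.next = now + self.interval;
        }
        wait
    }
}
```

A capture/encode loop would call `std::thread::sleep(pacer.wait_for(Instant::now()))` once per frame; a frame that overruns simply gets a zero wait on the next iteration rather than paying a second sleep on top.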
enricobuehler	669d40ae21	build: migrate to ffmpeg-next 8 (FFmpeg 8.x / libavcodec 62) Ubuntu 26.04 ships FFmpeg 8.0 (libavcodec 62); bump ffmpeg-next 7.1 -> 8.1 to bind it as the intended pairing. No source changes needed — the encode API surface we use (avcodec_send_frame, hwframe contexts, AV_PIX_FMT_CUDA, av_log) is stable across 7->8. Workspace builds + all tests green; clippy/fmt clean. Refresh the 7.x doc references. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 18:13:40 +00:00
enricobuehler	7d08e43c16	feat: M2 — KWin virtual-output backend behind a VirtualDisplay trait (native client resolution) Honor the client's requested resolution by rendering a compositor virtual output at exactly that size — native, headless, no scaling. There is no cross-compositor Wayland protocol for this, so it's a per-compositor backend behind the (previously stubbed) VirtualDisplay trait. - vdisplay.rs: VirtualDisplay::create(mode) now returns a live VirtualOutput { node_id, remote_fd: Option<OwnedFd>, keepalive } with RAII teardown (drop releases the output) instead of an inert OutputHandle + explicit destroy. Add compositor detect() (LUMEN_COMPOSITOR / XDG_CURRENT_DESKTOP). - vdisplay/kwin.rs: the KWin backend — the zkde_screencast_unstable_v1 stream_virtual_output client (vendored protocol XML + wayland-scanner codegen). Creates a WxH output, returns its PipeWire node (default daemon, remote_fd=None); a keepalive thread holds the Wayland connection until dropped. (Moved here from capture/kwin.rs — it's a vdisplay backend, not capture.) - capture: generalize the PipeWire consumer to Option<OwnedFd> (portal remote vs. default daemon) and add capture_virtual_output(vout), compositor-agnostic, owning the keepalive. - gamestream/stream.rs: LUMEN_VIDEO_SOURCE=virtual creates a virtual display sized to the client's cfg and captures it (self-contained, not pooled — a reconnect at a new resolution gets a fresh output). - m0: --source kwin-virtual goes through the trait. Verified end-to-end against the running headless KWin: the request reaches the compositor and is handled cleanly. Native creation needs a backend implementing createVirtualOutput — the DRM backend, or the VirtualBackend since KWin 6.5.6; on this box's --virtual 6.4.5 it returns "Could not find output" (expected; validates after the KWin upgrade). wlroots/Mutter backends are the next ones to land on the same seam. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:30:02 +00:00
enricobuehler	6508980564	feat: M2 — validate client-requested video mode (codec dimension guards) Clients pick the resolution via mode=WxHxFPS / RTSP clientViewportWd-Ht, so the host must bound attacker/typo-controlled dimensions before allocating buffers or opening NVENC. Add encode::validate_dimensions: reject zero, odd, and over-limit modes (H.264 ≤ 4096px/side; HEVC/AV1 ≤ 8192) with a clear message instead of a buffer-math overflow or an opaque NVENC open failure. Gate both the stream path (before any allocation) and open_video (also covers m0). Unit-tested. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 16:40:56 +00:00
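
The guard described above can be sketched as follows. The rules (reject zero, odd, and over-limit modes; 4096 px per side for H.264, 8192 for HEVC/AV1) come from the commit message; the signature, the `Codec` enum, and the error strings are assumptions and may differ from the real `encode::validate_dimensions`.

```rust
/// Illustrative codec enum (not the project's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Codec {
    H264,
    Hevc,
    Av1,
}

/// Reject client-supplied modes before any buffer allocation or encoder open.
pub fn validate_dimensions(codec: Codec, width: u32, height: u32) -> Result<(), String> {
    // Per-side limits quoted in the commit message.
    let max_side = match codec {
        Codec::H264 => 4096,
        Codec::Hevc | Codec::Av1 => 8192,
    };
    if width == 0 || height == 0 {
        return Err(format!("invalid mode {width}x{height}: zero dimension"));
    }
    if width % 2 != 0 || height % 2 != 0 {
        return Err(format!("invalid mode {width}x{height}: dimensions must be even"));
    }
    if width > max_side || height > max_side {
        return Err(format!(
            "invalid mode {width}x{height}: {codec:?} is limited to {max_side} px per side"
        ));
    }
    Ok(())
}
```

For example, 5120x1440 passes for HEVC but is rejected for H.264 because the width exceeds 4096.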
enricobuehler	aa91485008	feat: M2 — complete zero-copy dmabuf→NVENC capture path (EGL/GL→CUDA) The PipeWire dmabuf now reaches NVENC with no CPU touch. Verified live against headless KWin: a tiled BGRx dmabuf is imported and encoded to a pixel-correct H.265 stream (decoded frame matches the captured desktop — no tiling artifacts, no colour swap). The CPU-copy path stays the default and the runtime fallback. Capture side (zerocopy::egl): desktop NVIDIA can't register a dmabuf EGLImage with CUDA directly (cuGraphicsEGLRegisterImage is Tegra-only; cuGraphicsGLRegisterImage rejects EGLImage-backed textures), so we follow OBS/Sunshine — bind the EGLImage to a GL texture, render it through a fullscreen-triangle shader into an immutable GL_RGBA8 texture (de-tiling + .bgra swizzle to the BGRx the encoder wants), then register that texture with CUDA and copy it device-to-device into an owned buffer so the dmabuf returns to the compositor immediately. Encode side (encode/linux::submit_cuda): take a pooled CUDA surface via av_hwframe_get_buffer and device→device-copy our imported buffer into it, instead of wrapping our own pointer in a bare AVFrame. A bare frame is rejected with EINVAL (NVENC ignores frames with null buf[0]; the encode path's av_frame_ref needs a refcounted buffer), and a fresh device pointer every frame would thrash NVENC's bounded resource-registration cache — the pool recycles a small set. Also: gate FFmpeg AV_LOG_DEBUG behind LUMEN_FFMPEG_DEBUG for diagnosing hw-frame rejects, and refresh the now-accurate module docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 16:28:29 +00:00
enricobuehler	e3876c0d8a	feat: M2 zero-copy — PipeWire dmabuf negotiation + EGL device-platform import (WIP) Wire the capture side of zero-copy (LUMEN_ZEROCOPY=1): - EGL importer now opens the headless EGLDisplay on the NVIDIA EGL device (EGL_PLATFORM_DEVICE_EXT) and queries its importable DRM modifiers (eglQueryDmaBufModifiersEXT). - The PipeWire stream advertises a BGRx dmabuf format with those modifiers as a mandatory enum Choice + a dmabuf-only Buffers param; the compositor fixates an importable tiled modifier. param_changed reads the negotiated modifier; the process callback imports the dmabuf (eglCreateImage with explicit LO/HI modifier) and would copy it into a CUDA buffer for the encoder. Validated against headless KWin (Plasma 6.4): negotiation succeeds (13 NVIDIA modifiers advertised, KWin fixates one, stream reaches Streaming with a real tiled dmabuf) and `eglCreateImage` succeeds. The remaining blocker is `cuGraphicsEGLRegisterImage` returning CUDA_ERROR_INVALID_VALUE on the dmabuf-imported EGLImage — the likely fix is to bind the EGLImage to a GL texture (glEGLImageTargetTexture2DOES) and register that via cuGraphicsGLRegisterImage (OBS/Sunshine's path), which needs a GL context. The CPU-copy path stays the default and is unaffected (regression-checked: real KWin capture → HEVC). LUMEN_ZEROCOPY is opt-in/experimental until the CUDA registration lands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:41:31 +00:00
enricobuehler	16a00563a8	feat: M2 zero-copy foundation — EGL→CUDA import + NVENC CUDA-frame path Scaffolding for dmabuf zero-copy (plan §9), opt-in via LUMEN_ZEROCOPY: - src/zerocopy/{cuda,egl}.rs: hand-rolled CUDA Driver-API FFI (no Rust crate exposes the EGL-interop calls / CUeglFrame) with a shared process-wide CUcontext + pitched device buffers; an EGL importer (GBM platform on the NVIDIA render node) that turns a dmabuf into an EGLImage, registers it with CUDA, and copies it device-to-device into an owned buffer. `zerocopy-probe` subcommand validates the FFI/linking/GPU access — confirmed on the box (driver 595, EGL_EXT_image_dma_buf_import + modifiers). - CapturedFrame gains a FramePayload enum (Cpu(Vec<u8>) \| Cuda(DeviceBuffer)); the encoder branches: CPU keeps the expand+upload path, CUDA wraps the device buffer in an AV_PIX_FMT_CUDA frame fed straight to hevc_nvenc (sharing our CUcontext via a hand-declared AVCUDADeviceContext, since ffmpeg-sys doesn't bind hwcontext_cuda.h). open_video/the encoder take a `cuda` flag derived from the first frame's payload. The capture-side dmabuf negotiation (which produces the Cuda frames) is the next step; the CPU path is unchanged and remains the default + fallback. Builds clean, clippy clean, tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:13:05 +00:00
enricobuehler	b64be1dc33	fix: m0 portal capture — activate the capturer so frames are delivered The M2 teardown work added an `active` gate to the PipeWire capture callback (idle by default so reconnects stay cheap, with the stream path calling set_active(true) on PLAY). The `m0` subcommand was never updated, so its portal capturer stayed inactive and the callback dropped every frame — `m0 --source portal` failed with "no PipeWire frame within 10s" on every compositor. Call set_active(true) before the capture loop. Validated on headless KWin (Plasma 6.4) via the RemoteDesktop-anchored ScreenCast session: real desktop frames flow (shm BGRx 1920x1080) and encode to valid H.265. (Also folds in a rustfmt reflow of the input-test log line.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 14:24:09 +00:00
enricobuehler	0a79a8209b	feat: M2 — RemoteDesktop-anchored ScreenCast capture for KWin/GNOME On RemoteDesktop-capable desktops (KWin, GNOME), select the ScreenCast source on a session created via the RemoteDesktop portal and start it through RemoteDesktop, so a single grant — pre-authorized headlessly via the `kde-authorized` permission, exactly like the libei input path — also covers screen capture. Standalone ScreenCast has no such bypass and would raise an un-clickable dialog on a headless box. wlroots/Sway has no RemoteDesktop portal, so it keeps the plain ScreenCast session; the choice keys off inject::default_backend(). The PipeWire consumer is unchanged — the anchored session yields the same fd + node id. Validated on headless KWin (Plasma 6.4): the portal grants the session with no dialog and PipeWire negotiates the format (1920x1080 BGRx, Streaming). Frame delivery on KWin still pends dmabuf import — KWin hands GPU dmabuf buffers and the M0 consumer is CPU-copy/shm only (plan §9, zero-copy) — so it's the next step; the CPU-copy path remains correct on wlroots. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 14:09:19 +00:00
enricobuehler	03a6a67354	feat: M2 P1.7 — libei input backend (portable to KWin/GNOME) Add a second input-injection backend that works on compositors implementing the org.freedesktop.portal.RemoteDesktop interface (KWin, GNOME/Mutter), where the wlroots virtual-input protocols are absent. Uses ashpd 0.13 to open a RemoteDesktop session + EIS fd and reis 0.6.1 to drive it as an EI sender: bind pointer/keyboard/scroll/button capabilities and, per device, start_emulating → emit → frame. Runs on a dedicated thread with its own tokio runtime (the portal session + EIS connection must stay alive and the event stream must be polled continuously); open() returns immediately so a slow or denied portal can never freeze the ENet control thread, with events enqueued over an unbounded channel until devices resume. Backend now auto-selects per session (inject::default_backend): wlr on Sway, libei on KDE/GNOME; LUMEN_INPUT_BACKEND overrides. Refactor inject.rs into the inject/{wlr,libei}.rs layout matching the capture/encode convention. Keyboard codes are evdev (the same space our VK→evdev table produces) and the compositor supplies the keymap, so no keymap upload and no modifier serialization — pressing the modifier keys Moonlight sends is enough. Add a `lumen-host input-test` subcommand that injects a scripted mouse+keyboard pattern through the session backend, so input injection can be validated without a Moonlight client. Live-validated on headless KWin (Plasma 6.4): mouse motion, left click, and the 'A' key inject correctly and are delivered to the focused client. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 13:58:41 +00:00
enricobuehler	6de09fd822	feat: M2 teardown — persistent capturers for clean reconnects Disconnect/reconnect now works reliably. Previously each stream spawned its own portal+PipeWire (and PipeWire audio) capture threads and never stopped them, so a reconnect opened a SECOND screencast session that conflicted with the leaked first one ("no PipeWire frame within 10s" → black screen on reconnect). - The screen capturer and audio capturer are now persistent, held in AppState and reused across streams (created on the first stream). One screencast session for the host's lifetime → no conflict, and instant reconnect (no re-handshake). Verified live: 3 stream cycles, 1 create + 2 "reusing capturer", clean every time. - Capturer::set_active gates the (5K, ~1.3 GB/s) de-pad copy to active streams, so the persistent video capturer is nearly free while idle between streams. - AudioCapturer::drain discards buffered chunks on reuse so the client never hears stale audio captured while idle. - stream.rs / gamestream/audio.rs split into a borrow-the-capturer wrapper + the encode/send body, so the capturer is always returned to its slot on exit. This holds whether the client reconnects via /resume (Moonlight's "running → play/continue") or a fresh /launch — both re-run RTSP PLAY → a new stream cycle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 12:35:10 +00:00
enricobuehler	af4360c930	feat: M2 P1.5 robustness — IDR-on-request, send pacing, min-parity floor Graceful FEC behavior on a lossy link: at a realistic 2% packet loss the stream is now steady 0% (was spiking 40-60%). Verified live. - IDR/RFI handling: the control thread recognizes the client's recovery requests (0x0301 invalidate-reference-frames, 0x0302 request-IDR, 0x0305) and sets a shared force_idr flag; the video thread forces an NVENC keyframe on the next frame (Encoder::request_keyframe → input frame pict_type = I). Without this, a frame that exceeds the FEC budget broke the reference chain until the next GOP IDR (~2s), cascading to most of the stream being undecodable. - Min-parity floor: honor the client's x-nv-vqos[0].fec.minRequiredFecPackets (it asks for 2). Small P-frames previously got m=ceil(k*20/100)=1 parity — a single loss broke them; flooring m>=2 (capped so k+m<=255, wire pct recomputed) protects them. This is what turned the 2% spikes into steady 0%. - Send pacing: spread each frame's packets evenly across the frame interval instead of blasting them at line rate (a real link drops microbursts), matching Sunshine's rate-controlled sends; sub-500us sleeps skipped (unreliable). Note: sustained ~8% uniform loss still degrades — that exceeds 20% FEC for reference-frame video and real Sunshine degrades there too; real networks are <1% or bursty, which this now handles cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 12:14:59 +00:00
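
The min-parity floor can be written out as arithmetic. A sketch, assuming a hypothetical `parity_plan` helper: the parity count is m = ceil(k*pct/100), floored at the client's minimum and capped so that k + m <= 255, and the wire percentage is recomputed from the final m. Integer (truncating) division for the wire percentage is an assumption here.

```rust
/// Returns (parity shard count m, recomputed wire percentage) for a block of
/// `k` data shards, or None if `k` is zero. Hypothetical helper.
pub fn parity_plan(k: u8, fec_pct: u8, min_parity: u8) -> Option<(u32, u32)> {
    if k == 0 {
        return None;
    }
    let k = u32::from(k);
    let pct = u32::from(fec_pct);
    if pct == 0 {
        // 0 means data-only: no parity shards at all.
        return Some((0, 0));
    }
    // m = ceil(k * pct / 100)
    let m = (k * pct + 99) / 100;
    // Floor at the client's minimum, then cap so k + m never exceeds 255.
    let m = m.max(u32::from(min_parity)).min(255 - k);
    // Wire percentage recomputed from the final m (truncating division assumed).
    let wire_pct = 100 * m / k;
    Some((m, wire_pct))
}
```

With k = 4 and 20% FEC, the unfloored result is a single parity shard, which one lost packet defeats; with the floor of 2 requested by the client the block carries two parity shards and the wire percentage becomes 50.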
enricobuehler	72f8c05aa3	feat: M2 P1.5 (FEC) — nanors-exact Reed-Solomon recovery for the video stream Moonlight now reconstructs lost video shards from our parity (verified live: under induced packet loss the picture recovers cleanly instead of failing with "network connection too bad"; 0% added loss in normal operation). The decisive finding: Moonlight's nanors uses a CAUCHY generator matrix (M[j][i] = inv[(m+i)^j], GF(2^8) poly 0x1d), while reed-solomon-erasure is Vandermonde — so its parity was NOT Moonlight-decodable, despite the old gf8.rs comment claiming equivalence. lumen-core: - Swap the GF(2^8) backend from reed-solomon-erasure to a vendored fec-rs (vendor/fec-rs, BSD-2), which builds the byte-identical Cauchy matrix. Pure Rust, no FFI — keeps the "one core" hot path. This makes both lumen's own protocol and the GameStream parity nanors-compatible. - Lock it with a regression test against real nanors vectors (k=4,m=2 [10,20,30,40] -> parity [136,0]) + an independent matrix-derived cross-check + an erase/recover round-trip. Existing FEC/loopback tests stay green, so lumen's own protocol is unaffected. lumen-host video.rs: - Generate m = ceil(k*pct/100) parity shards per FEC block via Gf8Coder; stamp fecInfo with the recomputed wire pct (100m/k) so the client derives the same count; cap per-block data to 255*100/(100+pct) so k+m <= 255. - CRITICAL byte-exactness: RS runs over the whole `blocksize` shard (Moonlight decodes packetSize+16 bytes from the datagram start and PACKET_RECOVERY_FAILUREs on a bad reconstructed `flags` byte). So the NV header fields RS must reproduce (streamPacketIndex/frameIndex/flags/multiFec) are written into data shards BEFORE encode, and only the transport fields (RTP header/seq/timestamp + fecInfo) are stamped AFTER — leaving the flags byte RS-covered. Matches Sunshine stream.cpp. Unit-tested incl. flags recovery. - fec_percentage wired from stream.rs (Sunshine default 20, LUMEN_FEC_PCT override; 0 = data-only). 
LUMEN_VIDEO_DROP injects loss to test recovery. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 11:34:27 +00:00
enricobuehler	278a6330de	feat: M2 P1.6 — audio (Opus + AES-CBC) and steady-rate video pacing A stock Moonlight client now gets video + full input + AUDIO from the from-scratch GameStream host (verified live end-to-end on a macOS client). Audio (audio.rs, audio/linux.rs, gamestream/audio.rs): - Capture the default PipeWire sink's monitor (system output) as interleaved f32 stereo @ 48kHz via stream.capture.sink, on its own thread. - Opus-encode 5ms/240-sample stereo frames (RESTRICTED_LOWDELAY, CBR) and send as GameStream RTP audio: 12-byte BE RTP_PACKET (packetType 97, seq+1/pkt, timestamp += packetDuration, ssrc 0) on UDP 48000, after learning the client endpoint from its port-learning ping. - Encrypt the Opus payload with AES-128-CBC (PKCS7), key = launch rikey, IV = BE32(rikeyid + seq) in [0..4]. Like the control stream, modern Moonlight always decrypts audio regardless of the negotiated flags — plaintext makes it log "Failed to decrypt audio packet" and play silence (diagnosed from the client log). RTP header stays in the clear. Scheme cross-checked against Sunshine stream.cpp/crypto.cpp + moonlight AudioStream.c. - Pace each frame to its 5ms slot (PipeWire delivers ~1024-frame buffers) to avoid bursts the client's jitter buffer hears as glitches. LUMEN_AUDIO_GAIN applies optional linear gain for quiet sources. - DESCRIBE SDP advertises the stereo Opus config (a=fmtp:97 surround-params). Video (stream.rs): pace at a steady ≤60fps, re-encoding the last captured frame when the compositor produces none. wlroots only emits on damage, so a static or slow-updating desktop previously starved the client into a "network too slow" abort; an unchanged frame costs a near-empty P-frame. Adds a non-blocking Capturer::try_latest (portal drains to the freshest queued frame). Misc: serialize pipewire init across the video + audio capture threads (pwinit.rs, std::sync::Once) to avoid a concurrent pw_init race. Deps: opus, cbc; libopus-dev in bootstrap-ubuntu.sh. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 10:39:22 +00:00
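
The audio packet framing described above can be sketched as two small helpers. The field values (packetType 97, ssrc 0, 12-byte big-endian header, IV = BE32(rikeyid + seq) in bytes 0..4) come from the commit message. The function names, the `u32` type for `rikeyid`, the wrapping addition, and the `0x80` first byte (RTP version 2, no padding or extension) are assumptions for illustration.

```rust
/// Build the 16-byte AES-CBC IV for one audio packet: the big-endian sum of
/// rikeyid and the packet sequence number in bytes 0..4, the rest zero.
/// Wrapping addition is an assumption about overflow behaviour.
pub fn audio_iv(rikeyid: u32, seq: u16) -> [u8; 16] {
    let mut iv = [0u8; 16];
    iv[0..4].copy_from_slice(&rikeyid.wrapping_add(u32::from(seq)).to_be_bytes());
    iv
}

/// Build the 12-byte big-endian RTP header that stays in the clear.
pub fn rtp_audio_header(seq: u16, timestamp: u32) -> [u8; 12] {
    let mut h = [0u8; 12];
    h[0] = 0x80; // assumption: RTP version 2, no padding/extension/CSRC
    h[1] = 97; // packetType for Opus audio, per the commit message
    h[2..4].copy_from_slice(&seq.to_be_bytes());
    h[4..8].copy_from_slice(&timestamp.to_be_bytes());
    // Bytes 8..12 are the ssrc, which stays 0.
    h
}
```

The caller increments `seq` by one per packet and advances `timestamp` by the negotiated packet duration; only the Opus payload that follows the header is encrypted.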
enricobuehler	4c2c41acba	feat: M2 P1.4 — control-stream decryption + input injection (mouse/keyboard live) A stock Moonlight client can now drive the headless Sway desktop: mouse movement, buttons, scroll, and keyboard all inject through the streamed session (verified live end-to-end — typing, clicking, window management). Control stream (gamestream/control.rs): - Moonlight encrypts the ENet control stream with AES-128-GCM even though we negotiate no media encryption (it detects our Sunshine `state` and turns it on). Decrypt per-packet under the /launch `rikey`. - The exact GCM scheme is auto-detected on the first authenticating packet (nonce construction × key byte-order × tag position × AAD) since GCM gives no partial credit. Our client uses the legacy 16-byte nonce (`iv[0]=seq&0xff`) because we advertise no encryption; the 12-byte SS_ENC_CONTROL_V2 nonce is also supported. Key/IV/tag layout cross-checked against Sunshine stream.cpp + crypto.cpp and moonlight-common-c ControlStream.c. Input decode (gamestream/input.rs): - Decrypted control messages (`[u16 type][u16 len][NV_INPUT packet]`, type 0x0206) decode into lumen_core::input::InputEvent: relative/abs mouse, buttons, vert/horiz scroll, keyboard down/up. Struct layout from moonlight Input.h (size BE, magic LE, body BE; keyCode LE masked to the low-byte VK), dispatch per Sunshine input.cpp (Gen5+). Unit-tested against real captured bytes. Injection (inject.rs): - WlrootsInjector: connects to Sway as a Wayland client and injects via the wlroots virtual-pointer + virtual-keyboard protocols (uinput is invisible to a compositor running WLR_LIBINPUT_NO_DEVICES=1). Uploads an evdev/US xkb keymap, tracks modifier state, and maps Windows VK → Linux evdev (full table). Deps: aes-gcm, wayland-client, wayland-protocols-{wlr,misc}, xkbcommon (+ libxkbcommon-dev in bootstrap-ubuntu.sh). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:56:19 +00:00
enricobuehler	c8491af893	feat(m2): real desktop capture in the video stream (portal → Moonlight) Wire M0's portal desktop capture into the GameStream video plane: with LUMEN_VIDEO_SOURCE=portal the stream captures the headless wlroots desktop (PipeWire RGB) instead of the synthetic pattern, opens NVENC from the first captured frame's format/size, and streams it. Verified live: a stock Moonlight client shows the real 5120×1440 desktop at ~42 fps (release build). - capture.rs: FastSyntheticCapturer (cheap fill pattern, real-time at 5K) so both sources share the Capturer trait - stream.rs: source select (portal \| synthetic), encoder opened from the first frame, wall-clock 90 kHz RTP timestamps (correct under a variable capture rate) Note: the CPU-copy RGB→rgb0 path caps ~42 fps at 5K (single-threaded); dmabuf zero-copy is the deferred optimization (plan §9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:51:49 +00:00
enricobuehler	de60650ed3	feat(m2): live video to stock Moonlight — ENet control + video data plane A stock Moonlight client now decodes H.265 from the lumen host end-to-end (verified at 5120×1440@120 on RTX 5070 Ti): - control.rs: ENet control host on UDP 47999 (rusty_enet). Moonlight starts the control stream before video (STAGE_CONTROL_STREAM_START precedes _VIDEO_), so it must be up first — this was the blocker behind the earlier "error 35". - stream.rs: video data plane — on RTSP PLAY, learn the client endpoint from its ping, NVENC-encode at the negotiated mode, packetize (GameStream RTP/NV/FEC), send over UDP 47998; stops when the client disconnects. - rtsp.rs: ANNOUNCE → StreamConfig (resolution/fps/packetSize/bitrate/codec), PLAY starts the stream, TEARDOWN stops it; PairStatus=1 over the mutual-TLS port. P1.3 uses a synthetic test pattern + data-shards-only FEC (clean-LAN). Next: real portal desktop capture, input injection (decode control → uinput), nanors-exact FEC, encryption, audio. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:39:14 +00:00
enricobuehler	ab6dda2e5f	feat: M0 capture→encode pipeline + M2 GameStream host (pairing, RTSP, video) M0 (lumen-host) — verified on NVIDIA RTX 5070 Ti / Ubuntu 25.10: headless wlroots → xdg ScreenCast portal → PipeWire → NVENC HEVC → playable file, with each access unit round-tripped through a lumen_core host↔client Session (FEC + packetize + reassemble), 0 mismatches. - capture.rs: SyntheticCapturer + portal capture (ashpd 0.13 + pipewire 0.9), format-aware - encode/linux.rs: NVENC via ffmpeg-next 7 (BGRx/RGB → rgb0, no host-side swscale) - m0.rs: capture→encode→file + lumen-core loopback verification M2 P1 (lumen-host gamestream/) — a stock Moonlight client pairs + launches, verified live: - mDNS _nvstream._tcp + nvhttp /serverinfo (HTTP 47989, mutual-TLS HTTPS 47984) - 4-phase pairing: PIN→AES-128-ECB / SHA-256 / RSA-PKCS1v15 / X.509, custom rustls ClientCertVerifier for the mutual-TLS pairchallenge - /applist, /launch (rikey/rikeyid/mode), hand-rolled RTSP (OPTIONS/DESCRIBE/SETUP×3/ ANNOUNCE/PLAY, one-request-per-TCP-connection per moonlight-common-c's read-to-EOF) - video.rs: GameStream RTP + NV_VIDEO_PACKET wire packetizer, data-shards-only (0% FEC, clean-LAN), unit-tested (single/multi-block) Docs: docs/m2-plan.md (phased plan) + docs/research/ (ground-truth protocol spec). Bootstrap/setup updated for the verified path (libnvidia-gl, render/video groups, GPU EGL, pipewire 0.9). Workspace clippy-clean, tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:14:59 +00:00
enricobuehler	a913042367	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold Ground-up low-latency streaming stack per docs/implementation-plan.md. M1 is complete and tested; Linux host backends are cfg-gated stubs to be filled in on real hardware (M0/M2). lumen-core (built + tested on macOS/aarch64 — 21 tests): - fec: ErasureCoder over GF(2^8) (reed-solomon-erasure, Moonlight-compatible) and GF(2^16) Leopard-RS (reed-solomon-simd, the >1 Gbps wall-breaker); proptested - packet: zero-copy #[repr(C)] framing, multi-block, FEC-aware reassembly - crypto: AES-128-GCM with per-direction nonce salts + sequence-as-AAD - session: host submit / client poll hot paths + input; loopback & UDP transports - abi: opaque handles, versioned LumenConfig, panic guards; cbindgen-generated header - acceptance: Rust loopback+proptest and a C harness that links the staticlib Scaffold (compiles green on all platforms): lumen-host (vdisplay/capture/encode/ inject/web/pipeline seams under cfg(linux)), lumen-client-rs, tools/{loss-harness, latency-probe}, Apple/Android client stubs, Gitea CI, docs. Hardened against a multi-agent adversarial review (13 verified findings fixed, regression-tested): reassembler memory-DoS bounds + block-consistency validation, GCM nonce-reuse direction separation, ABI struct_size guard + range checks, FEC shard-length guards, shard_payload datagram bound, key zeroization + Debug redaction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 00:02:52 +02:00

41 Commits