punktfunk

Author	SHA1	Message	Date
enricobuehler	751789f932	feat: M2 — LINEAR-dmabuf CUDA import attempt + graceful zero-copy fallback (gamescope) gamescope only offers LINEAR dmabufs, which the EGL/GL interop path can't handle (NVIDIA's EGL lists no LINEAR modifier for sampling). Attempt a direct CUDA external-memory import (cuImportExternalMemory OPAQUE_FD, cached per buffer fd, one DtoD copy per frame into the pooled buffer): the FFI + plumbing are in place, and LINEAR(0) is now advertised alongside the tiled EGL modifiers (tiled first, so KWin still prefers it — regression-tested). Empirically the 595 desktop driver rejects raw dmabuf fds as OPAQUE_FD (CUDA_ERROR_UNKNOWN), matching the documented limitation — true LINEAR GPU import needs a Vulkan interop bridge (import dmabuf via VK_EXT_external_memory_dma_buf, GPU-copy into an exportable allocation, hand that to CUDA), noted as future work. So the importer now degrades instead of dying: on GPU-import failure it logs once, disables itself, and falls through to the CPU mmap path. Validated: gamescope + LUMEN_ZEROCOPY=1 runs full-rate (122.9 fps @720p120, valid HEVC) via the fallback; KWin keeps real zero-copy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:43:35 +00:00
enricobuehler	7f3897e0d3	feat: M2 — gamescope input via its EIS socket (SteamOS-like input path) gamescope runs its own EIS server and exports the socket to its children as LIBEI_SOCKET — no portal involved. The gamescope backend now launches the nested app through a tiny shell wrapper that relays that value to /tmp/lumen-gamescope-ei; the libei injector gains an EiSource enum (Portal | SocketPathFile) and connects a UnixStream directly to gamescope's socket (polling until the app has started), then runs the identical reis sender flow. Backend::GamescopeEi is auto-selected when LUMEN_COMPOSITOR=gamescope (LUMEN_INPUT_BACKEND=gamescope overrides). Validated end-to-end: input-test against a headless gamescope running xev — 129 MotionNotify/KeyPress/ButtonPress events delivered into the nested X app ("Gamescope Virtual Input" device bound, sender handshake + emulation working). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:34:48 +00:00
enricobuehler	c8f9032dec	feat: M2 — harden gamescope capture path (blocked on gamescope ≥3.16.22 upstream fix) Deep investigation (gdb + daemon traces) proved the gamescope capture stall is a gamescope 3.16.20 bug, not ours: it calls pw_loop_iterate() without pw_loop_enter()/leave(), and under PipeWire 1.6's loop locking its main thread permanently holds the loop mutex — the pw thread deadlocks, gamescope never acks the daemon's port_set_param(Format), and the link parks in "negotiating" silently. Stock gst pipewiresrc fails identically. Fixed upstream by gamescope commit e3ed1ea7 ("pipewire: Fix pipewire loop locking", pipewire#5148); first release 3.16.22. Ubuntu 26.04 ships 3.16.20 (built ten days before the fix) — patch/upgrade required. Consumer-side improvements from the investigation (all verified correct vs gamescope's pods, and needed once the producer is fixed): - discover the node from gamescope's own "stream available on node ID: N" log line (its node.name appears on two objects; the advertised id is authoritative); pw-dump fallback - CPU path accepts mappable dmabufs: Buffers param now offers MemPtr|MemFd|DmaBuf (gamescope counter-offers exactly DmaBuf when its modifier pod wins, never MemPtr), mmap the fd ourselves when MAP_BUFFERS didn't (Vulkan-exported dmabufs aren't flagged mappable), and treat chunk.size==0 as the computed span - warn_once on every silent frame-drop path in the process callback - node.dont-reconnect on our capture streams: an orphaned stream re-targeted by wireplumber onto a fresh node wedges it — and a stuck link head-blocks the daemon's shared work queue, stalling ALL new link negotiation system-wide (this poisoned whole test sessions) - LUMEN_GAMESCOPE_NODE (attach to an existing gamescope) + LUMEN_PW_FIXED_POD (negotiation bisection) debug knobs KWin path regression-tested (zero-copy intact). gamescope end-to-end validation pending the patched gamescope build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 22:16:53 +00:00
enricobuehler	20bd76ae50	feat: M2 — gamescope virtual-display backend (spawn headless, capture its PipeWire node) Third compositor on the VirtualDisplay seam. gamescope's model differs from KWin/Mutter: it's not a runtime protocol but a micro-compositor we spawn — `gamescope --backend headless -W -H -r -- <app>` — which composites at the client's size AND refresh natively (so no separate refresh step), runs the app nested, and exports a built-in PipeWire node named "gamescope". The backend spawns it, discovers that node via pw-dump, and returns a VirtualOutput whose keepalive owns the process (drop = kill = teardown). App via LUMEN_GAMESCOPE_APP. Select with LUMEN_COMPOSITOR=gamescope; m0's virtual source now honors LUMEN_COMPOSITOR so any backend is testable without a client. Input (gamescope's libei/EIS socket) is a follow-up. Builds/clippy/fmt clean. Needs gamescope installed to validate; headless capture on the proprietary NVIDIA driver is plausible-by-architecture but unproven — validate empirically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 21:23:52 +00:00
enricobuehler	22a982a1cb	feat: M2 — drive KWin virtual output above 60Hz (custom mode at the client's refresh) KWin creates virtual outputs at a hardcoded 60 Hz and zkde stream_virtual_output has no refresh argument, so the source composited at 60 Hz even when the client asked for 120/240 (confirmed live: stream paced a stable 240 fps but only ~60 unique frames/s). KWin 6.6+ allows custom modes on virtual outputs, so after creating the output we install + select a mode at the client's refresh, before capture connects PipeWire. First cut shells out to kscreen-doctor (output is "Virtual-<name>"); the in-process kde_output_management_v2 client is a follow-up. Best-effort — failure leaves the source at 60 Hz (stream still works). Verified the mode is applied (Virtual-lumen -> 1280x720@120). Empirically de-risked that this headless QEMU VM's software vsync accepts >60 Hz. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 21:14:16 +00:00
enricobuehler	37ae26b4be	perf: M2 — auto 2-way NVENC split-encode for high pixel rates (5K@240) GB203 has two NVENC engines. A single HEVC p1 session tops out ~1 Gpix/s, so 5120x1440@240 (1.77 Gpix/s) is encoder-bound on one engine; split-frame encode runs it across both (~1.8x, latency-neutral, output is standard HEVC the client decodes normally). NVENC's AUTO split won't engage below ~2112px height, so force split_encode_mode=2 when the pixel rate exceeds ~1 Gpix/s (HEVC/AV1 only — not H.264). Below that (e.g. 5K@120) stay single-engine to avoid the ~2% BD-rate cost. Override with LUMEN_SPLIT_ENCODE. Verified: engages at 240, not at 120. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:42:41 +00:00
enricobuehler	a473f4a926	perf: M2 — amortize per-frame zero-copy overhead (pool buffers + register once) The zero-copy import did real per-frame GPU churn that capped high-fps throughput: a fresh ~29MB cuMemAllocPitch + cuMemFree, a cuGraphicsGLRegisterImage/unregister, and a map of the same persistent blit texture — every frame. Two fixes: - BufferPool: a recycled free-list of pitched device buffers per resolution. DeviceBuffer returns its allocation to the pool on drop (after the encoder synchronized) instead of freeing — kills the per-frame 29MB alloc/free that took the device allocator lock and serialized against the GPU. - RegisteredTexture: register the (reused) GL_RGBA8 blit destination with CUDA ONCE when the GlBlit is built; each frame only maps → copies the array → unmaps, instead of registering/unregistering every frame. This is the "zero-copy should be overhead-free" cleanup. Verified the import still produces correct frames; the remaining per-frame cuCtxSynchronize pair (shared-context coupling) is the next step (CUDA stream + events). lumen-host builds, clippy/fmt/tests clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:38:26 +00:00
enricobuehler	0e1853e070	fix: M2 — eliminate the periodic high-res stream freeze (infinite GOP + single-deadline pacing) At 5120x1440 the stream froze on a ~2s cadence. Two compounding causes (confirmed by a profiling pass + adversarial review): 1. Periodic IDR every 2s (set_gop(fps*2)). A keyframe at 5K is ~20-40x a P-frame — a recurring multi-millisecond encode+packetize+send spike. Fix: infinite GOP (gop_size=-1), one IDR at stream start, P-frames only; forced-idr makes a client recovery request (RFI via request_keyframe) emit an IDR on demand — the Moonlight/Sunshine low-latency model. 2. Two pacing timers summing on the capture/encode thread: a per-packet thread::sleep pacer (spread a frame's packets across a whole frame interval) PLUS a backstop sleep on top, so every frame cost 1-2x the interval and the big IDR blew through it (the 2->120 oscillation). Fix: delete both; send at line rate and drive cadence from a single absolute deadline. (Proper microburst pacing belongs on a dedicated send thread — a follow-up.) Also: honor the client's fps (pacing clamp 60->240) and add an env-gated (LUMEN_PERF) per-stage timing log (enc/pkt/send µs + unique-vs-reencoded frames + max packet burst) for diagnosing the remaining throughput ceiling. Verified live: freeze gone at 5120x1440. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:29:48 +00:00
enricobuehler	669d40ae21	build: migrate to ffmpeg-next 8 (FFmpeg 8.x / libavcodec 62) Ubuntu 26.04 ships FFmpeg 8.0 (libavcodec 62); bump ffmpeg-next 7.1 -> 8.1 to bind it as the intended pairing. No source changes needed — the encode API surface we use (avcodec_send_frame, hwframe contexts, AV_PIX_FMT_CUDA, av_log) is stable across 7->8. Workspace builds + all tests green; clippy/fmt clean. Refresh the 7.x doc references. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 18:13:40 +00:00
enricobuehler	7d08e43c16	feat: M2 — KWin virtual-output backend behind a VirtualDisplay trait (native client resolution) Honor the client's requested resolution by rendering a compositor virtual output at exactly that size — native, headless, no scaling. There is no cross-compositor Wayland protocol for this, so it's a per-compositor backend behind the (previously stubbed) VirtualDisplay trait. - vdisplay.rs: VirtualDisplay::create(mode) now returns a live VirtualOutput { node_id, remote_fd: Option<OwnedFd>, keepalive } with RAII teardown (drop releases the output) instead of an inert OutputHandle + explicit destroy. Add compositor detect() (LUMEN_COMPOSITOR / XDG_CURRENT_DESKTOP). - vdisplay/kwin.rs: the KWin backend — the zkde_screencast_unstable_v1 stream_virtual_output client (vendored protocol XML + wayland-scanner codegen). Creates a WxH output, returns its PipeWire node (default daemon, remote_fd=None); a keepalive thread holds the Wayland connection until dropped. (Moved here from capture/kwin.rs — it's a vdisplay backend, not capture.) - capture: generalize the PipeWire consumer to Option<OwnedFd> (portal remote vs. default daemon) and add capture_virtual_output(vout), compositor-agnostic, owning the keepalive. - gamestream/stream.rs: LUMEN_VIDEO_SOURCE=virtual creates a virtual display sized to the client's cfg and captures it (self-contained, not pooled — a reconnect at a new resolution gets a fresh output). - m0: --source kwin-virtual goes through the trait. Verified end-to-end against the running headless KWin: the request reaches the compositor and is handled cleanly. Native creation needs a backend implementing createVirtualOutput — the DRM backend, or the VirtualBackend since KWin 6.5.6; on this box's --virtual 6.4.5 it returns "Could not find output" (expected; validates after the KWin upgrade). wlroots/Mutter backends are the next ones to land on the same seam. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:30:02 +00:00
enricobuehler	6508980564	feat: M2 — validate client-requested video mode (codec dimension guards) Clients pick the resolution via mode=WxHxFPS / RTSP clientViewportWd-Ht, so the host must bound attacker/typo-controlled dimensions before allocating buffers or opening NVENC. Add encode::validate_dimensions: reject zero, odd, and over-limit modes (H.264 ≤ 4096px/side; HEVC/AV1 ≤ 8192) with a clear message instead of a buffer-math overflow or an opaque NVENC open failure. Gate both the stream path (before any allocation) and open_video (also covers m0). Unit-tested. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 16:40:56 +00:00
enricobuehler	aa91485008	feat: M2 — complete zero-copy dmabuf→NVENC capture path (EGL/GL→CUDA) The PipeWire dmabuf now reaches NVENC with no CPU touch. Verified live against headless KWin: a tiled BGRx dmabuf is imported and encoded to a pixel-correct H.265 stream (decoded frame matches the captured desktop — no tiling artifacts, no colour swap). The CPU-copy path stays the default and the runtime fallback. Capture side (zerocopy::egl): desktop NVIDIA can't register a dmabuf EGLImage with CUDA directly (cuGraphicsEGLRegisterImage is Tegra-only; cuGraphicsGLRegisterImage rejects EGLImage-backed textures), so we follow OBS/Sunshine — bind the EGLImage to a GL texture, render it through a fullscreen-triangle shader into an immutable GL_RGBA8 texture (de-tiling + .bgra swizzle to the BGRx the encoder wants), then register that texture with CUDA and copy it device-to-device into an owned buffer so the dmabuf returns to the compositor immediately. Encode side (encode/linux::submit_cuda): take a pooled CUDA surface via av_hwframe_get_buffer and device→device-copy our imported buffer into it, instead of wrapping our own pointer in a bare AVFrame. A bare frame is rejected with EINVAL (NVENC ignores frames with null buf[0]; the encode path's av_frame_ref needs a refcounted buffer), and a fresh device pointer every frame would thrash NVENC's bounded resource-registration cache — the pool recycles a small set. Also: gate FFmpeg AV_LOG_DEBUG behind LUMEN_FFMPEG_DEBUG for diagnosing hw-frame rejects, and refresh the now-accurate module docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 16:28:29 +00:00
enricobuehler	e3876c0d8a	feat: M2 zero-copy — PipeWire dmabuf negotiation + EGL device-platform import (WIP) Wire the capture side of zero-copy (LUMEN_ZEROCOPY=1): - EGL importer now opens the headless EGLDisplay on the NVIDIA EGL device (EGL_PLATFORM_DEVICE_EXT) and queries its importable DRM modifiers (eglQueryDmaBufModifiersEXT). - The PipeWire stream advertises a BGRx dmabuf format with those modifiers as a mandatory enum Choice + a dmabuf-only Buffers param; the compositor fixates an importable tiled modifier. param_changed reads the negotiated modifier; the process callback imports the dmabuf (eglCreateImage with explicit LO/HI modifier) and would copy it into a CUDA buffer for the encoder. Validated against headless KWin (Plasma 6.4): negotiation succeeds (13 NVIDIA modifiers advertised, KWin fixates one, stream reaches Streaming with a real tiled dmabuf) and `eglCreateImage` succeeds. The remaining blocker is `cuGraphicsEGLRegisterImage` returning CUDA_ERROR_INVALID_VALUE on the dmabuf-imported EGLImage — the likely fix is to bind the EGLImage to a GL texture (glEGLImageTargetTexture2DOES) and register that via cuGraphicsGLRegisterImage (OBS/Sunshine's path), which needs a GL context. The CPU-copy path stays the default and is unaffected (regression-checked: real KWin capture → HEVC). LUMEN_ZEROCOPY is opt-in/experimental until the CUDA registration lands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:41:31 +00:00
enricobuehler	16a00563a8	feat: M2 zero-copy foundation — EGL→CUDA import + NVENC CUDA-frame path Scaffolding for dmabuf zero-copy (plan §9), opt-in via LUMEN_ZEROCOPY: - src/zerocopy/{cuda,egl}.rs: hand-rolled CUDA Driver-API FFI (no Rust crate exposes the EGL-interop calls / CUeglFrame) with a shared process-wide CUcontext + pitched device buffers; an EGL importer (GBM platform on the NVIDIA render node) that turns a dmabuf into an EGLImage, registers it with CUDA, and copies it device-to-device into an owned buffer. `zerocopy-probe` subcommand validates the FFI/linking/GPU access — confirmed on the box (driver 595, EGL_EXT_image_dma_buf_import + modifiers). - CapturedFrame gains a FramePayload enum (Cpu(Vec<u8>) | Cuda(DeviceBuffer)); the encoder branches: CPU keeps the expand+upload path, CUDA wraps the device buffer in an AV_PIX_FMT_CUDA frame fed straight to hevc_nvenc (sharing our CUcontext via a hand-declared AVCUDADeviceContext, since ffmpeg-sys doesn't bind hwcontext_cuda.h). open_video/the encoder take a `cuda` flag derived from the first frame's payload. The capture-side dmabuf negotiation (which produces the Cuda frames) is the next step; the CPU path is unchanged and remains the default + fallback. Builds clean, clippy clean, tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 15:13:05 +00:00
enricobuehler	b64be1dc33	fix: m0 portal capture — activate the capturer so frames are delivered The M2 teardown work added an `active` gate to the PipeWire capture callback (idle by default so reconnects stay cheap, with the stream path calling set_active(true) on PLAY). The `m0` subcommand was never updated, so its portal capturer stayed inactive and the callback dropped every frame — `m0 --source portal` failed with "no PipeWire frame within 10s" on every compositor. Call set_active(true) before the capture loop. Validated on headless KWin (Plasma 6.4) via the RemoteDesktop-anchored ScreenCast session: real desktop frames flow (shm BGRx 1920x1080) and encode to valid H.265. (Also folds in a rustfmt reflow of the input-test log line.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 14:24:09 +00:00
enricobuehler	0a79a8209b	feat: M2 — RemoteDesktop-anchored ScreenCast capture for KWin/GNOME On RemoteDesktop-capable desktops (KWin, GNOME), select the ScreenCast source on a session created via the RemoteDesktop portal and start it through RemoteDesktop, so a single grant — pre-authorized headlessly via the `kde-authorized` permission, exactly like the libei input path — also covers screen capture. Standalone ScreenCast has no such bypass and would raise an un-clickable dialog on a headless box. wlroots/Sway has no RemoteDesktop portal, so it keeps the plain ScreenCast session; the choice keys off inject::default_backend(). The PipeWire consumer is unchanged — the anchored session yields the same fd + node id. Validated on headless KWin (Plasma 6.4): the portal grants the session with no dialog and PipeWire negotiates the format (1920x1080 BGRx, Streaming). Frame delivery on KWin still pends dmabuf import — KWin hands GPU dmabuf buffers and the M0 consumer is CPU-copy/shm only (plan §9, zero-copy) — so it's the next step; the CPU-copy path remains correct on wlroots. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 14:09:19 +00:00
enricobuehler	03a6a67354	feat: M2 P1.7 — libei input backend (portable to KWin/GNOME) Add a second input-injection backend that works on compositors implementing the org.freedesktop.portal.RemoteDesktop interface (KWin, GNOME/Mutter), where the wlroots virtual-input protocols are absent. Uses ashpd 0.13 to open a RemoteDesktop session + EIS fd and reis 0.6.1 to drive it as an EI sender: bind pointer/keyboard/scroll/button capabilities and, per device, start_emulating → emit → frame. Runs on a dedicated thread with its own tokio runtime (the portal session + EIS connection must stay alive and the event stream must be polled continuously); open() returns immediately so a slow or denied portal can never freeze the ENet control thread, with events enqueued over an unbounded channel until devices resume. Backend now auto-selects per session (inject::default_backend): wlr on Sway, libei on KDE/GNOME; LUMEN_INPUT_BACKEND overrides. Refactor inject.rs into the inject/{wlr,libei}.rs layout matching the capture/encode convention. Keyboard codes are evdev (the same space our VK→evdev table produces) and the compositor supplies the keymap, so no keymap upload and no modifier serialization — pressing the modifier keys Moonlight sends is enough. Add a `lumen-host input-test` subcommand that injects a scripted mouse+keyboard pattern through the session backend, so input injection can be validated without a Moonlight client. Live-validated on headless KWin (Plasma 6.4): mouse motion, left click, and the 'A' key inject correctly and are delivered to the focused client. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 13:58:41 +00:00
enricobuehler	6de09fd822	feat: M2 teardown — persistent capturers for clean reconnects Disconnect/reconnect now works reliably. Previously each stream spawned its own portal+PipeWire (and PipeWire audio) capture threads and never stopped them, so a reconnect opened a SECOND screencast session that conflicted with the leaked first one ("no PipeWire frame within 10s" → black screen on reconnect). - The screen capturer and audio capturer are now persistent, held in AppState and reused across streams (created on the first stream). One screencast session for the host's lifetime → no conflict, and instant reconnect (no re-handshake). Verified live: 3 stream cycles, 1 create + 2 "reusing capturer", clean every time. - Capturer::set_active gates the (5K, ~1.3 GB/s) de-pad copy to active streams, so the persistent video capturer is nearly free while idle between streams. - AudioCapturer::drain discards buffered chunks on reuse so the client never hears stale audio captured while idle. - stream.rs / gamestream/audio.rs split into a borrow-the-capturer wrapper + the encode/send body, so the capturer is always returned to its slot on exit. This holds whether the client reconnects via /resume (Moonlight's "running → play/continue") or a fresh /launch — both re-run RTSP PLAY → a new stream cycle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 12:35:10 +00:00
enricobuehler	af4360c930	feat: M2 P1.5 robustness — IDR-on-request, send pacing, min-parity floor Graceful FEC behavior on a lossy link: at a realistic 2% packet loss the stream is now steady 0% (was spiking 40-60%). Verified live. - IDR/RFI handling: the control thread recognizes the client's recovery requests (0x0301 invalidate-reference-frames, 0x0302 request-IDR, 0x0305) and sets a shared force_idr flag; the video thread forces an NVENC keyframe on the next frame (Encoder::request_keyframe → input frame pict_type = I). Without this, a frame that exceeds the FEC budget broke the reference chain until the next GOP IDR (~2s), cascading to most of the stream being undecodable. - Min-parity floor: honor the client's x-nv-vqos[0].fec.minRequiredFecPackets (it asks for 2). Small P-frames previously got m=ceil(k*20/100)=1 parity — a single loss broke them; flooring m>=2 (capped so k+m<=255, wire pct recomputed) protects them. This is what turned the 2% spikes into steady 0%. - Send pacing: spread each frame's packets evenly across the frame interval instead of blasting them at line rate (a real link drops microbursts), matching Sunshine's rate-controlled sends; sub-500us sleeps skipped (unreliable). Note: sustained ~8% uniform loss still degrades — that exceeds 20% FEC for reference-frame video and real Sunshine degrades there too; real networks are <1% or bursty, which this now handles cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 12:14:59 +00:00
enricobuehler	72f8c05aa3	feat: M2 P1.5 (FEC) — nanors-exact Reed-Solomon recovery for the video stream Moonlight now reconstructs lost video shards from our parity (verified live: under induced packet loss the picture recovers cleanly instead of failing with "network connection too bad"; 0% added loss in normal operation). The decisive finding: Moonlight's nanors uses a CAUCHY generator matrix (M[j][i] = inv[(m+i)^j], GF(2^8) poly 0x1d), while reed-solomon-erasure is Vandermonde — so its parity was NOT Moonlight-decodable, despite the old gf8.rs comment claiming equivalence. lumen-core: - Swap the GF(2^8) backend from reed-solomon-erasure to a vendored fec-rs (vendor/fec-rs, BSD-2), which builds the byte-identical Cauchy matrix. Pure Rust, no FFI — keeps the "one core" hot path. This makes both lumen's own protocol and the GameStream parity nanors-compatible. - Lock it with a regression test against real nanors vectors (k=4,m=2 [10,20,30,40] -> parity [136,0]) + an independent matrix-derived cross-check + an erase/recover round-trip. Existing FEC/loopback tests stay green, so lumen's own protocol is unaffected. lumen-host video.rs: - Generate m = ceil(k*pct/100) parity shards per FEC block via Gf8Coder; stamp fecInfo with the recomputed wire pct (100*m/k) so the client derives the same count; cap per-block data to 255*100/(100+pct) so k+m <= 255. - CRITICAL byte-exactness: RS runs over the whole `blocksize` shard (Moonlight decodes packetSize+16 bytes from the datagram start and PACKET_RECOVERY_FAILUREs on a bad reconstructed `flags` byte). So the NV header fields RS must reproduce (streamPacketIndex/frameIndex/flags/multiFec) are written into data shards BEFORE encode, and only the transport fields (RTP header/seq/timestamp + fecInfo) are stamped AFTER — leaving the flags byte RS-covered. Matches Sunshine stream.cpp. Unit-tested incl. flags recovery. - fec_percentage wired from stream.rs (Sunshine default 20, LUMEN_FEC_PCT override; 0 = data-only).
LUMEN_VIDEO_DROP injects loss to test recovery. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 11:34:27 +00:00
enricobuehler	278a6330de	feat: M2 P1.6 — audio (Opus + AES-CBC) and steady-rate video pacing A stock Moonlight client now gets video + full input + AUDIO from the from-scratch GameStream host (verified live end-to-end on a macOS client). Audio (audio.rs, audio/linux.rs, gamestream/audio.rs): - Capture the default PipeWire sink's monitor (system output) as interleaved f32 stereo @ 48kHz via stream.capture.sink, on its own thread. - Opus-encode 5ms/240-sample stereo frames (RESTRICTED_LOWDELAY, CBR) and send as GameStream RTP audio: 12-byte BE RTP_PACKET (packetType 97, seq+1/pkt, timestamp += packetDuration, ssrc 0) on UDP 48000, after learning the client endpoint from its port-learning ping. - Encrypt the Opus payload with AES-128-CBC (PKCS7), key = launch rikey, IV = BE32(rikeyid + seq) in [0..4]. Like the control stream, modern Moonlight always decrypts audio regardless of the negotiated flags — plaintext makes it log "Failed to decrypt audio packet" and play silence (diagnosed from the client log). RTP header stays in the clear. Scheme cross-checked against Sunshine stream.cpp/crypto.cpp + moonlight AudioStream.c. - Pace each frame to its 5ms slot (PipeWire delivers ~1024-frame buffers) to avoid bursts the client's jitter buffer hears as glitches. LUMEN_AUDIO_GAIN applies optional linear gain for quiet sources. - DESCRIBE SDP advertises the stereo Opus config (a=fmtp:97 surround-params). Video (stream.rs): pace at a steady ≤60fps, re-encoding the last captured frame when the compositor produces none. wlroots only emits on damage, so a static or slow-updating desktop previously starved the client into a "network too slow" abort; an unchanged frame costs a near-empty P-frame. Adds a non-blocking Capturer::try_latest (portal drains to the freshest queued frame). Misc: serialize pipewire init across the video + audio capture threads (pwinit.rs, std::sync::Once) to avoid a concurrent pw_init race. Deps: opus, cbc; libopus-dev in bootstrap-ubuntu.sh. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 10:39:22 +00:00
enricobuehler	4c2c41acba	feat: M2 P1.4 — control-stream decryption + input injection (mouse/keyboard live) A stock Moonlight client can now drive the headless Sway desktop: mouse movement, buttons, scroll, and keyboard all inject through the streamed session (verified live end-to-end — typing, clicking, window management). Control stream (gamestream/control.rs): - Moonlight encrypts the ENet control stream with AES-128-GCM even though we negotiate no media encryption (it detects our Sunshine `state` and turns it on). Decrypt per-packet under the /launch `rikey`. - The exact GCM scheme is auto-detected on the first authenticating packet (nonce construction × key byte-order × tag position × AAD) since GCM gives no partial credit. Our client uses the legacy 16-byte nonce (`iv[0]=seq&0xff`) because we advertise no encryption; the 12-byte SS_ENC_CONTROL_V2 nonce is also supported. Key/IV/tag layout cross-checked against Sunshine stream.cpp + crypto.cpp and moonlight-common-c ControlStream.c. Input decode (gamestream/input.rs): - Decrypted control messages (`[u16 type][u16 len][NV_INPUT packet]`, type 0x0206) decode into lumen_core::input::InputEvent: relative/abs mouse, buttons, vert/horiz scroll, keyboard down/up. Struct layout from moonlight Input.h (size BE, magic LE, body BE; keyCode LE masked to the low-byte VK), dispatch per Sunshine input.cpp (Gen5+). Unit-tested against real captured bytes. Injection (inject.rs): - WlrootsInjector: connects to Sway as a Wayland client and injects via the wlroots virtual-pointer + virtual-keyboard protocols (uinput is invisible to a compositor running WLR_LIBINPUT_NO_DEVICES=1). Uploads an evdev/US xkb keymap, tracks modifier state, and maps Windows VK → Linux evdev (full table). Deps: aes-gcm, wayland-client, wayland-protocols-{wlr,misc}, xkbcommon (+ libxkbcommon-dev in bootstrap-ubuntu.sh). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:56:19 +00:00
enricobuehler	c8491af893	feat(m2): real desktop capture in the video stream (portal → Moonlight) Wire M0's portal desktop capture into the GameStream video plane: with LUMEN_VIDEO_SOURCE=portal the stream captures the headless wlroots desktop (PipeWire RGB) instead of the synthetic pattern, opens NVENC from the first captured frame's format/size, and streams it. Verified live: a stock Moonlight client shows the real 5120×1440 desktop at ~42 fps (release build). - capture.rs: FastSyntheticCapturer (cheap fill pattern, real-time at 5K) so both sources share the Capturer trait - stream.rs: source select (portal | synthetic), encoder opened from the first frame, wall-clock 90 kHz RTP timestamps (correct under a variable capture rate) Note: the CPU-copy RGB→rgb0 path caps ~42 fps at 5K (single-threaded); dmabuf zero-copy is the deferred optimization (plan §9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:51:49 +00:00
enricobuehler	de60650ed3	feat(m2): live video to stock Moonlight — ENet control + video data plane A stock Moonlight client now decodes H.265 from the lumen host end-to-end (verified at 5120×1440@120 on RTX 5070 Ti): - control.rs: ENet control host on UDP 47999 (rusty_enet). Moonlight starts the control stream before video (STAGE_CONTROL_STREAM_START precedes _VIDEO_), so it must be up first — this was the blocker behind the earlier "error 35". - stream.rs: video data plane — on RTSP PLAY, learn the client endpoint from its ping, NVENC-encode at the negotiated mode, packetize (GameStream RTP/NV/FEC), send over UDP 47998; stops when the client disconnects. - rtsp.rs: ANNOUNCE → StreamConfig (resolution/fps/packetSize/bitrate/codec), PLAY starts the stream, TEARDOWN stops it; PairStatus=1 over the mutual-TLS port. P1.3 uses a synthetic test pattern + data-shards-only FEC (clean-LAN). Next: real portal desktop capture, input injection (decode control → uinput), nanors-exact FEC, encryption, audio. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:39:14 +00:00
enricobuehler	ab6dda2e5f	feat: M0 capture→encode pipeline + M2 GameStream host (pairing, RTSP, video) M0 (lumen-host) — verified on NVIDIA RTX 5070 Ti / Ubuntu 25.10: headless wlroots → xdg ScreenCast portal → PipeWire → NVENC HEVC → playable file, with each access unit round-tripped through a lumen_core host↔client Session (FEC + packetize + reassemble), 0 mismatches. - capture.rs: SyntheticCapturer + portal capture (ashpd 0.13 + pipewire 0.9), format-aware - encode/linux.rs: NVENC via ffmpeg-next 7 (BGRx/RGB → rgb0, no host-side swscale) - m0.rs: capture→encode→file + lumen-core loopback verification M2 P1 (lumen-host gamestream/) — a stock Moonlight client pairs + launches, verified live: - mDNS _nvstream._tcp + nvhttp /serverinfo (HTTP 47989, mutual-TLS HTTPS 47984) - 4-phase pairing: PIN→AES-128-ECB / SHA-256 / RSA-PKCS1v15 / X.509, custom rustls ClientCertVerifier for the mutual-TLS pairchallenge - /applist, /launch (rikey/rikeyid/mode), hand-rolled RTSP (OPTIONS/DESCRIBE/SETUP×3/ ANNOUNCE/PLAY, one-request-per-TCP-connection per moonlight-common-c's read-to-EOF) - video.rs: GameStream RTP + NV_VIDEO_PACKET wire packetizer, data-shards-only (0% FEC, clean-LAN), unit-tested (single/multi-block) Docs: docs/m2-plan.md (phased plan) + docs/research/ (ground-truth protocol spec). Bootstrap/setup updated for the verified path (libnvidia-gl, render/video groups, GPU EGL, pipewire 0.9). Workspace clippy-clean, tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:14:59 +00:00
enricobuehler	a913042367	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold Ground-up low-latency streaming stack per docs/implementation-plan.md. M1 is complete and tested; Linux host backends are cfg-gated stubs to be filled in on real hardware (M0/M2). lumen-core (built + tested on macOS/aarch64 — 21 tests): - fec: ErasureCoder over GF(2^8) (reed-solomon-erasure, Moonlight-compatible) and GF(2^16) Leopard-RS (reed-solomon-simd, the >1 Gbps wall-breaker); proptested - packet: zero-copy #[repr(C)] framing, multi-block, FEC-aware reassembly - crypto: AES-128-GCM with per-direction nonce salts + sequence-as-AAD - session: host submit / client poll hot paths + input; loopback & UDP transports - abi: opaque handles, versioned LumenConfig, panic guards; cbindgen-generated header - acceptance: Rust loopback+proptest and a C harness that links the staticlib Scaffold (compiles green on all platforms): lumen-host (vdisplay/capture/encode/ inject/web/pipeline seams under cfg(linux)), lumen-client-rs, tools/{loss-harness, latency-probe}, Apple/Android client stubs, Gitea CI, docs. Hardened against a multi-agent adversarial review (13 verified findings fixed, regression-tested): reassembler memory-DoS bounds + block-consistency validation, GCM nonce-reuse direction separation, ABI struct_size guard + range checks, FEC shard-length guards, shard_payload datagram bound, key zeroization + Debug redaction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 00:02:52 +02:00
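
The parity-count arithmetic described in commits 72f8c05aa3 and af4360c930 (m = ceil(k*pct/100), a min-parity floor, the k+m <= 255 cap, and the recomputed wire percentage 100*m/k) can be sketched roughly as below. This is an illustration only, not the code in lumen-host video.rs: the function names `parity_for_block` and `max_data_shards` are hypothetical, and skipping the floor when pct is 0 is an assumption based on the "0 = data-only" note.

```rust
/// Hypothetical helper (not the actual lumen-host code): parity shard count
/// for one FEC block of `k` data shards at `pct` percent, with a client
/// requested minimum of `min_parity`. Returns (m, wire_pct).
fn parity_for_block(k: u32, pct: u32, min_parity: u32) -> (u32, u32) {
    assert!(k >= 1 && k <= 255, "GF(2^8) allows at most 255 shards per block");
    if pct == 0 {
        // Assumption: 0 means data-only, so the floor is not applied.
        return (0, 0);
    }
    // m = ceil(k * pct / 100) in integer arithmetic.
    let mut m = (k * pct + 99) / 100;
    // Floor: small blocks still get at least `min_parity` parity shards.
    m = m.max(min_parity);
    // Cap so that k + m <= 255.
    m = m.min(255 - k);
    // Percentage stamped on the wire, recomputed from the final m.
    let wire_pct = 100 * m / k;
    (m, wire_pct)
}

/// Largest data-shard count per block such that k + ceil(k*pct/100) <= 255.
fn max_data_shards(pct: u32) -> u32 {
    255 * 100 / (100 + pct)
}
```

With the default of 20 percent, a 4-shard block would get one parity shard from the formula alone; a floor of 2 raises it to two, and the recomputed wire percentage becomes 50. At 20 percent the per-block data cap works out to 212 shards, which with 43 parity shards fills the 255-shard limit exactly.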

26 Commits
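
The dimension guard from commit 6508980564 (reject zero, odd, and over-limit modes; H.264 up to 4096 px per side, HEVC/AV1 up to 8192) could look something like the following. This is a minimal sketch under stated assumptions: the commit names `encode::validate_dimensions` but does not show its signature, so the `Codec` enum, the parameter list, and the error strings here are all hypothetical.

```rust
/// Hypothetical codec selector; the real type in lumen-host may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Codec {
    H264,
    Hevc,
    Av1,
}

/// Sketch of a client-mode guard: bounds the requested size before any
/// buffer is allocated or the encoder is opened.
fn validate_dimensions(codec: Codec, width: u32, height: u32) -> Result<(), String> {
    // Per-side limits as stated in the commit message.
    let max_side = match codec {
        Codec::H264 => 4096,
        Codec::Hevc | Codec::Av1 => 8192,
    };
    if width == 0 || height == 0 {
        return Err(format!("invalid mode {width}x{height}: zero dimension"));
    }
    if width % 2 != 0 || height % 2 != 0 {
        return Err(format!("invalid mode {width}x{height}: dimensions must be even"));
    }
    if width > max_side || height > max_side {
        return Err(format!(
            "invalid mode {width}x{height}: {codec:?} is limited to {max_side}px per side"
        ));
    }
    Ok(())
}
```

Under these limits a 5120x1440 request passes for HEVC but is rejected for H.264, which is the kind of early, readable failure the commit describes in place of an opaque NVENC open error.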