The first end-to-end run of lumen's own protocol, past the GameStream compatibility layer.
- lumen-core/src/quic.rs (behind the `quic` feature): the lumen/1 handshake — Hello/Welcome/
Start as length-prefixed LE binary on one QUIC bi-stream. Welcome carries the COMPLETE
data-plane Config: mode, FEC scheme incl. GF(2^16) Leopard (inexpressible in GameStream),
shard sizing, AES-GCM key + per-direction salt, data UDP port. Plus quinn endpoint helpers
(self-signed server; accepts-any client — pinning lands with the trust model) and framed
async IO. Round-trip unit-tested.
- lumen-host m3-host: serves one lumen/1 session — QUIC handshake, then a NATIVE thread
(no async on the frame path — design invariant) streams deterministic 64KB test frames
through the hardened M1 Session over UdpTransport.
- lumen-client-rs: from scaffold to working reference client — connects, negotiates, brings
up the client Session over UDP, reassembles + FEC-recovers + byte-verifies every frame.
VALIDATED END-TO-END on localhost: 300/300 frames verified, 0 mismatches, through
QUIC-negotiated GF(2^16) FEC + AES-GCM over real UDP sockets. M4 (decode+present) builds on
this exact client skeleton.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- gamestream/apps.rs: an app catalog (loaded from ~/.config/lumen/apps.json, with defaults:
Desktop + gamescope entries when gamescope/steam/vkcube are installed). /applist renders
it; /launch?appid=N selects the entry; RTSP PLAY resolves it and the stream honors the
app's compositor + nested command — so a Moonlight client picks "Steam" and gets a
gamescope session at its native resolution, or "Desktop" for the KWin/GNOME desktop.
- Persistent pairing: the paired-client cert allow-list now survives restarts
(~/.config/lumen/paired.json), saved on each successful pairing, loaded at boot.
- Quit semantics: /cancel now actually stops the media threads (streaming/audio flags),
tearing down the per-session virtual output / gamescope process via the capturer's RAII.
- scripts/lumen-host.service (systemd user unit) + scripts/host.env.example (config file
consumed by it) — the host runs as a managed service instead of an SSH shell.
Smoke-tested: serve boots, /applist serves the catalog (Desktop + vkcube gamescope entry
auto-detected on this box). GNOME backend validation still pending gnome-shell install;
wlroots vdisplay backend deliberately deferred (not in the priority compositor trio).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The missing zero-copy path is closed. NVIDIA's EGL won't sample LINEAR and the CUDA driver
rejects raw dmabuf fds — but Vulkan imports dmabufs (VK_EXT_external_memory_dma_buf) and
exports OPAQUE_FD memory that CUDA officially imports. zerocopy/vulkan.rs (ash):
dmabuf fd → VkBuffer (import cached per fd) → vkCmdCopyBuffer (GPU) →
exportable VkBuffer → vkGetMemoryFdKHR(OPAQUE_FD) → cuImportExternalMemory → CUdeviceptr
The exportable buffer + CUDA mapping are per-resolution; per frame it's one GPU buffer copy
(fence-waited) + one pitched CUDA copy into the encoder's pool. No CPU touches pixels.
EglImporter::import_linear now routes through the bridge (lazy init; any failure still falls
back to the CPU mmap path). cuda::ExternalDmabuf gained import_owned_fd for the
Vulkan-exported fd.
Validated live: gamescope 720p120 → "Vulkan→CUDA exportable staging buffer ready
size=3686400" (exactly 1280*720*4), full-rate 122.7 fps, decoded frame pixel-correct
(vkcube). KWin's tiled EGL path regression-tested intact. NV12 negotiation dropped — moot
now that BGRx is fully zero-copy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
stream_body no longer sends: each frame's packet batch goes over a depth-2 bounded queue to a
dedicated send thread, so a send spike can never stall capture/encode (a full queue drops the
NEWEST batch — FEC/RFI covers the client — rather than ever blocking). The sender ships packets
with sendmmsg (≤64/syscall: ~375 syscalls/s instead of ~24k at 5K@240) in 16-packet chunks
paced across ~3/4 of the frame interval — microburst shaping for real links without per-packet
sleep jitter. Client-gone detection moved to the sender (clears `running`); the LUMEN_VIDEO_DROP
FEC test knob moved with the send path. Loopback-tested: batches arrive complete and
byte-identical through the paced sendmmsg path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Full controller path for the SteamOS-like session, mirroring Sunshine byte-for-byte
(wire formats verified against moonlight-common-c + Sunshine source; ioctl numbers and
struct layouts verified by compiling against this box's <linux/uinput.h>, locked in with
const asserts):
- gamestream/gamepad.rs: decode MULTI_CONTROLLER (magic 0x0C, mixed BE-size/LE-body) incl.
the Sunshine buttonFlags2 extension (paddles/touchpad/Misc — our appversion already
advertises Sunshine, so clients send it) and CONTROLLER_ARRIVAL (0x55000004); build the
0x010B rumble plaintext (with the mandatory 4-byte filler). Unit-tested.
- inject/gamepad.rs: VirtualPad clones the kernel xpad identity ("Microsoft X-Box 360 pad",
045e:028e, exact button/axis codes + absinfo) so SDL/Steam/Proton match their built-in
mapping with zero config. GamepadManager creates/destroys pads from activeGamepadMask
(hotplug), emits button transitions + axes (+Y-up → evdev +Y-down negation, D-pad as
HAT0X/Y) per frame. Rumble: non-blocking FF pump answers UI_BEGIN/END_FF_UPLOAD/ERASE
(games block in EVIOCSFF until answered), tracks effects with replay expiry + FF_GAIN,
mixes to (low, high) motor levels, dedups.
- control.rs: channel_limit 8 → 0x30 — Moonlight sends gamepad input on ENet channel
0x10+n, so the old limit silently discarded ALL controller input. Gamepad events route to
the manager; rumble is sealed with the client's detected GCM scheme direction-flipped
(V2 marker 'H?', own seq counter) and sent on the control peer every service tick.
- scripts/60-lumen.rules: udev rule (Sunshine-style) granting the input group /dev/uinput.
Live validation needs the udev rule installed (root-only /dev/uinput on this box) + a
Moonlight client with a controller; everything else is gated and unit/static-checked.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Third compositor on the VirtualDisplay seam, via Mutter's direct D-Bus APIs (the
gnome-remote-desktop headless model, no portal grant): RemoteDesktop.CreateSession →
ScreenCast.CreateSession({remote-desktop-session-id}) → Session.RecordVirtual (creates a
virtual monitor) → Start → PipeWireStreamAdded(node_id). A keepalive thread owns the zbus
connection (sessions die with it — RAII teardown); select with LUMEN_COMPOSITOR=mutter or
XDG_CURRENT_DESKTOP=GNOME.
Mutter sizes its virtual monitor FROM the PipeWire format negotiation, so VirtualOutput
gains preferred_mode (w, h, refresh_hz), threaded into the consumer's format pods as the
default size/framerate. KWin/gamescope set it too (their outputs are already exact-size;
the preference just confirms it) — both regression-tested intact.
Compile/clippy/test clean; live validation needs gnome-shell installed (then
`gnome-shell --headless` + m0 --source kwin-virtual with LUMEN_COMPOSITOR=mutter).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
gamescope only offers LINEAR dmabufs, which the EGL/GL interop path can't handle (NVIDIA's
EGL lists no LINEAR modifier for sampling). Attempt a direct CUDA external-memory import
(cuImportExternalMemory OPAQUE_FD, cached per buffer fd, one DtoD copy per frame into the
pooled buffer): the FFI + plumbing are in place, and LINEAR(0) is now advertised alongside
the tiled EGL modifiers (tiled first, so KWin still prefers it — regression-tested).
Empirically the 595 desktop driver rejects raw dmabuf fds as OPAQUE_FD (CUDA_ERROR_UNKNOWN),
matching the documented limitation — true LINEAR GPU import needs a Vulkan interop bridge
(import dmabuf via VK_EXT_external_memory_dma_buf, GPU-copy into an exportable allocation,
hand that to CUDA), noted as future work. So the importer now degrades instead of dying:
on GPU-import failure it logs once, disables itself, and falls through to the CPU mmap path.
Validated: gamescope + LUMEN_ZEROCOPY=1 runs full-rate (122.9 fps @720p120, valid HEVC) via
the fallback; KWin keeps real zero-copy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
gamescope runs its own EIS server and exports the socket to its children as LIBEI_SOCKET —
no portal involved. The gamescope backend now launches the nested app through a tiny shell
wrapper that relays that value to /tmp/lumen-gamescope-ei; the libei injector gains an
EiSource enum (Portal | SocketPathFile) and connects a UnixStream directly to gamescope's
socket (polling until the app has started), then runs the identical reis sender flow.
Backend::GamescopeEi is auto-selected when LUMEN_COMPOSITOR=gamescope
(LUMEN_INPUT_BACKEND=gamescope overrides).
Validated end-to-end: input-test against a headless gamescope running xev — 129
MotionNotify/KeyPress/ButtonPress events delivered into the nested X app ("Gamescope
Virtual Input" device bound, sender handshake + emulation working).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deep investigation (gdb + daemon traces) proved the gamescope capture stall is a gamescope
3.16.20 bug, not ours: it calls pw_loop_iterate() without pw_loop_enter()/leave(), and under
PipeWire 1.6's loop locking its main thread permanently holds the loop mutex — the pw thread
deadlocks, gamescope never acks the daemon's port_set_param(Format), and the link parks in
"negotiating" silently. Stock gst pipewiresrc fails identically. Fixed upstream by gamescope
commit e3ed1ea7 ("pipewire: Fix pipewire loop locking", pipewire#5148); first release 3.16.22.
Ubuntu 26.04 ships 3.16.20 (built ten days before the fix) — patch/upgrade required.
Consumer-side improvements from the investigation (all verified correct vs gamescope's pods,
and needed once the producer is fixed):
- discover the node from gamescope's own "stream available on node ID: N" log line (its
node.name appears on two objects; the advertised id is authoritative); pw-dump fallback
- CPU path accepts mappable dmabufs: Buffers param now offers MemPtr|MemFd|DmaBuf (gamescope
counter-offers exactly DmaBuf when its modifier pod wins, never MemPtr), mmap the fd
ourselves when MAP_BUFFERS didn't (Vulkan-exported dmabufs aren't flagged mappable), and
treat chunk.size==0 as the computed span
- warn_once on every silent frame-drop path in the process callback
- node.dont-reconnect on our capture streams: an orphaned stream re-targeted by wireplumber
onto a fresh node wedges it — and a stuck link head-blocks the daemon's shared work queue,
stalling ALL new link negotiation system-wide (this poisoned whole test sessions)
- LUMEN_GAMESCOPE_NODE (attach to an existing gamescope) + LUMEN_PW_FIXED_POD (negotiation
bisection) debug knobs
KWin path regression-tested (zero-copy intact). gamescope end-to-end validation pending the
patched gamescope build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Third compositor on the VirtualDisplay seam. gamescope's model differs from KWin/Mutter: it's
not a runtime protocol but a micro-compositor we spawn — `gamescope --backend headless -W -H -r
-- <app>` — which composites at the client's size AND refresh natively (so no separate
refresh step), runs the app nested, and exports a built-in PipeWire node named "gamescope".
The backend spawns it, discovers that node via pw-dump, and returns a VirtualOutput whose
keepalive owns the process (drop = kill = teardown). App via LUMEN_GAMESCOPE_APP. Select with
LUMEN_COMPOSITOR=gamescope; m0's virtual source now honors LUMEN_COMPOSITOR so any backend is
testable without a client. Input (gamescope's libei/EIS socket) is a follow-up.
Builds/clippy/fmt clean. Needs gamescope installed to validate; headless capture on the
proprietary NVIDIA driver is plausible-by-architecture but unproven — validate empirically.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
KWin creates virtual outputs at a hardcoded 60 Hz and zkde stream_virtual_output has no
refresh argument, so the *source* composited at 60 Hz even when the client asked for 120/240
(confirmed live: stream paced a stable 240 fps but only ~60 unique frames/s). KWin 6.6+ allows
custom modes on virtual outputs, so after creating the output we install + select a mode at the
client's refresh, before capture connects PipeWire. First cut shells out to kscreen-doctor
(output is "Virtual-<name>"); the in-process kde_output_management_v2 client is a follow-up.
Best-effort — failure leaves the source at 60 Hz (stream still works). Verified the mode is
applied (Virtual-lumen -> 1280x720@120). Empirically de-risked that this headless QEMU VM's
software vsync accepts >60 Hz.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GB203 has two NVENC engines. A single HEVC p1 session tops out ~1 Gpix/s, so 5120x1440@240
(1.77 Gpix/s) is encoder-bound on one engine; split-frame encode runs it across both (~1.8x,
latency-neutral, output is standard HEVC the client decodes normally). NVENC's AUTO split
won't engage below ~2112px height, so force split_encode_mode=2 when the pixel rate exceeds
~1 Gpix/s (HEVC/AV1 only — not H.264). Below that (e.g. 5K@120) stay single-engine to avoid
the ~2% BD-rate cost. Override with LUMEN_SPLIT_ENCODE. Verified: engages at 240, not at 120.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The zero-copy import did real per-frame GPU churn that capped high-fps throughput: a fresh
~29MB cuMemAllocPitch + cuMemFree, a cuGraphicsGLRegisterImage/unregister, and a map of the
*same* persistent blit texture — every frame. Two fixes:
- BufferPool: a recycled free-list of pitched device buffers per resolution. DeviceBuffer
returns its allocation to the pool on drop (after the encoder synchronized) instead of
freeing — kills the per-frame 29MB alloc/free that took the device allocator lock and
serialized against the GPU.
- RegisteredTexture: register the (reused) GL_RGBA8 blit destination with CUDA ONCE when the
GlBlit is built; each frame only maps → copies the array → unmaps, instead of
registering/unregistering every frame.
This is the "zero-copy should be overhead-free" cleanup. Verified the import still produces
correct frames; the remaining per-frame cuCtxSynchronize pair (shared-context coupling) is
the next step (CUDA stream + events). lumen-host builds, clippy/fmt/tests clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
At 5120x1440 the stream froze on a ~2s cadence. Two compounding causes (confirmed by a
profiling pass + adversarial review):
1. Periodic IDR every 2s (set_gop(fps*2)). A keyframe at 5K is ~20-40x a P-frame — a
recurring multi-millisecond encode+packetize+send spike. Fix: infinite GOP (gop_size=-1),
one IDR at stream start, P-frames only; forced-idr makes a client recovery request (RFI via
request_keyframe) emit an IDR on demand — the Moonlight/Sunshine low-latency model.
2. Two pacing timers summing on the capture/encode thread: a per-packet thread::sleep pacer
(spread a frame's packets across a whole frame interval) PLUS a backstop sleep on top, so
every frame cost 1-2x the interval and the big IDR blew through it (the 2->120 oscillation).
Fix: delete both; send at line rate and drive cadence from a single absolute deadline.
(Proper microburst pacing belongs on a dedicated send thread — a follow-up.)
Also: honor the client's fps (pacing clamp 60->240) and add an env-gated (LUMEN_PERF)
per-stage timing log (enc/pkt/send µs + unique-vs-reencoded frames + max packet burst) for
diagnosing the remaining throughput ceiling. Verified live: freeze gone at 5120x1440.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ubuntu 26.04 ships FFmpeg 8.0 (libavcodec 62); bump ffmpeg-next 7.1 -> 8.1 to bind it
as the intended pairing. No source changes needed — the encode API surface we use
(avcodec_send_frame, hwframe contexts, AV_PIX_FMT_CUDA, av_log) is stable across 7->8.
Workspace builds + all tests green; clippy/fmt clean. Refresh the 7.x doc references.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Honor the client's requested resolution by rendering a compositor virtual output at
exactly that size — native, headless, no scaling. There is no cross-compositor Wayland
protocol for this, so it's a per-compositor backend behind the (previously stubbed)
VirtualDisplay trait.
- vdisplay.rs: VirtualDisplay::create(mode) now returns a live VirtualOutput
{ node_id, remote_fd: Option<OwnedFd>, keepalive } with RAII teardown (drop releases
the output) instead of an inert OutputHandle + explicit destroy. Add compositor
detect() (LUMEN_COMPOSITOR / XDG_CURRENT_DESKTOP).
- vdisplay/kwin.rs: the KWin backend — the zkde_screencast_unstable_v1 stream_virtual_output
client (vendored protocol XML + wayland-scanner codegen). Creates a WxH output, returns
its PipeWire node (default daemon, remote_fd=None); a keepalive thread holds the Wayland
connection until dropped. (Moved here from capture/kwin.rs — it's a vdisplay backend, not
capture.)
- capture: generalize the PipeWire consumer to Option<OwnedFd> (portal remote vs. default
daemon) and add capture_virtual_output(vout), compositor-agnostic, owning the keepalive.
- gamestream/stream.rs: LUMEN_VIDEO_SOURCE=virtual creates a virtual display sized to the
client's cfg and captures it (self-contained, not pooled — a reconnect at a new
resolution gets a fresh output).
- m0: --source kwin-virtual goes through the trait.
Verified end-to-end against the running headless KWin: the request reaches the compositor
and is handled cleanly. Native creation needs a backend implementing createVirtualOutput —
the DRM backend, or the VirtualBackend since KWin 6.5.6; on this box's --virtual 6.4.5 it
returns "Could not find output" (expected; validates after the KWin upgrade). wlroots/Mutter
backends are the next ones to land on the same seam.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Clients pick the resolution via mode=WxHxFPS / RTSP clientViewportWd-Ht, so the
host must bound attacker/typo-controlled dimensions before allocating buffers or
opening NVENC. Add encode::validate_dimensions: reject zero, odd, and over-limit
modes (H.264 ≤ 4096px/side; HEVC/AV1 ≤ 8192) with a clear message instead of a
buffer-math overflow or an opaque NVENC open failure. Gate both the stream path
(before any allocation) and open_video (also covers m0). Unit-tested.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The PipeWire dmabuf now reaches NVENC with no CPU touch. Verified live against
headless KWin: a tiled BGRx dmabuf is imported and encoded to a pixel-correct
H.265 stream (decoded frame matches the captured desktop — no tiling artifacts,
no colour swap). The CPU-copy path stays the default and the runtime fallback.
Capture side (zerocopy::egl): desktop NVIDIA can't register a dmabuf EGLImage
with CUDA directly (cuGraphicsEGLRegisterImage is Tegra-only; cuGraphicsGLRegisterImage
rejects EGLImage-backed textures), so we follow OBS/Sunshine — bind the EGLImage
to a GL texture, render it through a fullscreen-triangle shader into an immutable
GL_RGBA8 texture (de-tiling + .bgra swizzle to the BGRx the encoder wants), then
register that texture with CUDA and copy it device-to-device into an owned buffer
so the dmabuf returns to the compositor immediately.
Encode side (encode/linux::submit_cuda): take a *pooled* CUDA surface via
av_hwframe_get_buffer and device→device-copy our imported buffer into it, instead
of wrapping our own pointer in a bare AVFrame. A bare frame is rejected with
EINVAL (NVENC ignores frames with null buf[0]; the encode path's av_frame_ref
needs a refcounted buffer), and a fresh device pointer every frame would thrash
NVENC's bounded resource-registration cache — the pool recycles a small set.
Also: gate FFmpeg AV_LOG_DEBUG behind LUMEN_FFMPEG_DEBUG for diagnosing
hw-frame rejects, and refresh the now-accurate module docs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the capture side of zero-copy (LUMEN_ZEROCOPY=1):
- EGL importer now opens the headless EGLDisplay on the NVIDIA EGL device
(EGL_PLATFORM_DEVICE_EXT) and queries its importable DRM modifiers
(eglQueryDmaBufModifiersEXT).
- The PipeWire stream advertises a BGRx dmabuf format with those modifiers as a
mandatory enum Choice + a dmabuf-only Buffers param; the compositor fixates an
importable tiled modifier. param_changed reads the negotiated modifier; the
process callback imports the dmabuf (eglCreateImage with explicit LO/HI
modifier) and would copy it into a CUDA buffer for the encoder.
Validated against headless KWin (Plasma 6.4): negotiation succeeds (13 NVIDIA
modifiers advertised, KWin fixates one, stream reaches Streaming with a real
tiled dmabuf) and `eglCreateImage` succeeds. The remaining blocker is
`cuGraphicsEGLRegisterImage` returning CUDA_ERROR_INVALID_VALUE on the
dmabuf-imported EGLImage — the likely fix is to bind the EGLImage to a GL
texture (glEGLImageTargetTexture2DOES) and register that via
cuGraphicsGLRegisterImage (OBS/Sunshine's path), which needs a GL context.
The CPU-copy path stays the default and is unaffected (regression-checked: real
KWin capture → HEVC). LUMEN_ZEROCOPY is opt-in/experimental until the CUDA
registration lands.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scaffolding for dmabuf zero-copy (plan §9), opt-in via LUMEN_ZEROCOPY:
- src/zerocopy/{cuda,egl}.rs: hand-rolled CUDA Driver-API FFI (no Rust crate
exposes the EGL-interop calls / CUeglFrame) with a shared process-wide
CUcontext + pitched device buffers; an EGL importer (GBM platform on the
NVIDIA render node) that turns a dmabuf into an EGLImage, registers it with
CUDA, and copies it device-to-device into an owned buffer. `zerocopy-probe`
subcommand validates the FFI/linking/GPU access — confirmed on the box
(driver 595, EGL_EXT_image_dma_buf_import + modifiers).
- CapturedFrame gains a FramePayload enum (Cpu(Vec<u8>) | Cuda(DeviceBuffer));
the encoder branches: CPU keeps the expand+upload path, CUDA wraps the device
buffer in an AV_PIX_FMT_CUDA frame fed straight to hevc_nvenc (sharing our
CUcontext via a hand-declared AVCUDADeviceContext, since ffmpeg-sys doesn't
bind hwcontext_cuda.h). open_video/the encoder take a `cuda` flag derived from
the first frame's payload.
The capture-side dmabuf negotiation (which produces the Cuda frames) is the
next step; the CPU path is unchanged and remains the default + fallback. Builds
clean, clippy clean, tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The M2 teardown work added an `active` gate to the PipeWire capture callback
(idle by default so reconnects stay cheap, with the stream path calling
set_active(true) on PLAY). The `m0` subcommand was never updated, so its portal
capturer stayed inactive and the callback dropped every frame — `m0 --source
portal` failed with "no PipeWire frame within 10s" on every compositor. Call
set_active(true) before the capture loop.
Validated on headless KWin (Plasma 6.4) via the RemoteDesktop-anchored ScreenCast
session: real desktop frames flow (shm BGRx 1920x1080) and encode to valid H.265.
(Also folds in a rustfmt reflow of the input-test log line.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On RemoteDesktop-capable desktops (KWin, GNOME), select the ScreenCast source on
a session created via the RemoteDesktop portal and start it through RemoteDesktop,
so a single grant — pre-authorized headlessly via the `kde-authorized` permission,
exactly like the libei input path — also covers screen capture. Standalone
ScreenCast has no such bypass and would raise an un-clickable dialog on a headless
box. wlroots/Sway has no RemoteDesktop portal, so it keeps the plain ScreenCast
session; the choice keys off inject::default_backend(). The PipeWire consumer is
unchanged — the anchored session yields the same fd + node id.
Validated on headless KWin (Plasma 6.4): the portal grants the session with no
dialog and PipeWire negotiates the format (1920x1080 BGRx, Streaming). Frame
delivery on KWin still pends dmabuf import — KWin hands GPU dmabuf buffers and the
M0 consumer is CPU-copy/shm only (plan §9, zero-copy) — so it's the next step;
the CPU-copy path remains correct on wlroots.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a second input-injection backend that works on compositors implementing
the org.freedesktop.portal.RemoteDesktop interface (KWin, GNOME/Mutter), where
the wlroots virtual-input protocols are absent. Uses ashpd 0.13 to open a
RemoteDesktop session + EIS fd and reis 0.6.1 to drive it as an EI sender:
bind pointer/keyboard/scroll/button capabilities and, per device,
start_emulating → emit → frame. Runs on a dedicated thread with its own tokio
runtime (the portal session + EIS connection must stay alive and the event
stream must be polled continuously); open() returns immediately so a slow or
denied portal can never freeze the ENet control thread, with events enqueued
over an unbounded channel until devices resume.
Backend now auto-selects per session (inject::default_backend): wlr on Sway,
libei on KDE/GNOME; LUMEN_INPUT_BACKEND overrides. Refactor inject.rs into the
inject/{wlr,libei}.rs layout matching the capture/encode convention. Keyboard
codes are evdev (the same space our VK→evdev table produces) and the compositor
supplies the keymap, so no keymap upload and no modifier serialization — pressing
the modifier keys Moonlight sends is enough.
Add a `lumen-host input-test` subcommand that injects a scripted mouse+keyboard
pattern through the session backend, so input injection can be validated without
a Moonlight client.
Live-validated on headless KWin (Plasma 6.4): mouse motion, left click, and the
'A' key inject correctly and are delivered to the focused client.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Disconnect/reconnect now works reliably. Previously each stream spawned its own
portal+PipeWire (and PipeWire audio) capture threads and never stopped them, so a
reconnect opened a SECOND screencast session that conflicted with the leaked
first one ("no PipeWire frame within 10s" → black screen on reconnect).
- The screen capturer and audio capturer are now persistent, held in AppState and
reused across streams (created on the first stream). One screencast session for
the host's lifetime → no conflict, and instant reconnect (no re-handshake).
Verified live: 3 stream cycles, 1 create + 2 "reusing capturer", clean every time.
- Capturer::set_active gates the (5K, ~1.3 GB/s) de-pad copy to active streams, so
the persistent video capturer is nearly free while idle between streams.
- AudioCapturer::drain discards buffered chunks on reuse so the client never hears
stale audio captured while idle.
- stream.rs / gamestream/audio.rs split into a borrow-the-capturer wrapper + the
encode/send body, so the capturer is always returned to its slot on exit.
This holds whether the client reconnects via /resume (Moonlight's "running →
play/continue") or a fresh /launch — both re-run RTSP PLAY → a new stream cycle.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Graceful FEC behavior on a lossy link: at a realistic 2% packet loss the stream
is now steady 0% (was spiking 40-60%). Verified live.
- IDR/RFI handling: the control thread recognizes the client's recovery requests
(0x0301 invalidate-reference-frames, 0x0302 request-IDR, 0x0305) and sets a
shared force_idr flag; the video thread forces an NVENC keyframe on the next
frame (Encoder::request_keyframe → input frame pict_type = I). Without this, a
frame that exceeds the FEC budget broke the reference chain until the next GOP
IDR (~2s), cascading to most of the stream being undecodable.
- Min-parity floor: honor the client's x-nv-vqos[0].fec.minRequiredFecPackets
(it asks for 2). Small P-frames previously got m=ceil(k*20/100)=1 parity — a
single loss broke them; flooring m>=2 (capped so k+m<=255, wire pct recomputed)
protects them. This is what turned the 2% spikes into steady 0%.
- Send pacing: spread each frame's packets evenly across the frame interval
instead of blasting them at line rate (a real link drops microbursts), matching
Sunshine's rate-controlled sends; sub-500us sleeps skipped (unreliable).
Note: sustained ~8% uniform loss still degrades — that exceeds 20% FEC for
reference-frame video and real Sunshine degrades there too; real networks are
<1% or bursty, which this now handles cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Moonlight now reconstructs lost video shards from our parity (verified live:
under induced packet loss the picture recovers cleanly instead of failing with
"network connection too bad"; 0% added loss in normal operation).
The decisive finding: Moonlight's nanors uses a CAUCHY generator matrix
(M[j][i] = inv[(m+i)^j], GF(2^8) poly 0x1d), while reed-solomon-erasure is
Vandermonde — so its parity was NOT Moonlight-decodable, despite the old
gf8.rs comment claiming equivalence.
lumen-core:
- Swap the GF(2^8) backend from reed-solomon-erasure to a vendored fec-rs
(vendor/fec-rs, BSD-2), which builds the byte-identical Cauchy matrix. Pure
Rust, no FFI — keeps the "one core" hot path. This makes both lumen's own
protocol and the GameStream parity nanors-compatible.
- Lock it with a regression test against real nanors vectors
(k=4,m=2 [10,20,30,40] -> parity [136,0]) + an independent matrix-derived
cross-check + an erase/recover round-trip. Existing FEC/loopback tests stay
green, so lumen's own protocol is unaffected.
lumen-host video.rs:
- Generate m = ceil(k*pct/100) parity shards per FEC block via Gf8Coder; stamp
fecInfo with the recomputed wire pct (100*m/k) so the client derives the same
count; cap per-block data to 255*100/(100+pct) so k+m <= 255.
- CRITICAL byte-exactness: RS runs over the whole `blocksize` shard (Moonlight
decodes packetSize+16 bytes from the datagram start and PACKET_RECOVERY_FAILUREs
on a bad reconstructed `flags` byte). So the NV header fields RS must reproduce
(streamPacketIndex/frameIndex/flags/multiFec*) are written into data shards
BEFORE encode, and only the transport fields (RTP header/seq/timestamp +
fecInfo) are stamped AFTER — leaving the flags byte RS-covered. Matches
Sunshine stream.cpp. Unit-tested incl. flags recovery.
- fec_percentage wired from stream.rs (Sunshine default 20, LUMEN_FEC_PCT
override; 0 = data-only). LUMEN_VIDEO_DROP injects loss to test recovery.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stock Moonlight client now gets video + full input + AUDIO from the
from-scratch GameStream host (verified live end-to-end on a macOS client).
Audio (audio.rs, audio/linux.rs, gamestream/audio.rs):
- Capture the default PipeWire sink's monitor (system output) as interleaved
f32 stereo @ 48kHz via stream.capture.sink, on its own thread.
- Opus-encode 5ms/240-sample stereo frames (RESTRICTED_LOWDELAY, CBR) and send
as GameStream RTP audio: 12-byte BE RTP_PACKET (packetType 97, seq+1/pkt,
timestamp += packetDuration, ssrc 0) on UDP 48000, after learning the client
endpoint from its port-learning ping.
- Encrypt the Opus payload with AES-128-CBC (PKCS7), key = launch rikey, IV =
BE32(rikeyid + seq) in [0..4]. Like the control stream, modern Moonlight
always decrypts audio regardless of the negotiated flags — plaintext makes it
log "Failed to decrypt audio packet" and play silence (diagnosed from the
client log). RTP header stays in the clear. Scheme cross-checked against
Sunshine stream.cpp/crypto.cpp + moonlight AudioStream.c.
- Pace each frame to its 5ms slot (PipeWire delivers ~1024-frame buffers) to
avoid bursts the client's jitter buffer hears as glitches. LUMEN_AUDIO_GAIN
applies optional linear gain for quiet sources.
- DESCRIBE SDP advertises the stereo Opus config (a=fmtp:97 surround-params).
Video (stream.rs): pace at a steady ≤60fps, re-encoding the last captured frame
when the compositor produces none. wlroots only emits on damage, so a static or
slow-updating desktop previously starved the client into a "network too slow"
abort; an unchanged frame costs a near-empty P-frame. Adds a non-blocking
Capturer::try_latest (portal drains to the freshest queued frame).
Misc: serialize pipewire init across the video + audio capture threads
(pwinit.rs, std::sync::Once) to avoid a concurrent pw_init race. Deps: opus,
cbc; libopus-dev in bootstrap-ubuntu.sh.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stock Moonlight client can now drive the headless Sway desktop: mouse
movement, buttons, scroll, and keyboard all inject through the streamed
session (verified live end-to-end — typing, clicking, window management).
Control stream (gamestream/control.rs):
- Moonlight encrypts the ENet control stream with AES-128-GCM even though we
negotiate no media encryption (it detects our Sunshine `state` and turns it
on). Decrypt per-packet under the /launch `rikey`.
- The exact GCM scheme is auto-detected on the first authenticating packet
(nonce construction × key byte-order × tag position × AAD) since GCM gives no
partial credit. Our client uses the legacy 16-byte nonce (`iv[0]=seq&0xff`)
because we advertise no encryption; the 12-byte SS_ENC_CONTROL_V2 nonce is
also supported. Key/IV/tag layout cross-checked against Sunshine stream.cpp +
crypto.cpp and moonlight-common-c ControlStream.c.
Input decode (gamestream/input.rs):
- Decrypted control messages (`[u16 type][u16 len][NV_INPUT packet]`, type
0x0206) decode into lumen_core::input::InputEvent: relative/abs mouse, buttons,
vert/horiz scroll, keyboard down/up. Struct layout from moonlight Input.h
(size BE, magic LE, body BE; keyCode LE masked to the low-byte VK), dispatch
per Sunshine input.cpp (Gen5+). Unit-tested against real captured bytes.
Injection (inject.rs):
- WlrootsInjector: connects to Sway as a Wayland client and injects via the
wlroots virtual-pointer + virtual-keyboard protocols (uinput is invisible to a
compositor running WLR_LIBINPUT_NO_DEVICES=1). Uploads an evdev/US xkb keymap,
tracks modifier state, and maps Windows VK → Linux evdev (full table).
Deps: aes-gcm, wayland-client, wayland-protocols-{wlr,misc}, xkbcommon (+
libxkbcommon-dev in bootstrap-ubuntu.sh).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire M0's portal desktop capture into the GameStream video plane: with
LUMEN_VIDEO_SOURCE=portal the stream captures the headless wlroots desktop
(PipeWire RGB) instead of the synthetic pattern, opens NVENC from the first
captured frame's format/size, and streams it. Verified live: a stock Moonlight
client shows the real 5120×1440 desktop at ~42 fps (release build).
- capture.rs: FastSyntheticCapturer (cheap fill pattern, real-time at 5K) so both
sources share the Capturer trait
- stream.rs: source select (portal | synthetic), encoder opened from the first
frame, wall-clock 90 kHz RTP timestamps (correct under a variable capture rate)
Note: the CPU-copy RGB→rgb0 path caps ~42 fps at 5K (single-threaded); dmabuf
zero-copy is the deferred optimization (plan §9).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stock Moonlight client now decodes H.265 from the lumen host end-to-end
(verified at 5120×1440@120 on RTX 5070 Ti):
- control.rs: ENet control host on UDP 47999 (rusty_enet). Moonlight starts the
control stream before video (STAGE_CONTROL_STREAM_START precedes _VIDEO_), so it
must be up first — this was the blocker behind the earlier "error 35".
- stream.rs: video data plane — on RTSP PLAY, learn the client endpoint from its
ping, NVENC-encode at the negotiated mode, packetize (GameStream RTP/NV/FEC),
send over UDP 47998; stops when the client disconnects.
- rtsp.rs: ANNOUNCE → StreamConfig (resolution/fps/packetSize/bitrate/codec), PLAY
starts the stream, TEARDOWN stops it; PairStatus=1 over the mutual-TLS port.
P1.3 uses a synthetic test pattern + data-shards-only FEC (clean-LAN). Next: real
portal desktop capture, input injection (decode control → uinput), nanors-exact FEC,
encryption, audio.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepares the move to the NVIDIA-GPU Ubuntu VM where M0/M2 run (macOS can't drive the
Wayland/GPU stack). The repo carries the context, since Claude Code sessions are
machine-local and don't transfer.
- CLAUDE.md: project state + design invariants + don't-regress security notes. Auto-loads
every session, so a fresh session on the VM continues from here.
- scripts/bootstrap-ubuntu.sh: verifies the (already-installed) NVIDIA/NVENC stack,
installs rustup + PipeWire/portal/wlroots/Sway + DRM/EGL/GBM/VA dev deps; GATES the
FFmpeg -dev headers so apt can't clobber a custom NVENC build; checks nvidia-drm.modeset.
- scripts/headless/: headless-Sway + xdg-desktop-portal-wlr config templates, the
NVIDIA-wlroots env workarounds, run-headless-sway.sh, and a wf-recorder->hevc_nvenc
capture smoke test (proves capture->NVENC with no Rust).
- docs/linux-setup.md: M0 walkthrough + verified gotchas (modeset, headless backend,
vGPU NVENC licensing, dmabuf->NVENC CPU-copy fallback, FFmpeg-dev gate, crate versions).
Ubuntu 24.04 package names/versions verified against the live archive; scripts pass
shellcheck and `bash -n`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>