The first end-to-end run of lumen's own protocol, past the GameStream compatibility layer.
- lumen-core/src/quic.rs (behind the `quic` feature): the lumen/1 handshake — Hello/Welcome/
Start as length-prefixed LE binary on one QUIC bi-stream. Welcome carries the COMPLETE
data-plane Config: mode, FEC scheme incl. GF(2^16) Leopard (inexpressible in GameStream),
shard sizing, AES-GCM key + per-direction salt, data UDP port. Plus quinn endpoint helpers
(self-signed server; accepts-any client — pinning lands with the trust model) and framed
async IO. Round-trip unit-tested.
- lumen-host m3-host: serves one lumen/1 session — QUIC handshake, then a NATIVE thread
(no async on the frame path — design invariant) streams deterministic 64KB test frames
through the hardened M1 Session over UdpTransport.
- lumen-client-rs: from scaffold to working reference client — connects, negotiates, brings
up the client Session over UDP, reassembles + FEC-recovers + byte-verifies every frame.
VALIDATED END-TO-END on localhost: 300/300 frames verified, 0 mismatches, through
QUIC-negotiated GF(2^16) FEC + AES-GCM over real UDP sockets. M4 (decode+present) builds on
this exact client skeleton.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The missing zero-copy path is closed. NVIDIA's EGL won't sample LINEAR and the CUDA driver
rejects raw dmabuf fds — but Vulkan imports dmabufs (VK_EXT_external_memory_dma_buf) and
exports OPAQUE_FD memory that CUDA officially imports. zerocopy/vulkan.rs (ash):
dmabuf fd → VkBuffer (import cached per fd) → vkCmdCopyBuffer (GPU) →
exportable VkBuffer → vkGetMemoryFdKHR(OPAQUE_FD) → cuImportExternalMemory → CUdeviceptr
The exportable buffer + CUDA mapping are per-resolution; per frame it's one GPU buffer copy
(fence-waited) + one pitched CUDA copy into the encoder's pool. No CPU touches pixels.
EglImporter::import_linear now routes through the bridge (lazy init; any failure still falls
back to the CPU mmap path). cuda::ExternalDmabuf gained import_owned_fd for the
Vulkan-exported fd.
Validated live: gamescope 720p120 → "Vulkan→CUDA exportable staging buffer ready
size=3686400" (exactly 1280*720*4), full-rate 122.7 fps, decoded frame pixel-correct
(vkcube). KWin's tiled EGL path regression-tested intact. NV12 negotiation dropped — moot
now that BGRx is fully zero-copy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Third compositor on the VirtualDisplay seam. gamescope's model differs from KWin/Mutter: it's
not a runtime protocol but a micro-compositor we spawn — `gamescope --backend headless -W -H -r
-- <app>` — which composites at the client's size AND refresh natively (so no separate
refresh step), runs the app nested, and exports a built-in PipeWire node named "gamescope".
The backend spawns it, discovers that node via pw-dump, and returns a VirtualOutput whose
keepalive owns the process (drop = kill = teardown). App via LUMEN_GAMESCOPE_APP. Select with
LUMEN_COMPOSITOR=gamescope; m0's virtual source now honors LUMEN_COMPOSITOR so any backend is
testable without a client. Input (gamescope's libei/EIS socket) is a follow-up.
Builds/clippy/fmt clean. Needs gamescope installed to validate; headless capture on the
proprietary NVIDIA driver is plausible-by-architecture but unproven — validate empirically.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ubuntu 26.04 ships FFmpeg 8.0 (libavcodec 62); bump ffmpeg-next 7.1 -> 8.1 to bind it
as the intended pairing. No source changes needed — the encode API surface we use
(avcodec_send_frame, hwframe contexts, AV_PIX_FMT_CUDA, av_log) is stable across 7->8.
Workspace builds + all tests green; clippy/fmt clean. Refresh the 7.x doc references.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Honor the client's requested resolution by rendering a compositor virtual output at
exactly that size — native, headless, no scaling. There is no cross-compositor Wayland
protocol for this, so it's a per-compositor backend behind the (previously stubbed)
VirtualDisplay trait.
- vdisplay.rs: VirtualDisplay::create(mode) now returns a live VirtualOutput
{ node_id, remote_fd: Option<OwnedFd>, keepalive } with RAII teardown (drop releases
the output) instead of an inert OutputHandle + explicit destroy. Add compositor
detect() (LUMEN_COMPOSITOR / XDG_CURRENT_DESKTOP).
- vdisplay/kwin.rs: the KWin backend — the zkde_screencast_unstable_v1 stream_virtual_output
client (vendored protocol XML + wayland-scanner codegen). Creates a WxH output, returns
its PipeWire node (default daemon, remote_fd=None); a keepalive thread holds the Wayland
connection until dropped. (Moved here from capture/kwin.rs — it's a vdisplay backend, not
capture.)
- capture: generalize the PipeWire consumer to Option<OwnedFd> (portal remote vs. default
daemon) and add capture_virtual_output(vout), compositor-agnostic, owning the keepalive.
- gamestream/stream.rs: LUMEN_VIDEO_SOURCE=virtual creates a virtual display sized to the
client's cfg and captures it (self-contained, not pooled — a reconnect at a new
resolution gets a fresh output).
- m0: --source kwin-virtual goes through the trait.
Verified end-to-end against the running headless KWin: the request reaches the compositor
and is handled cleanly. Native creation needs a backend implementing createVirtualOutput —
the DRM backend, or the VirtualBackend since KWin 6.5.6; on this box's --virtual 6.4.5 it
returns "Could not find output" (expected; validates after the KWin upgrade). wlroots/Mutter
backends are the next ones to land on the same seam.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scaffolding for dmabuf zero-copy (plan §9), opt-in via LUMEN_ZEROCOPY:
- src/zerocopy/{cuda,egl}.rs: hand-rolled CUDA Driver-API FFI (no Rust crate
exposes the EGL-interop calls / CUeglFrame) with a shared process-wide
CUcontext + pitched device buffers; an EGL importer (GBM platform on the
NVIDIA render node) that turns a dmabuf into an EGLImage, registers it with
CUDA, and copies it device-to-device into an owned buffer. `zerocopy-probe`
subcommand validates the FFI/linking/GPU access — confirmed on the box
(driver 595, EGL_EXT_image_dma_buf_import + modifiers).
- CapturedFrame gains a FramePayload enum (Cpu(Vec<u8>) | Cuda(DeviceBuffer));
the encoder branches: CPU keeps the expand+upload path, CUDA wraps the device
buffer in an AV_PIX_FMT_CUDA frame fed straight to hevc_nvenc (sharing our
CUcontext via a hand-declared AVCUDADeviceContext, since ffmpeg-sys doesn't
bind hwcontext_cuda.h). open_video/the encoder take a `cuda` flag derived from
the first frame's payload.
The capture-side dmabuf negotiation (which produces the Cuda frames) is the
next step; the CPU path is unchanged and remains the default + fallback. Builds
clean, clippy clean, tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a second input-injection backend that works on compositors implementing
the org.freedesktop.portal.RemoteDesktop interface (KWin, GNOME/Mutter), where
the wlroots virtual-input protocols are absent. Uses ashpd 0.13 to open a
RemoteDesktop session + EIS fd and reis 0.6.1 to drive it as an EI sender:
bind pointer/keyboard/scroll/button capabilities and, per device,
start_emulating → emit → frame. Runs on a dedicated thread with its own tokio
runtime (the portal session + EIS connection must stay alive and the event
stream must be polled continuously); open() returns immediately so a slow or
denied portal can never freeze the ENet control thread, with events enqueued
over an unbounded channel until devices resume.
Backend now auto-selects per session (inject::default_backend): wlr on Sway,
libei on KDE/GNOME; LUMEN_INPUT_BACKEND overrides. Refactor inject.rs into the
inject/{wlr,libei}.rs layout matching the capture/encode convention. Keyboard
codes are evdev (the same space our VK→evdev table produces) and the compositor
supplies the keymap, so no keymap upload and no modifier serialization — pressing
the modifier keys Moonlight sends is enough.
Add a `lumen-host input-test` subcommand that injects a scripted mouse+keyboard
pattern through the session backend, so input injection can be validated without
a Moonlight client.
Live-validated on headless KWin (Plasma 6.4): mouse motion, left click, and the
'A' key inject correctly and are delivered to the focused client.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stock Moonlight client now gets video + full input + AUDIO from the
from-scratch GameStream host (verified live end-to-end on a macOS client).
Audio (audio.rs, audio/linux.rs, gamestream/audio.rs):
- Capture the default PipeWire sink's monitor (system output) as interleaved
f32 stereo @ 48kHz via stream.capture.sink, on its own thread.
- Opus-encode 5ms/240-sample stereo frames (RESTRICTED_LOWDELAY, CBR) and send
as GameStream RTP audio: 12-byte BE RTP_PACKET (packetType 97, seq+1/pkt,
timestamp += packetDuration, ssrc 0) on UDP 48000, after learning the client
endpoint from its port-learning ping.
- Encrypt the Opus payload with AES-128-CBC (PKCS7), key = launch rikey, IV =
BE32(rikeyid + seq) in [0..4]. Like the control stream, modern Moonlight
always decrypts audio regardless of the negotiated flags — plaintext makes it
log "Failed to decrypt audio packet" and play silence (diagnosed from the
client log). RTP header stays in the clear. Scheme cross-checked against
Sunshine stream.cpp/crypto.cpp + moonlight AudioStream.c.
- Pace each frame to its 5ms slot (PipeWire delivers ~1024-frame buffers) to
avoid bursts the client's jitter buffer hears as glitches. LUMEN_AUDIO_GAIN
applies optional linear gain for quiet sources.
- DESCRIBE SDP advertises the stereo Opus config (a=fmtp:97 surround-params).
Video (stream.rs): pace at a steady ≤60fps, re-encoding the last captured frame
when the compositor produces none. wlroots only emits on damage, so a static or
slow-updating desktop previously starved the client into a "network too slow"
abort; an unchanged frame costs a near-empty P-frame. Adds a non-blocking
Capturer::try_latest (portal drains to the freshest queued frame).
Misc: serialize pipewire init across the video + audio capture threads
(pwinit.rs, std::sync::Once) to avoid a concurrent pw_init race. Deps: opus,
cbc; libopus-dev in bootstrap-ubuntu.sh.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stock Moonlight client can now drive the headless Sway desktop: mouse
movement, buttons, scroll, and keyboard all inject through the streamed
session (verified live end-to-end — typing, clicking, window management).
Control stream (gamestream/control.rs):
- Moonlight encrypts the ENet control stream with AES-128-GCM even though we
negotiate no media encryption (it detects our Sunshine `state` and turns it
on). Decrypt per-packet under the /launch `rikey`.
- The exact GCM scheme is auto-detected on the first authenticating packet
(nonce construction × key byte-order × tag position × AAD) since GCM gives no
partial credit. Our client uses the legacy 16-byte nonce (`iv[0]=seq&0xff`)
because we advertise no encryption; the 12-byte SS_ENC_CONTROL_V2 nonce is
also supported. Key/IV/tag layout cross-checked against Sunshine stream.cpp +
crypto.cpp and moonlight-common-c ControlStream.c.
Input decode (gamestream/input.rs):
- Decrypted control messages (`[u16 type][u16 len][NV_INPUT packet]`, type
0x0206) decode into lumen_core::input::InputEvent: relative/abs mouse, buttons,
vert/horiz scroll, keyboard down/up. Struct layout from moonlight Input.h
(size BE, magic LE, body BE; keyCode LE masked to the low-byte VK), dispatch
per Sunshine input.cpp (Gen5+). Unit-tested against real captured bytes.
Injection (inject.rs):
- WlrootsInjector: connects to Sway as a Wayland client and injects via the
wlroots virtual-pointer + virtual-keyboard protocols (uinput is invisible to a
compositor running WLR_LIBINPUT_NO_DEVICES=1). Uploads an evdev/US xkb keymap,
tracks modifier state, and maps Windows VK → Linux evdev (full table).
Deps: aes-gcm, wayland-client, wayland-protocols-{wlr,misc}, xkbcommon (+
libxkbcommon-dev in bootstrap-ubuntu.sh).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stock Moonlight client now decodes H.265 from the lumen host end-to-end
(verified at 5120×1440@120 on RTX 5070 Ti):
- control.rs: ENet control host on UDP 47999 (rusty_enet). Moonlight starts the
control stream before video (STAGE_CONTROL_STREAM_START precedes _VIDEO_), so it
must be up first — this was the blocker behind the earlier "error 35".
- stream.rs: video data plane — on RTSP PLAY, learn the client endpoint from its
ping, NVENC-encode at the negotiated mode, packetize (GameStream RTP/NV/FEC),
send over UDP 47998; stops when the client disconnects.
- rtsp.rs: ANNOUNCE → StreamConfig (resolution/fps/packetSize/bitrate/codec), PLAY
starts the stream, TEARDOWN stops it; PairStatus=1 over the mutual-TLS port.
P1.3 uses a synthetic test pattern + data-shards-only FEC (clean-LAN). Next: real
portal desktop capture, input injection (decode control → uinput), nanors-exact FEC,
encryption, audio.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>