The PipeWire dmabuf now reaches NVENC with no CPU touch. Verified live against headless KWin: a tiled BGRx dmabuf is imported and encoded to a pixel-correct H.265 stream (decoded frame matches the captured desktop — no tiling artifacts, no colour swap). The CPU-copy path stays the default and the runtime fallback. Capture side (zerocopy::egl): desktop NVIDIA can't register a dmabuf EGLImage with CUDA directly (cuGraphicsEGLRegisterImage is Tegra-only; cuGraphicsGLRegisterImage rejects EGLImage-backed textures), so we follow OBS/Sunshine — bind the EGLImage to a GL texture, render it through a fullscreen-triangle shader into an immutable GL_RGBA8 texture (de-tiling + .bgra swizzle to the BGRx the encoder wants), then register that texture with CUDA and copy it device-to-device into an owned buffer so the dmabuf returns to the compositor immediately. Encode side (encode/linux::submit_cuda): take a *pooled* CUDA surface via av_hwframe_get_buffer and device→device-copy our imported buffer into it, instead of wrapping our own pointer in a bare AVFrame. A bare frame is rejected with EINVAL (NVENC ignores frames with null buf[0]; the encode path's av_frame_ref needs a refcounted buffer), and a fresh device pointer every frame would thrash NVENC's bounded resource-registration cache — the pool recycles a small set. Also: gate FFmpeg AV_LOG_DEBUG behind LUMEN_FFMPEG_DEBUG for diagnosing hw-frame rejects, and refresh the now-accurate module docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lumen
A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.
lumen is a placeholder codename. The bet: ship a Linux virtual-display streaming
host that speaks the existing Moonlight protocol (every Moonlight/Artemis client works
day one), then break the ~1 Gbps FEC wall with a GF(2¹⁶) Leopard-RS transport as a
negotiated extension. See docs/implementation-plan.md.
Status
| Milestone | State |
|---|---|
M1 — lumen-core + C ABI |
✅ done & tested (FEC, packetization, crypto, session, lumen_core.h) |
M0 — pipeline spike (wlroots→PipeWire→NVENC→file→lumen-core) |
✅ done & verified on NVIDIA (RTX 5070 Ti / driver 595) |
| M2 — P1 host → stock Moonlight | 🟡 capture+encode landed in M0; pairing/RTSP/vdisplay pending |
| M3 — measurement harness | 🟡 tools/loss-harness runs; latency-probe scaffolded |
| M4 — P2 transport + Rust client | 🟡 GF(2¹⁶) core done; lumen-client-rs scaffolded |
| M5 — Apple client | ⬜ scaffolded (clients/apple) |
lumen-core is complete and verified: it builds and its full test suite (FEC recovery,
loopback round-trip under loss, property tests, and a C ABI harness) passes on
macOS/aarch64. M0 is done: lumen-host captures a headless wlroots output via the
ScreenCast portal + PipeWire, encodes it with NVENC, writes a playable H.265 file, and
round-trips every access unit through a lumen_core host→client session (see
docs/linux-setup.md). The remaining Linux host backends (KWin/Mutter virtual displays,
libei input, web/pairing) are #[cfg(target_os = "linux")] seams — defined and compiling,
implementations pending (M2).
Layout
crates/
lumen-core/ protocol · FEC · pacing · crypto — the C ABI (lib + cdylib + staticlib)
lumen-host/ Linux host: vdisplay · capture · encode · inject · web (cfg-gated)
lumen-client-rs/ reference client (M4): VAAPI decode + wgpu present
clients/{apple,android}/ native client scaffolds (import lumen_core.h)
include/lumen_core.h cbindgen-generated C header (checked in)
tools/{latency-probe,loss-harness}/ measurement (plan §10)
docs/implementation-plan.md
Build & test
cargo build --workspace # green on Linux and macOS
cargo test --workspace # unit + loopback + proptest + C ABI harness
cargo clippy --workspace --all-targets
cargo run -p loss-harness # FEC loss-resilience sweep (no network needed)
bash crates/lumen-core/tests/c/run.sh # standalone C-ABI link+round-trip proof
The C header regenerates from crates/lumen-core/src/abi.rs on every build (cbindgen via
build.rs) into include/lumen_core.h.
Design invariants
- One core, linked everywhere. Protocol/FEC/crypto/pacing live in
lumen-coreexactly once, exposed over a stable, versioned C ABI (lumen_abi_version(),LumenConfigcarries its ownstruct_size). - No async on the hot path. The per-frame pipeline uses native threads only;
tokio/quinnare gated behind the off-by-defaultquicfeature (control plane only). - FEC is the wall-breaker. GF(2⁸) (≤255 shards/block) for Moonlight compat; GF(2¹⁶) (≤65535 shards/block, SIMD, O(n log n)) to push past ~1 Gbps.
License
MIT OR Apache-2.0.