enricobuehler aa91485008 feat: M2 — complete zero-copy dmabuf→NVENC capture path (EGL/GL→CUDA)

The PipeWire dmabuf now reaches NVENC with no CPU touch. Verified live against
headless KWin: a tiled BGRx dmabuf is imported and encoded to a pixel-correct
H.265 stream (decoded frame matches the captured desktop — no tiling artifacts,
no colour swap). The CPU-copy path stays the default and the runtime fallback.

Capture side (zerocopy::egl): desktop NVIDIA can't register a dmabuf EGLImage
with CUDA directly (cuGraphicsEGLRegisterImage is Tegra-only; cuGraphicsGLRegisterImage
rejects EGLImage-backed textures), so we follow OBS/Sunshine — bind the EGLImage
to a GL texture, render it through a fullscreen-triangle shader into an immutable
GL_RGBA8 texture (de-tiling + .bgra swizzle to the BGRx the encoder wants), then
register that texture with CUDA and copy it device-to-device into an owned buffer
so the dmabuf returns to the compositor immediately.

Encode side (encode/linux::submit_cuda): take a *pooled* CUDA surface via
av_hwframe_get_buffer and device→device-copy our imported buffer into it, instead
of wrapping our own pointer in a bare AVFrame. A bare frame is rejected with
EINVAL (NVENC ignores frames with null buf[0]; the encode path's av_frame_ref
needs a refcounted buffer), and a fresh device pointer every frame would thrash
NVENC's bounded resource-registration cache — the pool recycles a small set.

Also: gate FFmpeg AV_LOG_DEBUG behind LUMEN_FFMPEG_DEBUG for diagnosing
hw-frame rejects, and refresh the now-accurate module docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-09 16:28:29 +00:00

Path	Last commit	Updated
.gitea/workflows	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00
clients	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00
crates	feat: M2 — complete zero-copy dmabuf→NVENC capture path (EGL/GL→CUDA)	2026-06-09 16:28:29 +00:00
docs	feat: M0 capture→encode pipeline + M2 GameStream host (pairing, RTSP, video)	2026-06-09 07:14:59 +00:00
include	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00
scripts	feat: M2 P1.6 — audio (Opus + AES-CBC) and steady-rate video pacing	2026-06-09 10:39:22 +00:00
tools	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00
.gitignore	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00
Cargo.lock	feat: M2 zero-copy foundation — EGL→CUDA import + NVENC CUDA-frame path	2026-06-09 15:13:05 +00:00
Cargo.toml	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00
CLAUDE.md	feat: M0 capture→encode pipeline + M2 GameStream host (pairing, RTSP, video)	2026-06-09 07:14:59 +00:00
README.md	feat: M0 capture→encode pipeline + M2 GameStream host (pairing, RTSP, video)	2026-06-09 07:14:59 +00:00
rust-toolchain.toml	feat: M1 lumen-core (FEC/crypto/packet/session + C ABI) and workspace scaffold	2026-06-09 00:02:52 +02:00

README.md

lumen

A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.

lumen is a placeholder codename. The bet: ship a Linux virtual-display streaming host that speaks the existing Moonlight protocol (every Moonlight/Artemis client works day one), then break the ~1 Gbps FEC wall with a GF(2¹⁶) Leopard-RS transport as a negotiated extension. See docs/implementation-plan.md.

Status

Milestone	State
M1 — `lumen-core` + C ABI	✅ done & tested (FEC, packetization, crypto, session, `lumen_core.h`)
M0 — pipeline spike (wlroots→PipeWire→NVENC→file→`lumen-core`)	✅ done & verified on NVIDIA (RTX 5070 Ti / driver 595)
M2 — P1 host → stock Moonlight	🟡 capture+encode landed in M0; pairing/RTSP/vdisplay pending
M3 — measurement harness	🟡 `tools/loss-harness` runs; `latency-probe` scaffolded
M4 — P2 transport + Rust client	🟡 GF(2¹⁶) core done; `lumen-client-rs` scaffolded
M5 — Apple client	⬜ scaffolded (`clients/apple`)

lumen-core is complete and verified: it builds and its full test suite (FEC recovery, loopback round-trip under loss, property tests, and a C ABI harness) passes on macOS/aarch64. M0 is done: lumen-host captures a headless wlroots output via the ScreenCast portal + PipeWire, encodes it with NVENC, writes a playable H.265 file, and round-trips every access unit through a lumen_core host→client session (see docs/linux-setup.md). The remaining Linux host backends (KWin/Mutter virtual displays, libei input, web/pairing) are #[cfg(target_os = "linux")] seams — defined and compiling, implementations pending (M2).

Layout

crates/
  lumen-core/        protocol · FEC · pacing · crypto — the C ABI (lib + cdylib + staticlib)
  lumen-host/        Linux host: vdisplay · capture · encode · inject · web (cfg-gated)
  lumen-client-rs/   reference client (M4): VAAPI decode + wgpu present
clients/{apple,android}/   native client scaffolds (import lumen_core.h)
include/lumen_core.h       cbindgen-generated C header (checked in)
tools/{latency-probe,loss-harness}/   measurement (plan §10)
docs/implementation-plan.md

Build & test

cargo build --workspace          # green on Linux and macOS
cargo test  --workspace          # unit + loopback + proptest + C ABI harness
cargo clippy --workspace --all-targets

cargo run -p loss-harness        # FEC loss-resilience sweep (no network needed)
bash crates/lumen-core/tests/c/run.sh   # standalone C-ABI link+round-trip proof

The C header regenerates from crates/lumen-core/src/abi.rs on every build (cbindgen via build.rs) into include/lumen_core.h.
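To illustrate what a size-prefixed, versioned C ABI struct can look like on the Rust side, here is a minimal sketch. Only the names `lumen_abi_version()`, `LumenConfig`, and `struct_size` come from this README; the other fields, the default values, the version number, and the `read_config` helper are hypothetical and are not taken from `abi.rs`.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical version number; the real value is defined by lumen-core.
const ABI_VERSION: u32 = 1;

/// Hypothetical layout. Only `struct_size` is named in the README;
/// the remaining fields exist purely for illustration.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LumenConfig {
    /// Size in bytes of the struct the caller was compiled against.
    pub struct_size: u32,
    pub fec_percent: u32,     // hypothetical field
    pub max_packet_size: u32, // hypothetical field, added "later"
}

impl LumenConfig {
    /// Defaults used for any field an older caller does not know about.
    pub fn defaults() -> Self {
        LumenConfig {
            struct_size: size_of::<LumenConfig>() as u32,
            fec_percent: 20,
            max_packet_size: 1200,
        }
    }
}

/// In a real cdylib this would also be exported unmangled.
pub extern "C" fn lumen_abi_version() -> u32 {
    ABI_VERSION
}

/// Copy a caller-provided config, tolerating callers built against an
/// older, shorter struct: only the bytes the caller claims are copied,
/// and every remaining field keeps its default.
///
/// # Safety
/// `src` must be null or point to at least `struct_size` readable bytes,
/// where `struct_size` is the leading u32 of the pointed-to struct.
pub unsafe fn read_config(src: *const u8) -> Option<LumenConfig> {
    if src.is_null() {
        return None;
    }
    // The size prefix is always the first field, so it is safe to read first.
    let claimed = unsafe { ptr::read_unaligned(src as *const u32) } as usize;
    if claimed < size_of::<u32>() {
        return None;
    }
    let mut cfg = LumenConfig::defaults();
    let n = claimed.min(size_of::<LumenConfig>());
    unsafe {
        ptr::copy_nonoverlapping(src, &mut cfg as *mut LumenConfig as *mut u8, n);
    }
    // Normalise to the size this library was built with.
    cfg.struct_size = size_of::<LumenConfig>() as u32;
    Some(cfg)
}
```

The point of the pattern is that new fields can be appended without breaking clients compiled against an older header: a caller passing an 8-byte struct still gets a fully populated config, with the appended field left at its default.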

Design invariants

One core, linked everywhere. Protocol/FEC/crypto/pacing live in lumen-core exactly once, exposed over a stable, versioned C ABI (lumen_abi_version(), LumenConfig carries its own struct_size).
No async on the hot path. The per-frame pipeline uses native threads only; tokio/quinn are gated behind the off-by-default quic feature (control plane only).
FEC is the wall-breaker. GF(2⁸) (≤255 shards/block) for Moonlight compat; GF(2¹⁶) (≤65535 shards/block, SIMD, O(n log n)) to push past ~1 Gbps.
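The shard limits in the last invariant translate directly into how many FEC blocks a frame must be split into. The sketch below is back-of-the-envelope arithmetic only: the 255 and 65535 limits come from this README, while the "parity as a percentage of data shards" overhead model and both function names are assumptions made for illustration, not lumen-core's actual block-splitting logic.

```rust
/// Maximum shards (data + parity) per Reed-Solomon block, as stated in the README.
pub const MAX_SHARDS_GF8: usize = 255;
pub const MAX_SHARDS_GF16: usize = 65_535;

/// Parity shards for `data` data shards at `parity_percent` overhead, rounded up.
/// (Assumed overhead model; hypothetical helper.)
fn parity_for(data: usize, parity_percent: usize) -> usize {
    (data * parity_percent + 99) / 100
}

/// Largest number of data shards that fits in one block together with its
/// parity, or `None` if not even a single data shard fits.
pub fn max_data_per_block(parity_percent: usize, max_shards: usize) -> Option<usize> {
    // Start from the closed-form estimate, then step down until
    // data + ceil(parity) fits within the field's shard limit.
    let mut d = max_shards * 100 / (100 + parity_percent);
    while d > 0 && d + parity_for(d, parity_percent) > max_shards {
        d -= 1;
    }
    if d == 0 { None } else { Some(d) }
}

/// Number of FEC blocks needed to carry `data_shards` data shards.
pub fn blocks_needed(
    data_shards: usize,
    parity_percent: usize,
    max_shards: usize,
) -> Option<usize> {
    let per_block = max_data_per_block(parity_percent, max_shards)?;
    Some((data_shards + per_block - 1) / per_block)
}
```

Under these assumptions, a frame of 2000 data shards with 20% parity needs 10 blocks in GF(2⁸) (at most 212 data shards per block) but fits in a single GF(2¹⁶) block, which is the kind of per-frame block count the larger field is meant to avoid.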

License

MIT OR Apache-2.0.
