Ubuntu 26.04 ships FFmpeg 8.0 (libavcodec 62); bump ffmpeg-next 7.1 -> 8.1 to bind it
as the intended pairing. No source changes needed — the encode API surface we use
(avcodec_send_frame, hwframe contexts, AV_PIX_FMT_CUDA, av_log) is stable across 7->8.
Workspace builds + all tests green; clippy/fmt clean. Refresh the 7.x doc references.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The PipeWire dmabuf now reaches NVENC with no CPU touch. Verified live against
headless KWin: a tiled BGRx dmabuf is imported and encoded to a pixel-correct
H.265 stream (decoded frame matches the captured desktop — no tiling artifacts,
no colour swap). The CPU-copy path stays the default and the runtime fallback.
Capture side (zerocopy::egl): desktop NVIDIA can't register a dmabuf EGLImage
with CUDA directly (cuGraphicsEGLRegisterImage is Tegra-only; cuGraphicsGLRegisterImage
rejects EGLImage-backed textures), so we follow OBS/Sunshine — bind the EGLImage
to a GL texture, render it through a fullscreen-triangle shader into an immutable
GL_RGBA8 texture (de-tiling + .bgra swizzle to the BGRx the encoder wants), then
register that texture with CUDA and copy it device-to-device into an owned buffer
so the dmabuf returns to the compositor immediately.
Encode side (encode/linux::submit_cuda): take a *pooled* CUDA surface via
av_hwframe_get_buffer and device→device-copy our imported buffer into it, instead
of wrapping our own pointer in a bare AVFrame. A bare frame is rejected with
EINVAL (NVENC ignores frames with null buf[0]; the encode path's av_frame_ref
needs a refcounted buffer), and a fresh device pointer every frame would thrash
NVENC's bounded resource-registration cache — the pool recycles a small set.
Also: gate FFmpeg AV_LOG_DEBUG behind LUMEN_FFMPEG_DEBUG for diagnosing
hw-frame rejects, and refresh the now-accurate module docs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scaffolding for dmabuf zero-copy (plan §9), opt-in via LUMEN_ZEROCOPY:
- src/zerocopy/{cuda,egl}.rs: hand-rolled CUDA Driver-API FFI (no Rust crate
exposes the EGL-interop calls / CUeglFrame) with a shared process-wide
CUcontext + pitched device buffers; an EGL importer (GBM platform on the
NVIDIA render node) that turns a dmabuf into an EGLImage, registers it with
CUDA, and copies it device-to-device into an owned buffer. `zerocopy-probe`
subcommand validates the FFI/linking/GPU access — confirmed on the box
(driver 595, EGL_EXT_image_dma_buf_import + modifiers).
- CapturedFrame gains a FramePayload enum (Cpu(Vec<u8>) | Cuda(DeviceBuffer));
the encoder branches: CPU keeps the expand+upload path, CUDA wraps the device
buffer in an AV_PIX_FMT_CUDA frame fed straight to hevc_nvenc (sharing our
CUcontext via a hand-declared AVCUDADeviceContext, since ffmpeg-sys doesn't
bind hwcontext_cuda.h). open_video/the encoder take a `cuda` flag derived from
the first frame's payload.
The capture-side dmabuf negotiation (which produces the Cuda frames) is the
next step; the CPU path is unchanged and remains the default + fallback. Builds
clean, clippy clean, tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Graceful FEC behavior on a lossy link: at a realistic 2% packet loss the stream
is now steady 0% (was spiking 40-60%). Verified live.
- IDR/RFI handling: the control thread recognizes the client's recovery requests
(0x0301 invalidate-reference-frames, 0x0302 request-IDR, 0x0305) and sets a
shared force_idr flag; the video thread forces an NVENC keyframe on the next
frame (Encoder::request_keyframe → input frame pict_type = I). Without this, a
frame that exceeds the FEC budget broke the reference chain until the next GOP
IDR (~2s), cascading to most of the stream being undecodable.
- Min-parity floor: honor the client's x-nv-vqos[0].fec.minRequiredFecPackets
(it asks for 2). Small P-frames previously got m=ceil(k*20/100)=1 parity — a
single loss broke them; flooring m>=2 (capped so k+m<=255, wire pct recomputed)
protects them. This is what turned the 2% spikes into steady 0%.
- Send pacing: spread each frame's packets evenly across the frame interval
instead of blasting them at line rate (a real link drops microbursts), matching
Sunshine's rate-controlled sends; sub-500us sleeps skipped (unreliable).
Note: sustained ~8% uniform loss still degrades — that exceeds 20% FEC for
reference-frame video and real Sunshine degrades there too; real networks are
<1% or bursty, which this now handles cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>