feat: M2 P1.5 (FEC) — nanors-exact Reed-Solomon recovery for the video stream

Moonlight now reconstructs lost video shards from our parity (verified live:
under induced packet loss the picture recovers cleanly instead of failing with
"network connection too bad"; 0% added loss in normal operation).

The decisive finding: Moonlight's nanors uses a CAUCHY generator matrix
(M[j][i] = inv[(m+i)^j], where ^ is XOR, j is the parity row and i the data
column; GF(2^8), poly 0x11d, i.e. low byte 0x1d), while reed-solomon-erasure
uses a Vandermonde matrix. Its parity was therefore NOT Moonlight-decodable,
despite the old gf8.rs comment claiming equivalence.
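
A minimal sketch of the Cauchy construction described above, not the vendored
fec-rs code. It assumes `^` in the formula is XOR, that j indexes parity rows
and i indexes data columns, and that each shard is a single byte. The helper
names `gf_mul`, `gf_inv` and `cauchy_parity` are hypothetical. Fed the
regression vector quoted in this commit (k=4, m=2, data [10,20,30,40]) it
produces parity [136,0].

```rust
/// Multiply in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d, low byte 0x1d).
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1; // the x^8 bit falls off; fold it back in via the low byte
        if carry {
            a ^= 0x1d;
        }
        b >>= 1;
    }
    acc
}

/// Multiplicative inverse by brute force (fine for a sketch; real coders use tables).
fn gf_inv(a: u8) -> u8 {
    (1..=255u8)
        .find(|&x| gf_mul(a, x) == 1)
        .expect("zero has no inverse")
}

/// Parity bytes for one byte column: parity[j] = XOR_i inv((m+i) XOR j) * data[i].
/// Assumes k + m <= 255 so every (m+i) XOR j fits in a byte and is non-zero.
fn cauchy_parity(data: &[u8], m: usize) -> Vec<u8> {
    (0..m)
        .map(|j| {
            data.iter().enumerate().fold(0u8, |acc, (i, &d)| {
                let coeff = gf_inv(((m + i) ^ j) as u8);
                acc ^ gf_mul(coeff, d)
            })
        })
        .collect()
}
```

Because j < m <= m+i, the value (m+i) XOR j is never zero, so every matrix
entry has an inverse.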

lumen-core:
- Swap the GF(2^8) backend from reed-solomon-erasure to a vendored fec-rs
  (vendor/fec-rs, BSD-2), which builds the byte-identical Cauchy matrix. Pure
  Rust, no FFI — keeps the "one core" hot path. This makes both lumen's own
  protocol and the GameStream parity nanors-compatible.
- Lock it with a regression test against real nanors vectors
  (k=4,m=2 [10,20,30,40] -> parity [136,0]) + an independent matrix-derived
  cross-check + an erase/recover round-trip. Existing FEC/loopback tests stay
  green, so lumen's own protocol is unaffected.
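
The erase/recover round-trip can be illustrated for the simplest case, a
single lost data shard. This is a sketch under the same assumptions as the
matrix formula above (XOR in the index, one byte per shard), not the fec-rs
decoder, which handles multiple erasures by inverting a submatrix. `coeff` and
`recover_one` are hypothetical names.

```rust
/// Multiply in GF(2^8), reducing by 0x11d (low byte 0x1d).
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1d;
        }
        b >>= 1;
    }
    acc
}

/// Brute-force multiplicative inverse.
fn gf_inv(a: u8) -> u8 {
    (1..=255u8)
        .find(|&x| gf_mul(a, x) == 1)
        .expect("zero has no inverse")
}

/// Cauchy coefficient for parity row `j`, data column `i`, with `m` parity rows.
fn coeff(m: usize, i: usize, j: usize) -> u8 {
    gf_inv(((m + i) ^ j) as u8)
}

/// Recover the single missing data byte from parity row `j`.
/// Returns None when zero or more than one data byte is missing.
fn recover_one(data: &[Option<u8>], parity_j: u8, m: usize, j: usize) -> Option<u8> {
    let mut missing = None;
    let mut acc = parity_j;
    for (i, d) in data.iter().enumerate() {
        match d {
            // Cancel each known shard's contribution out of the parity byte.
            Some(v) => acc ^= gf_mul(coeff(m, i, j), *v),
            None if missing.is_none() => missing = Some(i),
            None => return None, // two erasures need two parity rows
        }
    }
    // What remains is coeff * lost_byte; divide the coefficient back out.
    missing.map(|e| gf_mul(gf_inv(coeff(m, e, j)), acc))
}
```

With the regression vector, dropping the second data byte and supplying either
parity byte (136 for row 0, 0 for row 1) gives back 20.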

lumen-host video.rs:
- Generate m = ceil(k*pct/100) parity shards per FEC block via Gf8Coder; stamp
  fecInfo with the recomputed wire pct (100*m/k) so the client derives the same
  count; cap per-block data to 255*100/(100+pct) so k+m <= 255.
- CRITICAL byte-exactness: RS runs over the whole `blocksize` shard (Moonlight
  decodes packetSize+16 bytes from the datagram start and PACKET_RECOVERY_FAILUREs
  on a bad reconstructed `flags` byte). So the NV header fields RS must reproduce
  (streamPacketIndex/frameIndex/flags/multiFec*) are written into data shards
  BEFORE encode, and only the transport fields (RTP header/seq/timestamp +
  fecInfo) are stamped AFTER — leaving the flags byte RS-covered. Matches
  Sunshine stream.cpp. Unit-tested incl. flags recovery.
- fec_percentage wired from stream.rs (Sunshine default 20, LUMEN_FEC_PCT
  override; 0 = data-only). LUMEN_VIDEO_DROP injects loss to test recovery.
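
The per-block arithmetic above can be sketched as standalone functions. The
formulas are the ones stated in this commit; the function names
`parity_count`, `wire_pct` and `max_data_shards` are illustrative, and the
255-shard default for pct = 0 stands in for MAX_DATA_SHARDS_PER_BLOCK, whose
value is an assumption here.

```rust
/// m = ceil(k * pct / 100): parity shards for a block of `k` data shards.
fn parity_count(k: usize, pct: usize) -> usize {
    (k * pct).div_ceil(100)
}

/// Percentage stamped on the wire, recomputed from the actual parity count.
fn wire_pct(k: usize, m: usize) -> usize {
    if m > 0 { (100 * m) / k } else { 0 }
}

/// Largest per-block data count that keeps k + m <= 255.
/// Assumption: 255 is the data-only ceiling when pct is 0.
fn max_data_shards(pct: usize) -> usize {
    if pct > 0 { (255 * 100) / (100 + pct) } else { 255 }
}

/// fecInfo: dataShards<<22 | fecIndex<<12 | fecPercentage<<4.
fn fec_info(k: usize, fec_index: usize, pct: usize) -> u32 {
    ((k as u32) << 22) | ((fec_index as u32) << 12) | ((pct as u32) << 4)
}
```

For example, k = 3 at the default 20% gives m = 1 and a wire percentage of 33,
and the 20% cap works out to 212 data shards plus 43 parity shards, exactly
255.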

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 11:34:27 +00:00
parent 278a6330de
commit 72f8c05aa3
14 changed files with 2921 additions and 212 deletions
+20 -2
@@ -8,6 +8,7 @@ use super::VIDEO_PORT;
use crate::capture::{self, Capturer, FastSyntheticCapturer};
use crate::encode::{self, Codec};
use anyhow::{Context, Result};
use rand::Rng;
use std::net::UdpSocket;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
@@ -82,7 +83,12 @@ fn run(cfg: StreamConfig, running: &AtomicBool) -> Result<()> {
cfg.bitrate_kbps as u64 * 1000,
)
.context("open NVENC for stream")?;
let mut pk = VideoPacketizer::new(cfg.packet_size);
// FEC overhead percent (Sunshine default 20). Override with LUMEN_FEC_PCT (0 = data-only).
let fec_pct: u8 = std::env::var("LUMEN_FEC_PCT")
.ok()
.and_then(|v| v.parse().ok())
.unwrap_or(20);
let mut pk = VideoPacketizer::new(cfg.packet_size, fec_pct);
// Pace at a steady rate (capped at 60fps), re-encoding the last captured frame when the
// compositor produced no new one. wlroots only emits frames on damage, so a static or
@@ -94,6 +100,13 @@ fn run(cfg: StreamConfig, running: &AtomicBool) -> Result<()> {
let mut fps_count: u32 = 0;
let mut fps_t = Instant::now();
let stream_start = Instant::now();
// Test knob: drop this % of outbound packets to exercise FEC recovery (0 = off).
let drop_pct: u32 = std::env::var("LUMEN_VIDEO_DROP")
.ok()
.and_then(|v| v.parse().ok())
.unwrap_or(0);
let mut rng = rand::thread_rng();
let mut dropped: u64 = 0;
while running.load(Ordering::SeqCst) {
let tick = Instant::now();
@@ -113,6 +126,11 @@ fn run(cfg: StreamConfig, running: &AtomicBool) -> Result<()> {
FrameType::P
};
for pkt in pk.packetize(&au.data, ft, ts) {
// Simulated network loss: build the packet (advances seq) but skip the send.
if drop_pct > 0 && rng.gen_range(0..100) < drop_pct {
dropped += 1;
continue;
}
if sock.send(&pkt).is_err() {
client_gone = true;
break;
@@ -130,7 +148,7 @@ fn run(cfg: StreamConfig, running: &AtomicBool) -> Result<()> {
fps_count += 1;
if fps_t.elapsed() >= Duration::from_secs(1) {
tracing::info!(fps = fps_count, sent_pkts, "video: streaming");
tracing::info!(fps = fps_count, sent_pkts, dropped, "video: streaming");
fps_count = 0;
fps_t = Instant::now();
}
+159 -72
@@ -1,15 +1,21 @@
//! GameStream video wire packetization: an encoded access unit → UDP datagrams a stock
//! Moonlight client decodes. Each datagram is
//! Moonlight client decodes (and recovers under loss). Each datagram is
//! `RTP_PACKET(12, big-endian) + reserved[4] + NV_VIDEO_PACKET(16, little-endian) + payload`
//! and the frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`, then
//! striped into ≤4 FEC blocks of ≤255 data shards. Byte-exact spec:
//! striped into ≤4 FEC blocks of ≤255 shards. Byte-exact spec:
//! `docs/research/gamestream-protocol-research.json` (video plane).
//!
//! P1.3 sends **data shards only** (`fecPercentage = 0`): on a clean LAN the client has
//! every data shard and never runs ReedSolomon recovery, so we get a decodable frame
//! without matching Moonlight's `nanors` parity matrix (that interop work is P1.5). Plaintext
//! only (encryption negotiated off for now). This lives in lumen-host for fast iteration;
//! the wire codec moves into lumen-core (the P1 wire mode) once proven.
//! FEC (P1.5): each block carries `m = ⌈k·pct/100⌉` ReedSolomon parity shards generated by
//! `lumen_core::fec::Gf8Coder` (the nanors-compatible Cauchy GF(2⁸) coder). Crucially, RS runs
//! over the **whole `blocksize` shard** — Moonlight decodes over `packetSize + 16` bytes from
//! the datagram start (`RtpVideoQueue.c`), and rejects a recovered shard whose reconstructed
//! `flags` byte isn't valid — so the NV header fields RS must reproduce (streamPacketIndex,
//! frameIndex, flags, multiFec*) are written into the data shards **before** encoding, and only
//! the transport fields (RTP header/seq/timestamp + fecInfo) are stamped **after**, matching
//! Sunshine `stream.cpp`. `pct = 0` falls back to data-shards-only. Plaintext (AES-GCM video
//! encryption is negotiated off for now).
use lumen_core::fec::{ErasureCoder, Gf8Coder};
/// RTP `header` byte: version 2 (0x80) | extension (0x10) — Moonlight keys on the extension.
const RTP_HEADER_BYTE: u8 = 0x80 | 0x10;
@@ -28,28 +34,32 @@ pub enum FrameType {
P,
}
/// Splits encoded access units into GameStream video datagrams.
/// Splits encoded access units into GameStream video datagrams (data + FEC parity shards).
pub struct VideoPacketizer {
/// Negotiated `packetSize` (ANNOUNCE `x-nv-video[0].packetSize`).
packet_size: usize,
/// Per-shard payload bytes = `blocksize - SHARD_HEADER`, `blocksize = packetSize + 16`.
payload_per_shard: usize,
/// Requested FEC overhead percent (0 = data shards only). The wire carries the recomputed
/// per-block `(100·m)/k` so Moonlight derives the same parity count.
fec_percentage: usize,
frame_index: u32,
/// Monotonic per-stream packet counter (the RTP sequence / streamPacketIndex source).
seq: u32,
}
impl VideoPacketizer {
pub fn new(packet_size: usize) -> Self {
pub fn new(packet_size: usize, fec_percentage: u8) -> Self {
VideoPacketizer {
packet_size,
payload_per_shard: packet_size + 16 - SHARD_HEADER,
fec_percentage: fec_percentage as usize,
frame_index: 0,
seq: 0,
}
}
/// Packetize one encoded AU into wire datagrams (ready for UDP send).
/// Packetize one encoded AU into wire datagrams (data shards + Cauchy RS parity shards).
pub fn packetize(
&mut self,
au: &[u8],
@@ -59,6 +69,8 @@ impl VideoPacketizer {
let frame_index = self.frame_index;
self.frame_index = self.frame_index.wrapping_add(1);
let pps = self.payload_per_shard;
let blocksize = SHARD_HEADER + pps; // = packet_size + 16
let pct = self.fec_percentage;
// frame payload = 8-byte short frame header + the AU bitstream.
let total_len = 8 + au.len();
@@ -71,53 +83,120 @@ impl VideoPacketizer {
fp.extend_from_slice(au);
let total_data = total_len.div_ceil(pps).max(1);
let n_blocks = total_data
.div_ceil(MAX_DATA_SHARDS_PER_BLOCK)
.clamp(1, MAX_FEC_BLOCKS);
// With parity, cap per-block data so k + m ≤ 255 (the GF(2⁸) ceiling): parity for k
// data shards is ⌈k·pct/100⌉, so k ≤ 255·100/(100+pct).
let max_data = if pct > 0 {
(255 * 100) / (100 + pct)
} else {
MAX_DATA_SHARDS_PER_BLOCK
};
let n_blocks = total_data.div_ceil(max_data).clamp(1, MAX_FEC_BLOCKS);
let per_block = total_data.div_ceil(n_blocks);
let mut packets = Vec::with_capacity(total_data);
let mut packets = Vec::with_capacity(total_data + total_data * pct / 100 + n_blocks);
for b in 0..n_blocks {
let first = b * per_block;
let last = ((b + 1) * per_block).min(total_data);
if first >= last {
break;
}
let block_data_count = last - first;
for (fec_index, shard) in (first..last).enumerate() {
let start = shard * pps;
let end = (start + pps).min(fp.len());
let mut payload = vec![0u8; pps]; // last shard zero-padded
payload[..end - start].copy_from_slice(&fp[start..end]);
let k = last - first;
let block_seq_base = self.seq;
let multi_fec_blocks = ((b as u8) << 4) | (((n_blocks - 1) as u8) << 6);
// 1. Build this block's k data-shard datagrams (full `blocksize`), writing the NV
// header fields RS must reproduce on recovery (streamPacketIndex, frameIndex,
// flags, multiFec*). The RTP header + fecInfo are left zero (stamped post-RS).
let mut shards: Vec<Vec<u8>> = Vec::with_capacity(k);
for i in 0..k {
let global = first + i;
let seq = block_seq_base.wrapping_add(i as u32); // wrap like the parity path; avoids a debug-build overflow panic
let mut buf = vec![0u8; blocksize];
let mut flags = FLAG_PIC;
if shard == 0 {
if global == 0 {
flags |= FLAG_SOF;
}
if shard == total_data - 1 {
if global == total_data - 1 {
flags |= FLAG_EOF;
}
let multi_fec_blocks = ((b as u8) << 4) | (((n_blocks - 1) as u8) << 6);
// fecInfo: dataShards<<22 | fecIndex<<12 | fecPercentage<<4 (pct = 0).
let fec_info: u32 = ((block_data_count as u32) << 22) | ((fec_index as u32) << 12);
let seq = self.seq;
self.seq = self.seq.wrapping_add(1);
buf[16..20].copy_from_slice(&(seq << 8).to_le_bytes()); // streamPacketIndex
buf[20..24].copy_from_slice(&frame_index.to_le_bytes()); // frameIndex
buf[24] = flags;
buf[26] = MULTI_FEC_FLAGS;
buf[27] = multi_fec_blocks;
let ps = global * pps;
let pe = (ps + pps).min(fp.len());
buf[SHARD_HEADER..SHARD_HEADER + (pe - ps)].copy_from_slice(&fp[ps..pe]);
shards.push(buf);
}
packets.push(build_packet(
// 2. m = ⌈k·pct/100⌉ parity shards over the full datagrams. The wire percentage is
// recomputed from m so the client derives the same parity count.
let m = if pct > 0 { (k * pct).div_ceil(100) } else { 0 };
let wire_pct = if m > 0 { (100 * m) / k } else { 0 };
let parity = if m > 0 {
// NOTE: if encode fails, the block goes out with no parity while `wire_pct` still advertises `m`.
Gf8Coder.encode(&shards, m).unwrap_or_default()
} else {
Vec::new()
};
// 3. Stamp transport headers (RTP + fecInfo) on every shard. We do NOT touch the
// flags/streamPacketIndex bytes, so a recovered data shard's RS-reconstructed
// NV header stays valid.
self.seq = block_seq_base.wrapping_add(k as u32);
for (i, mut buf) in shards.into_iter().enumerate() {
let seq = block_seq_base.wrapping_add(i as u32);
finalize(
&mut buf,
seq,
timestamp_90k,
frame_index,
flags,
multi_fec_blocks,
fec_info,
&payload,
));
fec_info(k, i, wire_pct),
);
packets.push(buf);
}
for (j, mut buf) in parity.into_iter().enumerate() {
let seq = self.seq;
self.seq = self.seq.wrapping_add(1);
finalize(
&mut buf,
seq,
timestamp_90k,
frame_index,
multi_fec_blocks,
fec_info(k, k + j, wire_pct),
);
packets.push(buf);
}
}
packets
}
}
/// `fecInfo` (u32, little-endian): `dataShards<<22 | fecIndex<<12 | fecPercentage<<4`.
fn fec_info(k: usize, fec_index: usize, pct: usize) -> u32 {
((k as u32) << 22) | ((fec_index as u32) << 12) | ((pct as u32) << 4)
}
/// Stamp the post-RS transport fields into a shard datagram (in place). Leaves the NV
/// `flags`/`streamPacketIndex`/`multiFecFlags` bytes untouched (RS-covered).
fn finalize(
buf: &mut [u8],
seq: u32,
ts_90k: u32,
frame_index: u32,
multi_fec_blocks: u8,
fec_info: u32,
) {
buf[0] = RTP_HEADER_BYTE; // header (version 2 + extension)
buf[2..4].copy_from_slice(&(seq as u16).to_be_bytes()); // sequenceNumber (BE)
buf[4..8].copy_from_slice(&ts_90k.to_be_bytes()); // timestamp (90 kHz, BE)
buf[20..24].copy_from_slice(&frame_index.to_le_bytes()); // frameIndex (re-affirm for parity)
buf[27] = multi_fec_blocks; // re-affirm for parity
buf[28..32].copy_from_slice(&fec_info.to_le_bytes()); // fecInfo (LE)
}
/// 8-byte `video_short_frame_header_t` (little-endian), prefixed to the AU bitstream.
fn short_frame_header(frame_type: FrameType, last_payload_len: u16) -> [u8; 8] {
let mut h = [0u8; 8];
@@ -132,55 +211,21 @@ fn short_frame_header(frame_type: FrameType, last_payload_len: u16) -> [u8; 8] {
h
}
/// Build one wire datagram: RTP(BE) + reserved + NV_VIDEO_PACKET(LE) + payload.
fn build_packet(
seq: u32,
timestamp_90k: u32,
frame_index: u32,
flags: u8,
multi_fec_blocks: u8,
fec_info: u32,
payload: &[u8],
) -> Vec<u8> {
let mut p = Vec::with_capacity(SHARD_HEADER + payload.len());
// --- RTP_PACKET (12 bytes, big-endian) ---
p.push(RTP_HEADER_BYTE); // header
p.push(0); // packetType (unused for video)
p.extend_from_slice(&(seq as u16).to_be_bytes()); // sequenceNumber
p.extend_from_slice(&timestamp_90k.to_be_bytes()); // timestamp (90 kHz)
p.extend_from_slice(&0u32.to_be_bytes()); // ssrc
// --- reserved[4] ---
p.extend_from_slice(&[0u8; 4]);
// --- NV_VIDEO_PACKET (16 bytes, little-endian) ---
p.extend_from_slice(&(seq << 8).to_le_bytes()); // streamPacketIndex (low byte 0)
p.extend_from_slice(&frame_index.to_le_bytes()); // frameIndex
p.push(flags);
p.push(0); // extraFlags
p.push(MULTI_FEC_FLAGS);
p.push(multi_fec_blocks);
p.extend_from_slice(&fec_info.to_le_bytes()); // fecInfo
// --- payload ---
p.extend_from_slice(payload);
p
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn single_block_layout() {
let mut pk = VideoPacketizer::new(1392); // payload_per_shard = 1392+16-32 = 1376
let mut pk = VideoPacketizer::new(1392, 0); // data-only; pps = 1392+16-32 = 1376
assert_eq!(pk.payload_per_shard, 1376);
let au = vec![0xABu8; 4000]; // 8+4000 = 4008 → ceil(4008/1376) = 3 data shards
let pkts = pk.packetize(&au, FrameType::Idr, 90_000);
assert_eq!(pkts.len(), 3);
// Every datagram is SHARD_HEADER + payload_per_shard.
for p in &pkts {
assert_eq!(p.len(), SHARD_HEADER + 1376);
assert_eq!(p[0], 0x90); // RTP header byte
}
// First packet: SOF set, fecIndex 0, frameIndex 0.
let first = &pkts[0];
assert_eq!(first[24] & FLAG_SOF, FLAG_SOF);
assert_eq!(first[24] & FLAG_PIC, FLAG_PIC);
@@ -189,12 +234,10 @@ mod tests {
let fec_info = u32::from_le_bytes(first[28..32].try_into().unwrap());
assert_eq!(fec_info >> 22, 3); // dataShards = 3
assert_eq!((fec_info >> 12) & 0x3ff, 0); // fecIndex 0
// Last packet: EOF set, fecIndex 2.
let last = &pkts[2];
assert_eq!(last[24] & FLAG_EOF, FLAG_EOF);
let fec_info_last = u32::from_le_bytes(last[28..32].try_into().unwrap());
assert_eq!((fec_info_last >> 12) & 0x3ff, 2);
// RTP sequence numbers are 0,1,2.
for (i, p) in pkts.iter().enumerate() {
assert_eq!(u16::from_be_bytes(p[2..4].try_into().unwrap()), i as u16);
}
@@ -202,15 +245,59 @@ mod tests {
#[test]
fn multi_block_split() {
let mut pk = VideoPacketizer::new(1392);
// Need > 255 data shards → multi-block. 255*1376 ≈ 351 KB; use 600 KB.
let mut pk = VideoPacketizer::new(1392, 0); // data-only
let au = vec![0u8; 600_000];
let pkts = pk.packetize(&au, FrameType::P, 0);
let total = (8 + au.len()).div_ceil(1376);
assert_eq!(pkts.len(), total);
// n_blocks = ceil(total/255), clamped to 4; check the multiFecBlocks lastBlock field (top two bits).
let n_blocks = total.div_ceil(255).clamp(1, 4);
let last_block = ((pkts.last().unwrap()[27]) >> 6) & 0x3;
assert_eq!(last_block as usize, n_blocks - 1);
}
#[test]
fn emits_parity_shards() {
let mut pk = VideoPacketizer::new(1392, 20); // pps = 1376, 20% FEC
let au = vec![0xABu8; 4000]; // 8+4000 = 4008 → 3 data shards (k=3)
let pkts = pk.packetize(&au, FrameType::Idr, 0);
// m = ceil(3*20/100) = 1 parity shard → 4 packets; wire_pct = 100*1/3 = 33.
assert_eq!(pkts.len(), 4);
for p in &pkts {
let fec_info = u32::from_le_bytes(p[28..32].try_into().unwrap());
assert_eq!(fec_info >> 22, 3); // dataShards = k = 3
assert_eq!((fec_info >> 4) & 0xff, 33); // wire fecPercentage
}
// The parity shard is last: fecIndex = k = 3.
let parity = &pkts[3];
let fec_info = u32::from_le_bytes(parity[28..32].try_into().unwrap());
assert_eq!((fec_info >> 12) & 0x3ff, 3);
// Data shards keep SOF (first) / EOF (last data shard) / PIC.
assert_eq!(pkts[0][24] & FLAG_SOF, FLAG_SOF);
assert_eq!(pkts[2][24] & FLAG_EOF, FLAG_EOF);
// RTP sequence numbers are contiguous across data + parity (0,1,2,3).
for (i, p) in pkts.iter().enumerate() {
assert_eq!(u16::from_be_bytes(p[2..4].try_into().unwrap()), i as u16);
}
}
/// End-to-end recovery: parity over the full datagram reconstructs a dropped data shard's
/// payload AND its NV `flags` byte (the byte Moonlight validates), proving the layout.
#[test]
fn parity_recovers_full_datagram_incl_flags() {
let mut pk = VideoPacketizer::new(1392, 50); // high pct → plenty of parity
let au = vec![0x5Au8; 4000]; // k = 3
let pkts = pk.packetize(&au, FrameType::Idr, 0);
let k = 3usize;
let m = pkts.len() - k;
assert!(m >= 1);
// Drop data shard 1; reconstruct from the rest via the same Cauchy coder.
let mut received: Vec<Option<Vec<u8>>> = pkts.iter().map(|p| Some(p.clone())).collect();
received[1] = None;
let recovered = Gf8Coder.reconstruct(k, m, &mut received).unwrap();
// The recovered shard equals the original data shard's RS-covered bytes: its flags
// byte (offset 24) is PIC (middle shard), proving the NV header recovers correctly.
assert_eq!(recovered[1][24], FLAG_PIC);
// ...and the payload region matches the original.
assert_eq!(recovered[1][SHARD_HEADER..], pkts[1][SHARD_HEADER..]);
}
}