feat(punktfunk/1): request-IDR recovery for a wedged client decode
apple / swift (push) Successful in 1m17s
ci / rust (push) Failing after 31s
ci / web (push) Failing after 42s
ci / docs-site (push) Failing after 40s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 10s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 6s
docker / deploy-docs (push) Has been skipped
rpm / build-publish (push) Failing after 15s
deb / build-publish (push) Failing after 43s
Fixes the intermittent first-connect freeze. The host streams an infinite GOP: one opening IDR, then P-frames only, with recovery keyframes sent only on loss. So when the client's decoder wedges on the cold first session (a lost or corrupt opening IDR, a bad early P-frame), the picture stays frozen until the next keyframe, which may be far off. The client had no way to ask for one; now it does.

Add a RequestKeyframe control message (client -> host, on the reliable control stream), mirroring Reconfigure:

- core: quic.rs RequestKeyframe (type 0x03) + roundtrip test; client.rs CtrlRequest::Keyframe + NativeClient::request_keyframe; abi.rs punktfunk_connection_request_keyframe (header regenerated).
- host: m3.rs decodes it in the control loop and signals the encode loop, which coalesces a burst and calls enc.request_keyframe(). This wires up the existing NvencEncoder hook (force_kf -> next frame pict_type=I), the same recovery the GameStream path already had via force_idr.
- apple: PunktfunkConnection.requestKeyframe(); StreamPump (stage-1) requests on layer.status == .failed; Stage2Pipeline (stage-2) requests on a synchronous submit failure and on the async decode-error callback, via a thread-safe KeyframeRecovery.

All requests are throttled to at most one per 250 ms: the decode stays wedged for several frames until the IDR lands, so per-frame requests would flood the control stream. Self-healing: a lost recovery IDR is re-requested once the throttle window has passed, and the host coalesces bursts into a single IDR.

Validated: cargo fmt + clippy clean; core + host test suites green (incl. the new request_keyframe_roundtrip); swift build + test (39 passed); xcframework rebuilt (all 5 slices), header regenerated with no unrelated drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
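The client-side throttle policy described above (at most one request per 250 ms, with an automatic re-request if the recovery IDR is lost) can be sketched as a small state machine. The commit implements it in Swift (`KeyframeRecovery`), whose code is not part of this page; the Rust type `KeyframeThrottle` below is hypothetical and only illustrates the policy. It takes the current `Instant` as a parameter so the timing can be tested deterministically.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle for client-side keyframe requests (not the commit's
/// Swift `KeyframeRecovery`): allows at most one request per `interval`.
pub struct KeyframeThrottle {
    interval: Duration,
    last_sent: Option<Instant>,
}

impl KeyframeThrottle {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_sent: None }
    }

    /// Call on every decode failure. Returns `true` when a request should be
    /// sent now; failures inside the window are swallowed, because the decoder
    /// stays wedged for several frames while the IDR is in flight. If that IDR
    /// is lost, failures continue, and the first one after the window elapses
    /// returns `true` again: the self-healing re-request.
    pub fn should_request(&mut self, now: Instant) -> bool {
        match self.last_sent {
            Some(prev) if now.saturating_duration_since(prev) < self.interval => false,
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}
```

A caller would pass `Instant::now()` from its decode-error path and call `request_keyframe()` only when `should_request` returns `true`. Injecting `now` instead of reading the clock internally keeps the policy free of timing flakiness in tests.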
@@ -1383,6 +1383,32 @@ pub unsafe extern "C" fn punktfunk_connection_request_mode(
     })
 }

+/// Ask the host's encoder to emit a fresh IDR keyframe now — client recovery when the
+/// decoder has stalled (the infinite-GOP stream sends one opening IDR then P-frames only, so
+/// a wedged decoder would otherwise freeze until the next loss-triggered recovery keyframe).
+/// Non-blocking, fire-and-forget; the recovered keyframe is the only ack. The caller should
+/// THROTTLE — the decode stays wedged for several frames until the IDR lands, so requesting
+/// every frame would flood the control stream.
+///
+/// # Safety
+/// `c` is a valid connection handle.
+#[cfg(feature = "quic")]
+#[no_mangle]
+pub unsafe extern "C" fn punktfunk_connection_request_keyframe(
+    c: *const PunktfunkConnection,
+) -> PunktfunkStatus {
+    guard(|| {
+        let c = match unsafe { c.as_ref() } {
+            Some(c) => c,
+            None => return PunktfunkStatus::NullPointer,
+        };
+        match c.inner.request_keyframe() {
+            Ok(()) => PunktfunkStatus::Ok,
+            Err(e) => e.status(),
+        }
+    })
+}
+
 /// A speed-test measurement, filled by [`punktfunk_connection_probe_result`]. `done` is 0 until
 /// the host's end-of-burst report lands, then 1 (the numbers are final). `throughput_kbps` is the
 /// measured goodput to drive a bitrate choice from; `loss_pct` is the delivery loss at that rate.
@@ -17,7 +17,7 @@ use crate::input::InputEvent;
 use crate::packet::FLAG_PROBE;
 use crate::quic::{
     endpoint, io, Hello, HidOutput, ProbeRequest, ProbeResult, Reconfigure, Reconfigured,
-    RichInput, Start, Welcome,
+    RequestKeyframe, RichInput, Start, Welcome,
 };
 use crate::session::{Frame, Session};
 use crate::transport::UdpTransport;
@@ -32,6 +32,7 @@ use std::time::{Duration, Instant};
 enum CtrlRequest {
     Mode(Mode),
     Probe(ProbeRequest),
+    Keyframe,
 }

 /// What the worker reports to [`NativeClient::connect`] once the handshake lands: the negotiated
@@ -365,6 +366,16 @@ impl NativeClient {
             .map_err(|_| PunktfunkError::Closed)
     }

+    /// Ask the host's encoder to emit a fresh IDR keyframe now (client recovery on a stalled
+    /// decode). Non-blocking, fire-and-forget — the recovered keyframe is the only ack. The
+    /// caller should throttle (the decode stays wedged across several frames until the IDR
+    /// lands, so requesting on every frame would flood the control stream).
+    pub fn request_keyframe(&self) -> Result<()> {
+        self.ctrl_tx
+            .send(CtrlRequest::Keyframe)
+            .map_err(|_| PunktfunkError::Closed)
+    }
+
     /// Start a bandwidth speed test: ask the host to burst filler over the data plane at
     /// `target_kbps` of goodput for `duration_ms`, *briefly pausing video*. Non-blocking — the
     /// measurement accumulates in the background; poll [`NativeClient::probe_result`] until its
@@ -716,6 +727,7 @@ async fn worker_main(args: WorkerArgs) {
                 let bytes = match req {
                     CtrlRequest::Mode(m) => Reconfigure { mode: m }.encode(),
                     CtrlRequest::Probe(p) => p.encode(),
+                    CtrlRequest::Keyframe => RequestKeyframe.encode(),
                 };
                 if io::write_msg(&mut ctrl_send, &bytes).await.is_err() {
                     break;
|
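`NativeClient::request_keyframe` only enqueues a `CtrlRequest::Keyframe` for the worker task, which serializes it onto the control stream; the caller never waits on the network. A minimal sketch of that hand-off, assuming a plain `std::sync::mpsc` channel (the diff does not show which channel type `ctrl_tx` is) and using simplified, hypothetical `Client`, `CtrlRequest`, and `ClientError` types:

```rust
use std::sync::mpsc;

/// Simplified, hypothetical mirror of the diff's `CtrlRequest` (payloads reduced).
#[derive(Debug, PartialEq)]
enum CtrlRequest {
    Mode { width: u32, height: u32 },
    Keyframe,
}

/// Hypothetical error type standing in for `PunktfunkError::Closed`.
#[derive(Debug, PartialEq)]
enum ClientError {
    Closed,
}

struct Client {
    ctrl_tx: mpsc::Sender<CtrlRequest>,
}

impl Client {
    /// Fire-and-forget: queue the request and return immediately. The only
    /// failure is that the worker (the receiving end) has already gone away.
    fn request_keyframe(&self) -> Result<(), ClientError> {
        self.ctrl_tx
            .send(CtrlRequest::Keyframe)
            .map_err(|_| ClientError::Closed)
    }
}

/// Worker side: drain queued requests and map each one to a wire message.
/// Only the type byte is emitted here; the real `encode()` also writes the
/// control magic (and, for Reconfigure, the mode payload).
fn drain_and_encode(rx: &mpsc::Receiver<CtrlRequest>) -> Vec<Vec<u8>> {
    rx.try_iter()
        .map(|req| match req {
            CtrlRequest::Mode { .. } => vec![0x01u8],
            CtrlRequest::Keyframe => vec![0x03u8],
        })
        .collect()
}
```

Funneling every outbound control message through one enum and one worker keeps writes to the reliable stream serialized, which is why adding a message type only needs a new variant and one more `match` arm.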
@@ -126,6 +126,16 @@ pub struct Reconfigured {
     pub mode: Mode,
 }

+/// `client → host`, any time after [`Start`]: ask the host's encoder to emit a fresh IDR
+/// keyframe NOW. The infinite-GOP stream opens with one IDR then sends P-frames only, so a
+/// decoder that wedges (a lost/corrupt opening IDR, a bad early P-frame — most likely on the
+/// cold first session) would otherwise stay frozen until the next loss-triggered recovery
+/// keyframe, which may be far off. The client sends this when it detects a stalled decode;
+/// the host forces the next frame to be an IDR with in-band parameter sets, recovering the
+/// picture in ~one frame. Fire-and-forget — no reply (the recovered IDR is the ack).
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub struct RequestKeyframe;
+
 /// `client → host`, any time after [`Start`]: run a bandwidth speed test. The host bursts
 /// filler access units (flagged [`crate::packet::FLAG_PROBE`]) over the data plane at
 /// `target_kbps` of application goodput for `duration_ms`, *pausing video for the duration*, then
@@ -195,6 +205,8 @@ pub fn clock_offset_ns(samples: &[(u64, u64, u64, u64)]) -> Option<(i64, u64)> {
 pub const MSG_RECONFIGURE: u8 = 0x01;
 /// Type byte of [`Reconfigured`].
 pub const MSG_RECONFIGURED: u8 = 0x02;
+/// Type byte of [`RequestKeyframe`].
+pub const MSG_REQUEST_KEYFRAME: u8 = 0x03;
 /// Type byte of [`ProbeRequest`].
 pub const MSG_PROBE_REQUEST: u8 = 0x20;
 /// Type byte of [`ProbeResult`].
@@ -699,6 +711,23 @@ impl Reconfigured {
     }
 }

+impl RequestKeyframe {
+    pub fn encode(&self) -> Vec<u8> {
+        // magic[0..4] type[4] — no payload
+        let mut b = Vec::with_capacity(5);
+        b.extend_from_slice(CTL_MAGIC);
+        b.push(MSG_REQUEST_KEYFRAME);
+        b
+    }
+
+    pub fn decode(b: &[u8]) -> Result<RequestKeyframe> {
+        if b.len() != 5 || &b[0..4] != CTL_MAGIC || b[4] != MSG_REQUEST_KEYFRAME {
+            return Err(PunktfunkError::InvalidArg("bad RequestKeyframe"));
+        }
+        Ok(RequestKeyframe)
+    }
+}
+
 impl ProbeRequest {
     pub fn encode(&self) -> Vec<u8> {
         // magic[0..4] type[4] target_kbps[5..9] duration_ms[9..13]
@@ -1660,6 +1689,22 @@ mod tests {
             .is_err());
     }

+    #[test]
+    fn request_keyframe_roundtrip() {
+        let bytes = RequestKeyframe.encode();
+        assert!(RequestKeyframe::decode(&bytes).is_ok());
+        // Distinct from the other control messages — its type byte must not collide.
+        let mode = Mode {
+            width: 1280,
+            height: 720,
+            refresh_hz: 60,
+        };
+        assert!(RequestKeyframe::decode(&Reconfigure { mode }.encode()).is_err());
+        assert!(Reconfigure::decode(&bytes).is_err());
+        // Length is exact (no trailing bytes accepted).
+        assert!(RequestKeyframe::decode(&[bytes.as_slice(), &[0]].concat()).is_err());
+    }
+
     #[test]
     fn probe_messages_roundtrip() {
         let req = ProbeRequest {
|
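The `RequestKeyframe` frame is five bytes: the 4-byte control magic followed by the type byte `0x03`, with no payload, and `decode` rejects anything longer. A standalone sketch of that layout plus a type-byte classifier follows. The magic value `b"PFCT"` and the `Inbound`/`classify` names are hypothetical (the real `CTL_MAGIC` value is not shown in this diff), and the host in this commit does not dispatch this way: it tries each message's `decode` in an `if`/`else if` chain.

```rust
/// Hypothetical 4-byte magic; the real `CTL_MAGIC` value is not shown in this diff.
const CTL_MAGIC: &[u8; 4] = b"PFCT";
// Type bytes as defined in quic.rs.
const MSG_RECONFIGURE: u8 = 0x01;
const MSG_RECONFIGURED: u8 = 0x02;
const MSG_REQUEST_KEYFRAME: u8 = 0x03;
const MSG_PROBE_REQUEST: u8 = 0x20;

#[derive(Debug, PartialEq)]
enum Inbound {
    Reconfigure,
    Reconfigured,
    RequestKeyframe,
    ProbeRequest,
    Unknown(u8),
}

/// magic[0..4] type[4], no payload: mirrors `RequestKeyframe::encode`.
fn encode_request_keyframe() -> Vec<u8> {
    let mut b = Vec::with_capacity(5);
    b.extend_from_slice(CTL_MAGIC);
    b.push(MSG_REQUEST_KEYFRAME);
    b
}

/// Classify a control frame by the type byte at offset 4; `None` if the magic is wrong.
/// Like `RequestKeyframe::decode`, a 0x03 frame must be exactly 5 bytes; a longer one
/// is reported as `Unknown(0x03)` here. Other types' payloads are not validated.
fn classify(msg: &[u8]) -> Option<Inbound> {
    if msg.len() < 5 || &msg[0..4] != CTL_MAGIC {
        return None;
    }
    Some(match msg[4] {
        MSG_RECONFIGURE => Inbound::Reconfigure,
        MSG_RECONFIGURED => Inbound::Reconfigured,
        MSG_REQUEST_KEYFRAME if msg.len() == 5 => Inbound::RequestKeyframe,
        MSG_PROBE_REQUEST => Inbound::ProbeRequest,
        t => Inbound::Unknown(t),
    })
}
```

Dispatching on one byte makes the type space explicit. The decode chain in the diff reaches the same separation because each `decode` rejects a foreign type byte, which the new `request_keyframe_roundtrip` test asserts for `RequestKeyframe` and `Reconfigure`.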
@@ -28,7 +28,7 @@ use punktfunk_core::input::{InputEvent, InputKind};
 use punktfunk_core::packet::{FLAG_PIC, FLAG_PROBE, FLAG_SOF};
 use punktfunk_core::quic::{
     endpoint, io, ClockEcho, ClockProbe, Hello, PairChallenge, PairProof, PairRequest, PairResult,
-    ProbeRequest, ProbeResult, Reconfigure, Reconfigured, Start, Welcome,
+    ProbeRequest, ProbeResult, Reconfigure, Reconfigured, RequestKeyframe, Start, Welcome,
 };
 use punktfunk_core::transport::UdpTransport;
 use punktfunk_core::Session;
@@ -578,6 +578,7 @@ async fn serve_session(
     // hands back a ProbeResult that this task writes to the client. The two control directions
     // (inbound requests, outbound probe results) are multiplexed with `select!`.
     let (reconfig_tx, reconfig_rx) = std::sync::mpsc::channel::<punktfunk_core::Mode>();
+    let (keyframe_tx, keyframe_rx) = std::sync::mpsc::channel::<()>();
     let (probe_tx, probe_rx) = std::sync::mpsc::channel::<ProbeRequest>();
     let (probe_result_tx, mut probe_result_rx) =
         tokio::sync::mpsc::unbounded_channel::<ProbeResult>();
@@ -608,6 +609,14 @@ async fn serve_session(
                         if ok && reconfig_tx.send(req.mode).is_err() {
                             break; // data plane gone
                         }
+                    } else if RequestKeyframe::decode(&msg).is_ok() {
+                        // Client recovery: its decoder wedged — force the next encoded frame to
+                        // be an IDR. Coalesced in the encode loop (a wedge fires several before
+                        // the IDR lands); a send error just means the data plane is gone.
+                        tracing::debug!("client requested keyframe (decode recovery)");
+                        if keyframe_tx.send(()).is_err() {
+                            break; // data plane gone
+                        }
                     } else if let Ok(req) = ProbeRequest::decode(&msg) {
                         tracing::info!(
                             target_kbps = req.target_kbps,
@@ -782,6 +791,7 @@ async fn serve_session(
                 seconds,
                 stop_stream,
                 &reconfig_rx,
+                &keyframe_rx,
                 compositor,
                 bitrate_kbps,
                 probe_rx,
@@ -1688,6 +1698,7 @@ fn virtual_stream(
     seconds: u32,
     stop: Arc<AtomicBool>,
     reconfig: &std::sync::mpsc::Receiver<punktfunk_core::Mode>,
+    keyframe: &std::sync::mpsc::Receiver<()>,
     compositor: crate::vdisplay::Compositor,
     bitrate_kbps: u32,
     probe_rx: std::sync::mpsc::Receiver<ProbeRequest>,
@@ -1762,6 +1773,18 @@ fn virtual_stream(
                 }
             }
         }
+        // Client recovery: it asked for a fresh IDR (its decoder wedged on the cold opening
+        // GOP). Coalesce the backlog — several requests fire before the IDR lands — and force
+        // the next encoded frame to be a keyframe. (A reconfig rebuild above already opens with
+        // an IDR, so this is for the steady-state wedge, not mode switches.)
+        let mut want_kf = false;
+        while keyframe.try_recv().is_ok() {
+            want_kf = true;
+        }
+        if want_kf {
+            tracing::debug!("forcing keyframe (client decode recovery)");
+            enc.request_keyframe();
+        }
         if let Some(f) = capturer.try_latest().context("capture")? {
             frame = f;
         }
|
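The encode-loop coalescing above drains every pending request without blocking and forces at most one IDR per loop iteration. A minimal, testable sketch of that step follows; `MockEncoder` is a hypothetical stand-in for the NVENC encoder that only counts forced keyframes, and `service_keyframe_requests` is a hypothetical helper wrapping the loop from the diff.

```rust
use std::sync::mpsc::{self, Receiver};

/// Hypothetical stand-in for the host encoder: it only counts how many
/// keyframes were forced instead of driving NVENC.
struct MockEncoder {
    forced: u32,
}

impl MockEncoder {
    fn request_keyframe(&mut self) {
        self.forced += 1;
    }
}

/// Hypothetical helper wrapping the loop from the diff: drain every pending
/// request without blocking, then force at most one keyframe. Returns whether
/// a keyframe was forced. A disconnected sender reads as "no request".
fn service_keyframe_requests(rx: &Receiver<()>, enc: &mut MockEncoder) -> bool {
    let mut want_kf = false;
    while rx.try_recv().is_ok() {
        want_kf = true;
    }
    if want_kf {
        enc.request_keyframe();
    }
    want_kf
}
```

Because `try_recv` returns immediately when the queue is empty (or the sender is gone), the check costs almost nothing on frames where no request arrived, and a burst of requests that piles up between two frames still produces a single IDR.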