feat(punktfunk/1): request-IDR recovery for a wedged client decode

Fixes the intermittent first-connect freeze. The host streams an infinite GOP: one
opening IDR, then P-frames only (recovery keyframes only on loss). When the
client's decoder wedges on the cold first session (a lost or corrupt opening IDR, a
bad early P-frame), the picture stays frozen until the next keyframe, which may be
far off. The client had no way to ask for one; now it does.

Add a RequestKeyframe control message (client -> host, reliable control stream),
mirroring Reconfigure:
- core: quic.rs RequestKeyframe (type 0x03) + roundtrip test; client.rs
  CtrlRequest::Keyframe + NativeClient::request_keyframe; abi.rs
  punktfunk_connection_request_keyframe (header regenerated).
- host: m3.rs decodes it in the control loop and signals the encode loop, which
  coalesces a burst and calls enc.request_keyframe() — wiring the existing
  NvencEncoder hook (force_kf -> next frame pict_type=I), the same recovery the
  GameStream path already had via force_idr.
- apple: PunktfunkConnection.requestKeyframe(); StreamPump (stage-1) requests on
  layer.status==.failed; Stage2Pipeline (stage-2) requests on a sync submit failure
  and on the async decode-error callback via a thread-safe KeyframeRecovery. All
  requests are throttled to at most one per 250 ms (the decoder stays wedged for
  several frames until the IDR lands, so per-frame requests would flood the control
  stream).
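
A minimal sketch of the "at most one request per 250 ms" throttle described above,
written in Rust for illustration. KeyframeThrottle and try_acquire are hypothetical
names, not the KeyframeRecovery type from this commit (which lives on the Swift
side); the caller is assumed to pass in the current time.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not from this commit): allows at most one keyframe
/// request per `min_interval`.
struct KeyframeThrottle {
    min_interval: Duration,
    last_sent: Option<Instant>,
}

impl KeyframeThrottle {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_sent: None }
    }

    /// Returns true if a request may be sent at `now` and records the send time;
    /// returns false while the previous request is still inside the interval.
    fn try_acquire(&mut self, now: Instant) -> bool {
        match self.last_sent {
            Some(prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}
```

A decode-error handler would call try_acquire(Instant::now()) and send the request
only on true. Because the decoder keeps failing while wedged, the next failure after
the interval elapses re-requests a lost recovery IDR.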

Self-healing: a lost recovery IDR is re-requested after the throttle; the host
coalesces bursts into a single IDR.
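
One way the host-side coalescing could look, as a sketch only: the commit message
does not say how m3.rs signals the encode loop, so the shared atomic flag and the
names KeyframeSignal, request and take are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical flag shared between the control loop and the encode loop.
struct KeyframeSignal {
    pending: AtomicBool,
}

impl KeyframeSignal {
    const fn new() -> Self {
        Self { pending: AtomicBool::new(false) }
    }

    /// Control loop: called once per decoded RequestKeyframe message.
    /// Repeated calls before the next `take` collapse into one pending request.
    fn request(&self) {
        self.pending.store(true, Ordering::Release);
    }

    /// Encode loop: called once per frame. Returns true at most once per burst,
    /// clearing the flag so only a single IDR is forced.
    fn take(&self) -> bool {
        self.pending.swap(false, Ordering::AcqRel)
    }
}
```

In this sketch the encode loop would call enc.request_keyframe() whenever take()
returns true, so any number of requests arriving between two frames yields one IDR.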

Validated: cargo fmt + clippy clean; core + host test suites green (incl. new
request_keyframe_roundtrip); swift build + test (39 passed); xcframework rebuilt
(all 5 slices), header regenerated with no unrelated drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 00:48:18 +02:00
parent 71d6b64f81
commit c56b1b455a
8 changed files with 189 additions and 9 deletions
@@ -126,6 +126,16 @@ pub struct Reconfigured {
pub mode: Mode,
}
/// `client → host`, any time after [`Start`]: ask the host's encoder to emit a fresh IDR
/// keyframe NOW. The infinite-GOP stream opens with one IDR then sends P-frames only, so a
/// decoder that wedges (a lost/corrupt opening IDR, a bad early P-frame — most likely on the
/// cold first session) would otherwise stay frozen until the next loss-triggered recovery
/// keyframe, which may be far off. The client sends this when it detects a stalled decode;
/// the host forces the next frame to be an IDR with in-band parameter sets, recovering the
/// picture in ~one frame. Fire-and-forget — no reply (the recovered IDR is the ack).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RequestKeyframe;
/// `client → host`, any time after [`Start`]: run a bandwidth speed test. The host bursts
/// filler access units (flagged [`crate::packet::FLAG_PROBE`]) over the data plane at
/// `target_kbps` of application goodput for `duration_ms`, *pausing video for the duration*, then
@@ -195,6 +205,8 @@ pub fn clock_offset_ns(samples: &[(u64, u64, u64, u64)]) -> Option<(i64, u64)> {
pub const MSG_RECONFIGURE: u8 = 0x01;
/// Type byte of [`Reconfigured`].
pub const MSG_RECONFIGURED: u8 = 0x02;
/// Type byte of [`RequestKeyframe`].
pub const MSG_REQUEST_KEYFRAME: u8 = 0x03;
/// Type byte of [`ProbeRequest`].
pub const MSG_PROBE_REQUEST: u8 = 0x20;
/// Type byte of [`ProbeResult`].
@@ -699,6 +711,23 @@ impl Reconfigured {
}
}
impl RequestKeyframe {
    pub fn encode(&self) -> Vec<u8> {
        // magic[0..4] type[4] — no payload
        let mut b = Vec::with_capacity(5);
        b.extend_from_slice(CTL_MAGIC);
        b.push(MSG_REQUEST_KEYFRAME);
        b
    }
    pub fn decode(b: &[u8]) -> Result<RequestKeyframe> {
        if b.len() != 5 || &b[0..4] != CTL_MAGIC || b[4] != MSG_REQUEST_KEYFRAME {
            return Err(PunktfunkError::InvalidArg("bad RequestKeyframe"));
        }
        Ok(RequestKeyframe)
    }
}
impl ProbeRequest {
pub fn encode(&self) -> Vec<u8> {
// magic[0..4] type[4] target_kbps[5..9] duration_ms[9..13]
@@ -1660,6 +1689,22 @@ mod tests {
.is_err());
}
#[test]
fn request_keyframe_roundtrip() {
    let bytes = RequestKeyframe.encode();
    assert!(RequestKeyframe::decode(&bytes).is_ok());
    // Distinct from the other control messages — its type byte must not collide.
    let mode = Mode {
        width: 1280,
        height: 720,
        refresh_hz: 60,
    };
    assert!(RequestKeyframe::decode(&Reconfigure { mode }.encode()).is_err());
    assert!(Reconfigure::decode(&bytes).is_err());
    // Length is exact (no trailing bytes accepted).
    assert!(RequestKeyframe::decode(&[bytes.as_slice(), &[0]].concat()).is_err());
}
#[test]
fn probe_messages_roundtrip() {
let req = ProbeRequest {