feat(punktfunk/1): request-IDR recovery for a wedged client decode

Fixes the intermittent first-connect freeze. The host streams an infinite GOP:
one opening IDR, then P-frames only, with recovery keyframes sent only on loss.
So when the client's decoder wedges on the cold first session (a lost or corrupt
opening IDR, a bad early P-frame), the picture stays frozen until the next
keyframe, which may be far off. The client had no way to ask for one; now it
does.

Add a RequestKeyframe control message (client -> host, reliable control stream),
mirroring Reconfigure:
- core: quic.rs RequestKeyframe (type 0x03) + roundtrip test; client.rs
  CtrlRequest::Keyframe + NativeClient::request_keyframe; abi.rs
  punktfunk_connection_request_keyframe (header regenerated).
- host: m3.rs decodes it in the control loop and signals the encode loop, which
  coalesces a burst and calls enc.request_keyframe() — wiring the existing
  NvencEncoder hook (force_kf -> next frame pict_type=I), the same recovery the
  GameStream path already had via force_idr.
- apple: PunktfunkConnection.requestKeyframe(); StreamPump (stage-1) requests on
  layer.status==.failed; Stage2Pipeline (stage-2) on a sync submit failure and on
  the async decode-error callback via a thread-safe KeyframeRecovery. All
  requests are throttled to at most one per 250 ms (the decoder stays wedged
  for several frames until the IDR lands, so per-frame requests would flood the
  control stream).
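
The client-side throttle lives in the Swift code, which is not part of this
excerpt. As a rough illustration of the "at most one request per 250 ms" rule,
here is a minimal sketch in Rust. `KeyframeThrottle` and `should_request` are
hypothetical names invented for this example; they are not the project's API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle (not the project's type): permits at most one
/// keyframe request per `min_interval`.
struct KeyframeThrottle {
    min_interval: Duration,
    last_sent: Option<Instant>,
}

impl KeyframeThrottle {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_sent: None }
    }

    /// Returns true if a request may be sent at `now` and records that time.
    /// Returns false while the previous request is still inside the window.
    fn should_request(&mut self, now: Instant) -> bool {
        match self.last_sent {
            Some(prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps
the throttle deterministic to test. A decoder that stays wedged simply calls
`should_request` again on each failure, so a lost recovery IDR is re-requested
once the window has elapsed.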

Self-healing: a lost recovery IDR is re-requested after the throttle; the host
coalesces bursts into a single IDR.
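
The host-side coalescing can be shown in isolation. The sketch below uses the
same `std::sync::mpsc` drain pattern as the encode loop in this commit, but the
helper name `drain_keyframe_requests` is an assumption made for illustration;
the actual change inlines the loop in `virtual_stream`.

```rust
use std::sync::mpsc::Receiver;

/// Drains every queued keyframe request without blocking and reports whether
/// at least one was pending. A burst of N requests therefore yields a single
/// `true`, which the caller turns into one forced IDR.
fn drain_keyframe_requests(rx: &Receiver<()>) -> bool {
    let mut want_kf = false;
    // try_recv() returns Err both when the queue is empty and when the sender
    // is gone; either way there is nothing more to coalesce.
    while rx.try_recv().is_ok() {
        want_kf = true;
    }
    want_kf
}
```

Because the drain runs once per encode-loop iteration, requests that arrive
while a frame is being encoded are folded into the next iteration instead of
each forcing its own keyframe.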

Validated: cargo fmt + clippy clean; core + host test suites green (incl. new
request_keyframe_roundtrip); swift build + test (39 passed); xcframework rebuilt
(all 5 slices), header regenerated with no unrelated drift.
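
The request_keyframe_roundtrip test itself is not shown in this excerpt. The
sketch below illustrates what such a roundtrip typically checks. Only the type
value 0x03 comes from this commit; the wire layout (a single type byte with no
payload), the `DecodeError` enum, and the method bodies are assumptions for
illustration and may differ from quic.rs.

```rust
/// Type tag stated in the commit message.
const TYPE_REQUEST_KEYFRAME: u8 = 0x03;

/// Payload-free control message (layout assumed, see lead-in).
#[derive(Debug, PartialEq, Eq)]
struct RequestKeyframe;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    Empty,
    WrongType(u8),
    TrailingBytes(usize),
}

impl RequestKeyframe {
    /// Encodes the message as its type byte only.
    fn encode(&self) -> Vec<u8> {
        vec![TYPE_REQUEST_KEYFRAME]
    }

    /// Accepts exactly one byte equal to the type tag; rejects anything else
    /// so that other message types fall through to their own decoders.
    fn decode(msg: &[u8]) -> Result<Self, DecodeError> {
        match msg {
            [] => Err(DecodeError::Empty),
            [TYPE_REQUEST_KEYFRAME] => Ok(RequestKeyframe),
            [TYPE_REQUEST_KEYFRAME, rest @ ..] => Err(DecodeError::TrailingBytes(rest.len())),
            [other, ..] => Err(DecodeError::WrongType(*other)),
        }
    }
}
```

Strict rejection matters because the host's control loop tries each decoder in
turn; a decoder that accepted foreign bytes would shadow the message types
checked after it.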

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 00:48:18 +02:00
parent 71d6b64f81
commit c56b1b455a
8 changed files with 189 additions and 9 deletions
@@ -28,7 +28,7 @@ use punktfunk_core::input::{InputEvent, InputKind};
use punktfunk_core::packet::{FLAG_PIC, FLAG_PROBE, FLAG_SOF};
use punktfunk_core::quic::{
endpoint, io, ClockEcho, ClockProbe, Hello, PairChallenge, PairProof, PairRequest, PairResult,
ProbeRequest, ProbeResult, Reconfigure, Reconfigured, Start, Welcome,
ProbeRequest, ProbeResult, Reconfigure, Reconfigured, RequestKeyframe, Start, Welcome,
};
use punktfunk_core::transport::UdpTransport;
use punktfunk_core::Session;
@@ -578,6 +578,7 @@ async fn serve_session(
// hands back a ProbeResult that this task writes to the client. The two control directions
// (inbound requests, outbound probe results) are multiplexed with `select!`.
let (reconfig_tx, reconfig_rx) = std::sync::mpsc::channel::<punktfunk_core::Mode>();
let (keyframe_tx, keyframe_rx) = std::sync::mpsc::channel::<()>();
let (probe_tx, probe_rx) = std::sync::mpsc::channel::<ProbeRequest>();
let (probe_result_tx, mut probe_result_rx) =
tokio::sync::mpsc::unbounded_channel::<ProbeResult>();
@@ -608,6 +609,14 @@ async fn serve_session(
if ok && reconfig_tx.send(req.mode).is_err() {
break; // data plane gone
}
} else if RequestKeyframe::decode(&msg).is_ok() {
// Client recovery: its decoder wedged — force the next encoded frame to
// be an IDR. Coalesced in the encode loop (a wedge fires several before
// the IDR lands); a send error just means the data plane is gone.
tracing::debug!("client requested keyframe (decode recovery)");
if keyframe_tx.send(()).is_err() {
break; // data plane gone
}
} else if let Ok(req) = ProbeRequest::decode(&msg) {
tracing::info!(
target_kbps = req.target_kbps,
@@ -782,6 +791,7 @@ async fn serve_session(
seconds,
stop_stream,
&reconfig_rx,
&keyframe_rx,
compositor,
bitrate_kbps,
probe_rx,
@@ -1688,6 +1698,7 @@ fn virtual_stream(
seconds: u32,
stop: Arc<AtomicBool>,
reconfig: &std::sync::mpsc::Receiver<punktfunk_core::Mode>,
keyframe: &std::sync::mpsc::Receiver<()>,
compositor: crate::vdisplay::Compositor,
bitrate_kbps: u32,
probe_rx: std::sync::mpsc::Receiver<ProbeRequest>,
@@ -1762,6 +1773,18 @@ fn virtual_stream(
}
}
}
// Client recovery: it asked for a fresh IDR (its decoder wedged on the cold opening
// GOP). Coalesce the backlog — several requests fire before the IDR lands — and force
// the next encoded frame to be a keyframe. (A reconfig rebuild above already opens with
// an IDR, so this is for the steady-state wedge, not mode switches.)
let mut want_kf = false;
while keyframe.try_recv().is_ok() {
want_kf = true;
}
if want_kf {
tracing::debug!("forcing keyframe (client decode recovery)");
enc.request_keyframe();
}
if let Some(f) = capturer.try_latest().context("capture")? {
frame = f;
}