fix(core/transport): treat ENOBUFS as a transient drop, not a fatal error

WiFi drivers (e.g. ath11k on the Steam Deck) return ENOBUFS, not EAGAIN/EWOULDBLOCK, when the tx queue is momentarily full. Rust maps ENOBUFS to ErrorKind::Uncategorized, so `is_transient_io` (which only matched WouldBlock/ConnRefused/ConnReset) treated it as a real error and tore the whole stream down on a single transient burst.

This presented as a vicious Heisenbug on the Deck: the native host streamed flawlessly on loopback and under a debugger (anything slow enough not to fill the small ~416 KB wlan0 buffer), but died at full rate cross-machine over WiFi. The failure was a flaky hang-or-SIGKILL because tx-queue-full is probabilistic. Diagnosed live via a forced core dump (gdb on the hung core): the data-plane thread had bailed on a fatal send error.

Treat ENOBUFS (and the asynchronous network-path blips ENETUNREACH / EHOSTUNREACH / ENETDOWN / EHOSTDOWN) as a lossy drop like WouldBlock; FEC + the next frame recover.

Validated: 6/6 back-to-back cross-machine streams over the Deck's WiFi, host stable, p50 ~4.4 ms (one run dropped 4/300 frames *gracefully*, 0 mismatched: the fix working as intended).

Also surface a data-plane bind/hole-punch failure directly in punktfunk1 (it was previously only reported after teardown, which a stall could swallow entirely).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
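The hunk shown on this page covers only the bind/hole-punch logging, not the classifier itself. As a rough illustration of the classification the message describes, a minimal sketch might look like the following. The function body is an assumption rather than the project's actual code, and the errno values are the generic Linux ones, hard-coded only to keep the sketch dependency-free (real code would more likely take them from the `libc` crate).

```rust
use std::io;

// Assumed generic Linux errno values (they differ on some architectures and OSes).
const ENETDOWN: i32 = 100;
const ENETUNREACH: i32 = 101;
const ENOBUFS: i32 = 105;
const EHOSTDOWN: i32 = 112;
const EHOSTUNREACH: i32 = 113;

/// Hypothetical reconstruction: true if a send error should count as a
/// dropped datagram rather than a reason to tear the stream down.
fn is_transient_io(e: &io::Error) -> bool {
    // Kinds the commit message says were already matched before the fix.
    if matches!(
        e.kind(),
        io::ErrorKind::WouldBlock
            | io::ErrorKind::ConnectionRefused
            | io::ErrorKind::ConnectionReset
    ) {
        return true;
    }
    // ENOBUFS has no dedicated ErrorKind, so it is matched on the raw errno,
    // together with the network-path blips listed in the message.
    matches!(
        e.raw_os_error(),
        Some(ENOBUFS | ENETDOWN | ENETUNREACH | EHOSTDOWN | EHOSTUNREACH)
    )
}
```

A send loop built on this would skip the datagram and continue when `is_transient_io` returns true, leaving recovery to FEC and the next frame, and would bail only on the remaining errors.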
@@ -837,12 +837,19 @@ async fn serve_session(
     // can be on different subnets; control + side planes ride the client-initiated QUIC, but
     // the raw video UDP needs the client to open the path first). Falls back to the
     // client-reported address for clients that don't punch (flat-LAN, unchanged).
-    let (transport, punched) = UdpTransport::connect_via_punch(
+    let (transport, punched) = match UdpTransport::connect_via_punch(
         &format!("0.0.0.0:{udp_port}"),
         &client_udp.to_string(),
         std::time::Duration::from_millis(2500),
-    )
-    .context("bind data plane")?;
+    ) {
+        Ok(v) => v,
+        Err(e) => {
+            // Surface the failure here directly: a data-plane bind error would otherwise be
+            // reported only after teardown (and a teardown stall could swallow it entirely).
+            tracing::error!(error = %e, %client_udp, udp_port, "data-plane socket bind/hole-punch failed");
+            return Err(anyhow::Error::new(e)).context("bind data plane");
+        }
+    };
     tracing::info!(
         %client_udp,
         punched,