feat(protocol): per-AU host-timing plane (0xCF) — split host+network latency (stats phase 2)

The unified-stats equation's host+network stage was one opaque number
because the wire carried nothing but pts_ns. Now the host reports its own
share per frame: when the client's Hello sets VIDEO_CAP_HOST_TIMING (0x08),
the send thread emits a 13-byte 0xCF datagram — [tag][pts_ns u64][host_us
u32] — right after the AU's last packet leaves the socket, so host_us =
capture→fully-sent (capture read/convert, encode, FEC+seal, paced send)
against the same anchor the wire pts carries. Clients correlate by pts_ns
and derive network = (received + clock_offset − pts) − host_us; the two
terms tile per frame by construction.
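
A minimal C sketch of the client side of the paragraph above: decoding the 13-byte 0xCF datagram and computing the network share. This is an illustration, not the code from this commit. The byte order is not stated in this message, so little-endian is assumed here; `host_timing_sample`, `parse_host_timing` and `network_us` are hypothetical names, and accepting datagrams longer than 13 bytes is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOST_TIMING_TAG 0xCF /* datagram tag from the commit message */
#define HOST_TIMING_LEN 13   /* 1 (tag) + 8 (pts_ns u64) + 4 (host_us u32) */

/* Hypothetical decoded form of one 0xCF datagram. */
typedef struct {
    uint64_t pts_ns;  /* capture stamp, matches the AU's wire pts */
    uint32_t host_us; /* host capture-to-fully-sent duration, in microseconds */
} host_timing_sample;

/* ASSUMPTION: fields are little-endian on the wire (not stated in the source).
 * Returns false for a truncated datagram or a different tag. */
static bool parse_host_timing(const uint8_t *buf, size_t len,
                              host_timing_sample *out)
{
    if (len < HOST_TIMING_LEN || buf[0] != HOST_TIMING_TAG)
        return false;

    uint64_t pts = 0;
    for (int i = 7; i >= 0; --i)      /* bytes 1..8: pts_ns */
        pts = (pts << 8) | buf[1 + i];

    uint32_t us = 0;
    for (int i = 3; i >= 0; --i)      /* bytes 9..12: host_us */
        us = (us << 8) | buf[9 + i];

    out->pts_ns = pts;
    out->host_us = us;
    return true;
}

/* network = (received + clock_offset - pts) - host_us, returned in microseconds.
 * received_ns is the client receive instant of the AU with this pts_ns;
 * clock_offset_ns maps the client clock onto the host capture clock. */
static int64_t network_us(int64_t received_ns, int64_t clock_offset_ns,
                          uint64_t pts_ns, uint32_t host_us)
{
    int64_t total_ns = received_ns + clock_offset_ns - (int64_t)pts_ns;
    return total_ns / 1000 - (int64_t)host_us;
}
```

Because the host share is subtracted from the same capture-to-received total, host_us and the returned network value add back up to that total for every matched frame.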

Back-compat is free in all four combinations: old clients ignore unknown
datagram tags, old hosts ignore unknown cap bits (client keeps the combined
stage). The hardened data-plane format is untouched — this rides the
established QUIC side-plane pattern (0xC8…0xCE). NativeClient ORs the bit
in unconditionally and exposes next_host_timing(); the C ABI gains
PunktfunkHostTiming + punktfunk_connection_next_host_timing (additive).
The synthetic host emits 0xCF too, so pure-loopback protocol tests cover
the plane.
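
The four old/new combinations above can be sketched as a small truth function. This is a hedged model of the negotiation, not the host's actual code: `host_emits_timing` and `client_sees_split` are hypothetical helpers, and the 32-bit width of the caps field is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIDEO_CAP_HOST_TIMING 0x08u /* Hello video_caps bit from this commit */

/* Host side: a new host emits 0xCF only when the client set the bit.
 * An old host (host_supports == false) ignores the unknown bit and never emits. */
static bool host_emits_timing(bool host_supports, uint32_t client_caps)
{
    return host_supports && (client_caps & VIDEO_CAP_HOST_TIMING) != 0;
}

/* Client side: a new client always ORs the bit in; an old client never does.
 * The host/network split is available only when both ends are new;
 * every other combination keeps the combined host+network stage. */
static bool client_sees_split(bool client_supports, bool host_supports)
{
    uint32_t caps = client_supports ? VIDEO_CAP_HOST_TIMING : 0u;
    return client_supports && host_emits_timing(host_supports, caps);
}
```

Only the new-client/new-host pairing returns true; the other three fall back to the combined stage without any error path.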

The probe reports the split (host_p50/p95_us · net_p50/p95_us) and is our
direct analogue of Sunshine's "host processing latency" — ours additionally
includes the paced send.

Validated on loopback (synthetic host + probe, debug build): 240/240 AUs
matched, host_p50 6.5 ms + net_p50 6.4 ms ≈ capture→received p50 13.0 ms.
Core suite + new 0xCF roundtrip/truncation test green; host+core+probe
clippy clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 21:22:12 +00:00
parent 09a5957c6d
commit 449a67ce8d
6 changed files with 314 additions and 4 deletions
@@ -254,6 +254,16 @@
#define VIDEO_CAP_444 4
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// [`Hello::video_caps`] bit: the client consumes per-AU host-timing datagrams
// ([`HOST_TIMING_MAGIC`], 0xCF) — the host's capture→send duration per frame, letting the client
// split its `host+network` latency stage into `host` and `network`
// (design/stats-unification.md Phase 2). The host emits 0xCF ONLY when this bit is set (an older
// host ignores it and simply never sends any); a client that doesn't set it keeps the combined
// stage. Purely observability — never changes what the host encodes.
#define VIDEO_CAP_HOST_TIMING 8
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// [`Hello::video_codecs`] bit: the client can decode H.264 / AVC. The GPU-less **software**
// encode path (openh264) emits H.264, so a client that wants to stream from a software host MUST
@@ -395,6 +405,13 @@
#define HDR_META_MAGIC 206
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// Per-AU host-timing datagram tag, host → client (see [`HostTiming`]). Next tag after
// [`HDR_META_MAGIC`]. Emitted once per access unit, right after its last packet left the host's
// socket, and only when the client advertised [`VIDEO_CAP_HOST_TIMING`].
#define HOST_TIMING_MAGIC 207
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// CICP colour-primaries code point: BT.709.
#define ColorInfo_CP_BT709 1
@@ -672,6 +689,21 @@ typedef struct {
} PunktfunkHdrMeta;
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// One access unit's host-side processing time ([`punktfunk_connection_next_host_timing`]):
// capture → fully sent, i.e. the whole host pipeline (capture read/convert, encode, FEC+seal,
// paced send). Correlate to the AU whose `PunktfunkFrame::pts_ns` equals `pts_ns`, then
// `network = (received_instant + clock_offset − pts_ns) − host_us`, which gives the unified stats HUD's
// `host` / `network` split (design/stats-unification.md Phase 2). Best-effort: a lost datagram
// means that frame simply contributes no sample.
typedef struct {
// The AU's capture stamp (host capture clock — matches `PunktfunkFrame::pts_ns` exactly).
uint64_t pts_ns;
// Host capture→sent duration, µs.
uint32_t host_us;
} PunktfunkHostTiming;
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// One rich client→host input for the host's virtual DualSense
// ([`punktfunk_connection_send_rich_input`]): a touchpad contact or a motion sample. Set `kind`
@@ -1189,6 +1221,22 @@ PunktfunkStatus punktfunk_connection_next_hdr_meta(PunktfunkConnection *c,
uint32_t timeout_ms);
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// Pull the next per-AU host timing (0xCF) into `*out`: the host's capture→sent duration for one
// access unit, correlated to the AU by `pts_ns` (see [`PunktfunkHostTiming`]).
// [`PunktfunkStatus::NoFrame`] on timeout, [`PunktfunkStatus::Closed`] once the session ended.
// A stats consumer drains this non-blockingly (`timeout_ms = 0`) alongside its frame samples;
// an older host never emits any — keep showing the combined `host+network` stage then. Same
// threading rules as [`punktfunk_connection_next_rumble`] (one puller, may run alongside the
// other planes).
//
// # Safety
// `c` is a valid connection handle; `out` is writable for one `PunktfunkHostTiming`.
PunktfunkStatus punktfunk_connection_next_host_timing(PunktfunkConnection *c,
PunktfunkHostTiming *out,
uint32_t timeout_ms);
#endif
#if defined(PUNKTFUNK_FEATURE_QUIC)
// Read the session's resolved colour signalling + encode bit depth (from the host's Welcome).
// Each out pointer is filled when non-NULL: `primaries`/`transfer`/`matrix` are CICP code points