feat(discovery): native-protocol LAN auto-discovery over mDNS

Both the unified host (serve --native) and standalone m3-host now advertise the
native punktfunk/1 service over mDNS (_punktfunk._udp) — the analogue of the
GameStream _nvstream._tcp advert. TXT records carry proto, the host cert
fingerprint (fp, the value clients pin), the pairing requirement
(pair=required|optional), and the host id. New crate::discovery module, wired
into m3::serve so both host entry points get it; best-effort, never blocks
streaming (--connect always works).

Client gains `punktfunk-client-rs --discover [SECS]`: browses the LAN and prints
each host (name, addr:port, pairing, fingerprint), then exits. Apple clients
browse the same service natively via NWBrowser (service type + TXT keys are the
contract).

Validated cross-LAN: the dev box discovered the GNOME-box appliance
(pair=required) and a standalone synthetic host (pair=optional); fingerprint and
pairing state correct in both.

Also refresh the now-stale sendmmsg caveat in the bitrate doc (batched/paced send
landed + validated to 1 Gbps) and mark the encode|send thread split done in §12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 10:37:12 +00:00
parent a9e974d50d
commit 4fff4641bb
8 changed files with 221 additions and 13 deletions
+32 -7
@@ -289,17 +289,42 @@ buffer; `sendmmsg`/`recvmmsg` batching; the capture-timestamp anchor placement.
the ~150 Mbps@60 frame size where drops began). Plus **per-frame instrumentation** (PUNKTFUNK_PERF):
`encode_us` + `pace_us` p50/p99/max + immediate-vs-paced counts, so the cap is tunable against real
numbers. **Validate with the LAN soak before raising the cap** (`send_dropped` must stay 0).
- **Done & live (`b295a5b`; validated on the GNOME box 2026-06-12):** **encode|send thread split**
on the native path — a dedicated `send_loop` thread owns the `Session` and does seal+pace+send+
probes; the encode thread captures+encodes+handles reconfig and hands `FrameMsg` over a bounded
`sync_channel(3)` with backpressure. Removes the serialization (~2-8 ms @60-120 fps) and is the
substrate the slice wrapper needs. Real-NIC soak (host on the Ubuntu/GNOME box, client over the
LAN): `send_dropped=0` at 720p60 / 1080p120, and a 1 Gbps probe pushed 625 MB in 5 s clean.
- **Bigger bets (ordered, deferred — need real-NIC/GPU/Mac validation):**
1. **Encode|send thread split** on the native path (port GameStream's `spawn_sender` + depth-2
channel; `seal_frame` stays on the encode thread, `send_sealed` on a send thread) — removes the
serialization (~2-8 ms @60-120 fps), and is the substrate the slice wrapper needs.
2. **Wall-clock skew handshake + glass-to-glass probe** (`tools/latency-probe`) — measures the two
1. **Wall-clock skew handshake + glass-to-glass probe** (`tools/latency-probe`) — measures the two
biggest unmeasured terms (render→capture, decode→present); client present-stamp vs the AU's
`pts_ns` (already attached).
3. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
2. **CUDA stream+event** to drop one of two redundant `cuCtxSynchronize` in `submit_cuda` (keep the
copy): ~0.1-0.4 ms@720p, ~1 ms@5K; only if per-stage timing proves the sync is on the path.
4. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
3. **Stage-2 Apple presenter** (`VTDecompressionSession` → `CAMetalLayer`, hand-paced) — ~0.5 refresh
off the present tail (biggest client win at 60 Hz); gate on the probe proving present is real.
5. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
4. **NVENC slice-mode wrapper** (roadmap §2 sub-frame pipelining) — per-slice transmit overlaps
encode+send within a frame (~3-6 ms at 4K/5K/IDR); large + driver-ABI-fragile, on top of the
thread split, only after measurement justifies it.
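The encode|send split in the "Done & live" entry above comes down to a bounded hand-off between two threads. The following is a minimal sketch of that shape only, not the project's code: `FrameMsg` here is a hypothetical two-field stand-in, `send_loop` just counts bytes where the real loop would seal, pace and send, and `run_pipeline` is a name invented for the illustration. The one detail taken from the text is the `sync_channel(3)` depth.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for the real `FrameMsg`; only what the sketch needs.
struct FrameMsg {
    seq: u64,
    payload: Vec<u8>,
}

/// Send side: the sole consumer. It drains the channel until every sender is
/// dropped, then returns (frames, bytes) it handled.
fn send_loop(rx: Receiver<FrameMsg>) -> (u64, usize) {
    let mut frames = 0u64;
    let mut bytes = 0usize;
    for msg in rx {
        // Real code would seal, pace and send here; the sketch only counts.
        debug_assert_eq!(msg.seq, frames);
        bytes += msg.payload.len();
        frames += 1;
    }
    (frames, bytes)
}

/// Runs an encode-side producer against a send thread and returns the
/// (frames, bytes) totals reported by the send thread.
fn run_pipeline(frame_count: u64, frame_len: usize) -> (u64, usize) {
    // Depth 3, as in the text: `send` blocks once three frames are queued,
    // which is the backpressure applied to the encode side.
    let (tx, rx) = sync_channel::<FrameMsg>(3);
    let sender = thread::spawn(move || send_loop(rx));

    for seq in 0..frame_count {
        let msg = FrameMsg { seq, payload: vec![0u8; frame_len] };
        if tx.send(msg).is_err() {
            break; // send thread has exited; stop producing
        }
    }
    drop(tx); // closing the channel lets the send loop finish
    sender.join().expect("send thread panicked")
}
```

Calling `run_pipeline(10, 1200)` hands ten 1200-byte frames across the channel and returns the totals once the send thread has drained it. A blocking bounded channel is only one possible policy: a host that prefers dropping stale frames over stalling the encoder would use `try_send` instead.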
## 13. Native-protocol LAN auto-discovery ✅ *(done — 2026-06-12, validated cross-LAN)*
The native protocol had no discovery — clients connected by `--connect HOST:PORT` only, while
GameStream already auto-discovered via mDNS (`_nvstream._tcp`). Now both the unified host
(`serve --native`) and standalone `m3-host` advertise the native service over mDNS:
- **Service**: `_punktfunk._udp.local.` (UDP — punktfunk/1 is QUIC; the advertised port is the QUIC
control/data port). Host side: `crate::discovery::advertise_native`, wired into `m3::serve` so
both host entry points get it; best-effort (a discovery failure never blocks streaming —
`--connect` always works). The advert is held for the host's lifetime (RAII unregister).
- **TXT records**: `proto=punktfunk/1`, `fp=<host cert SHA-256>` (the value a client pins — advisory
over unauthenticated mDNS, TOFU/pinning still verifies on connect), `pair=required|optional`
(so a picker knows up front whether the PIN ceremony is needed), `id=<host uniqueid>` (dedup).
- **Client**: `punktfunk-client-rs --discover [SECS]` browses and prints each host (name, addr:port,
pairing, fingerprint), then exits. Apple clients browse the same service natively via NWBrowser
(Bonjour) — no Rust-connector dependency; this section's service type + TXT keys are the contract.
- **Validated**: cross-LAN — dev box discovered the GNOME-box appliance
(`home-worker-3 192.168.1.248:9777 pair=required fp=1dcf3a…`) and a standalone synthetic host
(`pair=optional`); fingerprint + pairing state correct in both.
- **Next** (not done): wire NWBrowser discovery into the Apple client UI (host picker); the
host-side contract above is all it needs.
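Since the service type and TXT keys are the contract, a client-side consumer of that contract might look like the sketch below. It is an illustration under assumptions, not the `crate::discovery` API: the `NativeAdvert` and `Pairing` types and the `parse_txt` and `matches_pin` functions are hypothetical names, the TXT entries are assumed to arrive as already-decoded `key=value` strings from whatever mDNS browser is in use, and treating `fp` as case-insensitive hex is a guess based on the `fp=1dcf3a…` sample above.

```rust
use std::collections::HashMap;

/// Pairing requirement advertised in the `pair` TXT key.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Pairing {
    Required,
    Optional,
}

/// Hypothetical client-side view of one advert; not the crate's real type.
#[derive(Debug, PartialEq, Eq)]
struct NativeAdvert {
    fingerprint: String, // `fp`, lowercased (assumed to be hex)
    pairing: Pairing,    // `pair`
    host_id: String,     // `id`, usable for dedup
}

/// Parses `key=value` TXT entries. Returns `None` if a required key is
/// missing, `proto` is not `punktfunk/1`, or `pair` has an unknown value.
fn parse_txt(entries: &[&str]) -> Option<NativeAdvert> {
    // DNS-SD TXT keys are case-insensitive, so normalise them to lowercase.
    // Splitting on the first '=' keeps any later '=' inside the value.
    let map: HashMap<String, &str> = entries
        .iter()
        .copied()
        .filter_map(|e| e.split_once('='))
        .map(|(k, v)| (k.to_ascii_lowercase(), v))
        .collect();

    if *map.get("proto")? != "punktfunk/1" {
        return None;
    }
    let pairing = match *map.get("pair")? {
        "required" => Pairing::Required,
        "optional" => Pairing::Optional,
        _ => return None,
    };
    Some(NativeAdvert {
        fingerprint: map.get("fp")?.to_ascii_lowercase(),
        pairing,
        host_id: map.get("id")?.to_string(),
    })
}

/// The advertised fingerprint is advisory: if the client already pinned a
/// value for this host it must match; with no pin, any advert is listable.
/// The certificate presented on connect is still what gets verified.
fn matches_pin(advert: &NativeAdvert, pinned: Option<&str>) -> bool {
    pinned.map_or(true, |p| p.eq_ignore_ascii_case(&advert.fingerprint))
}
```

A picker could use `matches_pin` to flag a known host whose advertised fingerprint has changed, and `pairing` to decide whether to show the PIN prompt before connecting. Rejecting unknown `proto` or `pair` values outright is a conservative choice made for this sketch; a real client might prefer to list such hosts as unsupported.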