enricobuehler c830246037
feat(host/windows): UDP send offload + NVENC 2-way split-encode (1 Gbps+ / 5K@240)
The Windows host couldn't sustain high-throughput / high-fps streams — two gaps vs the Linux host,
both found via live RTX 4090 measurement (PERF timing + nvidia-smi per-engine attribution):

- UDP Send Offload (USO). punktfunk-core's UdpTransport sent one packet per `send` syscall on
  Windows (send_batch/send_gso were Linux-only), capping throughput at high packet rates. Add a
  Windows `send_gso` override using `WSASendMsg` + `UDP_SEND_MSG_SIZE` (the Windows analogue of
  Linux UDP GSO) via windows-sys — one syscall segments a coalesced <=512-segment super-buffer to
  the connected peer. On by default with auto-fallback (PUNKTFUNK_GSO=0 disables, error latches
  off); plugs into the existing paced send path. SO_SNDBUF (32MB) was already cross-platform.

- NVENC 2-way split-frame encoding. A single Ada NVENC session tops out at ~0.8 Gpix/s, so 5K@240
  (1.77 Gpix/s) took ~8 ms/frame, i.e. a ~125 fps ceiling at high motion (the in-game stutter). Set
  NV_ENC_INITIALIZE_PARAMS.splitEncodeMode = TWO_FORCED above ~1 Gpix/s (matching the Linux
  libavcodec split_encode_mode path) to use both 4090 encoders; measured at throughput:
  ~8 ms -> ~4 ms/frame. Env override PUNKTFUNK_SPLIT_ENCODE; if encoder init fails with split
  encoding set (e.g. H264), the fallback disables it.

Windows-only paths; Linux/macOS unaffected. Builds clean on x86_64-pc-windows-msvc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 16:52:59 +00:00
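
The split-encode threshold described in the commit message can be illustrated with a small pixel-rate calculation. This is a minimal sketch, not the host's actual code: the names `pixel_rate`, `choose_split_encode`, and `SplitEncode` are hypothetical, the exact 1 Gpix/s constant stands in for the "~1 Gpix/s" in the message, and 5K is assumed to mean 5120x1440 because that resolution reproduces the quoted 1.77 Gpix/s at 240 fps. How PUNKTFUNK_SPLIT_ENCODE is parsed is not stated in the source, so the override is modelled as a plain `Option<bool>`.

```rust
/// Assumed cutoff: the commit message only says "above ~1 Gpix/s".
const SPLIT_ENCODE_THRESHOLD_PIX_PER_SEC: u64 = 1_000_000_000;

/// Hypothetical stand-in for the NVENC split-encode setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SplitEncode {
    Disabled,
    TwoWayForced,
}

/// Pixels the encoder must process per second for a given stream.
fn pixel_rate(width: u32, height: u32, fps: u32) -> u64 {
    u64::from(width) * u64::from(height) * u64::from(fps)
}

/// Pick a split-encode mode: an explicit override wins, otherwise
/// split only when the pixel rate exceeds the assumed threshold.
fn choose_split_encode(
    width: u32,
    height: u32,
    fps: u32,
    force: Option<bool>, // hypothetical parsed form of the env override
) -> SplitEncode {
    match force {
        Some(true) => SplitEncode::TwoWayForced,
        Some(false) => SplitEncode::Disabled,
        None => {
            if pixel_rate(width, height, fps) > SPLIT_ENCODE_THRESHOLD_PIX_PER_SEC {
                SplitEncode::TwoWayForced
            } else {
                SplitEncode::Disabled
            }
        }
    }
}
```

Under these assumptions, `choose_split_encode(5120, 1440, 240, None)` selects the two-way mode, while a 3840x2160 stream at 120 fps (about 0.995 Gpix/s) stays on a single session.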


[package]
name = "punktfunk-core"
description = "punktfunk shared protocol/transport/FEC core, exposed over a stable C ABI"
version.workspace = true
edition.workspace = true
rust-version.workspace = true
license.workspace = true
authors.workspace = true
repository.workspace = true

[lib]
name = "punktfunk_core"
# `lib` — so punktfunk-host / punktfunk-client-rs / tools link it as a normal Rust crate.
# `staticlib` — `libpunktfunk_core.a` for the C test harness and static embedding.
# `cdylib` — `libpunktfunk_core.{so,dylib}` for Swift/Kotlin clients via the C ABI.
crate-type = ["lib", "cdylib", "staticlib"]

[features]
default = []
# Control-plane QUIC (pairing, config, reverse audio). tokio is permitted ONLY here,
# never on the per-frame hot path. Off by default so the core stays runtime-free.
quic = ["dep:quinn", "dep:tokio", "dep:rustls", "dep:rcgen", "dep:rustls-pki-types", "dep:sha2", "dep:hmac", "dep:spake2"]

[dependencies]
reed-solomon-simd = "3.1" # GF(2^16) Leopard-RS, SIMD, O(n log n) — the wall-breaker (P2)
# Vendored fork of fec-rs: GF(2^8) classic RS with the *Cauchy* generator matrix
# (M[j][i] = inv[(m+i)^j]) — byte-identical to the `nanors` library Moonlight uses, so our
# parity is decodable by a stock Moonlight client. (reed-solomon-erasure is Vandermonde and is
# NOT interoperable.) See vendor/fec-rs/LICENSE (BSD-2-Clause).
fec-rs = { path = "vendor/fec-rs" }
aes-gcm = "0.10" # AES-128-GCM session crypto, matches GameStream
zerocopy = { version = "0.8", features = ["derive"] }
bytes = "1"
socket2 = "0.6" # set SO_SNDBUF/SO_RCVBUF — default UDP buffers are too small for 4K/5K frame bursts
thiserror = "2"
tracing = { version = "0.1", default-features = false, features = ["std"] }
rand = "0.9"
zeroize = "1"
quinn = { version = "0.11", optional = true }
rustls = { version = "0.23", optional = true, default-features = false, features = ["ring", "std"] }
# Crypto backend pinned to `ring` (matching rustls/quinn above) so the whole quic tree is
# ring-only: no aws-lc-rs/aws-lc-sys (heavy C dep, needs cmake) is pulled in. Keeps the
# Android/iOS cdylib lean and the cross-compile cmake-free. `generate_simple_self_signed`
# is backend-agnostic, so the swap is transparent.
rcgen = { version = "0.13", optional = true, default-features = false, features = ["ring", "pem"] }
rustls-pki-types = { version = "1", optional = true }
sha2 = { version = "0.10", optional = true }
hmac = { version = "0.12", optional = true }
spake2 = { version = "0.4", optional = true }
tokio = { version = "1", optional = true, features = ["rt-multi-thread", "net", "sync", "macros"] }

# `libc` for batched UDP syscalls: `sendmmsg`/`recvmmsg` on Linux (the 1 Gbps+ lever) and the
# `recv(MSG_DONTWAIT)` drain on the other unix (Apple/BSD) targets, which have no `recvmmsg`
# (see transport/udp.rs `recv_batch`). Needed on every unix target — non-unix (Windows) uses
# the scalar fallbacks. Cross-compiles (iOS/tvOS) don't pull libc transitively the way the
# macOS host build does, so it must be a direct dep here or those slices fail to link `libc::`.
[target.'cfg(unix)'.dependencies]
libc = "0.2"

# Windows UDP Send Offload (USO): `WSASendMsg` + `UDP_SEND_MSG_SIZE` is the Windows analogue of
# Linux UDP GSO — the 1 Gbps+ send lever (the host otherwise sends one packet per `send` syscall,
# which caps throughput at high packet rates). See transport/udp.rs.
[target.'cfg(windows)'.dependencies]
# windows-sys (raw FFI, the quinn-udp choice): the high-level `windows` crate doesn't bind the
# `WSASendMsg` extension function. WinSock feature gives WSASendMsg + WSAMSG/WSABUF/CMSGHDR.
# Win32_System_IO too: WSASendMsg's signature references OVERLAPPED, so it's gated on that feature.
windows-sys = { version = "0.59", features = ["Win32_Networking_WinSock", "Win32_System_IO"] }

[dev-dependencies]
proptest = "1"
# Tier-1 microbenchmarks (benches/pipeline.rs). default-features off → no plotters/HTML (headless
# CI just needs the measurement + target/criterion/**/estimates.json for the regression compare).
criterion = { version = "0.5", default-features = false, features = ["cargo_bench_support"] }

[[bench]]
name = "pipeline"
harness = false

[build-dependencies]
cbindgen = "0.29"
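
The `windows-sys` comment and the commit message describe sending a coalesced super-buffer of at most 512 segments in one syscall. The platform-independent half of that idea, grouping packets into super-buffers, might look like the sketch below. It is an illustration under stated assumptions, not the code in transport/udp.rs: `SuperBuffer` and `coalesce` are hypothetical names, and the sketch assumes the usual segmentation-offload rule that every segment in a buffer has the same size except the last, which may be shorter. The actual `WSASendMsg` call is deliberately left out.

```rust
/// Upper bound on segments per super-buffer, taken from the commit message.
const MAX_SEGMENTS: usize = 512;

/// Hypothetical container: contiguous payload bytes plus the per-segment size
/// that would be passed to the OS alongside the send call.
#[derive(Debug, PartialEq, Eq)]
struct SuperBuffer {
    data: Vec<u8>,
    segment_size: usize,
}

impl SuperBuffer {
    /// Number of datagrams the OS would cut this buffer into.
    fn segment_count(&self) -> usize {
        self.data.len().div_ceil(self.segment_size)
    }
}

/// Group packets into super-buffers. Empty packets are skipped in this sketch.
fn coalesce(packets: &[&[u8]]) -> Vec<SuperBuffer> {
    let mut out: Vec<SuperBuffer> = Vec::new();
    // Whether the most recent buffer can still accept another segment.
    let mut open = false;

    for &pkt in packets {
        if pkt.is_empty() {
            continue;
        }
        if open {
            let cur = out.last_mut().expect("open implies a current buffer");
            if pkt.len() <= cur.segment_size {
                cur.data.extend_from_slice(pkt);
                // A short segment must be the last one, and a buffer that
                // reached MAX_SEGMENTS is closed as well.
                open = pkt.len() == cur.segment_size
                    && cur.segment_count() < MAX_SEGMENTS;
                continue;
            }
        }
        // Start a new buffer; its first packet fixes the segment size.
        out.push(SuperBuffer {
            data: pkt.to_vec(),
            segment_size: pkt.len(),
        });
        open = true;
    }
    out
}
```

With this grouping, 513 equal-sized packets become two super-buffers (512 segments, then 1), so a sender would issue two syscalls instead of 513. A real sender would also need a scalar fallback for when the offload is unavailable, as the commit message notes.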