Commit Graph

20 Commits

Author SHA1 Message Date
enricobuehler 333f66b45b fix(host/serverinfo): don't advertise an empty codec mask when the VAAPI probe finds nothing
apple / swift (push) Successful in 54s
windows-host / package (push) Successful in 2m21s
android / android (push) Successful in 3m30s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 27s
ci / rust (push) Successful in 5m56s
deb / build-publish (push) Successful in 3m9s
ci / bench (push) Successful in 4m40s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
decky / build-publish (push) Successful in 11s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m47s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m51s
docker / deploy-docs (push) Successful in 17s
The Phase 3 GPU-aware codec mask (6922e1c) probes VAAPI on any non-NVIDIA host.
On a GPU-less box (CI container: no /dev/nvidia* -> `auto` picks VAAPI, but there's
no VA display) the probe returns all-false, so the mask was 0 -- the host
advertised NO codecs, and the serverinfo unit test failed.

Fall back to the static superset when the probe yields nothing (VAAPI wasn't
usable, not "the GPU encodes nothing"); quiet ffmpeg's expected "No VA display"
error during the probe; and assert the test against codec_mode_support() rather
than a hardcoded 65793 so it's deterministic regardless of the build host's GPU.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 11:52:17 +00:00
enricobuehler 6922e1c467 feat(host): VAAPI codec probe + AMD/Intel packaging + neutral logs (Phase 3)
apple / swift (push) Successful in 55s
ci / rust (push) Failing after 1m35s
ci / web (push) Successful in 28s
windows-host / package (push) Successful in 2m23s
ci / docs-site (push) Successful in 30s
android / android (push) Successful in 3m24s
deb / build-publish (push) Successful in 3m22s
decky / build-publish (push) Successful in 14s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m48s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m50s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m51s
docker / deploy-docs (push) Successful in 18s
Polish for AMD/Intel support:
- GameStream serverinfo advertises only codecs the GPU can ACTUALLY encode on
  the VAAPI backend (probed once by opening a tiny encoder per codec). AV1
  encode is narrow (Intel Arc/Xe2+, AMD RDNA3+/RDNA4) and an old iGPU may lack
  HEVC, so a Moonlight client never negotiates a codec the encoder can't open.
  NVENC/Windows keep the Moonlight-validated static mask. Validated on a Radeon
  780M: h264/h265/av1 all probe true -> mask unchanged (65793).
- Packaging: Recommends mesa-va-drivers + intel-media-va-driver (deb) /
  mesa-va-drivers + intel-media-driver (rpm) so the auto-selected VAAPI backend
  works out of the box on AMD/Intel; NVIDIA boxes can --no-install-recommends.
  (Fedora note: stock mesa-va-drivers disables HEVC/AV1 -- needs the freeworld
  variant from RPM Fusion.)
- De-NVIDIA-fy the user-facing encoder log/context strings ("open NVENC" ->
  "open video encoder") now that VAAPI is a first-class backend.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 10:41:37 +00:00
enricobuehler 708c62788d feat(host/encode): VAAPI zero-copy dmabuf import (AMD/Intel GPU CSC)
apple / swift (push) Successful in 57s
ci / rust (push) Successful in 1m39s
ci / web (push) Successful in 32s
ci / docs-site (push) Successful in 31s
android / android (push) Successful in 3m29s
windows-host / package (push) Successful in 3m39s
deb / build-publish (push) Successful in 3m7s
decky / build-publish (push) Successful in 22s
ci / bench (push) Successful in 4m43s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 16s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m24s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m18s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m22s
docker / deploy-docs (push) Successful in 21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m53s
Phase 2 of AMD/Intel support: the VAAPI encoder now takes the capture dmabuf
directly and does the RGB->NV12 colour conversion on the GPU's video engine,
eliminating the host-side de-pad + swscale CSC + upload the CPU path pays.

- capture: a vendor-neutral FramePayload::Dmabuf (dup'd fd + fourcc/modifier/
  layout). When zero-copy is on, the EGL->CUDA importer is unavailable (any
  non-NVIDIA host), and the backend is VAAPI, the capturer advertises LINEAR
  dmabuf and hands the raw buffer to the encoder instead of CPU-copying it.
- encode/vaapi: the encoder self-configures from the first frame's payload (no
  open_video signature change). The dmabuf arm wraps the buffer as an
  AV_PIX_FMT_DRM_PRIME frame and pushes it through a filter graph
  buffer(drm_prime) -> hwmap(vaapi) -> scale_vaapi=nv12 -> buffersink; the
  encoder takes NV12 surfaces straight from the sink. The Phase 1 CPU-upload
  path is kept as the other arm (used when capture produces CPU frames).

Live-validated on a Radeon 780M (real Sway/xdpw desktop capture): correct,
pixel-perfect HEVC, and ~10x less host CPU at 1440p (4.2s -> 0.4s of CPU for
300 frames) -- the de-pad/CSC/upload moves to the GPU. NVIDIA unchanged
(zero-copy still imports to CUDA; the passthrough path only engages on
non-NVIDIA hosts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:57:00 +00:00
enricobuehler b390dd883b feat(host/encode): VAAPI encode backend for AMD/Intel GPUs (Linux)
apple / swift (push) Successful in 54s
windows-host / package (push) Successful in 2m52s
android / android (push) Successful in 3m4s
ci / rust (push) Successful in 1m18s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 27s
ci / bench (push) Successful in 4m32s
deb / build-publish (push) Successful in 2m56s
decky / build-publish (push) Successful in 22s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m59s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 15s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m41s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 22s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 7m27s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m10s
docker / deploy-docs (push) Successful in 39s
The Linux host was NVENC/CUDA-only. Add a VAAPI encoder — one libavcodec
backend (h264/hevc/av1_vaapi) covering both AMD (Mesa radeonsi) and Intel
(iHD) — behind the existing `Encoder` trait, and turn `open_video`'s Linux
arm into a vendor dispatcher: `PUNKTFUNK_ENCODER=auto|nvenc|vaapi` (default
auto: NVENC when a CUDA frame or /dev/nvidia* is present, else VAAPI). The
NVIDIA path is unchanged — auto resolves to NVENC on an NVIDIA box and the
bitrate-probe loop moved verbatim into `open_nvenc_probed`.

`VaapiEncoder` mirrors the NVENC hwframes pattern with AV_HWDEVICE_TYPE_VAAPI.
The CPU-input path swscales packed RGB -> NV12 (BT.709 limited, VUI signalled)
and uploads into a pooled VA surface (av_hwframe_transfer_data), preserving the
low-latency model (infinite GOP, on-demand forced IDR, async_depth=1, CBR when
the driver supports it). It works on a non-NVIDIA box with no capture changes:
the capturer already falls back to CPU frames when its EGL->CUDA importer can't
initialise (no libcuda).

Live-validated on a Radeon 780M (RDNA3): hevc/h264/av1_vaapi all encode,
HEVC/H264 decode cleanly with correct BT.709-limited colours, infinite GOP
preserved. Zero-copy dmabuf import (the high-res perf lever) is next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:35:49 +00:00
enricobuehler 1fc6f73784 perf(host/linux): NV12 GPU convert — feed NVENC native YUV, off the contended SM (Tier 2A)
apple / swift (push) Successful in 54s
windows-host / package (push) Failing after 2m18s
ci / web (push) Successful in 32s
ci / rust (push) Failing after 5m2s
decky / build-publish (push) Successful in 11s
android / android (push) Failing after 49s
ci / docs-site (push) Successful in 35s
ci / bench (push) Failing after 3m15s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 3m49s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 15s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 40s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Failing after 28s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Successful in 5m54s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 11s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 1m36s
The Linux zero-copy tiled-GL path can now produce NV12 (BT.709 limited range)
on the GPU and feed NVENC native YUV, deleting NVENC's internal RGB->YUV CSC —
which runs on the SM/3D-compute engine a saturating game pins at 100% (the
game-vs-encode contention headache). Windows already does this via the D3D11
video processor; this closes the Linux gap. See docs/host-latency-plan.md §2A.

Gated behind PUNKTFUNK_NV12 (default OFF → the RGB/BGRx path is byte-for-byte
unchanged; zero regression). Only the tiled EGL/GL path converts; the
LINEAR/Vulkan-bridge (gamescope) path stays RGB.

- zerocopy/egl.rs: Nv12Blit — BT.709 limited Y pass (R8, full-res) + UV pass
  (RG8, half-res, GL_LINEAR 2x2 average); both CUDA-registered; import_nv12.
- zerocopy/cuda.rs: two-plane DeviceBuffer (Y W*H@1B + interleaved UV
  (W/2)*2 x H/2), paired Y+UV pool, copy_mapped_nv12 + copy_nv12_to_device,
  on the per-thread priority stream (dmabuf-recycle sync preserved).
- encode/linux.rs: nvenc_input(Nv12)->NV12; submit_cuda copies two planes into
  NVENC's surface; VUI signalled BT.709 limited (colorspace/range/primaries/trc).
- capture/linux.rs: gate (PUNKTFUNK_NV12 && tiled), report format Nv12.
- main.rs + zerocopy/mod.rs: `nv12-selftest` subcommand.

Validated on RTX 5070 Ti two ways: (1) nv12-selftest — synthetic RGBA->NV12
round-trip vs a BT.709 reference, max abs error Y=0.56/U=0.33/V=0.26 LSB;
(2) live capture->NV12->NVENC->decode of animated red content matches the RGB
path's colour (avg RGB 230,18,18 vs 231,18,20). build/clippy/fmt green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 23:39:11 +00:00
enricobuehler 9771aa8815 fix(host/windows): binary-search clamp NVENC bitrate to the codec-level max (not ×¾ step-down)
ci / web (push) Successful in 28s
ci / rust (push) Successful in 1m42s
ci / docs-site (push) Successful in 28s
apple / swift (push) Successful in 55s
android / android (push) Successful in 1m55s
deb / build-publish (push) Successful in 2m29s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
ci / bench (push) Successful in 4m27s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m5s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m54s
When a client requests a bitrate above the GPU's HEVC/AV1 level ceiling, NVENC rejects
initialize_encoder. The old probe stepped the rate down by ×¾ each retry, undershooting
the real ceiling badly (a 1 Gbps request landed ~300 Mbps even with the level cap near
800). Replace it with a binary search over [floor, requested] that converges (±20 Mbps)
on the HIGHEST rate NVENC accepts and clamps to that — so the stream uses the full
codec-level bitrate. Factored the session open/config/init into try_open_session() for
the probe; split-encode rejection is disambiguated from a bitrate-cap rejection (retry
once with split disabled) and the floor fallback also tries split-disabled.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:42:00 +00:00
enricobuehler a4df75132a fix(host/windows): HEVC/AV1 HIGH tier so high client bitrates aren't quartered
android / android (push) Successful in 1m56s
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m35s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 29s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m36s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m8s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m8s
docker / deploy-docs (push) Successful in 18s
NVENC defaulted to Main tier, whose per-level bitrate ceiling at 5K (HEVC Level 6.2
Main ≈ 240 Mbps) made initialize_encoder reject a high client bitrate; the existing
probe-and-step-down then silently dropped a ~1 Gbps request by ×¾ to ~240-320 Mbps —
visible color/motion compression on fast scenes. Set HIGH tier (≈800 Mbps for HEVC,
higher for AV1) + autoselect level so the requested bitrate goes through. `tier`/`level`
are u32 (HIGH=1, AUTOSELECT=0) shared across the HEVC/AV1 union offset; the step-down
remains as a safety net. Not yet built/validated on-box (box offline).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:33:31 +00:00
enricobuehler 4cc57d5c39 perf(host/windows): move capture→encode off the 3D engine (NV12/P010 video-processor path, zero-copy, GPU priority)
apple / swift (push) Successful in 56s
ci / rust (push) Successful in 1m36s
android / android (push) Successful in 1m56s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m33s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m58s
The Windows host capped at ~60 fps with 35-40 ms latency on a GPU-heavy game:
the per-frame capture→encode path shared the 3D engine with the game and got
scheduled behind it. Rework to minimize 3D-engine work per frame:

- VideoConverter (D3D11 video processor): capture → NVENC-native NV12/P010 so
  NVENC skips its internal RGB→YUV (a 3D/compute step). Wired into both DDA
  (dxgi.rs) and WGC (wgc.rs). New PixelFormat::Nv12/P010 + NVENC YUV input.
- GPU scheduling hardening (Apollo-style): D3DKMTSetProcessSchedulingPriorityClass
  HIGH, absolute SetGPUThreadPriority, SetMaximumFrameLatency(1).
- WGC SDR zero-copy (hold pool frames; no CopyResource). DDA keeps a fast
  CopyResource to decouple its single-frame acquire/release from the async convert.
- Pipelined helper encode loop (PUNKTFUNK_ENCODE_DEPTH, default 1) + perf split
  (cap_wait / encode / write).

Live on the RTX 4090: hard 60 fps ceiling removed (now scene-scaling 40-200+),
latency much reduced. Residual cap in GPU-pinned scenes is the irreducible RGB→YUV
convert (no fixed-function unit on NVIDIA — VideoProcessing engine reads 0%) waiting
behind an uncapped game under WDDM context time-slicing; Linux avoids it via
gamescope capping the game to the display refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:08:03 +00:00
enricobuehler b9f4cf1f3e fix(host/windows): don't 2-way-split-encode Main10 — it's SLOWER on Ada (fixes broken HDR animations)
apple / swift (push) Successful in 53s
audit / cargo-audit (push) Failing after 1m9s
android / android (push) Successful in 2m3s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
ci / web (push) Successful in 29s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m31s
ci / rust (push) Successful in 4m26s
decky / build-publish (push) Successful in 11s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
flatpak / build-publish (push) Successful in 3m34s
deb / build-publish (push) Successful in 6m55s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m25s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 5m10s
The "broken animations in HDR" was an encode-throughput cliff, not the ACCESS_LOST churn. Measured at
5120x1440@240 HEVC Main10 on the RTX 4090: forced 2-way split-encode = 7.6 ms/frame (~131 fps, well
over the 4.17 ms/240fps budget → choppy), while SINGLE engine = 2.8-3.9 ms/frame (~256-357 fps, fits
240). The split/merge overhead dominates for 10-bit; a single Ada NVENC engine already handles 5K@240
Main10 comfortably. So the split decision now forces DISABLE for Main10 (bit_depth >= 10), keeping the
existing forced-2 only for 8-bit above 1 Gpix/s. PUNKTFUNK_SPLIT_ENCODE still overrides. Added a
split-mode log line.

Validated live on the 4090: encode_us_p50 7.6 ms → 3.9 ms at 5K240 HDR with no env override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 21:40:28 +00:00
enricobuehler bbabc04bca feat(hdr): Windows HDR10 + 10-bit end-to-end, negotiated; non-blocking capture recovery
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m32s
android / android (push) Successful in 1m49s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 30s
ci / bench (push) Successful in 1m36s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
deb / build-publish (push) Successful in 2m20s
flatpak / build-publish (push) Successful in 4m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m11s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m32s
Adds true HDR (BT.2020 PQ) and 10-bit (HEVC Main10) streaming, negotiated so an
8-bit/SDR client is never sent a stream it can't decode, plus a robust fix for the
capture losing the stream across a secure-desktop transition.

Protocol (punktfunk-core/quic.rs):
- Hello gains `video_caps` (VIDEO_CAP_10BIT / VIDEO_CAP_HDR), Welcome gains `bit_depth`,
  both as optional trailing bytes (back-compat). client-rs advertises 10-bit via
  PUNKTFUNK_CLIENT_10BIT; the connector advertises 0 for now (in-band detection drives
  the native clients). Regenerated punktfunk_core.h.

Windows host:
- 10-bit Main10: host enables it only when the client advertised VIDEO_CAP_10BIT AND
  PUNKTFUNK_10BIT is set; threaded through open_video → NVENC (profile Main10,
  pixelBitDepthMinus8).
- HDR: when the captured desktop is scRGB FP16 (R16G16B16A16_FLOAT, HDR on), copy it to
  an FP16 surface, composite the cursor there, convert scRGB → BT.2020 PQ 10-bit
  (R10G10B10A2) via a shader, and encode HEVC Main10 with the BT.2020/PQ colour VUI
  (ABGR10 input). Fixes the freeze + cursor-trail that came from feeding FP16 into the
  BGRA path. Reacts dynamically to the HDR toggle.
- Capture recovery: rebuild is now a single NON-BLOCKING attempt, throttled to ~4×/s,
  repeating the last good frame between attempts (format-tagged last_present). During a
  secure-desktop dwell SudoVDA's output is gone; the old blocking 12 s retry starved the
  send loop for seconds so the client timed out and disconnected — now the session stays
  fed (frozen) until the desktop returns. Also seeds a black frame on recovery.

Apple client (PunktfunkKit):
- Detects HDR in-band from the stream VUI (PQ transfer function), decodes to 10-bit P010,
  and presents via an rgba16Float + BT.2020 PQ CAMetalLayer with EDR; SDR path unchanged.
  Switches automatically on a mid-session HDR toggle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 20:28:52 +00:00
enricobuehler 26fbd9ec64 perf(host/windows): zero-copy NVENC — encode the capturer's texture in place (halve 3D-engine load)
ci / rust (push) Failing after 43s
apple / swift (push) Successful in 53s
ci / web (push) Successful in 35s
android / android (push) Successful in 1m45s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m35s
decky / build-publish (push) Successful in 32s
deb / build-publish (push) Successful in 2m21s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 17s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m59s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3m52s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 21s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m37s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m37s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 5m4s
docker / deploy-docs (push) Successful in 18s
The Windows host pegged the GPU 3D engine at ~97% during high-fps desktop streaming — measured (per-
process GPU-engine counters) as OUR process, not DWM. Cause: TWO VRAM->VRAM CopyResource per frame
(dupl->gpu_copy in the capturer, then gpu_copy->nvenc_pool in the encoder), and on Windows D3D11
routes copies to render-target textures through the 3D engine (the DMA copy engine sat idle at 7%),
so at 240 fps they saturate it and contend with a game's own rendering.

Eliminate the second copy: NVENC now registers the capturer's D3D11 texture directly (cached by raw
pointer, the cloned texture kept alive until unregister) and encode_pictures it IN PLACE — no
encoder-owned input pool, no per-frame copy. Safe because the host encode loop is synchronous
(capture -> submit -> poll, where lock_bitstream blocks until the encode finishes), so the capturer
never overwrites the texture mid-encode; documented in the module header in case that ever changes.

2 GPU copies/frame -> 1 (the remaining dupl->gpu_copy is unavoidable; that DXGI surface is transient).
Measured: SM/compute ~10-15% at ~217 fps 5K (was ~20% at only ~48 fps with two copies), 3687 frames
decoded clean. Windows-only; Linux/macOS unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 17:33:07 +00:00
enricobuehler c830246037 feat(host/windows): UDP send offload + NVENC 2-way split-encode (1 Gbps+ / 5K@240)
apple / swift (push) Successful in 53s
audit / cargo-audit (push) Failing after 1m7s
ci / rust (push) Failing after 40s
android / android (push) Successful in 2m11s
ci / web (push) Successful in 29s
ci / docs-site (push) Successful in 30s
ci / bench (push) Successful in 1m49s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
flatpak / build-publish (push) Successful in 3m42s
deb / build-publish (push) Successful in 6m58s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m30s
docker / deploy-docs (push) Successful in 30s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 5m10s
The Windows host couldn't sustain high-throughput / high-fps streams — two gaps vs the Linux host,
both found via live RTX 4090 measurement (PERF timing + nvidia-smi per-engine attribution):

- UDP Send Offload (USO). punktfunk-core's UdpTransport sent one packet per `send` syscall on
  Windows (send_batch/send_gso were Linux-only), capping throughput at high packet rates. Add a
  Windows `send_gso` override using `WSASendMsg` + `UDP_SEND_MSG_SIZE` (the Windows analogue of
  Linux UDP GSO) via windows-sys — one syscall segments a coalesced <=512-segment super-buffer to
  the connected peer. On by default with auto-fallback (PUNKTFUNK_GSO=0 disables, error latches
  off); plugs into the existing paced send path. SO_SNDBUF (32MB) was already cross-platform.

- NVENC 2-way split-frame encoding. A single Ada NVENC session tops out ~0.8 Gpix/s, so 5K@240
  (1.77 Gpix/s) took ~8 ms/frame -> a ~125 fps ceiling at high motion (the in-game stutter). Set
  NV_ENC_INITIALIZE_PARAMS.splitEncodeMode = TWO_FORCED above ~1 Gpix/s (matching the Linux
  libavcodec split_encode_mode path) to use both 4090 encoders — measured ~8 ms -> ~4 ms/frame at
  throughput. Env override PUNKTFUNK_SPLIT_ENCODE; init-failure fallback disables it (e.g. H264).

Windows-only paths; Linux/macOS unaffected. Builds clean on x86_64-pc-windows-msvc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 16:52:59 +00:00
enricobuehler f4b4a6c1e4 feat(host/windows): native res, cursor, secure-desktop capture, windowless SYSTEM launch
apple / swift (push) Successful in 52s
ci / rust (push) Failing after 36s
ci / web (push) Successful in 31s
android / android (push) Successful in 1m52s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m39s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
deb / build-publish (push) Successful in 3m19s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m15s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m57s
docker / deploy-docs (push) Successful in 17s
Live-validated Mac <-> RTX 4090 at the display's native 5120x1440@240:

- Resolution: set_active_mode enumerates the IDD's advertised modes and sets the
  requested resolution at the best supported refresh (keeps 5120x1440@240; no more
  silent fallback to the 1080p OS default when an exact mode is briefly unavailable).
- Bitrate auto-cap: NVENC init probes and steps the average bitrate down to the GPU's
  codec-level max so a high client bitrate connects (matches the Linux host; we do not
  split NVENC sessions).
- Mouse cursor: DXGI duplication excludes the HW cursor; capture the pointer
  shape/position (GetFramePointerShape) and GPU-composite it before NVENC. Color cursors
  alpha-blend; masked-color (the text I-beam) uses an INV_DEST_COLOR inversion blend so
  the caret inverts the screen and shows on any background (no black box); monochrome
  handled too.
- Secure desktop (lock / login / UAC): run as SYSTEM in the interactive session, follow
  the input desktop via SetThreadDesktop, and on the WinSta switch recreate the D3D11
  device and re-resolve the virtual output's GDI name from the stable SudoVDA target id
  (the name changes across the topology rebuild; the old failure hunted the stale
  \\.\DISPLAYn and dropped). ACCESS_LOST / INVALID_CALL / device-removed are recoverable,
  and a mid-stream resolution change is followed (capturer + NVENC re-init at the new
  size). isolate_displays detaches other monitors so Winlogon renders to the virtual
  output. One real session recovered 1012 desktop switches and completed cleanly.

Windows-only backends; Linux/macOS unaffected. Builds clean on x86_64-pc-windows-msvc.
Deployment (windowless SYSTEM launch via PsExec + hidden VBScript) documented in
docs/windows-host.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:46:34 +00:00
enricobuehler 2448a33698 style(host/windows): rustfmt the Windows backends
apple / swift (push) Successful in 55s
android / android (push) Failing after 1m53s
ci / web (push) Failing after 17s
ci / docs-site (push) Successful in 42s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 7s
ci / rust (push) Failing after 3m5s
ci / bench (push) Successful in 1m49s
decky / build-publish (push) Successful in 12s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 7s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 2s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Failing after 0s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Failing after 0s
flatpak / build-publish (push) Failing after 0s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 0s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Failing after 1m43s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 1m15s
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 01:50:16 +00:00
enricobuehler 69ba6ec45d feat(host/windows): NVENC D3D11 hardware encoder (--features nvenc)
android / android (push) Failing after 36s
ci / rust (push) Failing after 45s
apple / swift (push) Successful in 55s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m35s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
flatpak / build-publish (push) Failing after 2s
deb / build-publish (push) Successful in 3m13s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 1m17s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 1m32s
docker / deploy-docs (push) Successful in 17s
Zero-copy capture->encode on the GPU via the raw NVENC API (nvidia_video_codec_sdk sys + ENCODE_API; the safe wrapper is CUDA-only). Opens an NV_ENC_DEVICE_TYPE_DIRECTX session on the SAME ID3D11Device as the DXGI capturer (carried on the new FramePayload::D3d11), registers a pool of BGRA textures once, CopyResources each captured texture in and encode_picture; CBR/ULL, infinite GOP, P-only, forced-IDR for RFI. The DXGI capturer gains a D3D11 zero-copy output (selected, like the encoder, by PUNKTFUNK_ENCODER=nvenc) so capture+encode share textures.

OFF by default (the nvenc feature pulls the NVENC SDK + cudarc): the default Windows host links without it (openh264 path). cudarc builds toolkit-less via the SDK ci-check feature (dynamic-loading). At link time --features nvenc needs nvencodeapi.lib (NVENC SDK, or an import lib generated from the driver's nvEncodeAPI64.dll) on PUNKTFUNK_NVENC_LIB_DIR. Both default and --features nvenc builds validated to compile+link GPU-less on the VM (import lib generated from the driver DLL). Runtime needs a real NVIDIA GPU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 01:39:46 +00:00
enricobuehler cbbeaa5c29 feat(host/windows): openh264 software H.264 encoder (GPU-less path)
apple / swift (push) Successful in 53s
android / android (push) Failing after 1m31s
ci / rust (push) Failing after 45s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 29s
ci / bench (push) Successful in 1m37s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 3s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
flatpak / build-publish (push) Failing after 2s
deb / build-publish (push) Successful in 3m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 1m21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 1m46s
docker / deploy-docs (push) Successful in 18s
Windows Encoder impl via the openh264 crate (statically-bundled, BSD-2): low-latency screen-content config (Baseline/no-B-frames, bitrate RC, BT.709 limited, near-infinite GOP + forced-IDR recovery via request_keyframe), packed CPU pixels (BGRx/BGRA/RGB/RGBA/RGBx/BGR) -> I420 -> AnnexB with in-band SPS/PPS each IDR. Synchronous: submit encodes immediately, poll hands back the one AU, flush is a no-op. Windows open_video factory selects it (PUNKTFUNK_ENCODER=software|nvenc|auto; NVENC arm lands later), H.264-only with a clear error otherwise, SW bitrate ceiling. Unit-tested live on the VM: synthetic BGRx -> AnnexB IDR + SPS NAL. Unblocks the GPU-less capture->encode->FEC->send pipeline. Compiles clean on Windows + Linux.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 00:43:19 +00:00
enricobuehler a95984bb4f feat(client-linux): feature parity with the Swift client
Everything the macOS app does that stage 1 lacked, before any new
feature work (user directive):

- Input capture is now a deliberate, reversible STATE (Moonlight-
  style): engaged on stream start and click-into-video (the engaging
  click is suppressed), released by Ctrl+Alt+Shift+Q (toggles) or
  focus loss; held keys/buttons are flushed host-side on release;
  cursor hiding + shortcut inhibition follow the state; HUD hint when
  released. Per-session window handlers disconnect with the page.
- Gamepads: app-lifetime SDL service (GamepadManager parity) — pad
  list + "Forwarded controller" pin in Settings (auto = most recent),
  "Automatic" pad TYPE resolves from the physical pad at connect;
  DualSense touchpad contacts + ~250 Hz motion samples on the 0xCC
  plane (Swift GamepadWire scale constants); feedback grows adaptive-
  trigger replay and player LEDs via raw DS5 effects packets (the
  wire's 11-byte blocks drop into SDL_SendGamepadEffect verbatim);
  held pad state zeroed on pad switch/detach. sdl3 "hidapi" feature.
- Microphone uplink: PipeWire capture -> Opus 20 ms -> 0xCB datagrams
  (validated live: host received 711 mic packets), Settings toggle.
- Speed test per saved host (Swift's "Test Network Speed…"): 2 s
  probe burst, goodput/loss + recommended ~70 % bitrate, one-tap apply.
- Settings: host compositor preference (sent in the Hello), native-
  display resolution/refresh resolved from the window's monitor at
  connect (new default), bitrate ceiling to 3 Gbit/s.
- Hosts page: saved/trusted hosts section for direct pinned reconnect
  (mDNS not required), rebuilt on every page return.

Deliberately not ported: audio device pickers (PipeWire routing owns
this on Linux), resize-to-request_mode (not wired in Swift either),
pointer-lock relative mouse (stage-2 presenter, needs raw Wayland).
DualSense fidelity needs a physical pad to live-verify.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 21:11:52 +00:00
enricobuehler a8a6224fd8 fix(encode): bound per-frame size with a tight VBV buffer
ci / rust (push) Failing after 36s
ci / web (push) Failing after 36s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Successful in 17s
ci / docs-site (push) Failing after 39s
apple / swift (push) Successful in 1m16s
NVENC ran CBR (bit_rate == max_bit_rate, rc=cbr) but never set rc_buffer_size,
so it used a loose default VBV. A high-motion P-frame was then allowed to spike
to many times the average frame size; the extra packets overflow the depth-2
send queue (newest frame dropped) and the kernel UDP buffer (WouldBlock drops),
which the client sees as framedrops/jitter — and on the infinite-GOP GameStream
path as old/stale frames flashing until the next RFI.

Set a tight ~1-frame VBV (rc_buffer_size = bitrate/fps) so the encoder holds
frame size roughly constant and absorbs motion as a momentary QP/quality dip
instead — the Sunshine/Moonlight low-latency model. Tunable via
PUNKTFUNK_VBV_FRAMES (default 1.0); larger trades burst tolerance for motion
quality. Fixes both the punktfunk/1 and GameStream paths (shared encoder).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 20:58:46 +00:00
enricobuehler 136390514d build: support FFmpeg 7.x and 8.x; fix RPM spec GPU link deps
ci / rust (push) Has been cancelled
punktfunk-host builds unchanged against either FFmpeg 7.x (libavcodec 61) or 8.x
(libavcodec 62) — ffmpeg-sys-next auto-detects the system version, and the host's
ffmpeg FFI only touches long-stable APIs. Confirmed by building + running live on a
Bazzite F43 box (FFmpeg 7.1.3): full gamescope capture → zero-copy dmabuf→CUDA →
NVENC H.265 at 1280x720x60, p50 ~0.96 ms. Just doc/spec accuracy, no code change:

- encode/linux.rs + CLAUDE.md: drop the "FFmpeg 8 only" claim; note 7.x/8.x both work.
- rpm spec: add the missing zero-copy GPU build deps the link actually needs —
  pkgconfig(gl) + pkgconfig(gbm) (mesa) — and document that -lcuda needs libcuda.so at
  link time (NVIDIA host, or the CUDA toolkit stub on a headless COPR/koji builder).
  Tracked for a proper fix: make the cuda/gbm/GL FFI dlopen-based like khronos-egl so
  the RPM builds on a GPU-less host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 09:12:59 +00:00
enricobuehler bfd64ce871 rename: lumen → punktfunk, everywhere
ci / rust (push) Has been cancelled
Full project rename, decided 2026-06-10:
- Crates/binaries: punktfunk-core / punktfunk-host / punktfunk-client-rs.
- C ABI: punktfunk_* symbols, Punktfunk* types, include/punktfunk_core.h,
  PUNKTFUNK_FEATURE_QUIC guard (header regenerated; cbindgen renames updated, incl.
  PUNKTFUNK_BTN_*/PUNKTFUNK_AXIS_* wire constants).
- Protocol: punktfunk/1 — control-plane magic LMN1 → PKF1, nonce salt lmn1 → pkf1.
  WIRE BREAK: clients must be rebuilt from this revision.
- Env knobs: PUNKTFUNK_VIDEO_SOURCE / PUNKTFUNK_COMPOSITOR / PUNKTFUNK_ZEROCOPY / ….
- Host config dir: ~/.config/punktfunk (the box's dir was migrated in place — the
  persistent identity is unchanged, pinned fingerprints stay valid).
- Swift package: PunktfunkKit + PunktfunkCore.xcframework + PunktfunkConnection
  (Sources/PunktfunkClient app + tests renamed with it); build-xcframework.sh updated.
- scripts/: 60-punktfunk.rules, punktfunk-host.service; OpenAPI doc regenerated.

Also: scripts/headless/run-headless-kde.sh — full headless Plasma bringup. Root cause of
"desktop but no apps/settings" over the stream: plasmashell launched without
XDG_MENU_PREFIX=plasma-, so the launcher resolved a nonexistent applications.menu and
rendered an empty menu. The script sets the complete KDE session env (menu prefix,
KDE_FULL_SESSION, session version) and rebuilds ksycoca before starting plasmashell.

Gate: 97/97 tests, clippy -D warnings (both feature sets), fmt, C-ABI harness PASS,
zero lumen references left outside .git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:11:59 +00:00