Commit Graph

6 Commits

Author SHA1 Message Date
enricobuehler 4cc57d5c39 perf(host/windows): move capture→encode off the 3D engine (NV12/P010 video-processor path, zero-copy, GPU priority)
apple / swift (push) Successful in 56s
ci / rust (push) Successful in 1m36s
android / android (push) Successful in 1m56s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m33s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m58s
The Windows host capped at ~60 fps with 35-40 ms latency on a GPU-heavy game:
the per-frame capture→encode path shared the 3D engine with the game and got
scheduled behind it. Rework to minimize 3D-engine work per frame:

- VideoConverter (D3D11 video processor): capture → NVENC-native NV12/P010 so
  NVENC skips its internal RGB→YUV (a 3D/compute step). Wired into both DDA
  (dxgi.rs) and WGC (wgc.rs). New PixelFormat::Nv12/P010 + NVENC YUV input.
- GPU scheduling hardening (Apollo-style): D3DKMTSetProcessSchedulingPriorityClass
  HIGH, absolute SetGPUThreadPriority, SetMaximumFrameLatency(1).
- WGC SDR zero-copy (hold pool frames; no CopyResource). DDA keeps a fast
  CopyResource to decouple its single-frame acquire/release from the async convert.
- Pipelined helper encode loop (PUNKTFUNK_ENCODE_DEPTH, default 1) + perf split
  (cap_wait / encode / write).

Live on the RTX 4090: hard 60 fps ceiling removed (now scene-scaling 40-200+),
latency much reduced. Residual cap in GPU-pinned scenes is the irreducible RGB→YUV
convert (no fixed-function unit on NVIDIA — VideoProcessing engine reads 0%) waiting
behind an uncapped game under WDDM context time-slicing; Linux avoids it via
gamescope capping the game to the display refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:08:03 +00:00
enricobuehler bbabc04bca feat(hdr): Windows HDR10 + 10-bit end-to-end, negotiated; non-blocking capture recovery
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m32s
android / android (push) Successful in 1m49s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 30s
ci / bench (push) Successful in 1m36s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
deb / build-publish (push) Successful in 2m20s
flatpak / build-publish (push) Successful in 4m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m11s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m32s
Adds true HDR (BT.2020 PQ) and 10-bit (HEVC Main10) streaming, negotiated so an
8-bit/SDR client is never sent a stream it can't decode, plus a robust fix for the
capture losing the stream across a secure-desktop transition.

Protocol (punktfunk-core/quic.rs):
- Hello gains `video_caps` (VIDEO_CAP_10BIT / VIDEO_CAP_HDR), Welcome gains `bit_depth`,
  both as optional trailing bytes (back-compat). client-rs advertises 10-bit via
  PUNKTFUNK_CLIENT_10BIT; the connector advertises 0 for now (in-band detection drives
  the native clients). Regenerated punktfunk_core.h.

Windows host:
- 10-bit Main10: host enables it only when the client advertised VIDEO_CAP_10BIT AND
  PUNKTFUNK_10BIT is set; threaded through open_video → NVENC (profile Main10,
  pixelBitDepthMinus8).
- HDR: when the captured desktop is scRGB FP16 (R16G16B16A16_FLOAT, HDR on), copy it to
  an FP16 surface, composite the cursor there, convert scRGB → BT.2020 PQ 10-bit
  (R10G10B10A2) via a shader, and encode HEVC Main10 with the BT.2020/PQ colour VUI
  (ABGR10 input). Fixes the freeze + cursor-trail that came from feeding FP16 into the
  BGRA path. Reacts dynamically to the HDR toggle.
- Capture recovery: rebuild is now a single NON-BLOCKING attempt, throttled to ~4×/s,
  repeating the last good frame between attempts (format-tagged last_present). During a
  secure-desktop dwell SudoVDA's output is gone; the old blocking 12 s retry starved the
  send loop for seconds so the client timed out and disconnected — now the session stays
  fed (frozen) until the desktop returns. Also seeds a black frame on recovery.

Apple client (PunktfunkKit):
- Detects HDR in-band from the stream VUI (PQ transfer function), decodes to 10-bit P010,
  and presents via an rgba16Float + BT.2020 PQ CAMetalLayer with EDR; SDR path unchanged.
  Switches automatically on a mid-session HDR toggle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 20:28:52 +00:00
enricobuehler a95984bb4f feat(client-linux): feature parity with the Swift client
Everything the macOS app does that stage 1 lacked, before any new
feature work (user directive):

- Input capture is now a deliberate, reversible STATE (Moonlight-
  style): engaged on stream start and click-into-video (the engaging
  click is suppressed), released by Ctrl+Alt+Shift+Q (toggles) or
  focus loss; held keys/buttons are flushed host-side on release;
  cursor hiding + shortcut inhibition follow the state; HUD hint when
  released. Per-session window handlers disconnect with the page.
- Gamepads: app-lifetime SDL service (GamepadManager parity) — pad
  list + "Forwarded controller" pin in Settings (auto = most recent),
  "Automatic" pad TYPE resolves from the physical pad at connect;
  DualSense touchpad contacts + ~250 Hz motion samples on the 0xCC
  plane (Swift GamepadWire scale constants); feedback grows adaptive-
  trigger replay and player LEDs via raw DS5 effects packets (the
  wire's 11-byte blocks drop into SDL_SendGamepadEffect verbatim);
  held pad state zeroed on pad switch/detach. sdl3 "hidapi" feature.
- Microphone uplink: PipeWire capture -> Opus 20 ms -> 0xCB datagrams
  (validated live: host received 711 mic packets), Settings toggle.
- Speed test per saved host (Swift's "Test Network Speed…"): 2 s
  probe burst, goodput/loss + recommended ~70 % bitrate, one-tap apply.
- Settings: host compositor preference (sent in the Hello), native-
  display resolution/refresh resolved from the window's monitor at
  connect (new default), bitrate ceiling to 3 Gbit/s.
- Hosts page: saved/trusted hosts section for direct pinned reconnect
  (mDNS not required), rebuilt on every page return.

Deliberately not ported: audio device pickers (PipeWire routing owns
this on Linux), resize-to-request_mode (not wired in Swift either),
pointer-lock relative mouse (stage-2 presenter, needs raw Wayland).
DualSense fidelity needs a physical pad to live-verify.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 21:11:52 +00:00
enricobuehler a8a6224fd8 fix(encode): bound per-frame size with a tight VBV buffer
ci / rust (push) Failing after 36s
ci / web (push) Failing after 36s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Successful in 17s
ci / docs-site (push) Failing after 39s
apple / swift (push) Successful in 1m16s
NVENC ran CBR (bit_rate == max_bit_rate, rc=cbr) but never set rc_buffer_size,
so it used a loose default VBV. A high-motion P-frame was then allowed to spike
to many times the average frame size; the extra packets overflow the depth-2
send queue (newest frame dropped) and the kernel UDP buffer (WouldBlock drops),
which the client sees as framedrops/jitter — and on the infinite-GOP GameStream
path as old/stale frames flashing until the next RFI.

Set a tight ~1-frame VBV (rc_buffer_size = bitrate/fps) so the encoder holds
frame size roughly constant and absorbs motion as a momentary QP/quality dip
instead — the Sunshine/Moonlight low-latency model. Tunable via
PUNKTFUNK_VBV_FRAMES (default 1.0); larger trades burst tolerance for motion
quality. Fixes both the punktfunk/1 and GameStream paths (shared encoder).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 20:58:46 +00:00
enricobuehler 136390514d build: support FFmpeg 7.x and 8.x; fix RPM spec GPU link deps
ci / rust (push) Has been cancelled
punktfunk-host builds unchanged against either FFmpeg 7.x (libavcodec 61) or 8.x
(libavcodec 62) — ffmpeg-sys-next auto-detects the system version, and the host's
ffmpeg FFI only touches long-stable APIs. Confirmed by building + running live on a
Bazzite F43 box (FFmpeg 7.1.3): full gamescope capture → zero-copy dmabuf→CUDA →
NVENC H.265 at 1280x720x60, p50 ~0.96 ms. Just doc/spec accuracy, no code change:

- encode/linux.rs + CLAUDE.md: drop the "FFmpeg 8 only" claim; note 7.x/8.x both work.
- rpm spec: add the missing zero-copy GPU build deps the link actually needs —
  pkgconfig(gl) + pkgconfig(gbm) (mesa) — and document that -lcuda needs libcuda.so at
  link time (NVIDIA host, or the CUDA toolkit stub on a headless COPR/koji builder).
  Tracked for a proper fix: make the cuda/gbm/GL FFI dlopen-based like khronos-egl so
  the RPM builds on a GPU-less host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 09:12:59 +00:00
enricobuehler bfd64ce871 rename: lumen → punktfunk, everywhere
ci / rust (push) Has been cancelled
Full project rename, decided 2026-06-10:
- Crates/binaries: punktfunk-core / punktfunk-host / punktfunk-client-rs.
- C ABI: punktfunk_* symbols, Punktfunk* types, include/punktfunk_core.h,
  PUNKTFUNK_FEATURE_QUIC guard (header regenerated; cbindgen renames updated, incl.
  PUNKTFUNK_BTN_*/PUNKTFUNK_AXIS_* wire constants).
- Protocol: punktfunk/1 — control-plane magic LMN1 → PKF1, nonce salt lmn1 → pkf1.
  WIRE BREAK: clients must be rebuilt from this revision.
- Env knobs: PUNKTFUNK_VIDEO_SOURCE / PUNKTFUNK_COMPOSITOR / PUNKTFUNK_ZEROCOPY / ….
- Host config dir: ~/.config/punktfunk (the box's dir was migrated in place — the
  persistent identity is unchanged, pinned fingerprints stay valid).
- Swift package: PunktfunkKit + PunktfunkCore.xcframework + PunktfunkConnection
  (Sources/PunktfunkClient app + tests renamed with it); build-xcframework.sh updated.
- scripts/: 60-punktfunk.rules, punktfunk-host.service; OpenAPI doc regenerated.

Also: scripts/headless/run-headless-kde.sh — full headless Plasma bringup. Root cause of
"desktop but no apps/settings" over the stream: plasmashell launched without
XDG_MENU_PREFIX=plasma-, so the launcher resolved a nonexistent applications.menu and
rendered an empty menu. The script sets the complete KDE session env (menu prefix,
KDE_FULL_SESSION, session version) and rebuilds ksycoca before starting plasmashell.

Gate: 97/97 tests, clippy -D warnings (both feature sets), fmt, C-ABI harness PASS,
zero lumen references left outside .git.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 13:11:59 +00:00