17 Commits

Author SHA1 Message Date
enricobuehler 5bf787eb2b feat(host): web-console performance capture — record stream stats, graph them
apple / swift (push) Successful in 1m1s
android / android (push) Successful in 4m13s
ci / rust (push) Successful in 4m42s
ci / web (push) Successful in 50s
ci / docs-site (push) Successful in 53s
windows-host / package (push) Successful in 5m51s
apple / screenshots (push) Successful in 5m1s
deb / build-publish (push) Successful in 2m29s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 33s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
ci / bench (push) Successful in 4m35s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m9s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m10s
Arm streaming-perf-stats capture from the web console, play, stop, and review the
run as graphs; finished captures are saved to disk as browsable/exportable
recordings. Covers both the native punktfunk/1 path and GameStream.

- stats_recorder.rs: one shared Arc<StatsRecorder> ring (created in gamestream::serve,
  shared with the mgmt API + both streaming loops, mirroring NativePairing). The
  hot-path gate is a runtime AtomicBool that replaces the startup-only PUNKTFUNK_PERF
  for *recording* (PERF stdout logging unchanged); bounded ring (~3 h); atomic
  temp+rename writes to ~/.config/punktfunk/captures/*.json; path-traversal-safe ids;
  poison-resilient locks.
- native (punktfunk1.rs) + GameStream (stream.rs) emit a StatsSample at their existing
  ~2 s / ~1 s aggregation boundary — per-stage latency p50/p99, fps new/repeat, goodput,
  loss/FEC deltas — with no new per-frame work beyond the cheap atomic check.
  FrameMsg.was_measured keeps pre-arm in-flight frames out of the first window's
  percentiles (without zeroing the Windows-relay path's fps/encode).
- mgmt.rs: 7 bearer-only /api/v1/stats/* endpoints (capture start/stop/status/live;
  recordings list/get/delete); api/openapi.json regenerated, in sync.
- web: new "Performance" page (recharts, rendered SSR-safe) — capture control, live
  graphs while armed, recordings table (view / download-JSON / delete), and a detail
  view with the latency stacked-area bottleneck breakdown (p50/p99 toggle) + throughput
  + health. Charts adapt to either path's stage set.
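
A minimal sketch of the recorder shape described in the first bullet, assuming a much smaller sample type than the real `StatsSample` and hypothetical method names (`arm`, `push`, `snapshot`); it only illustrates the runtime `AtomicBool` gate, the bounded ring, and poison-resilient locking, not the actual `stats_recorder.rs` code:

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical, reduced sample; the real StatsSample carries more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct StatsSample {
    pub t_ms: u64,
    pub encode_p50_us: u32,
    pub encode_p99_us: u32,
}

pub struct StatsRecorder {
    armed: AtomicBool,
    ring: Mutex<VecDeque<StatsSample>>,
    capacity: usize,
}

impl StatsRecorder {
    pub fn new(capacity: usize) -> Arc<Self> {
        let capacity = capacity.max(1);
        Arc::new(Self {
            armed: AtomicBool::new(false),
            ring: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
        })
    }

    /// Keep going even if another thread panicked while holding the lock.
    fn lock_ring(&self) -> MutexGuard<'_, VecDeque<StatsSample>> {
        self.ring.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    pub fn arm(&self) {
        self.armed.store(true, Ordering::Release);
    }

    pub fn disarm(&self) {
        self.armed.store(false, Ordering::Release);
    }

    /// The only cost paid on the hot path while disarmed is this atomic load.
    pub fn push(&self, sample: StatsSample) {
        if !self.armed.load(Ordering::Acquire) {
            return;
        }
        let mut ring = self.lock_ring();
        if ring.len() == self.capacity {
            ring.pop_front(); // bounded: drop the oldest sample
        }
        ring.push_back(sample);
    }

    pub fn snapshot(&self) -> Vec<StatsSample> {
        self.lock_ring().iter().cloned().collect()
    }
}
```

Both streaming loops would hold a clone of the same `Arc` and call `push` at their aggregation boundary; while disarmed the call returns before touching the mutex.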

Design: design/stats-capture-plan.md. Built and adversarially reviewed via a multi-agent
workflow; workspace build/clippy(-D warnings)/fmt/tests green, OpenAPI no-drift. Not yet
on-glass validated against a live session.
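
The "atomic temp+rename writes" and "path-traversal-safe ids" mentioned above could look roughly like the following. This is a sketch under assumptions: the function names, the 64-byte id limit, and the temp-file naming are all hypothetical, and the real code may validate ids differently.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Accept only short ids built from ASCII alphanumerics, '-' and '_', so an
/// id can never contain a path separator or a ".." component.
pub fn is_safe_capture_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 64
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}

/// Write `<dir>/<id>.json` by writing a sibling temp file and renaming it, so
/// a reader sees either the old file or the complete new one.
pub fn write_capture_atomic(dir: &Path, id: &str, json: &[u8]) -> io::Result<PathBuf> {
    if !is_safe_capture_id(id) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "unsafe capture id"));
    }
    fs::create_dir_all(dir)?;
    let final_path = dir.join(format!("{}.json", id));
    let tmp_path = dir.join(format!(".{}.json.tmp", id));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(json)?;
        file.sync_all()?; // flush to disk before the rename publishes it
    }
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

The temp file lives in the same directory as the target so the rename stays on one filesystem.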

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:59:39 +00:00
enricobuehler 0a6c9d8852 docs: point Android install at Discord for beta access + add community links
apple / swift (push) Successful in 1m32s
apple / screenshots (push) Successful in 3m26s
android / android (push) Successful in 4m7s
ci / rust (push) Successful in 4m36s
ci / web (push) Successful in 44s
ci / docs-site (push) Successful in 53s
deb / build-publish (push) Successful in 2m18s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 42s
ci / bench (push) Successful in 4m42s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m12s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m8s
docker / deploy-docs (push) Successful in 6s
The Android app is in Google Play Internal Testing, so the public Play Store URL
doesn't resolve for non-testers. Lead the Android install instructions with a
"request a tester invite on Discord" CTA (the Play listing unlocks once a Google
account is added to the test track), and surface the Discord + r/Punktfunk
community links in the README, the docs intro, and the docs-site nav.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:59:25 +00:00
enricobuehler 0eedfb3c1f docs: first-class Linux + Windows positioning + IDD-push differentiator
apple / swift (push) Failing after 0s
apple / screenshots (push) Has been skipped
windows-drivers-provision / provision (push) Successful in 13s
windows-drivers / probe-and-proto (push) Successful in 17s
windows-drivers / driver-build (push) Successful in 1m10s
android / android (push) Successful in 3m19s
ci / web (push) Successful in 39s
ci / docs-site (push) Successful in 53s
windows-host / package (push) Successful in 6m6s
ci / rust (push) Successful in 11m12s
decky / build-publish (push) Successful in 11s
ci / bench (push) Successful in 5m9s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 21s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 43s
deb / build-publish (push) Successful in 7m31s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m14s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m12s
release / apple (push) Failing after 1s
docker / deploy-docs (push) Successful in 19s
flatpak / build-publish (push) Successful in 4m43s
Drop the "Linux-first" framing across the README and docs site in favor of
first-class Linux AND Windows hosts, and surface the Windows IDD-push
virtual-display path as a distinct differentiator (punktfunk's own indirect
display driver the host pushes frames into — a real virtual display, no physical
monitor or dummy plug, even on the secure desktop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:53:02 +00:00
enricobuehler f6490f4c28 fix: complete the docs/→design/ and openapi→api/ rename references
The file moves (docs/ → design/, docs/api/openapi.json → api/openapi.json) landed
in d01a8fd, but the matching reference updates did not — so mgmt.rs's drift-test
`include_str!("../../../docs/api/openapi.json")` pointed at a path that no longer
exists and the host failed to build. This restores it and updates every reference:

  - mgmt.rs include_str! → ../../../api/openapi.json (fixes the build)
  - web/orval.config.ts codegen target, web/Dockerfile, .dockerignore
  - deb/rpm/Arch packaging install paths
  - CLAUDE.md, the .gitea CI workflows, code doc-comments, design-doc cross-links

docs-site route URLs (/docs/...) untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:53:02 +00:00
enricobuehler d01a8fd17a feat(host): HDR Vulkan layer so Vulkan games get HDR on the virtual display
ci / web (push) Failing after 22s
windows-host / package (push) Failing after 4m16s
ci / rust (push) Failing after 4m56s
ci / docs-site (push) Successful in 1m7s
android / android (push) Successful in 9m19s
ci / bench (push) Successful in 4m47s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Failing after 3s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Failing after 6m29s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 7m4s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 7m17s
apple / swift (push) Successful in 1m13s
apple / screenshots (push) Successful in 5m27s
NVIDIA/AMD Vulkan ICDs refuse to *advertise* an HDR color space for a surface on an
IddCx indirect/virtual display, so Vulkan games (Doom: The Dark Ages, id Tech, Indiana
Jones, …) report "device does not support HDR" — even though Windows HDR, DWM compose,
and the client PQ stream all work, and the ICD happily *accepts + presents* a forced HDR
swapchain there. The whole gap is enumeration; the community (Apollo/Sunshine/VDD) wrote
this off as kernel-side / unfixable.

Add VK_LAYER_PUNKTFUNK_hdr_inject (packaging/windows/pf-vkhdr-layer/): a standalone
cdylib Vulkan implicit layer that appends {A2B10G10R10, HDR10_ST2084} + {RGBA16F, scRGB}
to vkGetPhysicalDeviceSurfaceFormats[2]KHR (no need to hook vkCreateSwapchainKHR — the
ICD doesn't validate the color space there). Self-gated on the surface monitor's actual
advanced-color state (DisplayConfig GET_ADVANCED_COLOR_INFO), so it is a complete no-op
on SDR sessions and real monitors (dedup). Always-on (registry-discovered) so it works
regardless of how a game is launched — env-scoping silently fails for already-running
Steam. Escape hatches: DISABLE_PF_VKHDR, PF_VKHDR_EXCLUDE, and a built-in kernel-anti-
cheat denylist.
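
The enumeration-side injection can be modeled without any Vulkan bindings. The sketch below uses hypothetical stand-in enums rather than real `VkFormat`/`VkColorSpaceKHR` values, and a hypothetical `answer_query` helper for the Vulkan count-then-fill idiom; it shows only the append-with-dedup step and the HDR self-gate, not the layer's dispatch glue.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Format {
    B8G8R8A8Unorm,
    A2B10G10R10Unorm,
    Rgba16Float,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColorSpace {
    SrgbNonlinear,
    Hdr10St2084,
    ScRgbLinear,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SurfaceFormat {
    pub format: Format,
    pub color_space: ColorSpace,
}

/// The two pairs the commit message says the layer appends.
pub const HDR_FORMATS: [SurfaceFormat; 2] = [
    SurfaceFormat { format: Format::A2B10G10R10Unorm, color_space: ColorSpace::Hdr10St2084 },
    SurfaceFormat { format: Format::Rgba16Float, color_space: ColorSpace::ScRgbLinear },
];

/// Return the ICD's list, plus the HDR pairs when the display is in HDR mode.
/// Pairs the ICD already reports are not duplicated (the real-monitor case).
pub fn inject_hdr_formats(icd: &[SurfaceFormat], display_is_hdr: bool) -> Vec<SurfaceFormat> {
    let mut out = icd.to_vec();
    if !display_is_hdr {
        return out; // SDR session: complete no-op
    }
    for extra in HDR_FORMATS.iter() {
        if !out.contains(extra) {
            out.push(*extra);
        }
    }
    out
}

/// Count-then-fill: with no buffer, report the count; with a buffer, copy as
/// many entries as fit and report whether the list was complete.
pub fn answer_query(all: &[SurfaceFormat], buf: Option<&mut [SurfaceFormat]>) -> (usize, bool) {
    match buf {
        None => (all.len(), true),
        Some(buf) => {
            let n = buf.len().min(all.len());
            buf[..n].copy_from_slice(&all[..n]);
            (n, n == all.len())
        }
    }
}
```

A real layer has to answer both calls of the two-call query from the same augmented list, otherwise a game that sized its buffer from the first call gets a truncated result on the second.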

The installer builds/signs/stages it and registers it under
HKLM64\SOFTWARE\Khronos\Vulkan\ImplicitLayers (opt-out "Install the HDR Vulkan layer"
task); windows-host CI fmt+clippy-gates it (msvc-only FFI).

Live-validated on the RTX box: Doom: The Dark Ages enables HDR over the pf-vdisplay
virtual display.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:33:20 +00:00
enricobuehler 3e7c9bd059 fix(host): remove unsound unsafe impl Sync for HelperRelay
apple / screenshots (push) Has been skipped
windows-drivers / probe-and-proto (push) Successful in 29s
audit / cargo-audit (push) Failing after 1m20s
windows-drivers / driver-build (push) Successful in 1m14s
ci / web (push) Successful in 46s
ci / docs-site (push) Successful in 1m3s
windows-host / package (push) Successful in 6m46s
apple / swift (push) Failing after 0s
release / apple (push) Failing after 0s
android / android (push) Failing after 2m5s
ci / bench (push) Successful in 4m34s
decky / build-publish (push) Successful in 22s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m25s
ci / rust (push) Successful in 8m36s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m11s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 59s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m37s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m3s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 29s
deb / build-publish (push) Successful in 7m50s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m52s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 1m5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m33s
flatpak / build-publish (push) Successful in 3m56s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m46s
docker / deploy-docs (push) Successful in 22s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m26s
This is the one genuine soundness defect the unsafe-proof program surfaced (flagged
SUSPECT in program 3/N). `HelperRelay` holds an `rx: Receiver<RelayAu>`, which is
`!Sync` (std mpsc is single-consumer), so asserting `Sync` claimed more than the
fields support: two threads calling `recv` through a shared `Arc<HelperRelay>`
would compile and be UB.

It was never live-exploited, and it turns out `Sync` is also unnecessary: the
relay is a single-owner `mut relay` local in the punktfunk1 two-process mux loop
(recv_timeout/try_recv/request_keyframe all called on the owning thread; no `Arc`,
no `thread::spawn` capturing it). So the fix is simply to delete the impl — the
struct keeps its sound `unsafe impl Send` (needed for the raw `HANDLE` fields),
which is all the code uses.
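
The `Send`-without-`Sync` situation can be reproduced with std alone. In this sketch `Relay` is a hypothetical stand-in for `HelperRelay` (no raw handles, so it is `Send` automatically and needs no `unsafe impl`); it shows that moving the single owner to another thread needs only `Send`.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Stand-in for a relay that owns the consuming end of a channel.
pub struct Relay {
    rx: Receiver<u32>,
}

impl Relay {
    /// Takes `&mut self` to mirror the single-owner `mut relay` usage.
    pub fn drain(&mut self) -> Vec<u32> {
        // Iterates until every sender has been dropped.
        self.rx.iter().collect()
    }
}

fn assert_send<T: Send>() {}

pub fn run_on_owner_thread() -> Vec<u32> {
    // Compiles: Receiver<u32> is Send, so Relay is Send.
    assert_send::<Relay>();
    // A matching `assert_sync::<Relay>()` would NOT compile, because
    // Receiver<T> is !Sync. An `unsafe impl Sync` would silence that error
    // without making shared `recv` calls sound.

    let (tx, rx) = channel();
    let mut relay = Relay { rx };
    for i in 0..3 {
        tx.send(i).unwrap();
    }
    drop(tx);

    // Moving the relay into one thread transfers ownership; nothing is shared.
    thread::spawn(move || relay.drain()).join().unwrap()
}
```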

Box-verified: cargo clippy -p punktfunk-host --features nvenc --target
x86_64-pc-windows-msvc -- -D warnings stays green without the Sync impl.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:00:40 +00:00
enricobuehler 7aa787a789 docs(host): prove the last 3 files + crate-root deny (unsafe-proof program 4/N, final)
Completes the unsafe-proof program now that the parallel WIP has landed:

- idd_push.rs (25 sites), nvenc.rs (7), punktfunk1.rs (21): a SAFETY proof on
  every unsafe block — D3D11/DXGI COM (same-device textures, immediate-context
  single-thread, keyed-mutex-held convert), the NVENC SDK table (versioned POD,
  register/map/lock-bitstream pairing), cross-process shm reads (atomic
  magic/generation handshake), and the C-ABI harness (each call cross-checked
  against its abi.rs `# Safety` doc). No SUSPECT (UB) blocks.
- capture.rs / encode.rs: the parent-module deny is restored (their WIP children
  are now proven), and main.rs gains a crate-root
  #![deny(clippy::undocumented_unsafe_blocks)] — the permanent catch-all gate so
  no future unsafe block anywhere in the crate can land without a proof.
- Fixed 4 blocks the agents missed: unsafe blocks nested inside `assert_eq!(...)`
  macro args (the comment-above-statement didn't associate) — hoisted to a `let`.
- rustfmt-canonicalized the Windows files (the agents' SAFETY comments + some
  pre-existing 1.9.0 drift) so `cargo fmt --all --check` is clean.
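
The macro-argument fix in the third bullet amounts to the pattern below. It is an illustrative sketch with a hypothetical function, not code from the crate: the `unsafe` block moves out of `assert_eq!(...)` into its own `let`, so the `// SAFETY:` comment sits directly above the statement containing the block.

```rust
/// Reads the first element through a raw pointer and checks it.
pub fn check_first_byte() -> u8 {
    let data = [7u8, 8, 9];
    let ptr = data.as_ptr();

    // Shape the commit describes as not associating with the comment:
    //     // SAFETY: ...
    //     assert_eq!(unsafe { *ptr }, 7);

    // SAFETY: `data` is a live, initialized 3-element array for the whole
    // function, so `ptr` is valid and aligned for a read of one `u8`.
    let got = unsafe { *ptr };
    assert_eq!(got, 7);
    got
}
```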

Verified: cargo clippy -p punktfunk-host --all-targets -- -D warnings AND
cargo fmt -p punktfunk-host --check both green with the crate-root deny active.
Windows cfg(windows) re-verified on the box next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:57:00 +00:00
enricobuehler 3514702d8c feat(windows-host): IDD-push encodes native NV12/P010 (skip NVENC's SM-side CSC)
GPU-contention work (host-latency plan §5.A): the IDD-push output ring now hands
NVENC native YUV instead of RGB, so NVENC skips its internal RGB→YUV colour
conversion on the SM/3D engine the running game saturates.

- idd_push.rs: out_ring is now NV12 (SDR, BT.709 limited) via a D3D11 VIDEO-engine
  BGRA→NV12 VideoConverter (keeps the CSC off the contended 3D/compute engine), or
  P010 (HDR, BT.2020 PQ limited) via the FP16→P010 shader (NVIDIA's VideoProcessor
  can't do RGB→P010). The ring drops its per-slot RTV (textures only), matching the
  WGC YUV ring; converters rebuild on a size/HDR flip.
- nvenc.rs: NV12 input forces bit_depth=8 so an HDR→SDR toggle (or a 10-bit-
  negotiated client on an SDR display) re-inits the session at the matching depth —
  NV12 can't feed a 10-bit session (register_resource rejects it).
- punktfunk1.rs: per-stage latency instrumentation under PUNKTFUNK_PERF
  (cap=try_latest, submit=encode_picture, wait=lock_bitstream µs p50/p99/max) to
  pinpoint where capture→encoded latency goes under GPU saturation.
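
For reference, a p50/p99/max summary over per-stage microsecond samples can be computed as below. This is a sketch assuming the nearest-rank percentile definition and hypothetical names; the commit does not say which percentile method `punktfunk1.rs` uses.

```rust
/// Nearest-rank percentile (pct in 0..=100) over microsecond samples.
/// Sorts a copy, so it belongs at the aggregation boundary, not per frame.
pub fn percentile_us(samples: &[u32], pct: u32) -> Option<u32> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(pct/100 * n), computed in integers; clamp to at least 1.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

#[derive(Debug, PartialEq, Eq)]
pub struct StageSummary {
    pub p50: u32,
    pub p99: u32,
    pub max: u32,
}

/// Summarize one stage (for example cap, submit, or wait) for one window.
pub fn summarize(samples: &[u32]) -> Option<StageSummary> {
    Some(StageSummary {
        p50: percentile_us(samples, 50)?,
        p99: percentile_us(samples, 99)?,
        max: *samples.iter().max()?,
    })
}
```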
2026-06-26 09:35:23 +00:00
enricobuehler 327a5fa828 docs(host): prove unsafe blocks in the Windows + cross-platform files + gate them (unsafe-proof program 3/N)
Continues the unsafe-proof program across the Windows/cross-platform host files
(~75 blocks, 21 files), each with a SAFETY proof of the real invariant and a
per-file #![deny(clippy::undocumented_unsafe_blocks)] gate:

  capture/windows: dxgi.rs, wgc_relay.rs, wgc.rs, desktop_watch.rs, composed_flip.rs
                   (windows-rs COM: interface validity, same-D3D11-device textures,
                    immediate-context single-thread, borrowed args outlive the call)
  windows: service.rs (SCM/token/CreateProcessAsUserW/event handles — OwnedHandle
           liveness, no double-close/signal race), win_display, wgc_helper, interactive
  vdisplay/windows: manager.rs, pf_vdisplay.rs (SwDeviceCreate/IddCx/ioctl handle
                    liveness via the OnceLock VDM singleton + OwnedHandle)
  encode/windows: ffmpeg_win.rs (full AVBufferRef refcount audit — balanced, NO leaks,
                  unlike the vaapi sibling), sw.rs
  cross-platform: gamestream/audio.rs (libopus), gamestream/stream.rs (sendmmsg),
                  inject/windows/sendinput.rs, audio/windows/wasapi_mic.rs,
                  session_tuning.rs, vdisplay.rs

Two findings (handled separately):
- wgc_relay.rs `unsafe impl Sync for HelperRelay` is UNSOUND (its mpsc Receiver is
  !Sync) though not live-exploited — marked SUSPECT inline; fix pending box check
  (it touches the in-flight punktfunk1.rs).
- capture.rs / encode.rs (PARENT modules of the WIP idd_push.rs / nvenc.rs) do NOT
  get the file deny yet — it would propagate the lint into the undocumented WIP
  children. The deny lands there once those are documented (after the WIP commits).

Linux-visible parts verified green (cargo clippy -p punktfunk-host --all-targets
-- -D warnings). The cfg(windows) deny gates are box-verified next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:23:25 +00:00
enricobuehler 9777ed7fb3 fix(host/vaapi): plug two AVBufferRef leaks in DmabufInner::open
Surfaced while writing the unsafe-soundness proofs (2/N): both are refcount
leaks (sound — never dangling/double-free — so the SAFETY proofs held, but real
bugs on the persistent punktfunk1-host listener that opens a fresh encoder per
session).

1. Per-session leak: `par->hw_frames_ctx = av_buffer_ref(drm_frames)` created a
   second owned ref. `av_buffersrc_parameters_set` takes its OWN ref of
   `par->hw_frames_ctx`, and `av_free(par)` frees only the struct, not the ref —
   so the extra ref leaked every session, pinning the DRM frames ctx + device.
   Fix: assign `drm_frames` borrowed (the standard ffmpeg pattern); our single
   owned ref lives in DmabufInner and is unref'd in Drop.

2. Error-path leak: the final `open_vaapi_encoder(...)?` returned without the
   unref ladder every other error path runs, leaking graph/drm_frames/
   vaapi_device/drm_device on encoder-open failure. Fix: match + clean up before
   returning (nv12_ctx is borrowed from the sink → freed by graph teardown).
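
The per-session leak (item 1) is easier to see in a safe-Rust analogy. Here `Rc` stands in for `AVBufferRef`, `Rc::clone` for `av_buffer_ref`, and `mem::forget` for freeing the parameters struct without unref'ing the reference it holds. All names are hypothetical and nothing below calls ffmpeg.

```rust
use std::rc::Rc;

pub struct FramesCtx;

/// Stand-in for a callee that takes its OWN reference to the frames context.
pub fn parameters_set(par_frames: &Rc<FramesCtx>, graph: &mut Vec<Rc<FramesCtx>>) {
    graph.push(Rc::clone(par_frames));
}

/// Buggy shape: create a second owned ref for the parameters struct, then
/// free only the struct, so that ref is never released.
pub fn open_leaky(owner: &Rc<FramesCtx>, graph: &mut Vec<Rc<FramesCtx>>) {
    let par_ref = Rc::clone(owner);
    parameters_set(&par_ref, graph);
    std::mem::forget(par_ref); // the struct goes away, the refcount does not drop
}

/// Fixed shape: lend the owner's reference; the callee takes what it needs.
pub fn open_fixed(owner: &Rc<FramesCtx>, graph: &mut Vec<Rc<FramesCtx>>) {
    parameters_set(owner, graph);
}
```

After the graph is torn down, the leaky variant still reports a strong count of 2 on the owner, which is the "pinned" context the commit describes; the fixed variant returns to 1.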

cargo clippy -p punktfunk-host --all-targets -- -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:02:54 +00:00
enricobuehler ba68a98873 docs(host): prove every unsafe block in the Linux FFI files + gate them (unsafe-proof program 2/N)
Continues the structural unsafe-proof program (every unsafe carries a documented
proof of soundness; the file gains #![deny(clippy::undocumented_unsafe_blocks)]
so it stays proven). This batch covers all 10 remaining pure-Linux files
(104 blocks), each proof stating the REAL invariant — not boilerplate:

  zerocopy/cuda.rs (26)   leaked process-lifetime libcuda fn-ptr table; opaque
                          CUcontext never dereferenced; free-exactly-once via the
                          Arc<Mutex<PoolInner>> ownership graph; dmabuf fd take/close split
  zerocopy/egl.rs (18)    eglGetProcAddress'd procs with the GL context current;
                          EGLImage liveness; the two-call modifier-query bounds
  zerocopy/vulkan.rs (4)  copy-bounds arithmetic (src_size>=span); Send = thread
                          confinement to the punktfunk-pipewire thread
  dmabuf_fence.rs (4)     poll/ioctl/close fd liveness + ownership
  capture/linux/mod.rs (16)  spa_data repr(transparent) cast; null-checked spa
                          derefs; single-loop-thread buffer ownership until requeue
  inject/linux/gamepad.rs (10)  uinput ioctl request-number ↔ struct-size match
                          (static-asserted); InputEventRaw no-padding for the byte cast
  encode/linux/vaapi.rs (15) + encode/linux/mod.rs (9)  ffmpeg object ownership/
                          free ladders; VAAPI/DRM graph; Send = single-thread transfer
  inject/linux/wlr.rs (2), vdisplay/linux/kwin.rs (1)

No memory-unsafety SUSPECT blocks were found — the unsafe is sound. The vaapi
agent did flag two real AVBufferRef *leaks* (not UB) in DmabufInner::open; marked
inline with NOTE(leak) and addressed in a follow-up.
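
As an illustration of what a "real invariant" proof looks like for copy-bounds arithmetic of the `src_size>=span` kind, here is a std-only sketch. The function and its parameters are hypothetical and unrelated to the actual `zerocopy/vulkan.rs` code.

```rust
/// Copy `rows` rows of `row_bytes` each out of a strided source buffer into a
/// tightly packed Vec. Returns None if the source is too small.
pub fn copy_rows(src: &[u8], stride: usize, row_bytes: usize, rows: usize) -> Option<Vec<u8>> {
    if rows == 0 || row_bytes == 0 {
        return Some(Vec::new());
    }
    if row_bytes > stride {
        return None;
    }
    // Bytes of `src` actually touched: the last row need not fill its stride.
    let span = stride.checked_mul(rows - 1)?.checked_add(row_bytes)?;
    if src.len() < span {
        return None;
    }
    let total = row_bytes.checked_mul(rows)?;
    let mut dst = vec![0u8; total];
    for r in 0..rows {
        // SAFETY: r < rows, so r * stride + row_bytes <= span <= src.len() and
        // r * row_bytes + row_bytes <= total == dst.len(); both ranges are in
        // bounds. `dst` is a fresh allocation, so it cannot overlap `src`.
        unsafe {
            std::ptr::copy_nonoverlapping(
                src.as_ptr().add(r * stride),
                dst.as_mut_ptr().add(r * row_bytes),
                row_bytes,
            );
        }
    }
    Some(dst)
}
```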

Verified: cargo clippy -p punktfunk-host --all-targets -- -D warnings is clean
(each file's deny gate hard-errors on any undocumented block).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:00:30 +00:00
enricobuehler 22359f5dc8 docs(host): prove every unsafe block in drm_sync.rs + gate it (unsafe-proof program 1/N)
Start of the structural unsafe-proof program (per the "every unsafe needs a
documented proof of soundness" goal): each `unsafe` block gets an accurate
`// SAFETY:` proof of WHY it is sound, and the file gains
`#![deny(clippy::undocumented_unsafe_blocks)]` so the proof requirement is
permanently enforced (a future undocumented unsafe in this file fails CI).

drm_sync.rs (10 blocks: libc open/ioctl/clock_gettime/close + 3 in tests): each
proof states the real invariant — fd liveness/ownership, the ioctl request number
encoding the matching struct size, the `&mut req` being a live correctly-sized
`#[repr(C)]` struct, and (for the timeline ioctls) the `handles`/`points` arrays
outliving the synchronous call with `count_handles` matching their length.
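
The "ioctl request number encoding the matching struct size" invariant can be checked at compile time. The sketch below assumes the generic Linux `_IOC` bit layout used on x86 and Arm (some architectures use a different layout), and both `ExampleReq` and the `nr` value are illustrative rather than taken from `drm_sync.rs`.

```rust
use std::mem::size_of;

// Generic Linux _IOC layout: nr 8 bits, type 8 bits, size 14 bits, dir 2 bits.
const IOC_NRSHIFT: u32 = 0;
const IOC_TYPESHIFT: u32 = 8;
const IOC_SIZESHIFT: u32 = 16;
const IOC_DIRSHIFT: u32 = 30;
const IOC_SIZEMASK: u32 = (1 << 14) - 1;
const IOC_WRITE: u32 = 1;
const IOC_READ: u32 = 2;

/// Equivalent of the C `_IOWR(ty, nr, T)` macro for a struct of `size` bytes.
pub const fn iowr(ty: u8, nr: u8, size: usize) -> u32 {
    ((IOC_READ | IOC_WRITE) << IOC_DIRSHIFT)
        | ((size as u32) << IOC_SIZESHIFT)
        | ((ty as u32) << IOC_TYPESHIFT)
        | ((nr as u32) << IOC_NRSHIFT)
}

/// Extract the struct size the kernel will assume for this request.
pub const fn ioc_size(request: u32) -> usize {
    ((request >> IOC_SIZESHIFT) & IOC_SIZEMASK) as usize
}

/// Illustrative request struct: two u32 fields, 8 bytes, no padding.
#[repr(C)]
pub struct ExampleReq {
    pub handle: u32,
    pub flags: u32,
}

/// Illustrative request number on the 'd' ioctl type.
pub const EXAMPLE_REQUEST: u32 = iowr(b'd', 0xBF, size_of::<ExampleReq>());

// Fails the build if the struct and the request number ever disagree.
const _: () = assert!(ioc_size(EXAMPLE_REQUEST) == size_of::<ExampleReq>());
```

Deriving the constant from `size_of` makes the SAFETY claim about the `&mut req` argument mechanically true instead of something to re-check by hand.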

The gate grows file by file (CI stays green because files without proofs do not
carry the lint yet) and is promoted to a crate-root deny once every file is done.
~122 Linux blocks + the Windows files remain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 08:35:32 +00:00
enricobuehler 7e9023faad feat(gamestream): launch apps on Windows + Linux non-gamescope hosts
GameStream's apps.json `cmd` is delivered via set_launch_command, which ONLY the Linux
gamescope backend nests. On Windows (no gamescope) and Linux kwin/mutter/wlroots (which
stream the existing desktop) the command was silently dropped. Now, after capture is live,
stream.rs spawns it via library::launch_gamestream_command for those backends — Windows:
into the interactive USER session (spawn_in_active_session, since the host is SYSTEM);
Linux: a plain `sh -c` spawn into the host's own graphical session so the app lands on the
streamed (primary) output. Linux gamescope keeps nesting via set_launch_command and is
skipped here to avoid a double launch. The command is operator-typed apps.json (trusted),
never client-set.
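
The per-backend decision described above reduces to a small match. This sketch uses hypothetical enum and function names; it builds the `sh -c` command for the Linux desktop backends but does not model the Windows `spawn_in_active_session` path.

```rust
use std::process::Command;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Gamescope,
    Kwin,
    Mutter,
    Wlroots,
    Windows,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LaunchPlan {
    /// gamescope nests the command itself; spawning again would double-launch.
    NestedByCompositor,
    /// Plain `sh -c` spawn into the host's own graphical session.
    SpawnShell,
    /// Spawn into the interactive user session (the host runs as SYSTEM).
    SpawnInUserSession,
}

pub fn plan_launch(backend: Backend) -> LaunchPlan {
    match backend {
        Backend::Gamescope => LaunchPlan::NestedByCompositor,
        Backend::Kwin | Backend::Mutter | Backend::Wlroots => LaunchPlan::SpawnShell,
        Backend::Windows => LaunchPlan::SpawnInUserSession,
    }
}

/// Build (but do not run) the shell command for an operator-typed `cmd`.
/// The string is passed as one argument, so `sh` does the word splitting.
pub fn shell_command(cmd: &str) -> Command {
    let mut command = Command::new("sh");
    command.arg("-c").arg(cmd);
    command
}
```

Calling `.spawn()` on the returned `Command` after capture is live would start the app; because the string goes to a shell unmodified, this is only acceptable for trusted operator input, as the commit notes.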

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 08:12:53 +00:00
enricobuehler 5acc12d9e9 feat(library): shared cover-art warmer + cache (GOG + Xbox art)
A disk-backed art cache (library-art-cache.json in the canonical host config dir) is the
source of truth read by all_games(), so the library list + launch-resolve never block on
the network. A host-lifetime background warmer (start_art_warmer, started in serve())
fetches uncached art OFF the hot path: GOG via the public no-auth api.gog.com product API,
Xbox via the unofficial no-auth displaycatalog (keyed by StoreId). Both best-effort
(protocol-relative URLs normalized to https; results cached even when empty so they aren't
re-fetched). The GOG + Xbox providers now read cached_art() (title-only until warmed).

Cross-platform (ureq blocking HTTP — no tokio on this path) so the fetch/parse code is
compiled + checked everywhere; a host whose stores all self-provide art (Steam CDN /
Heroic CDN / Lutris data: URLs) does no fetching. Dep: ureq (webpki roots, no system certs).
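
Two details from the first paragraph, protocol-relative URL normalization and caching empty results, can be sketched as follows. The `ArtCache` type, its keys, and the example host are hypothetical; disk persistence and the `ureq` fetch are left out.

```rust
use std::collections::HashMap;

/// Turn a protocol-relative URL ("//host/path") into an https URL.
/// Anything else is returned unchanged.
pub fn normalize_art_url(raw: &str) -> String {
    match raw.strip_prefix("//") {
        Some(rest) => format!("https://{}", rest),
        None => raw.to_string(),
    }
}

/// In-memory stand-in for the art cache. `Some(None)` in the map means
/// "fetched, nothing found", which is what stops repeated re-fetching.
#[derive(Default)]
pub struct ArtCache {
    entries: HashMap<String, Option<String>>,
}

impl ArtCache {
    /// True only for keys the warmer has never tried.
    pub fn needs_fetch(&self, key: &str) -> bool {
        !self.entries.contains_key(key)
    }

    /// Record a fetch result, including an empty one.
    pub fn store(&mut self, key: &str, fetched: Option<&str>) {
        let url = fetched
            .map(str::trim)
            .filter(|u| !u.is_empty())
            .map(normalize_art_url);
        self.entries.insert(key.to_string(), url);
    }

    /// What a provider reads on the hot path: never blocks, may be None.
    pub fn cached_art(&self, key: &str) -> Option<&str> {
        self.entries.get(key).and_then(|v| v.as_deref())
    }
}
```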

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 08:00:31 +00:00
enricobuehler aed0bf0c2a feat(library): Windows Xbox / Game Pass store provider
XboxProvider scans each fixed drive's <drive>:\XboxGames for GDK games (presence of
Content\MicrosoftGame.config marks a game vs. an ordinary UWP app), parsing title /
Identity name / Executable Id / StoreId via roxmltree. The PackageFamilyName is READ
from the AppRepository\Packages\<PackageFullName> dir name (reduced to Name_Hash) —
never computed from the publisher. Launch via the AUMID (shell:AppsFolder\<PFN>!<AppId>)
through explorer in the interactive user session (UWP activation needs the user token,
which spawn_in_active_session already provides). Cover art (displaycatalog) is deferred
→ title-only. Known v1 gaps: custom .GamingRoot install folders + non-GDK pure-UWP Store
games (under the ACL-locked WindowsApps) aren't enumerated.

New windows_launch_for `aumid` arm; XboxProvider wired into all_games() under cfg(windows).
Dep: roxmltree (Windows). Windows unit tests cover MicrosoftGame.config parsing (incl. the
ms-resource title fallback), the PackageFullName→PFN reduction, and the aumid launch.
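
The PackageFullName→PFN reduction and the AUMID launch target can be sketched as below, assuming the usual five-field `Name_Version_Architecture_ResourceId_PublisherId` layout of a package full name. The function names and the `Contoso.Game` package in the usage note are hypothetical.

```rust
/// Reduce "Name_Version_Arch_ResourceId_PublisherId" to "Name_PublisherId".
/// The ResourceId field is often empty, which shows up as "__" in the name.
pub fn package_family_name(full_name: &str) -> Option<String> {
    let parts: Vec<&str> = full_name.split('_').collect();
    if parts.len() != 5 || parts[0].is_empty() || parts[4].is_empty() {
        return None;
    }
    Some(format!("{}_{}", parts[0], parts[4]))
}

/// Build the shell target that explorer is asked to open for an AUMID launch.
pub fn aumid_target(pfn: &str, app_id: &str) -> String {
    format!("shell:AppsFolder\\{}!{}", pfn, app_id)
}
```

For example, `package_family_name("Contoso.Game_1.2.3.0_x64__abcdefgh12345")` yields `Contoso.Game_abcdefgh12345`, which is read from the directory name rather than hashed from the publisher string.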

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:49:03 +00:00
enricobuehler b65745284e feat(library): Windows Epic + GOG store providers
EpicProvider reads the launcher's local .item manifests under %ProgramData% (no auth,
launcher need not run) with Playnite's exclusion filter (skip UE_* components +
non-launchable addons + dead install dirs); cover art from the base64 catcache.bin
(public Epic CDN, best-effort). Launch via the com.epicgames.launcher:// URI opened
through explorer.exe — the namespace:catalogItemId:appName triple, with a bare-appName
fallback so a launch is never dropped.

GogProvider enumerates HKLM\SOFTWARE\WOW6432Node\GOG.com\Games (winreg) + each
goggame-<id>.info primary FileTask into a direct-exe spawn (no Galaxy, dodges its
cold-start/anti-cheat). GOG cover art (public api.gog.com) is deferred — it needs an
HTTP fetch + cache off the hot all_games() path — so GOG is title-only for now.

windows_launch_for gains epic/gog arms; both providers wired into all_games() under
cfg(windows). Deps: base64 moved to the cross-platform table (Epic catcache decode +
Lutris art encode both need it); winreg added on the Windows target. Windows unit tests
cover the Epic exclusion filter + URI builder and the GOG spawn + play-task parsing.
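
A rough sketch of the Epic exclusion filter and URI builder follows. The struct and its fields are hypothetical simplifications of the `.item` manifest, and the `apps/<id>?action=launch&silent=true` path and query are an assumption based on the commonly used launcher URI form, not something the commit states; the real builder may also percent-encode the `:` separators.

```rust
/// Hypothetical, reduced view of one Epic `.item` manifest.
pub struct EpicItem<'a> {
    pub app_name: &'a str,
    pub namespace: Option<&'a str>,
    pub catalog_item_id: Option<&'a str>,
    pub is_addon: bool,
    pub install_dir_exists: bool,
}

/// Skip UE_* engine components, addons, and entries whose install dir is gone.
pub fn is_listable(item: &EpicItem) -> bool {
    !item.app_name.starts_with("UE_") && !item.is_addon && item.install_dir_exists
}

/// Prefer the namespace:catalogItemId:appName triple; fall back to the bare
/// appName so a launch request is never dropped.
pub fn launch_uri(item: &EpicItem) -> String {
    let id = match (item.namespace, item.catalog_item_id) {
        (Some(ns), Some(cat)) if !ns.is_empty() && !cat.is_empty() => {
            format!("{}:{}:{}", ns, cat, item.app_name)
        }
        _ => item.app_name.to_string(),
    };
    // Assumed URI shape; see the note above.
    format!("com.epicgames.launcher://apps/{}?action=launch&silent=true", id)
}
```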

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:37:30 +00:00
enricobuehler 8ca695eb4c docs(windows-host): SCM event redesign done + runtime-validated (D2 complete)
The service.rs STOP/SESSION events are now OnceLock<OwnedHandle> (61c02e6) — the
last host-side raw-handle smuggle retired. Runtime-validated on the RTX box: swap
in, sc start -> RUNNING, sc stop -> clean STOPPED in ~1s, original restored. D2
(OwnedHandle/RAII rollout) is complete; only the deferred host P0 lints remain in
Goal 3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:28:29 +00:00
127 changed files with 6890 additions and 287 deletions
+2 -2
@@ -1,9 +1,9 @@
# Root build context is used only by web/Dockerfile, which needs web/ and
-# docs/api/openapi.json. Allowlist those; keep everything else (target/, .git, crates)
+# api/openapi.json. Allowlist those; keep everything else (target/, .git, crates)
# out of the context upload.
*
!web
-!docs/api/openapi.json
+!api/openapi.json
web/node_modules
web/.output
web/dist
+1 -1
@@ -24,7 +24,7 @@ on:
push:
branches: [main]
# The flatpak is the CLIENT — only rebuild when the client/core/manifest change, not on every
-# docs/host push (this is a heavy flatpak-builder run). Tags (v*, the client release) build too.
+# design/host push (this is a heavy flatpak-builder run). Tags (v*, the client release) build too.
paths:
- 'clients/linux/**'
- 'crates/punktfunk-core/**'
@@ -1,5 +1,5 @@
# One-shot provisioning of the WDK + cargo-wdk onto the persistent self-hosted windows-amd64 runner, so
-# the all-Rust UMDF drivers can build there (docs/windows-host-rewrite.md, M0). The runner has the base
+# the all-Rust UMDF drivers can build there (design/windows-host-rewrite.md, M0). The runner has the base
# Windows SDK + MSVC + LLVM + Rust but NOT the WDK (no km/wdf/iddcx headers) or cargo-wdk.
#
# Dispatch manually (workflow_dispatch). Idempotent: re-running is a near no-op once provisioned. The
+1 -1
@@ -1,5 +1,5 @@
# Windows driver workspace CI — runs on the self-hosted Windows runner (home-windows-1, host mode;
-# label windows-amd64). Part of the Windows-host rewrite (docs/windows-host-rewrite.md, M0).
+# label windows-amd64). Part of the Windows-host rewrite (design/windows-host-rewrite.md, M0).
#
# Stage 1 (this file): PROBE the runner's driver toolchain (WDK / EWDK / cargo-make / LLVM / the
# inf2cat/stampinf/devgen/signtool tools) so we know what's provisioned BEFORE writing driver code,
+12
@@ -96,6 +96,18 @@ jobs:
# First-ever Windows lint coverage for the host (Linux CI never lints the windows-cfg code).
run: cargo clippy -p punktfunk-host --features nvenc,amf-qsv -- -D warnings
+- name: Build + lint the HDR Vulkan layer (pf-vkhdr-layer)
+shell: pwsh
+# Standalone cdylib (own [workspace]) the installer bundles + registers (it lets Vulkan games
+# like Doom use HDR on the virtual display). Lint here so a regression fails CI instead of
+# silently shipping the host without the layer (pack-host-installer.ps1 builds it non-fatally).
+# Windows-only FFI (user32 + the vk_layer loader glue) → can't be linted on the Linux CI.
+run: |
+Push-Location packaging/windows/pf-vkhdr-layer
+cargo fmt --check; if ($LASTEXITCODE) { throw "pf-vkhdr-layer rustfmt" }
+cargo clippy --release -- -D warnings; if ($LASTEXITCODE) { throw "pf-vkhdr-layer clippy" }
+Pop-Location
- name: Ensure Inno Setup
shell: pwsh
run: |
+24 -9
@@ -2,7 +2,7 @@
Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protocol core
(`punktfunk-core`) exposed over a C ABI and native clients per platform. Full design:
-[`docs/implementation-plan.md`](docs/implementation-plan.md). Status table: `README.md`.
+[`design/implementation-plan.md`](design/implementation-plan.md). Status table: `README.md`.
## Where the work stands
@@ -27,7 +27,15 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc
Input: mouse/keyboard (libei via RemoteDesktop portal on KWin/GNOME, gamescope's own EIS
socket, wlr protocols on Sway) and **gamepads** (uinput X-Box-360 pads + rumble
back-channel; validated live — pad created/destroyed with the session). Management REST API +
-checked-in OpenAPI doc (`mgmt.rs`).
+checked-in OpenAPI doc (`mgmt.rs`). **Web-console performance capture** (`stats_recorder.rs`,
+design: [`design/stats-capture-plan.md`](design/stats-capture-plan.md)): the operator arms stats
+recording from the web console, plays, stops, and reviews the run as graphs (per-stage latency
+breakdown · fps new/repeat · goodput · loss/FEC). A shared `Arc<StatsRecorder>` ring (the hot-path
+gate is a runtime `AtomicBool`, replacing the startup-only `PUNKTFUNK_PERF`) is fed by **both** the
+native `virtual_stream` and the GameStream encode loop at their existing ~2 s/~1 s aggregation
+boundary, and finished captures are saved as on-disk recordings
+(`~/.config/punktfunk/captures/*.json`) browsable/exportable from the console's **Performance** page
+(recharts). Endpoints `/api/v1/stats/*` (bearer-only). *Implemented; not yet on-glass validated.*
- **Native protocol (`punktfunk/1`): full session planes, validated live.** QUIC
control plane (`punktfunk-core` `quic` feature: Hello{mode}/Welcome{full Config}/Start), data
plane = the hardened core `Session` over raw UDP with **GF(2¹⁶) Leopard FEC + AES-GCM**
@@ -104,9 +112,16 @@ Low-latency desktop/game streaming stack, Linux-first, with a shared Rust protoc
captures the HDR desktop as FP16/Rgb10a2 (DDA FP16 for the secure desktop), the encoder forces HEVC
Main10 + BT.2020 PQ (NVENC ABGR10/P010; AMF/QSV P010 + a swscale Rgb10a2→P010 fallback), the client
auto-detects PQ from the HEVC VUI — gated by `PUNKTFUNK_10BIT` + client `VIDEO_CAP_10BIT`; **Windows
host only** (the Linux host stays 8-bit, blocked upstream). **AMF/QSV is CI-green but not yet
on-glass validated** (no AMD/Intel Windows box in the lab); NVENC is live-validated. Newer/less
battle-tested than the Linux host. Packaging: `packaging/windows/`.
host only** (the Linux host stays 8-bit, blocked upstream). **Vulkan-game HDR over the virtual
display**: NVIDIA/AMD Vulkan ICDs refuse to *advertise* an HDR color space for a surface on an IddCx
indirect display (so Vulkan games — Doom: The Dark Ages, id Tech, etc. — say "device does not support
HDR"), even though the ICD happily *accepts + presents* a forced HDR swapchain there. A tiny always-on
Vulkan **implicit layer** (`packaging/windows/pf-vkhdr-layer/`, `VK_LAYER_PUNKTFUNK_hdr_inject`)
injects the `HDR10_ST2084`/scRGB surface formats into `vkGetPhysicalDeviceSurfaceFormats[2]KHR`,
self-gated on the display's actual advanced-color state (no-op on SDR / real monitors); bundled +
HKLM-registered by the installer. **Live-validated: Doom: The Dark Ages enables HDR over the virtual
display.** **AMF/QSV is CI-green but not yet on-glass validated** (no AMD/Intel Windows box in the
lab); NVENC is live-validated. Newer/less battle-tested than the Linux host. Packaging: `packaging/windows/`.
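
The gating described above (append HDR surface formats only when the display is actually in advanced-color mode, otherwise pass the ICD's list through untouched) can be sketched as a pure function. This is a minimal illustration, not the layer's code: the `Format`, `ColorSpace` and `SurfaceFormat` types are symbolic stand-ins for `VkFormat`, `VkColorSpaceKHR` and `VkSurfaceFormatKHR`, the function name `inject_hdr_formats` is hypothetical, and which pixel format is paired with each color space is an assumption.

```rust
/// Symbolic stand-ins for Vulkan enums (assumption: a real layer uses the Vulkan types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Format {
    B8G8R8A8Unorm,
    A2B10G10R10UnormPack32,
    R16G16B16A16Sfloat,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColorSpace {
    SrgbNonlinear,
    Hdr10St2084,
    ExtendedSrgbLinear, // scRGB
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SurfaceFormat {
    pub format: Format,
    pub color_space: ColorSpace,
}

/// Formats the sketch appends; the format/color-space pairing is an assumption.
const INJECTED: [SurfaceFormat; 2] = [
    SurfaceFormat {
        format: Format::A2B10G10R10UnormPack32,
        color_space: ColorSpace::Hdr10St2084,
    },
    SurfaceFormat {
        format: Format::R16G16B16A16Sfloat,
        color_space: ColorSpace::ExtendedSrgbLinear,
    },
];

/// Returns the list a wrapped surface-format query would report.
/// No-op when the display is not in advanced-color (HDR) mode.
pub fn inject_hdr_formats(
    reported: &[SurfaceFormat],
    advanced_color_active: bool,
) -> Vec<SurfaceFormat> {
    let mut out = reported.to_vec();
    if !advanced_color_active {
        return out;
    }
    for extra in INJECTED.iter().copied() {
        // Skip formats the ICD already advertises so the list has no duplicates.
        if !out.contains(&extra) {
            out.push(extra);
        }
    }
    out
}
```

Keeping the decision in a pure function like this makes the SDR no-op and the no-duplicates behavior easy to unit-test without a Vulkan device.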
## What's left
@@ -245,8 +260,8 @@ bash crates/punktfunk-core/tests/c/run.sh # standalone C-ABI link + round-trip
```
Generated artifacts are **checked in** and CI fails on drift: `include/punktfunk_core.h`
(cbindgen from `punktfunk-core/src/abi.rs`) and `docs/api/openapi.json` (regenerate with
`cargo run -p punktfunk-host -- openapi > docs/api/openapi.json`; spec lives in `mgmt.rs`).
(cbindgen from `punktfunk-core/src/abi.rs`) and `api/openapi.json` (regenerate with
`cargo run -p punktfunk-host -- openapi > api/openapi.json`; spec lives in `mgmt.rs`).
CI is Gitea Actions (`.gitea/workflows/`, guide: docs-site `ci.md`): `ci.yml` runs the
workspace checks inside the `git.unom.io/unom/punktfunk-rust-ci` image plus web/docs-site
@@ -268,7 +283,7 @@ crates/punktfunk-host/
zerocopy/{egl,cuda,vulkan}.rs dmabuf → CUDA → NVENC (tiled via EGL/GL, LINEAR via Vulkan)
inject/{libei,wlr,gamepad,dualsense}.rs input backends (uinput xpad + UHID DualSense)
encode/{nvenc,linux,vaapi,ffmpeg_win,sw}.rs per-GPU encoders (NVENC · Linux NVENC/CUDA · VAAPI · AMF/QSV · openh264)
capture.rs · encode.rs · audio.rs · spike.rs · punktfunk1.rs · mgmt.rs · native_pairing.rs
capture.rs · encode.rs · audio.rs · spike.rs · punktfunk1.rs · mgmt.rs · native_pairing.rs · stats_recorder.rs
clients/probe/ punktfunk/1 reference/probe client (headless test/measurement tool)
clients/linux/ native Linux client (GTK4/libadwaita · FFmpeg · PipeWire · SDL3)
clients/windows/ native Windows client (WinUI 3 via windows-reactor · D3D11 · WASAPI · SDL3)
@@ -276,7 +291,7 @@ clients/apple/ native macOS/iOS/tvOS client (Swift · VideoToolbox · GameCon
clients/android/ native Android client (Kotlin app + native/ Rust JNI core over punktfunk-core)
clients/decky/ Steam Deck Decky plugin
crates/punktfunk-host/src/{capture/dxgi,vdisplay/sudovda,encode/ffmpeg_win,inject/gamepad_windows,audio/wasapi_*,service}.rs Windows host backends
web/ TanStack web console over the mgmt API (status · devices · pairing)
web/ TanStack web console over the mgmt API (status · devices · pairing · performance graphs)
packaging/ apt(deb) · RPM/COPR · Arch/sysext · Flatpak · Bazzite bootc · Windows host installer (per-dir READMEs)
tools/{loss-harness,latency-probe}/ measurement (plan §10)
scripts/ 60-punktfunk.rules · punktfunk-host.service · host.env.example · headless/
Generated
+332
@@ -2,6 +2,12 @@
# It is not intended for manual editing.
version = 3
[[package]]
name = "adler2"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa"
[[package]]
name = "aead"
version = "0.5.2"
@@ -735,6 +741,15 @@ dependencies = [
"libc",
]
[[package]]
name = "crc32fast"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9481c1c90cbf2ac953f07c8d4a58aa3945c425b7185c9154d67a65e4230da511"
dependencies = [
"cfg-if",
]
[[package]]
name = "criterion"
version = "0.5.1"
@@ -1100,6 +1115,16 @@ version = "0.5.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d674e81391d1e1ab681a28d99df07927c6d4aa5b027d7da16ba32d1d21ecd99"
[[package]]
name = "flate2"
version = "1.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "843fba2746e448b37e26a819579957415c8cef339bf08564fe8b7ddbd959573c"
dependencies = [
"crc32fast",
"miniz_oxide",
]
[[package]]
name = "flume"
version = "0.11.1"
@@ -1751,12 +1776,115 @@ dependencies = [
"tower-service",
]
[[package]]
name = "icu_collections"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2984d1cd16c883d7935b9e07e44071dca8d917fd52ecc02c04d5fa0b5a3f191c"
dependencies = [
"displaydoc",
"potential_utf",
"utf8_iter",
"yoke",
"zerofrom",
"zerovec",
]
[[package]]
name = "icu_locale_core"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "92219b62b3e2b4d88ac5119f8904c10f8f61bf7e95b640d25ba3075e6cac2c29"
dependencies = [
"displaydoc",
"litemap",
"tinystr",
"writeable",
"zerovec",
]
[[package]]
name = "icu_normalizer"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c56e5ee99d6e3d33bd91c5d85458b6005a22140021cc324cea84dd0e72cff3b4"
dependencies = [
"icu_collections",
"icu_normalizer_data",
"icu_properties",
"icu_provider",
"smallvec",
"zerovec",
]
[[package]]
name = "icu_normalizer_data"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da3be0ae77ea334f4da67c12f149704f19f81d1adf7c51cf482943e84a2bad38"
[[package]]
name = "icu_properties"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bee3b67d0ea5c2cca5003417989af8996f8604e34fb9ddf96208a033901e70de"
dependencies = [
"icu_collections",
"icu_locale_core",
"icu_properties_data",
"icu_provider",
"zerotrie",
"zerovec",
]
[[package]]
name = "icu_properties_data"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e2bbb201e0c04f7b4b3e14382af113e17ba4f63e2c9d2ee626b720cbce54a14"
[[package]]
name = "icu_provider"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "139c4cf31c8b5f33d7e199446eff9c1e02decfc2f0eec2c8d71f65befa45b421"
dependencies = [
"displaydoc",
"icu_locale_core",
"writeable",
"yoke",
"zerofrom",
"zerotrie",
"zerovec",
]
[[package]]
name = "id-arena"
version = "2.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3d3067d79b975e8844ca9eb072e16b31c3c1c36928edf9c6789548c524d0d954"
[[package]]
name = "idna"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3b0875f23caa03898994f6ddc501886a45c7d3d62d04d2d90788d47be1b1e4de"
dependencies = [
"idna_adapter",
"smallvec",
"utf8_iter",
]
[[package]]
name = "idna_adapter"
version = "1.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb68373c0d6620ef8105e855e7745e18b0d00d3bdb07fb532e434244cdb9a714"
dependencies = [
"icu_normalizer",
"icu_properties",
]
[[package]]
name = "if-addrs"
version = "0.15.0"
@@ -2022,6 +2150,12 @@ version = "0.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "32a66949e030da00e8c7d4434b251670a91556f4144941d37452769c25d58a53"
[[package]]
name = "litemap"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "92daf443525c4cce67b150400bc2316076100ce0b3686209eb8cf3c31612e6f0"
[[package]]
name = "lock_api"
version = "0.4.14"
@@ -2116,6 +2250,16 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "68354c5c6bd36d73ff3feceb05efa59b6acb7626617f4962be322a825e61f79a"
[[package]]
name = "miniz_oxide"
version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316"
dependencies = [
"adler2",
"simd-adler32",
]
[[package]]
name = "mio"
version = "1.2.1"
@@ -2548,6 +2692,15 @@ dependencies = [
"universal-hash",
]
[[package]]
name = "potential_utf"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0103b1cef7ec0cf76490e969665504990193874ea05c85ff9bab8b911d0a0564"
dependencies = [
"zerovec",
]
[[package]]
name = "powerfmt"
version = "0.2.0"
@@ -2728,6 +2881,7 @@ dependencies = [
"rand 0.8.6",
"rcgen",
"reis",
"roxmltree",
"rsa",
"rusqlite",
"rustls",
@@ -2741,6 +2895,7 @@ dependencies = [
"tower",
"tracing",
"tracing-subscriber",
"ureq",
"utoipa",
"utoipa-axum",
"utoipa-scalar",
@@ -2752,6 +2907,7 @@ dependencies = [
"wayland-scanner",
"windows 0.62.2 (registry+https://github.com/rust-lang/crates.io-index)",
"windows-service",
"winreg",
"x509-parser",
"xkbcommon",
]
@@ -3054,6 +3210,15 @@ dependencies = [
"windows-sys 0.52.0",
]
[[package]]
name = "roxmltree"
version = "0.21.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1964b10c76125c36f8afe190065a4bf9a87bf324842c05701330bba9f1cacbb"
dependencies = [
"memchr",
]
[[package]]
name = "rpkg-config"
version = "0.1.2"
@@ -3555,6 +3720,12 @@ dependencies = [
"rand_core 0.6.4",
]
[[package]]
name = "simd-adler32"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "703d5c7ef118737c72f1af64ad2f6f8c5e1921f818cdcb97b8fe6fc69bf66214"
[[package]]
name = "siphasher"
version = "1.0.3"
@@ -3637,6 +3808,12 @@ dependencies = [
"wasm-bindgen",
]
[[package]]
name = "stable_deref_trait"
version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596"
[[package]]
name = "strsim"
version = "0.11.1"
@@ -3789,6 +3966,16 @@ dependencies = [
"time-core",
]
[[package]]
name = "tinystr"
version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8323304221c2a851516f22236c5722a72eaa19749016521d6dff0824447d96d"
dependencies = [
"displaydoc",
"zerovec",
]
[[package]]
name = "tinytemplate"
version = "1.2.1"
@@ -4094,6 +4281,40 @@ version = "0.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ecb6da28b8a351d773b68d5825ac39017e680750f980f3a1a85cd8dd28a47c1"
[[package]]
name = "ureq"
version = "2.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "02d1a66277ed75f640d608235660df48c8e3c19f3b4edb6a263315626cc3c01d"
dependencies = [
"base64",
"flate2",
"log",
"once_cell",
"rustls",
"rustls-pki-types",
"url",
"webpki-roots 0.26.11",
]
[[package]]
name = "url"
version = "2.5.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ff67a8a4397373c3ef660812acab3268222035010ab8680ec4215f38ba3d0eed"
dependencies = [
"form_urlencoded",
"idna",
"percent-encoding",
"serde",
]
[[package]]
name = "utf8_iter"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6c140620e7ffbb22c2dee59cafe6084a59b5ffc27a8859a5f0d494b5d52b6be"
[[package]]
name = "utf8parse"
version = "0.2.2"
@@ -4421,6 +4642,24 @@ dependencies = [
"rustls-pki-types",
]
[[package]]
name = "webpki-roots"
version = "0.26.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "521bc38abb08001b01866da9f51eb7c5d647a19260e00054a8c7fd5f9e57f7a9"
dependencies = [
"webpki-roots 1.0.8",
]
[[package]]
name = "webpki-roots"
version = "1.0.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf85cb06032201fa7c6f829d7db5a7e5aa45bcc0655327713065f6f0576731bf"
dependencies = [
"rustls-pki-types",
]
[[package]]
name = "wide"
version = "0.7.33"
@@ -4946,6 +5185,16 @@ dependencies = [
"memchr",
]
[[package]]
name = "winreg"
version = "0.56.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7d6f32a0ff4a9f6f01231eb2059cc85479330739333e0e58cadf03b6af2cca10"
dependencies = [
"cfg-if",
"windows-sys 0.61.2",
]
[[package]]
name = "wit-bindgen"
version = "0.51.0"
@@ -5040,6 +5289,12 @@ dependencies = [
"wasmparser",
]
[[package]]
name = "writeable"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1ffae5123b2d3fc086436f8834ae3ab053a283cfac8fe0a0b8eaae044768a4c4"
[[package]]
name = "x509-parser"
version = "0.16.0"
@@ -5083,6 +5338,29 @@ dependencies = [
"time",
]
[[package]]
name = "yoke"
version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "709fe23a0424b6a435d82152b1bd3fdfb0833487d5fa90d05d42762a9891fef5"
dependencies = [
"stable_deref_trait",
"yoke-derive",
"zerofrom",
]
[[package]]
name = "yoke-derive"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "de844c262c8848816172cef550288e7dc6c7b7814b4ee56b3e1553f275f1858e"
dependencies = [
"proc-macro2",
"quote",
"syn",
"synstructure",
]
[[package]]
name = "zbus"
version = "5.16.0"
@@ -5159,12 +5437,66 @@ dependencies = [
"syn",
]
[[package]]
name = "zerofrom"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ec05a11813ea801ff6d75110ad09cd0824ddba17dfe17128ea0d5f68e6c5272"
dependencies = [
"zerofrom-derive",
]
[[package]]
name = "zerofrom-derive"
version = "0.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "11532158c46691caf0f2593ea8358fed6bbf68a0315e80aae9bd41fbade684a1"
dependencies = [
"proc-macro2",
"quote",
"syn",
"synstructure",
]
[[package]]
name = "zeroize"
version = "1.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b97154e67e32c85465826e8bcc1c59429aaaf107c1e4a9e53c8d8ccd5eff88d0"
[[package]]
name = "zerotrie"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0f9152d31db0792fa83f70fb2f83148effb5c1f5b8c7686c3459e361d9bc20bf"
dependencies = [
"displaydoc",
"yoke",
"zerofrom",
]
[[package]]
name = "zerovec"
version = "0.11.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "90f911cbc359ab6af17377d242225f4d75119aec87ea711a880987b18cd7b239"
dependencies = [
"yoke",
"zerofrom",
"zerovec-derive",
]
[[package]]
name = "zerovec-derive"
version = "0.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "625dc425cab0dca6dc3c3319506e6593dcb08a9f387ea3b284dbd52a92c40555"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "zmij"
version = "1.0.21"
+11 -3
@@ -1,13 +1,16 @@
# punktfunk
**Low-latency desktop and game streaming, Linux-first.** Run the host on a Linux machine — or a
Windows PC — with an NVIDIA GPU, connect from a Mac, PC, phone, tablet, or TV, and stream your desktop
**Low-latency desktop and game streaming with first-class Linux and Windows hosts.** Run the host on
a Linux machine or a Windows PC, connect from a Mac, PC, phone, tablet, or TV, and stream your desktop
or games — each device at its **own native resolution and refresh rate**, over your local network.
📖 **Documentation: [docs.punktfunk.unom.io](https://docs.punktfunk.unom.io)** — start with
[How It Works](https://docs.punktfunk.unom.io/docs/how-it-works) or the
[Quick Start](https://docs.punktfunk.unom.io/docs/quickstart).
💬 **Community: [Discord](https://discord.gg/kaPNvzMuGU)** — chat, support, and **Android beta
access** · **[r/Punktfunk](https://www.reddit.com/r/Punktfunk/)**.
punktfunk pairs a **virtual-display streaming host** with native clients on every platform. It speaks
the existing **GameStream** protocol, so any [Moonlight](https://moonlight-stream.org/) client works
day one — and adds its own faster **`punktfunk/1`** protocol that breaks the ~1 Gbps FEC wall with a
@@ -19,6 +22,11 @@ protocol, FEC, and crypto, linked into the host and every client over a stable C
- **Your device's exact mode.** For each client that connects, the host spins up a virtual display
sized to that device — 1080p60 to a laptop, 1440p120 to a desktop, 4K to a TV, all at once. No
letterboxing, no scaling, no rearranging your real monitors.
- **A real virtual display on Windows, too.** On Linux the host uses per-compositor virtual outputs;
on Windows you get the same on-the-fly virtual display — at the client's exact mode, no physical
monitor or dummy HDMI plug, even on the secure desktop (UAC / lock screen). It also has **its own
indirect display driver (IDD)** the host pushes finished frames straight into, rather than scraping
a screen — tight, push-based integration that's unusual for a Windows streaming host.
- **Low latency, GPU end to end.** Frames go straight from the compositor to the NVENC encoder with
zero CPU copies (dmabuf → CUDA/Vulkan → NVENC), over a transport tuned for responsiveness rather
than throughput. Stable 240 fps at 5120×1440; sub-millisecond capture-to-reassembly on a LAN.
@@ -124,7 +132,7 @@ clients/
web/ web console (TanStack) over the management API — status · devices · pairing
packaging/ apt · rpm / COPR · Arch · Flatpak · Bazzite bootc image
docs-site/ public documentation site (Fumadocs) — https://docs.punktfunk.unom.io
docs/ design notes & deep-dive plans
design/ design notes & deep-dive plans
include/punktfunk_core.h cbindgen-generated C header (checked in)
tools/ latency-probe · loss-harness (measurement)
```
+528
@@ -978,6 +978,309 @@
}
}
},
"/api/v1/stats/capture/live": {
"get": {
"tags": [
"stats"
],
"summary": "Live in-progress capture",
"description": "The full sample time-series of the capture currently recording, for live graphing. `404` when\nnothing is armed.",
"operationId": "statsCaptureLive",
"responses": {
"200": {
"description": "The in-progress capture (meta + samples so far)",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Capture"
}
}
}
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
},
"404": {
"description": "No capture is currently recording",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
}
},
"/api/v1/stats/capture/start": {
"post": {
"tags": [
"stats"
],
"summary": "Start a stats capture",
"description": "Arms a new performance-stats capture. Idempotent: if a capture is already running this returns\nthe current status unchanged. While armed, the streaming loops emit aggregated samples (~ every\n1–2 s) into the in-progress capture, readable live via `GET /stats/capture/live`.",
"operationId": "statsCaptureStart",
"responses": {
"200": {
"description": "Capture armed (or already running)",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/StatsStatus"
}
}
}
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
}
},
"/api/v1/stats/capture/status": {
"get": {
"tags": [
"stats"
],
"summary": "Stats capture status",
"description": "Whether a capture is armed, its sample count, and start time. Poll this (e.g. every 2 s) to\ndrive the capture-control UI.",
"operationId": "statsCaptureStatus",
"responses": {
"200": {
"description": "In-progress capture status (idle when not armed)",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/StatsStatus"
}
}
}
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
}
},
"/api/v1/stats/capture/stop": {
"post": {
"tags": [
"stats"
],
"summary": "Stop the stats capture",
"description": "Disarms the in-progress capture and writes it to disk atomically, returning its summary. If\nnothing was recording, returns `204 No Content`.",
"operationId": "statsCaptureStop",
"responses": {
"200": {
"description": "Capture stopped and saved",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/CaptureMeta"
}
}
}
},
"204": {
"description": "Nothing was recording"
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
},
"500": {
"description": "Could not write the recording to disk",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
}
},
"/api/v1/stats/recordings": {
"get": {
"tags": [
"stats"
],
"summary": "List saved recordings",
"description": "Every saved capture's summary (the `meta` head only — not the sample body), newest first.",
"operationId": "statsRecordingsList",
"responses": {
"200": {
"description": "Saved capture summaries, newest first",
"content": {
"application/json": {
"schema": {
"type": "array",
"items": {
"$ref": "#/components/schemas/CaptureMeta"
}
}
}
}
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
}
},
"/api/v1/stats/recordings/{id}": {
"get": {
"tags": [
"stats"
],
"summary": "Get a saved recording",
"description": "The full capture (meta + samples) for `id`, for graphing or download.",
"operationId": "statsRecordingGet",
"parameters": [
{
"name": "id",
"in": "path",
"description": "The recording id (its filename stem)",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "The full capture",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Capture"
}
}
}
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
},
"404": {
"description": "No recording with that id",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
},
"500": {
"description": "The recording file is unreadable",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
},
"delete": {
"tags": [
"stats"
],
"summary": "Delete a saved recording",
"description": "Removes the recording `id` from disk. `404` if there is no such recording.",
"operationId": "statsRecordingDelete",
"parameters": [
{
"name": "id",
"in": "path",
"description": "The recording id (its filename stem)",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"204": {
"description": "Recording deleted"
},
"401": {
"description": "Missing or invalid bearer token",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
},
"404": {
"description": "No recording with that id",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
},
"500": {
"description": "Could not delete the recording",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ApiError"
}
}
}
}
}
}
},
"/api/v1/status": {
"get": {
"tags": [
@@ -1125,6 +1428,89 @@
}
}
},
"Capture": {
"type": "object",
"description": "A full capture: summary + the sample time-series. The wire + on-disk shape.",
"required": [
"meta",
"samples"
],
"properties": {
"meta": {
"$ref": "#/components/schemas/CaptureMeta"
},
"samples": {
"type": "array",
"items": {
"$ref": "#/components/schemas/StatsSample"
}
}
}
},
"CaptureMeta": {
"type": "object",
"description": "Capture summary — the filename stem plus the negotiated mode/codec/client. Stored at the head\nof each on-disk recording and listed standalone (without the sample body) by\n[`StatsRecorder::list`].",
"required": [
"id",
"started_unix_ms",
"duration_ms",
"kind",
"width",
"height",
"fps",
"codec",
"client",
"sample_count"
],
"properties": {
"client": {
"type": "string",
"description": "Short label / fingerprint prefix, or `\"\"` if unknown."
},
"codec": {
"type": "string",
"description": "`\"h264\" | \"hevc\" | \"av1\"`."
},
"duration_ms": {
"type": "integer",
"format": "int64",
"minimum": 0
},
"fps": {
"type": "integer",
"format": "int32",
"minimum": 0
},
"height": {
"type": "integer",
"format": "int32",
"minimum": 0
},
"id": {
"type": "string",
"description": "e.g. `\"2026-06-26T20-14-03Z_5120x1440\"` — also the filename stem."
},
"kind": {
"type": "string",
"description": "`\"native\" | \"gamestream\"`."
},
"sample_count": {
"type": "integer",
"format": "int32",
"minimum": 0
},
"started_unix_ms": {
"type": "integer",
"format": "int64",
"minimum": 0
},
"width": {
"type": "integer",
"format": "int32",
"minimum": 0
}
}
},
"CustomEntry": {
"type": "object",
"description": "A user-added title, persisted in `~/.config/punktfunk/library.json`. Same shape the API\nreturns and the web console edits.",
@@ -1595,6 +1981,144 @@
}
}
},
"StageTiming": {
"type": "object",
"description": "One pipeline stage's latency in an aggregation window (microseconds).",
"required": [
"name",
"p50_us",
"p99_us"
],
"properties": {
"name": {
"type": "string",
"description": "`\"capture\" | \"submit\" | \"encode\" | \"packetize\" | \"send\"` (path-dependent)."
},
"p50_us": {
"type": "number",
"format": "float"
},
"p99_us": {
"type": "number",
"format": "float"
}
}
},
"StatsSample": {
"type": "object",
"description": "One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).",
"required": [
"t_ms",
"session_id",
"stages",
"fps",
"repeat_fps",
"mbps",
"bitrate_kbps",
"frames_dropped",
"packets_dropped",
"send_dropped",
"fec_recovered"
],
"properties": {
"bitrate_kbps": {
"type": "integer",
"format": "int32",
"description": "Configured target bitrate.",
"minimum": 0
},
"fec_recovered": {
"type": "integer",
"format": "int32",
"description": "FEC shards recovered this window (delta).",
"minimum": 0
},
"fps": {
"type": "number",
"format": "float",
"description": "Genuine NEW frames/s from the source."
},
"frames_dropped": {
"type": "integer",
"format": "int32",
"description": "Frames dropped this window (delta).",
"minimum": 0
},
"mbps": {
"type": "number",
"format": "float",
"description": "Transmit goodput (Mb/s)."
},
"packets_dropped": {
"type": "integer",
"format": "int32",
"description": "Packets dropped this window (receiver-side / reassembler, where known).",
"minimum": 0
},
"repeat_fps": {
"type": "number",
"format": "float",
"description": "Re-encoded holds/s (source-starvation indicator)."
},
"send_dropped": {
"type": "integer",
"format": "int32",
"description": "Host send-buffer overflow / EAGAIN this window (delta).",
"minimum": 0
},
"session_id": {
"type": "integer",
"format": "int32",
"description": "Disambiguates concurrent sessions (usually constant).",
"minimum": 0
},
"stages": {
"type": "array",
"items": {
"$ref": "#/components/schemas/StageTiming"
},
"description": "Ordered pipeline stages for this path."
},
"t_ms": {
"type": "integer",
"format": "int64",
"description": "Milliseconds since capture start (monotonic; stamped by [`StatsRecorder::push_sample`]).",
"minimum": 0
}
}
},
"StatsStatus": {
"type": "object",
"description": "Snapshot of the in-progress capture for the management API.",
"required": [
"armed",
"sample_count",
"started_unix_ms",
"kind"
],
"properties": {
"armed": {
"type": "boolean",
"description": "Capture currently running."
},
"kind": {
"type": "string",
"description": "Path of the in-progress capture (`\"\"` if idle)."
},
"sample_count": {
"type": "integer",
"format": "int32",
"description": "Samples in the in-progress capture.",
"minimum": 0
},
"started_unix_ms": {
"type": "integer",
"format": "int64",
"description": "Unix start time of the in-progress capture (`0` if idle).",
"minimum": 0
}
}
},
"StreamInfo": {
"type": "object",
"description": "RTSP-negotiated stream parameters.",
@@ -1696,6 +2220,10 @@
{
"name": "library",
"description": "Game library: installed-store titles (Steam) plus user-curated custom entries"
},
{
"name": "stats",
"description": "Streaming performance-stats capture: arm/stop a recording, read the live + saved time-series for graphing"
}
]
}
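
The capture lifecycle behind these endpoints (arm, push samples while armed, stop, save) can be sketched with the standard library alone. This is a hedged illustration under stated assumptions, not the code in `stats_recorder.rs`: `Recorder`, `Sample`, `is_safe_id` and `write_atomic` are hypothetical names, the sample is trimmed to two fields, the allowed id alphabet is a guess based on the example id in the schema, and the temp-file naming is invented for the example.

```rust
use std::collections::VecDeque;
use std::fs;
use std::io;
use std::path::Path;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical, trimmed-down sample (the real `StatsSample` has more fields).
#[derive(Clone, Debug, PartialEq)]
pub struct Sample {
    pub t_ms: u64,
    pub fps: f32,
}

/// Illustrative recorder: an atomic gate in front of a bounded ring.
pub struct Recorder {
    armed: AtomicBool,
    ring: Mutex<VecDeque<Sample>>,
    cap: usize, // assumed > 0
}

impl Recorder {
    pub fn new(cap: usize) -> Self {
        Self {
            armed: AtomicBool::new(false),
            ring: Mutex::new(VecDeque::with_capacity(cap)),
            cap,
        }
    }

    /// Poison-resilient lock: keep using the data if another thread panicked.
    fn lock(&self) -> MutexGuard<'_, VecDeque<Sample>> {
        self.ring.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Idempotent arm: returns false and keeps existing samples if already armed.
    pub fn arm(&self) -> bool {
        let mut ring = self.lock();
        if self.armed.load(Ordering::Acquire) {
            return false;
        }
        ring.clear();
        self.armed.store(true, Ordering::Release);
        true
    }

    /// Hot path: one atomic load when idle; drops the oldest sample when full.
    pub fn push(&self, sample: Sample) {
        if !self.armed.load(Ordering::Relaxed) {
            return;
        }
        let mut ring = self.lock();
        if ring.len() == self.cap {
            ring.pop_front();
        }
        ring.push_back(sample);
    }

    /// Disarms and returns the samples, or `None` if nothing was recording.
    pub fn stop(&self) -> Option<Vec<Sample>> {
        if !self.armed.swap(false, Ordering::AcqRel) {
            return None;
        }
        Some(self.lock().drain(..).collect())
    }
}

/// Rejects ids that could escape the captures directory (assumed alphabet).
pub fn is_safe_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 128
        && id.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}

/// Writes `<dir>/<id>.json` via a temp file plus rename, so a reader never
/// observes a half-written recording.
pub fn write_atomic(dir: &Path, id: &str, body: &[u8]) -> io::Result<()> {
    if !is_safe_id(id) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "unsafe recording id"));
    }
    let tmp = dir.join(format!(".{}.json.tmp", id));
    let dst = dir.join(format!("{}.json", id));
    fs::write(&tmp, body)?;
    fs::rename(&tmp, &dst)
}
```

In this sketch `stop()` returning `None` corresponds to the `204 No Content` case of `POST /stats/capture/stop`, and a rejected id would map to the `404` of the `/stats/recordings/{id}` routes; how the real handlers map these cases is not shown in the diff.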
+1 -1
@@ -361,4 +361,4 @@ ever switched to a logged-in GUI session, re-adding macOS to the job's capture s
- Mid-stream renegotiation (resolution change without reconnect) is designed-for but not
implemented (the Welcome is one-shot today).
- Host-side gamepad injection needs `/dev/uinput` access on the box (udev rule from
`docs/linux-setup.md`).
`design/linux-setup.md`).
+1 -1
@@ -276,7 +276,7 @@ pub mod frame {
/// These were hand-duplicated as `OFF_*`/`SHM_*` constants in `inject/{gamepad,dualsense}_windows.rs`
/// and (as bare literals — `*view.add(140)`) in the standalone `xusb-driver`/`dualsense-driver`
/// workspaces, guarded only by "must match" comments — the top ABI-drift hazard the audit flagged
/// (`docs/windows-host-rewrite.md` §2.7). Owning them here with `Pod` derives + `offset_of!`
/// (`design/windows-host-rewrite.md` §2.7). Owning them here with `Pod` derives + `offset_of!`
/// asserts makes a one-sided edit a compile error.
///
/// The host creates the section (privileged, permissive DACL so the restricted WUDFHost token can
+14 -3
@@ -25,6 +25,14 @@ aes-gcm = "0.10"
cbc = { version = "0.1", features = ["alloc"] }
rand = "0.8"
hex = "0.4"
# Cover-art delivery in the game library: encode Lutris's local JPEGs into `data:` URLs and decode
# the Epic launcher's base64 `catcache.bin`. Cross-platform (Linux Lutris art + Windows Epic art).
base64 = "0.22"
# Blocking HTTP for the library cover-art warmer (no-auth GOG api.gog.com + Xbox displaycatalog),
# run on a background thread off the hot path. `ureq` is small + sync (no tokio here) and bundles
# webpki roots (no system cert dependency). Cross-platform so the fetch/parse code is compiled +
# checked everywhere even though only the Windows GOG/Xbox providers need it today.
ureq = "2"
rcgen = { version = "0.13", default-features = false, features = ["aws_lc_rs", "pem"] }
x509-parser = "0.16"
axum-server = { version = "0.7", features = ["tls-rustls"] }
@@ -89,9 +97,6 @@ serde_json = "1"
# SQLite (cc, already needed for ffmpeg/opus) so there's no system libsqlite3 runtime dependency —
# clean for the deb/rpm/flatpak packaging. Opened read-only/immutable (Lutris may hold it open).
rusqlite = { version = "0.40", features = ["bundled"] }
# Inline Lutris's local cover-art JPEGs as `data:` URLs in the library (Lutris has no public CDN
# keyed by a stable id, unlike Steam/Heroic; a `data:` URL is self-contained — no host-served endpoint).
base64 = "0.22"
# Builds/validates the xkb keymap uploaded to the virtual keyboard + tracks modifier state.
xkbcommon = "0.8"
# The safe `opus` crate is stereo-only; surround (5.1/7.1) needs the libopus *multistream*
@@ -176,6 +181,12 @@ windows = { version = "0.62", features = [
# handler / ServiceManager install). Wraps the Win32 service API; the supervision loop itself uses
# the `windows` crate above.
windows-service = "0.7"
# Read the GOG.com install registry (HKLM\SOFTWARE\WOW6432Node\GOG.com\Games) for the GOG store
# provider — ergonomic + correct-by-construction vs. hand-rolled Reg* FFI for subkey enumeration.
winreg = "0.56"
# Parse each Xbox/Game-Pass game's MicrosoftGame.config (GDK manifest XML) for the Xbox store
# provider — a small read-only DOM is all we need (Identity/Executable/ShellVisuals/StoreId).
roxmltree = "0.21"
# Software H.264 encoder (GPU-less path + NVENC fallback). The default `source` feature statically
# compiles OpenH264 (BSD-2) — no system lib, builds on MSVC; nasm on PATH adds the SIMD fast path.
openh264 = "0.9"
@@ -13,6 +13,9 @@
//! when the client isn't talking. WASAPI objects are `!Send`, so they live entirely on that thread
//! (mirrors `WasapiLoopbackCapturer`).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{VirtualMic, SAMPLE_RATE};
use anyhow::{anyhow, Context, Result};
use std::collections::VecDeque;
@@ -154,6 +157,13 @@ fn find_or_install_device() -> Result<wasapi::Device> {
Ok(d) => Ok(d),
Err(e) => {
tracing::info!("no virtual mic device present — attempting auto-install");
// SAFETY: `try_install_virtual_mic` is `unsafe` only because it `LoadLibraryExW`s
// `newdev.dll` and calls `DiInstallDriverW` through a `transmute`d function pointer;
// calling it imposes no extra precondition here (it takes no args and aliases nothing).
// Its internal contract holds: the `DiInstall` type matches the documented
// `BOOL DiInstallDriverW(HWND, PCWSTR, DWORD, PBOOL)` ABI, and it passes a
// NUL-terminated UTF-16 INF path with null/zero optional args. Invoked once on the
// dedicated mic thread.
if unsafe { try_install_virtual_mic() } {
find_device()
} else {
@@ -2,6 +2,10 @@
//! CPU-copy fallback (the portal delivers a CPU buffer; the encoder uploads it to the GPU
//! internally). Zero-copy dmabuf→NVENC import is deferred (plan §9 risk).
// Every unsafe block in this module tree carries a `// SAFETY:` proof; enforce it (unsafe-proof
// program). As a parent module this also covers the child modules (capture::windows/linux::*).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::Result;
/// Packed pixel layout of a [`CapturedFrame`]. The ScreenCast portal negotiates the
@@ -433,6 +437,11 @@ pub fn capture_virtual_output(
// DDA is the safety net (+ the secure-desktop path). The encode thread is set MTA so the WGC
// objects built on the watchdog thread (also MTA) are usable here; the keepalive is handed to WGC
// only on success, else to DDA. A hung watchdog thread is abandoned (holds no keepalive).
// SAFETY: `RoInitialize` is a combase FFI call that initializes the WinRT apartment for the calling
// thread. It takes the `RO_INIT_MULTITHREADED` enum by value and borrows no memory, so there is no
// pointer/lifetime/aliasing obligation; it is safe on any thread, and a repeat call is harmless: it
// returns S_FALSE if the thread is already in the multithreaded apartment, or RPC_E_CHANGED_MODE if it
// was initialized with a different apartment model, and we discard the result either way.
// Runs on the encode thread that goes on to use the WGC (WinRT) objects built by the watchdog thread.
unsafe {
let _ = windows::Win32::System::WinRT::RoInitialize(
windows::Win32::System::WinRT::RO_INIT_MULTITHREADED,
@@ -17,6 +17,9 @@
//! instead of leaking it to process exit. The portal thread (when used) still parks on its zbus
//! connection until process exit.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{CapturedFrame, Capturer, DmabufFrame, FramePayload, PixelFormat};
use anyhow::{anyhow, Context, Result};
use std::os::fd::OwnedFd;
@@ -498,6 +501,12 @@ mod pipewire {
impl DmabufMap {
fn new(fd: i32, len: usize) -> Option<DmabufMap> {
// SAFETY: a null `addr` lets the kernel choose the mapping address; `fd` is a caller-owned
// dmabuf/MemFd fd, valid for the duration of this call, and `len` is the requested map length.
// `mmap` reads no Rust memory — it installs a fresh PROT_READ/MAP_SHARED page mapping and
// returns its base (or MAP_FAILED, checked below before `DmabufMap` adopts it). The returned
// region is a brand-new VMA, so it aliases no live Rust object, and it keeps the underlying
// object mapped independently of `fd` (which may be closed after this returns).
let ptr = unsafe {
libc::mmap(
std::ptr::null_mut(),
@@ -514,6 +523,11 @@ mod pipewire {
impl Drop for DmabufMap {
fn drop(&mut self) {
// SAFETY: `self.ptr`/`self.len` are exactly the base+length of a successful `mmap` in
// `DmabufMap::new` (constructed only when `ptr != MAP_FAILED`). This `DmabufMap` uniquely owns
// that mapping and `drop` runs once, so `munmap` releases a live mapping exactly once — no
// double-unmap. Every `&[u8]` derived from the mapping is bounded by this `DmabufMap`'s
// lifetime, so no borrow outlives the unmap.
unsafe {
libc::munmap(self.ptr, self.len);
}
@@ -719,6 +733,14 @@ mod pipewire {
if !ud.active.load(Ordering::Relaxed) {
return;
}
// SAFETY: `spa_buf` is the `*mut spa_buffer` of the PipeWire buffer we dequeued and still hold for
// this `.process` callback (not requeued until after `consume_frame` returns), so it is live. The
// block null-checks `spa_buf`, requires `n_datas != 0`, and null-checks the `datas` array pointer
// before forming any slice. `(*spa_buf).datas` points to `n_datas` libspa `spa_data` structs, and
// `pw::spa::buffer::Data` is `#[repr(transparent)]` over `spa_data` (the same cast
// `Buffer::datas_mut` performs — see the function doc), so the pointer cast + length describe
// exactly that array, in bounds. The PipeWire loop is single-threaded and owns the buffer here, so
// this `&mut` slice is the only reference to it (no aliasing/data race).
let datas: &mut [pw::spa::buffer::Data] = unsafe {
if spa_buf.is_null() || (*spa_buf).n_datas == 0 || (*spa_buf).datas.is_null() {
&mut []
@@ -783,6 +805,10 @@ mod pipewire {
// dup the fd so it survives the SPA buffer recycle — the encode thread
// imports it. (Content stability across the brief map+CSC window relies on
// the compositor's buffer-pool depth, like any zero-copy capture.)
// SAFETY: `datas[0].fd()` is the dmabuf fd owned by the live PipeWire buffer (valid
// for this callback). `fcntl(fd, F_DUPFD_CLOEXEC, 0)` reads only the integer fd,
// touches no Rust memory, and returns a fresh independent CLOEXEC duplicate (or -1).
// The original stays owned by PipeWire; the dup is a new fd we own (checked >= 0).
let dup =
unsafe { libc::fcntl(datas[0].fd() as i32, libc::F_DUPFD_CLOEXEC, 0) };
if dup >= 0 {
@@ -796,6 +822,10 @@ mod pipewire {
pts_ns,
format: fmt,
payload: FramePayload::Dmabuf(DmabufFrame {
// SAFETY: `dup` is the fresh fd `fcntl(F_DUPFD_CLOEXEC)` just returned
// (checked `dup >= 0`); nothing else owns it, so `OwnedFd` takes sole
// ownership and closes it exactly once on drop — no alias, no
// double-close.
fd: unsafe { OwnedFd::from_raw_fd(dup) },
fourcc,
modifier: ud.modifier,
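The dup-then-own step in the hunk above can also be sketched with the standard library alone. This is a minimal illustration, not the code in this commit: `dup_owned` is a hypothetical helper name, and it assumes a Unix target and a toolchain new enough to have `std::os::fd`. `BorrowedFd::try_clone_to_owned` returns a new descriptor that shares the same underlying file description, so the original can stay with (or be closed by) its owner while the duplicate is closed exactly once when the `OwnedFd` drops.

```rust
use std::io;
use std::os::fd::{AsFd, OwnedFd};

/// Hypothetical helper: duplicate a descriptor we only borrow into one we own.
/// The source descriptor is left untouched and still belongs to its owner.
fn dup_owned(src: &impl AsFd) -> io::Result<OwnedFd> {
    // `as_fd` borrows the descriptor for the duration of the call;
    // `try_clone_to_owned` hands back an independent `OwnedFd` (or an error),
    // so no `unsafe` and no manual `>= 0` check is needed here.
    src.as_fd().try_clone_to_owned()
}
```

Whether this can replace the raw `fcntl` call in the capture path depends on whether a `BorrowedFd` can be formed soundly from the PipeWire buffer's fd at that point; that is not established here.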
@@ -930,6 +960,11 @@ mod pipewire {
// cleanly if the real buffer is genuinely too small. MemPtr buffers (no fd) are same-process —
// trust `d.data()`.
let fd_len = if raw_fd > 0 {
// SAFETY: `libc::stat` is a C plain-old-data struct for which all-zero is a valid value, so
// `mem::zeroed()` is a sound initializer. `raw_fd` is the buffer's fd (`> 0` checked here) and
// valid for this callback; `fstat` writes metadata into `&mut st`, a live, aligned,
// correctly-sized stack `stat` that outlives the synchronous call. `st.st_size` is read only
// after the return value is confirmed `== 0`. `st` is a fresh local, so nothing aliases it.
unsafe {
let mut st: libc::stat = std::mem::zeroed();
(libc::fstat(raw_fd as i32, &mut st) == 0 && st.st_size > 0)
@@ -946,6 +981,14 @@ mod pipewire {
match DmabufMap::new(raw_fd as i32, map_len) {
Some(m) => {
_mapping = m;
// SAFETY: `_mapping` is the `DmabufMap` just stored; its `ptr`/`len` come from a
// successful `mmap` of `map_len` PROT_READ bytes, so `ptr` is non-null, page-aligned,
// and the VMA is one allocated object of `len` bytes valid for reads. In the common
// path `map_len == fd_len` (the fd's real size from `fstat`), so the mapping spans the
// whole object; the de-pad copy below is further bounded by the `offset <= buf.len()`
// and `needed > avail` guards. The `&[u8]` borrows `_mapping`, which lives to the end
// of `consume_frame`, so the slice never outlives the mapping, and the memory is only
// read here, so there is no aliasing/mutation.
Some(unsafe {
std::slice::from_raw_parts(_mapping.ptr as *const u8, _mapping.len)
})
@@ -1177,24 +1220,43 @@ mod pipewire {
// Latest-frame-only (OBS pattern): Mutter delivers buffers in bursts and
// recycles its pool; an older queued buffer carries a STALE frame. Drain all
// queued buffers, requeue the older ones, keep only the newest.
// SAFETY: `stream` is the live stream PipeWire passes into this `.process` callback on
// the loop thread, where `pw_stream_dequeue_buffer` is the documented call. It returns
// a `*mut pw_buffer` owned by the stream (or null when the queue is drained),
// null-checked before any use. The loop is single-threaded, so no concurrent access.
let mut newest = unsafe { stream.dequeue_raw_buffer() };
if newest.is_null() {
return;
}
let mut drained = 1u32;
loop {
// SAFETY: same stream/loop-thread contract as the dequeue above; each call returns
// the next stream-owned `*mut pw_buffer` or null (null-checked before use).
let next = unsafe { stream.dequeue_raw_buffer() };
if next.is_null() {
break;
}
// SAFETY: `newest` is a non-null `*mut pw_buffer` previously dequeued from this same
// stream and not yet requeued; `pw_stream_queue_buffer` hands ownership back to the
// stream. We immediately overwrite `newest = next`, so the requeued pointer is never
// touched again (no use-after-requeue). Loop thread, single-threaded.
unsafe { stream.queue_raw_buffer(newest) };
newest = next;
drained += 1;
}
// SAFETY: `newest` is the non-null buffer we still own (dequeued, not requeued);
// `.buffer` is a `*mut spa_buffer` field libpipewire populated. This is a single field
// load through a valid pointer — no mutation or aliasing.
let spa_buf = unsafe { (*newest).buffer };
// Inspect the newest buffer's header + first chunk for the diagnostic and the
// CORRUPTED skip. SPA_META_Header is optional — `hdr` may be null.
// SAFETY: `spa_buf` is the `*mut spa_buffer` of the buffer we still hold.
// `spa_buffer_find_meta_data` scans that buffer's metadata array for a `SPA_META_Header`
// of at least `size_of::<spa_meta_header>()` bytes and returns a pointer into the held
// buffer's metadata (or null). The size argument matches the struct the result is cast
// to, and the pointer stays valid as long as the buffer is held (until requeue). Null is
// handled below.
let hdr = unsafe {
spa::sys::spa_buffer_find_meta_data(
spa_buf,
@@ -1205,11 +1267,20 @@ mod pipewire {
let hdr_flags = if hdr.is_null() {
0u32
} else {
// SAFETY: reached only when `hdr` is non-null; it points to a `spa_meta_header`
// inside the live buffer's metadata (returned for a size >=
// `size_of::<spa_meta_header>()`, so `.flags` is in bounds). A single field read
// while the buffer is still held.
unsafe { (*hdr).flags }
};
// First data chunk's size + flags (used for the diagnostic + CORRUPTED check)
// and its data type (a dmabuf legitimately reports chunk size 0, so the size-0
// stale skip only applies to mappable SHM buffers).
// SAFETY: every dereference is guarded in order before any field read — `spa_buf`
// non-null, `n_datas > 0`, the `datas` (`*mut spa_data`) array non-null, and the first
// element's `chunk` (`*mut spa_chunk`) non-null. `d0` is that first `spa_data` and `c`
// its chunk; reading `(*d0).type_`, `(*c).size`, `(*c).flags` are in-bounds field loads
// of libspa structs inside the buffer we still hold. Single-threaded loop, no mutation.
let (chunk_size, chunk_flags, is_dmabuf) = unsafe {
if !spa_buf.is_null()
&& (*spa_buf).n_datas > 0
@@ -1246,11 +1317,17 @@ mod pipewire {
"capture: skipped a stale CORRUPTED/cursor buffer (GNOME)"
);
}
// SAFETY: `newest` is the non-null buffer we own (dequeued, never requeued on this
// skip path); hand it back to the stream exactly once and return without touching it
// again. Loop thread inside `.process`.
unsafe { stream.queue_raw_buffer(newest) };
return;
}
consume_frame(ud, spa_buf);
// SAFETY: `consume_frame` has finished reading `spa_buf` (and the `datas` borrows derived
// from `newest`), so requeuing the owned `newest` exactly once here is sound — no
// use-after-requeue. Loop thread inside `.process`.
unsafe { stream.queue_raw_buffer(newest) };
}));
if outcome.is_err() {
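The latest-frame-only drain described in this hunk (dequeue everything, requeue the older buffers, keep only the newest) can be sketched without PipeWire. The following is an illustrative model under stated assumptions: a `VecDeque` stands in for the stream's queue, and `drain_to_newest` and the `requeue` callback are hypothetical names, not part of the PipeWire bindings.

```rust
use std::collections::VecDeque;

/// Pop every queued item, hand all but the newest back through `requeue`,
/// and return the newest together with the number of items drained.
/// Returns `None` when the queue was already empty.
fn drain_to_newest<T>(
    queue: &mut VecDeque<T>,
    mut requeue: impl FnMut(T),
) -> Option<(T, u32)> {
    let mut newest = queue.pop_front()?;
    let mut drained = 1u32;
    while let Some(next) = queue.pop_front() {
        // Swap in the newer item and return the older one. After this line
        // the older item is gone from our hands, mirroring "never touched
        // again after requeue" in the real callback.
        requeue(std::mem::replace(&mut newest, next));
        drained += 1;
    }
    Some((newest, drained))
}
```

Because ownership moves into `requeue`, the compiler enforces the no-use-after-requeue rule that the raw-pointer version has to argue for in its `// SAFETY:` comments.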
@@ -15,6 +15,9 @@
//! composed while a session is live). Effectiveness can be build/driver-dependent; gated by
//! `PUNKTFUNK_FORCE_COMPOSED` (default ON; set =0 to disable).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use windows::core::w;
@@ -48,6 +51,10 @@ impl ForceComposedFlip {
let st = stop.clone();
std::thread::Builder::new()
.name("composed-flip".into())
// SAFETY: `run` is this module's `unsafe fn` (it owns a desktop+window lifecycle via Win32
// FFI); it takes ownership of `st` (the stop `Arc<AtomicBool>`) and has no caller-side memory
// precondition. It is designed to own its thread for its whole duration — exactly the
// dedicated `composed-flip` thread spawned here.
.spawn(move || unsafe { run(st) })
.ok()?;
tracing::info!("force-composed-flip overlay started (Winlogon-aware)");
@@ -62,6 +69,9 @@ impl Drop for ForceComposedFlip {
}
extern "system" fn wndproc(hwnd: HWND, msg: u32, wp: WPARAM, lp: LPARAM) -> LRESULT {
// SAFETY: this is the window procedure the OS invokes with the window's own `hwnd` and a real
// message `(msg, wp, lp)`. `DefWindowProcW` performs default processing for exactly those
// parameters (all passed straight through by value); it borrows no Rust memory and is synchronous.
unsafe { DefWindowProcW(hwnd, msg, wp, lp) }
}
@@ -1,5 +1,5 @@
//! Input-desktop watcher (Windows) — the authoritative "normal vs secure desktop" signal for the
//! two-process secure-desktop design (docs/windows-secure-desktop.md).
//! two-process secure-desktop design (design/windows-secure-desktop.md).
//!
//! Windows switches the *input desktop* to "Winlogon" (the secure desktop) for UAC elevation, the
//! lock screen and the login screen, and back to "Default" for the normal session. WGC captures only
@@ -7,6 +7,9 @@
//! desktop's NAME (WTS session notifications miss UAC entirely, so the name is the reliable signal)
//! and publishes it as an atomic the capture mux + input path read.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::sync::Arc;
use std::time::Duration;
@@ -33,6 +36,10 @@ impl DesktopWatcher {
// mux) sees the real state immediately. Otherwise a session that begins already on the secure
// desktop (e.g. a reconnect to a locked box) would read DESKTOP_NORMAL for the first poll
// interval and relay one stale normal-desktop frame — the "flash of the login screen" bug.
// SAFETY: `is_secure_desktop` is this module's `unsafe fn` — unsafe only because it calls Win32
// desktop FFI (`OpenInputDesktop`/`GetUserObjectInformationW`/`CloseDesktop`), with no caller
// precondition; it opens, names, and closes the input-desktop handle internally and is safe to
// call from any thread (here, on the thread running `DesktopWatcher::start`).
let initial = if unsafe { is_secure_desktop() } {
DESKTOP_SECURE
} else {
@@ -53,6 +60,9 @@ impl DesktopWatcher {
let mut candidate = initial;
let mut stable = 0u32;
while !st.load(Ordering::Relaxed) {
// SAFETY: same as in `start` — `is_secure_desktop` is self-contained Win32 desktop
// FFI with no caller precondition, called here on the dedicated `desktop-watch`
// polling thread.
let v = if unsafe { is_secure_desktop() } {
DESKTOP_SECURE
} else {
@@ -7,6 +7,9 @@
//! Validates only with a real GPU + an *activated* SudoVDA monitor (`DuplicateOutput` needs a live
//! WDDM output). Compiles on the GPU-less VM; the pure helpers are unit-tested there.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{CapturedFrame, Capturer, FramePayload, PixelFormat};
use anyhow::{anyhow, bail, Context, Result};
use std::ffi::c_void;
@@ -69,7 +72,12 @@ pub struct D3d11Frame {
pub texture: ID3D11Texture2D,
pub device: ID3D11Device,
}
// COM pointers, used only from the single owning thread.
// SAFETY: `D3d11Frame` owns an `ID3D11Texture2D` + `ID3D11Device`, which are COM interface pointers.
// D3D11 devices/resources use thread-safe (interlocked) COM reference counting, and the device is
// created free-threaded (`make_device` passes no `D3D11_CREATE_DEVICE_SINGLETHREADED`), so handing
// ownership of the frame to another thread — the capture→encode handoff — and releasing it there is
// sound. The value is moved, never aliased (no `Sync`), so there is no concurrent use of the
// single-threaded immediate context.
unsafe impl Send for D3d11Frame {}
pub fn pack_luid(luid: LUID) -> i64 {
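The "`Send` but not `Sync`" argument used for `D3d11Frame` can be illustrated with a plain raw pointer in place of the COM interfaces. This is a sketch under assumptions: `Handle` and `bump_on_worker` are hypothetical names, and a heap-allocated `u64` stands in for a resource that may be moved between threads but must never be used from two at once.

```rust
use std::thread;

/// Hypothetical owner of a raw pointer; `*mut u64` makes it `!Send` and
/// `!Sync` by default.
struct Handle {
    ptr: *mut u64,
}

// SAFETY: `Handle` uniquely owns the allocation behind `ptr` (created in
// `new`, freed once in `drop`), so moving the whole value to another thread
// transfers that ownership with nothing left behind to alias it. We do not
// implement `Sync`, so two threads can never use it concurrently.
unsafe impl Send for Handle {}

impl Handle {
    fn new(value: u64) -> Self {
        Handle { ptr: Box::into_raw(Box::new(value)) }
    }

    fn bump(&mut self) -> u64 {
        // SAFETY: `ptr` came from `Box::into_raw` and is still live;
        // `&mut self` guarantees exclusive access for this call.
        unsafe {
            *self.ptr += 1;
            *self.ptr
        }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // SAFETY: `ptr` was produced by `Box::into_raw` in `new` and is
        // released exactly once, here.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

/// Move the handle to a worker thread, use it there, and drop it there.
fn bump_on_worker(mut handle: Handle) -> u64 {
    thread::spawn(move || handle.bump())
        .join()
        .expect("worker thread panicked")
}
```

Removing the `unsafe impl Send` line makes `thread::spawn` fail to compile, which is the property the real capture-to-encode handoff relies on.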
@@ -295,6 +303,12 @@ unsafe fn d3dkmt_set_scheduling_priority_class(
fn elevate_process_gpu_priority() {
use std::sync::Once;
static ONCE: Once = Once::new();
// SAFETY: the closure calls two of this module's `unsafe fn`s — `enable_inc_base_priority`
// (adjusts the current-process token; it has no caller precondition and builds all its FFI args
// locally) and `d3dkmt_set_scheduling_priority_class` (loads gdi32 by name and calls the export).
// The latter requires `process` to be a valid process handle; `GetCurrentProcess()` returns the
// current-process pseudo-handle, which is always valid and needs no close. Runs once via
// `Once::call_once`; no raw pointers are dereferenced here.
ONCE.call_once(|| unsafe {
use windows::Win32::System::Threading::GetCurrentProcess;
let Some(prio) = configured_gpu_priority_class() else {
@@ -538,6 +552,17 @@ unsafe extern "system" fn hybrid_query_hook(gpu_preference: *mut u32) -> i32 {
pub(crate) fn install_gpu_pref_hook() {
use std::sync::Once;
static HOOK: Once = Once::new();
// SAFETY: this one-time hook install only touches a region it has just validated.
// `LoadLibraryA("win32u.dll")` + `GetProcAddress("NtGdiDdDDIGetCachedHybridQueryValue")` yield the
// live base of the real exported function, so `target` is a valid executable code pointer to at
// least the 12 bytes the patch overwrites (an x64 prologue, per Apollo's verified hook). The two
// `ptr::copy_nonoverlapping`s each move exactly 12 bytes between the 12-byte stack arrays
// (`patch`/`readback`) and `target`, which `VirtualProtect(target, 12, PAGE_EXECUTE_READWRITE, …)`
// has just made writable (and is restored to `old` after) — source and dest never overlap (stack
// vs. loaded module image), so every access stays in mapped, in-bounds memory.
// `FlushInstructionCache` gets the current-process pseudo-handle + that same range. The DPI calls
// take by-value context handles / fill the live local `&mut old`/`&mut restore` for the duration of
// each synchronous call. Runs once via `Once::call_once`, before any DXGI use.
HOOK.call_once(|| unsafe {
use windows::Win32::System::LibraryLoader::{GetProcAddress, LoadLibraryA};
use windows::Win32::System::Memory::{
@@ -1389,6 +1414,14 @@ pub fn hdr_p010_selftest() -> Result<()> {
}
}
// SAFETY: this self-test creates its own D3D11 device + immediate context (`D3D11CreateDevice`,
// both checked non-null) and uses ONLY that device for the rest of the block: every
// `CreateTexture2D`/`CreateShaderResourceView`/`HdrP010Converter::{new,convert}`/`CopyResource`/
// `Map` is invoked on that device or its context, so all resources share one device and run on this
// single thread. The source texture's `D3D11_SUBRESOURCE_DATA` points at `fp16`, a live
// `Vec<u16>` of `W*H*4` samples with `SysMemPitch = W*8`, matching the W×H R16G16B16A16 texture;
// `fp16` outlives the synchronous `CreateTexture2D` that reads it. The mapped-pointer reads are
// proven individually at the `read_u16` closure below.
unsafe {
// Hardware D3D11 device (no adapter pin — the default GPU is fine for the self-test).
let mut device: Option<ID3D11Device> = None;
@@ -2038,7 +2071,11 @@ pub struct DuplCapturer {
dbg_cursor: u64,
_keepalive: Box<dyn Send>,
}
// COM objects used only from the one thread that owns the capturer (the encode thread).
// SAFETY: `DuplCapturer` holds D3D11 device/context/duplication COM pointers plus plain data. The
// device is created free-threaded (`make_device` sets no `D3D11_CREATE_DEVICE_SINGLETHREADED`) and
// COM reference counting is interlocked, so moving ownership of the whole capturer to another thread
// is sound. It is used by exactly one thread (the encode thread) at a time — moved to it once, never
// shared (no `Sync`) — so the single-threaded immediate context is never touched concurrently.
unsafe impl Send for DuplCapturer {}
impl DuplCapturer {
@@ -2051,6 +2088,13 @@ impl DuplCapturer {
gpu: bool,
want_hdr: bool,
) -> Result<Self> {
// SAFETY: runs on the capture thread that will own this `DuplCapturer`. `install_gpu_pref_hook()`
// and the DPI-context calls take by-value handles / no args and touch only thread/process state;
// `SetThreadExecutionState` takes a flags bitmask by value. `CreateDXGIFactory1` yields a live
// `IDXGIFactory1`, and every subsequent COM method (`EnumAdapters1`/`EnumOutputs`/`GetDesc1`/
// `GetDesc`/`cast`) is called on that factory or on an adapter/output it returned — each obtained
// through a checked `while let Ok(..)`/`?` — all from this one thread. No raw pointers are
// dereferenced; the borrowed strings/locals outlive each synchronous call.
unsafe {
// Stop DXGI hybrid-GPU output reparenting BEFORE we create the factory / enumerate outputs
// (the cause of the 0x887A0026 ACCESS_LOST churn on this hybrid box: RTX 4090 + AMD iGPU).
@@ -3207,6 +3251,11 @@ impl Capturer for DuplCapturer {
// the duplication up to 12 s). Better a few seconds of frozen-last-frame than dropping the stream.
let mut deadline = Instant::now() + Duration::from_secs(20);
loop {
// SAFETY: `acquire` is an `unsafe fn` because it drives the D3D11 immediate context + the
// output duplication, which must be touched only from the capturer's owning thread.
// `next_frame` runs on that one thread — `DuplCapturer` is `Send` but not `Sync`, so it is
// owned by a single (encode) thread for its whole life — and `&mut self` gives exclusive
// access for the call, satisfying that contract.
if let Some(f) = unsafe { self.acquire() }? {
self.ever_got_frame = true;
return Ok(f);
@@ -3253,6 +3302,8 @@ impl Capturer for DuplCapturer {
}
fn try_latest(&mut self) -> Result<Option<CapturedFrame>> {
// SAFETY: as in `next_frame` — `acquire` must run on the capturer's single owning thread, and
// `try_latest` is called on it (`DuplCapturer` is `Send`, not `Sync`); `&mut self` is exclusive.
unsafe { self.acquire() }
}
@@ -3264,11 +3315,19 @@ impl Capturer for DuplCapturer {
impl Drop for DuplCapturer {
fn drop(&mut self) {
if self.holding_frame {
// SAFETY: `self.dupl` is the live `IDXGIOutputDuplication` this capturer created and owns;
// `ReleaseFrame` is a valid COM method on it, called only when `holding_frame` records that a
// frame was acquired and not yet released (so it is not an unbalanced release). Drop runs on
// whichever thread owns the capturer — its sole owner, since it is `!Sync` — and the `&`
// borrow of the duplication outlives this synchronous call.
unsafe {
let _ = self.dupl.as_ref().map(|d| d.ReleaseFrame());
}
}
// Release the display/system-required execution state we took at open().
// SAFETY: `SetThreadExecutionState` is a Win32 FFI call taking an execution-state flag bitmask
// by value (`ES_CONTINUOUS` clears the display/system-required state taken at open); it borrows
// no Rust memory and is safe to call from any thread.
unsafe {
SetThreadExecutionState(ES_CONTINUOUS);
}
@@ -10,7 +10,10 @@
//! [`pf_driver_proto::frame`] (which OWNS the contract, with `const` size asserts) — both sides
//! `use` it, so drift is a compile error rather than a "must match" comment.
use super::dxgi::{make_device, D3d11Frame, HdrConverter, WinCaptureTarget};
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::dxgi::{make_device, D3d11Frame, HdrP010Converter, VideoConverter, WinCaptureTarget};
use super::{CapturedFrame, Capturer, FramePayload, PixelFormat};
use anyhow::{bail, Context, Result};
use pf_driver_proto::frame;
@@ -20,13 +23,12 @@ use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
use windows::core::{w, Interface, HSTRING};
use windows::Win32::Foundation::{HANDLE, INVALID_HANDLE_VALUE, LUID};
use windows::Win32::Graphics::Direct3D11::{
ID3D11Device, ID3D11DeviceContext, ID3D11RenderTargetView, ID3D11ShaderResourceView,
ID3D11Texture2D, D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE,
D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX, D3D11_RESOURCE_MISC_SHARED_NTHANDLE,
D3D11_TEXTURE2D_DESC, D3D11_USAGE_DEFAULT,
ID3D11Device, ID3D11DeviceContext, ID3D11ShaderResourceView, ID3D11Texture2D,
D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX,
D3D11_RESOURCE_MISC_SHARED_NTHANDLE, D3D11_TEXTURE2D_DESC, D3D11_USAGE_DEFAULT,
};
use windows::Win32::Graphics::Dxgi::Common::{
DXGI_FORMAT, DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_R10G10B10A2_UNORM,
DXGI_FORMAT, DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_NV12, DXGI_FORMAT_P010,
DXGI_FORMAT_R16G16B16A16_FLOAT, DXGI_SAMPLE_DESC,
};
use windows::Win32::Graphics::Dxgi::{
@@ -205,21 +207,33 @@ pub struct IddPushCapturer {
/// cleared when a fresh frame resumes. If it stays set past the recovery window, `try_consume` drops
/// the session (recover-or-drop, no DDA).
recovering_since: Option<Instant>,
/// Host-owned ROTATING output ring NVENC encodes (texture + RTV per slot). Rotating it per frame is
/// the precondition for pipelining the encode loop: while NVENC encodes frame N's texture on the
/// ASIC, frame N+1's convert/copy writes a DIFFERENT texture on the 3D engine — the two overlap. The
/// HDR convert and the SDR copy both write into the current slot. Format = `out_format()` (Rgb10a2 in
/// HDR, Bgra in SDR); rebuilt on a display-mode flip. Built lazily.
out_ring: Vec<(ID3D11Texture2D, ID3D11RenderTargetView)>,
/// Host-owned ROTATING output ring NVENC encodes (one YUV texture per slot). Rotating it per frame
/// is the precondition for pipelining the encode loop: while NVENC encodes frame N's texture on the
/// ASIC, frame N+1's convert writes a DIFFERENT texture — the two overlap. Format = `out_format()`:
/// NV12 (SDR, BT.709 limited) or P010 (HDR, BT.2020 PQ limited), so NVENC takes native YUV and skips
/// its internal RGB→YUV CSC on the SM/3D engine the game saturates (plan §5.A). Rebuilt on a
/// display-mode flip. Built lazily.
out_ring: Vec<ID3D11Texture2D>,
out_idx: usize,
/// FP16 scRGB → `Rgb10a2` BT.2020 PQ converter, used while the display is HDR. Built lazily.
hdr_conv: Option<HdrConverter>,
/// BGRA slot → NV12 (BT.709 limited) on the dedicated D3D11 VIDEO engine, used while the display is
/// SDR — keeps the colour-convert OFF the contended 3D/compute engine. Built lazily; rebuilt on a
/// size/HDR flip.
video_conv: Option<VideoConverter>,
/// FP16 scRGB slot → P010 (BT.2020 PQ limited) via two shader passes, used while the display is HDR
/// (NVIDIA's VideoProcessor can't do RGB→P010). The passes run on the 3D engine, but it still skips
/// NVENC's internal SM-side CSC. Built lazily.
hdr_p010_conv: Option<HdrP010Converter>,
last_seq: u64,
last_present: Option<(ID3D11Texture2D, PixelFormat)>,
status_logged: bool,
_keepalive: Box<dyn Send>,
}
// COM objects used only from the owning (encode) thread.
// SAFETY: `IddPushCapturer` is `!Send` only because of its `*mut SharedHeader`/`*mut DebugBlock` raw
// pointers (and the COM interfaces). It is created, used, and dropped by a SINGLE thread — the owning
// capture/encode thread — never shared: the `ID3D11DeviceContext` is the device's IMMEDIATE context
// (single-threaded by D3D11 contract) and is only ever touched from that thread, and the header/
// dbg_block pointers (into mappings this struct owns) are only dereferenced there. `Send` transfers
// ownership to one thread at a time with NO concurrent access; we do not (and must not) claim `Sync`.
unsafe impl Send for IddPushCapturer {}
/// Build a permissive (Everyone:GenericAll) `SECURITY_ATTRIBUTES` so the restricted WUDFHost driver
@@ -337,6 +351,9 @@ impl IddPushCapturer {
// a fullscreen game can hold the virtual display at a different mode (esp. across a reconnect), so
// matching the actual mode lets the first frame flow instead of being dropped (game-capture bug
// GB1). Falls back to the negotiated mode when the CCD read is unavailable.
// SAFETY: `active_resolution` is an `unsafe fn` (Win32 CCD `QueryDisplayConfig`) that takes only a
// copy of the plain `u32` CCD target id and returns owned `(w, h)` values; it forms no borrows from
// us and validates the id internally, returning `None` on any failure (handled by `unwrap_or`).
let (w, h) =
unsafe { crate::win_display::active_resolution(target.target_id) }.unwrap_or((pw, ph));
if (w, h) != (pw, ph) {
@@ -355,6 +372,27 @@ impl IddPushCapturer {
// PROACTIVELY enable advanced color so HDR streams without the user toggling anything; an
// SDR-only client leaves the display alone (and still gets a tone-mapped picture, never a freeze,
// if the user does enable HDR).
// SAFETY: one block over the whole ring setup; every operation in it is sound:
// - `set_advanced_color`/`advanced_color_enabled` are `unsafe fn`s taking only a copy of the plain
// `u32` target id; they read/flip CCD display config and return owned values, borrowing nothing.
// - `CreateDXGIFactory1`, `EnumAdapterByLuid`, `make_device`, `permissive_sa`, `CreateFileMappingW`,
// `MapViewOfFile`, `CreateEventW`, and `create_ring_slots` are all `?`-checked, so every returned
// interface/handle/view is non-error before use; `&sa`/`&adapter`/`&device`/the `&HSTRING` names
// are live borrows that outlive each synchronous call, and `sa.lpSecurityDescriptor` stays valid
// because its backing `_psd` is held in scope for the whole block.
// - The header mapping is created AND viewed at `bytes == size_of::<SharedHeader>().max(64)`; the
// view's null is checked (`bail!` on failure, after which the owned `map` closes the mapping). The
// OS view base is page-aligned, so `section.ptr::<SharedHeader>()` is suitably aligned for a
// `SharedHeader`, and `write_bytes(.., 0, bytes)` plus the `(*header).field = ..` writes all stay
// within those `bytes` and write THROUGH the raw pointer without forming any `&mut`. The debug
// section is the same pattern at `dbg_bytes == size_of::<DebugBlock>()`, only entered when its
// own view is non-null.
// - The `magic` publish stores through `addr_of!((*header).magic) as *const AtomicU32`: `addr_of!`
// takes the field address without a reference; the field is a 4-aligned `u32` (valid for
// `AtomicU32`), and the `Release` store after the `Release` fence is the cross-process handshake
// that orders all preceding writes before the driver may observe `MAGIC`.
// - `header`/`dbg_block` point into the OS mappings, NOT into the `MappedSection` structs, so moving
// `section`/`dbg_section` into `me` leaves them valid (see the `MappedSection` doc comment).
unsafe {
// If we ENABLE advanced color for a 10-bit client, trust it (the driver will compose FP16) and
// size the ring FP16 directly — don't race the advanced_color_enabled poll, which may not have
@@ -504,7 +542,8 @@ impl IddPushCapturer {
recovering_since: None,
out_ring: Vec::new(),
out_idx: 0,
hdr_conv: None,
video_conv: None,
hdr_p010_conv: None,
last_seq: 0,
last_present: None,
status_logged: false,
@@ -523,7 +562,7 @@ impl IddPushCapturer {
/// Block (bounded) until the driver has ATTACHED to the host ring (`DRV_STATUS_OPENED`) **and published
/// a first frame**, else fail so the caller can fall back to DDA (audit §5.1 +
/// `docs/windows-host-rewrite.md` §2.5 — the GB1 game-capture fix).
/// `design/windows-host-rewrite.md` §2.5 — the GB1 game-capture fix).
///
/// Requiring the first frame — not just the attach — catches the *reconnect-into-a-broken-state* case:
/// a fullscreen game can leave the virtual display in a format/size that the driver's `publish()` guard
@@ -534,10 +573,16 @@ impl IddPushCapturer {
fn wait_for_attach(&self) -> Result<()> {
let deadline = Instant::now() + Duration::from_secs(4);
loop {
// Plain read: the driver writes this u32; an aligned u32 read can't tear (same access as
// SAFETY: `self.header` points into the live shared-header mapping this capturer owns (sized
// `>= size_of::<SharedHeader>()`, page-aligned), so the field read is in-bounds + aligned, and
// no reference into the shared region is formed. Plain read: the driver writes this `u32`
// cross-process, but an aligned `u32` read can't tear and `driver_status` is best-effort
// diagnostics — the real handshake is the atomic `magic`/`latest` (same access as
// log_driver_status_once).
let st = unsafe { (*self.header).driver_status };
if matches!(st, DRV_STATUS_TEX_FAIL | DRV_STATUS_NO_DEVICE1) {
// SAFETY: as above — an in-bounds, aligned `u32` read of a best-effort diagnostic field
// through the owned, live header mapping; no reference into the shared region is formed.
let detail = unsafe { (*self.header).driver_status_detail };
bail!(
"IDD-push driver failed to attach (driver_status={st} detail=0x{detail:08x} — \
@@ -560,6 +605,10 @@ impl IddPushCapturer {
#[inline]
fn latest(&self) -> u64 {
// SAFETY: `self.header` is the live, owned shared-header mapping (page-aligned, sized for a
// `SharedHeader`). `addr_of!((*self.header).latest)` forms the address of the `latest` field
// WITHOUT a reference; it is an 8-aligned `u64` (so valid for `AtomicU64`), and the `Acquire` load
// is the consumer half of the cross-process publish handshake (pairs with the driver's `Release`).
unsafe {
(*(std::ptr::addr_of!((*self.header).latest) as *const AtomicU64))
.load(Ordering::Acquire)
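A minimal sketch of the access pattern the `latest()` SAFETY comment describes: take the field's address with `addr_of!` (no reference is formed), then view it as an `AtomicU64` for an `Acquire` load. The `Header` struct and `load_latest` function are hypothetical stand-ins for illustration; the real `SharedHeader` layout is not shown in this diff.

```rust
use std::ptr::addr_of;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a shared header (NOT the real `SharedHeader` layout).
/// `align(8)` plus `repr(C)` puts `latest` at offset 8, so it is 8-aligned.
#[allow(dead_code)]
#[repr(C, align(8))]
struct Header {
    magic: u32,
    driver_status: u32,
    latest: u64,
}

/// Atomically load `latest` through a raw pointer without ever creating `&Header`.
///
/// # Safety
/// `header` must point to a live, properly aligned `Header` that may be written
/// concurrently only through atomic accesses to `latest`.
unsafe fn load_latest(header: *const Header) -> u64 {
    // SAFETY: the caller guarantees `header` is live and aligned. `addr_of!` computes the
    // field address without a reference, and an 8-aligned `u64` is valid for `AtomicU64`.
    unsafe { (*(addr_of!((*header).latest) as *const AtomicU64)).load(Ordering::Acquire) }
}
```

The pointer passed in should come from a mutable place (for example `&mut header as *mut Header`), since the atomic view needs write-capable provenance even when only loading.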
@@ -571,6 +620,10 @@ impl IddPushCapturer {
if self.status_logged {
return;
}
// SAFETY: four in-bounds, aligned reads of the live, owned shared-header mapping. The driver writes
// these `u32`/`i32` diagnostic fields cross-process, but aligned word reads can't tear and these are
// best-effort status (the real handshake is the atomic `magic`/`latest`); no `&`/`&mut` reference
// into the shared region is formed.
let (status, detail, lo, hi) = unsafe {
(
(*self.header).driver_status,
@@ -610,6 +663,11 @@ impl IddPushCapturer {
tracing::warn!("IDD push DEBUG: no debug block");
return;
}
// SAFETY: `self.dbg_block` was just checked non-null (the early return above); it points into the
// owned `dbg_section` mapping sized exactly `size_of::<DebugBlock>()` and page-aligned, so it is
// valid + aligned for `DebugBlock`. `d` is a short-lived SHARED reference used only to read the
// fields below; we never form `&mut` into this region, and the driver's cross-process writes are
// aligned `u32`s that don't tear (best-effort bring-up diagnostics).
let d = unsafe { &*self.dbg_block };
tracing::error!(
run_core_entries = d.run_core_entries,
@@ -625,16 +683,17 @@ impl IddPushCapturer {
);
}
/// The output texture format + the [`PixelFormat`] it presents as, driven SOLELY by the DISPLAY's
/// HDR state (like the WGC path): HDR → `Rgb10a2` BT.2020 PQ → NVENC Main10, and the client
/// auto-detects PQ from the HEVC VUI; SDR → 8-bit `Bgra`. We do NOT gate HDR on the client's
/// advertised `VIDEO_CAP_10BIT` — clients under-report it (e.g. the Mac advertises 10-bit only when
/// its OWN display is HDR), yet all decode Main10 + auto-switch, exactly as on the WGC path.
/// The output texture format + the [`PixelFormat`] NVENC encodes, driven SOLELY by the DISPLAY's HDR
/// state (like the WGC path): HDR → `P010` (BT.2020 PQ 10-bit limited) → NVENC Main10, and the client
/// auto-detects PQ from the HEVC VUI; SDR → `Nv12` (BT.709 8-bit limited). Both are native YUV so
/// NVENC skips its internal RGB→YUV CSC on the contended SM (plan §5.A). We do NOT gate HDR on the
/// client's advertised `VIDEO_CAP_10BIT` — clients under-report it (e.g. the Mac advertises 10-bit
/// only when its OWN display is HDR), yet all decode Main10 + auto-switch, exactly as on the WGC path.
fn out_format(&self) -> (DXGI_FORMAT, PixelFormat) {
if self.display_hdr {
(DXGI_FORMAT_R10G10B10A2_UNORM, PixelFormat::Rgb10a2)
(DXGI_FORMAT_P010, PixelFormat::P010)
} else {
(DXGI_FORMAT_B8G8R8A8_UNORM, PixelFormat::Bgra)
(DXGI_FORMAT_NV12, PixelFormat::Nv12)
}
}
@@ -658,6 +717,10 @@ impl IddPushCapturer {
self.height = new_h;
let fmt = self.ring_format();
let new_gen = IDD_GENERATION.fetch_add(1, Ordering::Relaxed);
// SAFETY: `create_ring_slots` is an `unsafe fn` (it makes D3D11/DXGI COM calls); we pass a live
// borrow of `self.device` (the capturer's own device, on which the slots are created) plus plain
// `u32`/`DXGI_FORMAT` values, and `?` propagates any failure before the slots are used. Every
// returned slot's texture + keyed mutex belongs to that same `self.device`.
let new_slots = unsafe {
Self::create_ring_slots(
&self.device,
@@ -668,6 +731,12 @@ impl IddPushCapturer {
fmt,
)?
};
// SAFETY: `self.header` is the live, owned shared-header mapping (page-aligned, sized for a
// `SharedHeader`). The `latest`/`generation` stores go through `addr_of!`-formed field pointers (no
// references) of correctly-aligned `u64`/`u32` fields, valid for `AtomicU64`/`AtomicU32`; the
// `dxgi_format`/`width`/`height` writes are in-bounds raw writes through the pointer (no `&mut`).
// The `Release` fence + the `Release` `generation` store publish all preceding writes so the driver
// only re-attaches (`Acquire`) once the new textures + format are in place.
unsafe {
// Clear `latest` to the 0 sentinel (generation 0, which try_consume rejects). The real guard
// against consuming an unwritten new-ring slot is the generation tag in `latest`: a stale
@@ -688,6 +757,8 @@ impl IddPushCapturer {
self.generation = new_gen;
self.last_seq = 0;
self.out_ring.clear(); // the output format changed → rebuild lazily at the new format
self.video_conv = None; // converters are sized + HDR-specific → rebuild at the new mode
self.hdr_p010_conv = None;
self.out_idx = 0;
self.last_present = None;
Ok(())
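The publish step described in the SAFETY comment above (plain field writes, then a `Release` store that the other side reads with `Acquire`) can be sketched in safe Rust as follows. This is an illustration of the ordering only: the `Shared` struct, `publish`, and `observe` are hypothetical names, it runs in one process rather than across a shared mapping, and it does not model the generation tag packed into `latest`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical in-process stand-in for the fields the host republishes on a mode change.
#[derive(Default)]
struct Shared {
    width: AtomicU32,
    height: AtomicU32,
    generation: AtomicU32,
}

/// Producer side: write the payload first, then publish with a `Release` store.
fn publish(s: &Shared, width: u32, height: u32, generation: u32) {
    s.width.store(width, Ordering::Relaxed);
    s.height.store(height, Ordering::Relaxed);
    // Everything written above becomes visible to a reader that Acquire-loads this value.
    s.generation.store(generation, Ordering::Release);
}

/// Consumer side: only trust the payload once the expected generation is observed.
fn observe(s: &Shared, want_generation: u32) -> Option<(u32, u32)> {
    if s.generation.load(Ordering::Acquire) != want_generation {
        return None; // not published yet (or a different generation)
    }
    Some((s.width.load(Ordering::Relaxed), s.height.load(Ordering::Relaxed)))
}
```

A reader that polls `observe` until it returns `Some` never sees a half-written size, because the `Acquire` load that observes the new generation synchronizes with the `Release` store that followed the payload writes.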
@@ -701,9 +772,13 @@ impl IddPushCapturer {
return;
}
self.last_acm_poll = Instant::now();
// SAFETY: `advanced_color_enabled` is an `unsafe fn` taking only a copy of the plain `u32` target
// id; it performs a read-only CCD query and returns an owned `bool`, borrowing nothing from us.
let now_hdr = unsafe { crate::win_display::advanced_color_enabled(self.target_id) };
// Follow the display's ACTUAL resolution too — a fullscreen game can mode-set the virtual display
// out from under the negotiated size (game-capture bug GB1). Unknown read → keep our current size.
// SAFETY: `active_resolution` is an `unsafe fn` taking only a copy of the plain `u32` target id; it
// performs a read-only CCD query and returns owned `(w, h)` values, borrowing nothing from us.
let (now_w, now_h) = unsafe { crate::win_display::active_resolution(self.target_id) }
.unwrap_or((self.width, self.height));
if now_hdr == self.display_hdr && now_w == self.width && now_h == self.height {
@@ -742,31 +817,46 @@ impl IddPushCapturer {
Quality: 0,
},
Usage: D3D11_USAGE_DEFAULT,
BindFlags: (D3D11_BIND_RENDER_TARGET.0 | D3D11_BIND_SHADER_RESOURCE.0) as u32,
// RENDER_TARGET: the VIDEO processor (NV12) and the P010 shader passes both write here, and
// NVENC registers it as encode input — matching the WGC YUV ring.
BindFlags: D3D11_BIND_RENDER_TARGET.0 as u32,
CPUAccessFlags: 0,
MiscFlags: 0,
};
for _ in 0..OUT_RING {
let mut t: Option<ID3D11Texture2D> = None;
let mut rtv: Option<ID3D11RenderTargetView> = None;
// SAFETY: `CreateTexture2D` is called on `self.device` (the capturer's live D3D11 device);
// `&desc` is a fully-initialized stack `D3D11_TEXTURE2D_DESC`, the data arg is `None` (no
// initial data), and `Some(&mut t)` is a live out-parameter the call fills. `?` rejects a failed
// HRESULT before `t` is unwrapped, and the created texture belongs to `self.device`.
unsafe {
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D(IDD out ring)")?;
let t = t.context("null out-ring texture")?;
self.device
.CreateRenderTargetView(&t, None, Some(&mut rtv))
.context("CreateRenderTargetView(IDD out ring)")?;
self.out_ring.push((t, rtv.context("null out-ring rtv")?));
self.out_ring.push(t.context("null out-ring texture")?);
}
}
Ok(())
}
/// Build the HDR converter if not already built (HDR-display path only — an SDR display is a copy).
/// Build the per-mode YUV converter if not already built: a VIDEO-engine BGRA→NV12 processor on an
/// SDR display, or the FP16→P010 shader on an HDR display. Both keep NVENC's RGB→YUV CSC off the SM.
fn ensure_converter(&mut self) -> Result<()> {
if self.hdr_conv.is_none() {
self.hdr_conv = Some(unsafe { HdrConverter::new(&self.device)? });
if self.display_hdr {
if self.hdr_p010_conv.is_none() {
// SAFETY: `HdrP010Converter::new` is `unsafe` (it compiles D3D11 shaders + creates
// resources); we pass a live borrow of `self.device`, the device the converter's resources
// belong to, and `?` propagates any failure before the converter is stored.
self.hdr_p010_conv = Some(unsafe { HdrP010Converter::new(&self.device)? });
}
} else if self.video_conv.is_none() {
// SAFETY: `VideoConverter::new` is `unsafe` (it sets up the D3D11 VIDEO processor); we pass live
// borrows of `self.device` + its immediate `self.context` (single-threaded, this thread) plus
// plain `u32` dimensions, and `?` propagates any failure before it is stored. The converter's
// resources belong to that same device/context.
self.video_conv = Some(unsafe {
VideoConverter::new(&self.device, &self.context, self.width, self.height, false)?
});
}
Ok(())
}
@@ -801,16 +891,11 @@ impl IddPushCapturer {
return Ok(None);
}
self.ensure_out_ring()?;
// Build the HDR converter BEFORE acquiring the slot so nothing between Acquire and Release can
// Build the converter BEFORE acquiring the slot so nothing between Acquire and Release can
// `?`-return and leak the keyed-mutex lock (which would stall the driver on that slot).
if self.display_hdr {
self.ensure_converter()?;
}
self.ensure_converter()?;
let i = self.out_idx;
let (out, out_rtv) = {
let (t, rtv) = &self.out_ring[i];
(t.clone(), rtv.clone())
};
let out = self.out_ring[i].clone();
let (_, pf) = self.out_format();
// Hold the slot's keyed mutex only across the convert/copy into the host out-ring (NOT across the
@@ -824,16 +909,27 @@ impl IddPushCapturer {
let Some(_lock) = KeyedMutexGuard::acquire(&s.mutex, 0, 8) else {
return Ok(None);
};
// SAFETY: convert/copy on the owning (encode) thread's immediate context, holding the slot lock.
// SAFETY: convert on the owning (encode) thread's immediate context, holding the slot lock.
// A `?` here is leak-safe: `_lock` (the KeyedMutexGuard) drops on the early return, releasing
// the slot back to the driver.
unsafe {
if self.display_hdr {
// Sample the FP16 slot's SRV directly (no scratch copy) → BT.2020 PQ Rgb10a2.
if let Some(conv) = self.hdr_conv.as_ref() {
conv.convert(&self.context, &s.srv, &out_rtv, self.width, self.height);
// HDR: FP16 slot SRV → P010 (BT.2020 PQ) via the shader; NVENC takes native P010.
if let Some(conv) = self.hdr_p010_conv.as_ref() {
conv.convert(
&self.device,
&self.context,
&s.srv,
&out,
self.width,
self.height,
)?;
}
} else {
// SDR: the slot is already 8-bit BGRA — one copy into the out-ring (hidden by pipelining).
self.context.CopyResource(&out, &s.tex);
// SDR: BGRA slot → NV12 on the VIDEO engine; NVENC takes native NV12, no SM-side CSC.
if let Some(conv) = self.video_conv.as_ref() {
conv.convert(&s.tex, &out)?;
}
}
}
// `_lock` drops here → `ReleaseSync(0)`.
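Why a `?` inside the locked region is leak-safe can be shown with a small RAII sketch. `SlotGuard` and `convert_under_lock` are hypothetical stand-ins (the real `KeyedMutexGuard` wraps a DXGI keyed mutex, which is not modeled here); the point is only that `Drop` runs on every exit path.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a keyed-mutex guard: releases the lock when dropped.
struct SlotGuard<'a> {
    locked: &'a Cell<bool>,
}

impl<'a> SlotGuard<'a> {
    /// Take the lock, or return `None` if it is already held.
    fn acquire(locked: &'a Cell<bool>) -> Option<Self> {
        if locked.get() {
            return None;
        }
        locked.set(true);
        Some(SlotGuard { locked })
    }
}

impl Drop for SlotGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal scope exit AND on an early `?` return.
        self.locked.set(false);
    }
}

/// Convert while holding the lock; `fail` simulates a converter error.
fn convert_under_lock(locked: &Cell<bool>, fail: bool) -> Result<Option<u32>, String> {
    let Some(_lock) = SlotGuard::acquire(locked) else {
        return Ok(None); // lock busy: skip this frame
    };
    let converted: Result<u32, String> = if fail {
        Err("convert failed".to_string())
    } else {
        Ok(7)
    };
    let value = converted?; // an early return here still drops `_lock`
    Ok(Some(value))
}
```

After either a successful or a failing call the lock flag is clear again, which mirrors the slot being handed back to the driver.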
@@ -861,7 +957,7 @@ impl IddPushCapturer {
// OUT_RING(3) > the max pipeline_depth(2) guarantees the rotated slot is not in flight.
let (src, pf) = self.last_present.clone()?;
let i = self.out_idx;
let dst = self.out_ring.get(i)?.0.clone();
let dst = self.out_ring.get(i)?.clone();
// SAFETY: GPU copy on the owning thread's immediate context; src/dst are our out-ring textures of
// identical format/size (src is a previous out-ring slot; dst the next).
unsafe {
@@ -922,6 +1018,8 @@ pub fn spawn_observer(target: WinCaptureTarget, preferred: Option<(u32, u32, u32
/// The discrete render GPU LUID (where NVENC runs), falling back to the monitor's `OsAdapterLuid`.
fn resolve_render_adapter_luid_or(fallback_packed: i64) -> LUID {
// SAFETY: `resolve_render_adapter_luid` is an `unsafe fn` (it enumerates DXGI adapters) that takes no
// arguments and returns an owned `Option<LUID>`, borrowing nothing.
if let Some(l) = unsafe { crate::win_adapter::resolve_render_adapter_luid() } {
return l;
}
@@ -935,6 +1033,9 @@ impl Capturer for IddPushCapturer {
fn next_frame(&mut self) -> Result<CapturedFrame> {
let deadline = Instant::now() + Duration::from_secs(20);
loop {
// SAFETY: `self.event` is the live frame-ready `OwnedHandle` this capturer owns; its raw value
// (borrowed for the call, so it outlives this synchronous wait) is a valid auto-reset event
// handle. `WaitForSingleObject` only reads the handle; the 16 ms timeout bounds the wait.
let _ = unsafe { WaitForSingleObject(HANDLE(self.event.as_raw_handle()), 16) };
if let Some(f) = self.try_consume()? {
return Ok(f);
@@ -944,6 +1045,9 @@ impl Capturer for IddPushCapturer {
}
if Instant::now() > deadline {
self.log_debug_block();
// SAFETY: four in-bounds, aligned reads of the live, owned shared-header mapping — the same
// best-effort diagnostic fields as `log_driver_status_once` (aligned word reads can't tear;
// no reference into the shared region is formed).
let (st, detail, lo, hi) = unsafe {
(
(*self.header).driver_status,
@@ -16,6 +16,9 @@
//! Limitation: WGC cannot capture the secure desktop (lock / UAC / login) — the caller falls back to
//! the DDA backend ([`super::dxgi::DuplCapturer`]) for those (see capture.rs).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::dxgi::{
find_output, hdr_shader_p010_enabled, make_device, nudge_cursor_onto, D3d11Frame, HdrConverter,
HdrP010Converter, VideoConverter, WinCaptureTarget,
@@ -92,6 +95,10 @@ struct Deimpersonate(Option<HANDLE>);
impl Drop for Deimpersonate {
fn drop(&mut self) {
if let Some(tok) = self.0.take() {
// SAFETY: `RevertToSelf` takes no arguments and undoes the thread impersonation set during
// WGC activation; `tok` is the impersonation token `HANDLE` from `impersonate_active_user`,
// owned by this `Deimpersonate` and closed exactly once here (taken out of the `Option`, so
// no double-close). Both are FFI calls borrowing no Rust memory.
unsafe {
let _ = RevertToSelf();
let _ = CloseHandle(tok);
@@ -174,7 +181,12 @@ pub struct WgcCapturer {
_keepalive: Option<Box<dyn Send>>,
}
// COM + WinRT pointers; confined to the single owning (encode) thread, like DuplCapturer.
// SAFETY: like `DuplCapturer`. `WgcCapturer` holds D3D11 (free-threaded device/context) plus WGC WinRT
// objects (`Direct3D11CaptureFramePool` etc., created free-threaded via `CreateFreeThreaded`). COM/WinRT
// reference counting is interlocked, and the capturer is owned + used by exactly one encode thread,
// moved to it once and never shared (no `Sync`), so transferring ownership across threads is sound. The
// free-threaded `FrameArrived` callback touches only the `Arc<WgcSignal>` (itself `Send + Sync`), not
// the capturer's COM fields.
unsafe impl Send for WgcCapturer {}
impl WgcCapturer {
@@ -182,6 +194,15 @@ impl WgcCapturer {
/// [`attach_keepalive`](Self::attach_keepalive) only after open succeeds, so a failure leaves the
/// keepalive with the caller to hand to the DDA fallback.
pub fn open(target: WinCaptureTarget, preferred: Option<(u32, u32, u32)>) -> Result<Self> {
// SAFETY: runs on the thread opening the WGC session. `RoInitialize` inits this thread's WinRT
// apartment (idempotent; result ignored). `impersonate_active_user()` and `find_output()` are
// this module's `unsafe fn`s whose contracts (call on the activating thread; pass a GDI name)
// are met, and the impersonation is reverted by `_deimp`'s Drop on every return path. Every
// COM/WinRT call thereafter operates on an object obtained + `?`-checked earlier in this same
// block on this single thread — the `IDXGIOutput1` from `find_output`, the device/context from
// `make_device`, the factory/interop/item/pool/session — and the `TypedEventHandler` closure
// captures an `Arc<WgcSignal>` (Send+Sync) by move. No raw pointers are dereferenced; borrowed
// locals outlive their synchronous calls.
unsafe {
// WGC is WinRT — the calling thread needs a COM/WinRT apartment for the GraphicsCaptureItem
// activation factory (RoGetActivationFactory). Initialize MTA; ignore "already initialized"
@@ -585,6 +606,15 @@ impl WgcCapturer {
}
fn process_frame(&mut self, frame: Direct3D11CaptureFrame) -> Result<CapturedFrame> {
// SAFETY: runs on the capturer's single owning thread. `frame` is a live
// `Direct3D11CaptureFrame` from `self.pool`; `frame.Surface().cast::<IDirect3DDxgiInterfaceAccess
// >().GetInterface()` yields the frame's backing `ID3D11Texture2D`, which belongs to
// `self.device` (the pool was created on it via `CreateDirect3D11DeviceFromDXGIDevice`). Every
// helper called here — `hdr_to_p010`, `convert_to_yuv`, `ensure_fp16_src`, `ensure_out_ring`,
// `HdrConverter::convert`, `CopyResource`, `CreateRenderTargetView` — operates on
// `self.device`/`self.context` and that same-device texture, so all resources share one device.
// The frame is held in `self.held` until its async GPU read completes for the zero-copy paths.
// Single-threaded immediate-context use; borrowed textures/SRVs/RTVs outlive each synchronous call.
unsafe {
let surface = frame.Surface().context("frame Surface")?;
let access: IDirect3DDxgiInterfaceAccess = surface
@@ -1,5 +1,5 @@
//! Host-side WGC helper relay (Windows two-process secure-desktop design,
//! docs/windows-secure-desktop.md — step 4).
//! design/windows-secure-desktop.md — step 4).
//!
//! WGC won't activate under the SYSTEM account, so the SYSTEM host can't capture the normal desktop
//! itself. Instead it spawns `punktfunk-host wgc-helper` in the **interactive user session** (so WGC works)
@@ -13,6 +13,9 @@
//! Wire framing (must match `wgc_helper::write_au`): per AU
//! `[u32 magic "PFAU" LE][u32 len LE][u64 pts_ns LE][u8 keyframe][len bytes data]`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::capture::dxgi::WinCaptureTarget;
use anyhow::{bail, Context, Result};
use std::io::{BufRead, BufReader, Read};
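The wire framing in the module doc comment above can be illustrated with a small reader. This is a sketch under stated assumptions, not the relay's actual `au_reader`: the `Au` struct, `read_au`, and the `MAX_AU_LEN` cap are hypothetical, and the magic is assumed to be the ASCII bytes `PFAU` interpreted as a little-endian `u32`.

```rust
use std::io::{self, Read};

/// Assumption: the magic is the ASCII bytes "PFAU" read as a little-endian u32.
const MAGIC: u32 = u32::from_le_bytes(*b"PFAU");
/// Hypothetical sanity cap so a corrupt length cannot trigger a huge allocation.
const MAX_AU_LEN: usize = 16 * 1024 * 1024;

/// One access unit as laid out by the framing comment (struct name is illustrative).
#[derive(Debug, PartialEq)]
struct Au {
    pts_ns: u64,
    keyframe: bool,
    data: Vec<u8>,
}

/// Read one `[magic][len][pts_ns][keyframe][data]` frame from `r`.
fn read_au<R: Read>(r: &mut R) -> io::Result<Au> {
    // Fixed header: 4 (magic) + 4 (len) + 8 (pts_ns) + 1 (keyframe) = 17 bytes.
    let mut hdr = [0u8; 17];
    r.read_exact(&mut hdr)?;

    let magic = u32::from_le_bytes([hdr[0], hdr[1], hdr[2], hdr[3]]);
    if magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad AU magic"));
    }
    let len = u32::from_le_bytes([hdr[4], hdr[5], hdr[6], hdr[7]]) as usize;
    if len > MAX_AU_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "AU too large"));
    }
    let mut pts = [0u8; 8];
    pts.copy_from_slice(&hdr[8..16]);

    let mut data = vec![0u8; len];
    r.read_exact(&mut data)?;
    Ok(Au {
        pts_ns: u64::from_le_bytes(pts),
        keyframe: hdr[16] != 0,
        data,
    })
}
```

Because `read_au` is generic over `Read`, the same function works over a byte slice in tests and over a pipe-backed reader in production code.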
@@ -56,9 +59,15 @@ pub struct HelperRelay {
rx: Receiver<RelayAu>,
}
// HANDLEs are just kernel handle values; we own them for the relay's lifetime and close them on Drop.
// SAFETY: every field is itself `Send`: the `proc`/`thread` `HANDLE`s are process-global kernel
// handle values (plain integers valid from any thread, owned for the relay's lifetime and closed once
// on Drop), `stdin_w` is a `Mutex<HANDLE>`, and `rx` is an mpsc `Receiver<RelayAu>` (which is `Send`).
// The relay is moved to one thread and owned there, so transferring it across threads is sound.
unsafe impl Send for HelperRelay {}
unsafe impl Sync for HelperRelay {}
// NOTE: `HelperRelay` is deliberately NOT `Sync`. Its `rx: Receiver<RelayAu>` is `!Sync` (std mpsc
// is single-consumer), and the relay is only ever a single-owner local in the punktfunk1 two-process
// mux loop — never shared by `&` across threads — so `Sync` is neither sound nor needed. (A prior
// `unsafe impl Sync` here asserted more than the fields support; removed.)
/// Control byte on the helper's stdin: force the next encoded frame to be an IDR (client decode
/// recovery). Mirrors `enc.request_keyframe()` in the single-process path.
@@ -84,6 +93,10 @@ impl HelperRelay {
);
tracing::info!(cmd = %cmdline, "spawning WGC helper in user session");
// SAFETY: `spawn_inner` is an `unsafe fn` only because it drives raw Win32 token/pipe/process
// FFI; it imposes no caller-side memory precondition beyond valid arguments. `cmdline` is a live
// `&str` borrowed for the synchronous call and `(w, h, hz)` are plain `u32`s. It validates its
// own runtime requirements (active console session, SYSTEM token) and returns `Err` otherwise.
unsafe { spawn_inner(&cmdline, w, h, hz) }
}
@@ -108,6 +121,11 @@ impl HelperRelay {
pub fn request_keyframe(&self) {
let h = self.stdin_w.lock().unwrap();
let mut written = 0u32;
// SAFETY: `*h` is the host's write end of the helper's stdin pipe — a live `HANDLE` owned by
// this `HelperRelay` (held under the `stdin_w` Mutex, locked here), closed only in Drop.
// `WriteFile` reads the 1-byte `&[CTL_KEYFRAME]` buffer and writes the byte count into
// `written`; both are live locals that outlive the synchronous call. A failure (helper gone) is
// discarded as documented.
unsafe {
let _ = windows::Win32::Storage::FileSystem::WriteFile(
*h,
@@ -121,6 +139,10 @@ impl HelperRelay {
impl Drop for HelperRelay {
fn drop(&mut self) {
// SAFETY: `self.proc`/`self.thread` are the child process/thread `HANDLE`s from
// `CreateProcessAsUserW`, and `stdin_w` is the host's pipe write end — all owned by this
// `HelperRelay` and closed exactly once here in Drop (no double-close). `TerminateProcess` and
// the three `CloseHandle`s are FFI calls taking those handles by value, borrowing no Rust memory.
unsafe {
// Terminate the child first so its WGC capture + NVENC session tear down, then close our
// handles (the reader threads end on the resulting broken pipe).
@@ -364,10 +386,17 @@ fn au_reader(mut r: HandleReader, tx: SyncSender<RelayAu>) {
/// Minimal `Read` over a Win32 pipe HANDLE (the windows crate doesn't impl `Read` on HANDLE).
struct HandleReader(HANDLE);
// SAFETY: `HandleReader` owns a single pipe `HANDLE` (a process-global kernel handle value, valid from
// any thread). It is moved into the dedicated reader thread and used only there (and closed once on
// Drop), never shared — so transferring ownership across threads is sound.
unsafe impl Send for HandleReader {}
impl Read for HandleReader {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
let mut read = 0u32;
// SAFETY: `self.0` is the live read end of an anonymous pipe owned by this `HandleReader`
// (closed only in Drop). `ReadFile` fills the caller-provided `buf` (writing at most `buf.len()`
// bytes) and stores the count in `read`; both outlive the synchronous call. A broken pipe
// surfaces as `Err` and is mapped to EOF below.
let ok = unsafe {
windows::Win32::Storage::FileSystem::ReadFile(self.0, Some(buf), Some(&mut read), None)
};
@@ -380,6 +409,8 @@ impl Read for HandleReader {
}
impl Drop for HandleReader {
fn drop(&mut self) {
// SAFETY: `self.0` is the pipe `HANDLE` this `HandleReader` owns; `CloseHandle` (an FFI call
// taking the handle by value) is invoked exactly once here in Drop, so there is no double-close.
unsafe {
let _ = CloseHandle(self.0);
}
@@ -391,6 +422,13 @@ impl Drop for HandleReader {
pub fn running_as_system() -> bool {
use windows::Win32::Security::{GetTokenInformation, TokenUser, TOKEN_QUERY, TOKEN_USER};
use windows::Win32::System::Threading::{GetCurrentProcess, OpenProcessToken};
// SAFETY: `OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &mut token)` opens the current-process
// token (the pseudo-handle is always valid) into `token`, which is closed once before each return.
// The first `GetTokenInformation` (null buffer) queries the required `len`; `buf` is then a
// `Vec<u8>` of exactly `len` bytes and the second call fills it, so `&*(buf.as_ptr() as *const
// TOKEN_USER)` reads a `TOKEN_USER` the kernel just wrote into a sufficiently-sized buffer (the
// variable-length SID it points at also lies within `buf`, which outlives the borrow).
// `is_local_system_sid` is this module's `unsafe fn`, given that in-buffer `PSID`. Safe on any thread.
unsafe {
let mut token = HANDLE::default();
if OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &mut token).is_err() {
+1 -1
@@ -4,7 +4,7 @@
//! environment before the host starts, and **for the knobs captured here the environment is constant for the
//! process lifetime**, so a lazily-parsed global is equivalent to "parsed once at startup".
//!
//! **Goal-1 stages 1-2** (`docs/windows-host-rewrite.md` §2.2): stage 1 stood this up; stage 2 migrated the
//! **Goal-1 stages 1-2** (`design/windows-host-rewrite.md` §2.2): stage 1 stood this up; stage 2 migrated the
//! genuinely-constant operator/dispatch knobs onto it (the dispatch-disagreement bug class: `idd_push`,
//! `capture_backend`, `encoder_pref`, `render_adapter`, `no_wgc`, the vdisplay backend select — plus the
//! plan-named `secure_dda`/`idd_depth`/`zerocopy`/`ten_bit` and the multi-site `perf`/`compositor`/
+11
@@ -3,6 +3,9 @@
//! RGB→YUV on the GPU, so no host-side CSC) and VAAPI on AMD/Intel (`*_vaapi`; the CPU-input
//! fallback swscales RGB→NV12, the zero-copy path imports the capture dmabuf straight into a
//! VA surface). One [`Encoder`] trait, selected in [`open_video`].
// Every unsafe block in this module tree carries a `// SAFETY:` proof; enforce it (unsafe-proof
// program). As a parent module this also covers the child modules (encode::windows/linux::*).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::capture::{CapturedFrame, PixelFormat};
use anyhow::Result;
@@ -505,6 +508,14 @@ fn windows_gpu_vendor() -> Option<GpuVendor> {
CreateDXGIFactory1, IDXGIFactory1, DXGI_ADAPTER_FLAG_SOFTWARE,
};
static CACHE: OnceLock<Option<GpuVendor>> = OnceLock::new();
// SAFETY: `CreateDXGIFactory1` returns a fresh owned `IDXGIFactory1` COM object (refcounted by the
// windows-rs wrapper, Released when the local drops); `.ok()?` bails on failure so `factory` is a
// valid interface before any use. `EnumAdapters1(i)` hands back the i-th adapter as an owned
// `IDXGIAdapter1` (or an error past the last adapter, which ends the loop). `GetDesc1()` returns the
// `DXGI_ADAPTER_DESC1` by value (no out-pointer), so reading `desc.Flags`/`desc.VendorId` is plain
// field access. Every call only touches COM objects this closure owns; the `OnceLock` runs the
// closure once (no data race) and all interfaces are Released as the locals drop. No raw pointer is
// dereferenced and nothing is aliased.
*CACHE.get_or_init(|| unsafe {
let factory: IDXGIFactory1 = CreateDXGIFactory1().ok()?;
let mut i = 0u32;
@@ -8,6 +8,8 @@
//! does *not* accept — we expand it to `rgb0` (one padding byte/pixel, no colour math).
//! The encoder is opened *without* a global header so VPS/SPS/PPS are emitted in-band on
//! every IDR — the output is both a playable raw Annex-B stream and self-contained AUs.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder};
use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
@@ -79,6 +81,12 @@ impl CudaHw {
impl Drop for CudaHw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `CudaHw::new` created
// (it bails before returning `Self` if either alloc fails, so a live `CudaHw` always holds
// both). `av_buffer_unref` drops one reference and nulls the pointer through the `&mut`. This
// `Drop` runs exactly once and `CudaHw` owns these refs exclusively → no double-free /
// use-after-free. Frames are unref'd before the device (the frames ctx internally refs the
// device; refcounted, so the order is sound regardless).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
@@ -136,6 +144,13 @@ pub struct NvencEncoder {
// `CudaHw` holds raw `AVBufferRef`s; the encoder lives on a single thread. The CPU encoder is
// already `Send` via ffmpeg-next; assert it for the CUDA fields too.
// SAFETY: `NvencEncoder` owns an ffmpeg-next `Encoder`/`VideoFrame` (already `Send`) plus a `CudaHw`
// holding raw `AVBufferRef`s, which are not `Send` by default. The encoder is owned and driven by
// exactly ONE thread — the per-session encode thread it is moved to — and is only touched through
// `&mut self` methods, so it is never aliased or accessed concurrently. The wrapped libav contexts
// (and the shared `CUcontext` the `CudaHw` references) have no thread affinity, so transferring
// ownership across threads is sound. This asserts `Send` (transfer) only, extending ffmpeg-next's
// existing `Send` to the raw CUDA fields; `Sync` (shared `&`) is deliberately NOT implemented.
unsafe impl Send for NvencEncoder {}
impl NvencEncoder {
@@ -162,6 +177,9 @@ impl NvencEncoder {
}
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `av_log_set_level` sets libav's global integer log level; `48` (= AV_LOG_DEBUG)
// is a valid level with no pointer args, and libav was just initialized by `ffmpeg::init()`
// above — always sound.
unsafe { ffi::av_log_set_level(48) }; // AV_LOG_DEBUG — surface NVENC hw-frame rejects
}
let name = codec.nvenc_name();
@@ -195,6 +213,11 @@ impl NvencEncoder {
.unwrap_or(1.0);
let vbv_bits = ((bitrate_bps as f64 / fps.max(1) as f64) * vbv_frames as f64)
.clamp(1.0, i32::MAX as f64);
// SAFETY: `video` is the ffmpeg-next encoder builder wrapping a freshly-allocated
// `AVCodecContext` that we hold by value and have not opened yet; `video.as_mut_ptr()` returns
// that non-null, properly-aligned, exclusively-owned context. Writing the plain `rc_buffer_size`
// int field before `open_with` is the supported way to set a field ffmpeg-next exposes no
// setter for. Sole owner → no aliasing; synchronous in-bounds scalar write.
unsafe {
(*video.as_mut_ptr()).rc_buffer_size = vbv_bits as i32;
}
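The VBV sizing expression in this hunk is easy to sanity-check in isolation. The function below mirrors it; the name `vbv_bits` as a free function and the parameter types (`u64` bitrate, `u32` fps, `f64` frame count) are assumptions, since the diff does not show the declarations.

```rust
/// Bits of VBV buffer for `vbv_frames` frames at the given bitrate and frame rate.
/// Mirrors the expression in the hunk; parameter types are assumed for illustration.
fn vbv_bits(bitrate_bps: u64, fps: u32, vbv_frames: f64) -> i32 {
    // `fps.max(1)` avoids a division by zero; the clamp keeps the result a valid
    // positive value that fits the `i32` `rc_buffer_size` field.
    let bits = ((bitrate_bps as f64 / fps.max(1) as f64) * vbv_frames)
        .clamp(1.0, i32::MAX as f64);
    bits as i32
}
```

At 20 Mbit/s and 60 fps with a one-frame VBV this yields roughly 333 kbit, that is, about one frame's share of the bitrate. A zero bitrate clamps up to 1, and an oversized product clamps down to `i32::MAX`.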
@@ -204,6 +227,9 @@ impl NvencEncoder {
// "freeze". NVENC emits one IDR at stream start, then P-frames only; `forced-idr` (below)
// turns a client recovery request (RFI, via `request_keyframe`) into an IDR on demand.
// This is the Moonlight/Sunshine low-latency model.
// SAFETY: same `video` builder as above — a non-null, properly-aligned, sole-owned, not-yet-
// opened `AVCodecContext`. We write the plain `gop_size` int field (= -1, infinite GOP) before
// `open_with`, which ffmpeg-next has no setter for. No aliasing; synchronous scalar write.
unsafe {
(*video.as_mut_ptr()).gop_size = -1;
}
@@ -214,6 +240,10 @@ impl NvencEncoder {
// RGB-input paths leave these unset (NVENC's internal CSC writes its own VUI). Matches the
// Windows NV12 path's BT.709 limited-range signalling.
if matches!(format, PixelFormat::Nv12) {
// SAFETY: same `video` builder — `raw = video.as_mut_ptr()` is the non-null, properly-
// aligned, sole-owned, not-yet-opened `AVCodecContext`. We set its four VUI colour enum
// fields to valid `AVColorSpace`/`AVColorRange`/`AVColorPrimaries`/
// `AVColorTransferCharacteristic` variants before `open_with`. Sole owner → no aliasing;
// synchronous writes.
unsafe {
let raw = video.as_mut_ptr();
(*raw).colorspace = ffi::AVColorSpace::AVCOL_SPC_BT709;
@@ -228,7 +258,17 @@ impl NvencEncoder {
// *before* open (NVENC derives the device from `hw_frames_ctx`).
let cuda_hw = if cuda {
let cu_ctx = crate::zerocopy::cuda::context().context("shared CUDA context")?;
// SAFETY: `CudaHw::new` (an `unsafe fn`) requires libav initialized (the `ffmpeg::init()`
// above ran) and a valid `CUcontext`; `cu_ctx` is the shared importer context from
// `zerocopy::cuda::context()?`, non-null on the `Ok` path. `nvenc_pixel` is a valid `Pixel`
// and `width`/`height` are the validated positive dims. It returns a RAII `CudaHw` wrapping
// (not owning) `cu_ctx` and owning two `AVBufferRef`s freed on drop.
let hw = unsafe { CudaHw::new(cu_ctx, nvenc_pixel, width, height)? };
// SAFETY: `raw = video.as_mut_ptr()` is the non-null, sole-owned, not-yet-opened
// `AVCodecContext`. We set `pix_fmt = CUDA` and attach NEW refs (`av_buffer_ref`) of
// `hw.device_ref`/`hw.frames_ref` — both non-null (`CudaHw::new` guarantees) and from the
// live `hw`, which is moved into `NvencEncoder.cuda` next to `enc` and so outlives the
// encoder. The context owns its own refs (freed when the context closes). No aliasing.
unsafe {
let raw = video.as_mut_ptr();
(*raw).pix_fmt = ffi::AVPixelFormat::AV_PIX_FMT_CUDA;
@@ -428,6 +468,19 @@ impl NvencEncoder {
// The device→device copy below uses our shared context directly; make it current on the
// encode thread (ffmpeg pushes its own around the pool alloc, so order is fine).
crate::zerocopy::cuda::make_current().context("CUDA context current (encode thread)")?;
// SAFETY: `frames_ref` is the non-null CUDA frames ctx from `self.cuda` (unwrapped via
// `.context(..)?` above), and the shared CUDA context was just made current on THIS thread
// (`make_current()?`), the precondition for the device-pointer copies below.
// * `av_frame_alloc` → `f` (null-checked). `av_hwframe_get_buffer(frames_ref, f, 0)` fills `f`
// with a pooled CUDA surface (sets `data[]`/`linesize[]`/`buf[0]`/`hw_frames_ctx`); on
// failure we free `f` and bail.
// * For NV12 we read `(*f).data[0..2]` / `linesize[0..2]` (Y + interleaved UV), else
// `data[0]`/`linesize[0]` — in-struct fields of the non-null `f`, valid for the surface dims
// ffmpeg allocated — and pass them to the cuda copy helpers, which device→device copy `buf`
// (the imported `DeviceBuffer`, owned by the caller and live for this call) into the surface.
// * On copy error we free `f` and return. Otherwise we write `pts`/`pict_type` through `f` and
// `avcodec_send_frame` it into the live owned `self.enc` context (which takes its own ref of
// the pooled surface), then free our `f` ref exactly once. Single-threaded encoder → no race.
unsafe {
let mut f = ffi::av_frame_alloc();
if f.is_null() {
+148 -3
@@ -19,6 +19,8 @@
//! hwdevice/hwframes/buffersrc/buffersink calls go through `ffmpeg::ffi` (= `ffmpeg_sys_next`),
//! as the CUDA encode path and the clients' decode paths already do. The encoder is opened
//! *without* a global header, so VPS/SPS/PPS are in-band on every IDR.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder};
use crate::capture::{CapturedFrame, DmabufFrame, FramePayload, PixelFormat};
@@ -133,6 +135,14 @@ pub fn probe_can_encode(codec: Codec) -> bool {
if ffmpeg::init().is_err() {
return false;
}
// SAFETY: `ffmpeg::init()` returned Ok above, so libav is initialized. `av_log_get_level`/
// `av_log_set_level` only read/write libav's global integer log level (no pointer args) and are
// always sound to call post-init. `VaapiHw::new` (an `unsafe fn`) builds a VAAPI device + NV12
// frames pool from the literal NV12/640x480/pool=2 args and hands back a RAII handle that unrefs
// both `AVBufferRef`s on drop. `open_vaapi_encoder` (an `unsafe fn`) borrows `hw.device_ref`/
// `hw.frames_ref` — the two non-null refs `VaapiHw::new` just created — and `av_buffer_ref`s them
// into the encoder; `hw` is a live local for the whole match arm, so the borrows outlive the
// synchronous call, and both `hw` and the probe encoder are dropped (RAII) when the arm ends.
unsafe {
// A missing VA device (non-VAAPI host, GPU-less CI) is an expected probe outcome — quiet
// ffmpeg's "No VA display found" error for the probe, then restore the level.
@@ -224,6 +234,12 @@ impl VaapiHw {
impl Drop for VaapiHw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `VaapiHw::new`
// created (it bails before constructing `Self` if either alloc fails, so a live `VaapiHw`
// always holds both). `av_buffer_unref` drops one reference and nulls the pointer through the
// `&mut`. This `Drop` runs exactly once and `VaapiHw` owns these refs exclusively, so there
// is no double-free / use-after-free. Frames are unref'd before the device because the frames
// ctx internally holds a ref on the device (refcounted, so the order is sound either way).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
@@ -252,7 +268,16 @@ impl CpuInner {
) -> Result<Self> {
let src_pixel = vaapi_sws_src(format)?;
const POOL: c_int = 16;
// SAFETY: `VaapiHw::new` (an `unsafe fn`) requires libav initialized — guaranteed because the
// only path here is `VaapiEncoder::open` → `ensure_inner` → `CpuInner::open`, and `open` ran
// `ffmpeg::init()`. The args are valid: NV12 sw_format, the validated positive `width`/`height`,
// pool=16. It returns a RAII `VaapiHw` that unrefs its two `AVBufferRef`s on drop.
let hw = unsafe { VaapiHw::new(ffi::AVPixelFormat::AV_PIX_FMT_NV12, width, height, POOL)? };
// SAFETY: `open_vaapi_encoder` (an `unsafe fn`) borrows `hw.device_ref`/`hw.frames_ref` — both
// non-null (`VaapiHw::new` guarantees it) and from the `hw` just built above, which is a live
// local that outlives this synchronous call. The fn `av_buffer_ref`s them into the encoder, so
// the encoder holds its own references; `hw` is also moved into the returned `CpuInner` next to
// `enc`, keeping the device/frames alive for the encoder's whole lifetime.
let enc = unsafe {
open_vaapi_encoder(
codec,
@@ -266,6 +291,12 @@ impl CpuInner {
};
// swscale RGB→NV12, BT.709 limited (matches the VUI), no rescale.
let src_av = pixel_to_av(src_pixel);
// SAFETY: `sws_getContext` allocates a swscale context for the given src/dst dimensions and
// pixel formats. All four dims are the encoder's positive `width`/`height` cast to `c_int`;
// `src_av` is a valid `AVPixelFormat` (from `pixel_to_av` of the `vaapi_sws_src`-validated
// `src_pixel`), the dst is NV12. The three trailing pointers (srcFilter, dstFilter, param) are
// explicitly null = "use defaults", which the API documents as accepted. No Rust memory is
// borrowed — only by-value ints/enums — and the returned pointer is null-checked just below.
let sws = unsafe {
ffi::sws_getContext(
width as c_int,
@@ -283,10 +314,23 @@ impl CpuInner {
if sws.is_null() {
bail!("sws_getContext(RGB→NV12) failed");
}
// SAFETY: `sws` is the non-null `SwsContext` from `sws_getContext` above (the `is_null()`
// check immediately preceding returned false). `sws_getCoefficients(SWS_CS_ITU709)` returns a
// pointer into a libswscale static const coefficient table valid for the whole process, reused
// here for both the inverse (src) and forward (dst) matrices. `sws_setColorspaceDetails` only
// reads those tables and writes scalar CSC settings into `sws`; the table pointer outlives the
// synchronous call and no Rust memory is passed.
unsafe {
let cs709 = ffi::sws_getCoefficients(SWS_CS_ITU709);
ffi::sws_setColorspaceDetails(sws, cs709, 1, cs709, 0, 0, 1 << 16, 1 << 16);
}
// SAFETY: `av_frame_alloc` returns a fresh, uniquely-owned heap `AVFrame` (null-checked — on
// null we free the already-built `sws` and bail). We then write the plain `format`/`width`/
// `height` fields through the non-null, properly-aligned `f` (sole owner, not yet shared).
// `av_frame_get_buffer(f, 0)` allocates backing storage for those dims/format; on failure we
// free `f` and `sws` (unwinding the half-built state) and bail. On success `f` is a fully-owned
// NV12 frame stored in `CpuInner.nv12` and freed once in `CpuInner::drop`. `f` is a unique
// fresh pointer, so none of these writes alias anything.
let nv12 = unsafe {
let f = ffi::av_frame_alloc();
if f.is_null() {
@@ -329,6 +373,18 @@ impl CpuInner {
let h = self.height as usize;
let src_row = w * self.src_format.bytes_per_pixel();
anyhow::ensure!(bytes.len() >= src_row * h, "captured buffer too small");
// SAFETY: The `ensure!`s above guarantee `format == self.src_format` and
// `bytes.len() >= src_row * h`. `sws_scale` reads `h` rows of `src_row` bytes from
// `src_data[0] = bytes.as_ptr()` (the other planes null/0 — packed RGB is single-plane), all
// in bounds; `bytes`, `src_data`, `src_stride` are live locals for this synchronous call.
// `self.sws` is the non-null context built in `open`; it writes into `self.nv12` (a non-null
// owned frame whose `data`/`linesize` in-struct arrays were sized by `av_frame_get_buffer`).
// `av_frame_alloc` (null-checked) yields a fresh `hwf`; `av_hwframe_get_buffer` pulls a pooled
// VAAPI surface from the live non-null `self.hw.frames_ref`; `av_hwframe_transfer_data` uploads
// the staged NV12 into it — both frames live, failures free `hwf` and bail. We then write
// `pts`/`pict_type` through the non-null `hwf` and `avcodec_send_frame` it into the live
// owned `self.enc` context (which takes its own ref), then free our `hwf` ref exactly once.
// The encoder runs only on this thread (see `unsafe impl Send`), so no aliasing/data race.
unsafe {
let src_data: [*const u8; 4] = [bytes.as_ptr(), ptr::null(), ptr::null(), ptr::null()];
let src_stride: [c_int; 4] = [src_row as c_int, 0, 0, 0];
@@ -374,6 +430,12 @@ impl CpuInner {
impl Drop for CpuInner {
fn drop(&mut self) {
// SAFETY: `self.nv12` (an owned `AVFrame`) and `self.sws` (an owned `SwsContext`) are each
// freed exactly once here, guarded by `is_null()` so a never-set pointer is skipped (no double
// free). `CpuInner` owns both exclusively and `Drop` runs once. `av_frame_free` takes `&mut`
// and nulls the pointer. `self.enc`/`self.hw` are freed afterward by their own `Drop` impls;
// the encoder holds its own `av_buffer_ref`'d device/frames copies, so field-drop order is
// irrelevant to soundness.
unsafe {
if !self.nv12.is_null() {
ffi::av_frame_free(&mut self.nv12);
@@ -417,6 +479,31 @@ impl DmabufInner {
let drm_fourcc = crate::zerocopy::drm_fourcc(format)
.ok_or_else(|| anyhow!("no DRM fourcc for {format:?} (VAAPI zero-copy)"))?;
let node = render_node();
// SAFETY: libav is initialized (`VaapiEncoder::open` ran `ffmpeg::init()` before
// `ensure_inner` → `DmabufInner::open`). Every raw pointer dereferenced below is either freshly
// allocated by the immediately-preceding ffmpeg call and null-checked, or an in-struct field of
// such an object:
// * `node` is a `CString` (from `render_node`) live for the whole block; its `.as_ptr()` is a
// NUL-terminated path read only during `av_hwdevice_ctx_create`.
// * `av_hwdevice_ctx_create(&mut drm_device, DRM, …)` / `…_create_derived(&mut vaapi_device,
// VAAPI, drm_device, …)`: on `r < 0` the out-param stays null and we bail (the derive path
// unrefs `drm_device` first); on success each is a non-null owned `AVBufferRef`.
// * `av_hwframe_ctx_alloc(drm_device)` → `drm_frames` (null-checked); `(*drm_frames).data` is
// its `AVHWFramesContext` payload, written before `av_hwframe_ctx_init`.
// * `avfilter_graph_alloc` → `graph` (null-checked); `avfilter_get_by_name` returns a static
// const `AVFilter` (process-lifetime) or null; `avfilter_graph_alloc_filter` allocates each
// filter ctx inside `graph`; the four are null-checked together. `inst`/arg strings are
// 'static C literals.
// * `(*hwmap/scale).hw_device_ctx = av_buffer_ref(vaapi_device)` attaches a NEW ref owned by
// the filter (freed by `avfilter_graph_free`); our `vaapi_device` ref is untouched.
// * `av_buffersink_get_hw_frames_ctx(sink)` → `nv12_ctx` is a borrowed ref owned by the sink,
// valid while `graph` lives (and `graph` is moved into the returned `DmabufInner`).
// * `open_vaapi_encoder` borrows `vaapi_device` (our live owned ref) and `nv12_ctx` (sink's
// live ref) and `av_buffer_ref`s both into the encoder.
// Every early-error path unrefs the allocated buffers and frees the graph in the right order
// before bailing; on success the four `AVBufferRef`s + `graph` + `src`/`sink` are moved into
// `DmabufInner` and freed in its `Drop`. (The two former non-UB leaks, at `av_buffersrc_*` and
// the final `?`, are fixed below.)
unsafe {
// DRM device (source dmabuf frames) + a VAAPI device derived from it (same GPU) for
// hwmap/scale_vaapi/the encoder.
@@ -509,7 +596,12 @@ impl DmabufInner {
num: 1,
den: fps as c_int,
};
(*par).hw_frames_ctx = ffi::av_buffer_ref(drm_frames);
// Assign `drm_frames` BORROWED (no extra ref): `av_buffersrc_parameters_set` takes its
// own ref of `par->hw_frames_ctx` (via av_buffer_replace), and `av_free(par)` frees only
// the struct, not the ref. Our single owned `drm_frames` ref is retained, lives in
// `DmabufInner`, and is unref'd in `Drop`. Wrapping it in `av_buffer_ref` here would leak
// that extra ref every session (the persistent listener would accumulate them).
(*par).hw_frames_ctx = drm_frames;
let r = ffi::av_buffersrc_parameters_set(src, par);
ffi::av_free(par as *mut _);
if r < 0 {
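Review note: the borrowed-versus-extra-ref reasoning in the comment above can be modelled with `Rc` in place of `AVBufferRef`. This is only a sketch under that assumption. `Source`, `parameters_set`, `configure_leaky` and `configure_borrowed` are hypothetical names, and `std::mem::forget` stands in for freeing the parameter struct without unref'ing its field.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for the buffersrc: it keeps its own reference once configured.
#[derive(Default)]
pub struct Source {
    held: Option<Rc<u32>>,
}

/// Models a callee that takes its OWN reference of whatever it is handed.
pub fn parameters_set(src: &mut Source, frames: &Rc<u32>) {
    src.held = Some(Rc::clone(frames));
}

/// Old shape: wrap the context in an extra reference that nobody ever releases.
pub fn configure_leaky(src: &mut Source, frames: &Rc<u32>) {
    let extra = Rc::clone(frames);
    parameters_set(src, &extra);
    // Assumption: models freeing only the struct, not the reference stored in it.
    std::mem::forget(extra);
}

/// New shape: lend the existing reference; the callee's own clone is the only addition.
pub fn configure_borrowed(src: &mut Source, frames: &Rc<u32>) {
    parameters_set(src, frames);
}
```

In this model the leaky variant leaves the strong count one higher than it should be after the source is dropped, which is the per-session accumulation the comment describes.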
@@ -564,7 +656,12 @@ impl DmabufInner {
ffi::av_buffer_unref(&mut drm_device);
bail!("filter sink has no VAAPI frames context");
}
let enc = open_vaapi_encoder(
// On encoder-open failure, free the graph + our owned buffer refs before bailing (matching
// every error path above) so a failed session doesn't leak them. `nv12_ctx` is borrowed
// from the sink (owned by `graph`), so `avfilter_graph_free` reclaims it — don't unref it
// separately. On success the encoder takes its own ref of `vaapi_device`, and `drm_frames`/
// `vaapi_device`/`drm_device`/`graph` move into `DmabufInner` (freed in `Drop`).
let enc = match open_vaapi_encoder(
codec,
width,
height,
@@ -572,7 +669,16 @@ impl DmabufInner {
bitrate_bps,
vaapi_device,
nv12_ctx,
)?;
) {
Ok(enc) => enc,
Err(e) => {
ffi::avfilter_graph_free(&mut graph);
ffi::av_buffer_unref(&mut drm_frames);
ffi::av_buffer_unref(&mut vaapi_device);
ffi::av_buffer_unref(&mut drm_device);
return Err(e);
}
};
tracing::info!(
encoder = codec.vaapi_name(),
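Review note: the `?` to `match` change above matters because the surrounding resources are manually managed, so an early return skips their cleanup. A minimal sketch of the same shape, assuming hypothetical `Handle`, `acquire`, `release` and `open_final` helpers (none of them exist in the codebase) and a log that records release order:

```rust
/// Hypothetical manually-managed resource; nothing frees it automatically.
pub struct Handle {
    name: &'static str,
}

fn acquire(name: &'static str) -> Handle {
    Handle { name }
}

/// Explicit release; records the name so the order can be inspected.
fn release(handle: Handle, log: &mut Vec<&'static str>) {
    log.push(handle.name);
}

/// Stand-in for the last fallible step (the encoder open).
pub fn open_final(fail: bool) -> Result<u32, String> {
    if fail {
        Err("encoder open failed".to_string())
    } else {
        Ok(42)
    }
}

/// On failure, release everything acquired so far before returning the error.
/// On success, ownership of the handles moves to the caller.
pub fn open_session(
    fail: bool,
    log: &mut Vec<&'static str>,
) -> Result<(u32, Vec<Handle>), String> {
    let graph = acquire("graph");
    let drm_frames = acquire("drm_frames");
    let vaapi_device = acquire("vaapi_device");
    let drm_device = acquire("drm_device");
    let enc = match open_final(fail) {
        Ok(enc) => enc,
        Err(e) => {
            release(graph, log);
            release(drm_frames, log);
            release(vaapi_device, log);
            release(drm_device, log);
            return Err(e);
        }
    };
    Ok((enc, vec![graph, drm_frames, vaapi_device, drm_device]))
}
```

A plain `open_final(fail)?` in place of the `match` would compile just as well, which is why this kind of leak is easy to miss in FFI code where the handles have no `Drop`.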
@@ -600,6 +706,23 @@ impl DmabufInner {
dmabuf.fourcc,
self.fourcc
);
// SAFETY: The `ensure!` above checked `dmabuf.fourcc == self.fourcc`.
// * `std::mem::zeroed::<AVDRMFrameDescriptor>()` is sound: it is a `#[repr(C)]` POD of ints and
// nested int-struct arrays (no `NonNull`/refs), for which all-zero is a valid bit pattern;
// `Box` puts it on the heap with a unique owner.
// * `dmabuf.fd.as_raw_fd()` is the fd of the caller's `&DmabufFrame`, which owns it for the
// whole synchronous `submit`; we describe one object/layer/plane from its
// fourcc/modifier/offset/stride and pass `object.size = 0` (ffmpeg queries the real size).
// * `av_frame_alloc` → `drm` (null-checked); we set its scalar fields and
// `hw_frames_ctx = av_buffer_ref(self.drm_frames)` (new ref of the live owned ctx).
// * `data[0] = Box::into_raw(desc)` transfers the box into the frame; `buf[0] =
// av_buffer_create(.., free_desc, ..)` registers a destructor that reclaims it exactly once
// when the buffer's refcount hits zero — matched alloc/free, no leak/double-free.
// * `av_buffersrc_add_frame_flags(self.src, drm, KEEP_REF)` pushes a ref into the live
// buffersrc; KEEP_REF keeps our own `drm` ref, which we then `av_frame_free`. We pull the
// converted surface with `av_buffersink_get_frame(self.sink, nv12)` BEFORE returning, so the
// dmabuf (owned by the caller) is read while still valid. `nv12` is sent into the live owned
// `self.enc` (takes its own ref) and our ref freed once. Single-threaded encoder → no race.
unsafe {
// Build a DRM-PRIME AVFrame describing the dmabuf (one object/fd, one layer/plane).
let mut desc: Box<ffi::AVDRMFrameDescriptor> = Box::new(std::mem::zeroed());
@@ -626,6 +749,11 @@ impl DmabufInner {
// Own the descriptor so it frees with the frame (the fd is owned by the DmabufFrame,
// which outlives this call — the graph reads the surface before submit returns).
extern "C" fn free_desc(_opaque: *mut std::ffi::c_void, data: *mut u8) {
// SAFETY: `data` is exactly the pointer produced by `Box::into_raw(desc)` and passed as
// `av_buffer_create`'s first arg, which libav hands back verbatim to this callback. It
// is a valid, uniquely-owned `Box<AVDRMFrameDescriptor>` raw pointer; libav invokes the
// callback exactly once (when the last buffer ref drops), so `from_raw` + `drop`
// reclaims it exactly once — no double-free. `_opaque` is unused (we passed null).
unsafe { drop(Box::from_raw(data as *mut ffi::AVDRMFrameDescriptor)) };
}
(*drm).buf[0] = ffi::av_buffer_create(
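Review note: the `Box::into_raw` / `free_desc` pairing can be exercised without libav. The sketch below assumes a hypothetical `FakeBuffer` that plays the role of the buffer created by `av_buffer_create`, and a hypothetical `Desc` in place of `AVDRMFrameDescriptor`. The counter exists only so the single-free claim can be checked.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the descriptor: plain ints only.
#[repr(C)]
pub struct Desc {
    pub nb_objects: i32,
    pub fd: i32,
}

/// Counts callback invocations (for checking the sketch only).
static FREED: AtomicUsize = AtomicUsize::new(0);

pub fn freed_count() -> usize {
    FREED.load(Ordering::SeqCst)
}

/// Same `(opaque, data)` shape as a C buffer free callback.
pub extern "C" fn free_desc(_opaque: *mut c_void, data: *mut u8) {
    // SAFETY: `data` is the pointer `Box::into_raw` produced in `FakeBuffer::new`, and the
    // owner invokes this callback exactly once (from `Drop`).
    unsafe { drop(Box::from_raw(data as *mut Desc)) };
    FREED.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical owner of `data` that reclaims it through a C-style destructor.
pub struct FakeBuffer {
    data: *mut u8,
    free: extern "C" fn(*mut c_void, *mut u8),
}

impl FakeBuffer {
    pub fn new(desc: Box<Desc>) -> Self {
        // Ownership moves into the raw pointer; only `free` may reclaim it.
        Self { data: Box::into_raw(desc) as *mut u8, free: free_desc }
    }

    pub fn fd(&self) -> i32 {
        // SAFETY: `data` points to the live `Desc` until `Drop` runs.
        unsafe { (*(self.data as *const Desc)).fd }
    }
}

impl Drop for FakeBuffer {
    fn drop(&mut self) {
        (self.free)(std::ptr::null_mut(), self.data);
    }
}
```

The point of the design is that the allocation and the free live on opposite sides of the FFI boundary but still match one to one.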
@@ -673,6 +801,13 @@ impl DmabufInner {
impl Drop for DmabufInner {
fn drop(&mut self) {
// SAFETY: `graph`/`drm_frames`/`vaapi_device`/`drm_device` are the non-null objects
// `DmabufInner::open` built and moved into `self` (open bails before constructing `Self` if any
// alloc fails). `avfilter_graph_free` frees the graph (and the per-filter device refs it owns);
// each `av_buffer_unref` drops one ref and nulls the pointer via `&mut`. `DmabufInner` owns all
// four exclusively and `Drop` runs once → no double-free/use-after-free. The graph is freed
// first (it holds refs on the devices), then frames, then the derived VAAPI device, then DRM.
// (`self.enc` drops via ffmpeg-next afterward, holding its own refs.)
unsafe {
ffi::avfilter_graph_free(&mut self.graph);
ffi::av_buffer_unref(&mut self.drm_frames);
@@ -703,6 +838,13 @@ pub struct VaapiEncoder {
}
// Raw FFI pointers; the encoder lives on a single thread (same contract as `NvencEncoder`).
// SAFETY: `VaapiEncoder`'s `Inner` holds raw FFI pointers (`SwsContext`, `AVFrame`, `AVBufferRef`,
// `AVFilterContext`, `AVCodecContext`) that are not `Send` by default. The encoder is owned and
// driven by exactly ONE thread — the host's per-session encode thread it is moved (transferred) to —
// and is only ever touched through `&mut self` methods, so it is never aliased or accessed
// concurrently from two threads. None of the underlying libav/libswscale objects have thread
// affinity (they are not thread-local), so transferring ownership across threads is sound. This
// asserts `Send` (transfer) only; `Sync` (shared `&`) is deliberately NOT implemented.
unsafe impl Send for VaapiEncoder {}
impl VaapiEncoder {
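Review note: the `Send`-only contract described above (transfer to one thread, `&mut self` methods, no `Sync`) can be shown on a toy type. `RawCounter` and `run_on_worker` are hypothetical and only illustrate the shape of the argument, not the encoder itself.

```rust
use std::thread;

/// Hypothetical handle wrapping a raw pointer, as an FFI encoder context would.
pub struct RawCounter {
    ptr: *mut u64,
}

impl RawCounter {
    pub fn new() -> Self {
        Self { ptr: Box::into_raw(Box::new(0)) }
    }

    /// `&mut self` lets the borrow checker rule out concurrent use.
    pub fn bump(&mut self) -> u64 {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is freed only in `Drop`.
        unsafe {
            *self.ptr += 1;
            *self.ptr
        }
    }
}

impl Drop for RawCounter {
    fn drop(&mut self) {
        // SAFETY: sole owner, and `Drop` runs once, so the box is reclaimed exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// SAFETY: the heap allocation has no thread affinity and the handle is never shared by
// reference, so moving it to another thread is sound. `Sync` is deliberately not implemented.
unsafe impl Send for RawCounter {}

/// Moves the handle to a worker thread, drives it there, and returns the final count.
pub fn run_on_worker(mut counter: RawCounter, frames: u32) -> u64 {
    thread::spawn(move || {
        let mut last = 0;
        for _ in 0..frames {
            last = counter.bump();
        }
        last
    })
    .join()
    .expect("worker thread panicked")
}
```

Without the `unsafe impl Send`, the `thread::spawn` call would not compile, because a raw pointer field makes the type `!Send` by default.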
@@ -720,6 +862,9 @@ impl VaapiEncoder {
}
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `av_log_set_level` sets libav's global integer log level; `48` (= AV_LOG_DEBUG)
// is a valid level and there are no pointer args. libav was just initialized by the
// `ffmpeg::init()` above, so the call is always sound.
unsafe { ffi::av_log_set_level(48) };
}
// Validate the codec/format up front so a bad request fails at open, not on the first frame.
@@ -28,6 +28,8 @@
//! through `ffmpeg::ffi` (= `ffmpeg_sys_next`), exactly as the Linux CUDA/VAAPI paths do. The
//! `AVD3D11VADeviceContext`/`AVD3D11VAFramesContext` layouts are mirrored (the bindings don't
//! allowlist `hwcontext_d3d11va.h`), as [`super::linux`] mirrors `AVCUDADeviceContext`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder};
use crate::capture::{dxgi::D3d11Frame, CapturedFrame, FramePayload, PixelFormat};
@@ -243,6 +245,12 @@ pub fn probe_can_encode(vendor: WinVendor, codec: Codec) -> bool {
if ffmpeg::init().is_err() {
return false;
}
// SAFETY: `ffmpeg::init()` succeeded above, so libav's global state is initialised.
// `av_log_get_level`/`av_log_set_level` are global scalar getters/setters with no pointer args.
// `open_win_encoder` (the `unsafe fn`) is called with null `device_ref`/`frames_ref` (the system
// path), so it touches no D3D11/hwcontext — it only allocates and opens a self-contained
// libavcodec encoder that is dropped at the end of `.is_ok()`. We restore the prior log level and
// no raw pointer escapes the block.
unsafe {
// A missing AMF/QSV runtime (wrong-vendor host, GPU-less CI) is an expected probe outcome —
// quiet ffmpeg's open error for the probe, then restore the level.
@@ -337,6 +345,10 @@ impl SystemInner {
} else {
ffi::AVPixelFormat::AV_PIX_FMT_NV12
};
// SAFETY: calls the `unsafe fn open_win_encoder` with null `device_ref`/`frames_ref`, so the
// system path is taken (no hw device/frames context is touched); all other args are scalars.
// The returned `encoder::video::Encoder` owns its `AVCodecContext` and frees it on drop; no raw
// pointer is aliased.
let enc = unsafe {
open_win_encoder(
vendor,
@@ -352,6 +364,11 @@ impl SystemInner {
ptr::null_mut(),
)?
};
// SAFETY: `av_frame_alloc` returns a freshly-allocated, uniquely-owned `AVFrame` (null-checked
// before any deref); writing `format`/`width`/`height` through `*f` stays inside that
// allocation. `av_frame_get_buffer(f, 0)` allocates the backing planes — on failure we
// `av_frame_free` the sole owner (no double-free) and bail; on success the raw `f` is moved into
// `self.sw_frame` and freed exactly once in `Drop`.
let sw_frame = unsafe {
let f = ffi::av_frame_alloc();
if f.is_null() {
@@ -467,6 +484,18 @@ impl SystemInner {
} else {
DXGI_FORMAT_NV12
};
// SAFETY: `ensure_staging` builds a STAGING texture (CPU_ACCESS_READ) matching `dxgi_fmt` on
// `frame.device` — the same `ID3D11Device` that owns `frame.texture` — and caches that device's
// immediate context in `self.ctx`. `src`/`dst` are that device's textures of identical NV12/P010
// format and dimensions, so `CopyResource` on the single-threaded immediate context is valid.
// `Map(.., D3D11_MAP_READ)` succeeds on a staging texture and yields `map.pData` valid for the
// whole resource; for NV12/P010 the luma plane is `H` rows at `RowPitch` and the chroma plane
// follows at byte offset `RowPitch*H` (`H/2` rows), so `total = pitch*(H+⌈H/2⌉)` is exactly the
// mapped extent and `from_raw_parts(base, total)` stays in-bounds. Each `copy_nonoverlapping`
// reads a bounds-checked `mapped[..]` sub-slice (`row_bytes ≤ pitch`) and writes `row_bytes ≤
// linesize` into the `av_frame_get_buffer`-allocated plane at row `y < H`, so every destination
// offset is inside the frame's plane allocation; src and dst never alias. `Unmap` pairs `Map`,
// then `send` (the `unsafe fn`) hands `sw_frame` to the encoder.
unsafe {
self.ensure_staging(&frame.device, dxgi_fmt)?;
let staging = self.staging.clone().context("staging texture")?;
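Review note: the bounds argument in the SAFETY comment (luma rows at `RowPitch`, chroma plane at offset `RowPitch*H`, total extent `pitch*(H+⌈H/2⌉)`) can be checked in safe Rust. The function below is a hypothetical sketch that copies into packed `Vec`s rather than an `AVFrame`, so it shows the layout arithmetic only.

```rust
/// Copies an NV12 image out of a row-pitched mapping into tightly packed Y and UV planes.
/// Assumes `mapped` holds `height` luma rows followed by `ceil(height / 2)` interleaved-UV
/// rows, each `pitch` bytes apart. Returns `None` if the mapping is too small.
pub fn unpack_nv12(
    mapped: &[u8],
    pitch: usize,
    width: usize,
    height: usize,
) -> Option<(Vec<u8>, Vec<u8>)> {
    let chroma_rows = (height + 1) / 2;
    // One U byte and one V byte per 2x2 luma block.
    let uv_row_bytes = 2 * ((width + 1) / 2);
    let total = pitch.checked_mul(height + chroma_rows)?;
    if uv_row_bytes > pitch || mapped.len() < total {
        return None;
    }

    let mut y = Vec::with_capacity(width * height);
    for row in 0..height {
        let start = row * pitch;
        y.extend_from_slice(&mapped[start..start + width]);
    }

    // The chroma plane starts right after the last luma row.
    let chroma_base = pitch * height;
    let mut uv = Vec::with_capacity(uv_row_bytes * chroma_rows);
    for row in 0..chroma_rows {
        let start = chroma_base + row * pitch;
        uv.extend_from_slice(&mapped[start..start + uv_row_bytes]);
    }
    Some((y, uv))
}
```

The up-front size check plays the role of the `total` computation in the comment: once it passes, every per-row slice is inside the mapping.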
@@ -510,6 +539,14 @@ impl SystemInner {
if self.ten_bit {
bail!("ffmpeg_win: BGRA readback is 8-bit only (HDR needs the P010 capture path)");
}
// SAFETY: `ensure_staging` builds a B8G8R8A8 STAGING texture on `frame.device` and caches that
// device's immediate context; `src`/`dst` are that device's textures of matching BGRA format,
// so `CopyResource` on the single-threaded context is valid. `Map(READ)` on the staging texture
// yields `base` valid for `h` rows of `pitch` bytes. `ensure_sws` lazily builds the BGRA→NV12 context;
// `sws_scale` reads `h` rows of `pitch` bytes from `base` (in-bounds — the staging surface is
// `≥ pitch*h`) into the `sw_frame` planes addressed by its `data`/`linesize` (allocated for
// `width`×`height` NV12). `Unmap` pairs `Map`; the cached `sws` is freed once in `Drop`. The
// mapped read region never aliases the owned encoder frame.
unsafe {
self.ensure_staging(&frame.device, DXGI_FORMAT_B8G8R8A8_UNORM)?;
let staging = self.staging.clone().context("staging texture")?;
@@ -552,6 +589,13 @@ impl SystemInner {
/// R10 shader output instead of P010. DXGI `R10G10B10A2_UNORM` (R in the low 10 bits, X2 alpha in
/// the top 2) == FFmpeg `AV_PIX_FMT_X2BGR10LE`. UNTESTED on glass (no AMD/Intel Windows box).
fn readback_rgb10(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
// SAFETY: same shape as `readback_yuv`/`readback_bgra` — `ensure_staging` builds an
// R10G10B10A2 STAGING texture on `frame.device` and caches its immediate context; `src`/`dst`
// are that device's matching-format textures, so `CopyResource` on the single-threaded context
// is valid. `Map(READ)` yields `base` valid for `h` rows of `pitch` bytes. `ensure_sws` builds the
// X2BGR10LE→P010 (BT.2020) context; `sws_scale` reads `h` rows of `pitch` bytes from `base`
// (in-bounds) into the `sw_frame` P010 planes (`data`/`linesize`, allocated `width`×`height`).
// `Unmap` pairs `Map`; `sws` is freed once in `Drop`. No aliasing between read and write.
unsafe {
self.ensure_staging(&frame.device, DXGI_FORMAT_R10G10B10A2_UNORM)?;
let staging = self.staging.clone().context("staging texture")?;
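Review note: the doc comment's bit layout for `R10G10B10A2_UNORM` (R in the low 10 bits, 2-bit alpha on top) can be written out as shifts and masks on a little-endian 32-bit word. `pack_r10g10b10a2` and `unpack_r10g10b10a2` are hypothetical helpers for illustration and are not part of the encoder.

```rust
/// Packs 10-bit R, G, B and 2-bit A into one word: R in bits 0..=9, G in 10..=19,
/// B in 20..=29, A in 30..=31. Inputs are masked to their field width.
pub fn pack_r10g10b10a2(r: u32, g: u32, b: u32, a: u32) -> u32 {
    (r & 0x3ff) | ((g & 0x3ff) << 10) | ((b & 0x3ff) << 20) | ((a & 0x3) << 30)
}

/// Splits a packed word back into `(r, g, b, a)`.
pub fn unpack_r10g10b10a2(v: u32) -> (u32, u32, u32, u32) {
    (v & 0x3ff, (v >> 10) & 0x3ff, (v >> 20) & 0x3ff, v >> 30)
}
```

Read from the most significant bit down, the fields are X2, B, G, R, which is where the `X2BGR10` name comes from.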
@@ -605,6 +649,12 @@ impl SystemInner {
let h = self.height as usize;
let src_row = w * format.bytes_per_pixel();
anyhow::ensure!(bytes.len() >= src_row * h, "captured buffer too small");
// SAFETY: `ensure_sws` lazily builds the (packed RGB/BGR)→NV12 context for this fixed src/dst
// format pair. `src_data[0] = bytes.as_ptr()` with `src_stride[0] = src_row`; the `ensure!`
// above guarantees `bytes` holds at least `src_row*h` bytes, so `sws_scale` reads `h` rows of
// `src_row` bytes in-bounds and writes the `sw_frame` NV12 planes (`data`/`linesize`, allocated
// `width`×`height`). `bytes` is borrowed for the call only and never aliases the owned
// `sw_frame`. `send` then hands `sw_frame` to the encoder.
unsafe {
self.ensure_sws(
pixel_to_av(sws_src(format)?),
@@ -667,6 +717,10 @@ impl SystemInner {
impl Drop for SystemInner {
fn drop(&mut self) {
// SAFETY: `sw_frame` is the `AVFrame` allocated in `open` (or null) — `av_frame_free` drops it
// once and nulls the pointer through the `&mut`; `sws` is the cached `SwsContext` (or null) —
// `sws_freeContext` frees it once. This `Drop` runs exactly once and `SystemInner` owns both
// exclusively, so there is no double-free or use-after-free.
unsafe {
if !self.sw_frame.is_null() {
ffi::av_frame_free(&mut self.sw_frame);
@@ -745,6 +799,12 @@ impl D3d11Hw {
impl Drop for D3d11Hw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `D3d11Hw::new` created
// (it bails before constructing `Self` if either alloc/init fails, so a live `D3d11Hw` always
// holds both). `av_buffer_unref` drops one reference and nulls the pointer through the `&mut`.
// This `Drop` runs exactly once and `D3d11Hw` owns these refs exclusively → no double-free /
// use-after-free. Frames are unref'd before the device because the frames ctx internally holds
// a ref on the device (refcounted, so the order is sound either way).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
@@ -800,6 +860,18 @@ impl ZeroCopyInner {
WinVendor::Qsv => (D3D11_BIND_DECODER.0 | D3D11_BIND_VIDEO_ENCODER.0) as u32,
};
const POOL: c_int = 8;
// SAFETY: `D3d11Hw::new` wraps the capturer's `device` as a D3D11VA hwdevice (handing FFmpeg an
// owned AddRef of it, balanced by FFmpeg's teardown Release) and builds an owned
// device_ref/frames_ref pair freed by `D3d11Hw::Drop`; `hw` is a local, so it is dropped (and
// both refs freed) on every early `return Err`. For QSV, `av_hwdevice_ctx_create_derived` and
// `av_hwframe_ctx_create_derived` fill the null-initialised `qsv_device`/`qsv_frames` out-params
// only on success (`r >= 0` checked); on the frames-derive failure we unref the already-created
// `qsv_device` before bailing. `open_win_encoder` internally `av_buffer_ref`s the dev/frames
// refs it is given (so ownership of `hw`'s and the derived refs stays here), and on its failure
// we unref the still-owned derived `qsv_frames`/`qsv_device` (null for AMF → skipped) and return
// — `hw` then drops its D3D11 refs. On success the derived refs are moved into `ZeroCopyInner`
// (freed in its `Drop`) and the encoder holds its own AddRef'd copies. Every `AVBufferRef` is
// unref'd exactly once across all paths — no leak, no double-free.
unsafe {
let hw = D3d11Hw::new(device, sw_av, bind_flags, width, height, POOL)?;
let (pix_fmt, dev_ref, frames_ref, mut qsv_device, mut qsv_frames) = match vendor {
@@ -887,6 +959,19 @@ impl ZeroCopyInner {
}
fn submit(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
// SAFETY: `d3d = av_frame_alloc()` is a fresh owned frame (null-checked) and is `av_frame_free`d
// exactly once on every path below. `av_hwframe_get_buffer` fills it from the pool — on failure
// we free it and bail. `(*d3d).data[0]` is the pool's texture-array and `data[1]` the array
// index; `from_raw_borrowed` borrows that `ID3D11Texture2D` WITHOUT taking ownership (no Release
// — the frame owns it) and is null-checked. `src` (the captured texture) and `dst` (the pooled
// slice) live on the SAME D3D11 device wrapped by `self.hw`, and the caller guarantees
// `captured.format == pool_format` before calling, so `CopySubresourceRegion(dst, dst_index, ..,
// src, 0, ..)` on the single-threaded immediate context `self.ctx` is a valid same-format GPU
// copy. For QSV the mapped `qsv` frame is a fresh owned frame whose `hw_frames_ctx` takes an
// `av_buffer_ref` of `self.qsv_frames`; it is `av_frame_free`d (releasing that ref) on both the
// map-failure and success paths. `avcodec_send_frame` only internally refs the input frame, so
// the `av_frame_free(d3d)`/`av_frame_free(qsv)` afterwards are the sole owning frees — no leak,
// no double-free, no use-after-free.
unsafe {
// Pull a pooled D3D11 surface; its data[0] is the pool's texture-ARRAY, data[1] the slice.
let mut d3d = ffi::av_frame_alloc();
@@ -959,6 +1044,11 @@ impl ZeroCopyInner {
impl Drop for ZeroCopyInner {
fn drop(&mut self) {
// SAFETY: `qsv_frames`/`qsv_device` are the derived QSV `AVBufferRef`s (or null for AMF); each
// is `av_buffer_unref`'d once here (nulling the pointer through the `&mut`) — `ZeroCopyInner`
// owns these handles exclusively and this `Drop` runs once, so no double-free. The `enc` and
// `hw` fields free the encoder's AddRef'd copies and the D3D11 device/frames refs through their
// own `Drop`, so all references stay balanced.
unsafe {
if !self.qsv_frames.is_null() {
ffi::av_buffer_unref(&mut self.qsv_frames);
@@ -996,6 +1086,13 @@ pub struct FfmpegWinEncoder {
}
// Raw FFI pointers + COM objects; the encoder lives on a single thread (same contract as NVENC/VAAPI).
// SAFETY: `FfmpegWinEncoder` owns raw libav pointers (`AVFrame`/`SwsContext`/`AVBufferRef`) and
// windows-rs COM handles (`ID3D11Device`/`ID3D11DeviceContext`/textures) that are not auto-`Send`. The
// session creates the encoder, drives `submit`/`poll`/`flush`, and drops it all on one dedicated encode
// thread; it is never shared by reference across threads, and the D3D11 immediate context is only ever
// touched from that thread. The only cross-thread action is the initial move to the encode thread,
// after which every interior pointer/COM ref is used single-threaded — the same contract the
// NVENC/VAAPI encoders rely on. No interior state is accessed concurrently.
unsafe impl Send for FfmpegWinEncoder {}
impl FfmpegWinEncoder {
@@ -1012,6 +1109,8 @@ impl FfmpegWinEncoder {
) -> Result<Self> {
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `ffmpeg::init()` ran on the line above, so libav is initialised; `av_log_set_level`
// is a global scalar setter with no pointer arguments.
unsafe { ffi::av_log_set_level(48) };
}
// Make sure the encoder name exists in this libavcodec build up front (clear error vs a
@@ -13,6 +13,9 @@
//! Needs a real NVIDIA GPU at runtime (session creation fails otherwise) — compiles GPU-less, but
//! `open`/`submit` only succeed on a GPU box. The software encoder (`super::sw`) is the fallback.
// Every `unsafe` block / impl in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder, EncoderCaps};
use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
use anyhow::{anyhow, bail, Context, Result};
@@ -88,7 +91,15 @@ pub struct NvencD3d11Encoder {
init_device: *mut c_void,
}
// Raw NVENC handle + COM ptrs; confined to the single encode thread (like the Linux encoder).
// SAFETY: the `!Send` fields are the raw NVENC session/device handles (`encoder`, `init_device`),
// the raw NVENC bitstream/registered/mapped pointers carried in `bitstreams`/`regs`/`pending`, and
// the `ID3D11Texture2D` COM refs — none of which may be touched concurrently from two threads. This
// encoder is owned by exactly one thread: it is moved onto the host encode thread once at
// construction, and every NVENC call and D3D11 access happens only from that thread thereafter
// (`submit`/`poll`/`invalidate_ref_frames`/`Drop` all run there, like the Linux encoder). Moving the
// handles across that single ownership-transfer boundary is sound because no NVENC/D3D11 call is in
// flight during the move and the session and its D3D11 immediate context are never shared (`&`) or
// used concurrently — so `Send` introduces no data race on the non-`Send` fields.
unsafe impl Send for NvencD3d11Encoder {}
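The single-thread confinement argument in the SAFETY comment above can be illustrated with a minimal sketch. `RawCounter` and `run_on_worker` are hypothetical stand-ins, not names from this repository: a type that owns a raw pointer is `!Send` by default, and the manual `Send` impl is justified only because the value is moved to one thread once and every access, including `Drop`, happens there.

```rust
use std::thread;

/// Hypothetical stand-in for an encoder that owns a raw handle. The raw pointer
/// makes the type `!Send` by default, like raw session handles or COM pointers.
struct RawCounter {
    ptr: *mut u64,
}

impl RawCounter {
    fn new() -> Self {
        Self { ptr: Box::into_raw(Box::new(0)) }
    }

    fn bump(&mut self) -> u64 {
        // SAFETY: `ptr` came from `Box::into_raw` in `new`, is freed only in `Drop`,
        // and `&mut self` rules out any concurrent access.
        unsafe {
            *self.ptr += 1;
            *self.ptr
        }
    }
}

impl Drop for RawCounter {
    fn drop(&mut self) {
        // SAFETY: `ptr` is the live allocation from `new`; `drop` runs exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// SAFETY: the value is moved to exactly one worker thread and every access
// (`bump`, `Drop`) happens there; the pointer is never shared by reference.
unsafe impl Send for RawCounter {}

/// Moves the counter onto a worker thread once, uses it only there, and
/// returns the last value observed.
fn run_on_worker(bumps: u32) -> u64 {
    let mut counter = RawCounter::new();
    thread::spawn(move || {
        let mut last = 0;
        for _ in 0..bumps {
            last = counter.bump();
        }
        last
    })
    .join()
    .expect("worker thread panicked")
}
```

Calling `run_on_worker(3)` performs the one ownership transfer the SAFETY comment describes; without the `unsafe impl Send`, the `thread::spawn` call would not compile.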
impl NvencD3d11Encoder {
@@ -403,6 +414,17 @@ impl NvencD3d11Encoder {
/// Lazily create the session on the first frame's D3D11 device (so capture + encode share it).
fn init_session(&mut self, device: &ID3D11Device) -> Result<()> {
// SAFETY: every call below goes through a function pointer resolved once from the loaded
// `nvidia_video_codec_sdk::ENCODE_API` (`nvEncodeAPI`) table, or through this type's own
// `unsafe fn`s whose contract is met here. `query_caps`/`try_open_session` receive `device`,
// the live `ID3D11Device` the caller pulled off the first frame; each returns either a valid
// open NVENC session handle or an `Err`. `destroy_encoder` is only ever called on a handle a
// `try_open_session` just returned (and `best` only when `!best.is_null()`), so it never frees
// a dangling or null session. `create_bitstream_buffer` is passed `enc` — the one chosen live
// session — and `&mut cb`, a `#[repr(C)] NV_ENC_CREATE_BITSTREAM_BUFFER` whose `version` is set
// to `NV_ENC_CREATE_BITSTREAM_BUFFER_VER`; `cb` lives across the synchronous call and its
// returned `bitstreamBuffer` is copied into `self.bitstreams` before `cb` drops. No handle
// escapes the encode thread.
unsafe {
// Probe real GPU caps first (max dims / 10-bit / custom-VBV / RFI) so the config below is
// gated on what this card supports and an out-of-range mode fails with a clear error
@@ -589,6 +611,11 @@ impl Encoder for NvencD3d11Encoder {
new = format!("{}x{}", captured.width, captured.height),
"NVENC: capture device/size/HDR changed — re-initializing session"
);
// SAFETY: `teardown` (an `unsafe fn`) requires the encode thread with no NVENC call in
// flight and a session whose cached regs/bitstreams/pending all belong to `self.encoder`.
// All hold: this is the synchronous encode thread, `self.inited` so `self.encoder` is the
// live session every cached resource was created against, and the previous frame's encode
// has already been polled (synchronous submit→poll), so nothing is mid-encode.
unsafe { self.teardown() };
}
if !self.inited {
@@ -609,7 +636,14 @@ impl Encoder for NvencD3d11Encoder {
self.bit_depth = 10;
nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ABGR10
}
-PixelFormat::Nv12 => nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_NV12,
+PixelFormat::Nv12 => {
+// NV12 is 8-bit 4:2:0. Force 8-bit so a transition from a prior P010 (10-bit) session,
+// or a 10-bit-negotiated client on an SDR display, re-inits at the matching depth.
+// Unlike ARGB (which NVENC upconverts to Main10), NV12 cannot feed a 10-bit session:
+// `register_resource` rejects it as InvalidParam (the HDR→SDR-toggle stream drop).
+self.bit_depth = 8;
+nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_NV12
+}
_ => nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB,
};
let device = frame.device.clone();
@@ -618,6 +652,21 @@ impl Encoder for NvencD3d11Encoder {
}
let slot = self.next % POOL;
self.next += 1;
// SAFETY: every NVENC call goes through a function pointer from the loaded `ENCODE_API` table
// and takes `self.encoder`, the live session `init_session` just established (non-null on the
// path that reaches here). `NV_ENC_REGISTER_RESOURCE rr` has `version =
// NV_ENC_REGISTER_RESOURCE_VER` and registers `frame.texture` — a D3D11 texture from
// `frame.device`, which is the SAME device the session was opened against (any device change
// tears down and re-inits above, so `init_device == frame.device.as_raw()` here); the cloned
// `ID3D11Texture2D` is kept alive in `regs` so NVENC's registration never outlives the texture.
// `mp` (`NV_ENC_MAP_INPUT_RESOURCE`, version set) maps that registration and the map is recorded
// in `pending` to be unmapped exactly once in `poll`/`teardown`. `pic` (`NV_ENC_PIC_PARAMS`,
// version set) points `inputBuffer` at `mp.mappedResource` and `outputBitstream` at the live
// pool bitstream `bitstreams[slot]`; the optional SEI scratch (`mastering_sei`/`cll_sei` and the
// `sei` Vec whose `as_mut_ptr()` is written into the codec union) are stack locals that outlive
// the synchronous `encode_picture`. Every `#[repr(C)]` param is a live local borrowed `&mut`
// for the duration of its one synchronous call. (In-place encode without `CopyResource` is
// sound because the encode loop is synchronous, as the module docs state.)
unsafe {
// Register the capturer's texture with NVENC once (cached by raw pointer), then encode it
// IN PLACE — no `CopyResource` into an encoder-owned pool. This is the zero-copy win: the
@@ -774,6 +823,12 @@ impl Encoder for NvencD3d11Encoder {
// We tag each input with `inputTimeStamp = frame_idx` (0,1,2,…), which is also the client's
// frame number (the packetizer numbers frames in submit order), so the client's lost-frame
// range maps 1:1 onto the timestamps NVENC invalidates here.
// SAFETY: `invalidate_ref_frames` is a function pointer from the loaded `ENCODE_API` table.
// `self.encoder` was checked non-null at the top of this fn and is the live session; this runs
// on the encode thread (like submit/poll), so there is no concurrent NVENC use. Each `ts` was
// clamped to `[oldest_in_dpb, frame_idx - 1]` above, so it names a frame still in the session's
// DPB; the call passes only that `u64` timestamp (no struct), so there is no struct-size or
// lifetime concern.
unsafe {
for ts in first..=last {
if (API.invalidate_ref_frames)(self.encoder, ts as u64)
@@ -792,6 +847,16 @@ impl Encoder for NvencD3d11Encoder {
let Some((bs, map, pts_ns)) = self.pending.pop_front() else {
return Ok(None);
};
// SAFETY: a non-empty `pending` implies `submit` ran, so `self.encoder` is the live session
// (`teardown` clears `pending` whenever it nulls the handle); all calls below use function
// pointers from the loaded `ENCODE_API` table on the encode thread. `NV_ENC_LOCK_BITSTREAM lock`
// (version = `NV_ENC_LOCK_BITSTREAM_VER`) locks `bs`, a pool bitstream a prior `encode_picture`
// targeted; `lock_bitstream` blocks until that encode finishes, so on success
// `lock.bitstreamBufferPtr` is non-null and points at `lock.bitstreamSizeInBytes` bytes of
// NVENC-owned, CPU-readable output valid until `unlock_bitstream`. The `from_raw_parts` slice is
// only read (copied via `to_vec()`) BEFORE `unlock_bitstream(bs)` — lock and unlock pair on the
// same buffer — so it never outlives the lock. `map` (the input resource paired with `bs` in
// `pending`) is unmapped here, after the encode completed, exactly once.
unsafe {
let mut lock = nv::NV_ENC_LOCK_BITSTREAM {
version: nv::NV_ENC_LOCK_BITSTREAM_VER,
@@ -831,6 +896,11 @@ impl Encoder for NvencD3d11Encoder {
impl Drop for NvencD3d11Encoder {
fn drop(&mut self) {
// SAFETY: `teardown` (an `unsafe fn`) needs the owning thread with no NVENC call in flight and
// a session whose cached resources all belong to `self.encoder`. At Drop this encoder is owned
// exclusively (no other reference can exist), runs on the encode thread it was confined to, and
// `teardown` early-returns when `self.encoder` is null; otherwise every cached reg/bitstream/
// pending was created against that live session. It runs exactly once (here).
unsafe { self.teardown() };
}
}
@@ -2,6 +2,8 @@
//! fallback when NVENC is unavailable). Low-latency screen-content config: single-reference,
//! no B-frames (Baseline), bitrate rate-control, in-band SPS/PPS each IDR, BT.709 limited range.
//! Synchronous: `submit` encodes immediately and stashes the AU for `poll` (no internal queue).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{EncodedFrame, Encoder};
use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
@@ -30,6 +32,12 @@ pub struct OpenH264Encoder {
}
// openh264's Encoder holds a raw C handle (not auto-Send); it lives on the single encode thread.
// SAFETY: `OpenH264Encoder` wraps `Oh264` (openh264's `Encoder`), which holds a raw C handle to the
// openh264 `ISVCEncoder` and is not auto-`Send`; the other fields (`YUVBuffer`, `Vec`, scalars,
// `Option<EncodedFrame>`) are plain owned data. The session creates the encoder, calls
// `submit`/`poll`/`flush`, and drops it all on one dedicated encode thread, never sharing it by
// reference across threads, so the C handle is only ever touched from a single thread. Moving the
// whole value to that thread is therefore sound — there is no concurrent access to the handle.
unsafe impl Send for OpenH264Encoder {}
impl OpenH264Encoder {
+48 -1
@@ -17,6 +17,9 @@
//! data packets are consumed immediately and missing parity only costs loss recovery — so
//! the validated stereo path stays byte-identical (data packets only, exactly as before).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
#[cfg(any(target_os = "linux", target_os = "windows", test))]
use crate::audio::SAMPLE_RATE;
#[cfg(any(target_os = "linux", target_os = "windows"))]
@@ -409,7 +412,10 @@ struct MsEncoder {
st: std::ptr::NonNull<audiopus_sys::OpusMSEncoder>,
}
// The raw encoder state has no thread affinity; the session owns it on one thread at a time.
// SAFETY: `MsEncoder` owns a unique `OpusMSEncoder` via `NonNull` (it is neither `Clone` nor
// `Sync`, so the pointer is never aliased). libopus's multistream encoder state is a self-contained
// heap allocation with no thread-local or thread-affine state, so moving ownership to another thread
// is sound; every method takes `&mut self`, keeping access single-threaded at any instant.
#[cfg(target_os = "linux")]
unsafe impl Send for MsEncoder {}
@@ -418,6 +424,13 @@ impl MsEncoder {
fn new(layout: &OpusLayout) -> Result<MsEncoder> {
use std::os::raw::c_int;
let mut err: c_int = 0;
// SAFETY: every scalar arg is a valid libopus input (sample rate, channel/stream/coupled
// counts, the RESTRICTED_LOWDELAY application constant). `layout.mapping.as_ptr()` addresses
// a 'static slice of exactly `layout.channels` bytes (every `OpusLayout` constant upholds
// that), which is the element count `opus_multistream_encoder_create` reads through it, and
// `&mut err` is a live local the call writes its status into. libopus copies the mapping into
// its own allocation, so the pointer need only be valid for the call; the returned pointer is
// null/`OPUS_OK`-checked below before any use.
let st = unsafe {
audiopus_sys::opus_multistream_encoder_create(
SAMPLE_RATE as i32,
@@ -432,6 +445,11 @@ impl MsEncoder {
let st = std::ptr::NonNull::new(st)
.filter(|_| err == audiopus_sys::OPUS_OK)
.ok_or_else(|| anyhow::anyhow!("opus_multistream_encoder_create failed ({err})"))?;
// SAFETY: `st` is the non-null encoder `opus_multistream_encoder_create` just returned, owned
// exclusively here. Each `opus_multistream_encoder_ctl` call passes a valid request constant
// with the single by-value `c_int` argument that request's variadic ABI expects
// (`OPUS_SET_BITRATE_REQUEST` → bitrate, `OPUS_SET_VBR_REQUEST` → 0). No pointer escapes the
// call and the encoder outlives it.
unsafe {
audiopus_sys::opus_multistream_encoder_ctl(
st.as_ptr(),
@@ -453,6 +471,13 @@ impl MsEncoder {
samples_per_channel: usize,
out: &mut [u8],
) -> Result<usize> {
// SAFETY: `self.st` is the live encoder from `new`. libopus reads `samples_per_channel *
// channels` f32s through `frame.as_ptr()`; every caller passes a `frame` of exactly that
// length together with the matching `samples_per_channel` (`audio_body`'s `frame_len =
// samples_per_channel * layout.channels`; the round-trip tests size identically), so the read
// stays in bounds. `out.as_mut_ptr()` is written for at most `out.len()` bytes, which is
// passed as the capacity bound. Both buffers are live locals outliving this synchronous call;
// the return value is range-checked before being used as a length.
let n = unsafe {
audiopus_sys::opus_multistream_encode_float(
self.st.as_ptr(),
@@ -470,6 +495,9 @@ impl MsEncoder {
#[cfg(target_os = "linux")]
impl Drop for MsEncoder {
fn drop(&mut self) {
// SAFETY: `self.st` is the encoder `opus_multistream_encoder_create` returned; this
// `MsEncoder` owns it uniquely and `drop` runs exactly once, so the destroy frees it once
// with no subsequent use.
unsafe { audiopus_sys::opus_multistream_encoder_destroy(self.st.as_ptr()) }
}
}
@@ -761,6 +789,10 @@ mod tests {
let client_mapping = client_swap(&digits[3..]);
let mut err = 0i32;
// SAFETY: scalar args are valid libopus inputs. `client_mapping.as_ptr()` addresses a
// `Vec<u8>` of exactly `ch` entries (derived from the advertised surround-params), which is
// the element count the decoder reads through it, and `&mut err` is a live local the call
// writes. The returned pointer is `OPUS_OK`/non-null-checked immediately below before use.
let dec = unsafe {
audiopus_sys::opus_multistream_decoder_create(
SAMPLE_RATE as i32,
@@ -789,6 +821,11 @@ mod tests {
}
let n = enc.encode_float(&frame, samples, &mut out).unwrap();
assert!(n > 0);
// SAFETY: `dec` is the non-null decoder asserted above. `out.as_ptr()` is read for
// the `n` encoded bytes just produced by `encode_float`; `decoded.as_mut_ptr()` is
// written for up to `samples * ch` f32s and `decoded` is exactly that long; `samples`
// is the per-channel frame size. All buffers are live locals outliving the call; the
// return is checked to equal `samples`.
let got = unsafe {
audiopus_sys::opus_multistream_decode_float(
dec,
@@ -817,6 +854,8 @@ mod tests {
(energies: {energy:?})"
);
}
// SAFETY: `dec` is the decoder `opus_multistream_decoder_create` returned; the test owns it
// and destroys it exactly once here, after the final decode — no later use, no double free.
unsafe { audiopus_sys::opus_multistream_decoder_destroy(dec) };
}
@@ -853,6 +892,9 @@ mod tests {
let digits: Vec<u8> = s.bytes().map(|b| b - b'0').collect();
let client_mapping = client_swap(&digits[3..]);
let mut err = 0i32;
// SAFETY: scalar args are valid; `client_mapping.as_ptr()` addresses a 6-entry `Vec<u8>`
// (matches the 6-channel layout the decoder reads through it), alive past the call, and
// `&mut err` is a live local. The pointer is `OPUS_OK`-checked before use.
let dec = unsafe {
audiopus_sys::opus_multistream_decoder_create(
48000,
@@ -865,6 +907,10 @@ mod tests {
};
assert_eq!(err, audiopus_sys::OPUS_OK);
let mut pcm = vec![0f32; 240 * 6];
// SAFETY: `dec` is the non-null decoder from create. `out.as_ptr()` is read for the CBR
// packet length passed in (`*sizes.first()`, a real encoded packet size in `out`);
// `pcm.as_mut_ptr()` is written for up to `240 * 6` f32s and `pcm` is exactly that long;
// `240` is the per-channel frame size. All buffers are live locals outliving the call.
let got = unsafe {
audiopus_sys::opus_multistream_decode_float(
dec,
@@ -875,6 +921,7 @@ mod tests {
0,
)
};
// SAFETY: `dec` is owned by the test; destroyed exactly once here after the final decode.
unsafe { audiopus_sys::opus_multistream_decoder_destroy(dec) };
assert_eq!(got, 240);
}
@@ -1,7 +1,7 @@
//! Pairing crypto primitives (control plane only — distinct from `punktfunk_core`'s AES-GCM
//! data-plane sealing). GameStream pairing uses: AES-128-**ECB** with **no padding**,
//! SHA-256 (host appversion major ≥ 7), and RSA-PKCS1v15-SHA256 signatures. See the
-//! `serverinfo + pairing` section of `docs/research/gamestream-protocol-research.json`.
+//! `serverinfo + pairing` section of `design/research/gamestream-protocol-research.json`.
use aes::cipher::generic_array::GenericArray;
use aes::cipher::{BlockDecrypt, BlockEncrypt, KeyInit};
+21 -8
@@ -1,7 +1,7 @@
//! GameStream (P1) control plane — what a stock Moonlight/Artemis client talks to around
//! the media streams: mDNS discovery, the nvhttp serverinfo + pairing HTTP(S) API, RTSP,
//! and the ENet control stream. `tokio`/`axum` live here (control plane, I/O-bound — never
-//! the per-frame hot path; that is `punktfunk_core`'s P1 wire codec). See `docs/gamestream-host-plan.md`.
+//! the per-frame hot path; that is `punktfunk_core`'s P1 wire codec). See `design/gamestream-host-plan.md`.
//!
//! Status: P1.1 — mDNS `_nvstream._tcp` advertisement + `/serverinfo`. Pairing, RTSP, and
//! the media streams follow (see the GameStream host task list / plan).
@@ -125,12 +125,21 @@ pub struct AppState {
/// (avoids a PipeWire stream setup per reconnect); drained on reuse so no stale audio is
/// sent, dropped + reopened when a session negotiates a different channel count.
pub audio_cap: std::sync::Arc<std::sync::Mutex<Option<Box<dyn crate::audio::AudioCapturer>>>>,
+/// Shared streaming-stats recorder (web-console capture/graph). The GameStream encode loop
+/// reads `is_armed()` per frame and emits samples; the same `Arc` is shared with the mgmt API
+/// and the native punktfunk/1 loops so one capture spans whichever path is streaming.
+pub stats: Arc<crate::stats_recorder::StatsRecorder>,
}
impl AppState {
/// Fresh control-plane state: no active session; the pairing allow-list is loaded from
-/// disk (pairings persist across restarts).
-pub fn new(host: Host, identity: cert::ServerIdentity) -> AppState {
+/// disk (pairings persist across restarts). `stats` is the shared recorder handed to both the
+/// mgmt API and the streaming loops.
+pub fn new(
+host: Host,
+identity: cert::ServerIdentity,
+stats: Arc<crate::stats_recorder::StatsRecorder>,
+) -> AppState {
AppState {
host,
identity,
@@ -145,6 +154,7 @@ impl AppState {
rfi_range: std::sync::Arc::new(std::sync::Mutex::new(None)),
video_cap: std::sync::Arc::new(std::sync::Mutex::new(None)),
audio_cap: std::sync::Arc::new(std::sync::Mutex::new(None)),
stats,
}
}
}
@@ -166,7 +176,10 @@ pub fn serve(
) -> Result<()> {
let host = Host::detect()?;
let identity = cert::ServerIdentity::load_or_create().context("host certificate")?;
-let state = Arc::new(AppState::new(host, identity));
+// The shared streaming-stats recorder: one handle for the mgmt API, the GameStream encode loop
+// (via `AppState`), and the native punktfunk/1 loops (passed to `punktfunk1::serve`).
+let stats = crate::stats_recorder::StatsRecorder::new(crate::stats_recorder::default_dir());
+let state = Arc::new(AppState::new(host, identity, stats.clone()));
// The native plane always runs, so the shared native-pairing handle (linking the QUIC ceremony
// and the management API) always exists.
let np = Arc::new(
@@ -206,8 +219,8 @@ pub fn serve(
);
tokio::try_join!(
nvhttp::run(state.clone()),
-crate::mgmt::run(state.clone(), mgmt, Some(np.clone())),
-crate::punktfunk1::serve(native_opts, np),
+crate::mgmt::run(state.clone(), mgmt, Some(np.clone()), stats.clone()),
+crate::punktfunk1::serve(native_opts, np, stats.clone()),
)?;
} else {
// Secure default: native punktfunk/1 + management API only (no GameStream surface).
@@ -217,8 +230,8 @@ pub fn serve(
(GameStream OFF — pass --gamestream for stock-Moonlight compat)"
);
tokio::try_join!(
-crate::mgmt::run(state.clone(), mgmt, Some(np.clone())),
-crate::punktfunk1::serve(native_opts, np),
+crate::mgmt::run(state.clone(), mgmt, Some(np.clone()), stats.clone()),
+crate::punktfunk1::serve(native_opts, np, stats.clone()),
)?;
}
Ok(())
@@ -291,7 +291,10 @@ mod tests {
https_port: HTTPS_PORT,
};
let identity = super::super::cert::ServerIdentity::ephemeral().expect("ephemeral identity");
-Arc::new(AppState::new(host, identity))
+let stats = crate::stats_recorder::StatsRecorder::new(
+std::env::temp_dir().join(format!("pf-nvhttp-stats-{}", std::process::id())),
+);
+Arc::new(AppState::new(host, identity, stats))
}
fn fp_of(der: &[u8]) -> String {
@@ -1,7 +1,7 @@
//! The 4-phase GameStream pairing state machine (over HTTP), keyed by `uniqueid`. Proves
//! both sides know the PIN (via the SHA-256(salt||pin) AES-ECB key) and own their certs
//! (RSA signatures), then pins the client cert. The final `pairchallenge` happens over
-//! HTTPS (handled in `nvhttp`). Byte-exact spec: `docs/research/…-research.json`.
+//! HTTPS (handled in `nvhttp`). Byte-exact spec: `design/research/…-research.json`.
use super::cert::ServerIdentity;
use super::crypto;
@@ -234,6 +234,7 @@ fn handle_request(req: &Request, state: &AppState) -> String {
state.force_idr.clone(),
state.rfi_range.clone(),
state.video_cap.clone(),
state.stats.clone(),
);
}
Some(_) => tracing::info!("RTSP PLAY — stream already running"),
+181 -14
@@ -3,6 +3,9 @@
//! either real portal desktop capture (`PUNKTFUNK_VIDEO_SOURCE=portal`, the portal PipeWire path) or
//! a synthetic test pattern (default). Runs on its own native thread.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use super::video::{FrameType, VideoPacketizer};
use super::VIDEO_PORT;
use crate::capture::{self, Capturer, FastSyntheticCapturer};
@@ -45,6 +48,7 @@ pub fn start(
force_idr: Arc<AtomicBool>,
rfi_range: RfiSlot,
video_cap: CapturerSlot,
stats: Arc<crate::stats_recorder::StatsRecorder>,
) {
let _ = std::thread::Builder::new()
.name("punktfunk-video".into())
@@ -57,6 +61,7 @@ pub fn start(
&force_idr,
&rfi_range,
&video_cap,
&stats,
) {
tracing::error!(error = %format!("{e:#}"), "video stream failed");
}
@@ -65,6 +70,7 @@ pub fn start(
});
}
#[allow(clippy::too_many_arguments)]
fn run(
cfg: StreamConfig,
app: Option<&super::apps::AppEntry>,
@@ -72,6 +78,9 @@ fn run(
force_idr: &AtomicBool,
rfi_range: &std::sync::Mutex<Option<(i64, i64)>>,
video_cap: &std::sync::Mutex<Option<Box<dyn Capturer>>>,
// Shared stats recorder for the web-console capture/graph. Threaded into `stream_body` (the
// encode loop), which emits one aggregated `StatsSample` per 1 s window while a capture is armed.
stats: &Arc<crate::stats_recorder::StatsRecorder>,
) -> Result<()> {
// GameStream capture/encode thread: apply Windows session tuning (no-op off Windows).
crate::session_tuning::on_hot_thread();
@@ -97,6 +106,8 @@ fn run(
sock.connect(client)
.context("connect client video endpoint")?;
tracing::info!(%client, "video: client endpoint learned");
// Short label for web-console stats captures: the client's peer IP.
let client_label = client.ip().to_string();
// Native client-resolution source: create a compositor virtual output sized to the client's
// request and capture it (no scaling). Self-contained — deliberately NOT pooled in
@@ -141,7 +152,35 @@ fn run(
)
.context("capture virtual output")?;
capturer.set_active(true);
-return stream_body(&mut *capturer, &sock, cfg, running, force_idr, rfi_range);
+// Launch the app's command now that capture is live, for the backends that DON'T nest it via
+// set_launch_command above: Windows (no gamescope) and Linux kwin/mutter/wlroots (which stream
+// the existing desktop, so the app must be spawned into the session to land on the streamed
+// output). Linux gamescope already nested it via set_launch_command, so skip it there.
+#[cfg(windows)]
+let launch_here = true;
+#[cfg(target_os = "linux")]
+let launch_here = compositor != crate::vdisplay::Compositor::Gamescope;
+#[cfg(any(windows, target_os = "linux"))]
+if launch_here {
+if let Some(cmd) = app
+.and_then(|a| a.cmd.as_deref())
+.filter(|c| !c.trim().is_empty())
+{
+if let Err(e) = crate::library::launch_gamestream_command(cmd) {
+tracing::warn!(command = %cmd, error = %e, "gamestream: could not launch app");
+}
+}
+}
+return stream_body(
+&mut *capturer,
+&sock,
+cfg,
+running,
+force_idr,
+rfi_range,
+stats,
+&client_label,
+);
}
// Reuse the persistent capturer (one screencast session → clean reconnect); create it on
@@ -161,7 +200,16 @@ fn run(
}
};
capturer.set_active(true);
-let result = stream_body(&mut *capturer, &sock, cfg, running, force_idr, rfi_range);
+let result = stream_body(
+&mut *capturer,
+&sock,
+cfg,
+running,
+force_idr,
+rfi_range,
+stats,
+&client_label,
+);
capturer.set_active(false);
*video_cap.lock().unwrap() = Some(capturer);
result
@@ -188,6 +236,10 @@ fn sendmmsg_all(sock: &UdpSocket, pkts: &[Vec<u8>]) -> std::io::Result<()> {
let mut hdrs: Vec<libc::mmsghdr> = iovs
.iter_mut()
.map(|iov| {
// SAFETY: `libc::mmsghdr` is a plain `#[repr(C)]` struct of integers and raw
// pointers, for which an all-zero bit pattern is valid (null pointers / zero
// lengths); the fields we rely on (`msg_iov`, `msg_iovlen`) are overwritten on the
// next two lines before the struct is handed to the kernel.
let mut h: libc::mmsghdr = unsafe { std::mem::zeroed() };
h.msg_hdr.msg_iov = iov;
h.msg_hdr.msg_iovlen = 1;
@@ -196,6 +248,13 @@ fn sendmmsg_all(sock: &UdpSocket, pkts: &[Vec<u8>]) -> std::io::Result<()> {
.collect();
let mut off = 0usize;
while off < hdrs.len() {
// SAFETY: `fd` is `sock`'s live raw fd (`sock` outlives the call). `hdrs[off..]
// .as_mut_ptr()` is a live slice of `(hdrs.len() - off)` `mmsghdr`s — exactly the count
// passed — into which the kernel writes each `msg_len`. Each header's `msg_iov` points
// into `iovs` (a local that outlives this call, with `msg_iovlen == 1` matching its one
// entry) and each `iovec.iov_base` points into the `chunk` packet buffers (the caller's
// `pkts`, alive for the call); the kernel only reads those payloads. Flags 0; the return
// is error-/progress-checked before advancing `off`.
let n = unsafe {
libc::sendmmsg(fd, hdrs[off..].as_mut_ptr(), (hdrs.len() - off) as u32, 0)
};
@@ -293,8 +352,20 @@ fn spawn_sender(
Ok(())
}
/// Percentile of a slice (sorts it in place first). `q` in `0.0..=1.0`. Used for the web-console
/// stats sample's per-stage p50/p99.
fn percentile(v: &mut [u32], q: f64) -> u32 {
if v.is_empty() {
return 0;
}
v.sort_unstable();
let i = ((v.len() as f64 * q) as usize).min(v.len() - 1);
v[i]
}
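As a worked illustration of the indexing rule used by `percentile` above (sort, then take index `floor(len * q)` clamped to the last element), here is a small sketch. The function body is copied from the diff; `p50_p99` is a hypothetical convenience wrapper added only for this example.

```rust
/// Same rule as the diff's `percentile`: sort in place, then index at
/// `floor(len * q)`, clamped so `q = 1.0` still lands on the last element.
fn percentile(v: &mut [u32], q: f64) -> u32 {
    if v.is_empty() {
        return 0;
    }
    v.sort_unstable();
    let i = ((v.len() as f64 * q) as usize).min(v.len() - 1);
    v[i]
}

/// Hypothetical helper: p50 and p99 of one window of per-frame stage timings (µs).
fn p50_p99(samples_us: &[u32]) -> (u32, u32) {
    let mut v = samples_us.to_vec();
    (percentile(&mut v, 0.50), percentile(&mut v, 0.99))
}
```

For the five samples `[900, 100, 500, 300, 700]`, the sorted order is `[100, 300, 500, 700, 900]`; `q = 0.50` gives index 2 (500) and `q = 0.99` gives index 4 (900). With an even count the rule picks the upper of the two middle values, so for `[4, 2]` the p50 is 4.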
/// The encode → packetize loop, over a borrowed capturer. Sending runs on a dedicated thread
/// (see [`spawn_sender`]) so a send spike can never stall capture/encode.
#[allow(clippy::too_many_arguments)]
fn stream_body(
capturer: &mut dyn Capturer,
sock: &UdpSocket,
@@ -302,6 +373,11 @@ fn stream_body(
running: &Arc<AtomicBool>,
force_idr: &AtomicBool,
rfi_range: &std::sync::Mutex<Option<(i64, i64)>>,
// Shared stats recorder. The encode loop reads `stats.is_armed()` per frame to decide whether
// to accumulate the per-stage split, then emits a `StatsSample` at its 1 s aggregation boundary.
stats: &Arc<crate::stats_recorder::StatsRecorder>,
// Short client label (peer IP) seeded into the capture meta on the first armed registration.
client_label: &str,
) -> Result<()> {
// The first frame establishes the authoritative size/format for the encoder.
let mut frame = capturer.next_frame().context("capture first frame")?;
@@ -365,6 +441,19 @@ fn stream_body(
let perf = crate::config::config().perf;
let (mut mx_cap, mut mx_enc, mut mx_pkt, mut mx_send, mut mx_pkts, mut uniq) =
(0u128, 0u128, 0u128, 0u128, 0usize, 0u32);
// Web-console stats accumulation (active when `perf` OR a capture is armed): per-stage vectors
// for p50/p99, the goodput bytes queued to the sender this window, the previous window's
// dropped-frame count for delta computation, and the registration id cached on the first sample.
let codec_name = match cfg.codec {
Codec::H264 => "h264",
Codec::H265 => "hevc",
Codec::Av1 => "av1",
};
let mut sid: Option<u32> = None;
let (mut v_cap, mut v_enc, mut v_pkt, mut v_send): (Vec<u32>, Vec<u32>, Vec<u32>, Vec<u32>) =
(Vec::new(), Vec::new(), Vec::new(), Vec::new());
let mut bytes_win: u64 = 0;
let mut last_dropped_batches: u64 = 0;
// Absolute next-frame deadline — the single pacing clock for the loop.
let mut next_frame = Instant::now();
// RFI capability is fixed for the session (probed at encoder open). Query it once so the
@@ -374,6 +463,9 @@ fn stream_body(
while running.load(Ordering::SeqCst) {
let tick = Instant::now();
// Measure per-stage timing when `PUNKTFUNK_PERF` is set OR a web-console stats capture is
// armed (cheap Relaxed atomic, re-read each frame).
let measure = perf || stats.is_armed();
// Advance to the freshest captured frame if one arrived; otherwise reuse the last.
if let Some(f) = capturer.try_latest().context("capture frame")? {
frame = f;
@@ -414,9 +506,19 @@ fn stream_body(
// Hand the frame's packets to the send thread; never block here. A full queue means
// the sender is behind — drop this batch (FEC/RFI covers the client) and keep encoding.
let n = batch.len();
+// Goodput this window = bytes actually queued to the sender (a dropped batch never reaches
+// the wire, so it's excluded). Summed only when measuring, to keep the idle path free.
+let batch_bytes: u64 = if measure {
+batch.iter().map(|p| p.len() as u64).sum()
+} else {
+0
+};
if n > 0 {
match batch_tx.try_send(batch) {
-Ok(()) => sent_batches += 1,
+Ok(()) => {
+sent_batches += 1;
+bytes_win += batch_bytes;
+}
Err(std::sync::mpsc::TrySendError::Full(_)) => {
dropped_batches += 1;
if dropped_batches.is_power_of_two() {
@@ -428,17 +530,26 @@ fn stream_body(
}
}
}
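The drop-on-full hand-off above can be sketched in isolation with a bounded `std::sync::mpsc::sync_channel`. This is a simplified model under stated assumptions, not the host's sender: `QueueStats` and `queue_batches` are hypothetical names, and the receiver is deliberately never drained so that the queue fills deterministically.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical counters mirroring `sent_batches`, `dropped_batches`, `bytes_win`.
struct QueueStats {
    sent_batches: u64,
    dropped_batches: u64,
    bytes_win: u64,
}

/// Tries to queue each batch without blocking. A full queue drops the batch and
/// counts it; only batches that were actually queued contribute to goodput bytes.
fn queue_batches(batches: Vec<Vec<Vec<u8>>>, capacity: usize) -> QueueStats {
    // `_rx` stays alive for the whole function, so the channel is never
    // disconnected here; nothing reads from it, so it fills up after `capacity`.
    let (tx, _rx) = sync_channel::<Vec<Vec<u8>>>(capacity);
    let mut stats = QueueStats { sent_batches: 0, dropped_batches: 0, bytes_win: 0 };
    for batch in batches {
        // Sum the payload size before the batch is moved into the channel.
        let batch_bytes: u64 = batch.iter().map(|p| p.len() as u64).sum();
        match tx.try_send(batch) {
            Ok(()) => {
                stats.sent_batches += 1;
                stats.bytes_win += batch_bytes;
            }
            Err(TrySendError::Full(_)) => stats.dropped_batches += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    stats
}
```

With a capacity of 2 and three batches of two 100-byte packets each, two batches are queued (400 bytes counted) and the third is dropped without blocking the producer.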
-if perf {
+if measure {
let t_send = tick.elapsed();
-mx_cap = mx_cap.max(t_cap.as_micros());
-mx_enc = mx_enc.max((t_enc - t_cap).as_micros());
-mx_pkt = mx_pkt.max((t_pkt - t_enc).as_micros());
-mx_send = mx_send.max((t_send - t_pkt).as_micros());
+let cap_us = t_cap.as_micros();
+let enc_us = (t_enc - t_cap).as_micros();
+let pkt_us = (t_pkt - t_enc).as_micros();
+let send_us = (t_send - t_pkt).as_micros();
+mx_cap = mx_cap.max(cap_us);
+mx_enc = mx_enc.max(enc_us);
+mx_pkt = mx_pkt.max(pkt_us);
+mx_send = mx_send.max(send_us);
mx_pkts = mx_pkts.max(n);
+v_cap.push(cap_us as u32);
+v_enc.push(enc_us as u32);
+v_pkt.push(pkt_us as u32);
+v_send.push(send_us as u32);
}
fps_count += 1;
if fps_t.elapsed() >= Duration::from_secs(1) {
let secs = fps_t.elapsed().as_secs_f64();
if perf {
// Max µs/stage this second: cap=drain channel, enc=submit (zero-copy device
// copy + NVENC), pkt=poll+FEC+packetize, send=paced packet send. `uniq`=new
@@ -453,12 +564,6 @@ fn stream_body(
max_pkts = mx_pkts,
"video: streaming (perf)"
);
-mx_cap = 0;
-mx_enc = 0;
-mx_pkt = 0;
-mx_send = 0;
-mx_pkts = 0;
-uniq = 0;
} else {
tracing::info!(
fps = fps_count,
@@ -467,6 +572,68 @@ fn stream_body(
"video: streaming"
);
}
// Web-console capture: build the aggregated sample. The host send side exposes no
// receiver-side packet loss / FEC-recovery / send-buffer EAGAIN counters, so those stay
// 0 (not fabricated); `frames_dropped` is the per-frame send-queue overflow delta.
if stats.is_armed() {
let session_id = *sid.get_or_insert_with(|| {
stats.register_session(
"gamestream",
cfg.width,
cfg.height,
cfg.fps,
codec_name,
client_label,
)
});
let sample = crate::stats_recorder::StatsSample {
t_ms: 0, // stamped by push_sample from the capture's monotonic start
session_id,
stages: vec![
crate::stats_recorder::StageTiming {
name: "capture".into(),
p50_us: percentile(&mut v_cap, 0.50) as f32,
p99_us: percentile(&mut v_cap, 0.99) as f32,
},
crate::stats_recorder::StageTiming {
name: "encode".into(),
p50_us: percentile(&mut v_enc, 0.50) as f32,
p99_us: percentile(&mut v_enc, 0.99) as f32,
},
crate::stats_recorder::StageTiming {
name: "packetize".into(),
p50_us: percentile(&mut v_pkt, 0.50) as f32,
p99_us: percentile(&mut v_pkt, 0.99) as f32,
},
crate::stats_recorder::StageTiming {
name: "send".into(),
p50_us: percentile(&mut v_send, 0.50) as f32,
p99_us: percentile(&mut v_send, 0.99) as f32,
},
],
fps: (uniq as f64 / secs) as f32,
repeat_fps: (fps_count.saturating_sub(uniq) as f64 / secs) as f32,
mbps: (bytes_win as f64 * 8.0 / secs / 1_000_000.0) as f32,
bitrate_kbps: cfg.bitrate_kbps,
frames_dropped: dropped_batches.saturating_sub(last_dropped_batches) as u32,
packets_dropped: 0,
send_dropped: 0,
fec_recovered: 0,
};
stats.push_sample(session_id, sample);
}
mx_cap = 0;
mx_enc = 0;
mx_pkt = 0;
mx_send = 0;
mx_pkts = 0;
uniq = 0;
v_cap.clear();
v_enc.clear();
v_pkt.clear();
v_send.clear();
bytes_win = 0;
last_dropped_batches = dropped_batches;
fps_count = 0;
fps_t = Instant::now();
}
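The `percentile` helper called in the sample above is not part of this hunk. A minimal sketch of what a nearest-rank version could look like, assuming each `v_*` buffer holds one window of per-frame timings in microseconds as `u64` (both the element type and the nearest-rank method are assumptions, not taken from the source):

```rust
/// Hypothetical nearest-rank percentile over one window of stage timings (µs).
/// Sorts the window in place, which is why the call sites pass `&mut`.
/// Returns 0 for an empty window so an idle second records a flat zero.
fn percentile(samples: &mut [u64], q: f64) -> u64 {
    if samples.is_empty() {
        return 0;
    }
    samples.sort_unstable();
    // Nearest-rank: the smallest value with at least q of the window at or below it.
    let rank = (q * samples.len() as f64).ceil() as usize;
    samples[rank.clamp(1, samples.len()) - 1]
}
```

With a window of `[40, 10, 30, 20]` this returns 20 for `q = 0.50` and 40 for `q = 0.99`. Sorting once per call is cheap at one window per second, and clearing the buffers afterwards (as the reset block does) keeps each sample independent of the previous window.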
@@ -3,7 +3,7 @@
//! `RTP_PACKET(12, big-endian) + reserved[4] + NV_VIDEO_PACKET(16, little-endian) + payload`
//! and the frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`, then
//! striped into ≤4 FEC blocks of ≤255 shards. Byte-exact spec:
//! `docs/research/gamestream-protocol-research.json` (video plane).
//! `design/research/gamestream-protocol-research.json` (video plane).
//!
//! FEC (P1.5): each block carries `m = ⌈k·pct/100⌉` Reed-Solomon parity shards generated by
//! `punktfunk_core::fec::Gf8Coder` (the nanors-compatible Cauchy GF(2⁸) coder). Crucially, RS runs
@@ -15,6 +15,9 @@
//! `<linux/uinput.h>` on x86_64. `/dev/uinput` needs a udev rule + `input` group membership
//! (see `scripts/60-punktfunk.rules`); creation fails with a clear error otherwise.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::gamestream::gamepad::{self, GamepadFrame, MAX_PADS};
use anyhow::{bail, Result};
use std::collections::HashMap;
@@ -215,6 +218,11 @@ const _: () = {
};
fn ioctl_int(fd: i32, req: libc::c_ulong, arg: libc::c_int, what: &str) -> Result<()> {
// SAFETY: every caller passes one of UI_SET_EVBIT/KEYBIT/FFBIT/UI_DEV_CREATE/UI_DEV_DESTROY as
// `req` — all integer-argument ioctls whose third arg the kernel takes BY VALUE, so nothing is
// dereferenced through `arg` and no memory must outlive the call. The only precondition is `fd`
// being a valid open descriptor; callers pass the live `/dev/uinput` fd, and even a stale fd
// would merely return -1/EBADF (reported below), never UB.
if unsafe { libc::ioctl(fd, req, arg) } < 0 {
bail!("{what}: {}", std::io::Error::last_os_error());
}
@@ -222,6 +230,12 @@ fn ioctl_int(fd: i32, req: libc::c_ulong, arg: libc::c_int, what: &str) -> Resul
}
fn ioctl_ptr<T>(fd: i32, req: libc::c_ulong, arg: *mut T, what: &str) -> Result<()> {
// SAFETY: `fd` is the caller's live `/dev/uinput` fd. Every call site passes `&mut x` for a live,
// uniquely-borrowed `#[repr(C)]` `x: T` whose size matches the struct the request number encodes
// (UI_DEV_SETUP=0x405c_5503 → 0x5c=92=size_of::<UinputSetup>(); UI_ABS_SETUP → 0x1c=28; the FF
// upload/erase ioctls → 0x68/0x0c — all pinned by the `size_of` asserts above). The kernel copies
// exactly that many bytes in/out through `arg`; the `&mut` keeps the pointee alive and unaliased
// for the whole synchronous call.
if unsafe { libc::ioctl(fd, req, arg) } < 0 {
bail!("{what}: {}", std::io::Error::last_os_error());
}
@@ -251,6 +265,9 @@ pub struct VirtualPad {
impl VirtualPad {
pub fn create(index: usize, identity: PadIdentity) -> Result<VirtualPad> {
use std::os::fd::FromRawFd;
// SAFETY: `c"/dev/uinput"` is a 'static NUL-terminated C string literal; `as_ptr()` yields a
// valid pointer the kernel only reads as a filesystem path. `open` returns a fresh fd (or -1)
// and retains nothing; no Rust memory is aliased or handed to the kernel beyond that 'static path.
let raw = unsafe {
libc::open(
c"/dev/uinput".as_ptr(),
@@ -264,6 +281,9 @@ impl VirtualPad {
std::io::Error::last_os_error()
);
}
// SAFETY: `raw >= 0` here (the `< 0` branch above already bailed), so it is a freshly-opened fd
// from `libc::open` that is not stored or owned anywhere else. Transferring it to `OwnedFd` makes
// this the unique owner, which will `close` it exactly once on drop (no double-close, no leak).
let fd = unsafe { OwnedFd::from_raw_fd(raw) };
ioctl_int(raw, UI_SET_EVBIT, EV_KEY as i32, "UI_SET_EVBIT(EV_KEY)")?;
@@ -356,6 +376,11 @@ impl VirtualPad {
code,
value,
};
// SAFETY: `ev` is a live local `#[repr(C)]` struct of all-integer fields with no padding bytes
// (timeval=16 + u16 + u16 + i32 = 24, the size asserted above), so every byte is initialized and
// valid to read as `u8`. The pointer is non-null and `u8`-aligned (align 1), the length is exactly
// `size_of::<InputEventRaw>()` so the slice spans precisely `ev`'s bytes (in bounds), and `ev`
// outlives `bytes` (used immediately below) with no concurrent mutation (single-threaded local).
let bytes = unsafe {
std::slice::from_raw_parts(
&ev as *const _ as *const u8,
@@ -363,6 +388,10 @@ impl VirtualPad {
)
};
// Best-effort: a full kernel queue drops the event; the next frame re-syncs state.
// SAFETY: `self.fd` is the live uinput `OwnedFd` (borrowed via `as_raw_fd`, so it stays open for
// the call); `bytes` is the slice above backed by the still-live local `ev`. `write` only READS
// exactly `bytes.len()` bytes from `bytes.as_ptr()` (in bounds) and retains nothing past return,
// so the buffer outlives the synchronous call and the read-only access cannot race or alias.
let _ = unsafe {
libc::write(
self.fd.as_raw_fd(),
@@ -404,6 +433,10 @@ impl VirtualPad {
let raw = self.fd.as_raw_fd();
let mut buf = [0u8; std::mem::size_of::<InputEventRaw>()];
loop {
// SAFETY: `raw` is the live raw fd of `self.fd` (the non-blocking uinput device). `buf` is a
// live local `[u8; size_of::<InputEventRaw>()]`; `buf.as_mut_ptr()` is a valid writable pointer
// to its `buf.len()` bytes. `read` writes AT MOST `buf.len()` bytes (in bounds), the buffer
// outlives this synchronous call, and `buf` is borrowed uniquely here (no alias/race).
let n = unsafe { libc::read(raw, buf.as_mut_ptr() as *mut libc::c_void, buf.len()) };
if n != buf.len() as isize {
break; // EAGAIN / short read — queue drained
@@ -415,6 +448,10 @@ impl VirtualPad {
unsafe { std::ptr::read_unaligned(buf.as_ptr() as *const InputEventRaw) };
match (ev.type_, ev.code) {
(EV_UINPUT, UI_FF_UPLOAD) => {
// SAFETY: `UinputFfUpload` is `#[repr(C)]` over integers (`u32`, `i32`) and two
// `FfEffect`s (integers + `[u8; 32]`); all-zero is a valid bit pattern for every field
// (no bool/NonZero/enum/reference niche), so `zeroed` yields a fully-initialized valid
// value — `request_id` is then set below and the rest filled by UI_BEGIN_FF_UPLOAD.
let mut up: UinputFfUpload = unsafe { std::mem::zeroed() };
up.request_id = ev.value as u32;
if ioctl_ptr(raw, UI_BEGIN_FF_UPLOAD, &mut up, "UI_BEGIN_FF_UPLOAD").is_ok() {
@@ -442,6 +479,9 @@ impl VirtualPad {
}
}
(EV_UINPUT, UI_FF_ERASE) => {
// SAFETY: `UinputFfErase` is `#[repr(C)]` over three integer fields (`u32`, `i32`,
// `u32`); all-zero is a valid bit pattern for each, so `zeroed` produces a fully-valid
// initialized value — `request_id` is set below and `effect_id` filled by the ioctl.
let mut er: UinputFfErase = unsafe { std::mem::zeroed() };
er.request_id = ev.value as u32;
if ioctl_ptr(raw, UI_BEGIN_FF_ERASE, &mut er, "UI_BEGIN_FF_ERASE").is_ok() {
@@ -492,6 +532,9 @@ impl VirtualPad {
impl Drop for VirtualPad {
fn drop(&mut self) {
// SAFETY: `self.fd` is still the live owned uinput fd here (the `OwnedFd` field is closed only
// AFTER this `drop` body returns), borrowed by `as_raw_fd`. UI_DEV_DESTROY takes its argument
// (0) BY VALUE, so nothing is dereferenced or aliased; the ioctl just tears down the device.
let _ = unsafe { libc::ioctl(self.fd.as_raw_fd(), UI_DEV_DESTROY, 0) };
}
}
@@ -5,6 +5,9 @@
//! keymap, and translate events into virtual pointer/keyboard requests, tracking modifier state
//! so the compositor resolves shifted keysyms correctly.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{gs_button_to_evdev, vk_to_evdev, InputEvent, InputInjector};
use anyhow::{bail, Context, Result};
use punktfunk_core::input::InputKind;
@@ -264,10 +267,17 @@ impl InputInjector for WlrootsInjector {
/// Create an anonymous in-memory file holding `s` + a trailing NUL (for the keymap fd).
fn memfd_with(s: &str) -> Result<std::fs::File> {
let name = b"punktfunk-keymap\0";
// SAFETY: `name` is a byte-string literal with an explicit trailing NUL, so `name.as_ptr()` is a
// valid NUL-terminated C string; `memfd_create` only reads that name (copying it) and creates an
// anonymous file, returning a fresh fd (or -1). `MFD_CLOEXEC` is a valid flag. The 'static literal
// outlives the synchronous call and nothing aliases it. The result is checked `< 0` below.
let fd = unsafe { libc::memfd_create(name.as_ptr() as *const libc::c_char, libc::MFD_CLOEXEC) };
if fd < 0 {
bail!("memfd_create failed: {}", std::io::Error::last_os_error());
}
// SAFETY: `fd` is the fresh memfd `memfd_create` just returned and checked `>= 0`; it is a unique
// open fd nothing else owns, so `File` takes sole ownership and closes it exactly once on drop —
// no alias, no double-close.
let mut f = unsafe { std::fs::File::from_raw_fd(fd) };
f.write_all(s.as_bytes()).context("write keymap")?;
f.write_all(&[0]).context("write keymap NUL")?;
@@ -41,7 +41,8 @@ pub(super) const SHM_MAGIC: u32 = pf_driver_proto::gamepad::PAD_MAGIC; // "PFDS"
pub(super) const OFF_INPUT: usize = core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, input);
pub(super) const OFF_OUT_SEQ: usize =
core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, out_seq);
pub(super) const OFF_OUTPUT: usize = core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, output);
pub(super) const OFF_OUTPUT: usize =
core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, output);
/// Device-type selector the driver reads to choose which HID identity/descriptor it serves: 0 =
/// DualSense (the default — the section is zeroed), 1 = DualShock 4.
pub(super) const OFF_DEVTYPE: usize =
@@ -108,7 +109,7 @@ pub(super) struct SwDeviceProfile<'a> {
/// `profile.instance`). The returned `HSWDEVICE` owns it — `SwDeviceClose` removes it on drop, so the
/// pad appears/disappears with the session and nothing persists.
///
/// **Game-detection identity** (see `docs/windows-dualsense-game-detection.md`). `HIDD_ATTRIBUTES`
/// **Game-detection identity** (see `design/windows-dualsense-game-detection.md`). `HIDD_ATTRIBUTES`
/// alone (VID/PID via the IOCTL) satisfies SDL/HIDAPI/RawInput, but a native PS5 path (libScePad-
/// style raw HID) classifies the *connection type* by walking from the HID child to its parent
/// (`CM_Get_Parent`) and string-matching `"USB"`/`"BTHENUM"` in that parent's
@@ -187,8 +187,10 @@ impl XusbWinPad {
#[allow(clippy::too_many_arguments)]
fn write_state(&mut self, buttons: u16, lt: u8, rt: u8, lx: i16, ly: i16, rx: i16, ry: i16) {
self.packet = self.packet.wrapping_add(1);
// SAFETY: base points at SHM_SIZE bytes; all offsets are in range.
let base = self.shm.base();
// SAFETY: `base` is the start of the mapped section (`SHM_SIZE` bytes, owned by `Shm`); every
// `OFF_*` is a fixed in-range offset into it and `write_unaligned` handles the unaligned field
// writes. Single owner (`&mut self`), so no concurrent writer races these stores.
unsafe {
std::ptr::write_unaligned(base.add(OFF_BUTTONS) as *mut u16, buttons);
*base.add(OFF_LT) = lt;
@@ -5,6 +5,9 @@
//! thread stays bound to its desktop and only reattaches (`OpenInputDesktop`/`SetThreadDesktop`) when
//! `SendInput` reports a short write (the input desktop switched) — no per-event reattach overhead.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::Result;
use punktfunk_core::input::{InputEvent, InputKind};
use std::mem::size_of;
@@ -35,7 +38,12 @@ pub struct SendInputInjector {
desktop: Option<HDESK>,
}
// Only ever used from the host's single injector thread.
// SAFETY: `SendInputInjector` holds only an `Option<HDESK>` (a desktop handle). The host creates
// and drives it from a single dedicated injector thread; the handle is opened, rebound, and closed
// on whichever thread owns the value, and the type is not `Sync`, so there is never concurrent
// access. A desktop `HDESK` is not thread-affine for ownership (`CloseDesktop` works from any
// thread; `SetThreadDesktop` rebinds the current thread), so transferring ownership via `Send` is
// sound.
unsafe impl Send for SendInputInjector {}
impl SendInputInjector {
@@ -49,6 +57,12 @@ impl SendInputInjector {
/// Bind this thread to the desktop currently receiving input. UAC / lock screen / Ctrl-Alt-Del
/// swap the input desktop; `SendInput` silently no-ops unless our thread is on it.
fn reattach_input_desktop(&mut self) {
// SAFETY: `OpenInputDesktop`/`SetThreadDesktop`/`CloseDesktop` are FFI calls passed only
// by-value args (constant desktop flags, a `bool`, an access mask). `OpenInputDesktop`
// yields an owned `HDESK` only on `Ok`; we then either install it with `SetThreadDesktop`
// (closing the previously-owned handle exactly once) or close the fresh handle on failure —
// so every handle is closed exactly once and none is used after close. `SetThreadDesktop`
// only rebinds this calling thread, which is where the injector runs.
unsafe {
match OpenInputDesktop(
DESKTOP_CONTROL_FLAGS(0),
@@ -75,12 +89,17 @@ impl SendInputInjector {
/// switched out from under us, e.g. into UAC/lock) do we reattach to the now-current input desktop
/// and retry once. This serves both the normal and secure desktops with no steady-state overhead.
fn send(&mut self, inputs: &[INPUT]) -> Result<()> {
// SAFETY: `inputs` is a live `&[INPUT]` slice that outlives this synchronous `SendInput`
// call; `size_of::<INPUT>()` is the exact per-element stride Win32 requires as `cbSize`. The
// call only reads the array (one event per element) and returns the count injected.
let n = unsafe { SendInput(inputs, size_of::<INPUT>() as i32) };
if n as usize == inputs.len() {
return Ok(());
}
// Short write → the input desktop likely changed. Reattach + retry once.
self.reattach_input_desktop();
// SAFETY: same as the first `SendInput` — `inputs` is the identical live slice outliving the
// call and `cbSize == size_of::<INPUT>()`; only re-issued after reattaching the input desktop.
let n = unsafe { SendInput(inputs, size_of::<INPUT>() as i32) };
if n as usize != inputs.len() {
anyhow::bail!(
@@ -95,6 +114,9 @@ impl SendInputInjector {
impl Drop for SendInputInjector {
fn drop(&mut self) {
if let Some(h) = self.desktop.take() {
// SAFETY: `h` is the `HDESK` this injector owned (moved out of `self.desktop`);
// `CloseDesktop` runs once here in `Drop` on that still-valid handle, with no later use —
// no double close.
unsafe {
let _ = CloseDesktop(h);
}
@@ -216,7 +238,11 @@ impl InputInjector for SendInputInjector {
}
InputKind::KeyDown | InputKind::KeyUp => {
let down = event.kind == InputKind::KeyDown;
let vk = (event.code & 0xff) as u16; // client sends Windows VK
// client sends Windows VK
let vk = (event.code & 0xff) as u16;
// SAFETY: `MapVirtualKeyExW` is a pure value translation (VK → scancode); all three
// args are by-value (`u32`, the `MAPVK_VK_TO_VSC_EX` map-type constant, a `None`
// HKL). It dereferences no pointer and returns a `u32` — FFI-`unsafe` only.
let sc_ex = unsafe { MapVirtualKeyExW(vk as u32, MAPVK_VK_TO_VSC_EX, None) };
if sc_ex == 0 {
return Ok(()); // unmappable -> drop
@@ -264,6 +290,8 @@ fn key(ki: KEYBDINPUT) -> INPUT {
}
fn virtual_desktop_rect() -> (i32, i32, i32, i32) {
// SAFETY: each `GetSystemMetrics` takes a single by-value `SYSTEM_METRICS_INDEX` constant and
// returns an `i32`; it dereferences no pointer and has no side effects — FFI-`unsafe` only.
unsafe {
(
GetSystemMetrics(SM_XVIRTUALSCREEN),
@@ -548,6 +548,621 @@ fn heroic_launch_prefix() -> Option<String> {
flatpak.then(|| "flatpak run com.heroicgameslauncher.hgl".into())
}
// ---------------------------------------------------------------------------------------
// Epic Games Store (Windows) — reads the launcher's local `.item` manifests under ProgramData
// (no auth, launcher need not run). Cover art from the base64 `catcache.bin` (public Epic CDN).
// ---------------------------------------------------------------------------------------
/// Reads the Epic Games Launcher's local install manifests. Windows-only. Best-effort: empty when
/// the launcher (or its manifest dir) isn't present.
#[cfg(windows)]
pub struct EpicProvider;
#[cfg(windows)]
impl LibraryProvider for EpicProvider {
fn store(&self) -> &'static str {
"epic"
}
fn list(&self) -> Vec<GameEntry> {
let data = epic_data_dir();
let Ok(rd) = std::fs::read_dir(data.join("Manifests")) else {
return Vec::new();
};
// Parse the (best-effort) artwork cache ONCE: catalogItemId -> Artwork.
let art = epic_art_index(&data.join("Catalog").join("catcache.bin"));
let mut games = Vec::new();
for entry in rd.flatten() {
let p = entry.path();
if p.extension().and_then(|e| e.to_str()) != Some("item") {
continue;
}
let Ok(text) = std::fs::read_to_string(&p) else {
continue;
};
let Ok(v) = serde_json::from_str::<serde_json::Value>(&text) else {
continue;
};
if let Some(g) = epic_entry(&v, &art) {
games.push(g);
}
}
games
}
}
/// `%ProgramData%\Epic\EpicGamesLauncher\Data` (machine-wide, SYSTEM-readable).
#[cfg(windows)]
fn epic_data_dir() -> PathBuf {
std::env::var_os("ProgramData")
.map(PathBuf::from)
.unwrap_or_else(|| PathBuf::from("C:\\ProgramData"))
.join("Epic")
.join("EpicGamesLauncher")
.join("Data")
}
/// Map one `.item` manifest to a [`GameEntry`], or `None` if it isn't a launchable game. Uses
/// Playnite's proven EXCLUSION filter (skip `UE_*` Unreal components; skip a DLC/addon unless it is
/// `addons/launchable`) rather than a positive `games`-category match, which can drop legit titles.
#[cfg(windows)]
fn epic_entry(
v: &serde_json::Value,
art: &std::collections::HashMap<String, Artwork>,
) -> Option<GameEntry> {
let s = |k: &str| v.get(k).and_then(|x| x.as_str());
let app_name = s("AppName")?.to_string();
if app_name.starts_with("UE_") {
return None; // Unreal Engine component, not a game
}
let cats: Vec<&str> = v
.get("AppCategories")
.and_then(|c| c.as_array())
.map(|a| a.iter().filter_map(|x| x.as_str()).collect())
.unwrap_or_default();
if cats.contains(&"addons") && !cats.contains(&"addons/launchable") {
return None; // non-launchable DLC/addon
}
// Drop stale records whose install dir is gone.
let install = s("InstallLocation")?;
if !Path::new(install).is_dir() {
return None;
}
let title = s("DisplayName").unwrap_or(&app_name).to_string();
let namespace = s("CatalogNamespace").unwrap_or("");
let catalog = s("CatalogItemId").unwrap_or("");
// The robust launch form is the namespace:catalogItemId:appName triple; fall back to the bare
// appName when those ids are absent (some manifests lack them) — never drop the launch entirely.
let value = if !namespace.is_empty() && !catalog.is_empty() {
format!("{namespace}:{catalog}:{app_name}")
} else {
app_name.clone()
};
Some(GameEntry {
id: format!("epic:{app_name}"),
store: "epic".into(),
title,
art: art.get(catalog).cloned().unwrap_or_default(),
launch: Some(LaunchSpec {
kind: "epic".into(),
value,
}),
})
}
/// Best-effort parse of `catcache.bin` (base64-encoded JSON array of catalog items) into
/// catalogItemId → [`Artwork`] from each item's `keyImages`. Empty map on any read/decode failure
/// (the format is community-reverse-engineered + can lag a fresh install → titles just show no art).
#[cfg(windows)]
fn epic_art_index(catcache: &Path) -> std::collections::HashMap<String, Artwork> {
use base64::Engine as _;
let mut map = std::collections::HashMap::new();
let Ok(raw) = std::fs::read(catcache) else {
return map;
};
let Ok(decoded) = base64::engine::general_purpose::STANDARD.decode(raw) else {
return map;
};
let Ok(items) = serde_json::from_slice::<serde_json::Value>(&decoded) else {
return map;
};
let Some(arr) = items.as_array() else {
return map;
};
for item in arr {
let Some(cat) = item
.get("id")
.or_else(|| item.get("catalogItemId"))
.and_then(|v| v.as_str())
else {
continue;
};
let Some(images) = item.get("keyImages").and_then(|v| v.as_array()) else {
continue;
};
let mut art = Artwork::default();
for img in images {
let (Some(ty), Some(url)) = (
img.get("type").and_then(|v| v.as_str()),
img.get("url").and_then(|v| v.as_str()),
) else {
continue;
};
if !(url.starts_with("http://") || url.starts_with("https://")) {
continue;
}
match ty {
"DieselGameBoxTall" => art.portrait = Some(url.to_string()),
"DieselGameBox" => art.hero = Some(url.to_string()),
"DieselGameBoxLogo" => art.logo = Some(url.to_string()),
_ => {}
}
}
if art.portrait.is_some() || art.hero.is_some() || art.logo.is_some() {
map.insert(cat.to_string(), art);
}
}
map
}
/// Build the `com.epicgames.launcher://` launch URI from a stored launch value — the triple
/// `<namespace>:<catalogItemId>:<appName>` (colons URL-encoded), or a bare `<appName>` fallback.
/// Each part is charset-validated (host-derived, but belt-and-suspenders) so no shell/URI injection.
#[cfg(windows)]
fn epic_launch_uri(value: &str) -> Option<String> {
let ok = |s: &str| {
!s.is_empty()
&& s.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
};
let inner = match value.split(':').collect::<Vec<_>>().as_slice() {
[ns, cat, app] if ok(ns) && ok(cat) && ok(app) => format!("{ns}%3A{cat}%3A{app}"),
[app] if ok(app) => (*app).to_string(),
_ => return None,
};
Some(format!(
"com.epicgames.launcher://apps/{inner}?action=launch&silent=true"
))
}
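To make the accepted and rejected shapes concrete, here is a platform-neutral copy of `epic_launch_uri` (the `#[cfg(windows)]` gate is dropped only so the sketch compiles anywhere). The ids used in the usage note are made-up placeholders, not real Epic catalog ids:

```rust
/// Standalone copy of the URI builder above, for illustration only.
fn epic_launch_uri(value: &str) -> Option<String> {
    // Allow-list: ASCII letters, digits, '.', '_' and '-'; empty parts are rejected.
    let ok = |s: &str| {
        !s.is_empty()
            && s.bytes()
                .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
    };
    let inner = match value.split(':').collect::<Vec<_>>().as_slice() {
        // Full triple: the separating colons are percent-encoded as %3A.
        [ns, cat, app] if ok(ns) && ok(cat) && ok(app) => format!("{ns}%3A{cat}%3A{app}"),
        // Bare appName fallback.
        [app] if ok(app) => (*app).to_string(),
        // Two parts, four or more parts, or any part outside the allow-list.
        _ => return None,
    };
    Some(format!(
        "com.epicgames.launcher://apps/{inner}?action=launch&silent=true"
    ))
}
```

A triple such as `ns1:cat2:App3` yields `com.epicgames.launcher://apps/ns1%3Acat2%3AApp3?action=launch&silent=true`, a bare `App3` yields the same URI without the encoded prefix, and `a:b`, an empty string, or anything containing a space or `&` yields `None`.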
// ---------------------------------------------------------------------------------------
// GOG (Windows) — registry-indexed installs + each game's `goggame-<id>.info` for a direct-exe
// launch (no Galaxy needed, dodges its cold-start/anti-cheat). Art (api.gog.com) is a follow-up.
// ---------------------------------------------------------------------------------------
/// Reads the GOG.com install registry + per-game `.info` files. Windows-only. Best-effort: empty
/// when GOG isn't installed.
#[cfg(windows)]
pub struct GogProvider;
#[cfg(windows)]
impl LibraryProvider for GogProvider {
fn store(&self) -> &'static str {
"gog"
}
fn list(&self) -> Vec<GameEntry> {
gog_games()
}
}
#[cfg(windows)]
fn gog_games() -> Vec<GameEntry> {
use winreg::enums::HKEY_LOCAL_MACHINE;
use winreg::RegKey;
// 32-bit GOG writes under WOW6432Node; a 64-bit process reads the explicit path directly.
let Ok(games_key) =
RegKey::predef(HKEY_LOCAL_MACHINE).open_subkey("SOFTWARE\\WOW6432Node\\GOG.com\\Games")
else {
return Vec::new();
};
let mut out = Vec::new();
for sub in games_key.enum_keys().flatten() {
// The subkey name IS the GOG product id.
let Ok(k) = games_key.open_subkey(&sub) else {
continue;
};
let Ok(path) = k.get_value::<String, _>("PATH") else {
continue;
};
if !Path::new(&path).is_dir() {
continue;
}
let title = k
.get_value::<String, _>("GAMENAME")
.unwrap_or_else(|_| sub.clone());
// Resolve the primary play task (exe + args + workdir) from goggame-<id>.info; skip if absent.
let Some((exe, args, workdir)) = gog_play_task(&path, &sub) else {
continue;
};
let id = format!("gog:{sub}");
// Art (public api.gog.com) is resolved off the hot path by the background warmer; read
// whatever it has cached (title-only until warmed).
let art = cached_art(&id).unwrap_or_default();
out.push(GameEntry {
id,
store: "gog".into(),
title,
art,
launch: Some(LaunchSpec {
kind: "gog".into(),
value: format!("{exe}\t{args}\t{workdir}"),
}),
});
}
out
}
/// The primary play task from `<install>\goggame-<id>.info`: `(absolute exe, args, working dir)`.
/// Prefers `isPrimary` + `FileTask`, else the first `FileTask`. Paths are resolved against `install`.
#[cfg(windows)]
fn gog_play_task(install: &str, id: &str) -> Option<(String, String, String)> {
let text =
std::fs::read_to_string(Path::new(install).join(format!("goggame-{id}.info"))).ok()?;
let v: serde_json::Value = serde_json::from_str(&text).ok()?;
let tasks = v.get("playTasks")?.as_array()?;
let is_file =
|t: &serde_json::Value| t.get("type").and_then(|s| s.as_str()) == Some("FileTask");
let pick = tasks
.iter()
.find(|t| {
t.get("isPrimary")
.and_then(|b| b.as_bool())
.unwrap_or(false)
&& is_file(t)
})
.or_else(|| tasks.iter().find(|t| is_file(t)))?;
let rel = pick.get("path").and_then(|s| s.as_str())?;
let exe = Path::new(install).join(rel);
let args = pick
.get("arguments")
.and_then(|s| s.as_str())
.unwrap_or("")
.to_string();
let workdir = pick
.get("workingDir")
.and_then(|s| s.as_str())
.map(|w| Path::new(install).join(w))
.unwrap_or_else(|| Path::new(install).to_path_buf());
Some((
exe.to_string_lossy().into_owned(),
args,
workdir.to_string_lossy().into_owned(),
))
}
/// Build the spawn `(command line, working dir)` for a `gog` launch value (`exe \t args \t workdir`,
/// all host-resolved from the operator's own disk). Direct exe — no shell, no Galaxy.
#[cfg(windows)]
fn gog_spawn(value: &str) -> Option<(String, Option<PathBuf>)> {
let mut parts = value.split('\t');
let exe = parts.next().filter(|s| !s.is_empty())?;
let args = parts.next().unwrap_or("");
let workdir = parts.next().filter(|s| !s.is_empty()).map(PathBuf::from);
let cmdline = if args.trim().is_empty() {
format!("\"{exe}\"")
} else {
format!("\"{exe}\" {args}")
};
Some((cmdline, workdir))
}
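The tab-separated launch value written by `gog_games` and read back by `gog_spawn` can be exercised as a round trip. The sketch below uses two hypothetical helper names, `pack_launch` and `unpack_launch`, and made-up install paths; the parsing logic mirrors `gog_spawn` above:

```rust
use std::path::PathBuf;

/// Hypothetical packer: the same `exe \t args \t workdir` layout `gog_games` stores.
fn pack_launch(exe: &str, args: &str, workdir: &str) -> String {
    format!("{exe}\t{args}\t{workdir}")
}

/// Mirrors `gog_spawn`: split on tabs, quote the exe, append args verbatim.
fn unpack_launch(value: &str) -> Option<(String, Option<PathBuf>)> {
    let mut parts = value.split('\t');
    // An empty first field means there is nothing to launch.
    let exe = parts.next().filter(|s| !s.is_empty())?;
    let args = parts.next().unwrap_or("");
    // An empty or missing third field means "no explicit working dir".
    let workdir = parts.next().filter(|s| !s.is_empty()).map(PathBuf::from);
    let cmdline = if args.trim().is_empty() {
        format!("\"{exe}\"")
    } else {
        format!("\"{exe}\" {args}")
    };
    Some((cmdline, workdir))
}
```

One property worth noting: the format relies on none of the three fields containing a tab. If a `.info` file ever carried a tab inside `arguments`, everything after it would be read as the working directory, so a stricter packer might reject or replace tabs before storing the value.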
// ---------------------------------------------------------------------------------------
// Xbox / Microsoft Store / Game Pass (Windows) — scans the flat-file `XboxGames` install dirs
// (no auth) for GDK games (each has a Content\MicrosoftGame.config). Launch via the AUMID
// (shell:AppsFolder\<PFN>!<AppId>) in the interactive session. Cover art (displaycatalog) deferred.
// ---------------------------------------------------------------------------------------
/// Reads installed Xbox / Game Pass / Store GDK games from the flat-file install dirs. Windows-only.
/// Best-effort: empty when no `XboxGames` dir exists.
#[cfg(windows)]
pub struct XboxProvider;
#[cfg(windows)]
impl LibraryProvider for XboxProvider {
fn store(&self) -> &'static str {
"xbox"
}
fn list(&self) -> Vec<GameEntry> {
xbox_games()
}
}
/// Scan each fixed drive's default `<drive>:\XboxGames` for GDK games — the presence of
/// `Content\MicrosoftGame.config` is the game marker (so we list games, not ordinary UWP apps). A
/// custom install folder (set via the undocumented `.GamingRoot`) isn't covered; the default folder
/// is the common case. Non-GDK pure-UWP Store games (under the ACL-locked WindowsApps) are missed too.
#[cfg(windows)]
fn xbox_games() -> Vec<GameEntry> {
let mut games = Vec::new();
for letter in b'C'..=b'Z' {
let root = PathBuf::from(format!("{}:\\XboxGames", letter as char));
let Ok(rd) = std::fs::read_dir(&root) else {
continue;
};
for entry in rd.flatten() {
let title_dir = entry.path();
let cfg = title_dir.join("Content").join("MicrosoftGame.config");
if !cfg.is_file() {
continue;
}
let Ok(text) = std::fs::read_to_string(&cfg) else {
continue;
};
let folder = title_dir
.file_name()
.map(|f| f.to_string_lossy().into_owned());
let Some((name, app_id, title, store_id)) = xbox_parse_config(&text, folder.as_deref())
else {
continue;
};
let Some(pfn) = xbox_pfn(&name) else {
tracing::debug!(package = %name, "xbox: no AppRepository entry → can't resolve PFN, skipping");
continue;
};
let id_key = if store_id.is_empty() {
pfn.clone()
} else {
store_id
};
let id = format!("xbox:{id_key}");
// Art (unofficial displaycatalog, keyed by StoreId) is resolved off the hot path by the
// background warmer; read whatever it has cached (title-only until warmed / if no StoreId).
let art = cached_art(&id).unwrap_or_default();
games.push(GameEntry {
id,
store: "xbox".into(),
title,
art,
launch: Some(LaunchSpec {
kind: "aumid".into(),
value: format!("{pfn}!{app_id}"),
}),
});
}
}
games.sort_by(|a, b| a.id.cmp(&b.id));
games.dedup_by(|a, b| a.id == b.id); // same game on two drives → one entry
games
}
/// Parse the fields we need from a `MicrosoftGame.config`: `(Identity Name, AppId, title, StoreId)`.
/// AppId is the `<Executable>`'s `Id` (the AUMID app id, typically "Game"). The title prefers
/// `ShellVisuals@DefaultDisplayName`, but that can be an unresolved `ms-resource:` ref → fall back to
/// the install folder name, then the package name.
#[cfg(windows)]
fn xbox_parse_config(text: &str, folder: Option<&str>) -> Option<(String, String, String, String)> {
let doc = roxmltree::Document::parse(text).ok()?;
let root = doc.root_element();
let name = root
.children()
.find(|n| n.has_tag_name("Identity"))?
.attribute("Name")?
.to_string();
let app_id = root
.children()
.find(|n| n.has_tag_name("ExecutableList"))
.and_then(|el| {
el.children()
.filter(|n| n.has_tag_name("Executable"))
.find_map(|e| e.attribute("Id"))
})?
.to_string();
let ddn = root
.children()
.find(|n| n.has_tag_name("ShellVisuals"))
.and_then(|sv| sv.attribute("DefaultDisplayName"))
.filter(|s| !s.is_empty() && !s.starts_with("ms-resource"));
let title = ddn
.map(String::from)
.or_else(|| folder.map(String::from))
.unwrap_or_else(|| name.clone());
let store_id = root
.children()
.find(|n| n.has_tag_name("StoreId"))
.and_then(|n| n.text())
.unwrap_or("")
.to_string();
Some((name, app_id, title, store_id))
}
/// Resolve a package's PackageFamilyName by finding its
/// `AppRepository\Packages\<PackageFullName>` dir (machine-wide, SYSTEM-readable) and reducing the
/// full name to `Name_PublisherHash`. This READS the authoritative PFN — never compute the hash.
#[cfg(windows)]
fn xbox_pfn(identity: &str) -> Option<String> {
let pkgs = PathBuf::from(std::env::var_os("ProgramData")?)
.join("Microsoft")
.join("Windows")
.join("AppRepository")
.join("Packages");
let prefix = format!("{identity}_");
for e in std::fs::read_dir(&pkgs).ok()?.flatten() {
let dn = e.file_name().to_string_lossy().into_owned();
if dn.starts_with(&prefix) {
if let Some(pfn) = pfn_from_full(&dn, identity) {
return Some(pfn);
}
}
}
None
}
/// PackageFamilyName from a PackageFullName dir name
/// (`Name_Version_Arch_ResourceId_PublisherHash`) → `Name_PublisherHash`. The hash is the last
/// `_`-segment; `Name` is the caller's identity.
#[cfg(windows)]
fn pfn_from_full(dir_name: &str, identity: &str) -> Option<String> {
let hash = dir_name.rsplit('_').next()?;
(!hash.is_empty() && hash != dir_name).then(|| format!("{identity}_{hash}"))
}
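A worked example of the full-name reduction, using a platform-neutral copy of `pfn_from_full`. The package name `Contoso.Game` and the hash are placeholders chosen for illustration:

```rust
/// Standalone copy of the reducer above: keep the caller's identity and
/// append the last `_`-separated segment of the directory name.
fn pfn_from_full(dir_name: &str, identity: &str) -> Option<String> {
    let hash = dir_name.rsplit('_').next()?;
    // Reject a trailing underscore (empty hash) and names with no underscore at all.
    (!hash.is_empty() && hash != dir_name).then(|| format!("{identity}_{hash}"))
}
```

For the directory name `Contoso.Game_1.0.0.0_x64__abcdefghjkmnp` (the double underscore is an empty ResourceId) this returns `Contoso.Game_abcdefghjkmnp`. Taking the last segment rather than the fifth field means an empty or unusual middle field cannot shift the hash.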
// ---------------------------------------------------------------------------------------
// Cover-art resolver + cache (shared by the Windows GOG + Xbox providers, which have no local
// art). A disk cache is the source of truth read by all_games() (so the list/launch path never
// blocks on the network); a host-lifetime background warmer fetches uncached art (GOG's public
// api.gog.com + Xbox's displaycatalog, both no-auth) and persists it. Cross-platform so the
// HTTP/JSON code is compiled + checked everywhere; the warmer simply finds nothing to fetch on a
// host whose stores all carry their own art (Steam CDN / Heroic CDN / Lutris data: URLs).
// ---------------------------------------------------------------------------------------
/// The persisted art cache: GameEntry id → resolved [`Artwork`]. An entry's PRESENCE means "already
/// resolved" (even an empty Artwork = fetched, none found) so the warmer never re-fetches it.
fn art_cache() -> &'static std::sync::Mutex<std::collections::HashMap<String, Artwork>> {
static CACHE: std::sync::OnceLock<
std::sync::Mutex<std::collections::HashMap<String, Artwork>>,
> = std::sync::OnceLock::new();
CACHE.get_or_init(|| {
let loaded = std::fs::read_to_string(art_cache_path())
.ok()
.and_then(|s| serde_json::from_str(&s).ok())
.unwrap_or_default();
std::sync::Mutex::new(loaded)
})
}
/// The art cache lives in the canonical HOST config dir (`%ProgramData%\punktfunk` on Windows /
/// `~/.config/punktfunk` on Linux — gamestream::config_dir, NOT the legacy XDG/HOME `config_dir`
/// below that the custom store still uses).
fn art_cache_path() -> PathBuf {
crate::gamestream::config_dir().join("library-art-cache.json")
}
/// The cached art for a library id, if it has been resolved (positive or negative). `None` = not yet
/// warmed → the provider shows title-only until the warmer fills it in.
fn cached_art(id: &str) -> Option<Artwork> {
art_cache().lock().unwrap().get(id).cloned()
}
/// Record resolved art for a library id + persist the cache (write-then-rename; best-effort).
fn store_art(id: &str, art: Artwork) {
let mut cache = art_cache().lock().unwrap();
cache.insert(id.to_string(), art);
if let Ok(json) = serde_json::to_string(&*cache) {
let path = art_cache_path();
if let Some(dir) = path.parent() {
let _ = std::fs::create_dir_all(dir);
}
let tmp = path.with_extension("json.tmp");
if std::fs::write(&tmp, json).is_ok() {
let _ = std::fs::rename(&tmp, &path);
}
}
}
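// Sketch (not part of the commit): the write-then-rename step of `store_art` in isolation,
// with errors propagated instead of ignored. `write_then_rename` is a hypothetical helper name.
// The assumption is that the temp file sits next to the target (same directory, so the same
// filesystem), which is what makes the rename a cheap replace rather than a copy.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `contents` to a sibling temp file, then rename it over `path`.
/// A reader of `path` should therefore see the old file or the new one,
/// not a half-written file.
fn write_then_rename(path: &Path, contents: &str) -> io::Result<()> {
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    // For "x.json" this yields "x.json.tmp", as in `store_art`.
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)
}
```

// `store_art` deliberately swallows these errors (art is best-effort); a caller that cares
// about persistence could use the `io::Result` instead.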
/// Start the host-lifetime cover-art warmer: every 5 minutes, fetch + cache art for any library
/// entry whose store needs a network lookup (GOG / Xbox) and isn't cached yet. Idempotent: once
/// everything is cached a pass makes no network calls (and a host with only self-art stores never
/// fetches at all). Call once from `serve()`; the returned handle can be dropped to detach it.
pub fn start_art_warmer() -> std::thread::JoinHandle<()> {
std::thread::Builder::new()
.name("pf-art-warmer".into())
.spawn(|| loop {
warm_art_once();
std::thread::sleep(std::time::Duration::from_secs(300));
})
.expect("spawn art warmer thread")
}
/// One warming pass: resolve uncached GOG/Xbox art. Other stores carry their own art (Steam CDN
/// template, Heroic CDN URLs, Lutris data: URLs, custom user URLs) and are skipped.
fn warm_art_once() {
for g in all_games() {
if cached_art(&g.id).is_some() {
continue;
}
let Some((store, localid)) = g.id.split_once(':') else {
continue;
};
let art = match store {
"gog" => fetch_gog_art(localid),
// The xbox id is the StoreId when present, else the PFN (contains '_', no displaycatalog
// entry) → cache empty for those so they aren't retried every pass.
"xbox" if !localid.contains('_') => fetch_xbox_art(localid),
"xbox" => Artwork::default(),
_ => continue, // steam/heroic/lutris/custom resolve their own art
};
store_art(&g.id, art);
}
}
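/// HTTP GET + parse JSON with a bounded timeout. `None` on any network/parse failure (best-effort:
/// art is non-essential, so a failure just leaves the title-only card). Note that both callers map
/// `None` to an empty `Artwork`, which `warm_art_once` then stores, so a transient failure is
/// cached as "none found" and is not retried.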
fn fetch_json(url: &str) -> Option<serde_json::Value> {
let agent = ureq::AgentBuilder::new()
.timeout(std::time::Duration::from_secs(10))
.build();
let body = agent.get(url).call().ok()?.into_string().ok()?;
serde_json::from_str(&body).ok()
}
/// Make a protocol-relative URL (`//host/...`, common in GOG + MS catalog responses) absolute https.
fn abs_url(u: &str) -> String {
u.strip_prefix("//")
.map(|rest| format!("https://{rest}"))
.unwrap_or_else(|| u.to_string())
}
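// Sketch (not part of the commit): `abs_url` on the three input shapes it can meet. The
// function body is copied from above; the example hosts are made up (`example.invalid`).

```rust
/// Same logic as `abs_url` above: only a leading `//` is rewritten.
fn abs_url(u: &str) -> String {
    u.strip_prefix("//")
        .map(|rest| format!("https://{rest}"))
        .unwrap_or_else(|| u.to_string())
}

/// Input/output pairs: protocol-relative, already absolute, and path-only.
fn abs_url_examples() -> Vec<(&'static str, String)> {
    [
        "//images.example.invalid/a.jpg",
        "https://example.invalid/b.png",
        "/relative/c.png",
    ]
    .iter()
    .map(|&u| (u, abs_url(u)))
    .collect()
}
```

// Only the first input changes; an already-absolute URL and a path-only value pass through
// untouched, so a path-only value would still need a base URL from somewhere else.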
/// GOG cover art via the public (no-auth) product API. Field names / URL shapes are GOG-specific and
/// best-effort (worth on-box confirmation); a wrong URL just degrades to the title card client-side.
fn fetch_gog_art(product_id: &str) -> Artwork {
let Some(v) = fetch_json(&format!(
"https://api.gog.com/products/{product_id}?expand=images"
)) else {
return Artwork::default();
};
let img = |k: &str| {
v.get("images")
.and_then(|i| i.get(k))
.and_then(|u| u.as_str())
.map(abs_url)
};
Artwork {
portrait: img("verticalCover"),
hero: img("background"),
logo: img("logo2x"),
header: img("logo"),
}
}
/// Xbox cover art via the (unofficial, no-auth) Microsoft display catalog, keyed by StoreId. Best-
/// effort: the endpoint is internal/unstable, so on drift this just yields no art (title-only).
fn fetch_xbox_art(store_id: &str) -> Artwork {
let Some(v) = fetch_json(&format!(
"https://displaycatalog.mp.microsoft.com/v7.0/products/{store_id}?market=US&languages=en-us&fieldsTemplate=Details"
)) else {
return Artwork::default();
};
let images = v
.get("Products")
.and_then(|p| p.as_array())
.and_then(|a| a.first())
.and_then(|p| p.get("LocalizedProperties"))
.and_then(|l| l.as_array())
.and_then(|a| a.first())
.and_then(|lp| lp.get("Images"))
.and_then(|i| i.as_array());
let mut art = Artwork::default();
for img in images.into_iter().flatten() {
let (Some(purpose), Some(uri)) = (
img.get("ImagePurpose").and_then(|v| v.as_str()),
img.get("Uri").and_then(|v| v.as_str()),
) else {
continue;
};
let url = abs_url(uri);
match purpose {
"Poster" => art.portrait = Some(url),
"SuperHeroArt" | "Hero" => art.hero = Some(url),
"Logo" => art.logo = Some(url),
"BoxArt" => art.header = Some(url),
_ => {}
}
}
art
}
// ---------------------------------------------------------------------------------------
// Custom store (user-curated entries, persisted + CRUD'd via the mgmt API)
// ---------------------------------------------------------------------------------------
@@ -768,6 +1383,32 @@ fn windows_launch_for(spec: &LaunchSpec) -> Option<(String, Option<std::path::Pa
};
Some((cmdline, None))
}
// Epic: open the (host-built, validated) com.epicgames.launcher:// URI via explorer.exe — a
// concrete EXE that resolves the registered protocol handler as the user; the URI is a single
// argv element (no shell, no cmd /c). Same pattern as the steam explorer fallback.
"epic" => epic_launch_uri(&spec.value).map(|uri| (format!("explorer.exe \"{uri}\""), None)),
// GOG: spawn the resolved game exe directly (host-derived from goggame-<id>.info), no Galaxy.
"gog" => gog_spawn(&spec.value),
// Xbox/Game Pass: activate the UWP/GDK package by its AUMID (<PFN>!<AppId>) via explorer's
// shell:AppsFolder — which runs in the interactive user session (UWP activation fails as
// SYSTEM/session-0; spawn_in_active_session uses the user token). Guard the charset (the value
// is host-derived from MicrosoftGame.config + AppRepository, but belt-and-suspenders).
"aumid" => {
let valid = spec.value.split_once('!').is_some_and(|(pfn, app)| {
let part = |s: &str| {
!s.is_empty()
&& s.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
};
part(pfn) && part(app)
});
valid.then(|| {
(
format!("explorer.exe \"shell:AppsFolder\\{}\"", spec.value),
None,
)
})
}
// Operator-typed custom command (host-owned, never client-set): run it through the shell in the
// interactive session. `cmd.exe /c` is acceptable here precisely because the value is operator
// input — the same trust as the operator typing it — not a client-influenced string.
@@ -795,6 +1436,38 @@ fn steam_exe() -> Option<std::path::PathBuf> {
None
}
/// Launch a GameStream `apps.json` command (operator-typed, trusted — never client-set) into the live
/// session, AFTER capture is up. Used by the GameStream path for the backends that DON'T nest the
/// command via [`VirtualDisplay::set_launch_command`]: Windows (no gamescope) and Linux
/// kwin/mutter/wlroots (which stream the existing desktop). The caller skips this for Linux gamescope,
/// which already nested it. On Windows it runs in the interactive USER session (the host is SYSTEM);
/// on Linux the host is already inside the user's graphical session, so a plain spawn lands the app on
/// the streamed (primary) output.
#[cfg(any(windows, target_os = "linux"))]
pub fn launch_gamestream_command(cmd: &str) -> Result<()> {
let cmd = cmd.trim();
anyhow::ensure!(!cmd.is_empty(), "empty command");
#[cfg(windows)]
{
// cmd.exe /c is fine here: the value is the host operator's own apps.json command, not a
// client-influenced string (same trust as the custom-store `command` kind).
let pid = crate::interactive::spawn_in_active_session(&format!("cmd.exe /c {cmd}"), None)
.context("spawn gamestream command in the interactive session")?;
tracing::info!(command = %cmd, pid, "gamestream: launched app in the interactive session");
Ok(())
}
#[cfg(target_os = "linux")]
{
let child = std::process::Command::new("sh")
.arg("-c")
.arg(cmd)
.spawn()
.context("spawn gamestream command")?;
tracing::info!(command = %cmd, pid = child.id(), "gamestream: launched app into the session");
Ok(())
}
}
/// The full library: every store's titles merged + the custom entries, sorted by title.
pub fn all_games() -> Vec<GameEntry> {
let mut games = SteamProvider.list();
@@ -805,6 +1478,13 @@ pub fn all_games() -> Vec<GameEntry> {
games.extend(LutrisProvider.list());
games.extend(HeroicProvider.list());
}
// Windows store providers (their launchers are Windows-only): Epic + GOG + Xbox/Game Pass.
#[cfg(windows)]
{
games.extend(EpicProvider.list());
games.extend(GogProvider.list());
games.extend(XboxProvider.list());
}
games.extend(load_custom().into_iter().map(GameEntry::from));
games.sort_by_key(|g| g.title.to_lowercase());
games
@@ -1048,6 +1728,20 @@ mod tests {
windows_launch_for(&cmd).unwrap().0,
"cmd.exe /c notepad.exe"
);
// Xbox AUMID → explorer shell:AppsFolder activation; a value without '!' is rejected.
let aumid = LaunchSpec {
kind: "aumid".into(),
value: "Microsoft.X_8wekyb3d8bbwe!Game".into(),
};
assert_eq!(
windows_launch_for(&aumid).unwrap().0,
"explorer.exe \"shell:AppsFolder\\Microsoft.X_8wekyb3d8bbwe!Game\""
);
assert!(windows_launch_for(&LaunchSpec {
kind: "aumid".into(),
value: "no-bang".into()
})
.is_none());
// Empty / unknown kinds → no recipe.
assert!(windows_launch_for(&LaunchSpec {
kind: "command".into(),
@@ -1060,4 +1754,116 @@ mod tests {
})
.is_none());
}
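// Sketch (not part of the commit): the AUMID charset guard from the "aumid" arm of
// `windows_launch_for`, pulled out as a standalone predicate so it can be checked without a
// `LaunchSpec`. `aumid_is_valid` is a hypothetical name; the logic mirrors the closure above.

```rust
/// True when `value` has the shape `<PFN>!<AppId>` with both parts non-empty and limited to
/// ASCII alphanumerics plus '.', '_' and '-'.
fn aumid_is_valid(value: &str) -> bool {
    fn part_ok(s: &str) -> bool {
        !s.is_empty()
            && s.bytes()
                .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
    }
    // `split_once` splits at the FIRST '!'; a second '!' lands in `app` and fails `part_ok`.
    match value.split_once('!') {
        Some((pfn, app)) => part_ok(pfn) && part_ok(app),
        None => false,
    }
}
```

// Because quotes, spaces and '&' are outside the allowed set, a value that passes cannot break
// out of the quoted `shell:AppsFolder\...` argument built from it.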
#[cfg(windows)]
#[test]
fn epic_filters_and_builds_launch() {
let dir = std::env::temp_dir().join(format!("pf-epic-test-{}", std::process::id()));
std::fs::create_dir_all(&dir).unwrap();
let inst = dir.to_string_lossy().into_owned();
let empty = std::collections::HashMap::new();
// Normal game with the full triple → kept, triple launch value.
let game = serde_json::json!({
"AppName": "Fortnite", "DisplayName": "Fortnite", "CatalogNamespace": "fn",
"CatalogItemId": "abc123", "InstallLocation": inst.clone(),
"AppCategories": ["public", "games", "applications"]
});
let e = epic_entry(&game, &empty).expect("game kept");
assert_eq!(e.id, "epic:Fortnite");
assert_eq!(e.launch.as_ref().unwrap().value, "fn:abc123:Fortnite");
// UE component, non-launchable addon, and a missing install dir are all skipped.
let ue = serde_json::json!({"AppName":"UE_5.3","InstallLocation":inst.clone(),"AppCategories":["engines"]});
assert!(epic_entry(&ue, &empty).is_none());
let dlc =
serde_json::json!({"AppName":"DLC","InstallLocation":inst,"AppCategories":["addons"]});
assert!(epic_entry(&dlc, &empty).is_none());
let gone = serde_json::json!({"AppName":"Gone","InstallLocation":"C:\\nope-xyz","AppCategories":["games"]});
assert!(epic_entry(&gone, &empty).is_none());
std::fs::remove_dir_all(&dir).ok();
}
#[cfg(windows)]
#[test]
fn epic_launch_uri_triple_bare_and_guard() {
assert_eq!(
epic_launch_uri("fn:abc:Fortnite").as_deref(),
Some("com.epicgames.launcher://apps/fn%3Aabc%3AFortnite?action=launch&silent=true")
);
assert_eq!(
epic_launch_uri("Fortnite").as_deref(),
Some("com.epicgames.launcher://apps/Fortnite?action=launch&silent=true")
);
assert!(epic_launch_uri("bad part:x:y").is_none()); // a space → rejected
assert!(epic_launch_uri("").is_none());
}
#[cfg(windows)]
#[test]
fn gog_spawn_parses_and_guards() {
let (cmd, wd) = gog_spawn("C:\\Games\\W3\\witcher3.exe\t--skip\tC:\\Games\\W3").unwrap();
assert_eq!(cmd, "\"C:\\Games\\W3\\witcher3.exe\" --skip");
assert_eq!(wd, Some(std::path::PathBuf::from("C:\\Games\\W3")));
let (cmd2, wd2) = gog_spawn("C:\\g.exe").unwrap();
assert_eq!(cmd2, "\"C:\\g.exe\"");
assert!(wd2.is_none());
assert!(gog_spawn("").is_none());
}
#[cfg(windows)]
#[test]
fn gog_play_task_picks_primary_filetask() {
let dir = std::env::temp_dir().join(format!("pf-gog-test-{}", std::process::id()));
std::fs::create_dir_all(&dir).unwrap();
let id = "1207658924";
std::fs::write(
dir.join(format!("goggame-{id}.info")),
r#"{"playTasks":[
{"isPrimary":false,"type":"FileTask","path":"other.exe"},
{"isPrimary":true,"type":"FileTask","path":"bin\\game.exe","arguments":"-w","workingDir":"bin"}
]}"#,
)
.unwrap();
let (exe, args, wd) = gog_play_task(&dir.to_string_lossy(), id).unwrap();
std::fs::remove_dir_all(&dir).ok();
assert!(exe.ends_with("bin\\game.exe"), "exe={exe}");
assert_eq!(args, "-w");
assert!(wd.ends_with("bin"), "wd={wd}");
}
#[cfg(windows)]
#[test]
fn xbox_parse_config_and_pfn() {
let xml = r#"<?xml version="1.0" encoding="utf-8"?>
<Game configVersion="1">
<Identity Name="Microsoft.624F8B84B80" Publisher="CN=Microsoft" Version="1.0.0.0" />
<ExecutableList>
<Executable Name="gamelaunchhelper.exe" Id="Game" />
</ExecutableList>
<StoreId>9NBLGGH4R315</StoreId>
<ShellVisuals DefaultDisplayName="Halo Infinite" Square150x150Logo="x.png" />
</Game>"#;
let (name, app_id, title, store_id) = xbox_parse_config(xml, Some("HaloInfinite")).unwrap();
assert_eq!(name, "Microsoft.624F8B84B80");
assert_eq!(app_id, "Game");
assert_eq!(title, "Halo Infinite");
assert_eq!(store_id, "9NBLGGH4R315");
// An ms-resource DefaultDisplayName is unresolvable → fall back to the install folder name.
let xml2 = r#"<Game><Identity Name="Pkg.Name"/>
<ExecutableList><Executable Id="App"/></ExecutableList>
<ShellVisuals DefaultDisplayName="ms-resource:DisplayName"/></Game>"#;
let (_, app2, title2, sid2) = xbox_parse_config(xml2, Some("MyGameFolder")).unwrap();
assert_eq!(app2, "App");
assert_eq!(title2, "MyGameFolder");
assert_eq!(sid2, "");
// PackageFamilyName reduced from a PackageFullName dir name (the hash is the last segment).
assert_eq!(
pfn_from_full(
"Microsoft.624F8B84B80_1.0.0.0_x64__8wekyb3d8bbwe",
"Microsoft.624F8B84B80"
)
.as_deref(),
Some("Microsoft.624F8B84B80_8wekyb3d8bbwe")
);
assert!(pfn_from_full("NoUnderscore", "NoUnderscore").is_none());
}
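// Sketch (not part of the commit): one way to get the behaviour the `pfn_from_full` assertions
// above expect. The real function is not shown in this diff, so this is an assumption about
// its shape, not its implementation; `pfn_from_full_sketch` is a hypothetical name.

```rust
/// Reduce a PackageFullName-style directory name to `<name>_<hash>`, taking the hash to be
/// whatever follows the LAST underscore. Returns `None` when there is no underscore or the
/// trailing segment is empty.
fn pfn_from_full_sketch(full: &str, name: &str) -> Option<String> {
    let (_, hash) = full.rsplit_once('_')?;
    if hash.is_empty() {
        return None;
    }
    Some(format!("{name}_{hash}"))
}
```

// The version, architecture and empty resource-id segments in the middle of the full name are
// simply discarded; only the identity name and the trailing hash survive.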
}
@@ -13,6 +13,9 @@
//! attaches none, the export yields an already-signaled sync_file (poll returns immediately) — no
//! wait, no harm, and `waited=false` tells us the driver doesn't fence (so zero-copy would still race).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::os::fd::RawFd;
// linux/dma-buf.h ioctls on the DMA_BUF_BASE ('b' = 0x62) magic. _IOWR = dir(3)<<30 | size<<16 | base<<8 | nr.
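// Sketch (not part of the commit): the `_IOWR` formula from the comment above as a `const fn`.
// `iowr_sketch` and `ExportSyncFileSketch` are hypothetical names. The command number 2 and the
// 8-byte `u32 flags` + `i32 fd` layout are assumptions about `DMA_BUF_IOCTL_EXPORT_SYNC_FILE`
// that should be checked against linux/dma-buf.h. The bit layout is the generic one (x86, arm);
// a few architectures encode the direction bits differently.

```rust
/// Direction bits for a read+write ioctl (`_IOC_READ | _IOC_WRITE`).
const IOC_READ_WRITE: u64 = 3;

/// Encode an `_IOWR` request number: dir(3)<<30 | size<<16 | base<<8 | nr.
const fn iowr_sketch(base: u8, nr: u8, size: usize) -> u64 {
    (IOC_READ_WRITE << 30) | ((size as u64) << 16) | ((base as u64) << 8) | (nr as u64)
}

/// Assumed mirror of the 8-byte request struct the kernel copies in and out.
#[repr(C)]
struct ExportSyncFileSketch {
    flags: u32,
    fd: i32,
}

/// 'b' (0x62) is the DMA_BUF_BASE magic; nr 2 is an assumption (see the lead-in).
const EXPORT_SYNC_FILE_SKETCH: u64 =
    iowr_sketch(0x62, 2, std::mem::size_of::<ExportSyncFileSketch>());
```

// Deriving the size from `size_of` keeps the request number and the struct in step, which is
// the property the SAFETY comments in this file rely on.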
@@ -40,6 +43,11 @@ pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<boo
flags: DMA_BUF_SYNC_READ,
fd: -1,
};
// SAFETY: `dmabuf_fd` is a live dmabuf fd supplied by the caller (borrowed for this call; we
// never close it). `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` encodes `size_of::<DmaBufExportSyncFile>()`
// — the exact byte count the kernel copies — and `&mut req` is a live, correctly-sized
// `#[repr(C)]` struct the EXPORT_SYNC_FILE ioctl reads (`flags`) and writes (`fd`). `req`
// outlives this synchronous call and is not aliased elsewhere.
let r = unsafe { libc::ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &mut req) };
if r < 0 {
return Err(std::io::Error::last_os_error());
@@ -54,11 +62,21 @@ pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<boo
revents: 0,
};
// Non-blocking probe: not-yet-signaled (poll==0) means the producer is still rendering.
// SAFETY: `&mut pfd` points at a single live `libc::pollfd` and `nfds == 1` matches that one
// element; `pfd.fd` is `sync_fd`, the sync_file fd just exported (already checked `>= 0`).
// `poll` reads `fd`/`events` and writes `revents` for this non-blocking (timeout 0) probe, then
// returns — `pfd` outlives the call and aliases nothing.
let pending = unsafe { libc::poll(&mut pfd, 1, 0) } == 0;
if pending {
pfd.revents = 0;
// SAFETY: same live single-element `pfd` (its `revents` reset to 0 just above), `nfds == 1`,
// and `sync_fd` still open. This blocking `poll` (up to `timeout_ms`) waits for the render
// fence to signal; it reads `fd`/`events`, writes `revents`, and returns before `pfd` ends.
unsafe { libc::poll(&mut pfd, 1, timeout_ms) }; // block until the render fence signals
}
// SAFETY: `sync_fd` is the sync_file fd the EXPORT_SYNC_FILE ioctl created and handed us to own;
// this point is reached only when `sync_fd >= 0`, this `close` runs exactly once on it, and it is
// never used afterward — no double-close or use-after-close.
unsafe { libc::close(sync_fd) };
Ok(pending)
}
@@ -8,6 +8,8 @@
//! verified (ioctl numbers + a live signal→wait round trip), ready to wire in the moment a producer
//! gains working `SPA_META_SyncTimeline`.
#![allow(dead_code)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
//!
//! Compositors that render directly into the PipeWire buffer pool (Mutter's virtual
//! monitors) hand buffers over at GPU-submit time; on drivers without implicit dmabuf
@@ -81,6 +83,8 @@ pub struct DrmSync {
impl DrmSync {
pub fn open() -> Result<DrmSync> {
let path = c"/dev/dri/renderD128";
// SAFETY: `path` is a 'static NUL-terminated C string literal; `open` only reads it as a
// filesystem path and returns an fd (or -1). No Rust memory is aliased or handed to the kernel.
let fd = unsafe { libc::open(path.as_ptr(), libc::O_RDWR | libc::O_CLOEXEC) };
if fd < 0 {
bail!("open /dev/dri/renderD128 for syncobj ops: {}", errno());
@@ -94,6 +98,9 @@ impl DrmSync {
fd: syncobj_fd,
..Default::default()
};
// SAFETY: `self.fd` is the live render-node fd from `open`; the request number encodes
// `size_of::<DrmSyncobjHandle>()` (the bytes the kernel copies), and `&mut req` is a live,
// correctly-sized `#[repr(C)]` struct the FD_TO_HANDLE ioctl reads (`fd`) and writes (`handle`).
let r = unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, &mut req) };
if r < 0 {
bail!("SYNCOBJ_FD_TO_HANDLE: {}", errno());
@@ -106,6 +113,8 @@ impl DrmSync {
handle,
..Default::default()
};
// SAFETY: `self.fd` is the live render-node fd; `DRM_IOCTL_SYNCOBJ_DESTROY` encodes
// `size_of::<DrmSyncobjDestroy>()`, and `&mut req` is a live correctly-sized struct the kernel reads.
unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_DESTROY, &mut req) };
}
@@ -117,6 +126,8 @@ impl DrmSync {
tv_sec: 0,
tv_nsec: 0,
};
// SAFETY: `CLOCK_MONOTONIC` is a valid clock id and `&mut now` is a live `libc::timespec` the
// kernel fills in; the call returns before `now` is read, so there is no aliasing/lifetime issue.
unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC, &mut now) };
let deadline = now.tv_sec * 1_000_000_000 + now.tv_nsec + timeout_ms as i64 * 1_000_000;
let handles = [handle];
@@ -129,6 +140,11 @@ impl DrmSync {
flags: DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
..Default::default()
};
// SAFETY: `self.fd` is the live render-node fd; the request number encodes
// `size_of::<DrmSyncobjTimelineWait>()`; `&mut req` is a live correctly-sized struct. Its
// `handles`/`points` u64 fields hold the addresses of the local `handles`/`points` arrays, which
// outlive this synchronous call, and `count_handles == 1` matches their length — so every kernel
// read through those addresses stays in bounds.
let r = unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &mut req) };
let saved = errno();
self.destroy(handle);
@@ -151,6 +167,10 @@ impl DrmSync {
count_handles: 1,
flags: 0,
};
// SAFETY: `self.fd` is the live render-node fd; the request number encodes
// `size_of::<DrmSyncobjTimelineArray>()`; `&mut req` is a live correctly-sized struct whose
// `handles`/`points` u64 fields address the local `handles`/`points` arrays (alive for this
// synchronous call, `count_handles == 1` matching their length).
let r = unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &mut req) };
let saved = errno();
self.destroy(handle);
@@ -163,6 +183,8 @@ impl DrmSync {
impl Drop for DrmSync {
fn drop(&mut self) {
// SAFETY: `self.fd` is the fd `open` returned; this `DrmSync` owns it exclusively and `close`
// runs exactly once (here, in `Drop`), so there is no double-close or use-after-close.
unsafe { libc::close(self.fd) };
}
}
@@ -203,14 +225,19 @@ mod tests {
const CREATE: u64 = iowr(0xBF, std::mem::size_of::<Create>());
const HANDLE_TO_FD: u64 = iowr(0xC1, std::mem::size_of::<DrmSyncobjHandle>());
let mut c = Create::default();
// SAFETY: `sync.fd` is the live render-node fd; `CREATE` encodes `size_of::<Create>()`, and
// `&mut c` is a live correctly-sized struct the kernel fills (`handle`).
assert!(unsafe { libc::ioctl(sync.fd, CREATE, &mut c) } >= 0);
let mut h = DrmSyncobjHandle {
handle: c.handle,
..Default::default()
};
// SAFETY: `sync.fd` is live; `HANDLE_TO_FD` encodes `size_of::<DrmSyncobjHandle>()`; `&mut h`
// is a live correctly-sized struct (the kernel reads `handle`, writes `fd`).
assert!(unsafe { libc::ioctl(sync.fd, HANDLE_TO_FD, &mut h) } >= 0);
sync.signal_point(h.fd, 1).expect("signal");
sync.wait_point(h.fd, 1, 100).expect("wait after signal");
// SAFETY: `h.fd` is the fd HANDLE_TO_FD just exported; we own it and close it exactly once here.
unsafe { libc::close(h.fd) };
sync.destroy(c.handle);
}
@@ -11,6 +11,8 @@
//! thread) and ffmpeg's `hevc_nvenc` (encode thread); each thread makes it current before use.
#![allow(non_camel_case_types, non_snake_case)]
// Every `unsafe` block/impl below carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Result};
use std::os::raw::{c_int, c_uint, c_void};
@@ -128,8 +130,14 @@ struct CudaApi {
) -> CUresult,
cuDestroyExternalMemory: unsafe extern "C" fn(CUexternalMemory) -> CUresult,
}
// The resolved fn pointers are plain addresses into a process-lifetime mapping; safe to share.
// SAFETY: every field is a bare `extern "C" fn` address into the leaked, process-lifetime
// `libcuda` mapping (`cuda_api` `forget`s the `Library`, so it is never unloaded) — an immutable
// value with no interior mutability and no thread affinity. Moving the table to another thread
// cannot dangle (the code it points at stays mapped) or race (the fields are read-only).
unsafe impl Send for CudaApi {}
// SAFETY: as above — the table is a set of immutable fn-pointer addresses with no interior
// mutability, so concurrent shared reads from multiple threads cannot race; the driver entry
// points they address are themselves thread-safe.
unsafe impl Sync for CudaApi {}
/// `CUresult` returned by the wrappers when `libcuda` isn't loaded (no NVIDIA driver). Non-zero so
@@ -143,6 +151,14 @@ static CUDA_API: OnceLock<Option<CudaApi>> = OnceLock::new();
/// (the expected case on AMD/Intel hosts) — logged at debug, not an error.
fn cuda_api() -> Option<&'static CudaApi> {
CUDA_API
// SAFETY: `Library::new` runs `libcuda.so.1`'s initializers — it is the trusted NVIDIA
// driver library, so loading has no unexpected effects; `?`/`None` handle its absence.
// Each `lib.get::<T>(name)` asserts the symbol's real ABI equals `T`: every NUL-terminated
// name is a documented CUDA Driver API entry point and `T` is the exact
// `unsafe extern "C" fn(..)` signature from cuda.h/cudaGL.h (`_v2` for ctx/mem ops). Each
// `Symbol` only borrows `lib` until the end of the struct-literal statement; we deref-copy
// the raw fn-pointer out first, then `forget(lib)` leaks the mapping so those addresses
// stay valid for the whole process. Runs once under the `OnceLock` init — no aliasing.
.get_or_init(|| unsafe {
let lib = libloading::Library::new("libcuda.so.1")
.or_else(|_| libloading::Library::new("libcuda.so"))
@@ -361,6 +377,12 @@ pub fn read_plane_to_host(
Height: height,
..Default::default()
};
// SAFETY: `copy_blocking` is unsafe because it issues a CUDA copy; its contract is a valid
// descriptor with the shared context current (the caller's responsibility — self-test path).
// `&copy` is a live local `#[repr(C)] CUDA_MEMCPY2D` that outlives the synchronous call:
// `srcDevice`/`srcPitch` are the caller's live pitched device plane, `dstHost` addresses the
// freshly-allocated `host` `Vec` of exactly `width_bytes*height` bytes, and `WidthInBytes`×
// `Height` fit both. The copy is synchronous, so `host` is fully written before we return it.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(dev->host)")? };
Ok(host)
}
@@ -369,7 +391,13 @@ pub fn read_plane_to_host(
/// in a `OnceLock`; the raw `CUcontext` is thread-safe to make current from any thread.
#[derive(Clone, Copy)]
pub struct Context(pub CUcontext);
// SAFETY: `CUcontext` is an opaque CUDA driver handle, not a dereferenceable Rust pointer. It is
// created once and never destroyed (process lifetime), and the only thing done with it is
// `cuCtxSetCurrent`, which the Driver API explicitly allows from any thread — so transferring the
// handle to another thread cannot dangle or race (the driver owns the synchronization).
unsafe impl Send for Context {}
// SAFETY: as above — the wrapped handle is an immutable opaque address and the driver does all the
// synchronization, so sharing `&Context` across threads is sound.
unsafe impl Sync for Context {}
static CONTEXT: OnceLock<Context> = OnceLock::new();
@@ -382,6 +410,12 @@ pub fn context() -> Result<CUcontext> {
if cuda_api().is_none() {
bail!("libcuda.so.1 not available — no NVIDIA driver (CUDA zero-copy disabled)");
}
// SAFETY: we returned above unless `cuda_api()` is `Some`, so every wrapper here forwards into
// the live, leaked `libcuda` table rather than the not-loaded stub. `cuInit(0)` passes the
// API-required flags value 0. `&mut dev`/`&mut ctx` are live, zero/null-initialized stack
// out-params the driver writes the device handle / new context into; each outlives its
// synchronous call and they are distinct locals (no aliasing). `cuCtxCreate_v2` yields a valid
// `CUcontext` on success (`ck` bails otherwise), which becomes the block's value.
let ctx = unsafe {
ck(cuInit(0), "cuInit")?;
let mut dev: CUdevice = 0;
@@ -401,6 +435,10 @@ pub fn context() -> Result<CUcontext> {
/// Make the shared context current on the calling thread (required before any CUDA op here).
pub fn make_current() -> Result<()> {
let ctx = context()?;
// SAFETY: `ctx` came from `context()?`, so it is the live shared `CUcontext` and the driver
// table is present. `cuCtxSetCurrent` binds that opaque handle to the calling thread; it takes
// no Rust-memory pointer and is thread-safe (affects only this thread's current context), so
// there is no aliasing or lifetime hazard.
unsafe { ck(cuCtxSetCurrent(ctx), "cuCtxSetCurrent") }
}
@@ -423,6 +461,12 @@ fn copy_stream() -> CUstream {
if let Some(s) = cell.get() {
return s;
}
// SAFETY: `copy_stream` runs with the shared context current (its doc contract), so the
// wrappers forward into the live `libcuda` table. `&mut least`/`&mut greatest` are live
// stack `i32`s the driver fills with the priority range; `&mut s` is a live null-init
// `CUstream` the driver writes the new stream into. All out-params outlive their
// synchronous calls and are distinct locals. On any non-zero result we fall back to a null
// (NULL-stream) value and never read an uninitialized handle.
let stream = unsafe {
let (mut least, mut greatest) = (0i32, 0i32);
if cuCtxGetStreamPriorityRange(&mut least, &mut greatest) != 0 {
@@ -459,6 +503,11 @@ unsafe fn copy_blocking(copy: &CUDA_MEMCPY2D, what: &str) -> Result<()> {
fn alloc_pitched(width: u32, height: u32) -> Result<(CUdeviceptr, usize)> {
let mut ptr: CUdeviceptr = 0;
let mut pitch: usize = 0;
// SAFETY: `cuMemAllocPitch_v2` allocates a pitched device buffer (the wrapper forwards to the
// live table on any path that reached allocation). `&mut ptr` (`CUdeviceptr`) and `&mut pitch`
// (`usize`) are live, distinct stack out-params the driver writes the allocation pointer and
// its pitch into; both outlive the synchronous call. Width/height/element-size are by-value
// ints. No aliasing — two separate locals.
unsafe {
ck(
cuMemAllocPitch_v2(
@@ -486,6 +535,10 @@ fn alloc_pitched_nv12(
let mut y_pitch: usize = 0;
let mut uv_ptr: CUdeviceptr = 0;
let mut uv_pitch: usize = 0;
// SAFETY: two independent `cuMemAllocPitch_v2` calls (wrapper → live table). `&mut y_ptr`/
// `&mut y_pitch` and `&mut uv_ptr`/`&mut uv_pitch` are live, distinct stack out-params the
// driver writes each plane's pointer and pitch into; all outlive their synchronous calls. The
// dimension/element-size args are by-value ints. No aliasing — four separate locals.
unsafe {
ck(
cuMemAllocPitch_v2(
@@ -524,6 +577,13 @@ struct PoolInner {
impl Drop for PoolInner {
fn drop(&mut self) {
// SAFETY: the pool only exists because allocation succeeded, so the driver table is live.
// `PoolInner` drops only once every `DeviceBuffer` that referenced it (each holds an `Arc`
// clone) has been recycled, so `free`/`free_uv` hold every outstanding allocation exactly
// once and nothing else still uses them — no double-free or use-after-free. We make the
// shared context current first (drop may run off the allocating thread) so `cuMemFree_v2`
// targets the right context. Each `p` is a `CUdeviceptr` previously returned by
// `cuMemAllocPitch_v2`; results are ignored (best-effort teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -697,6 +757,12 @@ impl Drop for DeviceBuffer {
}
} else {
// The buffer may be freed on the encode thread; cuMemFree needs a current context.
// SAFETY: this is the un-pooled branch (`pool` is `None`), so this `DeviceBuffer`
// exclusively owns `self.ptr` (and `self.uv`'s `uv_ptr`), each returned by
// `cuMemAllocPitch_v2` and freed exactly once here — `drop` runs once and the
// `self.ptr == 0` guard above skips the sentinel/empty case, so no double-free. We set
// the shared context current first because drop may run on a thread where it isn't, and
// `cuMemFree_v2` needs it. Wrapper → live table; results ignored (teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -745,6 +811,16 @@ impl RegisteredTexture {
/// unmap. The copy is synchronized (on our priority stream) before unmap so `dst` is ready
/// before the source dmabuf is recycled. Always unmaps, even if the copy errors.
pub fn copy_mapped_to(&mut self, dst: &DeviceBuffer) -> Result<()> {
// SAFETY: `self.resource` is the valid `CUgraphicsResource` from a successful `register_gl`
// (its only constructor), so the wrappers forward to the live table; the caller holds the
// GL+CUDA contexts current (the registration's contract). `cuGraphicsMapResources` maps
// `count == 1` resource via `&mut self.resource` (a live field) on the default stream;
// `cuGraphicsSubResourceGetMappedArray` writes the mapped `CUarray` into the live local
// `array` (index 0, mip 0). On failure we unmap and bail (balanced). `&copy` is a live
// local `CUDA_MEMCPY2D` outliving the synchronous `copy_blocking`: `srcArray` is valid
// while mapped, `dstDevice`/`dstPitch` are `dst`'s live allocation, `width*4`×`height` fit
// both. `copy_blocking` syncs before we unmap, so the array stays valid through the copy;
// we always unmap afterward (even on error), keeping the map/unmap pair balanced.
unsafe {
ck(
cuGraphicsMapResources(1, &mut self.resource, std::ptr::null_mut()),
@@ -783,6 +859,14 @@ impl RegisteredTexture {
width_bytes: usize,
height: usize,
) -> Result<()> {
// SAFETY: identical contract to `copy_mapped_to` — `self.resource` is the valid
// `CUgraphicsResource` from `register_gl` (wrappers → live table; caller holds GL+CUDA
// contexts current). Map `count == 1` resource via the live `&mut self.resource`; the
// mapped `CUarray` is written into the live local `array` (index 0, mip 0); on failure we
// unmap and bail (balanced). `&copy` is a live local outliving the synchronous
// `copy_blocking`: `srcArray` valid while mapped, `dstDevice`/`dstPitch` are the caller's
// live plane, `width_bytes`×`height` fit it. We always unmap afterward, even on copy error,
// so the map/unmap pair stays balanced and the array outlives the copy.
unsafe {
ck(
cuGraphicsMapResources(1, &mut self.resource, std::ptr::null_mut()),
@@ -847,6 +931,10 @@ pub fn copy_device_to_device(
Height: src.height as usize,
..Default::default()
};
// SAFETY: `copy_blocking` is unsafe (issues a CUDA copy); the caller must have the shared
// context current (documented). `&copy` is a live local device→device `CUDA_MEMCPY2D` outliving
// the synchronous call: `srcDevice`/`srcPitch` are `src`'s live allocation, `dstDevice`/
// `dstPitch` the caller's live region, `width*4`×`height` within both. Wrapper → live table.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(dev->dev)") }
}
@@ -888,6 +976,12 @@ pub fn copy_nv12_to_device(
Height: h / 2,
..Default::default()
};
// SAFETY: two unsafe `copy_blocking` device→device copies; the caller must have the shared
// context current (documented). `&y`/`&uv` are live local `CUDA_MEMCPY2D`s outliving each
// synchronous call. All four device pointers are valid: `src.ptr`/`src_uv_ptr` come from a live
// NV12 `DeviceBuffer` (its `.uv` presence was checked via `ok_or_else`), `y_dst`/`uv_dst` are
// the caller's live NVENC surface planes; the luma copy is `w`×`h`, the chroma copy
// `(w/2)*2`×`h/2`, each within its planes. Wrappers → live table.
unsafe {
copy_blocking(&y, "cuMemcpy2DAsync_v2(nv12 Y dev->dev)")?;
copy_blocking(&uv, "cuMemcpy2DAsync_v2(nv12 UV dev->dev)")
@@ -897,6 +991,12 @@ pub fn copy_nv12_to_device(
impl Drop for RegisteredTexture {
fn drop(&mut self) {
if !self.resource.is_null() {
// SAFETY: `self.resource` is non-null (just checked) and is the valid
// `CUgraphicsResource` from `register_gl`, owned exclusively by this `RegisteredTexture`
// and unregistered exactly once here (drop runs once) — no use-after-free or
// double-unregister. `cuGraphicsUnregisterResource` releases the GL↔CUDA registration;
// wrapper → live table (the resource exists ⇒ the driver was present). Result ignored
// (best-effort teardown).
unsafe {
let _ = cuGraphicsUnregisterResource(self.resource);
}
@@ -913,7 +1013,11 @@ pub struct ExternalDmabuf {
pub size: u64,
}
// Raw driver handles; used from the single capture thread but moved with the importer.
// SAFETY: the fields are opaque CUDA driver handles — an external-memory handle and a device
// pointer — not dereferenceable Rust memory, and the value is uniquely owned (no `Clone`). It is
// used from a single capture thread but constructed on / moved between threads with the importer;
// transferring these handles is sound because uniqueness rules out aliasing and they are destroyed
// exactly once in `Drop`. Only `Send` (not `Sync`) is asserted, matching the single-thread use.
unsafe impl Send for ExternalDmabuf {}
impl ExternalDmabuf {
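The `Send`-only argument above can be illustrated with a minimal, std-only sketch. `OpaqueHandle`, `DESTROYED`, and `move_and_drop` are hypothetical names invented for this illustration; the real type wraps CUDA driver handles, which this sketch does not touch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how often the pretend driver handle was destroyed.
static DESTROYED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a struct of opaque driver handles.
/// The raw pointer field makes the type `!Send` and `!Sync` by default.
struct OpaqueHandle {
    raw: *mut u8, // opaque token; never dereferenced from Rust
}

// SAFETY: `raw` is an opaque token, not Rust memory we dereference. The value is
// uniquely owned (no `Clone`), so moving it to another thread cannot introduce
// aliasing, and `Drop` releases it exactly once. `Sync` is deliberately not
// implemented.
unsafe impl Send for OpaqueHandle {}

impl Drop for OpaqueHandle {
    fn drop(&mut self) {
        if !self.raw.is_null() {
            DESTROYED.fetch_add(1, Ordering::SeqCst);
        }
    }
}

/// Moves a handle onto a worker thread, lets it drop there, and returns how
/// many destroy calls that caused.
fn move_and_drop() -> usize {
    let before = DESTROYED.load(Ordering::SeqCst);
    let handle = OpaqueHandle { raw: 0x1000 as *mut u8 };
    thread::spawn(move || {
        let _owned = handle; // ownership now lives on the worker thread
    })
    .join()
    .unwrap();
    DESTROYED.load(Ordering::SeqCst) - before
}

fn main() {
    println!("destroy calls: {}", move_and_drop());
}
```

Removing the `unsafe impl Send` line makes `thread::spawn` reject the closure at compile time, which is the property the SAFETY comment has to justify by hand.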
@@ -921,6 +1025,9 @@ impl ExternalDmabuf {
/// from then on) and map its full `size` bytes to a device pointer. The shared context
/// must be current.
pub fn import(fd: i32, size: u64) -> Result<ExternalDmabuf> {
// SAFETY: `libc::dup` only reads the integer `fd` and returns a new descriptor (or -1); it
// touches no Rust memory and `fd` is the caller's still-owned dmabuf fd (not consumed
// here). No aliasing or lifetime concern — a pure syscall on an integer.
let dup = unsafe { libc::dup(fd) };
if dup < 0 {
bail!("dup(dmabuf fd) failed");
@@ -938,8 +1045,17 @@ impl ExternalDmabuf {
};
desc.handle[0] = dup as u32 as u64; // union member `int fd` (little-endian low bytes)
let mut ext: CUexternalMemory = std::ptr::null_mut();
// SAFETY: `cuImportExternalMemory` imports the memory described by `&desc`, a live local
// `#[repr(C)] CUDA_EXTERNAL_MEMORY_HANDLE_DESC` (cuda.h 64-bit layout) that outlives this
// synchronous call: `type_` is OPAQUE_FD, `handle[0]` holds the dup'd fd in the union's
// `int fd` low bytes, `size` is set. `&mut ext` is a live null-init out-param the driver
// writes the imported handle into. The driver takes ownership of the fd only on success.
// Distinct locals → no aliasing. Wrapper → live table (caller holds the context current).
let r = unsafe { cuImportExternalMemory(&mut ext, &desc) };
if r != 0 {
// SAFETY: import failed (`r != 0`), so the driver did NOT take ownership of `dup`; we
// still own it and close it exactly once here on the error path (the success path never
// closes it — the driver does). `libc::close` acts on the integer fd alone.
unsafe { libc::close(dup) }; // import failed → the driver did not take the fd
bail!("cuImportExternalMemory failed ({r}) — LINEAR dmabuf import unsupported?");
}
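The "ownership transfers only on success" rule from the SAFETY comments above can also be expressed in types, so that the close-exactly-once property is checked by the compiler. This is a std-only sketch under stated assumptions: `OwnedToken`, `Imported`, and `import` are hypothetical stand-ins, not the CUDA driver API, and dropping the token stands in for `close(fd)`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical owned descriptor; dropping it stands in for `close(fd)`.
struct OwnedToken {
    closes: Rc<Cell<u32>>,
}

impl Drop for OwnedToken {
    fn drop(&mut self) {
        self.closes.set(self.closes.get() + 1);
    }
}

/// Hypothetical imported object; it owns the token once an import succeeded.
struct Imported {
    _token: OwnedToken,
}

/// Stand-in for an import call that takes ownership only on success.
/// On failure the token is handed back, so the caller stays responsible for it.
fn import(token: OwnedToken, succeed: bool) -> Result<Imported, (OwnedToken, i32)> {
    if succeed {
        Ok(Imported { _token: token })
    } else {
        Err((token, 1)) // 1 is a made-up error code
    }
}

/// Runs one import and returns how many times the token was closed in total.
fn closes_after_import(succeed: bool) -> u32 {
    let closes = Rc::new(Cell::new(0));
    let token = OwnedToken { closes: Rc::clone(&closes) };
    match import(token, succeed) {
        Ok(imported) => drop(imported),     // success: the importer closes it
        Err((token, _code)) => drop(token), // failure: the caller closes it
    }
    closes.get()
}

fn main() {
    println!("success path: {} close(s)", closes_after_import(true));
    println!("failure path: {} close(s)", closes_after_import(false));
}
```

With a raw integer fd the same invariant has to be argued in a comment, as the diff does; the typed version only shows what that argument is protecting.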
@@ -949,8 +1065,17 @@ impl ExternalDmabuf {
..Default::default()
};
let mut ptr: CUdeviceptr = 0;
// SAFETY: maps a device pointer from `ext` (the valid `CUexternalMemory` just imported) per
// `&buf`, a live local `CUDA_EXTERNAL_MEMORY_BUFFER_DESC` (offset 0, full `size`) that
// outlives this synchronous call. `&mut ptr` is a live zero-init out-param the driver writes
// the mapped device address into; distinct locals → no aliasing. Wrapper → live table
// (context current).
let r = unsafe { cuExternalMemoryGetMappedBuffer(&mut ptr, ext, &buf) };
if r != 0 {
// SAFETY: mapping failed; `ext` is the valid `CUexternalMemory` we imported and
// exclusively own. We destroy it exactly once here on the error path (the success path
// instead moves it into the returned `ExternalDmabuf`, whose `Drop` destroys it),
// releasing the fd the driver took — no double-destroy or use-after-free.
unsafe {
let _ = cuDestroyExternalMemory(ext);
}
@@ -962,6 +1087,12 @@ impl ExternalDmabuf {
impl Drop for ExternalDmabuf {
fn drop(&mut self) {
// SAFETY: this `ExternalDmabuf` only exists after a successful import, so the driver table
// is live. It exclusively owns `self.ptr` (the mapped buffer) and `self.ext` (the external
// memory), each torn down exactly once here (drop runs once; guarded by `!= 0` / `!null`) —
// no double-free or use-after-free. We make the shared context current first because drop
// may run off the import thread, and we free the mapped buffer before destroying its
// backing external memory. Results ignored (best-effort teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -996,5 +1127,10 @@ pub fn copy_pitched_to_buffer(
};
// copy_blocking syncs our priority stream before returning, so the copy is complete before the
// dmabuf is requeued to the producer.
// SAFETY: `copy_blocking` is unsafe (issues a CUDA copy); the caller must have the shared
// context current (documented). `&copy` is a live local device→device `CUDA_MEMCPY2D` outliving
// the synchronous call: `srcDevice`/`srcPitch` are the caller's live mapped span (e.g. an
// `ExternalDmabuf`), `dstDevice`/`dstPitch` are `dst`'s live allocation, `width*4`×`height`
// within both. Wrapper → live table.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(ext->dev)") }
}
+110 -3
@@ -12,6 +12,8 @@
//! owned [`DeviceBuffer`] so the dmabuf can be returned to the compositor immediately.
#![allow(non_upper_case_globals)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::cuda::{self, DeviceBuffer};
use anyhow::{bail, ensure, Context as _, Result};
@@ -415,6 +417,14 @@ impl Nv12Blit {
impl Drop for Nv12Blit {
fn drop(&mut self) {
// SAFETY: these GL names (textures/FBOs/VAO/programs) were all created by THIS `Nv12Blit`
// in `Nv12Blit::new` on the current GL context, which is still current because the owning
// `EglImporter` is dropped on its single capture thread (fields drop after
// `EglImporter::drop`, which never releases the context). `glDelete*` takes a count + a
// pointer to that many names: `&self.y_tex`/`&self.vao` are `&u32` to one live field (n=1);
// `[self.y_fbo, self.uv_fbo].as_ptr()` points at a 2-element temporary that lives for the
// whole `glDeleteFramebuffers` call (n=2 matches). The symbols dispatch through libGL
// (libglvnd) to the driver for the current context. Each name is deleted exactly once.
unsafe {
glDeleteTextures(1, &self.y_tex);
glDeleteTextures(1, &self.uv_tex);
@@ -459,7 +469,14 @@ pub struct EglImporter {
render_fd: c_int,
}
// The EGL handles are confined to the capture thread; the struct is moved there once.
// SAFETY: `EglImporter` owns thread-affine handles — an EGLDisplay/contexts made current on one
// thread, a loaded GL proc pointer, a `gbm_device*`, a raw fd, and CUDA-registered GL textures —
// none safe to touch concurrently. It is constructed inside `pipewire_thread` on the dedicated
// `punktfunk-pipewire` thread, and every method (`import*`, `supported_modifiers`, `Drop`) runs on
// that same thread; it is never accessed through a shared `&` from another thread. `Send` asserts
// only that transferring *ownership* is sound (needed so the importer can live in the PipeWire
// stream's user-data, whose API imposes a `Send` bound) — the live handles are never used
// off-thread. `Sync` is deliberately NOT implied.
unsafe impl Send for EglImporter {}
impl EglImporter {
@@ -470,16 +487,38 @@ impl EglImporter {
// to the same DRM device CUDA-GL interop associates with, which the EGL device platform
// did not (cuGraphicsGLRegisterImage rejected device-platform GL textures).
let path = std::ffi::CString::new("/dev/dri/renderD128").unwrap();
// SAFETY: `path` is a live local `CString` (built from a string with no interior NUL, so it
// is NUL-terminated); `path.as_ptr()` is a valid pointer to that buffer which outlives this
// synchronous `open`. `open` only reads the path and returns a new fd (or -1); it neither
// retains the pointer nor writes through it, so there is no aliasing or lifetime hazard.
let render_fd = unsafe { libc::open(path.as_ptr(), libc::O_RDWR | libc::O_CLOEXEC) };
ensure!(render_fd >= 0, "open /dev/dri/renderD128 for GBM");
// SAFETY: `render_fd` is the live DRM render-node fd just returned by `open` and checked
// `>= 0`. `gbm_create_device` (libgbm, linked above) builds a `gbm_device` over that fd and
// returns a `*mut gbm_device` (or null); it borrows but does not take ownership of the fd,
// which `EglImporter` keeps open and closes only in `Drop` after `gbm_device_destroy`. No
// Rust-owned memory is passed, so there is nothing to alias.
let gbm = unsafe { gbm_create_device(render_fd) };
if gbm.is_null() {
// SAFETY: reached only when `gbm_create_device` failed (null) — the fd was not consumed
// and no `EglImporter` exists yet to close it again, so this `close` runs exactly once on
// the live `render_fd`, releasing it before the error return. No double-close.
unsafe { libc::close(render_fd) };
anyhow::bail!("gbm_create_device failed");
}
// SAFETY: `Egl::load_required` dlopens the system libEGL and binds its entry points,
// trusting that libEGL (libglvnd) is a genuine EGL 1.5 implementation whose core symbols
// match the ABI the `khronos_egl` `EGL1_5` bindings declare. No Rust memory is passed; the
// returned instance is afterwards used only through the safe `khronos_egl` wrappers.
let egl: Egl =
unsafe { Egl::load_required() }.context("load libEGL (EGL 1.5 dynamic instance)")?;
// SAFETY: `gbm` is the non-null `gbm_device*` created just above (checked), and
// `EGL_PLATFORM_GBM_KHR` is exactly the platform enum that pairs with a GBM device as the
// native-display handle, so the `gbm as NativeDisplayType` cast hands EGL a valid native
// display for the requested platform. `&[egl::ATTRIB_NONE]` is a properly terminated, empty
// attribute array borrowed for this synchronous call; EGL only reads it and returns an
// `EGLDisplay`, retaining no pointer into Rust memory.
let display = unsafe {
egl.get_platform_display(
EGL_PLATFORM_GBM_KHR,
@@ -533,6 +572,13 @@ impl EglImporter {
.context("eglCreateContext(OpenGL)")?;
egl.make_current(display, None, None, Some(gl_ctx))
.context("eglMakeCurrent surfaceless (needs EGL_KHR_surfaceless_context)")?;
// SAFETY: the GL context was made current on this thread just above, which `eglGetProcAddress`
// requires to return a usable pointer. The non-null (`?`-checked) pointer it returns for
// "glEGLImageTargetTexture2DOES" is the driver's implementation of that GL-OES entry point,
// whose real ABI is `void(GLenum, GLeglImageOES)` = `(u32, *mut c_void)` `extern "system"`.
// `EglImageTargetFn` is declared with exactly that signature, so the transmute only retypes a
// same-size, same-ABI thin function pointer (no value/representation change). The function is
// present because `EGL_EXT_image_dma_buf_import` was asserted on this display above.
let egl_image_target: EglImageTargetFn = unsafe {
std::mem::transmute(
egl.get_proc_address("glEGLImageTargetTexture2DOES")
@@ -543,6 +589,10 @@ impl EglImporter {
// Create the shared CUDA context up front so import() is pure hot path.
cuda::context().context("create CUDA context")?;
// SAFETY: `egl::NO_CONTEXT` is EGL's defined sentinel (a null handle) for "no context";
// `Context::from_ptr` only stores the handle (it never dereferences it), so wrapping the
// null sentinel is sound and yields exactly the `EGL_NO_CONTEXT` value that
// `eglCreateImage(EGL_LINUX_DMA_BUF_EXT)` requires as its context argument later.
let no_ctx = unsafe { egl::Context::from_ptr(egl::NO_CONTEXT) };
tracing::info!(
"zero-copy EGL importer ready (GBM platform + GL texture interop, dma_buf_import + modifiers)"
@@ -602,8 +652,21 @@ impl EglImporter {
let Some(sym) = self.egl.get_proc_address("eglQueryDmaBufModifiersEXT") else {
return Vec::new();
};
// SAFETY: `sym` is the non-null pointer `eglGetProcAddress("eglQueryDmaBufModifiersEXT")`
// returned (the `let-else` already bailed on `None`) — the driver's implementation of that
// EGL extension entry point. `QueryFn` is declared with that function's exact documented ABI
// (`EGLDisplay, EGLint, EGLint, EGLuint64KHR*, EGLBoolean*, EGLint* -> EGLBoolean`), all
// `extern "system"`, so the transmute only retypes a same-size, same-ABI thin fn pointer.
let query: QueryFn = unsafe { std::mem::transmute(sym) };
let dpy = self.display.as_ptr();
// SAFETY: `dpy` is this importer's live, initialized `EGLDisplay`; `query` is the proc loaded
// just above. The first call passes null out-arrays with `max_modifiers == 0`, which the
// extension defines as "write only the count" — it writes solely through `&mut count` (a live
// local `i32`). For the second call, `mods`/`ext` are freshly allocated `Vec`s of exactly
// `count` elements and `max_modifiers == count`, so the driver writes at most `count`
// `u64`/`u32` entries (in bounds) plus the actual count through `&mut n` (a live local). All
// four Rust addresses outlive these synchronous calls and alias nothing else. `truncate` only
// shrinks, so even a misbehaving `n > count` cannot read out of bounds.
unsafe {
let mut count: i32 = 0;
if query(
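The "count first, then fill" calling convention described in the SAFETY comment above is sketched below against a mock function. `mock_query`, `AVAILABLE`, and `query_all` are hypothetical and have a simplified signature; they only mirror the shape of the two-call pattern, not the real EGL entry point.

```rust
/// Values the mock "driver" reports.
static AVAILABLE: [u64; 3] = [0x10, 0x20, 0x30];

/// Hypothetical C-style query. With `max == 0` (or a null `out`) it writes only
/// `*count`; otherwise it writes at most `max` entries and the number written.
unsafe extern "C" fn mock_query(max: i32, out: *mut u64, count: *mut i32) -> u32 {
    let written = if max > 0 && !out.is_null() {
        let n = AVAILABLE.len().min(max as usize);
        // SAFETY: the caller guarantees `out` points at `max` writable u64 slots,
        // and `n <= max`.
        unsafe { std::ptr::copy_nonoverlapping(AVAILABLE.as_ptr(), out, n) };
        n
    } else {
        AVAILABLE.len()
    };
    // SAFETY: the caller guarantees `count` is a valid, writable pointer.
    unsafe { *count = written as i32 };
    1 // "true"
}

/// Queries the count, allocates exactly that many slots, then fills them.
fn query_all() -> Vec<u64> {
    let mut count: i32 = 0;
    // SAFETY: `max == 0` with a null array means only `count` (a live local) is written.
    let ok = unsafe { mock_query(0, std::ptr::null_mut(), &mut count) };
    if ok == 0 || count <= 0 {
        return Vec::new();
    }
    let mut items = vec![0u64; count as usize];
    let mut n: i32 = 0;
    // SAFETY: `items` has exactly `count` elements and `max == count`, so every
    // write stays in bounds; `n` is a live local.
    let ok = unsafe { mock_query(count, items.as_mut_ptr(), &mut n) };
    if ok == 0 {
        return Vec::new();
    }
    // `truncate` only shrinks, so a bogus `n > count` cannot expose unwritten slots.
    items.truncate(n.max(0) as usize);
    items
}

fn main() {
    println!("{:x?}", query_all());
}
```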
@@ -699,6 +762,10 @@ impl EglImporter {
]);
}
attrs.push(egl::ATTRIB_NONE);
// SAFETY: `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` mandates a NULL `EGLClientBuffer`
// (the source is described entirely by the attribute list built above), so wrapping
// `null_mut()` is the required value. `from_ptr` only stores the pointer without
// dereferencing it, so constructing it from null is sound.
let client = unsafe { egl::ClientBuffer::from_ptr(std::ptr::null_mut()) };
let image = self
.egl
@@ -733,11 +800,21 @@ impl EglImporter {
) -> Result<DeviceBuffer> {
cuda::make_current()?;
if self.blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
// SAFETY: `GlBlit::new` requires the GL context current on the calling thread and a
// current CUDA context. Both hold: this runs on the capture thread where
// `EglImporter::new` made the GL context current and never released it, and
// `cuda::make_current()?` ran at the top of this function. `width`/`height` are plain
// `Copy` frame dimensions.
self.blit = Some(unsafe { GlBlit::new(width, height)? });
}
let egl_image_target = self.egl_image_target;
let blit = self.blit.as_mut().unwrap();
// SAFETY: GL + CUDA contexts current on this thread; `image` is a valid EGLImage.
// SAFETY: `GlBlit::run` requires a current GL context and a valid `EGLImage`. The GL context
// is current on this capture thread (made current in `EglImporter::new`, never released) and
// `cuda::make_current()` ran above; `egl_image_target` is the `glEGLImageTargetTexture2DOES`
// pointer loaded in `new`; `image` is the raw handle of the live `EGLImage` that
// `import_inner` created with `eglCreateImage` and destroys only AFTER this call returns, so
// it stays valid for the whole synchronous `run`.
unsafe { blit.run(egl_image_target, image)? };
// Persistent registration (mapped per frame) + a pooled buffer — no per-frame
// cuGraphicsGLRegisterImage / cuMemAllocPitch.
@@ -757,11 +834,21 @@ impl EglImporter {
) -> Result<DeviceBuffer> {
cuda::make_current()?;
if self.nv12_blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
// SAFETY: `Nv12Blit::new` requires the GL context current on the calling thread and a
// current CUDA context. Both hold: this runs on the capture thread where
// `EglImporter::new` made the GL context current and never released it, and
// `cuda::make_current()?` ran at the top of this function. `width`/`height` are plain
// `Copy` frame dimensions.
self.nv12_blit = Some(unsafe { Nv12Blit::new(width, height)? });
}
let egl_image_target = self.egl_image_target;
let blit = self.nv12_blit.as_mut().unwrap();
// SAFETY: GL + CUDA contexts current on this thread; `image` is a valid EGLImage.
// SAFETY: `Nv12Blit::run` requires a current GL context and a valid `EGLImage`. The GL
// context is current on this capture thread (made current in `EglImporter::new`, never
// released) and `cuda::make_current()` ran above; `egl_image_target` is the
// `glEGLImageTargetTexture2DOES` pointer loaded in `new`; `image` is the raw handle of the
// live `EGLImage` that `import_inner` created with `eglCreateImage` and destroys only AFTER
// this call returns, so it stays valid for the whole synchronous `run`.
unsafe { blit.run(egl_image_target, image)? };
let dst = blit.pool.get()?;
cuda::copy_mapped_nv12(&mut blit.y_registered, &mut blit.uv_registered, &dst)?;
@@ -787,9 +874,22 @@ impl EglImporter {
);
cuda::make_current()?;
if self.nv12_blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
// SAFETY: `Nv12Blit::new` requires the GL context current on the calling thread and a
// current CUDA context. Both hold: this self-test path runs on the thread that owns this
// `EglImporter` with its GL context current, and `cuda::make_current()?` ran just above.
// `width`/`height` are plain `Copy` scalars.
self.nv12_blit = Some(unsafe { Nv12Blit::new(width, height)? });
}
let blit = self.nv12_blit.as_mut().unwrap();
// SAFETY: runs on the thread that owns this `EglImporter` with its GL context current.
// `blit.src_tex` is a texture this `Nv12Blit` owns; `glTexStorage2D` allocates immutable
// RGBA8 storage exactly once (guarded by `test_src_storage`) sized `width×height`.
// `glTexSubImage2D` then uploads exactly `width×height` RGBA8 texels, reading `width*height*4`
// bytes from `rgba.as_ptr()`; the caller already asserted `rgba.len() == width*height*4`, rows
// are `width*4` bytes (a multiple of the default 4-byte unpack alignment, so no row-padding
// over-read), and `rgba` is a live borrow that outlives this synchronous upload. `run_passes`
// then needs only the current GL context (no further Rust pointers). All GL names are this
// blit's own, alias no other live object, and nothing is retained past the calls.
unsafe {
// Upload the host RGBA into `src_tex` (an immutable GL_RGBA8 backing must exist first;
// the live path never allocates it — it retargets `src_tex` via EGLImage instead).
@@ -824,9 +924,16 @@ impl EglImporter {
impl Drop for EglImporter {
fn drop(&mut self) {
if !self.gbm.is_null() {
// SAFETY: `self.gbm` is the non-null `gbm_device*` from `gbm_create_device` in `new`
// (checked non-null here), owned exclusively by this `EglImporter` and destroyed exactly
// once (in `Drop`). It is freed BEFORE `render_fd` is closed below — the correct order,
// since the device borrowed that fd for its lifetime.
unsafe { gbm_device_destroy(self.gbm) };
}
if self.render_fd >= 0 {
// SAFETY: `self.render_fd` is the fd `open` returned in `new` (checked `>= 0`), owned
// exclusively by this `EglImporter`; this `close` runs exactly once, after the gbm device
// that borrowed it has been destroyed. No double-close or use-after-close.
unsafe { libc::close(self.render_fd) };
}
}
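The teardown ordering argued in these `Drop` comments relies on two Rust rules: the body of `Drop::drop` runs first, and the struct's fields are dropped only afterwards, in declaration order. The std-only sketch below records that order; `Owner`, `Field`, and the log strings are hypothetical and stand in for the importer, its blit fields, and the gbm/fd calls.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical field with its own destructor (think: a blit owning GL names).
struct Field {
    log: Log,
}

impl Drop for Field {
    fn drop(&mut self) {
        self.log.borrow_mut().push("field dropped");
    }
}

/// Hypothetical owner (think: the importer holding a device and the fd it borrows).
struct Owner {
    log: Log,
    _child: Field,
}

impl Drop for Owner {
    fn drop(&mut self) {
        // Explicit teardown in dependency order: the dependent object first,
        // then the resource it borrowed.
        self.log.borrow_mut().push("destroy device");
        self.log.borrow_mut().push("close fd");
    }
}

/// Drops one `Owner` and returns the order in which teardown steps ran.
fn teardown_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    drop(Owner {
        log: Rc::clone(&log),
        _child: Field { log: Rc::clone(&log) },
    });
    let order = log.borrow().clone();
    order
}

fn main() {
    for step in teardown_order() {
        println!("{step}");
    }
}
```

Because fields outlive the owner's `drop` body, anything the fields need in their own destructors (here, a still-current context in the real code) must not be released inside that body.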
@@ -16,6 +16,9 @@
//! a stream's life). Falls back cleanly: any init/import error disables the importer and the
//! CPU mmap path takes over.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::cuda::{self, DeviceBuffer};
use anyhow::{anyhow, bail, Context as _, Result};
use ash::vk;
@@ -51,12 +54,27 @@ pub struct VkBridge {
dst: Option<DstBuf>,
}
// Confined to the capture thread; moved there once.
// SAFETY: `VkBridge` owns ash Vulkan handles (instance/device/queue/command pool+buffer/fence), a
// CUDA external-memory mapping, and an fd→buffer cache — none `Sync`, and a single queue +
// command buffer must be externally synchronized. It is created inside `EglImporter::import_linear`
// on the dedicated `punktfunk-pipewire` capture thread and every method (`import_linear`, `Drop`)
// runs on that thread; it is never shared via `&` across threads. `Send` asserts only that
// transferring ownership is sound (so the bridge can live inside the `Send` `EglImporter`); the live
// handles are never touched off-thread, and `Sync` is deliberately NOT implied.
unsafe impl Send for VkBridge {}
impl VkBridge {
/// Bring up Vulkan on the NVIDIA GPU with the external-memory extensions.
pub fn new() -> Result<VkBridge> {
// SAFETY: standard ash bring-up — every call is `unsafe` only because ash cannot statically
// verify Vulkan handle/CreateInfo validity. `ash::Entry::load` dlopens a real system
// libvulkan. Each `*CreateInfo`/`AllocateInfo` is built by ash's builders from locals (`app`,
// `exts`, `prio`, `qci`, and the inline infos) that all live for the duration of the
// synchronous `create_*`/`enumerate_*` call that reads them — in particular the
// `enabled_extension_names(&exts)` and `queue_priorities(&prio)` borrows outlive their calls.
// Every handle passed (`instance`, `phys`, `device`, `qf`, `cmd_pool`) was just created and
// checked via `?`/`ok_or_else` in this same function, so no invalid handle is ever used. This
// constructor shares nothing across threads.
unsafe {
let entry = ash::Entry::load().context("load libvulkan")?;
let app = vk::ApplicationInfo::default().api_version(vk::API_VERSION_1_1);
@@ -294,6 +312,19 @@ impl VkBridge {
height: u32,
pool: &cuda::BufferPool,
) -> Result<DeviceBuffer> {
// SAFETY: `fd` is the live dmabuf fd handed in by the caller (borrowed; `import_src` dup's it
// internally and Vulkan owns the dup). `libc::lseek` only queries the fd's size. The unsafe
// `import_src`/`ensure_dst` are called with a valid fd and a checked size. The bounds are
// proven: `import_src` asserts `size >= span` (so the cached `src_size >= span`),
// `copy_size = src_size.min(span)`, and `ensure_dst(copy_size)` makes `dst` at least
// `copy_size` — so the GPU `cmd_copy_buffer` of `copy_size` bytes reads/writes within both
// buffers, and the later CUDA pitched copy reading `[offset, span)` from `dst.cuda.ptr` (=
// `offset + stride*height = span <= copy_size`) stays inside the freshly-copied region. The
// `*Info`/`region`/`cmds`/`submit` are locals that outlive the synchronous calls reading them.
// `cmd`/`queue`/`fence` are this bridge's own handles, used on this single thread only. The
// host-side `wait_for_fences` fully retires the Vulkan copy BEFORE CUDA reads the shared
// memory, so there is no GPU write/read data race. `dst` is an `&self.dst` shared borrow that
// does not alias the `&self.device` calls.
unsafe {
let span = offset as u64 + stride as u64 * height as u64;
if !self.src_cache.contains_key(&fd) {
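The bounds argument in the SAFETY comment above reduces to a small piece of arithmetic, sketched here as a pure function. `linear_span` and `copy_size` are hypothetical helpers written for this illustration; the real code performs the check inside `import_src` and computes `copy_size` inline.

```rust
/// Byte span a pitched image occupies in a linear buffer: `offset + stride * height`.
/// All inputs are `u32` widened to `u64`, so the sum cannot overflow:
/// (2^32 - 1)^2 + (2^32 - 1) < 2^64.
fn linear_span(offset: u32, stride: u32, height: u32) -> u64 {
    offset as u64 + stride as u64 * height as u64
}

/// Returns how many bytes to copy, or an error if the source is too small to
/// contain the whole image span.
fn copy_size(src_size: u64, offset: u32, stride: u32, height: u32) -> Result<u64, String> {
    let span = linear_span(offset, stride, height);
    if src_size < span {
        return Err(format!("source is {src_size} bytes but the image spans {span}"));
    }
    // With `src_size >= span` established, the minimum is exactly `span`, so a
    // later read of `[offset, span)` stays inside the copied region.
    Ok(src_size.min(span))
}

fn main() {
    println!("{:?}", copy_size(1 << 20, 64, 256, 100));
    println!("{:?}", copy_size(100, 0, 256, 100));
}
```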
@@ -347,6 +378,15 @@ impl VkBridge {
impl Drop for VkBridge {
fn drop(&mut self) {
// SAFETY: runs once when the bridge is dropped on its owning capture thread.
// `device_wait_idle` first drains all in-flight GPU work, so no queued command still
// references these objects. Every handle freed (the `src_cache` buffers+memories, the `dst`
// buffer+memory, `fence`, `cmd_pool`, `device`, `instance`) was created by this `VkBridge`
// and owned exclusively by it, so each `destroy_*`/`free_*` runs exactly once with no
// double-free, in dependency order (child objects before `device`, `device` before
// `instance`). `dst.cuda` is dropped after `free_memory`, which is safe because CUDA holds
// its own dup'd OPAQUE_FD reference to the underlying allocation. No other thread touches
// these handles.
unsafe {
let _ = self.device.device_wait_idle();
for (_, s) in self.src_cache.drain() {
+7 -2
@@ -13,6 +13,10 @@
// Scaffold: trait methods and config paths are defined ahead of their backends.
#![allow(dead_code)]
// Unsafe-proof program: every `unsafe {}` / `unsafe impl` in the crate must carry a `// SAFETY:`
// proof of why it is sound. This crate-root deny is the permanent, catch-all gate (it also covers
// any future module); individual files keep their own `#![deny(...)]` as belt-and-suspenders.
#![deny(clippy::undocumented_unsafe_blocks)]
mod audio;
mod capture;
@@ -46,6 +50,7 @@ mod service;
mod session_plan;
mod session_tuning;
mod spike;
mod stats_recorder;
mod vdisplay;
#[cfg(target_os = "windows")]
#[path = "windows/wgc_helper.rs"]
@@ -385,7 +390,7 @@ fn real_main() -> Result<()> {
}
// USER-session WGC helper (Windows two-process secure-desktop design): capture the EXISTING
// SudoVDA via WGC + NVENC, stream AUs on stdout to the SYSTEM host. Spawned by the host
// (CreateProcessAsUser), not run by hand. See docs/windows-secure-desktop.md.
// (CreateProcessAsUser), not run by hand. See design/windows-secure-desktop.md.
#[cfg(target_os = "windows")]
Some("wgc-helper") => {
let get = |flag: &str| {
@@ -700,7 +705,7 @@ SPIKE OPTIONS:
NOTES:
'portal' needs headless Sway + xdg-desktop-portal-wlr running in this session
(see docs/linux-setup.md). 'synthetic' needs no capture session and always runs.
(see design/linux-setup.md). 'synthetic' needs no capture session and always runs.
Encoded AUs are written to a playable file AND (unless --no-loopback) fed through a
punktfunk_core host→client loopback that reassembles and byte-verifies each one.
Both 'serve' and 'punktfunk1-host' advertise the native service over mDNS
+217 -9
@@ -6,7 +6,7 @@
//! The API is versioned under `/api/v1` and described by an OpenAPI 3.1 document generated
//! at compile time with `utoipa` — `punktfunk-host openapi` prints it for client codegen, the
//! running server serves it at `/api/v1/openapi.json` plus interactive docs at `/api/docs`,
//! and a copy is checked in at `docs/api/openapi.json` (a test fails if it drifts, like the
//! and a copy is checked in at `api/openapi.json` (a test fails if it drifts, like the
//! cbindgen header).
//!
//! Security: binds loopback by default, serves HTTPS with the host's identity cert, and requires
@@ -20,6 +20,7 @@ use crate::gamestream::{
tls::{serve_https, PeerCertFingerprint},
AppState, APP_VERSION, AUDIO_PORT, CONTROL_PORT, GFE_VERSION, RTSP_PORT, VIDEO_PORT,
};
use crate::stats_recorder::{Capture, CaptureMeta, StatsStatus};
use anyhow::{Context, Result};
use axum::{
extract::{Path, Request, State},
@@ -66,6 +67,9 @@ struct MgmtState {
/// Native (punktfunk/1) pairing — shared with the QUIC host when the unified `serve --native`
/// runs it. `None` ⇒ GameStream-only host (the native endpoints report `enabled: false`).
native: Option<Arc<crate::native_pairing::NativePairing>>,
/// Shared streaming-stats recorder — the same handle the streaming loops emit into, so an
/// operator can arm/stop a capture here and review/list/delete saved recordings.
stats: Arc<crate::stats_recorder::StatsRecorder>,
token: Option<String>,
/// The port we serve on, echoed in [`PortMap`] so a client can persist a full endpoint map.
port: u16,
@@ -77,6 +81,7 @@ pub async fn run(
state: Arc<AppState>,
opts: Options,
native: Option<Arc<crate::native_pairing::NativePairing>>,
stats: Arc<crate::stats_recorder::StatsRecorder>,
) -> Result<()> {
// The mgmt API is HTTPS + token-authenticated ALWAYS (even on loopback): `parse_serve`
// guarantees a token (CLI flag / env / persisted ~/.config/punktfunk/mgmt-token / generated).
@@ -100,7 +105,7 @@ pub async fn run(
auth = "mTLS (paired cert) or bearer (required)",
"management API listening over HTTPS (docs at /api/docs, spec at /api/v1/openapi.json)"
);
let app = app(state, Some(token), opts.bind.port(), native);
let app = app(state, Some(token), opts.bind.port(), native, stats);
serve_https(opts.bind, app, tls).await
}
@@ -110,10 +115,12 @@ fn app(
token: Option<String>,
port: u16,
native: Option<Arc<crate::native_pairing::NativePairing>>,
stats: Arc<crate::stats_recorder::StatsRecorder>,
) -> Router {
let shared = Arc::new(MgmtState {
app: state,
native,
stats,
token,
port,
});
@@ -158,13 +165,19 @@ fn api_router_parts() -> (Router<Arc<MgmtState>>, utoipa::openapi::OpenApi) {
.routes(routes!(request_idr))
.routes(routes!(get_library))
.routes(routes!(create_custom_game))
.routes(routes!(update_custom_game, delete_custom_game)),
.routes(routes!(update_custom_game, delete_custom_game))
.routes(routes!(stats_capture_start))
.routes(routes!(stats_capture_stop))
.routes(routes!(stats_capture_status))
.routes(routes!(stats_capture_live))
.routes(routes!(stats_recordings_list))
.routes(routes!(stats_recording_get, stats_recording_delete)),
)
.split_for_parts()
}
/// The OpenAPI document as pretty JSON — what `punktfunk-host openapi` prints and what is
/// checked in at `docs/api/openapi.json` for client codegen.
/// checked in at `api/openapi.json` for client codegen.
pub fn openapi_json() -> String {
let (_, api) = api_router_parts();
let mut json = api.to_pretty_json().expect("serialize OpenAPI document");
@@ -190,6 +203,7 @@ pub fn openapi_json() -> String {
(name = "native", description = "Native punktfunk/1 pairing: arm a window, display the host PIN, manage paired devices"),
(name = "session", description = "Active streaming session control"),
(name = "library", description = "Game library: installed-store titles (Steam) plus user-curated custom entries"),
(name = "stats", description = "Streaming performance-stats capture: arm/stop a recording, read the live + saved time-series for graphing"),
)
)]
struct ApiDoc;
@@ -1218,6 +1232,185 @@ async fn delete_custom_game(Path(id): Path<String>) -> Response {
}
}
// ---------------------------------------------------------------------------------------
// Streaming stats capture (design/stats-capture-plan.md §2)
// ---------------------------------------------------------------------------------------
/// Start a stats capture
///
/// Arms a new performance-stats capture. Idempotent: if a capture is already running this returns
/// the current status unchanged. While armed, the streaming loops emit aggregated samples (~ every
/// 12 s) into the in-progress capture, readable live via `GET /stats/capture/live`.
#[utoipa::path(
post,
path = "/stats/capture/start",
tag = "stats",
operation_id = "statsCaptureStart",
responses(
(status = OK, description = "Capture armed (or already running)", body = StatsStatus),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
)
)]
async fn stats_capture_start(State(st): State<Arc<MgmtState>>) -> Json<StatsStatus> {
let status = st.stats.start();
tracing::info!(
started_unix_ms = status.started_unix_ms,
"management API: stats capture armed"
);
Json(status)
}
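The commit message describes the recorder as a runtime `AtomicBool` gate in front of a bounded ring with poison-resilient locks, and the endpoint above as idempotent. A minimal std-only sketch of that combination follows. `Recorder` and its methods are hypothetical and use plain `u64` samples; this is not the actual `StatsRecorder` API.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical recorder: an atomic gate plus a bounded ring of samples.
struct Recorder {
    armed: AtomicBool,
    ring: Mutex<VecDeque<u64>>,
    capacity: usize,
}

impl Recorder {
    fn new(capacity: usize) -> Self {
        Recorder {
            armed: AtomicBool::new(false),
            ring: Mutex::new(VecDeque::new()),
            capacity: capacity.max(1), // a zero-sized ring would never hold a sample
        }
    }

    /// Poison-resilient lock: keep using the data even if another thread panicked.
    fn lock(&self) -> MutexGuard<'_, VecDeque<u64>> {
        self.ring.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Idempotent start. Returns true only if this call armed the recorder;
    /// a second call leaves the running capture untouched.
    fn start(&self) -> bool {
        let was_armed = self.armed.swap(true, Ordering::AcqRel);
        if !was_armed {
            self.lock().clear();
        }
        !was_armed
    }

    /// Hot path: a single atomic load when disarmed, otherwise push into the
    /// ring and evict the oldest sample once it is full.
    fn record(&self, sample: u64) {
        if !self.armed.load(Ordering::Acquire) {
            return;
        }
        let mut ring = self.lock();
        if ring.len() == self.capacity {
            ring.pop_front();
        }
        ring.push_back(sample);
    }

    /// Disarms and returns the samples, or `None` if nothing was recording.
    fn stop(&self) -> Option<Vec<u64>> {
        if !self.armed.swap(false, Ordering::AcqRel) {
            return None;
        }
        Some(self.lock().drain(..).collect())
    }
}

fn main() {
    let recorder = Recorder::new(2);
    recorder.start();
    for sample in [1, 2, 3] {
        recorder.record(sample);
    }
    println!("{:?}", recorder.stop());
}
```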
/// Stop the stats capture
///
/// Disarms the in-progress capture and writes it to disk atomically, returning its summary. If
/// nothing was recording, returns `204 No Content`.
#[utoipa::path(
post,
path = "/stats/capture/stop",
tag = "stats",
operation_id = "statsCaptureStop",
responses(
(status = OK, description = "Capture stopped and saved", body = CaptureMeta),
(status = NO_CONTENT, description = "Nothing was recording"),
(status = INTERNAL_SERVER_ERROR, description = "Could not write the recording to disk", body = ApiError),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
)
)]
async fn stats_capture_stop(State(st): State<Arc<MgmtState>>) -> Response {
match st.stats.stop() {
Ok(Some(meta)) => {
tracing::info!(id = %meta.id, samples = meta.sample_count, "management API: stats capture saved");
(StatusCode::OK, Json(meta)).into_response()
}
Ok(None) => StatusCode::NO_CONTENT.into_response(),
Err(e) => api_error(
StatusCode::INTERNAL_SERVER_ERROR,
&format!("could not save capture: {e}"),
),
}
}
/// Stats capture status
///
/// Whether a capture is armed, its sample count, and start time. Poll this (e.g. every 2 s) to
/// drive the capture-control UI.
#[utoipa::path(
get,
path = "/stats/capture/status",
tag = "stats",
operation_id = "statsCaptureStatus",
responses(
(status = OK, description = "In-progress capture status (idle when not armed)", body = StatsStatus),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
)
)]
async fn stats_capture_status(State(st): State<Arc<MgmtState>>) -> Json<StatsStatus> {
Json(st.stats.status())
}
/// Live in-progress capture
///
/// The full sample time-series of the capture currently recording, for live graphing. `404` when
/// nothing is armed.
#[utoipa::path(
get,
path = "/stats/capture/live",
tag = "stats",
operation_id = "statsCaptureLive",
responses(
(status = OK, description = "The in-progress capture (meta + samples so far)", body = Capture),
(status = NOT_FOUND, description = "No capture is currently recording", body = ApiError),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
)
)]
async fn stats_capture_live(State(st): State<Arc<MgmtState>>) -> Response {
match st.stats.live_snapshot() {
Some(capture) => Json(capture).into_response(),
None => api_error(StatusCode::NOT_FOUND, "no capture is currently recording"),
}
}
/// List saved recordings
///
/// Every saved capture's summary (the `meta` head only — not the sample body), newest first.
#[utoipa::path(
get,
path = "/stats/recordings",
tag = "stats",
operation_id = "statsRecordingsList",
responses(
(status = OK, description = "Saved capture summaries, newest first", body = [CaptureMeta]),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
)
)]
async fn stats_recordings_list(State(st): State<Arc<MgmtState>>) -> Json<Vec<CaptureMeta>> {
Json(st.stats.list())
}
/// Get a saved recording
///
/// The full capture (meta + samples) for `id`, for graphing or download.
#[utoipa::path(
get,
path = "/stats/recordings/{id}",
tag = "stats",
operation_id = "statsRecordingGet",
params(("id" = String, Path, description = "The recording id (its filename stem)")),
responses(
(status = OK, description = "The full capture", body = Capture),
(status = NOT_FOUND, description = "No recording with that id", body = ApiError),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
(status = INTERNAL_SERVER_ERROR, description = "The recording file is unreadable", body = ApiError),
)
)]
async fn stats_recording_get(State(st): State<Arc<MgmtState>>, Path(id): Path<String>) -> Response {
match st.stats.load(&id) {
Ok(capture) => Json(capture).into_response(),
Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
api_error(StatusCode::NOT_FOUND, "no recording with that id")
}
Err(e) => api_error(
StatusCode::INTERNAL_SERVER_ERROR,
&format!("could not read recording: {e}"),
),
}
}
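// Illustration only (not part of this commit): the `id` path parameter is described as the
// recording's filename stem, and the commit message mentions path-traversal-safe ids. The real
// check lives in stats_recorder.rs and is not shown in this diff, so the function name, the
// allowed character set and the length cap below are all assumptions. A minimal sketch of one
// way such a check could look:

```rust
/// Hypothetical validator: accept only ids that can be a plain filename stem.
/// The 128-byte cap and the `[A-Za-z0-9_-]` alphabet are assumptions for this sketch.
fn is_safe_recording_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 128
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

// Rejecting separators and dots up front means the id can be joined onto the captures directory
// without ever naming a parent directory or a different extension.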
/// Delete a saved recording
///
/// Removes the recording `id` from disk. `404` if there is no such recording.
#[utoipa::path(
delete,
path = "/stats/recordings/{id}",
tag = "stats",
operation_id = "statsRecordingDelete",
params(("id" = String, Path, description = "The recording id (its filename stem)")),
responses(
(status = NO_CONTENT, description = "Recording deleted"),
(status = NOT_FOUND, description = "No recording with that id", body = ApiError),
(status = UNAUTHORIZED, description = "Missing or invalid bearer token", body = ApiError),
(status = INTERNAL_SERVER_ERROR, description = "Could not delete the recording", body = ApiError),
)
)]
async fn stats_recording_delete(
State(st): State<Arc<MgmtState>>,
Path(id): Path<String>,
) -> Response {
match st.stats.delete(&id) {
Ok(()) => {
tracing::info!(id, "management API: recording deleted");
StatusCode::NO_CONTENT.into_response()
}
Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
api_error(StatusCode::NOT_FOUND, "no recording with that id")
}
Err(e) => api_error(
StatusCode::INTERNAL_SERVER_ERROR,
&format!("could not delete recording: {e}"),
),
}
}
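// Illustration only (not part of this commit): both recording handlers above translate an
// `std::io::Result` into an HTTP status the same way. The helper below is hypothetical and
// returns bare numeric codes instead of axum's `StatusCode`, purely to make the mapping easy to
// see in isolation:

```rust
use std::io;

/// Hypothetical helper mirroring the handlers' match arms:
/// success is 204, a missing file is 404, any other I/O failure is 500.
fn status_for_delete(result: &io::Result<()>) -> u16 {
    match result {
        Ok(()) => 204,
        Err(e) if e.kind() == io::ErrorKind::NotFound => 404,
        Err(_) => 500,
    }
}
```

// Keeping `NotFound` separate from other I/O errors lets a client tell "no such recording" apart
// from a host-side disk problem.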
// ---------------------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------------------
@@ -1231,6 +1424,15 @@ mod tests {
use std::net::{IpAddr, Ipv4Addr};
use tower::ServiceExt;
/// A throwaway stats recorder rooted in a unique temp dir (never touches the real config dir).
fn test_stats() -> Arc<crate::stats_recorder::StatsRecorder> {
crate::stats_recorder::StatsRecorder::new(std::env::temp_dir().join(format!(
"pf-mgmt-stats-{}-{:p}",
std::process::id(),
&0u8 as *const u8
)))
}
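// Illustration only (not part of this commit): if a distinct directory per call is wanted, one
// option is a process-wide counter in the directory name. The function and static names below
// are hypothetical, and this sketch only builds the path, it does not create the directory:

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical process-wide counter; each call takes the next value.
static NEXT_STATS_DIR: AtomicU64 = AtomicU64::new(0);

/// Build a temp-dir path that differs on every call within one process.
fn unique_stats_dir() -> PathBuf {
    let n = NEXT_STATS_DIR.fetch_add(1, Ordering::Relaxed);
    std::env::temp_dir().join(format!("pf-mgmt-stats-{}-{}", std::process::id(), n))
}
```

// The process id keeps parallel test binaries apart; the counter keeps calls within one binary
// apart.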
fn test_state() -> Arc<AppState> {
let host = Host {
hostname: "test-host".into(),
@@ -1240,18 +1442,20 @@ mod tests {
https_port: HTTPS_PORT,
};
let identity = ServerIdentity::ephemeral().expect("ephemeral identity");
Arc::new(AppState::new(host, identity))
Arc::new(AppState::new(host, identity, test_stats()))
}
// The mgmt API now always requires auth, so the router always has a token. A test that passes
// `None` gets the default "test-secret" (and `send` auto-attaches the matching bearer); a test
// that passes an explicit token exercises a mismatch (e.g. `bearer_token_is_enforced`).
fn test_app(state: Arc<AppState>, token: Option<&str>) -> Router {
let stats = state.stats.clone();
app(
state,
Some(token.unwrap_or("test-secret").to_string()),
DEFAULT_PORT,
None,
stats,
)
}
@@ -1261,11 +1465,13 @@ mod tests {
) -> Router {
// Auth required always; the paired-cert tests inject a fingerprint (cert branch wins), the
// rest authenticate via the `send`-attached default bearer.
let stats = state.stats.clone();
app(
state,
Some("test-secret".to_string()),
DEFAULT_PORT,
Some(np),
stats,
)
}
@@ -1580,7 +1786,9 @@ mod tests {
bind: "127.0.0.1:0".parse().unwrap(),
token: Some(" ".into()),
};
let err = run(test_state(), opts, None).await.unwrap_err();
let err = run(test_state(), opts, None, test_stats())
.await
.unwrap_err();
assert!(err.to_string().contains("no token"), "{err}");
}
@@ -1663,14 +1871,14 @@ mod tests {
serde_json::json!([{}])
);
let checked_in = include_str!("../../../docs/api/openapi.json");
let checked_in = include_str!("../../../api/openapi.json");
// Compare content, not line-ending style: the generated `json` is LF (serde_json), but git
// may check the file out CRLF on Windows.
assert_eq!(
json.trim().replace('\r', ""),
checked_in.trim().replace('\r', ""),
"docs/api/openapi.json is stale — regenerate with: \
cargo run -p punktfunk-host -- openapi > docs/api/openapi.json"
"api/openapi.json is stale — regenerate with: \
cargo run -p punktfunk-host -- openapi > api/openapi.json"
);
}
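// Illustration only (not part of this commit): the staleness check above compares content rather
// than line-ending style. The same normalisation, pulled out into a hypothetical helper:

```rust
/// Hypothetical helper: equal once surrounding whitespace and every `\r` are dropped,
/// so an LF-generated document matches a CRLF checkout of the same content.
fn same_ignoring_line_endings(generated: &str, checked_in: &str) -> bool {
    generated.trim().replace('\r', "") == checked_in.trim().replace('\r', "")
}
```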
+355 -42
@@ -22,6 +22,9 @@
//! Trust: the host serves with its persistent identity (`~/.config/punktfunk/cert.pem`, shared
//! with GameStream pairing) and logs the SHA-256 fingerprint clients pin.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{anyhow, Context, Result};
use punktfunk_core::config::{CompositorPref, FecConfig, FecScheme, GamepadPref, Role};
use punktfunk_core::input::{InputEvent, InputKind};
@@ -76,6 +79,9 @@ pub struct Punktfunk1Options {
/// The native (punktfunk/1) trust store + on-demand arming PIN, shared with the management API.
use crate::native_pairing::NativePairing;
/// The shared streaming-stats recorder (web-console capture/graph), shared with the management API
/// and the GameStream loop; threaded into each session's `SessionContext`.
use crate::stats_recorder::StatsRecorder;
/// Minimum spacing between accepted pairing ceremonies (bounds online PIN guessing — with
/// SPAKE2 an attacker already gets only one guess per ceremony; this caps the rate).
@@ -111,7 +117,11 @@ pub fn run(opts: Punktfunk1Options) -> Result<()> {
opts.pairing_pin.clone(),
opts.allow_pairing || opts.require_pairing,
)?);
rt.block_on(serve(opts, np))
// Standalone `punktfunk1-host` has no mgmt API to arm capture, so this recorder stays disarmed
// (harmless — the loops' `is_armed()` gate is always false). The unified `serve` shares one
// recorder across mgmt + both streaming paths instead.
let stats = StatsRecorder::new(crate::stats_recorder::default_dir());
rt.block_on(serve(opts, np, stats))
}
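// Illustration only (not part of this commit): the comment above relies on the recorder being a
// runtime gate that stays false until something arms it. `StatsRecorder` itself is not shown in
// this diff, so the type below is a hypothetical stand-in that keeps only the flag:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the shared recorder: just the armed flag.
struct RecorderSketch {
    armed: AtomicBool,
}

impl RecorderSketch {
    /// A new recorder starts disarmed.
    fn new() -> Arc<Self> {
        Arc::new(Self { armed: AtomicBool::new(false) })
    }

    /// What a management-API start handler would call.
    fn arm(&self) {
        self.armed.store(true, Ordering::Relaxed);
    }

    /// What a streaming loop would read once per frame.
    fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }
}
```

// Because every holder clones the same `Arc`, arming through one handle is visible through all of
// them. A recorder that nobody arms costs one relaxed load per frame.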
fn fingerprint_hex(fp: &[u8; 32]) -> String {
@@ -154,7 +164,11 @@ pub(crate) fn native_serve_opts(cfg: &NativeServe) -> Punktfunk1Options {
}
}
pub(crate) async fn serve(opts: Punktfunk1Options, np: Arc<NativePairing>) -> Result<()> {
pub(crate) async fn serve(
opts: Punktfunk1Options,
np: Arc<NativePairing>,
stats: Arc<StatsRecorder>,
) -> Result<()> {
let identity = crate::gamestream::cert::ServerIdentity::load_or_create()
.context("load host identity (~/.config/punktfunk)")?;
let fingerprint = endpoint::fingerprint_of_pem(&identity.cert_pem)
@@ -209,6 +223,10 @@ pub(crate) async fn serve(opts: Punktfunk1Options, np: Arc<NativePairing>) -> Re
// restores the box's autologin gaming session on idle, not per-disconnect — see
// `vdisplay::restore_managed_session`). Held for serve()'s lifetime; dropping it stops it.
let _restore_worker = crate::vdisplay::start_restore_worker();
// Host-lifetime cover-art warmer: fetches + caches GOG/Xbox cover art (no-auth api.gog.com /
// displaycatalog) off the hot path so `all_games()` (the library list + launch resolve) never
// blocks on the network. A no-op on a host whose stores all carry their own art.
let _art_warmer = crate::library::start_art_warmer();
// Pairing state (arming PIN + trust store) is shared with the management API. If it was armed
// at startup (the CLI flags), surface the PIN the headless operator reads from the log; the
// web console arms it on demand instead (a fresh, time-limited PIN).
@@ -269,6 +287,7 @@ pub(crate) async fn serve(opts: Punktfunk1Options, np: Arc<NativePairing>) -> Re
let audio_cap = audio_cap.clone();
let np = np.clone();
let last_pairing = last_pairing.clone();
let stats = stats.clone();
let inj_tx = injector.sender();
let mic_tx = mic_service.sender();
sessions.spawn(async move {
@@ -282,6 +301,7 @@ pub(crate) async fn serve(opts: Punktfunk1Options, np: Arc<NativePairing>) -> Re
&fingerprint,
&np,
&last_pairing,
stats,
)
.await
{
@@ -472,6 +492,7 @@ async fn serve_session(
host_fp: &[u8; 32],
np: &NativePairing,
last_pairing: &std::sync::Mutex<Option<std::time::Instant>>,
stats: Arc<StatsRecorder>,
) -> Result<()> {
let peer = conn.remote_address();
@@ -928,6 +949,12 @@ async fn serve_session(
let stop_stream = stop.clone();
let fec_target_dp = fec_target.clone(); // data-plane handle to the adaptive-FEC target
let conn_stream = conn.clone(); // for sending the source's real HDR metadata (0xCE) mid-stream
let stats_dp = stats; // data-plane handle to the shared stats recorder
// Short label for web-console stats captures: the client's cert-fingerprint prefix, else its
// peer IP (no fingerprint = anonymous TOFU/--open client).
let client_label = endpoint::peer_fingerprint(&conn)
.map(|fp| fingerprint_hex(&fp)[..12].to_string())
.unwrap_or_else(|| conn.remote_address().ip().to_string());
let result: Result<()> = async {
tokio::task::spawn_blocking(move || -> Result<()> {
// Wait briefly for the client to hole-punch our data port, then stream to its OBSERVED
@@ -982,6 +1009,8 @@ async fn serve_session(
probe_result_tx,
fec_target: fec_target_dp,
conn: conn_stream,
stats: stats_dp,
client_label,
#[cfg(target_os = "windows")]
launch: launch_for_dp,
})
@@ -1940,6 +1969,21 @@ struct FrameMsg {
deadline: std::time::Instant,
/// capture→encoded latency (µs), measured on the encode thread, carried for the perf histogram.
encode_us: u32,
/// Per-stage µs splits, measured on the capture/encode thread (0 when neither `PUNKTFUNK_PERF`
/// nor a stats capture is armed). The send thread accumulates them for the web-console sample:
/// `cap_us` = `try_latest` (ring read + colour convert), `submit_us` = NVENC `encode_picture`
/// launch, `wait_us` = `lock_bitstream` (the scheduling wait + ASIC encode = the "encode" stage).
cap_us: u32,
submit_us: u32,
wait_us: u32,
/// This frame is a re-encoded hold (the source had no fresh frame): a source-starvation signal
/// the send thread folds into `repeat_fps`.
repeat: bool,
/// Whether the per-stage splits (`cap_us`/`submit_us`/`wait_us`) were actually measured at
/// capture time (`perf` was on or a stats capture was armed). The send thread trusts this
/// instead of re-reading `is_armed()`, so a capture that arms while frames are already in flight
/// doesn't fold their zeroed splits into the first window's percentiles.
was_measured: bool,
}
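// Illustration only (not part of this commit): how `was_measured` keeps zeroed splits out of a
// window. The `percentile` helper used by the send thread is not shown in this diff, so
// `percentile_sketch` is an assumed shape (sort in place, rounded index, 0 for an empty window),
// and `SplitSketch` is a hypothetical cut-down `FrameMsg`:

```rust
/// Hypothetical stand-in for the split fields carried on each frame message.
struct SplitSketch {
    cap_us: u32,
    was_measured: bool,
}

/// Fold a frame into the window only if its split was really measured.
fn fold_cap(window: &mut Vec<u32>, frame: &SplitSketch) {
    if frame.was_measured {
        window.push(frame.cap_us);
    }
}

/// Assumed percentile shape: sorts the window, then picks the rounded index for `p`.
/// An empty window yields 0.
fn percentile_sketch(v: &mut [u32], p: f64) -> u32 {
    if v.is_empty() {
        return 0;
    }
    v.sort_unstable();
    let idx = ((v.len() - 1) as f64 * p).round() as usize;
    v[idx.min(v.len() - 1)]
}
```

// An unmeasured frame leaves the window empty instead of adding a 0 that would drag p50 down.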
/// The dedicated send thread: it owns the whole [`Session`] (so no socket clone or shared stats are
@@ -1961,6 +2005,11 @@ pub(crate) fn boost_thread_priority(critical: bool) {
// capture/encode (critical) and send (non-critical).
crate::session_tuning::on_hot_thread();
#[cfg(target_os = "windows")]
// SAFETY: `GetCurrentThread()` returns the constant pseudo-handle for the calling thread — always
// valid, thread-local in meaning, and never closed (no leak/double-close). `SetThreadPriority`
// takes that handle plus a `THREAD_PRIORITY_*` value the windows crate defines (HIGHEST or
// ABOVE_NORMAL here); it only reprioritizes this OS thread, borrows no Rust memory, and its
// `Result` is matched (a failure is logged, never UB). No pointers, lifetimes, or aliasing.
unsafe {
use windows::Win32::System::Threading::{
GetCurrentThread, SetThreadPriority, THREAD_PRIORITY_ABOVE_NORMAL,
@@ -1988,6 +2037,10 @@ pub(crate) fn boost_thread_priority(critical: bool) {
// realtime CPU class can preempt the compositor AND the game's own render thread, adding the
// very frame-time we refuse to add (opt-in only — see PUNKTFUNK_SCHED_RR).
let nice = if critical { -10 } else { -5 };
// SAFETY: `setpriority` takes three by-value integers and no pointers, so there is nothing to
// alias or outlive. `PRIO_PROCESS` with `who == 0` targets the calling task on Linux and
// `nice` is in range; the call only adjusts this thread's scheduling nice value and returns an
// `int` we inspect. No memory is touched.
let rc = unsafe { libc::setpriority(libc::PRIO_PROCESS, 0, nice) };
if rc == 0 {
tracing::debug!(critical, nice, "thread nice raised");
@@ -2004,6 +2057,19 @@ pub(crate) fn boost_thread_priority(critical: bool) {
}
}
/// Everything the send thread needs to emit web-console stats samples at its 2 s aggregation
/// boundary: the shared recorder (whose `is_armed()` gates emission) plus the negotiated
/// mode/codec/client to seed the capture's `CaptureMeta` on the first armed registration.
struct SendStats {
rec: Arc<StatsRecorder>,
width: u32,
height: u32,
fps: u32,
codec: &'static str,
client: String,
bitrate_kbps: u32,
}
#[allow(clippy::too_many_arguments)]
fn send_loop(
mut session: Session,
@@ -2014,6 +2080,7 @@ fn send_loop(
perf: bool,
burst_cap: usize,
fec_target: Arc<AtomicU8>,
stats: SendStats,
) {
boost_thread_priority(false); // transmit thread: above-normal (Apollo's encoder-thread level)
let mut last_perf = std::time::Instant::now();
@@ -2022,6 +2089,16 @@ fn send_loop(
let mut encode_us: Vec<u32> = Vec::new();
let mut pace_us: Vec<u32> = Vec::new();
let (mut paced_frames, mut immediate_frames) = (0u64, 0u64);
// Web-console stats accumulation (active when `perf` OR the recorder is armed): the per-stage
// split carried on each FrameMsg, the new-vs-repeat frame split, the cached registration id, and
// the previous window's loss snapshot for delta computation.
let mut sid: Option<u32> = None;
let (mut cap_v, mut submit_v, mut wait_v): (Vec<u32>, Vec<u32>, Vec<u32>) =
(Vec::new(), Vec::new(), Vec::new());
let (mut new_frames, mut repeat_frames) = (0u64, 0u64);
let mut last_frames_dropped = 0u64;
let mut last_packets_dropped = 0u64;
let mut last_fec_recovered = 0u64;
loop {
if stop.load(Ordering::SeqCst) {
break;
@@ -2042,9 +2119,24 @@ fn send_loop(
burst_cap,
) {
Ok(stat) => {
if perf {
if perf || stats.rec.is_armed() {
// `encode_us`/`pace_us`/fps are valid for every frame (always measured),
// including the Windows relay + tail-drain frames. The cap/submit/wait splits
// are only real when the frame was measured at capture time — a frame captured
// before this capture armed carries zeroed splits, so skip those (an empty
// window → `percentile()` returns 0) rather than pull the percentiles down.
encode_us.push(msg.encode_us);
pace_us.push(stat.spread_us);
if msg.was_measured {
cap_v.push(msg.cap_us);
submit_v.push(msg.submit_us);
wait_v.push(msg.wait_us);
}
if msg.repeat {
repeat_frames += 1;
} else {
new_frames += 1;
}
if stat.paced {
paced_frames += 1;
} else {
@@ -2060,31 +2152,91 @@ fn send_loop(
Err(std::sync::mpsc::RecvTimeoutError::Timeout) => {}
Err(std::sync::mpsc::RecvTimeoutError::Disconnected) => break, // encode thread done
}
if perf && last_perf.elapsed() >= std::time::Duration::from_secs(2) {
if last_perf.elapsed() >= std::time::Duration::from_secs(2) {
let s = session.stats();
let secs = last_perf.elapsed().as_secs_f64();
// Attempted (sealed) transmit rate; `send_dropped` is what didn't reach the wire.
let tx_mbps = (s.bytes_sent - last_bytes) as f64 * 8.0 / secs / 1_000_000.0;
tracing::info!(
tx_mbps = format!("{tx_mbps:.0}"),
send_dropped = s.packets_send_dropped - last_send_dropped,
send_dropped_total = s.packets_send_dropped,
encode_us_p50 = percentile(&mut encode_us, 0.50),
encode_us_p99 = percentile(&mut encode_us, 0.99),
pace_us_p50 = percentile(&mut pace_us, 0.50),
pace_us_p99 = percentile(&mut pace_us, 0.99),
pace_us_max = pace_us.last().copied().unwrap_or(0),
immediate_frames,
paced_frames,
"perf"
);
if perf {
tracing::info!(
tx_mbps = format!("{tx_mbps:.0}"),
send_dropped = s.packets_send_dropped - last_send_dropped,
send_dropped_total = s.packets_send_dropped,
encode_us_p50 = percentile(&mut encode_us, 0.50),
encode_us_p99 = percentile(&mut encode_us, 0.99),
pace_us_p50 = percentile(&mut pace_us, 0.50),
pace_us_p99 = percentile(&mut pace_us, 0.99),
pace_us_max = pace_us.last().copied().unwrap_or(0),
immediate_frames,
paced_frames,
"perf"
);
}
// Web-console capture: this thread owns `session.stats()`, so it emits the COMPLETE
// sample — the cap/submit/encode split carried over from the capture thread plus this
// window's pacing/goodput/loss. Loss fields are deltas vs the previous window's snapshot.
if stats.rec.is_armed() {
let session_id = *sid.get_or_insert_with(|| {
stats.rec.register_session(
"native",
stats.width,
stats.height,
stats.fps,
stats.codec,
&stats.client,
)
});
let sample = crate::stats_recorder::StatsSample {
t_ms: 0, // stamped by push_sample from the capture's monotonic start
session_id,
stages: vec![
crate::stats_recorder::StageTiming {
name: "capture".into(),
p50_us: percentile(&mut cap_v, 0.50) as f32,
p99_us: percentile(&mut cap_v, 0.99) as f32,
},
crate::stats_recorder::StageTiming {
name: "submit".into(),
p50_us: percentile(&mut submit_v, 0.50) as f32,
p99_us: percentile(&mut submit_v, 0.99) as f32,
},
crate::stats_recorder::StageTiming {
name: "encode".into(),
p50_us: percentile(&mut wait_v, 0.50) as f32,
p99_us: percentile(&mut wait_v, 0.99) as f32,
},
crate::stats_recorder::StageTiming {
name: "send".into(),
p50_us: percentile(&mut pace_us, 0.50) as f32,
p99_us: percentile(&mut pace_us, 0.99) as f32,
},
],
fps: (new_frames as f64 / secs) as f32,
repeat_fps: (repeat_frames as f64 / secs) as f32,
mbps: tx_mbps as f32,
bitrate_kbps: stats.bitrate_kbps,
frames_dropped: s.frames_dropped.saturating_sub(last_frames_dropped) as u32,
packets_dropped: s.packets_dropped.saturating_sub(last_packets_dropped) as u32,
send_dropped: s.packets_send_dropped.saturating_sub(last_send_dropped) as u32,
fec_recovered: s.fec_recovered_shards.saturating_sub(last_fec_recovered) as u32,
};
stats.rec.push_sample(session_id, sample);
}
last_perf = std::time::Instant::now();
last_bytes = s.bytes_sent;
last_send_dropped = s.packets_send_dropped;
last_frames_dropped = s.frames_dropped;
last_packets_dropped = s.packets_dropped;
last_fec_recovered = s.fec_recovered_shards;
encode_us.clear();
pace_us.clear();
cap_v.clear();
submit_v.clear();
wait_v.clear();
paced_frames = 0;
immediate_frames = 0;
new_frames = 0;
repeat_frames = 0;
}
}
}
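// Illustration only (not part of this commit): the per-window loss and goodput fields above are
// deltas of cumulative counters against the previous window's snapshot. The struct and function
// names below are hypothetical; only the arithmetic follows the loop:

```rust
/// Hypothetical cumulative counters, standing in for what `session.stats()` returns.
#[derive(Clone, Copy)]
struct CountersSketch {
    bytes_sent: u64,
    packets_send_dropped: u64,
}

/// One window's derived values.
struct WindowSketch {
    mbps: f64,
    send_dropped: u32,
}

/// Turn two cumulative snapshots into per-window values over `secs` seconds.
/// `saturating_sub` keeps a counter that went backwards from underflowing.
fn window_delta(now: CountersSketch, prev: CountersSketch, secs: f64) -> WindowSketch {
    let bytes = now.bytes_sent.saturating_sub(prev.bytes_sent);
    WindowSketch {
        mbps: bytes as f64 * 8.0 / secs / 1_000_000.0,
        send_dropped: now.packets_send_dropped.saturating_sub(prev.packets_send_dropped) as u32,
    }
}
```

// After each window the caller stores `now` as the next `prev`, which is what the `last_*`
// variables in the loop do.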
@@ -2185,6 +2337,13 @@ struct SessionContext {
fec_target: Arc<AtomicU8>,
/// The QUIC control connection (carries host→client 0xCE source-HDR metadata mid-stream).
conn: quinn::Connection,
/// Shared streaming-stats recorder. The capture loop reads `is_armed()` per frame to decide
/// whether to measure the per-stage split; the send thread builds + pushes the aggregated
/// `StatsSample` at its 2 s boundary.
stats: Arc<StatsRecorder>,
/// Short client label (cert-fingerprint prefix, else peer IP) seeded into the capture meta on
/// the first armed stats registration.
client_label: String,
/// Windows: the store-qualified library id to launch into the interactive user session once
/// capture is live (no gamescope nesting on Windows). `None` = no launch requested. Linux uses the
/// gamescope `PUNKTFUNK_GAMESCOPE_APP` path resolved at handshake, so this field is Windows-only.
@@ -2205,7 +2364,7 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// Windows two-process secure-desktop path: when the host runs as SYSTEM (required for the secure
// desktop + SendInput), WGC can't activate in-process, so we capture the normal desktop via a
// helper spawned in the user session and relay its AUs. (Single-process WGC/DDA is used as the
// user, and stays the path on Linux.) See docs/windows-secure-desktop.md.
// user, and stays the path on Linux.) See design/windows-secure-desktop.md.
#[cfg(target_os = "windows")]
if plan.topology == crate::session_plan::SessionTopology::TwoProcessRelay {
return virtual_stream_relay(ctx);
@@ -2226,6 +2385,8 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
probe_result_tx,
fec_target,
conn,
stats,
client_label,
#[cfg(target_os = "windows")]
launch,
} = ctx;
@@ -2294,6 +2455,17 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// The bounded channel applies backpressure (the encode thread blocks if the send falls behind,
// so frames slow down rather than a dropped frame freezing the infinite-GOP stream).
let (frame_tx, frame_rx) = std::sync::mpsc::sync_channel::<FrameMsg>(3);
// The send thread emits the web-console stats sample (it owns `session.stats()`); clone the
// recorder so the capture loop keeps its own handle for the per-frame `is_armed()` gate.
let send_stats = SendStats {
rec: stats.clone(),
width: mode.width,
height: mode.height,
fps: mode.refresh_hz,
codec: "hevc",
client: client_label,
bitrate_kbps,
};
let send_thread = std::thread::Builder::new()
.name("punktfunk-send".into())
.spawn({
@@ -2308,6 +2480,7 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
perf,
burst_cap,
fec_target,
send_stats,
)
}
})
@@ -2352,6 +2525,12 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// compositing), NOT an encoder problem. Logged every 2 s when `PUNKTFUNK_PERF`.
let (mut diag_new, mut diag_repeat) = (0u64, 0u64);
let mut diag_at = std::time::Instant::now();
// Per-stage latency breakdown (PUNKTFUNK_PERF): per-call µs for the GPU-bound stages so we see
// exactly where the capture→encoded latency goes — cap=try_latest (ring read + colour convert),
// submit=encode_picture launch, wait=lock_bitstream (the scheduling wait + ASIC encode, the one
// that dominates under a GPU-saturating game).
let (mut st_cap, mut st_submit, mut st_wait): (Vec<u32>, Vec<u32>, Vec<u32>) =
(Vec::new(), Vec::new(), Vec::new());
while !stop.load(Ordering::SeqCst) && std::time::Instant::now() < deadline {
// Mid-stream session switch (the box flipped Gaming↔Desktop): rebuild the WHOLE backend in
// place — a different compositor at the SAME client mode — keeping the Session + send thread
@@ -2458,13 +2637,31 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
tracing::debug!("forcing keyframe (client decode recovery)");
enc.request_keyframe();
}
match capturer.try_latest() {
// Measure the per-stage split when `PUNKTFUNK_PERF` is set OR a web-console stats capture is
// armed (a cheap Relaxed atomic, re-read each frame). The values feed the existing perf log
// unchanged and ride each FrameMsg to the send thread, which builds the aggregated sample.
let measure = perf || stats.is_armed();
let t_cap = std::time::Instant::now();
let cap_result = capturer.try_latest();
let cap_us = if measure {
t_cap.elapsed().as_micros() as u32
} else {
0
};
if perf {
st_cap.push(cap_us);
}
let mut repeat = false;
match cap_result {
Ok(Some(f)) => {
frame = f;
diag_new += 1;
capture_rebuilds = 0; // a delivered frame clears the consecutive-loss counter
}
Ok(None) => diag_repeat += 1, // no new frame (static desktop / mid-rebuild) — repeat the last
Ok(None) => {
diag_repeat += 1; // no new frame (static desktop / mid-rebuild) — repeat the last
repeat = true;
}
// The capture source died (PipeWire/compositor thread ended, virtual output gone). Rather
// than tear the whole session down — the client has no reconnect path and would have to
// cold-restart the handshake — rebuild the pipeline IN PLACE at the current mode, exactly
@@ -2497,6 +2694,20 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
"capture diag: NEW frames from the source vs REPEATS (low new_fps at high send rate ⇒ \
the source isn't producing frames, not an encode stall)"
);
let wait_max = st_wait.iter().copied().max().unwrap_or(0);
tracing::info!(
cap_us_p50 = percentile(&mut st_cap, 0.50),
cap_us_p99 = percentile(&mut st_cap, 0.99),
submit_us_p50 = percentile(&mut st_submit, 0.50),
submit_us_p99 = percentile(&mut st_submit, 0.99),
wait_us_p50 = percentile(&mut st_wait, 0.50),
wait_us_p99 = percentile(&mut st_wait, 0.99),
wait_us_max = wait_max,
"stage perf (µs/call): cap=try_latest(ring+convert) submit=encode_picture wait=lock_bitstream(sched+ASIC)"
);
st_cap.clear();
st_submit.clear();
st_wait.clear();
diag_new = 0;
diag_repeat = 0;
diag_at = std::time::Instant::now();
@@ -2515,7 +2726,16 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// capturer hands a rotating ring of output textures, so it returns >1; other capturers default 1.
let depth = capturer.pipeline_depth().max(1);
let capture_ns = now_ns();
let t_submit = std::time::Instant::now();
enc.submit(&frame).context("encoder submit")?;
let submit_us = if measure {
t_submit.elapsed().as_micros() as u32
} else {
0
};
if perf {
st_submit.push(submit_us);
}
// This frame's pacing deadline (the next frame's due time); the send thread spreads a big frame
// up to here. Each in-flight frame carries its own (capture_ns, deadline) for when it's polled.
next += interval;
@@ -2526,7 +2746,17 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// the oldest submitted frame's AU — matching `inflight.pop_front()`.
let mut send_gone = false;
while inflight.len() >= depth {
let au = match enc.poll().context("encoder poll")? {
let t_wait = std::time::Instant::now();
let polled = enc.poll().context("encoder poll")?;
let wait_us = if measure {
t_wait.elapsed().as_micros() as u32
} else {
0
};
if perf {
st_wait.push(wait_us);
}
let au = match polled {
Some(au) => au,
None => break, // no AU ready for a submitted frame (shouldn't happen — poll blocks)
};
@@ -2552,6 +2782,11 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
flags,
deadline,
encode_us,
cap_us,
submit_us,
wait_us,
repeat,
was_measured: measure,
};
// Hand to the send thread; this blocks (backpressure) if it's behind. An Err means it
// exited (send failure / stop) — end the encode loop too.
@@ -2579,12 +2814,19 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
FLAG_PIC as u32
};
let encode_us = (now_ns().saturating_sub(cap_ns) / 1000) as u32;
// End-of-stream tail drain: the per-stage split isn't measured here (the capture loop has
// exited), so leave it zero — these last few frames are negligible for the aggregates.
let msg = FrameMsg {
data: au.data,
capture_ns: cap_ns,
flags,
deadline,
encode_us,
cap_us: 0,
submit_us: 0,
wait_us: 0,
repeat: false,
was_measured: false,
};
if frame_tx.send(msg).is_err() {
break;
@@ -2631,6 +2873,8 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
probe_result_tx,
fec_target,
conn: _conn,
stats,
client_label,
launch,
} = ctx;
tracing::info!(
@@ -2669,6 +2913,11 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
// The secure-desktop HDR drop (for the DDA leg) keys off the monitor's real state in the mux loop.
#[cfg(target_os = "windows")]
if bit_depth >= 10 {
// SAFETY: `set_advanced_color` is marked `unsafe` only because it drives the Win32 CCD API
// internally; it takes `target_id` by value (Copy `u32` — this session's live SudoVDA
// monitor's CCD target id) and sizes + owns every buffer it hands the OS on its own stack.
// We pass no pointers, so nothing must outlive the call and there is no aliasing; an
// unknown/absent target id simply returns false.
unsafe {
if crate::win_display::set_advanced_color(target.target_id, true) {
// Let the colorspace change settle before WGC creates its capture item / detects HDR.
@@ -2760,7 +3009,18 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
* 1024;
// Same encode|send split as the single-process path: this thread relays AUs, a dedicated send
// thread owns the Session and does FEC+seal+paced-send.
// thread owns the Session and does FEC+seal+paced-send. The relay encodes in the helper process,
// so this path's FrameMsgs carry no cap/submit/encode split (those stages stay 0 in the sample);
// the send thread still emits fps/goodput/pacing/loss from `session.stats()`.
let send_stats = SendStats {
rec: stats,
width: mode.width,
height: mode.height,
fps: effective_hz,
codec: "hevc",
client: client_label,
bitrate_kbps,
};
let (frame_tx, frame_rx) = std::sync::mpsc::sync_channel::<FrameMsg>(3);
let send_thread = std::thread::Builder::new()
.name("punktfunk-send".into())
@@ -2776,6 +3036,7 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
perf,
burst_cap,
fec_target,
send_stats,
)
}
})
@@ -2838,6 +3099,11 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
flags,
deadline: std::time::Instant::now() + interval,
encode_us,
cap_us: 0,
submit_us: 0,
wait_us: 0,
repeat: false,
was_measured: false,
};
let ok = frame_tx.send(msg).is_ok();
if ok {
@@ -2904,8 +3170,12 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
// desktop (the drop just churned + still went black). Instead, if the monitor is in HDR,
// open DDA in HDR (FP16 DuplicateOutput1 → BT.2020 PQ Main10); the normal-desktop DDA
// overlay/flip issues that drove us to WGC don't apply to the composed Winlogon UI.
let hdr =
unsafe { crate::win_display::advanced_color_enabled(target.target_id) };
// SAFETY: `advanced_color_enabled` is `unsafe` only because it queries the Win32 CCD
// API; it takes `target_id` by value (the live SudoVDA monitor's CCD target id) and
// allocates + owns every buffer it passes the OS internally. No caller pointer is
// involved, so nothing must outlive the call and there is no aliasing; a missing
// target id just yields false.
let hdr = unsafe { crate::win_display::advanced_color_enabled(target.target_id) };
dda = None; // reopen to capture the secure desktop
match open_dda(&target, cur_mode.width, cur_mode.height, effective_hz, hdr) {
Ok(mut p) => {
@@ -3330,12 +3600,27 @@ mod tests {
unsafe fn pull_verified(conn: *mut punktfunk_core::abi::PunktfunkConnection, count: u32) {
use punktfunk_core::error::PunktfunkStatus;
let mut got = 0u32;
// SAFETY: the inferred type is the `#[repr(C)]` POD `PunktfunkFrame` (a raw `*const u8`, a
// `usize`, and integer fields); all-zero is a valid bit pattern for every field (a null
// `data`, `len == 0`). It is only ever read after `next_au` below fully overwrites it on `Ok`,
// so the zeroed value is never observed.
let mut frame = unsafe { std::mem::zeroed() };
while got < count {
// SAFETY: `conn` is the live, non-null `*mut PunktfunkConnection` from `punktfunk_connect`
// (the caller asserts non-null and does not close it until after this returns), meeting the
// ABI's "valid handle". `&mut frame` is an exclusive, writable borrow of the local
// `PunktfunkFrame` that outlives this synchronous call. This single test thread is the only
// video puller, satisfying the one-video-thread rule.
match unsafe {
punktfunk_core::abi::punktfunk_connection_next_au(conn, &mut frame, 2000)
} {
PunktfunkStatus::Ok => {
// SAFETY: on `Ok`, `next_au` set `frame.data`/`frame.len` to the reassembled AU
// buffer the connection owns; per the ABI contract that borrow stays valid until
// the NEXT `next_au` call on this handle. We read the whole slice here (the assert
// + length-checked indexing) before the loop's next `next_au`, and `conn` outlives
// it — so the pointer is live, exactly `len` bytes, read-only, single-threaded (no
// aliasing/use-after-free).
let data = unsafe { std::slice::from_raw_parts(frame.data, frame.len) };
let idx = u32::from_le_bytes(data[0..4].try_into().unwrap());
assert_eq!(
@@ -3383,6 +3668,11 @@ mod tests {
// Session 1: TOFU (no pin) — observe the host fingerprint.
let addr = std::ffi::CString::new("127.0.0.1").unwrap();
let mut observed = [0u8; 32];
// SAFETY: `addr` is a live `CString` ("127.0.0.1") whose `as_ptr()` is the NUL-terminated
// UTF-8 host string the contract requires; `pin_sha256`/cert/key are NULL (all permitted), and
// `observed.as_mut_ptr()` is the local `[u8; 32]` — exactly the 32 writable bytes the contract
// demands, not aliased during the call. Every pointer references a live local that outlives the
// blocking connect.
let conn = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3401,26 +3691,28 @@ mod tests {
assert_ne!(observed, [0u8; 32], "fingerprint not reported");
let (mut w, mut h, mut hz) = (0u32, 0u32, 0u32);
assert_eq!(
unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) },
PunktfunkStatus::Ok
);
// SAFETY: `conn` is the live, non-null connection handle just asserted above; `&mut w/h/hz` are
// exclusive, writable borrows of local `u32`s that outlive this synchronous call — the three
// writable out-params the contract names.
let st = unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) };
assert_eq!(st, PunktfunkStatus::Ok);
assert_eq!((w, h, hz), (1280, 720, 60));
// Mid-stream renegotiation: request a new mode, the host acks on the control
// stream, and punktfunk_connection_mode reflects the switch.
assert_eq!(
unsafe {
punktfunk_core::abi::punktfunk_connection_request_mode(conn, 1920, 1080, 144)
},
PunktfunkStatus::Ok
);
// SAFETY: `conn` is the live, non-null connection handle (the only pointer arg); the remaining
// arguments are by-value integers. The handle outlives this non-blocking enqueue.
let st = unsafe {
punktfunk_core::abi::punktfunk_connection_request_mode(conn, 1920, 1080, 144)
};
assert_eq!(st, PunktfunkStatus::Ok);
let deadline = std::time::Instant::now() + std::time::Duration::from_secs(5);
loop {
assert_eq!(
unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) },
PunktfunkStatus::Ok
);
// SAFETY: same as the earlier `punktfunk_connection_mode` call — `conn` is the live handle
// and `&mut w/h/hz` are exclusive writable borrows of locals that outlive this synchronous
// call.
let st = unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) };
assert_eq!(st, PunktfunkStatus::Ok);
if (w, h, hz) == (1920, 1080, 144) {
break;
}
@@ -3431,6 +3723,8 @@ mod tests {
std::thread::sleep(std::time::Duration::from_millis(20));
}
// SAFETY: `pull_verified` requires a live connection handle it alone pulls video from; `conn` is
// the open, non-null handle from `punktfunk_connect` and this is the only thread touching it.
unsafe { pull_verified(conn, 25) };
let ev = punktfunk_core::input::InputEvent {
@@ -3441,13 +3735,19 @@ mod tests {
y: 2,
flags: 0,
};
assert_eq!(
unsafe { punktfunk_connection_send_input(conn, &ev) },
PunktfunkStatus::Ok
);
// SAFETY: `conn` is the live handle; `&ev` borrows the local `InputEvent`, valid and immutable
// for this synchronous enqueue — the contract's "valid InputEvent" pointer.
let st = unsafe { punktfunk_connection_send_input(conn, &ev) };
assert_eq!(st, PunktfunkStatus::Ok);
// SAFETY: `conn` was returned by `punktfunk_connect` and is never used after this call (session
// 2 below uses a fresh `conn2`); `close` takes ownership and frees the handle exactly once.
unsafe { punktfunk_connection_close(conn) };
// Session 2 (same host process — the listener survived): pin the fingerprint.
// SAFETY: as for session 1 — `addr` is the live NUL-terminated host string; here
// `observed.as_ptr()` is the 32-byte pin (the fingerprint captured above, a valid `[u8; 32]`),
// `observed_sha256_out` is NULL and cert/key are NULL. All pointers reference live locals for
// the duration of the blocking connect.
let conn2 = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3463,11 +3763,17 @@ mod tests {
)
};
assert!(!conn2.is_null(), "pinned reconnect failed");
// SAFETY: `conn2` is the live, non-null pinned handle, pulled only from this thread —
// `pull_verified`'s requirement.
unsafe { pull_verified(conn2, 25) };
// SAFETY: `conn2` came from `punktfunk_connect` and is not used after this; `close` frees it once.
unsafe { punktfunk_connection_close(conn2) };
// Session 3: a wrong pin must be rejected by the handshake.
let bad = [0xAAu8; 32];
// SAFETY: same shape as the prior connects — `addr` is the live host string, `bad.as_ptr()` is
// the 32-byte `[0xAA; 32]` pin, and out/cert/key are NULL; all reference live locals across the
// blocking call. (The handshake is expected to fail and return NULL here, which is sound.)
let conn3 = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3487,6 +3793,8 @@ mod tests {
// The host saw the rejected handshake attempt as session 3? No — a TLS-failed
// handshake never yields a connection, so accept() is still waiting. Connect once
// more (TOFU) to complete the host's third session and let it exit.
// SAFETY: same as session 1's connect — `addr` is the live host string, pin/out/cert/key all
// NULL; the pointers reference live locals for the duration of the blocking connect.
let conn4 = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3502,7 +3810,9 @@ mod tests {
)
};
assert!(!conn4.is_null());
// SAFETY: `conn4` is the live, non-null handle, pulled only from this thread.
unsafe { pull_verified(conn4, 25) };
// SAFETY: `conn4` came from `punktfunk_connect` and is unused after this; `close` frees it once.
unsafe { punktfunk_connection_close(conn4) };
host.join().unwrap().unwrap();
@@ -3546,6 +3856,9 @@ mod tests {
paired_store: None, // unused: the shared `np` IS the store handle
},
np_host,
StatsRecorder::new(
std::env::temp_dir().join(format!("pf-approval-stats-{}", std::process::id())),
),
))
});
std::thread::sleep(std::time::Duration::from_millis(500));
+1 -1
@@ -1,7 +1,7 @@
//! `SessionPlan` — the per-session capture / topology / encoder decision, resolved **once** from
//! [`HostConfig`](crate::config) (+ the handshake-negotiated bit depth) into a typed, logged value.
//!
//! **Goal-1 stage 3** (`docs/windows-host-rewrite.md` §2.2): before this, the Windows session decision was
//! **Goal-1 stage 3** (`design/windows-host-rewrite.md` §2.2): before this, the Windows session decision was
//! re-derived at three call sites — the capture backend inside `capture::capture_virtual_output`, the
//! process topology in `punktfunk1::should_use_helper`, and the encode backend in
//! `encode::windows_resolved_backend` — each reading [`config`](crate::config) independently, with no
+13 -1
@@ -9,7 +9,10 @@
//! Raw C-ABI FFI (winmm/kernel32/dwmapi/avrt) rather than the `windows` crate so it builds without
//! pulling new windows-rs features. No-op on non-Windows. Per-thread effects (MMCSS, execution
//! state) auto-revert at thread exit (= session end); the process-wide bits revert at process exit.
//! See `docs/host-latency-plan.md` Tier 3A.
//! See `design/host-latency-plan.md` Tier 3A.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
#[cfg(target_os = "windows")]
mod imp {
@@ -49,6 +52,10 @@ mod imp {
/// Process-wide tuning, applied exactly once. Reverts at process exit. Best-effort: each call is
/// independent and a failure is ignored (e.g. a non-elevated host may not get HIGH class).
fn tune_process_once() {
// SAFETY: each call is a C-ABI FFI into winmm/kernel32/dwmapi declared with a matching
// `extern "system"` signature; every argument is a plain integer (no pointers/buffers escape),
// and `GetCurrentProcess()` returns the current-process pseudo-handle (a constant, always valid,
// never closed). The body runs inside `get_or_init`, so it executes exactly once per process.
PROCESS_TUNED.get_or_init(|| unsafe {
// 1 ms timer granularity (default ~15.6 ms) — the floor for precise frame pacing and the
// encode|send split's sub-ms sleeps.
@@ -70,6 +77,11 @@ mod imp {
/// thread exits, so a session that ends tears them down without explicit bookkeeping.
pub fn on_hot_thread() {
tune_process_once();
// SAFETY: C-ABI FFI declared with matching `extern "system"` signatures. SetThreadExecutionState
// takes only flag bits. `task` is a local NUL-terminated UTF-16 buffer ("Games\0") alive for the
// whole block, so `task.as_ptr()` is a valid LPCWSTR for the call, and `&mut idx` is a live local
// u32 the call writes the task index into. The returned MMCSS handle is intentionally leaked (the
// OS reverts the characteristics at thread exit), so there is nothing to free or double-free.
unsafe {
SetThreadExecutionState(ES_CONTINUOUS | ES_DISPLAY_REQUIRED | ES_SYSTEM_REQUIRED);
let task: Vec<u16> = "Games\0".encode_utf16().collect();
+553
@@ -0,0 +1,553 @@
//! Shared streaming-stats recorder (`design/stats-capture-plan.md` §1). One
//! [`StatsRecorder`] handle is created once in the unified host entry
//! (`gamestream::serve`) alongside [`crate::native_pairing::NativePairing`], and shared with
//! **both** the management API ([`crate::mgmt`]) and the streaming loops (threaded through
//! [`crate::punktfunk1::serve`] → `SessionContext` and into the GameStream encode loop). The
//! operator arms a capture from the web console, plays a session, stops, and reviews the
//! captured time-series as graphs; captures are saved to disk and survive a host restart.
//!
//! Hot-path discipline: [`StatsRecorder::is_armed`] is a cheap `Relaxed` atomic load (re-read
//! per frame); sample construction happens only at the loops' existing ~2 s / ~1 s aggregation
//! boundary, never per frame. Memory is bounded ([`MAX_SAMPLES`]); the on-disk write is atomic
//! (temp + rename); and capture ids are path-traversal-safe.
use serde::{Deserialize, Serialize};
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Instant;
use utoipa::ToSchema;
/// Cap on samples kept in one capture: ≈ 3 h at one sample / 2 s (native), ≈ 1.5 h at one sample /
/// 1 s (GameStream). On overflow we stop appending (keeping the oldest, since a saved recording
/// must keep its start); we never drop the front and never grow unbounded.
const MAX_SAMPLES: usize = 5400;
/// One pipeline stage's latency in an aggregation window (microseconds).
#[derive(Serialize, Deserialize, ToSchema, Clone, Debug)]
pub struct StageTiming {
/// `"capture" | "submit" | "encode" | "packetize" | "send"` (path-dependent).
pub name: String,
pub p50_us: f32,
pub p99_us: f32,
}
/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
#[derive(Serialize, Deserialize, ToSchema, Clone, Debug)]
pub struct StatsSample {
/// Milliseconds since capture start (monotonic; stamped by [`StatsRecorder::push_sample`]).
pub t_ms: u64,
/// Disambiguates concurrent sessions (usually constant).
pub session_id: u32,
/// Ordered pipeline stages for this path.
pub stages: Vec<StageTiming>,
/// Genuine NEW frames/s from the source.
pub fps: f32,
/// Re-encoded holds/s (source-starvation indicator).
pub repeat_fps: f32,
/// Transmit goodput (Mb/s).
pub mbps: f32,
/// Configured target bitrate.
pub bitrate_kbps: u32,
/// Frames dropped this window (delta).
pub frames_dropped: u32,
/// Packets dropped this window (receiver-side / reassembler, where known).
pub packets_dropped: u32,
/// Host send-buffer overflow / EAGAIN this window (delta).
pub send_dropped: u32,
/// FEC shards recovered this window (delta).
pub fec_recovered: u32,
}
/// Capture summary — the filename stem plus the negotiated mode/codec/client. Stored at the head
/// of each on-disk recording and listed standalone (without the sample body) by
/// [`StatsRecorder::list`].
#[derive(Serialize, Deserialize, ToSchema, Clone, Debug)]
pub struct CaptureMeta {
/// e.g. `"2026-06-26T20-14-03Z_5120x1440"` — also the filename stem.
pub id: String,
pub started_unix_ms: u64,
pub duration_ms: u64,
/// `"native" | "gamestream"`.
pub kind: String,
pub width: u32,
pub height: u32,
pub fps: u32,
/// `"h264" | "hevc" | "av1"`.
pub codec: String,
/// Short label / fingerprint prefix, or `""` if unknown.
pub client: String,
pub sample_count: u32,
}
/// A full capture: summary + the sample time-series. The wire + on-disk shape.
#[derive(Serialize, Deserialize, ToSchema, Clone, Debug)]
pub struct Capture {
pub meta: CaptureMeta,
pub samples: Vec<StatsSample>,
}
/// Snapshot of the in-progress capture for the management API.
#[derive(Serialize, Deserialize, ToSchema, Clone, Debug)]
pub struct StatsStatus {
/// Capture currently running.
pub armed: bool,
/// Samples in the in-progress capture.
pub sample_count: u32,
/// Unix start time of the in-progress capture (`0` if idle).
pub started_unix_ms: u64,
/// Streaming path of the in-progress capture (`"native" | "gamestream"`); `""` if idle or before
/// a session has registered.
pub kind: String,
}
/// Mode/codec/client seeded on the first [`StatsRecorder::register_session`] of a capture.
#[derive(Clone)]
struct MetaSeed {
kind: String,
width: u32,
height: u32,
fps: u32,
codec: String,
client: String,
}
/// The in-progress capture (present iff armed).
struct Live {
/// Monotonic clock origin for sample `t_ms`.
started: Instant,
started_unix_ms: u64,
/// Seeded once, on the first session registration.
meta: Option<MetaSeed>,
samples: Vec<StatsSample>,
/// Set once the sample cap was hit (further samples dropped). Read so it isn't dead.
truncated: bool,
}
/// Shared streaming-stats recorder: an arm/disarm flag (the hot-path gate), the in-progress
/// capture, and the on-disk capture directory.
pub struct StatsRecorder {
dir: PathBuf,
/// The hot-path gate — a `Relaxed` load per frame; never blocks the frame thread.
armed: AtomicBool,
/// The in-progress capture. Locks recover a poisoned guard (`unwrap_or_else(|e| e.into_inner())`,
/// as in `vdisplay::gamescope`) rather than `unwrap()`: a panic somewhere must never make stats
/// recording crash an otherwise-healthy stream. The critical sections only push/clone/format, so
/// poisoning is near-impossible anyway — this is belt-and-suspenders.
live: Mutex<Option<Live>>,
next_sid: AtomicU32,
}
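The poison-recovering lock described in the `live` field comment can be tried in isolation. This is a minimal sketch using only the standard library; `lock_recovering` and `poison_then_read` are hypothetical names introduced here (the recorder inlines the expression at each call site).

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Lock `m`, recovering the guard even if a previous holder panicked.
/// (Hypothetical helper; the recorder writes this expression inline.)
fn lock_recovering<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Poison a mutex by panicking while holding it, then read it back anyway.
fn poison_then_read() -> Vec<u32> {
    let shared = Arc::new(Mutex::new(vec![1u32, 2]));
    let worker = Arc::clone(&shared);
    // The worker pushes one value, then panics with the guard still held,
    // which marks the mutex as poisoned.
    let result = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        guard.push(3);
        panic!("simulated panic while holding the lock");
    })
    .join();
    assert!(result.is_err());
    assert!(shared.is_poisoned());
    // A plain `lock().unwrap()` would panic here; the recovering form does not.
    let guard = lock_recovering(&shared);
    guard.clone()
}
```

Calling `poison_then_read()` still returns the data written before the panic. The trade-off is that the protected value may be mid-update when recovered, which is acceptable only when the critical sections are as simple as push/clone/format.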
/// The default captures directory: `~/.config/punktfunk/captures/` (next to `cert.pem`),
/// resolved via the same config-dir helper the rest of the host uses.
pub fn default_dir() -> PathBuf {
crate::gamestream::config_dir().join("captures")
}
/// `id` charset gate, matching `^[A-Za-z0-9._-]+$`: the exact charset `capture_id` emits (which
/// deliberately uses dashes, not colons, so the stem is a valid Windows filename). We additionally
/// reject `.` and `..`, which the charset alone would allow, so the id can never name the current or
/// parent directory. The charset already excludes `/` and `\`, so `dir.join("<id>.json")` is always a
/// single child of `dir`. This is defense in depth; the endpoints are bearer-authed.
fn valid_id(id: &str) -> bool {
!id.is_empty()
&& id != "."
&& id != ".."
&& id
.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}
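To see why the charset gate keeps every resolved path a direct child of the captures directory, here is a small standalone sketch. `valid_id` is copied from above; `resolve` and `is_direct_child` are hypothetical helpers added only for the illustration.

```rust
use std::path::{Component, Path, PathBuf};

/// Same gate as `valid_id` above: `^[A-Za-z0-9._-]+$`, minus `.` and `..`.
fn valid_id(id: &str) -> bool {
    !id.is_empty()
        && id != "."
        && id != ".."
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// Hypothetical helper: `<dir>/<id>.json`, or `None` for a rejected id.
fn resolve(dir: &Path, id: &str) -> Option<PathBuf> {
    valid_id(id).then(|| dir.join(format!("{id}.json")))
}

/// True when `path` is exactly one normal component below `dir`.
fn is_direct_child(dir: &Path, path: &Path) -> bool {
    match path.strip_prefix(dir) {
        Ok(rest) => {
            let mut parts = rest.components();
            matches!(parts.next(), Some(Component::Normal(_))) && parts.next().is_none()
        }
        Err(_) => false,
    }
}
```

With `dir = Path::new("captures")`, a generated id such as `2026-06-26T20-14-03Z_5120x1440` resolves to a direct child, while `../secret`, `a/b`, `a\b` and the empty string are all rejected before any path is built.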
fn unix_ms_now() -> u64 {
std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_millis() as u64)
.unwrap_or(0)
}
/// A human-readable, filesystem-safe capture id from the start time + mode, e.g.
/// `2026-06-26T20-14-03Z_5120x1440`. Dashes (not colons) in the time so it's a valid Windows
/// filename; matches [`valid_id`].
fn capture_id(unix_ms: u64, width: u32, height: u32) -> String {
let secs = (unix_ms / 1000) as i64;
let days = secs.div_euclid(86_400);
let tod = secs.rem_euclid(86_400);
let (y, mo, d) = civil_from_days(days);
let (h, mi, s) = (tod / 3600, (tod % 3600) / 60, tod % 60);
format!("{y:04}-{mo:02}-{d:02}T{h:02}-{mi:02}-{s:02}Z_{width}x{height}")
}
/// Civil (Y, M, D) from a count of days since the Unix epoch (Howard Hinnant's `civil_from_days`).
fn civil_from_days(z: i64) -> (i64, u32, u32) {
let z = z + 719_468;
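// `div_euclid` already floors, so Hinnant's `z - 146_096` adjustment (needed only with truncating
// division) must not be applied on top of it; doing both mis-assigns the era for negative `z`.
let era = z.div_euclid(146_097);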
let doe = z - era * 146_097; // [0, 146096]
let yoe = (doe - doe / 1460 + doe / 36524 - doe / 146_096) / 365; // [0, 399]
let y = yoe + era * 400;
let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // [0, 365]
let mp = (5 * doy + 2) / 153; // [0, 11]
let d = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
let m = if mp < 10 { mp + 3 } else { mp - 9 }; // [1, 12]
(if m <= 2 { y + 1 } else { y }, m as u32, d)
}
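As a worked check of the date arithmetic, the sketch below recomputes the example id from the doc comment. It is a compact standalone copy, not the file's code: the era uses a plain floor division, which agrees with the function above for every day count a `u64` Unix timestamp can produce.

```rust
/// Civil (year, month, day) from days since 1970-01-01 (Howard Hinnant's algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the origin to 0000-03-01
    let era = z.div_euclid(146_097); // whole 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // [1, 12]
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month as u32, day)
}

/// Filesystem-safe id: dashes instead of colons in the time of day.
fn capture_id(unix_ms: u64, width: u32, height: u32) -> String {
    let secs = (unix_ms / 1000) as i64;
    let (y, mo, d) = civil_from_days(secs.div_euclid(86_400));
    let tod = secs.rem_euclid(86_400);
    let (h, mi, s) = (tod / 3600, (tod % 3600) / 60, tod % 60);
    format!("{y:04}-{mo:02}-{d:02}T{h:02}-{mi:02}-{s:02}Z_{width}x{height}")
}
```

Day 20 630 after the epoch is 2026-06-26, and 20 630 × 86 400 + 72 843 s = 1 782 504 843 s, so `capture_id(1_782_504_843_000, 5120, 1440)` yields `2026-06-26T20-14-03Z_5120x1440`.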
impl StatsRecorder {
/// Create the recorder, creating `dir` (owner-private, best-effort) if missing.
pub fn new(dir: PathBuf) -> Arc<Self> {
if let Err(e) = crate::gamestream::create_private_dir(&dir) {
tracing::warn!(dir = %dir.display(), error = %e, "could not create stats captures dir");
}
Arc::new(StatsRecorder {
dir,
armed: AtomicBool::new(false),
live: Mutex::new(None),
next_sid: AtomicU32::new(0),
})
}
/// The hot-path gate: cheap `Relaxed` load, called per frame to decide whether to measure.
pub fn is_armed(&self) -> bool {
self.armed.load(Ordering::Relaxed)
}
/// Arm a new capture. No-op if already armed (returns the current status).
pub fn start(&self) -> StatsStatus {
let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
if guard.is_none() {
*guard = Some(Live {
started: Instant::now(),
started_unix_ms: unix_ms_now(),
meta: None,
samples: Vec::new(),
truncated: false,
});
// Publish AFTER the live capture exists, so a frame thread that observes `armed` always
// finds a capture to push into.
self.armed.store(true, Ordering::Relaxed);
}
status_of(guard.as_ref())
}
/// A streaming loop announces itself when it first records while armed. Seeds the capture's
/// `CaptureMeta` (kind/w/h/fps/codec/client) on the FIRST registration; returns a session id
/// to stamp on the loop's samples.
pub fn register_session(
&self,
kind: &'static str,
w: u32,
h: u32,
fps: u32,
codec: &str,
client: &str,
) -> u32 {
let sid = self.next_sid.fetch_add(1, Ordering::Relaxed);
let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
if let Some(live) = guard.as_mut() {
if live.meta.is_none() {
live.meta = Some(MetaSeed {
kind: kind.to_string(),
width: w,
height: h,
fps,
codec: codec.to_string(),
client: client.to_string(),
});
}
}
sid
}
/// Append one aggregated sample (called from the loops' existing ~2 s / ~1 s boundary). The
/// `t_ms` is (re)stamped here from the capture's monotonic start, so callers may leave it `0`.
/// Bounded at [`MAX_SAMPLES`]: on overflow we stop appending (oldest kept) and flag truncation.
/// A no-op when nothing is armed (e.g. a `stop()` raced the frame boundary).
pub fn push_sample(&self, session_id: u32, mut sample: StatsSample) {
let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
let Some(live) = guard.as_mut() else { return };
if live.samples.len() >= MAX_SAMPLES {
if !live.truncated {
live.truncated = true;
tracing::warn!(
max = MAX_SAMPLES,
"stats capture hit the sample cap — further samples dropped (oldest kept)"
);
}
return;
}
sample.session_id = session_id;
sample.t_ms = live.started.elapsed().as_millis() as u64;
live.samples.push(sample);
}
/// Disarm + finalize: write `<dir>/<id>.json` atomically (temp + rename) and return its meta.
/// `Ok(None)` if nothing was recording.
pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>> {
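// Clear the hot-path gate and take the capture under the same lock `start()` holds, so a racing
// `start()` cannot leave `armed` set with no live capture behind it. The inner block drops the
// guard before the serialize + disk write below.
let taken = {
let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
self.armed.store(false, Ordering::Relaxed);
guard.take()
};
let Some(live) = taken else {
return Ok(None);
};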
let meta = meta_of(&live);
let capture = Capture {
meta: meta.clone(),
samples: live.samples,
};
let bytes = serde_json::to_vec(&capture).map_err(std::io::Error::other)?;
// Atomic replace: write a sibling temp then rename, so a crash mid-write can't leave a half
// file. The id is generated (always `valid_id`), so this only ever names a child of `dir`.
let path = self.dir.join(format!("{}.json", meta.id));
let tmp = self.dir.join(format!("{}.json.tmp", meta.id));
std::fs::write(&tmp, &bytes)?;
std::fs::rename(&tmp, &path)?;
Ok(Some(meta))
}
/// The in-progress capture status (idle = `armed: false`, zeroed fields).
pub fn status(&self) -> StatsStatus {
status_of(self.live.lock().unwrap_or_else(|e| e.into_inner()).as_ref())
}
/// A clone of the in-progress capture for live graphing (`None` when idle).
pub fn live_snapshot(&self) -> Option<Capture> {
let guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
let live = guard.as_ref()?;
Some(Capture {
meta: meta_of(live),
samples: live.samples.clone(),
})
}
/// All saved recordings, newest first. Each file is read in full, but only its `meta` is
/// deserialized (the samples are skipped rather than materialized).
pub fn list(&self) -> Vec<CaptureMeta> {
/// Deserialize only `meta`; serde ignores the unknown (large) `samples` field.
#[derive(Deserialize)]
struct MetaOnly {
meta: CaptureMeta,
}
let mut out: Vec<CaptureMeta> = Vec::new();
let Ok(entries) = std::fs::read_dir(&self.dir) else {
return out;
};
for entry in entries.flatten() {
let path = entry.path();
if path.extension().and_then(|e| e.to_str()) != Some("json") {
continue;
}
if let Ok(bytes) = std::fs::read(&path) {
if let Ok(parsed) = serde_json::from_slice::<MetaOnly>(&bytes) {
out.push(parsed.meta);
}
}
}
out.sort_by_key(|m| std::cmp::Reverse(m.started_unix_ms));
out
}
/// Load a saved recording by id. Rejects a path-unsafe id (and a missing file) as `NotFound`.
pub fn load(&self, id: &str) -> std::io::Result<Capture> {
let path = self.recording_path(id)?;
let bytes = std::fs::read(&path)?;
serde_json::from_slice(&bytes)
.map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}
/// Delete a saved recording by id. Rejects a path-unsafe id (and a missing file) as `NotFound`.
pub fn delete(&self, id: &str) -> std::io::Result<()> {
let path = self.recording_path(id)?;
std::fs::remove_file(&path)
}
/// Resolve `dir/<id>.json` after validating `id`. A rejected id is `NotFound` (defense in
/// depth: never let an attacker-shaped id escape `dir`).
fn recording_path(&self, id: &str) -> std::io::Result<PathBuf> {
if !valid_id(id) {
return Err(std::io::Error::new(
std::io::ErrorKind::NotFound,
"invalid recording id",
));
}
Ok(self.dir.join(format!("{id}.json")))
}
}
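The arm / push / stop lifecycle above reduces to an atomic gate in front of a mutex-guarded, bounded buffer. The following is a minimal sketch of that shape only: `MiniRecorder` and `CAP` are hypothetical stand-ins, the samples are plain `u32`s, and nothing is written to disk. `stop` here clears the gate while holding the lock.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Tiny stand-in for `MAX_SAMPLES` so the overflow path is easy to see.
const CAP: usize = 4;

/// Hypothetical miniature of the recorder: an atomic gate plus a bounded buffer.
struct MiniRecorder {
    armed: AtomicBool,
    live: Mutex<Option<Vec<u32>>>,
}

impl MiniRecorder {
    fn new() -> Self {
        MiniRecorder { armed: AtomicBool::new(false), live: Mutex::new(None) }
    }

    /// Hot-path check: one relaxed load, no lock.
    fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Arm a capture; a second call while armed keeps the existing buffer.
    fn start(&self) {
        let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
        if guard.is_none() {
            *guard = Some(Vec::new());
            self.armed.store(true, Ordering::Relaxed);
        }
    }

    /// Append while armed; once `CAP` is reached, keep the oldest and drop the rest.
    fn push(&self, value: u32) {
        let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
        let Some(buf) = guard.as_mut() else { return };
        if buf.len() < CAP {
            buf.push(value);
        }
    }

    /// Disarm and hand back whatever was recorded (`None` if idle).
    fn stop(&self) -> Option<Vec<u32>> {
        let mut guard = self.live.lock().unwrap_or_else(|e| e.into_inner());
        self.armed.store(false, Ordering::Relaxed);
        guard.take()
    }
}
```

Pushing ten values into an armed `MiniRecorder` leaves only the first four in the buffer, and a push while idle is silently ignored, mirroring the keep-the-oldest rule and the idle no-op described for `push_sample`.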
/// Build the live `StatsStatus` from the optional in-progress capture.
fn status_of(live: Option<&Live>) -> StatsStatus {
match live {
Some(l) => StatsStatus {
armed: true,
sample_count: l.samples.len() as u32,
started_unix_ms: l.started_unix_ms,
kind: l.meta.as_ref().map(|m| m.kind.clone()).unwrap_or_default(),
},
None => StatsStatus {
armed: false,
sample_count: 0,
started_unix_ms: 0,
kind: String::new(),
},
}
}
/// Compute the `CaptureMeta` for an in-progress or finalizing capture (id derived from the start
/// time + negotiated mode; duration from the monotonic start).
fn meta_of(live: &Live) -> CaptureMeta {
let (kind, width, height, fps, codec, client) = match &live.meta {
Some(m) => (
m.kind.clone(),
m.width,
m.height,
m.fps,
m.codec.clone(),
m.client.clone(),
),
None => (String::new(), 0, 0, 0, String::new(), String::new()),
};
CaptureMeta {
id: capture_id(live.started_unix_ms, width, height),
started_unix_ms: live.started_unix_ms,
duration_ms: live.started.elapsed().as_millis() as u64,
kind,
width,
height,
fps,
codec,
client,
sample_count: live.samples.len() as u32,
}
}
#[cfg(test)]
mod tests {
use super::*;
fn temp_dir() -> PathBuf {
// A per-call unique dir: a process-wide counter (NOT a timestamp, which collides when tests
// run in parallel within the same millisecond — one test's cleanup would then wipe another's
// dir mid-run).
static COUNTER: AtomicU32 = AtomicU32::new(0);
let n = COUNTER.fetch_add(1, Ordering::Relaxed);
let p = std::env::temp_dir().join(format!("pf-stats-{}-{}", std::process::id(), n));
let _ = std::fs::remove_dir_all(&p);
p
}
fn sample() -> StatsSample {
StatsSample {
t_ms: 0,
session_id: 0,
stages: vec![StageTiming {
name: "capture".into(),
p50_us: 100.0,
p99_us: 200.0,
}],
fps: 60.0,
repeat_fps: 0.0,
mbps: 25.0,
bitrate_kbps: 20_000,
frames_dropped: 0,
packets_dropped: 0,
send_dropped: 0,
fec_recovered: 0,
}
}
#[test]
fn arm_record_save_load_delete() {
let dir = temp_dir();
let rec = StatsRecorder::new(dir.clone());
assert!(!rec.is_armed());
assert!(!rec.status().armed);
// A push while idle is a no-op (no live capture).
rec.push_sample(0, sample());
let st = rec.start();
assert!(st.armed);
assert!(rec.is_armed());
let sid = rec.register_session("native", 5120, 1440, 240, "hevc", "abcd");
rec.push_sample(sid, sample());
rec.push_sample(sid, sample());
assert_eq!(rec.status().sample_count, 2);
assert_eq!(rec.status().kind, "native");
assert!(rec.live_snapshot().is_some());
let meta = rec.stop().unwrap().expect("a capture was recording");
assert_eq!(meta.sample_count, 2);
assert_eq!(meta.kind, "native");
assert_eq!(meta.width, 5120);
assert!(meta.id.ends_with("_5120x1440"), "id was {}", meta.id);
assert!(!rec.is_armed());
assert!(rec.live_snapshot().is_none());
// Stop with nothing recording → Ok(None).
assert!(rec.stop().unwrap().is_none());
// It is listed and loadable.
let list = rec.list();
assert_eq!(list.len(), 1);
assert_eq!(list[0].id, meta.id);
let loaded = rec.load(&meta.id).unwrap();
assert_eq!(loaded.samples.len(), 2);
assert_eq!(loaded.meta.codec, "hevc");
// Delete removes it; a second delete is NotFound.
rec.delete(&meta.id).unwrap();
assert!(rec.list().is_empty());
assert_eq!(
rec.delete(&meta.id).unwrap_err().kind(),
std::io::ErrorKind::NotFound
);
let _ = std::fs::remove_dir_all(&dir);
}
#[test]
fn rejects_path_traversal_ids() {
let dir = temp_dir();
let rec = StatsRecorder::new(dir.clone());
for bad in [
"../secret",
"..",
".",
"a/b",
"a\\b",
"",
"/etc/passwd",
"x/../../y",
] {
assert_eq!(
rec.load(bad).unwrap_err().kind(),
std::io::ErrorKind::NotFound,
"load({bad:?}) must be rejected as NotFound"
);
assert_eq!(
rec.delete(bad).unwrap_err().kind(),
std::io::ErrorKind::NotFound,
"delete({bad:?}) must be rejected as NotFound"
);
}
let _ = std::fs::remove_dir_all(&dir);
}
#[test]
fn samples_are_bounded() {
let dir = temp_dir();
let rec = StatsRecorder::new(dir.clone());
rec.start();
for _ in 0..(MAX_SAMPLES + 50) {
rec.push_sample(0, sample());
}
assert_eq!(rec.status().sample_count as usize, MAX_SAMPLES);
let _ = std::fs::remove_dir_all(&dir);
}
#[test]
fn start_is_idempotent_while_armed() {
let dir = temp_dir();
let rec = StatsRecorder::new(dir.clone());
rec.start();
rec.register_session("native", 1920, 1080, 60, "hevc", "");
rec.push_sample(0, sample());
// A second start must NOT wipe the in-progress capture.
let st = rec.start();
assert!(st.armed);
assert_eq!(st.sample_count, 1);
let _ = std::fs::remove_dir_all(&dir);
}
}
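The temp + rename write that `stop()` performs can be sketched on its own with the standard library. `write_atomic` is a hypothetical helper name, and the sketch does not call `fsync`, so it guards against a half-written destination file rather than against power loss.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write `bytes` to `<dir>/<stem>.json` via a sibling temp file and a rename,
/// so a reader sees either the previous file or the complete new one.
fn write_atomic(dir: &Path, stem: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let path = dir.join(format!("{stem}.json"));
    let tmp = dir.join(format!("{stem}.json.tmp"));
    // The temp file lives in the same directory, so the rename stays on one filesystem.
    fs::write(&tmp, bytes)?;
    fs::rename(&tmp, &path)?;
    Ok(path)
}
```

After a successful call the `.json.tmp` sibling no longer exists. Because `list()` only considers files whose extension is `json`, a temp file left behind by a crash would not be listed as a recording.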
+10 -3
@@ -13,6 +13,9 @@
//! owned keepalive whose `Drop` releases the output (RAII — no explicit `destroy`). Capture
//! consumes the node via [`crate::capture::capture_virtual_output`].
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::Result;
pub use punktfunk_core::Mode;
#[cfg(target_os = "linux")]
@@ -225,6 +228,8 @@ pub fn compositor_for_kind(kind: ActiveKind) -> Option<Compositor> {
#[cfg(target_os = "linux")]
fn default_runtime_dir() -> String {
std::env::var("XDG_RUNTIME_DIR").unwrap_or_else(|_| {
// SAFETY: `getuid()` is a parameterless POSIX call that always succeeds and touches no
// memory — it just returns the calling process's real uid. Nothing is aliased or freed.
let uid = unsafe { libc::getuid() };
format!("/run/user/{uid}")
})
@@ -245,6 +250,8 @@ fn default_bus(runtime: &str) -> String {
#[cfg(target_os = "linux")]
pub fn detect_active_session() -> ActiveSession {
use std::os::unix::fs::MetadataExt;
// SAFETY: `getuid()` is a parameterless POSIX call that always succeeds and touches no memory —
// it just returns the calling process's real uid. Nothing is aliased or freed.
let uid = unsafe { libc::getuid() };
let xdg_runtime_dir = default_runtime_dir();
let dbus = default_bus(&xdg_runtime_dir);
@@ -615,12 +622,12 @@ mod gamescope;
#[cfg(target_os = "linux")]
#[path = "vdisplay/linux/kwin.rs"]
mod kwin;
#[cfg(target_os = "linux")]
#[path = "vdisplay/linux/mutter.rs"]
mod mutter;
#[cfg(target_os = "windows")]
#[path = "vdisplay/windows/manager.rs"]
pub(crate) mod manager;
#[cfg(target_os = "linux")]
#[path = "vdisplay/linux/mutter.rs"]
mod mutter;
#[cfg(target_os = "windows")]
#[path = "vdisplay/windows/pf_vdisplay.rs"]
pub(crate) mod pf_vdisplay;
@@ -15,6 +15,8 @@
//! the KWin session's environment.
#![allow(clippy::all, dead_code, non_camel_case_types, non_snake_case, unused)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Mode, VirtualDisplay, VirtualOutput};
use anyhow::{anyhow, bail, Context, Result};
@@ -495,6 +497,11 @@ fn run(
events: libc::POLLIN,
revents: 0,
};
// SAFETY: `&mut pfd` points at a single live, fully-initialized `libc::pollfd` on the stack, and
// the count `1` matches that one-element array, so `poll` reads `fd`/`events` and writes `revents`
// strictly within `pfd`. `pfd.fd` is the Wayland connection's fd, valid because `conn` (and the
// `prepare_read` guard) are alive across the call. `poll` blocks up to 200 ms and writes only
// `revents`; `pfd` outlives the synchronous call and aliases nothing (a fresh local).
let r = unsafe { libc::poll(&mut pfd, 1, 200) };
if r > 0 && (pfd.revents & libc::POLLIN) != 0 {
let _ = guard.read();
@@ -13,6 +13,9 @@
//! its `Drop` releases the refcount (a *stale* lease — its monitor was preempted + recreated under it —
//! is a no-op, so it can never tear down the live monitor).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::os::windows::io::{AsRawHandle, OwnedHandle};
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
use std::sync::{Arc, Mutex, Once, OnceLock};
@@ -60,8 +63,12 @@ pub(crate) trait VdisplayDriver: Send + Sync {
///
/// # Safety
/// `dev` must be the live control handle from [`open`](Self::open).
unsafe fn add_monitor(&self, dev: HANDLE, mode: Mode, render_luid: Option<LUID>)
-> Result<AddedMonitor>;
unsafe fn add_monitor(
&self,
dev: HANDLE,
mode: Mode,
render_luid: Option<LUID>,
) -> Result<AddedMonitor>;
/// REMOVE the monitor identified by `key`.
///
/// # Safety
@@ -147,7 +154,8 @@ pub(crate) fn init(driver: Box<dyn VdisplayDriver>) -> &'static VirtualDisplayMa
/// The process-wide manager. Panics if reached before a backend called [`init`] — by construction a
/// session is only ever created after `vdisplay::open` constructed the backend (which calls `init`).
pub(crate) fn vdm() -> &'static VirtualDisplayManager {
VDM.get().expect("VirtualDisplayManager used before a backend initialised it")
VDM.get()
.expect("VirtualDisplayManager used before a backend initialised it")
}
impl VirtualDisplayManager {
@@ -161,6 +169,10 @@ impl VirtualDisplayManager {
if let Some(d) = self.device.get() {
return Ok(HANDLE(d.as_raw_handle()));
}
// SAFETY: `VdisplayDriver::open` is `unsafe` only because it issues SetupAPI + `DeviceIoControl`
// FFI in the caller's apartment; `ensure_device` runs that on the acquiring thread under the
// `state` lock (callers hold it), so there is no concurrent open. `open` has no handle
// precondition to uphold, and the `OwnedHandle` it returns is the sole owner of the device.
let (handle, watchdog_s) = unsafe { self.driver.open()? };
self.watchdog_s.store(watchdog_s, Ordering::Relaxed);
let raw = HANDLE(handle.as_raw_handle());
@@ -171,9 +183,7 @@ impl VirtualDisplayManager {
/// The live control handle for the pinger/linger threads (lock-free: the device never changes once
/// opened). `None` only before the first acquire opened it.
fn device_handle(&self) -> Option<HANDLE> {
self.device
.get()
.map(|d| HANDLE(d.as_raw_handle()))
self.device.get().map(|d| HANDLE(d.as_raw_handle()))
}
/// Open + initialise the backend (validates the driver is present). Mirrors the old
@@ -196,8 +206,7 @@ impl VirtualDisplayManager {
// client is gone). A REUSED IddCx swap-chain is DEAD, so joining it hands a black screen —
// PREEMPT: tear the old monitor down (its key/topology are restored) and create a fresh one. The
// old session's lease is gen-stamped, so its later drop is a no-op and can't tear down the new one.
if idd_push_mode()
&& matches!(*state, MgrState::Active { .. } | MgrState::Lingering { .. })
if idd_push_mode() && matches!(*state, MgrState::Active { .. } | MgrState::Lingering { .. })
{
if let MgrState::Active { mon, .. } | MgrState::Lingering { mon, .. } =
std::mem::replace(&mut *state, MgrState::Idle)
@@ -206,6 +215,10 @@ impl VirtualDisplayManager {
old_target = mon.target_id,
"IDD-push reconnect — preempting the prior session, recreating a fresh monitor"
);
// SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is the value
// `ensure_device()` returned above (the device is cached in the `OnceLock` and never
// closed for the manager's lifetime). `mon` was moved out of the prior `Active`/
// `Lingering` state by `mem::replace`, so it is exclusively owned here — no aliasing.
unsafe { self.teardown(dev, mon) };
// Let the OS finish the ASYNC monitor departure before the next ADD; a back-to-back
// REMOVE→ADD races the teardown and the ADD IOCTL is rejected under reconnect churn.
@@ -219,21 +232,37 @@ impl VirtualDisplayManager {
if let MgrState::Active { mon, refs } = &mut *state {
*refs += 1;
if mon.mode != mode {
// SAFETY: `reconfigure` only manipulates the live display topology via the CCD/GDI
// helpers and needs an exclusive `&mut Monitor`. `mon` is the `&mut` into the current
// `Active` state, held under the `state` lock, so nothing else reconfigures it concurrently.
unsafe { self.reconfigure(mon, mode) };
}
tracing::info!(refs = *refs, backend = self.driver.name(), "virtual monitor reused (concurrent / reconfigure session)");
tracing::info!(
refs = *refs,
backend = self.driver.name(),
"virtual monitor reused (concurrent / reconfigure session)"
);
return Ok(self.output_for(mon));
}
// Idle or Lingering: repurpose a lingering monitor / create a fresh one → Active{refs:1}.
let mon = match std::mem::replace(&mut *state, MgrState::Idle) {
MgrState::Lingering { mut mon, .. } => {
tracing::info!(backend = self.driver.name(), "virtual monitor reused (reconnect within the linger window)");
tracing::info!(
backend = self.driver.name(),
"virtual monitor reused (reconnect within the linger window)"
);
if mon.mode != mode {
// SAFETY: `reconfigure` needs an exclusive `&mut Monitor` and only touches the live
// display topology. `mon` is the local monitor just moved out of the `Lingering`
// state (sole owner), and we hold the `state` lock — no concurrent reconfigure.
unsafe { self.reconfigure(&mut mon, mode) };
}
mon
}
// SAFETY: `create_monitor` requires `dev` to be the live control handle; `dev` is the
// handle `ensure_device()` returned above (cached in the `OnceLock`, never closed for the
// manager's lifetime), and we hold the `state` lock.
MgrState::Idle => unsafe { self.create_monitor(dev, mode)? },
MgrState::Active { .. } => unreachable!("handled above"),
};
@@ -262,17 +291,27 @@ impl VirtualDisplayManager {
/// # Safety
/// `dev` must be the live control handle.
unsafe fn create_monitor(&'static self, dev: HANDLE, mode: Mode) -> Result<Monitor> {
// SAFETY: `create_monitor`'s own `# Safety` contract guarantees `dev` is the live control
// handle; we forward it unchanged to `add_monitor`, whose precondition is exactly that.
// `resolve_render_pin()` returns an `Option<LUID>` by value (plain `Copy`), so no borrowed
// memory crosses the call.
let added = unsafe { self.driver.add_monitor(dev, mode, resolve_render_pin())? };
// Mandatory keepalive: ping inside the watchdog window or the driver tears all displays down.
// The pinger reaches the singleton for both the device + the driver — no raw-handle smuggle.
let stop = Arc::new(AtomicBool::new(false));
let interval = Duration::from_millis(self.watchdog_s.load(Ordering::Relaxed) as u64 * 1000 / 3);
let interval =
Duration::from_millis(self.watchdog_s.load(Ordering::Relaxed) as u64 * 1000 / 3);
let stop_t = stop.clone();
let pinger = thread::spawn(move || {
let mut warned = false;
while !stop_t.load(Ordering::Relaxed) {
if let Some(h) = vdm().device_handle() {
// SAFETY: `ping` requires `dev` to be the live control handle. `h` is from
// `device_handle()` (the `Some` branch) — the `OnceLock<Arc<OwnedHandle>>` that,
// once set, is never cleared or closed for the process lifetime, so the handle is
// live for this call. The pinger thread only spins while the `&'static` manager
// singleton (and thus the device) lives.
match unsafe { vdm().driver.ping(h) } {
Ok(()) => warned = false,
Err(e) => {
@@ -292,6 +331,9 @@ impl VirtualDisplayManager {
let mut gdi_name = None;
for _ in 0..15 {
thread::sleep(Duration::from_millis(200));
// SAFETY: `resolve_gdi_name` is `unsafe` for its CCD (QueryDisplayConfig) FFI; it takes a
// plain `Copy` `u32` target id by value and returns an owned `String`, so no caller memory
// is borrowed across the call.
if let Some(n) = unsafe { resolve_gdi_name(added.target_id) } {
gdi_name = Some(n);
break;
@@ -308,6 +350,9 @@ impl VirtualDisplayManager {
// display(s) first via the atomic CCD path promotes the IDD to a composited primary with no
// MODE_CHANGE storm. Opt out with PUNKTFUNK_NO_ISOLATE=1.
if std::env::var("PUNKTFUNK_NO_ISOLATE").is_err() {
// SAFETY: `isolate_displays_ccd` is `unsafe` for its CCD topology FFI; it takes a
// `Copy` `u32` by value and returns an owned `SavedConfig` snapshot (no borrowed
// memory crosses). It runs under the `state` lock, the sole mutator of the topology.
ccd_saved = unsafe { isolate_displays_ccd(added.target_id) };
} else {
tracing::info!("display isolation skipped (PUNKTFUNK_NO_ISOLATE) — IDD stays extended");
@@ -339,10 +384,15 @@ impl VirtualDisplayManager {
/// Touches the live display topology via the CCD/GDI helpers.
unsafe fn reconfigure(&self, mon: &mut Monitor, mode: Mode) {
tracing::info!(
old = format!("{}x{}@{}", mon.mode.width, mon.mode.height, mon.mode.refresh_hz),
old = format!(
"{}x{}@{}",
mon.mode.width, mon.mode.height, mon.mode.refresh_hz
),
new = format!("{}x{}@{}", mode.width, mode.height, mode.refresh_hz),
"virtual-display: reconfiguring reused monitor to the new client mode"
);
// SAFETY: `resolve_gdi_name` is `unsafe` for its CCD FFI; it takes the `Copy` `u32`
// `mon.target_id` by value and returns an owned `String`, so nothing borrowed crosses the call.
if let Some(n) = unsafe { resolve_gdi_name(mon.target_id) } {
mon.gdi_name = Some(n);
}
@@ -365,10 +415,16 @@ impl VirtualDisplayManager {
if let Some(saved) = &mon.ccd_saved {
restore_displays_ccd(saved);
}
// SAFETY: `teardown`'s own `# Safety` contract guarantees `dev` is the live control handle, and
// `remove_monitor` requires exactly that. `&mon.key` borrows the `MonitorKey` inside the
// still-owned `mon`, alive for this synchronous IOCTL, so the pointer the driver reads stays valid.
if let Err(e) = unsafe { self.driver.remove_monitor(dev, &mon.key) } {
tracing::warn!("virtual-display REMOVE failed: {e:#}");
} else {
tracing::info!(backend = self.driver.name(), "virtual-display monitor removed");
tracing::info!(
backend = self.driver.name(),
"virtual-display monitor removed"
);
}
}
@@ -385,10 +441,16 @@ impl VirtualDisplayManager {
return;
}
*state = match std::mem::replace(&mut *state, MgrState::Idle) {
MgrState::Active { mon, refs } if refs > 1 => MgrState::Active { mon, refs: refs - 1 },
MgrState::Active { mon, refs } if refs > 1 => MgrState::Active {
mon,
refs: refs - 1,
},
MgrState::Active { mon, .. } => {
let ms = linger_ms();
tracing::info!(linger_ms = ms, "virtual-display: last session left — lingering before teardown");
tracing::info!(
linger_ms = ms,
"virtual-display: last session left — lingering before teardown"
);
MgrState::Lingering {
mon,
until: Instant::now() + Duration::from_millis(ms),
@@ -470,6 +532,10 @@ impl VirtualDisplayManager {
}
};
if let Some(mon) = taken {
// SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is from
// `self.device_handle()` (the `Some` checked just above), i.e. the cached
// `OwnedHandle` live for the process lifetime. `mon` was moved out of the
// `Lingering` state under the `state` lock, so it is exclusively owned here.
unsafe { self.teardown(dev, mon) };
}
})
@@ -503,9 +569,13 @@ fn idd_push_mode() -> bool {
/// ACCESS_LOST storm SudoVDA hit when pinned).
fn resolve_render_pin() -> Option<LUID> {
if crate::config::config().render_adapter.is_some() {
// SAFETY: `resolve_render_adapter_luid` is `unsafe` only for its DXGI factory FFI; it takes no
// arguments and returns an `Option<LUID>` by value, so there is no input/borrow to keep valid.
unsafe { crate::win_adapter::resolve_render_adapter_luid() }
} else if crate::config::config().idd_push {
tracing::info!("IDD push: pinning the discrete render GPU (SET_RENDER_ADAPTER)");
// SAFETY: as above — `resolve_render_adapter_luid` takes no arguments and returns an
// `Option<LUID>` by value; the `unsafe` covers only its DXGI factory enumeration FFI.
unsafe { crate::win_adapter::resolve_render_adapter_luid() }
} else {
tracing::info!(
@@ -6,7 +6,7 @@
//!
//! Control surface: a device-interface-GUID + `CreateFileW` + `DeviceIoControl` IOCTL protocol, with
//! the wire contract OWNED by [`pf_driver_proto::control`] (versioned + `#[repr(C)] Pod` structs,
//! NOT the SudoVDA ABI). No DLL, no named pipe. See `docs/windows-host-rewrite.md`.
//! NOT the SudoVDA ABI). No DLL, no named pipe. See `design/windows-host-rewrite.md`.
//!
//! This is a faithful clone of [`super::sudovda`] (the shipping fallback) repointed at the new driver:
//! same reference-counted/lingering monitor lifecycle, same CCD isolation + active-mode forcing — those
@@ -14,6 +14,9 @@
//! target id, so the CCD/DXGI code works unchanged). Only the driver-specific bits (GUID, IOCTL codes,
//! request/reply structs, the version handshake) differ, per `pf_driver_proto`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::ffi::c_void;
use std::mem::size_of;
use std::os::windows::io::{FromRawHandle, OwnedHandle};
@@ -144,15 +147,26 @@ impl VdisplayDriver for PfVdisplayDriver {
}
unsafe fn open(&self) -> Result<(OwnedHandle, u32)> {
// SAFETY: `open_device` is `unsafe` only because it issues SetupAPI enumeration + `CreateFileW`
// FFI; it takes no arguments and returns an owned raw `HANDLE` (or `Err`). Called here on the
// backend-init thread, with no precondition beyond a valid thread context.
let device = unsafe { open_device()? };
// HARD protocol-version check (unlike SudoVDA's best-effort log): a mismatched host/driver pair
// fails loudly here rather than corrupting the IOCTL stream.
let mut info_buf = [0u8; size_of::<control::InfoReply>()];
// SAFETY: `ioctl` requires `h` to be a valid device handle and its slices to be valid for the
// call. `device` is the live handle just returned by `open_device`. `IOCTL_GET_INFO` takes no
// input (`&[]`) and writes into `info_buf`, a stack `[u8; size_of::<InfoReply>()]` whose length
// is passed as the output size — so `DeviceIoControl` can't write OOB — and which outlives this
// synchronous call.
unsafe { ioctl(device, control::IOCTL_GET_INFO, &[], &mut info_buf) }
.context("pf-vdisplay IOCTL_GET_INFO (version handshake)")?;
let info: control::InfoReply =
bytemuck::pod_read_unaligned(&info_buf[..size_of::<control::InfoReply>()]);
if info.protocol_version != pf_driver_proto::PROTOCOL_VERSION {
// SAFETY: `device` is the valid raw handle from `open_device` and has NOT yet been wrapped
// in an `OwnedHandle` (that happens only on the success path below), so this error path is
// the sole owner closing it exactly once — no double-close.
unsafe {
let _ = CloseHandle(device);
}
@@ -171,12 +185,19 @@ impl VdisplayDriver for PfVdisplayDriver {
);
// Reap monitors orphaned by a crashed previous host — a FIRST-CLASS op (driver returns SUCCESS).
let mut none: [u8; 0] = [];
// SAFETY: `device` is the live handle from `open_device` (still owned here, before it is wrapped
// below). `IOCTL_CLEAR_ALL` has no input and no output: `&[]` and the empty `none` slice pass
// zero-length buffers, so nothing is read or written through them.
if unsafe { ioctl(device, control::IOCTL_CLEAR_ALL, &[], &mut none) }.is_ok() {
tracing::info!("cleared orphaned virtual monitors on host startup");
} else {
tracing::warn!("pf-vdisplay IOCTL_CLEAR_ALL failed on startup (continuing)");
}
Ok((
// SAFETY: `device` is the valid handle from `open_device`, still owned here and NOT closed
// on this success path (the error paths above close it and return). `from_raw_handle`'s
// contract — caller owns a valid handle — holds, so ownership transfers cleanly into the
// `OwnedHandle`: exactly one owner, which `CloseHandle`s it on drop.
unsafe { OwnedHandle::from_raw_handle(device.0 as _) },
watchdog_s,
))
@@ -199,6 +220,9 @@ impl VdisplayDriver for PfVdisplayDriver {
// SET_RENDER_ADAPTER (opt-in; pf-vdisplay IMPLEMENTS it). Non-fatal on failure: the driver reports
// its real render LUID in the shared header, so the host binds correctly even if this is ignored.
if let Some(luid) = render_luid {
// SAFETY: `add_monitor`'s `# Safety` contract guarantees `dev` is the live control handle,
// which is `set_render_adapter`'s precondition; we forward it unchanged. `luid` is a plain
// `Copy` `LUID` passed by value — no borrow crosses the call.
match unsafe { set_render_adapter(dev, luid) } {
Ok(()) => tracing::info!(
luid = format!("{:08x}:{:08x}", luid.HighPart, luid.LowPart),
@@ -210,14 +234,17 @@ impl VdisplayDriver for PfVdisplayDriver {
}
}
let mut out = [0u8; size_of::<control::AddReply>()];
unsafe { ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out) }.with_context(
|| {
// SAFETY: per `add_monitor`'s contract `dev` is the live control handle. `bytemuck::bytes_of(&add)`
// borrows the local `AddRequest` (alive across this synchronous call) as the input bytes, and
// `out` is a stack `[u8; size_of::<AddReply>()]` whose length bounds the kernel's write — both
// buffers outlive the call.
unsafe { ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out) }
.with_context(|| {
format!(
"pf-vdisplay ADD {}x{}@{}",
mode.width, mode.height, mode.refresh_hz
)
},
)?;
})?;
// `pod_read_unaligned` (NOT `from_bytes`): `out` is a stack `[u8; N]` with no guaranteed 4-byte
// alignment, and `from_bytes` PANICS on a mismatch. This copies into an aligned `AddReply`.
let reply: control::AddReply =
@@ -260,11 +287,24 @@ impl VdisplayDriver for PfVdisplayDriver {
session_id: *session_id,
};
let mut none: [u8; 0] = [];
unsafe { ioctl(dev, control::IOCTL_REMOVE, bytemuck::bytes_of(&req), &mut none) }.map(|_| ())
// SAFETY: per `remove_monitor`'s contract `dev` is the live control handle. `bytes_of(&req)`
// borrows the local `RemoveRequest` for the duration of this synchronous call as the input
// bytes; `none` is empty, so there is no output buffer.
unsafe {
ioctl(
dev,
control::IOCTL_REMOVE,
bytemuck::bytes_of(&req),
&mut none,
)
}
.map(|_| ())
}
unsafe fn ping(&self, dev: HANDLE) -> Result<()> {
let mut none: [u8; 0] = [];
// SAFETY: per `ping`'s contract `dev` is the live control handle. `IOCTL_PING` has no input
// (`&[]`) and no output (`none` is empty), so no memory is read or written through the buffers.
unsafe { ioctl(dev, control::IOCTL_PING, &[], &mut none) }.map(|_| ())
}
}
@@ -292,7 +332,11 @@ impl VirtualDisplay for PfVdisplayDisplay {
/// Readiness probe: can we open the pf-vdisplay control device?
pub fn probe() -> Result<()> {
// SAFETY: `open_device` is `unsafe` only for its SetupAPI + `CreateFileW` FFI; no arguments, returns
// an owned raw `HANDLE` (or `Err`).
let h = unsafe { open_device()? };
// SAFETY: `h` is the handle just opened by `open_device` in this function, owned here and not yet
// handed anywhere else, so this closes it exactly once — no double-close, no use-after-close.
unsafe {
let _ = CloseHandle(h);
}
@@ -301,6 +345,9 @@ pub fn probe() -> Result<()> {
/// Is the pf-vdisplay driver present (device interface enumerable)?
pub fn is_available() -> bool {
// SAFETY: `open_device` returns an owned raw `HANDLE`; on `Ok(h)` the handle is moved into the
// closure (sole owner) and closed exactly once via `CloseHandle`, on `Err` there is nothing to
// close — so no double-close and no leak of an opened handle. The `unsafe` covers both FFI calls.
unsafe { open_device().map(|h| CloseHandle(h)).is_ok() }
}
@@ -15,6 +15,9 @@
//! that is correct for launching *our own* streamer, but a store launcher needs the real user's token
//! for activation + auth). The host process itself stays SYSTEM.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Context, Result};
use std::path::Path;
use windows::core::{PCWSTR, PWSTR};
@@ -40,6 +43,8 @@ use windows::Win32::System::Threading::{
/// user is logged on (a pre-login / freshly-booted box can stream the login desktop but cannot
/// auto-launch a store title until someone signs in).
pub fn spawn_in_active_session(cmdline: &str, workdir: Option<&Path>) -> Result<u32> {
// SAFETY: `spawn_inner` is unsafe only for its Win32 FFI; it has no caller-side preconditions — it
// validates the session/token itself and owns every handle it opens — so calling it is always sound.
unsafe { spawn_inner(cmdline, workdir) }
}
@@ -21,6 +21,9 @@
//! loaded into the service's environment and carried to the host child. Logs land in
//! `%ProgramData%\punktfunk\logs\`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Context, Result};
use std::ffi::{c_void, OsString};
use std::os::windows::io::{AsRawHandle, FromRawHandle, OwnedHandle};
@@ -205,14 +208,19 @@ fn run_service() -> Result<()> {
// Two manual-reset events: STOP (set once, never reset) and SESSION (set on a console
// connect/disconnect, reset by the supervisor after it reacts).
// SAFETY: CreateEventW with null attributes (None), manual-reset=true, initial-state=false and a null
// name passes no pointers into Rust memory; it returns a fresh, owned event HANDLE (or Err, via `?`).
// Nothing aliases or outlives the call.
let stop_raw =
unsafe { CreateEventW(None, true, false, PCWSTR::null()) }.context("CreateEvent stop")?;
// SAFETY: as above — a second fresh manual-reset event; no pointers into Rust memory, no aliasing.
let session_raw = unsafe { CreateEventW(None, true, false, PCWSTR::null()) }
.context("CreateEvent session")?;
// Own each event handle (the OS reaps them at process exit); the handler reaches them through the
// OnceLocks, while `supervise` waits on the borrowed `HANDLE`s. SAFETY: each is a fresh CreateEventW
// handle we own — take ownership exactly once.
let stop_owned = unsafe { OwnedHandle::from_raw_handle(stop_raw.0) };
// SAFETY: `session_raw` is the other fresh CreateEventW handle nothing else owns — take ownership once.
let session_owned = unsafe { OwnedHandle::from_raw_handle(session_raw.0) };
let stop = HANDLE(stop_owned.as_raw_handle());
let session = HANDLE(session_owned.as_raw_handle());
@@ -226,6 +234,9 @@ fn run_service() -> Result<()> {
match control {
ServiceControl::Stop | ServiceControl::Preshutdown | ServiceControl::Shutdown => {
if let Some(h) = event_handle(&STOP_EVENT) {
// SAFETY: `h` borrows the STOP event HANDLE from the STOP_EVENT OwnedHandle, set for
// the whole process lifetime and never closed before exit, so it is open here; SetEvent
// only signals the event and passes no Rust memory.
unsafe { SetEvent(h) }.ok();
}
ServiceControlHandlerResult::NoError
@@ -237,6 +248,9 @@ fn run_service() -> Result<()> {
ConsoleConnect | ConsoleDisconnect | SessionLogon
) {
if let Some(h) = event_handle(&SESSION_EVENT) {
// SAFETY: `h` borrows the SESSION event HANDLE from the SESSION_EVENT OwnedHandle,
// alive for the whole process lifetime and never closed before exit; SetEvent only
// signals the event and passes no Rust memory.
unsafe { SetEvent(h) }.ok();
}
}
@@ -297,6 +311,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
// Kill-on-close job so a service crash never orphans the SYSTEM host; BREAKAWAY_OK lets the host
// still spawn the WGC helper. Owned: dropping it at function exit (KILL_ON_JOB_CLOSE) reaps any
// straggler still inside it — no manual CloseHandle(job).
// SAFETY: `make_job` is unsafe only for its Win32 FFI; it has no caller preconditions and creates +
// immediately takes RAII ownership of the job object, so calling it here is sound.
let job = unsafe { make_job() }.context("create job object")?;
let mut restarts: u32 = 0;
@@ -304,6 +320,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
if wait_one(stop, 0) {
break;
}
// SAFETY: WTSGetActiveConsoleSessionId takes no arguments and returns the active console session
// id (or 0xFFFFFFFF); it passes no pointers, so the call is always sound.
let session = unsafe { WTSGetActiveConsoleSessionId() };
if session == 0xFFFF_FFFF {
// No interactive session yet (boot / fully logged out). Wait, but wake on stop/session.
@@ -311,12 +329,17 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
if wait_any(&[stop, session_ev], 3000) == Some(0) {
break;
}
// SAFETY: `session_ev` is the SESSION event HANDLE borrowed from the SESSION_EVENT OwnedHandle,
// alive for the process lifetime; ResetEvent only clears its signalled state, no Rust memory.
unsafe { ResetEvent(session_ev) }.ok();
continue;
}
// BORROW the owned job handle for AssignProcessToJobObject inside spawn_host.
let job_h = HANDLE(job.as_raw_handle());
// SAFETY: `spawn_host` is unsafe only for its Win32 FFI. `session` is a valid console session id
// (checked != 0xFFFFFFFF above), `cmdline`/`workdir` are live borrows for the call, and `job_h`
// borrows the still-live `job` OwnedHandle — every argument is valid for the call's duration.
let child = match unsafe { spawn_host(session, &cmdline, &workdir, job_h) } {
Ok(child) => child,
Err(e) => {
@@ -340,6 +363,9 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
match reason {
Some(0) => {
// Stop: terminate the child and exit (the `child` drop closes its handles).
// SAFETY: `proc_h` is a HANDLE copy of the still-live `child.process` OwnedHandle (not
// dropped until end of iteration), so the process handle is open; TerminateProcess only
// signals termination by handle and passes no Rust memory.
unsafe {
let _ = TerminateProcess(proc_h, 0);
}
@@ -347,7 +373,10 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
}
Some(1) => {
// Session change: relaunch only if the active console session actually moved.
// SAFETY: `session_ev` borrows the process-lifetime SESSION_EVENT OwnedHandle; ResetEvent
// only clears its signalled state and passes no Rust memory.
unsafe { ResetEvent(session_ev) }.ok();
// SAFETY: WTSGetActiveConsoleSessionId takes no arguments and passes no pointers.
let now = unsafe { WTSGetActiveConsoleSessionId() };
if now != session {
tracing::info!(
@@ -355,6 +384,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
new = now,
"console session changed — relaunching host"
);
// SAFETY: `proc_h` copies the still-live `child.process` OwnedHandle (dropped only at
// end of iteration), so the handle is open; TerminateProcess only signals by handle.
unsafe {
let _ = TerminateProcess(proc_h, 0);
}
@@ -363,6 +394,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
}
// Same session (e.g. a stray notification) — keep waiting on the same child.
let r = wait_any(&[stop, proc_h], INFINITE);
// SAFETY: `proc_h` copies the still-live `child.process` OwnedHandle (dropped only at end
// of iteration), so the handle is open; TerminateProcess only signals by handle.
unsafe {
let _ = TerminateProcess(proc_h, 0);
}
@@ -394,11 +427,17 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
/// `true` if `h` is signalled within `ms`.
fn wait_one(h: HANDLE, ms: u32) -> bool {
// SAFETY: `&[h]` is a live one-element HANDLE slice the caller keeps open across the wait; the kernel
// reads exactly one handle (the binding derives the count from the slice length), bWaitAll=false,
// `ms` is a timeout — no pointers escape and the array is only read for this synchronous call.
unsafe { WaitForMultipleObjects(&[h], false, ms) == WAIT_OBJECT_0 }
}
/// Wait on several handles; returns the index of the first signalled, or `None` on timeout.
fn wait_any(handles: &[HANDLE], ms: u32) -> Option<usize> {
// SAFETY: `handles` is a live slice the caller keeps open across the wait; WaitForMultipleObjects
// reads exactly `handles.len()` handles (the binding derives the count from the slice), bWaitAll=false,
// `ms` is a timeout — the array is only read for this synchronous call and no pointers escape it.
let r = unsafe { WaitForMultipleObjects(handles, false, ms) };
let idx = r.0.wrapping_sub(WAIT_OBJECT_0.0);
(idx < handles.len() as u32).then_some(idx as usize)
@@ -1,5 +1,5 @@
//! USER-session WGC helper (Windows) — part of the two-process secure-desktop design
//! (docs/windows-secure-desktop.md).
//! (design/windows-secure-desktop.md).
//!
//! WGC won't activate under the SYSTEM account, but the host must run as SYSTEM for the secure
//! desktop. So the SYSTEM host spawns THIS helper in the interactive user session
@@ -12,6 +12,9 @@
//!
//! Wire framing on stdout, per AU: `[u32 len LE][u64 pts_ns LE][u8 keyframe][len bytes data]`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::capture::{dxgi::WinCaptureTarget, wgc::WgcCapturer, Capturer};
use crate::encode::{self, Codec};
use anyhow::{Context, Result};
@@ -72,6 +75,9 @@ pub fn run(opts: HelperOptions) -> Result<()> {
.name("pf-present-trigger".into())
.spawn(move || {
tracing::info!("present-trigger: starting D3D present loop on the virtual display");
// SAFETY: `present_trigger` is unsafe only for its Win32/D3D11 FFI; it has no caller
// preconditions (it creates and exclusively owns its own window, device, and swapchain on
// this dedicated thread), so the call is sound.
if let Err(e) = unsafe { present_trigger(w, h) } {
tracing::warn!("present-trigger error: {e:#}");
}
@@ -8,6 +8,9 @@
//! them, which let the SudoVDA backend be dropped without losing them (audit §9 / Goal 2 — done). The
//! plan's `windows/display_ccd.rs`. Extracted verbatim from the former SudoVDA backend before its removal.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::mem::size_of;
use windows::core::PCWSTR;
@@ -16,9 +19,9 @@ use windows::Win32::Devices::Display::{
QueryDisplayConfig, SetDisplayConfig, DISPLAYCONFIG_DEVICE_INFO_GET_ADVANCED_COLOR_INFO,
DISPLAYCONFIG_DEVICE_INFO_GET_SOURCE_NAME, DISPLAYCONFIG_DEVICE_INFO_SET_ADVANCED_COLOR_STATE,
DISPLAYCONFIG_GET_ADVANCED_COLOR_INFO, DISPLAYCONFIG_MODE_INFO, DISPLAYCONFIG_PATH_INFO,
DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME, QDC_ONLY_ACTIVE_PATHS,
SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION, SDC_SAVE_TO_DATABASE,
SDC_USE_SUPPLIED_DISPLAY_CONFIG,
DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME,
QDC_ONLY_ACTIVE_PATHS, SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION,
SDC_SAVE_TO_DATABASE, SDC_USE_SUPPLIED_DISPLAY_CONFIG,
};
use windows::Win32::Graphics::Gdi::{
ChangeDisplaySettingsExW, EnumDisplaySettingsW, CDS_TEST, CDS_UPDATEREGISTRY, DEVMODEW,
@@ -202,6 +205,10 @@ pub(crate) fn set_active_mode(gdi_name: &str, mode: Mode) {
dmSize: size_of::<DEVMODEW>() as u16,
..Default::default()
};
// SAFETY: `wname` is a live NUL-terminated UTF-16 device name (built above) whose pointer stays
// valid for the call; `&mut dm` is a live DEVMODEW with `dmSize` set that EnumDisplaySettingsW
// fills in for mode index `i`. Both outlive this synchronous call; the API only reads the name
// and writes `dm`, so nothing aliases.
let ok = unsafe {
EnumDisplaySettingsW(
PCWSTR(wname.as_ptr()),
@@ -269,6 +276,9 @@ pub(crate) fn set_active_mode(gdi_name: &str, mode: Mode) {
dmDisplayFrequency: chosen_hz,
..Default::default()
};
// SAFETY: `wname` is a live NUL-terminated UTF-16 device name and `&dm` is a live DEVMODEW describing
// the requested mode; both outlive the call. CDS_TEST only validates the mode (no apply), the two
// trailing args are null, and the API only reads its inputs.
let test = unsafe {
ChangeDisplaySettingsExW(PCWSTR(wname.as_ptr()), Some(&dm), None, CDS_TEST, None)
};
@@ -282,6 +292,9 @@ pub(crate) fn set_active_mode(gdi_name: &str, mode: Mode) {
);
return;
}
// SAFETY: same inputs as the CDS_TEST call above — `wname` (live NUL-terminated device name) and
// `&dm` (live DEVMODEW) both outlive the call; CDS_UPDATEREGISTRY applies the already-validated mode,
// and the API only reads its inputs.
let apply = unsafe {
ChangeDisplaySettingsExW(
PCWSTR(wname.as_ptr()),
@@ -43,7 +43,7 @@ Apollo is host-only. A stream flows: **nvhttp** (HTTPS pairing + serverinfo/appl
| Apollo — Audio capture, encode, transport (Windows host) | `audio.cpp`; `audio.h`; `audio.cpp`; `common.h`; `stream.cpp` | `audio.rs`; `audio/wasapi_cap.rs`; `audio/linux.rs`; `gamestream/audio.rs`; `punktfunk1.rs` |
| Apollo (Sunshine fork) — Input handling & injection | `input.cpp`; `input.cpp`; `keylayout.h`; `misc.cpp` | — |
| Apollo: App/process launch & display configuration (Windows host) | `process.cpp`; `display_device.cpp`; `process.h`; `virtual_display.h`; `misc.cpp`; `utils.cpp` | `vdisplay/sudovda.rs`; `vdisplay.rs`; `gamestream/apps.rs`; `library.rs`; `punktfunk1.rs`; `capture/wgc_relay.rs` |
| Apollo: Config, management/web UI, system tray | `config.h`; `config.cpp`; `confighttp.cpp`; `confighttp.h`; `system_tray.cpp`; `system_tray.h` | `mgmt.rs`; `mgmt_token.rs`; `main.rs`; `native_pairing.rs`; `library.rs`; `docs/windows-host.md` |
| Apollo: Config, management/web UI, system tray | `config.h`; `config.cpp`; `confighttp.cpp`; `confighttp.h`; `system_tray.cpp`; `system_tray.h` | `mgmt.rs`; `mgmt_token.rs`; `main.rs`; `native_pairing.rs`; `library.rs`; `design/windows-host.md` |
### Apollo — Protocol & streaming (RTP/FEC/ENet/RTSP/crypto)
@@ -354,7 +354,7 @@ The `formats[]` table (258-277) maps 2/6/8 channels to Stereo/5.1/7.1 with the G
- **Decouple ingest from injection via task-pool queue with lock-then-release batching** — The control-stream thread only enqueues bytes and schedules a task (src/input.cpp:1639-1643). A pool thread pops one packet, coalesces later same-type packets into it while holding the queue lock, then RELEASES the lock before the (potentially slow) SendInput/ViGEm call (src/input.cpp:1486-1520). — _For a low-latency streaming host this is the core anti-head-of-line-blocking pattern: a slow OS input call (e.g. SendInput crossing a desktop switch) never stalls the network/control thread, and bursts of mouse/scroll/controller packets collapse to one OS event per drain. punktfunk should mirror this: never call SendInput on the QUIC/control thread._
- **Type-aware packet batching with batch_result_e (batched / not_batchable / terminate_batch)** — batch() overloads (src/input.cpp:1208-1475) sum relative-mouse deltas and scroll amounts (with __builtin_add_overflow guards that terminate the batch on 16-bit overflow), take the latest absolute position, and collapse controller/touch/pen move/hover runs. terminate_batch stops at a state-changing event (button change, eventType change, active-mask change) so ordering semantics are preserved; not_batchable skips a non-matching controller but keeps scanning. — _Moonlight 'spams controller packets even when not necessary' (src/input.cpp:282). Batching cuts injected-event count under load without dropping state transitions — directly reduces input-to-screen jitter and OS overhead._
- **VK→scancode injection with normalization fallback ladder** — keyboard_update (src/platform/windows/input.cpp:608) prefers KEYEVENTF_SCANCODE using the static US-English VK_TO_SCANCODE_MAP (keylayout.h). If the client flagged the VK as non-normalized (SS_KBE_FLAG_NON_NORMALIZED) it falls back to MapVirtualKey under config::input.always_send_scancodes (excluding VK_LWIN/RWIN/PAUSE which misbehave), else sends a raw VK event. A curated switch adds KEYEVENTF_EXTENDEDKEY for the extended-key set (arrows, nav cluster, RWIN/RMENU/RCONTROL, numpad divide, apps). — _Many games read DirectInput/raw scancodes, not VK events; sending scancodes is essential for in-game key compatibility. The extended-key flag is required or arrow keys / right-modifiers misfire. This is a concrete table+logic punktfunk's Windows VK path can adopt verbatim._
- **Desktop-switch retry on every SendInput / InjectSyntheticPointerInput** — send_input (src/platform/windows/input.cpp:477) and inject_synthetic_pointer_input (line 499) retry once after calling syncThreadDesktop() (misc.cpp:251 — OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) when the call fails and the input desktop handle changed, tracked in a thread_local _lastKnownInputDesktop. — _On Windows the input desktop changes on UAC prompts, lock screen, and Ctrl+Alt+Del (secure desktop / Winlogon). Without re-binding the thread to the new desktop, all injected input silently fails. This is exactly the secure-desktop problem area called out in punktfunk's docs/memory — Apollo solves it cheaply per-call rather than with a second process._
- **Desktop-switch retry on every SendInput / InjectSyntheticPointerInput** — send_input (src/platform/windows/input.cpp:477) and inject_synthetic_pointer_input (line 499) retry once after calling syncThreadDesktop() (misc.cpp:251 — OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) when the call fails and the input desktop handle changed, tracked in a thread_local _lastKnownInputDesktop. — _On Windows the input desktop changes on UAC prompts, lock screen, and Ctrl+Alt+Del (secure desktop / Winlogon). Without re-binding the thread to the new desktop, all injected input silently fails. This is exactly the secure-desktop problem area called out in punktfunk's design/memory — Apollo solves it cheaply per-call rather than with a second process._
- **ViGEm dual-target gamepad with client-negotiated type selection** — alloc_gamepad (src/platform/windows/input.cpp:1175) picks X360 vs DS4 by precedence: explicit config (x360/ds4) > client-reported LI_CTYPE_PS/XBOX > motion_as_ds4 if accel/gyro present > touchpad_as_ds4 > default X360. It warns when capabilities (motion/touchpad/RGB) will be lost on X360. DS4 path packs motion, touchpad, and battery into DS4_REPORT_EX. — _DS4 is the only ViGEm target that carries gyro/accel, touchpad, and lightbar; X360 is the safe default. punktfunk already does client-negotiated pad type — Apollo's capability-driven auto-selection (motion/touchpad presence → DS4) and the explicit 'feature will be lost' warnings are a more refined policy worth porting._
- **DS4 timestamped resend loop (ds4_update_ts_and_send)** — Every DS4 report advances wTimestamp by elapsed time in 5.333µs units and re-arms a 100ms repeat_task (src/platform/windows/input.cpp:1454-1481), so the 16-bit timestamp never stalls/overflows even when no new input arrives. — _'Some applications require updated timestamp values to register DS4 input' (line 1450). Without the heartbeat, motion-aware games ignore a held DS4. Non-obvious gotcha that any DS4-emulating host must replicate._
- **Synthetic pen/touch via InjectSyntheticPointerInput with periodic refresh and slot compaction** — Per-client synthetic pointer devices (CreateSyntheticPointerDevice, Win10 1809+). Touch slots are kept contiguous via perform_touch_compaction (line 715, required by the API), edge-triggered flags (DOWN/UP/CANCELED/UPDATE) are cleared after each frame (line 900/1020), and a 50ms repeat task (ISPI_REPEAT_INTERVAL) re-injects held state because Windows auto-cancels untouched interactions after ~1s. — _Touch/pen are stateful, slot-indexed, and self-cancelling — a fundamentally different injection model than mouse/keyboard. If punktfunk grows touch/pen, this is the reference for the Windows-specific contiguity + refresh requirements._
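
The type-aware batching described in the list above can be sketched in a few lines. This is only an illustration, not Apollo's code: the `InputEvent` shape, the `batch`/`drain_one` names, and the choice to treat any kind mismatch as a barrier are assumptions made for the example. Only the three outcomes (batched / not batchable / terminate batch), the 16-bit overflow guard, and the "take the latest, stop at a state change" rules come from the notes.

```rust
use std::collections::VecDeque;

/// Outcome of folding `next` into `head` (names mirror Apollo's batch_result_e).
#[derive(Debug, PartialEq)]
pub enum BatchResult {
    Batched,        // `next` was merged into `head` and can be dropped
    NotBatchable,   // unrelated (other controller); leave it queued, keep scanning
    TerminateBatch, // ordering barrier or overflow; stop scanning
}

/// Hypothetical, simplified event type for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum InputEvent {
    RelMouse { dx: i16, dy: i16 },
    Button { id: u8, down: bool },
    Controller { index: u8, buttons: u16, lx: i16 },
}

pub fn batch(head: &mut InputEvent, next: &InputEvent) -> BatchResult {
    match (head, next) {
        // Relative motion sums; a 16-bit overflow ends the batch instead of wrapping.
        (InputEvent::RelMouse { dx, dy }, InputEvent::RelMouse { dx: ndx, dy: ndy }) => {
            match (dx.checked_add(*ndx), dy.checked_add(*ndy)) {
                (Some(x), Some(y)) => {
                    *dx = x;
                    *dy = y;
                    BatchResult::Batched
                }
                _ => BatchResult::TerminateBatch,
            }
        }
        (
            InputEvent::Controller { index, buttons, lx },
            InputEvent::Controller { index: ni, buttons: nb, lx: nlx },
        ) => {
            if *index != *ni {
                BatchResult::NotBatchable // different pad: skip it, keep scanning
            } else if *buttons != *nb {
                BatchResult::TerminateBatch // state change: preserve ordering
            } else {
                *lx = *nlx; // same state, newer axis value wins
                BatchResult::Batched
            }
        }
        // Assumption: any other pairing is an ordering barrier.
        _ => BatchResult::TerminateBatch,
    }
}

/// Pop one event and coalesce later compatible events into it.
pub fn drain_one(queue: &mut VecDeque<InputEvent>) -> Option<InputEvent> {
    let mut head = queue.pop_front()?;
    let mut i = 0;
    while i < queue.len() {
        match batch(&mut head, &queue[i]) {
            BatchResult::Batched => {
                queue.remove(i);
            }
            BatchResult::NotBatchable => i += 1,
            BatchResult::TerminateBatch => break,
        }
    }
    Some(head)
}
```

In a real host the queue lock would be held only around `drain_one`, and the OS injection call would run after the lock is released, as the first bullet of the list describes.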
@@ -479,7 +479,7 @@ A single static `struct tray` (l.112) holds icon path, tooltip, a fixed menu arr
- **Per-vendor encoder enum string translators** — Whole namespaces (nv/amd/qsv/vt/sw, config.cpp l.53-357) map human strings ('ultralowlatency','cqp','superfast') to encoder SDK integer constants, with low-latency presets as the DEFAULTS (e.g. amd usage = ultralowlatency l.469-471, sw preset 'superfast'/'zerolatency' l.451-453, nvenc realtime HAGS + high-power mode on by default l.457-459). — _Defaults are explicitly tuned for latency, not quality — the encoder is configured ultra-low-latency out of the box. A low-latency host's config defaults should bias the same way; this is the concrete table punktfunk can port for AMD/QSV/VT vendor parity._
- **Embedded HTTPS server sharing the host TLS identity** — confighttp uses SimpleWeb::Server<HTTPS> seeded with nvhttp.cert/pkey (confighttp.cpp l.1511) — the SAME cert the Moonlight/GameStream pairing uses — on a fixed port offset (PORT_HTTPS=1 → base+1). — _One identity, one cert, management UI and stream control on adjacent ports. punktfunk already shares its cert.pem between GameStream pairing and punktfunk/1; the lesson is the web console can reuse it rather than carrying a separate mgmt TLS story._
- **Single-string session cookie with salted-hash validation** — authenticate() (l.179) validates hex(hash(cookie + salt)) against an in-memory sessionCookie with a 15-day steady_clock expiry; login (l.1469) generates the raw cookie with rand_alphabet(64) and stores only its hash. checkIPOrigin gates by pc/lan/wan BEFORE auth. — _Contrast with punktfunk's mgmt API (bearer token in ~/.config/punktfunk/mgmt-token + web login gate). Apollo's cookie+IP-origin model is simpler for a desktop single-operator host and avoids a static long-lived token; worth considering for the web console's UX._
- **Windows service↔UI self-elevation handshake** — config::parse (l.1490-1534): a non-admin Start-Menu shortcut self-relaunches as admin (ShellExecuteExW 'runas' --shortcut-admin l.1511), starts the service, wait_for_ui_ready() polls the Win32 TCP table for the LISTEN socket (entry_handler.cpp l.236), then launch_ui(), and returns 1 so the shortcut process never starts a stream. — _This is the mature answer to the exact problem punktfunk's Windows host hit (docs/windows-host.md 'secure-desktop two-process design', Session-0 vs interactive session). Apollo solves UI-launch-from-service cleanly; the TCP-table readiness poll is directly portable._
- **Windows service↔UI self-elevation handshake** — config::parse (l.1490-1534): a non-admin Start-Menu shortcut self-relaunches as admin (ShellExecuteExW 'runas' --shortcut-admin l.1511), starts the service, wait_for_ui_ready() polls the Win32 TCP table for the LISTEN socket (entry_handler.cpp l.236), then launch_ui(), and returns 1 so the shortcut process never starts a stream. — _This is the mature answer to the exact problem punktfunk's Windows host hit (design/windows-host.md 'secure-desktop two-process design', Session-0 vs interactive session). Apollo solves UI-launch-from-service cleanly; the TCP-table readiness poll is directly portable._
- **Tray thread DACL hardening for SYSTEM-context survival** — init_tray() (l.143-197) adds an EXPLICIT_ACCESS ACE granting SYNCHRONIZE to Everyone on the current thread handle before registering the icon, and busy-waits for GetShellWindow() (l.201) so the icon registers reliably across logoff/logon. — _When the host runs as a Windows service (SYSTEM), Explorer can't open the thread to detect termination → ghost tray icons forever. punktfunk's Windows host, if it ever runs as a service with a tray, needs this exact DACL fix._
- **JSON-list config values parsed via ptree wrapping** — Multi-line bracketed values (global_prep_cmd, server_cmd, dd_mode_remapping) are extracted as raw strings by the flat parser, wrapped in a synthetic JSON object, then parsed by boost ptree (list_prep_cmd_f l.949, mode_remapping_from_view l.411). — _A pragmatic hybrid: flat key=value for the human-editable 90%, embedded JSON for structured fields, without committing to full-JSON config. Shows how to grow a flat config without a rewrite._
@@ -680,7 +680,7 @@ Both transports use the persistent `AudioCapSlot` (gamestream/audio.rs:251-257)
### Input handling & injection — 🔴 Apollo ahead
For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp). punktfunk's Windows host covers mouse/keyboard/scroll/X360-only; touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and only the Xbox 360 virtual pad exists on Windows. Apollo also has the more efficient secure-desktop model (retry-only) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread — punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo, but those are punktfunk/1-specific; on the shared Windows-host injection surface Apollo is the more complete, battle-tested implementation. punktfunk's docs/windows-secure-desktop.md already flags the retry-only refactor as planned-but-unshipped, confirming the gap.
For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp). punktfunk's Windows host covers mouse/keyboard/scroll/X360-only; touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and only the Xbox 360 virtual pad exists on Windows. Apollo also has the more efficient secure-desktop model (retry-only) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread — punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo, but those are punktfunk/1-specific; on the shared Windows-host injection surface Apollo is the more complete, battle-tested implementation. punktfunk's design/windows-secure-desktop.md already flags the retry-only refactor as planned-but-unshipped, confirming the gap.
**How punktfunk does it.**
@@ -748,7 +748,7 @@ For the Windows host specifically, Apollo is clearly ahead on this subsystem. Ap
- punktfunk has TWO app surfaces by design: the GameStream apps.json catalog (Moonlight compat) AND a richer punktfunk/1 library (Steam local scan + custom store + CDN art + uniform GameEntry grid). Apollo has only the apps.json catalog because it ships no client.
- punktfunk's launch security model is deliberately client-can't-inject: the client sends only a store-qualified id and the host resolves it against its OWN library (library.rs:394-412), with steam appid validated digits-only. Apollo trusts its own apps.json cmds (it has no untrusted remote launch id).
- punktfunk keeps NO async on the per-frame path; the SudoVDA watchdog pinger and capture are native threads. Apollo's libdisplaydevice RetryScheduler is its own machinery; punktfunk has no equivalent scheduler by choice (yet — see candidate improvements).
- punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically to capture the secure/Winlogon desktop — a deliberate, documented design (docs/windows-secure-desktop.md) that goes beyond what stock Apollo needs.
- punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically to capture the secure/Winlogon desktop — a deliberate, documented design (design/windows-secure-desktop.md) that goes beyond what stock Apollo needs.
**Transfer candidates from Apollo (6):** _Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)_, _Display-config apply/revert with a retry scheduler and guaranteed revert on disconnect_, _Set HDR on the virtual display and advertise IsHdrSupported when the client requests it_, _Per-(app,client) stable virtual-display GUID instead of one fixed MONITOR_GUID_, _Inject per-app launch env (client res/fps/HDR/audio + status) for launch scripts_, _auto_detach heuristic for launcher-style apps (Steam/UWP) that exit immediately_ — see Part 4.
@@ -765,7 +765,7 @@ On the API itself punktfunk is arguably ahead (versioned `/api/v1`, compile-time
punktfunk splits the control surface into three pieces and deliberately keeps them OUT of the host binary where Apollo bundles them in.
##### 1. Management plane = a versioned REST API only (`crates/punktfunk-host/src/mgmt.rs`)
- An axum `Router` (`mgmt.rs:166` `fn app`) under `/api/v1`, single source of truth shared between the live server and the `openapi` subcommand (`mgmt.rs:195` `api_router_parts`, `main.rs:86`). The OpenAPI 3.1 doc is generated at compile time with `utoipa` and a checked-in copy is drift-tested against `docs/api/openapi.json` (`mgmt.rs:1582` `openapi_document_is_complete_and_checked_in`). This is a real maturity advantage over Apollo, which has no machine-readable API spec.
- An axum `Router` (`mgmt.rs:166` `fn app`) under `/api/v1`, single source of truth shared between the live server and the `openapi` subcommand (`mgmt.rs:195` `api_router_parts`, `main.rs:86`). The OpenAPI 3.1 doc is generated at compile time with `utoipa` and a checked-in copy is drift-tested against `api/openapi.json` (`mgmt.rs:1582` `openapi_document_is_complete_and_checked_in`). This is a real maturity advantage over Apollo, which has no machine-readable API spec.
- Routes: host info/capabilities/port map (`mgmt.rs:590`), live status (`mgmt.rs:671`), paired GameStream clients list/unpair (`mgmt.rs:707`,`752`), the GameStream PIN flow (`mgmt.rs:789`,`814`), the native punktfunk/1 pairing surface — arm/disarm/status/list/unpair (`mgmt.rs:870`-`994`), **delegated pairing approval** via a pending-device queue (`mgmt.rs:1011`,`1049`,`1094`), session stop + force-IDR (`mgmt.rs:1120`,`1144`), and game-library CRUD (`mgmt.rs:1171`-`1252`).
- **HTTPS always, even on loopback** (`mgmt.rs:75` `run`): it runs the rustls handshake itself via tokio-rustls so it can surface the verified peer cert to handlers (`mgmt.rs:115` `serve_https`), reusing the host's persistent identity cert that clients already pin (`mgmt.rs:90`).
- **Dual auth** (`mgmt.rs:518` `require_auth`): a paired native client authenticates by its **mTLS certificate fingerprint** (matched against the native paired store, no token needed); everyone else (the web console / admin) uses a bearer token compared in constant time (`mgmt.rs:551` `token_eq` via SHA-256 digest compare). `/api/v1/health` is the only unauthenticated route. This is stronger than Apollo's single-global-session-cookie scheme (Apollo `confighttp.cpp` has exactly one `std::string sessionCookie`).
@@ -784,7 +784,7 @@ A token always exists with zero operator steps: env `PUNKTFUNK_MGMT_TOKEN` wins,
There is no system tray, no balloon notifications, and no "open the UI in the browser" entry point anywhere in `crates/punktfunk-host`. Apollo has a full cross-platform tray (`system_tray.cpp`) with state-driven icon/notification updates and menu callbacks.
##### 6. Windows launch story = scripts, not in-binary
The two-process secure-desktop design exists for *capture* (`main.rs:204` `wgc-helper` subcommand + `capture/wgc_relay.rs` `CreateProcessAsUserW`), but the service/desktop launch dance is handled by external scripts (scheduled task -> PsExec64 -> launch.vbs -> host-run.cmd; `docs/windows-host.md:77-96`). punktfunk has no in-binary service install, no self-elevation, no "launch UI in browser", and no tray — all of which Apollo bakes into `config.cpp`/`entry_handler.cpp`/`system_tray.cpp`.
The two-process secure-desktop design exists for *capture* (`main.rs:204` `wgc-helper` subcommand + `capture/wgc_relay.rs` `CreateProcessAsUserW`), but the service/desktop launch dance is handled by external scripts (scheduled task -> PsExec64 -> launch.vbs -> host-run.cmd; `design/windows-host.md:77-96`). punktfunk has no in-binary service install, no self-elevation, no "launch UI in browser", and no tray — all of which Apollo bakes into `config.cpp`/`entry_handler.cpp`/`system_tray.cpp`.
**Intentional divergences (by design, not gaps):**
@@ -1555,14 +1555,14 @@ punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely matu
##### Where punktfunk is weaker / missing / fragile
1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`docs/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`docs/windows-secure-desktop.md:89`). PsExec is a 3rd-party SysInternals tool, not redistributable cleanly, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`.
1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`design/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`design/windows-secure-desktop.md:89`). PsExec is a 3rd-party SysInternals tool, not redistributable cleanly, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`.
2. **No nvprefs / NvAPI at all.** `grep` for `nvprefs|NvAPI|DRS_|PREFERRED_PSTATE|DXPRESENT` across the host returns nothing. No PREFERRED_PSTATE_MAX for the encoder, no OGL_CPL_PREFER_DXPRESENT (so GL/Vulkan fullscreen apps may not be capturable via WGC/DDA), and no undo-file crash safety.
3. **No DXGI GPU-preference / output-reparenting hook.** No MinHook of `NtGdiDdDDIGetCachedHybridQueryValue`. On a hybrid/Optimus box DXGI can reparent the SudoVDA output onto the render GPU and break DDA. punktfunk's "search all adapters" partly papers over this but does not prevent the reparenting itself.
4. **mDNS uses the cross-platform `mdns-sd` crate, not Windows-native `DnsServiceRegister`** (`discovery.rs:17`). It works, but it does NOT carry Apollo's RFC-1035 empty-TXT fix — and the GameStream/Moonlight mDNS path on Windows is unverified (`docs/windows-host.md:46`). A non-RFC-compliant TXT can be rejected by Apple's resolver.
4. **mDNS uses the cross-platform `mdns-sd` crate, not Windows-native `DnsServiceRegister`** (`discovery.rs:17`). It works, but it does NOT carry Apollo's RFC-1035 empty-TXT fix — and the GameStream/Moonlight mDNS path on Windows is unverified (`design/windows-host.md:46`). A non-RFC-compliant TXT can be rejected by Apple's resolver.
5. **No stream-start system tuning.** No `NtSetTimerResolution`/`timeBeginPeriod`, no `DwmEnableMMCSS`, no `SetPriorityClass(HIGH_PRIORITY_CLASS)`, no `SetThreadExecutionState(ES_DISPLAY_REQUIRED)`, no WLAN media-streaming mode, no Mouse-Keys-on-headless trick. (Linux has none of this either, but on Windows these are real latency/jitter levers Apollo proves out.)
6. **No `factory->IsCurrent()` per-frame check.** punktfunk reacts to errors from `AcquireNextFrame` but does not proactively detect HDR/topology changes the way Apollo does each frame (`display_base.cpp:235`) — it relies on ACCESS_LOST firing, which it usually does, but IsCurrent is the cleaner signal.
7. **No `is_user_session_locked()` / CCD pre-flight.** Before a mode-set or isolation, Apollo checks `WTSQuerySessionInformationW` + `SetDisplayConfig(SDC_VALIDATE)` (`utils.cpp:184-237`); punktfunk just attempts and handles failure, which can thrash the display during a lock.
8. **Clock epoch is `SystemTime::now()` (`dxgi.rs:1530`), not `GetSystemTimePreciseAsFileTime`.** The doc itself flags this as a cross-machine-latency risk (`docs/windows-host.md:284-286`); std SystemTime on Windows historically has coarser (~1–15 ms) resolution than the precise FILETIME API, which can corrupt the ClockProbe/ClockEcho skew handshake.
8. **Clock epoch is `SystemTime::now()` (`dxgi.rs:1530`), not `GetSystemTimePreciseAsFileTime`.** The doc itself flags this as a cross-machine-latency risk (`design/windows-host.md:284-286`); std SystemTime on Windows historically has coarser (~1–15 ms) resolution than the precise FILETIME API, which can corrupt the ClockProbe/ClockEcho skew handshake.
#### Transfer opportunities
@@ -1772,7 +1772,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* small
- **Apollo does:** send_input() / inject_synthetic_pointer_input() call SendInput FIRST, and only on failure (0 injected) re-run syncThreadDesktop() (OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) and retry once, tracking the desktop in a thread_local _lastKnownInputDesktop — src/platform/windows/input.cpp:477,499 + src/platform/windows/misc.cpp:251
- **punktfunk gap:** SendInputInjector::inject() calls reattach_input_desktop() (an OpenInputDesktop+SetThreadDesktop+CloseDesktop) at the TOP of EVERY event — crates/punktfunk-host/src/inject/sendinput.rs:97,50-69. This is a syscall triple per mouse-move; punktfunk's own docs/windows-secure-desktop.md:78-80 lists this exact refactor (step 2) as planned but unshipped.
- **punktfunk gap:** SendInputInjector::inject() calls reattach_input_desktop() (an OpenInputDesktop+SetThreadDesktop+CloseDesktop) at the TOP of EVERY event — crates/punktfunk-host/src/inject/sendinput.rs:97,50-69. This is a syscall triple per mouse-move; punktfunk's own design/windows-secure-desktop.md:78-80 lists this exact refactor (step 2) as planned but unshipped.
- **Proposal:** Inject first; cache the HDESK thread-local; only on a 0/partial SendInput result call reattach_input_desktop() and retry once. Use DF_ALLOWOTHERACCOUNTHOOK in the OpenInputDesktop access (sendinput.rs:52-56 currently passes DESKTOP_CONTROL_FLAGS(0)) so the secure desktop is reachable. Keeps the steady-state hot path to a single SendInput call.
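
A minimal sketch of the inject-first, retry-once policy from the proposal above, written against a hypothetical `InputDesktop` trait so it can be run off Windows. The trait, `inject_with_retry`, and `FakeDesktop` are names invented for this example; on the real host `send` would wrap `SendInput` and `reattach` would wrap the existing `reattach_input_desktop()`. Resuming from the first unaccepted event after a partial send is an assumption, not something the notes specify.

```rust
/// Hypothetical seam over the two OS operations, so the retry policy can be
/// exercised without Windows.
pub trait InputDesktop {
    /// Number of events the OS accepted (SendInput-style return value).
    fn send(&mut self, events: &[u32]) -> usize;
    /// Rebind the thread to the current input desktop; `true` if it changed.
    fn reattach(&mut self) -> bool;
}

/// Inject first; reattach and retry at most once, and only after a short send.
pub fn inject_with_retry<D: InputDesktop>(desktop: &mut D, events: &[u32]) -> usize {
    let sent = desktop.send(events);
    if sent == events.len() {
        return sent; // steady state: exactly one OS call
    }
    if !desktop.reattach() {
        return sent; // desktop unchanged, so a retry would fail the same way
    }
    // Assumption: resume from the first event that was not accepted.
    sent + desktop.send(&events[sent..])
}

/// Test double: rejects everything until `reattach` has been called once.
#[derive(Default)]
pub struct FakeDesktop {
    pub attached: bool,
    pub send_calls: usize,
    pub reattach_calls: usize,
}

impl InputDesktop for FakeDesktop {
    fn send(&mut self, events: &[u32]) -> usize {
        self.send_calls += 1;
        if self.attached {
            events.len()
        } else {
            0
        }
    }

    fn reattach(&mut self) -> bool {
        self.reattach_calls += 1;
        let changed = !self.attached;
        self.attached = true;
        changed
    }
}
```

With this shape the per-event cost in the common case is a single `send`; the desktop syscalls are paid only on the event that first fails after a desktop switch.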
#### 2. Detect resolution/format change on the acquire hot path, not only during rebuild
@@ -1846,7 +1846,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
*Area:* `cmp:config-management` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
- **Apollo does:** system_tray.cpp builds a single static tray struct with a menu (Open/Force-stop/Reset-display/Restart/Quit, l.112-141) and pushes state changes from the streaming pipeline — update_tray_playing/pausing/stopped/launch_error/require_pin/paired/client_connected (l.238-412) each swap the icon + raise a balloon notification; init_tray hardens the thread DACL so the icon survives running as SYSTEM (l.143-204); a 50 ms polling thread drives it (tray_thread_worker l.415).
- **punktfunk gap:** No tray code exists anywhere in crates/punktfunk-host (grep for tray/notify-rust/balloon returns nothing). On Windows the host runs windowless as SYSTEM in Session 1 via external scripts (docs/windows-host.md:77-84) with the only operator feedback being a redirected log file — there is no visible, clickable status/control surface for a desktop user.
- **punktfunk gap:** No tray code exists anywhere in crates/punktfunk-host (grep for tray/notify-rust/balloon returns nothing). On Windows the host runs windowless as SYSTEM in Session 1 via external scripts (design/windows-host.md:77-84) with the only operator feedback being a redirected log file — there is no visible, clickable status/control surface for a desktop user.
- **Proposal:** Add an optional system-tray plane behind a feature/flag using a Rust tray crate (e.g. tray-icon) spawned on its own native thread (no async on the per-frame path). Drive it from the existing AppState atomics/locks already exposed by mgmt.rs get_status (streaming/audio_streaming/pin_pending/session) — poll or push on state change to swap icon + show balloons (connected, pairing PIN, launch error). Menu items call the SAME primitives the API uses (stop_session, force_idr, native arm-pairing, quit). On Windows replicate Apollo's thread-DACL hardening so the icon shows when launched as SYSTEM in the interactive session.
#### 11. Treat S_OK-with-no-change frames as timeouts via DXGI update flags
@@ -1923,7 +1923,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
*Area:* `cmp:config-management` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** config.cpp:1490-1534 handles the Windows shortcut/service launch dance inside the binary: --shortcut/--shortcut-admin handling, ShellExecuteExW(runas, --shortcut-admin) to self-elevate when the service isn't running, waits for the service, wait_for_ui_ready(), launch_ui(), then returns 1 so the foreground process does NOT also start a stream host. This is Sunshine/Apollo's mature service<->UI two-process split that makes one-click launch work.
- **punktfunk gap:** punktfunk has no service-install / self-elevation / interactive-session bring-up in the binary. Deployment is documented as a manual chain of external scripts — scheduled task -> PsExec64 -i 1 -> launch.vbs -> host-run.cmd (docs/windows-host.md:77-96) — fragile and operator-hostile. main.rs has no install/service subcommand.
- **punktfunk gap:** punktfunk has no service-install / self-elevation / interactive-session bring-up in the binary. Deployment is documented as a manual chain of external scripts — scheduled task -> PsExec64 -i 1 -> launch.vbs -> host-run.cmd (design/windows-host.md:77-96) — fragile and operator-hostile. main.rs has no install/service subcommand.
- **Proposal:** Add `punktfunk-host install`/`uninstall`/`service` subcommands (Windows-gated) that register a service or an Interactive/Highest scheduled task to launch the host in Session 1 (the documented requirement for DXGI duplication + SendInput), and the self-elevate-if-not-running shortcut path. Reuse the existing capture/wgc_relay CreateProcessAsUserW machinery already in the crate. This codifies the script chain into the binary without touching the per-frame path or core.
#### 21. Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame
@@ -1962,7 +1962,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
*Area:* `win:system-secure-desktop` · *Windows-host:* yes · *Severity:* high · *Effort:* large
- **Apollo does:** SunshineSvc.exe runs as LocalSystem in Session 0, loops on WTSGetActiveConsoleSessionId, clones its own token with DuplicateTokenEx(TokenPrimary)+SetTokenInformation(TokenSessionId) and CreateProcessAsUserW into winsta0\\default inside a per-session job object (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE|BREAKAWAY_OK); opts into SERVICE_ACCEPT_SESSIONCHANGE and on WTS_CONSOLE_CONNECT terminates+relaunches the host in the new session (tools/sunshinesvc.cpp:95,111,239,256,267,276-294)
- **punktfunk gap:** punktfunk has no Windows service; launch is a PsExec64 -s -i 1 scheduled task hard-coded to session 1 (docs/windows-host.md:78-84), with the SERVICE_CONTROL_SESSIONCHANGE relaunch listed as unimplemented step 6 (docs/windows-secure-desktop.md:89). Launch scripts are not even in the repo.
- **punktfunk gap:** punktfunk has no Windows service; launch is a PsExec64 -s -i 1 scheduled task hard-coded to session 1 (design/windows-host.md:78-84), with the SERVICE_CONTROL_SESSIONCHANGE relaunch listed as unimplemented step 6 (design/windows-secure-desktop.md:89). Launch scripts are not even in the repo.
- **Proposal:** Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.
#### 25. Elevate capture/encode/send thread priority on the host hot path
+1 -1
@@ -40,7 +40,7 @@ the GPU/compositor stack of the box it runs on). What is:
| Image | Source | Notes |
|---|---|---|
| `git.unom.io/unom/punktfunk-web` | `web/Dockerfile` (repo-root context — orval needs `docs/api/openapi.json`) | Nitro `bun` bundle; `PORT` (3000) and `PUNKTFUNK_MGMT_URL` env at runtime |
| `git.unom.io/unom/punktfunk-web` | `web/Dockerfile` (repo-root context — orval needs `api/openapi.json`) | Nitro `bun` bundle; `PORT` (3000) and `PUNKTFUNK_MGMT_URL` env at runtime |
| `git.unom.io/unom/punktfunk-docs` | `docs-site/Dockerfile` | This site; `PORT` (3000) |
| `git.unom.io/unom/punktfunk-rust-ci` | `ci/rust-ci.Dockerfile` | Ubuntu 26.04 + FFmpeg 8/PipeWire/GL/GBM dev libs + a libcuda **link stub** (driver userspace, no kernel module) + pinned rustup — the container `ci.yml`'s Rust job runs in |
+246
@@ -0,0 +1,246 @@
# Stats capture & graphing — design
Goal: let an operator **enable performance-stats capture from the web console**, play a
session, **stop**, and **review the captured time-series as graphs** in the web console.
Captures are **saved to disk** (browse/compare past sessions; survive host restart) and
cover **both** streaming paths: native punktfunk/1 (`virtual_stream`) and GameStream/Moonlight
(`gamestream/stream.rs`).
This builds on the existing per-stage instrumentation (today gated by `PUNKTFUNK_PERF=1`,
stdout-only, read once at startup). We make recording **runtime-toggleable**, route the same
aggregates into a **shared ring → on-disk recording**, and expose it over the mgmt REST API +
web console.
---
## 1. Host: shared `StatsRecorder`
New module `crates/punktfunk-host/src/stats_recorder.rs`. One `Arc<StatsRecorder>` is created
once in the unified host entry (`gamestream::serve`, the `serve` subcommand) alongside
`Arc<NativePairing>`, and shared with **both** the mgmt API (`MgmtState`) and the streaming
loops (threaded through `punktfunk1::serve` → `SessionContext` → `virtual_stream`/`send_loop`,
and into the GameStream encode loop). Mirror the existing `NativePairing` Arc-sharing pattern
exactly.
### Data model (serde + utoipa `ToSchema`; this is the wire + on-disk shape)
```rust
/// One pipeline stage's latency in a window (microseconds).
pub struct StageTiming {
pub name: String, // "capture" | "submit" | "encode" | "packetize" | "send"
pub p50_us: f32,
pub p99_us: f32,
}
/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
pub struct StatsSample {
pub t_ms: u64, // ms since capture start (monotonic, from a stored Instant)
pub session_id: u32, // disambiguates concurrent sessions (usually constant)
pub stages: Vec<StageTiming>, // ordered pipeline stages for this path
pub fps: f32, // genuine NEW frames/s from the source
pub repeat_fps: f32, // re-encoded holds/s (source-starvation indicator)
pub mbps: f32, // tx goodput (Mb/s)
pub bitrate_kbps: u32, // configured target bitrate
pub frames_dropped: u32, // delta in this window
pub packets_dropped: u32, // delta (receiver-side / reassembler), where known
pub send_dropped: u32, // delta (host send-buffer overflow / EAGAIN)
pub fec_recovered: u32, // delta (shards recovered)
}
pub struct CaptureMeta {
pub id: String, // "2026-06-26T20-14-03Z_5120x1440" — also the filename stem
pub started_unix_ms: u64,
pub duration_ms: u64,
pub kind: String, // "native" | "gamestream"
pub width: u32,
pub height: u32,
pub fps: u32,
pub codec: String, // "h264" | "hevc" | "av1"
pub client: String, // short label / fingerprint prefix, or "" if unknown
pub sample_count: u32,
}
pub struct Capture {
pub meta: CaptureMeta,
pub samples: Vec<StatsSample>,
}
pub struct StatsStatus {
pub armed: bool, // capture currently running
pub sample_count: u32, // samples in the in-progress capture
pub started_unix_ms: u64, // 0 if idle
pub kind: String, // path of the in-progress capture, "" if idle
}
```
Stage sets per path (ordered, roughly the per-frame critical path so stacking is meaningful):
- **native**: `capture` (try_latest ring read + color convert), `submit` (NVENC enqueue),
`encode` (lock_bitstream = NVENC schedule + ASIC — the dominant stage under GPU load),
`send` (paced_submit: seal + FEC + pace + sendmmsg).
- **gamestream**: `capture`, `encode`, `packetize` (poll+FEC+packetize), `send`.
> Native naming: today's vectors are `st_cap` → `capture`, `st_submit` → `submit`,
> `st_wait` → `encode`, `pace_us` → `send`. (`encode_us` total ≈ capture+submit+encode; we do not
> emit it as a stage to avoid double-counting — it's implied by the stack.)
### Recorder API
```rust
pub struct StatsRecorder { /* dir, armed: AtomicBool, live: Mutex<Option<Live>>, next_sid: AtomicU32 */ }
impl StatsRecorder {
pub fn new(dir: PathBuf) -> Arc<Self>; // creates dir (0700) if missing
pub fn is_armed(&self) -> bool; // cheap Relaxed atomic load — called on the hot path
/// Arm a new capture. No-op if already armed (returns current status).
pub fn start(&self) -> StatsStatus;
/// A streaming loop announces itself when it first records while armed.
/// Seeds CaptureMeta (kind/w/h/fps/codec/client) on the FIRST registration. Returns session_id.
pub fn register_session(&self, kind: &'static str, w: u32, h: u32, fps: u32, codec: &str, client: &str) -> u32;
/// Append one aggregated sample (called from the loops' existing ~2 s/~1 s boundary).
/// Bounded: cap at MAX_SAMPLES (e.g. 5400 ≈ 3 h @ 2 s). On overflow, stop appending and
/// set a `truncated` flag (DO NOT drop oldest — a saved recording must keep its start).
pub fn push_sample(&self, session_id: u32, sample: StatsSample);
/// Disarm + finalize: write <dir>/<id>.json atomically, clear live, return saved meta.
pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>>;
pub fn status(&self) -> StatsStatus;
pub fn live_snapshot(&self) -> Option<Capture>; // clone of the in-progress capture for live graphing
pub fn list(&self) -> Vec<CaptureMeta>; // scan dir, parse meta only, newest first
pub fn load(&self, id: &str) -> std::io::Result<Capture>;
pub fn delete(&self, id: &str) -> std::io::Result<()>;
}
```
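The overflow rule documented on `push_sample` (stop appending, keep the start, flag the truncation) could look roughly like the sketch below. `BoundedLog`, its `max` field, and `snapshot` are illustrative names, not the recorder's actual internals, and recovering the guard from a poisoned lock is only one assumed way to keep the lock usable after a panic.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the recorder's in-progress sample buffer.
pub struct BoundedLog<T> {
    max: usize,
    inner: Mutex<Inner<T>>,
}

struct Inner<T> {
    items: Vec<T>,
    truncated: bool,
}

impl<T: Clone> BoundedLog<T> {
    pub fn new(max: usize) -> Self {
        Self {
            max,
            inner: Mutex::new(Inner { items: Vec::new(), truncated: false }),
        }
    }

    /// Append one item. Returns false (and sets `truncated`) once the cap is hit;
    /// earlier items are never evicted, so the recording keeps its start.
    pub fn push(&self, item: T) -> bool {
        // Recover the guard even if another thread panicked while holding the lock.
        let mut g = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        if g.items.len() >= self.max {
            g.truncated = true;
            return false;
        }
        g.items.push(item);
        true
    }

    /// Clone of the current contents plus the truncation flag.
    pub fn snapshot(&self) -> (Vec<T>, bool) {
        let g = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        (g.items.clone(), g.truncated)
    }
}
```

Because the lock is only taken at the aggregation boundary, a plain `Mutex` here does not touch the per-frame path.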
Invariants / safety:
- **No async on the per-frame path.** `is_armed()` is a `Relaxed` atomic load; sample
construction happens only at the existing 2 s / 1 s aggregation boundary, never per frame.
- **`id` is path-traversal-safe.** `load`/`delete` MUST reject any id not matching
`^[A-Za-z0-9._-]+$` (no `/`, no `..`, no `:` — keep it a valid Windows filename), and only ever
join `dir/<id>.json`. Return NotFound on reject. (Endpoints are bearer-authed, but defend in
depth.)
- **Bounded memory.** `MAX_SAMPLES` cap; truncate (keep oldest), never unbounded.
- **Atomic disk write.** Write to `<id>.json.tmp` then rename, so a crash mid-write can't leave
a half file. Pretty-print not required; compact JSON is fine.
- Captures dir: `~/.config/punktfunk/captures/` (next to `cert.pem` etc.). Resolve via the same
config-dir helper the rest of the host uses.
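A minimal sketch of the id check and the temp+rename write, using only the standard library. The helper names (`is_safe_id`, `capture_path`, `write_atomic`) are hypothetical, and rejecting any id that contains `..` is a stricter reading than the character class alone, chosen here to match the "no `..`" note above.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// True if `id` is non-empty, uses only [A-Za-z0-9._-], and contains no "..".
pub fn is_safe_id(id: &str) -> bool {
    !id.is_empty()
        && !id.contains("..")
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// Resolve `<dir>/<id>.json`, answering NotFound for any rejected id.
pub fn capture_path(dir: &Path, id: &str) -> io::Result<PathBuf> {
    if !is_safe_id(id) {
        return Err(io::Error::new(io::ErrorKind::NotFound, "no such recording"));
    }
    Ok(dir.join(format!("{}.json", id)))
}

/// Write to `<id>.json.tmp` in the same directory, flush, then rename into place.
pub fn write_atomic(dir: &Path, id: &str, json: &[u8]) -> io::Result<PathBuf> {
    let final_path = capture_path(dir, id)?;
    let tmp_path = dir.join(format!("{}.json.tmp", id));
    {
        let mut f = fs::File::create(&tmp_path)?;
        f.write_all(json)?;
        f.sync_all()?; // make sure the bytes are on disk before the rename
    }
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Keeping the temp file next to the final file means the rename never crosses a filesystem boundary.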
### Runtime gating change (the key behavioral change)
Today the loops measure per-stage timing only `if perf` (a startup bool). Change the per-frame
**measurement** predicate to `let measure = perf || recorder.is_armed();`, re-evaluated each
frame (cheap atomic). Then at the aggregation boundary:
- if `perf` → keep the existing `tracing::info!` log line (unchanged behavior);
- if `recorder.is_armed()` → also build a `StatsSample` and `push_sample`.
So `PUNKTFUNK_PERF=1` still works exactly as before, AND the web toggle now works at runtime
with zero startup flags.
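The gating rule above can be sketched as two small predicates. `Gate`, `should_measure`, and `window_actions` are illustrative names standing in for the recorder and the loops' existing code; only the `perf || is_armed()` predicate and the `Relaxed` load come from this design.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the recorder's `armed` flag.
pub struct Gate {
    armed: AtomicBool,
}

impl Gate {
    pub const fn new() -> Self {
        Self { armed: AtomicBool::new(false) }
    }

    /// Arm the gate; returns true only if this call changed it from idle to armed.
    pub fn arm(&self) -> bool {
        !self.armed.swap(true, Ordering::Relaxed)
    }

    pub fn disarm(&self) {
        self.armed.store(false, Ordering::Relaxed);
    }

    /// Cheap enough to call once per frame.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }
}

/// Per-frame: take stage timings if either the startup flag or the runtime gate is on.
pub fn should_measure(perf: bool, gate: &Gate) -> bool {
    perf || gate.is_armed()
}

/// What to do at the aggregation boundary.
pub struct WindowActions {
    pub log_line: bool,    // existing PUNKTFUNK_PERF log line
    pub push_sample: bool, // build a StatsSample and hand it to the recorder
}

pub fn window_actions(perf: bool, gate: &Gate) -> WindowActions {
    WindowActions { log_line: perf, push_sample: gate.is_armed() }
}
```

The two outputs are independent, so a host started with `PUNKTFUNK_PERF=1` and an armed capture both logs and records.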
### Where each loop emits the sample
- **native** (`punktfunk1.rs`): the cap/submit/encode(`st_wait`) splits live in the capture
thread; `mbps`/`send_dropped`/`bytes` and `session.stats()` live in the send thread. Emit the
complete sample from **one** place. Cleanest: carry the per-frame `cap_us/submit_us/wait_us`
(and a `repeat: bool`) on `FrameMsg` to the send thread (it already carries `encode_us`), so
`send_loop` builds the whole sample at its existing 2 s boundary where `session.stats()` is
already read. Compute `frames_dropped/packets_dropped/send_dropped/fec_recovered` as deltas vs
the previous window's `Session::stats()` snapshot (the loop already tracks `last_bytes` /
`last_send_dropped` — extend that bookkeeping). `register_session` is called once with the
negotiated mode/codec and the client label.
- **gamestream** (`gamestream/stream.rs`): the encode loop already tracks per-stage max each
1 s. Add p50/p99 accumulation (small per-stage `Vec<u32>` like the native path) and, when
`perf || recorder.is_armed()`, emit a `StatsSample` with stages
`[capture, encode, packetize, send]` + fps (unique new frames) + mbps + whatever loss/byte
counters that path exposes (use 0 where a counter doesn't exist; do NOT fabricate). Call
`register_session("gamestream", ...)` with the GameStream-negotiated mode/codec/client.
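For the per-window aggregation both loops need, a sketch under stated assumptions: the percentile uses the nearest-rank definition (the existing native code may use a different one), and `percentile_us` and `window_delta` are hypothetical helper names. The cumulative counters are modeled as plain `u64` values because the real shape of `Session::stats()` is not shown here.

```rust
/// Nearest-rank percentile over one window of per-frame stage timings (µs).
/// Sorts the window in place; returns 0.0 for an empty window.
pub fn percentile_us(window: &mut [u32], pct: usize) -> f32 {
    if window.is_empty() {
        return 0.0;
    }
    window.sort_unstable();
    let pct = pct.min(100);
    // rank = ceil(pct / 100 * n), kept in integer math to avoid float rounding.
    let rank = (pct * window.len() + 99) / 100;
    window[rank.max(1) - 1] as f32
}

/// Per-window delta of a cumulative counter. Yields 0 if the counter went
/// backwards (for example after a session restart) and clamps to u32::MAX.
pub fn window_delta(now: u64, prev: u64) -> u32 {
    now.saturating_sub(prev).min(u32::MAX as u64) as u32
}
```

A loop would call `percentile_us(&mut stage_vec, 50)` and `percentile_us(&mut stage_vec, 99)` at its aggregation boundary, clear the vector, and store the current counters as the next window's `prev`.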
Threading: add `stats: Arc<StatsRecorder>` to `SessionContext` and the GameStream stream
setup; the standalone `punktfunk1-host` subcommand (no mgmt) passes a fresh recorder (harmless,
just unused).
---
## 2. Host: mgmt REST API (`mgmt.rs`)
Add `stats: Arc<StatsRecorder>` to `MgmtState`. Register handlers in `api_router_parts()` via
`routes!()` with `#[utoipa::path]`. All under `/api/v1`, **bearer-token only** (operator
actions — do NOT add them to the mTLS `cert_may_access` read-only allowlist). All bodies/returns
derive `ToSchema`; errors use the `ApiJson`/`ApiError` envelope. Tag every operation `stats`.
| Method & path | fn (operationId) | body → returns |
|---------------------------------------|-------------------------|-------------------------------|
| POST `/api/v1/stats/capture/start` | `stats_capture_start` | — → `StatsStatus` |
| POST `/api/v1/stats/capture/stop` | `stats_capture_stop` | — → `CaptureMeta` (200) / 204-ish if nothing was recording |
| GET `/api/v1/stats/capture/status` | `stats_capture_status` | → `StatsStatus` |
| GET `/api/v1/stats/capture/live` | `stats_capture_live` | → `Capture` (in-progress; 404/empty if idle) |
| GET `/api/v1/stats/recordings` | `stats_recordings_list` | → `Vec<CaptureMeta>` |
| GET `/api/v1/stats/recordings/{id}` | `stats_recording_get` | → `Capture` |
| DELETE `/api/v1/stats/recordings/{id}`| `stats_recording_delete`| → `StatsStatus`/204 |
Register the new `ToSchema` types with the OpenApi derive's `components(schemas(...))` list.
Then regenerate the checked-in spec:
```
cargo run -p punktfunk-host -- openapi > api/openapi.json
```
CI fails on drift — the regenerated `api/openapi.json` MUST be committed.
---
## 3. Web console (`web/`)
New page **"Performance"** following the established route → section/index (fetch) →
section/view (presentational) pattern, registered in the `NAV` array (`app-shell.tsx`) with a
lucide icon (`Activity` or `LineChart`).
- Route: `web/src/routes/stats.tsx` → `createFileRoute('/stats')` → `SectionStats`.
- Section: `web/src/sections/Stats/index.tsx` (orval hooks) + `view.tsx` (presentational,
i18n via Paraglide `m.*`). Use `Section`, `QueryState`, `Card`/`CardHeader`/`CardTitle`/
`CardContent`, `Button`, `Badge` from `web/src/components/ui`.
- Charts: **add `recharts`** to `web/package.json` (no chart lib exists today). Render charts
**client-only** (a mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s 0-width
measure. Theme via existing CSS variables / brand violet, dark-mode aware.
Data hooks come from regenerated orval (`bun run api:gen` after the host's openapi.json is
updated): `useStatsCaptureStatus`, `useStatsCaptureStart`, `useStatsCaptureStop`,
`useStatsCaptureLive`, `useStatsRecordingsList`, `useStatsRecordingGet`,
`useStatsRecordingDelete` (exact names per orval's tag/operationId convention — verify against
generated output and adjust the view imports to match).
UI layout:
1. **Capture control card** — Start/Stop button (mutations; invalidate status query on
success), a "Recording…"/"Idle" `Badge`, elapsed time + live sample count
(`useStatsCaptureStatus`, `refetchInterval: 2000`). On Start, the live chart appears.
2. **Live chart** (visible while armed; `useStatsCaptureLive`, `refetchInterval: 2000`) — the
latency stage breakdown as a **stacked area** (capture/submit/encode/send in µs, the
"where does the time go" view), with fps and mbps as secondary line charts.
3. **Recordings card** — table from `useStatsRecordingsList`: time, kind badge, resolution,
codec, duration, sample count; row actions **View** (select → detail), **Download** (export
the `Capture` JSON via the recording GET), **Delete** (mutation, confirm).
4. **Recording detail** — when a recording (or the live capture) is selected, render the full
graph set from its `samples`:
- Latency stage breakdown (stacked area, µs) — primary bottleneck view; p99 overlay toggle.
- Throughput: fps (new vs repeat) + mbps.
- Health: frames_dropped / packets_dropped / send_dropped / fec_recovered over time.
i18n: add keys to `web/messages/en.json` + `de.json` (nav label, titles, button/labels) and
regenerate Paraglide. Keep both locales in sync.
---
## 4. Verification / done-criteria
- `cargo build -p punktfunk-host` (and `--workspace`),
`cargo clippy --workspace --all-targets -- -D warnings`, `cargo fmt --all --check` — green.
- `cargo run -p punktfunk-host -- openapi > api/openapi.json` — committed, no drift.
- `PUNKTFUNK_PERF=1` stdout behavior unchanged (no regression to the existing perf log).
- Web: orval regen clean, typecheck/build green, charts render client-side.
- CLAUDE.md status note + this plan updated.
- Adversarial review: hot-path stays sync + bounded; `id` path-traversal-safe; OpenAPI/orval no
drift; SSR-safe charts; both paths actually emit samples.
@@ -51,7 +51,7 @@ back — i.e. the Windows analogue of the **GTK4 Linux client** (`clients/linux`
which is the architectural template. The Windows client is close to a 1:1 port of the Linux client
with the platform layers swapped.
## Locked decisions (from the Windows-host/client plan, `docs/windows-host.md` + project memory)
## Locked decisions (from the Windows-host/client plan, `design/windows-host.md` + project memory)
- **Pure Rust.** `windows-rs` + **Windows App SDK "Reactor"** (WinUI 3 from Rust, merged windows-rs
PR #4479). No C++/C#. De-risk Reactor + `SwapChainPanel` FIRST — it's the only novel/uncertain
@@ -165,6 +165,6 @@ Windows client should mirror it:
- **Core client API:** `crates/punktfunk-core/src/client.rs` (`NativeClient`).
- **Protocol:** `crates/punktfunk-core/src/quic.rs` (`Hello.video_caps`, `Welcome.bit_depth`,
`VIDEO_CAP_10BIT`/`VIDEO_CAP_HDR`).
- **Full Windows plan + SudoVDA/host details:** `docs/windows-host.md`.
- **Full Windows plan + SudoVDA/host details:** `design/windows-host.md`.
- **Host HDR conversion (for the inverse math):** `crates/punktfunk-host/src/capture/dxgi.rs`
(`HDR_PS`, `HdrConverter`) + `crates/punktfunk-host/src/encode/nvenc.rs` (BT.2020/PQ VUI).
@@ -34,7 +34,7 @@ which kept the live-validated host working at every step. The driver, by contras
|---|---|---|
| **Goal 1** — clean, layered host architecture | ✅ **DONE** | `config.rs` (`HostConfig`), `session_plan.rs` (`SessionPlan`), `SessionContext`, `windows/`+`linux/` confinement (`38c68c3`), `VirtualDisplayManager` (§2.5), `EncoderCaps` (`0ccd0fe`) |
| **Goal 2** — drop every trace of SudoVDA | ✅ **DONE** | reach-in decoupled (F1: `d638a93`/`e60cda3` → `win_adapter`/`win_display`), then the `sudovda.rs` backend + the dual-backend select **deleted** (this branch) — pf-vdisplay is the sole Windows virtual-display backend |
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** (**box-validated**) | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`) + the IDD-push `KeyedMutexGuard` hot loop (`6585643`); **driver `pod_init!`** (`bf57704`, 27→1). **On-glass clean: host clippy `-D warnings` + driver build** (RTX box; `bd05bc8` fixed 11 lints the gate surfaced). Remaining: host-crate P0 lints (deferred — churn>value), the `service.rs` SCM-handler event smuggling (deliberately left) |
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** (**box-validated**) | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`) + the IDD-push `KeyedMutexGuard` hot loop (`6585643`) + the **SCM STOP/SESSION events** → `OnceLock<OwnedHandle>` (`61c02e6`, runtime-validated: clean ~1 s `sc stop`); **driver `pod_init!`** (`bf57704`, 27→1). **On-glass clean: host clippy `-D warnings` + driver build** (RTX box; `bd05bc8` fixed 11 lints the gate surfaced). The host-side raw-handle smuggling is fully retired; only host-crate P0 lints remain (deferred — churn>value) |
| **M0** — proto ABI + driver toolchain + `/INTEGRITYCHECK` + `iddcx` | ✅ **DONE** | `pf-driver-proto`; vendored `windows-drivers-rs` 0.5.1; `clear-force-integrity.ps1`; CI-green |
| **M1** — new IddCx driver, first light + HDR | ✅ **DONE (on-glass)** | STEP 08 (`d7a9fbf` → `cd59151`); HDR live ("Mac connects WITH HDR", `6399d28`) |
| **M2** — IDD-push capture + NVENC, glass-to-glass | ✅ **DONE (on-glass)** | 5120×1440@240 HDR zero-copy; integrated into the host path |
@@ -226,14 +226,16 @@ These are expensive empirical wins; keep them intact when touching the code:
`unsafe fn`s need an inner `unsafe {}`). Stage it **per-module, Linux-first** (item-level `#[deny]` on
`linux/zerocopy/cuda.rs`/`egl.rs`, `encode/linux/vaapi.rs` — locally verifiable), then the Windows
modules (CI-gated), then promote to crate-level. The driver already has the deny.
5. **D2 — `OwnedHandle` / RAII rollout.** ✅ **done** `capture/windows/idd_push.rs` (`011607e`: a
`MappedSection` RAII for the mapping handle **+** the leaked `MapViewOfFile` view, + `OwnedHandle` for the
event / ring-slot shared handles); `windows/service.rs` (`4c95ba7`: the child process/thread + Job
handles, ~9 `CloseHandle` deleted); and the **three gamepad backends** (`e5c2b4e`: a shared
5. **D2 — `OwnedHandle` / RAII rollout.** ✅ **DONE (complete).** `capture/windows/idd_push.rs` (`011607e`:
a `MappedSection` RAII for the mapping handle **+** the leaked `MapViewOfFile` view, + `OwnedHandle` for
the event / ring-slot shared handles); `windows/service.rs` (`4c95ba7`: the child process/thread + Job
handles, ~9 `CloseHandle` deleted); the **three gamepad backends** (`e5c2b4e`: a shared
`inject/windows/gamepad_raii.rs`: `Shm` for the section+view, `SwDevice` for the devnode — replacing the
duplicated `create_shm_section` + three hand-written `Drop`s). **Remaining (deliberately left):** the
`service.rs` `AtomicIsize` STOP/SESSION events — smuggled into the C SCM handler, a separate riskier
redesign. `manager.rs`/`pf_vdisplay.rs` already used the pattern.
duplicated `create_shm_section` + three hand-written `Drop`s); and the **SCM STOP/SESSION events**
(`61c02e6`: `AtomicIsize` raw-`isize` smuggle → `OnceLock<OwnedHandle>` the capture-free C handler reads,
owned for the process lifetime — also closes a latent close-then-signal window). **Runtime-validated on
the RTX box**: swapped in, `sc start` → RUNNING, `sc stop` → clean STOPPED in ~1 s (not a timeout-kill),
original restored. `manager.rs`/`pf_vdisplay.rs` already used the pattern.
6. **Hot-loop `KeyedMutexGuard` ✅ done** (`6585643`) — the IDD-push consume loop's hand-written
`AcquireSync`/`ReleaseSync` (with its "don't `?`-return between them or you leak the lock + stall the
driver" caveat) is now a RAII guard scoped to the convert/copy block: same release point (latency
@@ -426,5 +428,5 @@ This file replaces five docs (recoverable from git history):
- `windows-host-rewrite-game-capture-bug.md` (the GB1 investigation + fix) — **fixed**; the resolution is
§2.5 (capture). The full investigation narrative is in git history.
(The older `docs/windows-host.md`, a pre-rewrite implementation plan from 2026-06-22, is a separate
(The older `design/windows-host.md`, a pre-rewrite implementation plan from 2026-06-22, is a separate
lineage and is left as-is.)
+2 -2
@@ -16,8 +16,8 @@ sidebar, and the landing page). It reads [`public/openapi.json`](public/openapi.
```sh
# from the repo root — regenerate the spec, then copy the snapshot in:
cargo run -p punktfunk-host -- openapi > docs/api/openapi.json
cp docs/api/openapi.json docs-site/public/openapi.json
cargo run -p punktfunk-host -- openapi > api/openapi.json
cp api/openapi.json docs-site/public/openapi.json
```
## Develop
+4 -2
@@ -64,8 +64,10 @@ DualSense feedback, automatic host discovery, PIN pairing with pinned reconnects
overlay — with D-pad and game-controller focus navigation for the couch. It builds from the
`clients/android` directory (Kotlin + a shared Rust core).
Install it from **Google Play** — see [Install a Client](/docs/install-client#android). Open the app,
pick your host, [pair](/docs/pairing) once, and stream.
The app is in **Google Play Internal Testing** — request a tester invite on our
[**Discord**](https://discord.gg/kaPNvzMuGU) and we'll add you (see
[Install a Client](/docs/install-client#android)). Once added, open the app, pick your host,
[pair](/docs/pairing) once, and stream.
## Windows desktop client
+10 -2
@@ -16,14 +16,22 @@ monitor. When the client disconnects, the virtual display goes away.
That's why a 1080p60 laptop and a 1440p120 desktop can stream from the same host **at the same time**,
each at its own mode — they each get their own virtual display.
How the virtual display is created depends on your desktop:
How the virtual display is created depends on your host:
| Desktop | How |
| Host | How |
|---|---|
| **GNOME** (Mutter) | A virtual monitor via the screen-cast API |
| **KDE Plasma** (KWin) | A virtual output via KWin's screencast |
| **Bazzite / Steam** (gamescope) | A nested gamescope session launched at the client's mode |
| **Sway** (wlroots) | A headless output added to the running session |
| **Windows** | A virtual-display driver — including punktfunk's own **indirect display driver** the host pushes frames straight into — a real virtual display, no physical monitor, even on the secure desktop |
That last one is the distinctive part on Windows: rather than only capturing an existing screen,
punktfunk has **its own indirect display driver (IDD)**, and the host can push finished frames
**straight into the driver**. You get the same on-the-fly virtual display the Linux compositors give
you — at the client's exact mode, with no physical monitor or dummy HDMI dongle, and even on the
secure desktop (UAC / lock screen). That tight, push-based integration is unusual among Windows
streaming hosts.
## From screen to GPU to wire
+15 -7
@@ -1,19 +1,21 @@
---
title: Introduction
description: Low-latency desktop and game streaming from a Linux host to any of your devices.
description: Low-latency desktop and game streaming from a Linux or Windows host to any of your devices.
---
import { Cards, Card } from 'fumadocs-ui/components/card'
**punktfunk** streams your Linux desktop or games to your other devices — a laptop, a Mac, a tablet,
**punktfunk** streams your desktop or games to your other devices — a laptop, a Mac, a tablet,
a TV — at low latency and at **each device's own resolution and refresh rate**. Run the host on a
Linux machine with an NVIDIA GPU, connect a client, and you're streaming.
Linux machine or a Windows PC, connect a client, and you're streaming.
It's built for the things that make streaming feel native:
- **Your device's exact mode.** The host spins up a virtual display sized to the client that's
connecting — 1080p60 to your laptop, 1440p120 to your desktop, 4K to your TV — at the same time.
No letterboxing, no scaling, no juggling your real monitors.
No letterboxing, no scaling, no juggling your real monitors. On Windows that's punktfunk's own
indirect display driver, frames pushed straight in — so there's no physical monitor or dummy HDMI
plug to deal with, even on the secure desktop.
- **Low latency, GPU end to end.** Frames go straight from the compositor to the GPU encoder
(NVENC) with zero CPU copies, and over a transport tuned for responsiveness rather than throughput.
- **Works with the apps you already have.** punktfunk speaks the GameStream protocol, so any
@@ -34,11 +36,17 @@ It's built for the things that make streaming feel native:
## What you need
- A **Linux host** with an **NVIDIA GPU** (for the NVENC hardware encoder) running one of the
[supported setups](/docs/requirements): **Ubuntu** (GNOME or KDE), **Fedora** (KDE), or **Bazzite**.
A native [**Windows host**](/docs/windows-host) (NVIDIA-only) is also available.
- A **host** with a supported GPU — either a **Linux** machine running one of the
[supported setups](/docs/requirements) (**Ubuntu** GNOME or KDE, **Fedora** KDE, or **Bazzite**), or
a **[Windows](/docs/windows-host) PC**.
- A **client device** to stream to — there are native apps for **macOS, iOS/iPadOS, tvOS, Linux,
Windows, and Android**, plus any device that runs **Moonlight**.
- Both on the **same network** (LAN or VPN). punktfunk is designed for a trusted local network.
Ready? Head to the [Quick Start](/docs/quickstart).
## Community
Questions, help, or want to try the **Android beta**? Join the
[**Discord**](https://discord.gg/kaPNvzMuGU) — request a tester invite there — or
[**r/Punktfunk**](https://www.reddit.com/r/Punktfunk/) on Reddit.
+9 -5
@@ -20,7 +20,7 @@ Whichever client you install, the first connection needs a one-time [pairing](/d
| **Windows** | [Signed MSIX](#windows) from the package registry |
| **macOS** | [Notarized `.dmg`](#macos) from the releases page |
| **iPhone / iPad / Apple TV** | [TestFlight beta](#ios-ipados-apple-tv) |
| **Android / Android TV** | [Google Play](#android) |
| **Android / Android TV** | [Beta — request access](#android) |
| Anything else (browser, old phone, TV) | [Moonlight](/docs/moonlight) |
## Linux desktop (Flatpak)
@@ -118,12 +118,16 @@ Open the app, and your hosts appear automatically under *On this network*.
## Android
The Android client (phone + Android TV) is on **Google Play**:
The Android client (phone + Android TV) is in **Google Play Internal Testing**. To try it, request a
tester invite on our [**Discord**](https://discord.gg/kaPNvzMuGU) and we'll add your Google account to
the test track:
**[Request access on Discord →](https://discord.gg/kaPNvzMuGU)**
Once you're added, install it from Google Play, then open the app and pick your host:
**[Get punktfunk on Google Play →](https://play.google.com/store/apps/details?id=io.unom.punktfunk)**
Install, open the app, and pick your host. _(The app is in testing — if the listing isn't visible
to you yet, you'll need to be added to the test track.)_
_(only resolves once your account is on the tester list)_
## Anything else — Moonlight
@@ -84,9 +84,10 @@ session unit — see [Bazzite](/docs/bazzite).
## Windows
> punktfunk is Linux-first, but a native **Windows host** also ships a signed installer with an SCM
> service and a bundled virtual-display driver. It's **NVIDIA-only** (NVENC) and newer than the Linux
> host. (Not to be confused with the Windows *client*, which streams *to* a Windows PC.)
> punktfunk has first-class **Linux and Windows** hosts. On Windows it ships as a signed installer
> with an SCM service and a virtual-display driver — including punktfunk's own **indirect display
> driver** the host pushes frames straight into. The Windows host is newer than the Linux host. (Not
> to be confused with the Windows *client*, which streams *to* a Windows PC.)
On Windows the host runs as a `LocalSystem` service that launches into the interactive session, so it
captures the secure desktop (UAC / lock screen) and survives reboots with nobody logged in — the same
+3 -3
@@ -23,9 +23,9 @@ A high-level view of where punktfunk stands. The ordered plan of work is on the
## What works today
punktfunk is a low-latency desktop and game streaming **host** — Linux-first (Linux + NVIDIA, NVENC),
with a newer **NVIDIA-only Windows host** too — and native **clients** on macOS, iOS/iPadOS/tvOS,
Linux, Windows, and Android.
punktfunk is a low-latency desktop and game streaming **host** with first-class **Linux and Windows**
support — and native **clients** on macOS, iOS/iPadOS/tvOS, Linux, Windows, and Android. (The Windows
host is newer than the Linux host.)
- **Two protocols.** The host speaks the **GameStream** protocol, so any **Moonlight**
client works out of the box, plus its own lower-latency **`punktfunk/1`** protocol
+8 -6
@@ -1,15 +1,17 @@
---
title: "Windows Host"
description: "Run the punktfunk streaming host on a Windows PC with an NVIDIA GPU."
description: "Run the punktfunk streaming host on a Windows PC — a first-class, virtual-display host."
---
**Status: implemented and shipping — NVIDIA-only, x64-only.** punktfunk is Linux-first, but it also
runs as a native **Windows host**: a signed installer registers a `LocalSystem` service that streams
**Status: implemented and shipping — x64-only.** Alongside the Linux host, punktfunk runs as a
first-class native **Windows host**: a signed installer registers a `LocalSystem` service that streams
your Windows desktop or games to any punktfunk or Moonlight client, at the client's exact resolution
via a virtual display — including **HDR10** (10-bit BT.2020 PQ) when your Windows desktop is in HDR
mode. It's newer and less battle-tested than the Linux host, and it is built specifically around
NVIDIA hardware. (The Linux host is 8-bit only — HDR there is blocked upstream.)
via a **virtual display** — including **HDR10** (10-bit BT.2020 PQ) when your Windows desktop is in HDR
mode. punktfunk has its own **indirect display driver (IDD)** that the host pushes finished frames
straight into, so you get a real on-the-fly virtual display with no physical monitor or dummy HDMI
plug — even on the secure desktop (UAC / lock screen). The Windows host is newer and less
battle-tested than the Linux host. (The Linux host is 8-bit only — HDR there is blocked upstream.)
> This page is about the Windows **host** (streaming *from* a Windows PC). To stream *to* a Windows
> PC, see the [Windows client](/docs/clients#windows-desktop-client).
+2
@@ -19,6 +19,8 @@ export function baseOptions(): BaseLayoutProps {
{ text: 'API', url: '/api' },
{ text: 'Website', url: 'https://punktfunk.unom.io' },
{ text: 'Source code', url: 'https://git.unom.io/unom/punktfunk' },
{ text: 'Discord', url: 'https://discord.gg/kaPNvzMuGU' },
{ text: 'Reddit', url: 'https://www.reddit.com/r/Punktfunk/' },
],
}
}
+2 -2
@@ -13,8 +13,8 @@ function Home() {
<BrandMark className="size-20 drop-shadow-[0_8px_30px_rgba(108,91,243,0.45)]" />
<Wordmark className="h-12 md:h-14" />
<p className="max-w-xl text-fd-muted-foreground">
Linux-first, low-latency desktop and game streaming a shared Rust protocol
core with native clients per platform.
Low-latency desktop and game streaming with first-class Linux and Windows
hosts a shared Rust protocol core with native clients on every platform.
</p>
<Link
to="/docs/$"
+1 -1
@@ -94,7 +94,7 @@ package_punktfunk-host() {
install -Dm0644 "$R/scripts/host.env.example" "$pkgdir/usr/share/punktfunk/host.env.example"
install -Dm0644 "$R/packaging/bazzite/host.env" "$pkgdir/usr/share/punktfunk/host.env.bazzite"
install -Dm0644 "$R/packaging/kde/host.env" "$pkgdir/usr/share/punktfunk/host.env.kde"
install -Dm0644 "$R/docs/api/openapi.json" "$pkgdir/usr/share/punktfunk/openapi.json"
install -Dm0644 "$R/api/openapi.json" "$pkgdir/usr/share/punktfunk/openapi.json"
install -Dm0644 "$R/LICENSE-MIT" "$pkgdir/usr/share/licenses/punktfunk-host/LICENSE-MIT"
install -Dm0644 "$R/LICENSE-APACHE" "$pkgdir/usr/share/licenses/punktfunk-host/LICENSE-APACHE"
install -Dm0644 "$R/README.md" "$pkgdir/usr/share/doc/punktfunk-host/README.md"
+1 -1
@@ -257,7 +257,7 @@ journalctl --user -u punktfunk-host -f
> ⚠️ **There is no firewall script or firewall doc in the repo.** The ports below are derived
> directly from the code constants (`crates/punktfunk-host/src/gamestream/mod.rs`, `mgmt.rs`) and
> the GameStream-host port-map (`docs/gamestream-host-plan.md`). Treat the `firewall-cmd` lines as recommended-but-verified,
> the GameStream-host port-map (`design/gamestream-host-plan.md`). Treat the `firewall-cmd` lines as recommended-but-verified,
> not a checked-in script.
**GameStream / Moonlight ports** (fixed; Moonlight derives them from the HTTP base). These only apply
+1 -1
@@ -57,7 +57,7 @@ install -Dm0644 scripts/headless/punktfunk-sink.conf "$SHAREDIR/headless/punkt
install -Dm0644 scripts/host.env.example "$SHAREDIR/host.env.example"
install -Dm0644 packaging/bazzite/host.env "$SHAREDIR/host.env.bazzite"
install -Dm0644 packaging/kde/host.env "$SHAREDIR/host.env.kde"
install -Dm0644 docs/api/openapi.json "$SHAREDIR/openapi.json"
install -Dm0644 api/openapi.json "$SHAREDIR/openapi.json"
install -Dm0644 LICENSE-MIT "$DOCDIR/LICENSE-MIT"
install -Dm0644 LICENSE-APACHE "$DOCDIR/LICENSE-APACHE"
install -Dm0644 README.md "$DOCDIR/README.md"
+2 -2
@@ -224,7 +224,7 @@ install -Dm0644 packaging/kde/host.env %{buildroot}%{_datadir}/%
# Bazzite KDE Desktop-mode one-shot setup (KWIN_WAYLAND_NO_PERMISSION_CHECKS + RemoteDesktop grant).
install -d %{buildroot}%{_datadir}/%{name}/bazzite
install -Dm0755 packaging/bazzite/kde-desktop-setup.sh %{buildroot}%{_datadir}/%{name}/bazzite/kde-desktop-setup.sh
- install -Dm0644 docs/api/openapi.json %{buildroot}%{_datadir}/%{name}/openapi.json
+ install -Dm0644 api/openapi.json %{buildroot}%{_datadir}/%{name}/openapi.json
%if %{with web}
# --- web console subpackage (punktfunk-web) ---
@@ -246,7 +246,7 @@ install -Dm0644 web/web.env.example %{buildroot}%{_datadir}/punkt
%files
%license LICENSE-MIT LICENSE-APACHE
- %doc README.md docs/implementation-plan.md packaging/README.md
+ %doc README.md design/implementation-plan.md packaging/README.md
%{_bindir}/punktfunk-host
%{_udevrulesdir}/60-punktfunk.rules
%{_prefix}/lib/sysctl.d/99-punktfunk-net.conf
+2 -1
@@ -76,10 +76,11 @@ read it from `%ProgramData%\punktfunk\web-password`.
| `reset-pf-vdisplay.ps1` | **Dev:** recover a wedged driver — stop host → reap ghost monitor nodes → reload the adapter → start host (no reboot). See *Dev iteration* below. |
| `redeploy-pf-vdisplay.ps1` | **Dev:** one-shot redeploy — (optional) build → stop host → `deploy-dev.ps1 -Install` → reload adapter → start host. |
| `nvenc/nvenc.def`, `nvenc/gen-nvenc-importlib.ps1` | Synthesise `nvencodeapi.lib` for the `--features nvenc` link (llvm-dlltool / lib.exe). |
| `pf-vkhdr-layer/` | **HDR Vulkan layer** (standalone `cdylib`): lets Vulkan games (Doom: The Dark Ages, etc.) enable HDR over the virtual display by advertising the HDR surface formats the NVIDIA/AMD ICDs hide on an indirect display. Built by the packer, laid into `{app}\vklayer`, registered under `HKLM64\…\Khronos\Vulkan\ImplicitLayers` (opt-out *Install the HDR Vulkan layer* task). Self-gated on the display's HDR state. See its README. |
> **Vendored driver:** pf-vdisplay is our **all-Rust IddCx** virtual display (UMDF2), built from
> `packaging/windows/drivers/`. It replaced the vendored SudoVDA C++ driver — full story in
- > [`docs/windows-virtual-display-rust-port.md`](../../docs/windows-virtual-display-rust-port.md). The
+ > [`design/windows-virtual-display-rust-port.md`](../../design/windows-virtual-display-rust-port.md). The
> **signed** output (`pf_vdisplay.dll`/`.inf`/`.cat` + `punktfunk-driver.cer`; signer
> `punktfunk-ds-test` — the same cert the gamepad drivers ship, Class=Display, HWID `root\pf_vdisplay`)
> is checked in under `pf-vdisplay/`. To refresh it after a driver-source change, rebuild + re-sign with

Some files were not shown because too many files have changed in this diff.