13 Commits

Author SHA1 Message Date
enricobuehler d01a8fd17a feat(host): HDR Vulkan layer so Vulkan games get HDR on the virtual display
ci / web (push) Failing after 22s
windows-host / package (push) Failing after 4m16s
ci / rust (push) Failing after 4m56s
ci / docs-site (push) Successful in 1m7s
android / android (push) Successful in 9m19s
ci / bench (push) Successful in 4m47s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Failing after 3s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
docker / deploy-docs (push) Has been skipped
deb / build-publish (push) Failing after 6m29s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 7m4s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 7m17s
apple / swift (push) Successful in 1m13s
apple / screenshots (push) Successful in 5m27s
NVIDIA/AMD Vulkan ICDs refuse to *advertise* an HDR color space for a surface on an
IddCx indirect/virtual display, so Vulkan games (Doom: The Dark Ages, id Tech, Indiana
Jones, …) report "device does not support HDR" — even though Windows HDR, DWM compose,
and the client PQ stream all work, and the ICD happily *accepts + presents* a forced HDR
swapchain there. The whole gap is enumeration; the community (Apollo/Sunshine/VDD) wrote
this off as kernel-side / unfixable.

Add VK_LAYER_PUNKTFUNK_hdr_inject (packaging/windows/pf-vkhdr-layer/): a standalone
cdylib Vulkan implicit layer that appends {A2B10G10R10, HDR10_ST2084} + {RGBA16F, scRGB}
to vkGetPhysicalDeviceSurfaceFormats[2]KHR (no need to hook vkCreateSwapchainKHR — the
ICD doesn't validate the color space there). Self-gated on the surface monitor's actual
advanced-color state (DisplayConfig GET_ADVANCED_COLOR_INFO), so it is a complete no-op
on SDR sessions and real monitors (dedup). Always-on (registry-discovered) so it works
regardless of how a game is launched — env-scoping silently fails for already-running
Steam. Escape hatches: DISABLE_PF_VKHDR, PF_VKHDR_EXCLUDE, and a built-in kernel-anti-
cheat denylist.

The installer builds/signs/stages it and registers it under
HKLM64\SOFTWARE\Khronos\Vulkan\ImplicitLayers (opt-out "Install the HDR Vulkan layer"
task); windows-host CI fmt+clippy-gates it (msvc-only FFI).

Live-validated on the RTX box: Doom: The Dark Ages enables HDR over the pf-vdisplay
virtual display.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:33:20 +00:00
enricobuehler 3e7c9bd059 fix(host): remove unsound unsafe impl Sync for HelperRelay
apple / screenshots (push) Has been skipped
windows-drivers / probe-and-proto (push) Successful in 29s
audit / cargo-audit (push) Failing after 1m20s
windows-drivers / driver-build (push) Successful in 1m14s
ci / web (push) Successful in 46s
ci / docs-site (push) Successful in 1m3s
windows-host / package (push) Successful in 6m46s
apple / swift (push) Failing after 0s
release / apple (push) Failing after 0s
android / android (push) Failing after 2m5s
ci / bench (push) Successful in 4m34s
decky / build-publish (push) Successful in 22s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m25s
ci / rust (push) Successful in 8m36s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m11s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 59s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m37s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m3s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 29s
deb / build-publish (push) Successful in 7m50s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m52s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 1m5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m33s
flatpak / build-publish (push) Successful in 3m56s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m46s
docker / deploy-docs (push) Successful in 22s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m26s
The one genuine soundness defect the unsafe-proof program surfaced (flagged
SUSPECT in program 3/N). `HelperRelay` holds an `rx: Receiver<RelayAu>`, which is
`!Sync` (std mpsc is single-consumer), so asserting `Sync` claimed more than the
fields support — an `Arc<HelperRelay>` recv'd from two threads would compile and
be UB.

It was never live-exploited, and it turns out `Sync` is also unnecessary: the
relay is a single-owner `mut relay` local in the punktfunk1 two-process mux loop
(recv_timeout/try_recv/request_keyframe all called on the owning thread; no `Arc`,
no `thread::spawn` capturing it). So the fix is simply to delete the impl — the
struct keeps its sound `unsafe impl Send` (needed for the raw `HANDLE` fields),
which is all the code uses.

Box-verified: cargo clippy -p punktfunk-host --features nvenc --target
x86_64-pc-windows-msvc -- -D warnings stays green without the Sync impl.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:00:40 +00:00
enricobuehler 7aa787a789 docs(host): prove the last 3 files + crate-root deny (unsafe-proof program 4/N, final)
Completes the unsafe-proof program now that the parallel WIP has landed:

- idd_push.rs (25 sites), nvenc.rs (7), punktfunk1.rs (21): a SAFETY proof on
  every unsafe block — D3D11/DXGI COM (same-device textures, immediate-context
  single-thread, keyed-mutex-held convert), the NVENC SDK table (versioned POD,
  register/map/lock-bitstream pairing), cross-process shm reads (atomic
  magic/generation handshake), and the C-ABI harness (each call cross-checked
  against its abi.rs `# Safety` doc). No SUSPECT (UB) blocks.
- capture.rs / encode.rs: the parent-module deny is restored (their WIP children
  are now proven), and main.rs gains a crate-root
  #![deny(clippy::undocumented_unsafe_blocks)] — the permanent catch-all gate so
  no future unsafe block anywhere in the crate can land without a proof.
- Fixed 4 blocks the agents missed: unsafe blocks nested inside `assert_eq!(...)`
  macro args (the comment-above-statement didn't associate) — hoisted to a `let`.
- rustfmt-canonicalized the Windows files (the agents' SAFETY comments + some
  pre-existing 1.9.0 drift) so `cargo fmt --all --check` is clean.

Verified: cargo clippy -p punktfunk-host --all-targets -- -D warnings AND
cargo fmt -p punktfunk-host --check both green with the crate-root deny active.
Windows cfg(windows) re-verified on the box next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:57:00 +00:00
enricobuehler 3514702d8c feat(windows-host): IDD-push encodes native NV12/P010 (skip NVENC's SM-side CSC)
GPU-contention work (host-latency plan §5.A): the IDD-push output ring now hands
NVENC native YUV instead of RGB, so NVENC skips its internal RGB→YUV colour
conversion on the SM/3D engine the running game saturates.

- idd_push.rs: out_ring is now NV12 (SDR, BT.709 limited) via a D3D11 VIDEO-engine
  BGRA→NV12 VideoConverter (keeps the CSC off the contended 3D/compute engine), or
  P010 (HDR, BT.2020 PQ limited) via the FP16→P010 shader (NVIDIA's VideoProcessor
  can't do RGB→P010). The ring drops its per-slot RTV (textures only), matching the
  WGC YUV ring; converters rebuild on a size/HDR flip.
- nvenc.rs: NV12 input forces bit_depth=8 so an HDR→SDR toggle (or a 10-bit-
  negotiated client on an SDR display) re-inits the session at the matching depth —
  NV12 can't feed a 10-bit session (register_resource rejects it).
- punktfunk1.rs: per-stage latency instrumentation under PUNKTFUNK_PERF
  (cap=try_latest, submit=encode_picture, wait=lock_bitstream µs p50/p99/max) to
  pinpoint where capture→encoded latency goes under GPU saturation.
2026-06-26 09:35:23 +00:00
enricobuehler 327a5fa828 docs(host): prove unsafe blocks in the Windows + cross-platform files + gate them (unsafe-proof program 3/N)
Continues the unsafe-proof program across the Windows/cross-platform host files
(~75 blocks, 21 files), each with a SAFETY proof of the real invariant and a
per-file #![deny(clippy::undocumented_unsafe_blocks)] gate:

  capture/windows: dxgi.rs, wgc_relay.rs, wgc.rs, desktop_watch.rs, composed_flip.rs
                   (windows-rs COM: interface validity, same-D3D11-device textures,
                    immediate-context single-thread, borrowed args outlive the call)
  windows: service.rs (SCM/token/CreateProcessAsUserW/event handles — OwnedHandle
           liveness, no double-close/signal race), win_display, wgc_helper, interactive
  vdisplay/windows: manager.rs, pf_vdisplay.rs (SwDeviceCreate/IddCx/ioctl handle
                    liveness via the OnceLock VDM singleton + OwnedHandle)
  encode/windows: ffmpeg_win.rs (full AVBufferRef refcount audit — balanced, NO leaks,
                  unlike the vaapi sibling), sw.rs
  cross-platform: gamestream/audio.rs (libopus), gamestream/stream.rs (sendmmsg),
                  inject/windows/sendinput.rs, audio/windows/wasapi_mic.rs,
                  session_tuning.rs, vdisplay.rs

Two findings (handled separately):
- wgc_relay.rs `unsafe impl Sync for HelperRelay` is UNSOUND (its mpsc Receiver is
  !Sync) though not live-exploited — marked SUSPECT inline; fix pending box check
  (it touches the in-flight punktfunk1.rs).
- capture.rs / encode.rs (PARENT modules of the WIP idd_push.rs / nvenc.rs) do NOT
  get the file deny yet — it would propagate the lint into the undocumented WIP
  children. The deny lands there once those are documented (after the WIP commits).

Linux-visible parts verified green (cargo clippy -p punktfunk-host --all-targets
-- -D warnings). The cfg(windows) deny gates are box-verified next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:23:25 +00:00
enricobuehler 9777ed7fb3 fix(host/vaapi): plug two AVBufferRef leaks in DmabufInner::open
Surfaced while writing the unsafe-soundness proofs (2/N): both are refcount
leaks (sound — never dangling/double-free — so the SAFETY proofs held, but real
bugs on the persistent punktfunk1-host listener that opens a fresh encoder per
session).

1. Per-session leak: `par->hw_frames_ctx = av_buffer_ref(drm_frames)` created a
   second owned ref. `av_buffersrc_parameters_set` takes its OWN ref of
   `par->hw_frames_ctx`, and `av_free(par)` frees only the struct, not the ref —
   so the extra ref leaked every session, pinning the DRM frames ctx + device.
   Fix: assign `drm_frames` borrowed (the standard ffmpeg pattern); our single
   owned ref lives in DmabufInner and is unref'd in Drop.

2. Error-path leak: the final `open_vaapi_encoder(...)?` returned without the
   unref ladder every other error path runs, leaking graph/drm_frames/
   vaapi_device/drm_device on encoder-open failure. Fix: match + clean up before
   returning (nv12_ctx is borrowed from the sink → freed by graph teardown).

cargo clippy -p punktfunk-host --all-targets -- -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:02:54 +00:00
enricobuehler ba68a98873 docs(host): prove every unsafe block in the Linux FFI files + gate them (unsafe-proof program 2/N)
Continues the structural unsafe-proof program (every unsafe carries a documented
proof of soundness; the file gains #![deny(clippy::undocumented_unsafe_blocks)]
so it stays proven). This batch covers all 10 remaining pure-Linux files
(104 blocks), each proof stating the REAL invariant — not boilerplate:

  zerocopy/cuda.rs (26)   leaked process-lifetime libcuda fn-ptr table; opaque
                          CUcontext never dereferenced; free-exactly-once via the
                          Arc<Mutex<PoolInner>> ownership graph; dmabuf fd take/close split
  zerocopy/egl.rs (18)    eglGetProcAddress'd procs with the GL context current;
                          EGLImage liveness; the two-call modifier-query bounds
  zerocopy/vulkan.rs (4)  copy-bounds arithmetic (src_size>=span); Send = thread
                          confinement to the punktfunk-pipewire thread
  dmabuf_fence.rs (4)     poll/ioctl/close fd liveness + ownership
  capture/linux/mod.rs (16)  spa_data repr(transparent) cast; null-checked spa
                          derefs; single-loop-thread buffer ownership until requeue
  inject/linux/gamepad.rs (10)  uinput ioctl request-number ↔ struct-size match
                          (static-asserted); InputEventRaw no-padding for the byte cast
  encode/linux/vaapi.rs (15) + encode/linux/mod.rs (9)  ffmpeg object ownership/
                          free ladders; VAAPI/DRM graph; Send = single-thread transfer
  inject/linux/wlr.rs (2), vdisplay/linux/kwin.rs (1)

No memory-unsafety SUSPECT blocks were found — the unsafe is sound. The vaapi
agent did flag two real AVBufferRef *leaks* (not UB) in DmabufInner::open; marked
inline with NOTE(leak) and addressed in a follow-up.

Verified: cargo clippy -p punktfunk-host --all-targets -- -D warnings is clean
(each file's deny gate hard-errors on any undocumented block).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 09:00:30 +00:00
enricobuehler 22359f5dc8 docs(host): prove every unsafe block in drm_sync.rs + gate it (unsafe-proof program 1/N)
Start of the structural unsafe-proof program (per the "every unsafe needs a
documented proof of soundness" goal): each `unsafe` block gets an accurate
`// SAFETY:` proof of WHY it is sound, and the file gains
`#![deny(clippy::undocumented_unsafe_blocks)]` so the proof requirement is
permanently enforced (a future undocumented unsafe in this file fails CI).

drm_sync.rs (10 blocks: libc open/ioctl/clock_gettime/close + 3 in tests): each
proof states the real invariant — fd liveness/ownership, the ioctl request number
encoding the matching struct size, the `&mut req` being a live correctly-sized
`#[repr(C)]` struct, and (for the timeline ioctls) the `handles`/`points` arrays
outliving the synchronous call with `count_handles` matching their length.

The gate grows file-by-file (CI stays green; undone files don't carry the lint
yet); it promotes to a crate-root deny once every file is done. ~122 Linux blocks
+ the Windows files remain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 08:35:32 +00:00
enricobuehler 7e9023faad feat(gamestream): launch apps on Windows + Linux non-gamescope hosts
GameStream's apps.json `cmd` is delivered via set_launch_command, which ONLY the Linux
gamescope backend nests. On Windows (no gamescope) and Linux kwin/mutter/wlroots (which
stream the existing desktop) the command was silently dropped. Now, after capture is live,
stream.rs spawns it via library::launch_gamestream_command for those backends — Windows:
into the interactive USER session (spawn_in_active_session, since the host is SYSTEM);
Linux: a plain `sh -c` spawn into the host's own graphical session so the app lands on the
streamed (primary) output. Linux gamescope keeps nesting via set_launch_command and is
skipped here to avoid a double launch. The command is operator-typed apps.json (trusted),
never client-set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 08:12:53 +00:00
enricobuehler 5acc12d9e9 feat(library): shared cover-art warmer + cache (GOG + Xbox art)
A disk-backed art cache (library-art-cache.json in the canonical host config dir) is the
source of truth read by all_games(), so the library list + launch-resolve never block on
the network. A host-lifetime background warmer (start_art_warmer, started in serve())
fetches uncached art OFF the hot path: GOG via the public no-auth api.gog.com product API,
Xbox via the unofficial no-auth displaycatalog (keyed by StoreId). Both best-effort
(protocol-relative URLs normalized to https; results cached even when empty so they aren't
re-fetched). The GOG + Xbox providers now read cached_art() (title-only until warmed).

Cross-platform (ureq blocking HTTP — no tokio on this path) so the fetch/parse code is
compiled + checked everywhere; a host whose stores all self-provide art (Steam CDN /
Heroic CDN / Lutris data: URLs) does no fetching. Dep: ureq (webpki roots, no system certs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 08:00:31 +00:00
enricobuehler aed0bf0c2a feat(library): Windows Xbox / Game Pass store provider
XboxProvider scans each fixed drive's <drive>:\XboxGames for GDK games (presence of
Content\MicrosoftGame.config marks a game vs. an ordinary UWP app), parsing title /
Identity name / Executable Id / StoreId via roxmltree. The PackageFamilyName is READ
from the AppRepository\Packages\<PackageFullName> dir name (reduced to Name_Hash) —
never computed from the publisher. Launch via the AUMID (shell:AppsFolder\<PFN>!<AppId>)
through explorer in the interactive user session (UWP activation needs the user token,
which spawn_in_active_session already provides). Cover art (displaycatalog) is deferred
→ title-only. Known v1 gaps: custom .GamingRoot install folders + non-GDK pure-UWP Store
games (under the ACL-locked WindowsApps) aren't enumerated.

New windows_launch_for `aumid` arm; XboxProvider wired into all_games() under cfg(windows).
Dep: roxmltree (Windows). Windows unit tests cover MicrosoftGame.config parsing (incl. the
ms-resource title fallback), the PackageFullName→PFN reduction, and the aumid launch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:49:03 +00:00
enricobuehler b65745284e feat(library): Windows Epic + GOG store providers
EpicProvider reads the launcher's local .item manifests under %ProgramData% (no auth,
launcher need not run) with Playnite's exclusion filter (skip UE_* components +
non-launchable addons + dead install dirs); cover art from the base64 catcache.bin
(public Epic CDN, best-effort). Launch via the com.epicgames.launcher:// URI opened
through explorer.exe — the namespace:catalogItemId:appName triple, with a bare-appName
fallback so a launch is never dropped.

GogProvider enumerates HKLM\SOFTWARE\WOW6432Node\GOG.com\Games (winreg) + each
goggame-<id>.info primary FileTask into a direct-exe spawn (no Galaxy, dodges its
cold-start/anti-cheat). GOG cover art (public api.gog.com) is deferred — it needs an
HTTP fetch + cache off the hot all_games() path — so GOG is title-only for now.

windows_launch_for gains epic/gog arms; both providers wired into all_games() under
cfg(windows). Deps: base64 moved to the cross-platform table (Epic catcache decode +
Lutris art encode both need it); winreg added on the Windows target. Windows unit tests
cover the Epic exclusion filter + URI builder and the GOG spawn + play-task parsing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:37:30 +00:00
enricobuehler 8ca695eb4c docs(windows-host): SCM event redesign done + runtime-validated (D2 complete)
The service.rs STOP/SESSION events are now OnceLock<OwnedHandle> (61c02e6) — the
last host-side raw-handle smuggle retired. Runtime-validated on the RTX box: swap
in, sc start -> RUNNING, sc stop -> clean STOPPED in ~1s, original restored. D2
(OwnedHandle/RAII rollout) is complete; only the deferred host P0 lints remain in
Goal 3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:28:29 +00:00
72 changed files with 3835 additions and 129 deletions
+12
View File
@@ -96,6 +96,18 @@ jobs:
# First-ever Windows lint coverage for the host (Linux CI never lints the windows-cfg code).
run: cargo clippy -p punktfunk-host --features nvenc,amf-qsv -- -D warnings
- name: Build + lint the HDR Vulkan layer (pf-vkhdr-layer)
shell: pwsh
# Standalone cdylib (own [workspace]) the installer bundles + registers (it lets Vulkan games
# like Doom use HDR on the virtual display). Lint here so a regression fails CI instead of
# silently shipping the host without the layer (pack-host-installer.ps1 builds it non-fatally).
# Windows-only FFI (user32 + the vk_layer loader glue) → can't be linted on the Linux CI.
run: |
Push-Location packaging/windows/pf-vkhdr-layer
cargo fmt --check; if ($LASTEXITCODE) { throw "pf-vkhdr-layer rustfmt" }
cargo clippy --release -- -D warnings; if ($LASTEXITCODE) { throw "pf-vkhdr-layer clippy" }
Pop-Location
- name: Ensure Inno Setup
shell: pwsh
run: |
Generated
+332
View File
@@ -2,6 +2,12 @@
# It is not intended for manual editing.
version = 3
[[package]]
name = "adler2"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa"
[[package]]
name = "aead"
version = "0.5.2"
@@ -735,6 +741,15 @@ dependencies = [
"libc",
]
[[package]]
name = "crc32fast"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9481c1c90cbf2ac953f07c8d4a58aa3945c425b7185c9154d67a65e4230da511"
dependencies = [
"cfg-if",
]
[[package]]
name = "criterion"
version = "0.5.1"
@@ -1100,6 +1115,16 @@ version = "0.5.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d674e81391d1e1ab681a28d99df07927c6d4aa5b027d7da16ba32d1d21ecd99"
[[package]]
name = "flate2"
version = "1.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "843fba2746e448b37e26a819579957415c8cef339bf08564fe8b7ddbd959573c"
dependencies = [
"crc32fast",
"miniz_oxide",
]
[[package]]
name = "flume"
version = "0.11.1"
@@ -1751,12 +1776,115 @@ dependencies = [
"tower-service",
]
[[package]]
name = "icu_collections"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2984d1cd16c883d7935b9e07e44071dca8d917fd52ecc02c04d5fa0b5a3f191c"
dependencies = [
"displaydoc",
"potential_utf",
"utf8_iter",
"yoke",
"zerofrom",
"zerovec",
]
[[package]]
name = "icu_locale_core"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "92219b62b3e2b4d88ac5119f8904c10f8f61bf7e95b640d25ba3075e6cac2c29"
dependencies = [
"displaydoc",
"litemap",
"tinystr",
"writeable",
"zerovec",
]
[[package]]
name = "icu_normalizer"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c56e5ee99d6e3d33bd91c5d85458b6005a22140021cc324cea84dd0e72cff3b4"
dependencies = [
"icu_collections",
"icu_normalizer_data",
"icu_properties",
"icu_provider",
"smallvec",
"zerovec",
]
[[package]]
name = "icu_normalizer_data"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da3be0ae77ea334f4da67c12f149704f19f81d1adf7c51cf482943e84a2bad38"
[[package]]
name = "icu_properties"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bee3b67d0ea5c2cca5003417989af8996f8604e34fb9ddf96208a033901e70de"
dependencies = [
"icu_collections",
"icu_locale_core",
"icu_properties_data",
"icu_provider",
"zerotrie",
"zerovec",
]
[[package]]
name = "icu_properties_data"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e2bbb201e0c04f7b4b3e14382af113e17ba4f63e2c9d2ee626b720cbce54a14"
[[package]]
name = "icu_provider"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "139c4cf31c8b5f33d7e199446eff9c1e02decfc2f0eec2c8d71f65befa45b421"
dependencies = [
"displaydoc",
"icu_locale_core",
"writeable",
"yoke",
"zerofrom",
"zerotrie",
"zerovec",
]
[[package]]
name = "id-arena"
version = "2.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3d3067d79b975e8844ca9eb072e16b31c3c1c36928edf9c6789548c524d0d954"
[[package]]
name = "idna"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3b0875f23caa03898994f6ddc501886a45c7d3d62d04d2d90788d47be1b1e4de"
dependencies = [
"idna_adapter",
"smallvec",
"utf8_iter",
]
[[package]]
name = "idna_adapter"
version = "1.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb68373c0d6620ef8105e855e7745e18b0d00d3bdb07fb532e434244cdb9a714"
dependencies = [
"icu_normalizer",
"icu_properties",
]
[[package]]
name = "if-addrs"
version = "0.15.0"
@@ -2022,6 +2150,12 @@ version = "0.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "32a66949e030da00e8c7d4434b251670a91556f4144941d37452769c25d58a53"
[[package]]
name = "litemap"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "92daf443525c4cce67b150400bc2316076100ce0b3686209eb8cf3c31612e6f0"
[[package]]
name = "lock_api"
version = "0.4.14"
@@ -2116,6 +2250,16 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "68354c5c6bd36d73ff3feceb05efa59b6acb7626617f4962be322a825e61f79a"
[[package]]
name = "miniz_oxide"
version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316"
dependencies = [
"adler2",
"simd-adler32",
]
[[package]]
name = "mio"
version = "1.2.1"
@@ -2548,6 +2692,15 @@ dependencies = [
"universal-hash",
]
[[package]]
name = "potential_utf"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0103b1cef7ec0cf76490e969665504990193874ea05c85ff9bab8b911d0a0564"
dependencies = [
"zerovec",
]
[[package]]
name = "powerfmt"
version = "0.2.0"
@@ -2728,6 +2881,7 @@ dependencies = [
"rand 0.8.6",
"rcgen",
"reis",
"roxmltree",
"rsa",
"rusqlite",
"rustls",
@@ -2741,6 +2895,7 @@ dependencies = [
"tower",
"tracing",
"tracing-subscriber",
"ureq",
"utoipa",
"utoipa-axum",
"utoipa-scalar",
@@ -2752,6 +2907,7 @@ dependencies = [
"wayland-scanner",
"windows 0.62.2 (registry+https://github.com/rust-lang/crates.io-index)",
"windows-service",
"winreg",
"x509-parser",
"xkbcommon",
]
@@ -3054,6 +3210,15 @@ dependencies = [
"windows-sys 0.52.0",
]
[[package]]
name = "roxmltree"
version = "0.21.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1964b10c76125c36f8afe190065a4bf9a87bf324842c05701330bba9f1cacbb"
dependencies = [
"memchr",
]
[[package]]
name = "rpkg-config"
version = "0.1.2"
@@ -3555,6 +3720,12 @@ dependencies = [
"rand_core 0.6.4",
]
[[package]]
name = "simd-adler32"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "703d5c7ef118737c72f1af64ad2f6f8c5e1921f818cdcb97b8fe6fc69bf66214"
[[package]]
name = "siphasher"
version = "1.0.3"
@@ -3637,6 +3808,12 @@ dependencies = [
"wasm-bindgen",
]
[[package]]
name = "stable_deref_trait"
version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596"
[[package]]
name = "strsim"
version = "0.11.1"
@@ -3789,6 +3966,16 @@ dependencies = [
"time-core",
]
[[package]]
name = "tinystr"
version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8323304221c2a851516f22236c5722a72eaa19749016521d6dff0824447d96d"
dependencies = [
"displaydoc",
"zerovec",
]
[[package]]
name = "tinytemplate"
version = "1.2.1"
@@ -4094,6 +4281,40 @@ version = "0.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ecb6da28b8a351d773b68d5825ac39017e680750f980f3a1a85cd8dd28a47c1"
[[package]]
name = "ureq"
version = "2.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "02d1a66277ed75f640d608235660df48c8e3c19f3b4edb6a263315626cc3c01d"
dependencies = [
"base64",
"flate2",
"log",
"once_cell",
"rustls",
"rustls-pki-types",
"url",
"webpki-roots 0.26.11",
]
[[package]]
name = "url"
version = "2.5.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ff67a8a4397373c3ef660812acab3268222035010ab8680ec4215f38ba3d0eed"
dependencies = [
"form_urlencoded",
"idna",
"percent-encoding",
"serde",
]
[[package]]
name = "utf8_iter"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6c140620e7ffbb22c2dee59cafe6084a59b5ffc27a8859a5f0d494b5d52b6be"
[[package]]
name = "utf8parse"
version = "0.2.2"
@@ -4421,6 +4642,24 @@ dependencies = [
"rustls-pki-types",
]
[[package]]
name = "webpki-roots"
version = "0.26.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "521bc38abb08001b01866da9f51eb7c5d647a19260e00054a8c7fd5f9e57f7a9"
dependencies = [
"webpki-roots 1.0.8",
]
[[package]]
name = "webpki-roots"
version = "1.0.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf85cb06032201fa7c6f829d7db5a7e5aa45bcc0655327713065f6f0576731bf"
dependencies = [
"rustls-pki-types",
]
[[package]]
name = "wide"
version = "0.7.33"
@@ -4946,6 +5185,16 @@ dependencies = [
"memchr",
]
[[package]]
name = "winreg"
version = "0.56.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7d6f32a0ff4a9f6f01231eb2059cc85479330739333e0e58cadf03b6af2cca10"
dependencies = [
"cfg-if",
"windows-sys 0.61.2",
]
[[package]]
name = "wit-bindgen"
version = "0.51.0"
@@ -5040,6 +5289,12 @@ dependencies = [
"wasmparser",
]
[[package]]
name = "writeable"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1ffae5123b2d3fc086436f8834ae3ab053a283cfac8fe0a0b8eaae044768a4c4"
[[package]]
name = "x509-parser"
version = "0.16.0"
@@ -5083,6 +5338,29 @@ dependencies = [
"time",
]
[[package]]
name = "yoke"
version = "0.8.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "709fe23a0424b6a435d82152b1bd3fdfb0833487d5fa90d05d42762a9891fef5"
dependencies = [
"stable_deref_trait",
"yoke-derive",
"zerofrom",
]
[[package]]
name = "yoke-derive"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "de844c262c8848816172cef550288e7dc6c7b7814b4ee56b3e1553f275f1858e"
dependencies = [
"proc-macro2",
"quote",
"syn",
"synstructure",
]
[[package]]
name = "zbus"
version = "5.16.0"
@@ -5159,12 +5437,66 @@ dependencies = [
"syn",
]
[[package]]
name = "zerofrom"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ec05a11813ea801ff6d75110ad09cd0824ddba17dfe17128ea0d5f68e6c5272"
dependencies = [
"zerofrom-derive",
]
[[package]]
name = "zerofrom-derive"
version = "0.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "11532158c46691caf0f2593ea8358fed6bbf68a0315e80aae9bd41fbade684a1"
dependencies = [
"proc-macro2",
"quote",
"syn",
"synstructure",
]
[[package]]
name = "zeroize"
version = "1.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b97154e67e32c85465826e8bcc1c59429aaaf107c1e4a9e53c8d8ccd5eff88d0"
[[package]]
name = "zerotrie"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0f9152d31db0792fa83f70fb2f83148effb5c1f5b8c7686c3459e361d9bc20bf"
dependencies = [
"displaydoc",
"yoke",
"zerofrom",
]
[[package]]
name = "zerovec"
version = "0.11.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "90f911cbc359ab6af17377d242225f4d75119aec87ea711a880987b18cd7b239"
dependencies = [
"yoke",
"zerofrom",
"zerovec-derive",
]
[[package]]
name = "zerovec-derive"
version = "0.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "625dc425cab0dca6dc3c3319506e6593dcb08a9f387ea3b284dbd52a92c40555"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "zmij"
version = "1.0.21"
+14 -3
View File
@@ -25,6 +25,14 @@ aes-gcm = "0.10"
cbc = { version = "0.1", features = ["alloc"] }
rand = "0.8"
hex = "0.4"
# Cover-art delivery in the game library: encode Lutris's local JPEGs into `data:` URLs and decode
# the Epic launcher's base64 `catcache.bin`. Cross-platform (Linux Lutris art + Windows Epic art).
base64 = "0.22"
# Blocking HTTP for the library cover-art warmer (no-auth GOG api.gog.com + Xbox displaycatalog),
# run on a background thread off the hot path. `ureq` is small + sync (no tokio here) and bundles
# webpki roots (no system cert dependency). Cross-platform so the fetch/parse code is compiled +
# checked everywhere even though only the Windows GOG/Xbox providers need it today.
ureq = "2"
rcgen = { version = "0.13", default-features = false, features = ["aws_lc_rs", "pem"] }
x509-parser = "0.16"
axum-server = { version = "0.7", features = ["tls-rustls"] }
@@ -89,9 +97,6 @@ serde_json = "1"
# SQLite (cc, already needed for ffmpeg/opus) so there's no system libsqlite3 runtime dependency —
# clean for the deb/rpm/flatpak packaging. Opened read-only/immutable (Lutris may hold it open).
rusqlite = { version = "0.40", features = ["bundled"] }
# Inline Lutris's local cover-art JPEGs as `data:` URLs in the library (Lutris has no public CDN
# keyed by a stable id, unlike Steam/Heroic; a `data:` URL is self-contained — no host-served endpoint).
base64 = "0.22"
# Builds/validates the xkb keymap uploaded to the virtual keyboard + tracks modifier state.
xkbcommon = "0.8"
# The safe `opus` crate is stereo-only; surround (5.1/7.1) needs the libopus *multistream*
@@ -176,6 +181,12 @@ windows = { version = "0.62", features = [
# handler / ServiceManager install). Wraps the Win32 service API; the supervision loop itself uses
# the `windows` crate above.
windows-service = "0.7"
# Read the GOG.com install registry (HKLM\SOFTWARE\WOW6432Node\GOG.com\Games) for the GOG store
# provider — ergonomic + correct-by-construction vs. hand-rolled Reg* FFI for subkey enumeration.
winreg = "0.56"
# Parse each Xbox/Game-Pass game's MicrosoftGame.config (GDK manifest XML) for the Xbox store
# provider — a small read-only DOM is all we need (Identity/Executable/ShellVisuals/StoreId).
roxmltree = "0.21"
# Software H.264 encoder (GPU-less path + NVENC fallback). The default `source` feature statically
# compiles OpenH264 (BSD-2) — no system lib, builds on MSVC; nasm on PATH adds the SIMD fast path.
openh264 = "0.9"
@@ -13,6 +13,9 @@
//! when the client isn't talking. WASAPI objects are `!Send`, so they live entirely on that thread
//! (mirrors `WasapiLoopbackCapturer`).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{VirtualMic, SAMPLE_RATE};
use anyhow::{anyhow, Context, Result};
use std::collections::VecDeque;
@@ -154,6 +157,13 @@ fn find_or_install_device() -> Result<wasapi::Device> {
Ok(d) => Ok(d),
Err(e) => {
tracing::info!("no virtual mic device present — attempting auto-install");
// SAFETY: `try_install_virtual_mic` is `unsafe` only because it `LoadLibraryExW`s
// `newdev.dll` and calls `DiInstallDriverW` through a `transmute`d function pointer;
// calling it imposes no extra precondition here (it takes no args and aliases nothing).
// Its internal contract holds: the `DiInstall` type matches the documented
// `BOOL DiInstallDriverW(HWND, PCWSTR, DWORD, PBOOL)` ABI, and it passes a
// NUL-terminated UTF-16 INF path with null/zero optional args. Invoked once on the
// dedicated mic thread.
if unsafe { try_install_virtual_mic() } {
find_device()
} else {
+9
View File
@@ -2,6 +2,10 @@
//! CPU-copy fallback (the portal delivers a CPU buffer; the encoder uploads it to the GPU
//! internally). Zero-copy dmabuf→NVENC import is deferred (plan §9 risk).
// Every unsafe block in this module tree carries a `// SAFETY:` proof; enforce it (unsafe-proof
// program). As a parent module this also covers the child modules (capture::windows/linux::*).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::Result;
/// Packed pixel layout of a [`CapturedFrame`]. The ScreenCast portal negotiates the
@@ -433,6 +437,11 @@ pub fn capture_virtual_output(
// DDA is the safety net (+ the secure-desktop path). The encode thread is set MTA so the WGC
// objects built on the watchdog thread (also MTA) are usable here; the keepalive is handed to WGC
// only on success, else to DDA. A hung watchdog thread is abandoned (holds no keepalive).
// SAFETY: `RoInitialize` is a combase FFI call that initializes the WinRT apartment for the calling
// thread. It takes the `RO_INIT_MULTITHREADED` enum by value and borrows no memory, so there is no
// pointer/lifetime/aliasing obligation; it is safe on any thread and idempotent — a second call on a
// thread already in a compatible apartment returns S_FALSE / RPC_E_CHANGED_MODE, which we discard.
// Runs on the encode thread that goes on to use the WGC (WinRT) objects built by the watchdog thread.
unsafe {
let _ = windows::Win32::System::WinRT::RoInitialize(
windows::Win32::System::WinRT::RO_INIT_MULTITHREADED,
@@ -17,6 +17,9 @@
//! instead of leaking it to process exit. The portal thread (when used) still parks on its zbus
//! connection until process exit.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{CapturedFrame, Capturer, DmabufFrame, FramePayload, PixelFormat};
use anyhow::{anyhow, Context, Result};
use std::os::fd::OwnedFd;
@@ -498,6 +501,12 @@ mod pipewire {
impl DmabufMap {
fn new(fd: i32, len: usize) -> Option<DmabufMap> {
// SAFETY: a null `addr` lets the kernel choose the mapping address; `fd` is a caller-owned
// dmabuf/MemFd fd, valid for the duration of this call, and `len` is the requested map length.
// `mmap` reads no Rust memory — it installs a fresh PROT_READ/MAP_SHARED page mapping and
// returns its base (or MAP_FAILED, checked below before `DmabufMap` adopts it). The returned
// region is a brand-new VMA, so it aliases no live Rust object, and it keeps the underlying
// object mapped independently of `fd` (which may be closed after this returns).
let ptr = unsafe {
libc::mmap(
std::ptr::null_mut(),
@@ -514,6 +523,11 @@ mod pipewire {
impl Drop for DmabufMap {
fn drop(&mut self) {
// SAFETY: `self.ptr`/`self.len` are exactly the base+length of a successful `mmap` in
// `DmabufMap::new` (constructed only when `ptr != MAP_FAILED`). This `DmabufMap` uniquely owns
// that mapping and `drop` runs once, so `munmap` releases a live mapping exactly once — no
// double-unmap. Every `&[u8]` derived from the mapping is bounded by this `DmabufMap`'s
// lifetime, so no borrow outlives the unmap.
unsafe {
libc::munmap(self.ptr, self.len);
}
@@ -719,6 +733,14 @@ mod pipewire {
if !ud.active.load(Ordering::Relaxed) {
return;
}
// SAFETY: `spa_buf` is the `*mut spa_buffer` of the PipeWire buffer we dequeued and still hold for
// this `.process` callback (not requeued until after `consume_frame` returns), so it is live. The
// block null-checks `spa_buf`, requires `n_datas != 0`, and null-checks the `datas` array pointer
// before forming any slice. `(*spa_buf).datas` points to `n_datas` libspa `spa_data` structs, and
// `pw::spa::buffer::Data` is `#[repr(transparent)]` over `spa_data` (the same cast
// `Buffer::datas_mut` performs — see the function doc), so the pointer cast + length describe
// exactly that array, in bounds. The PipeWire loop is single-threaded and owns the buffer here, so
// this `&mut` slice is the only reference to it (no aliasing/data race).
let datas: &mut [pw::spa::buffer::Data] = unsafe {
if spa_buf.is_null() || (*spa_buf).n_datas == 0 || (*spa_buf).datas.is_null() {
&mut []
@@ -783,6 +805,10 @@ mod pipewire {
// dup the fd so it survives the SPA buffer recycle — the encode thread
// imports it. (Content stability across the brief map+CSC window relies on
// the compositor's buffer-pool depth, like any zero-copy capture.)
// SAFETY: `datas[0].fd()` is the dmabuf fd owned by the live PipeWire buffer (valid
// for this callback). `fcntl(fd, F_DUPFD_CLOEXEC, 0)` reads only the integer fd,
// touches no Rust memory, and returns a fresh independent CLOEXEC duplicate (or -1).
// The original stays owned by PipeWire; the dup is a new fd we own (checked >= 0).
let dup =
unsafe { libc::fcntl(datas[0].fd() as i32, libc::F_DUPFD_CLOEXEC, 0) };
if dup >= 0 {
@@ -796,6 +822,10 @@ mod pipewire {
pts_ns,
format: fmt,
payload: FramePayload::Dmabuf(DmabufFrame {
// SAFETY: `dup` is the fresh fd `fcntl(F_DUPFD_CLOEXEC)` just returned
// (checked `dup >= 0`); nothing else owns it, so `OwnedFd` takes sole
// ownership and closes it exactly once on drop — no alias, no
// double-close.
fd: unsafe { OwnedFd::from_raw_fd(dup) },
fourcc,
modifier: ud.modifier,
@@ -930,6 +960,11 @@ mod pipewire {
// cleanly if the real buffer is genuinely too small. MemPtr buffers (no fd) are same-process —
// trust `d.data()`.
let fd_len = if raw_fd > 0 {
// SAFETY: `libc::stat` is a C plain-old-data struct for which all-zero is a valid value, so
// `mem::zeroed()` is a sound initializer. `raw_fd` is the buffer's fd (`> 0` checked here) and
// valid for this callback; `fstat` writes metadata into `&mut st`, a live, aligned,
// correctly-sized stack `stat` that outlives the synchronous call. `st.st_size` is read only
// after the return value is confirmed `== 0`. `st` is a fresh local, so nothing aliases it.
unsafe {
let mut st: libc::stat = std::mem::zeroed();
(libc::fstat(raw_fd as i32, &mut st) == 0 && st.st_size > 0)
@@ -946,6 +981,14 @@ mod pipewire {
match DmabufMap::new(raw_fd as i32, map_len) {
Some(m) => {
_mapping = m;
// SAFETY: `_mapping` is the `DmabufMap` just stored; its `ptr`/`len` come from a
// successful `mmap` of `map_len` PROT_READ bytes, so `ptr` is non-null, page-aligned,
// and the VMA is one allocated object of `len` bytes valid for reads. In the common
// path `map_len == fd_len` (the fd's real size from `fstat`), so the mapping spans the
// whole object; the de-pad copy below is further bounded by the `offset <= buf.len()`
// and `needed > avail` guards. The `&[u8]` borrows `_mapping`, which lives to the end
// of `consume_frame`, so the slice never outlives the mapping, and the memory is only
// read here, so there is no aliasing/mutation.
Some(unsafe {
std::slice::from_raw_parts(_mapping.ptr as *const u8, _mapping.len)
})
@@ -1177,24 +1220,43 @@ mod pipewire {
// Latest-frame-only (OBS pattern): Mutter delivers buffers in bursts and
// recycles its pool; an older queued buffer carries a STALE frame. Drain all
// queued buffers, requeue the older ones, keep only the newest.
// SAFETY: `stream` is the live stream PipeWire passes into this `.process` callback on
// the loop thread, where `pw_stream_dequeue_buffer` is the documented call. It returns
// a `*mut pw_buffer` owned by the stream (or null when the queue is drained),
// null-checked before any use. The loop is single-threaded, so no concurrent access.
let mut newest = unsafe { stream.dequeue_raw_buffer() };
if newest.is_null() {
return;
}
let mut drained = 1u32;
loop {
// SAFETY: same stream/loop-thread contract as the dequeue above; each call returns
// the next stream-owned `*mut pw_buffer` or null (null-checked before use).
let next = unsafe { stream.dequeue_raw_buffer() };
if next.is_null() {
break;
}
// SAFETY: `newest` is a non-null `*mut pw_buffer` previously dequeued from this same
// stream and not yet requeued; `pw_stream_queue_buffer` hands ownership back to the
// stream. We immediately overwrite `newest = next`, so the requeued pointer is never
// touched again (no use-after-requeue). Loop thread, single-threaded.
unsafe { stream.queue_raw_buffer(newest) };
newest = next;
drained += 1;
}
// SAFETY: `newest` is the non-null buffer we still own (dequeued, not requeued);
// `.buffer` is a `*mut spa_buffer` field libpipewire populated. This is a single field
// load through a valid pointer — no mutation or aliasing.
let spa_buf = unsafe { (*newest).buffer };
// Inspect the newest buffer's header + first chunk for the diagnostic and the
// CORRUPTED skip. SPA_META_Header is optional — `hdr` may be null.
// SAFETY: `spa_buf` is the `*mut spa_buffer` of the buffer we still hold.
// `spa_buffer_find_meta_data` scans that buffer's metadata array for a `SPA_META_Header`
// of at least `size_of::<spa_meta_header>()` bytes and returns a pointer into the held
// buffer's metadata (or null). The size argument matches the struct the result is cast
// to, and the pointer stays valid as long as the buffer is held (until requeue). Null is
// handled below.
let hdr = unsafe {
spa::sys::spa_buffer_find_meta_data(
spa_buf,
@@ -1205,11 +1267,20 @@ mod pipewire {
let hdr_flags = if hdr.is_null() {
0u32
} else {
// SAFETY: reached only when `hdr` is non-null; it points to a `spa_meta_header`
// inside the live buffer's metadata (returned for a size >=
// `size_of::<spa_meta_header>()`, so `.flags` is in bounds). A single field read
// while the buffer is still held.
unsafe { (*hdr).flags }
};
// First data chunk's size + flags (used for the diagnostic + CORRUPTED check)
// and its data type (a dmabuf legitimately reports chunk size 0, so the size-0
// stale skip only applies to mappable SHM buffers).
// SAFETY: every dereference is guarded in order before any field read — `spa_buf`
// non-null, `n_datas > 0`, the `datas` (`*mut spa_data`) array non-null, and the first
// element's `chunk` (`*mut spa_chunk`) non-null. `d0` is that first `spa_data` and `c`
// its chunk; reading `(*d0).type_`, `(*c).size`, `(*c).flags` are in-bounds field loads
// of libspa structs inside the buffer we still hold. Single-threaded loop, no mutation.
let (chunk_size, chunk_flags, is_dmabuf) = unsafe {
if !spa_buf.is_null()
&& (*spa_buf).n_datas > 0
@@ -1246,11 +1317,17 @@ mod pipewire {
"capture: skipped a stale CORRUPTED/cursor buffer (GNOME)"
);
}
// SAFETY: `newest` is the non-null buffer we own (dequeued, never requeued on this
// skip path); hand it back to the stream exactly once and return without touching it
// again. Loop thread inside `.process`.
unsafe { stream.queue_raw_buffer(newest) };
return;
}
consume_frame(ud, spa_buf);
// SAFETY: `consume_frame` has finished reading `spa_buf` (and the `datas` borrows derived
// from `newest`), so requeuing the owned `newest` exactly once here is sound — no
// use-after-requeue. Loop thread inside `.process`.
unsafe { stream.queue_raw_buffer(newest) };
}));
if outcome.is_err() {
@@ -15,6 +15,9 @@
//! composed while a session is live). Effectiveness can be build/driver-dependent; gated by
//! `PUNKTFUNK_FORCE_COMPOSED` (default ON; set =0 to disable).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use windows::core::w;
@@ -48,6 +51,10 @@ impl ForceComposedFlip {
let st = stop.clone();
std::thread::Builder::new()
.name("composed-flip".into())
// SAFETY: `run` is this module's `unsafe fn` (it owns a desktop+window lifecycle via Win32
// FFI); it takes ownership of `st` (the stop `Arc<AtomicBool>`) and has no caller-side memory
// precondition. It is designed to own its thread for its whole duration — exactly the
// dedicated `composed-flip` thread spawned here.
.spawn(move || unsafe { run(st) })
.ok()?;
tracing::info!("force-composed-flip overlay started (Winlogon-aware)");
@@ -62,6 +69,9 @@ impl Drop for ForceComposedFlip {
}
extern "system" fn wndproc(hwnd: HWND, msg: u32, wp: WPARAM, lp: LPARAM) -> LRESULT {
// SAFETY: this is the window procedure the OS invokes with the window's own `hwnd` and a real
// message `(msg, wp, lp)`. `DefWindowProcW` performs default processing for exactly those
// parameters (all passed straight through by value); it borrows no Rust memory and is synchronous.
unsafe { DefWindowProcW(hwnd, msg, wp, lp) }
}
@@ -7,6 +7,9 @@
//! desktop's NAME (WTS session notifications miss UAC entirely, so the name is the reliable signal)
//! and publishes it as an atomic the capture mux + input path read.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::sync::Arc;
use std::time::Duration;
@@ -33,6 +36,10 @@ impl DesktopWatcher {
// mux) sees the real state immediately. Otherwise a session that begins already on the secure
// desktop (e.g. a reconnect to a locked box) would read DESKTOP_NORMAL for the first poll
// interval and relay one stale normal-desktop frame — the "flash of the login screen" bug.
// SAFETY: `is_secure_desktop` is this module's `unsafe fn` — unsafe only because it calls Win32
// desktop FFI (`OpenInputDesktop`/`GetUserObjectInformationW`/`CloseDesktop`), with no caller
// precondition; it opens, names, and closes the input-desktop handle internally and is safe to
// call from any thread (here, on the thread running `DesktopWatcher::start`).
let initial = if unsafe { is_secure_desktop() } {
DESKTOP_SECURE
} else {
@@ -53,6 +60,9 @@ impl DesktopWatcher {
let mut candidate = initial;
let mut stable = 0u32;
while !st.load(Ordering::Relaxed) {
// SAFETY: same as in `start` — `is_secure_desktop` is self-contained Win32 desktop
// FFI with no caller precondition, called here on the dedicated `desktop-watch`
// polling thread.
let v = if unsafe { is_secure_desktop() } {
DESKTOP_SECURE
} else {
@@ -7,6 +7,9 @@
//! Validates only with a real GPU + an *activated* SudoVDA monitor (`DuplicateOutput` needs a live
//! WDDM output). Compiles on the GPU-less VM; the pure helpers are unit-tested there.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{CapturedFrame, Capturer, FramePayload, PixelFormat};
use anyhow::{anyhow, bail, Context, Result};
use std::ffi::c_void;
@@ -69,7 +72,12 @@ pub struct D3d11Frame {
pub texture: ID3D11Texture2D,
pub device: ID3D11Device,
}
// COM pointers, used only from the single owning thread.
// SAFETY: `D3d11Frame` owns an `ID3D11Texture2D` + `ID3D11Device`, which are COM interface pointers.
// D3D11 devices/resources use thread-safe (interlocked) COM reference counting, and the device is
// created free-threaded (`make_device` passes no `D3D11_CREATE_DEVICE_SINGLETHREADED`), so handing
// ownership of the frame to another thread — the capture→encode handoff — and releasing it there is
// sound. The value is moved, never aliased (no `Sync`), so there is no concurrent use of the
// single-threaded immediate context.
unsafe impl Send for D3d11Frame {}
pub fn pack_luid(luid: LUID) -> i64 {
@@ -295,6 +303,12 @@ unsafe fn d3dkmt_set_scheduling_priority_class(
fn elevate_process_gpu_priority() {
use std::sync::Once;
static ONCE: Once = Once::new();
// SAFETY: the closure calls two of this module's `unsafe fn`s — `enable_inc_base_priority`
// (adjusts the current-process token; it has no caller precondition and builds all its FFI args
// locally) and `d3dkmt_set_scheduling_priority_class` (loads gdi32 by name and calls the export).
// The latter requires `process` to be a valid process handle; `GetCurrentProcess()` returns the
// current-process pseudo-handle, which is always valid and needs no close. Runs once via
// `Once::call_once`; no raw pointers are dereferenced here.
ONCE.call_once(|| unsafe {
use windows::Win32::System::Threading::GetCurrentProcess;
let Some(prio) = configured_gpu_priority_class() else {
@@ -538,6 +552,17 @@ unsafe extern "system" fn hybrid_query_hook(gpu_preference: *mut u32) -> i32 {
pub(crate) fn install_gpu_pref_hook() {
use std::sync::Once;
static HOOK: Once = Once::new();
// SAFETY: this one-time hook install only touches a region it has just validated.
// `LoadLibraryA("win32u.dll")` + `GetProcAddress("NtGdiDdDDIGetCachedHybridQueryValue")` yield the
// live base of the real exported function, so `target` is a valid executable code pointer to at
// least the 12 bytes the patch overwrites (an x64 prologue, per Apollo's verified hook). The two
// `ptr::copy_nonoverlapping`s each move exactly 12 bytes between the 12-byte stack arrays
// (`patch`/`readback`) and `target`, which `VirtualProtect(target, 12, PAGE_EXECUTE_READWRITE, …)`
// has just made writable (and is restored to `old` after) — source and dest never overlap (stack
// vs. loaded module image), so every access stays in mapped, in-bounds memory.
// `FlushInstructionCache` gets the current-process pseudo-handle + that same range. The DPI calls
// take by-value context handles / fill the live local `&mut old`/`&mut restore` for the duration of
// each synchronous call. Runs once via `Once::call_once`, before any DXGI use.
HOOK.call_once(|| unsafe {
use windows::Win32::System::LibraryLoader::{GetProcAddress, LoadLibraryA};
use windows::Win32::System::Memory::{
@@ -1389,6 +1414,14 @@ pub fn hdr_p010_selftest() -> Result<()> {
}
}
// SAFETY: this self-test creates its own D3D11 device + immediate context (`D3D11CreateDevice`,
// both checked non-null) and uses ONLY that device for the rest of the block: every
// `CreateTexture2D`/`CreateShaderResourceView`/`HdrP010Converter::{new,convert}`/`CopyResource`/
// `Map` is invoked on that device or its context, so all resources share one device and run on this
// single thread. The source texture's `D3D11_SUBRESOURCE_DATA` points at `fp16`, a live
// `Vec<u16>` of `W*H*4` samples with `SysMemPitch = W*8`, matching the W×H R16G16B16A16 texture;
// `fp16` outlives the synchronous `CreateTexture2D` that reads it. The mapped-pointer reads are
// proven individually at the `read_u16` closure below.
unsafe {
// Hardware D3D11 device (no adapter pin — the default GPU is fine for the self-test).
let mut device: Option<ID3D11Device> = None;
@@ -2038,7 +2071,11 @@ pub struct DuplCapturer {
dbg_cursor: u64,
_keepalive: Box<dyn Send>,
}
// COM objects used only from the one thread that owns the capturer (the encode thread).
// SAFETY: `DuplCapturer` holds D3D11 device/context/duplication COM pointers plus plain data. The
// device is created free-threaded (`make_device` sets no `D3D11_CREATE_DEVICE_SINGLETHREADED`) and
// COM reference counting is interlocked, so moving ownership of the whole capturer to another thread
// is sound. It is used by exactly one thread (the encode thread) at a time — moved to it once, never
// shared (no `Sync`) — so the single-threaded immediate context is never touched concurrently.
unsafe impl Send for DuplCapturer {}
impl DuplCapturer {
@@ -2051,6 +2088,13 @@ impl DuplCapturer {
gpu: bool,
want_hdr: bool,
) -> Result<Self> {
// SAFETY: runs on the capture thread that will own this `DuplCapturer`. `install_gpu_pref_hook()`
// and the DPI-context calls take by-value handles / no args and touch only thread/process state;
// `SetThreadExecutionState` takes a flags bitmask by value. `CreateDXGIFactory1` yields a live
// `IDXGIFactory1`, and every subsequent COM method (`EnumAdapters1`/`EnumOutputs`/`GetDesc1`/
// `GetDesc`/`cast`) is called on that factory or on an adapter/output it returned — each obtained
// through a checked `while let Ok(..)`/`?` — all from this one thread. No raw pointers are
// dereferenced; the borrowed strings/locals outlive each synchronous call.
unsafe {
// Stop DXGI hybrid-GPU output reparenting BEFORE we create the factory / enumerate outputs
// (the cause of the 0x887A0026 ACCESS_LOST churn on this hybrid box: RTX 4090 + AMD iGPU).
@@ -3207,6 +3251,11 @@ impl Capturer for DuplCapturer {
// the duplication up to 12 s). Better a few seconds of frozen-last-frame than dropping the stream.
let mut deadline = Instant::now() + Duration::from_secs(20);
loop {
// SAFETY: `acquire` is an `unsafe fn` because it drives the D3D11 immediate context + the
// output duplication, which must be touched only from the capturer's owning thread.
// `next_frame` runs on that one thread — `DuplCapturer` is `Send` but not `Sync`, so it is
// owned by a single (encode) thread for its whole life — and `&mut self` gives exclusive
// access for the call, satisfying that contract.
if let Some(f) = unsafe { self.acquire() }? {
self.ever_got_frame = true;
return Ok(f);
@@ -3253,6 +3302,8 @@ impl Capturer for DuplCapturer {
}
fn try_latest(&mut self) -> Result<Option<CapturedFrame>> {
// SAFETY: as in `next_frame` — `acquire` must run on the capturer's single owning thread, and
// `try_latest` is called on it (`DuplCapturer` is `Send`, not `Sync`); `&mut self` is exclusive.
unsafe { self.acquire() }
}
@@ -3264,11 +3315,19 @@ impl Capturer for DuplCapturer {
impl Drop for DuplCapturer {
fn drop(&mut self) {
if self.holding_frame {
// SAFETY: `self.dupl` is the live `IDXGIOutputDuplication` this capturer created and owns;
// `ReleaseFrame` is a valid COM method on it, called only when `holding_frame` records that a
// frame was acquired and not yet released (so it is not an unbalanced release). Drop runs on
// whichever thread owns the capturer — its sole owner, since it is `!Sync` — and the `&`
// borrow of the duplication outlives this synchronous call.
unsafe {
let _ = self.dupl.as_ref().map(|d| d.ReleaseFrame());
}
}
// Release the display/system-required execution state we took at open().
// SAFETY: `SetThreadExecutionState` is a Win32 FFI call taking an execution-state flag bitmask
// by value (`ES_CONTINUOUS` clears the display/system-required state taken at open); it borrows
// no Rust memory and is safe to call from any thread.
unsafe {
SetThreadExecutionState(ES_CONTINUOUS);
}
@@ -10,7 +10,10 @@
//! [`pf_driver_proto::frame`] (which OWNS the contract, with `const` size asserts) — both sides
//! `use` it, so drift is a compile error rather than a "must match" comment.
use super::dxgi::{make_device, D3d11Frame, HdrConverter, WinCaptureTarget};
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::dxgi::{make_device, D3d11Frame, HdrP010Converter, VideoConverter, WinCaptureTarget};
use super::{CapturedFrame, Capturer, FramePayload, PixelFormat};
use anyhow::{bail, Context, Result};
use pf_driver_proto::frame;
@@ -20,13 +23,12 @@ use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
use windows::core::{w, Interface, HSTRING};
use windows::Win32::Foundation::{HANDLE, INVALID_HANDLE_VALUE, LUID};
use windows::Win32::Graphics::Direct3D11::{
ID3D11Device, ID3D11DeviceContext, ID3D11RenderTargetView, ID3D11ShaderResourceView,
ID3D11Texture2D, D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE,
D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX, D3D11_RESOURCE_MISC_SHARED_NTHANDLE,
D3D11_TEXTURE2D_DESC, D3D11_USAGE_DEFAULT,
ID3D11Device, ID3D11DeviceContext, ID3D11ShaderResourceView, ID3D11Texture2D,
D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX,
D3D11_RESOURCE_MISC_SHARED_NTHANDLE, D3D11_TEXTURE2D_DESC, D3D11_USAGE_DEFAULT,
};
use windows::Win32::Graphics::Dxgi::Common::{
DXGI_FORMAT, DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_R10G10B10A2_UNORM,
DXGI_FORMAT, DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_NV12, DXGI_FORMAT_P010,
DXGI_FORMAT_R16G16B16A16_FLOAT, DXGI_SAMPLE_DESC,
};
use windows::Win32::Graphics::Dxgi::{
@@ -205,21 +207,33 @@ pub struct IddPushCapturer {
/// cleared when a fresh frame resumes. If it stays set past the recovery window, `try_consume` drops
/// the session (recover-or-drop, no DDA).
recovering_since: Option<Instant>,
/// Host-owned ROTATING output ring NVENC encodes (texture + RTV per slot). Rotating it per frame is
/// the precondition for pipelining the encode loop: while NVENC encodes frame N's texture on the
/// ASIC, frame N+1's convert/copy writes a DIFFERENT texture on the 3D engine — the two overlap. The
/// HDR convert and the SDR copy both write into the current slot. Format = `out_format()` (Rgb10a2 in
/// HDR, Bgra in SDR); rebuilt on a display-mode flip. Built lazily.
out_ring: Vec<(ID3D11Texture2D, ID3D11RenderTargetView)>,
/// Host-owned ROTATING output ring NVENC encodes (one YUV texture per slot). Rotating it per frame
/// is the precondition for pipelining the encode loop: while NVENC encodes frame N's texture on the
/// ASIC, frame N+1's convert writes a DIFFERENT texture — the two overlap. Format = `out_format()`:
/// NV12 (SDR, BT.709 limited) or P010 (HDR, BT.2020 PQ limited), so NVENC takes native YUV and skips
/// its internal RGB→YUV CSC on the SM/3D engine the game saturates (plan §5.A). Rebuilt on a
/// display-mode flip. Built lazily.
out_ring: Vec<ID3D11Texture2D>,
out_idx: usize,
/// FP16 scRGB → `Rgb10a2` BT.2020 PQ converter, used while the display is HDR. Built lazily.
hdr_conv: Option<HdrConverter>,
/// BGRA slot → NV12 (BT.709 limited) on the dedicated D3D11 VIDEO engine, used while the display is
/// SDR — keeps the colour-convert OFF the contended 3D/compute engine. Built lazily; rebuilt on a
/// size/HDR flip.
video_conv: Option<VideoConverter>,
/// FP16 scRGB slot → P010 (BT.2020 PQ limited) via two shader passes, used while the display is HDR
/// (NVIDIA's VideoProcessor can't do RGB→P010). The passes run on the 3D engine, but it still skips
/// NVENC's internal SM-side CSC. Built lazily.
hdr_p010_conv: Option<HdrP010Converter>,
last_seq: u64,
last_present: Option<(ID3D11Texture2D, PixelFormat)>,
status_logged: bool,
_keepalive: Box<dyn Send>,
}
// COM objects used only from the owning (encode) thread.
// SAFETY: `IddPushCapturer` is `!Send` only because of its `*mut SharedHeader`/`*mut DebugBlock` raw
// pointers (and the COM interfaces). It is created, used, and dropped by a SINGLE thread — the owning
// capture/encode thread — never shared: the `ID3D11DeviceContext` is the device's IMMEDIATE context
// (single-threaded by D3D11 contract) and is only ever touched from that thread, and the header/
// dbg_block pointers (into mappings this struct owns) are only dereferenced there. `Send` transfers
// ownership to one thread at a time with NO concurrent access; we do not (and must not) claim `Sync`.
unsafe impl Send for IddPushCapturer {}
/// Build a permissive (Everyone:GenericAll) `SECURITY_ATTRIBUTES` so the restricted WUDFHost driver
@@ -337,6 +351,9 @@ impl IddPushCapturer {
// a fullscreen game can hold the virtual display at a different mode (esp. across a reconnect), so
// matching the actual mode lets the first frame flow instead of being dropped (game-capture bug
// GB1). Falls back to the negotiated mode when the CCD read is unavailable.
// SAFETY: `active_resolution` is an `unsafe fn` (Win32 CCD `QueryDisplayConfig`) that takes only a
// copy of the plain `u32` CCD target id and returns owned `(w, h)` values; it forms no borrows from
// us and validates the id internally, returning `None` on any failure (handled by `unwrap_or`).
let (w, h) =
unsafe { crate::win_display::active_resolution(target.target_id) }.unwrap_or((pw, ph));
if (w, h) != (pw, ph) {
@@ -355,6 +372,27 @@ impl IddPushCapturer {
// PROACTIVELY enable advanced color so HDR streams without the user toggling anything; an
// SDR-only client leaves the display alone (and still gets a tone-mapped picture, never a freeze,
// if the user does enable HDR).
// SAFETY: one block over the whole ring setup; every operation in it is sound:
// - `set_advanced_color`/`advanced_color_enabled` are `unsafe fn`s taking only a copy of the plain
// `u32` target id; they read/flip CCD display config and return owned values, borrowing nothing.
// - `CreateDXGIFactory1`, `EnumAdapterByLuid`, `make_device`, `permissive_sa`, `CreateFileMappingW`,
// `MapViewOfFile`, `CreateEventW`, and `create_ring_slots` are all `?`-checked, so every returned
// interface/handle/view is non-error before use; `&sa`/`&adapter`/`&device`/the `&HSTRING` names
// are live borrows that outlive each synchronous call, and `sa.lpSecurityDescriptor` stays valid
// because its backing `_psd` is held in scope for the whole block.
// - The header mapping is created AND viewed at `bytes == size_of::<SharedHeader>().max(64)`; the
// view's null is checked (`bail!` on failure, after which the owned `map` closes the mapping). The
// OS view base is page-aligned, so `section.ptr::<SharedHeader>()` is suitably aligned for a
// `SharedHeader`, and `write_bytes(.., 0, bytes)` plus the `(*header).field = ..` writes all stay
// within those `bytes` and write THROUGH the raw pointer without forming any `&mut`. The debug
// section is the same pattern at `dbg_bytes == size_of::<DebugBlock>()`, only entered when its
// own view is non-null.
// - The `magic` publish stores through `addr_of!((*header).magic) as *const AtomicU32`: `addr_of!`
// takes the field address without a reference; the field is a 4-aligned `u32` (valid for
// `AtomicU32`), and the `Release` store after the `Release` fence is the cross-process handshake
// that orders all preceding writes before the driver may observe `MAGIC`.
// - `header`/`dbg_block` point into the OS mappings, NOT into the `MappedSection` structs, so moving
// `section`/`dbg_section` into `me` leaves them valid (see the `MappedSection` doc comment).
unsafe {
// If we ENABLE advanced color for a 10-bit client, trust it (the driver will compose FP16) and
// size the ring FP16 directly — don't race the advanced_color_enabled poll, which may not have
@@ -504,7 +542,8 @@ impl IddPushCapturer {
recovering_since: None,
out_ring: Vec::new(),
out_idx: 0,
hdr_conv: None,
video_conv: None,
hdr_p010_conv: None,
last_seq: 0,
last_present: None,
status_logged: false,
@@ -534,10 +573,16 @@ impl IddPushCapturer {
fn wait_for_attach(&self) -> Result<()> {
let deadline = Instant::now() + Duration::from_secs(4);
loop {
// Plain read: the driver writes this u32; an aligned u32 read can't tear (same access as
// SAFETY: `self.header` points into the live shared-header mapping this capturer owns (sized
// `>= size_of::<SharedHeader>()`, page-aligned), so the field read is in-bounds + aligned, and
// no reference into the shared region is formed. Plain read: the driver writes this `u32`
// cross-process, but an aligned `u32` read can't tear and `driver_status` is best-effort
// diagnostics — the real handshake is the atomic `magic`/`latest` (same access as
// log_driver_status_once).
let st = unsafe { (*self.header).driver_status };
if matches!(st, DRV_STATUS_TEX_FAIL | DRV_STATUS_NO_DEVICE1) {
// SAFETY: as above — an in-bounds, aligned `u32` read of a best-effort diagnostic field
// through the owned, live header mapping; no reference into the shared region is formed.
let detail = unsafe { (*self.header).driver_status_detail };
bail!(
"IDD-push driver failed to attach (driver_status={st} detail=0x{detail:08x} — \
@@ -560,6 +605,10 @@ impl IddPushCapturer {
#[inline]
fn latest(&self) -> u64 {
// SAFETY: `self.header` is the live, owned shared-header mapping (page-aligned, sized for a
// `SharedHeader`). `addr_of!((*self.header).latest)` forms the address of the `latest` field
// WITHOUT a reference; it is an 8-aligned `u64` (so valid for `AtomicU64`), and the `Acquire` load
// is the consumer half of the cross-process publish handshake (pairs with the driver's `Release`).
unsafe {
(*(std::ptr::addr_of!((*self.header).latest) as *const AtomicU64))
.load(Ordering::Acquire)
@@ -571,6 +620,10 @@ impl IddPushCapturer {
if self.status_logged {
return;
}
// SAFETY: four in-bounds, aligned reads of the live, owned shared-header mapping. The driver writes
// these `u32`/`i32` diagnostic fields cross-process, but aligned word reads can't tear and these are
// best-effort status (the real handshake is the atomic `magic`/`latest`); no `&`/`&mut` reference
// into the shared region is formed.
let (status, detail, lo, hi) = unsafe {
(
(*self.header).driver_status,
@@ -610,6 +663,11 @@ impl IddPushCapturer {
tracing::warn!("IDD push DEBUG: no debug block");
return;
}
// SAFETY: `self.dbg_block` was just checked non-null (the early return above); it points into the
// owned `dbg_section` mapping sized exactly `size_of::<DebugBlock>()` and page-aligned, so it is
// valid + aligned for `DebugBlock`. `d` is a short-lived SHARED reference used only to read the
// fields below; we never form `&mut` into this region, and the driver's cross-process writes are
// aligned `u32`s that don't tear (best-effort bring-up diagnostics).
let d = unsafe { &*self.dbg_block };
tracing::error!(
run_core_entries = d.run_core_entries,
@@ -625,16 +683,17 @@ impl IddPushCapturer {
);
}
/// The output texture format + the [`PixelFormat`] it presents as, driven SOLELY by the DISPLAY's
/// HDR state (like the WGC path): HDR → `Rgb10a2` BT.2020 PQ → NVENC Main10, and the client
/// auto-detects PQ from the HEVC VUI; SDR → 8-bit `Bgra`. We do NOT gate HDR on the client's
/// advertised `VIDEO_CAP_10BIT` — clients under-report it (e.g. the Mac advertises 10-bit only when
/// its OWN display is HDR), yet all decode Main10 + auto-switch, exactly as on the WGC path.
/// The output texture format + the [`PixelFormat`] NVENC encodes, driven SOLELY by the DISPLAY's HDR
/// state (like the WGC path): HDR → `P010` (BT.2020 PQ 10-bit limited) → NVENC Main10, and the client
/// auto-detects PQ from the HEVC VUI; SDR → `Nv12` (BT.709 8-bit limited). Both are native YUV so
/// NVENC skips its internal RGB→YUV CSC on the contended SM (plan §5.A). We do NOT gate HDR on the
/// client's advertised `VIDEO_CAP_10BIT` — clients under-report it (e.g. the Mac advertises 10-bit
/// only when its OWN display is HDR), yet all decode Main10 + auto-switch, exactly as on the WGC path.
fn out_format(&self) -> (DXGI_FORMAT, PixelFormat) {
if self.display_hdr {
(DXGI_FORMAT_R10G10B10A2_UNORM, PixelFormat::Rgb10a2)
(DXGI_FORMAT_P010, PixelFormat::P010)
} else {
(DXGI_FORMAT_B8G8R8A8_UNORM, PixelFormat::Bgra)
(DXGI_FORMAT_NV12, PixelFormat::Nv12)
}
}
@@ -658,6 +717,10 @@ impl IddPushCapturer {
self.height = new_h;
let fmt = self.ring_format();
let new_gen = IDD_GENERATION.fetch_add(1, Ordering::Relaxed);
// SAFETY: `create_ring_slots` is an `unsafe fn` (it makes D3D11/DXGI COM calls); we pass a live
// borrow of `self.device` (the capturer's own device, on which the slots are created) plus plain
// `u32`/`DXGI_FORMAT` values, and `?` propagates any failure before the slots are used. Every
// returned slot's texture + keyed mutex belongs to that same `self.device`.
let new_slots = unsafe {
Self::create_ring_slots(
&self.device,
@@ -668,6 +731,12 @@ impl IddPushCapturer {
fmt,
)?
};
// SAFETY: `self.header` is the live, owned shared-header mapping (page-aligned, sized for a
// `SharedHeader`). The `latest`/`generation` stores go through `addr_of!`-formed field pointers (no
// references) of correctly-aligned `u64`/`u32` fields, valid for `AtomicU64`/`AtomicU32`; the
// `dxgi_format`/`width`/`height` writes are in-bounds raw writes through the pointer (no `&mut`).
// The `Release` fence + the `Release` `generation` store publish all preceding writes so the driver
// only re-attaches (`Acquire`) once the new textures + format are in place.
unsafe {
// Clear `latest` to the 0 sentinel (generation 0, which try_consume rejects). The real guard
// against consuming an unwritten new-ring slot is the generation tag in `latest`: a stale
@@ -688,6 +757,8 @@ impl IddPushCapturer {
self.generation = new_gen;
self.last_seq = 0;
self.out_ring.clear(); // the output format changed → rebuild lazily at the new format
self.video_conv = None; // converters are sized + HDR-specific → rebuild at the new mode
self.hdr_p010_conv = None;
self.out_idx = 0;
self.last_present = None;
Ok(())
@@ -701,9 +772,13 @@ impl IddPushCapturer {
return;
}
self.last_acm_poll = Instant::now();
// SAFETY: `advanced_color_enabled` is an `unsafe fn` taking only a copy of the plain `u32` target
// id; it performs a read-only CCD query and returns an owned `bool`, borrowing nothing from us.
let now_hdr = unsafe { crate::win_display::advanced_color_enabled(self.target_id) };
// Follow the display's ACTUAL resolution too — a fullscreen game can mode-set the virtual display
// out from under the negotiated size (game-capture bug GB1). Unknown read → keep our current size.
// SAFETY: `active_resolution` is an `unsafe fn` taking only a copy of the plain `u32` target id; it
// performs a read-only CCD query and returns owned `(w, h)` values, borrowing nothing from us.
let (now_w, now_h) = unsafe { crate::win_display::active_resolution(self.target_id) }
.unwrap_or((self.width, self.height));
if now_hdr == self.display_hdr && now_w == self.width && now_h == self.height {
@@ -742,31 +817,46 @@ impl IddPushCapturer {
Quality: 0,
},
Usage: D3D11_USAGE_DEFAULT,
BindFlags: (D3D11_BIND_RENDER_TARGET.0 | D3D11_BIND_SHADER_RESOURCE.0) as u32,
// RENDER_TARGET: the VIDEO processor (NV12) and the P010 shader passes both write here, and
// NVENC registers it as encode input — matching the WGC YUV ring.
BindFlags: D3D11_BIND_RENDER_TARGET.0 as u32,
CPUAccessFlags: 0,
MiscFlags: 0,
};
for _ in 0..OUT_RING {
let mut t: Option<ID3D11Texture2D> = None;
let mut rtv: Option<ID3D11RenderTargetView> = None;
// SAFETY: `CreateTexture2D` is called on `self.device` (the capturer's live D3D11 device);
// `&desc` is a fully-initialized stack `D3D11_TEXTURE2D_DESC`, the data arg is `None` (no
// initial data), and `Some(&mut t)` is a live out-parameter the call fills. `?` rejects a failed
// HRESULT before `t` is unwrapped, and the created texture belongs to `self.device`.
unsafe {
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D(IDD out ring)")?;
let t = t.context("null out-ring texture")?;
self.device
.CreateRenderTargetView(&t, None, Some(&mut rtv))
.context("CreateRenderTargetView(IDD out ring)")?;
self.out_ring.push((t, rtv.context("null out-ring rtv")?));
self.out_ring.push(t.context("null out-ring texture")?);
}
}
Ok(())
}
/// Build the HDR converter if not already built (HDR-display path only — an SDR display is a copy).
/// Build the per-mode YUV converter if not already built: a VIDEO-engine BGRA→NV12 processor on an
/// SDR display, or the FP16→P010 shader on an HDR display. Both keep NVENC's RGB→YUV CSC off the SM.
fn ensure_converter(&mut self) -> Result<()> {
if self.hdr_conv.is_none() {
self.hdr_conv = Some(unsafe { HdrConverter::new(&self.device)? });
if self.display_hdr {
if self.hdr_p010_conv.is_none() {
// SAFETY: `HdrP010Converter::new` is `unsafe` (it compiles D3D11 shaders + creates
// resources); we pass a live borrow of `self.device`, the device the converter's resources
// belong to, and `?` propagates any failure before the converter is stored.
self.hdr_p010_conv = Some(unsafe { HdrP010Converter::new(&self.device)? });
}
} else if self.video_conv.is_none() {
// SAFETY: `VideoConverter::new` is `unsafe` (it sets up the D3D11 VIDEO processor); we pass live
// borrows of `self.device` + its immediate `self.context` (single-threaded, this thread) plus
// plain `u32` dimensions, and `?` propagates any failure before it is stored. The converter's
// resources belong to that same device/context.
self.video_conv = Some(unsafe {
VideoConverter::new(&self.device, &self.context, self.width, self.height, false)?
});
}
Ok(())
}
@@ -801,16 +891,11 @@ impl IddPushCapturer {
return Ok(None);
}
self.ensure_out_ring()?;
// Build the HDR converter BEFORE acquiring the slot so nothing between Acquire and Release can
// Build the converter BEFORE acquiring the slot so nothing between Acquire and Release can
// `?`-return and leak the keyed-mutex lock (which would stall the driver on that slot).
if self.display_hdr {
self.ensure_converter()?;
}
self.ensure_converter()?;
let i = self.out_idx;
let (out, out_rtv) = {
let (t, rtv) = &self.out_ring[i];
(t.clone(), rtv.clone())
};
let out = self.out_ring[i].clone();
let (_, pf) = self.out_format();
// Hold the slot's keyed mutex only across the convert/copy into the host out-ring (NOT across the
@@ -824,16 +909,27 @@ impl IddPushCapturer {
let Some(_lock) = KeyedMutexGuard::acquire(&s.mutex, 0, 8) else {
return Ok(None);
};
// SAFETY: convert/copy on the owning (encode) thread's immediate context, holding the slot lock.
// SAFETY: convert on the owning (encode) thread's immediate context, holding the slot lock.
// A `?` here is leak-safe: `_lock` (the KeyedMutexGuard) drops on the early return, releasing
// the slot back to the driver.
unsafe {
if self.display_hdr {
// Sample the FP16 slot's SRV directly (no scratch copy) → BT.2020 PQ Rgb10a2.
if let Some(conv) = self.hdr_conv.as_ref() {
conv.convert(&self.context, &s.srv, &out_rtv, self.width, self.height);
// HDR: FP16 slot SRV → P010 (BT.2020 PQ) via the shader; NVENC takes native P010.
if let Some(conv) = self.hdr_p010_conv.as_ref() {
conv.convert(
&self.device,
&self.context,
&s.srv,
&out,
self.width,
self.height,
)?;
}
} else {
// SDR: the slot is already 8-bit BGRA — one copy into the out-ring (hidden by pipelining).
self.context.CopyResource(&out, &s.tex);
// SDR: BGRA slot → NV12 on the VIDEO engine; NVENC takes native NV12, no SM-side CSC.
if let Some(conv) = self.video_conv.as_ref() {
conv.convert(&s.tex, &out)?;
}
}
}
// `_lock` drops here → `ReleaseSync(0)`.
@@ -861,7 +957,7 @@ impl IddPushCapturer {
// OUT_RING(3) > the max pipeline_depth(2) guarantees the rotated slot is not in flight.
let (src, pf) = self.last_present.clone()?;
let i = self.out_idx;
let dst = self.out_ring.get(i)?.0.clone();
let dst = self.out_ring.get(i)?.clone();
// SAFETY: GPU copy on the owning thread's immediate context; src/dst are our out-ring textures of
// identical format/size (src is a previous out-ring slot; dst the next).
unsafe {
@@ -922,6 +1018,8 @@ pub fn spawn_observer(target: WinCaptureTarget, preferred: Option<(u32, u32, u32
/// The discrete render GPU LUID (where NVENC runs), falling back to the monitor's `OsAdapterLuid`.
fn resolve_render_adapter_luid_or(fallback_packed: i64) -> LUID {
// SAFETY: `resolve_render_adapter_luid` is an `unsafe fn` (it enumerates DXGI adapters) that takes no
// arguments and returns an owned `Option<LUID>`, borrowing nothing.
if let Some(l) = unsafe { crate::win_adapter::resolve_render_adapter_luid() } {
return l;
}
@@ -935,6 +1033,9 @@ impl Capturer for IddPushCapturer {
fn next_frame(&mut self) -> Result<CapturedFrame> {
let deadline = Instant::now() + Duration::from_secs(20);
loop {
// SAFETY: `self.event` is the live frame-ready `OwnedHandle` this capturer owns; its raw value
// (borrowed for the call, so it outlives this synchronous wait) is a valid auto-reset event
// handle. `WaitForSingleObject` only reads the handle; the 16 ms timeout bounds the wait.
let _ = unsafe { WaitForSingleObject(HANDLE(self.event.as_raw_handle()), 16) };
if let Some(f) = self.try_consume()? {
return Ok(f);
@@ -944,6 +1045,9 @@ impl Capturer for IddPushCapturer {
}
if Instant::now() > deadline {
self.log_debug_block();
// SAFETY: four in-bounds, aligned reads of the live, owned shared-header mapping — the same
// best-effort diagnostic fields as `log_driver_status_once` (aligned word reads can't tear;
// no reference into the shared region is formed).
let (st, detail, lo, hi) = unsafe {
(
(*self.header).driver_status,
@@ -16,6 +16,9 @@
//! Limitation: WGC cannot capture the secure desktop (lock / UAC / login) — the caller falls back to
//! the DDA backend ([`super::dxgi::DuplCapturer`]) for those (see capture.rs).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::dxgi::{
find_output, hdr_shader_p010_enabled, make_device, nudge_cursor_onto, D3d11Frame, HdrConverter,
HdrP010Converter, VideoConverter, WinCaptureTarget,
@@ -92,6 +95,10 @@ struct Deimpersonate(Option<HANDLE>);
impl Drop for Deimpersonate {
fn drop(&mut self) {
if let Some(tok) = self.0.take() {
// SAFETY: `RevertToSelf` takes no arguments and undoes the thread impersonation set during
// WGC activation; `tok` is the impersonation token `HANDLE` from `impersonate_active_user`,
// owned by this `Deimpersonate` and closed exactly once here (taken out of the `Option`, so
// no double-close). Both are FFI calls borrowing no Rust memory.
unsafe {
let _ = RevertToSelf();
let _ = CloseHandle(tok);
@@ -174,7 +181,12 @@ pub struct WgcCapturer {
_keepalive: Option<Box<dyn Send>>,
}
// COM + WinRT pointers; confined to the single owning (encode) thread, like DuplCapturer.
// SAFETY: like `DuplCapturer`. `WgcCapturer` holds D3D11 (free-threaded device/context) plus WGC WinRT
// objects (`Direct3D11CaptureFramePool` etc., created free-threaded via `CreateFreeThreaded`). COM/WinRT
// reference counting is interlocked, and the capturer is owned + used by exactly one encode thread,
// moved to it once and never shared (no `Sync`), so transferring ownership across threads is sound. The
// free-threaded `FrameArrived` callback touches only the `Arc<WgcSignal>` (itself `Send + Sync`), not
// the capturer's COM fields.
unsafe impl Send for WgcCapturer {}
impl WgcCapturer {
@@ -182,6 +194,15 @@ impl WgcCapturer {
/// [`attach_keepalive`](Self::attach_keepalive) only after open succeeds, so a failure leaves the
/// keepalive with the caller to hand to the DDA fallback.
pub fn open(target: WinCaptureTarget, preferred: Option<(u32, u32, u32)>) -> Result<Self> {
// SAFETY: runs on the thread opening the WGC session. `RoInitialize` inits this thread's WinRT
// apartment (idempotent; result ignored). `impersonate_active_user()` and `find_output()` are
// this module's `unsafe fn`s whose contracts (call on the activating thread; pass a GDI name)
// are met, and the impersonation is reverted by `_deimp`'s Drop on every return path. Every
// COM/WinRT call thereafter operates on an object obtained + `?`-checked earlier in this same
// block on this single thread — the `IDXGIOutput1` from `find_output`, the device/context from
// `make_device`, the factory/interop/item/pool/session — and the `TypedEventHandler` closure
// captures an `Arc<WgcSignal>` (Send+Sync) by move. No raw pointers are dereferenced; borrowed
// locals outlive their synchronous calls.
unsafe {
// WGC is WinRT — the calling thread needs a COM/WinRT apartment for the GraphicsCaptureItem
// activation factory (RoGetActivationFactory). Initialize MTA; ignore "already initialized"
@@ -585,6 +606,15 @@ impl WgcCapturer {
}
fn process_frame(&mut self, frame: Direct3D11CaptureFrame) -> Result<CapturedFrame> {
// SAFETY: runs on the capturer's single owning thread. `frame` is a live
// `Direct3D11CaptureFrame` from `self.pool`; `frame.Surface().cast::<IDirect3DDxgiInterfaceAccess
// >().GetInterface()` yields the frame's backing `ID3D11Texture2D`, which belongs to
// `self.device` (the pool was created on it via `CreateDirect3D11DeviceFromDXGIDevice`). Every
// helper called here — `hdr_to_p010`, `convert_to_yuv`, `ensure_fp16_src`, `ensure_out_ring`,
// `HdrConverter::convert`, `CopyResource`, `CreateRenderTargetView` — operates on
// `self.device`/`self.context` and that same-device texture, so all resources share one device.
// The frame is held in `self.held` until its async GPU read completes for the zero-copy paths.
// Single-threaded immediate-context use; borrowed textures/SRVs/RTVs outlive each synchronous call.
unsafe {
let surface = frame.Surface().context("frame Surface")?;
let access: IDirect3DDxgiInterfaceAccess = surface
@@ -13,6 +13,9 @@
//! Wire framing (must match `wgc_helper::write_au`): per AU
//! `[u32 magic "PFAU" LE][u32 len LE][u64 pts_ns LE][u8 keyframe][len bytes data]`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::capture::dxgi::WinCaptureTarget;
use anyhow::{bail, Context, Result};
use std::io::{BufRead, BufReader, Read};
@@ -56,9 +59,15 @@ pub struct HelperRelay {
rx: Receiver<RelayAu>,
}
// HANDLEs are just kernel handle values; we own them for the relay's lifetime and close them on Drop.
// SAFETY: every field is itself `Send`: the `proc`/`thread` `HANDLE`s are process-global kernel
// handle values (plain integers valid from any thread, owned for the relay's lifetime and closed once
// on Drop), `stdin_w` is a `Mutex<HANDLE>`, and `rx` is an mpsc `Receiver<RelayAu>` (which is `Send`).
// The relay is moved to one thread and owned there, so transferring it across threads is sound.
unsafe impl Send for HelperRelay {}
unsafe impl Sync for HelperRelay {}
// NOTE: `HelperRelay` is deliberately NOT `Sync`. Its `rx: Receiver<RelayAu>` is `!Sync` (std mpsc
// is single-consumer), and the relay is only ever a single-owner local in the punktfunk1 two-process
// mux loop — never shared by `&` across threads — so `Sync` is neither sound nor needed. (A prior
// `unsafe impl Sync` here asserted more than the fields support; removed.)
/// Control byte on the helper's stdin: force the next encoded frame to be an IDR (client decode
/// recovery). Mirrors `enc.request_keyframe()` in the single-process path.
@@ -84,6 +93,10 @@ impl HelperRelay {
);
tracing::info!(cmd = %cmdline, "spawning WGC helper in user session");
// SAFETY: `spawn_inner` is an `unsafe fn` only because it drives raw Win32 token/pipe/process
// FFI; it imposes no caller-side memory precondition beyond valid arguments. `cmdline` is a live
// `&str` borrowed for the synchronous call and `(w, h, hz)` are plain `u32`s. It validates its
// own runtime requirements (active console session, SYSTEM token) and returns `Err` otherwise.
unsafe { spawn_inner(&cmdline, w, h, hz) }
}
@@ -108,6 +121,11 @@ impl HelperRelay {
pub fn request_keyframe(&self) {
let h = self.stdin_w.lock().unwrap();
let mut written = 0u32;
// SAFETY: `*h` is the host's write end of the helper's stdin pipe — a live `HANDLE` owned by
// this `HelperRelay` (held under the `stdin_w` Mutex, locked here), closed only in Drop.
// `WriteFile` reads the 1-byte `&[CTL_KEYFRAME]` buffer and writes the byte count into
// `written`; both are live locals that outlive the synchronous call. A failure (helper gone) is
// discarded as documented.
unsafe {
let _ = windows::Win32::Storage::FileSystem::WriteFile(
*h,
@@ -121,6 +139,10 @@ impl HelperRelay {
impl Drop for HelperRelay {
fn drop(&mut self) {
// SAFETY: `self.proc`/`self.thread` are the child process/thread `HANDLE`s from
// `CreateProcessAsUserW`, and `stdin_w` is the host's pipe write end — all owned by this
// `HelperRelay` and closed exactly once here in Drop (no double-close). `TerminateProcess` and
// the three `CloseHandle`s are FFI calls taking those handles by value, borrowing no Rust memory.
unsafe {
// Terminate the child first so its WGC capture + NVENC session tear down, then close our
// handles (the reader threads end on the resulting broken pipe).
@@ -364,10 +386,17 @@ fn au_reader(mut r: HandleReader, tx: SyncSender<RelayAu>) {
/// Minimal `Read` over a Win32 pipe HANDLE (the windows crate doesn't impl `Read` on HANDLE).
struct HandleReader(HANDLE);
// SAFETY: `HandleReader` owns a single pipe `HANDLE` (a process-global kernel handle value, valid from
// any thread). It is moved into the dedicated reader thread and used only there (and closed once on
// Drop), never shared — so transferring ownership across threads is sound.
unsafe impl Send for HandleReader {}
impl Read for HandleReader {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
let mut read = 0u32;
// SAFETY: `self.0` is the live read end of an anonymous pipe owned by this `HandleReader`
// (closed only in Drop). `ReadFile` fills the caller-provided `buf` (writing at most `buf.len()`
// bytes) and stores the count in `read`; both outlive the synchronous call. A broken pipe
// surfaces as `Err` and is mapped to EOF below.
let ok = unsafe {
windows::Win32::Storage::FileSystem::ReadFile(self.0, Some(buf), Some(&mut read), None)
};
@@ -380,6 +409,8 @@ impl Read for HandleReader {
}
impl Drop for HandleReader {
fn drop(&mut self) {
// SAFETY: `self.0` is the pipe `HANDLE` this `HandleReader` owns; `CloseHandle` (an FFI call
// taking the handle by value) is invoked exactly once here in Drop, so there is no double-close.
unsafe {
let _ = CloseHandle(self.0);
}
@@ -391,6 +422,13 @@ impl Drop for HandleReader {
pub fn running_as_system() -> bool {
use windows::Win32::Security::{GetTokenInformation, TokenUser, TOKEN_QUERY, TOKEN_USER};
use windows::Win32::System::Threading::{GetCurrentProcess, OpenProcessToken};
// SAFETY: `OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &mut token)` opens the current-process
// token (the pseudo-handle is always valid) into `token`, which is closed once before each return.
// The first `GetTokenInformation` (null buffer) queries the required `len`; `buf` is then a
// `Vec<u8>` of exactly `len` bytes and the second call fills it, so `&*(buf.as_ptr() as *const
// TOKEN_USER)` reads a `TOKEN_USER` the kernel just wrote into a sufficiently-sized buffer (the
// variable-length SID it points at also lies within `buf`, which outlives the borrow).
// `is_local_system_sid` is this module's `unsafe fn`, given that in-buffer `PSID`. Safe on any thread.
unsafe {
let mut token = HANDLE::default();
if OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &mut token).is_err() {
+11
View File
@@ -3,6 +3,9 @@
//! RGB→YUV on the GPU, so no host-side CSC) and VAAPI on AMD/Intel (`*_vaapi`; the CPU-input
//! fallback swscales RGB→NV12, the zero-copy path imports the capture dmabuf straight into a
//! VA surface). One [`Encoder`] trait, selected in [`open_video`].
// Every unsafe block in this module tree carries a `// SAFETY:` proof; enforce it (unsafe-proof
// program). As a parent module this also covers the child modules (encode::windows/linux::*).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::capture::{CapturedFrame, PixelFormat};
use anyhow::Result;
@@ -505,6 +508,14 @@ fn windows_gpu_vendor() -> Option<GpuVendor> {
CreateDXGIFactory1, IDXGIFactory1, DXGI_ADAPTER_FLAG_SOFTWARE,
};
static CACHE: OnceLock<Option<GpuVendor>> = OnceLock::new();
// SAFETY: `CreateDXGIFactory1` returns a fresh owned `IDXGIFactory1` COM object (refcounted by the
// windows-rs wrapper, Released when the local drops); `.ok()?` bails on failure so `factory` is a
// valid interface before any use. `EnumAdapters1(i)` hands back the i-th adapter as an owned
// `IDXGIAdapter1` (or an error past the last adapter, which ends the loop). `GetDesc1()` returns the
// `DXGI_ADAPTER_DESC1` by value (no out-pointer), so reading `desc.Flags`/`desc.VendorId` is plain
// field access. Every call only touches COM objects this closure owns; the `OnceLock` runs the
// closure once (no data race) and all interfaces are Released as the locals drop. No raw pointer is
// dereferenced and nothing is aliased.
*CACHE.get_or_init(|| unsafe {
let factory: IDXGIFactory1 = CreateDXGIFactory1().ok()?;
let mut i = 0u32;
@@ -8,6 +8,8 @@
//! does *not* accept — we expand it to `rgb0` (one padding byte/pixel, no colour math).
//! The encoder is opened *without* a global header so VPS/SPS/PPS are emitted in-band on
//! every IDR — the output is both a playable raw Annex-B stream and self-contained AUs.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder};
use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
@@ -79,6 +81,12 @@ impl CudaHw {
impl Drop for CudaHw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `CudaHw::new` created
// (it bails before returning `Self` if either alloc fails, so a live `CudaHw` always holds
// both). `av_buffer_unref` drops one reference and nulls the pointer through the `&mut`. This
// `Drop` runs exactly once and `CudaHw` owns these refs exclusively → no double-free /
// use-after-free. Frames are unref'd before the device (the frames ctx internally refs the
// device; refcounted, so the order is sound regardless).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
@@ -136,6 +144,13 @@ pub struct NvencEncoder {
// `CudaHw` holds raw `AVBufferRef`s; the encoder lives on a single thread. The CPU encoder is
// already `Send` via ffmpeg-next; assert it for the CUDA fields too.
// SAFETY: `NvencEncoder` owns an ffmpeg-next `Encoder`/`VideoFrame` (already `Send`) plus a `CudaHw`
// holding raw `AVBufferRef`s, which are not `Send` by default. The encoder is owned and driven by
// exactly ONE thread — the per-session encode thread it is moved to — and is only touched through
// `&mut self` methods, so it is never aliased or accessed concurrently. The wrapped libav contexts
// (and the shared `CUcontext` the `CudaHw` references) have no thread affinity, so transferring
// ownership across threads is sound. This asserts `Send` (transfer) only, extending ffmpeg-next's
// existing `Send` to the raw CUDA fields; `Sync` (shared `&`) is deliberately NOT implemented.
unsafe impl Send for NvencEncoder {}
impl NvencEncoder {
@@ -162,6 +177,9 @@ impl NvencEncoder {
}
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `av_log_set_level` sets libav's global integer log level; `48` (= AV_LOG_DEBUG)
// is a valid level with no pointer args, and libav was just initialized by `ffmpeg::init()`
// above — always sound.
unsafe { ffi::av_log_set_level(48) }; // AV_LOG_DEBUG — surface NVENC hw-frame rejects
}
let name = codec.nvenc_name();
@@ -195,6 +213,11 @@ impl NvencEncoder {
.unwrap_or(1.0);
let vbv_bits = ((bitrate_bps as f64 / fps.max(1) as f64) * vbv_frames as f64)
.clamp(1.0, i32::MAX as f64);
// SAFETY: `video` is the ffmpeg-next encoder builder wrapping a freshly-allocated
// `AVCodecContext` that we hold by value and have not opened yet; `video.as_mut_ptr()` returns
// that non-null, properly-aligned, exclusively-owned context. Writing the plain `rc_buffer_size`
// int field before `open_with` is the supported way to set a field ffmpeg-next exposes no
// setter for. Sole owner → no aliasing; synchronous in-bounds scalar write.
unsafe {
(*video.as_mut_ptr()).rc_buffer_size = vbv_bits as i32;
}
@@ -204,6 +227,9 @@ impl NvencEncoder {
// "freeze". NVENC emits one IDR at stream start, then P-frames only; `forced-idr` (below)
// turns a client recovery request (RFI, via `request_keyframe`) into an IDR on demand.
// This is the Moonlight/Sunshine low-latency model.
// SAFETY: same `video` builder as above — a non-null, properly-aligned, sole-owned, not-yet-
// opened `AVCodecContext`. We write the plain `gop_size` int field (= -1, infinite GOP) before
// `open_with`, which ffmpeg-next has no setter for. No aliasing; synchronous scalar write.
unsafe {
(*video.as_mut_ptr()).gop_size = -1;
}
@@ -214,6 +240,10 @@ impl NvencEncoder {
// RGB-input paths leave these unset (NVENC's internal CSC writes its own VUI). Matches the
// Windows NV12 path's BT.709 limited-range signalling.
if matches!(format, PixelFormat::Nv12) {
// SAFETY: same `video` builder — `raw = video.as_mut_ptr()` is the non-null, properly-
// aligned, sole-owned, not-yet-opened `AVCodecContext`. We set its four VUI colour enum
// fields to valid `AVColorSpace`/`AVColorRange`/`AVColorPrimaries`/`AVColorTransfer-
// Characteristic` variants before `open_with`. Sole owner → no aliasing; synchronous writes.
unsafe {
let raw = video.as_mut_ptr();
(*raw).colorspace = ffi::AVColorSpace::AVCOL_SPC_BT709;
@@ -228,7 +258,17 @@ impl NvencEncoder {
// *before* open (NVENC derives the device from `hw_frames_ctx`).
let cuda_hw = if cuda {
let cu_ctx = crate::zerocopy::cuda::context().context("shared CUDA context")?;
// SAFETY: `CudaHw::new` (an `unsafe fn`) requires libav initialized (the `ffmpeg::init()`
// above ran) and a valid `CUcontext`; `cu_ctx` is the shared importer context from
// `zerocopy::cuda::context()?`, non-null on the `Ok` path. `nvenc_pixel` is a valid `Pixel`
// and `width`/`height` are the validated positive dims. It returns a RAII `CudaHw` wrapping
// (not owning) `cu_ctx` and owning two `AVBufferRef`s freed on drop.
let hw = unsafe { CudaHw::new(cu_ctx, nvenc_pixel, width, height)? };
// SAFETY: `raw = video.as_mut_ptr()` is the non-null, sole-owned, not-yet-opened
// `AVCodecContext`. We set `pix_fmt = CUDA` and attach NEW refs (`av_buffer_ref`) of
// `hw.device_ref`/`hw.frames_ref` — both non-null (`CudaHw::new` guarantees) and from the
// live `hw`, which is moved into `NvencEncoder.cuda` next to `enc` and so outlives the
// encoder. The context owns its own refs (freed when the context closes). No aliasing.
unsafe {
let raw = video.as_mut_ptr();
(*raw).pix_fmt = ffi::AVPixelFormat::AV_PIX_FMT_CUDA;
@@ -428,6 +468,19 @@ impl NvencEncoder {
// The device→device copy below uses our shared context directly; make it current on the
// encode thread (ffmpeg pushes its own around the pool alloc, so order is fine).
crate::zerocopy::cuda::make_current().context("CUDA context current (encode thread)")?;
// SAFETY: `frames_ref` is the non-null CUDA frames ctx from `self.cuda` (unwrapped via
// `.context(..)?` above), and the shared CUDA context was just made current on THIS thread
// (`make_current()?`), the precondition for the device-pointer copies below.
// * `av_frame_alloc` → `f` (null-checked). `av_hwframe_get_buffer(frames_ref, f, 0)` fills `f`
// with a pooled CUDA surface (sets `data[]`/`linesize[]`/`buf[0]`/`hw_frames_ctx`); on
// failure we free `f` and bail.
// * For NV12 we read `(*f).data[0..2]` / `linesize[0..2]` (Y + interleaved UV), else
// `data[0]`/`linesize[0]` — in-struct fields of the non-null `f`, valid for the surface dims
// ffmpeg allocated — and pass them to the cuda copy helpers, which device→device copy `buf`
// (the imported `DeviceBuffer`, owned by the caller and live for this call) into the surface.
// * On copy error we free `f` and return. Otherwise we write `pts`/`pict_type` through `f` and
// `avcodec_send_frame` it into the live owned `self.enc` context (which takes its own ref of
// the pooled surface), then free our `f` ref exactly once. Single-threaded encoder → no race.
unsafe {
let mut f = ffi::av_frame_alloc();
if f.is_null() {
+148 -3
View File
@@ -19,6 +19,8 @@
//! hwdevice/hwframes/buffersrc/buffersink calls go through `ffmpeg::ffi` (= `ffmpeg_sys_next`),
//! as the CUDA encode path and the clients' decode paths already do. The encoder is opened
//! *without* a global header, so VPS/SPS/PPS are in-band on every IDR.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder};
use crate::capture::{CapturedFrame, DmabufFrame, FramePayload, PixelFormat};
@@ -133,6 +135,14 @@ pub fn probe_can_encode(codec: Codec) -> bool {
if ffmpeg::init().is_err() {
return false;
}
// SAFETY: `ffmpeg::init()` returned Ok above, so libav is initialized. `av_log_get_level`/
// `av_log_set_level` only read/write libav's global integer log level (no pointer args) and are
// always sound to call post-init. `VaapiHw::new` (an `unsafe fn`) builds a VAAPI device + NV12
// frames pool from the literal NV12/640x480/pool=2 args and hands back a RAII handle that unrefs
// both `AVBufferRef`s on drop. `open_vaapi_encoder` (an `unsafe fn`) borrows `hw.device_ref`/
// `hw.frames_ref` — the two non-null refs `VaapiHw::new` just created — and `av_buffer_ref`s them
// into the encoder; `hw` is a live local for the whole match arm, so the borrows outlive the
// synchronous call, and both `hw` and the probe encoder are dropped (RAII) when the arm ends.
unsafe {
// A missing VA device (non-VAAPI host, GPU-less CI) is an expected probe outcome — quiet
// ffmpeg's "No VA display found" error for the probe, then restore the level.
@@ -224,6 +234,12 @@ impl VaapiHw {
impl Drop for VaapiHw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `VaapiHw::new`
// created (it bails before constructing `Self` if either alloc fails, so a live `VaapiHw`
// always holds both). `av_buffer_unref` drops one reference and nulls the pointer through the
// `&mut`. This `Drop` runs exactly once and `VaapiHw` owns these refs exclusively, so there
// is no double-free / use-after-free. Frames are unref'd before the device because the frames
// ctx internally holds a ref on the device (refcounted, so the order is sound either way).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
@@ -252,7 +268,16 @@ impl CpuInner {
) -> Result<Self> {
let src_pixel = vaapi_sws_src(format)?;
const POOL: c_int = 16;
// SAFETY: `VaapiHw::new` (an `unsafe fn`) requires libav initialized — guaranteed because the
// only path here is `VaapiEncoder::open` → `ensure_inner` → `CpuInner::open`, and `open` ran
// `ffmpeg::init()`. The args are valid: NV12 sw_format, the validated positive `width`/`height`,
// pool=16. It returns a RAII `VaapiHw` that unrefs its two `AVBufferRef`s on drop.
let hw = unsafe { VaapiHw::new(ffi::AVPixelFormat::AV_PIX_FMT_NV12, width, height, POOL)? };
// SAFETY: `open_vaapi_encoder` (an `unsafe fn`) borrows `hw.device_ref`/`hw.frames_ref` — both
// non-null (`VaapiHw::new` guarantees it) and from the `hw` just built above, which is a live
// local that outlives this synchronous call. The fn `av_buffer_ref`s them into the encoder, so
// the encoder holds its own references; `hw` is also moved into the returned `CpuInner` next to
// `enc`, keeping the device/frames alive for the encoder's whole lifetime.
let enc = unsafe {
open_vaapi_encoder(
codec,
@@ -266,6 +291,12 @@ impl CpuInner {
};
// swscale RGB→NV12, BT.709 limited (matches the VUI), no rescale.
let src_av = pixel_to_av(src_pixel);
// SAFETY: `sws_getContext` allocates a swscale context for the given src/dst dimensions and
// pixel formats. All four dims are the encoder's positive `width`/`height` cast to `c_int`;
// `src_av` is a valid `AVPixelFormat` (from `pixel_to_av` of the `vaapi_sws_src`-validated
// `src_pixel`), the dst is NV12. The three trailing pointers (srcFilter, dstFilter, param) are
// explicitly null = "use defaults", which the API documents as accepted. No Rust memory is
// borrowed — only by-value ints/enums — and the returned pointer is null-checked just below.
let sws = unsafe {
ffi::sws_getContext(
width as c_int,
@@ -283,10 +314,23 @@ impl CpuInner {
if sws.is_null() {
bail!("sws_getContext(RGB→NV12) failed");
}
// SAFETY: `sws` is the non-null `SwsContext` from `sws_getContext` above (the `is_null()`
// check immediately preceding returned false). `sws_getCoefficients(SWS_CS_ITU709)` returns a
// pointer into a libswscale static const coefficient table valid for the whole process, reused
// here for both the inverse (src) and forward (dst) matrices. `sws_setColorspaceDetails` only
// reads those tables and writes scalar CSC settings into `sws`; the table pointer outlives the
// synchronous call and no Rust memory is passed.
unsafe {
let cs709 = ffi::sws_getCoefficients(SWS_CS_ITU709);
ffi::sws_setColorspaceDetails(sws, cs709, 1, cs709, 0, 0, 1 << 16, 1 << 16);
}
// SAFETY: `av_frame_alloc` returns a fresh, uniquely-owned heap `AVFrame` (null-checked — on
// null we free the already-built `sws` and bail). We then write the plain `format`/`width`/
// `height` fields through the non-null, properly-aligned `f` (sole owner, not yet shared).
// `av_frame_get_buffer(f, 0)` allocates backing storage for those dims/format; on failure we
// free `f` and `sws` (unwinding the half-built state) and bail. On success `f` is a fully-owned
// NV12 frame stored in `CpuInner.nv12` and freed once in `CpuInner::drop`. `f` is a unique
// fresh pointer, so none of these writes alias anything.
let nv12 = unsafe {
let f = ffi::av_frame_alloc();
if f.is_null() {
@@ -329,6 +373,18 @@ impl CpuInner {
let h = self.height as usize;
let src_row = w * self.src_format.bytes_per_pixel();
anyhow::ensure!(bytes.len() >= src_row * h, "captured buffer too small");
// SAFETY: The `ensure!`s above guarantee `format == self.src_format` and
// `bytes.len() >= src_row * h`. `sws_scale` reads `h` rows of `src_row` bytes from
// `src_data[0] = bytes.as_ptr()` (the other planes null/0 — packed RGB is single-plane), all
// in bounds; `bytes`, `src_data`, `src_stride` are live locals for this synchronous call.
// `self.sws` is the non-null context built in `open`; it writes into `self.nv12` (a non-null
// owned frame whose `data`/`linesize` in-struct arrays were sized by `av_frame_get_buffer`).
// `av_frame_alloc` (null-checked) yields a fresh `hwf`; `av_hwframe_get_buffer` pulls a pooled
// VAAPI surface from the live non-null `self.hw.frames_ref`; `av_hwframe_transfer_data` uploads
// the staged NV12 into it — both frames live, failures free `hwf` and bail. We then write
// `pts`/`pict_type` through the non-null `hwf` and `avcodec_send_frame` it into the live
// owned `self.enc` context (which takes its own ref), then free our `hwf` ref exactly once.
// The encoder runs only on this thread (see `unsafe impl Send`), so no aliasing/data race.
unsafe {
let src_data: [*const u8; 4] = [bytes.as_ptr(), ptr::null(), ptr::null(), ptr::null()];
let src_stride: [c_int; 4] = [src_row as c_int, 0, 0, 0];
@@ -374,6 +430,12 @@ impl CpuInner {
impl Drop for CpuInner {
fn drop(&mut self) {
// SAFETY: `self.nv12` (an owned `AVFrame`) and `self.sws` (an owned `SwsContext`) are each
// freed exactly once here, guarded by `is_null()` so a never-set pointer is skipped (no double
// free). `CpuInner` owns both exclusively and `Drop` runs once. `av_frame_free` takes `&mut`
// and nulls the pointer. `self.enc`/`self.hw` are freed afterward by their own `Drop` impls;
// the encoder holds its own `av_buffer_ref`'d device/frames copies, so field-drop order is
// irrelevant to soundness.
unsafe {
if !self.nv12.is_null() {
ffi::av_frame_free(&mut self.nv12);
@@ -417,6 +479,31 @@ impl DmabufInner {
let drm_fourcc = crate::zerocopy::drm_fourcc(format)
.ok_or_else(|| anyhow!("no DRM fourcc for {format:?} (VAAPI zero-copy)"))?;
let node = render_node();
// SAFETY: libav is initialized (`VaapiEncoder::open` ran `ffmpeg::init()` before
// `ensure_inner` → `DmabufInner::open`). Every raw pointer dereferenced below is either freshly
// allocated by the immediately-preceding ffmpeg call and null-checked, or an in-struct field of
// such an object:
// * `node` is a `CString` (from `render_node`) live for the whole block; its `.as_ptr()` is a
// NUL-terminated path read only during `av_hwdevice_ctx_create`.
// * `av_hwdevice_ctx_create(&mut drm_device, DRM, …)` / `…_create_derived(&mut vaapi_device,
// VAAPI, drm_device, …)`: on `r < 0` the out-param stays null and we bail (the derive path
// unrefs `drm_device` first); on success each is a non-null owned `AVBufferRef`.
// * `av_hwframe_ctx_alloc(drm_device)` → `drm_frames` (null-checked); `(*drm_frames).data` is
// its `AVHWFramesContext` payload, written before `av_hwframe_ctx_init`.
// * `avfilter_graph_alloc` → `graph` (null-checked); `avfilter_get_by_name` returns a static
// const `AVFilter` (process-lifetime) or null; `avfilter_graph_alloc_filter` allocates each
// filter ctx inside `graph`; the four are null-checked together. `inst`/arg strings are
// 'static C literals.
// * `(*hwmap/scale).hw_device_ctx = av_buffer_ref(vaapi_device)` attaches a NEW ref owned by
// the filter (freed by `avfilter_graph_free`); our `vaapi_device` ref is untouched.
// * `av_buffersink_get_hw_frames_ctx(sink)` → `nv12_ctx` is a borrowed ref owned by the sink,
// valid while `graph` lives (and `graph` is moved into the returned `DmabufInner`).
// * `open_vaapi_encoder` borrows `vaapi_device` (our live owned ref) and `nv12_ctx` (sink's
// live ref) and `av_buffer_ref`s both into the encoder.
// Every early-error path unref's the allocated buffers and frees the graph in the right order
// before bailing; on success the four `AVBufferRef`s + `graph` + `src`/`sink` are moved into
// `DmabufInner` and freed in its `Drop`. (Two non-UB leaks noted below: `av_buffersrc_*` and
// the final `?`.)
unsafe {
// DRM device (source dmabuf frames) + a VAAPI device derived from it (same GPU) for
// hwmap/scale_vaapi/the encoder.
@@ -509,7 +596,12 @@ impl DmabufInner {
num: 1,
den: fps as c_int,
};
(*par).hw_frames_ctx = ffi::av_buffer_ref(drm_frames);
// Assign `drm_frames` BORROWED (no extra ref): `av_buffersrc_parameters_set` takes its
// own ref of `par->hw_frames_ctx` (via av_buffer_replace), and `av_free(par)` frees only
// the struct, not the ref. Our single owned `drm_frames` ref is retained, lives in
// `DmabufInner`, and is unref'd in `Drop`. Wrapping it in `av_buffer_ref` here would leak
// that extra ref every session (the persistent listener would accumulate them).
(*par).hw_frames_ctx = drm_frames;
let r = ffi::av_buffersrc_parameters_set(src, par);
ffi::av_free(par as *mut _);
if r < 0 {
@@ -564,7 +656,12 @@ impl DmabufInner {
ffi::av_buffer_unref(&mut drm_device);
bail!("filter sink has no VAAPI frames context");
}
let enc = open_vaapi_encoder(
// On encoder-open failure, free the graph + our owned buffer refs before bailing (matching
// every error path above) so a failed session doesn't leak them. `nv12_ctx` is borrowed
// from the sink (owned by `graph`), so `avfilter_graph_free` reclaims it — don't unref it
// separately. On success the encoder takes its own ref of `vaapi_device`, and `drm_frames`/
// `vaapi_device`/`drm_device`/`graph` move into `DmabufInner` (freed in `Drop`).
let enc = match open_vaapi_encoder(
codec,
width,
height,
@@ -572,7 +669,16 @@ impl DmabufInner {
bitrate_bps,
vaapi_device,
nv12_ctx,
)?;
) {
Ok(enc) => enc,
Err(e) => {
ffi::avfilter_graph_free(&mut graph);
ffi::av_buffer_unref(&mut drm_frames);
ffi::av_buffer_unref(&mut vaapi_device);
ffi::av_buffer_unref(&mut drm_device);
return Err(e);
}
};
tracing::info!(
encoder = codec.vaapi_name(),
@@ -600,6 +706,23 @@ impl DmabufInner {
dmabuf.fourcc,
self.fourcc
);
// SAFETY: The `ensure!` above checked `dmabuf.fourcc == self.fourcc`.
// * `std::mem::zeroed::<AVDRMFrameDescriptor>()` is sound: it is a `#[repr(C)]` POD of ints and
// nested int-struct arrays (no `NonNull`/refs), for which all-zero is a valid bit pattern;
// `Box` puts it on the heap with a unique owner.
// * `dmabuf.fd.as_raw_fd()` is the fd of the caller's `&DmabufFrame`, which owns it for the
// whole synchronous `submit`; we describe one object/layer/plane from its
// fourcc/modifier/offset/stride and pass `object.size = 0` (ffmpeg queries the real size).
// * `av_frame_alloc` → `drm` (null-checked); we set its scalar fields and
// `hw_frames_ctx = av_buffer_ref(self.drm_frames)` (new ref of the live owned ctx).
// * `data[0] = Box::into_raw(desc)` transfers the box into the frame; `buf[0] =
// av_buffer_create(.., free_desc, ..)` registers a destructor that reclaims it exactly once
// when the buffer's refcount hits zero — matched alloc/free, no leak/double-free.
// * `av_buffersrc_add_frame_flags(self.src, drm, KEEP_REF)` pushes a ref into the live
// buffersrc; KEEP_REF keeps our own `drm` ref, which we then `av_frame_free`. We pull the
// converted surface with `av_buffersink_get_frame(self.sink, nv12)` BEFORE returning, so the
// dmabuf (owned by the caller) is read while still valid. `nv12` is sent into the live owned
// `self.enc` (takes its own ref) and our ref freed once. Single-threaded encoder → no race.
unsafe {
// Build a DRM-PRIME AVFrame describing the dmabuf (one object/fd, one layer/plane).
let mut desc: Box<ffi::AVDRMFrameDescriptor> = Box::new(std::mem::zeroed());
@@ -626,6 +749,11 @@ impl DmabufInner {
// Own the descriptor so it frees with the frame (the fd is owned by the DmabufFrame,
// which outlives this call — the graph reads the surface before submit returns).
extern "C" fn free_desc(_opaque: *mut std::ffi::c_void, data: *mut u8) {
// SAFETY: `data` is exactly the pointer produced by `Box::into_raw(desc)` and passed as
// `av_buffer_create`'s first arg, which libav hands back verbatim to this callback. It
// is a valid, uniquely-owned `Box<AVDRMFrameDescriptor>` raw pointer; libav invokes the
// callback exactly once (when the last buffer ref drops), so `from_raw` + `drop`
// reclaims it exactly once — no double-free. `_opaque` is unused (we passed null).
unsafe { drop(Box::from_raw(data as *mut ffi::AVDRMFrameDescriptor)) };
}
(*drm).buf[0] = ffi::av_buffer_create(
@@ -673,6 +801,13 @@ impl DmabufInner {
impl Drop for DmabufInner {
fn drop(&mut self) {
// SAFETY: `graph`/`drm_frames`/`vaapi_device`/`drm_device` are the non-null objects
// `DmabufInner::open` built and moved into `self` (open bails before constructing `Self` if any
// alloc fails). `avfilter_graph_free` frees the graph (and the per-filter device refs it owns);
// each `av_buffer_unref` drops one ref and nulls the pointer via `&mut`. `DmabufInner` owns all
// four exclusively and `Drop` runs once → no double-free/use-after-free. The graph is freed
// first (it holds refs on the devices), then frames, then the derived VAAPI device, then DRM.
// (`self.enc` drops via ffmpeg-next afterward, holding its own refs.)
unsafe {
ffi::avfilter_graph_free(&mut self.graph);
ffi::av_buffer_unref(&mut self.drm_frames);
@@ -703,6 +838,13 @@ pub struct VaapiEncoder {
}
// Raw FFI pointers; the encoder lives on a single thread (same contract as `NvencEncoder`).
// SAFETY: `VaapiEncoder`'s `Inner` holds raw FFI pointers (`SwsContext`, `AVFrame`, `AVBufferRef`,
// `AVFilterContext`, `AVCodecContext`) that are not `Send` by default. The encoder is owned and
// driven by exactly ONE thread — the host's per-session encode thread it is moved (transferred) to —
// and is only ever touched through `&mut self` methods, so it is never aliased or accessed
// concurrently from two threads. None of the underlying libav/libswscale objects have thread
// affinity (they are not thread-local), so transferring ownership across threads is sound. This
// asserts `Send` (transfer) only; `Sync` (shared `&`) is deliberately NOT implemented.
unsafe impl Send for VaapiEncoder {}
impl VaapiEncoder {
@@ -720,6 +862,9 @@ impl VaapiEncoder {
}
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `av_log_set_level` sets libav's global integer log level; `48` (= AV_LOG_DEBUG)
// is a valid level and there are no pointer args. libav was just initialized by the
// `ffmpeg::init()` above, so the call is always sound.
unsafe { ffi::av_log_set_level(48) };
}
// Validate the codec/format up front so a bad request fails at open, not on the first frame.
@@ -28,6 +28,8 @@
//! through `ffmpeg::ffi` (= `ffmpeg_sys_next`), exactly as the Linux CUDA/VAAPI paths do. The
//! `AVD3D11VADeviceContext`/`AVD3D11VAFramesContext` layouts are mirrored (the bindings don't
//! allowlist `hwcontext_d3d11va.h`), as [`super::linux`] mirrors `AVCUDADeviceContext`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder};
use crate::capture::{dxgi::D3d11Frame, CapturedFrame, FramePayload, PixelFormat};
@@ -243,6 +245,12 @@ pub fn probe_can_encode(vendor: WinVendor, codec: Codec) -> bool {
if ffmpeg::init().is_err() {
return false;
}
// SAFETY: `ffmpeg::init()` succeeded above, so libav's global state is initialised.
// `av_log_get_level`/`av_log_set_level` are global scalar getters/setters with no pointer args.
// `open_win_encoder` (the `unsafe fn`) is called with null `device_ref`/`frames_ref` (the system
// path), so it touches no D3D11/hwcontext — it only allocates and opens a self-contained
// libavcodec encoder that is dropped at the end of `.is_ok()`. We restore the prior log level and
// no raw pointer escapes the block.
unsafe {
// A missing AMF/QSV runtime (wrong-vendor host, GPU-less CI) is an expected probe outcome —
// quiet ffmpeg's open error for the probe, then restore the level.
@@ -337,6 +345,10 @@ impl SystemInner {
} else {
ffi::AVPixelFormat::AV_PIX_FMT_NV12
};
// SAFETY: calls the `unsafe fn open_win_encoder` with null `device_ref`/`frames_ref`, so the
// system path is taken (no hw device/frames context is touched); all other args are scalars.
// The returned `encoder::video::Encoder` owns its `AVCodecContext` and frees it on drop; no raw
// pointer is aliased.
let enc = unsafe {
open_win_encoder(
vendor,
@@ -352,6 +364,11 @@ impl SystemInner {
ptr::null_mut(),
)?
};
// SAFETY: `av_frame_alloc` returns a freshly-allocated, uniquely-owned `AVFrame` (null-checked
// before any deref); writing `format`/`width`/`height` through `*f` stays inside that
// allocation. `av_frame_get_buffer(f, 0)` allocates the backing planes — on failure we
// `av_frame_free` the sole owner (no double-free) and bail; on success the raw `f` is moved into
// `self.sw_frame` and freed exactly once in `Drop`.
let sw_frame = unsafe {
let f = ffi::av_frame_alloc();
if f.is_null() {
@@ -467,6 +484,18 @@ impl SystemInner {
} else {
DXGI_FORMAT_NV12
};
// SAFETY: `ensure_staging` builds a STAGING texture (CPU_ACCESS_READ) matching `dxgi_fmt` on
// `frame.device` — the same `ID3D11Device` that owns `frame.texture` — and caches that device's
// immediate context in `self.ctx`. `src`/`dst` are that device's textures of identical NV12/P010
// format and dimensions, so `CopyResource` on the single-threaded immediate context is valid.
// `Map(.., D3D11_MAP_READ)` succeeds on a staging texture and yields `map.pData` valid for the
// whole resource; for NV12/P010 the luma plane is `H` rows at `RowPitch` and the chroma plane
// follows at byte offset `RowPitch*H` (`H/2` rows), so `total = pitch*(H+⌈H/2⌉)` is exactly the
// mapped extent and `from_raw_parts(base, total)` stays in-bounds. Each `copy_nonoverlapping`
// reads a bounds-checked `mapped[..]` sub-slice (`row_bytes ≤ pitch`) and writes `row_bytes ≤
// linesize` into the `av_frame_get_buffer`-allocated plane at row `y < H`, so every destination
// offset is inside the frame's plane allocation; src and dst never alias. `Unmap` pairs `Map`,
// then `send` (the `unsafe fn`) hands `sw_frame` to the encoder.
unsafe {
self.ensure_staging(&frame.device, dxgi_fmt)?;
let staging = self.staging.clone().context("staging texture")?;
@@ -510,6 +539,14 @@ impl SystemInner {
if self.ten_bit {
bail!("ffmpeg_win: BGRA readback is 8-bit only (HDR needs the P010 capture path)");
}
// SAFETY: `ensure_staging` builds a B8G8R8A8 STAGING texture on `frame.device` and caches that
// device's immediate context; `src`/`dst` are that device's textures of matching BGRA format,
// so `CopyResource` on the single-threaded context is valid. `Map(READ)` on the staging texture
// yields `base` valid for `pitch` × `h` rows. `ensure_sws` lazily builds the BGRA→NV12 context;
// `sws_scale` reads `h` rows of `pitch` bytes from `base` (in-bounds — the staging surface is
// `≥ pitch*h`) into the `sw_frame` planes addressed by its `data`/`linesize` (allocated for
// `width`×`height` NV12). `Unmap` pairs `Map`; the cached `sws` is freed once in `Drop`. The
// mapped read region never aliases the owned encoder frame.
unsafe {
self.ensure_staging(&frame.device, DXGI_FORMAT_B8G8R8A8_UNORM)?;
let staging = self.staging.clone().context("staging texture")?;
@@ -552,6 +589,13 @@ impl SystemInner {
/// R10 shader output instead of P010. DXGI `R10G10B10A2_UNORM` (R in the low 10 bits, X2 alpha in
/// the top 2) == FFmpeg `AV_PIX_FMT_X2BGR10LE`. UNTESTED on glass (no AMD/Intel Windows box).
fn readback_rgb10(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
// SAFETY: same shape as `readback_yuv`/`readback_bgra` — `ensure_staging` builds an
// R10G10B10A2 STAGING texture on `frame.device` and caches its immediate context; `src`/`dst`
// are that device's matching-format textures, so `CopyResource` on the single-threaded context
// is valid. `Map(READ)` yields `base` valid for `pitch` × `h` rows. `ensure_sws` builds the
// X2BGR10LE→P010 (BT.2020) context; `sws_scale` reads `h` rows of `pitch` bytes from `base`
// (in-bounds) into the `sw_frame` P010 planes (`data`/`linesize`, allocated `width`×`height`).
// `Unmap` pairs `Map`; `sws` is freed once in `Drop`. No aliasing between read and write.
unsafe {
self.ensure_staging(&frame.device, DXGI_FORMAT_R10G10B10A2_UNORM)?;
let staging = self.staging.clone().context("staging texture")?;
@@ -605,6 +649,12 @@ impl SystemInner {
let h = self.height as usize;
let src_row = w * format.bytes_per_pixel();
anyhow::ensure!(bytes.len() >= src_row * h, "captured buffer too small");
// SAFETY: `ensure_sws` lazily builds the (packed RGB/BGR)→NV12 context for this fixed src/dst
// format pair. `src_data[0] = bytes.as_ptr()` with `src_stride[0] = src_row`; the `ensure!`
// above guarantees `bytes` holds at least `src_row*h` bytes, so `sws_scale` reads `h` rows of
// `src_row` bytes in-bounds and writes the `sw_frame` NV12 planes (`data`/`linesize`, allocated
// `width`×`height`). `bytes` is borrowed for the call only and never aliases the owned
// `sw_frame`. `send` then hands `sw_frame` to the encoder.
unsafe {
self.ensure_sws(
pixel_to_av(sws_src(format)?),
@@ -667,6 +717,10 @@ impl SystemInner {
impl Drop for SystemInner {
fn drop(&mut self) {
// SAFETY: `sw_frame` is the `AVFrame` allocated in `open` (or null) — `av_frame_free` drops it
// once and nulls the pointer through the `&mut`; `sws` is the cached `SwsContext` (or null) —
// `sws_freeContext` frees it once. This `Drop` runs exactly once and `SystemInner` owns both
// exclusively, so there is no double-free or use-after-free.
unsafe {
if !self.sw_frame.is_null() {
ffi::av_frame_free(&mut self.sw_frame);
@@ -745,6 +799,12 @@ impl D3d11Hw {
impl Drop for D3d11Hw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `D3d11Hw::new` created
// (it bails before constructing `Self` if either alloc/init fails, so a live `D3d11Hw` always
// holds both). `av_buffer_unref` drops one reference and nulls the pointer through the `&mut`.
// This `Drop` runs exactly once and `D3d11Hw` owns these refs exclusively → no double-free /
// use-after-free. Frames are unref'd before the device because the frames ctx internally holds
// a ref on the device (refcounted, so the order is sound either way).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
@@ -800,6 +860,18 @@ impl ZeroCopyInner {
WinVendor::Qsv => (D3D11_BIND_DECODER.0 | D3D11_BIND_VIDEO_ENCODER.0) as u32,
};
const POOL: c_int = 8;
// SAFETY: `D3d11Hw::new` wraps the capturer's `device` as a D3D11VA hwdevice (handing FFmpeg an
// owned AddRef of it, balanced by FFmpeg's teardown Release) and builds an owned
// device_ref/frames_ref pair freed by `D3d11Hw::Drop`; `hw` is a local, so it is dropped (and
// both refs freed) on every early `return Err`. For QSV, `av_hwdevice_ctx_create_derived` and
// `av_hwframe_ctx_create_derived` fill the null-initialised `qsv_device`/`qsv_frames` out-params
// only on success (`r >= 0` checked); on the frames-derive failure we unref the already-created
// `qsv_device` before bailing. `open_win_encoder` internally `av_buffer_ref`s the dev/frames
// refs it is given (so ownership of `hw`'s and the derived refs stays here), and on its failure
// we unref the still-owned derived `qsv_frames`/`qsv_device` (null for AMF → skipped) and return
// — `hw` then drops its D3D11 refs. On success the derived refs are moved into `ZeroCopyInner`
// (freed in its `Drop`) and the encoder holds its own AddRef'd copies. Every `AVBufferRef` is
// unref'd exactly once across all paths — no leak, no double-free.
unsafe {
let hw = D3d11Hw::new(device, sw_av, bind_flags, width, height, POOL)?;
let (pix_fmt, dev_ref, frames_ref, mut qsv_device, mut qsv_frames) = match vendor {
@@ -887,6 +959,19 @@ impl ZeroCopyInner {
}
fn submit(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
// SAFETY: `d3d = av_frame_alloc()` is a fresh owned frame (null-checked) and is `av_frame_free`d
// exactly once on every path below. `av_hwframe_get_buffer` fills it from the pool — on failure
// we free it and bail. `(*d3d).data[0]` is the pool's texture-array and `data[1]` the array
// index; `from_raw_borrowed` borrows that `ID3D11Texture2D` WITHOUT taking ownership (no Release
// — the frame owns it) and is null-checked. `src` (the captured texture) and `dst` (the pooled
// slice) live on the SAME D3D11 device wrapped by `self.hw`, and the caller guarantees
// `captured.format == pool_format` before calling, so `CopySubresourceRegion(dst, dst_index, ..,
// src, 0, ..)` on the single-threaded immediate context `self.ctx` is a valid same-format GPU
// copy. For QSV the mapped `qsv` frame is a fresh owned frame whose `hw_frames_ctx` takes an
// `av_buffer_ref` of `self.qsv_frames`; it is `av_frame_free`d (releasing that ref) on both the
// map-failure and success paths. `avcodec_send_frame` only internally refs the input frame, so
// the `av_frame_free(d3d)`/`av_frame_free(qsv)` afterwards are the sole owning frees — no leak,
// no double-free, no use-after-free.
unsafe {
// Pull a pooled D3D11 surface; its data[0] is the pool's texture-ARRAY, data[1] the slice.
let mut d3d = ffi::av_frame_alloc();
@@ -959,6 +1044,11 @@ impl ZeroCopyInner {
impl Drop for ZeroCopyInner {
fn drop(&mut self) {
// SAFETY: `qsv_frames`/`qsv_device` are the derived QSV `AVBufferRef`s (or null for AMF); each
// is `av_buffer_unref`'d once here (nulling the pointer through the `&mut`) — `ZeroCopyInner`
// owns these handles exclusively and this `Drop` runs once, so no double-free. The `enc` and
// `hw` fields free the encoder's AddRef'd copies and the D3D11 device/frames refs through their
// own `Drop`, so all references stay balanced.
unsafe {
if !self.qsv_frames.is_null() {
ffi::av_buffer_unref(&mut self.qsv_frames);
@@ -996,6 +1086,13 @@ pub struct FfmpegWinEncoder {
}
// Raw FFI pointers + COM objects; the encoder lives on a single thread (same contract as NVENC/VAAPI).
// SAFETY: `FfmpegWinEncoder` owns raw libav pointers (`AVFrame`/`SwsContext`/`AVBufferRef`) and
// windows-rs COM handles (`ID3D11Device`/`ID3D11DeviceContext`/textures) that are not auto-`Send`. The
// session creates the encoder, drives `submit`/`poll`/`flush`, and drops it all on one dedicated encode
// thread; it is never shared by reference across threads, and the D3D11 immediate context is only ever
// touched from that thread. The only cross-thread action is the initial move to the encode thread,
// after which every interior pointer/COM ref is used single-threaded — the same contract the
// NVENC/VAAPI encoders rely on. No interior state is accessed concurrently.
unsafe impl Send for FfmpegWinEncoder {}
impl FfmpegWinEncoder {
@@ -1012,6 +1109,8 @@ impl FfmpegWinEncoder {
) -> Result<Self> {
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `ffmpeg::init()` ran on the line above, so libav is initialised; `av_log_set_level`
// is a global scalar setter with no pointer arguments.
unsafe { ffi::av_log_set_level(48) };
}
// Make sure the encoder name exists in this libavcodec build up front (clear error vs a
@@ -13,6 +13,9 @@
//! Needs a real NVIDIA GPU at runtime (session creation fails otherwise) — compiles GPU-less, but
//! `open`/`submit` only succeed on a GPU box. The software encoder (`super::sw`) is the fallback.
// Every `unsafe` block / impl in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Codec, EncodedFrame, Encoder, EncoderCaps};
use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
use anyhow::{anyhow, bail, Context, Result};
@@ -88,7 +91,15 @@ pub struct NvencD3d11Encoder {
init_device: *mut c_void,
}
// Raw NVENC handle + COM ptrs; confined to the single encode thread (like the Linux encoder).
// SAFETY: the `!Send` fields are the raw NVENC session/device handles (`encoder`, `init_device`),
// the raw NVENC bitstream/registered/mapped pointers carried in `bitstreams`/`regs`/`pending`, and
// the `ID3D11Texture2D` COM refs — none of which may be touched concurrently from two threads. This
// encoder is owned by exactly one thread: it is moved onto the host encode thread once at
// construction, and every NVENC call and D3D11 access happens only from that thread thereafter
// (`submit`/`poll`/`invalidate_ref_frames`/`Drop` all run there, like the Linux encoder). Moving the
// handles across that single ownership-transfer boundary is sound because no NVENC/D3D11 call is in
// flight during the move and the session and its D3D11 immediate context are never shared (`&`) or
// used concurrently — so `Send` introduces no data race on the non-`Send` fields.
unsafe impl Send for NvencD3d11Encoder {}
impl NvencD3d11Encoder {
@@ -403,6 +414,17 @@ impl NvencD3d11Encoder {
/// Lazily create the session on the first frame's D3D11 device (so capture + encode share it).
fn init_session(&mut self, device: &ID3D11Device) -> Result<()> {
// SAFETY: every call below goes through a function pointer resolved once from the loaded
// `nvidia_video_codec_sdk::ENCODE_API` (`nvEncodeAPI`) table, or through this type's own
// `unsafe fn`s whose contract is met here. `query_caps`/`try_open_session` receive `device`,
// the live `ID3D11Device` the caller pulled off the first frame; each returns either a valid
// open NVENC session handle or an `Err`. `destroy_encoder` is only ever called on a handle a
// `try_open_session` just returned (and `best` only when `!best.is_null()`), so it never frees
// a dangling or null session. `create_bitstream_buffer` is passed `enc` — the one chosen live
// session — and `&mut cb`, a `#[repr(C)] NV_ENC_CREATE_BITSTREAM_BUFFER` whose `version` is set
// to `NV_ENC_CREATE_BITSTREAM_BUFFER_VER`; `cb` lives across the synchronous call and its
// returned `bitstreamBuffer` is copied into `self.bitstreams` before `cb` drops. No handle
// escapes the encode thread.
unsafe {
// Probe real GPU caps first (max dims / 10-bit / custom-VBV / RFI) so the config below is
// gated on what this card supports and an out-of-range mode fails with a clear error
@@ -589,6 +611,11 @@ impl Encoder for NvencD3d11Encoder {
new = format!("{}x{}", captured.width, captured.height),
"NVENC: capture device/size/HDR changed — re-initializing session"
);
// SAFETY: `teardown` (an `unsafe fn`) requires the encode thread with no NVENC call in
// flight and a session whose cached regs/bitstreams/pending all belong to `self.encoder`.
// All hold: this is the synchronous encode thread, `self.inited` so `self.encoder` is the
// live session every cached resource was created against, and the previous frame's encode
// has already been polled (synchronous submit→poll), so nothing is mid-encode.
unsafe { self.teardown() };
}
if !self.inited {
@@ -609,7 +636,14 @@ impl Encoder for NvencD3d11Encoder {
self.bit_depth = 10;
nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ABGR10
}
PixelFormat::Nv12 => nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_NV12,
PixelFormat::Nv12 => {
// NV12 is 8-bit 4:2:0. Force 8-bit so a transition from a prior P010 (10-bit) session
// — or a 10-bit-negotiated client on an SDR display — re-inits at the matching depth.
// Unlike ARGB (which NVENC upconverts to Main10), NV12 cannot feed a 10-bit session:
// `register_resource` rejects it as InvalidParam (the HDR→SDR-toggle stream drop).
self.bit_depth = 8;
nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_NV12
}
_ => nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB,
};
let device = frame.device.clone();
@@ -618,6 +652,21 @@ impl Encoder for NvencD3d11Encoder {
}
let slot = self.next % POOL;
self.next += 1;
// SAFETY: every NVENC call goes through a function pointer from the loaded `ENCODE_API` table
// and takes `self.encoder`, the live session `init_session` just established (non-null on the
// path that reaches here). `NV_ENC_REGISTER_RESOURCE rr` has `version =
// NV_ENC_REGISTER_RESOURCE_VER` and registers `frame.texture` — a D3D11 texture from
// `frame.device`, which is the SAME device the session was opened against (any device change
// tears down and re-inits above, so `init_device == frame.device.as_raw()` here); the cloned
// `ID3D11Texture2D` is kept alive in `regs` so NVENC's registration never outlives the texture.
// `mp` (`NV_ENC_MAP_INPUT_RESOURCE`, version set) maps that registration and the map is recorded
// in `pending` to be unmapped exactly once in `poll`/`teardown`. `pic` (`NV_ENC_PIC_PARAMS`,
// version set) points `inputBuffer` at `mp.mappedResource` and `outputBitstream` at the live
// pool bitstream `bitstreams[slot]`; the optional SEI scratch (`mastering_sei`/`cll_sei` and the
// `sei` Vec whose `as_mut_ptr()` is written into the codec union) are stack locals that outlive
// the synchronous `encode_picture`. Every `#[repr(C)]` param is a live local borrowed `&mut`
// for the duration of its one synchronous call. (In-place encode without `CopyResource` is
// sound because the encode loop is synchronous, as the module docs state.)
unsafe {
// Register the capturer's texture with NVENC once (cached by raw pointer), then encode it
// IN PLACE — no `CopyResource` into an encoder-owned pool. This is the zero-copy win: the
@@ -774,6 +823,12 @@ impl Encoder for NvencD3d11Encoder {
// We tag each input with `inputTimeStamp = frame_idx` (0,1,2,…), which is also the client's
// frame number (the packetizer numbers frames in submit order), so the client's lost-frame
// range maps 1:1 onto the timestamps NVENC invalidates here.
// SAFETY: `invalidate_ref_frames` is a function pointer from the loaded `ENCODE_API` table.
// `self.encoder` was checked non-null at the top of this fn and is the live session; this runs
// on the encode thread (like submit/poll), so there is no concurrent NVENC use. Each `ts` was
// clamped to `[oldest_in_dpb, frame_idx - 1]` above, so it names a frame still in the session's
// DPB; the call passes only that `u64` timestamp (no struct), so there is no struct-size or
// lifetime concern.
unsafe {
for ts in first..=last {
if (API.invalidate_ref_frames)(self.encoder, ts as u64)
@@ -792,6 +847,16 @@ impl Encoder for NvencD3d11Encoder {
let Some((bs, map, pts_ns)) = self.pending.pop_front() else {
return Ok(None);
};
// SAFETY: a non-empty `pending` implies `submit` ran, so `self.encoder` is the live session
// (`teardown` clears `pending` whenever it nulls the handle); all calls below use function
// pointers from the loaded `ENCODE_API` table on the encode thread. `NV_ENC_LOCK_BITSTREAM lock`
// (version = `NV_ENC_LOCK_BITSTREAM_VER`) locks `bs`, a pool bitstream a prior `encode_picture`
// targeted; `lock_bitstream` blocks until that encode finishes, so on success
// `lock.bitstreamBufferPtr` is non-null and points at `lock.bitstreamSizeInBytes` bytes of
// NVENC-owned, CPU-readable output valid until `unlock_bitstream`. The `from_raw_parts` slice is
// only read (copied via `to_vec()`) BEFORE `unlock_bitstream(bs)` — lock and unlock pair on the
// same buffer — so it never outlives the lock. `map` (the input resource paired with `bs` in
// `pending`) is unmapped here, after the encode completed, exactly once.
unsafe {
let mut lock = nv::NV_ENC_LOCK_BITSTREAM {
version: nv::NV_ENC_LOCK_BITSTREAM_VER,
@@ -831,6 +896,11 @@ impl Encoder for NvencD3d11Encoder {
impl Drop for NvencD3d11Encoder {
fn drop(&mut self) {
// SAFETY: `teardown` (an `unsafe fn`) needs the owning thread with no NVENC call in flight and
// a session whose cached resources all belong to `self.encoder`. At Drop this encoder is owned
// exclusively (no other reference can exist), runs on the encode thread it was confined to, and
// `teardown` early-returns when `self.encoder` is null; otherwise every cached reg/bitstream/
// pending was created against that live session. It runs exactly once (here).
unsafe { self.teardown() };
}
}
@@ -2,6 +2,8 @@
//! fallback when NVENC is unavailable). Low-latency screen-content config: single-reference,
//! no B-frames (Baseline), bitrate rate-control, in-band SPS/PPS each IDR, BT.709 limited range.
//! Synchronous: `submit` encodes immediately and stashes the AU for `poll` (no internal queue).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{EncodedFrame, Encoder};
use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
@@ -30,6 +32,12 @@ pub struct OpenH264Encoder {
}
// openh264's Encoder holds a raw C handle (not auto-Send); it lives on the single encode thread.
// SAFETY: `OpenH264Encoder` wraps `Oh264` (openh264's `Encoder`), which holds a raw C handle to the
// openh264 `ISVCEncoder` and is not auto-`Send`; the other fields (`YUVBuffer`, `Vec`, scalars,
// `Option<EncodedFrame>`) are plain owned data. The session creates the encoder, calls
// `submit`/`poll`/`flush`, and drops it all on one dedicated encode thread, never sharing it by
// reference across threads, so the C handle is only ever touched from a single thread. Moving the
// whole value to that thread is therefore sound — there is no concurrent access to the handle.
unsafe impl Send for OpenH264Encoder {}
impl OpenH264Encoder {
+48 -1
View File
@@ -17,6 +17,9 @@
//! data packets are consumed immediately and missing parity only costs loss recovery — so
//! the validated stereo path stays byte-identical (data packets only, exactly as before).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
#[cfg(any(target_os = "linux", target_os = "windows", test))]
use crate::audio::SAMPLE_RATE;
#[cfg(any(target_os = "linux", target_os = "windows"))]
@@ -409,7 +412,10 @@ struct MsEncoder {
st: std::ptr::NonNull<audiopus_sys::OpusMSEncoder>,
}
// The raw encoder state has no thread affinity; the session owns it on one thread at a time.
// SAFETY: `MsEncoder` owns a unique `OpusMSEncoder` via `NonNull` (it is neither `Clone` nor
// `Sync`, so the pointer is never aliased). libopus's multistream encoder state is a self-contained
// heap allocation with no thread-local or thread-affine state, so moving ownership to another thread
// is sound; every method takes `&mut self`, keeping access single-threaded at any instant.
#[cfg(target_os = "linux")]
unsafe impl Send for MsEncoder {}
@@ -418,6 +424,13 @@ impl MsEncoder {
fn new(layout: &OpusLayout) -> Result<MsEncoder> {
use std::os::raw::c_int;
let mut err: c_int = 0;
// SAFETY: every scalar arg is a valid libopus input (sample rate, channel/stream/coupled
// counts, the RESTRICTED_LOWDELAY application constant). `layout.mapping.as_ptr()` addresses
// a 'static slice of exactly `layout.channels` bytes (every `OpusLayout` constant upholds
// that), which is the element count `opus_multistream_encoder_create` reads through it, and
// `&mut err` is a live local the call writes its status into. libopus copies the mapping into
// its own allocation, so the pointer need only be valid for the call; the returned pointer is
// null/`OPUS_OK`-checked below before any use.
let st = unsafe {
audiopus_sys::opus_multistream_encoder_create(
SAMPLE_RATE as i32,
@@ -432,6 +445,11 @@ impl MsEncoder {
let st = std::ptr::NonNull::new(st)
.filter(|_| err == audiopus_sys::OPUS_OK)
.ok_or_else(|| anyhow::anyhow!("opus_multistream_encoder_create failed ({err})"))?;
// SAFETY: `st` is the non-null encoder `opus_multistream_encoder_create` just returned, owned
// exclusively here. Each `opus_multistream_encoder_ctl` call passes a valid request constant
// with the single by-value `c_int` argument that request's variadic ABI expects
// (`OPUS_SET_BITRATE_REQUEST` → bitrate, `OPUS_SET_VBR_REQUEST` → 0). No pointer escapes the
// call and the encoder outlives it.
unsafe {
audiopus_sys::opus_multistream_encoder_ctl(
st.as_ptr(),
@@ -453,6 +471,13 @@ impl MsEncoder {
samples_per_channel: usize,
out: &mut [u8],
) -> Result<usize> {
// SAFETY: `self.st` is the live encoder from `new`. libopus reads `samples_per_channel *
// channels` f32s through `frame.as_ptr()`; every caller passes a `frame` of exactly that
// length together with the matching `samples_per_channel` (`audio_body`'s `frame_len =
// samples_per_channel * layout.channels`; the round-trip tests size identically), so the read
// stays in bounds. `out.as_mut_ptr()` is written for at most `out.len()` bytes, which is
// passed as the capacity bound. Both buffers are live locals outliving this synchronous call;
// the return value is range-checked before being used as a length.
let n = unsafe {
audiopus_sys::opus_multistream_encode_float(
self.st.as_ptr(),
@@ -470,6 +495,9 @@ impl MsEncoder {
#[cfg(target_os = "linux")]
impl Drop for MsEncoder {
fn drop(&mut self) {
// SAFETY: `self.st` is the encoder `opus_multistream_encoder_create` returned; this
// `MsEncoder` owns it uniquely and `drop` runs exactly once, so the destroy frees it once
// with no subsequent use.
unsafe { audiopus_sys::opus_multistream_encoder_destroy(self.st.as_ptr()) }
}
}
@@ -761,6 +789,10 @@ mod tests {
let client_mapping = client_swap(&digits[3..]);
let mut err = 0i32;
// SAFETY: scalar args are valid libopus inputs. `client_mapping.as_ptr()` addresses a
// `Vec<u8>` of exactly `ch` entries (derived from the advertised surround-params), which is
// the element count the decoder reads through it, and `&mut err` is a live local the call
// writes. The returned pointer is `OPUS_OK`/non-null-checked immediately below before use.
let dec = unsafe {
audiopus_sys::opus_multistream_decoder_create(
SAMPLE_RATE as i32,
@@ -789,6 +821,11 @@ mod tests {
}
let n = enc.encode_float(&frame, samples, &mut out).unwrap();
assert!(n > 0);
// SAFETY: `dec` is the non-null decoder asserted above. `out.as_ptr()` is read for
// the `n` encoded bytes just produced by `encode_float`; `decoded.as_mut_ptr()` is
// written for up to `samples * ch` f32s and `decoded` is exactly that long; `samples`
// is the per-channel frame size. All buffers are live locals outliving the call; the
// return is checked to equal `samples`.
let got = unsafe {
audiopus_sys::opus_multistream_decode_float(
dec,
@@ -817,6 +854,8 @@ mod tests {
(energies: {energy:?})"
);
}
// SAFETY: `dec` is the decoder `opus_multistream_decoder_create` returned; the test owns it
// and destroys it exactly once here, after the final decode — no later use, no double free.
unsafe { audiopus_sys::opus_multistream_decoder_destroy(dec) };
}
@@ -853,6 +892,9 @@ mod tests {
let digits: Vec<u8> = s.bytes().map(|b| b - b'0').collect();
let client_mapping = client_swap(&digits[3..]);
let mut err = 0i32;
// SAFETY: scalar args are valid; `client_mapping.as_ptr()` addresses a 6-entry `Vec<u8>`
// (matches the 6-channel layout the decoder reads through it), alive past the call, and
// `&mut err` is a live local. The pointer is `OPUS_OK`-checked before use.
let dec = unsafe {
audiopus_sys::opus_multistream_decoder_create(
48000,
@@ -865,6 +907,10 @@ mod tests {
};
assert_eq!(err, audiopus_sys::OPUS_OK);
let mut pcm = vec![0f32; 240 * 6];
// SAFETY: `dec` is the non-null decoder from create. `out.as_ptr()` is read for the CBR
// packet length passed in (`*sizes.first()`, a real encoded packet size in `out`);
// `pcm.as_mut_ptr()` is written for up to `240 * 6` f32s and `pcm` is exactly that long;
// `240` is the per-channel frame size. All buffers are live locals outliving the call.
let got = unsafe {
audiopus_sys::opus_multistream_decode_float(
dec,
@@ -875,6 +921,7 @@ mod tests {
0,
)
};
// SAFETY: `dec` is owned by the test; destroyed exactly once here after the final decode.
unsafe { audiopus_sys::opus_multistream_decoder_destroy(dec) };
assert_eq!(got, 240);
}
@@ -3,6 +3,9 @@
//! either real portal desktop capture (`PUNKTFUNK_VIDEO_SOURCE=portal`, the portal PipeWire path) or
//! a synthetic test pattern (default). Runs on its own native thread.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use super::video::{FrameType, VideoPacketizer};
use super::VIDEO_PORT;
use crate::capture::{self, Capturer, FastSyntheticCapturer};
@@ -141,6 +144,25 @@ fn run(
)
.context("capture virtual output")?;
capturer.set_active(true);
// Launch the app's command now that capture is live, for the backends that DON'T nest it via
// set_launch_command above: Windows (no gamescope) and Linux kwin/mutter/wlroots (which stream
// the existing desktop, so the app must be spawned into the session to land on the streamed
// output). Linux gamescope already nested it via set_launch_command, so skip it there.
#[cfg(windows)]
let launch_here = true;
#[cfg(target_os = "linux")]
let launch_here = compositor != crate::vdisplay::Compositor::Gamescope;
#[cfg(any(windows, target_os = "linux"))]
if launch_here {
if let Some(cmd) = app
.and_then(|a| a.cmd.as_deref())
.filter(|c| !c.trim().is_empty())
{
if let Err(e) = crate::library::launch_gamestream_command(cmd) {
tracing::warn!(command = %cmd, error = %e, "gamestream: could not launch app");
}
}
}
return stream_body(&mut *capturer, &sock, cfg, running, force_idr, rfi_range);
}
@@ -188,6 +210,10 @@ fn sendmmsg_all(sock: &UdpSocket, pkts: &[Vec<u8>]) -> std::io::Result<()> {
let mut hdrs: Vec<libc::mmsghdr> = iovs
.iter_mut()
.map(|iov| {
// SAFETY: `libc::mmsghdr` is a plain `#[repr(C)]` struct of integers and raw
// pointers, for which an all-zero bit pattern is valid (null pointers / zero
// lengths); the fields we rely on (`msg_iov`, `msg_iovlen`) are overwritten on the
// next two lines before the struct is handed to the kernel.
let mut h: libc::mmsghdr = unsafe { std::mem::zeroed() };
h.msg_hdr.msg_iov = iov;
h.msg_hdr.msg_iovlen = 1;
@@ -196,6 +222,13 @@ fn sendmmsg_all(sock: &UdpSocket, pkts: &[Vec<u8>]) -> std::io::Result<()> {
.collect();
let mut off = 0usize;
while off < hdrs.len() {
// SAFETY: `fd` is `sock`'s live raw fd (`sock` outlives the call). `hdrs[off..]
// .as_mut_ptr()` is a live slice of `(hdrs.len() - off)` `mmsghdr`s — exactly the count
// passed — into which the kernel writes each `msg_len`. Each header's `msg_iov` points
// into `iovs` (a local that outlives this call, with `msg_iovlen == 1` matching its one
// entry) and each `iovec.iov_base` points into the `chunk` packet buffers (the caller's
// `pkts`, alive for the call); the kernel only reads those payloads. Flags 0; the return
// is error-/progress-checked before advancing `off`.
let n = unsafe {
libc::sendmmsg(fd, hdrs[off..].as_mut_ptr(), (hdrs.len() - off) as u32, 0)
};
@@ -15,6 +15,9 @@
//! `<linux/uinput.h>` on x86_64. `/dev/uinput` needs a udev rule + `input` group membership
//! (see `scripts/60-punktfunk.rules`); creation fails with a clear error otherwise.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::gamestream::gamepad::{self, GamepadFrame, MAX_PADS};
use anyhow::{bail, Result};
use std::collections::HashMap;
@@ -215,6 +218,11 @@ const _: () = {
};
fn ioctl_int(fd: i32, req: libc::c_ulong, arg: libc::c_int, what: &str) -> Result<()> {
// SAFETY: every caller passes one of UI_SET_EVBIT/KEYBIT/FFBIT/UI_DEV_CREATE/UI_DEV_DESTROY as
// `req` — all integer-argument ioctls whose third arg the kernel takes BY VALUE, so nothing is
// dereferenced through `arg` and no memory must outlive the call. The only precondition is `fd`
// being a valid open descriptor; callers pass the live `/dev/uinput` fd, and even a stale fd
// would merely return -1/EBADF (reported below), never UB.
if unsafe { libc::ioctl(fd, req, arg) } < 0 {
bail!("{what}: {}", std::io::Error::last_os_error());
}
@@ -222,6 +230,12 @@ fn ioctl_int(fd: i32, req: libc::c_ulong, arg: libc::c_int, what: &str) -> Resul
}
fn ioctl_ptr<T>(fd: i32, req: libc::c_ulong, arg: *mut T, what: &str) -> Result<()> {
// SAFETY: `fd` is the caller's live `/dev/uinput` fd. Every call site passes `&mut x` for a live,
// uniquely-borrowed `#[repr(C)]` `x: T` whose size matches the struct the request number encodes
// (UI_DEV_SETUP=0x405c_5503 → 0x5c=92=size_of::<UinputSetup>(); UI_ABS_SETUP → 0x1c=28; the FF
// upload/erase ioctls → 0x68/0x0c — all pinned by the `size_of` asserts above). The kernel copies
// exactly that many bytes in/out through `arg`; the `&mut` keeps the pointee alive and unaliased
// for the whole synchronous call.
if unsafe { libc::ioctl(fd, req, arg) } < 0 {
bail!("{what}: {}", std::io::Error::last_os_error());
}
@@ -251,6 +265,9 @@ pub struct VirtualPad {
impl VirtualPad {
pub fn create(index: usize, identity: PadIdentity) -> Result<VirtualPad> {
use std::os::fd::FromRawFd;
// SAFETY: `c"/dev/uinput"` is a 'static NUL-terminated C string literal; `as_ptr()` yields a
// valid pointer the kernel only reads as a filesystem path. `open` returns a fresh fd (or -1)
// and retains nothing; no Rust memory is aliased or handed to the kernel beyond that 'static path.
let raw = unsafe {
libc::open(
c"/dev/uinput".as_ptr(),
@@ -264,6 +281,9 @@ impl VirtualPad {
std::io::Error::last_os_error()
);
}
// SAFETY: `raw >= 0` here (the `< 0` branch above already bailed), so it is a freshly-opened fd
// from `libc::open` that is not stored or owned anywhere else. Transferring it to `OwnedFd` makes
// this the unique owner, which will `close` it exactly once on drop (no double-close, no leak).
let fd = unsafe { OwnedFd::from_raw_fd(raw) };
ioctl_int(raw, UI_SET_EVBIT, EV_KEY as i32, "UI_SET_EVBIT(EV_KEY)")?;
@@ -356,6 +376,11 @@ impl VirtualPad {
code,
value,
};
// SAFETY: `ev` is a live local `#[repr(C)]` struct of all-integer fields with no padding bytes
// (timeval=16 + u16 + u16 + i32 = 24, the size asserted above), so every byte is initialized and
// valid to read as `u8`. The pointer is non-null and `u8`-aligned (align 1), the length is exactly
// `size_of::<InputEventRaw>()` so the slice spans precisely `ev`'s bytes (in bounds), and `ev`
// outlives `bytes` (used immediately below) with no concurrent mutation (single-threaded local).
let bytes = unsafe {
std::slice::from_raw_parts(
&ev as *const _ as *const u8,
@@ -363,6 +388,10 @@ impl VirtualPad {
)
};
// Best-effort: a full kernel queue drops the event; the next frame re-syncs state.
// SAFETY: `self.fd` is the live uinput `OwnedFd` (borrowed via `as_raw_fd`, so it stays open for
// the call); `bytes` is the slice above backed by the still-live local `ev`. `write` only READS
// exactly `bytes.len()` bytes from `bytes.as_ptr()` (in bounds) and retains nothing past return,
// so the buffer outlives the synchronous call and the read-only access cannot race or alias.
let _ = unsafe {
libc::write(
self.fd.as_raw_fd(),
@@ -404,6 +433,10 @@ impl VirtualPad {
let raw = self.fd.as_raw_fd();
let mut buf = [0u8; std::mem::size_of::<InputEventRaw>()];
loop {
// SAFETY: `raw` is the live raw fd of `self.fd` (the non-blocking uinput device). `buf` is a
// live local `[u8; size_of::<InputEventRaw>()]`; `buf.as_mut_ptr()` is a valid writable pointer
// to its `buf.len()` bytes. `read` writes AT MOST `buf.len()` bytes (in bounds), the buffer
// outlives this synchronous call, and `buf` is borrowed uniquely here (no alias/race).
let n = unsafe { libc::read(raw, buf.as_mut_ptr() as *mut libc::c_void, buf.len()) };
if n != buf.len() as isize {
break; // EAGAIN / short read — queue drained
@@ -415,6 +448,10 @@ impl VirtualPad {
unsafe { std::ptr::read_unaligned(buf.as_ptr() as *const InputEventRaw) };
match (ev.type_, ev.code) {
(EV_UINPUT, UI_FF_UPLOAD) => {
// SAFETY: `UinputFfUpload` is `#[repr(C)]` over integers (`u32`, `i32`) and two
// `FfEffect`s (integers + `[u8; 32]`); all-zero is a valid bit pattern for every field
// (no bool/NonZero/enum/reference niche), so `zeroed` yields a fully-initialized valid
// value — `request_id` is then set below and the rest filled by UI_BEGIN_FF_UPLOAD.
let mut up: UinputFfUpload = unsafe { std::mem::zeroed() };
up.request_id = ev.value as u32;
if ioctl_ptr(raw, UI_BEGIN_FF_UPLOAD, &mut up, "UI_BEGIN_FF_UPLOAD").is_ok() {
@@ -442,6 +479,9 @@ impl VirtualPad {
}
}
(EV_UINPUT, UI_FF_ERASE) => {
// SAFETY: `UinputFfErase` is `#[repr(C)]` over three integer fields (`u32`, `i32`,
// `u32`); all-zero is a valid bit pattern for each, so `zeroed` produces a fully-valid
// initialized value — `request_id` is set below and `effect_id` filled by the ioctl.
let mut er: UinputFfErase = unsafe { std::mem::zeroed() };
er.request_id = ev.value as u32;
if ioctl_ptr(raw, UI_BEGIN_FF_ERASE, &mut er, "UI_BEGIN_FF_ERASE").is_ok() {
@@ -492,6 +532,9 @@ impl VirtualPad {
impl Drop for VirtualPad {
fn drop(&mut self) {
// SAFETY: `self.fd` is still the live owned uinput fd here (the `OwnedFd` field is closed only
// AFTER this `drop` body returns), borrowed by `as_raw_fd`. UI_DEV_DESTROY takes its argument
// (0) BY VALUE, so nothing is dereferenced or aliased; the ioctl just tears down the device.
let _ = unsafe { libc::ioctl(self.fd.as_raw_fd(), UI_DEV_DESTROY, 0) };
}
}
@@ -5,6 +5,9 @@
//! keymap, and translate events into virtual pointer/keyboard requests, tracking modifier state
//! so the compositor resolves shifted keysyms correctly.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{gs_button_to_evdev, vk_to_evdev, InputEvent, InputInjector};
use anyhow::{bail, Context, Result};
use punktfunk_core::input::InputKind;
@@ -264,10 +267,17 @@ impl InputInjector for WlrootsInjector {
/// Create an anonymous in-memory file holding `s` + a trailing NUL (for the keymap fd).
fn memfd_with(s: &str) -> Result<std::fs::File> {
let name = b"punktfunk-keymap\0";
// SAFETY: `name` is a byte-string literal with an explicit trailing NUL, so `name.as_ptr()` is a
// valid NUL-terminated C string; `memfd_create` only reads that name (copying it) and creates an
// anonymous file, returning a fresh fd (or -1). `MFD_CLOEXEC` is a valid flag. The 'static literal
// outlives the synchronous call and nothing aliases it. The result is checked `< 0` below.
let fd = unsafe { libc::memfd_create(name.as_ptr() as *const libc::c_char, libc::MFD_CLOEXEC) };
if fd < 0 {
bail!("memfd_create failed: {}", std::io::Error::last_os_error());
}
// SAFETY: `fd` is the fresh memfd `memfd_create` just returned and checked `>= 0`; it is a unique
// open fd nothing else owns, so `File` takes sole ownership and closes it exactly once on drop —
// no alias, no double-close.
let mut f = unsafe { std::fs::File::from_raw_fd(fd) };
f.write_all(s.as_bytes()).context("write keymap")?;
f.write_all(&[0]).context("write keymap NUL")?;
@@ -41,7 +41,8 @@ pub(super) const SHM_MAGIC: u32 = pf_driver_proto::gamepad::PAD_MAGIC; // "PFDS"
pub(super) const OFF_INPUT: usize = core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, input);
pub(super) const OFF_OUT_SEQ: usize =
core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, out_seq);
pub(super) const OFF_OUTPUT: usize = core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, output);
pub(super) const OFF_OUTPUT: usize =
core::mem::offset_of!(pf_driver_proto::gamepad::PadShm, output);
/// Device-type selector the driver reads to choose which HID identity/descriptor it serves: 0 =
/// DualSense (the default — the section is zeroed), 1 = DualShock 4.
pub(super) const OFF_DEVTYPE: usize =
@@ -187,8 +187,10 @@ impl XusbWinPad {
#[allow(clippy::too_many_arguments)]
fn write_state(&mut self, buttons: u16, lt: u8, rt: u8, lx: i16, ly: i16, rx: i16, ry: i16) {
self.packet = self.packet.wrapping_add(1);
// SAFETY: base points at SHM_SIZE bytes; all offsets are in range.
let base = self.shm.base();
// SAFETY: `base` is the start of the mapped section (`SHM_SIZE` bytes, owned by `Shm`); every
// `OFF_*` is a fixed in-range offset into it and `write_unaligned` handles the unaligned field
// writes. Single owner (`&mut self`), so no concurrent writer races these stores.
unsafe {
std::ptr::write_unaligned(base.add(OFF_BUTTONS) as *mut u16, buttons);
*base.add(OFF_LT) = lt;
@@ -5,6 +5,9 @@
//! thread stays bound to its desktop and only reattaches (`OpenInputDesktop`/`SetThreadDesktop`) when
//! `SendInput` reports a short write (the input desktop switched) — no per-event reattach overhead.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it.
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::Result;
use punktfunk_core::input::{InputEvent, InputKind};
use std::mem::size_of;
@@ -35,7 +38,12 @@ pub struct SendInputInjector {
desktop: Option<HDESK>,
}
// Only ever used from the host's single injector thread.
// SAFETY: `SendInputInjector` holds only an `Option<HDESK>` (a desktop handle). The host creates
// and drives it from a single dedicated injector thread; the handle is opened, rebound, and closed
// on whichever thread owns the value, and the type is not `Sync`, so there is never concurrent
// access. A desktop `HDESK` is not thread-affine for ownership (`CloseDesktop` works from any
// thread; `SetThreadDesktop` rebinds the current thread), so transferring ownership via `Send` is
// sound.
unsafe impl Send for SendInputInjector {}
impl SendInputInjector {
@@ -49,6 +57,12 @@ impl SendInputInjector {
/// Bind this thread to the desktop currently receiving input. UAC / lock screen / Ctrl-Alt-Del
/// swap the input desktop; `SendInput` silently no-ops unless our thread is on it.
fn reattach_input_desktop(&mut self) {
// SAFETY: `OpenInputDesktop`/`SetThreadDesktop`/`CloseDesktop` are FFI calls passed only
// by-value args (constant desktop flags, a `bool`, an access mask). `OpenInputDesktop`
// yields an owned `HDESK` only on `Ok`; we then either install it with `SetThreadDesktop`
// (closing the previously-owned handle exactly once) or close the fresh handle on failure —
// so every handle is closed exactly once and none is used after close. `SetThreadDesktop`
// only rebinds this calling thread, which is where the injector runs.
unsafe {
match OpenInputDesktop(
DESKTOP_CONTROL_FLAGS(0),
@@ -75,12 +89,17 @@ impl SendInputInjector {
/// switched out from under us, e.g. into UAC/lock) do we reattach to the now-current input desktop
/// and retry once. This serves both the normal and secure desktops with no steady-state overhead.
fn send(&mut self, inputs: &[INPUT]) -> Result<()> {
// SAFETY: `inputs` is a live `&[INPUT]` slice that outlives this synchronous `SendInput`
// call; `size_of::<INPUT>()` is the exact per-element stride Win32 requires as `cbSize`. The
// call only reads the array (one event per element) and returns the count injected.
let n = unsafe { SendInput(inputs, size_of::<INPUT>() as i32) };
if n as usize == inputs.len() {
return Ok(());
}
// Short write → the input desktop likely changed. Reattach + retry once.
self.reattach_input_desktop();
// SAFETY: same as the first `SendInput` — `inputs` is the identical live slice outliving the
// call and `cbSize == size_of::<INPUT>()`; only re-issued after reattaching the input desktop.
let n = unsafe { SendInput(inputs, size_of::<INPUT>() as i32) };
if n as usize != inputs.len() {
anyhow::bail!(
@@ -95,6 +114,9 @@ impl SendInputInjector {
impl Drop for SendInputInjector {
fn drop(&mut self) {
if let Some(h) = self.desktop.take() {
// SAFETY: `h` is the `HDESK` this injector owned (moved out of `self.desktop`);
// `CloseDesktop` runs once here in `Drop` on that still-valid handle, with no later use —
// no double close.
unsafe {
let _ = CloseDesktop(h);
}
@@ -216,7 +238,11 @@ impl InputInjector for SendInputInjector {
}
InputKind::KeyDown | InputKind::KeyUp => {
let down = event.kind == InputKind::KeyDown;
let vk = (event.code & 0xff) as u16; // client sends Windows VK
// client sends Windows VK
let vk = (event.code & 0xff) as u16;
// SAFETY: `MapVirtualKeyExW` is a pure value translation (VK → scancode); all three
// args are by-value (`u32`, the `MAPVK_VK_TO_VSC_EX` map-type constant, a `None`
// HKL). It dereferences no pointer and returns a `u32` — FFI-`unsafe` only.
let sc_ex = unsafe { MapVirtualKeyExW(vk as u32, MAPVK_VK_TO_VSC_EX, None) };
if sc_ex == 0 {
return Ok(()); // unmappable -> drop
@@ -264,6 +290,8 @@ fn key(ki: KEYBDINPUT) -> INPUT {
}
fn virtual_desktop_rect() -> (i32, i32, i32, i32) {
// SAFETY: each `GetSystemMetrics` takes a single by-value `SYSTEM_METRICS_INDEX` constant and
// returns an `i32`; it dereferences no pointer and has no side effects — FFI-`unsafe` only.
unsafe {
(
GetSystemMetrics(SM_XVIRTUALSCREEN),
+806
View File
@@ -548,6 +548,621 @@ fn heroic_launch_prefix() -> Option<String> {
flatpak.then(|| "flatpak run com.heroicgameslauncher.hgl".into())
}
// ---------------------------------------------------------------------------------------
// Epic Games Store (Windows) — reads the launcher's local `.item` manifests under ProgramData
// (no auth, launcher need not run). Cover art from the base64 `catcache.bin` (public Epic CDN).
// ---------------------------------------------------------------------------------------
/// Reads the Epic Games Launcher's local install manifests. Windows-only. Best-effort: empty when
/// the launcher (or its manifest dir) isn't present.
#[cfg(windows)]
pub struct EpicProvider;
#[cfg(windows)]
impl LibraryProvider for EpicProvider {
fn store(&self) -> &'static str {
"epic"
}
fn list(&self) -> Vec<GameEntry> {
let data = epic_data_dir();
let Ok(rd) = std::fs::read_dir(data.join("Manifests")) else {
return Vec::new();
};
// Parse the (best-effort) artwork cache ONCE: catalogItemId -> Artwork.
let art = epic_art_index(&data.join("Catalog").join("catcache.bin"));
let mut games = Vec::new();
for entry in rd.flatten() {
let p = entry.path();
if p.extension().and_then(|e| e.to_str()) != Some("item") {
continue;
}
let Ok(text) = std::fs::read_to_string(&p) else {
continue;
};
let Ok(v) = serde_json::from_str::<serde_json::Value>(&text) else {
continue;
};
if let Some(g) = epic_entry(&v, &art) {
games.push(g);
}
}
games
}
}
/// `%ProgramData%\Epic\EpicGamesLauncher\Data` (machine-wide, SYSTEM-readable).
#[cfg(windows)]
fn epic_data_dir() -> PathBuf {
std::env::var_os("ProgramData")
.map(PathBuf::from)
.unwrap_or_else(|| PathBuf::from("C:\\ProgramData"))
.join("Epic")
.join("EpicGamesLauncher")
.join("Data")
}
/// Map one `.item` manifest to a [`GameEntry`], or `None` if it isn't a launchable game. Uses
/// Playnite's proven EXCLUSION filter (skip `UE_*` Unreal components; skip a DLC/addon unless it is
/// `addons/launchable`) rather than a positive `games`-category match, which can drop legit titles.
#[cfg(windows)]
fn epic_entry(
v: &serde_json::Value,
art: &std::collections::HashMap<String, Artwork>,
) -> Option<GameEntry> {
let s = |k: &str| v.get(k).and_then(|x| x.as_str());
let app_name = s("AppName")?.to_string();
if app_name.starts_with("UE_") {
return None; // Unreal Engine component, not a game
}
let cats: Vec<&str> = v
.get("AppCategories")
.and_then(|c| c.as_array())
.map(|a| a.iter().filter_map(|x| x.as_str()).collect())
.unwrap_or_default();
if cats.contains(&"addons") && !cats.contains(&"addons/launchable") {
return None; // non-launchable DLC/addon
}
// Drop stale records whose install dir is gone.
let install = s("InstallLocation")?;
if !Path::new(install).is_dir() {
return None;
}
let title = s("DisplayName").unwrap_or(&app_name).to_string();
let namespace = s("CatalogNamespace").unwrap_or("");
let catalog = s("CatalogItemId").unwrap_or("");
// The robust launch form is the namespace:catalogItemId:appName triple; fall back to the bare
// appName when those ids are absent (some manifests lack them) — never drop the launch entirely.
let value = if !namespace.is_empty() && !catalog.is_empty() {
format!("{namespace}:{catalog}:{app_name}")
} else {
app_name.clone()
};
Some(GameEntry {
id: format!("epic:{app_name}"),
store: "epic".into(),
title,
art: art.get(catalog).cloned().unwrap_or_default(),
launch: Some(LaunchSpec {
kind: "epic".into(),
value,
}),
})
}
/// Best-effort parse of `catcache.bin` (base64-encoded JSON array of catalog items) into
/// catalogItemId → [`Artwork`] from each item's `keyImages`. Empty map on any read/decode failure
/// (the format is community-reverse-engineered + can lag a fresh install → titles just show no art).
#[cfg(windows)]
fn epic_art_index(catcache: &Path) -> std::collections::HashMap<String, Artwork> {
use base64::Engine as _;
let mut map = std::collections::HashMap::new();
let Ok(raw) = std::fs::read(catcache) else {
return map;
};
let Ok(decoded) = base64::engine::general_purpose::STANDARD.decode(raw) else {
return map;
};
let Ok(items) = serde_json::from_slice::<serde_json::Value>(&decoded) else {
return map;
};
let Some(arr) = items.as_array() else {
return map;
};
for item in arr {
let Some(cat) = item
.get("id")
.or_else(|| item.get("catalogItemId"))
.and_then(|v| v.as_str())
else {
continue;
};
let Some(images) = item.get("keyImages").and_then(|v| v.as_array()) else {
continue;
};
let mut art = Artwork::default();
for img in images {
let (Some(ty), Some(url)) = (
img.get("type").and_then(|v| v.as_str()),
img.get("url").and_then(|v| v.as_str()),
) else {
continue;
};
if !(url.starts_with("http://") || url.starts_with("https://")) {
continue;
}
match ty {
"DieselGameBoxTall" => art.portrait = Some(url.to_string()),
"DieselGameBox" => art.hero = Some(url.to_string()),
"DieselGameBoxLogo" => art.logo = Some(url.to_string()),
_ => {}
}
}
if art.portrait.is_some() || art.hero.is_some() || art.logo.is_some() {
map.insert(cat.to_string(), art);
}
}
map
}
/// Build the `com.epicgames.launcher://` launch URI from a stored launch value — the triple
/// `<namespace>:<catalogItemId>:<appName>` (colons URL-encoded), or a bare `<appName>` fallback.
/// Each part is charset-validated (host-derived, but belt-and-suspenders) so no shell/URI injection.
#[cfg(windows)]
fn epic_launch_uri(value: &str) -> Option<String> {
let ok = |s: &str| {
!s.is_empty()
&& s.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
};
let inner = match value.split(':').collect::<Vec<_>>().as_slice() {
[ns, cat, app] if ok(ns) && ok(cat) && ok(app) => format!("{ns}%3A{cat}%3A{app}"),
[app] if ok(app) => (*app).to_string(),
_ => return None,
};
Some(format!(
"com.epicgames.launcher://apps/{inner}?action=launch&silent=true"
))
}
// ---------------------------------------------------------------------------------------
// GOG (Windows) — registry-indexed installs + each game's `goggame-<id>.info` for a direct-exe
// launch (no Galaxy needed, dodges its cold-start/anti-cheat). Art (api.gog.com) is a follow-up.
// ---------------------------------------------------------------------------------------
/// Reads the GOG.com install registry + per-game `.info` files. Windows-only. Best-effort: empty
/// when GOG isn't installed.
#[cfg(windows)]
pub struct GogProvider;
#[cfg(windows)]
impl LibraryProvider for GogProvider {
fn store(&self) -> &'static str {
"gog"
}
fn list(&self) -> Vec<GameEntry> {
gog_games()
}
}
#[cfg(windows)]
fn gog_games() -> Vec<GameEntry> {
use winreg::enums::HKEY_LOCAL_MACHINE;
use winreg::RegKey;
// 32-bit GOG writes under WOW6432Node; a 64-bit process reads the explicit path directly.
let Ok(games_key) =
RegKey::predef(HKEY_LOCAL_MACHINE).open_subkey("SOFTWARE\\WOW6432Node\\GOG.com\\Games")
else {
return Vec::new();
};
let mut out = Vec::new();
for sub in games_key.enum_keys().flatten() {
// The subkey name IS the GOG product id.
let Ok(k) = games_key.open_subkey(&sub) else {
continue;
};
let Ok(path) = k.get_value::<String, _>("PATH") else {
continue;
};
if !Path::new(&path).is_dir() {
continue;
}
let title = k
.get_value::<String, _>("GAMENAME")
.unwrap_or_else(|_| sub.clone());
// Resolve the primary play task (exe + args + workdir) from goggame-<id>.info; skip if absent.
let Some((exe, args, workdir)) = gog_play_task(&path, &sub) else {
continue;
};
let id = format!("gog:{sub}");
// Art (public api.gog.com) is resolved off the hot path by the background warmer; read
// whatever it has cached (title-only until warmed).
let art = cached_art(&id).unwrap_or_default();
out.push(GameEntry {
id,
store: "gog".into(),
title,
art,
launch: Some(LaunchSpec {
kind: "gog".into(),
value: format!("{exe}\t{args}\t{workdir}"),
}),
});
}
out
}
/// The primary play task from `<install>\goggame-<id>.info`: `(absolute exe, args, working dir)`.
/// Prefers `isPrimary` + `FileTask`, else the first `FileTask`. Paths are resolved against `install`.
#[cfg(windows)]
fn gog_play_task(install: &str, id: &str) -> Option<(String, String, String)> {
let text =
std::fs::read_to_string(Path::new(install).join(format!("goggame-{id}.info"))).ok()?;
let v: serde_json::Value = serde_json::from_str(&text).ok()?;
let tasks = v.get("playTasks")?.as_array()?;
let is_file =
|t: &serde_json::Value| t.get("type").and_then(|s| s.as_str()) == Some("FileTask");
let pick = tasks
.iter()
.find(|t| {
t.get("isPrimary")
.and_then(|b| b.as_bool())
.unwrap_or(false)
&& is_file(t)
})
.or_else(|| tasks.iter().find(|t| is_file(t)))?;
let rel = pick.get("path").and_then(|s| s.as_str())?;
let exe = Path::new(install).join(rel);
let args = pick
.get("arguments")
.and_then(|s| s.as_str())
.unwrap_or("")
.to_string();
let workdir = pick
.get("workingDir")
.and_then(|s| s.as_str())
.map(|w| Path::new(install).join(w))
.unwrap_or_else(|| Path::new(install).to_path_buf());
Some((
exe.to_string_lossy().into_owned(),
args,
workdir.to_string_lossy().into_owned(),
))
}
/// Build the spawn `(command line, working dir)` for a `gog` launch value (`exe \t args \t workdir`,
/// all host-resolved from the operator's own disk). Direct exe — no shell, no Galaxy.
#[cfg(windows)]
fn gog_spawn(value: &str) -> Option<(String, Option<PathBuf>)> {
let mut parts = value.split('\t');
let exe = parts.next().filter(|s| !s.is_empty())?;
let args = parts.next().unwrap_or("");
let workdir = parts.next().filter(|s| !s.is_empty()).map(PathBuf::from);
let cmdline = if args.trim().is_empty() {
format!("\"{exe}\"")
} else {
format!("\"{exe}\" {args}")
};
Some((cmdline, workdir))
}
// ---------------------------------------------------------------------------------------
// Xbox / Microsoft Store / Game Pass (Windows) — scans the flat-file `XboxGames` install dirs
// (no auth) for GDK games (each has a Content\MicrosoftGame.config). Launch via the AUMID
// (shell:AppsFolder\<PFN>!<AppId>) in the interactive session. Cover art (displaycatalog) deferred.
// ---------------------------------------------------------------------------------------
/// Reads installed Xbox / Game Pass / Store GDK games from the flat-file install dirs. Windows-only.
/// Best-effort: empty when no `XboxGames` dir exists.
#[cfg(windows)]
pub struct XboxProvider;
#[cfg(windows)]
impl LibraryProvider for XboxProvider {
fn store(&self) -> &'static str {
"xbox"
}
fn list(&self) -> Vec<GameEntry> {
xbox_games()
}
}
/// Scan each fixed drive's default `<drive>:\XboxGames` for GDK games — the presence of
/// `Content\MicrosoftGame.config` is the game marker (so we list games, not ordinary UWP apps). A
/// custom install folder (set via the undocumented `.GamingRoot`) isn't covered; the default folder
/// is the common case. Non-GDK pure-UWP Store games (under the ACL-locked WindowsApps) are missed too.
#[cfg(windows)]
fn xbox_games() -> Vec<GameEntry> {
let mut games = Vec::new();
for letter in b'C'..=b'Z' {
let root = PathBuf::from(format!("{}:\\XboxGames", letter as char));
let Ok(rd) = std::fs::read_dir(&root) else {
continue;
};
for entry in rd.flatten() {
let title_dir = entry.path();
let cfg = title_dir.join("Content").join("MicrosoftGame.config");
if !cfg.is_file() {
continue;
}
let Ok(text) = std::fs::read_to_string(&cfg) else {
continue;
};
let folder = title_dir
.file_name()
.map(|f| f.to_string_lossy().into_owned());
let Some((name, app_id, title, store_id)) = xbox_parse_config(&text, folder.as_deref())
else {
continue;
};
let Some(pfn) = xbox_pfn(&name) else {
tracing::debug!(package = %name, "xbox: no AppRepository entry → can't resolve PFN, skipping");
continue;
};
let id_key = if store_id.is_empty() {
pfn.clone()
} else {
store_id
};
let id = format!("xbox:{id_key}");
// Art (unofficial displaycatalog, keyed by StoreId) is resolved off the hot path by the
// background warmer; read whatever it has cached (title-only until warmed / if no StoreId).
let art = cached_art(&id).unwrap_or_default();
games.push(GameEntry {
id,
store: "xbox".into(),
title,
art,
launch: Some(LaunchSpec {
kind: "aumid".into(),
value: format!("{pfn}!{app_id}"),
}),
});
}
}
games.sort_by(|a, b| a.id.cmp(&b.id));
games.dedup_by(|a, b| a.id == b.id); // same game on two drives → one entry
games
}
/// Parse the fields we need from a `MicrosoftGame.config`: `(Identity Name, AppId, title, StoreId)`.
/// AppId is the `<Executable>`'s `Id` (the AUMID app id, typically "Game"). The title prefers
/// `ShellVisuals@DefaultDisplayName`, but that can be an unresolved `ms-resource:` ref → fall back to
/// the install folder name, then the package name.
#[cfg(windows)]
fn xbox_parse_config(text: &str, folder: Option<&str>) -> Option<(String, String, String, String)> {
let doc = roxmltree::Document::parse(text).ok()?;
let root = doc.root_element();
let name = root
.children()
.find(|n| n.has_tag_name("Identity"))?
.attribute("Name")?
.to_string();
let app_id = root
.children()
.find(|n| n.has_tag_name("ExecutableList"))
.and_then(|el| {
el.children()
.filter(|n| n.has_tag_name("Executable"))
.find_map(|e| e.attribute("Id"))
})?
.to_string();
let ddn = root
.children()
.find(|n| n.has_tag_name("ShellVisuals"))
.and_then(|sv| sv.attribute("DefaultDisplayName"))
.filter(|s| !s.is_empty() && !s.starts_with("ms-resource"));
let title = ddn
.map(String::from)
.or_else(|| folder.map(String::from))
.unwrap_or_else(|| name.clone());
let store_id = root
.children()
.find(|n| n.has_tag_name("StoreId"))
.and_then(|n| n.text())
.unwrap_or("")
.to_string();
Some((name, app_id, title, store_id))
}
/// Resolve a package's PackageFamilyName by finding its
/// `AppRepository\Packages\<PackageFullName>` dir (machine-wide, SYSTEM-readable) and reducing the
/// full name to `Name_PublisherHash`. This READS the authoritative PFN — never compute the hash.
#[cfg(windows)]
fn xbox_pfn(identity: &str) -> Option<String> {
let pkgs = PathBuf::from(std::env::var_os("ProgramData")?)
.join("Microsoft")
.join("Windows")
.join("AppRepository")
.join("Packages");
let prefix = format!("{identity}_");
for e in std::fs::read_dir(&pkgs).ok()?.flatten() {
let dn = e.file_name().to_string_lossy().into_owned();
if dn.starts_with(&prefix) {
if let Some(pfn) = pfn_from_full(&dn, identity) {
return Some(pfn);
}
}
}
None
}
/// PackageFamilyName from a PackageFullName dir name
/// (`Name_Version_Arch_ResourceId_PublisherHash`) → `Name_PublisherHash`. The hash is the last
/// `_`-segment; `Name` is the caller's identity.
#[cfg(windows)]
fn pfn_from_full(dir_name: &str, identity: &str) -> Option<String> {
let hash = dir_name.rsplit('_').next()?;
(!hash.is_empty() && hash != dir_name).then(|| format!("{identity}_{hash}"))
}
// ---------------------------------------------------------------------------------------
// Cover-art resolver + cache (shared by the Windows GOG + Xbox providers, which have no local
// art). A disk cache is the source of truth read by all_games() (so the list/launch path never
// blocks on the network); a host-lifetime background warmer fetches uncached art (GOG's public
// api.gog.com + Xbox's displaycatalog, both no-auth) and persists it. Cross-platform so the
// HTTP/JSON code is compiled + checked everywhere; the warmer simply finds nothing to fetch on a
// host whose stores all carry their own art (Steam CDN / Heroic CDN / Lutris data: URLs).
// ---------------------------------------------------------------------------------------
/// The persisted art cache: GameEntry id → resolved [`Artwork`]. An entry's PRESENCE means "already
/// resolved" (even an empty Artwork = fetched, none found) so the warmer never re-fetches it.
fn art_cache() -> &'static std::sync::Mutex<std::collections::HashMap<String, Artwork>> {
static CACHE: std::sync::OnceLock<
std::sync::Mutex<std::collections::HashMap<String, Artwork>>,
> = std::sync::OnceLock::new();
CACHE.get_or_init(|| {
let loaded = std::fs::read_to_string(art_cache_path())
.ok()
.and_then(|s| serde_json::from_str(&s).ok())
.unwrap_or_default();
std::sync::Mutex::new(loaded)
})
}
/// The art cache lives in the canonical HOST config dir (`%ProgramData%\punktfunk` on Windows /
/// `~/.config/punktfunk` on Linux — gamestream::config_dir, NOT the legacy XDG/HOME `config_dir`
/// below that the custom store still uses).
fn art_cache_path() -> PathBuf {
crate::gamestream::config_dir().join("library-art-cache.json")
}
/// The cached art for a library id, if it has been resolved (positive or negative). `None` = not yet
/// warmed → the provider shows title-only until the warmer fills it in.
fn cached_art(id: &str) -> Option<Artwork> {
art_cache().lock().unwrap().get(id).cloned()
}
/// Record resolved art for a library id + persist the cache (write-then-rename; best-effort).
fn store_art(id: &str, art: Artwork) {
let mut cache = art_cache().lock().unwrap();
cache.insert(id.to_string(), art);
if let Ok(json) = serde_json::to_string(&*cache) {
let path = art_cache_path();
if let Some(dir) = path.parent() {
let _ = std::fs::create_dir_all(dir);
}
let tmp = path.with_extension("json.tmp");
if std::fs::write(&tmp, json).is_ok() {
let _ = std::fs::rename(&tmp, &path);
}
}
}
/// Start the host-lifetime cover-art warmer: every few minutes, fetch + cache art for any library
/// entry whose store needs a network lookup (GOG / Xbox) and isn't cached yet. Idempotent — once
/// everything is cached a pass makes no network calls (and a host with only self-art stores never
/// fetches at all). Call once from `serve()`; the returned handle can be dropped to detach it.
pub fn start_art_warmer() -> std::thread::JoinHandle<()> {
std::thread::Builder::new()
.name("pf-art-warmer".into())
.spawn(|| loop {
warm_art_once();
std::thread::sleep(std::time::Duration::from_secs(300));
})
.expect("spawn art warmer thread")
}
/// One warming pass: resolve uncached GOG/Xbox art. Other stores carry their own art (Steam CDN
/// template, Heroic CDN URLs, Lutris data: URLs, custom user URLs) and are skipped.
fn warm_art_once() {
for g in all_games() {
if cached_art(&g.id).is_some() {
continue;
}
let Some((store, localid)) = g.id.split_once(':') else {
continue;
};
let art = match store {
"gog" => fetch_gog_art(localid),
// The xbox id is the StoreId when present, else the PFN (contains '_', no displaycatalog
// entry) → cache empty for those so they aren't retried every pass.
"xbox" if !localid.contains('_') => fetch_xbox_art(localid),
"xbox" => Artwork::default(),
_ => continue, // steam/heroic/lutris/custom resolve their own art
};
store_art(&g.id, art);
}
}
/// HTTP GET + parse JSON with a bounded timeout. `None` on any network/parse failure (best-effort —
/// art is non-essential, so a failure just leaves the title-only card).
fn fetch_json(url: &str) -> Option<serde_json::Value> {
let agent = ureq::AgentBuilder::new()
.timeout(std::time::Duration::from_secs(10))
.build();
let body = agent.get(url).call().ok()?.into_string().ok()?;
serde_json::from_str(&body).ok()
}
/// Make a protocol-relative URL (`//host/...`, common in GOG + MS catalog responses) absolute https.
fn abs_url(u: &str) -> String {
u.strip_prefix("//")
.map(|rest| format!("https://{rest}"))
.unwrap_or_else(|| u.to_string())
}
/// GOG cover art via the public (no-auth) product API. Field names / URL shapes are GOG-specific and
/// best-effort (worth on-box confirmation); a wrong URL just degrades to the title card client-side.
fn fetch_gog_art(product_id: &str) -> Artwork {
let Some(v) = fetch_json(&format!(
"https://api.gog.com/products/{product_id}?expand=images"
)) else {
return Artwork::default();
};
let img = |k: &str| {
v.get("images")
.and_then(|i| i.get(k))
.and_then(|u| u.as_str())
.map(abs_url)
};
Artwork {
portrait: img("verticalCover"),
hero: img("background"),
logo: img("logo2x"),
header: img("logo"),
}
}
/// Xbox cover art via the (unofficial, no-auth) Microsoft display catalog, keyed by StoreId. Best-
/// effort: the endpoint is internal/unstable, so on drift this just yields no art (title-only).
fn fetch_xbox_art(store_id: &str) -> Artwork {
let Some(v) = fetch_json(&format!(
"https://displaycatalog.mp.microsoft.com/v7.0/products/{store_id}?market=US&languages=en-us&fieldsTemplate=Details"
)) else {
return Artwork::default();
};
let images = v
.get("Products")
.and_then(|p| p.as_array())
.and_then(|a| a.first())
.and_then(|p| p.get("LocalizedProperties"))
.and_then(|l| l.as_array())
.and_then(|a| a.first())
.and_then(|lp| lp.get("Images"))
.and_then(|i| i.as_array());
let mut art = Artwork::default();
for img in images.into_iter().flatten() {
let (Some(purpose), Some(uri)) = (
img.get("ImagePurpose").and_then(|v| v.as_str()),
img.get("Uri").and_then(|v| v.as_str()),
) else {
continue;
};
let url = abs_url(uri);
match purpose {
"Poster" => art.portrait = Some(url),
"SuperHeroArt" | "Hero" => art.hero = Some(url),
"Logo" => art.logo = Some(url),
"BoxArt" => art.header = Some(url),
_ => {}
}
}
art
}
// ---------------------------------------------------------------------------------------
// Custom store (user-curated entries, persisted + CRUD'd via the mgmt API)
// ---------------------------------------------------------------------------------------
@@ -768,6 +1383,32 @@ fn windows_launch_for(spec: &LaunchSpec) -> Option<(String, Option<std::path::Pa
};
Some((cmdline, None))
}
// Epic: open the (host-built, validated) com.epicgames.launcher:// URI via explorer.exe — a
// concrete EXE that resolves the registered protocol handler as the user; the URI is a single
// argv element (no shell, no cmd /c). Same pattern as the steam explorer fallback.
"epic" => epic_launch_uri(&spec.value).map(|uri| (format!("explorer.exe \"{uri}\""), None)),
// GOG: spawn the resolved game exe directly (host-derived from goggame-<id>.info), no Galaxy.
"gog" => gog_spawn(&spec.value),
// Xbox/Game Pass: activate the UWP/GDK package by its AUMID (<PFN>!<AppId>) via explorer's
// shell:AppsFolder — which runs in the interactive user session (UWP activation fails as
// SYSTEM/session-0; spawn_in_active_session uses the user token). Guard the charset (the value
// is host-derived from MicrosoftGame.config + AppRepository, but belt-and-suspenders).
"aumid" => {
let valid = spec.value.split_once('!').is_some_and(|(pfn, app)| {
let part = |s: &str| {
!s.is_empty()
&& s.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
};
part(pfn) && part(app)
});
valid.then(|| {
(
format!("explorer.exe \"shell:AppsFolder\\{}\"", spec.value),
None,
)
})
}
// Operator-typed custom command (host-owned, never client-set): run it through the shell in the
// interactive session. `cmd.exe /c` is acceptable here precisely because the value is operator
// input — the same trust as the operator typing it — not a client-influenced string.
@@ -795,6 +1436,38 @@ fn steam_exe() -> Option<std::path::PathBuf> {
None
}
/// Launch a GameStream `apps.json` command (operator-typed, trusted — never client-set) into the live
/// session, AFTER capture is up. Used by the GameStream path for the backends that DON'T nest the
/// command via [`VirtualDisplay::set_launch_command`]: Windows (no gamescope) and Linux
/// kwin/mutter/wlroots (which stream the existing desktop). The caller skips this for Linux gamescope,
/// which already nested it. On Windows it runs in the interactive USER session (the host is SYSTEM);
/// on Linux the host is already inside the user's graphical session, so a plain spawn lands the app on
/// the streamed (primary) output.
#[cfg(any(windows, target_os = "linux"))]
pub fn launch_gamestream_command(cmd: &str) -> Result<()> {
let cmd = cmd.trim();
anyhow::ensure!(!cmd.is_empty(), "empty command");
#[cfg(windows)]
{
// cmd.exe /c is fine here: the value is the host operator's own apps.json command, not a
// client-influenced string (same trust as the custom-store `command` kind).
let pid = crate::interactive::spawn_in_active_session(&format!("cmd.exe /c {cmd}"), None)
.context("spawn gamestream command in the interactive session")?;
tracing::info!(command = %cmd, pid, "gamestream: launched app in the interactive session");
Ok(())
}
#[cfg(target_os = "linux")]
{
let child = std::process::Command::new("sh")
.arg("-c")
.arg(cmd)
.spawn()
.context("spawn gamestream command")?;
tracing::info!(command = %cmd, pid = child.id(), "gamestream: launched app into the session");
Ok(())
}
}
/// The full library: every store's titles merged + the custom entries, sorted by title.
pub fn all_games() -> Vec<GameEntry> {
let mut games = SteamProvider.list();
@@ -805,6 +1478,13 @@ pub fn all_games() -> Vec<GameEntry> {
games.extend(LutrisProvider.list());
games.extend(HeroicProvider.list());
}
// Windows store providers (their launchers are Windows-only): Epic + GOG + Xbox/Game Pass.
#[cfg(windows)]
{
games.extend(EpicProvider.list());
games.extend(GogProvider.list());
games.extend(XboxProvider.list());
}
games.extend(load_custom().into_iter().map(GameEntry::from));
games.sort_by_key(|g| g.title.to_lowercase());
games
@@ -1048,6 +1728,20 @@ mod tests {
windows_launch_for(&cmd).unwrap().0,
"cmd.exe /c notepad.exe"
);
// Xbox AUMID → explorer shell:AppsFolder activation; a value without '!' is rejected.
let aumid = LaunchSpec {
kind: "aumid".into(),
value: "Microsoft.X_8wekyb3d8bbwe!Game".into(),
};
assert_eq!(
windows_launch_for(&aumid).unwrap().0,
"explorer.exe \"shell:AppsFolder\\Microsoft.X_8wekyb3d8bbwe!Game\""
);
assert!(windows_launch_for(&LaunchSpec {
kind: "aumid".into(),
value: "no-bang".into()
})
.is_none());
// Empty / unknown kinds → no recipe.
assert!(windows_launch_for(&LaunchSpec {
kind: "command".into(),
@@ -1060,4 +1754,116 @@ mod tests {
})
.is_none());
}
#[cfg(windows)]
#[test]
fn epic_filters_and_builds_launch() {
let dir = std::env::temp_dir().join(format!("pf-epic-test-{}", std::process::id()));
std::fs::create_dir_all(&dir).unwrap();
let inst = dir.to_string_lossy().into_owned();
let empty = std::collections::HashMap::new();
// Normal game with the full triple → kept, triple launch value.
let game = serde_json::json!({
"AppName": "Fortnite", "DisplayName": "Fortnite", "CatalogNamespace": "fn",
"CatalogItemId": "abc123", "InstallLocation": inst.clone(),
"AppCategories": ["public", "games", "applications"]
});
let e = epic_entry(&game, &empty).expect("game kept");
assert_eq!(e.id, "epic:Fortnite");
assert_eq!(e.launch.as_ref().unwrap().value, "fn:abc123:Fortnite");
// UE component, non-launchable addon, and a missing install dir are all skipped.
let ue = serde_json::json!({"AppName":"UE_5.3","InstallLocation":inst.clone(),"AppCategories":["engines"]});
assert!(epic_entry(&ue, &empty).is_none());
let dlc =
serde_json::json!({"AppName":"DLC","InstallLocation":inst,"AppCategories":["addons"]});
assert!(epic_entry(&dlc, &empty).is_none());
let gone = serde_json::json!({"AppName":"Gone","InstallLocation":"C:\\nope-xyz","AppCategories":["games"]});
assert!(epic_entry(&gone, &empty).is_none());
std::fs::remove_dir_all(&dir).ok();
}
#[cfg(windows)]
#[test]
fn epic_launch_uri_triple_bare_and_guard() {
assert_eq!(
epic_launch_uri("fn:abc:Fortnite").as_deref(),
Some("com.epicgames.launcher://apps/fn%3Aabc%3AFortnite?action=launch&silent=true")
);
assert_eq!(
epic_launch_uri("Fortnite").as_deref(),
Some("com.epicgames.launcher://apps/Fortnite?action=launch&silent=true")
);
assert!(epic_launch_uri("bad part:x:y").is_none()); // a space → rejected
assert!(epic_launch_uri("").is_none());
}
#[cfg(windows)]
#[test]
fn gog_spawn_parses_and_guards() {
let (cmd, wd) = gog_spawn("C:\\Games\\W3\\witcher3.exe\t--skip\tC:\\Games\\W3").unwrap();
assert_eq!(cmd, "\"C:\\Games\\W3\\witcher3.exe\" --skip");
assert_eq!(wd, Some(std::path::PathBuf::from("C:\\Games\\W3")));
let (cmd2, wd2) = gog_spawn("C:\\g.exe").unwrap();
assert_eq!(cmd2, "\"C:\\g.exe\"");
assert!(wd2.is_none());
assert!(gog_spawn("").is_none());
}
#[cfg(windows)]
#[test]
fn gog_play_task_picks_primary_filetask() {
let dir = std::env::temp_dir().join(format!("pf-gog-test-{}", std::process::id()));
std::fs::create_dir_all(&dir).unwrap();
let id = "1207658924";
std::fs::write(
dir.join(format!("goggame-{id}.info")),
r#"{"playTasks":[
{"isPrimary":false,"type":"FileTask","path":"other.exe"},
{"isPrimary":true,"type":"FileTask","path":"bin\\game.exe","arguments":"-w","workingDir":"bin"}
]}"#,
)
.unwrap();
let (exe, args, wd) = gog_play_task(&dir.to_string_lossy(), id).unwrap();
std::fs::remove_dir_all(&dir).ok();
assert!(exe.ends_with("bin\\game.exe"), "exe={exe}");
assert_eq!(args, "-w");
assert!(wd.ends_with("bin"), "wd={wd}");
}
#[cfg(windows)]
#[test]
fn xbox_parse_config_and_pfn() {
let xml = r#"<?xml version="1.0" encoding="utf-8"?>
<Game configVersion="1">
<Identity Name="Microsoft.624F8B84B80" Publisher="CN=Microsoft" Version="1.0.0.0" />
<ExecutableList>
<Executable Name="gamelaunchhelper.exe" Id="Game" />
</ExecutableList>
<StoreId>9NBLGGH4R315</StoreId>
<ShellVisuals DefaultDisplayName="Halo Infinite" Square150x150Logo="x.png" />
</Game>"#;
let (name, app_id, title, store_id) = xbox_parse_config(xml, Some("HaloInfinite")).unwrap();
assert_eq!(name, "Microsoft.624F8B84B80");
assert_eq!(app_id, "Game");
assert_eq!(title, "Halo Infinite");
assert_eq!(store_id, "9NBLGGH4R315");
// An ms-resource DefaultDisplayName is unresolvable → fall back to the install folder name.
let xml2 = r#"<Game><Identity Name="Pkg.Name"/>
<ExecutableList><Executable Id="App"/></ExecutableList>
<ShellVisuals DefaultDisplayName="ms-resource:DisplayName"/></Game>"#;
let (_, app2, title2, sid2) = xbox_parse_config(xml2, Some("MyGameFolder")).unwrap();
assert_eq!(app2, "App");
assert_eq!(title2, "MyGameFolder");
assert_eq!(sid2, "");
// PackageFamilyName reduced from a PackageFullName dir name (the hash is the last segment).
assert_eq!(
pfn_from_full(
"Microsoft.624F8B84B80_1.0.0.0_x64__8wekyb3d8bbwe",
"Microsoft.624F8B84B80"
)
.as_deref(),
Some("Microsoft.624F8B84B80_8wekyb3d8bbwe")
);
assert!(pfn_from_full("NoUnderscore", "NoUnderscore").is_none());
}
}
@@ -13,6 +13,9 @@
//! attaches none, the export yields an already-signaled sync_file (poll returns immediately) — no
//! wait, no harm, and `waited=false` tells us the driver doesn't fence (so zero-copy would still race).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::os::fd::RawFd;
// linux/dma-buf.h ioctls on the DMA_BUF_BASE ('b' = 0x62) magic. _IOWR = dir(3)<<30 | size<<16 | base<<8 | nr.
@@ -40,6 +43,11 @@ pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<boo
flags: DMA_BUF_SYNC_READ,
fd: -1,
};
// SAFETY: `dmabuf_fd` is a live dmabuf fd supplied by the caller (borrowed for this call; we
// never close it). `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` encodes `size_of::<DmaBufExportSyncFile>()`
// — the exact byte count the kernel copies — and `&mut req` is a live, correctly-sized
// `#[repr(C)]` struct the EXPORT_SYNC_FILE ioctl reads (`flags`) and writes (`fd`). `req`
// outlives this synchronous call and is not aliased elsewhere.
let r = unsafe { libc::ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &mut req) };
if r < 0 {
return Err(std::io::Error::last_os_error());
@@ -54,11 +62,21 @@ pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<boo
revents: 0,
};
// Non-blocking probe: not-yet-signaled (poll==0) means the producer is still rendering.
// SAFETY: `&mut pfd` points at a single live `libc::pollfd` and `nfds == 1` matches that one
// element; `pfd.fd` is `sync_fd`, the sync_file fd just exported (already checked `>= 0`).
// `poll` reads `fd`/`events` and writes `revents` for this non-blocking (timeout 0) probe, then
// returns — `pfd` outlives the call and aliases nothing.
let pending = unsafe { libc::poll(&mut pfd, 1, 0) } == 0;
if pending {
pfd.revents = 0;
// SAFETY: same live single-element `pfd` (its `revents` reset to 0 just above), `nfds == 1`,
// and `sync_fd` still open. This blocking `poll` (up to `timeout_ms`) waits for the render
// fence to signal; it reads `fd`/`events`, writes `revents`, and returns before `pfd` ends.
unsafe { libc::poll(&mut pfd, 1, timeout_ms) }; // block until the render fence signals
}
// SAFETY: `sync_fd` is the sync_file fd the EXPORT_SYNC_FILE ioctl created and handed us to own;
// this point is reached only when `sync_fd >= 0`, this `close` runs exactly once on it, and it is
// never used afterward — no double-close or use-after-close.
unsafe { libc::close(sync_fd) };
Ok(pending)
}
@@ -8,6 +8,8 @@
//! verified (ioctl numbers + a live signal→wait round trip), ready to wire in the moment a producer
//! gains working `SPA_META_SyncTimeline`.
#![allow(dead_code)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
//!
//! Compositors that render directly into the PipeWire buffer pool (Mutter's virtual
//! monitors) hand buffers over at GPU-submit time; on drivers without implicit dmabuf
@@ -81,6 +83,8 @@ pub struct DrmSync {
impl DrmSync {
pub fn open() -> Result<DrmSync> {
let path = c"/dev/dri/renderD128";
// SAFETY: `path` is a 'static NUL-terminated C string literal; `open` only reads it as a
// filesystem path and returns an fd (or -1). No Rust memory is aliased or handed to the kernel.
let fd = unsafe { libc::open(path.as_ptr(), libc::O_RDWR | libc::O_CLOEXEC) };
if fd < 0 {
bail!("open /dev/dri/renderD128 for syncobj ops: {}", errno());
@@ -94,6 +98,9 @@ impl DrmSync {
fd: syncobj_fd,
..Default::default()
};
// SAFETY: `self.fd` is the live render-node fd from `open`; the request number encodes
// `size_of::<DrmSyncobjHandle>()` (the bytes the kernel copies), and `&mut req` is a live,
// correctly-sized `#[repr(C)]` struct the FD_TO_HANDLE ioctl reads (`fd`) and writes (`handle`).
let r = unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, &mut req) };
if r < 0 {
bail!("SYNCOBJ_FD_TO_HANDLE: {}", errno());
@@ -106,6 +113,8 @@ impl DrmSync {
handle,
..Default::default()
};
// SAFETY: `self.fd` is the live render-node fd; `DRM_IOCTL_SYNCOBJ_DESTROY` encodes
// `size_of::<DrmSyncobjDestroy>()`, and `&mut req` is a live correctly-sized struct the kernel reads.
unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_DESTROY, &mut req) };
}
@@ -117,6 +126,8 @@ impl DrmSync {
tv_sec: 0,
tv_nsec: 0,
};
// SAFETY: `CLOCK_MONOTONIC` is a valid clock id and `&mut now` is a live `libc::timespec` the
// kernel fills in; the call returns before `now` is read, so there is no aliasing/lifetime issue.
unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC, &mut now) };
let deadline = now.tv_sec * 1_000_000_000 + now.tv_nsec + timeout_ms as i64 * 1_000_000;
let handles = [handle];
@@ -129,6 +140,11 @@ impl DrmSync {
flags: DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
..Default::default()
};
// SAFETY: `self.fd` is the live render-node fd; the request number encodes
// `size_of::<DrmSyncobjTimelineWait>()`; `&mut req` is a live correctly-sized struct. Its
// `handles`/`points` u64 fields hold the addresses of the local `handles`/`points` arrays, which
// outlive this synchronous call, and `count_handles == 1` matches their length — so every kernel
// read through those addresses stays in bounds.
let r = unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &mut req) };
let saved = errno();
self.destroy(handle);
@@ -151,6 +167,10 @@ impl DrmSync {
count_handles: 1,
flags: 0,
};
// SAFETY: `self.fd` is the live render-node fd; the request number encodes
// `size_of::<DrmSyncobjTimelineArray>()`; `&mut req` is a live correctly-sized struct whose
// `handles`/`points` u64 fields address the local `handles`/`points` arrays (alive for this
// synchronous call, `count_handles == 1` matching their length).
let r = unsafe { libc::ioctl(self.fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &mut req) };
let saved = errno();
self.destroy(handle);
@@ -163,6 +183,8 @@ impl DrmSync {
impl Drop for DrmSync {
fn drop(&mut self) {
// SAFETY: `self.fd` is the fd `open` returned; this `DrmSync` owns it exclusively and `close`
// runs exactly once (here, in `Drop`), so there is no double-close or use-after-close.
unsafe { libc::close(self.fd) };
}
}
@@ -203,14 +225,19 @@ mod tests {
const CREATE: u64 = iowr(0xBF, std::mem::size_of::<Create>());
const HANDLE_TO_FD: u64 = iowr(0xC1, std::mem::size_of::<DrmSyncobjHandle>());
let mut c = Create::default();
// SAFETY: `sync.fd` is the live render-node fd; `CREATE` encodes `size_of::<Create>()`, and
// `&mut c` is a live correctly-sized struct the kernel fills (`handle`).
assert!(unsafe { libc::ioctl(sync.fd, CREATE, &mut c) } >= 0);
let mut h = DrmSyncobjHandle {
handle: c.handle,
..Default::default()
};
// SAFETY: `sync.fd` is live; `HANDLE_TO_FD` encodes `size_of::<DrmSyncobjHandle>()`; `&mut h`
// is a live correctly-sized struct (the kernel reads `handle`, writes `fd`).
assert!(unsafe { libc::ioctl(sync.fd, HANDLE_TO_FD, &mut h) } >= 0);
sync.signal_point(h.fd, 1).expect("signal");
sync.wait_point(h.fd, 1, 100).expect("wait after signal");
// SAFETY: `h.fd` is the fd HANDLE_TO_FD just exported; we own it and close it exactly once here.
unsafe { libc::close(h.fd) };
sync.destroy(c.handle);
}
@@ -11,6 +11,8 @@
//! thread) and ffmpeg's `hevc_nvenc` (encode thread); each thread makes it current before use.
#![allow(non_camel_case_types, non_snake_case)]
// Every `unsafe` block/impl below carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Result};
use std::os::raw::{c_int, c_uint, c_void};
@@ -128,8 +130,14 @@ struct CudaApi {
) -> CUresult,
cuDestroyExternalMemory: unsafe extern "C" fn(CUexternalMemory) -> CUresult,
}
// The resolved fn pointers are plain addresses into a process-lifetime mapping; safe to share.
// SAFETY: every field is a bare `extern "C" fn` address into the leaked, process-lifetime
// `libcuda` mapping (`cuda_api` `forget`s the `Library`, so it is never unloaded) — an immutable
// value with no interior mutability and no thread affinity. Moving the table to another thread
// cannot dangle (the code it points at stays mapped) or race (the fields are read-only).
unsafe impl Send for CudaApi {}
// SAFETY: as above — the table is a set of immutable fn-pointer addresses with no interior
// mutability, so concurrent shared reads from multiple threads cannot race; the driver entry
// points they address are themselves thread-safe.
unsafe impl Sync for CudaApi {}
/// `CUresult` returned by the wrappers when `libcuda` isn't loaded (no NVIDIA driver). Non-zero so
@@ -143,6 +151,14 @@ static CUDA_API: OnceLock<Option<CudaApi>> = OnceLock::new();
/// (the expected case on AMD/Intel hosts) — logged at debug, not an error.
fn cuda_api() -> Option<&'static CudaApi> {
CUDA_API
// SAFETY: `Library::new` runs `libcuda.so.1`'s initializers — it is the trusted NVIDIA
// driver library, so loading has no unexpected effects; `?`/`None` handle its absence.
// Each `lib.get::<T>(name)` asserts the symbol's real ABI equals `T`: every NUL-terminated
// name is a documented CUDA Driver API entry point and `T` is the exact
// `unsafe extern "C" fn(..)` signature from cuda.h/cudaGL.h (`_v2` for ctx/mem ops). Each
// `Symbol` only borrows `lib` until the end of the struct-literal statement; we deref-copy
// the raw fn-pointer out first, then `forget(lib)` leaks the mapping so those addresses
// stay valid for the whole process. Runs once under the `OnceLock` init — no aliasing.
.get_or_init(|| unsafe {
let lib = libloading::Library::new("libcuda.so.1")
.or_else(|_| libloading::Library::new("libcuda.so"))
@@ -361,6 +377,12 @@ pub fn read_plane_to_host(
Height: height,
..Default::default()
};
// SAFETY: `copy_blocking` is unsafe because it issues a CUDA copy; its contract is a valid
// descriptor with the shared context current (the caller's responsibility — self-test path).
// `&copy` is a live local `#[repr(C)] CUDA_MEMCPY2D` that outlives the synchronous call:
// `srcDevice`/`srcPitch` are the caller's live pitched device plane, `dstHost` addresses the
// freshly-allocated `host` `Vec` of exactly `width_bytes*height` bytes, and `WidthInBytes`×
// `Height` fit both. The copy is synchronous, so `host` is fully written before we return it.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(dev->host)")? };
Ok(host)
}
@@ -369,7 +391,13 @@ pub fn read_plane_to_host(
/// in a `OnceLock`; the raw `CUcontext` is thread-safe to make current from any thread.
#[derive(Clone, Copy)]
pub struct Context(pub CUcontext);
// SAFETY: `CUcontext` is an opaque CUDA driver handle, not a dereferenceable Rust pointer. It is
// created once and never destroyed (process lifetime), and the only thing done with it is
// `cuCtxSetCurrent`, which the Driver API explicitly allows from any thread — so transferring the
// handle to another thread cannot dangle or race (the driver owns the synchronization).
unsafe impl Send for Context {}
// SAFETY: as above — the wrapped handle is an immutable opaque address and the driver does all the
// synchronization, so sharing `&Context` across threads is sound.
unsafe impl Sync for Context {}
static CONTEXT: OnceLock<Context> = OnceLock::new();
@@ -382,6 +410,12 @@ pub fn context() -> Result<CUcontext> {
if cuda_api().is_none() {
bail!("libcuda.so.1 not available — no NVIDIA driver (CUDA zero-copy disabled)");
}
// SAFETY: we returned above unless `cuda_api()` is `Some`, so every wrapper here forwards into
// the live, leaked `libcuda` table rather than the not-loaded stub. `cuInit(0)` passes the
// API-required flags value 0. `&mut dev`/`&mut ctx` are live, zero/null-initialized stack
// out-params the driver writes the device handle / new context into; each outlives its
// synchronous call and they are distinct locals (no aliasing). `cuCtxCreate_v2` yields a valid
// `CUcontext` on success (`ck` bails otherwise), which becomes the block's value.
let ctx = unsafe {
ck(cuInit(0), "cuInit")?;
let mut dev: CUdevice = 0;
@@ -401,6 +435,10 @@ pub fn context() -> Result<CUcontext> {
/// Make the shared context current on the calling thread (required before any CUDA op here).
pub fn make_current() -> Result<()> {
let ctx = context()?;
// SAFETY: `ctx` came from `context()?`, so it is the live shared `CUcontext` and the driver
// table is present. `cuCtxSetCurrent` binds that opaque handle to the calling thread; it takes
// no Rust-memory pointer and is thread-safe (affects only this thread's current context), so
// there is no aliasing or lifetime hazard.
unsafe { ck(cuCtxSetCurrent(ctx), "cuCtxSetCurrent") }
}
@@ -423,6 +461,12 @@ fn copy_stream() -> CUstream {
if let Some(s) = cell.get() {
return s;
}
// SAFETY: `copy_stream` runs with the shared context current (its doc contract), so the
// wrappers forward into the live `libcuda` table. `&mut least`/`&mut greatest` are live
// stack `i32`s the driver fills with the priority range; `&mut s` is a live null-init
// `CUstream` the driver writes the new stream into. All out-params outlive their
// synchronous calls and are distinct locals. On any non-zero result we fall back to a null
// (NULL-stream) value and never read an uninitialized handle.
let stream = unsafe {
let (mut least, mut greatest) = (0i32, 0i32);
if cuCtxGetStreamPriorityRange(&mut least, &mut greatest) != 0 {
@@ -459,6 +503,11 @@ unsafe fn copy_blocking(copy: &CUDA_MEMCPY2D, what: &str) -> Result<()> {
fn alloc_pitched(width: u32, height: u32) -> Result<(CUdeviceptr, usize)> {
let mut ptr: CUdeviceptr = 0;
let mut pitch: usize = 0;
// SAFETY: `cuMemAllocPitch_v2` allocates a pitched device buffer (the wrapper forwards to the
// live table on any path that reached allocation). `&mut ptr` (`CUdeviceptr`) and `&mut pitch`
// (`usize`) are live, distinct stack out-params the driver writes the allocation pointer and
// its pitch into; both outlive the synchronous call. Width/height/element-size are by-value
// ints. No aliasing — two separate locals.
unsafe {
ck(
cuMemAllocPitch_v2(
@@ -486,6 +535,10 @@ fn alloc_pitched_nv12(
let mut y_pitch: usize = 0;
let mut uv_ptr: CUdeviceptr = 0;
let mut uv_pitch: usize = 0;
// SAFETY: two independent `cuMemAllocPitch_v2` calls (wrapper → live table). `&mut y_ptr`/
// `&mut y_pitch` and `&mut uv_ptr`/`&mut uv_pitch` are live, distinct stack out-params the
// driver writes each plane's pointer and pitch into; all outlive their synchronous calls. The
// dimension/element-size args are by-value ints. No aliasing — four separate locals.
unsafe {
ck(
cuMemAllocPitch_v2(
@@ -524,6 +577,13 @@ struct PoolInner {
impl Drop for PoolInner {
fn drop(&mut self) {
// SAFETY: the pool only exists because allocation succeeded, so the driver table is live.
// `PoolInner` drops only once every `DeviceBuffer` that referenced it (each holds an `Arc`
// clone) has been recycled, so `free`/`free_uv` hold every outstanding allocation exactly
// once and nothing else still uses them — no double-free or use-after-free. We make the
// shared context current first (drop may run off the allocating thread) so `cuMemFree_v2`
// targets the right context. Each `p` is a `CUdeviceptr` previously returned by
// `cuMemAllocPitch_v2`; results are ignored (best-effort teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -697,6 +757,12 @@ impl Drop for DeviceBuffer {
}
} else {
// The buffer may be freed on the encode thread; cuMemFree needs a current context.
// SAFETY: this is the un-pooled branch (`pool` is `None`), so this `DeviceBuffer`
// exclusively owns `self.ptr` (and `self.uv`'s `uv_ptr`), each returned by
// `cuMemAllocPitch_v2` and freed exactly once here — `drop` runs once and the
// `self.ptr == 0` guard above skips the sentinel/empty case, so no double-free. We set
// the shared context current first because drop may run on a thread where it isn't, and
// `cuMemFree_v2` needs it. Wrapper → live table; results ignored (teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -745,6 +811,16 @@ impl RegisteredTexture {
/// unmap. The copy is synchronized (on our priority stream) before unmap so `dst` is ready
/// before the source dmabuf is recycled. Always unmaps, even if the copy errors.
pub fn copy_mapped_to(&mut self, dst: &DeviceBuffer) -> Result<()> {
// SAFETY: `self.resource` is the valid `CUgraphicsResource` from a successful `register_gl`
// (its only constructor), so the wrappers forward to the live table; the caller holds the
// GL+CUDA contexts current (the registration's contract). `cuGraphicsMapResources` maps
// `count == 1` resource via `&mut self.resource` (a live field) on the default stream;
// `cuGraphicsSubResourceGetMappedArray` writes the mapped `CUarray` into the live local
// `array` (index 0, mip 0). On failure we unmap and bail (balanced). `&copy` is a live
// local `CUDA_MEMCPY2D` outliving the synchronous `copy_blocking`: `srcArray` is valid
// while mapped, `dstDevice`/`dstPitch` are `dst`'s live allocation, `width*4`×`height` fit
// both. `copy_blocking` syncs before we unmap, so the array stays valid through the copy;
// we always unmap afterward (even on error), keeping the map/unmap pair balanced.
unsafe {
ck(
cuGraphicsMapResources(1, &mut self.resource, std::ptr::null_mut()),
@@ -783,6 +859,14 @@ impl RegisteredTexture {
width_bytes: usize,
height: usize,
) -> Result<()> {
// SAFETY: identical contract to `copy_mapped_to` — `self.resource` is the valid
// `CUgraphicsResource` from `register_gl` (wrappers → live table; caller holds GL+CUDA
// contexts current). Map `count == 1` resource via the live `&mut self.resource`; the
// mapped `CUarray` is written into the live local `array` (index 0, mip 0); on failure we
// unmap and bail (balanced). `&copy` is a live local outliving the synchronous
// `copy_blocking`: `srcArray` valid while mapped, `dstDevice`/`dstPitch` are the caller's
// live plane, `width_bytes`×`height` fit it. We always unmap afterward, even on copy error,
// so the map/unmap pair stays balanced and the array outlives the copy.
unsafe {
ck(
cuGraphicsMapResources(1, &mut self.resource, std::ptr::null_mut()),
@@ -847,6 +931,10 @@ pub fn copy_device_to_device(
Height: src.height as usize,
..Default::default()
};
// SAFETY: `copy_blocking` is unsafe (issues a CUDA copy); the caller must have the shared
// context current (documented). `&copy` is a live local device→device `CUDA_MEMCPY2D` outliving
// the synchronous call: `srcDevice`/`srcPitch` are `src`'s live allocation, `dstDevice`/
// `dstPitch` the caller's live region, `width*4`×`height` within both. Wrapper → live table.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(dev->dev)") }
}
@@ -888,6 +976,12 @@ pub fn copy_nv12_to_device(
Height: h / 2,
..Default::default()
};
// SAFETY: two unsafe `copy_blocking` device→device copies; the caller must have the shared
// context current (documented). `&y`/`&uv` are live local `CUDA_MEMCPY2D`s outliving each
// synchronous call. All four device pointers are valid: `src.ptr`/`src_uv_ptr` come from a live
// NV12 `DeviceBuffer` (its `.uv` presence was checked via `ok_or_else`), `y_dst`/`uv_dst` are
// the caller's live NVENC surface planes; the luma copy is `w`×`h`, the chroma copy
// `(w/2)*2`×`h/2`, each within its planes. Wrappers → live table.
unsafe {
copy_blocking(&y, "cuMemcpy2DAsync_v2(nv12 Y dev->dev)")?;
copy_blocking(&uv, "cuMemcpy2DAsync_v2(nv12 UV dev->dev)")
@@ -897,6 +991,12 @@ pub fn copy_nv12_to_device(
impl Drop for RegisteredTexture {
fn drop(&mut self) {
if !self.resource.is_null() {
// SAFETY: `self.resource` is non-null (just checked) and is the valid
// `CUgraphicsResource` from `register_gl`, owned exclusively by this `RegisteredTexture`
// and unregistered exactly once here (drop runs once) — no use-after-free or
// double-unregister. `cuGraphicsUnregisterResource` releases the GL↔CUDA registration;
// wrapper → live table (the resource exists ⇒ the driver was present). Result ignored
// (best-effort teardown).
unsafe {
let _ = cuGraphicsUnregisterResource(self.resource);
}
@@ -913,7 +1013,11 @@ pub struct ExternalDmabuf {
pub size: u64,
}
// Raw driver handles; used from the single capture thread but moved with the importer.
// SAFETY: the fields are opaque CUDA driver handles — an external-memory handle and a device
// pointer — not dereferenceable Rust memory, and the value is uniquely owned (no `Clone`). It is
// used from a single capture thread but constructed on / moved between threads with the importer;
// transferring these handles is sound because uniqueness rules out aliasing and they are destroyed
// exactly once in `Drop`. Only `Send` (not `Sync`) is asserted, matching the single-thread use.
unsafe impl Send for ExternalDmabuf {}
impl ExternalDmabuf {
@@ -921,6 +1025,9 @@ impl ExternalDmabuf {
/// from then on) and map its full `size` bytes to a device pointer. The shared context
/// must be current.
pub fn import(fd: i32, size: u64) -> Result<ExternalDmabuf> {
// SAFETY: `libc::dup` only reads the integer `fd` and returns a new descriptor (or -1); it
// touches no Rust memory and `fd` is the caller's still-owned dmabuf fd (not consumed
// here). No aliasing or lifetime concern — a pure syscall on an integer.
let dup = unsafe { libc::dup(fd) };
if dup < 0 {
bail!("dup(dmabuf fd) failed");
@@ -938,8 +1045,17 @@ impl ExternalDmabuf {
};
desc.handle[0] = dup as u32 as u64; // union member `int fd` (little-endian low bytes)
let mut ext: CUexternalMemory = std::ptr::null_mut();
// SAFETY: `cuImportExternalMemory` imports the memory described by `&desc`, a live local
// `#[repr(C)] CUDA_EXTERNAL_MEMORY_HANDLE_DESC` (cuda.h 64-bit layout) that outlives this
// synchronous call: `type_` is OPAQUE_FD, `handle[0]` holds the dup'd fd in the union's
// `int fd` low bytes, `size` is set. `&mut ext` is a live null-init out-param the driver
// writes the imported handle into. The driver takes ownership of the fd only on success.
// Distinct locals → no aliasing. Wrapper → live table (caller holds the context current).
let r = unsafe { cuImportExternalMemory(&mut ext, &desc) };
if r != 0 {
// SAFETY: import failed (`r != 0`), so the driver did NOT take ownership of `dup`; we
// still own it and close it exactly once here on the error path (the success path never
// closes it — the driver does). `libc::close` acts on the integer fd alone.
unsafe { libc::close(dup) }; // import failed → the driver did not take the fd
bail!("cuImportExternalMemory failed ({r}) — LINEAR dmabuf import unsupported?");
}
@@ -949,8 +1065,17 @@ impl ExternalDmabuf {
..Default::default()
};
let mut ptr: CUdeviceptr = 0;
// SAFETY: maps a device pointer from `ext` (the valid `CUexternalMemory` just imported) per
// `&buf`, a live local `CUDA_EXTERNAL_MEMORY_BUFFER_DESC` (offset 0, full `size`) that
// outlives this synchronous call. `&mut ptr` is a live zero-init out-param the driver writes
// the mapped device address into; distinct locals → no aliasing. Wrapper → live table
// (context current).
let r = unsafe { cuExternalMemoryGetMappedBuffer(&mut ptr, ext, &buf) };
if r != 0 {
// SAFETY: mapping failed; `ext` is the valid `CUexternalMemory` we imported and
// exclusively own. We destroy it exactly once here on the error path (the success path
// instead moves it into the returned `ExternalDmabuf`, whose `Drop` destroys it),
// releasing the fd the driver took — no double-destroy or use-after-free.
unsafe {
let _ = cuDestroyExternalMemory(ext);
}
@@ -962,6 +1087,12 @@ impl ExternalDmabuf {
impl Drop for ExternalDmabuf {
fn drop(&mut self) {
// SAFETY: this `ExternalDmabuf` only exists after a successful import, so the driver table
// is live. It exclusively owns `self.ptr` (the mapped buffer) and `self.ext` (the external
// memory), each torn down exactly once here (drop runs once; guarded by `!= 0` / `!null`) —
// no double-free or use-after-free. We make the shared context current first because drop
// may run off the import thread, and we free the mapped buffer before destroying its
// backing external memory. Results ignored (best-effort teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -996,5 +1127,10 @@ pub fn copy_pitched_to_buffer(
};
// copy_blocking syncs our priority stream before returning, so the copy is complete before the
// dmabuf is requeued to the producer.
// SAFETY: `copy_blocking` is unsafe (issues a CUDA copy); the caller must have the shared
// context current (documented). `&copy` is a live local device→device `CUDA_MEMCPY2D` outliving
// the synchronous call: `srcDevice`/`srcPitch` are the caller's live mapped span (e.g. an
// `ExternalDmabuf`), `dstDevice`/`dstPitch` are `dst`'s live allocation, `width*4`×`height`
// within both. Wrapper → live table.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(ext->dev)") }
}
+110 -3
View File
@@ -12,6 +12,8 @@
//! owned [`DeviceBuffer`] so the dmabuf can be returned to the compositor immediately.
#![allow(non_upper_case_globals)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::cuda::{self, DeviceBuffer};
use anyhow::{bail, ensure, Context as _, Result};
@@ -415,6 +417,14 @@ impl Nv12Blit {
impl Drop for Nv12Blit {
fn drop(&mut self) {
// SAFETY: these GL names (textures/FBOs/VAO/programs) were all created by THIS `Nv12Blit`
// in `Nv12Blit::new` on the current GL context, which is still current because the owning
// `EglImporter` is dropped on its single capture thread (fields drop before
// `EglImporter::drop`, which never releases the context). `glDelete*` takes a count + a
// pointer to that many names: `&self.y_tex`/`&self.vao` are `&u32` to one live field (n=1);
// `[self.y_fbo, self.uv_fbo].as_ptr()` points at a 2-element temporary that lives for the
// whole `glDeleteFramebuffers` call (n=2 matches). The symbols dispatch through libGL
// (libglvnd) to the driver for the current context. Each name is deleted exactly once.
unsafe {
glDeleteTextures(1, &self.y_tex);
glDeleteTextures(1, &self.uv_tex);
@@ -459,7 +469,14 @@ pub struct EglImporter {
render_fd: c_int,
}
// The EGL handles are confined to the capture thread; the struct is moved there once.
// SAFETY: `EglImporter` owns thread-affine handles — an EGLDisplay/contexts made current on one
// thread, a loaded GL proc pointer, a `gbm_device*`, a raw fd, and CUDA-registered GL textures —
// none safe to touch concurrently. It is constructed inside `pipewire_thread` on the dedicated
// `punktfunk-pipewire` thread, and every method (`import*`, `supported_modifiers`, `Drop`) runs on
// that same thread; it is never accessed through a shared `&` from another thread. `Send` asserts
// only that transferring *ownership* is sound (needed so the importer can live in the PipeWire
// stream's user-data, whose API imposes a `Send` bound) — the live handles are never used
// off-thread. `Sync` is deliberately NOT implied.
unsafe impl Send for EglImporter {}
impl EglImporter {
@@ -470,16 +487,38 @@ impl EglImporter {
// to the same DRM device CUDA-GL interop associates with, which the EGL device platform
// did not (cuGraphicsGLRegisterImage rejected device-platform GL textures).
let path = std::ffi::CString::new("/dev/dri/renderD128").unwrap();
// SAFETY: `path` is a live local `CString` (built from a string with no interior NUL, so it
// is NUL-terminated); `path.as_ptr()` is a valid pointer to that buffer which outlives this
// synchronous `open`. `open` only reads the path and returns a new fd (or -1); it neither
// retains the pointer nor writes through it, so there is no aliasing or lifetime hazard.
let render_fd = unsafe { libc::open(path.as_ptr(), libc::O_RDWR | libc::O_CLOEXEC) };
ensure!(render_fd >= 0, "open /dev/dri/renderD128 for GBM");
// SAFETY: `render_fd` is the live DRM render-node fd just returned by `open` and checked
// `>= 0`. `gbm_create_device` (libgbm, linked above) builds a `gbm_device` over that fd and
// returns a `*mut gbm_device` (or null); it borrows but does not take ownership of the fd,
// which `EglImporter` keeps open and closes only in `Drop` after `gbm_device_destroy`. No
// Rust-owned memory is passed, so there is nothing to alias.
let gbm = unsafe { gbm_create_device(render_fd) };
if gbm.is_null() {
// SAFETY: reached only when `gbm_create_device` failed (null) — the fd was not consumed
// and no `EglImporter` exists yet to close it again, so this `close` runs exactly once on
// the live `render_fd`, releasing it before the error return. No double-close.
unsafe { libc::close(render_fd) };
anyhow::bail!("gbm_create_device failed");
}
// SAFETY: `Egl::load_required` dlopens the system libEGL and binds its entry points,
// trusting that libEGL (libglvnd) is a genuine EGL 1.5 implementation whose core symbols
// match the ABI the `khronos_egl` `EGL1_5` bindings declare. No Rust memory is passed; the
// returned instance is afterwards used only through the safe `khronos_egl` wrappers.
let egl: Egl =
unsafe { Egl::load_required() }.context("load libEGL (EGL 1.5 dynamic instance)")?;
// SAFETY: `gbm` is the non-null `gbm_device*` created just above (checked), and
// `EGL_PLATFORM_GBM_KHR` is exactly the platform enum that pairs with a GBM device as the
// native-display handle, so the `gbm as NativeDisplayType` cast hands EGL a valid native
// display for the requested platform. `&[egl::ATTRIB_NONE]` is a properly terminated, empty
// attribute array borrowed for this synchronous call; EGL only reads it and returns an
// `EGLDisplay`, retaining no pointer into Rust memory.
let display = unsafe {
egl.get_platform_display(
EGL_PLATFORM_GBM_KHR,
@@ -533,6 +572,13 @@ impl EglImporter {
.context("eglCreateContext(OpenGL)")?;
egl.make_current(display, None, None, Some(gl_ctx))
.context("eglMakeCurrent surfaceless (needs EGL_KHR_surfaceless_context)")?;
// SAFETY: the GL context was made current on this thread just above, which `eglGetProcAddress`
// requires to return a usable pointer. The non-null (`?`-checked) pointer it returns for
// "glEGLImageTargetTexture2DOES" is the driver's implementation of that GL-OES entry point,
// whose real ABI is `void(GLenum, GLeglImageOES)` = `(u32, *mut c_void)` `extern "system"`.
// `EglImageTargetFn` is declared with exactly that signature, so the transmute only retypes a
// same-size, same-ABI thin function pointer (no value/representation change). The function is
// present because `EGL_EXT_image_dma_buf_import` was asserted on this display above.
let egl_image_target: EglImageTargetFn = unsafe {
std::mem::transmute(
egl.get_proc_address("glEGLImageTargetTexture2DOES")
@@ -543,6 +589,10 @@ impl EglImporter {
// Create the shared CUDA context up front so import() is pure hot path.
cuda::context().context("create CUDA context")?;
// SAFETY: `egl::NO_CONTEXT` is EGL's defined sentinel (a null handle) for "no context";
// `Context::from_ptr` only stores the handle (it never dereferences it), so wrapping the
// null sentinel is sound and yields exactly the `EGL_NO_CONTEXT` value that
// `eglCreateImage(EGL_LINUX_DMA_BUF_EXT)` requires as its context argument later.
let no_ctx = unsafe { egl::Context::from_ptr(egl::NO_CONTEXT) };
tracing::info!(
"zero-copy EGL importer ready (GBM platform + GL texture interop, dma_buf_import + modifiers)"
@@ -602,8 +652,21 @@ impl EglImporter {
let Some(sym) = self.egl.get_proc_address("eglQueryDmaBufModifiersEXT") else {
return Vec::new();
};
// SAFETY: `sym` is the non-null pointer `eglGetProcAddress("eglQueryDmaBufModifiersEXT")`
// returned (the `let-else` already bailed on `None`) — the driver's implementation of that
// EGL extension entry point. `QueryFn` is declared with that function's exact documented ABI
// (`EGLDisplay, EGLint, EGLint, EGLuint64* , EGLBoolean*, EGLint* -> EGLBoolean`), all
// `extern "system"`, so the transmute only retypes a same-size, same-ABI thin fn pointer.
let query: QueryFn = unsafe { std::mem::transmute(sym) };
let dpy = self.display.as_ptr();
// SAFETY: `dpy` is this importer's live, initialized `EGLDisplay`; `query` is the proc loaded
// just above. The first call passes null out-arrays with `max_modifiers == 0`, which the
// extension defines as "write only the count" — it writes solely through `&mut count` (a live
// local `i32`). For the second call, `mods`/`ext` are freshly allocated `Vec`s of exactly
// `count` elements and `max_modifiers == count`, so the driver writes at most `count`
// `u64`/`u32` entries (in bounds) plus the actual count through `&mut n` (a live local). All
// four Rust addresses outlive these synchronous calls and alias nothing else. `truncate` only
// shrinks, so even a misbehaving `n > count` cannot read out of bounds.
unsafe {
let mut count: i32 = 0;
if query(
@@ -699,6 +762,10 @@ impl EglImporter {
]);
}
attrs.push(egl::ATTRIB_NONE);
// SAFETY: `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` mandates a NULL `EGLClientBuffer`
// (the source is described entirely by the attribute list built above), so wrapping
// `null_mut()` is the required value. `from_ptr` only stores the pointer without
// dereferencing it, so constructing it from null is sound.
let client = unsafe { egl::ClientBuffer::from_ptr(std::ptr::null_mut()) };
let image = self
.egl
@@ -733,11 +800,21 @@ impl EglImporter {
) -> Result<DeviceBuffer> {
cuda::make_current()?;
if self.blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
// SAFETY: `GlBlit::new` requires the GL context current on the calling thread and a
// current CUDA context. Both hold: this runs on the capture thread where
// `EglImporter::new` made the GL context current and never released it, and
// `cuda::make_current()?` ran at the top of this function. `width`/`height` are plain
// `Copy` frame dimensions.
self.blit = Some(unsafe { GlBlit::new(width, height)? });
}
let egl_image_target = self.egl_image_target;
let blit = self.blit.as_mut().unwrap();
// SAFETY: GL + CUDA contexts current on this thread; `image` is a valid EGLImage.
// SAFETY: `GlBlit::run` requires a current GL context and a valid `EGLImage`. The GL context
// is current on this capture thread (made current in `EglImporter::new`, never released) and
// `cuda::make_current()` ran above; `egl_image_target` is the `glEGLImageTargetTexture2DOES`
// pointer loaded in `new`; `image` is the raw handle of the live `EGLImage` that
// `import_inner` created with `eglCreateImage` and destroys only AFTER this call returns, so
// it stays valid for the whole synchronous `run`.
unsafe { blit.run(egl_image_target, image)? };
// Persistent registration (mapped per frame) + a pooled buffer — no per-frame
// cuGraphicsGLRegisterImage / cuMemAllocPitch.
@@ -757,11 +834,21 @@ impl EglImporter {
) -> Result<DeviceBuffer> {
cuda::make_current()?;
if self.nv12_blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
// SAFETY: `Nv12Blit::new` requires the GL context current on the calling thread and a
// current CUDA context. Both hold: this runs on the capture thread where
// `EglImporter::new` made the GL context current and never released it, and
// `cuda::make_current()?` ran at the top of this function. `width`/`height` are plain
// `Copy` frame dimensions.
self.nv12_blit = Some(unsafe { Nv12Blit::new(width, height)? });
}
let egl_image_target = self.egl_image_target;
let blit = self.nv12_blit.as_mut().unwrap();
// SAFETY: GL + CUDA contexts current on this thread; `image` is a valid EGLImage.
// SAFETY: `Nv12Blit::run` requires a current GL context and a valid `EGLImage`. The GL
// context is current on this capture thread (made current in `EglImporter::new`, never
// released) and `cuda::make_current()` ran above; `egl_image_target` is the
// `glEGLImageTargetTexture2DOES` pointer loaded in `new`; `image` is the raw handle of the
// live `EGLImage` that `import_inner` created with `eglCreateImage` and destroys only AFTER
// this call returns, so it stays valid for the whole synchronous `run`.
unsafe { blit.run(egl_image_target, image)? };
let dst = blit.pool.get()?;
cuda::copy_mapped_nv12(&mut blit.y_registered, &mut blit.uv_registered, &dst)?;
@@ -787,9 +874,22 @@ impl EglImporter {
);
cuda::make_current()?;
if self.nv12_blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
// SAFETY: `Nv12Blit::new` requires the GL context current on the calling thread and a
// current CUDA context. Both hold: this self-test path runs on the thread that owns this
// `EglImporter` with its GL context current, and `cuda::make_current()?` ran just above.
// `width`/`height` are plain `Copy` scalars.
self.nv12_blit = Some(unsafe { Nv12Blit::new(width, height)? });
}
let blit = self.nv12_blit.as_mut().unwrap();
// SAFETY: runs on the thread that owns this `EglImporter` with its GL context current.
// `blit.src_tex` is a texture this `Nv12Blit` owns; `glTexStorage2D` allocates immutable
// RGBA8 storage exactly once (guarded by `test_src_storage`) sized `width×height`.
// `glTexSubImage2D` then uploads exactly `width×height` RGBA8 texels, reading `width*height*4`
// bytes from `rgba.as_ptr()`; the caller already asserted `rgba.len() == width*height*4`, rows
// are `width*4` bytes (a multiple of the default 4-byte unpack alignment, so no row-padding
// over-read), and `rgba` is a live borrow that outlives this synchronous upload. `run_passes`
// then needs only the current GL context (no further Rust pointers). All GL names are this
// blit's own, alias no other live object, and nothing is retained past the calls.
unsafe {
// Upload the host RGBA into `src_tex` (an immutable GL_RGBA8 backing must exist first;
// the live path never allocates it — it retargets `src_tex` via EGLImage instead).
@@ -824,9 +924,16 @@ impl EglImporter {
impl Drop for EglImporter {
fn drop(&mut self) {
if !self.gbm.is_null() {
// SAFETY: `self.gbm` is the non-null `gbm_device*` from `gbm_create_device` in `new`
// (checked non-null here), owned exclusively by this `EglImporter` and destroyed exactly
// once (in `Drop`). It is freed BEFORE `render_fd` is closed below — the correct order,
// since the device borrowed that fd for its lifetime.
unsafe { gbm_device_destroy(self.gbm) };
}
if self.render_fd >= 0 {
// SAFETY: `self.render_fd` is the fd `open` returned in `new` (checked `>= 0`), owned
// exclusively by this `EglImporter`; this `close` runs exactly once, after the gbm device
// that borrowed it has been destroyed. No double-close or use-after-close.
unsafe { libc::close(self.render_fd) };
}
}
@@ -16,6 +16,9 @@
//! a stream's life). Falls back cleanly: any init/import error disables the importer and the
//! CPU mmap path takes over.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::cuda::{self, DeviceBuffer};
use anyhow::{anyhow, bail, Context as _, Result};
use ash::vk;
@@ -51,12 +54,27 @@ pub struct VkBridge {
dst: Option<DstBuf>,
}
// Confined to the capture thread; moved there once.
// SAFETY: `VkBridge` owns ash Vulkan handles (instance/device/queue/command pool+buffer/fence), a
// CUDA external-memory mapping, and an fd→buffer cache — none `Sync`, and a single queue +
// command buffer must be externally synchronized. It is created inside `EglImporter::import_linear`
// on the dedicated `punktfunk-pipewire` capture thread and every method (`import_linear`, `Drop`)
// runs on that thread; it is never shared via `&` across threads. `Send` asserts only that
// transferring ownership is sound (so the bridge can live inside the `Send` `EglImporter`); the live
// handles are never touched off-thread, and `Sync` is deliberately NOT implied.
unsafe impl Send for VkBridge {}
impl VkBridge {
/// Bring up Vulkan on the NVIDIA GPU with the external-memory extensions.
pub fn new() -> Result<VkBridge> {
// SAFETY: standard ash bring-up — every call is `unsafe` only because ash cannot statically
// verify Vulkan handle/CreateInfo validity. `ash::Entry::load` dlopens a real system
// libvulkan. Each `*CreateInfo`/`AllocateInfo` is built by ash's builders from locals (`app`,
// `exts`, `prio`, `qci`, and the inline infos) that all live for the duration of the
// synchronous `create_*`/`enumerate_*` call that reads them — in particular the
// `enabled_extension_names(&exts)` and `queue_priorities(&prio)` borrows outlive their calls.
// Every handle passed (`instance`, `phys`, `device`, `qf`, `cmd_pool`) was just created and
// checked via `?`/`ok_or_else` in this same function, so no invalid handle is ever used. This
// constructor shares nothing across threads.
unsafe {
let entry = ash::Entry::load().context("load libvulkan")?;
let app = vk::ApplicationInfo::default().api_version(vk::API_VERSION_1_1);
@@ -294,6 +312,19 @@ impl VkBridge {
height: u32,
pool: &cuda::BufferPool,
) -> Result<DeviceBuffer> {
// SAFETY: `fd` is the live dmabuf fd handed in by the caller (borrowed; `import_src` dup's it
// internally and Vulkan owns the dup). `libc::lseek` only queries the fd's size. The unsafe
// `import_src`/`ensure_dst` are called with a valid fd and a checked size. The bounds are
// proven: `import_src` asserts `size >= span` (so the cached `src_size >= span`),
// `copy_size = src_size.min(span)`, and `ensure_dst(copy_size)` makes `dst` at least
// `copy_size` — so the GPU `cmd_copy_buffer` of `copy_size` bytes reads/writes within both
// buffers, and the later CUDA pitched copy reading `[offset, span)` from `dst.cuda.ptr` (=
// `offset + stride*height = span <= copy_size`) stays inside the freshly-copied region. The
// `*Info`/`region`/`cmds`/`submit` are locals that outlive the synchronous calls reading them.
// `cmd`/`queue`/`fence` are this bridge's own handles, used on this single thread only. The
// host-side `wait_for_fences` fully retires the Vulkan copy BEFORE CUDA reads the shared
// memory, so there is no GPU write/read data race. `dst` is an `&self.dst` shared borrow that
// does not alias the `&self.device` calls.
unsafe {
let span = offset as u64 + stride as u64 * height as u64;
if !self.src_cache.contains_key(&fd) {
@@ -347,6 +378,15 @@ impl VkBridge {
impl Drop for VkBridge {
fn drop(&mut self) {
// SAFETY: runs once when the bridge is dropped on its owning capture thread.
// `device_wait_idle` first drains all in-flight GPU work, so no queued command still
// references these objects. Every handle freed (the `src_cache` buffers+memories, the `dst`
// buffer+memory, `fence`, `cmd_pool`, `device`, `instance`) was created by this `VkBridge`
// and owned exclusively by it, so each `destroy_*`/`free_*` runs exactly once with no
// double-free, in dependency order (child objects before `device`, `device` before
// `instance`). `dst.cuda` is dropped after `free_memory`, which is safe because CUDA holds
// its own dup'd OPAQUE_FD reference to the underlying allocation. No other thread touches
// these handles.
unsafe {
let _ = self.device.device_wait_idle();
for (_, s) in self.src_cache.drain() {
+4
View File
@@ -13,6 +13,10 @@
// Scaffold: trait methods and config paths are defined ahead of their backends.
#![allow(dead_code)]
// Unsafe-proof program: every `unsafe {}` / `unsafe impl` in the crate must carry a `// SAFETY:`
// proof of why it is sound. This crate-root deny is the permanent, catch-all gate (it also covers
// any future module); individual files keep their own `#![deny(...)]` as belt-and-suspenders.
#![deny(clippy::undocumented_unsafe_blocks)]
mod audio;
mod capture;
+121 -22
View File
@@ -22,6 +22,9 @@
//! Trust: the host serves with its persistent identity (`~/.config/punktfunk/cert.pem`, shared
//! with GameStream pairing) and logs the SHA-256 fingerprint clients pin.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{anyhow, Context, Result};
use punktfunk_core::config::{CompositorPref, FecConfig, FecScheme, GamepadPref, Role};
use punktfunk_core::input::{InputEvent, InputKind};
@@ -209,6 +212,10 @@ pub(crate) async fn serve(opts: Punktfunk1Options, np: Arc<NativePairing>) -> Re
// restores the box's autologin gaming session on idle, not per-disconnect — see
// `vdisplay::restore_managed_session`). Held for serve()'s lifetime; dropping it stops it.
let _restore_worker = crate::vdisplay::start_restore_worker();
// Host-lifetime cover-art warmer: fetches + caches GOG/Xbox cover art (no-auth api.gog.com /
// displaycatalog) off the hot path so `all_games()` (the library list + launch resolve) never
// blocks on the network. A no-op on a host whose stores all carry their own art.
let _art_warmer = crate::library::start_art_warmer();
// Pairing state (arming PIN + trust store) is shared with the management API. If it was armed
// at startup (the CLI flags), surface the PIN the headless operator reads from the log; the
// web console arms it on demand instead (a fresh, time-limited PIN).
@@ -1961,6 +1968,11 @@ pub(crate) fn boost_thread_priority(critical: bool) {
// capture/encode (critical) and send (non-critical).
crate::session_tuning::on_hot_thread();
#[cfg(target_os = "windows")]
// SAFETY: `GetCurrentThread()` returns the constant pseudo-handle for the calling thread — always
// valid, thread-local in meaning, and never closed (no leak/double-close). `SetThreadPriority`
// takes that handle plus a `THREAD_PRIORITY_*` value the windows crate defines (HIGHEST or
// ABOVE_NORMAL here); it only reprioritizes this OS thread, borrows no Rust memory, and its
// `Result` is matched (a failure is logged, never UB). No pointers, lifetimes, or aliasing.
unsafe {
use windows::Win32::System::Threading::{
GetCurrentThread, SetThreadPriority, THREAD_PRIORITY_ABOVE_NORMAL,
@@ -1988,6 +2000,10 @@ pub(crate) fn boost_thread_priority(critical: bool) {
// realtime CPU class can preempt the compositor AND the game's own render thread, adding the
// very frame-time we refuse to add (opt-in only — see PUNKTFUNK_SCHED_RR).
let nice = if critical { -10 } else { -5 };
// SAFETY: `setpriority` takes three by-value integers and no pointers, so there is nothing to
// alias or outlive. `PRIO_PROCESS` with `who == 0` targets the calling task on Linux and
// `nice` is in range; the call only adjusts this thread's scheduling nice value and returns an
// `int` we inspect. No memory is touched.
let rc = unsafe { libc::setpriority(libc::PRIO_PROCESS, 0, nice) };
if rc == 0 {
tracing::debug!(critical, nice, "thread nice raised");
@@ -2352,6 +2368,12 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// compositing), NOT an encoder problem. Logged every 2 s when `PUNKTFUNK_PERF`.
let (mut diag_new, mut diag_repeat) = (0u64, 0u64);
let mut diag_at = std::time::Instant::now();
// Per-stage latency breakdown (PUNKTFUNK_PERF): per-call µs for the GPU-bound stages so we see
// exactly where the capture→encoded latency goes — cap=try_latest (ring read + colour convert),
// submit=encode_picture launch, wait=lock_bitstream (the scheduling wait + ASIC encode, the one
// that dominates under a GPU-saturating game).
let (mut st_cap, mut st_submit, mut st_wait): (Vec<u32>, Vec<u32>, Vec<u32>) =
(Vec::new(), Vec::new(), Vec::new());
while !stop.load(Ordering::SeqCst) && std::time::Instant::now() < deadline {
// Mid-stream session switch (the box flipped Gaming↔Desktop): rebuild the WHOLE backend in
// place — a different compositor at the SAME client mode — keeping the Session + send thread
@@ -2458,7 +2480,12 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
tracing::debug!("forcing keyframe (client decode recovery)");
enc.request_keyframe();
}
match capturer.try_latest() {
let t_cap = std::time::Instant::now();
let cap_result = capturer.try_latest();
if perf {
st_cap.push(t_cap.elapsed().as_micros() as u32);
}
match cap_result {
Ok(Some(f)) => {
frame = f;
diag_new += 1;
@@ -2497,6 +2524,20 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
"capture diag: NEW frames from the source vs REPEATS (low new_fps at high send rate ⇒ \
the source isn't producing frames, not an encode stall)"
);
let wait_max = st_wait.iter().copied().max().unwrap_or(0);
tracing::info!(
cap_us_p50 = percentile(&mut st_cap, 0.50),
cap_us_p99 = percentile(&mut st_cap, 0.99),
submit_us_p50 = percentile(&mut st_submit, 0.50),
submit_us_p99 = percentile(&mut st_submit, 0.99),
wait_us_p50 = percentile(&mut st_wait, 0.50),
wait_us_p99 = percentile(&mut st_wait, 0.99),
wait_us_max = wait_max,
"stage perf (µs/call): cap=try_latest(ring+convert) submit=encode_picture wait=lock_bitstream(sched+ASIC)"
);
st_cap.clear();
st_submit.clear();
st_wait.clear();
diag_new = 0;
diag_repeat = 0;
diag_at = std::time::Instant::now();
@@ -2515,7 +2556,11 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// capturer hands a rotating ring of output textures, so it returns >1; other capturers default 1.
let depth = capturer.pipeline_depth().max(1);
let capture_ns = now_ns();
let t_submit = std::time::Instant::now();
enc.submit(&frame).context("encoder submit")?;
if perf {
st_submit.push(t_submit.elapsed().as_micros() as u32);
}
// This frame's pacing deadline (the next frame's due time); the send thread spreads a big frame
// up to here. Each in-flight frame carries its own (capture_ns, deadline) for when it's polled.
next += interval;
@@ -2526,7 +2571,12 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// the oldest submitted frame's AU — matching `inflight.pop_front()`.
let mut send_gone = false;
while inflight.len() >= depth {
let au = match enc.poll().context("encoder poll")? {
let t_wait = std::time::Instant::now();
let polled = enc.poll().context("encoder poll")?;
if perf {
st_wait.push(t_wait.elapsed().as_micros() as u32);
}
let au = match polled {
Some(au) => au,
None => break, // no AU ready for a submitted frame (shouldn't happen — poll blocks)
};
@@ -2669,6 +2719,11 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
// The secure-desktop HDR drop (for the DDA leg) keys off the monitor's real state in the mux loop.
#[cfg(target_os = "windows")]
if bit_depth >= 10 {
// SAFETY: `set_advanced_color` is marked `unsafe` only because it drives the Win32 CCD API
// internally; it takes `target_id` by value (Copy `u32` — this session's live SudoVDA
// monitor's CCD target id) and sizes + owns every buffer it hands the OS on its own stack.
// We pass no pointers, so nothing must outlive the call and there is no aliasing; an
// unknown/absent target id simply returns false.
unsafe {
if crate::win_display::set_advanced_color(target.target_id, true) {
// Let the colorspace change settle before WGC creates its capture item / detects HDR.
@@ -2904,8 +2959,12 @@ fn virtual_stream_relay(ctx: SessionContext) -> Result<()> {
// desktop (the drop just churned + still went black). Instead, if the monitor is in HDR,
// open DDA in HDR (FP16 DuplicateOutput1 → BT.2020 PQ Main10); the normal-desktop DDA
// overlay/flip issues that drove us to WGC don't apply to the composed Winlogon UI.
let hdr =
unsafe { crate::win_display::advanced_color_enabled(target.target_id) };
// SAFETY: `advanced_color_enabled` is `unsafe` only because it queries the Win32 CCD
// API; it takes `target_id` by value (the live SudoVDA monitor's CCD target id) and
// allocates + owns every buffer it passes the OS internally. No caller pointer is
// involved, so nothing must outlive the call and there is no aliasing; a missing
// target id just yields false.
let hdr = unsafe { crate::win_display::advanced_color_enabled(target.target_id) };
dda = None; // reopen to capture the secure desktop
match open_dda(&target, cur_mode.width, cur_mode.height, effective_hz, hdr) {
Ok(mut p) => {
@@ -3330,12 +3389,27 @@ mod tests {
unsafe fn pull_verified(conn: *mut punktfunk_core::abi::PunktfunkConnection, count: u32) {
use punktfunk_core::error::PunktfunkStatus;
let mut got = 0u32;
// SAFETY: the inferred type is the `#[repr(C)]` POD `PunktfunkFrame` (a raw `*const u8`, a
// `usize`, and integer fields); all-zero is a valid bit pattern for every field (a null
// `data`, `len == 0`). It is only ever read after `next_au` below fully overwrites it on `Ok`,
// so the zeroed value is never observed.
let mut frame = unsafe { std::mem::zeroed() };
while got < count {
// SAFETY: `conn` is the live, non-null `*mut PunktfunkConnection` from `punktfunk_connect`
// (the caller asserts non-null and does not close it until after this returns), meeting the
// ABI's "valid handle". `&mut frame` is an exclusive, writable borrow of the local
// `PunktfunkFrame` that outlives this synchronous call. This single test thread is the only
// video puller, satisfying the one-video-thread rule.
match unsafe {
punktfunk_core::abi::punktfunk_connection_next_au(conn, &mut frame, 2000)
} {
PunktfunkStatus::Ok => {
// SAFETY: on `Ok`, `next_au` set `frame.data`/`frame.len` to the reassembled AU
// buffer the connection owns; per the ABI contract that borrow stays valid until
// the NEXT `next_au` call on this handle. We read the whole slice here (the assert
// + length-checked indexing) before the loop's next `next_au`, and `conn` outlives
// it — so the pointer is live, exactly `len` bytes, read-only, single-threaded (no
// aliasing/use-after-free).
let data = unsafe { std::slice::from_raw_parts(frame.data, frame.len) };
let idx = u32::from_le_bytes(data[0..4].try_into().unwrap());
assert_eq!(
@@ -3383,6 +3457,11 @@ mod tests {
// Session 1: TOFU (no pin) — observe the host fingerprint.
let addr = std::ffi::CString::new("127.0.0.1").unwrap();
let mut observed = [0u8; 32];
// SAFETY: `addr` is a live `CString` ("127.0.0.1") whose `as_ptr()` is the NUL-terminated
// UTF-8 host string the contract requires; `pin_sha256`/cert/key are NULL (all permitted), and
// `observed.as_mut_ptr()` is the local `[u8; 32]` — exactly the 32 writable bytes the contract
// demands, not aliased during the call. Every pointer references a live local that outlives the
// blocking connect.
let conn = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3401,26 +3480,28 @@ mod tests {
assert_ne!(observed, [0u8; 32], "fingerprint not reported");
let (mut w, mut h, mut hz) = (0u32, 0u32, 0u32);
assert_eq!(
unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) },
PunktfunkStatus::Ok
);
// SAFETY: `conn` is the live, non-null connection handle just asserted above; `&mut w/h/hz` are
// exclusive, writable borrows of local `u32`s that outlive this synchronous call — the three
// writable out-params the contract names.
let st = unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) };
assert_eq!(st, PunktfunkStatus::Ok);
assert_eq!((w, h, hz), (1280, 720, 60));
// Mid-stream renegotiation: request a new mode, the host acks on the control
// stream, and punktfunk_connection_mode reflects the switch.
assert_eq!(
unsafe {
punktfunk_core::abi::punktfunk_connection_request_mode(conn, 1920, 1080, 144)
},
PunktfunkStatus::Ok
);
// SAFETY: `conn` is the live, non-null connection handle (the only pointer arg); the remaining
// arguments are by-value integers. The handle outlives this non-blocking enqueue.
let st = unsafe {
punktfunk_core::abi::punktfunk_connection_request_mode(conn, 1920, 1080, 144)
};
assert_eq!(st, PunktfunkStatus::Ok);
let deadline = std::time::Instant::now() + std::time::Duration::from_secs(5);
loop {
assert_eq!(
unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) },
PunktfunkStatus::Ok
);
// SAFETY: same as the earlier `punktfunk_connection_mode` call — `conn` is the live handle
// and `&mut w/h/hz` are exclusive writable borrows of locals that outlive this synchronous
// call.
let st = unsafe { punktfunk_connection_mode(conn, &mut w, &mut h, &mut hz) };
assert_eq!(st, PunktfunkStatus::Ok);
if (w, h, hz) == (1920, 1080, 144) {
break;
}
@@ -3431,6 +3512,8 @@ mod tests {
std::thread::sleep(std::time::Duration::from_millis(20));
}
// SAFETY: `pull_verified` requires a live connection handle it alone pulls video from; `conn` is
// the open, non-null handle from `punktfunk_connect` and this is the only thread touching it.
unsafe { pull_verified(conn, 25) };
let ev = punktfunk_core::input::InputEvent {
@@ -3441,13 +3524,19 @@ mod tests {
y: 2,
flags: 0,
};
assert_eq!(
unsafe { punktfunk_connection_send_input(conn, &ev) },
PunktfunkStatus::Ok
);
// SAFETY: `conn` is the live handle; `&ev` borrows the local `InputEvent`, valid and immutable
// for this synchronous enqueue — the contract's "valid InputEvent" pointer.
let st = unsafe { punktfunk_connection_send_input(conn, &ev) };
assert_eq!(st, PunktfunkStatus::Ok);
// SAFETY: `conn` was returned by `punktfunk_connect` and is never used after this call (session
// 2 below uses a fresh `conn2`); `close` takes ownership and frees the handle exactly once.
unsafe { punktfunk_connection_close(conn) };
// Session 2 (same host process — the listener survived): pin the fingerprint.
// SAFETY: as for session 1 — `addr` is the live NUL-terminated host string; here
// `observed.as_ptr()` is the 32-byte pin (the fingerprint captured above, a valid `[u8; 32]`),
// `observed_sha256_out` is NULL and cert/key are NULL. All pointers reference live locals for
// the duration of the blocking connect.
let conn2 = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3463,11 +3552,17 @@ mod tests {
)
};
assert!(!conn2.is_null(), "pinned reconnect failed");
// SAFETY: `conn2` is the live, non-null pinned handle, pulled only from this thread —
// `pull_verified`'s requirement.
unsafe { pull_verified(conn2, 25) };
// SAFETY: `conn2` came from `punktfunk_connect` and is not used after this; `close` frees it once.
unsafe { punktfunk_connection_close(conn2) };
// Session 3: a wrong pin must be rejected by the handshake.
let bad = [0xAAu8; 32];
// SAFETY: same shape as the prior connects — `addr` is the live host string, `bad.as_ptr()` is
// the 32-byte `[0xAA; 32]` pin, and out/cert/key are NULL; all reference live locals across the
// blocking call. (The handshake is expected to fail and return NULL here, which is sound.)
let conn3 = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3487,6 +3582,8 @@ mod tests {
// The host saw the rejected handshake attempt as session 3? No — a TLS-failed
// handshake never yields a connection, so accept() is still waiting. Connect once
// more (TOFU) to complete the host's third session and let it exit.
// SAFETY: same as session 1's connect — `addr` is the live host string, pin/out/cert/key all
// NULL; the pointers reference live locals for the duration of the blocking connect.
let conn4 = unsafe {
punktfunk_connect(
addr.as_ptr(),
@@ -3502,7 +3599,9 @@ mod tests {
)
};
assert!(!conn4.is_null());
// SAFETY: `conn4` is the live, non-null handle, pulled only from this thread.
unsafe { pull_verified(conn4, 25) };
// SAFETY: `conn4` came from `punktfunk_connect` and is unused after this; `close` frees it once.
unsafe { punktfunk_connection_close(conn4) };
host.join().unwrap().unwrap();
@@ -11,6 +11,9 @@
//! state) auto-revert at thread exit (= session end); the process-wide bits revert at process exit.
//! See `docs/host-latency-plan.md` Tier 3A.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
#[cfg(target_os = "windows")]
mod imp {
#![allow(non_snake_case)]
@@ -49,6 +52,10 @@ mod imp {
/// Process-wide tuning, applied exactly once. Reverts at process exit. Best-effort: each call is
/// independent and a failure is ignored (e.g. a non-elevated host may not get HIGH class).
fn tune_process_once() {
// SAFETY: each call is a C-ABI FFI into winmm/kernel32/dwmapi declared with a matching
// `extern "system"` signature; every argument is a plain integer (no pointers/buffers escape),
// and `GetCurrentProcess()` returns the current-process pseudo-handle (a constant, always valid,
// never closed). The body runs inside `get_or_init`, so it executes exactly once per process.
PROCESS_TUNED.get_or_init(|| unsafe {
// 1 ms timer granularity (default ~15.6 ms) — the floor for precise frame pacing and the
// encode|send split's sub-ms sleeps.
@@ -70,6 +77,11 @@ mod imp {
/// thread exits, so a session that ends tears them down without explicit bookkeeping.
pub fn on_hot_thread() {
tune_process_once();
// SAFETY: C-ABI FFI declared with matching `extern "system"` signatures. SetThreadExecutionState
// takes only flag bits. `task` is a local NUL-terminated UTF-16 buffer ("Games\0") alive for the
// whole block, so `task.as_ptr()` is a valid LPCWSTR for the call, and `&mut idx` is a live local
// u32 the call writes the task index into. The returned MMCSS handle is intentionally leaked (the
// OS reverts the characteristics at thread exit), so there is nothing to free or double-free.
unsafe {
SetThreadExecutionState(ES_CONTINUOUS | ES_DISPLAY_REQUIRED | ES_SYSTEM_REQUIRED);
let task: Vec<u16> = "Games\0".encode_utf16().collect();
+10 -3
View File
@@ -13,6 +13,9 @@
//! owned keepalive whose `Drop` releases the output (RAII — no explicit `destroy`). Capture
//! consumes the node via [`crate::capture::capture_virtual_output`].
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::Result;
pub use punktfunk_core::Mode;
#[cfg(target_os = "linux")]
@@ -225,6 +228,8 @@ pub fn compositor_for_kind(kind: ActiveKind) -> Option<Compositor> {
#[cfg(target_os = "linux")]
fn default_runtime_dir() -> String {
std::env::var("XDG_RUNTIME_DIR").unwrap_or_else(|_| {
// SAFETY: `getuid()` is a parameterless POSIX call that always succeeds and touches no
// memory — it just returns the calling process's real uid. Nothing is aliased or freed.
let uid = unsafe { libc::getuid() };
format!("/run/user/{uid}")
})
@@ -245,6 +250,8 @@ fn default_bus(runtime: &str) -> String {
#[cfg(target_os = "linux")]
pub fn detect_active_session() -> ActiveSession {
use std::os::unix::fs::MetadataExt;
// SAFETY: `getuid()` is a parameterless POSIX call that always succeeds and touches no memory —
// it just returns the calling process's real uid. Nothing is aliased or freed.
let uid = unsafe { libc::getuid() };
let xdg_runtime_dir = default_runtime_dir();
let dbus = default_bus(&xdg_runtime_dir);
@@ -615,12 +622,12 @@ mod gamescope;
#[cfg(target_os = "linux")]
#[path = "vdisplay/linux/kwin.rs"]
mod kwin;
#[cfg(target_os = "linux")]
#[path = "vdisplay/linux/mutter.rs"]
mod mutter;
#[cfg(target_os = "windows")]
#[path = "vdisplay/windows/manager.rs"]
pub(crate) mod manager;
#[cfg(target_os = "linux")]
#[path = "vdisplay/linux/mutter.rs"]
mod mutter;
#[cfg(target_os = "windows")]
#[path = "vdisplay/windows/pf_vdisplay.rs"]
pub(crate) mod pf_vdisplay;
@@ -15,6 +15,8 @@
//! the KWin session's environment.
#![allow(clippy::all, dead_code, non_camel_case_types, non_snake_case, unused)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::{Mode, VirtualDisplay, VirtualOutput};
use anyhow::{anyhow, bail, Context, Result};
@@ -495,6 +497,11 @@ fn run(
events: libc::POLLIN,
revents: 0,
};
// SAFETY: `&mut pfd` points at a single live, fully-initialized `libc::pollfd` on the stack, and
// the count `1` matches that one-element array, so `poll` reads `fd`/`events` and writes `revents`
// strictly within `pfd`. `pfd.fd` is the Wayland connection's fd, valid because `conn` (and the
// `prepare_read` guard) are alive across the call. `poll` blocks up to 200 ms and writes only
// `revents`; `pfd` outlives the synchronous call and aliases nothing (a fresh local).
let r = unsafe { libc::poll(&mut pfd, 1, 200) };
if r > 0 && (pfd.revents & libc::POLLIN) != 0 {
let _ = guard.read();
@@ -13,6 +13,9 @@
//! its `Drop` releases the refcount (a *stale* lease — its monitor was preempted + recreated under it —
//! is a no-op, so it can never tear down the live monitor).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::os::windows::io::{AsRawHandle, OwnedHandle};
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
use std::sync::{Arc, Mutex, Once, OnceLock};
@@ -60,8 +63,12 @@ pub(crate) trait VdisplayDriver: Send + Sync {
///
/// # Safety
/// `dev` must be the live control handle from [`open`](Self::open).
unsafe fn add_monitor(&self, dev: HANDLE, mode: Mode, render_luid: Option<LUID>)
-> Result<AddedMonitor>;
unsafe fn add_monitor(
&self,
dev: HANDLE,
mode: Mode,
render_luid: Option<LUID>,
) -> Result<AddedMonitor>;
/// REMOVE the monitor identified by `key`.
///
/// # Safety
@@ -147,7 +154,8 @@ pub(crate) fn init(driver: Box<dyn VdisplayDriver>) -> &'static VirtualDisplayMa
/// The process-wide manager. Panics if reached before a backend called [`init`] — by construction a
/// session is only ever created after `vdisplay::open` constructed the backend (which calls `init`).
pub(crate) fn vdm() -> &'static VirtualDisplayManager {
VDM.get().expect("VirtualDisplayManager used before a backend initialised it")
VDM.get()
.expect("VirtualDisplayManager used before a backend initialised it")
}
impl VirtualDisplayManager {
@@ -161,6 +169,10 @@ impl VirtualDisplayManager {
if let Some(d) = self.device.get() {
return Ok(HANDLE(d.as_raw_handle()));
}
// SAFETY: `VdisplayDriver::open` is `unsafe` only because it issues SetupAPI + `DeviceIoControl`
// FFI in the caller's apartment; `ensure_device` runs that on the acquiring thread under the
// `state` lock (callers hold it), so there is no concurrent open. `open` has no handle
// precondition to uphold, and the `OwnedHandle` it returns is the sole owner of the device.
let (handle, watchdog_s) = unsafe { self.driver.open()? };
self.watchdog_s.store(watchdog_s, Ordering::Relaxed);
let raw = HANDLE(handle.as_raw_handle());
@@ -171,9 +183,7 @@ impl VirtualDisplayManager {
/// The live control handle for the pinger/linger threads (lock-free: the device never changes once
/// opened). `None` only before the first acquire opened it.
fn device_handle(&self) -> Option<HANDLE> {
self.device
.get()
.map(|d| HANDLE(d.as_raw_handle()))
self.device.get().map(|d| HANDLE(d.as_raw_handle()))
}
/// Open + initialise the backend (validates the driver is present). Mirrors the old
@@ -196,8 +206,7 @@ impl VirtualDisplayManager {
// client is gone). A REUSED IddCx swap-chain is DEAD, so joining it hands a black screen —
// PREEMPT: tear the old monitor down (its key/topology are restored) and create a fresh one. The
// old session's lease is gen-stamped, so its later drop is a no-op and can't tear down the new one.
if idd_push_mode()
&& matches!(*state, MgrState::Active { .. } | MgrState::Lingering { .. })
if idd_push_mode() && matches!(*state, MgrState::Active { .. } | MgrState::Lingering { .. })
{
if let MgrState::Active { mon, .. } | MgrState::Lingering { mon, .. } =
std::mem::replace(&mut *state, MgrState::Idle)
@@ -206,6 +215,10 @@ impl VirtualDisplayManager {
old_target = mon.target_id,
"IDD-push reconnect — preempting the prior session, recreating a fresh monitor"
);
// SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is the value
// `ensure_device()` returned above (the device is cached in the `OnceLock` and never
// closed for the manager's lifetime). `mon` was moved out of the prior `Active`/
// `Lingering` state by `mem::replace`, so it is exclusively owned here — no aliasing.
unsafe { self.teardown(dev, mon) };
// Let the OS finish the ASYNC monitor departure before the next ADD; a back-to-back
// REMOVE→ADD races the teardown and the ADD IOCTL is rejected under reconnect churn.
@@ -219,21 +232,37 @@ impl VirtualDisplayManager {
if let MgrState::Active { mon, refs } = &mut *state {
*refs += 1;
if mon.mode != mode {
// SAFETY: `reconfigure` only manipulates the live display topology via the CCD/GDI
// helpers and needs an exclusive `&mut Monitor`. `mon` is the `&mut` into the current
// `Active` state, held under the `state` lock, so nothing else reconfigures it concurrently.
unsafe { self.reconfigure(mon, mode) };
}
tracing::info!(refs = *refs, backend = self.driver.name(), "virtual monitor reused (concurrent / reconfigure session)");
tracing::info!(
refs = *refs,
backend = self.driver.name(),
"virtual monitor reused (concurrent / reconfigure session)"
);
return Ok(self.output_for(mon));
}
// Idle or Lingering: repurpose a lingering monitor / create a fresh one → Active{refs:1}.
let mon = match std::mem::replace(&mut *state, MgrState::Idle) {
MgrState::Lingering { mut mon, .. } => {
tracing::info!(backend = self.driver.name(), "virtual monitor reused (reconnect within the linger window)");
tracing::info!(
backend = self.driver.name(),
"virtual monitor reused (reconnect within the linger window)"
);
if mon.mode != mode {
// SAFETY: `reconfigure` needs an exclusive `&mut Monitor` and only touches the live
// display topology. `mon` is the local monitor just moved out of the `Lingering`
// state (sole owner), and we hold the `state` lock — no concurrent reconfigure.
unsafe { self.reconfigure(&mut mon, mode) };
}
mon
}
// SAFETY: `create_monitor` requires `dev` to be the live control handle; `dev` is the
// handle `ensure_device()` returned above (cached in the `OnceLock`, never closed for the
// manager's lifetime), and we hold the `state` lock.
MgrState::Idle => unsafe { self.create_monitor(dev, mode)? },
MgrState::Active { .. } => unreachable!("handled above"),
};
@@ -262,17 +291,27 @@ impl VirtualDisplayManager {
/// # Safety
/// `dev` must be the live control handle.
unsafe fn create_monitor(&'static self, dev: HANDLE, mode: Mode) -> Result<Monitor> {
// SAFETY: `create_monitor`'s own `# Safety` contract guarantees `dev` is the live control
// handle; we forward it unchanged to `add_monitor`, whose precondition is exactly that.
// `resolve_render_pin()` returns an `Option<LUID>` by value (plain `Copy`), so no borrowed
// memory crosses the call.
let added = unsafe { self.driver.add_monitor(dev, mode, resolve_render_pin())? };
// Mandatory keepalive: ping inside the watchdog window or the driver tears all displays down.
// The pinger reaches the singleton for both the device + the driver — no raw-handle smuggle.
let stop = Arc::new(AtomicBool::new(false));
let interval = Duration::from_millis(self.watchdog_s.load(Ordering::Relaxed) as u64 * 1000 / 3);
let interval =
Duration::from_millis(self.watchdog_s.load(Ordering::Relaxed) as u64 * 1000 / 3);
let stop_t = stop.clone();
let pinger = thread::spawn(move || {
let mut warned = false;
while !stop_t.load(Ordering::Relaxed) {
if let Some(h) = vdm().device_handle() {
// SAFETY: `ping` requires `dev` to be the live control handle. `h` is from
// `device_handle()` (the `Some` branch) — the `OnceLock<Arc<OwnedHandle>>` that,
// once set, is never cleared or closed for the process lifetime, so the handle is
// live for this call. The pinger thread only spins while the `&'static` manager
// singleton (and thus the device) lives.
match unsafe { vdm().driver.ping(h) } {
Ok(()) => warned = false,
Err(e) => {
@@ -292,6 +331,9 @@ impl VirtualDisplayManager {
let mut gdi_name = None;
for _ in 0..15 {
thread::sleep(Duration::from_millis(200));
// SAFETY: `resolve_gdi_name` is `unsafe` for its CCD (QueryDisplayConfig) FFI; it takes a
// plain `Copy` `u32` target id by value and returns an owned `String`, so no caller memory
// is borrowed across the call.
if let Some(n) = unsafe { resolve_gdi_name(added.target_id) } {
gdi_name = Some(n);
break;
@@ -308,6 +350,9 @@ impl VirtualDisplayManager {
// display(s) first via the atomic CCD path promotes the IDD to a composited primary with no
// MODE_CHANGE storm. Opt out with PUNKTFUNK_NO_ISOLATE=1.
if std::env::var("PUNKTFUNK_NO_ISOLATE").is_err() {
// SAFETY: `isolate_displays_ccd` is `unsafe` for its CCD topology FFI; it takes a
// `Copy` `u32` by value and returns an owned `SavedConfig` snapshot (no borrowed
// memory crosses). It runs under the `state` lock, the sole mutator of the topology.
ccd_saved = unsafe { isolate_displays_ccd(added.target_id) };
} else {
tracing::info!("display isolation skipped (PUNKTFUNK_NO_ISOLATE) — IDD stays extended");
@@ -339,10 +384,15 @@ impl VirtualDisplayManager {
/// Touches the live display topology via the CCD/GDI helpers.
unsafe fn reconfigure(&self, mon: &mut Monitor, mode: Mode) {
tracing::info!(
old = format!("{}x{}@{}", mon.mode.width, mon.mode.height, mon.mode.refresh_hz),
old = format!(
"{}x{}@{}",
mon.mode.width, mon.mode.height, mon.mode.refresh_hz
),
new = format!("{}x{}@{}", mode.width, mode.height, mode.refresh_hz),
"virtual-display: reconfiguring reused monitor to the new client mode"
);
// SAFETY: `resolve_gdi_name` is `unsafe` for its CCD FFI; it takes the `Copy` `u32`
// `mon.target_id` by value and returns an owned `String`, so nothing borrowed crosses the call.
if let Some(n) = unsafe { resolve_gdi_name(mon.target_id) } {
mon.gdi_name = Some(n);
}
@@ -365,10 +415,16 @@ impl VirtualDisplayManager {
if let Some(saved) = &mon.ccd_saved {
restore_displays_ccd(saved);
}
// SAFETY: `teardown`'s own `# Safety` contract guarantees `dev` is the live control handle, and
// `remove_monitor` requires exactly that. `&mon.key` borrows the `MonitorKey` inside the
// still-owned `mon`, alive for this synchronous IOCTL, so the pointer the driver reads stays valid.
if let Err(e) = unsafe { self.driver.remove_monitor(dev, &mon.key) } {
tracing::warn!("virtual-display REMOVE failed: {e:#}");
} else {
tracing::info!(backend = self.driver.name(), "virtual-display monitor removed");
tracing::info!(
backend = self.driver.name(),
"virtual-display monitor removed"
);
}
}
@@ -385,10 +441,16 @@ impl VirtualDisplayManager {
return;
}
*state = match std::mem::replace(&mut *state, MgrState::Idle) {
MgrState::Active { mon, refs } if refs > 1 => MgrState::Active { mon, refs: refs - 1 },
MgrState::Active { mon, refs } if refs > 1 => MgrState::Active {
mon,
refs: refs - 1,
},
MgrState::Active { mon, .. } => {
let ms = linger_ms();
tracing::info!(linger_ms = ms, "virtual-display: last session left — lingering before teardown");
tracing::info!(
linger_ms = ms,
"virtual-display: last session left — lingering before teardown"
);
MgrState::Lingering {
mon,
until: Instant::now() + Duration::from_millis(ms),
@@ -470,6 +532,10 @@ impl VirtualDisplayManager {
}
};
if let Some(mon) = taken {
// SAFETY: `teardown` requires `dev` to be the live control handle; `dev` is from
// `self.device_handle()` (the `Some` checked just above), i.e. the cached
// `OwnedHandle` live for the process lifetime. `mon` was moved out of the
// `Lingering` state under the `state` lock, so it is exclusively owned here.
unsafe { self.teardown(dev, mon) };
}
})
@@ -503,9 +569,13 @@ fn idd_push_mode() -> bool {
/// ACCESS_LOST storm SudoVDA hit when pinned).
fn resolve_render_pin() -> Option<LUID> {
if crate::config::config().render_adapter.is_some() {
// SAFETY: `resolve_render_adapter_luid` is `unsafe` only for its DXGI factory FFI; it takes no
// arguments and returns an `Option<LUID>` by value, so there is no input/borrow to keep valid.
unsafe { crate::win_adapter::resolve_render_adapter_luid() }
} else if crate::config::config().idd_push {
tracing::info!("IDD push: pinning the discrete render GPU (SET_RENDER_ADAPTER)");
// SAFETY: as above — `resolve_render_adapter_luid` takes no arguments and returns an
// `Option<LUID>` by value; the `unsafe` covers only its DXGI factory enumeration FFI.
unsafe { crate::win_adapter::resolve_render_adapter_luid() }
} else {
tracing::info!(
@@ -14,6 +14,9 @@
//! target id, so the CCD/DXGI code works unchanged). Only the driver-specific bits (GUID, IOCTL codes,
//! request/reply structs, the version handshake) differ, per `pf_driver_proto`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::ffi::c_void;
use std::mem::size_of;
use std::os::windows::io::{FromRawHandle, OwnedHandle};
@@ -144,15 +147,26 @@ impl VdisplayDriver for PfVdisplayDriver {
}
unsafe fn open(&self) -> Result<(OwnedHandle, u32)> {
// SAFETY: `open_device` is `unsafe` only because it issues SetupAPI enumeration + `CreateFileW`
// FFI; it takes no arguments and returns an owned raw `HANDLE` (or `Err`). Called here on the
// backend-init thread, with no precondition beyond a valid thread context.
let device = unsafe { open_device()? };
// HARD protocol-version check (unlike SudoVDA's best-effort log): a mismatched host/driver pair
// fails loudly here rather than corrupting the IOCTL stream.
let mut info_buf = [0u8; size_of::<control::InfoReply>()];
// SAFETY: `ioctl` requires `h` to be a valid device handle and its slices to be valid for the
// call. `device` is the live handle just returned by `open_device`. `IOCTL_GET_INFO` takes no
// input (`&[]`) and writes into `info_buf`, a stack `[u8; size_of::<InfoReply>()]` whose length
// is passed as the output size — so `DeviceIoControl` can't write OOB — and which outlives this
// synchronous call.
unsafe { ioctl(device, control::IOCTL_GET_INFO, &[], &mut info_buf) }
.context("pf-vdisplay IOCTL_GET_INFO (version handshake)")?;
let info: control::InfoReply =
bytemuck::pod_read_unaligned(&info_buf[..size_of::<control::InfoReply>()]);
if info.protocol_version != pf_driver_proto::PROTOCOL_VERSION {
// SAFETY: `device` is the valid raw handle from `open_device` and has NOT yet been wrapped
// in an `OwnedHandle` (that happens only on the success path below), so this error path is
// the sole owner closing it exactly once — no double-close.
unsafe {
let _ = CloseHandle(device);
}
@@ -171,12 +185,19 @@ impl VdisplayDriver for PfVdisplayDriver {
);
// Reap monitors orphaned by a crashed previous host — a FIRST-CLASS op (driver returns SUCCESS).
let mut none: [u8; 0] = [];
// SAFETY: `device` is the live handle from `open_device` (still owned here, before it is wrapped
// below). `IOCTL_CLEAR_ALL` has no input and no output: `&[]` and the empty `none` slice pass
// zero-length buffers, so nothing is read or written through them.
if unsafe { ioctl(device, control::IOCTL_CLEAR_ALL, &[], &mut none) }.is_ok() {
tracing::info!("cleared orphaned virtual monitors on host startup");
} else {
tracing::warn!("pf-vdisplay IOCTL_CLEAR_ALL failed on startup (continuing)");
}
Ok((
// SAFETY: `device` is the valid handle from `open_device`, still owned here and NOT closed
// on this success path (the error paths above close it and return). `from_raw_handle`'s
// contract — caller owns a valid handle — holds, so ownership transfers cleanly into the
// `OwnedHandle`: exactly one owner, which `CloseHandle`s it on drop.
unsafe { OwnedHandle::from_raw_handle(device.0 as _) },
watchdog_s,
))
@@ -199,6 +220,9 @@ impl VdisplayDriver for PfVdisplayDriver {
// SET_RENDER_ADAPTER (opt-in; pf-vdisplay IMPLEMENTS it). Non-fatal on failure: the driver reports
// its real render LUID in the shared header, so the host binds correctly even if this is ignored.
if let Some(luid) = render_luid {
// SAFETY: `add_monitor`'s `# Safety` contract guarantees `dev` is the live control handle,
// which is `set_render_adapter`'s precondition; we forward it unchanged. `luid` is a plain
// `Copy` `LUID` passed by value — no borrow crosses the call.
match unsafe { set_render_adapter(dev, luid) } {
Ok(()) => tracing::info!(
luid = format!("{:08x}:{:08x}", luid.HighPart, luid.LowPart),
@@ -210,14 +234,17 @@ impl VdisplayDriver for PfVdisplayDriver {
}
}
let mut out = [0u8; size_of::<control::AddReply>()];
unsafe { ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out) }.with_context(
|| {
// SAFETY: per `add_monitor`'s contract `dev` is the live control handle. `bytemuck::bytes_of(&add)`
// borrows the local `AddRequest` (alive across this synchronous call) as the input bytes, and
// `out` is a stack `[u8; size_of::<AddReply>()]` whose length bounds the kernel's write — both
// buffers outlive the call.
unsafe { ioctl(dev, control::IOCTL_ADD, bytemuck::bytes_of(&add), &mut out) }
.with_context(|| {
format!(
"pf-vdisplay ADD {}x{}@{}",
mode.width, mode.height, mode.refresh_hz
)
},
)?;
})?;
// `pod_read_unaligned` (NOT `from_bytes`): `out` is a stack `[u8; N]` with no guaranteed 4-byte
// alignment, and `from_bytes` PANICS on a mismatch. This copies into an aligned `AddReply`.
let reply: control::AddReply =
@@ -260,11 +287,24 @@ impl VdisplayDriver for PfVdisplayDriver {
session_id: *session_id,
};
let mut none: [u8; 0] = [];
unsafe { ioctl(dev, control::IOCTL_REMOVE, bytemuck::bytes_of(&req), &mut none) }.map(|_| ())
// SAFETY: per `remove_monitor`'s contract `dev` is the live control handle. `bytes_of(&req)`
// borrows the local `RemoveRequest` for the duration of this synchronous call as the input
// bytes; `none` is empty, so there is no output buffer.
unsafe {
ioctl(
dev,
control::IOCTL_REMOVE,
bytemuck::bytes_of(&req),
&mut none,
)
}
.map(|_| ())
}
unsafe fn ping(&self, dev: HANDLE) -> Result<()> {
let mut none: [u8; 0] = [];
// SAFETY: per `ping`'s contract `dev` is the live control handle. `IOCTL_PING` has no input
// (`&[]`) and no output (`none` is empty), so no memory is read or written through the buffers.
unsafe { ioctl(dev, control::IOCTL_PING, &[], &mut none) }.map(|_| ())
}
}
@@ -292,7 +332,11 @@ impl VirtualDisplay for PfVdisplayDisplay {
/// Readiness probe: can we open the pf-vdisplay control device?
pub fn probe() -> Result<()> {
// SAFETY: `open_device` is `unsafe` only for its SetupAPI + `CreateFileW` FFI; no arguments, returns
// an owned raw `HANDLE` (or `Err`).
let h = unsafe { open_device()? };
// SAFETY: `h` is the handle just opened by `open_device` in this function, owned here and not yet
// handed anywhere else, so this closes it exactly once — no double-close, no use-after-close.
unsafe {
let _ = CloseHandle(h);
}
@@ -301,6 +345,9 @@ pub fn probe() -> Result<()> {
/// Is the pf-vdisplay driver present (device interface enumerable)?
pub fn is_available() -> bool {
// SAFETY: `open_device` returns an owned raw `HANDLE`; on `Ok(h)` the handle is moved into the
// closure (sole owner) and closed exactly once via `CloseHandle`, on `Err` there is nothing to
// close — so no double-close and no leak of an opened handle. The `unsafe` covers both FFI calls.
unsafe { open_device().map(|h| CloseHandle(h)).is_ok() }
}
@@ -15,6 +15,9 @@
//! that is correct for launching *our own* streamer, but a store launcher needs the real user's token
//! for activation + auth). The host process itself stays SYSTEM.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Context, Result};
use std::path::Path;
use windows::core::{PCWSTR, PWSTR};
@@ -40,6 +43,8 @@ use windows::Win32::System::Threading::{
/// user is logged on (a pre-login / freshly-booted box can stream the login desktop but cannot
/// auto-launch a store title until someone signs in).
pub fn spawn_in_active_session(cmdline: &str, workdir: Option<&Path>) -> Result<u32> {
// SAFETY: `spawn_inner` is unsafe only for its Win32 FFI; it has no caller-side preconditions — it
// validates the session/token itself and owns every handle it opens — so calling it is always sound.
unsafe { spawn_inner(cmdline, workdir) }
}
@@ -21,6 +21,9 @@
//! loaded into the service's environment and carried to the host child. Logs land in
//! `%ProgramData%\punktfunk\logs\`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Context, Result};
use std::ffi::{c_void, OsString};
use std::os::windows::io::{AsRawHandle, FromRawHandle, OwnedHandle};
@@ -205,14 +208,19 @@ fn run_service() -> Result<()> {
// Two manual-reset events: STOP (set once, never reset) and SESSION (set on a console
// connect/disconnect, reset by the supervisor after it reacts).
// SAFETY: CreateEventW with null attributes (None), manual-reset=true, initial-state=false and a null
// name passes no pointers into Rust memory; it returns a fresh, owned event HANDLE (or Err, via `?`).
// Nothing aliases or outlives the call.
let stop_raw =
unsafe { CreateEventW(None, true, false, PCWSTR::null()) }.context("CreateEvent stop")?;
// SAFETY: as above — a second fresh manual-reset event; no pointers into Rust memory, no aliasing.
let session_raw = unsafe { CreateEventW(None, true, false, PCWSTR::null()) }
.context("CreateEvent session")?;
// Own each event handle (the OS reaps them at process exit); the handler reaches them through the
// OnceLocks, while `supervise` waits on the borrowed `HANDLE`s. SAFETY: each is a fresh CreateEventW
// handle we own — take ownership exactly once.
let stop_owned = unsafe { OwnedHandle::from_raw_handle(stop_raw.0) };
// SAFETY: `session_raw` is the other fresh CreateEventW handle nothing else owns — take ownership once.
let session_owned = unsafe { OwnedHandle::from_raw_handle(session_raw.0) };
let stop = HANDLE(stop_owned.as_raw_handle());
let session = HANDLE(session_owned.as_raw_handle());
@@ -226,6 +234,9 @@ fn run_service() -> Result<()> {
match control {
ServiceControl::Stop | ServiceControl::Preshutdown | ServiceControl::Shutdown => {
if let Some(h) = event_handle(&STOP_EVENT) {
// SAFETY: `h` borrows the STOP event HANDLE from the STOP_EVENT OwnedHandle, set for
// the whole process lifetime and never closed before exit, so it is open here; SetEvent
// only signals the event and passes no Rust memory.
unsafe { SetEvent(h) }.ok();
}
ServiceControlHandlerResult::NoError
@@ -237,6 +248,9 @@ fn run_service() -> Result<()> {
ConsoleConnect | ConsoleDisconnect | SessionLogon
) {
if let Some(h) = event_handle(&SESSION_EVENT) {
// SAFETY: `h` borrows the SESSION event HANDLE from the SESSION_EVENT OwnedHandle,
// alive for the whole process lifetime and never closed before exit; SetEvent only
// signals the event and passes no Rust memory.
unsafe { SetEvent(h) }.ok();
}
}
@@ -297,6 +311,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
// Kill-on-close job so a service crash never orphans the SYSTEM host; BREAKAWAY_OK lets the host
// still spawn the WGC helper. Owned: dropping it at function exit (KILL_ON_JOB_CLOSE) reaps any
// straggler still inside it — no manual CloseHandle(job).
// SAFETY: `make_job` is unsafe only for its Win32 FFI; it has no caller preconditions and creates +
// immediately takes RAII ownership of the job object, so calling it here is sound.
let job = unsafe { make_job() }.context("create job object")?;
let mut restarts: u32 = 0;
@@ -304,6 +320,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
if wait_one(stop, 0) {
break;
}
// SAFETY: WTSGetActiveConsoleSessionId takes no arguments and returns the active console session
// id (or 0xFFFFFFFF); it passes no pointers, so the call is always sound.
let session = unsafe { WTSGetActiveConsoleSessionId() };
if session == 0xFFFF_FFFF {
// No interactive session yet (boot / fully logged out). Wait, but wake on stop/session.
@@ -311,12 +329,17 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
if wait_any(&[stop, session_ev], 3000) == Some(0) {
break;
}
// SAFETY: `session_ev` is the SESSION event HANDLE borrowed from the SESSION_EVENT OwnedHandle,
// alive for the process lifetime; ResetEvent only clears its signalled state, no Rust memory.
unsafe { ResetEvent(session_ev) }.ok();
continue;
}
// BORROW the owned job handle for AssignProcessToJobObject inside spawn_host.
let job_h = HANDLE(job.as_raw_handle());
// SAFETY: `spawn_host` is unsafe only for its Win32 FFI. `session` is a valid console session id
// (checked != 0xFFFFFFFF above), `cmdline`/`workdir` are live borrows for the call, and `job_h`
// borrows the still-live `job` OwnedHandle — every argument is valid for the call's duration.
let child = match unsafe { spawn_host(session, &cmdline, &workdir, job_h) } {
Ok(child) => child,
Err(e) => {
@@ -340,6 +363,9 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
match reason {
Some(0) => {
// Stop: terminate the child and exit (the `child` drop closes its handles).
// SAFETY: `proc_h` is a HANDLE copy of the still-live `child.process` OwnedHandle (not
// dropped until end of iteration), so the process handle is open; TerminateProcess only
// signals termination by handle and passes no Rust memory.
unsafe {
let _ = TerminateProcess(proc_h, 0);
}
@@ -347,7 +373,10 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
}
Some(1) => {
// Session change: relaunch only if the active console session actually moved.
// SAFETY: `session_ev` borrows the process-lifetime SESSION_EVENT OwnedHandle; ResetEvent
// only clears its signalled state and passes no Rust memory.
unsafe { ResetEvent(session_ev) }.ok();
// SAFETY: WTSGetActiveConsoleSessionId takes no arguments and passes no pointers.
let now = unsafe { WTSGetActiveConsoleSessionId() };
if now != session {
tracing::info!(
@@ -355,6 +384,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
new = now,
"console session changed — relaunching host"
);
// SAFETY: `proc_h` copies the still-live `child.process` OwnedHandle (dropped only at
// end of iteration), so the handle is open; TerminateProcess only signals by handle.
unsafe {
let _ = TerminateProcess(proc_h, 0);
}
@@ -363,6 +394,8 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
}
// Same session (e.g. a stray notification) — keep waiting on the same child.
let r = wait_any(&[stop, proc_h], INFINITE);
// SAFETY: `proc_h` copies the still-live `child.process` OwnedHandle (dropped only at end
// of iteration), so the handle is open; TerminateProcess only signals by handle.
unsafe {
let _ = TerminateProcess(proc_h, 0);
}
@@ -394,11 +427,17 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
/// `true` if `h` is signalled within `ms`.
fn wait_one(h: HANDLE, ms: u32) -> bool {
// SAFETY: `&[h]` is a live one-element HANDLE slice the caller keeps open across the wait; the kernel
// reads exactly one handle (the binding derives the count from the slice length), bWaitAll=false,
// `ms` is a timeout — no pointers escape and the array is only read for this synchronous call.
unsafe { WaitForMultipleObjects(&[h], false, ms) == WAIT_OBJECT_0 }
}
/// Wait on several handles; returns the index of the first signalled, or `None` on timeout.
fn wait_any(handles: &[HANDLE], ms: u32) -> Option<usize> {
// SAFETY: `handles` is a live slice the caller keeps open across the wait; WaitForMultipleObjects
// reads exactly `handles.len()` handles (the binding derives the count from the slice), bWaitAll=false,
// `ms` is a timeout — the array is only read for this synchronous call and no pointers escape it.
let r = unsafe { WaitForMultipleObjects(handles, false, ms) };
let idx = r.0.wrapping_sub(WAIT_OBJECT_0.0);
(idx < handles.len() as u32).then_some(idx as usize)
@@ -12,6 +12,9 @@
//!
//! Wire framing on stdout, per AU: `[u32 len LE][u64 pts_ns LE][u8 keyframe][len bytes data]`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use crate::capture::{dxgi::WinCaptureTarget, wgc::WgcCapturer, Capturer};
use crate::encode::{self, Codec};
use anyhow::{Context, Result};
@@ -72,6 +75,9 @@ pub fn run(opts: HelperOptions) -> Result<()> {
.name("pf-present-trigger".into())
.spawn(move || {
tracing::info!("present-trigger: starting D3D present loop on the virtual display");
// SAFETY: `present_trigger` is unsafe only for its Win32/D3D11 FFI; it has no caller
// preconditions (it creates and exclusively owns its own window, device, and swapchain on
// this dedicated thread), so the call is sound.
if let Err(e) = unsafe { present_trigger(w, h) } {
tracing::warn!("present-trigger error: {e:#}");
}
@@ -8,6 +8,9 @@
//! them, which let the SudoVDA backend be dropped without losing them (audit §9 / Goal 2 — done). The
//! plan's `windows/display_ccd.rs`. Extracted verbatim from the former SudoVDA backend before its removal.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::mem::size_of;
use windows::core::PCWSTR;
@@ -16,9 +19,9 @@ use windows::Win32::Devices::Display::{
QueryDisplayConfig, SetDisplayConfig, DISPLAYCONFIG_DEVICE_INFO_GET_ADVANCED_COLOR_INFO,
DISPLAYCONFIG_DEVICE_INFO_GET_SOURCE_NAME, DISPLAYCONFIG_DEVICE_INFO_SET_ADVANCED_COLOR_STATE,
DISPLAYCONFIG_GET_ADVANCED_COLOR_INFO, DISPLAYCONFIG_MODE_INFO, DISPLAYCONFIG_PATH_INFO,
DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME, QDC_ONLY_ACTIVE_PATHS,
SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION, SDC_SAVE_TO_DATABASE,
SDC_USE_SUPPLIED_DISPLAY_CONFIG,
DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME,
QDC_ONLY_ACTIVE_PATHS, SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION,
SDC_SAVE_TO_DATABASE, SDC_USE_SUPPLIED_DISPLAY_CONFIG,
};
use windows::Win32::Graphics::Gdi::{
ChangeDisplaySettingsExW, EnumDisplaySettingsW, CDS_TEST, CDS_UPDATEREGISTRY, DEVMODEW,
@@ -202,6 +205,10 @@ pub(crate) fn set_active_mode(gdi_name: &str, mode: Mode) {
dmSize: size_of::<DEVMODEW>() as u16,
..Default::default()
};
// SAFETY: `wname` is a live NUL-terminated UTF-16 device name (built above) whose pointer stays
// valid for the call; `&mut dm` is a live DEVMODEW with `dmSize` set that EnumDisplaySettingsW
// fills in for mode index `i`. Both outlive this synchronous call; the API only reads the name
// and writes `dm`, so nothing aliases.
let ok = unsafe {
EnumDisplaySettingsW(
PCWSTR(wname.as_ptr()),
@@ -269,6 +276,9 @@ pub(crate) fn set_active_mode(gdi_name: &str, mode: Mode) {
dmDisplayFrequency: chosen_hz,
..Default::default()
};
// SAFETY: `wname` is a live NUL-terminated UTF-16 device name and `&dm` is a live DEVMODEW describing
// the requested mode; both outlive the call. CDS_TEST only validates the mode (no apply), the two
// trailing args are null, and the API only reads its inputs.
let test = unsafe {
ChangeDisplaySettingsExW(PCWSTR(wname.as_ptr()), Some(&dm), None, CDS_TEST, None)
};
@@ -282,6 +292,9 @@ pub(crate) fn set_active_mode(gdi_name: &str, mode: Mode) {
);
return;
}
// SAFETY: same inputs as the CDS_TEST call above — `wname` (live NUL-terminated device name) and
// `&dm` (live DEVMODEW) both outlive the call; CDS_UPDATEREGISTRY applies the already-validated mode,
// and the API only reads its inputs.
let apply = unsafe {
ChangeDisplaySettingsExW(
PCWSTR(wname.as_ptr()),
View File
@@ -34,7 +34,7 @@ which kept the live-validated host working at every step. The driver, by contras
|---|---|---|
| **Goal 1** — clean, layered host architecture | ✅ **DONE** | `config.rs` (`HostConfig`), `session_plan.rs` (`SessionPlan`), `SessionContext`, `windows/`+`linux/` confinement (`38c68c3`), `VirtualDisplayManager` (§2.5), `EncoderCaps` (`0ccd0fe`) |
| **Goal 2** — drop every trace of SudoVDA | ✅ **DONE** | reach-in decoupled (F1: `d638a93`/`e60cda3``win_adapter`/`win_display`), then the `sudovda.rs` backend + the dual-backend select **deleted** (this branch) — pf-vdisplay is the sole Windows virtual-display backend |
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** (**box-validated**) | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`) + the IDD-push `KeyedMutexGuard` hot loop (`6585643`); **driver `pod_init!`** (`bf57704`, 27→1). **On-glass clean: host clippy `-D warnings` + driver build** (RTX box; `bd05bc8` fixed 11 lints the gate surfaced). Remaining: host-crate P0 lints (deferred — churn>value), the `service.rs` SCM-handler event smuggling (deliberately left) |
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** (**box-validated**) | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`) + the IDD-push `KeyedMutexGuard` hot loop (`6585643`) + the **SCM STOP/SESSION events**`OnceLock<OwnedHandle>` (`61c02e6`, runtime-validated: clean ~1 s `sc stop`); **driver `pod_init!`** (`bf57704`, 27→1). **On-glass clean: host clippy `-D warnings` + driver build** (RTX box; `bd05bc8` fixed 11 lints the gate surfaced). The host-side raw-handle smuggling is fully retired; only host-crate P0 lints remain (deferred — churn>value) |
| **M0** — proto ABI + driver toolchain + `/INTEGRITYCHECK` + `iddcx` | ✅ **DONE** | `pf-driver-proto`; vendored `windows-drivers-rs` 0.5.1; `clear-force-integrity.ps1`; CI-green |
| **M1** — new IddCx driver, first light + HDR | ✅ **DONE (on-glass)** | STEP 08 (`d7a9fbf``cd59151`); HDR live ("Mac connects WITH HDR", `6399d28`) |
| **M2** — IDD-push capture + NVENC, glass-to-glass | ✅ **DONE (on-glass)** | 5120×1440@240 HDR zero-copy; integrated into the host path |
@@ -226,14 +226,16 @@ These are expensive empirical wins; keep them intact when touching the code:
`unsafe fn`s need an inner `unsafe {}`). Stage it **per-module, Linux-first** (item-level `#[deny]` on
`linux/zerocopy/cuda.rs`/`egl.rs`, `encode/linux/vaapi.rs` — locally verifiable), then the Windows
modules (CI-gated), then promote to crate-level. The driver already has the deny.
5. **D2 — `OwnedHandle` / RAII rollout.****done** `capture/windows/idd_push.rs` (`011607e`: a
`MappedSection` RAII for the mapping handle **+** the leaked `MapViewOfFile` view, + `OwnedHandle` for the
event / ring-slot shared handles); `windows/service.rs` (`4c95ba7`: the child process/thread + Job
handles, ~9 `CloseHandle` deleted); and the **three gamepad backends** (`e5c2b4e`: a shared
5. **D2 — `OwnedHandle` / RAII rollout.****DONE (complete).** `capture/windows/idd_push.rs` (`011607e`:
a `MappedSection` RAII for the mapping handle **+** the leaked `MapViewOfFile` view, + `OwnedHandle` for
the event / ring-slot shared handles); `windows/service.rs` (`4c95ba7`: the child process/thread + Job
handles, ~9 `CloseHandle` deleted); the **three gamepad backends** (`e5c2b4e`: a shared
`inject/windows/gamepad_raii.rs``Shm` for the section+view, `SwDevice` for the devnode — replacing the
duplicated `create_shm_section` + three hand-written `Drop`s). **Remaining (deliberately left):** the
`service.rs` `AtomicIsize` STOP/SESSION events — smuggled into the C SCM handler, a separate riskier
redesign. `manager.rs`/`pf_vdisplay.rs` already used the pattern.
duplicated `create_shm_section` + three hand-written `Drop`s); and the **SCM STOP/SESSION events**
(`61c02e6`: `AtomicIsize` raw-`isize` smuggle → `OnceLock<OwnedHandle>` the capture-free C handler reads,
owned for the process lifetime — also closes a latent close-then-signal window). **Runtime-validated on
the RTX box**: swapped in, `sc start` → RUNNING, `sc stop` → clean STOPPED in ~1 s (not a timeout-kill),
original restored. `manager.rs`/`pf_vdisplay.rs` already used the pattern.
6. **Hot-loop `KeyedMutexGuard` ✅ done** (`6585643`) — the IDD-push consume loop's hand-written
`AcquireSync`/`ReleaseSync` (with its "don't `?`-return between them or you leak the lock + stall the
driver" caveat) is now a RAII guard scoped to the convert/copy block: same release point (latency
+32
View File
@@ -211,6 +211,38 @@ if ($WebDir -and (Test-Path $WebDir) -and $BunExe -and (Test-Path $BunExe)) {
}
else { Write-Host "no -WebDir/-BunExe -> installer built WITHOUT the web console" }
# --- build + stage the HDR Vulkan layer (pf-vkhdr-layer) --------------------------------------
# A tiny always-on Vulkan implicit layer (cdylib) that advertises HDR10/scRGB surface formats on the
# virtual display so Vulkan games (Doom: The Dark Ages, etc.) can enable HDR while streaming — the
# NVIDIA/AMD ICDs hide HDR formats on an indirect display even though they accept+present a forced HDR
# swapchain there. Self-gated on the display's actual advanced-color state, so it's a no-op on SDR.
# Standalone crate (own [workspace]); built here and registered by the installer. Skipped if cargo
# is unavailable or the build fails -> installer is produced WITHOUT the layer (non-fatal).
$layerSrc = Join-Path $here 'pf-vkhdr-layer'
if (Test-Path (Join-Path $layerSrc 'Cargo.toml')) {
$layerTarget = Join-Path $OutDir 'vklayer-target'
Write-Host "==> building pf-vkhdr-layer (cdylib)"
$prevTarget = $env:CARGO_TARGET_DIR
$env:CARGO_TARGET_DIR = $layerTarget
Push-Location $layerSrc
& cargo build --release
$layerExit = $LASTEXITCODE
Pop-Location
if ($prevTarget) { $env:CARGO_TARGET_DIR = $prevTarget } else { Remove-Item Env:\CARGO_TARGET_DIR -ErrorAction SilentlyContinue }
$layerDll = Join-Path $layerTarget 'release\pf_vkhdr_layer.dll'
if ($layerExit -eq 0 -and (Test-Path $layerDll)) {
$layerStage = Join-Path $OutDir 'vklayer'
New-Item -ItemType Directory -Force -Path $layerStage | Out-Null
Copy-Item $layerDll (Join-Path $layerStage 'pf_vkhdr_layer.dll') -Force
Copy-Item (Join-Path $layerSrc 'pf_vkhdr_layer.json') (Join-Path $layerStage 'pf_vkhdr_layer.json') -Force
Sign-File (Join-Path $layerStage 'pf_vkhdr_layer.dll')
$defines += "/DVkLayerDir=$layerStage"
Write-Host "==> staged pf-vkhdr-layer -> $layerStage"
}
else { Write-Warning "pf-vkhdr-layer build failed ($layerExit) — installer built WITHOUT the HDR Vulkan layer" }
}
else { Write-Host "no pf-vkhdr-layer crate -> installer built WITHOUT the HDR Vulkan layer" }
# --- build the installer (from the non-redirected copy under C:\t) -----------------------------
Write-Host "==> ISCC $($defines -join ' ') $issLocal"
& $iscc @defines $issLocal
@@ -0,0 +1,23 @@
[package]
name = "pf-vkhdr-layer"
version = "0.1.0"
edition = "2021"
description = "punktfunk Vulkan implicit layer: inject HDR10/scRGB surface formats on the virtual display so Vulkan games (id Tech, etc.) detect HDR over an IddCx virtual display"
license = "MIT OR Apache-2.0"
publish = false
[lib]
name = "pf_vkhdr_layer"
crate-type = ["cdylib"]
[dependencies]
ash = "=0.38.0+1.3.281"
[profile.release]
opt-level = 2
panic = "abort"
strip = true
# Standalone package (not a member of the main punktfunk cargo workspace) — it's a Windows-only
# cdylib built on its own by pack-host-installer.ps1, like the UMDF driver crates next to it.
[workspace]
@@ -0,0 +1,70 @@
# pf-vkhdr-layer — HDR Vulkan layer for the virtual display
A tiny Vulkan **implicit layer** (`VK_LAYER_PUNKTFUNK_hdr_inject`) that lets **Vulkan games enable
HDR while streaming over the punktfunk virtual display**.
## The problem it solves
On Windows, NVIDIA/AMD Vulkan ICDs do **not** advertise any HDR color space
(`VK_COLOR_SPACE_HDR10_ST2084_EXT`, `VK_COLOR_SPACE_EXTENDED_SRGB_LINEAR_EXT`) for a surface that
lives on an **IddCx indirect / virtual display** — even when Windows "Use HDR" is on and the desktop
is composited at 10-bit. So Vulkan games (Doom: The Dark Ages and the rest of id Tech, Indiana Jones
and the Great Circle, …) query `vkGetPhysicalDeviceSurfaceFormatsKHR`, find no HDR color space, and
refuse HDR ("This device does not support HDR"). D3D11/D3D12 HDR works on the very same display
because the OS compositor drives it — only the **Vulkan WSI enumeration** is gated.
This was long believed unfixable from outside the GPU driver (the Apollo/Sunshine/Virtual-Display-
Driver communities all concluded "it's in Windows's kernel"). It isn't: an on-box experiment proved
the ICD happily **accepts and presents a *forced* HDR swapchain** on that exact virtual-display
surface (`vkCreateSwapchainKHR` + `vkSetHdrMetadataEXT` + present all succeed) — it simply won't
*advertise* the format. So the whole fix is to add the HDR surface formats to the enumeration the
game queries; once the game requests that swapchain, the ICD honors it. **Validated live: Doom: The
Dark Ages enables HDR over the virtual display with this layer.**
## What it does
- Intercepts `vkGetPhysicalDeviceSurfaceFormatsKHR` / `...2KHR`, calls down to the ICD, and appends
`{A2B10G10R10_UNORM_PACK32, HDR10_ST2084_EXT}` + `{R16G16B16A16_SFLOAT, EXTENDED_SRGB_LINEAR_EXT}`
(deduped — a no-op on real HDR monitors that already list them).
- **Self-gates**: it only injects when the surface's monitor actually has Windows advanced-color
(HDR) *enabled* right now (checked via `DisplayConfigGetDeviceInfo` / `GET_ADVANCED_COLOR_INFO`).
So it does **nothing** on SDR sessions/displays — no washed-out "SDR-in-HDR". It tracks
`VkSurfaceKHR → HWND` by intercepting `vkCreateWin32SurfaceKHR`.
- Everything else is pass-through dispatch chaining (instance + device).
It is shipped as an **always-on** implicit layer (loads via the registry, so it works regardless of
how a game is launched — including via an already-running Steam, which env-based scoping can't
guarantee). Because of the self-gate it is inert outside HDR streaming.
## Controls
| Variable | Effect |
|---|---|
| `DISABLE_PF_VKHDR=1` | Loader-standard off-switch — disables the whole layer for that process. |
| `PF_VKHDR_EXCLUDE=foo.exe,bar.exe` | Extra exe basenames to skip (in addition to a small built-in kernel-anti-cheat default list: `cs2.exe`, `rainbowsix.exe`, …). |
| `PF_VKHDR_LOG=1` | Write a debug log to `%TEMP%\pf_vkhdr_layer.log`. |
## Build / install
Standalone crate (own `[workspace]`), Windows-only `cdylib`:
```sh
cargo build --release # -> target/release/pf_vkhdr_layer.dll
```
The host installer (`packaging/windows/pack-host-installer.ps1``punktfunk-host.iss`) builds it,
lays `pf_vkhdr_layer.dll` + `pf_vkhdr_layer.json` into `{app}\vklayer`, and registers it under
`HKLM64\SOFTWARE\Khronos\Vulkan\ImplicitLayers` (opt-out task "Install the HDR Vulkan layer").
Manual dev install: drop the DLL + JSON in one directory and add a `REG_DWORD` value named after the
JSON's full path (data `0`) under `HKLM\SOFTWARE\Khronos\Vulkan\ImplicitLayers` (or `HKCU\...`).
Confirm with `vulkaninfo` (Surface section) — `HDR10_ST2084_EXT` should appear when the display has
HDR enabled.
## Notes
- x64 only (the Windows host is x64 only).
- Anti-cheat: the layer is benign (it only *adds* surface formats; it never touches rendering,
memory, or input) and is signed, but because it's always-on it is *present* in every Vulkan
process. The built-in exclude list + `PF_VKHDR_EXCLUDE` + `DISABLE_PF_VKHDR` cover kernel-anti-
cheat titles you'd rather it stay out of.
@@ -0,0 +1,17 @@
{
"file_format_version": "1.2.0",
"layer": {
"name": "VK_LAYER_PUNKTFUNK_hdr_inject",
"type": "GLOBAL",
"library_path": ".\\pf_vkhdr_layer.dll",
"api_version": "1.4.280",
"implementation_version": "1",
"description": "punktfunk: advertise HDR10/scRGB Vulkan surface formats on the virtual display (NVIDIA/AMD ICDs hide them on indirect displays even though the ICD accepts + presents a forced HDR swapchain).",
"functions": {
"vkNegotiateLoaderLayerInterfaceVersion": "vkNegotiateLoaderLayerInterfaceVersion"
},
"disable_environment": {
"DISABLE_PF_VKHDR": "1"
}
}
}
+841
View File
@@ -0,0 +1,841 @@
//! punktfunk Vulkan implicit layer — `VK_LAYER_PUNKTFUNK_hdr_inject`.
//!
//! ## Why
//! On Windows, NVIDIA/AMD Vulkan ICDs do **not** advertise any HDR color space
//! (`HDR10_ST2084_EXT`, `EXTENDED_SRGB_LINEAR_EXT`) for a surface on an IddCx *indirect / virtual*
//! display — even when Windows "Use HDR" is enabled and the desktop is composited at 10-bit. So
//! Vulkan games (Doom: The Dark Ages and the rest of id Tech, Indiana Jones, …) query
//! `vkGetPhysicalDeviceSurfaceFormatsKHR`, find no HDR color space, and refuse HDR
//! ("This device does not support HDR"). DX11/DX12 HDR works on the same display because the OS
//! compositor drives it; only the Vulkan WSI *enumeration* is gated. An on-box spike proved the ICD
//! *accepts and presents* a forced HDR swapchain on that exact surface — it just won't *advertise*
//! the format. So the entire fix is to append the HDR surface formats to the enumeration the game
//! queries; once the game asks for that swapchain, the ICD honors it.
//!
//! ## What this layer does
//! Intercepts `vkGetPhysicalDeviceSurfaceFormatsKHR` / `...2KHR`, calls down to the ICD, and appends
//! `{A2B10G10R10_UNORM_PACK32, HDR10_ST2084_EXT}` and `{R16G16B16A16_SFLOAT, EXTENDED_SRGB_LINEAR_EXT}`
//! (deduped). **Self-gating:** it only injects when the surface's monitor actually has Windows
//! advanced-color (HDR) *enabled* — so it is a complete no-op on SDR sessions and on real monitors
//! (which already advertise HDR, and dedup drops the duplicate). It tracks `VkSurfaceKHR -> HWND` by
//! intercepting `vkCreateWin32SurfaceKHR`. Everything else is pass-through dispatch chaining.
//!
//! Off-switches: the loader-standard `DISABLE_PF_VKHDR=1` (disables the whole layer), and
//! `PF_VKHDR_EXCLUDE` (comma/semicolon list of exe basenames to skip — defaults include known
//! kernel-anti-cheat titles). `PF_VKHDR_LOG=1` enables a debug log in `%TEMP%\pf_vkhdr_layer.log`.
#![allow(non_snake_case)]
#![allow(clippy::missing_safety_doc)]
#![allow(clippy::too_many_arguments)]
// HWND / HMONITOR etc. deliberately mirror the Win32 names.
#![allow(clippy::upper_case_acronyms)]
use ash::vk;
use ash::vk::Handle;
use std::collections::HashMap;
use std::ffi::{c_char, c_void, CStr};
use std::ptr;
use std::sync::{Mutex, OnceLock};
// ---- Vulkan loader<->layer glue (vk_layer.h; not in the core registry / ash) ----
// The loader's private create-info sTypes squat on 47/48 (NOT the 1000211xxx range).
const LOADER_INSTANCE_CREATE_INFO: i32 = 47;
const LOADER_DEVICE_CREATE_INFO: i32 = 48;
const VK_LAYER_LINK_INFO: i32 = 0;
const LAYER_NEGOTIATE_INTERFACE_STRUCT: i32 = 1;
type PfnGipa = vk::PFN_vkGetInstanceProcAddr;
type PfnGdpa = vk::PFN_vkGetDeviceProcAddr;
type PfnGpdpa = unsafe extern "system" fn(vk::Instance, *const c_char) -> vk::PFN_vkVoidFunction;
#[repr(C)]
struct BaseIn {
s_type: vk::StructureType,
p_next: *const c_void,
}
#[repr(C)]
struct LayerInstanceLink {
p_next: *mut LayerInstanceLink,
next_gipa: PfnGipa,
next_gpdpa: Option<PfnGpdpa>,
}
#[repr(C)]
struct LayerInstanceCreateInfo {
s_type: vk::StructureType,
p_next: *const c_void,
function: i32,
u: *mut LayerInstanceLink,
}
#[repr(C)]
struct LayerDeviceLink {
p_next: *mut LayerDeviceLink,
next_gipa: PfnGipa,
next_gdpa: PfnGdpa,
}
#[repr(C)]
struct LayerDeviceCreateInfo {
s_type: vk::StructureType,
p_next: *const c_void,
function: i32,
u: *mut LayerDeviceLink,
}
#[repr(C)]
pub struct NegotiateLayerInterface {
s_type: i32,
p_next: *mut c_void,
loader_layer_interface_version: u32,
pfn_gipa: Option<PfnGipa>,
pfn_gdpa: Option<PfnGdpa>,
pfn_gpdpa: Option<PfnGpdpa>,
}
// raw mirror of VkSurfaceFormat2KHR (avoid ash lifetime generics in fn-pointer types)
#[repr(C)]
#[derive(Clone, Copy)]
struct SurfaceFormat2Raw {
s_type: vk::StructureType,
p_next: *mut c_void,
surface_format: vk::SurfaceFormatKHR,
}
// ---- ICD function-pointer typedefs we call down to (raw pointers, no lifetimes) ----
type FnCreateInstance =
unsafe extern "system" fn(*const c_void, *const c_void, *mut vk::Instance) -> vk::Result;
type FnDestroyInstance = unsafe extern "system" fn(vk::Instance, *const c_void);
type FnCreateDevice = unsafe extern "system" fn(
vk::PhysicalDevice,
*const c_void,
*const c_void,
*mut vk::Device,
) -> vk::Result;
type FnGetSurfFmts = unsafe extern "system" fn(
vk::PhysicalDevice,
vk::SurfaceKHR,
*mut u32,
*mut vk::SurfaceFormatKHR,
) -> vk::Result;
type FnGetSurfFmts2 = unsafe extern "system" fn(
vk::PhysicalDevice,
*const c_void,
*mut u32,
*mut c_void,
) -> vk::Result;
type FnCreateWin32Surface = unsafe extern "system" fn(
vk::Instance,
*const c_void,
*const c_void,
*mut vk::SurfaceKHR,
) -> vk::Result;
type FnDestroySurface = unsafe extern "system" fn(vk::Instance, vk::SurfaceKHR, *const c_void);
struct InstanceData {
instance: vk::Instance,
next_gipa: PfnGipa,
next_gpdpa: Option<PfnGpdpa>,
destroy_instance: Option<FnDestroyInstance>,
get_surface_formats: Option<FnGetSurfFmts>,
get_surface_formats2: Option<FnGetSurfFmts2>,
create_win32_surface: Option<FnCreateWin32Surface>,
destroy_surface: Option<FnDestroySurface>,
}
unsafe impl Send for InstanceData {}
struct DeviceData {
next_gdpa: PfnGdpa,
}
unsafe impl Send for DeviceData {}
fn instances() -> &'static Mutex<HashMap<usize, InstanceData>> {
static M: OnceLock<Mutex<HashMap<usize, InstanceData>>> = OnceLock::new();
M.get_or_init(|| Mutex::new(HashMap::new()))
}
fn devices() -> &'static Mutex<HashMap<usize, DeviceData>> {
static M: OnceLock<Mutex<HashMap<usize, DeviceData>>> = OnceLock::new();
M.get_or_init(|| Mutex::new(HashMap::new()))
}
/// VkSurfaceKHR handle -> the HWND it was created from (so we can resolve its monitor's HDR state).
fn surface_hwnds() -> &'static Mutex<HashMap<u64, isize>> {
static M: OnceLock<Mutex<HashMap<u64, isize>>> = OnceLock::new();
M.get_or_init(|| Mutex::new(HashMap::new()))
}
// dispatch key = first pointer word of a dispatchable handle (loader dispatch table ptr).
#[inline]
unsafe fn key(raw: u64) -> usize {
*(raw as usize as *const usize)
}
#[inline]
unsafe fn as_pfn(p: *const c_void) -> vk::PFN_vkVoidFunction {
Some(std::mem::transmute::<
*const c_void,
unsafe extern "system" fn(),
>(p))
}
#[inline]
unsafe fn resolve<T: Copy>(gipa: PfnGipa, inst: vk::Instance, name: &CStr) -> Option<T> {
gipa(inst, name.as_ptr()).map(|f| std::mem::transmute_copy::<_, T>(&f))
}
fn log(msg: &str) {
if std::env::var_os("PF_VKHDR_LOG").is_none() {
return;
}
use std::io::Write;
let mut path = std::env::temp_dir();
path.push("pf_vkhdr_layer.log");
if let Ok(mut f) = std::fs::OpenOptions::new()
.create(true)
.append(true)
.open(path)
{
let _ = writeln!(f, "{msg}");
}
}
fn hdr_extra() -> [vk::SurfaceFormatKHR; 2] {
[
vk::SurfaceFormatKHR {
format: vk::Format::A2B10G10R10_UNORM_PACK32,
color_space: vk::ColorSpaceKHR::HDR10_ST2084_EXT,
},
vk::SurfaceFormatKHR {
format: vk::Format::R16G16B16A16_SFLOAT,
color_space: vk::ColorSpaceKHR::EXTENDED_SRGB_LINEAR_EXT,
},
]
}
/// `false` if this process is on the anti-cheat exclude list (built-in + `PF_VKHDR_EXCLUDE`).
/// Computed once per process.
fn injection_allowed_for_process() -> bool {
static C: OnceLock<bool> = OnceLock::new();
*C.get_or_init(|| {
let exe = std::env::current_exe()
.ok()
.and_then(|p| p.file_name().map(|s| s.to_string_lossy().to_lowercase()))
.unwrap_or_default();
if exe.is_empty() {
return true;
}
// Conservative default skip-list for kernel-level anti-cheat titles. Users can extend or
// clear via PF_VKHDR_EXCLUDE. (HDR injection is benign, but we err toward not being present
// in these process' WSI path at all.)
const DENY: &[&str] = &[
"cs2.exe",
"rainbowsix.exe",
"rainbowsixgame.exe",
"r5apex.exe",
];
let mut denied = false;
for d in DENY {
if *d == exe {
denied = true;
}
}
if let Ok(extra) = std::env::var("PF_VKHDR_EXCLUDE") {
for e in extra.split([',', ';']) {
if e.trim().to_lowercase() == exe {
denied = true;
}
}
}
if denied {
log(&format!("injection disabled for excluded process: {exe}"));
}
!denied
})
}
// ---- Win32 / DisplayConfig: is the surface's monitor HDR-enabled right now? ----
mod hdr {
use std::ffi::c_void;
pub type HWND = isize;
pub type HMONITOR = isize;
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Rect {
pub l: i32,
pub t: i32,
pub r: i32,
pub b: i32,
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MonitorInfoExW {
pub cb: u32,
pub rc_monitor: Rect,
pub rc_work: Rect,
pub flags: u32,
pub sz_device: [u16; 32],
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Luid {
pub low: u32,
pub high: i32,
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Rational {
pub num: u32,
pub den: u32,
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Source {
pub adapter: Luid,
pub id: u32,
pub mode_idx: u32,
pub status: u32,
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Target {
pub adapter: Luid,
pub id: u32,
pub mode_idx: u32,
pub tech: i32,
pub rotation: i32,
pub scaling: i32,
pub refresh: Rational,
pub scanline: i32,
pub available: i32,
pub status: u32,
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct PathInfo {
pub src: Source,
pub tgt: Target,
pub flags: u32,
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ModeInfo {
pub _b: [u8; 64],
}
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Header {
pub typ: i32,
pub size: u32,
pub adapter: Luid,
pub id: u32,
}
#[repr(C)]
pub struct AdvInfo {
pub header: Header,
pub value: u32,
pub enc: i32,
pub bpc: i32,
}
#[repr(C)]
pub struct SourceName {
pub header: Header,
pub gdi: [u16; 32],
}
#[link(name = "user32")]
extern "system" {
fn MonitorFromWindow(h: HWND, flags: u32) -> HMONITOR;
fn GetMonitorInfoW(h: HMONITOR, mi: *mut MonitorInfoExW) -> i32;
fn GetDisplayConfigBufferSizes(flags: u32, np: *mut u32, nm: *mut u32) -> i32;
fn QueryDisplayConfig(
flags: u32,
np: *mut u32,
pa: *mut PathInfo,
nm: *mut u32,
ma: *mut ModeInfo,
topo: *mut c_void,
) -> i32;
fn DisplayConfigGetDeviceInfo(p: *mut c_void) -> i32;
}
const QDC_ONLY_ACTIVE_PATHS: u32 = 2;
const GET_SOURCE_NAME: i32 = 1;
const GET_ADVANCED_COLOR_INFO: i32 = 9;
const MONITOR_DEFAULTTONEAREST: u32 = 2;
unsafe fn active_paths() -> Vec<PathInfo> {
let (mut np, mut nm) = (0u32, 0u32);
if GetDisplayConfigBufferSizes(QDC_ONLY_ACTIVE_PATHS, &mut np, &mut nm) != 0 || np == 0 {
return Vec::new();
}
let mut pa: Vec<PathInfo> = vec![std::mem::zeroed(); np as usize];
let mut ma: Vec<ModeInfo> = vec![std::mem::zeroed(); nm as usize];
if QueryDisplayConfig(
QDC_ONLY_ACTIVE_PATHS,
&mut np,
pa.as_mut_ptr(),
&mut nm,
ma.as_mut_ptr(),
std::ptr::null_mut(),
) != 0
{
return Vec::new();
}
pa.truncate(np as usize);
pa
}
unsafe fn target_hdr_enabled(p: &PathInfo) -> bool {
let mut ai: AdvInfo = std::mem::zeroed();
ai.header.typ = GET_ADVANCED_COLOR_INFO;
ai.header.size = std::mem::size_of::<AdvInfo>() as u32;
ai.header.adapter = p.tgt.adapter;
ai.header.id = p.tgt.id;
if DisplayConfigGetDeviceInfo(&mut ai as *mut _ as *mut c_void) != 0 {
return false;
}
// value bitfield: bit0 advancedColorSupported, bit1 advancedColorEnabled.
(ai.value & 0b10) != 0
}
unsafe fn source_gdi(p: &PathInfo) -> [u16; 32] {
let mut sn: SourceName = std::mem::zeroed();
sn.header.typ = GET_SOURCE_NAME;
sn.header.size = std::mem::size_of::<SourceName>() as u32;
sn.header.adapter = p.src.adapter;
sn.header.id = p.src.id;
let _ = DisplayConfigGetDeviceInfo(&mut sn as *mut _ as *mut c_void);
sn.gdi
}
/// Is HDR (Windows advanced color) currently enabled on the display this surface lives on?
/// `hwnd == 0`/unknown falls back to "any active display has HDR enabled".
pub unsafe fn enabled_for(hwnd: HWND) -> bool {
let paths = active_paths();
if hwnd != 0 {
let mon = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
let mut mi: MonitorInfoExW = std::mem::zeroed();
mi.cb = std::mem::size_of::<MonitorInfoExW>() as u32;
if GetMonitorInfoW(mon, &mut mi) != 0 {
for p in &paths {
if source_gdi(p) == mi.sz_device {
return target_hdr_enabled(p);
}
}
}
}
paths.iter().any(|p| target_hdr_enabled(p))
}
}
/// Should we inject HDR formats for this surface right now?
unsafe fn should_inject(surface: vk::SurfaceKHR) -> bool {
if !injection_allowed_for_process() {
return false;
}
let hwnd = surface_hwnds()
.lock()
.ok()
.and_then(|m| m.get(&surface.as_raw()).copied())
.unwrap_or(0);
hdr::enabled_for(hwnd)
}
// ---- entry point ----
#[no_mangle]
pub unsafe extern "system" fn vkNegotiateLoaderLayerInterfaceVersion(
p: *mut NegotiateLayerInterface,
) -> vk::Result {
if p.is_null() {
return vk::Result::ERROR_INITIALIZATION_FAILED;
}
let s = &mut *p;
if s.s_type != LAYER_NEGOTIATE_INTERFACE_STRUCT {
return vk::Result::ERROR_INITIALIZATION_FAILED;
}
if s.loader_layer_interface_version > 2 {
s.loader_layer_interface_version = 2;
}
s.pfn_gipa = Some(layer_gipa);
s.pfn_gdpa = Some(layer_gdpa);
s.pfn_gpdpa = Some(layer_gpdpa);
log("negotiate: VK_LAYER_PUNKTFUNK_hdr_inject active (v2)");
vk::Result::SUCCESS
}
// ---- proc-addr dispatch ----
unsafe extern "system" fn layer_gipa(
instance: vk::Instance,
p_name: *const c_char,
) -> vk::PFN_vkVoidFunction {
if p_name.is_null() {
return None;
}
match CStr::from_ptr(p_name).to_bytes() {
b"vkGetInstanceProcAddr" => as_pfn(layer_gipa as *const c_void),
b"vkGetDeviceProcAddr" => as_pfn(layer_gdpa as *const c_void),
b"vkCreateInstance" => as_pfn(create_instance as *const c_void),
b"vkDestroyInstance" => as_pfn(destroy_instance as *const c_void),
b"vkCreateDevice" => as_pfn(create_device as *const c_void),
b"vkGetPhysicalDeviceSurfaceFormatsKHR" => as_pfn(get_surface_formats as *const c_void),
b"vkGetPhysicalDeviceSurfaceFormats2KHR" => as_pfn(get_surface_formats2 as *const c_void),
b"vkCreateWin32SurfaceKHR" => as_pfn(create_win32_surface as *const c_void),
b"vkDestroySurfaceKHR" => as_pfn(destroy_surface as *const c_void),
_ => {
if instance == vk::Instance::null() {
return None;
}
let next = {
let g = instances().lock().ok()?;
g.get(&key(instance.as_raw())).map(|d| d.next_gipa)
};
next.and_then(|gipa| gipa(instance, p_name))
}
}
}
unsafe extern "system" fn layer_gpdpa(
instance: vk::Instance,
p_name: *const c_char,
) -> vk::PFN_vkVoidFunction {
if p_name.is_null() {
return None;
}
match CStr::from_ptr(p_name).to_bytes() {
b"vkGetPhysicalDeviceSurfaceFormatsKHR" => as_pfn(get_surface_formats as *const c_void),
b"vkGetPhysicalDeviceSurfaceFormats2KHR" => as_pfn(get_surface_formats2 as *const c_void),
_ => {
if instance == vk::Instance::null() {
return None;
}
let next = {
let g = instances().lock().ok()?;
g.get(&key(instance.as_raw())).and_then(|d| d.next_gpdpa)
};
next.and_then(|gpdpa| gpdpa(instance, p_name))
}
}
}
unsafe extern "system" fn layer_gdpa(
device: vk::Device,
p_name: *const c_char,
) -> vk::PFN_vkVoidFunction {
if p_name.is_null() {
return None;
}
if CStr::from_ptr(p_name).to_bytes() == b"vkGetDeviceProcAddr" {
return as_pfn(layer_gdpa as *const c_void);
}
if device == vk::Device::null() {
return None;
}
let next = {
let g = match devices().lock() {
Ok(g) => g,
Err(_) => return None,
};
g.get(&key(device.as_raw())).map(|d| d.next_gdpa)
};
next.and_then(|gdpa| gdpa(device, p_name))
}
// ---- instance chain ----
unsafe fn find_instance_link(p_ci: *const c_void) -> *mut LayerInstanceCreateInfo {
let mut node = (*(p_ci as *const BaseIn)).p_next as *const BaseIn;
while !node.is_null() {
if (*node).s_type.as_raw() == LOADER_INSTANCE_CREATE_INFO {
let lci = node as *mut LayerInstanceCreateInfo;
if (*lci).function == VK_LAYER_LINK_INFO {
return lci;
}
}
node = (*node).p_next as *const BaseIn;
}
ptr::null_mut()
}
unsafe extern "system" fn create_instance(
p_ci: *const c_void,
p_alloc: *const c_void,
p_inst: *mut vk::Instance,
) -> vk::Result {
let lci = find_instance_link(p_ci);
if lci.is_null() {
return vk::Result::ERROR_INITIALIZATION_FAILED;
}
let link = (*lci).u;
if link.is_null() {
return vk::Result::ERROR_INITIALIZATION_FAILED;
}
let next_gipa = (*link).next_gipa;
let next_gpdpa = (*link).next_gpdpa;
(*lci).u = (*link).p_next;
let create: FnCreateInstance =
match resolve(next_gipa, vk::Instance::null(), c"vkCreateInstance") {
Some(f) => f,
None => return vk::Result::ERROR_INITIALIZATION_FAILED,
};
let res = create(p_ci, p_alloc, p_inst);
if res != vk::Result::SUCCESS {
return res;
}
let inst = *p_inst;
let data = InstanceData {
instance: inst,
next_gipa,
next_gpdpa,
destroy_instance: resolve(next_gipa, inst, c"vkDestroyInstance"),
get_surface_formats: resolve(next_gipa, inst, c"vkGetPhysicalDeviceSurfaceFormatsKHR"),
get_surface_formats2: resolve(next_gipa, inst, c"vkGetPhysicalDeviceSurfaceFormats2KHR"),
create_win32_surface: resolve(next_gipa, inst, c"vkCreateWin32SurfaceKHR"),
destroy_surface: resolve(next_gipa, inst, c"vkDestroySurfaceKHR"),
};
if let Ok(mut g) = instances().lock() {
g.insert(key(inst.as_raw()), data);
}
log("create_instance: hooked");
vk::Result::SUCCESS
}
unsafe extern "system" fn destroy_instance(inst: vk::Instance, p_alloc: *const c_void) {
if inst == vk::Instance::null() {
return;
}
let data = instances()
.lock()
.ok()
.and_then(|mut g| g.remove(&key(inst.as_raw())));
if let Some(d) = data {
if let Some(f) = d.destroy_instance {
f(inst, p_alloc);
}
}
}
// ---- device chain (pass-through; keeps device-level dispatch working) ----
unsafe fn find_device_link(p_ci: *const c_void) -> *mut LayerDeviceCreateInfo {
let mut node = (*(p_ci as *const BaseIn)).p_next as *const BaseIn;
while !node.is_null() {
if (*node).s_type.as_raw() == LOADER_DEVICE_CREATE_INFO {
let lci = node as *mut LayerDeviceCreateInfo;
if (*lci).function == VK_LAYER_LINK_INFO {
return lci;
}
}
node = (*node).p_next as *const BaseIn;
}
ptr::null_mut()
}
unsafe extern "system" fn create_device(
pdev: vk::PhysicalDevice,
p_ci: *const c_void,
p_alloc: *const c_void,
p_dev: *mut vk::Device,
) -> vk::Result {
let lci = find_device_link(p_ci);
if lci.is_null() {
return vk::Result::ERROR_INITIALIZATION_FAILED;
}
let link = (*lci).u;
if link.is_null() {
return vk::Result::ERROR_INITIALIZATION_FAILED;
}
let next_gipa = (*link).next_gipa;
let next_gdpa = (*link).next_gdpa;
(*lci).u = (*link).p_next;
let inst = instances()
.lock()
.ok()
.and_then(|g| g.get(&key(pdev.as_raw())).map(|d| d.instance))
.unwrap_or(vk::Instance::null());
let create: FnCreateDevice = match resolve(next_gipa, inst, c"vkCreateDevice") {
Some(f) => f,
None => return vk::Result::ERROR_INITIALIZATION_FAILED,
};
let res = create(pdev, p_ci, p_alloc, p_dev);
if res != vk::Result::SUCCESS {
return res;
}
let dev = *p_dev;
if let Ok(mut g) = devices().lock() {
g.insert(key(dev.as_raw()), DeviceData { next_gdpa });
}
vk::Result::SUCCESS
}
// ---- surface tracking (so we can resolve a surface's monitor) ----
unsafe extern "system" fn create_win32_surface(
inst: vk::Instance,
p_ci: *const c_void,
p_alloc: *const c_void,
p_surface: *mut vk::SurfaceKHR,
) -> vk::Result {
let down = instances().lock().ok().and_then(|g| {
g.get(&key(inst.as_raw()))
.and_then(|d| d.create_win32_surface)
});
let down = match down {
Some(f) => f,
None => return vk::Result::ERROR_EXTENSION_NOT_PRESENT,
};
let res = down(inst, p_ci, p_alloc, p_surface);
if res == vk::Result::SUCCESS {
// VkWin32SurfaceCreateInfoKHR: sType@0, pNext@8, flags@16, hinstance@24, hwnd@32
let hwnd = *((p_ci as *const u8).add(32) as *const isize);
if let Ok(mut m) = surface_hwnds().lock() {
m.insert((*p_surface).as_raw(), hwnd);
}
}
res
}
unsafe extern "system" fn destroy_surface(
inst: vk::Instance,
surface: vk::SurfaceKHR,
p_alloc: *const c_void,
) {
if let Ok(mut m) = surface_hwnds().lock() {
m.remove(&surface.as_raw());
}
let down = instances()
.lock()
.ok()
.and_then(|g| g.get(&key(inst.as_raw())).and_then(|d| d.destroy_surface));
if let Some(f) = down {
f(inst, surface, p_alloc);
}
}
// ---- the actual fix: append HDR surface formats (self-gated on display HDR state) ----
unsafe extern "system" fn get_surface_formats(
pdev: vk::PhysicalDevice,
surface: vk::SurfaceKHR,
p_count: *mut u32,
p_formats: *mut vk::SurfaceFormatKHR,
) -> vk::Result {
let down = instances().lock().ok().and_then(|g| {
g.get(&key(pdev.as_raw()))
.and_then(|d| d.get_surface_formats)
});
let down = match down {
Some(f) => f,
None => return vk::Result::ERROR_INITIALIZATION_FAILED,
};
let mut n = 0u32;
let r = down(pdev, surface, &mut n, ptr::null_mut());
if r != vk::Result::SUCCESS {
return r;
}
let mut real = vec![vk::SurfaceFormatKHR::default(); n as usize];
if n > 0 {
let r = down(pdev, surface, &mut n, real.as_mut_ptr());
if r != vk::Result::SUCCESS {
return r;
}
}
real.truncate(n as usize);
let mut aug = real;
if !aug.is_empty() && should_inject(surface) {
for e in hdr_extra() {
if !aug
.iter()
.any(|x| x.format == e.format && x.color_space == e.color_space)
{
aug.push(e);
}
}
}
if p_formats.is_null() {
*p_count = aug.len() as u32;
return vk::Result::SUCCESS;
}
let m = (*p_count as usize).min(aug.len());
ptr::copy_nonoverlapping(aug.as_ptr(), p_formats, m);
*p_count = m as u32;
if m < aug.len() {
vk::Result::INCOMPLETE
} else {
vk::Result::SUCCESS
}
}
unsafe extern "system" fn get_surface_formats2(
pdev: vk::PhysicalDevice,
p_info: *const c_void,
p_count: *mut u32,
p_formats: *mut c_void,
) -> vk::Result {
let down = instances().lock().ok().and_then(|g| {
g.get(&key(pdev.as_raw()))
.and_then(|d| d.get_surface_formats2)
});
let down = match down {
Some(f) => f,
None => return vk::Result::ERROR_INITIALIZATION_FAILED,
};
let mut n = 0u32;
let r = down(pdev, p_info, &mut n, ptr::null_mut());
if r != vk::Result::SUCCESS {
return r;
}
let mut real: Vec<SurfaceFormat2Raw> = (0..n)
.map(|_| SurfaceFormat2Raw {
s_type: vk::StructureType::SURFACE_FORMAT_2_KHR,
p_next: ptr::null_mut(),
surface_format: vk::SurfaceFormatKHR::default(),
})
.collect();
if n > 0 {
let r = down(pdev, p_info, &mut n, real.as_mut_ptr() as *mut c_void);
if r != vk::Result::SUCCESS {
return r;
}
}
real.truncate(n as usize);
// VkPhysicalDeviceSurfaceInfo2KHR: sType@0, pNext@8, surface@16
let surface = vk::SurfaceKHR::from_raw(*((p_info as *const u8).add(16) as *const u64));
let mut extras: Vec<vk::SurfaceFormatKHR> = Vec::new();
if !real.is_empty() && should_inject(surface) {
for e in hdr_extra() {
if !real.iter().any(|x| {
x.surface_format.format == e.format && x.surface_format.color_space == e.color_space
}) {
extras.push(e);
}
}
}
let total = real.len() + extras.len();
if p_formats.is_null() {
*p_count = total as u32;
return vk::Result::SUCCESS;
}
let m = (*p_count as usize).min(total);
let out = p_formats as *mut SurfaceFormat2Raw;
for i in 0..m {
let sf = if i < real.len() {
real[i].surface_format
} else {
extras[i - real.len()]
};
let dst = out.add(i);
(*dst).s_type = vk::StructureType::SURFACE_FORMAT_2_KHR;
(*dst).surface_format = sf;
}
*p_count = m as u32;
if m < total {
vk::Result::INCOMPLETE
} else {
vk::Result::SUCCESS
}
}
+26
View File
@@ -56,6 +56,13 @@
#define WithWeb
#endif
#endif
; VkLayerDir (the staged pf-vkhdr-layer: pf_vkhdr_layer.dll + .json) is optional — present when the
; HDR Vulkan layer was built. It lets Vulkan games (Doom: The Dark Ages, etc.) enable HDR over the
; virtual display (the ICD won't advertise HDR there; the layer injects the surface formats, self-
; gated on the display's actual HDR state).
#ifdef VkLayerDir
#define WithVkLayer
#endif
[Setup]
AppId={{7C9E6A52-1F4B-4E8D-A3C7-2B5D8F1E0A93}
@@ -89,6 +96,9 @@ Name: "installdriver"; Description: "Install the pf-vdisplay virtual display dri
#ifdef WithGamepad
Name: "installgamepad"; Description: "Install the virtual gamepad drivers (DualSense / DualShock 4 / Xbox 360 — no ViGEmBus needed)"
#endif
#ifdef WithVkLayer
Name: "installhdrlayer"; Description: "Install the HDR Vulkan layer (lets Vulkan games like Doom use HDR on the virtual display)"
#endif
Name: "startservice"; Description: "Start the punktfunk host service now (also starts on every boot)"
[Files]
@@ -119,6 +129,22 @@ Source: "{#StageDir}\*"; DestDir: "{tmp}\pfvdisplay"; Flags: deleteafterinstall
; The vendored UMDF gamepad drivers + install-gamepad-drivers.ps1, extracted to {tmp}, removed after.
Source: "{#GamepadStageDir}\*"; DestDir: "{tmp}\gamepad"; Flags: deleteafterinstall recursesubdirs createallsubdirs; Tasks: installgamepad
#endif
#ifdef WithVkLayer
; The HDR Vulkan implicit layer (cdylib + its JSON manifest) laid into {app}\vklayer and registered
; below. The manifest's library_path is ".\pf_vkhdr_layer.dll" (relative to the JSON), so the two
; must live in the same directory.
Source: "{#VkLayerDir}\pf_vkhdr_layer.dll"; DestDir: "{app}\vklayer"; Flags: ignoreversion; Tasks: installhdrlayer
Source: "{#VkLayerDir}\pf_vkhdr_layer.json"; DestDir: "{app}\vklayer"; Flags: ignoreversion; Tasks: installhdrlayer
#endif
[Registry]
#ifdef WithVkLayer
; Register the HDR Vulkan implicit layer system-wide. The 64-bit Vulkan loader reads
; HKLM64\SOFTWARE\Khronos\Vulkan\ImplicitLayers; the value NAME is the manifest path and the DWORD
; DATA is 0 (= enabled). uninsdeletevalue removes just this value on uninstall. The layer is inert
; unless the target display has HDR enabled, and honors DISABLE_PF_VKHDR=1 as a global off-switch.
Root: HKLM64; Subkey: "SOFTWARE\Khronos\Vulkan\ImplicitLayers"; ValueType: dword; ValueName: "{app}\vklayer\pf_vkhdr_layer.json"; ValueData: 0; Flags: uninsdeletevalue; Tasks: installhdrlayer
#endif
[Run]
#ifdef WithDriver