8 Commits

Author SHA1 Message Date
enricobuehler 61c02e695e refactor(windows-host): OwnedHandle for the SCM STOP/SESSION events (Goal-3, last unsafe reduction)
The service's STOP/SESSION manual-reset events were smuggled across the C SCM
control-handler boundary as raw `isize` in `AtomicIsize` statics (the handler is a
capture-free `'static` closure, so it can't hold a non-`Send` `HANDLE` — it has to
reach the events through statics), reconstructed via `load_event`, and explicitly
`CloseHandle`d at `run_service` end.

Replace the raw-`isize` statics with `OnceLock<OwnedHandle>`:
- `run_service` creates each event, wraps it in an `OwnedHandle`, derives a borrowed
  `HANDLE` for `supervise` (unchanged signature), and `set`s the OnceLock (once per
  process) — all BEFORE the handler is registered, so the handler always sees `Some`.
- The handler reads `event_handle(&STOP_EVENT)` (a borrow) and `SetEvent`s it, with a
  defensive `None` guard (matches the old `SetEvent(HANDLE(0))` no-op if it ever fired
  pre-init).
- The events are owned by the OnceLocks for the process lifetime (the service process
  exits right after `run_service` returns, so the OS reaps them at exit). Dropping the
  explicit `CloseHandle` also removes the latent close-then-signal window the old
  statics had (the raw isize lingered after the close).

Deletes the `AtomicIsize`/`Ordering` import + `load_event` + the raw-isize smuggle —
the last host-side raw-handle reduction. Behaviour-preserving (same events, same
signal/wait/reset, same once-per-process init order). Linux check + fmt clean; the
file is #[cfg(windows)] → to be box-validated (compile + a service stop/restart).
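
The init-before-register ordering and the defensive `None` guard described above can be sketched with the standard library alone. This is a minimal illustration, not the service code: `FakeEvent` is a hypothetical stand-in for the `OwnedHandle`-wrapped Win32 event, and `init_event` / `on_stop` are illustrative names (only `event_handle` and `STOP_EVENT` appear in the commit message, and their real signatures are not shown here).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an owned Win32 event (the real code wraps an `OwnedHandle`).
struct FakeEvent {
    signaled: AtomicBool,
}

impl FakeEvent {
    fn new() -> Self {
        Self { signaled: AtomicBool::new(false) }
    }
    /// Plays the role of `SetEvent`.
    fn signal(&self) {
        self.signaled.store(true, Ordering::SeqCst);
    }
    fn is_signaled(&self) -> bool {
        self.signaled.load(Ordering::SeqCst)
    }
}

/// The static the capture-free handler reaches through; it owns the event for the process lifetime.
static STOP_EVENT: OnceLock<FakeEvent> = OnceLock::new();

/// Set the cell once per process. Returns `false` if it was already initialised.
fn init_event(cell: &OnceLock<FakeEvent>) -> bool {
    cell.set(FakeEvent::new()).is_ok()
}

/// Borrow the event if it has been initialised.
fn event_handle(cell: &OnceLock<FakeEvent>) -> Option<&FakeEvent> {
    cell.get()
}

/// Handler body: signal the event, or do nothing if the handler somehow fired before init.
fn on_stop(cell: &OnceLock<FakeEvent>) -> bool {
    match event_handle(cell) {
        Some(ev) => {
            ev.signal();
            true
        }
        None => false,
    }
}
```

Because the cell is never cleared or closed, the handler can never observe a handle that was closed behind it, which is the close-then-signal window the old raw-`isize` statics left open.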

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:22:46 +00:00
enricobuehler 203ad8069d fix(web): library badge shows the actual store, not always "Steam"
The GameCard badge hard-coded steam-vs-custom, so any non-Steam non-custom store
rendered with the "Steam" label. Add storeLabel(store): steam/custom keep their
localized strings, every other store is shown as a capitalized proper noun — so the
new Lutris/Heroic providers (and future ones) surface correctly with no per-store
translation. tsc --noEmit clean.
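
The labeling rule can be sketched as follows. The actual change lives in the web client's TypeScript, which is not part of the diff shown here, so this is a Rust rendering of the rule only; `store_label`, `capitalize`, and the `localized` callback are illustrative names and signatures, not the client's API.

```rust
/// Upper-case the first character and keep the rest unchanged.
fn capitalize(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// `steam` and `custom` go through the (assumed) localization lookup;
/// every other store id is displayed as a capitalized proper noun.
fn store_label(store: &str, localized: impl Fn(&str) -> String) -> String {
    match store {
        "steam" | "custom" => localized(store),
        other => capitalize(other),
    }
}
```

With this shape, a new provider id such as `lutris` or `heroic` gets a sensible badge without adding a translation key.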

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:22:28 +00:00
enricobuehler 5f8c6b6147 feat(library): Lutris + Heroic store providers (Linux)
LutrisProvider reads the local pga.db (rusqlite, read-only/immutable so a running
Lutris can't block us) → installed games, launch via `lutris lutris:rungameid/<id>`,
cover art from Lutris's on-disk cache inlined as data: URLs (no public CDN keyed by a
stable id, unlike Steam/Heroic). HeroicProvider parses Heroic's store_cache JSON —
legendary/gog/nile = Epic+GOG+Amazon in one provider — installed-only with an
install-dir existence cross-check (works around Heroic's gog is_installed bug #2691),
free public CDN cover art, launch via `heroic --no-gui heroic://launch?...` (the
single-instance-Electron gamescope-escape caveat is documented; it still needs live confirmation).

New command_for arms (lutris_id digits-guard, heroic runner+appName-guard) + both
providers wired into all_games(); everything Linux-gated (the launchers are
Linux-only), so the Windows/macOS host build is unaffected. Deps rusqlite (bundled
SQLite, no system dep) + base64 added to the Linux target only. Unit tests with
sqlite/json fixtures (installed-only filtering, CDN-art mapping, launch guards); live
`library` enumeration returns [] gracefully on a box without the launchers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:20:58 +00:00
enricobuehler cd3368fc71 docs(windows-host): KeyedMutexGuard done + record the on-glass build validation
Goal 3: the IDD-push hot-loop KeyedMutexGuard (6585643) landed, and the whole
session's Windows + driver work is now ON-GLASS BUILD-VALIDATED on the RTX box:
host clippy -D warnings clean + driver build clean (the gate that surfaced 11
lints, all fixed in bd05bc8). Only the deferred host P0 lints and the deliberately
left service.rs SCM-handler event smuggling remain, plus an optional latency A/B.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:16:23 +00:00
enricobuehler bd05bc8c30 fix(windows): clippy/build cleanups the on-glass build surfaced (-D warnings)
Built the host crate (`cargo clippy --features nvenc -D warnings`) and the driver
workspace (`cargo build`) on the RTX box — the project's intended Windows gate,
which `cargo check` (what the goal1/§2.5 work used) never runs. It surfaced lint
issues accumulated across the goal1 / §2.5 / this-session Windows work:

- 9× redundant `as *mut c_void` after `.as_raw_handle()` (already `*mut c_void`):
  idd_push.rs (3, this session), service.rs (3, this session), manager.rs (3,
  pre-existing §2.5 — my OwnedHandle work copied the idiom). Removed the casts +
  the now-unused `use std::ffi::c_void` in idd_push.rs / manager.rs (service still
  uses it).
- `if_same_then_else` in session_plan.rs::resolve_topology (pre-existing goal1
  stage 3): collapsed the two `false` arms into one condition (behavior identical).
- `unused_unsafe` in the driver `pod_init!` macro: it expands at call sites already
  inside an `unsafe` block, where its own `unsafe` is redundant. Fixed with
  `#[allow(unused_unsafe)]` (the inner `unsafe` is needed at the non-unsafe sites,
  redundant at the nested ones).

After these, BOTH builds are clean on the box — validating the whole session's
blind Windows + driver work compiles + passes clippy on real hardware.
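
The "behavior identical" claim for the `if_same_then_else` collapse can be checked exhaustively. The sketch below models the five inputs of `resolve_topology` as plain booleans; the `Flags` struct and the function names are hypothetical, and the real code reads these values from config and helper functions rather than a struct.

```rust
/// Hypothetical flattening of the inputs `resolve_topology` consults.
#[derive(Clone, Copy)]
struct Flags {
    no_helper: bool,
    wgc_disabled: bool,
    idd_push: bool,
    force_helper: bool,
    running_as_system: bool,
}

/// Shape before the cleanup: two separate arms that both yield `false`.
fn helper_before(f: Flags) -> bool {
    if f.no_helper || f.wgc_disabled {
        false
    } else if f.idd_push {
        false
    } else {
        f.force_helper || f.running_as_system
    }
}

/// Shape after the cleanup: the two `false` arms merged into one condition.
fn helper_after(f: Flags) -> bool {
    if f.no_helper || f.wgc_disabled || f.idd_push {
        false
    } else {
        f.force_helper || f.running_as_system
    }
}

/// Every combination of the five booleans (2^5 = 32 cases).
fn all_flag_combinations() -> Vec<Flags> {
    (0u8..32)
        .map(|bits| Flags {
            no_helper: bits & 1 != 0,
            wgc_disabled: bits & 2 != 0,
            idd_push: bits & 4 != 0,
            force_helper: bits & 8 != 0,
            running_as_system: bits & 16 != 0,
        })
        .collect()
}
```

Comparing `helper_before` and `helper_after` over `all_flag_combinations()` covers the whole input space, so agreement there is a complete equivalence check for this model.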

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:15:00 +00:00
enricobuehler 658564353c refactor(windows-host): KeyedMutexGuard RAII for the IDD-push consume hot loop (Goal-3, hw-validated)
The IDD-push consume loop acquired the slot's keyed mutex by hand
(`AcquireSync(0,8)` … work … `ReleaseSync(0)`), with a comment warning that a
`?`-return between acquire and release would leak the lock and stall the driver
on that slot — the reason the HDR converter is built *before* the acquire.

Replace with a `KeyedMutexGuard` RAII (acquire → `ReleaseSync` on drop), scoped
to JUST the convert/copy block so the lock releases at the EXACT same point as
before (the driver gets the slot back immediately; not held across the rest of
`try_consume`). Now the release can't be skipped on any early return/panic — the
leak footgun is gone by construction, and the hot loop has no raw `ReleaseSync`.

Behavior/latency-equivalent (same acquire params, same release point). Windows-
only (CI + on-glass gated); to be validated on the RTX box (host clippy build +
a PERF=1 latency A/B vs the shipping binary — the change should show no delta).
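
The leak-proofing argument can be demonstrated without D3D11. In this sketch `FakeKeyedMutex` is a hypothetical stand-in for `IDXGIKeyedMutex` that just tracks whether the lock is held, and `consume` is a toy version of the consume step with an injectable failure; neither is the host's real code.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `IDXGIKeyedMutex`: tracks held state and release count.
struct FakeKeyedMutex {
    held: Cell<bool>,
    releases: Cell<u32>,
}

impl FakeKeyedMutex {
    fn new() -> Self {
        Self { held: Cell::new(false), releases: Cell::new(0) }
    }
    /// Plays the role of `AcquireSync`: fails if the lock is already held.
    fn acquire_sync(&self) -> bool {
        if self.held.get() {
            false
        } else {
            self.held.set(true);
            true
        }
    }
    /// Plays the role of `ReleaseSync`.
    fn release_sync(&self) {
        self.held.set(false);
        self.releases.set(self.releases.get() + 1);
    }
}

/// RAII guard: exists only while the lock is held, releases on drop.
struct Guard<'a> {
    mutex: &'a FakeKeyedMutex,
}

impl<'a> Guard<'a> {
    fn acquire(mutex: &'a FakeKeyedMutex) -> Option<Guard<'a>> {
        mutex.acquire_sync().then(|| Guard { mutex })
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.mutex.release_sync();
    }
}

/// Toy consume step: the guard is scoped to the inner block, so the lock is
/// released at the block's end or on the early `Err` return, whichever comes first.
fn consume(mutex: &FakeKeyedMutex, fail: bool) -> Result<Option<u32>, String> {
    {
        let Some(_lock) = Guard::acquire(mutex) else {
            return Ok(None); // acquire failed: skip the frame, nothing to release
        };
        if fail {
            return Err("convert failed".to_string()); // `_lock` still drops here
        }
    }
    Ok(Some(1))
}
```

The failing path is the interesting one: with a hand-written release, that early return would have left the lock held.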

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:02:05 +00:00
enricobuehler 6b3cbce120 wip: host latency/GPU-contention notes + Windows packaging tweaks
Pre-existing working-tree changes committed to the branch on request: the
gpu-contention investigation doc, host-latency-plan additions, and small
pack-host-installer / stage-pf-vdisplay packaging-script edits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:53:09 +00:00
enricobuehler 739fa74e68 docs(library): game-store provider design (Xbox/Epic/EA, Heroic/Lutris, …)
Web-researched + adversarially-verified design for extending library.rs with more
store providers: the LibraryProvider extension point, the two cross-cutting pieces
(Windows interactive-session launch wiring + a layered artwork strategy), new
LaunchSpec kinds, per-store enumeration/launch/art recipes with priority/effort/
confidence, a phased plan, and the verification corrections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:53:09 +00:00
15 changed files with 1455 additions and 66 deletions
Generated
+90 -1
@@ -1010,6 +1010,18 @@ dependencies = [
"pin-project-lite",
]
[[package]]
name = "fallible-iterator"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2acce4a10f12dc2fb14a218589d4f1f62ef011b2d0cc4b3cb1bba8e94da14649"
[[package]]
name = "fallible-streaming-iterator"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7360491ce676a36bf9bb3c56c1aa791658183a54d2744120f27285738d90465a"
[[package]]
name = "fastbloom"
version = "0.14.1"
@@ -1111,6 +1123,12 @@ version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2"
[[package]]
name = "foldhash"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "77ce24cb58228fbb8aa041425bb1050850ac19177686ea6e0f41a70416f56fdb"
[[package]]
name = "form_urlencoded"
version = "1.2.2"
@@ -1586,7 +1604,16 @@ version = "0.15.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1"
dependencies = [
"foldhash",
"foldhash 0.1.5",
]
[[package]]
name = "hashbrown"
version = "0.16.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100"
dependencies = [
"foldhash 0.2.0",
]
[[package]]
@@ -1594,6 +1621,18 @@ name = "hashbrown"
version = "0.17.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ed5909b6e89a2db4456e54cd5f673791d7eca6732202bbf2a9cc504fe2f9b84a"
dependencies = [
"foldhash 0.2.0",
]
[[package]]
name = "hashlink"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a5081f264ed7adee96ea4b4778b6bb9da0a7228b084587aa3bd3ff05da7c5a3b"
dependencies = [
"hashbrown 0.17.1",
]
[[package]]
name = "heck"
@@ -1966,6 +2005,17 @@ dependencies = [
"system-deps",
]
[[package]]
name = "libsqlite3-sys"
version = "0.38.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f6c19a05435c21ac299d71b6a9c13db3e3f47c520517d58990a462a1397a61db"
dependencies = [
"cc",
"pkg-config",
"vcpkg",
]
[[package]]
name = "linux-raw-sys"
version = "0.12.1"
@@ -2655,6 +2705,7 @@ dependencies = [
"audiopus_sys",
"axum",
"axum-server",
"base64",
"bytemuck",
"cbc",
"ffmpeg-next",
@@ -2678,6 +2729,7 @@ dependencies = [
"rcgen",
"reis",
"rsa",
"rusqlite",
"rustls",
"rustls-pemfile",
"rusty_enet",
@@ -3028,6 +3080,31 @@ dependencies = [
"zeroize",
]
[[package]]
name = "rsqlite-vfs"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c51c9ae4df8a7fba42103df5c621fa3c37eccf3a3c650879e90fc48b11cc192c"
dependencies = [
"hashbrown 0.16.1",
"thiserror 2.0.18",
]
[[package]]
name = "rusqlite"
version = "0.40.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "11438310b19e3109b6446c33d1ed5e889428cf2e278407bc7896bc4aaea43323"
dependencies = [
"bitflags",
"fallible-iterator",
"fallible-streaming-iterator",
"hashlink",
"libsqlite3-sys",
"smallvec",
"sqlite-wasm-rs",
]
[[package]]
name = "rustc-hash"
version = "2.1.2"
@@ -3548,6 +3625,18 @@ dependencies = [
"der",
]
[[package]]
name = "sqlite-wasm-rs"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dc3efc0da82635d7e1ced0053bbbfa8c7ab9645d0bf36ceb4f7127bb85315d75"
dependencies = [
"cc",
"js-sys",
"rsqlite-vfs",
"wasm-bindgen",
]
[[package]]
name = "strsim"
version = "0.11.1"
+7
@@ -85,6 +85,13 @@ wayland-scanner = "0.31"
wayland-backend = "0.3"
# Parse `pw-dump` JSON to find gamescope's PipeWire node (gamescope backend).
serde_json = "1"
# Read the Lutris library DB (`pga.db`) for the Lutris store provider. `bundled` vendors + compiles
# SQLite (cc, already needed for ffmpeg/opus) so there's no system libsqlite3 runtime dependency —
# clean for the deb/rpm/flatpak packaging. Opened read-only/immutable (Lutris may hold it open).
rusqlite = { version = "0.40", features = ["bundled"] }
# Inline Lutris's local cover-art JPEGs as `data:` URLs in the library (Lutris has no public CDN
# keyed by a stable id, unlike Steam/Heroic; a `data:` URL is self-contained — no host-served endpoint).
base64 = "0.22"
# Builds/validates the xkb keymap uploaded to the virtual keyboard + tracks modifier state.
xkbcommon = "0.8"
# The safe `opus` crate is stereo-only; surround (5.1/7.1) needs the libopus *multistream*
@@ -14,7 +14,6 @@ use super::dxgi::{make_device, D3d11Frame, HdrConverter, WinCaptureTarget};
use super::{CapturedFrame, Capturer, FramePayload, PixelFormat};
use anyhow::{bail, Context, Result};
use pf_driver_proto::frame;
use std::ffi::c_void;
use std::os::windows::io::{AsRawHandle, FromRawHandle, OwnedHandle};
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};
@@ -132,6 +131,41 @@ struct HostSlot {
srv: ID3D11ShaderResourceView,
}
/// RAII guard over an [`IDXGIKeyedMutex`]: [`acquire`](Self::acquire) does `AcquireSync(key, timeout)`,
/// `Drop` does `ReleaseSync(key)`. So the lock is released even if the work between acquire and the end
/// of the guard's scope `?`-returns or panics — the "leak the keyed-mutex lock → stall the driver on
/// that slot" footgun the consume loop guards against by hand. Keeps the hot loop free of a raw
/// `ReleaseSync` that a future early-return could skip.
struct KeyedMutexGuard<'a> {
mutex: &'a IDXGIKeyedMutex,
key: u64,
}
impl<'a> KeyedMutexGuard<'a> {
/// Acquire `mutex` at `key`, waiting up to `timeout_ms`. `None` if the acquire times out / errors
/// (the caller skips the frame), so the guard is only ever held when the lock is genuinely held.
fn acquire(
mutex: &'a IDXGIKeyedMutex,
key: u64,
timeout_ms: u32,
) -> Option<KeyedMutexGuard<'a>> {
// SAFETY: `mutex` is a live `IDXGIKeyedMutex` on this thread's immediate-context device.
if unsafe { mutex.AcquireSync(key, timeout_ms) }.is_err() {
return None;
}
Some(KeyedMutexGuard { mutex, key })
}
}
impl Drop for KeyedMutexGuard<'_> {
fn drop(&mut self) {
// SAFETY: we hold `mutex` at `key` (acquired in `acquire`, never released elsewhere); release it.
unsafe {
let _ = self.mutex.ReleaseSync(self.key);
}
}
}
/// Creates + owns the shared ring; yields the driver's frames as [`FramePayload::D3d11`].
pub struct IddPushCapturer {
device: ID3D11Device,
@@ -365,7 +399,7 @@ impl IddPushCapturer {
// Own the mapping handle so it (and its view) free via `MappedSection` RAII even on bail.
let map = OwnedHandle::from_raw_handle(map.0 as _);
let view = MapViewOfFile(
HANDLE(map.as_raw_handle() as *mut c_void),
HANDLE(map.as_raw_handle()),
FILE_MAP_ALL_ACCESS,
0,
0,
@@ -415,7 +449,7 @@ impl IddPushCapturer {
// Own the mapping handle so it (and its view) free via `MappedSection` RAII.
let dm = OwnedHandle::from_raw_handle(dm.0 as _);
let dv = MapViewOfFile(
HANDLE(dm.as_raw_handle() as *mut c_void),
HANDLE(dm.as_raw_handle()),
FILE_MAP_ALL_ACCESS,
0,
0,
@@ -783,20 +817,26 @@ impl IddPushCapturer {
// ~3 ms encode — NVENC reads the host out-ring slot, not the keyed-mutex slot), so the driver gets
// the slot back immediately and the encode of the PREVIOUS frame overlaps this convert.
let s = &self.slots[slot];
if unsafe { s.mutex.AcquireSync(0, 8) }.is_err() {
return Ok(None);
}
unsafe {
if self.display_hdr {
// Sample the FP16 slot's SRV directly (no scratch copy) → BT.2020 PQ Rgb10a2.
if let Some(conv) = self.hdr_conv.as_ref() {
conv.convert(&self.context, &s.srv, &out_rtv, self.width, self.height);
// Acquire the slot's keyed mutex via a RAII guard, scoped to JUST the convert/copy below so it
// releases at the same point as the old hand-written `ReleaseSync` (the driver gets the slot back
// immediately, NOT held across the rest of `try_consume`) — but now leak-proof on any early return.
{
let Some(_lock) = KeyedMutexGuard::acquire(&s.mutex, 0, 8) else {
return Ok(None);
};
// SAFETY: convert/copy on the owning (encode) thread's immediate context, holding the slot lock.
unsafe {
if self.display_hdr {
// Sample the FP16 slot's SRV directly (no scratch copy) → BT.2020 PQ Rgb10a2.
if let Some(conv) = self.hdr_conv.as_ref() {
conv.convert(&self.context, &s.srv, &out_rtv, self.width, self.height);
}
} else {
// SDR: the slot is already 8-bit BGRA — one copy into the out-ring (hidden by pipelining).
self.context.CopyResource(&out, &s.tex);
}
} else {
// SDR: the slot is already 8-bit BGRA — one copy into the out-ring (hidden by pipelining).
self.context.CopyResource(&out, &s.tex);
}
let _ = s.mutex.ReleaseSync(0);
// `_lock` drops here → `ReleaseSync(0)`.
}
self.out_idx = (i + 1) % self.out_ring.len();
self.last_seq = seq;
@@ -895,9 +935,7 @@ impl Capturer for IddPushCapturer {
fn next_frame(&mut self) -> Result<CapturedFrame> {
let deadline = Instant::now() + Duration::from_secs(20);
loop {
let _ = unsafe {
WaitForSingleObject(HANDLE(self.event.as_raw_handle() as *mut c_void), 16)
};
let _ = unsafe { WaitForSingleObject(HANDLE(self.event.as_raw_handle()), 16) };
if let Some(f) = self.try_consume()? {
return Ok(f);
}
+405
@@ -256,6 +256,298 @@ fn is_steam_tool(appid: u32, name: &str) -> bool {
|| n.contains("steamvr")
}
// ---------------------------------------------------------------------------------------
// Lutris (Linux) — reads the local `pga.db` (no auth, no network). One provider covers
// everything Lutris manages: Wine/Proton games, GOG/Epic/Battle.net installs, emulators.
// ---------------------------------------------------------------------------------------
/// Reads the **local** Lutris library DB (`pga.db`) — no network. Installed titles only; cover art
/// from Lutris's on-disk cache, inlined as `data:` URLs. Linux-only (Lutris is Linux-only).
#[cfg(target_os = "linux")]
pub struct LutrisProvider;
#[cfg(target_os = "linux")]
impl LibraryProvider for LutrisProvider {
fn store(&self) -> &'static str {
"lutris"
}
fn list(&self) -> Vec<GameEntry> {
let Some(db) = lutris_db() else {
return Vec::new();
};
lutris_games(&db).unwrap_or_else(|e| {
tracing::warn!(error = %e, db = %db.display(), "lutris pga.db read failed — skipping");
Vec::new()
})
}
}
/// The first existing Lutris `pga.db`: XDG data dir, the classic `~/.local/share`, or Flatpak.
#[cfg(target_os = "linux")]
fn lutris_db() -> Option<PathBuf> {
let mut candidates = Vec::new();
if let Some(d) = std::env::var_os("XDG_DATA_HOME") {
candidates.push(PathBuf::from(d).join("lutris/pga.db"));
}
if let Some(home) = std::env::var_os("HOME").map(PathBuf::from) {
candidates.push(home.join(".local/share/lutris/pga.db"));
candidates.push(home.join(".var/app/net.lutris.Lutris/data/lutris/pga.db"));
}
candidates.into_iter().find(|p| p.is_file())
}
/// Installed games from a Lutris `pga.db`. Opened **read-only + immutable** (via a SQLite URI) so a
/// running Lutris holding the file can't make us block or fail, and we never write to it.
#[cfg(target_os = "linux")]
fn lutris_games(db: &Path) -> rusqlite::Result<Vec<GameEntry>> {
use rusqlite::OpenFlags;
// `immutable=1` treats the DB as read-only-and-unchanging → no locking against a live Lutris. The
// path goes into the URI literally; a `?`/`#` in it (vanishingly rare on Linux) would mis-parse,
// so fall back to a plain read-only open in that case.
let path = db.to_string_lossy();
let conn = if path.contains('?') || path.contains('#') {
rusqlite::Connection::open_with_flags(db, OpenFlags::SQLITE_OPEN_READ_ONLY)?
} else {
rusqlite::Connection::open_with_flags(
format!("file:{path}?immutable=1"),
OpenFlags::SQLITE_OPEN_READ_ONLY | OpenFlags::SQLITE_OPEN_URI,
)?
};
let mut stmt = conn.prepare(
"SELECT id, slug, name FROM games \
WHERE installed = 1 AND name IS NOT NULL AND name <> '' \
ORDER BY name COLLATE NOCASE",
)?;
let rows = stmt.query_map([], |row| {
Ok((
row.get::<_, i64>(0)?,
row.get::<_, Option<String>>(1)?,
row.get::<_, String>(2)?,
))
})?;
let mut games = Vec::new();
for (id, slug, name) in rows.flatten() {
games.push(GameEntry {
id: format!("lutris:{id}"),
store: "lutris".into(),
title: name,
art: slug.as_deref().map(lutris_art).unwrap_or_default(),
launch: Some(LaunchSpec {
kind: "lutris_id".into(),
value: id.to_string(),
}),
});
}
Ok(games)
}
/// Lutris cover art (local files keyed by slug) inlined as `data:` URLs — Lutris has no public CDN
/// keyed by a stable id (unlike Steam/Heroic), and `Artwork` fields are URLs the client fetches, so a
/// self-contained `data:` URL needs no host-served endpoint. `coverart` → portrait, `banners` → header.
#[cfg(target_os = "linux")]
fn lutris_art(slug: &str) -> Artwork {
Artwork {
portrait: lutris_image("coverart", slug),
header: lutris_image("banners", slug),
..Default::default()
}
}
/// Find `<kind>/<slug>.jpg` across the current (0.5.18+), legacy (`~/.cache`), and Flatpak Lutris
/// dirs and inline it as `data:image/jpeg;base64,…`. Skips a missing or implausibly large file (a
/// 1 MiB cap bounds the catalog JSON so a few big files can't bloat it).
#[cfg(target_os = "linux")]
fn lutris_image(kind: &str, slug: &str) -> Option<String> {
use base64::Engine as _;
let home = std::env::var_os("HOME").map(PathBuf::from)?;
let roots = [
home.join(".local/share/lutris"),
home.join(".cache/lutris"),
home.join(".var/app/net.lutris.Lutris/data/lutris"),
home.join(".var/app/net.lutris.Lutris/cache/lutris"),
];
for root in roots {
let p = root.join(kind).join(format!("{slug}.jpg"));
let Ok(meta) = std::fs::metadata(&p) else {
continue;
};
if meta.len() == 0 || meta.len() > 1024 * 1024 {
continue;
}
if let Ok(bytes) = std::fs::read(&p) {
let enc = base64::engine::general_purpose::STANDARD.encode(&bytes);
return Some(format!("data:image/jpeg;base64,{enc}"));
}
}
None
}
// ---------------------------------------------------------------------------------------
// Heroic (Linux) — Epic + GOG + Amazon in one provider. Reads Heroic's `store_cache` JSON
// (no auth); cover art is already public Epic/GOG/Amazon CDN URLs the client fetches directly.
// ---------------------------------------------------------------------------------------
/// Reads Heroic Games Launcher's local library cache. One provider surfaces all three of Heroic's
/// backends (legendary=Epic, gog=GOG, nile=Amazon). Linux-only for now (Heroic on Windows uses a
/// different config path and the launch path isn't wired there yet).
#[cfg(target_os = "linux")]
pub struct HeroicProvider;
#[cfg(target_os = "linux")]
impl LibraryProvider for HeroicProvider {
fn store(&self) -> &'static str {
"heroic"
}
fn list(&self) -> Vec<GameEntry> {
let Some(root) = heroic_root() else {
return Vec::new();
};
let mut games = Vec::new();
// (cache file, runner id, the electron-store data key holding the games array)
for (file, runner, key) in [
("legendary_library.json", "legendary", "library"),
("gog_library.json", "gog", "games"),
("nile_library.json", "nile", "library"),
] {
let path = root.join("store_cache").join(file);
match heroic_games(&path, runner, key) {
Ok(mut g) => games.append(&mut g),
Err(e) => {
tracing::debug!(error = %e, file, "heroic store_cache not read (store unused?)")
}
}
}
games
}
}
/// The first existing Heroic config root: `$XDG_CONFIG_HOME/heroic`, classic `~/.config/heroic`, or
/// the Flatpak path.
#[cfg(target_os = "linux")]
fn heroic_root() -> Option<PathBuf> {
let mut candidates = Vec::new();
if let Some(d) = std::env::var_os("XDG_CONFIG_HOME") {
candidates.push(PathBuf::from(d).join("heroic"));
}
if let Some(home) = std::env::var_os("HOME").map(PathBuf::from) {
candidates.push(home.join(".config/heroic"));
candidates.push(home.join(".var/app/com.heroicgameslauncher.hgl/config/heroic"));
}
candidates.into_iter().find(|p| p.is_dir())
}
/// Parse one runner's `store_cache/*_library.json` (an electron-store object whose `key` holds the
/// games array). Keeps only installed titles whose install dir still exists (the latter works around
/// Heroic's gog `is_installed` bug, #2691). Art comes straight from the cached public CDN URLs.
#[cfg(target_os = "linux")]
fn heroic_games(path: &Path, runner: &str, key: &str) -> anyhow::Result<Vec<GameEntry>> {
let raw = std::fs::read_to_string(path)?;
let root: serde_json::Value = serde_json::from_str(&raw)?;
let arr = root
.get(key)
.and_then(|v| v.as_array())
.ok_or_else(|| anyhow::anyhow!("no '{key}' array in {}", path.display()))?;
let mut games = Vec::new();
for g in arr {
if !g
.get("is_installed")
.and_then(|v| v.as_bool())
.unwrap_or(false)
{
continue; // the cache also lists owned-but-not-installed titles
}
let install_ok = g
.get("install")
.and_then(|i| i.get("install_path"))
.and_then(|p| p.as_str())
.is_some_and(|p| Path::new(p).is_dir());
if !install_ok {
continue;
}
let Some(app_name) = g
.get("app_name")
.and_then(|v| v.as_str())
.filter(|s| !s.is_empty())
else {
continue;
};
let title = g
.get("title")
.and_then(|v| v.as_str())
.unwrap_or(app_name)
.to_string();
// Only emit http(s) art (sideloaded titles can carry local file:// paths the client can't fetch).
let http = |k: &str| {
g.get(k)
.and_then(|v| v.as_str())
.filter(|s| s.starts_with("http://") || s.starts_with("https://"))
.map(String::from)
};
let art = Artwork {
portrait: http("art_square"),
header: http("art_cover"),
hero: http("art_background").or_else(|| http("art_cover")),
logo: http("art_logo"),
};
games.push(GameEntry {
id: format!("heroic:{runner}:{app_name}"),
store: "heroic".into(),
title,
art,
launch: Some(LaunchSpec {
kind: "heroic".into(),
value: format!("{runner}:{app_name}"),
}),
});
}
Ok(games)
}
/// Map a `heroic` LaunchSpec value (`<runner>:<appName>`) to the Heroic launch command, run nested in
/// gamescope. The host owns this mapping; the client only ever sends the id. CAVEAT: Heroic is a
/// single-instance Electron app — in a fresh per-session gamescope it boots, launches the game (which
/// renders into that gamescope) and stays hidden via `--no-gui`; but if a Heroic GUI is ALREADY
/// running on the box, the spawned process forwards the URI and exits, which would tear the session
/// down. The validated path is the fresh-session case; needs live confirmation on a box with Heroic.
#[cfg(target_os = "linux")]
fn heroic_command(value: &str) -> Option<String> {
let (runner, app) = value.split_once(':')?;
if !matches!(runner, "legendary" | "gog" | "nile") {
return None;
}
// appName charset (Epic alnum, GOG digits, Amazon alnum) — keep the URI a single safe token.
if app.is_empty()
|| !app
.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
{
return None;
}
let prefix = heroic_launch_prefix()?;
// No quotes: gamescope spawns the app by `split_whitespace()`, and the URI has no spaces (appName
// is validated above) so it stays a single argv token; `&` is fine (exec'd, not shell-parsed).
Some(format!(
"{prefix} --no-gui heroic://launch?appName={app}&runner={runner}"
))
}
/// How to invoke Heroic: the native `heroic` binary if on `PATH`, else the Flatpak app if its data
/// root is present. `None` ⇒ Heroic not found, so no launch command.
#[cfg(target_os = "linux")]
fn heroic_launch_prefix() -> Option<String> {
let on_path = std::env::var_os("PATH")
.is_some_and(|paths| std::env::split_paths(&paths).any(|d| d.join("heroic").is_file()));
if on_path {
return Some("heroic".into());
}
let flatpak = std::env::var_os("HOME")
.map(PathBuf::from)
.is_some_and(|h| h.join(".var/app/com.heroicgameslauncher.hgl").is_dir());
flatpak.then(|| "flatpak run com.heroicgameslauncher.hgl".into())
}
// ---------------------------------------------------------------------------------------
// Custom store (user-curated entries, persisted + CRUD'd via the mgmt API)
// ---------------------------------------------------------------------------------------
@@ -415,6 +707,13 @@ fn command_for(spec: &LaunchSpec) -> Option<String> {
match spec.kind.as_str() {
"steam_appid" => valid_steam_appid(&spec.value)
.then(|| format!("steam steam://rungameid/{}", spec.value)),
// Lutris: a digits-only pga.db game id (same guard as steam_appid) → its run URI.
#[cfg(target_os = "linux")]
"lutris_id" => (!spec.value.is_empty() && spec.value.bytes().all(|b| b.is_ascii_digit()))
.then(|| format!("lutris lutris:rungameid/{}", spec.value)),
// Heroic: `<runner>:<appName>` → the validated heroic://launch command (see heroic_command).
#[cfg(target_os = "linux")]
"heroic" => heroic_command(&spec.value),
// Trusted: the command comes from the host's own custom store, never the client.
"command" => (!spec.value.trim().is_empty()).then(|| spec.value.clone()),
_ => None,
@@ -499,6 +798,13 @@ fn steam_exe() -> Option<std::path::PathBuf> {
/// The full library: every store's titles merged + the custom entries, sorted by title.
pub fn all_games() -> Vec<GameEntry> {
let mut games = SteamProvider.list();
// The Lutris + Heroic providers are Linux-only (their launchers are); on other hosts the library
// is Steam + custom. Each provider is best-effort (empty when its store isn't present).
#[cfg(target_os = "linux")]
{
games.extend(LutrisProvider.list());
games.extend(HeroicProvider.list());
}
games.extend(load_custom().into_iter().map(GameEntry::from));
games.sort_by_key(|g| g.title.to_lowercase());
games
@@ -616,6 +922,105 @@ mod tests {
assert_eq!(g.store, "custom");
}
#[cfg(target_os = "linux")]
#[test]
fn lutris_games_reads_installed_only() {
use rusqlite::Connection;
let dir = std::env::temp_dir().join(format!("pf-lutris-test-{}", std::process::id()));
std::fs::create_dir_all(&dir).unwrap();
let db = dir.join("pga.db");
{
let c = Connection::open(&db).unwrap();
c.execute_batch(
"CREATE TABLE games (id INTEGER PRIMARY KEY, slug TEXT, name TEXT, installed INTEGER);
INSERT INTO games (id,slug,name,installed) VALUES (42,'elden-ring','ELDEN RING',1);
INSERT INTO games (id,slug,name,installed) VALUES (7,'owned','Owned Only',0);
INSERT INTO games (id,slug,name,installed) VALUES (9,'noname',NULL,1);",
)
.unwrap();
}
let games = lutris_games(&db).unwrap();
std::fs::remove_dir_all(&dir).ok();
// Only the installed, named row; the uninstalled + NULL-name rows are filtered out.
assert_eq!(games.len(), 1);
assert_eq!(games[0].id, "lutris:42");
assert_eq!(games[0].store, "lutris");
assert_eq!(games[0].title, "ELDEN RING");
let l = games[0].launch.as_ref().unwrap();
assert_eq!((l.kind.as_str(), l.value.as_str()), ("lutris_id", "42"));
}
#[cfg(target_os = "linux")]
#[test]
fn heroic_games_parses_installed_with_cdn_art() {
let dir = std::env::temp_dir().join(format!("pf-heroic-test-{}", std::process::id()));
let install = dir.join("game-install");
std::fs::create_dir_all(&install).unwrap();
let path = dir.join("legendary_library.json");
let json = format!(
r#"{{"library":[
{{"app_name":"Quail","title":"Quail","is_installed":true,
"install":{{"install_path":"{inst}"}},
"art_square":"https://cdn/quail_tall.jpg","art_cover":"https://cdn/quail_wide.jpg",
"art_logo":"file:///local/logo.png"}},
{{"app_name":"Owned","title":"Owned Only","is_installed":false,
"install":{{"install_path":"{inst}"}}}}
]}}"#,
inst = install.display()
);
std::fs::write(&path, json).unwrap();
let games = heroic_games(&path, "legendary", "library").unwrap();
std::fs::remove_dir_all(&dir).ok();
assert_eq!(games.len(), 1); // the uninstalled title is filtered out
assert_eq!(games[0].id, "heroic:legendary:Quail");
assert_eq!(games[0].title, "Quail");
assert_eq!(
games[0].art.portrait.as_deref(),
Some("https://cdn/quail_tall.jpg")
);
assert_eq!(
games[0].art.header.as_deref(),
Some("https://cdn/quail_wide.jpg")
);
assert!(games[0].art.logo.is_none()); // file:// art is dropped (client can't fetch it)
let l = games[0].launch.as_ref().unwrap();
assert_eq!(
(l.kind.as_str(), l.value.as_str()),
("heroic", "legendary:Quail")
);
}
#[cfg(target_os = "linux")]
#[test]
fn command_for_lutris_and_heroic_guards() {
// Lutris: digits → its run URI; a non-numeric id (injection attempt) is rejected.
assert_eq!(
command_for(&LaunchSpec {
kind: "lutris_id".into(),
value: "42".into()
})
.as_deref(),
Some("lutris lutris:rungameid/42")
);
assert_eq!(
command_for(&LaunchSpec {
kind: "lutris_id".into(),
value: "42; rm -rf ~".into()
}),
None
);
// Heroic guards (independent of whether Heroic is installed): bad runner / appName → None.
assert_eq!(heroic_command("badrunner:Quail"), None);
assert_eq!(heroic_command("legendary:bad name"), None);
assert_eq!(heroic_command("nile:"), None);
// When Heroic IS resolvable (a dev box), a valid id yields the launch URI; on CI (no Heroic)
// it's None — assert the URI shape only when a launcher prefix exists.
if let Some(cmd) = heroic_command("legendary:Quail-1.2_x") {
assert!(cmd.contains("heroic://launch?appName=Quail-1.2_x&runner=legendary"));
assert!(cmd.contains("--no-gui"));
}
}
#[cfg(windows)]
#[test]
fn windows_launch_for_maps_and_guards() {
+1 -3
@@ -138,9 +138,7 @@ fn resolve_topology() -> SessionTopology {
let cfg = crate::config::config();
// `NO_HELPER`/`NO_WGC` force single-process; IDD-push captures in-process in Session 0 (no helper);
// otherwise the helper runs when forced or when we're SYSTEM (in-process WGC can't activate there).
-let helper = if cfg.no_helper || crate::capture::wgc_disabled() {
-false
-} else if cfg.idd_push {
+let helper = if cfg.no_helper || crate::capture::wgc_disabled() || cfg.idd_push {
false
} else {
cfg.force_helper || crate::capture::wgc_relay::running_as_system()
@@ -13,7 +13,6 @@
//! its `Drop` releases the refcount (a *stale* lease — its monitor was preempted + recreated under it —
//! is a no-op, so it can never tear down the live monitor).
-use std::ffi::c_void;
use std::os::windows::io::{AsRawHandle, OwnedHandle};
use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
use std::sync::{Arc, Mutex, Once, OnceLock};
@@ -160,11 +159,11 @@ impl VirtualDisplayManager {
/// double-open.
fn ensure_device(&self) -> Result<HANDLE> {
if let Some(d) = self.device.get() {
-return Ok(HANDLE(d.as_raw_handle() as *mut c_void));
+return Ok(HANDLE(d.as_raw_handle()));
}
let (handle, watchdog_s) = unsafe { self.driver.open()? };
self.watchdog_s.store(watchdog_s, Ordering::Relaxed);
-let raw = HANDLE(handle.as_raw_handle() as *mut c_void);
+let raw = HANDLE(handle.as_raw_handle());
let _ = self.device.set(Arc::new(handle));
Ok(raw)
}
@@ -174,7 +173,7 @@ impl VirtualDisplayManager {
fn device_handle(&self) -> Option<HANDLE> {
self.device
.get()
-.map(|d| HANDLE(d.as_raw_handle() as *mut c_void))
+.map(|d| HANDLE(d.as_raw_handle()))
}
/// Open + initialise the backend (validates the driver is present). Mirrors the old
+35 -25
@@ -25,7 +25,7 @@ use anyhow::{bail, Context, Result};
use std::ffi::{c_void, OsString};
use std::os::windows::io::{AsRawHandle, FromRawHandle, OwnedHandle};
use std::path::PathBuf;
-use std::sync::atomic::{AtomicIsize, Ordering};
+use std::sync::OnceLock;
use std::time::Duration;
use windows::core::{PCWSTR, PWSTR};
@@ -65,18 +65,19 @@ const SERVICE_DESCRIPTION: &str =
/// legacy GCM nonce reuse — security-review #5/#9; native clients only).
const DEFAULT_HOST_CMD: &str = "serve --gamestream";
-/// Event handles shared between the SCM control handler (which signals them) and the supervision loop
-/// (which waits on them). Stored as raw `isize` so the `'static + Send` handler can reach them without
-/// a non-`Send` `HANDLE` capture. Set once in `run_service`.
-///
-/// Intentionally left as raw-`isize` statics + their explicit `CloseHandle` in `run_service` (not
-/// `OwnedHandle`): they're smuggled across the C SCM control-handler boundary, so converting them is a
-/// separate, riskier redesign out of scope for the process/job-handle ownership change here.
-static STOP_EVENT: AtomicIsize = AtomicIsize::new(0);
-static SESSION_EVENT: AtomicIsize = AtomicIsize::new(0);
+/// The STOP and SESSION manual-reset events, shared between the SCM control handler (a capture-free
+/// `'static` closure that SIGNALS them) and the supervision loop (which WAITS on them). They live in
+/// `OnceLock`s — a static the handler can reach without capturing a non-`Send` `HANDLE` — and each owns
+/// its handle (`OwnedHandle`) for the process lifetime: the service process exits right after
+/// `run_service` returns, so the OS reaps them at exit, and owning them past the handler's last possible
+/// call avoids the close-then-signal window the old raw-`isize` statics had. Set once, in `run_service`.
+static STOP_EVENT: OnceLock<OwnedHandle> = OnceLock::new();
+static SESSION_EVENT: OnceLock<OwnedHandle> = OnceLock::new();
-fn load_event(a: &AtomicIsize) -> HANDLE {
-HANDLE(a.load(Ordering::Relaxed) as *mut c_void)
+/// Borrow an event's handle for the control handler's `SetEvent`. `None` until `run_service` creates the
+/// events — but the handler is registered only AFTER they're set, so in practice this is always `Some`.
+fn event_handle(ev: &OnceLock<OwnedHandle>) -> Option<HANDLE> {
+ev.get().map(|h| HANDLE(h.as_raw_handle()))
}
/// Dispatch `service <sub>`.
@@ -204,12 +205,19 @@ fn run_service() -> Result<()> {
// Two manual-reset events: STOP (set once, never reset) and SESSION (set on a console
// connect/disconnect, reset by the supervisor after it reacts).
-let stop =
+let stop_raw =
unsafe { CreateEventW(None, true, false, PCWSTR::null()) }.context("CreateEvent stop")?;
-let session = unsafe { CreateEventW(None, true, false, PCWSTR::null()) }
+let session_raw = unsafe { CreateEventW(None, true, false, PCWSTR::null()) }
.context("CreateEvent session")?;
-STOP_EVENT.store(stop.0 as isize, Ordering::Relaxed);
-SESSION_EVENT.store(session.0 as isize, Ordering::Relaxed);
+// Own each event handle (the OS reaps them at process exit); the handler reaches them through the
+// OnceLocks, while `supervise` waits on the borrowed `HANDLE`s. SAFETY: each is a fresh CreateEventW
+// handle we own — take ownership exactly once.
+let stop_owned = unsafe { OwnedHandle::from_raw_handle(stop_raw.0) };
+let session_owned = unsafe { OwnedHandle::from_raw_handle(session_raw.0) };
+let stop = HANDLE(stop_owned.as_raw_handle());
+let session = HANDLE(session_owned.as_raw_handle());
+let _ = STOP_EVENT.set(stop_owned); // set once per process
+let _ = SESSION_EVENT.set(session_owned);
// The control handler captures nothing — it reaches the events through the statics, so it stays
// `Fn + Send + 'static`. Session lock/unlock are handled inside the host (DesktopWatcher), so we
@@ -217,7 +225,9 @@ fn run_service() -> Result<()> {
let handler = move |control| -> ServiceControlHandlerResult {
match control {
ServiceControl::Stop | ServiceControl::Preshutdown | ServiceControl::Shutdown => {
-unsafe { SetEvent(load_event(&STOP_EVENT)) }.ok();
+if let Some(h) = event_handle(&STOP_EVENT) {
+unsafe { SetEvent(h) }.ok();
+}
ServiceControlHandlerResult::NoError
}
ServiceControl::SessionChange(param) => {
@@ -226,7 +236,9 @@ fn run_service() -> Result<()> {
param.reason,
ConsoleConnect | ConsoleDisconnect | SessionLogon
) {
-unsafe { SetEvent(load_event(&SESSION_EVENT)) }.ok();
+if let Some(h) = event_handle(&SESSION_EVENT) {
+unsafe { SetEvent(h) }.ok();
+}
}
ServiceControlHandlerResult::NoError
}
@@ -263,10 +275,8 @@ fn run_service() -> Result<()> {
controls_accepted: ServiceControlAccept::empty(),
..running
});
-unsafe {
-let _ = CloseHandle(stop);
-let _ = CloseHandle(session);
-}
+// The STOP/SESSION events stay owned by the OnceLocks for the process lifetime (the OS reaps them at
+// exit); NOT closing them while the SCM handler could still fire avoids a use-after-close.
result
}
@@ -306,7 +316,7 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
}
// BORROW the owned job handle for AssignProcessToJobObject inside spawn_host.
-let job_h = HANDLE(job.as_raw_handle() as *mut c_void);
+let job_h = HANDLE(job.as_raw_handle());
let child = match unsafe { spawn_host(session, &cmdline, &workdir, job_h) } {
Ok(child) => child,
Err(e) => {
@@ -323,7 +333,7 @@ fn supervise(stop: HANDLE, session_ev: HANDLE) -> Result<()> {
// `proc_h` is a plain copy that does NOT close it). `child` owns the process + thread handles
// and auto-closes BOTH when it drops — at the end of this iteration, on `continue`, or on
// `break` — so every match arm below only stops/terminates and lets the drop do the closing.
-let proc_h = HANDLE(child.process.as_raw_handle() as *mut c_void);
+let proc_h = HANDLE(child.process.as_raw_handle());
// Wait on stop / session-change / child-exit.
let reason = wait_any(&[stop, session_ev, proc_h], INFINITE);
@@ -403,7 +413,7 @@ unsafe fn make_job() -> Result<OwnedHandle> {
info.BasicLimitInformation.LimitFlags =
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE | JOB_OBJECT_LIMIT_BREAKAWAY_OK;
SetInformationJobObject(
-HANDLE(job.as_raw_handle() as *mut c_void),
+HANDLE(job.as_raw_handle()),
JobObjectExtendedLimitInformation,
&info as *const _ as *const c_void,
std::mem::size_of::<JOBOBJECT_EXTENDED_LIMIT_INFORMATION>() as u32,
+377
@@ -0,0 +1,377 @@
# Game library: more game stores
Status: **design / not started** · Author research: web-backed, adversarially verified (2026-06-26).
Goal: extend the unified game library so it enumerates and launches titles from more stores —
on **Windows** Xbox / Game Pass, Epic, EA app (and GOG / Ubisoft / Battle.net / Amazon);
on **Linux** Heroic (Epic+GOG+Amazon), Lutris, and a `.desktop`/Flatpak catch-all.
---
## 1. Where the extension point already is
The library lives in [`crates/punktfunk-host/src/library.rs`](../crates/punktfunk-host/src/library.rs)
and is already a plug-in system — its own doc comment names these exact targets. Adding a store is
a new `LibraryProvider`, not a rewrite.
```rust
pub trait LibraryProvider {
fn store(&self) -> &'static str; // "steam", ...
fn list(&self) -> Vec<GameEntry>; // best-effort: empty (not Err) if the store is absent
}
pub struct GameEntry { id: String /* "<store>:<localid>" */, store, title, art: Artwork, launch: Option<LaunchSpec> }
pub struct Artwork { portrait, hero, logo, header: Option<String> } // URLs the CLIENT fetches
pub struct LaunchSpec{ kind: String, value: String } // today: "steam_appid" | "command"
```
Today: `SteamProvider` (reads local `.acf` / `.vdf` files — **no API key, no network**) plus a
user-curated `custom` store. `all_games()` merges them; `launch_command(id)` resolves a
store-qualified id **against the host's own library** and maps the `LaunchSpec` to a shell command,
with injection guards (`steam_appid` is validated digits-only; the client never sends a raw command).
**The "read the launcher's own on-disk files, no auth" approach is the gold standard we replicate per store.**
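A minimal sketch of the plug-in shape described above, under stated assumptions: the field types are guessed from the shorthand, `Artwork` is omitted for brevity, and `FixedProvider`, `merge`, and `find_launch` are hypothetical names standing in for a real on-disk provider, `all_games()`, and the id lookup inside `launch_command(id)`.

```rust
// Shapes follow the doc's shorthand; concrete field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct LaunchSpec {
    pub kind: String,
    pub value: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct GameEntry {
    pub id: String, // "<store>:<localid>"
    pub store: String,
    pub title: String,
    pub launch: Option<LaunchSpec>,
}

pub trait LibraryProvider {
    fn store(&self) -> &'static str;
    fn list(&self) -> Vec<GameEntry>; // best-effort: empty if the store is absent
}

/// Hypothetical provider backed by fixed data instead of launcher files.
pub struct FixedProvider {
    store: &'static str,
    kind: &'static str,
    titles: Vec<(&'static str, &'static str)>, // (local id, title)
}

impl LibraryProvider for FixedProvider {
    fn store(&self) -> &'static str {
        self.store
    }
    fn list(&self) -> Vec<GameEntry> {
        self.titles
            .iter()
            .map(|(local, title)| GameEntry {
                id: format!("{}:{}", self.store, local),
                store: self.store.to_string(),
                title: title.to_string(),
                launch: Some(LaunchSpec {
                    kind: self.kind.to_string(),
                    value: local.to_string(),
                }),
            })
            .collect()
    }
}

/// Concatenate every provider's list; an absent store simply contributes nothing.
pub fn merge(providers: &[Box<dyn LibraryProvider>]) -> Vec<GameEntry> {
    providers.iter().flat_map(|p| p.list()).collect()
}

/// Resolve a client-sent id against the host's own list. The client never supplies the spec.
pub fn find_launch<'a>(games: &'a [GameEntry], id: &str) -> Option<&'a LaunchSpec> {
    games.iter().find(|g| g.id == id)?.launch.as_ref()
}
```

The lookup returns a host-derived `LaunchSpec` or nothing, which is the property the security model relies on.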
Surfaces touched by adding stores:
- `library.rs` — new providers (the bulk of the work is small per store).
- [`mgmt.rs`](../crates/punktfunk-host/src/mgmt.rs) `:1138` — serves `/library`; OpenAPI-generated TS client picks up new stores as data.
- [`web/src/sections/Library/view.tsx`](../web/src/sections/Library/view.tsx) — the grid; **store badge is hard-coded** steam-vs-custom, needs generalizing per `game.store`.
- Launch wiring: [`punktfunk1.rs`](../crates/punktfunk-host/src/punktfunk1.rs) `:573` (native) and [`gamestream/stream.rs`](../crates/punktfunk-host/src/gamestream/stream.rs) `:122` (Moonlight).
> The legacy GameStream `apps.json` ([`gamestream/apps.rs`](../crates/punktfunk-host/src/gamestream/apps.rs))
> is a **separate** Moonlight surface (session recipes: compositor + nested command) and stays as-is.
---
## 2. The two cross-cutting pieces (this is the real work)
Per-store enumeration is mostly easy. Two shared problems gate everything — especially Windows.
### 2a. Launch abstraction + the Windows launch gap
- **Linux** runs the chosen title as a shell command **nested in the per-session gamescope**
(`set_launch_command` / `PUNKTFUNK_GAMESCOPE_APP`). Works today.
- **Windows** captures the whole desktop (DXGI/WGC); there is no nesting, and
`VirtualDisplay::set_launch_command` is a **no-op** ([`vdisplay.rs:57`](../crates/punktfunk-host/src/vdisplay.rs)).
So on Windows **nothing is auto-started** — the user just sees the desktop.
**Plan.** Stop returning a single Linux shell string from `command_for`; introduce an internal enum and
an OS-aware resolver:
```rust
enum LaunchAction { Shell(String), Spawn { exe: PathBuf, args: Vec<String>, workdir: Option<PathBuf> } }
fn resolve_launch(&LaunchSpec) -> Option<LaunchAction> // cfg-aware
fn launch_command(id) -> Option<String> // Linux: thin Shell wrapper (back-compat)
#[cfg(windows)] fn launch_title(id) -> Result<()> // resolve Spawn + run in interactive session
```
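As a rough illustration of the resolver above, the sketch below maps two of the planned kinds to a `LaunchAction`. It is not cfg-gated so it can be read and run anywhere; the real resolver would be. The helper names `digits_only`, `token`, and `aumid_ok` are hypothetical.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum LaunchAction {
    Shell(String),
    Spawn { exe: PathBuf, args: Vec<String>, workdir: Option<PathBuf> },
}

pub struct LaunchSpec {
    pub kind: String,
    pub value: String,
}

fn digits_only(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

fn token(s: &str) -> bool {
    !s.is_empty()
        && s.bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

fn aumid_ok(v: &str) -> bool {
    matches!(v.split_once('!'), Some((pfn, app)) if token(pfn) && token(app))
}

/// Map a host-derived spec to an action; anything that fails its guard resolves to None.
pub fn resolve_launch(spec: &LaunchSpec) -> Option<LaunchAction> {
    match spec.kind.as_str() {
        // Linux: a shell line that nests in gamescope.
        "lutris_id" if digits_only(&spec.value) => Some(LaunchAction::Shell(format!(
            "lutris lutris:rungameid/{}",
            spec.value
        ))),
        // Windows: a concrete EXE plus the target as its own argv element.
        "aumid" if aumid_ok(&spec.value) => Some(LaunchAction::Spawn {
            exe: PathBuf::from("explorer.exe"),
            args: vec![format!("shell:AppsFolder\\{}", spec.value)],
            workdir: None,
        }),
        _ => None,
    }
}
```

Keeping `Spawn` as exe plus argv means the Windows side never has to build or parse a shell line.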
**The Windows launcher already exists in the codebase — reuse it.**
[`capture/windows/wgc_relay.rs:196-204`](../crates/punktfunk-host/src/capture/windows/wgc_relay.rs)
does exactly the needed sequence:
`WTSGetActiveConsoleSessionId → WTSQueryUserToken → DuplicateTokenEx(TokenPrimary) →
CreateEnvironmentBlock → CreateProcessAsUserW(lpDesktop="winsta0\\default")`.
- Factor that into `windows/interactive.rs::spawn_in_active_session(exe, args, workdir) -> u32`.
- **Critical:** use the **logged-in user token** (`WTSQueryUserToken`, as `wgc_relay` does) — **not**
`windows/service.rs:449-510`'s variant, which duplicates the **SYSTEM** token and only retargets its
session id. UWP/appx activation, the user-hive protocol handlers (`HKCU\Software\Classes`), and each
launcher's auth/entitlement context all require the *real user's* token. The host process stays SYSTEM.
- For URI-handoff kinds (Epic/Steam/EA/Amazon/GOG-Galaxy) build a **concrete EXE + the URI as a separate
argv element**. `CreateProcessAsUserW` does **no** shell/protocol resolution — never `cmd /c`, never a
bare URI. For schemes with no exe-argv form (`amazon-games://`, `origin2://`), add an impersonate-token
`ShellExecuteEx` fallback (`ImpersonateLoggedOnUser` on a worker thread + `CoInitialize`).
- **Order:** launch the title **after** the interactive capture pipeline is live, so the game renders onto
the already-captured desktop and grabs foreground.
- **Caveats:** `WTSQueryUserToken` fails when no interactive user is logged on (a pre-login box can stream
the login/secure desktop but can't auto-launch a title); on the lock/secure desktop a launch may queue
until unlock. **Needs on-glass validation** (RTX box) that each launcher EXE accepts its URI on argv and
that post-capture launch grabs foreground.
### 2b. Artwork: a layered, no-auth-first `ArtResolver`
Steam gets free CDN art keyed by appid. Most stores don't. Layered ladder, degrade to a title-only card:
1. **Steam** → public Steam CDN by appid (unchanged, client fetches directly).
2. **Stores that already hold public CDN URLs** → emit verbatim, **no host endpoint**: Heroic
`store_cache` `art_*` (Epic/GOG/Amazon CDN), itch `cover_url`, GOG via public `api.gog.com/products/<id>?expand=images`
(one cached lookup), Epic via local `catcache.bin` keyImages.
3. **Xbox** → one **unofficial** no-auth `displaycatalog.mp.microsoft.com` lookup by StoreId, cached,
degrade to no-art offline. (Not a stable contract — tolerate drift.)
4. **Genuinely-local art** (Lutris `coverart`/`banners` JPEGs, Flatpak/.desktop icons, Bottles) → a
**new host-served endpoint is required**, because `Artwork` carries URLs the client fetches and a file
on the host has no public URL.
5. **Opt-in SteamGridDB** enrichment (v2 API `https://www.steamgriddb.com/api/v2`, `Authorization: Bearer
<operator key>`, **off by default**) to fill gaps. Not no-auth; never blocks listing.
6. **None** → existing title-only card.
**New endpoint:** `GET /library/art/<entryId>/<slot>` (slot ∈ `portrait|hero|logo|header`) on `mgmt.rs`.
It resolves `entryId` in the host library to a **known on-disk absolute path** (never interpolates raw
client input into a filesystem path), sanitizes the slot, rejects `..`, streams the bytes with the right
content-type. Reserve `data:` URLs for tiny logos only (don't bloat the catalog JSON that crosses the
control plane). See open question on whether this GET bypasses the mgmt bearer (images are non-sensitive
and the streaming client connects over punktfunk/1, not the bearer-gated REST).
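One way to keep client input out of filesystem paths is to look the request up in an index the host built during enumeration. The sketch below assumes such an index; `ArtIndex`, `Slot`, and `content_type` are hypothetical names, and the HTTP layer is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Slot {
    Portrait,
    Hero,
    Logo,
    Header,
}

impl Slot {
    /// Only the four known slot names parse; anything else (including "..") is rejected.
    pub fn parse(s: &str) -> Option<Slot> {
        match s {
            "portrait" => Some(Slot::Portrait),
            "hero" => Some(Slot::Hero),
            "logo" => Some(Slot::Logo),
            "header" => Some(Slot::Header),
            _ => None,
        }
    }
}

/// Hypothetical index filled by the providers: (entry id, slot) -> known on-disk path.
pub struct ArtIndex {
    paths: HashMap<(String, Slot), PathBuf>,
}

impl ArtIndex {
    pub fn new() -> Self {
        ArtIndex { paths: HashMap::new() }
    }

    pub fn insert(&mut self, entry_id: &str, slot: Slot, path: PathBuf) {
        self.paths.insert((entry_id.to_string(), slot), path);
    }

    /// The request strings are used only as lookup keys, never joined into a path.
    pub fn resolve(&self, entry_id: &str, slot: &str) -> Option<&Path> {
        let slot = Slot::parse(slot)?;
        self.paths
            .get(&(entry_id.to_string(), slot))
            .map(PathBuf::as_path)
    }
}

/// Content type from the file extension; unknown extensions are not served.
pub fn content_type(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()?.to_ascii_lowercase().as_str() {
        "jpg" | "jpeg" => Some("image/jpeg"),
        "png" => Some("image/png"),
        _ => None,
    }
}
```

Because a miss in the map is the only failure mode, traversal strings such as `../../etc/passwd` fall through to a plain not-found.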
---
## 3. Security model (preserved and extended)
The invariant is unchanged: **the client sends only a store-qualified `GameEntry.id`** (e.g. `lutris:42`,
`xbox:9NBLGGH4R315`, `epic:fn:4fe…:Fortnite`) in `Hello.launch`. The host looks it up in its **own**
enumerated library, reads the **host-derived** `LaunchSpec`, and resolves it. The client never sends a
`LaunchSpec`, command, URI, or path.
Per-kind charset validators are belt-and-suspenders before any interpolation (values are already
host-derived from local files the host owns):
| kind | guard |
|---|---|
| `steam_appid`, `lutris_id`, `uplay` | digits only |
| `battlenet` | `^[A-Za-z0-9]+$` (case-sensitive) |
| `amazon` | `^[A-Za-z0-9-]+$` |
| `aumid` | `^[A-Za-z0-9._-]+![A-Za-z0-9._-]+$` (the `!` separator) |
| `epic` | ≤3 `:`-split parts, each `^[A-Za-z0-9._-]+$`, then URL-encode colons |
| `heroic` | runner ∈ {legendary,gog,nile} + appName `^[A-Za-z0-9._-]+$` |
| `ea_offer_ids` | `^[A-Za-z0-9._,-]+$` (allow comma) |
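The guards in the table can be written without a regex crate. This is a sketch using only `std`; `guard` and `token` are hypothetical names, and unknown kinds are rejected by default.

```rust
/// True when `s` is non-empty and every byte is ASCII alphanumeric or listed in `extra`.
fn token(s: &str, extra: &[u8]) -> bool {
    !s.is_empty()
        && s.bytes()
            .all(|b| b.is_ascii_alphanumeric() || extra.contains(&b))
}

/// Per-kind charset check, applied before any interpolation into a command or URI.
pub fn guard(kind: &str, value: &str) -> bool {
    match kind {
        "steam_appid" | "lutris_id" | "uplay" => {
            !value.is_empty() && value.bytes().all(|b| b.is_ascii_digit())
        }
        "battlenet" => token(value, b""),
        "amazon" => token(value, b"-"),
        "ea_offer_ids" => token(value, b"._,-"),
        // <PFN>!<AppId>: exactly one '!' with a valid token on each side.
        "aumid" => match value.split_once('!') {
            Some((pfn, app)) => token(pfn, b"._-") && token(app, b"._-"),
            None => false,
        },
        // Up to three ':'-separated parts, each a valid token.
        "epic" => {
            let parts: Vec<&str> = value.split(':').collect();
            parts.len() <= 3 && parts.iter().all(|p| token(p, b"._-"))
        }
        // <runner>:<appName> with an allow-listed runner.
        "heroic" => match value.split_once(':') {
            Some((runner, app)) => {
                matches!(runner, "legendary" | "gog" | "nile") && token(app, b"._-")
            }
            None => false,
        },
        _ => false,
    }
}
```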
On **Windows never route a client-influenced string through `cmd /c start`.** `resolve_launch` yields
`Spawn{exe,args,workdir}`; `CreateProcessAsUserW` launches a concrete EXE with the URI/flags as separate
argv elements. The operator-only `command` kind (custom store + provider-generated Linux shell lines for
`desktop`/`itch`) is host-derived/operator-typed, never client-set.
The one net-new surface is `GET /library/art` — covered in §2b (id-resolved path, no traversal).
---
## 4. New `LaunchSpec` kinds
| kind | value holds | maps to |
|---|---|---|
| `lutris_id` | `pga.db` `games.id` (digits) | Linux Shell `lutris lutris:rungameid/<id>` (nests in gamescope) |
| `heroic` | `<runner>:<appName>` | Linux argv `heroic --no-gui "heroic://launch?appName=<app>&runner=<runner>"` |
| `aumid` | `<PFN>!<AppId>` | Windows Spawn `explorer.exe "shell:AppsFolder\<aumid>"` (interactive session) |
| `epic` | `<namespace>:<catalogItemId>:<appName>` | Windows Spawn `EpicGamesLauncher.exe` + `com.epicgames.launcher://apps/<ns>%3A<cat>%3A<app>?action=launch&silent=true` |
| `gog` | host-resolved `exe \t args \t workdir` | Windows Spawn `CreateProcessAsUserW(exe,args,workdir)` (direct exe, no Galaxy) |
| `uplay` | Ubisoft gameId (digits) | Windows `uplay://launch/<gameId>/0` |
| `battlenet` | product code (e.g. `WTCG`, `Fen`, `OSI`) | Windows Spawn `Battle.net.exe --exec="launch <code>"` |
| `amazon` | Amazon Games `DbSet.Id` | Windows `amazon-games://play/<Id>` (impersonate ShellExecute) |
| `ea_offer_ids` | comma-joined contentID list | Windows `origin2://game/launch/?offerIds=<list>&autoDownload=1` |
| `command` (existing) | host-derived shell line | Linux gamescope-nested (desktop/flatpak/itch reuse this) |
---
## 5. Per-store provider catalog
Confidence ratings are those assigned **after** adversarial web verification (research → verify). All
enumeration is no-auth and local, and does not need the launcher running unless noted.
### Linux
#### Lutris — P0, effort M, confidence **high**
- **Enumerate:** read-only `rusqlite` open of `pga.db`
(`$XDG_DATA_HOME/lutris` | `~/.local/share/lutris` | `~/.var/app/net.lutris.Lutris/data/lutris`).
`SELECT id, slug, name, runner FROM games WHERE installed=1`. Optionally LEFT JOIN
`games_categories`/`categories` to drop the `.hidden` category. Open `mode=ro`/`immutable=1` (Lutris
holds it open). `installed=1` matters — the DB also lists owned-but-not-installed rows.
- **Launch:** `lutris_id` → `lutris lutris:rungameid/<id>` (execs the game; most nesting-friendly).
One-time on-box check that `games.id` == the `rungameid` int.
- **Artwork:** **local** JPEGs keyed by slug — `coverart/<slug>.jpg` (→ portrait), `banners/<slug>.jpg`
(→ header) under `~/.local/share/lutris` (0.5.18+), with `~/.cache/lutris` (≤0.5.17) and the Flatpak
cache as fallbacks. Needs the `/library/art` endpoint. hero/logo stay None.
- **Notes:** highest-confidence new store. A `runner=='steam'` row can duplicate `SteamProvider` — dedup
is a nicety. Verify bundled-SQLite is fine for deb/rpm/flatpak.
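The path probing for Lutris can be sketched with `std` alone (the SQLite read is left out here). The function names are hypothetical, the slug charset is an assumption to be checked against real `pga.db` rows, and the Flatpak art cache is omitted because its location is not spelled out above.

```rust
use std::path::{Path, PathBuf};

/// Candidate `pga.db` locations, in the order the section lists them.
pub fn lutris_db_candidates(xdg_data_home: Option<&Path>, home: &Path) -> Vec<PathBuf> {
    let mut out = Vec::new();
    if let Some(xdg) = xdg_data_home {
        out.push(xdg.join("lutris").join("pga.db"));
    }
    out.push(home.join(".local/share/lutris/pga.db"));
    out.push(home.join(".var/app/net.lutris.Lutris/data/lutris/pga.db"));
    out
}

/// Candidate art files for a slug. `kind` is "coverart" (portrait) or "banners" (header).
/// Assumption: slugs are ASCII alphanumerics plus '-' and '_'; anything else yields no candidates.
pub fn lutris_art_candidates(home: &Path, kind: &str, slug: &str) -> Vec<PathBuf> {
    let slug_ok = !slug.is_empty()
        && slug
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
    if !slug_ok || !matches!(kind, "coverart" | "banners") {
        return Vec::new();
    }
    let file = format!("{slug}.jpg");
    vec![
        home.join(".local/share/lutris").join(kind).join(&file), // 0.5.18+
        home.join(".cache/lutris").join(kind).join(&file),       // <= 0.5.17
    ]
}

/// First candidate that exists as a regular file, if any.
pub fn first_existing(candidates: Vec<PathBuf>) -> Option<PathBuf> {
    candidates.into_iter().find(|p| p.is_file())
}
```

The resolved path would then be registered for the `/library/art` endpoint rather than sent to the client.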
#### Heroic — P0, effort M, confidence **high** (one provider = Epic + GOG + Amazon, art free)
- **Enumerate:** parse `~/.config/heroic/store_cache/{legendary,gog,nile}_library.json` (Flatpak:
`~/.var/app/com.heroicgameslauncher.hgl/config/heroic/...`). Data key is `"library"` (legendary/nile)
or `"games"` (gog); ignore `__timestamp.*` siblings. Filter `is_installed==true` **and** cross-check
`install.install_path` exists (works around the gog `is_installed` bug, Heroic #2691). Fall back to
`legendaryConfig/legendary/installed.json` etc. when a cache file is absent.
*(Heroic uses `legendaryConfig/legendary`, **not** the standalone `~/.config/legendary`.)*
- **Launch:** `heroic` → `heroic --no-gui "heroic://launch?appName=<app>&runner=<runner>"` (argv, no shell).
`--no-gui` does the suppression; the `gui=false` query param is **inert/fabricated** — drop it.
**Ship enumeration+art first, gate launch:** Heroic is single-instance Electron — if already running it
forwards the URI and **exits**, which (as gamescope's foreground child) would tear the session down while
the game runs **outside** gamescope, uncaptured. Also Electron needs a display — fine nested in gamescope,
not in a bare headless context.
- **Artwork:** **free** — `art_square` → portrait, `art_cover` → header, `art_background`||`art_cover` →
hero, `art_logo` → logo are already public Epic/GOG/Amazon CDN URLs. Skip non-`http(s)` values
(sideloaded `file://` art). No host endpoint.
- **Notes:** do **not** also build separate Linux GOG/Amazon providers — native Linux GOG Galaxy doesn't
exist; Heroic is the canonical Linux path for those.
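The launch URI and the art filter for Heroic are small enough to sketch directly. The function names are hypothetical, and using a bare `heroic` binary as argv[0] is an assumption (a Flatpak install would need a different prefix).

```rust
/// Build the launch URI from a `<runner>:<appName>` value, or None if a guard fails.
pub fn heroic_launch_uri(value: &str) -> Option<String> {
    let (runner, app) = value.split_once(':')?;
    if !matches!(runner, "legendary" | "gog" | "nile") {
        return None;
    }
    let app_ok = !app.is_empty()
        && app
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'));
    if !app_ok {
        return None;
    }
    Some(format!("heroic://launch?appName={app}&runner={runner}"))
}

/// argv form (no shell): `heroic --no-gui <uri>`. The binary name is an assumption.
pub fn heroic_argv(value: &str) -> Option<Vec<String>> {
    let uri = heroic_launch_uri(value)?;
    Some(vec!["heroic".to_string(), "--no-gui".to_string(), uri])
}

/// Keep only art the client can fetch: http(s) URLs pass, `file://` and others are dropped.
pub fn cdn_art(url: Option<&str>) -> Option<String> {
    let u = url?;
    if u.starts_with("https://") || u.starts_with("http://") {
        Some(u.to_string())
    } else {
        None
    }
}
```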
#### Desktop (`.desktop` + Flatpak) — P1, effort M, confidence medium (universal catch-all)
- **Enumerate:** scan `{/var/lib/flatpak/exports/share/applications,
~/.local/share/flatpak/.../applications, /usr/share/applications, /usr/local/share/applications,
~/.local/share/applications}/*.desktop`. Require `Type=Application` + `Categories` contains `Game`; skip
`NoDisplay`/`Hidden`/`Terminal=true` and known launcher app-ids (Steam/Heroic/Lutris/Bottles/RetroArch)
to avoid recursion/dupes.
- **Launch:** reuse `command` (host-derived shell line, nested in gamescope): cleaned `Exec` (strip
`%U/%F/%f/%u/%i/%c/%k`) else `flatpak run <app-id>`.
- **Artwork:** local — resolve `Icon=` via the hicolor theme / flatpak exported icons → `/library/art`.
App icons are low-res, not box art (acceptable header fallback).
- **Notes:** run **last** and dedup by install path / drop ids already surfaced by Steam/Heroic/Lutris.
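The `.desktop` filter and `Exec` cleanup could look roughly like the sketch below. It reads only the `[Desktop Entry]` group, ignores localized keys, and strips field codes token by token, so an embedded code such as `--file=%f` or whitespace inside a quoted argument is not handled. All function names are hypothetical.

```rust
use std::collections::HashMap;

/// Collect key/value pairs from the `[Desktop Entry]` group only (first value wins).
pub fn parse_desktop_entry(text: &str) -> HashMap<&str, &str> {
    let mut in_main = false;
    let mut out = HashMap::new();
    for line in text.lines().map(str::trim) {
        if line.starts_with('[') {
            in_main = line == "[Desktop Entry]";
        } else if in_main && !line.starts_with('#') {
            if let Some((k, v)) = line.split_once('=') {
                out.entry(k.trim()).or_insert(v.trim());
            }
        }
    }
    out
}

/// Application + Game category, and none of NoDisplay/Hidden/Terminal set to true.
pub fn is_game(e: &HashMap<&str, &str>) -> bool {
    let flag = |k: &str| e.get(k).map_or(false, |v| *v == "true");
    e.get("Type") == Some(&"Application")
        && e.get("Categories")
            .map_or(false, |c| c.split(';').any(|x| x == "Game"))
        && !flag("NoDisplay")
        && !flag("Hidden")
        && !flag("Terminal")
}

/// Drop the standalone field codes listed in the section from an Exec line.
pub fn clean_exec(exec: &str) -> String {
    const CODES: [&str; 7] = ["%U", "%F", "%f", "%u", "%i", "%c", "%k"];
    exec.split_whitespace()
        .filter(|t| !CODES.contains(t))
        .collect::<Vec<_>>()
        .join(" ")
}
```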
#### itch.io — P3, effort S, confidence medium (Linux + Windows)
- **Enumerate:** read-only `rusqlite` of `butler.db` (`~/.config/itch/db/butler.db`; Flatpak
`io.itch.itch`; Windows `%AppData%\itch\db`, per-user). JOIN `caves`→`games`. **Key on `cave.ID`** (a
game can have multiple caves; install location + verdict are per-cave). Read game title / `cover_url`;
resolve install dir from `InstallLocationID`+`InstallFolderName`||`CustomInstallFolder` + the Verdict
candidate. Confirm exact column names on-box.
- **Launch:** `command` → direct binary `basePath`+`candidate.path`, **only** for Verdict candidates with
`flavor==native` (html/jar/love need itch's runtime — fall back to custom).
- **Artwork:** **free** — `games.cover_url` is a public itch CDN URL.
### Windows
#### Epic Games Store — P1, effort M, confidence medium (cleanest Windows store to validate the launch wiring)
- **Enumerate:** read `C:\ProgramData\Epic\EpicGamesLauncher\Data\Manifests\*.item` (JSON; machine-wide,
SYSTEM-readable, launcher need not run). Read `DisplayName`, `AppName`, `CatalogNamespace`,
`CatalogItemId`, `InstallLocation`, `LaunchExecutable`, `MainGameAppName`, `AppCategories`. Iterate the
dir (filename is a random GUID).
**Use Playnite's EXCLUSION filter, not a positive `games` filter:** skip `AppName` starting `UE_`; skip
DLC only when `AppCategories` has `addons` && **not** `addons/launchable`; require `InstallLocation`
exists. (The first-pass positive filter `games + MainGameAppName==AppName` can drop legit games.)
- **Launch:** `epic` → Spawn `EpicGamesLauncher.exe` + `com.epicgames.launcher://apps/<ns>%3A<cat>%3A<app>?action=launch&silent=true`.
Build the **triple** only when both namespace and CatalogItemId are present; otherwise **fall back to the
bare `appName` URI (don't set launch=None)** — bare still works in Playnite today, it's just less robust.
CatalogItemId is **not** present in every `.item` — verify on a real box.
- **Artwork:** **free** — base64-decode + parse `Data\Catalog\catcache.bin`, index by catalogItemId, map
keyImages `DieselGameBoxTall`→portrait, `DieselGameBox`→hero, `DieselGameBoxLogo`→logo. None on miss.
- **Notes:** `.item` + `catcache.bin` are community-RE'd; `silent=true` may not suppress a cold-start
launcher window.
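The exclusion filter and the triple-or-bare URI rule can be sketched as two small functions. JSON parsing of the `.item` file is left out; the function names are hypothetical and the inputs are assumed to have passed the `epic` guard already.

```rust
/// Exclusion filter: skip `UE_*`, skip DLC unless it is launchable, require an install dir.
pub fn epic_is_listable(app_name: &str, categories: &[&str], install_exists: bool) -> bool {
    let dlc_only =
        categories.contains(&"addons") && !categories.contains(&"addons/launchable");
    !app_name.starts_with("UE_") && !dlc_only && install_exists
}

/// Build the launcher URI: the full triple when namespace and catalog id are both present,
/// otherwise the bare app name.
pub fn epic_launch_uri(
    namespace: Option<&str>,
    catalog_item_id: Option<&str>,
    app_name: &str,
) -> String {
    let target = match (namespace, catalog_item_id) {
        (Some(ns), Some(cat)) if !ns.is_empty() && !cat.is_empty() => {
            // Colons between the three parts are percent-encoded as %3A.
            format!("{ns}%3A{cat}%3A{app_name}")
        }
        _ => app_name.to_string(),
    };
    format!("com.epicgames.launcher://apps/{target}?action=launch&silent=true")
}
```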
#### GOG — P1, effort M, confidence medium
- **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\GOG.com\Games\<id>` (PATH/GAMENAME/gameID/EXE) or
Uninstall `<id>_is1` keys with `Publisher=='GOG.com'` (exclude `GOGPACK*`). Parse
`<PATH>\goggame-<id>.info` for `playTasks[isPrimary && type=='FileTask']` → exe/args/workingDir.
- **Launch:** `gog` → **direct-exe** Spawn (no Galaxy dependency, dodges cold-start/anti-cheat). Optional
fallback: `GalaxyClient.exe /launchViaAutostart /gameId=<id> /command=runGame /path="<dir>"` (note the
`/launchViaAutostart` token; `goggalaxy://openGameView/<id>` only **opens the page**, doesn't launch).
- **Artwork:** **free** — public no-auth `GET https://api.gog.com/products/<id>?expand=images` →
`images.logo2x`/`verticalCover`/`background`; cache resolved URLs. (`goggame-.info` carries no art; the
Galaxy `galaxy-2.0.db` is undocumented/locked — avoid.)
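Since the `gog` kind packs `exe \t args \t workdir` into one string, the resolver needs to split it back out. A minimal sketch, assuming tab never appears inside any of the three parts; `GogLaunch` and `parse_gog_value` are hypothetical names.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub struct GogLaunch {
    pub exe: PathBuf,
    pub args: String,
    pub workdir: Option<PathBuf>,
}

/// Split a host-resolved `exe \t args \t workdir` value. The exe is required;
/// args default to empty and an empty workdir means "none".
pub fn parse_gog_value(value: &str) -> Option<GogLaunch> {
    let mut parts = value.splitn(3, '\t');
    let exe = parts.next().filter(|s| !s.is_empty())?;
    let args = parts.next().unwrap_or("");
    let workdir = parts.next().filter(|s| !s.is_empty());
    Some(GogLaunch {
        exe: PathBuf::from(exe),
        args: args.to_string(),
        workdir: workdir.map(PathBuf::from),
    })
}
```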
#### Xbox / Microsoft Store / Game Pass — P1, effort **L**, confidence medium (big Game Pass value, most plumbing)
- **Enumerate:** probe each fixed drive for an `XboxGames` dir (default `C:\XboxGames`; the `.GamingRoot`
binary layout is **undocumented** — just scan, don't depend on parsing it). For each
`<Title>\Content\MicrosoftGame.config` (**presence = it's a GDK game**, the game-vs-app signal) read
`ShellVisuals.DefaultDisplayName` (title), `<StoreId>` (12-char BigId, the art key), `Identity Name`,
`<Executable Id="Game">` (the AppId). **Read the PackageFamilyName from the
`C:\ProgramData\Microsoft\Windows\AppRepository\Packages\<PackageFullName>` directory name** (strip
`_Version_Arch_~_PublisherHash`) — **never compute the PFN by hashing the publisher**. AUMID = `PFN!AppId`.
- **Launch:** `aumid` → `explorer.exe shell:AppsFolder\<AUMID>` into the interactive session. **UWP
activation fails from SYSTEM/session-0 — the interactive user token is load-bearing.**
- **Artwork:** one **unofficial** no-auth lookup
`displaycatalog.mp.microsoft.com/v7.0/products/<StoreId>?market=US&languages=en-us&fieldsTemplate=Details`,
map `Images[]` ImagePurpose Poster→portrait / SuperHeroArt→hero / Logo→logo / BoxArt→header; cache to
the config dir, degrade to no-art offline. Not a stable contract.
- **Notes:** misses pure-UWP (non-GDK) Store games under the ACL-locked `WindowsApps` — accept for v1.
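Deriving the PackageFamilyName from an `AppRepository\Packages` directory name is a string operation: a package full name has five `_`-separated parts (name, version, architecture, resource id, publisher hash) and the family name keeps the first and last. The sketch below assumes that layout; the function names and the `Contoso` package are made up for illustration.

```rust
/// `Name_Version_Arch_ResourceId_PublisherHash` -> `Name_PublisherHash`.
/// The resource id is often empty, which shows up as a double underscore.
pub fn family_name(package_full_name: &str) -> Option<String> {
    let parts: Vec<&str> = package_full_name.split('_').collect();
    match parts.as_slice() {
        [name, _version, _arch, _resource, publisher]
            if !name.is_empty() && !publisher.is_empty() =>
        {
            Some(format!("{name}_{publisher}"))
        }
        _ => None,
    }
}

/// AUMID = `<PFN>!<AppId>`, where AppId comes from `<Executable Id="...">`.
pub fn aumid(pfn: &str, app_id: &str) -> String {
    format!("{pfn}!{app_id}")
}
```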
#### Ubisoft Connect — P2, effort S, confidence medium
- **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\Ubisoft\Launcher\Installs\<gameId>` (both reg views),
read `InstallDir`; title = install-dir leaf folder (primary) else the `Uplay Install <gameId>` Uninstall
`DisplayName`.
- **Launch:** `uplay` → `uplay://launch/<gameId>/0`. **Artwork:** none → title-only.
- **Notes:** smallest effort once the Windows URI-launch wiring exists; hive+scheme unchanged across the
Uplay→Ubisoft Connect rebrand.
#### Amazon Games — P2, effort S, confidence medium
- **Enumerate:** read-only `rusqlite` of
`%LocalAppData%\Amazon Games\Data\Games\Sql\GameInstallInfo.sqlite`:
`SELECT Id,ProductTitle,InstallDirectory FROM DbSet WHERE Installed=1`. **Per-user path** — the SYSTEM
service must resolve the **active session user's** profile (not the SYSTEM profile).
- **Launch:** `amazon` → `amazon-games://play/<Id>` (impersonate-token ShellExecute; no clean exe-argv form).
- **Artwork:** `ProductIconUrl`/`ProductLogoUrl` columns when present, else none.
#### Battle.net — P2, effort **L**, confidence medium (high catalog value: WoW/Diablo IV/Overwatch 2/CoD)
- **Enumerate:** hand-roll a ~4-field protobuf decode of `C:\ProgramData\Battle.net\Agent\product.db`
(`product_install{ uid, product_code, settings.install_path, cached_product_state.base_product_state.installed }`).
Registry fallback: Uninstall keys whose `UninstallString` matches `Battle.net.exe --uid=<uid>`.
`product.db` has **no titles** → maintain a ~30-entry `product_code`→name map (source from
bnetlauncher/Lutris/Heroic; codes are **case-sensitive**).
- **Launch:** `battlenet` → `Battle.net.exe --exec="launch <code>"` (more reliable than the
`battlenet://<code>` URI, which only hands off). **Artwork:** none → title-only.
- **Notes:** the protobuf + name map + no-art make it L; pin the `.proto` and decode defensively.
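A hand-rolled decode only needs the protobuf wire format: varint keys, varints, and length-delimited payloads. The sketch below is a generic field reader, not a `product.db` schema; which field numbers hold `uid`, `product_code`, and the nested install state is not given above and would have to come from the pinned `.proto`. `Field` and `read_fields` are hypothetical names.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub enum Field<'a> {
    Varint(u64),
    Bytes(&'a [u8]), // strings and nested messages
}

/// Read one base-128 varint, advancing `pos`. None on truncation or overlong input.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut out = 0u64;
    for shift in (0..64).step_by(7) {
        let b = *buf.get(*pos)?;
        *pos += 1;
        out |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some(out);
        }
    }
    None
}

/// Decode one message level into (field number, value) pairs.
/// Fixed-width fields are skipped; any malformed input yields None.
pub fn read_fields(buf: &[u8]) -> Option<Vec<(u32, Field<'_>)>> {
    let mut pos = 0;
    let mut out = Vec::new();
    while pos < buf.len() {
        let key = read_varint(buf, &mut pos)?;
        let number = u32::try_from(key >> 3).ok()?;
        match key & 7 {
            0 => out.push((number, Field::Varint(read_varint(buf, &mut pos)?))),
            2 => {
                let len = usize::try_from(read_varint(buf, &mut pos)?).ok()?;
                let end = pos.checked_add(len)?;
                out.push((number, Field::Bytes(buf.get(pos..end)?)));
                pos = end;
            }
            1 => pos = pos.checked_add(8).filter(|e| *e <= buf.len())?, // fixed64
            5 => pos = pos.checked_add(4).filter(|e| *e <= buf.len())?, // fixed32
            _ => return None, // groups and unknown wire types
        }
    }
    Some(out)
}
```

A nested message such as `product_install` is read by calling `read_fields` again on its `Bytes` payload, which keeps the decoder defensive at every level.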
#### EA app — P2, effort M, confidence medium (most closed/fragile — ship last)
- **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\{EA Games,Origin Games}\<id>` (Install Dir /
DisplayName), parse `<dir>\__Installer\installerdata.xml` for the **full** `<contentIDs>` list +
`<gameTitle locale='en_US'>`. Registry under-reports for EA-app (vs legacy Origin) installs — known
completeness gap. Keep the AES-256 encrypted `IS`-file decrypt **out** of the default path (optional
feature flag for completeness).
- **Launch:** `ea_offer_ids` → `origin2://game/launch/?offerIds=<full,comma,list>&autoDownload=1`. **Emit
the full contentID list** — a single offerId generally no longer launches under the EA app.
- **Artwork:** no no-auth source → title-only.
#### Rockstar — P3, fold into custom
- Registry `HKLM\SOFTWARE\WOW6432Node\Rockstar Games\<Title>\InstallFolder`; direct-exe Spawn; no art.
Tiny catalog, most titles now bought on Steam/Epic.
---
## 6. Suggested structure & phasing
**Structure.** Split `library.rs` → a `library/` dir before it balloons:
`mod.rs` (trait, wire types, `LaunchAction`, custom CRUD, `all_games`, `resolve_launch`,
`launch_command`/`launch_title`), `steam.rs`, one file per provider, `art.rs` (ArtResolver +
displaycatalog/gog-api/steamgriddb helpers), `win_util.rs` (HKLM subkey enumerator, read-only SQLite
opener, tiny read-only XML reader). New deps: `rusqlite` (bundled, read-only) for lutris/itch/amazon DBs;
`roxmltree`/`quick-xml` for the Windows manifests; registry via the `windows` crate's
`Win32_System_Registry` feature (no new crate). Avoid `prost` — hand-roll the ~4 Battle.net fields.
| Phase | Deliverable | Files |
|---|---|---|
| **1 — Foundation** (no new stores) | Split `library.rs` → `library/`; add `LaunchAction` + `resolve_launch`; factor `windows/interactive.rs::spawn_in_active_session` out of `wgc_relay.rs`; make `set_launch_command` real on Windows; wire `launch_title` at session-start post-capture; add `win_util.rs` + deps | `library/{mod,steam,launch,art,win_util}.rs`; `windows/interactive.rs` (new); `capture/windows/wgc_relay.rs`; `punktfunk1.rs:573`; `gamestream/stream.rs:122`; `vdisplay.rs:57`; `main.rs`; `Cargo.toml` |
| **2 — Linux Lutris + Heroic + art endpoint** (P0) | `LutrisProvider`, `HeroicProvider` (art free); `GET /library/art/<id>/<slot>` for Lutris local JPEGs; wire into `all_games()`; unit tests for new `resolve_launch` arms + guards | `library/{lutris,heroic,art}.rs`; `library/mod.rs`; `mgmt.rs:1138` + new route |
| **3 — Windows Epic + GOG** (P1) | `EpicProvider` (.item + catcache art), `GogProvider` (registry + .info + api.gog.com art); validate `windows/interactive.rs` end-to-end on the RTX box | `library/{epic,gog,win_util,art,launch}.rs` |
| **4 — Xbox / Game Pass** (P1) | `XboxProvider` (XboxGames scan + MicrosoftGame.config + AppRepository PFN + aumid launch) + displaycatalog art with caching/offline degrade | `library/{xbox,art,launch}.rs` |
| **5 — Linux Desktop catch-all + easy Windows URI stores** (P1/P2) | `DesktopProvider` (last + dedup, icons via `/library/art`), `UplayProvider`, `AmazonProvider` (+ per-user-profile-under-SYSTEM helper) | `library/{desktop,uplay,amazon,win_util,art}.rs` |
| **6 — Remaining + opt-in enrichment** (P2/P3) | `BattleNetProvider` (hand-rolled protobuf + code→name map), `EaAppProvider`, `ItchProvider`; Rockstar/Bottles → custom; optional SteamGridDB v2 behind an operator key | `library/{battlenet,eaapp,itch,art,mod}.rs` |
Also generalize the web console store badge (`web/src/sections/Library/view.tsx`) to render per `game.store`.
---
## 7. Open questions
- **Art delivery auth:** the streaming client connects over punktfunk/1 (QUIC), not the bearer-gated mgmt
REST, yet already fetches Steam CDN URLs over plain HTTP. Should `GET /library/art/*` be an
unauthenticated read-only image GET on the mgmt listener (bearer bypass for that path only), a separate
tiny image server, or should local-art bytes ride the punktfunk/1 control plane?
- **Windows launch ordering** needs on-glass RTX-box validation: confirm launching *after* capture is live
grabs foreground+capture, and that `CreateProcessAsUserW(EpicGamesLauncher.exe/steam.exe, URI-as-argv)`
actually starts the game per launcher (vs needing the impersonate-ShellExecute fallback).
- **Per-user-profile resolution under SYSTEM** for Amazon (`%LocalAppData%`) and itch (`%AppData%`): add
`WTSQueryUserToken` + `GetUserProfileDirectoryW` (or read `USERPROFILE` from `CreateEnvironmentBlock`)?
- **`rusqlite` bundled SQLite** — acceptable for deb/rpm/flatpak and no link conflict? Otherwise fall back
to `lutris -l -j` (fragile: single-instance D-Bus forwarding).
- **Battle.net** product-code→name map source/maintenance, and `product.db` `.proto` drift across Agent versions.
- **Unofficial art sources** (Xbox displaycatalog): best-effort with aggressive caching + no-art degrade,
or Xbox-art local-tile-only for v1?
- **Heroic launch:** ship enumeration+art only at first, or invest in direct legendary/gogdl/nile CLI
launch (needs the user's on-disk auth tokens) to dodge the single-instance-Electron / gamescope-escape problem?
- **`config_dir()` consistency:** `library.rs` uses an XDG/HOME-based dir; confirm the Windows SYSTEM host
lands its art cache + custom store under `%ProgramData%\punktfunk` (there's a separate
`gamestream::config_dir()` that already does this).
- Should provider-generated Linux shell lines (`desktop`/`itch`) reuse the `command` kind (documented
"operator-only") or get a distinct internal kind to keep the mgmt-UI `command` semantics clean?
---
## 8. Verification notes (what the adversarial pass corrected)
First-pass research was web-re-checked; corrections folded into §5 above:
- **Epic:** bare-`AppName` URI is **not** universally removed (Playnite still uses it) — build the triple
when ids exist, fall back to bare; use Playnite's **exclusion** filter, not a positive `games` filter.
- **EA:** a single offerId no longer launches — emit the **full** comma-joined contentID list; registry
under-reports for EA-app installs.
- **Battle.net:** `battlenet://<code>` only hands off — use `Battle.net.exe --exec="launch <code>"`.
- **Xbox:** **read** the PFN from the AppRepository dir name, don't hash the publisher; `.GamingRoot`
layout is undocumented — just scan `XboxGames`.
- **Heroic:** `gui=false` is inert (`--no-gui` does it); single-instance Electron forwards-and-exits →
gate launch.
- **Lutris:** open the DB read-only; `lutris -l -j` fallback is fragile (single-instance D-Bus forwarding).
- **SteamGridDB:** v1 is deprecated — use v2 (`/api/v2`, Bearer key).
**Not web-confirmable / needs on-box validation:** every Windows launch path (each launcher's argv
handling, foreground grab, secure-desktop behavior), all registry keys / DB schemas against a live box,
and `rusqlite` packaging.
@@ -0,0 +1,430 @@
# GPU-contention performance investigation — why a saturating game starves the stream (2026-06-25)
> The headache, stated precisely:
> a game renders ~140 fps on the host GPU; the client requests 120/240; in a GPU-light scene the
> stream tracks; the moment the game pins the GPU the **stream collapses to 40–50 fps** while the
> game keeps rendering 140. Capping the game's fps raises the stream back up (clearest in light
> titles like CS2). **Capping is not an acceptable fix** — demanding titles exhaust the GPU even
> when capped.
This is the second, deeper pass on the problem. The first pass is
[`host-latency-plan.md`](host-latency-plan.md) (a 25-agent investigation, 2026-06-18). **This doc
supersedes several of that doc's conclusions** — the codebase moved a lot in the week since
(the Windows-host rewrite landed IDD-push as the default capture path, split-encode shipped, the
GPU-priority knob got configurable), and a fresh, adversarially-verified research pass overturned
two of the old plan's premises. Read §1 (corrections) before acting on the old doc.
Method: five parallel investigations — three deep reads of the *current* code (encode, capture,
mitigations) and two web-research passes (encoder-side and GPU-scheduling-side), the latter run with
their own adversarial verifiers. Every external claim below carries a source URL; every code claim
carries a current `file:line`.
---
## 0. TL;DR — the corrected mental model and the action list
**The governing fact:** NVENC is a **dedicated ASIC on its own GPU runlist**, physically separate
from the SM/CUDA/graphics cores a 3D game saturates. The game does **not** steal the encode block.
It steals everything that *feeds* the block — capture-acquire, the **RGB→YUV colour-convert**, the
copy into the encoder's input surface, the readback — **and the GPU-scheduler time** to run that
feed work, which is queued behind the game's graphics context.
([NVENC app-note](https://docs.nvidia.com/video-technologies/video-codec-sdk/13.0/nvenc-application-note/index.html),
[engine-table proof, UNC RTAS'24](https://www.cs.unc.edu/~jbakita/rtas24.pdf))
**Therefore there are two different bottlenecks with opposite fixes, and you must tell them apart
before writing code:**
| Bottleneck | Symptom | Fix family |
|---|---|---|
| **(a) feed-scheduling contention** | `uniq`≈`fps`, both ~50; `encode_ms` 13–17 | shrink the host's contended-engine footprint; raise GPU scheduling priority; pipeline correctly; in the limit, a second GPU |
| **(b) frame-source ceiling** | `fps`≈240 (held re-encodes) but `uniq`→40–50 | capture the game's real frames (swapchain hook); compose-flip for the DLSS-FG case |
**The single hardest truth:** on one saturated GPU there is **no free lunch**. Any host GPU work
either *preempts* the game (and steals its frames) or *waits* behind it. Capping the game works
only because it cuts the game's **total** GPU demand and opens idle gaps. The non-capping
equivalents are exactly three: **need less GPU** (footprint shrink), **take more** (priority — which
costs the game fps), or **use a different GPU** (real isolation). Anything pitched as "make the game
politely yield without losing anything" — Reflex, render-queue tricks — is a **placebo** here (§7).
**Action list, highest leverage first** (detail in §5–§6):
1. **Diagnose first** (§3). Read `uniq`-vs-`fps` under the real workload + PresentMon presentation
mode. Half a day; decides whether you're fighting (a) or (b). The repo already prints the counter.
2. **Stop feeding NVENC RGB on the default path.** IDD-push (the install default) hands NVENC
BGRA → NVENC runs its RGB→YUV CSC on the SM, the exact contended engine. Convert to NV12/P010 on
the **video engine** like the WGC/DDA paths already do. Biggest in-our-control win. (§5.A)
3. **Build a *correct* async encode pipeline** — submit on one thread, blocking-retrieve on another,
deep surface pool, Windows completion events. Our past "pipelining didn't help" was a *same-thread*
implementation that can't overlap; the two-thread pattern the NVENC guide mandates was never
tried. Recovers the depth-1 serialization that produces ~50 fps, up to the priority ceiling. (§5.B)
4. **Auto-gated REALTIME GPU priority.** Our `LocalSystem` service *can* grant it (most apps can't).
Gate on HAGS-state + VRAM headroom to dodge the documented NVENC freeze. (§5.C)
5. **Lock clocks / pin P-state** for jitter (cheap; fixes the light-scene "200-not-240", not the
collapse). (§5.E)
6. **If source-bound: swapchain-hook capture** (OBS-style) — the real escape from the compose
ceiling. Big lift, anti-cheat tradeoffs. (§5.F)
7. **The honest endgame for demanding titles: encode on a second GPU / the iGPU.** The only approach
that *removes* contention instead of re-prioritizing it. We already have AMF/QSV paths. (§5.G)
---
## 1. Corrections to `host-latency-plan.md` (read before reusing it)
The old doc was right about the shape but several specifics are now wrong or stale:
- **"Windows already feeds NVENC YUV on the video engine, so it does the right thing."** True for the
DDA and WGC paths — **false for IDD-push, which is now the install default** and feeds NVENC
**RGB**, paying the SM-side CSC the old doc said Windows had eliminated. The default path
*regressed* on the exact axis the doc celebrated. (§5.A, `capture/windows/idd_push.rs:545-551,743`)
- **"`PUNKTFUNK_ENCODE_DEPTH` (default 4, ≤6) deep-pipelines."** **There is no such knob.** It exists
only in two stale comments (`encode/windows/nvenc.rs:30`, `capture/windows/wgc.rs:57`) and is never
parsed. The real depth knob is `PUNKTFUNK_IDD_DEPTH` (default 2), used only by IDD-push on the
native path; GameStream and the WGC helper are hardcoded depth-1.
- **"Async NVENC is measure-gated and probably stacks latency (Tier 3D)."** The measurement that
produced that verdict (`capture/windows/wgc_helper.rs:131-135`) pipelined **on a single thread**:
it queued more frames but still blocked `lock_bitstream` inline, so it added queue latency with
**zero overlap**. That is not the pattern the NVENC guide prescribes (submit/retrieve on
*separate* threads). The correct async pipeline is **untried**, not disproven. (§5.B)
- **"More GPU priority is maxed and hits a hard preemption wall with no recourse."** Half right.
Priority *is* near-maxed (HIGH), but the "no recourse" intuition is wrong: a **higher-priority GPU
context does preempt a saturating graphics context at pixel granularity** — that is precisely how
NVIDIA VR Async-TimeWarp injects a frame into a busy game
([VRWorks Context Priority](https://developer.nvidia.com/vrworks/headset/contextpriority)). And we
default to HIGH, leaving **REALTIME unused** even though our SYSTEM service can grant it. (§5.C)
- **"Force Composed Flip / double-refresh recovers the 'capture sees half the frames' loss."** The
"half the frames" effect is **specifically a DLSS-Frame-Generation flip-metering artifact**
(FG v310.x+ / RTX 50-series), *not* a general property of independent-flip games — normal
fullscreen flip games are captured at full rate by DDA. So composed-flip is a **narrow** fix, not a
general lever. ([Apollo #676 — DDA captured a flip game at full 120 fps](https://github.com/ClassicOldSong/Apollo/issues/676),
[Sunshine #3621 — version-pinned to FG 310.x](https://github.com/LizardByte/Sunshine/issues/3621))
- **"NvFBC is a possible low-overhead capture path."** **Dead on Windows** — deprecated, frozen at
Capture SDK 7.1 / Win10-1803
([NVIDIA deprecation bulletin](https://developer.download.nvidia.com/designworks/capture-sdk/docs/NVFBC_Win10_Deprecation_Tech_Bulletin.pdf)).
Linux-only, and there only via the consumer `keylase` patch.
What the old doc got right and still holds: feeding NVENC RGB is backwards; the source/compose ceiling
is real and upstream of encode; split-encode is a pixel-rate lever, not a contention lever; and there
is an honest residual ceiling at 100% GPU. Those carry forward.
---
## 2. How the pipeline actually serializes today (verified against current code)
The capture→encode loop is a **fixed-cadence pacer** (`gamestream/stream.rs:375-480`,
`punktfunk1.rs:2430-2540`): every `1/target_fps` tick it grabs the freshest frame with a
**non-blocking** `try_latest()`, and **if nothing new arrived it re-encodes the held frame** (a
near-empty P-frame). So the **outbound fps is pinned at `target_fps` no matter what the source did**,
which is *why the raw fps counter lies* under contention. The only honest signal is the `uniq` /
`diag_new` counter (`stream.rs:380`, `punktfunk1.rs:2433-2436`), and the code itself states the
diagnostic: *"low new_fps at high send rate ⇒ the source isn't producing frames, not an encode
stall"* (`punktfunk1.rs:2466-2468`).
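
The pacer behaviour described above can be sketched in a few lines. This is a minimal illustration, not the code in `stream.rs` or `punktfunk1.rs`: the `try_latest` helper here is a stand-in built on a standard-library channel, and `PaceStats`, `run_pacer` and the `before_tick` hook are hypothetical names introduced only so the example is deterministic.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Counters mirroring the doc's `fps` (frames sent) and `uniq` (new source frames).
#[derive(Debug, Default, PartialEq)]
pub struct PaceStats {
    pub sent: u32,
    pub uniq: u32,
}

/// Drain the channel without blocking and keep only the freshest frame.
/// Stand-in for the host's `try_latest()`; its real signature is an assumption.
fn try_latest<T>(rx: &Receiver<T>) -> Option<T> {
    let mut newest = None;
    loop {
        match rx.try_recv() {
            Ok(frame) => newest = Some(frame),
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => return newest,
        }
    }
}

/// Run `ticks` pacer ticks. `before_tick(i)` lets a caller inject source frames;
/// a real pacer would instead sleep until the next 1/target_fps deadline.
pub fn run_pacer<T>(
    rx: &Receiver<T>,
    ticks: u32,
    mut before_tick: impl FnMut(u32),
    mut encode: impl FnMut(&T, bool),
) -> PaceStats {
    let mut stats = PaceStats::default();
    let mut held: Option<T> = None;
    for i in 0..ticks {
        before_tick(i);
        let fresh = match try_latest(rx) {
            Some(frame) => {
                held = Some(frame);
                true
            }
            None => false,
        };
        // Nothing new: re-encode the held frame, so `sent` keeps ticking
        // at the target rate while `uniq` only counts real source frames.
        if let Some(frame) = &held {
            encode(frame, fresh);
            stats.sent += 1;
            if fresh {
                stats.uniq += 1;
            }
        }
    }
    stats
}
```

With a source that produces one frame every fourth tick, eight ticks send eight frames of which only two are unique, which is the gap between `fps` and `uniq` that the diagnostic relies on.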
The encode round-trip (NVENC, the dominant path):
- `submit` → `encode_picture` (`encode/windows/nvenc.rs:722`) is a **non-blocking** ASIC launch; it
pushes onto a `pending` FIFO.
- `poll` → `lock_bitstream` (`nvenc.rs:801`) **blocks the same thread** until that frame's encode
completes. The session is **synchronous** — no `enableEncodeAsync`, no completion event.
- The only thread split is **encode-vs-network-send**, never submit-vs-retrieve.
So at depth-1 the loop is strictly serial: `capture (+convert) → submit → block in lock_bitstream →
hand AU to the send thread`. The arithmetic matches the symptom — `1000/17 ≈ 59` and `1000/13 ≈ 77`
fps bracket the observed ~50, the signature of **one frame in flight per round-trip**, not an ASIC
throughput wall.
([independent NVENC latency study: ~7 frames across all presets](https://arxiv.org/html/2511.18688v2))
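
The arithmetic above is easy to reproduce. The sketch below assumes an idealised model in which `depth` frames overlap perfectly; the function names are illustrative and do not exist in the repo.

```rust
/// Upper bound on frames per second when `depth` frames can be in flight and
/// each frame's capture -> convert -> submit -> retrieve round-trip takes
/// `round_trip_ms`. Idealised: assumes the in-flight frames overlap perfectly.
pub fn max_fps(round_trip_ms: f64, depth: u32) -> f64 {
    f64::from(depth) * 1000.0 / round_trip_ms
}

/// Round-trip budget (ms) a strictly serial, depth-1 loop must meet to hold `target_fps`.
pub fn round_trip_budget_ms(target_fps: f64) -> f64 {
    1000.0 / target_fps
}
```

At depth 1, a 17 ms round-trip bounds the loop near 59 fps and a 13 ms round-trip near 77 fps, while a 120 fps target leaves a budget of roughly 8.3 ms per frame. A 13 to 17 ms round-trip therefore cannot reach 120 fps without overlap, whatever the ASIC's raw throughput.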
Where the per-frame GPU work lands, by path (this is the crux of contention):
| Path | Colour-convert | Extra copy | NVENC input | Contended-engine load/frame |
|---|---|---|---|---|
| **IDD-push** (install default) | **none → NVENC internal RGB→YUV on the SM** | `CopyResource` BGRA→out-ring (3D), `idd_push.rs:743` | **BGRA/Rgb10a2** | **highest** (SM CSC + 3D copy) |
| **WGC** (fallback default) | `VideoProcessorBlt` → NV12 on the **video engine**, `wgc.rs:631` | none (encodes pool texture in place) | NV12/P010 | low |
| **DDA** | `VideoProcessorBlt` → NV12 on the **video engine**, `dxgi.rs:1657-1762` | one `CopyResource` (3D) to release the dup fast, `dxgi.rs:3099` | NV12/P010 | medium |
| **Linux NVENC** | **none → NVENC internal RGB→YUV on the SM** (default) | CUDA dev→dev copy + `cuStreamSynchronize` | RGBZ/BGRZ (NV12 only if `PUNKTFUNK_NV12` *and* `PUNKTFUNK_ZEROCOPY`) | high |
Measured magnitude of "RGB vs NV12 to the encoder":
[**RGB input ≈ video-engine 40% + 3D/CUDA 15%; NV12 input ≈ video 26% + 3D 2%**](https://hardforum.com/threads/can-someone-explain-to-me-how-nvenc-obs-work-with-nvidia-gpus-and-the-gpu-load-they-cause.2025896/).
NVENC's guide confirms the mechanism: *"Encoding of RGB contents"* is on the explicit list of
features that **internally use CUDA**
([NVENC prog-guide §Encoder Features using CUDA](https://docs.nvidia.com/video-technologies/video-codec-sdk/13.0/nvenc-video-encoder-api-prog-guide/index.html)).
---
## 3. Diagnose first — cheap, decisive, do before any code
Everything in §5 is gated on knowing whether you're fighting bottleneck (a) or (b). The dev VM
cannot reproduce this — run on the **RTX 4090 Windows box** (and a real NVIDIA Linux box) with an
actual saturating game.
1. **Run with `PUNKTFUNK_PERF=1` and read `uniq` vs `fps`** under CS2 at GPU-100%:
- `fps`≈target but `uniq`→40–50 ⇒ **(b) source ceiling**: the compositor/IDD only produced
40–50 unique frames. No encode/priority fix exceeds that number. Go to §5.F.
- both `fps` and `uniq`→40–50, with `encode_ms` 13–17 ⇒ **(a) feed contention**: the round-trip
is starving. Go to §5.A/B/C.
2. **Classify the game's presentation with [PresentMon](https://github.com/GameTechDev/PresentMon)**:
"Presented FPS" vs "Displayed FPS" and **Presentation Mode** (Hardware: Independent Flip vs
Composed: Flip). Independent-Flip + `uniq` ≪ Presented ⇒ source/flip problem; **Presented FPS
itself** collapsed ⇒ the game is genuinely GPU-bound and no capture trick invents the missing
frames.
3. Log `cap_us` / `enc_us` / `pace_us` p50/p99 alongside to localise the stall.
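
The decision rule in step 1 can be written down as a small classifier. This is a sketch under stated assumptions: the 0.9 tolerance is an arbitrary illustrative threshold, and the `Bottleneck` enum (including the `Both` case for readings that match neither pattern cleanly) is hypothetical, not something the repo defines.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Bottleneck {
    /// Stream tracks the target; nothing to fix.
    Healthy,
    /// (a) `fps` and `uniq` collapse together: the encode round-trip is starved.
    FeedContention,
    /// (b) send rate holds at target while unique frames collapse.
    SourceCeiling,
    /// Send rate is below target and `uniq` is well below `fps`: treat as both.
    Both,
}

/// Illustrative tolerance for "approximately equal"; tune on real traces.
const TOLERANCE: f64 = 0.9;

pub fn classify(target_fps: f64, fps: f64, uniq: f64) -> Bottleneck {
    let sending_at_target = fps >= target_fps * TOLERANCE;
    let uniq_tracks_fps = uniq >= fps * TOLERANCE;
    match (sending_at_target, uniq_tracks_fps) {
        (true, true) => Bottleneck::Healthy,
        (true, false) => Bottleneck::SourceCeiling,
        (false, true) => Bottleneck::FeedContention,
        (false, false) => Bottleneck::Both,
    }
}
```

For example, `classify(240.0, 240.0, 45.0)` yields `SourceCeiling` (go to §5.F), while `classify(240.0, 50.0, 50.0)` yields `FeedContention` (go to §5.A/B/C).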
> **Necessary-but-not-sufficient caveat:** if the game only *rendered* 50 frames because it's
> GPU-bound, **nothing downstream creates the other 90**. Source fixes address (b) only; the
> throughput of a saturated single GPU is split between game and host no matter what.
---
## 4. Current-state audit (what's shipped / regressed / missing)
| Area | State | Where |
|---|---|---|
| Thread priority (Win) | HIGH class + MMCSS "Games" + 1 ms timer | `session_tuning.rs` ✅ |
| Thread priority (Linux) | `setpriority` 10/5 — **native path only; GameStream Linux threads get none** | `punktfunk1.rs:1977` ⚠ |
| GPU sched priority | `D3DKMTSetProcessSchedulingPriorityClass` **HIGH(4)** default; `realtime` opt-in, no auto-gate; cross-process onto WGC helper | `capture/windows/dxgi.rs:208-330` ⚠ |
| GPU thread/latency | `SetGPUThreadPriority(0x4000001E)`, `SetMaximumFrameLatency(1)` | `dxgi.rs:193-200` ✅ |
| CSC off-SM (Win SDR) | WGC/DDA video-engine NV12 ✅ — **IDD-push (default) RGB→SM ✗** | `wgc.rs:631` / `idd_push.rs:545` |
| CSC off-SM (Win HDR) | on-SM unless `PUNKTFUNK_HDR_SHADER_P010` (default **off**) | `wgc.rs:603` ⚠ |
| CSC off-SM (Linux) | RGB→SM by default; NV12 is **double-opt-in** (`PUNKTFUNK_NV12`+`PUNKTFUNK_ZEROCOPY`) | `encode/linux/mod.rs:104` ⚠ |
| Encode pipeline | depth-1 synchronous, inline `lock_bitstream`; IDD-push native = depth-2 same-thread | `nvenc.rs:801` ⚠ |
| Split-encode | 2-way >1 Gpix/s (HEVC/AV1); disabled 10-bit (correct); proper enum | `nvenc.rs:424-447` ✅ |
| Zero-copy register-in-place | yes (no encoder-owned pool copy) — IDD-push adds its own out-ring copy | `nvenc.rs:623` ✅/⚠ |
| AMF tuning | `usage=ultralowlatency`, `preanalysis=false` | `ffmpeg_win.rs:215-219` ✅ |
| QSV tuning | `async_depth=1`, `low_power=1` (VDEnc) | `ffmpeg_win.rs:226-227` ✅ |
| Intra-refresh / infinite GOP | yes (killed the periodic-IDR freeze) | ✅ |
| encode\|send split + paced send + sendmmsg + 32 MB sockbuf | yes | `stream.rs`, `transport/qos.rs` ✅ |
| **Clock / P-state pin** | **none** (zero hits repo-wide) | ✗ |
| **Async NVENC (2-thread)** | **none** | ✗ |
| **Frame-source escape (hook/NvFBC-Linux)** | **none** | ✗ |
| **Second-GPU / iGPU encode offload** | **none** | ✗ |
| DSCP/QoS | implemented, `PUNKTFUNK_DSCP` opt-in (default off) | `transport/qos.rs` ⚠ |
---
## 5. The levers, ranked, with honest verdicts
### A. Stop feeding NVENC RGB on the default path — **highest in-our-control win**
The default Windows capture path (IDD-push) and the default Linux path both hand NVENC packed RGB,
forcing NVENC's internal RGB→YUV CSC onto the SM the game saturates. The WGC and DDA paths already
solved this by doing the CSC with `ID3D11VideoProcessor::VideoProcessorBlt` (video engine) and
feeding NV12/P010. **Make IDD-push and Linux do the same.**
- **Windows IDD-push:** add a `VideoProcessorBlt` BGRA→NV12 (SDR) / FP16→P010 (HDR) step into the
out-ring, exactly like `wgc.rs:631` / `dxgi.rs:1657-1762`, and feed `NV_ENC_BUFFER_FORMAT_NV12` /
`..._YUV420_10BIT`. This *also* lets you drop the separate `CopyResource` (the convert writes the
out-ring), removing **both** contended-engine ops per frame. Plug it into `SessionPlan`
(`session_plan.rs`, the single owner of the capture/encode decision) so capture and encode can't
disagree on the format.
- **Linux:** make NV12 the **default** for the tiled zero-copy path (it's gated behind
`PUNKTFUNK_NV12` *and* `PUNKTFUNK_ZEROCOPY` today — `encode/linux/mod.rs:104`,
`linux/zerocopy/egl.rs:272`), and feed NVENC `NV_ENC_BUFFER_FORMAT_NV12`. The GL detile already
runs; emitting NV12 from it replaces the swizzle at ~equal cost and deletes NVENC's CSC.
- **Windows HDR:** flip `PUNKTFUNK_HDR_SHADER_P010` on by default (or, better, use a video-engine
P010 convert where the VP supports it).
**Verdict: REAL, but honestly *conditional*.** Feeding NV12 provably removes NVENC's internal CUDA
CSC — but the convert has to land **off** the SM to fully pay off. `VideoProcessorBlt` is *designed*
to use fixed-function video hardware and the hardforum numbers back the 15%→2% drop, **but no NVIDIA
doc explicitly confirms `VideoProcessorBlt` runs off-SM on GeForce** — treat the "video engine" claim
as well-founded-but-unverified and confirm on-box with `nvidia-smi dmon` (watch the `enc`/`sm`
columns) before and after. Do **not** convert with a CUDA/3D shader and call it done — that just
relocates the CSC to the same SM (Sunshine's RGB→NV12 CUDA kernel still contends).
### B. A *correct* async encode pipeline (the untried encoder lever)
The NVENC Programming Guide is explicit: *"The main encoder thread should be used only to submit
work… (non-blocking `NvEncEncodePicture`). Output buffer processing — waiting on the completion
event in asynchronous mode, or calling `NvEncLockBitstream` in synchronous mode — should be done in
the **secondary thread**."*
([NVENC prog-guide, threading model](https://docs.nvidia.com/video-technologies/video-codec-sdk/13.0/nvenc-video-encoder-api-prog-guide/index.html))
We do the opposite — submit and blocking-retrieve on **one** thread. Queuing more `pending` entries
(IDD-push depth-2, or the abandoned wgc_helper experiment) adds queue latency with **no overlap**,
which is exactly the "deeper pipeline only stacks latency" result we recorded. It was the wrong
implementation, not a disproof.
The fix: **submit on the capture/encode thread; do `lock_bitstream` on a dedicated retrieve thread;
hold a deep input+output surface pool (≈4–8); on Windows register a `completionEvent` per output
buffer (`enableEncodeAsync=1`) — on Linux async events are unsupported, so use the same two-thread
split with a blocking retrieve.**
([async is Windows/WDDM-only](https://docs.nvidia.com/video-technologies/video-codec-sdk/13.0/nvenc-video-encoder-api-prog-guide/index.html);
FFmpeg models the same knob as `delay`/`async_depth`,
[libavcodec/nvenc.c](https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/nvenc.c)).
This lets the WDDM scheduler find a **backlog** when it finally grants the encoder context a slice,
and drain several frames back-to-back, while the ASIC encodes frame N as the contended engines do
frame N+1's convert.
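
The shape of that two-thread split can be shown with standard-library threads and channels. This is a structural sketch only: no NVENC call appears in it, a sleeping thread stands in for the ASIC, a per-frame channel stands in for the completion event or blocking `lock_bitstream`, and `InFlight` and `encode_pipelined` are hypothetical names.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical handle for one in-flight encode; stands in for an output buffer
/// plus whatever the retrieve side waits on (event or blocking lock).
struct InFlight {
    frame_no: u32,
    done: Receiver<Vec<u8>>,
}

/// Submit `frames` frames while a dedicated thread retrieves them.
/// Returns the frame numbers in the order their bitstreams were retrieved.
pub fn encode_pipelined(frames: u32, depth: usize) -> Vec<u32> {
    // Bounded queue models the surface pool: the submit side blocks once
    // roughly `depth` frames are waiting to be retrieved.
    let (pending_tx, pending_rx) = sync_channel::<InFlight>(depth);

    // Retrieve thread: the only place that blocks on encode completion.
    let retriever = thread::spawn(move || {
        let mut retrieved = Vec::new();
        for job in pending_rx {
            let _bitstream = job.done.recv().expect("fake encoder dropped a frame");
            retrieved.push(job.frame_no);
        }
        retrieved
    });

    // Submit side: launches work and moves on without waiting for the result.
    for frame_no in 0..frames {
        let (done_tx, done_rx) = channel();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(2)); // pretend ASIC time
            let _ = done_tx.send(vec![0u8; 16]); // pretend bitstream
        });
        pending_tx
            .send(InFlight { frame_no, done: done_rx })
            .expect("retrieve thread exited early");
    }
    drop(pending_tx); // closing the queue lets the retrieve thread finish
    retriever.join().expect("retrieve thread panicked")
}
```

The point of the design is that the submit loop never waits on a specific frame's completion; it only stalls when the bounded queue is full, which is the back-pressure a finite surface pool would impose. Output order stays FIFO because the retrieve thread waits on jobs in submission order.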
**Verdict: REAL throughput recovery for the depth-1 collapse, latency cost +1–2 frames, ceiling-bounded.**
The honest bound (and why this is *second* to §A/§C): pipelining cannot manufacture GPU time — if the
scheduler grants the encode context only X% under load, depth only guarantees work is *ready* for
each grant; it can't raise X. That is why Sunshine's documented lever for "GPU heavily loaded" is
**priority**, not depth. So §B recovers the serialization loss; §A/§C raise the share it's bounded by.
Watch out: this **forecloses sub-frame slice output** (mutually exclusive with `enableEncodeAsync`),
and HAGS can spike the *submit* call itself
([100–200 ms `nvEncEncodePicture` stalls under HAGS](https://forums.developer.nvidia.com/t/windows-11-hardware-accelerated-gpu-scheduling-issue/286128)).
### C. Auto-gated REALTIME GPU scheduling priority
Raising the host process's WDDM GPU priority is **the** proven single-PC production lever — OBS and
Sunshine both set `D3DKMT_SCHEDULINGPRIORITYCLASS_REALTIME` to stop being descheduled behind
fullscreen games
([OBS commit](https://github.com/obsproject/obs-studio/commit/ec769ef008b748f7dfba211daec9eb203ea4bea0),
[Sunshine `display_base.cpp`](https://raw.githubusercontent.com/LizardByte/Sunshine/master/src/platform/windows/display_base.cpp)).
It works **independently of HAGS** (HAGS does *not* reassign cross-process priority — Microsoft:
*"Windows continues to control prioritization"*
[DirectX devblog](https://devblogs.microsoft.com/directx/hardware-accelerated-gpu-scheduling/)).
We ship only **HIGH(4)** by default with a static `realtime` opt-in and **no auto-gate**. Two things
to change:
- **We can actually grant REALTIME.** It needs `SeIncreaseBasePriorityPrivilege`, which an unelevated
app lacks (OBS logs the failure) — **but our host runs as a `LocalSystem` service, which holds it.**
The lever is available to us specifically.
- **Gate it to dodge the freeze.** REALTIME + NVIDIA + HAGS-on + near-full-VRAM is a **documented
NVENC hang** (Sunshine ships `nvenc_realtime_hags` to downgrade to HIGH for exactly this;
[Sunshine config](https://docs.lizardbyte.dev/projects/sunshine/latest/md_docs_2configuration.html),
[NVIDIA repro](https://forums.developer.nvidia.com/t/bug-report-nvenc-encoder-hangs-on-windows-when-using-d3d11-in-real-time-mode/357466)).
Implement the old plan's "Tier 3B": probe HAGS via `D3DKMTQueryAdapterInfo` and VRAM headroom via
`IDXGIAdapter3::QueryVideoMemoryInfo` (continuously); use REALTIME only when HAGS-off, or HAGS-on
with comfortable VRAM headroom; downgrade to HIGH the instant VRAM tightens.
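
The gate itself is a pure decision over two probed values, so it can be sketched without any Windows API. In the sketch below the probes are assumed to have already run: `vram_budget_bytes` and `vram_used_bytes` are meant to correspond to the `Budget` and `CurrentUsage` fields that `QueryVideoMemoryInfo` reports, and the 15% headroom threshold, `GateInput` and `choose_priority` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuPriorityClass {
    High,
    Realtime,
}

/// Snapshot of the two probes; re-sample continuously and re-run the gate.
pub struct GateInput {
    pub hags_enabled: bool,
    pub vram_budget_bytes: u64,
    pub vram_used_bytes: u64,
}

/// Illustrative definition of "comfortable" headroom; the real value needs on-box tuning.
const MIN_HEADROOM_FRACTION: f64 = 0.15;

pub fn choose_priority(input: &GateInput) -> GpuPriorityClass {
    if !input.hags_enabled {
        // HAGS off: the documented hang condition does not apply.
        return GpuPriorityClass::Realtime;
    }
    if input.vram_budget_bytes == 0 {
        // Probe failed or reported nothing: stay on the safe class.
        return GpuPriorityClass::High;
    }
    let free = input.vram_budget_bytes.saturating_sub(input.vram_used_bytes);
    let headroom = free as f64 / input.vram_budget_bytes as f64;
    if headroom >= MIN_HEADROOM_FRACTION {
        GpuPriorityClass::Realtime
    } else {
        GpuPriorityClass::High
    }
}
```

Calling this on every VRAM sample gives the "downgrade the instant VRAM tightens" behaviour; a production version would probably add hysteresis so the class does not flap around the threshold.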
**Verdict: REAL — the genuine ceiling-raiser — but it is the no-free-lunch lever.** Priority is how
the host *takes* GPU time from the game; it measurably **costs the game fps**
([Doom Eternal 121→60 with Sunshine running](https://github.com/LizardByte/Sunshine/issues/3703)).
That's acceptable for a streaming host (the remote view is the product), but say so plainly and make
the class operator-configurable (we already expose `PUNKTFUNK_GPU_PRIORITY_CLASS`).
### D. Multi-vendor encoder hygiene (AMF/QSV) — mostly done, one caveat
Our `*_amf`/`*_qsv` libavcodec config already follows the research's advice: AMF
`usage=ultralowlatency` + `preanalysis=false` (`ffmpeg_win.rs:215`), QSV `async_depth=1` +
`low_power=1` VDEnc path (`:226`). Keep them. Two notes:
- **AMF/QSV suffer contention *worse* than NVENC.** OBS: *"For Intel and AMD GPUs, the hardware
encoder requires significant resources of the same type a 3D app/game requires… different from
NVIDIA's NVENC, which has dedicated encoding circuits"*
([OBS KB](https://obsproject.com/forum/threads/how-to-debug-encoding-overloaded.168625/)). So on an
AMD/Intel host the collapse is *expected to be harder* — and §G (iGPU offload) is even more
attractive there.
- **The AMF busy-poll floor** (a fixed-sleep `QueryOutput` poll imposes ~15 ms via timer
granularity) is fixed in FFmpeg's amf wrapper (Cameron Gutman's `QUERY_TIMEOUT` patch); since we
go through libavcodec we inherit it — just **confirm the pinned FFmpeg build includes it**.
([ffmpeg-devel](https://www.mail-archive.com/ffmpeg-devel@ffmpeg.org/msg170489.html))
**Verdict: REAL but largely already captured.** No big win left here except via §G.
### E. Lock clocks / pin P-state — cheap jitter fix, not a collapse fix
NVIDIA's adaptive clocking downclocks between our small bursty frames and pays a ramp tax every
frame — most visible in the *light* scene (the "200-not-240"). Pin it:
- **Windows:** NvAPI per-application DRS `PREFERRED_PSTATE = PREFER_MAX` scoped to our exe (this is
exactly Sunshine's `nvenc_latency_over_power`,
[Sunshine nvprefs](https://github.com/LizardByte/Sunshine/blob/master/src/platform/windows/nvprefs/driver_settings.cpp)).
**Crash-safe undo is mandatory** — persist an undo record to `%ProgramData%\punktfunk\` *before*
applying, revert a stale profile on next start, so a crash never leaves the user's control panel
modified.
- **Linux:** `nvidia-smi -lgc`/NVML `nvmlDeviceSetGpuLockedClocks` (needs root/`CAP_SYS_ADMIN`; query
`nvmlDeviceGetMaxClockInfo`, lock to that, restore on teardown *and* SIGTERM). Plus the newly-added
`CudaNoStablePerfLimit` driver profile — *new in R580/595, so usable on the 595 box* — to defeat
the CUDA "Force P2" memory-clock clamp.
- Gate behind `PUNKTFUNK_PIN_CLOCKS`; **default off on battery / Steam Deck** (pinning is harmful
there).
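
The crash-safe undo pattern for the Windows bullet can be sketched independently of NvAPI. Everything named here is an assumption for illustration: the file names, the idea of storing the previous setting as a plain string, and the `apply` / `restore` callbacks that would wrap the real driver-profile calls.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

const UNDO_FILE: &str = "pstate-undo.txt"; // hypothetical record name
const UNDO_TMP: &str = "pstate-undo.tmp";

/// Durably record the previous setting, and only then apply the new one.
pub fn apply_with_undo(
    dir: &Path,
    previous: &str,
    apply: impl FnOnce() -> io::Result<()>,
) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let tmp = dir.join(UNDO_TMP);
    let mut file = File::create(&tmp)?;
    file.write_all(previous.as_bytes())?;
    file.sync_all()?; // flush to disk before the setting changes
    drop(file);
    // Rename so a crash never leaves a half-written record under the final name.
    fs::rename(&tmp, dir.join(UNDO_FILE))?;
    apply()
}

/// Call at startup: if a record exists, an earlier run never cleaned up.
/// Restores the recorded value, clears the record, and returns `true`.
pub fn recover_stale(
    dir: &Path,
    restore: impl FnOnce(&str) -> io::Result<()>,
) -> io::Result<bool> {
    let path = dir.join(UNDO_FILE);
    match fs::read_to_string(&path) {
        Ok(previous) => {
            restore(&previous)?;
            fs::remove_file(&path)?;
            Ok(true)
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}
```

A clean teardown would call the same restore path and delete the record, so the only way a record survives to the next start is a crash or kill while the pin was active.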
**Verdict: REAL for latency *stability*, marginal for the saturated collapse** (at 100% util the game
already pins P0). Cheap, low risk, do it for the light-scene win.
### F. Escape the frame-source ceiling — only if §3 says (b)
If `uniq` is the wall, no encoder/priority work helps — you need a better frame source.
- **Swapchain-hook capture (the real fix).** Inject a hook on `IDXGISwapChain::Present`/`Present1`,
`vkQueuePresentKHR`, `wglSwapBuffers` and copy the backbuffer to a shared texture *before* the
compositor — OBS Game Capture's mechanism. Sees **every presented frame**, no compose/refresh
gating.
([OBS dxgi-capture](https://github.com/obsproject/obs-studio/blob/master/plugins/win-capture/graphics-hook/dxgi-capture.cpp))
**Tradeoffs are serious:** anti-cheat (EAC/BattlEye/Vanguard) flags injection — needs
whitelisting/compat handling; per-graphics-API hooks; fragility across game updates. Scope it as an
opt-in "game capture" mode, not the default.
- **NvFBC:** **not an option on Windows** (dead, §1). On **Linux** it's viable via the consumer
keylase patch and captures below composition — worth a flag for the Linux NVIDIA host.
- **Compose-flip (narrow):** the topmost 1×1 layered-window trick (we already have
`composed_flip.rs`) forces DWM composition and fixes specifically the **DLSS-Frame-Gen** half-rate
case. Adds host-display latency; don't enable globally.
- **WGC "deliver 2× rate":** Apollo sets `MinUpdateInterval = 1e7/(fps*2)` so the pacer always has a
fresh frame to pick ([Apollo](https://github.com/ClassicOldSong/Apollo/pull/785)); we set it to 1×
refresh (`wgc.rs:310`). Cheap tweak to try on the WGC path.
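
For the WGC bullet, the `1e7/(fps*2)` expression is an interval in 100 ns ticks (ten million per second). A minimal sketch of that arithmetic, with a hypothetical helper name and the rate multiplier made explicit so 1× and 2× can be compared:

```rust
/// 100 ns ticks per second, the unit the `1e7/(fps*2)` formula is expressed in.
const TICKS_PER_SECOND: u64 = 10_000_000;

/// Minimum update interval in 100 ns ticks for `fps * rate_multiplier` deliveries per second.
/// Returns `None` if either input is zero, since the interval would be undefined.
pub fn min_update_interval_ticks(fps: u32, rate_multiplier: u32) -> Option<u64> {
    let deliveries_per_second = u64::from(fps) * u64::from(rate_multiplier);
    TICKS_PER_SECOND.checked_div(deliveries_per_second)
}
```

At a 120 fps target, the 1× setting gives 83 333 ticks (about 8.33 ms) and the 2× setting gives 41 666 ticks (about 4.17 ms), so the pacer's tick is more likely to find a frame newer than the one it already holds.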
**Verdict: swapchain-hook is REAL and the only general escape; the rest are narrow.** None invents
frames the game didn't render.
### G. The honest endgame — encode on a second GPU / the iGPU
For *demanding* titles that saturate the GPU even when capped, the only thing that **removes**
contention rather than re-prioritizing it is to run the capture→convert→encode pipeline on a
**different** GPU — a second dGPU or, more realistically, the **iGPU** (Intel QuickSync / AMD VCN),
which most desktops already have. Render on the gaming GPU, copy the frame across the adapter once,
encode on the iGPU's independent media engine. This is the textbook "stream on a separate encoder"
play, and the OBS "second GPU is harmful" verdict does **not** apply — that verdict is about moving
*only the NVENC block*; moving capture + CSC + copies off the gaming GPU genuinely frees it.
([OBS forum](https://obsproject.com/forum/threads/can-you-use-a-2nd-gpu-to-eliminate-encoder-overload.149644/))
We're unusually well-placed for this: we already have working AMF and QSV backends
(`encode/windows/ffmpeg_win.rs`) and the Linux VAAPI backend. The missing piece is a capture/topology
mode that pins capture to the gaming adapter and the encoder to the iGPU adapter, with one
cross-adapter shared-texture copy. Cost: that copy still shares VRAM bandwidth, so it's not free, but
it's the only path that lets a demanding game and a clean stream coexist on one machine.
**Verdict: REAL — the cleanest isolation, and the right answer to "even capped it collapses."**
Datacenter stacks (GeForce NOW, Stadia) "solve" this by one dedicated GPU + encoder per session;
the consumer analogue is the iGPU.
---
## 6. Recommended order of attack
1. **§3 Diagnose** on the RTX box + a real game. Settles (a) vs (b). *(half a day, decisive)*
2. **§5.A NV12/P010 on the default paths** (IDD-push video-engine convert; Linux NV12 default-on;
Windows HDR P010 default). Biggest in-our-control floor-raise; confirm off-SM with `nvidia-smi dmon`.
3. **§5.C Auto-gated REALTIME** priority (HAGS + VRAM gate). Cheap, big, we can uniquely grant it.
4. **§5.E Clock pin** both OSes (crash-safe undo). Cheap light-scene win.
5. **§5.B Correct two-thread async pipeline.** Structural; recovers the depth-1 serialization.
6. **§3-gated §5.F** source escape (swapchain hook) — only if `uniq` is the wall.
7. **§5.G iGPU encode offload** — the strategic answer for demanding titles; larger build.
After steps 2–5 the light-scene gap closes and the saturated floor rises materially. But report the
honest ceiling: **on one saturated GPU the game and the host split a fixed pie** — coarse WDDM
graphics preemption caps how much priority can claw back, and a genuinely GPU-bound game that only
*rendered* 50 frames cannot also yield 140 unique frames to capture. The only escapes from that pie
are reducing the game's demand (cap — rejected), taking a bigger slice (priority — costs game fps),
or a second slice of silicon (§G). Don't chase the rest with encoder micro-optimisation.
---
## 7. Placebos & dead ends (so we don't re-propose them)
| Candidate | Verdict | Why |
|---|---|---|
| **NVIDIA Reflex / Ultra-Low-Latency / max-pre-rendered-frames** as a "non-capping yield" | ✗ placebo | Shrinks the *game's* render queue but the game still demands ~99% GPU → frees ≈0 SM headroom. Reflex needs in-game SDK (host can't force it); ULLM is host-forceable only on DX11/DX9 (DX12 since driver 551.23) and is NVIDIA's weaker mechanism. Only honest effect: µs of tail-jitter smoothing. ([Battle(non)sense LDAT data](https://forums.guru3d.com/threads/battle-non-sense-youtuber-claims-low-latency-mode-only-helps-when-gpu-load-is-99.429074/)) |
| **HAGS on, as a contention fix** | ✗ neutral→harmful | Doesn't reassign cross-process priority (Microsoft); OBS reports it *causes* NVENC latency spikes; it's the freeze-hazard variable. Needed only to enable the VK/D3D12 realtime *queue*. ([OBS KB](https://obsproject.com/kb/hags)) |
| **Split-frame encode (2/3/4-way) to fix contention** | ✗ (pixel-rate only) | Parallelizes the ASIC, not the contended copy/CSC; measured **zero** latency change at 4K. Correct use = raise the single-session pixel ceiling (5K@240). `splitEncodeMode=15` is the legit *disable* sentinel, not a bug. ([SDK header](https://raw.githubusercontent.com/FFmpeg/nv-codec-headers/master/include/ffnvcodec/nvEncodeAPI.h)) |
| **Move the encoded-bitstream readback to a copy engine** | ✗ placebo | Output is KB-scale; the cost of `lock_bitstream` is the completion *wait*, not copy bandwidth. (The *input* full-frame copy is the real one — but D3D11 can't target the copy engine; zero-copy already avoids it.) |
| **CUDA stream priority / `CUDA_DEVICE_MAX_CONNECTIONS` / `CU_CTX_SCHED_*`** | ✗ placebo cross-process | Intra-context only; the game is a *separate* context. Stream priority "will not preempt already executing work". ([CUDA docs](https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/asynchronous-execution.html)) |
| **VK/EGL global-priority REALTIME on Linux NVIDIA** | ✗ | Not reliably granted on the proprietary driver, and moot anyway — our Linux NVENC is driven via CUDA/NVENC-SDK, not a Vulkan queue. |
| **Windows "High performance" GPU preference** | ✗ single-GPU placebo | Only selects an adapter; real only to split work across adapters (→ that's §5.G). |
| **MIG / MPS / vGPU** | ✗ N/A | MIG/vGPU are datacenter/pro + hypervisor/license; MPS is Linux-CUDA-only with no graphics notion. None apply to a consumer GPU. |
| **NvFBC on Windows** | ✗ dead | Deprecated, frozen at Capture SDK 7.1 / Win10-1803. |
| **Frame Generation / Smooth Motion** to "make more frames" | ✗ red herring | We stream *rendered* frames; FG adds optical-flow/tensor + present load to the same GPU → amplifies contention. |
---
## 8. Open evidence gaps (flagged honestly)
- Whether `ID3D11VideoProcessor::VideoProcessorBlt` (BGRA→NV12) runs **off the SM on GeForce** is not
confirmed by any NVIDIA document — it's the linchpin of §5.A's full payoff. **Verify on-box** with
`nvidia-smi dmon` (sm% vs enc%) on the WGC path before assuming IDD-push will match it.
- The exact share of the 13–17 ms `encode_ms` that is *convert-on-SM* vs *scheduling-wait* is
unmeasured. §3 + an A/B of IDD-push-RGB vs IDD-push-NV12 on the same scene settles it and tells you
whether §5.A alone is enough or whether §5.C is doing the heavy lifting.
- AMD VCN "degrades worse under contention" is practitioner-consensus + architecture, not an AMD
whitepaper; treat the *direction* as solid, the magnitude as TBD.
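
For the on-box `nvidia-smi dmon` check in the first bullet, a small parser makes the sm% vs enc% comparison repeatable across the A/B runs. This is a sketch under assumptions: it expects the usual `dmon` text layout (a `#`-prefixed header naming the columns, then whitespace-separated rows, as produced by e.g. `nvidia-smi dmon -s u`) and finds `sm` and `enc` by header name, since the column set varies between driver versions. `UtilSample`, `parse_dmon` and `mean_sm_enc` are illustrative names.

```rust
/// One row of utilisation data: SM and encoder busy percentages.
#[derive(Debug, PartialEq)]
struct UtilSample {
    sm: u32,
    enc: u32,
}

/// Parse `nvidia-smi dmon` text, locating the `sm` and `enc` columns by header name.
/// Rows with missing or non-numeric fields (e.g. "-") are skipped.
fn parse_dmon(output: &str) -> Vec<UtilSample> {
    let mut columns: Option<(usize, usize)> = None;
    let mut samples = Vec::new();
    for line in output.lines() {
        let line = line.trim();
        if let Some(header) = line.strip_prefix('#') {
            let names: Vec<&str> = header.split_whitespace().collect();
            let sm = names.iter().position(|n| *n == "sm");
            let enc = names.iter().position(|n| *n == "enc");
            if let (Some(s), Some(e)) = (sm, enc) {
                columns = Some((s, e));
            }
            continue; // the units header ("# Idx % % ...") lands here too
        }
        let Some((s, e)) = columns else { continue };
        let fields: Vec<&str> = line.split_whitespace().collect();
        let sm = fields.get(s).map(|f| f.parse::<u32>());
        let enc = fields.get(e).map(|f| f.parse::<u32>());
        if let (Some(Ok(sm)), Some(Ok(enc))) = (sm, enc) {
            samples.push(UtilSample { sm, enc });
        }
    }
    samples
}

/// Mean (sm%, enc%) over a run, or `None` if nothing parsed.
fn mean_sm_enc(samples: &[UtilSample]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let sm: u32 = samples.iter().map(|s| s.sm).sum();
    let enc: u32 = samples.iter().map(|s| s.enc).sum();
    Some((sm as f64 / n, enc as f64 / n))
}
```

Comparing the mean pair between the RGB and NV12 runs of the same scene gives a number to attach to the "off the SM" question instead of an eyeballed terminal.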
+9
@@ -1,5 +1,14 @@
# Host latency & the GPU-contention collapse — analysis + prioritized plan
> **⚠ Partially superseded (2026-06-25) by [`gpu-contention-investigation.md`](gpu-contention-investigation.md).**
> That follow-up re-verified this plan against the current code and overturned several specifics:
> the default Windows path (IDD-push) now feeds NVENC **RGB** (regressing the §0A "Windows does it
> right" claim); `PUNKTFUNK_ENCODE_DEPTH` never existed (phantom knob); the "async NVENC stacks
> latency" result was a *same-thread* implementation, not a disproof of a correct two-thread pipeline;
> "capture sees half the frames" is DLSS-Frame-Gen-specific, not general; and NvFBC is dead on
> Windows. Use the new doc's ranked action list. The tiers/dropped-placebo analysis below remains a
> useful record.
Scope: Windows + Linux GameStream/punktfunk1 hosts. Priority: **latency**, and specifically the
"saturating game starves the stream" headache:
+16 -9
@@ -34,7 +34,7 @@ which kept the live-validated host working at every step. The driver, by contras
|---|---|---|
| **Goal 1** — clean, layered host architecture | ✅ **DONE** | `config.rs` (`HostConfig`), `session_plan.rs` (`SessionPlan`), `SessionContext`, `windows/`+`linux/` confinement (`38c68c3`), `VirtualDisplayManager` (§2.5), `EncoderCaps` (`0ccd0fe`) |
| **Goal 2** — drop every trace of SudoVDA | ✅ **DONE** | reach-in decoupled (F1: `d638a93`/`e60cda3``win_adapter`/`win_display`), then the `sudovda.rs` backend + the dual-backend select **deleted** (this branch) — pf-vdisplay is the sole Windows virtual-display backend |
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, also a view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`), on top of `manager.rs`/`pf_vdisplay.rs`; **driver `pod_init!`** (`bf57704`, 27→1). Remaining: host-crate P0 lints (deferred — high churn, low value), the `service.rs` SCM-handler event smuggling, the on-glass-gated `KeyedMutexGuard` hot-loop RAII |
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** (**box-validated**) | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`) + the IDD-push `KeyedMutexGuard` hot loop (`6585643`); **driver `pod_init!`** (`bf57704`, 27→1). **On-glass clean: host clippy `-D warnings` + driver build** (RTX box; `bd05bc8` fixed 11 lints the gate surfaced). Remaining: host-crate P0 lints (deferred — churn>value), the `service.rs` SCM-handler event smuggling (deliberately left) |
| **M0** — proto ABI + driver toolchain + `/INTEGRITYCHECK` + `iddcx` | ✅ **DONE** | `pf-driver-proto`; vendored `windows-drivers-rs` 0.5.1; `clear-force-integrity.ps1`; CI-green |
| **M1** — new IddCx driver, first light + HDR | ✅ **DONE (on-glass)** | STEP 0–8 (`d7a9fbf`–`cd59151`); HDR live ("Mac connects WITH HDR", `6399d28`) |
| **M2** — IDD-push capture + NVENC, glass-to-glass | ✅ **DONE (on-glass)** | 5120×1440@240 HDR zero-copy; integrated into the host path |
@@ -234,14 +234,21 @@ These are expensive empirical wins; keep them intact when touching the code:
duplicated `create_shm_section` + three hand-written `Drop`s). **Remaining (deliberately left):** the
`service.rs` `AtomicIsize` STOP/SESSION events — smuggled into the C SCM handler, a separate riskier
redesign. `manager.rs`/`pf_vdisplay.rs` already used the pattern.
6. **Driver unsafe levers** (the driver is already `deny`-clean with per-site SAFETY; these *reduce count*):
✅ **`pod_init!` macro done** (`bf57704`, 27 `mem::zeroed` → 1). **Skipped `ThreadBound<T>`** — not a clean
win (each `unsafe impl Send` wraps a distinct type; consolidating churns every access for no real safety
gain over the per-struct `// SAFETY:`). **Scratched the IOCTL dispatcher** — `control.rs`'s
`read_input<T>`/`write_output_complete<T>` are already generic helpers with minimal, documented unsafe;
re-factoring would be churn, not reduction. **Remaining (on-glass-gated):** a `KeyedMutexGuard`/
`AcquiredSurface` RAII for the frame-transport hot loop — perf-sensitive, needs an on-glass latency check,
so held rather than rushed blind.
6. **Hot-loop `KeyedMutexGuard` ✅ done** (`6585643`) — the IDD-push consume loop's hand-written
`AcquireSync`/`ReleaseSync` (with its "don't `?`-return between them or you leak the lock + stall the
driver" caveat) is now a RAII guard scoped to the convert/copy block: same release point (latency
unchanged), but leak-proof on any early return. **Driver `pod_init!` ✅** (`bf57704`, 27 `mem::zeroed` →
1). **Skipped `ThreadBound<T>`** (each `unsafe impl Send` wraps a distinct type — churn, no real gain) and
**scratched the IOCTL dispatcher** (`control.rs`'s `read_input<T>`/`write_output_complete<T>` are already
generic with minimal unsafe).
**On-glass build validation (RTX box, 2026-06-26).** Built this branch on the box in an isolated worktree:
**host `cargo clippy -p punktfunk-host --features nvenc -D warnings` = CLEAN**, **driver `cargo build` =
CLEAN** — validating the whole session's Windows + driver work on real hardware. The clippy gate (which the
goal1/§2.5 work never ran — it used `cargo check`) surfaced + fixed 11 lint issues (`bd05bc8`: 9 redundant
`as *mut c_void`, an `if_same_then_else`, an `unused_unsafe` in `pod_init!`). Remaining only a runtime
**latency A/B** for the `KeyedMutexGuard` (provably equivalent — same release point) if a deeper check is
wanted.
7. **D1-host P0 lints — deferred (low value / high churn).** A crate-wide `#![deny(unsafe_op_in_unsafe_fn)]`
produced 100+ FFI-wrap sites across the Linux modules; it *wraps* unsafe (discipline) rather than
reducing it and doesn't improve stability, so it was deprioritized vs the `OwnedHandle`/RAII reductions
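
The hot-loop guard from item 6 can be illustrated with a self-contained sketch. The `KeyedMutex` trait and `MockMutex` below are hypothetical stand-ins for the real keyed-mutex surface (whose `AcquireSync` also takes a timeout, omitted here), and `consume_frame` is an invented name. The point is only the shape: the release lives in `Drop`, so an early return between acquire and release cannot leak the lock.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the keyed-mutex interface (AcquireSync / ReleaseSync).
trait KeyedMutex {
    fn acquire_sync(&self, key: u64) -> Result<(), String>;
    fn release_sync(&self, key: u64);
}

/// RAII guard: acquiring returns the guard, dropping it releases the mutex.
struct KeyedMutexGuard<'a, M: KeyedMutex> {
    mutex: &'a M,
    release_key: u64,
}

impl<'a, M: KeyedMutex> KeyedMutexGuard<'a, M> {
    fn acquire(mutex: &'a M, acquire_key: u64, release_key: u64) -> Result<Self, String> {
        mutex.acquire_sync(acquire_key)?;
        Ok(Self { mutex, release_key })
    }
}

impl<M: KeyedMutex> Drop for KeyedMutexGuard<'_, M> {
    fn drop(&mut self) {
        self.mutex.release_sync(self.release_key);
    }
}

/// Test double that records whether the lock is held and how often it was released.
struct MockMutex {
    held: Cell<bool>,
    releases: Cell<u32>,
}

impl KeyedMutex for MockMutex {
    fn acquire_sync(&self, _key: u64) -> Result<(), String> {
        if self.held.get() {
            return Err("already held".to_string());
        }
        self.held.set(true);
        Ok(())
    }
    fn release_sync(&self, _key: u64) {
        self.held.set(false);
        self.releases.set(self.releases.get() + 1);
    }
}

/// The guard is scoped to the convert/copy block, so the release point is the
/// closing brace on success and the early `return` on failure.
fn consume_frame(mutex: &MockMutex, convert_fails: bool) -> Result<u32, String> {
    let checksum = {
        let _guard = KeyedMutexGuard::acquire(mutex, 1, 0)?;
        if convert_fails {
            return Err("convert failed".to_string()); // guard still releases
        }
        42 // placeholder for the convert/copy result
    };
    Ok(checksum)
}
```

Scoping the guard to an inner block rather than the whole function is what keeps the release point where the hand-written `ReleaseSync` was.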
@@ -66,6 +66,10 @@ macro_rules! pod_init {
($t:ty) => {{
// SAFETY: $t is a C POD (windows-rs/WDK/IddCx struct); its all-zero bit pattern is a valid
// zero-initialised value and the caller sets the required .Size/etc fields immediately after.
unsafe { ::core::mem::zeroed::<$t>() }
// `unused_unsafe`: pod_init! is also expanded at call sites already inside an `unsafe` block
// (where this `unsafe` is redundant), but it IS required at the non-unsafe sites — so allow it.
#[allow(unused_unsafe)]
let zeroed = unsafe { ::core::mem::zeroed::<$t>() };
zeroed
}};
}
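
A standalone sketch of why the `#[allow(unused_unsafe)]` is attached to the `let`: the same expansion has to compile cleanly both from safe code, where the `unsafe` block is required, and from inside an existing `unsafe` block, where it is redundant. `DemoDesc` is a made-up POD struct standing in for a WDK/IddCx descriptor, and the macro is copied here only so the example compiles on its own.

```rust
macro_rules! pod_init {
    ($t:ty) => {{
        // SAFETY: $t is a plain-old-data struct whose all-zero bit pattern is valid.
        #[allow(unused_unsafe)]
        let zeroed = unsafe { ::core::mem::zeroed::<$t>() };
        zeroed
    }};
}

/// Hypothetical C-style descriptor with a size field the caller must fill in.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct DemoDesc {
    size: u32,
    flags: u32,
}

/// Call site in safe code: the macro's own `unsafe` block is what makes this legal.
fn init_from_safe_code() -> DemoDesc {
    let mut desc = pod_init!(DemoDesc);
    desc.size = ::core::mem::size_of::<DemoDesc>() as u32;
    desc
}

/// Call site already inside `unsafe`: the macro's inner block is now redundant,
/// which is the case the `allow` on the `let` is there to silence.
fn init_inside_unsafe_block() -> DemoDesc {
    // The outer block only mimics a driver call site that is already unsafe.
    let mut desc = unsafe { pod_init!(DemoDesc) };
    desc.size = ::core::mem::size_of::<DemoDesc>() as u32;
    desc
}
```

Both functions yield a descriptor with `size` set and every other field zero, so call sites do not need to know which context they are expanded in.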
+1 -1
@@ -141,7 +141,7 @@ $defines = @(
)
# --- stage the pf-vdisplay virtual-display driver bundle --------------------------------------
# pf-vdisplay is our all-Rust IddCx driver (packaging/windows/vdisplay-driver/), vendored signed under
# pf-vdisplay is our all-Rust IddCx driver (packaging/windows/drivers/), vendored signed under
# packaging/windows/pf-vdisplay/. It replaced the vendored SudoVDA C++ driver.
if (-not $NoDriver) {
$stage = Join-Path $OutDir 'stage'
+3 -3
@@ -4,11 +4,11 @@
driver + the fetched nefcon device tool.
.DESCRIPTION
pf-vdisplay (our all-Rust IddCx virtual display) is built from packaging/windows/vdisplay-driver/, and
pf-vdisplay (our all-Rust IddCx virtual display) is built from packaging/windows/drivers/, and
the SIGNED output (pf_vdisplay.dll/.inf/.cat + punktfunk-driver.cer) is VENDORED under
packaging/windows/pf-vdisplay/ (signer punktfunk-ds-test — shared with the gamepad drivers — Class=
Display, HWID root\pf_vdisplay). Rebuild + re-vendor with
packaging/windows/vdisplay-driver/deploy-dev.ps1 when the driver source changes, then copy the staged
packaging/windows/drivers/deploy-dev.ps1 when the driver source changes, then copy the staged
pf_vdisplay.{dll,inf,cat} over the vendored copies. nefcon publishes a pinned release, so we fetch +
SHA-256-verify it (it provides nefconc.exe, used to create the root-enumerated device node — pnputil
can't).
@@ -36,7 +36,7 @@ New-Item -ItemType Directory -Force -Path $OutDir | Out-Null
# --- vendored pf-vdisplay driver --------------------------------------------------------------
$inf = Get-ChildItem -Path $VendorDir -Filter pf_vdisplay.inf -ErrorAction SilentlyContinue | Select-Object -First 1
if (-not $inf) { throw "no vendored pf_vdisplay.inf under $VendorDir — re-vendor via vdisplay-driver/deploy-dev.ps1" }
if (-not $inf) { throw "no vendored pf_vdisplay.inf under $VendorDir — re-vendor via drivers/deploy-dev.ps1" }
Copy-Item (Join-Path $VendorDir '*') $OutDir -Force
Write-Host "==> vendored pf-vdisplay staged from $VendorDir"
+17 -1
@@ -19,6 +19,22 @@ function customId(entry: GameEntry): string {
: entry.id;
}
/**
* Display label for a store badge. Steam and custom keep their localized strings; every other store
* (lutris, heroic, epic, …) is a proper noun shown capitalized, so new providers surface correctly
* without a translation per store.
*/
function storeLabel(store: string): string {
switch (store) {
case "custom":
return m.library_store_custom();
case "steam":
return m.library_store_steam();
default:
return store.charAt(0).toUpperCase() + store.slice(1);
}
}
interface FormState {
title: string;
portrait: string;
@@ -276,7 +292,7 @@ const GameCard: FC<GameCardProps> = ({ game, onEdit, onDelete, deleting }) => {
variant={isCustom ? "secondary" : "outline"}
className="bg-background/80 backdrop-blur"
>
{isCustom ? m.library_store_custom() : m.library_store_steam()}
{storeLabel(game.store)}
</Badge>
</div>
{isCustom && (