14 Commits

Author SHA1 Message Date
enricobuehler 5a2e07e865 style(windows): rustfmt install.rs to unbreak cargo fmt --all --check
apple / swift (push) Successful in 1m3s
ci / rust (push) Successful in 4m52s
ci / web (push) Successful in 56s
ci / docs-site (push) Successful in 59s
apple / screenshots (push) Successful in 5m12s
ci / bench (push) Successful in 4m40s
windows-host / package (push) Successful in 6m28s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m17s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m13s
release / apple (push) Successful in 10m9s
deb / build-publish (push) Successful in 2m44s
decky / build-publish (push) Successful in 11s
android / android (push) Successful in 3m33s
flatpak / build-publish (push) Successful in 4m9s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m5s
docker / deploy-docs (push) Successful in 7s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m16s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m41s
The pnputil /add-driver call in windows/install.rs was committed unwrapped;
`cargo fmt --all --check` (which checks cfg(windows) files too) flagged it and
failed the `rust` CI job at the Format step, skipping clippy/build/test. Apply
rustfmt — no behavior change. Clears the way to cut the v0.2.0 release from
green main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 23:19:12 +00:00
enricobuehler 6e949b6748 fix(readme): make the logo readable on light + dark themes
apple / swift (push) Successful in 1m3s
apple / screenshots (push) Successful in 5m25s
ci / rust (push) Failing after 1m5s
ci / web (push) Successful in 52s
ci / docs-site (push) Successful in 1m0s
android / android (push) Successful in 3m59s
deb / build-publish (push) Successful in 2m31s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 4s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
ci / bench (push) Successful in 4m35s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m55s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m46s
docker / deploy-docs (push) Successful in 7s
The wordmark was light violet only — low-contrast on a light README
background. Swap to a single theme-adaptive SVG: an internal
`prefers-color-scheme` media query paints it deep violet (the brand-mark
palette) on light backgrounds and the original light violet on dark, so it
reads on both GitHub/Gitea themes with no markup change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:54:03 +00:00
enricobuehler 8ae161fe61 docs(windows): README - install via punktfunk-host.exe driver install / web setup (not .ps1)
apple / swift (push) Successful in 1m0s
windows-host / package (push) Successful in 6m20s
apple / screenshots (push) Successful in 5m26s
ci / rust (push) Failing after 26s
ci / web (push) Successful in 54s
deb / build-publish (push) Successful in 2m30s
ci / docs-site (push) Successful in 1m3s
android / android (push) Successful in 3m19s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m35s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m2s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m48s
docker / deploy-docs (push) Successful in 6s
Option A removed install-pf-vdisplay.ps1 / install-gamepad-drivers.ps1 / web-setup.ps1;
the installer now calls the exe subcommands. Drop the stale table rows + reword the
install-flow + 'thin installer' notes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 16:46:05 +00:00
enricobuehler 3a89ee8cd7 docs(readme): add logo banner + refresh Windows-host status
- Add the centered punktfunk wordmark banner at the top (assets/punktfunk-logo.svg,
  the same logo + layout the marketing site's README uses).
- Refresh the now-stale Windows-host facts: all-vendor (NVENC + AMF/QSV), its own
  all-Rust pf-vdisplay IddCx virtual display (was SudoVDA), bundled UMDF virtual-gamepad
  drivers (ViGEmBus gone), HDR incl. Vulkan-game HDR; x64-only, no longer NVIDIA-only.
- Note punktfunk-host covers Linux + Windows; point design/ at its new README index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:45:29 +00:00
enricobuehler dac0fee4e3 docs(windows): reflect the install-via-exe (Option A) landing in the build/packaging doc
apple / swift (push) Successful in 1m3s
apple / screenshots (push) Successful in 5m31s
ci / web (push) Successful in 49s
decky / build-publish (push) Successful in 14s
ci / rust (push) Failing after 32s
ci / docs-site (push) Successful in 1m1s
android / android (push) Successful in 3m21s
deb / build-publish (push) Successful in 2m30s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 13s
ci / bench (push) Successful in 4m49s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m32s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m47s
docker / deploy-docs (push) Successful in 6s
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 16:44:47 +00:00
enricobuehler 125a51d81d feat(windows-installer): move driver + web install into the host exe (ASCII root fix)
apple / swift (push) Successful in 1m0s
apple / screenshots (push) Successful in 5m16s
windows-host / package (push) Successful in 6m25s
ci / rust (push) Failing after 28s
ci / web (push) Successful in 53s
ci / docs-site (push) Successful in 1m1s
android / android (push) Successful in 3m21s
deb / build-publish (push) Successful in 2m31s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 6s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m39s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m2s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m50s
docker / deploy-docs (push) Successful in 17s
Port the three install-time PowerShell *files* (install-pf-vdisplay.ps1,
install-gamepad-drivers.ps1, web-setup.ps1) into punktfunk-host.exe subcommands:
`driver install [--gamepad] --dir <stage>` and `web setup --app-dir <app>
[--password-file <f>]` (windows/install.rs).

Why: PowerShell 5.1 reads a BOM-less .ps1 FILE in the machine ANSI codepage, so a
stray non-ASCII byte mis-decodes and aborts on a non-English box - exactly how the
pf-vdisplay driver install silently failed. A compiled subcommand drives the same
external tools (certutil/pnputil/nefconc/schtasks/netsh/icacls) as fixed string
literals, with no file-codepage surface. (The .iss's INLINE -Command PowerShell is a
command-line string, not a file read, so it's unaffected and stays.)
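
The failure mode above comes down to two byte-level properties of a script file: whether it starts with a UTF-8 BOM, and whether it holds any byte outside ASCII. A minimal sketch of that check follows, assuming only the UTF-8 BOM matters (UTF-16 BOMs are ignored for brevity). `is_locale_fragile` is a hypothetical helper for illustration and is not taken from windows/install.rs.

```rust
/// UTF-8 byte-order mark. With it, Windows PowerShell 5.1 decodes the file as UTF-8.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Returns true when a script's decoding depends on the machine's ANSI
/// codepage: it has no UTF-8 BOM and contains at least one non-ASCII byte.
/// Hypothetical helper; simplified to ignore UTF-16 BOMs.
fn is_locale_fragile(script: &[u8]) -> bool {
    !script.starts_with(&UTF8_BOM) && !script.is_ascii()
}

fn main() {
    // Pure ASCII: decodes identically under every ANSI codepage.
    let ascii = b"Write-Host 'installing driver'";
    // BOM-less UTF-8 with an em-dash (U+2014): decoding depends on the locale.
    let with_em_dash = "Write-Host 'driver \u{2014} install'".as_bytes();

    println!("ascii script fragile: {}", is_locale_fragile(ascii));
    println!("em-dash script fragile: {}", is_locale_fragile(with_em_dash));
}
```

A compiled subcommand sidesteps this check entirely, because its command strings are never read back from a file through a codepage.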

- windows/install.rs: faithful port - cert trust, gated nefconc node create + pnputil
  for pf-vdisplay; pnputil per-inf for gamepads; web-password ACL, the PunktfunkWeb task
  (generated UTF-16 XML), firewall rule, start. Best-effort (a hiccup warns, never aborts).
- punktfunk-host.iss [Run]: call the exe instead of `powershell -File`; drop the
  web-setup.ps1 staging + WebSetup define; WebSetupParams emits --app-dir/--password-file.
- pack-host-installer.ps1: stop copying the three install scripts into the stages.
- delete the three .ps1 files.

The `mod install;` + dispatch arms in main.rs landed in the preceding docs commit
(swept up by a concurrent commit); this commit adds the module + installer wiring.
CI-compile-validated via windows-host; the install path is on-glass-validated on the
next canary install (the test box is offline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 16:43:18 +00:00
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00
enricobuehler 9ea2c17419 docs(windows): add design/windows-build-and-packaging.md + refresh packaging README
apple / swift (push) Successful in 1m0s
apple / screenshots (push) Successful in 5m19s
windows-host / package (push) Successful in 6m20s
android / android (push) Successful in 4m42s
ci / rust (push) Successful in 4m47s
ci / web (push) Successful in 50s
ci / docs-site (push) Successful in 58s
deb / build-publish (push) Successful in 2m30s
decky / build-publish (push) Successful in 23s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
ci / bench (push) Successful in 4m40s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m16s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m3s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m0s
docker / deploy-docs (push) Successful in 22s
A single repo-internal source of truth for the Windows build/packaging: what ships, the
all-Rust driver workspace built FROM SOURCE in CI (+ the anti-stale rationale), the
toolchain (clang 22 + bindgen 0.72, no LLVM pin), the Inno installer, the web console
bundle, the CI workflows, signing, and the dev loop. (design/, not the docs-site.)

packaging/windows/README.md: drop the deleted vendored-driver dir + its "Vendored driver"
callout, add the build-* / install-gamepad / clear-force-integrity rows, point at the new
design doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 16:22:40 +00:00
enricobuehler a9cca82fb8 chore(windows): clean up build/packaging - drop vendored driver binaries + the LLVM-21 pin
windows-drivers-provision / provision (push) Successful in 13s
windows-drivers / probe-and-proto (push) Successful in 17s
android / android (push) Failing after 40s
apple / swift (push) Successful in 1m0s
ci / web (push) Successful in 58s
windows-drivers / driver-build (push) Successful in 1m9s
ci / docs-site (push) Successful in 1m18s
ci / rust (push) Successful in 4m25s
apple / screenshots (push) Successful in 5m24s
deb / build-publish (push) Successful in 2m29s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 29s
ci / bench (push) Successful in 4m48s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
windows-host / package (push) Successful in 6m38s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m24s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 9m31s
docker / deploy-docs (push) Successful in 18s
Now that the drivers build from source in CI, remove the dead checked-in binaries and
the toolchain cruft they left behind:

- Delete packaging/windows/{pf-vdisplay,gamepad-drivers}/ (the prebuilt .dll/.inf/.cat/.cer).
  pack-host-installer.ps1 builds + signs all three drivers from the drivers/ workspace and
  nothing reads the vendored dirs anymore; stage-pf-vdisplay.ps1's -VendorDir is now a
  mandatory build-output path, not a vendored default.
- Drop the LLVM-21 pin. The vendored bindgen 0.71->0.72 bump (the shipping pack already
  builds green on the runner-default clang 22) retired the bindgen-0.71 layout-test overflow
  that needed LLVM 21.1.2, so windows-drivers.yml + provision-windows-wdk.ps1 no longer
  install/point at C:\llvm-21 (~898 MB off a fresh provision) - both driver builds now use one
  toolchain (clang 22 + bindgen 0.72).
- pack -SkipBuild on the gamepad build (build-pf-vdisplay.ps1 already builds the whole
  workspace), build-web.ps1 reaps a stale node too, deploy-dev.ps1 nefconc path + comments.
- Reword the vendored-driver references (build scripts, .iss, READMEs, the vite web-bundle
  comment) to the build-from-source reality.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 16:16:46 +00:00
enricobuehler 7ab0661ddc fix(windows-installer): escape the brace in the [UninstallRun] PowerShell so ISCC compiles
windows-drivers / probe-and-proto (push) Successful in 21s
apple / swift (push) Successful in 1m4s
windows-drivers / driver-build (push) Successful in 1m9s
android / android (push) Successful in 4m25s
ci / web (push) Successful in 53s
apple / screenshots (push) Successful in 5m32s
ci / rust (push) Successful in 4m45s
ci / docs-site (push) Successful in 52s
windows-host / package (push) Successful in 6m47s
deb / build-publish (push) Successful in 2m28s
decky / build-publish (push) Successful in 14s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 3s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m45s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m55s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m46s
The Bug C [UninstallRun] one-liner had `ForEach-Object { Stop-Process ... }`; Inno
Setup parses `{...}` as a constant in [Run]/[UninstallRun] sections, so ISCC aborted
with "Unknown constant" and the windows-host pack failed at the ISCC step (the host
build, clippy, driver build + web smoke-boot all passed). Escape `{` as `{{`. The
same one-liner in the [Code] StopWebConsole proc is inside a Pascal string literal,
so its brace is literal and must NOT be escaped. Validated: ISCC now parses past
[UninstallRun] + [Code] (fails only later on the absent dummy payload).
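
The escaping rule described above can be sketched as a small Rust helper. `escape_inno_braces` and the sample command are hypothetical and for illustration only; the real fix was a manual edit to the .iss file.

```rust
/// Escapes `{` for a parameter string in an Inno Setup [Run]/[UninstallRun]
/// entry, where a lone `{` starts a constant such as `{app}`.
/// Only apply this to text that contains no intended constants, and never
/// to a Pascal string literal in [Code], where braces are already literal.
fn escape_inno_braces(command: &str) -> String {
    command.replace('{', "{{")
}

fn main() {
    // Illustrative PowerShell one-liner, not the one from the installer.
    let powershell = "Get-Process | ForEach-Object { $_.Name }";
    println!("{}", escape_inno_braces(powershell));
}
```

Only the opening brace is doubled; the closing brace has no special meaning outside a constant and stays as written.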

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 15:15:07 +00:00
enricobuehler 92e68024f1 fix(windows-installer): build the gamepad drivers from source in CI too
Fold the pf-dualsense (DualSense / DualShock 4) and pf-xusb (Xbox 360 / XInput)
UMDF drivers into the in-tree drivers workspace (their source had stale
../../crates/wdk-* path-deps from before the wdk vendoring reorg and could no
longer build at all) and build them from source per release, exactly like
pf-vdisplay - same anti-stale reasoning. One `cargo build --release` now builds
all three drivers against the vendored wdk-sys (incl. the bindgen 0.72 pin), and
build-gamepad-drivers.ps1 signs pf_dualsense + pf_xusb (clear FORCE_INTEGRITY ->
sign dll -> stampinf -> Inf2Cat -> sign cat) with one shared cert + .cer,
matching the layout install-gamepad-drivers.ps1 expects. pack-host-installer.ps1
builds + stages them instead of the retired checked-in binaries.

Validated on the runner: the whole workspace (pf-vdisplay + pf-dualsense +
pf-xusb) builds with CARGO_TARGET_DIR=C:\t set, and build-gamepad-drivers.ps1
produces signed pf_dualsense.{dll,inf,cat} + pf_xusb.{dll,inf,cat} + the .cer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 15:08:40 +00:00
enricobuehler 64abce6daa fix(windows-installer): pf-vdisplay CI build - default target dir + non-fatal cat guard
apple / swift (push) Successful in 59s
android / android (push) Successful in 4m23s
ci / rust (push) Successful in 4m43s
ci / web (push) Successful in 50s
ci / docs-site (push) Successful in 54s
windows-host / package (push) Failing after 5m39s
apple / screenshots (push) Successful in 5m15s
deb / build-publish (push) Successful in 2m31s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 6s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m39s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 9m6s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m52s
The CI driver build panicked in wdk-sys's build script - "a Cargo.lock file should
exist in the same directory as the top-level Cargo.toml". wdk-build's
find_top_level_cargo_manifest() walks UP from OUT_DIR for the first ancestor holding a
Cargo.lock and explicitly does NOT support non-default target dirs - but
build-pf-vdisplay.ps1 pointed CARGO_TARGET_DIR at an out-of-tree dir (to isolate from
CI's shared C:\t), so no ancestor of OUT_DIR had a Cargo.lock. Build into the driver
workspace's DEFAULT target dir instead (its ancestors include the driver Cargo.lock);
the driver's own [workspace] already isolates it and it has no CMake deps needing C:\t.
Also make the Test-FileCatalog coverage guard non-fatal (it can't open a catalog
signed by a not-yet-trusted cert). Validated on the runner with CARGO_TARGET_DIR=C:\t.
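
The upward walk described above can be illustrated with a small sketch. This is not wdk-build's code: `first_ancestor_with_lock`, the injected `has_lock` predicate, and all paths below are assumptions chosen so the example runs without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Walks up from `start` (inclusive) and returns the first directory for
/// which `has_lock` reports a Cargo.lock. Illustrative only.
fn first_ancestor_with_lock(start: &Path, has_lock: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| has_lock(*dir))
        .map(Path::to_path_buf)
}

fn main() {
    // Pretend only the driver workspace root holds a Cargo.lock.
    let workspace = Path::new("/repo/drivers");
    let has_lock = |dir: &Path| dir == workspace;

    // Default target dir: OUT_DIR sits below the workspace, so the walk succeeds.
    let in_tree = Path::new("/repo/drivers/target/release/build/wdk-sys-0/out");
    println!("{:?}", first_ancestor_with_lock(in_tree, has_lock));

    // Out-of-tree target dir: no ancestor holds a Cargo.lock, so nothing is found.
    let out_of_tree = Path::new("/scratch/pf-target/release/build/wdk-sys-0/out");
    println!("{:?}", first_ancestor_with_lock(out_of_tree, has_lock));
}
```

The second call mirrors the failing CI setup: with the target dir outside the workspace, no starting point under it can reach the workspace's Cargo.lock by walking upward.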

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:58:20 +00:00
enricobuehler bdfab8e0d5 fix(windows-installer): build pf-vdisplay from source in CI; ASCII scripts; upgrade-safe web console
windows-drivers / probe-and-proto (push) Successful in 24s
apple / swift (push) Successful in 1m4s
windows-drivers / driver-build (push) Successful in 1m8s
android / android (push) Successful in 4m4s
ci / rust (push) Successful in 4m39s
ci / web (push) Successful in 50s
ci / docs-site (push) Successful in 53s
apple / screenshots (push) Successful in 5m10s
windows-host / package (push) Failing after 5m35s
deb / build-publish (push) Successful in 2m29s
decky / build-publish (push) Successful in 13s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
ci / bench (push) Successful in 4m42s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m57s
docker / deploy-docs (push) Successful in 17s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m46s
The pf-vdisplay virtual-display driver shipped as a checked-in PREBUILT binary
that went stale - two field failures on a fresh install (live-repro'd on a
German-locale Dell laptop):

  * Bug A (every box): a repo-wide rename edited the vendored pf_vdisplay.inf
    but never re-signed pf_vdisplay.cat, so the catalog stopped covering the INF
    -> `pnputil /add-driver` fails SPAPI_E_FILE_HASH_NOT_IN_CATALOG -> driver
    never installs -> every session dies "pf-vdisplay driver interface not
    found".
  * the prebuilt binary also predated IOCTL_SET_RENDER_ADAPTER (added to the
    driver source after the vendor freeze) that the host needs to pin the IDD
    render GPU on hybrid/Optimus boxes.

Fix: build the driver FROM SOURCE every release (build-pf-vdisplay.ps1, wired
into pack-host-installer.ps1) so .dll/.inf/.cat are always in lockstep and
current driver features ship. The runner's clang 22 made the driver's pinned
bindgen 0.71 emit opaque structs (157 layout-assert errors), so bump the
vendored wdk-sys/wdk-build bindgen 0.71 -> 0.72 (+ lock). The build self-signs
the driver per build (installer trusts the bundled .cer); a stable
DRIVER_CERT_PFX_B64 secret can override.

  * Bug B (non-English boxes): the installer runs install-pf-vdisplay.ps1 etc.
    via powershell.exe (5.1), which reads a BOM-less script in the ANSI codepage
    - an em-dash's trailing 0x94 byte becomes a curly quote on German
    Windows-1252 and the script aborts "unterminated string", so the driver
    never installed (the gamepad script survived only because it was already
    ASCII). Scrub every installer-run .ps1/.cmd to ASCII + add a CI gate that
    fails on any non-ASCII so it can't regress.

  * Bug C (upgrades): nothing stopped the OLD web console before re-registering
    its task, so a stale server kept :3000 (the new one restart-looped on
    EADDRINUSE) and served a broken old bundle (500 on /login). Stop + reap it
    (runtime-agnostic, by the :3000 listener owner) in web-setup.ps1 and in the
    .iss before the file copy + on uninstall.
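
To make Bug B concrete: the UTF-8 encoding of an em-dash (U+2014) is the three bytes E2 80 94, and Windows-1252 maps each of those bytes to its own character. The sketch below decodes just those bytes. `decode_cp1252_subset` is a hypothetical helper that maps only what this illustration needs, not a full codepage decoder.

```rust
/// Decodes bytes as Windows-1252, covering ASCII plus the three bytes that
/// make up a UTF-8 em-dash. Anything else becomes U+FFFD in this sketch.
fn decode_cp1252_subset(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x00..=0x7F => b as char,
            0x80 => '\u{20AC}', // euro sign
            0x94 => '\u{201D}', // right double quotation mark (the "curly quote")
            0xE2 => '\u{00E2}', // a with circumflex
            _ => '\u{FFFD}',    // not mapped in this sketch
        })
        .collect()
}

fn main() {
    // What the script author saved: one em-dash, stored as BOM-less UTF-8.
    let em_dash_utf8 = "\u{2014}".as_bytes();
    // What a Windows-1252 reader sees instead: three separate characters.
    let misread = decode_cp1252_subset(em_dash_utf8);
    for c in misread.chars() {
        println!("U+{:04X}", c as u32);
    }
}
```

The last of the three characters is U+201D, which PowerShell accepts as a string delimiter; that is how one em-dash inside a string turns into an "unterminated string" parse error.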

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:33:34 +00:00
enricobuehler 8e87e617df fix(windows-host): force EXTEND topology so a new IddCx display isn't cloned
A freshly-added IddCx virtual display lands in CLONE/duplicate mode when a
physical display is already active (a laptop panel, an attached monitor): the
cloned output shares that display's source, so the OS never commits a distinct
path for it, never calls ASSIGN_SWAPCHAIN, and capture sees no frames - the
session fails "not an active display path / needs a WDDM GPU to activate" and
tears down with 0 frames (seen live on an Intel-iGPU + NVIDIA-Optimus laptop).

force_extend_topology() applies the EXTEND preset (the programmatic Win+P
"Extend") right after ADD so the IDD comes up as its own active path; the
existing resolve_gdi_name -> set_active_mode -> isolate_displays_ccd bring-up
then proceeds. Idempotent / no-op on a sole-display (headless single-GPU) box,
so it's safe on the path that already worked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:33:15 +00:00
80 changed files with 2564 additions and 3938 deletions
+5 -5
@@ -76,7 +76,7 @@ jobs:
 head "EWDK"
 Write-Host ("EWDKROOT = " + ($env:EWDKROOT ?? '<unset>'))
-head "LLVM / clang (README pins 21.1.2 for wdk-sys bindgen)"
+head "LLVM / clang (bindgen 0.72 builds on the runner default clang)"
 Write-Host ("LIBCLANG_PATH = " + ($env:LIBCLANG_PATH ?? '<unset>'))
 $clang = Get-Command clang -ErrorAction SilentlyContinue
 if ($clang) { & clang --version } else { Write-Host "clang: NOT on PATH" }
@@ -119,12 +119,12 @@ jobs:
 env:
 # wdk-build otherwise picks 10.0.28000.0 (no km/crt) and bindgen fails — pin the WDK SDK version.
 Version_Number: '10.0.26100.0'
-# wdk-sys bindgen layout tests overflow (E0080) on the runner's default LLVM (ToT/22-dev); point at
-# the pinned LLVM 21.1.2 that windows-drivers-rs builds clean against (provisioned to C:\llvm-21).
-LIBCLANG_PATH: 'C:\llvm-21\bin'
+# No LIBCLANG_PATH pin: the vendored bindgen 0.72 builds clean on the runner's default clang 22
+# (the shipping pack proves it). A 0.71-era layout-test overflow once needed LLVM 21; the 0.72 bump
+# retired that — see design/windows-build-and-packaging.md.
 steps:
 - uses: actions/checkout@v4
-- name: Ensure WDK + cargo-wdk + LLVM 21.1.2 (idempotent self-provision)
+- name: Ensure WDK + cargo-wdk (idempotent self-provision)
 # Run the provisioning script here too so driver-build is self-sufficient and never races a
 # separate provision run on the single runner. Path is relative to the job working-directory
 # (packaging/windows/drivers). Near-noop once the toolchain is present.
+16
@@ -56,6 +56,22 @@ jobs:
 steps:
 - uses: actions/checkout@v4
+- name: Locale-safety gate (installer-run scripts must be ASCII)
+shell: pwsh
+# The installer runs these via powershell.exe (Windows PowerShell 5.1) and cmd.exe on the END
+# USER's box. PS 5.1 reads a BOM-less script in the active ANSI codepage, so on a non-UTF-8 locale
+# (e.g. German Windows-1252) a stray em-dash mis-decodes into a curly quote and the script aborts
+# with "unterminated string" - exactly how the pf-vdisplay driver install silently failed in the
+# field. Keep every installer-run script pure ASCII (matches install-gamepad-drivers.ps1).
+run: |
+$bad = Get-ChildItem packaging/windows/*.ps1, scripts/windows/*.ps1, scripts/windows/*.cmd -ErrorAction SilentlyContinue |
+Where-Object { [IO.File]::ReadAllText($_.FullName) -match '[^\x00-\x7F]' }
+if ($bad) {
+$bad.FullName | ForEach-Object { Write-Output "::error::non-ASCII in installer-run script: $_" }
+throw "installer-run scripts must be pure ASCII (PS 5.1 mis-parses them on non-UTF-8 locales)"
+}
+Write-Output "installer-run scripts are ASCII-clean"
 - name: Configure + version
 shell: pwsh
 run: |
+13 -9
@@ -1,8 +1,12 @@
-# punktfunk
+<p align="center">
+<img src="assets/punktfunk-logo.svg" alt="punktfunk" width="320" />
+</p>
-**Low-latency desktop and game streaming with first-class Linux and Windows hosts.** Run the host on
-a Linux machine or a Windows PC, connect from a Mac, PC, phone, tablet, or TV, and stream your desktop
-or games — each device at its **own native resolution and refresh rate**, over your local network.
+<p align="center"><b>Low-latency desktop and game streaming with first-class Linux and Windows hosts.</b></p>
+Run the host on a Linux machine or a Windows PC, connect from a Mac, PC, phone, tablet, or TV, and
+stream your desktop or games — each device at its **own native resolution and refresh rate**, over
+your local network.
 📖 **Documentation: [docs.punktfunk.unom.io](https://docs.punktfunk.unom.io)** — start with
 [How It Works](https://docs.punktfunk.unom.io/docs/how-it-works) or the
@@ -43,7 +47,7 @@ protocol, FEC, and crypto, linked into the host and every client over a stable C
 | **Core**`punktfunk-core` + C ABI (protocol · FEC · crypto · QUIC) | ✅ Complete & hardened |
 | **GameStream host** → stock Moonlight | ✅ Live end-to-end: pairing, RTSP, audio, per-client virtual output at native resolution, GPU zero-copy NVENC, gamepads |
 | **Native protocol**`punktfunk/1` | ✅ Validated live: QUIC control + GF(2¹⁶) FEC/AES-GCM data plane, PIN pairing, mDNS discovery, mid-stream mode renegotiation |
-| **Windows host** (NVIDIA, x64) | 🟡 Implemented & shipping as a signed installer (DXGI capture · SudoVDA virtual display · NVENC · WASAPI · ViGEm); NVIDIA-only, newer than the Linux host |
+| **Windows host** (x64) | 🟡 Implemented & shipping as a signed installer: DXGI/WGC capture · its own all-Rust IddCx **virtual display** (secure-desktop capable) · GPU encode (NVENC on NVIDIA, AMF/QSV on AMD/Intel) · WASAPI audio · bundled virtual-gamepad drivers (no ViGEmBus) · HDR incl. Vulkan-game HDR. NVIDIA live-validated; AMD/Intel CI-green |
 | **macOS / iOS / tvOS client** (`clients/apple`) | ✅ Streaming live: VideoToolbox decode, controllers incl. DualSense, discovery, pairing, speed test |
 | **Linux client** (`clients/linux`, GTK4) | ✅ Streaming live: FFmpeg + VAAPI zero-copy decode, PipeWire audio, SDL3 controllers; ships as Flatpak/apt/rpm/Arch |
 | **Android client** (`clients/android`, phone + TV) | ✅ Streaming live: AMediaCodec decode + HDR10, Oboe audio, controllers, discovery, pairing |
@@ -69,14 +73,14 @@ roadmap: **[/docs/roadmap](https://docs.punktfunk.unom.io/docs/roadmap)**.
 Pick your platform and install from its package registry — the per-platform guide covers adding the
 repo, first run, and the web console. The Linux host is the primary, most battle-tested path; a
-Windows host (NVIDIA-only) also ships as a signed installer.
+Windows host also ships as a signed installer (all-vendor: NVIDIA, AMD, Intel).
 | Platform | Install | Guide |
 |--------|---------|-------|
 | **Ubuntu / Debian** (apt) | `sudo apt install punktfunk-host` *(after adding the repo)* | [Ubuntu — GNOME](https://docs.punktfunk.unom.io/docs/ubuntu-gnome) · [KDE](https://docs.punktfunk.unom.io/docs/ubuntu-kde) |
 | **Fedora / Bazzite** (rpm-ostree) | `rpm-ostree install punktfunk punktfunk-web` *(or the bootc image)* | [Fedora — KDE](https://docs.punktfunk.unom.io/docs/fedora-kde) · [Bazzite](https://docs.punktfunk.unom.io/docs/bazzite) |
| **Arch / Steam Deck** (PKGBUILD / sysext) | `makepkg -si` *(Arch)* · sysext `.raw` *(SteamOS)* | [packaging/arch](packaging/arch/README.md) | | **Arch / Steam Deck** (PKGBUILD / sysext) | `makepkg -si` *(Arch)* · sysext `.raw` *(SteamOS)* | [packaging/arch](packaging/arch/README.md) |
| **Windows** (NVIDIA, x64) | signed `setup.exe` from the package registry | [Windows Host](https://docs.punktfunk.unom.io/docs/windows-host) | | **Windows** (x64) | signed `setup.exe` from the package registry | [Windows Host](https://docs.punktfunk.unom.io/docs/windows-host) |
`punktfunk-host` is the streaming host; `punktfunk-web` is the browser console (pairing + status). `punktfunk-host` is the streaming host; `punktfunk-web` is the browser console (pairing + status).
After install, run `punktfunk-host serve` inside your desktop session (the secure native default; After install, run `punktfunk-host serve` inside your desktop session (the secure native default;
@@ -121,7 +125,7 @@ and the [docs site](https://docs.punktfunk.unom.io).
``` ```
crates/ crates/
punktfunk-core/ protocol · FEC · pacing · crypto · QUIC control plane — the C ABI (lib + cdylib + staticlib) punktfunk-core/ protocol · FEC · pacing · crypto · QUIC control plane — the C ABI (lib + cdylib + staticlib)
punktfunk-host/ Linux host: virtual displays · capture · encode · input · GameStream · punktfunk/1 · mgmt punktfunk-host/ the host (Linux + Windows): virtual displays · capture · encode · input · GameStream · punktfunk/1 · mgmt
clients/ clients/
apple/ macOS / iOS / tvOS app (Swift · VideoToolbox · Metal · GameController) apple/ macOS / iOS / tvOS app (Swift · VideoToolbox · Metal · GameController)
linux/ Linux desktop app (Rust · GTK4/libadwaita · FFmpeg/VAAPI · PipeWire · SDL3) linux/ Linux desktop app (Rust · GTK4/libadwaita · FFmpeg/VAAPI · PipeWire · SDL3)
@@ -132,7 +136,7 @@ clients/
web/ web console (TanStack) over the management API — status · devices · pairing web/ web console (TanStack) over the management API — status · devices · pairing
packaging/ apt · rpm / COPR · Arch · Flatpak · Bazzite bootc image packaging/ apt · rpm / COPR · Arch · Flatpak · Bazzite bootc image
docs-site/ public documentation site (Fumadocs) — https://docs.punktfunk.unom.io docs-site/ public documentation site (Fumadocs) — https://docs.punktfunk.unom.io
design/ design notes & deep-dive plans design/ design notes & deep-dive plans (index: design/README.md)
include/punktfunk_core.h cbindgen-generated C header (checked in) include/punktfunk_core.h cbindgen-generated C header (checked in)
tools/ latency-probe · loss-harness (measurement) tools/ latency-probe · loss-harness (measurement)
``` ```
+33
@@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg width="100%" height="100%" viewBox="0 0 579 298" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2;">
<style>
/* Theme-adaptive so the logo stays readable on both light and dark README
backgrounds: deep violet (the brand-mark palette) on light, the original
light violet on dark. Evaluated by the viewer's color scheme. */
.pf-wm { fill: #6c5bf3; }
.pf-back { fill: #a79ff8; }
.pf-deep { fill: #6c5bf3; }
@media (prefers-color-scheme: dark) {
.pf-wm { fill: #cec9fb; }
.pf-back { fill: #f2f1fe; }
.pf-deep { fill: #8c7ef5; }
}
</style>
<g>
<g>
<path class="pf-wm" style="fill-rule:nonzero;" d="M21.144,176.635l0,102.687l31.253,0l0,-35.563l73.436,0l0,-23.555l-73.436,0l0,-19.398l77.285,0l0,-24.171l-108.537,0Z"/>
<path class="pf-wm" style="fill-rule:nonzero;" d="M136.148,176.635l0,47.264c0.154,16.627 0.154,16.627 0.308,20.014c0.77,15.087 2.463,21.4 7.544,26.634c7.698,8.16 20.014,10.315 59.272,10.315c23.863,0 34.178,-0.616 43.415,-2.463c11.7,-2.463 19.552,-10.623 21.246,-22.323c0.924,-7.236 1.078,-8.929 1.54,-32.176l0,-47.264l-31.253,0l0,47.264c0,2.155 -0.154,7.082 -0.308,10.623c-0.462,9.699 -1.232,12.47 -3.695,15.087c-3.387,3.695 -9.853,4.619 -31.407,4.619c-26.634,0 -32.638,-1.693 -34.332,-9.853c-0.77,-4.157 -0.77,-4.311 -1.078,-20.476l0,-47.264l-31.253,0Z"/>
<path class="pf-wm" style="fill-rule:nonzero;" d="M275.938,176.527l0,102.687l31.868,0l-0.77,-76.669l3.387,0l54.038,76.669l54.346,0l0,-102.687l-31.868,0l0.77,76.515l-3.233,0l-53.73,-76.515l-54.808,0Z"/>
<path class="pf-wm" style="fill-rule:nonzero;" d="M425.273,176.527l0,102.687l31.253,0l0,-39.258l17.089,0l46.032,39.258l47.418,0l-64.353,-52.344l59.426,-50.959l-47.88,0l-40.644,37.873l-17.089,0l0,-37.257l-31.253,0Z"/>
</g>
<path class="pf-back" style="fill-rule:nonzero;" d="M65.442,150.143c24.514,0 44.298,-19.784 44.298,-44.298c0,-24.514 -19.784,-44.298 -44.298,-44.298c-24.514,0 -44.298,19.784 -44.298,44.298c0,24.514 19.784,44.298 44.298,44.298Z"/>
<path class="pf-deep" style="fill-rule:nonzero;" d="M141.063,92.871c17.334,-17.334 17.334,-45.312 0,-62.647c-17.334,-17.334 -45.312,-17.334 -62.647,-0c-17.334,17.334 -17.334,45.312 0,62.647c17.334,17.334 45.312,17.334 62.647,-0Z"/>
<path style="fill:url(#_Linear1);" d="M121.228,104.359c-14.777,3.965 -31.187,0.136 -42.811,-11.488c-11.624,-11.624 -15.453,-28.034 -11.488,-42.811c14.777,-3.965 31.187,-0.136 42.811,11.488c11.624,11.624 15.453,28.034 11.488,42.811Z"/>
</g>
<defs>
<linearGradient id="_Linear1" x1="0" y1="0" x2="1" y2="0" gradientUnits="userSpaceOnUse" gradientTransform="matrix(31.323323,-31.323323,31.323323,31.323323,78.416832,92.870811)">
<stop offset="0" style="stop-color:#cec9fb;stop-opacity:0"/>
<stop offset="1" style="stop-color:#fcfcff;stop-opacity:1"/>
</linearGradient>
</defs>
</svg>


@@ -1,5 +1,5 @@
//! Input-desktop watcher (Windows) — the authoritative "normal vs secure desktop" signal for the //! Input-desktop watcher (Windows) — the authoritative "normal vs secure desktop" signal for the
//! two-process secure-desktop design (design/windows-secure-desktop.md). //! two-process secure-desktop design (design/archive/windows-secure-desktop.md).
//! //!
//! Windows switches the *input desktop* to "Winlogon" (the secure desktop) for UAC elevation, the //! Windows switches the *input desktop* to "Winlogon" (the secure desktop) for UAC elevation, the
//! lock screen and the login screen, and back to "Default" for the normal session. WGC captures only //! lock screen and the login screen, and back to "Default" for the normal session. WGC captures only
@@ -1,5 +1,5 @@
//! Host-side WGC helper relay (Windows two-process secure-desktop design, //! Host-side WGC helper relay (Windows two-process secure-desktop design,
//! design/windows-secure-desktop.md — step 4). //! design/archive/windows-secure-desktop.md — step 4).
//! //!
//! WGC won't activate under the SYSTEM account, so the SYSTEM host can't capture the normal desktop //! WGC won't activate under the SYSTEM account, so the SYSTEM host can't capture the normal desktop
//! itself. Instead it spawns `punktfunk-host wgc-helper` in the **interactive user session** (so WGC works) //! itself. Instead it spawns `punktfunk-host wgc-helper` in the **interactive user session** (so WGC works)
+10 -1
@@ -35,6 +35,9 @@ mod gamestream;
mod hdr; mod hdr;
mod inject; mod inject;
#[cfg(target_os = "windows")] #[cfg(target_os = "windows")]
#[path = "windows/install.rs"]
mod install;
#[cfg(target_os = "windows")]
#[path = "windows/interactive.rs"] #[path = "windows/interactive.rs"]
mod interactive; mod interactive;
mod library; mod library;
@@ -390,7 +393,7 @@ fn real_main() -> Result<()> {
} }
// USER-session WGC helper (Windows two-process secure-desktop design): capture the EXISTING // USER-session WGC helper (Windows two-process secure-desktop design): capture the EXISTING
// SudoVDA via WGC + NVENC, stream AUs on stdout to the SYSTEM host. Spawned by the host // SudoVDA via WGC + NVENC, stream AUs on stdout to the SYSTEM host. Spawned by the host
// (CreateProcessAsUser), not run by hand. See design/windows-secure-desktop.md. // (CreateProcessAsUser), not run by hand. See design/archive/windows-secure-desktop.md.
#[cfg(target_os = "windows")] #[cfg(target_os = "windows")]
Some("wgc-helper") => { Some("wgc-helper") => {
let get = |flag: &str| { let get = |flag: &str| {
@@ -422,6 +425,12 @@ fn real_main() -> Result<()> {
// that launches the host into the active interactive session. // that launches the host into the active interactive session.
#[cfg(target_os = "windows")] #[cfg(target_os = "windows")]
Some("service") => service::main(&args[1..]), Some("service") => service::main(&args[1..]),
// Install-time work the Windows installer delegates to the exe instead of locale-parsed
// PowerShell *files* (the ANSI-codepage parse-break root fix; see windows/install.rs).
#[cfg(target_os = "windows")]
Some("driver") => install::driver_main(&args[1..]),
#[cfg(target_os = "windows")]
Some("web") => install::web_main(&args[1..]),
Some("-h") | Some("--help") | Some("help") | None => { Some("-h") | Some("--help") | Some("help") | None => {
print_usage(); print_usage();
Ok(()) Ok(())
+1 -1
@@ -2364,7 +2364,7 @@ fn virtual_stream(ctx: SessionContext) -> Result<()> {
// Windows two-process secure-desktop path: when the host runs as SYSTEM (required for the secure // Windows two-process secure-desktop path: when the host runs as SYSTEM (required for the secure
// desktop + SendInput), WGC can't activate in-process, so we capture the normal desktop via a // desktop + SendInput), WGC can't activate in-process, so we capture the normal desktop via a
// helper spawned in the user session and relay its AUs. (Single-process WGC/DDA is used as the // helper spawned in the user session and relay its AUs. (Single-process WGC/DDA is used as the
// user, and stays the path on Linux.) See design/windows-secure-desktop.md. // user, and stays the path on Linux.) See design/archive/windows-secure-desktop.md.
#[cfg(target_os = "windows")] #[cfg(target_os = "windows")]
if plan.topology == crate::session_plan::SessionTopology::TwoProcessRelay { if plan.topology == crate::session_plan::SessionTopology::TwoProcessRelay {
return virtual_stream_relay(ctx); return virtual_stream_relay(ctx);
@@ -27,7 +27,8 @@ use windows::Win32::Foundation::{HANDLE, LUID};
use super::{Mode, VirtualOutput}; use super::{Mode, VirtualOutput};
use crate::win_display::{ use crate::win_display::{
isolate_displays_ccd, resolve_gdi_name, restore_displays_ccd, set_active_mode, SavedConfig, force_extend_topology, isolate_displays_ccd, resolve_gdi_name, restore_displays_ccd,
set_active_mode, SavedConfig,
}; };
/// The per-backend REMOVE key the driver stamps on ADD and consumes on REMOVE. SudoVDA keys monitors by /// The per-backend REMOVE key the driver stamps on ADD and consumes on REMOVE. SudoVDA keys monitors by
@@ -326,6 +327,15 @@ impl VirtualDisplayManager {
} }
}); });
// Windows defaults a new IddCx monitor into CLONE mode when a physical display is already
// active (a laptop panel, an attached monitor): the cloned IDD shares that display's source, so
// the OS never commits a distinct path for it and capture sees no frames. Force EXTEND first so
// the IDD comes up as its OWN active path; the resolve loop below then finds it. Idempotent /
// no-op on a sole-display box, so it's safe on the headless single-GPU path too.
// SAFETY: `force_extend_topology` only calls `SetDisplayConfig` (a CCD topology apply) with no
// borrowed caller memory; it runs under the manager `state` lock, the sole topology mutator.
unsafe { force_extend_topology() };
// Resolve the capture target. May be None on a GPU-less box (target added but not WDDM-activated); // Resolve the capture target. May be None on a GPU-less box (target added but not WDDM-activated);
// the capture backend re-resolves once a GPU is present. // the capture backend re-resolves once a GPU is present.
let mut gdi_name = None; let mut gdi_name = None;
@@ -0,0 +1,402 @@
//! `punktfunk-host driver install` / `web setup` - the install-time work the Windows installer's Inno
//! `[Run]` section delegates to the host EXE instead of locale-parsed PowerShell *files*.
//!
//! Why: Windows PowerShell 5.1 reads a BOM-less `.ps1` *file* in the machine's ANSI codepage, so on a
//! non-English locale a stray non-ASCII byte mis-decodes and the script aborts "unterminated string" -
//! exactly how the pf-vdisplay driver install silently failed on a German box. A compiled subcommand has
//! no such surface: the external tools it drives (`certutil`/`pnputil`/`nefconc`/`schtasks`/`netsh`/
//! `icacls`) are fixed string literals, not a file parsed in some codepage. (The installer's *inline*
//! `-Command` PowerShell in the `.iss` is unaffected - that's a command-line string, not a file read -
//! so it stays.) Sits next to `service install` (`service.rs`), the established Rust-owns-install pattern.
//!
//! Everything here is BEST-EFFORT: a hiccup warns but returns `Ok` - a non-zero exit would abort the
//! whole installer, and a missing driver only degrades the host to a physical display.
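Review note: the codepage failure described in the module docs can be illustrated with a small sketch. It assumes one plausible mechanism (the source does not say which byte was involved): a UTF-8 en dash read as Windows-1252 ends in a "smart" double quote, which Windows PowerShell tokenizes as a string delimiter. `decode_cp1252_partial` and `has_smart_quote` are hypothetical helpers with a deliberately partial table; they are not part of the host.

```rust
/// Decode bytes as Windows-1252, covering only the code points this demo needs.
/// ASSUMPTION: a partial table for illustration, not a complete CP1252 decoder.
fn decode_cp1252_partial(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x80 => '\u{20AC}', // euro sign
            0x93 => '\u{201C}', // left double quotation mark
            0x94 => '\u{201D}', // right double quotation mark
            // ASCII and 0xA0..=0xFF map to the same code points as Latin-1.
            0x00..=0x7F | 0xA0..=0xFF => b as char,
            // Remaining 0x81..=0x9F bytes are not covered by this sketch.
            _ => '\u{FFFD}',
        })
        .collect()
}

/// True if the text contains a typographic double quote (U+201C, U+201D, U+201E).
fn has_smart_quote(s: &str) -> bool {
    s.chars()
        .any(|c| matches!(c, '\u{201C}' | '\u{201D}' | '\u{201E}'))
}
```

Decoding the three UTF-8 bytes of an en dash (`E2 80 93`) this way yields text that ends in U+201C, so a script that was fine as UTF-8 can gain an unbalanced quote once it is read in the ANSI codepage. A compiled subcommand avoids the file decode altogether.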
use anyhow::{bail, Context, Result};
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};

// ── arg + command helpers ──────────────────────────────────────────────────────────────────────

fn flag_val(args: &[String], name: &str) -> Option<String> {
    args.iter()
        .position(|a| a == name)
        .and_then(|i| args.get(i + 1))
        .cloned()
}

fn flag_present(args: &[String], name: &str) -> bool {
    args.iter().any(|a| a == name)
}
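Review note: a minimal sketch of how the two flag helpers behave on a typical installer command line. The helper bodies are copied from the hunk so the block compiles on its own; `argv` is a hypothetical convenience that is not in the source.

```rust
// Copied from the hunk above so the sketch is self-contained.
fn flag_val(args: &[String], name: &str) -> Option<String> {
    args.iter()
        .position(|a| a == name)
        .and_then(|i| args.get(i + 1))
        .cloned()
}

fn flag_present(args: &[String], name: &str) -> bool {
    args.iter().any(|a| a == name)
}

/// Hypothetical helper: turn a literal argv into owned Strings.
fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

/// Example: `driver install --dir C:\stage --gamepad` after the subcommand words are stripped.
fn demo_flags() -> (Option<String>, bool) {
    let args = argv(&["--dir", "C:\\stage", "--gamepad"]);
    (flag_val(&args, "--dir"), flag_present(&args, "--gamepad"))
}
```

One property worth keeping in mind: `flag_val` takes whatever token follows the flag, so `--dir --gamepad` would yield `--gamepad` as the directory. That is probably acceptable for an installer-driven command line with a fixed shape.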
/// Run a command, discard output, return whether it succeeded.
fn run_quiet(cmd: &str, args: &[&str]) -> bool {
    Command::new(cmd)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// Run a command, capture stdout (lossy UTF-8); empty on failure.
fn run_capture(cmd: &str, args: &[&str]) -> String {
    Command::new(cmd)
        .args(args)
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_default()
}

// ── `driver install [--gamepad] --dir <stage>` ─────────────────────────────────────────────────

pub fn driver_main(args: &[String]) -> Result<()> {
    match args.first().map(String::as_str) {
        Some("install") => driver_install(&args[1..]),
        _ => bail!("usage: punktfunk-host driver install --dir <stage> [--gamepad]"),
    }
}

fn driver_install(args: &[String]) -> Result<()> {
    let dir =
        PathBuf::from(flag_val(args, "--dir").context("driver install: --dir <stage> required")?);
    let gamepad = flag_present(args, "--gamepad");
    let (what, res) = if gamepad {
        ("gamepad", install_gamepad(&dir))
    } else {
        ("pf-vdisplay", install_pf_vdisplay(&dir))
    };
    if let Err(e) = res {
        // Never abort the installer on a driver failure (matches the old best-effort PS scripts).
        eprintln!("warning: {what} driver install: {e:#} (the host degrades without it)");
    }
    Ok(())
}

/// Trust the bundled self-signed driver cert: machine `Root` (so the chain validates) + `TrustedPublisher`
/// (so PnP installs without a prompt).
fn trust_cert(dir: &Path) {
    match first_with_ext(dir, "cer") {
        Some(cer) => {
            let cer = cer.to_string_lossy().into_owned();
            for store in ["Root", "TrustedPublisher"] {
                if !run_quiet("certutil", &["-addstore", "-f", store, &cer]) {
                    eprintln!("warning: certutil -addstore {store} failed for {cer}");
                }
            }
            println!("trusted driver cert {cer} (Root + TrustedPublisher)");
        }
        None => eprintln!(
            "warning: no .cer in {} - driver may not install silently",
            dir.display()
        ),
    }
}

fn install_pf_vdisplay(dir: &Path) -> Result<()> {
    let inf = dir.join("pf_vdisplay.inf");
    if !inf.exists() {
        bail!("no pf_vdisplay.inf in {}", dir.display());
    }
    trust_cert(dir);

    // Create the ROOT device node only if absent (a blind re-create spawns a phantom duplicate, and the
    // host binds interface index 0). ALWAYS nefconc (a clean ROOT\DISPLAY node), NEVER devgen (which makes
    // persistent SWD\DEVGEN software devices that survive reboot + registry deletion).
    if pf_vdisplay_present() {
        println!("pf-vdisplay device node already present - leaving it.");
    } else if let Some(nef) = first_named(dir, "nefconc.exe") {
        let (class, guid) = inf_class(&inf);
        let ok = run_quiet(
            &nef.to_string_lossy(),
            &[
                "--create-device-node",
                "--hardware-id",
                "root\\pf_vdisplay",
                "--class-name",
                &class,
                "--class-guid",
                &guid,
            ],
        );
        if ok {
            println!("created root\\pf_vdisplay device node (nefconc)");
        } else {
            eprintln!("warning: nefconc --create-device-node failed");
        }
    } else {
        eprintln!(
            "warning: nefconc.exe not found in {} - cannot create the device node",
            dir.display()
        );
    }

    // Stage + bind the driver (idempotent; re-staging the same .inf is harmless).
    if run_quiet(
        "pnputil",
        &["/add-driver", &inf.to_string_lossy(), "/install"],
    ) {
        println!("pnputil /add-driver pf_vdisplay.inf /install ok");
    } else {
        eprintln!("warning: pnputil /add-driver /install failed (driver may not have installed)");
    }
    Ok(())
}

fn install_gamepad(dir: &Path) -> Result<()> {
    let infs: Vec<PathBuf> = std::fs::read_dir(dir)
        .with_context(|| format!("read {}", dir.display()))?
        .flatten()
        .map(|e| e.path())
        .filter(|p| p.extension().is_some_and(|x| x.eq_ignore_ascii_case("inf")))
        .collect();
    if infs.is_empty() {
        bail!("no driver .inf in {}", dir.display());
    }
    trust_cert(dir);

    // Add each package to the store - no /install, no device node: the host SwDeviceCreate's the
    // per-session devnode when a client forwards a pad, so PnP binds the store driver on demand.
    for inf in &infs {
        if run_quiet("pnputil", &["/add-driver", &inf.to_string_lossy()]) {
            println!("pnputil /add-driver {} ok", file_name(inf));
        } else {
            eprintln!("warning: pnputil /add-driver {} failed", inf.display());
        }
    }
    Ok(())
}

/// Is a punktfunk virtual-display device already enumerated? Matches the device ID / description, which
/// are NOT localized, so the substring check is locale-safe.
fn pf_vdisplay_present() -> bool {
    let lo = run_capture("pnputil", &["/enum-devices", "/class", "Display"]).to_ascii_lowercase();
    lo.contains("pf_vdisplay") || lo.contains("punktfunk virtual display")
}

/// Read `Class` + `ClassGuid` from an INF so the node matches the shipped driver; falls back to Display.
fn inf_class(inf: &Path) -> (String, String) {
    let text = std::fs::read_to_string(inf).unwrap_or_default();
    let (mut class, mut guid) = (None, None);
    for line in text.lines() {
        let t = line.trim();
        if let Some(eq) = t.find('=') {
            let key = t[..eq].trim().to_ascii_lowercase();
            let val = t[eq + 1..]
                .split(';')
                .next()
                .unwrap_or("")
                .trim()
                .to_string();
            match key.as_str() {
                "class" => class = Some(val),
                "classguid" => guid = Some(val),
                _ => {}
            }
        }
    }
    (
        class
            .filter(|c| !c.is_empty())
            .unwrap_or_else(|| "Display".into()),
        guid.filter(|g| !g.is_empty())
            .unwrap_or_else(|| "{4d36e968-e325-11ce-bfc1-08002be10318}".into()),
    )
}
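Review note: the key/value scan in `inf_class` is easier to unit-test when separated from the file read. The sketch below is a hypothetical pure variant (`parse_inf_class` is not in the source) that applies the same rules: split on the first `=`, cut a trailing `;` comment, and compare keys case-insensitively.

```rust
/// Hypothetical pure variant of `inf_class`: same scan, but over an in-memory string,
/// returning `None` where the original would fall back to the Display defaults.
fn parse_inf_class(text: &str) -> (Option<String>, Option<String>) {
    let (mut class, mut guid) = (None, None);
    for line in text.lines() {
        if let Some((key, rest)) = line.trim().split_once('=') {
            // Everything after the first ';' is an INF comment.
            let val = rest.split(';').next().unwrap_or("").trim().to_string();
            match key.trim().to_ascii_lowercase().as_str() {
                "class" => class = Some(val),
                "classguid" => guid = Some(val),
                _ => {}
            }
        }
    }
    (
        class.filter(|c: &String| !c.is_empty()),
        guid.filter(|g: &String| !g.is_empty()),
    )
}
```

Like the original, this scan is not section-aware: a `Class` key in any section matches, and the last occurrence wins. That looks sufficient for a single shipped INF, but it is an assumption about the file rather than a general INF parser.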
// ── `web setup --app-dir <app> [--password-file <file>]` ────────────────────────────────────────

const WEB_TASK: &str = "PunktfunkWeb";

pub fn web_main(args: &[String]) -> Result<()> {
    match args.first().map(String::as_str) {
        Some("setup") => web_setup(&args[1..]),
        _ => bail!("usage: punktfunk-host web setup --app-dir <app> [--password-file <file>]"),
    }
}

fn web_setup(args: &[String]) -> Result<()> {
    let app_dir =
        PathBuf::from(flag_val(args, "--app-dir").context("web setup: --app-dir <app> required")?);
    let pw_file = flag_val(args, "--password-file");
    let data_dir = crate::gamestream::config_dir();
    std::fs::create_dir_all(&data_dir).ok();
    let pw_path = data_dir.join("web-password");
    let token_path = data_dir.join("mgmt-token");

    // 1. login password
    set_web_password(&pw_path, pw_file.as_deref());

    // 2. (upgrade-safe) stop any running console so the new task binds :3000 + the files unlock
    stop_web_console();

    // 3. register the PunktfunkWeb scheduled task
    let cmd = app_dir.join("web").join("web-run.cmd");
    if !cmd.exists() {
        bail!("web launcher missing: {}", cmd.display());
    }
    register_web_task(&cmd)?;

    // 4. firewall: inbound TCP 3000
    if !run_quiet(
        "netsh",
        &[
            "advfirewall",
            "firewall",
            "add",
            "rule",
            "name=punktfunk web console (TCP 3000)",
            "dir=in",
            "action=allow",
            "protocol=TCP",
            "localport=3000",
        ],
    ) {
        eprintln!("warning: could not add the firewall rule for TCP 3000");
    }

    // 5. wait briefly for the host's mgmt token, then start (restart-on-failure picks it up otherwise)
    for _ in 0..30 {
        if token_path.exists() {
            break;
        }
        std::thread::sleep(std::time::Duration::from_secs(1));
    }
    run_quiet("schtasks", &["/run", "/tn", WEB_TASK]);
    println!("web console set up + started (http://<host-ip>:3000)");
    Ok(())
}
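Review note: step 5 of `web_setup` is a bounded poll for the mgmt token. A minimal sketch of the same pattern as a reusable function follows; `wait_for_path` and its parameters are hypothetical and not part of the source.

```rust
use std::path::Path;
use std::time::Duration;

/// Hypothetical helper: check `path` up to `attempts` times, sleeping `interval`
/// after each miss. Returns whether the path exists by the end of the wait.
fn wait_for_path(path: &Path, attempts: u32, interval: Duration) -> bool {
    for _ in 0..attempts {
        if path.exists() {
            return true;
        }
        std::thread::sleep(interval);
    }
    // One final check so a file that appears during the last sleep still counts.
    path.exists()
}
```

With `attempts = 30` and a one-second interval this matches the roughly 30-second budget in the hunk. Returning a `bool` would also let the caller log a hint when the token never appeared, although the restart-on-failure policy already covers that case.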
/// Source: a non-empty `--password-file` (fresh install) > keep existing (upgrade) > random fallback.
/// Writes `PUNKTFUNK_UI_PASSWORD=<pw>\n` (LF, no BOM) + ACLs it to Administrators + SYSTEM only.
fn set_web_password(pw_path: &Path, pw_file: Option<&str>) {
    let password = pw_file
        .and_then(|f| std::fs::read_to_string(f).ok())
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .or_else(|| {
            if pw_path.exists() {
                println!("keeping existing web console password");
                None
            } else {
                Some(random_password())
            }
        });
    if let Some(pw) = password {
        if std::fs::write(pw_path, format!("PUNKTFUNK_UI_PASSWORD={pw}\n")).is_err() {
            eprintln!("warning: could not write {}", pw_path.display());
            return;
        }
        // Lock down: drop inheritance, grant only Administrators (S-1-5-32-544) + SYSTEM (S-1-5-18).
        let p = pw_path.to_string_lossy();
        run_quiet(
            "icacls",
            &[
                &p,
                "/inheritance:r",
                "/grant:r",
                "*S-1-5-32-544:F",
                "*S-1-5-18:F",
            ],
        );
    }
}

/// 20-char URL/shell-safe password (no `/ + =`), like web-init.sh / the old web-setup.ps1.
fn random_password() -> String {
    use base64::Engine;
    use rand::RngCore;
    let mut b = [0u8; 24];
    rand::thread_rng().fill_bytes(&mut b);
    base64::engine::general_purpose::STANDARD
        .encode(b)
        .chars()
        .filter(|c| !matches!(c, '/' | '+' | '='))
        .take(20)
        .collect()
}
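Review note: the filter step of `random_password` can be looked at in isolation without the `base64` and `rand` crates. The sketch below uses a hypothetical helper, `shell_safe_prefix`, that applies the same rule to an already-encoded string.

```rust
/// Hypothetical helper mirroring the tail of `random_password`:
/// drop `/`, `+` and `=`, then keep at most `n` characters.
fn shell_safe_prefix(encoded: &str, n: usize) -> String {
    encoded
        .chars()
        .filter(|c| !matches!(c, '/' | '+' | '='))
        .take(n)
        .collect()
}
```

Since 24 random bytes encode to 32 base64 characters with no padding, the result is 20 characters long unless more than 12 of them are `/` or `+`. That is very unlikely, but it means the length is "at most 20" rather than strictly 20.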
/// Stop + reap a running console before re-registering (upgrade-safe): end the task AND kill the :3000
/// listener owner (runtime-agnostic - a prior install may have run node vs the current bun). The listener
/// is identified by the wildcard foreign address (`0.0.0.0:0`/`[::]:0`), so the localized state word
/// ("LISTENING"/"ABHOEREN"/...) is never parsed.
fn stop_web_console() {
    run_quiet("schtasks", &["/end", "/tn", WEB_TASK]);
    for line in run_capture("netstat", &["-ano", "-p", "tcp"]).lines() {
        let toks: Vec<&str> = line.split_whitespace().collect();
        if toks.len() >= 5
            && toks[0].eq_ignore_ascii_case("tcp")
            && toks[1].ends_with(":3000")
            && (toks[2] == "0.0.0.0:0" || toks[2] == "[::]:0")
        {
            let pid = toks[toks.len() - 1];
            if !pid.is_empty() && pid.bytes().all(|b| b.is_ascii_digit()) {
                run_quiet("taskkill", &["/PID", pid, "/F"]);
            }
        }
    }
    std::thread::sleep(std::time::Duration::from_secs(1));
}
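Review note: the locale-safe listener match in `stop_web_console` can be expressed as a pure function over a single `netstat -ano` line, which makes the rule testable without Windows. `listener_pid` is a hypothetical name, and the sample lines in the usage note are illustrative rather than captured output.

```rust
/// Hypothetical pure variant of the scan in `stop_web_console`: return the owning PID
/// if `line` is a TCP row whose local address ends in `:port` and whose foreign address
/// is the wildcard (`0.0.0.0:0` or `[::]:0`). The localized state column is never read.
fn listener_pid(line: &str, port: u16) -> Option<u32> {
    let toks: Vec<&str> = line.split_whitespace().collect();
    if toks.len() < 5 || !toks[0].eq_ignore_ascii_case("tcp") {
        return None;
    }
    let suffix = format!(":{port}");
    let is_listener = toks[2] == "0.0.0.0:0" || toks[2] == "[::]:0";
    if toks[1].ends_with(&suffix) && is_listener {
        // The PID is the last column.
        toks[toks.len() - 1].parse().ok()
    } else {
        None
    }
}
```

A row such as `TCP 0.0.0.0:3000 0.0.0.0:0 ABHOEREN 77` matches regardless of the state word, while an established connection on the same port does not, because its foreign address is a real peer. The leading colon in the suffix also keeps `:13000` from matching port 3000.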
/// Register the boot/SYSTEM/restart-on-failure task via a generated Task Scheduler XML (`schtasks /xml`,
/// no COM). The XML declares UTF-16, so it's written UTF-16LE+BOM.
fn register_web_task(cmd: &Path) -> Result<()> {
    let xml = format!(
        "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n\
         <Task version=\"1.2\" xmlns=\"http://schemas.microsoft.com/windows/2004/02/mit/task\">\n\
         <RegistrationInfo><Description>punktfunk web management console (Nitro SSR on bun, :3000)</Description></RegistrationInfo>\n\
         <Triggers><BootTrigger><Enabled>true</Enabled></BootTrigger></Triggers>\n\
         <Principals><Principal id=\"Author\"><UserId>S-1-5-18</UserId><RunLevel>HighestAvailable</RunLevel></Principal></Principals>\n\
         <Settings>\n\
         <MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>\n\
         <DisallowStartIfOnBatteries>false</DisallowStartIfOnBatteries>\n\
         <StopIfGoingOnBatteries>false</StopIfGoingOnBatteries>\n\
         <StartWhenAvailable>true</StartWhenAvailable>\n\
         <ExecutionTimeLimit>PT0S</ExecutionTimeLimit>\n\
         <RestartOnFailure><Interval>PT1M</Interval><Count>10</Count></RestartOnFailure>\n\
         </Settings>\n\
         <Actions Context=\"Author\"><Exec><Command>{}</Command></Exec></Actions>\n\
         </Task>",
        xml_escape(&cmd.to_string_lossy())
    );
    let xml_path = std::env::temp_dir().join("punktfunk-web-task.xml");
    write_utf16le_bom(&xml_path, &xml)?;
    let ok = run_quiet(
        "schtasks",
        &[
            "/create",
            "/tn",
            WEB_TASK,
            "/xml",
            &xml_path.to_string_lossy(),
            "/f",
        ],
    );
    let _ = std::fs::remove_file(&xml_path);
    if ok {
        println!("registered scheduled task {WEB_TASK} -> {}", cmd.display());
        Ok(())
    } else {
        bail!("schtasks /create {WEB_TASK} failed")
    }
}

fn write_utf16le_bom(path: &Path, s: &str) -> Result<()> {
    let mut bytes = vec![0xFFu8, 0xFE]; // UTF-16LE BOM
    for u in s.encode_utf16() {
        bytes.extend_from_slice(&u.to_le_bytes());
    }
    std::fs::write(path, bytes).with_context(|| format!("write {}", path.display()))
}
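Review note: the byte layout that `write_utf16le_bom` produces can be checked in memory. The sketch below is a hypothetical variant (`utf16le_bom_bytes` is not in the source) that returns the bytes instead of writing a file.

```rust
/// Hypothetical in-memory variant of `write_utf16le_bom`: the BOM `FF FE`
/// followed by each UTF-16 code unit in little-endian byte order.
fn utf16le_bom_bytes(s: &str) -> Vec<u8> {
    let mut bytes = vec![0xFFu8, 0xFE];
    for unit in s.encode_utf16() {
        bytes.extend_from_slice(&unit.to_le_bytes());
    }
    bytes
}
```

For ASCII input every character becomes its byte followed by `00`, and characters outside the Basic Multilingual Plane take two code units (four bytes). Matching the on-disk encoding to the `encoding="UTF-16"` declaration avoids a mismatch between what the XML claims and what the file contains.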
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

fn first_with_ext(dir: &Path, ext: &str) -> Option<PathBuf> {
    std::fs::read_dir(dir)
        .ok()?
        .flatten()
        .map(|e| e.path())
        .find(|p| p.extension().is_some_and(|x| x.eq_ignore_ascii_case(ext)))
}

fn first_named(dir: &Path, name: &str) -> Option<PathBuf> {
    let p = dir.join(name);
    p.exists().then_some(p)
}

fn file_name(p: &Path) -> String {
    p.file_name()
        .unwrap_or_default()
        .to_string_lossy()
        .into_owned()
}
@@ -1,5 +1,5 @@
//! USER-session WGC helper (Windows) — part of the two-process secure-desktop design //! USER-session WGC helper (Windows) — part of the two-process secure-desktop design
//! (design/windows-secure-desktop.md). //! (design/archive/windows-secure-desktop.md).
//! //!
//! WGC won't activate under the SYSTEM account, but the host must run as SYSTEM for the secure //! WGC won't activate under the SYSTEM account, but the host must run as SYSTEM for the secure
//! desktop. So the SYSTEM host spawns THIS helper in the interactive user session //! desktop. So the SYSTEM host spawns THIS helper in the interactive user session
@@ -21,7 +21,7 @@ use windows::Win32::Devices::Display::{
DISPLAYCONFIG_GET_ADVANCED_COLOR_INFO, DISPLAYCONFIG_MODE_INFO, DISPLAYCONFIG_PATH_INFO, DISPLAYCONFIG_GET_ADVANCED_COLOR_INFO, DISPLAYCONFIG_MODE_INFO, DISPLAYCONFIG_PATH_INFO,
DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME, DISPLAYCONFIG_SET_ADVANCED_COLOR_STATE, DISPLAYCONFIG_SOURCE_DEVICE_NAME,
QDC_ONLY_ACTIVE_PATHS, SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION, QDC_ONLY_ACTIVE_PATHS, SDC_ALLOW_CHANGES, SDC_APPLY, SDC_FORCE_MODE_ENUMERATION,
SDC_SAVE_TO_DATABASE, SDC_USE_SUPPLIED_DISPLAY_CONFIG, SDC_SAVE_TO_DATABASE, SDC_TOPOLOGY_EXTEND, SDC_USE_SUPPLIED_DISPLAY_CONFIG,
}; };
use windows::Win32::Graphics::Gdi::{ use windows::Win32::Graphics::Gdi::{
ChangeDisplaySettingsExW, EnumDisplaySettingsW, CDS_TEST, CDS_UPDATEREGISTRY, DEVMODEW, ChangeDisplaySettingsExW, EnumDisplaySettingsW, CDS_TEST, CDS_UPDATEREGISTRY, DEVMODEW,
@@ -31,6 +31,29 @@ use windows::Win32::Graphics::Gdi::{
use crate::vdisplay::Mode; use crate::vdisplay::Mode;
/// Force the desktop into EXTEND topology - the programmatic equivalent of the Win+P / DisplaySwitch
/// "Extend" shortcut. Windows defaults a FRESHLY-ADDED monitor into CLONE/duplicate mode when a
/// physical display is already active (e.g. a laptop panel): a cloned IddCx output shares the panel's
/// source, so the OS never commits a distinct path for it, never calls ASSIGN_SWAPCHAIN, and capture
/// sees no frames (`resolve_gdi_name` stays `None` and the session fails "not an active display path").
/// Applying the EXTEND preset across the live set of connected displays makes the new IddCx monitor its
/// OWN active path, so the rest of bring-up (`resolve_gdi_name` -> `set_active_mode` ->
/// `isolate_displays_ccd`) proceeds. Best-effort + idempotent: a no-op on a single-display (already
/// sole/extended) box, so it is safe to call unconditionally. `rc == 0` is success.
pub(crate) unsafe fn force_extend_topology() {
// A topology flag with no supplied path/mode arrays tells the OS to recompute + apply that preset
// for the currently-connected displays (the same code path DisplaySwitch.exe drives).
let rc = SetDisplayConfig(None, None, SDC_APPLY | SDC_TOPOLOGY_EXTEND);
if rc == 0 {
tracing::info!(
"display topology forced to EXTEND (a new IddCx monitor would otherwise be CLONED onto the \
existing panel -> no distinct source -> no frames)"
);
} else {
tracing::warn!("display force-EXTEND topology: SetDisplayConfig rc={rc:#x}");
}
}
/// Resolve the `\\.\DisplayN` GDI name for a SudoVDA target id via the CCD API. Returns `None` /// Resolve the `\\.\DisplayN` GDI name for a SudoVDA target id via the CCD API. Returns `None`
/// until the OS activates the target into the desktop topology (needs a real WDDM GPU; on a /// until the OS activates the target into the desktop topology (needs a real WDDM GPU; on a
/// GPU-less box this stays `None` even though ADD succeeded). /// GPU-less box this stays `None` even though ADD succeeded).
+85
@@ -0,0 +1,85 @@
# design/ — design notes & deep-dive plans
Repo-internal design docs: architecture rationale, investigations, and the *why* behind decisions that
the code and [`../CLAUDE.md`](../CLAUDE.md) don't capture. **Authoritative current status lives in
[`../CLAUDE.md`](../CLAUDE.md)** ("Where the work stands" / "What's left"); the user-facing guides live in
`docs-site/`. These docs are kept trimmed: once work ships, the redundant implementation detail is dropped
(the code is the source of truth) and only the durable rationale + still-open items remain. Git history
holds the full originals.
## Index
| Doc | What it is | Status |
|-----|-----------|--------|
| [`implementation-plan.md`](implementation-plan.md) | Master design thesis (why GF(2¹⁶) FEC + Linux virtual displays; three-phase de-risking), architecture invariants, latency budget, risk register | **Design reference** — §0–7, 9 kept; milestones → CLAUDE.md |
| [`apollo-comparison.md`](apollo-comparison.md) | Apollo↔punktfunk architecture map + file index + ~63-item transferable-improvement backlog (Windows-host focus) | **Reference + open backlog** — ~⅓ shipped (collapsed); rest open |
| [`security-review.md`](security-review.md) | Whole-project security audit (2026-06-21), 12 findings | **Audit trail** — 11 fixed/inherent; **#12 open** |
| [`ci.md`](ci.md) | CI/CD architecture: Gitea workflows, runners, release model, signing | **Evergreen reference** |
| [`linux-setup.md`](linux-setup.md) | Linux host bring-up (NVIDIA/headless) + troubleshooting | **Setup guide** (evergreen) |
| [`gamestream-host-plan.md`](gamestream-host-plan.md) | GameStream/Moonlight-compat host (P1.1–P1.6) | **Shipped** — stub + the 2 deferral decisions |
| [`stats-capture-plan.md`](stats-capture-plan.md) | Web-console performance capture | **Shipped** — stub |
| [`session-aware-host-followups.md`](session-aware-host-followups.md) | Session-aware host known limitations | **Open items**: #2/#3 shipped; #1, #4–8 parked |
| [`gamescope-multiuser.md`](gamescope-multiuser.md) | Per-session gamescope isolation (the 4 plumbing items) | **Deferred** — reference spec |
| [`host-latency-plan.md`](host-latency-plan.md) | Latency under GPU contention — 4-tier plan | **Partly shipped** — superseded by ↓; diagnostics + open tiers kept |
| [`gpu-contention-investigation.md`](gpu-contention-investigation.md) | GPU-contention root-cause + ranked levers (supersedes ↑) | **Active plan** — §5.A shipped; §5.B/C/E/F/G open |
| [`hdr-pipeline-plan.md`](hdr-pipeline-plan.md) | Glass-to-glass HDR | **Steps 0–3 shipped**; Step 4 (Linux) open |
| [`windows-host-rewrite.md`](windows-host-rewrite.md) | **Windows host — the single architecture/status/reference doc** (validated invariants, ops, open work) | **Active reference** |
| [`windows-build-and-packaging.md`](windows-build-and-packaging.md) | How the Windows host is built, signed, packaged (drivers-from-source, Inno, CI) | **Evergreen reference** |
| [`windows-service.md`](windows-service.md) | SYSTEM SCM service + secure-desktop deployment model | **Shipped** — stub + graceful-stop open item |
| [`windows-host.md`](windows-host.md) | (original 2026-06 plan) | **Redirect** → `windows-host-rewrite.md` |
| [`windows-virtual-display-rust-port.md`](windows-virtual-display-rust-port.md) | pf-vdisplay IddCx port + the "IDD-push is impossible on bare metal" finding | **Shipped** — P2 do-not-retry record kept |
| [`windows-dualsense-scoping.md`](windows-dualsense-scoping.md) | Virtual DualSense (UMDF2) decision + M0 bug lessons | **Shipped (M0–M4)** — public signing open |
| [`windows-dualsense-game-detection.md`](windows-dualsense-game-detection.md) | Native game-detection fix (SwDeviceCreate identity) | **Shipped** — on-glass test + GameInput open |
| [`windows-client-bootstrap.md`](windows-client-bootstrap.md) | Windows client architecture record + HDR guide + build gotchas | **Shipped** — on-glass validation open |
| [`apple-stage2-presenter.md`](apple-stage2-presenter.md) | Apple stage-2 (VTDecompressionSession + CAMetalLayer) presenter | **Shipped (opt-in)** — make-default + iOS open |
| [`game-library-stores.md`](game-library-stores.md) | Multi-store game library | **Phases 1–4 shipped** — 6 providers + 8 Qs open |
| [`dualsense-haptics.md`](dualsense-haptics.md) | DualSense advanced-haptics feasibility | **HID shipped**; audio haptics deferred (3 walls) |
| [`archive/windows-secure-desktop.md`](archive/windows-secure-desktop.md) | Two-process WGC secure-desktop design | **Archived** — shipped but now a fallback (IDD-push primary) |
Plus `research/gamestream-protocol-research.json` — raw Moonlight/GameStream wire reference (data, not prose).
## Consolidated open items
Still-open work scattered across the docs above, rolled up by theme so nothing is tracked in only one
buried doc. CLAUDE.md "What's left" is the headline list; this is the design-level detail. (→ names the
owning doc.)
**Latency / performance**
- Sub-frame pipelining — overlap encode+transmit within a frame; needs a direct NVENC SDK wrapper (~24 ms). → `implementation-plan`, `gamestream-host-plan`
- GPU-contention levers: correct async NVENC pipeline, auto-gated REALTIME GPU priority, clock/P-state pinning, frame-source escape (swapchain-hook/NvFBC/compose-flip), iGPU encode offload, PERF uniq-vs-fps instrumentation. → `gpu-contention-investigation` (§5.B/C/E/F/G), `host-latency-plan` (Tiers 1A/1B/3B/3C/3D/4)
- Apple stage-2 as default (after resolution/HDR checks) + smoothing/pacing policy + glass-to-glass numbers via `tools/latency-probe`. → `apple-stage2-presenter`
**HDR**
- Linux 10-bit HDR (Step 4): 8-bit→Main10 shim, true 10-bit PipeWire capture (blocked upstream — gamescope #2126), Linux-client P010 + GTK color management. → `hdr-pipeline-plan`
- GameStream HDR/10-bit (capture + metadata plumbing). → `gamestream-host-plan`
- Open Qs: MaxCLL source, GameStream SS_HDR_METADATA vs deliberate SDR, HLG sources, mid-session SDR-downgrade + SDR-for-SDR-client validation. → `hdr-pipeline-plan`
**Clients**
- Windows client on-glass validation (D3D11VA decode + HDR present + GUI on the RTX box) + RAWINPUT relative-mouse pointer-lock + per-host speed-test UI. → `windows-client-bootstrap`, `implementation-plan`
- iOS/iPadOS/tvOS stage-2 presenter variants. → `apple-stage2-presenter`, `implementation-plan`
- Android real-device validation (gamepad rumble/lightbar/DualSense, HDR10). → `implementation-plan`
**Windows host**
- Graceful stop signal — host is killed via TerminateProcess (skips RAII teardown → a stale virtual monitor can linger). → `windows-service`
- pf-vdisplay slot-reclaim on-glass reconnect-storm A/B; M4 driver-unification source-build validation; P2/P3 cleanup (D1-host lints, M6 scaffolding, M5 reshape WGC/DDA onto session/pipeline). → `windows-host-rewrite`
- Session-aware follow-ups: F44 gamescope teardown GPU-context corruption (#1, SIGKILL hypothesis); mid-stream-switch input-loss window; NVENC InitializeEncoder noise at 5K@240; NVENC HEVC ~800 Mbps cap (prefer AV1 above it); restore-guard/keep-warm coupling; Feature B (`PUNKTFUNK_SESSION_WATCH`) opt-in → default. → `session-aware-host-followups`
- Apollo backlog (~63 open) — highest-value: #9 Windows app launch (CreateProcessAsUserW), #7/#18 WASAPI device-loss recovery, #3 per-frame `IDXGIFactory::IsCurrent()`, #15 watchdog escalation, #14/#30/#56 abs-mouse through the real output rect, #10/#20/#32/#33 tray + browser-UI + in-binary service install + logs endpoint, #67/#68 frame pacing. → `apollo-comparison`
**Windows gamepads**
- DualSense public-distribution signing (EV cert + Microsoft Partner Center attestation — blocks public release); GameInput detection (reads VID/PID 0x0000 — may need a rank-3 KMDF USB-emulating bus driver); HidHide integration; minimum-OS / UMDFVERSION targeting; on-glass Cyberpunk glyph test. → `windows-dualsense-scoping`, `windows-dualsense-game-detection`
**GameStream**
- AV1 + surround 5.1/7.1 live stock-Moonlight confirmation (incl. FEC-under-loss); reconnect-at-new-mode robustness. → `gamestream-host-plan`, `implementation-plan`
**Game library**
- 6 remaining providers (Desktop/Flatpak, itch.io, Ubisoft Connect, Amazon Games, Battle.net, EA app); the `/library/art/<entryId>/<slot>` mgmt endpoint; refactor `library.rs` into a `library/` dir; 8 open design questions; optional SteamGridDB v2 enrichment. → `game-library-stores`
**Multi-user / sessions**
- gamescope per-session input/audio isolation (independent desktops) — the 4 plumbing items, deferred. → `gamescope-multiuser`, `implementation-plan`
**Security**
- **#12** — scope `NODE_TLS_REJECT_UNAUTHORIZED` to a per-request pinned agent (needs `bun add undici`); latent-only today, but **must fix before the web app gains any off-loopback server-side TLS**. → `security-review`
**Deferred / do-not-retry records** (kept so the dead ends aren't re-explored)
- DualSense audio-driven haptics — deferred until all 3 GO conditions are met. → `dualsense-haptics`
- IDD-push direct frame-push on bare-metal console capture — architecturally impossible (no presentation consumer for the swapchain). → `windows-virtual-display-rust-port`
@@ -1,5 +1,7 @@
# Apollo vs punktfunk — architecture map & transferable improvements
> **Status:** Reference doc — an Apollo↔punktfunk architecture map plus a 96-item transferable-improvement backlog. About a third of the backlog has since shipped or gone obsolete (those items are collapsed to one-liners below); the rest is still open with full citations. The **Re-verified status (2026-06-20)** section is the authoritative shipped-status record.
> Generated 2026-06-16 by the `apollo-vs-punktfunk` multi-agent workflow, then reconstructed from
> the run journal after the live run was interrupted. **Apollo** = `~/Apollo` (commit `adc5c5a0`),
> a C++ fork of Sunshine — a Moonlight-compatible streaming **host only** (no client of its own).
@@ -680,7 +682,7 @@ Both transports use the persistent `AudioCapSlot` (gamestream/audio.rs:251-257)
### Input handling & injection — 🔴 Apollo ahead
For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp). punktfunk's Windows host covers mouse/keyboard/scroll/X360-only; touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and only the Xbox 360 virtual pad exists on Windows. Apollo also has the more efficient secure-desktop model (retry-only) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread — punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo, but those are punktfunk/1-specific; on the shared Windows-host injection surface Apollo is the more complete, battle-tested implementation. punktfunk's design/windows-secure-desktop.md already flags the retry-only refactor as planned-but-unshipped, confirming the gap. For the Windows host specifically, Apollo is ahead on input breadth and robustness. Apollo covers mouse (rel+abs), keyboard (with a static US-layout VK→scancode table for game compatibility), Unicode text, scroll, **touch + pen via CreateSyntheticPointerDevice**, and **both X360 and DS4** gamepads with rumble/LED/motion/touchpad/battery feedback (Apollo src/platform/windows/input.cpp). 
punktfunk's Windows host covers mouse/keyboard/scroll/X360-only; touch and pen are explicit no-ops (sendinput.rs:231-237), there is no Unicode text path (gamestream/input.rs:83-84), and only the Xbox 360 virtual pad exists on Windows. Apollo also has the more efficient secure-desktop model (retry-only) vs punktfunk's per-event reattach (sendinput.rs:97), and Apollo's task-pool queue + type-aware batching (Apollo src/input.cpp:1481-1571, 1208-1475) coalesces input spam off the network thread — punktfunk's GameStream path injects inline on the ENet thread (control.rs:207-211) with no batching anywhere. punktfunk's design is cleaner and its m3 path's session-end held-key release + backend-follow logic is genuinely nicer than Apollo, but those are punktfunk/1-specific; on the shared Windows-host injection surface Apollo is the more complete, battle-tested implementation. punktfunk's design/archive/windows-secure-desktop.md already flags the retry-only refactor as planned-but-unshipped, confirming the gap.
**How punktfunk does it.**
@@ -748,7 +750,7 @@ For the Windows host specifically, Apollo is clearly ahead on this subsystem. Ap
- punktfunk has TWO app surfaces by design: the GameStream apps.json catalog (Moonlight compat) AND a richer punktfunk/1 library (Steam local scan + custom store + CDN art + uniform GameEntry grid). Apollo has only the apps.json catalog because it ships no client.
- punktfunk's launch security model is deliberately client-can't-inject: the client sends only a store-qualified id and the host resolves it against its OWN library (library.rs:394-412), with steam appid validated digits-only. Apollo trusts its own apps.json cmds (it has no untrusted remote launch id).
- punktfunk keeps NO async on the per-frame path; the SudoVDA watchdog pinger and capture are native threads. Apollo's libdisplaydevice RetryScheduler is its own machinery; punktfunk has no equivalent scheduler by choice (yet — see candidate improvements).
- punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically to capture the secure/Winlogon desktop — a deliberate, documented design (design/windows-secure-desktop.md) that goes beyond what stock Apollo needs. - punktfunk's Windows virtual display is the SOLE primary output (isolate_displays + CDS_SET_PRIMARY) specifically to capture the secure/Winlogon desktop — a deliberate, documented design (design/archive/windows-secure-desktop.md) that goes beyond what stock Apollo needs.
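
The client-can't-inject launch model described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `library.rs`: the `Library` type, the `LaunchError` variants, and the `store:id` string syntax are all assumptions made for the example. The point it shows is that the client-supplied string is only ever used as a lookup key, and the command that runs comes from the host's own table.

```rust
use std::collections::HashMap;

/// Hypothetical host-side library: store-qualified id -> launch command chosen by the host.
pub struct Library {
    entries: HashMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LaunchError {
    /// The id was not of the assumed `store:local-id` form.
    Malformed,
    /// A `steam:` id whose app id contained anything other than ASCII digits.
    BadSteamAppId,
    /// Well-formed, but the host's library has no such entry.
    NotInLibrary,
}

impl Library {
    pub fn new() -> Self {
        Library { entries: HashMap::new() }
    }

    /// Register an entry discovered by the host (for example by a local store scan).
    pub fn insert(&mut self, id: &str, command: &str) {
        self.entries.insert(id.to_string(), command.to_string());
    }

    /// Resolve a client-supplied id. The client string is validated and used as a key only;
    /// the returned command always originates from the host's own entries.
    pub fn resolve(&self, client_id: &str) -> Result<&str, LaunchError> {
        let (store, local) = client_id.split_once(':').ok_or(LaunchError::Malformed)?;
        if store.is_empty() || local.is_empty() {
            return Err(LaunchError::Malformed);
        }
        // Digits-only check for Steam app ids, as the text describes.
        if store == "steam" && !local.bytes().all(|b| b.is_ascii_digit()) {
            return Err(LaunchError::BadSteamAppId);
        }
        self.entries
            .get(client_id)
            .map(String::as_str)
            .ok_or(LaunchError::NotInLibrary)
    }
}
```

With an entry registered under `steam:570`, resolving `steam:570` yields the host's command, while an id such as `steam:570; rm -rf /` is rejected before any lookup happens.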
**Transfer candidates from Apollo (6):** _Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)_, _Display-config apply/revert with a retry scheduler and guaranteed revert on disconnect_, _Set HDR on the virtual display and advertise IsHdrSupported when the client requests it_, _Per-(app,client) stable virtual-display GUID instead of one fixed MONITOR_GUID_, _Inject per-app launch env (client res/fps/HDR/audio + status) for launch scripts_, _auto_detach heuristic for launcher-style apps (Steam/UWP) that exit immediately_ — see Part 4.
@@ -897,11 +899,11 @@ QPC values from `LastPresentTime`/`LastMouseUpdateTime` are translated to `stead
#### Transfer opportunities
- **Treat S_OK-with-no-change frames as timeouts via DXGI update flags** (sev high, medium) — In dxgi.rs acquire(), after a successful AcquireNextFrame, compute frame_update_flag = info.LastPresentTime != 0 (and/or info.AccumulatedFrames != 0) and mouse_update_flag from LastMouseUpdateTime/PointerShapeBufferSize. Always call update_cursor (mouse). If !frame_update_flag, ReleaseFrame and return Ok(None) (so next_frame repeats last_present) UNLESS the cursor moved and we need a recomposite — in which case recomposite onto the existing last_present texture instead of CopyResource'ing the source. This cuts idle/cursor-only GPU load and avoids re-encoding unchanged content.
- **Detect resolution/format change on the acquire hot path, not only during rebuild** (sev high, small) — In acquire(), after res.cast::<ID3D11Texture2D>(), call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution + HDR-toggle changes robust even when DDA doesn't fault. - **Detect resolution/format change on the acquire hot path, not only during rebuild** — SHIPPED (2026-06-20). [#2]
- **Release the duplication device lock during idle to avoid encoder starvation** (sev medium, small) — Cap the per-acquire DDA timeout to a small value (e.g. 8-16ms) and, when it returns WAIT_TIMEOUT, std::thread::sleep a few ms with no outstanding AcquireNextFrame before retrying — so the encode thread can grab the device for NVENC setup/reinit. Keep the generous timeout only for first_frame. Low risk, directly mirrors Apollo's documented fix. - **Release the duplication device lock during idle to avoid encoder starvation** — OBSOLETE / not-a-bug (2026-06-20). [#34]
- **Add client-framerate frame pacing with a high-precision timer** (sev medium, large) — Add an optional pacing layer (in dxgi.rs or the encode-loop caller in punktfunk1.rs/encode.rs) keyed on the negotiated client framerate: track a group start from the frame pts, sleep to the computed target with a Windows high-resolution timer (timeBeginPeriod or CREATE_WAITABLE_TIMER_HIGH_RESOLUTION), and snap near-integral refresh to integer divisors. This is the lever for steady pacing on odd refresh rates without changing the zero-copy design.
- **Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance** (sev medium, medium) — After D3D11CreateDevice in dxgi.rs (and the NVENC encoder device wherever it's built), query IDXGIDevice1::SetMaximumFrameLatency(1) and SetGPUThreadPriority; load gdi32 D3DKMTSetProcessSchedulingPriorityClass and request HIGH (not REALTIME) when the adapter is NVIDIA (VendorId 0x10DE) with HAGS on, REALTIME otherwise. Mirror the privilege-enable. Guard behind admin/SYSTEM (host already relaunches as SYSTEM). - **Harden GPU scheduling priority + SetMaximumFrameLatency + NVIDIA-HAGS NVENC-realtime avoidance** — SHIPPED (2026-06-20). [#47]
- **Retry DuplicateOutput at startup and request encoder-supported formats via Output5** (sev medium, small) — In open() wrap DuplicateOutput in a short retry (2-3 tries, ~200ms apart, re-attach_input_desktop between) before bailing. Optionally cast the output to IDXGIOutput5 and call DuplicateOutput1 with an explicit format list (BGRA8 for SDR, R16G16B16A16_FLOAT for HDR) so the capture format is intentional rather than incidental, falling back to DuplicateOutput when Output5 is absent. - **Retry DuplicateOutput at startup and request encoder-supported formats via Output5** — SHIPPED (2026-06-20). [#35]
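
The update-flag decision in the first item of this list can be sketched as a pure function, independent of any D3D11 calls. This is a sketch under stated assumptions: `FrameInfo` is a plain stand-in for the relevant `DXGI_OUTDUPL_FRAME_INFO` fields, `AcquireAction` and `classify_acquire` are hypothetical names, the "and/or" in the text is read as a logical OR, and `cursor_visible` is an assumed input that the caller would track.

```rust
/// Plain stand-in for the DXGI_OUTDUPL_FRAME_INFO fields the decision needs.
#[derive(Clone, Copy, Debug)]
pub struct FrameInfo {
    pub last_present_time: i64,
    pub last_mouse_update_time: i64,
    pub accumulated_frames: u32,
    pub pointer_shape_buffer_size: u32,
}

/// What the acquire path should do after a successful AcquireNextFrame (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum AcquireAction {
    /// The desktop image changed: copy the source texture as usual.
    CopyNewFrame,
    /// Only the pointer changed: recomposite the cursor onto the last presented texture.
    RecompositeCursor,
    /// Nothing changed: release the frame and repeat the last presented one.
    RepeatLast,
}

pub fn classify_acquire(info: &FrameInfo, cursor_visible: bool) -> AcquireAction {
    // Assumption: either a non-zero present time or accumulated frames counts as a frame update.
    let frame_update = info.last_present_time != 0 || info.accumulated_frames != 0;
    // Assumption: a pointer move or a new pointer shape both count as a mouse update.
    let mouse_update =
        info.last_mouse_update_time != 0 || info.pointer_shape_buffer_size != 0;

    if frame_update {
        AcquireAction::CopyNewFrame
    } else if mouse_update && cursor_visible {
        AcquireAction::RecompositeCursor
    } else {
        AcquireAction::RepeatLast
    }
}
```

Keeping the classification separate from the capture code makes the three outcomes easy to unit-test without a GPU.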
### Windows.Graphics.Capture (WGC) path — Apollo vs punktfunk
@@ -1099,10 +1101,10 @@ punktfunk's cursor handling lives in `crates/punktfunk-host/src/capture/dxgi.rs`
#### Transfer opportunities
- ✅ **DONE (2026-06-16)****Split every cursor shape into an alpha image + an XOR image (two-pass composite)** (sev high, medium) — Refactor convert_pointer_shape in dxgi.rs to return two optional images (alpha, xor) mirroring Apollo's split. Store cursor_shape as Option<(alpha, xor)>, upload up to two SRVs in CursorCompositor, and in composite_cursor_gpu run the alpha pass with self.blend then the xor pass with self.blend_invert (skip empties). Drop the single cursor_invert flag. - ✅ **Split every cursor shape into an alpha image + an XOR image (two-pass composite)** — SHIPPED (2026-06-16; capture/dxgi.rs). [#13]
- **Render the monochrome 'inverse of screen' pixels via the XOR pass instead of dropping them** (sev medium, small) — In convert_pointer_shape's monochrome branch (dxgi.rs:628-654), once the dual-pass split (above) exists, route code (1,1) to the XOR image as white and codes (0,0)/(0,1) to the alpha image as opaque black/white, matching Apollo's case mapping. - **Render the monochrome 'inverse of screen' pixels via the XOR pass instead of dropping them** — SHIPPED (2026-06-20). [#37]
- ⊘ **ALREADY-HANDLED (2026-06-16; premise incorrect — DDA returns S_OK on pointer-only updates, punktfunk recomposites)****Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame** (sev high, large) — Keep a clean intermediate copy of the last desktop frame (an extra DEFAULT texture). In acquire (dxgi.rs:1341), when AcquireNextFrame times out but update_cursor saw a position change (LastMouseUpdateTime changed) and the cursor is visible, copy the clean intermediate into gpu_copy and re-run composite_cursor_gpu, then return that as a fresh frame instead of repeating last_present. - ⊘ **Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame** — NOT-A-BUG (2026-06-16; DDA returns S_OK on pointer-only updates and punktfunk recomposites). [#21]
- **Stop baking the cursor destructively into the repeated gpu_copy texture** (sev medium, medium) — Add a clean base texture: CopyResource(duplication -> clean_base), then CopyResource(clean_base -> gpu_copy) and composite onto gpu_copy. Repeat clean_base (cursor-free) plus a re-composite on repeats. Also create the cursor RTV once per gpu_copy and cache it rather than CreateRenderTargetView every composite (dxgi.rs:1181-1184). - **Stop baking the cursor destructively into the repeated gpu_copy texture** — SHIPPED (2026-06-20). [#49]
- **Handle rotated outputs in cursor positioning** (sev low, medium) — Read rotation from DXGI_OUTDUPL_DESC.Rotation when opening/rebuilding the duplication (around dxgi.rs:888 and 1298), store it on DuplCapturer, and apply Apollo's rotation transform when computing the NDC rect in CursorCompositor::draw and when sampling the cursor texture in the VS.
- **Validate masked-color mask bytes and log illegal values** (sev low, small) — In the MASKED_COLOR branch of convert_pointer_shape (dxgi.rs:594-627), branch explicitly on mask==0x00 vs mask==0xFF and emit a tracing::warn! once for any other value, matching Apollo's guard, so future cursor-render bugs are observable.
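
The monochrome case mapping from the list above (codes (0,0) and (0,1) go to the alpha image as opaque black or white, code (1,1) goes to the XOR image as white) can be sketched as below. This is an illustration, not the `convert_pointer_shape` code: the function names and the `Rgba` alias are hypothetical, and the buffer layout is assumed to be the usual monochrome pointer layout of a 1 bpp AND mask followed by a 1 bpp XOR mask of the same size, most significant bit leftmost.

```rust
/// RGBA pixel; alpha 0 means "this pass leaves the screen pixel untouched".
pub type Rgba = [u8; 4];

const CLEAR: Rgba = [0, 0, 0, 0];
const BLACK: Rgba = [0, 0, 0, 255];
const WHITE: Rgba = [255, 255, 255, 255];

/// Map one (AND, XOR) bit pair to its (alpha-image, xor-image) pixels.
pub fn classify_mono_pixel(and_bit: bool, xor_bit: bool) -> (Rgba, Rgba) {
    match (and_bit, xor_bit) {
        (false, false) => (BLACK, CLEAR), // opaque black
        (false, true) => (WHITE, CLEAR),  // opaque white
        (true, false) => (CLEAR, CLEAR),  // transparent: screen shows through
        (true, true) => (CLEAR, WHITE),   // inverse of screen: white in the XOR pass
    }
}

/// Split a monochrome pointer shape into an alpha image and an XOR image.
///
/// `height` is the visible cursor height, i.e. half the row count of the combined buffer.
/// `pitch` is the number of bytes per mask row. Returns `None` if the buffer is too small.
pub fn split_monochrome(
    shape: &[u8],
    width: usize,
    height: usize,
    pitch: usize,
) -> Option<(Vec<Rgba>, Vec<Rgba>)> {
    let mask_len = pitch * height;
    if pitch * 8 < width || shape.len() < mask_len * 2 {
        return None;
    }
    let mut alpha_img = Vec::with_capacity(width * height);
    let mut xor_img = Vec::with_capacity(width * height);
    for y in 0..height {
        for x in 0..width {
            let byte = y * pitch + x / 8;
            let bit = 0x80u8 >> (x % 8); // most significant bit is the leftmost pixel
            let and_bit = (shape[byte] & bit) != 0;
            let xor_bit = (shape[mask_len + byte] & bit) != 0;
            let (alpha_px, xor_px) = classify_mono_pixel(and_bit, xor_bit);
            alpha_img.push(alpha_px);
            xor_img.push(xor_px);
        }
    }
    Some((alpha_img, xor_img))
}
```

The two returned images correspond to the two composite passes: the alpha image is drawn with normal blending and the XOR image with an inverting blend, and an all-clear image can be skipped.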
@@ -1295,10 +1297,10 @@ punktfunk drives the **raw NVENC API** via `nvidia_video_codec_sdk::{sys, ENCODE
#### Transfer opportunities
- **Add real reference-frame invalidation (RFI) instead of always forcing IDR** (sev high, large) — In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false. - **Add real reference-frame invalidation (RFI) instead of always forcing IDR** — SHIPPED (2026-06-20; NVENC impl CI-pending). [#22]
- **Query nvEncGetEncodeCaps and gate config on real GPU capabilities** (sev medium, medium) — Add a `get_cap(cap: NV_ENC_CAPS) -> i32` helper in nvenc.rs after open_encode_session_ex (using API.get_encode_caps), verify codec_guid is in get_encode_guids, reject out-of-range WxH up front, and use SUPPORT_10BIT_ENCODE / SUPPORT_REF_PIC_INVALIDATION / SUPPORT_CUSTOM_VBV_BUF_SIZE to gate the corresponding config rather than assuming support. Surfaces clear errors instead of opaque InvalidParam. - **Query nvEncGetEncodeCaps and gate config on real GPU capabilities** — SHIPPED (2026-06-20; CI-pending). [#51]
- **Use async encode with a Win32 completion event + timeout** (sev medium, medium) — In nvenc.rs, gate on NV_ENC_CAPS_ASYNC_ENCODE_SUPPORT, create a per-bitstream Win32 Event (windows::Win32::System::Threading::CreateEventW), set init.enableEncodeAsync=1, store the event in `pending`, set pic.completionEvent + lock.doNotWait=1, and in poll() WaitForSingleObject(ev, 100ms) before lock_bitstream — returning a clear timeout error instead of blocking forever.
- **Minimize NvEnc API/struct versions per codec for older-driver compatibility** (sev medium, medium) — Add a `min_api_version(codec)` (v11 for H264/HEVC, v12 for AV1) and a helper that rewrites the version word (and optionally the struct-revision byte) before each NvEnc struct is passed, mirroring nvenc_base.cpp:666-680. Set apiVersion in open_encode_session_ex (nvenc.rs:186) from it. Maximizes driver compatibility for the field. - **Minimize NvEnc API/struct versions per codec for older-driver compatibility** — OBSOLETE (2026-06-20; handled by the SDK crate). [#53]
- **Add zeroReorderDelay/lookahead-off/lowDelayKeyFrameScale and always emit SDR VUI** (sev low, small) — In init_session set cfg.rcParams.zeroReorderDelay=1, enableLookahead=0, lowDelayKeyFrameScale=1 right after the CBR/VBV block (nvenc.rs:220-227). Add an SDR VUI branch (BT.709 primaries/transfer/matrix, limited range) alongside the existing HDR branch (:243) so every HEVC/H264 stream signals its colorspace.
- **Honor client slices-per-frame and offer NVENC intra-refresh** (sev low, medium) — Thread a slices-per-frame value from session negotiation into NvencD3d11Encoder::open and set hevcConfig/h264Config sliceMode=3 + sliceModeData in init_session; for AV1 set numTileRows/numTileColumns as nearest powers of two. Optionally add an intra-refresh config branch gated on NV_ENC_CAPS_SUPPORT_INTRA_REFRESH as an alternative recovery mode to RFI.
@@ -1492,8 +1494,8 @@ punktfunk's SudoVDA backend lives in `crates/punktfunk-host/src/vdisplay/sudovda
- **Detect watchdog ping failures and escalate (re-open the device)** (sev high, medium) — In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
- **Gate on SudoVDA protocol-version compatibility instead of only logging it** (sev medium, small) — In SudoVdaDisplay::new (sudovda.rs:412-432) parse {Major,Minor,Incremental} and compare against a compiled-in EXPECTED_PROTOCOL {Major:0,Minor:2}. If Major differs or our Minor > driver Minor, return Err with a 'driver too old / incompatible — update SudoVDA' message (and a distinct error variant the mgmt API can surface, like Apollo's VirtualDisplayDriverReady in nvhttp.cpp:936).
- **Retry device open with exponential backoff** (sev medium, small) — Wrap open_device in SudoVdaDisplay::new (sudovda.rs:412-413) in a 20→320ms backoff loop matching Apollo; on a session-time re-open after watchdog failure, allow a few retries with ~1s spacing.
- **Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU** (sev high, medium) — Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs. Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter. - **Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU** — SHIPPED (2026-06-20). [#16]
- **Derive a stable per-client MonitorGuid instead of one global constant** (sev medium, medium) — Pass a client/session identifier into create() (thread it from the m3 handshake) and derive the GUID deterministically from it (e.g. hash the client cert fingerprint into a u128), replacing the constant at sudovda.rs:452-456 and the RemoveParams guid at sudovda.rs:568. Keep a fixed probe GUID for the startup encoder probe like Apollo's PROBE_DISPLAY_UUID. - **Derive a stable per-client MonitorGuid instead of one global constant** — SHIPPED (2026-06-20). [#55]
- **Add millihertz CCD mode-set with ±1 Hz fallback and SDC_SAVE_TO_DATABASE persistence** (sev medium, medium) — In set_active_mode (sudovda.rs:146-265), after the integer DEVMODE attempt add a CCD path: QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS), match the path by GDI name, set sourceMode width/height and targetInfo.refreshRate = {hz,1000}, and call SetDisplayConfig with SDC_APPLY|SDC_USE_SUPPLIED_DISPLAY_CONFIG|SDC_SAVE_TO_DATABASE. Add an alt-rate (±1) retry mirroring virtual_display.cpp:294-300.
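
The version gate and the open-retry bullets above reduce to two small pieces of control flow. A minimal sketch, assuming `u32` version fields and hypothetical names (`ProtocolVersion`, `VersionError`, `open_with_backoff`) that do not come from sudovda.rs; the sleep function is injected only so the schedule can be checked without waiting:

```rust
use std::time::Duration;

/// Driver protocol version; the field widths are an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProtocolVersion {
    pub major: u32,
    pub minor: u32,
    pub incremental: u32,
}

/// Compiled-in expectation, using the {Major:0, Minor:2} value from the bullet.
pub const EXPECTED_PROTOCOL: ProtocolVersion =
    ProtocolVersion { major: 0, minor: 2, incremental: 0 };

/// Hypothetical error variants a management API could surface.
#[derive(Debug, PartialEq, Eq)]
pub enum VersionError {
    MajorMismatch { expected: u32, found: u32 },
    DriverTooOld { expected_minor: u32, found_minor: u32 },
}

/// Reject when Major differs or when our Minor is newer than the driver's.
pub fn check_protocol(driver: ProtocolVersion) -> Result<(), VersionError> {
    if driver.major != EXPECTED_PROTOCOL.major {
        return Err(VersionError::MajorMismatch {
            expected: EXPECTED_PROTOCOL.major,
            found: driver.major,
        });
    }
    if EXPECTED_PROTOCOL.minor > driver.minor {
        return Err(VersionError::DriverTooOld {
            expected_minor: EXPECTED_PROTOCOL.minor,
            found_minor: driver.minor,
        });
    }
    Ok(())
}

/// Retry `open` with delays of 20, 40, 80, 160 and 320 ms, then give up
/// and return the last error.
pub fn open_with_backoff<T, E>(
    mut open: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut delay_ms: u64 = 20;
    loop {
        match open() {
            Ok(value) => return Ok(value),
            Err(err) if delay_ms > 320 => return Err(err),
            Err(_) => {
                sleep(Duration::from_millis(delay_ms));
                delay_ms *= 2;
            }
        }
    }
}
```

In production the `sleep` argument would simply be `std::thread::sleep`; a driver that never opens costs six attempts and 620 ms in total.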
### Windows host: running as SYSTEM, secure-desktop capture, session/desktop switching + D3D recreation, NVIDIA driver prefs (nvprefs), GPU/adapter preference, display isolation, mDNS publish
@@ -1555,7 +1557,7 @@ punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely matu
##### Where punktfunk is weaker / missing / fragile
1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`design/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`design/windows-secure-desktop.md:89`). PsExec is a 3rd-party SysInternals tool, not redistributable cleanly, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`. 1. **No real Windows service — relies on a PsExec scheduled task.** The launch chain is a scheduled task → `PsExec64 -s -i 1` → `wscript.exe launch.vbs` → hidden `host-run.cmd` (`design/windows-host.md:78-84`). There is **no `SERVICE_CONTROL_SESSIONCHANGE` relaunch** — the doc even lists it as unimplemented "step 6" (`design/archive/windows-secure-desktop.md:89`). PsExec is a 3rd-party SysInternals tool, not redistributable cleanly, and `-s -i 1` hard-codes session 1. None of the launch scripts (`launch.vbs`, `host-run.cmd`) are checked into the repo (only `scripts/headless/win-build.cmd` exists). This is the single biggest fragility vs Apollo's `sunshinesvc.cpp`.
2. **No nvprefs / NvAPI at all.** `grep` for `nvprefs|NvAPI|DRS_|PREFERRED_PSTATE|DXPRESENT` across the host returns nothing. No PREFERRED_PSTATE_MAX for the encoder, no OGL_CPL_PREFER_DXPRESENT (so GL/Vulkan fullscreen apps may not be capturable via WGC/DDA), and no undo-file crash safety.
3. **No DXGI GPU-preference / output-reparenting hook.** No MinHook of `NtGdiDdDDIGetCachedHybridQueryValue`. On a hybrid/Optimus box DXGI can reparent the SudoVDA output onto the render GPU and break DDA. punktfunk's "search all adapters" partly papers over this but does not prevent the reparenting itself.
4. **mDNS uses the cross-platform `mdns-sd` crate, not Windows-native `DnsServiceRegister`** (`discovery.rs:17`). It works, but it does NOT carry Apollo's RFC-1035 empty-TXT fix — and the GameStream/Moonlight mDNS path on Windows is unverified (`design/windows-host.md:46`). A non-RFC-compliant TXT can be rejected by Apple's resolver.
@@ -1567,12 +1569,12 @@ punktfunk's **secure-desktop / desktop-switch capture recovery is genuinely matu
#### Transfer opportunities
- **Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change** (sev high, large) — Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency. - **Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change** — SHIPPED (2026-06-20). [#24]
- **Add an NvAPI driver-settings manager (PREFERRED_PSTATE_MAX + OGL_CPL_PREFER_DXPRESENT) with a crash-safe undo file** (sev medium, large) — Add a windows-only nvprefs module wrapping NvAPI DRS (load nvapi64 dynamically, treat NvAPI_Initialize failure as 'no NVIDIA, skip'). Create a 'punktfunk' app profile with PREFERRED_PSTATE_PREFER_MAX, set OGL_CPL_PREFER_DXPRESENT_ENABLED on the base profile behind a config flag, write an undo file under %ProgramData%\\punktfunk before global changes, and call it on session start (the new stream_will_start hook below).
- **Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs** (sev medium, medium) — Add a once-init in the Windows capture path (capture/dxgi.rs open) that installs the same hook via a minhook-rs/detour crate (or a manual IAT/inline hook) on NtGdiDdDDIGetCachedHybridQueryValue forcing STATE_UNSPECIFIED, plus SetProcessDpiAwarenessContext(PER_MONITOR_AWARE_V2). Gate it to NVIDIA/hybrid boxes; it's process-lifetime so no teardown needed. - **Hook win32u!NtGdiDdDDIGetCachedHybridQueryValue to stop DXGI output-reparenting on hybrid/Optimus GPUs** — SHIPPED (2026-06-20). [#57]
- **Add a Windows stream_will_start/stop hook: timer resolution, MMCSS, HIGH_PRIORITY_CLASS, display-required, headless Mouse Keys** (sev medium, medium) — Add a windows-only RAII guard invoked when a session starts (punktfunk1.rs/pipeline session setup) that raises timer resolution (NtSetTimerResolution or timeBeginPeriod(1)), DwmEnableMMCSS(true), SetPriorityClass(HIGH_PRIORITY_CLASS), and wraps the DXGI capture loop in SetThreadExecutionState(ES_CONTINUOUS|ES_DISPLAY_REQUIRED) (capture/dxgi.rs next_frame loop), reverting on drop. Optionally the headless Mouse-Keys trick for cursor visibility.
- **Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host** (sev low, medium) — Either (a) verify mdns-sd always emits an RFC-1035-valid TXT (never zero strings) and add a regression test, or (b) add a windows-only discovery backend using DnsServiceRegister via the windows crate's DNS APIs mirroring publish.cpp, including the single-empty-TXT workaround, so Apple NWBrowser/Moonlight discover the host reliably. - **Use Windows-native DnsServiceRegister (or fix the TXT record) so Apple's mDNS resolver accepts the host** — SHIPPED (2026-06-20). [#87]
- **Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime** (sev medium, small) — In capture/dxgi.rs next_frame, query the cached IDXGIFactory's IsCurrent() once per loop and trigger the existing recreate path when it goes false (catches HDR/topology changes cleanly). Replace now_ns() on Windows with GetSystemTimePreciseAsFileTime converted to Unix-epoch ns so ClockProbe/ClockEcho skew correction stays accurate cross-machine. - **Add per-frame IDXGIFactory::IsCurrent reinit detection and switch the host clock to GetSystemTimePreciseAsFileTime** — SHIPPED (2026-06-20). [#42]
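
The clock half of the last bullet depends on one conversion: a Windows FILETIME counts 100 ns ticks since 1601-01-01 UTC, so converting to Unix-epoch nanoseconds means subtracting the 1601 to 1970 offset and multiplying by 100. A minimal, platform-independent sketch; the function name and the `(low, high)` argument shape are assumptions, and the actual `GetSystemTimePreciseAsFileTime` call is left out:

```rust
/// Number of 100 ns ticks between 1601-01-01 and 1970-01-01 (UTC),
/// i.e. 11_644_473_600 seconds * 10_000_000 ticks per second.
pub const FILETIME_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;

/// Convert the two 32-bit halves of a FILETIME into Unix-epoch nanoseconds.
/// Returns None for timestamps before 1970 or values that overflow u64.
pub fn filetime_to_unix_ns(low: u32, high: u32) -> Option<u64> {
    let ticks = ((high as u64) << 32) | low as u64;
    ticks
        .checked_sub(FILETIME_UNIX_EPOCH_TICKS)?
        .checked_mul(100)
}
```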
### Completeness critic — areas flagged as under-covered
@@ -1769,18 +1771,10 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
#### 1. Switch SendInput to retry-on-failure desktop reattach (drop per-event OpenInputDesktop)
*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* small **SHIPPED (2026-06-20)** — per-event OpenInputDesktop dropped for inject-first + retry-on-failure desktop reattach.
- **Apollo does:** send_input() / inject_synthetic_pointer_input() call SendInput FIRST, and only on failure (0 injected) re-run syncThreadDesktop() (OpenInputDesktop(DF_ALLOWOTHERACCOUNTHOOK)+SetThreadDesktop) and retry once, tracking the desktop in a thread_local _lastKnownInputDesktop — src/platform/windows/input.cpp:477,499 + src/platform/windows/misc.cpp:251
- **punktfunk gap:** SendInputInjector::inject() calls reattach_input_desktop() (an OpenInputDesktop+SetThreadDesktop+CloseDesktop) at the TOP of EVERY event — crates/punktfunk-host/src/inject/sendinput.rs:97,50-69. This is a syscall triple per mouse-move; punktfunk's own design/windows-secure-desktop.md:78-80 lists this exact refactor (step 2) as planned but unshipped.
- **Proposal:** Inject first; cache the HDESK thread-local; only on a 0/partial SendInput result call reattach_input_desktop() and retry once. Use DF_ALLOWOTHERACCOUNTHOOK in the OpenInputDesktop access (sendinput.rs:52-56 currently passes DESKTOP_CONTROL_FLAGS(0)) so the secure desktop is reachable. Keeps the steady-state hot path to a single SendInput call.
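
The inject-first flow in this proposal can be shown without any Win32 calls. A minimal sketch, assuming a hypothetical `InputBackend` trait that stands in for SendInput and the OpenInputDesktop + SetThreadDesktop reattach; `ScriptedBackend` is only a test double, and retrying just the un-injected tail after a partial result is one possible reading of "0/partial":

```rust
/// Hypothetical abstraction over the OS calls used by the injector.
pub trait InputBackend {
    /// Like SendInput: returns how many of `events` were injected.
    fn send(&mut self, events: &[u32]) -> usize;
    /// Like OpenInputDesktop + SetThreadDesktop: true if reattached.
    fn reattach_desktop(&mut self) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
pub enum InjectError {
    ReattachFailed,
    StillBlocked { injected: usize },
}

/// Inject first; only on a zero or partial result reattach and retry once.
pub fn inject_with_retry<B: InputBackend>(
    backend: &mut B,
    events: &[u32],
) -> Result<(), InjectError> {
    let sent = backend.send(events).min(events.len());
    if sent == events.len() {
        return Ok(()); // steady state: a single send call, no desktop work
    }
    if !backend.reattach_desktop() {
        return Err(InjectError::ReattachFailed);
    }
    let rest = &events[sent..];
    let resent = backend.send(rest).min(rest.len());
    if resent == rest.len() {
        Ok(())
    } else {
        Err(InjectError::StillBlocked { injected: sent + resent })
    }
}

/// Test double: returns scripted send results in order, then succeeds.
pub struct ScriptedBackend {
    pub send_results: Vec<usize>,
    pub reattach_ok: bool,
    pub sends: usize,
    pub reattaches: usize,
}

impl InputBackend for ScriptedBackend {
    fn send(&mut self, events: &[u32]) -> usize {
        let result = self
            .send_results
            .get(self.sends)
            .copied()
            .unwrap_or(events.len());
        self.sends += 1;
        result
    }

    fn reattach_desktop(&mut self) -> bool {
        self.reattaches += 1;
        self.reattach_ok
    }
}
```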
#### 2. Detect resolution/format change on the acquire hot path, not only during rebuild
*Area:* `win:capture-dxgi-dd` · *Windows-host:* yes · *Severity:* high · *Effort:* small **SHIPPED (2026-06-20)** — acquire-path GetDesc check now catches resolution/format changes that don't raise ACCESS_LOST.
- **Apollo does:** Every frame Apollo reads src->GetDesc() and reinits if desc.Width/Height != width_before_rotation/height_before_rotation or capture_format != desc.Format (display_vram.cpp:1215-1236, display_ram.cpp:253-265, wgc 1662-1674).
- **punktfunk gap:** punktfunk only re-reads dimensions inside recreate_dupl (dxgi.rs:1298-1313). On the normal acquire path (dxgi.rs:1426-1492) it never validates the acquired texture's desc, so a mode change that doesn't raise ACCESS_LOST leads to CopyResource of a mismatched-size/format source into a stale gpu_copy/staging/fp16_src — silent corruption or a hard copy failure.
- **Proposal:** In acquire(), after res.cast::<ID3D11Texture2D>(), call GetDesc and compare Width/Height/Format against self.width/height and the expected format (BGRA8 vs R16G16B16A16_FLOAT). On mismatch, ReleaseFrame and run the existing recreate_dupl path (or drop gpu_copy/staging/fp16/hdr10 textures and update width/height/hdr_fp16) so the encoder re-inits cleanly. This makes live resolution + HDR-toggle changes robust even when DDA doesn't fault.
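
The acquire-path check described here is a plain comparison of the acquired texture's description against the cached one. A minimal sketch with hypothetical types (`FrameDesc`, `PixelFormat`, `AcquireAction`) standing in for `D3D11_TEXTURE2D_DESC` and the capturer's state; the real code would call ReleaseFrame and the recreate path where the sketch returns `ReleaseAndRecreate`:

```rust
/// The two capture formats named in the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PixelFormat {
    Bgra8,
    Rgba16Float,
}

/// Subset of the texture description that matters for change detection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameDesc {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AcquireAction {
    /// Description matches the cached state: safe to copy into the staging textures.
    CopyFrame,
    /// Mode changed without ACCESS_LOST: release the frame and rebuild.
    ReleaseAndRecreate { new_desc: FrameDesc },
}

/// Compare the acquired frame against what the capturer expects.
pub fn check_acquired(expected: FrameDesc, acquired: FrameDesc) -> AcquireAction {
    if acquired == expected {
        AcquireAction::CopyFrame
    } else {
        AcquireAction::ReleaseAndRecreate { new_desc: acquired }
    }
}
```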
#### 3. Per-frame IsCurrent() check to catch HDR/GPU/mode changes
*Area:* `win:capture-wgc` · *Windows-host:* yes · *Severity:* high · *Effort:* small
@@ -1790,36 +1784,13 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** Hold an IDXGIFactory1 in WgcCapturer (from the same adapter as make_device) and call IsCurrent() at the top of next_frame/wait_and_drain; on false, return the reinit signal. This pairs with wgc-size-format-reinit to give a complete change-detection story.
#### 4. Batched/GSO send for the GameStream video plane on Windows
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* high · *Effort:* medium · **✓ verified · ✅ DONE (2026-06-16)** **SHIPPED (2026-06-16)** — Windows USO batched send for the GameStream video plane via the reusable `punktfunk_core::transport::send_uso_all` helper (one WSASendMsg per 16-packet paced burst, PUNKTFUNK_GSO=0 kill-switch + auto-fallback); Host Windows compile CI-pending.
> **Resolution:** Implemented per the refined proposal. Added a reusable Windows-only
> `punktfunk_core::transport::send_uso_all(&UdpSocket, &[&[u8]]) -> io::Result<usize>` that reuses the
> native plane's proven `send_one_uso` + `uso` on/off latch + `uso_unsupported`, with the same
> uniform-size guard and ≤512-segment chunking. `gamestream/stream.rs` `sendmmsg_all` now has a
> `#[cfg(target_os="windows")]` arm that calls it per 16-packet paced burst (one `WSASendMsg` instead
> of 16 `send`s) and sends any remainder scalar; the Linux `sendmmsg` arm and a generic scalar arm are
> unchanged. PUNKTFUNK_GSO=0 kill-switch + auto-fallback inherited. Linux build unaffected;
> punktfunk-core type-checks for x86_64-pc-windows-msvc. Host Windows compile deferred to CI/dev box.
- **Apollo does:** Apollo sends every plane through platf::send_batch / send (one code path for all OSes; on Windows it uses real batched socket writes), and the video broadcast thread is the single transmit path (stream.cpp:1327, send batching at stream.cpp:1337 send_batch latency logger).
- **punktfunk gap:** The GameStream video sender's batched path is Linux-only: sendmmsg_all has a #[cfg(target_os="linux")] real implementation (stream.rs:147) and a #[cfg(not(target_os="linux"))] fallback that does one sock.send() per packet (stream.rs:185-191). On a Windows GameStream-compat host (capture IS wired for Windows via DXGI/WGC, capture.rs:261) every video datagram is an individual syscall — the native punktfunk/1 plane got Windows USO (transport/udp.rs:135) but the GameStream plane did not.
- **Proposal:** Route the GameStream video send thread through the same Windows WSASendMsg/USO + WSASend-batch path the native plane already implements in punktfunk-core transport/udp.rs (or factor that send helper into a shared module and call it from gamestream/stream.rs). Keeps GameStream-on-Windows from being syscall-bound at high bitrate.
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK gap is real. The GameStream video send path uses a private `sendmmsg_all`: real `sendmmsg` only under `#[cfg(target_os="linux")]` (crates/punktfunk-host/src/gamestream/stream.rs:147-181), and a `#[cfg(not(target_os="linux"))]` fallback that does one `sock.send(p)` per packet (stream.rs:185-191). The paced sender calls it in PACE_CHUNK=16 bursts (stream.rs:230). It operates on a raw `std::net::UdpSocket` (stream.rs:66, cloned at :310), NOT the core `Transport` trait, so it does NOT pick up the native plane's USO. The GameStream host genuinely runs on Windows: `serve`/`gamestream` are not OS-gated (main.rs:81-83 dispatch is uncfg'd; gamestream/mod.rs declares `mod stream;` with no cfg), capture is wired for Windows (capture.rs:261-279 `capture_virtual_output` via SudoVDA+WGC/DXGI), and the module has explicit Windows handling (gamestream/mod.rs:209-210 APPDATA, :216-217 COMPUTERNAME). So on a Windows GameStream-compat host every video datagram is its own syscall. Meanwhile the native plane already has the answer: crates/punktfunk-core/src/transport/udp.rs:141-246 (`uso` state + `send_one_uso` via `WSASendMsg`+`UDP_SEND_MSG_SIZE`), wired default-on at udp.rs:610-647 (`send_gso`), called by session.rs:182. Also note GameStream video datagrams are uniform `blocksize` (= packet_size+16): data shards, the zero-padded last data shard, and FEC parity shards are all full blocksize (gamestream/video.rs:41-42,76,111-166) — the exact uniform-size precondition USO/GSO needs. APOLLO confirms the claimed unified path: `platf::send_batch` (src/platform/common.h:697) is the single video transmit call (src/stream.cpp:1598, in videoBroadcastThread, latency-logged at stream.cpp:1337); its Windows impl is real USO — `WSASendMsg` with a `UDP_SEND_MSG_SIZE` cmsg of `header_size+payload_size` (src/platform/windows/misc.cpp:1408,1499,1508), with a per-packet `send()` fallback (misc.cpp:1510-1587) "if USO is not supported ... 
caller will fall back to unbatched sends" (misc.cpp:1504-1505).
- **Refined:** Route the GameStream Windows video send through USO instead of per-packet `send`. Do NOT duplicate the WSASendMsg code — factor the native plane's USO helper out of `UdpTransport`. Extract `send_one_uso` + the `uso` enable/latch state + `uso_unsupported` + the uniform-size chunking loop (currently udp.rs:185-246 and the `send_gso` Windows body udp.rs:610-647) into a small `pub(crate)` free function in punktfunk-core, e.g. `transport::udp::send_packets_uso(socket: &UdpSocket, packets: &[&[u8]]) -> io::Result<usize>` that takes a raw connected `std::net::UdpSocket` (the GameStream sender already owns one) and applies USO with the same default-on + auto-fallback-to-per-packet + PUNKTFUNK_GSO=0 kill-switch semantics. Then rewrite gamestream/stream.rs `sendmmsg_all` so the `#[cfg(target_os="windows")]` arm calls that helper (the Linux arm keeps its sendmmsg; a `not(any(linux,windows))` arm keeps the scalar loop). GameStream packets are already uniform blocksize per the packetizer, so the USO uniform-size guard passes; the existing PACE_CHUNK=16 microburst pacing is unaffected (each chunk becomes one WSASendMsg). Add a Linux GSO arm too while there (same helper pattern) for parity, but USO/Windows is the point of this item. Keep the change inside punktfunk-core for the helper (one core, C-ABI-stable — no new public ABI surface needed, it's pub(crate)) and a ~10-line edit in the host. This respects: no async on frame path (native sockets only), no protocol change, no scaling change.
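
The uniform-size guard and the ≤512-segment chunking mentioned above can be separated from the socket call. A minimal sketch of that planning step only; `plan_uso_batches` and `MAX_USO_SEGMENTS` are hypothetical names, the `WSASendMsg` call itself is omitted, and the sketch takes the strict reading where every packet must have the same non-zero length:

```rust
/// Upper bound on segments per batched send, per the "≤512-segment chunking" note.
pub const MAX_USO_SEGMENTS: usize = 512;

/// Split `packets` into batches that could each go out as one segmented send.
/// Returns None when the packets are not uniform in size, which means the
/// caller should fall back to one send per packet.
pub fn plan_uso_batches<'a, 'b>(packets: &'a [&'b [u8]]) -> Option<Vec<&'a [&'b [u8]]>> {
    let segment_len = match packets.first() {
        Some(first) => first.len(),
        None => return Some(Vec::new()), // nothing to send
    };
    if segment_len == 0 || packets.iter().any(|p| p.len() != segment_len) {
        return None;
    }
    Some(packets.chunks(MAX_USO_SEGMENTS).collect())
}
```

With the existing 16-packet paced bursts, each burst yields exactly one batch, which matches the "one WSASendMsg instead of 16 sends" description.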
#### 5. Gate the GameStream HTTPS plane on the paired-cert allow-list
*Area:* `cmp:gamestream-http-pairing` · *Windows-host:* yes · *Severity:* high · *Effort:* medium **SHIPPED (2026-06-20)** — gamestream/tls.rs surfaces the verified peer cert (PeerCertFingerprint) and nvhttp.rs gates /launch /resume /applist /cancel on the paired-fingerprint set (closes the "any TLS client can launch" hole).
- **Apollo does:** Apollo defers TLS verification (nvhttp.cpp:88 sets verify_peer|verify_fail_if_no_peer_cert with a permissive OpenSSL cb, then the accept() override runs cert_chain.verify() post-handshake and stashes the matched named_cert_t into request->userp; every authenticated handler calls get_verified_cert(request) — nvhttp.cpp:665-667,915,1086,1172,1360 — so an unpaired cert is rejected with a proper XML body, not just accepted).
- **punktfunk gap:** punktfunk pins the client cert at pairing (pairing.rs:230-236) and loads it into AppState.paired (mod.rs:134) but NEVER consults it: tls.rs:38-45 verify_client_cert always returns assertion(), and /launch (nvhttp.rs:87-109) does no identity check. Any client that completed a TLS handshake — paired or not — can launch a session.
- **Proposal:** After the handshake, recover the peer cert (axum_server exposes the rustls connection / peer certs), SHA-256 it, and check it against AppState.paired in /launch, /resume, /applist, /cancel (and reflect the real result in serverinfo PairStatus). Keep verify_client_cert lenient for the handshake but reject unpaired identities at the handler with an XML error, mirroring Apollo's get_verified_cert pattern. This is the single highest-value GameStream-compat hardening item and applies equally to the Windows host.
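
The handler-level gate in this proposal is a set lookup keyed by the peer certificate's SHA-256 fingerprint. A minimal sketch, assuming the fingerprint has already been computed elsewhere (hashing needs a crate outside std, so it is left out) and using hypothetical names (`Fingerprint`, `Gate`, `gate_request`); the four gated paths are the ones listed in the text:

```rust
use std::collections::HashSet;

/// SHA-256 digest of the peer certificate, computed by the TLS layer.
pub type Fingerprint = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Allow,
    /// The handler should answer with an error body instead of acting.
    RejectUnpaired,
}

/// Endpoints that require a paired identity, per the proposal.
const GATED_PATHS: [&str; 4] = ["/launch", "/resume", "/applist", "/cancel"];

/// Decide whether a request may proceed, given the verified peer fingerprint
/// (None when the client presented no certificate) and the paired set.
pub fn gate_request(
    path: &str,
    peer: Option<&Fingerprint>,
    paired: &HashSet<Fingerprint>,
) -> Gate {
    if !GATED_PATHS.iter().any(|gated| *gated == path) {
        return Gate::Allow; // e.g. serverinfo and pairing stay reachable
    }
    match peer {
        Some(fingerprint) if paired.contains(fingerprint) => Gate::Allow,
        _ => Gate::RejectUnpaired,
    }
}
```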
#### 6. Query NVENC encode capabilities before init and degrade gracefully
*Area:* `cmp:video-encode` · *Windows-host:* yes · *Severity:* high · *Effort:* medium **SHIPPED (2026-06-20)** — encode/nvenc.rs query_caps probes nvEncGetEncodeCaps and degrades gracefully (over-range reject, 10-bit→8-bit fallback, custom-VBV gate, RFI flag); Windows compile CI-pending.
- **Apollo does:** nvenc_base.cpp:175-220 builds a get_encoder_cap lambda over nvEncGetEncodeCaps and checks NV_ENC_CAPS_WIDTH_MAX/HEIGHT_MAX (rejects with a clear message), SUPPORT_10BIT_ENCODE, SUPPORT_YUV444_ENCODE, SUPPORT_REF_PIC_INVALIDATION (toggles encoder_params.rfi), SUPPORT_CUSTOM_VBV_BUF_SIZE (nvenc_base.cpp:250-255), SUPPORT_CABAC (nvenc_base.cpp:311-315), SUPPORT_WEIGHTED_PREDICTION (nvenc_base.cpp:220), and SUPPORT_INTRA_REFRESH/SINGLE_SLICE_INTRA_REFRESH (nvenc_base.cpp:334-345). Each missing cap downgrades a feature instead of failing.
- **punktfunk gap:** crates/punktfunk-host/src/encode/nvenc.rs:131-323 init_session never calls nvEncGetEncodeCaps. Max W/H is only checked against a static per-codec constant (encode.rs:57-62) not the GPU's real cap; 10-bit Main10 is forced (nvenc.rs:233-237) without checking SUPPORT_10BIT_ENCODE; custom VBV (nvenc.rs:224-227) is set without checking SUPPORT_CUSTOM_VBV_BUF_SIZE. On an unsupported card these surface as opaque InvalidParam handled only by bitrate step-down, which masks the real cause.
- **Proposal:** Add a caps query in NvencD3d11Encoder::init_session right after open_encode_session_ex: build a get_cap(NV_ENC_CAPS) helper over nvEncGetEncodeCaps, validate encodeWidth/Height against WIDTH_MAX/HEIGHT_MAX with a clear error, gate the 10-bit path on SUPPORT_10BIT_ENCODE (fall back to 8-bit with a warning instead of failing), gate custom VBV on SUPPORT_CUSTOM_VBV_BUF_SIZE, and record an rfi-supported flag for the RFI work below.
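
Once the capability values are in hand, the degrade-or-reject decisions are pure logic. A minimal sketch of that decision step, assuming hypothetical structs (`EncodeCaps`, `EncodeRequest`, `EncodePlan`) filled from the `nvEncGetEncodeCaps` results; the NVENC calls themselves are not shown:

```rust
/// Capability values as reported by the GPU (filled from the caps query).
pub struct EncodeCaps {
    pub width_max: u32,
    pub height_max: u32,
    pub ten_bit: bool,
    pub custom_vbv: bool,
    pub ref_pic_invalidation: bool,
}

/// What the session would like to use.
pub struct EncodeRequest {
    pub width: u32,
    pub height: u32,
    pub ten_bit: bool,
    pub custom_vbv: bool,
}

/// What the session will actually use after gating on the caps.
#[derive(Debug, PartialEq, Eq)]
pub struct EncodePlan {
    pub ten_bit: bool,
    pub custom_vbv: bool,
    pub rfi: bool,
    pub warnings: Vec<String>,
}

/// Reject over-range sizes with a clear message; downgrade optional features.
pub fn plan_session(caps: &EncodeCaps, req: &EncodeRequest) -> Result<EncodePlan, String> {
    if req.width > caps.width_max || req.height > caps.height_max {
        return Err(format!(
            "{}x{} exceeds the GPU encode limit of {}x{}",
            req.width, req.height, caps.width_max, caps.height_max
        ));
    }
    let mut warnings = Vec::new();
    let ten_bit = req.ten_bit && caps.ten_bit;
    if req.ten_bit && !ten_bit {
        warnings.push("10-bit encode unsupported, falling back to 8-bit".to_string());
    }
    let custom_vbv = req.custom_vbv && caps.custom_vbv;
    if req.custom_vbv && !custom_vbv {
        warnings.push("custom VBV size unsupported, using the driver default".to_string());
    }
    Ok(EncodePlan { ten_bit, custom_vbv, rfi: caps.ref_pic_invalidation, warnings })
}
```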
#### 7. Detect default-render-device changes and reinit WASAPI capture
*Area:* `cmp:audio` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1829,11 +1800,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In wasapi_cap.rs, register a device-notification callback on the DeviceEnumerator; on default-render change, break the capture loop and reopen get_default_device(Render) + a fresh loopback IAudioClient (re-running the init block at wasapi_cap.rs:105-133). Surface it through the existing thread without tearing down the WasapiLoopbackCapturer handle so the session keeps streaming.
#### 8. Move GameStream input injection off the ENet service thread
*Area:* `cmp:input` · *Windows-host:* yes · *Severity:* high · *Effort:* medium **SHIPPED (2026-06-20)** — on_receive forwards to a shared crate::inject InjectorService thread (+ relative-mouse/scroll coalescing, #45); the ENet thread no longer blocks on injection.
- **Apollo does:** The control thread only enqueues bytes + schedules a task; a pool thread pops one packet, batches later same-type packets while holding the queue lock, then RELEASES the lock before the (slow) SendInput/ViGEm call — src/input.cpp:1481-1520, 1639-1643. A slow OS input call never stalls the network thread.
- **punktfunk gap:** on_receive() calls inj.inject(&ev) synchronously inside the host.service() ENet loop — crates/punktfunk-host/src/gamestream/control.rs:84-91,207-211. A SendInput that blocks crossing a desktop switch (or a slow ViGEm update) head-blocks ENet handshake/keepalive/retransmit servicing. The m3 path already does this right (punktfunk1.rs:1300 → injector_service_thread).
- **Proposal:** Mirror the m3 design in the GameStream control thread: push decoded InputEvents onto an mpsc channel drained by a dedicated injector thread (reuse injector_service_thread or a sibling), so the ENet thread never blocks on SendInput/ViGEm. No async needed — native thread + std::sync::mpsc, consistent with the invariant.
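
The channel hand-off proposed here needs only a native thread and `std::sync::mpsc`. A minimal sketch, with a hypothetical `InputEvent` enum and `spawn_injector` helper rather than the real `injector_service_thread`; the closure stands in for the slow SendInput/ViGEm call:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Simplified stand-in for the decoded input events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputEvent {
    MouseMove { dx: i32, dy: i32 },
    Key { code: u16, down: bool },
}

/// Start a dedicated injector thread. The network thread keeps the Sender and
/// only enqueues; the (possibly slow) `inject` call runs on the new thread.
/// The thread exits when every Sender is dropped and returns the event count.
pub fn spawn_injector<F>(mut inject: F) -> (Sender<InputEvent>, JoinHandle<usize>)
where
    F: FnMut(InputEvent) + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<InputEvent>();
    let handle = thread::spawn(move || {
        let mut injected = 0usize;
        for event in rx {
            inject(event);
            injected += 1;
        }
        injected
    });
    (tx, handle)
}
```

An unbounded channel keeps `send` non-blocking for the network thread; whether a bound or coalescing is needed on top is a separate design choice.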
#### 9. Actually launch the app/game on Windows (CreateProcessAsUserW into the user session)
*Area:* `cmp:process-launch` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1864,18 +1831,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In WgcCapturer::process_frame, call src.GetDesc() and compare Width/Height/Format against self.width/height and the expected format. On mismatch, return a Reinit error (add a capture_e::Reinit-equivalent to the Capturer contract or bail with a recognizable error the m3/stream loop maps to a capturer rebuild). Drop and re-create fp16_src/hdr10_out/bgra_copy when size changes.
#### 13. Split every cursor shape into an alpha image + an XOR image (two-pass composite)
*Area:* `win:cursor-compositing` · *Windows-host:* yes · *Severity:* high · *Effort:* medium · **✅ DONE (2026-06-16)** **SHIPPED (2026-06-16)** — two-pass cursor composite in capture/dxgi.rs (CursorShape alpha/xor layers, CursorCompositor draw_layer; MASKED_COLOR→alpha, MONOCHROME (1,1)→XOR; cursor_invert flag removed). Windows CI/dev-VM compile pending.
> **Resolution:** Implemented in `capture/dxgi.rs`. `convert_pointer_shape` now returns a `CursorShape`
> with optional `alpha`/`xor` layers; `CursorCompositor` holds `tex_alpha`/`tex_xor` and `draw_layer`
> renders each with its own blend (alpha = src-over + HDR scale; XOR = inversion, unscaled). MASKED_COLOR
> opaque pixels now go through the alpha pass (not the invert blend), and MONOCHROME `(1,1)` invert pixels
> now feed the XOR layer (previously approximated as solid black). CPU path blends both layers too.
> The `cursor_invert` flag was removed. Independently reviewed (ship); pending Windows CI/dev-VM compile.
- **Apollo does:** Apollo emits two BGRA images per shape — make_cursor_alpha_image (display_vram.cpp:279) and make_cursor_xor_image (display_vram.cpp:210) — and runs both an alpha-blend pass and an invert-blend pass in blend_cursor (display_vram.cpp:1448-1469), each skipped if its image is empty. MASKED_COLOR and MONOCHROME shapes legitimately need both.
- **punktfunk gap:** convert_pointer_shape (dxgi.rs:566) produces ONE image and cursor_invert (dxgi.rs:1133-1134) picks ONE blend for the whole shape, so a cursor mixing opaque and screen-inverting pixels (common I-beams and themed arrows) renders wrong; masked-color opaque pixels are even forced through the invert blend (dxgi.rs:612-624 + 1205).
- **Proposal:** Refactor convert_pointer_shape in dxgi.rs to return two optional images (alpha, xor) mirroring Apollo's split. Store cursor_shape as Option<(alpha, xor)>, upload up to two SRVs in CursorCompositor, and in composite_cursor_gpu run the alpha pass with self.blend then the xor pass with self.blend_invert (skip empties). Drop the single cursor_invert flag.
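
For the MONOCHROME case, the split into two layers follows directly from the AND/XOR mask pair. A minimal sketch, assuming the usual 1 bpp layout (AND rows first, XOR rows second, most significant bit leftmost) and hypothetical names (`CursorLayers`, `split_monochrome`); `height` is the visible cursor height, so `data` holds `2 * height` rows, and the BGRA encoding of the XOR layer is illustrative rather than Apollo's exact one:

```rust
pub type Bgra = [u8; 4];

/// One image per blend pass; fully transparent pixels are [0, 0, 0, 0].
#[derive(Debug, PartialEq, Eq)]
pub struct CursorLayers {
    pub alpha: Vec<Bgra>,
    pub xor: Vec<Bgra>,
}

/// Split a monochrome pointer shape into an alpha layer and an XOR layer.
/// Returns None if the buffer is too small for the stated geometry.
pub fn split_monochrome(
    width: usize,
    height: usize,
    pitch: usize,
    data: &[u8],
) -> Option<CursorLayers> {
    if pitch * 8 < width || data.len() < pitch * height * 2 {
        return None;
    }
    // Read one mask bit; row indices >= height address the XOR mask.
    let bit = |row: usize, x: usize| (data[row * pitch + x / 8] >> (7 - x % 8)) & 1 == 1;
    let mut layers = CursorLayers {
        alpha: vec![[0; 4]; width * height],
        xor: vec![[0; 4]; width * height],
    };
    for y in 0..height {
        for x in 0..width {
            let i = y * width + x;
            match (bit(y, x), bit(height + y, x)) {
                (false, false) => layers.alpha[i] = [0, 0, 0, 255], // opaque black
                (false, true) => layers.alpha[i] = [255, 255, 255, 255], // opaque white
                (true, false) => {} // transparent: screen shows through
                (true, true) => layers.xor[i] = [255, 255, 255, 255], // invert screen
            }
        }
    }
    Some(layers)
}
```

A compositor would then run the alpha pass over `alpha` and the invert pass over `xor`, skipping a layer whose pixels are all transparent.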
#### 14. Map absolute mouse through the real virtual-desktop / output rect, not a blind 0..65535 normalize
*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1892,11 +1848,7 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In the pinger thread in sudovda.rs (around 485-494), track a consecutive-failure counter; after N (3) failures set a shared AtomicBool 'driver_dead' on SudoVdaDisplay/keepalive and stop pinging. Surface it so the session loop in punktfunk1.rs treats a dead virtual display like ACCESS_LOST and re-opens (re-run open_device + re-create). Add a DriverStatus enum mirroring Apollo's DRIVER_STATUS.
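
The consecutive-failure counter and the shared flag from this proposal fit in a few lines. A minimal sketch, with a hypothetical `PingWatch` type; the actual ping IOCTL and the session-loop reaction to `driver_dead` are not shown:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Tracks consecutive ping failures for the pinger thread.
pub struct PingWatch {
    consecutive_failures: u32,
    threshold: u32,
    driver_dead: Arc<AtomicBool>,
}

impl PingWatch {
    /// Returns the watcher plus the shared flag the session loop would poll.
    pub fn new(threshold: u32) -> (Self, Arc<AtomicBool>) {
        let flag = Arc::new(AtomicBool::new(false));
        let watch = PingWatch {
            consecutive_failures: 0,
            threshold,
            driver_dead: Arc::clone(&flag),
        };
        (watch, flag)
    }

    /// Record one ping result. Returns true while the pinger should keep
    /// going and false once the driver has been declared dead.
    pub fn record(&mut self, ping_ok: bool) -> bool {
        if ping_ok {
            self.consecutive_failures = 0;
            return true;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.driver_dead.store(true, Ordering::Release);
            return false;
        }
        true
    }
}
```

A successful ping resets the counter, so only an unbroken run of failures trips the flag.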
#### 16. Add SET_RENDER_ADAPTER (IOCTL 0x802) to bind the IDD render GPU to the capture/encode GPU
*Area:* `win:virtual-display-sudovda` · *Windows-host:* yes · *Severity:* high · *Effort:* medium **SHIPPED (2026-06-20)** — SET_RENDER_ADAPTER (IOCTL 0x802) now binds the IDD render GPU to the capture/encode adapter on hybrid/multi-GPU boxes.
- **Apollo does:** setRenderAdapterByName enumerates DXGI adapters, matches desc.Description, and issues SET_RENDER_ADAPTER with that adapter's LUID before every create (virtual_display.cpp:624-654, sudovda.h:109-128, called at main.cpp:369-371 and process.cpp:250-252).
- **punktfunk gap:** punktfunk defines no IOCTL_SET_RENDER_ADAPTER and never binds the render adapter (sudovda.rs:47-54). On a hybrid/multi-GPU box the IDD may render on the iGPU while NVENC + Desktop Duplication run on the dGPU, breaking or slowing zero-copy.
- **Proposal:** Add `const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);` and a `#[repr(C)] struct SetRenderAdapterParams { luid: LUID }` in sudovda.rs. Before ADD in create() (sudovda.rs:448), enumerate DXGI adapters (reuse capture/dxgi.rs adapter-by-LUID/name helpers) to match the configured/encoder GPU and issue the IOCTL so the IDD's AddOut LUID matches the capture device's adapter.
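As a sketch of the two declarations named in the proposal: the snippet below assumes the existing `ctl` helper expands to the Windows `CTL_CODE` layout with `FILE_DEVICE_UNKNOWN`, `METHOD_BUFFERED` and `FILE_ANY_ACCESS`. That is an assumption about sudovda.rs, not something the notes confirm, and `Luid` is a local stand-in for the Windows `LUID` type so the example compiles without the `windows` crate.

```rust
// Assumed CTL_CODE inputs; the real `ctl` helper in sudovda.rs may differ.
const FILE_DEVICE_UNKNOWN: u32 = 0x22;
const METHOD_BUFFERED: u32 = 0;
const FILE_ANY_ACCESS: u32 = 0;

/// CTL_CODE(DeviceType, Function, Method, Access) as defined by the Windows DDK:
/// (DeviceType << 16) | (Access << 14) | (Function << 2) | Method.
const fn ctl(function: u32) -> u32 {
    (FILE_DEVICE_UNKNOWN << 16) | (FILE_ANY_ACCESS << 14) | (function << 2) | METHOD_BUFFERED
}

pub const IOCTL_SET_RENDER_ADAPTER: u32 = ctl(0x802);

/// Stand-in for the Windows LUID (LowPart: DWORD, HighPart: LONG).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Luid {
    pub low_part: u32,
    pub high_part: i32,
}

/// Input buffer for the IOCTL: just the adapter LUID, as in the proposal.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SetRenderAdapterParams {
    pub luid: Luid,
}
```

Under those assumptions the control code evaluates to `0x0022_2008`, and the parameter block is 8 bytes, the size of a single LUID.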
#### 17. Add streaming_will_start/stop session-level latency tuning on Windows
*Area:* `win:critic` · *Windows-host:* yes · *Severity:* high · *Effort:* medium
@@ -1913,43 +1865,16 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** On the capture thread, register an IMMNotificationClient (or poll GetDefaultAudioEndpoint) and treat a default-render change OR a device-invalidated error as a re-open: tear down the IAudioClient and re-acquire the new default endpoint in-place, like the Linux PipeWire reconnect discipline. Lives entirely in audio/wasapi_cap.rs
#### 19. Implement true reference-frame invalidation with a multi-ref DPB instead of always-full-IDR
*Area:* `cmp:video-encode` · *Windows-host:* yes · *Severity:* high · *Effort:* large **SHIPPED (2026-06-20)** — Encoder::invalidate_ref_frames added (Windows NVENC multi-ref DPB + nvEncInvalidateRefFrames; GameStream 0x0301 routes to invalidate); Linux degrades to IDR; NVENC impl CI-pending. See also #22.
- **Apollo does:** nvenc_base.cpp:268-281 sets maxNumRefFrames/maxNumRefFramesInDPB to 5 (HEVC/H264) and L0 to 1, enabling a deep DPB; invalidate_ref_frames (nvenc_base.cpp:574-610) calls nvEncInvalidateRefFrames per lost frame range, dedupes already-done ranges, falls back to IDR only when the range exceeds the DPB, and sets rfi_needs_confirmation so the next encoded frame is marked as the RFI fulfilment (nvenc_base.cpp:551-557, 490-491).
- **punktfunk gap:** crates/punktfunk-host/src/encode/nvenc.rs leaves ref frames at the preset default and exposes only request_keyframe (nvenc.rs:465-467) which always emits a full FORCE_IDR. gamestream/control.rs:163-177 collapses both RFI (0x0301) and request-IDR (0x0302) into the same full-IDR. A full IDR at high resolution is the multi-millisecond spike punktfunk's own infinite-GOP comments call out (linux.rs:197-201) — true RFI avoids it for recoverable loss.
- **Proposal:** Extend the Encoder trait with an invalidate_ref_frames(first,last) method (default: fall back to request_keyframe). In the Windows NVENC config set maxNumRefFramesInDPB/maxNumRefFrames>1 (and numRefL0=1) gated on SUPPORT_MULTIPLE_REF_FRAMES, implement invalidate_ref_frames via nvEncInvalidateRefFrames with the dedupe + IDR-fallback logic, and route control.rs 0x0301 to invalidate (carrying the lost frame range) while 0x0302 stays full-IDR.
#### 20. In-binary Windows service install + interactive-session launch
*Area:* `cmp:config-management` · *Windows-host:* yes · *Severity:* high · *Effort:* large **SHIPPED (2026-06-20)** — in-binary punktfunk-host service subcommand installs/launches the host into the interactive session (PsExec chain dropped). See also #24.
- **Apollo does:** config.cpp:1490-1534 handles the Windows shortcut/service launch dance inside the binary: --shortcut/--shortcut-admin handling, ShellExecuteExW(runas, --shortcut-admin) to self-elevate when the service isn't running, waits for the service, wait_for_ui_ready(), launch_ui(), then returns 1 so the foreground process does NOT also start a stream host. This is Sunshine/Apollo's mature service<->UI two-process split that makes one-click launch work.
- **punktfunk gap:** punktfunk has no service-install / self-elevation / interactive-session bring-up in the binary. Deployment is documented as a manual chain of external scripts — scheduled task -> PsExec64 -i 1 -> launch.vbs -> host-run.cmd (design/windows-host.md:77-96) — fragile and operator-hostile. main.rs has no install/service subcommand.
- **Proposal:** Add `punktfunk-host install`/`uninstall`/`service` subcommands (Windows-gated) that register a service or an Interactive/Highest scheduled task to launch the host in Session 1 (the documented requirement for DXGI duplication + SendInput), and the self-elevate-if-not-running shortcut path. Reuse the existing capture/wgc_relay CreateProcessAsUserW machinery already in the crate. This codifies the script chain into the binary without touching the per-frame path or core.
#### 21. Composite the moved cursor onto a clean copy even when DDA returns no new desktop frame
*Area:* `win:cursor-compositing` · *Windows-host:* yes · *Severity:* high · *Effort:* large · **⊘ ALREADY-HANDLED (2026-06-16)** **NOT-A-BUG (2026-06-16)** — premise incorrect: DXGI returns S_OK for pointer-only updates (LastMouseUpdateTime != 0, LastPresentTime == 0) and acquire() recomposites the cursor at its new position; last_present is repeated only on a genuine WAIT_TIMEOUT. Only an optional perf micro-opt remains (Apollo re-blends just the cursor rect to avoid a full CopyResource per pointer update).
> **Resolution — not a bug for punktfunk.** The gap below assumes a cursor moving over a static screen
> produces `AcquireNextFrame` **timeouts**. It does not: DXGI returns **S_OK for pointer-only updates**
> (`FrameInfo.LastMouseUpdateTime != 0`, `LastPresentTime == 0`), with the resource holding the
> (unchanged) desktop. `acquire()` always re-runs `present_acquired` on S_OK (`dxgi.rs:1407,1474`), which
> re-copies the desktop and recomposites the cursor at its new position. `last_present` is repeated only
> on a genuine `WAIT_TIMEOUT` (nothing changed) or a mid-rebuild gap — correct. The agent that raised this
> didn't account for DDA's pointer-update S_OK semantics, and the run was killed before the verify phase
> reached it. The only real delta from Apollo is a **perf** micro-opt (Apollo retains a clean copy and
> re-blends just the cursor rect, avoiding a full ~29 MB `CopyResource` per pointer update) — deferred as
> optional, pending evidence of GPU-copy pressure.
- **Apollo does:** Apollo treats a mouse-only update as a real update (display_vram.cpp:1162-1168) and keeps an intermediate D3D surface of the last desktop frame so it can copy surface->fresh image and re-blend the cursor at its new position with no new DDA frame (last_frame_variant state machine, display_vram.cpp:1239-1306).
- **punktfunk gap (as originally filed — see Resolution above; premise incorrect):** punktfunk only composites on a fresh AcquireNextFrame (dxgi.rs:1477); on timeout it repeats last_present (dxgi.rs:1547-1561) which has the OLD cursor position baked in, so a cursor moving over a static screen stutters/lags.
- **Proposal (superseded; only the perf variant remains):** Keep a clean intermediate copy of the last desktop frame (an extra DEFAULT texture). In acquire (dxgi.rs:1341), when AcquireNextFrame times out but update_cursor saw a position change (LastMouseUpdateTime changed) and the cursor is visible, copy the clean intermediate into gpu_copy and re-run composite_cursor_gpu, then return that as a fresh frame instead of repeating last_present.
#### 22. Add real reference-frame invalidation (RFI) instead of always forcing IDR
*Area:* `win:nvenc-d3d11` · *Windows-host:* yes · *Severity:* high · *Effort:* large **SHIPPED (2026-06-20)** — real RFI via nvEncInvalidateRefFrames with dedup + IDR-on-overflow; control plane 0x0301 routes to invalidate. NVENC impl CI-pending. See #19.
- **Apollo does:** Apollo keeps a deep DPB (maxNumRefFrames 5/HEVC, 8/AV1) but pins L0 ref to 1 (nvenc_base.cpp:268-281), then on a loss event calls nvEncInvalidateRefFrames per-frame over the requested range, dedups against the last range, expands to the last-encoded index, escalates to IDR only if the range exceeds DPB depth, and tags the next frame rfi_needs_confirmation (nvenc_base.cpp:574-610). This lets the encoder re-reference an older still-valid frame rather than emit a multi-millisecond keyframe.
- **punktfunk gap:** punktfunk has NO invalidate path — request_keyframe() always forces a full IDR (nvenc.rs:437-442,465-467); punktfunk1.rs:2153 / gamestream/stream.rs:336 wire 'RFI' straight to a keyframe. Every recovery is a costly IDR spike, defeating the infinite-GOP design.
- **Proposal:** In nvenc.rs add `maxNumRefFramesInDPB`/`numRefL0=1` to the HEVC/H264/AV1 config in init_session, gate on a new caps query NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION, track last_encoded_frame_index + last_rfi_range, and add an `invalidate_ref_frames(first,last)` method on the Encoder trait (encode.rs:41-51) that calls API.invalidate_ref_frames per index with Apollo's dedup/escalate-to-IDR-on-overflow logic. Wire punktfunk1.rs RFI requests to it, falling back to request_keyframe() only when it returns false.
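The dedup and escalate-to-IDR decision can be separated from the NVENC call itself. The sketch below models only that decision, following the description of Apollo's behaviour in the notes above rather than its actual code; `RfiTracker`, `RfiAction` and `plan` are hypothetical names, and the caller is assumed to issue the per-index invalidation or the IDR based on the returned action.

```rust
/// What the encoder should do for one client RFI request.
#[derive(Debug, PartialEq, Eq)]
pub enum RfiAction {
    /// The range was already covered by the previous invalidation.
    AlreadyDone,
    /// Invalidate every frame index in `first..=last`.
    Invalidate { first: u64, last: u64 },
    /// The range does not fit in the DPB (or is malformed): emit a full IDR.
    ForceIdr,
}

/// Hypothetical bookkeeping for reference-frame invalidation.
pub struct RfiTracker {
    dpb_depth: u64,
    last_encoded_frame_index: u64,
    last_rfi_range: Option<(u64, u64)>,
}

impl RfiTracker {
    pub fn new(dpb_depth: u64) -> Self {
        Self { dpb_depth, last_encoded_frame_index: 0, last_rfi_range: None }
    }

    /// Call after each successfully encoded frame.
    pub fn on_frame_encoded(&mut self, index: u64) {
        self.last_encoded_frame_index = index;
    }

    /// Decide how to serve a request to invalidate frames `first..=last`.
    pub fn plan(&mut self, first: u64, last: u64) -> RfiAction {
        if first > last {
            return RfiAction::ForceIdr;
        }
        // Dedup: a repeat of (or subset of) the last invalidated range is a no-op.
        if let Some((done_first, done_last)) = self.last_rfi_range {
            if first >= done_first && last <= done_last {
                return RfiAction::AlreadyDone;
            }
        }
        // Expand to the last encoded index: frames encoded after the loss may
        // reference the damaged ones, so they are invalidated as well.
        let last = last.max(self.last_encoded_frame_index);
        if last - first + 1 > self.dpb_depth {
            return RfiAction::ForceIdr;
        }
        self.last_rfi_range = Some((first, last));
        RfiAction::Invalidate { first, last }
    }
}
```

With a DPB depth of 5 and frame 12 as the last encoded frame, a request for frames 10..11 yields `Invalidate { first: 10, last: 12 }`, an identical repeat yields `AlreadyDone`, and a request starting at frame 2 escalates to `ForceIdr`.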
#### 23. Add a DS4 (DualShock4) ViGEm target on Windows with type auto-selection, motion, touchpad, battery and timestamp pump
*Area:* `win:input-sendinput-vigem` · *Windows-host:* yes · *Severity:* high · *Effort:* large
@@ -1959,38 +1884,16 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Proposal:** In gamepad_windows.rs, add a DS4Wired branch via vigem_client::DualShock4Wired with a union/enum PadEntry. Resolve type from the decoded Arrival (precedence: explicit env/client choice > PS type > motion/touchpad caps > X360), mirroring the existing GAMEPAD-preference negotiation. Port Apollo's wTimestamp pump (5.333us units, re-send every 100ms), motion calibration constants (:157-170), and the touchpad byte packing (:1604-1608). Surface the LED color via the existing 0xCA/feedback plane.
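The timestamp arithmetic for the pump is small enough to show in isolation. This sketch assumes the report's timestamp field is a wrapping 16-bit counter, which is an assumption about the ViGEm report layout; `ds4_timestamp` and `DS4_RESEND_INTERVAL` are hypothetical names. One tick is 5.333 us, that is 16/3 us, so ticks = microseconds * 3 / 16.

```rust
use std::time::Duration;

/// Re-send period quoted in the proposal.
pub const DS4_RESEND_INTERVAL: Duration = Duration::from_millis(100);

/// Convert time elapsed since the pad was created into DS4 timestamp ticks.
/// One tick is 16/3 us (5.333 us); the result wraps at 16 bits.
pub fn ds4_timestamp(elapsed: Duration) -> u16 {
    let ticks = elapsed.as_micros() * 3 / 16;
    // Truncating cast: keeps the low 16 bits, i.e. the counter wraps around.
    ticks as u16
}
```

At the 100 ms re-send interval the counter advances by 18750 ticks per report, and it wraps roughly every 349.5 ms.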
#### 24. Replace the PsExec scheduled-task launch with a real Windows service that relaunches the host on session change
*Area:* `win:system-secure-desktop` · *Windows-host:* yes · *Severity:* high · *Effort:* large **SHIPPED (2026-06-20)** — real Windows service relaunches the host on console-session change (SERVICE_ACCEPT_SESSIONCHANGE); PsExec scheduled-task dropped. See also #20.
- **Apollo does:** SunshineSvc.exe runs as LocalSystem in Session 0, loops on WTSGetActiveConsoleSessionId, clones its own token with DuplicateTokenEx(TokenPrimary)+SetTokenInformation(TokenSessionId) and CreateProcessAsUserW into winsta0\\default inside a per-session job object (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE|BREAKAWAY_OK); opts into SERVICE_ACCEPT_SESSIONCHANGE and on WTS_CONSOLE_CONNECT terminates+relaunches the host in the new session (tools/sunshinesvc.cpp:95,111,239,256,267,276-294)
- **punktfunk gap:** punktfunk has no Windows service; launch is a PsExec64 -s -i 1 scheduled task hard-coded to session 1 (design/windows-host.md:78-84), with the SERVICE_CONTROL_SESSIONCHANGE relaunch listed as unimplemented step 6 (design/windows-secure-desktop.md:89). Launch scripts are not even in the repo.
- **Proposal:** Add a small Rust service binary (new crate or punktfunk-host `service` subcommand) using windows::Win32::System::Services (RegisterServiceCtrlHandlerEx, StartServiceCtrlDispatcher) that mirrors sunshinesvc.cpp: WTSGetActiveConsoleSessionId -> DuplicateTokenEx+SetTokenInformation(TokenSessionId) -> CreateProcessAsUserW(lpDesktop=winsta0\\default) into a kill-on-close job, accept SERVICE_ACCEPT_SESSIONCHANGE, and relaunch the host on a genuine console-session change. Ship an installer and drop the PsExec dependency.
#### 25. Elevate capture/encode/send thread priority on the host hot path
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* small · **✓ verified** **SHIPPED (2026-06-20)** — hot-path capture/encode/send threads now elevate priority (Windows SetThreadPriority HIGHEST for send / ABOVE_NORMAL for capture+encode; best-effort niceness on Linux, no-ops without privilege), per the verified plan.
- **Apollo does:** Apollo raises the transmit/capture thread priority: platf::adjust_thread_priority(thread_priority_e::critical) in the video broadcast thread (stream.cpp:1122) and ::high in the audio/control paths (stream.cpp:1333, 1672); the Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) (platform/windows/misc.cpp:1081-1102).
- **punktfunk gap:** punktfunk names its hot-path threads (stream.rs:44 video, stream.rs:204 send, punktfunk1.rs:1804 send_loop, punktfunk1.rs:2017/2328 send threads) but never sets a scheduling priority — every host capture/encode/send thread runs at default priority. Only the macOS client elevates (client.rs:169). On a loaded Windows desktop the encode/send thread can be preempted, adding jitter the frame-pacing logic can't recover.
- **Proposal:** Add a cross-platform raise_current_thread_priority() helper (SetThreadPriority on Windows, optionally AvSetMmThreadCharacteristics for MMCSS; sched/nice on Linux) and call it at the top of the GameStream send thread, the native send_loop, and the encode thread. Cheap, high-value jitter reduction, no design impact.
- **Verify verdict:** `confirmed_gap` — punktfunk: NO thread-priority call exists anywhere in the workspace (grep for SetThreadPriority/sched_setscheduler/setpriority/AvSetMm/THREAD_PRIORITY across crates/ returned zero hits). Hot-path threads are named-only at default priority: GameStream video thread crates/punktfunk-host/src/gamestream/stream.rs:44-53 (thread::Builder name "punktfunk-video") and GameStream send thread stream.rs:204-206 ("punktfunk-send"); native send threads crates/punktfunk-host/src/punktfunk1.rs:2017-2033 and punktfunk1.rs:2328-2333 ("punktfunk-send"), and the native send_loop at punktfunk1.rs:1804 — all spawned with no priority set. The encode work shares the capture thread (punktfunk1.rs:2011-2013 "this thread captures+encodes ... and hands each AU to a dedicated send thread"), also default priority. The windows crate is ALREADY a dependency with the needed feature: crates/punktfunk-host/Cargo.toml:141 enables "Win32_System_Threading" (SetThreadPriority/GetCurrentThread available, zero new deps). Apollo: confirmed it raises priority on every hot-path thread — capture src/video.cpp:1295 (critical), encode src/video.cpp:2359 and 2396 (high), video send src/stream.cpp:1333 (high), control src/stream.cpp:1122 (critical), audio src/stream.cpp:1672 + src/audio.cpp:94/208. Windows impl is SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST/ABOVE_NORMAL) at src/platform/windows/misc.cpp:1081-1102, plus DwmEnableMMCSS(true) (misc.cpp:1139) and AvSetMmThreadCharacteristics("Pro Audio") for the audio-capture thread (src/platform/windows/audio.cpp:540). CRITICAL NUANCE: Apollo's adjust_thread_priority is effectively Windows-only — src/platform/linux/misc.cpp:362-364 is "// Unimplemented" and src/platform/macos/misc.mm:218-220 is "// Unimplemented".
- **Refined:** Add a small cross-platform helper raise_current_thread_priority(level) and call it at the TOP of each hot-path thread body (so the calling thread itself is elevated): the GameStream send thread (stream.rs:206), the GameStream video/capture+encode thread (stream.rs:46), the native send threads (punktfunk1.rs:2021 and punktfunk1.rs:2331 closures, before/at the start of send_loop), and the native capture+encode thread (the punktfunk1.rs run body that owns capture+encode, punktfunk1.rs ~2011+). Windows: SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST) for the send/network thread (latency-critical, matches Apollo's video-send=high but the punktfunk send thread also does FEC+seal so HIGHEST is defensible) and THREAD_PRIORITY_ABOVE_NORMAL for capture+encode — using the windows crate already on Cargo.toml:141, no new deps. Optionally associate the network/encode thread with MMCSS via AvSetMmThreadCharacteristics (needs the Win32_System_Threading "Games"/"Pro Audio" task + AVRT feature) for higher-fidelity scheduling under DWM load; treat as a follow-up, not the first cut. Linux (net-new beyond Apollo, since Apollo leaves it unimplemented and punktfunk is Linux-first): best-effort nice(-10)/setpriority on the send+encode threads — note SCHED_FIFO/RR requires CAP_SYS_NICE/rtprio limits the host won't have by default, so do NOT default to realtime; a plain niceness bump is the safe portable choice and silently no-ops without privilege. Make every priority call best-effort (log-and-continue on failure, exactly as Apollo does at misc.cpp:1104). No async, no per-frame allocation, no ABI surface change — purely thread-setup, so no design invariant is touched.
#### 43. Socket QoS / DSCP marking on the media sockets
*Area:* `cmp:protocol-streaming` · *Windows-host:* yes · *Severity:* medium · *Effort:* medium · **✓ verified** **SHIPPED (2026-06-20)** — punktfunk_core::transport::qos set_media_qos marks the native + GameStream media sockets (DSCP CS5 video / CS6 audio via IP_TOS + Linux SO_PRIORITY 5/6, opt-in PUNKTFUNK_DSCP=1). Windows caveat: plain IP_TOS is a no-op on the wire without a qWAVE policy — porting Apollo's qWAVE path (QOSAddSocketToFlow) remains a documented follow-up.
- **Apollo does:** Apollo tags video and audio sockets for prioritized delivery: enable_socket_qos(...qos_data_type_e::video...) and (...audio...) called per session (stream.cpp:1917, stream.cpp:1938); the Windows impl uses qWAVE QOSCreateHandle/QOSAddSocketToFlow with DSCP tagging (platform/windows/misc.cpp:1616-1652), with Linux/macOS equivalents.
- **punktfunk gap:** punktfunk sets NO QoS/DSCP anywhere — grep for qos/DSCP/IP_TOS across crates/punktfunk-host and crates/punktfunk-core finds only the x-nv-vqos ANNOUNCE keys (rtsp.rs:278) and a macOS *client* pthread QoS (client.rs:169). Neither the GameStream sockets (stream.rs:66 bind, audio) nor the native data socket (transport/udp.rs) request link-layer/router priority.
- **Proposal:** Add a small per-OS helper to mark the video/audio/data UDP sockets: DSCP EF/AF41 via IP_TOS/IPV6_TCLASS on Linux/macOS, qWAVE QOSAddSocketToFlow on Windows (gated behind an env/config opt-in). Wire it into stream.rs socket setup and the native transport socket creation. Directly improves latency under contended Wi-Fi / shared uplink.
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK — gap is real on every media socket. Native data plane crates/punktfunk-core/src/transport/udp.rs:359-365 (UdpTransport::connect) and :374-414 (connect_via_punch) grow SO_SNDBUF/SO_RCVBUF (:431-447 grow_buffers via socket2::SockRef) and set GSO/USO, but never set IP_TOS/IPV6_TCLASS/SO_PRIORITY/qWAVE. GameStream sockets are bare std UdpSocket with no QoS: video crates/punktfunk-host/src/gamestream/stream.rs:66, audio audio.rs:305, control control.rs:36. RTSP does NOT parse the GameStream qosTrafficType keys at all (grep qosTrafficType in crates/punktfunk-host → exit 1), and rtsp.rs only reads x-nv-vqos bitrate/fec/codec (rtsp.rs:278). The only QoS in the tree is a macOS *client* pthread QoS-class (core/src/client.rs:156-169) — unrelated to link-layer marking. socket2 is already a punktfunk-core dep (Cargo.toml:34), so DSCP via SockRef::set_tos is trivial to add. APOLLO — confirmed it does exactly this, on by default. Per-session calls: src/stream.cpp:1917 (video) and :1938 (audio) → platf::enable_socket_qos(..., videoQosType/audioQosType != 0). Those flags come from RTSP src/rtsp.cpp:1005-1006 and are DEFAULTED non-zero at src/rtsp.cpp:982-983 (x-nv-vqos qosTrafficType="5", x-nv-aqos="4"), so QoS is on for stock Moonlight. Linux impl: src/platform/linux/misc.cpp:797-851 sets IP_TOS/IPV6_TCLASS (DSCP 40=CS5 video, 48=CS6 audio, shifted <<2) plus SO_PRIORITY 5/6. Windows impl: src/platform/windows/misc.cpp:1616-1722 dynamically loads qwave.dll and uses QOSCreateHandle/QOSAddSocketToFlow with QOSTrafficTypeAudioVideo/Voice — and crucially returns nullptr (no-op) unless dscp_tagging is set (:1622-1625). macOS: src/platform/macos/misc.mm:446.
- **Refined:** Add a per-OS set_media_qos(socket, kind) helper. Linux/macOS: use the already-present socket2 — SockRef::set_tos(AF41<<2) for IPv4 / set_tclass_v6 for IPv6, plus SO_PRIORITY on Linux (video=5, audio=6, the max without CAP_NET_ADMIN; set AFTER TOS since TOS resets it — Apollo linux/misc.cpp:841-845). Wire it into UdpTransport::connect / connect_via_punch (the native punktfunk/1 data plane — the primary, highest-value target) behind an opt-in env (PUNKTFUNK_DSCP=1) and optionally a Config field, plus the GameStream stream.rs:66 / audio.rs:305 / control.rs:36 sockets. IMPORTANT Windows-host caveat (this is the user's focus and where the naive version fails): on Windows, plain IP_TOS setsockopt is silently stripped by the OS unless a registry/group-policy QoS policy ('Do not use NLA') is configured — which is exactly why Apollo uses qWAVE (QOSAddSocketToFlow) instead. So a one-line socket2 set_tos does NOT tag on the wire on Windows. To actually deliver value on the Windows host, port Apollo's qWAVE path (runtime LoadLibraryExA qwave.dll, QOSCreateHandle once, QOSAddSocketToFlow per socket with QOSTrafficTypeAudioVideo/Voice) including the dual-stack v4-mapped connect() workaround (windows/misc.cpp:1675-1700) — note our data socket is already connect()ed (udp.rs:361), which sidesteps most of that hack. Keep RAII teardown (QOSRemoveSocketFromFlow on drop) like Apollo's qos_t/deinit_t. This is purely socket-setup, off the per-frame path, no core C-ABI change, no async — fully compatible with all three design invariants.
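The marking values themselves reduce to a small mapping. The sketch below uses the DSCP codepoints 40 and 48 and the SO_PRIORITY values 5 and 6 quoted in the verify note, which the SHIPPED note labels CS5 and CS6 (AF41 would be codepoint 34). `MediaKind`, `dscp`, `tos_byte`, `so_priority` and `dscp_enabled` are hypothetical names, and applying the byte to a socket (for example through socket2's `set_tos`) is deliberately left out so the example has no external dependencies.

```rust
/// Which media stream a socket carries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MediaKind {
    Video,
    Audio,
}

/// DSCP codepoint per stream: 40 (CS5) for video, 48 (CS6) for audio.
pub const fn dscp(kind: MediaKind) -> u8 {
    match kind {
        MediaKind::Video => 40,
        MediaKind::Audio => 48,
    }
}

/// Value for IP_TOS / IPV6_TCLASS: DSCP occupies the upper six bits,
/// so the codepoint is shifted left by two (the low two bits are ECN).
pub const fn tos_byte(kind: MediaKind) -> u8 {
    dscp(kind) << 2
}

/// Linux SO_PRIORITY per stream (to be set after the TOS byte, which resets it).
pub const fn so_priority(kind: MediaKind) -> u32 {
    match kind {
        MediaKind::Video => 5,
        MediaKind::Audio => 6,
    }
}

/// Opt-in gate: marking is enabled only when PUNKTFUNK_DSCP is exactly "1".
pub fn dscp_enabled(env_value: Option<&str>) -> bool {
    env_value == Some("1")
}
```

A caller would pass `std::env::var("PUNKTFUNK_DSCP").ok().as_deref()` to `dscp_enabled`. On Windows this byte alone does not reach the wire without a QoS policy, as the caveat above explains.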
#### 90. Bitrate-derived rate-control pacing (vs frame-interval-only)
*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* medium · *Effort:* medium · **✓ verified** **REJECTED / OBSOLETE (2026-06-20)** — proposal premise is false: Apollo paces to a hardcoded ~80%-of-1Gbps FIXED link ceiling (stream.cpp:1464), NOT the negotiated bitrate, and punktfunk is pixel-rate-bound by design (VBR/IDR spikes legitimately exceed average bitrate). Existing frame-interval + burst-cap pacing already covers the cited microburst risk; defer unless a measured rate-limited-link regression appears. (If anything, port the FIXED link-ceiling concept via an env knob like PUNKTFUNK_PACE_BURST_KB, not bitrate-derived pacing.)
- **Apollo does:** Apollo paces each frame's packets at the *negotiated bitrate*: ratecontrol_packets_in_1ms = giga*80/100/1000/blocksize/8 (stream.cpp:1464) and sleeps the send loop to that per-millisecond budget across the frame (stream.cpp:1578-1627), so the sender shapes to the link's allotted rate, not just the frame deadline.
- **punktfunk gap:** Both punktfunk send pacers spread purely over the FRAME INTERVAL: the GameStream sender uses budget = frame_interval * 0.75 (stream.rs:209) and the native paced_submit uses budget to next frame's deadline * 0.9 (punktfunk1.rs:1752) — neither derives a packets-per-ms budget from cfg.bitrate_kbps (the bitrate is only used to open NVENC, stream.rs:275). A spiky IDR or VBR overshoot can still microburst above the negotiated rate within its frame window.
- **Proposal:** Compute a bitrate-derived per-millisecond send budget (like Apollo's ratecontrol_packets_in_1ms) from the negotiated bitrate and pace overflow to THAT rate inside paced_submit / spawn_sender, taking the min of the frame-interval budget and the bitrate budget. Smooths VBR bursts on rate-limited links without breaking the existing microburst fast-path.
- **Verify verdict:** `partial` — PUNKTFUNK gap is real: both pacers spread over the FRAME INTERVAL only, never the bitrate. GameStream sender: `let budget = frame_interval.mul_f32(0.75)` (crates/punktfunk-host/src/gamestream/stream.rs:209). Native paced_submit: `let budget = deadline.checked_duration_since(pace_start)...mul_f32(0.9)` (crates/punktfunk-host/src/punktfunk1.rs:1752-1755) where deadline = `next += interval` (punktfunk1.rs:2162) and `interval = Duration::from_secs_f64(1.0 / effective_hz...)` (punktfunk1.rs:2357). bitrate_kbps only configures NVENC (stream.rs:275; punktfunk1.rs:2306, 2694) and is never fed to the pacer. So far the gap claim holds. BUT the Apollo characterization in the proposal is FACTUALLY WRONG: Apollo's `size_t ratecontrol_packets_in_1ms = std::giga::num * 80 / 100 / 1000 / blocksize / 8;` (/home/enricobuehler/Apollo/src/stream.cpp:1464) is a HARDCODED 80% of 1 Gigabit/sec — a fixed constant. grep across stream.cpp shows the negotiated/session bitrate never enters this formula (only std::giga::num, blocksize, and the 80/100 constant appear at lines 1464/1578-1582/1625-1627). Apollo paces to a FIXED ~800 Mbps link ceiling regardless of negotiated bitrate; it is NOT "negotiated-bitrate pacing." punktfunk's own design notes deliberately reject clamping to negotiated bitrate: "The encoder is pixel-rate bound, not bitrate bound" (punktfunk1.rs:321) and the whole 1Gbps+ effort raised the ceiling (punktfunk1.rs:1617-1619, MAX_BITRATE_KBPS ~2 Gbps).
- **Refined:** Reject the proposal AS WRITTEN — its premise ("Apollo paces to the negotiated bitrate") is false; Apollo paces to a hardcoded 80%-of-1Gbps fixed link ceiling (stream.cpp:1464), and pacing to negotiated bitrate would actively regress punktfunk (VBR/IDR spikes legitimately exceed average bitrate, and punktfunk explicitly treats the encoder as pixel-rate-bound, not bitrate-bound — punktfunk1.rs:321). If anything is worth porting, it is the FIXED per-millisecond link-rate ceiling concept, not bitrate-derived pacing: optionally compute a fixed packets-per-ms budget from a configurable link-rate ceiling (default high, e.g. matching MAX_BITRATE_KBPS, env-overridable like PUNKTFUNK_PACE_BURST_KB) and take min(frame-interval budget, link-ceiling budget) inside paced_submit/spawn_sender — purely as a microburst smoother for rate-limited links, NOT tied to cfg.bitrate_kbps. Note punktfunk already has the microburst fast-path (burst_cap, punktfunk1.rs:2005-2009 / paced_submit:1734-1743) and frame-interval spreading, which together already address the "spiky IDR microburst" symptom the proposal cites. Recommend deferring unless a measured rate-limited-link regression appears; the current frame-interval + burst-cap pacing covers the cited risk.
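To make the distinction concrete, the two budgets can be computed side by side. The first function reproduces the integer arithmetic of the Apollo expression quoted in the verify note; the second mirrors the frame-interval budgets quoted for punktfunk. The function names and the 1024-byte block size used in the example are illustrative assumptions.

```rust
use std::time::Duration;

/// Apollo's fixed link ceiling as quoted above:
/// giga * 80 / 100 / 1000 / blocksize / 8, evaluated with integer division.
/// Note that no bitrate appears anywhere in the expression.
/// `blocksize` must be non-zero.
pub const fn link_ceiling_packets_per_ms(blocksize: u64) -> u64 {
    1_000_000_000u64 * 80 / 100 / 1000 / blocksize / 8
}

/// Frame-interval pacing budget: a fixed fraction of one frame period,
/// e.g. 0.75 for the GameStream sender or 0.9 for the native pacer.
pub fn frame_interval_budget(fps: f64, fraction: f32) -> Duration {
    Duration::from_secs_f64(1.0 / fps).mul_f32(fraction)
}
```

For a 1024-byte block the ceiling works out to 97 packets per millisecond whatever bitrate was negotiated, while a 60 Hz stream at the 0.75 fraction gets a spreading budget of about 12.5 ms per frame.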
#### 94. Consume the GameStream client loss-stats report
*Area:* `cmp:protocol-streaming` · *Windows-host:* no · *Severity:* low · *Effort:* small · **✓ verified**
@@ -2001,5 +1904,5 @@ GameStream `SO_SNDBUF`), **#8** (move GameStream input injection off the ENet se
- **Verify verdict:** `confirmed_gap` — PUNKTFUNK gap is real. crates/punktfunk-host/src/gamestream/control.rs:165-177 — after decrypt, the only inner-type dispatch is `if matches!(inner, 0x0301 | 0x0302 | 0x0305)` → force_idr; everything else falls through to gamepad::decode (returns None for non-controller) then input::decode, which at crates/punktfunk-host/src/gamestream/input.rs:35 returns empty unless `type == 0x0206`. So a loss-stats packet (`0x0201`) is silently dropped — `on_receive` has no branch for it. A broad grep across crates/ for loss-stats/last-good-frame/0x0201 found nothing (only DXGI's unrelated "last good frame" comment at capture/dxgi.rs:751). The native plane has only end-of-burst ProbeResult bandwidth/loss telemetry (crates/punktfunk-core/src/client.rs:436, abi.rs:1499) — a one-shot speed test, NOT continuous in-stream loss feedback. APOLLO confirms the claim: src/stream.cpp:41 `#define IDX_LOSS_STATS 3`, src/stream.cpp:61 maps it to wire type `0x0201`, and src/stream.cpp:943-957 reads `int32_t *stats` with stats[0]=count, stats[1]=time-window ms, stats[3]=lastGoodFrame (logged at BOOST verbose). Wire offset confirmed: the map callback receives `next_payload = plaintext.data()+4` (src/stream.cpp:1104), i.e. the body AFTER the 4-byte `[type][payloadLength]` header — so stats[0..] is at body offset 0. Note: Apollo only LOGS it; it does not yet drive adaptive FEC/bitrate off it either.
- **Refined:** Add one branch to control.rs `on_receive`: when the decrypted `pt` inner type (LE u16 at pt[0..2]) == 0x0201 and pt.len() >= 20, decode the body as four LE i32 — pt[4..8]=loss_count, pt[8..12]=time_window_ms, pt[16..20]=last_good_frame (mirroring Apollo's stats[0]/stats[1]/stats[3]; verify endianness against a real Moonlight capture — moonlight-common-c writes these as host-order/LE, and punktfunk already treats control inner fields as LE). Initially log at debug/trace and optionally surface via an AtomicU32 in AppState or the mgmt API so the web console can show client-observed loss. Keep it read-only first. Caveat for the backlog: this is a low-value telemetry hook, NOT adaptive control.
The actual lever (adaptive FEC % / bitrate de-rating) is a separate, larger piece of work that Apollo itself does not implement off this signal — do not over-scope. Place it next to the existing 0x0301/0x0302/0x0305 dispatch so the control hot path stays a single decrypt + cheap type match. windowsHost=false is correct: this is GameStream-plane, OS-independent, and the punktfunk/1 native plane is the higher-priority protocol — so prioritize accordingly.
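The decode step proposed in the refined item can be sketched as below. This is a minimal illustration, assuming the decrypted plaintext is a byte slice laid out as `[type: u16 LE][payloadLength: u16 LE][body]`. `LossStats`, `LOSS_STATS_TYPE`, and `decode_loss_stats` are hypothetical names, not existing punktfunk APIs, and the little-endian reading should still be checked against a real Moonlight capture.

```rust
/// Wire type of the GameStream loss-stats control packet (per the Apollo mapping cited above).
const LOSS_STATS_TYPE: u16 = 0x0201;

/// Hypothetical container for the three fields Apollo reads (stats[0], stats[1], stats[3]).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LossStats {
    pub loss_count: i32,
    pub time_window_ms: i32,
    pub last_good_frame: i32,
}

/// Read a little-endian i32 at `off`; the caller has already checked the bounds.
fn le_i32(pt: &[u8], off: usize) -> i32 {
    i32::from_le_bytes([pt[off], pt[off + 1], pt[off + 2], pt[off + 3]])
}

/// Returns `Some` only for a 0x0201 packet with a 4-byte header plus at least four i32 body fields.
pub fn decode_loss_stats(pt: &[u8]) -> Option<LossStats> {
    if pt.len() < 20 {
        return None;
    }
    let inner = u16::from_le_bytes([pt[0], pt[1]]);
    if inner != LOSS_STATS_TYPE {
        return None;
    }
    Some(LossStats {
        loss_count: le_i32(pt, 4),       // body offset 0
        time_window_ms: le_i32(pt, 8),   // body offset 4
        last_good_frame: le_i32(pt, 16), // body offset 12 (the third body field is skipped)
    })
}
```

A caller would match on the result next to the existing `0x0301 | 0x0302 | 0x0305` dispatch and only log it, which keeps the hook read-only as the item recommends.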
_(28 detailed; remaining 68 medium/low items are in the table above with citations available in Parts 2–3.)_ _(28 items had detail subsections — 16 shipped/obsolete ones are now collapsed to one-liners above, 12 still-open ones keep full citations; the remaining 68 medium/low items are in the table above with citations available in Parts 2–3.)_
+45 -115
@@ -1,126 +1,56 @@
--- ---
title: "Apple Stage-2 Presenter (handoff)" title: "Apple Stage-2 Presenter (handoff)"
description: "Implementation plan for the explicit VTDecompressionSession → CAMetalLayer presenter — hand-paced present + true decode→present (glass-to-glass) measurement. Written so a Mac agent can pick it up." description: "Design rationale + open items for the explicit VTDecompressionSession → CAMetalLayer presenter. Implementation shipped; this page is trimmed to the why + what's left."
--- ---
> **Status update:** the stage-2 presenter described here has since been **built and live-validated**, > **Status:** SHIPPED behind the opt-in `punktfunk.presenter` flag (`AVSampleBufferDisplayLayer`
> shipping behind an opt-in flag (`AVSampleBufferDisplayLayer` remains the default known-good path). > stage-1 remains the default known-good path). Live-validated ~11 ms p50 capture→present (commit
> This page is preserved as the implementation/handoff record for that work. > `7b10714`). Code: `clients/apple/Sources/PunktfunkKit/{Stage2Pipeline,MetalVideoPresenter,VideoDecoder,LatencyMeter}.swift`;
> Settings has a presenter picker (`DefaultsKey.presenter`, `SettingsView.swift`). This doc is trimmed
> to design rationale + open items — the shipped `.swift` code is the source of truth for the
> decode/present/measurement walkthrough.
The implementation plan for the **stage-2 Apple presenter**. The **stage-1** presenter feeds ## Why stage 2 (design rationale)
compressed HEVC straight into `AVSampleBufferDisplayLayer`, which hardware-decodes **and presents
internally with no per-frame callback** — so we can't stamp decode or present, and we can't hand-pace. The **stage-1** presenter feeds compressed HEVC straight into `AVSampleBufferDisplayLayer`, which
Stage-2 takes explicit control: decode with `VTDecompressionSession`, present decoded frames through a hardware-decodes **and presents internally with no per-frame callback** — so we can't stamp decode or
`CAMetalLayer` driven by a display link. Two wins: **~0.5 refresh off the present tail** (the biggest present, and we can't hand-pace. **Stage-2** takes explicit control: decode with
client latency term at 60 Hz) and **true decode→present / glass-to-glass** numbers. `VTDecompressionSession`, present decoded frames through a `CAMetalLayer` driven by a display link.
Two wins justify the extra machinery:
- **~0.5 refresh off the present tail** — the present tail is the biggest client latency term at 60 Hz;
display-link-driven present pops the newest-ready frame each vsync instead of letting the layer present
on its own internal schedule.
- **True decode→present / glass-to-glass measurement** — explicit decode-completion and present
timestamps make `capture→present` measurable (modulo the still-unmeasured host render→capture term).
All of this is **macOS/iOS/tvOS-only** — build + validate on a Mac (`swift build && swift test`, then All of this is **macOS/iOS/tvOS-only** — build + validate on a Mac (`swift build && swift test`, then
live against a Linux host). The host + connector side is already done: `PunktfunkConnection.clockOffsetNs` live against a Linux host). The host + connector side is already done:
(the connect-time skew offset, host minus client) is what makes the present timestamp cross-machine `PunktfunkConnection.clockOffsetNs` (the connect-time skew offset, host minus client) is what makes the
valid. See [Status](/docs/status) and roadmap §12. present timestamp cross-machine valid. `skewCorrected` stays false when `clockOffsetNs == 0` (old host)
— then the numbers are same-host-only.
## Where it plugs into the existing code ## Architecture pattern (worth recording)
| Existing (stage-1) | Stage-2 change | Async `VTDecompressionSession` callback → **1-slot newest-ready ring** → display-link-driven present:
|---|---|
| `StreamPump` pulls AUs → `AnnexB.sampleBuffer` → `layer.enqueue` (compressed) | A `Stage2Pump` (or a mode flag on `StreamPump`) feeds AUs to `VTDecompressionSessionDecodeFrame` instead |
| `StreamView`/`StreamViewIOS` host an `AVSampleBufferDisplayLayer` | Host a `CAMetalLayer` (+ a display link); keep the input-capture + HUD overlay unchanged |
| `AnnexB.formatDescription(fromIDR:)` builds the format desc, refreshed on every IDR | **Reused** — it's the `VTDecompressionSession`'s format description; recreate the session when it changes |
| `LatencyMeter` records capture→client-receipt at `onFrame` | Extend to record **decode-completion** and **present** stages (below) |
Keep stage-1 behind a `UserDefaults` flag (e.g. `punktfunk.presenter = "stage1" | "stage2"`) so a
regression can fall back — `AVSampleBufferDisplayLayer` is the known-good path.
## Decode: VTDecompressionSession
1. Create the session from the IDR's `CMVideoFormatDescription`
(`AnnexB.formatDescription(fromIDR:)`):
```
VTDecompressionSessionCreate(
allocator: nil,
formatDescription: fmt,
decoderSpecification: nil, // hardware by default; no need to force
imageBufferAttributes: [
kCVPixelBufferMetalCompatibilityKey: true,
kCVPixelBufferPixelFormatTypeKey:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, // 8-bit SDR; 10-bit (…10BiPlanar) for HDR later
],
outputCallback: <C-callback>,
decompressionSessionOut: &session)
```
2. Per AU: build the same `CMSampleBuffer` as stage-1 (`AnnexB.sampleBuffer(au:format:)`, PTS =
`au.ptsNs` @ 1e9 timescale) and submit:
```
VTDecompressionSessionDecodeFrame(session, sampleBuffer,
flags: ._EnableAsynchronousDecompression,
frameRefcon: <pts or a boxed context>, infoFlagsOut: nil)
```
3. The **output callback** delivers `(status, infoFlags, imageBuffer: CVImageBuffer?, presentationTimeStamp, …)`.
`presentationTimeStamp` is `au.ptsNs` (the host capture clock). **Stamp decode-completion here**
(`CLOCK_REALTIME` ns), retain the `CVPixelBuffer`, and push `{pts, pixelBuffer, decodedNs}` into a
small NSLock-guarded ring (the "ready" queue) the display link drains.
4. **IDR / mode change**: when `AnnexB.formatDescription` yields a new desc, check
`VTDecompressionSessionCanAcceptFormatDescription`; if not, finish-and-recreate the session (same
trigger stage-1 uses to refresh `format`). On decoder error (`kVTVideoDecoderBadDataErr`, etc.) drop
to the next IDR — there's no out-of-band extradata; recovery keyframes re-carry the parameter sets.
## Present: CAMetalLayer + display link
- `CAMetalLayer` (device = system default, `pixelFormat = .bgra8Unorm`, `framebufferOnly = true`,
`drawableSize` = stream WxH). The view: macOS `NSView`/iOS `UIView` whose `layerClass`/backing layer
is the `CAMetalLayer` (mirror `StreamView`/`StreamViewIOS`).
- **Display link** drives present: macOS `CVDisplayLink` (or `CADisplayLink` on macOS 14+),
iOS/tvOS `CADisplayLink`. Each callback carries the **target present timestamp** (`CVTimeStamp` /
`targetTimestamp`).
- Each vsync: pop the **newest** ready frame (drop older undisplayed ones — low-latency default; no
smoothing buffer to start), render a fullscreen quad sampling the **biplanar YUV** (luma +
chroma planes via `CVMetalTextureCache`) with a BT.709 YUV→RGB fragment shader, then
`commandBuffer.present(drawable)` (or `present(drawable, atTime:)`). **Stamp present time** for the
frame just shown (use the display link's target timestamp converted to `CLOCK_REALTIME`).
- Colorspace: BT.709 8-bit for now (matches the host's SDR). HDR (BT.2020/PQ, 10-bit `…10BiPlanar` +
EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`) is a later tie-in with the HDR roadmap (§10).
### Cheaper intermediate (2a) if the Metal path is too big in one step
Decode with `VTDecompressionSession` (gets the **decode-completion timestamp** = capture→decoded),
then wrap the decoded `CVPixelBuffer` in a `CMSampleBuffer` and `enqueue` it into the existing
`AVSampleBufferDisplayLayer` (it accepts uncompressed pixel buffers too). This yields the decode term
**without** a Metal renderer — but **not** true present (the layer still presents internally). Ship 2a
first if useful; 2b (CAMetalLayer + display link) is required for the on-glass present stamp.
## Measurement (the whole point)
Extend `LatencyMeter` (or add per-stage meters) so each frame records three instants, all
`CLOCK_REALTIME` ns, all shifted by `connection.clockOffsetNs` to the host clock:
- **capture→decoded** = `decodedNs + offset - pts_ns` (VideoToolbox decode latency, cross-machine)
- **decode→present** = `presentedNs - decodedNs` (the present tail stage-2 shortens)
- **capture→present** = `presentedNs + offset - pts_ns` — **the glass-to-glass number** (modulo the
host render→capture term, still unmeasured; see roadmap §12)
Surface `capture→present` p50/p95 in the HUD (extend the existing `model.latency*` line in
`ContentView`). `skewCorrected` stays false when `clockOffsetNs == 0` (old host) — then the numbers are
same-host-only, as today.
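The three per-frame latency terms reduce to simple arithmetic once all timestamps are in nanoseconds and the client timestamps are shifted by the connect-time offset (host minus client). The sketch below is a Rust illustration of that arithmetic only; the shipped meter lives in the Swift client, and `FrameStamps`, `StageLatencies`, and `stage_latencies` are hypothetical names.

```rust
/// Hypothetical per-frame timestamps, all CLOCK_REALTIME nanoseconds.
#[derive(Debug, Clone, Copy)]
pub struct FrameStamps {
    pub pts_ns: i64,       // host capture clock
    pub decoded_ns: i64,   // client clock, stamped in the decode callback
    pub presented_ns: i64, // client clock, stamped at present
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StageLatencies {
    pub capture_to_decoded_ns: i64,
    pub decode_to_present_ns: i64,
    pub capture_to_present_ns: i64,
    /// False when the offset is 0 (old host): the cross-machine terms are then same-host-only.
    pub skew_corrected: bool,
}

/// `clock_offset_ns` is host minus client, so adding it moves a client stamp onto the host clock.
pub fn stage_latencies(s: FrameStamps, clock_offset_ns: i64) -> StageLatencies {
    StageLatencies {
        capture_to_decoded_ns: s.decoded_ns + clock_offset_ns - s.pts_ns,
        // Both stamps are on the client clock, so no offset is needed here.
        decode_to_present_ns: s.presented_ns - s.decoded_ns,
        capture_to_present_ns: s.presented_ns + clock_offset_ns - s.pts_ns,
        skew_corrected: clock_offset_ns != 0,
    }
}
```

By construction the capture-to-present term equals the sum of the other two, which is a cheap sanity check when wiring a HUD readout.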
## Validation
- `swift test`: add a decode-output test (decode a known IDR built like
`VideoToolboxRoundTripTests` → assert a `CVPixelBuffer` of the right dimensions + the
decode callback fires). Present is display-bound — validate it **live** via the HUD number.
- Live: connect to a Linux host (`punktfunk1-host --source virtual` on the GNOME box; see
[Ubuntu — GNOME](/docs/ubuntu-gnome)), confirm `capture→present` is a few ms over `capture→client`
and that `decode→present` shrank vs. an `AVSampleBufferDisplayLayer` baseline.
- Compare against the headless reference number: `punktfunk-probe` reports skew-corrected
capture→reassembled (~1.3 ms p50 GNOME box → dev box); capture→present should be that **+ decode +
present**.
## Gotchas
- VT decode is **async**; the output callback runs on a VT-managed thread — don't block it, just stamp - VT decode is **async**; the output callback runs on a VT-managed thread — don't block it, just stamp
+ enqueue. Retain the `CVPixelBuffer` until presented (the ring owns it). decode-completion (`CLOCK_REALTIME` ns) + enqueue. Retain the `CVPixelBuffer` until presented (the ring
- `VTDecompressionSessionDecodeFrame` wants the **same** `CMSampleBuffer` shape stage-1 builds (AVCC owns it).
length-prefixed NALs, in-band parameter sets in the format desc, never as extradata). - Each vsync pops the **newest** ready frame and drops older undisplayed ones — low-latency default, no
- `CAMetalLayer.drawableSize` must track mode changes (the host can `Reconfigure` mid-stream — watch smoothing buffer.
`PunktfunkConnection.mode`/the new-IDR dimensions). - Three per-frame instants (all `CLOCK_REALTIME` ns, all shifted by `clockOffsetNs` to the host clock):
- Don't add a jitter/smoothing buffer for the first cut — present newest-ready for lowest latency; a **capture→decoded** = `decodedNs + offset - pts_ns`; **decode→present** = `presentedNs - decodedNs`
pacing policy can come later if frames look uneven. (the tail stage-2 shortens); **capture→present** = `presentedNs + offset - pts_ns` — the glass-to-glass
- Keep `clients/apple/README.md`'s "Stage 2" item + [Status](/docs/status) updated when this lands. number.
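The 1-slot newest-ready ring can be illustrated as a single mutex-guarded slot: the decode callback overwrites it, and each vsync takes whatever is there. This is a Rust sketch of the pattern only, under the assumption that one slot suffices. The shipped implementation is Swift and is not reproduced here, and `NewestReady` is a hypothetical name.

```rust
use std::sync::Mutex;

/// One-slot "newest ready" hand-off between a decode callback and a display link.
pub struct NewestReady<T> {
    slot: Mutex<Option<T>>,
}

impl<T> NewestReady<T> {
    pub fn new() -> Self {
        Self { slot: Mutex::new(None) }
    }

    /// Producer side (decode callback): store the frame and return the older,
    /// never-presented frame it displaced, so the caller can count or release it.
    pub fn publish(&self, frame: T) -> Option<T> {
        self.slot.lock().unwrap().replace(frame)
    }

    /// Consumer side (each vsync): take the newest frame, leaving the slot empty.
    pub fn take(&self) -> Option<T> {
        self.slot.lock().unwrap().take()
    }
}
```

Both operations hold the lock only for a swap, which matches the "stamp and enqueue, don't block" constraint on the decoder's callback thread.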
## Open items
- **Make stage 2 the default** — after resolution / HDR edge-case checks (HDR = BT.2020/PQ, 10-bit
`…10BiPlanar` + EDR `CAMetalLayer.wantsExtendedDynamicRangeContent`; ties in with the HDR roadmap).
- **Glass-to-glass numbers via `tools/latency-probe`** — close the still-unmeasured host render→capture
term.
- **Smoothing / pacing policy** — present newest-ready for lowest latency today; a pacing policy can come
later if frames look uneven.
- **iOS / iPadOS / tvOS stage-2 variants.**
@@ -1,5 +1,9 @@
# Windows secure-desktop capture — two-process design # Windows secure-desktop capture — two-process design
> **ARCHIVED 2026-06-26** — this two-process WGC secure-desktop design shipped but is now a
> *fallback*; IDD-push is the primary secure-desktop capture path (2026-06-25). Kept for its
> constraint analysis + architectural rationale. Current status: design/windows-host-rewrite.md.
Status: **all steps (1–6) implemented and live-validated on the RTX 4090 (2026-06-16).** The
two-process path works end to end (host as SYSTEM): the user-session WGC helper relays video, the mux
switches to the host's DDA on the secure desktop, a dead helper is rebuilt automatically, and the
+14 -28
@@ -3,14 +3,16 @@ title: "DualSense Haptics"
description: "Feasibility and scoping for audio-driven DualSense haptics." description: "Feasibility and scoping for audio-driven DualSense haptics."
--- ---
> **Status:** Audio-driven advanced (voice-coil) haptics — **NO-GO, DEFERRED.** The reachable
> HID work it scoped instead — **adaptive triggers + two-motor rumble** — **SHIPPED** (commit
> `59edeed`; see CLAUDE.md gamepad section, `inject/dualsense.rs`). This doc is trimmed to the
> deferral rationale (the three walls) + the conditions that would trigger a revisit.
**Status: scoped, NO-GO for now (deferred).** Advanced voice-coil haptics on the DualSense are Advanced voice-coil haptics on the DualSense are driven by the controller's **USB audio interface**
driven by the controller's **USB audio interface** (4-channel surround, the back two channels carry (4-channel surround, the back two channels carry the haptic waveform), *not* by HID reports.
the haptic waveform), *not* by HID reports. Emulating that on a Linux host and faithfully replaying Emulating that on a Linux host and faithfully replaying it on the Apple client both hit hard walls,
it on the Apple client both hit hard walls, and the supply of software that actually *emits* these and the supply of software that actually *emits* these haptics on a Linux host is essentially zero.
haptics on a Linux host is essentially zero. We defer the audio-haptics feature and instead land the We defer the audio-haptics feature.
parts of "really supporting the DualSense" that *are* reachable: **adaptive triggers (HID) and
two-motor rumble.**
(Grounded in a 4-agent feasibility read — host USB-gadget viability, DualSense audio descriptors, (Grounded in a 4-agent feasibility read — host USB-gadget viability, DualSense audio descriptors,
Linux game demand, Apple client render path — 2026-06-10.) Linux game demand, Apple client render path — 2026-06-10.)
@@ -72,22 +74,6 @@ Even with a captured waveform, the primary client (macOS/iOS) can't render it we
- There is **no public macOS API** to route CoreAudio to the DualSense's channels 3–4. Doing it
anyway means private/reverse-engineered APIs that break across OS updates.
## What we *can* ship instead ("really supporting the DualSense" minus audio haptics)
The HID DualSense we built is the foundation, and the high-value parts are within reach:
1. **Adaptive triggers — GO.** `dualsense.rs` already parses the L2/R2 trigger effects out of HID
output report `0x02`. Finishing this is the paused HID work: route them over the `0xCD`
HID-output back-channel and render on the client. This delivers the headline "DualSense feel"
(trigger resistance/weapon tension) for any source that emits it — and it's pure HID, no audio
interface, no kernel rebuild.
2. **Two-motor rumble — already done.** Parsed host-side; the Apple client already has
`nextRumble()`. Wire it to `GCDeviceHaptics`/`CHHapticEngine` as discrete patterns (API-clean,
no private APIs).
3. **LED / player-LED / touchpad / motion** — already parsed; finish the `0xCC`/`0xCD` routing.
This is the resume-able HID DualSense Phase C/D/E work — it stands on its own and was never blocked.
## Conditions for a future GO on audio haptics ## Conditions for a future GO on audio haptics
Revisit if **all three** change: Revisit if **all three** change:
@@ -103,9 +89,9 @@ Revisit if **all three** change:
Until then the cost/benefit is upside-down: three hard subsystems (kernel, USB gadget, audio
routing) to serve ~5–10 Proton titles, rendered lossily on the one client we ship.
## Recommendation ## Open items
**Defer audio-driven advanced haptics. Land adaptive triggers (HID) + rumble instead** — that's the - **Audio-driven advanced (voice-coil) haptics — DEFERRED.** Explicitly blocked until **all three**
reachable 80% of "really supporting the DualSense," needs no kernel work, and the parsing is already "Conditions for a future GO" above are met (a real DualSense to capture the UAC layout, a host UDC
written. Keep this doc as the down payment for the audio-haptics feature whenever the three or grown Linux haptic-title supply, and a client that can render PCM haptics). This doc is the down
conditions above are met. payment to revisit then.
+66 -143
@@ -1,6 +1,11 @@
# Game library: more game stores # Game library: more game stores
Status: **design / not started** · Author research: web-backed, adversarially verified (2026-06-26). > **Status:** Phases 1–4 SHIPPED — 6 `LibraryProvider` impls (Steam, Lutris, Heroic, Epic, GOG,
> Xbox) in [`crates/punktfunk-host/src/library.rs`](../crates/punktfunk-host/src/library.rs)
> (1869 lines; commits `5f8c6b6` Lutris+Heroic, `b657452` Epic+GOG, `aed0bf0` Xbox, `5acc12d` shared
> art cache, `7e9023f` GameStream/Windows+non-gamescope launch wiring, `203ad80` web store badges).
> Phases 5–6 (the remaining 6 providers + the `/library/art` endpoint) are **pending**. This doc is
> trimmed to design rationale + open items; the shipped code is the source of truth.
Goal: extend the unified game library so it enumerates and launches titles from more stores — Goal: extend the unified game library so it enumerates and launches titles from more stores —
on **Windows** Xbox / Game Pass, Epic, EA app (and GOG / Ubisoft / Battle.net / Amazon); on **Windows** Xbox / Game Pass, Epic, EA app (and GOG / Ubisoft / Battle.net / Amazon);
@@ -8,11 +13,10 @@ on **Linux** Heroic (Epic+GOG+Amazon), Lutris, and a `.desktop`/Flatpak catch-al
--- ---
## 1. Where the extension point already is ## 1. The extension point
The library lives in [`crates/punktfunk-host/src/library.rs`](../crates/punktfunk-host/src/library.rs) The library lives in [`library.rs`](../crates/punktfunk-host/src/library.rs) and is a plug-in system:
and is already a plug-in system — its own doc comment names these exact targets. Adding a store is adding a store is a new `LibraryProvider`, not a rewrite.
a new `LibraryProvider`, not a rewrite.
```rust ```rust
pub trait LibraryProvider { pub trait LibraryProvider {
@@ -21,21 +25,11 @@ pub trait LibraryProvider {
} }
pub struct GameEntry { id: String /* "<store>:<localid>" */, store, title, art: Artwork, launch: Option<LaunchSpec> } pub struct GameEntry { id: String /* "<store>:<localid>" */, store, title, art: Artwork, launch: Option<LaunchSpec> }
pub struct Artwork { portrait, hero, logo, header: Option<String> } // URLs the CLIENT fetches pub struct Artwork { portrait, hero, logo, header: Option<String> } // URLs the CLIENT fetches
pub struct LaunchSpec{ kind: String, value: String } // today: "steam_appid" | "command" pub struct LaunchSpec{ kind: String, value: String }
``` ```
Today: `SteamProvider` (reads local `.acf` / `.vdf` files — **no API key, no network**) plus a **The "read the launcher's own on-disk files, no auth" approach is the gold standard we replicate per
user-curated `custom` store. `all_games()` merges them; `launch_command(id)` resolves a store.** Launcher-need-not-be-running unless noted.
store-qualified id **against the host's own library** and maps the `LaunchSpec` to a shell command,
with injection guards (`steam_appid` is validated digits-only; the client never sends a raw command).
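The id-resolution and injection guard described above can be sketched as follows. This is a minimal illustration, not the code in `library.rs`: the function name `launch_command_sketch`, the `HashMap` used as the host library, and the exact `steam steam://rungameid/<appid>` command string are all assumptions made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LaunchSpec {
    pub kind: String,
    pub value: String,
}

/// Resolve a store-qualified id against the HOST's library and build the command.
/// The client only ever supplies `id`; it never supplies command text.
pub fn launch_command_sketch(
    library: &HashMap<String, LaunchSpec>,
    id: &str,
) -> Option<String> {
    let spec = library.get(id)?; // unknown ids resolve to nothing
    match spec.kind.as_str() {
        "steam_appid" => {
            // Digits-only guard: anything else could smuggle shell syntax.
            let v = spec.value.as_str();
            if v.is_empty() || !v.bytes().all(|b| b.is_ascii_digit()) {
                return None;
            }
            // Assumed command shape for illustration.
            Some(format!("steam steam://rungameid/{v}"))
        }
        // Operator-curated command from the host's own `custom` store.
        "command" => Some(spec.value.clone()),
        _ => None,
    }
}
```

Because the lookup key is the only client-controlled input, a malformed or hostile id fails at the map lookup rather than reaching a shell.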
**The "read the launcher's own on-disk files, no auth" approach is the gold standard we replicate per store.**
Surfaces touched by adding stores:
- `library.rs` — new providers (the bulk of the work is small per store).
- [`mgmt.rs`](../crates/punktfunk-host/src/mgmt.rs) `:1138` — serves `/library`; OpenAPI-generated TS client picks up new stores as data.
- [`web/src/sections/Library/view.tsx`](../web/src/sections/Library/view.tsx) — the grid; **store badge is hard-coded** steam-vs-custom, needs generalizing per `game.store`.
- Launch wiring: [`punktfunk1.rs`](../crates/punktfunk-host/src/punktfunk1.rs) `:573` (native) and [`gamestream/stream.rs`](../crates/punktfunk-host/src/gamestream/stream.rs) `:122` (Moonlight).
> The legacy GameStream `apps.json` ([`gamestream/apps.rs`](../crates/punktfunk-host/src/gamestream/apps.rs)) > The legacy GameStream `apps.json` ([`gamestream/apps.rs`](../crates/punktfunk-host/src/gamestream/apps.rs))
> is a **separate** Moonlight surface (session recipes: compositor + nested command) and stays as-is. > is a **separate** Moonlight surface (session recipes: compositor + nested command) and stays as-is.
@@ -103,12 +97,12 @@ Steam gets free CDN art keyed by appid. Most stores don't. Layered ladder, degra
<operator key>`, **off by default**) to fill gaps. Not no-auth; never blocks listing. <operator key>`, **off by default**) to fill gaps. Not no-auth; never blocks listing.
6. **None** → existing title-only card. 6. **None** → existing title-only card.
**New endpoint:** `GET /library/art/<entryId>/<slot>` (slot ∈ `portrait|hero|logo|header`) on `mgmt.rs`. **New endpoint (still pending):** `GET /library/art/<entryId>/<slot>` (slot ∈ `portrait|hero|logo|header`)
It resolves `entryId` in the host library to a **known on-disk absolute path** (never interpolates raw on `mgmt.rs`. It resolves `entryId` in the host library to a **known on-disk absolute path** (never
client input into a filesystem path), sanitizes the slot, rejects `..`, streams the bytes with the right interpolates raw client input into a filesystem path), sanitizes the slot, rejects `..`, streams the bytes
content-type. Reserve `data:` URLs for tiny logos only (don't bloat the catalog JSON that crosses the with the right content-type. Reserve `data:` URLs for tiny logos only (don't bloat the catalog JSON that
control plane). See open question on whether this GET bypasses the mgmt bearer (images are non-sensitive crosses the control plane). See open question on whether this GET bypasses the mgmt bearer (images are
and the streaming client connects over punktfunk/1, not the bearer-gated REST). non-sensitive and the streaming client connects over punktfunk/1, not the bearer-gated REST).
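The path-safety rule for the pending art endpoint can be sketched as a lookup rather than a path join. In this hedged example the slot is parsed into a closed enum and the entry id is only ever used as a map key, so no client string is interpolated into a filesystem path. `ArtSlot`, `ArtIndex`, and `resolve_art` are hypothetical names, and the index is assumed to be built by the host from its own library scan.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ArtSlot {
    Portrait,
    Hero,
    Logo,
    Header,
}

impl ArtSlot {
    /// Only the four known slot names are accepted; everything else is rejected.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "portrait" => Some(Self::Portrait),
            "hero" => Some(Self::Hero),
            "logo" => Some(Self::Logo),
            "header" => Some(Self::Header),
            _ => None,
        }
    }
}

/// Host-built index: (entry id, slot) -> known absolute path on disk.
pub type ArtIndex = HashMap<(String, ArtSlot), PathBuf>;

pub fn resolve_art<'a>(index: &'a ArtIndex, entry_id: &str, slot: &str) -> Option<&'a Path> {
    let slot = ArtSlot::parse(slot)?;
    // Redundant with the map lookup, but cheap and explicit.
    if entry_id.contains("..") {
        return None;
    }
    index.get(&(entry_id.to_string(), slot)).map(|p| p.as_path())
}
```

A handler would stream the bytes at the returned path with a content type derived from the file, and answer 404 on `None`.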
--- ---
@@ -141,7 +135,7 @@ The one net-new surface is `GET /library/art` — covered in §2b (id-resolved p
--- ---
## 4. New `LaunchSpec` kinds ## 4. `LaunchSpec` kinds
| kind | value holds | maps to | | kind | value holds | maps to |
|---|---|---| |---|---|---|
@@ -160,45 +154,31 @@ The one net-new surface is `GET /library/art` — covered in §2b (id-resolved p
## 5. Per-store provider catalog ## 5. Per-store provider catalog
Confidence is **after** adversarial web-verification (research → verify). All enumeration is no-auth, All enumeration is no-auth, local. Confidence is **after** adversarial web-verification.
local, launcher-need-not-be-running unless noted.
### Linux ### Shipped (phases 1–4)
#### Lutris — P0, effort M, confidence **high** | store | OS | enumerate | launch kind | art |
- **Enumerate:** read-only `rusqlite` open of `pga.db` |---|---|---|---|---|
(`$XDG_DATA_HOME/lutris` | `~/.local/share/lutris` | `~/.var/app/net.lutris.Lutris/data/lutris`). | **Steam** | both | local `.acf`/`.vdf` | `steam_appid` | Steam CDN (client-direct) |
`SELECT id, slug, name, runner FROM games WHERE installed=1`. Optionally LEFT JOIN | **Lutris** | Linux | read-only `pga.db` (`installed=1`) | `lutris_id` | local JPEGs (needs `/library/art`, still pending) |
`games_categories`/`categories` to drop the `.hidden` category. Open `mode=ro`/`immutable=1` (Lutris | **Heroic** | Linux | `store_cache/{legendary,gog,nile}_library.json` | `heroic` | free public CDN (`art_*`) |
holds it open). `installed=1` matters — the DB also lists owned-but-not-installed rows. | **Epic** | Windows | `…\Manifests\*.item` | `epic` | local `catcache.bin` keyImages |
- **Launch:** `lutris_id` → `lutris lutris:rungameid/<id>` (execs the game; most nesting-friendly). | **GOG** | Windows | registry + `goggame-<id>.info` | `gog` (direct-exe) | `api.gog.com/products/<id>?expand=images` |
One-time on-box check that `games.id` == the `rungameid` int. | **Xbox / Game Pass** | Windows | `XboxGames\*\Content\MicrosoftGame.config` + AppRepository PFN | `aumid` | unofficial `displaycatalog` lookup |
- **Artwork:** **local** JPEGs keyed by slug — `coverart/<slug>.jpg` (→ portrait), `banners/<slug>.jpg`
(→ header) under `~/.local/share/lutris` (0.5.18+), with `~/.cache/lutris` (≤0.5.17) and the Flatpak
cache as fallbacks. Needs the `/library/art` endpoint. hero/logo stay None.
- **Notes:** highest-confidence new store. A `runner=='steam'` row can duplicate `SteamProvider` — dedup
is a nicety. Verify bundled-SQLite is fine for deb/rpm/flatpak.
#### Heroic — P0, effort M, confidence **high** (one provider = Epic + GOG + Amazon, art free) The hard-won corrections folded into these (keep when revisiting): Epic uses Playnite's **exclusion**
- **Enumerate:** parse `~/.config/heroic/store_cache/{legendary,gog,nile}_library.json` (Flatpak: filter (skip `UE_`, DLC `addons` w/o `addons/launchable`), builds the namespace:catalog:app **triple** when
`~/.var/app/com.heroicgameslauncher.hgl/config/heroic/...`). Data key is `"library"` (legendary/nile) ids exist else **falls back to the bare `appName` URI** (don't set launch=None); GOG launches the
or `"games"` (gog); ignore `__timestamp.*` siblings. Filter `is_installed==true` **and** cross-check **direct exe** (dodges Galaxy cold-start/anti-cheat); Xbox **reads** the PackageFamilyName from the
`install.install_path` exists (works around the gog `is_installed` bug, Heroic #2691). Fall back to `AppRepository\Packages\<PackageFullName>` dir name (**never** hash the publisher), scans `XboxGames` rather
`legendaryConfig/legendary/installed.json` etc. when a cache file is absent. than parse the undocumented `.GamingRoot`, and UWP `aumid` activation is load-bearing on the interactive
*(Heroic uses `legendaryConfig/legendary`, **not** the standalone `~/.config/legendary`.)* user token; Heroic `gui=false` is inert (`--no-gui` does it) and single-instance Electron forwards-and-exits
- **Launch:** `heroic` → `heroic --no-gui "heroic://launch?appName=<app>&runner=<runner>"` (argv, no shell). (launch was gated). Misses pure-UWP (non-GDK) Store games under ACL-locked `WindowsApps` — accepted for v1.
`--no-gui` does the suppression; the `gui=false` query param is **inert/fabricated** — drop it.
**Ship enumeration+art first, gate launch:** Heroic is single-instance Electron — if already running it
forwards the URI and **exits**, which (as gamescope's foreground child) would tear the session down while
the game runs **outside** gamescope, uncaptured. Also Electron needs a display — fine nested in gamescope,
not in a bare headless context.
- **Artwork:** **free** — `art_square` → portrait, `art_cover` → header, `art_background`||`art_cover` →
hero, `art_logo` → logo are already public Epic/GOG/Amazon CDN URLs. Skip non-`http(s)` values
(sideloaded `file://` art). No host endpoint.
- **Notes:** do **not** also build separate Linux GOG/Amazon providers — native Linux GOG Galaxy doesn't
exist; Heroic is the canonical Linux path for those.
#### Desktop (`.desktop` + Flatpak) — P1, effort M, confidence medium (universal catch-all) ### Remaining providers (phases 5–6)
#### Desktop (`.desktop` + Flatpak) — Linux, P1, effort M, confidence medium (universal catch-all)
- **Enumerate:** scan `{/var/lib/flatpak/exports/share/applications, - **Enumerate:** scan `{/var/lib/flatpak/exports/share/applications,
~/.local/share/flatpak/.../applications, /usr/share/applications, /usr/local/share/applications, ~/.local/share/flatpak/.../applications, /usr/share/applications, /usr/local/share/applications,
~/.local/share/applications}/*.desktop`. Require `Type=Application` + `Categories` contains `Game`; skip ~/.local/share/applications}/*.desktop`. Require `Type=Application` + `Categories` contains `Game`; skip
@@ -210,7 +190,7 @@ local, launcher-need-not-be-running unless noted.
App icons are low-res, not box art (acceptable header fallback). App icons are low-res, not box art (acceptable header fallback).
- **Notes:** run **last** and dedup by install path / drop ids already surfaced by Steam/Heroic/Lutris. - **Notes:** run **last** and dedup by install path / drop ids already surfaced by Steam/Heroic/Lutris.
#### itch.io — P3, effort S, confidence medium (Linux + Windows) #### itch.io — Linux + Windows, P3, effort S, confidence medium
- **Enumerate:** read-only `rusqlite` of `butler.db` (`~/.config/itch/db/butler.db`; Flatpak - **Enumerate:** read-only `rusqlite` of `butler.db` (`~/.config/itch/db/butler.db`; Flatpak
`io.itch.itch`; Windows `%AppData%\itch\db`, per-user). JOIN `caves`→`games`. **Key on `cave.ID`** (a `io.itch.itch`; Windows `%AppData%\itch\db`, per-user). JOIN `caves`→`games`. **Key on `cave.ID`** (a
game can have multiple caves; install location + verdict are per-cave). Read game title / `cover_url`; game can have multiple caves; install location + verdict are per-cave). Read game title / `cover_url`;
@@ -220,53 +200,7 @@ local, launcher-need-not-be-running unless noted.
`flavor==native` (html/jar/love need itch's runtime — fall back to custom). `flavor==native` (html/jar/love need itch's runtime — fall back to custom).
- **Artwork:** **free** — `games.cover_url` is a public itch CDN URL. - **Artwork:** **free** — `games.cover_url` is a public itch CDN URL.
### Windows #### Ubisoft Connect — Windows, P2, effort S, confidence medium
#### Epic Games Store — P1, effort M, confidence medium (cleanest Windows store to validate the launch wiring)
- **Enumerate:** read `C:\ProgramData\Epic\EpicGamesLauncher\Data\Manifests\*.item` (JSON; machine-wide,
SYSTEM-readable, launcher need not run). Read `DisplayName`, `AppName`, `CatalogNamespace`,
`CatalogItemId`, `InstallLocation`, `LaunchExecutable`, `MainGameAppName`, `AppCategories`. Iterate the
dir (filename is a random GUID).
**Use Playnite's EXCLUSION filter, not a positive `games` filter:** skip `AppName` starting `UE_`; skip
DLC only when `AppCategories` has `addons` && **not** `addons/launchable`; require `InstallLocation`
exists. (The first-pass positive filter `games + MainGameAppName==AppName` can drop legit games.)
- **Launch:** `epic` → Spawn `EpicGamesLauncher.exe` + `com.epicgames.launcher://apps/<ns>%3A<cat>%3A<app>?action=launch&silent=true`.
Build the **triple** only when both namespace and CatalogItemId are present; otherwise **fall back to the
bare `appName` URI (don't set launch=None)** — bare still works in Playnite today, it's just less robust.
CatalogItemId is **not** present in every `.item` — verify on a real box.
- **Artwork:** **free** — base64-decode + parse `Data\Catalog\catcache.bin`, index by catalogItemId, map
keyImages `DieselGameBoxTall`→portrait, `DieselGameBox`→hero, `DieselGameBoxLogo`→logo. None on miss.
- **Notes:** `.item` + `catcache.bin` are community-RE'd; `silent=true` may not suppress a cold-start
launcher window.
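
The exclusion filter and the triple-or-bare URI rule above can be sketched as follows. Function names are hypothetical, the `InstallLocation` existence check is left out, and two things are assumptions of this sketch: that the ids need no percent-encoding beyond the `%3A` separators, and that the bare form keeps the same query string.

```rust
/// Exclusion filter: engine installs and non-launchable DLC are skipped.
fn is_excluded(app_name: &str, app_categories: &[&str]) -> bool {
    let is_engine = app_name.starts_with("UE_");
    let is_addon = app_categories.contains(&"addons");
    let is_launchable_addon = app_categories.contains(&"addons/launchable");
    is_engine || (is_addon && !is_launchable_addon)
}

/// Builds the launcher URI: the namespace:catalogItemId:appName triple when
/// both ids are present, otherwise the bare appName form.
fn epic_launch_uri(
    namespace: Option<&str>,
    catalog_item_id: Option<&str>,
    app_name: &str,
) -> String {
    let target = match (namespace, catalog_item_id) {
        (Some(ns), Some(cat)) if !ns.is_empty() && !cat.is_empty() => {
            format!("{ns}%3A{cat}%3A{app_name}")
        }
        // Missing or empty ids: fall back to the bare form instead of giving up.
        _ => app_name.to_owned(),
    };
    format!("com.epicgames.launcher://apps/{target}?action=launch&silent=true")
}
```

Treating an empty string like a missing id is a defensive choice here, since the recipe notes that `CatalogItemId` is not present in every `.item`.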
#### GOG — P1, effort M, confidence medium
- **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\GOG.com\Games\<id>` (PATH/GAMENAME/gameID/EXE) or
Uninstall `<id>_is1` keys with `Publisher=='GOG.com'` (exclude `GOGPACK*`). Parse
`<PATH>\goggame-<id>.info` for `playTasks[isPrimary && type=='FileTask']` → exe/args/workingDir.
- **Launch:** `gog` → **direct-exe** Spawn (no Galaxy dependency, dodges cold-start/anti-cheat). Optional
fallback: `GalaxyClient.exe /launchViaAutostart /gameId=<id> /command=runGame /path="<dir>"` (note the
`/launchViaAutostart` token; `goggalaxy://openGameView/<id>` only **opens the page**, doesn't launch).
- **Artwork:** **free** — public no-auth `GET https://api.gog.com/products/<id>?expand=images` →
`images.logo2x`/`verticalCover`/`background`; cache resolved URLs. (`goggame-.info` carries no art; the
Galaxy `galaxy-2.0.db` is undocumented/locked — avoid.)
#### Xbox / Microsoft Store / Game Pass — P1, effort **L**, confidence medium (big Game Pass value, most plumbing)
- **Enumerate:** probe each fixed drive for an `XboxGames` dir (default `C:\XboxGames`; the `.GamingRoot`
binary layout is **undocumented** — just scan, don't depend on parsing it). For each
`<Title>\Content\MicrosoftGame.config` (**presence = it's a GDK game**, the game-vs-app signal) read
`ShellVisuals.DefaultDisplayName` (title), `<StoreId>` (12-char BigId, the art key), `Identity Name`,
`<Executable Id="Game">` (the AppId). **Read the PackageFamilyName from the
`C:\ProgramData\Microsoft\Windows\AppRepository\Packages\<PackageFullName>` directory name** (drop the
`_Version_Arch_~` middle and keep `Name_PublisherHash`) — **never compute the PFN by hashing the publisher**. AUMID = `PFN!AppId`.
- **Launch:** `aumid` → `explorer.exe shell:AppsFolder\<AUMID>` into the interactive session. **UWP
activation fails from SYSTEM/session-0 — the interactive user token is load-bearing.**
- **Artwork:** one **unofficial** no-auth lookup
`displaycatalog.mp.microsoft.com/v7.0/products/<StoreId>?market=US&languages=en-us&fieldsTemplate=Details`,
map `Images[]` ImagePurpose Poster→portrait / SuperHeroArt→hero / Logo→logo / BoxArt→header; cache to
the config dir, degrade to no-art offline. Not a stable contract.
- **Notes:** misses pure-UWP (non-GDK) Store games under the ACL-locked `WindowsApps` — accept for v1.
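
A small sketch of deriving the PFN and AUMID from the AppRepository directory name. It assumes the usual five-field full-name layout `Name_Version_Arch_ResourceId_PublisherId`, where the resource id is often empty or `~`. The function names are hypothetical.

```rust
/// Derives `Name_PublisherId` from a
/// `Name_Version_Arch_ResourceId_PublisherId` directory name.
fn package_family_name(package_full_name: &str) -> Option<String> {
    let parts: Vec<&str> = package_full_name.split('_').collect();
    // Exactly five fields are expected; anything else is rejected
    // rather than guessed at.
    if parts.len() != 5 || parts[0].is_empty() || parts[4].is_empty() {
        return None;
    }
    Some(format!("{}_{}", parts[0], parts[4]))
}

/// AUMID = PFN!AppId, the string handed to `shell:AppsFolder\`.
fn aumid(pfn: &str, app_id: &str) -> String {
    format!("{pfn}!{app_id}")
}
```

Returning `None` on an unexpected shape keeps a malformed directory name from producing an AUMID that silently fails to activate.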
#### Ubisoft Connect — P2, effort S, confidence medium
- **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\Ubisoft\Launcher\Installs\<gameId>` (both reg views), - **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\Ubisoft\Launcher\Installs\<gameId>` (both reg views),
read `InstallDir`; title = install-dir leaf folder (primary) else the `Uplay Install <gameId>` Uninstall read `InstallDir`; title = install-dir leaf folder (primary) else the `Uplay Install <gameId>` Uninstall
`DisplayName`. `DisplayName`.
@@ -274,7 +208,7 @@ local, launcher-need-not-be-running unless noted.
- **Notes:** smallest effort once the Windows URI-launch wiring exists; hive+scheme unchanged across the - **Notes:** smallest effort once the Windows URI-launch wiring exists; hive+scheme unchanged across the
Origin→EA migration. Origin→EA migration.
#### Amazon Games — P2, effort S, confidence medium #### Amazon Games — Windows, P2, effort S, confidence medium
- **Enumerate:** read-only `rusqlite` of - **Enumerate:** read-only `rusqlite` of
`%LocalAppData%\Amazon Games\Data\Games\Sql\GameInstallInfo.sqlite`: `%LocalAppData%\Amazon Games\Data\Games\Sql\GameInstallInfo.sqlite`:
`SELECT Id,ProductTitle,InstallDirectory FROM DbSet WHERE Installed=1`. **Per-user path** — the SYSTEM `SELECT Id,ProductTitle,InstallDirectory FROM DbSet WHERE Installed=1`. **Per-user path** — the SYSTEM
@@ -282,7 +216,7 @@ local, launcher-need-not-be-running unless noted.
- **Launch:** `amazon` → `amazon-games://play/<Id>` (impersonate-token ShellExecute; no clean exe-argv form). - **Launch:** `amazon` → `amazon-games://play/<Id>` (impersonate-token ShellExecute; no clean exe-argv form).
- **Artwork:** `ProductIconUrl`/`ProductLogoUrl` columns when present, else none. - **Artwork:** `ProductIconUrl`/`ProductLogoUrl` columns when present, else none.
#### Battle.net — P2, effort **L**, confidence medium (high catalog value: WoW/Diablo IV/Overwatch 2/CoD) #### Battle.net — Windows, P2, effort **L**, confidence medium (high catalog value: WoW/Diablo IV/Overwatch 2/CoD)
- **Enumerate:** hand-roll a ~4-field protobuf decode of `C:\ProgramData\Battle.net\Agent\product.db` - **Enumerate:** hand-roll a ~4-field protobuf decode of `C:\ProgramData\Battle.net\Agent\product.db`
(`product_install{ uid, product_code, settings.install_path, cached_product_state.base_product_state.installed }`). (`product_install{ uid, product_code, settings.install_path, cached_product_state.base_product_state.installed }`).
Registry fallback: Uninstall keys whose `UninstallString` matches `Battle.net.exe --uid=<uid>`. Registry fallback: Uninstall keys whose `UninstallString` matches `Battle.net.exe --uid=<uid>`.
@@ -292,7 +226,7 @@ local, launcher-need-not-be-running unless noted.
`battlenet://<code>` URI, which only hands off). **Artwork:** none → title-only. `battlenet://<code>` URI, which only hands off). **Artwork:** none → title-only.
- **Notes:** the protobuf + name map + no-art make it L; pin the `.proto` and decode defensively. - **Notes:** the protobuf + name map + no-art make it L; pin the `.proto` and decode defensively.
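
For the hand-rolled decode, a defensive reader for the two protobuf wire types such a message needs (varint and length-delimited) might look like the sketch below. It implements only the generic wire format. The field numbers of `product.db` are not given here and are deliberately not guessed, and `Field`, `read_varint`, and `read_field` are hypothetical names.

```rust
use std::convert::TryFrom;

/// The two wire types this sketch understands.
#[derive(Debug, PartialEq)]
enum Field<'a> {
    Varint(u64),
    Bytes(&'a [u8]), // strings and nested messages
}

/// Reads a base-128 varint, advancing `pos`; None on truncation or overlength.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // continuation bit still set after 10 bytes: malformed
}

/// Reads one (field_number, value) pair; None on truncated or unsupported input.
fn read_field<'a>(buf: &'a [u8], pos: &mut usize) -> Option<(u32, Field<'a>)> {
    let key = read_varint(buf, pos)?;
    let field_number = u32::try_from(key >> 3).ok()?;
    match key & 0x7 {
        0 => Some((field_number, Field::Varint(read_varint(buf, pos)?))),
        2 => {
            let len = usize::try_from(read_varint(buf, pos)?).ok()?;
            let end = (*pos).checked_add(len)?;
            let bytes = buf.get(*pos..end)?; // bounds-checked slice
            *pos = end;
            Some((field_number, Field::Bytes(bytes)))
        }
        _ => None, // fixed32/fixed64/groups are not handled in this sketch
    }
}
```

A nested message such as `settings` would be decoded by calling `read_field` again over the returned `Bytes` slice. Every read is bounds-checked, so a truncated file yields `None` instead of a panic.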
#### EA app — P2, effort M, confidence medium (most closed/fragile — ship last) #### EA app — Windows, P2, effort M, confidence medium (most closed/fragile — ship last)
- **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\{EA Games,Origin Games}\<id>` (Install Dir / - **Enumerate:** registry `HKLM\SOFTWARE\WOW6432Node\{EA Games,Origin Games}\<id>` (Install Dir /
DisplayName), parse `<dir>\__Installer\installerdata.xml` for the **full** `<contentIDs>` list + DisplayName), parse `<dir>\__Installer\installerdata.xml` for the **full** `<contentIDs>` list +
`<gameTitle locale='en_US'>`. Registry under-reports for EA-app (vs legacy Origin) installs — known `<gameTitle locale='en_US'>`. Registry under-reports for EA-app (vs legacy Origin) installs — known
@@ -308,26 +242,24 @@ local, launcher-need-not-be-running unless noted.
--- ---
## 6. Suggested structure & phasing ## 6. Structure & phasing
**Structure.** Split `library.rs` → a `library/` dir before it balloons: **Structure (still pending refactor).** Split the 1869-line `library.rs` → a `library/` dir before it
`mod.rs` (trait, wire types, `LaunchAction`, custom CRUD, `all_games`, `resolve_launch`, balloons further: `mod.rs` (trait, wire types, `LaunchAction`, custom CRUD, `all_games`, `resolve_launch`,
`launch_command`/`launch_title`), `steam.rs`, one file per provider, `art.rs` (ArtResolver + `launch_command`/`launch_title`), `steam.rs`, one file per provider, `art.rs` (ArtResolver +
displaycatalog/gog-api/steamgriddb helpers), `win_util.rs` (HKLM subkey enumerator, read-only SQLite displaycatalog/gog-api/steamgriddb helpers), `win_util.rs` (HKLM subkey enumerator, read-only SQLite
opener, tiny read-only XML reader). New deps: `rusqlite` (bundled, read-only) for lutris/itch/amazon DBs; opener, tiny read-only XML reader). Deps in play: `rusqlite` (bundled, read-only) for lutris/itch/amazon
`roxmltree`/`quick-xml` for the Windows manifests; registry via the `windows` crate's DBs; `roxmltree`/`quick-xml` for the Windows manifests; registry via the `windows` crate's
`Win32_System_Registry` feature (no new crate). Avoid `prost` — hand-roll the ~4 Battle.net fields. `Win32_System_Registry` feature (no new crate). Avoid `prost` — hand-roll the ~4 Battle.net fields.
| Phase | Deliverable | Files | - **Phases 1–4 — DONE:** launch abstraction + Windows interactive-session spawn; Steam/Lutris/Heroic
|---|---|---| providers + Linux art; Epic/GOG providers; Xbox / Game Pass provider; shared art warmer + cache; web
| **1 — Foundation** (no new stores) | Split `library.rs` → `library/`; add `LaunchAction` + `resolve_launch`; factor `windows/interactive.rs::spawn_in_active_session` out of `wgc_relay.rs`; make `set_launch_command` real on Windows; wire `launch_title` at session-start post-capture; add `win_util.rs` + deps | `library/{mod,steam,launch,art,win_util}.rs`; `windows/interactive.rs` (new); `capture/windows/wgc_relay.rs`; `punktfunk1.rs:573`; `gamestream/stream.rs:122`; `vdisplay.rs:57`; `main.rs`; `Cargo.toml` | store badge generalized per `game.store`.
| **2 — Linux Lutris + Heroic + art endpoint** (P0) | `LutrisProvider`, `HeroicProvider` (art free); `GET /library/art/<id>/<slot>` for Lutris local JPEGs; wire into `all_games()`; unit tests for new `resolve_launch` arms + guards | `library/{lutris,heroic,art}.rs`; `library/mod.rs`; `mgmt.rs:1138` + new route | - **Phase 5 — future:** Linux Desktop catch-all (last + dedup, icons via `/library/art`), Ubisoft
| **3 — Windows Epic + GOG** (P1) | `EpicProvider` (.item + catcache art), `GogProvider` (registry + .info + api.gog.com art); validate `windows/interactive.rs` end-to-end on the RTX box | `library/{epic,gog,win_util,art,launch}.rs` | (`UplayProvider`), Amazon (`AmazonProvider` + per-user-profile-under-SYSTEM helper); land the
| **4 — Xbox / Game Pass** (P1) | `XboxProvider` (XboxGames scan + MicrosoftGame.config + AppRepository PFN + aumid launch) + displaycatalog art with caching/offline degrade | `library/{xbox,art,launch}.rs` | `GET /library/art/<id>/<slot>` endpoint that Lutris/Desktop local art still needs.
| **5 — Linux Desktop catch-all + easy Windows URI stores** (P1/P2) | `DesktopProvider` (last + dedup, icons via `/library/art`), `UplayProvider`, `AmazonProvider` (+ per-user-profile-under-SYSTEM helper) | `library/{desktop,uplay,amazon,win_util,art}.rs` | - **Phase 6 — future:** Battle.net (hand-rolled protobuf + code→name map), EA app, itch.io; Rockstar/
| **6 — Remaining + opt-in enrichment** (P2/P3) | `BattleNetProvider` (hand-rolled protobuf + code→name map), `EaAppProvider`, `ItchProvider`; Rockstar/Bottles → custom; optional SteamGridDB v2 behind an operator key | `library/{battlenet,eaapp,itch,art,mod}.rs` | Bottles → custom; optional SteamGridDB v2 enrichment behind an operator key.
Also generalize the web console store badge (`web/src/sections/Library/view.tsx`) to render per `game.store`.
--- ---
@@ -357,21 +289,12 @@ Also generalize the web console store badge (`web/src/sections/Library/view.tsx`
--- ---
## 8. Verification notes (what the adversarial pass corrected) ## Open items (what's left)
First-pass research was web-re-checked; corrections folded into §5 above: - **6 remaining providers:** Desktop/Flatpak (Linux), itch.io (Linux+Windows), Ubisoft Connect, Amazon
- **Epic:** bare-`AppName` URI is **not** universally removed (Playnite still uses it) — build the triple Games, Battle.net, EA app (recipes in §5).
when ids exist, fall back to bare; use Playnite's **exclusion** filter, not a positive `games` filter. - **`GET /library/art/<entryId>/<slot>` mgmt endpoint** — still missing; Lutris local JPEGs (and the future
- **EA:** a single offerId no longer launches — emit the **full** comma-joined contentID list; registry Desktop icons) have no public URL without it.
under-reports for EA-app installs. - **Refactor `library.rs` (1869 lines) into a `library/` directory** (structure in §6).
- **Battle.net:** `battlenet://<code>` only hands off — use `Battle.net.exe --exec="launch <code>"`. - The **8 open questions** in §7.
- **Xbox:** **read** the PFN from the AppRepository dir name, don't hash the publisher; `.GamingRoot` - **Optional SteamGridDB v2 enrichment** behind an operator key (off by default; never blocks listing).
layout is undocumented — just scan `XboxGames`.
- **Heroic:** `gui=false` is inert (`--no-gui` does it); single-instance Electron forwards-and-exits →
gate launch.
- **Lutris:** open the DB read-only; `lutris -l -j` fallback is fragile (single-instance D-Bus forwarding).
- **SteamGridDB:** v1 is deprecated — use v2 (`/api/v2`, Bearer key).
**Not web-confirmable / needs on-box validation:** every Windows launch path (each launcher's argv
handling, foreground grab, secure-desktop behavior), all registry keys / DB schemas against a live box,
and `rusqlite` packaging.
+39 -77
@@ -3,93 +3,55 @@ title: "GameStream Host"
description: "Stream to a stock Moonlight client on a client-sized virtual display." description: "Stream to a stock Moonlight client on a client-sized virtual display."
--- ---
> **Status:** SHIPPED — works end-to-end with a stock Moonlight/Artemis client (initial
> merge `ab6dda2`, June 2026). Code: [`crates/punktfunk-host/src/gamestream/`](../crates/punktfunk-host/src/gamestream/).
> Byte-level wire reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
> (distilled from Sunshine + moonlight-common-c). This doc is trimmed to design rationale +
> open items; the shipped code is the source of truth for wire/packet detail.
The shippable milestone (plan §8). A stock Moonlight/Artemis client discovers this host, A stock Moonlight client discovers this host, pairs, launches, and gets video + input + audio
pairs, launches, and gets video (then input, then audio) on a client-sized virtual display. on a client-sized virtual display.
Ground-truth protocol reference: [`research/gamestream-protocol-research.json`](research/gamestream-protocol-research.json)
(distilled from Sunshine + moonlight-common-c source; cite those for byte-level detail).
## Architecture (respects the "one core" invariant) ## Architecture (respects the "one core" invariant)
- **punktfunk-core** gains a **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`, the - **punktfunk-core** holds the **P1 GameStream wire codec** (`ProtocolPhase::P1GameStream`):
hook already exists): the exact RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard the RTP+`NV_VIDEO_PACKET` framing, the GameStream FEC shard layout, and the video/audio
layout, and the video/audio AES-GCM/CBC paths. Hot path, native threads, **no async**. AES-GCM/CBC paths. Hot path, native threads, **no async**. Kept beside punktfunk's native
Kept beside punktfunk's native internal format (P2), selected by phase. internal format (P2), selected by phase.
- **punktfunk-host** gains the **control plane** (tokio/axum OK — I/O-bound, not the hot path): - **punktfunk-host** holds the **control plane** (tokio/axum OK — I/O-bound, *not* the hot
mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet path): mDNS discovery, nvhttp serverinfo + the 4-phase pairing, the RTSP handshake, the ENet
control stream + input injection, the virtual-display lifecycle, and Opus audio encode. control stream + input injection, the virtual-display lifecycle, and Opus audio encode.
## Port map (base 47989; Moonlight derives all by offset) ## Why we shipped in this order (the two highest interop risks)
| Port | Proto | Role | These two mitigations are why early bring-up deliberately skipped crypto and FEC — both turn
|---|---|---| out to be unnecessary on a clean LAN, and both have a wire-incompatibility that would have
| 47989 | TCP | HTTP nvhttp (unpaired: /serverinfo, /pair PIN flow) | silently broken interop if done naively.
| 47984 | TCP | HTTPS nvhttp (paired; **client-cert pinned**) — /launch, /resume, … |
| 48010 | TCP | RTSP (OPTIONS/DESCRIBE/SETUP/ANNOUNCE/PLAY) |
| 47998 | UDP | Video RTP (+ RS-FEC, optional AES-GCM) |
| 47999 | UDP | Audio RTP (Opus, RS-FEC 4+2, optional **AES-CBC**) |
| 48000 | UDP | ENet control stream (AES-GCM) + remote input |
| 5353 | UDP | mDNS `_nvstream._tcp.local` advertisement |
## Key wire facts (the non-obvious ones) 1. **RS-FEC matrix incompatibility — clean-LAN first.** Sunshine + Moonlight both use
**nanors** (GF(2⁸), poly 0x11d, Vandermonde systematic). punktfunk-core uses
`reed-solomon-erasure` (Cauchy) — parity bytes **don't match**, so Moonlight silently fails
to recover any frame with a lost data shard. Mitigation: **on a clean LAN with no loss the
client never runs RS decode**, so we deferred it — get a frame decoded first, then port
nanors for loss recovery.
2. **Crypto layout incompatibility — plaintext video first.** punktfunk's `SessionCrypto`
(salt + seq-as-AAD) is wire-incompatible with GameStream's GCM; P1 needs a separate
GameStream GCM path (key = raw 16-byte RIKEY, IV = `counter_le[8]||0,0,0||'V'(0x56)`, **no
AAD**, **FEC first, then encrypt per shard**). Mitigation: **video encryption is negotiated
and usually off on LAN** — we implemented plaintext video first and added GCM later.
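
The GameStream video IV layout quoted above, `counter_le[8]||0,0,0||'V'(0x56)`, can be sketched in a few lines. The function name is hypothetical, and treating the counter as a `u64` is an assumption drawn from the 8-byte little-endian field.

```rust
/// Builds the 12-byte GCM IV: 8-byte little-endian counter, three zero
/// bytes, then ASCII 'V' (0x56).
fn gamestream_video_iv(counter: u64) -> [u8; 12] {
    let mut iv = [0u8; 12];
    iv[..8].copy_from_slice(&counter.to_le_bytes());
    // iv[8..11] stay zero
    iv[11] = b'V';
    iv
}
```

Since this path uses no AAD, the IV is the only per-packet input besides the key, which is why it has to match the layout byte for byte.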
- **Video datagram** = `RTP_PACKET(12, BIG-endian)` + `reserved[4]` + `NV_VIDEO_PACKET(16, ## Open items
LITTLE-endian)` + payload. Endianness differs *within the same packet*. `header=0x80|0x10`.
- `fecInfo` (u32 LE) = `(dataShards<<22)|(fecIndex<<12)|(fecPercentage<<4)`; parityShards is
**recomputed** by the client as `ceil(dataShards*pct/100)` — must match exactly.
- `multiFecBlocks` = `(blockIdx<<4)|((nBlocks-1)<<6)`; **≤4 FEC blocks/frame**, ≤255 shards/block.
- Each frame's bitstream is prefixed with an 8-byte `video_short_frame_header_t`
(`headerType=0x01`, `frameType` 2=IDR, `lastPayloadLen`) before striping into shards.
- Shard size = `packetSize + 16`. Data shards first, then parity, over a contiguous RTP
sequence range. Last data shard zero-padded.
- **Video crypto** (when `SS_ENC_VIDEO` negotiated): AES-128-GCM, key = raw 16-byte RIKEY
(from `/launch?rikey=`), IV = `counter_le[8]||0,0,0||'V'(0x56)`, **NO AAD**, 32-byte
`ENC_VIDEO_HEADER{iv[12],frameNumber,tag[16]}` prefix; **FEC first, then encrypt per shard**.
- **Pairing**: PIN key = `SHA-256(salt[16] || ascii_pin)[..16]`; AES-128-**ECB** (no padding)
for the challenge blocks; SHA-256 rolling hashes; RSA-SHA256 signatures over X.509 certs;
the client cert is pinned for subsequent HTTPS. 4 phases over `/pair?phrase=…`.
- **RTSP** `Session: DEADBEEFCAFE;timeout = 90` (literal), `Transport: server_port=<p>`,
`streamid=video/0/0` / `control/13/0`. ANNOUNCE carries the negotiated config
(`x-nv-video[0].*`, `x-nv-vqos[0].*`) → maps to `punktfunk_core::Config`.
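
As a worked example of the `fecInfo` and `multiFecBlocks` bit-packing in the wire facts above, here is a minimal sketch. The function names are hypothetical and the range checks are a defensive addition, not something the wire reference mandates.

```rust
/// parityShards as the client recomputes it: ceil(dataShards * pct / 100).
fn parity_shards(data_shards: u32, fec_percentage: u32) -> u32 {
    (data_shards * fec_percentage + 99) / 100
}

/// fecInfo (u32, sent little-endian):
/// (dataShards << 22) | (fecIndex << 12) | (fecPercentage << 4).
fn pack_fec_info(data_shards: u32, fec_index: u32, fec_percentage: u32) -> u32 {
    (data_shards << 22) | (fec_index << 12) | (fec_percentage << 4)
}

/// multiFecBlocks: (blockIdx << 4) | ((nBlocks - 1) << 6), at most 4 blocks.
fn pack_multi_fec_blocks(block_idx: u8, n_blocks: u8) -> u8 {
    assert!((1..=4).contains(&n_blocks) && block_idx < n_blocks);
    (block_idx << 4) | ((n_blocks - 1) << 6)
}
```

The host has to derive its own parity count with the same ceiling formula. If the host rounds down where the client rounds up, the two sides disagree on how many shards the block contains.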
## The two highest interop risks (validate EARLY) - **HDR / 10-bit.** Needs HDR capture + metadata plumbing. (`av1_nvenc -highbitdepth 1`
already encodes Main10 from 8-bit input on this box.)
1. **RS-FEC matrix compatibility.** Sunshine + Moonlight both use **nanors** (GF(2⁸), poly - **Reconnect-at-new-mode robustness.**
0x11d, Vandermonde systematic). punktfunk-core uses `reed-solomon-erasure` (Cauchy) — parity - **AV1 negotiation.** Implemented + unit/live-capture tested; needs a **live confirmation
bytes likely **don't match**, so Moonlight silently fails to recover any frame with a lost with a stock Moonlight client** (select AV1 in a stock client).
data shard. Mitigation: **on a clean LAN with no loss the client never runs RS decode**, so - **Surround 5.1/7.1 audio.** Implemented + tested; needs a **real listen** including FEC
defer this — get a frame decoded first, then FFI/port nanors for loss recovery. under loss, plus a live Moonlight confirmation.
2. **Crypto layout.** punktfunk's `SessionCrypto` (salt + seq-as-AAD) is wire-incompatible. P1
needs a separate GameStream GCM path. Mitigation: **video encryption is negotiated and
usually off on LAN** — implement plaintext video first, add GCM later.
## Phasing (each phase independently testable with a real Moonlight client)
- **P1.1 — Discovery + serverinfo + pairing.** mDNS `_nvstream._tcp`, HTTP/HTTPS nvhttp,
`/serverinfo` XML, the 4-phase pairing + cert pinning. *Acceptance: Moonlight discovers,
pairs (PIN), and shows the host as ready.* ← first slice.
- **P1.2 — Launch + RTSP + virtual display.** `/launch` (parse rikey/rikeyid/mode), the RTSP
handshake, negotiate `Config`, create a wlroots virtual output sized to the client.
*Acceptance: Moonlight completes RTSP and the host stands up the UDP streams.*
- **P1.3 — Video (punktfunk-core P1 codec), plaintext, clean-LAN.** RTP+NV framing + FEC shard
layout in punktfunk-core; wire the spike's NVENC AUs → UDP 47998. *Acceptance: Moonlight DISPLAYS video.*
- **P1.4 — Control + input.** ENet (`rusty_enet`) control stream; decode input → `inject.rs`
(uinput/reis); request-IDR → force NVENC keyframe. *Acceptance: mouse/keyboard work.*
- **P1.5 — Robustness: FEC recovery + encryption.** nanors-exact FEC; per-shard AES-GCM.
*Acceptance: stable under `tc netem` loss; encrypted streams.*
- **P1.6 — Audio + polish.** Opus + audio RTP/FEC/CBC (UDP 47999); disconnect teardown; KWin
backend for the user's KDE box. *Acceptance: full game stream with sound — the GameStream-host goal.*
## Crates (verified available)
`mdns-sd` 0.20 (discovery) · `axum` 0.8 + `rustls` + `tokio-rustls` (nvhttp/HTTPS, custom
`ClientCertVerifier` for pinning) · `rcgen` 0.14 + `x509-parser` 0.18 + `rsa`/`sha2`/`aes`/
`ecb` (pairing crypto) · hand-rolled RTSP over `tokio::net::TcpListener` · `rusty_enet` 0.4
(control) · `opus` 0.3 (audio) · `reis` 0.6 + `input-linux` (input) · `aes-gcm` (already in
core) for the P1 video/control GCM path; nanors (FFI/port) for FEC recovery in P1.5.
## Testing note ## Testing note
The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at The host is headless; end-to-end needs a **stock Moonlight client on the LAN** pointed at this
this box (manual "add host" by IP works without mDNS). P1.1 is testable with `curl` against box (manual "add host" by IP works without mDNS). `/serverinfo` + the pair flow are testable
`/serverinfo` + the Moonlight pair flow; P1.3+ needs a client that can display. with `curl`; video needs a client that can display.
+101 -81
@@ -1,5 +1,11 @@
# GPU-contention performance investigation — why a saturating game starves the stream (2026-06-25) # GPU-contention performance investigation — why a saturating game starves the stream (2026-06-25)
> **Status:** Investigation / plan. §5.A (NV12/P010 on the IDD-push default path) is **SHIPPED**
> — `3514702`, `capture/windows/idd_push.rs` + `encode/windows/nvenc.rs`. All other levers
> (§5.B/§5.C/§5.E/§5.F/§5.G) are **OPEN**; §5.C is partial (REALTIME knob exists, no auto-gate).
> Paired with [`host-latency-plan.md`](host-latency-plan.md) (mutual cross-refs — keep both).
> Trimmed to design rationale + open items; git history holds the full original.
> The headache, stated precisely: > The headache, stated precisely:
> a game renders ~140 fps on the host GPU; the client requests 120/240; in a GPU-light scene the > a game renders ~140 fps on the host GPU; the client requests 120/240; in a GPU-light scene the
> stream tracks; the moment the game pins the GPU the **stream collapses to 40–50 fps** while the > stream tracks; the moment the game pins the GPU the **stream collapses to 40–50 fps** while the
@@ -14,11 +20,6 @@ supersedes several of that doc's conclusions** — the codebase moved a lot in t
GPU-priority knob got configurable), and a fresh, adversarially-verified research pass overturned GPU-priority knob got configurable), and a fresh, adversarially-verified research pass overturned
two of the old plan's premises. Read §1 (corrections) before acting on the old doc. two of the old plan's premises. Read §1 (corrections) before acting on the old doc.
Method: five parallel investigations — three deep reads of the *current* code (encode, capture,
mitigations) and two web-research passes (encoder-side and GPU-scheduling-side), the latter run with
their own adversarial verifiers. Every external claim below carries a source URL; every code claim
carries a current `file:line`.
--- ---
## 0. TL;DR — the corrected mental model and the action list ## 0. TL;DR — the corrected mental model and the action list
@@ -50,9 +51,9 @@ politely yield without losing anything" — Reflex, render-queue tricks — is a
1. **Diagnose first** (§3). Read `uniq`-vs-`fps` under the real workload + PresentMon presentation 1. **Diagnose first** (§3). Read `uniq`-vs-`fps` under the real workload + PresentMon presentation
mode. Half a day; decides whether you're fighting (a) or (b). The repo already prints the counter. mode. Half a day; decides whether you're fighting (a) or (b). The repo already prints the counter.
2. **Stop feeding NVENC RGB on the default path.** IDD-push (the install default) hands NVENC 2. **Stop feeding NVENC RGB on the default path.** **DONE** for IDD-push (`3514702`): the install
BGRA → NVENC runs its RGB→YUV CSC on the SM, the exact contended engine. Convert to NV12/P010 on default now converts BGRA→NV12 (SDR) / FP16→P010 (HDR) before NVENC, off the SM. Linux NV12-default
the **video engine** like the WGC/DDA paths already do. Biggest in-our-control win. (§5.A) and a video-engine HDR P010 are still open. (§5.A)
3. **Build a *correct* async encode pipeline** — submit on one thread, blocking-retrieve on another, 3. **Build a *correct* async encode pipeline** — submit on one thread, blocking-retrieve on another,
deep surface pool, Windows completion events. Our past "pipelining didn't help" was a *same-thread* deep surface pool, Windows completion events. Our past "pipelining didn't help" was a *same-thread*
implementation that can't overlap; the two-thread pattern the NVENC guide mandates was never implementation that can't overlap; the two-thread pattern the NVENC guide mandates was never
@@ -73,9 +74,10 @@ politely yield without losing anything" — Reflex, render-queue tricks — is a
The old doc was right about the shape but several specifics are now wrong or stale: The old doc was right about the shape but several specifics are now wrong or stale:
- **"Windows already feeds NVENC YUV on the video engine, so it does the right thing."** True for the - **"Windows already feeds NVENC YUV on the video engine, so it does the right thing."** True for the
DDA and WGC paths — **false for IDD-push, which is now the install default** and feeds NVENC DDA and WGC paths — **was false for IDD-push, which became the install default and fed NVENC
**RGB**, paying the SM-side CSC the old doc said Windows had eliminated. The default path RGB**, paying the SM-side CSC the old doc said Windows had eliminated. The default path *regressed*
*regressed* on the exact axis the doc celebrated. (§5.A, `capture/windows/idd_push.rs:545-551,743`) on the exact axis the doc celebrated. **Since fixed** (`3514702`, §5.A): IDD-push now converts
BGRA→NV12 on the video engine (FP16→P010 shader for HDR) and feeds NVENC native YUV.
- **"`PUNKTFUNK_ENCODE_DEPTH` (default 4, ≤6) deep-pipelines."** **There is no such knob.** It exists - **"`PUNKTFUNK_ENCODE_DEPTH` (default 4, ≤6) deep-pipelines."** **There is no such knob.** It exists
only in two stale comments (`encode/windows/nvenc.rs:30`, `capture/windows/wgc.rs:57`) and is never only in two stale comments (`encode/windows/nvenc.rs:30`, `capture/windows/wgc.rs:57`) and is never
parsed. The real depth knob is `PUNKTFUNK_IDD_DEPTH` (default 2), used only by IDD-push on the parsed. The real depth knob is `PUNKTFUNK_IDD_DEPTH` (default 2), used only by IDD-push on the
@@ -108,39 +110,33 @@ honest residual ceiling at 100% GPU. Those carry forward.
--- ---
## 2. How the pipeline actually serializes today (verified against current code) ## 2. How the pipeline serializes today — the key insight
The capture→encode loop is a **fixed-cadence pacer** (`gamestream/stream.rs:375-480`, The capture→encode loop is a **fixed-cadence pacer** (`gamestream/stream.rs`, `punktfunk1.rs`): every
`punktfunk1.rs:2430-2540`): every `1/target_fps` tick it grabs the freshest frame with a `1/target_fps` tick it grabs the freshest frame with a **non-blocking** `try_latest()`, and **if
**non-blocking** `try_latest()`, and **if nothing new arrived it re-encodes the held frame** (a nothing new arrived it re-encodes the held frame** (a near-empty P-frame). So the **outbound fps is
near-empty P-frame). So the **outbound fps is pinned at `target_fps` no matter what the source did** pinned at `target_fps` no matter what the source did** — which is *why the raw fps counter lies* under
— which is *why the raw fps counter lies* under contention. The only honest signal is the `uniq` / contention. The only honest signal is the `uniq` / `diag_new` counter; the code itself states the
`diag_new` counter (`stream.rs:380`, `punktfunk1.rs:2433-2436`), and the code itself states the diagnostic: *"low new_fps at high send rate ⇒ the source isn't producing frames, not an encode stall."*
diagnostic: *"low new_fps at high send rate ⇒ the source isn't producing frames, not an encode
stall"* (`punktfunk1.rs:2466-2468`).
The encode round-trip (NVENC, the dominant path): The NVENC round-trip (the dominant path) is **depth-1 synchronous**: `encode_picture` is a
non-blocking ASIC launch, but `lock_bitstream` **blocks the same thread** until that frame completes
- `submit` → `encode_picture` (`encode/windows/nvenc.rs:722`) is a **non-blocking** ASIC launch; it (no `enableEncodeAsync`, no completion event). The only thread split is encode-vs-network-send, never
pushes onto a `pending` FIFO. submit-vs-retrieve. So under contention the loop is strictly serial — `capture (+convert) → submit →
- `poll` → `lock_bitstream` (`nvenc.rs:801`) **blocks the same thread** until that frame's encode block in lock_bitstream → hand AU to the send thread` — and the arithmetic matches the symptom:
completes. The session is **synchronous** — no `enableEncodeAsync`, no completion event. `1000/17 ≈ 59` and `1000/13 ≈ 77` fps bracket the observed ~50, the signature of **one frame in
- The only thread split is **encode-vs-network-send**, never submit-vs-retrieve. flight per round-trip**, not an ASIC throughput wall.
So at depth-1 the loop is strictly serial: `capture (+convert) → submit → block in lock_bitstream →
hand AU to the send thread`. The arithmetic matches the symptom — `1000/17 ≈ 59` and `1000/13 ≈ 77`
fps bracket the observed ~50, the signature of **one frame in flight per round-trip**, not an ASIC
throughput wall.
([independent NVENC latency study: ~7 frames across all presets](https://arxiv.org/html/2511.18688v2)) ([independent NVENC latency study: ~7 frames across all presets](https://arxiv.org/html/2511.18688v2))
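The depth-1 arithmetic quoted in the paragraph above can be reproduced with a small helper. This is a minimal sketch, not code from the repo: `serial_fps_ceiling` and `pipelined_fps_ceiling` are hypothetical names, and the pipelined variant assumes ideal overlap with no contention, which a saturated GPU will not deliver.

```rust
/// Upper bound on frames per second when exactly one frame is in flight:
/// every frame pays one full round trip, so the ceiling is 1000 / round_trip_ms.
/// (Hypothetical helper for illustration; not part of the punktfunk code base.)
pub fn serial_fps_ceiling(round_trip_ms: f64) -> f64 {
    assert!(round_trip_ms > 0.0, "round trip must be positive");
    1000.0 / round_trip_ms
}

/// Idealised ceiling when up to `depth` frames may overlap.
/// Assumption: perfect overlap, so this is an optimistic bound only.
pub fn pipelined_fps_ceiling(round_trip_ms: f64, depth: u32) -> f64 {
    serial_fps_ceiling(round_trip_ms) * f64::from(depth.max(1))
}
```

With a 17 ms round trip the serial ceiling is about 59 fps and with 13 ms about 77 fps, which are the two figures the text uses.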
-Where the per-frame GPU work lands, by path (this is the crux of contention):
+Where the per-frame GPU work lands, by path (the crux of contention — **lower contended-engine load is
+better**):

-| Path | Colour-convert | Extra copy | NVENC input | Contended-engine load/frame |
-|---|---|---|---|---|
-| **IDD-push** (install default) | **none → NVENC internal RGB→YUV on the SM** | `CopyResource` BGRA→out-ring (3D), `idd_push.rs:743` | **BGRA/Rgb10a2** | **highest** (SM CSC + 3D copy) |
-| **WGC** (fallback default) | `VideoProcessorBlt` → NV12 on the **video engine**, `wgc.rs:631` | none (encodes pool texture in place) | NV12/P010 | low |
-| **DDA** | `VideoProcessorBlt` → NV12 on the **video engine**, `dxgi.rs:1657-1762` | one `CopyResource` (3D) to release the dup fast, `dxgi.rs:3099` | NV12/P010 | medium |
-| **Linux NVENC** | **none → NVENC internal RGB→YUV on the SM** (default) | CUDA dev→dev copy + `cuStreamSynchronize` | RGBZ/BGRZ (NV12 only if `PUNKTFUNK_NV12` *and* `PUNKTFUNK_ZEROCOPY`) | high |
+| Path | Colour-convert | NVENC input | Contended-engine load/frame |
+|---|---|---|---|
+| **IDD-push** (install default) | **NV12/P010 on the video engine** (`3514702`; FP16→P010 via shader for HDR) | NV12/P010 | low (SDR) / shader-CSC on SM (HDR) |
+| **WGC** (fallback default) | `VideoProcessorBlt` → NV12 on the **video engine** | NV12/P010 | low |
+| **DDA** | `VideoProcessorBlt` → NV12 on the **video engine** | NV12/P010 | medium (one 3D `CopyResource` to release the dup fast) |
+| **Linux NVENC** | **none → NVENC internal RGB→YUV on the SM** (default) | RGBZ/BGRZ (NV12 only if `PUNKTFUNK_NV12` *and* `PUNKTFUNK_ZEROCOPY`) | high |

 Measured magnitude of "RGB vs NV12 to the encoder":
 [**RGB input ≈ video-engine 40% + 3D/CUDA 15%; NV12 input ≈ video 26% + 3D 2%**](https://hardforum.com/threads/can-someone-explain-to-me-how-nvenc-obs-work-with-nvidia-gpus-and-the-gpu-load-they-cause.2025896/).
@@ -166,7 +162,8 @@ actual saturating game.
 Composed: Flip). Independent-Flip + `uniq` ≪ Presented ⇒ source/flip problem; **Presented FPS
 itself** collapsed ⇒ the game is genuinely GPU-bound and no capture trick invents the missing
 frames.
-3. Log `cap_us` / `enc_us` / `pace_us` p50/p99 alongside to localise the stall.
+3. Log `cap_us` / `enc_us` / `pace_us` p50/p99 alongside to localise the stall. (Per-stage
+`cap`/`submit`/`wait` µs instrumentation landed under `PUNKTFUNK_PERF` in `3514702`.)
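The `uniq`-versus-send-rate reading used in these steps could be automated along the following lines. This is a sketch under stated assumptions: `Verdict`, `classify` and the `ratio` threshold are hypothetical and do not exist in the repo, and a real classifier would also fold in the PresentMon Presented FPS signal.

```rust
/// Outcome of comparing the send rate with the unique-frame rate.
/// (Illustrative type; names are assumptions, not repo identifiers.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Verdict {
    /// Unique frames keep up with the send rate.
    Healthy,
    /// Send rate is near target but few frames are new: the source is the wall.
    SourceStarved,
    /// The send rate itself fell below target: inspect cap/enc/pace timings.
    SendStalled,
}

/// `ratio` is a tolerance in (0, 1], e.g. 0.9 means "within 10%".
pub fn classify(send_fps: f64, uniq_fps: f64, target_fps: f64, ratio: f64) -> Verdict {
    if send_fps < target_fps * ratio {
        Verdict::SendStalled
    } else if uniq_fps < send_fps * ratio {
        Verdict::SourceStarved
    } else {
        Verdict::Healthy
    }
}
```

For example, a stream that sends 140 fps while only 50 frames per second are new would be classified as `SourceStarved` at a 0.9 tolerance.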
 > **Necessary-but-not-sufficient caveat:** if the game only *rendered* 50 frames because it's
 > GPU-bound, **nothing downstream creates the other 90**. Source fixes address (b) only; the
@@ -182,12 +179,12 @@ actual saturating game.
 | Thread priority (Linux) | `setpriority` 10/5 — **native path only; GameStream Linux threads get none** | `punktfunk1.rs:1977` ⚠ |
 | GPU sched priority | `D3DKMTSetProcessSchedulingPriorityClass` **HIGH(4)** default; `realtime` opt-in, no auto-gate; cross-process onto WGC helper | `capture/windows/dxgi.rs:208-330` ⚠ |
 | GPU thread/latency | `SetGPUThreadPriority(0x4000001E)`, `SetMaximumFrameLatency(1)` | `dxgi.rs:193-200` ✅ |
-| CSC off-SM (Win SDR) | WGC/DDA video-engine NV12 ✅ — **IDD-push (default) RGB→SM ✗** | `wgc.rs:631` / `idd_push.rs:545` |
-| CSC off-SM (Win HDR) | on-SM unless `PUNKTFUNK_HDR_SHADER_P010` (default **off**) | `wgc.rs:603` ⚠ |
+| CSC off-SM (Win SDR) | WGC/DDA video-engine NV12 ✅ — **IDD-push (default) now video-engine NV12** (`3514702`) ✅ | `wgc.rs:631` / `idd_push.rs` |
+| CSC off-SM (Win HDR) | IDD-push HDR via FP16→P010 **shader** (on-SM); other paths on-SM unless `PUNKTFUNK_HDR_SHADER_P010` | `wgc.rs:603` ⚠ |
 | CSC off-SM (Linux) | RGB→SM by default; NV12 is **double-opt-in** (`PUNKTFUNK_NV12`+`PUNKTFUNK_ZEROCOPY`) | `encode/linux/mod.rs:104` ⚠ |
 | Encode pipeline | depth-1 synchronous, inline `lock_bitstream`; IDD-push native = depth-2 same-thread | `nvenc.rs:801` ⚠ |
 | Split-encode | 2-way >1 Gpix/s (HEVC/AV1); disabled 10-bit (correct); proper enum | `nvenc.rs:424-447` ✅ |
-| Zero-copy register-in-place | yes (no encoder-owned pool copy) — IDD-push adds its own out-ring copy | `nvenc.rs:623`/⚠ |
+| Zero-copy register-in-place | yes; IDD-push out-ring is now the convert target (NV12/P010), no extra copy | `nvenc.rs:623` ✅ |
 | AMF tuning | `usage=ultralowlatency`, `preanalysis=false` | `ffmpeg_win.rs:215-219` ✅ |
 | QSV tuning | `async_depth=1`, `low_power=1` (VDEnc) | `ffmpeg_win.rs:226-227` ✅ |
 | Intra-refresh / infinite GOP | yes (killed the periodic-IDR freeze) | ✅ |
@@ -202,35 +199,32 @@ actual saturating game.
 ## 5. The levers, ranked, with honest verdicts

-### A. Stop feeding NVENC RGB on the default path — **highest in-our-control win**
+### A. Stop feeding NVENC RGB on the default path — **DONE for Windows IDD-push** (`3514702`)

-The default Windows capture path (IDD-push) and the default Linux path both hand NVENC packed RGB,
-forcing NVENC's internal RGB→YUV CSC onto the SM the game saturates. The WGC and DDA paths already
-solved this by doing the CSC with `ID3D11VideoProcessor::VideoProcessorBlt` (video engine) and
-feeding NV12/P010. **Make IDD-push and Linux do the same.**
+The default Windows IDD-push path used to hand NVENC packed RGB, forcing NVENC's internal RGB→YUV CSC
+onto the SM the game saturates. `3514702` makes the out-ring the convert target: a D3D11 **video-engine**
+`VideoConverter` does BGRA→NV12 (SDR, BT.709 limited) in place, so NVENC gets native NV12 and skips its
+SM-side CSC; HDR uses the FP16→P010 shader (NVIDIA's VideoProcessor can't do RGB→P010). NV12 input forces
+`bit_depth=8`, so an HDR↔SDR toggle re-inits the session at the matching depth (NV12 can't feed a 10-bit
+session). This also removed the separate `CopyResource` (the convert writes the ring directly).

-- **Windows IDD-push:** add a `VideoProcessorBlt` BGRA→NV12 (SDR) / FP16→P010 (HDR) step into the
-out-ring, exactly like `wgc.rs:631` / `dxgi.rs:1657-1762`, and feed `NV_ENC_BUFFER_FORMAT_NV12` /
-`..._YUV420_10BIT`. This *also* lets you drop the separate `CopyResource` (the convert writes the
-out-ring), removing **both** contended-engine ops per frame. Plug it into `SessionPlan`
-(`session_plan.rs`, the single owner of the capture/encode decision) so capture and encode can't
-disagree on the format.
-- **Linux:** make NV12 the **default** for the tiled zero-copy path (it's gated behind
-`PUNKTFUNK_NV12` *and* `PUNKTFUNK_ZEROCOPY` today — `encode/linux/mod.rs:104`,
-`linux/zerocopy/egl.rs:272`), and feed NVENC `NV_ENC_BUFFER_FORMAT_NV12`. The GL detile already
-runs; emitting NV12 from it replaces the swizzle at ~equal cost and deletes NVENC's CSC.
-- **Windows HDR:** flip `PUNKTFUNK_HDR_SHADER_P010` on by default (or, better, use a video-engine
-P010 convert where the VP supports it).

-**Verdict: REAL, but honestly *conditional*.** Feeding NV12 provably removes NVENC's internal CUDA
-CSC — but the convert has to land **off** the SM to fully pay off. `VideoProcessorBlt` is *designed*
-to use fixed-function video hardware and the hardforum numbers back the 15%→2% drop, **but no NVIDIA
-doc explicitly confirms `VideoProcessorBlt` runs off-SM on GeForce** — treat the "video engine" claim
-as well-founded-but-unverified and confirm on-box with `nvidia-smi dmon` (watch the `enc`/`sm`
-columns) before and after. Do **not** convert with a CUDA/3D shader and call it done — that just
-relocates the CSC to the same SM (Sunshine's RGB→NV12 CUDA kernel still contends).
+**Verdict: REAL, but honestly *conditional*** — the convert has to land **off** the SM to fully pay off.
+`VideoProcessorBlt` is *designed* to use fixed-function video hardware and the hardforum numbers back the
+15%→2% drop, **but no NVIDIA doc explicitly confirms `VideoProcessorBlt` runs off-SM on GeForce** — treat
+the "video engine" claim as well-founded-but-unverified and confirm on-box with `nvidia-smi dmon` (watch
+the `enc`/`sm` columns) before and after. Do **not** convert with a CUDA/3D shader and call it done — that
+just relocates the CSC to the same SM (this is why the HDR P010 *shader* path is still on-SM; Sunshine's
+RGB→NV12 CUDA kernel still contends).

+**Still open in §A:**
+- **Linux:** make NV12 the **default** for the tiled zero-copy path (gated behind `PUNKTFUNK_NV12` *and*
+`PUNKTFUNK_ZEROCOPY` today — `encode/linux/mod.rs:104`, `linux/zerocopy/egl.rs:272`), feeding NVENC
+`NV_ENC_BUFFER_FORMAT_NV12`. The GL detile already runs; emitting NV12 from it replaces the swizzle at
+~equal cost and deletes NVENC's CSC.
+- **Windows HDR:** move the FP16→P010 convert onto the video engine where the VP supports it (today's
+shader keeps it on-SM), or flip `PUNKTFUNK_HDR_SHADER_P010` on by default for the non-IDD paths.

-### B. A *correct* async encode pipeline (the untried encoder lever)
+### B. A *correct* async encode pipeline (the untried encoder lever) — **OPEN**

 The NVENC Programming Guide is explicit: *"The main encoder thread should be used only to submit
 work… (non-blocking `NvEncEncodePicture`). Output buffer processing — waiting on the completion
@@ -263,7 +257,7 @@ Watch out: this **forecloses sub-frame slice output** (mutually exclusive with `
 and HAGS can spike the *submit* call itself
 ([100-200 ms `nvEncEncodePicture` stalls under HAGS](https://forums.developer.nvidia.com/t/windows-11-hardware-accelerated-gpu-scheduling-issue/286128)).
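The two-thread split that §B argues for (submit on one thread, blocking retrieve on another, bounded by a surface pool) can be modelled without any encoder at all. The sketch below is an illustration using only the Rust standard library: `InFlight`, `run_pipeline` and `pool_depth` are hypothetical names, the channel stands in for the surface pool, and no real NVENC call is made.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stand-in for an in-flight encode; a real pipeline would carry surface handles.
/// (Hypothetical type for illustration.)
pub struct InFlight {
    pub frame_no: u64,
}

/// Pushes `frames` jobs through a submit thread and a retrieve thread and
/// returns the frame numbers in the order the retrieve side saw them.
/// `pool_depth` should be at least 1; it models how many surfaces may be in flight.
pub fn run_pipeline(frames: u64, pool_depth: usize) -> Vec<u64> {
    // Bounded queue: the submit side blocks only when the pool is exhausted.
    let (tx, rx) = sync_channel::<InFlight>(pool_depth);

    let retriever = thread::spawn(move || {
        let mut done = Vec::new();
        // In a real encoder the blocking wait for the bitstream would sit here,
        // off the submit thread.
        for job in rx {
            done.push(job.frame_no);
        }
        done
    });

    for frame_no in 0..frames {
        // Placeholder for the non-blocking submit call.
        tx.send(InFlight { frame_no })
            .expect("retrieve thread exited early");
    }
    drop(tx); // closing the queue lets the retrieve loop finish

    retriever.join().expect("retrieve thread panicked")
}
```

The bounded channel is the design point: it preserves submission order and applies back-pressure, so the submit thread never runs further ahead than the pool allows.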
-### C. Auto-gated REALTIME GPU scheduling priority
+### C. Auto-gated REALTIME GPU scheduling priority — **PARTIAL** (knob exists, no auto-gate)

 Raising the host process's WDDM GPU priority is **the** proven single-PC production lever — OBS and
 Sunshine both set `D3DKMT_SCHEDULINGPRIORITYCLASS_REALTIME` to stop being descheduled behind
@@ -274,8 +268,8 @@ It works **independently of HAGS** (HAGS does *not* reassign cross-process prior
 *"Windows continues to control prioritization"*
 [DirectX devblog](https://devblogs.microsoft.com/directx/hardware-accelerated-gpu-scheduling/)).

-We ship only **HIGH(4)** by default with a static `realtime` opt-in and **no auto-gate**. Two things
-to change:
+We ship only **HIGH(4)** by default with a static `realtime` opt-in (`PUNKTFUNK_GPU_PRIORITY_CLASS`,
+`dxgi.rs:208-330`) and **no auto-gate**. Two things to change:

 - **We can actually grant REALTIME.** It needs `SeIncreaseBasePriorityPrivilege`, which an unelevated
 app lacks (OBS logs the failure) — **but our host runs as a `LocalSystem` service, which holds it.**
@@ -294,7 +288,7 @@ the host *takes* GPU time from the game; it measurably **costs the game fps**
 That's acceptable for a streaming host (the remote view is the product), but say so plainly and make
 the class operator-configurable (we already expose `PUNKTFUNK_GPU_PRIORITY_CLASS`).

-### D. Multi-vendor encoder hygiene (AMF/QSV) — mostly done, one caveat
+### D. Multi-vendor encoder hygiene (AMF/QSV) — **stable / mostly done, one caveat**

 Our `*_amf`/`*_qsv` libavcodec config already follows the research's advice: AMF
 `usage=ultralowlatency` + `preanalysis=false` (`ffmpeg_win.rs:215`), QSV `async_depth=1` +
@@ -313,7 +307,7 @@ Our `*_amf`/`*_qsv` libavcodec config already follows the research's advice: AMF
 **Verdict: REAL but largely already captured.** No big win left here except via §G.

-### E. Lock clocks / pin P-state — cheap jitter fix, not a collapse fix
+### E. Lock clocks / pin P-state — cheap jitter fix, not a collapse fix — **OPEN**

 NVIDIA's adaptive clocking downclocks between our small bursty frames and pays a ramp tax every
 frame — most visible in the *light* scene (the "200-not-240"). Pin it:
@@ -334,7 +328,7 @@ frame — most visible in the *light* scene (the "200-not-240"). Pin it:
 **Verdict: REAL for latency *stability*, marginal for the saturated collapse** (at 100% util the game
 already pins P0). Cheap, low risk, do it for the light-scene win.

-### F. Escape the frame-source ceiling — only if §3 says (b)
+### F. Escape the frame-source ceiling — only if §3 says (b) — **OPEN**

 If `uniq` is the wall, no encoder/priority work helps — you need a better frame source.
@@ -358,7 +352,7 @@ If `uniq` is the wall, no encoder/priority work helps — you need a better fram
 **Verdict: swapchain-hook is REAL and the only general escape; the rest are narrow.** None invents
 frames the game didn't render.

-### G. The honest endgame — encode on a second GPU / the iGPU
+### G. The honest endgame — encode on a second GPU / the iGPU — **OPEN**

 For *demanding* titles that saturate the GPU even when capped, the only thing that **removes**
 contention rather than re-prioritizing it is to run the capture→convert→encode pipeline on a
@@ -384,8 +378,8 @@ the consumer analogue is the iGPU.
 ## 6. Recommended order of attack

 1. **§3 Diagnose** on the RTX box + a real game. Settles (a) vs (b). *(half a day, decisive)*
-2. **§5.A NV12/P010 on the default paths** (IDD-push video-engine convert; Linux NV12 default-on;
-Windows HDR P010 default). Biggest in-our-control floor-raise; confirm off-SM with `nvidia-smi dmon`.
+2. **§5.A NV12/P010 on the default paths** IDD-push **DONE** (`3514702`); remaining: Linux NV12
+default-on, Windows HDR P010 off-SM. Confirm off-SM with `nvidia-smi dmon`.
 3. **§5.C Auto-gated REALTIME** priority (HAGS + VRAM gate). Cheap, big, we can uniquely grant it.
 4. **§5.E Clock pin** both OSes (crash-safe undo). Cheap light-scene win.
 5. **§5.B Correct two-thread async pipeline.** Structural; recovers the depth-1 serialization.
@@ -418,13 +412,39 @@ or a second slice of silicon (§G). Don't chase the rest with encoder micro-opti
 ---

-## 8. Open evidence gaps (flagged honestly)
+## 8. Open items / what's left

+Diagnostics + still-unbuilt levers (verbatim, highest leverage first):
+
+- **§3 automation** — instrument the `uniq`-vs-`fps` heuristic + a PresentMon probe so (a)/(b) is
+decided automatically, not by hand. (Per-stage `cap`/`submit`/`wait` µs already land under
+`PUNKTFUNK_PERF` from `3514702`; the uniq/PresentMon classifier is not yet automated.)
+- **§5.A residual** — Linux NV12 default-on for the tiled zero-copy path (drop the
+`PUNKTFUNK_NV12`+`PUNKTFUNK_ZEROCOPY` double-opt-in); move the Windows HDR FP16→P010 convert off the
+SM (today it's a shader). Windows IDD-push SDR/HDR NV12/P010 is **DONE** (`3514702`).
+- **§5.B** — build a *correct* async NVENC pipeline: submit on one thread, blocking-`lock_bitstream`
+on a dedicated retrieve thread, deep input+output surface pool (≈4-8), Windows per-buffer
+`completionEvent` (`enableEncodeAsync=1`), same two-thread split on Linux.
+- **§5.C** — auto-gate REALTIME GPU priority: probe HAGS (`D3DKMTQueryAdapterInfo`) + VRAM headroom
+(`IDXGIAdapter3::QueryVideoMemoryInfo`) continuously; REALTIME only when HAGS-off or HAGS-on with
+comfortable headroom, downgrade to HIGH the instant VRAM tightens. (Static `realtime` opt-in exists
+in `dxgi.rs`; no auto-gate.)
+- **§5.E** — clock / P-state pinning: Windows NvAPI DRS `PREFERRED_PSTATE=PREFER_MAX` (crash-safe undo
+to `%ProgramData%\punktfunk\`); Linux `nvidia-smi -lgc` / `nvmlDeviceSetGpuLockedClocks` (+
+`CudaNoStablePerfLimit` on R580/595). Gate `PUNKTFUNK_PIN_CLOCKS`, default off on battery/Deck.
+- **§5.F** — frame-source escape (only if §3 says (b)): swapchain-hook capture (OBS-style, anti-cheat
+tradeoffs); NvFBC on Linux (keylase patch); compose-flip for the DLSS-FG half-rate case; WGC
+`MinUpdateInterval = 1e7/(fps*2)` 2×-rate tweak.
+- **§5.G** — iGPU / second-GPU encode offload: pin capture to the gaming adapter, encoder to the iGPU
+adapter, one cross-adapter shared-texture copy. Reuses the AMF/QSV/VAAPI backends.
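The §5.C gate in the list above reduces to a small pure decision once the HAGS state and VRAM figures have been probed. The following is a sketch only: `GateInputs`, `choose_priority` and the `min_headroom` threshold are hypothetical, the "comfortable headroom" value is not specified in the source, and the probing calls themselves (`D3DKMTQueryAdapterInfo`, `IDXGIAdapter3::QueryVideoMemoryInfo`) are deliberately left out.

```rust
/// GPU scheduling class the host would request. (Illustrative type.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum GpuPriority {
    High,
    Realtime,
}

/// Values a caller would fill from its own HAGS and video-memory probes.
/// Field names are assumptions made for this sketch.
pub struct GateInputs {
    pub hags_enabled: bool,
    pub vram_budget_bytes: u64,
    pub vram_current_usage_bytes: u64,
}

/// REALTIME when HAGS is off, or when HAGS is on and the free share of the
/// VRAM budget is at least `min_headroom` (a fraction, e.g. 0.25); HIGH otherwise.
pub fn choose_priority(inputs: &GateInputs, min_headroom: f64) -> GpuPriority {
    if !inputs.hags_enabled {
        return GpuPriority::Realtime;
    }
    if inputs.vram_budget_bytes == 0 {
        // No usable budget reading: stay on the conservative class.
        return GpuPriority::High;
    }
    let free = inputs
        .vram_budget_bytes
        .saturating_sub(inputs.vram_current_usage_bytes);
    let headroom = free as f64 / inputs.vram_budget_bytes as f64;
    if headroom >= min_headroom {
        GpuPriority::Realtime
    } else {
        GpuPriority::High
    }
}
```

Re-evaluating this function on every probe tick gives the "downgrade to HIGH the instant VRAM tightens" behaviour without any extra state.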
+### Open evidence gaps (verify on-box)

 - Whether `ID3D11VideoProcessor::VideoProcessorBlt` (BGRA→NV12) runs **off the SM on GeForce** is not
 confirmed by any NVIDIA document — it's the linchpin of §5.A's full payoff. **Verify on-box** with
-`nvidia-smi dmon` (sm% vs enc%) on the WGC path before assuming IDD-push will match it.
+`nvidia-smi dmon` (sm% vs enc%) on the IDD-push/WGC path before assuming the win landed.
 - The exact share of the 13-17 ms `encode_ms` that is *convert-on-SM* vs *scheduling-wait* is
-unmeasured. §3 + an A/B of IDD-push-RGB vs IDD-push-NV12 on the same scene settles it and tells you
-whether §5.A alone is enough or whether §5.C is doing the heavy lifting.
+unmeasured. §3 + an A/B of IDD-push-RGB (pre-`3514702`) vs IDD-push-NV12 on the same scene settles it
+and tells you whether §5.A alone is enough or whether §5.C is doing the heavy lifting.
 - AMD VCN "degrades worse under contention" is practitioner-consensus + architecture, not an AMD
 whitepaper; treat the *direction* as solid, the magnitude as TBD.
+106 -244
@@ -1,270 +1,132 @@
 # HDR pipeline — investigation & implementation plan

+> **Status:** Steps 0-3 SHIPPED — protocol/ABI/host in `3526517`, client apply + display
+> capability-gate in `551012b`. Windows HDR live-validated; Apple/Android CI-compiled (on-glass
+> pending). Step 4 (Linux) is OPEN, blocked upstream on capture. This doc is trimmed to design
+> rationale + open items; the shipped code is the source of truth. The original audit (full gap
+> list, per-file line refs, blocker walkthroughs) is in git history before this trim.

-Goal: **true, correct HDR glass-to-glass** for punktfunk, across the host (Windows today;
-Linux blocked upstream) and every client (Windows / Apple / Android / Linux).
-
-This is an audit of the current state, the gap list, and a phased plan. It was produced from
-a full read of every HDR-touching subsystem cross-checked against the HDR10 standards
-(CICP/H.273 VUI, SMPTE ST.2086 mastering, CEA-861.3 MaxCLL/MaxFALL) and the
-Sunshine/Apollo/Moonlight reference implementation.
+Goal: **true, correct HDR glass-to-glass** for punktfunk, across the host (Windows today; Linux
+blocked upstream) and every client (Windows / Apple / Android / Linux). The plan was produced from a
+full read of every HDR-touching subsystem cross-checked against the HDR10 standards (CICP/H.273 VUI,
+SMPTE ST.2086 mastering, CEA-861.3 MaxCLL/MaxFALL) and the Sunshine/Apollo/Moonlight reference.

-> Status legend: **blocker** (HDR can't work) · **correctness** (HDR works but looks wrong) ·
-> **quality** (correct-ish, missing refinement) · **ok**.
+The original diagnosis: pixel math + the HEVC VUI we emitted were already correct (self-test
+validated, matches Apollo), but nothing **measured, signalled, transported, or applied the static HDR
+metadata** (mastering display colour volume + content light level). The fix was a metadata chain,
+protocol-first.

----
-## TL;DR
+## What shipped (Steps 0-3)

+- **Step 0 — protocol + ABI carry colour end to end (`3526517`).** `ColorInfo` (4 CICP bytes on
+`Welcome`) + `HdrMeta` (`0xCE` datagram, bounds-checked); `NativeClient` `color`/`bit_depth` fields
++ an `HdrMeta` receiver/demux + `next_hdr_meta`. C ABI: `punktfunk_connect_ex5(... video_caps)`,
+`next_hdr_meta`, `color_info`, and **fixed `abi.rs:896` `video_caps = 0`** — the one-line root cause
+that had made Apple's complete (and correct) HDR pipeline dead code. Header regenerated. No
+rendering changes, CI-testable (round-trip + truncation + SDR back-compat).
+- **Step 1 — host in-band SEI + complete VUI (`3526517`, live-validated on the RTX box).**
+Cross-platform byte logic in unit-tested `src/hdr.rs`: `hdr_meta_from_display`,
+`hevc_mastering_display_sei` SEI **type 137**, `hevc_content_light_level_sei` SEI **type 144**
+(note: NOT "type 4" — that was a drafting error). Windows `dxgi.rs`/`wgc.rs` read
+`IDXGIOutput6::GetDesc1` at capture init / output change → `HdrMeta` (MaxCLL/MaxFALL left 0, like
+Apollo); `nvenc.rs` attaches mastering + CLL SEI on every IDR for HEVC/H.264 and sends the real
+`0xCE` re-sent each keyframe. In-band SEI is read directly by decoders, so this fixed correctness
+before clients consumed the protocol and gave an Apollo on-glass parity gate. *Follow-ups:* AV1
+mastering rides METADATA OBUs (`HDR_MDCV`/`HDR_CLL`), not SEI; the Windows secure-desktop relay
+still sends only the generic baseline `0xCE` (the helper's in-band SEI carries the real grade).
+- **Step 2 — clients apply the metadata (`551012b`; Apple/Android CI-compiled).** Each client drains
+`next_hdr_meta`/`nextHdrMeta` and remaps from the wire form (ST.2086 **G,B,R** order, mastering
+luminance in 0.0001 cd/m²) to the platform layout: **Windows** `SetHDRMetaData`
+(`hdr_meta_to_dxgi`: G,B,R→R,G,B reorder, 0.0001-nit→nit), dropping the 1000/1000/400 hardcode;
+**Apple** `CVBufferSetAttachment` of `kCVImageBufferMasteringDisplayColorVolumeKey` (24-byte BE) +
+`kCVImageBufferContentLightLevelInfoKey` (4-byte BE) per HDR pixel buffer — the correct path for the
+`itur_2100_PQ` layer (`CAEDRMetadata` on a PQ layer is ambiguous, deliberately avoided); **Android**
+`MediaFormat` `KEY_HDR_STATIC_INFO`, a 25-byte CTA-861.3 Type-1 blob (LE, **R,G,B** order, max-lum
+in **nits-u16**). Apple's connect also flips to `connect_ex5` advertising `videoCap10Bit|videoCapHDR`
+— the fix that resurrects Apple's previously-dead pipeline.
+- **Step 3 — display-capability gate (`551012b`).** **Chosen approach: capability-gate, not an
+in-shader BT.2390 tone-map.** Rationale: with Steps 1-2 the host sends *correct* mastering metadata
+so an HDR display self-tone-maps; the remaining gap is SDR displays, best fixed by **not advertising
+HDR you can't present** — the host then sends a proper BT.709 SDR stream instead of PQ the panel
+mis-tone-maps (washed-out/dark). No guessed curve, deterministic. **Windows**
+`display_supports_hdr` (any `IDXGIOutput6` colour space == `G2084`); **Apple**
+`NSScreen.maximumExtendedDynamicRangeColorComponentValue > 1` (macOS) /
+`UIScreen.main.potentialEDRHeadroom > 1` (iOS); **Android** `Display.getHdrCapabilities`
+HDR10/HDR10+. Each ANDs with the user's HDR setting before advertising caps and logs when it drops
+to SDR.
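The Step 2 remap from the wire form to the Android `KEY_HDR_STATIC_INFO` blob can be sketched as below. This is not the shipped client code: the `HdrMeta` field names and `android_hdr_static_info` are hypothetical, the byte layout follows the description above (25 bytes, little-endian, R,G,B order, max luminance as whole nits in a `u16`), and keeping the minimum luminance in 0.0001 cd/m² units is an assumption drawn from CTA-861.3 rather than from this document.

```rust
/// Wire-form static HDR metadata. Field names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HdrMeta {
    /// Display primaries in ST.2086 order G, B, R; each entry is (x, y).
    pub primaries_gbr: [[u16; 2]; 3],
    pub white_point: [u16; 2],
    /// Mastering luminance in 0.0001 cd/m² units.
    pub max_lum: u32,
    pub min_lum: u32,
    /// Content light levels in nits.
    pub max_cll: u16,
    pub max_fall: u16,
}

/// Builds a 25-byte Type-1 static-metadata blob: one id byte followed by
/// twelve little-endian u16 fields, with primaries reordered to R, G, B.
pub fn android_hdr_static_info(m: &HdrMeta) -> [u8; 25] {
    let [g, b, r] = m.primaries_gbr;
    // 0.0001 cd/m² -> whole nits, clamped to what a u16 can hold.
    let max_nits = (m.max_lum / 10_000).min(u32::from(u16::MAX)) as u16;
    // Assumption: minimum luminance stays in 0.0001 cd/m² units, clamped.
    let min_units = m.min_lum.min(u32::from(u16::MAX)) as u16;

    let fields: [u16; 12] = [
        r[0], r[1], g[0], g[1], b[0], b[1],
        m.white_point[0], m.white_point[1],
        max_nits, min_units, m.max_cll, m.max_fall,
    ];

    let mut out = [0u8; 25];
    out[0] = 0; // descriptor id for static metadata Type 1
    for (i, v) in fields.iter().enumerate() {
        let at = 1 + 2 * i;
        out[at..at + 2].copy_from_slice(&v.to_le_bytes());
    }
    out
}
```

The same reorder (G,B,R to R,G,B) and the 0.0001-nit to nit division are what the Windows path applies before `SetHDRMetaData`; only the container differs.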
-Our HDR is **correct in isolated islands but broken end-to-end.** The pixel math and the HEVC
-VUI we *do* emit are right (self-test validated, matches Apollo). What's missing is the
-**metadata chain**: nothing measures, signals, transports, or applies the *static HDR metadata*
-(mastering display colour volume + content light level) that tells a display how to tone-map —
-so every client hardcodes generic values or infers from the bitstream, and one line
-(`abi.rs:896`, `video_caps = 0`) makes the entire (correct) Apple HDR pipeline dead code.
-
----
-
-## What's already correct (the islands)
-
-| Stage | Where |
-|---|---|
-| Windows host HEVC **VUI** — primaries=9 (BT.2020) / transfer=16 (PQ) / matrix=9 (BT.2020-NCL) / limited range | `encode/nvenc.rs:307-316` |
-| Windows host **scRGB→BT.2020 PQ** shader (×80 nits → BT.709→2020 → ST.2084 OETF, 10000-nit peak) | `capture/dxgi.rs` — self-test `<1` code error, matches Apollo |
-| Windows client **P010 decode + YUV→RGB** (BT.2020-NCL, limited→full) + **R10G10B10A2 / G2084-P2020 swapchain** | `present.rs:66-77, 320-370` |
-| Android client **Main10 decode + reactive DataSpace** (BT2020-PQ/HLG) | `decode.rs:210-227` |
-| Apple client decode/present **code** (P010 VideoToolbox, BT.2020 PQ Metal, `itur_2100_PQ` + EDR) | correct — but never runs (blocker #2) |
-
-## Gap list
-
-### Blockers
-
-1. **No color-metadata transport in the protocol** *(the keystone).* The wire carries only
-`Hello.video_caps` (10BIT/HDR bits) and `Welcome.bit_depth` (8/10) — `quic.rs:127-128`
-explicitly defers color. No primaries/transfer/matrix/range, **no ST.2086 mastering, no
-MaxCLL/MaxFALL**. ST.2086/CLL host→client is impossible by construction today.
-2. **C ABI hardcodes `video_caps = 0`** (`abi.rs:896`) → Apple's complete HDR pipeline is dead
-code; no ABI embedder can request HDR. One-line root cause.
-3. **H.264 and AV1 emit zero color signaling on Windows** — the `if self.hdr` VUI block in
-`nvenc.rs` only writes `hevcConfig`. Any H.264+10-bit or AV1+HDR stream decodes as BT.709 SDR.
-*(AV1 is **not** a "copy the HEVC VUI" fix — AV1 has no VUI/SEI; it carries
-primaries/transfer/matrix in the sequence-header `color_config` and mastering/CLL in
-**METADATA OBUs** `HDR_MDCV`/`HDR_CLL`. Verify whether NVENC's AV1 path accepts them.)*
-4. **Linux host is 8-bit only end to end** — capture offers only 8-bit PipeWire formats
-(`capture/linux.rs:443-453, 594-654`; gamescope #2126, portals don't wire PipeWire 1.6
-BT.2020/PQ); encode downgrades 10-bit (`encode/linux.rs:153-162` TODO, `vaapi.rs:719`) with
-BT.709 hardcoded. The Windows-style 8-bit→Main10 upconvert shim is not implemented here.
-5. **Linux client HDR is a complete non-feature** — `video_caps=0`, P010 decode path dead
-(`video.rs:379`), CICP hardcoded BT.709 (`ui_stream.rs:239-243`), no Wayland
-color-management (GTK4 0.11 too old).
-
-### Correctness
-
-6. **No host ever emits the ST.2086 mastering or CEA-861.3 CLL SEI.** Windows never reads
-`IDXGIOutput6::GetDesc1`; `nvenc.rs` never builds an `NV_ENC_SEI_PAYLOAD`; Linux attaches no
-libavcodec `side_data`. Apollo reads `GetDesc1` and attaches it.
-7. **Clients hardcode mastering metadata.** `present.rs:584-595` ships fixed
-`1000-nit / MaxCLL 1000 / MaxFALL 400` (with the literal "the protocol doesn't carry the
-stream's real mastering metadata yet" comment). Apple/Android set none.
-8. **HDR→SDR tone-mapping is unaddressed — and it's the common case.** Most client displays are
-SDR. No client queries display peak; silent `SetColorSpace1`/`SetHDRMetaData` failures present
-PQ as SDR gamma (crushed/dark). We lean entirely on OS auto-fallback.
-9. **Windows secure desktop drops HDR to SDR** on lock/UAC (`dxgi.rs:325-368`,
-`sudovda.rs:234-277`).
-10. **GameStream silently streams SDR** on a Moonlight HDR request (`mod.rs:48-56`,
-`rtsp.rs:288-293`) — logged, but no negotiated error. Real Apollo parity needs the Moonlight
-`SS_HDR_METADATA` blob on the **ENet control channel**, not just in-band.
-11. **Linux client software path is color-wrong even for SDR** — BT.601 applied to BT.709
-(`video.rs:162-167`, no `color_state` on the texture). Standalone bug.
-
-### Quality
-
-12. No per-content MaxCLL/MaxFALL (`GetDesc1` doesn't expose it). No encoder-CSC-range vs
-signaled-range reconciliation (black-crush risk). No automated 10-bit test — `probe` never
-even reads `Welcome.bit_depth` (`main.rs:396-406`).
-
-### Out of scope (call out, don't build)
-
-- Dynamic metadata: HDR10+ (ST.2094-40) and Dolby Vision RPU. We handle *static* ST.2086 only,
-with mid-stream changes carried by re-sending the static block (below).
-- HLG: the colorimetry transfer enum carries `18` from day one (free), but the `0xCE` mastering
-datagram is **omitted for HLG** (scene-referred, no mastering metadata).
-
----
-
-## Protocol design (the keystone — pure-additive, hardware-free, CI-testable)
+### Wire format — design decisions worth keeping

 Two layers, both back-compat-safe via the established trailing-bytes / new-datagram-tag patterns.
-### (A) Per-session colorimetry — 4 trailing bytes on `Welcome`
+- **(A) Per-session colorimetry** — 4 trailing bytes on `Welcome` (offsets 60..64):
+`colour_primaries` (1=BT.709, 9=BT.2020) · `transfer_characteristics` (1=BT.709, 16=PQ/SMPTE2084,
+18=HLG) · `matrix_coeffs` (1=BT.709, 9=BT.2020-NCL — **never emit 10 (CL): no client decodes it**) ·
+`video_full_range_flag`. Decoded with `b.get(60).unwrap_or(1)` so an older host that omits them →
+BT.709 limited SDR (today's behaviour). A future mirror on `Reconfigured` announces a mid-stream
+SDR↔HDR / BT.709↔BT.2020 flip (deferred; today a mode switch never changes colour, and `0xCE`
+re-send covers mastering changes).
+- **(B) Per-change mastering + CLL** — host→client datagram tag **`0xCE`**, 28 bytes, standard SEI
+fixed-point (display primaries G,B,R + white point in 1/50000 units; max/min mastering luminance in
+0.0001 cd/m²; MaxCLL/MaxFALL in nits). ST.2086 is variable, so it rides a datagram rather than the
+Welcome. **Re-sent on every IDR/RFI keyframe** so a client that dropped the best-effort datagram
+converges within a GOP; until first receipt the client uses the Welcome transfer + a documented
+generic default. **Bounds-check length before reading** (reassembler-bounds security invariant —
+truncation test required). **Omitted entirely for HLG.** Units map straight to DXGI
+`DXGI_HDR_METADATA_HDR10`, Android `KEY_HDR_STATIC_INFO`, Apple `CAEDRMetadata.hdr10`; the
+libavcodec/Linux side needs conversion — `AVMasteringDisplayMetadata` stores `AVRational`, not raw
+fixed-point.
After the existing `bit_depth` (offset 59), append a fixed 4-byte CICP block at offsets 60..64.
(A future mirror on `Reconfigured` will announce a mid-stream SDR↔HDR / BT.709↔BT.2020 flip on the
control stream we already use for renegotiation — deferred to Step 1 with the mid-stream-flip work;
today a mode switch never changes the colour, and the `0xCE` re-send covers mastering changes.)
```
[60] colour_primaries (CICP: 1=BT.709, 9=BT.2020)
[61] transfer_characteristics (1=BT.709, 16=PQ/SMPTE2084, 18=HLG)
[62] matrix_coeffs (1=BT.709, 9=BT.2020-NCL) ← never emit 10 (CL): no client decodes it
[63] video_full_range_flag (0=limited, 1=full)
```
Decode with `b.get(60).unwrap_or(1)` etc. — an older host omits them → BT.709 limited SDR
(today's behavior). `Welcome` stays `Copy`. Modeled as a `ColorInfo` struct on the wire types
and exposed on `NativeClient` (with `bit_depth`) so clients *know* the colorimetry instead of
inferring it.
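A minimal sketch of the back-compat decode described above, assuming the `Welcome` body is available as a byte slice. The struct layout, the method names, and the reading of `is_hdr` as "transfer is PQ or HLG" are illustrative assumptions, not the shipped `ColorInfo`. Note that `slice::get` returns `Option<&u8>`, so the sketch adds `.copied()` before `unwrap_or`.

```rust
/// Hypothetical mirror of the 4-byte CICP block at Welcome offsets 60..64.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ColorInfo {
    pub primaries: u8,    // 1 = BT.709, 9 = BT.2020
    pub transfer: u8,     // 1 = BT.709, 16 = PQ, 18 = HLG
    pub matrix: u8,       // 1 = BT.709, 9 = BT.2020-NCL
    pub full_range: bool, // false = limited, true = full
}

impl ColorInfo {
    /// Reads the trailing bytes if present. An older host that omits them
    /// decodes as BT.709 limited-range SDR.
    pub fn from_welcome(b: &[u8]) -> Self {
        ColorInfo {
            primaries: b.get(60).copied().unwrap_or(1),
            transfer: b.get(61).copied().unwrap_or(1),
            matrix: b.get(62).copied().unwrap_or(1),
            full_range: b.get(63).copied().unwrap_or(0) != 0,
        }
    }

    /// Assumption: "HDR" here means a PQ or HLG transfer function.
    pub fn is_hdr(&self) -> bool {
        self.transfer == 16 || self.transfer == 18
    }
}
```

Because every field has its own default, a `Welcome` truncated anywhere inside the block still decodes without panicking.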
### (B) Per-change mastering + CLL — a new host→client datagram, tag `0xCE`
ST.2086 is variable and changes mid-stream, so it rides a datagram (next tag after `0xCD`
HIDOUT), demuxed in `client.rs` like AUDIO/RUMBLE/HIDOUT. 28 bytes, standard SEI fixed-point:
```
[0] = 0xCE
G.x G.y B.x B.y R.x R.y 6 × u16 LE display primaries, 1/50000 units
wp.x wp.y 2 × u16 LE white point, 1/50000 units
max_display_mastering_luminance u32 LE 0.0001 cd/m²
min_display_mastering_luminance u32 LE 0.0001 cd/m²
max_cll u16 LE nits
max_fall u16 LE nits
```
- Sent on session start and whenever `GetDesc1`/source mastering changes; **re-sent on every
IDR/RFI keyframe** so a client that dropped the (best-effort) datagram converges within a GOP.
Until first receipt the client uses the Welcome transfer + a documented generic default.
- **Bounds-check length before reading** (reassembler-bounds security invariant) — truncation
test required.
- **Omitted entirely for HLG.**
- Units note: these map straight to DXGI `DXGI_HDR_METADATA_HDR10`, Android `KEY_HDR_STATIC_INFO`,
and Apple `CAEDRMetadata.hdr10`. On the **libavcodec/Linux** side they need conversion —
`AVMasteringDisplayMetadata` stores `AVRational`, not raw fixed-point.
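The layout and the bounds-check requirement above can be sketched as a small codec. This is an illustration under stated assumptions, not the shipped `HdrMeta`: the field names are invented, and the total length is read as one tag byte followed by the 28 bytes of fields (29 bytes on the wire).

```rust
pub const TAG_HDR_META: u8 = 0xCE;
/// Assumption: 1 tag byte + 28 bytes of fields.
pub const HDR_META_LEN: usize = 1 + 28;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HdrMeta {
    pub primaries_gbr: [[u16; 2]; 3], // G, B, R as (x, y), 1/50000 units
    pub white_point: [u16; 2],        // (x, y), 1/50000 units
    pub max_lum: u32,                 // 0.0001 cd/m²
    pub min_lum: u32,                 // 0.0001 cd/m²
    pub max_cll: u16,                 // nits
    pub max_fall: u16,                // nits
}

impl HdrMeta {
    pub fn encode(&self) -> [u8; HDR_META_LEN] {
        let mut out = [0u8; HDR_META_LEN];
        out[0] = TAG_HDR_META;
        let mut i = 1;
        // Six primary coordinates, then the white point, all u16 LE.
        for pair in self.primaries_gbr.iter().chain(std::iter::once(&self.white_point)) {
            for v in pair {
                out[i..i + 2].copy_from_slice(&v.to_le_bytes());
                i += 2;
            }
        }
        out[i..i + 4].copy_from_slice(&self.max_lum.to_le_bytes());
        out[i + 4..i + 8].copy_from_slice(&self.min_lum.to_le_bytes());
        out[i + 8..i + 10].copy_from_slice(&self.max_cll.to_le_bytes());
        out[i + 10..i + 12].copy_from_slice(&self.max_fall.to_le_bytes());
        out
    }

    /// Checks length and tag before touching any field, so a truncated
    /// datagram yields `None` instead of an out-of-bounds read.
    pub fn decode(d: &[u8]) -> Option<Self> {
        if d.len() < HDR_META_LEN || d[0] != TAG_HDR_META {
            return None;
        }
        let u16_at = |o: usize| u16::from_le_bytes([d[o], d[o + 1]]);
        let u32_at = |o: usize| u32::from_le_bytes([d[o], d[o + 1], d[o + 2], d[o + 3]]);
        Some(HdrMeta {
            primaries_gbr: [
                [u16_at(1), u16_at(3)],
                [u16_at(5), u16_at(7)],
                [u16_at(9), u16_at(11)],
            ],
            white_point: [u16_at(13), u16_at(15)],
            max_lum: u32_at(17),
            min_lum: u32_at(21),
            max_cll: u16_at(25),
            max_fall: u16_at(27),
        })
    }
}
```

A truncation test then reduces to feeding `decode` every prefix of a valid datagram and asserting that only the full length parses.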
### (C) C ABI
- `punktfunk_connect_ex5(... video_caps: u8)` (ex4 delegates with 0); **fix `abi.rs:896`.**
- `punktfunk_connection_next_hdr_meta(c, *mut PunktfunkHdrMeta, timeout_ms)` — new plane,
one-puller contract like `next_audio`.
- `punktfunk_connection_color_info(c, *mut prim, *mut trc, *mut matrix, *mut range, *mut bit_depth)`.
- Regenerate `include/punktfunk_core.h` (cbindgen); `struct_size`/repr(C) guards on new structs.
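One way the `struct_size`/`repr(C)` guard could look is sketched below. Only the type name `PunktfunkHdrMeta` comes from the list above; the field layout, the error codes, and the helper `fill_hdr_meta` are hypothetical. The assumed contract is that the caller sets `struct_size` to its own `sizeof` before the call and the library refuses to write into a smaller struct.

```rust
use std::mem::size_of;

/// Hypothetical C-visible layout; the real header may differ.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct PunktfunkHdrMeta {
    pub struct_size: u32, // set by the caller to sizeof(PunktfunkHdrMeta)
    pub primaries_gbr: [[u16; 2]; 3],
    pub white_point: [u16; 2],
    pub max_lum: u32,
    pub min_lum: u32,
    pub max_cll: u16,
    pub max_fall: u16,
}

// Compile-time guard: a field change that alters the size breaks the build.
const _: () = assert!(size_of::<PunktfunkHdrMeta>() == 32);

pub const OK: i32 = 0;
pub const ERR_NULL: i32 = -1;
pub const ERR_SIZE: i32 = -2;

/// Copies `src` into the caller's struct after validating pointer and size.
///
/// # Safety
/// `out` must be null or point to a readable and writable `PunktfunkHdrMeta`.
pub unsafe fn fill_hdr_meta(out: *mut PunktfunkHdrMeta, src: &PunktfunkHdrMeta) -> i32 {
    if out.is_null() {
        return ERR_NULL;
    }
    let caller_size = unsafe { (*out).struct_size } as usize;
    if caller_size < size_of::<PunktfunkHdrMeta>() {
        return ERR_SIZE; // caller was built against an older, smaller struct
    }
    let mut v = *src;
    v.struct_size = size_of::<PunktfunkHdrMeta>() as u32;
    unsafe { out.write(v) };
    OK
}
```

The size check runs before any write, so a client compiled against an older header gets an error code rather than a buffer overrun.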
---
## Phases
### Step 0 — Protocol + ABI carry color metadata end to end *(this change)*
The dominant cross-cutting blocker; everything else is downstream. No rendering changes, no
hardware, CI-testable.
- **core:** `ColorInfo` + 4 Welcome bytes; `HdrMeta` + `0xCE` codec (bounds-checked);
`NativeClient` `color`/`bit_depth` fields + HdrMeta receiver + demux + `next_hdr_meta`.
- **C ABI:** `connect_ex5`, `next_hdr_meta`, `color_info`, fix caps=0; regen header.
- **host:** populate `Welcome.color` from the negotiated bit-depth/HDR decision; send a `0xCE`
(generic default for now) when HDR is negotiated.
- **clients:** Windows/Android inherit the demux via shared core; Apple flips to `ex5`.
- **validation:** `quic.rs` round-trip + truncation + **SDR back-compat** tests; `probe` logs
`bit_depth` + colorimetry; loopback asserts a 10-bit Welcome carries trc=16 and a `0xCE` lands.
### Step 1 — Host emits correct in-band SEI + complete VUI on all codecs *(landed; RTX-validation pending)*
In-band SEI is read directly by decoders, so it fixes correctness even before clients consume
the protocol, and gives an Apollo/Moonlight on-glass parity gate.
- **Single source of truth:** the capturer learns the source display's mastering metadata and
exposes it via `Capturer::hdr_meta() -> Option<HdrMeta>`. The stream loop forwards it to the
encoder (`Encoder::set_hdr_meta` → in-band SEI) **and** the client (real `0xCE`, re-sent on each
keyframe). Pure byte-level logic (float→fixed conversion + the HEVC/H.264 SEI payloads) lives in
the unit-tested, cross-platform `src/hdr.rs` (`hdr_meta_from_display`, `hevc_mastering_display_sei`
type **137**, `hevc_content_light_level_sei` type **144** — note: NOT "type 4", that was a
drafting error).
- **Windows (done, CI-compiled / RTX on-glass pending):** `dxgi.rs` + `wgc.rs` read
`IDXGIOutput6::GetDesc1` at capture init / output change → `HdrMeta` (MaxCLL/MaxFALL left 0 —
GetDesc1 has none, like Apollo). `nvenc.rs` attaches the mastering + CLL SEI on every IDR for
HEVC/H.264. (AV1 mastering rides METADATA OBUs, not SEI — follow-up; AV1 `color_config` already
lands in Step 0's quick win.)
- **Linux encode-ready — DEFERRED into Step 4:** Linux capture is 8-bit only, so signalling
BT.2020 PQ + attaching mastering side-data on a downconverted 8-bit stream would be *incorrect*.
The libavcodec `side_data` path (with the `AVRational` conversion) lands together with the
8-bit→Main10 shim / true 10-bit capture in Step 4.
- **Windows secure-desktop relay** (`virtual_stream_relay`) still sends only the generic baseline
`0xCE`; the helper's in-band SEI carries the real grade. Wiring the relay's `0xCE` is a follow-up.
- **validation (RTX box):** `ffprobe -show_frames` shows mastering + CLL side-data with the
display's real luminance and VUI 9/16/9; stock Moonlight shows correct (not washed-out) HDR.
Add **encoder-CSC-range == signaled-range** check.
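The float→fixed conversion and the two SEI payload bodies mentioned above can be sketched as follows. The function names are illustrative and are not the ones in `src/hdr.rs`; the sketch builds only the payload bytes (big-endian, G,B,R order) and leaves the NAL header, payload type/size bytes, and emulation prevention to the encoder.

```rust
pub const SEI_MASTERING_DISPLAY_COLOUR_VOLUME: u8 = 137;
pub const SEI_CONTENT_LIGHT_LEVEL_INFO: u8 = 144;

/// CIE 1931 chromaticity coordinate (0.0..=1.0) to 1/50000 units.
pub fn chroma_to_fixed(v: f32) -> u16 {
    (v.clamp(0.0, 1.0) * 50_000.0).round() as u16
}

/// Luminance in cd/m² to 0.0001 cd/m² units.
pub fn nits_to_fixed(nits: f32) -> u32 {
    (nits.max(0.0) * 10_000.0).round() as u32
}

/// 24-byte mastering display colour volume payload: G, B, R primaries and
/// white point as u16 BE, then max and min luminance as u32 BE.
pub fn mastering_display_payload(
    gbr: [[u16; 2]; 3],
    white_point: [u16; 2],
    max_lum: u32,
    min_lum: u32,
) -> [u8; 24] {
    let mut out = [0u8; 24];
    let coords = gbr.iter().chain(std::iter::once(&white_point)).flatten();
    for (i, v) in coords.enumerate() {
        out[i * 2..i * 2 + 2].copy_from_slice(&v.to_be_bytes());
    }
    out[16..20].copy_from_slice(&max_lum.to_be_bytes());
    out[20..24].copy_from_slice(&min_lum.to_be_bytes());
    out
}

/// 4-byte content light level payload: MaxCLL then MaxFALL, u16 BE, in nits.
pub fn content_light_level_payload(max_cll: u16, max_fall: u16) -> [u8; 4] {
    let mut out = [0u8; 4];
    out[..2].copy_from_slice(&max_cll.to_be_bytes());
    out[2..].copy_from_slice(&max_fall.to_be_bytes());
    out
}
```

Keeping this logic free of platform types is what lets it be unit-tested on any CI runner, independent of NVENC or a real HDR display.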
### Step 2 — Clients apply the metadata *(landed; CI/on-glass validation pending)*
All three clients now drain the protocol's `HdrMeta` (`next_hdr_meta` / `nextHdrMeta`) and apply it,
each remapping from the wire form (ST.2086 G,B,R order, mastering luminance in 0.0001 cd/m²) to the
platform's expected layout:
- **Windows (Rust, CI-compiled):** session pump drains `next_hdr_meta` into a `LATEST_HDR_META`
slot; `present_newest` applies it via `Presenter::set_hdr_metadata` → real `SetHDRMetaData`
(`hdr_meta_to_dxgi`: G,B,R→R,G,B reorder, 0.0001-nit→nit for `MaxMasteringLuminance`), dropping
the 1000/1000/400 hardcode. `SetColorSpace1`/`SetHDRMetaData` failures + an SDR-display
colour-space rejection are now **logged**, not swallowed.
- **Apple (Swift, mac-runner CI):** connect now advertises caps via `punktfunk_connect_ex5`
(`SessionModel` computes `videoCap10Bit|videoCapHDR` from `hdrEnabled`) — *this is the fix that
resurrects Apple's previously-dead HDR pipeline*. `nextHdrMeta`/`colorInfo` wrappers added; the
pump drains `nextHdrMeta` → `VideoDecoder.setHdrMeta` → `CVBufferSetAttachment` of
`kCVImageBufferMasteringDisplayColorVolumeKey` (24-byte BE SEI) +
`kCVImageBufferContentLightLevelInfoKey` (4-byte BE) on each HDR pixel buffer (the correct path
for the itur_2100_PQ layer; `CAEDRMetadata` on a PQ layer is ambiguous and was avoided).
- **Android (Rust `decode.rs`, cargo-ndk verified):** when `client.color.is_hdr()`, drain the first
`next_hdr_meta` and set `MediaFormat` `hdr-static-info` (`KEY_HDR_STATIC_INFO`) before
`configure()`; `android_hdr_static_info` builds the 25-byte CTA-861.3 Type-1 blob (LE, **R,G,B**
order, max-lum in **nits-u16**). `Display.getHdrCapabilities` gate deferred (the Surface DataSpace
already drives SurfaceFlinger tone-mapping on non-HDR displays).
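The per-platform remaps above are mostly a reorder plus a unit change. The sketch below shows the two byte-level ones under stated assumptions: `Hdr10Layout` is a stand-in that mirrors the fields of `DXGI_HDR_METADATA_HDR10` without depending on the Windows bindings, the function names are hypothetical (they are not `hdr_meta_to_dxgi` or `android_hdr_static_info`), and the Android minimum-luminance field is assumed to stay in 0.0001 cd/m² units, clamped to `u16`.

```rust
/// Wire form: G, B, R order, luminance in 0.0001 cd/m².
#[derive(Debug, Clone, Copy)]
pub struct HdrMeta {
    pub primaries_gbr: [[u16; 2]; 3],
    pub white_point: [u16; 2],
    pub max_lum: u32,
    pub min_lum: u32,
    pub max_cll: u16,
    pub max_fall: u16,
}

/// Stand-in for DXGI_HDR_METADATA_HDR10 (R, G, B order).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hdr10Layout {
    pub red: [u16; 2],
    pub green: [u16; 2],
    pub blue: [u16; 2],
    pub white_point: [u16; 2],
    pub max_mastering_luminance: u32, // whole nits
    pub min_mastering_luminance: u32, // 0.0001 nits
    pub max_cll: u16,
    pub max_fall: u16,
}

pub fn dxgi_layout_sketch(m: &HdrMeta) -> Hdr10Layout {
    let [g, b, r] = m.primaries_gbr;
    Hdr10Layout {
        red: r,
        green: g,
        blue: b,
        white_point: m.white_point,
        max_mastering_luminance: m.max_lum / 10_000, // 0.0001 nit -> nit
        min_mastering_luminance: m.min_lum,
        max_cll: m.max_cll,
        max_fall: m.max_fall,
    }
}

/// 25-byte static-info blob: descriptor id 0, then twelve u16 LE fields.
pub fn android_blob_sketch(m: &HdrMeta) -> [u8; 25] {
    let [g, b, r] = m.primaries_gbr;
    let wp = m.white_point;
    let clamp16 = |v: u32| v.min(u16::MAX as u32) as u16;
    let fields: [u16; 12] = [
        r[0], r[1], g[0], g[1], b[0], b[1], wp[0], wp[1],
        clamp16(m.max_lum / 10_000), // max luminance in whole nits
        clamp16(m.min_lum),          // assumption: stays in 0.0001 cd/m²
        m.max_cll,
        m.max_fall,
    ];
    let mut out = [0u8; 25]; // out[0] = 0 is the Type-1 descriptor id
    for (i, v) in fields.iter().enumerate() {
        out[1 + i * 2..3 + i * 2].copy_from_slice(&v.to_le_bytes());
    }
    out
}
```

Both functions consume the same wire struct, so a single decoded `0xCE` datagram can feed every platform without re-parsing.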
### Step 3 — Display-capability gate *(landed; CI/on-glass validation pending)*
The common-case correctness step — most client displays are SDR. **Chosen approach: capability-gate**
(not an in-shader BT.2390 tone-map). Rationale: with Steps 1–2 the host sends *correct* mastering
metadata, so an HDR display self-tone-maps from it; the real remaining gap is SDR displays, best
fixed by **not advertising HDR you can't present** — the host then sends a proper BT.709 SDR stream
instead of PQ the panel would mis-tone-map (washed-out/dark). No guessed tone-map curve, deterministic.
- **Windows** (`present::display_supports_hdr` via DXGI: any `IDXGIOutput6` colour space ==
`G2084`): `session.rs` ANDs it with the user's HDR setting before advertising caps; logs when it
drops to SDR.
- **Apple** (`SessionModel`, main-actor): `NSScreen.maximumExtendedDynamicRangeColorComponentValue
> 1` (macOS) / `UIScreen.main.potentialEDRHeadroom > 1` (iOS) ANDed with `hdrEnabled`.
- **Android** (`Settings.displaySupportsHdr` via `Display.getHdrCapabilities` HDR10/HDR10+): Kotlin
passes it to `nativeConnect`; `session.rs` gates the caps on the new `hdr_enabled` jboolean
(cargo-ndk-verified).
- **Deferred** (need on-glass / the RTX box): the **mid-session `Reconfigure` "downgrade to SDR"**
for a monitor move HDR↔SDR; and confirming the **host produces SDR for an SDR client even off an
HDR desktop** — on the native path the per-session SudoVDA follows the negotiated depth (SDR
client → SDR virtual display → SDR stream), so it should hold end-to-end; verify the
stale-HDR-SudoVDA edge case on the RTX box.
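All three gates above reduce to the same decision: advertise HDR only when the user asked for it and the display can present it. A minimal sketch follows, assuming hypothetical bit values for the capability flags (the real `video_caps` encoding is not shown in this document).

```rust
/// Hypothetical bit assignments, for illustration only.
pub const VIDEO_CAP_10BIT: u8 = 1 << 0;
pub const VIDEO_CAP_HDR: u8 = 1 << 1;

/// Returns the caps byte to advertise at connect time.
pub fn advertised_video_caps(user_hdr_enabled: bool, display_supports_hdr: bool) -> u8 {
    if user_hdr_enabled && display_supports_hdr {
        VIDEO_CAP_10BIT | VIDEO_CAP_HDR
    } else {
        if user_hdr_enabled {
            // Make the downgrade observable instead of silent.
            eprintln!("HDR requested but the display is SDR; advertising SDR caps");
        }
        0
    }
}
```

With caps of 0 the host negotiates a BT.709 SDR stream, which is the deterministic outcome the capability-gate approach is after.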
### Step 4 — Linux (last; capture blocked upstream)
- **8-bit→Main10 NVENC upconvert shim** (`encode/linux.rs`) — Main10 transport with correct
VUI/SEI without HDR capture (gate so we don't claim HDR transfer on SDR content).
- **Linux encode color + side-data (the deferred Step 1c):** set
`color_primaries/trc/colorspace/range` from the negotiated `ColorInfo` and attach
`AV_FRAME_DATA_MASTERING_DISPLAY_METADATA` / `CONTENT_LIGHT_LEVEL` side-data (with the `AVRational`
conversion) in `encode/linux.rs` + `vaapi.rs` — only once the encoder actually produces 10-bit, so
the signalling matches the bits. (Linux capture is 8-bit only, so signalling BT.2020 PQ + attaching
mastering side-data on a downconverted 8-bit stream would be *incorrect* — hence deferred out of
Step 1.)
- **(4b) True 10-bit capture:** offer `ABGR2101010`/`P010` PipeWire formats + read colorimetry; pilot
on Sway/wlroots; **blocked on gamescope #2126** (portals don't wire PipeWire 1.6 BT.2020/PQ).
**Don't block the rest of the plan on it.**
- **(4c) Linux client:** `ex5` caps, P010 decode, GdkDmabufTexture CICP from Welcome,
`wp_color_management` when GTK ≥ 4.14. (Also a standalone SDR bug: software path applies BT.601 to
BT.709 — needs a BT.601→BT.709 sws + texture `color_state`.)
## Quick wins (independent, land in parallel)
1. `connect_ex5` + fix `abi.rs:896` — resurrects Apple's pipeline *(Step 0)*.
2. H.264 VUI + AV1 `color_config` on `nvenc.rs` — closes two latent blockers *(Windows-only,
validated in CI / on the RTX box)*.
3. `probe` logs `bit_depth` + colorimetry — observability for every later round-trip assertion.
4. Linux client BT.601→BT.709 sws + texture `color_state` — standalone SDR correctness bug.
5. GameStream silent-downgrade already warns (`rtsp.rs:289`) — keep observable.
## Deferred validation (need on-glass / the RTX box)
- The mid-session `Reconfigure` "downgrade to SDR" for a monitor move HDR↔SDR.
- Confirm the **host produces SDR for an SDR client even off an HDR desktop** — on the native path the
per-session SudoVDA follows the negotiated depth (SDR client → SDR virtual display → SDR stream), so
it should hold end to end; verify the stale-HDR-SudoVDA edge case.
## Open questions
- **MaxCLL source:** `GetDesc1` doesn't expose it (Apollo zeroes). Static default, or measure
per-frame peak in the PQ shader (only truly-correct, adds a readback)?
- **GameStream:** implement `SS_HDR_METADATA` (Moonlight `SS_HDR_METADATA` blob on the ENet control
channel) for parity, or keep it deliberately SDR and steer HDR users to punktfunk/1?
- **HLG:** carry the enum from day one (free) — but do any sources actually produce HLG?
- **Linux:** is shipping the 8-bit→Main10 shim as "HDR-capable transport" acceptable, or does it risk
advertising HDR we can't truly deliver?
## Ordering rationale
Step 0 first: it's the keystone (metadata transport is the dominant cross-cutter; the ABI
`video_caps = 0` line is a one-line root cause) and needs no hardware. Step 1 next: in-band SEI is
read directly by decoders, so it fixes correctness even before our clients consume the protocol, and
gives an Apollo-parity on-glass gate. Steps 2–3 are mechanical per-client wiring once metadata flows.
Linux is last because capture is gated on upstream we don't control; the shim delivers Main10
transport without that dependency.
Hardware dependencies: Step 0 = none (CI); Step 1 = RTX Windows host; Steps 2–3 = a real HDR display
per platform; Step 4 = a Linux GPU box + HDR-capable Wayland compositor.
+82 -136
@@ -1,13 +1,18 @@
# Host latency & the GPU-contention collapse — analysis + prioritized plan
> **Status:** PARTLY SHIPPED. Tier 2A (Linux NV12 convert) = `1fc6f73`; Tier 2B (Linux
> scheduling) + Tier 3A (Windows session tuning) = `112a054`. Tiers 1A, 1B, 3B, 3C, 3D, 4 are
> still open. This doc is trimmed to design rationale + open items; the shipped code is the
> source of truth for the landed tiers.
> **⚠ Partially superseded (2026-06-25) by [`gpu-contention-investigation.md`](gpu-contention-investigation.md).**
> That follow-up re-verified this plan against the current code and overturned several specifics:
> the default Windows path (IDD-push) now feeds NVENC **RGB** (regressing the §0A "Windows does it
> right" claim); `PUNKTFUNK_ENCODE_DEPTH` never existed (phantom knob); the "async NVENC stacks
> latency" result was a *same-thread* implementation, not a disproof of a correct two-thread pipeline;
> "capture sees half the frames" is DLSS-Frame-Gen-specific, not general; and NvFBC is dead on
> Windows. **For current action prioritization see `gpu-contention-investigation.md`.** The
> tiers/dropped-placebo analysis below remain a useful record.
Scope: Windows + Linux GameStream/punktfunk1 hosts. Priority: **latency**, and specifically the
"saturating game starves the stream" headache:
@@ -21,64 +26,19 @@ This doc is the synthesis of a multi-agent investigation (deep read of our pipel
separate real levers from placebo. The "Dropped / why" section exists so we don't re-propose the
placebos.
## Implementation status (2026-06-18)
- **Tier 2B — Linux scheduling hygiene**: landed. `boost_thread_priority` now nices the
capture/encode + send threads on Linux (`setpriority`, best-effort) and its wrong gamescope
doc-comment is fixed; CUDA context uses `CU_CTX_SCHED_BLOCKING_SYNC`; copies run on a per-thread
highest-priority CUDA stream (`cuStreamCreateWithPriority`, graceful NULL-stream fallback) with a
per-stream sync that no longer blocks on the other worker thread's work. Builds + clippy + fmt
green. The stream-priority hint is **measure-then-keep** (NVIDIA Linux may ignore it).
- **Tier 3A — Windows session tuning**: landed (`session_tuning.rs`, raw C-ABI FFI, no-op off
Windows). Each capture/encode/send thread now applies process-wide tuning once (1 ms timer,
`DwmEnableMMCSS`, `HIGH_PRIORITY_CLASS`) and per-thread MMCSS "Games" + keep-display-awake. Wired
into both the native (`boost_thread_priority`) and GameStream (`stream.rs`) paths. Linux no-op
path builds green; the **FFI was validated on the real MSVC toolchain** (standalone probe compiled,
linked against winmm/kernel32/dwmapi/avrt, and ran — timer/priority/MMCSS all succeed).
- **Tier 2A — Linux NV12 convert**: landed, gated behind `PUNKTFUNK_NV12` (default OFF → the
RGB/BGRx path is byte-for-byte unchanged). The tiled EGL/GL path produces NV12 (BT.709 limited) on
the GPU and feeds NVENC native YUV, deleting NVENC's internal RGB→YUV CSC off the contended SM.
**Validated on an RTX 5070 Ti two ways**: (1) `nv12-selftest` — synthetic RGBA→NV12 round-trip vs a
BT.709 reference, max abs error Y=0.56 / U=0.33 / V=0.26 LSB; (2) live `capture→NV12→NVENC→decode`
of animated content matches the RGB path's colour (avg RGB 230,18,18 vs 231,18,20 — no green-screen,
correct matrix + VUI). LINEAR/Vulkan-bridge (gamescope) path stays RGB. Next: glass-to-glass
latency + fps-under-saturation A/B on a real game (the Tier-0 measurement) before flipping default.
---
## Mental model (§0A–0C) — see the follow-up
The original three-correction mental model (A: feeding NVENC RGB is backwards; B: GPU priority is
maxed on Windows and hits a preemption-granularity ceiling; C: a chunk of the collapse is upstream
of the encoder at the compositor compose-rate, with Independent/Direct Flip bypassing DWM) is
**partly corrected by `gpu-contention-investigation.md` §1** — notably that the default Windows
IDD-push path now feeds NVENC RGB (so §0A's "Windows already does the right thing" no longer holds),
and "capture sees half the frames" is DLSS-Frame-Gen-specific rather than general. Read the
follow-up doc for the corrected model. The durable takeaways still stand: **do less work on the
contended graphics/3D engine**, **overlap the unavoidable per-frame scheduling wait across frames**,
and **measure source-vs-pipeline before blaming encode**.
## 0. Three corrections to the mental model (read first)
**(A) "Feed NVENC RGB so the ASIC does the colour-convert" is backwards.**
NVENC's encode core is YUV-native. RGB input makes the driver insert an **RGB→YUV CSC on the
SM/3D-compute cores** — the *exact* engine a game saturates. Windows already does the right thing:
`convert_to_yuv` runs the CSC on the dedicated **VIDEO engine** via `VideoProcessorBlt`
(`capture/dxgi.rs:1023,1063`), logged as "0% 3D". **Linux still feeds NVENC RGB**
(`encode/linux.rs:98 nvenc_input` → `RGBZ`/`BGRZ`; the `zerocopy/egl.rs:98` shader is a `.bgra`
*swizzle*, not a CSC), so it pays NVENC's internal CSC on the SM every frame. That is the single
biggest, clearly-fixable contention source on Linux, and Windows already eliminated it.
**(B) "More GPU priority so our frames get through" is already maxed on Windows, and hits a
hardware ceiling.** We ship `D3DKMTSetProcessSchedulingPriorityClass=HIGH(4)` +
`SetGPUThreadPriority(0x4000001E)` + `SetMaximumFrameLatency(1)` (`capture/dxgi.rs:160-263`). The
residual ~20 ms `lock_bitstream` wall (documented at `dxgi.rs:155`) is GPU **context-scheduling
latency**, bounded by **preemption granularity**: NVIDIA preempts *compute* at instruction level
(~0.1 ms) but *graphics* only at coarse draw/tile/DMA-buffer boundaries (milliseconds out under a
draw flood). No priority class preempts an in-flight game draw. So the winning strategy is **not
more priority** — it is (1) do **less work on the contended graphics/3D engine**, and (2) **overlap
the unavoidable per-frame scheduling wait across frames** to recover throughput.
**(C) A chunk of the collapse is upstream of our encoder — no encode/priority fix can beat it.**
DXGI Desktop Duplication *and* WGC both capture **from the DWM compositor**, so captured fps is
hard-ceilinged at the **compose rate**, never the game's 400 fps. Under saturation the *compositor
itself* is scheduled late → composes fewer unique frames → we starve even though NVENC is idle. And
borderless/fullscreen games on **Independent/Direct Flip** present straight to scanout, *bypassing
DWM*, so capture sees ~half the frames (this is the "200 not 240"). The host already paces at
`target_fps` and **re-encodes held frames**, so *transmitted* fps stays ~240 while *unique* fps
collapses. **This must be measured before blaming encode.**
> Net: Windows is already near best-in-class (priority + video-engine CSC + encode|send split all
> shipped); its remaining wins are narrow and partly a hardware/compositor ceiling. **Linux is the
> least-hardened host and holds most of the headroom.**
---
@@ -105,7 +65,7 @@ headless dev VM cannot reproduce the contention.
---
## Tier 1 — The two under-weighted, cross-platform levers (OPEN — confirmed by research, not yet done)
### 1A. Capture-source / compose-rate cadence (where "200 not 240" actually lives)
The capture ceiling is the compositor's compose rate, and under load the compositor gets starved.
@@ -146,50 +106,23 @@ collapse (at 100% util the clock is already maxed). Low cost.
---
## Tier 2 — Linux work-deletion + scheduling hygiene
### 2A. Linux NV12 convert — **SHIPPED (`1fc6f73`)**
GL de-tile blit emits NV12 (BT.709 limited) on the GPU and feeds NVENC native YUV, deleting NVENC's
internal RGB→YUV CSC off the contended SM. Gated `PUNKTFUNK_NV12` (default OFF). Tiled EGL/GL path
only; LINEAR/Vulkan-bridge (gamescope) stays RGB. Validated colour-correct on RTX 5070 Ti. Open
follow-up: glass-to-glass latency + CS2 fps-under-saturation A/B before flipping the default, and
the **P010** variant for the HDR/10-bit path. Code is the source of truth (`zerocopy/egl.rs`,
`encode/linux.rs`).
### 2A. Produce **NV12/P010** on Linux and feed it to NVENC native (delete the SM-side CSC)
The strictly-correct version (verified): **extend the existing GL de-tile blit
(`zerocopy/egl.rs`) to emit NV12** instead of swizzled BGRx — multi-render-target (GL_R8 luma
full-res + GL_RG8 chroma half-res, or two passes) applying an **explicit BT.709 limited-range
matrix matching the Windows `VideoConverter`** (`dxgi.rs:957`) so hosts look identical — then
register `NV_ENC_BUFFER_FORMAT_NV12` with the encoder (teach `encode/linux.rs:98 nvenc_input` an
NV12 case; `CudaHw sw_format` → `AV_PIX_FMT_NV12`).
- Net: today = GL swizzle (3D) **+** NVENC-internal CSC (SM); after = GL CSC (3D, ~same cost as the
swizzle it replaces) **+ zero NVENC CSC**. Removes one whole CSC pass and removes it from the SM.
- **Do *not* implement this as a standalone CUDA convert kernel on the tiled path** — CUDA can't
sample a tiled NVIDIA surface (`cuGraphicsEGLRegisterImage` is Tegra-only, `egl.rs:6-12`), so it
would still need the GL detile, *and* a CUDA kernel runs on the same saturated SM. The CUDA-kernel
route is only clean on the **LINEAR/Vulkan-bridge (gamescope)** path, where it doubles as the NV12
producer; do it there if/when that path needs it.
- Pitfalls: pervasive 4-byte-pixel assumptions break with NV12 — `cuda.rs` hardcodes
`WidthInBytes = width*4` (`:363,392,499`), `BufferPool`/`alloc_pitched` assume 4 B/px, GL dst is
`GL_RGBA8`; all need a plane-aware NV12 variant (luma W·H + chroma W·H/2, two-plane copy) or you
get the Steam-Deck green-screen class of bug. The HDR/10-bit path needs P010, not NV12.
- Impact: real, **modest, compounding** — a few ms of per-frame GPU time and a shorter time-slice
need, which stacks with cadence + power-pin. **Not** a standalone cure for the 240→40 collapse
(external "47→100 fps" numbers are other people's non-zero-copy pipelines; don't promise them).
Medium cost. Gate behind a `PUNKTFUNK_*` env and A/B `cap→encoded` p50 + the CS2 fps floor.
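For reference, the BT.709 limited-range matrix and the NV12 plane sizing discussed above can be written as a CPU-side sketch. This is not the GL shader in `zerocopy/egl.rs`; it is a plain reference implementation of the standard 8-bit BT.709 limited-range equations that a self-test could compare against, with hypothetical function names and the assumption of even frame dimensions.

```rust
/// Converts one 8-bit full-range RGB pixel to 8-bit BT.709 limited-range Y'CbCr.
pub fn rgb_to_ycbcr_bt709_limited(r: u8, g: u8, b: u8) -> (u8, u8, u8) {
    let (r, g, b) = (r as f32 / 255.0, g as f32 / 255.0, b as f32 / 255.0);
    // BT.709 luma coefficients.
    let y = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    // Colour-difference signals scaled to -0.5..=0.5.
    let cb = (b - y) / 1.8556;
    let cr = (r - y) / 1.5748;
    let q = |v: f32| v.round().clamp(0.0, 255.0) as u8;
    // Limited range: Y in 16..=235, Cb/Cr in 16..=240 centred on 128.
    (q(16.0 + 219.0 * y), q(128.0 + 224.0 * cb), q(128.0 + 224.0 * cr))
}

/// NV12 byte counts for an unpadded frame with even width and height:
/// a full-resolution luma plane and a half-height interleaved CbCr plane.
pub fn nv12_plane_bytes(width: usize, height: usize) -> (usize, usize) {
    (width * height, width * height / 2)
}
```

The plane sizes make the 4-bytes-per-pixel pitfall concrete: a 1920×1080 NV12 frame needs 2,073,600 luma bytes plus 1,036,800 chroma bytes, not `width * 4` per row.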
### 2B. Linux scheduling hygiene — **SHIPPED (`112a054`)**
`boost_thread_priority` nices capture/encode/send on Linux (best-effort `setpriority`);
CUDA context uses `CU_CTX_SCHED_BLOCKING_SYNC`; copies run on a per-thread highest-priority CUDA
stream (`cuStreamCreateWithPriority`, NULL-stream fallback). The stream-priority hint is
**measure-then-keep** (NVIDIA Linux may ignore it). **Do not** default to SCHED_RR/FIFO (can starve
the compositor + the game's render thread); opt-in only behind `PUNKTFUNK_SCHED_RR=1`. Code is the
source of truth (`punktfunk1.rs`).
### 2B. Linux scheduling hygiene (cheap; the priority bits are "measure-then-keep")
Consolidates the genuine parts of several candidates. Mostly unambiguous cleanups + opt-in
priority:
- **Arm the Linux `boost_thread_priority` no-op** (`punktfunk1.rs:1856` cfg branch): best-effort
`libc::setpriority(PRIO_PROCESS, 0, -10/-5)` on the calling thread (tid 0 = self), log-and-continue
on EPERM. **Do not** default to SCHED_RR/FIFO (can starve the compositor and the game's render
thread — the user refuses to add game frame-time); offer it only behind `PUNKTFUNK_SCHED_RR=1`.
**Fix the wrong doc-comment** at `punktfunk1.rs:1834-1835` ("the Linux host caps the game via
gamescope, so its threads aren't starved") — false for the uncapped/NVIDIA-direct path.
- **Set CUDA context scheduling deliberately**: `cuCtxCreate` flag `CU_CTX_SCHED_BLOCKING_SYNC` on
this shared VM (frees a core vs the default AUTO/SPIN) — a CPU-efficiency fix, not throughput.
- **High-priority CUDA stream + EGL context priority** (the missing analogue of the Windows
hardening): `cuStreamCreateWithPriority(highest from cuCtxGetStreamPriorityRange)` for our copies;
request `EGL_IMG_context_priority HIGH` (try `EGL_NV_context_priority_realtime`) at
`egl.rs:332`. **Caveat, load-bearing**: these are intra-process *hints* and NVIDIA's Linux driver
has been reported to **ignore** context priority (driver 545: high- vs low-priority EGL contexts
measured identical) and to **deny** realtime Vulkan queues. Implement with graceful fallback,
gate behind env, and **measure on driver 595** — do not architect around it or credit it before
measurement.
> Explicitly **not** doing on Linux: Vulkan `VK_EXT_global_priority` as "the" lever (it only touches
> the minority gamescope/LINEAR copy, not the convert; likely a silent no-op on consumer NVIDIA).
@@ -201,36 +134,41 @@ priority:
## Tier 3 — Windows parity polish (Windows is already strong)
- **3A. Host-process session tuning (we have *zero* today — verified):** `NtSetTimerResolution(0.5ms)`
/ `timeBeginPeriod(1)` (default 15.6 ms granularity blocks precise pacing), `DwmEnableMMCSS(true)`,
`SetPriorityClass(HIGH_PRIORITY_CLASS)`, MMCSS-register the capture/encode threads ("Games"/"Pro
Audio"), `SetThreadExecutionState(ES_CONTINUOUS|ES_DISPLAY_REQUIRED)`. All revert on stop.
Foundational for any precise frame pacing and the encode|send split. Low cost, low risk.
(`gamestream/stream.rs` start/stop; Apollo's `streaming_will_start`/`_stopped`.)
### 3A. Host-process session tuning — **SHIPPED (`112a054`)**
`session_tuning.rs` (raw C-ABI FFI, no-op off Windows): each capture/encode/send thread applies
process-wide tuning once (1 ms timer, `DwmEnableMMCSS`, `HIGH_PRIORITY_CLASS`) + per-thread MMCSS
"Games" + keep-display-awake; reverts on stop. Wired into both native (`boost_thread_priority`) and
GameStream (`stream.rs`) paths. FFI validated on the real MSVC toolchain.
### 3B. Auto-gated REALTIME D3DKMT class (OPEN)
Instead of fixed HIGH (the realtime opt-in already exists at `dxgi.rs:199-207`): probe HAGS
(`D3DKMTQueryAdapterInfo` `HwSchEnabled`) **and** VRAM headroom (`IDXGIAdapter3::QueryVideoMemoryInfo`,
continuously), allow REALTIME(5) only when safe (HAGS off, or HAGS on + VRAM comfortably below
budget), downgrade to HIGH the moment VRAM pressure rises — Sunshine's actual gate avoids the
HAGS+near-full-VRAM NVENC freeze/crash. Marginal (one scheduling rung, same preemption ceiling), so
rank it as cheap parity, not a fix.
### 3C. `VideoProcessorBlt` directly from the DDA surface (OPEN — cheap experiment)
Skip the same-format `gpu_copy` at `dxgi.rs:2375`, then `ReleaseFrame`, *iff* it doesn't
re-serialize `AcquireNextFrame` (the existing decouple-copy was measured 40-200 fps vs ~60 fps, but
that note predates confirming the Blt is on the video engine). One-line source-texture change;
benchmark only. Do **not** build a D3D11↔D3D12 copy-queue offload — the convert is already off-3D,
the remaining copy is intra-VRAM (~5% 3D, no PCIe), not worth the interop rebuild.
### 3D. Async NVENC + off-thread retrieve (OPEN — measure-gated, uncertain)
Today retrieve (`lock_bitstream`) runs **inline on the submit thread** (`nvenc.rs:524-558`), which
is *why* `depth>1` was measured to regress (`wgc_helper.rs:111-114`). The NVENC guide mandates
submit/retrieve on separate threads with completion events + a deep surface pool; doing that *could*
let per-frame scheduling waits **overlap across frames** and recover *throughput* — at a per-frame
*latency* cost
(depth × frame time). This is the one place the research and our own prior measurement disagree, so is *why* `depth>1` was measured to regress (`wgc_helper.rs:111-114`). The NVENC guide mandates
it is **strictly measure-first**, and it forecloses slice output (`reportSliceOffsets` needs submit/retrieve on separate threads with completion events + a deep surface pool; doing that *could*
`enableEncodeAsync=0`). Treat as a structural experiment, not a committed win. let per-frame scheduling waits **overlap across frames** and recover *throughput* — at a per-frame
*latency* cost (depth × frame time). This is the one place the research and our own prior
measurement disagree, so it is **strictly measure-first**, and it forecloses slice output
(`reportSliceOffsets` needs `enableEncodeAsync=0`). Treat as a structural experiment, not a
committed win. (The follow-up doc notes the prior "async stacks latency" result was a *same-thread*
implementation, not a disproof of a correct two-thread pipeline.)
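
The two-thread shape 3D describes can be sketched in plain Rust. This is a minimal illustration only: std channels stand in for the encoder, no NVENC API is called, and `run_pipeline`, `added_latency`, and the bounded queue standing in for the surface pool are hypothetical names, not code from this repo. It shows a submit thread running ahead of a retrieve thread by at most `depth` frames, and the matching worst-case latency cost of depth × frame time.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

/// Worst-case queueing latency added by `depth` in-flight frames
/// (the "depth x frame time" cost noted above).
fn added_latency(depth: u32, frame_time: Duration) -> Duration {
    frame_time * depth
}

/// Push `frames` frame indices through a submit thread and a retrieve thread
/// joined by a bounded queue of `depth` slots (a stand-in for the surface pool).
/// Returns the indices in the order the retrieve side observed them.
fn run_pipeline(frames: u32, depth: usize) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(depth);

    let submit = thread::spawn(move || {
        for i in 0..frames {
            // Blocks only once the queue already holds `depth` frames.
            tx.send(i).expect("retrieve thread hung up");
        }
        // `tx` is dropped here, which ends the retrieve loop.
    });

    // Retrieval runs off the submit thread, so waits can overlap across frames.
    let retrieve = thread::spawn(move || rx.iter().collect::<Vec<u32>>());

    submit.join().expect("submit thread panicked");
    retrieve.join().expect("retrieve thread panicked")
}
```

Under these assumptions, `depth` is the only knob trading throughput against latency, which is why the item stays measure-first: the sketch says nothing about how real scheduling waits behave on the GPU.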
--- ---
## Tier 4 — Deferred 2nd-order latency (not contention fixes; do after Tiers 0-2) ## Tier 4 — Deferred 2nd-order latency (OPEN — not contention fixes; do after Tiers 0-2)
- **GL2 — Intra-refresh for RFI/recovery** (`enableIntraRefresh` + recovery-point SEI) instead of a - **GL2 — Intra-refresh for RFI/recovery** (`enableIntraRefresh` + recovery-point SEI) instead of a
forced full-IDR: spreads a moving intra band across N frames, killing the 20-40× keyframe size forced full-IDR: spreads a moving intra band across N frames, killing the 20-40× keyframe size
@@ -264,16 +202,24 @@ priority:
--- ---
## Recommended order of attack ## Open items / What's left
1. **Tier 0 diagnose** on the real boxes — settles whether the collapse is source-ceiling or For current action prioritization see [`gpu-contention-investigation.md`](gpu-contention-investigation.md).
pipeline-starvation, and whether flip-bypass is halving capture. Still-open work tracked by this doc:
2. **Tier 2A (Linux NV12)** + **Tier 2B (Linux scheduling hygiene)** — the largest in-our-control
headroom; Linux is the least-hardened host. - **Tier 0** — run the `PUNKTFUNK_PERF=1` uniq-vs-fps + flip-mode diagnosis on the real-GPU boxes
3. **Tier 1B (clock/power pin)** both OSes — cheap, fixes the easy-scene 200-vs-240, crash-safe undo. (gate for everything below).
4. **Tier 1A (cadence/flip)** — gated on Tier 0 (this is where a big chunk of the collapse may live). - **Tier 1A** — capture-source / compose-rate cadence levers (ForceComposedFlip verify;
5. **Tier 3 (Windows polish)** — session tuning is the clear win; the rest is parity. `PUNKTFUNK_OUTPUT_HZ_MULTIPLIER` double-refresh; Reflex/render-queue=0 headroom).
6. **Tier 4** — only after the contention work; intra-refresh first, slice pipelining last. - **Tier 1B** — GPU clock/power pinning (`PUNKTFUNK_PIN_CLOCKS`; NvAPI per-app DRS on Windows w/
crash-safe undo; root-free CUDA-P2/persistence on Linux; default OFF on battery/Deck).
- **Tier 2A follow-up** — glass-to-glass + CS2-floor A/B before defaulting `PUNKTFUNK_NV12`, and the
**P010** HDR/10-bit variant.
- **Tier 3B** — auto-gated REALTIME D3DKMT class (HAGS + VRAM-headroom gate).
- **Tier 3C** — `VideoProcessorBlt` directly from the DDA surface (benchmark-only experiment).
- **Tier 3D** — correct async NVENC two-thread submit/retrieve pipeline (strictly measure-first).
- **Tier 4** — GL2 intra-refresh for RFI/recovery; GL1/GL6 sub-frame slice output + per-slice paced
send (paced-send half already shipped).
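
The Tier 3B gate listed above reduces to a small pure decision function. The sketch below is an assumption-laden illustration, not the planned implementation: `GpuPriority`, `choose_gpu_priority`, and the 90% headroom threshold are hypothetical, and the real inputs would come from the HAGS probe and the continuous VRAM query named in 3B.

```rust
/// GPU scheduling class the host would request (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GpuPriority {
    High,
    Realtime,
}

/// Assumed threshold: "comfortably below budget" means under 90% of it.
const VRAM_SAFE_PERCENT: u128 = 90;

/// Allow REALTIME only when it is safe: HAGS off, or HAGS on with VRAM headroom.
/// Intended to be re-evaluated continuously so pressure downgrades to HIGH.
fn choose_gpu_priority(hags_enabled: bool, vram_used: u64, vram_budget: u64) -> GpuPriority {
    if !hags_enabled {
        return GpuPriority::Realtime;
    }
    // Widen to u128 so the percentage math cannot overflow.
    let comfortable = vram_budget > 0
        && (vram_used as u128) * 100 < (vram_budget as u128) * VRAM_SAFE_PERCENT;
    if comfortable {
        GpuPriority::Realtime
    } else {
        GpuPriority::High
    }
}
```

A zero or unknown budget falls through to HIGH here, which errs toward the safe class when the query fails.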
Honest expectation: with the work-deletion + cadence + power-pin levers stacked, the easy-scene gap Honest expectation: with the work-deletion + cadence + power-pin levers stacked, the easy-scene gap
closes and the saturated floor rises, but a residual ceiling remains — at 100% GPU the game closes and the saturated floor rises, but a residual ceiling remains — at 100% GPU the game
+12 -69
@@ -6,6 +6,8 @@ description: "The full design: protocol core, milestones, and architecture."
*A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.* *A ground-up low-latency desktop streaming stack, built Linux-first, with a shared Rust protocol core and native clients per platform.*
> **Status:** SHIPPED — M0-M5 complete, M6 largely shipped. This is the project's canonical design doc; it is trimmed to the load-bearing design (thesis, scope, architecture, protocol strategy, C ABI, virtual-display orchestration, latency budget, risk register) plus still-open items. For current shipped-feature status see CLAUDE.md "Where the work stands"; for build/test/run, repo layout, and next actions see CLAUDE.md. Git history holds the full original milestone acceptance criteria.
> The name `punktfunk` fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point. > The name `punktfunk` fits the lowercase house style (`unom`, `played`, `remplir`) and reads as "glass-to-glass light," which is the whole point.
--- ---
@@ -31,7 +33,7 @@ Two concrete gaps justify a new project rather than another fork:
- Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core. - Native clients: Rust (Linux), Swift (macOS/iOS), Kotlin (Android) — all linking the same core.
**Explicit non-goals (at least at first):** **Explicit non-goals (at least at first):**
- Windows *host* support (Sunshine/Apollo already do this well; no gap to fill). - Windows *host* support (Sunshine/Apollo already do this well; no gap to fill). *(Note: this non-goal was later reversed — a Windows host shipped; see CLAUDE.md.)*
- Internet/NAT-traversal relay infrastructure (LAN/VPN first; lean on an existing mesh VPN such as Headscale/NetBird/Tailscale). - Internet/NAT-traversal relay infrastructure (LAN/VPN first; lean on an existing mesh VPN such as Headscale/NetBird/Tailscale).
- Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs). - Reinventing encoders/decoders (bind to FFmpeg + vendor SDKs; never rewrite codecs).
- A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6). - A bespoke compositor (drive existing ones; only consider a dedicated headless compositor as a *deployment mode*, see §6).
@@ -213,23 +215,18 @@ client: recv → core[reorder+FEC recover+jitter] → decode → present
--- ---
## 8. Milestones ## 8. Milestones — status
Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat as ordering, not deadlines. M0-M5 complete; M6 (feature surface) largely shipped. The original per-milestone acceptance criteria (M0 pipeline spike → M1 core+C ABI → M2 P1 host to stock Moonlight → M3 measurement harness → M4 P2 GF(2¹⁶) wall-breaker → M5 Apple client → M6 mic/HDR/per-client identity) are in git history. Live status — what is validated, what is partial — lives in CLAUDE.md "Where the work stands." The bet held: M2 (virtual-display streaming to stock Moonlight on Linux) shipped first as a complete, gap-filling release; the wall-breaking transport, native clients, and mic-done-right were unlocked from that position, resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.
**M0 — Pipeline spike (S).** wlroots headless output → PipeWire capture → VAAPI/NVENC encode → dump H.265 to a file that plays. *Acceptance:* a valid encoded file from a virtual output, no streaming yet. Proves the Linux capture+encode chain end-to-end. ### Open items (still in flight)
**M1 — `punktfunk-core` skeleton + C ABI (M).** Session lifecycle, GameStream-compatible packetization and GF(2⁸) FEC (P1), AES-GCM, `cbindgen` header, a tiny C test harness. *Acceptance:* core links from C; round-trips packets in a loopback test with simulated loss. - **Sub-frame pipelining**: overlap encode and transmit within a frame. Requires a direct NVENC SDK wrapper (libavcodec only emits whole AUs) — the next big latency lever (~24 ms at high res).
- **Apple stage-2 presenter as the default** (`VTDecompressionSession` + `CAMetalLayer`, live-validated behind the opt-in `punktfunk.presenter` flag at ~11 ms p50) after a few resolution/HDR checks, plus **iOS/iPadOS/tvOS variants**.
**M2 — P1 host: stream to stock Moonlight (L).** Wire M0's pipeline into the core; implement `serverinfo`/pairing/RTSP enough for a real Moonlight client to connect, with a KWin virtual output created on connect and destroyed on disconnect. Input via `reis`/uinput. *Acceptance:* **you play a game on your KDE box streamed to a stock Moonlight client on a virtual display, no dummy plug, no kernel args.** This is the shippable milestone and the project's reason to exist. - **Windows client on-glass validation**: D3D11VA zero-copy decode + HDR present + the WinUI GUI polish are written against the windows-rs/reactor APIs but not yet validated on a real display+GPU (the dev VM is headless/Session-0/WARP); needs the RTX box. Then RAWINPUT relative-mouse pointer-lock and a per-host speed test in the UI.
- **Android real-device validation**: gamepad rumble/HID feedback and HDR10 (Main10/BT.2020 PQ) live-verify; presenter/latency polish.
**M3 — Measurement harness (S).** Glass-to-glass latency measurement (on-screen QR/timestamp or photodiode), packet-loss injection, frame-pacing and stall metrics surfaced in the web UI. *Acceptance:* you can quantify a regression. Build this before optimizing anything. - **gamescope multi-user isolation**: per-session input/audio so concurrent sessions are independent desktops (§8b-2 peer-push approval from a paired device's own app is the related open protocol-growth item).
- **GameStream AV1 + surround audio live confirmation**: both are implemented and unit/live-capture tested but still need a live Moonlight confirmation (select AV1 in a stock client; a real 5.1/7.1 listen including FEC under loss).
**M4 — P2 transport: break the wall (L).** Add `punktfunk/1` negotiation; swap to `reed-solomon-simd` GF(2¹⁶) with multi-block per-frame framing; optional QUIC control/audio. Write a minimal **Rust** reference client (decode via VAAPI, present via wgpu/Vulkan) to exercise it. *Acceptance:* a stable stream above 1.4 Gbps at 5120×1440@240 with loss recovery working; latency unchanged vs. M2.
**M5 — Apple client (L).** Swift + VideoToolbox + Metal + SwiftUI, linking `punktfunk-core` via the C header. *Acceptance:* a Mac plays a stream at native resolution/refresh.
**M6 — Feature surface (M, ongoing).** Mic passthrough as a proper encrypted, per-client reverse audio stream (the thing the upstream PR got wrong); HDR signalling; per-client identity/permissions; pause/resume. *Acceptance:* feature parity with Apollo on the items you care about, plus mic done right.
--- ---
@@ -244,57 +241,3 @@ Sizing is rough and relative (Spike / S / M / L) for a focused solo dev; treat a
| Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step | | Frame pacing eats more time than expected | High | Med | M3 measurement harness first; treat pacing as a first-class subsystem, not a polish step |
| Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client | | Scope creep into a full Moonlight replacement | High | High | P1 (stock-client compat) is the firewall: it forces you to ship value before writing a client |
| Solo bandwidth vs. other projects | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone | | Solo bandwidth vs. other projects | High | Med | M2 is a complete, useful artifact on its own; the plan is safe to pause after any milestone |
---
## 10. Testing & measurement
- **Loopback correctness:** core encodes→FEC→loss-inject→recover→decode in-process; property tests over loss patterns and shard counts (proptest).
- **Glass-to-glass latency:** rendered timestamp/QR on host, read back on client capture; or a photodiode for true photons. Track p50/p99.
- **Loss resilience:** `tc netem` to inject loss/jitter/reorder; verify FEC recovery and graceful degradation.
- **Pacing:** log present timestamps vs. client vsync; alert on stalls and duplicate/dropped frames.
- **Soak:** multi-hour streams; watch for buffer growth, fd leaks, encoder session exhaustion.
- **Hardware matrix:** an NVIDIA box (NVENC), an AMD/Intel box (VAAPI), a Mac (VideoToolbox decode). Catch driver quirks early.
---
## 11. Repo / workspace structure
```
punktfunk/
├── Cargo.toml # workspace
├── crates/
│ ├── punktfunk-core/ # protocol, FEC, pacing, crypto — C ABI (cdylib + staticlib)
│ │ ├── src/abi.rs # #[no_mangle] extern "C" surface
│ │ ├── src/fec.rs # GF(2^16) blocking over reed-solomon-simd
│ │ ├── src/transport/ # udp+fec video, quinn control/audio
│ │ ├── src/protocol/ # gamestream-compat (P1) + punktfunk/1 (P2)
│ │ └── cbindgen.toml
│ ├── punktfunk-host/ # Linux host binary
│ │ ├── src/capture/ # pipewire / portal
│ │ ├── src/encode/ # ffmpeg vaapi/nvenc
│ │ ├── src/vdisplay/ # trait + kwin/wlroots/mutter impls
│ │ ├── src/input/ # reis + uinput
│ │ └── src/web/ # axum config/pairing API
│ └── punktfunk-probe/ # reference Rust client (M4)
├── clients/
│ ├── apple/ # Swift package, imports punktfunk_core.h (M5)
│ └── android/ # Kotlin + JNI (later)
├── include/ # generated punktfunk_core.h
└── tools/
├── latency-probe/
└── loss-harness/
```
---
## 12. Immediate next actions (first week)
1. **Stand up the workspace** with `punktfunk-core` (empty ABI + `cbindgen`) and `punktfunk-host` skeletons; wire up CI (Gitea Actions, BuildKit-based pipelines).
2. **M0 spike on wlroots:** headless output → PipeWire capture → NVENC/VAAPI encode → playable file. This validates the riskiest *pipeline* assumptions in days, on real GPU hardware.
3. **Read KRdp's source** for how KDE creates virtual outputs and casts them — it's the closest existing reference for the KWin path needed in M2.
4. **Decide P1 protocol depth:** confirm exactly which `serverinfo`/RTSP/pairing messages a current Moonlight client requires for a successful connect, so M2's compat surface is scoped precisely.
---
*The shape of the bet: M2 alone — virtual-display streaming to stock Moonlight clients on Linux — is a complete, useful, gap-filling release. Everything after it (the wall-breaking transport, native clients, mic-done-right) is upside you unlock from a position of having already shipped, with the hard transport work resting on a FEC core that makes the 1 Gbps ceiling a thing of the past rather than a thing to hack around.*
+22 -34
@@ -1,5 +1,10 @@
# Linux host setup — NVIDIA GPU VM (pipeline spike + GameStream host) # Linux host setup — NVIDIA GPU VM (pipeline spike + GameStream host)
> **Status:** Setup guide — still current and in active use (referenced from
> `clients/apple/README.md`). The pipeline spike and the GameStream host are **shipped**
> (see `design/gamestream-host-plan.md` + `CLAUDE.md`). This doc is trimmed to the bring-up
> steps + gotchas; the §4 crate-API walkthrough was folded into `CLAUDE.md` "Pinned crate facts".
How to bring up the build environment for the punktfunk Linux host on an NVIDIA-GPU Ubuntu VM How to bring up the build environment for the punktfunk Linux host on an NVIDIA-GPU Ubuntu VM
and run the **pipeline spike** (capture→encode). `punktfunk-core` already builds and is tested and run the **pipeline spike** (capture→encode). `punktfunk-core` already builds and is tested
cross-platform; this is about the platform backends in `crates/punktfunk-host`. cross-platform; this is about the platform backends in `crates/punktfunk-host`.
@@ -103,48 +108,26 @@ source /tmp/punktfunk-sway-env.sh
swaymsg exec foot # animated content swaymsg exec foot # animated content
# Live portal capture → NVENC HEVC → playable file, with each AU also round-tripped # Live portal capture → NVENC HEVC → playable file, with each AU also round-tripped
# through a punktfunk_core host→client Session (FEC + packetize + reassemble) and verified: # through a punktfunk_core host→client Session (FEC + packetize + reassemble) and verified:
cargo run -p punktfunk-host -- m0 --source portal --seconds 5 --out /tmp/punktfunk-m0.h265 cargo run -p punktfunk-host -- spike --source portal --seconds 5 --out /tmp/punktfunk-spike.h265
ffprobe /tmp/punktfunk-m0.h265 ffprobe /tmp/punktfunk-spike.h265
# No capture session needed (encode + core only): --source synthetic # No capture session needed (encode + core only): --source synthetic
``` ```
Verified result: `1920x1080` HEVC, ~300 frames in 5s, `punktfunk-core loopback … 0 mismatches`. Verified result: `1920x1080` HEVC, ~300 frames in 5s, `punktfunk-core loopback … 0 mismatches`.
The portal negotiates packed **`RGB` (24-bit, 3 bpp)** on wlroots; the encoder expands it to The portal negotiates packed **`RGB` (24-bit, 3 bpp)** on wlroots; the encoder expands it to
`rgb0` (one pad byte/pixel, no colour math) since NVENC accepts `rgb0`/`bgr0` but not `rgb0` (one pad byte/pixel, no colour math) since NVENC accepts `rgb0`/`bgr0` but not
`rgb24`. dmabuf zero-copy import is still deferred (plan §9) — this is the CPU-copy path. `rgb24`. **GPU zero-copy is now implemented on all paths** (tiled dmabuf → EGL/GL → CUDA;
LINEAR dmabuf → Vulkan bridge → CUDA → NVENC — see `CLAUDE.md`); the `capture` module keeps a
`cpu_bytes` fallback for inputs that can't be imported.
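
The `RGB` to `rgb0` expansion on the CPU path amounts to appending one pad byte per pixel. The following is a minimal sketch under stated assumptions: `expand_rgb24_to_rgb0` is a hypothetical helper, not the function in `punktfunk-host`, and it assumes tightly packed rows with no stride padding, which a real PipeWire buffer may not guarantee.

```rust
/// Expand tightly packed RGB24 (3 bytes/pixel) to rgb0 (4 bytes/pixel).
/// The fourth byte is a zero pad; no colour conversion happens.
/// Returns `None` when the input is not a whole number of pixels.
fn expand_rgb24_to_rgb0(src: &[u8]) -> Option<Vec<u8>> {
    if src.len() % 3 != 0 {
        return None;
    }
    let mut dst = Vec::with_capacity(src.len() / 3 * 4);
    for px in src.chunks_exact(3) {
        dst.extend_from_slice(px); // R, G, B copied unchanged
        dst.push(0); // pad byte expected by the rgb0 layout
    }
    Some(dst)
}
```

A real implementation would also walk rows by stride rather than treating the frame as one flat run of pixels.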
Crate choices, verified current: Crate/API details (`ashpd` 0.13 screencast handshake, `pipewire` 0.9 frame pull, `ffmpeg-next`
- **Capture (portal path):** [`ashpd`](https://docs.rs/ashpd) **0.13** with the 8.x encoder selection, `reis`/uinput input) now live in `CLAUDE.md` "Pinned crate facts" — they
`screencast` feature (the `pipewire` feature is *not* needed — `open_pipe_wire_remote` are the source of truth, with the FFmpeg-prefix override `export FFMPEG_DIR=/that/prefix` and
is unconditional). Flow (0.13 API, verified against the vendored source): `Screencast::new` the bindgen `LIBCLANG_PATH` knob in the troubleshooting table below.
→ `create_session(Default)` → `select_sources(&session, SelectSourcesOptions::default()
.set_sources(BitFlags::from_flag(SourceType::Monitor))…)` → `start(&session, None,
Default)` → `.response()?` → `Stream::pipe_wire_node_id()` + `open_pipe_wire_remote()`.
Note 0.13 takes **options structs**, not the old positional args, and defaults to the
**tokio** runtime — drive the handshake on a *multi-thread* tokio runtime (a
current-thread one starves zbus's reader and the portal reports "Invalid session").
Pull frames with [`pipewire`](https://docs.rs/pipewire) **0.9** — it must match the
pipewire crate ashpd 0.13 links (the `pipewire-sys` `links` key is unique per build, so
`0.10` fails to resolve). 0.9 uses `MainLoopRc`/`ContextRc::connect_fd_rc(OwnedFd)`/
`StreamBox`. Only request `SourceType::Monitor` — the wlr backend's
`AvailableSourceTypes` is `1` (Monitor only); asking for `Window`/`Virtual` invalidates
the session. Set `XDG_CURRENT_DESKTOP=sway` so the wlr portal backend is chosen, and
import it into the portal's environment (see "Portal bring-up" below).
- **Encode:** [`ffmpeg-next`](https://crates.io/crates/ffmpeg-next) **8.x** (binds the
system FFmpeg 8.x via pkg-config; needs `clang`/`libclang`). Select the encoder by
name — `encoder::find_by_name("hevc_nvenc")`, *not* by codec id (that's the SW encoder).
Low-latency opts: `preset=p1`, `tune=ull`, `rc=cbr`, `bf=0`, `delay=0`, large `g`.
If your FFmpeg is in a non-standard prefix, `export FFMPEG_DIR=/that/prefix`.
- **Zero-copy is the hard part.** There's no direct dmabuf→CUDA import in FFmpeg.
**Start with the CPU-copy fallback** (download frame → `hwupload_cuda` → `hevc_nvenc`)
to get an end-to-end stream, then chase true dmabuf zero-copy. The plan flags this
(§9) and the `capture` module already has a `cpu_bytes` fallback field.
- **Input (GameStream host):** [`reis`](https://crates.io/crates/reis) (pure-Rust libei — no native
`libei` needed) with `input-linux`/uinput as the universal fallback.
Then continue toward the **GameStream host**: `serverinfo`/RTSP/pairing enough for a stock Moonlight client The **GameStream host** built on this spike is shipped — `serverinfo`/RTSP/pairing for a stock
to connect, a KWin virtual output created on connect, input via reis/uinput — the Moonlight client, a per-compositor virtual output created on connect, input via reis/uinput.
shippable milestone. See `design/gamestream-host-plan.md` + `CLAUDE.md`.
## Troubleshooting ## Troubleshooting
@@ -157,3 +140,8 @@ shippable milestone.
| `Cannot load libnvidia-encode.so.1` | NVENC runtime lib missing (driver) or unlicensed vGPU | | `Cannot load libnvidia-encode.so.1` | NVENC runtime lib missing (driver) or unlicensed vGPU |
| `cargo build` can't find FFmpeg | `export FFMPEG_DIR=$(pkg-config --variable=prefix libavcodec)` or point `PKG_CONFIG_PATH` at the custom build | | `cargo build` can't find FFmpeg | `export FFMPEG_DIR=$(pkg-config --variable=prefix libavcodec)` or point `PKG_CONFIG_PATH` at the custom build |
| bindgen: libclang not found | `export LIBCLANG_PATH=$(llvm-config --libdir)` | | bindgen: libclang not found | `export LIBCLANG_PATH=$(llvm-config --libdir)` |
## Open items
None — the pipeline spike and the GameStream host it seeds are both shipped (see
`design/gamestream-host-plan.md` + `CLAUDE.md`). This file remains as the host-box bring-up guide.
+35 -88
@@ -1,5 +1,7 @@
# punktfunk — security audit (2026-06-21) # punktfunk — security audit (2026-06-21)
> **Status:** AUDIT COMPLETE (2026-06-21). 11 of 12 confirmed findings are FIXED (`3526517` · `3c55ec3`) or accepted-risk DOCUMENTED-INHERENT (#5/#9); **one remains OPEN: #12** (global `NODE_TLS_REJECT_UNAUTHORIZED`, DEFERRED). This doc is trimmed to the audit trail (executive summary · threat model · per-finding status · cross-cutting themes · positives · refuted findings) plus the accepted-risk rationale and the open item. Per-finding remediation prose for the FIXED findings is collapsed to one line + commit — the shipped code is the source of truth and git history holds the full original.
Whole-project audit by a 10-surface multi-agent review; every finding adversarially verified (reachability, attacker-control, existing mitigation). **10 surfaces · 20 raw findings → 18 confirmed/partial, 2 refuted.** Threat model: a malicious network client (pre- and post-pairing) is the primary adversary; also an on-path MITM and a local unprivileged user (the host is privileged). Whole-project audit by a 10-surface multi-agent review; every finding adversarially verified (reachability, attacker-control, existing mitigation). **10 surfaces · 20 raw findings → 18 confirmed/partial, 2 refuted.** Threat model: a malicious network client (pre- and post-pairing) is the primary adversary; also an on-path MITM and a local unprivileged user (the host is privileged).
## Remediation status (2026-06-21) ## Remediation status (2026-06-21)
@@ -11,97 +13,65 @@ All 12 confirmed findings have been addressed — fixed, or documented where a f
| #1 | high | **FIXED** (3526517) — secret files 0600 + dir 0700 / Windows icacls DACL | | #1 | high | **FIXED** (3526517) — secret files 0600 + dir 0700 / Windows icacls DACL |
| #2 | high | **FIXED** (3526517) — single-use SPAKE2 PIN (consumed at the host key-confirmation) | | #2 | high | **FIXED** (3526517) — single-use SPAKE2 PIN (consumed at the host key-confirmation) |
| #3 | med | **FIXED** (3526517) — RTSP packetSize bounded + saturating packetizer math | | #3 | med | **FIXED** (3526517) — RTSP packetSize bounded + saturating packetizer math |
| #4 | low | **FIXED** — mgmt mTLS-cert auth restricted to a read-only allowlist; admin/state-changing routes require the bearer token | | #4 | low | **FIXED** (3c55ec3) — mgmt mTLS-cert auth restricted to a read-only allowlist; admin/state-changing routes require the bearer token |
| #5 | low | **DOCUMENTED (won't-fix on legacy)** — legacy GameStream GCM nonce reuse is inherent to Nvidia's old-style control encryption (Apollo/Moonlight identical); the GCM key is client-known. Real fix = V2 control-encryption negotiation; use punktfunk/1 for untrusted nets. Code comment at `control.rs` rumble loop. | | #5 | low | **DOCUMENTED (won't-fix on legacy)** — legacy GameStream GCM nonce reuse is inherent to Nvidia's old-style control encryption (Apollo/Moonlight identical); the GCM key is client-known. Real fix = V2 control-encryption negotiation; use punktfunk/1 for untrusted nets. Code comment at `control.rs` rumble loop. |
| #6 | low | **FIXED** — RTSP Content-Length/header caps + per-read timeout + concurrent-connection cap | | #6 | low | **FIXED** (3c55ec3) — RTSP Content-Length/header caps + per-read timeout + concurrent-connection cap |
| #7 | low | **FIXED (GameStream) / DOCUMENTED (native)** — new `VirtualDisplay::set_launch_command` carries the launch command per-session (GameStream); native path keeps the env (safe under today's single-session model; plumb per-session with concurrent sessions) | | #7 | low | **FIXED (3c55ec3, GameStream) / DOCUMENTED (native)** — new `VirtualDisplay::set_launch_command` carries the launch command per-session (GameStream); native path keeps the env (safe under today's single-session model; plumb per-session with concurrent sessions) |
| #8 | info | **FIXED** — constant-time GameStream phase-4 hash compare (`crypto::ct_eq`) | | #8 | info | **FIXED** (3c55ec3) — constant-time GameStream phase-4 hash compare (`crypto::ct_eq`) |
| #9 | info | **DOCUMENTED** — GameStream pairing over plain HTTP is inherent to GFE compat; steer untrusted networks to the SPAKE2 native plane | | #9 | info | **DOCUMENTED** — GameStream pairing over plain HTTP is inherent to GFE compat; steer untrusted networks to the SPAKE2 native plane |
| #10 | info | **FIXED** — fixed ALPN (`pkf1`) on both QUIC endpoints (coordinated client+host upgrade) | | #10 | info | **FIXED** (3c55ec3) — fixed ALPN (`pkf1`) on both QUIC endpoints (coordinated client+host upgrade) |
| #11 | info | **FIXED** — FEC reconstruction failure is now a counted drop, not stream-fatal | | #11 | info | **FIXED** (3c55ec3) — FEC reconstruction failure is now a counted drop, not stream-fatal |
| #12 | low | **DEFERRED (fix ready, reverted)** — the scoped-dispatcher fix (undici `Agent` on `proxyRequest`'s `fetch` option) is designed and the mechanism verified sound (h3 honors the fetch option), but it needs `undici` added as a web dependency (`bun add undici` + lockfile regen), which requires the web build env — not available here. Reverted to keep the web build/proxy working. Latent-only: the loopback mgmt fetch is the web console's ONLY outbound TLS, so the global env weakens nothing today. Apply with: `cd web && bun add undici`, then scope `rejectUnauthorized:false` to the mgmt fetch and drop the global env. | | #12 | low | **DEFERRED (fix ready, reverted)** — the scoped-dispatcher fix (undici `Agent` on `proxyRequest`'s `fetch` option) is designed and the mechanism verified sound (h3 honors the fetch option), but it needs `undici` added as a web dependency (`bun add undici` + lockfile regen), which requires the web build env — not available here. Reverted to keep the web build/proxy working. Latent-only: the loopback mgmt fetch is the web console's ONLY outbound TLS, so the global env weakens nothing today. Apply with: `cd web && bun add undici`, then scope `rejectUnauthorized:false` to the mgmt fetch and drop the global env. |
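
For readers unfamiliar with the fix behind #8, a constant-time comparison can be sketched as below. This is an illustrative assumption about the general technique, not the body of the project's `crypto::ct_eq`: it accumulates byte differences without returning early, so the run time does not depend on where the first mismatch sits.

```rust
/// Compare two byte strings without an early exit on the first mismatch.
/// Lengths are treated as public, so a length mismatch may return at once.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // OR-accumulate so every byte is visited regardless of earlier results.
        diff |= x ^ y;
    }
    diff == 0
}
```

An optimizing compiler is in principle free to reintroduce a short-circuit, so production code usually relies on a vetted crate for this rather than a hand-rolled loop.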
## Open items
Exactly **one** finding is still open — everything else is FIXED or accepted-risk DOCUMENTED (see the table above and #5/#9 below).
- **#12 [LOW] Web console sets `NODE_TLS_REJECT_UNAUTHORIZED=0` process-globally — DEFERRED.** Latent-only today: the only server-side outbound TLS hop is the loopback proxy to `https://127.0.0.1:47990`, which cannot be MITM'd, so impact is nil now. The fix is designed and verified sound (scope `rejectUnauthorized:false` to a per-request `https.Agent`/undici `Agent` pinned to the host cert on `proxyRequest`'s `fetch`, and drop the global env) but was reverted because it needs `bun add undici` (+ lockfile regen) in the web build env, not available here. **Guard: this MUST be fixed before the web app gains ANY off-loopback server-side outbound TLS** — an update check, webhook, metadata fetch, or pointing `PUNKTFUNK_MGMT_URL` off-loopback would make it silently exploitable. Full detail in the findings section below.
## Executive summary ## Executive summary
Overall the punktfunk host is a security-conscious codebase with a strong cryptographic and wire-parsing core: the FEC/reassembler path bounds every attacker-controlled length field before allocation, AES-GCM is used correctly with per-direction nonce separation and seq-as-AAD on the native plane, and the native trust model (SPAKE2 PIN binding both cert fingerprints, fingerprint pinning that still verifies the real TLS handshake signature) is genuinely sound. The most serious real defects are (1) local secret-disclosure of the host's master private key (key.pem) — written with no restrictive mode/ACL while the far-less-sensitive mgmt token is carefully 0600 — which on Windows (%ProgramData% default Users-read ACL, LocalSystem service) is a near-certain cross-privilege host-impersonation primitive, and (2) the native SPAKE2 PIN ceremony permitting unlimited online guesses against a static, non-rotating 4-digit PIN (no disarm-on-failure, no lockout), which contradicts the documented "one online guess" guarantee and lets a pre-auth LAN attacker brute-force pairing of a fully-trusted rogue client in a few hours against the default standalone/CLI flow. Dominant themes: file-permission hygiene on secrets is inconsistent (the secure pattern exists but is applied selectively), pairing throttling relies on a single global rate-limit rather than attempt-bounding, and authorization is overbroad (any streaming-paired cert is also a full mgmt admin). The remaining findings are a contained pre-auth RTSP video-thread DoS (unbounded packetSize and Content-Length), a legacy GameStream control-stream GCM nonce-reuse that is muted by modern V2 negotiation and being key-gated, and several defense-in-depth nits (non-constant-time GameStream hash compare, no QUIC ALPN, cross-session env-var launch confusion, global NODE_TLS_REJECT_UNAUTHORIZED). No memory-unsafety or RCE was found on attacker wire bytes; panics are safe Rust and isolated by panic=unwind. Net: a solid foundation whose highest-leverage fixes are tightening secret file permissions and making the PIN single-use/lockout-bounded.
> The executive summary describes the **pre-fix** state captured by the audit; it is retained verbatim as the historical record. See the status table for what shipped.
## Findings (ranked by severity × exploitability)
FIXED findings are collapsed to one line + commit. The two accepted-risk findings (#5, #9) and the one open finding (#12) keep their full rationale.
### 🟠 #1 [HIGH] Host master private key (key.pem) written with no restrictive file mode / ACL — local secret disclosure enabling full host impersonation
**Surface:** `secrets-availability` · **FIXED (3526517).** key.pem is the single trust root for ALL surfaces (GameStream TLS cert + pairing signing key, the punktfunk/1 QUIC identity every client pins, the mgmt HTTPS cert); `ServerIdentity::load_or_create` wrote it with a bare `fs::write`/`create_dir_all` while the less-sensitive mgmt token used `OpenOptions::mode(0o600)`. On Windows (%ProgramData% Users-read ACL + LocalSystem service) this was a near-certain cross-privilege host-impersonation primitive; on Linux it landed at umask (verified world-readable). Fix: 0600 file + 0700 dir on Unix mirroring `mgmt_token.rs`, SYSTEM+Administrators-only DACL on the Windows subtree, extended to `client-key.pem`/trust stores, + a 0600 regression test. Refs `gamestream/cert.rs`, `gamestream/mod.rs`, `mgmt_token.rs`, `service.rs`, `native_pairing.rs`.
**Surface:** `secrets-availability`
**Refs:** `crates/punktfunk-host/src/gamestream/cert.rs:36-44`, `crates/punktfunk-host/src/gamestream/mod.rs:216-232`, `crates/punktfunk-host/src/mgmt_token.rs:58-70`, `crates/punktfunk-host/src/service.rs:605-627`, `crates/punktfunk-host/src/native_pairing.rs:116-126`
**Why it ranks here / impact:** Ranked #1 because it is the highest verdict-adjusted severity (high, three corroborating findings merged) and the most reliably exploitable post-foothold: key.pem is the single trust root for ALL surfaces — GameStream TLS server cert, GameStream pairing signing key, the punktfunk/1 QUIC identity every client pins, and the mgmt HTTPS cert — so its disclosure yields full host impersonation/MITM that defeats client fingerprint pinning, plus the mgmt bearer token is likewise unprotected on Windows. ServerIdentity::load_or_create writes it with a bare fs::write (no mode) and create_dir_all (no DACL). On Windows the leak is near-certain and umask-independent: config_dir() is %ProgramData%\punktfunk, whose default ACL grants BUILTIN\Users read, and the host runs as LocalSystem — any local unprivileged user reads the SYSTEM service's key; the mgmt-token 0o600 hardening is #[cfg(unix)] so it is a no-op there. On Linux the file lands at umask (commonly 0664/0644, verified live as world-readable) and is reachable cross-user whenever the home/config chain is traversable. The project demonstrably knows the secure pattern (mgmt_token.rs uses OpenOptions::mode(0o600)+set_permissions) but applies it to the less-sensitive token and not the master key. Local-only (adversary #3), not pre-auth/network, which caps it below critical.
**Fix:** Write key.pem (and cert.pem) via OpenOptions::mode(0o600) + a follow-up set_permissions(0o600) on Unix, mirroring mgmt_token.rs; create config_dir() with DirBuilder::mode(0o700). On Windows set an explicit DACL granting only SYSTEM+Administrators on the punktfunk %ProgramData% subtree and per-file on key.pem / mgmt-token / *paired.json (or relocate the key under a SYSTEM-only path), since the default ProgramData ACL is Users-readable. Extend the same hardening to client-key.pem and the persisted trust stores. Add a regression test asserting 0600 on key.pem on Unix.
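The Unix half of this fix can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: `write_secret` is a hypothetical helper name, and the Windows DACL step is not covered here.

```rust
use std::fs::{self, DirBuilder, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper (name is an assumption): write `bytes` to `path`
/// so that only the owning user can read or write the file.
pub fn write_secret(path: &Path, bytes: &[u8]) -> io::Result<()> {
    if let Some(dir) = path.parent() {
        // 0700 applies to directories created by this call; directories
        // that already exist keep their current mode.
        DirBuilder::new().recursive(true).mode(0o700).create(dir)?;
    }
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only honored when the file is newly created
        .open(path)?;
    // Tighten a pre-existing file that was created with a looser mode,
    // before any secret bytes are written into it.
    fs::set_permissions(path, Permissions::from_mode(0o600))?;
    file.write_all(bytes)?;
    file.sync_all()
}
```

Setting the mode at open time and then calling `set_permissions` covers both a fresh file and one left behind by an older build at umask defaults. A regression test along the lines the fix describes would read back `metadata(path)?.permissions().mode() & 0o777` and compare it with `0o600`.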
### 🟠 #2 [HIGH] Native SPAKE2 PIN ceremony allows unlimited online guesses against a static 4-digit PIN — pre-auth brute-force to a fully-trusted rogue client
**Surface:** `pairing-pin` · **FIXED (3526517).** `pair_ceremony` logged+errored on a wrong PIN but never called `np.disarm()`/rotated; `current_pin()` returned the same value forever, throttled only by one process-wide 2s `PAIRING_COOLDOWN` over a 10,000-PIN space (~2.8h avg → permanently-pinned rogue cert with input/capture/launch). The standalone `punktfunk1-host` default (`--require-pairing` arms `expires_at:None`) made the indefinite static-PIN window the DEFAULT, not opt-in — contradicting the documented "one online guess". Fix: a failed confirmation now consumes/disarms the PIN (the actual "one online guess"); disarm after a successful pair too. Refs `punktfunk1.rs`, `native_pairing.rs`, `mgmt.rs`.
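For scale: 10,000 PINs at one guess per 2 s is 20,000 s, roughly 5.6 h to exhaust the space. The "one online guess" behaviour described above can be modelled as a window that is consumed by any confirmation attempt. The sketch below is illustrative only; `PairingWindow`, `PairOutcome` and `confirm` are hypothetical names, and the boolean stands in for the result of the SPAKE2 key-confirmation check.

```rust
/// Hypothetical pairing window; not the host's real type.
pub struct PairingWindow {
    pin: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PairOutcome {
    Paired,
    WrongPin,
    NotArmed,
}

impl PairingWindow {
    pub fn armed(pin: &str) -> Self {
        Self { pin: Some(pin.to_owned()) }
    }

    pub fn is_armed(&self) -> bool {
        self.pin.is_some()
    }

    /// `confirmation_ok` stands in for the outcome of the key-confirmation
    /// step. `take()` disarms the window on success and on failure alike,
    /// so each armed PIN admits exactly one online attempt.
    pub fn confirm(&mut self, confirmation_ok: bool) -> PairOutcome {
        match self.pin.take() {
            None => PairOutcome::NotArmed,
            Some(_) if confirmation_ok => PairOutcome::Paired,
            Some(_) => PairOutcome::WrongPin,
        }
    }
}
```

With this shape an attacker's wrong guess closes the window, and the operator has to re-arm (with a fresh PIN) before anyone can try again, which turns the brute-force into a visible denial of pairing rather than a silent compromise.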
**Surface:** `pairing-pin`
**Refs:** `crates/punktfunk-host/src/punktfunk1.rs:388-446`, `crates/punktfunk-host/src/punktfunk1.rs:475-491`, `crates/punktfunk-host/src/punktfunk1.rs:82`, `crates/punktfunk-host/src/native_pairing.rs:189-234`, `crates/punktfunk-host/src/native_pairing.rs:128-131`, `crates/punktfunk-host/src/mgmt.rs:841-842`
**Why it ranks here / impact:** Ranked #2: high severity AND pre-auth + fully attacker-controlled, the strongest exploitability combination among the high-rated issues — gated only on pairing being armed and an hours-long active window. Merges the three pairing-pin brute-force findings (they share one root cause: no disarm/rotate-on-failure and no attempt budget). pair_ceremony logs a warning and returns Err on a wrong PIN but never calls np.disarm() or rotates the PIN; current_pin() returns the same value forever (cleared only by TTL or operator); the only throttle is one process-wide 2s PAIRING_COOLDOWN. The PIN space is 10,000. Critically the standalone punktfunk1-host default (--require-pairing forces allow_pairing) arms with expires_at:None at startup, so the indefinite static-PIN window is the DEFAULT for that binary, not an opt-in. At ~1 guess/2s the space exhausts in ~5.5h worst / ~2.8h avg, and on success the attacker's cert is permanently pinned, granting input injection, screen capture and app launch. This directly contradicts the documented 'one online guess, no offline dictionary' claim — the offline-dictionary resistance from SPAKE2 holds, but the online single-guess limit is simply not implemented. Mitigations partial: the web/mgmt arm path is TTL-bounded (15..600s), confining the worst case to the CLI/standalone mode.
**Fix:** Make a failed confirmation consume the PIN: on ok==false in pair_ceremony, call np.disarm() (or rotate to a fresh random PIN) so a single wrong guess closes the window — this is what actually delivers the documented 'one online guess'. Add a per-window failed-attempt budget (auto-disarm after N>=1 failures), give the CLI no-expiry arm path a default expiry, and disarm after a SUCCESSFUL pair too. Keep the 2s cooldown as defence-in-depth and raise the web-armed PIN to 6 digits.
### 🟡 #3 [MEDIUM] Pre-auth RTSP ANNOUNCE packetSize underflows/panics the GameStream video pipeline (div-by-zero / OOB slice / allocation amplification)
**Surface:** `gamestream-parsing` · **FIXED (3526517).** The RTSP listener on TCP 48010 does no TLS/pairing/auth; `x-nv-video[0].packetSize` flowed unbounded into `VideoPacketizer::new` where `payload_per_shard = packet_size - 16` → packetSize==16 div-by-zero, <16 underflow OOB slice, ==17 per-frame datagram flood; the safe-Rust panic unwound before `running.store(false)`, wedging the session until restart (not RCE — isolated by panic=unwind). Fix: clamp packetSize (floor ~64 / cap ~2048) + checked/saturating packetizer math + `store(false)` on the unwind path + a `{0,15,16,17}` regression test. Refs `gamestream/rtsp.rs`, `gamestream/video.rs`, `gamestream/stream.rs`.
**Surface:** `gamestream-parsing`
**Refs:** `crates/punktfunk-host/src/gamestream/rtsp.rs:275`, `crates/punktfunk-host/src/gamestream/video.rs:55-89`, `crates/punktfunk-host/src/gamestream/stream.rs:322`
**Why it ranks here / impact:** Ranked #3: medium and fully pre-auth + attacker-controlled — the highest-exploitability of the medium-and-below tier. The RTSP listener on TCP 48010 performs no TLS/pairing/auth; an unauthenticated peer drives OPTIONS→ANNOUNCE→PLAY (+ a UDP ping to the video port) and the video thread starts on state.stream alone, no paired session required. x-nv-video[0].packetSize is read with no bound and flows into VideoPacketizer::new where payload_per_shard = packet_size - 16: packetSize==16 → pps==0 → div-by-zero panic; packetSize<16 → underflow → OOB slice panic; packetSize==17 → one byte/shard → per-frame datagram flood. Reliable remote pre-auth DoS of a privileged media service, made stickier because the panic unwinds before running.store(false) leaving the session wedged until restart. Calibrated medium (not higher) because it is a SAFE Rust panic (checked slice access, no memory corruption/UB) isolated to the punktfunk-video thread by panic=unwind — the host process and other listeners survive; not RCE.
**Fix:** Validate packet_size in stream_config() before building StreamConfig: reject packetSize below a sane floor (e.g. < 64) and clamp to a sane max (e.g. <= 2048). Additionally harden VideoPacketizer::new to use checked/saturating arithmetic and refuse construction (or fall back to a default) when packet_size < SHARD_HEADER-16 so the per-frame path never sees pps==0 or a wrapped payload_per_shard. Also store(false) on the unwind path so a panic doesn't wedge the session. Add a regression test over packetSize in {0,15,16,17}.
### 🔵 #4 [LOW] Any paired punktfunk/1 streaming client gets full management-API authority via the mTLS-paired-cert auth path (no streaming-vs-admin separation)
**Surface:** `authz-trust` · **FIXED (3c55ec3).** `require_auth` granted any verified peer cert in the native paired store full unscoped `/api/v1` access — the SAME set that admits a device to stream — so a watch-only device could `DELETE /clients/{fp}`, arm pairing + read the PIN, approve knocks, `DELETE /session`, CRUD the library. Fix: mTLS-cert auth restricted to a read-only allowlist; state-changing/admin/pairing-administration routes require the bearer token. Refs `mgmt.rs:459-488`.
**Surface:** `authz-trust`
**Refs:** `crates/punktfunk-host/src/mgmt.rs:459-488`, `crates/punktfunk-host/src/mgmt.rs:466-470`
**Why it ranks here / impact:** Ranked #4: low but a genuine post-auth privilege over-broadening with concrete admin impact. require_auth grants any verified peer cert whose fingerprint is in the native paired store full unscoped access to every /api/v1 route — the SAME paired set that admits a device to stream. So a device paired purely to watch the screen can DELETE /clients/{fp} (unpair others), POST /native/pair/arm (open a pairing window and read the PIN), approve arbitrary knocking devices, DELETE /session, and CRUD the library; there is no role/scope check anywhere in the router. The native client presents its identity via TLS client auth on both ports, so the credential is genuinely usable against mgmt. Bounded to low because it requires being an already-paired (operator-trusted) device and the mgmt port binds loopback by default — remote reach needs an explicit routable --mgmt-bind (and the mTLS path then bypasses the token requirement).
**Fix:** Separate streaming trust from management trust: keep a distinct admin allow-list (or an admin flag on a paired entry) for the mTLS mgmt path, or restrict mTLS-cert auth to read-only endpoints and require the bearer token for state-changing/admin routes. At minimum gate the pairing-administration endpoints (arm/approve/unpair) and session/library mutation behind the bearer token only.
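One way to picture the split between streaming trust and management trust is a small authorization check keyed on how the caller authenticated. This is a sketch under assumptions: the `Credential` enum, `is_allowed`, and the two allow-listed routes are hypothetical and are not taken from the project's router.

```rust
/// How the caller authenticated to the management API (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Credential {
    BearerToken,
    PairedCert,
}

/// Hypothetical read-only routes a paired streaming cert may call.
/// The real allowlist lives in the host and may differ.
const PAIRED_CERT_READ_ONLY: &[&str] = &["/api/v1/status", "/api/v1/library"];

/// Bearer-token callers are full admins; paired-cert callers may only
/// issue GET requests against the explicit read-only allowlist.
pub fn is_allowed(cred: Credential, method: &str, path: &str) -> bool {
    match cred {
        Credential::BearerToken => true,
        Credential::PairedCert => {
            method == "GET" && PAIRED_CERT_READ_ONLY.iter().any(|p| *p == path)
        }
    }
}
```

An explicit allowlist is used rather than "any GET", because a read of a pairing endpoint could itself disclose sensitive state such as the armed PIN.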
### 🔵 #5 [LOW] GameStream legacy control-stream AES-GCM nonce reuse across directions (host rumble vs client input share key+nonce)
**Surface:** `crypto` · **DOCUMENTED (won't-fix on legacy).**
**Refs:** `crates/punktfunk-host/src/gamestream/control.rs:373-400`, `crates/punktfunk-host/src/gamestream/control.rs:257-266`, `crates/punktfunk-host/src/gamestream/control.rs:67,106-114`
**Why it ranks here / impact:** Ranked #5: a real, correctly-identified catastrophic-class crypto defect (AES-GCM (key,nonce) reuse) but adjusted to low because reachability and impact are heavily muted. The legacy NonceKind branches apply no direction separation (other => other), so host rumble (rumble_seq from 0) and client control (seq from 0) under the shared rikey produce identical (key,nonce). BUT: (1) it only triggers on the legacy auto-detected scheme — modern moonlight-common-c negotiates the V2 scheme which flips marker[0] to 'H' and is direction-separated, so the default path is safe; the doc claim 'the legacy path — which we hit' is stale; (2) the rikey is delivered only over the mTLS /launch, so a pure MITM cannot derive the key — only a paired client can; (3) a paired client can already legitimately send any client→host control message (in-scope-by-design), so forgery is largely redundant and the only genuinely new gain is recovering low-value rumble keystream / forging rumble to its own client. Post-auth, conditional path.
**Fix:** Separate the two directions' nonce spaces for the legacy schemes too — set a reserved high bit/byte of the legacy IV for host-originated packets (mirror the V2 'H' marker), or better, HKDF-derive an independent host→client key from the rikey with a direction label so host and client never share a GCM key. Never let host rumble and client input share (key,nonce). (Inherent to the legacy Nvidia wire; the real fix is the V2 control-encryption / punktfunk/1 path. A code comment marks it at the `control.rs` rumble loop.)
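The idea of direction-separated nonces can be shown without any crypto library. The layout below is purely hypothetical (a 12-byte IV with the sequence number in the first four bytes and a direction marker in the last byte); it is not the legacy Nvidia layout or the V2 layout, and changing the wire format unilaterally would break interop with legacy clients, which is consistent with this item being documented rather than fixed.

```rust
/// Which side produced the packet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    ClientToHost,
    HostToClient,
}

/// Hypothetical 12-byte GCM nonce layout (an assumption for illustration):
/// bytes 0..4 hold the little-endian sequence number, byte 11 holds a
/// direction marker, everything else is zero.
pub fn nonce(dir: Direction, seq: u32) -> [u8; 12] {
    let mut iv = [0u8; 12];
    iv[..4].copy_from_slice(&seq.to_le_bytes());
    iv[11] = match dir {
        Direction::ClientToHost => b'C',
        Direction::HostToClient => b'H',
    };
    iv
}
```

Because the marker byte differs per direction, two counters that both start at 0 under the same key can never yield the same nonce.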
### 🔵 #6 [LOW] RTSP request Content-Length / header size unbounded with no read timeout or connection cap — pre-auth slow-loris / memory-growth DoS
**Surface:** `gamestream-parsing` · **FIXED (3c55ec3).** `read_message` computed `total = end+4+content_len` with no cap and looped `extend_from_slice`; the header scan was unbounded and one unbounded native thread spawned per connection with no global limit (slow exhaustion + thread/FD exhaustion on a privileged plaintext LAN listener). Fix: Content-Length/total-header caps + per-read timeout + concurrent-connection cap. Refs `gamestream/rtsp.rs`.
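The header and body caps described above can be sketched as a framing helper that decides, from the bytes buffered so far, how long the full message is allowed to be. Everything here is an assumption for illustration: `message_len`, `FrameError`, and the 16 KiB / 64 KiB limits are hypothetical and do not reproduce the host's `read_message`.

```rust
pub const MAX_HEADER_BYTES: usize = 16 * 1024;
pub const MAX_BODY_BYTES: usize = 64 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    HeaderTooLarge,
    BodyTooLarge,
    BadContentLength,
}

/// Returns Ok(Some(total_len)) once the header terminator has been seen and
/// all sizes are within limits, Ok(None) if more bytes are needed, and Err
/// as soon as a limit is exceeded so the caller can close the connection.
pub fn message_len(buf: &[u8]) -> Result<Option<usize>, FrameError> {
    let end = match buf.windows(4).position(|w| w == b"\r\n\r\n") {
        Some(end) => end,
        None if buf.len() > MAX_HEADER_BYTES => return Err(FrameError::HeaderTooLarge),
        None => return Ok(None),
    };
    if end > MAX_HEADER_BYTES {
        return Err(FrameError::HeaderTooLarge);
    }
    let head = String::from_utf8_lossy(&buf[..end]);
    let mut content_len = 0usize;
    for line in head.lines() {
        if let Some((name, value)) = line.split_once(':') {
            if name.trim().eq_ignore_ascii_case("content-length") {
                content_len = value
                    .trim()
                    .parse()
                    .map_err(|_| FrameError::BadContentLength)?;
            }
        }
    }
    // Reject before any read loop tries to buffer the declared body.
    if content_len > MAX_BODY_BYTES {
        return Err(FrameError::BodyTooLarge);
    }
    Ok(Some(end + 4 + content_len))
}
```

The per-read timeout is a separate concern; on a `std::net::TcpStream` it can be set with `set_read_timeout(Some(duration))` so an idle slow-loris connection cannot pin a thread indefinitely.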
**Surface:** `gamestream-parsing`
**Refs:** `crates/punktfunk-host/src/gamestream/rtsp.rs:82-106`, `crates/punktfunk-host/src/gamestream/rtsp.rs:24-48`
**Why it ranks here / impact:** Ranked #6: low, pre-auth and attacker-controlled but a rate-limited resource DoS, not unsafety or auth bypass. read_message parses content-length and computes total = end+4+content_len with no cap, looping buf.extend_from_slice until buf.len()>=total; the header scan is likewise unbounded and there is no body/header cap, no read/write timeout, and one unbounded native thread is spawned per connection with no global limit. Growth is bounded by attacker send rate (no pre-allocation), so it is slow exhaustion rather than instant OOM; the stronger lever is thread/FD exhaustion from many idle slow-loris connections at near-zero bandwidth. On a privileged LAN-facing plaintext listener with zero defensive caps.
**Fix:** Cap Content-Length and total header size to small constants (e.g. reject content_len > 64 KiB, total header > 16 KiB) and close on violation. Add a read timeout so a slow-loris connection cannot pin a thread indefinitely, and bound concurrent RTSP connections.
### 🔵 #7 [LOW] Per-session launch command carried via process-global PUNKTFUNK_GAMESCOPE_APP env var, stomped under concurrent native sessions (cross-session launch confusion)
**Surface:** `privilege-process-launch` · **FIXED (3c55ec3, GameStream) / DOCUMENTED (native).** `serve_session` did `std::env::set_var(PUNKTFUNK_GAMESCOPE_APP)` per connection under `DEFAULT_MAX_CONCURRENT=4` — a TOCTOU where client B's launch overwrote what client A's gamescope bare-spawn read (and a never-cleared value leaked to a later no-launch client). NOT command injection: `cmd` always resolves through `library::launch_command` (digit-validated Steam appids / operator-only custom store), so the worst case is a different operator-approved title, and only the gamescope bare-spawn backend reads the var. Fix: `VirtualDisplay::set_launch_command` carries it per-session for GameStream; **the native path keeps the env — safe under today's single-session model, plumb per-session as concurrent sessions land** (still-open follow-up). Refs `punktfunk1.rs`, `vdisplay/gamescope.rs`.
**Surface:** `privilege-process-launch`
**Refs:** `crates/punktfunk-host/src/punktfunk1.rs:560-571`, `crates/punktfunk-host/src/punktfunk1.rs:140`, `crates/punktfunk-host/src/vdisplay/gamescope.rs:629-647`
**Why it ranks here / impact:** Ranked #7: low, post-auth cross-session isolation bug, explicitly NOT command injection. serve_session does std::env::set_var(PUNKTFUNK_GAMESCOPE_APP) per accepted connection with a stale comment claiming 'one session at a time', but DEFAULT_MAX_CONCURRENT=4 sessions run concurrently and the var is read in gamescope::spawn during VirtualDisplay::create — a genuine TOCTOU where client B's launch overwrites what client A's bare-spawn reads, and the never-cleared value leaks into a later no-launch client. Impact is capped because cmd always resolves through library::launch_command (digit-validated Steam appids / operator-only custom store), so the worst case is launching a DIFFERENT operator-approved title or a stale title — and it only affects the gamescope bare-spawn backend (kwin/mutter/wlroots/attach ignore the var).
**Fix:** Stop carrying the per-session launch command in a process-global env var. Plumb the resolved command through the VirtualDisplay::create call / per-session context (e.g. a field on Mode or a per-session GamescopeDisplay), and on the bare-spawn path pass it explicitly to spawn(); clear/scope it so a stale value never leaks to the next client.
### ⚪ #8 [INFO] GameStream pairing phase-4 hash compare is not constant-time
**Surface:** `pairing-pin` · **FIXED (3c55ec3).** Variable-time `==` on attacker-influenced 32-byte phase-4 SHA-256 digests; not weaponizable (`expected` mixes undisclosed host-random `server_challenge`, a mismatch `map.remove`s the session forcing fresh randomness — no stable secret, no timing→PIN path). Fixed for consistency with the native ceremony: `crypto::ct_eq`. Refs `gamestream/pairing.rs:226-247`.
**Surface:** `pairing-pin`
**Refs:** `crates/punktfunk-host/src/gamestream/pairing.rs:226-247`
**Why it ranks here / impact:** Ranked #8: info / hardening only — a real variable-time `==` on attacker-influenced 32-byte SHA-256 digests, but not weaponizable. The compared `expected` mixes in host-random server_challenge that is never disclosed (so the attacker can neither compute nor aim at the target), the attacker cannot steer client_hash to a chosen value without the PIN key, and any mismatch removes the session (map.remove) forcing a fresh ceremony with new randomness — so there is no stable secret to recover prefix-by-prefix and no path from timing to PIN recovery or match forgery. Worth fixing for consistency since the codebase already has ct_eq for the native ceremony.
**Fix:** Use a constant-time comparator (subtle::ConstantTimeEq or the project's existing ct_eq) for hash_ok, matching the constant-time discipline already used in the native SPAKE2 ceremony.
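For readers unfamiliar with the pattern, a constant-time comparison accumulates the XOR of every byte pair instead of returning at the first mismatch. The function below is a minimal sketch with a hypothetical name (`ct_eq_sketch`); it is not the project's `crypto::ct_eq`, and an optimizing compiler is in principle free to transform it, which is why a vetted implementation such as `subtle::ConstantTimeEq` is the safer choice in production.

```rust
/// Illustrative constant-time equality for equal-length digests.
/// The length check is allowed to return early because digest lengths
/// are public; the byte loop never exits early on a mismatch.
pub fn ct_eq_sketch(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // OR in every differing bit; `diff` stays 0 only if all bytes match.
        diff |= x ^ y;
    }
    diff == 0
}
```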
### ⚪ #9 [INFO] GameStream pairing ceremony runs over plain HTTP — inherited GFE brute-forceable-PIN / MITM weakness
**Surface:** `authz-trust` · **DOCUMENTED (inherent to GameStream compat).**
**Refs:** `crates/punktfunk-host/src/gamestream/nvhttp.rs:33`, `crates/punktfunk-host/src/gamestream/nvhttp.rs:215-264`, `crates/punktfunk-host/src/gamestream/pairing.rs:102-247`
**Why it ranks here / impact:** Ranked #9: info — real but intentional Moonlight-compat behavior, on record rather than a regression. The whole /pair flow (incl. phase-4 cert pinning) is on plain HTTP 47989 with no transport confidentiality and no rate-limiting; the AES key is pin_key(salt,pin) = SHA-256(salt||pin)[..16] feeding AES-128-ECB, so an on-path attacker observing a legitimate pairing can offline-brute-force the 4-digit PIN and forge a clientpairingsecret to get a cert pinned. This is the well-known GFE/Sunshine construction, fixed by interop, and is precisely why punktfunk/1's SPAKE2 path exists; it requires an active MITM during an operator-initiated pairing within the 300s window. A paired GameStream client is in-scope-by-design.
@@ -109,31 +79,19 @@ Overall the punktfunk host is a security-conscious codebase with a strong crypto
**Fix:** Inherent to GameStream compatibility — document it and steer users to punktfunk/1 (SPAKE2) for untrusted networks. Optionally rate-limit pairing sessions per uniqueid/IP and tighten/expire the awaiting-PIN window aggressively.
### ⚪ #10 [INFO] No ALPN configured on the native QUIC server/client (cross-protocol confusion hardening absent)
**Surface:** `cert-tls-identity` · **FIXED (3c55ec3).** No `alpn_protocols` was set on either endpoint, but no reachable confusion attack: ALPACA needs two TLS services sharing a cert on the SAME transport — GameStream is TLS-over-TCP, punktfunk/1 is TLS-in-QUIC (UDP), and there is exactly one QUIC server, with trust already enforced by fingerprint pinning + Hello/Welcome magic. Fixed as cheap future-proofing: fixed ALPN `pkf1` on both endpoints (coordinated client+host upgrade). Refs `quic.rs:1335-1448`.
**Surface:** `cert-tls-identity`
**Refs:** `crates/punktfunk-core/src/quic.rs:1335-1354`, `crates/punktfunk-core/src/quic.rs:1412-1448`
**Why it ranks here / impact:** Ranked #10: info — factually correct (no alpn_protocols set on either endpoint; the cert.pem identity is shared with GameStream TLS) but no reachable confusion attack. ALPACA-style attacks need two TLS services sharing a cert on the SAME transport; here GameStream is TLS-over-TCP and punktfunk/1 is TLS-in-QUIC (UDP) — not cross-reachable — and there is exactly one QUIC server so ALPN would make no authorization decision. Trust is already enforced by fingerprint pinning + app-layer Hello/Welcome magic. Cheap future-proofing only.
**Fix:** Set a fixed ALPN on both endpoints (e.g. rustls_cfg.alpn_protocols = vec![b"pkf1".to_vec()]) so a mismatched protocol is rejected during the TLS handshake — defense-in-depth against ever multiplexing protocols on the QUIC endpoint.
### ⚪ #11 [INFO] FEC reconstruct error on the receive path is stream-fatal — code-contract inconsistency (not an exploitable DoS)
**Surface:** `core-wire-deser` · **FIXED (3c55ec3).** `Reassembler::push` propagated `coder.reconstruct(...)?` and both receive-side callers treated any non-`NoFrame` error as fatal — inconsistent with the surrounding "malformed = silent drop, never fatal" discipline. Every Err arm was traced unreachable from hostile input (header firewall + block-geometry pinning guarantee equal-length correctly-counted shards; `Config::validate` rejects odd/zero `shard_payload`; MDS Reed-Solomon decodes any `data_shards` distinct shards; reaching the reassembler needs an AES-GCM-decryptable packet; client-side only). Fixed as defense-in-depth: a reconstruct failure is now a counted drop returning `Ok(None)`, reserving `Err` for genuinely fatal conditions. Refs `packet.rs`, `session.rs`, `clients/probe/src/main.rs`, `spike.rs`.
**Surface:** `core-wire-deser`
**Refs:** `crates/punktfunk-core/src/packet.rs:411`, `crates/punktfunk-core/src/session.rs:283-289`, `clients/probe/src/main.rs:959`, `crates/punktfunk-host/src/spike.rs:251`
**Why it ranks here / impact:** Ranked last: info — a correctly-identified contract inconsistency with NO demonstrable exploit. Reassembler::push propagates coder.reconstruct(...)? and both real receive-side callers treat any non-NoFrame error as fatal, inconsistent with the surrounding 'malformed = silent drop, never fatal' discipline. But every Err arm was traced unreachable from hostile input: header firewall + block-geometry pinning guarantee equal-length, correctly-counted shards; reconstruct is only called once received>=data_shards; Config::validate rejects odd/zero shard_payload before any decode; and MDS Reed-Solomon decodes any data_shards distinct shards. Reaching the reassembler also requires an AES-GCM-decryptable packet, so it is the connected host (not a port-sprayer), and it is client-side only — the privileged host never runs the reassembler on attacker bytes. Pure defense-in-depth hardening.
**Fix:** Make a FEC reconstruction failure a counted drop rather than stream-fatal: in Reassembler::push match coder.reconstruct(...) and on Err bump packets_dropped (or a fec_failed counter), discard the block, and return Ok(None). Reserve poll_frame's Err for genuinely fatal conditions (role misuse, transport teardown), matching the discipline documented at packet.rs:298-300.
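The "counted drop" contract can be illustrated in isolation. All names in this sketch are hypothetical (`ReconstructError`, `FatalError`, `Stats`, `on_block_complete`); it only shows the shape of mapping a decode failure to `Ok(None)` plus a counter while keeping `Err` for genuinely fatal conditions.

```rust
/// Stand-in for the FEC coder's decode error (hypothetical).
#[derive(Debug)]
pub struct ReconstructError;

/// Errors that really should end the stream (hypothetical).
#[derive(Debug)]
pub enum FatalError {
    TransportClosed,
}

#[derive(Default)]
pub struct Stats {
    pub fec_failed: u64,
}

/// Called once a block has enough shards to attempt reconstruction.
/// A failed reconstruction is counted and the block is discarded; the
/// caller sees Ok(None), the same as "no frame ready yet".
pub fn on_block_complete(
    decoded: Result<Vec<u8>, ReconstructError>,
    stats: &mut Stats,
) -> Result<Option<Vec<u8>>, FatalError> {
    match decoded {
        Ok(frame) => Ok(Some(frame)),
        Err(_) => {
            stats.fec_failed += 1;
            Ok(None)
        }
    }
}
```

Keeping the failure visible in a counter preserves observability for debugging while removing the one path where malformed input could end a stream.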
### 🔵 #12 [LOW] Web console sets NODE_TLS_REJECT_UNAUTHORIZED=0 process-globally — latent footgun disabling all outbound TLS verification
**Surface:** `deps-config-exposure` · **DEFERRED — THE ONE STILL-OPEN ITEM.**
**Refs:** `web/.env.example:22-24`, `web/web.env.example:11-14`, `web/server/util/auth.ts:17-22`, `web/vite.config.ts:23`
**Why it ranks here / impact:** Ranked #12: low and not currently exploitable (attackerControlled false), included as a latent defense-in-depth defect. NODE_TLS_REJECT_UNAUTHORIZED=0 disables certificate validation for every outbound TLS connection the Node process makes, but the only current server-side outbound hop is the loopback proxy to https://127.0.0.1:47990 (CDN/art fetches are browser-side), and a loopback connection cannot be MITM'd — so impact is nil today. Real impact materializes silently if anyone later adds a server-side off-host HTTPS call (update check, webhook, metadata fetch) or points PUNKTFUNK_MGMT_URL off-loopback.
**Fix:** Do not disable TLS verification globally. Pin the host's self-signed cert for the single loopback fetch: pass an https.Agent with the host cert as `ca` (or rejectUnauthorized:false on that one Agent only) to the proxyRequest fetch in server/routes/api/[...].ts, and drop NODE_TLS_REJECT_UNAUTHORIZED from the deployment env. **Reverted** because it needs `undici` added as a web dependency (`bun add undici` + lockfile regen) in the web build env. Apply with `cd web && bun add undici`, then scope `rejectUnauthorized:false` to the mgmt fetch and drop the global env.
## Cross-cutting themes
@@ -144,17 +102,6 @@ Overall the punktfunk host is a security-conscious codebase with a strong crypto
- Stale concurrency assumptions and process-global mutable state (legacy GCM nonce direction, PUNKTFUNK_GAMESCOPE_APP env var) that were safe under a since-removed 'one session at a time' invariant and now cause cross-session confusion / crypto reuse.
- Strong, well-tested cryptographic and memory-safety core (bounded wire parsing, correct AEAD/SPAKE2/pinning, catch_unwind FFI, panic=unwind isolation). The foundation is solid; the residual risk is in operational hardening and trust-tier granularity, not in unsafe/RCE.
## Prioritized remediation (do in this order)
1. Lock down secret files: write key.pem (and cert.pem) 0600 + create config_dir 0700 on Unix using the existing mgmt_token OpenOptions::mode pattern, and set an explicit SYSTEM+Administrators-only DACL on the punktfunk %ProgramData% subtree / key.pem / mgmt-token / *paired.json on Windows. Extend to client-key.pem; add a 0600 regression test.
2. Make the native PIN single-use and lockout-bounded: disarm or rotate the PIN on a failed SPAKE2 confirmation, add a per-window failed-attempt budget, give the CLI no-expiry arm path a default expiry, and disarm after a successful pair — this is what delivers the documented 'one online guess'.
3. Bound the RTSP video path: validate/clamp x-nv-video[0].packetSize (floor ~64, cap ~2048) in stream_config() and use checked/saturating arithmetic in VideoPacketizer::new so pps==0 / underflow can never occur; store(false) on the unwind path; add a {0,15,16,17} regression test.
4. Cap RTSP request parsing: enforce a Content-Length and total-header-size limit, add a read timeout, and bound concurrent connections so a pre-auth peer cannot slow-loris exhaust threads/memory.
5. Separate streaming trust from management trust: require the mgmt bearer token (not just a paired streaming cert) for state-changing and pairing-administration routes (arm/approve/unpair/session/library), or keep a distinct admin allow-list.
6. Fix the legacy GameStream GCM nonce reuse: HKDF-derive an independent host→client key from the rikey (direction label), or mirror the V2 'H' direction marker into the legacy IV so host rumble and client input never share (key,nonce).
7. Stop carrying the per-session gamescope launch command in a process-global env var: plumb it through the per-session VirtualDisplay::create/context and clear it when no launch is requested, eliminating cross-session stomping under concurrency.
8. Apply the cheap hardening nits: constant-time compare for the GameStream phase-4 hash (use ct_eq), set a fixed ALPN ('pkf1') on both QUIC endpoints, make FEC reconstruct failures a counted drop instead of stream-fatal, and replace the global NODE_TLS_REJECT_UNAUTHORIZED with a cert-pinned https.Agent scoped to the loopback mgmt fetch.
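As an illustration of item 3, the clamp plus checked arithmetic might be sketched as below. The floor and cap are the approximate values from the list; `HEADER_BYTES = 16` is an assumption inferred from the `{0,15,16,17}` regression set, and the function names are hypothetical rather than the real `stream_config()` / `VideoPacketizer::new` code.

```rust
/// Approximate floor and cap from the remediation list.
pub const MIN_PACKET_SIZE: u32 = 64;
pub const MAX_PACKET_SIZE: u32 = 2048;
/// Assumption: per-packet header size; the real value lives in the packetizer.
pub const HEADER_BYTES: u32 = 16;

/// Clamp the client-supplied packet size into the supported range.
pub fn clamp_packet_size(requested: u32) -> u32 {
    requested.clamp(MIN_PACKET_SIZE, MAX_PACKET_SIZE)
}

/// Payload bytes per packet, or None if the size leaves no room for payload.
/// checked_sub means an undersized value can never underflow.
pub fn payload_per_packet(packet_size: u32) -> Option<u32> {
    match packet_size.checked_sub(HEADER_BYTES) {
        Some(0) | None => None,
        Some(n) => Some(n),
    }
}

/// Packets needed for one frame; never divides by zero.
pub fn packets_for_frame(frame_bytes: u32, requested_packet_size: u32) -> Option<u32> {
    let payload = payload_per_packet(clamp_packet_size(requested_packet_size))?;
    // Ceiling division without risking overflow on the addition.
    Some(frame_bytes / payload + u32::from(frame_bytes % payload != 0))
}
```

Clamping first means the hostile values 0, 15, 16, and 17 all resolve to the floor, while `payload_per_packet` stays safe even if it is ever called without the clamp.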
## Security controls done right (positives)
- Defense-in-depth wire parsing: every attacker-controllable FEC/reassembler header field is bounded against negotiated limits BEFORE any allocation keyed on it (packet.rs:328-343): shard_bytes exact-match, data/total/block counts in range, indices in bounds, frame_bytes<=max, with no integer overflow in the size math and regression tests (rejects_oversized_shard_counts, rejects_inconsistent_block_geometry_without_panicking).
+33 -21
@@ -1,25 +1,24 @@
# Session-aware host: known limitations & follow-ups

> **Status:** Session detection SHIPPED: host auto-detects the live session (Gaming / KDE / GNOME /
> wlroots) **per connect** and routes video+input at it; opt-in mid-stream switch watcher
> (`PUNKTFUNK_SESSION_WATCH=1`). Code: `crates/punktfunk-host/src/vdisplay.rs` +
> `vdisplay/linux/{gamescope,kwin,mutter}.rs`. Items #2 + #3 resolved in code (`3363576`); this doc is
> trimmed to the still-open limitations + their design rationale.

Status: 2026-06-14. The host auto-detects the live session (Gaming / KDE / GNOME / wlroots) **per
connect** and routes both video and input at it: managed gamescope at the client's resolution in
Steam Gaming Mode, a KWin/Mutter virtual output at the client's resolution on a Desktop. A watcher
(opt-in: `PUNKTFUNK_SESSION_WATCH=1`) follows a Gaming↔Desktop switch **mid-stream** and rebuilds the
backend in place without a reconnect.

Live-validated on the Bazzite F44 box (`bazzite-deck-nvidia:testing`, RTX 4090): Desktop KDE at
5120×1440 + input; Gaming managed at 5120×1440; warm-session reuse on quick reconnect; Feature B
video-switch both directions.

(Resolved: **#2 mid-stream-switch input**: `vdisplay::settle_desktop_portal()`; **#3 KWin/Mutter
virtual output primary**: `apply_session_env` defaults `PUNKTFUNK_KWIN_VIRTUAL_PRIMARY` /
`PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY` on for the auto desktop path. Both shipped in `3363576`; details in
git history.)

## Resolved (2026-06-15, `3363576`)

- **#2: mid-stream-switch input** ✅ `vdisplay::settle_desktop_portal()` pushes the live session env
  into the systemd/D-Bus activation environment and restarts the KWin portal on a switch, so input
  lands without a reconnect. Validated live: `settled desktop portal env … compositor=kwin` →
  `libei: portal granted devices` → `device RESUMED` on a Gaming→Desktop mid-stream switch.
- **#3: KWin/Mutter virtual output primary** ✅ `apply_session_env` defaults
  `PUNKTFUNK_KWIN_VIRTUAL_PRIMARY` / `PUNKTFUNK_MUTTER_VIRTUAL_PRIMARY` on for the auto desktop path.
  Validated live: `KWin: streamed output set as the sole desktop also_disabled=["HDMI-A-1"]`; panels
  now render on the streamed screen.
## Still parked
@@ -32,17 +31,15 @@ but don't eliminate it. Options, in order of preference:
- **SIGKILL the gamescope on teardown** instead of `systemctl stop` (SIGTERM). Hypothesis: skipping
  gamescope's buggy SIGTERM teardown handler (the part that SIGSEGVs, exit 139) lets the process die
  hard and the driver reclaim its GPU resources cleanly via normal process exit, with no half-torn-down
  context. Change `stop_autologin_sessions` + `stop_session` (`vdisplay/linux/gamescope.rs`, both
  still use `systemctl --user stop` = SIGTERM) to `systemctl --user kill --signal=SIGKILL <unit>`
  (+ a follow-up `stop`/`reset-failed` to clear unit state). **Untested**: this is the first thing
  to try; it would preserve "managed client-res gaming AND TV-shows-gaming-when-idle".
- **Keep the managed session warm** (no per-disconnect restore): spawn once, reuse forever, never
  tear down → ~1 teardown per host lifetime. Tradeoff: the TV is blank/idle when no client is
  connected (the autologin is never restored; return to gaming manually).
- Upstream gamescope/driver fix.
(#2 mid-stream-switch input and #3 virtual-output-primary are **resolved** — see the Resolved section above.)
## Lower priority / polish
### 4. Mid-stream-switch input loss window (~6 s)
@@ -68,3 +65,18 @@ the TV session). Resolve together with the keep-warm decision in #1.
### 8. Feature B is opt-in
The mid-stream watcher is gated behind `PUNKTFUNK_SESSION_WATCH=1` pending broader validation. Promote
to default-on once #2 (mid-stream input) lands and it's exercised on more boxes.
## Open items
1. **F44 gamescope teardown corrupts the GPU context** (#1) — try SIGKILL on teardown
(`stop_autologin_sessions` / `stop_session` in `vdisplay/linux/gamescope.rs`), else keep the
managed session warm, else upstream fix.
2. **Mid-stream-switch input-loss window (~6 s)** (#4) — pre-warm the portal or buffer/hold events
instead of dropping during the device-resume window.
3. **NVENC `InitializeEncoder failed: invalid param` noise at 5120×1440@240** (#5) — recovers via
split-encode; investigate the first-attempt failure / silence the log.
4. **NVENC HEVC ~800 Mbps cap on the RTX 4090** (#6) — consider preferring AV1 above it + surface the
cap in the speed-test / bitrate UI.
5. **Restore-guard / keep-warm interaction** (#7) — couples to #1; resolve together.
6. **Feature B (`PUNKTFUNK_SESSION_WATCH`) still opt-in** (#8) — promote to default-on after #2 lands
and it's exercised on more boxes.
+43 -238
@@ -1,246 +1,51 @@
# Stats capture & graphing: design

> **Status:** SHIPPED (commit `5bf787e`): host `crates/punktfunk-host/src/stats_recorder.rs`,
> mgmt endpoints `/api/v1/stats/*` (`mgmt.rs`), web console Performance page
> (`web/src/sections/Stats/`). Implemented; not yet on-glass validated. This doc is trimmed to
> design rationale + open items; the shipped code is the source of truth (data models, recorder
> API, endpoint list, and UI layout all live there).

Goal: let an operator **enable performance-stats capture from the web console**, play a
session, **stop**, and **review the captured time-series as graphs** in the web console.
Captures are **saved to disk** (browse/compare past sessions; survive host restart) and
cover **both** streaming paths: native punktfunk/1 (`virtual_stream`) and GameStream/Moonlight
(`gamestream/stream.rs`).

This builds on the existing per-stage instrumentation (today gated by `PUNKTFUNK_PERF=1`,
stdout-only, read once at startup). We make recording **runtime-toggleable**, route the same
aggregates into a **shared ring → on-disk recording**, and expose it over the mgmt REST API +
web console.

## Why / design rationale

- **Reuse the existing per-stage instrumentation** that was startup-gated by `PUNKTFUNK_PERF=1`
  (stdout-only, read once at startup). The key behavioral change: make the per-frame
  **measurement** predicate `perf || recorder.is_armed()`, re-evaluated each frame via a cheap
  `Relaxed` atomic. `PUNKTFUNK_PERF=1` still emits its `tracing::info!` log line exactly as
  before; the web toggle additionally builds a `StatsSample` at the aggregation boundary, so
  the web toggle works at runtime with **zero startup flags**.

---

## 1. Host: shared `StatsRecorder`

New module `crates/punktfunk-host/src/stats_recorder.rs`. One `Arc<StatsRecorder>` is created
once in the unified host entry (`gamestream::serve`, the `serve` subcommand) alongside
`Arc<NativePairing>`, and shared with **both** the mgmt API (`MgmtState`) and the streaming
loops (threaded through `punktfunk1::serve` → `SessionContext` → `virtual_stream`/`send_loop`,
and into the GameStream encode loop). Mirror the existing `NativePairing` Arc-sharing pattern
exactly.
### Data model (serde + utoipa `ToSchema`; this is the wire + on-disk shape)
```rust
/// One pipeline stage's latency in a window (microseconds).
pub struct StageTiming {
pub name: String, // "capture" | "submit" | "encode" | "packetize" | "send"
pub p50_us: f32,
pub p99_us: f32,
}
/// One aggregated sample (~ every 2 s native, ~ every 1 s GameStream).
pub struct StatsSample {
pub t_ms: u64, // ms since capture start (monotonic, from a stored Instant)
pub session_id: u32, // disambiguates concurrent sessions (usually constant)
pub stages: Vec<StageTiming>, // ordered pipeline stages for this path
pub fps: f32, // genuine NEW frames/s from the source
pub repeat_fps: f32, // re-encoded holds/s (source-starvation indicator)
pub mbps: f32, // tx goodput (Mb/s)
pub bitrate_kbps: u32, // configured target bitrate
pub frames_dropped: u32, // delta in this window
pub packets_dropped: u32, // delta (receiver-side / reassembler), where known
pub send_dropped: u32, // delta (host send-buffer overflow / EAGAIN)
pub fec_recovered: u32, // delta (shards recovered)
}
pub struct CaptureMeta {
pub id: String, // "2026-06-26T20-14-03Z_5120x1440" — also the filename stem
pub started_unix_ms: u64,
pub duration_ms: u64,
pub kind: String, // "native" | "gamestream"
pub width: u32,
pub height: u32,
pub fps: u32,
pub codec: String, // "h264" | "hevc" | "av1"
pub client: String, // short label / fingerprint prefix, or "" if unknown
pub sample_count: u32,
}
pub struct Capture {
pub meta: CaptureMeta,
pub samples: Vec<StatsSample>,
}
pub struct StatsStatus {
pub armed: bool, // capture currently running
pub sample_count: u32, // samples in the in-progress capture
pub started_unix_ms: u64, // 0 if idle
pub kind: String, // path of the in-progress capture, "" if idle
}
```
Stage sets per path (ordered, roughly the per-frame critical path so stacking is meaningful):
- **native**: `capture` (try_latest ring read + color convert), `submit` (NVENC enqueue),
`encode` (lock_bitstream = NVENC schedule + ASIC — the dominant stage under GPU load),
`send` (paced_submit: seal + FEC + pace + sendmmsg).
- **gamestream**: `capture`, `encode`, `packetize` (poll+FEC+packetize), `send`.
> Native naming: today's vectors are `st_cap`→`capture`, `st_submit`→`submit`,
> `st_wait`→`encode`, `pace_us`→`send`. (`encode_us` total ≈ capture+submit+encode; we do not
> emit it as a stage to avoid double-counting — it's implied by the stack.)
### Recorder API
```rust
pub struct StatsRecorder { /* dir, armed: AtomicBool, live: Mutex<Option<Live>>, next_sid: AtomicU32 */ }
impl StatsRecorder {
pub fn new(dir: PathBuf) -> Arc<Self>; // creates dir (0700) if missing
pub fn is_armed(&self) -> bool; // cheap Relaxed atomic load — called on the hot path
/// Arm a new capture. No-op if already armed (returns current status).
pub fn start(&self) -> StatsStatus;
/// A streaming loop announces itself when it first records while armed.
/// Seeds CaptureMeta (kind/w/h/fps/codec/client) on the FIRST registration. Returns session_id.
pub fn register_session(&self, kind: &'static str, w: u32, h: u32, fps: u32, codec: &str, client: &str) -> u32;
/// Append one aggregated sample (called from the loops' existing ~2 s/~1 s boundary).
/// Bounded: cap at MAX_SAMPLES (e.g. 5400 ≈ 3 h @ 2 s). On overflow, stop appending and
/// set a `truncated` flag (DO NOT drop oldest — a saved recording must keep its start).
pub fn push_sample(&self, session_id: u32, sample: StatsSample);
/// Disarm + finalize: write <dir>/<id>.json atomically, clear live, return saved meta.
pub fn stop(&self) -> std::io::Result<Option<CaptureMeta>>;
pub fn status(&self) -> StatsStatus;
pub fn live_snapshot(&self) -> Option<Capture>; // clone of the in-progress capture for live graphing
pub fn list(&self) -> Vec<CaptureMeta>; // scan dir, parse meta only, newest first
pub fn load(&self, id: &str) -> std::io::Result<Capture>;
pub fn delete(&self, id: &str) -> std::io::Result<()>;
}
```
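The bounded-append rule in the `push_sample` comment (stop appending at the cap and flag truncation, never drop the oldest sample) could be sketched as follows. `BoundedCapture`, `Sample`, and the `with_cap` constructor are hypothetical simplifications for illustration, not the shipped `StatsRecorder`.

```rust
use std::sync::Mutex;

/// Cap suggested in the API comment (about 3 h at one sample per 2 s).
pub const MAX_SAMPLES: usize = 5400;

/// Hypothetical reduced sample; the real StatsSample carries many more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct Sample {
    pub t_ms: u64,
}

#[derive(Default)]
struct Live {
    samples: Vec<Sample>,
    truncated: bool,
}

pub struct BoundedCapture {
    live: Mutex<Live>,
    cap: usize,
}

impl BoundedCapture {
    pub fn with_cap(cap: usize) -> Self {
        BoundedCapture { live: Mutex::new(Live::default()), cap }
    }

    /// Append until the cap is reached; afterwards only set the flag.
    /// The start of the recording is always preserved.
    pub fn push_sample(&self, sample: Sample) {
        let mut live = self.live.lock().unwrap();
        if live.samples.len() >= self.cap {
            live.truncated = true;
            return;
        }
        live.samples.push(sample);
    }

    /// Clone of the in-progress samples plus the truncation flag.
    pub fn snapshot(&self) -> (Vec<Sample>, bool) {
        let live = self.live.lock().unwrap();
        (live.samples.clone(), live.truncated)
    }
}
```

The mutex is only taken at the aggregation boundary (every second or two), so it stays off the per-frame path.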
Invariants / safety:
- **No async on the per-frame path.** `is_armed()` is a `Relaxed` atomic load; sample
  construction happens only at the existing 2 s / 1 s aggregation boundary, never per frame.
- **`id` is path-traversal-safe.** `load`/`delete` MUST reject any id not matching
  `^[A-Za-z0-9._-]+$` (no `/`, no `..`, no `:`; keep it a valid Windows filename), and only ever
  join `dir/<id>.json`. Return NotFound on reject. (Endpoints are bearer-authed, but defend in
  depth.)
- **Bounded memory.** `MAX_SAMPLES` cap; truncate (keep oldest), never unbounded.
- **Atomic disk write.** Write to `<id>.json.tmp` then rename, so a crash mid-write can't leave
  a half file. Pretty-print not required; compact JSON is fine.
- Captures dir: `~/.config/punktfunk/captures/` (next to `cert.pem` etc.). Resolve via the same
  config-dir helper the rest of the host uses.

- **No async on the per-frame path.** `is_armed()` is a `Relaxed` atomic load; sample
  construction happens only at the existing **~2 s native / ~1 s GameStream** aggregation
  boundary, never per frame. One shared `Arc<StatsRecorder>` is created once in the unified host
  entry and threaded into both streaming loops + `MgmtState`, mirroring the existing
  `Arc<NativePairing>` sharing pattern.
- **Stage sets are the per-frame critical path so stacking is meaningful.** native:
  `capture` / `submit` (NVENC enqueue) / `encode` (`lock_bitstream` = NVENC schedule + ASIC, the
  dominant stage under GPU load) / `send` (paced_submit: seal + FEC + pace + sendmmsg).
  gamestream: `capture` / `encode` / `packetize` / `send`. Native source vectors map
  `st_cap`→`capture`, `st_submit`→`submit`, `st_wait`→`encode`, `pace_us`→`send`; `encode_us`
  total ≈ capture+submit+encode and is **not** emitted as its own stage to avoid double-counting.
- **Gotchas / accepted-risk decisions:**
  - **`id` is path-traversal-safe.** `load`/`delete` reject any id not matching
    `^[A-Za-z0-9._-]+$` (no `/`, no `..`, no `:`; keep it a valid Windows filename) and only ever
    join `dir/<id>.json`. Endpoints are bearer-authed, but defend in depth.
  - **Bounded memory, keep the start.** `MAX_SAMPLES` cap (~5400 ≈ 3 h @ 2 s); on overflow stop
    appending and set a `truncated` flag. **Do NOT drop oldest**: a saved recording must keep
    its start.
  - **Atomic disk write.** Write `<id>.json.tmp` then rename so a crash mid-write can't leave a
    half file. Captures dir `~/.config/punktfunk/captures/` (0700), next to `cert.pem`.
  - Counters that a path doesn't expose are recorded as `0`; **do NOT fabricate**.
  - mgmt endpoints are **bearer-token only** (operator actions), deliberately NOT in the mTLS
    `cert_may_access` read-only allowlist.
  - Charts render **client-only** (mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s
    0-width measure.
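The `id` validation and atomic-write rules listed above can be sketched with the standard library alone. The function names `is_safe_id`, `capture_path`, and `write_atomic` are hypothetical; the character class follows the quoted pattern, and the NotFound-on-reject behaviour is taken from the design text rather than from the shipped code.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// True if `id` matches ^[A-Za-z0-9._-]+$ (no separators, no ':').
pub fn is_safe_id(id: &str) -> bool {
    !id.is_empty()
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'))
}

/// Only ever joins `dir/<id>.json`; rejects anything else as NotFound.
pub fn capture_path(dir: &Path, id: &str) -> io::Result<PathBuf> {
    if !is_safe_id(id) {
        return Err(io::Error::new(io::ErrorKind::NotFound, "no such capture"));
    }
    Ok(dir.join(format!("{}.json", id)))
}

/// Write to `<id>.json.tmp`, flush to disk, then rename into place, so a
/// crash mid-write leaves either the old file or no file, never half of one.
pub fn write_atomic(dir: &Path, id: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let final_path = capture_path(dir, id)?;
    let tmp_path = dir.join(format!("{}.json.tmp", id));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(bytes)?;
        file.sync_all()?;
    }
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Because the extension is appended after validation, even an id made only of dots resolves to a plain file name inside `dir` rather than to a parent directory.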
## Open items

- **On-glass validation.** Implemented but not yet validated on real hardware end-to-end (arm
  from the console, play, stop, review graphs across both native + GameStream paths).

### Runtime gating change (the key behavioral change)

Today the loops measure per-stage timing only `if perf` (a startup bool). Change the per-frame
**measurement** predicate to `let measure = perf || recorder.is_armed();`, re-evaluated each
frame (cheap atomic). Then at the aggregation boundary:
- if `perf` → keep the existing `tracing::info!` log line (unchanged behavior);
- if `recorder.is_armed()` → also build a `StatsSample` and `push_sample`.
So `PUNKTFUNK_PERF=1` still works exactly as before, AND the web toggle now works at runtime
with zero startup flags.
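A small sketch of that gating, assuming a recorder reduced to a single `AtomicBool`. `Recorder`, `set_armed`, `should_measure`, and `boundary_actions` are hypothetical names used only to show how the startup flag and the runtime toggle combine.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical recorder reduced to the armed flag.
pub struct Recorder {
    armed: AtomicBool,
}

impl Recorder {
    pub fn new() -> Arc<Self> {
        Arc::new(Recorder { armed: AtomicBool::new(false) })
    }

    /// Cheap enough for the per-frame path: one Relaxed load, no lock.
    pub fn is_armed(&self) -> bool {
        self.armed.load(Ordering::Relaxed)
    }

    /// Flipped by the start/stop handlers at runtime.
    pub fn set_armed(&self, on: bool) {
        self.armed.store(on, Ordering::Relaxed);
    }
}

/// Per-frame predicate: measure if either the startup flag or the toggle is on.
pub fn should_measure(perf: bool, recorder: &Recorder) -> bool {
    perf || recorder.is_armed()
}

/// What the aggregation boundary does with the measured window.
#[derive(Debug, PartialEq)]
pub struct BoundaryActions {
    pub log_line: bool,
    pub push_sample: bool,
}

pub fn boundary_actions(perf: bool, recorder: &Recorder) -> BoundaryActions {
    BoundaryActions {
        log_line: perf,                    // existing PUNKTFUNK_PERF=1 behavior
        push_sample: recorder.is_armed(),  // new: feed the recorder
    }
}
```

`Relaxed` is sufficient here on the assumption that the flag guards no other shared data; a frame or two of lag in observing the toggle is harmless.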
### Where each loop emits the sample
- **native** (`punktfunk1.rs`): the cap/submit/encode(`st_wait`) splits live in the capture
thread; `mbps`/`send_dropped`/`bytes` and `session.stats()` live in the send thread. Emit the
complete sample from **one** place. Cleanest: carry the per-frame `cap_us/submit_us/wait_us`
(and a `repeat: bool`) on `FrameMsg` to the send thread (it already carries `encode_us`), so
`send_loop` builds the whole sample at its existing 2 s boundary where `session.stats()` is
already read. Compute `frames_dropped/packets_dropped/send_dropped/fec_recovered` as deltas vs
the previous window's `Session::stats()` snapshot (the loop already tracks `last_bytes` /
`last_send_dropped` — extend that bookkeeping). `register_session` is called once with the
negotiated mode/codec and the client label.
- **gamestream** (`gamestream/stream.rs`): the encode loop already tracks per-stage max each
1 s. Add p50/p99 accumulation (small per-stage `Vec<u32>` like the native path) and, when
`perf || recorder.is_armed()`, emit a `StatsSample` with stages
`[capture, encode, packetize, send]` + fps (unique new frames) + mbps + whatever loss/byte
counters that path exposes (use 0 where a counter doesn't exist; do NOT fabricate). Call
`register_session("gamestream", ...)` with the GameStream-negotiated mode/codec/client.
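The per-stage p50/p99 accumulation mentioned for both loops could use a nearest-rank percentile over the window's `Vec<u32>` of microsecond samples. This is one common definition chosen for illustration; the design text does not say which percentile method the host uses, and `percentile` is a hypothetical helper name.

```rust
/// Nearest-rank percentile of `samples` (microseconds), `pct` in 0..=100.
/// Returns None for an empty window so callers can skip the stage.
pub fn percentile(samples: &[u32], pct: u32) -> Option<u32> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    // rank = ceil(pct / 100 * n), computed in integers, clamped to 1..=n.
    let rank = ((pct.min(100) as usize) * n + 99) / 100;
    let rank = rank.max(1).min(n);
    Some(sorted[rank - 1])
}

/// p50 and p99 for one stage in one window, as the f32 pair a sample carries.
pub fn window_p50_p99(samples: &[u32]) -> Option<(f32, f32)> {
    Some((percentile(samples, 50)? as f32, percentile(samples, 99)? as f32))
}
```

Sorting a copy once per window is cheap at one window every second or two; the per-frame cost is just a `push` onto the vector.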
Threading: add `stats: Arc<StatsRecorder>` to `SessionContext` and the GameStream stream
setup; the standalone `punktfunk1-host` subcommand (no mgmt) passes a fresh recorder (harmless,
just unused).
---
## 2. Host: mgmt REST API (`mgmt.rs`)
Add `stats: Arc<StatsRecorder>` to `MgmtState`. Register handlers in `api_router_parts()` via
`routes!()` with `#[utoipa::path]`. All under `/api/v1`, **bearer-token only** (operator
actions — do NOT add them to the mTLS `cert_may_access` read-only allowlist). All bodies/returns
derive `ToSchema`; errors use the `ApiJson`/`ApiError` envelope. Tag every operation `stats`.
| Method & path | fn (operationId) | body → returns |
|---------------------------------------|-------------------------|-------------------------------|
| POST `/api/v1/stats/capture/start` | `stats_capture_start` | — → `StatsStatus` |
| POST `/api/v1/stats/capture/stop` | `stats_capture_stop` | — → `CaptureMeta` (200) / 204-ish if nothing was recording |
| GET `/api/v1/stats/capture/status` | `stats_capture_status` | → `StatsStatus` |
| GET `/api/v1/stats/capture/live` | `stats_capture_live` | → `Capture` (in-progress; 404/empty if idle) |
| GET `/api/v1/stats/recordings` | `stats_recordings_list` | → `Vec<CaptureMeta>` |
| GET `/api/v1/stats/recordings/{id}` | `stats_recording_get` | → `Capture` |
| DELETE `/api/v1/stats/recordings/{id}`| `stats_recording_delete`| → `StatsStatus`/204 |
Register the new `ToSchema` types with the OpenApi derive's `components(schemas(...))` list.
Then regenerate the checked-in spec:
```
cargo run -p punktfunk-host -- openapi > api/openapi.json
```
CI fails on drift — the regenerated `api/openapi.json` MUST be committed.
---
## 3. Web console (`web/`)
New page **"Performance"** following the established route → section/index (fetch) →
section/view (presentational) pattern, registered in the `NAV` array (`app-shell.tsx`) with a
lucide icon (`Activity` or `LineChart`).
- Route: `web/src/routes/stats.tsx` → `createFileRoute('/stats')` → `SectionStats`.
- Section: `web/src/sections/Stats/index.tsx` (orval hooks) + `view.tsx` (presentational,
i18n via Paraglide `m.*`). Use `Section`, `QueryState`, `Card`/`CardHeader`/`CardTitle`/
`CardContent`, `Button`, `Badge` from `web/src/components/ui`.
- Charts: **add `recharts`** to `web/package.json` (no chart lib exists today). Render charts
**client-only** (a mounted guard) so SSR doesn't choke on `ResponsiveContainer`'s 0-width
measure. Theme via existing CSS variables / brand violet, dark-mode aware.
Data hooks come from regenerated orval (`bun run api:gen` after the host's openapi.json is
updated): `useStatsCaptureStatus`, `useStatsCaptureStart`, `useStatsCaptureStop`,
`useStatsCaptureLive`, `useStatsRecordingsList`, `useStatsRecordingGet`,
`useStatsRecordingDelete` (exact names per orval's tag/operationId convention — verify against
generated output and adjust the view imports to match).
UI layout:
1. **Capture control card** — Start/Stop button (mutations; invalidate status query on
success), a "Recording…"/"Idle" `Badge`, elapsed time + live sample count
(`useStatsCaptureStatus`, `refetchInterval: 2000`). On Start, the live chart appears.
2. **Live chart** (visible while armed; `useStatsCaptureLive`, `refetchInterval: 2000`) — the
latency stage breakdown as a **stacked area** (capture/submit/encode/send in µs, the
"where does the time go" view), with fps and mbps as secondary line charts.
3. **Recordings card** — table from `useStatsRecordingsList`: time, kind badge, resolution,
codec, duration, sample count; row actions **View** (select → detail), **Download** (export
the `Capture` JSON via the recording GET), **Delete** (mutation, confirm).
4. **Recording detail** — when a recording (or the live capture) is selected, render the full
graph set from its `samples`:
- Latency stage breakdown (stacked area, µs) — primary bottleneck view; p99 overlay toggle.
- Throughput: fps (new vs repeat) + mbps.
- Health: frames_dropped / packets_dropped / send_dropped / fec_recovered over time.
i18n: add keys to `web/messages/en.json` + `de.json` (nav label, titles, button/labels) and
regenerate Paraglide. Keep both locales in sync.
---
## 4. Verification / done-criteria
- `cargo build -p punktfunk-host` (and `--workspace`), `cargo clippy --workspace --all-targets
  -- -D warnings`, `cargo fmt --all --check`: all green.
- `cargo run -p punktfunk-host -- openapi > api/openapi.json` — committed, no drift.
- `PUNKTFUNK_PERF=1` stdout behavior unchanged (no regression to the existing perf log).
- Web: orval regen clean, typecheck/build green, charts render client-side.
- CLAUDE.md status note + this plan updated.
- Adversarial review: hot-path stays sync + bounded; `id` path-traversal-safe; OpenAPI/orval no
drift; SSR-safe charts; both paths actually emit samples.
+224
@@ -0,0 +1,224 @@
---
title: "Windows build & packaging"
description: "How the punktfunk Windows host is built, signed, and packaged: the all-Rust driver workspace built from source in CI, the Inno Setup installer, the web console bundle, the CI workflows, and the dev-iteration helpers. Repo-internal source of truth - not part of the user-facing docs-site."
---
# Windows build & packaging
Single source of truth for **how the Windows host ships**: what artifacts are built, the all-Rust
driver workspace and why we build it from source in CI, the Inno Setup installer, the web console
bundle, the CI workflows, signing, and the dev loop. Architecture lives in
[`windows-host-rewrite.md`](windows-host-rewrite.md); deployment/runtime in
[`windows-service.md`](windows-service.md). This doc is repo-internal (do **not** mirror into
`docs-site/`).
> **x64-only by design.** The host is coupled to NVENC (`nvEncodeAPI64.dll`) and the pf-vdisplay IddCx
> driver, neither of which exists on Windows ARM64 (no ARM64 NVIDIA driver / IddCx path). The *client*
> ships x64 + ARM64 MSIX; the *host* does not.
## 1. What ships
The signed `punktfunk-host-setup-<ver>.exe` (Inno Setup) lays down, under `C:\Program Files\punktfunk\`:
| Component | What it is |
|-----------|------------|
| `punktfunk-host.exe` | the host binary (`--features nvenc,amf-qsv` = NVIDIA + AMD/Intel in one build) |
| `pf-vdisplay` driver | all-Rust UMDF IddCx virtual display (per-session client-resolution output) |
| `pf-dualsense` driver | virtual DualSense / DualShock 4 (one type-aware HID minidriver) |
| `pf-xusb` driver | virtual Xbox 360 / XInput companion |
| `pf-vkhdr-layer` | Vulkan implicit layer that advertises HDR formats on the virtual display |
| web console | self-contained Nitro `.output` + a portable `bun` runtime (the `PunktfunkWeb` task) |
| FFmpeg DLLs | `avcodec`/`avutil`/`swscale`/... - the AMD/Intel (AMF/QSV) encode backend link-imports them |
| `nefconc.exe` | nefarius' nefcon (creates the `root\pf_vdisplay` device node; pnputil can't) |
All three drivers and the HDR layer are **bundled, not external** - no ViGEmBus, no SudoVDA, no separate
driver download. The host installs a `LocalSystem` SCM service that `CreateProcessAsUserW`s into the
interactive session for secure-desktop capture (why MSIX is unusable - see
[`windows-service.md`](windows-service.md)).
## 2. Component map (source -> artifact)
| Source | Built by | Artifact |
|--------|----------|----------|
| `crates/punktfunk-host/` | `cargo build --release -p punktfunk-host --features nvenc,amf-qsv` | `punktfunk-host.exe` |
| `packaging/windows/drivers/pf-vdisplay/` | `build-pf-vdisplay.ps1` (workspace `cargo build` + sign) | `pf_vdisplay.{dll,inf,cat}` + `.cer` |
| `packaging/windows/drivers/pf-dualsense/` `pf-xusb/` | `build-gamepad-drivers.ps1` (sign the workspace build) | `pf_{dualsense,xusb}.{dll,inf,cat}` + shared `.cer` |
| `packaging/windows/pf-vkhdr-layer/` | `pack-host-installer.ps1` (`cargo build --release`) | `pf_vkhdr_layer.dll` + `.json` |
| `web/` | `scripts/windows/build-web.ps1` (`bun run build`) | self-contained `.output` |
| `packaging/windows/nvenc/nvenc.def` | `gen-nvenc-importlib.ps1` (llvm-dlltool) | `nvencodeapi.lib` (link import, no GPU/SDK) |
## 3. The driver workspace - `packaging/windows/drivers/`
A **separate cargo workspace** (its own `[workspace]` root) because driver crates are `cdylib`s built
with the WDK toolchain on Windows only. Members:
- `pf-vdisplay` - the IddCx virtual display (the real driver).
- `pf-dualsense`, `pf-xusb` - the virtual gamepad HID/XUSB minidrivers.
- `wdk-iddcx` - hand-written IddCx DDI wrappers (the `iddcx` ApiSubset bindgen reuses `wdk_default`).
- `wdk-probe` - a toolchain/surface-assert probe crate.
- `vendor/wdk-sys` + `vendor/wdk-build` - **vendored** microsoft/windows-drivers-rs 0.5.1 (the published
crates) + an added `iddcx` ApiSubset. A `[patch.crates-io]` redirects every `wdk-sys`/`wdk-build`
reference (incl. `wdk` 0.4.1's transitive deps) to these copies, so the graph has exactly one
iddcx-capable `wdk-sys`. **Pinned - do not chase upstream.**
The workspace path-deps the owned ABI crate `crates/pf-driver-proto` (the host<->driver control protocol).
`.cargo/config.toml` sets an explicit `--target x86_64-pc-windows-msvc` + `target-feature=+crt-static` (UMDF
needs the static CRT; the explicit target keeps `crt-static` off host build-scripts/proc-macros).
`[workspace.metadata.wdk.driver-model]` sets UMDF 2.31 once for all members.
Driver-specific gotchas (handled by the build scripts):
- **`/INTEGRITYCHECK` (FORCE_INTEGRITY).** `wdk-build` links `/INTEGRITYCHECK`, which a non-EV
(self-signed) cert can't satisfy, so the driver won't load. `clear-force-integrity.ps1` clears the PE
`DllCharacteristics` bit (offset `0x5e`) **before** signing.
- **Self-signed cert.** The drivers are signed with a self-signed CodeSigning cert; the installer trusts
the bundled `.cer` (machine `Root` + `TrustedPublisher`) at install time so PnP loads them silently.
Validated to load under Secure Boot on. (CI can use a stable `DRIVER_CERT_PFX_B64` secret instead.)
- **Device node via nefcon, never devgen.** The `root\pf_vdisplay` node is created with `nefconc`
(a clean `ROOT\DISPLAY` node). `devgen` leaves persistent `SWD\DEVGEN` phantoms that survive reboot +
registry deletion. The gamepad drivers create their per-session nodes from the host via
`SwDeviceCreate` (no install-time node).
- **Strictly-increasing `DriverVer`.** `9.9.MMdd.HHmm` (stampinf). pnputil silently keeps the old binary
on a non-increasing version; a later-minute redeploy always wins.
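
The FORCE_INTEGRITY gotcha above comes down to one bit in the PE header. The sketch below is a hedged Rust illustration of that patch, not the contents of `clear-force-integrity.ps1`: the function name and error strings are made up for the example, and it assumes the `0x5e` offset is counted from the `PE\0\0` signature (4 bytes of signature + 20 bytes of COFF header + 70 bytes into the optional header). The flag value `0x0080` is `IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY` from the PE/COFF format.

```rust
/// IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY from the PE/COFF format.
const FORCE_INTEGRITY: u16 = 0x0080;
/// File offset of the `e_lfanew` field in the DOS header.
const E_LFANEW_OFFSET: usize = 0x3c;
/// Offset of `DllCharacteristics` from the "PE\0\0" signature (assumed reading of `0x5e`):
/// 4 (signature) + 20 (COFF header) + 70 (into the optional header).
const DLL_CHARACTERISTICS_OFFSET: usize = 0x5e;

/// Hypothetical helper: clears the FORCE_INTEGRITY bit in an in-memory PE image.
/// Returns Ok(true) if the bit was set and is now cleared, Ok(false) if it was already clear.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<bool, String> {
    // Locate the PE header through the DOS header's e_lfanew pointer.
    let lfanew = image
        .get(E_LFANEW_OFFSET..E_LFANEW_OFFSET + 4)
        .ok_or("image too small for a DOS header")?;
    let pe = u32::from_le_bytes([lfanew[0], lfanew[1], lfanew[2], lfanew[3]]) as usize;
    if image.get(pe..pe + 4) != Some(&b"PE\0\0"[..]) {
        return Err("missing PE signature".into());
    }
    // Read the little-endian flags word, mask the bit out, and write it back.
    let at = pe + DLL_CHARACTERISTICS_OFFSET;
    let field = image
        .get_mut(at..at + 2)
        .ok_or("image too small for DllCharacteristics")?;
    let flags = u16::from_le_bytes([field[0], field[1]]);
    field.copy_from_slice(&(flags & !FORCE_INTEGRITY).to_le_bytes());
    Ok(flags & FORCE_INTEGRITY != 0)
}
```

A caller would read the DLL into a `Vec<u8>`, run the helper, and write the bytes back before signing, since any edit after signing would invalidate the signature. The sketch leaves every other header field untouched.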
## 4. Drivers are BUILT FROM SOURCE - the anti-stale decision
The drivers used to ship as **checked-in prebuilt binaries** (`packaging/windows/pf-vdisplay/` +
`gamepad-drivers/`). That model went stale and shipped two field bugs on a fresh install:
1. A repo-wide rename edited `pf_vdisplay.inf` (a comment) but never re-signed `pf_vdisplay.cat`. A
catalog hashes the INF+DLL byte-for-byte, so `pnputil /add-driver` failed with
`SPAPI_E_FILE_HASH_NOT_IN_CATALOG` **on every box** - the driver never installed, and every session died
with "pf-vdisplay driver interface not found".
2. The frozen binary predated `IOCTL_SET_RENDER_ADAPTER`, which the host needs to pin the IddCx render
GPU on hybrid/Optimus boxes.
Fix: **build from source every release.** `pack-host-installer.ps1` calls `build-pf-vdisplay.ps1` (which
`cargo build`s the *whole* workspace) then `build-gamepad-drivers.ps1 -SkipBuild` (sign the already-built
gamepad cdylibs), so `.dll`/`.inf`/`.cat` are always in lockstep and current driver features ship. The
checked-in binaries were deleted. Re-introducing a vendored binary is the bug; if you must, a catalog
guard (`Test-FileCatalog` hash-membership) belongs in the build script.
The build scripts share the same shape (WDK env -> build -> clear FORCE_INTEGRITY -> sign DLL ->
stampinf -> Inf2Cat -> sign cat -> export `.cer`); `build-gamepad-drivers.ps1` loops over the two gamepad
drivers and signs both with one shared cert. (A `_driver-pack-common.ps1` helper to dedup the ~90% they
share is a known TODO - keep behavior identical and re-run `windows-host` if you do it.)
## 5. Toolchain / build env
The drivers build with **plain `cargo build`** against the vendored windows-drivers-rs - **no cargo-make,
no cargo-wdk for the build** (cargo-wdk is only provisioned + probed by `windows-drivers.yml`). The build
needs, on the runner:
- **WDK 26100** - `Version_Number=10.0.26100.0` pins the SDK version `wdk-build` uses (it otherwise picks
`10.0.28000.0`, which has no `km`/`crt`, and bindgen fails). Provisioned by
`scripts/ci/provision-windows-wdk.ps1` (iddcx headers are the "WDK present" signal).
- **clang 22 + bindgen 0.72** - the vendored `bindgen` is `0.72.1`, which builds clean on the runner's
**default** LLVM (`C:\Program Files\LLVM`, currently clang 22). `LIBCLANG_PATH` is left unset (defaults
to the runner default). *History:* LLVM 21.1.2 was briefly pinned (`C:\llvm-21`) to dodge a
bindgen-0.71 layout-test overflow on clang 22; the 0.72 bump retired that pin, so there's now one
toolchain for both driver builds (the pack and `windows-drivers.yml`).
- NVENC import lib synthesised from a 2-export `.def` via `llvm-dlltool` (`gen-nvenc-importlib.ps1`) -
no GPU or NVIDIA SDK at build time.
- `FFMPEG_DIR` (the BtbN gpl-shared x64 tree) for the AMD/Intel AMF/QSV link; NASM + CMake +
`CMAKE_POLICY_VERSION_MINIMUM=3.5` for the CMake-from-source deps (aws-lc, opus).
- **Gotcha:** `CARGO_HOME` must be an ASCII path (a non-ASCII username breaks SDL3's MSVC precompiled
header). The runner uses `C:\Users\Public\.cargo`.
- **`CARGO_TARGET_DIR` for the driver build must be the DEFAULT (in-tree) dir.** `wdk-build`'s
`find_top_level_cargo_manifest()` walks up from `OUT_DIR` to the first ancestor with a `Cargo.lock`; a
relocated `C:\t` target dir hides the workspace lock and the build-script panics "a Cargo.lock file
should exist...". The driver deps have no deep CMake crates, so the in-tree target stays under MAX_PATH.
(The host/client builds *do* relocate to `C:\t` to dodge MAX_PATH - that's the opposite need.)
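
The `CARGO_TARGET_DIR` constraint above follows from a simple upward directory walk. The following is a minimal sketch of that kind of walk, assuming only the behaviour described here (first ancestor of `OUT_DIR` that holds a `Cargo.lock`); it is not `wdk-build`'s actual code, and `nearest_lock_dir` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the nearest ancestor of `start` (including `start`
/// itself) that contains a `Cargo.lock` file, or None if no ancestor has one.
pub fn nearest_lock_dir(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start`, then its parent, and so on up to the root
        .find(|dir| dir.join("Cargo.lock").is_file())
        .map(Path::to_path_buf)
}
```

With an in-tree target dir, `OUT_DIR` sits somewhere below the workspace root, so the walk reaches the workspace `Cargo.lock`. With a relocated target such as `C:\t`, none of the ancestors of `OUT_DIR` is the workspace, so a walk like this either finds nothing or finds an unrelated lock file.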
## 6. The installer - Inno Setup
`pack-host-installer.ps1` orchestrates, in order: resolve a code-signing cert -> sign `punktfunk-host.exe`
-> **build + sign the drivers from source** (`build-pf-vdisplay.ps1` + `build-gamepad-drivers.ps1`,
staged via `stage-pf-vdisplay.ps1` which also fetches/verifies pinned nefcon) -> stage FFmpeg DLLs + the
web console + a portable bun -> build + sign the HDR Vulkan layer -> run `ISCC` on `punktfunk-host.iss`
-> sign `setup.exe`.
`punktfunk-host.iss` (Inno) lays down `{app}`, runs the install steps, and registers things. **Optional
tasks** (all default-checked): install the pf-vdisplay driver, install the gamepad drivers, install the
HDR Vulkan layer, start the service. Silent install: `/VERYSILENT` (omit a task with
`/MERGETASKS="!installdriver"`).
Install-time work runs from `punktfunk-host.exe` subcommands, **not** locale-parsed PowerShell *files* -
the `[Run]` section calls `driver install [--gamepad] --dir <stage>` and `web setup --app-dir <app>
[--password-file <f>]` (`crates/punktfunk-host/src/windows/install.rs`). This is the ANSI-codepage
root fix: PowerShell 5.1 reads a BOM-less `.ps1` *file* in the machine codepage, so a stray non-ASCII
byte aborted the install on a non-English box; a compiled subcommand drives the same external tools as
fixed string literals (the `service install` precedent, see [`windows-service.md`](windows-service.md)).
The `.iss`'s *inline* `-Command` PowerShell is a command-line string, not a file read, so it's unaffected
and stays. Each subcommand is best-effort (a hiccup warns, never aborts the installer):
- **Driver install:** trust the bundled `.cer` (Root + TrustedPublisher), create the `root\pf_vdisplay`
node if absent (nefconc, gated so a re-create can't spawn a phantom), `pnputil /add-driver /install`
(pf-vdisplay) or `pnputil /add-driver` per-inf (gamepads - the host SwDeviceCreate's the devnodes).
A driver hiccup never aborts the install (the host degrades to a physical display).
- **Web console (`web setup`):** write the ACL'd `web-password`, register the `PunktfunkWeb` task (boot,
SYSTEM, restart-on-failure -> `bun` on `:3000`, via a generated UTF-16 Task Scheduler XML), open TCP
3000, start it. Upgrade-safe: stop + reap any old console before re-registering so the new one can
bind. The old console is found by the owner of the `:3000` listener (runtime-agnostic), and the listener
row is identified by its wildcard foreign address, so the localized state word is never parsed.
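
To illustrate the locale-safe listener lookup in the web console step, here is a hedged sketch. It assumes the input looks like `netstat -ano` output (protocol, local address, foreign address, state, PID); the actual tool and parsing in `install.rs` are not shown in this document, and `listener_pid` is a hypothetical name. The point is that a listening socket is recognized by its wildcard foreign address (`0.0.0.0:0` or `[::]:0`), so the translated state column is skipped rather than matched.

```rust
/// Hypothetical helper: extracts the owning PID of the TCP listener on `port`
/// from `netstat -ano`-style text without reading the (localized) state column.
pub fn listener_pid(netstat_output: &str, port: u16) -> Option<u32> {
    let suffix = format!(":{port}");
    netstat_output.lines().find_map(|line| {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Assumed row shape: proto, local address, foreign address, state word(s), PID.
        if cols.len() < 5 || !cols[0].eq_ignore_ascii_case("TCP") {
            return None;
        }
        // A listener has no peer, so its foreign address is the wildcard.
        let is_listener = matches!(cols[2], "0.0.0.0:0" | "[::]:0");
        if cols[1].ends_with(suffix.as_str()) && is_listener {
            // The PID is always the last column, whatever the state word is.
            cols[cols.len() - 1].parse().ok()
        } else {
            None
        }
    })
}
```

Because only the address columns and the last column are read, the same function works whether the state column says `LISTENING` or a translated equivalent.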
**Signing:** the exe/setup/HDR-layer use the **`MSIX_CERT_PFX_B64`/`MSIX_CERT_PASSWORD`** secrets
(`CN=unom`, shared with the client); the **drivers** use a separate cert (self-signed per build, or a
stable `DRIVER_CERT_PFX_B64`) and their own bundled `.cer` - the two never collide. Without the MSIX
secrets, an ephemeral self-signed cert is generated and its `.cer` published next to the installer.
## 7. The web console bundle
The console is a TanStack Start / Nitro SSR app (`web/`). `vite.config.ts` sets `noExternals: true`, so
`bun run build` emits a **self-contained `.output`** (~75 files, deps bundled + tree-shaken, no
`node_modules`/`.npmrc`). The installer ships that `.output` + a portable `bun.exe`; the `PunktfunkWeb`
task runs `bun .output/server/index.mjs` on `:3000`, auto-wired to the host's loopback mgmt API via
`web-run.cmd` (sources `%ProgramData%\punktfunk\mgmt-token` + `web-password`). No node, no node_modules
forest. (`build-web.ps1` is the dev-box rebuild-and-restart helper.)
## 8. CI workflows (`.gitea/workflows/`)
All run on the single self-hosted `windows-amd64` runner (`home-windows-1`), which **serializes** the
whole Windows fleet - a `Cargo.lock`/`packaging/windows/**` touch queues several builds back-to-back.
| Workflow | Trigger | Does |
|----------|---------|------|
| `windows-host.yml` | `crates/punktfunk-host`, `packaging/windows`, `scripts/windows`, `web`, tags `v*` | build host + clippy + HDR layer + web smoke-boot -> pack + sign installer -> publish (canary/latest) |
| `windows-drivers.yml` | `packaging/windows/drivers`, `crates/pf-driver-proto` | probe the driver toolchain + build/test/clippy `pf-driver-proto` + `cargo build` the driver workspace + inspect FORCE_INTEGRITY (the fast driver-only gate; coverage the pack lacks) |
| `windows-drivers-provision.yml` | `provision-windows-wdk.ps1` | one-shot WDK + cargo-wdk provisioning onto the persistent runner |
| `windows.yml` / `windows-msix.yml` | client | build the Windows *client* + its signed MSIX (x64 + ARM64) |
`windows-host.yml` also builds the drivers from source (in pack), so it overlaps `windows-drivers.yml` on
a `drivers/**` edit (two driver builds on the serialized runner). They're kept separate on purpose -
`windows-drivers.yml` is the fast pre-pack gate. **CI builds, never launches the exe** (no GPU on the
runner), so AMF/QSV + on-glass behavior are validated on a real box, not in CI.
## 9. Dev iteration
- **Host:** `scripts/windows/deploy-host.ps1` (build + redeploy the exe to a box), `build-web.ps1`
(rebuild + restart the console).
- **pf-vdisplay driver:** `packaging/windows/drivers/deploy-dev.ps1` (build -> clear FORCE_INTEGRITY ->
sign -> stampinf a strictly-increasing `DriverVer` -> Inf2Cat -> sign -> `-Install`);
`redeploy-pf-vdisplay.ps1` (one-shot: stop host -> install -> reload adapter -> start);
`reset-pf-vdisplay.ps1` (recover a wedged driver: reap ghost monitor nodes + cycle the adapter, no
reboot). Run these elevated; they default to the `PunktfunkHost` service.
- Drive any of these from Linux over SSH:
`ssh user@box 'powershell -ExecutionPolicy Bypass -File C:\...\reset-pf-vdisplay.ps1'`.
- The RTX/on-glass box is where NVENC encode + IDD-push frame flow are validated (CI can't).
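
The strictly-increasing `DriverVer` rule that the driver deploy loop depends on can be sketched as follows. This is an illustration only: the real stamping is done by stampinf from the scripts, the function names here are hypothetical, and the comparison assumes the four version parts are ordered numerically from left to right.

```rust
/// Formats the version part of a `DriverVer` in the `9.9.MMdd.HHmm` scheme.
pub fn driver_ver(month: u32, day: u32, hour: u32, minute: u32) -> String {
    format!("9.9.{:02}{:02}.{:02}{:02}", month, day, hour, minute)
}

/// Parses a four-part dotted version into a tuple that orders part by part.
/// Returns None unless there are exactly four numeric parts.
pub fn parse_ver(v: &str) -> Option<(u32, u32, u32, u32)> {
    let mut parts = v.split('.').map(|p| p.parse::<u32>().ok());
    let parsed = (parts.next()??, parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() { None } else { Some(parsed) }
}

/// True only if `candidate` is strictly newer than `installed`
/// (assumption: numeric, left-to-right comparison of the four parts).
pub fn is_upgrade(installed: &str, candidate: &str) -> bool {
    match (parse_ver(installed), parse_ver(candidate)) {
        (Some(old), Some(new)) => new > old,
        _ => false,
    }
}
```

For example, `is_upgrade("9.9.0626.2319", &driver_ver(6, 26, 23, 20))` is true, while restamping within the same minute is not an upgrade. Under this comparison a January build also ranks below a December one, because the scheme has no year component; how the real scripts deal with that is not covered here.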
## 10. Release
Push a `vX.Y.Z` tag (one tag releases every platform): `windows-host.yml` builds + signs
`punktfunk-host-setup-X.Y.Z.exe` + the public `.cer`, refreshes the `latest/` alias, and attaches them to
the unified Gitea Release. Main pushes publish rolling `0.3.<run>` **canary** builds to `canary/`.
Download: `https://git.unom.io/api/packages/unom/generic/punktfunk-host-windows/{latest,canary}/punktfunk-host-setup.exe`.
## 11. See also
- [`windows-host-rewrite.md`](windows-host-rewrite.md) - host architecture (capture/encode/vdisplay
backends, IDD-push, the rewrite milestones). The architecture source of truth.
- [`windows-service.md`](windows-service.md) - the SYSTEM service + secure-desktop deployment model.
- [`windows-virtual-display-rust-port.md`](windows-virtual-display-rust-port.md) - history of the all-Rust
IddCx driver port (SUPERSEDED in its conclusion: IDD-push became the primary capture path).
- `packaging/windows/pf-vkhdr-layer/README.md` - the HDR Vulkan layer.
- `packaging/windows/README.md` - the file index for `packaging/windows/`.
+70 -134
@@ -1,170 +1,106 @@
# Windows native client — bootstrap handoff # Windows native client — bootstrap handoff
A handoff for an agent picking up the **native Windows punktfunk/1 client**. The host side is done > **Status:** SHIPPED — `clients/windows` (binary `punktfunk-client`), WinUI 3 via `windows-reactor`;
and live-validated on a real RTX 4090; the client is the remaining piece. This doc is the concrete > commits `0a3b92d..0cc36fa`. Build + clippy + fmt green on `x86_64-pc-windows-msvc` and
starting point: the locked decisions, the reference code to port, the stack swaps, the dev loop, and > `aarch64-pc-windows-msvc` (ARM64 cross-compiled off the one x64 runner; signed MSIX for both arches
the gotchas. Read it top to bottom, then start at **Phase 1** (de-risk Reactor first). > via `windows-msix.yml`). This doc is trimmed to design rationale, the HDR reference, hard-won
> gotchas, and open items. The shipped source under `clients/windows/src/` is the truth.
## Status — WinUI 3 client landed (2026-06-15) The native Windows punktfunk/1 client connects to a host (`serve` / `punktfunk1-host`), decodes HEVC,
presents it low-latency on a `SwapChainPanel`, plays Opus audio, and captures local
mouse/keyboard/gamepad to send back — the Windows analogue of the GTK4 Linux client
(`clients/linux`), which was the architectural template. The locked decisions below are the durable
"why"; the HDR section is the evergreen present reference.
The client is implemented in `clients/windows` (binary `punktfunk-client`) and is ## Locked decisions (the "why")
**build + clippy + fmt green on `x86_64-pc-windows-msvc` and `aarch64-pc-windows-msvc`** (the ARM64
target cross-compiled off the one x64 runner — see `windows.yml`; signed MSIX for both arches via
`windows-msix.yml`). It is the **WinUI 3** client this doc planned: native chrome (host list,
settings, in-app SPAKE2 PIN pairing) + the video on a **`SwapChainPanel`**, all in pure Rust.
- **Reactor is viable after all — it is what we use.** The locked decision held. windows-rs
[PR #4499](https://github.com/microsoft/windows-rs/pull/4499) (merged 2026-06-01) added a
`SwapChainPanel` widget to **`windows-reactor`** with `set_swap_chain` over
`CreateSwapChainForComposition` — so a DXGI presenter *can* be hosted. (An earlier read that Reactor
had no swapchain hatch was wrong/stale.) The UI is a declarative React-like tree
(`App::new().render(app)`, `use_state`/`use_resource`/`use_effect` hooks, `list_view`/`text_box`/
`combo_box`/`content_dialog`/`button`/`ToggleSwitch`); the video page is `swap_chain_panel()
.on_ready(|p| p.set_swap_chain(&sc))` driven by `on_rendering`. **`present.rs`** owns the D3D11
composition swapchain (WARP fallback, runtime shaders, Contain-fit) — the same renderer, bound to
the panel instead of an HWND.
- **windows-reactor is unpublished** (`version 0.0.0`) and fast-moving — depend on it as a **git dep
pinned to a commit** (`b4129fcc`), and pin the `windows` crate to the **same commit** so the
`IDXGISwapChain1` you pass to `set_swap_chain` satisfies reactor's `windows_core::Interface`. Its
`build.rs` downloads the Windows App SDK NuGets (Foundation/Interactive/Runtime) and stages the
bootstrap DLL + `resources.pri` next to the exe; it **`.unwrap()`s `CARGO_WORKSPACE_DIR`**, so set
it in the build env (`CARGO_WORKSPACE_DIR=C:\Users\Public\punktfunk`). It writes `/temp` + `/winmd`
to the workspace root (gitignored). The App SDK runtime must be installed to *run*.
- **Stream input is Win32 low-level hooks**, not XAML: reactor exposes only keyboard *accelerators* +
pointer *button-state* (no raw key-down/up, no pointer position, no wheel), insufficient for a game
stream. `input.rs` installs `WH_KEYBOARD_LL`/`WH_MOUSE_LL` on the stream page (uninstalled on exit),
maps the pointer through the window client rect, sends native VK + abs mouse + wheel, with a
Ctrl+Alt+Shift+Q capture toggle. (A future alternative: generate `Microsoft.UI.Xaml.UIElement`
bindings from the staged winmd and subscribe to `KeyDown`/`PointerMoved` — scoped to the panel.)
- **Build gotcha:** `CARGO_HOME` must be on an **ASCII path** (`C:\Users\Public\.cargo`). SDL3's
`build-from-source` PCH embeds the registry source path; the `ü` in the dev box's username makes
MSVC fail (`MSB8084` / `C4828`).
- **Still pending:** **on-glass validation** — the dev VM is headless / SSH Session 0, so the WinUI
window can't show there; validate over RDP or on the RTX box. Then **D3D11VA hardware decode** +
**10-bit/HDR present**, RAWINPUT relative-mouse pointer-lock, and a per-host speed test in the UI.
## What we're building
A native Windows client that connects to a punktfunk/1 host (`serve` / `punktfunk1-host`), decodes
HEVC, presents it low-latency, plays Opus audio, and captures local mouse/keyboard/gamepad to send
back — i.e. the Windows analogue of the **GTK4 Linux client** (`clients/linux`),
which is the architectural template. The Windows client is close to a 1:1 port of the Linux client
with the platform layers swapped.
## Locked decisions (from the Windows-host/client plan, `design/windows-host.md` + project memory)
- **Pure Rust.** `windows-rs` + **Windows App SDK "Reactor"** (WinUI 3 from Rust, merged windows-rs - **Pure Rust.** `windows-rs` + **Windows App SDK "Reactor"** (WinUI 3 from Rust, merged windows-rs
PR #4479). No C++/C#. De-risk Reactor + `SwapChainPanel` FIRST — it's the only novel/uncertain PR #4479). No C++/C#. Reactor + `SwapChainPanel` was the only novel/uncertain piece and was
piece; everything else is a known-good port. de-risked first; everything else is a known-good port of the Linux client.
- Reactor is viable: windows-rs [PR #4499](https://github.com/microsoft/windows-rs/pull/4499)
(merged 2026-06-01) added a `SwapChainPanel` widget to `windows-reactor` with `set_swap_chain`
over `CreateSwapChainForComposition`, so a DXGI presenter *can* be hosted. (An earlier read that
Reactor had no swapchain hatch was wrong/stale.) The UI is a declarative React-like tree
(`App::new().render(app)`, `use_state`/`use_resource`/`use_effect` hooks); the video page is
`swap_chain_panel().on_ready(|p| p.set_swap_chain(&sc))` driven by `on_rendering`.
- **Links `punktfunk-core` directly** (Cargo path dep, `features = ["quic"]`) — **no C ABI**, exactly - **Links `punktfunk-core` directly** (Cargo path dep, `features = ["quic"]`) — **no C ABI**, exactly
like the GTK client. `NativeClient` is already `Sync` (mutexed plane receivers), so it drops into a like the GTK client, *unlike* the Apple path. `NativeClient` is already `Sync` (mutexed plane
UI app cleanly. The C ABI (`punktfunk_connect` + `next_au`/`next_audio`/`next_rumble`/`next_hidout`/ receivers), so it drops into a UI app cleanly. The C ABI (`punktfunk_connect` + `next_au`/
`send_input`/`send_rich_input`) is the *Apple* path; the native Rust clients call `next_audio`/`next_rumble`/`next_hidout`/`send_input`/`send_rich_input`) is the *Apple* path; the
`crates/punktfunk-core/src/client.rs` (`NativeClient`) methods directly. native Rust clients call `crates/punktfunk-core/src/client.rs` (`NativeClient`) methods directly.
- **Video widget = WinUI 3 `SwapChainPanel`** (built-in), fed a D3D11 swapchain via - **Video widget = WinUI 3 `SwapChainPanel`** (built-in), fed a D3D11 swapchain via
`ISwapChainPanelNative::SetSwapChain`. `ISwapChainPanelNative::SetSwapChain`. `present.rs` owns the D3D11 composition swapchain (WARP
fallback, runtime shaders, Contain-fit) — the same renderer, bound to the panel instead of an HWND.
- **Decode = FFmpeg-next + D3D11VA** (HEVC; **Main10** for 10-bit/HDR — see below). - **Decode = FFmpeg-next + D3D11VA** (HEVC; **Main10** for 10-bit/HDR — see below).
- **Audio playback = WASAPI render** + Opus decode (`opus` crate, vendors libopus via cmake; set - **Audio playback = WASAPI render** + Opus decode (`opus` crate, vendors libopus via cmake).
`CMAKE_POLICY_VERSION_MINIMUM=3.5`).
- **Input capture→send**: the client captures LOCAL input and sends it. Mouse (abs + relative) + - **Input capture→send**: the client captures LOCAL input and sends it. Mouse (abs + relative) +
keyboard via the **inverse VK table** (port `keymap.rs`); gamepad via **SDL3** (already a workspace keyboard via the **inverse VK table** (Windows VK is the native source, so simpler than Linux);
dep, cross-platform) → `NativeClient::send_input`/`send_rich_input`. (`SendInput`/`ViGEm` are gamepad via **SDL3** (already a workspace dep, cross-platform) → `NativeClient::send_input`/
HOST-side injection — not used by the client.) `send_rich_input`. (`SendInput`/`ViGEm` are HOST-side injection — not used by the client.)
- **Stream input is Win32 low-level hooks**, not XAML: reactor exposes only keyboard *accelerators*
+ pointer *button-state* (no raw key-down/up, no pointer position, no wheel), insufficient for a
game stream. `input.rs` installs `WH_KEYBOARD_LL`/`WH_MOUSE_LL` on the stream page (uninstalled
on exit), maps the pointer through the window client rect, sends native VK + abs mouse + wheel,
with a Ctrl+Alt+Shift+Q capture toggle. (A future alternative: generate
`Microsoft.UI.Xaml.UIElement` bindings from the staged winmd and subscribe to `KeyDown`/
`PointerMoved` — scoped to the panel.)
- **Discovery = `mdns-sd`** (cross-platform, browses `_punktfunk._udp`). - **Discovery = `mdns-sd`** (cross-platform, browses `_punktfunk._udp`).
- **Trust = shared client identity + SPAKE2 PIN pairing + TOFU** (port `trust.rs`; same identity - **Trust = shared client identity + SPAKE2 PIN pairing + TOFU** (`trust.rs`; same identity
files/logic as the other native clients). files/logic as the other native clients).
## The reference: `clients/linux/src/` ## 10-bit + HDR (the present reference)
Port these files (near 1:1; only the platform layers change): The host negotiates and emits **HEVC Main10 + BT.2020 PQ HDR10** when the captured desktop is HDR
(and 10-bit SDR Main10 when negotiated). The Windows client mirrors the Apple present:
| Linux file | Role | Windows swap |
|---|---|---|
| `main.rs` / `app.rs` | app shell, lifecycle | WinUI 3 `App`/`Window` via Reactor |
| `ui_hosts.rs` | host list / connect screen | WinUI 3 page |
| `ui_settings.rs` | settings | WinUI 3 page |
| `ui_stream.rs` | the streaming view | WinUI 3 page hosting `SwapChainPanel` |
| `video.rs` | FFmpeg decode + present | FFmpeg **D3D11VA** → D3D11 swapchain in `SwapChainPanel` |
| `audio.rs` | Opus decode + playback | **WASAPI render** (was PipeWire) |
| `session.rs` | `NativeClient` connect + plane pumps | **reuse almost verbatim** (core is cross-platform) |
| `trust.rs` | identity, PIN, TOFU | **reuse almost verbatim** |
| `discovery.rs` | mDNS browse | **reuse verbatim** (`mdns-sd`) |
| `keymap.rs` | inverse VK table | reuse; Windows VK is the native source so this is *simpler* |
| `gamepad.rs` | SDL3 pad capture + rumble/feedback | **reuse almost verbatim** (SDL3 is cross-platform) |
`session.rs`, `trust.rs`, `discovery.rs`, `keymap.rs`, `gamepad.rs` are mostly platform-neutral
(they touch `punktfunk-core` + SDL3 + mdns, all cross-platform) — expect to reuse them with minimal
changes. The real work is `video.rs` (D3D11VA + swapchain), `audio.rs` (WASAPI), and the WinUI shell.
## 10-bit + HDR (NEW — landed this session, the client MUST handle it)
The host now negotiates and emits **HEVC Main10 + BT.2020 PQ HDR10** when the captured desktop is
HDR (and 10-bit SDR Main10 when negotiated). The Apple client already does the matching present; the
Windows client should mirror it:
- **Advertise caps** in the `Hello`: `video_caps = VIDEO_CAP_10BIT | VIDEO_CAP_HDR` - **Advertise caps** in the `Hello`: `video_caps = VIDEO_CAP_10BIT | VIDEO_CAP_HDR`
(`crates/punktfunk-core/src/quic.rs`). The host enables 10-bit only if the client advertised it. (`crates/punktfunk-core/src/quic.rs`). The host enables 10-bit only if the client advertised it.
(The native-client connector in `client.rs` currently hardcodes `video_caps: 0` with a TODO —
thread the real caps through when you wire decode; or detect HDR purely in-band, see next.)
- **Detect HDR in-band** from the HEVC VUI (transfer characteristics = SMPTE ST 2084 / PQ), exactly - **Detect HDR in-band** from the HEVC VUI (transfer characteristics = SMPTE ST 2084 / PQ), exactly
like the Apple client's `VideoDecoder.isHDRFormat` (`clients/apple/Sources/PunktfunkKit/`). This like the Apple client's `VideoDecoder.isHDRFormat` (`clients/apple/Sources/PunktfunkKit/`). This
handles a mid-session HDR toggle without renegotiation. `Welcome.bit_depth` (8/10) is also available. handles a mid-session HDR toggle without renegotiation. `Welcome.bit_depth` (8/10) is also
available.
- **Decode** Main10 → **P010** (10-bit) via D3D11VA. - **Decode** Main10 → **P010** (10-bit) via D3D11VA.
- **Present HDR**: swapchain in `DXGI_FORMAT_R10G10B10A2_UNORM` (or `R16G16B16A16_FLOAT`), - **Present HDR**: swapchain in `DXGI_FORMAT_R10G10B10A2_UNORM` (or `R16G16B16A16_FLOAT`),
`IDXGISwapChain3::SetColorSpace1(DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020)` + `IDXGISwapChain3::SetColorSpace1(DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020)` + `SetHDRMetaData`
`SetHDRMetaData` for HDR10; the host's stream is BT.2020 PQ, so present PQ. For SDR, the existing for HDR10; the host's stream is BT.2020 PQ, so present PQ. For SDR, the existing
`DXGI_FORMAT_B8G8R8A8_UNORM` + BT.709 path. (The host-side HDR conversion math is in `DXGI_FORMAT_B8G8R8A8_UNORM` + BT.709 path. (The host-side HDR conversion math is in
`crates/punktfunk-host/src/capture/dxgi.rs` `HDR_PS`/`HdrConverter` if you need the inverse.) `crates/punktfunk-host/src/capture/dxgi.rs` `HDR_PS`/`HdrConverter` if you need the inverse.)
## Dev boxes ## Build gotchas (hard-won)
- **No-GPU dev box (UI + connect + software decode):** `ssh "Enrico Bühler"@192.168.1.57` — Win11 Pro - **`CARGO_HOME` must be an ASCII path** (`C:\Users\Public\.cargo`). SDL3's `build-from-source` PCH
25H2 (build 26200), QEMU Q35, 8 vCPU/12 GB, **no working GPU** (so no NVENC, no D3D11VA hardware embeds the registry source path; the `ü` in the dev box's username makes MSVC fail (`MSB8084` /
decode — use FFmpeg software decode here; this box is for UI/connect/protocol work). Has Rust 1.96 `C4828`). Build under an ASCII path generally (the same `ü` triggers `LNK1201` PDB-write failures
MSVC, VS 2026 + VC tools + Win SDK, Win App Runtime 2.2, SudoVDA + Parsec VDD. under `~/Developer`).
- **Real-GPU box (HDR / hardware decode / end-to-end):** `ssh "Enrico Bühler"@192.168.1.174` — Win11,
RTX 4090, runs the host. Use it to test the client against a live HDR host.
### Dev-loop gotchas (both boxes)
- **Build under an ASCII path** (`C:\Users\Public\…`). The username "Enrico Bühler" has a `ü` → MSVC
`LNK1201` PDB-write failure under `~/Developer`.
- **Toolchain gaps:** `winget install NASM.NASM Kitware.CMake LLVM.LLVM` (aws-lc-rs on the quic path,
ffmpeg-sys needs libclang).
- **`CMAKE_POLICY_VERSION_MINIMUM=3.5`** in the build env (CMake 4 rejects libopus's old minimum). - **`CMAKE_POLICY_VERSION_MINIMUM=3.5`** in the build env (CMake 4 rejects libopus's old minimum).
- **File transfer = `sftp`** (scp is broken under the PowerShell DefaultShell): - **Toolchain:** `winget install NASM.NASM Kitware.CMake LLVM.LLVM` (NASM for aws-lc-rs on the quic
`printf 'put %s /C:/Users/Public/REL/PATH\n' LOCAL | sftp -b - "Enrico Bühler@192.168.1.57"` path; libclang/LLVM for ffmpeg-sys).
note the **leading slash** `/C:/…`. Let the VM regenerate its own `Cargo.lock` (don't transfer it). - **windows-reactor is unpublished** (`version 0.0.0`) and fast-moving — depend on it as a **git dep
pinned to a commit** (`b4129fcc`), and pin the `windows` crate to the **same commit** so the
`IDXGISwapChain1` you pass to `set_swap_chain` satisfies reactor's `windows_core::Interface`. Its
`build.rs` downloads the Windows App SDK NuGets (Foundation/Interactive/Runtime), stages the
bootstrap DLL + `resources.pri` next to the exe, and **`.unwrap()`s `CARGO_WORKSPACE_DIR`** — set
it in the build env (`CARGO_WORKSPACE_DIR=C:\Users\Public\punktfunk`). It writes `/temp` + `/winmd`
to the workspace root (gitignored). The App SDK runtime must be installed to *run*.
- **Windows clippy is stricter** than Linux CI and `cfg(windows)` code is excluded from Linux CI → - **Windows clippy is stricter** than Linux CI and `cfg(windows)` code is excluded from Linux CI →
run `cargo clippy -p punktfunk-client-windows -- -D warnings` ON THE VM before committing. run `cargo clippy -p punktfunk-client-windows -- -D warnings` ON the Windows box before committing.
- Work on `main`; fetch+merge `origin/main` before pushing.
## Suggested phased plan ## Open items
1. **De-risk Reactor (do this first).** A windows-rs Reactor (WinUI 3) hello-world that hosts a - **On-glass validation on the RTX box** of **D3D11VA hardware decode** + **10-bit/HDR present** +
`SwapChainPanel` and presents a cleared D3D11 swapchain into it. Confirm the windows-rs Reactor the **WinUI GUI** — the dev VM is headless / SSH Session 0 / WARP, so the WinUI window can't show
version/API (PR #4479) and `ISwapChainPanelNative::SetSwapChain` interop. If Reactor proves too there and there is no hardware decode. Validate over RDP or on the RTX box against a live HDR host.
raw, the fallback is `winit` + a child HWND swapchain, but try Reactor first per the decision. - **RAWINPUT relative-mouse pointer-lock** for the stream view.
2. **Crate scaffold.** `clients/windows`, `[target.'cfg(windows)'.dependencies]`: - **Per-host speed-test widget** in the UI.
`punktfunk-core { path, features=["quic"] }`, `windows`, the Reactor crate, `ffmpeg-next`, `opus`,
`sdl3`, `mdns-sd`, `anyhow`, `tracing`. Mirror `clients/linux/Cargo.toml`.
3. **Connect + control plane.** Port `session.rs` + `trust.rs`; validate headless against the 4090
box (`punktfunk1-host`/`serve`) — handshake, PIN/TOFU, plane counters — before any UI/decode.
4. **Decode + present.** FFmpeg D3D11VA → `SwapChainPanel`. SDR (8-bit BGRA) first, then **P010 +
HDR colorspace** (see the HDR section).
5. **Audio.** WASAPI render + Opus decode (port `audio.rs`).
6. **Input.** Mouse + keyboard capture→send (port `keymap.rs`), gamepad via SDL3 (port `gamepad.rs`),
feedback from `next_rumble`/`next_hidout`.
7. **Discovery + UI.** Port `discovery.rs` + `ui_hosts.rs` + `ui_settings.rs` to WinUI pages.
## Key references ## Key references
- **Template:** `clients/linux/src/*` (the client to port). - **Full Windows plan + SudoVDA/host details:** `design/windows-host.md`.
- **Apple HDR present** (the pattern to mirror): `clients/apple/Sources/PunktfunkKit/{VideoDecoder, - **Template ported from:** `clients/linux/src/*`.
- **Apple HDR present** (the pattern mirrored): `clients/apple/Sources/PunktfunkKit/{VideoDecoder,
MetalVideoPresenter,Stage2Pipeline}.swift` — in-band PQ detection, P010 decode, EDR present. MetalVideoPresenter,Stage2Pipeline}.swift` — in-band PQ detection, P010 decode, EDR present.
- **Core client API:** `crates/punktfunk-core/src/client.rs` (`NativeClient`). - **Core client API:** `crates/punktfunk-core/src/client.rs` (`NativeClient`).
- **Protocol:** `crates/punktfunk-core/src/quic.rs` (`Hello.video_caps`, `Welcome.bit_depth`, - **Protocol:** `crates/punktfunk-core/src/quic.rs` (`Hello.video_caps`, `Welcome.bit_depth`,
`VIDEO_CAP_10BIT`/`VIDEO_CAP_HDR`). `VIDEO_CAP_10BIT`/`VIDEO_CAP_HDR`).
- **Full Windows plan + SudoVDA/host details:** `design/windows-host.md`. - **Host HDR conversion (inverse math):** `crates/punktfunk-host/src/capture/dxgi.rs` (`HDR_PS`,
- **Host HDR conversion (for the inverse math):** `crates/punktfunk-host/src/capture/dxgi.rs` `HdrConverter`) + `crates/punktfunk-host/src/encode/nvenc.rs` (BT.2020/PQ VUI).
(`HDR_PS`, `HdrConverter`) + `crates/punktfunk-host/src/encode/nvenc.rs` (BT.2020/PQ VUI).
+72 -170
@@ -1,25 +1,27 @@
 # Windows virtual DualSense — game detection handoff
+> **Status:** Identity fix SHIPPED (commits `6db3525`, `aa159df`, `4a73102`) —
+> `crates/punktfunk-host/src/inject/windows/dualsense_windows.rs` (`create_swdevice`). This doc is trimmed
+> to the root-cause analysis, the SwDeviceCreate identity rationale, the GameInput fallback design, and the
+> still-open on-glass Cyberpunk verification. The implementation walkthrough, probe tooling, and the
+> (now-fixed) secondary driver gaps are cut — the shipped code is the source of truth.
 Goal: get the host's virtual DualSense **detected and usable in games** (Cyberpunk's native PS5 path +
-others) on the Windows host. This doc is the portable handoff (the investigation lives here, not in any
-one agent's memory). Run the experiments **on the Windows host** (`.173`, repo at
-`C:\Users\Public\punktfunk-native`).
-## Status (2026-06-22)
-- **Input works.** Client → host → virtual DualSense → games read input. Verified in Steam's controller
-test (buttons/sticks).
-- **The HID is a CORRECT, COMPLETE DualSense.** An SDL3 probe reports our live device as
-`name='DualSense Wireless Controller' vid=0x054C pid=0x0CE6 isGamepad=True gamepadType=PS5`. SDL =
-HIDAPI = what Steam (and many games) build on → that's why Steam works. So the report descriptor,
-feature reports, and identity are right; this is **not** a descriptor/feature-report problem.
-- **Cyberpunk's native DualSense path does NOT detect it at all.** (Steam Input was off — Cyberpunk was
-reading the raw HID.)
-- **Rumble:** host-side is proven working (driver captures the game's `0x02`, `parse_ds_output` extracts
-the motors, host forwards `0xCA` — log: `rumble: forwarding to client (0xCA) low=16128 high=16128`).
-The break is the **client** (macOS) not rendering `0xCA` onto the physical pad. Separate task/agent.
-## Root cause — CONFIRMED (2026-06-22, run live on the interactive desktop, console session 3)
+others) on the Windows host. Run the decisive experiments **on the interactive desktop of the Windows host**
+(`.173`) — not over SSH.
+## Where it works / where it doesn't
+- **Input works.** Client → host → virtual DualSense → games read input (verified in Steam's controller
+test).
+- **The HID is a CORRECT, COMPLETE DualSense.** SDL3 reports the live device as
+`name='DualSense Wireless Controller' vid=0x054C pid=0x0CE6 isGamepad=True gamepadType=PS5`. SDL = HIDAPI =
+what Steam (and many games) build on → that's why Steam works. This is **not** a descriptor/feature-report
+problem.
+- **Cyberpunk's native DualSense path does NOT detect it at all** (Steam Input was off — Cyberpunk was
+reading the raw HID). This is the problem the identity fix targets; on-glass confirmation is still open.
+## Root cause — the PnP identity, not the HID descriptor (CONFIRMED, run live in console session 3)
 The break is the device's **PnP identity / device-interface path**, not the HID descriptor or feature
 reports. `hidclass` derives the HID child's path token and its `HID\VID_054C&PID_0CE6` hardware-ids from the
@@ -40,7 +42,7 @@ Device-interface paths (from `HKLM\SYSTEM\CurrentControlSet\Control\DeviceClasse
 | Real DualShock 4 (USB, registry remnant) | `\\?\HID#VID_054C&PID_05C4&REV_0100#…` |
 | Real DualSense (BT, registry remnant) | `\\?\HID#{00001124-…}_VID&0002054c_PID&0ce6#…` |
-**Cross-API enumeration (the decisive experiment — impossible over SSH, run live in the console session):**
+**Cross-API enumeration matrix (the decisive experiment — impossible over SSH, run live in the console):**
 | API | Sees our virtual DS5? | Identity reported | Reads from |
 | --- | --- | --- | --- |
@@ -54,26 +56,19 @@ Device-interface paths (from `HKLM\SYSTEM\CurrentControlSet\Control\DeviceClasse
 The GameInput result is the clincher: it **does** enumerate our pad — descriptor fingerprint matches exactly
 (15 buttons, 6 axes, 1 hat, usage Game Pad 0x05) — but reports **vid/pid = 0**, while it reads the real
 8BitDo's `vid=0x3434` correctly. So GameInput (and, by the same logic, a native PS5 path) takes VID/PID from
-the **PnP device path / hardware-ids, NOT from `HIDD_ATTRIBUTES`**, and ours carry no `VID_054C&PID_0CE6`.
-Everything that reads attributes directly (SDL / RawInput / WGI-raw) is fine; everything that keys off the
-device *identity/path* (GameInput, native DualSense detection) sees a generic, unidentified gamepad → no
-PS5 path.
+the **PnP device path / hardware-ids, NOT from `HIDD_ATTRIBUTES`**. Everything that reads attributes directly
+(SDL / RawInput / WGI-raw) is fine; everything that keys off the device *identity/path* (GameInput, native
+DualSense detection) sees a generic, unidentified gamepad → no PS5 path.
 **⇒ The fix must put `VID_054C&PID_0CE6` into the device-interface path and the `HID\VID&PID` hardware-ids**
-(give the device a real-USB-like PnP identity), not merely correct `HIDD_ATTRIBUTES`. See "Fix options".
+(give the device a real-USB-like PnP identity), not merely correct `HIDD_ATTRIBUTES`.
**Secondary driver gaps found (not the detection blocker, but fix while here):** ## The fix — SwDeviceCreate identity (shipped)
- `IOCTL_HID_GET_STRING` (id 4, ioctl `0x000b0013`) returns `STATUS_NOT_IMPLEMENTED` — a game polls it
repeatedly (seen live in `pfds-driver.log`). Implement manufacturer / product / serial strings
(`"DualSense Wireless Controller"`, a serial). Native PS5 code can read the serial to tell USB from BT.
- `DS_FEATURE_CALIBRATION` is **42** bytes but the report descriptor declares feature `0x05` as **41**
(`0x95 0x28` = 40 data + 1 id). Trim to 41 (motion-only; SDL accepts it regardless).
## Fix — implemented & validated at the identity layer (2026-06-22) `create_swdevice` sets the identity via **`SW_DEVICE_CREATE_INFO` struct fields** (NOT `pProperties` — a
`DEVPROPERTY` write of these PnP-owned identity keys is empirically *ignored*; the create-time struct fields
are the supported lever, confirmed on `.173`):
`create_swdevice` (`inject/dualsense_windows.rs`) now sets, via **`SW_DEVICE_CREATE_INFO` struct fields**
(NOT `pProperties` — empirically a `DEVPROPERTY` write of these PnP-owned identity keys is ignored; the
create-time struct fields are the supported lever, confirmed on `.173`):
- **`pszzCompatibleIds`** = `USB\VID_054C&PID_0CE6`, `USB\Class_03&SubClass_00&Prot_00`, `USB\Class_03` - **`pszzCompatibleIds`** = `USB\VID_054C&PID_0CE6`, `USB\Class_03&SubClass_00&Prot_00`, `USB\Class_03`
(Windows appends `SWD\Generic`). HIDAPI/SDL/libScePad walk HID-child → `CM_Get_Parent` → this parent's (Windows appends `SWD\Generic`). HIDAPI/SDL/libScePad walk HID-child → `CM_Get_Parent` → this parent's
CompatibleIds and string-match `"USB"`**`bus_type` now resolves to USB** (was UNKNOWN). CompatibleIds and string-match `"USB"`**`bus_type` now resolves to USB** (was UNKNOWN).
@@ -81,161 +76,68 @@ create-time struct fields are the supported lever, confirmed on `.173`):
`USB\VID_054C&PID_0CE6&REV_0100`, `USB\VID_054C&PID_0CE6`. hidclass then derives the real-DS5 child ids `USB\VID_054C&PID_0CE6&REV_0100`, `USB\VID_054C&PID_0CE6`. hidclass then derives the real-DS5 child ids
**`HID\VID_054C&PID_0CE6[&REV_0100]`** (previously only `HID\VID_054C&UP:0001_U:0005`). **`HID\VID_054C&PID_0CE6[&REV_0100]`** (previously only `HID\VID_054C&UP:0001_U:0005`).
- **`pContainerId`** = a deterministic per-pad GUID `{50464453-0000-0000-0000-00000000000<idx>}` ("PFDS") - **`pContainerId`** = a deterministic per-pad GUID `{50464453-0000-0000-0000-00000000000<idx>}` ("PFDS")
(avoids the null-sentinel-ContainerId `xinput1_4` slot-skip bug; groups the pad's devnodes). avoids the null-sentinel-ContainerId `xinput1_4` slot-skip bug, and groups the pad's devnodes.
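The identity values listed above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the shipped `create_swdevice`: the helper names `multi_sz` and `container_id_string` are hypothetical, the pad index is assumed to be a single hex digit (as the `…00000000000<idx>` template suggests), the upper-case hex digit is a guess, and the code only builds the values; it never calls `SwDeviceCreate`.

```rust
/// Encode a list of strings as a Windows multi-sz UTF-16 buffer, the shape a
/// `pszz…` field expects: each entry is NUL-terminated and the list ends with
/// one extra NUL. (Hypothetical helper, not taken from the repo.)
fn multi_sz(items: &[&str]) -> Vec<u16> {
    let mut out = Vec::new();
    for item in items {
        out.extend(item.encode_utf16());
        out.push(0); // terminator for this entry
    }
    out.push(0); // terminator for the whole list
    out
}

/// Format the deterministic per-pad ContainerId from the handoff.
/// 0x50464453 is ASCII "PFDS". Assumption: `idx` fits in one hex digit.
fn container_id_string(idx: u8) -> Option<String> {
    if idx > 0xF {
        return None; // the template leaves room for a single digit only
    }
    Some(format!("{{50464453-0000-0000-0000-00000000000{:X}}}", idx))
}

/// The compatible ids quoted in the doc, in the order given there.
fn compatible_ids() -> Vec<u16> {
    multi_sz(&[
        r"USB\VID_054C&PID_0CE6",
        r"USB\Class_03&SubClass_00&Prot_00",
        r"USB\Class_03",
    ])
}
```

Keeping the GUID deterministic per index means a pad re-created at the same index lands in the same container, which is the grouping behaviour the bullet describes.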
**Validated live** (real shipping path, `dualsense-windows-test --index 1` alongside the running service's **Validated live** (real shipping path, `dualsense-windows-test --index 1` alongside the running service's
pad 0): INF still binds (`Service=MsHidUmdf`), parent CompatibleIds/HardwareIds + per-pad ContainerId set, pad 0): INF still binds (`Service=MsHidUmdf`), parent CompatibleIds/HardwareIds + per-pad ContainerId set,
the HID child gains `HID\VID_054C&PID_0CE6`, and the HIDAPI parent-walk reports **bus_type=USB**. the HID child gains `HID\VID_054C&PID_0CE6`, and the HIDAPI parent-walk reports **bus_type=USB**. SDL /
SDL / RawInput / WGI `RawGameController` identity stays correct (054C:0CE6). RawInput / WGI `RawGameController` identity stays correct (054C:0CE6).
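The parent-walk bus check mentioned above reduces to a prefix match on the parent's CompatibleIds. The sketch below is a simplified stand-in, assuming the ids have already been read from the parent devnode; `BusType` and `detect_bus_type` are hypothetical names, and real HIDAPI inspects more cases than the two shown here.

```rust
/// Bus classification, reduced to the two prefixes the doc mentions.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BusType {
    Usb,
    Bluetooth,
    Unknown,
}

/// Classify a device from its parent's CompatibleIds by matching `^USB` / `^BTH`.
/// Assumption: the caller already walked HID child -> parent and collected the ids.
fn detect_bus_type(parent_compatible_ids: &[&str]) -> BusType {
    for id in parent_compatible_ids {
        let upper = id.to_ascii_uppercase();
        if upper.starts_with("USB") {
            return BusType::Usb;
        }
        if upper.starts_with("BTH") {
            return BusType::Bluetooth;
        }
    }
    BusType::Unknown
}
```

With only `SWD\Generic` on the parent the function returns `Unknown`, which matches the "was UNKNOWN" state described before the identity fix.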
**Remaining gap (NOT fixed by the above): GameInput VID/PID still reads 0.** GameInput parses VID/PID from **Why this may still not satisfy GameInput / a native PS5 path:** GameInput parses VID/PID from the HID
the HID child's **instance path** (`HID\punktfunk\1&…`), which carries no `VID_…&PID_…` token; neither child's **instance path** (`HID\punktfunk\1&…`), which carries no `VID_…&PID_…` token; neither CompatibleIds
CompatibleIds nor HardwareIds change the instance path. Only a real USB-bus instance path nor HardwareIds change the instance path. Only a real USB-bus instance path (`HID\VID_054C&PID_0CE6\…`) does —
(`HID\VID_054C&PID_0CE6\…`) does — i.e. a **ViGEm-style KMDF USB-emulating bus driver** (the rank-3, last i.e. a **ViGEm-style KMDF USB-emulating bus driver** (see fallback below). Prior art (HIDMaestro) shows pure
resort). Pursue only if a target title uses GameInput AND the identity fix above doesn't satisfy it; prior user-mode pads ARE accepted by WGI/GameInput, so other parity (descriptor / strings / mapping) may matter
art (HIDMaestro) shows pure user-mode pads ARE accepted by WGI/GameInput, so other parity (descriptor / more than a genuine USB bus.
strings / mapping) may matter more than a genuine USB bus.
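To make the instance-path argument concrete, here is a small sketch of how a consumer might pull VID/PID out of a PnP path token. It is an illustration of the doc's hypothesis only: GameInput's real parsing is not known here, `parse_vid_pid` is a hypothetical helper, and it handles just the USB-style `VID_xxxx&PID_xxxx` token (not the Bluetooth `_VID&…_PID&…` form from the table above).

```rust
/// Extract (vid, pid) from a `VID_xxxx&PID_xxxx` token in a PnP path, if present.
/// Returns None when the path carries no such token.
fn parse_vid_pid(path: &str) -> Option<(u16, u16)> {
    // ASCII upper-casing keeps byte offsets stable, so slicing below stays valid.
    let upper = path.to_ascii_uppercase();
    let vid_at = upper.find("VID_")?;
    let vid_hex = upper.get(vid_at + 4..vid_at + 8)?;
    let vid = u16::from_str_radix(vid_hex, 16).ok()?;
    // The PID token must follow immediately as `&PID_xxxx`.
    let rest = upper.get(vid_at + 8..)?;
    let pid_hex = rest.strip_prefix("&PID_")?.get(..4)?;
    let pid = u16::from_str_radix(pid_hex, 16).ok()?;
    Some((vid, pid))
}
```

A path shaped like `HID\VID_054C&PID_0CE6\…` yields the Sony ids, while `HID\punktfunk\1&…` yields nothing, which is consistent with the vid/pid = 0 reading reported for the virtual pad.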
## Next steps ## GameInput fallback design (rank-3, only if needed)
> **Deployed to `.173` (2026-06-22):** the host identity fix is live in the `PunktfunkHost` service (release If a target title uses **GameInput** AND the shipped identity fix above doesn't satisfy it, the last-resort
> rebuilt + restarted) and the driver fixes are installed + signed (`oem74.inf`, `punktfunk-ds-test` cert). option is a **rank-3 KMDF USB-emulating bus driver** (the way ViGEmBus presents a real-looking device)
> The box is ready for the decisive on-glass test. A rollback copy of the prior driver is at instead of SwDeviceCreate + UMDF-HID — it produces a genuine `HID\VID_054C&PID_0CE6\…` instance path, the one
> `C:\Users\Public\giprobe\driver-backup-oem74`. thing GameInput keys off. Pursue this only if required by a target title; it is heavier than the user-mode
path and HIDMaestro suggests user-mode pads can be made acceptable to GameInput without it.
1. **Decisive on-glass test (only the user can run):** launch Cyberpunk 2077 with Steam Input OFF against a ## On-glass Cyberpunk verification procedure (open — only the user can run it)
virtual DS5 carrying the new identity; check the in-game glyphs/prompt switch to DualSense. Cleanest
single-pad test (frees the service's pad 0 so only the new-identity pad is present):
`sc stop PunktfunkHost``target\debug\punktfunk-host.exe dualsense-windows-test --index 0 --seconds 600`
(new identity + live cycling Cross/stick), launch the game; then deploy the release + restart with
`scripts\windows\deploy-host.ps1`.
2. **Driver-side correctness — DONE & installed (2026-06-22).** Rebuilt/resigned/reinstalled per the recipe
below; validated live (`hidstrings` probe + `pfds-driver.log`):
- `IOCTL_HID_GET_STRING` now implemented (was `STATUS_NOT_IMPLEMENTED`). **Discovery:** Windows polls
this device's string slots with low-word ids **`0x0E`/`0x0F`/`0x10`** (lang `0x0409`) cyclically — NOT
the `0/1/2` `HID_STRING_ID_*` constants. The handler maps them (+ `0/1/2` as fallbacks):
`0x0E`→manufacturer "Sony Interactive Entertainment", `0x0F`→product "DualSense Wireless Controller",
`0x10`→serial "35533AD6E774" (the `0x09` pairing-report MAC). Verified: `HidD_GetManufacturer/Product/
SerialNumberString` now return those three distinct strings.
- `DS_FEATURE_CALIBRATION` trimmed 42 → 41 bytes (1 id + 40 data) to match the descriptor's feature
`0x05` (`0x95 0x28`).
- The repo source (`packaging/windows/dualsense-driver/src/lib.rs`) and the m0 build copy were diverged
by *formatting only*; they are now back in sync (the repo file was copied to m0 before building).
3. If a GameInput-only title needs the real VID/PID → the rank-3 KMDF USB-emulating bus driver.
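The string-slot mapping from step 2 can be sketched as a lookup on the low word of the request. This is a hedged illustration rather than the driver's handler: `hid_string_for` and `to_utf16z` are hypothetical names, the layout "string id in the low word, language id in the high word" is an assumption read from the wording above, and the order of the `0/1/2` fallbacks is assumed to mirror `0x0E/0x0F/0x10`.

```rust
// Strings quoted in the handoff as the values the driver now returns.
const MANUFACTURER: &str = "Sony Interactive Entertainment";
const PRODUCT: &str = "DualSense Wireless Controller";
const SERIAL: &str = "35533AD6E774";

/// Pick the string for a GET_STRING request word.
/// Assumption: low 16 bits = string id, high 16 bits = language id (e.g. 0x0409).
fn hid_string_for(request: u32) -> Option<&'static str> {
    match request & 0xFFFF {
        0x0E | 0 => Some(MANUFACTURER),
        0x0F | 1 => Some(PRODUCT),
        0x10 | 2 => Some(SERIAL),
        _ => None, // unknown slot: the caller would fail the request
    }
}

/// Encode a string as NUL-terminated UTF-16LE bytes, the form HID string
/// queries hand back to callers.
fn to_utf16z(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .chain(std::iter::once(0))
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}
```

Returning three distinct strings matters because, per the doc, native PS5 code may read the serial to tell USB from Bluetooth.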
## On-box experiment tooling (built 2026-06-22, `C:\Users\Public\giprobe\`) Must run on the interactive desktop (RDP in or run locally) — WGI / RawInput / GameInput enumeration returns
- `probe.cpp` (+`build.bat`) — GameInput enumeration/fingerprint via `LoadLibrary("GameInput.dll")` + **empty from a headless SSH session** (no window/message pump); only HIDAPI works headless.
`GameInputCreate`/`RegisterDeviceCallback` (GDK header). Prints each device's vid/pid/usage/counts —
this is what proved GameInput reads our pad as vid=0.
- `swexp.cpp` (+`build-swexp.bat`) — standalone `SwDeviceCreate` identity experiment: variations for
`pszzCompatibleIds` (struct field) vs `DEVPKEY_Device_CompatibleIds` (pProperties — ignored),
`pszzHardwareIds` USB ids, `pContainerId`. Create at a spare instance id, hold, inspect. Built with the
VS18 MSVC toolchain via `vcvars64.bat`.
- WGI probe: Windows PowerShell **5.1** WinRT projection of `RawGameController`/`Gamepad` (pump the message
loop; subscribe `RawGameControllerAdded` to kick enumeration).
- Parent-walk bus check: from the HID child, `DEVPKEY_Device_Parent` → that node's
`DEVPKEY_Device_CompatibleIds`, match `^USB`/`^BTH` — mirrors HIDAPI's `hid_internal_detect_bus_type()`.
- NOTE: the agent shell's PowerShell tool chokes on inline `@'…'@` here-strings feeding `Add-Type` (throws
a spurious "Remove-Item on system path '/' is blocked"); write C#/scripts to a file and run them instead.
## How to reproduce / iterate (on `.173`) 1. Free the service's pad 0 so only the new-identity pad is present:
`sc stop PunktfunkHost`
### 1. Spawn a live virtual DualSense to test against 2. Spawn a single virtual DS5 carrying the new identity (cycles Cross/stick so input is visible):
``` `target\debug\punktfunk-host.exe dualsense-windows-test --index 0 --seconds 600`
C:\Users\Public\punktfunk-native\target\debug\punktfunk-host.exe dualsense-windows-test --seconds 60 3. Launch **Cyberpunk 2077 with Steam Input OFF** (so the game reads the raw HID). Check the in-game
``` glyphs/prompt **switch to DualSense**.
Creates `SWD\PUNKTFUNK\PF_PAD_0` (+ its HID child) and holds it, pushing a cycling input. Or just connect 4. Restore the service afterward: redeploy the release + restart with `scripts\windows\deploy-host.ps1`.
a client — the real session creates the identical device. (Build with the env `CMAKE_POLICY_VERSION_MINIMUM=3.5`.)
### 2. SDL3 detection oracle (already set up: `C:\Users\Public\sdltest\SDL3.dll`)
Confirms HID-level recognition (HIDAPI). Run while a device from step 1 is live. PowerShell + C# (note:
PS 5.1's Add-Type is C# 5 — **no** interpolated strings, **no** inline `out` vars, **no**
`Marshal.PtrToStringUTF8`; SDL3 bools are 1 byte → `[return: MarshalAs(UnmanagedType.I1)]`):
```powershell
$cs = @'
using System; using System.Runtime.InteropServices; using System.Text;
public static class S {
const string D = @"C:\Users\Public\sdltest\SDL3.dll";
[DllImport(D)][return: MarshalAs(UnmanagedType.I1)] public static extern bool SDL_Init(uint f);
[DllImport(D)] public static extern IntPtr SDL_GetJoysticks(out int c);
[DllImport(D)] public static extern IntPtr SDL_GetJoystickNameForID(uint id);
[DllImport(D)] public static extern ushort SDL_GetJoystickVendorForID(uint id);
[DllImport(D)] public static extern ushort SDL_GetJoystickProductForID(uint id);
[DllImport(D)][return: MarshalAs(UnmanagedType.I1)] public static extern bool SDL_IsGamepad(uint id);
[DllImport(D)] public static extern IntPtr SDL_OpenGamepad(uint id);
[DllImport(D)] public static extern int SDL_GetGamepadType(IntPtr g);
static string U(IntPtr p){ if(p==IntPtr.Zero)return""; int n=0; while(Marshal.ReadByte(p,n)!=0)n++; byte[] b=new byte[n]; Marshal.Copy(p,b,0,n); return Encoding.UTF8.GetString(b); }
public static string Run(){ if(!SDL_Init(0x2000))return"init fail"; System.Threading.Thread.Sleep(1500);
int n=0; IntPtr a=SDL_GetJoysticks(out n); StringBuilder sb=new StringBuilder("joysticks: "+n+"\n");
for(int i=0;i<n;i++){ uint id=(uint)Marshal.ReadInt32(a,i*4); bool ig=SDL_IsGamepad(id); int t=ig?SDL_GetGamepadType(SDL_OpenGamepad(id)):-1;
sb.AppendLine(" '"+U(SDL_GetJoystickNameForID(id))+"' vid=0x"+SDL_GetJoystickVendorForID(id).ToString("x4")+" pid=0x"+SDL_GetJoystickProductForID(id).ToString("x4")+" isGamepad="+ig+" type="+t+" (PS5=6)"); }
return sb.ToString(); }
}
'@
Add-Type -TypeDefinition $cs; [S]::Run()
```
Expected today: it lists our device with `type=6` (PS5). That's the baseline "HID is correct".
## Next experiments — MUST run ON THE INTERACTIVE DESKTOP, not over SSH
WGI / RawInput / GameInput enumeration returns **empty from a headless SSH session** (no window/message
pump) — only HIDAPI works headless. So these must run in the logged-in desktop session (RDP in, or run
locally) while a DualSense session is live:
1. **Determine which API Cyberpunk uses and whether it sees the SWD device.** Enumerate via, separately:
- `Windows.Gaming.Input` (`RawGameController.RawGameControllers`, `Gamepad.Gamepads`),
- RawInput (`GetRawInputDeviceList` → filter HID gamepad usage 01/05),
- GameInput (`GameInputCreate``EnumerateDevices`) — `GameInputRedistService` is installed on `.173`.
Compare which list our `VID_054C&PID_0CE6` appears in. The one(s) it's *missing from* point at the API
Cyberpunk uses.
2. **If WGI/GameInput exclude it:** make the SwDeviceCreate device enumerate more like a real USB device.
`SwDeviceCreate` takes a `pProperties` (`DEVPROPERTY[]`) array — try setting bus-type / container-id /
compatible-IDs so the newer APIs accept it. If that's insufficient, the heavyweight option is a
USB-emulating bus driver (the way ViGEmBus presents a real-looking device) instead of SwDeviceCreate +
UMDF-HID.
3. **Rule out an XInput device taking priority** (a leftover ViGEm pad, etc.).
4. **Correctness (not the detection blocker):** `DS_FEATURE_CALIBRATION` in the driver is **42 bytes**
but the report descriptor declares feature `0x05` as **41** (1 id + 40 data, `0x95 0x28`). Trim to 41;
wrong calibration only affects motion, and SDL accepts the device regardless.
## On-box layout (`.173`, builds + tools)
- **Host repo / build:** `C:\Users\Public\punktfunk-native``cargo build -p punktfunk-host`
(debug for `dualsense-windows-test`; `--release --features nvenc` is what the service runs). The
build env is persisted Machine-scope (`PUNKTFUNK_NVENC_LIB_DIR`, `LIBCLANG_PATH`,
`CMAKE_POLICY_VERSION_MINIMUM`) — see `scripts\windows\`. **One-call rebuild+redeploy of the
service: `scripts\windows\deploy-host.ps1`** (stop → build → restart, `.bak` rollback); web:
`scripts\windows\build-web.ps1`. bun=`C:\Users\Public\bun`, node=`C:\Users\Public\node-v22.11.0-win-x64`.
- **Host service:** scheduled task / SCM `PunktfunkHost` runs `…\target\release\punktfunk-host.exe
service run` → spawns `serve` (currently native-only, `PUNKTFUNK_HOST_CMD=serve` in
`C:\ProgramData\punktfunk\host.env`). Restart: `sc stop/start PunktfunkHost`. Native port 9777, mgmt
47990. (NB: Sunshine/Apollo conflicts on the GameStream ports — keep it stopped, or run native-only.)
- **UMDF driver build project:** `C:\Users\Public\m0\windows-drivers-rs\examples\pf-dualsense`
(`pf_dualsense.inx` + `src\lib.rs` live here; the canonical copies are in the repo under
`packaging/windows/dualsense-driver/` — keep them in sync). Rebuild + reinstall recipe (e.g. after the
calibration fix), all from that dir, env `LIBCLANG_PATH=C:\Program Files\LLVM\bin`,
`Version_Number=10.0.26100.0`:
1. `cargo make` → `target\debug\pf_dualsense_package\`
2. **Clear the FORCE_INTEGRITY PE bit** (wdk-build sets `/INTEGRITYCHECK`, which blocks self-signed
load): clear bit 0x80 at `PE_header_offset+0x5e` of `pf_dualsense.dll`, then re-sign.
3. `signtool sign /fd SHA256 /sha1 6A52984E54376C45A1C236B1A2C8A746C5AB6131 pf_dualsense.dll`
4. `Inf2Cat /driver:<pkg> /os:10_x64` → re-sign the `.cat` with the same thumbprint.
5. `pnputil /delete-driver <old oemNN.inf> /uninstall /force` then `pnputil /add-driver
pf_dualsense.inf /install`. (Self-signed cert is already trusted on `.173`; Secure Boot ON, HVCI off.)
- **SDL oracle:** `C:\Users\Public\sdltest\SDL3.dll`. **Test device:** `punktfunk-host.exe
dualsense-windows-test --seconds N` creates one `SWD\PUNKTFUNK\PF_PAD_0` and holds it.
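Step 2 of the rebuild recipe (clearing FORCE_INTEGRITY) can be sketched as an in-memory patch. The function name `clear_force_integrity` is hypothetical and this is not the project's tooling; it assumes a well-formed PE image, locates the PE header through `e_lfanew` at offset `0x3C`, and clears bit `0x0080` of the 16-bit `DllCharacteristics` field at `PE_header_offset + 0x5E`. It does not recompute the PE checksum, and the binary still has to be re-signed afterwards as the recipe says.

```rust
/// IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY, the flag set by `/INTEGRITYCHECK`.
const FORCE_INTEGRITY: u16 = 0x0080;

/// Clear FORCE_INTEGRITY in a PE image held in memory.
/// Returns Ok(true) if the bit was set, Ok(false) if it was already clear.
fn clear_force_integrity(image: &mut [u8]) -> Result<bool, &'static str> {
    if image.get(0..2) != Some(&b"MZ"[..]) {
        return Err("missing MZ signature");
    }
    // e_lfanew: little-endian u32 at 0x3C pointing at the "PE\0\0" signature.
    let raw = image.get(0x3C..0x40).ok_or("truncated DOS header")?;
    let pe = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
    let sig_end = pe.checked_add(4).ok_or("bad e_lfanew")?;
    if image.get(pe..sig_end) != Some(&b"PE\0\0"[..]) {
        return Err("missing PE signature");
    }
    // DllCharacteristics sits 0x5E bytes after the PE signature (PE32 and PE32+).
    let off = pe.checked_add(0x5E).ok_or("bad e_lfanew")?;
    let end = off.checked_add(2).ok_or("bad e_lfanew")?;
    let field = image.get_mut(off..end).ok_or("truncated optional header")?;
    let flags = u16::from_le_bytes([field[0], field[1]]);
    let was_set = flags & FORCE_INTEGRITY != 0;
    field.copy_from_slice(&(flags & !FORCE_INTEGRITY).to_le_bytes());
    Ok(was_set)
}
```

A caller would read `pf_dualsense.dll` into a `Vec<u8>`, run the function, write the bytes back, and then continue with the `signtool` step.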
 ## Key code
 | What | File |
 | --- | --- |
-| Host backend (`create_swdevice`, the `Global\pfds-shm-<idx>` section, write_state/service/pump) | `crates/punktfunk-host/src/inject/dualsense_windows.rs` |
-| UMDF driver (HID descriptor, feature reports, `on_output_report`) | `packaging/windows/dualsense-driver/src/lib.rs` |
-| Shared report codec (`serialize_state` input, `parse_ds_output` feedback) | `crates/punktfunk-host/src/inject/dualsense_proto.rs` |
+| Host backend (`create_swdevice`, the `Global\pfds-shm-<idx>` section, write_state/service/pump) | `crates/punktfunk-host/src/inject/windows/dualsense_windows.rs` |
+| UMDF driver (HID descriptor, feature reports, `on_output_report`) | `packaging/windows/drivers/pf-dualsense/src/lib.rs` |
+| Shared report codec (`serialize_state` input, `parse_ds_output` feedback) | `crates/punktfunk-host/src/inject/proto/dualsense_proto.rs` |
 | Pad seam (`PadBackend`, `pump` → rumble `0xCA` / hidout `0xCD`) | `crates/punktfunk-host/src/punktfunk1.rs` |
+## Open items
+1. **Decisive on-glass Cyberpunk test — pending execution.** Launch Cyberpunk 2077 with Steam Input OFF
+against a virtual DS5 carrying the new identity; verify the in-game glyphs switch to DualSense (procedure
+above).
+2. **GameInput rank-3 KMDF USB-emulating bus-driver fallback — optional.** Only if a GameInput-only title
+needs the real VID/PID and the shipped SwDeviceCreate identity fix doesn't satisfy it.
 ## Facts proven (don't re-litigate)
 - `SwDeviceCreate` requirements: enumerator must have **no underscore** (`punktfunk`); the completion
 **callback is mandatory** (NULL → E_INVALIDARG). Per-session device works; auto-removed on disconnect.
+- The identity keys must be set via the **`SW_DEVICE_CREATE_INFO` struct fields**, not `pProperties` — a
+`DEVPROPERTY` write of the PnP-owned identity keys is ignored.
 - HID descriptor + feature reports are DS5-accurate enough that **SDL identifies it as PS5**.
-- Host-side rumble works end to end; the client (macOS) rendering of `0xCA` is the open rumble bug.
+- The `IOCTL_HID_GET_STRING` and `DS_FEATURE_CALIBRATION` (42 → 41 bytes) driver gaps were fixed + shipped;
+the driver answers `HidD_GetManufacturer/Product/SerialNumberString` with distinct strings. (Detail in
+`packaging/windows/drivers/pf-dualsense/src/lib.rs`.)
+- Host-side rumble works end to end (driver captures the game's `0x02`, `parse_ds_output` extracts the
+motors, host forwards `0xCA`); the client (macOS) rendering of `0xCA` onto the physical pad is a separate
+open bug, not part of game detection.
+130 -389
@@ -1,416 +1,157 @@
# Windows host — virtual DualSense scoping # Windows host — virtual DualSense scoping
**Status:** **M0 feasibility gate PASSED (2026-06-21)** — a self-authored **Rust** UMDF virtual DualSense > **Status:** SHIPPED — M0 feasibility gate PASSED (2026-06-21), M1M4 landed. Driver:
loads self-signed under Secure Boot, is recognized as a genuine DualSense by Steam, and receives `0x02` > `packaging/windows/drivers/pf-dualsense/` (README there); host backend
output reports at its write callback. Driver source: `packaging/windows/dualsense-driver/`. (Earlier in > `crates/punktfunk-host/src/inject/dualsense_windows.rs` + shared contract
this doc's history the gate looked blocked by Secure Boot / driver code-integrity — that was wrong; the > `inject/dualsense_proto.rs`. Commits `aa159df` (Rust UMDF driver + shm channel),
real blocker was the PE FORCE_INTEGRITY bit that `wdk-build` sets via `/INTEGRITYCHECK`, cleared post-build.) > `4a73102` (host backend), `fde438a`/`6db3525` (SwDeviceCreate per-session devnode),
Web-research pass complete; the mechanism conclusion is **reversed** > `b0c8233` (pure-user-mode DS4/Xbox 360, ViGEm dropped). This doc is trimmed to design
from the 2026-06-20 draft. This doc **supersedes the 2026-06-20 VHF scoping** — VHF was the wrong > rationale + open items; implementation detail lives in the code and the driver README.
answer (it is kernel-only and cannot host a user-mode HID source), and the correct mechanism is a
**UMDF2 user-mode HID minidriver**, the same driver tier punktfunk already vendors/signs/installs for
SudoVDA. Two product decisions are now fixed and drive this plan: **(1)** the driver is for **public
end-user distribution** (so: EV cert + Microsoft attestation signing, not just the fleet self-signed
recipe), and **(2)** the strong preference is a **self-authored Rust driver**, with a thin C/C++ shim
as the realistic fallback and forking HIDMaestro as the last resort.
## TL;DR ## Why UMDF2, and why a real virtual DualSense (the WHY)
Apollo's backlog item #23/#89 ("DS4 ViGEm target on Windows") is the **wrong target** if the goal is Apollo's backlog "DS4 ViGEm target on Windows" is the **wrong target** for *actual DualSense*.
*actual DualSense*. ViGEmBus emulates only **Xbox 360 (XUSB)** and **DualShock 4 (DS4)** — never a ViGEmBus emulates only **Xbox 360 (XUSB)** and **DualShock 4** — never a DualSense. Because this
DualSense. Because this is a *host-side* virtual pad, the DualSense-defining features (adaptive is a *host-side* virtual pad, the DualSense-defining features (adaptive triggers, the fine haptic
triggers, the fine haptic actuators, DS5 identity) only work end-to-end if the **game sees a real actuators, DS5 identity) only work end-to-end if the **game sees a real DualSense** and therefore
DualSense** and therefore drives them; a DS4 virtual pad means the game takes its DS4 code path and drives them. A DS4 virtual pad makes the game take its DS4 code path and never emit those commands,
never emits those commands, so the client's adaptive-trigger rendering is never exercised. ViGEm DS4 so the client's adaptive-trigger rendering is never exercised. **ViGEm DS4 structurally cannot
structurally **cannot** deliver adaptive triggers. deliver adaptive triggers** — that ceiling is the whole reason not to copy Apollo here (and Apollo
itself does DualSense only on Linux via `inputtino`; its Windows path is ViGEm `XUSB`/`DS4_REPORT_EX`
only — zero virtual-HID/DualSense code to vendor).
The right path is the Windows analog of what the Linux host already does over `/dev/uhid`: present a The right path is the Windows analog of the Linux host's `/dev/uhid` device: present a **real virtual
**real virtual DualSense HID device** (Sony VID `054C` / PID `0CE6`, the inputtino PS5 report DualSense HID device** (Sony VID `054C` / PID `0CE6`, the inputtino PS5 report descriptor we already
descriptor punktfunk already ships). On Windows that is a **UMDF2 (user-mode) HID minidriver** ship) so the game/Steam/GameInput bind it as genuine.
created/torn-down per session from the host via `SwDeviceCreate`, sitting as a lower filter under
the OS pass-through driver `mshidumdf.sys`. It is the **same driver tier as SudoVDA** (UMDF, not
kernel), so the existing vendor → sign → Inno-installer machinery applies almost unchanged.
> **Supersedes the 2026-06-20 VHF scoping.** That draft concluded "a kernel-mode virtual-HID device **Mechanism = a UMDF2 (user-mode) HID minidriver**, created/torn-down per session via
> via the Virtual HID Framework (VHF) — a SudoVDA-class driver effort." The decisive correction: `SwDeviceCreate`, as a lower filter under the OS pass-through driver `mshidumdf.sys`. This is the
> **VHF supports a HID *source* driver only in kernel mode** (Microsoft "Virtual HID Framework **same driver tier as SudoVDA** (UMDF, not kernel), so the existing vendor → sign → Inno-installer
> (VHF)"). A user-mode (UMDF) HID source is **not** a VHF use case — it is a UMDF2 HID minidriver machinery applies almost unchanged. Two corrections drove this conclusion over the 2026-06-20 draft:
> built from the `vhidmini2` sample (or DMF's `Dmf_VirtualHidMini`). The earlier "KMDF is a higher
> bar than SudoVDA's UMDF/IddCx" framing is therefore wrong: the correct mechanism is **the same
> UMDF tier as SudoVDA**, not above it.
Everything except the host backend is already platform-agnostic and DualSense-complete (verified - **VHF (Virtual HID Framework) supports a HID *source* driver only in kernel mode** — it is *not*
the mechanism for a user-mode virtual pad. The user-mode mechanism is a UMDF2 HID minidriver built
from the `vhidmini2` sample. So the earlier "KMDF, a higher bar than SudoVDA" framing was wrong:
it is the *same* UMDF tier.
- **UMDF 2.0 is NOT COM-based** (COM/`IDriverEntry`/`IWDFDriver` are legacy UMDF 1.x). UMDF 2.0 uses
  the same **C-style WDF object model as KMDF** — a `DriverEntry` symbol + C function pointers, no
  vtable. This is precisely why a Rust FFI implementation is even conceivable.

Everything except the host backend was already platform-agnostic and DualSense-complete (protocol
planes `0xCC`/`0xCA`/`0xCD`, the `HidOutput` feedback abstraction, pad-type negotiation, clients, the
C-ABI). The DualSense HID contract (the 232-byte `DUALSENSE_RDESC`, `serialize_state` for input report
`0x01`, `parse_ds_output` for output report `0x02`, the `0x05`/`0x09`/`0x20` feature blobs, USB framing
no-CRC) was already pure transport-independent Rust — so the report bytes are identical to Linux and
only the device-framing layer is new.

## Why Rust ("Option R") despite zero precedent

The user's strong preference was a **self-authored Rust driver**, accepted as pioneering risk.
`microsoft/windows-drivers-rs` officially targets UMDF and ships a real (but *bare-stub*) UMDF sample;
because UMDF 2.0 is the C function-pointer model, the FFI maps cleanly. The honest gap going in: the
whole HID-minidriver layer (`WdfFdoInitSetFilter`, the manual inverted-call queue, `IOCTL_UMDF_HID_*`
dispatch, `HID_XFER_PACKET`) was hand-written `unsafe` FFI with no safe wrappers, and **every** other
shipping virtual-HID controller driver (`vhidmini2`, HIDMaestro, DsHidMini) is C — so symbol coverage
for the UMDF target was unproven. The de-risk plan was a C `vhidmini2` shim fallback (keeping all
DualSense logic in the Rust host either way), with forking HIDMaestro as the last resort (rejected for
real use because **HIDMaestro omits adaptive triggers** — it cannot prove the one thing that makes a
virtual DualSense worth building).

**Outcome: Option R confirmed.** The M0 spike answered both the build-symbol question and the on-glass
gate with a Rust driver — no C shim needed. The DualSense *logic* stays in Rust where it already lived.

## M0 feasibility gate — PASSED (2026-06-21), and the three bugs

The blocking gate (RTX box `192.168.1.173`; the dev VM is headless/WARP and cannot validate
game-facing HID recognition) asked two questions no prior art settled:

1. **Recognition** — is a virtual `054C:0CE6` UMDF2 device accepted as a *genuine DualSense* by
   `Windows.Gaming.Input` / GameInput / Steam? **YES** — Steam recognized it and drove its
   DualSense-specific LEDs.
2. **Adaptive-trigger fidelity** — does the game's output report `0x02` (the adaptive-trigger block)
   actually reach the driver's `WriteReport`/`SetOutputReport` callback? **YES** — captured two
   Steam-Input output reports (`validFlag1=0x14` = LIGHTBAR|PLAYER_INDICATOR). Adaptive-trigger bytes
   ride the same `0x02` path.

> **Three M0 bugs — reference for any future UMDF-in-Rust work:**
> 1. **PE FORCE_INTEGRITY blocks self-signed load.** `wdk-build`'s `/INTEGRITYCHECK` sets the PE
>    FORCE_INTEGRITY bit, which demands a Microsoft-trusted signature to load. Fix: **clear bit `0x80`
>    at offset PE+`0x5e` post-build and re-sign.** This was the load wall (earlier "Secure Boot blocks
>    self-signed UMDF" conclusions were wrong).
> 2. **Timer `ExecutionLevel` must be `InheritFromParent`, not zeroed.** A `mem::zeroed`
>    `WDF_TIMER_CONFIG` gives ExecutionLevel 0, which the framework rejects.
> 3. **Queue `NumberOfPresentedRequests` must be `u32::MAX`, not 0.** A zeroed parallel-queue config
>    caps in-flight requests at 0 → `EvtIoDeviceControl` never fires.

## Milestones

| # | Milestone | State | Commit(s) |

against live code), so this is a well-bounded host-side addition. **The whole effort is gated by an
on-glass feasibility spike (M0)** that no prior art settles: whether a virtual `054C:0CE6` device is
accepted as a genuine DualSense by `Windows.Gaming.Input` / GameInput / Steam **and** whether the
game's output report `0x02` (the adaptive-trigger block) actually reaches the driver's write callback.

## Why this is the wrong place to copy Apollo

Apollo (and all of Sunshine's lineage) **does DualSense only on Linux** (`inputtino`,
`DualSenseWired`). Its Windows input path (`src/platform/windows/input.cpp`) is ViGEm
`XUSB_REPORT` + `DS4_REPORT_EX` only — `MPS2_TO_DS4_ACCEL` motion conversion, inverse-ViGEmBus gyro
calibration, DS4 touchpad packing. There is **zero** virtual-HID / DualSense code on Apollo's Windows
side. So:

- Copying Apollo on Windows gets us a **DS4**, with the adaptive-trigger ceiling baked in.
- There is **no in-ecosystem upstream** (Sunshine/Apollo/Wolf) that already solved a virtual
  *DualSense* on Windows to vendor from. The closest prior art is in the **virtual-HID-controller**
  space, not the streaming-host space: HIDMaestro and Nefarius DsHidMini (see *Mechanism*).

This is unchanged from the 2026-06-20 draft and remains correct.

## The parity target — and what's *already* done

The Linux host (`crates/punktfunk-host/src/inject/dualsense.rs`) creates a **UHID** device presenting
the genuine DualSense descriptor, so the kernel `hid-playstation` driver binds it and games see a real
DualSense — gamepad + motion + touchpad + lightbar/player-LEDs + adaptive triggers. It writes HID
**input** report `0x01` (controller state) and reads HID **output** report `0x02` (the game's
rumble/LED/trigger feedback), which it forwards to the client as `punktfunk_core::quic::HidOutput`.

Crucially, **everything except the host backend is already platform-agnostic and DualSense-complete**
(verified against live source):

| Layer | State | Where |
|---|---|---|
| Protocol planes (rich input `0xCC`, rumble `0xCA`, HID-output `0xCD`) | ✅ done | `punktfunk_core::quic` |
| Feedback abstraction (`HidOutput::{Led,PlayerLeds,Trigger,…}`) | ✅ done | `punktfunk_core::quic` |
| Pad-type negotiation (client pref > env > default), `GamepadPref::DualSense` | ✅ done | `punktfunk1.rs` `resolve_gamepad` (~1577) |
| Backend dispatch (`enum PadBackend`) | ✅ done; `DualSense`/`DualShock4` arms are `#[cfg(target_os="linux")]` | `punktfunk1.rs` (PadBackend ~1181–1272) |
| Clients (capture + adaptive-trigger/lightbar/haptic/touchpad/motion rendering) | ✅ done, all platforms | `clients/*` |
| C-ABI (`next_hidout` / `send_rich_input`) | ✅ done | `abi.rs` |
| **Host virtual-DualSense backend** | **Linux only (UHID)** | `inject/dualsense.rs` |

So a Windows DualSense backend needs **no protocol, client, or C-ABI change**. The whole DualSense
**HID contract already exists as pure, transport-independent Rust + const data**, kernel-verified
byte-for-byte against `hid-playstation.c` / inputtino / SDL, in `inject/dualsense.rs`:

- `DUALSENSE_RDESC` — the 232-byte USB report descriptor.
- `serialize_state` — the input report `0x01` packer (controller state → bytes).
- `parse_ds_output` — the output report `0x02` parser (game's rumble/LED/trigger block → `HidOutput`),
valid-flag gated.
- Feature blobs `0x05` calibration, `0x09` pairing, `0x20` firmware. **USB framing (no CRC).**
**No hardware capture is needed** — the bytes are already correct and proven. The *only* Linux
coupling is the `/dev/uhid` event framing (`UHID_CREATE2`/`INPUT2`/`OUTPUT`/`GET_REPORT`) in
`DualSensePad::open`/`write_state`/`service`. A Windows backend swaps that framing for the
`SwDeviceCreate` + IOCTL channel to the UMDF driver; **the report bytes are identical.**
> **One in-repo bug to fix in passing:** `DS_FEATURE_CALIBRATION` (`0x05`) is currently **42 bytes**;
> the spec is **41**. Trim it for strict Windows consumers as part of M1 (`42 → 41`).
`dualshock4.rs` (committed `3e6c9f6`) is a worked **second** example of the multi-pad-type
`PadBackend` pattern, reusing the DualSense state — a template for how the Windows arm slots in.
The host integration seam is small and already mapped: ~1 enum arm + 5 match arms in the
`PadBackend` block (`punktfunk1.rs` ~1181–1272), flipping `pick_gamepad`/`resolve_gamepad`
(~1558–1606) from `#[cfg(target_os = "linux")]` to `#[cfg(any(target_os = "linux", target_os =
"windows"))]`, plus the `inject.rs` module gating (~424–451). `gamepad_windows.rs` is today
ViGEm-Xbox360-only (138 LOC); the new `inject/dualsense_windows.rs` sits beside it, and ViGEm stays
for Xbox 360 / Xbox One.
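
The `cfg` flip described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `PadBackend` and `GamepadPref` variants and the `resolve_backend` function are hypothetical stand-ins, not the real `punktfunk1.rs` code. It shows one enum arm and one match arm gated on `any(target_os = "linux", target_os = "windows")`, with every other target falling back to Xbox 360.

```rust
/// Hypothetical stand-in for the host's backend enum (variant names are illustrative).
#[derive(Debug, PartialEq)]
pub enum PadBackend {
    Xbox360,
    /// Only compiled where a virtual-DualSense backend exists.
    #[cfg(any(target_os = "linux", target_os = "windows"))]
    DualSense,
}

/// Hypothetical stand-in for the negotiated client preference.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GamepadPref {
    Xbox360,
    DualSense,
}

/// Resolve a preference to a backend; targets without a DualSense
/// backend silently degrade to Xbox 360.
pub fn resolve_backend(pref: GamepadPref) -> PadBackend {
    match pref {
        #[cfg(any(target_os = "linux", target_os = "windows"))]
        GamepadPref::DualSense => PadBackend::DualSense,
        // Xbox 360 everywhere, plus DualSense on targets where the arm above is compiled out.
        _ => PadBackend::Xbox360,
    }
}
```

Gating the match arm with the same predicate as the enum variant keeps the code compiling on targets where the variant does not exist, which is the property the real seam relies on.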
## The Windows mechanism — UMDF2 HID minidriver (not VHF)
Windows has **no userspace HID-device creation** (unlike Linux UHID), so a real virtual DualSense
needs a driver component. The decisive correction over the prior draft:
- **VHF (Virtual HID Framework) supports a HID *source* driver only in kernel mode.** It is not the
mechanism for a user-mode virtual pad. (Microsoft, "Virtual HID Framework (VHF)".)
- The user-mode mechanism is a **UMDF2 HID minidriver**: a small lower-filter driver under the
OS-supplied pass-through driver **`mshidumdf.sys`** (which calls `HidRegisterMinidriver` on the
minidriver's behalf). This is the **same UMDF tier as SudoVDA**, *below* kernel work, not above it.
A second prior-research correction that matters for the language choice: **UMDF 2.0 is NOT
COM-based.** COM / `IDriverEntry` / `IWDFDriver` belong to legacy **UMDF 1.x**. UMDF 2.0 uses the
same **C-style WDF object model as KMDF** — a `DriverEntry` symbol plus C function pointers
(`EvtDriverDeviceAdd`, `EvtIoDeviceControl`) stored in config structs. There is no vtable to
implement. (Microsoft, "Porting a Driver from UMDF 1 to UMDF 2", "Getting Started with UMDF v2".)
This is precisely why a Rust FFI implementation is even conceivable (see *Driver language*).
### What the driver actually does (small, well-bounded)
A UMDF2 HID minidriver holds **no device logic** — it shuttles bytes. Its entire job is one
`EvtIoDeviceControl` callback branching on ~10 HID IOCTLs (Microsoft, "Creating WDF HID
Minidrivers"; reference source `vhidmini2`):
- In `EvtDriverDeviceAdd`: call `WdfFdoInitSetFilter`, then create the I/O queue(s).
- **Descriptor IOCTLs** (`GET_DEVICE_DESCRIPTOR` / `GET_REPORT_DESCRIPTOR` / `GET_DEVICE_ATTRIBUTES`)
— trivial: `RequestCopyFromBuffer` a static blob. For punktfunk these blobs are the **existing
`DUALSENSE_RDESC` (232 B)** + a `HID_DEVICE_ATTRIBUTES` filled `054C`/`0CE6`.
- **Output / feature IOCTLs** (`WRITE_REPORT` / `SET_OUTPUT_REPORT` / `GET_FEATURE` / `SET_FEATURE`)
— pull the `HID_XFER_PACKET` (report id + buffer) and hand the bytes to the host. These carry the
game's `0x02` output report (rumble / lightbar / **adaptive-trigger** block) — exactly what
`parse_ds_output` already decodes.
- **Input path** (`READ_REPORT`, pad → game) — the only non-trivial mechanic, an **inverted call**:
each `READ_REPORT` request is pended into a manual `WDFQUEUE`
(`WdfIoQueueDispatchManual` + `WdfRequestForwardToIoQueue`) and later popped
(`WdfIoQueueRetrieveNextRequest`), filled, and completed (`WdfRequestComplete`) whenever the host
has a fresh `0x01` input report. `vhidmini2` drives this from a periodic timer; punktfunk drives
it from each new `0x01` report arriving over the host channel — **structurally identical to the
existing Linux `/dev/uhid` loop.**
Because UMDF can't marshal embedded pointers, `mshidumdf.sys` converts `IOCTL_HID_*` into
`IOCTL_UMDF_HID_*` (e.g. `IOCTL_UMDF_HID_GET_INPUT_REPORT`, `IOCTL_UMDF_HID_SET_FEATURE`), passing
`reportBuffer` / `reportId` as separate buffers — the driver branches on those.
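
The inverted-call input path described above can be modelled in plain Rust without any WDF types. The sketch below is only a model: `PendingRead` stands in for a `WDFREQUEST` parked in the manual queue, `pend` for `WdfRequestForwardToIoQueue`, and `complete_next` for `WdfIoQueueRetrieveNextRequest` followed by `WdfRequestComplete`. All type and method names are hypothetical, and dropping a report when no read is pending is an assumption of this sketch, not a statement about the shipped driver.

```rust
use std::collections::VecDeque;

/// Stand-in for a READ_REPORT request parked in the manual queue.
#[derive(Debug, PartialEq)]
pub struct PendingRead {
    pub id: u32,
}

/// A read request that has been filled with report bytes and completed.
#[derive(Debug, PartialEq)]
pub struct Completed {
    pub id: u32,
    pub report: Vec<u8>,
}

/// Model of the manual queue: reads are pended first, data arrives later.
#[derive(Default)]
pub struct InvertedCallQueue {
    pending: VecDeque<PendingRead>,
}

impl InvertedCallQueue {
    /// The game asked for input before any was available: park the request.
    pub fn pend(&mut self, req: PendingRead) {
        self.pending.push_back(req);
    }

    /// A fresh input report arrived from the host: complete the oldest
    /// pended read with it. Returns `None` if nothing is waiting
    /// (the report is dropped in this model).
    pub fn complete_next(&mut self, report: &[u8]) -> Option<Completed> {
        let req = self.pending.pop_front()?;
        Some(Completed { id: req.id, report: report.to_vec() })
    }

    /// Number of reads still waiting for data.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

The point of the model is the ordering: requests are pended by the consumer, and completion is driven by the producer, one report per pended read, oldest first.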
### Integration sketch
```
host process (Rust) <-- SwDeviceCreate + IOCTL channel --> UMDF2 HID minidriver <-- HID --> game / Steam / GameInput
PadState -------------- input report 0x01 -------------> inverted READ_REPORT queue
HidOutput <----- output report 0x02 (WriteReport cb) ----- EvtIoDeviceControl
```
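
On the output side of the sketch above, report `0x02` is valid-flag gated: a flag byte says which fields of the block the game actually wants applied. The snippet below is a minimal illustration of decoding that byte, not the real `parse_ds_output`. Only the two bits named in the M0 capture (`validFlag1=0x14` = LIGHTBAR|PLAYER_INDICATOR) are modelled; the bit positions follow from that value, and the constant and function names are hypothetical.

```rust
/// Lightbar colour fields are valid (bit 2; 0x04).
pub const FLAG1_LIGHTBAR: u8 = 1 << 2;
/// Player-indicator LED field is valid (bit 4; 0x10).
pub const FLAG1_PLAYER_INDICATOR: u8 = 1 << 4;

/// Return the names of the modelled features enabled in a `validFlag1` byte.
/// Bits not modelled here are ignored rather than treated as errors.
pub fn decode_valid_flag1(flags: u8) -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if flags & FLAG1_LIGHTBAR != 0 {
        enabled.push("LIGHTBAR");
    }
    if flags & FLAG1_PLAYER_INDICATOR != 0 {
        enabled.push("PLAYER_INDICATOR");
    }
    enabled
}
```

With this, `decode_valid_flag1(0x14)` yields both names, matching the Steam-Input reports captured during M0.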
- **Descriptor reuse:** the exact inputtino PS5 descriptor + feature-report replies we already ship
for Linux (`dualsense.rs` `DS_*` constants) — same bytes, same VID/PID, so Windows + games
recognize it as a DualSense.
- **Host-side device creation:** `windows::Win32::Devices::Enumeration::Pnp::SwDeviceCreate`, returning
`Result<HSWDEVICE>` (pure Win32, in the `windows` crate, **no WDK needed**), enumerating a
root device whose hardware IDs match the pre-staged INF. Requires Administrator. **The device
exists only while the `HSWDEVICE` handle (i.e. the host process) is open** — `SwDeviceClose`
removes it — so the pad is created/destroyed with the session, exactly like the Linux UHID fd.
The INF is pre-staged once (`pnputil /add-driver`).
- **Userspace bridge:** a `DualSenseManager`-shaped struct mirroring the Linux one (same `RichInput`
→ report `0x01` packing via `serialize_state`, same `HidOutput` parsing via `parse_ds_output`),
talking to the driver over an IOCTL channel instead of `/dev/uhid`.
- **Packaging:** vendor + sign the `.dll`/`.inf`/`.cat` and install via the existing
`packaging/windows` machinery (`pnputil` + an `install-*.ps1`, bundled in the Inno `setup.exe`).
The precedent — SudoVDA, a UMDF/IddCx driver — is already in the repo.
## Driver language — recommendation
The user strongly prefers a **self-authored Rust driver**. Verified verdict: **a Rust UMDF2 HID
minidriver is technically viable but unproven and pioneering** — it does not clear the bar for a
*low-risk* M2. Honest ranking of the three options:
### Option R — fully self-authored Rust driver (preferred; viable, but pioneering)
- **What's real today:** `microsoft/windows-drivers-rs` (`wdk`, `wdk-sys`, `wdk-build`,
`wdk-macros`) officially targets WDM + KMDF + **UMDF** (tested UMDF 2.33). It ships a *real* Rust
UMDF sample, `examples/sample-umdf-driver/src/lib.rs`, that `#[unsafe(export_name = "DriverEntry")]`,
builds a `WDF_DRIVER_CONFIG` with `EvtDriverDeviceAdd: Some(...)`, and calls `WdfDriverCreate` +
`WdfDeviceCreate` via `call_unsafe_wdf_function_binding!` over raw `wdk-sys` FFI. Because UMDF 2.0
is the C function-pointer model (no COM vtable), the FFI maps cleanly.
- **The gap:** that sample is a **bare stub** — no I/O queue, no IOCTL dispatch, no HID. The entire
HID-minidriver layer (`WdfFdoInitSetFilter`, the manual inverted-call queue, `IOCTL_UMDF_HID_*`
dispatch, `HID_XFER_PACKET`, `METHOD_NEITHER`) would be **hand-written `unsafe` FFI with no safe
wrappers**, against `vhidmini2`/GazeHid-scale glue (a few hundred lines). The heavy domain logic is
*not* in the driver — it already exists in `dualsense.rs`.
- **The honest blockers:** **zero precedent** — every shipping virtual-HID controller driver
(`vhidmini2`, HIDMaestro, DsHidMini, EmuController, GazeHid) is **C**. Microsoft labels
`windows-drivers-rs` "not yet recommended for production use" (Sept 2025) and has **not settled the
WHCP/attestation submission path for Rust drivers** — directly relevant given the public-distribution
requirement (though attestation re-signs the `.cat` and treats the `.dll` opaquely, so signing
*should* be language-agnostic — unverified). Whether all needed WDF symbols (`WdfIoQueueCreate`,
`WdfFdoInitSetFilter`, `WdfRequestRetrieveOutputMemory`, manual-queue APIs,
`WDF_IO_QUEUE_CONFIG_INIT`) are generated/usable for the UMDF target is **unverified against the
bindings — this is exactly what the M0 build spike must answer.** Note the Dec 2025
`windows-drivers-rs` build break (Discussion #591) is a transient LLVM-22-tip bindgen issue, fixed
by pinning LLVM 21.1.2 — not a fundamental defect.
Do **not** C-FFI-bind DMF's `Dmf_VirtualHidMini` from Rust (large, awkward C surface) — reimplement
the modest `vhidmini2` queue/IOCTL glue directly.
### Option C — thin C/C++ UMDF2 shim + all logic in the Rust host (realistic fallback / lowest-risk M2)
Clone `vhidmini2` (`WdfFdoInitSetFilter` + `EvtIoDeviceControl` + manual inverted-call queue, a few
hundred LOC of generic byte-shuttling); keep **all** DualSense logic in the existing Rust host
(`dualsense.rs` descriptors/packers/parsers fed over the IOCTL channel); the `SwDeviceCreate` host
bridge stays pure Rust in the `windows` crate (no WDK). This **mirrors HIDMaestro's split** (generic
C/C++ UMDF2 HID minidriver under `mshidumdf.sys`, all profile/DualSense logic in the user-mode
service) **and punktfunk's own Linux design.** It is the user's pre-ranked middle option and the
fastest way to reach the M0 on-glass gate.
### Option H — fork/reuse HIDMaestro (last resort)
HIDMaestro is a proven, pure-UMDF2 virtual controller (self-signed, no EV/test-signing/reboot)
recognized by DirectInput/XInput/SDL3/WGI/GameInput/RawInput + Steam, with a **DualSense profile**
(byte-exact VID/PID + descriptor). Use only if even the C shim stalls **and** adaptive-trigger
fidelity is not required — **HIDMaestro omits adaptive triggers from its DS5 feature list**, so it
cannot prove the very thing that makes a virtual DualSense worth building. Its driver is C; its
service is C#.
### Recommendation
**Lead with Option R for the long-term codebase, but de-risk the on-glass gate with Option C in M2.**
Concretely: run the **M0 spike in two halves** — (a) a `windows-drivers-rs` UMDF *build* spike to
confirm the WDF queue/IOCTL symbols are usable from Rust at all, and (b) the on-glass recognition gate
using whichever driver is fastest to stand up (the C `vhidmini2` shim is the safe bet). If (a) passes
**and** the on-glass gate passes, author the M2 driver in **Rust** (it would be the first Rust UMDF
HID driver, accepted as pioneering risk per the user's explicit preference). If (a) is shaky, ship M2
as the **C shim** and migrate the driver to Rust later, once `windows-drivers-rs` ships safe WDF/HID
abstractions. Either way the DualSense *logic* stays in Rust where it already lives. Forking HIDMaestro
is the fallback-of-fallbacks and is acceptable only if adaptive triggers are dropped from scope.
## Signing
Two recipes coexist in the Inno installer, selected by the bundled payload — the same pattern already
proven for SudoVDA.
### Fleet / self-signed (dev + internal boxes)
The in-repo precedent is `packaging/windows/install-sudovda.ps1`: import the bundled `.cer` into the
machine **Root** *and* **TrustedPublisher** stores (`certutil -addstore -f`), then `pnputil
/add-driver /install`. This installs silently **only** because the publisher is pre-trusted on that
machine. Microsoft is explicit that this auto-import-into-Root practice "should never be followed for
any driver package distributed outside your organization" — so it is the **fleet** path, never the
public one.
### Public end-user distribution — EV cert + Microsoft attestation
For arms-length public users, the correct tier is **Microsoft attestation signing** via Partner
Center (verified: "Attestation signing supports Windows Desktop kernel mode **and user mode**
drivers"; processable types include `.cab`/`.dll`). Pipeline:
1. **Prerequisites:** a registered **Windows Hardware Developer Program** (Partner Center) account
(free to register; sign in with an Entra ID global-admin work account; accept the agreements,
provide org/D-U-N-S info, respond to the legal-contact verification email) and an **EV
code-signing certificate** (mandatory to register *and* to sign the submission CAB; ~USD 250–560/yr;
FIPS hardware token/HSM mandatory; 1–7 business-day identity vetting). Windows ADK (`MakeCab`).
2. **Build the submission:** `MakeCab` the `.dll` + `.inf` (+ `.pdb`/symbols) into per-driver
subfolders (folder names < 40 chars, no special chars, no UNC); `SignTool sign` the CAB with the
EV cert (`/fd sha256` + RFC3161 timestamp `/tr … /td sha256`).
3. **Submit:** Partner Center → *Submit new hardware*, **leave test-signing unchecked**, request the
desired signatures.
4. **Microsoft re-signs:** it appends a Microsoft SHA-2 signature and **regenerates + signs a new
`.cat` with a Microsoft cert** (your `.cat` is replaced). Because the catalog signer is then
Microsoft (already trusted), **PnP installs silently — no publisher prompt, no test-signing, no
reboot, and no shipping our cert into users' Root store.** Validation: `devcon`/`pnputil` install
must not show "Windows can't verify the publisher of this driver software."
**Important nuance — is attestation even *required* for UMDF?** UMDF is user-mode, so it is **exempt
from kernel-mode code-integrity *load* enforcement** — the driver `.dll` will *load* without a
Microsoft signature. But **PnP *installation* still requires a signed catalog whose publisher is
trusted.** A driver signed only with a plain publicly-trusted (OV/EV) Authenticode cert that is *not*
already in TrustedPublisher will **install, but with the blocking "Windows Security / would you like
to install this device software?" prompt** (setupapi warning `0x800b0109`, error `0xe0000242`
"publisher … not yet established as trusted"). So a bare Authenticode signature is **not** sufficient
for a prompt-free public install — **attestation is the minimal correct public path.** The April 2026
kernel-trust change (removing trust for legacy cross-signed *kernel* drivers) **does not affect**
attestation/WHQL or user-mode UMDF drivers.
What attestation does **not** do: attestation-signed drivers are **not** distributed via Windows
Update — irrelevant here, since punktfunk bundles the driver in its Inno installer exactly like
SudoVDA. (Azure Trusted Signing is **not** an option for the driver `.cat` at all — it signs only
user-mode PE / `/INTEGRITYCHECK` / SmartScreen, and cannot substitute for the EV cert in Partner
Center; it could only improve SmartScreen reputation on the installer `.exe`.) Note attestation does
**not** require HLK/WHQL testing. The heavier fallback, only if attestation's "testing scenarios"
positioning ever hardens into a block, is full **WHQL/HLK** submission (also yields a Microsoft-signed
catalog, plus Windows Update eligibility).
### Coexistence in the Inno installer
`packaging/windows/punktfunk-host.iss` already gates the SudoVDA driver payload behind
`#ifdef WithDriver` + the `installdriver` task + a `[Run]` call to `install-sudovda.ps1`. Add an
analogous gated payload + `install-dualsense.ps1` for the virtual DualSense driver, switching the
bundled `.cat` per build:
- **fleet build** → self-signed `.cat` + `install-dualsense.ps1` keeps the
`certutil -addstore Root/TrustedPublisher` step (cloned from `install-sudovda.ps1`).
- **public build** → Microsoft-attestation-re-signed `.cat`, and `install-dualsense.ps1`
**drops** the `certutil` import (just `pnputil /add-driver /install`).
Operationally, the EV key lives on a non-exportable FIPS token, so the **CAB signing + Partner Center
submission is a manual offline step**, not a CI secret (cloud-HSM/Azure Key Vault EV options exist but
need per-CA confirmation). The Microsoft-resigned `.cat` is then committed as the vendored public
payload, the way SudoVDA's signed driver is vendored in `packaging/windows/sudovda/`.
## Feasibility gate (BLOCKING — M0, on-glass only)
No prior art settles the two questions that decide whether this whole effort is worth building. **This
gate blocks M1–M6** and can only be answered on the **RTX box (`192.168.1.173`)** — the dev VM is
headless/WARP and cannot validate game-facing HID recognition:
1. **Recognition:** is a virtual `054C:0CE6` UMDF2 device accepted as a *genuine DualSense* by
`Windows.Gaming.Input` / GameInput / Steam (and native-DS5 games)? HIDMaestro proves DualSense
*recognition* is possible, but…
2. **Adaptive-trigger fidelity:** does the game's output report `0x02` (the adaptive-trigger block)
actually reach the driver's `WriteReport`/`SetOutputReport` callback? **HIDMaestro omits adaptive
triggers**, so no prior art proves this — it must be **measured on glass**.
If (2) fails, the realistic product is a DualSense *identity* without adaptive triggers — at which
point the value over ViGEm DS4 collapses and the project should likely **defer** rather than ship.
**M0 RESULT (2026-06-21): GATE PASSED.** Both answered YES on the RTX box with a self-authored **Rust**
UMDF minidriver (`packaging/windows/dualsense-driver/`). (1) **Recognition:** Steam recognized the virtual
`054C:0CE6` device as a genuine DualSense and drove its DualSense-specific LEDs. (2) **`0x02` reaches the
write callback:** captured two Steam-Input output reports (`validFlag1=0x14` = LIGHTBAR|PLAYER_INDICATOR).
Adaptive-trigger-specific bytes ride the same `0x02` path (Cyberpunk confirmation is gravy, not a gate).
Three bugs had to be fixed to get there — the load wall was the PE **FORCE_INTEGRITY** bit (`wdk-build`'s
`/INTEGRITYCHECK`; clear bit `0x80` at PE+0x5e + re-sign), then `WdfTimerCreate` exec-level, then a parallel
queue's zeroed `NumberOfPresentedRequests`. **Option R (Rust) confirmed for M2; no C shim needed.**
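
The FORCE_INTEGRITY fix mentioned above (clear bit `0x80` at PE+`0x5e`) can be sketched as a small post-build patch over the image bytes. This is a minimal illustration, not the project's actual tooling: the function name is hypothetical, it operates on an in-memory buffer, and it does not touch the PE checksum or perform the re-signing step that the fix also requires.

```rust
/// Clear the FORCE_INTEGRITY bit (0x80 in the low byte of DllCharacteristics,
/// located 0x5e bytes after the "PE\0\0" signature) in a PE image.
/// Returns whether the bit was set before patching.
pub fn clear_force_integrity(image: &mut [u8]) -> Result<bool, &'static str> {
    // DOS header: "MZ" magic, with e_lfanew (offset of the PE signature) at 0x3c.
    if image.len() < 0x40 || !image.starts_with(b"MZ") {
        return Err("not an MZ image");
    }
    let pe = u32::from_le_bytes([image[0x3c], image[0x3d], image[0x3e], image[0x3f]]) as usize;

    // The PE signature must be present at e_lfanew.
    let has_sig = image.get(pe..).map_or(false, |s| s.starts_with(b"PE\0\0"));
    if !has_sig {
        return Err("missing PE signature");
    }

    // DllCharacteristics is a little-endian u16; 0x0080 lives in its low byte.
    let off = pe + 0x5e;
    if image.len() < off + 2 {
        return Err("truncated optional header");
    }
    let was_set = image[off] & 0x80 != 0;
    image[off] &= !0x80;
    Ok(was_set)
}
```

A caller would read the built `.dll` into a `Vec<u8>`, call this function, write the bytes back, and then re-sign, since any existing signature no longer matches the patched image.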
**Host integration status (2026-06-21): M1/M3/M4 landed; data plane runtime-proven.** The Linux
DualSense logic is shared via `inject/dualsense_proto.rs`; the Windows backend
`inject/dualsense_windows.rs` (`DualSenseWindowsManager`) drives the driver over the
`Global\pfds-shm-<idx>` section, and the `PadBackend`/`pick_gamepad` seam now resolves DualSense on
Windows. Live-verified on the RTX box: the manager creates the section + pushes report `0x01` and a
devnode serves it to a HID read (manager data plane works). **Open item — `SwDeviceCreate`
per-session devnode:** two `E_INVALIDARG` causes found — (1) an underscore in the enumerator name
(`pf_dualsense` → use `punktfunk`), (2) passing the completion callback is still rejected (cause
unresolved; needs a known-good C reference). So per-session auto-creation is **best-effort/non-fatal**:
the host falls back to an out-of-band `pf_dualsense` devnode (the INF lists both `root\pf_dualsense`
for devgen and `pf_dualsense` for SwDevice; the installer would create it, as SudoVDA does). Remaining:
fix the SwDeviceCreate callback E_INVALIDARG, then the M5 on-glass game test.
## Milestone plan (M0–M6)
| # | Milestone | Output | Gate / risk |
|---|---|---|---|
| **M0 ✅ DONE** | **Feasibility spike — PASSED (2026-06-21)** | (a) Rust `windows-drivers-rs` UMDF build spike — symbols usable, driver authored in Rust; (b) on-glass on the RTX box: self-signed Rust `054C:0CE6` UMDF minidriver loads under Secure Boot, Steam recognizes it as a DualSense, `0x02` output reaches the write callback. Source: `packaging/windows/dualsense-driver/` | **PASSED.** Option R (Rust) chosen for M2. Load needed clearing the PE FORCE_INTEGRITY bit |
| **M1** | Linux codec refactor | Extract the transport-independent contract from `dualsense.rs` into `inject/dualsense_proto.rs` (`DUALSENSE_RDESC`, `serialize_state`, `parse_ds_output`, feature blobs); **fix `DS_FEATURE_CALIBRATION` 42 → 41**; Linux backend keeps passing | Pure refactor; keep Linux loopback green |
| **M2** | UMDF2 driver | The HID minidriver + INF + signed `.cat` (test-signed for dev). **Language per M0(a):** Rust if the build spike is solid, else the `vhidmini2`-derived C shim. INF carries the required UMDF directives (`UmdfKernelModeClientPolicy=AllowKernelModeClients`, `UmdfMethodNeitherAction=Copy`, `UmdfFileObjectPolicy=AllowNullAndUnknownFileObjects`, `UmdfFsContextUsePolicy=CanUseFsContext2`), root-enumerated `HIDClass`, filter under `mshidumdf.sys` | Pioneering if Rust; manual inverted-call queue is the hard part |
| **M3** | Rust host bridge | `inject/dualsense_windows.rs`: `SwDeviceCreate` per-session device (hold `HSWDEVICE` for the session) + the inverted-call IOCTL channel, feeding `0x01` and surfacing `0x02` as `HidOutput` — reusing `dualsense_proto.rs` | Channel design (single control device + inverted-call IOCTL vs shared-memory) |
| **M4** | Un-gate the seam + negotiation | New `PadBackend::DualSense` Windows arm; relax the `#[cfg(target_os="linux")]` guards on DualSense/DualShock4 in `pick_gamepad`/`resolve_gamepad` to `any(linux, windows)`; wire `GamepadPref::DualSense` resolution | Small; `dualshock4.rs` is the template |
| **M5** | On-glass E2E | Client → host → virtual DualSense → game, with adaptive triggers / lightbar / touchpad / motion / rumble round-tripping; latency check | RTX box; the real proof |
| **M6** | Packaging / installer | Vendor + sign the driver; `install-dualsense.ps1` (fleet vs public variant); gate the payload in `punktfunk-host.iss`; complete the **EV cert + attestation** submission for the public build | EV-cert procurement + Partner Center turnaround are lead-time items — start early |

| # | Milestone | State | Commit(s) |
|---|---|---|---|
| **M0** | Feasibility spike (Rust UMDF build + on-glass recognition + `0x02` callback) | ✅ SHIPPED | driver `aa159df` |
| **M1** | Extract transport-independent contract into `inject/dualsense_proto.rs` (`DUALSENSE_RDESC`, `serialize_state`, `parse_ds_output`, feature blobs; calibration trimmed 42→41) | ✅ SHIPPED | `4a73102` |
| **M2** | UMDF2 HID minidriver + INF + signed `.cat` (authored in **Rust**) | ✅ SHIPPED | `aa159df` |
| **M3** | Rust host bridge `inject/dualsense_windows.rs` (`DualSenseWindowsManager` over `Global\pfds-shm-<idx>`; `SwDeviceCreate` per-session devnode) | ✅ SHIPPED | `4a73102`, `fde438a`, `6db3525` |
| **M4** | Un-gate the `PadBackend::DualSense` seam + `GamepadPref::DualSense` resolution on Windows; ViGEm dropped (pure user-mode DS4/Xbox 360 too) | ✅ SHIPPED | `b0c8233` |
## Decision matrix A `SwDeviceCreate` gotcha surfaced during M3 and is worth keeping: two `E_INVALIDARG` causes were found
— (1) an **underscore in the enumerator name** (`pf_dualsense` → must be `punktfunk`), and (2) passing
the completion callback was rejected; the INF lists both `root\pf_dualsense` (devgen) and `pf_dualsense`
(SwDevice) and the host falls back to an out-of-band devnode when per-session create fails.
## Decision matrix (condensed)
| Option | Adaptive triggers / DS5 identity | Effort | When it's right | | Option | Adaptive triggers / DS5 identity | Effort | When it's right |
|---|---|---|---| |---|---|---|---|
| **A. UMDF2 virtual DualSense** (parity) | ✅ full (pending the M0 gate) | medium — **UMDF, same tier as SudoVDA** (was mis-scoped as "kernel/large" in the 2026-06-20 draft) | the goal — matches the Linux host | | **A. UMDF2 virtual DualSense** (shipped) | ✅ full | medium — UMDF, same tier as SudoVDA | the goal — matches the Linux host |
| **B. ViGEm DS4** (interim) | ❌ never (DS4 ceiling) | small | quick PS-pad on Windows w/ touchpad/motion/lightbar/rumble, no adaptive triggers | | **B. ViGEm DS4** | ❌ never (DS4 ceiling) | small | quick PS-pad, no adaptive triggers — **rejected, ViGEm removed** |
| **C. Hybrid** | A for DS5 clients, B/Xbox 360 fallback | A + small | belt-and-suspenders once A exists | | **C. Hybrid** | A for DS5, Xbox 360 fallback | A + small | belt-and-suspenders (Xbox 360/XInput still covers most games) |
| **D. Defer** | — | — | if the M0 gate fails (esp. output `0x02` fidelity), or a higher-ROI item wins the slot | | **D. Defer** | — | — | would have applied only if the M0 `0x02` gate had failed |
Xbox 360 (XInput, via ViGEm) is already implemented and covers most Windows games regardless; Xbox Xbox 360 (XInput) covers most Windows games regardless; Xbox One/Series fold into it on Windows.
One/Series fold into it on Windows. Windows-host DualShock 4 (ViGEm) remains separately deferred.
## Risk register ## Risk register (condensed)
| Risk | Likelihood | Impact | Mitigation | | Risk | Status |
|---|---|---|---| |---|---|
| Output report `0x02` (adaptive triggers) never reaches the driver write callback | medium | **fatal** to the value prop | M0(b) measures it directly; if it fails → Option D | | Output `0x02` never reaches the driver write callback (fatal to value prop) | **resolved** — M0 measured it directly, YES |
| `054C:0CE6` UMDF2 device not accepted as a real DualSense by WGI/GameInput/Steam | lowmed | fatal | M0(b); HIDMaestro suggests recognition works, but confirm | | `054C:0CE6` not accepted as a real DualSense | **resolved** — Steam recognizes it |
| Rust UMDF driver pioneering risk (first of its kind; no safe WDF/HID wrappers; symbol coverage unproven) | medium | schedule | M0(a) build spike; **Option C (C shim) as the de-risked M2 fallback** | | Rust UMDF pioneering risk (no safe WDF/HID wrappers; symbol coverage) | **resolved** — Rust driver shipped, no C shim |
| EV cert + Partner Center attestation lead time / friction | medium | schedule | Start procurement at M0; lean on the SudoVDA UMDF submission precedent | | `SwDeviceCreate` device lifetime tied to host process handle | accepted — hold `HSWDEVICE` for the session (matches Linux UHID fd semantics) |
| EV key non-exportable → can't sign in CI | high | low | Accept a manual offline sign+submit step; vendor the Microsoft-resigned `.cat` | | `windows-drivers-rs` transient toolchain breaks (LLVM-22 bindgen, Disc. #591) | low — pin LLVM 21.1.2 |
| `SwDeviceCreate` device lifetime tied to the host process handle | known | low | Hold `HSWDEVICE` for the session lifetime (matches Linux UHID fd semantics) | | EV cert + Partner Center attestation lead time / friction | **open** (see below) |
| `windows-drivers-rs` transient toolchain breaks (e.g. LLVM-22 bindgen, Disc. #591) | low | low | Pin LLVM 21.1.2; not a fundamental defect |
| `DS_FEATURE_CALIBRATION` 42-byte blob rejected by strict Windows consumers | low | low | Trim to 41 bytes in M1 |
## Open questions ## Open items
1. **Driver channel design** (unknown): punktfunk's own driver↔host protocol — simplest is a private 1. **Public-distribution signing — EV cert + Microsoft attestation.** The fleet/self-signed recipe
control device with an inverted-call IOCTL for input + IOCTLs for output/feature, vs HIDMaestro's (bundled `.cer` → machine Root + TrustedPublisher via `certutil -addstore -f`, then `pnputil
shared-memory section. `vhidmini2` has *no* service channel (it self-generates via a timer), so this /add-driver /install`, cloned from `install-sudovda.ps1`) works for dev/internal boxes only —
must be designed fresh (or read out of HIDMaestro/DsHidMini source). **Resolve in M3.** Microsoft is explicit it "should never be followed for any driver package distributed outside your
2. **Rust UMDF symbol coverage** (unknown — the M0(a) gate): are all needed WDF symbols organization." For arms-length public users the minimal correct path is **Microsoft attestation
(`WdfIoQueueCreate`, `WdfFdoInitSetFilter`, `WdfRequestRetrieveOutputMemory`, manual-queue APIs, signing** via Partner Center (it re-signs the `.cat` with a Microsoft cert → silent PnP install, no
`WDF_IO_QUEUE_CONFIG_INIT`) generated/usable from `wdk-sys` for the UMDF target? publisher prompt, no Root-store import). A bare Authenticode/OV/EV signature is **not** sufficient:
3. **Attestation for a Rust-authored `.dll`** (likely fine, unverified): attestation re-signs the it installs but with the blocking "would you like to install this device software?" prompt
`.cat` and treats the `.dll` opaquely (allowed type), so language *should* be irrelevant to (setupapi `0x800b0109` / `0xe0000242`). Attestation needs a registered Windows Hardware Developer
signing — but Microsoft has not explicitly settled the WHCP path for Rust drivers. Confirm via a Program (Partner Center) account **and an EV code-signing cert** (FIPS hardware token, ~USD
Partner Center dry-run. 250–560/yr, 1–7 day vetting) to register and to sign the submission CAB. UMDF is exempt from
4. **Single multi-driver CAB** (unknown, operationally useful): can one Partner Center submission carry kernel-mode load enforcement so the `.dll` *loads* unsigned, but *installation* still needs a
*both* the existing SudoVDA driver and the new DualSense driver? Multi-driver CABs are supported in trusted catalog. The EV key is non-exportable → CAB signing + submission is a **manual offline
general; unverified for this account. step**, not a CI secret; vendor the Microsoft-resigned `.cat` like SudoVDA's. (Azure Trusted Signing
5. **EV cert + Partner Center mechanics** (unknown): exact cost/turnaround; whether a cloud-HSM EV cannot substitute — it signs only user-mode PE/`/INTEGRITYCHECK`/SmartScreen, not the driver `.cat`.)
option lets CI sign, or whether it must be a manual offline step (most likely the latter). **Blocks public release; dev/fleet self-signed works today.**
6. **HidHide** (carried over): needed at all on a usually-headless host, or only when a physical pad is
attached? 2. **GameInput API detection reads VID/PID as `0x0000`.** The GameInput path does not pick up the
7. **Min-OS / UMDFVERSION target** (unknown): which `UmdfLibraryVersion` / WDK to target for the widest `054C:0CE6` identity (reads `0x0000`); may require the KMDF USB-emulating bus driver rather than the
Win10/11 install base, consistent with punktfunk's existing host support matrix. root-enumerated UMDF HID device. Tracked in
8. **DsHidMini end-user signing tier** (unknown): self-signed vs attestation in its WixSharp MSI — [`design/windows-dualsense-game-detection.md`](windows-dualsense-game-detection.md).
useful as a second public-distribution data point.
3. **HidHide integration** — unclear value on a usually-headless host; only relevant when a physical
pad is also attached. Decide whether to bundle/integrate at all.
4. **Minimum-OS / `UMDFVERSION` targeting decision** — which `UmdfLibraryVersion` / WDK to target for
the widest Win10/11 install base, consistent with punktfunk's existing host support matrix.
5. **Single multi-driver CAB** — can one Partner Center submission carry *both* SudoVDA and the
DualSense driver? Multi-driver CABs are supported in general; unverified for this account.
6. **DsHidMini end-user signing tier** — self-signed vs attestation in its WixSharp MSI, useful as a
second public-distribution data point.
+160 -182
@@ -5,107 +5,80 @@
> a signed Inno Setup installer with a LocalSystem SCM service. Live-validated on the RTX box through > a signed Inno Setup installer with a LocalSystem SCM service. Live-validated on the RTX box through
> 5120×1440@240 HDR, the secure desktop (lock/UAC), and a fullscreen game. > 5120×1440@240 HDR, the secure desktop (lock/UAC), and a fullscreen game.
> >
> This file **consolidates and replaces** five earlier docs (now retired into it): the rewrite design > This file is the consolidated Windows-host doc — it absorbs the rewrite design plan, the Goal-1
> plan, the Goal-1 staged-refactor plan, the audit, the audit-remediation tracker, and the > staged-refactor plan, the audit + remediation tracker, the fullscreen-game capture-bug analysis, and
> fullscreen-game capture-bug analysis. See the [consolidation note](#appendix--consolidation-note) for > the durable rationale from the original `windows-host.md` implementation plan (now a stub).
> what moved where. **Last updated 2026-06-26.** Work lives on branch **`windows-host-goal1`** (off > **Last updated 2026-06-26.** All of this work is **merged to `main`** (the `windows-host-goal1`
> `main`, not yet merged). > branch landed at `3e7c9bd`).
--- ---
## 1. Status at a glance ## 1. Status at a glance
The Windows host is **functionally complete and validated on glass.** The hard, high-risk proofs are done: **Goals 1–3 and milestones M0–M4 are complete and merged to `main`.** The host has a clean, typed,
a clean all-Rust IddCx driver on the unified `windows-drivers-rs` stack (the `/INTEGRITYCHECK` answer + layered architecture (`HostConfig → SessionPlan → SessionContext`, `windows/`+`linux/` confinement, a
the `iddcx` `wdk-sys` binding), IDD-push zero-copy capture at 5K@240 HDR, the secure desktop (Winlogon / single `VirtualDisplayManager` ownership model, `EncoderCaps`); the all-Rust IddCx `pf-vdisplay` driver
UAC / lock), and the host re-architected into a clean, typed, layered shape. What remains is loads self-signed under Secure Boot and does IDD-push zero-copy capture at 5K@240 HDR including the
**non-blocking**: hygiene (host `unsafe` lints, a few `OwnedHandle` rollouts), the SudoVDA backend **secure desktop** (Winlogon/UAC/lock); SudoVDA is gone (`84a3b95`) — `pf-vdisplay` is the sole
deletion (decoupled, not yet removed), a driver robustness gap (slot reclaim), the gamepad-driver virtual-display backend; and the three UMDF drivers (`pf-vdisplay`, `pf-dualsense`, `pf-xusb`) now build
unification (M4), and old-monolith cleanup (M6) — plus the merge to `main`. from source in one unified `packaging/windows/drivers/` workspace (M4, `92e6802`). The shipped path
(IDD-push + NVENC) is live-validated on glass; the AMF/QSV encode path is CI-green but not yet
on-hardware (no AMD/Intel Windows box in the lab).
One framing correction baked into this doc: the host was **not** greenfield-rebuilt as the original plan Ground the details against the code: `crates/punktfunk-host/src/windows/`,
imagined. It was **refactored in place** via a staged, behavior-preserving sequence (the "Goal-1" plan), `crates/punktfunk-host/src/{capture,encode,inject,audio,vdisplay}/windows/`, and
which kept the live-validated host working at every step. The driver, by contrast, *was* rebuilt fresh `packaging/windows/drivers/`.
(the new `packaging/windows/drivers/pf-vdisplay/` tree).
### Scorecard (verified against `windows-host-goal1` HEAD, 2026-06-25) **What remains (all non-blocking):** the `pf-vdisplay` slot-reclaim-on-REMOVE fix needs an on-glass
reconnect-storm A/B (§4 P1.3); host-crate `unsafe` lint hygiene + old-monolith / bring-up-scaffolding
| Item | Status | Evidence | cleanup (§4 P2); and the hardware-gated items — AMF/QSV on-glass, hybrid-GPU `SET_RENDER_ADAPTER`,
|---|---|---| the WGC/DDA fallback reshape, and true `max_concurrent>1` (§4 P3). One framing note: the host was
| **Goal 1** — clean, layered host architecture | ✅ **DONE** | `config.rs` (`HostConfig`), `session_plan.rs` (`SessionPlan`), `SessionContext`, `windows/`+`linux/` confinement (`38c68c3`), `VirtualDisplayManager` (§2.5), `EncoderCaps` (`0ccd0fe`) | **not** greenfield-rebuilt — it was **refactored in place** via a staged, behavior-preserving sequence
| **Goal 2** — drop every trace of SudoVDA | ✅ **DONE** | reach-in decoupled (F1: `d638a93`/`e60cda3` → `win_adapter`/`win_display`), then the `sudovda.rs` backend + the dual-backend select **deleted** (this branch) — pf-vdisplay is the sole Windows virtual-display backend | that kept the live host working at every step; only the *driver* was rebuilt fresh.
| **Goal 3** — minimize `unsafe` + P0 lints | 🟡 **PARTIAL** (**box-validated**) | driver `deny(unsafe_op_in_unsafe_fn)` (`a755d6e`); **`OwnedHandle`/RAII rollout** — `idd_push.rs` (`011607e`, view-leak fix) + `service.rs` child/job (`4c95ba7`) + the 3 gamepad backends via shared `gamepad_raii.rs` (`e5c2b4e`) + the IDD-push `KeyedMutexGuard` hot loop (`6585643`) + the **SCM STOP/SESSION events** → `OnceLock<OwnedHandle>` (`61c02e6`, runtime-validated: clean ~1 s `sc stop`); **driver `pod_init!`** (`bf57704`, 27→1). **On-glass clean: host clippy `-D warnings` + driver build** (RTX box; `bd05bc8` fixed 11 lints the gate surfaced). The host-side raw-handle smuggling is fully retired; only host-crate P0 lints remain (deferred — churn>value) |
| **M0** — proto ABI + driver toolchain + `/INTEGRITYCHECK` + `iddcx` | ✅ **DONE** | `pf-driver-proto`; vendored `windows-drivers-rs` 0.5.1; `clear-force-integrity.ps1`; CI-green |
| **M1** — new IddCx driver, first light + HDR | ✅ **DONE (on-glass)** | STEP 0–8 (`d7a9fbf`–`cd59151`); HDR live ("Mac connects WITH HDR", `6399d28`) |
| **M2** — IDD-push capture + NVENC, glass-to-glass | ✅ **DONE (on-glass)** | 5120×1440@240 HDR zero-copy; integrated into the host path |
| **M3** — service / input / audio / **secure desktop** | ✅ **DONE (on-glass)** | secure desktop (lock/UAC) **owner-confirmed 2026-06-25** — IDD-push captures it + input reaches it |
| **M4** — gamepad drivers onto the unified stack | ❌ **OPEN** | `pf_dualsense`/`pf_xusb` still standalone (`packaging/windows/{dualsense,xusb}-driver/`), not in `drivers/` workspace |
| **M5** — WGC/DDA fallback reshape + GameStream-on-pipeline + AMF/QSV | 🟡 **PARTIAL** | fallbacks exist (`wgc.rs`/`wgc_relay.rs`/`dxgi.rs`), not reshaped onto the new seams; AMF/QSV CI-only (no lab hw) |
| **M6** — cut over + delete the old monoliths | 🟡 **PARTIAL** | old `vdisplay-driver/` tree deleted (`a2bd0cd`); host monoliths + bring-up scaffolding (`spawn_observer`/`DebugBlock`) remain |
| **Game-capture bug (GB1)** — fullscreen game breaks IDD-push | ✅ **FIXED** | resolution-listening recovery (`c87bfe0`) + open-time DDA failover (`f98ab07`) + driver guard/log (`789ad49`) |
| **Audit P0/P1/P2** | ✅ mostly **RESOLVED** | watchdog, `SET_RENDER_ADAPTER`, log gate, mode bounds, IDD-push fallback, F1, out-ring/HDR-ring, proto asserts — all landed; **open:** host hygiene (§8), E1 completion, slot-reclaim |
--- ---
## 2. Architecture (what is on disk) ## 2. Architecture (what is on disk)
A ~1-page map; the empirical constraints these encode are in §3, the deep reference is in §6.
### 2.1 Layering & crates ### 2.1 Layering & crates
- **`crates/punktfunk-host`** — one shared host crate (Linux + Windows; not split). Platform code is - **`crates/punktfunk-host`** — one shared host crate (Linux + Windows; not split). Platform code is
confined under per-module `windows/`+`linux/` folders behind `#[cfg]` seams (`capture/{windows,linux}/`, confined under per-module `windows/`+`linux/` folders behind `#[cfg]` seams (`capture/{windows,linux}/`,
`encode/{windows,linux}/`, `inject/{windows,linux}/`, `audio/{windows,linux}/`, `vdisplay/{windows,linux}/`, `encode/…`, `inject/…`, `audio/…`, `vdisplay/…`, plus top-level `src/windows/`+`src/linux/`). Module
and top-level `src/windows/`+`src/linux/`). Module names stay flat (`#[path]`), so caller paths are names stay flat (`#[path]`), so caller paths are platform-agnostic.
platform-agnostic.
- **`crates/punktfunk-core`** — the one linked protocol/FEC/crypto/QUIC core (unchanged here). - **`crates/punktfunk-core`** — the one linked protocol/FEC/crypto/QUIC core (unchanged here).
- **`crates/pf-driver-proto`** — the owned, `no_std` host↔driver ABI (frame ring + control plane + - **`crates/pf-driver-proto`** — the owned, `no_std` host↔driver ABI (frame ring + control plane +
gamepad SHM), consumed by both the host crate and the driver workspace (§2.7). gamepad SHM), consumed by both the host crate and the driver workspace.
- **`packaging/windows/drivers/`** — the unified driver workspace on `microsoft/windows-drivers-rs` - **`packaging/windows/drivers/`** — the unified driver workspace on `microsoft/windows-drivers-rs`
(vendored 0.5.1 + an `iddcx` subset): members `pf-vdisplay` (the IddCx display driver), `wdk-iddcx` (vendored 0.5.1 + an `iddcx` subset): `pf-vdisplay` (the IddCx display driver), `pf-dualsense` +
(the typed IddCx DDI wrappers), `wdk-probe` (the CI link/surface gate), `vendor/{wdk-build,wdk-sys}`. `pf-xusb` (the gamepad drivers, folded in by M4), `wdk-iddcx` (typed IddCx DDI wrappers), `wdk-probe`
(the CI link/surface gate), `vendor/{wdk-build,wdk-sys}`.
### 2.2 Session resolution — `HostConfig → SessionPlan → SessionContext` (Goal-1 realized) ### 2.2 Session resolution, ownership, and seam traits (Goal-1)
The old ~40-knob `PUNKTFUNK_*` env soup, re-read and recomputed in three places, is replaced by a The old ~40-knob `PUNKTFUNK_*` env soup (re-read and recomputed in three places) is replaced by a
resolve-once pipeline: **resolve-once** pipeline: `config.rs` `HostConfig` (typed, parsed once) → `session_plan.rs` `SessionPlan`
(a `Copy` plan resolved once per session — `CaptureBackend::resolve()` picks `IddPush | Dda | Wgc`,
`resolve_topology` picks `SingleProcess | TwoProcessRelay`; this killed the latent capture/encode
backend-disagreement bug) → `SessionContext` (bundles the ~13 session args + plane receivers, moved into
the stream thread).
- **`config.rs` `HostConfig`** — typed config parsed **once** from `host.env`/env/flags Ownership is a single **OnceLock `VirtualDisplayManager`** (`vdisplay/windows/manager.rs`) owning a
(`idd_push`/`encoder_pref`/`no_wgc`/`capture_backend`/`render_adapter`/`secure_dda`/`ten_bit`/`zerocopy`/…). *typed* `Arc<OwnedHandle>` control-device handle (no raw-`isize` cross-thread smuggle), a refcounted
Each field's parser is byte-identical to the read it replaced. (Runtime-mutated Linux session vars from Idle/Active/Lingering state machine, and the monitor generation; a per-session `MonitorLease`'s `Drop`
`vdisplay::apply_session_env`, and single-use local tuning knobs, are deliberately kept live — see the releases the refcount (a stale lease can't tear down a fresh monitor). This deleted a fistful of
`config.rs` header.) `CURRENT_MON_GEN`/`MGR`/`IDD_*` globals and validated on glass at **0 leaked monitors across a reconnect
- **`session_plan.rs` `SessionPlan { display, capture, topology, encoder, input_format, bit_depth, hdr, storm**, A/B-equivalent to the shipping host.
pipeline_depth }`** — a `Copy` plan resolved **once** per session from `HostConfig` + the negotiated
bit-depth, logged, and threaded through `build_pipeline`. `CaptureBackend::resolve()` is the one
resolver (`IddPush | Dda | Wgc`); `resolve_topology` decides `SingleProcess | TwoProcessRelay`. This
killed the latent capture/encode backend-disagreement bug.
- **`SessionContext`** — bundles the session entry's ~13 args (was `#[allow(too_many_arguments)]`) and the
plane receivers into one owned struct moved into the stream thread.
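The resolve-once idea described above can be sketched in a few lines. This is a minimal illustration, not the host crate's code: the type and field names (`HostConfig`, `SessionPlan`, `CaptureBackend::resolve`, `resolve_topology`, `idd_push`, `no_wgc`, `ten_bit`) follow the prose, while the env-var names other than `PUNKTFUNK_IDD_PUSH`, the resolution rules, and the reduced field set are assumptions made for the example.

```rust
// Hypothetical, trimmed-down stand-ins for `HostConfig` / `SessionPlan`.
// The precedence rules below are illustrative assumptions, not the real resolver.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptureBackend {
    IddPush,
    Dda,
    Wgc,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Topology {
    SingleProcess,
    TwoProcessRelay,
}

/// Typed config, parsed exactly once at startup.
#[derive(Debug, Clone, Copy, Default)]
pub struct HostConfig {
    pub idd_push: bool,
    pub no_wgc: bool,
    pub ten_bit: bool,
}

impl HostConfig {
    /// Parse from key/value pairs (a stand-in for `host.env` / env / flags).
    pub fn parse<'a>(vars: impl IntoIterator<Item = (&'a str, &'a str)>) -> Self {
        let mut cfg = HostConfig::default();
        for (key, value) in vars {
            let on = value == "1";
            match key {
                "PUNKTFUNK_IDD_PUSH" => cfg.idd_push = on,
                // The next two variable names are assumed for illustration.
                "PUNKTFUNK_NO_WGC" => cfg.no_wgc = on,
                "PUNKTFUNK_TEN_BIT" => cfg.ten_bit = on,
                _ => {}
            }
        }
        cfg
    }
}

impl CaptureBackend {
    /// The single resolver: every consumer reads the plan instead of re-deriving this.
    pub fn resolve(cfg: &HostConfig) -> Self {
        if cfg.idd_push {
            CaptureBackend::IddPush
        } else if cfg.no_wgc {
            CaptureBackend::Dda
        } else {
            CaptureBackend::Wgc
        }
    }
}

pub fn resolve_topology(capture: CaptureBackend) -> Topology {
    match capture {
        CaptureBackend::Wgc => Topology::TwoProcessRelay,
        CaptureBackend::IddPush | CaptureBackend::Dda => Topology::SingleProcess,
    }
}

/// A `Copy` plan resolved once per session and threaded through the pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SessionPlan {
    pub capture: CaptureBackend,
    pub topology: Topology,
    pub bit_depth: u8,
}

impl SessionPlan {
    pub fn resolve(cfg: &HostConfig, negotiated_bit_depth: u8) -> Self {
        let capture = CaptureBackend::resolve(cfg);
        SessionPlan {
            capture,
            topology: resolve_topology(capture),
            bit_depth: if cfg.ten_bit { negotiated_bit_depth } else { 8 },
        }
    }
}
```

Because the plan is `Copy` and computed in one place, capture and encode cannot disagree about the backend: both read the same value rather than re-reading the environment.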
### 2.3 Ownership model — `VirtualDisplayManager` + `MonitorLease` (§2.5 realized) The seam traits (`VirtualDisplay`/`VirtualOutput`/`VirtualLease`, `Capturer`, `Encoder`,
`AudioCapturer`/`VirtualMic`/`InputInjector`/`PadManager`) got two tightenings: the capturer takes the
desired `OutputFormat { gpu, hdr }` **in** (killing the `capture → encode::windows_resolved_backend()`
back-reference), and `Encoder::caps() -> EncoderCaps` (§2.4) lets the session glue route loss-recovery by
query.
A single **OnceLock `VirtualDisplayManager`** (`vdisplay/windows/manager.rs`) owns a *typed* ### 2.3 Capture — IDD-push primary (normal **and** secure desktop), WGC/DDA fallback, GB1 recovery
`Arc<OwnedHandle>` control-device handle (no raw-`isize` cross-thread smuggle), the refcounted
Idle/Active/Lingering state machine, and the monitor generation (`AtomicU64`). Both Windows backends
(`pf_vdisplay`, `sudovda`) shrank to thin `VdisplayDriver` impls (`open`/`add_monitor`/`remove_monitor`/
`ping`) behind it; `MonitorKey = Guid | Session(u64)`. A per-session `MonitorLease`'s `Drop` releases the
refcount (a stale lease can't tear down a fresh monitor). This deleted the old `CURRENT_MON_GEN`/`MON_GEN`/
two-`MGR`/`IDD_PERSIST`/`IDD_SETUP_LOCK`/`IDD_SESSION_STOP` globals. Validated on glass: **0 leaked active
monitors across a reconnect storm**, A/B-equivalent to the shipping host. (The 5-agent map found
`CURRENT_MON_GEN` had been **write-only** — the per-frame "monitor-gen bail" was never wired — so the gen
lives on the manager + lease only.)
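The lease-and-generation ownership model above can be sketched as follows. This is a simplified illustration under stated assumptions: the real manager lives behind a `OnceLock`, owns an `Arc<OwnedHandle>`, and has an Idle/Active/Lingering state machine, whereas this sketch uses an `Arc<Mutex<_>>`, a plain `active` flag, and a hypothetical `force_recreate` helper to simulate a monitor being replaced underneath an old lease.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Default)]
struct State {
    generation: u64, // bumped every time a fresh monitor is created
    refcount: u32,   // live leases on the current monitor
    active: bool,    // simplified stand-in for Idle/Active/Lingering
}

#[derive(Debug, Default, Clone)]
pub struct VirtualDisplayManager {
    inner: Arc<Mutex<State>>,
}

/// Per-session lease; dropping it releases one reference on its own generation.
pub struct MonitorLease {
    mgr: VirtualDisplayManager,
    generation: u64,
}

impl VirtualDisplayManager {
    pub fn acquire(&self) -> MonitorLease {
        let mut s = self.inner.lock().unwrap();
        if !s.active {
            // Idle: create a fresh monitor under a new generation.
            s.active = true;
            s.generation += 1;
            s.refcount = 0;
        }
        s.refcount += 1;
        MonitorLease { mgr: self.clone(), generation: s.generation }
    }

    /// Hypothetical helper: the monitor is torn down and recreated while old leases exist.
    pub fn force_recreate(&self) {
        let mut s = self.inner.lock().unwrap();
        s.generation += 1;
        s.refcount = 0;
        s.active = true;
    }

    pub fn is_active(&self) -> bool {
        self.inner.lock().unwrap().active
    }

    pub fn refcount(&self) -> u32 {
        self.inner.lock().unwrap().refcount
    }
}

impl Drop for MonitorLease {
    fn drop(&mut self) {
        let mut s = self.mgr.inner.lock().unwrap();
        if s.generation != self.generation {
            return; // stale lease: must not touch the fresh monitor
        }
        s.refcount = s.refcount.saturating_sub(1);
        if s.refcount == 0 {
            s.active = false; // last lease gone: the monitor can be removed
        }
    }
}
```

The generation check in `Drop` is what makes a stale lease harmless: it compares against the manager's current generation and returns without decrementing when they differ.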
### 2.4 The seam traits
`VirtualDisplay`/`VirtualOutput`/`VirtualLease` (RAII keepalive = release), `Capturer`
(`next_frame`/`try_latest`/`set_active`/`hdr_meta`/`pipeline_depth`), `Encoder`
(`submit`/`caps`/`request_keyframe`/`set_hdr_meta`/`invalidate_ref_frames`/`poll`/`flush`),
`AudioCapturer`/`VirtualMic`/`InputInjector`/`PadManager`. Realized tightenings: the capturer takes the
desired `OutputFormat { gpu, hdr }` **in** (killed the `capture → encode::windows_resolved_backend()`
back-reference recomputed in `dxgi.rs`); and `Encoder::caps() -> EncoderCaps { supports_rfi,
supports_hdr_metadata }` lets the session glue route loss-recovery by query (only Windows direct-NVENC
overrides it; the GameStream loop gates the RFI path on `supports_rfi`).
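Routing loss recovery "by query" can be illustrated with a small sketch. The `EncoderCaps` fields and the `caps`/`request_keyframe`/`invalidate_ref_frames` method names come from the text above; the method signatures, the `Recovery` enum, `recover_from_loss`, and both mock encoders are assumptions introduced for the example.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct EncoderCaps {
    pub supports_rfi: bool,
    pub supports_hdr_metadata: bool,
}

pub trait Encoder {
    /// Default: no optional capabilities. Backends override only what they support.
    fn caps(&self) -> EncoderCaps {
        EncoderCaps::default()
    }
    fn request_keyframe(&mut self);
    // Signature assumed: a range of frame numbers to invalidate.
    fn invalidate_ref_frames(&mut self, first: u64, last: u64);
}

#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    Rfi,
    Keyframe,
}

/// Session glue: chooses the recovery path from the caps, not from the backend type.
pub fn recover_from_loss(enc: &mut dyn Encoder, first: u64, last: u64) -> Recovery {
    if enc.caps().supports_rfi {
        enc.invalidate_ref_frames(first, last);
        Recovery::Rfi
    } else {
        enc.request_keyframe();
        Recovery::Keyframe
    }
}

/// Mock of a backend with true reference-frame invalidation.
#[derive(Default)]
pub struct MockNvenc {
    pub invalidated: Vec<(u64, u64)>,
    pub keyframes: u32,
}

impl Encoder for MockNvenc {
    fn caps(&self) -> EncoderCaps {
        EncoderCaps { supports_rfi: true, supports_hdr_metadata: true }
    }
    fn request_keyframe(&mut self) {
        self.keyframes += 1;
    }
    fn invalidate_ref_frames(&mut self, first: u64, last: u64) {
        self.invalidated.push((first, last));
    }
}

/// Mock of a backend that keeps the default caps and can only send a keyframe.
#[derive(Default)]
pub struct MockSoftware {
    pub keyframes: u32,
}

impl Encoder for MockSoftware {
    fn request_keyframe(&mut self) {
        self.keyframes += 1;
    }
    fn invalidate_ref_frames(&mut self, _first: u64, _last: u64) {}
}
```

The glue never names a backend, so adding RFI support to another encoder only requires overriding `caps()` there.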
### 2.5 Capture — IDD-push primary (normal **and** secure desktop), WGC/DDA fallback, GB1 recovery
**IDD-push is the universal primary path.** Capture comes straight from the driver's shared keyed-mutex **IDD-push is the universal primary path.** Capture comes straight from the driver's shared keyed-mutex
texture ring (`capture/windows/idd_push.rs`) — no Desktop Duplication, no `win32u` reparenting hook. The texture ring (`capture/windows/idd_push.rs`) — no Desktop Duplication, no `win32u` reparenting hook. The
@@ -118,7 +91,7 @@ primary path.
- **Open-time fallback:** `IddPushCapturer::open` waits a bounded ~4 s for a *first frame* (not just - **Open-time fallback:** `IddPushCapturer::open` waits a bounded ~4 s for a *first frame* (not just
`DRV_STATUS_OPENED`); on attach failure it returns the keepalive back so `capture.rs` opens **DDA** on `DRV_STATUS_OPENED`); on attach failure it returns the keepalive back so `capture.rs` opens **DDA** on
the same `WinCaptureTarget` — never a 20 s black bail (audit §5.1, `ed58365`/`f98ab07`). the same `WinCaptureTarget` — never a 20 s black bail (`ed58365`/`f98ab07`).
- **Mid-session game mode-set recovery (GB1, fixed):** the 250 ms poll follows the display's *actual* - **Mid-session game mode-set recovery (GB1, fixed):** the 250 ms poll follows the display's *actual*
resolution (`win_display::active_resolution`, CCD/GDI) and recreates the ring on any descriptor change resolution (`win_display::active_resolution`, CCD/GDI) and recreates the ring on any descriptor change
(size **or** HDR) → the driver re-attaches → frames resume at the game's mode, **no reconnect**. If a (size **or** HDR) → the driver re-attaches → frames resume at the game's mode, **no reconnect**. If a
@@ -127,9 +100,10 @@ primary path.
from Windows (`c87bfe0`; the driver's `publish()` width/height guard + flushed log is `789ad49`). from Windows (`c87bfe0`; the driver's `publish()` width/height guard + flushed log is `789ad49`).
- **WGC + DDA** stay as demoted fallbacks for non-IddCx hardware (`wgc.rs`/`dxgi.rs`). The two-process WGC - **WGC + DDA** stay as demoted fallbacks for non-IddCx hardware (`wgc.rs`/`dxgi.rs`). The two-process WGC
secure-desktop relay (`wgc_relay.rs`) is no longer load-bearing now that IDD-push handles the secure secure-desktop relay (`wgc_relay.rs`) is no longer load-bearing now that IDD-push handles the secure
desktop; it is kept recoverable but slated for M5/M6 cleanup. desktop; it is kept recoverable but slated for M5/M6 cleanup. (Its constraint analysis is archived in
[`archive/windows-secure-desktop.md`](archive/windows-secure-desktop.md).)
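The open-time fallback in the list above (wait a bounded time for a first frame, otherwise hand the keepalive back so a different capturer can open on the same target) can be sketched with a channel timeout. Everything here is a stand-in: `Frame`, `Keepalive`, `Opened`, and `open_with_fallback` are hypothetical names, and a `std::sync::mpsc` channel replaces the driver's shared texture ring.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

/// Stand-in for the virtual-display keepalive that must outlive the capturer.
pub struct Keepalive(pub u32);

/// Stand-in for a captured frame.
pub struct Frame(pub u64);

pub enum Opened {
    /// Primary path attached and delivered a real first frame.
    IddPush { first: Frame, keepalive: Keepalive },
    /// Primary path never produced a frame; the keepalive is handed back for DDA.
    Dda { keepalive: Keepalive },
}

/// Wait a bounded time for a first frame rather than a mere "opened" status.
pub fn open_with_fallback(
    frames: &Receiver<Frame>,
    keepalive: Keepalive,
    first_frame_timeout: Duration,
) -> Opened {
    match frames.recv_timeout(first_frame_timeout) {
        Ok(first) => Opened::IddPush { first, keepalive },
        // Timeout or a disconnected producer both fall back instead of bailing.
        Err(_) => Opened::Dda { keepalive },
    }
}
```

Returning the keepalive by value in the fallback arm is the point of the design: the display stays alive across the switch, so the fallback capturer opens on the same target.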
### 2.6 Encode — NVENC / AMF / QSV / software; `EncoderCaps`; HDR ### 2.4 Encode — NVENC / AMF / QSV / software; `EncoderCaps`; HDR
`encode/windows/` dispatches per DXGI adapter vendor (`open_video`): **NVENC** (NVIDIA, direct SDK, `encode/windows/` dispatches per DXGI adapter vendor (`open_video`): **NVENC** (NVIDIA, direct SDK,
`nvenc.rs` — caps-probe-before-configure, bitrate-clamp binary search, true RFI over the DPB, in-band `nvenc.rs` — caps-probe-before-configure, bitrate-clamp binary search, true RFI over the DPB, in-band
@@ -139,38 +113,34 @@ HEVC Main10 + BT.2020 PQ; the client auto-detects PQ from the VUI. The encoder a
size/format/HDR change per frame (tears down + re-inits), so the GB1 capturer's resolution changes are size/format/HDR change per frame (tears down + re-inits), so the GB1 capturer's resolution changes are
handled downstream with no API change. handled downstream with no API change.
### 2.7 Host↔driver ABI — `pf-driver-proto` `Encoder::caps() -> EncoderCaps { supports_rfi, supports_hdr_metadata }` lets the session glue route
loss-recovery by query (only Windows direct-NVENC overrides it; the GameStream loop gates the RFI path on
`supports_rfi` rather than hard-coding per-backend knowledge into the glue).
One `no_std` crate, both build graphs. Owns the **frame plane** (`SharedHeader`, `FrameToken { generation, ### 2.5 Host↔driver ABI & the `pf-vdisplay` driver
seq, slot }` with `pack`/`unpack`, `Global\pfvd-*` name helpers), the **control plane** (fresh interface
GUID — not SudoVDA's `e5bcc234`; contiguous `0x900` IOCTL ops; `u64` session id; a real `GET_INFO` version
handshake the host **asserts** + bails on mismatch), and the **gamepad SHM** (`XusbShm` 64 B, `PadShm`
256 B incl. `device_type`). `bytemuck`-`Pod` + `size_of` **and** `offset_of!` asserts make ABI drift a
**compile error** (`95dcef3`). The host-side gamepad consumers derive their layouts from here; the
**driver-side** gamepad drivers do not yet (M4).
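A rough sketch of how layout asserts turn ABI drift into a compile error, alongside a `pack`/`unpack` round trip for a frame token. The field widths of `FrameToken`, the bit layout of the packed `u64`, and the whole `DemoShm` struct are assumptions for illustration; the real crate also uses `bytemuck`'s `Pod`, which is left out here to keep the example dependency-free.

```rust
use core::mem::{offset_of, size_of};

/// Assumed widths: 16-bit generation, 16-bit slot, 32-bit sequence number.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameToken {
    pub generation: u16,
    pub slot: u16,
    pub seq: u32,
}

impl FrameToken {
    /// Pack into one u64 so the token can be published with a single atomic store.
    pub const fn pack(self) -> u64 {
        ((self.generation as u64) << 48) | ((self.slot as u64) << 32) | (self.seq as u64)
    }

    pub const fn unpack(v: u64) -> Self {
        FrameToken {
            generation: (v >> 48) as u16,
            slot: (v >> 32) as u16,
            seq: v as u32,
        }
    }
}

/// Hypothetical 64-byte shared-memory block (not the real `XusbShm` layout).
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct DemoShm {
    pub magic: u32,
    pub seq: u32,
    pub buttons: u16,
    pub left_trigger: u8,
    pub right_trigger: u8,
    pub sticks: [i16; 4],
    pub reserved: [u8; 44],
}

// If a field is added, removed, or reordered, these fail at compile time
// in both the host and the driver build graphs.
const _: () = assert!(size_of::<FrameToken>() == 8);
const _: () = assert!(size_of::<DemoShm>() == 64);
const _: () = assert!(offset_of!(DemoShm, buttons) == 8);
const _: () = assert!(offset_of!(DemoShm, sticks) == 12);
const _: () = assert!(offset_of!(DemoShm, reserved) == 20);
```

Checking `offset_of!` in addition to `size_of` matters because two fields of equal size can swap places without changing the total size. `core::mem::offset_of!` requires Rust 1.77 or later.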
### 2.8 The `pf-vdisplay` IddCx driver `pf-driver-proto` is one `no_std` crate in both build graphs. It owns the **frame plane** (`FrameToken`
+ `Global\pfvd-*` names), the **control plane** (a fresh interface GUID — *not* SudoVDA's `e5bcc234`;
contiguous `0x900` IOCTL ops; a `GET_INFO` version handshake the host **asserts** + bails on mismatch),
and the **gamepad SHM** (`XusbShm`/`PadShm` incl. `device_type`). `bytemuck`-`Pod` + `size_of` **and**
`offset_of!` asserts make ABI drift a **compile error**.
All-Rust UMDF IddCx driver on `windows-drivers-rs` + the `iddcx` `wdk-sys` subset. STEP 0–8 landed The driver (`packaging/windows/drivers/pf-vdisplay/src/`) is an all-Rust UMDF IddCx driver on
(`packaging/windows/drivers/pf-vdisplay/src/`): `entry.rs` (DriverEntry + `IDD_CX_CLIENT_CONFIG`, 15 `windows-drivers-rs` + the `iddcx` `wdk-sys` subset; the STEP 0–8 build is the checklist in §6.3, its
callbacks), `adapter.rs` (caps + FP16 + `SET_RENDER_ADAPTER`), `monitor.rs`/`callbacks.rs` (the `*2` HDR internals are the invariants in §3, and it loads self-signed under Secure Boot (FORCE_INTEGRITY cleared
mode DDIs, EDID verbatim), `swap_chain_processor.rs` (the worker, `SetDevice`-retry + top-of-loop post-link, §6.1). **Known gaps:** ownership state is still partly process-global with
`terminate`), `frame_transport.rs` (the `FramePublisher` on `pf_driver_proto::frame`), `control.rs` (the `EvtCleanupCallback` on the **WDFDEVICE** (a deliberate, sound choice — E1 in §4); and
typed IOCTL dispatch + host-gone **watchdog** + mode bounds). Self-signed-loadable under Secure Boot slot-reclaim-on-REMOVE (§4 P1.3).
(FORCE_INTEGRITY cleared post-link). **Known gaps:** ownership state is still partly process-global
(`MONITOR_MODES`/`NEXT_ID`/`ADAPTER`/`DEVICE_POOL`) with `EvtCleanupCallback` on the **WDFDEVICE** (not
per-`IDDCX_MONITOR`) — see E1 in §4; and it does not reclaim IddCx monitor **slots** on REMOVE (the
ghost-monitor wedge, §4).
### 2.9 Service, packaging, installer ### 2.6 Service, packaging, installer
A `LocalSystem` SCM supervisor (`service.rs`) token-retargets and `CreateProcessAsUserW`s `serve` into the A `LocalSystem` SCM supervisor (`windows/service.rs`) token-retargets and `CreateProcessAsUserW`s `serve`
console session (so `SendInput` reaches the streamed desktop + the secure desktop), relaunches on into the console session (so `SendInput` reaches both the streamed and the secure desktop), relaunches on
session-change, and kills-on-close via a Job Object. Shipped as a **signed Inno Setup** `setup.exe` session-change, and kills-on-close via a Job Object — the Sunshine/Apollo model (rationale:
(`packaging/windows/`, `windows-host.yml`) that bundles the **new** `pf-vdisplay` driver [`windows-service.md`](windows-service.md)). Shipped as a **signed Inno Setup** `setup.exe`
(`pf_vdisplay.inx` in-tree, old `vdisplay-driver/` tree deleted) + FFmpeg DLLs and delegates to `service (`packaging/windows/`, `windows-host.yml`) that builds + signs all three drivers from source, bundles
install`. GameStream (Moonlight) is kept but the installer/service default to secure `serve` (GameStream them + the FFmpeg DLLs, and delegates to `service install`. GameStream (Moonlight) is kept, but the
opt-in). installer/service default to secure `serve` (GameStream opt-in).
--- ---
@@ -206,11 +176,12 @@ These are expensive empirical wins; keep them intact when touching the code:
## 4. Open work / next tasks (prioritized) ## 4. Open work / next tasks (prioritized)
**P1 — ship-readiness / correctness** **P1 — ship-readiness / correctness**
1. **Merge `windows-host-goal1` → `main` + push** (outward-facing → confirm first). Pushing also runs the 1. **Goal-1 → `main` merge — ✅ DONE.** The `windows-host-goal1` branch is merged (tip `3e7c9bd`); the
full Windows CI matrix incl. the `amf-qsv` encode path, which local checks skip. full Windows CI matrix (incl. the `amf-qsv` encode path that local checks skip) runs on push.
2. **Make IDD-push the default**: today it is gated behind `PUNKTFUNK_IDD_PUSH` (`config.rs` default 2. **IDD-push default — ✅ resolved via `host.env`.** The shipped default `host.env` sets
`false`); deployment sets it in `host.env`. Flip the code default (with the WGC/DDA fallback already in `PUNKTFUNK_IDD_PUSH=1`, so a fresh install runs the validated IDD-push path (with the WGC/DDA fallback
place) so a fresh install runs the validated path, or document the `host.env` requirement explicitly. in place). The bare *in-code* default (`config.rs`) is still `false` (the dev / non-pf-driver default);
flipping it to follow the deployed default is an optional tidy.
3. **pf-vdisplay slot reclaim on REMOVE** (driver robustness) — 🟡 **fix landed, on-glass-validation 3. **pf-vdisplay slot reclaim on REMOVE** (driver robustness) — 🟡 **fix landed, on-glass-validation
pending.** Sustained ADD/REMOVE churn wedged the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because the pending.** Sustained ADD/REMOVE churn wedged the driver (`ADD → 0x80070490 ERROR_NOT_FOUND`) because the
monitor id (EDID serial / `ConnectorIndex` / container GUID) was a **monotonic** `NEXT_ID`, never monitor id (EDID serial / `ConnectorIndex` / container GUID) was a **monotonic** `NEXT_ID`, never
@@ -220,42 +191,16 @@ These are expensive empirical wins; keep them intact when touching the code:
sustained churn on the RTX box, so this needs an **on-glass reconnect-storm A/B** to confirm (the box is sustained churn on the RTX box, so this needs an **on-glass reconnect-storm A/B** to confirm (the box is
ephemeral). Keep `packaging/windows/reset-pf-vdisplay.ps1` as the recovery until validated. ephemeral). Keep `packaging/windows/reset-pf-vdisplay.ps1` as the recovery until validated.
**P2 — hygiene / architecture completion** (the unsafe-reduction + stability priority) **P2 — hygiene / architecture completion**
4. **D1-host — host-crate P0 lints.** Add `#![deny(unsafe_op_in_unsafe_fn)]` + 4. **D1-host — host-crate P0 lints — deferred (low value / high churn).** A crate-wide
`#![warn(clippy::undocumented_unsafe_blocks)]` to the host crate and fix the fallout (~30 of the 52 `#![deny(unsafe_op_in_unsafe_fn)]` produced 100+ FFI-wrap sites across the Linux modules; it *wraps*
`unsafe fn`s need an inner `unsafe {}`). Stage it **per-module, Linux-first** (item-level `#[deny]` on unsafe (discipline) rather than reducing it and doesn't improve stability, so it was deprioritized vs
`linux/zerocopy/cuda.rs`/`egl.rs`, `encode/linux/vaapi.rs` — locally verifiable), then the Windows the `OwnedHandle`/RAII reductions (which are **complete**: `idd_push.rs`, `service.rs`, the three
modules (CI-gated), then promote to crate-level. The driver already has the deny. gamepad backends via a shared `gamepad_raii.rs`, the SCM STOP/SESSION events as `OnceLock<OwnedHandle>`,
5. **D2 — `OwnedHandle` / RAII rollout.** ✅ **DONE (complete).** `capture/windows/idd_push.rs` (`011607e`: the hot-loop `KeyedMutexGuard`, and the driver's `pod_init!`; all box-validated, clean `sc stop` in
a `MappedSection` RAII for the mapping handle **+** the leaked `MapViewOfFile` view, + `OwnedHandle` for ~1 s). The driver already has the deny. Revisit D1-host as a final discipline pass (staged per-module)
the event / ring-slot shared handles); `windows/service.rs` (`4c95ba7`: the child process/thread + Job if desired.
handles, ~9 `CloseHandle` deleted); the **three gamepad backends** (`e5c2b4e`: a shared 5. **M6 scaffolding cleanup**: delete the bring-up diagnostics (`spawn_observer`/`DebugBlock` in
`inject/windows/gamepad_raii.rs` — `Shm` for the section+view, `SwDevice` for the devnode — replacing the
duplicated `create_shm_section` + three hand-written `Drop`s); and the **SCM STOP/SESSION events**
(`61c02e6`: `AtomicIsize` raw-`isize` smuggle → `OnceLock<OwnedHandle>` the capture-free C handler reads,
owned for the process lifetime — also closes a latent close-then-signal window). **Runtime-validated on
the RTX box**: swapped in, `sc start` → RUNNING, `sc stop` → clean STOPPED in ~1 s (not a timeout-kill),
original restored. `manager.rs`/`pf_vdisplay.rs` already used the pattern.
6. **Hot-loop `KeyedMutexGuard` ✅ done** (`6585643`) — the IDD-push consume loop's hand-written
`AcquireSync`/`ReleaseSync` (with its "don't `?`-return between them or you leak the lock + stall the
driver" caveat) is now a RAII guard scoped to the convert/copy block: same release point (latency
unchanged), but leak-proof on any early return. **Driver `pod_init!` ✅** (`bf57704`, 27 `mem::zeroed` →
1). **Skipped `ThreadBound<T>`** (each `unsafe impl Send` wraps a distinct type — churn, no real gain) and
**scratched the IOCTL dispatcher** (`control.rs`'s `read_input<T>`/`write_output_complete<T>` are already
generic with minimal unsafe).
**On-glass build validation (RTX box, 2026-06-26).** Built this branch on the box in an isolated worktree:
**host `cargo clippy -p punktfunk-host --features nvenc -D warnings` = CLEAN**, **driver `cargo build` =
CLEAN** — validating the whole session's Windows + driver work on real hardware. The clippy gate (which the
goal1/§2.5 work never ran — it used `cargo check`) surfaced + fixed 11 lint issues (`bd05bc8`: 9 redundant
`as *mut c_void`, an `if_same_then_else`, an `unused_unsafe` in `pod_init!`). Remaining only a runtime
**latency A/B** for the `KeyedMutexGuard` (provably equivalent — same release point) if a deeper check is
wanted.
7. **D1-host P0 lints — deferred (low value / high churn).** A crate-wide `#![deny(unsafe_op_in_unsafe_fn)]`
produced 100+ FFI-wrap sites across the Linux modules; it *wraps* unsafe (discipline) rather than
reducing it and doesn't improve stability, so it was deprioritized vs the `OwnedHandle`/RAII reductions
above. Revisit as a final discipline pass (staged per-module) if desired.
8. **M6 scaffolding cleanup** — delete the bring-up diagnostics (`spawn_observer`/`DebugBlock` in
`idd_push.rs`) and, once full parity is proven on glass, the host monoliths. `idd_push.rs`) and, once full parity is proven on glass, the host monoliths.
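The hot-loop `KeyedMutexGuard` item in the list above describes replacing a hand-paired acquire/release with an RAII guard. A minimal sketch of that pattern follows; the `KeyedMutex` trait, `MockMutex`, and `consume_slot` are hypothetical stand-ins (the real code wraps the DXGI keyed mutex, whose `AcquireSync`/`ReleaseSync` calls take a key), and the error type is simplified to `String`.

```rust
use std::cell::Cell;

/// Hypothetical abstraction over a keyed mutex with acquire/release by key.
pub trait KeyedMutex {
    fn acquire_sync(&self, key: u64, timeout_ms: u32) -> Result<(), String>;
    fn release_sync(&self, key: u64);
}

/// Holds the lock for exactly its own scope; releases on every exit path.
pub struct KeyedMutexGuard<'a, M: KeyedMutex> {
    mutex: &'a M,
    release_key: u64,
}

impl<'a, M: KeyedMutex> KeyedMutexGuard<'a, M> {
    pub fn acquire(
        mutex: &'a M,
        acquire_key: u64,
        release_key: u64,
        timeout_ms: u32,
    ) -> Result<Self, String> {
        mutex.acquire_sync(acquire_key, timeout_ms)?;
        Ok(Self { mutex, release_key })
    }
}

impl<M: KeyedMutex> Drop for KeyedMutexGuard<'_, M> {
    fn drop(&mut self) {
        self.mutex.release_sync(self.release_key);
    }
}

/// Test double that records whether the lock is held and how often it was released.
#[derive(Default)]
pub struct MockMutex {
    pub held: Cell<bool>,
    pub releases: Cell<u32>,
}

impl KeyedMutex for MockMutex {
    fn acquire_sync(&self, _key: u64, _timeout_ms: u32) -> Result<(), String> {
        if self.held.get() {
            return Err("already held".to_string());
        }
        self.held.set(true);
        Ok(())
    }
    fn release_sync(&self, _key: u64) {
        self.held.set(false);
        self.releases.set(self.releases.get() + 1);
    }
}

/// The convert/copy block: an early `return` or `?` can no longer leak the lock.
pub fn consume_slot<M: KeyedMutex>(mutex: &M, fail_convert: bool) -> Result<u32, String> {
    let _guard = KeyedMutexGuard::acquire(mutex, 1, 0, 16)?;
    if fail_convert {
        return Err("convert failed".to_string());
    }
    Ok(42)
}
```

The guard drops at the end of `consume_slot` on both the success and the error path, so the release point is the same as in a hand-written pairing while the leak-on-early-return hazard is removed.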
**Explicitly NOT doing (stability decision): E1 — driver `DeviceContext` ownership + per-`IDDCX_MONITOR` **Explicitly NOT doing (stability decision): E1 — driver `DeviceContext` ownership + per-`IDDCX_MONITOR`
@@ -266,20 +211,21 @@ watchdog races device cleanup) for no gain, and the per-monitor cleanup callback
on this UMDF/IddCx stack. Cleanup is already deterministic (WDFDEVICE `EvtCleanupCallback` + on this UMDF/IddCx stack. Cleanup is already deterministic (WDFDEVICE `EvtCleanupCallback` +
`cleanup_for_device_removal` + the host-gone watchdog). **Revisit only if `max_concurrent>1` on Windows is `cleanup_for_device_removal` + the host-gone watchdog). **Revisit only if `max_concurrent>1` on Windows is
actually needed.** (`monitor.rs` documents this rationale at the `MONITOR_MODES` static.) actually needed.** (`monitor.rs` documents this rationale at the `MONITOR_MODES` static.)
8. **M6 scaffolding cleanup** — delete the bring-up diagnostics (`spawn_observer`/`DebugBlock` in
`idd_push.rs`) and, once full parity is proven on glass, the host monoliths.
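The RAII pattern from item 6 can be sketched as follows. This is a minimal illustration, not the code from `6585643`: the `KeyedMutex` trait, the `MockMutex` type, and `consume_frame` are hypothetical stand-ins so the example runs without D3D11, and the key and timeout values are arbitrary.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the keyed-mutex interface (the real one is a COM object).
pub trait KeyedMutex {
    fn acquire_sync(&self, key: u64, timeout_ms: u32) -> Result<(), String>;
    fn release_sync(&self, key: u64);
}

/// Holds the lock for as long as the guard is alive; `Drop` releases it.
pub struct KeyedMutexGuard<'a, M: KeyedMutex> {
    mutex: &'a M,
    release_key: u64,
}

impl<'a, M: KeyedMutex> KeyedMutexGuard<'a, M> {
    pub fn acquire(
        mutex: &'a M,
        acquire_key: u64,
        release_key: u64,
        timeout_ms: u32,
    ) -> Result<Self, String> {
        mutex.acquire_sync(acquire_key, timeout_ms)?;
        Ok(Self { mutex, release_key })
    }
}

impl<M: KeyedMutex> Drop for KeyedMutexGuard<'_, M> {
    fn drop(&mut self) {
        // Runs on normal scope exit and on every early return.
        self.mutex.release_sync(self.release_key);
    }
}

/// Test double that records whether the lock is held and how often it was released.
#[derive(Default)]
pub struct MockMutex {
    pub held: Cell<bool>,
    pub releases: Cell<u32>,
}

impl KeyedMutex for MockMutex {
    fn acquire_sync(&self, _key: u64, _timeout_ms: u32) -> Result<(), String> {
        if self.held.get() {
            return Err("already held".to_string());
        }
        self.held.set(true);
        Ok(())
    }

    fn release_sync(&self, _key: u64) {
        self.held.set(false);
        self.releases.set(self.releases.get() + 1);
    }
}

/// Hypothetical consume step: the guard is scoped to the convert/copy block only.
pub fn consume_frame(mutex: &MockMutex, fail_convert: bool) -> Result<u32, String> {
    {
        let _guard = KeyedMutexGuard::acquire(mutex, 0, 0, 16)?;
        if fail_convert {
            // Early return: the guard still releases the lock.
            return Err("convert failed".to_string());
        }
    } // Release point: end of the convert/copy block.
    Ok(42)
}
```

Scoping the guard to an inner block keeps the release point where the hand-written `ReleaseSync` call used to be, which is why the latency profile is expected to be unchanged.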
**P3 — larger, mostly hardware-gated** **P3 — larger, mostly hardware-gated**
9. **M4 — gamepad-driver unification.** Fold `pf_dualsense` + `pf_xusb` (standalone 6. **M4 — gamepad-driver unification — ✅ substantially DONE** (`92e6802`). `pf-dualsense` (DualSense /
`packaging/windows/{dualsense,xusb}-driver/` on the old WDF stack) into the unified `drivers/` workspace DualShock 4) and `pf-xusb` (Xbox 360 / XInput) now live in the unified `packaging/windows/drivers/`
on `windows-drivers-rs` with WDF device contexts (true multi-pad), and point the **driver side** at workspace and build from source per release against the vendored `wdk-sys`, exactly like `pf-vdisplay`;
`pf_driver_proto::gamepad::{PadShm,XusbShm}` (host side already does — the `device_type`-at-offset-140 `build-gamepad-drivers.ps1` signs them with the shared cert. **Remaining:** point the **driver side** at
hand-duplication is the last ABI-drift hazard). Largest item. `pf_driver_proto::gamepad::{PadShm,XusbShm}` (the host side already does — the `device_type`-at-offset
10. **M5 — reshape WGC/DDA + GameStream onto `session/pipeline`**, then delete the old relay/monoliths. hand-duplication is the last ABI-drift hazard), add WDF device contexts for true multi-pad, and confirm
AMF/QSV stays CI-only (no lab hardware). the source build matches the prior shipped binaries.
11. **On-glass behavioral validation** of the committed-but-unexercised fixes: the watchdog reaping on 7. **M5 — reshape WGC/DDA + GameStream onto `session/pipeline`**, then delete the old relay/monoliths.
host-kill, `SET_RENDER_ADAPTER` on a **hybrid** box (the lab box is single-dGPU), the IDD-push→DDA AMF/QSV stays CI-only (no lab hardware).
fallback trigger, HDR-ring sizing + out-ring repeat under real HDR/static-desktop pipelining. 8. **On-glass behavioral validation** of the committed-but-unexercised fixes: the watchdog reaping on
host-kill, `SET_RENDER_ADAPTER` on a **hybrid** box (the lab box is single-dGPU), the IDD-push→DDA
fallback trigger, HDR-ring sizing + out-ring repeat under real HDR/static-desktop pipelining, and the
AMF/QSV encode path on real AMD/Intel hardware.
--- ---
@@ -323,7 +269,7 @@ wraps `offset_of!`/`unsafe fn` differently than the runner's) — don't reformat
### 5.3 Env knobs (Windows host) ### 5.3 Env knobs (Windows host)
`PUNKTFUNK_IDD_PUSH=1` (capture from the driver ring; default off), `PUNKTFUNK_VDISPLAY=pf|sudovda`, `PUNKTFUNK_IDD_PUSH=1` (capture from the driver ring; shipped `host.env` default on, in-code default off),
`PUNKTFUNK_ENCODER=auto|nvenc` (auto → vendor-detect), `PUNKTFUNK_10BIT=1` + `PUNKTFUNK_HDR_SHADER_P010=1` `PUNKTFUNK_ENCODER=auto|nvenc` (auto → vendor-detect), `PUNKTFUNK_10BIT=1` + `PUNKTFUNK_HDR_SHADER_P010=1`
(HDR), `PUNKTFUNK_SECURE_DDA=1`, `PUNKTFUNK_NO_WGC=1` (pure DDA), `PUNKTFUNK_ZEROCOPY=1`, (HDR), `PUNKTFUNK_SECURE_DDA=1`, `PUNKTFUNK_NO_WGC=1` (pure DDA), `PUNKTFUNK_ZEROCOPY=1`,
`PUNKTFUNK_MONITOR_LINGER_MS`, `PFVD_DEBUG_LOG=1` (driver file log — release builds are silent without it). `PUNKTFUNK_MONITOR_LINGER_MS`, `PFVD_DEBUG_LOG=1` (driver file log — release builds are silent without it).
@@ -331,8 +277,8 @@ Config lives in `%ProgramData%\punktfunk\host.env`; logs in `%ProgramData%\punkt
### 5.4 Build / deploy / packaging ### 5.4 Build / deploy / packaging
x64-only by design (no ARM64 NVIDIA driver / SudoVDA). The installer is the thin-`.iss` / fat-binary model x64-only by design (no ARM64 NVIDIA driver). The installer is the thin-`.iss` / fat-binary model
delegating to `service install`; tag `host-win-vX.Y.Z`. The driver is built + FORCE_INTEGRITY-cleared + delegating to `service install`; tag `host-win-vX.Y.Z`. The drivers are built + FORCE_INTEGRITY-cleared +
signed + `Inf2Cat`'d in CI from source. DriverVer must bump on any driver change; create the ROOT devnode signed + `Inf2Cat`'d in CI from source. DriverVer must bump on any driver change; create the ROOT devnode
via nefcon (devgen is forbidden). via nefcon (devgen is forbidden).
@@ -398,35 +344,67 @@ exactly correct.
8. its own `.inx` + an `unsafe`-reduction pass (`deny(unsafe_op_in_unsafe_fn)`, per-site `// SAFETY:`). 8. its own `.inx` + an `unsafe`-reduction pass (`deny(unsafe_op_in_unsafe_fn)`, per-site `// SAFETY:`).
**Remaining driver work** beyond STEP 8: E1 (DeviceContext-owned state + per-`IDDCX_MONITOR` **Remaining driver work** beyond STEP 8: E1 (DeviceContext-owned state + per-`IDDCX_MONITOR`
`EvtCleanupCallback` → unblock `max_concurrent>1`), the slot-reclaim-on-REMOVE fix, and M4 (fold the `EvtCleanupCallback` → unblock `max_concurrent>1` — see §4 for why it's deliberately deferred), the
gamepad drivers in). See §4. slot-reclaim-on-REMOVE fix (§4 P1.3), and folding the gamepad-driver side onto `pf_driver_proto` (M4 tail,
§4 P3).
### 6.4 Resolved product decisions (the five forks) ### 6.4 Resolved product decisions (the five forks)
**A** the host was refactored **in place** (staged, behavior-preserving), not greenfield-rebuilt — the **A** the host was refactored **in place** (staged, behavior-preserving), not greenfield-rebuilt — the
driver *was* rebuilt fresh. **B** IDD-push primary for everything incl. the **secure desktop** (validated); driver *was* rebuilt fresh. **B** IDD-push primary for everything incl. the **secure desktop** (validated);
WGC+DDA demoted to non-IddCx fallbacks. **C** all drivers on `microsoft/windows-drivers-rs` (+ the `iddcx` WGC+DDA demoted to non-IddCx fallbacks. **C** all drivers on `microsoft/windows-drivers-rs` (+ the `iddcx`
subset; `/INTEGRITYCHECK` solved) — done for `pf-vdisplay`, **pending for the gamepad drivers (M4)**. subset; `/INTEGRITYCHECK` solved) — done for `pf-vdisplay` and now for the gamepad drivers (M4, `92e6802`).
**D** keep GameStream (Moonlight), default to secure `serve`. **E** concurrent sessions: the host-side **D** keep GameStream (Moonlight), default to secure `serve`. **E** concurrent sessions: the host-side
preempt dance was removed by §2.5, but true `max_concurrent>1` on Windows stays blocked on the E1 driver preempt dance was removed by the ownership-model work, but true `max_concurrent>1` on Windows stays blocked
swap-chain-reuse work. on the E1 driver swap-chain-reuse work (deliberately deferred, §4). **Rejected: DeviceContext-per-monitor
ownership** — see the E1 stability decision in §4 (it would add a use-after-free window for no gain under
`ProcessSharingDisabled`).
--- ---
## Appendix — consolidation note ## Origins & design rationale (from the original plan)
This file replaces five docs (recoverable from git history): This folds in the durable rationale from the original Windows host + client plan
([`windows-host.md`](windows-host.md), now a stub; full original text in git history). The Windows host
began (2026-06-10 to 2026-06-14) as an *"add backends behind the existing traits"* job, not a parallel
port — `punktfunk-core` and the whole control plane are platform-agnostic, and the host already compiled
on non-Linux (macOS) thanks to existing `cfg(target_os)` gating. These framing decisions shaped what
shipped and still explain *why* the code is the way it is:
- `windows-host-rewrite.md` (the original design + plan, §0–§15) — its current status, architecture, the - **Build order: host-first.** A user preference (the research had recommended *client*-first, since the
jewels, the seam traits, and the deep reference (§6) are folded in here. client is unblocked by the no-GPU problem and becomes the host's test endpoint). The trade-off held —
- `windows-host-goal1-plan.md` (the 6-stage in-place host refactor) — **complete**; its outcome is §2.2–2.4 the GPU-gated steps were the only ones that stalled GPU-less.
and the Goal-1 scorecard row. - **Trait-based abstraction → ~95% reuse.** `punktfunk-core` (protocol/FEC/crypto/session/transport/QUIC/
- `windows-host-rewrite-audit.md` (the 2026-06-25 audit) — its findings are reconciled to current reality C ABI), the GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet), the management REST API +
in §1 (scorecard) and §4 (only the still-open items survive: host hygiene, E1, slot-reclaim). `native_pairing`/`discovery`, and the `punktfunk1`/`spike`/`pipeline` orchestration all carried over
- `windows-host-rewrite-remediation.md` (the audit-remediation tracker) — its landed items are in §1; its unchanged — only the OS-touching backends behind `Capturer`/`Encoder`/`VirtualDisplay`/`InputInjector`/
remaining items (D1-host, D2, E1, G) are §4 P2/P3. `AudioCapturer`/`VirtualMic` are new `#[cfg(windows)]` code. Getting to MSVC needed only ~3 `cfg`-gates
- `windows-host-rewrite-game-capture-bug.md` (the GB1 investigation + fix) — **fixed**; the resolution is (gate the `std::os::fd`/`OwnedFd` unix-isms in `main.rs`/`vdisplay.rs`).
§2.5 (capture). The full investigation narrative is in git history. - **The no-GPU dev strategy.** Most of the port was built + validated on a **GPU-less Windows VM**: the
MSVC compile, the virtual-display control path (WARP), the openh264 software-encode pipeline (full
capture→encode→FEC→UDP transport minus HW), SendInput injection + interactive-session/desktop-reattach,
gamepad + rumble, and the entire client (software-decode loopback). Only NVENC-D3D11 zero-copy, the
DDA-vs-WGC bake-off, split-encode/bitrate-ceiling, and *all* glass-to-glass numbers deferred to a real
NVIDIA box (no perf claim transfers from Linux).
- **Windows-specific structural issues (no Linux precedent)** — these are the gotchas that drove the
service + capture design and remain true:
- **Interactive session, not a Session-0 service.** SendInput can't reach the desktop from Session 0;
Desktop Duplication / capture need the interactive session. Hence the SYSTEM-in-interactive-session
supervisor (§2.6, [`windows-service.md`](windows-service.md)) and the `OpenInputDesktop`/
`SetThreadDesktop` re-attach to survive UAC/lock desktop switches.
- **Clock epoch.** The skew handshake assumes both ends read the same realtime epoch in ns — the Windows
host must emit timestamps from `GetSystemTimePreciseAsFileTime`→Unix-epoch-ns, or cross-machine latency
+ `ClockProbe`/`ClockEcho` break (std `SystemTime` on Windows is historically coarser).
- **No audio endpoint on a headless IDD.** WASAPI loopback needs a real/virtual render device; the
virtual *mic* (client→host) has no clean user-mode path — deferred.
- **Color/range.** All clients assume BT.709 limited-range; the BGRA→I420/NV12 path must match or colors
wash out — validated against the existing decoders.
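The clock-epoch conversion mentioned above is plain arithmetic: a `FILETIME` counts 100 ns ticks since 1601-01-01 UTC, so subtracting the 1601-to-1970 offset and multiplying by 100 yields Unix-epoch nanoseconds. A minimal sketch follows; the function name is hypothetical, and taking the two 32-bit halves as arguments (mirroring `dwLowDateTime`/`dwHighDateTime`) is an assumption about how the value would be passed in.

```rust
/// 100 ns ticks between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch):
/// 11_644_473_600 seconds * 10_000_000 ticks per second.
pub const FILETIME_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;

/// Converts a FILETIME (split into its low/high 32-bit halves) to Unix-epoch nanoseconds.
/// Returns `None` for instants before 1970 or values that overflow `u64` nanoseconds.
pub fn filetime_to_unix_ns(low: u32, high: u32) -> Option<u64> {
    let ticks = ((high as u64) << 32) | (low as u64);
    ticks
        .checked_sub(FILETIME_UNIX_EPOCH_TICKS)?
        .checked_mul(100)
}
```

On Windows the two halves would come from the `FILETIME` that `GetSystemTimePreciseAsFileTime` fills in; the conversion itself has no platform dependency, so it can be unit-tested anywhere.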
(The older `design/windows-host.md`, a pre-rewrite implementation plan from 2026-06-22, is a separate **SudoVDA → pf-vdisplay evolution.** The original plan was built around **SudoVDA**, an off-the-shelf
lineage and is left as-is.) indirect display driver (the same IDD Apollo ships) — chosen to avoid writing/WHQL-signing a driver and to
get arbitrary `WxH@Hz` modes on the fly. It carried the host all the way to live-validated NVENC on a real
RTX 4090. It was then replaced by the all-Rust `pf-vdisplay` IddCx driver (which solved
`/INTEGRITYCHECK` self-signing, §6.1, and gave us the IDD-**push** zero-copy capture path that captures the
secure desktop directly) and **deleted in commit `84a3b95`**; `pf-vdisplay` is now the sole
virtual-display backend. The full SudoVDA control protocol (IOCTL layout, watchdog keepalive, GDI-name
resolution) lives in git history if ever needed as a reference.
+11 -371
@@ -1,375 +1,15 @@
# Windows host + client — implementation plan # Windows host + client — implementation plan (superseded)
**Status: in progress — dev box provisioned, host-first.** A Windows host is an *"add backends > **This was the original Windows host + client plan (2026-06-10 to 2026-06-14).** All of it
behind the existing traits"* job, not a parallel port: `punktfunk-core` and the whole control plane > shipped; its durable design rationale has been folded into
are platform-agnostic and the host already compiles on non-Linux (macOS) thanks to existing > [`windows-host-rewrite.md`](windows-host-rewrite.md), which is now the single Windows-host
`cfg(target_os)` gating. The one piece that used to make it XL — a per-client *virtual* output, which > architecture / status / reference doc.
has no user-mode Windows API — is solved by reusing **[SudoVDA](https://github.com/SudoMaker/SudoVDA)**
(the SudoMaker Virtual Display Adapter, the same IDD the Apollo Sunshine-fork ships): a pre-built IDD
that creates virtual displays at **arbitrary `WxH@Hz` on the fly**. We install it and drive its IOCTL
control interface — **no driver to write or WHQL-sign.**
History: scoped 2026-06-10 (4-agent read of the host crate); SudoVDA path 2026-06-11; this concrete For the current picture, see:
plan + dev box + SudoVDA protocol + no-GPU strategy added 2026-06-14 (12-agent research pass).
## Status (2026-06-15) — full pipeline live-validated on an RTX 4090 - **`CLAUDE.md`** ("Where the work stands" → the Windows-host paragraph) — current features and live-validation status.
- **[`windows-host-rewrite.md`](windows-host-rewrite.md)** — architecture, validated invariants, operations, and the hard-won reference notes (`/INTEGRITYCHECK`, the `iddcx` bindgen knobs, the driver-port checklist). Its "Origins & design rationale" section absorbs the keep-worthy parts of this original plan.
- **git history** — the full original plan (SudoVDA control protocol, the no-GPU dev strategy, the phased host-first build order) is recoverable there.
Every OS-touching backend is implemented behind the existing traits and **builds clean on Note: SudoVDA — the off-the-shelf virtual display this plan was built around — was replaced by the
`x86_64-pc-windows-msvc`** (and Linux unaffected). `serve` / `punktfunk1-host` **run on Windows** all-Rust `pf-vdisplay` IddCx driver and deleted in commit `84a3b95`.
(identity in `%APPDATA%`, QUIC bound, mDNS advertising, accepting sessions). The **full native
pipeline is validated live on a real RTX 4090** (Windows 11): SudoVDA virtual display → DXGI
Desktop Duplication (D3D11 zero-copy) → **NVENC HEVC** → punktfunk/1 → Rust reference client, at
720p60 and 1080p60 (0 mismatched frames, p50 1.6 / 3.45 ms cross-machine, ffmpeg-decodes clean),
coexisting with a running Apollo (two concurrent NVENC sessions).
| Backend | State | GPU-less validation on the VM |
|---|---|---|
| Virtual display (SudoVDA) | ✅ done | live: open/version/watchdog/ADD/REMOVE via the trait |
| Input (SendInput) | ✅ **live on RTX 4090** | mouse injection verified — cursor tracked the client's absolute diagonal sweep across the desktop in Session 1 (keyboard shares the same SendInput primitive) |
| Software encode (openh264) | ✅ done | **live: m0 synthetic→openh264→core FEC loopback, 120/120, 0 mismatches** |
| Audio (WASAPI loopback) | ✅ done | live: init chain opens (silent VM → no samples) |
| Capture (DXGI Desktop Duplication) | ✅ **live on RTX 4090** | SudoVDA monitor → D3D11 zero-copy duplication; output is enumerated under the *rendering* GPU, not the SudoVDA LUID (search all adapters) |
| NVENC (D3D11, `--features nvenc`) | ✅ **live on RTX 4090** | 720p60 + 1080p60 HEVC end-to-end to the Rust client; ffmpeg-decodes clean; ran alongside Apollo (2 NVENC sessions) |
| Run host (serve/punktfunk1-host) | ✅ live | punktfunk1-host starts + listens; `c_abi_connection_roundtrip` passes |
| Gamepad (ViGEm) | ✅ done | compiles incl. rumble back-channel; live needs ViGEmBus + a physical pad |
| Host→client audio wiring | ✅ done | builds on MSVC; `m3` `audio_thread` active on Windows (silent VM → no samples to send) |
| GameStream (Moonlight) audio | ✅ done | stereo path active on Windows (WASAPI→Opus→RTP/FEC); surround stays Linux-only (libopus multistream / `audiopus_sys`) |
| Rumble back-channel (ViGEm) | ✅ done | `request_notification` → background thread → 0xCA; live needs a physical pad |
| Game library (Steam discovery) | ✅ done | Windows Steam roots (Program Files) + VDF other-drive libraries; custom store already cross-platform. Non-default Steam install dir (registry) not yet covered |
**Remaining for full parity:**
- **Keyboard injection** — exercised via the same SendInput path (mouse verified live); not yet
asserted into a focused text field.
- **ViGEm rumble + gamepad input** — the pad is created live (ViGEmBus connected); the rumble
back-channel + input still need a physical pad to verify.
- **GameStream (Moonlight) path on a GPU box** — not yet run live (its fixed ports collide with
Apollo, so stop Apollo first).
- **Frame pacing on static content** — DXGI duplication is change-driven, so a blank/idle virtual
display delivers only ~12 fps (181/177 frames over ~15 s); a rendering app drives the full rate.
### Live UX hardening (2026-06-15, validated Mac ↔ RTX 4090)
Driven by live testing with the native macOS client at the display's native **5120×1440@240**:
- **Native resolution, not 1080p.** `sudovda::set_active_mode` enumerates the modes the IDD actually
advertises (`EnumDisplaySettingsW`) and sets the requested **resolution** at the best supported
refresh — keeping 5120×1440@240, never silently collapsing to the 1280×720/1920×1080 OS default
when an exact mode is briefly unavailable.
- **Bitrate auto-cap.** NVENC `init_session` probes and steps the average bitrate down (×3/4 to a
floor) when the requested rate exceeds the GPU's codec-level max, so a high client bitrate connects
instead of failing (matches the Linux host; we do NOT split NVENC sessions).
- **Mouse cursor.** DXGI duplication excludes the hardware cursor; we read the pointer
position/shape from the frame info (`GetFramePointerShape`) and GPU-composite it onto the captured
texture before NVENC (a CPU read-back would stall the pipeline). Color cursors alpha-blend;
**masked-color** cursors (the text I-beam) use an `INV_DEST_COLOR` blend for true screen inversion,
so the caret is visible on any background (no black box). Monochrome handled too.
- **Secure desktop (lock / login / UAC).** The host runs as **SYSTEM in the interactive session**;
the capturer `SetThreadDesktop`s onto the current input desktop and, on the WinSta switch,
**recreates the D3D11 device** and **re-resolves the virtual output's GDI name from the stable
SudoVDA target id** (the name changes across the topology rebuild — the old failure was hunting the
stale `\\.\DISPLAYn` and dropping). `ACCESS_LOST` / `INVALID_CALL` / device-removed are all treated
as recoverable, and a mid-stream resolution change is followed (capturer + NVENC re-init at the new
size). Validated: logging in / locking through the stream stays connected (one real session
recovered 1012 desktop switches and completed cleanly). *Display isolation* (`isolate_displays`
detaches other monitors so Winlogon renders to the virtual output) covers the case where a physical
monitor is also attached.
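The bitrate auto-cap in the list above amounts to a probe-and-step-down loop. The sketch below shows only that control flow under stated assumptions: `cap_bitrate` and its `accepts` callback are hypothetical names, and the callback stands in for whatever the NVENC `init_session` probe actually does.

```rust
/// Steps `requested_bps` down by a factor of 3/4 until `accepts` returns true.
/// Returns `None` if even the floor is rejected.
/// `accepts` is a hypothetical probe: "can a session be opened at this rate?"
pub fn cap_bitrate<F>(requested_bps: u64, floor_bps: u64, mut accepts: F) -> Option<u64>
where
    F: FnMut(u64) -> bool,
{
    let mut rate = requested_bps;
    loop {
        if accepts(rate) {
            return Some(rate);
        }
        if rate <= floor_bps {
            // Already at (or below) the floor and still rejected: give up.
            return None;
        }
        // Integer x3/4 step, clamped so the loop never goes below the floor.
        rate = (rate / 4 * 3).max(floor_bps);
    }
}
```

For example, with a requested 400 Mbit/s, a 10 Mbit/s floor, and a probe that accepts anything up to 100 Mbit/s, the loop tries 400, 300, 225, 168.75, and 126.5625 Mbit/s before settling on 94,921,875 bit/s.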
### Running as SYSTEM (deployment) — the `PunktfunkHost` service
To capture the secure desktop the host must run as **SYSTEM in the interactive Session 1** (a Session
0 service can't duplicate Session 1). The end-user deployment is the built-in Windows **service**
(`src/service.rs`) — see [`windows-service.md`](windows-service.md). One elevated command:
```powershell
punktfunk-host service install # auto-start LocalSystem service + firewall rules + default host.env
punktfunk-host service start
```
The service runs in Session 0 but never captures: it duplicates its own LocalSystem token, retargets
it to the active console session, and `CreateProcessAsUserW`s the host there — supervising it across
exits and console-session switches (the Sunshine/Apollo model). Config lives in
`%ProgramData%\punktfunk\host.env`; logs in `%ProgramData%\punktfunk\logs\`.
> **Old bring-up chain (debug only, superseded by the service):** a scheduled task (Interactive,
> Highest) → `PsExec64 -s -i 1 -d wscript.exe launch.vbs` → `host-run.cmd` (hidden window), with
> `APPDATA=C:\Users\Public` as the shared-identity hack. The service replaces all of this; the host
> now resolves its config dir to `%ProgramData%\punktfunk` directly (`PUNKTFUNK_CONFIG_DIR` overrides).
### Real-GPU test box (RTX 4090, `ssh "Enrico Bühler"@192.168.1.174`)
Windows 11, RTX 4090 (driver 596.36) + AMD iGPU, SudoVDA + Apollo (sunshine) installed. SSH lands in
**Session 0 (non-interactive)** — DXGI duplication + SendInput need the **interactive Session 1**, so
launch the host there via an Interactive scheduled task (admin SSH session is the same user):
`Register-ScheduledTask -Principal (New-ScheduledTaskPrincipal -UserId (whoami) -LogonType
Interactive -RunLevel Highest)`, then `Start-ScheduledTask`. The host runs with desktop access; read
its redirected log over SSH. `nvEncodeAPI64.dll` ships with the driver, so a VM-built `--features
nvenc` exe runs here as-is (no SDK install). The 4090's Ada NVENC has no consumer session cap, so the
host encodes alongside Apollo. **Gotcha:** the SudoVDA monitor is rendered by — and DXGI-enumerated
under — the 4090, not the SudoVDA adapter LUID (the capturer searches all adapters; see the fix).
#### Native build on the 4090 (fast iteration loop)
Build on the box itself (edit locally → `sftp` to the repo → `cargo build` there → run via the task)
instead of build-on-VM-then-copy. Prereqs that bit us, in order:
1. **Full MSVC C++ build tools, incl. the CRT libs.** A VS install can land `cl.exe` + the Windows
SDK + sanitizer libs but *miss* the desktop CRT import libs (`VC\Tools\MSVC\<ver>\lib\x64\msvcrt.lib`,
`libcmt.lib`, …) → `LNK1104: msvcrt.lib`. Root cause here: the `Microsoft.VisualCpp.Redist.14`
package failed to install (1603), cascading to skip the NativeDesktop workload. Fix = (re)install
the C++ workload via the VS Installer **GUI** (the headless `setup.exe modify` over SSH fails — a
non-elevated SSH token gives 1603/87, and `--quiet` as SYSTEM hangs). A reboot may be needed first
(a pending reboot also yields 1603). Stop-gap: the desktop CRT libs are version-pinned, so they can
be copied from another box with the **identical** MSVC version (`14.51.36231` here).
2. **Build from an ASCII path.** A username with a non-ASCII char (`C:\Users\Enrico Bühler\…`) breaks
the MSVC PDB writer → `LNK1201: error writing to the program database`. Clone/copy the repo to
e.g. `C:\Users\Public\punktfunk-native` and build there (the VM worked only because it built in
`C:\Users\Public\punktfunk`).
3. `winget install NASM.NASM Kitware.CMake`; generate the NVENC import lib (`lib /def` → set
`PUNKTFUNK_NVENC_LIB_DIR`); set `CMAKE_POLICY_VERSION_MINIMUM=3.5` (libopus).
Build env (each `cargo` invocation): `$env:PATH += ";C:\Program Files\NASM;C:\Program Files\CMake\bin"`,
`$env:CMAKE_POLICY_VERSION_MINIMUM="3.5"`, `$env:PUNKTFUNK_NVENC_LIB_DIR="C:\Users\Public\nvenc"`, then
`cargo build --release -p punktfunk-host --features nvenc`. Validated: native build (1m37s) →
720p60 NVENC, 174/174 frames, p50 2.5 ms, ffmpeg-decodes clean.
All Windows backends are `clippy -D warnings` and `rustfmt` clean on `x86_64-pc-windows-msvc` (the
Windows-only modules are cfg-excluded from Linux CI, so run clippy on the VM after touching them — its
rustc 1.96 clippy is stricter than the Linux CI image on shared code, e.g. `needless_return`).
### Building & testing on a real-GPU Windows box (NVENC)
1. Install **SudoVDA** (virtual display) and **ViGEmBus** (gamepad) drivers; install the NVIDIA driver.
2. NVENC link lib: either install the NVIDIA Video Codec SDK, or generate an import lib from the
driver DLL — `lib /def:nvenc.def /machine:x64 /out:nvencodeapi.lib` where `nvenc.def` lists
`NvEncodeAPICreateInstance` and `NvEncodeAPIGetMaxSupportedVersion` — and set
`PUNKTFUNK_NVENC_LIB_DIR` to its directory.
3. `cargo build -p punktfunk-host --features nvenc,amf-qsv` for the all-vendor GPU host (NVENC for
NVIDIA; AMD AMF + Intel QSV via libavcodec — `amf-qsv` needs `FFMPEG_DIR` with the `*_amf`/`*_qsv`
encoders at build, e.g. the BtbN gpl-shared tree, and the FFmpeg DLLs on PATH at run; NASM + CMake
for aws-lc-rs; libclang for ffmpeg-sys-next). Default build (no feature) = openh264 software encoder.
4. Run in the **interactive session** (not a Session-0 service / not over SSH — SendInput + DXGI
Desktop Duplication need a desktop): `serve` or `punktfunk1-host --source virtual`.
`PUNKTFUNK_ENCODER=auto` (default) picks the backend from the GPU vendor — `nvenc`/`amf`/`qsv`/`sw`
force one. The DXGI capturer emits zero-copy D3D11 NV12/P010 to match any GPU backend; the SudoVDA
monitor activates once a real GPU drives WDDM, so capture + encode then work.
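As a rough illustration of the `PUNKTFUNK_ENCODER` knob, the selection can be modelled as a small pure function. This is a sketch under assumptions, not the host's actual code: the `EncoderBackend` enum and `select_encoder` are hypothetical names, identifying the vendor by PCI vendor ID is an assumption, and falling back to software when no known vendor is found is also an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderBackend {
    Nvenc,
    Amf,
    Qsv,
    Software,
}

/// `knob` is the value of `PUNKTFUNK_ENCODER` (None when unset, treated as "auto").
/// `gpu_vendor_id` is the PCI vendor ID of the rendering GPU, if one was found.
pub fn select_encoder(
    knob: Option<&str>,
    gpu_vendor_id: Option<u32>,
) -> Result<EncoderBackend, String> {
    match knob.unwrap_or("auto").trim().to_ascii_lowercase().as_str() {
        "nvenc" => Ok(EncoderBackend::Nvenc),
        "amf" => Ok(EncoderBackend::Amf),
        "qsv" => Ok(EncoderBackend::Qsv),
        "sw" => Ok(EncoderBackend::Software),
        "auto" | "" => Ok(match gpu_vendor_id {
            Some(0x10DE) => EncoderBackend::Nvenc, // NVIDIA
            Some(0x1002) => EncoderBackend::Amf,   // AMD
            Some(0x8086) => EncoderBackend::Qsv,   // Intel
            _ => EncoderBackend::Software,         // assumed fallback
        }),
        other => Err(format!("unknown PUNKTFUNK_ENCODER value: {other}")),
    }
}
```

A caller would pass `std::env::var("PUNKTFUNK_ENCODER").ok().as_deref()` as the first argument; keeping the environment lookup outside the function makes the mapping easy to test.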
### Dev loop (this repo → the Windows VM)
`ssh "Enrico Bühler"@192.168.1.57` (PowerShell shell). Repo cloned at `C:\Users\Public\punktfunk`
(Gitea). Sync uncommitted files with **sftp** (`sftp -b - host`, `/C:/...` paths — scp and
base64-over-ssh are unreliable here). Commit on Linux → `git reset --hard origin/main` on the VM.
Build env: `PATH` += cargo bin + NASM + CMake + LLVM (vcvars not needed — rustc/cc self-locate MSVC).
Set `CMAKE_POLICY_VERSION_MINIMUM=3.5` — CMake 4 rejects libopus's old `cmake_minimum_required` when
`audiopus_sys` (vendored by the `opus` crate) builds libopus from source for the host→client audio path.
## Decisions (locked 2026-06-14)
| Decision | Choice | Rationale |
|---|---|---|
| **Build order** | **Host first** | User preference. (Note: the research recommended *client* first, since the client is unblocked by the no-GPU problem and becomes the host's test endpoint — see "No-GPU dev strategy". Revisit if host progress stalls on GPU-gated steps.) |
| **Virtual display** | **SudoVDA** | Arbitrary modes on the fly (no baked EDID / registry mode list, unlike parsec-vdd), MIT/CC0 (bundleable), already installed on the dev box, proven by Apollo. |
| **Client UI** | **Pure Rust: `windows-rs` + Windows Reactor (WinUI 3)** | No C++/C#. Links `punktfunk-core` directly as a crate (like the GTK Linux client — no C ABI, no GC/FFI-lifetime hazard). Built-in `SwapChainPanel` widget for the video surface; `Custom` escape hatch + raw `Microsoft.UI.Xaml` as fallback. |
| **Client decode** | **FFmpeg + D3D11VA** | Exactly what Moonlight ships; feeds AnnexB H.264/HEVC/AV1 directly, decodes AV1 via the GPU DXVA profile with **no** Store Video Extension. Cost: ffmpeg dep + libclang. |
| **Host SW encode (no-GPU dev)** | **openh264** | BSD, no system ffmpeg, low-latency single-ref/zero-lookahead with intra-refresh. Lets the full capture→encode→FEC→send pipeline run GPU-less. |
| **Host HW encode** | **nvidia-video-codec-sdk (D3D11)** | `NV_ENC_DEVICE_TYPE_DIRECTX` + `NvEncRegisterResource` on the captured `ID3D11Texture2D` = true zero-copy, no CUDA bridge. Young crate — vendor + wrap behind the `Encoder` trait. Defers to a real-GPU box. |
## Dev box (`ssh "Enrico Bühler"@192.168.1.57`)
Windows 11 Pro 25H2 (build 26200), QEMU Q35, 8 vCPU, 12 GB. **No working GPU** (an `RTX 5070 Ti` node
is present but `Status: Unknown`; `nvidia-smi` fails → NVENC cannot initialize). Installed: Rust 1.96
(MSVC), Visual Studio Community 2026 + VC tools + Windows SDK 10.0.26100/28000, Windows App Runtime
2.2 (Reactor needs ≥2.0.1 ✅), **SudoVDA** (`ROOT\DISPLAY\0000`, hwid `root\sudomaker\sudovda`, INF
`oem6.inf`, Status OK) and Parsec VDD, git, winget. **Toolchain gaps to fill** (see Step 0): NASM,
CMake, libclang.
## Reused as-is (~95% of the codebase — no changes)
| Reusable | Why |
|---|---|
| `punktfunk-core` (protocol, FEC, crypto, session, transport, QUIC control plane, C ABI) | Zero platform deps; already compiles on Windows MSVC |
| GameStream wire logic (mDNS, serverinfo, pairing, RTSP, ENet) *except* the capture/encode/audio backends | pure protocol |
| Management REST API (`mgmt.rs`) + OpenAPI, `native_pairing`, `discovery` | axum/tokio/quinn — portable |
| `punktfunk1.rs` / `spike.rs` / `pipeline.rs` orchestration | trait-generic: call `capturer.next_frame()`, `encoder.submit/poll()`, `vd.create(mode)` — no changes |
| The trait boundaries: `Capturer`, `Encoder`, `VirtualDisplay`, `InputInjector`, `AudioCapturer`, `VirtualMic` | platform-neutral; Linux deps already isolated under `[target.'cfg(target_os="linux")'.dependencies]` |
## Step 0 — make `punktfunk-host` compile on `x86_64-pc-windows-msvc` — ✅ DONE (2026-06-14)
**Result:** the full dependency tree builds clean on MSVC (aws-lc-rs with NASM+CMake, quinn,
rusty_enet, axum/hyper/utoipa), and `punktfunk-host` compiles **and runs** (the `openapi` subcommand
emits the spec). Only **3 cfg-gates** were needed — the host was already ~95% portable:
`main.rs` `mod dmabuf_fence`/`mod drm_sync` → `#[cfg(target_os = "linux")]`; `vdisplay.rs` the
`use std::os::fd::OwnedFd` import + `VirtualOutput.remote_fd` field → `#[cfg(target_os = "linux")]`.
Verified green on Linux too. Build env on the VM: rustc+`cc`/`cmake` self-locate MSVC (vcvars not
needed); `PATH` must include cargo bin + NASM + CMake + LLVM.
The host already compiles on macOS (Linux backends are `cfg`-gated; heavy Linux deps are
target-gated). Getting to Windows MSVC is the **unix-but-not-linux** delta, not a from-scratch port:
1. **Toolchain**: `winget install NASM.NASM Kitware.CMake LLVM.LLVM`, set `LIBCLANG_PATH`
(or tick VS "C++ Clang tools"). NASM+CMake are for **aws-lc-rs** (pulled by `rustls`/`rcgen` on
the `quic` path); libclang is for `ffmpeg-sys`/bindgen (client decode + any host bindgen crate).
2. **`std::os::fd` / `libc`**: `vdisplay.rs:18` has an unconditional `use std::os::fd::OwnedFd;` and
`VirtualOutput.remote_fd: Option<OwnedFd>`; `std::os::fd` is `cfg(unix)`, so it builds on macOS
but breaks on Windows. Gate the import + field (`#[cfg(unix)]`, with a Windows arm or omission).
Sweep for other `cfg(target_os="linux")`-missing unix-isms (`libc`, fds).
3. **Build natively on the VM** (`cargo build -p punktfunk-host`, *not* cross-compile; xwin chokes on
aws-lc-rs/ffmpeg-sys/WDK). Triage the remaining errors. Suspect deps to verify link on MSVC:
`aws-lc-rs` (needs NASM+CMake), `rusty_enet`, the hyper/axum/utoipa stack (expected fine).
4. **CI**: add a `cargo build -p punktfunk-host --target x86_64-pc-windows-msvc` job so the Windows
path stops bit-rotting (the dev box can be a Gitea runner later).
This is the highest-value first move and is **fully doable GPU-less**.
## Windows backends (new `#[cfg(target_os = "windows")]` code behind existing traits)
| Subsystem | Linux today | Windows backend | VM-testable? |
|---|---|---|---|
| **VirtualDisplay** | KWin/gamescope/Mutter/Sway | **SudoVDA** IOCTLs (below) + `SetDisplayConfig` mode-set | ✅ likely (WARP) — *spike* |
| **Capture** | PipeWire/dmabuf | **DXGI Desktop Duplication** primary, **WGC** fallback → `ID3D11Texture2D`; add `FramePayload::D3d11` | ⚠️ DDA-on-WARP unreliable; WGC-on-WARP unverified — *spike* |
| **Zero-copy** | dmabuf→EGL/Vulkan→CUDA | register `ID3D11Texture2D` with NVENC (`NV_ENC_DEVICE_TYPE_DIRECTX`) — no CUDA bridge | ❌ needs real GPU |
| **Encode** | ffmpeg `*_nvenc`/`*_vaapi` | `nvidia-video-codec-sdk` (NVIDIA) + libavcodec `*_amf`/`*_qsv` (AMD/Intel, `encode/ffmpeg_win.rs`, `--features amf-qsv`) + `openh264` SW fallback; vendor-auto via `PUNKTFUNK_ENCODER` | NVENC ✅ live / AMF/QSV CI-only |
| **Input kbd/mouse** | libei / wlr | **SendInput** with `MOUSEEVENTF_VIRTUALDESK` absolute mapping onto the virtual desktop rect (skip the VK→evdev table — client sends Win VKs; use `KEYEVENTF_SCANCODE`+`EXTENDEDKEY`) | ✅ |
| **Gamepad** | uinput xpad + FF | **ViGEmBus** via `vigem-client` (`Xbox360Wired`); rumble via `request_notification()` → `XNotification{large,small}` | ✅ (install driver) |
| **Audio capture** | PipeWire sink monitor | **WASAPI loopback** via the `wasapi` crate (48 kHz stereo f32 → existing Opus) | ⚠️ needs an audio endpoint |
| **Virtual mic** | PipeWire `Audio/Source` | virtual audio driver (`Virtual-Audio-Driver`) or defer | ❌ second driver — defer |
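
The absolute mapping named in the **Input kbd/mouse** row can be sketched as plain arithmetic. `SendInput` with `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK` takes coordinates normalized to 0..=65535 across the whole virtual desktop; the `VirtualDesktop` type, the `to_absolute` name, and the exact rounding below are assumptions for illustration, not the host's code.

```rust
/// Virtual desktop rectangle in pixels (origin may be negative on multi-monitor setups).
#[derive(Clone, Copy, Debug)]
pub struct VirtualDesktop {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

/// Map a desktop pixel to the 0..=65535 range used by absolute mouse input.
/// Points outside the rectangle are clamped to its edges.
pub fn to_absolute(desk: VirtualDesktop, px: i32, py: i32) -> (i32, i32) {
    fn norm(pos: i32, origin: i32, extent: i32) -> i32 {
        // Last addressable pixel is extent - 1; guard against degenerate extents.
        let span = (extent.max(2) - 1) as i64;
        let off = (pos as i64 - origin as i64).clamp(0, span);
        ((off * 65535) / span) as i32
    }
    (
        norm(px, desk.x, desk.width),
        norm(py, desk.y, desk.height),
    )
}
```

For a desktop whose left monitor sits at negative x, the top-left pixel maps to `(0, 0)` and the bottom-right pixel to `(65535, 65535)`.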
`punktfunk1.rs`/`spike.rs`/`pipeline.rs` are unchanged. Note: the Windows capture needs its own
`capture_virtual_output` entry point (the SudoVDA identity is a DXGI adapter LUID + DisplayConfig
TargetId → GDI `\\.\DisplayN`, which doesn't fit the PipeWire `node_id: u32` field — carry it inside
the `keepalive` / a Windows-specific seam rather than overloading `node_id`).
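One possible shape for that Windows-specific seam, as a sketch only: the `CaptureTarget` enum and its accessors are hypothetical names, and the real code may carry the identity inside the keepalive object instead.

```rust
/// Hypothetical identity of a virtual output, one variant per capture backend.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CaptureTarget {
    /// Linux: the PipeWire node to connect to.
    PipeWire { node_id: u32 },
    /// Windows: DXGI adapter LUID, DisplayConfig target id, and the resolved GDI name.
    Windows {
        adapter_luid: i64,
        target_id: u32,
        gdi_name: String,
    },
}

impl CaptureTarget {
    /// PipeWire node id, if this is a Linux target.
    pub fn node_id(&self) -> Option<u32> {
        match self {
            CaptureTarget::PipeWire { node_id } => Some(*node_id),
            CaptureTarget::Windows { .. } => None,
        }
    }

    /// GDI device name (for example `\\.\DISPLAY3`), if this is a Windows target.
    pub fn gdi_name(&self) -> Option<&str> {
        match self {
            CaptureTarget::Windows { gdi_name, .. } => Some(gdi_name.as_str()),
            CaptureTarget::PipeWire { .. } => None,
        }
    }
}
```

An enum keeps `node_id` meaningful on Linux while making it impossible to read a PipeWire id from a Windows target by accident.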
## SudoVDA control protocol (the `VirtualDisplay` backend spec)
Pure Rust via the `windows` crate (no C lib; Apollo vendors a header-only client under
`third-party/sudovda/`). Reference port pattern: `parsec-vdd-rust` (SetupAPI/CM_* → `CreateFileW` →
`DeviceIoControl`). **Verify the IOCTL hex with a `const fn ctl_code()`**:
`CTL_CODE(dev,func,method,access) = (dev<<16)|(access<<14)|(func<<2)|method`, with
`FILE_DEVICE_UNKNOWN=0x22`, `METHOD_BUFFERED=0`, `FILE_ANY_ACCESS=0`.
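
A sketch of that verification: the formula and the three constants are the ones quoted above, and the function numbers and expected values are the ones listed under **IOCTLs** in this section. The `IOCTL_*` constant names and the `sudovda` helper are hypothetical.

```rust
/// The Windows `CTL_CODE` macro as a `const fn`.
pub const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

pub const FILE_DEVICE_UNKNOWN: u32 = 0x22;
pub const METHOD_BUFFERED: u32 = 0;
pub const FILE_ANY_ACCESS: u32 = 0;

/// All SudoVDA codes share device type, method, and access; only the function differs.
const fn sudovda(function: u32) -> u32 {
    ctl_code(FILE_DEVICE_UNKNOWN, function, METHOD_BUFFERED, FILE_ANY_ACCESS)
}

// Hypothetical constant names; function numbers are from the list in this section.
pub const IOCTL_ADD: u32 = sudovda(0x800);
pub const IOCTL_REMOVE: u32 = sudovda(0x801);
pub const IOCTL_SET_RENDER_ADAPTER: u32 = sudovda(0x802);
pub const IOCTL_GET_WATCHDOG: u32 = sudovda(0x803);
pub const IOCTL_DRIVER_PING: u32 = sudovda(0x888);
pub const IOCTL_GET_PROTOCOL_VERSION: u32 = sudovda(0x8FF);

// Compile-time checks against the documented hex values: a typo fails the build.
const _: () = assert!(IOCTL_ADD == 0x0022_2000);
const _: () = assert!(IOCTL_REMOVE == 0x0022_2004);
const _: () = assert!(IOCTL_SET_RENDER_ADAPTER == 0x0022_2008);
const _: () = assert!(IOCTL_GET_WATCHDOG == 0x0022_200C);
const _: () = assert!(IOCTL_DRIVER_PING == 0x0022_2220);
const _: () = assert!(IOCTL_GET_PROTOCOL_VERSION == 0x0022_23FC);
```

Because the assertions are `const`, a mismatch between a function number and its hex value is caught by `cargo build` rather than at the first `DeviceIoControl` call.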
- **Device interface GUID**: `{E5BCC234-1E0C-418A-A0D4-EF8B7501414D}` · **HWID**: `root\sudomaker\sudovda`
- **IOCTLs** (func → value): ADD `0x800` → `0x00222000`, REMOVE `0x801` → `0x00222004`,
SET_RENDER_ADAPTER `0x802` → `0x00222008`, GET_WATCHDOG `0x803` → `0x0022200C`,
DRIVER_PING `0x888` → `0x00222220`, GET_PROTOCOL_VERSION `0x8FF` → `0x002223FC`.
- **Add** (`#[repr(C)]` exact layout): in `{ u32 Width; u32 Height; u32 RefreshRate; GUID MonitorGuid;
CHAR DeviceName[14]; CHAR SerialNumber[14] }` → out `{ LUID AdapterLuid; u32 TargetId }`. **The mode
is set at create** (driver computes timing arithmetically — no EDID seeding). Pick a *stable
per-client* `MonitorGuid` (Windows persists that monitor's layout; remove is by GUID).
- **Resolve the capture target**: the monitor appears **asynchronously** — poll
`QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS)`, match `targetInfo.id == TargetId`,
`DisplayConfigGetDeviceInfo` → `viewGdiDeviceName` (`\\.\DisplayN`). Apollo polls 20 ms → ×2 → cap
320 ms. Then point DXGI Desktop Duplication at that output.
- **Keepalive (mandatory)**: `GET_WATCHDOG` → `{ u32 Timeout_s; u32 Countdown }` (default **3 s**,
driver-wide). Run one thread firing `DRIVER_PING` every `Timeout*1000/3` ms (~1 s). Miss it and the
driver tears down **all** virtual displays.
- **Teardown (RAII)**: `Drop` → `DeviceIoControl(REMOVE, { GUID MonitorGuid })` = the `VirtualOutput`
keepalive drop.
- **Mid-stream `Reconfigure`**: SudoVDA has no in-place mode IOCTL (Apollo only relaunches). Implement
punktfunk's `Reconfigure` as remove+re-add at the new mode (or add-second + migrate capture), and
**watch the Win11 24H2/25H2 IDD mode-apply regression** (post-create `ChangeDisplaySettingsEx` may
not move the *desktop* to the new mode without a Settings-UI poke — VirtualDrivers #471). The
~90 ms `Reconfigure` budget needs an isolated spike to confirm on 24H2/25H2.
- **Install / signing**: self-signed — ship `sudovda.cer`, import to Root + TrustedPublisher, create
the device node via `nefconc.exe` (`--create-device-node`/`--install-driver`). Installs **without**
test-signing (trusted-publisher). MIT/CC0 → bundleable (Apollo precedent). **Already installed on
the dev box.** Document it as a host prerequisite (like the Linux udev rule).
- **GPU caveat**: SudoVDA's `Driver.cpp` does `D3D11CreateDevice(UNKNOWN)` on a render adapter with
**no explicit WARP fallback**; on the GPU-less VM Windows binds the Basic Render Driver (WARP), so
display compositing *should* work but NVENC won't. Confirm `ADD` actually brings a monitor up on the
VM in the first spike.
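
The `Add` layout and the two timing rules above (poll backoff, ping interval) can be sketched without touching the `windows` crate. Field order and widths follow the **Add** bullet; default C packing, NUL-terminated names, and every identifier below (`AddParams`, `AddOut`, `fixed_name`, `poll_delays_ms`, `ping_interval_ms`) are assumptions to check against the SudoVDA header.

```rust
use std::mem::size_of;

/// Same layout as the Win32 `GUID`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Guid {
    pub data1: u32,
    pub data2: u16,
    pub data3: u16,
    pub data4: [u8; 8],
}

/// Same layout as the Win32 `LUID`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Luid {
    pub low_part: u32,
    pub high_part: i32,
}

/// Input buffer for ADD (assumes default packing in the C header).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct AddParams {
    pub width: u32,
    pub height: u32,
    pub refresh_rate: u32,
    pub monitor_guid: Guid,
    pub device_name: [u8; 14],
    pub serial_number: [u8; 14],
}

/// Output buffer for ADD.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct AddOut {
    pub adapter_luid: Luid,
    pub target_id: u32,
}

// 3 * 4 + 16 + 14 + 14 = 56 bytes, already a multiple of the 4-byte alignment.
const _: () = assert!(size_of::<AddParams>() == 56);
const _: () = assert!(size_of::<AddOut>() == 12);

/// Copy a label into a `CHAR[14]`, truncating to 13 bytes so a NUL always remains
/// (NUL termination is an assumption).
pub fn fixed_name(label: &str) -> [u8; 14] {
    let mut out = [0u8; 14];
    let n = label.len().min(13);
    out[..n].copy_from_slice(&label.as_bytes()[..n]);
    out
}

/// Poll delays for target resolution: 20 ms, doubling, capped at 320 ms.
/// Staying at the cap indefinitely is an assumption; callers should add a deadline.
pub fn poll_delays_ms() -> impl Iterator<Item = u64> {
    std::iter::successors(Some(20u64), |d| Some((d * 2).min(320)))
}

/// Ping interval derived from the watchdog timeout: `Timeout * 1000 / 3` ms.
pub fn ping_interval_ms(timeout_s: u32) -> u32 {
    timeout_s * 1000 / 3
}
```

With the default 3 s watchdog this yields a 1000 ms ping interval, so two consecutive pings can be lost before the driver tears the displays down.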
## No-GPU dev strategy
**Buildable + validatable on the VM now:** Step 0 (MSVC compile); the SudoVDA backend
(add/mode-set/keepalive/remove via WARP — *spike to confirm*); the openh264 SW encode path fed a CPU
BGRA staging copy → real AnnexB → FEC → UDP (the full transport minus HW); SendInput injection +
interactive-session/desktop-reattach; ViGEm gamepad + rumble; WASAPI loopback (if an endpoint
exists); and the entire client (software decode loopback).
**Defers to a real NVIDIA-GPU Windows box:** NVENC-D3D11 zero-copy encode; whether the captured
`ID3D11Texture2D` registers with NVENC zero-copy vs needing a `CopyResource`; the DDA-vs-WGC latency
bake-off (DDA-on-WARP is `E_NOTIMPL`-class); split-encode + bitrate-ceiling probe; and **all**
glass-to-glass / throughput numbers (no perf claim transfers from Linux).
## Windows-specific structural issues (no Linux precedent)
- **Interactive session, not a Session-0 service.** SendInput can't reach the desktop from Session 0.
Run the host in the user's interactive session and replicate Apollo/Sunshine's
`OpenInputDesktop`/`SetThreadDesktop` re-attach to survive UAC/lock-screen desktop switches. (Driving
the UAC *secure* desktop needs a UIAccess manifest + signing — out of scope; document it.)
- **Clock epoch on the host side.** The skew handshake assumes both ends read the same realtime epoch
in ns. The Windows host must emit timestamps from `GetSystemTimePreciseAsFileTime`→Unix-epoch-ns or
cross-machine latency numbers + `ClockProbe`/`ClockEcho` break.
- **IDD has no audio endpoint.** There's nothing to loop back on a headless box unless a real/virtual
render device exists → WASAPI loopback needs an endpoint, and the virtual *mic* (client→host) has no
clean user-mode path. Audio is potentially a second driver-install problem; defer the mic.
- **Color/range.** All clients assume BT.709 limited-range. A new openh264/NVENC-D3D11 path doing
BGRA→I420 must match, or colors wash out — validate against the existing decoders.
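
The clock-epoch conversion above is small enough to sketch in full. A `FILETIME` counts 100 ns ticks since 1601-01-01 UTC; the function below takes the two 32-bit halves that `GetSystemTimePreciseAsFileTime` fills in. The function and constant names are hypothetical.

```rust
/// 100 ns ticks between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
pub const FILETIME_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;

/// Convert a FILETIME (`dwLowDateTime`, `dwHighDateTime`) to nanoseconds since the Unix epoch.
/// Returns `None` for times before 1970 or values that would overflow `u64`.
pub fn filetime_to_unix_ns(low: u32, high: u32) -> Option<u64> {
    let ticks = ((high as u64) << 32) | low as u64;
    ticks
        .checked_sub(FILETIME_UNIX_EPOCH_TICKS)?
        .checked_mul(100)
}
```

The result is directly comparable with a realtime-clock nanosecond reading on the other platforms, which is what the skew handshake needs.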
## Phased plan (host-first)
0. **Compile on MSVC** (Step 0 above). GPU-less. ← *start here*
1. **SudoVDA `VirtualDisplay` backend** — ✅ *control path landed* (`vdisplay/sudovda.rs`:
add/keepalive/remove + GDI-name resolution + RAII teardown, behind the existing trait; `open()`
returns it on Windows). Compiles + live-tested on the VM. **Remaining:** monitor activation +
`\\.\DisplayN` resolution (needs a GPU), then `SetDisplayConfig` mid-stream `Reconfigure`.
2. **Capture + SW encode** — DXGI Desktop Duplication (or WGC) → `ID3D11Texture2D` → CPU staging →
openh264 → existing FEC/transport. First end-to-end Windows session, GPU-less, against the Linux
`punktfunk-probe` or the new Windows client.
3. **Input** — SendInput (kbd/mouse, VIRTUALDESK mapping) + interactive-session/desktop-reattach.
4. **Gamepad + audio** — ViGEm + rumble; WASAPI loopback.
5. **HW encode (real-GPU box)** — `nvidia-video-codec-sdk` D3D11 zero-copy; DDA-vs-WGC bake-off;
glass-to-glass numbers. Resolve to Xbox-360 pad on Windows (drop DualSense fidelity/virtual-mic to
follow-ups, as the host already does for non-Linux).
## The Windows client (separate track, pure Rust)
Structurally a sibling of `clients/linux` (GTK4) — same shape, different toolkit:
- **UI**: `windows-rs` + **Windows Reactor** (WinUI 3) for native chrome. Link `punktfunk-core`
directly (no C ABI). **De-risk early**: a Reactor window with a `SwapChainPanel` presenting a
test pattern through a flip-model waitable swapchain, before building on it. Fallback if Reactor's
immaturity bites (it is about three weeks old): the `Custom` element + raw `windows-rs` `Microsoft.UI.Xaml`.
- **Decode**: FFmpeg `avcodec_send_packet`/`receive_frame` with the **D3D11VA** hwaccel → `NV12/P010`
`ID3D11Texture2D`. Feeds AnnexB directly (matches host output), decodes AV1 with no Store extension.
- **Present**: DXGI flip-model **waitable** swapchain (`FLIP_DISCARD` + `FRAME_LATENCY_WAITABLE_OBJECT`,
max latency 1) bound to the `SwapChainPanel` via `ISwapChainPanelNative::SetSwapChain`. **Not**
MediaPlayerElement.
- **Input capture**: RAWINPUT/`WM_INPUT` for relative/pointer-lock mouse; `Windows.Gaming.Input` for
gamepads + rumble. Forward via the linked `NativeClient` (`send_input`/`send_rich_input`).
- **Trust**: SPAKE2 PIN + TOFU pinning via core; persist the client identity in Windows Credential
Manager / DPAPI (the Keychain analog).
## Open risks / spikes (do these in isolation, early)
1. **`cargo build -p punktfunk-host` on the VM** — count + triage the real MSVC errors before
estimating Step 0. (GPU-less.)
2. **SudoVDA `ADD` on the VM** — ✅ *done 2026-06-15.* The control path is fully validated on the
GPU-less VM, both standalone and through the real `VirtualDisplay` trait (`vdisplay/sudovda.rs`):
device open by GUID, `GET_VERSION` (0.2.1), `GET_WATCHDOG` (3 s), `ADD 1920×1080@60` → returns
adapter LUID + `target_id`, watchdog ping holds it, RAII `Drop` → `REMOVE`. **Gap:** with no GPU the
target does NOT activate into a WDDM display path (`QueryDisplayConfig` active paths stay 0 → no
`\\.\DisplayN` to resolve/capture). So **activation + name-resolution + capture defer to a real
GPU** (passthrough on the Proxmox VM, or a GPU box) — consistent with capture/NVENC deferring anyway.
3. **IDD arbitrary-mode + `Reconfigure` on 24H2/25H2** — does 5120×1440@240 apply, and does a
remove+re-add (or re-modeset) hit the ~90 ms budget without a Settings-UI toggle? Make-or-break for
"native client resolution, no scaling".
4. **NVENC-D3D11 zero-copy** (real-GPU box) — does the captured texture register as-is, or need a
copy? Does `nvidia-video-codec-sdk`'s `NV_ENC_DEVICE_TYPE_DIRECTX` path work end-to-end? (Expect to
vendor/patch.)
5. **DDA vs WGC** against the SudoVDA monitor — measure latency/jitter on a real GPU; resolve the
primary-capture choice.
6. **Driver redistribution** — confirm bundling SudoVDA (`.cer` + nefcon) + ViGEmBus installers in the
punktfunk Windows package; document them as prerequisites.
## References
- SudoVDA: <https://github.com/SudoMaker/SudoVDA> · Apollo integration:
<https://github.com/ClassicOldSong/Apollo/tree/master/src/platform/windows> (`virtual_display.cpp`)
+ `third-party/sudovda/`
- parsec-vdd-rust (port pattern): <https://github.com/rohitsangwan01/parsec-vdd-rust>
- Win11 24H2 IDD mode-apply regression: VirtualDrivers/Virtual-Display-Driver #471
- Windows Reactor (WinUI 3 in Rust): windows-rs PR #4479
- Crates: `windows`, `windows-capture`, `vigem-client`, `wasapi`, `openh264`,
`nvidia-video-codec-sdk`, `ffmpeg-next`
</content>
</invoke>
+20 -101
@@ -1,110 +1,29 @@
 # Windows service (deployment)
-The `PunktfunkHost` Windows service is the end-user way to run the host on Windows. It replaces the
-manual bring-up chain (a scheduled task → `PsExec64 -s -i 1` → `wscript launch.vbs` → `host-run.cmd`)
-with one command, auto-start on boot, and supervision.
-## Install (installer — recommended)
-Download the signed installer from the package registry
-(`punktfunk-host-windows`, <https://git.unom.io/unom/-/packages>) and run it (it elevates itself):
-```
-punktfunk-host-setup-<ver>.exe # wizard
-punktfunk-host-setup-<ver>.exe /VERYSILENT # unattended
-```
-It lays the host into `C:\Program Files\punktfunk`, optionally installs the bundled **SudoVDA**
-virtual-display driver, then runs `service install` + `service start` for you. Upgrades stop the
-service first and re-point it; uninstall (Add/Remove Programs) runs `service uninstall`. Packaging
-details: [`packaging/windows/README.md`](../packaging/windows/README.md). A self-signed CI build also
-publishes a `.cer` — import it once (`Import-Certificate -FilePath punktfunk-host-windows.cer
--CertStoreLocation Cert:\LocalMachine\TrustedPublisher`) so Windows trusts the signed setup.
-## Install (manual / CLI)
-From an **elevated** (Administrator) prompt:
-```powershell
-punktfunk-host service install # register auto-start LocalSystem service + firewall rules + default host.env
-punktfunk-host service start # start it now (also starts automatically on every boot)
-```
-`service install` is idempotent — run it again after upgrading the exe to re-point the service at the
-new binary. Register whatever location you keep the exe in (e.g. `C:\Program Files\punktfunk\`); the
-service records the current exe path.
-Other subcommands:
-```powershell
-punktfunk-host service stop
-punktfunk-host service status
-punktfunk-host service uninstall # stop + delete the service + remove its firewall rules
-```
-## How it works
-The host must run **as SYSTEM in the interactive session** (Session 1+): Desktop Duplication of the
-secure desktop (UAC / lock / login) and `SendInput` need SYSTEM, and capture/injection need the
-interactive session, which a plain Session-0 service is not in.
-So the service (itself in Session 0) **never captures**. On start, and whenever the active console
-session changes, it:
-1. resolves the active console session (`WTSGetActiveConsoleSessionId`),
-2. duplicates its own LocalSystem token and retargets it to that session (`SetTokenInformation`
-`TokenSessionId`),
-3. launches the host there with `CreateProcessAsUserW` (`lpDesktop = winsta0\default`),
-4. supervises it: relaunches on exit/crash (with backoff) and on a console connect/disconnect.
-A kill-on-close **job object** ensures a service crash never orphans the SYSTEM host. The host in turn
-spawns the WGC helper into the *user* session (see [`windows-secure-desktop.md`](windows-secure-desktop.md))
-— two nested launches. Lock/unlock are handled inside the host (the `DesktopWatcher` DDA↔WGC mux), so
-the service deliberately does **not** relaunch on lock/unlock — only on a real session switch.
-This is the same model Sunshine/Apollo use.
-## Configuration
-Config lives in **`%ProgramData%\punktfunk\host.env`** (KEY=VALUE lines, `#` comments). `service
-install` writes a default if none exists. Template: [`scripts/windows/host.env.example`](../scripts/windows/host.env.example).
-```ini
-PUNKTFUNK_ENCODER=nvenc
-PUNKTFUNK_VIDEO_SOURCE=virtual
-PUNKTFUNK_SECURE_DDA=1
-RUST_LOG=info
-# PUNKTFUNK_HOST_CMD=serve --gamestream # the host subcommand the service launches (default: native + Moonlight)
-```
-The service loads these into its environment and carries `PUNKTFUNK_*` + `RUST_LOG` to the host child
-(the same env-merge the WGC helper uses). Restart the service after editing:
-```powershell
-punktfunk-host service stop; punktfunk-host service start
-```
-The host's identity (cert/pairing/mgmt token/library) also lives under `%ProgramData%\punktfunk` — a
-machine-wide dir the SYSTEM service and the interactive user share, surviving user logout.
-`PUNKTFUNK_CONFIG_DIR` overrides the location (both platforms; handy for tests).
-## Logs
-- `%ProgramData%\punktfunk\logs\service.log` — the service's own supervision log (spawn/exit/session
-switches).
-- `%ProgramData%\punktfunk\logs\host.log` — the host child's stdout/stderr.
-## Prerequisites
-- The host built with `--features nvenc` for NVENC (the driver ships `nvEncodeAPI64.dll`; no SDK
-needed at runtime). Software encode otherwise.
-- The **SudoVDA** indirect display driver installed (for `PUNKTFUNK_VIDEO_SOURCE=virtual`).
-- **ViGEmBus** for virtual gamepads (optional).
-## Gotchas
-- `service install`/`uninstall` need an **elevated** prompt (the SCM rejects non-admin).
-- `service run` is the SCM entry point — don't run it by hand (it errors with a hint).
-- A **graceful** stop currently `TerminateProcess`es the host, so its RAII teardown (SudoVDA monitor
-REMOVE) doesn't run; a stale virtual monitor can linger until the next start. A cooperative-stop
-signal is a follow-up.
+**Status: SHIPPED.** The `PunktfunkHost` LocalSystem SCM service is the end-user way to run the host
+on Windows, installed by the signed Inno Setup installer. Sources / details:
+- `crates/punktfunk-host/src/windows/service.rs` — the supervisor.
+- [`packaging/windows/README.md`](../packaging/windows/README.md) — installer + driver packaging.
+- `punktfunk-host service --help` — install / start / stop / status / uninstall.
+## Why it works the way it does (the durable rationale)
+The host must capture the **secure desktop** (UAC / lock / login) and inject input there. Desktop
+Duplication of the secure desktop and `SendInput` both require **SYSTEM**, while capture and injection
+require the **interactive console session** — which a plain Session-0 service is *not* in. One process
+must therefore be SYSTEM *and* in the interactive session.
+The service resolves this the same way Sunshine/Apollo do: it runs as **LocalSystem in Session 0** but
+**never captures**. Instead it duplicates its own LocalSystem token, retargets it to the active console
+session (`SetTokenInformation(TokenSessionId)`), and launches the host there with
+`CreateProcessAsUserW` (`lpDesktop = winsta0\default`) — supervising it across exits and console-session
+switches, with a kill-on-close Job Object so a service crash never orphans the SYSTEM host.
+`service run` is the **SCM entry point only** — don't run it by hand (it errors with a hint).
+## Open item — graceful stop
+A service stop currently `TerminateProcess`es the host, which **skips RAII teardown**, so a stale
+virtual monitor can linger until the next start. The follow-up is a cooperative-stop signal
+(event/pipe) that lets the host unwind cleanly before exit.
+119 -467
@@ -1,497 +1,149 @@
# Windows virtual display — a Rust port of SudoVDA (investigation & plan) # Windows virtual display — a Rust port of SudoVDA (investigation & plan)
Status: **P1 done — `pf-vdisplay` validated streaming on glass at 5120×1440@240** (2026-06-22). The > **Status:** SHIPPED (P1, 2026-06-22) + P2 CLOSED as a dead end. The all-Rust IddCx driver
all-Rust IddCx driver replaces the vendored **SudoVDA** C++ driver, matching the "all-Rust UMDF, zero > `pf-vdisplay` (`packaging/windows/drivers/pf-vdisplay/`) replaced the vendored SudoVDA C++ driver
external driver deps" direction we finished for gamepads (ViGEmBus gone; DualSense/DS4/XUSB shipped). > (SudoVDA backend deleted in `84a3b95`) and is the **sole** Windows vdisplay backend; the host drives it
The investigation/plan below is kept for context; see **Validated on-box** for the result. > via `crates/punktfunk-host/src/vdisplay/windows/pf_vdisplay.rs`. Live-validated streaming on the RTX box
> at 5120×1440@240. The current consolidated Windows-host architecture lives in
> [`windows-host-rewrite.md`](windows-host-rewrite.md). This doc is trimmed to the two things git history
> can't replace: the **on-glass driver-iteration gotchas**, and the **P2 decision record** proving
> direct-frame-push (IDD-push) is architecturally impossible for bare-metal capture — *do not re-attempt it.*
## TL;DR All the P1 planning/feasibility/decision content (signing tier, Rust prior art, binding-stack choice,
IOCTL contract, phased plan) executed as designed and now lives in the code + `windows-host-rewrite.md`;
it has been cut. What remains below is the durable record.
A Rust port is **feasible, low-on-blockers, and strategically aligned** — and there's an unexpected ## Driver-iteration gotchas (hard-won, on-glass)
architectural prize beyond "same thing, in Rust."
- **Signing is not a blocker.** An IddCx driver is UMDF *user-mode*; it needs **no WHQL, no These cost real time during P1 bring-up and apply to **any** future IddCx/UMDF driver work on this box.
attestation, no test-signing**. A self-signed cert in LocalMachine `Root` + `TrustedPublisher`
loads it — **exactly the model our gamepad drivers already ship** (and exactly what SudoVDA and the
other forks do). ([Do UMDF drivers require signing?](https://learn.microsoft.com/en-us/archive/blogs/peterwie/do-umdf-drivers-require-signing))
- **We would not be first in Rust.** [`MolotovCherry/virtual-display-rs`](https://github.com/MolotovCherry/virtual-display-rs)
is a complete, shipping **IddCx driver written in Rust** (MIT), with hand-rolled IddCx/WDF bindgen
bindings (`wdf-umdf-sys` + `wdf-umdf`) and a reference swap-chain processor. This turns "greenfield
FFI" into "adapt a proven reference."
- **The prize: we can stop using DXGI Desktop Duplication.** An IddCx driver already *receives* the
composited desktop frames in its swap-chain. [Looking Glass](https://deepwiki.com/gnif/LookingGlass/2.5-indirect-display-driver-(idd))
ships exactly this in production — driver consumes the swap-chain, hands frames to a separate
process, "operates entirely independently of DDA." Doing the same would **delete an entire class of
multi-GPU bugs** the current `capture/dxgi.rs` is built to survive (ACCESS_LOST storms,
MODE_CHANGE_IN_PROGRESS, the `win32u.dll` reparenting patch).
Recommendation: **yes, build it in Rust**, in phases — a drop-in DDA-compatible driver first (own the - **INF DriverVer gate.** Updating an installed UMDF driver only takes if the INF **DriverVer changes**
stack at low risk), then the direct-frame-push path (the real cleanup). Keep vendoring SudoVDA as the `deploy-dev.ps1` stamps a date.time `-v` on every run; without a bump the **old binary keeps running
safe interim until the Rust driver is on-glass-validated on the RTX box. (silently)**.
- **Devnode hygiene — `nefconc`, never `devgen`.** Create the root devnode with
`nefconc --create-device-node` (a clean `ROOT\DISPLAY` node), **NOT** `devgen /add` — devgen makes
**persistent `SWD\DEVGEN` software devices** that survive reboot *and* registry deletion and resurrect on
every `pnputil /add-driver` (they carry `hwid root\pf_vdisplay`, so the driver install re-materializes
them). The production installer must use a single `nefconc`/INF-created node and never `devgen`.
- **Session-0 vs Session-1 observability.** Every standalone probe (`vdtest`, the host's
`live_create_drop` test) runs in **Session 0** — the services session, whose desktop is a throwaway
**1024×768** basic display. IddCx activation happens in the **console Session 1**, where the GPU drives
the real desktop. So `Screen.AllScreens`/CCD queries from Session 0 *can never* see the virtual monitor
activate — they report the wrong desktop. The only valid way to drive + observe it is the **host service**
(SYSTEM, which targets Session 1) plus the driver's own `OutputDebugString` (system-wide,
session-agnostic). (An early "monitor arrives but never gets a swap-chain / no DXGI output" symptom was
this measurement artifact, not a driver bug.)
- **Accumulated device-state damage.** Repeated reinstalls + `Disable`/`Enable-PnpDevice` cycles + a control
handle the host **cached across all of it** wedge the device tree (stale handle → the host's PINGs fail →
the 3 s watchdog tears the monitor down mid-session → capture opens a dying display → "no DXGI output").
A **reboot** clears it and it works on the first connect. Lesson: after device churn, restart the host
service (fresh handle) — and when in doubt, reboot.
- **Hot-reload is unreliable; deploy = install + reboot.** `pnputil /restart-device` does **NOT** restart
WUDFHost (old image stays mapped), `Disable/Enable-PnpDevice` errors on the root-enumerated IDD, and
**killing WUDFHost invalidates the host's cached `{e5bcc234}` control handle** (every ADD then fails
`0x80070006`, and the device can wedge to `FAILED_POST_START`). A **reboot** loads a freshly-installed
build cleanly. **Recovery** from a broken build is clean and reboot-free:
`pnputil /delete-driver <oemNN>.inf /uninstall` removes the bad package and the device rebinds the
previous (validated) package in the DriverStore.
- **`FAILED_POST_START` is usually churn, not the binary.** Comparing a working vs. a suspect DLL's import
tables came out **identical** (same DLLs; the size/hash delta is just the Authenticode signature). A clean
install **+ reboot** (no `restart-device`/`disable-enable`/kill in between) loads to `OK`.
- **The swap-chain drain is required.** The swap-chain processor is a faithful port of
virtual-display-rs's — it drains correctly via `ReleaseAndAcquireBuffer` + `FinishedProcessingFrame`. The
drain is *required*; a true no-op stalls DWM and freezes the captured image.
- **`pf-vdisplay` can't coexist with SudoVDA.** They register the same control-interface GUID, so two IddCx
adapters claiming `{e5bcc234}` → `FAILED_POST_START`. pf-vdisplay *replaces* SudoVDA (now moot, since SudoVDA
is deleted, but the same rule binds any second IDD that claims the GUID).
## Validated on-box (2026-06-22) ## P2 — direct frame push (kill DDA): decision record — DEAD END, DO NOT PURSUE
Before committing, the toolchain + load path were proven on the RTX box (Win11 26200, WDK 26100): P2 wanted the driver to *publish* each swap-chain frame to the host directly (Looking-Glass style), to
retire DXGI Desktop Duplication and its multi-GPU survival code (`capture/dxgi.rs`'s
`DXGI_ERROR_ACCESS_LOST`/`MODE_CHANGE_IN_PROGRESS` re-duplication churn and the `win32u.dll`
`install_gpu_pref_hook()` patch). **It cannot work for bare-metal console-desktop capture.** All the
IDD-push code stays in-tree, compiles, and is gated **off** behind `PUNKTFUNK_IDD_PUSH` — dormant and
harmless — as the documented record so it isn't re-tried.
- **A Rust IddCx driver builds with our toolchain.** Cloned [`virtual-display-rs`](https://github.com/MolotovCherry/virtual-display-rs) ### What was proven sound (so the failure is *not* a transport bug)
and built its driver `.dll` against our WDK (UMDF 2.31 + IddCx 1.4 stubs, bindgen over `IddCx.h` via
our LLVM, nightly-2024-07-26). One fix needed: its `build.rs` picked the **max** SDK Lib version
(`10.0.28000.0`, a base SDK with no IddCx) for the `IddCxStub` search path; resolving it by the
version that actually contains `um\x64\iddcx\1.4` (`10.0.26100.0`, the WDK) fixed the link.
- **It installs self-signed and loads.** Signed `.dll`/`.cat` with our existing driver cert (the
gamepad `punktfunk-ds-test`), `pnputil /add-driver`, root devnode via `devgen`. The device came up
**Status OK / CM_PROB_NONE**, Class Display, hosted by `WUDFRd` — a Rust IddCx adapter initialized
cleanly. (SudoVDA, already live here, independently confirms IddCx + self-signed UMDF work on this
box.) Test artifacts removed afterward; SudoVDA untouched.
**Conclusion:** the central risk ("can we build + load a Rust IddCx driver here?") is retired. The - **Producer and consumer are both in Session 0.** The pf-vdisplay host process is `WUDFHost.exe`
binding question (D2) resolves toward **reusing `virtual-display-rs`'s self-contained `wdf-umdf-sys` + (`-DeviceGroupId:pfVDisplayGroup`) and the punktfunk host service is `LocalSystem`**both Session 0**.
`wdf-umdf` bindgen crates** (now proven to build + load on our box) rather than extending So a D3D11 **shared keyed-mutex texture** created in the driver can be opened by name in the host
`windows-drivers-rs` — IddCx functions are direct `IddCxStub` exports the WDF function-table macro (`ID3D11Device1::OpenSharedResourceByName`) with both devices on the **same render-adapter LUID** (the
can't reach anyway, so a unified bindgen is the cleaner base for `pf-vdisplay`. Reference clone kept at driver reports it out of the `ADD` IOCTL via `OsAdapterLuid`). Named kernel objects resolve through
`C:\Users\Public\virtual-display-rs`. Session 0's shared `\BaseNamedObjects`, so no `Global\` prefix / `SeCreateGlobalPrivilege` gymnastics are
needed for same-session use. The Looking-Glass cross-*VM* shared-memory device is unnecessary — this is
**Scaffold + driver logic landed + on-glass:** `packaging/windows/vdisplay-driver/` — vendored cross-*process*, same-session, one GPU.
`wdf-umdf-sys`/`wdf-umdf` (MIT, + the SDK-version build.rs fix) + the `pf-vdisplay` driver crate. The - **Transport shape (built):** a **ring** of N (default 3) shared keyed-mutex textures (newest-wins, so the
full IddCx driver is ported (entry → `IDD_CX_CLIENT_CONFIG` with all 7 callbacks → device/monitor swap-chain thread never blocks — a stalled `IddCxSwapChainReleaseAndAcquire` loop freezes DWM compositing
context → our own EDID → a real swap-chain drain), with the IPC/serde/`tokio` stack replaced by an system-wide) + a named metadata header (`{magic, version, generation, width, height, dxgi_format,
in-tree `monitor` model and `OutputDebugString` logging. **Validated on the RTX box:** built, signed ring_len, latest}`) + a frame-ready auto-reset event. A **generation** counter bumps on a mode change so
(our `punktfunk-ds-test` cert), installed, loaded **Status OK**, and **arrived a real virtual monitor** the host re-opens the ring.
("VirtuDisplay+", `DISPLAY\CHY0000`) — i.e. an OURS, all-Rust IddCx virtual display creating a monitor. - **The inversion (required) — host creates, driver opens.** **WUDFHost runs with a restricted token: it
**IOCTL control plane done + on-glass (P1 functionally complete):** the SudoVDA-compatible control
plane is implemented (`EVT_IDD_CX_DEVICE_IO_CONTROL` + the `{e5bcc234-…}` interface registered via
`WdfDeviceCreateDeviceInterface`; `control.rs` with byte-identical structs) — `ADD` a monitor at a
requested mode → `{LUID, target_id}` (target id + adapter LUID captured from `IDARG_OUT_MONITORARRIVAL`),
`REMOVE` by GUID, `PING`/`GET_WATCHDOG` watchdog, `GET_VERSION`, `SET_RENDER_ADAPTER`
(`IddCxAdapterSetRenderAdapter`); per-`ADD` mode injection (requested mode preferred + fallbacks). Added
the five missing FFI wrappers to the vendored `wdf-umdf`. **Validated on the RTX box** with a probe
that mimics `vdisplay/sudovda.rs` exactly: `GET_VERSION → 0.2.1`, `GET_WATCHDOG → timeout=3`,
`ADD 1920×1080@60 → target_id=257 + adapter LUID`, a real "VirtuDisplay+" monitor arrived at the
requested mode, `REMOVE` ok. **Constraint:** pf-vdisplay can't coexist with SudoVDA — they register the
same interface GUID, so two IddCx adapters claiming it → `FAILED_POST_START`; pf-vdisplay *replaces*
SudoVDA (validated by disabling SudoVDA first).
**Watchdog + real-host drive validated:** added the watchdog thread (1 Hz countdown reset by any IOCTL;
tears down all monitors at 0 so a gone host never leaves a phantom display; mirrors SudoVDA's
`RunWatchdog`). Pointed the **real host** at it — removed SudoVDA's devnode so pf-vdisplay is the sole
`{e5bcc234}` provider, then ran the host's `vdisplay::sudovda::tests::live_create_drop`
(`PUNKTFUNK_SUDOVDA_LIVE=1`): **test passed**, and the pf-vdisplay log shows the host's IOCTLs landing —
`ADD 1920x1080@60 → target_id=258, luid=…02619823`, then the watchdog correctly tore the monitor down
when the test process exited without a final REMOVE. So `vdisplay/sudovda.rs` drives pf-vdisplay
unchanged through the full control contract.
**Validated streaming end-to-end on glass (2026-06-22) — P1 complete.** pf-vdisplay is a working
SudoVDA replacement. Driven by the **real host** (`serve`, the LocalSystem service) with a stock client
at **5120×1440@240**: the monitor arrives, `resolve_gdi_name → \\.\DISPLAY10`, `set_active_mode` +
CCD-isolate succeed, the DXGI output resolves **under the RTX 4090**, WGC capture + NVENC run at
**steady 240 fps, ~2.4 ms encode**, 6512 AUs sent, clean teardown (`isolate restored rc=0x0`). Same
`vdisplay/sudovda.rs` path, unchanged — full parity with SudoVDA.
**The earlier "monitor arrives but never gets a swap-chain / no DXGI output" symptoms were a
measurement + state artifact, not a driver bug.** Two traps cost a lot of time:
1. **Session 0.** Every standalone probe (`vdtest`, the host's `live_create_drop` test) ran in
**Session 0** — the services session, whose desktop is a throwaway **1024×768** basic display. IddCx
activation happens in the **console Session 1**, where the 4090 drives the real desktop. So
`Screen.AllScreens`/CCD queries from Session 0 *can never* see the virtual monitor activate — they
report the wrong desktop. The only valid way to drive + observe it is the **host service** (SYSTEM,
which targets Session 1) plus the driver's own `OutputDebugString` (system-wide, session-agnostic).
2. **Accumulated device-state damage.** Repeated reinstalls + `Disable`/`Enable-PnpDevice` cycles +
a control handle the host **cached across all of it** left the device tree wedged (stale handle →
the host's PINGs fail → the 3 s watchdog tears the monitor down mid-session → capture opens a dying
display → "no DXGI output"). **A reboot cleared it and it worked on the first connect.** Lesson:
after device churn, restart the host service (fresh handle) — and when in doubt, reboot.
The swap-chain processor is a **faithful port of virtual-display-rs's** (it drains correctly via
`ReleaseAndAcquireBuffer` + `FinishedProcessingFrame` — the drain is *required*; a true no-op would
stall DWM and freeze the captured image). The EDID is our **own clean 128-byte block** (manufacturer
`PNK`, product `punktfunk`) — no SudoVDA bytes.
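For reference, the two mechanical parts of an EDID base block (the packed three-letter manufacturer ID and the trailing checksum) can be sketched as below. This is only an illustration of the standard encoding, not the in-tree `edid.rs`; the function names are hypothetical and every field other than the header, manufacturer ID and checksum is left zero.

```rust
/// Pack a three-letter manufacturer ID (e.g. "PNK") into the two big-endian
/// bytes stored at EDID offsets 8..10: 5 bits per letter, 'A' = 1.
pub fn pack_manufacturer_id(id: &str) -> Option<[u8; 2]> {
    let b = id.as_bytes();
    if b.len() != 3 || !b.iter().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    let v = b.iter().fold(0u16, |acc, c| (acc << 5) | u16::from(c - b'A' + 1));
    Some(v.to_be_bytes())
}

/// Set the checksum (byte 127) so that all 128 bytes sum to 0 mod 256.
pub fn fix_checksum(block: &mut [u8; 128]) {
    let sum = block[..127].iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    block[127] = 0u8.wrapping_sub(sum);
}

/// Skeleton base block: fixed 8-byte header + manufacturer ID + checksum.
/// A usable block also needs version, display parameters, timings and
/// descriptors, which are omitted here.
pub fn skeleton_edid(manufacturer: &str) -> Option<[u8; 128]> {
    let mut block = [0u8; 128];
    block[..8].copy_from_slice(&[0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00]);
    block[8..10].copy_from_slice(&pack_manufacturer_id(manufacturer)?);
    fix_checksum(&mut block);
    Some(block)
}
```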
**Build gotcha (important for iterating):** updating an installed UMDF driver only takes if the INF
**DriverVer changes**: `deploy-dev.ps1` stamps a date.time `-v` on every run; without a bump the old
binary keeps running (silently). **Devnode hygiene:** create the root devnode with
`nefconc --create-device-node` (a clean `ROOT\DISPLAY` node), NOT `devgen /add` — devgen makes
**persistent `SWD\DEVGEN` software devices** that survive reboot *and* registry deletion and resurrect
on every `pnputil /add-driver` (they have `hwid root\pf_vdisplay`, so the driver install re-materializes
them). The production installer must use a single `nefconc`/INF-created node and never `devgen`.
## P2 — direct frame push (kill DDA): design & decision record
Status: **in progress.** P1 ships frames the old way (the driver drains its swap-chain and DDA/WGC
re-captures the composited desktop). P2 makes the driver *publish* each swap-chain frame to the host
directly, so we can retire Desktop Duplication and its multi-GPU survival code. Built behind
`PUNKTFUNK_IDD_PUSH`, A/B'd against DDA, and only then made the default.
### The decisive finding: producer and consumer are both in Session 0
The whole transport design hinged on one unknown — same-session or cross-session? **Measured on the
RTX box (2026-06-22):** the pf-vdisplay host process is `WUDFHost.exe` with
`-DeviceGroupId:pfVDisplayGroup`, running in **Session 0**; the punktfunk host service is `LocalSystem`,
also **Session 0**. So the swap-chain processor thread (spawned by our own `thread::spawn` inside the
driver, i.e. in `WUDFHost`) and the encoder live in the **same session**. This is the easy case:
- A D3D11 **shared keyed-mutex texture** created in the driver can be opened by name in the host with
`ID3D11Device1::OpenSharedResourceByName` — both devices created on the **same render-adapter LUID**
(which the driver already reports out of the `ADD` IOCTL via `OsAdapterLuid`, surfaced as
`WinCaptureTarget::adapter_luid`).
- Named kernel objects resolve through Session 0's shared `\BaseNamedObjects`, so **no `Global\`
prefix / `SeCreateGlobalPrivilege` gymnastics** are needed (kept the names unprefixed; documented
that this relies on both processes being Session 0). The Looking-Glass cross-*VM* shared-memory
device is unnecessary — this is cross-*process*, same-session, on one GPU.
This collapses the "Session-0 cross-process transport is the long pole" risk from the original plan.
### Transport: a ring of shared keyed-mutex textures + a metadata header + an event
A single ping-pong keyed mutex would couple the driver's present rate to the host's consume rate — and
**the swap-chain thread must never block** (a stalled `IddCxSwapChainReleaseAndAcquireBuffer`/processing loop
freezes DWM compositing system-wide). So, the Looking-Glass shape — multiple frame buffers, newest
wins:
- **Ring** of `N` (default 3) shared textures, `RESOURCE_MISC_SHARED_NTHANDLE |
SHARED_KEYEDMUTEX`, fixed size for the session. A **generation** counter bumps on a mode change
(resize): the driver tears down + recreates the ring at the new size, the host notices the
generation change and re-opens.
- **Named metadata header** (`CreateFileMapping`): `{magic, version, generation, width, height,
dxgi_format, ring_len, latest}` where `latest` packs `{write_index, monotonic sequence}` published
*after* the copy completes. Plain (unprefixed) names — Session-0 shared namespace.
- **Frame-ready auto-reset event** so the consumer waits instead of spinning.
- **Producer (driver, per acquired frame):** pick `(latest_index + 1) % N`; **try**-acquire that
slot's keyed mutex with a 0 ms timeout (if the host still holds it — rare with 3 slots — reuse the
current slot or skip, **never block**); `CopyResource` the acquired `MetaData.pSurface` into the
slot; release the mutex; publish `{index, ++seq}`; `SetEvent`. Then `FinishedProcessingFrame` as
today.
- **Consumer (host `IddPushCapturer`):** `WaitForSingleObject(event, timeout)`; read `latest`; if `seq`
advanced, acquire that slot's mutex, `CopyResource` into an owned NVENC-input texture, release, yield
`FramePayload::D3d11{texture, device}` — straight into the existing zero-copy NVENC path. No DDA, no
CPU readback.
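The "newest wins, never block" protocol above can be modelled without any D3D at all. The sketch below is an assumption-laden illustration: each slot's keyed mutex is replaced by an `AtomicBool` try-lock, the `latest` word layout (index in the top 8 bits, sequence in the low 56) is invented for the example, and the `Ring` type and its methods are hypothetical. Unlike the design text, the consumer here also uses a try-lock rather than a blocking acquire, to keep the model single-threaded and testable.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

const SEQ_MASK: u64 = (1u64 << 56) - 1;

/// Pack {write_index, sequence} into one word so both are read atomically.
pub fn pack_latest(index: u8, seq: u64) -> u64 {
    (u64::from(index) << 56) | (seq & SEQ_MASK)
}

pub fn unpack_latest(latest: u64) -> (u8, u64) {
    ((latest >> 56) as u8, latest & SEQ_MASK)
}

pub struct Ring {
    latest: AtomicU64,
    /// Stand-in for each slot's keyed mutex: true = held.
    locks: Vec<AtomicBool>,
}

impl Ring {
    /// `len` must be in 1..=256 for the 8-bit index to be sufficient.
    pub fn new(len: usize) -> Self {
        Self {
            latest: AtomicU64::new(pack_latest(0, 0)),
            locks: (0..len).map(|_| AtomicBool::new(false)).collect(),
        }
    }

    pub fn try_lock(&self, slot: usize) -> bool {
        self.locks[slot]
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    pub fn unlock(&self, slot: usize) {
        self.locks[slot].store(false, Ordering::Release);
    }

    /// Producer: try the next slot; if the consumer still holds it, drop the
    /// frame instead of waiting. `latest` is published only after the copy.
    pub fn publish(&self, copy_frame: impl FnOnce(usize)) -> Option<usize> {
        let (index, seq) = unpack_latest(self.latest.load(Ordering::Acquire));
        let slot = (index as usize + 1) % self.locks.len();
        if !self.try_lock(slot) {
            return None;
        }
        copy_frame(slot);
        self.unlock(slot);
        self.latest.store(pack_latest(slot as u8, seq + 1), Ordering::Release);
        Some(slot)
    }

    /// Consumer: if the sequence advanced past `last_seen`, read the newest
    /// slot and return its sequence number.
    pub fn consume(&self, last_seen: u64, read_frame: impl FnOnce(usize)) -> Option<u64> {
        let (index, seq) = unpack_latest(self.latest.load(Ordering::Acquire));
        if seq == last_seen || !self.try_lock(index as usize) {
            return None;
        }
        read_frame(index as usize);
        self.unlock(index as usize);
        Some(seq)
    }
}
```

In the real transport the closures would be the `CopyResource` calls, and the consumer would wait on the frame-ready event before calling `consume`.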
### What P2 removes vs. keeps
- **Removes:** `capture/dxgi.rs`'s `DXGI_ERROR_ACCESS_LOST`/`MODE_CHANGE_IN_PROGRESS` re-duplication
churn, the legacy-`DuplicateOutput` fallback, and **`install_gpu_pref_hook()` (the `win32u.dll`
patch)** — by **pinning the render adapter to the encoder GPU** (`IddCxAdapterSetRenderAdapter`, the
existing `SET_RENDER_ADAPTER` IOCTL, driven before `ADD`), so the OS never reparents the output and
the shared texture + NVENC share one device by construction.
- **Keeps:** display **topology** (making the virtual display the composited desktop) and the
**watchdog** (now ours). The **two-process WGC secure-desktop relay** stays until we confirm the IDD
push also delivers the secure (Winlogon) desktop; if it does, that retires too.
### On-glass attempt 2026-06-22 — code complete, blocked at driver load
The full transport (driver publisher + host `IddPushCapturer` + render-LUID robustness + in-process
routing) is written and compiles clean. The first on-glass A/B exposed several real things and one
hard blocker:
- **The service captures in a Session-1 WGC helper, not in-process.** `should_use_helper()` returns
true for a SYSTEM service, so it spawns a user-session helper that does capture **and input
injection**. IDD-push must capture **in-process in Session 0** (where the driver publishes) — wired
via `should_use_helper()` returning false for `PUNKTFUNK_IDD_PUSH`. **Caveat:** `SendInput` from
Session 0 can't reach the user's Session-1 desktop, so in-process IDD-push has **no working input**
yet. Production needs either a Session-1 input-only helper, or `Global\`-namespaced shared textures
so a Session-1 helper consumes IDD-push for both video + input.
- **`SET_RENDER_ADAPTER` is ignored by the driver** (the IDD lands on a different adapter than pinned:
observed IDD adapter `0xd60722` vs pinned 4090 `0x15de1`). The render-LUID-in-header path makes the
host bind correctly regardless, but the driver should be made to actually honor the pin (or the host
must copy across adapters) so NVENC gets a 4090 surface.
- **Cursor is included** in the IddCx composited frame (DDA strips it) — so the host-side cursor
compositor (P2.5) is likely unnecessary for this path.
- **`FAILED_POST_START` was a red herring (churn, not the binary).** Comparing the 2157 (works) and
the `frame_transport` DLL import tables: **identical** (same 8 DLLs; the size/hash delta is just the
Authenticode signature). A clean install **+ reboot** (no `restart-device`/`disable-enable`/kill in
between) loads the `frame_transport` driver to **`OK`**. The earlier `FAILED_POST_START` was the
device wedging from the hot-reload churn (the deploy gotchas above). **Lesson: deploy = install +
reboot, full stop.**
- **THE REAL BLOCKER — the driver can't CREATE the shared objects.** With the driver loaded clean and
the monitor active, the host's `IddPushCapturer` still times out: `pfvd-hdr-<target> never appeared`.
The driver's own `OutputDebugString` is invisible (UMDF redirects it to ETW, not DebugView — verified
with a working DBWIN self-test), so a **file-logging** driver build was tried — and it wrote **no
file at all**, even though `init()` runs in `DriverEntry`, the device is `OK`, WUDFHost runs as
`LocalService`, and `C:\Users\Public` is world-writable. **WUDFHost runs with a restricted token: it
can neither write the filesystem nor create named kernel objects** (`CreateFileMappingW`/`CreateEventW`/
`CreateSharedHandle`), so `FramePublisher::new` fails silently. This is exactly why the **gamepad UMDF
drivers invert it**: `inject/dualsense_windows.rs` — *"the host creates the section (privileged → a
permissive SDDL so the WUDFHost can open it); the driver maps it"* — `Global\pfds-shm-<idx>` + SDDL
`D:(A;;GA;;;WD)`. **Fix: invert frame-push to match.** The HOST creates the header + event + ring
textures (`Global\` names, `D:(A;;GA;;;WD)` SDDL); the DRIVER only OPENS them, writes its actual
render LUID + a status code back into the host-created header (so we get driver visibility through the
host log), and runs the copy loop. The host creates the textures on the render adapter the driver
reports.
- **Also unresolved: `SET_RENDER_ADAPTER` appears ignored** (the host's pin to the 4090 vs the ADD-reply
adapter differ every time). The inverted header carries the driver's *actual* render LUID so the host
can create textures + run NVENC on the right adapter — but if that's the iGPU, NVENC (NVIDIA) can't
encode it, so the driver must be made to honor the pin (or the host must cross-adapter copy). Needs its
own investigation.
**Driver deploy gotchas learned (this box):** hot-reloading a UMDF display driver is unreliable:
`pnputil /restart-device` does NOT restart WUDFHost (old image stays mapped), `Disable/Enable-PnpDevice`
errors on the root-enumerated IDD, and **killing WUDFHost invalidates the host's cached `{e5bcc234}`
control handle** (every ADD then fails `0x80070006`, and the device can wedge to `FAILED_POST_START`).
A **reboot** loads a freshly-installed build cleanly. **Recovery** from a broken build is clean and
reboot-free: `pnputil /delete-driver <oemNN>.inf /uninstall` removes the bad package and the device
rebinds the previous (validated) package in the DriverStore — restored 2157 → `OK` immediately.
### On-glass attempt 2 (2026-06-23) — inversion works; in-process Session-0 path is a dead end
Implemented the **inversion** (host creates the header + event + ring textures with the
`D:(A;;GA;;;WD)` SDDL, driver only opens them) + a per-attempt **generation** (kills the
`DXGI_ERROR_NAME_ALREADY_EXISTS` retry collisions) + a fixed-name **`Global\pfvd-dbg` debug channel**
(structured counters the driver writes, since UMDF/ETW + the restricted token block its other logs).
Results on the RTX box:
- ✅ The host **creates the shared ring every time** (`created shared ring … render_luid=…`) — the
privileged-create / restricted-open split is sound.
- ✅ No more name collisions (generation fix).
- ❌ **The driver writes NOTHING** — debug block all zeros, crucially `run_core_entries=0`. The
swap-chain processor **never runs**, i.e. the OS **never assigns a swap-chain** to the virtual
monitor in this path.
**Root cause: an IddCx monitor only gets a swap-chain when something PRESENTS to it, and the in-process
path has no presenter.** The host + the CCD topology-isolate run in **Session 0, which has no DWM /
compositor**. The WGC path works because its capture helper lives in **Session 1**, where DWM composes
the desktop onto the display (that composition is the swap-chain trigger). So in-process Session-0
IDD-push gets no frames to push, full stop — a **fundamental** barrier, not a fixable bug. The original
plan's "Session-0 transport is the long pole" was right, but the long pole turned out to be *triggering
presentation*, not the shared-memory mechanics (those work).
**Consequence:** the only viable IDD-push shape is **option 3 — a Session-1 helper drives presentation +
consumes the `Global\` ring** (the inversion built here is exactly what it needs). But it carries an
unretired risk: it's still unproven whether the swap-chain gets assigned even with a Session-1 consumer
that isn't WGC. Until that's answered, **DDA/WGC stays the shipping Windows capture path** — it works.
All the IDD-push code (driver open-side + host create-side + debug channel) is written, compiles, and is
gated behind `PUNKTFUNK_IDD_PUSH` (off), so it's dormant and harmless.
### CONCLUSION (2026-06-23): IDD-push is not viable for bare-metal capture — the swap-chain is never assigned
After the inversion + a fixed-name debug channel + a host-created-ring observer + an autonomous
loopback test harness (`punktfunk-probe` → the SYSTEM service, paired via the mgmt API), the question
"does the driver's swap-chain processor ever run?" was answered **definitively: no.** The driver's
`run_core` is **never entered** (`run_core_entries=0`) in *every* configuration tested:
- in-process (Session 0) and WGC-triggered (Session 1 helper) sessions,
- a user-created ring AND a host-created (LocalSystem) ring with a permissive `D:(A;;GA;;;WD)` SDDL,
- with and without a Low-IL (`S:(ML;;NW;;;LW)`) mandatory label,
- with WUDFHost confirmed **not** an AppContainer (`IsAppContainer=0`),
— even while WGC simultaneously captured the same virtual monitor's composition and streamed multi-MB
of HEVC. The gamepad UMDF drivers prove a UMDF driver *can* open + write a host-created `Global\`
section on this box, so the driver writing nothing is **not** an access problem — `run_core` simply
does not run.
**Root cause (researched + ecosystem-confirmed):** an IddCx virtual monitor only receives a swap-chain
(`EVT_IDD_CX_MONITOR_ASSIGN_SWAPCHAIN`) when the OS **presents/scans-out** to it, which requires a real
presentation consumer. **WGC/DDA capture of the composed desktop does NOT count** — it reads DWM's
composition, bypassing the driver's swap-chain. With no physical scanout and no consumer that routes
*through the driver*, the path stays inactive (`IDDCX_PATH_FLAGS=0`) and `ASSIGN_SWAPCHAIN` never fires.
Session 0 additionally has no DWM/compositor at all.
Ecosystem + first-party confirmation:
- **Every bare-metal virtual-display capture project uses WGC/DDA, not the driver swap-chain:** SudoVDA
(its swap-chain loop acquires-and-discards), Apollo/Sunshine (DDA + WGC backends), virtual-display-rs
(discards), parsec-vdd (no frame path). Only **Looking Glass** consumes the driver swap-chain — and
only because a **VM guest scans out** the display (the consumer). We have no equivalent on bare metal.
- Microsoft's own unanswered Q&A (learn.microsoft.com/answers 4096179) reports the identical symptom for
the IddSampleDriver: virtual display "always inactive," `ASSIGN_SWAPCHAIN` never runs.
**Verdict:** the "driver consumes its swap-chain and pushes frames" architecture (P2 / Looking-Glass
style) **cannot get frames** for punktfunk's bare-metal, whole-desktop, capture-only use case. The
shared-memory transport machinery (host-creates / driver-opens, the gamepad pattern) is all sound and
proven to *create*, but there is nothing for the driver to publish. **DDA/WGC remains the only viable
Windows capture path**, which is exactly what the entire ecosystem does. The IDD-push code stays
in-tree, compiles, and is gated `off` (`PUNKTFUNK_IDD_PUSH`) — dormant and harmless — documenting the
attempt so it isn't re-tried. "Better performance/lower overhead" must come from optimizing the WGC/DDA
path (e.g. trimming the Session-0↔Session-1 relay, zero-copy encode), not from IDD-push.
The only unexplored avenue is **driver-side** (a different adapter/monitor/path setup that might make the
OS treat the virtual display as a presentation target) — but it needs a reboot to test, the MS Q&A
suggests it's unsolved, and the unanimous ecosystem choice of WGC/DDA argues it's a dead end.
**Final exhaustion (2026-06-23, follow-up): both remaining avenues closed.**
- **Option 3 (present source) — TESTED, failed.** Added a present-trigger to the Session-1 WGC helper:
it successfully created a D3D11 swapchain on the virtual display and presented continuously (WGC even
captured the flashing window). The driver stayed `run_core_entries=0` / `frames_acquired=0`. So an
active *present source* on the display does NOT make the OS assign the driver's swap-chain either —
DWM composes the present onto the display (capturable) without routing it through the driver's
swap-chain.
- **Option 2 (driver flag) — closed by analysis.** The present-trigger succeeding proves the **path is
already active** (a swapchain presents to the display fine); the missing piece is **scanout routed
through the driver**, which the OS does only for a real consumer (physical display / VM guest / RDP).
The one IddCx flag for that — `IDDCX_ADAPTER_FLAGS_REMOTE_SESSION_DRIVER` — requires the **RDP
protocol stack** as the consumer, which bare-metal console capture has no equivalent of.
**Verdict is final:** IDD-push needs a presentation consumer (scanout / VM guest / RDP) that bare-metal
console desktop-capture fundamentally cannot provide. No host-side capture, no in-process path, no
present source, and no available driver flag overcomes it. WGC (normal desktop) + DDA (secure desktop)
is the only viable Windows capture path — as the entire ecosystem already does. The IDD-push +
present-trigger code stays in-tree, gated off, as the documented record of the attempt.
### Known gaps the build-out must close (tracked as P2.* tasks)
- **Cursor.** DDA/WGC composite the HW cursor host-side from frame-info; the IDD path delivers the
cursor separately (`IddCxMonitorSetupHardwareCursor` event → `QueryHardwareCursor`). The prototype
may ship cursor-less; the build-out wires the IDD cursor into the existing `CursorCompositor`.
- **HDR.** The default IddCx swap-chain surface is 8-bit `B8G8R8A8`; FP16/HDR needs the **IddCx 1.11
D3D12 acquire path** (`SetDevice2`/`ReleaseAndAcquireBuffer2` → `ID3D12Resource`). Build against
1.10, runtime-gate 1.11. SDR-only for the prototype.
## Open items
**None.** P1 shipped; P2 is a permanent *do-not-pursue* record (no pending work). WGC/DDA is the shipping
capture path.
## Why we'd do this
The user's goals, mapped to outcomes:
| Goal | Outcome |
| --- | --- |
| Drop external deps | No more vendored prebuilt SudoVDA `.dll`/`.cat` (third-party, C++, single upstream). |
| Increase Rust coverage | The display driver joins the gamepad drivers as in-tree Rust UMDF. |
| Own the stack / easier display management | We control the IOCTL protocol, the EDID, the mode list, the watchdog — and can fold the topology/mode logic that's currently scattered in `vdisplay/sudovda.rs` into the driver. |
| Cleaner code | Phase 2 retires `capture/dxgi.rs`'s DDA workarounds + the `win32u.dll` patch. |
## What we'd be replacing (current architecture)
- **Driver:** SudoVDA — UMDF2 IddCx, `Class=Display`, `UmdfExtensions=IddCx0102`,
`UpperFilters=IndirectKmd`, root-enumerated `Root\SudoMaker\SudoVDA`. Vendored prebuilt under
`packaging/windows/sudovda/`, installed by `install-sudovda.ps1` (cert → `nefconc` devnode →
`pnputil`). Source is public ([SudoMaker/SudoVDA](https://github.com/SudoMaker/SudoVDA), README-only
MIT/CC0 grant over the MS sample, ~1,900 LOC C++).
- **Host contract:** `crates/punktfunk-host/src/vdisplay/sudovda.rs` opens the control device by
interface GUID `{e5bcc234-…}` and drives a tiny `METHOD_BUFFERED` IOCTL protocol — byte-identical to
SudoVDA's `Common/Include/sudovda-ioctl.h`:
- `ADD (0x800)` `{w,h,refresh,GUID,name[14],serial[14]}` → `{LUID, target_id}`
- `REMOVE (0x801)` `{GUID}` · `SET_RENDER_ADAPTER (0x802)` `{LUID}` · `GET_WATCHDOG (0x803)` ·
`PING (0x888)` (mandatory keepalive) · `GET_VERSION (0x8FF)`
- **Capture:** `capture/dxgi.rs` finds the virtual monitor's GDI output **across all adapters** (it's
enumerated under the *rendering* GPU, not SudoVDA's LUID) and runs **DXGI Desktop Duplication**
(`DuplicateOutput1`, FP16 for HDR). This file is **dominated by virtual-display-over-DDA survival
code**: `DXGI_ERROR_ACCESS_LOST` re-duplication with retries, `MODE_CHANGE_IN_PROGRESS` backoff,
legacy-`DuplicateOutput` fallback, CCD display isolation to make the IDD the sole composited
desktop, and an **`install_gpu_pref_hook()` that patches `win32u.dll!NtGdiDdDDIGetCachedHybridQueryValue`**
to stop DXGI reparenting the output across GPUs. Most of that exists *because* we capture a virtual
display via DDA on a multi-GPU box.
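To make the host contract above concrete, here is a rough sketch of how the function codes and the `ADD` payload might be expressed in Rust. Only the function numbers, `METHOD_BUFFERED`, and the field list come from this document. The device type (`FILE_DEVICE_UNKNOWN`), the access bits, the integer widths, and all type and field names are assumptions for illustration; the authoritative layout is SudoVDA's `sudovda-ioctl.h`.

```rust
/// Windows `CTL_CODE` macro:
/// (DeviceType << 16) | (Access << 14) | (Function << 2) | Method.
pub const fn ctl_code(device_type: u32, function: u32, method: u32, access: u32) -> u32 {
    (device_type << 16) | (access << 14) | (function << 2) | method
}

pub const METHOD_BUFFERED: u32 = 0;
pub const FILE_ANY_ACCESS: u32 = 0;
/// Assumption: the control device uses FILE_DEVICE_UNKNOWN (0x22).
pub const FILE_DEVICE_UNKNOWN: u32 = 0x22;

// Function numbers as listed in the contract above.
pub const FN_ADD: u32 = 0x800;
pub const FN_REMOVE: u32 = 0x801;
pub const FN_SET_RENDER_ADAPTER: u32 = 0x802;
pub const FN_GET_WATCHDOG: u32 = 0x803;
pub const FN_PING: u32 = 0x888;
pub const FN_GET_VERSION: u32 = 0x8FF;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Guid {
    pub data1: u32,
    pub data2: u16,
    pub data3: u16,
    pub data4: [u8; 8],
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Luid {
    pub low_part: u32,
    pub high_part: i32,
}

/// ADD input; field widths are assumed, not taken from the real header.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct AddParams {
    pub width: u32,
    pub height: u32,
    pub refresh: u32,
    pub monitor_guid: Guid,
    pub name: [u8; 14],
    pub serial: [u8; 14],
}

/// ADD output.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct AddReply {
    pub adapter_luid: Luid,
    pub target_id: u32,
}

/// Copy a string into a fixed 14-byte field, truncating to 13 bytes so the
/// result is always NUL-terminated.
pub fn fixed_name(s: &str) -> [u8; 14] {
    let mut out = [0u8; 14];
    let n = s.len().min(13);
    out[..n].copy_from_slice(&s.as_bytes()[..n]);
    out
}
```

Keeping the structs `#[repr(C)]` is what lets a drop-in driver stay byte-compatible with an existing host; any real port should assert the sizes against the C header rather than trust the widths guessed here.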
## Feasibility findings
### Signing — green (the make-or-break)
UMDF user-mode ⇒ Code-Integrity signing rules don't apply to our binary (the only kernel piece is
Microsoft's inbox `IndirectKmd`). Self-signed cert in `Root` + `TrustedPublisher` is sufficient on a
normal Secure-Boot Win11 box — no `bcdedit /set testsigning`. SudoVDA and `virtual-display-rs` both
ship this way. This is the **same** model as our DualSense/DS4/XUSB drivers. (The only thing that
breaks install is a botched cert placement, not a signing *tier*.)
### Rust prior art — exists, MIT, reusable
`virtual-display-rs` proves an all-Rust IddCx driver runs in production and gives us:
`wdf-umdf-sys` (bindgen over WDF **and** `iddcx.h`, links `IddCxStub`), `wdf-umdf` (safe wrappers —
`iddcx.rs` ~300 LOC, with an `IddCxIsFunctionAvailable!` version-gate macro), and a reference driver
(`swap_chain_processor.rs` ~158 LOC, `direct_3d_device.rs`, `edid.rs`). **Caveat:** it uses its *own*
bindgen stack, **not** `microsoft/windows-drivers-rs` — see Decision D2.
### windows-drivers-rs IddCx support — absent, but a bounded extension
Our `wdk-sys` (m0) binds Base + WDF + feature-gated subsets (hid/gpio/spb/…). **Zero IddCx symbols.**
Adding it is the same shape as the existing subsets: an `ApiSubset::Iddcx` variant + `iddcx` feature →
`iddcx_headers()` returning `iddcx.h` for bindgen, and linking `IddCx.lib`. IddCx functions are **not**
WDF-table functions, so the `call_unsafe_wdf_function_binding!` macro doesn't apply — they're direct
`IddCx.lib` exports we'd `#[link(name="IddCx")] extern` (or bindgen) and wrap ourselves.
`windows` 0.58 (already in the tree) provides the Direct3D11/Dxgi APIs the swap-chain loop needs.
### The IddCx driver itself — well-understood, ~1-2k LOC
Required callbacks (baselined on the MS [IddSampleDriver](https://github.com/microsoft/Windows-driver-samples/blob/main/video/IndirectDisplay/IddSampleDriver/Driver.cpp), ~1,100 LOC, IddCx 1.4):
`EVT_IDD_CX_ADAPTER_INIT_FINISHED`, `ADAPTER_COMMIT_MODES`, `PARSE_MONITOR_DESCRIPTION`,
`MONITOR_GET_DEFAULT_DESCRIPTION_MODES`, `MONITOR_QUERY_TARGET_MODES`, `MONITOR_ASSIGN_SWAPCHAIN`
(the only callback with real D3D work), `MONITOR_UNASSIGN_SWAPCHAIN`, and `DEVICE_IO_CONTROL` (where
our ADD/REMOVE/PING IOCTLs live). Init flow: `WdfDeviceCreate → IddCxDeviceInitConfig →
IddCxDeviceInitialize → IddCxAdapterInitAsync → IddCxMonitorCreate → IddCxMonitorArrival`.
**Arbitrary resolutions don't need EDID timings:** ship one generic ~128/256-byte EDID base block to
make Windows treat the target as a real monitor, then advertise modes programmatically from the
mode-list callbacks — a static table **plus the runtime-requested client mode injected as preferred**
(exactly SudoVDA's `s_DefaultModes[]` + per-ADD preferred-mode approach). 5120×1440@240 just gets
added at ADD time.
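The "static table plus injected client mode" idea can be sketched in a few lines. The table contents, the `Mode` type, and the convention that the preferred mode is simply listed first are all assumptions made for this example; the real callbacks fill IddCx mode structures instead.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mode {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
}

/// Illustrative static table (not the driver's actual list).
pub const DEFAULT_MODES: &[Mode] = &[
    Mode { width: 1920, height: 1080, refresh_hz: 60 },
    Mode { width: 2560, height: 1440, refresh_hz: 60 },
    Mode { width: 3840, height: 2160, refresh_hz: 60 },
];

/// Mode list for one monitor: the client-requested mode first (treated as
/// preferred), followed by the static table with any duplicate removed.
pub fn mode_list(requested: Mode) -> Vec<Mode> {
    let mut modes = vec![requested];
    modes.extend(DEFAULT_MODES.iter().copied().filter(|m| *m != requested));
    modes
}
```

With this shape, a request such as 5120×1440@240 needs no EDID timing entry at all; it just becomes the first element of the list built at ADD time.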
**HDR/10-bit:** supported, but it's the one place IddCx is *harder* than today. The default swap-chain
surface is **8-bit `A8R8G8B8`**; FP16/HDR requires the IddCx **1.11 D3D12 acquire path**
(`SetDevice2`/`ReleaseAndAcquireBuffer2` → `ID3D12Resource`, with a stricter sync model). Our box is
Win11 26200 (IddCx ≥ 1.10), so this is reachable, but it's real work — and our current WGC/DDA path
gives FP16 HDR "for free." Build against 1.10 and runtime-gate the newer DDIs (SudoVDA's pattern).
## The architectural prize: skip DDA (Phase 2)
An IddCx driver gets each presented frame from `IddCxSwapChainReleaseAndAcquireBuffer` as an
`IDXGIResource` on a device **we** bind via `IddCxSwapChainSetDevice`. We can copy it into a shared
texture / shared section and hand it to the host's encoder process directly — **no Desktop
Duplication**. Why this is the real win, not just a detour:
- **It's the *intended* IddCx use case.** IddCx exists for remote/wireless/USB displays that ship
swap-chain frames over a wire; consuming frames in the driver is the designed path, and **Looking
Glass already does exactly this** (driver → shared memory → separate consumer, no DDA).
- **It kills the multi-GPU bug class.** We call `IddCxAdapterSetRenderAdapter` to pin the swap-chain to
the **same GPU as our NVENC encoder before adding the monitor**, and the OS honors it. No more DXGI
reparenting the output onto the wrong GPU, no ACCESS_LOST storms, and we can **retire
`install_gpu_pref_hook()` (the `win32u.dll` patch)** and most of `capture/dxgi.rs`. Swap-chain
re-creation becomes a documented, in-band event (`ABANDON_SWAPCHAIN`) instead of an undocumented
failure we fight with retries.
What it does **not** remove (be honest): display **topology** management — making the virtual display
the sole/primary composited desktop so the game (and Winlogon) render to it — is independent of how we
*get* frames and stays (though we can integrate it more cleanly). And the watchdog stays, now ours.
The cost: a **Session-0 → service cross-process frame transport** (the driver host is `WUDFHost` in
Session 0 / LocalService; our host is a LocalSystem service). A `Global\`-named, explicitly-ACL'd
shared section + keyed-mutex texture (Looking Glass's shape) is where the engineering actually goes —
prototype this first, it's the only genuinely new risk. Plus the HDR D3D12 path above.
## Decisions to make at kickoff
- **D1 — Own the driver?** Recommend **yes, in Rust.** (Alternatives: fork SudoVDA's C++ — fastest to a
known-good HDR driver but reintroduces a C++ toolchain and README-only license provenance; or keep
vendoring — zero cost, but none of the goals.)
- **D2 — Binding stack?** The main implementation fork.
- **(a)** Extend our `windows-drivers-rs` (m0) with an `iddcx` subset — **one toolchain across all
our drivers**, our build env, but we write the IddCx bindings ourselves (+~3-5 wk), using
`virtual-display-rs`'s `iddcx.rs` as the 1:1 guide. *Preferred for consistency.*
- **(b)** Vendor `virtual-display-rs`'s `wdf-umdf*` crates (MIT) — fastest to first light, but a
*second* WDK-binding stack in-tree.
- Suggested sequence: **prototype on (b) to prove IddCx-on-our-box in days**, then build production on
**(a)** for consistency.
- **D3 — Frame transport?** Phase it: **DDA-compatible first** (zero capture-side change), **direct
push second** (the cleanup). Don't couple the driver rewrite to the transport rewrite.
## Recommended plan
- **P0 — now:** keep vendoring SudoVDA. No change. (The gamepad-driver installer work just shipped;
this is independent.)
- **P1 — drop-in Rust IddCx driver (`pf-vdisplay`).** Replicate SudoVDA's IOCTL contract **exactly**
(same struct layouts; reuse or re-issue the control interface GUID) so `vdisplay/sudovda.rs` needs
**~zero change** (at most a GUID constant). Class=Display + IddCx INF, our own EDID + programmatic
mode list incl. the per-ADD client mode, the watchdog, a real swap-chain drain (the vdd port — the
drain is required so DWM keeps compositing; DDA/WGC still captures the desktop). Bundle + self-sign +
`pnputil`-install via the installer, identical to the gamepad-driver path we just built. **Outcome:** all-Rust, SudoVDA dependency dropped, DDA capture
unchanged. Effort ≈ **2-4 wk to first light**, **5-7 wk to parity** (HDR, multi-monitor, CI).
- **P2 — direct frame push (kill DDA).** Add a swap-chain processor that copies each frame into a
shared section/texture; new `capture` backend reads it directly; pin the render adapter to the
encoder GPU. Gate behind a flag, validate against DDA, then retire the DDA path + the `win32u.dll`
patch. HDR via the IddCx 1.11 D3D12 acquire path. **Outcome:** the real "owning the stack pays off"
cleanup. Effort: additional; the Session-0 transport is the long pole.
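P1's requirement to replicate the IOCTL contract with the same struct layouts can be pinned at compile time. A minimal Rust sketch, assuming a hypothetical `AddDisplayParams` struct: the field names, sizes, and the 28-byte total are illustrative only and are not taken from SudoVDA's `sudovda-ioctl.h`.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical IOCTL input struct. The fields are illustrative,
/// NOT the real SudoVDA layout.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct AddDisplayParams {
    pub width: u32,
    pub height: u32,
    pub refresh_hz: u32,
    pub monitor_guid: [u8; 16],
}

// Compile-time layout pins: the build fails if the struct's size or
// alignment drifts from what the other side of the IOCTL expects.
const _: () = assert!(size_of::<AddDisplayParams>() == 28);
const _: () = assert!(align_of::<AddDisplayParams>() == 4);

/// Serialize field by field in little-endian order, which matches the
/// in-memory `repr(C)` layout on the little-endian targets Windows runs on.
pub fn to_wire(p: &AddDisplayParams) -> [u8; 28] {
    let mut out = [0u8; 28];
    out[0..4].copy_from_slice(&p.width.to_le_bytes());
    out[4..8].copy_from_slice(&p.height.to_le_bytes());
    out[8..12].copy_from_slice(&p.refresh_hz.to_le_bytes());
    out[12..28].copy_from_slice(&p.monitor_guid);
    out
}
```

If a field is added or reordered, the `const` assertions fail the build rather than letting host and driver disagree about the layout at run time.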
## Risks
1. **D3D-in-a-driver swap-chain loop** — the one genuinely new piece; bugs here = black screens/TDR.
Mitigated by `virtual-display-rs`'s `swap_chain_processor.rs` + the MS sample as references.
2. **Session-0 cross-process transport** (P2) — the actual hard part; prototype it first.
3. **HDR = the harder D3D12 1.11 path** — our current WGC/DDA HDR is free; the IddCx HDR path is not.
4. **Two binding stacks** if we go D2(b) — a maintenance cost cutting against "clean/consistent."
5. **No WHQL ⇒ no Windows Update / Dev-Center distribution** — same constraint our gamepad drivers
already accept (bundle + self-sign + import cert).
## References
- IddCx model + signing: [IDD model overview](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/indirect-display-driver-model-overview) ·
[IddCx versions](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iddcx-versions) ·
[1.10+ updates](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iddcx1.10-updates) ·
[UMDF signing](https://learn.microsoft.com/en-us/archive/blogs/peterwie/do-umdf-drivers-require-signing)
- Swap-chain / frames: [IDDCX_METADATA](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/ns-iddcx-iddcx_metadata) ·
[SetDevice](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nf-iddcx-iddcxswapchainsetdevice) ·
[SetRenderAdapter](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nf-iddcx-iddcxadaptersetrenderadapter) ·
[ASSIGN_SWAPCHAIN](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/iddcx/nc-iddcx-evt_idd_cx_monitor_assign_swapchain)
- Prior art: [microsoft IddSampleDriver](https://github.com/microsoft/Windows-driver-samples/tree/main/video/IndirectDisplay) ·
[SudoMaker/SudoVDA](https://github.com/SudoMaker/SudoVDA) ([ioctl.h](https://github.com/SudoMaker/SudoVDA/blob/master/Common/Include/sudovda-ioctl.h)) ·
**[MolotovCherry/virtual-display-rs (Rust, MIT)](https://github.com/MolotovCherry/virtual-display-rs)** ·
[Looking Glass IDD (swap-chain → shm, no DDA)](https://deepwiki.com/gnif/LookingGlass/2.5-indirect-display-driver-(idd)) ·
[itsmikethetech/Virtual-Display-Driver](https://github.com/itsmikethetech/Virtual-Display-Driver)
+28 -23
@@ -3,6 +3,8 @@
 A one-file, signed `setup.exe` for the punktfunk streaming **host** on Windows, published to Gitea's
 generic package registry (`punktfunk-host-windows`) by `.gitea/workflows/windows-host.yml`.
+> Full picture (drivers-from-source, toolchain, CI, dev loop): **[`design/windows-build-and-packaging.md`](../../design/windows-build-and-packaging.md)**. This README is the `packaging/windows/` file index.
 ## x64 only (no ARM64)
 Unlike the client (which ships x64 + ARM64 MSIX), the host is **x64-only by design**. It is coupled to
@@ -18,17 +20,20 @@ interactive session for secure-desktop (UAC / lock screen) capture, adds firewal
 on the **pf-vdisplay** UMDF/IDD virtual-display driver. MSIX's sandbox can install **neither** a SYSTEM
 service of this kind **nor** a driver. So the host ships as a classic elevated installer.
-The installer is deliberately thin: the real install logic — SCM registration, firewall rules, the
-default `host.env`, and the SYSTEM→interactive-session supervisor — already lives in
-`punktfunk-host service install` (`crates/punktfunk-host/src/service.rs`). The installer just lays the
-exe into `C:\Program Files\punktfunk\` and calls that subcommand, elevated.
+The installer is deliberately thin: the real install logic lives in `punktfunk-host` subcommands, not
+in PowerShell — `service install` (SCM registration, firewall rules, the default `host.env`, the
+SYSTEM→interactive-session supervisor; `service.rs`), `driver install [--gamepad]` and `web setup`
+(driver/console provisioning; `windows/install.rs`). The installer lays the exe into
+`C:\Program Files\punktfunk\` and calls those subcommands elevated. Keeping the logic in the compiled
+exe — not a `.ps1` *file* PowerShell reads in the machine codepage — is the fix for the ANSI-codepage
+parse breakage that silently failed installs on non-English boxes.
 ## What the installer does
 - Installs `punktfunk-host.exe` (+ `host.env.example`, this README) to `{app}` (`C:\Program Files\punktfunk`).
-- **Optional task** *Install the pf-vdisplay virtual display driver* - imports the driver's self-signed
-cert (machine `Root` + `TrustedPublisher`), creates the `root\pf_vdisplay` device node (only
-if absent, via nefconc — never devgen - `install-pf-vdisplay.ps1`), and stages the driver with
+- **Optional task** *Install the pf-vdisplay virtual display driver* - `punktfunk-host.exe driver install`
+imports the driver's self-signed cert (machine `Root` + `TrustedPublisher`), creates the
+`root\pf_vdisplay` device node (only if absent, via nefconc — never devgen), and stages the driver with
 `pnputil /add-driver /install`.
 Best-effort: a driver failure warns but never aborts the install (the host degrades to a physical
 display without it).
@@ -38,7 +43,7 @@ exe into `C:\Program Files\punktfunk\` and calls that subcommand, elevated.
 lays down the built **self-contained** `.output` server (Nitro `noExternals` — deps bundled +
 tree-shaken, ~75 files, no `node_modules`) + a portable **bun**, prompts for a console login
 password (pre-filled with a secure random default, shown again on the final page; kept on upgrade),
-then `web-setup.ps1` writes the ACL'd `%ProgramData%\punktfunk\web-password`, registers the
+then `punktfunk-host.exe web setup` writes the ACL'd `%ProgramData%\punktfunk\web-password`, registers the
 **`PunktfunkWeb`** scheduled task (boot, SYSTEM, restart-on-failure → `web-run.cmd` → `bun` on
 `:3000`), opens TCP 3000, and starts it. It proxies the host's loopback mgmt API with the host's
 own `%ProgramData%\punktfunk\mgmt-token`.
@@ -66,28 +71,28 @@ read it from `%ProgramData%\punktfunk\web-password`.
 | File | Role |
 |------|------|
 | `punktfunk-host.iss` | Inno Setup script (the installer definition). |
-| `pack-host-installer.ps1` | Orchestrator: cert + sign, stage the driver + FFmpeg + **web console** (`.output` + bun) bundles, run ISCC, sign setup.exe, emit registry paths. |
-| `stage-pf-vdisplay.ps1` | Stage the **vendored** pf-vdisplay driver + fetch/verify the **pinned** nefcon release into the bundle. |
-| `install-pf-vdisplay.ps1` | Runs at install time (elevated): trust cert → gated device-node create (nefconc) → `pnputil` install. |
+| `pack-host-installer.ps1` | Orchestrator: cert + sign exe, **build + sign the drivers from source**, stage them + FFmpeg + the **web console** (`.output` + bun) + the HDR layer, run ISCC, sign setup.exe. |
+| `build-pf-vdisplay.ps1` | Build pf-vdisplay from source (the `drivers/` workspace) + clear FORCE_INTEGRITY + sign `.dll`/`.cat` + export `.cer`. |
+| `build-gamepad-drivers.ps1` | Sign + catalog the gamepad drivers (`pf-dualsense` + `pf-xusb`) from the same workspace build (`-SkipBuild`), one shared cert. |
+| `clear-force-integrity.ps1` | Clear the `/INTEGRITYCHECK` PE bit so a self-signed driver loads (reused by every driver build). |
+| `stage-pf-vdisplay.ps1` | Stage the just-built pf-vdisplay bundle + fetch/verify the **pinned** nefcon release. |
 | `../../scripts/windows/web-run.cmd` | The `PunktfunkWeb` task action: loads the mgmt token + login password env, runs the bundled `bun` on the Nitro server (`:3000`). |
-| `../../scripts/windows/web-setup.ps1` | Install-time (elevated): write the ACL'd console password, register the `PunktfunkWeb` task + firewall rule, start it. |
-| `pf-vdisplay/` | **Vendored** signed pf-vdisplay driver: `pf_vdisplay.inf` / `pf_vdisplay.cat` / `pf_vdisplay.dll` / `punktfunk-driver.cer`. Built from `drivers/`. |
 | `drivers/` | The all-Rust IddCx **driver source** workspace: the `pf-vdisplay` crate on `wdk-sys` / windows-drivers-rs + the owned `pf-driver-proto` ABI + `wdk-iddcx` / `wdk-probe`, plus `deploy-dev.ps1` (build/sign/install for dev). |
 | `reset-pf-vdisplay.ps1` | **Dev:** recover a wedged driver — stop host → reap ghost monitor nodes → reload the adapter → start host (no reboot). See *Dev iteration* below. |
 | `redeploy-pf-vdisplay.ps1` | **Dev:** one-shot redeploy — (optional) build → stop host → `deploy-dev.ps1 -Install` → reload adapter → start host. |
 | `nvenc/nvenc.def`, `nvenc/gen-nvenc-importlib.ps1` | Synthesise `nvencodeapi.lib` for the `--features nvenc` link (llvm-dlltool / lib.exe). |
 | `pf-vkhdr-layer/` | **HDR Vulkan layer** (standalone `cdylib`): lets Vulkan games (Doom: The Dark Ages, etc.) enable HDR over the virtual display by advertising the HDR surface formats the NVIDIA/AMD ICDs hide on an indirect display. Built by the packer, laid into `{app}\vklayer`, registered under `HKLM64\…\Khronos\Vulkan\ImplicitLayers` (opt-out *Install the HDR Vulkan layer* task). Self-gated on the display's HDR state. See its README. |
-> **Vendored driver:** pf-vdisplay is our **all-Rust IddCx** virtual display (UMDF2), built from
-> `packaging/windows/drivers/`. It replaced the vendored SudoVDA C++ driver — full story in
-> [`design/windows-virtual-display-rust-port.md`](../../design/windows-virtual-display-rust-port.md). The
-> **signed** output (`pf_vdisplay.dll`/`.inf`/`.cat` + `punktfunk-driver.cer`; signer
-> `punktfunk-ds-test` — the same cert the gamepad drivers ship, Class=Display, HWID `root\pf_vdisplay`)
-> is checked in under `pf-vdisplay/`. To refresh it after a driver-source change, rebuild + re-sign with
-> `drivers/deploy-dev.ps1` and copy the staged `pf_vdisplay.{dll,inf,cat}` over the vendored
-> copies. nefcon (the device-node tool — the install creates the node with it, **never** `devgen`, which
-> leaves persistent phantom devices) **is** fetched + SHA-256-verified from its pinned release in
-> `stage-pf-vdisplay.ps1`.
+> **Drivers are built from source, not vendored.** All three (pf-vdisplay + the gamepad pf-dualsense /
+> pf-xusb) are members of the all-Rust `drivers/` workspace (windows-drivers-rs / IddCx) and are
+> **rebuilt + signed every release** by `build-pf-vdisplay.ps1` + `build-gamepad-drivers.ps1` - the
+> checked-in prebuilt binaries were deleted (a stale `.cat` once stopped covering its `.inf` →
+> `SPAPI_E_FILE_HASH_NOT_IN_CATALOG` on every box, and a frozen binary predated a driver IOCTL the host
+> needed). Building from source keeps `.dll`/`.inf`/`.cat` in lockstep. nefcon (the device-node tool -
+> the install creates the `root\pf_vdisplay` node with it, **never** `devgen`, which leaves persistent
+> phantom devices) is fetched + SHA-256-verified from its pinned release in `stage-pf-vdisplay.ps1`. See
+> [`design/windows-build-and-packaging.md`](../../design/windows-build-and-packaging.md) for the toolchain
+> + signing details.
 ## Dev iteration on the test box (driver)
+130
@@ -0,0 +1,130 @@
<#
.SYNOPSIS
Build + sign the punktfunk virtual-gamepad UMDF drivers (pf-dualsense = DualSense/DualShock 4, pf-xusb =
Xbox 360 / XInput) FROM SOURCE, in CI, and stage them for the host installer. The gamepad analogue of
build-pf-vdisplay.ps1 - replaces the checked-in prebuilt binaries (packaging/windows/gamepad-drivers/)
so the .dll/.inf/.cat stay in lockstep with the source and never go stale.
.DESCRIPTION
Both drivers are members of the in-tree drivers workspace (packaging/windows/drivers/), so one
`cargo build --release` builds the whole workspace (this shares wdk-sys/wdk-build + the bindgen pin with
pf-vdisplay). Then, per driver: CLEAR the FORCE_INTEGRITY PE bit, sign the .dll, stampinf a DriverVer
into the INF; then Inf2Cat both catalogs and sign them. Both drivers share ONE self-signed cert (or a
supplied DRIVER_CERT secret) + ONE exported .cer - the layout install-gamepad-drivers.ps1 consumes
(per-driver .inf/.cat/.dll + one shared punktfunk-driver.cer).
Output (-Out): pf_dualsense.{dll,inf,cat} + pf_xusb.{dll,inf,cat} + punktfunk-driver.cer.
.EXAMPLE
pwsh -File build-gamepad-drivers.ps1 -Out C:\t\gamepad
#>
[CmdletBinding()]
param(
[string]$DriversDir = (Join-Path $PSScriptRoot 'drivers'),
[Parameter(Mandatory = $true)][string]$Out,
[string]$DriverVer,
[string]$CertPfxB64 = $env:DRIVER_CERT_PFX_B64,
[string]$CertPassword = $env:DRIVER_CERT_PASSWORD,
[switch]$SkipBuild
)
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'SilentlyContinue'
$PSNativeCommandUseErrorActionPreference = $false
$DriversDir = (Resolve-Path $DriversDir).Path
$clear = Join-Path $PSScriptRoot 'clear-force-integrity.ps1'
$drivers = @(
@{ crate = 'pf-dualsense'; dll = 'pf_dualsense.dll'; inx = 'pf-dualsense\pf_dualsense.inx'; inf = 'pf_dualsense.inf'; cat = 'pf_dualsense.cat' }
@{ crate = 'pf-xusb'; dll = 'pf_xusb.dll'; inx = 'pf-xusb\pf_xusb.inx'; inf = 'pf_xusb.inf'; cat = 'pf_xusb.cat' }
)
foreach ($d in $drivers) {
if (-not (Test-Path (Join-Path $DriversDir $d.inx))) { throw "no $($d.inx) under $DriversDir" }
}
# --- WDK build env ----------------------------------------------------------------------------
if (-not $env:Version_Number) { $env:Version_Number = '10.0.26100.0' }
if (-not $env:LIBCLANG_PATH -and (Test-Path 'C:\Program Files\LLVM\bin\libclang.dll')) {
$env:LIBCLANG_PATH = 'C:\Program Files\LLVM\bin'
}
# Build into the DEFAULT workspace target dir (not an external CARGO_TARGET_DIR) - wdk-build walks up
# from OUT_DIR for a Cargo.lock and doesn't support out-of-tree target dirs. See build-pf-vdisplay.ps1.
$rel = Join-Path $DriversDir 'target\x86_64-pc-windows-msvc\release'
# --- 1. build (release) - one build covers the whole workspace --------------------------------
if (-not $SkipBuild) {
Write-Host "==> cargo build --release (drivers workspace) in $DriversDir"
$prevTarget = $env:CARGO_TARGET_DIR
Remove-Item Env:\CARGO_TARGET_DIR -ErrorAction SilentlyContinue
Push-Location $DriversDir
& cargo build --release
$rc = $LASTEXITCODE
Pop-Location
if ($prevTarget) { $env:CARGO_TARGET_DIR = $prevTarget } else { Remove-Item Env:\CARGO_TARGET_DIR -ErrorAction SilentlyContinue }
if ($rc -ne 0) { throw "gamepad drivers cargo build failed ($rc)" }
}
foreach ($d in $drivers) {
if (-not (Test-Path (Join-Path $rel $d.dll))) { throw "driver not built: $(Join-Path $rel $d.dll)" }
}
# --- 2. WDK sign tools ------------------------------------------------------------------------
$kits = 'C:\Program Files (x86)\Windows Kits\10\bin'
function Find-Tool([string]$name, [string]$arch) {
(Get-ChildItem "$kits\*\$arch\$name" -ErrorAction SilentlyContinue | Sort-Object FullName | Select-Object -Last 1).FullName
}
$signtool = Find-Tool 'signtool.exe' 'x64'
$stampinf = Find-Tool 'stampinf.exe' 'x64'
$inf2cat = Find-Tool 'Inf2Cat.exe' 'x86'
foreach ($t in @($signtool, $stampinf, $inf2cat)) {
if (-not $t) { throw 'a WDK tool (signtool/stampinf/Inf2Cat) was not found - install the Windows 10/11 WDK.' }
}
# --- 3. signing cert (supplied stable pfx OR fresh self-signed; shared by both drivers) -------
$cleanupCert = $null
if ($CertPfxB64) {
Write-Host '==> signing with supplied driver cert (DRIVER_CERT_PFX_B64)'
$pfx = Join-Path (Split-Path -Parent $Out) 'driver-signing.pfx'
[IO.File]::WriteAllBytes($pfx, [Convert]::FromBase64String($CertPfxB64))
$sec = if ($CertPassword) { ConvertTo-SecureString $CertPassword -AsPlainText -Force } else { $null }
$signArgs = @('/f', $pfx); if ($CertPassword) { $signArgs += @('/p', $CertPassword) }
$pubForCer = if ($sec) { Get-PfxCertificate -FilePath $pfx -Password $sec } else { Get-PfxCertificate -FilePath $pfx }
}
else {
Write-Host '==> no DRIVER_CERT_PFX_B64 -> generating a fresh self-signed driver cert (the installer trusts the bundled .cer at install time)'
$cleanupCert = New-SelfSignedCertificate -Type CodeSigningCert -Subject 'CN=punktfunk-driver' `
-CertStoreLocation Cert:\CurrentUser\My -KeyExportPolicy Exportable -NotAfter (Get-Date).AddYears(10)
$signArgs = @('/sha1', $cleanupCert.Thumbprint)
$pubForCer = $cleanupCert
}
# --- 4. stage + clear FORCE_INTEGRITY + sign dlls + stampinf infs ------------------------------
if (Test-Path $Out) { Remove-Item $Out -Recurse -Force }
New-Item -ItemType Directory -Force -Path $Out | Out-Null
if (-not $DriverVer) { $now = Get-Date; $DriverVer = '9.9.{0}.{1}' -f $now.ToString('MMdd'), $now.ToString('HHmm') }
foreach ($d in $drivers) {
$sDll = Join-Path $Out $d.dll
$sInf = Join-Path $Out $d.inf
Copy-Item (Join-Path $rel $d.dll) $sDll -Force
Copy-Item (Join-Path $DriversDir $d.inx) $sInf -Force # stampinf rewrites this copy in place
& powershell -NoProfile -ExecutionPolicy Bypass -File $clear -Path $sDll | Out-Null
& $signtool sign /fd SHA256 @signArgs $sDll | Out-Null
if ($LASTEXITCODE -ne 0) { throw "signtool sign ($($d.dll)) failed ($LASTEXITCODE)" }
& $stampinf -f $sInf -d '*' -a 'amd64' -u '2.15.0' -v $DriverVer | Out-Null
if ($LASTEXITCODE -ne 0) { throw "stampinf ($($d.inf)) failed ($LASTEXITCODE)" }
}
# --- 5. Inf2Cat both catalogs (one pass over -Out), then sign each -----------------------------
& $inf2cat /driver:$Out /os:10_X64 /uselocaltime | Out-Null
foreach ($d in $drivers) {
$sCat = Join-Path $Out $d.cat
if (-not (Test-Path $sCat)) { throw "Inf2Cat did not produce $sCat" }
& $signtool sign /fd SHA256 @signArgs $sCat | Out-Null
if ($LASTEXITCODE -ne 0) { throw "signtool sign ($($d.cat)) failed ($LASTEXITCODE)" }
}
# --- 6. one shared public .cer ----------------------------------------------------------------
Export-Certificate -Cert $pubForCer -FilePath (Join-Path $Out 'punktfunk-driver.cer') | Out-Null
if ($cleanupCert) { Remove-Item "Cert:\CurrentUser\My\$($cleanupCert.Thumbprint)" -Force -ErrorAction SilentlyContinue }
Write-Host "==> built + signed gamepad drivers DriverVer=$DriverVer -> $Out"
Get-ChildItem $Out -File | ForEach-Object { " $($_.Name) ($($_.Length) bytes)" }
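
Both build scripts default `-DriverVer` to `9.9.MMdd.HHmm`. The sketch below (Rust, with hypothetical helper names `default_driver_ver` and `parse_driver_ver`) reproduces that format and compares versions field by field, assuming the four dot-separated fields are compared numerically. It illustrates a limit of the scheme: the stamp carries no year, so the ordering only holds within one calendar year, and two builds in the same minute get the same version.

```rust
/// Build the default version string in the scripts' format: 9.9.MMdd.HHmm.
pub fn default_driver_ver(month: u32, day: u32, hour: u32, minute: u32) -> String {
    format!("9.9.{:02}{:02}.{:02}{:02}", month, day, hour, minute)
}

/// Parse "a.b.c.d" into four numeric fields so versions compare
/// numerically, field by field (arrays order lexicographically).
/// Returns None unless there are exactly four numeric fields.
pub fn parse_driver_ver(s: &str) -> Option<[u32; 4]> {
    let mut out = [0u32; 4];
    let mut parts = s.split('.');
    for slot in out.iter_mut() {
        *slot = parts.next()?.parse().ok()?;
    }
    if parts.next().is_some() {
        return None; // more than four fields
    }
    Some(out)
}
```

A build stamped on 1 January parses to a lower version than one stamped on 31 December of the previous year, so passing an explicit `-DriverVer` is the safer choice around a year boundary.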
+146
@@ -0,0 +1,146 @@
<#
.SYNOPSIS
Build + sign the pf-vdisplay UMDF IddCx virtual-display driver FROM SOURCE, in CI, and stage it for the
host installer. This REPLACES the old vendored-prebuilt-binary model (packaging/windows/pf-vdisplay/) -
the binary went stale (frozen mid-June while the driver source kept moving), which silently shipped two
field bugs: (1) the catalog no longer covered the edited INF (pnputil SPAPI_E_FILE_HASH_NOT_IN_CATALOG on
every box), and (2) the binary predated IOCTL_SET_RENDER_ADAPTER that the host needs to pin the IDD render
GPU on hybrid/Optimus boxes. Building every release from source keeps the .dll/.inf/.cat in lockstep and
ships current driver features.
.DESCRIPTION
Mirrors packaging/windows/drivers/deploy-dev.ps1 but for CI (release build, output to -Out, cert from a
secret OR a fresh self-signed). Steps: cargo build (the wdk-sys/windows-drivers-rs driver workspace) ->
CLEAR the FORCE_INTEGRITY PE bit (wdk-build links /INTEGRITYCHECK, which a non-EV cert can't satisfy) ->
sign the .dll -> stampinf a strictly-increasing DriverVer into the INF -> Inf2Cat the catalog -> sign the
catalog -> export the public .cer. Output (-Out): pf_vdisplay.{dll,inf,cat} + punktfunk-driver.cer.
Requires the WDK build env: cargo + the x64 MSVC toolset, an LLVM compatible with the driver's bindgen
(>= 0.72 supports current clang), LIBCLANG_PATH, and the Windows 10/11 WDK (the runner has these). Sets
Version_Number for wdk-build if the caller didn't.
.EXAMPLE
pwsh -File build-pf-vdisplay.ps1 -Out C:\t\pfvd -DriverVer 9.9.0626.1612
#>
[CmdletBinding()]
param(
[string]$DriversDir = (Join-Path $PSScriptRoot 'drivers'),
[Parameter(Mandatory = $true)][string]$Out,
[string]$DriverVer, # default: 9.9.MMdd.HHmm (strictly-increasing)
[string]$CertPfxB64 = $env:DRIVER_CERT_PFX_B64, # optional stable driver-signing cert (CI secret)
[string]$CertPassword = $env:DRIVER_CERT_PASSWORD,
[switch]$SkipBuild # reuse an existing target\...\release\pf_vdisplay.dll
)
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'SilentlyContinue'
$PSNativeCommandUseErrorActionPreference = $false
$DriversDir = (Resolve-Path $DriversDir).Path
$inx = Join-Path $DriversDir 'pf-vdisplay\pf_vdisplay.inx'
$clear = Join-Path $PSScriptRoot 'clear-force-integrity.ps1'
if (-not (Test-Path $inx)) { throw "no pf_vdisplay.inx under $DriversDir" }
# --- WDK build env (wdk-build needs Version_Number; bindgen needs LIBCLANG_PATH) --------------
if (-not $env:Version_Number) { $env:Version_Number = '10.0.26100.0' }
if (-not $env:LIBCLANG_PATH -and (Test-Path 'C:\Program Files\LLVM\bin\libclang.dll')) {
$env:LIBCLANG_PATH = 'C:\Program Files\LLVM\bin'
}
# The driver MUST build into its DEFAULT target dir (under the driver workspace), NOT an external one:
# wdk-sys's build script calls wdk-build::find_top_level_cargo_manifest(), which walks UP from OUT_DIR
# for the first ancestor holding a Cargo.lock (it explicitly "does not support non-default target
# directories"). CI sets a shared CARGO_TARGET_DIR=C:\t, whose ancestors have no Cargo.lock -> the build
# script panics "a Cargo.lock file should exist in the same directory as the top-level Cargo.toml". So
# clear CARGO_TARGET_DIR for this build and let cargo use <driver-workspace>\target (its ancestors
# include the driver Cargo.lock). The driver has no CMake-from-source deps, so it doesn't need C:\t's
# MAX_PATH dodge, and its own [workspace] keeps it isolated from the host's tree regardless.
$drvTarget = Join-Path $DriversDir 'target'
$dll = Join-Path $drvTarget 'x86_64-pc-windows-msvc\release\pf_vdisplay.dll'
# --- 1. build (release) -----------------------------------------------------------------------
if (-not $SkipBuild) {
Write-Host "==> cargo build --release (pf-vdisplay) in $DriversDir (default target -> $drvTarget)"
$prevTarget = $env:CARGO_TARGET_DIR
Remove-Item Env:\CARGO_TARGET_DIR -ErrorAction SilentlyContinue
Push-Location $DriversDir
& cargo build --release
$rc = $LASTEXITCODE
Pop-Location
if ($prevTarget) { $env:CARGO_TARGET_DIR = $prevTarget } else { Remove-Item Env:\CARGO_TARGET_DIR -ErrorAction SilentlyContinue }
if ($rc -ne 0) { throw "pf-vdisplay cargo build failed ($rc)" }
}
if (-not (Test-Path $dll)) { throw "driver not built: $dll" }
# --- 2. WDK sign tools ------------------------------------------------------------------------
$kits = 'C:\Program Files (x86)\Windows Kits\10\bin'
function Find-Tool([string]$name, [string]$arch) {
(Get-ChildItem "$kits\*\$arch\$name" -ErrorAction SilentlyContinue | Sort-Object FullName | Select-Object -Last 1).FullName
}
$signtool = Find-Tool 'signtool.exe' 'x64'
$stampinf = Find-Tool 'stampinf.exe' 'x64'
$inf2cat = Find-Tool 'Inf2Cat.exe' 'x86'
foreach ($t in @($signtool, $stampinf, $inf2cat)) {
if (-not $t) { throw 'a WDK tool (signtool/stampinf/Inf2Cat) was not found - install the Windows 10/11 WDK.' }
}
# --- 3. signing cert (supplied stable pfx OR fresh self-signed) -------------------------------
$cleanupCert = $null
if ($CertPfxB64) {
Write-Host '==> signing with supplied driver cert (DRIVER_CERT_PFX_B64)'
$pfx = Join-Path $Out '..\driver-signing.pfx'
[IO.File]::WriteAllBytes($pfx, [Convert]::FromBase64String($CertPfxB64))
$sec = if ($CertPassword) { ConvertTo-SecureString $CertPassword -AsPlainText -Force } else { $null }
$signArgs = @('/f', $pfx); if ($CertPassword) { $signArgs += @('/p', $CertPassword) }
$pubForCer = if ($sec) { Get-PfxCertificate -FilePath $pfx -Password $sec } else { Get-PfxCertificate -FilePath $pfx }
}
else {
Write-Host '==> no DRIVER_CERT_PFX_B64 -> generating a fresh self-signed driver cert (the installer trusts the bundled .cer at install time)'
$cleanupCert = New-SelfSignedCertificate -Type CodeSigningCert -Subject 'CN=punktfunk-driver' `
-CertStoreLocation Cert:\CurrentUser\My -KeyExportPolicy Exportable -NotAfter (Get-Date).AddYears(10)
$signArgs = @('/sha1', $cleanupCert.Thumbprint)
$pubForCer = $cleanupCert
}
# --- 4. stage + clear FORCE_INTEGRITY + sign + cat --------------------------------------------
if (Test-Path $Out) { Remove-Item $Out -Recurse -Force }
New-Item -ItemType Directory -Force -Path $Out | Out-Null
$sDll = Join-Path $Out 'pf_vdisplay.dll'
$sInf = Join-Path $Out 'pf_vdisplay.inf'
$sCat = Join-Path $Out 'pf_vdisplay.cat'
$sCer = Join-Path $Out 'punktfunk-driver.cer'
Copy-Item $dll $sDll -Force
Copy-Item $inx $sInf -Force # stampinf rewrites this copy in place
# Clear FORCE_INTEGRITY BEFORE signing (it edits the PE, invalidating any signature).
& powershell -NoProfile -ExecutionPolicy Bypass -File $clear -Path $sDll | Out-Null
if (-not $DriverVer) { $now = Get-Date; $DriverVer = '9.9.{0}.{1}' -f $now.ToString('MMdd'), $now.ToString('HHmm') }
& $signtool sign /fd SHA256 @signArgs $sDll | Out-Null
if ($LASTEXITCODE -ne 0) { throw "signtool sign (dll) failed ($LASTEXITCODE)" }
& $stampinf -f $sInf -d '*' -a 'amd64' -u '2.15.0' -v $DriverVer | Out-Null
if ($LASTEXITCODE -ne 0) { throw "stampinf (pf_vdisplay.inf) failed ($LASTEXITCODE)" }
& $inf2cat /driver:$Out /os:10_X64 /uselocaltime | Out-Null
if (-not (Test-Path $sCat)) { throw "Inf2Cat did not produce $sCat" }
& $signtool sign /fd SHA256 @signArgs $sCat | Out-Null
if ($LASTEXITCODE -ne 0) { throw "signtool sign (cat) failed ($LASTEXITCODE)" }
Export-Certificate -Cert $pubForCer -FilePath $sCer | Out-Null
if ($cleanupCert) { Remove-Item "Cert:\CurrentUser\My\$($cleanupCert.Thumbprint)" -Force -ErrorAction SilentlyContinue }
# --- 5. guard: assert the freshly-built catalog covers the inf + dll ---------------------------
# Built-from-source can't drift, but this catches a botched stampinf/Inf2Cat ordering. Test-FileCatalog
# itself can't always OPEN a catalog signed by a not-yet-trusted cert (it throws UnableToOpenCatalogFile),
# so treat ITS failure as inconclusive (warn) - but a real coverage miss still fails the build.
$cat = $null
try { $cat = Test-FileCatalog -CatalogFilePath $sCat -Path $Out -FilesToSkip 'pf_vdisplay.cat', 'punktfunk-driver.cer' -Detailed }
catch { Write-Warning "catalog coverage guard inconclusive (Test-FileCatalog: $($_.Exception.Message))" }
if ($cat) {
$covered = @($cat.CatalogItems.Keys)
foreach ($need in @('pf_vdisplay.inf', 'pf_vdisplay.dll')) {
if (-not ($covered | Where-Object { $_ -like "*$need" })) {
throw "catalog coverage guard: $need is NOT in $sCat (stampinf/Inf2Cat ordering bug?)"
}
}
Write-Host " catalog covers pf_vdisplay.inf + pf_vdisplay.dll (status=$($cat.Status))"
}
Write-Host "==> built + signed pf-vdisplay DriverVer=$DriverVer -> $Out"
Get-ChildItem $Out -File | ForEach-Object { " $($_.Name) ($($_.Length) bytes)" }
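
The comment block in the script explains why `CARGO_TARGET_DIR` is cleared: the build walks up from `OUT_DIR` looking for the first ancestor that holds a `Cargo.lock`. A minimal Rust sketch of that walk-up follows. It is not wdk-build's actual code: `find_lock_root` is a hypothetical name, the lock-file check is passed in as a predicate so the example needs no real filesystem, and the paths in the usage note are made-up POSIX-style stand-ins.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` up through its ancestors and return the first
/// directory for which `has_lock` reports a Cargo.lock, if any.
pub fn find_lock_root<F>(start: &Path, has_lock: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    start
        .ancestors() // yields `start` itself first, then each parent
        .find(|dir| has_lock(*dir))
        .map(Path::to_path_buf)
}
```

With an `OUT_DIR` under `<workspace>/target/...` the walk reaches the workspace directory and succeeds. With an external target directory such as a short root-level path, no ancestor holds a `Cargo.lock`, the function returns `None`, and that corresponds to the panic the script's comment describes.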
+5 -5
@@ -1,10 +1,10 @@
 # Clear the PE FORCE_INTEGRITY bit (IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY = 0x0080) from a driver DLL.
 #
 # windows-drivers-rs / wdk-build links UMDF drivers with /INTEGRITYCHECK (sets the bit) UNCONDITIONALLY
-# (wdk-build configure_binary_build cargo::rustc-cdylib-link-arg=/INTEGRITYCHECK; no opt-out). With the
+# (wdk-build configure_binary_build -> cargo::rustc-cdylib-link-arg=/INTEGRITYCHECK; no opt-out). With the
 # bit set, Windows Code Integrity refuses to load a binary whose signature doesn't chain to a Microsoft
-# root (errors 3004/3089) so a SELF-SIGNED driver won't load. Clearing the bit (then re-signing) lets a
-# self-signed driver load under Secure Boot the same recipe the punktfunk gamepad drivers use, here as a
+# root (errors 3004/3089) - so a SELF-SIGNED driver won't load. Clearing the bit (then re-signing) lets a
+# self-signed driver load under Secure Boot - the same recipe the punktfunk gamepad drivers use, here as a
 # deterministic, idempotent, reusable step instead of a hand-run patch.
 #
 # Order in the packaging flow: cargo build -> THIS -> signtool (sign .dll) -> Inf2Cat (.cat) -> sign .cat.
@@ -28,7 +28,7 @@ $FORCE_INTEGRITY = 0x0080
 $dllchar = [BitConverter]::ToUInt16($b, $off)
 if (($dllchar -band $FORCE_INTEGRITY) -eq 0) {
-    Write-Host ("clear-force-integrity: already clear (DllCharacteristics=0x{0:X4}) no change: $Path" -f $dllchar)
+    Write-Host ("clear-force-integrity: already clear (DllCharacteristics=0x{0:X4}) - no change: $Path" -f $dllchar)
 } else {
     $new = [uint16]($dllchar -band (-bnot $FORCE_INTEGRITY))
     [BitConverter]::GetBytes($new).CopyTo($b, $off)
@@ -36,7 +36,7 @@ if (($dllchar -band $FORCE_INTEGRITY) -eq 0) {
 Write-Host ("clear-force-integrity: cleared FORCE_INTEGRITY 0x{0:X4} -> 0x{1:X4} in $Path" -f $dllchar, $new)
 }
-# Verify on disk (re-read) the assertion.
+# Verify on disk (re-read) - the assertion.
 $v = [BitConverter]::ToUInt16([IO.File]::ReadAllBytes($Path), $off)
 if (($v -band $FORCE_INTEGRITY) -ne 0) { throw ("FORCE_INTEGRITY still set after clear (0x{0:X4})" -f $v) }
 Write-Host ("clear-force-integrity: verified DllCharacteristics=0x{0:X4}, FORCE_INTEGRITY clear." -f $v)
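For readers unfamiliar with the PE field this script edits, here is a minimal Python sketch of the same clear-and-verify idea. It is not the repository's script. The offset calculation (`e_lfanew` at 0x3C, then the 4-byte signature, the 20-byte COFF header, and byte 70 of the optional header) is taken from the public PE layout rather than from this diff, and the helper names and the synthetic image are hypothetical.

```python
import struct

FORCE_INTEGRITY = 0x0080  # IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY


def dll_characteristics_offset(image):
    """Return the file offset of OptionalHeader.DllCharacteristics."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte COFF header; DllCharacteristics sits at
    # offset 70 of the optional header for both PE32 and PE32+.
    return e_lfanew + 4 + 20 + 70


def clear_force_integrity(image):
    """Clear the bit in place and return (old, new); a no-op if already clear."""
    off = dll_characteristics_offset(image)
    (old,) = struct.unpack_from("<H", image, off)
    new = old & ~FORCE_INTEGRITY & 0xFFFF
    struct.pack_into("<H", image, off, new)
    return old, new


def make_fake_pe(dll_characteristics):
    """Hypothetical helper: a header-only stand-in for a real DLL."""
    img = bytearray(0x200)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)
    img[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", img, 0x80 + 94, dll_characteristics)
    return img
```

Running `clear_force_integrity` twice on the same buffer returns the same value for old and new the second time, which is the idempotence the script's "already clear" branch reports. As in the packaging order above, any edit like this has to happen before signing, since it changes the bytes the signature covers.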
+24 -6
@@ -63,15 +63,15 @@ dependencies = [
 [[package]]
 name = "anyhow"
-version = "1.0.102"
+version = "1.0.103"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"
+checksum = "2a4385e2e34eb35d6b3efe798b9eb88096925d87726c0798709bf56d9ed84af3"
 [[package]]
 name = "bindgen"
-version = "0.71.1"
+version = "0.72.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5f58bf3d7db68cfbac37cfc485a8d711e87e064c3d0fe0435b92f7a407f9d6b3"
+checksum = "993776b509cfb49c750f11b8f07a46fa23e0a1386ffc01fb1e7d343efc387895"
 dependencies = [
 "bitflags",
 "cexpr",
@@ -394,6 +394,22 @@ version = "1.0.15"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a"
+[[package]]
+name = "pf-driver-proto"
+version = "0.0.1"
+dependencies = [
+"bytemuck",
+]
+[[package]]
+name = "pf-dualsense"
+version = "0.0.1"
+dependencies = [
+"wdk",
+"wdk-build",
+"wdk-sys",
+]
 [[package]]
 name = "pf-vdisplay"
 version = "0.0.1"
@@ -408,10 +424,12 @@ dependencies = [
 ]
 [[package]]
-name = "pf-driver-proto"
+name = "pf-xusb"
 version = "0.0.1"
 dependencies = [
-"bytemuck",
+"wdk",
+"wdk-build",
+"wdk-sys",
 ]
 [[package]]
+1 -1
@@ -7,7 +7,7 @@
 # crates/pf-driver-proto from the main tree.
 [workspace]
 resolver = "2"
-members = ["wdk-probe", "wdk-iddcx", "pf-vdisplay"]
+members = ["wdk-probe", "wdk-iddcx", "pf-vdisplay", "pf-dualsense", "pf-xusb"]
 [workspace.package]
 edition = "2024"
+4 -4
@@ -15,8 +15,8 @@
 Version_Number=10.0.26100.0, run `cargo build`.
 Re-deploying needs a HIGHER DriverVer than the installed one or pnputil silently keeps the old binary —
-hence the 9.9.MMdd.HHmm scheme (the vendored build is 9.5.*). If the host service is running it holds the
-driver: `punktfunk-host service stop`, deploy, then start it, for a clean test.
+hence the 9.9.MMdd.HHmm scheme (also what the installer build uses; a later-minute dev redeploy wins).
+If the host service is running it holds the driver: `punktfunk-host service stop`, deploy, then start it.
 .PARAMETER Install
 Also add the driver package to the store + (if absent) create the Root\pf_vdisplay devnode via nefconc.
 Needs an ELEVATED shell.
@@ -25,7 +25,7 @@
 param(
 [string]$Stage = 'C:\Users\Public\pfvd-stage-deploy',
 [string]$Thumbprint = '6A52984E54376C45A1C236B1A2C8A746C5AB6131',
-[string]$Nefconc = 'C:\Users\Public\virtual-display-rs\installer\files\nefconc.exe',
+[string]$Nefconc = 'C:\Users\Public\nefcon\x64\nefconc.exe', # pinned nefcon (stage-pf-vdisplay.ps1 fetches it)
 [switch]$Install
 )
 $ErrorActionPreference = 'Stop'
@@ -56,7 +56,7 @@ Copy-Item $inx $stagedInf -Force # stampinf rewrites this copy in place
 # Clear FORCE_INTEGRITY BEFORE signing (the clear edits the PE, which invalidates any signature).
 & $clear -Path $stagedDll | Out-Null
-# DriverVer must strictly increase. Installed is 9.5.* — 9.9.MMdd.HHmm always wins on the same day.
+# DriverVer must strictly increase past whatever is installed; 9.9.MMdd.HHmm bumps every minute.
 $now = Get-Date
 $ver = '9.9.{0}.{1}' -f $now.ToString('MMdd'), $now.ToString('HHmm')
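The `9.9.MMdd.HHmm` scheme above can be sketched in Python to show why a later-minute build outranks an earlier one. This is an illustration only: `driver_ver` and `ver_key` are hypothetical names, and the field-by-field numeric comparison is an assumption about how the four-part version is ordered, not something this diff states.

```python
from datetime import datetime


def driver_ver(now):
    """Format a build time the way the deploy script does: 9.9.MMdd.HHmm."""
    return "9.9.{}.{}".format(now.strftime("%m%d"), now.strftime("%H%M"))


def ver_key(ver):
    """Split a dotted version into integers so fields compare numerically."""
    return tuple(int(part) for part in ver.split("."))


earlier = driver_ver(datetime(2026, 6, 26, 23, 18))
later = driver_ver(datetime(2026, 6, 26, 23, 19))
```

Under that assumption `later` sorts above `earlier`, and both sort above any `9.5.*` version. Because the scheme carries no year, a January build would compare lower than one from the previous December.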
@@ -0,0 +1,30 @@
# pf-dualsense - punktfunk virtual DualSense (DS5) / DualShock 4 (DS4) UMDF2 HID minidriver.
# A member of the in-tree drivers workspace (shares the vendored wdk-sys/wdk-build with the bindgen pin
# + the crt-static .cargo/config), built from source per release like pf-vdisplay.
[package]
name = "pf-dualsense"
edition.workspace = true
version.workspace = true
license.workspace = true
publish = false
description = "punktfunk virtual DualSense / DualShock 4 UMDF2 HID minidriver"
[package.metadata.wdk.driver-model]
driver-type = "UMDF"
umdf-version-major = 2
target-umdf-version-minor = 31
[lib]
crate-type = ["cdylib"]
[build-dependencies]
wdk-build.workspace = true
[dependencies]
wdk.workspace = true
wdk-sys.workspace = true
[features]
default = ["hid"]
hid = ["wdk-sys/hid"]
nightly = ["wdk-sys/nightly", "wdk/nightly"]
@@ -24,8 +24,9 @@ version control / portability of the spike.
 ## Build / sign / install recipe (the one that actually loads)
-Prereqs on the Windows box: **WDK 26100**, **LLVM 21.1.2** (pinned — newer bindgen breaks),
-`cargo-make`, Rust MSVC. A self-signed CodeSigning cert in `CurrentUser\My` + `LocalMachine\Root` +
+Prereqs on the Windows box: **WDK 26100**, **LLVM** (the current default; bindgen 0.72 builds on clang
+22), Rust MSVC. Built as a member of the `packaging/windows/drivers/` workspace (plain `cargo build`, no
+cargo-make). A self-signed CodeSigning cert in `CurrentUser\My` + `LocalMachine\Root` +
 `TrustedPublisher`.
 Every build needs:
@@ -0,0 +1,29 @@
# pf-xusb - punktfunk virtual Xbox 360 XUSB companion (UMDF2, classic XInput).
# A member of the in-tree drivers workspace (shares the vendored wdk-sys/wdk-build with the bindgen pin
# + the crt-static .cargo/config), built from source per release like pf-vdisplay.
[package]
name = "pf-xusb"
edition.workspace = true
version.workspace = true
license.workspace = true
publish = false
description = "punktfunk virtual Xbox 360 XUSB companion (UMDF2 - classic XInput)"
[package.metadata.wdk.driver-model]
driver-type = "UMDF"
umdf-version-major = 2
target-umdf-version-minor = 31
[lib]
crate-type = ["cdylib"]
[build-dependencies]
wdk-build.workspace = true
[dependencies]
wdk.workspace = true
wdk-sys.workspace = true
[features]
default = []
nightly = ["wdk-sys/nightly", "wdk/nightly"]
@@ -66,8 +66,8 @@ there); these repo files are the canonical copies — keep them in sync.
 `crates/punktfunk-host/src/inject/gamepad_windows.rs` is the Windows `GamepadManager` (used by
 `PadBackend::Xbox360`): it SwDeviceCreate's the `pf_xusb` companion, maps `pfxusb-shm-<index>`, writes
 the XInput state from the client's gamepad frame (already XInput-convention) and forwards rumble. There
-is **no ViGEmBus dependency** anymore. The driver is vendored + pnputil-installed by the Inno Setup
-installer (`packaging/windows/gamepad-drivers/` + `install-gamepad-drivers.ps1`).
+is **no ViGEmBus dependency** anymore. The driver is built from source (`packaging/windows/drivers/pf-xusb`),
+signed, and pnputil-installed by the Inno Setup installer (via `install-gamepad-drivers.ps1`).
 ## Multi-pad
+1 -1
@@ -48,7 +48,7 @@ path = "src/lib.rs"
 version = "1.0.97"
 [dependencies.bindgen]
-version = "0.71.0"
+version = "0.72"
 [dependencies.camino]
 version = "1.1.9"
+1 -1
@@ -67,7 +67,7 @@ version = "=0.5.1"
 version = "1.0.97"
 [build-dependencies.bindgen]
-version = "0.71.0"
+version = "0.72"
 [build-dependencies.cargo_metadata]
 version = "0.19.2"
@@ -1,36 +0,0 @@
[package]
edition = "2024"
name = "pf-dualsense"
version = "0.1.0"
publish = false
license = "MIT OR Apache-2.0"
description = "punktfunk virtual DualSense UMDF2 HID minidriver (M0 spike)"
[package.metadata.wdk.driver-model]
driver-type = "UMDF"
target-umdf-version-minor = 31
umdf-version-major = 2
[lib]
crate-type = ["cdylib"]
[build-dependencies]
wdk-build.path = "../../crates/wdk-build"
[dependencies]
wdk.path = "../../crates/wdk"
wdk-sys.path = "../../crates/wdk-sys"
[features]
default = ["hid"]
hid = ["wdk-sys/hid"]
nightly = ["wdk-sys/nightly", "wdk/nightly"]
[profile.dev]
lto = true
[profile.release]
lto = true
# Standalone package (not part of the windows-drivers-rs root workspace).
[workspace]
@@ -1,4 +0,0 @@
extend = [
{ path = "../../crates/wdk-build/rust-driver-makefile.toml" },
{ path = "../../crates/wdk-build/rust-driver-sample-makefile.toml" },
]
Binary file not shown.
Binary file not shown.
@@ -1,82 +0,0 @@
;/*++
; punktfunk virtual DualSense — UMDF2 HID minidriver INF (M0 spike).
; Adapted from the WDK vhidmini2 UMDF2 sample (VhidminiUm.inx).
; Depends on MsHidUmdf.inf (build >= 22000).
; Install: devgen /add /hardwareid "root\pf_dualsense" (after pnputil /add-driver /install)
;--*/
[Version]
Signature="$WINDOWS NT$"
Class=HIDClass
ClassGuid={745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Provider=%ProviderString%
CatalogFile = pf_dualsense.cat
PnpLockdown=1
DriverVer = 06/22/2026,16.23.43.887
[DestinationDirs]
DefaultDestDir = 13
[SourceDisksNames]
1=%Disk_Description%,,,
[SourceDisksFiles]
pf_dualsense.dll=1
[Manufacturer]
%ManufacturerString%=pf, NTamd64.10.0...22000
[pf.NTamd64.10.0...22000]
; Hardware ids: `root\pf_dualsense` for a root-enumerated devnode (devgen/devcon tests); `pf_dualsense`
; for the host's SwDeviceCreate'd DualSense (the `root\` prefix is reserved for root enumeration, so
; SwDeviceCreate rejects it with E_INVALIDARG); `pf_dualshock4` for the host's virtual DualShock 4 — the
; same driver binds both and serves the DualSense or DS4 identity per the device_type byte the host
; stamps into shared memory.
%DeviceDesc%=pfDualSense, root\pf_dualsense, pf_dualsense, pf_dualshock4
[pfDualSense.NT]
CopyFiles=UMDriverCopy
Include=MsHidUmdf.inf
Needs=MsHidUmdf.NT
Include=WUDFRD.inf
Needs=WUDFRD_LowerFilter.NT
[pfDualSense.NT.hw]
Include=MsHidUmdf.inf
Needs=MsHidUmdf.NT.hw
Include=WUDFRD.inf
Needs=WUDFRD_LowerFilter.NT.hw
[pfDualSense.NT.Services]
Include=MsHidUmdf.inf
Needs=MsHidUmdf.NT.Services
Include=WUDFRD.inf
Needs=WUDFRD_LowerFilter.NT.Services
[pfDualSense.NT.Filters]
Include=WUDFRD.inf
Needs=WUDFRD_LowerFilter.NT.Filters
[pfDualSense.NT.Wdf]
UmdfService="pf_dualsense", pf_dualsense_Install
UmdfServiceOrder=pf_dualsense
UmdfKernelModeClientPolicy=AllowKernelModeClients
UmdfFileObjectPolicy=AllowNullAndUnknownFileObjects
UmdfMethodNeitherAction=Copy
UmdfFsContextUsePolicy=CanUseFsContext2
; Each pad gets its OWN WUDFHost so the driver's per-pad statics (incl. the shm index) don't collide
; across multiple simultaneous controllers (multi-pad).
UmdfHostProcessSharing=ProcessSharingDisabled
[pf_dualsense_Install]
UmdfLibraryVersion=2.31.0
ServiceBinary="%13%\pf_dualsense.dll"
[UMDriverCopy]
pf_dualsense.dll
[Strings]
ProviderString ="punktfunk"
ManufacturerString ="punktfunk"
ClassName ="HID device"
Disk_Description ="punktfunk DualSense Installation Disk"
DeviceDesc ="punktfunk Virtual DualSense"
Binary file not shown.
Binary file not shown.
@@ -1,66 +0,0 @@
;/*++
; punktfunk virtual Xbox 360 XUSB companion — a non-HID UMDF2 driver that registers the XUSB
; device-interface GUID {EC87F1E3-...} and answers the buffered XInput IOCTLs, so classic
; XInputGetState() reads the pad without a kernel bus driver (the HIDMaestro approach). System class,
; hosted by the in-box WUDFRd reflector. Created per-session by the host via SwDeviceCreate
; (hardware id `pf_xusb`); `root\pf_xusb` is the devgen/devcon test id.
;--*/
[Version]
Signature = "$WINDOWS NT$"
Class = System
ClassGuid = {4D36E97D-E325-11CE-BFC1-08002BE10318}
Provider = %ProviderString%
CatalogFile = pf_xusb.cat
PnpLockdown = 1
DriverVer = 06/22/2026,16.17.56.696
[DestinationDirs]
DefaultDestDir = 13
[SourceDisksNames]
1 = %DiskId1%,,,""
[SourceDisksFiles]
pf_xusb.dll = 1,,
[Manufacturer]
%StdMfg%=Standard, NTamd64.10.0...22000
[Standard.NTamd64.10.0...22000]
%DeviceDesc%=pfXusb, root\pf_xusb, pf_xusb
[pfXusb.NT]
CopyFiles=Drivers_Dir
Include=WUDFRD.inf
Needs=WUDFRD.NT
[Drivers_Dir]
pf_xusb.dll
[pfXusb.NT.HW]
Include=WUDFRD.inf
Needs=WUDFRD.NT.HW
[pfXusb.NT.Services]
Include=WUDFRD.inf
Needs=WUDFRD.NT.Services
[pfXusb.NT.Wdf]
UmdfService=pf_xusb, pfXusb_Install
UmdfServiceOrder=pf_xusb
UmdfKernelModeClientPolicy=AllowKernelModeClients
UmdfFileObjectPolicy=AllowNullAndUnknownFileObjects
UmdfMethodNeitherAction=Copy
UmdfFsContextUsePolicy=CanUseFsContext2
UmdfHostProcessSharing=ProcessSharingDisabled
[pfXusb_Install]
UmdfLibraryVersion=2.31.0
ServiceBinary=%13%\pf_xusb.dll
[Strings]
ProviderString = "punktfunk"
StdMfg = "(Standard system devices)"
DiskId1 = "punktfunk XUSB Installation Disk"
DeviceDesc = "punktfunk Virtual Xbox 360 (XUSB)"
@@ -1,50 +0,0 @@
<#
.SYNOPSIS
Install the bundled punktfunk virtual-gamepad UMDF drivers - pf_dualsense (DualSense + DualShock 4,
one type-aware HID driver) and pf_xusb (Xbox 360 XUSB companion for classic XInput). Runs ELEVATED
at setup time (invoked from the installer's [Run] section). Best-effort: warns and exits 0 on any
failure, so a driver hiccup never aborts the whole install (gamepad input degrades gracefully - a
session still streams without a pad).
.DESCRIPTION
-Dir holds the staged payload: pf_dualsense.{inf,cat,dll}, pf_xusb.{inf,cat,dll}, and the signing
.cer. Steps:
1. Trust the self-signed driver cert (machine Root + TrustedPublisher) so pnputil adds it silently.
2. pnputil /add-driver each .inf - adds the package to the driver store. (No /install or device-node
creation: the host SwDeviceCreate's the per-session devnodes itself when a client forwards a pad,
so PnP binds the store driver on demand.)
ASCII-only on purpose: this is run by the installer via Windows PowerShell 5.1, which mis-decodes a
BOM-less UTF-8 non-ASCII char (e.g. an em-dash) as a smart-quote and breaks parsing.
.EXAMPLE
powershell -ExecutionPolicy Bypass -File install-gamepad-drivers.ps1 -Dir C:\path\to\gamepad
#>
[CmdletBinding()]
param([string]$Dir = $PSScriptRoot)
# Never abort the installer on a driver failure.
$ErrorActionPreference = 'Continue'
trap { Write-Warning "gamepad driver install error: $_"; exit 0 }
# 1) Trust the self-signed driver cert (Root so the chain validates + TrustedPublisher so pnputil adds
# it without a prompt).
$cer = Get-ChildItem -Path $Dir -Filter *.cer -ErrorAction SilentlyContinue | Select-Object -First 1
if ($cer) {
Write-Host "==> importing $($cer.Name) to Root + TrustedPublisher"
certutil.exe -addstore -f Root "$($cer.FullName)" | Out-Null
certutil.exe -addstore -f TrustedPublisher "$($cer.FullName)" | Out-Null
}
else { Write-Warning "no .cer in $Dir; drivers may not install silently (untrusted publisher)" }
# 2) Add each driver package to the store (idempotent; re-adding the same .inf is harmless).
$infs = Get-ChildItem -Path $Dir -Filter *.inf -ErrorAction SilentlyContinue
if (-not $infs) { Write-Warning "no driver .inf in $Dir; skipping gamepad driver install."; exit 0 }
foreach ($inf in $infs) {
Write-Host "==> pnputil /add-driver $($inf.Name)"
& pnputil.exe /add-driver "$($inf.FullName)"
$rc = $LASTEXITCODE
if ($rc -eq 3010) { Write-Host " added; a reboot is recommended." }
elseif ($rc -ne 0) { Write-Warning "pnputil /add-driver $($inf.Name) returned $rc" }
}
exit 0
-81
@@ -1,81 +0,0 @@
<#
.SYNOPSIS
Install the bundled pf-vdisplay (punktfunk) virtual-display driver — our all-Rust IddCx replacement
for SudoVDA. Runs ELEVATED at setup time (invoked from the installer's [Run] section). Best-effort:
warns and exits 0 on any failure — the host degrades to a physical display without a virtual display,
so a driver hiccup must never abort the whole install.
.DESCRIPTION
-Dir holds the staged payload (pf_vdisplay.inf/.cat/.dll + signing .cer + nefconc.exe). Steps:
1. Trust the self-signed driver cert (machine Root + TrustedPublisher) so PnP installs it silently
(the same punktfunk-ds-test cert the gamepad drivers ship).
2. Create the ROOT device node IF ABSENT (gated — a blind re-create spawns a phantom duplicate, and
the host's open_device() binds interface index 0; crates/punktfunk-host/src/vdisplay/sudovda.rs).
ALWAYS via nefconc (a clean ROOT\DISPLAY node) — NEVER devgen, which makes persistent SWD\DEVGEN
software devices that survive reboot + registry deletion and resurrect on every driver install.
3. Stage + bind the driver (pnputil /add-driver /install — modern, in-box, idempotent).
Class/ClassGuid are read from the .inf so they always match the shipped driver.
.EXAMPLE
powershell -ExecutionPolicy Bypass -File install-pf-vdisplay.ps1 -Dir C:\path\to\pf-vdisplay
#>
[CmdletBinding()]
param(
[string]$Dir = $PSScriptRoot,
[string]$HardwareId = 'root\pf_vdisplay' # matches pf_vdisplay.inf [Standard.NTamd64]
)
# Never abort the installer on a driver failure.
$ErrorActionPreference = 'Continue'
trap { Write-Warning "pf-vdisplay install error: $_"; exit 0 }
function Test-PfVdisplayPresent {
$devs = Get-PnpDevice -Class Display -PresentOnly -ErrorAction SilentlyContinue
foreach ($d in $devs) {
$hw = (Get-PnpDeviceProperty -InstanceId $d.InstanceId -KeyName 'DEVPKEY_Device_HardwareIds' `
-ErrorAction SilentlyContinue).Data
if ($hw -and ($hw | Where-Object { $_ -ieq $HardwareId })) { return $true }
}
return $false
}
$inf = Get-ChildItem -Path $Dir -Filter pf_vdisplay.inf -Recurse -ErrorAction SilentlyContinue | Select-Object -First 1
$cer = Get-ChildItem -Path $Dir -Filter *.cer -Recurse -ErrorAction SilentlyContinue | Select-Object -First 1
$nef = Get-ChildItem -Path $Dir -Filter 'nefconc.exe' -Recurse -ErrorAction SilentlyContinue | Select-Object -First 1
if (-not $inf) { Write-Warning "no pf_vdisplay.inf in $Dir; skipping driver install."; exit 0 }
Write-Host "pf-vdisplay inf: $($inf.FullName)"
# 1) Trust the self-signed driver cert (a self-signed driver needs the cert in BOTH the machine Root
# store, so the chain validates, and TrustedPublisher, so PnP installs without a prompt).
if ($cer) {
Write-Host "==> importing $($cer.Name) to Root + TrustedPublisher"
certutil.exe -addstore -f Root "$($cer.FullName)" | Out-Null
certutil.exe -addstore -f TrustedPublisher "$($cer.FullName)" | Out-Null
}
else { Write-Warning "no .cer in $Dir — driver may not install silently (untrusted publisher)" }
# 2) Create the root device node only if it isn't already there. nefconc, NEVER devgen.
if (Test-PfVdisplayPresent) {
Write-Host "pf-vdisplay device node already present — leaving it as-is."
}
elseif ($nef) {
$infText = Get-Content -Raw $inf.FullName
$classGuid = ([regex]::Match($infText, '(?im)^\s*ClassGuid\s*=\s*(\{[0-9A-Fa-f-]+\})')).Groups[1].Value
$className = ([regex]::Match($infText, '(?im)^\s*Class\s*=\s*([^\s;]+)')).Groups[1].Value
if (-not $classGuid) { $classGuid = '{4d36e968-e325-11ce-bfc1-08002be10318}'; $className = 'Display' } # Display class
Write-Host "==> nefconc --create-device-node hwid=$HardwareId class=$className $classGuid"
& $nef.FullName --create-device-node --hardware-id $HardwareId --class-name $className --class-guid $classGuid
if ($LASTEXITCODE -ne 0 -and $LASTEXITCODE -ne 3010) {
Write-Warning "nefconc --create-device-node returned $LASTEXITCODE"
}
}
else { Write-Warning "nefconc.exe not found in $Dir — cannot create the pf-vdisplay device node." }
# 3) Stage + bind the driver (idempotent; re-staging the same .inf is harmless).
Write-Host "==> pnputil /add-driver $($inf.Name) /install"
& pnputil.exe /add-driver "$($inf.FullName)" /install
$rc = $LASTEXITCODE
if ($rc -eq 3010) { Write-Host " driver installed; a reboot is recommended." }
elseif ($rc -ne 0) { Write-Warning "pnputil /add-driver returned $rc" }
exit 0
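The removed script read `Class` and `ClassGuid` straight out of the .inf so the device node always matched the shipped driver. A minimal Python sketch of that lookup follows, reusing the two regular expressions from the script and its Display-class fallback. The function name `inf_class` and the sample text are hypothetical; the sample borrows the `[Version]` values from the pf_xusb INF shown earlier in this diff.

```python
import re

# Same patterns as the PowerShell script: case-insensitive, multiline.
CLASS_GUID_RE = re.compile(r"(?im)^\s*ClassGuid\s*=\s*(\{[0-9A-Fa-f-]+\})")
CLASS_NAME_RE = re.compile(r"(?im)^\s*Class\s*=\s*([^\s;]+)")
DISPLAY_CLASS_GUID = "{4d36e968-e325-11ce-bfc1-08002be10318}"


def inf_class(inf_text):
    """Return (class_name, class_guid), falling back to the Display class."""
    guid = CLASS_GUID_RE.search(inf_text)
    name = CLASS_NAME_RE.search(inf_text)
    if not guid:
        # Mirrors the script: a missing ClassGuid resets both values.
        return "Display", DISPLAY_CLASS_GUID
    return (name.group(1) if name else ""), guid.group(1)


SAMPLE_INF = """[Version]
Signature = "$WINDOWS NT$"
Class = System
ClassGuid = {4D36E97D-E325-11CE-BFC1-08002BE10318}
"""
```

The `Class` pattern does not accidentally match the `ClassGuid` line, because it requires `=` (after optional whitespace) directly after the word `Class`.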
+36 -30
@@ -5,7 +5,7 @@
 .DESCRIPTION
 From a release `cargo build -p punktfunk-host --features nvenc` output (the exe), this:
 1. resolves a code-signing cert (supplied stable .pfx from CI secrets OR an ephemeral self-signed
-CN=unom same scheme as the client's pack-msix.ps1) and exports the public .cer,
+CN=unom - same scheme as the client's pack-msix.ps1) and exports the public .cer,
 2. signs the inner punktfunk-host.exe,
 3. stages the pf-vdisplay virtual-display driver bundle (unless -NoDriver),
 4. runs ISCC to build punktfunk-host-setup-<ver>.exe,
@@ -52,7 +52,7 @@ function Find-Iscc {
 }
 $c = Get-Command iscc -ErrorAction SilentlyContinue
 if ($c) { return $c.Source }
-throw "ISCC.exe (Inno Setup 6, any 6.x) not found install it (choco install innosetup -y)."
+throw "ISCC.exe (Inno Setup 6, any 6.x) not found - install it (choco install innosetup -y)."
 }
 function Find-SdkTool([string]$name) {
 $root = 'C:\Program Files (x86)\Windows Kits\10\bin'
@@ -60,7 +60,7 @@ function Find-SdkTool([string]$name) {
 Where-Object { $_.FullName -match '\\(10\.0\.\d+\.\d+)\\x64\\' } |
 Sort-Object { [version]([regex]::Match($_.FullName, '\\(10\.0\.\d+\.\d+)\\x64\\').Groups[1].Value) } |
 Select-Object -Last 1
-if (-not $hit) { throw "$name not found under $root install the Windows 10/11 SDK." }
+if (-not $hit) { throw "$name not found under $root - install the Windows 10/11 SDK." }
 $hit.FullName
 }
 $iscc = Find-Iscc
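`Find-SdkTool` above picks the newest SDK by casting the version segment of each path to `[version]` before sorting. The Python sketch below shows why the numeric comparison matters. The function name and the candidate paths are hypothetical; only the regular expression is taken from the script.

```python
import re

# Same pattern as the script: a 10.0.x.y directory followed by \x64\.
SDK_BIN_RE = re.compile(r"\\(10\.0\.\d+\.\d+)\\x64\\")


def pick_newest_sdk_tool(paths):
    """Return the path whose SDK version is numerically highest."""
    candidates = []
    for path in paths:
        match = SDK_BIN_RE.search(path)
        if match:
            version = tuple(int(n) for n in match.group(1).split("."))
            candidates.append((version, path))
    if not candidates:
        raise FileNotFoundError("no tool under a 10.0.x.y\\x64 SDK bin directory")
    return max(candidates)[1]


ROOT = "C:\\Program Files (x86)\\Windows Kits\\10\\bin"
PATHS = [
    ROOT + "\\10.0.9600.0\\x64\\signtool.exe",
    ROOT + "\\10.0.26100.0\\x64\\signtool.exe",
    ROOT + "\\10.0.22621.0\\x86\\signtool.exe",  # wrong architecture, filtered out
]
```

A plain string sort would rank `10.0.9600.0` above `10.0.26100.0`; comparing the parsed integers selects 26100, which is the effect of the `[version]` cast in the PowerShell.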
@@ -103,7 +103,7 @@ function Sign-File([string]$Path) {
 if ($PfxPassword) { $signArgs += @('/p', $PfxPassword) }
 & $signtool ($signArgs + @('/tr', 'http://timestamp.digicert.com', '/td', 'SHA256', $Path))
 if ($LASTEXITCODE -ne 0) {
-Write-Warning "timestamped sign failed for $Path retrying without a timestamp"
+Write-Warning "timestamped sign failed for $Path - retrying without a timestamp"
 & $signtool ($signArgs + @($Path))
 if ($LASTEXITCODE -ne 0) { throw "signtool sign failed for $Path ($LASTEXITCODE)" }
 }
@@ -140,37 +140,45 @@ $defines = @(
"/DReadme=$readme" "/DReadme=$readme"
 )
-# --- stage the pf-vdisplay virtual-display driver bundle --------------------------------------
-# pf-vdisplay is our all-Rust IddCx driver (packaging/windows/drivers/), vendored signed under
-# packaging/windows/pf-vdisplay/. It replaced the vendored SudoVDA C++ driver.
+# --- build (from source) + stage the pf-vdisplay virtual-display driver -----------------------
+# pf-vdisplay is our all-Rust IddCx driver (packaging/windows/drivers/). It is now BUILT FROM SOURCE
+# every release (build-pf-vdisplay.ps1) instead of shipping a checked-in prebuilt binary: the vendored
+# binary went stale (its .cat stopped covering an edited .inf -> pnputil SPAPI_E_FILE_HASH_NOT_IN_CATALOG
+# on every box, and it predated IOCTL_SET_RENDER_ADAPTER the host needs on hybrid/Optimus GPUs). Building
+# here keeps the .dll/.inf/.cat in lockstep + ships current driver features. stage-pf-vdisplay.ps1 then
+# adds the fetched nefcon device tool. (Needs the WDK build env; -NoDriver skips it for a WDK-less pack.)
 if (-not $NoDriver) {
+$built = Join-Path $OutDir 'pfvd-built'
+& (Join-Path $here 'build-pf-vdisplay.ps1') -Out $built
 $stage = Join-Path $OutDir 'stage'
-& (Join-Path $here 'stage-pf-vdisplay.ps1') -OutDir $stage
-Copy-Item (Join-Path $here 'install-pf-vdisplay.ps1') (Join-Path $stage 'install-pf-vdisplay.ps1') -Force
+& (Join-Path $here 'stage-pf-vdisplay.ps1') -OutDir $stage -VendorDir $built
+# The installer runs `punktfunk-host.exe driver install --dir {tmp}\pfvdisplay` (not a staged .ps1).
 $defines += "/DStageDir=$stage"
 }
 else { Write-Host "-NoDriver: building installer WITHOUT the bundled pf-vdisplay driver" }
-# --- stage the punktfunk virtual-gamepad UMDF drivers (DualSense/DS4 + Xbox 360 XUSB) ----------
-# Vendored, pre-signed under packaging/windows/gamepad-drivers/ (like pf-vdisplay). Rebuild + re-vendor
-# from packaging/windows/{dualsense,xusb}-driver/ when the driver source changes (see their READMEs).
+# --- build (from source) + stage the punktfunk virtual-gamepad UMDF drivers --------------------
+# pf-dualsense (DualSense / DualShock 4) + pf-xusb (Xbox 360 / XInput) are members of the same drivers
+# workspace as pf-vdisplay, built from source per release (build-gamepad-drivers.ps1) - same anti-stale
+# reasoning as pf-vdisplay; the prior checked-in binaries under gamepad-drivers/ are retired. The
+# installer adds each to the store via `punktfunk-host.exe driver install --gamepad` (the host
+# SwDeviceCreate's the per-session devnodes).
 if (-not $NoDriver) {
-$gpVendor = Join-Path $here 'gamepad-drivers'
-if (Test-Path (Join-Path $gpVendor 'pf_dualsense.inf')) {
-$gpStage = Join-Path $OutDir 'gamepad'
-if (Test-Path $gpStage) { Remove-Item -Recurse -Force $gpStage }
-New-Item -ItemType Directory -Force -Path $gpStage | Out-Null
-Copy-Item (Join-Path $gpVendor '*') $gpStage -Force
-Copy-Item (Join-Path $here 'install-gamepad-drivers.ps1') (Join-Path $gpStage 'install-gamepad-drivers.ps1') -Force
-$defines += "/DGamepadStageDir=$gpStage"
-Write-Host "==> staged vendored gamepad UMDF drivers from $gpVendor"
-}
-else { Write-Warning "no vendored gamepad drivers under $gpVendor — installer built WITHOUT them" }
+$gpBuilt = Join-Path $OutDir 'gamepad-built'
+# -SkipBuild: build-pf-vdisplay.ps1 above already `cargo build`s the WHOLE drivers workspace (incl.
+# the gamepad cdylibs), so just sign+stage them here - no redundant second full build.
+& (Join-Path $here 'build-gamepad-drivers.ps1') -Out $gpBuilt -SkipBuild
+$gpStage = Join-Path $OutDir 'gamepad'
+if (Test-Path $gpStage) { Remove-Item -Recurse -Force $gpStage }
+New-Item -ItemType Directory -Force -Path $gpStage | Out-Null
+Copy-Item (Join-Path $gpBuilt '*') $gpStage -Force
+$defines += "/DGamepadStageDir=$gpStage"
+Write-Host "==> built + staged gamepad UMDF drivers -> $gpStage"
 }
 # --- stage the FFmpeg shared DLLs (AMD/Intel AMF/QSV build) ------------------------------------
 # A host built with --features amf-qsv link-imports avcodec/avutil/swscale/... so the shared DLLs
-# MUST sit next to the exe (it won't start otherwise). Bundle them from $FfmpegDir\bin the same
+# MUST sit next to the exe (it won't start otherwise). Bundle them from $FfmpegDir\bin - the same
 # BtbN gpl-shared tree the build linked against. A nvenc/software-only build doesn't import them, so
 # this is a harmless extra there; skipped entirely when $FfmpegDir is unset.
 $ffmpegBinSrc = if ($FfmpegDir) { Join-Path $FfmpegDir 'bin' } else { $null }
@@ -190,7 +198,7 @@ else { Write-Host "no FFMPEG_DIR\bin -> installer built WITHOUT FFmpeg DLLs (nve
 # The console runs as the PunktfunkWeb scheduled task (`bun {app}\web\.output\server\index.mjs`),
 # auto-wired to the host's loopback mgmt API. Stage everything ISCC reads into $OutDir (the
 # non-WOW64-redirected C:\t area, same reason as the .iss/host.env staging above). The .output is
-# self-contained (Nitro noExternals deps bundled + tree-shaken, no node_modules), so bun runs it
+# self-contained (Nitro noExternals - deps bundled + tree-shaken, no node_modules), so bun runs it
 # directly; omitted when -WebDir/-BunExe are unset (host-only installer, e.g. a local debug pack).
 if ($WebDir -and (Test-Path $WebDir) -and $BunExe -and (Test-Path $BunExe)) {
 $webStage = Join-Path $OutDir 'web'
@@ -200,20 +208,18 @@ if ($WebDir -and (Test-Path $WebDir) -and $BunExe -and (Test-Path $BunExe)) {
 $bunStage = Join-Path $OutDir 'bun.exe'
 Copy-Item -LiteralPath $BunExe -Destination $bunStage -Force
 $webRun = Join-Path $OutDir 'web-run.cmd'
-$webSetup = Join-Path $OutDir 'web-setup.ps1'
 Copy-Item (Join-Path $repoRoot 'scripts\windows\web-run.cmd') -Destination $webRun -Force
-Copy-Item (Join-Path $repoRoot 'scripts\windows\web-setup.ps1') -Destination $webSetup -Force
+# The console is provisioned by `punktfunk-host.exe web setup` (not a staged web-setup.ps1).
 $defines += "/DWebDir=$webStage"
 $defines += "/DBunExe=$bunStage"
 $defines += "/DWebRunCmd=$webRun"
-$defines += "/DWebSetup=$webSetup"
 Write-Host "bundling the web console from $WebDir (+ bun $BunExe)"
 }
 else { Write-Host "no -WebDir/-BunExe -> installer built WITHOUT the web console" }
 # --- build + stage the HDR Vulkan layer (pf-vkhdr-layer) --------------------------------------
 # A tiny always-on Vulkan implicit layer (cdylib) that advertises HDR10/scRGB surface formats on the
-# virtual display so Vulkan games (Doom: The Dark Ages, etc.) can enable HDR while streaming the
+# virtual display so Vulkan games (Doom: The Dark Ages, etc.) can enable HDR while streaming - the
 # NVIDIA/AMD ICDs hide HDR formats on an indirect display even though they accept+present a forced HDR
 # swapchain there. Self-gated on the display's actual advanced-color state, so it's a no-op on SDR.
 # Standalone crate (own [workspace]); built here and registered by the installer. Skipped if cargo
@@ -239,7 +245,7 @@ if (Test-Path (Join-Path $layerSrc 'Cargo.toml')) {
$defines += "/DVkLayerDir=$layerStage" $defines += "/DVkLayerDir=$layerStage"
Write-Host "==> staged pf-vkhdr-layer -> $layerStage" Write-Host "==> staged pf-vkhdr-layer -> $layerStage"
} }
else { Write-Warning "pf-vkhdr-layer build failed ($layerExit) installer built WITHOUT the HDR Vulkan layer" } else { Write-Warning "pf-vkhdr-layer build failed ($layerExit) - installer built WITHOUT the HDR Vulkan layer" }
} }
else { Write-Host "no pf-vkhdr-layer crate -> installer built WITHOUT the HDR Vulkan layer" } else { Write-Host "no pf-vkhdr-layer crate -> installer built WITHOUT the HDR Vulkan layer" }
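The web-console bundling rule in the hunk above (stage the console only when both -WebDir and -BunExe are set and exist, otherwise build a host-only installer) can be sketched outside PowerShell. This is a hedged illustration in Python, not the script's code: `web_defines` and its parameters are hypothetical names; only the `/DWebDir`, `/DBunExe` and `/DWebRunCmd` define names and the staged file names come from the diff.

```python
from pathlib import Path
from typing import List, Optional


def web_defines(out_dir: Path, web_dir: Optional[Path], bun_exe: Optional[Path]) -> List[str]:
    """Return the Inno Setup /D defines for the web console, or [] when it is not bundled.

    Hypothetical helper: it mirrors the rule that BOTH inputs must be set and exist.
    It only computes the staged paths; copying the files is left out of the sketch.
    """
    if not (web_dir and web_dir.exists() and bun_exe and bun_exe.exists()):
        return []  # host-only installer: no web defines at all
    return [
        f"/DWebDir={out_dir / 'web'}",
        f"/DBunExe={out_dir / 'bun.exe'}",
        f"/DWebRunCmd={out_dir / 'web-run.cmd'}",
    ]
```

Note that the sketch emits no `/DWebSetup` define, matching the new side of the diff, where provisioning moves to `punktfunk-host.exe web setup`.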
Binary file not shown.
Binary file not shown.
@@ -1,85 +0,0 @@
;/*++
; pf-vdisplay - punktfunk virtual display, UMDF2 IddCx driver INF (template; stampinf -> .inf).
;
; For the all-Rust wdk-sys / windows-drivers-rs driver in THIS tree
; (packaging/windows/drivers/pf-vdisplay/). The driver registers the OWNED pf_driver_proto
; control-interface GUID in CODE (WdfDeviceCreateDeviceInterface), so this INF is GUID-agnostic and
; is byte-identical to the superseded oracle's (packaging/windows/vdisplay-driver/.../pf_vdisplay.inx,
; itself adapted from MolotovCherry/virtual-display-rs (MIT) + SudoVDA's control-device security DACL).
; HWID Root\pf_vdisplay + IddCx0102 + the DACL match the host backend (crates/punktfunk-host/src/
; vdisplay/pf_vdisplay.rs) and install-pf-vdisplay.ps1's Test-PfVdisplayPresent / nefconc node-create.
;--*/
[Version]
PnpLockdown=1
Signature="$Windows NT$"
ClassGUID={4D36E968-E325-11CE-BFC1-08002BE10318}
Class=Display
ClassVer=2.0
Provider=%ManufacturerName%
CatalogFile=pf_vdisplay.cat
DriverVer = 06/25/2026,9.5.0625.1614
[Manufacturer]
%ManufacturerName%=Standard,NTamd64
[Standard.NTamd64]
%DeviceName%=pf_vdisplay_Install, Root\pf_vdisplay
[SourceDisksFiles]
pf_vdisplay.dll=1
[SourceDisksNames]
1=%DiskName%
; =================== UMDF IddCx device ====================
[pf_vdisplay_Install.NT]
CopyFiles=UMDriverCopy
[pf_vdisplay_Install.NT.hw]
AddReg=pf_vdisplay_HardwareDeviceSettings
[pf_vdisplay_HardwareDeviceSettings]
HKR, , "UpperFilters", %REG_MULTI_SZ%, "IndirectKmd"
HKR, "WUDF", "DeviceGroupId", %REG_SZ%, "pfVDisplayGroup"
; Let the host (LocalSystem service) + admins open the control device for the ADD/REMOVE/PING IOCTLs.
HKR, , "Security", , "D:P(A;;GA;;;SY)(A;;GA;;;BA)(A;;GRGW;;;WD)"
[pf_vdisplay_Install.NT.Services]
AddService=WUDFRd,0x000001fa,WUDFRD_ServiceInstall
[pf_vdisplay_Install.NT.Wdf]
UmdfService=pf_vdisplay, pf_vdisplay_Install
UmdfServiceOrder=pf_vdisplay
UmdfKernelModeClientPolicy=AllowKernelModeClients
UmdfHostProcessSharing=ProcessSharingDisabled
[pf_vdisplay_Install]
UmdfLibraryVersion=2.15.0
ServiceBinary=%12%\UMDF\pf_vdisplay.dll
UmdfExtensions=IddCx0102
[WUDFRD_ServiceInstall]
DisplayName=%WudfRdDisplayName%
ServiceType=1
StartType=3
ErrorControl=1
ServiceBinary=%12%\WUDFRd.sys
[DestinationDirs]
UMDriverCopy=12,UMDF
[UMDriverCopy]
pf_vdisplay.dll
[Strings]
ManufacturerName="punktfunk"
DiskName="punktfunk Virtual Display Installation Disk"
WudfRdDisplayName="Windows Driver Foundation - User-mode Driver Framework Reflector"
DeviceName="punktfunk Virtual Display"
REG_MULTI_SZ=0x00010000
REG_SZ=0x00000000
REG_EXPAND_SZ=0x00020000
REG_DWORD=0x00010001
Binary file not shown.
+41 -29
@@ -28,35 +28,32 @@
#ifndef Readme #ifndef Readme
#define Readme "README.md" #define Readme "README.md"
#endif #endif
; The web console launcher (the PunktfunkWeb task action) + its post-install provisioner committed ; The web console launcher (the PunktfunkWeb task action) + its post-install provisioner - committed
; scripts staged next to the .iss by pack-host-installer.ps1 (absolute paths passed in). ; scripts staged next to the .iss by pack-host-installer.ps1 (absolute paths passed in).
#ifndef WebRunCmd #ifndef WebRunCmd
#define WebRunCmd "..\..\scripts\windows\web-run.cmd" #define WebRunCmd "..\..\scripts\windows\web-run.cmd"
#endif #endif
#ifndef WebSetup
#define WebSetup "..\..\scripts\windows\web-setup.ps1"
#endif
; StageDir (the staged pf-vdisplay payload + nefconc.exe + install-pf-vdisplay.ps1) is optional. ; StageDir (the staged pf-vdisplay payload + nefconc.exe + install-pf-vdisplay.ps1) is optional.
#ifdef StageDir #ifdef StageDir
#define WithDriver #define WithDriver
#endif #endif
; GamepadStageDir (the vendored UMDF gamepad drivers + install-gamepad-drivers.ps1) is optional. ; GamepadStageDir (the built-from-source UMDF gamepad drivers + install-gamepad-drivers.ps1) is optional.
#ifdef GamepadStageDir #ifdef GamepadStageDir
#define WithGamepad #define WithGamepad
#endif #endif
; FfmpegBin (a dir of FFmpeg shared DLLs) is optional present when the host is built with ; FfmpegBin (a dir of FFmpeg shared DLLs) is optional - present when the host is built with
; --features amf-qsv (the AMD/Intel AMF/QSV encode backend link-imports the FFmpeg libs). ; --features amf-qsv (the AMD/Intel AMF/QSV encode backend link-imports the FFmpeg libs).
#ifdef FfmpegBin #ifdef FfmpegBin
#define WithFfmpeg #define WithFfmpeg
#endif #endif
; WebDir (the built web .output tree) + BunExe (a portable bun.exe) are passed together by ; WebDir (the built web .output tree) + BunExe (a portable bun.exe) are passed together by
; pack-host-installer.ps1 to bundle the management console. Both required WithWeb. ; pack-host-installer.ps1 to bundle the management console. Both required -> WithWeb.
#ifdef WebDir #ifdef WebDir
#ifdef BunExe #ifdef BunExe
#define WithWeb #define WithWeb
#endif #endif
#endif #endif
; VkLayerDir (the staged pf-vkhdr-layer: pf_vkhdr_layer.dll + .json) is optional present when the ; VkLayerDir (the staged pf-vkhdr-layer: pf_vkhdr_layer.dll + .json) is optional - present when the
; HDR Vulkan layer was built. It lets Vulkan games (Doom: The Dark Ages, etc.) enable HDR over the ; HDR Vulkan layer was built. It lets Vulkan games (Doom: The Dark Ages, etc.) enable HDR over the
; virtual display (the ICD won't advertise HDR there; the layer injects the surface formats, self- ; virtual display (the ICD won't advertise HDR there; the layer injects the surface formats, self-
; gated on the display's actual HDR state). ; gated on the display's actual HDR state).
@@ -94,7 +91,7 @@ Name: "english"; MessagesFile: "compiler:Default.isl"
Name: "installdriver"; Description: "Install the pf-vdisplay virtual display driver (required for native-resolution streaming)" Name: "installdriver"; Description: "Install the pf-vdisplay virtual display driver (required for native-resolution streaming)"
#endif #endif
#ifdef WithGamepad #ifdef WithGamepad
Name: "installgamepad"; Description: "Install the virtual gamepad drivers (DualSense / DualShock 4 / Xbox 360 no ViGEmBus needed)" Name: "installgamepad"; Description: "Install the virtual gamepad drivers (DualSense / DualShock 4 / Xbox 360 - no ViGEmBus needed)"
#endif #endif
#ifdef WithVkLayer #ifdef WithVkLayer
Name: "installhdrlayer"; Description: "Install the HDR Vulkan layer (lets Vulkan games like Doom use HDR on the virtual display)" Name: "installhdrlayer"; Description: "Install the HDR Vulkan layer (lets Vulkan games like Doom use HDR on the virtual display)"
@@ -106,27 +103,26 @@ Source: "{#BinDir}\punktfunk-host.exe"; DestDir: "{app}"; Flags: ignoreversion
Source: "{#HostEnv}"; DestDir: "{app}"; Flags: ignoreversion Source: "{#HostEnv}"; DestDir: "{app}"; Flags: ignoreversion
Source: "{#Readme}"; DestDir: "{app}"; DestName: "README.txt"; Flags: ignoreversion Source: "{#Readme}"; DestDir: "{app}"; DestName: "README.txt"; Flags: ignoreversion
#ifdef WithFfmpeg #ifdef WithFfmpeg
; FFmpeg shared DLLs (avcodec/avutil/swscale/...) laid down next to the exe the AMD/Intel ; FFmpeg shared DLLs (avcodec/avutil/swscale/...) laid down next to the exe - the AMD/Intel
; (AMF/QSV) encode backend link-imports them, so the exe won't start without them. NVENC/software- ; (AMF/QSV) encode backend link-imports them, so the exe won't start without them. NVENC/software-
; only builds simply omit this block. ; only builds simply omit this block.
Source: "{#FfmpegBin}\*.dll"; DestDir: "{app}"; Flags: ignoreversion Source: "{#FfmpegBin}\*.dll"; DestDir: "{app}"; Flags: ignoreversion
#endif #endif
#ifdef WithWeb #ifdef WithWeb
; The web management console: the self-contained Nitro SSR bundle (.output = server + public; deps ; The web management console: the self-contained Nitro SSR bundle (.output = server + public; deps
; bundled in, no node_modules) {app}\web\.output, a portable bun runtime {app}\bun\bun.exe, and ; bundled in, no node_modules) -> {app}\web\.output, a portable bun runtime -> {app}\bun\bun.exe, and
; the launcher the PunktfunkWeb task runs {app}\web\web-run.cmd. web-setup.ps1 (the provisioner) ; the launcher the PunktfunkWeb task runs -> {app}\web\web-run.cmd. (`punktfunk-host.exe web setup`
; goes to {tmp} and is removed after install. ; provisions the console at install time - no staged provisioner script.)
Source: "{#WebDir}\*"; DestDir: "{app}\web\.output"; Flags: ignoreversion recursesubdirs createallsubdirs Source: "{#WebDir}\*"; DestDir: "{app}\web\.output"; Flags: ignoreversion recursesubdirs createallsubdirs
Source: "{#BunExe}"; DestDir: "{app}\bun"; DestName: "bun.exe"; Flags: ignoreversion Source: "{#BunExe}"; DestDir: "{app}\bun"; DestName: "bun.exe"; Flags: ignoreversion
Source: "{#WebRunCmd}"; DestDir: "{app}\web"; DestName: "web-run.cmd"; Flags: ignoreversion Source: "{#WebRunCmd}"; DestDir: "{app}\web"; DestName: "web-run.cmd"; Flags: ignoreversion
Source: "{#WebSetup}"; DestDir: "{tmp}"; DestName: "web-setup.ps1"; Flags: deleteafterinstall
#endif #endif
#ifdef WithDriver #ifdef WithDriver
; The driver payload + nefconc.exe + install-pf-vdisplay.ps1, extracted to {tmp} and removed after install. ; The driver payload + nefconc.exe + install-pf-vdisplay.ps1, extracted to {tmp} and removed after install.
Source: "{#StageDir}\*"; DestDir: "{tmp}\pfvdisplay"; Flags: deleteafterinstall recursesubdirs createallsubdirs; Tasks: installdriver Source: "{#StageDir}\*"; DestDir: "{tmp}\pfvdisplay"; Flags: deleteafterinstall recursesubdirs createallsubdirs; Tasks: installdriver
#endif #endif
#ifdef WithGamepad #ifdef WithGamepad
; The vendored UMDF gamepad drivers + install-gamepad-drivers.ps1, extracted to {tmp}, removed after. ; The built-from-source UMDF gamepad drivers + install-gamepad-drivers.ps1, extracted to {tmp}, removed after.
Source: "{#GamepadStageDir}\*"; DestDir: "{tmp}\gamepad"; Flags: deleteafterinstall recursesubdirs createallsubdirs; Tasks: installgamepad Source: "{#GamepadStageDir}\*"; DestDir: "{tmp}\gamepad"; Flags: deleteafterinstall recursesubdirs createallsubdirs; Tasks: installgamepad
#endif #endif
#ifdef WithVkLayer #ifdef WithVkLayer
@@ -148,14 +144,12 @@ Root: HKLM64; Subkey: "SOFTWARE\Khronos\Vulkan\ImplicitLayers"; ValueType: dword
[Run] [Run]
#ifdef WithDriver #ifdef WithDriver
Filename: "powershell.exe"; \ Filename: "{app}\punktfunk-host.exe"; Parameters: "driver install --dir ""{tmp}\pfvdisplay"""; WorkingDir: "{app}"; \
Parameters: "-NoProfile -ExecutionPolicy Bypass -File ""{tmp}\pfvdisplay\install-pf-vdisplay.ps1"" -Dir ""{tmp}\pfvdisplay"""; \
StatusMsg: "Installing the pf-vdisplay virtual display driver..."; \ StatusMsg: "Installing the pf-vdisplay virtual display driver..."; \
Flags: runhidden waituntilterminated; Tasks: installdriver Flags: runhidden waituntilterminated; Tasks: installdriver
#endif #endif
#ifdef WithGamepad #ifdef WithGamepad
Filename: "powershell.exe"; \ Filename: "{app}\punktfunk-host.exe"; Parameters: "driver install --gamepad --dir ""{tmp}\gamepad"""; WorkingDir: "{app}"; \
Parameters: "-NoProfile -ExecutionPolicy Bypass -File ""{tmp}\gamepad\install-gamepad-drivers.ps1"" -Dir ""{tmp}\gamepad"""; \
StatusMsg: "Installing the virtual gamepad drivers..."; \ StatusMsg: "Installing the virtual gamepad drivers..."; \
Flags: runhidden waituntilterminated; Tasks: installgamepad Flags: runhidden waituntilterminated; Tasks: installgamepad
#endif #endif
@@ -169,8 +163,7 @@ Filename: "{app}\punktfunk-host.exe"; Parameters: "service start"; WorkingDir: "
; Provision the console AFTER the host service is up (so the mgmt token exists): write the ACL'd ; Provision the console AFTER the host service is up (so the mgmt token exists): write the ACL'd
; login password, register the PunktfunkWeb scheduled task (boot, SYSTEM, restart-on-failure), ; login password, register the PunktfunkWeb scheduled task (boot, SYSTEM, restart-on-failure),
; open TCP 3000, and start it. {code:WebSetupParams} appends -PasswordFile only on a fresh install. ; open TCP 3000, and start it. {code:WebSetupParams} appends -PasswordFile only on a fresh install.
Filename: "powershell.exe"; \ Filename: "{app}\punktfunk-host.exe"; Parameters: "web setup {code:WebSetupParams}"; WorkingDir: "{app}"; \
Parameters: "-NoProfile -ExecutionPolicy Bypass -File ""{tmp}\web-setup.ps1"" {code:WebSetupParams}"; \
StatusMsg: "Setting up the punktfunk web console..."; Flags: runhidden waituntilterminated StatusMsg: "Setting up the punktfunk web console..."; Flags: runhidden waituntilterminated
#endif #endif
@@ -180,7 +173,7 @@ Filename: "{app}\punktfunk-host.exe"; Parameters: "service uninstall"; Flags: ru
; Stop + remove the PunktfunkWeb task and its firewall rule (leaves %ProgramData%\punktfunk config, ; Stop + remove the PunktfunkWeb task and its firewall rule (leaves %ProgramData%\punktfunk config,
; like the host uninstall does). ; like the host uninstall does).
Filename: "powershell.exe"; \ Filename: "powershell.exe"; \
Parameters: "-NoProfile -ExecutionPolicy Bypass -Command ""Stop-ScheduledTask -TaskName PunktfunkWeb -ErrorAction SilentlyContinue; Unregister-ScheduledTask -TaskName PunktfunkWeb -Confirm:$false -ErrorAction SilentlyContinue; Get-NetFirewallRule -Name 'PunktfunkWeb-TCP-3000' -ErrorAction SilentlyContinue | Remove-NetFirewallRule"""; \ Parameters: "-NoProfile -ExecutionPolicy Bypass -Command ""Stop-ScheduledTask -TaskName PunktfunkWeb -ErrorAction SilentlyContinue; Get-NetTCPConnection -LocalPort 3000 -State Listen -ErrorAction SilentlyContinue | ForEach-Object {{ Stop-Process -Id $_.OwningProcess -Force -ErrorAction SilentlyContinue }; Unregister-ScheduledTask -TaskName PunktfunkWeb -Confirm:$false -ErrorAction SilentlyContinue; Get-NetFirewallRule -Name 'PunktfunkWeb-TCP-3000' -ErrorAction SilentlyContinue | Remove-NetFirewallRule"""; \
Flags: runhidden waituntilterminated; RunOnceId: "PunktfunkWebCleanup" Flags: runhidden waituntilterminated; RunOnceId: "PunktfunkWebCleanup"
#endif #endif
@@ -188,7 +181,7 @@ Filename: "powershell.exe"; \
#ifdef WithWeb #ifdef WithWeb
var var
WebPwPage: TInputQueryWizardPage; WebPwPage: TInputQueryWizardPage;
FreshWebInstall: Boolean; { captured at start web-setup creates the file mid-run } FreshWebInstall: Boolean; { captured at start - web-setup creates the file mid-run }
function WebPasswordPath: String; function WebPasswordPath: String;
begin begin
@@ -196,7 +189,7 @@ begin
end; end;
{ Pre-fill the console password field with a crypto-strong default (Inno has no RNG): a one-shot { Pre-fill the console password field with a crypto-strong default (Inno has no RNG): a one-shot
PowerShell writes 12 random bytes as dashed hex; strip the dashes a 24-char hex password. } PowerShell writes 12 random bytes as dashed hex; strip the dashes -> a 24-char hex password. }
procedure GenerateRandomWebPassword(var Pw: String); procedure GenerateRandomWebPassword(var Pw: String);
var var
ResultCode: Integer; ResultCode: Integer;
@@ -229,7 +222,7 @@ begin
WebPwPage := CreateInputQueryPage(wpSelectTasks, WebPwPage := CreateInputQueryPage(wpSelectTasks,
'Web console', 'Set the punktfunk web console login password', 'Web console', 'Set the punktfunk web console login password',
'The management console is served on http://this-computer:3000 and is login-gated. Keep the ' + 'The management console is served on http://this-computer:3000 and is login-gated. Keep the ' +
'secure password generated below (it is shown again on the final page) or enter your own you ' + 'secure password generated below (it is shown again on the final page) or enter your own - you ' +
'can change it later in %ProgramData%\punktfunk\web-password.'); 'can change it later in %ProgramData%\punktfunk\web-password.');
WebPwPage.Add('Console password:', False); { visible, so the admin can read the generated default } WebPwPage.Add('Console password:', False); { visible, so the admin can read the generated default }
DefaultPw := ''; DefaultPw := '';
@@ -239,7 +232,7 @@ end;
function ShouldSkipPage(PageID: Integer): Boolean; function ShouldSkipPage(PageID: Integer): Boolean;
begin begin
{ On upgrade the password already exists keep it, don't re-prompt. } { On upgrade the password already exists - keep it, don't re-prompt. }
Result := (PageID = WebPwPage.ID) and (not FreshWebInstall); Result := (PageID = WebPwPage.ID) and (not FreshWebInstall);
end; end;
@@ -264,10 +257,10 @@ end;
function WebSetupParams(Param: String): String; function WebSetupParams(Param: String): String;
begin begin
{ Pass the password to web-setup.ps1 via a temp file, not the cmdline (which lands in the install { Pass the password to web-setup.ps1 via a temp file, not the cmdline (which lands in the install
log). Only on a fresh install on upgrade web-setup keeps the existing file. } log). Only on a fresh install - on upgrade web-setup keeps the existing file. }
Result := '-AppDir "' + ExpandConstant('{app}') + '"'; Result := '--app-dir "' + ExpandConstant('{app}') + '"';
if FreshWebInstall then if FreshWebInstall then
Result := Result + ' -PasswordFile "' + ExpandConstant('{tmp}\webpw.txt') + '"'; Result := Result + ' --password-file "' + ExpandConstant('{tmp}\webpw.txt') + '"';
end; end;
#endif #endif
@@ -287,12 +280,31 @@ begin
'', SW_HIDE, ewWaitUntilTerminated, ResultCode); '', SW_HIDE, ewWaitUntilTerminated, ResultCode);
end; end;
#ifdef WithWeb
{ Stop a running web console + free :3000 BEFORE the file copy, so the old server doesn't lock
.output / web-run.cmd / bun.exe and the new task can bind. Killing the :3000 listener owner is
runtime-agnostic (an early install may have run node, the current one runs bun). web-setup.ps1
repeats this idempotently after the copy. Best-effort; a fresh install is a no-op. }
procedure StopWebConsole;
var
ResultCode: Integer;
begin
Exec('powershell.exe',
'-NoProfile -ExecutionPolicy Bypass -Command "' +
'$ErrorActionPreference=''SilentlyContinue''; ' +
'Stop-ScheduledTask -TaskName PunktfunkWeb; ' +
'Get-NetTCPConnection -LocalPort 3000 -State Listen | ForEach-Object { Stop-Process -Id $_.OwningProcess -Force }"',
'', SW_HIDE, ewWaitUntilTerminated, ResultCode);
end;
#endif
procedure CurStepChanged(CurStep: TSetupStep); procedure CurStepChanged(CurStep: TSetupStep);
begin begin
if CurStep = ssInstall then if CurStep = ssInstall then
begin begin
StopHostServiceAndWait; StopHostServiceAndWait;
#ifdef WithWeb #ifdef WithWeb
StopWebConsole; { upgrade-safe: free :3000 + unlock the web files before the copy }
{ Stash the chosen password for web-setup.ps1 (fresh install only); the temp copy is auto-cleaned. } { Stash the chosen password for web-setup.ps1 (fresh install only); the temp copy is auto-cleaned. }
if FreshWebInstall then if FreshWebInstall then
SaveStringToFile(ExpandConstant('{tmp}\webpw.txt'), Trim(WebPwPage.Values[0]), False); SaveStringToFile(ExpandConstant('{tmp}\webpw.txt'), Trim(WebPwPage.Values[0]), False);
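Two small pieces of logic in the installer diff above are easy to get wrong: the default password (12 random bytes written as dashed hex, with the dashes stripped to give 24 hex characters) and the `web setup` argument string (`--password-file` appended only on a fresh install). The following is a minimal Python sketch of both, assuming uppercase hex as the dashed form; the function names are hypothetical and this is not the installer's Pascal or the host's actual implementation.

```python
import secrets


def dashed_hex(data: bytes) -> str:
    """Format bytes as dash-separated uppercase hex (three bytes look like 00-FF-10)."""
    return "-".join(f"{b:02X}" for b in data)


def strip_dashes(dashed: str) -> str:
    """Remove the separators, leaving two hex characters per byte."""
    return dashed.replace("-", "")


def generate_web_password(num_bytes: int = 12) -> str:
    """12 cryptographically random bytes -> a 24-character hex password."""
    return strip_dashes(dashed_hex(secrets.token_bytes(num_bytes)))


def web_setup_params(app_dir: str, fresh_install: bool, password_file: str) -> str:
    """Build the argument string passed after 'web setup'.

    The password travels in a file, never on the command line, and only on a
    fresh install; on upgrade the existing password file is kept.
    """
    params = f'--app-dir "{app_dir}"'
    if fresh_install:
        params += f' --password-file "{password_file}"'
    return params
```

Passing the password through a temporary file rather than as an argument keeps it out of the install log, which is the reason the diff's own comment gives.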
+15 -15
@@ -1,20 +1,20 @@
<# <#
.SYNOPSIS .SYNOPSIS
Stage the pf-vdisplay driver bundle the installer ships into -OutDir: the VENDORED signed pf-vdisplay Stage the pf-vdisplay driver bundle the installer ships into -OutDir: the freshly-built signed
driver + the fetched nefcon device tool. pf-vdisplay driver (from -VendorDir) + the fetched nefcon device tool.
.DESCRIPTION .DESCRIPTION
pf-vdisplay (our all-Rust IddCx virtual display) is built from packaging/windows/drivers/, and pf-vdisplay (our all-Rust IddCx virtual display) is BUILT FROM SOURCE per release by
the SIGNED output (pf_vdisplay.dll/.inf/.cat + punktfunk-driver.cer) is VENDORED under build-pf-vdisplay.ps1 (packaging/windows/drivers/), which emits the signed pf_vdisplay.{dll,inf,cat}
packaging/windows/pf-vdisplay/ (signer punktfunk-ds-test — shared with the gamepad drivers — Class= + punktfunk-driver.cer into a build-output dir. pack-host-installer.ps1 passes that dir here as
Display, HWID root\pf_vdisplay). Rebuild + re-vendor with -VendorDir; we copy it + add the nefcon tool. (The old checked-in binaries under
packaging/windows/drivers/deploy-dev.ps1 when the driver source changes, then copy the staged packaging/windows/pf-vdisplay/ were retired - building from source keeps the .dll/.inf/.cat in
pf_vdisplay.{dll,inf,cat} over the vendored copies. nefcon publishes a pinned release, so we fetch + lockstep; see design/windows-build-and-packaging.md.) nefcon publishes a pinned release, so we fetch +
SHA-256-verify it (it provides nefconc.exe, used to create the root-enumerated device node pnputil SHA-256-verify it (it provides nefconc.exe, used to create the root-enumerated device node - pnputil
can't). can't).
Output (consumed by punktfunk-host.iss): -OutDir gets pf_vdisplay.inf/.cat/.dll + punktfunk-driver.cer Output (consumed by punktfunk-host.iss): -OutDir gets pf_vdisplay.inf/.cat/.dll + punktfunk-driver.cer
and nefconc.exe (x64). pack-host-installer.ps1 also drops install-pf-vdisplay.ps1 in. and nefconc.exe (x64).
.EXAMPLE .EXAMPLE
pwsh -File stage-pf-vdisplay.ps1 -OutDir C:\t\out\stage pwsh -File stage-pf-vdisplay.ps1 -OutDir C:\t\out\stage
@@ -22,7 +22,7 @@
[CmdletBinding()] [CmdletBinding()]
param( param(
[Parameter(Mandatory = $true)][string]$OutDir, [Parameter(Mandatory = $true)][string]$OutDir,
[string]$VendorDir = (Join-Path $PSScriptRoot 'pf-vdisplay'), [Parameter(Mandatory = $true)][string]$VendorDir, # the build-pf-vdisplay.ps1 output dir
# PINNED nefcon release (https://github.com/nefarius/nefcon/releases). MIT-licensed. # PINNED nefcon release (https://github.com/nefarius/nefcon/releases). MIT-licensed.
[string]$NefconUrl = 'https://github.com/nefarius/nefcon/releases/download/v1.17.40/nefcon_v1.17.40.zip', [string]$NefconUrl = 'https://github.com/nefarius/nefcon/releases/download/v1.17.40/nefcon_v1.17.40.zip',
[string]$NefconSha256 = '812bae7ed7dfb7d6d2284bc7de2f8ccebc92ed2a0b1ae893c53b337096e50c1a' [string]$NefconSha256 = '812bae7ed7dfb7d6d2284bc7de2f8ccebc92ed2a0b1ae893c53b337096e50c1a'
@@ -34,11 +34,11 @@ $PSNativeCommandUseErrorActionPreference = $false
if (Test-Path $OutDir) { Remove-Item -Recurse -Force $OutDir } if (Test-Path $OutDir) { Remove-Item -Recurse -Force $OutDir }
New-Item -ItemType Directory -Force -Path $OutDir | Out-Null New-Item -ItemType Directory -Force -Path $OutDir | Out-Null
# --- vendored pf-vdisplay driver -------------------------------------------------------------- # --- built pf-vdisplay driver -----------------------------------------------------------------
$inf = Get-ChildItem -Path $VendorDir -Filter pf_vdisplay.inf -ErrorAction SilentlyContinue | Select-Object -First 1 $inf = Get-ChildItem -Path $VendorDir -Filter pf_vdisplay.inf -ErrorAction SilentlyContinue | Select-Object -First 1
if (-not $inf) { throw "no vendored pf_vdisplay.inf under $VendorDir — re-vendor via drivers/deploy-dev.ps1" } if (-not $inf) { throw "no pf_vdisplay.inf under $VendorDir - did build-pf-vdisplay.ps1 run?" }
Copy-Item (Join-Path $VendorDir '*') $OutDir -Force Copy-Item (Join-Path $VendorDir '*') $OutDir -Force
Write-Host "==> vendored pf-vdisplay staged from $VendorDir" Write-Host "==> pf-vdisplay staged from $VendorDir"
# --- nefcon (fetched + verified) -------------------------------------------------------------- # --- nefcon (fetched + verified) --------------------------------------------------------------
$work = Join-Path ([IO.Path]::GetTempPath()) ('nefcon-' + [IO.Path]::GetRandomFileName()) $work = Join-Path ([IO.Path]::GetTempPath()) ('nefcon-' + [IO.Path]::GetRandomFileName())
@@ -54,7 +54,7 @@ try {
} }
Write-Host " sha256 ok ($got)" Write-Host " sha256 ok ($got)"
} }
else { Write-Warning "no pinned nefcon SHA-256 computed $got (PIN THIS in stage-pf-vdisplay.ps1)" } else { Write-Warning "no pinned nefcon SHA-256 - computed $got (PIN THIS in stage-pf-vdisplay.ps1)" }
Expand-Archive -Path $zip -DestinationPath $work -Force Expand-Archive -Path $zip -DestinationPath $work -Force
$nefc = Get-ChildItem -Path $work -Recurse -Filter 'nefconc.exe' | $nefc = Get-ChildItem -Path $work -Recurse -Filter 'nefconc.exe' |
Where-Object { $_.FullName -match '(?i)\\x64\\' } | Select-Object -First 1 Where-Object { $_.FullName -match '(?i)\\x64\\' } | Select-Object -First 1
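The staging script above fetches a pinned nefcon release and SHA-256-verifies it before extracting. As a hedged sketch of that pattern in Python (the helper names are hypothetical, and raising on a mismatch is an assumption, since the mismatch branch is not visible in the hunk):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_hex: str) -> str:
    """Compare the file's digest with the pinned value (case-insensitive).

    With no pin, only warn and report the computed digest so it can be pinned;
    with a pin, a mismatch is treated as fatal (assumed behavior).
    """
    got = sha256_of(path)
    if not expected_hex.strip():
        print(f"warning: no pinned SHA-256 - computed {got}")
        return got
    if got != expected_hex.strip().lower():
        raise ValueError(f"sha256 mismatch for {path.name}: expected {expected_hex}, got {got}")
    return got
```

Verifying before extraction means an unexpected archive is rejected before any of its contents reach the staging directory.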
-35
@@ -1,35 +0,0 @@
[package]
edition = "2024"
name = "pf-xusb"
version = "0.1.0"
publish = false
license = "MIT OR Apache-2.0"
description = "punktfunk virtual Xbox 360 XUSB companion (UMDF2 — classic XInput)"
[package.metadata.wdk.driver-model]
driver-type = "UMDF"
target-umdf-version-minor = 31
umdf-version-major = 2
[lib]
crate-type = ["cdylib"]
[build-dependencies]
wdk-build.path = "../../crates/wdk-build"
[dependencies]
wdk.path = "../../crates/wdk"
wdk-sys.path = "../../crates/wdk-sys"
[features]
default = []
nightly = ["wdk-sys/nightly", "wdk/nightly"]
[profile.dev]
lto = true
[profile.release]
lto = true
# Standalone package (not part of the windows-drivers-rs root workspace).
[workspace]
@@ -1,4 +0,0 @@
extend = [
{ path = "../../crates/wdk-build/rust-driver-makefile.toml" },
{ path = "../../crates/wdk-build/rust-driver-sample-makefile.toml" },
]
+10 -30
@@ -52,31 +52,10 @@ if ($haveCargoWdk) {
if ($LASTEXITCODE -ne 0) { throw "cargo install cargo-wdk failed ($LASTEXITCODE)" } if ($LASTEXITCODE -ne 0) { throw "cargo install cargo-wdk failed ($LASTEXITCODE)" }
} }
# ---- 2b. LLVM 21.1.2 (libclang for wdk-sys bindgen) ---- # ---- 2b. (LLVM is NOT pinned) ----
# wdk-sys's bindgen layout tests overflow (E0080 on threadlocaleinfostruct etc.) with LLVM ToT / dev # The runner's default clang (C:\Program Files\LLVM) builds the driver workspace clean now that the
# (22+) builds — the runner's default C:\Program Files\LLVM. The windows-drivers-rs maintainers confirm # vendored bindgen is 0.72 (the shipping pack proves it). LLVM 21.1.2 was briefly installed here to dodge
# the RELEASED LLVM 21.1.2 builds clean (discussion #591). Install it to a dedicated path so the client # a bindgen-0.71 layout-test overflow on clang 22; the 0.72 bump retired that pin. No LLVM install needed.
# builds' LLVM is untouched; the driver-build CI job points LIBCLANG_PATH here.
$llvmDir = 'C:\llvm-21'
$libclang = Join-Path $llvmDir 'bin\libclang.dll'
if (Test-Path $libclang) {
info "LLVM 21.1.2 already present ($libclang) — skipping."
} else {
# Use the portable .tar.xz, NOT the NSIS .exe: the .exe /S silent install HANGS in the headless SYSTEM
# CI session (observed stuck >15 min after download, blocking the runner). Win11's bundled tar
# (bsdtar/libarchive) extracts .tar.xz; the tarball has one top-level dir, so --strip-components=1
# lands bin/ at $llvmDir\bin. We only need libclang.dll + the clang resource dir for bindgen.
$url = 'https://github.com/llvm/llvm-project/releases/download/llvmorg-21.1.2/clang+llvm-21.1.2-x86_64-pc-windows-msvc.tar.xz'
$tmp = Join-Path $env:TEMP 'llvm-21.1.2-msvc.tar.xz'
info "Downloading LLVM 21.1.2 portable archive (~898 MB) -> $tmp"
& curl.exe -L --fail --retry 3 -o $tmp $url
if ($LASTEXITCODE -ne 0) { throw "LLVM archive download failed ($LASTEXITCODE)" }
info ("downloaded {0:N0} MB; extracting to $llvmDir (tar, strip 1)..." -f ((Get-Item $tmp).Length / 1MB))
New-Item -ItemType Directory -Force -Path $llvmDir | Out-Null
& tar -xf $tmp -C $llvmDir --strip-components=1
if ($LASTEXITCODE -ne 0) { throw "tar extract of LLVM failed ($LASTEXITCODE) — runner tar may lack .xz support" }
if (-not (Test-Path $libclang)) { throw "LLVM extract did not produce $libclang" }
}
# ---- 3. Verify (enumerate the REAL layout; fail only on build-essential absences) ---- # ---- 3. Verify (enumerate the REAL layout; fail only on build-essential absences) ----
Write-Host "" Write-Host ""
@@ -99,10 +78,11 @@ foreach ($t in 'inf2cat.exe','stampinf.exe','signtool.exe','makecat.exe','InfVer
} }
try { $cw = (& cargo wdk --version 2>&1) -join ' ' } catch { $cw = '' } try { $cw = (& cargo wdk --version 2>&1) -join ' ' } catch { $cw = '' }
found 'cargo-wdk' $cw found 'cargo-wdk' $cw
found 'LLVM 21.1.2 libclang' ($(if (Test-Path $libclang) { $libclang } else { 'MISSING' })) $defClang = 'C:\Program Files\LLVM\bin\libclang.dll'
found 'default libclang' ($(if (Test-Path $defClang) { $defClang } else { 'MISSING (clang not on the runner?)' }))
# Block only on the genuinely build-essential pieces (headers + iddcx + cargo-wdk + the pinned libclang). # Block only on the genuinely build-essential pieces (headers + iddcx + cargo-wdk).
# inf2cat arch quirks are non-fatal — cargo-wdk locates the WDK tools itself. # inf2cat arch quirks are non-fatal — cargo-wdk locates the WDK tools itself.
$essential = ($null -ne $umdfHdr) -and (Test-Path $kmDir) -and ($iddcxVers -ne '') -and ($cw -match 'wdk') -and (Test-Path $libclang) $essential = ($null -ne $umdfHdr) -and (Test-Path $kmDir) -and ($iddcxVers -ne '') -and ($cw -match 'wdk')
if (-not $essential) { throw "provisioning incomplete: need wdf.h + km headers + iddcx + cargo-wdk + LLVM 21.1.2 (see above)" } if (-not $essential) { throw "provisioning incomplete: need wdf.h + km headers + iddcx + cargo-wdk (see above)" }
info "WDK + cargo-wdk + LLVM 21.1.2 provisioned OK. Driver builds pin Version_Number=$SdkVersion + LIBCLANG_PATH=$llvmDir\bin." info "WDK + cargo-wdk provisioned OK. Driver builds use Version_Number=$SdkVersion + the runner-default clang (bindgen 0.72)."
+3 -3
@@ -50,9 +50,9 @@ and log in with the password the installer shows on its final page. To change it
powershell -ExecutionPolicy Bypass -File scripts\windows\build-web.ps1 powershell -ExecutionPolicy Bypass -File scripts\windows\build-web.ps1
``` ```
`bun install && bun run build`, installs the externalized server deps into `.output/server` `bun install && bun run build` (Nitro `noExternals` -> a self-contained `.output`, no
(with the `@unom` `.npmrc`), then restarts the `PunktfunkWeb` task and checks `:3000/login`. Use `node_modules`/`.npmrc`), then restarts the `PunktfunkWeb` task and checks `:3000/login`. Use
this to iterate on the console against an installed host `web-setup.ps1` (or a fresh install) is this to iterate on the console against an installed host - `web-setup.ps1` (or a fresh install) is
what creates the task in the first place. what creates the task in the first place.
## Typical flow after pulling new code ## Typical flow after pulling new code
+1 -1
@@ -23,7 +23,7 @@ if ($LASTEXITCODE -ne 0) { throw "web build failed (exit $LASTEXITCODE)" }
Write-Host "restarting $task ..." Write-Host "restarting $task ..."
& schtasks /end /tn $task 2>$null | Out-Null & schtasks /end /tn $task 2>$null | Out-Null
Get-CimInstance Win32_Process -Filter "Name='bun.exe'" -ErrorAction SilentlyContinue | Get-CimInstance Win32_Process -Filter "Name='bun.exe' OR Name='node.exe'" -ErrorAction SilentlyContinue |
Where-Object { $_.CommandLine -match 'index\.mjs' } | Where-Object { $_.CommandLine -match 'index\.mjs' } |
ForEach-Object { Stop-Process -Id $_.ProcessId -Force -ErrorAction SilentlyContinue } ForEach-Object { Stop-Process -Id $_.ProcessId -Force -ErrorAction SilentlyContinue }
Start-Sleep 2 Start-Sleep 2
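The process selection in this hunk (image name is `bun.exe` or `node.exe`, and the command line mentions `index.mjs`) can be sketched as a pure function. This is an illustration only: `ProcInfo` and `consoleServerPids` are hypothetical names that are not part of the repository, and the sketch only selects PIDs, it does not stop anything. Matching is case-insensitive here because both the WQL `Name=` comparison and PowerShell's `-match` are.

```typescript
// Hypothetical shape for one row of a process listing (not a repository type).
interface ProcInfo {
  pid: number;
  name: string;
  commandLine: string | null; // can be unavailable for some processes
}

// Runtimes that may be hosting the console server after this change.
const SERVER_NAMES = new Set(["bun.exe", "node.exe"]);

// Return the PIDs whose image name is bun.exe/node.exe AND whose command
// line references index.mjs, mirroring the two-stage filter in the script.
function consoleServerPids(procs: ProcInfo[]): number[] {
  return procs
    .filter((p) => SERVER_NAMES.has(p.name.toLowerCase()))
    .filter((p) => /index\.mjs/i.test(p.commandLine ?? ""))
    .map((p) => p.pid);
}
```

Filtering on the command line as well as the image name keeps unrelated `bun.exe` or `node.exe` processes on the machine from being selected.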
+1 -1
@@ -1,5 +1,5 @@
@echo off @echo off
rem punktfunk web console launcher the action the PunktfunkWeb scheduled task runs at boot. rem punktfunk web console launcher - the action the PunktfunkWeb scheduled task runs at boot.
rem rem
rem Lays out next to the installed payload: {app}\web\web-run.cmd, {app}\web\.output\... and rem Lays out next to the installed payload: {app}\web\web-run.cmd, {app}\web\.output\... and
rem {app}\bun\bun.exe (so %~dp0 = {app}\web\). Auto-wires the console the same way the Linux rem {app}\bun\bun.exe (so %~dp0 = {app}\web\). Auto-wires the console the same way the Linux
-93
@@ -1,93 +0,0 @@
<#
Provision the punktfunk web console after the host installer has laid down its payload
({app}\web\.output, {app}\bun\bun.exe, {app}\web\web-run.cmd). Invoked elevated from the
installer's [Run] section; idempotent (safe to re-run on upgrade).
1. Sets the console login password file %ProgramData%\punktfunk\web-password
(PUNKTFUNK_UI_PASSWORD=...), ACL'd to Administrators + SYSTEM only:
- if -PasswordFile points at a non-empty temp file (a FRESH install collected one on the
wizard page), use that;
- else if the file already exists (UPGRADE), keep it untouched;
- else generate a random one (fallback, so the console never boots auth-misconfigured).
2. Registers the PunktfunkWeb scheduled task: at boot, as SYSTEM/Highest, restart-on-failure,
no execution time limit (a long-running server), running {app}\web\web-run.cmd.
3. Opens inbound TCP 3000 (the console port) on all profiles.
4. Waits briefly for the host's mgmt token, then starts the task.
The mgmt bearer token is NOT managed here — the host owns %ProgramData%\punktfunk\mgmt-token
(crates/punktfunk-host/src/mgmt_token.rs writes it on `serve`); web-run.cmd sources it.
#>
[CmdletBinding()]
param(
[Parameter(Mandatory = $true)][string]$AppDir, # the installer's {app}
[string]$PasswordFile # temp file with the chosen password (fresh install)
)
$ErrorActionPreference = 'Stop'
$TaskName = 'PunktfunkWeb'
$dataDir = Join-Path $env:ProgramData 'punktfunk'
$pwFile = Join-Path $dataDir 'web-password'
$tokenFile = Join-Path $dataDir 'mgmt-token'
New-Item -ItemType Directory -Force -Path $dataDir | Out-Null
function New-RandomPassword {
# URL/shell-safe (no /+=) so it's a clean env-file value and cmd-token, like scripts/web-init.sh.
$bytes = New-Object byte[] 24
([System.Security.Cryptography.RandomNumberGenerator]::Create()).GetBytes($bytes)
$s = [Convert]::ToBase64String($bytes) -replace '[/+=]', ''
return $s.Substring(0, [Math]::Min(20, $s.Length))
}
# --- 1. login password -----------------------------------------------------------------------
$password = $null
if ($PasswordFile -and (Test-Path -LiteralPath $PasswordFile)) {
$password = (Get-Content -LiteralPath $PasswordFile -Raw).Trim()
}
if (-not $password) {
if (Test-Path -LiteralPath $pwFile) {
Write-Host "keeping existing web console password ($pwFile)"
}
else {
$password = New-RandomPassword
Write-Host "no password supplied - generated a random web console password"
}
}
if ($password) {
# LF, no BOM (UTF8) so web-run.cmd's `for /f` reads a clean value.
[IO.File]::WriteAllText($pwFile, "PUNKTFUNK_UI_PASSWORD=$password`n")
# Lock it down: drop inheritance, grant only Administrators (S-1-5-32-544) + SYSTEM (S-1-5-18).
& icacls $pwFile /inheritance:r /grant:r '*S-1-5-32-544:F' '*S-1-5-18:F' | Out-Null
}
# --- 2. PunktfunkWeb scheduled task ----------------------------------------------------------
$cmd = Join-Path $AppDir 'web\web-run.cmd'
if (-not (Test-Path -LiteralPath $cmd)) { throw "web launcher missing: $cmd" }
$action = New-ScheduledTaskAction -Execute $cmd
$trigger = New-ScheduledTaskTrigger -AtStartup
$principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' -LogonType ServiceAccount -RunLevel Highest
# RestartCount/Interval cover transient crashes + the brief post-install race before the host has
# written the mgmt token (web-run.cmd exits non-zero until then). No time limit: it's a server.
$settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries `
-StartWhenAvailable -RestartInterval (New-TimeSpan -Minutes 1) -RestartCount 10 `
-ExecutionTimeLimit (New-TimeSpan -Seconds 0)
Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger -Principal $principal `
-Settings $settings -Description 'punktfunk web management console (Nitro SSR on bun, :3000)' `
-Force | Out-Null
Write-Host "registered scheduled task $TaskName -> $cmd"
# --- 3. firewall: inbound TCP 3000 -----------------------------------------------------------
try {
$fwName = 'PunktfunkWeb-TCP-3000'
Get-NetFirewallRule -Name $fwName -ErrorAction SilentlyContinue | Remove-NetFirewallRule -ErrorAction SilentlyContinue
New-NetFirewallRule -Name $fwName -DisplayName 'punktfunk web console (TCP 3000)' `
-Direction Inbound -Action Allow -Protocol TCP -LocalPort 3000 -Profile Any | Out-Null
Write-Host "firewall: allowed inbound TCP 3000"
}
catch { Write-Warning "could not add the firewall rule for TCP 3000: $($_.Exception.Message)" }
# --- 4. wait for the host's mgmt token, then start -------------------------------------------
# The host service was installed+started just before this; give it a moment to write the token so
# the first start serves immediately (otherwise restart-on-failure picks it up within a minute).
for ($i = 0; $i -lt 30 -and -not (Test-Path -LiteralPath $tokenFile); $i++) { Start-Sleep -Seconds 1 }
Start-ScheduledTask -TaskName $TaskName
Write-Host "started $TaskName (console on http://<host-ip>:3000)"
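The password precedence that the removed script documents in step 1 of its header (a supplied non-empty password wins, otherwise an existing file is kept, otherwise a random one is generated) can be sketched as a small decision function. This is a minimal sketch, not the repository's code: `PasswordDecision` and `resolvePassword` are hypothetical names, the generator is passed in as a parameter rather than implemented, and whitespace-only input is treated as "not supplied", following the header's "non-empty" wording.

```typescript
// Hypothetical result type: either write a new password file or leave it alone.
type PasswordDecision =
  | { action: "write"; password: string; source: "supplied" | "generated" }
  | { action: "keep" };

// Decide what to do with the web console password file.
//   supplied            - contents of the installer's temp file, or null if absent
//   existingFilePresent - whether the password file already exists (upgrade)
//   generate            - injected random-password generator (assumption: the
//                         caller provides one; it is only called as a fallback)
function resolvePassword(
  supplied: string | null,
  existingFilePresent: boolean,
  generate: () => string,
): PasswordDecision {
  const trimmed = (supplied ?? "").trim();
  if (trimmed !== "") {
    // Fresh install: the wizard collected a password.
    return { action: "write", password: trimmed, source: "supplied" };
  }
  if (existingFilePresent) {
    // Upgrade: keep the existing file untouched.
    return { action: "keep" };
  }
  // Fallback so the console never starts without a password.
  return { action: "write", password: generate(), source: "generated" };
}
```

Keeping the decision separate from file I/O makes the three branches easy to check in isolation; the file write and the ACL step in the script only run for the two `write` outcomes.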
+3 -3
@@ -48,9 +48,9 @@ export default defineConfig({
preset: "node-server", preset: "node-server",
// BUNDLE every dependency into the server output (no externalized node_modules). Three wins: // BUNDLE every dependency into the server output (no externalized node_modules). Three wins:
// (1) the .output tree drops from ~47k files / 730 MB (the whole untree-shaken @unom/ui dep // (1) the .output tree drops from ~47k files / 730 MB (the whole untree-shaken @unom/ui dep
// tree — payload, lexical, date-fns…) to a handful of tree-shaken chunks; (2) it makes the // tree — payload, lexical, date-fns…) to a handful of tree-shaken chunks; (2) the output is a
// output a self-contained graph `bun build --compile` can fold into ONE native binary (the // self-contained ~75-file `.output` the bundled `bun` runs directly (the Windows installer
// Windows installer ships that instead of node + a node_modules forest); (3) it removes the // ships bun + that `.output`, not node + a node_modules forest); (3) it removes the
// bare external imports (`srvx`, `seroval`…) bun couldn't resolve at runtime — the reason we // bare external imports (`srvx`, `seroval`…) bun couldn't resolve at runtime — the reason we
// used to need node. node still runs the same self-contained output for the Linux .deb. // used to need node. node still runs the same self-contained output for the Linux .deb.
noExternals: true, noExternals: true,