Commit Graph

5 Commits

Author SHA1 Message Date
enricobuehler aef552f04a feat(host/windows): HDR scRGB→P010 in a shader — NVENC native P010, off the SM
apple / swift (push) Successful in 55s
deb / build-publish (push) Successful in 3m9s
decky / build-publish (push) Successful in 13s
ci / rust (push) Successful in 1m14s
ci / web (push) Successful in 30s
ci / docs-site (push) Successful in 30s
windows-host / package (push) Failing after 2m19s
android / android (push) Successful in 3m12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
ci / bench (push) Successful in 4m38s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m42s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m47s
docker / deploy-docs (push) Successful in 18s
On the Windows WGC HDR path the FP16 scRGB capture was fed to NVENC as
R10G10B10A2 (BT.2020 PQ), and NVENC did the RGB→YUV CSC internally on the
contended SM — adding to the encode_ms wall under a GPU-saturating game.
(NVIDIA's D3D11 VideoProcessor can't do RGB→P010 for HDR; that path renders
green, confirmed live — so the convert must be ours.)

New `HdrP010Converter` fuses the tone-map with the BT.2020 RGB→YUV matrix and
emits P010 (10-bit limited range) directly: a luma pass → an R16_UNORM plane
RTV (full-res) and a chroma pass → an R16G16_UNORM plane RTV (half-res, 2x2
box average) of a DXGI_FORMAT_P010 texture. NVENC then takes native P010 and
skips its SM-side convert.

Gated behind PUNKTFUNK_HDR_SHADER_P010 (default OFF → the existing
R10→NVENC path is byte-for-byte unchanged). Colour validated by a new
`hdr-p010-selftest` subcommand: a synthetic scRGB pattern → P010 → readback,
compared to a BT.2020 PQ 10-bit reference — max abs error Y=0.99 / Cb=0.82 /
Cr=0.75 codes on an RTX 4090. Live-validated HDR colours correct (no green).
Build + clippy (--features nvenc -D warnings) green on x86_64-pc-windows-msvc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:54:00 +00:00
enricobuehler 4cc57d5c39 perf(host/windows): move capture→encode off the 3D engine (NV12/P010 video-processor path, zero-copy, GPU priority)
apple / swift (push) Successful in 56s
ci / rust (push) Successful in 1m36s
android / android (push) Successful in 1m56s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m33s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m58s
The Windows host capped at ~60 fps with 35-40 ms latency on a GPU-heavy game:
the per-frame capture→encode path shared the 3D engine with the game and got
scheduled behind it. Rework to minimize 3D-engine work per frame:

- VideoConverter (D3D11 video processor): capture → NVENC-native NV12/P010 so
  NVENC skips its internal RGB→YUV (a 3D/compute step). Wired into both DDA
  (dxgi.rs) and WGC (wgc.rs). New PixelFormat::Nv12/P010 + NVENC YUV input.
- GPU scheduling hardening (Apollo-style): D3DKMTSetProcessSchedulingPriorityClass
  HIGH, absolute SetGPUThreadPriority, SetMaximumFrameLatency(1).
- WGC SDR zero-copy (hold pool frames; no CopyResource). DDA keeps a fast
  CopyResource to decouple its single-frame acquire/release from the async convert.
- Pipelined helper encode loop (PUNKTFUNK_ENCODE_DEPTH, default 1) + perf split
  (cap_wait / encode / write).

Live on the RTX 4090: hard 60 fps ceiling removed (now scene-scaling 40-200+),
latency much reduced. Residual cap in GPU-pinned scenes is the irreducible RGB→YUV
convert (no fixed-function unit on NVIDIA — VideoProcessing engine reads 0%) waiting
behind an uncapped game under WDDM context time-slicing; Linux avoids it via
gamescope capping the game to the display refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:08:03 +00:00
enricobuehler f469dfcc76 chore(host/windows): clean up DDA capture — fix unused imports, quiet secure-desktop log, sane retry default
- Remove 4 unused imports (PCWSTR in composed_flip, anyhow macro + SizeInt32 in
  wgc, Write in wgc_relay).
- DuplicateOutput1 retry defaults to N=1 (immediate legacy): on the secure
  desktop DuplicateOutput1 is LOGON_UI-only so it always refuses, and the
  release-before-reduplicate + gentle recovery keep the legacy dup stable;
  retrying there only blocked. Still env-tunable (PUNKTFUNK_DUP_RETRY_N/_MS).
- Throttle the 'using legacy DuplicateOutput' warning (expected + once-per-gentle-
  recovery on secure) so a lock dwell doesn't flood the log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:05:02 +00:00
enricobuehler 5c2bcbc2a2 docs(windows): secure-desktop two-process design + WGC impersonation attempt (vestigial)
apple / swift (push) Successful in 55s
android / android (push) Has been cancelled
ci / rust (push) Has been cancelled
ci / web (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
deb / build-publish (push) Has been cancelled
decky / build-publish (push) Has been cancelled
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Has been cancelled
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Has been cancelled
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Has been cancelled
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Has been cancelled
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Has been cancelled
docker / deploy-docs (push) Has been cancelled
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Has been cancelled
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Has been cancelled
Validated design for adding secure-desktop (UAC/lock/login) coverage on top of the shipped WGC
animation fix. Key verified constraint: WGC won't activate under SYSTEM (0x80070424) even with
thread-level ImpersonateLoggedOnUser, and DDA+SendInput on Winlogon need LOCAL_SYSTEM — so one
process can't do both. Architecture: SYSTEM host (QUIC + SudoVDA + DDA-secure + SendInput + AU mux)
+ a USER-session WGC helper (CreateProcessAsUser) that relays encoded Annex-B AUs over a named pipe;
the host muxes helper-AUs (normal desktop) vs its own DDA encoder (secure desktop), switched by a
desktop-name watcher. No shared GPU texture (rejected — MIC/keyed-mutex pain); just AU bytes.
docs/windows-secure-desktop.md has the ordered, box-testable steps.

The impersonate_active_user() in wgc.rs is kept as a harmless no-op (under a user-token process
WTSQueryUserToken fails → no impersonation → WGC works natively); it does NOT make WGC work under
SYSTEM (the two-process design uses a real user process for WGC instead). + Win32_System_RemoteDesktop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 07:08:50 +00:00
enricobuehler 28ab448a29 feat(host/windows): WGC capture backend (overlay/HDR-correct) with watchdog'd DDA fallback
android / android (push) Failing after 46s
apple / swift (push) Successful in 54s
ci / rust (push) Failing after 1m16s
ci / web (push) Successful in 31s
ci / docs-site (push) Successful in 27s
deb / build-publish (push) Successful in 2m23s
decky / build-publish (push) Successful in 10s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m31s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m50s
The capture-architecture reset from the research: add a Windows.Graphics.Capture (WGC) backend that
captures the COMPOSED desktop — including the overlay/independent-flip/MPO planes DXGI Desktop
Duplication misses — which structurally fixes the frozen HDR animations + video (proven live: a WGC
frame decodes to the real 5120x1440 HDR content DDA freezes on). It reuses the whole pipeline
unchanged: the WGC frame's GPU texture → same scRGB→BT.2020-PQ shader → NVENC zero-copy; the OS
composites the cursor (IsCursorCaptureEnabled) so no manual cursor pass. crates/punktfunk-host/src/
capture/wgc.rs; find_output/make_device/HdrConverter/nudge_cursor_onto made pub(crate) for reuse.

Reliability findings + mitigations (live on the RTX 4090):
- WGC can't activate under the SYSTEM account (0x80070424) — it needs the interactive user token. The
  host must run as the user for WGC (run.cmd: drop PsExec -s). DDA still needs SYSTEM for the secure
  desktop — that token reconciliation (impersonation) is the remaining task.
- WGC's Direct3D11CaptureFramePool::CreateFreeThreaded intermittently HANGS on the headless SudoVDA
  (IddCx) display, correlated with accumulated SudoVDA churn (failed REMOVEs leaving lingering
  displays); clean-state opens reliably. Since it's a blocking hang, capture_virtual_output runs WGC
  open on a watchdog thread with a 5s timeout and falls back to DDA on hang/error — the session is
  NEVER left black: WGC when it opens (fixed animations), DDA otherwise. First-frame nudge added (WGC
  fires FrameArrived on change; a static desktop otherwise never delivers the first frame).
- Default WGC; PUNKTFUNK_CAPTURE=dda forces DDA. DDA path unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 06:32:54 +00:00