4afdb18cc4da519e0a04fb8a43e665e1575cd6f4
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
aef552f04a |
feat(host/windows): HDR scRGB→P010 in a shader — NVENC native P010, off the SM
apple / swift (push) Successful in 55s
deb / build-publish (push) Successful in 3m9s
decky / build-publish (push) Successful in 13s
ci / rust (push) Successful in 1m14s
ci / web (push) Successful in 30s
ci / docs-site (push) Successful in 30s
windows-host / package (push) Failing after 2m19s
android / android (push) Successful in 3m12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
ci / bench (push) Successful in 4m38s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m42s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m47s
docker / deploy-docs (push) Successful in 18s
On the Windows WGC HDR path the FP16 scRGB capture was fed to NVENC as R10G10B10A2 (BT.2020 PQ), and NVENC did the RGB→YUV CSC internally on the contended SM — adding to the encode_ms wall under a GPU-saturating game. (NVIDIA's D3D11 VideoProcessor can't do RGB→P010 for HDR; that path renders green, confirmed live — so the convert must be ours.) New `HdrP010Converter` fuses the tone-map with the BT.2020 RGB→YUV matrix and emits P010 (10-bit limited range) directly: a luma pass → an R16_UNORM plane RTV (full-res) and a chroma pass → an R16G16_UNORM plane RTV (half-res, 2x2 box average) of a DXGI_FORMAT_P010 texture. NVENC then takes native P010 and skips its SM-side convert. Gated behind PUNKTFUNK_HDR_SHADER_P010 (default OFF → the existing R10→NVENC path is byte-for-byte unchanged). Colour validated by a new `hdr-p010-selftest` subcommand: a synthetic scRGB pattern → P010 → readback, compared to a BT.2020 PQ 10-bit reference — max abs error Y=0.99 / Cb=0.82 / Cr=0.75 codes on an RTX 4090. Live-validated HDR colours correct (no green). Build + clippy (--features nvenc -D warnings) green on x86_64-pc-windows-msvc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4cc57d5c39 |
perf(host/windows): move capture→encode off the 3D engine (NV12/P010 video-processor path, zero-copy, GPU priority)
apple / swift (push) Successful in 56s
ci / rust (push) Successful in 1m36s
android / android (push) Successful in 1m56s
ci / web (push) Successful in 27s
ci / docs-site (push) Successful in 28s
deb / build-publish (push) Successful in 2m26s
decky / build-publish (push) Successful in 11s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 5s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 5s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m33s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m58s
The Windows host capped at ~60 fps with 35-40 ms latency on a GPU-heavy game: the per-frame capture→encode path shared the 3D engine with the game and got scheduled behind it. Rework to minimize 3D-engine work per frame: - VideoConverter (D3D11 video processor): capture → NVENC-native NV12/P010 so NVENC skips its internal RGB→YUV (a 3D/compute step). Wired into both DDA (dxgi.rs) and WGC (wgc.rs). New PixelFormat::Nv12/P010 + NVENC YUV input. - GPU scheduling hardening (Apollo-style): D3DKMTSetProcessSchedulingPriorityClass HIGH, absolute SetGPUThreadPriority, SetMaximumFrameLatency(1). - WGC SDR zero-copy (hold pool frames; no CopyResource). DDA keeps a fast CopyResource to decouple its single-frame acquire/release from the async convert. - Pipelined helper encode loop (PUNKTFUNK_ENCODE_DEPTH, default 1) + perf split (cap_wait / encode / write). Live on the RTX 4090: hard 60 fps ceiling removed (now scene-scaling 40-200+), latency much reduced. Residual cap in GPU-pinned scenes is the irreducible RGB→YUV convert (no fixed-function unit on NVIDIA — VideoProcessing engine reads 0%) waiting behind an uncapped game under WDDM context time-slicing; Linux avoids it via gamescope capping the game to the display refresh. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
f469dfcc76 |
chore(host/windows): clean up DDA capture — fix unused imports, quiet secure-desktop log, sane retry default
- Remove 4 unused imports (PCWSTR in composed_flip, anyhow macro + SizeInt32 in wgc, Write in wgc_relay). - DuplicateOutput1 retry defaults to N=1 (immediate legacy): on the secure desktop DuplicateOutput1 is LOGON_UI-only so it always refuses, and the release-before-reduplicate + gentle recovery keep the legacy dup stable; retrying there only blocked. Still env-tunable (PUNKTFUNK_DUP_RETRY_N/_MS). - Throttle the 'using legacy DuplicateOutput' warning (expected + once-per-gentle- recovery on secure) so a lock dwell doesn't flood the log. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
5c2bcbc2a2 |
docs(windows): secure-desktop two-process design + WGC impersonation attempt (vestigial)
apple / swift (push) Successful in 55s
android / android (push) Has been cancelled
ci / rust (push) Has been cancelled
ci / web (push) Has been cancelled
ci / docs-site (push) Has been cancelled
ci / bench (push) Has been cancelled
deb / build-publish (push) Has been cancelled
decky / build-publish (push) Has been cancelled
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Has been cancelled
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Has been cancelled
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Has been cancelled
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Has been cancelled
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Has been cancelled
docker / deploy-docs (push) Has been cancelled
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Has been cancelled
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Has been cancelled
Validated design for adding secure-desktop (UAC/lock/login) coverage on top of the shipped WGC animation fix. Key verified constraint: WGC won't activate under SYSTEM (0x80070424) even with thread-level ImpersonateLoggedOnUser, and DDA+SendInput on Winlogon need LOCAL_SYSTEM — so one process can't do both. Architecture: SYSTEM host (QUIC + SudoVDA + DDA-secure + SendInput + AU mux) + a USER-session WGC helper (CreateProcessAsUser) that relays encoded Annex-B AUs over a named pipe; the host muxes helper-AUs (normal desktop) vs its own DDA encoder (secure desktop), switched by a desktop-name watcher. No shared GPU texture (rejected — MIC/keyed-mutex pain); just AU bytes. docs/windows-secure-desktop.md has the ordered, box-testable steps. The impersonate_active_user() in wgc.rs is kept as a harmless no-op (under a user-token process WTSQueryUserToken fails → no impersonation → WGC works natively); it does NOT make WGC work under SYSTEM (the two-process design uses a real user process for WGC instead). + Win32_System_RemoteDesktop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
28ab448a29 |
feat(host/windows): WGC capture backend (overlay/HDR-correct) with watchdog'd DDA fallback
android / android (push) Failing after 46s
apple / swift (push) Successful in 54s
ci / rust (push) Failing after 1m16s
ci / web (push) Successful in 31s
ci / docs-site (push) Successful in 27s
deb / build-publish (push) Successful in 2m23s
decky / build-publish (push) Successful in 10s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 6s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 5s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 4s
ci / bench (push) Successful in 4m31s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m15s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m50s
The capture-architecture reset from the research: add a Windows.Graphics.Capture (WGC) backend that captures the COMPOSED desktop — including the overlay/independent-flip/MPO planes DXGI Desktop Duplication misses — which structurally fixes the frozen HDR animations + video (proven live: a WGC frame decodes to the real 5120x1440 HDR content DDA freezes on). It reuses the whole pipeline unchanged: the WGC frame's GPU texture → same scRGB→BT.2020-PQ shader → NVENC zero-copy; the OS composites the cursor (IsCursorCaptureEnabled) so no manual cursor pass. crates/punktfunk-host/src/ capture/wgc.rs; find_output/make_device/HdrConverter/nudge_cursor_onto made pub(crate) for reuse. Reliability findings + mitigations (live on the RTX 4090): - WGC can't activate under the SYSTEM account (0x80070424) — it needs the interactive user token. The host must run as the user for WGC (run.cmd: drop PsExec -s). DDA still needs SYSTEM for the secure desktop — that token reconciliation (impersonation) is the remaining task. - WGC's Direct3D11CaptureFramePool::CreateFreeThreaded intermittently HANGS on the headless SudoVDA (IddCx) display, correlated with accumulated SudoVDA churn (failed REMOVEs leaving lingering displays); clean-state opens reliably. Since it's a blocking hang, capture_virtual_output runs WGC open on a watchdog thread with a 5s timeout and falls back to DDA on hang/error — the session is NEVER left black: WGC when it opens (fixed animations), DDA otherwise. First-frame nudge added (WGC fires FrameArrived on change; a static desktop otherwise never delivers the first frame). - Default WGC; PUNKTFUNK_CAPTURE=dda forces DDA. DDA path unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |