Files
punktfunk/design/hdr-pipeline-plan.md
T
enricobuehler 7b99b41ede docs(design): trim shipped plans, consolidate cluster, add index
Much of design/ described work that has since shipped. Trim each doc to
its durable rationale + still-open items (the code is the source of truth
for shipped detail; git history holds the full originals).

- Shipped plans -> status stubs: stats-capture, gamestream-host-plan,
  apple-stage2-presenter, windows-service.
- Trimmed completed-out / open-kept: implementation-plan, hdr-pipeline,
  host-latency, gpu-contention (fixed stale status table), game-library,
  linux-setup (fixed m0->spike + stale zero-copy claim),
  session-aware-host-followups, windows-client-bootstrap,
  windows-dualsense-{scoping,game-detection}, windows-virtual-display,
  security-review (per-finding status table; #12 still open),
  apollo-comparison (shipped backlog collapsed to one-liners).
- Windows-host cluster consolidated: windows-host.md -> redirect into
  windows-host-rewrite.md (whose stale scorecard is corrected -- goal1 is
  merged, M4 done); windows-secure-desktop.md archived (now a fallback
  behind IDD-push primary).
- Kept evergreen: ci.md, gamescope-multiuser.md, windows-build-and-packaging.md.
- New design/README.md: per-doc status table + consolidated open-items
  roll-up so nothing is tracked in only one buried doc.
- Repoint 5 code comments to the archived secure-desktop doc path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:39:06 +00:00

9.5 KiB
Raw Permalink Blame History

HDR pipeline — investigation & implementation plan

Status: Steps 03 SHIPPED — protocol/ABI/host in 3526517, client apply + display capability-gate in 551012b. Windows HDR live-validated; Apple/Android CI-compiled (on-glass pending). Step 4 (Linux) is OPEN, blocked upstream on capture. This doc is trimmed to design rationale + open items; the shipped code is the source of truth. The original audit (full gap list, per-file line refs, blocker walkthroughs) is in git history before this trim.

Goal: true, correct HDR glass-to-glass for punktfunk, across the host (Windows today; Linux blocked upstream) and every client (Windows / Apple / Android / Linux). The plan was produced from a full read of every HDR-touching subsystem cross-checked against the HDR10 standards (CICP/H.273 VUI, SMPTE ST.2086 mastering, CEA-861.3 MaxCLL/MaxFALL) and the Sunshine/Apollo/Moonlight reference.

The original diagnosis: pixel math + the HEVC VUI we emitted were already correct (self-test validated, matches Apollo), but nothing measured, signalled, transported, or applied the static HDR metadata (mastering display colour volume + content light level). The fix was a metadata chain, protocol-first.

What shipped (Steps 03)

  • Step 0 — protocol + ABI carry colour end to end (3526517). ColorInfo (4 CICP bytes on Welcome) + HdrMeta (0xCE datagram, bounds-checked); NativeClient color/bit_depth fields
    • an HdrMeta receiver/demux + next_hdr_meta. C ABI: punktfunk_connect_ex5(... video_caps), next_hdr_meta, color_info, and fixed abi.rs:896 video_caps = 0 — the one-line root cause that had made Apple's complete (and correct) HDR pipeline dead code. Header regenerated. No rendering changes, CI-testable (round-trip + truncation + SDR back-compat).
  • Step 1 — host in-band SEI + complete VUI (3526517, live-validated on the RTX box). Cross-platform byte logic in unit-tested src/hdr.rs: hdr_meta_from_display, hevc_mastering_display_sei SEI type 137, hevc_content_light_level_sei SEI type 144 (note: NOT "type 4" — that was a drafting error). Windows dxgi.rs/wgc.rs read IDXGIOutput6::GetDesc1 at capture init / output change → HdrMeta (MaxCLL/MaxFALL left 0, like Apollo); nvenc.rs attaches mastering + CLL SEI on every IDR for HEVC/H.264 and sends the real 0xCE re-sent each keyframe. In-band SEI is read directly by decoders, so this fixed correctness before clients consumed the protocol and gave an Apollo on-glass parity gate. Follow-ups: AV1 mastering rides METADATA OBUs (HDR_MDCV/HDR_CLL), not SEI; the Windows secure-desktop relay still sends only the generic baseline 0xCE (the helper's in-band SEI carries the real grade).
  • Step 2 — clients apply the metadata (551012b; Apple/Android CI-compiled). Each client drains next_hdr_meta/nextHdrMeta and remaps from the wire form (ST.2086 G,B,R order, mastering luminance in 0.0001 cd/m²) to the platform layout: Windows SetHDRMetaData (hdr_meta_to_dxgi: G,B,R→R,G,B reorder, 0.0001-nit→nit), dropping the 1000/1000/400 hardcode; Apple CVBufferSetAttachment of kCVImageBufferMasteringDisplayColorVolumeKey (24-byte BE) + kCVImageBufferContentLightLevelInfoKey (4-byte BE) per HDR pixel buffer — the correct path for the itur_2100_PQ layer (CAEDRMetadata on a PQ layer is ambiguous, deliberately avoided); Android MediaFormat KEY_HDR_STATIC_INFO, a 25-byte CTA-861.3 Type-1 blob (LE, R,G,B order, max-lum in nits-u16). Apple's connect also flips to connect_ex5 advertising videoCap10Bit|videoCapHDR — the fix that resurrects Apple's previously-dead pipeline.
  • Step 3 — display-capability gate (551012b). Chosen approach: capability-gate, not an in-shader BT.2390 tone-map. Rationale: with Steps 12 the host sends correct mastering metadata so an HDR display self-tone-maps; the remaining gap is SDR displays, best fixed by not advertising HDR you can't present — the host then sends a proper BT.709 SDR stream instead of PQ the panel mis-tone-maps (washed-out/dark). No guessed curve, deterministic. Windows display_supports_hdr (any IDXGIOutput6 colour space == G2084); Apple NSScreen.maximumExtendedDynamicRangeColorComponentValue > 1 (macOS) / UIScreen.main.potentialEDRHeadroom > 1 (iOS); Android Display.getHdrCapabilities HDR10/HDR10+. Each ANDs with the user's HDR setting before advertising caps and logs when it drops to SDR.

Wire format — design decisions worth keeping

Two layers, both back-compat-safe via the established trailing-bytes / new-datagram-tag patterns.

  • (A) Per-session colorimetry — 4 trailing bytes on Welcome (offsets 60..64): colour_primaries (1=BT.709, 9=BT.2020) · transfer_characteristics (1=BT.709, 16=PQ/SMPTE2084, 18=HLG) · matrix_coeffs (1=BT.709, 9=BT.2020-NCL — never emit 10 (CL): no client decodes it) · video_full_range_flag. Decoded with b.get(60).unwrap_or(1) so an older host that omits them → BT.709 limited SDR (today's behaviour). A future mirror on Reconfigured announces a mid-stream SDR↔HDR / BT.709↔BT.2020 flip (deferred; today a mode switch never changes colour, and 0xCE re-send covers mastering changes).
  • (B) Per-change mastering + CLL — host→client datagram tag 0xCE, 28 bytes, standard SEI fixed-point (display primaries G,B,R + white point in 1/50000 units; max/min mastering luminance in 0.0001 cd/m²; MaxCLL/MaxFALL in nits). ST.2086 is variable, so it rides a datagram rather than the Welcome. Re-sent on every IDR/RFI keyframe so a client that dropped the best-effort datagram converges within a GOP; until first receipt the client uses the Welcome transfer + a documented generic default. Bounds-check length before reading (reassembler-bounds security invariant — truncation test required). Omitted entirely for HLG. Units map straight to DXGI DXGI_HDR_METADATA_HDR10, Android KEY_HDR_STATIC_INFO, Apple CAEDRMetadata.hdr10; the libavcodec/Linux side needs conversion — AVMasteringDisplayMetadata stores AVRational, not raw fixed-point.

Out of scope (accepted — call out, don't build)

  • Dynamic metadata: HDR10+ (ST.2094-40) and Dolby Vision RPU. We handle static ST.2086 only, with mid-stream changes carried by re-sending the static block.
  • HLG: the transfer enum carries 18 from day one (free), but the 0xCE mastering datagram is omitted for HLG (scene-referred, no mastering metadata).

Step 4 — Linux (last; capture blocked upstream) — OPEN

  • (4a) 8-bit→Main10 NVENC upconvert shim (encode/linux.rs) — Main10 transport with correct VUI/SEI without HDR capture (gated so we don't claim HDR transfer on SDR content).
  • Linux encode colour + side-data (the deferred Step 1c): set color_primaries/trc/colorspace/range from the negotiated ColorInfo and attach AV_FRAME_DATA_MASTERING_DISPLAY_METADATA / CONTENT_LIGHT_LEVEL side-data (with the AVRational conversion) in encode/linux.rs + vaapi.rs — only once the encoder actually produces 10-bit, so the signalling matches the bits. (Linux capture is 8-bit only, so signalling BT.2020 PQ + attaching mastering side-data on a downconverted 8-bit stream would be incorrect — hence deferred out of Step 1.)
  • (4b) True 10-bit capture: offer ABGR2101010/P010 PipeWire formats + read colorimetry; pilot on Sway/wlroots; blocked on gamescope #2126 (portals don't wire PipeWire 1.6 BT.2020/PQ). Don't block the rest of the plan on it.
  • (4c) Linux client: ex5 caps, P010 decode, GdkDmabufTexture CICP from Welcome, wp_color_management when GTK ≥ 4.14. (Also a standalone SDR bug: software path applies BT.601 to BT.709 — needs a BT.601→BT.709 sws + texture color_state.)

Deferred validation (need on-glass / the RTX box)

  • The mid-session Reconfigure "downgrade to SDR" for a monitor move HDR↔SDR.
  • Confirm the host produces SDR for an SDR client even off an HDR desktop — on the native path the per-session SudoVDA follows the negotiated depth (SDR client → SDR virtual display → SDR stream), so it should hold end to end; verify the stale-HDR-SudoVDA edge case.

Open questions

  • MaxCLL source: GetDesc1 doesn't expose it (Apollo zeroes). Static default, or measure per-frame peak in the PQ shader (only truly-correct, adds a readback)?
  • GameStream: implement SS_HDR_METADATA (Moonlight SS_HDR_METADATA blob on the ENet control channel) for parity, or keep it deliberately SDR and steer HDR users to punktfunk/1?
  • HLG: carry the enum from day one (free) — but do any sources actually produce HLG?
  • Linux: is shipping the 8-bit→Main10 shim as "HDR-capable transport" acceptable, or does it risk advertising HDR we can't truly deliver?

Ordering rationale

Step 0 first: it's the keystone (metadata transport is the dominant cross-cutter; the ABI video_caps = 0 line is a one-line root cause) and needs no hardware. Step 1 next: in-band SEI is read directly by decoders, so it fixes correctness even before our clients consume the protocol, and gives an Apollo-parity on-glass gate. Steps 23 are mechanical per-client wiring once metadata flows. Linux is last because capture is gated on upstream we don't control; the shim delivers Main10 transport without that dependency.

Hardware dependencies: Step 0 = none (CI); Step 1 = RTX Windows host; Steps 23 = a real HDR display per platform; Step 4 = a Linux GPU box + HDR-capable Wayland compositor.