feat(hdr): Windows HDR10 + 10-bit end-to-end, negotiated; non-blocking capture recovery
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m32s
android / android (push) Successful in 1m49s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 30s
ci / bench (push) Successful in 1m36s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
deb / build-publish (push) Successful in 2m20s
flatpak / build-publish (push) Successful in 4m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m11s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m32s

Adds true HDR (BT.2020 PQ) and 10-bit (HEVC Main10) streaming, negotiated so an
8-bit/SDR client is never sent a stream it can't decode, plus a robust fix for the
capture losing the stream across a secure-desktop transition.

Protocol (punktfunk-core/quic.rs):
- Hello gains `video_caps` (VIDEO_CAP_10BIT / VIDEO_CAP_HDR), Welcome gains `bit_depth`,
  both as optional trailing bytes (back-compat). client-rs advertises 10-bit via
  PUNKTFUNK_CLIENT_10BIT; the connector advertises 0 for now (in-band detection drives
  the native clients). Regenerated punktfunk_core.h.

Windows host:
- 10-bit Main10: host enables it only when the client advertised VIDEO_CAP_10BIT AND
  PUNKTFUNK_10BIT is set; threaded through open_video → NVENC (profile Main10,
  pixelBitDepthMinus8).
- HDR: when the captured desktop is scRGB FP16 (R16G16B16A16_FLOAT, HDR on), copy it to
  an FP16 surface, composite the cursor there, convert scRGB → BT.2020 PQ 10-bit
  (R10G10B10A2) via a shader, and encode HEVC Main10 with the BT.2020/PQ colour VUI
  (ABGR10 input). Fixes the freeze + cursor-trail that came from feeding FP16 into the
  BGRA path. Reacts dynamically to the HDR toggle.
- Capture recovery: rebuild is now a single NON-BLOCKING attempt, throttled to ~4×/s,
  repeating the last good frame between attempts (format-tagged last_present). During a
  secure-desktop dwell SudoVDA's output is gone; the old blocking 12 s retry starved the
  send loop for seconds so the client timed out and disconnected — now the session stays
  fed (frozen) until the desktop returns. Also seeds a black frame on recovery.

Apple client (PunktfunkKit):
- Detects HDR in-band from the stream VUI (PQ transfer function), decodes to 10-bit P010,
  and presents via an rgba16Float + BT.2020 PQ CAMetalLayer with EDR; SDR path unchanged.
  Switches automatically on a mid-session HDR toggle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-15 20:28:52 +00:00
parent f5eae24c87
commit bbabc04bca
19 changed files with 785 additions and 129 deletions
@@ -7,6 +7,7 @@
// (which fires on the main runloop). The Metal objects + texture cache are touched only here.
#if canImport(Metal) && canImport(QuartzCore)
import CoreGraphics
import CoreVideo
import Metal
import QuartzCore
@@ -44,6 +45,27 @@ fragment float4 pf_frag(VOut in [[stage_in]],
float b = y + 1.8556 * u;
return float4(saturate(float3(r, g, b)), 1.0);
}
// HDR: 10-bit P010 (BT.2020, limited range), Y'CbCr that is PQ-encoded. We apply the BT.2020
// matrix to get PQ-encoded R'G'B' and output it as-is — the CAMetalLayer's itur_2100_PQ colour
// space + EDR tells the compositor the samples are PQ, so it does the PQ→display mapping. No EOTF
// here (matching the host, which emitted BT.2020 PQ). P010 stores the 10-bit code in the high bits
// of each 16-bit sample, so an .r16Unorm sample reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
fragment float4 pf_frag_hdr(VOut in [[stage_in]],
texture2d<float> lumaTex [[texture(0)]],
texture2d<float> chromaTex [[texture(1)]]) {
constexpr sampler s(filter::linear, address::clamp_to_edge);
float y = lumaTex.sample(s, in.uv).r;
float2 c = chromaTex.sample(s, in.uv).rg;
// BT.2020 10-bit limited (video) range → full-range PQ R'G'B'.
y = (y - 64.0/1023.0) * (1023.0/876.0);
float u = (c.x - 512.0/1023.0) * (1023.0/896.0);
float v = (c.y - 512.0/1023.0) * (1023.0/896.0);
float r = y + 1.4746 * v;
float g = y - 0.16455 * u - 0.57135 * v;
float b = y + 1.8814 * u;
return float4(saturate(float3(r, g, b)), 1.0);
}
"""
public final class MetalVideoPresenter {
@@ -52,8 +74,13 @@ public final class MetalVideoPresenter {
private let device: MTLDevice
private let queue: MTLCommandQueue
private let pipeline: MTLRenderPipelineState
/// SDR (BT.709 8-bit NV12 bgra8) and HDR (BT.2020 PQ 10-bit P010 rgba16Float) pipelines.
/// Selected per frame by `render`; the layer is reconfigured when the mode flips (HDR toggle).
private let pipelineSDR: MTLRenderPipelineState
private let pipelineHDR: MTLRenderPipelineState
private var textureCache: CVMetalTextureCache?
/// Current layer configuration switched lazily in `configure(hdr:)` when a frame's mode differs.
private var hdrActive = false
/// nil if Metal is unavailable (no GPU / a headless CI) the caller falls back to stage-1.
public init?() {
@@ -64,11 +91,17 @@ public final class MetalVideoPresenter {
self.queue = queue
do {
let library = try device.makeLibrary(source: shaderSource, options: nil)
let desc = MTLRenderPipelineDescriptor()
desc.vertexFunction = library.makeFunction(name: "pf_vtx")
desc.fragmentFunction = library.makeFunction(name: "pf_frag")
desc.colorAttachments[0].pixelFormat = .bgra8Unorm
pipeline = try device.makeRenderPipelineState(descriptor: desc)
let vtx = library.makeFunction(name: "pf_vtx")
let sdr = MTLRenderPipelineDescriptor()
sdr.vertexFunction = vtx
sdr.fragmentFunction = library.makeFunction(name: "pf_frag")
sdr.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineSDR = try device.makeRenderPipelineState(descriptor: sdr)
let hdr = MTLRenderPipelineDescriptor()
hdr.vertexFunction = vtx
hdr.fragmentFunction = library.makeFunction(name: "pf_frag_hdr")
hdr.colorAttachments[0].pixelFormat = .rgba16Float // EDR-capable
pipelineHDR = try device.makeRenderPipelineState(descriptor: hdr)
} catch {
return nil
}
@@ -102,14 +135,40 @@ public final class MetalVideoPresenter {
if layer.drawableSize != size { layer.drawableSize = size }
}
/// Draw one decoded frame to the next drawable and present it. Returns true on success;
/// Reconfigure the layer for SDR or HDR when the stream mode flips (HDR toggle). HDR uses an
/// rgba16Float drawable + a BT.2020 PQ colour space + EDR, so the compositor PQ-maps to the
/// display; SDR uses the plain 8-bit sRGB path. Main-thread only (called from `render`).
private func configure(hdr: Bool) {
guard hdr != hdrActive else { return }
hdrActive = hdr
if hdr {
layer.pixelFormat = .rgba16Float
layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
#if os(macOS)
layer.wantsExtendedDynamicRangeContent = true
#endif
} else {
layer.pixelFormat = .bgra8Unorm
layer.colorspace = nil
#if os(macOS)
layer.wantsExtendedDynamicRangeContent = false
#endif
}
}
/// Draw one decoded frame to the next drawable and present it. `isHDR` selects the 10-bit
/// BT.2020 PQ path (P010 input) vs the 8-bit BT.709 path (NV12 input). Returns true on success;
/// false when there's no drawable yet, a texture couldn't be made, or Metal errored the
/// caller then doesn't stamp a present for this frame.
@discardableResult
public func render(_ pixelBuffer: CVPixelBuffer) -> Bool {
public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool {
configure(hdr: isHDR)
// P010 stores 10-bit luma/chroma in 16-bit samples R16/RG16; NV12 is 8-bit R8/RG8.
let lumaFmt: MTLPixelFormat = isHDR ? .r16Unorm : .r8Unorm
let chromaFmt: MTLPixelFormat = isHDR ? .rg16Unorm : .rg8Unorm
guard let textureCache,
let luma = makeTexture(pixelBuffer, plane: 0, format: .r8Unorm, cache: textureCache),
let chroma = makeTexture(pixelBuffer, plane: 1, format: .rg8Unorm, cache: textureCache)
let luma = makeTexture(pixelBuffer, plane: 0, format: lumaFmt, cache: textureCache),
let chroma = makeTexture(pixelBuffer, plane: 1, format: chromaFmt, cache: textureCache)
else { return false }
// The hosting view owns drawableSize (aspect-fit to its bounds); skip until it's laid
@@ -127,7 +186,7 @@ public final class MetalVideoPresenter {
guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
return false
}
encoder.setRenderPipelineState(pipeline)
encoder.setRenderPipelineState(isHDR ? pipelineHDR : pipelineSDR)
encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
@@ -144,7 +144,7 @@ public final class Stage2Pipeline {
/// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
public func renderTick(targetPresentNs: Int64) {
guard let frame = ring.take() else { return }
guard presenter.render(frame.pixelBuffer) else { return }
guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return }
presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
}
@@ -17,8 +17,11 @@ public struct ReadyFrame: @unchecked Sendable {
public let ptsNs: UInt64
/// Client `CLOCK_REALTIME` instant decode completed, in nanoseconds.
public let decodedNs: Int64
/// The decoded image (NV12 biplanar, Metal-compatible).
/// The decoded image 8-bit NV12 biplanar (SDR) or 10-bit P010 biplanar (HDR), Metal-compatible.
public let pixelBuffer: CVPixelBuffer
/// True when the stream is HDR (BT.2020 PQ): the buffer is 10-bit P010 and the presenter must
/// configure EDR + BT.2020 PQ output. Derived from the decoded buffer's pixel format.
public let isHDR: Bool
}
/// The C output callback can't capture context, so VideoToolbox hands it the refcon we set at
@@ -116,8 +119,22 @@ public final class VideoDecoder: @unchecked Sendable {
format = nil
}
/// `lock` held. Replace the session with one for `newFormat`. NV12 video-range, Metal-
/// compatible output (10-bit/HDR is a later tie-in see the plan).
/// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function i.e. the host
/// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the
/// HEVC VUI, so this tracks the *stream*, switching dynamically when the user toggles HDR
/// (the host re-emits parameter sets with the new VUI a new format desc session rebuild).
static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool {
guard
let tf = CMFormatDescriptionGetExtension(
format, extensionKey: kCMFormatDescriptionExtension_TransferFunction)
else { return false }
let s = tf as? String
return s == (kCMFormatDescriptionTransferFunction_SMPTE_ST_2084_PQ as String)
|| s == (kCMFormatDescriptionTransferFunction_ITU_R_2100_HLG as String)
}
/// `lock` held. Replace the session with one for `newFormat`. SDR streams decode to 8-bit NV12;
/// HDR streams (BT.2020 PQ) decode to 10-bit P010 so the presenter can drive EDR.
private func createSessionLocked(format newFormat: CMVideoFormatDescription) -> Bool {
if let session {
VTDecompressionSessionWaitForAsynchronousFrames(session)
@@ -126,10 +143,14 @@ public final class VideoDecoder: @unchecked Sendable {
session = nil
format = nil
let hdr = Self.isHDRFormat(newFormat)
let pixelFormat =
hdr
? kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // P010 (10-bit)
: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // NV12 (8-bit)
let imageAttrs: [CFString: Any] = [
kCVPixelBufferMetalCompatibilityKey: true,
kCVPixelBufferPixelFormatTypeKey:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
kCVPixelBufferPixelFormatTypeKey: pixelFormat,
]
var callback = VTDecompressionOutputCallbackRecord(
decompressionOutputCallback: decoderOutputCallback,
@@ -160,6 +181,11 @@ public final class VideoDecoder: @unchecked Sendable {
// pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
let ptsNs = p.value > 0 ? UInt64(p.value) : 0
onDecoded(ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer))
// HDR iff the decoder produced a 10-bit P010 buffer (we only request P010 for PQ streams).
let isHDR =
CVPixelBufferGetPixelFormatType(imageBuffer)
== kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
onDecoded(
ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
}
}