feat(hdr): Windows HDR10 + 10-bit end-to-end, negotiated; non-blocking capture recovery
apple / swift (push) Successful in 54s
ci / rust (push) Successful in 1m32s
android / android (push) Successful in 1m49s
ci / web (push) Successful in 26s
ci / docs-site (push) Successful in 30s
ci / bench (push) Successful in 1m36s
decky / build-publish (push) Successful in 12s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 5s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 4s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 4s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 4s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 3s
deb / build-publish (push) Successful in 2m20s
flatpak / build-publish (push) Successful in 4m6s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 5m11s
docker / deploy-docs (push) Successful in 18s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 4m32s
Adds true HDR (BT.2020 PQ) and 10-bit (HEVC Main10) streaming, negotiated so an 8-bit/SDR client is never sent a stream it can't decode, plus a robust fix for the capture losing the stream across a secure-desktop transition.

Protocol (punktfunk-core/quic.rs):
- Hello gains `video_caps` (VIDEO_CAP_10BIT / VIDEO_CAP_HDR), Welcome gains `bit_depth`, both as optional trailing bytes (back-compat). client-rs advertises 10-bit via PUNKTFUNK_CLIENT_10BIT; the connector advertises 0 for now (in-band detection drives the native clients). Regenerated punktfunk_core.h.

Windows host:
- 10-bit Main10: host enables it only when the client advertised VIDEO_CAP_10BIT AND PUNKTFUNK_10BIT is set; threaded through open_video → NVENC (profile Main10, pixelBitDepthMinus8).
- HDR: when the captured desktop is scRGB FP16 (R16G16B16A16_FLOAT, HDR on), copy it to an FP16 surface, composite the cursor there, convert scRGB → BT.2020 PQ 10-bit (R10G10B10A2) via a shader, and encode HEVC Main10 with the BT.2020/PQ colour VUI (ABGR10 input). Fixes the freeze + cursor-trail that came from feeding FP16 into the BGRA path. Reacts dynamically to the HDR toggle.
- Capture recovery: rebuild is now a single NON-BLOCKING attempt, throttled to ~4×/s, repeating the last good frame between attempts (format-tagged last_present). During a secure-desktop dwell SudoVDA's output is gone; the old blocking 12 s retry starved the send loop for seconds so the client timed out and disconnected. Now the session stays fed (frozen) until the desktop returns. Also seeds a black frame on recovery.

Apple client (PunktfunkKit):
- Detects HDR in-band from the stream VUI (PQ transfer function), decodes to 10-bit P010, and presents via an rgba16Float + BT.2020 PQ CAMetalLayer with EDR; SDR path unchanged. Switches automatically on a mid-session HDR toggle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
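The capture-recovery behaviour described in the commit message (one non-blocking rebuild attempt, throttled to roughly four per second, repeating the last good frame in between) can be sketched as follows. This is a minimal illustration, not the host's code: `RebuildThrottle`, `Tick` and `recovery_tick` are hypothetical names, and the 250 ms interval is an assumption derived from "~4×/s".

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle: allows at most one rebuild attempt per `interval`.
pub struct RebuildThrottle {
    interval: Duration,
    last_attempt: Option<Instant>,
}

impl RebuildThrottle {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_attempt: None }
    }

    /// Returns true (and records `now`) if an attempt is allowed at `now`.
    pub fn should_attempt(&mut self, now: Instant) -> bool {
        match self.last_attempt {
            Some(prev) if now.duration_since(prev) < self.interval => false,
            _ => {
                self.last_attempt = Some(now);
                true
            }
        }
    }
}

/// What the send loop emits on one tick while the capture is down.
#[derive(Debug, PartialEq, Eq)]
pub enum Tick {
    /// The rebuild succeeded; a fresh frame can be captured.
    Fresh,
    /// Either throttled or the rebuild failed; resend the last good frame.
    RepeatLast,
}

/// One non-blocking recovery step: try the rebuild at most once, never wait.
pub fn recovery_tick(
    throttle: &mut RebuildThrottle,
    now: Instant,
    try_rebuild: impl FnOnce() -> bool,
) -> Tick {
    if throttle.should_attempt(now) && try_rebuild() {
        Tick::Fresh
    } else {
        Tick::RepeatLast
    }
}
```

Because `recovery_tick` never sleeps, the send loop keeps its cadence and the client keeps receiving (frozen) frames during a secure-desktop dwell, which is the property the commit message attributes to the fix.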
@@ -7,6 +7,7 @@
 // (which fires on the main runloop). The Metal objects + texture cache are touched only here.
 
 #if canImport(Metal) && canImport(QuartzCore)
+import CoreGraphics
 import CoreVideo
 import Metal
 import QuartzCore
@@ -44,6 +45,27 @@ fragment float4 pf_frag(VOut in [[stage_in]],
     float b = y + 1.8556 * u;
     return float4(saturate(float3(r, g, b)), 1.0);
 }
+
+// HDR: 10-bit P010 (BT.2020, limited range), Y'CbCr that is PQ-encoded. We apply the BT.2020
+// matrix to get PQ-encoded R'G'B' and output it as-is — the CAMetalLayer's itur_2100_PQ colour
+// space + EDR tells the compositor the samples are PQ, so it does the PQ→display mapping. No EOTF
+// here (matching the host, which emitted BT.2020 PQ). P010 stores the 10-bit code in the high bits
+// of each 16-bit sample, so an .r16Unorm sample reads ~code/1023 (the /1024 vs /1023 error is < 0.1%).
+fragment float4 pf_frag_hdr(VOut in [[stage_in]],
+                            texture2d<float> lumaTex [[texture(0)]],
+                            texture2d<float> chromaTex [[texture(1)]]) {
+    constexpr sampler s(filter::linear, address::clamp_to_edge);
+    float y = lumaTex.sample(s, in.uv).r;
+    float2 c = chromaTex.sample(s, in.uv).rg;
+    // BT.2020 10-bit limited (video) range → full-range PQ R'G'B'.
+    y = (y - 64.0/1023.0) * (1023.0/876.0);
+    float u = (c.x - 512.0/1023.0) * (1023.0/896.0);
+    float v = (c.y - 512.0/1023.0) * (1023.0/896.0);
+    float r = y + 1.4746 * v;
+    float g = y - 0.16455 * u - 0.57135 * v;
+    float b = y + 1.8814 * u;
+    return float4(saturate(float3(r, g, b)), 1.0);
+}
 """
 
 public final class MetalVideoPresenter {
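The range expansion and matrix in `pf_frag_hdr` can be checked on the CPU with integer 10-bit codes. The sketch below mirrors the shader's arithmetic in Rust; the function name `bt2020_limited_to_rgb` is hypothetical, and it takes raw codes (0..=1023) rather than normalized texture samples, so `(code - 64) / 876` stands in for the shader's `(y - 64/1023) * (1023/876)`.

```rust
/// Convert one 10-bit limited-range BT.2020 Y'CbCr triple to full-range
/// R'G'B' in [0, 1]. The output is still PQ-encoded if the input was;
/// no transfer function is applied here, matching the shader.
pub fn bt2020_limited_to_rgb(y: u16, cb: u16, cr: u16) -> [f32; 3] {
    // Limited (video) range: luma spans codes 64..=940, chroma 64..=960
    // centred on 512.
    let yf = (f32::from(y) - 64.0) / 876.0;
    let u = (f32::from(cb) - 512.0) / 896.0;
    let v = (f32::from(cr) - 512.0) / 896.0;

    // BT.2020 non-constant-luminance coefficients, as used in the shader.
    let r = yf + 1.4746 * v;
    let g = yf - 0.16455 * u - 0.57135 * v;
    let b = yf + 1.8814 * u;

    // Equivalent of Metal's saturate().
    [r.clamp(0.0, 1.0), g.clamp(0.0, 1.0), b.clamp(0.0, 1.0)]
}
```

With neutral chroma (512, 512) the three channels equal the expanded luma, so code 64 maps to black and code 940 to full-scale white.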
@@ -52,8 +74,13 @@ public final class MetalVideoPresenter {
 
     private let device: MTLDevice
     private let queue: MTLCommandQueue
-    private let pipeline: MTLRenderPipelineState
+    /// SDR (BT.709 8-bit NV12 → bgra8) and HDR (BT.2020 PQ 10-bit P010 → rgba16Float) pipelines.
+    /// Selected per frame by `render`; the layer is reconfigured when the mode flips (HDR toggle).
+    private let pipelineSDR: MTLRenderPipelineState
+    private let pipelineHDR: MTLRenderPipelineState
     private var textureCache: CVMetalTextureCache?
+    /// Current layer configuration — switched lazily in `configure(hdr:)` when a frame's mode differs.
+    private var hdrActive = false
 
     /// nil if Metal is unavailable (no GPU / a headless CI) — the caller falls back to stage-1.
     public init?() {
@@ -64,11 +91,17 @@ public final class MetalVideoPresenter {
         self.queue = queue
         do {
             let library = try device.makeLibrary(source: shaderSource, options: nil)
-            let desc = MTLRenderPipelineDescriptor()
-            desc.vertexFunction = library.makeFunction(name: "pf_vtx")
-            desc.fragmentFunction = library.makeFunction(name: "pf_frag")
-            desc.colorAttachments[0].pixelFormat = .bgra8Unorm
-            pipeline = try device.makeRenderPipelineState(descriptor: desc)
+            let vtx = library.makeFunction(name: "pf_vtx")
+            let sdr = MTLRenderPipelineDescriptor()
+            sdr.vertexFunction = vtx
+            sdr.fragmentFunction = library.makeFunction(name: "pf_frag")
+            sdr.colorAttachments[0].pixelFormat = .bgra8Unorm
+            pipelineSDR = try device.makeRenderPipelineState(descriptor: sdr)
+            let hdr = MTLRenderPipelineDescriptor()
+            hdr.vertexFunction = vtx
+            hdr.fragmentFunction = library.makeFunction(name: "pf_frag_hdr")
+            hdr.colorAttachments[0].pixelFormat = .rgba16Float  // EDR-capable
+            pipelineHDR = try device.makeRenderPipelineState(descriptor: hdr)
         } catch {
             return nil
         }
@@ -102,14 +135,40 @@ public final class MetalVideoPresenter {
         if layer.drawableSize != size { layer.drawableSize = size }
     }
 
-    /// Draw one decoded frame to the next drawable and present it. Returns true on success;
+    /// Reconfigure the layer for SDR or HDR when the stream mode flips (HDR toggle). HDR uses an
+    /// rgba16Float drawable + a BT.2020 PQ colour space + EDR, so the compositor PQ-maps to the
+    /// display; SDR uses the plain 8-bit sRGB path. Main-thread only (called from `render`).
+    private func configure(hdr: Bool) {
+        guard hdr != hdrActive else { return }
+        hdrActive = hdr
+        if hdr {
+            layer.pixelFormat = .rgba16Float
+            layer.colorspace = CGColorSpace(name: CGColorSpace.itur_2100_PQ)
+            #if os(macOS)
+            layer.wantsExtendedDynamicRangeContent = true
+            #endif
+        } else {
+            layer.pixelFormat = .bgra8Unorm
+            layer.colorspace = nil
+            #if os(macOS)
+            layer.wantsExtendedDynamicRangeContent = false
+            #endif
+        }
+    }
+
+    /// Draw one decoded frame to the next drawable and present it. `isHDR` selects the 10-bit
+    /// BT.2020 PQ path (P010 input) vs the 8-bit BT.709 path (NV12 input). Returns true on success;
     /// false when there's no drawable yet, a texture couldn't be made, or Metal errored — the
     /// caller then doesn't stamp a present for this frame.
     @discardableResult
-    public func render(_ pixelBuffer: CVPixelBuffer) -> Bool {
+    public func render(_ pixelBuffer: CVPixelBuffer, isHDR: Bool = false) -> Bool {
+        configure(hdr: isHDR)
+        // P010 stores 10-bit luma/chroma in 16-bit samples → R16/RG16; NV12 is 8-bit → R8/RG8.
+        let lumaFmt: MTLPixelFormat = isHDR ? .r16Unorm : .r8Unorm
+        let chromaFmt: MTLPixelFormat = isHDR ? .rg16Unorm : .rg8Unorm
         guard let textureCache,
-            let luma = makeTexture(pixelBuffer, plane: 0, format: .r8Unorm, cache: textureCache),
-            let chroma = makeTexture(pixelBuffer, plane: 1, format: .rg8Unorm, cache: textureCache)
+            let luma = makeTexture(pixelBuffer, plane: 0, format: lumaFmt, cache: textureCache),
+            let chroma = makeTexture(pixelBuffer, plane: 1, format: chromaFmt, cache: textureCache)
         else { return false }
 
         // The hosting view owns drawableSize (aspect-fit to its bounds); skip until it's laid
@@ -127,7 +186,7 @@ public final class MetalVideoPresenter {
         guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
             return false
         }
-        encoder.setRenderPipelineState(pipeline)
+        encoder.setRenderPipelineState(isHDR ? pipelineHDR : pipelineSDR)
         encoder.setFragmentTexture(CVMetalTextureGetTexture(luma), index: 0)
         encoder.setFragmentTexture(CVMetalTextureGetTexture(chroma), index: 1)
         encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
@@ -144,7 +144,7 @@ public final class Stage2Pipeline {
     /// converted to `CLOCK_REALTIME` (see `realtimeNs(forDisplayLinkTimestamp:)`).
     public func renderTick(targetPresentNs: Int64) {
         guard let frame = ring.take() else { return }
-        guard presenter.render(frame.pixelBuffer) else { return }
+        guard presenter.render(frame.pixelBuffer, isHDR: frame.isHDR) else { return }
         presentMeter.record(ptsNs: frame.ptsNs, atNs: targetPresentNs, offsetNs: offsetNs)
     }
 
@@ -17,8 +17,11 @@ public struct ReadyFrame: @unchecked Sendable {
     public let ptsNs: UInt64
     /// Client `CLOCK_REALTIME` instant decode completed, in nanoseconds.
     public let decodedNs: Int64
-    /// The decoded image (NV12 biplanar, Metal-compatible).
+    /// The decoded image — 8-bit NV12 biplanar (SDR) or 10-bit P010 biplanar (HDR), Metal-compatible.
     public let pixelBuffer: CVPixelBuffer
+    /// True when the stream is HDR (BT.2020 PQ): the buffer is 10-bit P010 and the presenter must
+    /// configure EDR + BT.2020 PQ output. Derived from the decoded buffer's pixel format.
+    public let isHDR: Bool
 }
 
 /// The C output callback can't capture context, so VideoToolbox hands it the refcon we set at
@@ -116,8 +119,22 @@ public final class VideoDecoder: @unchecked Sendable {
         format = nil
     }
 
-    /// `lock` held. Replace the session with one for `newFormat`. NV12 video-range, Metal-
-    /// compatible output (10-bit/HDR is a later tie-in — see the plan).
+    /// True when `newFormat` carries a PQ (SMPTE ST 2084) or HLG transfer function — i.e. the host
+    /// is sending HDR (BT.2020). VideoToolbox populates the transfer-function extension from the
+    /// HEVC VUI, so this tracks the *stream*, switching dynamically when the user toggles HDR
+    /// (the host re-emits parameter sets with the new VUI → a new format desc → session rebuild).
+    static func isHDRFormat(_ format: CMVideoFormatDescription) -> Bool {
+        guard
+            let tf = CMFormatDescriptionGetExtension(
+                format, extensionKey: kCMFormatDescriptionExtension_TransferFunction)
+        else { return false }
+        let s = tf as? String
+        return s == (kCMFormatDescriptionTransferFunction_SMPTE_ST_2084_PQ as String)
+            || s == (kCMFormatDescriptionTransferFunction_ITU_R_2100_HLG as String)
+    }
+
+    /// `lock` held. Replace the session with one for `newFormat`. SDR streams decode to 8-bit NV12;
+    /// HDR streams (BT.2020 PQ) decode to 10-bit P010 so the presenter can drive EDR.
     private func createSessionLocked(format newFormat: CMVideoFormatDescription) -> Bool {
         if let session {
             VTDecompressionSessionWaitForAsynchronousFrames(session)
@@ -126,10 +143,14 @@ public final class VideoDecoder: @unchecked Sendable {
         session = nil
         format = nil
 
+        let hdr = Self.isHDRFormat(newFormat)
+        let pixelFormat =
+            hdr
+                ? kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange  // P010 (10-bit)
+                : kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange  // NV12 (8-bit)
         let imageAttrs: [CFString: Any] = [
             kCVPixelBufferMetalCompatibilityKey: true,
-            kCVPixelBufferPixelFormatTypeKey:
-                kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
+            kCVPixelBufferPixelFormatTypeKey: pixelFormat,
         ]
         var callback = VTDecompressionOutputCallbackRecord(
             decompressionOutputCallback: decoderOutputCallback,
@@ -160,6 +181,11 @@ public final class VideoDecoder: @unchecked Sendable {
         // pts was stamped at timescale 1e9 (AnnexB.sampleBuffer); normalize defensively.
         let p = CMTimeConvertScale(pts, timescale: 1_000_000_000, method: .default)
         let ptsNs = p.value > 0 ? UInt64(p.value) : 0
-        onDecoded(ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer))
+        // HDR iff the decoder produced a 10-bit P010 buffer (we only request P010 for PQ streams).
+        let isHDR =
+            CVPixelBufferGetPixelFormatType(imageBuffer)
+                == kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
+        onDecoded(
+            ReadyFrame(ptsNs: ptsNs, decodedNs: decodedNs, pixelBuffer: imageBuffer, isHDR: isHDR))
     }
 }
@@ -380,6 +380,14 @@ async fn session(args: Args) -> Result<()> {
             name: Some(args.name.clone()),
             // `--launch ID` — host resolves it against its own library and runs it this session.
             launch: args.launch.clone(),
+            // This headless tool just dumps the bitstream (no decode), so it can always claim
+            // 10-bit support. Gated by env so latency runs stay on the 8-bit baseline:
+            // PUNKTFUNK_CLIENT_10BIT=1 advertises VIDEO_CAP_10BIT to exercise the host Main10 path.
+            video_caps: if std::env::var_os("PUNKTFUNK_CLIENT_10BIT").is_some() {
+                punktfunk_core::quic::VIDEO_CAP_10BIT
+            } else {
+                0
+            },
         }
         .encode(),
     )
@@ -614,6 +614,10 @@ async fn worker_main(args: WorkerArgs) {
             name: None,
             // Library id to launch this session, if the embedder asked for one.
             launch: launch.clone(),
+            // TODO(hdr): advertise the embedder's real decode caps once the ABI carries them
+            // and the Apple/Linux clients decode 10-bit. 0 = 8-bit only — the host then never
+            // upgrades this connector's session to a stream it can't yet present.
+            video_caps: 0,
         }
         .encode(),
     )
@@ -70,8 +70,21 @@ pub struct Hello {
     /// `name` is absent, a zero-length name placeholder precedes it so the offset stays
     /// deterministic. Omitted by older clients (decodes to `None`).
     pub launch: Option<String>,
+    /// Client video capabilities the host may use to upgrade the stream — a bitfield of
+    /// [`VIDEO_CAP_10BIT`] (the client can decode 10-bit Main10 HEVC) and [`VIDEO_CAP_HDR`]
+    /// (the client can present BT.2020 PQ HDR10). The host enables a 10-bit / HDR encode ONLY
+    /// when the matching bit is set, so an older client (decodes to `0`) always gets the 8-bit
+    /// BT.709 stream it understands. Appended after `launch` as a single trailing byte; a
+    /// zero-length name/launch placeholder precedes it when those are absent so the offset stays
+    /// deterministic. Omitted by older clients (decodes to `0`).
+    pub video_caps: u8,
 }
 
+/// [`Hello::video_caps`] bit: the client can decode a 10-bit (Main10) HEVC stream.
+pub const VIDEO_CAP_10BIT: u8 = 0x01;
+/// [`Hello::video_caps`] bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
+pub const VIDEO_CAP_HDR: u8 = 0x02;
+
 /// Longest device name carried in a [`Hello`] (bytes of UTF-8; longer names are truncated on
 /// encode, rejected on decode — a one-byte length prefix caps it at 255 anyway).
 pub const HELLO_NAME_MAX: usize = 64;
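The gating rule in the `video_caps` doc comment (upgrade only when the matching bit is set), combined with the host-side opt-in named in the commit message, amounts to a small pure function. The sketch below is an illustration under assumptions: `negotiate_bit_depth` is a hypothetical name, the host's actual decision code is not part of this excerpt, and the constants are redeclared locally so the block stands alone.

```rust
/// Capability bits, mirroring the values added in this diff.
pub const VIDEO_CAP_10BIT: u8 = 0x01;
pub const VIDEO_CAP_HDR: u8 = 0x02;

/// Hypothetical helper: pick the encode bit depth to report in
/// `Welcome::bit_depth`. `host_opt_in` stands for "PUNKTFUNK_10BIT is set"
/// on the host, per the commit message.
pub fn negotiate_bit_depth(client_caps: u8, host_opt_in: bool) -> u8 {
    let client_can_decode_10bit = (client_caps & VIDEO_CAP_10BIT) != 0;
    if host_opt_in && client_can_decode_10bit {
        10 // HEVC Main10
    } else {
        8 // baseline; also what an older client (caps == 0) gets
    }
}

/// Reads the opt-in the same way client-rs reads PUNKTFUNK_CLIENT_10BIT
/// in this diff: the variable only needs to be present.
pub fn host_opt_in_from_env() -> bool {
    std::env::var_os("PUNKTFUNK_10BIT").is_some()
}
```

Keeping the decision free of I/O makes it easy to unit-test both sides of the AND, for example `negotiate_bit_depth(VIDEO_CAP_10BIT, host_opt_in_from_env())` at session setup.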
@@ -108,6 +121,12 @@ pub struct Welcome {
     /// default when the client requested `0`). Appended to the wire form — `0` when an older host
     /// omitted it (i.e. "unknown").
     pub bitrate_kbps: u32,
+    /// The luma/chroma bit depth the host actually encodes at — `8` (default / older host) or
+    /// `10` (Main10, enabled only when the client advertised [`VIDEO_CAP_10BIT`]). The client
+    /// configures its decoder for 10-bit (P010) when this is `10`. Appended to the wire form as a
+    /// single trailing byte; `8` when an older host omitted it. (Color space stays BT.709 in
+    /// Phase 1; BT.2020 PQ HDR signaling is added alongside HDR support.)
+    pub bit_depth: u8,
 }
 
 /// `client → host`: data plane is bound, begin streaming.
@@ -513,20 +532,28 @@ impl Hello {
         // so a Hello with neither name nor launch stays byte-identical to the bitrate-era form
         // (26 bytes). When `launch` is present we must still emit name's length byte (0 for None)
         // so `launch` lands at a deterministic offset.
+        // `video_caps` is the last trailing field, after `launch`; when it's present (non-zero)
+        // the name/launch length bytes must still be emitted (0 for absent) so it lands at a
+        // deterministic offset — the same discipline `launch` already imposes on `name`.
+        let need_placeholders = self.video_caps != 0;
         match (&self.name, &self.launch) {
-            (None, None) => {}
+            (None, None) if !need_placeholders => {}
             (name, _) => {
                 let n = truncate_to(name.as_deref().unwrap_or(""), HELLO_NAME_MAX);
                 b.push(n.len() as u8);
                 b.extend_from_slice(n.as_bytes());
             }
         }
-        // launch after name: len u8 || UTF-8. Last trailing field.
-        if let Some(launch) = &self.launch {
-            let l = truncate_to(launch, HELLO_LAUNCH_MAX);
+        // launch after name: len u8 || UTF-8.
+        if self.launch.is_some() || need_placeholders {
+            let l = truncate_to(self.launch.as_deref().unwrap_or(""), HELLO_LAUNCH_MAX);
             b.push(l.len() as u8);
             b.extend_from_slice(l.as_bytes());
         }
+        // video_caps: single trailing byte. Last field.
+        if self.video_caps != 0 {
+            b.push(self.video_caps);
+        }
         b
     }
 
@@ -580,6 +607,15 @@ impl Hello {
                     .and_then(|s| std::str::from_utf8(s).ok())
                     .map(String::from)
             }),
+            // Optional trailing video-caps byte, positioned right after launch's `len u8 || bytes`
+            // block. Uses the raw (possibly zero/placeholder) name/launch length bytes to locate it,
+            // so it's robust to absent name/launch; absent entirely on an older client → `0`.
+            video_caps: {
+                let name_len = b.get(26).copied().unwrap_or(0) as usize;
+                let launch_off = 27 + name_len; // launch's length byte
+                let launch_len = b.get(launch_off).copied().unwrap_or(0) as usize;
+                b.get(launch_off + 1 + launch_len).copied().unwrap_or(0)
+            },
         })
     }
 }
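The trailing-field layout used by `Hello::encode` and `Hello::decode` (length-prefixed `name`, length-prefixed `launch`, then one `video_caps` byte, with zero-length placeholders keeping offsets deterministic) can be exercised in isolation. The sketch below is a simplified stand-in, not the crate's code: `push_trailing` and `read_video_caps` are hypothetical names, the 26-byte fixed prefix is taken from the comments in the diff, and the truncation step (`truncate_to`) is omitted, so strings are assumed to be shorter than 256 bytes.

```rust
/// Size of the fixed Hello prefix, per the "(26 bytes)" comment in the diff.
const HELLO_FIXED_LEN: usize = 26;

/// Append the optional trailing fields to an already-encoded fixed prefix.
/// Assumes `name` and `launch` are shorter than 256 bytes (no truncation here).
pub fn push_trailing(b: &mut Vec<u8>, name: Option<&str>, launch: Option<&str>, video_caps: u8) {
    // A non-zero caps byte forces both length bytes to be present.
    let need_placeholders = video_caps != 0;
    if name.is_some() || launch.is_some() || need_placeholders {
        let n = name.unwrap_or("");
        b.push(n.len() as u8);
        b.extend_from_slice(n.as_bytes());
    }
    if launch.is_some() || need_placeholders {
        let l = launch.unwrap_or("");
        b.push(l.len() as u8);
        b.extend_from_slice(l.as_bytes());
    }
    if video_caps != 0 {
        b.push(video_caps);
    }
}

/// Locate the caps byte by walking the two length prefixes; any missing
/// byte reads as 0, which is how an older client's shorter Hello decodes.
pub fn read_video_caps(b: &[u8]) -> u8 {
    let name_len = b.get(HELLO_FIXED_LEN).copied().unwrap_or(0) as usize;
    let launch_off = HELLO_FIXED_LEN + 1 + name_len; // launch's length byte
    let launch_len = b.get(launch_off).copied().unwrap_or(0) as usize;
    b.get(launch_off + 1 + launch_len).copied().unwrap_or(0)
}
```

A Hello with no name, no launch and `video_caps == 0` stays at 26 bytes, while a caps-only Hello grows by exactly three bytes (two zero length bytes plus the caps byte).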
@@ -607,6 +643,7 @@ impl Welcome {
         b.push(self.compositor.to_u8()); // appended at offset 53 — older clients read [0..53] and skip it
         b.push(self.gamepad.to_u8()); // appended at offset 54 — same back-compat discipline
         b.extend_from_slice(&self.bitrate_kbps.to_le_bytes()); // appended at offset 55..59
+        b.push(self.bit_depth); // appended at offset 59 — older clients read [0..59] and skip it
         b
     }
 
@@ -614,7 +651,7 @@ impl Welcome {
         // Layout (LE): magic[0..4] abi[4..8] port[8..10] w[10..14] h[14..18] hz[18..22]
         // scheme[22] pct[23] max_data[24..26] shard[26..28] encrypt[28] key[29..45]
         // salt[45..49] frames[49..53] compositor[53] gamepad[54] bitrate_kbps[55..59]
-        // (compositor/gamepad/bitrate are optional trailing bytes).
+        // bit_depth[59] (compositor/gamepad/bitrate/bit_depth are optional trailing bytes).
         if b.len() < 53 || &b[0..4] != MAGIC {
             return Err(PunktfunkError::InvalidArg("bad Welcome"));
         }
@@ -661,6 +698,9 @@ impl Welcome {
                 .get(55..59)
                 .map(|s| u32::from_le_bytes(s.try_into().unwrap()))
                 .unwrap_or(0),
+            // Optional trailing byte — absent on an older host → `8` (8-bit, the only depth they
+            // encode).
+            bit_depth: b.get(59).copied().unwrap_or(8),
         })
     }
 
@@ -1518,6 +1558,7 @@ mod tests {
             compositor: CompositorPref::Gamescope,
             gamepad: GamepadPref::DualSense,
             bitrate_kbps: 50_000,
+            bit_depth: 10,
         };
         assert_eq!(Welcome::decode(&w.encode()).unwrap(), w);
     }
@@ -1536,6 +1577,7 @@ mod tests {
             bitrate_kbps: 25_000,
             name: Some("Test Device".into()),
             launch: Some("steam:570".into()),
+            video_caps: VIDEO_CAP_10BIT,
         };
         assert_eq!(Hello::decode(&h.encode()).unwrap(), h);
         let s = Start {
@@ -1602,6 +1644,7 @@ mod tests {
             bitrate_kbps: 80_000,
             name: None,
             launch: None,
+            video_caps: 0,
         };
         let enc = h.encode();
         assert_eq!(enc.len(), 26);
@@ -1639,9 +1682,10 @@ mod tests {
             compositor: CompositorPref::Kwin,
             gamepad: GamepadPref::Xbox360,
             bitrate_kbps: 120_000,
+            bit_depth: 10,
         };
         let wenc = w.encode();
-        assert_eq!(wenc.len(), 59);
+        assert_eq!(wenc.len(), 60);
         let legacy_w = Welcome::decode(&wenc[..53]).unwrap();
         assert_eq!(legacy_w.compositor, CompositorPref::Auto);
         assert_eq!(legacy_w.gamepad, GamepadPref::Auto);
@@ -1655,7 +1699,10 @@ mod tests {
         let pre_bitrate_w = Welcome::decode(&wenc[..55]).unwrap();
         assert_eq!(pre_bitrate_w.gamepad, GamepadPref::Xbox360);
         assert_eq!(pre_bitrate_w.bitrate_kbps, 0);
+        assert_eq!(pre_bitrate_w.bit_depth, 8); // older host (no trailing byte) → 8-bit assumed
+        assert_eq!(legacy_w.bit_depth, 8);
         assert_eq!(Welcome::decode(&wenc).unwrap().bitrate_kbps, 120_000);
+        assert_eq!(Welcome::decode(&wenc).unwrap().bit_depth, 10); // full form carries it
     }
 
     #[test]
@@ -1672,6 +1719,7 @@ mod tests {
             bitrate_kbps: 0,
             name: Some("Enrico's MacBook".into()),
             launch: None,
+            video_caps: 0,
         };
         let enc = base.encode();
         assert_eq!(
@@ -1718,6 +1766,7 @@ mod tests {
             bitrate_kbps: 0,
             name: None,
             launch: None,
+            video_caps: 0,
         };
         // launch alone (no name): a zero-length name placeholder keeps the offset deterministic.
         let with_launch = Hello {
@@ -1882,6 +1931,7 @@ mod tests {
                 bitrate_kbps: 0,
                 name: None,
                 launch: None,
+                video_caps: 0,
             }
             .encode();
             assert!(PairRequest::decode(&h).is_err(), "abi {abi} parsed as pair");
@@ -164,7 +164,9 @@ mod uso {
|
|||||||
/// Latch USO off for the process after a send that means it isn't usable on this OS/NIC/path.
|
/// Latch USO off for the process after a send that means it isn't usable on this OS/NIC/path.
|
||||||
pub fn disable() {
|
pub fn disable() {
|
||||||
if STATE.swap(2, Ordering::Relaxed) != 2 {
|
if STATE.swap(2, Ordering::Relaxed) != 2 {
|
||||||
tracing::warn!("Windows USO unsupported on this path — falling back to per-packet sends");
|
tracing::warn!(
|
||||||
|
"Windows USO unsupported on this path — falling back to per-packet sends"
|
||||||
|
);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -22,6 +22,10 @@ pub enum PixelFormat {
|
|||||||
Rgb,
|
Rgb,
|
||||||
/// `[B,G,R]`, 3 bpp.
|
/// `[B,G,R]`, 3 bpp.
|
||||||
Bgr,
|
Bgr,
|
||||||
|
/// 10-bit RGB packed as `R10G10B10A2` (DXGI `R10G10B10A2_UNORM`), 4 bpp. The HDR capture path
|
||||||
|
/// produces this: scRGB FP16 desktop pixels are converted to BT.2020 PQ and written here, then
|
||||||
|
/// handed to NVENC as `ABGR10` for an HEVC Main10 / HDR10 encode.
|
||||||
|
Rgb10a2,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl PixelFormat {
|
impl PixelFormat {
|
||||||
|
|||||||
@@ -16,24 +16,26 @@ use windows::core::{s, Interface, PCSTR};
|
|||||||
use windows::Win32::Foundation::{HMODULE, LUID};
|
use windows::Win32::Foundation::{HMODULE, LUID};
|
||||||
use windows::Win32::Graphics::Direct3D::Fxc::D3DCompile;
|
use windows::Win32::Graphics::Direct3D::Fxc::D3DCompile;
|
||||||
use windows::Win32::Graphics::Direct3D::{
|
use windows::Win32::Graphics::Direct3D::{
|
||||||
ID3DBlob, D3D_DRIVER_TYPE_UNKNOWN, D3D_FEATURE_LEVEL_11_0,
|
ID3DBlob, D3D_DRIVER_TYPE_UNKNOWN, D3D_FEATURE_LEVEL_11_0, D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST,
|
||||||
D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP,
|
D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP,
|
||||||
};
|
};
|
||||||
use windows::Win32::Graphics::Direct3D11::{
|
use windows::Win32::Graphics::Direct3D11::{
|
||||||
D3D11CreateDevice, ID3D11BlendState, ID3D11Buffer, ID3D11Device, ID3D11DeviceContext,
|
D3D11CreateDevice, ID3D11BlendState, ID3D11Buffer, ID3D11Device, ID3D11DeviceContext,
|
||||||
ID3D11PixelShader, ID3D11RenderTargetView, ID3D11SamplerState, ID3D11ShaderResourceView,
|
ID3D11PixelShader, ID3D11RenderTargetView, ID3D11SamplerState, ID3D11ShaderResourceView,
|
||||||
ID3D11Texture2D, ID3D11VertexShader, D3D11_BIND_CONSTANT_BUFFER, D3D11_BIND_FLAG,
|
ID3D11Texture2D, ID3D11VertexShader, D3D11_BIND_CONSTANT_BUFFER, D3D11_BIND_FLAG,
|
||||||
D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_BLEND_DESC, D3D11_BLEND_INV_DEST_COLOR,
|
D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_BLEND_DESC,
|
||||||
D3D11_BLEND_INV_SRC_ALPHA, D3D11_BLEND_ONE, D3D11_BLEND_OP_ADD, D3D11_BLEND_SRC_ALPHA,
|
D3D11_BLEND_INV_DEST_COLOR, D3D11_BLEND_INV_SRC_ALPHA, D3D11_BLEND_ONE, D3D11_BLEND_OP_ADD,
|
||||||
D3D11_BUFFER_DESC,
|
D3D11_BLEND_SRC_ALPHA, D3D11_BUFFER_DESC, D3D11_COLOR_WRITE_ENABLE_ALL, D3D11_COMPARISON_NEVER,
|
||||||
D3D11_COLOR_WRITE_ENABLE_ALL, D3D11_COMPARISON_NEVER, D3D11_CPU_ACCESS_READ,
|
D3D11_CPU_ACCESS_READ, D3D11_CPU_ACCESS_WRITE, D3D11_CREATE_DEVICE_BGRA_SUPPORT,
|
||||||
D3D11_CPU_ACCESS_WRITE, D3D11_CREATE_DEVICE_BGRA_SUPPORT, D3D11_FILTER_MIN_MAG_MIP_POINT,
|
D3D11_FILTER_MIN_MAG_MIP_POINT, D3D11_MAPPED_SUBRESOURCE, D3D11_MAP_READ,
|
||||||
D3D11_MAPPED_SUBRESOURCE, D3D11_MAP_READ, D3D11_MAP_WRITE_DISCARD, D3D11_RENDER_TARGET_BLEND_DESC,
|
D3D11_MAP_WRITE_DISCARD, D3D11_RENDER_TARGET_BLEND_DESC, D3D11_SAMPLER_DESC, D3D11_SDK_VERSION,
|
||||||
D3D11_SAMPLER_DESC, D3D11_SDK_VERSION, D3D11_SUBRESOURCE_DATA, D3D11_TEXTURE2D_DESC,
|
D3D11_SUBRESOURCE_DATA, D3D11_TEXTURE2D_DESC, D3D11_TEXTURE_ADDRESS_CLAMP, D3D11_USAGE_DEFAULT,
|
||||||
D3D11_TEXTURE_ADDRESS_CLAMP, D3D11_USAGE_DEFAULT, D3D11_USAGE_DYNAMIC, D3D11_USAGE_STAGING,
|
D3D11_USAGE_DYNAMIC, D3D11_USAGE_STAGING, D3D11_VIEWPORT,
|
||||||
D3D11_VIEWPORT,
|
};
|
||||||
|
use windows::Win32::Graphics::Dxgi::Common::{
|
||||||
|
DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_R10G10B10A2_UNORM, DXGI_FORMAT_R16G16B16A16_FLOAT,
|
||||||
|
DXGI_SAMPLE_DESC,
|
||||||
};
|
};
|
||||||
use windows::Win32::Graphics::Dxgi::Common::{DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_SAMPLE_DESC};
|
|
||||||
use windows::Win32::Graphics::Dxgi::{
|
use windows::Win32::Graphics::Dxgi::{
|
||||||
CreateDXGIFactory1, IDXGIAdapter1, IDXGIFactory1, IDXGIOutput1, IDXGIOutputDuplication,
|
CreateDXGIFactory1, IDXGIAdapter1, IDXGIFactory1, IDXGIOutput1, IDXGIOutputDuplication,
|
||||||
IDXGIResource, DXGI_ERROR_ACCESS_LOST, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET,
|
IDXGIResource, DXGI_ERROR_ACCESS_LOST, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET,
|
||||||
@@ -230,7 +232,8 @@ unsafe fn compile_shader(src: &str, entry: PCSTR, target: PCSTR) -> Result<Vec<u
|
|||||||
.as_ref()
|
.as_ref()
|
||||||
.map(|e| {
|
.map(|e| {
|
||||||
let p = e.GetBufferPointer() as *const u8;
|
let p = e.GetBufferPointer() as *const u8;
|
||||||
String::from_utf8_lossy(std::slice::from_raw_parts(p, e.GetBufferSize())).to_string()
|
String::from_utf8_lossy(std::slice::from_raw_parts(p, e.GetBufferSize()))
|
||||||
|
.to_string()
|
||||||
})
|
})
|
||||||
.unwrap_or_default();
|
.unwrap_or_default();
|
||||||
bail!("D3DCompile failed: {msg}");
|
bail!("D3DCompile failed: {msg}");
|
||||||
@@ -326,7 +329,13 @@ impl CursorCompositor {
|
|||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
unsafe fn set_shape(&mut self, device: &ID3D11Device, bgra: &[u8], w: u32, h: u32) -> Result<()> {
|
unsafe fn set_shape(
|
||||||
|
&mut self,
|
||||||
|
device: &ID3D11Device,
|
||||||
|
bgra: &[u8],
|
||||||
|
w: u32,
|
||||||
|
h: u32,
|
||||||
|
) -> Result<()> {
|
||||||
let desc = D3D11_TEXTURE2D_DESC {
|
let desc = D3D11_TEXTURE2D_DESC {
|
||||||
Width: w,
|
Width: w,
|
||||||
Height: h,
|
Height: h,
|
||||||
@@ -394,7 +403,11 @@ impl CursorCompositor {
|
|||||||
};
|
};
|
||||||
ctx.RSSetViewports(Some(&[vp]));
|
ctx.RSSetViewports(Some(&[vp]));
|
||||||
ctx.OMSetRenderTargets(Some(&[Some(rtv.clone())]), None);
|
ctx.OMSetRenderTargets(Some(&[Some(rtv.clone())]), None);
|
||||||
let blend = if invert { &self.blend_invert } else { &self.blend };
|
let blend = if invert {
|
||||||
|
&self.blend_invert
|
||||||
|
} else {
|
||||||
|
&self.blend
|
||||||
|
};
|
||||||
ctx.OMSetBlendState(blend, Some(&[0.0; 4]), 0xffff_ffff);
|
ctx.OMSetBlendState(blend, Some(&[0.0; 4]), 0xffff_ffff);
|
||||||
ctx.VSSetShader(&self.vs, None);
|
ctx.VSSetShader(&self.vs, None);
|
||||||
ctx.PSSetShader(&self.ps, None);
|
ctx.PSSetShader(&self.ps, None);
|
||||||
@@ -409,8 +422,122 @@ impl CursorCompositor {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Fullscreen-triangle vertex shader for the HDR conversion pass (3 verts, no input layout).
|
||||||
|
const HDR_VS: &str = r"
|
||||||
|
struct VOut { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };
|
||||||
|
VOut main(uint vid : SV_VertexID) {
|
||||||
|
float2 uv = float2((vid << 1) & 2, vid & 2);
|
||||||
|
VOut o;
|
||||||
|
o.pos = float4(uv * float2(2.0, -2.0) + float2(-1.0, 1.0), 0.0, 1.0);
|
||||||
|
o.uv = uv;
|
||||||
|
return o;
|
||||||
|
}
|
||||||
|
";
|
||||||
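Note on `HDR_VS`: the bit trick that turns `SV_VertexID` into a screen-covering triangle can be checked on the CPU. The sketch below mirrors the shader arithmetic in Rust; the function name is made up for illustration and nothing like it exists in the patch.

```rust
/// CPU mirror of the vertex shader math:
///   uv  = ((vid << 1) & 2, vid & 2)
///   pos = uv * (2, -2) + (-1, 1)
/// Returns (uv, clip-space xy).
fn fullscreen_triangle_vertex(vid: u32) -> ([f32; 2], [f32; 2]) {
    let uv = [((vid << 1) & 2) as f32, (vid & 2) as f32];
    let pos = [uv[0] * 2.0 - 1.0, uv[1] * -2.0 + 1.0];
    (uv, pos)
}
```

For `vid` 0, 1, 2 this gives clip-space corners (-1, 1), (3, 1) and (-1, -3). That single oversized triangle contains the whole [-1, 1] viewport square, and `uv` runs over 0..1 exactly across the visible part, so no vertex buffer or input layout is needed.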
|
|
||||||
|
/// HDR conversion pixel shader: scRGB FP16 desktop (linear, Rec.709 primaries, 1.0 = 80 nits) →
|
||||||
|
/// BT.2020 primaries → SMPTE ST 2084 (PQ) → written to a 10-bit R10G10B10A2 target for NVENC
|
||||||
|
/// (HEVC Main10 / HDR10). This is the standard Windows-HDR capture conversion (matches OBS/Sunshine).
|
||||||
|
const HDR_PS: &str = r"
|
||||||
|
Texture2D<float4> tx : register(t0);
|
||||||
|
SamplerState sm : register(s0);
|
||||||
|
// Rec.709 → Rec.2020 primaries (linear). Each line below is one matrix row; applied as mul(M, v) with v as a column vector.
|
||||||
|
static const float3x3 BT709_TO_BT2020 = {
|
||||||
|
0.627403914, 0.329283038, 0.043313048,
|
||||||
|
0.069097292, 0.919540405, 0.011362303,
|
||||||
|
0.016391439, 0.088013308, 0.895595253
|
||||||
|
};
|
||||||
|
float3 pq_oetf(float3 L) {
|
||||||
|
// L normalized so 1.0 = 10000 nits. ST 2084.
|
||||||
|
const float m1 = 0.1593017578125;
|
||||||
|
const float m2 = 78.84375;
|
||||||
|
const float c1 = 0.8359375;
|
||||||
|
const float c2 = 18.8515625;
|
||||||
|
const float c3 = 18.6875;
|
||||||
|
float3 Lp = pow(saturate(L), m1);
|
||||||
|
return pow((c1 + c2 * Lp) / (1.0 + c3 * Lp), m2);
|
||||||
|
}
|
||||||
|
float4 main(float4 pos : SV_POSITION, float2 uv : TEXCOORD0) : SV_TARGET {
|
||||||
|
float3 scrgb = max(tx.Sample(sm, uv).rgb, 0.0); // scRGB can be negative (wide gamut); clamp
|
||||||
|
float3 nits = scrgb * 80.0; // scRGB 1.0 = 80 nits → absolute luminance
|
||||||
|
float3 lin2020 = mul(BT709_TO_BT2020, nits); // primaries conversion (linear)
|
||||||
|
float3 pq = pq_oetf(lin2020 / 10000.0); // normalize to 10k nits, encode PQ
|
||||||
|
return float4(pq, 1.0);
|
||||||
|
}
|
||||||
|
";
|
||||||
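Note on `HDR_PS`: a CPU reference for the same per-pixel math is handy for sanity-checking the shader output. The sketch below reuses the constants from the shader above; the function names are illustrative only, and it assumes the same clamp-then-convert order as the shader.

```rust
/// Rec.709 -> Rec.2020 primaries, same rows as the shader constant.
const BT709_TO_BT2020: [[f32; 3]; 3] = [
    [0.627403914, 0.329283038, 0.043313048],
    [0.069097292, 0.919540405, 0.011362303],
    [0.016391439, 0.088013308, 0.895595253],
];

/// SMPTE ST 2084 (PQ) encode. Input is normalized so 1.0 = 10000 nits.
fn pq_oetf(l: f32) -> f32 {
    const M1: f32 = 0.1593017578125;
    const M2: f32 = 78.84375;
    const C1: f32 = 0.8359375;
    const C2: f32 = 18.8515625;
    const C3: f32 = 18.6875;
    let lp = l.clamp(0.0, 1.0).powf(M1);
    ((C1 + C2 * lp) / (1.0 + C3 * lp)).powf(M2)
}

/// One scRGB pixel (linear, Rec.709 primaries, 1.0 = 80 nits) to
/// PQ-encoded BT.2020, following the shader step by step.
fn scrgb_to_pq2020(scrgb: [f32; 3]) -> [f32; 3] {
    // Clamp negatives first, then scale to absolute nits.
    let nits = scrgb.map(|c| c.max(0.0) * 80.0);
    let mut out = [0.0f32; 3];
    for (row, o) in BT709_TO_BT2020.iter().zip(out.iter_mut()) {
        let lin = row[0] * nits[0] + row[1] * nits[1] + row[2] * nits[2];
        *o = pq_oetf(lin / 10_000.0);
    }
    out
}
```

Because every matrix row sums to 1, a neutral scRGB grey stays neutral after the primaries conversion, which makes white and grey patches convenient probes when comparing against a captured frame.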
|
|
||||||
|
/// scRGB FP16 → BT.2020 PQ 10-bit conversion pass. One per capture device (rebuilt on device
|
||||||
|
/// recreate, like [`CursorCompositor`]). A single fullscreen draw samples the FP16 source SRV and
|
||||||
|
/// writes PQ-encoded BT.2020 to the bound R10G10B10A2 render target.
|
||||||
|
struct HdrConverter {
|
||||||
|
vs: ID3D11VertexShader,
|
||||||
|
ps: ID3D11PixelShader,
|
||||||
|
sampler: ID3D11SamplerState,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl HdrConverter {
|
||||||
|
unsafe fn new(device: &ID3D11Device) -> Result<Self> {
|
||||||
|
let vsb = compile_shader(HDR_VS, s!("main"), s!("vs_5_0"))?;
|
||||||
|
let psb = compile_shader(HDR_PS, s!("main"), s!("ps_5_0"))?;
|
||||||
|
let mut vs = None;
|
||||||
|
device.CreateVertexShader(&vsb, None, Some(&mut vs))?;
|
||||||
|
let mut ps = None;
|
||||||
|
device.CreatePixelShader(&psb, None, Some(&mut ps))?;
|
||||||
|
let sd = D3D11_SAMPLER_DESC {
|
||||||
|
Filter: D3D11_FILTER_MIN_MAG_MIP_POINT,
|
||||||
|
AddressU: D3D11_TEXTURE_ADDRESS_CLAMP,
|
||||||
|
AddressV: D3D11_TEXTURE_ADDRESS_CLAMP,
|
||||||
|
AddressW: D3D11_TEXTURE_ADDRESS_CLAMP,
|
||||||
|
ComparisonFunc: D3D11_COMPARISON_NEVER,
|
||||||
|
MaxLOD: f32::MAX,
|
||||||
|
..Default::default()
|
||||||
|
};
|
||||||
|
let mut sampler = None;
|
||||||
|
device.CreateSamplerState(&sd, Some(&mut sampler))?;
|
||||||
|
Ok(Self {
|
||||||
|
vs: vs.context("hdr vs")?,
|
||||||
|
ps: ps.context("hdr ps")?,
|
||||||
|
sampler: sampler.context("hdr sampler")?,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Convert `src_srv` (FP16 scRGB) into `dst_rtv` (R10G10B10A2 PQ BT.2020). Opaque pass, no blend.
|
||||||
|
unsafe fn convert(
|
||||||
|
&self,
|
||||||
|
ctx: &ID3D11DeviceContext,
|
||||||
|
src_srv: &ID3D11ShaderResourceView,
|
||||||
|
dst_rtv: &ID3D11RenderTargetView,
|
||||||
|
w: u32,
|
||||||
|
h: u32,
|
||||||
|
) {
|
||||||
|
let vp = D3D11_VIEWPORT {
|
||||||
|
TopLeftX: 0.0,
|
||||||
|
TopLeftY: 0.0,
|
||||||
|
Width: w as f32,
|
||||||
|
Height: h as f32,
|
||||||
|
MinDepth: 0.0,
|
||||||
|
MaxDepth: 1.0,
|
||||||
|
};
|
||||||
|
ctx.RSSetViewports(Some(&[vp]));
|
||||||
|
ctx.OMSetRenderTargets(Some(&[Some(dst_rtv.clone())]), None);
|
||||||
|
ctx.OMSetBlendState(None, None, 0xffff_ffff); // opaque overwrite
|
||||||
|
ctx.VSSetShader(&self.vs, None);
|
||||||
|
ctx.PSSetShader(&self.ps, None);
|
||||||
|
ctx.PSSetShaderResources(0, Some(&[Some(src_srv.clone())]));
|
||||||
|
ctx.PSSetSamplers(0, Some(&[Some(self.sampler.clone())]));
|
||||||
|
ctx.IASetInputLayout(None);
|
||||||
|
ctx.IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
|
||||||
|
ctx.Draw(3, 0);
|
||||||
|
// Unbind so the next frame can CopyResource into the source and re-RTV the destination.
|
||||||
|
ctx.OMSetRenderTargets(Some(&[None]), None);
|
||||||
|
ctx.PSSetShaderResources(0, Some(&[None]));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Convert a DXGI pointer shape (color / masked-color / monochrome) into top-down BGRA.
|
/// Convert a DXGI pointer shape (color / masked-color / monochrome) into top-down BGRA.
|
||||||
fn convert_pointer_shape(buf: &[u8], si: &DXGI_OUTDUPL_POINTER_SHAPE_INFO) -> Option<(Vec<u8>, u32, u32)> {
|
fn convert_pointer_shape(
|
||||||
|
buf: &[u8],
|
||||||
|
si: &DXGI_OUTDUPL_POINTER_SHAPE_INFO,
|
||||||
|
) -> Option<(Vec<u8>, u32, u32)> {
|
||||||
let w = si.Width as usize;
|
let w = si.Width as usize;
|
||||||
let pitch = si.Pitch as usize;
|
let pitch = si.Pitch as usize;
|
||||||
if w == 0 || pitch == 0 {
|
if w == 0 || pitch == 0 {
|
||||||
@@ -571,7 +698,34 @@ pub struct DuplCapturer {
|
|||||||
/// Reused owned texture the duplication frame is copied into for the D3D11 path (the duplication
|
/// Reused owned texture the duplication frame is copied into for the D3D11 path (the duplication
|
||||||
/// surface is transient and released each frame).
|
/// surface is transient and released each frame).
|
||||||
gpu_copy: Option<ID3D11Texture2D>,
|
gpu_copy: Option<ID3D11Texture2D>,
|
||||||
have_gpu_frame: bool,
|
/// The most recently produced presentable GPU texture + its pixel format, repeated by
|
||||||
|
/// `next_frame` when AcquireNextFrame reports no change (static desktop) or during a rebuild.
|
||||||
|
/// Format-tagged because the SDR path presents BGRA `gpu_copy` while the HDR path presents the
|
||||||
|
/// 10-bit `hdr10_out` — the encoder needs the right format on every frame.
|
||||||
|
last_present: Option<(ID3D11Texture2D, PixelFormat)>,
|
||||||
|
/// HDR (scRGB FP16) capture state. Set when the duplication surface is `R16G16B16A16_FLOAT`
|
||||||
|
/// (the desktop has HDR on). The frame can't be `CopyResource`d into a BGRA target, so the HDR
|
||||||
|
/// path copies it into an FP16 SRV texture, composites the cursor, then runs [`HdrConverter`] to
|
||||||
|
/// produce a BT.2020 PQ 10-bit (`R10G10B10A2`) frame for NVENC. Toggling HDR fires ACCESS_LOST →
|
||||||
|
/// `recreate_dupl` re-detects the format, so this tracks the *current* duplication.
|
||||||
|
hdr_fp16: bool,
|
||||||
|
/// FP16 copy of the duplication surface (RT|SRV): the cursor composites onto it and the converter
|
||||||
|
/// samples it. Reallocated on device/size change.
|
||||||
|
fp16_src: Option<ID3D11Texture2D>,
|
||||||
|
fp16_srv: Option<ID3D11ShaderResourceView>,
|
||||||
|
/// 10-bit `R10G10B10A2` PQ output of the HDR conversion — the texture handed to NVENC.
|
||||||
|
hdr10_out: Option<ID3D11Texture2D>,
|
||||||
|
/// scRGB→PQ conversion pass; rebuilt on device recreate.
|
||||||
|
hdr_conv: Option<HdrConverter>,
|
||||||
|
/// Last time a duplication rebuild was attempted, to throttle retries during an outage (e.g. a
|
||||||
|
/// secure-desktop dwell where the output is gone) so we don't block the encode loop or hammer
|
||||||
|
/// DuplicateOutput — between attempts the last good frame is repeated. `None` = never attempted.
|
||||||
|
last_rebuild: Option<Instant>,
|
||||||
|
/// True once at least one real frame has been produced. After that, a frame drought (e.g. a long
|
||||||
|
/// secure-desktop dwell with nothing rendering to the virtual output) must never fatally end the
|
||||||
|
/// session — `next_frame` keeps repeating the last/seeded frame instead of erroring on its
|
||||||
|
/// deadline. The deadline stays fatal only *before* the first frame (a genuine startup misconfig).
|
||||||
|
ever_got_frame: bool,
|
||||||
/// GPU cursor overlay (rebuilt on device recreate). `None` until the first composite.
|
/// GPU cursor overlay (rebuilt on device recreate). `None` until the first composite.
|
||||||
cursor: Option<CursorCompositor>,
|
cursor: Option<CursorCompositor>,
|
||||||
/// Last cursor shape as BGRA (kept device-independent so it survives a device recreate).
|
/// Last cursor shape as BGRA (kept device-independent so it survives a device recreate).
|
||||||
@@ -721,7 +875,7 @@ impl DuplCapturer {
|
|||||||
.map(|v| matches!(v.to_ascii_lowercase().as_str(), "nvenc" | "hw" | "nvidia"))
|
.map(|v| matches!(v.to_ascii_lowercase().as_str(), "nvenc" | "hw" | "nvidia"))
|
||||||
.unwrap_or(false);
|
.unwrap_or(false);
|
||||||
tracing::info!(
|
tracing::info!(
|
||||||
"DXGI duplication: {}x{}@{} on {} ({})",
|
"DXGI duplication: {}x{}@{} on {} ({}) dxgi_format={} (87=BGRA8 24=R10G10B10A2 10=R16G16B16A16_FLOAT)",
|
||||||
width,
|
width,
|
||||||
height,
|
height,
|
||||||
refresh_hz,
|
refresh_hz,
|
||||||
@@ -730,7 +884,8 @@ impl DuplCapturer {
|
|||||||
"D3D11 zero-copy"
|
"D3D11 zero-copy"
|
||||||
} else {
|
} else {
|
||||||
"CPU staging"
|
"CPU staging"
|
||||||
}
|
},
|
||||||
|
dd.ModeDesc.Format.0,
|
||||||
);
|
);
|
||||||
Ok(Self {
|
Ok(Self {
|
||||||
device,
|
device,
|
||||||
@@ -752,7 +907,14 @@ impl DuplCapturer {
|
|||||||
last: None,
|
last: None,
|
||||||
gpu_mode,
|
gpu_mode,
|
||||||
gpu_copy: None,
|
gpu_copy: None,
|
||||||
have_gpu_frame: false,
|
last_present: None,
|
||||||
|
hdr_fp16: dd.ModeDesc.Format == DXGI_FORMAT_R16G16B16A16_FLOAT,
|
||||||
|
fp16_src: None,
|
||||||
|
fp16_srv: None,
|
||||||
|
hdr10_out: None,
|
||||||
|
hdr_conv: None,
|
||||||
|
last_rebuild: None,
|
||||||
|
ever_got_frame: false,
|
||||||
cursor: None,
|
cursor: None,
|
||||||
cursor_shape: None,
|
cursor_shape: None,
|
||||||
cursor_pos: (0, 0),
|
cursor_pos: (0, 0),
|
||||||
@@ -819,10 +981,104 @@ impl DuplCapturer {
|
|||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// FP16 (`R16G16B16A16_FLOAT`) copy of the HDR duplication surface (RT for the cursor composite +
|
||||||
|
/// SRV for the converter). Reallocated when absent (device/size change drops it).
|
||||||
|
unsafe fn ensure_fp16_src(&mut self) -> Result<()> {
|
||||||
|
if self.fp16_src.is_some() {
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
let desc = D3D11_TEXTURE2D_DESC {
|
||||||
|
Width: self.width,
|
||||||
|
Height: self.height,
|
||||||
|
MipLevels: 1,
|
||||||
|
ArraySize: 1,
|
||||||
|
Format: DXGI_FORMAT_R16G16B16A16_FLOAT,
|
||||||
|
SampleDesc: DXGI_SAMPLE_DESC {
|
||||||
|
Count: 1,
|
||||||
|
Quality: 0,
|
||||||
|
},
|
||||||
|
Usage: D3D11_USAGE_DEFAULT,
|
||||||
|
BindFlags: (D3D11_BIND_RENDER_TARGET.0 | D3D11_BIND_SHADER_RESOURCE.0) as u32,
|
||||||
|
CPUAccessFlags: 0,
|
||||||
|
MiscFlags: 0,
|
||||||
|
};
|
||||||
|
let mut t: Option<ID3D11Texture2D> = None;
|
||||||
|
self.device
|
||||||
|
.CreateTexture2D(&desc, None, Some(&mut t))
|
||||||
|
.context("CreateTexture2D(fp16 src)")?;
|
||||||
|
let t = t.context("fp16 src tex")?;
|
||||||
|
let mut srv = None;
|
||||||
|
self.device
|
||||||
|
.CreateShaderResourceView(&t, None, Some(&mut srv))?;
|
||||||
|
self.fp16_srv = Some(srv.context("fp16 srv")?);
|
||||||
|
self.fp16_src = Some(t);
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// 10-bit `R10G10B10A2_UNORM` PQ output of the HDR conversion — the texture NVENC encodes.
|
||||||
|
unsafe fn ensure_hdr10_out(&mut self) -> Result<()> {
|
||||||
|
if self.hdr10_out.is_some() {
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
let desc = D3D11_TEXTURE2D_DESC {
|
||||||
|
Width: self.width,
|
||||||
|
Height: self.height,
|
||||||
|
MipLevels: 1,
|
||||||
|
ArraySize: 1,
|
||||||
|
Format: DXGI_FORMAT_R10G10B10A2_UNORM,
|
||||||
|
SampleDesc: DXGI_SAMPLE_DESC {
|
||||||
|
Count: 1,
|
||||||
|
Quality: 0,
|
||||||
|
},
|
||||||
|
Usage: D3D11_USAGE_DEFAULT,
|
||||||
|
BindFlags: D3D11_BIND_RENDER_TARGET.0 as u32,
|
||||||
|
CPUAccessFlags: 0,
|
||||||
|
MiscFlags: 0,
|
||||||
|
};
|
||||||
|
let mut t: Option<ID3D11Texture2D> = None;
|
||||||
|
self.device
|
||||||
|
.CreateTexture2D(&desc, None, Some(&mut t))
|
||||||
|
.context("CreateTexture2D(hdr10 out)")?;
|
||||||
|
self.hdr10_out = t;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Allocate a presentable GPU texture on the *current* device, clear it to black, and record it
|
||||||
|
/// as `last_present`. Called after a desktop-switch recovery so `next_frame` always has a D3D11
|
||||||
|
/// frame to repeat even while the (secure) desktop renders nothing to the virtual output — this
|
||||||
|
/// is what keeps the session alive across a lock/login/UAC transition instead of dropping it. In
|
||||||
|
/// HDR mode it seeds the 10-bit output (black = PQ 0); otherwise the BGRA copy. One-shot: the next
|
||||||
|
/// real frame overwrites the texture in place.
|
||||||
|
unsafe fn seed_black_gpu_frame(&mut self) -> Result<()> {
|
||||||
|
if self.hdr_fp16 {
|
||||||
|
self.ensure_hdr10_out()?;
|
||||||
|
let out = self.hdr10_out.clone().context("hdr10 out texture")?;
|
||||||
|
let mut rtv: Option<ID3D11RenderTargetView> = None;
|
||||||
|
self.device
|
||||||
|
.CreateRenderTargetView(&out, None, Some(&mut rtv))?;
|
||||||
|
self.context
|
||||||
|
.ClearRenderTargetView(&rtv.context("null RTV (hdr seed)")?, &[0.0, 0.0, 0.0, 1.0]);
|
||||||
|
self.last_present = Some((out, PixelFormat::Rgb10a2));
|
||||||
|
} else {
|
||||||
|
self.ensure_gpu_copy()?;
|
||||||
|
let gpu = self.gpu_copy.clone().context("gpu copy texture")?;
|
||||||
|
let mut rtv: Option<ID3D11RenderTargetView> = None;
|
||||||
|
self.device
|
||||||
|
.CreateRenderTargetView(&gpu, None, Some(&mut rtv))?;
|
||||||
|
self.context
|
||||||
|
.ClearRenderTargetView(&rtv.context("null RTV (sdr seed)")?, &[0.0, 0.0, 0.0, 1.0]);
|
||||||
|
self.last_present = Some((gpu, PixelFormat::Bgra));
|
||||||
|
}
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
/// Pull cursor position/visibility/shape out of the frame info (the HW cursor is NOT in the frame).
|
/// Pull cursor position/visibility/shape out of the frame info (the HW cursor is NOT in the frame).
|
||||||
unsafe fn update_cursor(&mut self, info: &DXGI_OUTDUPL_FRAME_INFO) {
|
unsafe fn update_cursor(&mut self, info: &DXGI_OUTDUPL_FRAME_INFO) {
|
||||||
if info.LastMouseUpdateTime != 0 {
|
if info.LastMouseUpdateTime != 0 {
|
||||||
self.cursor_pos = (info.PointerPosition.Position.x, info.PointerPosition.Position.y);
|
self.cursor_pos = (
|
||||||
|
info.PointerPosition.Position.x,
|
||||||
|
info.PointerPosition.Position.y,
|
||||||
|
);
|
||||||
self.cursor_visible = info.PointerPosition.Visible.as_bool();
|
self.cursor_visible = info.PointerPosition.Visible.as_bool();
|
||||||
}
|
}
|
||||||
if info.PointerShapeBufferSize > 0 {
|
if info.PointerShapeBufferSize > 0 {
|
||||||
@@ -856,12 +1112,21 @@ impl DuplCapturer {
|
|||||||
|
|
||||||
/// Composite the cursor onto the GPU frame texture (zero-copy path).
|
/// Composite the cursor onto the GPU frame texture (zero-copy path).
|
||||||
unsafe fn composite_cursor_gpu(&mut self, gpu: &ID3D11Texture2D) -> Result<()> {
|
unsafe fn composite_cursor_gpu(&mut self, gpu: &ID3D11Texture2D) -> Result<()> {
|
||||||
|
// Diagnostic kill-switch: skip the GPU cursor composite entirely (PUNKTFUNK_NO_CURSOR=1) to
|
||||||
|
// isolate its cost on the 3D engine. The per-frame render-target view + draw to the 5K target
|
||||||
|
// is the suspect for the high 3D usage under heavy desktop change.
|
||||||
|
if std::env::var_os("PUNKTFUNK_NO_CURSOR").is_some() {
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
self.dbg_cursor += 1;
|
self.dbg_cursor += 1;
|
||||||
if self.dbg_cursor % 240 == 1 {
|
if self.dbg_cursor % 240 == 1 {
|
||||||
tracing::debug!(
|
tracing::debug!(
|
||||||
visible = self.cursor_visible,
|
visible = self.cursor_visible,
|
||||||
pos = format!("{:?}", self.cursor_pos),
|
pos = format!("{:?}", self.cursor_pos),
|
||||||
shape = self.cursor_shape.as_ref().map(|(_, w, h)| format!("{w}x{h}")),
|
shape = self
|
||||||
|
.cursor_shape
|
||||||
|
.as_ref()
|
||||||
|
.map(|(_, w, h)| format!("{w}x{h}")),
|
||||||
"cursor state"
|
"cursor state"
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
@@ -898,58 +1163,70 @@ impl DuplCapturer {
|
|||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// ONE rebuild attempt — deliberately non-blocking. ACCESS_LOST fires on desktop switches
|
||||||
|
/// (normal ↔ Winlogon secure: lock/login/UAC) and on the mode change we issue at create. We
|
||||||
|
/// re-attach to the now-current input desktop and recreate the D3D11 device + duplication on it
|
||||||
|
/// (a device made on the previous desktop can't sustain a duplication on the new one). CRUCIAL:
|
||||||
|
/// no internal multi-second retry loop — during a secure-desktop dwell the SudoVDA output is
|
||||||
|
/// *gone* (`no DXGI output named …`), and a blocking retry here would starve the encode/send
|
||||||
|
/// loop of frames for seconds, so the client times out and disconnects (the bug this fixes).
|
||||||
|
/// Instead a single attempt returns immediately; the caller ([`acquire`]) repeats the last good
|
||||||
|
/// frame and retries on a throttle, so the session survives an arbitrarily long secure visit.
|
||||||
unsafe fn recreate_dupl(&mut self) -> Result<()> {
|
unsafe fn recreate_dupl(&mut self) -> Result<()> {
|
||||||
if self.holding_frame {
|
if self.holding_frame {
|
||||||
let _ = self.dupl.ReleaseFrame();
|
let _ = self.dupl.ReleaseFrame();
|
||||||
self.holding_frame = false;
|
self.holding_frame = false;
|
||||||
}
|
}
|
||||||
// ACCESS_LOST fires on desktop switches (normal ↔ Winlogon secure: lock/login/UAC) and on the
|
// The SudoVDA output's GDI name can CHANGE across a secure-desktop topology rebuild —
|
||||||
// mode change we issue at create. Re-attach to the now-current input desktop AND recreate the
|
// re-resolve from the STABLE target id so we find it under its current name.
|
||||||
// D3D11 device on it: a device made on the previous desktop cannot sustain a duplication on the
|
if let Some(n) = crate::vdisplay::sudovda::resolve_gdi_name(self.target_id) {
|
||||||
// new one (perpetual ACCESS_LOST). The capturer hands the new device out on `FramePayload::D3d11`,
|
self.gdi_name = n;
|
||||||
// so NVENC re-inits when it sees it. Retry while the desktop is mid-reconfigure.
|
}
|
||||||
let deadline = Instant::now() + Duration::from_millis(12000);
|
attach_input_desktop();
|
||||||
loop {
|
let (dev, ctx, out, dupl) = reopen_duplication(&self.gdi_name)?; // Err → caller repeats + retries
|
||||||
// The SudoVDA virtual output's GDI name can CHANGE across a secure-desktop topology
|
// A desktop switch can come back at a different size (e.g. the user session applies its own
|
||||||
// rebuild — the observed failure was searching for the stale \\.\DISPLAYn until the
|
// resolution on login). Adopt it: update dimensions and drop the staging/gpu copies so they
|
||||||
// deadline and dying ("no DXGI output named ..."). Re-resolve it from the STABLE target
|
// reallocate. NVENC re-inits at the new size when it sees the frame.
|
||||||
// id each retry so recovery finds the output under its current name.
|
let dd: DXGI_OUTDUPL_DESC = dupl.GetDesc();
|
||||||
if let Some(n) = crate::vdisplay::sudovda::resolve_gdi_name(self.target_id) {
|
let (nw, nh) = (dd.ModeDesc.Width, dd.ModeDesc.Height);
|
||||||
self.gdi_name = n;
|
tracing::info!(
|
||||||
}
|
dxgi_format = dd.ModeDesc.Format.0,
|
||||||
attach_input_desktop();
|
"DXGI duplication rebuilt (format: 87=BGRA8 24=R10G10B10A2 10=R16G16B16A16_FLOAT)"
|
||||||
match reopen_duplication(&self.gdi_name) {
|
);
|
||||||
Ok((dev, ctx, out, dupl)) => {
|
if nw != self.width || nh != self.height {
|
||||||
// A desktop switch can come back at a different size (e.g. the user session applies
|
tracing::info!(
|
||||||
// its own resolution on login). Adopt it: update dimensions and drop the staging/gpu
|
old = format!("{}x{}", self.width, self.height),
|
||||||
// copies so they reallocate. NVENC re-inits at the new size when it sees the frame.
|
new = format!("{nw}x{nh}"),
|
||||||
let dd: DXGI_OUTDUPL_DESC = dupl.GetDesc();
|
"DXGI duplication size changed across switch"
|
||||||
let (nw, nh) = (dd.ModeDesc.Width, dd.ModeDesc.Height);
|
);
|
||||||
if nw != self.width || nh != self.height {
|
self.width = nw;
|
||||||
tracing::info!(
|
self.height = nh;
|
||||||
old = format!("{}x{}", self.width, self.height),
|
self.staging = None;
|
||||||
new = format!("{nw}x{nh}"),
|
}
|
||||||
"DXGI duplication size changed across switch"
|
self.device = dev;
|
||||||
);
|
self.context = ctx;
|
||||||
self.width = nw;
|
self.output = out;
|
||||||
self.height = nh;
|
self.dupl = dupl;
|
||||||
self.staging = None;
|
self.gpu_copy = None; // stale: belonged to the old device
|
||||||
}
|
self.cursor = None; // shaders/textures belonged to the old device; rebuilt on demand
|
||||||
self.device = dev;
|
self.last_present = None; // belonged to the old device; reseeded below
|
||||||
self.context = ctx;
|
// Re-detect HDR and drop the HDR textures/converter (old device). Toggling HDR on or
|
||||||
self.output = out;
|
// off is exactly this path: the duplication comes back as FP16 (HDR) or BGRA8.
|
||||||
self.dupl = dupl;
|
self.hdr_fp16 = dd.ModeDesc.Format == DXGI_FORMAT_R16G16B16A16_FLOAT;
|
||||||
self.gpu_copy = None; // stale: belonged to the old device
|
self.fp16_src = None;
|
||||||
self.cursor = None; // shaders/textures belonged to the old device; rebuilt on demand
|
self.fp16_srv = None;
|
||||||
self.have_gpu_frame = false;
|
self.hdr10_out = None;
|
||||||
self.first_frame = true;
|
self.hdr_conv = None;
|
||||||
nudge_cursor_onto(&self.output); // re-kick after recovery
|
self.first_frame = true;
|
||||||
return Ok(());
|
// Seed a black frame on the NEW device so next_frame always has something to repeat (and the
|
||||||
}
|
// encoder re-inits) until real frames resume.
|
||||||
Err(e) if Instant::now() >= deadline => return Err(e),
|
if self.gpu_mode {
|
||||||
Err(_) => std::thread::sleep(Duration::from_millis(120)),
|
if let Err(e) = self.seed_black_gpu_frame() {
|
||||||
|
tracing::warn!(error = %format!("{e:#}"), "seed black frame after recovery failed");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
nudge_cursor_onto(&self.output); // re-kick after recovery
|
||||||
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Acquire one frame: `Some` on a fresh image, `None` on timeout (no change → caller reuses last).
|
/// Acquire one frame: `Some` on a fresh image, `None` on timeout (no change → caller reuses last).
|
||||||
@@ -960,7 +1237,11 @@ impl DuplCapturer {
|
|||||||
}
|
}
|
||||||
let mut info = DXGI_OUTDUPL_FRAME_INFO::default();
|
let mut info = DXGI_OUTDUPL_FRAME_INFO::default();
|
||||||
let mut res: Option<IDXGIResource> = None;
|
let mut res: Option<IDXGIResource> = None;
|
||||||
let timeout = if self.first_frame { 2000 } else { self.timeout_ms };
|
let timeout = if self.first_frame {
|
||||||
|
2000
|
||||||
|
} else {
|
||||||
|
self.timeout_ms
|
||||||
|
};
|
||||||
match self.dupl.AcquireNextFrame(timeout, &mut info, &mut res) {
|
match self.dupl.AcquireNextFrame(timeout, &mut info, &mut res) {
|
||||||
Ok(()) => {
|
Ok(()) => {
|
||||||
if self.first_frame {
|
if self.first_frame {
|
||||||
@@ -993,13 +1274,31 @@ impl DuplCapturer {
|
|||||||
|| e.code() == DXGI_ERROR_DEVICE_RESET =>
|
|| e.code() == DXGI_ERROR_DEVICE_RESET =>
|
||||||
{
|
{
|
||||||
self.dbg_lost += 1;
|
self.dbg_lost += 1;
|
||||||
tracing::warn!(
|
// THROTTLED, NON-BLOCKING recovery. During a secure-desktop dwell the SudoVDA output
|
||||||
lost = self.dbg_lost,
|
// is gone, so a rebuild fails for the whole visit. We must NOT block retrying (that
|
||||||
code = format!("{:#x}", e.code().0),
|
// starves the encode/send loop → the client times out → disconnect — the bug). Try a
|
||||||
"DXGI capture lost (desktop switch?) — recovering"
|
// rebuild at most ~4×/s; between attempts return "no new frame" so next_frame repeats
|
||||||
);
|
// the last good frame, keeping the client fed (frozen) until the desktop returns. A
|
||||||
self.recreate_dupl()?;
|
// brief sleep on the throttled path avoids busy-spinning on the dead duplication.
|
||||||
self.first_frame = true;
|
let now = Instant::now();
|
||||||
|
let due = self.last_rebuild.map_or(true, |t| {
|
||||||
|
now.duration_since(t) >= Duration::from_millis(250)
|
||||||
|
});
|
||||||
|
if due {
|
||||||
|
self.last_rebuild = Some(now);
|
||||||
|
if self.dbg_lost % 8 == 1 {
|
||||||
|
tracing::warn!(
|
||||||
|
lost = self.dbg_lost,
|
||||||
|
code = format!("{:#x}", e.code().0),
|
||||||
|
"DXGI capture lost (desktop switch?) — repeating last frame, retrying rebuild"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
if self.recreate_dupl().is_ok() {
|
||||||
|
self.first_frame = true;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
std::thread::sleep(Duration::from_millis(8));
|
||||||
|
}
|
||||||
return Ok(None);
|
return Ok(None);
|
||||||
}
|
}
|
||||||
Err(e) => return Err(e).context("AcquireNextFrame"),
|
Err(e) => return Err(e).context("AcquireNextFrame"),
|
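The throttle in the hunk above (rebuild at most once per 250 ms, otherwise report "no new frame") can be isolated into a small gate. This is a minimal sketch, not code from the commit: `RebuildThrottle` and `try_acquire` are hypothetical names, and the caller is assumed to pass in `Instant::now()`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not in the commit): gates rebuild attempts to one per `min_gap`.
struct RebuildThrottle {
    last_attempt: Option<Instant>,
    min_gap: Duration,
}

impl RebuildThrottle {
    fn new(min_gap: Duration) -> Self {
        Self { last_attempt: None, min_gap }
    }

    /// Returns true (and records `now`) when no attempt has been made yet, or when
    /// at least `min_gap` has elapsed since the last recorded attempt.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let due = self
            .last_attempt
            .map_or(true, |t| now.duration_since(t) >= self.min_gap);
        if due {
            self.last_attempt = Some(now);
        }
        due
    }
}
```

With a gate like this, the capture-lost arm would attempt a rebuild only when `try_acquire` returns true and sleep briefly otherwise, returning `Ok(None)` in both cases so the caller keeps repeating the last frame.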
||||||
@@ -1007,6 +1306,47 @@ impl DuplCapturer {
|
|||||||
self.holding_frame = true;
|
self.holding_frame = true;
|
||||||
let res = res.context("AcquireNextFrame: null resource")?;
|
let res = res.context("AcquireNextFrame: null resource")?;
|
||||||
let tex: ID3D11Texture2D = res.cast().context("resource -> Texture2D")?;
|
let tex: ID3D11Texture2D = res.cast().context("resource -> Texture2D")?;
|
||||||
|
if self.gpu_mode && self.hdr_fp16 {
|
||||||
|
// HDR zero-copy path: the duplication surface is scRGB FP16 (R16G16B16A16_FLOAT) — it can't
|
||||||
|
// be CopyResource'd into a BGRA target (that was the freeze + cursor-trail bug). Copy it into
|
||||||
|
// an FP16 SRV texture (same format → valid), composite the cursor onto it (the cursor lands
|
||||||
|
// at ~SDR-white brightness, then goes through the PQ curve correctly), then convert scRGB →
|
||||||
|
// BT.2020 PQ 10-bit into hdr10_out and hand THAT to NVENC (HEVC Main10 / HDR10).
|
||||||
|
self.ensure_fp16_src()?;
|
||||||
|
let src = self.fp16_src.clone().context("fp16 src texture")?;
|
||||||
|
self.context.CopyResource(&src, &tex);
|
||||||
|
let _ = self.dupl.ReleaseFrame();
|
||||||
|
self.holding_frame = false;
|
||||||
|
self.composite_cursor_gpu(&src)?; // onto the FP16 surface (RTV works on FP16)
|
||||||
|
self.ensure_hdr10_out()?;
|
||||||
|
let out = self.hdr10_out.clone().context("hdr10 out texture")?;
|
||||||
|
if self.hdr_conv.is_none() {
|
||||||
|
self.hdr_conv = Some(HdrConverter::new(&self.device)?);
|
||||||
|
}
|
||||||
|
let srv = self.fp16_srv.clone().context("fp16 srv")?;
|
||||||
|
let mut rtv: Option<ID3D11RenderTargetView> = None;
|
||||||
|
self.device
|
||||||
|
.CreateRenderTargetView(&out, None, Some(&mut rtv))?;
|
||||||
|
let rtv = rtv.context("hdr10 rtv")?;
|
||||||
|
self.hdr_conv.as_ref().unwrap().convert(
|
||||||
|
&self.context,
|
||||||
|
&srv,
|
||||||
|
&rtv,
|
||||||
|
self.width,
|
||||||
|
self.height,
|
||||||
|
);
|
||||||
|
self.last_present = Some((out.clone(), PixelFormat::Rgb10a2));
|
||||||
|
return Ok(Some(CapturedFrame {
|
||||||
|
width: self.width,
|
||||||
|
height: self.height,
|
||||||
|
pts_ns: now_ns(),
|
||||||
|
format: PixelFormat::Rgb10a2,
|
||||||
|
payload: FramePayload::D3d11(D3d11Frame {
|
||||||
|
texture: out,
|
||||||
|
device: self.device.clone(),
|
||||||
|
}),
|
||||||
|
}));
|
||||||
|
}
|
||||||
if self.gpu_mode {
|
if self.gpu_mode {
|
||||||
// Zero-copy path: keep the frame on the GPU for NVENC. Copy the transient duplication
|
// Zero-copy path: keep the frame on the GPU for NVENC. Copy the transient duplication
|
||||||
// surface into a reused owned texture, release the duplication frame, hand off the texture.
|
// surface into a reused owned texture, release the duplication frame, hand off the texture.
|
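The HDR zero-copy path above converts scRGB FP16 into BT.2020 PQ 10-bit on the GPU; the diff does not show the `HdrConverter` shader. As a CPU-side illustration of the per-pixel math, here is a sketch under stated assumptions: scRGB is linear with BT.709 primaries and 1.0 equal to 80 cd/m², the primaries matrix is the rounded ITU-R BT.2087 one, and the PQ constants are those of SMPTE ST 2084. All function names are hypothetical, and the real shader may clamp or scale differently.

```rust
/// SMPTE ST 2084 (PQ) constants.
const M1: f64 = 2610.0 / 16384.0;
const M2: f64 = 2523.0 / 4096.0 * 128.0;
const C1: f64 = 3424.0 / 4096.0;
const C2: f64 = 2413.0 / 4096.0 * 32.0;
const C3: f64 = 2392.0 / 4096.0 * 32.0;

/// Linear BT.709 to linear BT.2020 primaries (ITU-R BT.2087 matrix, rounded).
const BT709_TO_BT2020: [[f64; 3]; 3] = [
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
];

/// PQ inverse EOTF: `y` is luminance normalised so that 1.0 == 10 000 cd/m².
fn pq_encode(y: f64) -> f64 {
    let yp = y.clamp(0.0, 1.0).powf(M1);
    ((C1 + C2 * yp) / (1.0 + C3 * yp)).powf(M2)
}

/// scRGB (linear, BT.709 primaries, 1.0 == 80 cd/m²) to PQ-encoded BT.2020 in [0, 1].
/// Out-of-gamut negative results are clamped to zero inside `pq_encode`.
fn scrgb_to_bt2020_pq(rgb: [f64; 3]) -> [f64; 3] {
    let mut out = [0.0; 3];
    for (o, row) in out.iter_mut().zip(BT709_TO_BT2020.iter()) {
        let linear = row[0] * rgb[0] + row[1] * rgb[1] + row[2] * rgb[2];
        *o = pq_encode(linear * 80.0 / 10_000.0);
    }
    out
}

/// Quantise to 10 bits and pack as R10G10B10A2 (R in the low bits, opaque 2-bit alpha on top).
fn pack_rgb10a2(pq: [f64; 3]) -> u32 {
    let q = |v: f64| (v.clamp(0.0, 1.0) * 1023.0).round() as u32;
    q(pq[0]) | (q(pq[1]) << 10) | (q(pq[2]) << 20) | (3 << 30)
}
```

SDR white in scRGB (1.0, 1.0, 1.0) sits at 80 cd/m², which lands a little below the midpoint of the PQ signal range. That is why the cursor composited at SDR-white brightness on the FP16 surface still looks correct after the conversion.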
||||||
@@ -1015,8 +1355,8 @@ impl DuplCapturer {
|
|||||||
self.context.CopyResource(&gpu, &tex);
|
self.context.CopyResource(&gpu, &tex);
|
||||||
let _ = self.dupl.ReleaseFrame();
|
let _ = self.dupl.ReleaseFrame();
|
||||||
self.holding_frame = false;
|
self.holding_frame = false;
|
||||||
self.have_gpu_frame = true;
|
|
||||||
self.composite_cursor_gpu(&gpu)?;
|
self.composite_cursor_gpu(&gpu)?;
|
||||||
|
self.last_present = Some((gpu.clone(), PixelFormat::Bgra));
|
||||||
return Ok(Some(CapturedFrame {
|
return Ok(Some(CapturedFrame {
|
||||||
width: self.width,
|
width: self.width,
|
||||||
height: self.height,
|
height: self.height,
|
||||||
@@ -1079,20 +1419,23 @@ impl Capturer for DuplCapturer {
|
|||||||
fn next_frame(&mut self) -> Result<CapturedFrame> {
|
fn next_frame(&mut self) -> Result<CapturedFrame> {
|
||||||
// Generous: a secure-desktop switch can take several seconds to settle (re-resolve + recreate
|
// Generous: a secure-desktop switch can take several seconds to settle (re-resolve + recreate
|
||||||
// the duplication up to 12 s). Better a few seconds of frozen-last-frame than dropping the stream.
|
// the duplication up to 12 s). Better a few seconds of frozen-last-frame than dropping the stream.
|
||||||
let deadline = Instant::now() + Duration::from_secs(20);
|
let mut deadline = Instant::now() + Duration::from_secs(20);
|
||||||
loop {
|
loop {
|
||||||
if let Some(f) = unsafe { self.acquire() }? {
|
if let Some(f) = unsafe { self.acquire() }? {
|
||||||
|
self.ever_got_frame = true;
|
||||||
return Ok(f);
|
return Ok(f);
|
||||||
}
|
}
|
||||||
if self.gpu_mode && self.have_gpu_frame {
|
if self.gpu_mode {
|
||||||
if let Some(gpu) = &self.gpu_copy {
|
if let Some((tex, fmt)) = &self.last_present {
|
||||||
|
// Repeat the last presented GPU frame (SDR BGRA or HDR 10-bit), keeping the encoder
|
||||||
|
// on a matching format through a static desktop or a mid-rebuild gap.
|
||||||
return Ok(CapturedFrame {
|
return Ok(CapturedFrame {
|
||||||
width: self.width,
|
width: self.width,
|
||||||
height: self.height,
|
height: self.height,
|
||||||
pts_ns: now_ns(),
|
pts_ns: now_ns(),
|
||||||
format: PixelFormat::Bgra,
|
format: *fmt,
|
||||||
payload: FramePayload::D3d11(D3d11Frame {
|
payload: FramePayload::D3d11(D3d11Frame {
|
||||||
texture: gpu.clone(),
|
texture: tex.clone(),
|
||||||
device: self.device.clone(),
|
device: self.device.clone(),
|
||||||
}),
|
}),
|
||||||
});
|
});
|
||||||
@@ -1108,6 +1451,14 @@ impl Capturer for DuplCapturer {
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
if Instant::now() > deadline {
|
if Instant::now() > deadline {
|
||||||
|
// After we've streamed at least once, never fatally drop on a frame drought: a long
|
||||||
|
// secure-desktop dwell (or a slow rebuild) just means no NEW frame yet. Reset the
|
||||||
|
// deadline and keep repeating the last/seeded frame so the session stays alive. The
|
||||||
|
// deadline stays fatal only before the first frame — a genuine "monitor never lit up".
|
||||||
|
if self.ever_got_frame {
|
||||||
|
deadline = Instant::now() + Duration::from_secs(20);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
return Err(anyhow!(
|
return Err(anyhow!(
|
||||||
"no DXGI frame within 20s (SudoVDA monitor not activated by a WDDM GPU?)"
|
"no DXGI frame within 20s (SudoVDA monitor not activated by a WDDM GPU?)"
|
||||||
));
|
));
|
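The deadline handling in `next_frame` above reduces to a three-way decision: keep waiting, reset the deadline, or fail. The following sketch restates that decision as a pure function so it can be reasoned about in isolation; `DroughtAction` and `on_no_frame` are hypothetical names that do not appear in the commit.

```rust
use std::time::{Duration, Instant};

/// What to do when `acquire` produced no new frame (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum DroughtAction {
    /// Deadline not reached yet: keep looping and repeating the last frame.
    KeepWaiting,
    /// Deadline passed after at least one real frame: push the deadline out and continue.
    ResetDeadline(Instant),
    /// Deadline passed before any frame ever arrived: treat as fatal.
    Fail,
}

fn on_no_frame(
    now: Instant,
    deadline: Instant,
    ever_got_frame: bool,
    window: Duration,
) -> DroughtAction {
    if now <= deadline {
        DroughtAction::KeepWaiting
    } else if ever_got_frame {
        DroughtAction::ResetDeadline(now + window)
    } else {
        DroughtAction::Fail
    }
}
```

The only fatal case is a drought before the first frame, which matches the "monitor never lit up" condition described in the comment.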
||||||
|
|||||||
@@ -102,6 +102,7 @@ pub fn validate_dimensions(codec: Codec, width: u32, height: u32) -> Result<()>
|
|||||||
/// encoder takes GPU frames (`AV_PIX_FMT_CUDA`) from the zero-copy path; otherwise it takes
|
/// encoder takes GPU frames (`AV_PIX_FMT_CUDA`) from the zero-copy path; otherwise it takes
|
||||||
/// packed RGB/BGR CPU frames. `format`/`bitrate_bps`/`codec`/mode come from session
|
/// packed RGB/BGR CPU frames. `format`/`bitrate_bps`/`codec`/mode come from session
|
||||||
/// negotiation; the caller derives `cuda` from the first captured frame's payload.
|
/// negotiation; the caller derives `cuda` from the first captured frame's payload.
|
||||||
|
#[allow(clippy::too_many_arguments)]
|
||||||
pub fn open_video(
|
pub fn open_video(
|
||||||
codec: Codec,
|
codec: Codec,
|
||||||
format: PixelFormat,
|
format: PixelFormat,
|
||||||
@@ -110,6 +111,7 @@ pub fn open_video(
|
|||||||
fps: u32,
|
fps: u32,
|
||||||
bitrate_bps: u64,
|
bitrate_bps: u64,
|
||||||
cuda: bool,
|
cuda: bool,
|
||||||
|
bit_depth: u8,
|
||||||
) -> Result<Box<dyn Encoder>> {
|
) -> Result<Box<dyn Encoder>> {
|
||||||
validate_dimensions(codec, width, height)?;
|
validate_dimensions(codec, width, height)?;
|
||||||
#[cfg(target_os = "linux")]
|
#[cfg(target_os = "linux")]
|
||||||
@@ -134,7 +136,7 @@ pub fn open_video(
|
|||||||
}
|
}
|
||||||
let mut last: Option<anyhow::Error> = None;
|
let mut last: Option<anyhow::Error> = None;
|
||||||
for (i, &b) in candidates.iter().enumerate() {
|
for (i, &b) in candidates.iter().enumerate() {
|
||||||
match linux::NvencEncoder::open(codec, format, width, height, fps, b, cuda) {
|
match linux::NvencEncoder::open(codec, format, width, height, fps, b, cuda, bit_depth) {
|
||||||
Ok(enc) => {
|
Ok(enc) => {
|
||||||
if i > 0 {
|
if i > 0 {
|
||||||
tracing::warn!(
|
tracing::warn!(
|
||||||
@@ -158,6 +160,7 @@ pub fn open_video(
|
|||||||
#[cfg(target_os = "windows")]
|
#[cfg(target_os = "windows")]
|
||||||
{
|
{
|
||||||
let _ = cuda; // always false on Windows (no Cuda payload)
|
let _ = cuda; // always false on Windows (no Cuda payload)
|
||||||
|
let _ = bit_depth; // used by the NVENC path below; the software H.264 path is 8-bit only
|
||||||
let pref = std::env::var("PUNKTFUNK_ENCODER")
|
let pref = std::env::var("PUNKTFUNK_ENCODER")
|
||||||
.unwrap_or_default()
|
.unwrap_or_default()
|
||||||
.to_ascii_lowercase();
|
.to_ascii_lowercase();
|
||||||
@@ -166,8 +169,15 @@ pub fn open_video(
|
|||||||
// FramePayload::D3d11 output under the same env var so capture + encode share textures.
|
// FramePayload::D3d11 output under the same env var so capture + encode share textures.
|
||||||
#[cfg(feature = "nvenc")]
|
#[cfg(feature = "nvenc")]
|
||||||
{
|
{
|
||||||
let enc =
|
let enc = nvenc::NvencD3d11Encoder::open(
|
||||||
nvenc::NvencD3d11Encoder::open(codec, format, width, height, fps, bitrate_bps)?;
|
codec,
|
||||||
|
format,
|
||||||
|
width,
|
||||||
|
height,
|
||||||
|
fps,
|
||||||
|
bitrate_bps,
|
||||||
|
bit_depth,
|
||||||
|
)?;
|
||||||
return Ok(Box::new(enc) as Box<dyn Encoder>);
|
return Ok(Box::new(enc) as Box<dyn Encoder>);
|
||||||
}
|
}
|
||||||
#[cfg(not(feature = "nvenc"))]
|
#[cfg(not(feature = "nvenc"))]
|
||||||
@@ -196,7 +206,16 @@ pub fn open_video(
|
|||||||
}
|
}
|
||||||
#[cfg(not(any(target_os = "linux", target_os = "windows")))]
|
#[cfg(not(any(target_os = "linux", target_os = "windows")))]
|
||||||
{
|
{
|
||||||
let _ = (codec, format, width, height, fps, bitrate_bps, cuda);
|
let _ = (
|
||||||
|
codec,
|
||||||
|
format,
|
||||||
|
width,
|
||||||
|
height,
|
||||||
|
fps,
|
||||||
|
bitrate_bps,
|
||||||
|
cuda,
|
||||||
|
bit_depth,
|
||||||
|
);
|
||||||
anyhow::bail!("video encode requires Linux or Windows")
|
anyhow::bail!("video encode requires Linux or Windows")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -103,6 +103,9 @@ fn nvenc_input(format: PixelFormat) -> (Pixel, bool) {
|
|||||||
PixelFormat::Rgba => (Pixel::RGBA, false),
|
PixelFormat::Rgba => (Pixel::RGBA, false),
|
||||||
PixelFormat::Rgb => (Pixel::RGBZ, true), // RGB -> rgb0
|
PixelFormat::Rgb => (Pixel::RGBZ, true), // RGB -> rgb0
|
||||||
PixelFormat::Bgr => (Pixel::BGRZ, true), // BGR -> bgr0
|
PixelFormat::Bgr => (Pixel::BGRZ, true), // BGR -> bgr0
|
||||||
|
// 10-bit HDR (R10G10B10A2) is produced only by the Windows DXGI HDR capture path; the Linux
|
||||||
|
// capturer never emits it. Map to BGRA so the match is exhaustive — unreachable here.
|
||||||
|
PixelFormat::Rgb10a2 => (Pixel::BGRA, false),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -131,6 +134,7 @@ pub struct NvencEncoder {
|
|||||||
unsafe impl Send for NvencEncoder {}
|
unsafe impl Send for NvencEncoder {}
|
||||||
|
|
||||||
impl NvencEncoder {
|
impl NvencEncoder {
|
||||||
|
#[allow(clippy::too_many_arguments)]
|
||||||
pub fn open(
|
pub fn open(
|
||||||
codec: Codec,
|
codec: Codec,
|
||||||
format: PixelFormat,
|
format: PixelFormat,
|
||||||
@@ -139,7 +143,18 @@ impl NvencEncoder {
|
|||||||
fps: u32,
|
fps: u32,
|
||||||
bitrate_bps: u64,
|
bitrate_bps: u64,
|
||||||
cuda: bool,
|
cuda: bool,
|
||||||
|
bit_depth: u8,
|
||||||
) -> Result<Self> {
|
) -> Result<Self> {
|
||||||
|
// TODO(hdr): Linux 10-bit parity. Unlike the Windows raw-SDK path (which upconverts 8-bit
|
||||||
|
// ARGB → Main10 via pixelBitDepthMinus8), libavcodec hevc_nvenc needs a 10-bit input pixel
|
||||||
|
// format (p010) for Main10, so it's a bigger change; deferred until a Linux GPU box is
|
||||||
|
// available to validate. The Linux host stays 8-bit for now.
|
||||||
|
if bit_depth != 8 {
|
||||||
|
tracing::warn!(
|
||||||
|
bit_depth,
|
||||||
|
"Linux NVENC 10-bit not yet wired — encoding 8-bit"
|
||||||
|
);
|
||||||
|
}
|
||||||
ffmpeg::init().context("ffmpeg init")?;
|
ffmpeg::init().context("ffmpeg init")?;
|
||||||
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
|
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
|
||||||
unsafe { ffi::av_log_set_level(48) }; // AV_LOG_DEBUG — surface NVENC hw-frame rejects
|
unsafe { ffi::av_log_set_level(48) }; // AV_LOG_DEBUG — surface NVENC hw-frame rejects
|
||||||
|
|||||||
@@ -43,6 +43,12 @@ pub struct NvencD3d11Encoder {
|
|||||||
fps: u32,
|
fps: u32,
|
||||||
bitrate_bps: u64,
|
bitrate_bps: u64,
|
||||||
buffer_fmt: nv::NV_ENC_BUFFER_FORMAT,
|
buffer_fmt: nv::NV_ENC_BUFFER_FORMAT,
|
||||||
|
/// Encoded bit depth (8 or 10). 10 → HEVC Main10 (NVENC upconverts the 8-bit ARGB input).
|
||||||
|
bit_depth: u8,
|
||||||
|
/// HDR: the capturer is delivering BT.2020 PQ 10-bit (`PixelFormat::Rgb10a2`) frames. Sets the
|
||||||
|
/// `ABGR10` input format + the BT.2020/PQ colour VUI. Derived per-frame from the capture format
|
||||||
|
/// (HDR can toggle mid-session); a change re-inits the session.
|
||||||
|
hdr: bool,
|
||||||
/// Registrations of the capturer's input textures, cached by texture raw pointer — NVENC encodes
|
/// Registrations of the capturer's input textures, cached by texture raw pointer — NVENC encodes
|
||||||
/// them in place (no per-frame copy). The cloned `ID3D11Texture2D` keeps each alive until we
|
/// them in place (no per-frame copy). The cloned `ID3D11Texture2D` keeps each alive until we
|
||||||
/// unregister it (the capturer may drop its copy on a device recreate before our teardown runs).
|
/// unregister it (the capturer may drop its copy on a device recreate before our teardown runs).
|
||||||
@@ -71,6 +77,7 @@ impl NvencD3d11Encoder {
|
|||||||
height: u32,
|
height: u32,
|
||||||
fps: u32,
|
fps: u32,
|
||||||
bitrate_bps: u64,
|
bitrate_bps: u64,
|
||||||
|
bit_depth: u8,
|
||||||
) -> Result<Self> {
|
) -> Result<Self> {
|
||||||
Ok(Self {
|
Ok(Self {
|
||||||
encoder: ptr::null_mut(),
|
encoder: ptr::null_mut(),
|
||||||
@@ -80,6 +87,8 @@ impl NvencD3d11Encoder {
|
|||||||
fps,
|
fps,
|
||||||
bitrate_bps,
|
bitrate_bps,
|
||||||
buffer_fmt: nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB,
|
buffer_fmt: nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB,
|
||||||
|
bit_depth,
|
||||||
|
hdr: false,
|
||||||
regs: HashMap::new(),
|
regs: HashMap::new(),
|
||||||
next: 0,
|
next: 0,
|
||||||
bitstreams: Vec::new(),
|
bitstreams: Vec::new(),
|
||||||
@@ -139,7 +148,8 @@ impl NvencD3d11Encoder {
|
|||||||
// it at low pixel rates). Env override PUNKTFUNK_SPLIT_ENCODE = 0/disable | 1/auto | 2 | 3.
|
// it at low pixel rates). Env override PUNKTFUNK_SPLIT_ENCODE = 0/disable | 1/auto | 2 | 3.
|
||||||
// HEVC/AV1 only; the init-failure fallback below disables it if a codec/config rejects it.
|
// HEVC/AV1 only; the init-failure fallback below disables it if a codec/config rejects it.
|
||||||
let pixel_rate = self.width as u64 * self.height as u64 * self.fps.max(1) as u64;
|
let pixel_rate = self.width as u64 * self.height as u64 * self.fps.max(1) as u64;
|
||||||
let mut split_mode: u32 = match std::env::var("PUNKTFUNK_SPLIT_ENCODE").ok().as_deref() {
|
let mut split_mode: u32 = match std::env::var("PUNKTFUNK_SPLIT_ENCODE").ok().as_deref()
|
||||||
|
{
|
||||||
Some("0") | Some("disable") => {
|
Some("0") | Some("disable") => {
|
||||||
nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE as u32
|
nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE as u32
|
||||||
}
|
}
|
||||||
@@ -202,6 +212,33 @@ impl NvencD3d11Encoder {
|
|||||||
cfg.rcParams.vbvBufferSize = vbv;
|
cfg.rcParams.vbvBufferSize = vbv;
|
||||||
cfg.rcParams.vbvInitialDelay = vbv;
|
cfg.rcParams.vbvInitialDelay = vbv;
|
||||||
|
|
||||||
|
// 3b. 10-bit HEVC Main10. In SDR the 8-bit ARGB capture input is upconverted by NVENC (the
|
||||||
|
// proven high-bit-depth-from-8-bit path); the encoded stream is 10-bit, which reduces
|
||||||
|
// banding and is the foundation for HDR. Color stays BT.709 here; step 3c below sets the
|
||||||
|
// BT.2020/PQ VUI when `self.hdr` is set. 8-bit leaves the preset default (Main) untouched.
|
||||||
|
if self.bit_depth == 10 {
|
||||||
|
cfg.profileGUID = nv::NV_ENC_HEVC_PROFILE_MAIN10_GUID;
|
||||||
|
cfg.encodeCodecConfig.hevcConfig.set_pixelBitDepthMinus8(2);
|
||||||
|
// 10 - 8
|
||||||
|
}
|
||||||
|
|
||||||
|
// 3c. HDR colour signaling: BT.2020 primaries + SMPTE ST 2084 (PQ) transfer in the
|
||||||
|
// HEVC VUI, so a decoder/display knows the 10-bit samples are PQ HDR (not SDR gamma).
|
||||||
|
// The capturer already produced PQ-encoded BT.2020 pixels; this just describes them.
|
||||||
|
// (HDR10 static metadata — mastering display + MaxCLL/MaxFALL — is added in a follow-up.)
|
||||||
|
if self.hdr {
|
||||||
|
let vui = &mut cfg.encodeCodecConfig.hevcConfig.hevcVUIParameters;
|
||||||
|
vui.videoSignalTypePresentFlag = 1;
|
||||||
|
vui.videoFullRangeFlag = 0; // limited (studio) range — NVENC RGB→YUV default
|
||||||
|
vui.colourDescriptionPresentFlag = 1;
|
||||||
|
vui.colourPrimaries =
|
||||||
|
nv::NV_ENC_VUI_COLOR_PRIMARIES::NV_ENC_VUI_COLOR_PRIMARIES_BT2020;
|
||||||
|
vui.transferCharacteristics =
|
||||||
|
nv::NV_ENC_VUI_TRANSFER_CHARACTERISTIC::NV_ENC_VUI_TRANSFER_CHARACTERISTIC_SMPTE2084;
|
||||||
|
vui.colourMatrix =
|
||||||
|
nv::NV_ENC_VUI_MATRIX_COEFFS::NV_ENC_VUI_MATRIX_COEFFS_BT2020_NCL;
|
||||||
|
}
|
||||||
|
|
||||||
// 4. initialize the encoder.
|
// 4. initialize the encoder.
|
||||||
let mut init = nv::NV_ENC_INITIALIZE_PARAMS {
|
let mut init = nv::NV_ENC_INITIALIZE_PARAMS {
|
||||||
version: nv::NV_ENC_INITIALIZE_PARAMS_VER,
|
version: nv::NV_ENC_INITIALIZE_PARAMS_VER,
|
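Step 3c above describes the stream to the decoder through three VUI code points. As a reference for what those enum names stand for, this sketch lists the corresponding ITU-T H.273 values (9 for BT.2020 primaries, 16 for SMPTE ST 2084, 9 for BT.2020 non-constant luminance). `ColourDescription` and `colour_description` are hypothetical names, and whether the NVENC enum discriminants equal these numbers should be checked against the SDK header rather than taken from this sketch.

```rust
/// Colour signalling as carried in the HEVC VUI (hypothetical struct for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ColourDescription {
    full_range: bool,
    primaries: u8, // H.273 ColourPrimaries
    transfer: u8,  // H.273 TransferCharacteristics
    matrix: u8,    // H.273 MatrixCoefficients
}

/// Returns the description to signal, or `None` to leave the VUI untouched
/// (which is what the diff does for SDR sessions).
fn colour_description(hdr: bool) -> Option<ColourDescription> {
    if hdr {
        Some(ColourDescription {
            full_range: false, // limited (studio) range
            primaries: 9,      // BT.2020
            transfer: 16,      // SMPTE ST 2084 (PQ)
            matrix: 9,         // BT.2020 non-constant luminance
        })
    } else {
        None
    }
}
```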
||||||
@@ -242,9 +279,11 @@ impl NvencD3d11Encoder {
|
|||||||
// fails, the codec/config may not accept it (e.g. H264) — disable split and retry
|
// fails, the codec/config may not accept it (e.g. H264) — disable split and retry
|
||||||
// single-engine rather than fail the session.
|
// single-engine rather than fail the session.
|
||||||
Err(e)
|
Err(e)
|
||||||
if split_mode != nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_AUTO_MODE as u32
|
if split_mode
|
||||||
|
!= nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_AUTO_MODE as u32
|
||||||
&& split_mode
|
&& split_mode
|
||||||
!= nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE as u32 =>
|
!= nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE
|
||||||
|
as u32 =>
|
||||||
{
|
{
|
||||||
let _ = (API.destroy_encoder)(enc);
|
let _ = (API.destroy_encoder)(enc);
|
||||||
tracing::warn!(error = ?e, "NVENC init rejected with split-encode forced — disabling split, retrying single-engine");
|
tracing::warn!(error = ?e, "NVENC init rejected with split-encode forced — disabling split, retrying single-engine");
|
||||||
@@ -253,7 +292,10 @@ impl NvencD3d11Encoder {
|
|||||||
}
|
}
|
||||||
Err(e) => {
|
Err(e) => {
|
||||||
let _ = (API.destroy_encoder)(enc);
|
let _ = (API.destroy_encoder)(enc);
|
||||||
return Err(anyhow!("initialize_encoder: {e:?} (even at {} Mbps floor)", FLOOR_BPS / 1_000_000));
|
return Err(anyhow!(
|
||||||
|
"initialize_encoder: {e:?} (even at {} Mbps floor)",
|
||||||
|
FLOOR_BPS / 1_000_000
|
||||||
|
));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
@@ -280,10 +322,12 @@ impl NvencD3d11Encoder {
|
|||||||
}
|
}
|
||||||
self.inited = true;
|
self.inited = true;
|
||||||
tracing::info!(
|
tracing::info!(
|
||||||
"NVENC D3D11 session: {}x{}@{} {} Mbps {:?}",
|
"NVENC D3D11 session: {}x{}@{} {}-bit{} {} Mbps {:?}",
|
||||||
self.width,
|
self.width,
|
||||||
self.height,
|
self.height,
|
||||||
self.fps,
|
self.fps,
|
||||||
|
self.bit_depth,
|
||||||
|
if self.hdr { " HDR(BT.2020 PQ)" } else { "" },
|
||||||
self.bitrate_bps / 1_000_000,
|
self.bitrate_bps / 1_000_000,
|
||||||
self.codec_guid
|
self.codec_guid
|
||||||
);
|
);
|
||||||
@@ -303,21 +347,36 @@ impl Encoder for NvencD3d11Encoder {
|
|||||||
// The capturer recreates its D3D11 device on a desktop switch (secure/Winlogon) and may come
|
// The capturer recreates its D3D11 device on a desktop switch (secure/Winlogon) and may come
|
||||||
// back at a different resolution (user session applies its own mode on login). Re-init when the
|
// back at a different resolution (user session applies its own mode on login). Re-init when the
|
||||||
// frame arrives on a different device OR at a different size than our session was built on.
|
// frame arrives on a different device OR at a different size than our session was built on.
|
||||||
|
// HDR (BT.2020 PQ 10-bit) when the capturer hands us a 10-bit R10G10B10A2 frame. This can flip
|
||||||
|
// mid-session when the user toggles HDR (which arrives as a capture device recreate anyway).
|
||||||
|
let hdr = matches!(captured.format, PixelFormat::Rgb10a2);
|
||||||
let dev_raw = frame.device.as_raw();
|
let dev_raw = frame.device.as_raw();
|
||||||
let size_changed = self.inited && (self.width != captured.width || self.height != captured.height);
|
let size_changed =
|
||||||
if self.inited && (self.init_device != dev_raw || size_changed) {
|
self.inited && (self.width != captured.width || self.height != captured.height);
|
||||||
|
let hdr_changed = self.inited && self.hdr != hdr;
|
||||||
|
if self.inited && (self.init_device != dev_raw || size_changed || hdr_changed) {
|
||||||
tracing::info!(
|
tracing::info!(
|
||||||
device_changed = self.init_device != dev_raw,
|
device_changed = self.init_device != dev_raw,
|
||||||
size_changed,
|
size_changed,
|
||||||
|
hdr_changed,
|
||||||
|
hdr,
|
||||||
new = format!("{}x{}", captured.width, captured.height),
|
new = format!("{}x{}", captured.width, captured.height),
|
||||||
"NVENC: capture device/size changed (desktop switch) — re-initializing session"
|
"NVENC: capture device/size/HDR changed — re-initializing session"
|
||||||
);
|
);
|
||||||
unsafe { self.teardown() };
|
unsafe { self.teardown() };
|
||||||
}
|
}
|
||||||
if !self.inited {
|
if !self.inited {
|
||||||
// Adopt the current frame size so the encoder always matches what the capturer produces.
|
// Adopt the current frame size + colour so the encoder always matches the capturer output.
|
||||||
self.width = captured.width;
|
self.width = captured.width;
|
||||||
self.height = captured.height;
|
self.height = captured.height;
|
||||||
|
self.hdr = hdr;
|
||||||
|
if hdr {
|
||||||
|
// 10-bit BT.2020 PQ input; force Main10 regardless of the negotiated SDR bit depth.
|
||||||
|
self.bit_depth = 10;
|
||||||
|
self.buffer_fmt = nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ABGR10;
|
||||||
|
} else {
|
||||||
|
self.buffer_fmt = nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB;
|
||||||
|
}
|
||||||
let device = frame.device.clone();
|
let device = frame.device.clone();
|
||||||
self.init_session(&device)?;
|
self.init_session(&device)?;
|
||||||
self.init_device = dev_raw;
|
self.init_device = dev_raw;
|
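The re-init logic above has two parts: deciding when an existing session must be torn down, and choosing the bit depth and input buffer format for the next session. The sketch below models both with plain types; every name in it is hypothetical, and the `device` field stands in for the raw device pointer compared in the diff. One deliberate difference: the sketch recomputes the depth from the negotiated value on every init, whereas the diff overwrites `self.bit_depth` in place when HDR is detected.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PixelFormat {
    Bgra,
    Rgb10a2,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BufferFormat {
    Argb,
    Abgr10,
}

/// Parameters the live encoder session was built with.
#[derive(Debug, Clone, Copy)]
struct Session {
    device: usize,
    width: u32,
    height: u32,
    hdr: bool,
}

/// The properties of an incoming captured frame that matter for re-init.
#[derive(Debug, Clone, Copy)]
struct Frame {
    device: usize,
    width: u32,
    height: u32,
    format: PixelFormat,
}

/// True when a live session no longer matches the frame (device, size, or HDR state changed).
fn needs_teardown(current: Option<&Session>, frame: &Frame) -> bool {
    match current {
        None => false, // nothing initialised yet, so nothing to tear down
        Some(s) => {
            let hdr = frame.format == PixelFormat::Rgb10a2;
            s.device != frame.device
                || s.width != frame.width
                || s.height != frame.height
                || s.hdr != hdr
        }
    }
}

/// Bit depth and NVENC input format for a new session: HDR frames force 10-bit + ABGR10.
fn session_params(negotiated_depth: u8, format: PixelFormat) -> (u8, BufferFormat) {
    match format {
        PixelFormat::Rgb10a2 => (10, BufferFormat::Abgr10),
        PixelFormat::Bgra => (negotiated_depth, BufferFormat::Argb),
    }
}
```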
||||||
@@ -332,7 +391,8 @@ impl Encoder for NvencD3d11Encoder {
|
|||||||
if !self.regs.contains_key(&key) {
|
if !self.regs.contains_key(&key) {
|
||||||
let mut rr = nv::NV_ENC_REGISTER_RESOURCE {
|
let mut rr = nv::NV_ENC_REGISTER_RESOURCE {
|
||||||
version: nv::NV_ENC_REGISTER_RESOURCE_VER,
|
version: nv::NV_ENC_REGISTER_RESOURCE_VER,
|
||||||
resourceType: nv::NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_DIRECTX,
|
resourceType:
|
||||||
|
nv::NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_DIRECTX,
|
||||||
width: self.width,
|
width: self.width,
|
||||||
height: self.height,
|
height: self.height,
|
||||||
pitch: 0,
|
pitch: 0,
|
||||||
|
|||||||
@@ -146,6 +146,11 @@ impl Encoder for OpenH264Encoder {
|
|||||||
self.normalize_to_bgra(bytes, 3, false);
|
self.normalize_to_bgra(bytes, 3, false);
|
||||||
self.yuv.read_rgb(BgraSliceU8::new(&self.scratch, (w, h)));
|
self.yuv.read_rgb(BgraSliceU8::new(&self.scratch, (w, h)));
|
||||||
}
|
}
|
||||||
|
// 10-bit HDR comes only from the GPU NVENC path; the software 8-bit H.264 encoder
|
||||||
|
// can't represent it (and never receives it — the capturer pairs Rgb10a2 with NVENC).
|
||||||
|
PixelFormat::Rgb10a2 => {
|
||||||
|
anyhow::bail!("software H.264 encoder cannot encode 10-bit HDR (Rgb10a2)")
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if self.force_kf {
|
if self.force_kf {
|
||||||
|
|||||||
@@ -274,6 +274,7 @@ fn stream_body(
|
|||||||
cfg.fps,
|
cfg.fps,
|
||||||
cfg.bitrate_kbps as u64 * 1000,
|
cfg.bitrate_kbps as u64 * 1000,
|
||||||
frame.is_cuda(),
|
frame.is_cuda(),
|
||||||
|
8, // GameStream/Moonlight path: 8-bit (its own codec negotiation)
|
||||||
)
|
)
|
||||||
.context("open NVENC for stream")?;
|
.context("open NVENC for stream")?;
|
||||||
// FEC overhead percent (Sunshine default 20). Override with PUNKTFUNK_FEC_PCT (0 = data-only).
|
// FEC overhead percent (Sunshine default 20). Override with PUNKTFUNK_FEC_PCT (0 = data-only).
|
||||||
|
|||||||
@@ -104,6 +104,7 @@ pub fn run(opts: Options) -> Result<()> {
|
|||||||
opts.fps,
|
opts.fps,
|
||||||
opts.bitrate_bps,
|
opts.bitrate_bps,
|
||||||
first.is_cuda(),
|
first.is_cuda(),
|
||||||
|
8, // m0 synthetic harness: 8-bit
|
||||||
)
|
)
|
||||||
.context("open encoder")?;
|
.context("open encoder")?;
|
||||||
|
|
||||||
|
|||||||
@@ -554,6 +554,25 @@ async fn serve_session(
|
|||||||
"encoder bitrate"
|
"encoder bitrate"
|
||||||
);
|
);
|
||||||
|
|
||||||
|
// Resolve the encode bit depth: HEVC Main10 only when the client advertised it AND the host
|
||||||
|
// opted in (PUNKTFUNK_10BIT). A client that can't decode 10-bit (caps bit clear, or an older
|
||||||
|
// client) always gets the 8-bit stream. PUNKTFUNK_10BIT is the host policy gate until a
|
||||||
|
// mgmt/console toggle replaces it.
|
||||||
|
let host_wants_10bit = std::env::var_os("PUNKTFUNK_10BIT").is_some();
|
||||||
|
let client_supports_10bit = hello.video_caps & punktfunk_core::quic::VIDEO_CAP_10BIT != 0;
|
||||||
|
let bit_depth: u8 = if host_wants_10bit && client_supports_10bit {
|
||||||
|
10
|
||||||
|
} else {
|
||||||
|
8
|
||||||
|
};
|
||||||
|
tracing::info!(
|
||||||
|
bit_depth,
|
||||||
|
host_wants_10bit,
|
||||||
|
client_supports_10bit,
|
||||||
|
client_video_caps = hello.video_caps,
|
||||||
|
"encode bit depth"
|
||||||
|
);
|
||||||
|
|
||||||
// Reserve a UDP port for the data plane (bind, read it back, rebind in UdpTransport).
|
// Reserve a UDP port for the data plane (bind, read it back, rebind in UdpTransport).
|
||||||
let probe = std::net::UdpSocket::bind("0.0.0.0:0")?;
|
let probe = std::net::UdpSocket::bind("0.0.0.0:0")?;
|
||||||
let udp_port = probe.local_addr()?.port();
|
let udp_port = probe.local_addr()?.port();
|
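The bit-depth resolution above is a logical AND of a host opt-in and a client capability bit. A minimal standalone sketch follows; the bit position of `VIDEO_CAP_10BIT` and the `u32` type of the caps field are assumptions made for illustration, since the diff only shows the constant's name.

```rust
/// Assumed bit position; the real constant lives in `punktfunk_core::quic`.
const VIDEO_CAP_10BIT: u32 = 1 << 0;

/// 10-bit only when the host opted in AND the client advertised support; otherwise 8-bit.
fn resolve_bit_depth(host_wants_10bit: bool, client_video_caps: u32) -> u8 {
    let client_supports_10bit = (client_video_caps & VIDEO_CAP_10BIT) != 0;
    if host_wants_10bit && client_supports_10bit {
        10
    } else {
        8
    }
}
```

An older client that sends no capability bits always resolves to 8, so the 10-bit path stays opt-in on both ends.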
||||||
@@ -590,6 +609,7 @@ async fn serve_session(
|
|||||||
.unwrap_or(CompositorPref::Auto),
|
.unwrap_or(CompositorPref::Auto),
|
||||||
gamepad,
|
gamepad,
|
||||||
bitrate_kbps,
|
bitrate_kbps,
|
||||||
|
bit_depth,
|
||||||
};
|
};
|
||||||
io::write_msg(&mut send, &welcome.encode()).await?;
|
io::write_msg(&mut send, &welcome.encode()).await?;
|
||||||
|
|
||||||
@@ -807,6 +827,7 @@ async fn serve_session(
|
|||||||
let (seconds, frames) = (opts.seconds, opts.frames);
|
let (seconds, frames) = (opts.seconds, opts.frames);
|
||||||
let mode = hello.mode;
|
let mode = hello.mode;
|
||||||
let bitrate_kbps = welcome.bitrate_kbps; // resolved encoder bitrate (Hello clamped, or default)
|
let bitrate_kbps = welcome.bitrate_kbps; // resolved encoder bitrate (Hello clamped, or default)
|
||||||
|
let bit_depth = welcome.bit_depth; // resolved encode bit depth (8, or 10 when negotiated)
|
||||||
let stop_stream = stop.clone();
|
let stop_stream = stop.clone();
|
||||||
let result: Result<()> = async {
|
let result: Result<()> = async {
|
||||||
tokio::task::spawn_blocking(move || -> Result<()> {
|
tokio::task::spawn_blocking(move || -> Result<()> {
|
||||||
@@ -849,6 +870,7 @@ async fn serve_session(
|
|||||||
&keyframe_rx,
|
&keyframe_rx,
|
||||||
compositor,
|
compositor,
|
||||||
bitrate_kbps,
|
bitrate_kbps,
|
||||||
|
bit_depth,
|
||||||
probe_rx,
|
probe_rx,
|
||||||
probe_result_tx,
|
probe_result_tx,
|
||||||
)
|
)
|
||||||
@@ -1942,6 +1964,7 @@ fn virtual_stream(
|
|||||||
keyframe: &std::sync::mpsc::Receiver<()>,
|
keyframe: &std::sync::mpsc::Receiver<()>,
|
||||||
compositor: crate::vdisplay::Compositor,
|
compositor: crate::vdisplay::Compositor,
|
||||||
bitrate_kbps: u32,
|
bitrate_kbps: u32,
|
||||||
|
bit_depth: u8,
|
||||||
probe_rx: std::sync::mpsc::Receiver<ProbeRequest>,
|
probe_rx: std::sync::mpsc::Receiver<ProbeRequest>,
|
||||||
probe_result_tx: tokio::sync::mpsc::UnboundedSender<ProbeResult>,
|
probe_result_tx: tokio::sync::mpsc::UnboundedSender<ProbeResult>,
|
||||||
) -> Result<()> {
|
) -> Result<()> {
|
||||||
@@ -1949,11 +1972,12 @@ fn virtual_stream(
|
|||||||
compositor = compositor.id(),
|
compositor = compositor.id(),
|
||||||
?mode,
|
?mode,
|
||||||
bitrate_kbps,
|
bitrate_kbps,
|
||||||
|
bit_depth,
|
||||||
"punktfunk/1 virtual display"
|
"punktfunk/1 virtual display"
|
||||||
);
|
);
|
||||||
let mut vd = crate::vdisplay::open(compositor)?;
|
let mut vd = crate::vdisplay::open(compositor)?;
|
||||||
let (mut capturer, mut enc, mut frame, mut interval) =
|
let (mut capturer, mut enc, mut frame, mut interval) =
|
||||||
build_pipeline_with_retry(&mut vd, mode, bitrate_kbps)?;
|
build_pipeline_with_retry(&mut vd, mode, bitrate_kbps, bit_depth)?;
|
||||||
|
|
||||||
let perf = std::env::var("PUNKTFUNK_PERF").is_ok();
|
let perf = std::env::var("PUNKTFUNK_PERF").is_ok();
|
||||||
// Microburst cap (applied in send_loop/paced_submit): a frame ≤ this bursts out immediately;
|
// Microburst cap (applied in send_loop/paced_submit): a frame ≤ this bursts out immediately;
|
||||||
@@ -2041,7 +2065,8 @@ fn virtual_stream(
|
|||||||
let rebuilt =
|
let rebuilt =
|
||||||
(|| -> Result<(Box<dyn crate::vdisplay::VirtualDisplay>, Pipeline)> {
|
(|| -> Result<(Box<dyn crate::vdisplay::VirtualDisplay>, Pipeline)> {
|
||||||
let mut new_vd = crate::vdisplay::open(sw.compositor)?;
|
let mut new_vd = crate::vdisplay::open(sw.compositor)?;
|
||||||
let pipe = build_pipeline_with_retry(&mut new_vd, mode, bitrate_kbps)?;
|
let pipe =
|
||||||
|
build_pipeline_with_retry(&mut new_vd, mode, bitrate_kbps, bit_depth)?;
|
||||||
Ok((new_vd, pipe))
|
Ok((new_vd, pipe))
|
||||||
})();
|
})();
|
||||||
match rebuilt {
|
match rebuilt {
|
||||||
@@ -2084,7 +2109,7 @@ fn virtual_stream(
|
|||||||
// Build the new pipeline BEFORE dropping the old one: the host already acked
|
// Build the new pipeline BEFORE dropping the old one: the host already acked
|
||||||
// the switch as accepted, so a rebuild failure must not kill an otherwise
|
// the switch as accepted, so a rebuild failure must not kill an otherwise
|
||||||
// healthy session — keep streaming the current mode and log instead.
|
// healthy session — keep streaming the current mode and log instead.
|
||||||
match build_pipeline(&mut vd, new_mode, bitrate_kbps) {
|
match build_pipeline(&mut vd, new_mode, bitrate_kbps, bit_depth) {
|
||||||
Ok(next_pipe) => {
|
Ok(next_pipe) => {
|
||||||
(capturer, enc, frame, interval) = next_pipe;
|
(capturer, enc, frame, interval) = next_pipe;
|
||||||
next = std::time::Instant::now();
|
next = std::time::Instant::now();
|
||||||
@@ -2176,11 +2201,12 @@ fn build_pipeline_with_retry(
|
|||||||
vd: &mut Box<dyn crate::vdisplay::VirtualDisplay>,
|
vd: &mut Box<dyn crate::vdisplay::VirtualDisplay>,
|
||||||
mode: punktfunk_core::Mode,
|
mode: punktfunk_core::Mode,
|
||||||
bitrate_kbps: u32,
|
bitrate_kbps: u32,
|
||||||
|
bit_depth: u8,
|
||||||
) -> Result<Pipeline> {
|
) -> Result<Pipeline> {
|
||||||
const MAX_ATTEMPTS: u32 = 4;
|
const MAX_ATTEMPTS: u32 = 4;
|
||||||
let mut backoff = std::time::Duration::from_millis(500);
|
let mut backoff = std::time::Duration::from_millis(500);
|
||||||
for attempt in 1..=MAX_ATTEMPTS {
|
for attempt in 1..=MAX_ATTEMPTS {
|
||||||
match build_pipeline(vd, mode, bitrate_kbps) {
|
match build_pipeline(vd, mode, bitrate_kbps, bit_depth) {
|
||||||
Ok(pipe) => {
|
Ok(pipe) => {
|
||||||
if attempt > 1 {
|
if attempt > 1 {
|
||||||
tracing::info!(attempt, "pipeline up after retry");
|
tracing::info!(attempt, "pipeline up after retry");
|
||||||
@@ -2238,6 +2264,7 @@ fn build_pipeline(
|
|||||||
vd: &mut Box<dyn crate::vdisplay::VirtualDisplay>,
|
vd: &mut Box<dyn crate::vdisplay::VirtualDisplay>,
|
||||||
mode: punktfunk_core::Mode,
|
mode: punktfunk_core::Mode,
|
||||||
bitrate_kbps: u32,
|
bitrate_kbps: u32,
|
||||||
|
bit_depth: u8,
|
||||||
) -> Result<Pipeline> {
|
) -> Result<Pipeline> {
|
||||||
let vout = vd.create(mode).context("create virtual output")?;
|
let vout = vd.create(mode).context("create virtual output")?;
|
||||||
// The backend reports the refresh it actually achieved in `preferred_mode.2` (KWin may cap a
|
// The backend reports the refresh it actually achieved in `preferred_mode.2` (KWin may cap a
|
||||||
@@ -2260,6 +2287,8 @@ fn build_pipeline(
|
|||||||
crate::capture::capture_virtual_output(vout).context("capture virtual output")?;
|
crate::capture::capture_virtual_output(vout).context("capture virtual output")?;
|
||||||
capturer.set_active(true);
|
capturer.set_active(true);
|
||||||
let frame = capturer.next_frame().context("first frame")?;
|
let frame = capturer.next_frame().context("first frame")?;
|
||||||
|
// `bit_depth` is the handshake-negotiated value (8, or 10 = HEVC Main10 when the client
|
||||||
|
// advertised VIDEO_CAP_10BIT and the host opted in). Threaded down from the Welcome.
|
||||||
let enc = crate::encode::open_video(
|
let enc = crate::encode::open_video(
|
||||||
crate::encode::Codec::H265,
|
crate::encode::Codec::H265,
|
||||||
frame.format,
|
frame.format,
|
||||||
@@ -2268,6 +2297,7 @@ fn build_pipeline(
|
|||||||
effective_hz,
|
effective_hz,
|
||||||
bitrate_kbps as u64 * 1000,
|
bitrate_kbps as u64 * 1000,
|
||||||
frame.is_cuda(),
|
frame.is_cuda(),
|
||||||
|
bit_depth,
|
||||||
)
|
)
|
||||||
.context("open NVENC")?;
|
.context("open NVENC")?;
|
||||||
let interval = std::time::Duration::from_secs_f64(1.0 / effective_hz.max(1) as f64);
|
let interval = std::time::Duration::from_secs_f64(1.0 / effective_hz.max(1) as f64);
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
//! Windows virtual-display backend driving **SudoVDA** (the SudoMaker Virtual Display Adapter —
|
//! Windows virtual-display backend driving **SudoVDA** (the SudoMaker Virtual Display Adapter —
|
||||||
//! the Indirect Display Driver the Apollo Sunshine-fork ships). The Windows analogue of the
|
//! the Indirect Display Driver the Apollo Sunshine-fork ships). The Windows analogue of the
|
||||||
//! Linux per-compositor backends: [`create`](VirtualDisplay::create) adds a virtual monitor at the
|
//! Linux per-compositor backends: [`create`](VirtualDisplay::create) adds a virtual monitor at the
|
||||||
//! client's exact `WxH@Hz` (the mode is baked into the ADD IOCTL — no EDID seeding), starts the
|
//! client's exact `WxH@Hz` (the mode is baked into the ADD IOCTL — no EDID seeding), starts the
|
||||||
@@ -161,7 +161,11 @@ fn set_active_mode(gdi_name: &str, mode: Mode) {
|
|||||||
..Default::default()
|
..Default::default()
|
||||||
};
|
};
|
||||||
let ok = unsafe {
|
let ok = unsafe {
|
||||||
EnumDisplaySettingsW(PCWSTR(wname.as_ptr()), ENUM_DISPLAY_SETTINGS_MODE(i), &mut dm)
|
EnumDisplaySettingsW(
|
||||||
|
PCWSTR(wname.as_ptr()),
|
||||||
|
ENUM_DISPLAY_SETTINGS_MODE(i),
|
||||||
|
&mut dm,
|
||||||
|
)
|
||||||
}
|
}
|
||||||
.as_bool();
|
.as_bool();
|
||||||
if !ok {
|
if !ok {
|
||||||
@@ -175,7 +179,12 @@ fn set_active_mode(gdi_name: &str, mode: Mode) {
|
|||||||
}
|
}
|
||||||
let chosen_hz = if at_res.contains(&mode.refresh_hz) {
|
let chosen_hz = if at_res.contains(&mode.refresh_hz) {
|
||||||
mode.refresh_hz
|
mode.refresh_hz
|
||||||
} else if let Some(hz) = at_res.iter().copied().filter(|&hz| hz <= mode.refresh_hz).max() {
|
} else if let Some(hz) = at_res
|
||||||
|
.iter()
|
||||||
|
.copied()
|
||||||
|
.filter(|&hz| hz <= mode.refresh_hz)
|
||||||
|
.max()
|
||||||
|
{
|
||||||
hz
|
hz
|
||||||
} else if let Some(hz) = at_res.iter().copied().max() {
|
} else if let Some(hz) = at_res.iter().copied().max() {
|
||||||
hz
|
hz
|
||||||
@@ -212,8 +221,9 @@ fn set_active_mode(gdi_name: &str, mode: Mode) {
|
|||||||
dmDisplayFrequency: chosen_hz,
|
dmDisplayFrequency: chosen_hz,
|
||||||
..Default::default()
|
..Default::default()
|
||||||
};
|
};
|
||||||
let test =
|
let test = unsafe {
|
||||||
unsafe { ChangeDisplaySettingsExW(PCWSTR(wname.as_ptr()), Some(&dm), None, CDS_TEST, None) };
|
ChangeDisplaySettingsExW(PCWSTR(wname.as_ptr()), Some(&dm), None, CDS_TEST, None)
|
||||||
|
};
|
||||||
if test != DISP_CHANGE_SUCCESSFUL {
|
if test != DISP_CHANGE_SUCCESSFUL {
|
||||||
tracing::warn!(
|
tracing::warn!(
|
||||||
result = test.0,
|
result = test.0,
|
||||||
|
|||||||
@@ -36,7 +36,8 @@ pub fn drm_fourcc(format: crate::capture::PixelFormat) -> Option<u32> {
|
|||||||
Rgbx => fourcc(b"XB24"), // DRM_FORMAT_XBGR8888
|
Rgbx => fourcc(b"XB24"), // DRM_FORMAT_XBGR8888
|
||||||
Rgba => fourcc(b"AB24"), // DRM_FORMAT_ABGR8888
|
Rgba => fourcc(b"AB24"), // DRM_FORMAT_ABGR8888
|
||||||
// 24-bit packed RGB/BGR have no straightforward dmabuf import here; use the CPU path.
|
// 24-bit packed RGB/BGR have no straightforward dmabuf import here; use the CPU path.
|
||||||
Rgb | Bgr => return None,
|
// Rgb10a2 is the Windows HDR capture format — never produced by the Linux capturer.
|
||||||
|
Rgb | Bgr | Rgb10a2 => return None,
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -146,6 +146,16 @@
|
|||||||
// `shard_payload` so `HEADER_LEN + shard_payload + CRYPTO_OVERHEAD ≤ MAX_DATAGRAM_BYTES`.
|
// `shard_payload` so `HEADER_LEN + shard_payload + CRYPTO_OVERHEAD ≤ MAX_DATAGRAM_BYTES`.
|
||||||
#define MAX_DATAGRAM_BYTES 2048
|
#define MAX_DATAGRAM_BYTES 2048
|
||||||
|
|
||||||
|
#if defined(PUNKTFUNK_FEATURE_QUIC)
|
||||||
|
// [`Hello::video_caps`] bit: the client can decode a 10-bit (Main10) HEVC stream.
|
||||||
|
#define VIDEO_CAP_10BIT 1
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#if defined(PUNKTFUNK_FEATURE_QUIC)
|
||||||
|
// [`Hello::video_caps`] bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
|
||||||
|
#define VIDEO_CAP_HDR 2
|
||||||
|
#endif
|
||||||
|
|
||||||
#if defined(PUNKTFUNK_FEATURE_QUIC)
|
#if defined(PUNKTFUNK_FEATURE_QUIC)
|
||||||
// Longest device name carried in a [`Hello`] (bytes of UTF-8; longer names are truncated on
|
// Longest device name carried in a [`Hello`] (bytes of UTF-8; longer names are truncated on
|
||||||
// encode, rejected on decode — a one-byte length prefix caps it at 255 anyway).
|
// encode, rejected on decode — a one-byte length prefix caps it at 255 anyway).
|
||||||
|
|||||||
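The comments in this diff describe `bit_depth` as the handshake-negotiated value: 8, or 10 (HEVC Main10) when the client advertised `VIDEO_CAP_10BIT` and the host opted in, with `VIDEO_CAP_HDR` implying 10-bit. A minimal sketch of that decision follows. The helper name `negotiate_bit_depth`, the `host_allows_10bit` flag, and the `u8` width of the caps field are assumptions for illustration only; the actual negotiation code is not part of this excerpt.

```rust
// Capability bits, mirroring the values in the header diff above.
const VIDEO_CAP_10BIT: u8 = 1;
const VIDEO_CAP_HDR: u8 = 2;

/// Hypothetical helper (not from the diff): choose the encoder bit depth
/// from the client's advertised caps and a host-side opt-in flag.
fn negotiate_bit_depth(client_caps: u8, host_allows_10bit: bool) -> u8 {
    // HDR implies 10-bit, so either bit counts as "client can decode Main10".
    let client_10bit = (client_caps & (VIDEO_CAP_10BIT | VIDEO_CAP_HDR)) != 0;
    if client_10bit && host_allows_10bit {
        10
    } else {
        // Fall back to 8-bit whenever either side does not opt in.
        8
    }
}

fn main() {
    for caps in [0u8, VIDEO_CAP_10BIT, VIDEO_CAP_HDR] {
        for host in [false, true] {
            println!(
                "caps={caps:#04b} host_opt_in={host} -> bit_depth={}",
                negotiate_bit_depth(caps, host)
            );
        }
    }
}
```

The resulting value is what the diff threads through `build_pipeline_with_retry` and `build_pipeline` into `crate::encode::open_video`.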