feat(hdr): Windows HDR10 + 10-bit end-to-end, negotiated; non-blocking capture recovery

Adds true HDR (BT.2020 PQ) and 10-bit (HEVC Main10) streaming, negotiated so an
8-bit/SDR client is never sent a stream it can't decode, plus a robust fix for the
capture losing the stream across a secure-desktop transition.

Protocol (punktfunk-core/quic.rs):
- Hello gains `video_caps` (VIDEO_CAP_10BIT / VIDEO_CAP_HDR), Welcome gains `bit_depth`,
  both as optional trailing bytes (back-compat). client-rs advertises 10-bit via
  PUNKTFUNK_CLIENT_10BIT; the connector advertises 0 for now (in-band detection drives
  the native clients). Regenerated punktfunk_core.h.
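
A minimal sketch of the optional-trailing-byte discipline described above, assuming a simplified stand-in for `Hello` that keeps only the 26-byte fixed prefix plus `name`, `launch` and `video_caps`. `Tail`, `encode_tail` and `decode_video_caps` are hypothetical names for illustration, not the real punktfunk-core API; the sketch also assumes names shorter than 256 bytes and skips the truncation the real encoder performs.

```rust
// Hypothetical, simplified model of the Hello tail; not the real punktfunk-core types.
const FIXED_LEN: usize = 26; // size of the fixed prefix, as asserted in the commit's tests
pub const VIDEO_CAP_10BIT: u8 = 0x01;
pub const VIDEO_CAP_HDR: u8 = 0x02;

#[derive(Debug, PartialEq, Default)]
pub struct Tail {
    pub name: Option<String>,
    pub launch: Option<String>,
    pub video_caps: u8,
}

/// Append the optional fields. A later field forces zero-length placeholders
/// for the earlier ones so that it lands at a computable offset.
pub fn encode_tail(fixed: &[u8; FIXED_LEN], t: &Tail) -> Vec<u8> {
    let mut b = fixed.to_vec();
    let need_placeholders = t.video_caps != 0;
    if t.name.is_some() || t.launch.is_some() || need_placeholders {
        let n = t.name.as_deref().unwrap_or("");
        b.push(n.len() as u8); // assumption: n.len() < 256 (the real code truncates)
        b.extend_from_slice(n.as_bytes());
    }
    if t.launch.is_some() || need_placeholders {
        let l = t.launch.as_deref().unwrap_or("");
        b.push(l.len() as u8);
        b.extend_from_slice(l.as_bytes());
    }
    if t.video_caps != 0 {
        b.push(t.video_caps); // last field; omitted entirely when zero
    }
    b
}

/// Locate the caps byte by walking the two length-prefixed blocks.
/// Any missing byte decodes as 0, which is what an older client implies.
pub fn decode_video_caps(b: &[u8]) -> u8 {
    let name_len = b.get(FIXED_LEN).copied().unwrap_or(0) as usize;
    let launch_off = FIXED_LEN + 1 + name_len; // launch's length byte
    let launch_len = b.get(launch_off).copied().unwrap_or(0) as usize;
    b.get(launch_off + 1 + launch_len).copied().unwrap_or(0)
}
```

With this layout a message carrying no optional fields stays at 26 bytes, while a caps-only message grows by exactly three bytes (two zero-length placeholders plus the caps byte).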

Windows host:
- 10-bit Main10: host enables it only when the client advertised VIDEO_CAP_10BIT AND
  PUNKTFUNK_10BIT is set; threaded through open_video → NVENC (profile Main10,
  pixelBitDepthMinus8).
- HDR: when the captured desktop is scRGB FP16 (R16G16B16A16_FLOAT, HDR on), copy it to
  an FP16 surface, composite the cursor there, convert scRGB → BT.2020 PQ 10-bit
  (R10G10B10A2) via a shader, and encode HEVC Main10 with the BT.2020/PQ colour VUI
  (ABGR10 input). Fixes the freeze + cursor-trail that came from feeding FP16 into the
  BGRA path. Reacts dynamically to the HDR toggle.
- Capture recovery: rebuild is now a single NON-BLOCKING attempt, throttled to ~4×/s,
  repeating the last good frame between attempts (format-tagged last_present). During a
  secure-desktop dwell SudoVDA's output is gone; the old blocking 12 s retry starved the
  send loop for seconds so the client timed out and disconnected — now the session stays
  fed (frozen) until the desktop returns. Also seeds a black frame on recovery.
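
The throttled, non-blocking recovery in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not the code in this commit: `RebuildThrottle`, `Tick` and `on_capture_lost` are hypothetical names, and the 250 ms gap used in the usage note is one reading of "~4×/s".

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle gate: allows at most one rebuild attempt per `min_gap`.
pub struct RebuildThrottle {
    min_gap: Duration,
    last_attempt: Option<Instant>, // None = never attempted
}

impl RebuildThrottle {
    pub fn new(min_gap: Duration) -> Self {
        Self { min_gap, last_attempt: None }
    }

    /// Returns true, and records `now`, when an attempt is due.
    pub fn should_attempt(&mut self, now: Instant) -> bool {
        let due = match self.last_attempt {
            None => true,
            Some(t) => now.duration_since(t) >= self.min_gap,
        };
        if due {
            self.last_attempt = Some(now);
        }
        due
    }
}

/// What the capture loop does on a tick where the duplication is lost.
#[derive(Debug, PartialEq)]
pub enum Tick {
    /// The single rebuild attempt succeeded; real frames can resume.
    Recovered,
    /// Either throttled or the attempt failed: repeat the last good frame.
    RepeatLast,
}

/// One tick of recovery. Never sleeps and never loops, so the send loop
/// keeps being fed even when `try_rebuild` fails for a long time.
pub fn on_capture_lost(
    throttle: &mut RebuildThrottle,
    now: Instant,
    try_rebuild: impl FnOnce() -> Result<(), String>,
) -> Tick {
    if throttle.should_attempt(now) && try_rebuild().is_ok() {
        Tick::Recovered
    } else {
        Tick::RepeatLast
    }
}
```

Constructed with `RebuildThrottle::new(Duration::from_millis(250))`, the gate lets through about four attempts per second; every other tick returns `Tick::RepeatLast` immediately, which is the property that keeps the client from timing out.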

Apple client (PunktfunkKit):
- Detects HDR in-band from the stream VUI (PQ transfer function), decodes to 10-bit P010,
  and presents via an rgba16Float + BT.2020 PQ CAMetalLayer with EDR; SDR path unchanged.
  Switches automatically on a mid-session HDR toggle.
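
As a rough illustration of the in-band detection (written in Rust for consistency with the rest of the repository, although the Apple client itself is Swift): the HEVC VUI carries a `transfer_characteristics` code point, and ITU-T H.273 assigns 16 to SMPTE ST 2084 (PQ) and 1 to BT.709. `PresentMode`, `present_mode` and `needs_reconfigure` are hypothetical names, and treating every non-PQ value as SDR is an assumption of this sketch.

```rust
/// Transfer-characteristics code points from ITU-T H.273, as carried in the HEVC VUI.
pub const TRANSFER_BT709: u8 = 1;
pub const TRANSFER_PQ: u8 = 16; // SMPTE ST 2084

/// Hypothetical presentation modes for the client.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PresentMode {
    /// 8-bit BT.709 path (unchanged by this commit).
    Sdr8,
    /// 10-bit decode, BT.2020 PQ presentation.
    Hdr10Pq,
}

/// Pick the presentation mode from the parsed VUI value.
/// `None` means the stream carried no colour description; assumed SDR here.
pub fn present_mode(transfer_characteristics: Option<u8>) -> PresentMode {
    match transfer_characteristics {
        Some(TRANSFER_PQ) => PresentMode::Hdr10Pq,
        _ => PresentMode::Sdr8,
    }
}

/// A mid-session HDR toggle shows up as a change in the detected mode,
/// which is the cue to rebuild the decoder output and the layer.
pub fn needs_reconfigure(current: PresentMode, detected: PresentMode) -> bool {
    current != detected
}
```

Because the decision depends only on the bitstream, the connector can keep advertising `video_caps: 0` while the native client still follows whatever the host actually sends.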

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 20:28:52 +00:00
parent f5eae24c87
commit bbabc04bca
19 changed files with 785 additions and 129 deletions
+8
@@ -380,6 +380,14 @@ async fn session(args: Args) -> Result<()> {
name: Some(args.name.clone()),
// `--launch ID` — host resolves it against its own library and runs it this session.
launch: args.launch.clone(),
// This headless tool just dumps the bitstream (no decode), so it can always claim
// 10-bit support. Gated by env so latency runs stay on the 8-bit baseline:
// PUNKTFUNK_CLIENT_10BIT=1 advertises VIDEO_CAP_10BIT to exercise the host Main10 path.
video_caps: if std::env::var_os("PUNKTFUNK_CLIENT_10BIT").is_some() {
punktfunk_core::quic::VIDEO_CAP_10BIT
} else {
0
},
}
.encode(),
)
+4
@@ -614,6 +614,10 @@ async fn worker_main(args: WorkerArgs) {
name: None,
// Library id to launch this session, if the embedder asked for one.
launch: launch.clone(),
// TODO(hdr): advertise the embedder's real decode caps once the ABI carries them
// and the Apple/Linux clients decode 10-bit. 0 = 8-bit only — the host then never
// upgrades this connector's session to a stream it can't yet present.
video_caps: 0,
}
.encode(),
)
+56 -6
@@ -70,8 +70,21 @@ pub struct Hello {
/// `name` is absent, a zero-length name placeholder precedes it so the offset stays
/// deterministic. Omitted by older clients (decodes to `None`).
pub launch: Option<String>,
/// Client video capabilities the host may use to upgrade the stream — a bitfield of
/// [`VIDEO_CAP_10BIT`] (the client can decode 10-bit Main10 HEVC) and [`VIDEO_CAP_HDR`]
/// (the client can present BT.2020 PQ HDR10). The host enables a 10-bit / HDR encode ONLY
/// when the matching bit is set, so an older client (decodes to `0`) always gets the 8-bit
/// BT.709 stream it understands. Appended after `launch` as a single trailing byte; a
/// zero-length name/launch placeholder precedes it when those are absent so the offset stays
/// deterministic. Omitted by older clients (decodes to `0`).
pub video_caps: u8,
}
/// [`Hello::video_caps`] bit: the client can decode a 10-bit (Main10) HEVC stream.
pub const VIDEO_CAP_10BIT: u8 = 0x01;
/// [`Hello::video_caps`] bit: the client can present BT.2020 PQ HDR10 (implies 10-bit).
pub const VIDEO_CAP_HDR: u8 = 0x02;
/// Longest device name carried in a [`Hello`] (bytes of UTF-8; longer names are truncated on
/// encode, rejected on decode — a one-byte length prefix caps it at 255 anyway).
pub const HELLO_NAME_MAX: usize = 64;
@@ -108,6 +121,12 @@ pub struct Welcome {
/// default when the client requested `0`). Appended to the wire form — `0` when an older host
/// omitted it (i.e. "unknown").
pub bitrate_kbps: u32,
/// The luma/chroma bit depth the host actually encodes at — `8` (default / older host) or
/// `10` (Main10, enabled only when the client advertised [`VIDEO_CAP_10BIT`]). The client
/// configures its decoder for 10-bit (P010) when this is `10`. Appended to the wire form as a
/// single trailing byte; `8` when an older host omitted it. (Color space stays BT.709 in
/// Phase 1; BT.2020 PQ HDR signaling is added alongside HDR support.)
pub bit_depth: u8,
}
/// `client → host`: data plane is bound, begin streaming.
@@ -513,20 +532,28 @@ impl Hello {
// so a Hello with neither name nor launch stays byte-identical to the bitrate-era form
// (26 bytes). When `launch` is present we must still emit name's length byte (0 for None)
// so `launch` lands at a deterministic offset.
// `video_caps` is the last trailing field, after `launch`; when it's present (non-zero)
// the name/launch length bytes must still be emitted (0 for absent) so it lands at a
// deterministic offset — the same discipline `launch` already imposes on `name`.
let need_placeholders = self.video_caps != 0;
match (&self.name, &self.launch) {
(None, None) => {}
(None, None) if !need_placeholders => {}
(name, _) => {
let n = truncate_to(name.as_deref().unwrap_or(""), HELLO_NAME_MAX);
b.push(n.len() as u8);
b.extend_from_slice(n.as_bytes());
}
}
// launch after name: len u8 || UTF-8. Last trailing field.
if let Some(launch) = &self.launch {
let l = truncate_to(launch, HELLO_LAUNCH_MAX);
// launch after name: len u8 || UTF-8.
if self.launch.is_some() || need_placeholders {
let l = truncate_to(self.launch.as_deref().unwrap_or(""), HELLO_LAUNCH_MAX);
b.push(l.len() as u8);
b.extend_from_slice(l.as_bytes());
}
// video_caps: single trailing byte. Last field.
if self.video_caps != 0 {
b.push(self.video_caps);
}
b
}
@@ -580,6 +607,15 @@ impl Hello {
.and_then(|s| std::str::from_utf8(s).ok())
.map(String::from)
}),
// Optional trailing video-caps byte, positioned right after launch's `len u8 || bytes`
// block. Uses the raw (possibly zero/placeholder) name/launch length bytes to locate it,
// so it's robust to absent name/launch; absent entirely on an older client → `0`.
video_caps: {
let name_len = b.get(26).copied().unwrap_or(0) as usize;
let launch_off = 27 + name_len; // launch's length byte
let launch_len = b.get(launch_off).copied().unwrap_or(0) as usize;
b.get(launch_off + 1 + launch_len).copied().unwrap_or(0)
},
})
}
}
@@ -607,6 +643,7 @@ impl Welcome {
b.push(self.compositor.to_u8()); // appended at offset 53 — older clients read [0..53] and skip it
b.push(self.gamepad.to_u8()); // appended at offset 54 — same back-compat discipline
b.extend_from_slice(&self.bitrate_kbps.to_le_bytes()); // appended at offset 55..59
b.push(self.bit_depth); // appended at offset 59 — older clients read [0..59] and skip it
b
}
@@ -614,7 +651,7 @@ impl Welcome {
// Layout (LE): magic[0..4] abi[4..8] port[8..10] w[10..14] h[14..18] hz[18..22]
// scheme[22] pct[23] max_data[24..26] shard[26..28] encrypt[28] key[29..45]
// salt[45..49] frames[49..53] compositor[53] gamepad[54] bitrate_kbps[55..59]
// (compositor/gamepad/bitrate are optional trailing bytes).
// bit_depth[59] (compositor/gamepad/bitrate/bit_depth are optional trailing bytes).
if b.len() < 53 || &b[0..4] != MAGIC {
return Err(PunktfunkError::InvalidArg("bad Welcome"));
}
@@ -661,6 +698,9 @@ impl Welcome {
.get(55..59)
.map(|s| u32::from_le_bytes(s.try_into().unwrap()))
.unwrap_or(0),
// Optional trailing byte — absent on an older host → `8` (8-bit, the only depth they
// encode).
bit_depth: b.get(59).copied().unwrap_or(8),
})
}
@@ -1518,6 +1558,7 @@ mod tests {
compositor: CompositorPref::Gamescope,
gamepad: GamepadPref::DualSense,
bitrate_kbps: 50_000,
bit_depth: 10,
};
assert_eq!(Welcome::decode(&w.encode()).unwrap(), w);
}
@@ -1536,6 +1577,7 @@ mod tests {
bitrate_kbps: 25_000,
name: Some("Test Device".into()),
launch: Some("steam:570".into()),
video_caps: VIDEO_CAP_10BIT,
};
assert_eq!(Hello::decode(&h.encode()).unwrap(), h);
let s = Start {
@@ -1602,6 +1644,7 @@ mod tests {
bitrate_kbps: 80_000,
name: None,
launch: None,
video_caps: 0,
};
let enc = h.encode();
assert_eq!(enc.len(), 26);
@@ -1639,9 +1682,10 @@ mod tests {
compositor: CompositorPref::Kwin,
gamepad: GamepadPref::Xbox360,
bitrate_kbps: 120_000,
bit_depth: 10,
};
let wenc = w.encode();
assert_eq!(wenc.len(), 59);
assert_eq!(wenc.len(), 60);
let legacy_w = Welcome::decode(&wenc[..53]).unwrap();
assert_eq!(legacy_w.compositor, CompositorPref::Auto);
assert_eq!(legacy_w.gamepad, GamepadPref::Auto);
@@ -1655,7 +1699,10 @@ mod tests {
let pre_bitrate_w = Welcome::decode(&wenc[..55]).unwrap();
assert_eq!(pre_bitrate_w.gamepad, GamepadPref::Xbox360);
assert_eq!(pre_bitrate_w.bitrate_kbps, 0);
assert_eq!(pre_bitrate_w.bit_depth, 8); // older host (no trailing byte) → 8-bit assumed
assert_eq!(legacy_w.bit_depth, 8);
assert_eq!(Welcome::decode(&wenc).unwrap().bitrate_kbps, 120_000);
assert_eq!(Welcome::decode(&wenc).unwrap().bit_depth, 10); // full form carries it
}
#[test]
@@ -1672,6 +1719,7 @@ mod tests {
bitrate_kbps: 0,
name: Some("Enrico's MacBook".into()),
launch: None,
video_caps: 0,
};
let enc = base.encode();
assert_eq!(
@@ -1718,6 +1766,7 @@ mod tests {
bitrate_kbps: 0,
name: None,
launch: None,
video_caps: 0,
};
// launch alone (no name): a zero-length name placeholder keeps the offset deterministic.
let with_launch = Hello {
@@ -1882,6 +1931,7 @@ mod tests {
bitrate_kbps: 0,
name: None,
launch: None,
video_caps: 0,
}
.encode();
assert!(PairRequest::decode(&h).is_err(), "abi {abi} parsed as pair");
+3 -1
@@ -164,7 +164,9 @@ mod uso {
/// Latch USO off for the process after a send that means it isn't usable on this OS/NIC/path.
pub fn disable() {
if STATE.swap(2, Ordering::Relaxed) != 2 {
tracing::warn!("Windows USO unsupported on this path — falling back to per-packet sends");
tracing::warn!(
"Windows USO unsupported on this path — falling back to per-packet sends"
);
}
}
}
+4
@@ -22,6 +22,10 @@ pub enum PixelFormat {
Rgb,
/// `[B,G,R]`, 3 bpp.
Bgr,
/// 10-bit RGB packed as `R10G10B10A2` (DXGI `R10G10B10A2_UNORM`), 4 bpp. The HDR capture path
/// produces this: scRGB FP16 desktop pixels are converted to BT.2020 PQ and written here, then
/// handed to NVENC as `ABGR10` for an HEVC Main10 / HDR10 encode.
Rgb10a2,
}
impl PixelFormat {
+431 -80
@@ -16,24 +16,26 @@ use windows::core::{s, Interface, PCSTR};
use windows::Win32::Foundation::{HMODULE, LUID};
use windows::Win32::Graphics::Direct3D::Fxc::D3DCompile;
use windows::Win32::Graphics::Direct3D::{
ID3DBlob, D3D_DRIVER_TYPE_UNKNOWN, D3D_FEATURE_LEVEL_11_0,
ID3DBlob, D3D_DRIVER_TYPE_UNKNOWN, D3D_FEATURE_LEVEL_11_0, D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST,
D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP,
};
use windows::Win32::Graphics::Direct3D11::{
D3D11CreateDevice, ID3D11BlendState, ID3D11Buffer, ID3D11Device, ID3D11DeviceContext,
ID3D11PixelShader, ID3D11RenderTargetView, ID3D11SamplerState, ID3D11ShaderResourceView,
ID3D11Texture2D, ID3D11VertexShader, D3D11_BIND_CONSTANT_BUFFER, D3D11_BIND_FLAG,
D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_BLEND_DESC, D3D11_BLEND_INV_DEST_COLOR,
D3D11_BLEND_INV_SRC_ALPHA, D3D11_BLEND_ONE, D3D11_BLEND_OP_ADD, D3D11_BLEND_SRC_ALPHA,
D3D11_BUFFER_DESC,
D3D11_COLOR_WRITE_ENABLE_ALL, D3D11_COMPARISON_NEVER, D3D11_CPU_ACCESS_READ,
D3D11_CPU_ACCESS_WRITE, D3D11_CREATE_DEVICE_BGRA_SUPPORT, D3D11_FILTER_MIN_MAG_MIP_POINT,
D3D11_MAPPED_SUBRESOURCE, D3D11_MAP_READ, D3D11_MAP_WRITE_DISCARD, D3D11_RENDER_TARGET_BLEND_DESC,
D3D11_SAMPLER_DESC, D3D11_SDK_VERSION, D3D11_SUBRESOURCE_DATA, D3D11_TEXTURE2D_DESC,
D3D11_TEXTURE_ADDRESS_CLAMP, D3D11_USAGE_DEFAULT, D3D11_USAGE_DYNAMIC, D3D11_USAGE_STAGING,
D3D11_VIEWPORT,
D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_BLEND_DESC,
D3D11_BLEND_INV_DEST_COLOR, D3D11_BLEND_INV_SRC_ALPHA, D3D11_BLEND_ONE, D3D11_BLEND_OP_ADD,
D3D11_BLEND_SRC_ALPHA, D3D11_BUFFER_DESC, D3D11_COLOR_WRITE_ENABLE_ALL, D3D11_COMPARISON_NEVER,
D3D11_CPU_ACCESS_READ, D3D11_CPU_ACCESS_WRITE, D3D11_CREATE_DEVICE_BGRA_SUPPORT,
D3D11_FILTER_MIN_MAG_MIP_POINT, D3D11_MAPPED_SUBRESOURCE, D3D11_MAP_READ,
D3D11_MAP_WRITE_DISCARD, D3D11_RENDER_TARGET_BLEND_DESC, D3D11_SAMPLER_DESC, D3D11_SDK_VERSION,
D3D11_SUBRESOURCE_DATA, D3D11_TEXTURE2D_DESC, D3D11_TEXTURE_ADDRESS_CLAMP, D3D11_USAGE_DEFAULT,
D3D11_USAGE_DYNAMIC, D3D11_USAGE_STAGING, D3D11_VIEWPORT,
};
use windows::Win32::Graphics::Dxgi::Common::{
DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_R10G10B10A2_UNORM, DXGI_FORMAT_R16G16B16A16_FLOAT,
DXGI_SAMPLE_DESC,
};
use windows::Win32::Graphics::Dxgi::Common::{DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_SAMPLE_DESC};
use windows::Win32::Graphics::Dxgi::{
CreateDXGIFactory1, IDXGIAdapter1, IDXGIFactory1, IDXGIOutput1, IDXGIOutputDuplication,
IDXGIResource, DXGI_ERROR_ACCESS_LOST, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET,
@@ -230,7 +232,8 @@ unsafe fn compile_shader(src: &str, entry: PCSTR, target: PCSTR) -> Result<Vec<u
.as_ref()
.map(|e| {
let p = e.GetBufferPointer() as *const u8;
String::from_utf8_lossy(std::slice::from_raw_parts(p, e.GetBufferSize())).to_string()
String::from_utf8_lossy(std::slice::from_raw_parts(p, e.GetBufferSize()))
.to_string()
})
.unwrap_or_default();
bail!("D3DCompile failed: {msg}");
@@ -326,7 +329,13 @@ impl CursorCompositor {
})
}
unsafe fn set_shape(&mut self, device: &ID3D11Device, bgra: &[u8], w: u32, h: u32) -> Result<()> {
unsafe fn set_shape(
&mut self,
device: &ID3D11Device,
bgra: &[u8],
w: u32,
h: u32,
) -> Result<()> {
let desc = D3D11_TEXTURE2D_DESC {
Width: w,
Height: h,
@@ -394,7 +403,11 @@ impl CursorCompositor {
};
ctx.RSSetViewports(Some(&[vp]));
ctx.OMSetRenderTargets(Some(&[Some(rtv.clone())]), None);
let blend = if invert { &self.blend_invert } else { &self.blend };
let blend = if invert {
&self.blend_invert
} else {
&self.blend
};
ctx.OMSetBlendState(blend, Some(&[0.0; 4]), 0xffff_ffff);
ctx.VSSetShader(&self.vs, None);
ctx.PSSetShader(&self.ps, None);
@@ -409,8 +422,122 @@ impl CursorCompositor {
}
}
/// Fullscreen-triangle vertex shader for the HDR conversion pass (3 verts, no input layout).
const HDR_VS: &str = r"
struct VOut { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };
VOut main(uint vid : SV_VertexID) {
float2 uv = float2((vid << 1) & 2, vid & 2);
VOut o;
o.pos = float4(uv * float2(2.0, -2.0) + float2(-1.0, 1.0), 0.0, 1.0);
o.uv = uv;
return o;
}
";
/// HDR conversion pixel shader: scRGB FP16 desktop (linear, Rec.709 primaries, 1.0 = 80 nits) →
/// BT.2020 primaries → SMPTE ST 2084 (PQ) → written to a 10-bit R10G10B10A2 target for NVENC
/// (HEVC Main10 / HDR10). This is the standard Windows-HDR capture conversion (matches OBS/Sunshine).
const HDR_PS: &str = r"
Texture2D<float4> tx : register(t0);
SamplerState sm : register(s0);
// Rec.709 → Rec.2020 primaries (linear). The initializer fills the matrix row by row, so each
// line below is one output row; used with mul(M, v), i.e. matrix times column vector.
static const float3x3 BT709_TO_BT2020 = {
0.627403914, 0.329283038, 0.043313048,
0.069097292, 0.919540405, 0.011362303,
0.016391439, 0.088013308, 0.895595253
};
float3 pq_oetf(float3 L) {
// L normalized so 1.0 = 10000 nits. ST 2084.
const float m1 = 0.1593017578125;
const float m2 = 78.84375;
const float c1 = 0.8359375;
const float c2 = 18.8515625;
const float c3 = 18.6875;
float3 Lp = pow(saturate(L), m1);
return pow((c1 + c2 * Lp) / (1.0 + c3 * Lp), m2);
}
float4 main(float4 pos : SV_POSITION, float2 uv : TEXCOORD0) : SV_TARGET {
float3 scrgb = max(tx.Sample(sm, uv).rgb, 0.0); // scRGB can be negative (wide gamut); clamp
float3 nits = scrgb * 80.0; // scRGB 1.0 = 80 nits → absolute luminance
float3 lin2020 = mul(BT709_TO_BT2020, nits); // primaries conversion (linear)
float3 pq = pq_oetf(lin2020 / 10000.0); // normalize to 10k nits, encode PQ
return float4(pq, 1.0);
}
";
/// scRGB FP16 → BT.2020 PQ 10-bit conversion pass. One per capture device (rebuilt on device
/// recreate, like [`CursorCompositor`]). A single fullscreen draw samples the FP16 source SRV and
/// writes PQ-encoded BT.2020 to the bound R10G10B10A2 render target.
struct HdrConverter {
vs: ID3D11VertexShader,
ps: ID3D11PixelShader,
sampler: ID3D11SamplerState,
}
impl HdrConverter {
unsafe fn new(device: &ID3D11Device) -> Result<Self> {
let vsb = compile_shader(HDR_VS, s!("main"), s!("vs_5_0"))?;
let psb = compile_shader(HDR_PS, s!("main"), s!("ps_5_0"))?;
let mut vs = None;
device.CreateVertexShader(&vsb, None, Some(&mut vs))?;
let mut ps = None;
device.CreatePixelShader(&psb, None, Some(&mut ps))?;
let sd = D3D11_SAMPLER_DESC {
Filter: D3D11_FILTER_MIN_MAG_MIP_POINT,
AddressU: D3D11_TEXTURE_ADDRESS_CLAMP,
AddressV: D3D11_TEXTURE_ADDRESS_CLAMP,
AddressW: D3D11_TEXTURE_ADDRESS_CLAMP,
ComparisonFunc: D3D11_COMPARISON_NEVER,
MaxLOD: f32::MAX,
..Default::default()
};
let mut sampler = None;
device.CreateSamplerState(&sd, Some(&mut sampler))?;
Ok(Self {
vs: vs.context("hdr vs")?,
ps: ps.context("hdr ps")?,
sampler: sampler.context("hdr sampler")?,
})
}
/// Convert `src_srv` (FP16 scRGB) into `dst_rtv` (R10G10B10A2 PQ BT.2020). Opaque pass, no blend.
unsafe fn convert(
&self,
ctx: &ID3D11DeviceContext,
src_srv: &ID3D11ShaderResourceView,
dst_rtv: &ID3D11RenderTargetView,
w: u32,
h: u32,
) {
let vp = D3D11_VIEWPORT {
TopLeftX: 0.0,
TopLeftY: 0.0,
Width: w as f32,
Height: h as f32,
MinDepth: 0.0,
MaxDepth: 1.0,
};
ctx.RSSetViewports(Some(&[vp]));
ctx.OMSetRenderTargets(Some(&[Some(dst_rtv.clone())]), None);
ctx.OMSetBlendState(None, None, 0xffff_ffff); // opaque overwrite
ctx.VSSetShader(&self.vs, None);
ctx.PSSetShader(&self.ps, None);
ctx.PSSetShaderResources(0, Some(&[Some(src_srv.clone())]));
ctx.PSSetSamplers(0, Some(&[Some(self.sampler.clone())]));
ctx.IASetInputLayout(None);
ctx.IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
ctx.Draw(3, 0);
// Unbind so the next frame can CopyResource into the source and re-RTV the destination.
ctx.OMSetRenderTargets(Some(&[None]), None);
ctx.PSSetShaderResources(0, Some(&[None]));
}
}
/// Convert a DXGI pointer shape (color / masked-color / monochrome) into top-down BGRA.
fn convert_pointer_shape(buf: &[u8], si: &DXGI_OUTDUPL_POINTER_SHAPE_INFO) -> Option<(Vec<u8>, u32, u32)> {
fn convert_pointer_shape(
buf: &[u8],
si: &DXGI_OUTDUPL_POINTER_SHAPE_INFO,
) -> Option<(Vec<u8>, u32, u32)> {
let w = si.Width as usize;
let pitch = si.Pitch as usize;
if w == 0 || pitch == 0 {
@@ -571,7 +698,34 @@ pub struct DuplCapturer {
/// Reused owned texture the duplication frame is copied into for the D3D11 path (the duplication
/// surface is transient and released each frame).
gpu_copy: Option<ID3D11Texture2D>,
have_gpu_frame: bool,
/// The most recently produced presentable GPU texture + its pixel format, repeated by
/// `next_frame` when AcquireNextFrame reports no change (static desktop) or during a rebuild.
/// Format-tagged because the SDR path presents BGRA `gpu_copy` while the HDR path presents the
/// 10-bit `hdr10_out` — the encoder needs the right format on every frame.
last_present: Option<(ID3D11Texture2D, PixelFormat)>,
/// HDR (scRGB FP16) capture state. Set when the duplication surface is `R16G16B16A16_FLOAT`
/// (the desktop has HDR on). The frame can't be `CopyResource`d into a BGRA target, so the HDR
/// path copies it into an FP16 SRV texture, composites the cursor, then runs [`HdrConverter`] to
/// produce a BT.2020 PQ 10-bit (`R10G10B10A2`) frame for NVENC. Toggling HDR fires ACCESS_LOST →
/// `recreate_dupl` re-detects the format, so this tracks the *current* duplication.
hdr_fp16: bool,
/// FP16 copy of the duplication surface (RT|SRV): the cursor composites onto it and the converter
/// samples it. Reallocated on device/size change.
fp16_src: Option<ID3D11Texture2D>,
fp16_srv: Option<ID3D11ShaderResourceView>,
/// 10-bit `R10G10B10A2` PQ output of the HDR conversion — the texture handed to NVENC.
hdr10_out: Option<ID3D11Texture2D>,
/// scRGB→PQ conversion pass; rebuilt on device recreate.
hdr_conv: Option<HdrConverter>,
/// Last time a duplication rebuild was attempted, to throttle retries during an outage (e.g. a
/// secure-desktop dwell where the output is gone) so we don't block the encode loop or hammer
/// DuplicateOutput — between attempts the last good frame is repeated. `None` = never attempted.
last_rebuild: Option<Instant>,
/// True once at least one real frame has been produced. After that, a frame drought (e.g. a long
/// secure-desktop dwell with nothing rendering to the virtual output) must never fatally end the
/// session — `next_frame` keeps repeating the last/seeded frame instead of erroring on its
/// deadline. The deadline stays fatal only *before* the first frame (a genuine startup misconfig).
ever_got_frame: bool,
/// GPU cursor overlay (rebuilt on device recreate). `None` until the first composite.
cursor: Option<CursorCompositor>,
/// Last cursor shape as BGRA (kept device-independent so it survives a device recreate).
@@ -721,7 +875,7 @@ impl DuplCapturer {
.map(|v| matches!(v.to_ascii_lowercase().as_str(), "nvenc" | "hw" | "nvidia"))
.unwrap_or(false);
tracing::info!(
"DXGI duplication: {}x{}@{} on {} ({})",
"DXGI duplication: {}x{}@{} on {} ({}) dxgi_format={} (87=BGRA8 24=R10G10B10A2 10=R16G16B16A16_FLOAT)",
width,
height,
refresh_hz,
@@ -730,7 +884,8 @@ impl DuplCapturer {
"D3D11 zero-copy"
} else {
"CPU staging"
}
},
dd.ModeDesc.Format.0,
);
Ok(Self {
device,
@@ -752,7 +907,14 @@ impl DuplCapturer {
last: None,
gpu_mode,
gpu_copy: None,
have_gpu_frame: false,
last_present: None,
hdr_fp16: dd.ModeDesc.Format == DXGI_FORMAT_R16G16B16A16_FLOAT,
fp16_src: None,
fp16_srv: None,
hdr10_out: None,
hdr_conv: None,
last_rebuild: None,
ever_got_frame: false,
cursor: None,
cursor_shape: None,
cursor_pos: (0, 0),
@@ -819,10 +981,104 @@ impl DuplCapturer {
Ok(())
}
/// FP16 (`R16G16B16A16_FLOAT`) copy of the HDR duplication surface (RT for the cursor composite +
/// SRV for the converter). Reallocated when absent (device/size change drops it).
unsafe fn ensure_fp16_src(&mut self) -> Result<()> {
if self.fp16_src.is_some() {
return Ok(());
}
let desc = D3D11_TEXTURE2D_DESC {
Width: self.width,
Height: self.height,
MipLevels: 1,
ArraySize: 1,
Format: DXGI_FORMAT_R16G16B16A16_FLOAT,
SampleDesc: DXGI_SAMPLE_DESC {
Count: 1,
Quality: 0,
},
Usage: D3D11_USAGE_DEFAULT,
BindFlags: (D3D11_BIND_RENDER_TARGET.0 | D3D11_BIND_SHADER_RESOURCE.0) as u32,
CPUAccessFlags: 0,
MiscFlags: 0,
};
let mut t: Option<ID3D11Texture2D> = None;
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D(fp16 src)")?;
let t = t.context("fp16 src tex")?;
let mut srv = None;
self.device
.CreateShaderResourceView(&t, None, Some(&mut srv))?;
self.fp16_srv = Some(srv.context("fp16 srv")?);
self.fp16_src = Some(t);
Ok(())
}
/// 10-bit `R10G10B10A2_UNORM` PQ output of the HDR conversion — the texture NVENC encodes.
unsafe fn ensure_hdr10_out(&mut self) -> Result<()> {
if self.hdr10_out.is_some() {
return Ok(());
}
let desc = D3D11_TEXTURE2D_DESC {
Width: self.width,
Height: self.height,
MipLevels: 1,
ArraySize: 1,
Format: DXGI_FORMAT_R10G10B10A2_UNORM,
SampleDesc: DXGI_SAMPLE_DESC {
Count: 1,
Quality: 0,
},
Usage: D3D11_USAGE_DEFAULT,
BindFlags: D3D11_BIND_RENDER_TARGET.0 as u32,
CPUAccessFlags: 0,
MiscFlags: 0,
};
let mut t: Option<ID3D11Texture2D> = None;
self.device
.CreateTexture2D(&desc, None, Some(&mut t))
.context("CreateTexture2D(hdr10 out)")?;
self.hdr10_out = t;
Ok(())
}
/// Allocate a presentable GPU texture on the *current* device, clear it to black, and record it
/// as `last_present`. Called after a desktop-switch recovery so `next_frame` always has a D3D11
/// frame to repeat even while the (secure) desktop renders nothing to the virtual output — this
/// is what keeps the session alive across a lock/login/UAC transition instead of dropping it. In
/// HDR mode it seeds the 10-bit output (black = PQ 0); otherwise the BGRA copy. One-shot: the next
/// real frame overwrites the texture in place.
unsafe fn seed_black_gpu_frame(&mut self) -> Result<()> {
if self.hdr_fp16 {
self.ensure_hdr10_out()?;
let out = self.hdr10_out.clone().context("hdr10 out texture")?;
let mut rtv: Option<ID3D11RenderTargetView> = None;
self.device
.CreateRenderTargetView(&out, None, Some(&mut rtv))?;
self.context
.ClearRenderTargetView(&rtv.context("null RTV (hdr seed)")?, &[0.0, 0.0, 0.0, 1.0]);
self.last_present = Some((out, PixelFormat::Rgb10a2));
} else {
self.ensure_gpu_copy()?;
let gpu = self.gpu_copy.clone().context("gpu copy texture")?;
let mut rtv: Option<ID3D11RenderTargetView> = None;
self.device
.CreateRenderTargetView(&gpu, None, Some(&mut rtv))?;
self.context
.ClearRenderTargetView(&rtv.context("null RTV (sdr seed)")?, &[0.0, 0.0, 0.0, 1.0]);
self.last_present = Some((gpu, PixelFormat::Bgra));
}
Ok(())
}
/// Pull cursor position/visibility/shape out of the frame info (the HW cursor is NOT in the frame).
unsafe fn update_cursor(&mut self, info: &DXGI_OUTDUPL_FRAME_INFO) {
if info.LastMouseUpdateTime != 0 {
self.cursor_pos = (info.PointerPosition.Position.x, info.PointerPosition.Position.y);
self.cursor_pos = (
info.PointerPosition.Position.x,
info.PointerPosition.Position.y,
);
self.cursor_visible = info.PointerPosition.Visible.as_bool();
}
if info.PointerShapeBufferSize > 0 {
@@ -856,12 +1112,21 @@ impl DuplCapturer {
/// Composite the cursor onto the GPU frame texture (zero-copy path).
unsafe fn composite_cursor_gpu(&mut self, gpu: &ID3D11Texture2D) -> Result<()> {
// Diagnostic kill-switch: skip the GPU cursor composite entirely (PUNKTFUNK_NO_CURSOR=1) to
// isolate its cost on the 3D engine. The per-frame render-target view + draw to the 5K target
// is the suspect for the high 3D usage under heavy desktop change.
if std::env::var_os("PUNKTFUNK_NO_CURSOR").is_some() {
return Ok(());
}
self.dbg_cursor += 1;
if self.dbg_cursor % 240 == 1 {
tracing::debug!(
visible = self.cursor_visible,
pos = format!("{:?}", self.cursor_pos),
shape = self.cursor_shape.as_ref().map(|(_, w, h)| format!("{w}x{h}")),
shape = self
.cursor_shape
.as_ref()
.map(|(_, w, h)| format!("{w}x{h}")),
"cursor state"
);
}
@@ -898,58 +1163,70 @@ impl DuplCapturer {
Ok(())
}
/// ONE rebuild attempt — deliberately non-blocking. ACCESS_LOST fires on desktop switches
/// (normal ↔ Winlogon secure: lock/login/UAC) and on the mode change we issue at create. We
/// re-attach to the now-current input desktop and recreate the D3D11 device + duplication on it
/// (a device made on the previous desktop can't sustain a duplication on the new one). CRUCIAL:
/// no internal multi-second retry loop — during a secure-desktop dwell the SudoVDA output is
/// *gone* (`no DXGI output named …`), and a blocking retry here would starve the encode/send
/// loop of frames for seconds, so the client times out and disconnects (the bug this fixes).
/// Instead a single attempt returns immediately; the caller ([`acquire`]) repeats the last good
/// frame and retries on a throttle, so the session survives an arbitrarily long secure visit.
unsafe fn recreate_dupl(&mut self) -> Result<()> {
if self.holding_frame {
let _ = self.dupl.ReleaseFrame();
self.holding_frame = false;
}
// ACCESS_LOST fires on desktop switches (normal ↔ Winlogon secure: lock/login/UAC) and on the
// mode change we issue at create. Re-attach to the now-current input desktop AND recreate the
// D3D11 device on it: a device made on the previous desktop cannot sustain a duplication on the
// new one (perpetual ACCESS_LOST). The capturer hands the new device out on `FramePayload::D3d11`,
// so NVENC re-inits when it sees it. Retry while the desktop is mid-reconfigure.
let deadline = Instant::now() + Duration::from_millis(12000);
loop {
// The SudoVDA virtual output's GDI name can CHANGE across a secure-desktop topology
// rebuild — the observed failure was searching for the stale \\.\DISPLAYn until the
// deadline and dying ("no DXGI output named ..."). Re-resolve it from the STABLE target
// id each retry so recovery finds the output under its current name.
if let Some(n) = crate::vdisplay::sudovda::resolve_gdi_name(self.target_id) {
self.gdi_name = n;
}
attach_input_desktop();
match reopen_duplication(&self.gdi_name) {
Ok((dev, ctx, out, dupl)) => {
// A desktop switch can come back at a different size (e.g. the user session applies
// its own resolution on login). Adopt it: update dimensions and drop the staging/gpu
// copies so they reallocate. NVENC re-inits at the new size when it sees the frame.
let dd: DXGI_OUTDUPL_DESC = dupl.GetDesc();
let (nw, nh) = (dd.ModeDesc.Width, dd.ModeDesc.Height);
if nw != self.width || nh != self.height {
tracing::info!(
old = format!("{}x{}", self.width, self.height),
new = format!("{nw}x{nh}"),
"DXGI duplication size changed across switch"
);
self.width = nw;
self.height = nh;
self.staging = None;
}
self.device = dev;
self.context = ctx;
self.output = out;
self.dupl = dupl;
self.gpu_copy = None; // stale: belonged to the old device
self.cursor = None; // shaders/textures belonged to the old device; rebuilt on demand
self.have_gpu_frame = false;
self.first_frame = true;
nudge_cursor_onto(&self.output); // re-kick after recovery
return Ok(());
}
Err(e) if Instant::now() >= deadline => return Err(e),
Err(_) => std::thread::sleep(Duration::from_millis(120)),
// The SudoVDA output's GDI name can CHANGE across a secure-desktop topology rebuild —
// re-resolve from the STABLE target id so we find it under its current name.
if let Some(n) = crate::vdisplay::sudovda::resolve_gdi_name(self.target_id) {
self.gdi_name = n;
}
attach_input_desktop();
let (dev, ctx, out, dupl) = reopen_duplication(&self.gdi_name)?; // Err → caller repeats + retries
// A desktop switch can come back at a different size (e.g. the user session applies its own
// resolution on login). Adopt it: update dimensions and drop the staging/gpu copies so they
// reallocate. NVENC re-inits at the new size when it sees the frame.
let dd: DXGI_OUTDUPL_DESC = dupl.GetDesc();
let (nw, nh) = (dd.ModeDesc.Width, dd.ModeDesc.Height);
tracing::info!(
dxgi_format = dd.ModeDesc.Format.0,
"DXGI duplication rebuilt (format: 87=BGRA8 24=R10G10B10A2 10=R16G16B16A16_FLOAT)"
);
if nw != self.width || nh != self.height {
tracing::info!(
old = format!("{}x{}", self.width, self.height),
new = format!("{nw}x{nh}"),
"DXGI duplication size changed across switch"
);
self.width = nw;
self.height = nh;
self.staging = None;
}
self.device = dev;
self.context = ctx;
self.output = out;
self.dupl = dupl;
self.gpu_copy = None; // stale: belonged to the old device
self.cursor = None; // shaders/textures belonged to the old device; rebuilt on demand
self.last_present = None; // belonged to the old device; reseeded below
// Re-detect HDR and drop the HDR textures/converter (old device). Toggling HDR on or
// off is exactly this path: the duplication comes back as FP16 (HDR) or BGRA8.
self.hdr_fp16 = dd.ModeDesc.Format == DXGI_FORMAT_R16G16B16A16_FLOAT;
self.fp16_src = None;
self.fp16_srv = None;
self.hdr10_out = None;
self.hdr_conv = None;
self.first_frame = true;
// Seed a black frame on the NEW device so next_frame always has something to repeat (and the
// encoder re-inits) until real frames resume.
if self.gpu_mode {
if let Err(e) = self.seed_black_gpu_frame() {
tracing::warn!(error = %format!("{e:#}"), "seed black frame after recovery failed");
}
}
nudge_cursor_onto(&self.output); // re-kick after recovery
Ok(())
}
/// Acquire one frame: `Some` on a fresh image, `None` on timeout (no change → caller reuses last).
@@ -960,7 +1237,11 @@ impl DuplCapturer {
}
let mut info = DXGI_OUTDUPL_FRAME_INFO::default();
let mut res: Option<IDXGIResource> = None;
let timeout = if self.first_frame { 2000 } else { self.timeout_ms };
let timeout = if self.first_frame {
2000
} else {
self.timeout_ms
};
match self.dupl.AcquireNextFrame(timeout, &mut info, &mut res) {
Ok(()) => {
if self.first_frame {
@@ -993,13 +1274,31 @@ impl DuplCapturer {
|| e.code() == DXGI_ERROR_DEVICE_RESET =>
{
self.dbg_lost += 1;
tracing::warn!(
lost = self.dbg_lost,
code = format!("{:#x}", e.code().0),
"DXGI capture lost (desktop switch?) — recovering"
);
self.recreate_dupl()?;
self.first_frame = true;
// THROTTLED, NON-BLOCKING recovery. During a secure-desktop dwell the SudoVDA output
// is gone, so a rebuild fails for the whole visit. We must NOT block retrying (that
// starves the encode/send loop → the client times out → disconnect — the bug). Try a
// rebuild at most ~4×/s; between attempts return "no new frame" so next_frame repeats
// the last good frame, keeping the client fed (frozen) until the desktop returns. A
// brief sleep on the throttled path avoids busy-spinning on the dead duplication.
let now = Instant::now();
let due = self.last_rebuild.map_or(true, |t| {
now.duration_since(t) >= Duration::from_millis(250)
});
if due {
self.last_rebuild = Some(now);
if self.dbg_lost % 8 == 1 {
tracing::warn!(
lost = self.dbg_lost,
code = format!("{:#x}", e.code().0),
"DXGI capture lost (desktop switch?) — repeating last frame, retrying rebuild"
);
}
if self.recreate_dupl().is_ok() {
self.first_frame = true;
}
} else {
std::thread::sleep(Duration::from_millis(8));
}
return Ok(None);
}
Err(e) => return Err(e).context("AcquireNextFrame"),
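The throttled recovery in the hunk above comes down to a small time gate: attempt a rebuild only if the previous attempt is at least 250 ms old, and otherwise report "no new frame". A minimal standalone sketch of that gate follows. `RebuildThrottle` and `try_acquire` are hypothetical names used for illustration, not part of this change; in the diff the same state lives in `self.last_rebuild`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not in the PR): rate-limits rebuild attempts.
struct RebuildThrottle {
    min_interval: Duration,
    last_attempt: Option<Instant>,
}

impl RebuildThrottle {
    fn new(min_interval: Duration) -> Self {
        Self {
            min_interval,
            last_attempt: None,
        }
    }

    /// Returns true (and records `now`) when no attempt has been made yet or the
    /// last one is at least `min_interval` old. Returns false otherwise, in which
    /// case the caller should repeat the last frame instead of rebuilding.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let due = self
            .last_attempt
            .map_or(true, |t| now.duration_since(t) >= self.min_interval);
        if due {
            self.last_attempt = Some(now);
        }
        due
    }
}
```

Passing `now` in, rather than calling `Instant::now()` inside the gate, keeps the decision deterministic and easy to test without sleeping.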
@@ -1007,6 +1306,47 @@ impl DuplCapturer {
self.holding_frame = true;
let res = res.context("AcquireNextFrame: null resource")?;
let tex: ID3D11Texture2D = res.cast().context("resource -> Texture2D")?;
if self.gpu_mode && self.hdr_fp16 {
// HDR zero-copy path: the duplication surface is scRGB FP16 (R16G16B16A16_FLOAT) — it can't
// be CopyResource'd into a BGRA target (that was the freeze + cursor-trail bug). Copy it into
// an FP16 SRV texture (same format → valid), composite the cursor onto it (the cursor lands
// at ~SDR-white brightness, then goes through the PQ curve correctly), then convert scRGB →
// BT.2020 PQ 10-bit into hdr10_out and hand THAT to NVENC (HEVC Main10 / HDR10).
self.ensure_fp16_src()?;
let src = self.fp16_src.clone().context("fp16 src texture")?;
self.context.CopyResource(&src, &tex);
let _ = self.dupl.ReleaseFrame();
self.holding_frame = false;
self.composite_cursor_gpu(&src)?; // onto the FP16 surface (RTV works on FP16)
self.ensure_hdr10_out()?;
let out = self.hdr10_out.clone().context("hdr10 out texture")?;
if self.hdr_conv.is_none() {
self.hdr_conv = Some(HdrConverter::new(&self.device)?);
}
let srv = self.fp16_srv.clone().context("fp16 srv")?;
let mut rtv: Option<ID3D11RenderTargetView> = None;
self.device
.CreateRenderTargetView(&out, None, Some(&mut rtv))?;
let rtv = rtv.context("hdr10 rtv")?;
self.hdr_conv.as_ref().unwrap().convert(
&self.context,
&srv,
&rtv,
self.width,
self.height,
);
self.last_present = Some((out.clone(), PixelFormat::Rgb10a2));
return Ok(Some(CapturedFrame {
width: self.width,
height: self.height,
pts_ns: now_ns(),
format: PixelFormat::Rgb10a2,
payload: FramePayload::D3d11(D3d11Frame {
texture: out,
device: self.device.clone(),
}),
}));
}
if self.gpu_mode {
// Zero-copy path: keep the frame on the GPU for NVENC. Copy the transient duplication
// surface into a reused owned texture, release the duplication frame, hand off the texture.
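The HDR branch above converts scRGB FP16 to BT.2020 PQ 10-bit on the GPU through `HdrConverter`, whose shader is not part of this section. As a CPU reference for the same kind of conversion, here is a rough sketch of the per-pixel math. It assumes the usual scRGB convention (linear BT.709 primaries, 1.0 = 80 nits) and full-range 10-bit output; whether the actual shader uses the same reference white and range is an assumption. The function names are illustrative only.

```rust
/// SMPTE ST 2084 (PQ) inverse EOTF: linear luminance normalised to 10,000 nits
/// in, non-linear signal in 0..=1 out. Inputs outside 0..=1 are clamped.
fn pq_encode(y: f64) -> f64 {
    const M1: f64 = 2610.0 / 16384.0;
    const M2: f64 = 2523.0 / 4096.0 * 128.0;
    const C1: f64 = 3424.0 / 4096.0;
    const C2: f64 = 2413.0 / 4096.0 * 32.0;
    const C3: f64 = 2392.0 / 4096.0 * 32.0;
    let yp = y.clamp(0.0, 1.0).powf(M1);
    ((C1 + C2 * yp) / (1.0 + C3 * yp)).powf(M2)
}

/// Linear BT.709 primaries -> linear BT.2020 primaries (matrix from ITU-R BT.2087).
fn bt709_to_bt2020(rgb: [f64; 3]) -> [f64; 3] {
    const M: [[f64; 3]; 3] = [
        [0.6274, 0.3293, 0.0433],
        [0.0691, 0.9195, 0.0114],
        [0.0164, 0.0880, 0.8956],
    ];
    let mut out = [0.0; 3];
    for (o, row) in out.iter_mut().zip(M.iter()) {
        *o = row[0] * rgb[0] + row[1] * rgb[1] + row[2] * rgb[2];
    }
    out
}

/// scRGB pixel -> 10-bit PQ code values.
/// ASSUMPTIONS: scRGB 1.0 = 80 nits, full-range 0..=1023 output.
fn scrgb_to_pq10(rgb: [f64; 3]) -> [u16; 3] {
    const SCRGB_WHITE_NITS: f64 = 80.0;
    const PQ_PEAK_NITS: f64 = 10_000.0;
    let lin = bt709_to_bt2020(rgb);
    let mut out = [0u16; 3];
    for (o, c) in out.iter_mut().zip(lin.iter()) {
        let signal = pq_encode(c * SCRGB_WHITE_NITS / PQ_PEAK_NITS);
        *o = (signal * 1023.0).round() as u16;
    }
    out
}
```

Under these assumptions SDR white (scRGB 1.0) lands a little below the middle of the PQ code range, which matches the comment above about the cursor arriving at roughly SDR-white brightness before the PQ curve.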
@@ -1015,8 +1355,8 @@ impl DuplCapturer {
self.context.CopyResource(&gpu, &tex);
let _ = self.dupl.ReleaseFrame();
self.holding_frame = false;
self.have_gpu_frame = true;
self.composite_cursor_gpu(&gpu)?;
self.last_present = Some((gpu.clone(), PixelFormat::Bgra));
return Ok(Some(CapturedFrame {
width: self.width,
height: self.height,
@@ -1079,20 +1419,23 @@ impl Capturer for DuplCapturer {
fn next_frame(&mut self) -> Result<CapturedFrame> {
// Generous: a secure-desktop switch can take several seconds to settle (re-resolve + recreate
// the duplication up to 12 s). Better a few seconds of frozen-last-frame than dropping the stream.
let deadline = Instant::now() + Duration::from_secs(20);
let mut deadline = Instant::now() + Duration::from_secs(20);
loop {
if let Some(f) = unsafe { self.acquire() }? {
self.ever_got_frame = true;
return Ok(f);
}
if self.gpu_mode && self.have_gpu_frame {
if let Some(gpu) = &self.gpu_copy {
if self.gpu_mode {
if let Some((tex, fmt)) = &self.last_present {
// Repeat the last presented GPU frame (SDR BGRA or HDR 10-bit), keeping the encoder
// on a matching format through a static desktop or a mid-rebuild gap.
return Ok(CapturedFrame {
width: self.width,
height: self.height,
pts_ns: now_ns(),
format: PixelFormat::Bgra,
format: *fmt,
payload: FramePayload::D3d11(D3d11Frame {
texture: gpu.clone(),
texture: tex.clone(),
device: self.device.clone(),
}),
});
@@ -1108,6 +1451,14 @@ impl Capturer for DuplCapturer {
});
}
if Instant::now() > deadline {
// After we've streamed at least once, never fatally drop on a frame drought: a long
// secure-desktop dwell (or a slow rebuild) just means no NEW frame yet. Reset the
// deadline and keep repeating the last/seeded frame so the session stays alive. The
// deadline stays fatal only before the first frame — a genuine "monitor never lit up".
if self.ever_got_frame {
deadline = Instant::now() + Duration::from_secs(20);
continue;
}
return Err(anyhow!(
"no DXGI frame within 20s (SudoVDA monitor not activated by a WDDM GPU?)"
));
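The deadline handling in `next_frame` above can be read as a three-way decision: keep waiting, extend the deadline, or fail. The sketch below isolates that decision as a pure function. `Drought` and `on_no_frame` are hypothetical names for illustration; the diff inlines this logic in the capture loop.

```rust
use std::time::{Duration, Instant};

/// Hypothetical outcome type (not in the PR) for a pass that produced no frame.
#[derive(Debug, PartialEq)]
enum Drought {
    /// Deadline not reached yet: loop again.
    KeepWaiting,
    /// Deadline passed, but the session has streamed before: new deadline to adopt.
    Extend(Instant),
    /// Deadline passed and no frame was ever captured: give up.
    Fatal,
}

/// The deadline is fatal only before the first frame; afterwards it is reset.
fn on_no_frame(
    now: Instant,
    deadline: Instant,
    ever_got_frame: bool,
    window: Duration,
) -> Drought {
    if now <= deadline {
        Drought::KeepWaiting
    } else if ever_got_frame {
        Drought::Extend(now + window)
    } else {
        Drought::Fatal
    }
}
```

With a 20 s window this mirrors the behaviour described in the comments: a long secure-desktop dwell never ends a session that has already streamed, while a monitor that never lights up still fails after 20 s.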
+23 -4
@@ -102,6 +102,7 @@ pub fn validate_dimensions(codec: Codec, width: u32, height: u32) -> Result<()>
/// encoder takes GPU frames (`AV_PIX_FMT_CUDA`) from the zero-copy path; otherwise it takes
/// packed RGB/BGR CPU frames. `format`/`bitrate_bps`/`codec`/mode come from session
/// negotiation; the caller derives `cuda` from the first captured frame's payload.
#[allow(clippy::too_many_arguments)]
pub fn open_video(
codec: Codec,
format: PixelFormat,
@@ -110,6 +111,7 @@ pub fn open_video(
fps: u32,
bitrate_bps: u64,
cuda: bool,
bit_depth: u8,
) -> Result<Box<dyn Encoder>> {
validate_dimensions(codec, width, height)?;
#[cfg(target_os = "linux")]
@@ -134,7 +136,7 @@ pub fn open_video(
}
let mut last: Option<anyhow::Error> = None;
for (i, &b) in candidates.iter().enumerate() {
match linux::NvencEncoder::open(codec, format, width, height, fps, b, cuda) {
match linux::NvencEncoder::open(codec, format, width, height, fps, b, cuda, bit_depth) {
Ok(enc) => {
if i > 0 {
tracing::warn!(
@@ -158,6 +160,7 @@ pub fn open_video(
#[cfg(target_os = "windows")]
{
let _ = cuda; // always false on Windows (no Cuda payload)
let _ = bit_depth; // used by the NVENC path below; the software H.264 path is 8-bit only
let pref = std::env::var("PUNKTFUNK_ENCODER")
.unwrap_or_default()
.to_ascii_lowercase();
@@ -166,8 +169,15 @@ pub fn open_video(
// FramePayload::D3d11 output under the same env var so capture + encode share textures.
#[cfg(feature = "nvenc")]
{
let enc =
nvenc::NvencD3d11Encoder::open(codec, format, width, height, fps, bitrate_bps)?;
let enc = nvenc::NvencD3d11Encoder::open(
codec,
format,
width,
height,
fps,
bitrate_bps,
bit_depth,
)?;
return Ok(Box::new(enc) as Box<dyn Encoder>);
}
#[cfg(not(feature = "nvenc"))]
@@ -196,7 +206,16 @@ pub fn open_video(
}
#[cfg(not(any(target_os = "linux", target_os = "windows")))]
{
let _ = (codec, format, width, height, fps, bitrate_bps, cuda);
let _ = (
codec,
format,
width,
height,
fps,
bitrate_bps,
cuda,
bit_depth,
);
anyhow::bail!("video encode requires Linux or Windows")
}
}
+15
@@ -103,6 +103,9 @@ fn nvenc_input(format: PixelFormat) -> (Pixel, bool) {
PixelFormat::Rgba => (Pixel::RGBA, false),
PixelFormat::Rgb => (Pixel::RGBZ, true), // RGB -> rgb0
PixelFormat::Bgr => (Pixel::BGRZ, true), // BGR -> bgr0
// 10-bit HDR (R10G10B10A2) is produced only by the Windows DXGI HDR capture path; the Linux
// capturer never emits it. Map to BGRA so the match is exhaustive — unreachable here.
PixelFormat::Rgb10a2 => (Pixel::BGRA, false),
}
}
@@ -131,6 +134,7 @@ pub struct NvencEncoder {
unsafe impl Send for NvencEncoder {}
impl NvencEncoder {
#[allow(clippy::too_many_arguments)]
pub fn open(
codec: Codec,
format: PixelFormat,
@@ -139,7 +143,18 @@ impl NvencEncoder {
fps: u32,
bitrate_bps: u64,
cuda: bool,
bit_depth: u8,
) -> Result<Self> {
// TODO(hdr): Linux 10-bit parity. Unlike the Windows raw-SDK path (which upconverts 8-bit
// ARGB → Main10 via pixelBitDepthMinus8), libavcodec hevc_nvenc needs a 10-bit input pixel
// format (p010) for Main10, so it's a bigger change; deferred until a Linux GPU box is
// available to validate. The Linux host stays 8-bit for now.
if bit_depth != 8 {
tracing::warn!(
bit_depth,
"Linux NVENC 10-bit not yet wired — encoding 8-bit"
);
}
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
unsafe { ffi::av_log_set_level(48) }; // AV_LOG_DEBUG — surface NVENC hw-frame rejects
+70 -10
@@ -43,6 +43,12 @@ pub struct NvencD3d11Encoder {
fps: u32,
bitrate_bps: u64,
buffer_fmt: nv::NV_ENC_BUFFER_FORMAT,
/// Encoded bit depth (8 or 10). 10 → HEVC Main10 (NVENC upconverts the 8-bit ARGB input).
bit_depth: u8,
/// HDR: the capturer is delivering BT.2020 PQ 10-bit (`PixelFormat::Rgb10a2`) frames. Sets the
/// `ABGR10` input format + the BT.2020/PQ colour VUI. Derived per-frame from the capture format
/// (HDR can toggle mid-session); a change re-inits the session.
hdr: bool,
/// Registrations of the capturer's input textures, cached by texture raw pointer — NVENC encodes
/// them in place (no per-frame copy). The cloned `ID3D11Texture2D` keeps each alive until we
/// unregister it (the capturer may drop its copy on a device recreate before our teardown runs).
@@ -71,6 +77,7 @@ impl NvencD3d11Encoder {
height: u32,
fps: u32,
bitrate_bps: u64,
bit_depth: u8,
) -> Result<Self> {
Ok(Self {
encoder: ptr::null_mut(),
@@ -80,6 +87,8 @@ impl NvencD3d11Encoder {
fps,
bitrate_bps,
buffer_fmt: nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB,
bit_depth,
hdr: false,
regs: HashMap::new(),
next: 0,
bitstreams: Vec::new(),
@@ -139,7 +148,8 @@ impl NvencD3d11Encoder {
// it at low pixel rates). Env override PUNKTFUNK_SPLIT_ENCODE = 0/disable | 1/auto | 2 | 3.
// HEVC/AV1 only; the init-failure fallback below disables it if a codec/config rejects it.
let pixel_rate = self.width as u64 * self.height as u64 * self.fps.max(1) as u64;
let mut split_mode: u32 = match std::env::var("PUNKTFUNK_SPLIT_ENCODE").ok().as_deref() {
let mut split_mode: u32 = match std::env::var("PUNKTFUNK_SPLIT_ENCODE").ok().as_deref()
{
Some("0") | Some("disable") => {
nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE as u32
}
@@ -202,6 +212,33 @@ impl NvencD3d11Encoder {
cfg.rcParams.vbvBufferSize = vbv;
cfg.rcParams.vbvInitialDelay = vbv;
// 3b. 10-bit HEVC Main10. The 8-bit ARGB capture input is upconverted by NVENC (the
// proven high-bit-depth-from-8-bit path); the encoded stream is 10-bit, which reduces
// banding and is the foundation for HDR. Colour stays BT.709 unless `self.hdr` is set, in
// which case step 3c adds the BT.2020/PQ VUI. 8-bit leaves the preset default (Main) untouched.
if self.bit_depth == 10 {
cfg.profileGUID = nv::NV_ENC_HEVC_PROFILE_MAIN10_GUID;
cfg.encodeCodecConfig.hevcConfig.set_pixelBitDepthMinus8(2);
// 10 - 8
}
// 3c. HDR colour signaling: BT.2020 primaries + SMPTE ST 2084 (PQ) transfer in the
// HEVC VUI, so a decoder/display knows the 10-bit samples are PQ HDR (not SDR gamma).
// The capturer already produced PQ-encoded BT.2020 pixels; this just describes them.
// (HDR10 static metadata — mastering display + MaxCLL/MaxFALL — is added in a follow-up.)
if self.hdr {
let vui = &mut cfg.encodeCodecConfig.hevcConfig.hevcVUIParameters;
vui.videoSignalTypePresentFlag = 1;
vui.videoFullRangeFlag = 0; // limited (studio) range — NVENC RGB→YUV default
vui.colourDescriptionPresentFlag = 1;
vui.colourPrimaries =
nv::NV_ENC_VUI_COLOR_PRIMARIES::NV_ENC_VUI_COLOR_PRIMARIES_BT2020;
vui.transferCharacteristics =
nv::NV_ENC_VUI_TRANSFER_CHARACTERISTIC::NV_ENC_VUI_TRANSFER_CHARACTERISTIC_SMPTE2084;
vui.colourMatrix =
nv::NV_ENC_VUI_MATRIX_COEFFS::NV_ENC_VUI_MATRIX_COEFFS_BT2020_NCL;
}
// 4. initialize the encoder.
let mut init = nv::NV_ENC_INITIALIZE_PARAMS {
version: nv::NV_ENC_INITIALIZE_PARAMS_VER,
@@ -242,9 +279,11 @@ impl NvencD3d11Encoder {
// fails, the codec/config may not accept it (e.g. H264) — disable split and retry
// single-engine rather than fail the session.
Err(e)
if split_mode != nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_AUTO_MODE as u32
if split_mode
!= nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_AUTO_MODE as u32
&& split_mode
!= nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE as u32 =>
!= nv::NV_ENC_SPLIT_ENCODE_MODE::NV_ENC_SPLIT_DISABLE_MODE
as u32 =>
{
let _ = (API.destroy_encoder)(enc);
tracing::warn!(error = ?e, "NVENC init rejected with split-encode forced — disabling split, retrying single-engine");
@@ -253,7 +292,10 @@ impl NvencD3d11Encoder {
}
Err(e) => {
let _ = (API.destroy_encoder)(enc);
return Err(anyhow!("initialize_encoder: {e:?} (even at {} Mbps floor)", FLOOR_BPS / 1_000_000));
return Err(anyhow!(
"initialize_encoder: {e:?} (even at {} Mbps floor)",
FLOOR_BPS / 1_000_000
));
}
}
};
@@ -280,10 +322,12 @@ impl NvencD3d11Encoder {
}
self.inited = true;
tracing::info!(
"NVENC D3D11 session: {}x{}@{} {} Mbps {:?}",
"NVENC D3D11 session: {}x{}@{} {}-bit{} {} Mbps {:?}",
self.width,
self.height,
self.fps,
self.bit_depth,
if self.hdr { " HDR(BT.2020 PQ)" } else { "" },
self.bitrate_bps / 1_000_000,
self.codec_guid
);
@@ -303,21 +347,36 @@ impl Encoder for NvencD3d11Encoder {
// The capturer recreates its D3D11 device on a desktop switch (secure/Winlogon) and may come
// back at a different resolution (user session applies its own mode on login). Re-init when the
// frame arrives on a different device OR at a different size than our session was built on.
// Treat the stream as HDR (BT.2020 PQ 10-bit) when the capturer hands us an R10G10B10A2 frame.
// This can flip mid-session when the user toggles HDR (which arrives as a capture device
// recreate anyway).
let hdr = matches!(captured.format, PixelFormat::Rgb10a2);
let dev_raw = frame.device.as_raw();
let size_changed = self.inited && (self.width != captured.width || self.height != captured.height);
if self.inited && (self.init_device != dev_raw || size_changed) {
let size_changed =
self.inited && (self.width != captured.width || self.height != captured.height);
let hdr_changed = self.inited && self.hdr != hdr;
if self.inited && (self.init_device != dev_raw || size_changed || hdr_changed) {
tracing::info!(
device_changed = self.init_device != dev_raw,
size_changed,
hdr_changed,
hdr,
new = format!("{}x{}", captured.width, captured.height),
"NVENC: capture device/size changed (desktop switch) — re-initializing session"
"NVENC: capture device/size/HDR changed — re-initializing session"
);
unsafe { self.teardown() };
}
if !self.inited {
// Adopt the current frame size so the encoder always matches what the capturer produces.
// Adopt the current frame size + colour so the encoder always matches the capturer output.
self.width = captured.width;
self.height = captured.height;
self.hdr = hdr;
if hdr {
// 10-bit BT.2020 PQ input; force Main10 regardless of the negotiated SDR bit depth.
self.bit_depth = 10;
self.buffer_fmt = nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ABGR10;
} else {
self.buffer_fmt = nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB;
}
let device = frame.device.clone();
self.init_session(&device)?;
self.init_device = dev_raw;
@@ -332,7 +391,8 @@ impl Encoder for NvencD3d11Encoder {
if !self.regs.contains_key(&key) {
let mut rr = nv::NV_ENC_REGISTER_RESOURCE {
version: nv::NV_ENC_REGISTER_RESOURCE_VER,
resourceType: nv::NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_DIRECTX,
resourceType:
nv::NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_DIRECTX,
width: self.width,
height: self.height,
pitch: 0,
+5
@@ -146,6 +146,11 @@ impl Encoder for OpenH264Encoder {
self.normalize_to_bgra(bytes, 3, false);
self.yuv.read_rgb(BgraSliceU8::new(&self.scratch, (w, h)));
}
// 10-bit HDR comes only from the GPU NVENC path; the software 8-bit H.264 encoder
// can't represent it (and never receives it — the capturer pairs Rgb10a2 with NVENC).
PixelFormat::Rgb10a2 => {
anyhow::bail!("software H.264 encoder cannot encode 10-bit HDR (Rgb10a2)")
}
}
if self.force_kf {
@@ -274,6 +274,7 @@ fn stream_body(
cfg.fps,
cfg.bitrate_kbps as u64 * 1000,
frame.is_cuda(),
8, // GameStream/Moonlight path: 8-bit (its own codec negotiation)
)
.context("open NVENC for stream")?;
// FEC overhead percent (Sunshine default 20). Override with PUNKTFUNK_FEC_PCT (0 = data-only).
+1
@@ -104,6 +104,7 @@ pub fn run(opts: Options) -> Result<()> {
opts.fps,
opts.bitrate_bps,
first.is_cuda(),
8, // m0 synthetic harness: 8-bit
)
.context("open encoder")?;
+34 -4
@@ -554,6 +554,25 @@ async fn serve_session(
"encoder bitrate"
);
// Resolve the encode bit depth: HEVC Main10 only when the client advertised it AND the host
// opted in (PUNKTFUNK_10BIT). A client that can't decode 10-bit (caps bit clear, or an older
// client) always gets the 8-bit stream. PUNKTFUNK_10BIT is the host policy gate until a
// mgmt/console toggle replaces it.
let host_wants_10bit = std::env::var_os("PUNKTFUNK_10BIT").is_some();
let client_supports_10bit = hello.video_caps & punktfunk_core::quic::VIDEO_CAP_10BIT != 0;
let bit_depth: u8 = if host_wants_10bit && client_supports_10bit {
10
} else {
8
};
tracing::info!(
bit_depth,
host_wants_10bit,
client_supports_10bit,
client_video_caps = hello.video_caps,
"encode bit depth"
);
// Reserve a UDP port for the data plane (bind, read it back, rebind in UdpTransport).
let probe = std::net::UdpSocket::bind("0.0.0.0:0")?;
let udp_port = probe.local_addr()?.port();
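The bit-depth resolution above is a two-input gate: the client must advertise the capability and the host must opt in. A minimal standalone sketch of that rule follows. The capability bit values and the `u8` width are assumptions made for illustration (the real constants live in `punktfunk_core::quic` and are not shown in this section), and `resolve_bit_depth` is a hypothetical helper name.

```rust
/// ASSUMED bit positions; the real values are defined in punktfunk-core.
const VIDEO_CAP_10BIT: u8 = 1 << 0;
const VIDEO_CAP_HDR: u8 = 1 << 1;

/// Hypothetical helper: 10-bit only when the client advertised it AND the host
/// opted in. Anything else (including an older client that sent no caps byte,
/// decoded as 0) resolves to 8-bit.
fn resolve_bit_depth(client_caps: u8, host_opt_in: bool) -> u8 {
    if host_opt_in && client_caps & VIDEO_CAP_10BIT != 0 {
        10
    } else {
        8
    }
}
```

Because a missing caps byte is treated as 0, the 8-bit branch doubles as the back-compat path for clients that predate the field.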
@@ -590,6 +609,7 @@ async fn serve_session(
.unwrap_or(CompositorPref::Auto),
gamepad,
bitrate_kbps,
bit_depth,
};
io::write_msg(&mut send, &welcome.encode()).await?;
@@ -807,6 +827,7 @@ async fn serve_session(
let (seconds, frames) = (opts.seconds, opts.frames);
let mode = hello.mode;
let bitrate_kbps = welcome.bitrate_kbps; // resolved encoder bitrate (Hello clamped, or default)
let bit_depth = welcome.bit_depth; // resolved encode bit depth (8, or 10 when negotiated)
let stop_stream = stop.clone();
let result: Result<()> = async {
tokio::task::spawn_blocking(move || -> Result<()> {
@@ -849,6 +870,7 @@ async fn serve_session(
&keyframe_rx,
compositor,
bitrate_kbps,
bit_depth,
probe_rx,
probe_result_tx,
)
@@ -1942,6 +1964,7 @@ fn virtual_stream(
keyframe: &std::sync::mpsc::Receiver<()>,
compositor: crate::vdisplay::Compositor,
bitrate_kbps: u32,
bit_depth: u8,
probe_rx: std::sync::mpsc::Receiver<ProbeRequest>,
probe_result_tx: tokio::sync::mpsc::UnboundedSender<ProbeResult>,
) -> Result<()> {
@@ -1949,11 +1972,12 @@ fn virtual_stream(
compositor = compositor.id(),
?mode,
bitrate_kbps,
bit_depth,
"punktfunk/1 virtual display"
);
let mut vd = crate::vdisplay::open(compositor)?;
let (mut capturer, mut enc, mut frame, mut interval) =
build_pipeline_with_retry(&mut vd, mode, bitrate_kbps)?;
build_pipeline_with_retry(&mut vd, mode, bitrate_kbps, bit_depth)?;
let perf = std::env::var("PUNKTFUNK_PERF").is_ok();
// Microburst cap (applied in send_loop/paced_submit): a frame ≤ this bursts out immediately;
@@ -2041,7 +2065,8 @@ fn virtual_stream(
let rebuilt =
(|| -> Result<(Box<dyn crate::vdisplay::VirtualDisplay>, Pipeline)> {
let mut new_vd = crate::vdisplay::open(sw.compositor)?;
let pipe = build_pipeline_with_retry(&mut new_vd, mode, bitrate_kbps)?;
let pipe =
build_pipeline_with_retry(&mut new_vd, mode, bitrate_kbps, bit_depth)?;
Ok((new_vd, pipe))
})();
match rebuilt {
@@ -2084,7 +2109,7 @@ fn virtual_stream(
// Build the new pipeline BEFORE dropping the old one: the host already acked
// the switch as accepted, so a rebuild failure must not kill an otherwise
// healthy session — keep streaming the current mode and log instead.
match build_pipeline(&mut vd, new_mode, bitrate_kbps) {
match build_pipeline(&mut vd, new_mode, bitrate_kbps, bit_depth) {
Ok(next_pipe) => {
(capturer, enc, frame, interval) = next_pipe;
next = std::time::Instant::now();
@@ -2176,11 +2201,12 @@ fn build_pipeline_with_retry(
vd: &mut Box<dyn crate::vdisplay::VirtualDisplay>,
mode: punktfunk_core::Mode,
bitrate_kbps: u32,
bit_depth: u8,
) -> Result<Pipeline> {
const MAX_ATTEMPTS: u32 = 4;
let mut backoff = std::time::Duration::from_millis(500);
for attempt in 1..=MAX_ATTEMPTS {
match build_pipeline(vd, mode, bitrate_kbps) {
match build_pipeline(vd, mode, bitrate_kbps, bit_depth) {
Ok(pipe) => {
if attempt > 1 {
tracing::info!(attempt, "pipeline up after retry");
@@ -2238,6 +2264,7 @@ fn build_pipeline(
vd: &mut Box<dyn crate::vdisplay::VirtualDisplay>,
mode: punktfunk_core::Mode,
bitrate_kbps: u32,
bit_depth: u8,
) -> Result<Pipeline> {
let vout = vd.create(mode).context("create virtual output")?;
// The backend reports the refresh it actually achieved in `preferred_mode.2` (KWin may cap a
@@ -2260,6 +2287,8 @@ fn build_pipeline(
crate::capture::capture_virtual_output(vout).context("capture virtual output")?;
capturer.set_active(true);
let frame = capturer.next_frame().context("first frame")?;
// `bit_depth` is the handshake-negotiated value (8, or 10 = HEVC Main10 when the client
// advertised VIDEO_CAP_10BIT and the host opted in). Threaded down from the Welcome.
let enc = crate::encode::open_video(
crate::encode::Codec::H265,
frame.format,
@@ -2268,6 +2297,7 @@ fn build_pipeline(
effective_hz,
bitrate_kbps as u64 * 1000,
frame.is_cuda(),
bit_depth,
)
.context("open NVENC")?;
let interval = std::time::Duration::from_secs_f64(1.0 / effective_hz.max(1) as f64);
+15 -5
@@ -1,4 +1,4 @@
//! Windows virtual-display backend driving **SudoVDA** (the SudoMaker Virtual Display Adapter —
//! Windows virtual-display backend driving **SudoVDA** (the SudoMaker Virtual Display Adapter —
//! the Indirect Display Driver the Apollo Sunshine-fork ships). The Windows analogue of the
//! Linux per-compositor backends: [`create`](VirtualDisplay::create) adds a virtual monitor at the
//! client's exact `WxH@Hz` (the mode is baked into the ADD IOCTL — no EDID seeding), starts the
@@ -161,7 +161,11 @@ fn set_active_mode(gdi_name: &str, mode: Mode) {
..Default::default()
};
let ok = unsafe {
EnumDisplaySettingsW(PCWSTR(wname.as_ptr()), ENUM_DISPLAY_SETTINGS_MODE(i), &mut dm)
EnumDisplaySettingsW(
PCWSTR(wname.as_ptr()),
ENUM_DISPLAY_SETTINGS_MODE(i),
&mut dm,
)
}
.as_bool();
if !ok {
@@ -175,7 +179,12 @@ fn set_active_mode(gdi_name: &str, mode: Mode) {
}
let chosen_hz = if at_res.contains(&mode.refresh_hz) {
mode.refresh_hz
} else if let Some(hz) = at_res.iter().copied().filter(|&hz| hz <= mode.refresh_hz).max() {
} else if let Some(hz) = at_res
.iter()
.copied()
.filter(|&hz| hz <= mode.refresh_hz)
.max()
{
hz
} else if let Some(hz) = at_res.iter().copied().max() {
hz
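The refresh-rate choice in `set_active_mode` above follows a simple preference order: the exact requested rate, else the highest available rate not above it, else the highest available rate. A small sketch of that order as a pure function is below. `choose_refresh` is a hypothetical name, and returning `None` for an empty list is an assumption, since the final fallback branch is outside this hunk.

```rust
/// Hypothetical helper: pick a refresh rate from `available` for a `wanted` rate.
/// Order of preference: exact match, highest rate <= wanted, highest rate overall.
/// Returns None when `available` is empty (assumed; the real fallback is not shown).
fn choose_refresh(available: &[u32], wanted: u32) -> Option<u32> {
    if available.contains(&wanted) {
        return Some(wanted);
    }
    available
        .iter()
        .copied()
        .filter(|&hz| hz <= wanted)
        .max()
        .or_else(|| available.iter().copied().max())
}
```

For example, a request for 120 Hz against a display that offers only 60 and 144 Hz resolves to 60 Hz under this order, because a rate at or below the request is preferred over one above it.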
@@ -212,8 +221,9 @@ fn set_active_mode(gdi_name: &str, mode: Mode) {
dmDisplayFrequency: chosen_hz,
..Default::default()
};
let test =
unsafe { ChangeDisplaySettingsExW(PCWSTR(wname.as_ptr()), Some(&dm), None, CDS_TEST, None) };
let test = unsafe {
ChangeDisplaySettingsExW(PCWSTR(wname.as_ptr()), Some(&dm), None, CDS_TEST, None)
};
if test != DISP_CHANGE_SUCCESSFUL {
tracing::warn!(
result = test.0,
+2 -1
@@ -36,7 +36,8 @@ pub fn drm_fourcc(format: crate::capture::PixelFormat) -> Option<u32> {
Rgbx => fourcc(b"XB24"), // DRM_FORMAT_XBGR8888
Rgba => fourcc(b"AB24"), // DRM_FORMAT_ABGR8888
// 24-bit packed RGB/BGR have no straightforward dmabuf import here; use the CPU path.
Rgb | Bgr => return None,
// Rgb10a2 is the Windows HDR capture format — never produced by the Linux capturer.
Rgb | Bgr | Rgb10a2 => return None,
})
}