feat(windows-client): D3D11VA zero-copy hw decode + HDR10 present + GUI polish
windows-msix / package (push) Successful in 1m2s
apple / swift (push) Successful in 54s
windows / build (push) Failing after 1m2s
android / android (push) Failing after 48s
ci / web (push) Failing after 6s
ci / docs-site (push) Failing after 1s
ci / bench (push) Failing after 0s
deb / build-publish (push) Failing after 0s
decky / build-publish (push) Failing after 0s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Failing after 0s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Failing after 1s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Failing after 0s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Failing after 0s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Failing after 0s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Failing after 1s
docker / deploy-docs (push) Has been skipped
ci / rust (push) Failing after 2m0s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Failing after 4m18s
The client was pure software HEVC decode + CPU swscale->RGBA + a full-frame
dynamic-texture upload every frame -- the reason performance was poor on a GPU
box (the GPU sat idle while the CPU churned). This adds a hardware path, HDR,
and a GUI pass.

Performance -- D3D11VA zero-copy:
- gpu.rs (new): one D3D11 device (hardware + VIDEO_SUPPORT, WARP fallback,
  multithread-protected) shared by decoder and presenter via a Send/Sync
  OnceLock. Sharing is mandatory -- a decoded texture is only bindable on the
  device that created it. windows-rs COM interfaces are !Send/!Sync, so the
  unsafe impl is sound only under the multithread protection + disjoint
  decode(video ctx)/present(immediate ctx) split.
- video.rs: D3d11vaDecoder (raw FFI mirroring the Linux VAAPI module). The
  COM-typed AVD3D11VA{Device,Frames}Context are declared here (stable FFmpeg
  ABI) to avoid ffmpeg-sys binding the d3d11 headers; get_format builds a frames
  ctx with BindFlags=SHADER_RESOURCE so the NV12/P010 array slices are
  sampleable. av_frame_clone guard keeps each surface out of the reuse pool
  until the presenter drops it. Software decode stays as the fallback
  (DecoderPref Auto/Hardware/Software; auto falls back on init/decode error).
- present.rs: shared device; per-plane SRVs over the array slice
  (NV12->R8/R8G8, P010->R16/R16G16) + three pixel shaders (RGBA passthrough,
  NV12/BT.709, P010/BT.2020-PQ). present() now takes the frame by value so the
  GPU surface survives re-presents.
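
The guard behaviour described for video.rs (a cloned frame keeps its surface out of the reuse pool until the presenter drops it) can be modelled without FFmpeg. The following is a minimal sketch, not the commit's code: `SurfacePool` and `FrameGuard` are hypothetical names, and an `Arc` reference count stands in for the buffer reference that `av_frame_clone` takes.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the decoder's texture-array pool: one entry per slice.
pub struct SurfacePool {
    slots: Vec<Arc<u32>>, // each Arc carries its slice index
}

/// Hypothetical stand-in for the commit's D3d11FrameGuard: holding it pins one slice.
pub struct FrameGuard(Arc<u32>);

impl FrameGuard {
    /// Slice index this guard pins.
    pub fn index(&self) -> u32 {
        *self.0
    }
}

impl SurfacePool {
    pub fn new(slices: u32) -> Self {
        Self {
            slots: (0..slices).map(Arc::new).collect(),
        }
    }

    /// Hand out the first slice nobody else references; `None` means the pool is exhausted.
    pub fn acquire(&self) -> Option<FrameGuard> {
        self.slots
            .iter()
            .find(|s| Arc::strong_count(*s) == 1) // only the pool itself holds it
            .map(|s| FrameGuard(Arc::clone(s)))
    }
}
```

In this model, dropping a `FrameGuard` is the only thing that makes its slice available again, which is why the pool needs headroom for every frame the presenter may still be holding.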

HDR:
- Detected in-band (transfer == SMPTE2084), same signal as the other clients.
  Swapchain flips to R10G10B10A2 + ST.2084 + HDR10 metadata. New Settings toggle
  gates advertising VIDEO_CAP_10BIT|HDR; host still gates 10-bit behind its own
  PUNKTFUNK_10BIT + actual-HDR-content checks.
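
For reference, the R10G10B10A2 layout the swapchain switches to (and which the software path's X2BGR10 output matches, with R in the low 10 bits) can be sketched as plain bit packing. This is an illustrative sketch only: the function names are hypothetical, and setting the top two bits to 1 is an assumption, since X2BGR10 leaves them undefined.

```rust
/// Pack 10-bit R, G, B into one u32 laid out like DXGI R10G10B10A2:
/// R in bits 0..=9, G in bits 10..=19, B in bits 20..=29, top two bits alpha/unused.
/// Assumption: the top two bits are set to 1 here (opaque alpha).
pub fn pack_r10g10b10a2(r: u16, g: u16, b: u16) -> u32 {
    let (r, g, b) = (
        u32::from(r) & 0x3ff,
        u32::from(g) & 0x3ff,
        u32::from(b) & 0x3ff,
    );
    (0b11 << 30) | (b << 20) | (g << 10) | r
}

/// Inverse of `pack_r10g10b10a2`, ignoring the top two bits.
pub fn unpack_r10g10b10(px: u32) -> (u16, u16, u16) {
    (
        (px & 0x3ff) as u16,
        ((px >> 10) & 0x3ff) as u16,
        ((px >> 20) & 0x3ff) as u16,
    )
}
```

Written out as little-endian bytes, a full-scale red pixel starts with `0xFF 0x03`, which is a quick way to check channel order when inspecting an uploaded buffer.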

GUI (windows-reactor):
- Host cards with accent-monogram avatars + colored status pills, InfoBar for
  errors/pairing hints, ToggleSwitch settings (+ HDR, decoder, bitrate), button
  icons, a richer connecting screen, and a stream HUD with GPU/CPU-decode + HDR
  status chips.

Not yet on-glass validated: the Linux dev box can't compile the cfg(windows)
code (ffmpeg/windows crates unfetched; WARP has no hw decode) -- only
cargo fmt checks it here. API shapes verified against the windows-rs/reactor
source and the YUV->RGB coefficients checked by hand, but D3D11VA + shaders +
the GUI need a real build (Windows CI / build VM) and on-glass test on the RTX
box. The host-side HDR encode path is unchanged.
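
A hand check of YUV->RGB coefficients like the one mentioned above can be reproduced from the BT.709 luma weights. The sketch below is not the shader from this commit: it assumes 8-bit limited (video) range input, which the commit message does not state, and the function name is hypothetical.

```rust
/// BT.709 luma weights (Kr, Kb); Kg follows from Kr + Kg + Kb = 1.
const KR: f64 = 0.2126;
const KB: f64 = 0.0722;

/// Convert one 8-bit BT.709 Y'CbCr sample to R'G'B' in [0, 1].
/// Assumption: limited range (Y 16..=235, Cb/Cr 16..=240).
pub fn bt709_limited_to_rgb(y: u8, cb: u8, cr: u8) -> [f64; 3] {
    let kg = 1.0 - KR - KB;
    // Normalise: luma to [0, 1], chroma to [-0.5, 0.5].
    let yn = (f64::from(y) - 16.0) / 219.0;
    let pb = (f64::from(cb) - 128.0) / 224.0;
    let pr = (f64::from(cr) - 128.0) / 224.0;
    // Invert Y = Kr*R + Kg*G + Kb*B, Pb = (B - Y) / (2(1 - Kb)), Pr = (R - Y) / (2(1 - Kr)).
    let r = yn + 2.0 * (1.0 - KR) * pr;
    let b = yn + 2.0 * (1.0 - KB) * pb;
    let g = (yn - KR * r - KB * b) / kg;
    [r, g, b].map(|c| c.clamp(0.0, 1.0))
}
```

Folding the range scaling into the matrix gives the familiar 8-bit constants (about 1.164 for luma, 1.793 for Cr into R, 2.112 for Cb into B), so a shader's literals can be compared against this derivation.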
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
+397
-22
@@ -1,27 +1,76 @@
 //! Video decode: reassembled HEVC access units → frames for the D3D11 presenter.
 //!
-//! The dev box has no working GPU, so this ships the **software** backend first: libavcodec
-//! on the CPU + swscale to RGBA, uploaded into a D3D11 texture by the presenter. It runs
-//! `AV_CODEC_FLAG_LOW_DELAY` with slice threading only — the host encodes zero-reorder
-//! streams (no B-frames, in-band parameter sets on every IDR), so decode is strictly
-//! one-in/one-out and frame threading would only add latency.
+//! Two backends, picked at session start (override via [`DecoderPref`] / the Settings UI):
 //!
-//! `DecodedFrame` is an enum so the real-GPU **D3D11VA** path (decode → `NV12`/`P010`
-//! `ID3D11Texture2D`, zero-copy into the swapchain) can be added as a second variant without
-//! touching the session pump or the presenter's frame contract.
+//! * **D3D11VA** (any GPU): libavcodec decodes on the GPU straight into `ID3D11Texture2D`s that
+//! carry `D3D11_BIND_SHADER_RESOURCE`, so the presenter samples the decoded NV12/P010 surface
+//! directly — **zero copy** (no swscale, no CPU readback, no per-frame upload). The textures are
+//! created by the process-wide shared device ([`crate::gpu`]) the presenter also draws with, which
+//! is what makes them bindable there. This is the big latency/throughput win over software decode.
+//! * **Software**: libavcodec on the CPU + swscale to a packed 4-byte format the presenter uploads
+//! (`RGBA` for SDR, `X2BGR10` for HDR). The fallback on a GPU-less box (WARP), when D3D11VA init
+//! fails, or when a mid-session hardware error demotes us — the host's IDR/RFI recovery
+//! resynchronizes on the next keyframe either way.
+//!
+//! Both run `AV_CODEC_FLAG_LOW_DELAY`; the host encodes zero-reorder streams (no B-frames, in-band
+//! parameter sets on every IDR), so decode is strictly one-in/one-out.
+//!
+//! HDR is detected in-band from the decoded frame's transfer characteristic (`SMPTE2084` / PQ in the
+//! HEVC VUI) — the same signal every other punktfunk client keys off — not from a protocol field.
 
-use anyhow::{anyhow, Context as _, Result};
+use anyhow::{anyhow, bail, Context as _, Result};
 use ffmpeg::format::Pixel;
 use ffmpeg::software::scaling;
 use ffmpeg::util::frame::Video as AvFrame;
 use ffmpeg_next as ffmpeg;
+use std::ffi::c_void;
+use std::ptr;
+use windows::core::Interface; // ID3D11Device::clone().into_raw() for the FFmpeg hwdevice ctx
 
+/// Which decode backend to use; the Settings UI persists this as a string.
+#[derive(Clone, Copy, PartialEq, Eq, Debug, Default)]
+pub enum DecoderPref {
+    /// Try D3D11VA, fall back to software.
+    #[default]
+    Auto,
+    /// Force D3D11VA (error out if unavailable, for debugging).
+    Hardware,
+    /// Force software decode.
+    Software,
+}
+
+impl DecoderPref {
+    pub fn from_name(s: &str) -> DecoderPref {
+        match s {
+            "hardware" => DecoderPref::Hardware,
+            "software" => DecoderPref::Software,
+            _ => DecoderPref::Auto,
+        }
+    }
+}
+
 pub enum DecodedFrame {
     Cpu(CpuFrame),
+    Gpu(GpuFrame),
 }
 
-/// Packed 4-byte-per-pixel frame for a D3D11 texture upload (which takes a row pitch). The bytes
-/// are `R8G8B8A8` for SDR and `X2BGR10` (== DXGI `R10G10B10A2`, R in the low 10 bits) for HDR.
+impl DecodedFrame {
+    pub fn dims(&self) -> (u32, u32) {
+        match self {
+            DecodedFrame::Cpu(c) => (c.width, c.height),
+            DecodedFrame::Gpu(g) => (g.width, g.height),
+        }
+    }
+    pub fn hdr(&self) -> bool {
+        match self {
+            DecodedFrame::Cpu(c) => c.hdr,
+            DecodedFrame::Gpu(g) => g.hdr,
+        }
+    }
+}
+
+/// Packed 4-byte-per-pixel frame for a D3D11 dynamic-texture upload (which takes a row pitch). The
+/// bytes are `R8G8B8A8` for SDR and `X2BGR10` (== DXGI `R10G10B10A2`, R in the low 10 bits) for HDR.
 pub struct CpuFrame {
     pub width: u32,
     pub height: u32,
@@ -33,26 +82,101 @@ pub struct CpuFrame {
     pub hdr: bool,
 }
 
+/// A decoded frame still on the GPU: a D3D11 texture **array** plus the slice index the decoder
+/// wrote this frame into. The presenter creates per-plane shader-resource views over the slice and
+/// converts YUV→RGB in a pixel shader. The underlying surface stays alive — and out of the decoder's
+/// reuse pool — for exactly as long as `guard` (an `av_frame_clone` of the decoded frame) lives.
+pub struct GpuFrame {
+    pub width: u32,
+    pub height: u32,
+    /// Texture-array slice this frame occupies (`AVFrame::data[1]`).
+    pub index: u32,
+    /// BT.2020 PQ HDR10 (P010, ST.2084) vs ordinary 8-bit BT.709 SDR (NV12).
+    pub hdr: bool,
+    /// 10-bit (P010, R16 planes) vs 8-bit (NV12, R8 planes) — kept for the first-frame log; the
+    /// present path keys colour/format off `hdr` (the host couples 10-bit ⟺ HDR).
+    pub ten_bit: bool,
+    guard: D3d11FrameGuard,
+}
+
+impl GpuFrame {
+    /// The decoder's D3D11 texture array holding this frame's slice, borrowed from the live cloned
+    /// `AVFrame`. Construct the windows-rs interface on the thread that will use it (the presenter /
+    /// UI thread): COM interfaces are `!Send`, but the raw pointer is fine to carry across threads.
+    pub fn texture_ptr(&self) -> *mut c_void {
+        unsafe { (*self.guard.0).data[0] as *mut c_void }
+    }
+}
+
+/// Owns a cloned decoded `AVFrame` (which refs the D3D11 surface in the decoder pool). Dropping it
+/// releases the surface back for reuse. The clone is plain refcounted data; freeing it from the
+/// presenter thread is fine.
+pub struct D3d11FrameGuard(*mut ffmpeg::ffi::AVFrame);
+unsafe impl Send for D3d11FrameGuard {}
+impl Drop for D3d11FrameGuard {
+    fn drop(&mut self) {
+        unsafe { ffmpeg::ffi::av_frame_free(&mut self.0) };
+    }
+}
+
+enum Backend {
+    D3d11va(D3d11vaDecoder),
+    Software(SoftwareDecoder),
+}
+
 pub struct Decoder {
-    inner: SoftwareDecoder,
+    backend: Backend,
 }
 
 impl Decoder {
-    pub fn new() -> Result<Decoder> {
+    pub fn new(pref: DecoderPref) -> Result<Decoder> {
         ffmpeg::init().context("ffmpeg init")?;
+        if pref != DecoderPref::Software {
+            match D3d11vaDecoder::new() {
+                Ok(d) => {
+                    tracing::info!("D3D11VA hardware decode active (zero-copy)");
+                    return Ok(Decoder {
+                        backend: Backend::D3d11va(d),
+                    });
+                }
+                Err(e) => {
+                    if pref == DecoderPref::Hardware {
+                        return Err(e.context("decoder=hardware but D3D11VA failed"));
+                    }
+                    tracing::info!(reason = %e, "D3D11VA unavailable — software decode");
+                }
+            }
+        }
         Ok(Decoder {
-            inner: SoftwareDecoder::new()?,
+            backend: Backend::Software(SoftwareDecoder::new()?),
         })
     }
 
-    /// Feed one access unit; returns the decoded frame (the host's streams are
-    /// one-in/one-out). A decode error after packet loss is survivable — log upstream and
-    /// keep feeding; the host's IDR/RFI recovery resynchronizes on the next keyframe.
+    /// True for the zero-copy hardware backend (shown in the stream HUD).
+    pub fn is_hardware(&self) -> bool {
+        matches!(self.backend, Backend::D3d11va(_))
+    }
+
+    /// Feed one access unit; returns the decoded frame (the host's streams are one-in/one-out). A
+    /// software decode error after packet loss is survivable — keep feeding. A D3D11VA error demotes
+    /// to software for the rest of the session (the next IDR resynchronizes).
     pub fn decode(&mut self, au: &[u8]) -> Result<Option<DecodedFrame>> {
-        Ok(self.inner.decode(au)?.map(DecodedFrame::Cpu))
+        match &mut self.backend {
+            Backend::D3d11va(d) => match d.decode(au) {
+                Ok(f) => Ok(f.map(DecodedFrame::Gpu)),
+                Err(e) => {
+                    tracing::warn!(error = %e, "D3D11VA decode failed — falling back to software");
+                    self.backend = Backend::Software(SoftwareDecoder::new()?);
+                    Ok(None)
+                }
+            },
+            Backend::Software(s) => Ok(s.decode(au)?.map(DecodedFrame::Cpu)),
+        }
     }
 }
 
+// --- software backend ---------------------------------------------------------------
+
 struct SoftwareDecoder {
     decoder: ffmpeg::decoder::Video,
     /// Rebuilt whenever the decoded format/size **or output format** changes (mid-stream
@@ -90,10 +214,9 @@ impl SoftwareDecoder {
     }
 
     /// Convert the decoded YUV frame to a packed 4-byte format the presenter uploads directly:
-    /// SDR → `RGBA` (BT.709), HDR (SMPTE ST.2084 / PQ transfer) → `X2BGR10` (10-bit, == DXGI
-    /// R10G10B10A2) using the BT.2020 matrix. For HDR the PQ-encoded values pass through unchanged
-    /// (swscale only applies the YUV→RGB matrix + range, never the transfer) — exactly what an
-    /// HDR10/ST.2084 swapchain wants.
+    /// SDR → `RGBA` (BT.709), HDR (SMPTE ST.2084 / PQ transfer) → `X2BGR10` (== DXGI R10G10B10A2)
+    /// using the BT.2020 matrix. For HDR the PQ-encoded values pass through unchanged (swscale only
+    /// applies the YUV→RGB matrix + range, never the transfer) — exactly what an HDR10 swapchain wants.
     fn convert(&mut self, frame: &AvFrame) -> Result<CpuFrame> {
         use ffmpeg::color::TransferCharacteristic;
         let (fmt, w, h) = (frame.format(), frame.width(), frame.height());
@@ -134,3 +257,255 @@ impl SoftwareDecoder {
         })
     }
 }
+
+// --- D3D11VA backend ------------------------------------------------------------------
+//
+// Raw FFI: ffmpeg-next has no hwaccel wrappers. The COM-typed hwcontext structs are declared here
+// (stable FFmpeg public ABI) rather than relied on from ffmpeg-sys bindgen — the generic
+// AVHWDeviceContext / AVHWFramesContext (whose payload is an opaque `void *hwctx`) come from
+// ffmpeg-sys, and we cast `hwctx` to the structs below. All owned pointers are freed in Drop;
+// decoded surfaces transfer out through D3d11FrameGuard.
+
+const AVERROR_EAGAIN: i32 = -11; // -EAGAIN
+const D3D11_BIND_SHADER_RESOURCE: u32 = 0x8; // <d3d11.h>; FFmpeg ORs D3D11_BIND_DECODER itself
+
+/// `hwcontext_d3d11va.h` — `AVHWDeviceContext::hwctx`. Leaving `lock` null makes FFmpeg install an
+/// `ID3D11Multithread` default lock + set multithread protection on `device_context` during init,
+/// which is what lets the presenter share this device's immediate context from the UI thread.
+#[repr(C)]
+struct AVD3D11VADeviceContext {
+    device: *mut c_void, // ID3D11Device*
+    device_context: *mut c_void, // ID3D11DeviceContext*
+    video_device: *mut c_void, // ID3D11VideoDevice*
+    video_context: *mut c_void, // ID3D11VideoContext*
+    lock: *mut c_void, // void (*)(void*)
+    unlock: *mut c_void, // void (*)(void*)
+    lock_ctx: *mut c_void,
+}
+
+/// `hwcontext_d3d11va.h` — `AVHWFramesContext::hwctx`. `BindFlags` lets us add
+/// `D3D11_BIND_SHADER_RESOURCE` so the decoded array texture is sampleable (zero copy).
+#[repr(C)]
+struct AVD3D11VAFramesContext {
+    texture: *mut c_void, // ID3D11Texture2D* (null → FFmpeg allocates the pool)
+    bind_flags: u32, // UINT BindFlags
+    misc_flags: u32, // UINT MiscFlags
+}
+
+fn averr(what: &str, code: i32) -> anyhow::Error {
+    anyhow!("{what}: {}", ffmpeg::Error::from(code))
+}
+
+/// libavcodec's `get_format` callback: accept the D3D11 hw surface, building a frames context whose
+/// textures carry `BIND_SHADER_RESOURCE` (so the presenter can sample them). Returning anything but
+/// `AV_PIX_FMT_D3D11` aborts hardware decode → the session demotes to software.
+unsafe extern "C" fn get_format_d3d11(
+    avctx: *mut ffmpeg::ffi::AVCodecContext,
+    mut list: *const ffmpeg::ffi::AVPixelFormat,
+) -> ffmpeg::ffi::AVPixelFormat {
+    use ffmpeg::ffi::*;
+    unsafe {
+        let mut found = false;
+        while *list != AVPixelFormat::AV_PIX_FMT_NONE {
+            if *list == AVPixelFormat::AV_PIX_FMT_D3D11 {
+                found = true;
+                break;
+            }
+            list = list.add(1);
+        }
+        if !found {
+            return AVPixelFormat::AV_PIX_FMT_NONE;
+        }
+        let device_ref = (*avctx).hw_device_ctx;
+        if device_ref.is_null() {
+            return AVPixelFormat::AV_PIX_FMT_NONE;
+        }
+        let frames_ref = av_hwframe_ctx_alloc(device_ref);
+        if frames_ref.is_null() {
+            return AVPixelFormat::AV_PIX_FMT_NONE;
+        }
+        let frames = (*frames_ref).data as *mut AVHWFramesContext;
+        (*frames).format = AVPixelFormat::AV_PIX_FMT_D3D11;
+        let sw = if (*avctx).sw_pix_fmt != AVPixelFormat::AV_PIX_FMT_NONE {
+            (*avctx).sw_pix_fmt
+        } else {
+            AVPixelFormat::AV_PIX_FMT_NV12
+        };
+        (*frames).sw_format = sw;
+        (*frames).width = (*avctx).coded_width;
+        (*frames).height = (*avctx).coded_height;
+        // DPB + a few in-flight (decoded channel + the presenter's held frame); the host's
+        // zero-reorder stream needs only a small DPB, so 20 is comfortable headroom.
+        (*frames).initial_pool_size = 20;
+        let fhw = (*frames).hwctx as *mut AVD3D11VAFramesContext;
+        (*fhw).bind_flags = D3D11_BIND_SHADER_RESOURCE;
+        let r = av_hwframe_ctx_init(frames_ref);
+        if r < 0 {
+            let mut fr = frames_ref;
+            av_buffer_unref(&mut fr);
+            return AVPixelFormat::AV_PIX_FMT_NONE;
+        }
+        (*avctx).hw_frames_ctx = frames_ref; // decoder takes ownership
+        AVPixelFormat::AV_PIX_FMT_D3D11
+    }
+}
+
+struct D3d11vaDecoder {
+    ctx: *mut ffmpeg::ffi::AVCodecContext,
+    hw_device: *mut ffmpeg::ffi::AVBufferRef,
+    packet: *mut ffmpeg::ffi::AVPacket,
+    frame: *mut ffmpeg::ffi::AVFrame,
+}
+
+// Single-owner pointers, only touched from the session pump thread.
+unsafe impl Send for D3d11vaDecoder {}
+
+impl D3d11vaDecoder {
+    fn new() -> Result<D3d11vaDecoder> {
+        use ffmpeg::ffi;
+        let shared = crate::gpu::shared().ok_or_else(|| anyhow!("no shared D3D11 device"))?;
+        if !shared.hardware {
+            bail!("shared device is WARP (no hardware video decode)");
+        }
+        unsafe {
+            // Build a D3D11VA hwdevice context around the *shared* device, so decoded textures live
+            // on the same device the presenter samples + draws with.
+            let hw_device =
+                ffi::av_hwdevice_ctx_alloc(ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_D3D11VA);
+            if hw_device.is_null() {
+                bail!("av_hwdevice_ctx_alloc(D3D11VA) failed");
+            }
+            let devctx = (*hw_device).data as *mut ffi::AVHWDeviceContext;
+            let d3dctx = (*devctx).hwctx as *mut AVD3D11VADeviceContext;
+            // Hand FFmpeg an owned ref to the device + immediate context (it Releases them when the
+            // hwdevice ctx is freed). `into_raw()` transfers a +1 ref without releasing.
+            (*d3dctx).device = shared.device.clone().into_raw();
+            (*d3dctx).device_context = shared.context.clone().into_raw();
+            // lock left null → FFmpeg installs the ID3D11Multithread default lock in init.
+            let r = ffi::av_hwdevice_ctx_init(hw_device);
+            if r < 0 {
+                let mut hw = hw_device;
+                ffi::av_buffer_unref(&mut hw);
+                bail!("av_hwdevice_ctx_init: {}", ffmpeg::Error::from(r));
+            }
+
+            let codec = ffi::avcodec_find_decoder(ffi::AVCodecID::AV_CODEC_ID_HEVC);
+            if codec.is_null() {
+                let mut hw = hw_device;
+                ffi::av_buffer_unref(&mut hw);
+                bail!("no HEVC decoder");
+            }
+            let ctx = ffi::avcodec_alloc_context3(codec);
+            (*ctx).hw_device_ctx = ffi::av_buffer_ref(hw_device);
+            (*ctx).get_format = Some(get_format_d3d11);
+            (*ctx).flags |= ffi::AV_CODEC_FLAG_LOW_DELAY as i32;
+            (*ctx).thread_count = 1; // hwaccel: threads only add latency
+            let r = ffi::avcodec_open2(ctx, codec, ptr::null_mut());
+            if r < 0 {
+                let mut ctx = ctx;
+                ffi::avcodec_free_context(&mut ctx);
+                let mut hw = hw_device;
+                ffi::av_buffer_unref(&mut hw);
+                bail!("avcodec_open2 (D3D11VA): {}", ffmpeg::Error::from(r));
+            }
+            Ok(D3d11vaDecoder {
+                ctx,
+                hw_device,
+                packet: ffi::av_packet_alloc(),
+                frame: ffi::av_frame_alloc(),
+            })
+        }
+    }
+
+    fn decode(&mut self, au: &[u8]) -> Result<Option<GpuFrame>> {
+        use ffmpeg::ffi;
+        unsafe {
+            let r = ffi::av_new_packet(self.packet, au.len() as i32);
+            if r < 0 {
+                return Err(averr("av_new_packet", r));
+            }
+            ptr::copy_nonoverlapping(au.as_ptr(), (*self.packet).data, au.len());
+            let r = ffi::avcodec_send_packet(self.ctx, self.packet);
+            ffi::av_packet_unref(self.packet);
+            if r < 0 {
+                return Err(averr("send_packet", r));
+            }
+            let mut out = None;
+            loop {
+                let r = ffi::avcodec_receive_frame(self.ctx, self.frame);
+                if r == AVERROR_EAGAIN {
+                    break;
+                }
+                if r < 0 {
+                    return Err(averr("receive_frame", r));
+                }
+                out = Some(self.lift()?); // newest wins; older guards drop here
+                ffi::av_frame_unref(self.frame);
+            }
+            Ok(out)
+        }
+    }
+
+    /// Lift the decoded D3D11 surface into a `GpuFrame`. `data[0]` is the texture array, `data[1]`
+    /// the slice index. We `av_frame_clone` so the surface stays referenced (kept out of the reuse
+    /// pool) until the presenter drops the guard.
+    unsafe fn lift(&mut self) -> Result<GpuFrame> {
+        use ffmpeg::ffi;
+        unsafe {
+            if (*self.frame).format != ffi::AVPixelFormat::AV_PIX_FMT_D3D11 as i32 {
+                bail!("decoder returned a software frame (no D3D11 surface)");
+            }
+            let hdr =
+                (*self.frame).color_trc == ffi::AVColorTransferCharacteristic::AVCOL_TRC_SMPTE2084;
+            let ten_bit = {
+                let hwfc = (*self.frame).hw_frames_ctx;
+                !hwfc.is_null()
+                    && (*((*hwfc).data as *const ffi::AVHWFramesContext)).sw_format
+                        == ffi::AVPixelFormat::AV_PIX_FMT_P010LE
+            };
+            let cloned = ffi::av_frame_clone(self.frame);
+            if cloned.is_null() {
+                bail!("av_frame_clone failed");
+            }
+            let frame = GpuFrame {
+                width: (*self.frame).width as u32,
+                height: (*self.frame).height as u32,
+                index: (*self.frame).data[1] as usize as u32,
+                hdr,
+                ten_bit,
+                guard: D3d11FrameGuard(cloned),
+            };
+            log_layout_once(frame.width, frame.height, frame.index, hdr, ten_bit);
+            Ok(frame)
+        }
+    }
+}
+
+impl Drop for D3d11vaDecoder {
+    fn drop(&mut self) {
+        use ffmpeg::ffi;
+        unsafe {
+            ffi::av_packet_free(&mut self.packet);
+            ffi::av_frame_free(&mut self.frame);
+            ffi::avcodec_free_context(&mut self.ctx);
+            ffi::av_buffer_unref(&mut self.hw_device);
+        }
+    }
+}
+
+/// One-time dump of the first decoded surface's layout — so a new GPU/driver combination's real
+/// format (slice index range, HDR/bit-depth) is visible in the logs without a debugger.
+fn log_layout_once(width: u32, height: u32, index: u32, hdr: bool, ten_bit: bool) {
+    use std::sync::atomic::{AtomicBool, Ordering};
+    static ONCE: AtomicBool = AtomicBool::new(true);
+    if ONCE.swap(false, Ordering::Relaxed) {
+        tracing::info!(
+            width,
+            height,
+            slice = index,
+            hdr,
+            ten_bit,
+            "D3D11VA first frame (zero-copy)"
+        );
+    }
+}