punktfunk/crates/punktfunk-host/src/encode/windows/ffmpeg_win.rs
enricobuehler 75627c8afe
feat(audio): end-to-end 5.1/7.1 surround across the native path + all clients
Adds negotiated 5.1/7.1 surround to the punktfunk/1 protocol and every client
(previously stereo-only):

- core: new shared `audio` layout table (LAYOUT_51/71 + identity multistream
  mapping, canonical wire order FL FR FC LFE RL RR SL SR); Hello/Welcome
  `audio_channels` negotiation via the trailing-byte back-compat pattern (old
  peers fall back to stereo); C-ABI `punktfunk_connect_ex6`,
  `punktfunk_connection_audio_channels`, and in-core multistream decode
  `punktfunk_connection_next_audio_pcm` for embedders without a multistream
  Opus decoder. Real-libopus channel-identity round-trip test.
- host: native audio thread captures + Opus-(multi)stream-encodes at the
  negotiated count (with a cross-session cached-capturer channel-mismatch fix);
  GameStream surround unified onto the safe `opus::MSEncoder`, dropping
  `audiopus_sys` (~4 unsafe blocks) and un-gating Windows GameStream surround;
  WASAPI loopback capture relaxed to 2/6/8 with the correct dwChannelMask.
- clients: Linux (PipeWire), Windows (WASAPI), Android (AAudio) decode via
  `opus::MSDecoder` + render multichannel; Apple decodes in-core to PCM →
  AVAudioEngine with an explicit wire-order channel layout; each gains a
  Stereo/5.1/7.1 setting. `punktfunk-probe --audio-channels N` is the headless
  validator.

Verified on Linux: core/host/linux/probe test suites + the Android Rust
(cargo-ndk) build, clippy -D warnings, and rustfmt all green. Windows/Apple
builds, all on-glass checks, and the live native loopback are pending (CI / a
free box).

Also lands the concurrent in-tree HEVC 4:4:4 host work (PUNKTFUNK_444): it
shares the same touched files (quic.rs, punktfunk1.rs, encode/*, ...) and so
cannot be committed separately from the surround changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 21:11:05 +00:00

1293 lines
58 KiB
Rust
//! AMD **AMF** and Intel **QSV** hardware encode on Windows via `ffmpeg-next` — the Windows
//! analogue of the Linux [`super::vaapi`] backend (one libavcodec backend per vendor, selected by
//! encoder name: `*_amf` / `*_qsv`). This is the sibling of the direct-SDK [`super::nvenc`] path
//! behind the shared [`Encoder`] trait, selected in [`super::open_video`] (NVIDIA → NVENC,
//! AMD → AMF, Intel → QSV).
//!
//! The capturer hands a `FramePayload::D3d11` texture (NV12/P010 from the D3D11 video processor, or
//! BGRA/Rgb10a2 as a fallback) on the capturer's own `ID3D11Device`. Two input paths, chosen lazily
//! from the first frame and the `PUNKTFUNK_ZEROCOPY` knob:
//!
//! * **System-memory** ([`SystemInner`], the default): read the captured D3D11 surface back to a CPU
//!   NV12/P010 [`AVFrame`] (a same-format `CopyResource` → staging → `Map`, plus a `swscale` step for
//!   the BGRA fallback) and `avcodec_send_frame` it. AMF/QSV upload it internally. One
//!   GPU→CPU→GPU round-trip per frame: the robust path, and the only one that can be brought up
//!   without on-glass validation (it is the analogue of the VAAPI "CPU input" fallback).
//! * **Zero-copy D3D11** ([`ZeroCopyInner`], `PUNKTFUNK_ZEROCOPY=1`): wrap the capturer's
//!   `ID3D11Device` as an `AV_HWDEVICE_TYPE_D3D11VA` hwdevice (shared, *not* a second device: the
//!   capture textures are not shared-handle, so a different device couldn't read them), keep an
//!   FFmpeg D3D11 frames pool, `CopySubresourceRegion` the captured texture into a pooled array
//!   slice (a GPU-local copy, like NVENC's CUDA path), then feed AMF `AV_PIX_FMT_D3D11` directly,
//!   or map the D3D11 frame to a derived QSV surface for QSV. If the hw setup fails to open, this
//!   falls back to the system-memory path for the session.
//!
//! **Status: compiles in CI; not yet on-glass validated** (no AMD/Intel Windows box in the lab as of
//! 2026-06-22). The system path is the conservative default; zero-copy is opt-in until validated.
//!
//! Raw FFI: `ffmpeg-next` has no hwcontext wrappers for D3D11VA, so the hwdevice/hwframes calls go
//! through `ffmpeg::ffi` (= `ffmpeg_sys_next`), exactly as the Linux CUDA/VAAPI paths do. The
//! `AVD3D11VADeviceContext`/`AVD3D11VAFramesContext` layouts are mirrored (the bindings don't
//! allowlist `hwcontext_d3d11va.h`), as [`super::linux`] mirrors `AVCUDADeviceContext`.
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
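/* Sketch (not part of this module). The module docs describe an opt-in zero-copy path that falls
back to the system-memory path when the hardware setup fails to open. A minimal illustration of
that decision, assuming a hypothetical `InputPath` enum and a hypothetical `hw_setup` closure that
stand in for the real `ZeroCopyInner` bring-up:

```rust
/// Which input path a session ends up on (hypothetical mirror of the two documented paths).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputPath {
    System,
    ZeroCopy,
}

/// `zerocopy_requested` stands in for the `PUNKTFUNK_ZEROCOPY` knob; `hw_setup` is a
/// hypothetical closure that tries to build the D3D11VA device/frames contexts.
fn choose_input_path(
    zerocopy_requested: bool,
    hw_setup: impl FnOnce() -> Result<(), String>,
) -> InputPath {
    if !zerocopy_requested {
        // Default: the robust readback path, no hardware context is attempted.
        return InputPath::System;
    }
    match hw_setup() {
        Ok(()) => InputPath::ZeroCopy,
        // A failed hardware bring-up degrades to the system path for the session.
        Err(_) => InputPath::System,
    }
}
```

The closure is only invoked when zero-copy was requested, so the default path never touches the
hardware setup at all.
*/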
use super::{ChromaFormat, Codec, EncodedFrame, Encoder};
use crate::capture::{dxgi::D3d11Frame, CapturedFrame, FramePayload, PixelFormat};
use anyhow::{anyhow, bail, Context, Result};
use ffmpeg::format::Pixel;
use ffmpeg::{codec, encoder, Dictionary, Packet, Rational};
use ffmpeg_next as ffmpeg;
use std::os::raw::{c_int, c_uint, c_void};
use std::ptr;
use windows::core::Interface;
use windows::Win32::Graphics::Direct3D11::{
    ID3D11Device, ID3D11DeviceContext, ID3D11Resource, ID3D11Texture2D, D3D11_BIND_DECODER,
    D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_BIND_VIDEO_ENCODER,
    D3D11_CPU_ACCESS_READ, D3D11_MAPPED_SUBRESOURCE, D3D11_MAP_READ, D3D11_TEXTURE2D_DESC,
    D3D11_USAGE_STAGING,
};
use windows::Win32::Graphics::Dxgi::Common::{
    DXGI_FORMAT, DXGI_FORMAT_B8G8R8A8_UNORM, DXGI_FORMAT_NV12, DXGI_FORMAT_P010,
    DXGI_FORMAT_R10G10B10A2_UNORM, DXGI_SAMPLE_DESC,
};
use ffmpeg::ffi; // = ffmpeg_sys_next
// libswscale scaler-flag + colour-space constants (not exported as Rust consts by the bindings —
// the stable `<libswscale/swscale.h>` #defines, same as the VAAPI path uses).
const SWS_POINT: c_int = 0x10;
const SWS_CS_ITU709: c_int = 1;
const SWS_CS_BT2020: c_int = 9;
/// `AVD3D11VADeviceContext` (libavutil/hwcontext_d3d11va.h) — mirrored (the ffmpeg-sys bindings
/// don't allowlist that header). We set `device` to the capturer's `ID3D11Device` so AMF/QSV share
/// it; `av_hwdevice_ctx_init` fills `device_context`/`video_device`/`video_context`/the default
/// lock from a non-null `device`.
#[repr(C)]
struct AVD3D11VADeviceContext {
    device: *mut c_void,         // ID3D11Device*
    device_context: *mut c_void, // ID3D11DeviceContext*
    video_device: *mut c_void,   // ID3D11VideoDevice*
    video_context: *mut c_void,  // ID3D11VideoContext*
    lock: *mut c_void,           // void (*)(void*)
    unlock: *mut c_void,         // void (*)(void*)
    lock_ctx: *mut c_void,
}
/// `AVD3D11VAFramesContext` (libavutil/hwcontext_d3d11va.h) — mirrored. `BindFlags`/`MiscFlags`
/// customise the texture-array FFmpeg allocates for the pool; `texture` (we leave null) would let us
/// supply our own array.
#[repr(C)]
struct AVD3D11VAFramesContext {
    texture: *mut c_void,       // ID3D11Texture2D*
    bind_flags: c_uint,         // UINT BindFlags
    misc_flags: c_uint,         // UINT MiscFlags
    texture_infos: *mut c_void, // AVD3D11FrameDescriptor* (FFmpeg-owned; we never touch it)
}
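/* Sketch (not part of this module). Mirroring a C struct by hand only works if the `#[repr(C)]`
field order and sizes match the header. One way to sanity-check such a mirror is to compare its
offsets against what the C layout rules predict. `FramesContextMirror` below is a hypothetical
stand-alone copy of the frames-context mirror, and the expected offsets are derived from pointer
and `c_uint` sizes rather than read from FFmpeg itself:

```rust
use std::mem::{offset_of, size_of};
use std::os::raw::{c_uint, c_void};

/// Hypothetical stand-alone copy of the mirrored frames context (pointer, UINT, UINT, pointer).
#[allow(dead_code)]
#[repr(C)]
struct FramesContextMirror {
    texture: *mut c_void,
    bind_flags: c_uint,
    misc_flags: c_uint,
    texture_infos: *mut c_void,
}

/// Returns (offset of bind_flags, offset of misc_flags, offset of texture_infos, total size).
fn frames_context_layout() -> (usize, usize, usize, usize) {
    (
        offset_of!(FramesContextMirror, bind_flags),
        offset_of!(FramesContextMirror, misc_flags),
        offset_of!(FramesContextMirror, texture_infos),
        size_of::<FramesContextMirror>(),
    )
}
```

This only proves the Rust side is laid out as intended; it does not detect a header change in a
newer FFmpeg, which still has to be checked against `hwcontext_d3d11va.h`.
*/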
/// AMD AMF vs Intel QSV — the two libavcodec vendor backends this module covers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WinVendor {
    Amf,
    Qsv,
}
impl WinVendor {
    fn encoder_name(self, codec: Codec) -> &'static str {
        match self {
            WinVendor::Amf => codec.amf_name(),
            WinVendor::Qsv => codec.qsv_name(),
        }
    }
    fn label(self) -> &'static str {
        match self {
            WinVendor::Amf => "AMF",
            WinVendor::Qsv => "QSV",
        }
    }
}
/// Is the zero-copy D3D11 path enabled? Opt-in (`PUNKTFUNK_ZEROCOPY=1`) until on-glass validated;
/// the default is the robust system-memory readback path.
fn zerocopy_enabled() -> bool {
    crate::config::config().zerocopy
}
/// The swscale *source* pixel format for a captured packed-RGB/BGR layout (8-bit BGRA fallback only).
fn sws_src(format: PixelFormat) -> Result<Pixel> {
    Ok(match format {
        PixelFormat::Bgrx => Pixel::BGRZ,
        PixelFormat::Rgbx => Pixel::RGBZ,
        PixelFormat::Bgra => Pixel::BGRA,
        PixelFormat::Rgba => Pixel::RGBA,
        PixelFormat::Rgb => Pixel::RGB24,
        PixelFormat::Bgr => Pixel::BGR24,
        PixelFormat::Nv12 | PixelFormat::P010 | PixelFormat::Rgb10a2 => {
            bail!("ffmpeg_win swscale path supports packed RGB/BGR only; got {format:?}")
        }
    })
}
/// Does this captured format imply a 10-bit encode (P010 / Rgb10a2)?
fn is_10bit_format(format: PixelFormat, bit_depth: u8) -> bool {
    bit_depth >= 10 || matches!(format, PixelFormat::P010 | PixelFormat::Rgb10a2)
}
/// `ffmpeg::format::Pixel` → raw `AVPixelFormat`.
fn pixel_to_av(p: Pixel) -> ffi::AVPixelFormat {
    ffi::AVPixelFormat::from(p)
}
/// Build the FFmpeg encoder context shared by both inner paths: name, mode, low-latency RC,
/// infinite GOP, the BT.709-limited (SDR) or BT.2020-PQ (HDR) VUI, the given `pix_fmt`, and the
/// optional hw device/frames contexts (null for the system path). Returns the opened encoder.
#[allow(clippy::too_many_arguments)]
unsafe fn open_win_encoder(
    vendor: WinVendor,
    codec: Codec,
    width: u32,
    height: u32,
    fps: u32,
    bitrate_bps: u64,
    pix_fmt: ffi::AVPixelFormat,
    sw_pix_fmt: ffi::AVPixelFormat,
    ten_bit: bool,
    device_ref: *mut ffi::AVBufferRef,
    frames_ref: *mut ffi::AVBufferRef,
) -> Result<encoder::video::Encoder> {
    let name = vendor.encoder_name(codec);
    let av_codec = encoder::find_by_name(name).ok_or_else(|| {
        anyhow!(
            "{name} not built into libavcodec (no {} encoder)",
            vendor.label()
        )
    })?;
    let mut video = codec::context::Context::new_with_codec(av_codec)
        .encoder()
        .video()
        .context("alloc video encoder")?;
    video.set_width(width);
    video.set_height(height);
    // Software view of the input layout (NV12 / P010). For the hw paths `pix_fmt` is overridden to
    // D3D11/QSV below; libavcodec still uses this as `sw_pix_fmt`.
    video.set_format(Pixel::from(sw_pix_fmt));
    video.set_time_base(Rational(1, fps as i32));
    video.set_frame_rate(Some(Rational(fps as i32, 1)));
    video.set_bit_rate(bitrate_bps as usize);
    video.set_max_bit_rate(bitrate_bps as usize); // target == max → CBR
    let vbv_frames = std::env::var("PUNKTFUNK_VBV_FRAMES")
        .ok()
        .and_then(|s| s.parse::<f32>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(1.0);
    let vbv_bits =
        ((bitrate_bps as f64 / fps.max(1) as f64) * vbv_frames as f64).clamp(1.0, i32::MAX as f64);
    video.set_max_b_frames(0);
    let raw = video.as_mut_ptr();
    (*raw).rc_buffer_size = vbv_bits as i32;
    (*raw).gop_size = i32::MAX; // no periodic IDR (forced-IDR via pict_type=I on RFI)
    if ten_bit {
        // 10-bit HDR: BT.2020 primaries + SMPTE-2084 (PQ) transfer. The client auto-detects PQ from
        // the HEVC VUI; the static mastering metadata also rides the 0xCE datagram out-of-band.
        (*raw).colorspace = ffi::AVColorSpace::AVCOL_SPC_BT2020_NCL;
        (*raw).color_range = ffi::AVColorRange::AVCOL_RANGE_MPEG;
        (*raw).color_primaries = ffi::AVColorPrimaries::AVCOL_PRI_BT2020;
        (*raw).color_trc = ffi::AVColorTransferCharacteristic::AVCOL_TRC_SMPTE2084;
    } else {
        // We hand the encoder BT.709 *limited* NV12 (video-processor or swscale CSC), so signal that
        // VUI; otherwise the client decoder washes the picture out.
        (*raw).colorspace = ffi::AVColorSpace::AVCOL_SPC_BT709;
        (*raw).color_range = ffi::AVColorRange::AVCOL_RANGE_MPEG;
        (*raw).color_primaries = ffi::AVColorPrimaries::AVCOL_PRI_BT709;
        (*raw).color_trc = ffi::AVColorTransferCharacteristic::AVCOL_TRC_BT709;
    }
    (*raw).pix_fmt = pix_fmt;
    if !device_ref.is_null() {
        (*raw).hw_device_ctx = ffi::av_buffer_ref(device_ref);
    }
    if !frames_ref.is_null() {
        (*raw).hw_frames_ctx = ffi::av_buffer_ref(frames_ref);
    }
    // Low-latency tuning. Unknown private options are ignored by avcodec_open2 (left in the dict),
    // so vendor-specific keys are safe to set unconditionally.
    let mut opts = Dictionary::new();
    match vendor {
        WinVendor::Amf => {
            opts.set("usage", "ultralowlatency");
            opts.set("rc", "cbr");
            opts.set("quality", "balanced");
            opts.set("preanalysis", "false");
            opts.set("enforce_hrd", "true");
            // VPS/SPS/PPS on each IDR (clean mid-stream join). HEVC/AV1 only; ignored elsewhere.
            opts.set("header_insertion_mode", "idr");
        }
        WinVendor::Qsv => {
            opts.set("preset", "veryfast");
            opts.set("async_depth", "1"); // bound in-flight frames: the big QSV latency lever
            opts.set("low_power", "1"); // VDEnc fixed-function path (lower latency)
            opts.set("look_ahead", "0"); // (h264_qsv only; ignored on hevc/av1)
            opts.set("forced_idr", "1"); // a forced key frame becomes a real IDR
            opts.set("scenario", "displayremoting");
        }
    }
    video
        .open_with(opts)
        .with_context(|| format!("open {name} ({width}x{height}@{fps}, {bitrate_bps} bps)"))
}
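/* Sketch (not part of this module). The rate-control setup above sizes the VBV buffer as "bits per
frame times a frame count", clamped into `i32`. A stand-alone version of that arithmetic, assuming
a hypothetical helper name `vbv_buffer_bits` and taking the frame count as a parameter instead of
reading `PUNKTFUNK_VBV_FRAMES`:

```rust
/// Bits the VBV buffer may hold: (bitrate / fps) * vbv_frames, clamped to 1..=i32::MAX.
/// Non-finite or non-positive frame counts fall back to one frame.
fn vbv_buffer_bits(bitrate_bps: u64, fps: u32, vbv_frames: f32) -> i32 {
    let frames = if vbv_frames.is_finite() && vbv_frames > 0.0 {
        vbv_frames
    } else {
        1.0
    };
    // `fps.max(1)` avoids a division by zero for a degenerate frame rate.
    let bits_per_frame = bitrate_bps as f64 / fps.max(1) as f64;
    (bits_per_frame * frames as f64).clamp(1.0, i32::MAX as f64) as i32
}
```

At 20 Mbit/s and 60 fps a one-frame buffer is roughly 333 kbit, which is why a single oversized
frame cannot be absorbed and the encoder has to hold every frame near the per-frame budget.
*/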
/// Whether the active AMD (AMF) / Intel (QSV) GPU can encode HEVC **4:4:4**. **Deferred in v1:
/// always `false`.** AMF/QSV HEVC 4:4:4 encode is narrow (AMD RDNA3+, Intel Arc/Xe2+) and the
/// libavcodec profile/pixel-format incantation is vendor- and driver-specific. With a wrong
/// profile, `avcodec_open2` *silently* falls back to 4:2:0, so a positive probe would need a
/// verify-by-frame, and there is no AMD/Intel Windows box in the lab to build + validate that
/// against. Returning `false` keeps the negotiation honest: an AMF/QSV host resolves every session
/// to 4:2:0 before the Welcome. (Follow-up: implement + validate on an RDNA3+/Arc Windows box.)
pub fn probe_can_encode_444(_vendor: WinVendor, _codec: Codec) -> bool {
    tracing::info!("AMF/QSV HEVC 4:4:4 encode is not implemented yet; declining (encoding 4:2:0)");
    false
}
/// Probe whether THIS GPU can `vendor`-encode `codec`, by opening a tiny system-input encoder. The
/// driver/runtime rejects codecs the video engine can't do (AV1 on pre-RDNA3 AMD / pre-Arc Intel,
/// or HEVC on a very old part). Used to build the GameStream codec advertisement so a client never
/// negotiates a codec the encoder can't open. Torn down immediately.
pub fn probe_can_encode(vendor: WinVendor, codec: Codec) -> bool {
    if ffmpeg::init().is_err() {
        return false;
    }
    // SAFETY: `ffmpeg::init()` succeeded above, so libav's global state is initialised.
    // `av_log_get_level`/`av_log_set_level` are global scalar getters/setters with no pointer args.
    // `open_win_encoder` (the `unsafe fn`) is called with null `device_ref`/`frames_ref` (the system
    // path), so it touches no D3D11/hwcontext: it only allocates and opens a self-contained
    // libavcodec encoder that is dropped at the end of `.is_ok()`. We restore the prior log level and
    // no raw pointer escapes the block.
    unsafe {
        // A missing AMF/QSV runtime (wrong-vendor host, GPU-less CI) is an expected probe outcome,
        // so quiet ffmpeg's open error for the probe, then restore the level.
        let prev = ffi::av_log_get_level();
        ffi::av_log_set_level(ffi::AV_LOG_FATAL);
        let ok = open_win_encoder(
            vendor,
            codec,
            640,
            480,
            30,
            2_000_000,
            ffi::AVPixelFormat::AV_PIX_FMT_NV12,
            ffi::AVPixelFormat::AV_PIX_FMT_NV12,
            false,
            ptr::null_mut(),
            ptr::null_mut(),
        )
        .is_ok();
        ffi::av_log_set_level(prev);
        ok
    }
}
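/* Sketch (not part of this module). The probe exists so that the codec advertisement only lists
codecs the encoder can actually open. A minimal illustration of that filtering, assuming a
hypothetical `SketchCodec` enum and a caller-supplied `probe` closure in place of the real
`Codec` type and `probe_can_encode`:

```rust
/// Hypothetical stand-in for the crate's `Codec` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchCodec {
    H264,
    Hevc,
    Av1,
}

/// Keep only the codecs for which the probe reports success, in a fixed preference order.
fn advertised_codecs(probe: impl Fn(SketchCodec) -> bool) -> Vec<SketchCodec> {
    [SketchCodec::H264, SketchCodec::Hevc, SketchCodec::Av1]
        .into_iter()
        .filter(|codec| probe(*codec))
        .collect()
}
```

A host whose probe rejects AV1 would advertise only H.264 and HEVC, so a client can never pick a
codec that fails at session start.
*/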
/// Drain the encoder for one packet (shared poll logic, identical to the VAAPI/NVENC paths).
fn poll_encoder(enc: &mut encoder::video::Encoder, fps: u32) -> Result<Option<EncodedFrame>> {
    let mut pkt = Packet::empty();
    match enc.receive_packet(&mut pkt) {
        Ok(()) => {
            let data = pkt.data().map(|d| d.to_vec()).unwrap_or_default();
            let pts = pkt.pts().unwrap_or(0).max(0) as u64;
            Ok(Some(EncodedFrame {
                data,
                // `fps.max(1)` guards the division, matching the VBV computation.
                pts_ns: pts * 1_000_000_000 / fps.max(1) as u64,
                keyframe: pkt.is_key(),
            }))
        }
        Err(ffmpeg::Error::Other { errno })
            if errno == ffmpeg::util::error::EAGAIN
                || errno == ffmpeg::util::error::EWOULDBLOCK =>
        {
            Ok(None)
        }
        Err(ffmpeg::Error::Eof) => Ok(None),
        Err(e) => Err(e).context("receive_packet"),
    }
}
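/* Sketch (not part of this module). With a time base of `1/fps`, a packet's pts is a frame index,
and the poll logic turns it into nanoseconds. A stand-alone version of that conversion, assuming a
hypothetical helper name `pts_to_ns`; the saturating multiply is an addition of this sketch, not
something the code above does:

```rust
/// Convert a frame-index pts (time base 1/fps) to nanoseconds.
/// Negative pts values are treated as 0 and a zero fps is treated as 1.
fn pts_to_ns(pts: i64, fps: u32) -> u64 {
    let frames = pts.max(0) as u64;
    frames.saturating_mul(1_000_000_000) / u64::from(fps.max(1))
}
```

Multiplying before dividing keeps the result exact to the nanosecond (integer division only
truncates once), at the cost of needing headroom in the intermediate product.
*/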
/// The immediate context of an `ID3D11Device` (for `CopyResource`/`CopySubresourceRegion`).
unsafe fn immediate_context(device: &ID3D11Device) -> ID3D11DeviceContext {
    // windows-rs 0.62: the inherent method takes no args and returns the context (the OutRef form is
    // only on the `_Impl` trait, for implementing the interface). Every D3D11 device has one.
    device
        .GetImmediateContext()
        .expect("ID3D11Device always has an immediate context")
}
// ---------------------------------------------------------------------------------------------
// System-memory path (default): read the captured D3D11 surface back to a CPU NV12/P010 frame.
// ---------------------------------------------------------------------------------------------
struct SystemInner {
    enc: encoder::video::Encoder,
    /// Reusable software NV12/P010 frame: swscale dst / readback dst, and the `send_frame` src.
    sw_frame: *mut ffi::AVFrame,
    /// swscale ctx for the BGRA→NV12 fallback (built lazily; null for the YUV-readback path).
    sws: *mut ffi::SwsContext,
    /// CPU-readable staging texture for the D3D11 readback (built lazily on the captured device).
    staging: Option<ID3D11Texture2D>,
    ctx: Option<ID3D11DeviceContext>,
    format: PixelFormat,
    ten_bit: bool,
    width: u32,
    height: u32,
}
impl SystemInner {
    #[allow(clippy::too_many_arguments)]
    fn open(
        vendor: WinVendor,
        codec: Codec,
        format: PixelFormat,
        width: u32,
        height: u32,
        fps: u32,
        bitrate_bps: u64,
        bit_depth: u8,
    ) -> Result<Self> {
        let ten_bit = is_10bit_format(format, bit_depth);
        let sw_av = if ten_bit {
            ffi::AVPixelFormat::AV_PIX_FMT_P010LE
        } else {
            ffi::AVPixelFormat::AV_PIX_FMT_NV12
        };
        // SAFETY: calls the `unsafe fn open_win_encoder` with null `device_ref`/`frames_ref`, so
        // the system path is taken (no hw device/frames context is touched); all other args are
        // scalars. The returned `encoder::video::Encoder` owns its `AVCodecContext` and frees it on
        // drop; no raw pointer is aliased.
        let enc = unsafe {
            open_win_encoder(
                vendor,
                codec,
                width,
                height,
                fps,
                bitrate_bps,
                sw_av, // system input: pix_fmt == sw_format (no hw frames ctx)
                sw_av,
                ten_bit,
                ptr::null_mut(),
                ptr::null_mut(),
            )?
        };
        // SAFETY: `av_frame_alloc` returns a freshly-allocated, uniquely-owned `AVFrame`
        // (null-checked before any deref); writing `format`/`width`/`height` through `*f` stays
        // inside that allocation. `av_frame_get_buffer(f, 0)` allocates the backing planes: on
        // failure we `av_frame_free` the sole owner (no double-free) and bail; on success the raw
        // `f` is moved into `self.sw_frame` and freed exactly once in `Drop`.
        let sw_frame = unsafe {
            let f = ffi::av_frame_alloc();
            if f.is_null() {
                bail!("av_frame_alloc(sw) failed");
            }
            (*f).format = sw_av as c_int;
            (*f).width = width as c_int;
            (*f).height = height as c_int;
            if ffi::av_frame_get_buffer(f, 0) < 0 {
                let mut f = f;
                ffi::av_frame_free(&mut f);
                bail!("av_frame_get_buffer(sw) failed");
            }
            f
        };
        tracing::info!(
            encoder = vendor.encoder_name(codec),
            "{} encode active ({width}x{height}@{fps}, system-memory {} path)",
            vendor.label(),
            if ten_bit { "P010" } else { "NV12" }
        );
        Ok(SystemInner {
            enc,
            sw_frame,
            sws: ptr::null_mut(),
            staging: None,
            ctx: None,
            format,
            ten_bit,
            width,
            height,
        })
    }
    /// Lazily (re)build the staging texture matching `dxgi_fmt` on the captured device.
    unsafe fn ensure_staging(
        &mut self,
        device: &ID3D11Device,
        dxgi_fmt: DXGI_FORMAT,
    ) -> Result<()> {
        // Reuse the cached texture only while its format still matches: the capturer can switch
        // NV12→Bgra (or P010→Rgb10a2) mid-session, and `CopyResource` needs identical formats.
        if let Some(existing) = &self.staging {
            let mut cur = D3D11_TEXTURE2D_DESC::default();
            existing.GetDesc(&mut cur);
            if cur.Format == dxgi_fmt {
                return Ok(());
            }
        }
        let desc = D3D11_TEXTURE2D_DESC {
            Width: self.width,
            Height: self.height,
            MipLevels: 1,
            ArraySize: 1,
            Format: dxgi_fmt,
            SampleDesc: DXGI_SAMPLE_DESC {
                Count: 1,
                Quality: 0,
            },
            Usage: D3D11_USAGE_STAGING,
            BindFlags: 0,
            CPUAccessFlags: D3D11_CPU_ACCESS_READ.0 as u32,
            MiscFlags: 0,
        };
        let mut t: Option<ID3D11Texture2D> = None;
        device
            .CreateTexture2D(&desc, None, Some(&mut t))
            .context("CreateTexture2D(staging readback)")?;
        self.staging = t;
        self.ctx = Some(immediate_context(device));
        Ok(())
    }
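/* Sketch (not part of this module). A lazily built resource that must match a key (here: the
staging texture and its pixel format) can be modelled as a one-slot cache that rebuilds whenever
the requested key differs from the cached one. `KeyedCache` and its `builds` counter are
hypothetical names for illustration; the numeric keys in the usage are arbitrary:

```rust
/// One-slot cache: holds a value built for `key` and rebuilds it when the key changes.
struct KeyedCache<K, V> {
    slot: Option<(K, V)>,
    /// How many times `build` ran (useful to confirm that reuse actually happens).
    builds: usize,
}

impl<K: PartialEq + Copy, V> KeyedCache<K, V> {
    fn new() -> Self {
        Self { slot: None, builds: 0 }
    }

    /// Return the cached value if it was built for `key`, otherwise build and cache a new one.
    fn get_or_build(&mut self, key: K, build: impl FnOnce(K) -> V) -> &V {
        let stale = !matches!(&self.slot, Some((cached, _)) if *cached == key);
        if stale {
            self.slot = Some((key, build(key)));
            self.builds += 1;
        }
        &self.slot.as_ref().expect("slot was just filled").1
    }
}
```

Requesting the same key twice builds once; requesting a different key drops the old value and
builds again, which is the behaviour a format switch mid-session needs.
*/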
    /// Send the reusable `sw_frame` to the encoder with the given pts / IDR flag.
    unsafe fn send(&mut self, pts: i64, idr: bool) -> Result<()> {
        (*self.sw_frame).pts = pts;
        (*self.sw_frame).pict_type = if idr {
            ffi::AVPictureType::AV_PICTURE_TYPE_I
        } else {
            ffi::AVPictureType::AV_PICTURE_TYPE_NONE
        };
        let r = ffi::avcodec_send_frame(self.enc.as_mut_ptr(), self.sw_frame);
        if r < 0 {
            bail!("avcodec_send_frame(ffmpeg_win system) failed ({r})");
        }
        Ok(())
    }
    /// D3D11 path: read the captured surface back into `sw_frame`, then send. Dispatches on the
    /// CURRENT frame's `format`: the capturer's video processor latches off on failure and switches
    /// NV12→Bgra (SDR) or P010→Rgb10a2 (HDR) mid-session, so a fixed open-time format is wrong.
    fn submit_d3d11(
        &mut self,
        frame: &D3d11Frame,
        format: PixelFormat,
        pts: i64,
        idr: bool,
    ) -> Result<()> {
        let fmt_10 = matches!(format, PixelFormat::P010 | PixelFormat::Rgb10a2);
        anyhow::ensure!(
            fmt_10 == self.ten_bit,
            "captured format {format:?} bit-depth changed under the encoder (built {}-bit)",
            if self.ten_bit { 10 } else { 8 }
        );
        match format {
            PixelFormat::Nv12 | PixelFormat::P010 => self.readback_yuv(frame, pts, idr),
            PixelFormat::Bgra | PixelFormat::Bgrx => self.readback_bgra(frame, pts, idr),
            PixelFormat::Rgb10a2 => self.readback_rgb10(frame, pts, idr),
            other => {
                bail!("ffmpeg_win system path cannot read back captured D3D11 format {other:?}")
            }
        }
    }
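/* Sketch (not part of this module). The dispatch above does two things per frame: it rejects a
bit-depth change (the encoder was opened for 8-bit or 10-bit and cannot switch), then picks a
readback routine from the frame's current format. A stand-alone illustration, assuming
hypothetical `SketchFormat` and `Readback` enums in place of the crate's `PixelFormat` and the
three readback methods:

```rust
/// Hypothetical stand-in for the captured pixel formats.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchFormat {
    Nv12,
    P010,
    Bgra,
    Bgrx,
    Rgb10a2,
    Rgb,
}

/// Which readback routine handles a frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Readback {
    Yuv,
    Bgra,
    Rgb10,
}

/// Pick the readback for this frame, or explain why the frame cannot be accepted.
fn pick_readback(format: SketchFormat, encoder_ten_bit: bool) -> Result<Readback, String> {
    let frame_ten_bit = matches!(format, SketchFormat::P010 | SketchFormat::Rgb10a2);
    if frame_ten_bit != encoder_ten_bit {
        return Err(format!("{format:?} changed bit depth under the encoder"));
    }
    match format {
        SketchFormat::Nv12 | SketchFormat::P010 => Ok(Readback::Yuv),
        SketchFormat::Bgra | SketchFormat::Bgrx => Ok(Readback::Bgra),
        SketchFormat::Rgb10a2 => Ok(Readback::Rgb10),
        other => Err(format!("cannot read back {other:?}")),
    }
}
```

A switch within the same bit depth (NV12 to BGRA) is accepted and only changes the routine; a
switch across bit depths is an error because it would need a new encoder.
*/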
    /// Read back a captured NV12/P010 surface plane-by-plane into the software frame.
    fn readback_yuv(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
        let dxgi_fmt = if self.ten_bit {
            DXGI_FORMAT_P010
        } else {
            DXGI_FORMAT_NV12
        };
        // SAFETY: `ensure_staging` builds a STAGING texture (CPU_ACCESS_READ) matching `dxgi_fmt`
        // on `frame.device` (the same `ID3D11Device` that owns `frame.texture`) and caches that
        // device's immediate context in `self.ctx`. `src`/`dst` are that device's textures of
        // identical NV12/P010 format and dimensions, so `CopyResource` on the single-threaded
        // immediate context is valid. `Map(.., D3D11_MAP_READ)` succeeds on a staging texture and
        // yields `map.pData` valid for the whole resource; for NV12/P010 the luma plane is `H` rows
        // at `RowPitch` and the chroma plane follows at byte offset `RowPitch*H` (`H/2` rows), so
        // `total = pitch*(H+⌈H/2⌉)` is exactly the mapped extent and `from_raw_parts(base, total)`
        // stays in-bounds. Each `copy_nonoverlapping` reads a bounds-checked `mapped[..]` sub-slice
        // (`row_bytes ≤ pitch`) and writes `row_bytes ≤ linesize` into the
        // `av_frame_get_buffer`-allocated plane at row `y < H`, so every destination offset is
        // inside the frame's plane allocation; src and dst never alias. `Unmap` pairs `Map`, then
        // `send` (the `unsafe fn`) hands `sw_frame` to the encoder.
        unsafe {
            self.ensure_staging(&frame.device, dxgi_fmt)?;
            let staging = self.staging.clone().context("staging texture")?;
            let ctx = self.ctx.clone().context("d3d11 context")?;
            let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?;
            let dst: ID3D11Resource = staging.cast().context("staging -> resource")?;
            ctx.CopyResource(&dst, &src);
            let mut map = D3D11_MAPPED_SUBRESOURCE::default();
            ctx.Map(&staging, 0, D3D11_MAP_READ, 0, Some(&mut map))
                .context("Map staging (yuv readback)")?;
            let pitch = map.RowPitch as usize;
            let h = self.height as usize;
            // NV12/P010 in a mapped staging surface: the Y plane occupies rows [0,H) at `pitch`;
            // the interleaved chroma plane (H/2 rows) starts at byte offset `pitch * H`. P010
            // samples are 16-bit, so a "row" of width pixels is `width*2` bytes (and chroma
            // `width*2` too).
            let bytes_per_sample = if self.ten_bit { 2 } else { 1 };
            let row_bytes = self.width as usize * bytes_per_sample;
            let base = map.pData as *const u8;
            let total = pitch.saturating_mul(h + h.div_ceil(2));
            let mapped = std::slice::from_raw_parts(base, total);
            let chroma_off = pitch * h;
            let y_dst = (*self.sw_frame).data[0];
            let y_stride = (*self.sw_frame).linesize[0] as usize;
            let uv_dst = (*self.sw_frame).data[1];
            let uv_stride = (*self.sw_frame).linesize[1] as usize;
            for y in 0..h {
                let s = &mapped[y * pitch..y * pitch + row_bytes];
                ptr::copy_nonoverlapping(s.as_ptr(), y_dst.add(y * y_stride), row_bytes);
            }
            for y in 0..h.div_ceil(2) {
                let s = &mapped[chroma_off + y * pitch..chroma_off + y * pitch + row_bytes];
                ptr::copy_nonoverlapping(s.as_ptr(), uv_dst.add(y * uv_stride), row_bytes);
            }
            ctx.Unmap(&staging, 0);
            self.send(pts, idr)
        }
    }
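/* Sketch (not part of this module). The readback above copies a pitched bi-planar surface row by
row: luma rows start at `y * pitch`, and the interleaved chroma plane starts at `pitch * height`.
A safe, stand-alone version of the same indexing over a byte slice, assuming a hypothetical
helper name `copy_biplanar` that returns tightly packed planes instead of writing into an
`AVFrame`:

```rust
/// Copy a pitched NV12/P010-style buffer into tightly packed (luma, chroma) planes.
/// Returns `None` if a row does not fit the pitch or the buffer is too small.
fn copy_biplanar(
    mapped: &[u8],
    pitch: usize,
    width: usize,
    height: usize,
    bytes_per_sample: usize,
) -> Option<(Vec<u8>, Vec<u8>)> {
    let row_bytes = width * bytes_per_sample;
    let chroma_rows = height.div_ceil(2);
    if row_bytes > pitch || mapped.len() < pitch * (height + chroma_rows) {
        return None;
    }
    let chroma_off = pitch * height;
    let mut luma = Vec::with_capacity(row_bytes * height);
    for y in 0..height {
        // Only the first `row_bytes` of each pitched row carry pixels; the rest is padding.
        luma.extend_from_slice(&mapped[y * pitch..y * pitch + row_bytes]);
    }
    let mut chroma = Vec::with_capacity(row_bytes * chroma_rows);
    for y in 0..chroma_rows {
        let start = chroma_off + y * pitch;
        chroma.extend_from_slice(&mapped[start..start + row_bytes]);
    }
    Some((luma, chroma))
}
```

For a 4x2 8-bit image with a pitch of 8, the luma rows come from offsets 0 and 8 and the single
chroma row from offset 16; the padding bytes in between are skipped.
*/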
    /// Read back a captured BGRA surface, then swscale BGRA→NV12 into the software frame (8-bit).
    fn readback_bgra(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
        if self.ten_bit {
            bail!("ffmpeg_win: BGRA readback is 8-bit only (HDR needs the P010 capture path)");
        }
        // SAFETY: `ensure_sws` lazily builds the BGRA→NV12 context. `ensure_staging` builds a
        // B8G8R8A8 STAGING texture on `frame.device` and caches that device's immediate context;
        // `src`/`dst` are that device's textures of matching BGRA format, so `CopyResource` on the
        // single-threaded context is valid. `Map(READ)` on the staging texture yields `base` valid
        // for `pitch` × `h` rows. `sws_scale` reads `h` rows of `pitch` bytes from `base`
        // (in-bounds: the staging surface is `≥ pitch*h`) into the `sw_frame` planes addressed by
        // its `data`/`linesize` (allocated for `width`×`height` NV12). `Unmap` pairs `Map`; the
        // cached `sws` is freed once in `Drop`. The mapped read region never aliases the owned
        // encoder frame.
        unsafe {
            // Build the scaler before mapping so a failure cannot leave the staging texture mapped.
            self.ensure_sws(
                pixel_to_av(Pixel::BGRA),
                ffi::AVPixelFormat::AV_PIX_FMT_NV12,
                SWS_CS_ITU709,
            )?;
            self.ensure_staging(&frame.device, DXGI_FORMAT_B8G8R8A8_UNORM)?;
            let staging = self.staging.clone().context("staging texture")?;
            let ctx = self.ctx.clone().context("d3d11 context")?;
            let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?;
            let dst: ID3D11Resource = staging.cast().context("staging -> resource")?;
            ctx.CopyResource(&dst, &src);
            let mut map = D3D11_MAPPED_SUBRESOURCE::default();
            ctx.Map(&staging, 0, D3D11_MAP_READ, 0, Some(&mut map))
                .context("Map staging (bgra readback)")?;
            let pitch = map.RowPitch as usize;
            let h = self.height as usize;
            let base = map.pData as *const u8;
            let src_data: [*const u8; 4] = [base, ptr::null(), ptr::null(), ptr::null()];
            let src_stride: [c_int; 4] = [pitch as c_int, 0, 0, 0];
            let r = ffi::sws_scale(
                self.sws,
                src_data.as_ptr(),
                src_stride.as_ptr(),
                0,
                h as c_int,
                (*self.sw_frame).data.as_ptr(),
                (*self.sw_frame).linesize.as_ptr(),
            );
            ctx.Unmap(&staging, 0);
            if r < 0 {
                bail!("sws_scale BGRA→NV12 failed");
            }
            self.send(pts, idr)
        }
    }
    /// Read back a captured Rgb10a2 (BT.2020 PQ, R10G10B10A2) surface and swscale it to P010
    /// (BT.2020 PQ, limited range). This is the HDR path when the capturer's video processor
    /// emitted its R10 shader output instead of P010. DXGI `R10G10B10A2_UNORM` (R in the low 10
    /// bits, X2 alpha in the top 2) == FFmpeg `AV_PIX_FMT_X2BGR10LE`. UNTESTED on glass (no
    /// AMD/Intel Windows box).
    fn readback_rgb10(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
        // SAFETY: same shape as `readback_yuv`/`readback_bgra`. `ensure_sws` builds the
        // X2BGR10LE→P010 (BT.2020) context. `ensure_staging` builds an R10G10B10A2 STAGING texture
        // on `frame.device` and caches its immediate context; `src`/`dst` are that device's
        // matching-format textures, so `CopyResource` on the single-threaded context is valid.
        // `Map(READ)` yields `base` valid for `pitch` × `h` rows. `sws_scale` reads `h` rows of
        // `pitch` bytes from `base` (in-bounds) into the `sw_frame` P010 planes
        // (`data`/`linesize`, allocated `width`×`height`). `Unmap` pairs `Map`; `sws` is freed once
        // in `Drop`. No aliasing between read and write.
        unsafe {
            // RGB(BT.2020 PQ) → YUV(BT.2020 PQ): a matrix-only repack (same PQ transfer),
            // full→limited. Built before mapping so a failure cannot leave the texture mapped.
            self.ensure_sws(
                ffi::AVPixelFormat::AV_PIX_FMT_X2BGR10LE,
                ffi::AVPixelFormat::AV_PIX_FMT_P010LE,
                SWS_CS_BT2020,
            )?;
            self.ensure_staging(&frame.device, DXGI_FORMAT_R10G10B10A2_UNORM)?;
            let staging = self.staging.clone().context("staging texture")?;
            let ctx = self.ctx.clone().context("d3d11 context")?;
            let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?;
            let dst: ID3D11Resource = staging.cast().context("staging -> resource")?;
            ctx.CopyResource(&dst, &src);
            let mut map = D3D11_MAPPED_SUBRESOURCE::default();
            ctx.Map(&staging, 0, D3D11_MAP_READ, 0, Some(&mut map))
                .context("Map staging (rgb10 readback)")?;
            let pitch = map.RowPitch as usize;
            let h = self.height as usize;
            let base = map.pData as *const u8;
            let src_data: [*const u8; 4] = [base, ptr::null(), ptr::null(), ptr::null()];
            let src_stride: [c_int; 4] = [pitch as c_int, 0, 0, 0];
            let r = ffi::sws_scale(
                self.sws,
                src_data.as_ptr(),
                src_stride.as_ptr(),
                0,
                h as c_int,
                (*self.sw_frame).data.as_ptr(),
                (*self.sw_frame).linesize.as_ptr(),
            );
            ctx.Unmap(&staging, 0);
            if r < 0 {
                bail!("sws_scale Rgb10a2→P010 failed");
            }
            self.send(pts, idr)
        }
    }
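/* Sketch (not part of this module). The doc comment above relies on the bit layout of DXGI
`R10G10B10A2_UNORM`: red in the low 10 bits, then green, then blue, with the 2-bit alpha on top.
A stand-alone pack/unpack pair makes that layout concrete; the helper names are hypothetical and
the pixel is taken as a little-endian `u32`:

```rust
/// Split a little-endian R10G10B10A2 pixel into (r, g, b, a); r/g/b are 10-bit, a is 2-bit.
fn unpack_r10g10b10a2(px: u32) -> (u16, u16, u16, u8) {
    let r = (px & 0x3ff) as u16;
    let g = ((px >> 10) & 0x3ff) as u16;
    let b = ((px >> 20) & 0x3ff) as u16;
    let a = (px >> 30) as u8;
    (r, g, b, a)
}

/// Inverse of `unpack_r10g10b10a2`; out-of-range inputs are masked to their field width.
fn pack_r10g10b10a2(r: u16, g: u16, b: u16, a: u8) -> u32 {
    (u32::from(r) & 0x3ff)
        | ((u32::from(g) & 0x3ff) << 10)
        | ((u32::from(b) & 0x3ff) << 20)
        | ((u32::from(a) & 0x3) << 30)
}
```

Read from the most significant bit down, the fields are alpha (ignored as X2), blue, green, red,
which is the order the FFmpeg format name `X2BGR10` spells out.
*/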
    /// CPU path: swscale a packed RGB/BGR CPU buffer to NV12, then send (8-bit only). Used when the
    /// capturer hands `FramePayload::Cpu` (DDA without the video-processor path).
    fn submit_cpu(&mut self, bytes: &[u8], format: PixelFormat, pts: i64, idr: bool) -> Result<()> {
        anyhow::ensure!(
            format == self.format,
            "captured format {format:?} != encoder source {:?}",
            self.format
        );
        if self.ten_bit {
            bail!("ffmpeg_win: CPU swscale path is 8-bit only");
        }
        let w = self.width as usize;
        let h = self.height as usize;
        let src_row = w * format.bytes_per_pixel();
        anyhow::ensure!(bytes.len() >= src_row * h, "captured buffer too small");
        // SAFETY: `ensure_sws` lazily builds the (packed RGB/BGR)→NV12 context for this fixed
        // src/dst format pair. `src_data[0] = bytes.as_ptr()` with `src_stride[0] = src_row`; the
        // `ensure!` above guarantees `bytes` holds at least `src_row*h` bytes, so `sws_scale` reads
        // `h` rows of `src_row` bytes in-bounds and writes the `sw_frame` NV12 planes
        // (`data`/`linesize`, allocated `width`×`height`). `bytes` is borrowed for the call only
        // and never aliases the owned `sw_frame`. `send` then hands `sw_frame` to the encoder.
        unsafe {
            self.ensure_sws(
                pixel_to_av(sws_src(format)?),
                ffi::AVPixelFormat::AV_PIX_FMT_NV12,
                SWS_CS_ITU709,
            )?;
            let src_data: [*const u8; 4] = [bytes.as_ptr(), ptr::null(), ptr::null(), ptr::null()];
            let src_stride: [c_int; 4] = [src_row as c_int, 0, 0, 0];
            if ffi::sws_scale(
                self.sws,
                src_data.as_ptr(),
                src_stride.as_ptr(),
                0,
                h as c_int,
                (*self.sw_frame).data.as_ptr(),
                (*self.sw_frame).linesize.as_ptr(),
            ) < 0
            {
                bail!("sws_scale RGB→NV12 failed");
            }
            self.send(pts, idr)
        }
    }
    /// Lazily build the swscale context (src → NV12/P010, limited range, the given colorspace). A
    /// SystemInner uses exactly one src→dst conversion for its lifetime (8-bit RGB→NV12 BT.709, or
    /// 10-bit RGB10→P010 BT.2020), so caching a single context is sound.
    unsafe fn ensure_sws(
        &mut self,
        src_av: ffi::AVPixelFormat,
        dst_av: ffi::AVPixelFormat,
        cs: c_int,
    ) -> Result<()> {
        if !self.sws.is_null() {
            return Ok(());
        }
        let sws = ffi::sws_getContext(
            self.width as c_int,
            self.height as c_int,
            src_av,
            self.width as c_int,
            self.height as c_int,
            dst_av,
            SWS_POINT,
            ptr::null_mut(),
            ptr::null_mut(),
            ptr::null(),
        );
        if sws.is_null() {
            bail!("sws_getContext(RGB→YUV) failed");
        }
        // Source full-range RGB → destination limited-range YUV (matches the limited-range VUI we
        // signal). For RGB input the src coefficient table is unused; pass the dst table for both.
        let coeff = ffi::sws_getCoefficients(cs);
        ffi::sws_setColorspaceDetails(sws, coeff, 1, coeff, 0, 0, 1 << 16, 1 << 16);
        self.sws = sws;
        Ok(())
    }
}
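// Illustrative sketch (not part of the encoder): the bounds reasoning `submit_cpu` relies on,
// written with checked arithmetic. `packed_frame_layout` is a hypothetical helper. It assumes a
// tightly packed source (stride == width * bytes_per_pixel) and that the stride is handed to
// swscale as a 32-bit signed C `int`. Returns `(stride, minimum buffer length)`, or `None` if
// either value would overflow.
#[allow(dead_code)]
fn packed_frame_layout(width: u32, height: u32, bytes_per_pixel: usize) -> Option<(i32, usize)> {
    let row = (width as usize).checked_mul(bytes_per_pixel)?;
    let total = row.checked_mul(height as usize)?;
    // The row size must also fit the `c_int` stride passed to `sws_scale`.
    let stride = i32::try_from(row).ok()?;
    Some((stride, total))
}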
impl Drop for SystemInner {
fn drop(&mut self) {
// SAFETY: `sw_frame` is the `AVFrame` allocated in `open` (or null) — `av_frame_free` drops it
// once and nulls the pointer through the `&mut`; `sws` is the cached `SwsContext` (or null) —
// `sws_freeContext` frees it once. This `Drop` runs exactly once and `SystemInner` owns both
// exclusively, so there is no double-free or use-after-free.
unsafe {
if !self.sw_frame.is_null() {
ffi::av_frame_free(&mut self.sw_frame);
}
if !self.sws.is_null() {
ffi::sws_freeContext(self.sws);
}
}
}
}
// ---------------------------------------------------------------------------------------------
// Zero-copy D3D11 path (PUNKTFUNK_ZEROCOPY=1): share the capture device, pool D3D11 frames, copy
// the captured texture into a pooled slice, feed AMF directly / map to QSV. Falls back to the
// system path if the hw setup fails to open. Untested on glass — opt-in only for now.
// ---------------------------------------------------------------------------------------------
struct D3d11Hw {
device_ref: *mut ffi::AVBufferRef,
frames_ref: *mut ffi::AVBufferRef,
}
impl D3d11Hw {
/// Wrap the capturer's `ID3D11Device` as a D3D11VA hwdevice and build an NV12/P010 frames pool.
unsafe fn new(
device: &ID3D11Device,
sw_format: ffi::AVPixelFormat,
bind_flags: u32,
w: u32,
h: u32,
pool: c_int,
) -> Result<Self> {
let mut device_ref =
ffi::av_hwdevice_ctx_alloc(ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_D3D11VA);
if device_ref.is_null() {
bail!("av_hwdevice_ctx_alloc(D3D11VA) failed");
}
let dev_ctx = (*device_ref).data as *mut ffi::AVHWDeviceContext;
let d11 = (*dev_ctx).hwctx as *mut AVD3D11VADeviceContext;
// Share the capture device. FFmpeg's d3d11va teardown Releases `device`, so hand it an owned
// reference (clone = AddRef, forget = don't Release ours). init() fills
// device_context / video_device / video_context / the default lock from a non-null device.
std::mem::forget(device.clone());
(*d11).device = device.as_raw();
let r = ffi::av_hwdevice_ctx_init(device_ref);
if r < 0 {
ffi::av_buffer_unref(&mut device_ref);
bail!("av_hwdevice_ctx_init(D3D11VA) failed ({r})");
}
let mut frames_ref = ffi::av_hwframe_ctx_alloc(device_ref);
if frames_ref.is_null() {
ffi::av_buffer_unref(&mut device_ref);
bail!("av_hwframe_ctx_alloc(D3D11VA) failed");
}
let fc = (*frames_ref).data as *mut ffi::AVHWFramesContext;
(*fc).format = ffi::AVPixelFormat::AV_PIX_FMT_D3D11;
(*fc).sw_format = sw_format;
(*fc).width = w as c_int;
(*fc).height = h as c_int;
(*fc).initial_pool_size = pool;
let f11 = (*fc).hwctx as *mut AVD3D11VAFramesContext;
(*f11).bind_flags = bind_flags;
let r = ffi::av_hwframe_ctx_init(frames_ref);
if r < 0 {
ffi::av_buffer_unref(&mut frames_ref);
ffi::av_buffer_unref(&mut device_ref);
bail!("av_hwframe_ctx_init(D3D11VA) failed ({r})");
}
Ok(D3d11Hw {
device_ref,
frames_ref,
})
}
}
impl Drop for D3d11Hw {
fn drop(&mut self) {
// SAFETY: `frames_ref`/`device_ref` are the two non-null `AVBufferRef`s `D3d11Hw::new` created
// (it bails before constructing `Self` if either alloc/init fails, so a live `D3d11Hw` always
// holds both). `av_buffer_unref` drops one reference and nulls the pointer through the `&mut`.
// This `Drop` runs exactly once and `D3d11Hw` owns these refs exclusively → no double-free /
// use-after-free. Frames are unref'd before the device because the frames ctx internally holds
// a ref on the device (refcounted, so the order is sound either way).
unsafe {
ffi::av_buffer_unref(&mut self.frames_ref);
ffi::av_buffer_unref(&mut self.device_ref);
}
}
}
struct ZeroCopyInner {
vendor: WinVendor,
enc: encoder::video::Encoder,
hw: D3d11Hw,
/// QSV only: the QSV device + frames ctx derived from the D3D11VA ones (the encoder's real
/// input). Null for AMF (which takes the D3D11 frames directly).
qsv_device: *mut ffi::AVBufferRef,
qsv_frames: *mut ffi::AVBufferRef,
ctx: ID3D11DeviceContext,
/// The pool's fixed sw_format (NV12 8-bit / P010 10-bit). A captured frame whose format differs
/// (the capturer's video-processor fell back to Bgra/Rgb10a2) cannot be CopySubresourceRegion'd
/// into this pool (format-group mismatch → UB), so the caller drops to the system path instead.
pool_format: PixelFormat,
}
impl ZeroCopyInner {
#[allow(clippy::too_many_arguments)]
fn open(
vendor: WinVendor,
codec: Codec,
format: PixelFormat,
width: u32,
height: u32,
fps: u32,
bitrate_bps: u64,
bit_depth: u8,
device: &ID3D11Device,
) -> Result<Self> {
let ten_bit = is_10bit_format(format, bit_depth);
let sw_av = if ten_bit {
ffi::AVPixelFormat::AV_PIX_FMT_P010LE
} else {
ffi::AVPixelFormat::AV_PIX_FMT_NV12
};
let pool_format = if ten_bit {
PixelFormat::P010
} else {
PixelFormat::Nv12
};
// Bind flags on the FFmpeg-allocated pool. AMF reads it as encoder input (RENDER_TARGET +
// SHADER_RESOURCE, matching the video-processor output); QSV maps it as an mfx surface
// (DECODER | VIDEO_ENCODER). The CopySubresourceRegion into the pool works with any usable
// DEFAULT-usage texture regardless.
let bind_flags = match vendor {
WinVendor::Amf => (D3D11_BIND_RENDER_TARGET.0 | D3D11_BIND_SHADER_RESOURCE.0) as u32,
WinVendor::Qsv => (D3D11_BIND_DECODER.0 | D3D11_BIND_VIDEO_ENCODER.0) as u32,
};
const POOL: c_int = 8;
// SAFETY: `D3d11Hw::new` wraps the capturer's `device` as a D3D11VA hwdevice (handing FFmpeg an
// owned AddRef of it, balanced by FFmpeg's teardown Release) and builds an owned
// device_ref/frames_ref pair freed by `D3d11Hw::Drop`; `hw` is a local, so it is dropped (and
// both refs freed) on every early `return Err`. For QSV, `av_hwdevice_ctx_create_derived` and
// `av_hwframe_ctx_create_derived` fill the null-initialised `qsv_device`/`qsv_frames` out-params
// only on success (`r >= 0` checked); on the frames-derive failure we unref the already-created
// `qsv_device` before bailing. `open_win_encoder` internally `av_buffer_ref`s the dev/frames
// refs it is given (so ownership of `hw`'s and the derived refs stays here), and on its failure
// we unref the still-owned derived `qsv_frames`/`qsv_device` (null for AMF → skipped) and return
// — `hw` then drops its D3D11 refs. On success the derived refs are moved into `ZeroCopyInner`
// (freed in its `Drop`) and the encoder holds its own AddRef'd copies. Every `AVBufferRef` is
// unref'd exactly once across all paths — no leak, no double-free.
unsafe {
let hw = D3d11Hw::new(device, sw_av, bind_flags, width, height, POOL)?;
let (pix_fmt, dev_ref, frames_ref, mut qsv_device, mut qsv_frames) = match vendor {
WinVendor::Amf => (
ffi::AVPixelFormat::AV_PIX_FMT_D3D11,
hw.device_ref,
hw.frames_ref,
ptr::null_mut(),
ptr::null_mut(),
),
WinVendor::Qsv => {
// Derive a QSV device that SHARES the D3D11 device, and a QSV frames ctx derived
// from the D3D11 frames pool (auto-mapped 1:1). The encoder takes AV_PIX_FMT_QSV.
let mut qsv_device: *mut ffi::AVBufferRef = ptr::null_mut();
let r = ffi::av_hwdevice_ctx_create_derived(
&mut qsv_device,
ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_QSV,
hw.device_ref,
0,
);
if r < 0 {
bail!("derive QSV device from D3D11VA: {}", ffmpeg::Error::from(r));
}
let mut qsv_frames: *mut ffi::AVBufferRef = ptr::null_mut();
let r = ffi::av_hwframe_ctx_create_derived(
&mut qsv_frames,
ffi::AVPixelFormat::AV_PIX_FMT_QSV,
qsv_device,
hw.frames_ref,
ffi::AV_HWFRAME_MAP_DIRECT as c_int,
);
if r < 0 {
ffi::av_buffer_unref(&mut qsv_device);
bail!("derive QSV frames from D3D11VA: {}", ffmpeg::Error::from(r));
}
(
ffi::AVPixelFormat::AV_PIX_FMT_QSV,
qsv_device,
qsv_frames,
qsv_device,
qsv_frames,
)
}
};
let enc = match open_win_encoder(
vendor,
codec,
width,
height,
fps,
bitrate_bps,
pix_fmt,
sw_av,
ten_bit,
dev_ref,
frames_ref,
) {
Ok(e) => e,
Err(e) => {
if !qsv_frames.is_null() {
ffi::av_buffer_unref(&mut qsv_frames);
}
if !qsv_device.is_null() {
ffi::av_buffer_unref(&mut qsv_device);
}
return Err(e);
}
};
tracing::info!(
encoder = vendor.encoder_name(codec),
"{} encode active ({width}x{height}@{fps}, zero-copy D3D11 {} path)",
vendor.label(),
if ten_bit { "P010" } else { "NV12" }
);
Ok(ZeroCopyInner {
vendor,
enc,
hw,
qsv_device,
qsv_frames,
ctx: immediate_context(device),
pool_format,
})
}
}
fn submit(&mut self, frame: &D3d11Frame, pts: i64, idr: bool) -> Result<()> {
// SAFETY: `d3d = av_frame_alloc()` is a fresh owned frame (null-checked) and is `av_frame_free`d
// exactly once on every path below. `av_hwframe_get_buffer` fills it from the pool — on failure
// we free it and bail. `(*d3d).data[0]` is the pool's texture-array and `data[1]` the array
// index; `from_raw_borrowed` borrows that `ID3D11Texture2D` WITHOUT taking ownership (no Release
// — the frame owns it) and is null-checked. `src` (the captured texture) and `dst` (the pooled
// slice) live on the SAME D3D11 device wrapped by `self.hw`, and the caller guarantees
// `captured.format == pool_format` before calling, so `CopySubresourceRegion(dst, dst_index, ..,
// src, 0, ..)` on the single-threaded immediate context `self.ctx` is a valid same-format GPU
// copy. For QSV the mapped `qsv` frame is a fresh owned frame whose `hw_frames_ctx` takes an
// `av_buffer_ref` of `self.qsv_frames`; it is `av_frame_free`d (releasing that ref) on both the
// map-failure and success paths. `avcodec_send_frame` only internally refs the input frame, so
// the `av_frame_free(d3d)`/`av_frame_free(qsv)` afterwards are the sole owning frees — no leak,
// no double-free, no use-after-free.
unsafe {
// Pull a pooled D3D11 surface; its data[0] is the pool's texture-ARRAY, data[1] the slice.
let mut d3d = ffi::av_frame_alloc();
if d3d.is_null() {
bail!("av_frame_alloc(d3d11) failed");
}
let r = ffi::av_hwframe_get_buffer(self.hw.frames_ref, d3d, 0);
if r < 0 {
ffi::av_frame_free(&mut d3d);
bail!("av_hwframe_get_buffer(D3D11) failed ({r})");
}
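let dst_ptr = (*d3d).data[0] as *mut c_void;
let dst_index = (*d3d).data[1] as usize as u32;
// Resolve both resources before touching the GPU, and free `d3d` if any step fails so the
// pooled frame is returned instead of leaking on an early return.
let resources = ID3D11Texture2D::from_raw_borrowed(&dst_ptr)
.ok_or_else(|| anyhow!("pooled D3D11 frame has null texture"))
.and_then(|dst_tex| {
let src: ID3D11Resource = frame.texture.cast().context("texture -> resource")?;
let dst: ID3D11Resource = dst_tex.cast().context("pooled texture -> resource")?;
Ok((src, dst))
});
let (src, dst) = match resources {
Ok(pair) => pair,
Err(e) => {
ffi::av_frame_free(&mut d3d);
return Err(e);
}
};
// GPU-local copy of the captured slice into the pooled array slice (like NVENC's CUDA
// device→device copy). Subresource = arrayIndex (MipLevels=1).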
self.ctx
.CopySubresourceRegion(&dst, dst_index, 0, 0, 0, &src, 0, None);
(*d3d).pts = pts;
(*d3d).pict_type = if idr {
ffi::AVPictureType::AV_PICTURE_TYPE_I
} else {
ffi::AVPictureType::AV_PICTURE_TYPE_NONE
};
let send = match self.vendor {
WinVendor::Amf => ffi::avcodec_send_frame(self.enc.as_mut_ptr(), d3d),
WinVendor::Qsv => {
// Map the D3D11 frame to a QSV surface (1:1, no copy), then send the mapped frame.
let mut qsv = ffi::av_frame_alloc();
if qsv.is_null() {
ffi::av_frame_free(&mut d3d);
bail!("av_frame_alloc(qsv) failed");
}
(*qsv).format = ffi::AVPixelFormat::AV_PIX_FMT_QSV as c_int;
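(*qsv).hw_frames_ctx = ffi::av_buffer_ref(self.qsv_frames);
// `av_buffer_ref` returns null on allocation failure; do not map into a frame without a ctx.
if (*qsv).hw_frames_ctx.is_null() {
ffi::av_frame_free(&mut qsv);
ffi::av_frame_free(&mut d3d);
bail!("av_buffer_ref(qsv_frames) failed");
}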
// The map flags are a bindgen enum (no BitOr) — cast each to int before OR-ing.
let r = ffi::av_hwframe_map(
qsv,
d3d,
ffi::AV_HWFRAME_MAP_DIRECT as c_int | ffi::AV_HWFRAME_MAP_READ as c_int,
);
if r < 0 {
ffi::av_frame_free(&mut qsv);
ffi::av_frame_free(&mut d3d);
bail!("av_hwframe_map(D3D11→QSV) failed ({r})");
}
(*qsv).pts = pts;
(*qsv).pict_type = (*d3d).pict_type;
let s = ffi::avcodec_send_frame(self.enc.as_mut_ptr(), qsv);
ffi::av_frame_free(&mut qsv);
s
}
};
ffi::av_frame_free(&mut d3d);
if send < 0 {
bail!(
"avcodec_send_frame({}) failed ({send})",
self.vendor.label()
);
}
}
Ok(())
}
}
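// Illustrative sketch (hypothetical helper, not used by the encoder): why `submit` can pass the
// pool's array index straight to `CopySubresourceRegion`. Direct3D 11 numbers subresources as
// `mip_slice + array_slice * mip_levels` (the `D3D11CalcSubresource` formula), so with a single
// mip level the subresource index equals the array slice. That the pooled textures have exactly
// one mip level is taken from the comment in `submit`, not verified here.
#[allow(dead_code)]
const fn d3d11_calc_subresource(mip_slice: u32, array_slice: u32, mip_levels: u32) -> u32 {
    mip_slice + array_slice * mip_levels
}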
impl Drop for ZeroCopyInner {
fn drop(&mut self) {
// SAFETY: `qsv_frames`/`qsv_device` are the derived QSV `AVBufferRef`s (or null for AMF); each
// is `av_buffer_unref`'d once here (nulling the pointer through the `&mut`) — `ZeroCopyInner`
// owns these handles exclusively and this `Drop` runs once, so no double-free. The `enc` and
// `hw` fields free the encoder's AddRef'd copies and the D3D11 device/frames refs through their
// own `Drop`, so all references stay balanced.
unsafe {
if !self.qsv_frames.is_null() {
ffi::av_buffer_unref(&mut self.qsv_frames);
}
if !self.qsv_device.is_null() {
ffi::av_buffer_unref(&mut self.qsv_device);
}
}
}
}
// ---------------------------------------------------------------------------------------------
enum Inner {
System(SystemInner),
ZeroCopy(ZeroCopyInner),
}
pub struct FfmpegWinEncoder {
vendor: WinVendor,
codec: Codec,
format: PixelFormat,
width: u32,
height: u32,
fps: u32,
bitrate_bps: u64,
bit_depth: u8,
/// Built lazily from the first frame (system readback vs zero-copy D3D11).
inner: Option<Inner>,
/// Raw `ID3D11Device` pointer the live inner is bound to — re-init on change (the capturer
/// recreates its device across secure-desktop / HDR / resize transitions, like NVENC tracks).
bound_device: isize,
frame_idx: i64,
force_kf: bool,
}
// Raw FFI pointers + COM objects; the encoder lives on a single thread (same contract as NVENC/VAAPI).
// SAFETY: `FfmpegWinEncoder` owns raw libav pointers (`AVFrame`/`SwsContext`/`AVBufferRef`) and
// windows-rs COM handles (`ID3D11Device`/`ID3D11DeviceContext`/textures) that are not auto-`Send`. The
// session creates the encoder, drives `submit`/`poll`/`flush`, and drops it all on one dedicated encode
// thread; it is never shared by reference across threads, and the D3D11 immediate context is only ever
// touched from that thread. The only cross-thread action is the initial move to the encode thread,
// after which every interior pointer/COM ref is used single-threaded — the same contract the
// NVENC/VAAPI encoders rely on. No interior state is accessed concurrently.
unsafe impl Send for FfmpegWinEncoder {}
impl FfmpegWinEncoder {
#[allow(clippy::too_many_arguments)]
pub fn open(
vendor: WinVendor,
codec: Codec,
format: PixelFormat,
width: u32,
height: u32,
fps: u32,
bitrate_bps: u64,
bit_depth: u8,
chroma: ChromaFormat,
) -> Result<Self> {
// AMF/QSV 4:4:4 is deferred (see `probe_can_encode_444`): no validated AMD/Intel Windows
// hardware in the lab, and the AMF/QSV HEVC 4:4:4 profile/format incantations are vendor- and
// driver-specific (a wrong profile silently encodes 4:2:0). The probe returns false so the host
// never negotiates 4:4:4 for an AMF/QSV session; if a request slips through, fall back to 4:2:0.
if chroma.is_444() {
tracing::warn!("AMF/QSV 4:4:4 encode not implemented — encoding 4:2:0");
}
ffmpeg::init().context("ffmpeg init")?;
if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
// SAFETY: `ffmpeg::init()` ran on the line above, so libav is initialised; `av_log_set_level`
// is a global scalar setter with no pointer arguments. 48 is libavutil's `AV_LOG_DEBUG`.
unsafe { ffi::av_log_set_level(48) };
}
// Make sure the encoder name exists in this libavcodec build up front (clear error vs a
// first-frame failure).
let name = vendor.encoder_name(codec);
if encoder::find_by_name(name).is_none() {
bail!(
"{name} not built into libavcodec (this FFmpeg lacks the {} encoder)",
vendor.label()
);
}
Ok(FfmpegWinEncoder {
vendor,
codec,
format,
width,
height,
fps,
bitrate_bps,
bit_depth,
inner: None,
bound_device: 0,
frame_idx: 0,
force_kf: false,
})
}
/// Build (or rebuild) the inner for a D3D11 frame, picking zero-copy or system. Zero-copy
/// failures fall back to the system path so a session is never lost to the untested hw path. The
/// device is re-bound on change (the capturer recreates it across secure-desktop / HDR / resize).
fn ensure_inner_d3d11(&mut self, device: &ID3D11Device) -> Result<()> {
let dev_raw = device.as_raw() as isize;
if self.inner.is_some() && self.bound_device == dev_raw {
return Ok(());
}
self.inner = None;
self.bound_device = dev_raw;
let inner = if zerocopy_enabled() {
match ZeroCopyInner::open(
self.vendor,
self.codec,
self.format,
self.width,
self.height,
self.fps,
self.bitrate_bps,
self.bit_depth,
device,
) {
Ok(zc) => Inner::ZeroCopy(zc),
Err(e) => {
tracing::warn!(
error = %format!("{e:#}"),
"{} zero-copy D3D11 setup failed — falling back to system-memory readback",
self.vendor.label()
);
Inner::System(self.open_system()?)
}
}
} else {
Inner::System(self.open_system()?)
};
self.inner = Some(inner);
Ok(())
}
fn open_system(&self) -> Result<SystemInner> {
SystemInner::open(
self.vendor,
self.codec,
self.format,
self.width,
self.height,
self.fps,
self.bitrate_bps,
self.bit_depth,
)
}
}
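// Illustrative sketch (not wired into the encoder): the "prefer the fast path, fall back on
// failure" shape that `ensure_inner_d3d11` follows. `open_preferring` and its parameters are
// hypothetical names; the real code calls `ZeroCopyInner::open` / `open_system` directly and
// reports the fallback through `tracing::warn!`.
#[allow(dead_code)]
fn open_preferring<T, E>(
    fast_enabled: bool,
    fast: impl FnOnce() -> std::result::Result<T, E>,
    slow: impl FnOnce() -> std::result::Result<T, E>,
    mut on_fallback: impl FnMut(&E),
) -> std::result::Result<T, E> {
    if !fast_enabled {
        return slow();
    }
    match fast() {
        Ok(v) => Ok(v),
        Err(e) => {
            // A fast-path failure is reported but is not fatal; only a slow-path failure is.
            on_fallback(&e);
            slow()
        }
    }
}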
impl Encoder for FfmpegWinEncoder {
fn submit(&mut self, captured: &CapturedFrame) -> Result<()> {
anyhow::ensure!(
captured.width == self.width && captured.height == self.height,
"captured frame {}x{} != encoder {}x{}",
captured.width,
captured.height,
self.width,
self.height
);
let pts = self.frame_idx;
self.frame_idx += 1;
let idr = self.force_kf;
self.force_kf = false;
match &captured.payload {
FramePayload::D3d11(f) => {
self.ensure_inner_d3d11(&f.device)?;
// If zero-copy is active but the capturer fell back to a format the NV12/P010 pool
// can't accept (no video processor → Bgra/Rgb10a2), a CopySubresourceRegion into the
// pool would be a format-group mismatch (UB / device removal). Drop to the system
// readback path, which handles every captured format.
let pool_mismatch = matches!(
&self.inner,
Some(Inner::ZeroCopy(zc)) if captured.format != zc.pool_format
);
if pool_mismatch {
tracing::warn!(
captured = ?captured.format,
"{} zero-copy pool format mismatch (capturer video-processor fallback) — \
switching to system-memory readback",
self.vendor.label()
);
self.inner = Some(Inner::System(self.open_system()?));
}
match self.inner.as_mut().unwrap() {
Inner::ZeroCopy(zc) => zc.submit(f, pts, idr),
Inner::System(s) => s.submit_d3d11(f, captured.format, pts, idr),
}
}
FramePayload::Cpu(bytes) => {
// DDA-without-video-processor hands CPU BGRA; build a system inner and swscale it.
if self.inner.is_none() {
self.inner = Some(Inner::System(self.open_system()?));
}
match self.inner.as_mut().unwrap() {
Inner::System(s) => s.submit_cpu(bytes, captured.format, pts, idr),
Inner::ZeroCopy(_) => {
bail!(
"{} encoder built for D3D11 got a CPU frame",
self.vendor.label()
)
}
}
}
}
}
fn request_keyframe(&mut self) {
self.force_kf = true;
}
fn poll(&mut self) -> Result<Option<EncodedFrame>> {
match &mut self.inner {
Some(Inner::System(s)) => poll_encoder(&mut s.enc, self.fps),
Some(Inner::ZeroCopy(z)) => poll_encoder(&mut z.enc, self.fps),
None => Ok(None),
}
}
fn flush(&mut self) -> Result<()> {
match &mut self.inner {
Some(Inner::System(s)) => s.enc.send_eof().context("send_eof")?,
Some(Inner::ZeroCopy(z)) => z.enc.send_eof().context("send_eof")?,
None => {}
}
Ok(())
}
}
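// Illustrative sketch (hypothetical type, not used by the encoder): the per-frame bookkeeping at
// the top of `Encoder::submit`, isolated. The pts is a plain frame counter and the keyframe
// request is a one-shot flag that the next submitted frame consumes. As in `submit`, both are
// advanced before any fallible work, so a frame that later fails still uses up its pts and any
// pending keyframe request.
#[allow(dead_code)]
#[derive(Default)]
struct FrameClock {
    frame_idx: i64,
    force_kf: bool,
}
#[allow(dead_code)]
impl FrameClock {
    fn request_keyframe(&mut self) {
        self.force_kf = true;
    }
    /// Returns `(pts, idr)` for the next frame and clears the keyframe request.
    fn next(&mut self) -> (i64, bool) {
        let pts = self.frame_idx;
        self.frame_idx += 1;
        let idr = std::mem::take(&mut self.force_kf);
        (pts, idr)
    }
}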