feat(audio): end-to-end 5.1/7.1 surround across the native path + all clients
apple / swift (push) Failing after 10s
release / apple (push) Failing after 7s
apple / screenshots (push) Has been skipped
audit / cargo-audit (push) Failing after 1m19s
windows-host / package (push) Failing after 2m44s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Failing after 39s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Failing after 39s
windows / build (aarch64-pc-windows-msvc) (push) Failing after 45s
android / android (push) Successful in 5m17s
windows / build (x86_64-pc-windows-msvc) (push) Failing after 45s
ci / web (push) Successful in 57s
ci / docs-site (push) Successful in 56s
ci / rust (push) Successful in 9m19s
ci / bench (push) Successful in 4m40s
decky / build-publish (push) Successful in 26s
deb / build-publish (push) Successful in 2m57s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 33s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m56s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m35s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m20s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 53s
flatpak / build-publish (push) Successful in 4m22s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m51s
docker / deploy-docs (push) Successful in 21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m50s
Adds negotiated 5.1/7.1 surround to the punktfunk/1 protocol and every client (previously stereo-only):

- core: new shared `audio` layout table (LAYOUT_51/71 + identity multistream mapping, canonical wire order FL FR FC LFE RL RR SL SR); Hello/Welcome `audio_channels` negotiation via the trailing-byte back-compat pattern (old peers fall back to stereo); C-ABI `punktfunk_connect_ex6`, `punktfunk_connection_audio_channels`, and in-core multistream decode `punktfunk_connection_next_audio_pcm` for embedders without a multistream Opus decoder. Real-libopus channel-identity round-trip test.
- host: native audio thread captures + Opus-(multi)stream-encodes at the negotiated count (with a cross-session cached-capturer channel-mismatch fix); GameStream surround unified onto the safe `opus::MSEncoder`, dropping `audiopus_sys` (~4 unsafe blocks) and un-gating Windows GameStream surround; WASAPI loopback capture relaxed to 2/6/8 with the correct dwChannelMask.
- clients: Linux (PipeWire), Windows (WASAPI), Android (AAudio) decode via `opus::MSDecoder` + render multichannel; Apple decodes in-core to PCM → AVAudioEngine with an explicit wire-order channel layout; each gains a Stereo/5.1/7.1 setting.

`punktfunk-probe --audio-channels N` is the headless validator.

Verified on Linux: core/host/linux/probe test suites + the Android Rust (cargo-ndk) build, clippy -D warnings, and rustfmt all green. Windows/Apple builds, all on-glass checks, and the live native loopback are pending (CI / a free box).

Also lands the concurrent in-tree HEVC 4:4:4 host work (PUNKTFUNK_444): it shares the same touched files (quic.rs, punktfunk1.rs, encode/*, ...) and so cannot be committed separately from the surround changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
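The trailing-byte negotiation described in the message can be illustrated with a minimal Rust sketch. It assumes a hypothetical 4-byte legacy Hello body (`LEGACY_HELLO_LEN`) and hypothetical helper names (`parse_audio_channels`, `encode_hello`, `negotiate`, `layout`); the real punktfunk/1 message layout is not visible in this commit view, and treating 5.1 as the first six entries of the wire order is also an assumption.

```rust
/// Canonical wire order named in the commit message.
pub const WIRE_ORDER: [&str; 8] = ["FL", "FR", "FC", "LFE", "RL", "RR", "SL", "SR"];

/// Hypothetical size of the pre-surround Hello body (an assumption for this sketch).
pub const LEGACY_HELLO_LEN: usize = 4;

/// Read the optional trailing `audio_channels` byte.
/// An old peer sends only the legacy bytes, so a missing or unknown value means stereo.
pub fn parse_audio_channels(body: &[u8]) -> u8 {
    match body.get(LEGACY_HELLO_LEN) {
        Some(&n) if matches!(n, 2 | 6 | 8) => n,
        _ => 2,
    }
}

/// Append the byte only when asking for more than stereo, so the stereo
/// encoding stays byte-identical to what old peers already understand.
pub fn encode_hello(legacy: [u8; LEGACY_HELLO_LEN], channels: u8) -> Vec<u8> {
    let mut out = legacy.to_vec();
    if channels != 2 {
        out.push(channels);
    }
    out
}

/// Host side: grant the largest supported count that does not exceed the request.
pub fn negotiate(requested: u8, host_max: u8) -> u8 {
    [8u8, 6, 2]
        .iter()
        .copied()
        .find(|&c| c <= requested && c <= host_max)
        .unwrap_or(2)
}

/// Speaker labels for a negotiated count (assumes 5.1 is the first six wire slots).
pub fn layout(channels: u8) -> &'static [&'static str] {
    match channels {
        8 => &WIRE_ORDER[..],
        6 => &WIRE_ORDER[..6],
        _ => &WIRE_ORDER[..2],
    }
}
```

With this shape, a peer that never sends the extra byte parses as stereo, and a host limited to 5.1 answers an 8-channel request with 6.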
@@ -11,7 +11,7 @@
 // Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
 #![deny(clippy::undocumented_unsafe_blocks)]
 
-use super::{Codec, EncodedFrame, Encoder};
+use super::{ChromaFormat, Codec, EncodedFrame, Encoder};
 use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
 use anyhow::{anyhow, bail, Context, Result};
 use ffmpeg::format::Pixel;
@@ -19,9 +19,33 @@ use ffmpeg::util::frame::Video as VideoFrame;
 use ffmpeg::{codec, encoder, Dictionary, Packet, Rational};
 use ffmpeg_next as ffmpeg;
 use std::os::raw::c_int;
+use std::ptr;
 
 use ffmpeg::ffi; // = ffmpeg_sys_next
 
+/// swscale: nearest-neighbour scaler flag (`SWS_POINT`). We never rescale (src dims == dst dims), so
+/// the resampler choice only governs the colour-conversion path; POINT is the cheapest.
+const SWS_POINT: c_int = 0x10;
+/// swscale colorspace id for ITU-R BT.709 (`SWS_CS_ITU709`) — the CSC coefficients for our RGB→YUV.
+const SWS_CS_ITU709: c_int = 1;
+
+/// The swscale *source* pixel format for a captured packed RGB/BGR layout (the real byte order, not
+/// the NVENC-padded `*0` form). Used by the 4:4:4 RGB→YUV444P conversion path. Mirrors the VAAPI
+/// CPU-input mapping; YUV/10-bit inputs can't feed this path (the 4:4:4 session forces packed RGB).
+fn sws_src_pixel(format: PixelFormat) -> Result<Pixel> {
+    Ok(match format {
+        PixelFormat::Bgrx => Pixel::BGRZ, // bgr0
+        PixelFormat::Rgbx => Pixel::RGBZ, // rgb0
+        PixelFormat::Bgra => Pixel::BGRA,
+        PixelFormat::Rgba => Pixel::RGBA,
+        PixelFormat::Rgb => Pixel::RGB24,
+        PixelFormat::Bgr => Pixel::BGR24,
+        PixelFormat::Nv12 | PixelFormat::P010 | PixelFormat::Rgb10a2 => {
+            bail!("NVENC 4:4:4 CPU-input path supports packed RGB/BGR only; got {format:?}")
+        }
+    })
+}
+
 /// `AVCUDADeviceContext` (libavutil/hwcontext_cuda.h) — not in the ffmpeg-sys bindings (the
 /// crate doesn't allowlist that header), so mirror its stable 3-pointer layout. We set the
 /// first field to *our* `CUcontext` so NVENC shares the context the EGL importer maps into.
@@ -131,6 +155,10 @@ pub struct NvencEncoder {
     frame: Option<VideoFrame>,
     /// Zero-copy path: CUDA hwdevice/hwframes contexts (the encoder takes `AV_PIX_FMT_CUDA`).
     cuda: Option<CudaHw>,
+    /// 4:4:4 path only: swscale context converting the captured packed RGB/BGR → planar YUV444P
+    /// (BT.709 limited) into [`Self::frame`], because `hevc_nvenc` only emits 4:4:4 from a YUV444
+    /// *input* (RGB-in is always 4:2:0). `None` on the ordinary 4:2:0 RGB path. Freed in `Drop`.
+    sws_444: Option<*mut ffi::SwsContext>,
     src_format: PixelFormat,
     expand: bool,
     width: u32,
@@ -142,10 +170,12 @@ pub struct NvencEncoder {
     force_kf: bool,
 }
 
-// `CudaHw` holds raw `AVBufferRef`s; the encoder lives on a single thread. The CPU encoder is
-// already `Send` via ffmpeg-next; assert it for the CUDA fields too.
+// `CudaHw` holds raw `AVBufferRef`s and `sws_444` a raw `SwsContext`; the encoder lives on a single
+// thread. The CPU encoder is already `Send` via ffmpeg-next; assert it for the raw fields too.
 // SAFETY: `NvencEncoder` owns an ffmpeg-next `Encoder`/`VideoFrame` (already `Send`) plus a `CudaHw`
-// holding raw `AVBufferRef`s, which are not `Send` by default. The encoder is owned and driven by
+// holding raw `AVBufferRef`s and an optional raw `SwsContext`, none of which are `Send` by default.
+// The `SwsContext` is a self-contained swscale state object with no thread affinity, touched only
+// through `&mut self` on the one encode thread. The encoder is owned and driven by
 // exactly ONE thread — the per-session encode thread it is moved to — and is only touched through
 // `&mut self` methods, so it is never aliased or accessed concurrently. The wrapped libav contexts
 // (and the shared `CUcontext` the `CudaHw` references) have no thread affinity, so transferring
@@ -164,6 +194,7 @@ impl NvencEncoder {
         bitrate_bps: u64,
         cuda: bool,
         bit_depth: u8,
+        chroma: ChromaFormat,
     ) -> Result<Self> {
         // TODO(hdr): Linux 10-bit parity. Unlike the Windows raw-SDK path (which upconverts 8-bit
         // ARGB → Main10 via pixelBitDepthMinus8), libavcodec hevc_nvenc needs a 10-bit input pixel
@@ -175,6 +206,18 @@ impl NvencEncoder {
                 "Linux NVENC 10-bit not yet wired — encoding 8-bit"
             );
         }
+        // Full-chroma 4:4:4 (HEVC Range Extensions). `hevc_nvenc` only emits 4:4:4 from a YUV444
+        // *input* frame — feeding RGB always subsamples to 4:2:0 regardless of profile (verified on
+        // the RTX 5070 Ti). So a 4:4:4 session swscales the captured RGB → YUV444P (BT.709 limited)
+        // and feeds that with `profile=rext`. The negotiator gates this to HEVC + the single-process
+        // CPU-capture topology, so `cuda` must be false here; defend the contract.
+        let want_444 = chroma.is_444() && codec == Codec::H265;
+        if want_444 && cuda {
+            bail!(
+                "NVENC 4:4:4 needs CPU RGB frames (the session forces non-zero-copy capture for \
+                 4:4:4); got a CUDA frame — capture/encoder negotiation mismatch"
+            );
+        }
         ffmpeg::init().context("ffmpeg init")?;
         if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
             // SAFETY: `av_log_set_level` sets libav's global integer log level; `48` (= AV_LOG_DEBUG)
@@ -185,7 +228,14 @@ impl NvencEncoder {
         let name = codec.nvenc_name();
         let av_codec = encoder::find_by_name(name)
             .ok_or_else(|| anyhow!("{name} not built into libavcodec"))?;
-        let (nvenc_pixel, expand) = nvenc_input(format);
+        let (rgb_pixel, rgb_expand) = nvenc_input(format);
+        // 4:4:4 feeds NVENC a planar YUV444P frame we produce by swscale; the ordinary path feeds the
+        // captured RGB straight in and lets NVENC's internal CSC subsample to 4:2:0.
+        let (nvenc_pixel, expand) = if want_444 {
+            (Pixel::YUV444P, false)
+        } else {
+            (rgb_pixel, rgb_expand)
+        };
 
         let mut video = codec::context::Context::new_with_codec(av_codec)
             .encoder()
@@ -234,12 +284,12 @@ impl NvencEncoder {
             (*video.as_mut_ptr()).gop_size = -1;
         }
 
-        // NV12 path: we did the RGB→YUV conversion ourselves as BT.709 *limited* range, so signal
-        // that in the bitstream VUI (colorspace/range/primaries/transfer) — otherwise the client
-        // decoder assumes a default and the picture comes out washed-out / wrong-contrast. The
-        // RGB-input paths leave these unset (NVENC's internal CSC writes its own VUI). Matches the
-        // Windows NV12 path's BT.709 limited-range signalling.
-        if matches!(format, PixelFormat::Nv12) {
+        // NV12 / 4:4:4 paths: we do the RGB→YUV conversion ourselves as BT.709 *limited* range
+        // (swscale), so signal that in the bitstream VUI (colorspace/range/primaries/transfer) —
+        // otherwise the client decoder assumes a default and the picture comes out washed-out /
+        // wrong-contrast. The RGB-input 4:2:0 path leaves these unset (NVENC's internal CSC writes
+        // its own VUI). Matches the Windows NV12 path's BT.709 limited-range signalling.
+        if matches!(format, PixelFormat::Nv12) || want_444 {
             // SAFETY: same `video` builder — `raw = video.as_mut_ptr()` is the non-null, properly-
             // aligned, sole-owned, not-yet-opened `AVCodecContext`. We set its four VUI colour enum
             // fields to valid `AVColorSpace`/`AVColorRange`/`AVColorPrimaries`/`AVColorTransfer-
@@ -280,6 +330,45 @@ impl NvencEncoder {
             None
         };
 
+        // 4:4:4: build the RGB→YUV444P swscale (BT.709 limited, no rescale). Mirrors the VAAPI CPU
+        // path's RGB→NV12 scaler, but the dst is full-chroma planar 4:4:4.
+        let sws_444 = if want_444 {
+            let src_av = pixel_to_av(sws_src_pixel(format)?);
+            // SAFETY: `sws_getContext` allocates a swscale context for the given src/dst dims + pixel
+            // formats. Both dims are the encoder's positive `width`/`height` as `c_int`; `src_av` is a
+            // valid `AVPixelFormat` (from the `sws_src_pixel`-validated, packed-RGB-only source), the
+            // dst is YUV444P. The trailing filter/param pointers are null = "use defaults" (documented
+            // as accepted). No Rust memory is borrowed; the returned pointer is null-checked below.
+            let sws = unsafe {
+                ffi::sws_getContext(
+                    width as c_int,
+                    height as c_int,
+                    src_av,
+                    width as c_int,
+                    height as c_int,
+                    ffi::AVPixelFormat::AV_PIX_FMT_YUV444P,
+                    SWS_POINT,
+                    ptr::null_mut(),
+                    ptr::null_mut(),
+                    ptr::null(),
+                )
+            };
+            if sws.is_null() {
+                bail!("sws_getContext(RGB→YUV444P) failed");
+            }
+            // SAFETY: `sws` is the non-null context from the call above (null-checked). The ITU-709
+            // coefficient table from `sws_getCoefficients` is a process-lifetime libswscale static,
+            // reused for src+dst matrices; `sws_setColorspaceDetails` only reads it and writes scalar
+            // CSC settings into `sws` (limited-range dst: dstRange = 0). No Rust memory is passed.
+            unsafe {
+                let cs709 = ffi::sws_getCoefficients(SWS_CS_ITU709);
+                ffi::sws_setColorspaceDetails(sws, cs709, 1, cs709, 0, 0, 1 << 16, 1 << 16);
+            }
+            Some(sws)
+        } else {
+            None
+        };
+
         // Low-latency NVENC tuning (plan §7 / linux-setup doc).
         let mut opts = Dictionary::new();
         opts.set("preset", "p1"); // fastest
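The hunk above configures swscale for full-range RGB in and BT.709 limited-range YUV out. As a rough illustration of what that conversion computes per pixel, here is a floating-point sketch of the standard BT.709 formulas; `rgb_to_ycbcr_bt709_limited` is a hypothetical helper, not part of the commit, and swscale's fixed-point path may differ from it by a code value.

```rust
/// Convert one full-range 8-bit RGB pixel to BT.709 limited-range Y'CbCr.
/// Luma lands in 16..=235 and chroma in 16..=240, centred on 128.
pub fn rgb_to_ycbcr_bt709_limited(r: u8, g: u8, b: u8) -> (u8, u8, u8) {
    // BT.709 luma coefficients; KG is whatever remains so the three sum to 1.
    const KR: f64 = 0.2126;
    const KB: f64 = 0.0722;
    const KG: f64 = 1.0 - KR - KB;

    // Normalise to 0.0..=1.0 (full-range input).
    let (r, g, b) = (r as f64 / 255.0, g as f64 / 255.0, b as f64 / 255.0);

    let y = KR * r + KG * g + KB * b;
    // Colour differences scaled to -0.5..=0.5.
    let cb = (b - y) / (2.0 * (1.0 - KB));
    let cr = (r - y) / (2.0 * (1.0 - KR));

    // Quantise to 8-bit limited range.
    let q = |v: f64| v.round().clamp(0.0, 255.0) as u8;
    (
        q(16.0 + 219.0 * y),
        q(128.0 + 224.0 * cb),
        q(128.0 + 224.0 * cr),
    )
}
```

Because 4:4:4 keeps one Cb and one Cr sample per pixel, every pixel's chroma pair from this mapping survives into the encoder input, whereas 4:2:0 would average it over a 2×2 block.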
@@ -288,6 +377,12 @@ impl NvencEncoder {
         opts.set("bf", "0");
         opts.set("delay", "0");
         opts.set("forced-idr", "1"); // RFI/request_keyframe → real IDR under the infinite GOP
+        if want_444 {
+            // HEVC Range Extensions — the profile that carries chroma_format_idc=3. With a YUV444P
+            // input `hevc_nvenc` auto-selects it, but pin it explicitly so the chroma is never silently
+            // dropped on a future libavcodec.
+            opts.set("profile", "rext");
+        }
 
         // Split-frame encode across both NVENC engines (GB203 has 2) when the pixel rate exceeds
         // a single engine's HEVC capacity (~1 Gpix/s); e.g. 5120x1440@240 = 1.77 Gpix/s needs it,
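The pixel-rate arithmetic in the trailing context comment can be checked with a small sketch. The threshold constant is the approximate "~1 Gpix/s" figure from that comment taken as a round number, and `wants_split_frame` is a hypothetical name; the encoder's actual decision logic is outside this hunk.

```rust
/// Approximate single-engine HEVC capacity from the comment (~1 Gpix/s).
/// A round figure assumed for illustration, not a measured limit.
pub const SINGLE_ENGINE_PIX_PER_SEC: u64 = 1_000_000_000;

/// Pixels per second for a given mode, widened to u64 to avoid overflow.
pub fn pixel_rate(width: u32, height: u32, fps: u32) -> u64 {
    width as u64 * height as u64 * fps as u64
}

/// Whether a mode exceeds what one engine is assumed to handle.
pub fn wants_split_frame(width: u32, height: u32, fps: u32) -> bool {
    pixel_rate(width, height, fps) > SINGLE_ENGINE_PIX_PER_SEC
}
```

For 5120x1440 at 240 fps this gives 1,769,472,000 pixels per second, which matches the 1.77 Gpix/s quoted in the comment, while 3840x2160 at 60 fps stays near 0.5 Gpix/s.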
@@ -321,6 +416,7 @@ impl NvencEncoder {
             enc,
             frame,
             cuda: cuda_hw,
+            sws_444,
             src_format: format,
             expand,
             width,
@@ -333,6 +429,15 @@ impl NvencEncoder {
 }
 
 impl Encoder for NvencEncoder {
+    fn caps(&self) -> super::EncoderCaps {
+        super::EncoderCaps {
+            // 4:4:4 iff this session opened the RGB→YUV444P swscale path (FREXT). RFI/HDR-SEI stay
+            // unsupported on libavcodec NVENC (the trait defaults).
+            chroma_444: self.sws_444.is_some(),
+            ..super::EncoderCaps::default()
+        }
+    }
+
     fn submit(&mut self, captured: &CapturedFrame) -> Result<()> {
         anyhow::ensure!(
             captured.width == self.width && captured.height == self.height,
@@ -411,6 +516,47 @@ impl NvencEncoder {
             bytes.len(),
             src_row * h
         );
+        // 4:4:4: swscale the packed RGB straight into the planar YUV444P input frame (BT.709 limited),
+        // then send it — no byte-expand. The 4:2:0 RGB path (below) feeds NVENC packed RGB directly.
+        if let Some(sws) = self.sws_444 {
+            let frame = self
+                .frame
+                .as_mut()
+                .context("CPU frame missing (encoder opened in CUDA mode)")?;
+            // SAFETY: `format == self.src_format` and `bytes.len() >= src_row * h` (the `ensure!`s
+            // above), so `sws_scale` reads `h` rows of `src_row` bytes from `src_data[0] = bytes`
+            // (packed RGB is single-plane; the other src planes are null/0) — all in bounds. `sws` is
+            // the non-null context built in `open`. The dst is `frame`'s underlying `AVFrame`: its
+            // `data`/`linesize` in-struct arrays were sized for YUV444P by `VideoFrame::new`, and the
+            // 3 planes are each `width`×`height`. All pointers are live locals for this synchronous
+            // call; the encoder runs only on this thread (`unsafe impl Send`), so no aliasing/race.
+            unsafe {
+                let dst_av = frame.as_mut_ptr();
+                let src_data: [*const u8; 4] =
+                    [bytes.as_ptr(), ptr::null(), ptr::null(), ptr::null()];
+                let src_stride: [c_int; 4] = [src_row as c_int, 0, 0, 0];
+                let r = ffi::sws_scale(
+                    sws,
+                    src_data.as_ptr(),
+                    src_stride.as_ptr(),
+                    0,
+                    h as c_int,
+                    (*dst_av).data.as_ptr(),
+                    (*dst_av).linesize.as_ptr(),
+                );
+                if r < 0 {
+                    bail!("sws_scale(RGB→YUV444P) failed ({r})");
+                }
+            }
+            frame.set_pts(Some(pts));
+            frame.set_kind(if idr {
+                ffmpeg::picture::Type::I
+            } else {
+                ffmpeg::picture::Type::None
+            });
+            self.enc.send_frame(frame).context("send_frame(444)")?;
+            return Ok(());
+        }
         let frame = self
             .frame
             .as_mut()
@@ -526,3 +672,51 @@ impl NvencEncoder {
         Ok(())
     }
 }
+
+impl Drop for NvencEncoder {
+    fn drop(&mut self) {
+        if let Some(sws) = self.sws_444.take() {
+            // SAFETY: `sws` is the non-null `SwsContext` allocated by `sws_getContext` in `open` and
+            // owned exclusively by this encoder (taken out of the field so it can't be freed twice).
+            // `sws_freeContext` frees it; nothing else references it after this single-threaded drop.
+            unsafe { ffi::sws_freeContext(sws) };
+        }
+    }
+}
+
+/// Probe whether this NVIDIA GPU + driver + libavcodec can actually encode HEVC **4:4:4** (Range
+/// Extensions). Opens a tiny real `hevc_nvenc` 4:4:4 session — the exact path [`NvencEncoder::open`]
+/// takes for a live 4:4:4 stream — and reports whether it succeeded. HEVC-only; the result is cached
+/// by the caller ([`crate::encode::can_encode_444`]). A GPU/driver/ffmpeg without RExt 4:4:4 fails
+/// the open here, so the host resolves the session to 4:2:0 before the Welcome (honest downgrade).
+pub fn probe_can_encode_444(codec: Codec) -> bool {
+    if codec != Codec::H265 {
+        return false;
+    }
+    if ffmpeg::init().is_err() {
+        return false;
+    }
+    // Quiet ffmpeg's open error on a GPU that lacks 4:4:4 — the probe failing is an expected outcome.
+    // SAFETY: libav initialized above; `av_log_{get,set}_level` only read/write the global int level
+    // (no pointer args) and are always sound post-init.
+    let prev = unsafe {
+        let p = ffi::av_log_get_level();
+        ffi::av_log_set_level(ffi::AV_LOG_FATAL);
+        p
+    };
+    let ok = NvencEncoder::open(
+        codec,
+        PixelFormat::Bgra,
+        640,
+        480,
+        30,
+        2_000_000,
+        false, // CPU input (the 4:4:4 path never uses CUDA)
+        8,
+        ChromaFormat::Yuv444,
+    )
+    .is_ok();
+    // SAFETY: restore the saved global log level (scalar arg, no pointers).
+    unsafe { ffi::av_log_set_level(prev) };
+    ok
+}
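The `Drop` impl above frees an `Option<*mut SwsContext>` by taking the pointer out of the field before calling the C free function. The same ownership pattern can be exercised in isolation; the sketch below uses a hypothetical `FakeCtx` backed by a `Box` in place of libswscale, and a counter that exists only so the single free can be observed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts frees so the example can show the context is released exactly once.
pub static FREED: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an opaque C context such as `SwsContext` (hypothetical type).
pub struct FakeCtx {
    _state: u32,
}

/// Stand-in for the C allocator: hands out a raw, manually owned pointer.
fn fake_alloc() -> *mut FakeCtx {
    Box::into_raw(Box::new(FakeCtx { _state: 0 }))
}

/// Stand-in for the C free function.
///
/// # Safety
/// `p` must come from `fake_alloc` and must not have been freed already.
unsafe fn fake_free(p: *mut FakeCtx) {
    // SAFETY: the caller guarantees `p` is a live pointer from `Box::into_raw`.
    unsafe { drop(Box::from_raw(p)) };
    FREED.fetch_add(1, Ordering::SeqCst);
}

/// Owns an optional raw context, mirroring the `sws_444` field's shape.
pub struct Owner {
    ctx: Option<*mut FakeCtx>,
}

impl Owner {
    pub fn new(want_ctx: bool) -> Self {
        Owner { ctx: want_ctx.then(fake_alloc) }
    }

    pub fn has_ctx(&self) -> bool {
        self.ctx.is_some()
    }
}

impl Drop for Owner {
    fn drop(&mut self) {
        // `take()` leaves `None` behind, so the pointer cannot be freed twice.
        if let Some(p) = self.ctx.take() {
            // SAFETY: `p` came from `fake_alloc` in `new` and is owned only by `self`.
            unsafe { fake_free(p) };
        }
    }
}
```

An `Owner` built without a context drops without touching the free function, which corresponds to the ordinary 4:2:0 path where the field stays `None`.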
@@ -160,6 +160,18 @@ pub fn probe_can_encode(codec: Codec) -> bool {
     }
 }
 
+/// Whether the active VAAPI GPU can encode HEVC **4:4:4** (Range Extensions). **Deferred in v1 —
+/// always `false`.** VAAPI HEVC 4:4:4 encode is narrow and vendor-specific (the lab's AMD Phoenix1 /
+/// RDNA3 exposes only `VAProfileHEVCMain`/`Main10` `EncSlice`, no `Main444`), and there is no
+/// validated hardware to build + verify the 4:4:4 surface/profile path against. Returning `false`
+/// keeps the negotiation honest: a VAAPI host resolves every session to 4:2:0 before the Welcome, so
+/// the client never builds a 4:4:4 decoder it would only get 4:2:0 frames for. (Follow-up: implement
+/// + validate on an Intel Arc / RDNA4-class box that advertises a HEVC 4:4:4 encode entrypoint.)
+pub fn probe_can_encode_444(_codec: Codec) -> bool {
+    tracing::info!("VAAPI HEVC 4:4:4 encode is not implemented yet — declining (encoding 4:2:0)");
+    false
+}
+
 /// Drain the encoder for one packet (shared poll logic).
 fn poll_encoder(enc: &mut encoder::video::Encoder, fps: u32) -> Result<Option<EncodedFrame>> {
     let mut pkt = Packet::empty();
@@ -848,6 +860,7 @@ pub struct VaapiEncoder {
 unsafe impl Send for VaapiEncoder {}
 
 impl VaapiEncoder {
+    #[allow(clippy::too_many_arguments)]
     pub fn open(
         codec: Codec,
         format: PixelFormat,
@@ -856,10 +869,18 @@ impl VaapiEncoder {
         fps: u32,
         bitrate_bps: u64,
         bit_depth: u8,
+        chroma: super::ChromaFormat,
     ) -> Result<Self> {
         if bit_depth != 8 {
             tracing::warn!(bit_depth, "VAAPI 10-bit not yet wired — encoding 8-bit");
         }
+        // VAAPI 4:4:4 is deferred (see `probe_can_encode_444`): no validated AMD/Intel hardware in the
+        // lab exposes a HEVC 4:4:4 encode entrypoint, and the probe returns false so the host never
+        // negotiates 4:4:4 for a VAAPI session. If a request slips through, fall back to 4:2:0 rather
+        // than emit an unverified stream — the host signalled 4:2:0 in the Welcome anyway.
+        if chroma.is_444() {
+            tracing::warn!("VAAPI 4:4:4 encode not implemented — encoding 4:2:0");
+        }
         ffmpeg::init().context("ffmpeg init")?;
         if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
             // SAFETY: `av_log_set_level` sets libav's global integer log level; `48` (= AV_LOG_DEBUG)
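The doc comments in this file describe an "honest downgrade": the host asks a per-backend probe whether 4:4:4 is possible, the caller caches the answer, and the session is resolved to 4:2:0 before the Welcome when the probe declines. A minimal sketch of that flow follows; `resolve_chroma` and `cached_probe` are hypothetical names, the enums are local stand-ins for the crate's `Codec` and `ChromaFormat`, and the real `crate::encode::can_encode_444` is not shown in this diff.

```rust
use std::sync::OnceLock;

/// Local stand-in for the crate's codec enum (only the variants this sketch needs).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Codec {
    H264,
    H265,
}

/// Local stand-in for the crate's chroma enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChromaFormat {
    Yuv420,
    Yuv444,
}

/// Decide the chroma to announce in the Welcome: 4:4:4 only when it was
/// requested, the codec is HEVC, and the backend probe said yes.
pub fn resolve_chroma(requested: ChromaFormat, codec: Codec, backend_can_444: bool) -> ChromaFormat {
    if requested == ChromaFormat::Yuv444 && codec == Codec::H265 && backend_can_444 {
        ChromaFormat::Yuv444
    } else {
        ChromaFormat::Yuv420
    }
}

/// Run a possibly expensive probe at most once and reuse the stored answer.
pub fn cached_probe(cell: &OnceLock<bool>, probe: impl FnOnce() -> bool) -> bool {
    *cell.get_or_init(probe)
}
```

A backend whose probe always returns `false`, like the VAAPI one above, resolves every request to `Yuv420` under this rule, so the client is never told to expect a stream the encoder cannot produce.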
@@ -31,7 +31,7 @@
 // Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
 #![deny(clippy::undocumented_unsafe_blocks)]
 
-use super::{Codec, EncodedFrame, Encoder};
+use super::{ChromaFormat, Codec, EncodedFrame, Encoder};
 use crate::capture::{dxgi::D3d11Frame, CapturedFrame, FramePayload, PixelFormat};
 use anyhow::{anyhow, bail, Context, Result};
 use ffmpeg::format::Pixel;
@@ -241,6 +241,18 @@ unsafe fn open_win_encoder(
 /// driver/runtime rejects codecs the video engine can't do (AV1 on pre-RDNA3 AMD / pre-Arc Intel,
 /// or HEVC on a very old part). Used to build the GameStream codec advertisement so a client never
 /// negotiates a codec the encoder can't open. Torn down immediately.
+/// Whether the active AMD (AMF) / Intel (QSV) GPU can encode HEVC **4:4:4**. **Deferred in v1 —
+/// always `false`.** AMF/QSV HEVC 4:4:4 encode is narrow (AMD RDNA3+, Intel Arc/Xe2+) and the
+/// libavcodec profile/pixel-format incantation is vendor- and driver-specific — a wrong profile
+/// `avcodec_open2` *silently* falls back to 4:2:0, so a positive probe would need a verify-by-frame,
+/// and there is no AMD/Intel Windows box in the lab to build + validate that against. Returning
+/// `false` keeps the negotiation honest: an AMF/QSV host resolves every session to 4:2:0 before the
+/// Welcome. (Follow-up: implement + validate on an RDNA3+/Arc Windows box.)
+pub fn probe_can_encode_444(_vendor: WinVendor, _codec: Codec) -> bool {
+    tracing::info!("AMF/QSV HEVC 4:4:4 encode is not implemented yet — declining (encoding 4:2:0)");
+    false
+}
+
 pub fn probe_can_encode(vendor: WinVendor, codec: Codec) -> bool {
     if ffmpeg::init().is_err() {
         return false;
@@ -1096,6 +1108,7 @@ pub struct FfmpegWinEncoder {
 unsafe impl Send for FfmpegWinEncoder {}
 
 impl FfmpegWinEncoder {
     #[allow(clippy::too_many_arguments)]
+    #[allow(clippy::too_many_arguments)]
     pub fn open(
         vendor: WinVendor,
@@ -1106,7 +1119,15 @@ impl FfmpegWinEncoder {
         fps: u32,
         bitrate_bps: u64,
         bit_depth: u8,
+        chroma: ChromaFormat,
     ) -> Result<Self> {
+        // AMF/QSV 4:4:4 is deferred (see `probe_can_encode_444`): no validated AMD/Intel Windows
+        // hardware in the lab, and the AMF/QSV HEVC 4:4:4 profile/format incantations are vendor- and
+        // driver-specific (a wrong profile silently encodes 4:2:0). The probe returns false so the host
+        // never negotiates 4:4:4 for an AMF/QSV session; if a request slips through, fall back to 4:2:0.
+        if chroma.is_444() {
+            tracing::warn!("AMF/QSV 4:4:4 encode not implemented — encoding 4:2:0");
+        }
         ffmpeg::init().context("ffmpeg init")?;
         if std::env::var_os("PUNKTFUNK_FFMPEG_DEBUG").is_some() {
             // SAFETY: `ffmpeg::init()` ran on the line above, so libav is initialised; `av_log_set_level`
@@ -16,7 +16,7 @@
 // Every `unsafe` block / impl in this file carries a `// SAFETY:` proof; enforce it.
 #![deny(clippy::undocumented_unsafe_blocks)]
 
-use super::{Codec, EncodedFrame, Encoder, EncoderCaps};
+use super::{ChromaFormat, Codec, EncodedFrame, Encoder, EncoderCaps};
 use crate::capture::{CapturedFrame, FramePayload, PixelFormat};
 use anyhow::{anyhow, bail, Context, Result};
 use std::collections::{HashMap, VecDeque};
@@ -57,6 +57,15 @@ pub struct NvencD3d11Encoder {
     buffer_fmt: nv::NV_ENC_BUFFER_FORMAT,
     /// Encoded bit depth (8 or 10). 10 → HEVC Main10 (NVENC upconverts the 8-bit ARGB input).
     bit_depth: u8,
+    /// Full-chroma 4:4:4 (HEVC Range Extensions, `chroma_format_idc = 3`) requested for this session.
+    /// NVENC ingests the RGB (ARGB/ABGR10) input and CSCs it to YUV444 internally — the `FREXT` profile
+    /// + `chromaFormatIDC = 3` in the encode config carry the chroma. Gated on the GPU's
+    /// `NV_ENC_CAPS_SUPPORT_YUV444_ENCODE` (cleared in `query_caps` on a card that lacks it) and on an
+    /// RGB input format (NV12/P010 capture can't reconstruct 4:4:4). HEVC-only.
+    chroma_444: bool,
+    /// `NV_ENC_CAPS_SUPPORT_YUV444_ENCODE` from the caps probe — whether this GPU can 4:4:4 encode at
+    /// all. `chroma_444` is forced off when this is false (graceful downgrade to 4:2:0).
+    yuv444_supported: bool,
     /// HDR: the capturer is delivering BT.2020 PQ 10-bit (`PixelFormat::Rgb10a2`) frames. Sets the
     /// `ABGR10` input format + the BT.2020/PQ colour VUI. Derived per-frame from the capture format
     /// (HDR can toggle mid-session); a change re-inits the session.
@@ -103,6 +112,7 @@ pub struct NvencD3d11Encoder {
 unsafe impl Send for NvencD3d11Encoder {}
 
 impl NvencD3d11Encoder {
+    #[allow(clippy::too_many_arguments)]
     pub fn open(
         codec: Codec,
         _format: PixelFormat,
@@ -111,6 +121,7 @@ impl NvencD3d11Encoder {
         fps: u32,
         bitrate_bps: u64,
         bit_depth: u8,
+        chroma: ChromaFormat,
     ) -> Result<Self> {
         Ok(Self {
             encoder: ptr::null_mut(),
@@ -122,6 +133,9 @@ impl NvencD3d11Encoder {
             bitrate_bps,
             buffer_fmt: nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB,
             bit_depth,
+            // 4:4:4 is HEVC-only; the GPU-support gate is applied in `query_caps`.
+            chroma_444: chroma.is_444() && codec == Codec::H265,
+            yuv444_supported: false,
             hdr: false,
             hdr_meta: None,
             regs: HashMap::new(),
@@ -209,6 +223,7 @@ impl NvencD3d11Encoder {
         let wmax = self.get_cap(enc, nv::NV_ENC_CAPS::NV_ENC_CAPS_WIDTH_MAX);
         let hmax = self.get_cap(enc, nv::NV_ENC_CAPS::NV_ENC_CAPS_HEIGHT_MAX);
         let ten_bit = self.get_cap(enc, nv::NV_ENC_CAPS::NV_ENC_CAPS_SUPPORT_10BIT_ENCODE);
+        let yuv444 = self.get_cap(enc, nv::NV_ENC_CAPS::NV_ENC_CAPS_SUPPORT_YUV444_ENCODE);
         let rfi = self.get_cap(
             enc,
             nv::NV_ENC_CAPS::NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION,
@@ -235,6 +250,13 @@ impl NvencD3d11Encoder {
             self.bit_depth = 8;
             self.hdr = false;
         }
+        // Same for 4:4:4: a card without YUV444 encode falls back to 4:2:0. (The host already probed
+        // this via `probe_can_encode_444` before the Welcome, so this is a belt-and-braces guard.)
+        self.yuv444_supported = yuv444 != 0;
+        if self.chroma_444 && !self.yuv444_supported {
+            tracing::warn!("NVENC: this GPU can't 4:4:4 encode — falling back to 4:2:0");
+            self.chroma_444 = false;
+        }
         self.rfi_supported = rfi != 0;
         self.custom_vbv = custom_vbv != 0;
         tracing::info!(
@@ -313,9 +335,31 @@ impl NvencD3d11Encoder {
             cfg.encodeCodecConfig.hevcConfig.tier = 1;
             cfg.encodeCodecConfig.hevcConfig.level = 0;
 
-            // 10-bit HEVC Main10 (HDR foundation): NVENC upconverts the 8-bit input; 8-bit leaves the
-            // preset default (Main) untouched.
-            if self.bit_depth == 10 {
+            // Chroma + bit depth. Full-chroma 4:4:4 (HEVC Range Extensions) takes precedence and composes
+            // with 10-bit (Main 4:4:4 10): NVENC ingests the RGB input (ARGB / ABGR10) and CSCs it to
+            // YUV444 internally when `chromaFormatIDC = 3` under the FREXT profile. Only valid on an RGB
+            // input — a subsampled NV12/P010 source can't reconstruct full chroma (so the capturer is
+            // forced to RGB for a 4:4:4 session, and we guard on the input format here too).
+            //
+            // ON-GLASS TODO (RTX box): confirm ARGB + chromaFormatIDC=3 + FREXT yields a *true* 4:4:4
+            // stream. NVENC's RGB→YUV CSC is documented to honor chromaFormatIDC (unlike libavcodec's
+            // wrapper, which always subsamples RGB to 4:2:0 — hence the Linux path feeds planar YUV444
+            // instead). If on-glass shows 4:2:0, the follow-up is a BGRA→AYUV shader feeding the native
+            // `NV_ENC_BUFFER_FORMAT_AYUV` 4:4:4 input format.
+            let rgb_input = matches!(
+                self.buffer_fmt,
+                nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB
+                    | nv::NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ABGR10
+            );
+            if self.chroma_444 && rgb_input {
+                cfg.profileGUID = nv::NV_ENC_HEVC_PROFILE_FREXT_GUID;
+                cfg.encodeCodecConfig.hevcConfig.set_chromaFormatIDC(3);
+                if self.bit_depth == 10 {
+                    cfg.encodeCodecConfig.hevcConfig.set_pixelBitDepthMinus8(2); // Main 4:4:4 10
+                }
+            } else if self.bit_depth == 10 {
+                // 10-bit HEVC Main10 (HDR foundation): NVENC upconverts the 8-bit input; 8-bit leaves the
+                // preset default (Main) untouched.
                 cfg.profileGUID = nv::NV_ENC_HEVC_PROFILE_MAIN10_GUID;
                 cfg.encodeCodecConfig.hevcConfig.set_pixelBitDepthMinus8(2); // 10 - 8
             }
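The branch order in this hunk (4:4:4 first, then 10-bit, else the preset default) can be written as a pure decision function, which makes the precedence easy to test without a GPU. This is a sketch only: `HevcProfile`, `HevcChoice` and `choose_hevc` are hypothetical names, and where the real code leaves the preset defaults untouched the sketch spells out the equivalent values (`chroma_format_idc = 1` for 4:2:0, bit depth offset 0).

```rust
/// Which HEVC profile the session would be configured with (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HevcProfile {
    Main,
    Main10,
    Frext,
}

/// The three settings the hunk decides on, gathered in one value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HevcChoice {
    pub profile: HevcProfile,
    /// 1 = 4:2:0, 3 = 4:4:4 (HEVC `chroma_format_idc`).
    pub chroma_format_idc: u8,
    /// Encoded bit depth minus 8 (0 for 8-bit, 2 for 10-bit).
    pub bit_depth_minus8: u8,
}

/// 4:4:4 takes precedence and composes with 10-bit, but only on an RGB input;
/// otherwise 10-bit selects Main10, and 8-bit keeps Main.
pub fn choose_hevc(chroma_444: bool, rgb_input: bool, bit_depth: u8) -> HevcChoice {
    let ten_bit = bit_depth == 10;
    if chroma_444 && rgb_input {
        HevcChoice {
            profile: HevcProfile::Frext,
            chroma_format_idc: 3,
            bit_depth_minus8: if ten_bit { 2 } else { 0 },
        }
    } else if ten_bit {
        HevcChoice {
            profile: HevcProfile::Main10,
            chroma_format_idc: 1,
            bit_depth_minus8: 2,
        }
    } else {
        HevcChoice {
            profile: HevcProfile::Main,
            chroma_format_idc: 1,
            bit_depth_minus8: 0,
        }
    }
}
```

Keeping the decision separate from the NVENC config structs would also let the "non-RGB input silently stays 4:2:0" case be asserted directly.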
@@ -787,6 +831,9 @@ impl Encoder for NvencD3d11Encoder {
         EncoderCaps {
             supports_rfi: self.rfi_supported,
             supports_hdr_metadata: self.hdr,
+            // Reflects what the session actually configured (cleared in `query_caps` if the GPU lacks
+            // YUV444 encode), so the glue can confirm 4:4:4 vs the negotiated request.
+            chroma_444: self.chroma_444,
         }
     }
 
@@ -904,3 +951,69 @@ impl Drop for NvencD3d11Encoder {
         unsafe { self.teardown() };
     }
 }
+
+/// Probe whether the active NVIDIA GPU can encode HEVC **4:4:4** (`NV_ENC_CAPS_SUPPORT_YUV444_ENCODE`).
+/// Creates a throwaway hardware D3D11 device + NVENC session, queries the cap, and tears down. HEVC-only;
+/// the result is cached by the caller ([`crate::encode::can_encode_444`]) and read *before* the Welcome
+/// so the host advertises the chroma it can really encode (honest downgrade to 4:2:0 on a card without it).
+pub fn probe_can_encode_444(codec: Codec) -> bool {
+    use windows::Win32::Foundation::HMODULE;
+    use windows::Win32::Graphics::Direct3D::{D3D_DRIVER_TYPE_HARDWARE, D3D_FEATURE_LEVEL_11_0};
+    use windows::Win32::Graphics::Direct3D11::{
+        D3D11CreateDevice, D3D11_CREATE_DEVICE_BGRA_SUPPORT, D3D11_SDK_VERSION,
+    };
+    if codec != Codec::H265 {
+        return false;
+    }
+    // SAFETY: a self-contained probe owning every handle it creates. `D3D11CreateDevice` (HARDWARE
+    // driver, NULL adapter) fills `device` or returns Err (→ false). `open_encode_session_ex` opens an
+    // NVENC session against that device's raw pointer (valid while `device` is held) or errors (→ false,
+    // tearing nothing down). `get_encode_caps` reads one scalar cap into `val` via the loaded API table.
+    // `destroy_encoder` frees the session exactly once; `device`/its context drop with the COM wrappers.
+    // No handle escapes this call and nothing runs concurrently.
+    unsafe {
+        let mut device: Option<ID3D11Device> = None;
+        if D3D11CreateDevice(
+            None,
+            D3D_DRIVER_TYPE_HARDWARE,
+            HMODULE::default(),
+            D3D11_CREATE_DEVICE_BGRA_SUPPORT,
+            Some(&[D3D_FEATURE_LEVEL_11_0]),
+            D3D11_SDK_VERSION,
+            Some(&mut device),
+            None,
+            None,
+        )
+        .is_err()
+        {
+            return false;
+        }
+        let Some(device) = device else { return false };
+        let mut params = nv::NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS {
+            version: nv::NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER,
+            deviceType: nv::NV_ENC_DEVICE_TYPE::NV_ENC_DEVICE_TYPE_DIRECTX,
+            device: device.as_raw(),
+            apiVersion: nv::NVENCAPI_VERSION,
+            ..Default::default()
+        };
+        let mut enc: *mut c_void = ptr::null_mut();
+        if (API.open_encode_session_ex)(&mut params, &mut enc)
+            .result_without_string()
+            .is_err()
+        {
+            return false;
+        }
+        let mut param = nv::NV_ENC_CAPS_PARAM {
+            version: nv::NV_ENC_CAPS_PARAM_VER,
+            capsToQuery: nv::NV_ENC_CAPS::NV_ENC_CAPS_SUPPORT_YUV444_ENCODE,
+            reserved: [0; 62],
+        };
+        let mut val: i32 = 0;
+        let ok = (API.get_encode_caps)(enc, nv::NV_ENC_CODEC_HEVC_GUID, &mut param, &mut val)
+            .result_without_string()
+            .is_ok()
+            && val != 0;
+        let _ = (API.destroy_encoder)(enc);
+        ok
+    }
+}