docs(host): prove every unsafe block in the Linux FFI files + gate them (unsafe-proof program 2/N)
Continues the structural unsafe-proof program (every unsafe carries a documented
proof of soundness; the file gains #![deny(clippy::undocumented_unsafe_blocks)]
so it stays proven). This batch covers all 10 remaining pure-Linux files
(104 blocks), each proof stating the REAL invariant — not boilerplate:
  zerocopy/cuda.rs (26): leaked process-lifetime libcuda fn-ptr table; opaque CUcontext never dereferenced; free-exactly-once via the Arc<Mutex<PoolInner>> ownership graph; dmabuf fd take/close split
  zerocopy/egl.rs (18): eglGetProcAddress'd procs with the GL context current; EGLImage liveness; the two-call modifier-query bounds
  zerocopy/vulkan.rs (4): copy-bounds arithmetic (src_size>=span); Send = thread confinement to the punktfunk-pipewire thread
  dmabuf_fence.rs (4): poll/ioctl/close fd liveness + ownership
  capture/linux/mod.rs (16): spa_data repr(transparent) cast; null-checked spa derefs; single-loop-thread buffer ownership until requeue
  inject/linux/gamepad.rs (10): uinput ioctl request-number ↔ struct-size match (static-asserted); InputEventRaw no-padding for the byte cast
  encode/linux/vaapi.rs (15) + encode/linux/mod.rs (9): ffmpeg object ownership/free ladders; VAAPI/DRM graph; Send = single-thread transfer
  inject/linux/wlr.rs (2), vdisplay/linux/kwin.rs (1)
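A minimal sketch of the "ioctl request-number ↔ struct-size match (static-asserted)" idea, using the `_IOWR` bit layout quoted in the dmabuf_fence.rs comment in the diff (dir(3)<<30 | size<<16 | base<<8 | nr). The names `ExportSyncFile`, `iowr`, and `EXPORT_SYNC_FILE` are hypothetical, and the command number 2 is an assumption based on linux/dma-buf.h, not something this commit shows.

```rust
use std::mem::size_of;

/// Hypothetical mirror of the kernel's export-sync-file argument struct
/// (assumed layout: `u32 flags` followed by `s32 fd`).
#[repr(C)]
pub struct ExportSyncFile {
    pub flags: u32,
    pub fd: i32,
}

/// `_IOWR(base, nr, size)` with the bit layout from the diff comment:
/// dir(3)<<30 | size<<16 | base<<8 | nr.
pub const fn iowr(base: u8, nr: u8, size: usize) -> u64 {
    (3u64 << 30) | ((size as u64) << 16) | ((base as u64) << 8) | nr as u64
}

/// Request number built from the struct's real size (nr = 2 is an assumption).
pub const EXPORT_SYNC_FILE: u64 = iowr(b'b', 2, size_of::<ExportSyncFile>());

// Compile-time checks: the struct has the size the kernel is assumed to copy,
// and the size field encoded in the request number agrees with it.
const _: () = assert!(size_of::<ExportSyncFile>() == 8);
const _: () =
    assert!(((EXPORT_SYNC_FILE >> 16) & 0x3fff) as usize == size_of::<ExportSyncFile>());
```

If a field is later added to the struct, the first assertion fails at compile time instead of letting the kernel copy a different byte count than the Rust side provides.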
No memory-unsafety SUSPECT blocks were found — the unsafe is sound. The vaapi
agent did flag two real AVBufferRef *leaks* (not UB) in DmabufInner::open; marked
inline with NOTE(leak) and addressed in a follow-up.
Verified: cargo clippy -p punktfunk-host --all-targets -- -D warnings is clean
(each file's deny gate hard-errors on any undocumented block).
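For illustration, the gate plus proof pattern might look like this in a single module. This is a sketch, not code from the commit: `gated` and `read_u32_ne` are hypothetical names, and the lint is scoped to a `mod` block here, whereas the commit applies it at file scope.

```rust
// Hypothetical module standing in for one of the gated FFI files.
mod gated {
    // Same lint the commit enables; inside a `mod` block it covers this module only.
    #![deny(clippy::undocumented_unsafe_blocks)]

    /// Reads a native-endian `u32` from the first four bytes of `bytes`, if present.
    pub fn read_u32_ne(bytes: &[u8]) -> Option<u32> {
        if bytes.len() < 4 {
            return None;
        }
        // SAFETY: the length check above guarantees four readable bytes at
        // `bytes.as_ptr()`; `read_unaligned` has no alignment requirement, and
        // every bit pattern is a valid `u32`.
        Some(unsafe { bytes.as_ptr().cast::<u32>().read_unaligned() })
    }
}
```

Removing the `// SAFETY:` comment makes `cargo clippy` reject the block, which is the property the verification step above relies on.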
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -13,6 +13,9 @@
//! attaches none, the export yields an already-signaled sync_file (poll returns immediately) — no
//! wait, no harm, and `waited=false` tells us the driver doesn't fence (so zero-copy would still race).
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use std::os::fd::RawFd;
// linux/dma-buf.h ioctls on the DMA_BUF_BASE ('b' = 0x62) magic. _IOWR = dir(3)<<30 | size<<16 | base<<8 | nr.
@@ -40,6 +43,11 @@ pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<boo
flags: DMA_BUF_SYNC_READ,
fd: -1,
};
// SAFETY: `dmabuf_fd` is a live dmabuf fd supplied by the caller (borrowed for this call; we
// never close it). `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` encodes `size_of::<DmaBufExportSyncFile>()`
// — the exact byte count the kernel copies — and `&mut req` is a live, correctly-sized
// `#[repr(C)]` struct the EXPORT_SYNC_FILE ioctl reads (`flags`) and writes (`fd`). `req`
// outlives this synchronous call and is not aliased elsewhere.
let r = unsafe { libc::ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &mut req) };
if r < 0 {
return Err(std::io::Error::last_os_error());
@@ -54,11 +62,21 @@ pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<boo
revents: 0,
};
// Non-blocking probe: not-yet-signaled (poll==0) means the producer is still rendering.
// SAFETY: `&mut pfd` points at a single live `libc::pollfd` and `nfds == 1` matches that one
// element; `pfd.fd` is `sync_fd`, the sync_file fd just exported (already checked `>= 0`).
// `poll` reads `fd`/`events` and writes `revents` for this non-blocking (timeout 0) probe, then
// returns — `pfd` outlives the call and aliases nothing.
let pending = unsafe { libc::poll(&mut pfd, 1, 0) } == 0;
if pending {
pfd.revents = 0;
// SAFETY: same live single-element `pfd` (its `revents` reset to 0 just above), `nfds == 1`,
// and `sync_fd` still open. This blocking `poll` (up to `timeout_ms`) waits for the render
// fence to signal; it reads `fd`/`events`, writes `revents`, and returns before `pfd` ends.
unsafe { libc::poll(&mut pfd, 1, timeout_ms) }; // block until the render fence signals
}
// SAFETY: `sync_fd` is the sync_file fd the EXPORT_SYNC_FILE ioctl created and handed us to own;
// this point is reached only when `sync_fd >= 0`, this `close` runs exactly once on it, and it is
// never used afterward — no double-close or use-after-close.
unsafe { libc::close(sync_fd) };
Ok(pending)
}
@@ -11,6 +11,8 @@
//! thread) and ffmpeg's `hevc_nvenc` (encode thread); each thread makes it current before use.
#![allow(non_camel_case_types, non_snake_case)]
// Every `unsafe` block/impl below carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use anyhow::{bail, Result};
use std::os::raw::{c_int, c_uint, c_void};
@@ -128,8 +130,14 @@ struct CudaApi {
) -> CUresult,
cuDestroyExternalMemory: unsafe extern "C" fn(CUexternalMemory) -> CUresult,
}
// The resolved fn pointers are plain addresses into a process-lifetime mapping; safe to share.
// SAFETY: every field is a bare `extern "C" fn` address into the leaked, process-lifetime
// `libcuda` mapping (`cuda_api` `forget`s the `Library`, so it is never unloaded) — an immutable
// value with no interior mutability and no thread affinity. Moving the table to another thread
// cannot dangle (the code it points at stays mapped) or race (the fields are read-only).
unsafe impl Send for CudaApi {}
// SAFETY: as above — the table is a set of immutable fn-pointer addresses with no interior
// mutability, so concurrent shared reads from multiple threads cannot race; the driver entry
// points they address are themselves thread-safe.
unsafe impl Sync for CudaApi {}
/// `CUresult` returned by the wrappers when `libcuda` isn't loaded (no NVIDIA driver). Non-zero so
@@ -143,6 +151,14 @@ static CUDA_API: OnceLock<Option<CudaApi>> = OnceLock::new();
/// (the expected case on AMD/Intel hosts) — logged at debug, not an error.
fn cuda_api() -> Option<&'static CudaApi> {
CUDA_API
// SAFETY: `Library::new` runs `libcuda.so.1`'s initializers — it is the trusted NVIDIA
// driver library, so loading has no unexpected effects; `?`/`None` handle its absence.
// Each `lib.get::<T>(name)` asserts the symbol's real ABI equals `T`: every NUL-terminated
// name is a documented CUDA Driver API entry point and `T` is the exact
// `unsafe extern "C" fn(..)` signature from cuda.h/cudaGL.h (`_v2` for ctx/mem ops). Each
// `Symbol` only borrows `lib` until the end of the struct-literal statement; we deref-copy
// the raw fn-pointer out first, then `forget(lib)` leaks the mapping so those addresses
// stay valid for the whole process. Runs once under the `OnceLock` init — no aliasing.
.get_or_init(|| unsafe {
let lib = libloading::Library::new("libcuda.so.1")
.or_else(|_| libloading::Library::new("libcuda.so"))
@@ -361,6 +377,12 @@ pub fn read_plane_to_host(
Height: height,
..Default::default()
};
// SAFETY: `copy_blocking` is unsafe because it issues a CUDA copy; its contract is a valid
// descriptor with the shared context current (the caller's responsibility — self-test path).
// `&copy` is a live local `#[repr(C)] CUDA_MEMCPY2D` that outlives the synchronous call:
// `srcDevice`/`srcPitch` are the caller's live pitched device plane, `dstHost` addresses the
// freshly-allocated `host` `Vec` of exactly `width_bytes*height` bytes, and `WidthInBytes`×
// `Height` fit both. The copy is synchronous, so `host` is fully written before we return it.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(dev->host)")? };
Ok(host)
}
@@ -369,7 +391,13 @@ pub fn read_plane_to_host(
/// in a `OnceLock`; the raw `CUcontext` is thread-safe to make current from any thread.
#[derive(Clone, Copy)]
pub struct Context(pub CUcontext);
// SAFETY: `CUcontext` is an opaque CUDA driver handle, not a dereferenceable Rust pointer. It is
// created once and never destroyed (process lifetime), and the only thing done with it is
// `cuCtxSetCurrent`, which the Driver API explicitly allows from any thread — so transferring the
// handle to another thread cannot dangle or race (the driver owns the synchronization).
unsafe impl Send for Context {}
// SAFETY: as above — the wrapped handle is an immutable opaque address and the driver does all the
// synchronization, so sharing `&Context` across threads is sound.
unsafe impl Sync for Context {}
static CONTEXT: OnceLock<Context> = OnceLock::new();
@@ -382,6 +410,12 @@ pub fn context() -> Result<CUcontext> {
if cuda_api().is_none() {
bail!("libcuda.so.1 not available — no NVIDIA driver (CUDA zero-copy disabled)");
}
// SAFETY: we returned above unless `cuda_api()` is `Some`, so every wrapper here forwards into
// the live, leaked `libcuda` table rather than the not-loaded stub. `cuInit(0)` passes the
// API-required flags value 0. `&mut dev`/`&mut ctx` are live, zero/null-initialized stack
// out-params the driver writes the device handle / new context into; each outlives its
// synchronous call and they are distinct locals (no aliasing). `cuCtxCreate_v2` yields a valid
// `CUcontext` on success (`ck` bails otherwise), which becomes the block's value.
let ctx = unsafe {
ck(cuInit(0), "cuInit")?;
let mut dev: CUdevice = 0;
@@ -401,6 +435,10 @@ pub fn context() -> Result<CUcontext> {
/// Make the shared context current on the calling thread (required before any CUDA op here).
pub fn make_current() -> Result<()> {
let ctx = context()?;
// SAFETY: `ctx` came from `context()?`, so it is the live shared `CUcontext` and the driver
// table is present. `cuCtxSetCurrent` binds that opaque handle to the calling thread; it takes
// no Rust-memory pointer and is thread-safe (affects only this thread's current context), so
// there is no aliasing or lifetime hazard.
unsafe { ck(cuCtxSetCurrent(ctx), "cuCtxSetCurrent") }
}
@@ -423,6 +461,12 @@ fn copy_stream() -> CUstream {
if let Some(s) = cell.get() {
return s;
}
// SAFETY: `copy_stream` runs with the shared context current (its doc contract), so the
// wrappers forward into the live `libcuda` table. `&mut least`/`&mut greatest` are live
// stack `i32`s the driver fills with the priority range; `&mut s` is a live null-init
// `CUstream` the driver writes the new stream into. All out-params outlive their
// synchronous calls and are distinct locals. On any non-zero result we fall back to a null
// (NULL-stream) value and never read an uninitialized handle.
let stream = unsafe {
let (mut least, mut greatest) = (0i32, 0i32);
if cuCtxGetStreamPriorityRange(&mut least, &mut greatest) != 0 {
@@ -459,6 +503,11 @@ unsafe fn copy_blocking(copy: &CUDA_MEMCPY2D, what: &str) -> Result<()> {
fn alloc_pitched(width: u32, height: u32) -> Result<(CUdeviceptr, usize)> {
let mut ptr: CUdeviceptr = 0;
let mut pitch: usize = 0;
// SAFETY: `cuMemAllocPitch_v2` allocates a pitched device buffer (the wrapper forwards to the
// live table on any path that reached allocation). `&mut ptr` (`CUdeviceptr`) and `&mut pitch`
// (`usize`) are live, distinct stack out-params the driver writes the allocation pointer and
// its pitch into; both outlive the synchronous call. Width/height/element-size are by-value
// ints. No aliasing — two separate locals.
unsafe {
ck(
cuMemAllocPitch_v2(
@@ -486,6 +535,10 @@ fn alloc_pitched_nv12(
let mut y_pitch: usize = 0;
let mut uv_ptr: CUdeviceptr = 0;
let mut uv_pitch: usize = 0;
// SAFETY: two independent `cuMemAllocPitch_v2` calls (wrapper → live table). `&mut y_ptr`/
// `&mut y_pitch` and `&mut uv_ptr`/`&mut uv_pitch` are live, distinct stack out-params the
// driver writes each plane's pointer and pitch into; all outlive their synchronous calls. The
// dimension/element-size args are by-value ints. No aliasing — four separate locals.
unsafe {
ck(
cuMemAllocPitch_v2(
@@ -524,6 +577,13 @@ struct PoolInner {
impl Drop for PoolInner {
fn drop(&mut self) {
// SAFETY: the pool only exists because allocation succeeded, so the driver table is live.
// `PoolInner` drops only once every `DeviceBuffer` that referenced it (each holds an `Arc`
// clone) has been recycled, so `free`/`free_uv` hold every outstanding allocation exactly
// once and nothing else still uses them — no double-free or use-after-free. We make the
// shared context current first (drop may run off the allocating thread) so `cuMemFree_v2`
// targets the right context. Each `p` is a `CUdeviceptr` previously returned by
// `cuMemAllocPitch_v2`; results are ignored (best-effort teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -697,6 +757,12 @@ impl Drop for DeviceBuffer {
}
} else {
// The buffer may be freed on the encode thread; cuMemFree needs a current context.
// SAFETY: this is the un-pooled branch (`pool` is `None`), so this `DeviceBuffer`
// exclusively owns `self.ptr` (and `self.uv`'s `uv_ptr`), each returned by
// `cuMemAllocPitch_v2` and freed exactly once here — `drop` runs once and the
// `self.ptr == 0` guard above skips the sentinel/empty case, so no double-free. We set
// the shared context current first because drop may run on a thread where it isn't, and
// `cuMemFree_v2` needs it. Wrapper → live table; results ignored (teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -745,6 +811,16 @@ impl RegisteredTexture {
/// unmap. The copy is synchronized (on our priority stream) before unmap so `dst` is ready
/// before the source dmabuf is recycled. Always unmaps, even if the copy errors.
pub fn copy_mapped_to(&mut self, dst: &DeviceBuffer) -> Result<()> {
// SAFETY: `self.resource` is the valid `CUgraphicsResource` from a successful `register_gl`
// (its only constructor), so the wrappers forward to the live table; the caller holds the
// GL+CUDA contexts current (the registration's contract). `cuGraphicsMapResources` maps
// `count == 1` resource via `&mut self.resource` (a live field) on the default stream;
// `cuGraphicsSubResourceGetMappedArray` writes the mapped `CUarray` into the live local
// `array` (index 0, mip 0). On failure we unmap and bail (balanced). `&copy` is a live
// local `CUDA_MEMCPY2D` outliving the synchronous `copy_blocking`: `srcArray` is valid
// while mapped, `dstDevice`/`dstPitch` are `dst`'s live allocation, `width*4`×`height` fit
// both. `copy_blocking` syncs before we unmap, so the array stays valid through the copy;
// we always unmap afterward (even on error), keeping the map/unmap pair balanced.
unsafe {
ck(
cuGraphicsMapResources(1, &mut self.resource, std::ptr::null_mut()),
@@ -783,6 +859,14 @@ impl RegisteredTexture {
width_bytes: usize,
height: usize,
) -> Result<()> {
// SAFETY: identical contract to `copy_mapped_to` — `self.resource` is the valid
// `CUgraphicsResource` from `register_gl` (wrappers → live table; caller holds GL+CUDA
// contexts current). Map `count == 1` resource via the live `&mut self.resource`; the
// mapped `CUarray` is written into the live local `array` (index 0, mip 0); on failure we
// unmap and bail (balanced). `&copy` is a live local outliving the synchronous
// `copy_blocking`: `srcArray` valid while mapped, `dstDevice`/`dstPitch` are the caller's
// live plane, `width_bytes`×`height` fit it. We always unmap afterward, even on copy error,
// so the map/unmap pair stays balanced and the array outlives the copy.
unsafe {
ck(
cuGraphicsMapResources(1, &mut self.resource, std::ptr::null_mut()),
@@ -847,6 +931,10 @@ pub fn copy_device_to_device(
Height: src.height as usize,
..Default::default()
};
// SAFETY: `copy_blocking` is unsafe (issues a CUDA copy); the caller must have the shared
// context current (documented). `&copy` is a live local device→device `CUDA_MEMCPY2D` outliving
// the synchronous call: `srcDevice`/`srcPitch` are `src`'s live allocation, `dstDevice`/
// `dstPitch` the caller's live region, `width*4`×`height` within both. Wrapper → live table.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(dev->dev)") }
}
@@ -888,6 +976,12 @@ pub fn copy_nv12_to_device(
Height: h / 2,
..Default::default()
};
// SAFETY: two unsafe `copy_blocking` device→device copies; the caller must have the shared
// context current (documented). `&y`/`&uv` are live local `CUDA_MEMCPY2D`s outliving each
// synchronous call. All four device pointers are valid: `src.ptr`/`src_uv_ptr` come from a live
// NV12 `DeviceBuffer` (its `.uv` presence was checked via `ok_or_else`), `y_dst`/`uv_dst` are
// the caller's live NVENC surface planes; the luma copy is `w`×`h`, the chroma copy
// `(w/2)*2`×`h/2`, each within its planes. Wrappers → live table.
unsafe {
copy_blocking(&y, "cuMemcpy2DAsync_v2(nv12 Y dev->dev)")?;
copy_blocking(&uv, "cuMemcpy2DAsync_v2(nv12 UV dev->dev)")
@@ -897,6 +991,12 @@ pub fn copy_nv12_to_device(
impl Drop for RegisteredTexture {
fn drop(&mut self) {
if !self.resource.is_null() {
// SAFETY: `self.resource` is non-null (just checked) and is the valid
// `CUgraphicsResource` from `register_gl`, owned exclusively by this `RegisteredTexture`
// and unregistered exactly once here (drop runs once) — no use-after-free or
// double-unregister. `cuGraphicsUnregisterResource` releases the GL↔CUDA registration;
// wrapper → live table (the resource exists ⇒ the driver was present). Result ignored
// (best-effort teardown).
unsafe {
let _ = cuGraphicsUnregisterResource(self.resource);
}
@@ -913,7 +1013,11 @@ pub struct ExternalDmabuf {
pub size: u64,
}
// Raw driver handles; used from the single capture thread but moved with the importer.
// SAFETY: the fields are opaque CUDA driver handles — an external-memory handle and a device
// pointer — not dereferenceable Rust memory, and the value is uniquely owned (no `Clone`). It is
// used from a single capture thread but constructed on / moved between threads with the importer;
// transferring these handles is sound because uniqueness rules out aliasing and they are destroyed
// exactly once in `Drop`. Only `Send` (not `Sync`) is asserted, matching the single-thread use.
unsafe impl Send for ExternalDmabuf {}
impl ExternalDmabuf {
@@ -921,6 +1025,9 @@ impl ExternalDmabuf {
/// from then on) and map its full `size` bytes to a device pointer. The shared context
/// must be current.
pub fn import(fd: i32, size: u64) -> Result<ExternalDmabuf> {
// SAFETY: `libc::dup` only reads the integer `fd` and returns a new descriptor (or -1); it
// touches no Rust memory and `fd` is the caller's still-owned dmabuf fd (not consumed
// here). No aliasing or lifetime concern — a pure syscall on an integer.
let dup = unsafe { libc::dup(fd) };
if dup < 0 {
bail!("dup(dmabuf fd) failed");
@@ -938,8 +1045,17 @@ impl ExternalDmabuf {
};
desc.handle[0] = dup as u32 as u64; // union member `int fd` (little-endian low bytes)
let mut ext: CUexternalMemory = std::ptr::null_mut();
// SAFETY: `cuImportExternalMemory` imports the memory described by `&desc`, a live local
// `#[repr(C)] CUDA_EXTERNAL_MEMORY_HANDLE_DESC` (cuda.h 64-bit layout) that outlives this
// synchronous call: `type_` is OPAQUE_FD, `handle[0]` holds the dup'd fd in the union's
// `int fd` low bytes, `size` is set. `&mut ext` is a live null-init out-param the driver
// writes the imported handle into. The driver takes ownership of the fd only on success.
// Distinct locals → no aliasing. Wrapper → live table (caller holds the context current).
let r = unsafe { cuImportExternalMemory(&mut ext, &desc) };
if r != 0 {
// SAFETY: import failed (`r != 0`), so the driver did NOT take ownership of `dup`; we
// still own it and close it exactly once here on the error path (the success path never
// closes it — the driver does). `libc::close` acts on the integer fd alone.
unsafe { libc::close(dup) }; // import failed → the driver did not take the fd
bail!("cuImportExternalMemory failed ({r}) — LINEAR dmabuf import unsupported?");
}
@@ -949,8 +1065,17 @@ impl ExternalDmabuf {
..Default::default()
};
let mut ptr: CUdeviceptr = 0;
// SAFETY: maps a device pointer from `ext` (the valid `CUexternalMemory` just imported) per
// `&buf`, a live local `CUDA_EXTERNAL_MEMORY_BUFFER_DESC` (offset 0, full `size`) that
// outlives this synchronous call. `&mut ptr` is a live zero-init out-param the driver writes
// the mapped device address into; distinct locals → no aliasing. Wrapper → live table
// (context current).
let r = unsafe { cuExternalMemoryGetMappedBuffer(&mut ptr, ext, &buf) };
if r != 0 {
// SAFETY: mapping failed; `ext` is the valid `CUexternalMemory` we imported and
// exclusively own. We destroy it exactly once here on the error path (the success path
// instead moves it into the returned `ExternalDmabuf`, whose `Drop` destroys it),
// releasing the fd the driver took — no double-destroy or use-after-free.
unsafe {
let _ = cuDestroyExternalMemory(ext);
}
@@ -962,6 +1087,12 @@ impl ExternalDmabuf {
impl Drop for ExternalDmabuf {
fn drop(&mut self) {
// SAFETY: this `ExternalDmabuf` only exists after a successful import, so the driver table
// is live. It exclusively owns `self.ptr` (the mapped buffer) and `self.ext` (the external
// memory), each torn down exactly once here (drop runs once; guarded by `!= 0` / `!null`) —
// no double-free or use-after-free. We make the shared context current first because drop
// may run off the import thread, and we free the mapped buffer before destroying its
// backing external memory. Results ignored (best-effort teardown).
unsafe {
if let Some(c) = CONTEXT.get() {
let _ = cuCtxSetCurrent(c.0);
@@ -996,5 +1127,10 @@ pub fn copy_pitched_to_buffer(
};
// copy_blocking syncs our priority stream before returning, so the copy is complete before the
// dmabuf is requeued to the producer.
// SAFETY: `copy_blocking` is unsafe (issues a CUDA copy); the caller must have the shared
// context current (documented). `&copy` is a live local device→device `CUDA_MEMCPY2D` outliving
// the synchronous call: `srcDevice`/`srcPitch` are the caller's live mapped span (e.g. an
// `ExternalDmabuf`), `dstDevice`/`dstPitch` are `dst`'s live allocation, `width*4`×`height`
// within both. Wrapper → live table.
unsafe { copy_blocking(&copy, "cuMemcpy2DAsync_v2(ext->dev)") }
}
@@ -12,6 +12,8 @@
//! owned [`DeviceBuffer`] so the dmabuf can be returned to the compositor immediately.
#![allow(non_upper_case_globals)]
// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
#![deny(clippy::undocumented_unsafe_blocks)]
use super::cuda::{self, DeviceBuffer};
use anyhow::{bail, ensure, Context as _, Result};
@@ -415,6 +417,14 @@ impl Nv12Blit {
impl Drop for Nv12Blit {
fn drop(&mut self) {
// SAFETY: these GL names (textures/FBOs/VAO/programs) were all created by THIS `Nv12Blit`
// in `Nv12Blit::new` on the current GL context, which is still current because the owning
// `EglImporter` is dropped on its single capture thread (fields drop before
// `EglImporter::drop`, which never releases the context). `glDelete*` takes a count + a
// pointer to that many names: `&self.y_tex`/`&self.vao` are `&u32` to one live field (n=1);
// `[self.y_fbo, self.uv_fbo].as_ptr()` points at a 2-element temporary that lives for the
// whole `glDeleteFramebuffers` call (n=2 matches). The symbols dispatch through libGL
// (libglvnd) to the driver for the current context. Each name is deleted exactly once.
unsafe {
glDeleteTextures(1, &self.y_tex);
glDeleteTextures(1, &self.uv_tex);
@@ -459,7 +469,14 @@ pub struct EglImporter {
render_fd: c_int,
}
// The EGL handles are confined to the capture thread; the struct is moved there once.
// SAFETY: `EglImporter` owns thread-affine handles — an EGLDisplay/contexts made current on one
// thread, a loaded GL proc pointer, a `gbm_device*`, a raw fd, and CUDA-registered GL textures —
// none safe to touch concurrently. It is constructed inside `pipewire_thread` on the dedicated
// `punktfunk-pipewire` thread, and every method (`import*`, `supported_modifiers`, `Drop`) runs on
// that same thread; it is never accessed through a shared `&` from another thread. `Send` asserts
// only that transferring *ownership* is sound (needed so the importer can live in the PipeWire
// stream's user-data, whose API imposes a `Send` bound) — the live handles are never used
// off-thread. `Sync` is deliberately NOT implied.
unsafe impl Send for EglImporter {}
impl EglImporter {
@@ -470,16 +487,38 @@ impl EglImporter {
// to the same DRM device CUDA-GL interop associates with, which the EGL device platform
// did not (cuGraphicsGLRegisterImage rejected device-platform GL textures).
let path = std::ffi::CString::new("/dev/dri/renderD128").unwrap();
// SAFETY: `path` is a live local `CString` (built from a string with no interior NUL, so it
// is NUL-terminated); `path.as_ptr()` is a valid pointer to that buffer which outlives this
// synchronous `open`. `open` only reads the path and returns a new fd (or -1); it neither
// retains the pointer nor writes through it, so there is no aliasing or lifetime hazard.
let render_fd = unsafe { libc::open(path.as_ptr(), libc::O_RDWR | libc::O_CLOEXEC) };
ensure!(render_fd >= 0, "open /dev/dri/renderD128 for GBM");
// SAFETY: `render_fd` is the live DRM render-node fd just returned by `open` and checked
// `>= 0`. `gbm_create_device` (libgbm, linked above) builds a `gbm_device` over that fd and
// returns a `*mut gbm_device` (or null); it borrows but does not take ownership of the fd,
// which `EglImporter` keeps open and closes only in `Drop` after `gbm_device_destroy`. No
// Rust-owned memory is passed, so there is nothing to alias.
let gbm = unsafe { gbm_create_device(render_fd) };
if gbm.is_null() {
// SAFETY: reached only when `gbm_create_device` failed (null) — the fd was not consumed
// and no `EglImporter` exists yet to close it again, so this `close` runs exactly once on
// the live `render_fd`, releasing it before the error return. No double-close.
unsafe { libc::close(render_fd) };
anyhow::bail!("gbm_create_device failed");
}
// SAFETY: `Egl::load_required` dlopens the system libEGL and binds its entry points,
// trusting that libEGL (libglvnd) is a genuine EGL 1.5 implementation whose core symbols
// match the ABI the `khronos_egl` `EGL1_5` bindings declare. No Rust memory is passed; the
// returned instance is afterwards used only through the safe `khronos_egl` wrappers.
let egl: Egl =
unsafe { Egl::load_required() }.context("load libEGL (EGL 1.5 dynamic instance)")?;
// SAFETY: `gbm` is the non-null `gbm_device*` created just above (checked), and
// `EGL_PLATFORM_GBM_KHR` is exactly the platform enum that pairs with a GBM device as the
// native-display handle, so the `gbm as NativeDisplayType` cast hands EGL a valid native
// display for the requested platform. `&[egl::ATTRIB_NONE]` is a properly terminated, empty
// attribute array borrowed for this synchronous call; EGL only reads it and returns an
// `EGLDisplay`, retaining no pointer into Rust memory.
let display = unsafe {
egl.get_platform_display(
EGL_PLATFORM_GBM_KHR,
@@ -533,6 +572,13 @@ impl EglImporter {
.context("eglCreateContext(OpenGL)")?;
egl.make_current(display, None, None, Some(gl_ctx))
.context("eglMakeCurrent surfaceless (needs EGL_KHR_surfaceless_context)")?;
// SAFETY: the GL context was made current on this thread just above, which `eglGetProcAddress`
// requires to return a usable pointer. The non-null (`?`-checked) pointer it returns for
// "glEGLImageTargetTexture2DOES" is the driver's implementation of that GL-OES entry point,
// whose real ABI is `void(GLenum, GLeglImageOES)` = `(u32, *mut c_void)` `extern "system"`.
// `EglImageTargetFn` is declared with exactly that signature, so the transmute only retypes a
// same-size, same-ABI thin function pointer (no value/representation change). The function is
// present because `EGL_EXT_image_dma_buf_import` was asserted on this display above.
let egl_image_target: EglImageTargetFn = unsafe {
std::mem::transmute(
egl.get_proc_address("glEGLImageTargetTexture2DOES")
@@ -543,6 +589,10 @@ impl EglImporter {
// Create the shared CUDA context up front so import() is pure hot path.
cuda::context().context("create CUDA context")?;
// SAFETY: `egl::NO_CONTEXT` is EGL's defined sentinel (a null handle) for "no context";
// `Context::from_ptr` only stores the handle (it never dereferences it), so wrapping the
// null sentinel is sound and yields exactly the `EGL_NO_CONTEXT` value that
// `eglCreateImage(EGL_LINUX_DMA_BUF_EXT)` requires as its context argument later.
let no_ctx = unsafe { egl::Context::from_ptr(egl::NO_CONTEXT) };
tracing::info!(
"zero-copy EGL importer ready (GBM platform + GL texture interop, dma_buf_import + modifiers)"
@@ -602,8 +652,21 @@ impl EglImporter {
let Some(sym) = self.egl.get_proc_address("eglQueryDmaBufModifiersEXT") else {
return Vec::new();
};
// SAFETY: `sym` is the non-null pointer `eglGetProcAddress("eglQueryDmaBufModifiersEXT")`
// returned (the `let-else` already bailed on `None`) — the driver's implementation of that
// EGL extension entry point. `QueryFn` is declared with that function's exact documented ABI
|
||||
// (`EGLDisplay, EGLint, EGLint, EGLuint64* , EGLBoolean*, EGLint* -> EGLBoolean`), all
|
||||
// `extern "system"`, so the transmute only retypes a same-size, same-ABI thin fn pointer.
|
||||
let query: QueryFn = unsafe { std::mem::transmute(sym) };
|
||||
let dpy = self.display.as_ptr();
|
||||
// SAFETY: `dpy` is this importer's live, initialized `EGLDisplay`; `query` is the proc loaded
|
||||
// just above. The first call passes null out-arrays with `max_modifiers == 0`, which the
|
||||
// extension defines as "write only the count" — it writes solely through `&mut count` (a live
|
||||
// local `i32`). For the second call, `mods`/`ext` are freshly allocated `Vec`s of exactly
|
||||
// `count` elements and `max_modifiers == count`, so the driver writes at most `count`
|
||||
// `u64`/`u32` entries (in bounds) plus the actual count through `&mut n` (a live local). All
|
||||
// four Rust addresses outlive these synchronous calls and alias nothing else. `truncate` only
|
||||
// shrinks, so even a misbehaving `n > count` cannot read out of bounds.
|
||||
unsafe {
|
||||
let mut count: i32 = 0;
|
||||
if query(
|
||||
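The count-then-fill contract that the second SAFETY comment above relies on can be illustrated in isolation. The sketch below is not the importer's code: `mock_query` is a hypothetical stand-in for `eglQueryDmaBufModifiersEXT` (reduced to a single out-array, with made-up modifier values), and `query_modifiers` is an assumed wrapper name. It only shows why sizing the buffer from the first call and truncating to the reported count keeps every access in bounds.

```rust
/// Hypothetical stand-in for a driver query with a "count, then fill" contract:
/// with `max == 0` (or a null array) it writes only `*count`; otherwise it
/// writes at most `max` entries through `out` and reports how many it wrote.
///
/// # Safety
/// If `max > 0` and `out` is non-null, `out` must point to at least `max`
/// writable `u64` slots.
unsafe fn mock_query(max: i32, out: *mut u64, count: &mut i32) -> bool {
    // Made-up modifier values, for illustration only.
    const AVAILABLE: [u64; 3] = [0x0, 0x0100_0000_0000_0001, 0x0100_0000_0000_0002];
    if max <= 0 || out.is_null() {
        *count = AVAILABLE.len() as i32;
        return true;
    }
    let n = AVAILABLE.len().min(max as usize);
    // SAFETY: the caller guarantees `max` writable slots and `n <= max`.
    unsafe { std::ptr::copy_nonoverlapping(AVAILABLE.as_ptr(), out, n) };
    *count = n as i32;
    true
}

/// Two-call pattern: ask for the count, allocate exactly that many slots,
/// then ask again with `max == count`.
fn query_modifiers() -> Vec<u64> {
    let mut count: i32 = 0;
    // SAFETY: null out-array with `max == 0`, so only `count` is written.
    let ok = unsafe { mock_query(0, std::ptr::null_mut(), &mut count) };
    if !ok || count <= 0 {
        return Vec::new();
    }
    let mut mods = vec![0u64; count as usize];
    let mut n: i32 = 0;
    // SAFETY: `mods` has exactly `count` elements and `max == count`.
    let ok = unsafe { mock_query(count, mods.as_mut_ptr(), &mut n) };
    if !ok {
        return Vec::new();
    }
    // `truncate` only shrinks, so a bogus `n > count` is harmless.
    mods.truncate(n.max(0) as usize);
    mods
}
```

Because the second call never receives a `max` larger than the allocation, the bounds argument does not depend on the driver reporting the same count twice.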
@@ -699,6 +762,10 @@ impl EglImporter {
             ]);
         }
         attrs.push(egl::ATTRIB_NONE);
+        // SAFETY: `eglCreateImage(EGL_LINUX_DMA_BUF_EXT, ...)` mandates a NULL `EGLClientBuffer`
+        // (the source is described entirely by the attribute list built above), so wrapping
+        // `null_mut()` is the required value. `from_ptr` only stores the pointer without
+        // dereferencing it, so constructing it from null is sound.
         let client = unsafe { egl::ClientBuffer::from_ptr(std::ptr::null_mut()) };
         let image = self
             .egl
@@ -733,11 +800,21 @@ impl EglImporter {
     ) -> Result<DeviceBuffer> {
         cuda::make_current()?;
         if self.blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
+            // SAFETY: `GlBlit::new` requires the GL context current on the calling thread and a
+            // current CUDA context. Both hold: this runs on the capture thread where
+            // `EglImporter::new` made the GL context current and never released it, and
+            // `cuda::make_current()?` ran at the top of this function. `width`/`height` are plain
+            // `Copy` frame dimensions.
             self.blit = Some(unsafe { GlBlit::new(width, height)? });
         }
         let egl_image_target = self.egl_image_target;
         let blit = self.blit.as_mut().unwrap();
-        // SAFETY: GL + CUDA contexts current on this thread; `image` is a valid EGLImage.
+        // SAFETY: `GlBlit::run` requires a current GL context and a valid `EGLImage`. The GL context
+        // is current on this capture thread (made current in `EglImporter::new`, never released) and
+        // `cuda::make_current()` ran above; `egl_image_target` is the `glEGLImageTargetTexture2DOES`
+        // pointer loaded in `new`; `image` is the raw handle of the live `EGLImage` that
+        // `import_inner` created with `eglCreateImage` and destroys only AFTER this call returns, so
+        // it stays valid for the whole synchronous `run`.
         unsafe { blit.run(egl_image_target, image)? };
         // Persistent registration (mapped per frame) + a pooled buffer — no per-frame
         // cuGraphicsGLRegisterImage / cuMemAllocPitch.
@@ -757,11 +834,21 @@ impl EglImporter {
     ) -> Result<DeviceBuffer> {
         cuda::make_current()?;
         if self.nv12_blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
+            // SAFETY: `Nv12Blit::new` requires the GL context current on the calling thread and a
+            // current CUDA context. Both hold: this runs on the capture thread where
+            // `EglImporter::new` made the GL context current and never released it, and
+            // `cuda::make_current()?` ran at the top of this function. `width`/`height` are plain
+            // `Copy` frame dimensions.
             self.nv12_blit = Some(unsafe { Nv12Blit::new(width, height)? });
         }
         let egl_image_target = self.egl_image_target;
         let blit = self.nv12_blit.as_mut().unwrap();
-        // SAFETY: GL + CUDA contexts current on this thread; `image` is a valid EGLImage.
+        // SAFETY: `Nv12Blit::run` requires a current GL context and a valid `EGLImage`. The GL
+        // context is current on this capture thread (made current in `EglImporter::new`, never
+        // released) and `cuda::make_current()` ran above; `egl_image_target` is the
+        // `glEGLImageTargetTexture2DOES` pointer loaded in `new`; `image` is the raw handle of the
+        // live `EGLImage` that `import_inner` created with `eglCreateImage` and destroys only AFTER
+        // this call returns, so it stays valid for the whole synchronous `run`.
         unsafe { blit.run(egl_image_target, image)? };
         let dst = blit.pool.get()?;
         cuda::copy_mapped_nv12(&mut blit.y_registered, &mut blit.uv_registered, &dst)?;
@@ -787,9 +874,22 @@ impl EglImporter {
         );
         cuda::make_current()?;
         if self.nv12_blit.as_ref().map(|b| (b.width, b.height)) != Some((width, height)) {
+            // SAFETY: `Nv12Blit::new` requires the GL context current on the calling thread and a
+            // current CUDA context. Both hold: this self-test path runs on the thread that owns this
+            // `EglImporter` with its GL context current, and `cuda::make_current()?` ran just above.
+            // `width`/`height` are plain `Copy` scalars.
             self.nv12_blit = Some(unsafe { Nv12Blit::new(width, height)? });
         }
         let blit = self.nv12_blit.as_mut().unwrap();
+        // SAFETY: runs on the thread that owns this `EglImporter` with its GL context current.
+        // `blit.src_tex` is a texture this `Nv12Blit` owns; `glTexStorage2D` allocates immutable
+        // RGBA8 storage exactly once (guarded by `test_src_storage`) sized `width×height`.
+        // `glTexSubImage2D` then uploads exactly `width×height` RGBA8 texels, reading `width*height*4`
+        // bytes from `rgba.as_ptr()`; the caller already asserted `rgba.len() == width*height*4`, rows
+        // are `width*4` bytes (a multiple of the default 4-byte unpack alignment, so no row-padding
+        // over-read), and `rgba` is a live borrow that outlives this synchronous upload. `run_passes`
+        // then needs only the current GL context (no further Rust pointers). All GL names are this
+        // blit's own, alias no other live object, and nothing is retained past the calls.
         unsafe {
             // Upload the host RGBA into `src_tex` (an immutable GL_RGBA8 backing must exist first;
             // the live path never allocates it — it retargets `src_tex` via EGLImage instead).
@@ -824,9 +924,16 @@ impl EglImporter {
 impl Drop for EglImporter {
     fn drop(&mut self) {
         if !self.gbm.is_null() {
+            // SAFETY: `self.gbm` is the non-null `gbm_device*` from `gbm_create_device` in `new`
+            // (checked non-null here), owned exclusively by this `EglImporter` and destroyed exactly
+            // once (in `Drop`). It is freed BEFORE `render_fd` is closed below — the correct order,
+            // since the device borrowed that fd for its lifetime.
             unsafe { gbm_device_destroy(self.gbm) };
         }
         if self.render_fd >= 0 {
+            // SAFETY: `self.render_fd` is the fd `open` returned in `new` (checked `>= 0`), owned
+            // exclusively by this `EglImporter`; this `close` runs exactly once, after the gbm device
+            // that borrowed it has been destroyed. No double-close or use-after-close.
             unsafe { libc::close(self.render_fd) };
         }
     }
 }
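The teardown order argued in the `Drop` proof above (destroy the object that borrows the fd first, close the fd second) can be sketched without any FFI. Everything here is illustrative: `Importer`, its `device` and `fd` fields, and the logged strings are assumptions standing in for the `gbm_device*` and `render_fd`. No real handle is opened or closed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Illustrative owner of two dependent resources. `device` stands in for a
/// handle that borrows `fd` for its whole lifetime.
struct Importer {
    device: Option<u32>,
    fd: i32,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Importer {
    fn drop(&mut self) {
        // Dependent object first: `take()` makes a second destroy impossible.
        if self.device.take().is_some() {
            self.log.borrow_mut().push("destroy device");
        }
        // Only now release the fd the device was borrowing.
        if self.fd >= 0 {
            self.log.borrow_mut().push("close fd");
            self.fd = -1;
        }
    }
}

/// Drops one `Importer` and returns the order in which its resources went away.
fn teardown_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    drop(Importer { device: Some(1), fd: 3, log: Rc::clone(&log) });
    let order = log.borrow().to_vec();
    order
}
```

Recording the order in a shared log is only a testing device; the point is that the guards (`take()`, `fd >= 0`) encode the exactly-once and child-before-parent invariants the SAFETY comments state in prose.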
@@ -16,6 +16,9 @@
 //! a stream's life). Falls back cleanly: any init/import error disables the importer and the
 //! CPU mmap path takes over.

+// Every `unsafe` block in this file carries a `// SAFETY:` proof; enforce it (unsafe-proof program).
+#![deny(clippy::undocumented_unsafe_blocks)]
+
 use super::cuda::{self, DeviceBuffer};
 use anyhow::{anyhow, bail, Context as _, Result};
 use ash::vk;
@@ -51,12 +54,27 @@ pub struct VkBridge {
     dst: Option<DstBuf>,
 }

-// Confined to the capture thread; moved there once.
+// SAFETY: `VkBridge` owns ash Vulkan handles (instance/device/queue/command pool+buffer/fence), a
+// CUDA external-memory mapping, and an fd→buffer cache — none `Sync`, and a single queue +
+// command buffer must be externally synchronized. It is created inside `EglImporter::import_linear`
+// on the dedicated `punktfunk-pipewire` capture thread and every method (`import_linear`, `Drop`)
+// runs on that thread; it is never shared via `&` across threads. `Send` asserts only that
+// transferring ownership is sound (so the bridge can live inside the `Send` `EglImporter`); the live
+// handles are never touched off-thread, and `Sync` is deliberately NOT implied.
 unsafe impl Send for VkBridge {}

 impl VkBridge {
     /// Bring up Vulkan on the NVIDIA GPU with the external-memory extensions.
     pub fn new() -> Result<VkBridge> {
+        // SAFETY: standard ash bring-up — every call is `unsafe` only because ash cannot statically
+        // verify Vulkan handle/CreateInfo validity. `ash::Entry::load` dlopens a real system
+        // libvulkan. Each `*CreateInfo`/`AllocateInfo` is built by ash's builders from locals (`app`,
+        // `exts`, `prio`, `qci`, and the inline infos) that all live for the duration of the
+        // synchronous `create_*`/`enumerate_*` call that reads them — in particular the
+        // `enabled_extension_names(&exts)` and `queue_priorities(&prio)` borrows outlive their calls.
+        // Every handle passed (`instance`, `phys`, `device`, `qf`, `cmd_pool`) was just created and
+        // checked via `?`/`ok_or_else` in this same function, so no invalid handle is ever used. This
+        // constructor shares nothing across threads.
         unsafe {
             let entry = ash::Entry::load().context("load libvulkan")?;
             let app = vk::ApplicationInfo::default().api_version(vk::API_VERSION_1_1);
@@ -294,6 +312,19 @@ impl VkBridge {
         height: u32,
         pool: &cuda::BufferPool,
     ) -> Result<DeviceBuffer> {
+        // SAFETY: `fd` is the live dmabuf fd handed in by the caller (borrowed; `import_src` dup's it
+        // internally and Vulkan owns the dup). `libc::lseek` only queries the fd's size. The unsafe
+        // `import_src`/`ensure_dst` are called with a valid fd and a checked size. The bounds are
+        // proven: `import_src` asserts `size >= span` (so the cached `src_size >= span`),
+        // `copy_size = src_size.min(span)`, and `ensure_dst(copy_size)` makes `dst` at least
+        // `copy_size` — so the GPU `cmd_copy_buffer` of `copy_size` bytes reads/writes within both
+        // buffers, and the later CUDA pitched copy reading `[offset, span)` from `dst.cuda.ptr` (=
+        // `offset + stride*height = span <= copy_size`) stays inside the freshly-copied region. The
+        // `*Info`/`region`/`cmds`/`submit` are locals that outlive the synchronous calls reading them.
+        // `cmd`/`queue`/`fence` are this bridge's own handles, used on this single thread only. The
+        // host-side `wait_for_fences` fully retires the Vulkan copy BEFORE CUDA reads the shared
+        // memory, so there is no GPU write/read data race. `dst` is an `&self.dst` shared borrow that
+        // does not alias the `&self.device` calls.
         unsafe {
             let span = offset as u64 + stride as u64 * height as u64;
             if !self.src_cache.contains_key(&fd) {
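The copy-bounds arithmetic in the SAFETY comment above can be checked as plain integer code. This is a sketch under assumptions, not the bridge's implementation: `copy_plan` is a hypothetical helper, and it returns `None` where the comment says `import_src` rejects a buffer smaller than `span`.

```rust
/// Returns `(span, copy_size)` for a linear plane, or `None` if the source
/// buffer is too small to contain it.
///
/// `span` is the end of the last row: `offset + stride * height`. Widening the
/// `u32` inputs to `u64` first means the arithmetic cannot overflow.
fn copy_plan(offset: u32, stride: u32, height: u32, src_size: u64) -> Option<(u64, u64)> {
    let span = offset as u64 + stride as u64 * height as u64;
    if src_size < span {
        // Mirrors the `size >= span` precondition stated in the comment.
        return None;
    }
    // With `src_size >= span` established, the minimum is always `span`,
    // so a destination of `copy_size` bytes covers `[offset, span)`.
    let copy_size = src_size.min(span);
    Some((span, copy_size))
}
```

For a 1920×1080 RGBA8 plane with a tight stride of 7680 bytes and zero offset, `span` is 8 294 400 bytes; a larger source (for example one padded to a page multiple) still yields `copy_size == span`, which is why the later read of `[offset, span)` stays inside the copied region.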
@@ -347,6 +378,15 @@ impl VkBridge {

 impl Drop for VkBridge {
     fn drop(&mut self) {
+        // SAFETY: runs once when the bridge is dropped on its owning capture thread.
+        // `device_wait_idle` first drains all in-flight GPU work, so no queued command still
+        // references these objects. Every handle freed (the `src_cache` buffers+memories, the `dst`
+        // buffer+memory, `fence`, `cmd_pool`, `device`, `instance`) was created by this `VkBridge`
+        // and owned exclusively by it, so each `destroy_*`/`free_*` runs exactly once with no
+        // double-free, in dependency order (child objects before `device`, `device` before
+        // `instance`). `dst.cuda` is dropped after `free_memory`, which is safe because CUDA holds
+        // its own dup'd OPAQUE_FD reference to the underlying allocation. No other thread touches
+        // these handles.
         unsafe {
             let _ = self.device.device_wait_idle();
             for (_, s) in self.src_cache.drain() {