punktfunk/crates/punktfunk-host/src/dmabuf_fence.rs
enricobuehler 92c6da9546
fix(capture/mutter): restore zero-copy + sync via dmabuf implicit fence
The previous attempt (8531135) dropped zero-copy on Mutter+NVIDIA for a sticky
CPU/SHM fallback that (a) still listed SPA_DATA_DmaBuf in its buffer types, so
Mutter kept handing dmabufs that got mmap-read UNsynced — making the flashing
worse, not better — and (b) hinged on producer explicit sync, which Mutter+NVIDIA
cannot do (`error alloc buffers` / no cogl sync_fd, confirmed in worker-3 logs).

Revert the capture restructure to the original zero-copy dmabuf path, and fix the
NVIDIA stale-frame race the RIGHT way for a producer that can't do explicit sync:
the consumer snapshots the dmabuf's implicit fence (DMA_BUF_IOCTL_EXPORT_SYNC_FILE)
and waits for the producer's render before sampling (new dmabuf_fence module, ioctl
number unit-tested). Covers the GPU import and the CPU mmap read. Logs once whether
a render was actually in flight (waited=true → the driver fences and the race is
closed; false → no implicit fence, so we learn zero-copy still needs SHM here).
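
A minimal sketch of the probe-then-wait pattern described above, not the code in
this commit. A Unix socket pair stands in for the exported sync_file fd, since a
real dmabuf needs a GPU. `poll_readable`, `wait_ready` and `demo` are hypothetical
names, and `poll(2)` is bound by hand so the sketch has no crate dependencies
(assumes Linux, where `nfds_t` is `unsigned long`, and Rust 1.82+ for
`unsafe extern`).

```rust
use std::ffi::{c_int, c_short, c_ulong};
use std::io::Write;
use std::os::fd::{AsRawFd, RawFd};
use std::os::unix::net::UnixStream;

/// Mirrors `struct pollfd` from <poll.h>.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: c_short,
    revents: c_short,
}

const POLLIN: c_short = 0x0001;

unsafe extern "C" {
    // poll(2); assumption: Linux, where nfds_t is `unsigned long`.
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: c_int) -> c_int;
}

/// Ok(true) if `fd` becomes readable within `timeout_ms` (0 = probe only).
fn poll_readable(fd: RawFd, timeout_ms: c_int) -> std::io::Result<bool> {
    let mut pfd = PollFd { fd, events: POLLIN, revents: 0 };
    let n = unsafe { poll(&mut pfd, 1, timeout_ms) };
    if n < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(n > 0)
}

/// Probe first, block only if not yet ready. Ok(true) means we had to wait
/// (or timed out); Ok(false) means the fd was already signaled.
fn wait_ready(fd: RawFd, timeout_ms: c_int) -> std::io::Result<bool> {
    if poll_readable(fd, 0)? {
        return Ok(false);
    }
    poll_readable(fd, timeout_ms)?; // block until ready or timeout
    Ok(true)
}

/// Stand-in demo: an empty socket is "pending", a written one is "signaled".
fn demo() -> std::io::Result<(bool, bool)> {
    let (mut producer, consumer) = UnixStream::pair()?;
    let before = wait_ready(consumer.as_raw_fd(), 10)?; // nothing written yet
    producer.write_all(b"x")?;
    let after = wait_ready(consumer.as_raw_fd(), 10)?; // data is waiting
    Ok((before, after))
}
```

The zero-timeout probe is what separates "already signaled" from "had to wait";
a single blocking poll would close the race just as well but could not report
which case occurred.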

drm_sync (the explicit-sync primitive) is kept and verified but marked unused —
no targeted compositor produces a usable sync_fd today; ready to wire in when one
does. The Bug-2 input fix (held-key release on disconnect) from 8531135 is kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:28:17 +00:00


//! Consumer-side implicit-fence wait for dmabuf capture (`DMA_BUF_IOCTL_EXPORT_SYNC_FILE`).
//!
//! Mutter renders its virtual monitor DIRECTLY into the PipeWire dmabuf and hands the buffer over
//! at GPU-submit time. With no fencing the consumer can sample mid-render and encode the buffer's
//! *previous* contents — the "stale/old frame" flashing on NVIDIA (KWin/gamescope blit into the
//! buffer so they don't hit this). The producer-driven fix is PipeWire explicit sync, but
//! Mutter+NVIDIA can't produce a sync_fd (`error alloc buffers` / no cogl sync_fd).
//!
//! So sync from the *consumer* side instead: a dmabuf carries its in-flight GPU work as an implicit
//! fence on its reservation object. `DMA_BUF_IOCTL_EXPORT_SYNC_FILE` snapshots that into a sync_file
//! fd we can `poll()` — readable once the producer's writes complete. This makes zero-copy capture
//! race-free WITHOUT the producer doing anything, *iff* the driver actually attaches the fence. If it
//! attaches none, the export yields an already-signaled sync_file (poll returns immediately): no
//! wait, no harm. A persistent `waited=false` suggests the driver doesn't fence (so zero-copy would
//! still race), though a single `false` can also mean the render had already finished.

use std::os::fd::RawFd;

// linux/dma-buf.h ioctls on the DMA_BUF_BASE ('b' = 0x62) magic. _IOWR = dir(3)<<30 | size<<16 | base<<8 | nr.
const DMA_BUF_BASE: u64 = 0x62;

const fn iowr(nr: u32, size: usize) -> u64 {
    (3u64 << 30) | ((size as u64) << 16) | (DMA_BUF_BASE << 8) | nr as u64
}

#[repr(C)]
struct DmaBufExportSyncFile {
    flags: u32,
    fd: i32,
}

const DMA_BUF_IOCTL_EXPORT_SYNC_FILE: u64 = iowr(2, std::mem::size_of::<DmaBufExportSyncFile>());

/// We will READ the buffer → export the fence(s) we must wait for before reading (the producer's writes).
const DMA_BUF_SYNC_READ: u32 = 1 << 0;

/// Wait until the producer's writes to `dmabuf_fd` complete (or `timeout_ms` elapses). Returns:
/// - `Ok(true)`: a render was still in flight and we waited on its fence (the race was real, now
///   closed), or `timeout_ms` elapsed before the fence signaled.
/// - `Ok(false)`: no fence, or the fence had already signaled. If this is all we ever see, the
///   driver attaches no implicit fence and zero-copy can race.
/// - `Err`: the ioctl failed (e.g. the kernel/driver lacks `EXPORT_SYNC_FILE`).
pub fn wait_read_ready(dmabuf_fd: RawFd, timeout_ms: i32) -> std::io::Result<bool> {
    let mut req = DmaBufExportSyncFile {
        flags: DMA_BUF_SYNC_READ,
        fd: -1,
    };
    let r = unsafe { libc::ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &mut req) };
    if r < 0 {
        return Err(std::io::Error::last_os_error());
    }
    let sync_fd = req.fd;
    if sync_fd < 0 {
        return Ok(false); // no sync_file exported
    }
    let mut pfd = libc::pollfd {
        fd: sync_fd,
        events: libc::POLLIN,
        revents: 0,
    };
    // Non-blocking probe: not-yet-signaled (poll==0) means the producer is still rendering.
    let pending = unsafe { libc::poll(&mut pfd, 1, 0) } == 0;
    if pending {
        pfd.revents = 0;
        unsafe { libc::poll(&mut pfd, 1, timeout_ms) }; // block until the render fence signals
    }
    unsafe { libc::close(sync_fd) };
    Ok(pending)
}

#[cfg(test)]
mod tests {
    use super::*;

    /// The ioctl number must match linux/dma-buf.h exactly; it's computed, so lock it down.
    #[test]
    fn ioctl_number_matches_dma_buf_h() {
        assert_eq!(DMA_BUF_IOCTL_EXPORT_SYNC_FILE, 0xC008_6202);
    }
}