feat(windows): pf-vdisplay IDD-push — HDR + pipelined zero-copy capture
apple / swift (push) Successful in 1m4s
windows-host / package (push) Successful in 6m28s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Successful in 1m14s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Successful in 1m10s
release / apple (push) Successful in 7m53s
android / android (push) Successful in 10m33s
ci / web (push) Successful in 44s
windows / build (aarch64-pc-windows-msvc) (push) Successful in 3m4s
ci / docs-site (push) Successful in 53s
ci / rust (push) Successful in 12m22s
windows / build (x86_64-pc-windows-msvc) (push) Successful in 1m11s
apple / screenshots (push) Successful in 5m24s
deb / build-publish (push) Successful in 3m16s
decky / build-publish (push) Successful in 21s
ci / bench (push) Successful in 4m42s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 27s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m34s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m42s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m13s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 47s
flatpak / build-publish (push) Successful in 4m24s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m5s
docker / deploy-docs (push) Successful in 25s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 7m44s
HDR (display-driven, matching the WGC path):
- CTA-861.3 HDR EDID (BT.2020 primaries + HDR Static Metadata block) so Windows offers "Use HDR" on the virtual display. The host FOLLOWS the display's live advanced-color state, recreating the shared ring at the matching format (FP16 in HDR / BGRA in SDR) on a toggle — no freeze.
- Always emit Main10/BT.2020-PQ Rgb10a2 while the display is HDR; the client auto-detects PQ from the HEVC VUI (clients under-report VIDEO_CAP_10BIT). Generic HDR10 mastering SEI on every IDR.
- Generation-tagged `latest` (gen<<40|seq<<8|slot) + driver `is_stale` re-attach kill the toggle-time garbage frame and any stale-ring read.

Perf:
- Pipeline the encode loop (Capturer::pipeline_depth; IDD-push = 2): submit N+1 before polling N so the convert/copy on the 3D engine overlaps the NVENC encode of N on the ASIC. PUNKTFUNK_IDD_DEPTH overrides (1 = synchronous).
- Rotating host output ring (OUT_RING) so the in-flight encode and the next convert never touch the same texture.
- HDR converts directly from the keyed-mutex slot's SRV into the output ring (drops the redundant slot->fp16 scratch copy); SDR copies the BGRA slot in. The slot mutex is held only across the convert/copy, not the encode. RING_LEN 3->6 for publish headroom.
- Capture-health diagnostic: new_fps vs repeat_fps under PUNKTFUNK_PERF (a low new_fps at a high send rate means the source isn't compositing, not an encode stall).

Validated live on the RTX box: 5120x1440@240 HDR streams; driver composes ~180 new fps, encode 240 fps @ ~4.3 ms p50.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
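The generation-tagged `latest` word from the commit message (`gen<<40|seq<<8|slot`) can be illustrated with a small packing sketch. Only the shift amounts come from the message; the function names, the 64-bit container, and the resulting field widths (8-bit slot, 32-bit sequence, 24-bit generation) are assumptions made for illustration and may not match the real driver or host code.

```rust
// Hypothetical layout inferred from the shifts `gen<<40 | seq<<8 | slot`:
//   bits  0..8   ring slot index
//   bits  8..40  frame sequence number
//   bits 40..64  ring generation (bumped each time the ring is recreated)
const GEN_MASK: u32 = 0x00FF_FFFF; // only 24 bits fit above bit 40 in a u64

/// Pack generation, sequence and slot into one word that could be published atomically.
fn pack_latest(generation: u32, seq: u32, slot: u8) -> u64 {
    (u64::from(generation & GEN_MASK) << 40) | (u64::from(seq) << 8) | u64::from(slot)
}

/// Split a packed word back into (generation, seq, slot).
fn unpack_latest(latest: u64) -> (u32, u32, u8) {
    let slot = (latest & 0xFF) as u8;
    let seq = ((latest >> 8) & 0xFFFF_FFFF) as u32;
    let generation = (latest >> 40) as u32;
    (generation, seq, slot)
}

/// A reader attached to `attached_generation` treats any word carrying a
/// different generation as stale and would re-attach instead of reading the slot.
fn is_stale(latest: u64, attached_generation: u32) -> bool {
    unpack_latest(latest).0 != (attached_generation & GEN_MASK)
}
```

Under these assumptions, a reader that attached at generation 3 accepts `pack_latest(3, 7, 2)` and rejects `pack_latest(4, 0, 0)`, which is the behaviour the message relies on to drop frames from a ring that was recreated on an HDR toggle.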
@@ -21,6 +21,7 @@ features = [
|
||||
"Win32_Security",
|
||||
"Win32_System_SystemServices",
|
||||
"Win32_System_Threading",
|
||||
"Win32_System_Memory",
|
||||
"Win32_System_Diagnostics_Debug",
|
||||
"Win32_Graphics_Direct3D",
|
||||
"Win32_Graphics_Direct3D11",
|
||||
|
||||
@@ -11,10 +11,17 @@ use wdf_umdf_sys::{
|
||||
DISPLAYCONFIG_TARGET_MODE, DISPLAYCONFIG_VIDEO_SIGNAL_INFO, IDARG_IN_ADAPTER_INIT_FINISHED,
|
||||
IDARG_IN_COMMITMODES, IDARG_IN_GETDEFAULTDESCRIPTIONMODES, IDARG_IN_PARSEMONITORDESCRIPTION,
|
||||
IDARG_IN_QUERYTARGETMODES, IDARG_IN_SETSWAPCHAIN, IDARG_OUT_GETDEFAULTDESCRIPTIONMODES,
|
||||
IDARG_OUT_PARSEMONITORDESCRIPTION, IDARG_OUT_QUERYTARGETMODES, IDDCX_ADAPTER__,
|
||||
IDARG_OUT_PARSEMONITORDESCRIPTION, IDARG_OUT_QUERYTARGETMODES, IDDCX_ADAPTER__, IDDCX_PATH,
|
||||
IDDCX_MONITOR_MODE, IDDCX_MONITOR_MODE_ORIGIN, IDDCX_MONITOR__, IDDCX_TARGET_MODE, NTSTATUS,
|
||||
WDFDEVICE, WDF_POWER_DEVICE_STATE,
|
||||
};
|
||||
// IddCx 1.10 *2 DDIs (HDR-capable). For B1 we advertise SDR (8 bpc) so behaviour is unchanged; B2
|
||||
// flips the bit depth + adapter flag to enable HDR.
|
||||
use wdf_umdf_sys::{
|
||||
IDARG_IN_COMMITMODES2, IDARG_IN_PARSEMONITORDESCRIPTION2, IDARG_IN_QUERYTARGETMODES2,
|
||||
IDARG_IN_QUERYTARGET_INFO, IDARG_OUT_QUERYTARGET_INFO, IDDCX_BITS_PER_COMPONENT, IDDCX_MONITOR_MODE2,
|
||||
IDDCX_PATH2, IDDCX_TARGET_CAPS, IDDCX_TARGET_MODE2, IDDCX_WIRE_BITS_PER_COMPONENT,
|
||||
};
|
||||
|
||||
use crate::{
|
||||
context::{DeviceContext, MonitorContext},
|
||||
@@ -179,6 +186,7 @@ pub extern "C-unwind" fn monitor_get_default_modes(
|
||||
_p_in_args: *const IDARG_IN_GETDEFAULTDESCRIPTIONMODES,
|
||||
_p_out_args: *mut IDARG_OUT_GETDEFAULTDESCRIPTIONMODES,
|
||||
) -> NTSTATUS {
|
||||
info!("GET_DEFAULT_MODES called (we return NOT_IMPLEMENTED — only valid for a monitor with NO EDID)");
|
||||
NTSTATUS::STATUS_NOT_IMPLEMENTED
|
||||
}
|
||||
|
||||
@@ -287,9 +295,20 @@ pub extern "C-unwind" fn monitor_query_modes(
|
||||
|
||||
pub extern "C-unwind" fn adapter_commit_modes(
|
||||
_adapter_object: *mut IDDCX_ADAPTER__,
|
||||
_p_in_args: *const IDARG_IN_COMMITMODES,
|
||||
p_in_args: *const IDARG_IN_COMMITMODES,
|
||||
) -> NTSTATUS {
|
||||
// The swap-chain is managed by IddCx; there is nothing device-specific to reconfigure on a commit.
|
||||
// DIAGNOSTIC: does the OS commit an ACTIVE path for our monitor? IDDCX_PATH_FLAGS_ACTIVE = 2. If
|
||||
// no active path is ever committed, the OS never calls ASSIGN_SWAPCHAIN (the bug we're chasing).
|
||||
let in_args = unsafe { &*p_in_args };
|
||||
info!("COMMIT_MODES: path_count={}", in_args.PathCount);
|
||||
for i in 0..in_args.PathCount {
|
||||
let path: &IDDCX_PATH = unsafe { &*in_args.pPaths.add(i as usize) };
|
||||
let active = (path.Flags.0 & 2) != 0;
|
||||
info!(
|
||||
" path[{i}] monitor={:p} flags=0x{:x} active={active}",
|
||||
path.MonitorObject, path.Flags.0
|
||||
);
|
||||
}
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
@@ -320,3 +339,194 @@ pub extern "C-unwind" fn unassign_swap_chain(monitor_object: *mut IDDCX_MONITOR_
|
||||
.into()
|
||||
}
|
||||
}
|
||||
|
||||
// ===== IddCx 1.10 *2 DDIs (HDR-capable path) ============================================
|
||||
// These mirror the 1.x callbacks above but advertise per-mode wire bit-depth. B1 reports SDR (8 bpc);
|
||||
// B2 bumps `wire_bits()` to add 10 bpc + sets CAN_PROCESS_FP16 to actually enable HDR.
|
||||
|
||||
/// Wire bit-depth advertised per mode. B2: advertise BOTH 8 and 10 bpc RGB so the OS offers HDR10
|
||||
/// modes (the bitfield: 8 = 0x2, 10 = 0x4).
|
||||
fn wire_bits() -> IDDCX_WIRE_BITS_PER_COMPONENT {
|
||||
let rgb = IDDCX_BITS_PER_COMPONENT(
|
||||
IDDCX_BITS_PER_COMPONENT::IDDCX_BITS_PER_COMPONENT_8.0
|
||||
| IDDCX_BITS_PER_COMPONENT::IDDCX_BITS_PER_COMPONENT_10.0,
|
||||
);
|
||||
IDDCX_WIRE_BITS_PER_COMPONENT {
|
||||
Rgb: rgb,
|
||||
YCbCr444: IDDCX_BITS_PER_COMPONENT::IDDCX_BITS_PER_COMPONENT_NONE,
|
||||
YCbCr422: IDDCX_BITS_PER_COMPONENT::IDDCX_BITS_PER_COMPONENT_NONE,
|
||||
YCbCr420: IDDCX_BITS_PER_COMPONENT::IDDCX_BITS_PER_COMPONENT_NONE,
|
||||
}
|
||||
}
|
||||
|
||||
/// 1.10 variant of [`parse_monitor_description`] — writes `IDDCX_MONITOR_MODE2` (adds bit-depth).
|
||||
pub extern "C-unwind" fn parse_monitor_description2(
|
||||
p_in_args: *const IDARG_IN_PARSEMONITORDESCRIPTION2,
|
||||
p_out_args: *mut IDARG_OUT_PARSEMONITORDESCRIPTION,
|
||||
) -> NTSTATUS {
|
||||
let in_args = unsafe { &*p_in_args };
|
||||
let out_args = unsafe { &mut *p_out_args };
|
||||
|
||||
let Ok(monitors) = MONITOR_MODES.lock() else {
|
||||
error!("MONITOR_MODES mutex poisoned");
|
||||
return NTSTATUS::STATUS_DRIVER_INTERNAL_ERROR;
|
||||
};
|
||||
|
||||
let edid = unsafe {
|
||||
std::slice::from_raw_parts(
|
||||
in_args.MonitorDescription.pData as *const u8,
|
||||
in_args.MonitorDescription.DataSize as usize,
|
||||
)
|
||||
};
|
||||
let Ok(monitor_index) = Edid::get_serial(edid) else {
|
||||
error!("bad edid ({} bytes)", edid.len());
|
||||
return NTSTATUS::STATUS_INVALID_VIEW_SIZE;
|
||||
};
|
||||
let Some(monitor) = monitors.iter().find(|&m| m.data.id == monitor_index) else {
|
||||
error!("Failed to find monitor id {monitor_index}");
|
||||
return NTSTATUS::STATUS_DRIVER_INTERNAL_ERROR;
|
||||
};
|
||||
|
||||
let number_of_modes: u32 = monitor
|
||||
.data
|
||||
.modes
|
||||
.iter()
|
||||
.map(|m| u32::try_from(m.refresh_rates.len()).expect("Cannot use > u32::MAX refresh rates"))
|
||||
.sum();
|
||||
|
||||
out_args.MonitorModeBufferOutputCount = number_of_modes;
|
||||
if in_args.MonitorModeBufferInputCount < number_of_modes {
|
||||
return if in_args.MonitorModeBufferInputCount > 0 {
|
||||
NTSTATUS::STATUS_BUFFER_TOO_SMALL
|
||||
} else {
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
};
|
||||
}
|
||||
|
||||
let monitor_modes = unsafe {
|
||||
std::slice::from_raw_parts_mut(
|
||||
in_args.pMonitorModes.cast::<MaybeUninit<IDDCX_MONITOR_MODE2>>(),
|
||||
number_of_modes as usize,
|
||||
)
|
||||
};
|
||||
for (mode, out_mode) in monitor.data.modes.flatten().zip(monitor_modes.iter_mut()) {
|
||||
out_mode.write(IDDCX_MONITOR_MODE2 {
|
||||
#[allow(clippy::cast_possible_truncation)]
|
||||
Size: mem::size_of::<IDDCX_MONITOR_MODE2>() as u32,
|
||||
Origin: IDDCX_MONITOR_MODE_ORIGIN::IDDCX_MONITOR_MODE_ORIGIN_MONITORDESCRIPTOR,
|
||||
MonitorVideoSignalInfo: display_info(mode.width, mode.height, mode.refresh_rate),
|
||||
BitsPerComponent: wire_bits(),
|
||||
});
|
||||
}
|
||||
out_args.PreferredMonitorModeIdx = 0;
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
fn target_mode2(width: u32, height: u32, refresh_rate: u32) -> IDDCX_TARGET_MODE2 {
|
||||
let m1 = target_mode(width, height, refresh_rate);
|
||||
IDDCX_TARGET_MODE2 {
|
||||
#[allow(clippy::cast_possible_truncation)]
|
||||
Size: mem::size_of::<IDDCX_TARGET_MODE2>() as u32,
|
||||
TargetVideoSignalInfo: m1.TargetVideoSignalInfo,
|
||||
BitsPerComponent: wire_bits(),
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
/// 1.10 variant of [`monitor_query_modes`] — writes `IDDCX_TARGET_MODE2`.
|
||||
pub extern "C-unwind" fn monitor_query_modes2(
|
||||
monitor_object: *mut IDDCX_MONITOR__,
|
||||
p_in_args: *const IDARG_IN_QUERYTARGETMODES2,
|
||||
p_out_args: *mut IDARG_OUT_QUERYTARGETMODES,
|
||||
) -> NTSTATUS {
|
||||
let Ok(monitors) = MONITOR_MODES.lock() else {
|
||||
error!("MONITOR_MODES mutex poisoned");
|
||||
return NTSTATUS::STATUS_DRIVER_INTERNAL_ERROR;
|
||||
};
|
||||
let Some(monitor) = monitors
|
||||
.iter()
|
||||
.find(|&m| m.object.is_some_and(|p| p.as_ptr() == monitor_object))
|
||||
else {
|
||||
error!("Failed to find monitor object in cache for {monitor_object:?}");
|
||||
return NTSTATUS::STATUS_DRIVER_INTERNAL_ERROR;
|
||||
};
|
||||
|
||||
let number_of_modes = monitor
|
||||
.data
|
||||
.modes
|
||||
.iter()
|
||||
.map(|m| u32::try_from(m.refresh_rates.len()).expect("Cannot use > u32::MAX modes"))
|
||||
.sum();
|
||||
|
||||
let out_args = unsafe { &mut *p_out_args };
|
||||
out_args.TargetModeBufferOutputCount = number_of_modes;
|
||||
|
||||
let in_args = unsafe { &*p_in_args };
|
||||
if in_args.TargetModeBufferInputCount >= number_of_modes {
|
||||
let out_target_modes = unsafe {
|
||||
std::slice::from_raw_parts_mut(
|
||||
in_args.pTargetModes.cast::<MaybeUninit<IDDCX_TARGET_MODE2>>(),
|
||||
number_of_modes as usize,
|
||||
)
|
||||
};
|
||||
for (mode, out_target) in monitor.data.modes.flatten().zip(out_target_modes.iter_mut()) {
|
||||
out_target.write(target_mode2(mode.width, mode.height, mode.refresh_rate));
|
||||
}
|
||||
}
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
/// 1.10 variant of [`adapter_commit_modes`] — `IDDCX_PATH2` carries the committed wire format.
|
||||
pub extern "C-unwind" fn adapter_commit_modes2(
|
||||
_adapter_object: *mut IDDCX_ADAPTER__,
|
||||
p_in_args: *const IDARG_IN_COMMITMODES2,
|
||||
) -> NTSTATUS {
|
||||
let in_args = unsafe { &*p_in_args };
|
||||
info!("COMMIT_MODES2: path_count={}", in_args.PathCount);
|
||||
for i in 0..in_args.PathCount {
|
||||
let path: &IDDCX_PATH2 = unsafe { &*in_args.pPaths.add(i as usize) };
|
||||
let active = (path.Flags.0 & 2) != 0;
|
||||
info!(
|
||||
" path2[{i}] monitor={:p} flags=0x{:x} active={active} colorspace={} rgb_bpc=0x{:x}",
|
||||
path.MonitorObject,
|
||||
path.Flags.0,
|
||||
path.WireFormatInfo.ColorSpace.0,
|
||||
path.WireFormatInfo.BitsPerComponent.Rgb.0
|
||||
);
|
||||
}
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
/// 1.10 NEW: per-target capabilities. B2 reports `HIGH_COLOR_SPACE` so the OS enables HDR10 (transfer
|
||||
/// curve + wide gamut) on this target.
|
||||
pub extern "C-unwind" fn query_target_info(
|
||||
_adapter_object: *mut IDDCX_ADAPTER__,
|
||||
_p_in_args: *mut IDARG_IN_QUERYTARGET_INFO,
|
||||
p_out_args: *mut IDARG_OUT_QUERYTARGET_INFO,
|
||||
) -> NTSTATUS {
|
||||
let out_args = unsafe { &mut *p_out_args };
|
||||
out_args.TargetCaps = IDDCX_TARGET_CAPS::IDDCX_TARGET_CAPS_HIGH_COLOR_SPACE;
|
||||
out_args.DitheringSupport = IDDCX_WIRE_BITS_PER_COMPONENT::default();
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
/// 1.10 NEW (HDR): the OS hands us the default HDR10 static metadata for the monitor. B2 accepts it
|
||||
/// (the host/client own the final HDR metadata for the stream); B3 will forward it to the host for the
|
||||
/// HEVC mastering-display SEI. Stub keeps the OS's HDR setup happy.
|
||||
pub extern "C-unwind" fn set_default_hdr_metadata(
|
||||
_monitor_object: *mut IDDCX_MONITOR__,
|
||||
_p_in_args: *const wdf_umdf_sys::IDARG_IN_MONITOR_SET_DEFAULT_HDR_METADATA,
|
||||
) -> NTSTATUS {
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
/// 1.10 HDR: the OS hands us the gamma ramp (a 3x4 colour-space matrix in HDR mode). We do NOT apply it
|
||||
/// server-side — the host streams the scRGB FP16 and the CLIENT's display applies its own transform —
|
||||
/// so we accept it. Wiring this is OBLIGATED once CAN_PROCESS_FP16 is set; without it the OS rejects
|
||||
/// the adapter at init (`IddCxAdapterInitAsync` → "Failed to get adapter").
|
||||
pub extern "C-unwind" fn set_gamma_ramp(
|
||||
_monitor_object: *mut IDDCX_MONITOR__,
|
||||
_p_in_args: *const wdf_umdf_sys::IDARG_IN_SET_GAMMARAMP,
|
||||
) -> NTSTATUS {
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
@@ -2,6 +2,7 @@ use std::{
|
||||
mem::{self, size_of},
|
||||
num::{ParseIntError, TryFromIntError},
|
||||
ptr::{addr_of_mut, NonNull},
|
||||
sync::{Arc, Mutex},
|
||||
};
|
||||
|
||||
use anyhow::anyhow;
|
||||
@@ -13,7 +14,7 @@ use wdf_umdf::{
|
||||
use wdf_umdf_sys::{
|
||||
DISPLAYCONFIG_VIDEO_OUTPUT_TECHNOLOGY, HANDLE, IDARG_IN_ADAPTER_INIT, IDARG_IN_MONITORCREATE,
|
||||
IDARG_IN_SETUP_HWCURSOR, IDARG_OUT_ADAPTER_INIT, IDARG_OUT_MONITORARRIVAL,
|
||||
IDARG_OUT_MONITORCREATE, IDDCX_ADAPTER, IDDCX_ADAPTER_CAPS, IDDCX_CURSOR_CAPS,
|
||||
IDARG_OUT_MONITORCREATE, IDDCX_ADAPTER, IDDCX_ADAPTER_CAPS, IDDCX_ADAPTER_FLAGS, IDDCX_CURSOR_CAPS,
|
||||
IDDCX_ENDPOINT_DIAGNOSTIC_INFO, IDDCX_ENDPOINT_VERSION, IDDCX_FEATURE_IMPLEMENTATION,
|
||||
IDDCX_MONITOR, IDDCX_MONITOR_DESCRIPTION, IDDCX_MONITOR_DESCRIPTION_TYPE, IDDCX_MONITOR_INFO,
|
||||
IDDCX_SWAPCHAIN, IDDCX_TRANSMISSION_TYPE, IDDCX_XOR_CURSOR_SUPPORT, LUID, NTSTATUS, WDFDEVICE,
|
||||
@@ -34,6 +35,37 @@ use crate::{
|
||||
// Maximum amount of monitors that can be connected
|
||||
pub const MAX_MONITORS: u8 = 16;
|
||||
|
||||
/// ONE shared D3D render device, reused across every swap-chain assignment (keyed by render LUID).
|
||||
/// Creating a fresh `Direct3DDevice` per assign — and the swap-chain flap fires several assigns per
|
||||
/// session — spawned a new NVIDIA UMD worker-thread set each time that was NEVER reclaimed on release
|
||||
/// (proven on the RTX box: ~70 `nvwgf2umx` threads + ~50 MB VRAM leaked per reconnect, permanently,
|
||||
/// even though our `Direct3DDevice` refcount dropped to 0). Pooling one device keeps a single, stable
|
||||
/// thread set: the processors borrow an `Arc`, so the device outlives them and is never re-created.
|
||||
static DEVICE_POOL: Mutex<Option<(i64, Arc<Direct3DDevice>)>> = Mutex::new(None);
|
||||
|
||||
/// Get-or-create the pooled D3D device for `luid`. Re-creates only if the render adapter changes
|
||||
/// (e.g. a GPU hot-swap), which drops the old `Arc` once its last processor releases it.
|
||||
fn pooled_device(luid: windows::Win32::Foundation::LUID) -> Option<Arc<Direct3DDevice>> {
|
||||
let key = (i64::from(luid.HighPart) << 32) | i64::from(luid.LowPart as u32);
|
||||
let mut pool = DEVICE_POOL.lock().ok()?;
|
||||
if let Some((k, dev)) = pool.as_ref() {
|
||||
if *k == key {
|
||||
return Some(dev.clone());
|
||||
}
|
||||
}
|
||||
match Direct3DDevice::init(luid) {
|
||||
Ok(d) => {
|
||||
let a = Arc::new(d);
|
||||
*pool = Some((key, a.clone()));
|
||||
Some(a)
|
||||
}
|
||||
Err(e) => {
|
||||
error!("pooled Direct3DDevice::init failed: {e:?}");
|
||||
None
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub struct DeviceContext {
|
||||
device: WDFDEVICE,
|
||||
adapter: Option<IDDCX_ADAPTER>,
|
||||
@@ -48,6 +80,11 @@ unsafe impl Sync for DeviceContext {}
|
||||
pub struct MonitorContext {
|
||||
device: IDDCX_MONITOR,
|
||||
swap_chain_processor: Option<SwapChainProcessor>,
|
||||
/// OS target id (from IddCxMonitorArrival), stamped on this context at creation. assign_swap_chain
|
||||
/// uses THIS instead of a MONITOR_MODES pointer lookup — the lookup returns 0 for a recreated
|
||||
/// (session-2+) monitor, which broke the shared-ring naming and cascaded into SetDevice
|
||||
/// E_INVALIDARG + an access violation (the fix-teardown crash).
|
||||
target_id: u32,
|
||||
}
|
||||
|
||||
// SAFETY: Raw ptr is managed by external library
|
||||
@@ -98,6 +135,10 @@ impl DeviceContext {
|
||||
#[allow(clippy::cast_possible_truncation)]
|
||||
Size: size_of::<IDDCX_ADAPTER_CAPS>() as u32,
|
||||
|
||||
// B2 HDR: declare we can process FP16 (scRGB) desktop surfaces — enables HDR10 / SDR WCG.
|
||||
// This OBLIGATES the *2 mode DDIs (done) + ReleaseAndAcquireBuffer2 (done in run_core).
|
||||
Flags: IDDCX_ADAPTER_FLAGS::IDDCX_ADAPTER_FLAGS_CAN_PROCESS_FP16,
|
||||
|
||||
MaxMonitorsSupported: u32::from(MAX_MONITORS),
|
||||
|
||||
EndPointDiagnostics: IDDCX_ENDPOINT_DIAGNOSTIC_INFO {
|
||||
@@ -231,6 +272,14 @@ impl DeviceContext {
|
||||
}
|
||||
}
|
||||
|
||||
// Stamp the OS target id onto the monitor's CONTEXT so assign_swap_chain reads it directly
|
||||
// (no MONITOR_MODES pointer lookup, which returns 0 for a recreated monitor).
|
||||
unsafe {
|
||||
let _ = MonitorContext::get_mut(monitor_create_out.MonitorObject.cast(), |ctx| {
|
||||
ctx.target_id = arrival_out.OsTargetId;
|
||||
});
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
@@ -240,6 +289,7 @@ impl MonitorContext {
|
||||
Self {
|
||||
device,
|
||||
swap_chain_processor: None,
|
||||
target_id: 0,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -265,20 +315,37 @@ impl MonitorContext {
|
||||
render_adapter.HighPart, render_adapter.LowPart
|
||||
);
|
||||
|
||||
let device = Direct3DDevice::init(luid);
|
||||
// The OS target id keys the per-monitor shared frame-push objects (header/event/textures) the
|
||||
// host opens. Read it from THIS context (stamped at creation after IddCxMonitorArrival) — the
|
||||
// old MONITOR_MODES pointer lookup returned 0 for a recreated (session-2+) monitor, which broke
|
||||
// the ring naming and cascaded into SetDevice E_INVALIDARG + an access violation.
|
||||
let target_id = self.target_id;
|
||||
|
||||
if let Ok(device) = device {
|
||||
let device = pooled_device(luid);
|
||||
|
||||
if let Some(device) = device {
|
||||
let mut processor = SwapChainProcessor::new();
|
||||
|
||||
processor.run(swap_chain, device, new_frame_event);
|
||||
processor.run(
|
||||
swap_chain,
|
||||
device,
|
||||
new_frame_event,
|
||||
target_id,
|
||||
render_adapter.LowPart,
|
||||
render_adapter.HighPart,
|
||||
);
|
||||
|
||||
self.swap_chain_processor = Some(processor);
|
||||
|
||||
self.setup_hw_cursor();
|
||||
// Cursor is BAKED into the captured video: for IDD-push we deliberately do NOT advertise a
|
||||
// hardware cursor, so DWM software-composites the mouse cursor into the swapchain surface we
|
||||
// capture — the client then sees the cursor in the stream. (A future separate-plane cursor
|
||||
// would re-enable setup_hw_cursor + IddCxMonitorQueryHardwareCursor.) Not advertising one
|
||||
// also stops leaking a CreateEventA handle per assign.
|
||||
} else {
|
||||
// It's important to delete the swap-chain if D3D initialization fails, so that the OS knows to generate a new
|
||||
// swap-chain and try again.
|
||||
error!("Direct3DDevice::init FAILED on render LUID: {device:?} — deleting swap chain for OS retry");
|
||||
// It's important to delete the swap-chain if D3D init fails, so the OS generates a fresh
|
||||
// swap-chain and retries.
|
||||
error!("pooled Direct3DDevice unavailable for render LUID — deleting swap chain for OS retry");
|
||||
|
||||
unsafe {
|
||||
let _ = WdfObjectDelete(swap_chain.cast());
|
||||
@@ -287,9 +354,15 @@ impl MonitorContext {
|
||||
}
|
||||
|
||||
pub fn unassign_swap_chain(&mut self) {
|
||||
self.swap_chain_processor.take();
|
||||
let had = self.swap_chain_processor.take().is_some();
|
||||
error!("unassign_swap_chain (target={}) — dropped live processor: {had}", self.target_id);
|
||||
}
|
||||
|
||||
/// Advertise a HARDWARE cursor. NOT called for IDD-push — we bake the cursor into the video
|
||||
/// instead (see `assign_swap_chain`). Kept for a future separate-plane cursor (which would pair it
|
||||
/// with `IddCxMonitorQueryHardwareCursor`). Leaks a `CreateEventA` handle per call, so only wire it
|
||||
/// back up alongside a real cursor-plane consumer.
|
||||
#[allow(dead_code)]
|
||||
pub fn setup_hw_cursor(&mut self) {
|
||||
let mouse_event = unsafe { CreateEventA(None, false, false, s!("vdd_mouse_event")) };
|
||||
let Ok(mouse_event) = mouse_event else {
|
||||
|
||||
@@ -6,8 +6,9 @@
|
||||
use std::ffi::c_void;
|
||||
use std::mem::size_of;
|
||||
use std::sync::atomic::{AtomicBool, Ordering};
|
||||
use std::sync::Mutex;
|
||||
use std::thread;
|
||||
use std::time::Duration;
|
||||
use std::time::{Duration, Instant};
|
||||
|
||||
use log::{error, info};
|
||||
use wdf_umdf::{
|
||||
@@ -16,7 +17,7 @@ use wdf_umdf::{
|
||||
};
|
||||
use wdf_umdf_sys::{IDARG_IN_ADAPTERSETRENDERADAPTER, LUID, NTSTATUS, WDFDEVICE, WDFREQUEST};
|
||||
|
||||
use crate::context::DeviceContext;
|
||||
use crate::context::{DeviceContext, MonitorContext};
|
||||
use crate::monitor::{
|
||||
default_modes, Mode, MonitorData, MonitorObject, ADAPTER, MONITOR_MODES, NEXT_ID,
|
||||
PREFERRED_RENDER_ADAPTER, PROTOCOL_VERSION, WATCHDOG_COUNTDOWN, WATCHDOG_TIMEOUT,
|
||||
@@ -37,6 +38,16 @@ const IOCTL_CLEAR_ALL: u32 = ctl(0x804);
|
||||
const IOCTL_PING: u32 = ctl(0x888);
|
||||
const IOCTL_GET_VERSION: u32 = ctl(0x8FF);
|
||||
|
||||
/// Serializes monitor lifecycle ops — ADD / REMOVE / watchdog-teardown — against each other. Without
|
||||
/// it, a watchdog expiry can drain an entry out from under an in-flight `do_add` (which releases the
|
||||
/// `MONITOR_MODES` lock before the slow `create_monitor`), leaving `do_add` to return
|
||||
/// `STATUS_UNSUCCESSFUL` → the host sees `ERROR_GEN_FAILURE`. This was the reconnect-churn fault.
|
||||
static MONITOR_OP_LOCK: Mutex<()> = Mutex::new(());
|
||||
/// A monitor created less than this ago is still in its host-side setup window (CCD commit + GDI-name
|
||||
/// resolve + topology settle, ~5 s) and is never reaped by the watchdog — only by an explicit
|
||||
/// CLEAR_ALL. Protects a freshly-born monitor from a transient PING gap during reconnect churn.
|
||||
const MONITOR_GRACE: Duration = Duration::from_secs(6);
|
||||
|
||||
#[repr(C)]
|
||||
struct AddParams {
|
||||
width: u32,
|
||||
@@ -117,7 +128,7 @@ pub extern "C-unwind" fn device_io_control(
|
||||
IOCTL_GET_WATCHDOG => do_get_watchdog(request, output_len, &mut bytes),
|
||||
IOCTL_PING => NTSTATUS::STATUS_SUCCESS,
|
||||
IOCTL_CLEAR_ALL => {
|
||||
disconnect_all_monitors();
|
||||
disconnect_all_monitors(true);
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
IOCTL_GET_VERSION => do_get_version(request, output_len, &mut bytes),
|
||||
@@ -136,6 +147,11 @@ unsafe fn do_add(
|
||||
output_len: usize,
|
||||
bytes: &mut usize,
|
||||
) -> NTSTATUS {
|
||||
// Serialize the whole ADD (push entry → create_monitor → verify) against the watchdog teardown +
|
||||
// REMOVE, so an expiry can never drain this entry mid-flight. `create_monitor` is fast (the slow
|
||||
// CCD/GDI work is host-side, after this returns), and PING/GET_WATCHDOG don't take this lock, so
|
||||
// the host keeps the watchdog reset while we hold it.
|
||||
let _op = MONITOR_OP_LOCK.lock().unwrap();
|
||||
if input_len < size_of::<AddParams>() || output_len < size_of::<AddOut>() {
|
||||
return NTSTATUS::STATUS_BUFFER_TOO_SMALL;
|
||||
}
|
||||
@@ -182,6 +198,7 @@ unsafe fn do_add(
|
||||
target_id: 0,
|
||||
adapter_luid_low: 0,
|
||||
adapter_luid_high: 0,
|
||||
created_at: Instant::now(),
|
||||
});
|
||||
|
||||
// Create the IddCx monitor via the device context (captures target id + LUID into the entry).
|
||||
@@ -226,18 +243,37 @@ unsafe fn do_remove(request: WDFREQUEST, input_len: usize) -> NTSTATUS {
|
||||
let params = unsafe { &*pin.cast::<RemoveParams>() };
|
||||
let guid = guid_key(&params.guid);
|
||||
|
||||
let mut lock = MONITOR_MODES.lock().unwrap();
|
||||
if let Some(pos) = lock.iter().position(|m| m.guid == guid) {
|
||||
let mon = lock.remove(pos);
|
||||
if let Some(obj) = mon.object {
|
||||
if let Err(e) = unsafe { IddCxMonitorDeparture(obj.as_ptr()) } {
|
||||
error!("REMOVE: departure failed: {e:?}");
|
||||
}
|
||||
// Serialize against ADD + watchdog teardown (lock order: OP_LOCK → MONITOR_MODES).
|
||||
let _op = MONITOR_OP_LOCK.lock().unwrap();
|
||||
let mon = {
|
||||
let mut lock = MONITOR_MODES.lock().unwrap();
|
||||
match lock.iter().position(|m| m.guid == guid) {
|
||||
Some(pos) => lock.remove(pos),
|
||||
None => return NTSTATUS::STATUS_NOT_FOUND,
|
||||
}
|
||||
info!("REMOVE target_id={}", mon.target_id);
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
} else {
|
||||
NTSTATUS::STATUS_NOT_FOUND
|
||||
// MONITOR_MODES released here — the processor-join + departure below must not hold it.
|
||||
};
|
||||
if let Some(obj) = mon.object {
|
||||
free_swap_chain_processor(obj.as_ptr());
|
||||
if let Err(e) = unsafe { IddCxMonitorDeparture(obj.as_ptr()) } {
|
||||
error!("REMOVE: departure failed: {e:?}");
|
||||
}
|
||||
}
|
||||
info!("REMOVE target_id={}", mon.target_id);
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
/// Drop a monitor's live swap-chain processor BEFORE departure. The WDF context is an
|
||||
/// `Arc<RwLock<MonitorContext>>` that WDF frees WITHOUT running Rust `Drop` (no `EvtCleanupCallback`
|
||||
/// is wired), and the OS does not reliably call UNASSIGN on a host-initiated departure — so the
|
||||
/// streaming `Direct3DDevice` (its ~dozens of D3D worker threads + tens of MB of VRAM) was orphaned
|
||||
/// once per session, the dominant reconnect-churn leak. `get_mut` takes the context `RwLock`, so this
|
||||
/// is safe against a concurrent OS unassign callback (whichever runs second sees `None`).
|
||||
fn free_swap_chain_processor(monitor: *mut wdf_umdf_sys::IDDCX_MONITOR__) {
|
||||
// SAFETY: `monitor` is a live IddCx monitor object whose context was init'd at creation.
|
||||
let r = unsafe { MonitorContext::get_mut(monitor.cast(), |ctx| ctx.unassign_swap_chain()) };
|
||||
if let Err(e) = r {
|
||||
error!("free_swap_chain_processor: get_mut FAILED: {e:?}");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -295,22 +331,46 @@ unsafe fn do_get_version(request: WDFREQUEST, output_len: usize, bytes: &mut usi
|
||||
NTSTATUS::STATUS_SUCCESS
|
||||
}
|
||||
|
||||
/// Tear down every monitor (watchdog expiry — the host is gone). Mirrors SudoVDA's DisconnectAllMonitors.
|
||||
fn disconnect_all_monitors() {
|
||||
let mut lock = MONITOR_MODES.lock().unwrap();
|
||||
if lock.is_empty() {
|
||||
return;
|
||||
}
|
||||
for mon in lock.drain(..) {
|
||||
/// Tear down monitors. `force` (CLEAR_ALL) reaps EVERYTHING — orphans from a crashed previous host;
|
||||
/// the watchdog passes `false`, which spares any monitor still inside its creation grace
|
||||
/// (`MONITOR_GRACE`) so a freshly-born monitor is never reaped mid-setup. Caller MUST hold
|
||||
/// `MONITOR_OP_LOCK` (lock order: OP_LOCK → MONITOR_MODES). Mirrors SudoVDA's DisconnectAllMonitors.
|
||||
fn disconnect_all_monitors_locked(force: bool) {
|
||||
// Drain under the lock (fast); free processors + depart OUTSIDE it (the processor-join blocks).
|
||||
let to_depart: Vec<MonitorObject> = {
|
||||
let mut lock = MONITOR_MODES.lock().unwrap();
|
||||
if lock.is_empty() {
|
||||
return;
|
||||
}
|
||||
let mut keep: Vec<MonitorObject> = Vec::new();
|
||||
let mut depart: Vec<MonitorObject> = Vec::new();
|
||||
for mon in lock.drain(..) {
|
||||
if !force && mon.created_at.elapsed() < MONITOR_GRACE {
|
||||
keep.push(mon); // still in its host-side setup window — leave it alone
|
||||
} else {
|
||||
depart.push(mon);
|
||||
}
|
||||
}
|
||||
*lock = keep;
|
||||
depart
|
||||
};
|
||||
for mon in to_depart {
|
||||
if let Some(obj) = mon.object {
|
||||
free_swap_chain_processor(obj.as_ptr());
|
||||
// SAFETY: `obj` is a live IddCx monitor object.
|
||||
if let Err(e) = unsafe { IddCxMonitorDeparture(obj.as_ptr()) } {
|
||||
error!("watchdog: monitor departure failed: {e:?}");
|
||||
error!("teardown: monitor departure failed: {e:?}");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Public entry: takes `MONITOR_OP_LOCK`, then tears down. Used by CLEAR_ALL (`force = true`).
|
||||
fn disconnect_all_monitors(force: bool) {
|
||||
let _op = MONITOR_OP_LOCK.lock().unwrap();
|
||||
disconnect_all_monitors_locked(force);
|
||||
}

/// Start the watchdog thread (once). The host reads the timeout via GET_WATCHDOG and PINGs every
/// timeout/3; if it stops, the countdown reaches 0 and every monitor is torn down — so a crashed/gone
/// host never leaves a phantom display. Mirrors SudoVDA's RunWatchdog.
@@ -340,8 +400,14 @@ pub fn start_watchdog() {
            .is_ok()
            && prev - 1 == 0
        {
            error!("watchdog expired (host stopped pinging) — tearing down all monitors");
            disconnect_all_monitors();
            // About to fire. Serialize against do_add/do_remove (so we never tear an entry out from
            // under an in-flight ADD), then RE-CHECK the countdown under the lock: if a concurrent
            // IOCTL (PING/ADD) reset it while we were acquiring the lock, the host is alive — abort.
            let _op = MONITOR_OP_LOCK.lock().unwrap();
            if WATCHDOG_COUNTDOWN.load(Ordering::Relaxed) == 0 {
                error!("watchdog expired (host stopped pinging) — tearing down stale monitors");
                disconnect_all_monitors_locked(false);
            }
        }
    });
}
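
The "decrement, then re-check under the operation lock" pattern from the hunk above can be sketched with the standard library alone. This is an illustration under assumptions: `tick` and `ping` are hypothetical helpers, and `fetch_update` is used here for brevity where the driver has its own decrement loop.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// One watchdog tick. Returns `true` when the caller should tear monitors down.
fn tick(countdown: &AtomicU32, op_lock: &Mutex<()>) -> bool {
    // Decrement only while the countdown is still positive; `Err` means it was already 0.
    let prev = countdown.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| v.checked_sub(1));
    match prev {
        // This tick moved the countdown from 1 to 0: the watchdog is about to fire.
        Ok(1) => {
            // Serialize against add/remove operations, then look again. A PING that
            // landed while we waited for the lock means the host is alive.
            let _op = op_lock.lock().unwrap();
            countdown.load(Ordering::Relaxed) == 0
        }
        _ => false,
    }
}

/// Host keep-alive: reset the countdown to the full timeout.
fn ping(countdown: &AtomicU32, timeout: u32) {
    countdown.store(timeout, Ordering::Relaxed);
}
```

The re-check matters only under concurrency: without it, a PING arriving between the decrement and the lock acquisition would still be followed by a teardown.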

@@ -1,3 +1,5 @@
use std::sync::atomic::{AtomicI32, Ordering};

use windows::{
    core::Error,
    Win32::{
@@ -29,13 +31,19 @@ impl From<&'static str> for Direct3DError {
    }
}

/// DIAGNOSTIC: live `Direct3DDevice` count. Each one holds an `ID3D11Device` whose NVIDIA UMD spawns
/// ~dozens of worker threads; if this climbs without bound across reconnects, devices are leaking.
pub static LIVE_DEVICES: AtomicI32 = AtomicI32::new(0);

#[derive(Debug)]
pub struct Direct3DDevice {
    // The following are already refcounted, so they're safe to use directly without additional drop impls
    _dxgi_factory: IDXGIFactory5,
    _adapter: IDXGIAdapter1,
    pub device: ID3D11Device,
    _device_context: ID3D11DeviceContext,
    /// The single (SINGLETHREADED) immediate context — used by the frame-push publisher's
    /// `CopyResource` on the swap-chain processor thread (the one thread this device is touched from).
    pub device_context: ID3D11DeviceContext,
}

impl Direct3DDevice {
@@ -67,11 +75,21 @@ impl Direct3DDevice {
        let device = device.ok_or("ID3D11Device not found")?;
        let device_context = device_context.ok_or("ID3D11DeviceContext not found")?;

        let live = LIVE_DEVICES.fetch_add(1, Ordering::Relaxed) + 1;
        log::error!("Direct3DDevice::init OK — live D3D devices = {live}");

        Ok(Self {
            _dxgi_factory: dxgi_factory,
            _adapter: adapter,
            device,
            _device_context: device_context,
            device_context,
        })
    }
}

impl Drop for Direct3DDevice {
    fn drop(&mut self) {
        let live = LIVE_DEVICES.fetch_sub(1, Ordering::Relaxed) - 1;
        log::error!("Direct3DDevice::drop — live D3D devices = {live}");
    }
}

@@ -1,114 +1,118 @@
use std::{array::TryFromSliceError, ops::Deref};
//! The 256-byte EDID the pf-vdisplay driver hands IddCx for each virtual monitor: a 128-byte EDID 1.4
//! base block + a **CTA-861.3 extension** that advertises HDR — a BT.2020 Colorimetry Data Block and an
//! HDR Static Metadata Data Block declaring the SMPTE ST 2084 (PQ) EOTF. Windows reads a display's HDR
//! capability from this CTA HDR block; without it the monitor is treated as SDR-only regardless of the
//! IddCx adapter's `CAN_PROCESS_FP16` / `HIGH_COLOR_SPACE` / 10-bit mode caps (the missing piece that
//! made "Use HDR" never appear for the virtual display). The base block declares EDID 1.4 + 10-bit
//! digital so the panel's bit depth is unambiguous.
//!
//! Identity: manufacturer "PNK" (bytes 8-9), product name "punktfunk" (the 0xFC display descriptor). The
//! serial-number field (base offset 0x0C, little-endian) encodes the per-monitor index so
//! `parse_monitor_description` can map an EDID the OS hands back to its monitor; [`Edid::generate_with`]
//! patches that serial and recomputes BOTH block checksums (base byte 127 + extension byte 255). The
//! detailed-timing / range-limit descriptors are placeholders — the modes we actually advertise come
//! from the monitor's stored mode list (`monitor.rs` / `callbacks.rs`), not from parsing this EDID.

use bytemuck::{Pod, Zeroable};
use std::array::TryFromSliceError;

// A clean, self-contained 128-byte EDID carrying punktfunk's own identity — manufacturer ID "PNK"
// (bytes 8-9) and product name "punktfunk" (the 0xFC display-descriptor). Derived from the
// virtual-display-rs base block (a standard, widely-deployed virtual EDID); it deliberately carries NO
// other driver's bytes or branding. The serial-number field (offset 0x0C) encodes the per-monitor
// index, so `parse_monitor_description` can map an EDID the OS hands back to its monitor;
// `generate_with` patches that serial and `gen_checksum` recomputes byte 127 before the EDID reaches
// IddCx. The detailed-timing / range-limit descriptors are placeholders: the modes we actually
// advertise come from the monitor's stored mode list (`monitor.rs` / `callbacks.rs`), not from parsing
// this EDID.
const _EDID: [u8; 128] = [
    0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x41, 0xCB, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xFF, 0x21, 0x01, 0x03, 0x80, 0x32, 0x1F, 0x78, 0x07, 0xEE, 0x95, 0xA3, 0x54, 0x4C, 0x99, 0x26,
    0x0F, 0x50, 0x54, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x02, 0x3A, 0x80, 0x18, 0x71, 0x38, 0x2D, 0x40, 0x58, 0x2C,
    0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1E, 0x00, 0x00, 0x00, 0xFD, 0x00, 0x17, 0xF0, 0x0F,
    0xFF, 0x0F, 0x00, 0x0A, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x00, 0x00, 0x00, 0xFC, 0x00, 0x70,
    0x75, 0x6E, 0x6B, 0x74, 0x66, 0x75, 0x6E, 0x6B, 0x0A, 0x20, 0x20, 0x20, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
/// Per-monitor serial number, base-block offset 0x0C, little-endian u32.
const SERIAL_OFFSET: usize = 0x0C;

/// EDID 1.4 base block (128 bytes). Differs from a plain SDR virtual EDID only by: revision 1.4 (byte
/// 19 = 0x04), 10-bit digital video input (byte 20 = 0xB0), and one extension present (byte 126 = 0x01).
/// Byte 127 (checksum) and the serial (0x0C) are filled/patched in [`Edid::generate_with`].
#[rustfmt::skip]
const BASE: [u8; 128] = [
    0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, // fixed header
    0x41, 0xCB, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // mfr "PNK", product, serial (patched)
    0xFF, 0x21, 0x01, 0x04, 0xB0, 0x32, 0x1F, 0x78, // week/year, EDID 1.4, 10-bit digital, size, gamma
    0x03, 0x78, 0xB1, 0xB5, 0x4A, 0x2B, 0xCC, 0x21, // feature (sRGB-default CLEARED), BT.2020 primaries...
    0x0B, 0x50, 0x54, 0x00, 0x00, 0x00, 0x01, 0x01, // ...BT.2020 primaries, established timings, std timings
    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x02, 0x3A, // std timings, DTD 1 (placeholder preferred timing)
    0x80, 0x18, 0x71, 0x38, 0x2D, 0x40, 0x58, 0x2C,
    0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1E,
    0x00, 0x00, 0x00, 0xFD, 0x00, 0x17, 0xF0, 0x0F, // display range-limits descriptor
    0xFF, 0x0F, 0x00, 0x0A, 0x20, 0x20, 0x20, 0x20,
    0x20, 0x20, 0x00, 0x00, 0x00, 0xFC, 0x00, 0x70, // name descriptor "punktfunk"
    0x75, 0x6E, 0x6B, 0x74, 0x66, 0x75, 0x6E, 0x6B,
    0x0A, 0x20, 0x20, 0x20, 0x00, 0x00, 0x00, 0x00, // empty 4th descriptor...
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, // ...byte 126 = 1 extension, byte 127 = checksum
];

const EDID_LEN: usize = _EDID.len();
/// CTA-861.3 extension block (128 bytes), block 1. Header + a Data Block Collection holding the
/// Colorimetry and HDR Static Metadata data blocks; the rest is padding up to the checksum (byte 255).
/// `D` (byte 130) marks where DTDs would start (= end of the data blocks); we carry none.
#[rustfmt::skip]
const CTA_HEADER: [u8; 4] = [
    0x02, // CTA Extension tag
    0x03, // revision 3 (CTA-861.3 — required for the extended-tag data blocks below)
    0x0F, // D = 15: the (empty) DTD region starts at block byte 15, i.e. data blocks occupy bytes 4..15
    0x00, // 0 native DTDs; no basic audio; no YCbCr 4:4:4/4:2:2 (RGB-only, matching the wire format)
];

static EDID: AlignedEdid<EDID_LEN> = AlignedEdid {
    data: _EDID,
    _align: [],
};
/// Colorimetry Data Block (CTA extended tag 0x05): declare BT.2020 RGB (bit 7). YCbCr variants are left
/// clear — the IddCx wire format is RGB-only — and the gamut-metadata flags are 0.
#[rustfmt::skip]
const COLORIMETRY_DB: [u8; 4] = [
    0xE3, // tag 0b111 (use-extended-tag) | length 3
    0x05, // extended tag: Colorimetry
    0x80, // BT2020RGB (bit 7); xvYCC/sYCC/opRGB/BT2020 YCC/cYCC all clear
    0x00, // gamut metadata profiles MD0..MD3: none
];

#[repr(C)]
struct AlignedEdid<const N: usize> {
    data: [u8; N],
    // required to make this type aligned to Edid
    _align: [Edid; 0],
}
/// HDR Static Metadata Data Block (CTA extended tag 0x06): EOTFs = Traditional SDR (ET_0) + SMPTE ST
/// 2084 / PQ (ET_2); Static Metadata Type 1 (SM_0). Plus the optional desired-content luminance hints
/// (~993 nit max, ~400 nit max-frame-average, ~0.05 nit min) so the block is complete.
#[rustfmt::skip]
const HDR_STATIC_METADATA_DB: [u8; 7] = [
    0xE6, // tag 0b111 (use-extended-tag) | length 6
    0x06, // extended tag: HDR Static Metadata
    0x05, // Supported EOTFs: ET_0 (traditional SDR) | ET_2 (SMPTE ST 2084 / PQ)
    0x01, // Supported Static Metadata Descriptors: SM_0 (Static Metadata Type 1)
    0x8A, // Desired Content Max Luminance (code 138 ≈ 993 nits)
    0x60, // Desired Content Max Frame-avg Lum. (code 96 = 400 nits)
    0x12, // Desired Content Min Luminance (code 18 ≈ 0.05 nits)
];
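
The byte comments in the two data blocks above can be checked by hand. The sketch below decodes a CTA data-block header byte (3-bit tag, 5-bit length) and converts the luminance code values to nits using the CTA-861 formulas, max = 50 · 2^(CV/32) and min = max · (CV/255)² / 100. The function names are hypothetical and exist only for this illustration; nothing here is part of the driver.

```rust
/// Split a CTA data-block header byte into `(tag, payload_length)`.
/// Tag 7 means "use extended tag": the next byte names the block type.
fn split_db_header(byte: u8) -> (u8, u8) {
    (byte >> 5, byte & 0x1F)
}

/// Desired Content Max (or Max Frame-average) Luminance in cd/m^2: 50 * 2^(cv / 32).
fn max_luminance_nits(cv: u8) -> f64 {
    50.0 * 2f64.powf(f64::from(cv) / 32.0)
}

/// Desired Content Min Luminance in cd/m^2: max * (cv / 255)^2 / 100.
fn min_luminance_nits(cv: u8, max_nits: f64) -> f64 {
    max_nits * (f64::from(cv) / 255.0).powi(2) / 100.0
}
```

With the values from `HDR_STATIC_METADATA_DB`, `0x60` decodes to exactly 400 nits, `0x8A` to roughly 993 nits, and `0x12` (relative to that maximum) to roughly 0.05 nits, which agrees with the inline comments.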

impl<const N: usize> AlignedEdid<N> {
    fn new(data: &[u8]) -> Result<Self, TryFromSliceError> {
        let data: [u8; N] = data.try_into()?;
        Ok(Self { data, _align: [] })
    }
}

impl<const N: usize> Deref for AlignedEdid<N> {
    type Target = Edid;

    fn deref(&self) -> &Self::Target {
        let header = &self.data[..EDID_SIZE];
        bytemuck::from_bytes(header)
    }
}

const EDID_SIZE: usize = std::mem::size_of::<Edid>();

#[repr(C)]
#[derive(Debug, Copy, Clone, Pod, Zeroable)]
pub struct Edid {
    header: [u8; 8],
    manufacturer_id: [u8; 2],
    product_code: u16,
    serial_number: u32,
    manufacture_week: u8,
    manufacture_year: u8,
    version: u8,
    revision: u8,
}
#[derive(Debug, Clone, Copy)]
pub struct Edid;

impl Edid {
    /// Build the full 256-byte EDID for monitor `serial`, with both block checksums recomputed.
    pub fn generate_with(serial: u32) -> Vec<u8> {
        // change serial number in the header
        let mut header = *EDID;
        header.serial_number = serial;

        header.generate()
        let mut edid = [0u8; 256];
        // Block 0: base.
        edid[..128].copy_from_slice(&BASE);
        edid[SERIAL_OFFSET..SERIAL_OFFSET + 4].copy_from_slice(&serial.to_le_bytes());
        // Block 1: CTA-861.3 extension (header + colorimetry + HDR static metadata; rest stays 0).
        edid[128..132].copy_from_slice(&CTA_HEADER);
        edid[132..136].copy_from_slice(&COLORIMETRY_DB);
        edid[136..143].copy_from_slice(&HDR_STATIC_METADATA_DB);
        // Each 128-byte block ends in a checksum byte that makes the block sum ≡ 0 (mod 256).
        Self::fix_block_checksum(&mut edid, 0);
        Self::fix_block_checksum(&mut edid, 128);
        edid.to_vec()
    }

    /// Read the per-monitor serial (base offset 0x0C, little-endian) from an EDID the OS handed back.
    /// Works for the full 256-byte EDID or just the 128-byte base block. Errors (rather than panics) on
    /// a too-short buffer so the caller can reject a malformed descriptor.
    pub fn get_serial(edid: &[u8]) -> Result<u32, TryFromSliceError> {
        let edid = AlignedEdid::<EDID_LEN>::new(edid)?;
        Ok(edid.serial_number)
        let bytes: [u8; 4] = edid
            .get(SERIAL_OFFSET..SERIAL_OFFSET + 4)
            .unwrap_or(&[])
            .try_into()?;
        Ok(u32::from_le_bytes(bytes))
    }

    fn generate(&self) -> Vec<u8> {
        let header = bytemuck::bytes_of(self);

        // slice of monitor edid minus header
        let data = &EDID.data[EDID_SIZE..];

        // splice together header and the rest of the EDID
        let mut edid: Vec<u8> = header.iter().chain(data).copied().collect();
        // regenerate checksum
        Self::gen_checksum(&mut edid);

        edid
    }

    fn gen_checksum(data: &mut [u8]) {
        // important, this is the bare minimum length
        assert!(data.len() >= 128);

        // slice to the entire data minus the last checksum byte
        let edid_data = &data[..=126];

        // do checksum calculation
        let sum: u32 = edid_data.iter().copied().map(u32::from).sum();
        // this wont ever truncate
        #[allow(clippy::cast_possible_truncation)]
        let checksum = (256 - (sum % 256)) as u8;

        // update last byte with new checksum
        data[127] = checksum;
    /// Set the trailing byte of the 128-byte block at `start` so the block's bytes sum to 0 (mod 256) —
    /// the standard EDID block checksum.
    fn fix_block_checksum(edid: &mut [u8], start: usize) {
        let sum = edid[start..start + 127]
            .iter()
            .fold(0u8, |acc, &b| acc.wrapping_add(b));
        edid[start + 127] = 0u8.wrapping_sub(sum);
    }
}
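
A round trip over the serial patch and the per-block checksum makes a quick sanity check for the layout described above. This is a standalone sketch under assumptions: `patch_serial`, `read_serial` and `all_blocks_valid` are hypothetical helpers written for this example, and the buffer is a zero-filled stand-in rather than the driver's real EDID bytes.

```rust
const SERIAL_OFFSET: usize = 0x0C;
const BLOCK_LEN: usize = 128;

/// Set the last byte of the block starting at `start` so the block sums to 0 (mod 256).
fn fix_block_checksum(edid: &mut [u8], start: usize) {
    let sum = edid[start..start + BLOCK_LEN - 1]
        .iter()
        .fold(0u8, |acc, &b| acc.wrapping_add(b));
    edid[start + BLOCK_LEN - 1] = 0u8.wrapping_sub(sum);
}

/// Receiver-side check: every 128-byte block must sum to 0 (mod 256).
fn all_blocks_valid(edid: &[u8]) -> bool {
    !edid.is_empty()
        && edid.len() % BLOCK_LEN == 0
        && edid
            .chunks_exact(BLOCK_LEN)
            .all(|block| block.iter().fold(0u8, |acc, &b| acc.wrapping_add(b)) == 0)
}

/// Write `serial` little-endian at offset 0x0C, then repair both block checksums.
fn patch_serial(edid: &mut [u8; 256], serial: u32) {
    edid[SERIAL_OFFSET..SERIAL_OFFSET + 4].copy_from_slice(&serial.to_le_bytes());
    fix_block_checksum(edid, 0);
    fix_block_checksum(edid, BLOCK_LEN);
}

/// Read the serial back; `None` if the buffer is too short to contain it.
fn read_serial(edid: &[u8]) -> Option<u32> {
    let b = edid.get(SERIAL_OFFSET..SERIAL_OFFSET + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Because the serial lives in block 0, only the base checksum strictly changes when it is patched. Recomputing both, as `generate_with` does, keeps the function correct if the extension bytes are ever edited too.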

@@ -12,8 +12,10 @@ use wdf_umdf_sys::{
};

use crate::callbacks::{
    adapter_commit_modes, adapter_init_finished, assign_swap_chain, device_d0_entry,
    monitor_get_default_modes, monitor_query_modes, parse_monitor_description, unassign_swap_chain,
    adapter_commit_modes, adapter_commit_modes2, adapter_init_finished, assign_swap_chain,
    device_d0_entry, monitor_get_default_modes, monitor_query_modes, monitor_query_modes2,
    parse_monitor_description, parse_monitor_description2, query_target_info,
    set_default_hdr_metadata, set_gamma_ramp, unassign_swap_chain,
};
use crate::context::DeviceContext;
use crate::control::device_io_control;
@@ -73,6 +75,15 @@ extern "C-unwind" fn driver_add(
    config.EvtIddCxMonitorGetDefaultDescriptionModes = Some(monitor_get_default_modes);
    config.EvtIddCxMonitorQueryTargetModes = Some(monitor_query_modes);
    config.EvtIddCxAdapterCommitModes = Some(adapter_commit_modes);
    // IddCx 1.10 *2 mode DDIs (HDR-capable path). The OS prefers these on 1.10; the 1.x callbacks
    // above stay as the down-level fallback. B1 advertises SDR through them (so behaviour is unchanged);
    // B2 enables HDR by adding 10 bpc in `wire_bits()`, HIGH_COLOR_SPACE caps, and CAN_PROCESS_FP16.
    config.EvtIddCxParseMonitorDescription2 = Some(parse_monitor_description2);
    config.EvtIddCxMonitorQueryTargetModes2 = Some(monitor_query_modes2);
    config.EvtIddCxAdapterCommitModes2 = Some(adapter_commit_modes2);
    config.EvtIddCxAdapterQueryTargetInfo = Some(query_target_info);
    config.EvtIddCxMonitorSetDefaultHdrMetaData = Some(set_default_hdr_metadata);
    config.EvtIddCxMonitorSetGammaRamp = Some(set_gamma_ramp);
    config.EvtIddCxMonitorAssignSwapChain = Some(assign_swap_chain);
    config.EvtIddCxMonitorUnassignSwapChain = Some(unassign_swap_chain);
    // IddCx redirects device IOCTLs to this callback — our SudoVDA-compatible control plane.

@@ -0,0 +1,424 @@
//! P2 direct frame push — DRIVER side. The restricted WUDFHost token canNOT create named kernel
//! objects (proven on the RTX box: it can't even write a world-writable file), so — exactly like the
//! gamepad UMDF drivers (`crates/punktfunk-host/src/inject/dualsense_windows.rs`: *"the host creates
//! the section, privileged, with a permissive SDDL so the WUDFHost can open it; the driver maps it"*)
//! — the **host** creates the shared header + frame-ready event + ring of keyed-mutex textures, and
//! the driver only **OPENS** them. The driver writes its actual render-adapter LUID + a status code
//! back into the host-created header (our only driver-visibility channel: UMDF hides OutputDebugString
//! in ETW and the token can't write files), then copies each acquired swap-chain surface into the next
//! ring slot and signals the host.
//!
//! Host counterpart: `crates/punktfunk-host/src/capture/idd_push.rs` — [`SharedHeader`], [`MAGIC`],
//! [`RING_LEN`], the driver-status codes and the `Global\` object-name scheme are DUPLICATED
//! byte-identically there.

use std::sync::atomic::{AtomicPtr, AtomicU32, AtomicU64, Ordering};

use log::info;
use windows::core::{Interface, HSTRING};
use windows::Win32::Foundation::{CloseHandle, HANDLE};
use windows::Win32::Graphics::Direct3D11::{
    ID3D11Device, ID3D11Device1, ID3D11DeviceContext, ID3D11Texture2D, D3D11_TEXTURE2D_DESC,
};
use windows::Win32::Graphics::Dxgi::IDXGIKeyedMutex;
use windows::Win32::System::Memory::{
    MapViewOfFile, OpenFileMappingW, UnmapViewOfFile, FILE_MAP_ALL_ACCESS,
    MEMORY_MAPPED_VIEW_ADDRESS,
};
use windows::Win32::System::Threading::{OpenEventW, SetEvent, SYNCHRONIZATION_ACCESS_RIGHTS};

// --- kept byte-identical with the host (idd_push.rs) ---
pub const MAGIC: u32 = 0x4456_4650;
/// Kept for parity with the host's duplicated protocol header (the host writes it).
#[allow(dead_code)]
pub const VERSION: u32 = 1;
/// Ring slots. 6 (was 3) gives ample headroom so this 0 ms-timeout publish always finds a free slot
/// while the host briefly holds one across the convert/copy into its output ring and the depth-2
/// pipelined encode runs. MUST equal the host's `RING_LEN` (idd_push.rs) — both are rebuilt together;
/// a mismatch corrupts the slot mapping.
pub const RING_LEN: u32 = 6;
const DXGI_SHARED_RESOURCE_RW: u32 = 0x8000_0000 | 0x1;
/// SYNCHRONIZE | EVENT_MODIFY_STATE. The driver never waits on the event; it only SIGNALS it.
const EVENT_ACCESS: u32 = 0x0010_0000 | 0x0002;
const WAIT_TIMEOUT_HRESULT: i32 = 0x0000_0102;

/// `driver_status` values the driver writes into the host header (the host logs them on a timeout).
/// `NONE` is the host's initial value (kept for parity).
#[allow(dead_code)]
pub const DRV_STATUS_NONE: u32 = 0;
pub const DRV_STATUS_OPENED: u32 = 1;
pub const DRV_STATUS_TEX_FAIL: u32 = 2;
pub const DRV_STATUS_NO_DEVICE1: u32 = 3;

#[repr(C)]
pub struct SharedHeader {
    pub magic: u32,
    pub version: u32,
    pub generation: u32,
    pub ring_len: u32,
    pub width: u32,
    pub height: u32,
    pub dxgi_format: u32,
    pub _pad: u32,
    /// `(generation << 40) | (seq << 8) | slot`, as packed in `FramePublisher::publish`.
    /// DRIVER-written after each copy; host loads it `Acquire`.
    pub latest: u64,
    pub qpc_pts: u64,
    /// DRIVER-written: the adapter the swap-chain actually renders on (so the host can detect a
    /// mismatch with the textures it created and report it).
    pub driver_render_luid_low: u32,
    pub driver_render_luid_high: i32,
    /// DRIVER-written status (visibility channel).
    pub driver_status: u32,
    pub driver_status_detail: u32,
}

pub fn hdr_name(target_id: u32) -> String {
    format!("Global\\pfvd-hdr-{target_id}")
}
pub fn evt_name(target_id: u32) -> String {
    format!("Global\\pfvd-evt-{target_id}")
}
pub fn tex_name(target_id: u32, generation: u32, slot: u32) -> String {
    format!("Global\\pfvd-tex-{target_id}-{generation}-{slot}")
}
// --------------------------------------------------------

// ===== Bring-up debug channel (fixed-name, host-created) =====
// UMDF hides the driver's OutputDebugString (ETW) and the restricted token can't write files, so this
// fixed-name `Global\pfvd-dbg` block — created by the host with the permissive SDDL — is how the driver
// reports what it's doing, INDEPENDENT of the per-target header (which is the thing under test). The
// host reads + logs these counters. Duplicated in `idd_push.rs`.
#[repr(C)]
pub struct DebugBlock {
    pub magic: u32,
    /// ++ each `run_core` entry — proves the swap-chain processor runs at all.
    pub run_core_entries: u32,
    /// The `target_id` the driver resolved for naming (mismatch vs the host = the bug).
    pub resolved_target_id: u32,
    /// ++ each header-open attempt.
    pub header_open_attempts: u32,
    /// Last header-open error (win32/HRESULT).
    pub last_open_error: u32,
    /// 1 once the driver opened the per-target header.
    pub header_opened: u32,
    pub render_luid_low: u32,
    pub render_luid_high: i32,
    /// ++ each acquired swap-chain frame — proves frames flow (or the display is idle).
    pub frames_acquired: u32,
    pub _pad: u32,
}

static DBG_PTR: AtomicPtr<DebugBlock> = AtomicPtr::new(std::ptr::null_mut());

/// Map the host-created debug block on first use (fixed name). Returns null until the host creates it.
fn dbg_block() -> *mut DebugBlock {
    let p = DBG_PTR.load(Ordering::Acquire);
    if !p.is_null() {
        return p;
    }
    let Ok(map) = (unsafe {
        OpenFileMappingW(FILE_MAP_ALL_ACCESS.0, false, &HSTRING::from("Global\\pfvd-dbg"))
    }) else {
        return std::ptr::null_mut();
    };
    let view = unsafe { MapViewOfFile(map, FILE_MAP_ALL_ACCESS, 0, 0, std::mem::size_of::<DebugBlock>()) };
    if view.Value.is_null() {
        unsafe {
            let _ = CloseHandle(map);
        }
        return std::ptr::null_mut();
    }
    let np = view.Value.cast::<DebugBlock>();
    match DBG_PTR.compare_exchange(std::ptr::null_mut(), np, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => np, // we win; intentionally leak the handle (diagnostic, process-lifetime)
        Err(existing) => {
            unsafe {
                let _ = UnmapViewOfFile(view);
                let _ = CloseHandle(map);
            }
            existing
        }
    }
}

pub fn dbg_run_core_entry() {
    let p = dbg_block();
    if !p.is_null() {
        unsafe {
            (*(std::ptr::addr_of_mut!((*p).run_core_entries) as *const AtomicU32))
                .fetch_add(1, Ordering::Relaxed);
        }
    }
}

pub fn dbg_frame() {
    let p = dbg_block();
    if !p.is_null() {
        unsafe {
            (*(std::ptr::addr_of_mut!((*p).frames_acquired) as *const AtomicU32))
                .fetch_add(1, Ordering::Relaxed);
        }
    }
}

/// Record the target id + render LUID the driver will use to name the shared objects.
pub fn dbg_set_target(target_id: u32, render_luid_low: u32, render_luid_high: i32) {
    let p = dbg_block();
    if !p.is_null() {
        unsafe {
            (*p).resolved_target_id = target_id;
            (*p).render_luid_low = render_luid_low;
            (*p).render_luid_high = render_luid_high;
        }
    }
}

/// Record a header-open attempt + its error (0 = success).
pub fn dbg_header_attempt(error: u32, opened: bool) {
    let p = dbg_block();
    if !p.is_null() {
        unsafe {
            (*(std::ptr::addr_of_mut!((*p).header_open_attempts) as *const AtomicU32))
                .fetch_add(1, Ordering::Relaxed);
            (*p).last_open_error = error;
            if opened {
                (*p).header_opened = 1;
            }
        }
    }
}

struct Slot {
    tex: ID3D11Texture2D,
    mutex: IDXGIKeyedMutex,
}

/// Publishes acquired swap-chain surfaces into the HOST-created ring. Owned by the swap-chain
/// processor thread; attached lazily once the host has created the shared objects.
pub struct FramePublisher {
    context: ID3D11DeviceContext,
    map: HANDLE,
    header: *mut SharedHeader,
    event: HANDLE,
    slots: Vec<Slot>,
    next: u32,
    seq: u64,
    /// The host-created ring textures' DXGI format (from the shared header). A swap-chain surface whose
    /// format differs (e.g. an FP16 HDR frame vs a BGRA ring) is dropped in `publish` — CopyResource
    /// needs matching formats.
    ring_format: u32,
    /// The ring generation this publisher attached to. The host BUMPS the header generation when it
    /// recreates the ring at a new format mid-session (the display's HDR mode flipped) — [`Self::is_stale`]
    /// detects that so `run_core` re-attaches to the new-format textures instead of dropping every frame.
    generation: u32,
}

// SAFETY: created and used only on the swap-chain processor thread.
unsafe impl Send for FramePublisher {}

impl FramePublisher {
    /// Try ONCE to attach to the host-created shared objects. Returns `Err` cheaply if the host hasn't
    /// created/published them yet — the drain loop retries periodically, so a non-IDD-push session
    /// just keeps draining with no stall.
    pub fn try_open(
        target_id: u32,
        render_luid_low: u32,
        render_luid_high: i32,
        device: &ID3D11Device,
        context: &ID3D11DeviceContext,
    ) -> windows::core::Result<Self> {
        // 1. Open the host-created header (RW). Err if the host hasn't created it yet.
        let map = unsafe {
            OpenFileMappingW(
                FILE_MAP_ALL_ACCESS.0,
                false,
                &HSTRING::from(hdr_name(target_id)),
            )?
        };
        let view =
            unsafe { MapViewOfFile(map, FILE_MAP_ALL_ACCESS, 0, 0, std::mem::size_of::<SharedHeader>()) };
        if view.Value.is_null() {
            unsafe {
                let _ = CloseHandle(map);
            }
            return Err(windows::core::Error::from_win32());
        }
        let header = view.Value.cast::<SharedHeader>();

        // 2. Report our render adapter to the host immediately (lets it detect a mismatch).
        unsafe {
            (*header).driver_render_luid_low = render_luid_low;
            (*header).driver_render_luid_high = render_luid_high;
        }

        // 3. The host sets magic==MAGIC only once the ring textures exist. Not ready → retry later.
        let magic =
            unsafe { (*(std::ptr::addr_of!((*header).magic) as *const AtomicU32)).load(Ordering::Acquire) };
        if magic != MAGIC {
            unsafe {
                let _ = UnmapViewOfFile(MEMORY_MAPPED_VIEW_ADDRESS { Value: header.cast() });
                let _ = CloseHandle(map);
            }
            return Err(windows::core::Error::from_win32());
        }
        let (generation, ring_len) =
            unsafe { ((*header).generation, (*header).ring_len.min(RING_LEN)) };

        // 4. Open the event (SYNCHRONIZE | EVENT_MODIFY_STATE so we can SetEvent).
        let event = match unsafe {
            OpenEventW(
                SYNCHRONIZATION_ACCESS_RIGHTS(EVENT_ACCESS),
                false,
                &HSTRING::from(evt_name(target_id)),
            )
        } {
            Ok(e) => e,
            Err(e) => {
                unsafe {
                    let _ = UnmapViewOfFile(MEMORY_MAPPED_VIEW_ADDRESS { Value: header.cast() });
                    let _ = CloseHandle(map);
                }
                return Err(e);
            }
        };

        // 5. Open device1 + the ring textures the host created (same render adapter required).
        let device1: ID3D11Device1 = match device.cast() {
            Ok(d) => d,
            Err(e) => {
                unsafe {
                    (*header).driver_status = DRV_STATUS_NO_DEVICE1;
                    let _ = CloseHandle(event);
                    let _ = UnmapViewOfFile(MEMORY_MAPPED_VIEW_ADDRESS { Value: header.cast() });
                    let _ = CloseHandle(map);
                }
                return Err(e);
            }
        };
        let mut slots = Vec::new();
        for k in 0..ring_len {
            let name = HSTRING::from(tex_name(target_id, generation, k));
            let opened: windows::core::Result<ID3D11Texture2D> =
                unsafe { device1.OpenSharedResourceByName(&name, DXGI_SHARED_RESOURCE_RW) };
            match opened {
                Ok(tex) => match tex.cast::<IDXGIKeyedMutex>() {
                    Ok(mutex) => slots.push(Slot { tex, mutex }),
                    Err(e) => {
                        unsafe {
                            (*header).driver_status = DRV_STATUS_TEX_FAIL;
                            (*header).driver_status_detail = e.code().0 as u32;
                            let _ = CloseHandle(event);
                            let _ = UnmapViewOfFile(MEMORY_MAPPED_VIEW_ADDRESS { Value: header.cast() });
                            let _ = CloseHandle(map);
                        }
                        return Err(e);
                    }
                },
                Err(e) => {
                    // Most likely a render-adapter mismatch (the host made the textures on a different
                    // GPU than the swap-chain renders on). Tell the host so it can report it.
                    unsafe {
                        (*header).driver_status = DRV_STATUS_TEX_FAIL;
                        (*header).driver_status_detail = e.code().0 as u32;
                        let _ = CloseHandle(event);
                        let _ = UnmapViewOfFile(MEMORY_MAPPED_VIEW_ADDRESS { Value: header.cast() });
                        let _ = CloseHandle(map);
                    }
                    return Err(e);
                }
            }
        }

        unsafe {
            (*header).driver_status = DRV_STATUS_OPENED;
        }
        info!("frame-push(driver): attached to host ring gen {generation} ({ring_len} slots)");
        Ok(Self {
            context: context.clone(),
            map,
            header,
            event,
            slots,
            next: 0,
            seq: 0,
            ring_format: unsafe { (*header).dxgi_format },
            generation,
        })
    }

    #[inline]
    fn latest_cell(&self) -> &AtomicU64 {
        unsafe { &*(std::ptr::addr_of!((*self.header).latest) as *const AtomicU64) }
    }

    /// True once the host has recreated the ring (bumped the header generation) — e.g. the display's
    /// HDR mode flipped, so the ring format changed (FP16 ⇄ BGRA) and the texture names now carry a new
    /// generation. `run_core` drops the publisher on this so it re-attaches to the new ring.
    pub fn is_stale(&self) -> bool {
        let cur = unsafe {
            (*(std::ptr::addr_of!((*self.header).generation) as *const AtomicU32))
                .load(Ordering::Acquire)
        };
        cur != self.generation
    }

    /// Copy `surface` into the next free ring slot and signal the host. Never blocks (0 ms try-acquire).
    pub fn publish(&mut self, surface: &ID3D11Texture2D) {
        let ring_len = self.slots.len() as u32;
        if ring_len == 0 {
            return;
        }
        // B2 format guard: CopyResource needs the surface + ring textures to share a DXGI format. Drop
        // a frame that doesn't match (e.g. an FP16 HDR surface arriving while the ring is still BGRA,
        // before B3 makes the ring FP16) instead of corrupting / failing the copy.
        let mut desc = D3D11_TEXTURE2D_DESC::default();
        unsafe { surface.GetDesc(&mut desc) };
        if desc.Format.0 as u32 != self.ring_format {
            return;
        }
        let start = self.next;
        for attempt in 0..ring_len {
            let slot = (start + attempt) % ring_len;
            let s = &self.slots[slot as usize];
            match unsafe { s.mutex.AcquireSync(0, 0) } {
                Ok(()) => {
                    unsafe {
                        self.context.CopyResource(&s.tex, surface);
                        let _ = s.mutex.ReleaseSync(0);
                    }
                    self.seq = self.seq.wrapping_add(1);
                    // `latest` = (generation << 40) | (seq << 8) | slot. Stamping the generation lets the
                    // host REJECT a publish from a stale ring (an old-generation publisher racing the
                    // host's mid-session ring recreate) so it never consumes an unwritten new-ring slot.
                    let latest = (u64::from(self.generation) << 40)
                        | ((self.seq & 0xFFFF_FFFF) << 8)
                        | u64::from(slot & 0xff);
                    self.latest_cell().store(latest, Ordering::Release);
                    unsafe {
                        let _ = SetEvent(self.event);
                    }
                    self.next = (slot + 1) % ring_len;
                    return;
                }
                Err(e) if e.code().0 == WAIT_TIMEOUT_HRESULT => continue,
                Err(_) => return,
            }
        }
        // All slots busy — drop this frame (never block the swap-chain thread).
    }
}
|
||||
|
||||
impl Drop for FramePublisher {
|
||||
fn drop(&mut self) {
|
||||
self.slots.clear();
|
||||
unsafe {
|
||||
if !self.header.is_null() {
|
||||
let _ = UnmapViewOfFile(MEMORY_MAPPED_VIEW_ADDRESS {
|
||||
Value: self.header.cast(),
|
||||
});
|
||||
}
|
||||
let _ = CloseHandle(self.event);
|
||||
let _ = CloseHandle(self.map);
|
||||
}
|
||||
}
|
||||
}
|
||||
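The `latest` word that `publish` stores packs three fields into one `u64`. A minimal sketch of that layout follows, assuming the bit positions given in the comment above (`generation << 40`, `seq << 8`, slot in the low 8 bits). The names `pack_latest`, `unpack_latest`, and `accept_publish` are hypothetical and exist only for illustration; the host's real rejection check is not part of this diff. One consequence of the layout: shifting a 32-bit generation left by 40 keeps only its low 24 bits, so the sketch compares truncated values.

```rust
/// Assumed layout, taken from the comment in `publish`:
/// bits 40..64 = generation (low 24 bits), bits 8..40 = seq, bits 0..8 = slot.
const SLOT_BITS: u32 = 8;
const SEQ_BITS: u32 = 32;

/// Pack the three fields the same way `publish` does.
pub fn pack_latest(generation: u32, seq: u64, slot: u32) -> u64 {
    (u64::from(generation) << (SLOT_BITS + SEQ_BITS))
        | ((seq & 0xFFFF_FFFF) << SLOT_BITS)
        | u64::from(slot & 0xff)
}

/// Split a `latest` word back into (generation, seq, slot).
pub fn unpack_latest(latest: u64) -> (u32, u32, u8) {
    let generation = (latest >> (SLOT_BITS + SEQ_BITS)) as u32;
    let seq = ((latest >> SLOT_BITS) & 0xFFFF_FFFF) as u32;
    let slot = (latest & 0xff) as u8;
    (generation, seq, slot)
}

/// Hypothetical host-side check: accept a publish only if it carries the
/// generation of the ring the host currently owns.
pub fn accept_publish(latest: u64, current_generation: u32) -> Option<(u32, u8)> {
    let (generation, seq, slot) = unpack_latest(latest);
    // Only 24 generation bits survive the shift by 40, so compare truncated values.
    (generation == (current_generation & 0x00FF_FFFF)).then_some((seq, slot))
}
```

A publish stamped with generation 3 is accepted by a host on generation 3 and rejected by a host that has already moved to generation 4, which is the stale-ring race the comment describes.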
@@ -12,6 +12,7 @@ mod control;
mod direct_3d_device;
mod edid;
mod entry;
mod frame_transport;
mod helpers;
mod logger;
mod monitor;

@@ -1,12 +1,22 @@
//! Minimal `log` backend that writes to `OutputDebugString` — no `driver-logger`/event-log/`tokio`.
//! View with DebugView/WinDbg. Keeping the `log` facade lets the ported callbacks/context use
//! `error!`/`info!`/`debug!` unchanged.
//! Minimal `log` backend that writes to `OutputDebugString` AND tees to a file — UMDF redirects a
//! hosted driver's `OutputDebugString` to ETW (invisible to DebugView), so the file tee is how we
//! actually read driver logs during bring-up. Keeping the `log` facade lets the ported
//! callbacks/context use `error!`/`info!`/`debug!` unchanged.

use std::fs::OpenOptions;
use std::io::Write;
use std::sync::Mutex;

use log::{LevelFilter, Metadata, Record};
use windows::core::PCSTR;
use windows::Win32::System::Diagnostics::Debug::OutputDebugStringA;

struct DbgLogger;
/// World-writable so the restricted WUDFHost token can append. Read it during bring-up.
const LOG_PATH: &str = r"C:\Users\Public\pfvd-driver.log";

struct DbgLogger {
    file: Mutex<()>,
}

impl log::Log for DbgLogger {
    fn enabled(&self, _metadata: &Metadata) -> bool {
@@ -17,12 +27,19 @@ impl log::Log for DbgLogger {
        let msg = format!("[pf-vdisplay] {:<5} {}\0", record.level(), record.args());
        // SAFETY: `msg` is a NUL-terminated byte string valid for the call.
        unsafe { OutputDebugStringA(PCSTR(msg.as_ptr())) };
        // Tee to the file (best-effort): the real channel during bring-up.
        let _guard = self.file.lock();
        if let Ok(mut f) = OpenOptions::new().create(true).append(true).open(LOG_PATH) {
            let _ = writeln!(f, "{:<5} {}", record.level(), record.args());
        }
    }

    fn flush(&self) {}
}

static LOGGER: DbgLogger = DbgLogger;
static LOGGER: DbgLogger = DbgLogger {
    file: Mutex::new(()),
};

pub fn init() {
    let _ = log::set_logger(&LOGGER);
@@ -31,4 +48,8 @@ pub fn init() {
    } else {
        LevelFilter::Info
    });
    // Boot marker so each load is distinguishable in the file.
    if let Ok(mut f) = OpenOptions::new().create(true).append(true).open(LOG_PATH) {
        let _ = writeln!(f, "==== pf-vdisplay logger init ====");
    }
}

@@ -6,6 +6,7 @@
use std::ptr::NonNull;
use std::sync::atomic::{AtomicU32, AtomicU64};
use std::sync::{Mutex, OnceLock};
use std::time::Instant;

use wdf_umdf_sys::{IDDCX_ADAPTER__, IDDCX_MONITOR__};

@@ -37,6 +38,10 @@ pub struct MonitorObject {
    pub target_id: u32,
    pub adapter_luid_low: u32,
    pub adapter_luid_high: i32,
    /// When the entry was pushed (`do_add`). The watchdog skips monitors younger than the host's
    /// setup window (CCD commit + GDI-name resolve + settle) so a still-initializing monitor is never
    /// torn down mid-birth during reconnect churn.
    pub created_at: Instant,
}
// SAFETY: the raw IddCx object ptr is framework-managed; access is serialized by MONITOR_MODES.
unsafe impl Send for MonitorObject {}
@@ -53,9 +58,12 @@ pub static MONITOR_MODES: Mutex<Vec<MonitorObject>> = Mutex::new(Vec::new());

/// Monitor id / EDID-serial counter (unique per created monitor).
pub static NEXT_ID: AtomicU32 = AtomicU32::new(1);
/// Watchdog (seconds). The host reads the timeout via GET_WATCHDOG and PINGs to keep alive.
pub static WATCHDOG_TIMEOUT: AtomicU32 = AtomicU32::new(3);
pub static WATCHDOG_COUNTDOWN: AtomicU32 = AtomicU32::new(3);
/// Watchdog (seconds). The host reads the timeout via GET_WATCHDOG and PINGs to keep alive. 8 s (was
/// 3) gives the host's between-session teardown gap — stop old pinger → CCD display re-attach (a slow
/// `SetDisplayConfig`) → REMOVE — headroom, so the watchdog doesn't spuriously fire during reconnect
/// churn. The host derives its PING interval from this (timeout/3), so it auto-adjusts.
pub static WATCHDOG_TIMEOUT: AtomicU32 = AtomicU32::new(8);
pub static WATCHDOG_COUNTDOWN: AtomicU32 = AtomicU32::new(8);
/// The preferred render adapter LUID set via SET_RENDER_ADAPTER, packed `(high<<32)|low`. 0 = none.
pub static PREFERRED_RENDER_ADAPTER: AtomicU64 = AtomicU64::new(0);

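The watchdog comment above describes a countdown that the host keeps alive with PINGs sent at a third of the timeout. The sketch below illustrates that relationship under stated assumptions: that a PING reloads the countdown from the timeout, that the driver decrements the countdown once per second, and that the host computes the interval in milliseconds. None of these details are confirmed by this diff, and `ping_interval`, `on_ping`, and `tick` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::Duration;

/// Host side (assumed): PING interval derived from the driver-reported timeout.
pub fn ping_interval(timeout_secs: u32) -> Duration {
    // Millisecond arithmetic keeps the fraction (8 s / 3 is roughly 2.67 s);
    // the clamp avoids a zero interval for a zero timeout.
    Duration::from_millis((u64::from(timeout_secs) * 1000 / 3).max(1))
}

/// Driver side (assumed): a PING reloads the countdown from the timeout.
pub fn on_ping(timeout: &AtomicU32, countdown: &AtomicU32) {
    countdown.store(timeout.load(Ordering::Relaxed), Ordering::Relaxed);
}

/// Driver side (assumed): called once per second. Returns true when the
/// countdown reaches zero (or was already zero), i.e. the watchdog fires.
pub fn tick(countdown: &AtomicU32) -> bool {
    match countdown.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |c| c.checked_sub(1)) {
        Ok(previous) => previous == 1,
        Err(_) => true, // already at zero
    }
}
```

With an 8 s timeout this gives the host roughly three PING opportunities per window, so a single delayed PING during a slow `SetDisplayConfig` does not let the countdown expire.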
@@ -4,29 +4,39 @@ use std::{
        Arc,
    },
    thread::{self, JoinHandle},
    time::Duration,
};

use log::{debug, error};
use wdf_umdf::{
    IddCxSwapChainFinishedProcessingFrame, IddCxSwapChainReleaseAndAcquireBuffer,
    IddCxSwapChainFinishedProcessingFrame, IddCxSwapChainReleaseAndAcquireBuffer2,
    IddCxSwapChainSetDevice, WdfObjectDelete,
};
use wdf_umdf_sys::{
    HANDLE, IDARG_IN_SWAPCHAINSETDEVICE, IDARG_OUT_RELEASEANDACQUIREBUFFER, IDDCX_SWAPCHAIN,
    NTSTATUS, WAIT_TIMEOUT, WDFOBJECT,
    HANDLE, IDARG_IN_RELEASEANDACQUIREBUFFER2, IDARG_IN_SWAPCHAINSETDEVICE,
    IDARG_OUT_RELEASEANDACQUIREBUFFER2, IDDCX_SWAPCHAIN, NTSTATUS, WAIT_TIMEOUT, WDFOBJECT,
};
use windows::{
    core::{w, Interface},
    Win32::{
        Foundation::HANDLE as WHANDLE,
        Graphics::Dxgi::IDXGIDevice,
        Graphics::{
            Direct3D11::ID3D11Texture2D,
            Dxgi::{IDXGIDevice, IDXGIResource},
        },
        System::Threading::{
            AvRevertMmThreadCharacteristics, AvSetMmThreadCharacteristicsW, WaitForSingleObject,
        },
    },
};

use crate::{direct_3d_device::Direct3DDevice, helpers::Sendable};
use crate::{
    direct_3d_device::Direct3DDevice,
    frame_transport::{
        dbg_frame, dbg_header_attempt, dbg_run_core_entry, dbg_set_target, FramePublisher,
    },
    helpers::Sendable,
};

pub struct SwapChainProcessor {
    terminate: Arc<AtomicBool>,
@@ -47,8 +57,11 @@ impl SwapChainProcessor {
    pub fn run(
        &mut self,
        swap_chain: IDDCX_SWAPCHAIN,
        device: Direct3DDevice,
        device: Arc<Direct3DDevice>,
        available_buffer_event: HANDLE,
        target_id: u32,
        render_luid_low: u32,
        render_luid_high: i32,
    ) {
        let available_buffer_event = unsafe { Sendable::new(available_buffer_event) };
        let swap_chain = unsafe { Sendable::new(swap_chain) };
@@ -64,7 +77,17 @@ impl SwapChainProcessor {
                return;
            };

            Self::run_core(*swap_chain, &device, *available_buffer_event, &terminate);
            Self::run_core(
                *swap_chain,
                &device,
                *available_buffer_event,
                &terminate,
                target_id,
                render_luid_low,
                render_luid_high,
            );

            error!("run_core RETURNED (target={target_id}) — deleting swap-chain, device drops next");

            let res = unsafe { WdfObjectDelete(*swap_chain as WDFOBJECT) };
            if let Err(e) = res {
@@ -87,31 +110,140 @@ impl SwapChainProcessor {
        device: &Direct3DDevice,
        available_buffer_event: HANDLE,
        terminate: &AtomicBool,
        target_id: u32,
        render_luid_low: u32,
        render_luid_high: i32,
    ) {
        let dxgi_device = device.device.cast::<IDXGIDevice>();
        let Ok(dxgi_device) = dxgi_device else {
            error!("Failed to cast ID3D11Device to IDXGIDevice: {dxgi_device:?}");
            return;
        };
        // P2 direct frame push: lazily ATTACH to the HOST-created shared ring. The restricted UMDF
        // token can't create named objects, so the host creates the header + event + textures and we
        // only OPEN them once they appear (`try_open`). Until then we just drain — exactly the P1
        // behaviour — so a non-IDD-push session never stalls. Retried every ~30 frames.
        let mut publisher: Option<FramePublisher> = None;
        let mut frames_since_try: u32 = u32::MAX; // attach attempt on the first acquired frame

        // Bring-up debug: prove run_core ran + record the target/render LUID we'll name objects with.
        dbg_run_core_entry();
        dbg_set_target(target_id, render_luid_low, render_luid_high);

        // SetDevice fails (0x887A0026, FACILITY_DXGI) when the monitor briefly flaps INACTIVE during
        // topology activation — the OS unassigns + re-assigns the swap-chain, and a fresh run_core thread
        // can lose the race to the unassign. Retry briefly so a stable re-assign binds the device instead
        // of giving up on the first transient failure. `terminate` (set when the OS unassigns + drops the
        // processor) breaks us out promptly.
        // Cast to IDXGIDevice ONCE and BORROW it to the swap-chain across all retries. The previous
        // code re-cast + `into_raw()`'d on EVERY attempt — and a flapping monitor fails several
        // attempts per session — so each failure orphaned one IDXGIDevice reference, pinning the D3D
        // device so it (and its ~dozen D3D worker threads + tens of MB of VRAM) was NEVER freed when
        // the processor dropped. That leaked ~71 threads / ~57 MB VRAM per reconnect until the driver
        // choked and sessions fell to 0 bytes. `as_raw()` keeps our single reference (released right
        // after the loop); IddCx AddRefs its own on success, and `device` keeps the object alive for
        // the drain loop regardless.
        let dxgi_device = match device.device.cast::<IDXGIDevice>() {
            Ok(d) => d,
            Err(e) => {
                error!("Failed to cast ID3D11Device to IDXGIDevice: {e:?}");
                return;
            }
        };
        let set_device = IDARG_IN_SWAPCHAINSETDEVICE {
            pDevice: dxgi_device.into_raw().cast(),
            pDevice: dxgi_device.as_raw().cast(),
        };

        let res = unsafe { IddCxSwapChainSetDevice(swap_chain, &set_device) };
        if res.is_err() {
            debug!("Failed to set swapchain device: {res:?}");
        let mut set_ok = false;
        let mut terminated = false;
        for attempt in 0..60u32 {
            if terminate.load(Ordering::Relaxed) {
                error!("run_core: terminated during SetDevice (attempt {attempt}, target={target_id})");
                terminated = true;
                break;
            }
            let res = unsafe { IddCxSwapChainSetDevice(swap_chain, &set_device) };
            if res.is_ok() {
                set_ok = true;
                error!("run_core: SetDevice OK (target={target_id}, attempt={attempt}) — entering drain loop");
                break;
            }
            if attempt == 0 {
                debug!("run_core: SetDevice attempt 0 failed ({res:?}) — retrying up to 60x@50ms (monitor may be flapping)");
            }
            thread::sleep(Duration::from_millis(50));
        }
        // Release our borrowed device reference — IddCx holds its own now, or we gave up. (Explicit
        // drop so NLL can't release it mid-loop while the swap-chain still references the raw ptr.)
        drop(dxgi_device);
        if !set_ok {
            if !terminated {
                error!("run_core: SetDevice never succeeded after retries (target={target_id}) — giving up");
            }
            return;
        }

        let mut logged_pending = false;
        let mut logged_frame = false;
        loop {
            let mut buffer = IDARG_OUT_RELEASEANDACQUIREBUFFER::default();
            let hr: NTSTATUS =
                unsafe { IddCxSwapChainReleaseAndAcquireBuffer(swap_chain, &mut buffer).into() };
            // Check terminate at the TOP, every iteration. The success branch below does NOT re-check
            // it, so during a CONTINUOUS frame burst (DWM rendering the freshly-activated desktop) a
            // thread that the OS unassigns — or that `free_swap_chain_processor` is dropping — never
            // sees the flag and loops on, pinning its D3D device (and ~36 NVIDIA worker threads). That
            // is THE reconnect leak: it only reproduced at full speed, because cdb's pacing forced
            // E_PENDING gaps (which DO check terminate) and masked it. Without this, `SwapChainProcessor::drop`'s
            // join can also block until the burst ends.
            if terminate.load(Ordering::Relaxed) {
                break;
            }
            // The host recreates the shared ring (new format) mid-session when the display's HDR mode
            // flips — it bumps the header generation. Detect that and drop the publisher so we re-attach
            // to the new-format textures below; otherwise we'd keep CopyResource'ing into the stale ring,
            // whose format now mismatches the surface → the publish() format-guard drops every frame and
            // the stream freezes until the next swap-chain recreate.
            if publisher.as_ref().is_some_and(FramePublisher::is_stale) {
                publisher = None;
                frames_since_try = u32::MAX; // re-attach immediately
            }
            // Lazy-attach (rate-limited) at the loop TOP so we keep trying even while the display is
            // idle (E_PENDING / no frames presented yet), not only when a frame is acquired. `try_open`
            // is a cheap OpenFileMapping that fails fast until the host has created the ring.
            if publisher.is_none() {
                if frames_since_try >= 30 {
                    frames_since_try = 0;
                    match FramePublisher::try_open(
                        target_id,
                        render_luid_low,
                        render_luid_high,
                        &device.device,
                        &device.device_context,
                    ) {
                        Ok(p) => {
                            dbg_header_attempt(0, true);
                            publisher = Some(p);
                        }
                        Err(e) => dbg_header_attempt(e.code().0 as u32, false),
                    }
                } else {
                    frames_since_try += 1;
                }
            }

            // B2: ...Buffer2 is required once CAN_PROCESS_FP16 is set. AcquireSystemMemoryBuffer=FALSE
            // keeps the GPU surface (out.MetaData.pSurface). The surface format varies per-frame —
            // FP16 (R16G16B16A16_FLOAT) in HDR, BGRA in SDR — and the publisher's format guard handles
            // a frame that doesn't match the ring until B3 makes the ring FP16.
            let mut in_args = IDARG_IN_RELEASEANDACQUIREBUFFER2 {
                #[allow(clippy::cast_possible_truncation)]
                Size: std::mem::size_of::<IDARG_IN_RELEASEANDACQUIREBUFFER2>() as u32,
                AcquireSystemMemoryBuffer: 0,
            };
            let mut buffer = IDARG_OUT_RELEASEANDACQUIREBUFFER2::default();
            let hr: NTSTATUS = unsafe {
                IddCxSwapChainReleaseAndAcquireBuffer2(swap_chain, &mut in_args, &mut buffer).into()
            };

            #[allow(clippy::items_after_statements)]
            const E_PENDING: u32 = 0x8000_000A;
            if u32::from(hr) == E_PENDING {
                if !logged_pending {
                    error!("run_core: E_PENDING (target={target_id}) — swap-chain valid but DWM has composed NO frame yet");
                    logged_pending = true;
                }
                let wait_result =
                    unsafe { WaitForSingleObject(WHANDLE(available_buffer_event.cast()), 16).0 };

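The SetDevice handling in the hunk above is a bounded retry loop that also watches a cancellation flag. A minimal, generic sketch of that pattern follows. The helper `retry_until` and the `RetryOutcome` enum are hypothetical and not part of the driver; the operation is a plain closure standing in for `IddCxSwapChainSetDevice`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// How the retry loop ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum RetryOutcome {
    Succeeded { attempt: u32 },
    Terminated { attempt: u32 },
    Exhausted,
}

/// Run `op` up to `max_attempts` times, sleeping `delay` after each failure.
/// The `terminate` flag is checked before every attempt so a cancelled
/// caller leaves promptly instead of sleeping through the remaining retries.
pub fn retry_until<E>(
    max_attempts: u32,
    delay: Duration,
    terminate: &AtomicBool,
    mut op: impl FnMut() -> Result<(), E>,
) -> RetryOutcome {
    for attempt in 0..max_attempts {
        if terminate.load(Ordering::Relaxed) {
            return RetryOutcome::Terminated { attempt };
        }
        if op().is_ok() {
            return RetryOutcome::Succeeded { attempt };
        }
        thread::sleep(delay);
    }
    RetryOutcome::Exhausted
}
```

In the driver the equivalent call would use 60 attempts and a 50 ms delay, giving a flapping monitor about three seconds to settle. Returning a distinct `Terminated` outcome mirrors the diff's choice to skip the "never succeeded" error when the loop was cancelled rather than exhausted.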
@@ -130,8 +262,29 @@ impl SwapChainProcessor {
                // The wait was cancelled or something unexpected happened
                break;
            } else if hr.is_success() {
                if !logged_frame {
                    error!("run_core: FIRST FRAME acquired (target={target_id}) — DWM IS compositing the virtual display!");
                    logged_frame = true;
                }
                dbg_frame(); // bring-up: prove frames actually flow (vs an idle display)
                // This is the most performance-critical section of code in an IddCx driver. It's important that whatever
                // is done with the acquired surface be finished as quickly as possible.
                //
                // P2: copy the acquired surface into the shared ring BEFORE FinishedProcessingFrame
                // (the surface is valid until the next ReleaseAndAcquire). The pointer is BORROWED —
                // `from_raw_borrowed` does not take IddCx's refcount — and the GPU-side copy is ordered
                // before the consumer via the slot keyed mutex. (Attach happens at the loop top.)
                if let Some(pub_) = publisher.as_mut() {
                    let raw = buffer.MetaData.pSurface as *mut core::ffi::c_void;
                    if !raw.is_null() {
                        if let Some(res) = unsafe { IDXGIResource::from_raw_borrowed(&raw) } {
                            if let Ok(tex) = res.cast::<ID3D11Texture2D>() {
                                pub_.publish(&tex);
                            }
                        }
                    }
                }

                let hr = unsafe { IddCxSwapChainFinishedProcessingFrame(swap_chain) };

                if hr.is_err() {

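The drain loop in this file combines two ideas: a rate-limited lazy attach (one attempt, then a pause of about 30 iterations) and an immediate re-attach when the ring generation changes. The sketch below models only that counter logic. `Attached` is a hypothetical stand-in for `FramePublisher`, `LazyAttach` is an illustrative wrapper that does not exist in the driver, and the generation is passed in directly instead of being read from the shared header.

```rust
/// Hypothetical stand-in for `FramePublisher`: remembers the ring generation it attached to.
pub struct Attached {
    pub generation: u32,
}

/// Counter state for a rate-limited, generation-aware attach.
pub struct LazyAttach {
    attached: Option<Attached>,
    since_try: u32,
    retry_every: u32,
}

impl LazyAttach {
    pub fn new(retry_every: u32) -> Self {
        // u32::MAX forces an attempt on the very first poll.
        Self { attached: None, since_try: u32::MAX, retry_every }
    }

    /// One loop iteration. `ring_generation` is the generation the host currently
    /// advertises; `try_open` is the cheap, fail-fast attach attempt.
    /// Returns true when an attachment is held after this poll.
    pub fn poll<E>(
        &mut self,
        ring_generation: u32,
        mut try_open: impl FnMut() -> Result<Attached, E>,
    ) -> bool {
        // Stale attachment: drop it and allow an immediate re-attach.
        if self.attached.as_ref().is_some_and(|a| a.generation != ring_generation) {
            self.attached = None;
            self.since_try = u32::MAX;
        }
        if self.attached.is_none() {
            if self.since_try >= self.retry_every {
                self.since_try = 0;
                self.attached = try_open().ok();
            } else {
                self.since_try += 1;
            }
        }
        self.attached.is_some()
    }
}
```

Because the counter is reset to zero on an attempt and incremented on the iterations that follow, attempts land on every 31st poll while the ring is missing, which matches the "every ~30 frames" wording in the source comment.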
@@ -7,7 +7,10 @@ use winreg::enums::HKEY_LOCAL_MACHINE;
use winreg::RegKey;

const UMDF_V: &str = "2.31";
const IDDCX_V: &str = "1.4";
// Bumped 1.4 -> 1.10 for HDR/FP16 support (IDDCX_ADAPTER_FLAGS_CAN_PROCESS_FP16,
// IddCxSwapChainReleaseAndAcquireBuffer2, the *2 mode/metadata DDIs). 1.10 is a superset of 1.4, so
// existing call sites keep working; the new HDR DDIs become available to bind.
const IDDCX_V: &str = "1.10";

#[derive(Debug, thiserror::Error)]
enum Error {

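The bump from "1.4" to "1.10" is a move to a later minor version (10 after 4), which is easy to misread because the strings sort the other way. A small sketch of a numeric comparison follows. `parse_version` is a hypothetical helper, and nothing here implies the build script compares versions this way.

```rust
/// Parse a "major.minor" version string into a numeric pair.
/// Returns None when the string is not of that shape.
pub fn parse_version(v: &str) -> Option<(u32, u32)> {
    let (major, minor) = v.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}
```

Compared as tuples, `(1, 10)` is greater than `(1, 4)`, whereas the raw strings compare the opposite way because `'1'` sorts before `'4'`.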
@@ -7,7 +7,8 @@ use wdf_umdf_sys::{
    IDARG_IN_ADAPTERSETRENDERADAPTER, IDARG_IN_ADAPTER_INIT, IDARG_IN_MONITORCREATE,
    IDARG_IN_QUERY_HWCURSOR, IDARG_IN_SETUP_HWCURSOR, IDARG_IN_SWAPCHAINSETDEVICE,
    IDARG_OUT_ADAPTER_INIT, IDARG_OUT_MONITORARRIVAL, IDARG_OUT_MONITORCREATE,
    IDARG_OUT_QUERY_HWCURSOR, IDARG_OUT_RELEASEANDACQUIREBUFFER, IDDCX_ADAPTER, IDDCX_MONITOR,
    IDARG_IN_RELEASEANDACQUIREBUFFER2, IDARG_OUT_QUERY_HWCURSOR, IDARG_OUT_RELEASEANDACQUIREBUFFER,
    IDARG_OUT_RELEASEANDACQUIREBUFFER2, IDDCX_ADAPTER, IDDCX_MONITOR,
    IDDCX_SWAPCHAIN, IDD_CX_CLIENT_CONFIG, NTSTATUS, WDFDEVICE, WDFDEVICE_INIT,
};

@@ -236,6 +237,30 @@ pub unsafe fn IddCxSwapChainReleaseAndAcquireBuffer(
    )
}

/// IddCx 1.10 HDR variant — required once the adapter sets `CAN_PROCESS_FP16`. Provides per-frame
/// `IDDCX_METADATA2` (surface colour space, HDR metadata, SDR white level).
///
/// # Safety
/// None. User is responsible for safety.
#[rustfmt::skip]
pub unsafe fn IddCxSwapChainReleaseAndAcquireBuffer2(
    // in
    SwapChainObject: IDDCX_SWAPCHAIN,
    // in
    pInArgs: &mut IDARG_IN_RELEASEANDACQUIREBUFFER2,
    // out
    pOutArgs: &mut IDARG_OUT_RELEASEANDACQUIREBUFFER2
) -> Result<NTSTATUS, IddCxError> {
    IddCxCall!(
        true,
        IddCxSwapChainReleaseAndAcquireBuffer2(
            SwapChainObject,
            pInArgs,
            pOutArgs
        )
    )
}

/// # Safety
///
/// None. User is responsible for safety.
