VRR over punktfunk/1 — design
Status: DESIGN — investigation complete (2026-07-03), nothing implemented. Key architectural decision recorded here: no VRR virtual display is needed — client-side VRR is driven purely by presentation cadence, so the host's virtual display Hz becomes a sampling grid decoupled from the client panel. punktfunk/1-native only; GameStream/Moonlight stays fixed-cadence (stock clients).
Goal: end-to-end variable refresh — the game's real frame pacing reaches the client's VRR panel instead of being resampled onto a fixed grid twice (host pacer, client vsync). Gains are both latency (the fixed-cadence quantization at capture and present is now the dominant remaining latency term — the Windows-client loopback p50 of ~18 ms is dominated by the 60 Hz virtual-display cadence while the wire is sub-millisecond) and smoothness (an 85 fps game on a 120 Hz grid presents as an irregular 8.3/16.7 ms alternation — judder baked in at the source that no client can undo).
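The 85-on-120 judder can be reproduced with a few lines of integer arithmetic. The sketch below assumes a simplified model in which frame i completes at i/fps and is shown at the next grid tick; the function name is illustrative and not part of the codebase.

```rust
/// Number of grid ticks between consecutive presented frames when a game
/// running at `fps` is sampled onto a fixed `grid_hz` grid.
/// Simplified model (assumption): frame i completes at i/fps seconds and is
/// shown at the first grid tick at or after that instant.
fn grid_tick_deltas(fps: u64, grid_hz: u64, frames: u64) -> Vec<u64> {
    let ticks: Vec<u64> = (0..frames)
        .map(|i| (i * grid_hz + fps - 1) / fps) // ceil(i * grid_hz / fps)
        .collect();
    ticks.windows(2).map(|w| w[1] - w[0]).collect()
}
```

For 85 fps on a 120 Hz grid the deltas alternate irregularly between one and two ticks (8.3 ms and 16.7 ms), while a 60 fps game on the same grid yields a constant two ticks.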
The core decision: skip the VRR virtual display
A VRR panel is not "driven at a framerate" by any API — it follows the presentation cadence. If the client presents each frame on arrival, the panel refreshes at the stream's cadence, whatever it is. So client VRR needs frame-driven host emission + present-on-arrival clients, and no VRR anywhere on the host display stack. This sidesteps two otherwise-hard blockers entirely:
- IddCx (Windows host) has no VRR support at all (through 1.10, which pf-vdisplay is built against): no VRR DDI, no VRR in the virtual EDID, and GPU control panels don't even list indirect displays as VRR-capable. Not fixable by us; the community IDD projects' "can we fake it" issue is open and unanswered.
- KWin/Mutter/wlroots virtual outputs are fixed-mode (KWin hardcodes 60 Hz + out-of-band kscreen-doctor custom modes, vdisplay/linux/kwin.rs:101,138; Mutter defaults 60 with the PUNKTFUNK_MUTTER_VIRTUAL_REFRESH opt-in, mutter.rs:244-258; Sway takes one --custom WxH@Hz, wlroots.rs:93).
What a true-VRR virtual display would add is confined to the source end, exactly two residuals:
(1) sampling quantization → pacing wobble — the game's output is sampled on the virtual
display's fixed grid, and the game's true present times never reach the wire (our pts_ns is
stamped at capture, already grid-aligned); (2) up to one virtual-vblank of host latency (a
frame completed just after a composite waits for the next grid tick). Both scale with the grid:
at 240 Hz the grid is 4.2 ms — pacing error ~±2 ms (below the ~4–5 ms perceptibility threshold)
and ≤4.2 ms added latency. The high-Hz machinery already exists on every backend, and the Linux
compositors composite on damage, so a 240 Hz virtual mode costs GPU work proportional to the game's
actual fps, not 240 composites/s.
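The grid arithmetic behind both residuals is small enough to write down. This is a sketch with illustrative helper names, assuming the pacing error is expressed as plus or minus half a grid period around the true present time.

```rust
/// Period of a fixed sampling grid, in milliseconds.
fn grid_period_ms(grid_hz: f64) -> f64 {
    1000.0 / grid_hz
}

/// Residual (2): a frame that just missed a tick waits at most one period.
fn max_added_latency_ms(grid_hz: f64) -> f64 {
    grid_period_ms(grid_hz)
}

/// Residual (1): pacing error bound, as +/- half a period (assumed convention).
fn pacing_error_ms(grid_hz: f64) -> f64 {
    grid_period_ms(grid_hz) / 2.0
}
```

At 240 Hz this gives a period of about 4.17 ms and an error bound of about 2.08 ms; at 60 Hz the same figures are about 16.7 ms and 8.3 ms.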
Negotiation semantics shift: today the client requests its native WxH@Hz and the mode's Hz means "the cadence you'll receive." Under client-VRR the virtual display Hz is the sampling grid (pick it high), while the client's panel VRR range governs presentation only.
Where the pieces stand (investigation findings)
Wire — already ~90 % ready
- Every packet carries a wall-clock capture timestamp: PacketHeader.pts_ns is the first field of the 40-byte header (punktfunk-core/src/packet.rs:52-68), threaded to Frame.pts_ns and ABI PunktfunkFrame.pts_ns. Epoch = ns since UNIX epoch, stamped host-side via SystemTime::now() (punktfunk1.rs:100-105). Plus a monotonic per-AU frame_index.
- The clock-skew offset is ABI-exposed: punktfunk_connection_clock_offset_ns (abi.rs:2121-2137; NTP-style min-RTT estimate, quic.rs:417-426). A client can convert host capture time to its own clock, which is the raw material for a timestamp-scheduled presenter and something Moonlight fundamentally lacks (its "frame pacing" guesses; we have a measured offset).
- FEC, keepalives, and reorder are rate-agnostic: FEC is self-describing per packet and adapts on loss; QUIC keepalive is 4 s/8 s; the reassembler window is frame-count-based (REORDER_WINDOW = 16, packet.rs:47). Nothing in the data plane divides by fps.
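For readers unfamiliar with the technique, an NTP-style min-RTT offset estimate looks roughly like the sketch below. It is not the code in quic.rs: the Sample struct, field names, and the sign convention (offset is the value added to a host timestamp to land on the client clock) are assumptions made for illustration.

```rust
/// One ping/pong exchange; all values are nanoseconds on the named clock.
#[derive(Clone, Copy)]
struct Sample {
    t0_client_send: i64,
    t1_host_recv: i64,
    t2_host_send: i64,
    t3_client_recv: i64,
}

impl Sample {
    /// Round-trip time with host processing time removed.
    fn rtt_ns(&self) -> i64 {
        (self.t3_client_recv - self.t0_client_send) - (self.t2_host_send - self.t1_host_recv)
    }

    /// Offset to ADD to a host timestamp to get client time (assumed sign).
    /// Exact when the forward and return paths have equal delay.
    fn offset_ns(&self) -> i64 {
        ((self.t0_client_send - self.t1_host_recv) + (self.t3_client_recv - self.t2_host_send)) / 2
    }
}

/// Trust the sample with the smallest RTT: it had the least queueing, so its
/// path asymmetry (and therefore its offset error) is the most tightly bounded.
fn min_rtt_offset_ns(samples: &[Sample]) -> Option<i64> {
    samples.iter().min_by_key(|s| s.rtt_ns()).map(|s| s.offset_ns())
}
```

The min-RTT filter matters because the offset error of any single exchange is bounded by half its RTT; a congested sample with an asymmetric return path is simply outvoted by a cleaner one.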
Missing (all small): a FLAG_REPEAT (or FLAG_NEW) bit in the already-end-to-end
PacketHeader.user_flags (free bits above FLAG_PIC/EOF/SOF/PROBE, packet.rs:30-36 — no header
size change); VIDEO_CAP_VRR = 0x08 in video_caps (quic.rs:107-116) mirrored to the ABI
constant with the lockstep assert (abi.rs:856-864); an append-only Hello/Welcome trailing field
for the client's panel refresh range (the same trailing-byte back-compat pattern used 7×). One real
caveat: Reconfigure/Reconfigured are fixed-length, not tail-extensible (decode requires
exact lengths, quic.rs:1029,1057) — a mid-stream VRR toggle/range change needs a new typed
control message, not a field append.
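A minimal sketch of the three additions, to make the back-compat behaviour concrete. Only VIDEO_CAP_VRR = 0x08 comes from the text above. The FLAG_REPEAT bit position, the u8 width of the flag and capability fields, and the PanelRange layout (two little-endian u16 values) are hypothetical placeholders, not the actual wire format.

```rust
/// Capability bit proposed in the text.
const VIDEO_CAP_VRR: u8 = 0x08;
/// HYPOTHETICAL bit position: the source only says free bits exist above the
/// current flags. Packets with this bit set would carry a synthetic repeat.
const FLAG_REPEAT: u8 = 0x80;

/// HYPOTHETICAL trailing field: the client's panel refresh range in Hz.
#[derive(Debug, PartialEq)]
struct PanelRange {
    min_hz: u16,
    max_hz: u16,
}

fn encode_panel_range(r: &PanelRange, out: &mut Vec<u8>) {
    out.extend_from_slice(&r.min_hz.to_le_bytes());
    out.extend_from_slice(&r.max_hz.to_le_bytes());
}

/// Decode from the bytes left over after all previously known Hello fields.
/// An older peer sends nothing here, so a short tail means "no VRR range";
/// extra bytes after the field are ignored so later appends stay compatible.
fn decode_panel_range(tail: &[u8]) -> Option<PanelRange> {
    if tail.len() < 4 {
        return None;
    }
    Some(PanelRange {
        min_hz: u16::from_le_bytes([tail[0], tail[1]]),
        max_hz: u16::from_le_bytes([tail[2], tail[3]]),
    })
}

/// VRR is active only when both sides advertise the capability.
fn vrr_negotiated(host_caps: u8, client_caps: u8) -> bool {
    (host_caps & client_caps & VIDEO_CAP_VRR) != 0
}

/// True when a packet's user flags mark it as a synthetic repeat.
fn is_repeat(user_flags: u8) -> bool {
    (user_flags & FLAG_REPEAT) != 0
}
```

This tolerant decode is exactly what a fixed-length message such as Reconfigure cannot offer, which is why a mid-stream change needs its own message type.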
Host — fixed cadence is a consumer-loop choice, not a capture limitation
Every capture producer is already push/event-driven: PipeWire delivers a buffer per composite on
all Linux backends (damage-driven on kwin/mutter/wlroots — a static desktop produces nothing;
gamescope pushes per output frame at its -r rate); the pf-vdisplay ring publishes one frame per
DWM present and signals a frame-ready event, returning E_PENDING when DWM composed nothing
(swap_chain_processor.rs:306-333). The fixed cadence is imposed entirely by the encode loops: the
next += 1/effective_hz pacer (punktfunk1.rs:3336,3398-3401,3606; GameStream analogue
gamestream/stream.rs:805-808) re-samples via try_latest() and re-encodes the last frame as a
synthetic repeat when nothing new arrived (punktfunk1.rs:3169-3179) — repeats go on the wire
indistinguishable from new frames (the repeat bool is host-internal stats only).
Smallest cadence change: block on the existing next_frame() (Linux recv_timeout, IDD
WaitForSingleObject on the frame-ready event) and submit one encode per delivered frame, keeping
an idle-timeout repeat so a damage-idle desktop still emits keepalive frames. The wire PTS is
already wall-clock, so timestamps survive unchanged.
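The shape of that loop, sketched with a std mpsc channel standing in for the capture producer. The Emit enum, next_emit, and run are illustrative names; the real loop would block on the backend's next_frame() rather than a channel.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Emit<F> {
    /// A freshly captured frame arrived.
    New(F),
    /// Nothing arrived within the idle timeout; re-send the last frame.
    Repeat,
}

/// Block for the next captured frame. Returns None once the producer is gone.
fn next_emit<F>(rx: &Receiver<F>, idle: Duration) -> Option<Emit<F>> {
    match rx.recv_timeout(idle) {
        Ok(frame) => Some(Emit::New(frame)),
        Err(RecvTimeoutError::Timeout) => Some(Emit::Repeat),
        Err(RecvTimeoutError::Disconnected) => None,
    }
}

/// One encode per delivered frame, plus idle keepalive repeats.
/// `encode` receives the frame and whether it is a synthetic repeat.
fn run<F>(rx: Receiver<F>, idle: Duration, mut encode: impl FnMut(&F, bool)) {
    let mut last: Option<F> = None;
    while let Some(event) = next_emit(&rx, idle) {
        match event {
            Emit::New(frame) => {
                encode(&frame, false);
                last = Some(frame);
            }
            Emit::Repeat => {
                // Before the first frame there is nothing to repeat.
                if let Some(frame) = &last {
                    encode(frame, true);
                }
            }
        }
    }
}
```

Compared with the fixed pacer, the encode rate here follows the producer, and the repeat path only runs when the source is actually idle.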
The load-bearing fixed-fps assumption is rate control: both encoder paths run CBR with a
~1-frame VBV sized bitrate/fps and feed frame_idx as the encoder PTS
(encode/linux/mod.rs:280-297 — time_base(1/fps), VBV bitrate/fps × PUNKTFUNK_VBV_FRAMES
default 1; encode/windows/nvenc.rs:663-672,787-788 — frameRateNum = fps, VBV bitrate/fps;
PTS = frame_idx at nvenc.rs:1189-1206 / mod.rs:167,535). Variable intervals won't corrupt
ordering, but a game at 85 fps in a "240 Hz-grid" session drastically undershoots the bitrate
target and bursts fight the 1-frame VBV. VFR needs: feed the real capture PTS to the encoder
timeline, and either budget frameRate at the expected rate with a laxer VBV or move that path
to VBR/CQ. This is the one real technical knot.
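The undershoot is easy to quantify under a deliberately simplified model: assume the encoder spends exactly one per-frame budget (bitrate / configured fps) on each frame it is given. Real CBR rate control is more elaborate, and the function names below are illustrative only.

```rust
/// VBV size in bits when sized as (bitrate / configured fps) * vbv_frames.
fn vbv_bits(bitrate_bps: u64, configured_fps: u64, vbv_frames: u64) -> u64 {
    bitrate_bps * vbv_frames / configured_fps
}

/// Bitrate actually delivered if each frame gets one per-frame budget but
/// frames only arrive at `actual_fps` (simplified model, see lead-in).
fn delivered_bps(bitrate_bps: u64, configured_fps: u64, actual_fps: u64) -> u64 {
    bitrate_bps * actual_fps / configured_fps
}
```

With a 24 Mbit/s target and the encoder configured for a 240 Hz grid, each frame is budgeted 100 kbit, so an 85 fps game delivers roughly 8.5 Mbit/s, about 35 % of the target.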
Clients — all four are vsync-locked newest-wins today
No client has any tearing/VRR/present-immediate path; clock_offset_ns is used only for the
latency HUD. Queue depth is 1–2 slots newest-wins everywhere; no de-jitter buffer anywhere.
| Client | Today | Present-on-arrival path |
|---|---|---|
| Android | releaseOutputBuffer(render=true) immediately on the newest drained buffer (native/src/decode.rs:274-334); setFrameRate fixed hint (decode.rs:100) | Closest: present already arrival-driven; switch to the frame-rate change-strategy / seamless APIs so a VRR panel follows |
| Linux | set_paintable on frame arrival; GTK/compositor frame clock scans out (ui_stream.rs:475-588) | Arrival side done; needs compositor VRR (GNOME/KDE enable VRR for fullscreen apps; the fullscreen GtkGraphicsOffload dmabuf direct-scanout path is exactly the eligible case) |
| Apple | Main-runloop CADisplayLink at fixed display/stream cadence + 1-slot ReadyRing (SessionPresenter.swift:69-76, Stage2Pipeline.swift:15-37); macOS displaySyncEnabled=false is not tearing: WindowServer still composites at vsync (MetalVideoPresenter.swift:193-200) | iOS/iPadOS ProMotion: wide CAFrameRateRange + drive render from the decode callback instead of the link. macOS: WindowServer-limited (Moonlight reports VRR-follows-stream fullscreen only) |
| Windows | Render thread waits the swapchain latency waitable (DWM vblank cadence) then Present(1); no ALLOW_TEARING anywhere (render.rs:157-225, present.rs:161-173,540) | Hardest: the composition SwapChainPanel swapchain can't tear/independent-flip. Plausible route: arrival-driven presents through DWM's windowed-VRR (windowed G-Sync/FreeSync: DWM composes on demand, panel follows); needs on-glass validation, else a fullscreen HWND swapchain mode |
Client pacing policy: scheduled present, not raw arrival
Raw present-on-arrival replays network+encode jitter onto the panel. Better: present at
pts_ns + clock_offset_ns + D for a small constant D — the shared clock absorbs jitter and
reproduces the host-side cadence exactly (still grid-quantized at the source; see residual (1)).
D is a smoothness-vs-latency knob; on LAN it can be near zero. All the data for this is already
on the wire today.
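The scheduling rule reduces to two small functions. This sketch takes the formula above at face value, so it assumes clock_offset_ns is the signed value that, added to a host timestamp, yields client time. The function names and the d_ns parameter are illustrative.

```rust
use std::time::Duration;

/// Target present time on the client clock, in ns since the UNIX epoch.
/// Assumes `clock_offset_ns` converts host time to client time by addition.
fn present_deadline_ns(pts_ns: u64, clock_offset_ns: i64, d_ns: u64) -> u64 {
    let t = pts_ns as i128 + clock_offset_ns as i128 + d_ns as i128;
    t.clamp(0, u64::MAX as i128) as u64
}

/// Time to wait before presenting. Zero means the frame is already late,
/// so it should be presented immediately.
fn wait_before_present(now_ns: u64, deadline_ns: u64) -> Duration {
    Duration::from_nanos(deadline_ns.saturating_sub(now_ns))
}
```

Because the deadline depends only on pts_ns, the offset, and D, two frames captured 11.76 ms apart are scheduled 11.76 ms apart regardless of how unevenly they arrived, provided each arrives before its deadline.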
Staging
- Stage A: client-only, no protocol change. Timestamp-scheduled / present-on-arrival on VRR-capable displays. Order: Android + Linux (architecturally ready) → iOS ProMotion → Windows (DWM windowed-VRR validation) → macOS fullscreen. Biggest single latency win: removes avg ½ / worst 1 client refresh (~8/16.7 ms at 60 Hz, halved at 120).
- Stage B: host, native path only. Frame-driven consumer loop + idle-repeat keepalive; real-PTS encoder timeline + VFR-tolerant rate control; FLAG_REPEAT on the wire; VIDEO_CAP_VRR + panel-range negotiation; grid-Hz mode semantics. Kills the capture-side quantization down to the grid and stops burning encode on synthetic repeats.
- Stage C: optional gamescope experiment. gamescope has --adaptive-sync and it works even nested per upstream #1694; we don't pass it in the headless spawn (vdisplay/linux/gamescope.rs:975-980), and whether the headless backend honors it is unverified (untestable until the dev VM's GPU passthrough returns). If it works, it removes even the sampling grid on the path that matters most for gaming, at near-zero implementation cost. An optimization, not the architecture. KWin/Mutter/wlroots/IddCx true VRR: upstream-blocked, do not pursue.
Open questions / risks
- VFR rate control per encoder: exact NVENC/VAAPI/AMF-QSV recipe (real-timestamp time_base vs max-rate + enlarged VBV vs VBR/CQ); interaction with the 1-frame-VBV latency property we rely on. The main Stage-B risk item.
- Does gamescope headless honor --adaptive-sync? (Stage C gate; needs the GPU back.)
- DWM windowed VRR with a composition swapchain: does arrival-cadence presenting through the XAML SwapChainPanel actually drive a G-Sync/FreeSync panel variably? On-glass validation gates the Windows-client stage-A entry.
- Panel VRR floor / LFC: the idle-keepalive repeat cadence sets the stream's minimum rate; if it sits below a panel's ~48 Hz floor the client compositor/driver's LFC handles doubling. Verify this, and don't park the keepalive interval right at a floor boundary.
- Android: seamless (CHANGE_FRAME_RATE_ONLY_IF_SEAMLESS) vs non-seamless switch strategy, and real-device VRR panel coverage.
- Hello semantics: how a VRR-capable client picks the grid Hz to request (host advertises its max grid? client just asks 240 and the host clamps like today's mode ladder?).
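To make the LFC question concrete, the arithmetic a driver is generally understood to perform is sketched below. Real LFC lives in the display driver or compositor and its exact policy is vendor-specific; this hypothetical helper only shows when an integer repeat factor exists for a given panel range.

```rust
/// Smallest integer repeat factor that lifts `stream_hz` to at least the
/// panel's VRR floor, or None when even that multiple exceeds the ceiling
/// (the panel's range is too narrow to compensate for this stream rate).
fn lfc_multiplier(stream_hz: f64, floor_hz: f64, ceil_hz: f64) -> Option<u32> {
    if stream_hz <= 0.0 {
        return None;
    }
    let k = (floor_hz / stream_hz).ceil().max(1.0);
    if stream_hz * k <= ceil_hz {
        Some(k as u32)
    } else {
        None
    }
}
```

On a 48 to 144 Hz panel a 30 Hz keepalive cadence is doubled to 60 Hz, whereas on a narrow 48 to 60 Hz panel a 40 Hz stream has no valid multiple, which is the kind of boundary case the verification should cover.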
External evidence (2026-07-03)
- gamescope --adaptive-sync works in nested mode: ValveSoftware/gamescope#1694
- IddCx has no VRR path; community "can we fake it" open/unanswered: Virtual-Display-Driver#24, IddCx DDI index
- Client VRR panels do follow Moonlight's stream cadence in practice (and it's messy; our shared clock is the differentiator): moonlight-qt#1545, macOS fullscreen-only moonlight-qt#1509
- Mutter RecordVirtual derives refresh from PipeWire; VRR only on real monitors: mutter!1154