VRR over punktfunk/1 — design

Status: DESIGN — investigation complete (2026-07-03), nothing implemented. Key architectural decision recorded here: no VRR virtual display is needed — client-side VRR is driven purely by presentation cadence, so the host's virtual display Hz becomes a sampling grid decoupled from the client panel. punktfunk/1-native only; GameStream/Moonlight stays fixed-cadence (stock clients).

Goal: end-to-end variable refresh — the game's real frame pacing reaches the client's VRR panel instead of being resampled onto a fixed grid twice (host pacer, client vsync). Gains are both latency (the fixed-cadence quantization at capture and present is now the dominant remaining latency term — the Windows-client loopback p50 of ~18 ms is dominated by the 60 Hz virtual-display cadence while the wire is sub-millisecond) and smoothness (an 85 fps game on a 120 Hz grid presents as an irregular 8.3/16.7 ms alternation — judder baked in at the source that no client can undo).
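The 85-on-120 judder pattern can be reproduced with a few lines of integer arithmetic. The sketch below is illustrative only: `presented_intervals_us` is a hypothetical helper, not project code, and it assumes each game frame becomes visible at the first grid tick at or after its completion time.

```rust
/// Intervals (in microseconds) between successive presented frames when a game
/// running at `game_fps` is sampled onto a fixed `grid_hz` grid.
/// Assumption: a frame is shown at the first grid tick at or after it completes.
fn presented_intervals_us(game_fps: u64, grid_hz: u64, frames: u64) -> Vec<u64> {
    // Use time units of 1/(game_fps * grid_hz) seconds so everything stays integral:
    // frame k completes at k * grid_hz units, grid tick j falls at j * game_fps units.
    let ticks: Vec<u64> = (0..frames)
        .map(|k| {
            let done = k * grid_hz;
            // Ceiling division: index of the first grid tick at or after `done`.
            (done + game_fps - 1) / game_fps
        })
        .collect();
    // Convert the tick-index deltas back to microseconds on the grid.
    ticks
        .windows(2)
        .map(|w| (w[1] - w[0]) * 1_000_000 / grid_hz)
        .collect()
}
```

Calling `presented_intervals_us(85, 120, 18)` yields a mix of one-tick (8.3 ms) and two-tick (16.7 ms) intervals, even though the game itself paces every frame at a steady 11.8 ms.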

The core decision: skip the VRR virtual display

A VRR panel is not "driven at a framerate" by any API — it follows the presentation cadence. If the client presents each frame on arrival, the panel refreshes at the stream's cadence, whatever it is. So client VRR needs frame-driven host emission + present-on-arrival clients, and no VRR anywhere on the host display stack. This sidesteps two otherwise-hard blockers entirely:

  • IddCx (Windows host) has no VRR support at all (through 1.10, which pf-vdisplay is built against): no VRR DDI, no VRR in the virtual EDID, and GPU control panels don't even list indirect displays as VRR-capable. Not fixable by us; the community IDD projects' "can we fake it" issue is open and unanswered.
  • KWin/Mutter/wlroots virtual outputs are fixed-mode (KWin hardcodes 60 Hz + out-of-band kscreen-doctor custom modes, vdisplay/linux/kwin.rs:101,138; Mutter defaults 60 with the PUNKTFUNK_MUTTER_VIRTUAL_REFRESH opt-in, mutter.rs:244-258; Sway takes one --custom WxH@Hz, wlroots.rs:93).

What a true-VRR virtual display would add is confined to the source end, in exactly two residuals: (1) sampling quantization, which shows up as pacing wobble: the game's output is sampled on the virtual display's fixed grid, and the game's true present times never reach the wire (our pts_ns is stamped at capture, already grid-aligned); (2) up to one virtual-vblank of host latency (a frame completed just after a composite waits for the next grid tick). Both scale with the grid: at 240 Hz the grid period is 4.2 ms, giving a pacing error of about ±2 ms (below the ~4-5 ms perceptibility threshold) and at most 4.2 ms of added latency. The high-Hz machinery already exists on every backend, and the Linux compositors composite on damage, so a 240 Hz virtual mode costs GPU work proportional to the game's actual fps, not 240 composites/s.
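Both residuals follow directly from the grid period. A minimal sketch of that arithmetic, using hypothetical names (`GridResiduals`, `grid_residuals`) and taking the plan's own bounds as given (pacing error up to half a period either way, added latency up to one period):

```rust
use std::time::Duration;

/// Worst-case source-side residuals for a fixed sampling grid (hypothetical type).
struct GridResiduals {
    period: Duration,
    /// Residual (1): pacing error, quoted in the plan as roughly ± half a period.
    max_pacing_error: Duration,
    /// Residual (2): a frame that just missed a composite waits one full period.
    max_added_latency: Duration,
}

fn grid_residuals(grid_hz: u32) -> GridResiduals {
    let period = Duration::from_secs(1) / grid_hz;
    GridResiduals {
        period,
        max_pacing_error: period / 2,
        max_added_latency: period,
    }
}
```

For `grid_residuals(240)` the period is about 4.17 ms and the pacing bound about 2.08 ms, matching the figures above; at 60 Hz the same bounds grow to 16.7 ms and 8.3 ms.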

Negotiation semantics shift: today the client requests its native WxH@Hz and the mode's Hz means "the cadence you'll receive." Under client-VRR the virtual display Hz is the sampling grid (pick it high), while the client's panel VRR range governs presentation only.

Where the pieces stand (investigation findings)

Wire — already ~90 % ready

  • Every packet carries a wall-clock capture timestamp: PacketHeader.pts_ns is the first field of the 40-byte header (punktfunk-core/src/packet.rs:52-68), threaded to Frame.pts_ns and ABI PunktfunkFrame.pts_ns. Epoch = ns since UNIX epoch, stamped host-side via SystemTime::now() (punktfunk1.rs:100-105). Plus a monotonic per-AU frame_index.
  • The clock-skew offset is ABI-exposed: punktfunk_connection_clock_offset_ns (abi.rs:2121-2137; NTP-style min-RTT estimate, quic.rs:417-426). A client can convert host capture time to its own clock — the raw material for a timestamp-scheduled presenter, and something Moonlight fundamentally lacks (its "frame pacing" guesses; we have a measured offset).
  • FEC, keepalives, and reorder are rate-agnostic: FEC is self-describing per packet and adapts on loss; QUIC keepalive is 4 s/8 s; the reassembler window is frame-count-based (REORDER_WINDOW = 16, packet.rs:47). Nothing in the data plane divides by fps.

Missing (all small): a FLAG_REPEAT (or FLAG_NEW) bit in the already-end-to-end PacketHeader.user_flags (free bits above FLAG_PIC/EOF/SOF/PROBE, packet.rs:30-36 — no header size change); VIDEO_CAP_VRR = 0x08 in video_caps (quic.rs:107-116) mirrored to the ABI constant with the lockstep assert (abi.rs:856-864); an append-only Hello/Welcome trailing field for the client's panel refresh range (the same trailing-byte back-compat pattern used 7×). One real caveat: Reconfigure/Reconfigured are fixed-length, not tail-extensible (decode requires exact lengths, quic.rs:1029,1057) — a mid-stream VRR toggle/range change needs a new typed control message, not a field append.
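The three wire additions could look roughly like the sketch below. Only `VIDEO_CAP_VRR = 0x08` comes from this plan; the `FLAG_REPEAT` bit position, the `u8` field widths, the `PanelRange` type and its 4-byte little-endian layout are all assumptions made for illustration.

```rust
/// From the plan: the proposed capability bit in `video_caps`.
const VIDEO_CAP_VRR: u8 = 0x08;
/// HYPOTHETICAL bit position; the plan only states that free `user_flags` bits exist.
const FLAG_REPEAT: u8 = 0x10;

/// HYPOTHETICAL trailing Hello field: the client's panel VRR range in Hz.
#[derive(Debug, PartialEq, Clone, Copy)]
struct PanelRange {
    min_hz: u16,
    max_hz: u16,
}

/// Decode the optional trailing field. `tail` is whatever bytes remain after the
/// fields an older peer already understands; an empty or short tail means the
/// sender predates the field, so the session stays fixed-cadence.
fn decode_panel_range(tail: &[u8]) -> Option<PanelRange> {
    if tail.len() < 4 {
        return None;
    }
    let min_hz = u16::from_le_bytes([tail[0], tail[1]]);
    let max_hz = u16::from_le_bytes([tail[2], tail[3]]);
    // Reject nonsensical ranges instead of trusting the peer.
    (min_hz > 0 && min_hz <= max_hz).then_some(PanelRange { min_hz, max_hz })
}

/// VRR is on only if both sides advertise the cap and the client sent a valid range.
fn vrr_negotiated(client_caps: u8, host_caps: u8, range: Option<PanelRange>) -> bool {
    (client_caps & host_caps & VIDEO_CAP_VRR) != 0 && range.is_some()
}

/// True when the packet carries a synthetic repeat rather than a new frame.
fn is_repeat(user_flags: u8) -> bool {
    (user_flags & FLAG_REPEAT) != 0
}
```

Treating a missing tail as "no VRR" keeps old clients working unchanged, which is the point of the append-only pattern.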

Host — fixed cadence is a consumer-loop choice, not a capture limitation

Every capture producer is already push/event-driven: PipeWire delivers a buffer per composite on all Linux backends (damage-driven on kwin/mutter/wlroots — a static desktop produces nothing; gamescope pushes per output frame at its -r rate); the pf-vdisplay ring publishes one frame per DWM present and signals a frame-ready event, returning E_PENDING when DWM composed nothing (swap_chain_processor.rs:306-333). The fixed cadence is imposed entirely by the encode loops: the next += 1/effective_hz pacer (punktfunk1.rs:3336,3398-3401,3606; GameStream analogue gamestream/stream.rs:805-808) re-samples via try_latest() and re-encodes the last frame as a synthetic repeat when nothing new arrived (punktfunk1.rs:3169-3179) — repeats go on the wire indistinguishable from new frames (the repeat bool is host-internal stats only).

Smallest cadence change: block on the existing next_frame() (Linux recv_timeout, IDD WaitForSingleObject on the frame-ready event) and submit one encode per delivered frame, keeping an idle-timeout repeat so a damage-idle desktop still emits keepalive frames. The wire PTS is already wall-clock, so timestamps survive unchanged.
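A minimal sketch of that loop shape, using a standard-library channel as a stand-in for the capture source. `Frame`, `Emit`, and `run_frame_driven` are hypothetical names; the real loop would block on the existing next_frame() and hand frames to the encoder.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Stand-in for a captured frame (hypothetical; the real type carries pixel data).
#[derive(Debug, Clone, PartialEq)]
struct Frame {
    pts_ns: u64,
}

/// What the loop hands to the encoder: a fresh frame or an idle keepalive repeat.
#[derive(Debug, PartialEq)]
enum Emit {
    New(Frame),
    Repeat(Frame),
}

/// Frame-driven consumer: one encode per delivered frame, plus a repeat of the
/// last frame whenever the source has been idle for `idle`.
fn run_frame_driven(rx: &Receiver<Frame>, idle: Duration, mut encode: impl FnMut(Emit)) {
    let mut last: Option<Frame> = None;
    loop {
        match rx.recv_timeout(idle) {
            Ok(frame) => {
                last = Some(frame.clone());
                encode(Emit::New(frame));
            }
            Err(RecvTimeoutError::Timeout) => {
                // Damage-idle source: re-emit the last frame as a keepalive.
                if let Some(f) = &last {
                    encode(Emit::Repeat(f.clone()));
                }
            }
            // Capture source went away: end the session loop.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}
```

The `idle` timeout is the only place a cadence still appears, and it sets the stream's minimum rate rather than its nominal one.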

The load-bearing fixed-fps assumption is rate control: both encoder paths run CBR with a ~1-frame VBV sized bitrate/fps and feed frame_idx as the encoder PTS (encode/linux/mod.rs:280-297 has time_base(1/fps) and VBV = bitrate/fps × PUNKTFUNK_VBV_FRAMES, default 1; encode/windows/nvenc.rs:663-672,787-788 has frameRateNum = fps and VBV = bitrate/fps; PTS = frame_idx at nvenc.rs:1189-1206 / mod.rs:167,535). Variable intervals won't corrupt ordering, but a game at 85 fps in a "240 Hz-grid" session drastically undershoots the bitrate target, and bursts fight the 1-frame VBV. VFR needs two things: feed the real capture PTS to the encoder timeline, and either budget frameRate at the expected rate with a laxer VBV or move that path to VBR/CQ. This is the one real technical knot.
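The undershoot can be put in numbers with a deliberately simplified model. The sketch assumes a CBR encoder that spends exactly bitrate/fps bits on every frame it is given; real encoders are less tidy, and both function names are hypothetical.

```rust
/// VBV size in bits under today's sizing rule: bitrate/fps times the frame count
/// (the frame count corresponds to PUNKTFUNK_VBV_FRAMES, default 1).
fn vbv_bits(bitrate_bps: u64, fps: u64, vbv_frames: u64) -> u64 {
    bitrate_bps / fps * vbv_frames
}

/// Bits per second actually produced when the per-frame budget is computed for
/// `nominal_fps` but only `actual_fps` frames arrive (simplified CBR model).
fn effective_bitrate_bps(bitrate_bps: u64, nominal_fps: u64, actual_fps: u64) -> u64 {
    bitrate_bps / nominal_fps * actual_fps
}
```

With an illustrative 24 Mbit/s target, a session budgeted at 240 fps gives each frame 100 kbit, so an 85 fps game produces only about 8.5 Mbit/s, roughly 35 % of the target, while any burst of closely spaced frames still has to fit a 100 kbit buffer.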

Clients — all four are vsync-locked newest-wins today

No client has any tearing/VRR/present-immediate path; clock_offset_ns is used only for the latency HUD. Queue depth is 1-2 slots, newest-wins, everywhere; there is no de-jitter buffer anywhere.

| Client | Today | Present-on-arrival path |
| --- | --- | --- |
| Android | releaseOutputBuffer(render=true) immediately on the newest drained buffer (native/src/decode.rs:274-334); setFrameRate fixed hint (decode.rs:100) | Closest: present is already arrival-driven; switch to the frame-rate change-strategy / seamless APIs so a VRR panel follows |
| Linux | set_paintable on frame arrival; GTK/compositor frame clock scans out (ui_stream.rs:475-588) | Arrival side done; needs compositor VRR (GNOME/KDE enable VRR for fullscreen apps; the fullscreen GtkGraphicsOffload dmabuf direct-scanout path is exactly the eligible case) |
| Apple | Main-runloop CADisplayLink at fixed display/stream cadence + 1-slot ReadyRing (SessionPresenter.swift:69-76, Stage2Pipeline.swift:15-37); macOS displaySyncEnabled=false is not tearing, WindowServer still composites at vsync (MetalVideoPresenter.swift:193-200) | iOS/iPadOS ProMotion: wide CAFrameRateRange + drive render from the decode callback instead of the link. macOS: WindowServer-limited (Moonlight reports VRR-follows-stream fullscreen only) |
| Windows | Render thread waits the swapchain latency waitable (DWM vblank cadence) then Present(1); no ALLOW_TEARING anywhere (render.rs:157-225, present.rs:161-173,540) | Hardest: the composition SwapChainPanel swapchain can't tear/independent-flip. Plausible route: arrival-driven presents through DWM's windowed-VRR (windowed G-Sync/FreeSync; DWM composes on demand, panel follows); needs on-glass validation, else a fullscreen HWND swapchain mode |

Client pacing policy: scheduled present, not raw arrival

Raw present-on-arrival replays network+encode jitter onto the panel. Better: present at pts_ns + clock_offset_ns + D for a small constant D — the shared clock absorbs jitter and reproduces the host-side cadence exactly (still grid-quantized at the source; see residual (1)). D is a smoothness-vs-latency knob; on LAN it can be near zero. All the data for this is already on the wire today.
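The scheduling rule reduces to a couple of small functions. This is a sketch under stated assumptions: the offset is taken to be the signed value that, added to host time, yields client time (as the formula above implies), all times are nanoseconds on a common epoch, and both function names are hypothetical.

```rust
/// Client-clock deadline for a frame: host capture time shifted into the client
/// clock, plus the constant de-jitter delay D.
fn present_deadline_ns(pts_ns: u64, clock_offset_ns: i64, delay_ns: u64) -> u64 {
    // Widen to i128 so a negative offset cannot wrap the unsigned timestamp.
    let t = pts_ns as i128 + clock_offset_ns as i128 + delay_ns as i128;
    t.clamp(0, u64::MAX as i128) as u64
}

/// How long to hold the frame from `now_ns` (client clock). A late frame gets
/// zero, which degrades to plain present-on-arrival.
fn wait_ns(deadline_ns: u64, now_ns: u64) -> u64 {
    deadline_ns.saturating_sub(now_ns)
}
```

Because the deadline depends only on pts_ns and constants, two frames captured 11.76 ms apart on the host are scheduled 11.76 ms apart on the client regardless of how unevenly they arrived, as long as D covers the jitter.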

Staging

  1. Stage A — client-only, no protocol change. Timestamp-scheduled / present-on-arrival on VRR-capable displays. Order: Android + Linux (architecturally ready) → iOS ProMotion → Windows (DWM windowed-VRR validation) → macOS fullscreen. Biggest single latency win: removes avg ½ / worst 1 client refresh (~8/16.7 ms at 60 Hz, halved at 120).
  2. Stage B — host, native path only. Frame-driven consumer loop + idle-repeat keepalive; real-PTS encoder timeline + VFR-tolerant rate control; FLAG_REPEAT on the wire; VIDEO_CAP_VRR + panel-range negotiation; grid-Hz mode semantics. Kills the capture-side quantization down to the grid and stops burning encode on synthetic repeats.
  3. Stage C — optional gamescope experiment. gamescope has --adaptive-sync and it works even nested per upstream #1694; we don't pass it in the headless spawn (vdisplay/linux/gamescope.rs:975-980), and whether the headless backend honors it is unverified (untestable until the dev VM's GPU passthrough returns). If it works, it removes even the sampling grid on the path that matters most for gaming, at near-zero implementation cost. An optimization, not the architecture. KWin/Mutter/wlroots/IddCx true VRR: upstream-blocked, do not pursue.

Open questions / risks

  • VFR rate control per encoder: exact NVENC/VAAPI/AMF-QSV recipe (real-timestamp time_base vs max-rate + enlarged VBV vs VBR/CQ); interaction with the 1-frame-VBV latency property we rely on. The main Stage-B risk item.
  • Does gamescope headless honor --adaptive-sync? (Stage C gate; needs the GPU back.)
  • DWM windowed VRR with a composition swapchain: does arrival-cadence presenting through the XAML SwapChainPanel actually drive a G-Sync/FreeSync panel variably? On-glass validation gates the Windows-client stage-A entry.
  • Panel VRR floor / LFC: the idle-keepalive repeat cadence sets the stream's minimum rate; if it sits below a panel's ~48 Hz floor the client compositor/driver's LFC handles doubling — verify, and don't park the keepalive interval right at a floor boundary.
  • Android: seamless (CHANGE_FRAME_RATE_ONLY_IF_SEAMLESS) vs non-seamless switch strategy, and real-device VRR panel coverage.
  • Hello semantics: how a VRR-capable client picks the grid Hz to request (host advertises its max grid? client just asks 240 and the host clamps like today's mode ladder?).
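
For the panel-floor question above, the expected LFC behavior can be modeled as picking the smallest whole multiple of the stream rate that lands inside the panel range. This is a generic model, not a description of any particular driver or compositor; `lfc_multiplier` is a hypothetical helper and rates are in millihertz to stay integral.

```rust
/// Smallest repeat count `k` such that `k * stream_mhz` falls inside the panel's
/// VRR range, or `None` if no whole multiple fits. Rates are in millihertz.
fn lfc_multiplier(stream_mhz: u32, floor_mhz: u32, ceil_mhz: u32) -> Option<u32> {
    if stream_mhz == 0 || floor_mhz > ceil_mhz {
        return None;
    }
    // Ceiling division gives the first multiple at or above the floor;
    // a stream already above the floor needs no repetition (k = 1).
    let k = ((floor_mhz + stream_mhz - 1) / stream_mhz).max(1);
    (k.checked_mul(stream_mhz)? <= ceil_mhz).then_some(k)
}
```

A 30 Hz stream on a 48-144 Hz panel doubles cleanly (`Some(2)`), while a 47 Hz stream on a narrow 48-60 Hz panel has no valid multiple (`None`), which is the floor-boundary hazard the keepalive interval should stay clear of.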

External evidence (2026-07-03)