Files
enricobuehler 75627c8afe
apple / swift (push) Failing after 10s
release / apple (push) Failing after 7s
apple / screenshots (push) Has been skipped
audit / cargo-audit (push) Failing after 1m19s
windows-host / package (push) Failing after 2m44s
windows-msix / package (arm64, C:\Users\Public\ffmpeg-arm64, aarch64-pc-windows-msvc, C:\t-a64) (push) Failing after 39s
windows-msix / package (x64, C:\Users\Public\ffmpeg, x86_64-pc-windows-msvc, C:\t) (push) Failing after 39s
windows / build (aarch64-pc-windows-msvc) (push) Failing after 45s
android / android (push) Successful in 5m17s
windows / build (x86_64-pc-windows-msvc) (push) Failing after 45s
ci / web (push) Successful in 57s
ci / docs-site (push) Successful in 56s
ci / rust (push) Successful in 9m19s
ci / bench (push) Successful in 4m40s
decky / build-publish (push) Successful in 26s
deb / build-publish (push) Successful in 2m57s
docker / build-push (., web/Dockerfile, punktfunk-web) (push) Successful in 33s
docker / build-push (--build-arg FEDORA_VERSION=44, ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora44-rpm) (push) Successful in 2m56s
docker / build-push (ci, ci/fedora-rpm.Dockerfile, punktfunk-fedora-rpm) (push) Successful in 2m35s
docker / build-push (ci, ci/rust-ci.Dockerfile, punktfunk-rust-ci) (push) Successful in 2m20s
docker / build-push (docs-site, docs-site/Dockerfile, punktfunk-docs) (push) Successful in 53s
flatpak / build-publish (push) Successful in 4m22s
rpm / build-publish (bazzite, punktfunk-fedora-rpm) (push) Successful in 8m51s
docker / deploy-docs (push) Successful in 21s
rpm / build-publish (fedora-44, punktfunk-fedora44-rpm) (push) Successful in 8m50s
feat(audio): end-to-end 5.1/7.1 surround across the native path + all clients
Adds negotiated 5.1/7.1 surround to the punktfunk/1 protocol and every client
(previously stereo-only):

- core: new shared `audio` layout table (LAYOUT_51/71 + identity multistream
  mapping, canonical wire order FL FR FC LFE RL RR SL SR); Hello/Welcome
  `audio_channels` negotiation via the trailing-byte back-compat pattern (old
  peers fall back to stereo); C-ABI `punktfunk_connect_ex6`,
  `punktfunk_connection_audio_channels`, and in-core multistream decode
  `punktfunk_connection_next_audio_pcm` for embedders without a multistream
  Opus decoder. Real-libopus channel-identity round-trip test.
- host: native audio thread captures + Opus-(multi)stream-encodes at the
  negotiated count (with a cross-session cached-capturer channel-mismatch fix);
  GameStream surround unified onto the safe `opus::MSEncoder`, dropping
  `audiopus_sys` (~4 unsafe blocks) and un-gating Windows GameStream surround;
  WASAPI loopback capture relaxed to 2/6/8 with the correct dwChannelMask.
- clients: Linux (PipeWire), Windows (WASAPI), Android (AAudio) decode via
  `opus::MSDecoder` + render multichannel; Apple decodes in-core to PCM →
  AVAudioEngine with an explicit wire-order channel layout; each gains a
  Stereo/5.1/7.1 setting. `punktfunk-probe --audio-channels N` is the headless
  validator.

Verified on Linux: core/host/linux/probe test suites + the Android Rust
(cargo-ndk) build, clippy -D warnings, and rustfmt all green. Windows/Apple
builds, all on-glass checks, and the live native loopback are pending (CI / a
free box).

Also lands the concurrent in-tree HEVC 4:4:4 host work (PUNKTFUNK_444): it
shares the same touched files (quic.rs, punktfunk1.rs, encode/*, ...) and so
cannot be committed separately from the surround changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 21:11:05 +00:00

491 lines
21 KiB
Swift
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
// Session audio, both directions:
//
// host speaker: a drain thread pulls Opus packets (nextAudio, its own plane in the
// core), decodes via OpusDecoder, and writes PCM into a jitter ring; an
// AVAudioSourceNode pulls from the ring (silence on underrun with re-priming, so a
// network gap costs one dip, not permanent crackle).
//
// mic host: a second AVAudioEngine taps the input device, resamples to 48 kHz
// stereo, slices 20 ms chunks, Opus-encodes, and sendMic()s each packet the host
// feeds them into a virtual PipeWire source.
//
// Devices are chosen by UID ("" = system default: the engine is then never pinned to a
// concrete device and follows default-device changes). Two engines, not one a single
// AVAudioEngine ties input+output to one aggregate clock, separate engines keep
// arbitrary mic/speaker combinations trivial.
import AVFoundation
import os
private let log = Logger(subsystem: "io.unom.punktfunk", category: "audio")
/// SPSC-ish jitter ring (interleaved float, `channels` per frame), drain thread render
/// callback. The unfair lock is held for microseconds; fine at render-callback rates. Priming:
/// reads return silence until enough is buffered (at least `prefill`, and at least one
/// packet more than the device's render quantum large-buffer devices would otherwise
/// chronically out-demand the prefill and oscillate prime dropout re-prime), and an
/// underrun re-primes, concealing jitter as one short dip instead of sustained crackle.
/// All counts stay whole frames (multiples of `channels`), so the interleave can never slip.
final class AudioRing: @unchecked Sendable {
private var buf: [Float]
private var readIdx = 0
private var writeIdx = 0
private var primed = false
private var renderQuantum = 0
private let prefill: Int
private let highWater: Int
private let channels: Int
private let lock = OSAllocatedUnfairLock()
/// `capacity`/`prefill` in samples (interleaved `channels` per frame, both whole frames).
init(capacity: Int, prefill: Int, channels: Int) {
buf = [Float](repeating: 0, count: capacity)
self.prefill = prefill
self.channels = channels
highWater = prefill * 4
}
func write(_ samples: UnsafePointer<Float>, count: Int) {
lock.lock()
defer { lock.unlock() }
let capacity = buf.count
// A single write larger than the whole ring would push readIdx PAST writeIdx below
// (inverting the valid range corruption). It never happens (one decoded packet is far
// under capacity), but guard rather than corrupt.
guard count <= capacity else { return }
if writeIdx + count - readIdx > capacity {
readIdx = writeIdx + count - capacity // overflow: drop oldest
}
for i in 0..<count {
buf[(writeIdx + i) % capacity] = samples[i]
}
writeIdx += count
// Latency clamp: both ends run at 48 kHz, so backlog from a network stall (or
// creeping host-vs-DAC clock skew) never drains on its own without this, one
// 300 ms hiccup leaves audio 300 ms behind video for the rest of the session.
// Shedding down to 2× prefill costs one audible blip instead.
if writeIdx - readIdx > highWater {
readIdx = writeIdx - prefill * 2
}
}
/// Fills `out` completely (silence beyond what's buffered).
func read(into out: UnsafeMutablePointer<Float>, count: Int) {
lock.lock()
defer { lock.unlock() }
renderQuantum = max(renderQuantum, count)
let available = writeIdx - readIdx
if !primed {
// One 5 ms host packet (240 frames × channels) of slack beyond the device's demand.
if available >= max(prefill, renderQuantum + 240 * channels) {
primed = true
} else {
for i in 0..<count { out[i] = 0 }
return
}
}
let n = min(available, count)
let capacity = buf.count
for i in 0..<n {
out[i] = buf[(readIdx + i) % capacity]
}
readIdx += n
if n < count {
for i in n..<count { out[i] = 0 }
primed = false // underrun re-prime before resuming
}
}
}
private final class StopFlag: @unchecked Sendable {
private let lock = NSLock()
private var stopped = false
var isStopped: Bool {
lock.lock()
defer { lock.unlock() }
return stopped
}
func stop() {
lock.lock()
stopped = true
lock.unlock()
}
}
/// Render-block-owned scratch storage: freed exactly when the closure (and thus the
/// last possible render call) is released never racing CoreAudio.
private final class ScratchBuffer {
// 8192 frames × up to 8 channels (7.1) the render block caps `frames` at 8192.
let ptr = UnsafeMutablePointer<Float>.allocate(capacity: 8192 * 8)
deinit { ptr.deallocate() }
}
/// CoreAudio channel layout for the canonical wire order FL FR FC LFE RL RR [SL SR]. nil for
/// stereo (the standard layout is correct). For 5.1/7.1 we list explicit channel labels via
/// `kAudioChannelLayoutTag_UseChannelDescriptions` preset tags (DTS_5_1 etc.) don't reliably
/// match Moonlight's order. NB the 7.1 mapping (verified against the WASAPI 0x63F + SPA orderings):
/// wire idx 4-5 = RL/RR = the WAVE *back* pair LeftSurround/RightSurround; idx 6-7 = SL/SR = the
/// WAVE *side* pair LeftSurroundDirect/RightSurroundDirect. (Using RearSurround* for 6-7 would
/// swap side/back vs the Windows/Linux clients.)
private func wireChannelLayout(channels: Int) -> AVAudioChannelLayout? {
let labels: [AudioChannelLabel]
switch channels {
case 6:
labels = [
kAudioChannelLabel_Left, kAudioChannelLabel_Right, kAudioChannelLabel_Center,
kAudioChannelLabel_LFEScreen, kAudioChannelLabel_LeftSurround,
kAudioChannelLabel_RightSurround,
]
case 8:
labels = [
kAudioChannelLabel_Left, kAudioChannelLabel_Right, kAudioChannelLabel_Center,
kAudioChannelLabel_LFEScreen,
kAudioChannelLabel_LeftSurround, kAudioChannelLabel_RightSurround, // wire RL/RR (back)
kAudioChannelLabel_LeftSurroundDirect, kAudioChannelLabel_RightSurroundDirect, // wire SL/SR (side)
]
default:
return nil
}
let size = MemoryLayout<AudioChannelLayout>.size
+ (labels.count - 1) * MemoryLayout<AudioChannelDescription>.stride
let raw = UnsafeMutableRawPointer.allocate(byteCount: size, alignment: 16)
defer { raw.deallocate() }
let layout = raw.bindMemory(to: AudioChannelLayout.self, capacity: 1)
layout.pointee.mChannelLayoutTag = kAudioChannelLayoutTag_UseChannelDescriptions
layout.pointee.mChannelBitmap = AudioChannelBitmap(rawValue: 0)
layout.pointee.mNumberChannelDescriptions = UInt32(labels.count)
let descs = UnsafeMutableBufferPointer(
start: &layout.pointee.mChannelDescriptions, count: labels.count)
for (i, lbl) in labels.enumerated() {
descs[i] = AudioChannelDescription(
mChannelLabel: lbl, mChannelFlags: AudioChannelFlags(rawValue: 0),
mCoordinates: (0, 0, 0))
}
return AVAudioChannelLayout(layout: layout)
}
public final class SessionAudio {
private let connection: PunktfunkConnection
private let flag = StopFlag()
private let drainDone = DispatchSemaphore(value: 0)
/// Owns the engine handles + drainStarted, paired with `flag`: stop() sets the flag
/// BEFORE taking the engines, every publisher re-checks the flag under this lock
/// after publishing-side work so a startCapture racing stop() (the mic-permission
/// callback arrives whenever the user clicks the prompt) can never leave a hot
/// microphone with no owner.
private let stateLock = NSLock()
private var playbackEngine: AVAudioEngine?
private var captureEngine: AVAudioEngine?
private var drainStarted = false
public init(connection: PunktfunkConnection) {
self.connection = connection
}
/// Backstop for an owner dropping us without stop() unblocks the drain thread
/// (which captures the connection strongly, NOT self) within one poll timeout.
/// Engine teardown still belongs to stop().
deinit {
flag.stop()
}
/// Start playback (and, if enabled+authorized, the mic uplink). Empty UIDs = system
/// default device; on iOS the UIDs are ignored entirely (routes are
/// AVAudioSession-managed). Main thread (engine setup); returns after the engines
/// start the mic may start slightly later if the permission prompt is pending.
public func start(speakerUID: String, micUID: String, micEnabled: Bool) {
#if os(iOS)
// Route + policy live in the session, not per-engine: stereo playback, mic
// capture when enabled, Bluetooth allowed. Failure is non-fatal (defaults).
let session = AVAudioSession.sharedInstance()
do {
if micEnabled {
// .defaultToSpeaker: .playAndRecord otherwise routes to the iPhone
// EARPIECE; only affects the built-in route (headphones/BT still win).
try session.setCategory(
.playAndRecord, mode: .default,
options: [.allowBluetoothA2DP, .defaultToSpeaker])
} else {
try session.setCategory(.playback, mode: .default)
}
try session.setActive(true)
} catch {
log.warning("AVAudioSession setup failed: \(error.localizedDescription)")
}
#elseif os(tvOS)
do {
try AVAudioSession.sharedInstance().setCategory(.playback, mode: .default)
try AVAudioSession.sharedInstance().setActive(true)
} catch {
log.warning("AVAudioSession setup failed: \(error.localizedDescription)")
}
#endif
startPlayback(speakerUID: speakerUID)
#if os(tvOS)
// No app-accessible microphone input on tvOS playback only.
#else
guard micEnabled else { return }
switch AVCaptureDevice.authorizationStatus(for: .audio) {
case .authorized:
startCapture(micUID: micUID)
case .notDetermined:
AVCaptureDevice.requestAccess(for: .audio) { [weak self] granted in
DispatchQueue.main.async {
guard let self, granted, !self.flag.isStopped else { return }
self.startCapture(micUID: micUID)
}
}
default:
log.warning("microphone access denied — mic uplink disabled (System Settings → Privacy)")
}
#endif
}
/// Stop both directions. Safe from any thread; waits the drain thread out ( its
/// poll timeout) so the caller can close the connection right after.
public func stop() {
flag.stop() // before taking the engines see stateLock's comment
stateLock.lock()
let capture = captureEngine
captureEngine = nil
let playback = playbackEngine
playbackEngine = nil
let wasDraining = drainStarted
drainStarted = false
stateLock.unlock()
if let capture {
capture.inputNode.removeTap(onBus: 0)
capture.stop()
}
playback?.stop()
if wasDraining {
_ = drainDone.wait(timeout: .now() + .milliseconds(400))
}
#if !os(macOS)
// Release the session so audio we interrupted (Music, podcasts) gets its
// resume cue.
do {
try AVAudioSession.sharedInstance().setActive(
false, options: .notifyOthersOnDeactivation)
} catch {
log.warning("AVAudioSession deactivation failed: \(error.localizedDescription)")
}
#endif
}
// MARK: - Playback (host speaker)
private func startPlayback(speakerUID: String) {
// Build the playback layout from the host-RESOLVED channel count (never the request):
// 2 = stereo / 6 = 5.1 / 8 = 7.1, canonical wire order FL FR FC LFE RL RR SL SR.
let channels = Int(connection.resolvedAudioChannels)
// 1 s interleaved capacity, ~20 ms prefill (four 5 ms host packets of jitter absorption
// before the first sample plays), both scaled by the channel count.
let ring = AudioRing(
capacity: 48_000 * channels, prefill: 960 * channels, channels: channels)
let engine = AVAudioEngine()
#if os(macOS)
if !speakerUID.isEmpty {
if let dev = AudioDevices.deviceID(forUID: speakerUID),
let unit = engine.outputNode.audioUnit {
if !Self.setDevice(dev, on: unit) {
log.error("could not select speaker \(speakerUID) — using default")
}
} else {
log.warning("speaker \(speakerUID) not present — using default")
}
}
#endif
// Engine-native deinterleaved float; the render block deinterleaves from the ring. Surround
// uses an explicit wire-order channel layout; the mixer downmixes to the output device when
// it has fewer speakers (e.g. an iPhone's stereo built-ins). (Explicit if/else rather than
// map/flatMap so it's correct whether the channelLayout initializer is failable or not.)
var format: AVAudioFormat?
if channels == 2 {
format = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)
} else if let layout = wireChannelLayout(channels: channels) {
format = AVAudioFormat(standardFormatWithSampleRate: 48_000, channelLayout: layout)
}
guard let format else {
log.error("could not build \(channels)-channel audio format — audio disabled")
return
}
let scratch = ScratchBuffer() // block-owned; freed with the closure
let source = AVAudioSourceNode(format: format) { _, _, frameCount, abl -> OSStatus in
let frames = Int(frameCount)
guard frames <= 8192 else { return kAudioUnitErr_TooManyFramesToProcess }
ring.read(into: scratch.ptr, count: frames * channels)
let buffers = UnsafeMutableAudioBufferListPointer(abl)
// Deinterleave the wire-order interleaved ring into the engine's per-channel buses.
if buffers.count >= channels {
for ch in 0..<channels {
if let dst = buffers[ch].mData?.assumingMemoryBound(to: Float.self) {
for f in 0..<frames { dst[f] = scratch.ptr[f * channels + ch] }
}
}
}
return noErr
}
engine.attach(source)
engine.connect(source, to: engine.mainMixerNode, format: format)
engine.prepare()
do {
try engine.start()
} catch {
log.error("playback engine failed to start: \(error.localizedDescription)")
return
}
stateLock.lock()
if flag.isStopped {
stateLock.unlock()
engine.stop() // stop() already ran don't strand a started engine
return
}
playbackEngine = engine
stateLock.unlock()
startDrain(into: ring)
}
private func startDrain(into ring: AudioRing) {
stateLock.lock()
drainStarted = true
stateLock.unlock()
let thread = Thread { [connection, flag, drainDone] in
defer { drainDone.signal() }
// Decode happens IN-CORE (libopus multistream) AudioToolbox's Opus path is
// stereo-only and is handed back as interleaved f32 PCM in wire channel order.
while !flag.isStopped {
let pcm: PunktfunkConnection.AudioPCM?
do {
pcm = try connection.nextAudioPcm(timeoutMs: 100)
} catch {
break // session closed
}
guard let pcm, pcm.frameCount > 0 else { continue }
pcm.samples.withUnsafeBufferPointer { p in
if let base = p.baseAddress {
ring.write(base, count: pcm.frameCount * pcm.channels)
}
}
}
}
thread.name = "punktfunk-audio"
thread.qualityOfService = .userInteractive
thread.start()
}
// MARK: - Mic (mic host)
#if !os(tvOS)
private func startCapture(micUID: String) {
let engine = AVAudioEngine()
let input = engine.inputNode
#if os(macOS)
if !micUID.isEmpty {
if let dev = AudioDevices.deviceID(forUID: micUID), let unit = input.audioUnit {
if !Self.setDevice(dev, on: unit) {
log.error("could not select microphone \(micUID) — using default")
}
} else {
log.warning("microphone \(micUID) not present — using default")
}
}
#endif
let inFormat = input.outputFormat(forBus: 0)
guard inFormat.sampleRate > 0, inFormat.channelCount > 0 else {
log.error("no usable input device — mic uplink disabled")
return
}
guard let encoder = try? OpusEncoder(),
let resampler = AVAudioConverter(from: inFormat, to: encoder.pcmFormat),
let chunk = AVAudioPCMBuffer(
pcmFormat: encoder.pcmFormat, frameCapacity: OpusEncoder.framesPerPacket)
else {
log.error("Opus encoder unavailable — mic uplink disabled")
return
}
// Tap-thread-confined state: resample into `staging`, accumulate in `fifo`,
// slice 960-frame chunks for the encoder.
var fifo: [Float] = []
fifo.reserveCapacity(48_000)
var seq: UInt32 = 0
let connection = connection
let flag = flag
input.installTap(onBus: 0, bufferSize: 2048, format: inFormat) { buffer, _ in
if flag.isStopped { return }
let ratio = 48_000 / inFormat.sampleRate
let outCapacity = AVAudioFrameCount(
(Double(buffer.frameLength) * ratio).rounded(.up) + 64)
guard let staging = AVAudioPCMBuffer(
pcmFormat: encoder.pcmFormat, frameCapacity: outCapacity)
else { return }
var fed = false
var convError: NSError?
let status = resampler.convert(to: staging, error: &convError) { _, outStatus in
if fed {
outStatus.pointee = .noDataNow
return nil
}
fed = true
outStatus.pointee = .haveData
return buffer
}
guard status != .error, let p = staging.floatChannelData?[0] else { return }
fifo.append(contentsOf: UnsafeBufferPointer(
start: p, count: Int(staging.frameLength) * 2))
let samplesPerChunk = Int(OpusEncoder.framesPerPacket) * 2
while fifo.count >= samplesPerChunk {
chunk.frameLength = OpusEncoder.framesPerPacket
fifo.withUnsafeBufferPointer { src in
chunk.floatChannelData![0].update(
from: src.baseAddress!, count: samplesPerChunk)
}
fifo.removeFirst(samplesPerChunk)
guard let packets = try? encoder.encode(chunk) else { continue }
for packet in packets {
connection.sendMic(
packet, seq: seq, ptsNs: DispatchTime.now().uptimeNanoseconds)
seq &+= 1
}
}
}
engine.prepare()
do {
try engine.start()
} catch {
log.error("capture engine failed to start: \(error.localizedDescription)")
input.removeTap(onBus: 0)
return
}
stateLock.lock()
if flag.isStopped {
// stop() ran while we were starting (the permission prompt resolves at the
// user's leisure) tear the engine down ourselves, nobody else owns it now.
stateLock.unlock()
input.removeTap(onBus: 0)
engine.stop()
return
}
captureEngine = engine
stateLock.unlock()
log.info("mic uplink started (\(micUID.isEmpty ? "default input" : micUID))")
}
#endif
#if os(macOS)
private static func setDevice(_ id: AudioDeviceID, on unit: AudioUnit) -> Bool {
var dev = id
return AudioUnitSetProperty(
unit, kAudioOutputUnitProperty_CurrentDevice, kAudioUnitScope_Global, 0,
&dev, UInt32(MemoryLayout<AudioDeviceID>.size)) == noErr
}
#endif
}