iOS Audio Session Management
This guide explains how the LiveSwitch iOS SDK manages audio: the shared VPIO AudioUnit and the AVAudioSession lifecycle. It covers default behaviour, configuration options, CallKit integration, Bluetooth handling, interruption recovery and diagnostics.
Overview
The LiveSwitch iOS SDK uses a centralized FMLiveSwitchCocoaAudioSessionManager (ASM) singleton that owns the single shared Voice Processing I/O (VPIO) AudioUnit for the entire process. All microphone and speaker instances communicate through ASM -- they never create or hold their own AudioUnit.
This centralized design solves several problems that arise when multiple audio components compete for shared system resources:
| Problem | Centralized Solution |
|---|---|
| Two AudioUnits create two competing echo cancellers | One VPIO unit provides a single echo canceller with full context |
| Each unit tries to open its own Bluetooth SCO link | One unit negotiates a single SCO link at one sample rate |
| Multiple remote streams need mixing before output | ASM owns the real-time mixer callback and mixes all sinks |
| Competing AVAudioSession `setCategory`/`setActive` calls | One manager configures the session once; manual mode delegates to CallKit |
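The mixing responsibility in the table above can be pictured with a simplified sketch. This is not the SDK's actual mixer, which runs inside the real-time VPIO render callback; it only illustrates the sum-and-clamp idea of combining several per-participant buffers into one output buffer:

```swift
// Illustrative sketch: mix several per-participant PCM buffers into one
// output buffer. Summing in Int32 then clamping to Int16 avoids
// wrap-around distortion when multiple loud participants overlap.
func mix(sinkBuffers: [[Int16]], frameCount: Int) -> [Int16] {
    var output = [Int16](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Int32 = 0
        for buffer in sinkBuffers where frame < buffer.count {
            sum += Int32(buffer[frame])
        }
        // Clamp to the valid Int16 range.
        output[frame] = Int16(clamping: sum)
    }
    return output
}
```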
Key Classes
- `FMLiveSwitchCocoaAudioSessionManager` -- Singleton. Owns the VPIO AudioUnit, `AVAudioSession` lifecycle, real-time mixer and per-participant output buffers.
- `FMLiveSwitchCocoaAudioUnitSink` -- One instance per remote participant. Writes decoded PCM audio into ASM for mixing and playback.
- `FMLiveSwitchCocoaAudioUnitSource` -- One active instance at a time (one microphone). Pulls captured samples from the VPIO input bus and delivers them to the encoding pipeline.
Note
By default, ASM handles everything automatically. Most apps don't need to interact with ASM directly. The primary reason to use the ASM API is when integrating with CallKit or another framework that needs to manage AVAudioSession externally.
Migrating to 1.25.4
Prior to 1.25.4, apps typically managed AVAudioSession directly -- calling setCategory:, setActive:, and configuring Bluetooth options in LocalMedia or RemoteMedia. With the centralized AudioSessionManager, this is no longer necessary and is actively discouraged.
When upgrading to 1.25.4:
- Remove manual `AVAudioSession` calls from your `LocalMedia` and `RemoteMedia` classes. Delete any calls to `setCategory:mode:options:` and `setActive:` -- the SDK now handles these automatically.
- Remove manual Bluetooth configuration. You no longer need to specify the `.allowBluetooth`, `.allowBluetoothA2DP` or `.allowBluetoothHFP` session options. ASM configures these as part of its session setup.
- Remove interruption handling code. The SDK now recovers from audio interruptions automatically in default mode.
- Keep manual management only if you use CallKit. If your app integrates with CallKit, enable `manualAudioSessionManagement` and use the `audioSessionDidActivateExternally` / `audioSessionDidDeactivateExternally` callbacks as described in Manual Audio Session Management. This is the only scenario where manual management is appropriate.
Caution
Competing AVAudioSession calls from your app and the SDK's AudioSessionManager can cause undefined behaviour -- audio may not be captured, may play at low volume, or may fail silently. If you are migrating from an earlier SDK version, remove your manual session management code unless you have a specific need for CallKit integration.
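As a concrete example, pre-1.25.4 session code like the following should be deleted entirely during migration. The function name here is hypothetical; the point is that any direct `AVAudioSession` configuration in your media classes now competes with ASM:

```swift
import AVFoundation

// BEFORE 1.25.4 -- delete code like this from LocalMedia/RemoteMedia.
// These calls now compete with the SDK's AudioSessionManager.
func legacySessionSetup() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .videoChat,
                            options: [.allowBluetooth])
    try session.setActive(true)
}

// AFTER 1.25.4 -- no session code at all. ASM configures and activates
// the session automatically on the first media registration.
```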
Default Behaviour
When your app creates local or remote media and opens a connection, the SDK manages the full audio lifecycle automatically:
1. Each media track calls `registerMediaWithMediaId:` on the ASM singleton.
2. On the first media registration, ASM:
   - Configures `AVAudioSession` (category: `PlayAndRecord`, mode: `VideoChat`)
   - Activates the audio session
   - Creates and starts the VPIO AudioUnit
3. For each sink (remote participant), ASM allocates a dedicated output buffer for mixing.
4. For each source (microphone), ASM registers the capture callback on the VPIO input bus.
5. On the last media unregistration, ASM stops and disposes the VPIO unit, then deactivates `AVAudioSession`.
No app code is required for any of this. The SDK handles session category, mode, activation, VPIO lifecycle, and interruption recovery automatically.
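The registration counting described above can be modelled with a small sketch. This is an illustrative model only, not the SDK's implementation; the real manager performs session and VPIO work where this sketch prints:

```swift
// Illustrative model of ASM's first-registration / last-unregistration
// lifecycle. The real manager configures AVAudioSession and the VPIO
// unit at these two transition points.
final class AudioLifecycleModel {
    private var registeredMediaIds: Set<String> = []

    func register(mediaId: String) {
        if registeredMediaIds.isEmpty {
            print("First registration: configure session, start VPIO")
        }
        registeredMediaIds.insert(mediaId)
    }

    func unregister(mediaId: String) {
        registeredMediaIds.remove(mediaId)
        if registeredMediaIds.isEmpty {
            print("Last unregistration: stop VPIO, deactivate session")
        }
    }
}
```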
Note
On macOS, there is no AVAudioSession. ASM manages the AudioUnit directly using kAudioUnitSubType_HALOutput (when VPIO is turned off) or kAudioUnitSubType_VoiceProcessingIO.
Configuring the Audio Unit
Note
For most applications, the default values are appropriate and no configuration is required. You should only need to adjust these settings if your app uses LiveSwitch for music or media playback (where echo cancellation may degrade audio quality), or if you're experiencing specific audio issues such as unexpected gain behaviour or audio graph conflicts.
ASM exposes several properties that control how the VPIO AudioUnit is created and configured. Some of these are init-only -- they must be set before the first media registration (before any connection is opened). Once the VPIO unit is initialised, init-only properties are locked.
useVoiceProcessingIO
Default: YES | Timing: Before first media registration
Selects VPIO (with AEC, noise suppression and AGC) vs. plain RemoteIO (kAudioUnitSubType_RemoteIO on iOS) or HALOutput (kAudioUnitSubType_HALOutput on macOS). This must be YES for VoIP applications. Set to NO only for music or media playback apps where echo cancellation would degrade audio quality.
```swift
// Disable VPIO for music streaming (no echo cancellation)
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setUseVoiceProcessingIO(false)
```
channelCount
Default: 2 (stereo) | Timing: Before first media registration
Sets the preferred output channel count for both the microphone and speaker paths.
When a Bluetooth HFP headset connects, the audio route negotiates at 16 kHz mono regardless of this setting -- the SDK automatically adapts to the hardware channel count reported by AVAudioSession. You don't need to set channelCount to 1 for Bluetooth to work correctly.
However, you can explicitly set mono if your app is voice-only and you want to avoid any unnecessary stereo processing on the non-Bluetooth path:
```swift
// Optional: force mono for all audio routes
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setChannelCount(1)
```
bypassVoiceProcessing
Default: NO | Timing: Any time (live property)
Disables AEC and noise suppression while keeping the VPIO AudioUnit structure intact. This is useful for temporarily switching to a "music mode" mid-call without tearing down the audio graph.
```swift
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setBypassVoiceProcessing(true)
```
voiceProcessingEnableAGC
Default: YES | Timing: Any time (live property)
Controls automatic gain control on the microphone path. Disable this if your app implements its own gain management or if AGC is interfering with a specific audio workflow.
Important
This property only takes effect when useVoiceProcessingIO is YES and bypassVoiceProcessing is NO. If voice processing is bypassed, AGC is off as part of the bypass and this setting has no effect.
```swift
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setVoiceProcessingEnableAGC(false)
```
Summary
| Property | Default | Timing | Effect |
|---|---|---|---|
| `useVoiceProcessingIO` | `YES` | Before first media | Selects VPIO vs RemoteIO/HALOutput |
| `channelCount` | `2` (stereo) | Before first media | Preferred output channel count (auto-adapts to Bluetooth mono) |
| `bypassVoiceProcessing` | `NO` | Any time | Disables AEC/noise suppression, keeps VPIO structure |
| `voiceProcessingEnableAGC` | `YES` | Any time | Automatic gain control on microphone |
Note
The VPIO sample rate isn't configurable -- it's read from AVAudioSession.sampleRate at unit initialisation time. On Bluetooth SCO headsets this is typically 16,000 Hz; on the built-in speaker and microphone it's 48,000 Hz. The SDK handles any required sample rate conversion internally.
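The internal rate conversion can be pictured with a simple ratio calculation. This sketch only shows how per-callback frame counts relate across the two rates; the SDK's resampler itself is internal:

```swift
// For a 20 ms audio frame, a 48,000 Hz codec rate and a 16,000 Hz
// Bluetooth SCO rate differ by a factor of three in frames per
// callback. ASM converts between these transparently.
let codecRate = 48_000.0
let vpioRate = 16_000.0
let frameDuration = 0.020 // 20 ms

let codecFrames = Int(codecRate * frameDuration) // 960 frames
let vpioFrames = Int(vpioRate * frameDuration)   // 320 frames
let resampleRatio = codecRate / vpioRate         // 3.0
```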
CallKit Integration and Manual Audio Session Management
By default, FMLiveSwitchCocoaAudioSessionManager fully manages the AVAudioSession lifecycle on your behalf:
- When the first media component registers, it configures the session (`PlayAndRecord` category, `VideoChat` mode) and activates it.
- When the last media component unregisters, it deactivates the session.
- If an interruption occurs (incoming phone call, Siri, etc.), it automatically recovers when the interruption ends.
Manual mode changes this. When enabled, the AudioSessionManager still owns and operates the VPIO AudioUnit for capture and playback, but it delegates all AVAudioSession calls to your app. It won't set the category, activate, deactivate, or recover the session.
This is required for CallKit integration. Apple mandates that CXProvider configures and activates the audio session at an OS-controlled time. Any competing setActive: call from the SDK would conflict with CallKit and cause undefined behaviour.
Enabling Manual Mode
```swift
// Must be set BEFORE registering any media (before opening connections)
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().manualAudioSessionManagement = true
```
Caution
Attempting to change manualAudioSessionManagement while media is registered has no effect and logs a warning. Always set this property at app launch, before creating any LiveSwitch connections.
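A typical place to set the flag is the app delegate's launch callback, before any LiveSwitch objects exist. A minimal sketch:

```swift
import UIKit

@main
class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions:
                         [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Enable manual mode exactly once, before any media is registered.
        FMLiveSwitchCocoaAudioSessionManager.sharedInstance()
            .manualAudioSessionManagement = true
        return true
    }
}
```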
Behavior Comparison
| Behavior | Default Mode | Manual Mode |
|---|---|---|
| `setCategory:mode:options:` on first media | Automatic | Skipped -- app must configure |
| `setActive:YES` on first media | Automatic | Skipped -- app/CallKit activates |
| `setActive:NO` on last media | Automatic | Skipped -- app/CallKit deactivates |
| Auto-recovery after interruption | Automatic | Skipped -- app must call audioSessionDidActivateExternally |
| VPIO AudioUnit management | Automatic | Automatic (unchanged) |
| Mixing and jitter buffering | Automatic | Automatic (unchanged) |
What Your App Must Do
When CallKit grants audio, call audioSessionDidActivateExternally to tell the SDK that VPIO processing can begin. When CallKit revokes audio, call audioSessionDidDeactivateExternally to tell the SDK to stop and clean up the VPIO unit.
```swift
// CXProviderDelegate
func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
    // CallKit has configured and activated the audio session.
    // Tell ASM it can initialise and start the VPIO unit.
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidActivateExternally()
}

func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {
    // CallKit has revoked the audio session.
    // Tell ASM to stop and clean up the VPIO unit.
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidDeactivateExternally()
}
```
The equivalent Objective-C calls:
```objc
// CXProviderDelegate
- (void)provider:(CXProvider *)provider didActivateAudioSession:(AVAudioSession *)audioSession {
    [[FMLiveSwitchCocoaAudioSessionManager sharedInstance] audioSessionDidActivateExternally];
}

- (void)provider:(CXProvider *)provider didDeactivateAudioSession:(AVAudioSession *)audioSession {
    [[FMLiveSwitchCocoaAudioSessionManager sharedInstance] audioSessionDidDeactivateExternally];
}
```
How Deferred VPIO Initialisation Works
In manual mode, there is a timing gap between when your app opens a LiveSwitch connection (which registers media and starts AudioUnitSource) and when CallKit actually grants the audio session. The SDK handles this transparently through deferred mode:
1. `AudioUnitSource.start` is called, but the VPIO unit doesn't exist yet (the session isn't active).
2. The source enters deferred mode and registers an observer for the internal `AudioSessionManagerDidRecover` notification.
3. Your `CXProviderDelegate` receives `didActivateAudioSession:` and you call `audioSessionDidActivateExternally`.
4. The AudioSessionManager initialises and starts the VPIO unit, then posts the recovery notification.
5. The source receives the notification, acquires the VPIO unit, configures its format and registers its microphone callback.
Note
The source doesn't need to be restarted. Even if audioSessionDidActivateExternally fires well after the connection is established, the deferred mechanism handles the late activation seamlessly. Audio capture and playback begin as soon as the VPIO unit is ready.
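The deferred mechanism can be modelled as a small sketch using `NotificationCenter`. This is an illustrative model, not the SDK's internal code; the notification name is taken from the steps above, and attachment work is reduced to a print statement:

```swift
import Foundation

// Illustrative model of a source that defers start-up until the
// VPIO unit exists, then attaches when the recovery notification fires.
final class DeferredSourceModel {
    private var observer: NSObjectProtocol?

    func start(vpioReady: Bool) {
        if vpioReady {
            print("Attach microphone callback immediately")
            return
        }
        // Enter deferred mode: wait for the recovery notification.
        observer = NotificationCenter.default.addObserver(
            forName: Notification.Name("AudioSessionManagerDidRecover"),
            object: nil, queue: .main) { _ in
            print("VPIO ready: acquire unit, configure format, attach callback")
        }
    }
}
```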
Complete CallKit Integration Sequence
The following example shows the full lifecycle from app launch to call teardown:
```swift
// 1. Enable manual mode at app launch (before any media).
//    Do this once -- typically in application(_:didFinishLaunchingWithOptions:).
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().manualAudioSessionManagement = true

// 2. Create the LiveSwitch connection and register media.
//    Internally, registerMediaWithMediaId: runs, but setActive:YES is skipped.
//    If AudioUnitSource starts before CallKit grants audio,
//    it enters deferred mode automatically.

// 3. Report the incoming (or outgoing) call to CallKit.
//    CallKit configures the audio session at the OS-controlled time
//    and calls didActivateAudioSession: when ready.

// 4. CallKit grants audio:
func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidActivateExternally()
    // The VPIO unit is now running.
    // Audio capture and playback begin immediately.
}

// 5. Call proceeds normally.
//    ASM handles mixing, jitter buffering, resampling and route changes.

// 6. Call ends; CallKit revokes audio:
func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidDeactivateExternally()
    // VPIO unit is stopped and cleaned up.
    // Speaker output buffers are reset.
}

// 7. Tear down the LiveSwitch connection and unregister media.
//    Because manual mode is active, the SDK does not call setActive:NO --
//    CallKit has already deactivated the session in step 6.
```
Important
The order matters. Always enable manual mode (step 1) before creating connections (step 2). Always call audioSessionDidActivateExternally in didActivateAudioSession: and audioSessionDidDeactivateExternally in didDeactivateAudioSession:. Don't call setActive: yourself when using CallKit -- let CallKit own the session lifecycle.
Bluetooth Audio Considerations
Bluetooth Hands-Free Profile (HFP) establishes a full-duplex SCO (Synchronous Connection-Oriented) link at 16 kHz mono. This has several implications for how ASM manages audio.
Recommendations
- Channel adaptation is automatic. When a Bluetooth HFP headset connects, the SDK automatically adapts to the mono channel count reported by `AVAudioSession`. You don't need to set `channelCount` to `1` for Bluetooth to work correctly. See Configuring the Audio Unit for optional manual override.
- Sample rate conversion is automatic. If the VPIO unit runs at 16 kHz (Bluetooth) but the codec expects 48 kHz, ASM resamples transparently in both directions. No app code is needed.
- The output bus is always kept active, even for send-only streams. This is required because Bluetooth HFP won't establish the SCO link unless both the input and output directions are active. The SDK handles this internally -- you don't need to create a dummy sink.
Important
Avoid tearing down and recreating the VPIO AudioUnit mid-call (for example, when adding a send stream to a receive-only session). ASM uses a lightweight I/O reconfiguration path (stop, uninitialise, toggle EnableIO, reinitialise, start) that preserves the existing SCO link. A full teardown causes Bluetooth headsets to announce "call ended" and drop the audio route.
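The lightweight reconfiguration path described above corresponds roughly to the following AudioToolbox sequence. This is a sketch of the general technique under stated assumptions, not the SDK's exact code; the key point is that the unit is never disposed, so the negotiated SCO link survives:

```swift
import AudioToolbox

// Sketch: enable the input (microphone) bus on an existing I/O unit
// without disposing it. Element 1 is the input element on an I/O unit.
func enableInputInPlace(_ unit: AudioUnit) -> OSStatus {
    var status = AudioOutputUnitStop(unit)
    guard status == noErr else { return status }

    status = AudioUnitUninitialize(unit)
    guard status == noErr else { return status }

    var enable: UInt32 = 1
    status = AudioUnitSetProperty(unit,
                                  kAudioOutputUnitProperty_EnableIO,
                                  kAudioUnitScope_Input,
                                  1, // input element
                                  &enable,
                                  UInt32(MemoryLayout<UInt32>.size))
    guard status == noErr else { return status }

    status = AudioUnitInitialize(unit)
    guard status == noErr else { return status }

    // Restart without ever calling AudioComponentInstanceDispose,
    // so the SCO link negotiated by the unit is kept alive.
    return AudioOutputUnitStart(unit)
}
```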
Typical Bluetooth Audio Flow
When a Bluetooth HFP headset is connected and a call begins:
1. ASM detects the active Bluetooth route via `AVAudioSession` route change notifications.
2. The VPIO unit initialises at the Bluetooth-negotiated sample rate (typically 16 kHz).
3. ASM resamples between the VPIO rate and the codec rate as needed.
4. If the session transitions from receive-only to send-and-receive, ASM reconfigures the VPIO I/O buses in place without dropping the SCO link.
Caution
If your app uses CallKit and manages AVAudioSession manually, be careful not to deactivate and reactivate the session during a call. This tears down the SCO link and may cause the Bluetooth headset to disconnect or announce "call ended." See Manual Audio Session Management for CallKit integration details.
Interruption and Recovery Handling
Note
This section applies to iOS only. macOS doesn't have AVAudioSession interruptions or a background/foreground app lifecycle.
Audio interruptions occur when another app or system service takes control of the audio hardware -- for example, an incoming phone call, Siri activation or a timer alarm. The AudioSessionManager handles these events differently depending on whether you are using default mode or manual mode.
Default Mode (Automatic Recovery)
In default mode, recovery is fully automatic:
1. Interruption begins. The system posts `AVAudioSessionInterruptionNotification`. The AudioSessionManager detects it and stops the VPIO unit.
2. Interruption ends. The AudioSessionManager reactivates the session, reinitialises the VPIO unit and posts the `AudioSessionManagerDidRecover` notification. Sources and sinks detect the notification and reattach to the new VPIO unit.
3. Interruption ends while backgrounded. If the app is in the background when the interruption ends, recovery is deferred until the app returns to the foreground.
No action is required from your code. The SDK resumes audio capture and playback transparently.
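Although no handling code is required, apps that want visibility into interruptions for logging can observe the same system notification the SDK uses. A minimal sketch (logging only, no recovery logic):

```swift
import AVFoundation

// Optional: observe interruptions purely for diagnostics. In default
// mode the SDK already stops and recovers the VPIO unit itself.
NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: AVAudioSession.sharedInstance(),
    queue: .main) { notification in
    guard let rawType = notification.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: rawType) else { return }
    switch type {
    case .began: print("Audio interruption began")
    case .ended: print("Audio interruption ended; SDK will recover")
    @unknown default: break
    }
}
```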
Manual Mode
In manual mode, the AudioSessionManager doesn't perform automatic recovery after interruptions. It assumes an external manager (such as CallKit) owns the session lifecycle.
- When an interruption ends, the SDK doesn't reactivate the session or reinitialise VPIO.
- Your app is responsible for detecting when the audio session becomes available again and calling `audioSessionDidActivateExternally`.
- For CallKit apps, this is handled naturally: when the system restores your call after an interruption, CallKit calls `provider(_:didActivate:)` and you forward that to the AudioSessionManager.
Background-to-Foreground Transitions
When your app returns to the foreground, the AudioSessionManager uses a conditional recovery strategy rather than always tearing down and rebuilding the VPIO unit. A full teardown and rebuild only occurs if:
- A system interruption occurred while the app was in the background (for example, an incoming phone call, Siri activation or a timer alarm that posted `AVAudioSessionInterruptionNotification`), or
- The VPIO unit is detected as frozen -- the real-time mixer callback has stopped advancing, which can happen if iOS silently reclaims audio resources from a backgrounded app without posting a formal interruption notification.
If neither condition is true, the existing VPIO unit continues running undisturbed.
Important
This conditional approach prevents Bluetooth headsets from announcing "call ended" on every background-to-foreground transition. The Bluetooth SCO link is preserved as long as the VPIO unit is alive and running. Without this, some headsets (particularly those using HFP/SCO profiles) interpret the session deactivation as a call ending and play an audible announcement.
Diagnostics
Call getDiagnostics at any time from any thread to get a human-readable snapshot of the audio subsystem state.
```swift
let diagnostics = FMLiveSwitchCocoaAudioSessionManager.sharedInstance().getDiagnostics()
print(diagnostics)
```

```objc
NSString *diagnostics = [[FMLiveSwitchCocoaAudioSessionManager sharedInstance] getDiagnostics];
NSLog(@"Audio diagnostics:\n%@", diagnostics);
```
What the Output Includes
The diagnostics string contains a point-in-time snapshot of the entire audio subsystem:
- Media registrations: Total active media count, broken down by send and receive with all registered media IDs.
- Session configuration: Whether manual audio session management is enabled, whether the session is active and whether the VPIO unit is initialised.
- VPIO settings: Voice processing mode (VPIO vs. RemoteIO), bypass state, AGC state, channel count (mono/stereo) and sample rate.
- Microphone input: The media ID of the registered microphone source, or "None" if no source is registered.
- Speaker outputs (per-sink): For each registered speaker output:
- Audio format (sample rate, channel count)
- Whether resampling is active and the resample ratio
- Circular buffer fill level (bytes used / total, percentage)
- Samples written and read
- Underrun counts (consecutive and total)
- Active state
- AVAudioSession details (iOS): Current category, mode, hardware sample rate, input availability and output volume.
Note
Include the full diagnostics output when filing bug reports. It provides a complete picture of the audio subsystem state at the time of the issue and is essential for diagnosing audio problems.
Note
For LiveSwitch Server customers, audio diagnostics are also reported to the server automatically as part of the SDK's telemetry pipeline. Two types of telemetry are sent:
- `asmEvent` -- Mid-call audio quality events (underruns, jitter spikes, overruns, interruptions and device changes).
- `asmSnapshot` -- A full AudioSessionManager configuration snapshot, captured at call start and refreshed after audio route changes.
ASM telemetry activates automatically for SDK ≥ 1.25.4 clients connected to a server ≥ 1.25.4. No client-side configuration is required. Server operators can gate or suppress ASM telemetry per deployment using the clientTelemetry section in the Deployment Config:
```json
{
  "clientTelemetry": {
    "sendASMTelemetryMinRemoteVersion": "1.25.4",
    "disabledTelemetryTypes": ["asmEvent", "asmSnapshot"]
  }
}
```
- `sendASMTelemetryMinRemoteVersion` -- Only clients at or higher than this SDK version send ASM telemetry. Set to `null` (default) to allow all versions.
- `disabledTelemetryTypes` -- An array of telemetry type identifiers to suppress entirely. For example, `["asmSnapshot"]` disables configuration snapshots while keeping mid-call events active.