iOS Audio Session Management
This guide explains how the LiveSwitch iOS SDK manages audio: the shared VPIO AudioUnit and the AVAudioSession lifecycle. It covers default behaviour, configuration options, CallKit integration, Bluetooth handling, interruption recovery and diagnostics.
Overview
The LiveSwitch iOS SDK uses a centralized FMLiveSwitchCocoaAudioSessionManager (ASM) singleton that owns the single shared Voice Processing I/O (VPIO) AudioUnit for the entire process. All microphone and speaker instances communicate through ASM -- they never create or hold their own AudioUnit.
This centralized design solves several problems that arise when multiple audio components compete for shared system resources:
| Problem | Centralized Solution |
|---|---|
| Two AudioUnits create two competing echo cancellers | One VPIO unit provides a single echo canceller with full context |
| Each unit tries to open its own Bluetooth SCO link | One unit negotiates a single SCO link at one sample rate |
| Multiple remote streams need mixing before output | ASM owns the real-time mixer callback and mixes all sinks |
| Competing AVAudioSession `setCategory`/`setActive` calls | One manager configures the session once; manual mode delegates to CallKit |
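The mixing responsibility in the table above can be pictured with a simplified sketch. This is not the SDK's actual mixer, which runs inside the real-time VPIO render callback; it only illustrates the sum-and-clamp idea of combining several per-participant buffers into one output buffer:

```swift
// Illustrative sketch: mix several per-participant PCM buffers into one
// output buffer. Summing in Int32 then clamping to Int16 avoids
// wrap-around distortion when multiple loud participants overlap.
func mix(sinkBuffers: [[Int16]], frameCount: Int) -> [Int16] {
    var output = [Int16](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Int32 = 0
        for buffer in sinkBuffers where frame < buffer.count {
            sum += Int32(buffer[frame])
        }
        // Clamp to the valid Int16 range.
        output[frame] = Int16(clamping: sum)
    }
    return output
}
```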
Key Classes
- `FMLiveSwitchCocoaAudioSessionManager` -- Singleton. Owns the VPIO AudioUnit, `AVAudioSession` lifecycle, real-time mixer and per-participant output buffers.
- `FMLiveSwitchCocoaAudioUnitSink` -- One instance per remote participant. Writes decoded PCM audio into ASM for mixing and playback.
- `FMLiveSwitchCocoaAudioUnitSource` -- One active instance at a time (one microphone). Pulls captured samples from the VPIO input bus and delivers them to the encoding pipeline.
Note
By default, ASM handles everything automatically. Most apps don't need to interact with ASM directly. The primary reason to use the ASM API is when integrating with CallKit or another framework that needs to manage AVAudioSession externally.
Migrating to 1.25.4
Prior to 1.25.4, apps typically managed AVAudioSession directly -- calling setCategory:, setActive:, and configuring Bluetooth options in LocalMedia or RemoteMedia. With the centralized AudioSessionManager, this is no longer necessary and is actively discouraged.
When upgrading to 1.25.4:
- Remove manual `AVAudioSession` calls from your `LocalMedia` and `RemoteMedia` classes. Delete any calls to `setCategory:mode:options:` and `setActive:` -- the SDK now handles these automatically.
- Remove manual Bluetooth configuration. You no longer need to specify the `.allowBluetooth`, `.allowBluetoothA2DP` or `.allowBluetoothHFP` session options. ASM configures these as part of its session setup.
- Remove interruption handling code. The SDK now recovers from audio interruptions automatically in default mode.
- Keep manual management only if you use CallKit. If your app integrates with CallKit, enable `manualAudioSessionManagement` and use the `audioSessionDidActivateExternally` / `audioSessionDidDeactivateExternally` callbacks as described in Manual Audio Session Management. This is the only scenario where manual management is appropriate.
Caution
Competing AVAudioSession calls from your app and the SDK's AudioSessionManager can cause undefined behaviour -- audio may not be captured, may play at low volume, or may fail silently. If you are migrating from an earlier SDK version, remove your manual session management code unless you have a specific need for CallKit integration.
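As a concrete example, pre-1.25.4 session code like the following should be deleted entirely during migration. The function name here is hypothetical; the point is that any direct `AVAudioSession` configuration in your media classes now competes with ASM:

```swift
import AVFoundation

// BEFORE 1.25.4 -- delete code like this from LocalMedia/RemoteMedia.
// These calls now compete with the SDK's AudioSessionManager.
func legacySessionSetup() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .videoChat,
                            options: [.allowBluetooth])
    try session.setActive(true)
}

// AFTER 1.25.4 -- no session code at all. ASM configures and activates
// the session automatically on the first media registration.
```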
Default Behaviour
When your app creates local or remote media and opens a connection, the SDK manages the full audio lifecycle automatically:
1. Each media track calls `registerMediaWithMediaId:` on the ASM singleton.
2. On the first media registration, ASM:
   - Configures `AVAudioSession` (category: `PlayAndRecord`, mode: `VideoChat`)
   - Activates the audio session
   - Creates and starts the VPIO AudioUnit
3. For each sink (remote participant), ASM allocates a dedicated output buffer for mixing.
4. For each source (microphone), ASM registers the capture callback on the VPIO input bus.
5. On the last media unregistration, ASM stops and disposes the VPIO unit, then deactivates `AVAudioSession`.
No app code is required for any of this. The SDK handles session category, mode, activation, VPIO lifecycle, and interruption recovery automatically.
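The registration counting described above can be modelled with a small sketch. This is an illustrative model only, not the SDK's implementation; the real manager performs session and VPIO work where this sketch prints:

```swift
// Illustrative model of ASM's first-registration / last-unregistration
// lifecycle. The real manager configures AVAudioSession and the VPIO
// unit at these two transition points.
final class AudioLifecycleModel {
    private var registeredMediaIds: Set<String> = []

    func register(mediaId: String) {
        if registeredMediaIds.isEmpty {
            print("First registration: configure session, start VPIO")
        }
        registeredMediaIds.insert(mediaId)
    }

    func unregister(mediaId: String) {
        registeredMediaIds.remove(mediaId)
        if registeredMediaIds.isEmpty {
            print("Last unregistration: stop VPIO, deactivate session")
        }
    }
}
```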
Note
On macOS, there is no AVAudioSession. ASM manages the AudioUnit directly using kAudioUnitSubType_HALOutput (when VPIO is turned off) or kAudioUnitSubType_VoiceProcessingIO.
Configuring the Audio Unit
Note
For most applications, the default values are appropriate and no configuration is required. You should only need to adjust these settings if your app uses LiveSwitch for music or media playback (where echo cancellation may degrade audio quality), or if you're experiencing specific audio issues such as unexpected gain behaviour or audio graph conflicts.
ASM exposes several properties that control how the VPIO AudioUnit is created and configured. Some of these are init-only -- they must be set before the first media registration (before any connection is opened). Once the VPIO unit is initialised, init-only properties are locked.
useVoiceProcessingIO
Default: YES | Timing: Before first media registration
Selects VPIO (with AEC, noise suppression and AGC) vs. plain RemoteIO (kAudioUnitSubType_RemoteIO on iOS) or HALOutput (kAudioUnitSubType_HALOutput on macOS). This must be YES for VoIP applications. Set to NO only for music or media playback apps where echo cancellation would degrade audio quality.
```swift
// Disable VPIO for music streaming (no echo cancellation)
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setUseVoiceProcessingIO(false)
```
channelCount
Default: 2 (stereo) | Timing: Before first media registration
Sets the preferred output channel count for both the microphone and speaker paths.
When a Bluetooth HFP headset connects, the audio route negotiates at 16 kHz mono regardless of this setting -- the SDK automatically adapts to the hardware channel count reported by AVAudioSession. You don't need to set channelCount to 1 for Bluetooth to work correctly.
However, you can explicitly set mono if your app is voice-only and you want to avoid any unnecessary stereo processing on the non-Bluetooth path:
```swift
// Optional: force mono for all audio routes
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setChannelCount(1)
```
bypassVoiceProcessing
Default: NO | Timing: Any time (live property)
Disables AEC and noise suppression while keeping the VPIO AudioUnit structure intact. This is useful for temporarily switching to a "music mode" mid-call without tearing down the audio graph.
```swift
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setBypassVoiceProcessing(true)
```
voiceProcessingEnableAGC
Default: YES | Timing: Any time (live property)
Controls automatic gain control on the microphone path. Disable this if your app implements its own gain management or if AGC is interfering with a specific audio workflow.
Important
This property only takes effect when useVoiceProcessingIO is YES and bypassVoiceProcessing is NO. If voice processing is bypassed, AGC is off as part of the bypass and this setting has no effect.
```swift
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().setVoiceProcessingEnableAGC(false)
```
Summary
| Property | Default | Timing | Effect |
|---|---|---|---|
| `useVoiceProcessingIO` | `YES` | Before first media | Selects VPIO vs RemoteIO/HALOutput |
| `channelCount` | `2` (stereo) | Before first media | Preferred output channel count (auto-adapts to Bluetooth mono) |
| `bypassVoiceProcessing` | `NO` | Any time | Disables AEC/noise suppression, keeps VPIO structure |
| `voiceProcessingEnableAGC` | `YES` | Any time | Automatic gain control on microphone |
Note
The VPIO sample rate isn't configurable -- it's read from AVAudioSession.sampleRate at unit initialisation time. On Bluetooth SCO headsets this is typically 16,000 Hz; on the built-in speaker and microphone it's 48,000 Hz. The SDK handles any required sample rate conversion internally.
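The internal rate conversion can be pictured with a simple ratio calculation. This sketch only shows how per-callback frame counts relate across the two rates; the SDK's resampler itself is internal:

```swift
// For a 20 ms audio frame, a 48,000 Hz codec rate and a 16,000 Hz
// Bluetooth SCO rate differ by a factor of three in frames per
// callback. ASM converts between these transparently.
let codecRate = 48_000.0
let vpioRate = 16_000.0
let frameDuration = 0.020 // 20 ms

let codecFrames = Int(codecRate * frameDuration) // 960 frames
let vpioFrames = Int(vpioRate * frameDuration)   // 320 frames
let resampleRatio = codecRate / vpioRate         // 3.0
```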
CallKit Integration and Manual Audio Session Management
By default, FMLiveSwitchCocoaAudioSessionManager fully manages the AVAudioSession lifecycle on your behalf:
- When the first media component registers, it configures the session (`PlayAndRecord` category, `VideoChat` mode) and activates it.
- When the last media component unregisters, it deactivates the session.
- If an interruption occurs (incoming phone call, Siri, etc.), it automatically recovers when the interruption ends.
Manual mode changes this. When enabled, the AudioSessionManager still owns and operates the VPIO AudioUnit for capture and playback, but it delegates all AVAudioSession calls to your app. It won't set the category, activate, deactivate, or recover the session.
This is required for CallKit integration. Apple mandates that CXProvider configures and activates the audio session at an OS-controlled time. Any competing setActive: call from the SDK would conflict with CallKit and cause undefined behaviour.
Enabling Manual Mode
```swift
// Must be set BEFORE registering any media (before opening connections)
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().manualAudioSessionManagement = true
```
Caution
Attempting to change manualAudioSessionManagement while media is registered has no effect and logs a warning. Always set this property at app launch, before creating any LiveSwitch connections.
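A typical place to set the flag is the app delegate's launch callback, before any LiveSwitch objects exist. A minimal sketch:

```swift
import UIKit

@main
class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions:
                         [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Enable manual mode exactly once, before any media is registered.
        FMLiveSwitchCocoaAudioSessionManager.sharedInstance()
            .manualAudioSessionManagement = true
        return true
    }
}
```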
Behavior Comparison
| Behavior | Default Mode | Manual Mode |
|---|---|---|
| `setCategory:mode:options:` on first media | Automatic | Skipped -- app must configure |
| `setActive:YES` on first media | Automatic | Skipped -- app/CallKit activates |
| `setActive:NO` on last media | Automatic | Skipped -- app/CallKit deactivates |
| Auto-recovery after interruption | Automatic | Skipped -- app must call audioSessionDidActivateExternally |
| VPIO AudioUnit management | Automatic | Automatic (unchanged) |
| Mixing and jitter buffering | Automatic | Automatic (unchanged) |
What Your App Must Do
When CallKit grants audio, call audioSessionDidActivateExternally to tell the SDK that VPIO processing can begin. When CallKit revokes audio, call audioSessionDidDeactivateExternally to tell the SDK to stop and clean up the VPIO unit.
```swift
// CXProviderDelegate
func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
    // CallKit has configured and activated the audio session.
    // Tell ASM it can initialise and start the VPIO unit.
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidActivateExternally()
}

func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {
    // CallKit has revoked the audio session.
    // Tell ASM to stop and clean up the VPIO unit.
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidDeactivateExternally()
}
```
The equivalent Objective-C calls:
```objc
// CXProviderDelegate
- (void)provider:(CXProvider *)provider didActivateAudioSession:(AVAudioSession *)audioSession {
    [[FMLiveSwitchCocoaAudioSessionManager sharedInstance] audioSessionDidActivateExternally];
}

- (void)provider:(CXProvider *)provider didDeactivateAudioSession:(AVAudioSession *)audioSession {
    [[FMLiveSwitchCocoaAudioSessionManager sharedInstance] audioSessionDidDeactivateExternally];
}
```
How Deferred VPIO Initialisation Works
In manual mode, there is a timing gap between when your app opens a LiveSwitch connection (which registers media and starts AudioUnitSource) and when CallKit actually grants the audio session. The SDK handles this transparently through deferred mode:
1. `AudioUnitSource.start` is called, but the VPIO unit doesn't exist yet (the session isn't active).
2. The source enters deferred mode and registers an observer for the internal `AudioSessionManagerDidRecover` notification.
3. Your `CXProviderDelegate` receives `didActivateAudioSession:` and you call `audioSessionDidActivateExternally`.
4. The AudioSessionManager initialises and starts the VPIO unit, then posts the recovery notification.
5. The source receives the notification, acquires the VPIO unit, configures its format and registers its microphone callback.
Note
The source doesn't need to be restarted. Even if audioSessionDidActivateExternally fires well after the connection is established, the deferred mechanism handles the late activation seamlessly. Audio capture and playback begin as soon as the VPIO unit is ready.
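The deferred mechanism can be modelled as a small sketch using `NotificationCenter`. This is an illustrative model, not the SDK's internal code; the notification name is taken from the steps above, and attachment work is reduced to a print statement:

```swift
import Foundation

// Illustrative model of a source that defers start-up until the
// VPIO unit exists, then attaches when the recovery notification fires.
final class DeferredSourceModel {
    private var observer: NSObjectProtocol?

    func start(vpioReady: Bool) {
        if vpioReady {
            print("Attach microphone callback immediately")
            return
        }
        // Enter deferred mode: wait for the recovery notification.
        observer = NotificationCenter.default.addObserver(
            forName: Notification.Name("AudioSessionManagerDidRecover"),
            object: nil, queue: .main) { _ in
            print("VPIO ready: acquire unit, configure format, attach callback")
        }
    }
}
```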
Complete CallKit Integration Sequence
The following example shows the full lifecycle from app launch to call teardown:
```swift
// 1. Enable manual mode at app launch (before any media).
//    Do this once -- typically in application(_:didFinishLaunchingWithOptions:).
FMLiveSwitchCocoaAudioSessionManager.sharedInstance().manualAudioSessionManagement = true

// 2. Create the LiveSwitch connection and register media.
//    Internally, registerMediaWithMediaId: runs, but setActive:YES is skipped.
//    If AudioUnitSource starts before CallKit grants audio,
//    it enters deferred mode automatically.

// 3. Report the incoming (or outgoing) call to CallKit.
//    CallKit configures the audio session at the OS-controlled time
//    and calls didActivateAudioSession: when ready.

// 4. CallKit grants audio:
func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidActivateExternally()
    // The VPIO unit is now running.
    // Audio capture and playback begin immediately.
}

// 5. Call proceeds normally.
//    ASM handles mixing, jitter buffering, resampling and route changes.

// 6. Call ends; CallKit revokes audio:
func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {
    FMLiveSwitchCocoaAudioSessionManager.sharedInstance().audioSessionDidDeactivateExternally()
    // VPIO unit is stopped and cleaned up.
    // Speaker output buffers are reset.
}

// 7. Tear down the LiveSwitch connection and unregister media.
//    Because manual mode is active, the SDK does not call setActive:NO --
//    CallKit has already deactivated the session in step 6.
```
Important
The order matters. Always enable manual mode (step 1) before creating connections (step 2). Always call audioSessionDidActivateExternally in didActivateAudioSession: and audioSessionDidDeactivateExternally in didDeactivateAudioSession:. Don't call setActive: yourself when using CallKit -- let CallKit own the session lifecycle.
Bluetooth Audio Considerations
Bluetooth Hands-Free Profile (HFP) establishes a full-duplex SCO (Synchronous Connection-Oriented) link at 16 kHz mono. This has several implications for how ASM manages audio.
Recommendations
- Channel adaptation is automatic. When a Bluetooth HFP headset connects, the SDK automatically adapts to the mono channel count reported by `AVAudioSession`. You don't need to set `channelCount` to `1` for Bluetooth to work correctly. See Configuring the Audio Unit for optional manual override.
- Sample rate conversion is automatic. If the VPIO unit runs at 16 kHz (Bluetooth) but the codec expects 48 kHz, ASM resamples transparently in both directions. No app code is needed.
- The output bus is always kept active, even for send-only streams. This is required because Bluetooth HFP won't establish the SCO link unless both the input and output directions are active. The SDK handles this internally -- you don't need to create a dummy sink.
Important
Avoid tearing down and recreating the VPIO AudioUnit mid-call (for example, when adding a send stream to a receive-only session). ASM uses a lightweight I/O reconfiguration path (stop, uninitialise, toggle EnableIO, reinitialise, start) that preserves the existing SCO link. A full teardown causes Bluetooth headsets to announce "call ended" and drop the audio route.
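The lightweight reconfiguration path described above corresponds roughly to the following AudioToolbox sequence. This is a sketch of the general technique under stated assumptions, not the SDK's exact code; the key point is that the unit is never disposed, so the negotiated SCO link survives:

```swift
import AudioToolbox

// Sketch: enable the input (microphone) bus on an existing I/O unit
// without disposing it. Element 1 is the input element on an I/O unit.
func enableInputInPlace(_ unit: AudioUnit) -> OSStatus {
    var status = AudioOutputUnitStop(unit)
    guard status == noErr else { return status }

    status = AudioUnitUninitialize(unit)
    guard status == noErr else { return status }

    var enable: UInt32 = 1
    status = AudioUnitSetProperty(unit,
                                  kAudioOutputUnitProperty_EnableIO,
                                  kAudioUnitScope_Input,
                                  1, // input element
                                  &enable,
                                  UInt32(MemoryLayout<UInt32>.size))
    guard status == noErr else { return status }

    status = AudioUnitInitialize(unit)
    guard status == noErr else { return status }

    // Restart without ever calling AudioComponentInstanceDispose,
    // so the SCO link negotiated by the unit is kept alive.
    return AudioOutputUnitStart(unit)
}
```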
Typical Bluetooth Audio Flow
When a Bluetooth HFP headset is connected and a call begins:
1. ASM detects the active Bluetooth route via `AVAudioSession` route change notifications.
2. The VPIO unit initialises at the Bluetooth-negotiated sample rate (typically 16 kHz).
3. ASM resamples between the VPIO rate and the codec rate as needed.
4. If the session transitions from receive-only to send-and-receive, ASM reconfigures the VPIO I/O buses in place without dropping the SCO link.
Caution
If your app uses CallKit and manages AVAudioSession manually, be careful not to deactivate and reactivate the session during a call. This tears down the SCO link and may cause the Bluetooth headset to disconnect or announce "call ended." See Manual Audio Session Management for CallKit integration details.
Interruption and Recovery Handling
Note
This section applies to iOS only. macOS doesn't have AVAudioSession interruptions or a background/foreground app lifecycle.
Audio interruptions occur when another app or system service takes control of the audio hardware -- for example, an incoming phone call, Siri activation or a timer alarm. The AudioSessionManager handles these events differently depending on whether you are using default mode or manual mode.
Default Mode (Automatic Recovery)
In default mode, recovery is fully automatic:
1. Interruption begins. The system posts `AVAudioSessionInterruptionNotification`. The AudioSessionManager detects it and stops the VPIO unit.
2. Interruption ends. The AudioSessionManager reactivates the session, reinitialises the VPIO unit and posts the `AudioSessionManagerDidRecover` notification. Sources and sinks detect the notification and reattach to the new VPIO unit.
3. Interruption ends while backgrounded. If the app is in the background when the interruption ends, recovery is deferred until the app returns to the foreground.
No action is required from your code. The SDK resumes audio capture and playback transparently.
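Although no handling code is required, apps that want visibility into interruptions for logging can observe the same system notification the SDK uses. A minimal sketch (logging only, no recovery logic):

```swift
import AVFoundation

// Optional: observe interruptions purely for diagnostics. In default
// mode the SDK already stops and recovers the VPIO unit itself.
NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: AVAudioSession.sharedInstance(),
    queue: .main) { notification in
    guard let rawType = notification.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: rawType) else { return }
    switch type {
    case .began: print("Audio interruption began")
    case .ended: print("Audio interruption ended; SDK will recover")
    @unknown default: break
    }
}
```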
Manual Mode
In manual mode, the AudioSessionManager doesn't perform automatic recovery after interruptions. It assumes an external manager (such as CallKit) owns the session lifecycle.
- When an interruption ends, the SDK doesn't reactivate the session or reinitialise VPIO.
- Your app is responsible for detecting when the audio session becomes available again and calling `audioSessionDidActivateExternally`.
- For CallKit apps, this is handled naturally: when the system restores your call after an interruption, CallKit calls `provider(_:didActivate:)` and you forward that to the AudioSessionManager.
Background-to-Foreground Transitions
When your app returns to the foreground, the AudioSessionManager uses a conditional recovery strategy rather than always tearing down and rebuilding the VPIO unit. A full teardown and rebuild only occurs if:
- A system interruption occurred while the app was in the background (for example, an incoming phone call, Siri activation or a timer alarm that posted `AVAudioSessionInterruptionNotification`), or
- The VPIO unit is detected as frozen -- the real-time mixer callback has stopped advancing, which can happen if iOS silently reclaims audio resources from a backgrounded app without posting a formal interruption notification.
If neither condition is true, the existing VPIO unit continues running undisturbed.
Important
This conditional approach prevents Bluetooth headsets from announcing "call ended" on every background-to-foreground transition. The Bluetooth SCO link is preserved as long as the VPIO unit is alive and running. Without this, some headsets (particularly those using HFP/SCO profiles) interpret the session deactivation as a call ending and play an audible announcement.
Diagnostics
Call getDiagnostics at any time from any thread to get a human-readable snapshot of the audio subsystem state.
```swift
let diagnostics = FMLiveSwitchCocoaAudioSessionManager.sharedInstance().getDiagnostics()
print(diagnostics)
```

```objc
NSString *diagnostics = [[FMLiveSwitchCocoaAudioSessionManager sharedInstance] getDiagnostics];
NSLog(@"Audio diagnostics:\n%@", diagnostics);
```
What the Output Includes
The diagnostics string contains a point-in-time snapshot of the entire audio subsystem:
- Media registrations: Total active media count, broken down by send and receive with all registered media IDs.
- Session configuration: Whether manual audio session management is enabled, whether the session is active and whether the VPIO unit is initialised.
- VPIO settings: Voice processing mode (VPIO vs. RemoteIO), bypass state, AGC state, channel count (mono/stereo) and sample rate.
- Microphone input: The media ID of the registered microphone source, or "None" if no source is registered.
- Speaker outputs (per-sink): For each registered speaker output:
- Audio format (sample rate, channel count)
- Whether resampling is active and the resample ratio
- Circular buffer fill level (bytes used / total, percentage)
- Samples written and read
- Underrun counts (consecutive and total)
- Active state
- AVAudioSession details (iOS): Current category, mode, hardware sample rate, input availability and output volume.
Note
Include the full diagnostics output when filing bug reports. It provides a complete picture of the audio subsystem state at the time of the issue and is essential for diagnosing audio problems.
Note
For LiveSwitch Server customers, audio diagnostics are also reported to the server automatically as part of the SDK's telemetry pipeline. Two types of telemetry are sent:
- `asmEvent` -- Mid-call audio quality events (underruns, jitter spikes, overruns, interruptions and device changes).
- `asmSnapshot` -- A full AudioSessionManager configuration snapshot, captured at call start and refreshed after audio route changes.
ASM telemetry activates automatically for SDK ≥ 1.25.4 clients connected to a server ≥ 1.25.4. No client-side configuration is required. Server operators can gate or suppress ASM telemetry per deployment using the clientTelemetry section in the Deployment Config:
```json
{
  "clientTelemetry": {
    "sendASMTelemetryMinRemoteVersion": "1.25.4",
    "disabledTelemetryTypes": ["asmEvent", "asmSnapshot"]
  }
}
```
- `sendASMTelemetryMinRemoteVersion` -- Only clients at or higher than this SDK version send ASM telemetry. Set to `null` (default) to allow all versions.
- `disabledTelemetryTypes` -- An array of telemetry type identifiers to suppress entirely. For example, `["asmSnapshot"]` disables configuration snapshots while keeping mid-call events active.