Swift Concurrency for AI Workloads
Actors, AsyncStream, Task priority, and cancellation — the complete concurrency picture for shipping Core ML and Foundation Models features without race conditions, UI freezes, or memory leaks.
Why AI workloads need special concurrency care
On-device AI inference has two characteristics that make it unusually demanding on a concurrency model: it is slow (20ms to 2 seconds depending on model size) and it is not thread-safe (MLModel instances cannot be called from multiple threads concurrently). The naive solution — wrapping inference in DispatchQueue.global().async — solves the first problem and ignores the second. The correct solution uses Swift's structured concurrency model end-to-end.
This guide covers the four patterns that make AI inference correct in a production SwiftUI app: actor isolation for thread safety, async/await for non-blocking dispatch, AsyncStream for token-by-token streaming, and Task management for cancellation and priority.
1. Actors: thread-safe model access
MLModel is not thread-safe. Two concurrent calls to prediction() on the same instance can cause a crash. Swift actors serialize access to their state — only one caller can execute actor-isolated code at a time. This eliminates the bug class entirely, without locks.
```swift
import CoreML
import Observation

actor ClassifierService {
    private let model: SentimentClassifier

    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine
        self.model = try SentimentClassifier(configuration: config)
    }

    /// async/await: suspends caller, doesn't block any thread
    func classify(text: String) async throws -> String {
        let input = SentimentClassifierInput(text: text)
        let output = try await model.prediction(input: input)
        return output.label
    }

    /// Batch version using the async batch API (iOS 16+)
    func classifyBatch(_ texts: [String]) async throws -> [String] {
        let inputs = texts.map { SentimentClassifierInput(text: $0) }
        let outputs = try await model.predictions(inputs: inputs)
        return outputs.map(\.label)
    }
}

// One shared instance for the whole app
// @Observable ViewModel owns the actor — not each view
@MainActor @Observable
class FeedViewModel {
    private let classifier = try! ClassifierService()
    private(set) var result = ""

    func classify(_ text: String) async {
        result = (try? await classifier.classify(text: text)) ?? "error"
    }
}
```

Key points:
- The actor serializes all `classify()` calls — even if 20 views call it simultaneously, only one prediction runs at a time per model instance.
- The `await` at the call site suspends the caller's task without blocking the thread — other tasks continue executing.
- If you need parallel inference throughput, use a pool of actor instances with `withThrowingTaskGroup`.
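The pool-of-actors idea in the last bullet can be sketched as follows. `ClassifierPool` and its round-robin scheme are illustrative, not an Apple API, and assume the `ClassifierService` actor defined above:

```swift
// A minimal sketch, assuming the ClassifierService actor from section 1.
// The pool holds N independent model instances, so up to N predictions
// can run concurrently while each instance stays serialized internally.
actor ClassifierPool {
    private let workers: [ClassifierService]
    private var next = 0

    init(size: Int) throws {
        self.workers = try (0..<size).map { _ in try ClassifierService() }
    }

    // Round-robin: hand out the next worker in rotation
    private func nextWorker() -> ClassifierService {
        defer { next = (next + 1) % workers.count }
        return workers[next]
    }

    func classifyAll(_ texts: [String]) async throws -> [String] {
        try await withThrowingTaskGroup(of: (Int, String).self) { group in
            for (i, text) in texts.enumerated() {
                let worker = nextWorker()
                group.addTask {
                    (i, try await worker.classify(text: text))
                }
            }
            // Reassemble in input order regardless of completion order
            var labels = [String](repeating: "", count: texts.count)
            for try await (i, label) in group { labels[i] = label }
            return labels
        }
    }
}
```

Pool size should track the real parallelism of the compute units; more instances than the Neural Engine can service concurrently only adds memory pressure.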
2. AsyncStream: streaming token-by-token output
Apple's Foundation Models framework supports streaming inference — tokens are yielded as they are generated rather than arriving only with the complete response. AsyncStream and AsyncThrowingStream are the correct types for consuming this in SwiftUI.
```swift
import FoundationModels

actor LLMService {
    private let session = LanguageModelSession()

    /// Returns an AsyncThrowingStream — allows callers to iterate tokens
    /// and handle errors (e.g., unsupported device, model error)
    func stream(prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                do {
                    // Foundation Models streams partial strings
                    for try await partial in session.streamResponse(to: prompt) {
                        // Cooperative cancellation: check before each yield
                        try Task.checkCancellation()
                        continuation.yield(partial)
                    }
                    continuation.finish()
                } catch is CancellationError {
                    continuation.finish() // clean exit — no error
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            // Without this, the inner Task outlives the consumer:
            // cancel it when iteration stops (view gone, .task(id:) restarted)
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
```
```swift
import SwiftUI

// SwiftUI view — tokens appear progressively
struct StreamingView: View {
    @State private var output = ""
    @State private var error: String?

    let service: LLMService
    let prompt: String

    var body: some View {
        ScrollView {
            Text(output)
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
        }
        // .task(id:) cancels and restarts when prompt changes
        .task(id: prompt) {
            output = ""
            do {
                for try await token in await service.stream(prompt: prompt) {
                    output += token // @MainActor — safe
                }
            } catch {
                self.error = error.localizedDescription
            }
        }
    }
}
```

Key points:
- `.task(id: prompt)` automatically cancels the previous stream when the prompt changes — no cleanup code required.
- `try Task.checkCancellation()` exits the stream cleanly rather than continuing to yield tokens for a view that no longer exists.
- Use `AsyncThrowingStream` when the stream can fail; `AsyncStream` when failure is impossible (rare in AI inference).
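For the non-throwing case, AsyncStream bridges a callback-based API in the same shape. Everything below (`ModelWarmUpLoader`, `warmUpProgressStream`) is a made-up stand-in to show the pattern, not a Core ML or Foundation Models API:

```swift
import Foundation

// Hypothetical loader that reports progress via a callback — a stand-in
// for any callback-based API you want to consume with for-await
struct ModelWarmUpLoader {
    func start(onProgress: @escaping @Sendable (Double) -> Void) {
        for step in 1...4 { onProgress(Double(step) / 4.0) }
    }
}

// Progress cannot fail, so plain AsyncStream is the right type
func warmUpProgressStream() -> AsyncStream<Double> {
    AsyncStream { continuation in
        ModelWarmUpLoader().start { fraction in
            continuation.yield(fraction)
            if fraction >= 1.0 { continuation.finish() }
        }
    }
}

// Consumption is a plain for-await loop — no try, no do/catch:
// for await fraction in warmUpProgressStream() { /* update a ProgressView */ }
```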
3. Task priority for inference workloads
Swift task priority maps to an underlying quality-of-service class for thread scheduling. The right choice determines whether inference competes with UI rendering or runs quietly in the background.
```swift
// Map inference intent → Task priority

// User tapped "Classify" and is waiting for the result
Button("Classify") {
    Task(priority: .userInitiated) {
        let result = try await classifier.classify(text: input)
        await MainActor.run { self.result = result }
    }
}

// Pre-warming the model on app launch (user not waiting)
.onAppear {
    Task(priority: .utility) {
        await classifier.warmUp() // loads model silently
    }
}

// Batch classifying content in the background (not visible yet)
func classifyAllInBackground(_ items: [Item]) {
    Task(priority: .background) {
        for item in items {
            guard !Task.isCancelled else { break }
            let label = try? await classifier.classify(text: item.body)
            // update local store — don't touch UI from here
            await repository.updateLabel(item.id, label: label)
        }
    }
}

// Parallel batch with TaskGroup (respects priority)
// Note: a single ClassifierService actor still serializes predictions;
// back this with a pool of instances for true parallel throughput
func classifyParallel(_ texts: [String]) async throws -> [String] {
    try await withThrowingTaskGroup(of: (Int, String).self) { group in
        for (i, text) in texts.enumerated() {
            group.addTask(priority: .userInitiated) {
                let label = try await classifier.classify(text: text)
                return (i, label)
            }
        }
        var results = [(Int, String)]()
        for try await result in group { results.append(result) }
        return results.sorted { $0.0 < $1.0 }.map { $0.1 }
    }
}
```

Priority reference:
- `.userInitiated` — User triggered, result appears immediately in UI
- `.utility` — Pre-warming, indexing — user will benefit soon
- `.background` — Batch processing the user won't see this session
- `.high` / `.default` — Rarely correct for inference — avoid

4. Cancellation: preventing ghost inference
Without cancellation, inference continues running after a view disappears — consuming CPU, memory, and battery for work whose result will never be used. Swift's structured concurrency propagates cancellation automatically when you use .task, but manually-created Tasks require explicit cancellation.
```swift
@MainActor @Observable
class SearchViewModel {
    private(set) var results: [SearchResult] = []
    private var currentTask: Task<Void, Never>?

    func search(query: String) {
        // Cancel any in-progress inference before starting a new one
        currentTask?.cancel()
        currentTask = Task(priority: .userInitiated) {
            // Check cancellation before starting expensive work
            guard !Task.isCancelled else { return }
            let candidates = try? await repository.fetch(query: query)
            guard !Task.isCancelled else { return } // cancelled while fetching
            let ranked = await inferenceService.rank(candidates ?? [], for: query)
            guard !Task.isCancelled else { return } // cancelled during ranking
            results = ranked // only reaches here if not cancelled
        }
    }

    func onDisappear() {
        // Belt-and-suspenders: cancel explicitly when view disappears
        currentTask?.cancel()
        currentTask = nil
    }
}
```
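Polling `Task.isCancelled` works when the work has natural checkpoints. For inference that must release resources the moment it is cancelled, Swift also offers `withTaskCancellationHandler`. A minimal sketch — `InferenceHandle` and `stopInference()` are hypothetical stand-ins for whatever teardown your inference runtime provides:

```swift
import Foundation

// Hypothetical handle to an in-flight inference request
final class InferenceHandle: Sendable {
    func stopInference() {
        // stand-in: release buffers, abort the pending compute request
    }
}

func rankWithImmediateCancellation(handle: InferenceHandle) async throws -> [String] {
    try await withTaskCancellationHandler {
        // The slow operation itself; still checks cancellation cooperatively
        try Task.checkCancellation()
        return ["result"]
    } onCancel: {
        // Runs as soon as the surrounding task is cancelled,
        // even if the operation has not reached a checkpoint yet
        handle.stopInference()
    }
}
```

The `onCancel` closure fires immediately on cancellation, so teardown is not delayed until the next `Task.isCancelled` check.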
```swift
// In a SwiftUI view: prefer .task(id:) — it handles cancellation automatically
.searchable(text: $query)
.task(id: query) {
    // Automatically cancels and restarts when query changes
    await viewModel.search(query: query)
}
```

5. @MainActor: connecting inference output to SwiftUI
Inference runs off the main actor, but its results must reach the main actor to update SwiftUI state. There are two patterns: annotate the ViewModel class with @MainActor (recommended), or hop manually with await MainActor.run.
```swift
// Preferred: annotate the entire class —
// all properties and methods run on the main actor
@MainActor @Observable
class PredictionViewModel {
    private(set) var label = ""
    private(set) var confidence: Float = 0

    private let service: ClassifierService

    init(service: ClassifierService) { self.service = service }

    func predict(text: String) async {
        // await crosses into the ClassifierService actor (off main);
        // execution returns to @MainActor (main thread) after the await
        guard let output = try? await service.classify(text: text) else { return }
        label = output.label // safe — we're on MainActor
        confidence = output.confidence
    }
}
```
```swift
// Manual hop (when you can't annotate the class)
func handleInference() {
    Task {
        let result = try? await service.classify(text: input)
        await MainActor.run {
            self.displayLabel = result?.label ?? "unknown"
        }
    }
}

// Swift 6 strict concurrency: @MainActor is required, not optional.
// Enable strict concurrency checking in Xcode 16+:
// Build Settings → Swift Compiler - Language → Strict Concurrency: Complete
```
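As a rough illustration of what Complete checking enforces: values that cross an actor boundary must be Sendable. The types below (`Prediction`, `Display`) are hypothetical, chosen only to show the rule:

```swift
import Foundation

// A value type with only Sendable stored properties is itself Sendable,
// so it may safely cross from a background task to the main actor
struct Prediction: Sendable {
    let label: String
}

// A @MainActor class is implicitly Sendable: its isolation guarantees
// all access happens on the main actor
@MainActor final class Display {
    var text = ""
    func show(_ p: Prediction) { text = p.label }
}

func route(_ p: Prediction, to display: Display) {
    Task.detached {
        // Compiles under strict checking: Prediction is Sendable,
        // and show(_:) hops onto the main actor via await
        await display.show(p)
    }
}
```

Had `Prediction` held a non-Sendable reference (say, a mutable class), strict checking would reject the capture at compile time instead of leaving a data race to surface at runtime.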