
Swift Concurrency for AI Workloads

Actors, AsyncStream, Task priority, and cancellation — the complete concurrency picture for shipping Core ML and Foundation Models features without race conditions, UI freezes, or memory leaks.

By Ehsan Azish · 3NSOFTS · March 2026 · Swift 6.0 · iOS 17+

Why AI workloads need special concurrency care

On-device AI inference has two characteristics that make it unusually demanding on a concurrency model: it is slow (20ms to 2 seconds depending on model size) and it is not thread-safe (MLModel instances cannot be called from multiple threads concurrently). The naive solution — wrapping inference in DispatchQueue.global().async — solves the first problem and ignores the second. The correct solution uses Swift's structured concurrency model end-to-end.

This guide covers the four patterns that make AI inference correct in a production SwiftUI app: actor isolation for thread safety, async/await for non-blocking dispatch, AsyncStream for token-by-token streaming, and Task management for cancellation and priority.

1. Actors: thread-safe model access

MLModel is not thread-safe. Two concurrent calls to prediction() on the same instance can cause a crash. Swift actors serialize access to their state — only one caller can execute actor-isolated code at a time. This eliminates the bug class entirely, without locks.

actor ClassifierService {
    private let model: SentimentClassifier

    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine
        self.model = try SentimentClassifier(configuration: config)
    }

    /// async/await: suspends caller, doesn't block any thread
    func classify(text: String) async throws -> String {
        let input = SentimentClassifierInput(text: text)
        let output = try await model.prediction(input: input)
        return output.label
    }

    /// Batch version using the async batch API (iOS 16+)
    func classifyBatch(_ texts: [String]) async throws -> [String] {
        let inputs = texts.map { SentimentClassifierInput(text: $0) }
        let outputs = try await model.predictions(inputs: inputs)
        return outputs.map(\.label)
    }
}

// One shared instance for the whole app
// @Observable ViewModel owns the actor — not each view
@MainActor @Observable
class FeedViewModel {
    private let classifier = try! ClassifierService()   // try! for brevity; handle init failure in production
    private(set) var result = ""

    func classify(_ text: String) async {
        result = (try? await classifier.classify(text: text)) ?? "error"
    }
}

Key points:

  • The actor serializes all classify() calls — even if 20 views call it simultaneously, only one prediction runs at a time per model instance.
  • The await at the call site suspends the caller's task without blocking the thread — other tasks continue executing.
  • If you need parallel inference throughput, use a pool of actor instances with withThrowingTaskGroup.
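The pool idea in the last bullet can be sketched as follows. This is a minimal, self-contained sketch: `MockClassifier` and `ClassifierPool` are illustrative names, not real Core ML types, and the mock's "inference" is a simulated delay. Each actor in the pool still serializes its own calls; throughput comes from fanning work across several instances.

```swift
// Stand-in for an actor-wrapped Core ML service (see ClassifierService above)
actor MockClassifier {
    func classify(text: String) async throws -> String {
        try await Task.sleep(nanoseconds: 10_000_000)   // simulate inference latency
        return text.contains("!") ? "positive" : "neutral"
    }
}

struct ClassifierPool {
    let workers: [MockClassifier]

    init(size: Int) {
        workers = (0..<size).map { _ in MockClassifier() }
    }

    /// Fan texts out across the pool round-robin; results come back in order.
    func classifyAll(_ texts: [String]) async throws -> [String] {
        try await withThrowingTaskGroup(of: (Int, String).self) { group in
            for (i, text) in texts.enumerated() {
                let worker = workers[i % workers.count]   // round-robin assignment
                group.addTask {
                    let label = try await worker.classify(text: text)
                    return (i, label)
                }
            }
            // Collect out-of-order completions back into input order
            var results = Array(repeating: "", count: texts.count)
            for try await (i, label) in group { results[i] = label }
            return results
        }
    }
}
```

A pool of 2–4 instances is usually enough; each extra instance costs a full copy of the model's weights in memory.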

2. AsyncStream: streaming token-by-token output

Apple's Foundation Models framework supports streaming inference — tokens are yielded as they are generated rather than arriving only with the complete response. AsyncStream and AsyncThrowingStream are the correct types for consuming this in SwiftUI.

import FoundationModels

actor LLMService {
    private let session = LanguageModelSession()

    /// Returns an AsyncThrowingStream — allows callers to iterate tokens
    /// and handle errors (e.g., unsupported device, model error)
    func stream(prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                do {
                    // Foundation Models streams partial strings
                    for try await partial in session.streamResponse(to: prompt) {
                        // Cooperative cancellation: check before each yield
                        try Task.checkCancellation()
                        continuation.yield(partial)
                    }
                    continuation.finish()
                } catch is CancellationError {
                    continuation.finish()      // clean exit, no error
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            // Without this, the producer Task keeps running after the consumer
            // stops iterating; cancel it when the stream terminates
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}

// SwiftUI view — tokens appear progressively
struct StreamingView: View {
    @State private var output = ""
    @State private var error: String?
    let service: LLMService
    let prompt: String

    var body: some View {
        ScrollView {
            Text(output)
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
        }
        // .task(id:) cancels and restarts when prompt changes
        .task(id: prompt) {
            output = ""
            do {
                for try await token in await service.stream(prompt: prompt) {
                    output += token           // @MainActor — safe
                }
            } catch {
                self.error = error.localizedDescription
            }
        }
    }
}

Key points:

  • .task(id: prompt) automatically cancels the previous stream when the prompt changes — no cleanup code required.
  • try Task.checkCancellation() exits the stream cleanly rather than continuing to yield tokens for a view that no longer exists.
  • Use AsyncThrowingStream when the stream can fail; AsyncStream when failure is impossible (rare in AI inference).
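For the non-throwing case in the last bullet, here is a minimal AsyncStream sketch — `progressStream` is a hypothetical progress feed, not a framework API. It also shows `bufferingPolicy`, which bounds memory if the consumer falls behind, and the same `onTermination` hook used above to stop the producer when iteration ends:

```swift
/// Emits fractional progress for a hypothetical batch job.
/// .bufferingNewest(1) keeps only the latest value if the consumer lags,
/// which is exactly what a progress bar wants.
func progressStream(total: Int) -> AsyncStream<Double> {
    AsyncStream(bufferingPolicy: .bufferingNewest(1)) { continuation in
        let task = Task {
            for step in 1...total {
                try? await Task.sleep(nanoseconds: 1_000_000)
                continuation.yield(Double(step) / Double(total))
            }
            continuation.finish()
        }
        // Stop producing if the consumer stops iterating
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

Consuming it is a plain `for await` loop — no `try` needed, because the element stream cannot fail.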

3. Task priority for inference workloads

Swift task priority maps to thread scheduling priority. The right choice determines whether inference latency competes with UI rendering or runs quietly in the background.

// Map inference intent → Task priority

// User tapped "Classify" and is waiting for result
Button("Classify") {
    Task(priority: .userInitiated) {
        let result = try await classifier.classify(text: input)
        await MainActor.run { self.result = result }
    }
}

// Pre-warming the model on app launch (user not waiting)
.onAppear {
    Task(priority: .utility) {
        await classifier.warmUp()          // loads model silently
    }
}

// Batch classifying content in the background (not visible yet)
func classifyAllInBackground(_ items: [Item]) {
    Task(priority: .background) {
        for item in items {
            guard !Task.isCancelled else { break }
            let label = try? await classifier.classify(text: item.body)
            // update local store — don't touch UI from here
            await repository.updateLabel(item.id, label: label)
        }
    }
}

// Parallel batch with TaskGroup (respects priority)
func classifyParallel(_ texts: [String]) async throws -> [String] {
    try await withThrowingTaskGroup(of: (Int, String).self) { group in
        for (i, text) in texts.enumerated() {
            group.addTask(priority: .userInitiated) {
                let label = try await classifier.classify(text: text)
                return (i, label)
            }
        }
        var results = [(Int, String)]()
        for try await result in group { results.append(result) }
        return results.sorted { $0.0 < $1.0 }.map { $0.1 }
    }
}

Priority reference:

  • .userInitiated: user triggered, result appears immediately in UI
  • .utility: pre-warming, indexing — the user will benefit soon
  • .background: batch processing the user won't see this session
  • .high / .medium: rarely the right label for inference — .high is an alias for .userInitiated, and .medium is the default; prefer the semantic names

4. Cancellation: preventing ghost inference

Without cancellation, inference continues running after a view disappears — consuming CPU, memory, and battery for work whose result will never be used. Swift's structured concurrency propagates cancellation automatically when you use .task, but manually-created Tasks require explicit cancellation.

@MainActor @Observable
class SearchViewModel {
    private(set) var results: [SearchResult] = []
    private var currentTask: Task<Void, Never>?

    func search(query: String) {
        // Cancel any in-progress inference before starting a new one
        currentTask?.cancel()

        currentTask = Task(priority: .userInitiated) {
            // Check cancellation before starting expensive work
            guard !Task.isCancelled else { return }

            let candidates = try? await repository.fetch(query: query)
            guard !Task.isCancelled else { return }   // cancelled while fetching

            let ranked = await inferenceService.rank(candidates ?? [], for: query)
            guard !Task.isCancelled else { return }   // cancelled during ranking

            results = ranked   // only reaches here if not cancelled
        }
    }

    func onDisappear() {
        // Belt-and-suspenders: cancel explicitly when view disappears
        currentTask?.cancel()
        currentTask = nil
    }
}

// In a SwiftUI view: prefer .task(id:) — it handles cancellation automatically
.searchable(text: $query)
.task(id: query) {
    // Automatically cancels and restarts when query changes
    await viewModel.search(query: query)
}

5. @MainActor: connecting inference output to SwiftUI

Inference runs off the main actor, but the results must reach the main actor to update SwiftUI state. There are two patterns: annotate the ViewModel class with @MainActor (recommended), or hop manually with await MainActor.run.

// Preferred: annotate the entire class
// All properties and methods run on the main actor
@MainActor @Observable
class PredictionViewModel {
    private(set) var label = ""
    private(set) var confidence: Float = 0
    private let service: ClassifierService

    init(service: ClassifierService) { self.service = service }

    func predict(text: String) async {
        // await crosses into the ClassifierService actor (off main),
        // then resumes back on the MainActor afterwards.
        // Assumes a classify variant that returns a (label, confidence)
        // output, unlike the String-returning version in section 1.
        guard let output = try? await service.classify(text: text) else { return }
        label = output.label          // safe: we're on MainActor
        confidence = output.confidence
    }
}

// Manual hop (when you can't annotate the class)
func handleInference() {
    Task {
        let result = try? await service.classify(text: input)
        await MainActor.run {
            self.displayLabel = result?.label ?? "unknown"
        }
    }
}

// Swift 6 strict concurrency: @MainActor is required, not optional
// Enable strict concurrency checking in Xcode 16+:
// Build Settings → Swift Compiler - Language → Strict Concurrency: Complete
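Under strict checking, any value returned across an actor boundary must be Sendable. Value types with Sendable stored properties get this for free, which is one reason to return a plain struct from an inference actor. A minimal sketch (`PredictionOutput` and `Predictor` are illustrative names, not framework types):

```swift
/// Value type crossing the actor boundary — implicitly safe to send.
struct PredictionOutput: Sendable {
    let label: String
    let confidence: Float
}

actor Predictor {
    /// Returning a Sendable struct compiles cleanly under
    /// Strict Concurrency: Complete; returning a non-Sendable
    /// class here would be a compile-time error.
    func predict(_ text: String) -> PredictionOutput {
        PredictionOutput(label: text.isEmpty ? "empty" : "ok", confidence: 0.9)
    }
}
```

If a type genuinely cannot conform (e.g., it wraps a mutable reference), keep it inside the actor and expose only Sendable projections of it.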

Frequently asked questions

Do I need to dispatch Core ML inference to a background queue manually?
No. Wrapping your Core ML model in an actor and calling prediction() via async/await handles dispatch automatically. The actor runs on the cooperative thread pool — never on the main thread. The async prediction() API (iOS 16+) suspends the caller without blocking any thread.
What Task priority should AI inference use?
Use .userInitiated for inference the user is actively waiting for. Use .utility for pre-warming and background indexing. Use .background for batch processing that won't be visible this session. Prefer these semantic names over the raw levels (.high, .medium, .low) — .high is simply an alias for .userInitiated.
How do I cancel inference when the user navigates away?
Use .task(id:) in SwiftUI — it automatically cancels when the view disappears or when the id changes. For manual Tasks, store the Task handle and call handle.cancel(). Check Task.isCancelled at suspension points inside the work.
Can multiple views share one Core ML model instance?
Yes — this is the recommended pattern. One actor-wrapped service instance lives at the app or scene level, injected into views. The actor serializes all calls. Per-view model instantiation wastes memory and adds 50–500ms of loading time per view appearance.
