Insights / On-Device AI
How to Integrate Core ML Models into a SwiftUI App in 2026
On-device AI inference is 4–10x faster than cloud API round-trips on Apple Silicon — with no network dependency, no per-inference cost, and user data that never leaves the device. This guide covers the complete integration path from model preparation to production rollout.
Overview: what Core ML does
Core ML is Apple's on-device machine learning framework. It runs trained models — built in Create ML, or converted from PyTorch, TensorFlow, or any ONNX-compatible tool — packaged in the .mlpackage format, and provides a type-safe Swift API to run inference on that model using the device's Neural Engine, GPU, or CPU.
The key point most tutorials miss: Core ML is not just a model runner. The framework handles hardware routing, precision optimization, memory management, and model compilation. You declare what you want — the prediction — and the framework decides which hardware executes it most efficiently. The developer-facing API is intentionally simple so that complexity is hidden by the platform.
Step 1 — Prepare your model with Create ML
Create ML is Apple's no-code and Swift-based training tool. For image classification, text classification, sound analysis, and tabular regression, it accepts training data, runs training locally using your Mac's GPU and Neural Engine, and exports a ready-to-use .mlpackage directly.
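The Swift-based path can be sketched in a few lines. This is a minimal sketch, not a definitive recipe: the CreateML framework is macOS-only, the directory paths are placeholders, and the training directory is assumed to contain one subfolder per class label.

```swift
import CreateML
import Foundation

// Train an image classifier from a folder whose subdirectories are class
// labels ("cat/", "dog/", ...). Paths here are placeholders.
let trainingDir = URL(fileURLWithPath: "/path/to/TrainingImages")

let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingDir)
)

// Check accuracy on the held-out validation split before exporting.
print(classifier.validationMetrics)

// Write a Core ML model you can drag straight into Xcode.
try classifier.write(to: URL(fileURLWithPath: "/path/to/MyClassifier.mlmodel"))
```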
If you're using a pre-trained model from PyTorch or TensorFlow, export using coremltools — Apple's Python library for model conversion:
```python
import coremltools as ct
import torch

# Load your trained PyTorch model
model = MyModel()
model.load_state_dict(torch.load("weights.pth"))
model.eval()

# Convert to Core ML format
traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",  # .mlpackage format
    inputs=[ct.ImageType(
        name="image",
        shape=(1, 3, 224, 224),
        color_layout=ct.colorlayout.RGB,
    )],
    compute_precision=ct.precision.FLOAT16,  # ANE-optimized precision
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("MyModel.mlpackage")
```

Always use convert_to="mlprogram" for new models. The .mlpackage format (vs the legacy .mlmodel) supports on-device model compilation, better Neural Engine scheduling, and per-layer compute unit assignment in Xcode's Performance Report.
Step 2 — Add the model to Xcode
Drag the .mlpackage into your Xcode project navigator. Check the target membership for your app target. Xcode will:
- Generate a typed Swift class (e.g. MyModel) with MyModelInput and MyModelOutput types
- Show the model's metadata, input/output types, and the Performance Report button in the model inspector
- Bundle the .mlpackage in the app bundle at build time, compiled to an optimized .mlmodelc for the target device
Click the model in the project navigator and go to the Performance tab to run the Xcode Core ML Performance Report. This runs on-device benchmarks and shows you per-layer compute unit assignment (CPU / GPU / Neural Engine) before you write any code. Use this to verify that the model is routing to the Neural Engine as expected.
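The same routing information is also available in code on newer OS releases via MLComputePlan (iOS 17.4+ / macOS 14.4+). The following is a hedged sketch: it assumes an mlprogram-format model and a hypothetical modelURL pointing at the compiled .mlmodelc, and the structure traversal shown only covers the program case.

```swift
import CoreML

// Programmatic compute-plan inspection (iOS 17.4+ / macOS 14.4+).
// `modelURL` is a placeholder for a compiled .mlmodelc in the app bundle.
func reportComputePlan(for modelURL: URL) async throws {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .all

    let plan = try await MLComputePlan.load(
        contentsOf: modelURL,
        configuration: configuration
    )

    // mlprogram models expose their operations as a program structure.
    guard case .program(let program) = plan.modelStructure,
          let main = program.functions["main"] else { return }

    for operation in main.block.operations {
        // The preferred device is where Core ML plans to run this operation.
        if let usage = plan.deviceUsage(for: operation) {
            print(operation.operatorName, "->", usage.preferred)
        }
    }
}
```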
Step 3 — Load and run inference in Swift
The recommended production pattern wraps the model in a Swift actor and loads it lazily. Never load a Core ML model synchronously on the main thread — model loading takes 50–500ms depending on model size.
```swift
// Error type for the unavailable-model case
enum ModelError: Error { case unavailable }

actor MyModelActor {
    private var model: MyModel?

    func predict(image: CVPixelBuffer) async throws -> MyModelOutput {
        if model == nil {
            let config = MLModelConfiguration()
            config.computeUnits = .cpuAndNeuralEngine
            model = try await MyModel.load(configuration: config)
        }
        guard let model else { throw ModelError.unavailable }
        return try model.prediction(image: image)
    }
}
```
```swift
// View model usage. @MainActor keeps the published properties on the main
// thread; note that `actor` is a Swift keyword, so the stored property
// needs a different name.
@Observable
@MainActor
final class ClassifierViewModel {
    var result: MyModelOutput?
    var isLoading = false
    private let modelActor = MyModelActor()

    func classify(_ buffer: CVPixelBuffer) {
        isLoading = true
        Task {
            result = try? await modelActor.predict(image: buffer)
            isLoading = false
        }
    }
}
```

The MLModelConfiguration is where you control Neural Engine targeting. Using .cpuAndNeuralEngine excludes the GPU and routes the model to the most energy-efficient compute path for latency-sensitive inference. The .all option lets Core ML choose — which is correct for most models.
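Restricting computeUnits expresses a preference, not a guarantee: Core ML still falls back to the CPU for layers the Neural Engine cannot run. As a belt-and-braces guard against any load failure, one possible sketch (reusing the Xcode-generated MyModel class; the retry-with-.all policy is an assumption, not an Apple recommendation):

```swift
import CoreML

// Prefer the CPU+ANE path; if loading with the restricted configuration
// throws for any reason, retry with .all and let Core ML decide.
func loadPreferredModel() async throws -> MyModel {
    let preferred = MLModelConfiguration()
    preferred.computeUnits = .cpuAndNeuralEngine
    do {
        return try await MyModel.load(configuration: preferred)
    } catch {
        let fallback = MLModelConfiguration()
        fallback.computeUnits = .all
        return try await MyModel.load(configuration: fallback)
    }
}
```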
Benchmark results: on-device vs cloud inference
The following benchmarks use MobileNetV3-Large (image classification) comparing on-device Core ML inference against cloud AI API latency. Measurements are median over 100 runs; cloud benchmarks use a US-East server with a European client connection under normal load.
| Method | Median latency | P95 latency | Works offline |
|---|---|---|---|
| Core ML — A15 Bionic (ANE) | 2.1ms | 3.8ms | Yes |
| Core ML — A13 Bionic (ANE) | 4.2ms | 6.5ms | Yes |
| Cloud API (vision endpoint) | 210ms | 450ms | No |
| Cloud API (LLM endpoint) | 340ms | 920ms | No |
Core ML measurements on physical devices, iOS 17, FP16, mlprogram format, .cpuAndNeuralEngine compute units. Cloud API measurements include full round-trip latency from an EU mobile connection to a US-East endpoint.
According to Apple's WWDC 2022 session "Optimize your Core ML usage", converting models to FP16 precision typically delivers 2x–4x inference speedup on the Neural Engine compared to FP32. The speedup comes from the ANE's native FP16 compute path and the reduced memory bandwidth of smaller tensors.
Step 4 — Production rollout checklist
Before shipping a Core ML feature to all users, verify the following:
- ✓ Profiled on physical hardware. The iOS Simulator has no Neural Engine. All performance measurements must be taken on a real device at the minimum supported iOS version.
- ✓ Thermal behavior verified. Sustained inference generates heat. Profile at both ambient temperature and after 30 seconds of continuous inference to measure throttled performance — what users see during extended use.
- ✓ Graceful degradation for older devices. If the feature requires specific Neural Engine capabilities, provide a fallback state for unsupported hardware rather than hiding the feature entirely or crashing.
- ✓ Privacy label updated. On-device inference processes user data locally, but if any inference input is stored, logged, or later transmitted, that must be disclosed in your App Privacy Details.
- ✓ Model versioning strategy defined. Core ML models bundled in the app binary require an App Store update to change. For models that need frequent updates, design a download-and-cache strategy using MLModel.compileModel(at:) to compile downloaded models on-device.
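The download-and-cache strategy from the last checklist item can be sketched as follows. This is one possible shape, not a prescribed implementation: the remote URL, file names, and cache location are all illustrative.

```swift
import CoreML
import Foundation

// Download-and-cache sketch: fetch an updated model file, compile it
// on-device, and persist the compiled .mlmodelc for subsequent launches.
func updateModel(from remoteURL: URL) async throws -> MLModel {
    // 1. Download the raw .mlmodel/.mlpackage file.
    let (tempURL, _) = try await URLSession.shared.download(from: remoteURL)

    // 2. Compile it on-device; returns a URL to a temporary .mlmodelc.
    let compiledURL = try await MLModel.compileModel(at: tempURL)

    // 3. Move the compiled model somewhere permanent.
    let cacheDir = try FileManager.default.url(
        for: .applicationSupportDirectory, in: .userDomainMask,
        appropriateFor: nil, create: true
    )
    let destination = cacheDir.appendingPathComponent("MyModel.mlmodelc")
    if FileManager.default.fileExists(atPath: destination.path) {
        try FileManager.default.removeItem(at: destination)
    }
    try FileManager.default.moveItem(at: compiledURL, to: destination)

    // 4. Load the cached compiled model.
    return try MLModel(contentsOf: destination)
}
```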
FAQ
Does Core ML work offline?
How fast is Core ML compared to cloud AI APIs?
What is the Neural Engine and when does Core ML use it?
Should I use Core ML or Apple Foundation Models?
How do I add a Core ML model to my Xcode project?
Related reading
Adding Core ML to an existing iOS app?
The On-Device AI Integration service handles the full path: architecture assessment, Core ML or Foundation Models implementation, performance profiling, and a production rollout playbook — in 3–5 weeks.
Start AI Integration