
How to Integrate Core ML Models into a SwiftUI App in 2026

On-device AI inference is 4–10x faster than cloud API round-trips on Apple Silicon — with no network dependency, no per-inference cost, and user data that never leaves the device. This guide covers the complete integration path from model preparation to production rollout.

By Ehsan Azish · 3NSOFTS · March 2026

Overview: what Core ML does

Core ML is Apple's on-device machine learning framework. You take a trained model, exported from Create ML or converted from PyTorch, TensorFlow, or any ONNX-compatible tool into the .mlpackage format, and Core ML gives you a type-safe Swift API to run inference on it using the device's Neural Engine, GPU, or CPU.

The key point most tutorials miss: Core ML is not just a model runner. The framework handles hardware routing, precision optimization, memory management, and model compilation. You declare what you want — the prediction — and the framework decides which hardware executes it most efficiently. The developer-facing API is intentionally simple so that complexity is hidden by the platform.

Step 1 — Prepare your model with Create ML

Create ML is Apple's training tool, available both as a no-code Mac app and as a Swift framework. For image classification, text classification, sound analysis, and tabular regression, it accepts training data, runs training on-device using your Mac's GPU and Neural Engine, and exports a ready-to-use .mlpackage directly.
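
If you prefer scripting training over the Create ML app, the same trainers are available from Swift through the CreateML framework. A minimal sketch for an image classifier (macOS only; the directory paths and output filename are placeholders):

import CreateML
import Foundation

// Sketch of scripted training with the CreateML framework. The training
// directory is expected to contain one subfolder per class label, each
// holding that label's images.
let trainingDir = URL(fileURLWithPath: "/path/to/TrainingImages")
let dataSource = MLImageClassifier.DataSource.labeledDirectories(at: trainingDir)

let classifier = try MLImageClassifier(trainingData: dataSource)
print("Training error:", classifier.trainingMetrics.classificationError)

// Export a Core ML model file you can drag into Xcode (Step 2).
try classifier.write(to: URL(fileURLWithPath: "/path/to/MyClassifier.mlmodel"))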

If you're using a pre-trained model from PyTorch or TensorFlow, export using coremltools — Apple's Python library for model conversion:

import coremltools as ct
import torch

# Load your trained PyTorch model
model = MyModel()
model.load_state_dict(torch.load("weights.pth"))
model.eval()

# Convert to Core ML format
traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",           # .mlpackage format
    inputs=[ct.ImageType(
        name="image",
        shape=(1, 3, 224, 224),
        color_layout=ct.colorlayout.RGB
    )],
    compute_precision=ct.precision.FLOAT16,  # ANE-optimized precision
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("MyModel.mlpackage")

Always use convert_to="mlprogram" for new models. The .mlpackage format (vs the legacy .mlmodel) supports on-device model compilation, better Neural Engine scheduling, and per-layer compute unit assignment in Xcode's Performance Report.

Step 2 — Add the model to Xcode

Drag the .mlpackage into your Xcode project navigator. Check the target membership for your app target. Xcode will:

  • Generate a typed Swift class (e.g. MyModel) with MyModelInput and MyModelOutput types
  • Show the model's metadata, input/output types, and the Performance Report button in the model inspector
  • Compile the .mlpackage to an optimized .mlmodelc for the target device at build time and bundle the compiled model in the app

Click the model in the project navigator and go to the Performance tab to run the Xcode Core ML Performance Report. This runs on-device benchmarks and shows you per-layer compute unit assignment (CPU / GPU / Neural Engine) before you write any code. Use this to verify that the model is routing to the Neural Engine as expected.
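
Before building the production pattern in Step 3, a quick smoke test against the generated class confirms the integration. The pixel buffer source and the classLabel output property are placeholders that depend on your model:

import CoreML
import CoreVideo

// Quick smoke test of the Xcode-generated class. `pixelBuffer` stands in for
// a CVPixelBuffer matching the model's input size (224x224 in this example);
// `classLabel` is a hypothetical classifier output property.
func smokeTest(pixelBuffer: CVPixelBuffer) throws {
    let model = try MyModel(configuration: MLModelConfiguration())
    let output = try model.prediction(image: pixelBuffer)
    print(output.classLabel)
}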

Step 3 — Load and run inference in Swift

A solid production pattern wraps the model in a Swift actor and loads it lazily. Never load a Core ML model synchronously on the main thread: model loading takes 50–500ms depending on model size.

import CoreML
import CoreVideo
import Observation

enum ModelError: Error {
    case unavailable
}

actor MyModelActor {
    private var model: MyModel?

    func predict(image: CVPixelBuffer) async throws -> MyModelOutput {
        // Lazy-load on first use so app launch never pays the 50–500ms cost.
        if model == nil {
            let config = MLModelConfiguration()
            config.computeUnits = .cpuAndNeuralEngine
            model = try await MyModel.load(configuration: config)
        }
        guard let model else { throw ModelError.unavailable }
        return try model.prediction(image: image)
    }
}

// View model usage
@MainActor
@Observable
final class ClassifierViewModel {
    var result: MyModelOutput?
    var isLoading = false

    private let modelActor = MyModelActor()

    func classify(_ buffer: CVPixelBuffer) {
        isLoading = true
        Task {
            result = try? await modelActor.predict(image: buffer)
            isLoading = false
        }
    }
}

MLModelConfiguration is where you control Neural Engine targeting. Using .cpuAndNeuralEngine excludes the GPU: layers the Neural Engine supports run on the ANE, the most energy-efficient path for latency-sensitive inference, and anything unsupported falls back to the CPU. The .all option lets Core ML choose across CPU, GPU, and Neural Engine, which is the right default for most models.
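
To complete the SwiftUI side, a minimal view driving ClassifierViewModel could look like the sketch below. The makePixelBuffer() helper and the classLabel output property are placeholders for your own capture pipeline and model output:

import SwiftUI
import CoreVideo

struct ClassifierView: View {
    @State private var viewModel = ClassifierViewModel()

    var body: some View {
        VStack(spacing: 16) {
            if viewModel.isLoading {
                ProgressView("Classifying…")
            } else if let result = viewModel.result {
                Text("Top label: \(result.classLabel)")   // hypothetical output property
            } else {
                Text("No prediction yet")
            }
            Button("Classify") {
                if let buffer = makePixelBuffer() {
                    viewModel.classify(buffer)
                }
            }
        }
        .padding()
    }

    // Placeholder: supply a real CVPixelBuffer from your capture pipeline.
    private func makePixelBuffer() -> CVPixelBuffer? { nil }
}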

Benchmark results: on-device vs cloud inference

The following benchmarks use MobileNetV3-Large (image classification) comparing on-device Core ML inference against cloud AI API latency. Measurements are median over 100 runs; cloud benchmarks use a US-East server with a European client connection under normal load.

Method                       | Median latency | P95 latency | Works offline
Core ML — A15 Bionic (ANE)   | 2.1 ms         | 3.8 ms      | Yes
Core ML — A13 Bionic (ANE)   | 4.2 ms         | 6.5 ms      | Yes
Cloud API (vision endpoint)  | 210 ms         | 450 ms      | No
Cloud API (LLM endpoint)     | 340 ms         | 920 ms      | No

Core ML measurements on physical devices, iOS 17, FP16, mlprogram format, .cpuAndNeuralEngine compute units. Cloud API measurements include full round-trip latency from an EU mobile connection to a US-East endpoint.
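
To reproduce this kind of measurement for your own model, a small harness like the sketch below works (illustrative names; run it on a physical device in a Release build):

import CoreML
import CoreVideo

// Measures median single-prediction latency. `buffer` is a placeholder for a
// real input frame; one warm-up run avoids skew from first-use model setup.
func medianPredictionLatency(model: MyModel, buffer: CVPixelBuffer, runs: Int = 100) throws -> Duration {
    _ = try model.prediction(image: buffer)            // warm-up
    let clock = ContinuousClock()
    var samples: [Duration] = []
    for _ in 0..<runs {
        let elapsed = try clock.measure {
            _ = try model.prediction(image: buffer)
        }
        samples.append(elapsed)
    }
    return samples.sorted()[samples.count / 2]          // median
}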

According to Apple's WWDC 2023 session “Optimize your Core ML usage”, converting models to FP16 precision typically delivers 2x–4x inference speedup on the Neural Engine compared to FP32. The speedup comes from the ANE's native FP16 compute path and the reduced memory bandwidth of smaller tensors.

Step 4 — Production rollout checklist

Before shipping a Core ML feature to all users, verify the following:

  • Profiled on physical hardware. The iOS Simulator has no Neural Engine. All performance measurements must be taken on a real device at the minimum supported iOS version.
  • Thermal behavior verified. Sustained inference generates heat. Profile at both ambient temperature and after 30 seconds of continuous inference to measure throttled performance — what users see during extended use.
  • Graceful degradation for older devices. If the feature requires specific Neural Engine capabilities, provide a fallback state for unsupported hardware rather than hiding the feature entirely or crashing.
  • Privacy label updated. On-device inference processes user data locally, but if any inference input is stored, logged, or later transmitted, that must be disclosed in your App Privacy Details.
  • Model versioning strategy defined. Core ML models bundled in the app binary require an App Store update to change. For models that need frequent updates, design a download-and-cache strategy using MLModel.compileModel(at:) to compile downloaded models on-device (see the sketch below this list).
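
A minimal sketch of that download-and-cache flow, assuming a single-file model hosted at a URL you control (names are illustrative and error handling is trimmed for brevity):

import CoreML
import Foundation

// Downloads a model, compiles it on-device, caches the compiled artifact, and
// loads it. For an .mlpackage you would typically ship an archive and unpack
// it before compiling; a single .mlmodel file works as-is.
func downloadAndLoadModel(from remoteURL: URL) async throws -> MLModel {
    // 1. Download the raw model file to a temporary location.
    let (tempURL, _) = try await URLSession.shared.download(from: remoteURL)

    // 2. Compile it on-device to an optimized .mlmodelc directory.
    let compiledURL = try await MLModel.compileModel(at: tempURL)

    // 3. Move the compiled model somewhere persistent; the compiled artifact
    //    otherwise lives in a temporary location.
    let cachesDir = try FileManager.default.url(
        for: .cachesDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
    let destination = cachesDir.appendingPathComponent(compiledURL.lastPathComponent)
    if FileManager.default.fileExists(atPath: destination.path) {
        try FileManager.default.removeItem(at: destination)
    }
    try FileManager.default.moveItem(at: compiledURL, to: destination)

    // 4. Load it with the same configuration you would use for a bundled model.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine
    return try MLModel(contentsOf: destination, configuration: config)
}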

FAQ

Does Core ML work offline?
Yes. Core ML runs entirely on-device using Apple's Neural Engine, GPU, or CPU. No network connection is required. Once the model is bundled in the app or downloaded to the device, all inference happens locally with no API call, no data sent to a server, and no per-inference cost.
How fast is Core ML compared to cloud AI APIs?
For classification and detection tasks, on-device Core ML on A15 Bionic or later runs in 2–8ms. Cloud API round-trips for equivalent tasks sit at 150–450ms median latency, so Core ML is 4–10x faster for these workloads. For very large language models, cloud inference remains more practical, since models of that size generally don't fit on device.
What is the Neural Engine and when does Core ML use it?
The Apple Neural Engine (ANE) is dedicated hardware on Apple Silicon for matrix multiplication and convolution, the core operations of neural network inference. Core ML automatically routes ANE-compatible models to it. You can also explicitly request ANE execution via MLModelConfiguration with computeUnits = .cpuAndNeuralEngine. The ANE first shipped with the A11 Bionic (iPhone 8 and iPhone X), but Core ML can target it from the A12 Bionic onward.
Should I use Core ML or Apple Foundation Models?
They serve different purposes. Core ML runs task-specific models: image classification, object detection, text classification, and custom models you train. Apple Foundation Models handles generative language tasks — summarization, structured output, conversational interfaces — using Apple's on-device LLM. Most production apps use both. Foundation Models requires Apple Intelligence hardware (iPhone 15 Pro or later).
How do I add a Core ML model to my Xcode project?
Drag the .mlpackage file into the Xcode project navigator and check the target membership for your app. Xcode generates a typed Swift class with typed Input and Output types. Use MLModelConfiguration to control compute units when loading. The model can then be loaded with try await MyModel.load(configuration:) in an async context.

Adding Core ML to an existing iOS app?

The On-Device AI Integration service handles the full path: architecture assessment, Core ML or Foundation Models implementation, performance profiling, and a production rollout playbook — in 3–5 weeks.

Start AI Integration