
AI-Native iOS Architecture: On-Device Intelligence Without the Cloud

Most iOS apps bolt AI on as a feature. This is about building apps where intelligence is a structural decision — one that shapes the data model, the sync strategy, and the deployment constraints from the start.

By Ehsan Azish · 3NSOFTS

The structural difference between AI-enabled and AI-native

An AI-enabled app calls an API when the user taps a button. The AI feature can be removed and the app still works — because the intelligence was never load-bearing.

An AI-native app is designed so that intelligence is part of how the data model works. The app categorizes, ranks, or reasons about your data locally — and the results are persisted as first-class entities, reused across features, and updated incrementally as new data arrives.

This distinction matters because it determines the architecture. If AI is an API call, you can add it in sprint 8. If AI is structural, you need to design for it in week 1 — because it shapes your data model, your sync strategy, and your offline behavior.

On-device inference as the default

Apple's Core ML and Foundation Models make on-device inference practical for most production use cases. The Neural Engine on modern Apple Silicon handles classification, summarization, embedding, and other language tasks fast enough for real-time interaction — without a network round-trip.

For an AI-native app, this is the correct default for three reasons:

  • Offline behavior. The AI features work when there's no network. This is non-negotiable for mobile.
  • Privacy. No user data leaves the device. This matters for products in health, finance, or enterprise contexts — and increasingly for all products given user expectations.
  • Cost model. On-device inference has no per-request cost. For products with high inference volume, this eliminates a significant operational dependency.

Cloud AI has legitimate use cases — large-context reasoning, multi-modal tasks, or capabilities that simply don't have on-device equivalents yet. In those cases, we scope cloud AI as an optional enhancement layer, not the foundation. The core product always runs without it.
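As a concrete illustration of the on-device default, Apple's NaturalLanguage framework exposes OS-bundled embedding models that run locally with no per-request cost. A minimal sketch (the function name is illustrative):

```swift
import NaturalLanguage

// On-device sentence embedding using a model that ships with the OS:
// no network round-trip, no per-request cost, works offline.
func sentenceVector(for text: String) -> [Double]? {
    // Returns nil if no embedding model is available for this language/OS.
    guard let model = NLEmbedding.sentenceEmbedding(for: .english) else {
        return nil
    }
    return model.vector(for: text)
}
```

The same shape applies to heavier Core ML models: the call site is a local function, and the absence of a network dependency is what makes the offline and cost properties structural rather than incidental.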

The data layer is where AI-native architecture lives

The most important architectural decision in an AI-native iOS app isn't which model to use — it's how inference results get stored and reused.

In a naive implementation, you run inference every time a record is displayed: the app loads a record, runs the model, uses the result, and discards it. This is inefficient, and it doesn't compose: you can't sort by AI-generated relevance, filter by predicted category, or aggregate across results if they exist only at render time.

In a well-designed AI-native app, inference results are persisted. Using Core Data:

  • Run inference when a record is created or updated, not on every fetch.
  • Store the result (category, score, embedding, summary) as a field on the entity.
  • Mark records as “needs inference” when their content changes, and reprocess in a background task.
  • Use NSFetchRequest predicates and sort descriptors on stored inference results to power UI — same as any other data.

This architecture composes cleanly. The UI doesn't know it's displaying AI-derived data. The sync layer (CloudKit) replicates the inference results alongside the source records. A new device gets accurate AI-powered views immediately, without reprocessing everything on first launch.

Foundation Models and structured generation

Apple's Foundation Models framework (introduced in iOS 26) brings on-device LLM capabilities to production apps via a stable Swift API. For AI-native architecture, the most useful capability is guided generation — the ability to constrain model output to a specific schema using the framework's @Generable macro.

Instead of parsing free-form text, you annotate a Swift struct with @Generable and request it as the output type. The model's output is constrained to your schema and delivered as a typed Swift value. This eliminates the brittle string parsing that makes LLM integrations unreliable in production.

The practical constraint is model availability: Foundation Models requires iOS 26 and Apple Intelligence-capable hardware (A17 Pro or later on iPhone, any M-series). Apps targeting earlier OS versions or older hardware need a fallback path — either a lighter Core ML model or graceful degradation of AI features.
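A minimal sketch of guided generation, assuming the session API shape Apple introduced with the framework; `NoteSummary` and the prompt are hypothetical:

```swift
import FoundationModels

// Output schema: the model's generation is constrained to this shape.
@available(iOS 26.0, *)
@Generable
struct NoteSummary {
    @Guide(description: "One-sentence summary of the note")
    var summary: String
    @Guide(description: "A short category label, e.g. 'work' or 'personal'")
    var category: String
}

@available(iOS 26.0, *)
func summarize(_ text: String) async throws -> NoteSummary {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Summarize and categorize this note: \(text)",
        generating: NoteSummary.self
    )
    // Typed result — no string parsing, no JSON decoding in app code.
    return response.content
}
```

The @available guards are where the fallback path plugs in: on older hardware, the same `NoteSummary` fields can be filled by a lighter Core ML model or left empty with the feature hidden.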

A realistic deployment checklist

Before shipping an AI-native iOS app, the following questions need clear answers:

  1. Model size and load time. Core ML models are bundled in the app. Large models increase app size and first-inference latency. Profile on the target device, not the simulator.
  2. Background processing. For batch inference (processing existing records on first launch), use background tasks via BGTaskScheduler. Don't block the main thread or the UI launch path.
  3. Staleness handling. When records update, how does the system know which inference results are stale? This needs an explicit mechanism — a timestamp, a dirty flag, or a content hash.
  4. Sync behavior. If you're syncing via CloudKit, decide whether inference results sync or are recomputed locally. Syncing results reduces reprocessing cost on new devices; local recomputation avoids sync conflicts and keeps the data layer cleaner.
  5. Minimum OS version. Foundation Models requires iOS 26. Core ML works back to iOS 11. Pick your minimum target deliberately, not by default.
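The background-processing and staleness items above can be sketched together with BGTaskScheduler. The task identifier is hypothetical and must also be declared in Info.plist under BGTaskSchedulerPermittedIdentifiers:

```swift
import BackgroundTasks

let inferenceTaskID = "com.example.app.inference" // hypothetical identifier

// Call once at app launch, before the app finishes launching.
func registerInferenceTask() {
    BGTaskScheduler.shared.register(
        forTaskWithIdentifier: inferenceTaskID, using: nil
    ) { task in
        handle(task as! BGProcessingTask)
    }
}

// Request a future run; the system picks the actual execution time.
func scheduleInferenceTask() {
    let request = BGProcessingTaskRequest(identifier: inferenceTaskID)
    request.requiresExternalPower = false
    try? BGTaskScheduler.shared.submit(request)
}

func handle(_ task: BGProcessingTask) {
    task.expirationHandler = {
        // Stop cleanly; unprocessed records keep their dirty flag,
        // so the next run resumes exactly where this one stopped.
    }
    // ... run inference over records marked "needs inference" ...
    scheduleInferenceTask() // re-arm for the next batch
    task.setTaskCompleted(success: true)
}
```

Because staleness is tracked as persisted state rather than in-memory work queues, an expired or killed task loses no progress — the dirty flags survive.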

The bottom line

AI-native iOS architecture isn't complicated. It's a set of deliberate decisions made early: run inference on-device, persist results in your data model, reprocess lazily, and keep the AI layer decoupled from the UI.

What makes it work in production is the same thing that makes any data-layer architecture work: consistency, testability, and explicit handling of edge cases. The model is just another data source. Treat it like one.

See this architecture in production

The Sorto and offgrid:AI case studies show how these patterns were applied in real products.