Testing AI-Powered Features
AI quality failures are usually integration failures. This chapter provides a practical test matrix for deterministic behavior, privacy guarantees, model regressions, and degraded runtime conditions.
Testing pyramid for AI products
- Unit tests for adapter logic, serialization, and routing rules.
- Contract tests for model input/output schema compatibility.
- Golden-set evaluation tests for output quality thresholds.
- End-to-end workflow tests with fallback and cancellation paths.
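As a sketch of the contract-test layer, assume a hypothetical `ModelRequest` payload type; pinning its serialized keys means a silent field rename or drop fails CI instead of failing at the model boundary in production:

```swift
import Foundation

// Hypothetical request payload sent to the model runtime.
struct ModelRequest: Codable {
    let prompt: String
    let maxTokens: Int
    let temperature: Double
}

// Contract check: the serialized keys must exactly match the schema
// the model runtime expects. A renamed or dropped field fails loudly.
func requestSchemaMatches() throws -> Bool {
    let request = ModelRequest(prompt: "hi", maxTokens: 64, temperature: 0.2)
    let data = try JSONEncoder().encode(request)
    let json = try JSONSerialization.jsonObject(with: data) as! [String: Any]
    let expectedKeys: Set<String> = ["prompt", "maxTokens", "temperature"]
    return Set(json.keys) == expectedKeys
}
```

The same pattern works in reverse for decoding canned model responses, which catches incompatible output schema changes before a model update ships.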
Deterministic harness pattern
Build an inference abstraction that can swap real and fake engines. Feed fixed fixture payloads and verify policy outcomes, not only string equality. This makes tests stable across model updates.
import XCTest

protocol InferenceEngine {
    func run(_ input: TestInput) async throws -> TestOutput
}

struct FakeEngine: InferenceEngine {
    let canned: TestOutput
    // Returns the canned fixture regardless of input, keeping tests deterministic.
    func run(_ input: TestInput) async throws -> TestOutput { canned }
}

final class FeatureTests: XCTestCase {
    func testFallbackWhenLatencyBudgetExceeded() async throws {
        // Engine that responds slower than the service's latency budget.
        let engine = SlowFixtureEngine(delayMs: 900)
        let service = FeatureService(engine: engine, budgetMs: 300)
        let result = try await service.execute(.sample)
        // The service should degrade to fallback mode, not hang or fail.
        XCTAssertEqual(result.mode, .fallback)
    }
}
Privacy assertions in CI
Add explicit tests that block accidental sensitive-data export. These checks should run in every pull request, not as manual release checklist items.
- Verify analytics payloads do not include raw input/output content.
- Verify logs redact model prompts and generated content.
- Verify kill switch fully disables telemetry paths.
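The first check can be sketched as a simple leak scan, assuming analytics events are string key-value dictionaries (the names `payloadLeaksContent` and `sensitiveValues` are illustrative, not from a real SDK): the test feeds the session's raw prompt and generated text as known-sensitive strings and asserts the captured payload contains neither.

```swift
import Foundation

// Minimal sketch: scan an analytics payload for leaked raw content.
// `sensitiveValues` carries the prompt and generated text for the
// session under test; any occurrence in the payload is a privacy failure.
func payloadLeaksContent(_ payload: [String: String],
                         sensitiveValues: [String]) -> Bool {
    payload.values.contains { value in
        sensitiveValues.contains { sensitive in
            value.contains(sensitive)
        }
    }
}

// Example CI assertion: event metadata may carry ids and durations,
// but never the user's prompt or the model's output.
let event = ["event": "summary_generated", "latencyMs": "412", "model": "v3"]
let leaked = payloadLeaksContent(
    event,
    sensitiveValues: ["summarize my inbox", "Here is your summary"]
)
```

Running the scan against payloads captured from a real feature flow (rather than hand-built fixtures) is what makes it catch accidental exports.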
Regression gates and release criteria
Ship with measurable gates: p95 latency, fallback rate, confidence floor, and critical scenario pass rate. Treat model changes as release candidates with their own QA sign-off, not hidden dependency bumps.
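The gates above can be expressed as one boolean check in CI, so a model change cannot ship on a judgment call. The `ReleaseGates` type and its threshold values below are an illustrative sketch, not a prescribed set of numbers:

```swift
// Hypothetical release gate: every metric measured from the candidate
// build must clear its threshold before the model change ships.
struct ReleaseGates {
    let maxP95LatencyMs: Double
    let maxFallbackRate: Double
    let minConfidence: Double
    let minCriticalPassRate: Double

    func passes(p95LatencyMs: Double, fallbackRate: Double,
                confidence: Double, criticalPassRate: Double) -> Bool {
        p95LatencyMs <= maxP95LatencyMs
            && fallbackRate <= maxFallbackRate
            && confidence >= minConfidence
            && criticalPassRate >= minCriticalPassRate
    }
}

// Thresholds are placeholders; each product sets its own budgets.
let gates = ReleaseGates(maxP95LatencyMs: 800, maxFallbackRate: 0.05,
                         minConfidence: 0.7, minCriticalPassRate: 1.0)
let shipApproved = gates.passes(p95LatencyMs: 640, fallbackRate: 0.03,
                                confidence: 0.82, criticalPassRate: 1.0)
```

Failing any single gate blocks the release, which is exactly the point: a model bump that regresses fallback rate gets the same scrutiny as a code change that does.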