[07]

Inference System - Running Models Locally

Multi-backend architecture for running fine-tuned models entirely on your Mac.
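One way to keep several backends interchangeable is to put them behind a shared streaming interface and choose whichever one is installed at startup. The sketch below is a minimal illustration under stated assumptions: the `InferenceBackend` protocol, `select_backend`, the MLX-first preference order, and the `sse_frames` helper (including its `[DONE]` terminator) are all hypothetical names and conventions, not this app's actual code. `mlx_lm` and `llama_cpp` are the import names of the mlx-lm and llama-cpp-python packages.

```python
import importlib.util
import json
from typing import Callable, Iterable, Iterator, Optional, Protocol


class InferenceBackend(Protocol):
    """Hypothetical shape shared by every backend: stream text tokens for a prompt."""

    def stream(self, prompt: str, max_tokens: int) -> Iterator[str]: ...


# Assumed preference order: MLX first (native to Apple Silicon), llama.cpp as fallback.
BACKEND_PREFERENCE = ("mlx_lm", "llama_cpp")


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def select_backend(
    preference: Iterable[str] = BACKEND_PREFERENCE,
    installed: Callable[[str], bool] = is_installed,
) -> Optional[str]:
    """Return the first preferred backend that is installed, or None if none are."""
    for name in preference:
        if installed(name):
            return name
    return None


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap tokens from any backend as Server-Sent Events frames."""
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Hypothetical end-of-stream marker so the client knows generation finished.
    yield "data: [DONE]\n\n"
```

Passing the `installed` predicate as a parameter keeps backend selection testable without MLX or llama.cpp actually present. Because every backend yields plain strings, the streaming layer (`sse_frames`) does not need to know which backend produced the tokens.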

On this page
- Multi-Backend Architecture
  - MLX Python Backend
  - llama.cpp Alternative
  - Future: Native MLX Rust
- Model Loading
  - Base Model + Adapter
  - Memory Management
  - Lazy Loading Strategy
- Generation Configuration
  - Temperature & Sampling
  - Stop Sequences
  - Repetition Penalty
- Streaming Responses
  - Server-Sent Events
  - Token-by-token Output
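The generation settings in this outline (temperature and sampling, stop sequences, repetition penalty) can be illustrated with a small pure-Python sketch that works on a plain list of logits. It is an assumption-laden illustration, not the app's implementation: the `GenerationConfig` fields and defaults are hypothetical, the penalty uses the CTRL-style rule (divide positive logits, multiply negative ones), and temperature 0 is treated as greedy decoding. Real backends apply these steps to model tensors internally.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class GenerationConfig:
    # Hypothetical defaults for illustration only.
    temperature: float = 0.7
    repetition_penalty: float = 1.1
    stop_sequences: List[str] = field(default_factory=lambda: ["</s>"])
    max_tokens: int = 256


def apply_repetition_penalty(
    logits: Sequence[float], previous_ids: Sequence[int], penalty: float
) -> List[float]:
    """CTRL-style penalty: shrink positive logits and push negative ones lower
    for every token id that has already been generated."""
    adjusted = list(logits)
    for token_id in set(previous_ids):
        value = adjusted[token_id]
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


def sample_token(
    logits: Sequence[float],
    previous_ids: Sequence[int],
    config: GenerationConfig,
    rng: random.Random,
) -> int:
    """Pick the next token id: repetition penalty first, then temperature sampling."""
    adjusted = apply_repetition_penalty(logits, previous_ids, config.repetition_penalty)
    if config.temperature <= 0:
        # Temperature 0 is treated as greedy decoding (argmax).
        return max(range(len(adjusted)), key=adjusted.__getitem__)
    scaled = [x / config.temperature for x in adjusted]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def truncate_at_stop(text: str, stop_sequences: Sequence[str]) -> Tuple[str, bool]:
    """Cut text at the earliest stop sequence and report whether one was found."""
    hits = [i for i in (text.find(s) for s in stop_sequences if s) if i != -1]
    if not hits:
        return text, False
    return text[: min(hits)], True
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while a penalty above 1.0 makes already-used tokens less likely, which counters the looping that small fine-tuned models can fall into.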
© 2026 Lark Matter