oMLX
by jundot
About
oMLX is a high-performance LLM inference server optimized for Apple Silicon, featuring a tiered KV cache that persists context across RAM (hot tier) and SSD (cold tier). Built on the MLX framework, it supports continuous batching, multi-model serving with LRU eviction, and native VLM/OCR capabilities. It serves as a backend for the OpenClaw ecosystem, offering OpenAI-compatible APIs and optimizations for Claude Code to enable efficient local agent workflows.
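Because the server exposes OpenAI-compatible APIs, any standard OpenAI client should be able to talk to a local oMLX instance. The sketch below is illustrative only: the port (8080), the `/v1` path, and the model id are assumptions, not documented oMLX defaults; substitute whatever your instance actually exposes.

```python
# Minimal sketch: querying a locally running oMLX server through its
# OpenAI-compatible API using the official openai Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local endpoint and port
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize the tiered KV cache idea."}],
)
print(response.choices[0].message.content)
```

With multi-model serving, repeated calls with different `model` values would load each model on demand, with the least recently used one evicted when memory runs low.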
Features
- Tiered KV Cache (Hot/Cold)
- Continuous Batching Support
- Native Apple Silicon Optimization
- VLM & OCR Capabilities
- Deep OpenClaw Integration
Supported Platforms
- Desktop
- Web