
oMLX

by jundot
🔓 Open Source · Python · 🌍 Global · Free

About

oMLX is a high-performance LLM inference server optimized for Apple Silicon, featuring a unique tiered KV cache system that persists context across RAM and SSD. Built on the MLX framework, it supports continuous batching, multi-model serving with LRU eviction, and native VLM/OCR capabilities. It acts as a critical backend for the OpenClaw ecosystem, offering OpenAI-compatible APIs and specific optimizations for Claude Code to enable efficient local agent workflows.
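Because the server exposes OpenAI-compatible APIs, any standard OpenAI client can talk to it. Below is a minimal sketch using the official `openai` Python package; the base URL, port, and model name are illustrative assumptions, not documented oMLX defaults.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local oMLX server.
# Host, port, and model name are assumptions; check your server's
# actual address and whichever model it has loaded.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Qwen2.5-7B-Instruct-4bit",  # any served model
    messages=[{"role": "user", "content": "Summarize the tiered KV cache idea."}],
)
print(response.choices[0].message.content)
```

The same endpoint shape is what lets tools built for OpenAI's API, including Claude Code-style agent workflows, be pointed at a local server instead.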

Features

  • Tiered KV Cache (Hot/Cold; see the sketch after this list)
  • Continuous Batching Support
  • Native Apple Silicon Optimization
  • VLM & OCR Capabilities
  • Deep OpenClaw Integration
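To make the hot/cold idea concrete, here is a small conceptual sketch of a tiered cache with LRU demotion in Python. It is not oMLX's implementation (whose internals aren't shown here); it only illustrates the pattern of demoting least-recently-used entries from RAM (hot) to SSD (cold) and promoting them back on access.

```python
import pickle
from collections import OrderedDict
from pathlib import Path

class TieredKVCache:
    """Illustrative hot (RAM) / cold (SSD) cache with LRU demotion.

    A conceptual sketch only; the real oMLX cache stores MLX KV
    tensors and has its own keying and eviction logic.
    """

    def __init__(self, hot_capacity: int, cold_dir: str = "kv_cold"):
        self.hot: OrderedDict[str, object] = OrderedDict()  # insertion order = LRU order
        self.hot_capacity = hot_capacity
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(exist_ok=True)

    def put(self, key: str, value: object) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        while len(self.hot) > self.hot_capacity:
            old_key, old_val = self.hot.popitem(last=False)  # evict LRU entry
            # Demote to SSD instead of discarding, so the context persists.
            (self.cold_dir / f"{old_key}.pkl").write_bytes(pickle.dumps(old_val))

    def get(self, key: str) -> object | None:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        cold_path = self.cold_dir / f"{key}.pkl"
        if cold_path.exists():
            value = pickle.loads(cold_path.read_bytes())
            self.put(key, value)  # promote back to the hot tier
            return value
        return None
```

In a real inference server the keys would presumably be derived from prompt prefixes so that repeated agent turns reuse cached context, but the actual keying scheme is an oMLX internal.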

Supported Platforms

  • Desktop
  • Web