oMLX
by jundot
About
oMLX is a high-performance LLM inference server optimized for Apple Silicon, featuring a tiered KV cache that persists context across RAM (hot tier) and SSD (cold tier). Built on the MLX framework, it supports continuous batching, multi-model serving with LRU eviction, and native VLM/OCR capabilities. It serves as a backend for the OpenClaw ecosystem, offering OpenAI-compatible APIs and optimizations for Claude Code to enable efficient local agent workflows.
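Because the server exposes OpenAI-compatible APIs, any standard OpenAI client should be able to talk to a local oMLX instance. The sketch below is illustrative only: the port (8080), the `/v1` path, and the model id are assumptions, not documented oMLX defaults; substitute whatever your instance actually exposes.

```python
# Minimal sketch: querying a locally running oMLX server through its
# OpenAI-compatible API using the official openai Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local endpoint and port
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize the tiered KV cache idea."}],
)
print(response.choices[0].message.content)
```

With multi-model serving, repeated calls with different `model` values would load each model on demand, with the least recently used one evicted when memory runs low.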
Features
- Tiered KV Cache (Hot/Cold)
- Continuous Batching Support
- Native Apple Silicon Optimization
- VLM & OCR Capabilities
- Deep OpenClaw Integration
Supported Platforms
- Desktop
- Web