A3M Router v2.0: An OpenAI-Compatible AI Gateway Supporting 39 Providers

We just shipped A3M Router v2.0.0, the biggest update since launch. What started as a simple routing library has evolved into a full-featured AI gateway.

1. OpenAI-Compatible Proxy Server

You can now boot up a local API proxy with a single command:

npx a3m-router serve

That's it. You now have an OpenAI-compatible API proxy running on localhost:8787.

Because the proxy follows the OpenAI API specification, existing SDKs work with no code changes beyond the base URL. Whether you use Python, Node, LangChain, or LlamaIndex, you only need to point base_url at A3M Router:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="not-needed"  # A3M Router handles provider keys automatically
)

response = client.chat.completions.create(
    model="auto",  # Enables intelligent routing
    messages=[{"role": "user", "content": "Hello"}]
)
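Since the proxy speaks the OpenAI wire format, the SDK itself is optional. The sketch below builds the equivalent raw HTTP request with only the Python standard library. The host and port come from the example above, and the /chat/completions path follows the OpenAI convention; the helper names build_chat_request and send are illustrative and not part of A3M Router.

```python
import json
import urllib.request

# Base URL taken from the SDK example; the path follows the OpenAI convention.
BASE_URL = "http://localhost:8787/v1"


def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the local proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key, mirroring the SDK example.
            "Authorization": "Bearer not-needed",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request; this only works while the proxy is running locally."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With the proxy running, send(build_chat_request("Hello")) should return a JSON body in the usual chat completion shape, assuming the proxy mirrors the OpenAI response format as described.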

2. Real-Time Dashboard

Navigate to http://localhost:8787/ to access a live dashboard that visualizes critical metrics:

Request volume and accumulated costs
Provider status (online/offline) for up to 39 providers
Request logs with detailed routing decisions
Cost breakdown by provider

3. LangChain Adapter

The update includes a dedicated adapter that acts as a drop-in replacement for ChatOpenAI, supporting streaming, tool calling, and structured output out of the box:

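import { A3MChatModel } from 'adaptive-memory-multi-model-router/langchain';
import { HumanMessage } from '@langchain/core/messages';

// 'auto' lets the router pick the upstream model
const model = new A3MChatModel({ modelName: 'auto' });
const response = await model.invoke([new HumanMessage("Hello")]);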

4. Guardrails Engine

A built-in guardrails engine allows you to inspect and sanitize user queries before they hit the upstream models:

import { GuardrailEngine } from 'adaptive-memory-multi-model-router';

const guardrail = new GuardrailEngine({
  promptInjection: true,
  piiDetection: true,
  contentFilter: true
});

const result = await guardrail.checkInput(userInput);
if (result.blocked) {
  // One of the enabled checks fired (prompt injection, PII, or content filter)
  console.log(result.reason);
}

Out of the box, the engine detects prompt injection attempts, PII (emails, phone numbers, SSNs, credit card numbers, API keys), and harmful content, and it identifies the input language for routing.
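To make the PII part concrete, here is a minimal Python sketch of pattern-based detection and redaction. It is not the GuardrailEngine implementation: the patterns, the CheckResult type, and the check_input and redact helpers are all assumptions for illustration, and production-grade detection needs far more than three regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative patterns only (hypothetical, not the library's rules).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


@dataclass
class CheckResult:
    blocked: bool
    reason: Optional[str] = None


def check_input(text: str) -> CheckResult:
    """Block the input if any PII pattern matches; report the first match."""
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            return CheckResult(blocked=True, reason=f"PII detected: {label}")
    return CheckResult(blocked=False)


def redact(text: str) -> str:
    """Replace every PII match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

Blocking and redacting are two different policies: a gateway can reject the request outright, or forward a sanitized copy so the upstream model never sees the raw value.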

5. Semantic Cache

To avoid repeating LLM calls for near-identical prompts, A3M Router v2.0 introduces semantic caching. Unlike embedding-based caches, it computes n-gram similarity on the local machine, so no external embedding API is needed:

import { SemanticCache } from 'adaptive-memory-multi-model-router';

const cache = new SemanticCache({ similarityThreshold: 0.92 });

// First query: cache miss, calls upstream provider
const result1 = await cache.get("What is Python?");

// Semantically similar query: cache HIT! (No API call made)
const result2 = await cache.get("Tell me about Python");
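For intuition about how an embedding-free cache can work, here is a small Python sketch that compares queries by Jaccard similarity over character trigrams. The choice of trigrams and Jaccard is an assumption made for illustration; the article does not say which n-gram measure A3M Router uses, and NgramCache with its get and put methods is a hypothetical class, not the library's SemanticCache.

```python
from typing import List, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Lowercase, collapse whitespace, and return the set of character n-grams."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class NgramCache:
    """Hypothetical cache: returns a stored response for sufficiently similar queries."""

    def __init__(self, similarity_threshold: float = 0.92, n: int = 3):
        self.threshold = similarity_threshold
        self.n = n
        self._entries: List[Tuple[Set[str], str]] = []

    def put(self, query: str, response: str) -> None:
        self._entries.append((char_ngrams(query, self.n), response))

    def get(self, query: str) -> Optional[str]:
        grams = char_ngrams(query, self.n)
        best_score, best_value = 0.0, None
        # Linear scan is fine for a sketch; a real cache would index entries.
        for stored_grams, value in self._entries:
            score = jaccard(grams, stored_grams)
            if score > best_score:
                best_score, best_value = score, value
        return best_value if best_score >= self.threshold else None
```

Under this particular measure, "What is Python?" and "What is Python" score about 0.92 and count as a hit, while "What is Python?" and "Tell me about Python" score only about 0.19. Plain trigram overlap mostly catches near-duplicate wording, so how well paraphrases match depends on the actual similarity function and threshold the library uses.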

6. Cost Analytics

The release also packs a CostAnalytics module, enabling developers to monitor and analyze token consumption and expenses across multiple providers for better LLMOps management.
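The core arithmetic behind this kind of reporting is simple: multiply token counts by per-token prices and group the results by provider. The Python sketch below shows that calculation. The provider names, the prices, and the Usage, request_cost, and cost_by_provider names are all hypothetical; none of them reflect the CostAnalytics API or real provider pricing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical prices in USD per 1M tokens; not real provider pricing.
PRICES = {
    "provider-a": {"input": 1.00, "output": 2.00},
    "provider-b": {"input": 0.50, "output": 1.50},
}


@dataclass
class Usage:
    provider: str
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage) -> float:
    """Cost of one request in USD, given per-million-token prices."""
    price = PRICES[usage.provider]
    total = usage.input_tokens * price["input"] + usage.output_tokens * price["output"]
    return total / 1_000_000


def cost_by_provider(log: Iterable[Usage]) -> Dict[str, float]:
    """Sum request costs per provider across a usage log."""
    totals: Dict[str, float] = defaultdict(float)
    for usage in log:
        totals[usage.provider] += request_cost(usage)
    return dict(totals)
```

Keeping input and output prices separate matters because providers typically bill the two at different rates, so a single blended price would misstate costs for output-heavy workloads.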

[AgentUpdate Depth Analysis] The release of A3M Router v2.0 highlights a major trend in LLMOps: the transition of multi-model routing from simple client-side SDKs to local, lightweight AI Gateways. Compared to heavier, cloud-native alternatives like LiteLLM or Portkey, A3M Router stands out with its zero-config local footprint, alongside built-in, embedding-free semantic caching and gateway-level input guardrails. For complex AI Agent architectures where multi-step planning (Chain-of-Thought) routinely balloons latency and token spend, a local n-gram semantic cache is highly practical, yielding near-zero latency for recurrent sub-tasks. Furthermore, handling security filtering directly at the gateway prevents prompt injection vulnerabilities from compromising autonomous agent decisions. This local-first, performance-oriented middleware represents the necessary infrastructure to build resilient, cost-effective, and safe AI agent swarms.

