We just shipped A3M Router v2.0.0 — the biggest update since launch. What started as a simple routing library has officially evolved into a full-featured AI Gateway.
1. OpenAI-Compatible Proxy Server
You can now boot up a local API proxy with a single command:
npx a3m-router serve
That's it. You now have an OpenAI-compatible API proxy running on localhost:8787.
Because it is fully compatible with the OpenAI API specification, any existing SDK works without code changes. Whether you use Python, Node, LangChain, or LlamaIndex, you just need to point the base_url to A3M Router:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8787/v1",
api_key="not-needed" # A3M Router handles provider keys automatically
)
response = client.chat.completions.create(
model="auto", # Enables intelligent routing
messages=[{"role": "user", "content": "Hello"}]
)
2. Real-Time Dashboard
Navigate to http://localhost:8787/ to access a live dashboard that visualizes critical metrics:
- Request volume and accumulated costs
- Provider status (supporting up to 39 providers online/offline)
- Request logs with detailed routing decisions
- Cost breakdown by provider
3. LangChain Adapter
The update includes a dedicated adapter that acts as a drop-in replacement for ChatOpenAI, supporting streaming, tool calling, and structured output out of the box:
import { A3MChatModel } from 'adaptive-memory-multi-model-router/langchain';
const model = new A3MChatModel({ modelName: 'auto' });
const response = await model.invoke([new HumanMessage("Hello")]);
4. Guardrails Engine
A built-in guardrails engine allows you to inspect and sanitize user queries before they hit the upstream models:
import { GuardrailEngine } from 'adaptive-memory-multi-model-router';
const guardrail = new GuardrailEngine({
promptInjection: true,
piiDetection: true,
contentFilter: true
});
const result = await guardrail.checkInput(userInput);
if (result.blocked) {
// Prompt injection or PII detected
console.log(result.reason);
}
The engine features out-of-the-box detection for prompt injection attempts, PII (emails, phones, SSNs, credit cards, API keys), harmful content, and language detection for routing.
5. Semantic Cache
To bypass repetitive LLM execution, A3M Router v2.0 introduces semantic caching. Unlike traditional solutions, it utilizes n-gram similarity on the local machine, eliminating the need for external embedding APIs:
import { SemanticCache } from 'adaptive-memory-multi-model-router';
const cache = new SemanticCache({ similarityThreshold: 0.92 });
// First query: cache miss, calls upstream provider
const result1 = await cache.get("What is Python?");
// Semantically similar query: cache HIT! (No API call made)
const result2 = await cache.get("Tell me about Python");
6. Cost Analytics
The release also packs a CostAnalytics module, enabling developers to monitor and analyze token consumption and expenses across multiple providers for better LLMOps management.
[AgentUpdate Depth Analysis]The release of A3M Router v2.0 highlights a major trend in LLMOps: the transition of multi-model routing from simple client-side SDKs to local, lightweight AI Gateways. Compared to heavier, cloud-native alternatives like LiteLLM or Portkey, A3M Router stands out with its zero-config local footprint, alongside built-in, embedding-free semantic caching and gateway-level input guardrails. For complex AI Agent architectures where multi-step planning (Chain-of-Thought) routinely balloons latency and token spend, a local n-gram semantic cache is highly practical, yielding near-zero latency for recurrent sub-tasks. Furthermore, handling security filtering directly at the gateway prevents prompt injection vulnerabilities from compromising autonomous agent decisions. This local-first, performance-oriented middleware represents the necessary infrastructure to build resilient, cost-effective, and safe AI agent swarms.