About
llm-d Router is an intelligent entry point for LLM inference traffic. It provides load-aware and prefix-cache-aware routing, request prioritization, and advanced flow control. It uses an Endpoint Picker (EPP) that integrates with proxies such as Envoy via the ext-proc (external processing) protocol. It supports both Standalone and Kubernetes Gateway API modes, enables performance-targeted scheduling and model name rewriting for canary rollouts, and coordinates complex multi-stage lifecycles such as Prefill/Decode (P/D) disaggregation.
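To make the idea of load-aware and prefix-cache-aware routing concrete, here is a minimal sketch of how an endpoint picker could combine the two signals. It is not llm-d Router's actual implementation: the `Endpoint` class, the `queue_depth` load signal, the block size, and the score weights are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def block_hashes(prompt: str, block_size: int = 16) -> list[str]:
    """Hash the prompt in full fixed-size blocks, chaining each hash to the
    previous one so a block hash identifies the whole prefix up to that block."""
    hashes, prev = [], ""
    for i in range(0, len(prompt) - block_size + 1, block_size):
        chunk = prompt[i:i + block_size]
        prev = hashlib.sha256((prev + chunk).encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class Endpoint:
    # Hypothetical view of one model server; field names are assumptions.
    name: str
    queue_depth: int
    cached_blocks: set[str] = field(default_factory=set)


def prefix_score(ep: Endpoint, hashes: list[str]) -> float:
    """Fraction of the prompt's leading blocks already cached on this endpoint."""
    if not hashes:
        return 0.0
    matched = 0
    for h in hashes:
        if h not in ep.cached_blocks:
            break  # only a contiguous prefix can be reused
        matched += 1
    return matched / len(hashes)


def load_score(ep: Endpoint, max_queue: int = 32) -> float:
    """1.0 for an idle endpoint, falling linearly to 0.0 at max_queue."""
    return max(0.0, 1.0 - ep.queue_depth / max_queue)


def pick_endpoint(endpoints: list[Endpoint], prompt: str,
                  prefix_weight: float = 0.6, load_weight: float = 0.4) -> Endpoint:
    """Return the endpoint with the highest weighted sum of both scores."""
    hashes = block_hashes(prompt)
    return max(
        endpoints,
        key=lambda ep: prefix_weight * prefix_score(ep, hashes)
        + load_weight * load_score(ep),
    )
```

With these illustrative weights, an endpoint that already holds the prompt's prefix wins over an idle endpoint with a cold cache, while a busy endpoint with only a partial match can lose to the idle one.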
Features
- KV-Cache Aware Intelligent Routing
- Native Kubernetes Gateway API Integration
- Multi-level Request Prioritization
- Prefill/Decode Disaggregation Support
- Dynamic Model Rewriting & Canary Support
Supported Platforms
web