🔓 Open Source Go 🌍 Global free

About

llm-d Router is an intelligent entry point for LLM inference traffic. It provides load-aware and prefix-cache-aware routing, request prioritization, and advanced flow control. Its Endpoint Picker (EPP) integrates with proxies such as Envoy through the ext-proc (external processing) protocol. The router runs in either Standalone or Kubernetes Gateway API mode, and it supports performance-targeted scheduling, model name rewriting for canary rollouts, and coordination of multi-stage request lifecycles such as Prefill/Decode (P/D) disaggregation.
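
To make the idea of load-aware and prefix-cache-aware routing concrete, here is a minimal sketch in Go. It is not llm-d Router's actual scorer or API: the `Endpoint` type, the `PickEndpoint` function, and the two weights are hypothetical names chosen for illustration, and the scoring rule (cache-hit fraction minus a queue-depth penalty) is an assumed simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// Endpoint is a hypothetical view of one model-server replica.
type Endpoint struct {
	Name           string
	QueueDepth     int      // pending requests, used as the load signal
	CachedPrefixes []string // prompt prefixes assumed to be held in the KV cache
}

// longestCachedPrefix returns the length of the longest cached prefix
// that the prompt starts with, or 0 if none match.
func longestCachedPrefix(e Endpoint, prompt string) int {
	best := 0
	for _, p := range e.CachedPrefixes {
		if strings.HasPrefix(prompt, p) && len(p) > best {
			best = len(p)
		}
	}
	return best
}

// score rewards prefix-cache reuse and penalizes queued work.
// Higher is better.
func score(e Endpoint, prompt string, cacheWeight, loadWeight float64) float64 {
	hit := 0.0
	if len(prompt) > 0 {
		hit = float64(longestCachedPrefix(e, prompt)) / float64(len(prompt))
	}
	return cacheWeight*hit - loadWeight*float64(e.QueueDepth)
}

// PickEndpoint returns the highest-scoring endpoint.
// The boolean is false when no endpoints are available.
func PickEndpoint(eps []Endpoint, prompt string, cacheWeight, loadWeight float64) (Endpoint, bool) {
	if len(eps) == 0 {
		return Endpoint{}, false
	}
	best := eps[0]
	bestScore := score(best, prompt, cacheWeight, loadWeight)
	for _, e := range eps[1:] {
		if s := score(e, prompt, cacheWeight, loadWeight); s > bestScore {
			best, bestScore = e, s
		}
	}
	return best, true
}

func main() {
	eps := []Endpoint{
		{Name: "a", QueueDepth: 2, CachedPrefixes: []string{"system: be brief\n"}},
		{Name: "b", QueueDepth: 0},
	}
	if best, ok := PickEndpoint(eps, "system: be brief\nuser: hi", 1.0, 0.1); ok {
		fmt.Println("routing to", best.Name)
	}
}
```

The weights express the trade-off: a replica with a warm cache wins while its queue is short, but a long enough queue shifts traffic to an idle replica even though it must recompute the prefix.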

Features

  • KV-Cache Aware Intelligent Routing
  • Native Kubernetes Gateway API Integration
  • Multi-level Request Prioritization
  • Prefill/Decode Disaggregation Support
  • Dynamic Model Rewriting & Canary Support

Supported Platforms

web