SOURCE // LABS

DLLG: Dynamic Logit-Level Gating Outperforms LLM Routing and Merging

Leveraging multiple specialized Large Language Models (LLMs) can combine complementary strengths, but existing approaches trade adaptability for stability. Traditional routing mechanisms commit to a single model prematurely, heuristic ensembling methods depend on fragile proxies, and parameter merging often introduces interference that degrades specialized capabilities.

To address these challenges, researchers have introduced DLLG (Dynamic Logit-Level Gating), an ensembling framework that learns token-level expert fusion from sparse response-level supervision. At its core, a lightweight gating module predicts fusion weights at every decoding step, so that response-level correctness shapes individual token choices. This requires neither fine-grained token-level labels nor computationally expensive retraining of the experts.
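To make the idea of step-wise fusion concrete, here is a minimal sketch of a single decoding step. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not specify the gate's architecture or inputs, so the linear gate, the softmax normalisation, the weighted sum of raw logits, the shared vocabulary, and every name (`gate_weights`, `fuse_logits`, `gate_matrix`, `hidden`) are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(hidden: Sequence[float],
                 gate_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear gate: one score per expert, normalised to sum to 1."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in gate_matrix]
    return softmax(scores)


def fuse_logits(expert_logits: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-expert logits (assumes a shared vocabulary)."""
    vocab_size = len(expert_logits[0])
    return [
        sum(w * logits[v] for w, logits in zip(weights, expert_logits))
        for v in range(vocab_size)
    ]


def greedy_token(logits: Sequence[float]) -> int:
    """Index of the highest fused logit."""
    return max(range(len(logits)), key=lambda v: logits[v])


# Toy setup: two experts, a three-token vocabulary, a two-dimensional context.
gate_matrix = [[2.0, 0.0],   # row 0 scores expert A
               [0.0, 2.0]]   # row 1 scores expert B
expert_logits = [[3.0, 0.0, 0.0],   # expert A prefers token 0
                 [0.0, 4.0, 0.0]]   # expert B prefers token 1

# Step 1: the context favours expert A.
weights_step1 = gate_weights([1.0, 0.0], gate_matrix)
token_step1 = greedy_token(fuse_logits(expert_logits, weights_step1))

# Step 2: the context favours expert B, so the fused choice changes.
weights_step2 = gate_weights([0.0, 1.0], gate_matrix)
token_step2 = greedy_token(fuse_logits(expert_logits, weights_step2))

print(token_step1, token_step2)
```

In this toy case the gate shifts most of its weight from one expert to the other between the two steps, and the selected token follows. In a real system the gate's parameters would be learned from response-level correctness signals rather than set by hand.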

On diverse reasoning and code benchmarks, and at various model scales, DLLG consistently outperforms strong routing, heuristic ensembling, and parameter-merging baselines. The results point to learned logit-level fusion as a robust, scalable, and adaptable way to integrate specialized LLM experts.

[AgentUpdate Depth Analysis] DLLG is a notable step forward in multi-model collaboration, occupying the middle ground between coarse-grained routing and destructive parameter merging. In the AI agent ecosystem, traditional multi-agent orchestration relies heavily on high-level routers, which can add latency and cannot switch experts partway through a single response. By operating at the logit level during inference, DLLG acts as a dynamic, non-intrusive inference-time mixture of experts (MoE). Heterogeneous, specialized agent models can then collaborate on a single response, switching "expertise" token by token. For future autonomous agent workflows, this approach could enable flexible, adaptive agent swarms that pool their capabilities without retraining the underlying models.