Alibaba Unveils Next-Gen AI Chip for Unified LLM Training and Inference

At its latest technology summit, Alibaba Group officially unveiled its next-generation proprietary AI processor. This new silicon is engineered to eliminate computational bottlenecks across the entire lifecycle of Large Language Models (LLMs), offering a unified solution that excels in both heavy-duty model training and high-efficiency inference workloads.

At the architectural level, the chip uses chiplet packaging and integrates dedicated Transformer Processing Units (TPUs) tailored for modern generative AI workloads. It supports low-precision formats such as FP8 and INT4 for mixed-precision computation, and delivers 3x the training performance of its predecessor. For multimodal inference on 100-billion-parameter models, it cuts latency by more than 50%, alongside what is described as a substantial gain in performance per watt.
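To put the low-precision support in context, the sketch below estimates the memory needed just to hold the weights of a 100-billion-parameter model at several precisions. The function name and the set of formats compared are illustrative assumptions. The figures cover weights only, ignoring activations, KV cache, and optimizer state, and they say nothing about this chip's actual memory capacity.

```python
# Bits per parameter for common numeric formats (standard widths).
BITS_PER_PARAM = {"FP32": 32, "FP16": 16, "FP8": 8, "INT4": 4}


def weight_memory_gb(num_params: int, fmt: str) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper for illustration: counts weights only.
    """
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1e9


if __name__ == "__main__":
    params = 100_000_000_000  # 100 billion parameters, as cited in the article
    for fmt in BITS_PER_PARAM:
        print(f"{fmt}: {weight_memory_gb(params, fmt):.0f} GB")
```

Under these assumptions, moving from FP16 to FP8 halves the weight footprint, and INT4 halves it again, which is one reason low-precision formats matter for serving models of this size.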

To tackle the communication bottlenecks inherent in distributed training, Alibaba has embedded its proprietary high-bandwidth interconnect, delivering up to 3.2 Tbps of bandwidth per chip. Combined with Alibaba Cloud’s "Lingjun" lossless network protocol, the hardware can scale to ultra-large clusters of more than 10,000 accelerators while maintaining linear scaling efficiency above 90% during massively parallel training tasks.
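The scaling figure can be read as simple arithmetic. The sketch below uses hypothetical function names and assumes a common definition of scaling efficiency: measured cluster throughput divided by ideal linear throughput. The article does not say how Alibaba measures it, so treat this definition as an assumption.

```python
def scaling_efficiency(cluster_throughput: float,
                       per_chip_throughput: float,
                       num_chips: int) -> float:
    """Measured throughput as a fraction of ideal linear scaling."""
    if num_chips <= 0 or per_chip_throughput <= 0:
        raise ValueError("num_chips and per_chip_throughput must be positive")
    return cluster_throughput / (per_chip_throughput * num_chips)


def effective_chips(num_chips: int, efficiency: float) -> float:
    """Number of perfectly scaled chips that would do the same work."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return num_chips * efficiency


if __name__ == "__main__":
    # Figures from the article: 10,000 chips at 90% efficiency.
    print(f"{effective_chips(10_000, 0.90):.0f} chip-equivalents")
```

Read this way, a 10,000-chip cluster at 90% efficiency performs like roughly 9,000 ideally scaled chips, with the remaining tenth lost to communication and synchronization overhead.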

On the software front, the silicon offers native compatibility with mainstream deep learning frameworks like PyTorch and TensorFlow, while being deeply integrated with Alibaba Cloud’s "Bailian" platform and the ModelScope community. Developers can transition LLMs from training to deployment without tedious low-level code modifications, significantly lowering the barriers to enterprise AI adoption.

[AgentUpdate Depth Analysis] Alibaba's launch of unified training-and-inference AI silicon marks a crucial step in cementing its closed-loop Model-as-a-Service (MaaS) ecosystem. While NVIDIA continues to dominate in peak raw compute, Alibaba's competitive moat lies in deep cloud-to-hardware co-design. The rise of AI Agents imposes distinct demands on infrastructure: ultra-low latency for real-time multimodal interaction, and cost-effective compute for iterative ReAct (Reasoning and Acting) loops and tool calls. By hardware-accelerating Transformer operators and optimizing mixed-precision data flows, the new chip directly addresses the prohibitive token costs of complex agentic workflows. Ultimately, this hardware should make high-frequency Agent actions more widely affordable via Alibaba's Bailian platform, accelerating the transition of AI Agents from experimental demos to production-grade enterprise solutions.
