Most AI models today are not designed for sustained, multi-step autonomous execution. Tasks like running hundreds of iterative code modifications, or chaining tool calls across hours without human intervention, require a different kind of model architecture and training focus. Addressing this gap, Alibaba’s Qwen team formally announced Qwen3.7-Max at the 2026 Alibaba Cloud Summit on May 20, following the quiet appearance of two preview versions on the Arena AI leaderboard.
Alibaba previewed two models simultaneously: Qwen3.7-Max-Preview and Qwen3.7-Plus-Preview. According to LM Arena, Qwen3.7-Max-Preview ranked #13 overall in the Text Arena, placing Alibaba as the #6 lab in text, while Qwen3.7-Plus-Preview ranked #16 overall in the Vision Arena, placing Alibaba as the #5 lab in vision.
Qwen3.7-Plus-Preview is described as a preview of a high-performance, balanced version focused on reasoning and logical expression; its multimodal and vision capabilities and its toolchain are to be opened up in the future. Qwen3.7-Max is the text-only reasoning flagship and the main focus of this article, since Alibaba has formally launched it with API access.
The Alibaba Qwen team described Qwen3.7-Max as its most advanced and comprehensive agent model to date. This proprietary, closed-weight model is specifically designed to handle complex coding and debugging, office workflow automation, and long-horizon tasks spanning hundreds or thousands of steps.
As a reasoning-first model, Qwen3.7-Max features an "Extended-Thinking Mode." The model generates an internal chain of thought to plan, verify, and correct course before committing to a final answer, visible as a 'Thinking' trace in user interfaces like Qwen Chat.
Reasoning models consume significantly more output tokens than standard completions. In Artificial Analysis’s Intelligence Index evaluation, Qwen3.7-Max generated approximately 97 million tokens, compared to a benchmark average of 24 million, or roughly four times as many. This overhead adds latency to simpler tasks, but it pays off in multi-step planning, heavy code refactoring, and complex agentic workflows.
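To put the two token counts side by side, the short Python sketch below computes the ratio between them. It uses only the approximate figures quoted above; the function and constant names are illustrative and do not come from Artificial Analysis or Alibaba.

```python
# Approximate figures quoted in the article, in millions of tokens.
QWEN_TOKENS_M = 97.0     # tokens generated by Qwen3.7-Max during the evaluation
AVERAGE_TOKENS_M = 24.0  # benchmark average across evaluated models


def overhead_ratio(model_tokens: float, baseline_tokens: float) -> float:
    """Return how many times more tokens the model used than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return model_tokens / baseline_tokens


ratio = overhead_ratio(QWEN_TOKENS_M, AVERAGE_TOKENS_M)
print(f"Qwen3.7-Max used about {ratio:.1f}x the average token count")
```

Because both inputs are rounded, the result is best read as "roughly four times" rather than as a precise multiplier.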
Crucially, the model upgrades its context window to 1M tokens, up from the 256K featured on Qwen3.6 Max Preview. Pricing for this text-only model is yet to be announced, though its predecessor was priced at $1.30/$7.80 per million input/output tokens on Alibaba Cloud. A million-token context allows the model to ingest entire code repositories or massive document stacks in a single request, so an agent can work from the full material rather than from fragments.
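As a rough illustration of what these numbers mean for a single large request, the Python sketch below estimates cost and checks whether a prompt fits in the window. It rests on several assumptions: Qwen3.7-Max pricing has not been announced, so the predecessor's $1.30/$7.80 rates serve only as a placeholder; the request sizes are hypothetical; and counting reserved output tokens against the context window is a simplification that may not match how the service actually accounts for tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens."""
    input_per_million: float
    output_per_million: float


# Placeholder: the predecessor's published rates, NOT Qwen3.7-Max pricing.
PREDECESSOR_PRICING = Pricing(input_per_million=1.30, output_per_million=7.80)

NEW_WINDOW_TOKENS = 1_000_000  # context window reported for Qwen3.7-Max
OLD_WINDOW_TOKENS = 256_000    # context window of Qwen3.6 Max Preview


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens / 1_000_000 * pricing.input_per_million
            + output_tokens / 1_000_000 * pricing.output_per_million)


def fits_in_context(prompt_tokens: int, reserved_output_tokens: int, window: int) -> bool:
    """Assumed rule: prompt plus reserved output must not exceed the window."""
    return prompt_tokens + reserved_output_tokens <= window


# Hypothetical request: a large repository plus a long reasoning trace.
prompt_tokens = 800_000
output_tokens = 50_000

cost = estimate_cost(prompt_tokens, output_tokens, PREDECESSOR_PRICING)
fits_new = fits_in_context(prompt_tokens, output_tokens, NEW_WINDOW_TOKENS)
fits_old = fits_in_context(prompt_tokens, output_tokens, OLD_WINDOW_TOKENS)

print(f"Estimated cost at predecessor rates: ${cost:.2f}")
print(f"Fits in 1M window: {fits_new}; fits in 256K window: {fits_old}")
```

In this hypothetical case the request fits the 1M window but not the earlier 256K one, and the output side of the bill grows quickly once long reasoning traces are involved.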
[AgentUpdate Depth Analysis] The launch of Qwen3.7-Max underscores a pivot in LLM design from simple conversational interfaces to reasoning-driven autonomous agents. By pairing a 1M-token context window with extended-thinking capabilities, Qwen3.7-Max targets the "context drift" and "planning horizon" limitations that typically plague AI agents executing multi-step tasks over long periods. Compared with reasoning models like OpenAI's o-series or DeepSeek-R1, Qwen3.7-Max doubles down on enterprise utility, focusing specifically on long-horizon, complex API tool-chaining and codebase refactoring. A 1M-token window lets the agent hold a substantial environment state (such as an entire repository) in context while it runs deep reasoning iterations over it. This transition suggests that the future of the AI agent ecosystem lies in highly independent, self-correcting agents capable of running autonomously for hours, which could substantially reshape productivity in software engineering and enterprise workflows.