Zhipu AI Releases GLM-5.1: Self-Refining Coding Strategy for Enhanced Agentic Programming

Zhipu AI has released its new GLM-5.1 model under an MIT license. This model is reportedly capable of refining its own approach over hundreds of iterations when tackling complex coding tasks.

GLM-5.1 is introduced as a new open-weight model specifically designed for long-running, agent-based programming tasks. The core premise is that existing models, including Zhipu's own predecessor GLM-5, tend to exhaust their strategic ideas too quickly on complex problems. They often apply familiar strategies, achieve initial progress, but then hit a performance wall, a problem not solvable by merely increasing computational power.

GLM-5.1 aims to rectify this by repeatedly reviewing its own strategy, identifying dead ends, and exploring new approaches. Zhipu AI details this optimization process as spanning "hundreds of rounds and thousands of tool calls."
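Zhipu AI has not published the mechanism behind this behavior. As a loose illustration of the loop the company describes (iterate on a strategy, detect a plateau, abandon the dead end, and switch to a new approach), here is a hypothetical sketch in Python. The function names, the patience heuristic, and the toy scores are all invented for illustration; the score ceilings merely echo numbers reported later in this article and are not real measurements.

```python
import random

def self_refining_loop(evaluate, strategies, max_rounds=600, patience=30):
    """Hypothetical self-refining loop: keep iterating on the current
    strategy, and when progress stalls for `patience` rounds, switch
    to an unexplored approach (a 'structural shift')."""
    best_score, stale = float("-inf"), 0
    strategy = strategies.pop(0)
    history = []
    for round_no in range(max_rounds):
        score = evaluate(strategy, round_no)      # run tools, submit, measure
        history.append((round_no, strategy, score))
        if score > best_score:
            best_score, stale = score, 0
        elif (stale := stale + 1) >= patience and strategies:
            strategy, stale = strategies.pop(0), 0   # dead end: change approach
    return best_score, history

# Toy demo: each invented "strategy" has a different performance ceiling,
# so the loop must switch strategies to keep improving.
ceilings = {"exhaustive": 3_500, "clustering": 9_000, "two_stage": 21_500}
def evaluate(strategy, round_no):
    return min(ceilings[strategy], round_no * 100 + random.randint(0, 50))

best, hist = self_refining_loop(evaluate, ["exhaustive", "clustering", "two_stage"])
```

In this toy setup the loop rides each strategy to its ceiling, stalls, switches, and ends near the highest ceiling, which is the qualitative shape of the runs Zhipu AI describes.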

The company demonstrated GLM-5.1's capabilities through three internal scenarios, though independent evaluations are not yet available.

GLM-5.1 autonomously switches strategies mid-task

In the first scenario, GLM-5.1 was tasked with optimizing a vector database—a system designed to search large datasets for similar entries. The objective was to answer as many search queries per second as possible without compromising accuracy. Zhipu AI reported that in a standard 50-round test, Claude Opus 4.6 held the previous best score of 3,547 queries per second.

Zhipu AI provided GLM-5.1 with unlimited attempts, allowing the model to autonomously decide when to submit a new version and what new approach to try next. After over 600 iterations and more than 6,000 tool calls, GLM-5.1 achieved an impressive 21,500 queries per second—approximately six times the previous best, according to the company.

Zhipu states that the model fundamentally altered its strategy multiple times during this run. Around iteration 90, it shifted from an exhaustive search of all data to a more efficient clustering approach. Later, around iteration 240, it introduced a two-stage pipeline for rough pre-sorting followed by precise filtering. The company identified six such structural shifts throughout the run, each initiated by the model itself.
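The article names the techniques but shows no code. As a minimal sketch of the kind of pipeline described, coarse clustering as a rough pre-sort followed by exact filtering within the probed clusters, here is a NumPy example. All sizes, names, and parameters are illustrative and are not taken from GLM-5.1's actual output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 5,000 vectors in 32 dimensions (all sizes are illustrative).
data = rng.standard_normal((5_000, 32)).astype(np.float32)

# Offline step: partition vectors into coarse clusters.
# A few rounds of Lloyd's k-means stand in for a real clustering step.
n_clusters = 32
centroids = data[rng.choice(len(data), n_clusters, replace=False)]
for _ in range(5):
    assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(n_clusters):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
# Final assignment against the final centroids.
assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

def search(query, k=10, n_probe=4):
    """Two-stage search: rough pre-sort by centroid, then exact filtering."""
    # Stage 1: rank clusters by centroid distance, keep the closest few.
    centroid_d = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(centroid_d)[:n_probe]
    # Stage 2: exact distances only within the probed clusters.
    candidates = np.flatnonzero(np.isin(assign, probe))
    exact_d = ((data[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(exact_d)[:k]]

hits = search(rng.standard_normal(32).astype(np.float32))
```

The speedup comes from stage 2 touching only a few clusters instead of the whole corpus, at the cost of possibly missing neighbors in unprobed clusters; production systems tune that trade-off, which is presumably what the reported iterations were doing.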

GPU optimization shows progress but doesn't reach the top

In the second scenario, the model's task was to rewrite existing machine learning code for faster execution on GPUs. GLM-5.1 achieved a 3.6x speedup over the baseline implementation and continued to make progress even in later phases, as reported by Zhipu AI. In contrast, GLM-5 plateaued much earlier.

On the KernelBench Level 3 GPU optimization task, GLM-5.1 sustained progress considerably longer than its predecessor but still trailed Claude Opus 4.6, which led with a 4.2x speedup and was still improving when the test ended. In other words, GLM-5.1 extended the productive optimization horizon without closing the gap to its strongest competitor.
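KernelBench itself targets CUDA kernels, and the article does not show the rewrites GLM-5.1 produced. As a language-agnostic illustration of the general class of optimization involved, batching and fusing operations so the hardware runs fewer, larger kernels, here is a NumPy sketch; all names and shapes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((512, 256)).astype(np.float32)
w = rng.standard_normal((256, 128)).astype(np.float32)

# Baseline: a Python loop launching one small matrix-vector product
# (plus activation) per row, the pattern that underuses a GPU.
def forward_loop(x, w):
    out = np.empty((x.shape[0], w.shape[1]), dtype=x.dtype)
    for i in range(x.shape[0]):
        out[i] = np.maximum(x[i] @ w, 0.0)   # per-row GEMV + ReLU
    return out

# Rewritten: one batched matmul with the activation fused in, the shape
# of computation a GPU (or BLAS) executes as a single large kernel.
def forward_batched(x, w):
    return np.maximum(x @ w, 0.0)

assert np.allclose(forward_loop(x, w), forward_batched(x, w), atol=1e-4)
```

Real KernelBench solutions operate at the CUDA level (memory coalescing, shared-memory tiling, kernel fusion), but the principle is the same: restructure the computation rather than just tuning constants.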

A Linux desktop from a single prompt

The third scenario was particularly unusual: GLM-5.1 was asked to build a complete Linux desktop environment as a web application, with no starter code and no intermediate instructions.