At Anthropic, some engineers haven't written a single line of code in five months. This isn't for lack of tasks; it's because Claude now handles the workload. According to a new Anthropic Institute paper titled "When AI builds itself," more than 80% of the code merged into Anthropic's production codebase as of May 2026 was authored by Claude. This is a massive leap from the low single digits seen when Claude Code launched in February 2025. However, the company wants the industry to focus on a deeper implication: the dawn of AI designing and training its own successor.
The productivity surge is staggering. In Q2 2026, a typical Anthropic engineer merged 8x more code per day than in 2024. In an internal poll of 130 research staff, the median respondent estimated producing four times as much output with the latest model, Mythos Preview, as when working without it. For complex, open-ended engineering issues, Claude's success rate climbed to 76% in May 2026, a 50-percentage-point increase in just six months. In one real-world incident where an upgrade crashed thousands of training jobs, an engineer gave Claude context and cluster access; the AI isolated an obscure debug flag, reproduced the crash, and shipped a fix in two hours, a task that typically takes humans two to three days.
Code quality is also catching up rapidly. Anthropic staff noted that while Claude-written code was "somewhat worse" than human code in late 2025, it has since reached parity and is projected to surpass human-written code soon. An automated Claude reviewer now checks every pull request before it is merged. A retrospective analysis found that this system would have blocked roughly one-third of the bugs behind past claude.ai production incidents.
With coding largely handed over to Claude, open-ended scientific research is the next frontier. In April 2026, Anthropic demonstrated Claude running an end-to-end AI safety research project. Using nine parallel agents working autonomously, the system proposed hypotheses, ran experiments, and shared findings. Over 800 cumulative hours and $18,000 in compute, the agents closed 97% of the task's performance gap, while two human researchers working for a week closed only 23%.
[AgentUpdate Depth Analysis] Anthropic's disclosure represents a paradigm shift in which AI moves from a development assistant to a self-improving substrate. An 80% code-generation rate in production signals that the loop of recursive self-improvement is beginning to close. Unlike consumer-facing coding tools such as Cursor or specialized developer agents such as Devin, Anthropic's approach focuses on foundational, infrastructure-level iteration. This could dramatically compress the timeline to AGI, but it also highlights the critical need for robust, verifiable "kill-switch" or pause protocols. In the evolving AI agent ecosystem, agentic research at this scale means that next-generation LLMs will not just be trained on human-curated datasets but also designed and refined by autonomous agent swarms. Aligning these self-evolving loops will be the defining challenge of the next decade.