Anthropic, an artificial intelligence research company, has announced a significant result for its new model, Mythos Preview, on the challenging SWE-bench software engineering benchmark. The model substantially outperformed the earlier Opus 4.6 model across both tracks of the benchmark.
On the SWE-bench Verified track, Mythos Preview scored 93.9%, a marked improvement over Opus 4.6's 80.8% on the same validated tasks. The result points to stronger capabilities in understanding and resolving real, human-verified software issues.
On the more demanding SWE-bench Pro track, Mythos Preview scored 77.8%, a substantial leap over Opus 4.6's 53.4%. The size of the gain on these harder, real-world engineering problems highlights how rapidly AI models are improving at generating, debugging, and fixing code.
SWE-bench evaluates AI models on real-world software engineering tasks: given an actual GitHub issue, the model must identify and understand the problem, then fix the bug or implement the requested feature in the repository. Mythos Preview's results both solidify Anthropic's position at the forefront of AI coding models and lay a foundation for more capable AI code agents and autonomous development tools, accelerating the shift toward AI-assisted and, eventually, AI-driven programming.
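To make the evaluation criterion concrete: in SWE-bench, a model-generated patch counts as resolving a task only if the issue's failing tests now pass and the previously passing tests do not regress. The real harness checks out the repository, applies the patch, and runs the test suite in a container; the sketch below shows only the scoring logic, with illustrative names (`TaskInstance`, `evaluate`, `resolved_rate`) that are assumptions for this example, not the benchmark's actual API.

```python
from dataclasses import dataclass

# Simplified, hypothetical sketch of SWE-bench-style scoring.
# In the real benchmark, test outcomes come from running the repo's
# test suite after applying the model's patch; here they are given
# as a dict mapping test name -> passed (True/False).

@dataclass
class TaskInstance:
    instance_id: str
    fail_to_pass: list   # tests that failed before the patch and must now pass
    pass_to_pass: list   # tests that passed before and must keep passing

def evaluate(task: TaskInstance, test_results: dict) -> bool:
    """A task is 'resolved' only if every fail-to-pass test now passes
    and no previously passing test regressed."""
    fixed = all(test_results.get(t, False) for t in task.fail_to_pass)
    no_regression = all(test_results.get(t, False) for t in task.pass_to_pass)
    return fixed and no_regression

def resolved_rate(tasks: list, results_by_id: dict) -> float:
    """Percentage of tasks resolved, as reported on a benchmark track."""
    resolved = sum(evaluate(t, results_by_id[t.instance_id]) for t in tasks)
    return 100.0 * resolved / len(tasks)
```

Under this scheme a patch that fixes the reported bug but breaks an unrelated test scores zero for that task, which is what makes high Verified and Pro numbers demanding.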