Anthropic has released Claude Opus 4.7, its most capable generally available model to date, showcasing benchmark-leading performance in software engineering and agentic reasoning. This release significantly widens Claude's lead over competitors like OpenAI’s GPT-5.4 and Google’s Gemini 3.1 Pro in tasks critical for developers and enterprise users.
Opus 4.7 is now available across various Claude plans and through major cloud platforms including Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. It is priced at $5 per million input tokens and $25 per million output tokens.
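As a back-of-the-envelope illustration of that pricing, the short Python sketch below estimates the cost of a single API request at the listed rates. The token counts in the example are hypothetical, chosen only to show the arithmetic.

```python
# Rough per-request cost estimate at the listed Opus 4.7 API prices:
# $5 per million input tokens, $25 per million output tokens.

INPUT_PRICE_PER_MTOK = 5.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Hypothetical example: a large coding prompt with a long generated patch.
print(f"${estimate_cost(input_tokens=40_000, output_tokens=8_000):.2f}")  # -> $0.40
```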
Opus 4.7's gains are most pronounced in software engineering. On SWE-bench Pro, a benchmark designed to test a model's ability to resolve real-world software issues from open-source repositories, Opus 4.7 achieved a score of 64.3%. That is a substantial improvement over Opus 4.6's 53.4% and puts clear distance between it and GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%. On SWE-bench Verified, a human-validated subset of the original SWE-bench, Opus 4.7 scored 87.6%, compared with its predecessor's 80.8% and Gemini 3.1 Pro's 80.6%.
Autonomous coding performance also saw a significant jump. On CursorBench, which evaluates performance within the popular AI code editor, Opus 4.7 reached 70%, up from Opus 4.6's 58%. This improvement is particularly meaningful because the model is already a default choice in tools like Cursor and Claude Code, so the gain directly reflects how developers use it in practice.
On graduate-level reasoning, measured by GPQA Diamond, frontier models are converging. Opus 4.7 scored 94.2%, essentially tied with GPT-5.4 Pro at 94.4% and Gemini 3.1 Pro at 94.3%. With differences this small, competitive differentiation is shifting from raw reasoning scores toward applied performance in complex, multi-step tasks.
Perhaps the most significant advancements in Opus 4.7 lie in its agentic capabilities. Anthropic reports a 14% improvement over Opus 4.6 on complex multi-step workflows, achieved with fewer tokens and roughly two-thirds fewer tool errors. Notably, Opus 4.7 is the first Claude model to pass "implicit-need tests," in which the model must infer which tools or actions a task requires without explicit instructions, a meaningful step toward more autonomous AI agents. The model also adds support for 3x higher image resolution and improved multi-agent coordination, enabling it to manage workflows that run for hours.
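To make the "implicit-need" idea concrete, the hedged sketch below uses the standard Anthropic Messages API tool-use format: a tool is declared but the prompt never asks for it by name, so it is up to the model to infer that the tool is needed. The model ID and the tool itself are hypothetical, chosen only for illustration; this is not Anthropic's own evaluation harness.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A hypothetical tool the model is never explicitly told to use.
tools = [
    {
        "name": "lookup_order_status",
        "description": "Look up the shipping status of a customer order by order ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order identifier"},
            },
            "required": ["order_id"],
        },
    }
]

# The prompt states a need ("where is my package") without naming any tool;
# an agentic model should infer that the lookup tool is required.
response = client.messages.create(
    model="claude-opus-4-7",  # hypothetical model ID for illustration
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "My order ABC-123 hasn't arrived. Where is my package?"}],
)

# Inspect whether the model chose to call the tool on its own.
for block in response.content:
    if block.type == "tool_use":
        print("Tool requested:", block.name, block.input)
```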