The arms race in AI mathematics is heating up. Just days after OpenAI made headlines by claiming its AI had cracked an 80-year-old Erdős conjecture, Google DeepMind quietly raised the stakes. The company's AlphaProof Nexus system has autonomously solved nine open Erdős problems, considered some of the hardest unsolved questions in mathematics, beating OpenAI's tally nine to one.
AlphaProof Nexus did this at a cost of just a few hundred dollars per problem. The nine solved problems spanned combinatorics and graph theory, including two that had remained unsolved for 56 years. Beyond these milestones, the system also proved 44 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS). By comparison, OpenAI's recent win disproved a single 80-year-old Erdős conjecture, months after the company had to walk back a previous claim of solving ten novel problems.
At a technical level, DeepMind's system paired a large language model (LLM) with Lean, a formal proof assistant, to generate machine-verified mathematical proofs. The agent works via a self-correcting reinforcement learning loop: it generates candidate proofs, checks them in Lean, and repeats the process until a proof passes verification. A simpler version of the agent matched these results, but at a much higher cost. The researchers also noted that problems requiring entirely new mathematical constructions remain out of reach for current systems.
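The generate, verify, retry loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `prove_with_feedback`, `Verdict`, `toy_propose`, and `toy_verify` are hypothetical names, the "LLM" is a stub that guesses integers, and the "verifier" is a simple arithmetic check standing in for Lean. It shows only the search loop in which verifier feedback steers the next attempt, not the reinforcement learning update.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool        # True if the candidate was accepted by the verifier
    feedback: str   # error message passed back to the generator on failure


def prove_with_feedback(
    statement: str,
    propose: Callable[[str, list[str]], str],
    verify: Callable[[str, str], Verdict],
    max_attempts: int = 8,
) -> Optional[str]:
    """Ask for candidates until one verifies or the attempt budget runs out."""
    feedback_log: list[str] = []
    for _ in range(max_attempts):
        candidate = propose(statement, feedback_log)
        verdict = verify(statement, candidate)
        if verdict.ok:
            return candidate  # only verified candidates are ever returned
        feedback_log.append(verdict.feedback)
    return None


def toy_propose(statement: str, feedback_log: list[str]) -> str:
    # Stand-in for the LLM (assumption): guess the next integer,
    # advancing by one for every rejected attempt so far.
    return str(len(feedback_log))


def toy_verify(statement: str, candidate: str) -> Verdict:
    # Stand-in for Lean (assumption): accept only an integer whose
    # square equals the target encoded in the statement.
    target, x = int(statement), int(candidate)
    if x * x == target:
        return Verdict(True, "")
    return Verdict(False, f"{x}*{x} != {target}")


result = prove_with_feedback("49", toy_propose, toy_verify, max_attempts=10)
print(result)
```

The key design point is that the generator is never trusted: a candidate counts only once the verifier accepts it, so a wrong guess costs compute but cannot produce a false result. If the budget runs out first, the function returns `None` rather than an unverified answer.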
[AgentUpdate Depth Analysis] The success of AlphaProof Nexus marks a paradigm shift for AI agents, moving them from probabilistic text generation to formal, logical discovery. By coupling an LLM (acting as the intuitive "System 1") with Lean (the rigorous "System 2" formal verifier), DeepMind has showcased the power of closed-loop verification. Because the verifier itself supplies the pass-or-fail signal, this architecture allows autonomous agents to bootstrap their own reasoning capabilities through reinforcement learning without relying on human-annotated datasets. This "LLM + formal verifier" paradigm will inevitably expand beyond pure mathematics into high-stakes domains such as cryptographic protocol verification, chip design, and automated software engineering, establishing a new frontier for highly reliable autonomous workflows.