Did Google's AI Agents Really Build an Operating System for $916?

At Google’s developer conference, the company showcased its latest model, Gemini 3.5 Flash, alongside a new agent application, Antigravity 2.0. To demonstrate what this agent setup can do, Google claimed that a team of agents built an entire operating system. According to Google, the effort required only a single prompt, cost approximately $900 in API fees, and was carried out by dozens of collaborating subagents. Researchers warn, however, that this does not mean complex software engineering is now cheap and easy.

First, the "single prompt" claim is highly misleading. While the blog post suggests the operating system was built from a single prompt, Google later discloses that this prompt "ended up being many thousands of lines" long. Crucial details remain missing: How many attempts did it take to draft this prompt? How specific were the instructions? Furthermore, the run relied on a complex scaffold with specialized roles, task delegation, and anti-cheating mechanisms. Google frames this scaffold as a product feature, but it remains unclear whether it was overfit to the specific task of building a toy OS or whether it can generalize to other complex software tasks.

Second, Google's writeup is vague about human intervention. The post claims the final run required "no additional guidance or corrections from a human," but does not define what counts as guidance or correction. It notes that infrastructure was built to terminate and restart stuck agents, and that anti-cheating measures were added after agents "cheated" in earlier runs. Yet Google does not disclose how many dry runs, manual restarts, approvals, or retries preceded the successful run.

Third, there was no rigorous analysis of whether the agents generated original code or simply reproduced existing code from the web. Toy operating systems are common undergraduate computer science projects, and public implementations are widely available. Although Google acknowledged the risk of the agents regurgitating training data, it did not perform any similarity or log analysis to rule out copying. Generation driven by memorization does not reflect an agent's true capability to build novel software.
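To make the idea of a similarity analysis concrete, here is a minimal sketch of one common approach: comparing overlapping token windows ("shingles") between a generated file and a public reference file. This is not Google's method, and nothing in the writeup describes one. The function names, the window size of 5, and the sample C snippets are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import re


def token_shingles(source: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-token windows in a piece of source code.

    Whitespace is ignored, so reformatting alone does not hide a copy.
    """
    # Identifiers, integer runs, and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard overlap of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = token_shingles(a, k), token_shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical snippets: the same function with different formatting,
# and an unrelated function.
generated = "void kputc(char c) { vga[pos++] = c; }"
reference = "void kputc(char c) {\n    vga[pos++] = c;\n}"
unrelated = "int add(int a, int b) { return a + b; }"

copied_score = jaccard_similarity(generated, reference)
unrelated_score = jaccard_similarity(generated, unrelated)
```

A high score against a public implementation would flag a file for closer inspection. A low score does not prove originality, since renamed identifiers or reordered functions defeat this simple check. A serious audit would need more robust fingerprinting and a large reference corpus.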

[AgentUpdate Depth Analysis] Google's claim of building an OS for $916 highlights a growing "evaluation illusion" in the AI agent ecosystem. Many highly publicized multi-agent milestones rely on heavily engineered, overfit scaffolding and hyper-specific, multi-thousand-line prompts tailored to well-documented academic or toy problems. Results obtained this way may not generalize to real-world work. For the ecosystem to mature, the industry must move from marketing stunts about "low-cost complex code generation" to standardized, reproducible evaluation frameworks. The future lies in robust multi-agent coordination, standardized interoperability protocols such as Anthropic's Model Context Protocol (MCP), and dynamic execution graphs that can adapt to novel, production-grade enterprise software environments rather than merely regurgitating memorized open-source codebases.