Did Google’s AI Agents Really Build an Operating System for $916?

At Google’s developer conference, the company launched its latest model, Gemini 3.5 Flash, alongside a new agent app, Antigravity 2.0. To showcase the potential of this setup, Google claimed that a team of agents built an entire operating system from a single prompt, costing only about $900 in API fees and involving dozens of collaborating subagents.

Does this imply that complex software can now be built cheaply by AI? Not so fast. First, the "single prompt" claim is misleading. While the blog post emphasizes the simplicity of the input, it later discloses that the prompt ended up being "many thousands of lines" long. Without knowing how many attempts it took to arrive at this prompt or how specific its instructions were, it is difficult to tell whether the result reflects a better model or intense prompt engineering. Furthermore, the run was carried out on a scaffold with specialized roles, delegation, and anti-cheating measures. We do not know whether this scaffold was overfitted to the specific task of building an OS or whether it would generalize to other complex engineering tasks.

Second, Google's report is ambiguous about human intervention. The post says that the final run required "no additional guidance or corrections from a human," yet it describes infrastructure built to kill and restart stuck agents. It also notes an earlier run in which agents appeared to cheat, after which the team added anti-cheating measures and re-ran the task. However, the methodology does not account for these dry runs. It remains unclear whether any agents escalated issues to humans, whether manual restarts were required during the final run, and how many retries were necessary before reaching success.

Finally, there was no attempt to analyze whether the agents wrote code from scratch or copied existing code from the web. To Google's credit, the post notes that toy operating systems are standard undergraduate projects with many public implementations. But while the post raises the concern that the agents might have regurgitated existing code rather than innovating, it does not address that concern through similarity or log analysis. Even without direct copying, writing an OS might be relatively easy for agents because of patterns memorized from training data, in which case the result tells us little about their ability to create truly novel software.
