A recent experiment with Anthropic's AI tool, Claude Code, has highlighted significant advancements in autonomous AI programming. The user issued a command to Claude Code: “Develop a web-based or software-based startup idea that will make me $1000 a month where you do all the work by generating the idea and implementing it. I shouldn’t have to do anything at all except run some program you give me once. It shouldn’t require any coding knowledge on my part, so make sure everything works well.”
After receiving the command, the AI asked three multiple-choice questions, then settled on a startup idea: selling sets of 500 prompts to professional users at $39 per set. Without any further human input, Claude Code worked for an hour and fourteen minutes, generating hundreds of code files and prompts. It then delivered a single executable file that, when run, built and deployed a fully functional website selling the promised prompt sets, complete with automatically generated marketing copy and a working sales mechanism.
This demonstration underscored Claude Code's impressive autonomy on complex tasks. Despite its less-than-friendly interface, the AI took a high-level request, conducted an initial interview, worked autonomously for an extended period, and delivered the requested output without discernible errors. Claude Code is representative of a new generation of AI coding tools whose capabilities have leapt suddenly over the past month.
This surge in capability is not due to a single breakthrough but rather a combination of two key advances. First, the latest AIs are now capable of performing far more work autonomously and are significantly better at self-correcting errors, particularly in programming tasks. Second, these AIs are being equipped with an “agentic harness”—a set of tools and approaches—that allows them to solve problems in novel ways. These two factors have collectively driven substantial progress in the latest AI tools from major AI companies.
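To make the idea of an agentic harness concrete, here is a minimal sketch of the core mechanism: a loop that lets the model request tool calls (run a command, write a file), executes them, and feeds the results, errors included, back to the model so it can self-correct. Everything here, from the `call_model` function to the tool names and message format, is a hypothetical stand-in for illustration, not Claude Code's actual internals.

```python
# Minimal sketch of an agentic harness: a loop that executes a model's
# tool calls and feeds the results back until the task is done.
# `call_model` and the tool set are hypothetical placeholders.

import subprocess
from pathlib import Path

def run_shell(cmd: str) -> str:
    """Run a shell command and return its output, errors included,
    so the model can read failures and self-correct."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def write_file(path: str, content: str) -> str:
    """Write a file to disk and report what happened."""
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

TOOLS = {"run_shell": run_shell, "write_file": write_file}

def agent_loop(task: str, call_model, max_steps: int = 50) -> str:
    """Drive the model until it declares the task finished."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(transcript)  # hypothetical model API
        transcript.append({"role": "assistant", "content": str(action)})
        if action["type"] == "finish":
            return action["summary"]
        # Execute the requested tool and feed the output back to the
        # model -- this feedback is what enables self-correction.
        output = TOOLS[action["tool"]](**action["args"])
        transcript.append({"role": "tool", "content": output})
    return "step limit reached"
```

Real harnesses layer sandboxing, planning, and a far richer toolset on top of this, but the execute-observe-correct loop is the essential mechanism behind the autonomy described above.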
METR's time-horizon metric, which tracks the length of tasks (measured by how long they take human professionals) that AI can complete autonomously with 50% reliability, has grown exponentially over time, with notable leaps in recent months. It also correlates with most other measures of AI ability.
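For readers curious what measuring such a horizon involves, here is a rough sketch in the spirit of METR's approach: fit a logistic curve relating a model's success probability to the (log) length of a task for human professionals, then read off where the curve crosses 50%. The data points below are invented for illustration; they are not METR's measurements.

```python
# Sketch of estimating a 50%-reliability time horizon: fit a logistic
# curve of success probability against log task length, then solve for
# the length at which predicted success equals 50%.

import numpy as np
from scipy.optimize import curve_fit

def logistic(log_minutes, midpoint, slope):
    """Success probability as a function of log task length."""
    return 1.0 / (1.0 + np.exp(-slope * (log_minutes - midpoint)))

# Hypothetical benchmark results: (task length in human-minutes, success rate)
task_minutes = np.array([1, 4, 15, 60, 240, 960], dtype=float)
success_rate = np.array([0.98, 0.95, 0.85, 0.60, 0.30, 0.08])

params, _ = curve_fit(
    logistic, np.log(task_minutes), success_rate, p0=[np.log(60), -1.0]
)
midpoint, slope = params

# The logistic crosses 0.5 exactly at its midpoint, so the 50% horizon
# is simply exp(midpoint).
horizon_minutes = np.exp(midpoint)
print(f"Estimated 50%-reliability horizon: {horizon_minutes:.0f} human-minutes")
```

Plotting that horizon for successive model releases is what produces the exponential curve noted above.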
However, for many who wish to experiment with AI, these advanced tools are currently designed primarily for programmers. They presuppose familiarity with Python commands and programming best practices, and their interfaces often resemble something from a 1980s computer lab. They are explicitly built to analyze, troubleshoot, and write code within existing programmer workflows. This narrow focus is somewhat regrettable, because these systems have broad utility for knowledge workers of all types, and experimenting with them offers profound insight into the future trajectory of AI.