In a recent week-long experiment, Anthropic let Claude agents handle buying and selling goods for employees. The result was unambiguous: the stronger AI model consistently negotiated better deals. More striking still, participants represented by the weaker agent never noticed their disadvantage.
The experiment, internally dubbed "Project Deal," was conducted in December 2025 at Anthropic's San Francisco office, involving 69 employees. The entire marketplace operated through Slack, with Claude AI agents managing every negotiation and transaction.
Each participant was allocated a $100 budget. Before the experiment began, Claude briefly interviewed each volunteer to learn what they wanted to sell and at what prices, what they hoped to buy, and how they wanted their agent to negotiate. Anthropic then translated these responses into a custom system prompt for each AI agent.
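As a rough illustration, here is a minimal Python sketch of how interview answers might be turned into a per-agent system prompt. The field names, template wording, and prompt structure are all assumptions; Anthropic has not published the actual format.

```python
# Hypothetical sketch: assembling a per-agent system prompt from a
# participant's interview answers. All field names and the template
# are illustrative assumptions, not Anthropic's actual prompt.

def build_system_prompt(interview: dict) -> str:
    items = ", ".join(
        f"{item['name']} (asking ${item['price']})" for item in interview["selling"]
    )
    wants = ", ".join(interview["buying"])
    return (
        "You are a negotiation agent acting on behalf of one participant "
        "in an internal marketplace. You have a $100 budget.\n"
        f"Items to sell: {items}.\n"
        f"Items the participant wants to buy: {wants}.\n"
        f"Negotiation style: {interview['style']}.\n"
        "Negotiate autonomously; do not ask the participant for approval."
    )

print(build_system_prompt({
    "selling": [{"name": "snowboard", "price": 60}],
    "buying": ["ping-pong balls"],
    "style": "friendly but firm; never accept the first offer",
}))
```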
From that point, the agents took full control: they generated listings, identified potential buyers and sellers, made offers, haggled over prices, and closed deals, all without human intervention. Participants only stepped back in at the very end to physically exchange the items, which ranged from a snowboard to a bag of ping-pong balls.
The core research question was hidden in a parallel setup that participants were not initially told about: Anthropic ran four versions of the marketplace simultaneously. In two of them, every agent used Claude Opus 4.5, Anthropic's frontier model at the time. In the other two, each participant had a 50 percent chance of instead being represented by Claude Haiku 4.5, Anthropic's smallest model. Crucially, only the AI agents communicated with each other, so participants had no direct view of which model anyone had been assigned.
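The assignment scheme is easy to picture in code. The sketch below mirrors the described setup under stated assumptions: the run labels, model identifiers, and the use of an independent coin flip per participant are illustrative, not Anthropic's implementation.

```python
# Minimal sketch of the four-run experimental assignment as described
# in the article. Labels and model names are illustrative assumptions.
import random

OPUS = "claude-opus-4-5"
HAIKU = "claude-haiku-4-5"

def assign_models(participants, mixed: bool, seed: int) -> dict:
    rng = random.Random(seed)
    # All-Opus runs: every agent gets the frontier model. Mixed runs:
    # each participant independently has a 50% chance of getting Haiku.
    return {
        p: (HAIKU if mixed and rng.random() < 0.5 else OPUS)
        for p in participants
    }

participants = [f"employee_{i}" for i in range(69)]
runs = {
    "all_opus_1": assign_models(participants, mixed=False, seed=1),
    "all_opus_2": assign_models(participants, mixed=False, seed=2),
    "mixed_1": assign_models(participants, mixed=True, seed=3),
    "mixed_2": assign_models(participants, mixed=True, seed=4),
}
```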
In the "real" run, where every agent used Opus, the 69 agents completed 186 deals across over 500 listings, moving just over $4,000 in total. Participants rated the fairness of individual deals at an average of 4 out of 7, indicating a neutral perception.
The mixed runs, however, exposed a measurable performance gap. Opus users consistently came out ahead, closing roughly two more deals on average than Haiku users. When an identical item was sold once by an Opus agent and once by a Haiku agent, the Opus agent secured $3.64 more on average.
For instance, a lab-grown ruby sold for $65 when negotiated by an Opus agent but only $35 with a Haiku agent. The Opus agent opened the bidding at $60 and was bid up by competing offers, while the Haiku agent started at $40 and was then talked down.
Another example involved the same broken folding bike, with the same buyer and seller: the Opus agent closed the sale at $65, while the Haiku agent managed only $38. Across the 161 items sold in at least two of the four runs, an Opus seller earned $2.68 more on average, and an Opus buyer paid $2.45 less. When an Opus seller faced a Haiku buyer, the average price reached $24.18, versus $18.63 for Opus-on-Opus deals. Given a median sale price of $12 and an overall average of $20.05 across all transactions, gaps of several dollars per deal amount to a substantial share of the typical transaction, underscoring the impact of model strength.
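The pairwise methodology behind figures like the $2.68 seller premium can be sketched as follows. The sale records here are invented for illustration; only the comparison logic, averaging the Opus-minus-Haiku price gap over items sold under both models, reflects the article's description.

```python
# Hedged sketch of the pairwise comparison: for each item sold in more
# than one run, compare the price achieved with an Opus seller against
# the price with a Haiku seller. Sample records are hypothetical.
from collections import defaultdict
from statistics import mean

# (item, seller_model, sale_price) — invented example data
sales = [
    ("lab-grown ruby", "opus", 65.0), ("lab-grown ruby", "haiku", 35.0),
    ("folding bike", "opus", 65.0), ("folding bike", "haiku", 38.0),
]

by_item = defaultdict(lambda: defaultdict(list))
for item, model, price in sales:
    by_item[item][model].append(price)

# Average Opus-minus-Haiku gap over items sold under both models.
gaps = [
    mean(models["opus"]) - mean(models["haiku"])
    for models in by_item.values()
    if "opus" in models and "haiku" in models
]
print(f"Opus seller premium: ${mean(gaps):.2f} on average")
```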