
Google DeepMind's AlphaEvolve: LLM Rewrites Game Theory Algorithms, Outperforming Human Experts

Designing algorithms for Multi-Agent Reinforcement Learning (MARL) in imperfect-information games—scenarios where players act sequentially and cannot see each other’s private information, such as poker—has historically relied on manual iteration. Researchers typically identify weighting schemes, discounting rules, and equilibrium solvers through intuition and trial-and-error. Google DeepMind researchers now propose AlphaEvolve, an LLM-powered evolutionary coding agent that replaces this manual process with automated search.

The research team applied the framework to two established paradigms: Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO). In both cases, AlphaEvolve discovered new algorithm variants that matched or outperformed hand-designed state-of-the-art baselines. All experiments were conducted in the OpenSpiel framework.

Background: CFR and PSRO

CFR is an iterative algorithm that decomposes regret minimization across information sets. At each iteration, it accumulates ‘counterfactual regret’—how much a player would have gained by playing differently—and derives a new policy proportional to positive accumulated regret. Over many iterations, the time-averaged strategy converges to a Nash Equilibrium (NE). Variants like DCFR (Discounted CFR) and PCFR+ (Predictive CFR+) improve convergence by applying specific discounting or predictive update rules, all of which were developed through manual design.
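The policy-update step described above is known as regret matching. A minimal sketch (not DeepMind's implementation) of how a policy at one information set is derived from accumulated regrets:

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Derive a policy proportional to positive accumulated regret.

    If no action has positive regret, fall back to a uniform policy.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Example: regrets accumulated at one information set with three actions.
regrets = np.array([2.0, -1.0, 1.0])
policy = regret_matching(regrets)
# Positive regrets are [2, 0, 1], so the policy is [2/3, 0, 1/3].
```

Variants such as DCFR and PCFR+ modify how `cumulative_regrets` is discounted or predicted between iterations; those update rules are exactly the code AlphaEvolve is asked to rewrite.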

PSRO operates at a higher level of abstraction. It maintains a population of policies for each player, constructs a payoff tensor (the meta-game) by computing expected utilities for every combination of population policies, and then uses a meta-strategy solver to produce a probability distribution over the population. Best responses are trained against that distribution and iteratively added to the population. The meta-strategy solver—which dictates how the population distribution is computed—is the central design choice that this paper targets for automated discovery. All experiments utilized an exact best response oracle (computed via value iteration) and exact payoff values for all meta-game entries, thereby eliminating Monte Carlo sampling noise from the results.
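The meta-game step can be illustrated with a toy sketch (illustrative names, not the paper's code): a payoff matrix over the current populations, the Uniform meta-strategy solver used as AlphaEvolve's seed, and a best response against the resulting mixture. Here the "oracle" merely picks the best existing row policy; the paper's exact oracle searches the full policy space via value iteration.

```python
import numpy as np

def uniform_solver(payoff_matrix):
    """Uniform meta-strategy solver: weight every population policy equally."""
    n_rows, n_cols = payoff_matrix.shape
    return np.full(n_rows, 1.0 / n_rows), np.full(n_cols, 1.0 / n_cols)

def best_response_index(payoff_matrix, opponent_dist):
    """Index of the row policy maximizing expected payoff vs the opponent mix.

    Stand-in for an exact best-response oracle: a real oracle trains a new
    policy rather than selecting among existing population members.
    """
    expected = payoff_matrix @ opponent_dist
    return int(np.argmax(expected))

# Toy meta-game: rows = player 1's population, cols = player 2's population.
meta_game = np.array([[0.0, -1.0],
                      [1.0,  0.5]])
row_dist, col_dist = uniform_solver(meta_game)
br = best_response_index(meta_game, col_dist)
# Expected payoffs vs the uniform column mix are [-0.5, 0.75], so br == 1.
```

Swapping `uniform_solver` for a different function that maps the payoff tensor to a distribution is precisely the design choice AlphaEvolve automates.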

The AlphaEvolve Framework

AlphaEvolve is a distributed evolutionary system that leverages LLMs to mutate source code rather than numeric parameters. The process unfolds as follows: a population is initialized with a standard implementation (CFR+ as the seed for CFR experiments; Uniform as the seed for both PSRO solver classes). At each generation, a parent algorithm is selected based on its fitness; its source code is then passed to an LLM (Gemini 2.5 Pro) with a prompt to modify it. The resulting candidate algorithm is evaluated on proxy games, and valid candidates are subsequently added to the population. AlphaEvolve supports multi-objective optimization; if multiple fitness metrics are defined, one is randomly selected per generation to guide parent sampling. The fitness signal used is negative exploitability.
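The generation loop can be sketched generically as follows. This is an illustrative reduction, not AlphaEvolve's actual API: `mutate` stands in for the Gemini 2.5 Pro code-rewrite step, `evaluate` for the proxy-game fitness (negative exploitability in the paper), and the selection rule here is a simple top-half sample rather than DeepMind's scheme.

```python
import random

def evolve(seed_candidate, mutate, evaluate, generations=10, seed=0):
    """Sketch of an LLM-driven evolutionary loop (illustrative names).

    Each generation: pick a fit parent, mutate it (in AlphaEvolve, an LLM
    rewrites the parent's source code), evaluate the child, and keep it if
    it is a valid candidate. Returns the fittest (candidate, fitness) pair.
    """
    rng = random.Random(seed)
    population = [(seed_candidate, evaluate(seed_candidate))]
    for _ in range(generations):
        # Crude fitness-based selection: sample a parent from the top half.
        population.sort(key=lambda cf: cf[1], reverse=True)
        parent, _ = rng.choice(population[: max(1, len(population) // 2)])
        child = mutate(parent)
        if child is not None:  # invalid candidates are discarded
            population.append((child, evaluate(child)))
    return max(population, key=lambda cf: cf[1])

# Toy instantiation: a "candidate" is just a number, mutation perturbs it,
# and fitness rewards proximity to a target (a stand-in for negative
# exploitability, which is likewise maximized toward zero).
mut_rng = random.Random(1)
best, fitness = evolve(
    seed_candidate=0.0,
    mutate=lambda x: x + mut_rng.uniform(-1.0, 1.0),
    evaluate=lambda x: -abs(x - 3.0),
    generations=200,
)
```

For multi-objective runs, the same loop would draw one of several `evaluate` functions at random each generation to guide parent sampling, as the article describes.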
