Christopher Olah, Anthropic’s co-founder and head of interpretability research, used his platform at the Vatican on Monday to make an unprecedented argument: the development of frontier AI cannot be left solely to the frontier AI labs themselves. Speaking at the formal presentation of Pope Leo XIV’s first encyclical, Magnifica humanitas, in the Vatican Synod Hall, Olah admitted that the structural incentives governing the industry often conflict with ethical responsibilities.
“Every frontier AI lab,” Olah stated, “operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing.” He emphasized that even well-intentioned researchers remain trapped within these market forces, making outside scrutiny from governments, religious leaders, and civil-society institutions absolutely essential.
Olah also delivered a stark warning regarding the future of labor, noting a “real possibility” that AI will displace human work “at very large scale.” He declared that if this transpires, “supporting those displaced will be a moral imperative of historic proportions.” This represents the most explicit public acknowledgment by a frontier-lab founder that their internal projections show technology displacing jobs faster than the labor market can re-absorb them.
Anthropic's high-profile presence at the Vatican—preceded by the announcement of a new office in Milan—marks a strategic repositioning. Olah’s role in leading mechanistic interpretability is central to the firm’s safety credibility, as his team actively works to reverse-engineer and understand what frontier models are doing internally.
This moral posturing stands in stark contrast to the company's recent political struggles in the United States. In April, the Pentagon excluded Anthropic from classified AI projects over the firm's strict usage policies, opting for Nvidia, Microsoft, and AWS instead. Furthermore, the Trump administration blocked the global expansion of Mythos, Anthropic's autonomous vulnerability-discovery model that has disrupted banking cybersecurity. Olah's appeal for external oversight comes at a highly leveraged commercial moment, as Anthropic is reportedly in talks to raise $30 billion at a staggering $900 billion valuation.
[AgentUpdate Depth Analysis] Olah’s warning marks a critical inflection point for the AI Agent ecosystem. As AI transitions from passive models to autonomous agents capable of independent execution—exemplified by Anthropic’s Mythos model—the paradigm of internal safety alignment reaches its limits. Autonomous agents introduce unpredictable behavioral vectors, making mechanistic interpretability not just an academic pursuit, but a foundational requirement for agentic deployment. By calling for external oversight, Olah highlights that the future of the agent economy cannot rely on the self-policing of a few tech giants. Establishing independent, external auditing frameworks and robust interpretability standards will be the ultimate deciding factor for the enterprise adoption and long-term viability of AI Agents.