AI Agents from Anthropic, Google, and Microsoft Vulnerable to Prompt Injection, Exposing API Keys

Security researcher Aonan Guan has exposed critical prompt injection vulnerabilities in AI agents developed by Anthropic, Google, and Microsoft. These flaws, which leverage the AI agents' integrations with GitHub Actions, allowed for the successful exfiltration of API keys and access tokens. Notably, while all three companies quietly paid bug bounties, none chose to publish public advisories or assign CVEs, potentially leaving users on older versions unaware of the inherent risks.

The vulnerabilities affect several AI tools that integrate with GitHub Actions, including Anthropic’s Claude Code Security Review, Google’s Gemini CLI Action, and GitHub’s Copilot Agent. These tools typically process GitHub data—such as pull request titles, issue bodies, and comments—as task context and then execute actions. The core issue lies in their inability to reliably distinguish between legitimate content and maliciously injected instructions.

How the Attacks Work

The fundamental technique employed is indirect prompt injection. Rather than attacking the AI model directly, the researcher embedded malicious instructions within GitHub data that the agents were designed to trust, such as PR titles, issue descriptions, and comments. When the agent ingested this content as part of its workflow, it executed the injected commands as if they were legitimate instructions.

For Anthropic’s Claude Code Security Review, which scans pull requests for vulnerabilities, Guan crafted a PR title containing a prompt injection payload. Claude then executed the embedded commands and included the output, which contained leaked credentials, in its JSON response. This response was subsequently posted as a PR comment, making the secrets publicly accessible. The attack successfully exfiltrated Anthropic API keys, GitHub access tokens, and other secrets exposed in the GitHub Actions runner environment.
The Gemini attack followed a similar pattern. By injecting a fake “trusted content section” after legitimate content within a GitHub issue, Guan managed to override Gemini’s safety instructions. This manipulation tricked the agent into publishing its own API key as an issue comment. Google’s Gemini CLI Action, which integrates Gemini into GitHub issue workflows, treated the injected text as authoritative commands.
The Copilot attack was more subtle. Guan concealed malicious instructions inside an HTML comment within a GitHub issue. This rendered the payload invisible in the Markdown format typically viewed by humans but fully visible to the AI agent parsing the raw content. When a developer assigned the issue to Copilot Agent, the bot proceeded to follow these hidden instructions.

AI Agents from Anthropic, Google, and Microsoft Vulnerable to Prompt Injection, Exposing API Keys

How the Attacks Work

Next Stories to Read

Google Launches Gemini AI App for Mac, Enhancing Desktop AI Interaction

Maximizing Claude Cowork: A Comprehensive Guide for Enhanced AI Collaboration Across All User Levels

Adobe Unveils Creative AI Assistant with Deep Integration of Anthropic's Claude Model

Related Tools & Resources

Skill Marketplaces

Awesome Cyber Skills

Matt Pocock's AI Skills

Recommended Plugins

Security Guidance