AI Agents Fail to Resist Prompt Injection Attacks in New Study

Researchers from four institutions published a study on Thursday finding AI agents powered by GPT-5 and Gemini 2.5-Flash cannot resist prompt injection attacks. Direct attacks succeeded more than 79% of the time, while indirect attacks achieved success rates between 41.67% and 68.16%. The findings highlight persistent security vulnerabilities as AI agents capable of autonomous web browsing, research, and transactions become more widely deployed.

Prompt injection occurs when attackers embed hidden instructions in content that an AI agent encounters, causing it to follow the attacker's directions instead of the user's. The study was conducted by researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign.

Researchers Conduct 3,168 Attack Simulations Using StakeBench

The research team developed StakeBench, a benchmark that tests how AI agents respond to prompt injection attacks in realistic online environments. They conducted 3,168 attack simulations using NanoBrowser and BrowserUse with GPT-5 and Gemini 2.5-Flash.

The researchers wrote that existing security benchmarks adopt an attack-centric perspective while overlooking the distribution of resulting harms. They stated that prompt-injection risk is victim-dependent, with a single exploit producing asymmetric consequences for different stakeholders.

StakeBench probes three factors: the semantic distance between the injected objective and the user's original intent, the consistency of surrounding environmental cues, and the position along the agent's execution trajectory at which the benchmark first exposes it to the injected content.

Microsoft and Google Documented Prompt Injection Attacks

In February, Microsoft researchers warned that hidden instructions embedded in AI summary links could influence chatbot behavior. In April, Google documented prompt injection attacks hidden in web pages that attempted to manipulate AI agents into leaking credentials or sending payments.

Microsoft disclosed a prompt injection flaw in Anthropic's Claude Code GitHub Action that could have exposed user credentials.

Study Identifies Stealthy Parasitism Attack Pattern

The study identified what researchers called "stealthy parasitism," where an AI agent completes a user's task while simultaneously advancing an attacker's objective. For example, stealthy parasitism caused by a prompt injection attack could subtly influence product recommendations, steering users toward a particular item without any obvious signs that the system had been compromised.

The researchers concluded that prompt-injection security in deployable web agents is not a scalar property of the backbone model but a distribution of harm jointly determined by the affected stakeholder, the semantic alignment between the injected objective and the user's task, and the architectural context in which the backbone is deployed.

FAQ

What did researchers find about AI agent security on Thursday?

Researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign published a study on Thursday finding that AI agents powered by GPT-5 and Gemini 2.5-Flash cannot consistently resist prompt injection attacks, with direct attacks succeeding more than 79% of the time.

What is stealthy parasitism in AI agent attacks?

Stealthy parasitism is a pattern identified in the study where an AI agent completes a user's task while simultaneously advancing an attacker's objective, such as subtly influencing product recommendations without obvious signs of compromise.

How many attack simulations did researchers conduct?

The research team conducted 3,168 attack simulations using NanoBrowser and BrowserUse with GPT-5 and Gemini 2.5-Flash to test AI agent responses to prompt injection attacks.

Disclaimer: The information on this page may come from third-party sources and is for reference only. It does not represent the views or opinions of Gate and does not constitute any financial, investment, or legal advice. Virtual asset trading involves high risk. Please do not rely solely on the information on this page when making decisions. For details, see the Disclaimer.
Comment
0/400
No comments