Baseline Risks
In this section, we list the baseline risks arising from (i) the components of an agent and (ii) the design of an agentic system. This list is intended as a reference and is not exhaustive. Clicking on any of the risks will bring you to the page with the corresponding controls.
Click here for a downloadable version.
List of Baseline Risks
- Denison et al. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models. https://arxiv.org/abs/2406.10162, 2024. Accessed: 2025-07-21.
- Zhang et al. Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. https://arxiv.org/abs/2505.00212, 2025. Accessed: 2025-07-21.
- See Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks (Li et al., 2025) and Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents (Yang et al., 2024).
- Bowen et al. Scaling Trends for Data Poisoning in LLMs. https://arxiv.org/abs/2408.02946v6, 2024. Accessed: 2025-07-21.
- See Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies (Vineeth Sai Narajala and Idan Habler, 2025), MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol (Jing et al., 2025), and Model Context Protocol (MCP): Understanding security risks and controls (Florencio Cano Gabarda, 2025).
- Bargury, Michael. MCP: Untrusted Servers and Confused Clients, Plus a Sneaky Exploit. https://www.mbgsec.com/archive/2025-05-03-mcp-untrusted-servers-and-confused-clients-plus-a-sneaky-exploit-embrace-the-red/, 2025. Accessed: 2025-07-21.
- See Multi-Agent Systems Execute Arbitrary Malicious Code (Triedman et al., 2025) and CVE-2024-7042.
- Rehberger, Johann. Plugin Vulnerabilities: Visit a Website and Have Your Source Code Stolen. https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/, 2023. Accessed: 2025-07-21.
- Bondarenko et al. Demonstrating Specification Gaming in Reasoning Models. https://arxiv.org/abs/2502.13295, 2025. Accessed: 2025-07-21.
- In What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts, Yang et al. (2025) show that underspecified prompts are twice as likely to regress across model or prompt changes, with accuracy drops exceeding 20% in some cases.
- See Control Illusion: The Failure of Instruction Hierarchies in Large Language Models (Geng et al., 2025) and IHEval: Evaluating Language Models on Following the Instruction Hierarchy (Zhang et al., 2025).
- See One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems (Chang et al., 2025) and PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models (Zou et al., 2025) for text-based knowledge bases; see PoisonedEye: Knowledge Poisoning Attack on Retrieval-Augmented Generation based Large Vision-Language Models (Zhang et al., 2025) for image-based knowledge bases.
- Shanmugarasa et al. SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation. https://arxiv.org/abs/2506.12699v2, 2025. Accessed: 2025-07-21.
- Huang et al. On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents. https://arxiv.org/abs/2408.00989v3, 2025. Accessed: 2025-07-21.
- Peigné-Lefebvre et al. Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems. https://arxiv.org/pdf/2502.19145, 2025. Accessed: 2025-07-27.
- Chen et al. AI Agents Are Here. So Are the Threats. Palo Alto Networks Unit 42 blog. https://unit42.paloaltonetworks.com/agentic-ai-threats/, 2025. Accessed: 2025-07-27.
- Goutham A S. Escaping Reality: Privilege Escalation in Gen AI Admin Panel (aka The Chaos of a Misconfigured Admin Panel). Medium blog. https://cyberweapons.medium.com/escaping-reality-privilege-escalation-in-gen-ai-admin-panel-aka-the-chaos-of-a-misconfigured-b6ad73bf1b65, 2024. Accessed: 2025-07-27.
- Chan et al. Visibility into AI Agents. ACM FAccT 2024. https://arxiv.org/abs/2401.13138, 2024. Accessed: 2025-08-04.