
Baseline Risks

In this section, we list the baseline risks arising from (i) the components of an agent and (ii) the design of an agentic system. This list is intended as a reference and is not exhaustive. Clicking on any of the risks will bring you to the next page with the corresponding controls.

Click here for a downloadable version.

List of Baseline Risks

| Component | Risk |
| --- | --- |
| LLM | Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles.[1] |
| LLM | Weaker LLMs have a higher tendency to produce unpredictable outputs, making agent behaviour erratic.[2] |
| LLM | LLMs with poor safety tuning are more susceptible to prompt injection attacks and jailbreaking attempts.[3] |
| LLM | Using LLMs trained on poisoned or biased data introduces risks of manipulation, discriminatory decisions, or misinformation.[4] |
| Tools | Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions.[5] |
| Tools | Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded.[6] |
| Tools | Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks (see the tool-validation sketch after this table).[7] |
| Tools | Tools that demand broader permissions than necessary create an unnecessarily large attack surface for malicious actors.[8] |
| Instructions | Simplistic instructions with narrow metrics and no broader constraints may lead agents to engage in specification gaming, resulting in poor performance or safety violations.[9] |
| Instructions | Vague instructions may compel agents to fill in missing constraints themselves, resulting in unpredictable actions or incorrect steps.[10] |
| Instructions | Instructions without a clear distinction between system prompts and user requests may confuse agents and increase vulnerability to prompt injection attacks.[11] |
| Memory | Malicious actors can inject false or misleading facts into the knowledge base, causing the agent to act on incorrect data or facts.[12] |
| Memory | Agents may inadvertently store sensitive user or organisational data from prior interactions, creating data privacy risks.[13] |
| Memory | Agents may mistakenly save momentary glitches and hallucinations into memory, compounding mistakes when the agent later relies on the incorrect information for its decisions or actions (see the memory sketch after this table). |
| Agentic Architecture | In linear agentic pipelines where each stage blindly trusts the previous stage, a single early mistake may be propagated and magnified (see the pipeline sketch after this table).[14] |
| Agentic Architecture | In hub-and-spoke architectures which route all decisions through one controller agent, any bug or compromise may distribute faulty instructions across the entire system.[15] |
| Agentic Architecture | More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents. |
| Roles and Access Controls | Unauthorised actors can impersonate agents and gain access to restricted resources.[16] |
| Roles and Access Controls | Agents may gain unauthorised access to restricted resources by exploiting misconfigured or overly permissive roles.[17] |
| Monitoring and Traceability | Lack of monitoring results in delayed detection of agent failures and downstream risks.[18] |
| Monitoring and Traceability | Lack of traceability inhibits proper audit of decision-making paths in the event of failures.[17] |
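
To make the tool-related risks above more concrete, here is a minimal Python sketch of a record-lookup tool that neither verifies the caller's permissions nor validates its input, alongside a guarded variant. The tool, roles and data are hypothetical and are not drawn from this document; the sketch only illustrates the failure pattern, not a prescribed control.

```python
import re

# Hypothetical record store, used only for illustration.
FAKE_RECORDS = {"42": "Alice Tan, alice@example.com, balance: 1,200"}

def lookup_record_unsafe(agent_role: str, query: str) -> str:
    # Risk: no permission check, so any agent role can read any record.
    # Risk: the raw record is returned verbatim, so text planted inside a
    # record ("ignore previous instructions...") flows straight back into
    # the agent's context (indirect prompt injection).
    return FAKE_RECORDS.get(query, "not found")

def lookup_record_guarded(agent_role: str, query: str) -> str:
    # Least privilege: only roles explicitly allowed may call this tool.
    if agent_role != "support_agent":
        raise PermissionError(f"role '{agent_role}' may not read customer records")
    # Input validation: accept only the narrow format the tool expects.
    if not re.fullmatch(r"\d{1,8}", query):
        raise ValueError("query must be a numeric record ID")
    record = FAKE_RECORDS.get(query, "not found")
    # Mark retrieved content as data, not instructions, before it re-enters
    # the agent's context.
    return f"<untrusted_record>{record}</untrusted_record>"

if __name__ == "__main__":
    print(lookup_record_unsafe("marketing_agent", "42"))  # succeeds despite the wrong role
    print(lookup_record_guarded("support_agent", "42"))   # permitted, wrapped as untrusted data
```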
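
The memory risks can be illustrated in the same spirit. The sketch below, again with hypothetical names and data, shows how an unchecked write path lets a one-off hallucination persist as a "fact" that later decisions build on, next to one possible gating pattern.

```python
# Hypothetical long-term memory, used only for illustration.
long_term_memory: list[str] = []

def remember_unchecked(statement: str) -> None:
    # Risk: every model statement is persisted, so momentary glitches and
    # hallucinations become durable "facts".
    long_term_memory.append(statement)

def remember_with_gate(statement: str, confidence: float, threshold: float = 0.9) -> None:
    # One possible pattern: persist a statement only if it passes some
    # verification signal (here a placeholder confidence score).
    if confidence >= threshold:
        long_term_memory.append(statement)

if __name__ == "__main__":
    # A hallucinated policy from a single bad generation...
    remember_unchecked("Refunds are allowed up to 10 years after purchase.")
    # ...is now retrieved and reused in every later interaction, compounding the error.
    print(long_term_memory)
    remember_with_gate("Refunds are allowed up to 30 days after purchase.", confidence=0.95)
    print(long_term_memory)
```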
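
Finally, the pipeline sketch below illustrates the linear-architecture risk: each stage blindly trusts the previous stage, so a single parsing mistake early on is magnified by the time an action is proposed. The stages, values and currency are hypothetical.

```python
def extract_order_total(email_text: str) -> float:
    # Stage 1: a brittle parser mis-reads "SGD 120.50" as 12050 (the early mistake).
    digits = "".join(ch for ch in email_text if ch.isdigit())
    return float(digits)

def compute_refund(order_total: float) -> float:
    # Stage 2: trusts stage 1 without any plausibility check.
    return round(order_total * 0.5, 2)

def draft_refund_action(refund: float) -> str:
    # Stage 3: trusts stage 2 and prepares an irreversible action.
    return f"Issue refund of SGD {refund} to the customer"

if __name__ == "__main__":
    email = "Customer paid SGD 120.50 and requests a partial refund."
    total = extract_order_total(email)                  # 12050.0 instead of 120.50
    print(draft_refund_action(compute_refund(total)))   # proposes a wildly inflated refund
```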


  1. Denison et al. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models. https://arxiv.org/abs/2406.10162, 2024. Accessed: 2025-07-21. 

  2. Zhang et al. Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. https://arxiv.org/abs/2505.00212, 2025. Accessed: 2025-07-21. 

  3. See Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks (Li et al, 2025) and Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents (Yang et al, 2024).

  4. Bowen et al. Scaling Trends for Data Poisoning in LLMs. https://arxiv.org/abs/2408.02946v6, 2024. Accessed: 2025-07-21. 

  5. See Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies (Vineeth Sai Narajala and Idan Habler, 2025), MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol (Jing et al, 2025), and Model Context Protocol (MCP): Understanding security risks and controls (Florencio Cano Gabarda, 2025).

  6. Bargury, Michael. MCP: Untrusted Servers and Confused Clients, Plus a Sneaky Exploit. https://www.mbgsec.com/archive/2025-05-03-mcp-untrusted-servers-and-confused-clients-plus-a-sneaky-exploit-embrace-the-red/, 2025. Accessed: 2025-07-21. 

  7. See Multi-Agent Systems Execute Arbitrary Malicious Code (Triedman et al, 2025) and CVE-2024-7042.

  8. Rehberger, Johann. Plugin Vulnerabilities: Visit a Website and Have Your Source Code Stolen. https://embracethered.com/blog/posts/2023/chatgpt-plugin-vulns-chat-with-code/, 2023. Accessed: 2025-07-21. 

  9. Bondarenko et al. Demonstrating specification gaming in reasoning models. https://arxiv.org/abs/2502.13295, 2025. Accessed: 2025-07-21. 

  10. In What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts, Yang et al (2025) show that underspecified prompts are two times more likely to regress over model or prompt changes, with accuracy drops exceeding 20% in some cases. 

  11. See Control Illusion: The Failure of Instruction Hierarchies in Large Language Models (Geng et al, 2025) and IHEval: Evaluating Language Models on Following the Instruction Hierarchy (Zhang et al, 2025).

  12. See One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems (Chang et al, 2025) and PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models (Zou et al, 2025) for text-based knowledge bases; see PoisonedEye: Knowledge Poisoning Attack on Retrieval-Augmented Generation based Large Vision-Language Models (Zhang et al, 2025) for image-based knowledge bases. 

  13. Shanmugarasa et al. SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation. https://arxiv.org/abs/2506.12699v2, 2025. Accessed: 2025-07-21. 

  14. Huang et al. On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents. https://arxiv.org/abs/2408.00989v3, 2025. Accessed: 2025-07-21. 

  15. Peigné-Lefebvre et al. Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems. https://arxiv.org/pdf/2502.19145, 2025. Accessed: 2025-07-27. 

  16. Chen et al. AI Agents Are Here. So Are the Threats. Palo Alto Networks Unit 42 blog, May 1 2025. https://unit42.paloaltonetworks.com/agentic-ai-threats/. Accessed: 2025-07-27. 

  17. Goutham A S. Escaping Reality: Privilege Escalation in Gen AI Admin Panel (aka The Chaos of a Misconfigured Admin Panel). Medium blog, Sept 23 2024. https://cyberweapons.medium.com/escaping-reality-privilege-escalation-in-gen-ai-admin-panel-aka-the-chaos-of-a-misconfigured-b6ad73bf1b65. Accessed: 2025-07-27. 

  18. Chan et al. Visibility into AI Agents. ACM FAccT 2024, 2024. https://arxiv.org/abs/2401.13138, Accessed: 2025-08-04.