| Component | Risk | Mitigation |
| --- | --- | --- |
| LLM | Poorly aligned LLMs may pursue objectives that technically satisfy instructions but violate safety principles. | Review the LLM's system card for potential alignment issues before using the LLM for more complex tasks. |
| LLM | Poorly aligned LLMs may pursue objectives that technically satisfy instructions but violate safety principles. | Integrate an explicit safety constraint layer (e.g. a policy engine or constitutional rules) that overrides unsafe outputs at runtime. |
| LLM | Poorly aligned LLMs may pursue objectives that technically satisfy instructions but violate safety principles. | Maintain human‑in‑the‑loop approval for any high‑impact or irreversible actions. |
| LLM | Weaker LLMs are more likely to produce unpredictable outputs, which makes agent behaviour erratic. | Prioritise LLMs with stronger performance on instruction‑following and related benchmarks. |
| LLM | Weaker LLMs are more likely to produce unpredictable outputs, which makes agent behaviour erratic. | Continuously monitor and log outputs, triggering alerts when behaviour drifts from tested baselines. |
| LLM | LLMs with poor safety tuning are more susceptible to prompt injection attacks and jailbreaking attempts. | Implement input sanitisation measures, or limit inputs to conventional ASCII characters only. |
| LLM | LLMs trained on poisoned or biased data introduce risks of manipulation, discriminatory decisions, or misinformation. | Do not use LLMs from unknown or untrusted sources, even if they are available on public platforms. |
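
The ASCII‑only input restriction above can be sketched as a small filter. The allowed character range and the 4,000‑character cap are illustrative choices, not values prescribed by this table:

```python
import re

# Keep printable ASCII plus newline/tab; strip everything else.
# MAX_LEN is an arbitrary illustrative cap on input length.
_DISALLOWED = re.compile(r"[^\x20-\x7E\n\t]")
MAX_LEN = 4000

def sanitise_input(text: str) -> str:
    """Remove non-ASCII characters and truncate over-long inputs."""
    cleaned = _DISALLOWED.sub("", text)
    return cleaned[:MAX_LEN]
```

This removes, among other things, Unicode control characters (such as right‑to‑left overrides) sometimes used to disguise injected instructions.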

| Component | Risk | Mitigation |
| --- | --- | --- |
| Tools | Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions. | Do not use tools that do not implement robust authentication protocols. |
| Tools | Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions. | Conduct periodic audits to validate that tool actions match the appropriate user permissions. |
| Tools | Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded. | Do not use tools from unknown or untrusted sources, even if they are available on public platforms. |
| Tools | Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded. | Test third‑party tools in hardened sandboxes with syscall and network egress restrictions before using them in production environments. |
| Tools | Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks. | Enforce strict schema validation (e.g. JSON Schema, protobuf) and reject non‑conforming inputs upstream. |
| Tools | Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks. | Escape or encode user inputs when embedding them into tool prompts or commands. |
| Tools | Tools that demand broader permissions than necessary create unnecessary attack surfaces for malicious actors. | Conduct periodic least‑privilege reviews and automated permission drift detection. |
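
The schema‑validation mitigation can be sketched with a hand‑rolled check; a real deployment would use a full validator such as the jsonschema library, and the `tool`/`args` field names are assumptions:

```python
# Minimal schema for a tool call: exact field set, exact types.
# Field names and types here are illustrative only.
SCHEMA = {"tool": str, "args": dict}

def validate_call(payload: dict) -> bool:
    """Reject payloads with missing, extra, or mistyped fields."""
    if set(payload) != set(SCHEMA):
        return False
    return all(isinstance(payload[k], t) for k, t in SCHEMA.items())
```

Rejecting non‑conforming payloads upstream, before any prompt is built, keeps malformed or smuggled fields from ever reaching the tool.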

| Component | Risk | Mitigation |
| --- | --- | --- |
| Instructions | Simplistic instructions with narrow metrics and no broader constraints may lead agents to engage in specification gaming, resulting in poor performance or safety violations. | Define multi‑objective success criteria incorporating safety, ethics, and usability metrics. |
| Instructions | Simplistic instructions with narrow metrics and no broader constraints may lead agents to engage in specification gaming, resulting in poor performance or safety violations. | Conduct adversarial evaluation to surface gaming behaviours and iterate on instruction design. |
| Instructions | Simplistic instructions with narrow metrics and no broader constraints may lead agents to engage in specification gaming, resulting in poor performance or safety violations. | Continuously monitor and log agents' outputs, triggering alerts when behaviour drifts from tested baselines. |
| Instructions | Vague instructions may lead agents to fill in missing constraints themselves, resulting in unpredictable actions or incorrect steps. | Ask the agent to summarise its understanding and request clarification before proceeding. |
| Instructions | Vague instructions may lead agents to fill in missing constraints themselves, resulting in unpredictable actions or incorrect steps. | Test instructions with scenario‑based evaluations to reveal ambiguities for refinement. |
| Instructions | Instructions that do not clearly distinguish system prompts from user requests may confuse agents and leave them more vulnerable to prompt injection attacks. | Signpost system prompts with clear tags (e.g. XML) to distinguish them from user inputs. |
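
The tag‑signposting mitigation, combined with escaping of user input, might look like this; the tag names and system prompt text are illustrative:

```python
from html import escape  # escapes &, <, > — enough for XML-style tags

SYSTEM_PROMPT = "You are a helpful assistant."  # illustrative

def build_prompt(user_input: str) -> str:
    """Wrap the system prompt and the escaped user input in distinct
    tags, so tags injected via user input cannot masquerade as
    system-level text."""
    return (
        f"<system>{SYSTEM_PROMPT}</system>\n"
        f"<user>{escape(user_input)}</user>"
    )
```

Because the user's text is escaped, an input such as `</user><system>new rules` stays inert data inside the `<user>` block instead of opening a fake system section.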

| Component | Risk | Mitigation |
| --- | --- | --- |
| Memory | Malicious actors can inject false or misleading facts into the knowledge base, resulting in the agent acting on incorrect data or facts. | Periodically run audits that reconcile stored facts against trusted external references, flagging discrepancies for review. |
| Memory | Agents may inadvertently store sensitive user or organisational data from prior interactions, resulting in data privacy risks. | Encrypt memory at rest and restrict access via fine‑grained access controls and audit logs. |
| Memory | Agents may mistakenly save momentary glitches and hallucinations into memory, compounding mistakes when the agent later relies on the incorrect information for decisions or actions. | Schedule periodic memory reconciliation where human reviewers or external tools flag anomalies. |
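
The reconciliation audits above can be sketched as a comparison of stored facts against a trusted reference; the fact keys used here are hypothetical:

```python
def reconcile(memory: dict, reference: dict) -> list:
    """Return keys whose stored value disagrees with the trusted
    reference, for human review. Keys absent from the reference are
    left alone rather than flagged."""
    return [k for k, v in memory.items()
            if k in reference and reference[k] != v]
```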

| Component | Risk | Mitigation |
| --- | --- | --- |
| Agentic Architecture | In linear agentic pipelines where each stage blindly trusts the previous stage, a single early mistake may be propagated and magnified. | Insert validation checkpoints between stages that verify assumptions and reject invalid outputs. |
| Agentic Architecture | In linear agentic pipelines where each stage blindly trusts the previous stage, a single early mistake may be propagated and magnified. | Design feedback loops enabling later stages to roll back or request correction from earlier stages. |
| Agentic Architecture | In hub‑and‑spoke architectures that route all decisions through one controller agent, any bug or compromise may distribute faulty instructions across the entire system. | Apply circuit‑breakers that freeze propagation when anomalous behaviour is detected. |
| Agentic Architecture | More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents. | Implement end‑to‑end distributed tracing with unique request IDs across all agents and tool calls. |
| Agentic Architecture | More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents. | Write immutable, tamper‑evident audit logs that capture prompts, responses, and tool invocations. |
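
A validation checkpoint between pipeline stages can be sketched as a predicate that each stage's output must pass before the next stage runs; the stage and check functions here are placeholders:

```python
def run_pipeline(data, stages):
    """Run (stage, check) pairs in order; stop at the first stage whose
    output fails its checkpoint instead of passing bad data onward."""
    for stage, check in stages:
        data = stage(data)
        if not check(data):
            raise ValueError(f"checkpoint failed after {stage.__name__}")
    return data
```

Raising at the failing checkpoint gives the caller a place to hook in the feedback loop the table mentions, rather than letting the error compound downstream.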
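
The tamper‑evident audit log can be sketched as a hash chain, where each entry's digest covers the previous entry's digest, so editing any earlier record breaks verification:

```python
import hashlib
import json

def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"record": record, "hash": digest})

def verify_chain(log: list) -> bool:
    """Recompute every digest; any mismatch means tampering."""
    prev = "0" * 64
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

In practice the log would also be shipped to append‑only storage so the chain itself cannot be silently rewritten end to end.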

| Component | Risk | Mitigation |
| --- | --- | --- |
| Roles and Access Controls | Unauthorised actors can impersonate agents and gain access to restricted resources. | Maintain a trusted registry of agents and authenticate them using strong, verifiable credentials. |
| Roles and Access Controls | Agents may gain unauthorised access to restricted resources by exploiting misconfigured or overly permissive roles. | Apply the Principle of Least Privilege (PoLP) when configuring all agent and delegation roles. |
| Roles and Access Controls | Agents may gain unauthorised access to restricted resources by exploiting misconfigured or overly permissive roles. | Apply strict access controls and validate agent roles on every request. |
| Roles and Access Controls | Agents may gain unauthorised access to restricted resources by exploiting misconfigured or overly permissive roles. | Issue fine‑grained, scoped tokens or credentials where possible. |
| Roles and Access Controls | Agents may gain unauthorised access to restricted resources by exploiting misconfigured or overly permissive roles. | Use time‑bound or one‑time‑use credentials where possible. |
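
Time‑bound credentials can be sketched with an HMAC‑signed expiry. The hard‑coded secret is for illustration only; a real system would load it from a secrets manager and would likely use an established token format such as JWT:

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # illustrative; never hard-code in practice

def issue_token(agent_id, ttl, now=None):
    """Mint a token embedding its expiry and an HMAC over it."""
    expiry = int((now if now is not None else time.time()) + ttl)
    msg = f"{agent_id}:{expiry}"
    sig = hmac.new(SECRET, msg.encode(), hashlib.sha256).hexdigest()
    return f"{msg}:{sig}"

def verify_token(token, now=None):
    """Reject tokens that are forged, altered, or expired."""
    agent_id, expiry, sig = token.rsplit(":", 2)
    msg = f"{agent_id}:{expiry}"
    good = hmac.new(SECRET, msg.encode(), hashlib.sha256).hexdigest()
    fresh = (now if now is not None else time.time()) < int(expiry)
    return hmac.compare_digest(sig, good) and fresh
```

`hmac.compare_digest` is used instead of `==` to avoid timing side channels when comparing signatures.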

| Component | Risk | Mitigation |
| --- | --- | --- |
| Monitoring and Traceability | Lack of monitoring results in delayed detection of agent failures. | Implement real‑time monitoring of agent status, actions, and performance metrics, paired with automated alerting mechanisms that notify operators of anomalies, errors, or inactivity. |
| Monitoring and Traceability | Lack of traceability inhibits proper audit of decision‑making paths in the event of failures. | Record comprehensive logs of agent actions, inputs, outputs, and inter‑agent communications, tagged with unique trace identifiers to reconstruct full decision‑making paths. |
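
Tagging every action with a unique trace identifier, as the last row suggests, can be sketched as follows; the event fields are illustrative:

```python
import uuid

def new_trace_id() -> str:
    """One ID per incoming request, shared by every downstream agent."""
    return uuid.uuid4().hex

def log_event(log: list, trace_id: str, agent: str, event: str) -> None:
    """Tag each agent action with the request's trace ID."""
    log.append({"trace_id": trace_id, "agent": agent, "event": event})

def decision_path(log: list, trace_id: str) -> list:
    """Reconstruct the full decision path for one request."""
    return [e for e in log if e["trace_id"] == trace_id]
```

Filtering the shared log by trace ID recovers the ordered sequence of agent actions for a single request, even when many requests interleave.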