Skip to content

Baseline Controls

Here we provide a list of technical controls to mitigate the baseline risks discussed in the previous section. As our aim here is to provide sensible and concrete recommendations for teams, we do not include all potential mitigation measures here.

Click here for a downloadable version.

List of Baseline Controls

Component Risk Control
LLM Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles. Review the LLM's system card for potential alignment issues before using the LLM for more complex tasks.
LLM Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles. Integrate an explicit safety constraint layer (e.g., policy engine or constitutional rules) that overrides unsafe outputs at runtime.
LLM Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles. Maintain human‑in‑the‑loop approval for any high‑impact or irreversible actions.
LLM Weaker LLMs have a higher tendency to produce unpredictable outputs which make agent behaviour erratic. Prioritise LLMs with stronger performance in instruction following and other related benchmarks.
LLM Weaker LLMs have a higher tendency to produce unpredictable outputs which make agent behaviour erratic. Continuously monitor and log outputs, triggering alerts when behaviour drifts from tested baselines.
LLM LLMs with poor safety tuning are more susceptible to prompt injection attacks and jailbreaking attempts. Implement input sanitisation measures or limit inputs to conventional ASCII characters only.
LLM Using LLMs trained on poisoned or biased data introduces manipulation risk, discriminatory decisions, or misinformation. Do not use LLMs from unknown or untrusted sources, even if it is available on public platforms.
Tools Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions. Do not use tools which do not implement robust authentication protocols.
Tools Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions. Conduct periodic audits to validate that tool actions match the appropriate user permissions.
Tools Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded. Do not use tools from unknown or untrusted sources, even if it is available on public platforms.
Tools Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded. Test third‑party tools in hardened sandboxes with syscall/network egress restrictions before using them in production environments.
Tools Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks. Enforce strict schema validation (e.g. JSON Schema, protobuf) and reject non‑conforming inputs upstream.
Tools Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks. Escape or encode user inputs when embedding into tool prompts or commands.
Tools Tools that demand broader permissions than necessary create unnecessary attack surfaces for malicious actors. Conduct periodic least‑privilege reviews and automated permission drift detection.
Instructions Simplistic instructions with narrow metrics and without broader constraints may result in agents engaging in specification gaming, resulting in poor performance or safety violations. Define multi‑objective success criteria incorporating safety, ethics, and usability metrics.
Instructions Simplistic instructions with narrow metrics and without broader constraints may result in agents engaging in specification gaming, resulting in poor performance or safety violations. Conduct adversarial evaluation to surface gaming behaviours and iterate on instruction design.
Instructions Simplistic instructions with narrow metrics and without broader constraints may result in agents engaging in specification gaming, resulting in poor performance or safety violations. Continuously monitor and log agents' outputs, triggering alerts when behaviour drifts from tested baselines.
Instructions Vague instructions may compel agents to attempt to fill in missing constraints, resulting in unpredictable actions or incorrect steps taken. Ask the agent to summarise its understanding and request clarification before proceeding.
Instructions Vague instructions may compel agents to attempt to fill in missing constraints, resulting in unpredictable actions or incorrect steps taken. Test instructions with scenario‑based evaluations to reveal ambiguities for refinement.
Instructions Instructions without a clear distinction between system prompts and user requests may confuse agents and result in greater vulnerability to prompt injection attacks. Signpost system prompts with clear tags (e.g. XML) to distinguish between system prompts and user inputs.
Memory Malicious actors can inject false or misleading facts into the knowledge base, resulting in the agent acting on incorrect data or facts. Periodically run audits that reconcile stored facts against trusted external references, with a flag for discrepancies.
Memory Agents may inadvertently store sensitive user or organisational data from prior interactions, resulting in data privacy risks. Encrypt memory at rest and restrict access via fine‑grained access controls and audit logs.
Memory Agents may mistakenly save momentary glitches and hallucinations into memory, resulting in compounding mistakes when the agent relies on the incorrect information for its decision or actions. Schedule periodic memory reconciliation where human reviewers or external tools flag anomalies.
Agentic Architecture In linear agentic pipelines where each stage blindly trusts the previous stage, single early mistakes may be propagated and magnified. Insert validation checkpoints between stages that verify assumptions and reject invalid outputs.
Agentic Architecture In linear agentic pipelines where each stage blindly trusts the previous stage, single early mistakes may be propagated and magnified. Design feedback loops enabling later stages to roll back or request correction from earlier stages.
Agentic Architecture In hub-and-spoke architectures which route all decisions through one controller agent, any bug or compromise may distributes faulty instructions across the entire system. Apply circuit‑breakers that freeze propagation when anomalous behaviour is detected.
Agentic Architecture More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents. Implement end‑to‑end distributed tracing with unique request IDs across all agents and tool calls.
Agentic Architecture More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents. Write immutable, tamper‑evident audit logs that capture prompts, responses, and tool invocations.
Roles and Access Controls Unauthorised actors can impersonate agents and gain access to restricted resources. Maintain trusted registry of agents and authenticate agents using strong, verifiable credentials.
Roles and Access Controls Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles. Apply Principle of Least Privilege (PoLP) when configuring all agent and delegation roles.
Roles and Access Controls Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles. Apply strict access controls and validate agent roles for requests.
Roles and Access Controls Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles. Ensure fine-grained, scoped tokens or credentials where possible.
Roles and Access Controls Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles. Use time-bound or one-time-use credentials where possible.
Monitoring and Traceability Lack of monitoring results in delayed detection of agent failures. Implement real-time monitoring of agent status, actions, and performance metrics, paired with automated alerting mechanisms that notify operators of anomalies, errors, or inactivity.
Monitoring and Traceability Lack of traceability inhibit proper audit of decision-making paths in the event of failures. Record comprehensive logs of agent actions, inputs, outputs, and inter-agent communications, tagged with unique trace identifiers to reconstruct full decision-making paths.