Baseline Controls

Here we provide a list of technical controls to mitigate the baseline risks discussed in the previous section. As our aim here is to provide sensible and concrete recommendations for teams, we do not include all potential mitigation measures here.

Click here for a downloadable version.

List of Baseline Controls

Component	Risk	Control
LLM	Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles.	Review the LLM's system card for potential alignment issues before using the LLM for more complex tasks.
LLM	Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles.	Integrate an explicit safety constraint layer (e.g., policy engine or constitutional rules) that overrides unsafe outputs at runtime.
LLM	Poorly aligned LLMs may pursue objectives which technically satisfy instructions but violate safety principles.	Maintain human‑in‑the‑loop approval for any high‑impact or irreversible actions.
LLM	Weaker LLMs have a higher tendency to produce unpredictable outputs which make agent behaviour erratic.	Prioritise LLMs with stronger performance in instruction following and other related benchmarks.
LLM	Weaker LLMs have a higher tendency to produce unpredictable outputs which make agent behaviour erratic.	Continuously monitor and log outputs, triggering alerts when behaviour drifts from tested baselines.
LLM	LLMs with poor safety tuning are more susceptible to prompt injection attacks and jailbreaking attempts.	Implement input sanitisation measures or limit inputs to conventional ASCII characters only.
LLM	Using LLMs trained on poisoned or biased data introduces manipulation risk, discriminatory decisions, or misinformation.	Do not use LLMs from unknown or untrusted sources, even if it is available on public platforms.
Tools	Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions.	Do not use tools which do not implement robust authentication protocols.
Tools	Poorly implemented tools may not correctly verify user identity or permissions when executing privileged actions.	Conduct periodic audits to validate that tool actions match the appropriate user permissions.
Tools	Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded.	Do not use tools from unknown or untrusted sources, even if it is available on public platforms.
Tools	Rogue tools that mimic legitimate ones can contain hidden malicious code that executes when loaded.	Test third‑party tools in hardened sandboxes with syscall/network egress restrictions before using them in production environments.
Tools	Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks.	Enforce strict schema validation (e.g. JSON Schema, protobuf) and reject non‑conforming inputs upstream.
Tools	Tools that do not properly sanitise or validate inputs can be exploited through prompt injection attacks.	Escape or encode user inputs when embedding into tool prompts or commands.
Tools	Tools that demand broader permissions than necessary create unnecessary attack surfaces for malicious actors.	Conduct periodic least‑privilege reviews and automated permission drift detection.
Instructions	Simplistic instructions with narrow metrics and without broader constraints may result in agents engaging in specification gaming, resulting in poor performance or safety violations.	Define multi‑objective success criteria incorporating safety, ethics, and usability metrics.
Instructions	Simplistic instructions with narrow metrics and without broader constraints may result in agents engaging in specification gaming, resulting in poor performance or safety violations.	Conduct adversarial evaluation to surface gaming behaviours and iterate on instruction design.
Instructions	Simplistic instructions with narrow metrics and without broader constraints may result in agents engaging in specification gaming, resulting in poor performance or safety violations.	Continuously monitor and log agents' outputs, triggering alerts when behaviour drifts from tested baselines.
Instructions	Vague instructions may compel agents to attempt to fill in missing constraints, resulting in unpredictable actions or incorrect steps taken.	Ask the agent to summarise its understanding and request clarification before proceeding.
Instructions	Vague instructions may compel agents to attempt to fill in missing constraints, resulting in unpredictable actions or incorrect steps taken.	Test instructions with scenario‑based evaluations to reveal ambiguities for refinement.
Instructions	Instructions without a clear distinction between system prompts and user requests may confuse agents and result in greater vulnerability to prompt injection attacks.	Signpost system prompts with clear tags (e.g. XML) to distinguish between system prompts and user inputs.
Memory	Malicious actors can inject false or misleading facts into the knowledge base, resulting in the agent acting on incorrect data or facts.	Periodically run audits that reconcile stored facts against trusted external references, with a flag for discrepancies.
Memory	Agents may inadvertently store sensitive user or organisational data from prior interactions, resulting in data privacy risks.	Encrypt memory at rest and restrict access via fine‑grained access controls and audit logs.
Memory	Agents may mistakenly save momentary glitches and hallucinations into memory, resulting in compounding mistakes when the agent relies on the incorrect information for its decision or actions.	Schedule periodic memory reconciliation where human reviewers or external tools flag anomalies.
Agentic Architecture	In linear agentic pipelines where each stage blindly trusts the previous stage, single early mistakes may be propagated and magnified.	Insert validation checkpoints between stages that verify assumptions and reject invalid outputs.
Agentic Architecture	In linear agentic pipelines where each stage blindly trusts the previous stage, single early mistakes may be propagated and magnified.	Design feedback loops enabling later stages to roll back or request correction from earlier stages.
Agentic Architecture	In hub-and-spoke architectures which route all decisions through one controller agent, any bug or compromise may distributes faulty instructions across the entire system.	Apply circuit‑breakers that freeze propagation when anomalous behaviour is detected.
Agentic Architecture	More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents.	Implement end‑to‑end distributed tracing with unique request IDs across all agents and tool calls.
Agentic Architecture	More complex agentic architectures may make it difficult to fully reconstruct decision processes across multiple agents.	Write immutable, tamper‑evident audit logs that capture prompts, responses, and tool invocations.
Roles and Access Controls	Unauthorised actors can impersonate agents and gain access to restricted resources.	Maintain trusted registry of agents and authenticate agents using strong, verifiable credentials.
Roles and Access Controls	Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles.	Apply Principle of Least Privilege (PoLP) when configuring all agent and delegation roles.
Roles and Access Controls	Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles.	Apply strict access controls and validate agent roles for requests.
Roles and Access Controls	Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles.	Ensure fine-grained, scoped tokens or credentials where possible.
Roles and Access Controls	Agents may gain unauthorized access to restricted resources by exploiting misconfigured or overly permissive roles.	Use time-bound or one-time-use credentials where possible.
Monitoring and Traceability	Lack of monitoring results in delayed detection of agent failures.	Implement real-time monitoring of agent status, actions, and performance metrics, paired with automated alerting mechanisms that notify operators of anomalies, errors, or inactivity.
Monitoring and Traceability	Lack of traceability inhibit proper audit of decision-making paths in the event of failures.	Record comprehensive logs of agent actions, inputs, outputs, and inter-agent communications, tagged with unique trace identifiers to reconstruct full decision-making paths.