Capabilities
In this section, we describe the different capabilities that agentic AI systems may have. We start with three broad categories of capabilities - cognitive, interaction, and operational - and break it down into more granular capabilities.
Click here for a downloadable version.
Cognitive Capabilities
Cognitive capabilities encompass the agentic AI system's internal "thinking" skills – how it analyses information, forms plans, learns from experience, and monitors its own performance.
-
Reasoning & Problem-Solving: The capability to perform structured, multi-step reasoning that demonstrates deeper understanding, problem-solving, and decision-making. This includes deconstructing complex prompts, making inferences, and drawing conclusions from available information. An example would be analysing a customer complaint by first identifying the core issue, then considering possible causes, and finally proposing a solution based on company policies.
-
Planning & Goal Management: The capability to develop detailed, step-by-step, and executable plans with specific tasks in response to broad instructions. This includes prioritising activities based on importance and dependencies between tasks, monitoring how well its plan is working, and adjusting when circumstances change or obstacles arise. For instance, when asked to "organise a team meeting," the agentic AI system should be able to break this down into steps like checking calendars, booking a room, sending invites, and preparing an agenda, as well as automatically finding alternative timeslots and updating the details if a required participant becomes unavailable.
-
Agent Delegation: The capability to assign subtasks to other agents and coordinate their activities to achieve broader goals. This includes identifying which components are best suited for specific tasks, issuing clear instructions, managing inter-agent dependencies, and monitoring performance or failures. For example, an agentic AI system tasked with generating a market research report may delegate data collection to a web-scraping agent, assign analysis to a statistical reasoning agent, and pass the results to a report generation agent to draft the final document, while coordinating timing and verifying outputs from each.
-
Tool Use: The capability to evaluate available options and choose the best tool for specific subtasks. This requires agents to understand the capabilities and limitations of different tools, match them appropriately to the tasks, check that the work completed is satisfactory (and requesting corrections when not), and combine multiple outputs into a coherent final result. For example, when processing a document, an agentic system might choose an OCR tool for scanned images but a direct text parser for digital PDFs.
Interaction Capabilities
Interaction capabilities describe how the agentic AI system exchanges information with users, other agents, and external systems. These capabilities below are broadly differentiated based on how and what they interact with.
-
Natural Language Communication: The capability to fluently and meaningfully converse with human users, handling a wide range of situations such as explaining complex topics, generating documents or prose, or discussing issues with human users. It produces natural-sounding text which matches the appropriate tone, formality level, and communication style for the situation. For instance, an agentic knowledge bot should be able to adjust its response and tone depending on the content and nuances of the user's query and provide a tailored and sensitive response.
-
Multimodal Understanding & Generation: The capability to to take in image, audio, or video inputs and / or generate image, audio, or video outputs. This includes analysing visual information, transcribing speech, or creating multimedia content as needed. As an example, it might analyze a chart in an uploaded image to extract data trends, or generate a diagram to illustrate a complex process being discussed.
-
Official Communication: The capability to compose and directly publish communications that formally represent an organisation to external parties (e.g., customers, partners, regulators, courts, media) via approved channels and formats without human oversight or approval. For instance, an AI customer service agent may be tasked to directly respond to simple queries coming into the company's customer service portal without requiring a human operator to review the response.
-
Business Transactions: The capability to execute transactions that involve exchanging money, services, or commitments with external parties. It can process payments, make reservations, and handle other business transactions within authorized limits. For instance, it might book travel arrangements for employees, process refunds for customers, or automatically pay recurring vendor invoices.
-
Internet & Search Access: The capability to access and search the Internet for services or resources, especially for up-to-date information to supplement its knowledge and provide more accurate answers. For example, an agentic market research system, when asked about recent market trends, could search financial news sites and compile the latest relevant information. Note that this capability specifically refers to external internet resources, not accessing private organisational documents or internal systems.
-
Computer Use: The capability to directly control a computer interface by moving the mouse, clicking buttons, and typing on behalf of the user. It can navigate applications and perform tasks that require interacting with graphical user interfaces. For example, it might automatically fill out online forms, navigate through software menus to change settings, or perform repetitive data entry tasks.
-
Other Programmatic Interfaces: The capability to interact with external systems through APIs, SDKs, or backend services. This includes sending and receiving data via RESTful APIs, pushing code to a remote repository, or invoking cloud services to retrieve or manipulate information from other systems. For instance, a coding assistant agent may call GitHub’s API to fetch issue tickets, query a documentation API to retrieve function specifications, or push commits to a version control system as part of an automated development workflow.
Operational Capabilities
Operational capabilities focus on the agentic AI system's ability to execute actions safely and efficiently within its operating environment.
-
Agent Communication: The capability to communicate with other agents within the system, either through natural language or a predefined protocol, and to coordinate with other agents to accomplish complex tasks that require multiple specialties. This would include sending instructions, sharing information, and coordinating workflows with other agents in the system. For example, a customer service agent may communicate with a billing specialist agent to resolve a payment dispute, sharing relevant account details and receiving updated billing information.
-
Code Execution: The capability to write, execute, and debug code in various programming languages to automate tasks or solve computational problems. It can work with different programming languages and understand code outputs to determine if programs ran successfully. For example, it might write a Python script to analyze a dataset and then execute it to generate insights and visualizations.
-
File & Data Management: The capability to create, read, modify, organise, convert, query, and update information across both unstructured files (e.g., PDFs, Word docs, spreadsheets) and structured data stores (e.g., SQL/NoSQL databases, data warehouses, vector stores). For example, it can find and merge documents, extract and transform data in CSV files, write and execute database queries, or move files to different locations.
-
System Management: The capability to adjust system configurations, manage computing resources, and handle technical infrastructure tasks. This includes monitoring system performance, securely handle authentication information and access controls, and making optimizations as needed while maintaining security best practices. For instance, it might allocate additional memory to a slow-running process or configure network settings for a new application, and it might rotate API keys on a schedule or grant temporary access permissions to team members for specific projects.