Introduction to AI Agents
You’ve probably heard — AI agents are all the rage.
But what are they?
AI agents are autonomous programs that interact with their environment, make decisions, process information, respond to feedback, and take actions to achieve specific goals… all without human intervention!
Traditional agents were limited by heuristic rules, so they struggled to generalize across diverse tasks and situations.
But LLMs change everything.
LLMs handle complex tasks and generalize well.
Because of their emergent abilities, a single LLM can handle a multitude of tasks and collect diverse types of feedback to improve decision making.
LLMs are the “brain” powering next-generation AI agents.
Agents are particularly exciting because of these characteristics:
Automation of Complex Tasks
AI agents can handle complex and repetitive tasks, improving efficiency while reducing errors. They leverage interaction with their environment to learn faster and to decompose complex tasks into sub-tasks.
Human-like Interactions
AI agents can interact in human-like conversation, making them great in roles that involve customer interaction, such as support.
Agents can plan how to solve complex issues, efficiently wielding multiple tools (e.g. APIs, knowledge bases).
Adaptability
AI agents learn from their environments and experiences, adapting behaviors to better perform their tasks.
This adaptability makes them great for dynamic environments where conditions and requirements frequently change.
Scalability
AI agents can be deployed en masse, allowing for scalable operations.
Unified Architecture for AI Agents
The paper I’m covering introduces a unified architecture for LLM-powered AI agents, consisting of 4 modules:
Profile
Memory
Planning
Action
Each module serves a specific purpose, such as defining the agent’s identity, learning from past interactions, or planning future actions.
This unified architecture is important because it maximizes an LLM’s ability to solve complex tasks, letting the agent draw on the vast knowledge and experience embedded in the model.
Below, I dive into each module.
Graphic By: Sabrina Ramonov @ sabrina.dev
Profile
The Profile module assigns the AI agent’s role and personality, which heavily impacts the agent’s decisions and planning.
This goes hand in hand with prompt engineering techniques, such as telling LLMs, “You are an expert deep learning researcher. What are the…”
Agent Identity
Whether it’s an engineer, teacher, or psychologist, each agent is assigned a profile that influences how it behaves.
Just like describing a person, an agent’s profile consists of demographic information, personality traits, social elements, and their role or purpose.
Each profile is specific to the agent’s task or objective.
Ways to Create Identity
There are several ways to generate the profile:
Handcrafted: generate identities through manually crafted prompts, e.g. give one AI agent the prompt, “You are an introvert.”
LLM Generated: provide a few seed profiles to an LLM, then ask the LLM to generate new profiles. This leverages few-shot learning to bootstrap the profiles from the initial seed profiles. Compared to handcrafted profiles, this approach is faster but less precise.
Dataset Alignment: generate profiles from real-world datasets, e.g. dataset of human participants in a research study.
You can mix and match profile creation mechanisms depending on your task.
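To make the handcrafted approach concrete, here’s a minimal Python sketch. The AgentProfile class and all of its fields are my own illustrative assumptions, not something from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    """Handcrafted profile: role, personality traits, demographics, and objective."""
    role: str
    traits: list[str] = field(default_factory=list)
    demographics: str = ""
    objective: str = ""

    def to_system_prompt(self) -> str:
        # Render the profile as a system prompt that conditions the LLM.
        parts = [f"You are a {self.role}."]
        if self.demographics:
            parts.append(self.demographics)
        if self.traits:
            parts.append("Personality: " + ", ".join(self.traits) + ".")
        if self.objective:
            parts.append(f"Your goal: {self.objective}")
        return " ".join(parts)

# Handcrafted identity, analogous to prompting "You are an introvert."
agent = AgentProfile(
    role="customer support specialist",
    traits=["patient", "introverted", "detail-oriented"],
    objective="resolve billing issues in as few messages as possible.",
)
print(agent.to_system_prompt())
```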
Memory
The Memory module enables the agent to learn from past interactions with its environment in order to adapt and formulate better plans.
Storage
Inspired by the human brain, there are 2 types of agent memory storage:
Unified: short-term memory only. This is typically equivalent to in-context learning and is usually part of the prompt. For example, in an LLM-powered game, you store a monster’s health in the prompt.
Hybrid: combines short-term memory and long-term memory.
Unified storage is sufficient for simpler tasks.
Because it’s usually embedded in the prompt, it’s constrained by the LLM’s context window and degrades in performance as memory grows.
Hybrid storage’s dual-memory system is the best fit for more complex tasks.
Short-term memory stores recent perceptions and context about the agent’s current situation. It allows the agent to respond to immediate changes and demands in its environment.
Long-term memory stores the agent's past behaviors and thoughts. Information is consolidated over time and can be retrieved based on its relevance to current events.
Daily memories are encoded as embeddings, preserved in a vector database.
When agents need to access past memories, the long-term memory system uses embedding similarities to retrieve related information.
Hybrid storage helps maintain consistency in the agent’s behavior and improves planning and decision-making by learning from past experiences.
Just like us, humans!
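Here’s a minimal sketch of the long-term half of hybrid storage: memories are embedded, stored, and retrieved by similarity. The embed function below is a toy stand-in (random but hash-seeded vectors) for a real embedding model, just so the example runs:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model (e.g., a sentence encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)  # unit vector, so dot product = cosine similarity

class LongTermMemory:
    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def write(self, memory: str) -> None:
        # Encode the memory and persist it alongside its embedding.
        self.texts.append(memory)
        self.vectors.append(embed(memory))

    def read(self, query: str, k: int = 3) -> list[str]:
        # Retrieve the k most similar past memories via embedding similarity.
        q = embed(query)
        scores = [float(v @ q) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.texts[i] for i in top]

memory = LongTermMemory()
memory.write("Unit tests failed after adding the caching feature.")
memory.write("Customer asked for a refund on order #1234.")
print(memory.read("what broke the build?", k=1))
```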
Formats
There are multiple format options for memory storage.
The most common formats:
List: structured lists are useful to capture information hierarchies
Database: useful for efficient manipulation of memories via SQL query
Natural Language: flexible format, preserving semantic information
Embedding Vectors: most efficient for memory retrieval
Operations
The agent interacts with its memory via 3 operations:
Reading: plan smarter actions by extracting meaningful information from memory, typically based on recency, relevance, and importance
Writing: store information about the environment in memory, providing a foundation for future retrieval and learning.
Reflection: synthesize memories into broader insights, similar to how we reflect on decisions and notice emerging themes and patterns
Traditional LLMs are in a static environment, while LLM-powered AI agents live in a dynamic environment where they learn from past behavior.
This is why memory is key.
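One common way to implement the reading operation, popularized by the “Generative Agents” work, is to rank memories by a weighted sum of recency, relevance, and importance. A minimal sketch, with made-up weights and decay rate:

```python
import math

def read_score(age_hours: float, relevance: float, importance: float,
               w_recency: float = 1.0, w_relevance: float = 1.0,
               w_importance: float = 1.0) -> float:
    """Rank a memory for retrieval as a weighted sum of three signals:
    - age_hours: how long ago the memory was written (recency)
    - relevance: embedding similarity to the current query (e.g., 0-1)
    - importance: a rating assigned when the memory was written (e.g., 0-1)
    """
    recency = math.exp(-0.1 * age_hours)  # exponential decay with age
    return w_recency * recency + w_relevance * relevance + w_importance * importance

# A fresh, highly relevant memory outranks an old, marginal one.
print(read_score(age_hours=2, relevance=0.9, importance=0.5))
print(read_score(age_hours=200, relevance=0.4, importance=0.5))
```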
Planning
The Planning module enables AI agents to deconstruct complex tasks into simpler subtasks that can be solved individually.
Sound familiar? This is also how humans think and plan!
Without Feedback
Planning without feedback is precisely what it sounds like:
AI agents don’t receive any feedback to influence future behaviors. They don’t learn whether or not an action was effective.
These are the main ways agents plan without feedback:
Chain-of-Thought (single path)
Multi-path CoT (tree or graph)
External planner (e.g., LLM output fed to a classical planner via the Planning Domain Definition Language, PDDL)
This works for simple tasks with a small number of reasoning steps.
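To illustrate the first two: single-path CoT is just a prompt pattern, and one simple multi-path variant (self-consistency) samples several reasoning chains and takes a majority vote. In this sketch, the sample callable is an assumed stand-in for a single LLM call:

```python
from collections import Counter

def cot_prompt(task: str) -> str:
    """Single-path CoT: ask for one reasoning chain ending in a plan."""
    return f"Task: {task}\nThink step by step, then output a numbered plan."

def multi_path_plan(sample, task: str, n: int = 5) -> str:
    """Multi-path CoT in its simplest (self-consistency) form: sample n
    independent reasoning chains and keep the most common final plan.
    `sample(prompt) -> str` runs the LLM once and returns its answer."""
    answers = [sample(cot_prompt(task)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```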
With Feedback
But for complex tasks, planning with feedback is much more robust.
There are 3 types of feedback AI agents can receive:
Environment: feedback from the objective world or a virtual environment, e.g. the agent plans to add a feature to a codebase, runs the unit tests, and something breaks; that failure is feedback from its environment
Human: feedback from humans helps align the agent with human preferences and reduce hallucinations
Model: agent generates output, asks other LLMs to provide feedback, then the agent refines its output based on the feedback
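The model-feedback loop is easy to sketch: generate, critique with another LLM, refine. Here, llm(prompt) -> str is an assumed stand-in for any chat model call, and the DONE convention is my own illustrative choice:

```python
def refine_with_model_feedback(llm, task: str, rounds: int = 3) -> str:
    """Generate an output, ask an LLM critic for feedback, and refine."""
    draft = llm(f"Complete this task:\n{task}")
    for _ in range(rounds):
        feedback = llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            "Critique this draft. If it is already good, reply exactly DONE."
        )
        if feedback.strip() == "DONE":
            break  # critic is satisfied; stop refining
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{feedback}\n"
            "Rewrite the draft, addressing the feedback."
        )
    return draft
```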
Action
The Action module translates the AI agent’s decisions into desired outcomes.
This module is directly influenced by all others. The agent’s profile, memory, and planning modules shape what action is taken and how it’s carried out.
This module has 4 subparts:
Goal - what does the agent want to achieve?
Production - how should the agent execute the action?
Space - what are all possible actions the agent could take?
Impact - what are the consequences of the action?
Below, I dive into each part.
Goal - Desired Outcomes
Every action has a goal, a set of intended or desired outcomes.
Common examples are:
Task completion: e.g. in a survival game, craft armor
Communication: e.g. ask another LLM or a human for feedback
Exploring environment: e.g. in a strategy game, uncover fog of war
Production - How Action is Taken
With ordinary LLMs, inputs and outputs are directly linked.
With AI agents, actions can be taken via different strategies.
Recall that AI agents interact with and learn from their dynamic environment, analyze information, receive feedback, and may juggle multiple objectives. There are different methods to synthesize all this knowledge into action.
The most common strategies are:
Action via Memory Recollection: agent checks past experiences completing this task and analyzes what actions were taken to achieve successful outcomes.
Action via Plan Following: agent follows pregenerated plans.
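The two strategies compose naturally: check memory first, fall back to the plan. A minimal sketch, assuming the memory object exposes the read() method from the earlier long-term memory example:

```python
def produce_action(task: str, memory, plan: list[str]) -> str:
    """Pick the next action via memory recollection when a similar past
    experience exists, otherwise via plan following."""
    recalled = memory.read(task, k=1)
    if recalled:
        # A similar past experience exists: reuse what worked before.
        return f"replay strategy from memory: {recalled[0]}"
    if plan:
        # Otherwise take the next step of the pregenerated plan.
        return plan.pop(0)
    return "no action available"
```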
Space - Possible Actions
In any situation, there are many possible actions an AI agent can take.
There are roughly 2 main categories of potential actions:
Call External Tools
Agents can call external APIs, databases, knowledge bases, or other models.
This is effective for domain-specific use cases that require niche knowledge. External tools can also help reduce hallucinations.
Calling external models can be helpful for specialized tasks.
For example, an agent calls a specialized LLM trained for a specific niche task, like separating encoding and querying in retrieval tasks.
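In its simplest form, tool use is a dispatch table: the agent emits a structured tool call and the runtime executes it. The JSON convention and both fake tools below are my own illustrative assumptions; real frameworks (e.g., OpenAI function calling) formalize this pattern:

```python
import json

# Tool registry: names the agent can call, mapped to real functions.
TOOLS = {
    "get_weather": lambda city: json.dumps({"city": city, "temp_c": 21}),  # fake API
    "search_kb": lambda query: f"Top knowledge base article for: {query}",  # fake KB
}

def execute_tool_call(llm_output: str) -> str:
    """Dispatch a tool call emitted by the LLM, assumed to be JSON
    shaped like {"tool": "...", "arg": "..."}."""
    call = json.loads(llm_output)
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"unknown tool: {call['tool']}"
    return tool(call["arg"])

print(execute_tool_call('{"tool": "get_weather", "arg": "Lisbon"}'))
```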
Use Self-Knowledge Only
In this approach, the agent relies on its own internal knowledge only, without calling any external tools.
Impact - Consequences
Every action has an impact or consequences.
Common examples are:
Changing the agent’s environment
Updating the agent’s memory with new information
Triggering new actions
Conclusion
The graphic below summarizes all key modules and features in this unified architecture for AI agents:
Graphic By: Sabrina Ramonov @ sabrina.dev
Like any other tool, AI agents face their own set of challenges: data privacy concerns, ethical questions, technical complexity in implementation and troubleshooting, and heavy compute requirements.
Nonetheless, LLM-powered AI agents are a significant evolution from traditional agents, which were previously limited and inflexible.
LLMs enable AI agents to handle complex tasks and generalize much better, making them effective in dynamic environments and scalable operations.