Sabrina Ramonov

Generative AI Agents - Everything You Need to Know (Part 1)

Definition of Generative AI Agents

Generative AI agents are the next big thing!

But what are they?

In this 2-part series, I demystify generative AI agents and share real examples.

By the end of this series, you will have a clear understanding of:

  • definition of generative AI agents

  • how to design effective AI agents

  • how to equip AI agents with tools

  • how to build AI agents WITHOUT knowing how to code

In a future post, I’ll share popular technical frameworks like LangChain.

But I love starting with low-code tools because they’re accessible to everyone. No technical background required.

There’s also a YouTube version of this post.

What are Generative AI Agents?

Generative AI agents are a combination of 2 concepts:

  • generative AI

  • AI agents

Let’s start by defining an AI agent:


The key question distinguishing an agent vs. non-agent:

Can the AI autonomously get feedback from interactions with its environment in order to make better decisions?

  • If yes → agent

  • If no → not agent

Generative AI agents are agents powered by generative AI, such as large language models (LLMs), e.g. Llama3 or ChatGPT.

Compared to non-generative agents, generative AI agents are able to solve more complex and diverse problems. They’re more adaptable and flexible. They respond in believable language, thanks to the power of LLMs.

Typically, when you interact with ChatGPT, you run a prompt, review the output manually, and notice a few things to improve:

[Screenshot: ChatGPT’s answer]

In this case, perhaps the answer is too long for a social media post.

You iterate by interacting with ChatGPT:

Make it shorter for a social media post.

[Screenshots: ChatGPT’s revised, shorter answer]

This interaction between you and ChatGPT leads to better answers, aligned with your intent and needs.

But ChatGPT didn’t arrive at a better answer autonomously. You had to tell ChatGPT what to change.

In contrast, generative AI agents are autonomous, operating without human intervention. Leveraging LLMs, generative agents can ask and answer their own questions, which serves as “feedback” to further improve their outputs.

Generative AI agents can think about their own answers, judge their quality, and try to create better answers until they’re finally satisfied with the quality.

This means you don’t have to sit there, constantly interacting with ChatGPT to iterate towards better answers.
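This generate–critique–revise loop can be sketched in a few lines of Python. Here `call_llm` is a hypothetical stand-in for a real model call (it just shortens the draft so the example runs deterministically); the point is the shape of the loop: the agent judges its own output and keeps revising with no human in the loop.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call (substitute your API client).
    # It halves the draft's length so the sketch is deterministic.
    draft = prompt.split("DRAFT:", 1)[1].strip()
    words = draft.split()
    return " ".join(words[: max(1, len(words) // 2)])

def critique(draft: str, max_words: int = 10) -> bool:
    # The agent judges its own output: is it short enough for social media?
    return len(draft.split()) <= max_words

def self_improving_agent(task: str, max_iters: int = 5) -> str:
    draft = call_llm(f"Write a social media post. DRAFT: {task}")
    for _ in range(max_iters):
        if critique(draft):      # agent is satisfied -> stops autonomously
            break
        # Feed its own critique back in -- no human intervention.
        draft = call_llm(f"Make it shorter. DRAFT: {draft}")
    return draft
```

A real agent would replace both `call_llm` and `critique` with LLM calls, but the control flow stays the same.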

For the rest of this post, the term “agents” refers to “generative AI agents”.

Designing Effective AI Agents

To design effective agents, consider these aspects:

  1. identity

  2. memory

  3. planning

  4. narrow scope

  5. use of external tools

  6. collaboration with other agents

We’ll dive into each below.

1. Identity

The identity you give an agent directly impacts the quality of its answers.

For example, compare these 2 prompts and answers:

  • What is Llama3?

  • You are a 6th grade science teacher. What is Llama3?

[Screenshots: ChatGPT’s answers to both prompts]

With a brief 1-sentence identity assignment, ChatGPT assumed the role of a 6th grade science teacher. It completely changed the:

  • content

  • writing style

  • tone (it even started with a joke)

  • comprehension level

  • amount of technical detail

Be mindful of the identity and context you give each agent.

Experiment a lot.

Identity makes a big difference.
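In chat-style APIs, identity is typically passed as a system message ahead of the user’s question. A minimal sketch, assuming the common OpenAI-style message format (adapt to your provider):

```python
def build_messages(identity: str, question: str) -> list[dict]:
    # The identity goes in the system message; it frames every answer after it.
    return [
        {"role": "system", "content": identity},
        {"role": "user", "content": question},
    ]

# Same question, different identity -> very different answer from the model.
messages = build_messages(
    "You are a 6th grade science teacher.",
    "What is Llama3?",
)
```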

2. Narrow Scope

Research has shown that providing LLMs with excessive information or context decreases output quality and increases hallucinations.

It’s crucial to keep the scope narrow.

Give each agent a single identity and sufficient context to succeed, but not so much context that the agent loses sight of what’s important.

When designing agents, ensure each agent has one specific goal.

Instead of a single agent trying to do everything, assemble a team of agents, where each agent has one goal, focusing on a narrow specialty.

For example, if you’re using agents for coding tasks, start by assembling a team of technical agents with distinct responsibilities:

  • one agent writes an architecture plan

  • one agent writes the code

  • one agent tests the code

Don’t overwhelm your agent with tools, either.

Equip each agent with only the critical tools it needs to achieve its goal. Access to too many tools is counterproductive because your agent may get confused about which tool to use.
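The narrow-scope coding team above can be wired up as a simple pipeline: each agent has one identity and one goal, and each output feeds the next agent. `run_agent` here is a hypothetical stand-in for a real LLM call with the identity as system prompt:

```python
def run_agent(identity: str, task: str) -> str:
    # Hypothetical stand-in for an LLM call; a real system would pass the
    # identity as a system prompt and the task as the user message.
    return f"[{identity}] output for: {task}"

def coding_team(feature: str) -> str:
    # Each agent has exactly one goal and a narrow specialty.
    plan = run_agent("architect", f"Write an architecture plan for {feature}")
    code = run_agent("developer", f"Write code following: {plan}")
    report = run_agent("tester", f"Test this code: {code}")
    return report
```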

3. Memory

Memory plays a huge role in many real-world agent systems.

Inspired by human memory constructs, agent memory enables agents to remember past actions and decisions, learn from them via reflection, and apply these learnings to future decisions.

Memory is incredibly powerful for agents, allowing them to get better and better, learning over time.

Short-term memory starts fresh each time an agent runs.

Long-term memory, typically stored in a database, allows agents to learn from previous runs. Each time an agent finishes a run, the agent reflects on its work, trying to figure out how to improve.

When faced with a new task, agents query these reflection insights in order to make better and more reliable decisions.
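The short-term vs. long-term split might look like this in code. Long-term memory is just a Python list here; in a real system it would live in a database or vector store, and reflection would be another LLM call rather than a hard-coded string:

```python
class AgentMemory:
    def __init__(self):
        self.long_term: list[str] = []   # persists across runs (a DB in practice)
        self.short_term: list[str] = []

    def start_run(self):
        self.short_term = []             # short-term memory starts fresh each run

    def record(self, event: str):
        self.short_term.append(event)

    def reflect(self):
        # After a run, distill this run's events into a reusable insight.
        insight = f"Learned from {len(self.short_term)} events this run"
        self.long_term.append(insight)

    def recall(self) -> list[str]:
        # Before a new task, query past reflections to make better decisions.
        return self.long_term

memory = AgentMemory()
memory.start_run()
memory.record("tried tool A, failed")
memory.record("tool B worked")
memory.reflect()

memory.start_run()   # short-term memory is wiped, long-term survives
```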

4. Planning

Planning is another key aspect of agent design.

Many tasks can’t be accomplished in a single step.

For complex tasks, you may not be able to specify all the necessary sub-tasks ahead of time.

With planning, you let your agent decide dynamically.

Let the agent think through the steps it needs to take to accomplish the goal.

For example, suppose you have a QA agent reviewing customer service replies drafted by another agent.

The QA agent finds that part of the reply can’t be verified against the general knowledge base. It then decides to search the company’s technical documentation, hosted outside the knowledge base, for supporting information. When one source doesn’t have what it’s looking for, it thinks about what to do next and decides to search another source.

Planning is a powerful design pattern to increase the overall flexibility and adaptability of your agent systems.

Empower each agent to analyze the available information, its goal, and the tools it can leverage, then formulate a reasonable plan of specific, discrete steps it can take.

5. Collaboration

Remember how interacting with ChatGPT leads to better answers? The back-and-forth conversation helps ChatGPT learn what you’re looking for.

The same is true for agents.

Agents can collaborate with each other, talk to each other, incorporate feedback from one another, and together iterate towards better answers.

I’ll discuss this more when we get to multi-agent systems.

6. Tools

Agents can also leverage tools, which massively extend their capabilities, such as calling external APIs, collecting data, and searching the internet.

This makes agents especially powerful and extensible.

We’ll dive into tools in the next section.

In summary, take these aspects into serious consideration in order to design effective agents and agent teams:

  1. identity

  2. memory

  3. planning

  4. narrow scope

  5. use of external tools

  6. collaboration with other agents

Empowering Agents with Tools

Tools play a big role in agent systems.

Tools allow agents to communicate with the external world, including APIs, your company internal systems and databases, and the internet.

For example, LangChain supports a long list of built-in tool integrations.

Low-code agent builder platforms offer many out-of-the-box integrations with tools for common use cases, including:

  • operations

  • marketing

  • research

  • sales

For example, imagine an AI agent tasked with outbound sales prospecting.

To be maximally effective, the agent should have tools to:

  • enrich lead information

  • look up the lead’s company and website

  • read and scrape the company website

  • Google search for further research

An agent equipped with these tools will be far more productive and reliable than an agent without.
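At its simplest, a tool is a name, a description the agent can read when choosing what to invoke, and a function to call. A minimal sketch of equipping the prospecting agent above (the tool bodies are hypothetical placeholders, not real integrations):

```python
def enrich_lead(name: str) -> str:
    return f"enriched profile for {name}"    # placeholder for a real API call

def scrape_website(url: str) -> str:
    return f"page text from {url}"           # placeholder for a real scraper

# The agent sees the names and descriptions, then picks a tool to invoke.
TOOLS = {
    "enrich_lead": ("Enrich lead information", enrich_lead),
    "scrape_website": ("Read and scrape a company website", scrape_website),
}

def use_tool(tool_name: str, arg: str) -> str:
    description, fn = TOOLS[tool_name]
    return fn(arg)
```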

But here’s the challenge:

Agent outputs are text-based and probabilistic, but most external tools require a structured input, such as JSON.

You need to convert your agent’s output into the structured format required by most tools.

But, because of the probabilistic nature of LLMs, outputs may vary and break the connection, resulting in errors or exceptions.

Even though I explicitly ask ChatGPT for JSON output, there is a non-zero probability that it will occasionally return invalid JSON.

At the 20th run, ChatGPT mixes LaTeX and JSON without escaping!

If I didn’t have robust error handling in place, this would’ve crashed my app.

When designing agent systems, think about what should happen when the agent runs into a failure or exception using a tool:

  • should the agent stop?

  • should the agent try again?

Look for tools with built-in error handling and retry mechanisms that return helpful error messages to your agent (which may then decide on a new plan).
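A common defensive pattern for this: try to parse the model’s output, and on failure, retry with the parse error fed back so the model can correct itself. A sketch, where `generate` is any function that returns text (stubbed here so the example is self-contained):

```python
import json

def parse_json_with_retry(generate, prompt: str, max_retries: int = 3) -> dict:
    """Ask for JSON; on invalid output, retry with the error as feedback."""
    attempt_prompt = prompt
    for _ in range(max_retries):
        raw = generate(attempt_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            # Feed the parse error back so the model can fix its own output.
            attempt_prompt = (
                f"{prompt}\nYour last reply was invalid JSON ({err}). "
                "Return only valid JSON."
            )
    raise ValueError("model never returned valid JSON")

# Stub model: fails once (unescaped LaTeX-style backslash), then succeeds.
replies = iter([r'{"eq": "\alpha"}', '{"eq": "one half"}'])
result = parse_json_with_retry(lambda p: next(replies), "Return JSON")
```

With this in place, an occasional bad output becomes a retry instead of a crash.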

What’s Next?

In part 2 of this series, I’ll show you how to build an agent for sales prospecting as well as a multi-agent system for content creation. I plan to use low-code tools, requiring no coding or technical background.

I’ll also discuss multi-agent systems and share a bunch of promising agent-building platforms for you to explore.

Stay tuned!

Have a question about agents?

Hit reply, and I’ll do my best to help!