AI Agent Simulation: Coin Collector Version 1

Building the Foundation of Survival-Themed Agent Simulations

Lately, I’ve been playing the open-world survival games Valheim and Palworld.

Super fun!

My long-term goal is to build a similar survival-themed AI agent simulation.

Here’s my first step 🚀 

I built a simulation where an AI agent awakens stranded on a randomly generated desert island, tasked with collecting coins.

The entire simulation runs in my terminal:

Sabrina Ramonov @ sabrina.dev

From left to right:

  • island map

    • B = agent (named after my puppy Bubble)

    • C = coin

    • T = tree

    • blue wave = water

    • gray square = island

  • game details

    • score = # of coins collected

    • steps

    • current action

  • last thought

    • agent generates a thought which governs its next action

  • last feedback

    • feedback received via interaction from the environment

    • e.g. last time you tried to move left, but there was a tree

Check out the YouTube video to see the AI agent simulation in action!

Simulation Architecture

Here’s how the simulation works.

The agent’s internal graph state has 3 nodes:

  • act

  • move (up, down, left, right)

  • collect (pick up coin located in same position)

When an agent acts, it first thinks about what to do, then selects an action, such as move or collect.

After executing the action, the agent’s state returns to act.

Thus, act ←→ move and act ←→ collect are bidirectional edges.
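The act ←→ move / act ←→ collect loop can be sketched as a plain-Python state machine. This is an illustrative stand-in, not the author's code: in the real simulation these nodes live in a LangGraph graph and the act node calls ChatGPT, whereas here the decision and movement policies are hard-coded.

```python
# Minimal sketch of the 3-node graph: act decides, move/collect
# execute, and control always returns to act (bidirectional edges).

def act(state):
    # Decide the next node: collect if standing on the coin, else move.
    if state["agent_pos"] == state["coin_pos"]:
        return "collect"
    return "move"

def move(state):
    # Stand-in policy: step one cell toward the coin along x, then y.
    ax, ay = state["agent_pos"]
    cx, cy = state["coin_pos"]
    if ax != cx:
        ax += 1 if cx > ax else -1
    elif ay != cy:
        ay += 1 if cy > ay else -1
    state["agent_pos"] = (ax, ay)
    return "act"  # control returns to act

def collect(state):
    state["score"] += 1
    return "act"  # control returns to act

def run(state, max_steps=20):
    nodes = {"act": act, "move": move, "collect": collect}
    node = "act"
    for _ in range(max_steps):
        if state["score"] > 0:  # stop once the coin is collected
            break
        node = nodes[node](state)
    return state

state = {"agent_pos": (0, 0), "coin_pos": (2, 1), "score": 0}
run(state)
```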


The agent uses ChatGPT for pathfinding, which presents challenges.

I previously explored LLM-powered pathfinding and its complexities in this newsletter series: part 1, part 2, part 3.

If the agent tries to move but encounters a tree (obstacle), then that is feedback from the environment. When planning its next action, the agent recalls this feedback. However, memory is currently limited to 1 time step.

When the agent picks up a coin, it gets feedback from the environment that a new coin has spawned on the island.

I highlight the term feedback because it’s essential to generative AI agents.

Using LangGraph and LangSmith

I’m using LangGraph and LangSmith to build this simulation.

LangGraph is a Python library for building complex and stateful gen AI apps.

It’s useful when you need an AI agent to perform complex multi-step tasks and utilize memory across interactions. Here are its key features:

  1. State Management: keep track of information across multiple interactions

  2. Workflow Design: create graphs of the agent’s decision-making process

  3. LangChain Integration: access to many tools and capabilities

LangSmith is an LLM Ops platform to monitor, debug, QA, and evaluate your LLM-powered apps. I like this write-up explaining the need for a full-lifecycle LLM Ops platform to help manage the probabilistic nature of LLMs.

Decider Prompt for Thought Generation

Here is my initial DECIDER_PROMPT; it generates the Thought governing the agent’s next action (i.e. move or collect).

The prompt knows the agent’s current position {agent_pos} and the coin’s position {coin_pos}, each represented as coordinates (x, y).

DECIDER_PROMPT = """
Your current location is {agent_pos}.

The coin location is {coin_pos}.

To make a decision, first generate a thought that will govern your future action. The thought should be based on your goal and your position on the island. Explicitly specify what influenced your thought.

Take a deep breath and generate your thought step by step.
"""

Previously, I tried letting the agent decide on an action without first generating a thought.

But this led to worse results. So I introduced this “thought generation” step.

Anthropic Prompt Generator

Next, I try Anthropic’s Prompt Generator to improve DECIDER_PROMPT.

You can see the system message and Anthropic-optimized prompt below, both fed into LangGraph:

In the YouTube video, you can see the challenges with this prompt.

For instance, it weighs the “last feedback you received” too heavily.

When there is no “last feedback”, the agent gets confused. It keeps moving away from the coin, rather than towards it! Precisely the opposite of its goal.

I tweak the prompt a little bit, instructing ChatGPT to ignore last_feedback if there is none.

Much better!
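A sketch of that tweak, using a hypothetical helper rather than the author’s exact prompt: build the prompt conditionally, so the last_feedback section is omitted entirely when there is no feedback to report.

```python
# Hypothetical prompt builder: the feedback section only appears
# when there actually is feedback, so the model can't fixate on an
# empty field.

def build_decider_prompt(agent_pos, coin_pos, last_feedback=None):
    parts = [
        f"Your current location is {agent_pos}.",
        f"The coin location is {coin_pos}.",
    ]
    if last_feedback:  # skip the section entirely when there is none
        parts.append(f"Last feedback you received: {last_feedback}")
        parts.append("Weigh this feedback when choosing your move.")
    parts.append("Generate a thought that will govern your next action.")
    return "\n\n".join(parts)
```

With no feedback, the prompt simply never mentions it; with feedback, it is included along with an instruction to use it.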


Obstacle Avoidance

Obstacle avoidance is a major challenge.

For those unfamiliar with obstacle avoidance, it is the problem of an AI learning how to move around without bumping into immovable objects (AKA obstacles).

Imagine a toddler learning to walk in a room full of toys. The toddler must figure out how to step around blocks and dolls to get to their favorite stuffed animal on the other side of the room.

In my simulation, trees and water are the obstacles.

Using LLMs for pathfinding with very limited memory (only 1 time step) is difficult because, as shown in the complex map below, the agent often gets stuck between 2 trees.


Memory

But, shouldn’t memory help?

Yes, most likely.

But right now, I’ve only implemented 1-step memory.

It doesn’t work well when the agent is surrounded by multiple trees.

For example, in this map, there are trees to the left and right of the agent B.


The agent tries moving left, but runs into a tree.

It remembers this feedback from the environment to plan its next action.

So, it decides to move right… yikes, another tree!


But, the agent only has 1-step memory.

So, it’s already forgotten about the tree on the left!

It tries moving left again! Back and forth, the agent attempts to move, again and again, left and right, stuck between the 2 trees.

Without longer-term memory, pathfinding to the coin while surrounded by numerous obstacles is quite challenging.
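The back-and-forth above can be reproduced with a toy greedy policy: a stand-in for the LLM agent, with made-up coordinates, that only remembers its single most recent failure. Because hitting the right tree overwrites the memory of the left tree, the agent oscillates forever.

```python
# Toy reproduction of the 1-step-memory oscillation (not the real
# LLM agent). The agent sits between two trees with the coin to its
# left, and only remembers the most recent blocked move.

def next_move(agent_x, coin_x, last_feedback):
    # Greedy: head toward the coin, unless last step's feedback says
    # that exact direction was blocked -- then try the other way.
    preferred = "left" if coin_x < agent_x else "right"
    if last_feedback == f"blocked {preferred}":
        return "right" if preferred == "left" else "left"
    return preferred

def simulate(steps=6):
    agent_x, coin_x = 5, 0          # coin is to the agent's left
    trees = {4, 6}                  # trees on both sides of the agent
    last_feedback = None            # 1-step memory
    history = []
    for _ in range(steps):
        move = next_move(agent_x, coin_x, last_feedback)
        target = agent_x - 1 if move == "left" else agent_x + 1
        if target in trees:
            last_feedback = f"blocked {move}"  # overwrites prior memory
        else:
            agent_x = target
            last_feedback = None
        history.append(move)
    return history

print(simulate())  # alternates left, right, left, right, ...
```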

Closing Words

So far, I’m enjoying building my agent simulation framework from scratch.

However, I’m tempted to allow agents to fall back on traditional ML methods for pathfinding, just so I can make progress in other parts of my simulation.

Next week, I’ll share my Love is Blind + Love Island multi-agent dating simulation. It’s pretty addictive working on the prompts to generate realistic dating profiles. Thankfully, no pathfinding involved! 😅 

Stay tuned!

Sabrina Ramonov

P.S. If you’re enjoying the free newsletter, it’d mean the world to me if you share it with others. My newsletter just launched, and every single referral helps. Thank you!