AI Agent Simulation: Coin Collector Version 1
Building the Foundation of Survival-Themed Agent Simulations
Super fun!
My long-term goal is to build a similar survival-themed AI agent simulation.
Here's my first step:
I built a simulation where an AI agent wakes up stranded on a randomly generated desert island, tasked with collecting coins.
The entire simulation runs in my terminal:
From left to right:
- island map
  - B = agent (named after my puppy Bubble)
  - C = coin
  - T = tree
  - blue wave = water
  - gray square = island
- game details
  - score = # of coins collected
  - steps
  - current action
  - last thought
    - the agent generates a thought which governs its next action
  - last feedback
    - feedback received via interaction with the environment
    - e.g. last time you tried to move left, but there was a tree
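To give a rough idea of what the map rendering amounts to, here is a minimal, hypothetical sketch in Python. The symbols ("." for island tiles, "~" for water) and the example grid are stand-ins for the colored squares and waves in the real terminal UI:

```python
# Minimal sketch of rendering an island map as plain text.
# The grid encoding and dimensions are assumptions, not the exact
# representation used in the simulation.
def render(grid, agent_pos):
    """Print the island: B = agent, C = coin, T = tree, . = island, ~ = water."""
    for y, row in enumerate(grid):
        cells = ["B" if (x, y) == agent_pos else cell for x, cell in enumerate(row)]
        print(" ".join(cells))

island = [
    ["~", "~", "~", "~", "~"],
    ["~", ".", "T", ".", "~"],
    ["~", ".", ".", "C", "~"],
    ["~", "T", ".", ".", "~"],
    ["~", "~", "~", "~", "~"],
]
render(island, agent_pos=(1, 1))
```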
Check out YouTube to see the AI agent simulation in action!
Simulation Architecture
Here's how the simulation works.
The agent's internal state graph has 3 nodes:
- act
- move (up, down, left, right)
- collect (pick up a coin located at the same position)
When an agent acts, it first thinks about what to do, then selects an action, such as move or collect.
After executing the action, the agent's state returns to act.
Thus, act ↔ move and act ↔ collect are bidirectional edges.
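Here is a minimal sketch of how such a loop could be wired in LangGraph. The state fields, node bodies, and stop condition are placeholder assumptions rather than the exact implementation; the real act node calls the LLM, and move/collect talk to the island environment:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    agent_pos: tuple    # (x, y) of the agent
    coin_pos: tuple     # (x, y) of the coin
    score: int          # coins collected
    steps: int          # actions taken so far
    thought: str        # last generated thought
    action: str         # e.g. "move:left" or "collect" (assumed encoding)
    last_feedback: str  # 1-step memory of environment feedback

def act(state: AgentState) -> dict:
    # Placeholder: think first (LLM call), then choose the next action.
    return {"thought": "...", "action": "move:right", "steps": state["steps"] + 1}

def move(state: AgentState) -> dict:
    # Placeholder: apply the chosen direction; set last_feedback if blocked.
    return {}

def collect(state: AgentState) -> dict:
    # Placeholder: pick up the coin if the agent stands on it.
    return {}

def route(state: AgentState) -> str:
    if state["steps"] >= 50:  # stop condition for this sketch only
        return "end"
    return "collect" if state["action"] == "collect" else "move"

builder = StateGraph(AgentState)
builder.add_node("act", act)
builder.add_node("move", move)
builder.add_node("collect", collect)
builder.set_entry_point("act")
builder.add_conditional_edges("act", route, {"move": "move", "collect": "collect", "end": END})
builder.add_edge("move", "act")      # control returns to act after moving
builder.add_edge("collect", "act")   # and after collecting
graph = builder.compile()
```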
The agent uses ChatGPT for pathfinding, which presents challenges.
I previously explored LLM-powered pathfinding and its complexities in this newsletter series: part 1, part 2, part 3.
If the agent tries to move but encounters a tree (obstacle), then that is feedback from the environment. When planning its next action, the agent recalls this feedback. However, memory is currently limited to 1 time step.
When the agent picks up a coin, it gets feedback from the environment that a new coin has spawned on the island.
I highlight the term feedback because it's essential to generative AI agents.
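Both kinds of feedback are just messages the environment hands back to the agent. Here is a hypothetical sketch of what that could look like; the grid symbols and message wording are assumptions:

```python
# Hypothetical sketch of turning actions into feedback strings.
# Assumes the island is ringed by water, so moves never leave the grid.
import random

def step_move(agent_pos, direction, grid):
    dx, dy = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}[direction]
    x, y = agent_pos[0] + dx, agent_pos[1] + dy
    if grid[y][x] == "T":
        return agent_pos, f"Last time you tried to move {direction}, but there was a tree."
    if grid[y][x] == "~":
        return agent_pos, f"Last time you tried to move {direction}, but there was water."
    return (x, y), ""

def step_collect(agent_pos, coin_pos, grid):
    if agent_pos == coin_pos:
        # Spawn a new coin on a random land tile and tell the agent about it.
        land = [(x, y) for y, row in enumerate(grid) for x, c in enumerate(row) if c == "."]
        new_coin = random.choice(land)
        return new_coin, f"You picked up the coin. A new coin spawned at {new_coin}."
    return coin_pos, "There is no coin at your position."
```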
Using LangGraph and LangSmith
I'm using LangGraph and LangSmith to build this simulation.
LangGraph is a Python library for building complex and stateful gen AI apps.
It's useful when you need an AI agent to perform complex multi-step tasks and utilize memory across interactions. Here are its key features:
- State Management: keep track of information across multiple interactions
- Workflow Design: create graphs of the agent's decision-making process
- LangChain Integration: access to many tools and capabilities
LangSmith is an LLM Ops platform to monitor, debug, QA, and evaluate your LLM-powered apps. I like this write-up, which explains the need for a full-lifecycle LLM Ops platform to help manage the probabilistic nature of LLMs.
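For reference, LangSmith tracing is typically switched on through environment variables before the app runs; the project name and key below are placeholders:

```python
# Hypothetical setup: with tracing enabled, every LLM call the agent makes
# shows up as a run in the chosen LangSmith project.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "..."              # your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "coin-collector"   # hypothetical project name
```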
Decider Prompt for Thought Generation
Here is my initial DECIDER_PROMPT. It generates the Thought governing the agent's next action (i.e. move or collect).
The prompt knows the agent's current position {agent_pos} and the coin's position {coin_pos}, each represented as coordinates (x, y).
DECIDER_PROMPT = """
Your current location is {agent_pos}.
The coin location is {coin_pos}.
To make a decision, first generate a thought that will govern your future action. The thought should be based on your goal and your position on the island. Explicitly specify what influenced your thought.
Take a deep breath and generate your thought step by step.
"""
Previously, I tried letting the agent decide on an action without a thinking step.
But this led to worse results, so I introduced this "thought generation" step.
Anthropic Prompt Generator
Next, I try Anthropic's Prompt Generator to improve DECIDER_PROMPT.
You can see the system message and Anthropic-optimized prompt below, both fed into LangGraph:
In the YouTube video, you can see the challenges with this prompt.
For instance, it weighs the "last feedback you received" too heavily.
When there is no "last feedback", the agent gets confused. It keeps moving away from the coin, rather than towards it! Precisely the opposite of its goal.
I tweak the prompt a little bit, instructing ChatGPT to ignore last_feedback if there is none.
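Roughly, the tweak boils down to adding one rule to the prompt; the wording below is only a placeholder for what I actually used:

```python
# Placeholder wording for the tweak; the exact prompt text differs.
DECIDER_PROMPT += """
If last_feedback is empty, ignore it entirely and decide using only your
current location and the coin location.
"""
```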
Much better!
Obstacle Avoidance
Obstacle avoidance is a major challenge.
For those unfamiliar with the term, obstacle avoidance is an AI learning how to move around without bumping into immovable objects (a.k.a. obstacles).
Imagine a toddler learning to walk in a room full of toys. The toddler must figure out how to step around blocks and dolls to get to their favorite stuffed animal on the other side of the room.
In my simulation, trees and water are the obstacles.
Using LLMs for pathfinding with very limited memory (only 1 time step) is difficult because, as shown in the complex map below, the agent often gets stuck between 2 trees.
Memory
But, shouldn't memory help?
Yes, most likely.
But right now, I've only implemented 1-step memory.
It doesnāt work well when the agent is surrounded by multiple trees.
For example, in this map, there are trees to the left and right of the agent B.
The agent tries moving left, but runs into a tree.
It remembers this feedback from the environment to plan its next action.
So, it decides to move rightā¦ yikes, another tree!
But, the agent only has 1-step memory.
So, it's already forgotten about the tree on the left!
It tries moving left again! Back and forth, the agent attempts to move, again and again, left and right, stuck between the 2 trees.
Without longer-term memory, pathfinding to the coin while surrounded by numerous obstacles is quite challenging.
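To make the failure mode concrete, here is a small illustration of 1-step memory versus a longer feedback history; the deque-based history is just a possible extension, not something the simulation has today:

```python
# Illustration of why 1-step memory oscillates between two trees.
from collections import deque

one_step_memory = deque(maxlen=1)   # roughly what the agent has today
longer_memory = deque(maxlen=5)     # a possible longer-term memory

for feedback in [
    "You tried to move left, but there was a tree.",
    "You tried to move right, but there was a tree.",
]:
    one_step_memory.append(feedback)
    longer_memory.append(feedback)

print(list(one_step_memory))  # only the right-hand tree survives, so left looks open again
print(list(longer_memory))    # both trees are remembered, so the agent could try up or down
```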
Closing Words
So far, I'm enjoying building my agent simulation framework from scratch.
However, I'm tempted to allow agents to fall back on traditional ML methods for pathfinding, just so I can make progress in other parts of my simulation.
Next week, I'll share my Love is Blind + Love Island multi-agent dating simulation. It's pretty addictive working on the prompts to generate realistic dating profiles. Thankfully, no pathfinding involved!
Stay tuned!
Sabrina Ramonov
P.S. If you're enjoying the free newsletter, it'd mean the world to me if you share it with others. My newsletter just launched, and every single referral helps. Thank you!
Or, share by copying and pasting the link: https://www.sabrina.dev