LLM-Powered Agent Pathfinding: Llama vs Lava, Sneak Peek (Part 2)

Can You Help Llama Beat the Lava?

Sabrina Ramonov
May 28, 2024

Here’s a sneak peek of Llama vs Lava, an LLM agent prompt writing game.

Your objective is to write a Llama3 prompt so the llama avoids the lava!

This is part 2 in my series, LLM-Powered Agent Pathfinding.

For part 1, click here.

If you don’t know much about agents, pathfinding, or matrices, I recommend reading part 1 first.

Here’s the Youtube version of this post:

Problem Statement
Prompt Experiments
Future Improvements
Seeking Beta Testers!

Iterating on prompts through games is pretty fun, and it moves me toward my goal of building generative multi-agent simulations from scratch.

Similar to my experiment in part 1, you have an agent (i.e. llama) who needs to figure out where to go (i.e. not lava).

But now, we add complexity.

Youtube: Llama vs Lava | Sneak Peek

Problem Statement

Here’s the problem statement:

Llama is given a 3x3 matrix, where values can be 0 or 1:

0 = agent not allowed to move there (e.g. obstacle, wall)
1 = agent allowed to move there

Instead of a single coordinate with value 1 (part 1 experiment), now you can have multiple coordinates with 1. Relaxing this constraint adds “noise”, making it noticeably harder for Llama3 to interpret valid and invalid moves.

Rules:

llama starts in the center of the matrix (1, 1)
llama can only move up, down, left, or right
llama cannot stay in the same spot
llama cannot move diagonally

This means there are only 4 valid moves:

up (0, 1)
down (2, 1)
left (1, 0)
right (1, 2)

And 5 invalid moves:

4 invalid diagonal moves (0, 0), (0, 2), (2, 0), (2, 2)
staying in the same spot, i.e. center of the matrix (1, 1)
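
Here's a minimal Python sketch of these rules. It's an illustration only, not the game's actual code: the names `ORTHOGONAL_MOVES` and `safe_moves` are made up for this example, and the grid is assumed to be a list of three rows.

```python
# Hypothetical sketch of the movement rules; not the game's actual code.

# The 4 orthogonal neighbors of the center (1, 1), as (row, column).
ORTHOGONAL_MOVES = {
    "up": (0, 1),
    "down": (2, 1),
    "left": (1, 0),
    "right": (1, 2),
}


def safe_moves(grid):
    """Return the orthogonal moves whose target cell contains 1."""
    return {
        name: (row, col)
        for name, (row, col) in ORTHOGONAL_MOVES.items()
        if grid[row][col] == 1
    }


# Example grid: several cells contain 1, but only two are reachable safely.
grid = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(safe_moves(grid))  # {'down': (2, 1), 'left': (1, 0)}
```

Note that the corners and the center contain 1 in this example, yet none of them is a legal move. That is the "noise" Llama3 has to filter out.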

This variation makes the pathfinding problem more realistic and difficult.

At any point in time, there are usually multiple valid moves and multiple invalid moves an agent can make.

Prompt Experiments

The first prompt I try is the best-performing prompt from part 1, which scored 99% accuracy across 100 test samples:

Youtube: Llama vs Lava | Sneak Peek

I don’t expect it to work well in this context because the part 1 problem statement was substantially simpler:

Only a single coordinate contains 1.

But in our new problem, any coordinate can contain 1.

Youtube: Llama vs Lava | Sneak Peek

Oh no, poor llama! 🦙 🔥

After you input a prompt and hit Run, the simulation starts.

You can visually see the llama moving around. If it makes a wrong move, the simulation ends. Remember there are 3 types of wrong moves:

step into lava
stay in the same spot (1, 1)
move diagonally (0, 0), (0, 2), (2, 0), (2, 2)
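
The three failure checks could look roughly like this. It's a hedged sketch, not the game's real implementation: `check_move` and its return strings are hypothetical, and the move is assumed to already be a `(row, column)` pair inside the 3x3 grid.

```python
# Hypothetical sketch of the three failure checks; not the game's actual code.
CENTER = (1, 1)
DIAGONALS = {(0, 0), (0, 2), (2, 0), (2, 2)}


def check_move(grid, move):
    """Return 'ok', or the reason this move ends the simulation."""
    if move == CENTER:
        return "stayed in the same spot"
    if move in DIAGONALS:
        return "moved diagonally"
    row, col = move
    if grid[row][col] == 0:
        return "stepped into lava"
    return "ok"


grid = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(check_move(grid, (1, 0)))  # ok
print(check_move(grid, (0, 1)))  # stepped into lava
print(check_move(grid, (0, 0)))  # moved diagonally
```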

The summary popup shows how many steps the llama took before 🪦…

It also shows the last prompt output so you can analyze it and further improve your prompt.

Next, I try telling Llama3 to:

Remove coordinates from the list if they do not match one of the following coordinates, delimited by “““:

“““(0, 1)”””

“““(1, 0)”””

“““(2, 1)”””

“““(1, 2)”””

Your answer should never contain (1, 1), (0,0), (0, 2), (2, 0), (2, 2).

Youtube: Llama vs Lava | Sneak Peek

This approach seems promising, but unfortunately, my answer parser looks for the first (row, column) coordinate in the output.

In this case, it is (0, 0), which is an invalid diagonal move.
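
To make the failure concrete, here's a sketch of a "first coordinate wins" parser. This is an assumption about how such a parser might work, not the actual one used in the game: the regex, the function name `parse_first_coordinate`, and the sample output string are all invented for illustration.

```python
import re

# Matches "(row, column)" with digits 0-2 and optional whitespace.
COORD_PATTERN = re.compile(r"\(\s*([0-2])\s*,\s*([0-2])\s*\)")


def parse_first_coordinate(output):
    """Return the first (row, column) found in the LLM output, or None."""
    match = COORD_PATTERN.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Made-up output: the model lists every 1 before giving its filtered answer.
output = "Coordinates with 1: (0, 0), (1, 0), (2, 1). After filtering: (1, 0)"
print(parse_first_coordinate(output))  # (0, 0)
```

Even though the model's final answer here is a valid move, the parser stops at the first match, so the intermediate reasoning text decides the move.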

Here are a few other prompt variations I try:

Youtube: Llama vs Lava | Sneak Peek

But still no luck!

Then a shocking surprise…

I hit the Replicate API limit and had to stop filming.

I didn’t realize the limit is only 600 API calls per minute.

Each simulation step takes 1 second and sends 1 prompt, so a single simulation makes 60 API calls per minute.

With 10 concurrent simulations, that's 600 calls per minute, and I hit the API limit!

And this is just 1 action, just 1 prompt!
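
The arithmetic is easy to check. The numbers below come from this post; the constant and function names are just for illustration, and `CALLS_PER_STEP = 1` assumes one prompt per simulation step.

```python
# Numbers from this post: 600 calls/minute limit, 1 simulation step per second.
API_CALLS_PER_MINUTE_LIMIT = 600
STEPS_PER_MINUTE = 60  # one step per second
CALLS_PER_STEP = 1     # assumption: one prompt (one action) per step


def calls_per_minute(concurrent_simulations):
    """Total API calls per minute across all running simulations."""
    return concurrent_simulations * STEPS_PER_MINUTE * CALLS_PER_STEP


def max_concurrent_simulations():
    """Largest number of simulations that stays within the limit."""
    return API_CALLS_PER_MINUTE_LIMIT // (STEPS_PER_MINUTE * CALLS_PER_STEP)


print(calls_per_minute(10))          # 600
print(max_concurrent_simulations())  # 10
```

If each step needed more than one prompt (say, several actions per agent), `CALLS_PER_STEP` would grow and the number of simulations that fit under the limit would shrink accordingly.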

Future Improvements

In addition to solving the API request limit issue, there are a handful of fixes, improvements, and features I want to add before launching Llama vs Lava:

- when the llama dies, I see the prompt output, but I can't see the grid, so I can't confirm whether the LLM found the 1s
- when I close the "prompt output" modal, it's hard to iterate on the prompt, because I can no longer see what went wrong in the prompt output
- social links don't work
- remove top navbar altogether. Put cool "llama vs lava" banner in top left of screen
- global leaderboard, like an arcade
- some way to keep track of your best streak so far and the prompt used during the simulation, e.g. a big number showing the current streak. Right now it's awkward sitting there waiting for the simulation to run.
- zoom into llama or make everything larger, it's currently hard to see the llama
- mobile friendly UX?
- move "How to play llama vs lava?" to sidebar

Seeking Beta Testers!

If you think this is cool and want to beta test v1, please DM me:

Sabrina Ramonov on X

x.com/Sabrina_Ramonov