LLM-Powered Agent Pathfinding: Advanced (Part 3)

Navigating Noisy Environments

Sabrina Ramonov
May 30, 2024

This is part 3 in my series, LLM-Powered Agent Pathfinding!

I experiment with generative AI prompts, using Llama3 running locally on my MacBook, so that my agent can navigate and avoid obstacles in a simulation.

For part 1, click here.

For part 2, click here.

Here’s the YouTube version of this post:

Problem Statement

Llama is given a 3x3 matrix, where values can be 0 or 1:

0 = agent not allowed to move there (e.g. obstacle, wall)
1 = agent allowed to move there

Rules:

llama starts in the center (1, 1)
llama can only move up, down, left, or right
llama cannot stay in the same spot
llama cannot move diagonally

There are 4 valid moves:

up (0, 1)
down (2, 1)
left (1, 0)
right (1, 2)

There are 5 invalid moves:

4 invalid diagonal moves (0, 0), (0, 2), (2, 0), (2, 2)
staying in the same spot at the center (1, 1)
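
The rules above can be sketched in a few lines of code. This is only an illustration: the names `CANDIDATE_MOVES` and `validMoves` are hypothetical, and the example grid is made up rather than taken from the simulation.

```javascript
// Hypothetical helper, not the simulation's actual code.
// The grid is a 3x3 array of 0/1 values, indexed as grid[row][column].
const CANDIDATE_MOVES = [
  { name: "up", row: 0, col: 1 },
  { name: "down", row: 2, col: 1 },
  { name: "left", row: 1, col: 0 },
  { name: "right", row: 1, col: 2 },
];

function validMoves(grid) {
  // Diagonals and the center are never candidates, so they are excluded by construction.
  return CANDIDATE_MOVES.filter((move) => grid[move.row][move.col] === 1);
}

// Made-up example grid: up and down are blocked.
const exampleGrid = [
  [1, 0, 1],
  [1, 1, 1],
  [1, 0, 1],
];
console.log(validMoves(exampleGrid).map((m) => m.name).join(", ")); // left, right
```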

Prompt Experiments

First, I combine chain-of-thought and few-shot prompting.

I tell Llama3 to perform specific tasks, and I provide examples of valid answers.

The matrix is noisy, but it’s the simplest version of noisy:

All coordinates are 1.

[Image: 3x3 matrix in which every value is 1. Credit: Sabrina Ramonov @ sabrina.dev]

You are given a 3x3 matrix, where each value is "0" or "1".

Perform the following tasks, but do not write an answer until step 4:

1. Analyze the matrix, using zero-based indexing.
2. Find all (row, column) coordinates containing "1".
3. Remove coordinates if they are in this list: [(0, 0), (1, 1), (2, 2), (0, 2), (2, 0)].
4. Return the list

Here are examples of valid answers, delimited by """:
"""
(0, 1)
"""
"""
(1, 2)
"""
"""
(2, 1)
"""
"""
(1, 0)
"""

Grid:

Result:

0% accuracy on 20 samples.

But that’s because Llama3’s output contains coordinates that aren’t correct, even when a correct answer appears later in the same output.

I ask Llama3 not to share its thought process at each step…

But it ignores my request and overshares anyway!

This confuses my parser, which uses regex to extract the first (x, y) coordinate from the output.

For example, given the following output, my parser would interpret (0, 0) as Llama3’s answer:

[Screenshot from the YouTube video]
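
To make the failure mode concrete, here is a minimal sketch of a first-match parser like the one described above. The function name, the regex, and the sample output are all assumptions for illustration; the sample text is made up, not a real Llama3 transcript.

```javascript
// Hypothetical parser: return the first "(x, y)" pair found in the model's output.
function extractFirstCoordinate(output) {
  const match = output.match(/\((\d+)\s*,\s*(\d+)\)/);
  return match ? [Number(match[1]), Number(match[2])] : null;
}

// Made-up output in the style described above: the reasoning comes first,
// and the correct list only shows up at the end.
const verboseOutput = [
  "Step 2: the coordinates containing 1 are (0, 0), (0, 1), (0, 2), ...",
  "Step 4: the final list is [(0, 1), (1, 0), (1, 2), (2, 1)]",
].join("\n");

console.log(extractFirstCoordinate(verboseOutput)); // [ 0, 0 ]
```

Because the regex stops at the first match, the diagonal (0, 0) from the reasoning wins over the correct list that follows.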

I want to suppress Llama3’s intermediate reasoning, so that it returns only the coordinates that contain 1 and are allowed moves.

I update my prompt, giving Llama3 the role:

You are a “silent thinker” who only responds when I ask a question.

You are a silent thinker. 
You think through each step of the problem.
You only respond when I ask you a question like "Here's my question:".

You are given a 3x3 matrix, where each value is "0" or "1".

Steps:
1. Analyze the matrix, using zero-based indexing.
2. Find all (row, column) coordinates containing "1".
3. Remove coordinates if they are in this list: [(0, 0), (1, 1), (2, 2), (0, 2), (2, 0)].

Here's my question: What is the final list? Return only the final list, do not reveal your thinking.

Here are examples of valid answers, delimited by """:
"""
(0, 1)
"""
"""
(1, 2)
"""
"""
(2, 1)
"""
"""
(1, 0)
"""

Grid:

Result:

100% accuracy on 100 samples!

However, this is the easiest “noisy” scenario. Every coordinate contains 1.

Let’s make it harder, more realistic.

I flip 2 coordinates to 0 so that there are only 2 valid moves:

left (1, 0)
right (1, 2)

Here’s the new matrix:

[Image: the new 3x3 matrix, with up (0, 1) and down (2, 1) set to 0. Credit: Sabrina Ramonov @ sabrina.dev]

Result:

53% accuracy on 100 samples.

Significantly lower!

I wonder if this approach is too complicated.

Instead of asking Llama3 to find coordinates in a 3×3 matrix, what if I embed the values directly in the prompt?

I’m hoping this removes a layer of potential errors.

In my new prompt, I give Llama3 the values for up, down, left, and right.

Then I ask it:

Silently think about where to move.

Finally, I request an answer in format (x, y).

You can move in 4 directions: up, down, left, right.
You are at the center of the grid.
Up = ${d.grid[0][1]}
Down = ${d.grid[2][1]}
Left = ${d.grid[1][0]}
Right = ${d.grid[1][2]}

You can only move in a direction that contains "1".

Think silently about the direction you want to move in.

Then, give me an answer exactly in this format: "(x, y)" where
Up = (0, 1)
Down = (2, 1)
Left = (1, 0)
Right = (1, 2)

Grid:
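
The `${...}` placeholders in the prompt are JavaScript template-literal syntax. As a rough sketch, the prompt could be assembled like this, assuming `d` is an object holding a 3x3 `grid` array. The function name `buildPrompt` and the example grid are hypothetical, and the sketch reproduces only the prompt lines up to the coordinate mapping.

```javascript
// Hypothetical prompt builder, not the simulation's actual code.
// `d.grid` is assumed to be a 3x3 array of 0/1 values indexed as grid[row][column].
function buildPrompt(d) {
  return [
    "You can move in 4 directions: up, down, left, right.",
    "You are at the center of the grid.",
    `Up = ${d.grid[0][1]}`,
    `Down = ${d.grid[2][1]}`,
    `Left = ${d.grid[1][0]}`,
    `Right = ${d.grid[1][2]}`,
    "",
    'You can only move in a direction that contains "1".',
    "",
    "Think silently about the direction you want to move in.",
    "",
    'Then, give me an answer exactly in this format: "(x, y)" where',
    "Up = (0, 1)",
    "Down = (2, 1)",
    "Left = (1, 0)",
    "Right = (1, 2)",
  ].join("\n");
}

// Made-up example: up and down are blocked, left and right are open.
const example = { grid: [[1, 0, 1], [1, 1, 1], [1, 0, 1]] };
console.log(buildPrompt(example));
```

With this shape, the model only ever sees four labelled values instead of having to index into a matrix itself.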

Result:

85% accuracy on 20 test samples.

77% accuracy on 100 test samples.

Not bad, a promising start with this new approach!

I investigate the incorrect outputs.

I notice Llama3 sometimes returns a diagonal move, which is invalid.

Just like in a 2D video game, the llama can only move up, down, left, or right.

So I add examples that map each direction to its coordinate:

You can move in 4 directions: up, down, left, right.
You are at the center of the grid.
Up = ${d.grid[0][1]}
Down = ${d.grid[2][1]}
Left = ${d.grid[1][0]}
Right = ${d.grid[1][2]}

You can only move in a direction that contains "1".

Think silently about the direction you want to move in.

Return the direction you move to in the format (x, y).

Here are examples of valid answers, delimited by """:
"""If you move up, return (0, 1)"""
"""If you move down, return (2, 1)"""
"""If you move left, return (1, 0)"""
"""If you move right, return (1, 2)"""

Grid:

Result:

78% accuracy on 100 test samples.

Sometimes Llama3 chooses a valid move like Right yet doesn’t return a coordinate (x, y):

[Screenshot from the YouTube video]
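
An answer that names a direction but never prints a coordinate counts as a miss for a regex-based parser. Here is a sketch of how outputs might be scored under that assumption. The helper names, the grid, and the sample outputs are all made up for illustration and are not the simulation's actual code or data.

```javascript
// Hypothetical scoring helpers, not the simulation's actual code.
const COORDINATE = /\((\d+)\s*,\s*(\d+)\)/;

function isCorrect(output, grid) {
  const match = output.match(COORDINATE);
  if (!match) return false; // no coordinate at all counts as a miss
  const row = Number(match[1]);
  const col = Number(match[2]);
  // Exactly one step away from the center (1, 1): rules out diagonals,
  // the center itself, and anything outside the 3x3 grid.
  const isOrthogonalNeighbor = Math.abs(row - 1) + Math.abs(col - 1) === 1;
  return isOrthogonalNeighbor && grid[row][col] === 1;
}

function accuracy(outputs, grid) {
  const hits = outputs.filter((output) => isCorrect(output, grid)).length;
  return hits / outputs.length;
}

// Made-up grid (up and down blocked) and made-up model outputs.
const grid = [[1, 0, 1], [1, 1, 1], [1, 0, 1]];
const outputs = ["(1, 2)", "I will move Right.", "(0, 0)", "(1, 0)"];
console.log(accuracy(outputs, grid)); // 0.5
```

Here the second output picks a valid direction but still scores as wrong, which is the failure described above.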

I make my last instruction more explicit about how to return the final coordinate:

You can move in 4 directions: up, down, left, right.
You are at the center of the grid.
Up = ${d.grid[0][1]}
Down = ${d.grid[2][1]}
Left = ${d.grid[1][0]}
Right = ${d.grid[1][2]}

You can only move in a direction that contains "1".

Think silently about the direction you want to move in.

If you move up, return (0, 1)
If you move down, return (2, 1)
If you move left, return (1, 0)
If you move right, return (1, 2)

Grid:

Result:

64% accuracy on 100 test samples.

Yikes, that made it worse!

Perhaps my new instructions are confusing or unclear.

As in the previous experiment, I see many outputs where Llama3 chooses a correct move (i.e. left or right) but fails to output an (x, y) coordinate.

[Screenshot from the YouTube video]

So, I clean up the phrasing.

Now the natural language reads like code “if X, then Y”:

If you move up, return (0, 1)

You can move in 4 directions: up, down, left, right.
You are at the center of the grid.
Up = ${d.grid[0][1]}
Down = ${d.grid[2][1]}
Left = ${d.grid[1][0]}
Right = ${d.grid[1][2]}

You can only move in a direction that contains "1".

Think silently about the direction you want to move in.

Give me your final answer following these constraints:
- If you move up, return (0, 1)
- If you move down, return (2, 1)
- If you move left, return (1, 0)
- If you move right, return (1, 2)

Grid:

Result:

100% accuracy on 100 test samples! 🙌

Yay, a delightful surprise.
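
The “if X, then Y” constraints in that final prompt read a lot like a lookup table. For comparison, here is the same mapping written as code. This is just an illustration with hypothetical names (`MOVE_TO_COORDINATE`, `coordinateFor`), not something the simulation necessarily contains.

```javascript
// Hypothetical lookup mirroring the four constraints in the prompt.
const MOVE_TO_COORDINATE = new Map([
  ["up", [0, 1]],
  ["down", [2, 1]],
  ["left", [1, 0]],
  ["right", [1, 2]],
]);

function coordinateFor(direction) {
  // Normalize so that "Right" and " right " both match the "right" entry.
  const key = direction.trim().toLowerCase();
  return MOVE_TO_COORDINATE.get(key) ?? null;
}

console.log(coordinateFor("Right")); // [ 1, 2 ]
console.log(coordinateFor("diagonal")); // null
```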

Conclusion

As it turns out, the combination of:

embedding data directly in the prompt
silent thinking to suppress extraneous outputs
if/else conditions to ensure I get a coordinate in a particular format

… works really well, even in a noisy pathfinding situation!

Of course, right now our llama can only “see” 1 square around itself and has no planning module. It’s not terribly realistic yet.

Stay tuned - it’ll be interesting to see how my prompts evolve as my agent simulation grows more complex!