AI Agent Orders Pizza 🍕

How to build an AI agent that controls your local Chrome browser and performs actions to achieve a goal, like ordering pizza!

I built an AI agent that orders me pizza from Doordash 🍕 

I gave it a high-level task. Then, the AI agent performed these steps:

  • open my local Chrome browser

  • navigate to the Doordash website, where I’m already logged in

  • find and select cheese pizza from my favorite pizza place

  • add pizza to cart

  • go to checkout screen

  • realize the store is still closed (uh oh!)

  • figure out it should schedule the delivery for later this afternoon

  • confirm and place the order!

💡 The key thing:

I did NOT hardcode this logic into the AI agent. It figured it out dynamically, in real time, by understanding the goal (order pizza), analyzing each web page (e.g. the Doordash store page), formulating a plan, performing an action (e.g. clicking a button), then reflecting on whether that action made progress toward the goal.
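The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how browser-use is implemented: FakeSite, choose_action, and run_agent are hypothetical names, the "site" is a small in-memory simulation, and choose_action stands in for the LLM call by trying the first button that has not already failed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeSite:
    """Hypothetical in-memory stand-in for a delivery site (no real browser)."""
    page: str = "store"
    cart: list = field(default_factory=list)
    scheduled: bool = False  # the store stays closed until a later time is scheduled

    def observe(self) -> dict:
        """Return what the agent can see: page name, status message, buttons."""
        if self.page == "store":
            buttons = ["go to checkout"] if self.cart else ["add cheese pizza"]
            message = ""
        elif self.page == "checkout":
            buttons = ["place order", "schedule for later"]
            message = "scheduled for later" if self.scheduled else "store is closed"
        else:
            buttons, message = [], "order placed"
        return {"page": self.page, "message": message, "buttons": buttons}

    def click(self, button: str) -> None:
        """Apply a click; ordering does nothing while the store is closed."""
        if button == "add cheese pizza":
            self.cart.append("cheese pizza")
        elif button == "go to checkout":
            self.page = "checkout"
        elif button == "schedule for later":
            self.scheduled = True
        elif button == "place order" and self.scheduled:
            self.page = "confirmation"


def choose_action(goal: str, observation: dict, failed: set) -> Optional[str]:
    """Placeholder for the LLM: pick the first button that has not failed yet."""
    candidates = [b for b in observation["buttons"] if b not in failed]
    return candidates[0] if candidates else None


def run_agent(goal: str, site: FakeSite, max_steps: int = 10) -> list:
    """Observe, plan, act, then reflect on whether the page changed."""
    history, failed = [], set()
    for _ in range(max_steps):
        before = site.observe()                       # analyze the page
        if before["page"] == "confirmation":          # goal reached
            break
        action = choose_action(goal, before, failed)  # plan
        if action is None:                            # nothing left to try
            break
        site.click(action)                            # act
        history.append(action)
        if site.observe() == before:                  # reflect: no progress
            failed.add(action)
        else:
            failed.clear()
    return history


steps = run_agent("order a cheese pizza", FakeSite())
print(steps)
```

In this toy run the first "place order" click makes no progress because the store is closed, so the loop rules it out, schedules the order for later, and then places it.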

AI Automation vs AI Agents

There’s a big difference between automation and agents.

Automation follows rules.

Agents figure things out given a task, a set of tools, and a set of possible actions.

The newest wave of AI agents is starting to look much more like the second than the first. Emphasis on “STARTING”: it’s still very early days, and it’s difficult to build robust agentic systems without technical expertise.

Ordering a pizza is a simple enough task for you and me. Open the app/website, find your restaurant, pick your toppings, choose a delivery time, check out.

But for a computer?

That used to require writing code specifying exactly where to click and what to type.
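To make that concrete, here is a minimal sketch of the scripted approach. Everything in it is hypothetical: the pages are plain dictionaries rather than a real browser, and SCRIPT and run_script are names invented for this illustration. The point is only that a fixed list of clicks breaks as soon as the page differs from what the script expects.

```python
# Each step is hardcoded: which page we expect to be on, and which button to click.
SCRIPT = [
    ("store", "add cheese pizza"),
    ("store", "go to checkout"),
    ("checkout", "place order"),
]


def run_script(script, pages):
    """Click each hardcoded button in order; raise at the first missing one."""
    clicked = []
    for page, button in script:
        if button not in pages.get(page, []):
            raise LookupError(f"{button!r} not found on the {page!r} page")
        clicked.append(button)
    return clicked


# Buttons visible when the store is open versus closed (toy data).
open_store = {
    "store": ["add cheese pizza", "go to checkout"],
    "checkout": ["place order"],
}
closed_store = {
    "store": ["add cheese pizza", "go to checkout"],
    "checkout": ["schedule for later"],
}

print(run_script(SCRIPT, open_store))
try:
    run_script(SCRIPT, closed_store)
except LookupError as err:
    print("script failed:", err)
```

The script works while the store is open and stops with an error when it is closed, because nobody wrote a rule for that case.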

In this experiment, I only provided a high-level task:

Open 'Lucky Slice Pizza' on Doordash and place an order for cheese pizza: 
<doordash_url>

Instead of following a predefined script or workflow automation steps, my AI agent uses multimodal capabilities (e.g. vision) to analyze each webpage.

It recognizes text, images, and buttons, then decides what makes sense to click next.

When something unexpected happened, like the restaurant being closed, the AI agent didn’t just fail.

It adapted! 🔥 

It adjusted its plan dynamically. In this case, the agent scheduled the order for later, just as a person would.

How to Build Your Own AI Agent

Here's a guide to setting this up.

I recommend watching the YouTube tutorial as well.

  1. Clone the Project: Go to GitHub and click the green Code button to clone the project to a local directory. You can also click Download ZIP and extract the zip file anywhere on your computer. You should now have a new directory called browser-use.

  2. Open your terminal (on Mac, use Terminal or iTerm; on Windows, I highly recommend setting up WSL2).

  3. Navigate to the browser-use directory you just created and use pip to install the Python dependencies. If you don’t have pip installed, ask ChatGPT/DeepSeek to walk you through the process.

  4. Follow the instructions in the quick start guide.

  5. Create a Python Script: Name it test.py and paste this sample code:

from browser_use import Agent, Browser, BrowserConfig
from langchain_openai import ChatOpenAI
import asyncio

# Configure the browser to connect to your Chrome instance
browser = Browser(
    config=BrowserConfig(
        # Specify the path to your Chrome executable
        chrome_instance_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",  # macOS path
        # For Windows, typically: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
        # For Linux, typically: '/usr/bin/google-chrome'
    )
)

# Central agent
agent = Agent(
    task="""
Open 'Lucky Slice Pizza' on Doordash and place an order for cheese pizza: 
https://www.doordash.com/store/lucky-slice-pizza-ogden-229628
""",
    llm=ChatOpenAI(model="gpt-4o", timeout=1000),
    browser=browser,
)


async def main():
    await agent.run(max_steps=1000)
    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
  1. Ensure ALL Chrome browsers are closed before running the AI agent.

  2. Environment Variables: Create a .env file and add your OpenAI API key in the form OPENAI_API_KEY=<your key>

  3. Run Script: Execute the Python script by typing python test.py in your terminal and pressing Enter

  4. If you need to STOP the AI agent, press Ctrl+C in the terminal

If you run into errors or get confused, ask ChatGPT or DeepSeek for help. This is an important habit to establish!
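Before running test.py, a quick preflight check can catch the most common setup problems from the steps above. This is only a sketch under a few assumptions: the Chrome paths are the typical locations from the comments in test.py and may differ on your machine, the .env file is assumed to sit in the directory you run the check from, and preflight and has_api_key are hypothetical helper names, not part of browser-use.

```python
import importlib.util
import os
import platform
from pathlib import Path

# Typical Chrome locations (same as the comments in test.py); adjust if needed.
CHROME_PATHS = {
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Windows": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "Linux": "/usr/bin/google-chrome",
}


def has_api_key(env, dotenv_path: Path) -> bool:
    """True if OPENAI_API_KEY is set in env or appears in the .env file."""
    if env.get("OPENAI_API_KEY"):
        return True
    if dotenv_path.is_file():
        lines = dotenv_path.read_text().splitlines()
        return any(line.strip().startswith("OPENAI_API_KEY=") for line in lines)
    return False


def preflight(env=None, system=None, dotenv_path=Path(".env")) -> list:
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    system = system or platform.system()
    problems = []
    # 1. Are the Python packages imported by test.py installed?
    for module in ("browser_use", "langchain_openai"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing Python package: {module}")
    # 2. Is Chrome where we expect it for this operating system?
    chrome = CHROME_PATHS.get(system)
    if chrome is None or not Path(chrome).exists():
        problems.append(f"Chrome not found for {system}: {chrome}")
    # 3. Is the OpenAI API key available?
    if not has_api_key(env, dotenv_path):
        problems.append("OPENAI_API_KEY not found in environment or .env")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

If it prints nothing, the basics are in place. It does not check that every Chrome window is closed, so you still need to do that by hand.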

Challenges

To my surprise, my AI agent consistently succeeds at this task.

A few years ago, even the best AI models struggled to do this.

If it can do this for a complex website like Doordash, with lots of images, buttons, and options to click… imagine the capabilities five years from now.

However, other tasks, such as scraping, have been more challenging.

In my opinion, the main constraint is reliability. AI agents still make mistakes, some of them quite surprising, and sometimes they don’t even realize they’re making them.
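One simple way to put a number on reliability is to repeat the same task several times and count how often it succeeds. The sketch below assumes you can reduce one agent run to a True/False result; flaky_task is a hypothetical stand-in that succeeds at random, and its 0.8 success probability is made up for illustration, not a measurement.

```python
import random


def success_rate(run_once, trials: int) -> float:
    """Call run_once() `trials` times and return the fraction that returned True."""
    successes = sum(1 for _ in range(trials) if run_once())
    return successes / trials


def flaky_task(rng: random.Random, p_success: float = 0.8) -> bool:
    """Hypothetical stand-in for one agent run.

    A real check would inspect the final page, e.g. for an order confirmation.
    """
    return rng.random() < p_success


rng = random.Random(0)  # fixed seed so repeated runs of this sketch agree
rate = success_rate(lambda: flaky_task(rng), trials=50)
print(f"succeeded in {rate:.0%} of runs")
```

With a real agent, each trial costs time and API calls, so even a handful of repeated runs per task gives a more honest picture than a single success.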

But the trend is clear: they're getting better ✅

The holy grail:

You describe the goal. Your AI agent figures out the rest.
