Claude 3.5 Sonnet vs. ChatGPT-4o: The Moscow Puzzles

Visual Puzzles Involving Belts, Matches, and Goats!

In this post, I pit Claude 3.5 Sonnet against ChatGPT-4o 🥊

The gauntlet is The Moscow Puzzles, a book of visual math puzzles my husband loved growing up (in Moscow).

Although it’s famous as a book for Russian kids and requires no advanced math, I bet most STEM-centric adults would enjoy it too.

The Moscow Puzzles

If you’re not familiar with Claude Sonnet, it’s available for free at Claude.ai and in the Claude iOS app.

Based on research benchmarks, Claude 3.5 Sonnet excels in graduate-level reasoning, undergraduate knowledge, coding proficiency, and complex task handling, with notable improvements in understanding nuance, humor, and complex instructions.

Most relevant to my experiment, Claude 3.5 Sonnet demonstrates significantly enhanced visual reasoning capabilities.

We’ll see about that…

Rotating Connected Belts

Question 1:

(try to solve it before reading the answer)

The Moscow Puzzles

Claude 3.5 Sonnet’s answer:

Claude 3.5 Sonnet

ChatGPT-4o’s answer:

ChatGPT-4o

Correct answer:

  • Yes, if wheel A rotates clockwise, all 4 wheels can rotate:

    • A clockwise

    • B counterclockwise

    • C clockwise

    • D clockwise

  • Yes, the wheels can turn if all 4 belts are crossed

  • No, the wheels cannot turn if 1 or 3 belts are crossed
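Here’s a quick way to sanity-check those three bullets. This is a minimal sketch, not anything from the book: it assumes the 4 belts connect the wheels in a closed loop (A to B, B to C, C to D, D back to A), that a crossed belt flips the direction of rotation, and that an uncrossed belt keeps it. The function name wheel_directions and the crossed-belt layout in the usage note are hypothetical, inferred from the directions listed above rather than read off the figure.

```python
CLOCKWISE = 1
COUNTERCLOCKWISE = -1


def wheel_directions(crossed, start=CLOCKWISE):
    """Propagate rotation around a closed loop of wheels.

    crossed[i] is True if the belt from wheel i to the next wheel
    (wrapping back to the first wheel at the end) is crossed.
    Assumption: a crossed belt reverses direction, an uncrossed belt keeps it.

    Returns one direction per wheel, or None if the loop jams.
    """
    directions = [start]
    for is_crossed in crossed:
        previous = directions[-1]
        directions.append(-previous if is_crossed else previous)

    # The last entry is the first wheel again, reached by going all the
    # way around the loop. If it disagrees with the starting direction,
    # the wheels lock up and nothing can turn.
    if directions[-1] != start:
        return None
    return directions[:-1]


if __name__ == "__main__":
    # Hypothetical layout: belts A-B and B-C crossed, C-D and D-A not.
    print(wheel_directions([True, True, False, False]))
    print(wheel_directions([True, True, True, True]))
    print(wheel_directions([True, False, False, False]))
    print(wheel_directions([True, True, True, False]))
```

Under these assumptions, the hypothetical layout gives clockwise, counterclockwise, clockwise, clockwise for A, B, C, D, which matches the answer above. Crossing all 4 belts still lets the wheels turn, while crossing 1 or 3 returns None because the loop jams: the wheels can only turn when the number of crossed belts is even.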

Claude got the reasoning wrong and the final answer wrong!

Not off to a great start, Claude…

On the other hand, ChatGPT got one part of the answer correct:

If wheel A rotates clockwise, all 4 wheels can rotate.

But ChatGPT’s reasoning was slightly off: it thinks wheel D will rotate counterclockwise when wheel A rotates clockwise, which is wrong.

Its analysis of the scenarios with 1 or 3 crossed belts is also wrong. And ChatGPT didn’t even attempt to answer the question about all 4 belts crossed.

Nonetheless, ChatGPT is the winner this round because Claude flopped.

Spiral Matches to Squares

Question 2:

(try to solve it before reading the answer)

The Moscow Puzzles

Claude 3.5 Sonnet’s answer:

Claude 3.5 Sonnet

ChatGPT-4o’s answer:

ChatGPT-4o

Curious to see if ChatGPT could provide a decent visualization of its answer, I requested: “show me a visual the the 4 matches you moved”.

Here is ChatGPT’s attempted visualization of each match move:

ChatGPT-4o

ChatGPT-4o

Correct answer:

You create 3 squares by moving 4 matches, as shown in the image below:

  • big outer square

  • semi-big middle square, within big outer square

  • small square, within semi-big middle square

Now review Claude and ChatGPT’s answers.

Claude lays out its solution steps. The 1st step is perfect, moving a match in the upper right corner to complete the big outer square, same as I’ve done.

But Claude’s proposed 2nd step doesn’t make sense. It wants to complete a square in the bottom right corner, but you need 2 matches to do that, not just 1.

Claude’s 3rd step works fine — take 2 matches from the innermost spiral and complete a square in the bottom left corner.

But in total, Claude only formed 2 squares, not 3.

Not an epic fail, since Claude at least parsed the diagram and question correctly, but still disappointing.

Now review ChatGPT’s response…

I found it pretty confusing, even after a few re-reads. And after asking ChatGPT to produce a visualization of what it meant, I gained zero clarity.

Giving ChatGPT the benefit of the doubt, here’s what I think it means:

ChatGPT-4o

The lines in red I’m more confident about.

But the lines in blue — I’m not sure if ChatGPT meant to move that match to complete the square in the bottom left corner, or start a new square to be completed in the bottom right corner.

If the former, then ChatGPT’s answer is valid — it moved 4 matches and created 3 squares:

  • big outer square

  • semi-big middle square, within big outer square

  • small square, in bottom left corner

Although its answer isn’t clear cut, I’ll give ChatGPT the benefit of the doubt.

This means ChatGPT-4o wins round 2 (barely) over Claude 3.5 Sonnet.

Separate Goats from Cabbage

Question 3:

(try to solve it before reading the answer)

Easily my favorite question in the book… goats and cabbage, who needs more in life?!?

The Moscow Puzzles

Claude 3.5 Sonnet’s answer:

Claude 3.5 Sonnet

ChatGPT-4o’s answer:

ChatGPT-4o

Correct answer:

To separate all the goats from the cabbage, draw 3 straight lines like this.

Each goat is in a zone without any cabbage.

The Moscow Puzzles

Claude got 2 of the 3 lines correct:

  • diagonal line from top left corner to bottom right

  • diagonal line from top right corner to bottom left

But its 3rd line (“horizontally near the bottom”) didn’t make sense. I did my best to visualize Claude’s proposed solution, but it’s obvious that it fails.

Nonetheless… 2 out of 3 ain’t bad, nice job Claude!

Here is my visualization of ChatGPT’s final answer. Its first answer was very, very wrong. For some reason, it constrained itself to vertical and horizontal lines only. Here’s my follow-up prompt to ChatGPT to loosen that constraint:

Try again. You are not limited to drawing horizontal or vertical lines. You can draw diagonal lines too. Take a deep breath and take it step-by-step.

Interestingly, Claude didn’t need a follow-up prompt at all.

Its first answer (zero-shot) already included 2 diagonal lines, both correct.

After the follow-up prompt to ChatGPT, its answer noticeably improved. It proposed the same 2 diagonal lines as Claude, which are correct.

Weirdly enough, ChatGPT also proposed a “horizontal line through the middle” as its 3rd line. Recall that Claude’s 3rd line was also a horizontal line, placed near the bottom.

For this round, Claude is the winner because it got 2/3 lines correct without additional prompting.

Last Thoughts

ChatGPT-4o remains the undefeated champion!

ChatGPT bested Claude in 2 out of 3 multimodal Moscow math puzzles.

But round 2 (spiral matches) was a close call, and I gave ChatGPT a major benefit of the doubt.

I’d love to run more experiments, but I already reached my Claude free tier limit!

Here’s the YouTube version of this post:

Have fun experimenting!

Sabrina Ramonov

P.S. If you’re enjoying the free newsletter, it’d mean the world to me if you share it with others. My newsletter just launched, and every single referral helps. Thank you!