- Sabrina Ramonov 🍄
Claude Sonnet vs. ChatGPT4o: The Moscow Puzzles
Visual Puzzles Involving Belts, Matches, and Goats!
In this post, I pit Claude 3.5 Sonnet against ChatGPT-4o 🥊
The gauntlet is The Moscow Puzzles, a book of visual math puzzles my husband loved growing up (in Moscow).
Although it’s famous as a book for Russian kids and requires no advanced math, I bet most STEM-centric adults would enjoy it too.
The Moscow Puzzles
If you’re not familiar with Claude Sonnet, it’s available for free at Claude.ai and in the Claude iOS app.
Based on research benchmarks, Claude 3.5 Sonnet excels in graduate-level reasoning, undergraduate knowledge, coding proficiency, and complex task handling, with notable improvements in understanding nuance, humor, and complex instructions.
Most relevant to my experiment, Claude 3.5 Sonnet demonstrates significantly enhanced visual reasoning capabilities.
We’ll see about that…
Rotating Connected Belts
Question 1:
(try to solve it before reading the answer)
The Moscow Puzzles
Claude 3.5 Sonnet’s answer:
Claude 3.5 Sonnet
ChatGPT-4o’s answer:
ChatGPT-4o
Correct answer:
If wheel A rotates clockwise, yes all 4 wheels can rotate
A clockwise
B counterclockwise
C clockwise
D clockwise
Yes, the wheels can turn if all 4 belts are crossed
No, the wheels cannot turn if 1 or 3 belts are crossed
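The crossed-belt answers above come down to a parity argument, which can be sketched in a few lines of Python. This is only a minimal sketch under assumptions that are not taken from the book: the 4 wheels are treated as one closed loop (A to B to C to D and back to A), a crossed belt reverses the direction of the next wheel, and an uncrossed belt keeps it the same. The names `propagate` and `can_turn` and the belt layouts used with them are hypothetical.

```python
from typing import List, Optional

CW, CCW = "clockwise", "counterclockwise"


def flip(direction: str) -> str:
    """Return the opposite rotation direction."""
    return CCW if direction == CW else CW


def propagate(start: str, crossed: List[bool]) -> Optional[List[str]]:
    """Walk a closed loop of wheels joined by belts.

    crossed[i] describes the belt from wheel i to wheel i + 1; the last
    belt leads back to wheel 0. Assumption: a crossed belt reverses the
    direction, an uncrossed belt preserves it.
    Returns one direction per wheel, or None if the loop jams.
    """
    directions = [start]
    current = start
    for belt_is_crossed in crossed:
        if belt_is_crossed:
            current = flip(current)
        directions.append(current)
    # The final entry is wheel 0 as driven by the last belt. It must agree
    # with the direction wheel 0 started with, otherwise nothing can move.
    if directions[-1] != start:
        return None
    return directions[:-1]


def can_turn(crossed: List[bool]) -> bool:
    """The loop turns only if an even number of belts is crossed."""
    return sum(crossed) % 2 == 0
```

Under these assumptions, `propagate(CW, [True, True, True, True])` gives a consistent set of directions, while any layout with 1 or 3 crossed belts returns `None`. That matches the answers above: 4 crossed belts work, 1 or 3 do not. A hypothetical layout such as `[True, True, False, False]` also happens to reproduce the A, B, C, D directions listed above, though the real diagram’s belt layout is not reproduced here.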
Claude got the reasoning wrong and the final answer wrong!
Not off to a great start, Claude…
On the other hand, ChatGPT got one part of the answer correct:
If wheel A rotates clockwise, all 4 wheels can rotate.
But ChatGPT’s reasoning was slightly off: it thinks wheel D will rotate counterclockwise when wheel A rotates clockwise. This is wrong.
Also, its analysis of the scenarios with 1 and 3 crossed belts is wrong. And ChatGPT didn’t even attempt to answer the question about all 4 belts crossed.
Nonetheless, ChatGPT is the winner this round because Claude flopped.
Spiral Matches to Squares
Question 2:
(try to solve it before reading the answer)
The Moscow Puzzles
Claude 3.5 Sonnet’s answer:
Claude 3.5 Sonnet
ChatGPT-4o’s answer:
ChatGPT-4o
Curious to see if ChatGPT could provide a decent visualization of its answer, I requested: “show me a visual the the 4 matches you moved”.
Here is ChatGPT’s attempted visualization of each match move:
ChatGPT-4o
ChatGPT-4o
Correct answer:
You create 3 squares by moving 4 matches, as shown in the image below:
big outer square
semi-big middle square, within big outer square
small square, within semi-big middle square
Now review Claude and ChatGPT’s answers.
Claude lays out its solution steps. The 1st step is perfect, moving a match in the upper right corner to complete the big outer square, same as I’ve done.
But Claude’s proposed 2nd step doesn’t make sense. It wants to complete a square in the bottom right corner, but you need 2 matches to do that, not just 1.
Claude’s 3rd step works fine — take 2 matches from the innermost spiral and complete a square in the bottom left corner.
But in total, Claude only formed 2 squares, not 3.
Not an epic fail, since Claude at least parsed the diagram and question correctly, but still disappointing.
Now review ChatGPT’s response…
I found it pretty confusing, even after a few re-reads. Even after asking ChatGPT to produce a visualization of what it meant, I gained zero clarity.
Giving ChatGPT the benefit of the doubt, here’s what I think it means:
ChatGPT-4o
I’m more confident about the lines in red.
As for the lines in blue, I’m not sure if ChatGPT meant to move that match to complete the square in the bottom left corner, or to start a new square to be completed in the bottom right corner.
If the former, then ChatGPT’s answer is valid, since it moved 4 matches and created 3 squares:
big outer square
semi-big middle square, within big outer square
small square, in bottom left corner
Although its answer isn’t clear-cut, I’ll give ChatGPT the benefit of the doubt.
This means ChatGPT-4o wins round 2 (barely) over Claude 3.5 Sonnet.
Separate Goats from Cabbage
Question 3:
(try to solve it before reading the answer)
Easily my favorite question in the book… goats and cabbage, who needs more in life?!?
The Moscow Puzzles
Claude 3.5 Sonnet’s answer:
Claude 3.5 Sonnet
ChatGPT-4o’s answer:
ChatGPT-4o
Correct answer:
To separate all the goats from the cabbage, draw 3 straight lines like this.
Each goat is in a zone without any cabbage.
The Moscow Puzzles
Claude got 2 of the 3 lines correct:
diagonal line from top left corner to bottom right
diagonal line from top right corner to bottom left
But its 3rd line (“horizontally near the bottom”) didn’t make sense. I did my best to visualize Claude’s proposed solution, but it’s obvious that it fails.
Nonetheless… 2 out of 3 ain’t bad, nice job Claude!
Here is my visualization of ChatGPT’s final answer. Its first answer was very, very wrong. For some reason, it constrained itself to vertical and horizontal lines only. Here’s my follow-up prompt to ChatGPT to loosen that constraint:
Try again. You are not limited to drawing horizontal or vertical lines. You can draw diagonal lines too. Take a deep breath and take it step-by-step.
Interestingly, Claude didn’t need a follow-up prompt at all.
Its first answer (zero shot) already included 2 diagonal lines, both correct.
After the follow-up prompt to ChatGPT, its answer noticeably improved. It proposed the same 2 diagonal lines as Claude, which are correct.
Weirdly enough, ChatGPT also proposed a “horizontal line through the middle” as its 3rd line. Recall that Claude’s 3rd line was also a horizontal line, drawn near the bottom.
For this round, Claude is the winner because it got 2/3 lines correct without additional prompting.
Last Thoughts
ChatGPT-4o remains the undefeated champion!
ChatGPT bested Claude in 2 out of 3 multimodal Moscow math puzzles.
But round 2 (spiral & matches) was a close call, where I gave ChatGPT the benefit of the doubt in a major way.
I’d love to run more experiments, but I already reached my Claude free tier limit!
Here’s the YouTube version of this post:
Have fun experimenting!
Sabrina Ramonov