- Sabrina Ramonov 🍄
How I Used ChatGPT-4o to Help Build Graph Game
Parsing Neural Network Architecture Diagrams with ChatGPT-4o
Yesterday, I hit Top 2 on Hacker News, my 2nd time in the past 7 days.
I launched Graph Game — an interactive drag-and-drop game to help you learn and practice neural network architectures.
I built Graph Game to help me visualize, understand, and remember the flow of data in various neural networks.
In this post, I walk through how I used multimodal ChatGPT-4o to help me build new levels for the game by parsing nodes and edges from diagrams.
Here’s a YouTube version of this post if you prefer to watch:
Personally, I love games for learning.
I find games more engaging than staring at a paper or book due to the immediate feedback I get when I do something right or wrong.
I grew up playing Baldur’s Gate and MMORPGs, plus I confess to a brief, embarrassing obsession with Neopets.
But as discussed in the HN thread, Graph Game assumes a basic knowledge of neural networks. If you’re not sure what the R in RNN stands for, it may be frustrating!
The tech stack was simple:
TypeScript and React Flow for the node-based UI.
Once I built a working version, my next challenge was adding new game levels, i.e. neural networks to be assembled.
This involves listing nodes, specifying their types and subtypes, and defining how they connect to each other via edges.
Here is an example specification for an LSTM cell:
Sabrina Ramonov @ sabrina.dev
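To make the idea concrete in text form, here is a minimal sketch of what such a level specification could look like in TypeScript. The type names, field names, and node IDs are illustrative assumptions rather than Graph Game's actual schema, and the wiring follows a standard textbook LSTM cell, which may differ in detail from the diagram used in the game.

```typescript
// Hypothetical schema for a Graph Game level. All type names, field names,
// and IDs here are illustrative assumptions, not the game's actual code.
type NodeType = "input" | "gate" | "operation" | "output";

interface LevelNode {
  id: string;
  type: NodeType;
  subtype: string; // e.g. "sigmoid", "tanh", "multiply", "add"
  label: string; // text shown on the node
}

interface LevelEdge {
  source: string; // id of the node data flows from
  target: string; // id of the node data flows to
}

interface Level {
  name: string;
  nodes: LevelNode[];
  edges: LevelEdge[];
}

const gateIds = ["forget_gate", "input_gate", "candidate", "output_gate"];

// Every gate reads both the current input and the previous hidden state.
const gateInputEdges: LevelEdge[] = [];
for (const gate of gateIds) {
  gateInputEdges.push({ source: "x_t", target: gate });
  gateInputEdges.push({ source: "h_prev", target: gate });
}

const lstmCell: Level = {
  name: "LSTM Cell",
  nodes: [
    { id: "x_t", type: "input", subtype: "vector", label: "x_t" },
    { id: "h_prev", type: "input", subtype: "vector", label: "h_(t-1)" },
    { id: "c_prev", type: "input", subtype: "vector", label: "c_(t-1)" },
    { id: "forget_gate", type: "gate", subtype: "sigmoid", label: "Forget gate" },
    { id: "input_gate", type: "gate", subtype: "sigmoid", label: "Input gate" },
    { id: "candidate", type: "gate", subtype: "tanh", label: "Candidate" },
    { id: "output_gate", type: "gate", subtype: "sigmoid", label: "Output gate" },
    { id: "forget_mul", type: "operation", subtype: "multiply", label: "x" },
    { id: "input_mul", type: "operation", subtype: "multiply", label: "x" },
    { id: "cell_add", type: "operation", subtype: "add", label: "+" },
    { id: "cell_tanh", type: "operation", subtype: "tanh", label: "tanh" },
    { id: "output_mul", type: "operation", subtype: "multiply", label: "x" },
    { id: "c_t", type: "output", subtype: "vector", label: "c_t" },
    { id: "h_t", type: "output", subtype: "vector", label: "h_t" },
  ],
  edges: [
    ...gateInputEdges,
    // Forget path: scale the previous cell state.
    { source: "forget_gate", target: "forget_mul" },
    { source: "c_prev", target: "forget_mul" },
    // Input path: scale the candidate values.
    { source: "input_gate", target: "input_mul" },
    { source: "candidate", target: "input_mul" },
    // New cell state is the sum of both paths.
    { source: "forget_mul", target: "cell_add" },
    { source: "input_mul", target: "cell_add" },
    { source: "cell_add", target: "c_t" },
    // New hidden state: output gate times tanh of the new cell state.
    { source: "cell_add", target: "cell_tanh" },
    { source: "cell_tanh", target: "output_mul" },
    { source: "output_gate", target: "output_mul" },
    { source: "output_mul", target: "h_t" },
  ],
};
```

Even this small cell needs 14 nodes and 19 edges in the sketch, which gives a sense of how much typing a single level involves.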
But doing this manually is time-consuming, tedious, and error-prone.
I decided to try multimodal ChatGPT-4o to automate this process.
Parsing LSTM Cell Diagram
I started by giving ChatGPT an image of a long short-term memory (LSTM) cell along with a list of nodes and edges, hoping this example would help ChatGPT learn and improve its accuracy when parsing new diagrams.
LSTM Cell
I asked ChatGPT to analyze the diagram and understand how the diagram maps to the list of nodes and edges.
I incorporated a simple prompt engineering technique, Chain of Thought:
Take a deep breath and take it step-by-step.
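If you wanted to reuse this prompt across diagrams, one option is a small helper that assembles the text to send alongside the image. This is only a sketch: the function name, the `ExampleGraph` shape, and all wording except the final step-by-step instruction are my assumptions, not the exact prompt used here.

```typescript
// Hypothetical helper. Only the final instruction is quoted from the post;
// the rest of the wording is an assumed stand-in for the actual prompt.
interface ExampleGraph {
  nodes: string[];
  edges: Array<[string, string]>; // [source, target]
}

function buildDiagramPrompt(diagramName: string, example?: ExampleGraph): string {
  const lines: string[] = [`The attached image is a diagram of a ${diagramName}.`];
  if (example) {
    // Worked example: give the model the known answer for this diagram.
    lines.push("Here are its nodes and edges:");
    lines.push(`Nodes: ${example.nodes.join(", ")}`);
    for (const [source, target] of example.edges) {
      lines.push(`Edge: ${source} -> ${target}`);
    }
    lines.push("Analyze the diagram and explain how it maps to this list.");
  } else {
    // New diagram: ask the model to produce the list itself.
    lines.push("List every node and every edge in the diagram.");
  }
  // Step-by-step nudge, placed last so it applies to the whole request.
  lines.push("Take a deep breath and take it step-by-step.");
  return lines.join("\n");
}

const lstmPrompt = buildDiagramPrompt("LSTM cell", {
  nodes: ["x_t", "forget_gate"],
  edges: [["x_t", "forget_gate"]],
});
```

Calling `buildDiagramPrompt("ResNet block")` without an example produces the shorter variant for a new diagram.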
The result?
ChatGPT did not disappoint!
It correctly identified all nodes and edges.
It also explained how the nodes and edges map to the architecture diagram, as well as the overall flow of the diagram.
Parsing ResNet Block Diagram
Encouraged by this success, I give ChatGPT a new diagram:
Residual Network (ResNet) block.
ResNet Block
Again, it performed perfectly!
ChatGPT identified the correct nodes and edges, plus the flow of data.
Formatting the Output
But I need to reformat ChatGPT’s output, so that I can directly copy-paste it into my Graph Game code.
I ask ChatGPT to format the list of nodes and edges, and I give it an example.
The result?
Great job GPT-4o!
For each node, it lists the type, subtype, and label, just the way I need it.
Curious to test the output, I copy and paste it directly into Graph Game's levels.ts and start playing around, connecting nodes with edges:
Everything connects as expected, without any extraneous nodes or edges!
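A check like this can also be automated instead of eyeballed. Below is a minimal sketch of a validator that flags duplicate node IDs, edges pointing at unknown nodes, repeated edges, and nodes with no connections. The `validateGraph` function and its types are hypothetical, and the ResNet block is a simplified illustration, not the exact output GPT-4o produced.

```typescript
// Hypothetical validator for pasted node/edge lists; not part of Graph Game.
interface GraphNode {
  id: string;
}

interface GraphEdge {
  source: string;
  target: string;
}

interface ValidationReport {
  duplicateNodeIds: string[];
  danglingEdges: GraphEdge[]; // edges that reference an unknown node id
  duplicateEdges: GraphEdge[]; // same source and target seen more than once
  isolatedNodes: string[]; // node ids that no valid edge touches
}

function validateGraph(nodes: GraphNode[], edges: GraphEdge[]): ValidationReport {
  const ids = new Set<string>();
  const duplicateNodeIds: string[] = [];
  for (const node of nodes) {
    if (ids.has(node.id)) duplicateNodeIds.push(node.id);
    ids.add(node.id);
  }

  const connected = new Set<string>();
  const seenEdges = new Set<string>();
  const danglingEdges: GraphEdge[] = [];
  const duplicateEdges: GraphEdge[] = [];
  for (const edge of edges) {
    if (!ids.has(edge.source) || !ids.has(edge.target)) {
      danglingEdges.push(edge);
      continue;
    }
    const key = JSON.stringify([edge.source, edge.target]);
    if (seenEdges.has(key)) duplicateEdges.push(edge);
    seenEdges.add(key);
    connected.add(edge.source);
    connected.add(edge.target);
  }

  const isolatedNodes = nodes.filter((n) => !connected.has(n.id)).map((n) => n.id);
  return { duplicateNodeIds, danglingEdges, duplicateEdges, isolatedNodes };
}

// Simplified ResNet block: two convolutions plus a skip connection into "add".
const resnetNodes: GraphNode[] = [
  { id: "x" }, { id: "conv1" }, { id: "relu1" },
  { id: "conv2" }, { id: "add" }, { id: "relu2" },
];
const resnetEdges: GraphEdge[] = [
  { source: "x", target: "conv1" },
  { source: "conv1", target: "relu1" },
  { source: "relu1", target: "conv2" },
  { source: "conv2", target: "add" },
  { source: "x", target: "add" }, // skip connection
  { source: "add", target: "relu2" },
];

const report = validateGraph(resnetNodes, resnetEdges);
```

For a clean level, every list in `report` comes back empty; anything else points at the exact node or edge to fix.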
Overall, things are proceeding smoothly with multimodal GPT-4o.
It’s already saved me a bunch of time and effort that I would’ve dreaded spending on doing this manually.
Parsing Deep RNN Diagram
To push the boundaries, I give ChatGPT a more complex diagram:
A deep recurrent neural network (RNN).
I follow the same approach, giving ChatGPT the diagram and asking for the list of nodes and edges.
Deep RNN
ChatGPT handled this complex diagram with ease, even correctly rendering subscript and superscript labels for each node.
Even though I didn’t provide an example of my desired output format, ChatGPT used the session context to format the output without me asking explicitly.
I literally copy-pasted GPT-4o’s output directly into my Graph Game codebase, no alterations or edits.
Voila, it worked on the first try, with no issues!
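Because an unrolled deep RNN is a regular grid, one way to cross-check a parsed result is to generate the expected nodes and edges programmatically and compare counts. The sketch below assumes a common textbook layout: each hidden state receives input from the layer below at the same time step and from the same layer at the previous time step, and the top layer feeds an output node. The `buildDeepRnn` function, the ID scheme, and the plain-text labels are hypothetical and may not match the diagram or the labels GPT-4o produced.

```typescript
// Hypothetical generator for an unrolled deep RNN; IDs and labels are assumed.
interface RnnNode {
  id: string;
  type: "input" | "hidden" | "output";
  label: string;
}

interface RnnEdge {
  source: string;
  target: string;
}

function buildDeepRnn(layers: number, steps: number): { nodes: RnnNode[]; edges: RnnEdge[] } {
  if (layers < 1 || steps < 1) {
    throw new Error("layers and steps must both be at least 1");
  }
  const nodes: RnnNode[] = [];
  const edges: RnnEdge[] = [];
  // Hidden state of layer l at time step t.
  const hiddenId = (l: number, t: number): string => `h_${t}_${l}`;

  for (let t = 1; t <= steps; t++) {
    nodes.push({ id: `x_${t}`, type: "input", label: `x_${t}` });
    for (let l = 1; l <= layers; l++) {
      nodes.push({ id: hiddenId(l, t), type: "hidden", label: `h_${t}^(${l})` });
      // Input from below: the raw input for layer 1, otherwise the previous layer.
      edges.push({ source: l === 1 ? `x_${t}` : hiddenId(l - 1, t), target: hiddenId(l, t) });
      // Recurrent connection from the same layer at the previous time step.
      if (t > 1) {
        edges.push({ source: hiddenId(l, t - 1), target: hiddenId(l, t) });
      }
    }
    // The top hidden layer feeds the output at each time step.
    nodes.push({ id: `o_${t}`, type: "output", label: `o_${t}` });
    edges.push({ source: hiddenId(layers, t), target: `o_${t}` });
  }
  return { nodes, edges };
}

// Two hidden layers unrolled over three time steps.
const deepRnn = buildDeepRnn(2, 3);
```

With 2 layers and 3 time steps this yields 12 nodes and 13 edges, a quick number to compare against whatever the model returns for the same diagram.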
Conclusion
I’m super impressed by how multimodal GPT-4o nailed this task, parsing nodes and edges from neural network architecture diagrams and helping me rapidly create new levels for Graph Game.
Here are the ResNet block level and deep RNN level made by GPT-4o!
Integrating ChatGPT’s output into the codebase, everything loaded perfectly. Nodes and edges were correctly connected. No errors or issues at all.
Best of all, ChatGPT-4o saved me from the tedious task of manual data entry! …the bane of every developer’s existence.
Even if there had been small mistakes, however, fixing them would’ve been much easier than starting from scratch.
P.S. Graph Game V2
Here’s my roadmap for Graph Game v2, mostly informed by the HN thread:
Provide More Explanations and Guidance
The current version of Graph Game is geared towards people who already have basic knowledge of neural network architectures.
If you don’t know much about deep RNNs, for example, you may feel intimidated or frustrated when starting that level, seeing lots of nodes.
It’s more accurate to describe the current Graph Game v1 as “testing your knowledge of neural network architectures” rather than helping beginners learn them.
In v2, I want to add explanations embedded within the game to help beginners understand what each network does and how data is supposed to flow.
Improve UX
I also want to improve the user experience of the game:
improve mobile experience
make the dots larger and more obvious
increase the clickable area for connections
make the game objective and progress clearer
improve indication of input and output connections