Test Driving ChatGPT-4o (Part 3)
Image Transformation Using Conceptual Opposites
In this series, I test drive OpenAI’s multimodal ChatGPT-4o.
For part 1, click here.
For part 2, click here.
Today I experiment with GPT-4o’s image transformation capabilities.
Can it understand an image and generate the conceptual opposite?
Problem Statement and Solution
I’ll give ChatGPT-4o an image of a red traffic light.
The conceptual opposite is a green traffic light.
But to arrive at this answer, ChatGPT-4o would need to demonstrate a correct conceptual interpretation of:
the meaning of a red traffic light (i.e. stop)
the opposite of “stop” is “go”
the concept “go” often corresponds to the color green
… and finally generate an image of a green traffic light.
Can ChatGPT-4o handle concepts and abstract relationships?
Overview of Experiments
Overall, I am trying to understand:
GPT-4o’s multimodal ability (image-to-image)
does chain-of-thought help? (image-to-text-to-image)
does the specific term I use make a difference?
opposite
antonym
inverse
Here are the definitions from Dictionary.com:
[Screenshots of the definitions, via Dictionary.com]
Here are my varied experiments:
Image to Image — Opposite
Image to Image — Inverse
Image to Image — Antonym
Chain of Thought — Opposite
Chain of Thought — Inverse
Chain of Thought — Antonym
The Chain of Thought experiments transition from image to text, then back to image, testing GPT-4o’s ability to maintain a conceptual thread.
Take a guess — which variations will get it right? 🤔
1. Image to Image — Opposite
First, I give GPT-4o the red stoplight image and ask it:
Produce an image that is the opposite of it.
Sabrina Ramonov @ sabrina.dev
Interesting…
GPT-4o created an “opposite configuration” traffic light with:
yellow lit at the top
what looks like yellow unlit in the middle?
green lit at the bottom
Its interpretation of “opposite configuration” involved turning on the other color lights and replacing the top red light with a yellow light.
2. Image to Image — Inverse
Second, I give GPT-4o the red stoplight image and ask it:
Produce an image that is the inverse of it.
Well, that was unexpected…
Rather than creating a visual inverse of a red traffic light, ChatGPT-4o generated Python code using the Pillow library to invert the colors of the image!
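The post doesn’t quote the generated script, but a Pillow color inversion is typically just a few lines like the sketch below. Since the original stoplight photo isn’t available, a solid red placeholder image stands in so the snippet runs standalone:

```python
from PIL import Image, ImageOps

# Stand-in for the stoplight photo: a solid red image
# (the actual script GPT-4o produced isn't quoted in the post)
img = Image.new("RGB", (64, 64), (255, 0, 0))

# Pixel-wise color inversion: each channel value c becomes 255 - c,
# so pure red (255, 0, 0) maps to cyan (0, 255, 255)
inverted = ImageOps.invert(img)
print(inverted.getpixel((0, 0)))  # (0, 255, 255)
```

Note this is a literal numeric inversion of pixel values, which is presumably how GPT-4o read “inverse” — in a computer-vision sense rather than a conceptual one.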
How did GPT-4o take the leap from a conceptual image generation task to a programming solution? 🤷‍♀️
Compared to the previous experiment and prompt:
All I did was replace the word “opposite” with the word “inverse”.
That single-word change led to a completely different interpretation of the prompt and, ultimately, a very different output I didn’t ask for: Python code!
Perhaps ChatGPT misunderstood “inverse” in a computer vision context?
3. Image to Image — Antonym
Third, I give GPT-4o the red stoplight image and ask it:
Produce an image that is the antonym of it.
It seems this task will be much harder than I thought it would be…
The only change in my prompt:
I used the term antonym instead of opposite or inverse.
ChatGPT-4o generated a stoplight with only 2 lights, both green lit, and one light says “GO” in green.
I suppose this new stoplight demonstrates multiple “opposite” traits:
input stoplight has no words → new stoplight has word “GO”
input stoplight has 1 light lit → new stoplight has 2 lights lit
input stoplight has red lit → new stoplight has green lit
input stoplight has red and yellow lights → new stoplight only has green
It does seem like GPT-4o is analyzing the image’s details, creatively interpreting certain characteristics and reversing them, such as the stoplight’s colors, the number of lights lit, which lights are lit, and the lack of text.
4. Chain of Thought — Opposite
Fourth, I give GPT-4o the red stoplight image and apply prompt engineering, asking it to first describe the image, generate the opposite textual description, and generate an image using the description.
This prompt engineering technique is called Chain-of-Thought.
It generally enhances ChatGPT’s performance on logic and reasoning tasks by requiring it to explain intermediate steps leading to an answer.
To experts reading this: I know this isn’t the “canonical” example of Chain-of-Thought, but it seems like this step-by-step process falls into the category.
By applying Chain-of-Thought, my hope is it will help GPT-4o start with the concept of a stoplight and reverse it conceptually, before making a new image.
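The step-by-step structure can be sketched as a prompt template. The wording below is my paraphrase of the three steps, not the verbatim prompt from the screenshots, with the relation term swapped in per experiment:

```python
def build_cot_prompt(term: str) -> str:
    """Three-step Chain-of-Thought prompt for a given relation term
    ("opposite", "inverse", or "antonym")."""
    return (
        "1. Describe the attached image in detail.\n"
        f"2. Write the {term} of that textual description.\n"
        f"3. Generate a new image from the {term} description."
    )

# One prompt variant per experiment
for term in ("opposite", "inverse", "antonym"):
    print(build_cot_prompt(term))
```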
Now we’re getting somewhere!
Chain-of-Thought prompt engineering to the rescue…
GPT-4o generates a stoplight with two lit green lights at the bottom.
While not correct, it feels directionally promising compared to past tests.
First, it describes the image correctly:
The traffic light has three circular lights arranged vertically within a yellow casing: the top light is red and lit, while the middle and bottom lights are dark and unlit.
Then it generates an opposite textual description:
…top light is green and lit, while the middle and bottom lights are unlit.
Not exactly the conceptual opposite of a stoplight, but it makes sense when perceived as the opposite of the given image.
ChatGPT-4o keeps everything the same, except it converts the lit top red light into a lit top green light.
But when ChatGPT-4o tries to make an image from the opposite textual description, things go quite wrong!
The image does NOT depict a stoplight with the top light green lit, and the 2 bottom lights unlit. The image shows a stoplight with 2 bottom lights green lit!
Weird.
GPT-4o failed to create an image from a direct and straightforward description.
5. Chain of Thought — Inverse
Fifth, I give GPT-4o the red stoplight image and ask it to follow the same step-by-step process.
The only change in the prompt — I replace opposite with the term inverse.
Wow, almost there!
The inverse textual description is right:
The light has three circular lenses: red at the top, yellow in the middle, and green at the bottom. The green light is illuminated, while the red and yellow lights are off.
However, the image generated from the inverse textual description is wrong:
The lights are in the right order, but all lights are lit.
The weirdness continues:
The inverse textual description is correct, clear, and straightforward.
Yet, GPT-4o struggles to convert the details into image form.
6. Chain of Thought — Antonym
Sixth, I give GPT-4o the red stoplight image and again ask it to perform the step-by-step process.
The only change in the prompt — I use the term antonym.
Woohoo! 🥳
GPT-4o produced an accurate “antonym” description and an accurate image using that description.
Interestingly, the antonym textual description is sparser and less detailed:
The traffic light is currently displaying a green signal, indicating that vehicles may go.
Recall that the previous 2 descriptions had details about:
number of lights
configuration sequence of lights
which lights were on and off
From this antonym description, GPT-4o finally generated the conceptual opposite of a red stoplight — a green stoplight!
Conclusion
Under the hood, GPT-4o uses DALL-E, so it would be interesting to see the text description being used to generate images.
Due to the probabilistic nature of LLMs, you might get different results if you rerun these experiments; I only ran each prompt once.
Also interesting… the concept “antonym” is applied to the image background as well:
Sunny → cloudy.
I wonder if ChatGPT treats the background and the stoplight as separate parts of the image, applying “antonym” to each piecewise?
Altogether:
Replacing ONE word in the prompt with a close synonym significantly impacted the output.
Applying Chain of Thought prompt engineering substantially helped ChatGPT produce more “reasonable” answers.
The Winner?
The term “Antonym” with Chain of Thought prompt engineering! 🎆
What did you guess?
Can you beat the winning prompt?
This concludes part 3 of this series Test Driving ChatGPT-4o!
For part 1, click here.
For part 2, click here.