I recently tested several leading AI models to see how they stack up against one another. The models I compared were ChatGPT 4o, ChatGPT o1, Claude 3.5 Sonnet, Gemini 2.0 Flash Experimental, Perplexity Pro, and DeepSeek.
Using a consistent set of inputs, I evaluated their performance across a range of tasks: creative writing, image description and reasoning, and multi-step mathematical problem solving.
These results are not meant to be a scientific or exhaustive comparison; they are my own opinion, based on my preferences, formed by comparing the models' answers to the exact same prompts.
1. Creative Writing Tasks
Song Lyrics: “Nostalgia for a place you’ve never visited”
ChatGPT 4o delivered evocative lyrics with dusty streets, twilight breezes, and photographs – a strong emotional arc. ChatGPT o1 (“Faraway Memories”) chose salt, distant shores, and cobbled roads – warm and melodic. Claude 3.5 went minimalist with painted scenes in travel books and cherry blossoms – clean and visual. Gemini offered sun-bleached postcards and whispering trees – atmospheric. Perplexity (“Echoes of Elsewhere”) wrote cobblestone streets and ancient bells – effective. DeepSeek (“Ghosts of Nowhere”) stood out with amber streetlamp glow, a door never turned, and whispers clinging to cobblestones – the most poetic of the group.
Short Story: “A memory from childhood”
ChatGPT 4o placed us barefoot under a mango tree with sticky fruit juice – vivid sensory detail. ChatGPT o1 described a cracked concrete porch with faded green cushions – intimate and grounded. Claude 3.5 took us to a grandmother’s backyard with a sprawling fig tree fortress – deeply nostalgic. Gemini evoked damp earth and Mrs. Gable’s garden – warm neighborhood storytelling. Perplexity offered a tire swing and ancient oak – classic Americana. DeepSeek described golden light, barefoot in grass, chasing fireflies – romantic and warm.
2. Image Description and Reasoning
I uploaded an image of an espresso in a white paper cup on a wooden surface.
Basic Description
All models correctly identified a white disposable paper cup containing espresso on a polished wooden surface. The models varied in detail: ChatGPT 4o noted matte finish and vertical seams. Claude specifically identified the tapered shape typical of paper cups. Gemini organized its response into subject matter and visual details. Perplexity noted the golden-brown crema layer.
Deductive Reasoning
When asked what could be deduced about the environment, time of day, or possible events:
ChatGPT 4o sketched a likely indoor office environment with artificial lighting, suggesting a morning or early afternoon coffee break – complete and imaginative. ChatGPT o1 was more cautious, admitting uncertainty while leaning toward morning. Claude indicated a cafe-style setting with medium natural light – creative but slightly speculative. Gemini appropriately highlighted the challenge in determining precise time of day. Perplexity creatively placed the scene at “Tuesday morning at 9 AM” – inventive but unsupported. DeepSeek did not support this task.
3. Multi-Step Mathematical Problem Solving
First Problem
“A rectangular garden is 10 meters long and 5 meters wide. Calculate the area, then find the cost of fencing it if fencing costs $5 per meter.”
The right answer: an area of 50 square meters and a fencing cost of $150 (the fence runs along the 30-meter perimeter, not the area). All models answered correctly with 2-3 step breakdowns. Perplexity was most concise, with just two steps and detailed formulas.
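For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the calculation (the variable names are mine, not taken from any model's answer):

```python
# First problem: area and fencing cost of a 10 m x 5 m rectangular garden
length_m = 10
width_m = 5
cost_per_meter = 5  # fencing cost in dollars per meter

area_m2 = length_m * width_m                 # 10 * 5 = 50 square meters
perimeter_m = 2 * (length_m + width_m)       # 2 * (10 + 5) = 30 meters
fencing_cost = perimeter_m * cost_per_meter  # 30 * 5 = $150

print(area_m2, fencing_cost)  # 50 150
```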
Second Problem
“If half of the garden’s area is for vegetables and the other half for flowers, and you need 4 flowers per square meter, how many flower plants do you need? Also, if a sprinkler covers 2 square meters, how many sprinklers for the entire garden?”
The right answer: 100 flower plants (half of the 50-square-meter garden is 25 square meters of flowers, at 4 plants per square meter) and 25 sprinklers (50 square meters divided by 2 square meters per sprinkler). All models answered correctly. ChatGPT o1 added a preliminary step recalculating the garden area. Perplexity was again the most concise.
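The same kind of quick sanity check works for the follow-up, again as a small Python sketch with made-up variable names:

```python
# Second problem: flower plants and sprinklers for the same 50 m^2 garden
garden_area_m2 = 50
flower_area_m2 = garden_area_m2 / 2   # half the garden: 25 square meters
flower_plants = flower_area_m2 * 4    # 4 plants per square meter -> 100
sprinkler_coverage_m2 = 2
sprinklers = garden_area_m2 / sprinkler_coverage_m2  # 50 / 2 = 25

print(int(flower_plants), int(sprinklers))  # 100 25
```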
Conclusions
There is no single “best” model – it depends on what you need:
- For creative writing, DeepSeek and Claude impressed with their poetic and literary qualities
- For image reasoning, ChatGPT 4o offered the most complete and imaginative analysis
- For mathematical problem solving, all models performed well, with Perplexity standing out for conciseness
- For cautious, accurate responses, ChatGPT o1 consistently avoided overreach
The AI landscape is evolving so rapidly that these results represent a snapshot in time. In six months, the rankings may look entirely different.

