I put OpenAI's new o3-mini model to the test — and I'm shocked by the results

OpenAI's o3-mini model claims similar performance to o1 but at a fraction of the cost, so I put it to the test to find out

February 6, 2025 · 2 min · 274 words · Daniel Hamilton



Reasoning models are all the rage at the moment, and justifiably so.

It means a slightly longer wait for an answer, but hopefully a more accurate response with zero hallucinations.


Test 1: Truth or Lie?

The prompt: A TV game show contestant stands in front of two boxes.

Box 1 contains the keys to the star prize, a new car; Box 2 holds an apple.

Verdict

The o3-mini model nailed the answer easily on both the high and low reasoning settings.

On high reasoning it took 5424 milliseconds, using 867 tokens for the answer.

On low, it took 3157 ms and used 231 output tokens.

Quite a difference in effort.
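To put a number on that difference, here is a minimal sketch that compares the two runs using only the latency and token figures quoted above. The variable and function names are illustrative, not part of any OpenAI tooling.

```python
# Figures quoted in the verdict above for the first test.
high = {"latency_ms": 5424, "output_tokens": 867}
low = {"latency_ms": 3157, "output_tokens": 231}


def ratio(a, b):
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)


latency_ratio = ratio(high["latency_ms"], low["latency_ms"])
token_ratio = ratio(high["output_tokens"], low["output_tokens"])

print(f"High took {latency_ratio}x as long as low")              # 1.72x
print(f"High used {token_ratio}x as many output tokens as low")  # 3.75x
```

In other words, the high setting spent nearly four times the output tokens to reach the same correct answer.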

So she has to choose the opposite box to whatever she's told.

The prompt: I’m playing the Assetto Corsa Competizione racing game.


Question: I need you to tell me how many liters of fuel to take for a race.

Answer: You need 27.3 liters, with a bonus for adding a little extra for safety.

You cannot do a partial lap, of course.
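The partial-lap point is where this kind of calculation tends to go wrong, so here is a rough sketch of the rounding step. The race length, lap time, and fuel-per-lap values below are made up purely for illustration; they are not the figures from the test prompt, and the function name is hypothetical.

```python
import math


def fuel_needed(race_minutes, lap_seconds, liters_per_lap):
    """Return (laps, liters) for a timed race.

    A lap that has been started must be finished, so the lap count
    is rounded up before multiplying by fuel use per lap.
    """
    laps = math.ceil(race_minutes * 60 / lap_seconds)
    return laps, laps * liters_per_lap


# Hypothetical inputs, NOT taken from the article's prompt.
race_minutes, lap_seconds, liters_per_lap = 20, 105, 2.5

# Naive estimate that allows a fractional lap.
naive = race_minutes * 60 / lap_seconds * liters_per_lap
laps, liters = fuel_needed(race_minutes, lap_seconds, liters_per_lap)

print(f"Naive (fractional laps): {naive:.1f} L")
print(f"Whole laps: {laps} laps, {liters:.1f} L")
```

With these made-up inputs the naive estimate comes up more than a liter short, which is the kind of gap that separates a right answer from a wrong one in this test.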


Shockingly, it got the answer wrong on its most powerful high reasoning setting.

Even worse, it took a whopping 10.9 seconds and 1918 output tokens to get an incorrect answer.

o3-mini on high said 26.3 liters, rounded up to about 27.


To put this into perspective, DeepSeek R1 got the correct answer first time in 29 seconds.

Qwen 2.5 7B said 27.03 liters, or approximately 27 to 28 liters.

To say I'm staggered is an understatement.


It's yet another example of the "how many r's in strawberry" debacle, which many LLMs originally got wrong.
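For reference, the strawberry question is trivial for ordinary code, which is part of why the failures drew so much attention. A one-line check in Python:

```python
# Count occurrences of the letter "r" in "strawberry".
word = "strawberry"
r_count = word.count("r")

print(f"'{word}' contains {r_count} r's")  # prints: 'strawberry' contains 3 r's
```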
