ChatGPT, Gemini and Claude all failed to solve a simple test that humans are acing

In fact, this new AI examination system is causing issues for even the most advanced models.

ChatGPT’s GPT-4.5 model likewise scored 0.8%.

So what are they being tested on that is so hard?

What’s the test?

Once the pattern is identified, the model then has to produce the correct answer.

It's a bit like learning grade-school math problems.

You cannot simply memorize your way to the answer.

Instead, the tasks require a model to apply existing knowledge and models of understanding to completely new problems.

Over 400 people were actually asked to take the same test.

On average, this human panel scored 60%, far exceeding even the best-performing AI models.

This is where the team behind the test believes we should be testing AI.

As the name suggests, this isn't the first version of this test.

In 2019, a Google employee created ARC-AGI-1.

This took AI four years to beat and showed the eventual advancement in reasoning for these models.
