This new AI test is causing problems for even the most advanced models.
OpenAI's GPT-4.5 model, available in ChatGPT, likewise scored just 0.8%.
So what are they being tested on that is so hard?
What’s the test?
Once it has identified the pattern, the model then has to work out the correct answer.
It's a bit like solving grade-school math problems.
You cannot simply memorize your way to the answer.
Instead, the tasks require a model to apply existing knowledge and models of understanding to completely new problems.
More than 400 people were asked to take the same test.
On average, this human panel scored 60%, far exceeding even the best-performing AI models.
This is where the team behind the test believes we should be testing AI.
As the name suggests, this isn't the first version of this test.
In 2019, a Google employee created ARC-AGI-1.
It took AI four years to beat, a milestone that showed how much these models' reasoning had eventually advanced.