MIDWATER

Two New Benchmarks Push AI Evaluation Into Research-Level Math and 3D Geometric Reasoning

VERIFIED (Confidence: 80%)

Introduction

For years, the math benchmarks used to measure AI progress held up. Then they did not. OpenAI's o1 model scores 94.8% on the MATH dataset, a collection of competition-level problems that was expected to challenge AI systems...
