
Two New Benchmarks Push AI Evaluation Into Research-Level Math and 3D Geometric Reasoning
Introduction
For years, the math benchmarks used to measure AI progress held up. Then they did not. OpenAI's o1 model scored 94.8% on the MATH dataset, a collection of competition-level problems that was expected to challenge AI systems...