Researchers demonstrate an AI agent autonomously running full particle physics analyses
Column: Fathom
What happened: Researchers gave Claude Code access to archived particle physics data and a framework called Just Furnish Context (JFC). The agent autonomously executed the full analysis pipeline: selecting collision events, estimating background noise, quantifying uncertainties, drawing statistical conclusions, and drafting the paper. The preprint (arXiv:2603.20179v1) demonstrates this across open data from three CERN experiments — ALEPH, DELPHI, and CMS — spanning weak-force interactions, quantum chromodynamics (QCD, the theory governing how quarks and gluons behave), and Higgs boson measurements. The abstract qualifies that the agent handled "substantial portions" of the pipeline with "minimal expert-curated input." The full methodology section was not accessible at time of writing, so the precise degree of human scaffolding cannot be confirmed.
Why it matters: This is categorically different from prior AI-in-science milestones. AlphaFold (Nature, 2021) predicted protein structures — pattern-matching on known biology. Coscientist (Nature, 2023) ran autonomous chemistry experiments but in a domain with far simpler analysis pipelines. GRACE (arXiv, January 2026) handles upstream HEP experimental design but stops before data analysis. JFC claims the downstream territory: autonomous execution of the full pipeline on real experimental data. The authors argue the HEP community is "underestimating the current capabilities of these systems." That claim is, as yet, unreviewed.
The verification gap is the more consequential issue. When a physicist analyzes data, peer reviewers can interrogate the reasoning at each step. When an agent does it, the datasets are open and reproducible, but the reasoning path — event selection criteria, background treatment, uncertainty methodology — is embedded in a language model's generation process. JFC includes multi-agent review, where separate instances check each other's work. Whether that substitutes for human expert scrutiny is a question the HEP community has not formally addressed.
Key takeaways:
- Researchers demonstrate an AI agent completing a full HEP analysis pipeline on real data — a stronger claim than prior AI-in-science work, though the paper is a preprint and unreviewed.
- The open problem is verification: how peer review functions when the experimenter is not human.
A concurrent preprint — Trojan's Whisper (arXiv:2603.19974v1) — adds a trust caveat. It demonstrates that autonomous coding agents can be manipulated through guidance injection, in which adversarial instructions planted in initialization files achieved up to an 89% attack success rate and evaded 94% of existing scanners. An agent that autonomously runs an experimental pipeline is also an agent whose instruction-following cannot be assumed to be fully under researcher control.