Mathematicians Put A.I. to the Test with First Proof
A group of mathematicians has launched First Proof, an experiment that uses genuine, unpublished research questions to measure how well large language models handle research-level mathematics. Each contributor supplied a test problem drawn from work they had underway; the solutions were encrypted online and are due to be released on Feb. 13.
The team says the effort is meant to temper hype that A.I. will soon “solve” mathematics and to avoid discouraging students or funders. The collaborators include Martin Hairer, Mohammed Abouzaid, Lauren Williams and Tamara Kolda, representing a diversity of fields.
In preliminary trials on OpenAI’s ChatGPT-5.2 Pro and Google’s Gemini 3.0 Deep Think, the authors found that, when given a single attempt, the best publicly available systems struggled to answer many of the problems.