First Proof Challenge Tests LLMs in Math Research
Results were posted on arXiv on Feb 13, 2026.
Problems were unpublished and encrypted before Feb 13, 2026.
Participants solved problems using GPT-5.1 Pro and Gemini 3 Pro.
Only two solutions were correct for problems 9 and 10.
No LLMs solved all ten problems.
A second round is planned with tighter controls; details are expected March 14, 2026.
The goal is to make First Proof a permanent benchmark for math and other domains.