First Proof Challenge: AI Solves Half of Novel Math Lemmas, But Can't Invent New Math

Scientific American 2026-03-25-2

Eleven mathematicians posed 10 unpublished research lemmas to AI: public models solved 2, while scaffolded in-house systems solved 5-6. The score matters less than how the systems solved them: by brute-force assembly of existing tools, not by inventing new abstractions. That's the same ceiling every enterprise hits: AI is a spectacular research assistant and a mediocre strategist. The roughly 3x jump (from 2 solved to 5-6) came from multi-agent scaffolding, not model upgrades, which tells you where the real capability gains live. And Lauren Williams' attribution finding generalizes far beyond math: if you can't separate human from AI contribution in formal proofs, you definitely can't in your quarterly business review.

# tags

agentic-ai-viability reliability multi-model-strategy ai-1.0-defensibility

◆ entities

First Proof OpenAI Google Gemini Mohammed Abouzaid Lauren Williams

→ threads

agentic-ai-viability multi-model-strategy reliability

⟷ links

2026-03-21-2 2026-03-08-1 2026-03-13-1 2026-03-22-3 2026-03-17-3 2026-03-13-w3 2026-03-10-1 2026-03-20-w1 2026-03-20-w2 2026-03-18-1
