verifier-bottleneck
6 items · chronological order
ArXiv to Ban Researchers for a Year if They Submit AI Slop
ArXiv's one-year ban targets only 'incontrovertible' cases, meaning LLM meta-comments left in manuscripts and hallucinated references, which leaves sophisticated AI use untouched by design. The Columbia biomedical data behind the policy shows fabricated citations running from 1 in 2,828 papers in 2023 to 1 in 277 in early 2026, and the policy's narrow scope isn't a bug: detection scales with submissions times sophistication, deterrence scales flat, and when the first exceeds budget you switch to the second. bioRxiv, SSRN, and PubMed Central are next, and arXiv's nonprofit transition in July is explicitly fundraising for the verification cost center that every major research repository will have to build.
OpenAI Model Disproves Erdos Unit Distance Conjecture
An internal OpenAI model disproved Erdos's 1946 planar unit distance conjecture, with Princeton's Sawin extracting an explicit exponent delta=0.014 in a constructive refinement, and Gowers calling it Annals-of-Mathematics quality. The bigger signal isn't the proof. It's Shankar's CoT observation: most of the model's reasoning attempted counterexamples to the conjecture, not validations of it. That's calibrated contrarianism — a scorable behavioral property and the math-grounded analogue to sycophancy detection. Verifier-rich domains are where autonomous AI lands first; counterexample-seeking is how we'll measure whether reasoning is real or performative.
DeepMind Co-Scientist: A multi-agent AI partner to accelerate research
DeepMind's Co-Scientist paper in Nature drops the actual bombshell in one sentence — the majority of system compute goes to verifying hypotheses, not generating them. The moat isn't Gemini; it's the verifier corpus that grounds each claim: AlphaFold, ChEMBL, UniProt, the literature stack Google has quietly accumulated. Every "AI for vertical X" startup pricing the model layer is pricing the wrong layer of the stack.
Google's AI is being manipulated. The search giant is quietly fighting back
A BBC journalist published one page on his personal site claiming hot-dog-eating prowess; 20 minutes later ChatGPT, Gemini, and Google AI Overviews were repeating it. Google's response to a $0 attack floor against a 2.5 billion monthly-view surface: a spam-policy clarification. Two things worth pricing: verified-publisher trust premium inverts upward as AI-citability becomes a defensible moat distinct from SEO, and adversarial-input regression suites become procurement-grade table-stakes for any enterprise running RAG against external corpora.
Google's AI is being manipulated. The search giant is quietly fighting back
A journalist published one page on his personal site claiming hot-dog-eating prowess; 20 minutes later ChatGPT, Gemini, and Google AI Overviews were repeating it as fact. Google's response to a $0 attack floor against a 2.5 billion monthly-view surface was a spam-policy clarification — which is another way of saying verification infrastructure was never part of the original build. The mechanism here is identical to what's arriving in the litigation market: AI lowered the cost of generating content that systems trust, without building any corresponding layer to evaluate whether that trust is warranted. Verified-publisher authority is repricing upward not because editorial quality improved, but because AI-citability is now a distinct and defensible position from SEO. Adversarial-input regression testing follows the same logic as DeepMind's verifier corpus: the evaluation layer is where the economics are accumulating.
DeepMind Co-Scientist: A multi-agent AI partner to accelerate research
The detail that reorients the entire Co-Scientist paper: the majority of system compute goes to verifying hypotheses, not generating them. DeepMind didn't build a research assistant on top of Gemini — it built a verifier corpus (AlphaFold, ChEMBL, UniProt, the full literature stack) and wrapped a generator around it. That architectural choice is the same bet surfacing in the Bloomberg litigation data and the BBC manipulation piece: generation is cheap and increasingly generic, and the organizations that accumulated verification infrastructure before the model layer commoditized are holding the durable position. Every 'AI for vertical X' startup that priced the model layer priced the wrong thing. The moat was always the corpus that tells you whether the output is true.