evalrig-adjacent

7 items

Google DeepMind · 2026-05-20 2026-05-22-w1

DeepMind Co-Scientist: A multi-agent AI partner to accelerate research

The detail that reorients the entire Co-Scientist paper: the majority of system compute goes to verifying hypotheses, not generating them. DeepMind didn't build a research assistant on top of Gemini — it built a verifier corpus (AlphaFold, ChEMBL, UniProt, the full literature stack) and wrapped a generator around it. That architectural choice is the same bet surfacing in the Bloomberg litigation data and the BBC manipulation piece: generation is cheap and increasingly generic, and the organizations that accumulated verification infrastructure before the model layer commoditized are holding the durable position. Every 'AI for vertical X' startup that priced the model layer priced the wrong thing. The moat was always the corpus that tells you whether the output is true.

Bloomberg · 2026-05-22 2026-05-22-w3

Courts Are Swamped With AI-Powered Do-It-Yourself Lawsuits

Pro se employment filings grew 49% year-over-year (4,100 to 6,400) while attorney-led filings grew 15%, and Nippon Life burned roughly $300K defending one ChatGPT-assisted plaintiff trying to reopen a settled case. AI didn't make those plaintiffs more legally sophisticated; it flipped the cost asymmetry so that filing is nearly free and responding is not. That's the same structural gap the BBC piece exposes in information distribution and Co-Scientist exposes in research: generation costs collapsed, verification costs didn't move. The unoccupied product surface here sits on the defense side (sanctions detection, AI-authorship forensics, response-cost triage), and it's the same category as the verifier corpus DeepMind built, just at the opposite end of the market from Harvey. Volume markets with high cost-to-respond are permanently changed; the firms that figure out verification tooling own the economics of what comes next.

Bloomberg 2026-05-22-1

Courts Are Swamped With AI-Powered Do-It-Yourself Lawsuits

Bloomberg's DIY-lawsuit lede buries the structural point: pro se employment filings grew 49% YoY (4,100 → 6,400) while attorney-led filings grew 15%, and Nippon Life burned ~$300K defending one ChatGPT-assisted plaintiff trying to reopen a settled case. That's the actual story: AI didn't make plaintiffs smarter, it flipped the litigation cost asymmetry. Volume markets with high cost-to-respond just became permanently uneconomic for defendants, and the unoccupied product surface is defense-side adversarial-output verification (sanctions detection, AI-authorship forensics, response-cost triage), which is EvalRig-adjacent and sits at the opposite end of the market from Harvey.

The Handbasket 2026-05-22-2

Hating AI is good, actually

Pew clocking 53% pessimism vs 16% optimism on AI and creativity landed the same day WSJ put 'AI Rebellion' on the front page: sentiment confirmation, not signal. The actual signal is that the Rosenbaum book (fabricated quotes, author unrepentant) and Granta using Claude.ai to evaluate AI-suspected prize submissions landed in the same week: legitimacy is collapsing precisely where output verification was never built. Every CMO reading the WSJ piece has the same question their CTO hasn't answered yet: where in our stack does a Rosenbaum incident happen to us?

Wall Street Journal 2026-05-22-3

WSJ/Mims — 'Vibe Slop Crisis': 75% AI-generated code at Google, GitHub policy response, and the IPO-window verification arbitrage

Pichai says 75% of Google's new code is AI-generated, up from 50% six months ago; Claude Code's median user went from 20 minutes a day to 20 hours a week. GitHub changing its policies to fight AI-generated coding garbage in the same week the Zechner/Ronacher critique surfaces in WSJ isn't coincidence — it's practitioner alarm graduating to institutional press at exactly the OpenAI/Anthropic IPO moment. The market is pricing generation; the cliff it hasn't priced is verification.

Digiday 2026-05-21-1

The Economist's two-track web: agent-readable B2B pages, embedded pods, and the wholesale/retail split

The Economist is building two parallel surfaces: stripped-down Q&A pages for the agents where B2B buyers now start their research, and the glossy human-facing product where subscription pricing actually lives. De Zanche names it correctly: agent optimization is a defensive baseline, not differentiation, which means the agent track is wholesale and the human track is the only place premium pricing survives. The quieter story is the org-shape change underneath: six to eight cross-functional pods, editorial staff embedded next to engineers, science-desk editors vibe-coding journal-credibility utilities, and a productivity figure revised from 8 percent to more than doubled in a single news cycle.

Google DeepMind 2026-05-20-1

DeepMind Co-Scientist: A multi-agent AI partner to accelerate research

DeepMind's Co-Scientist paper in Nature drops the actual bombshell in one sentence — the majority of system compute goes to verifying hypotheses, not generating them. The moat isn't Gemini; it's the verifier corpus that grounds each claim: AlphaFold, ChEMBL, UniProt, the literature stack Google has quietly accumulated. Every "AI for vertical X" startup pricing the model layer is pricing the wrong layer of the stack.