ai-for-science

9 items

Nature 2026-05-07-2

How much of the scientific literature is generated by AI?

Three independent studies converge on the same finding, that AI-generated text is now widespread: it appears in 30% of peer reviews at Organization Science, 1 in 8 top-tier biomedical papers, and 43% of arXiv CS review preprints. The verifier and the verified are using the same tool. This is the fourth domain in 30 days, after enterprise dev, frontier math, and frontier physics, where verification has emerged as the binding constraint on AI-era knowledge work. The investable thesis is no longer single-domain. The implications: detection-vendor integration becomes the next moat in scientific publishing, pre-2026 literature becomes a scarcity asset, and mid-tier journals collapse.

New York Times Magazine 2026-04-15-3

Why It's Crucial We Understand How A.I. 'Thinks'

Interpretability's real breakthrough isn't cracking the black box: it's using imperfect understanding to extract hypotheses humans missed. Goodfire and Prima Mente's Alzheimer's biomarker discovery reframes the field from safety obligation to discovery engine. The commercial signal matters more than the methodology debates: $1.25B for a standalone interpretability lab means enterprises will pay for explanation scoped to specific use cases, not universal model transparency.

Quanta Magazine 2026-04-14-2

The AI Revolution in Math Has Arrived

AlphaEvolve found hypercube structures in permutation groups that mathematicians hadn't noticed in 50 years: not by answering the question posed, but by surfacing a pattern nobody thought to look for. The real capability shift isn't AI proving things faster; it's AI scanning combinatorial spaces too large for human intuition and returning structures that reframe entire research programs. Discovery is being commoditized; the scarce resource is now verification infrastructure and the human judgment to recognize which discoveries matter.

tisram.ai 2026-03-31-m3

Evaluation Is the Layer Nobody Built

A $25 pipeline producing publishable economic theory and 700 experiments running in two days look like productivity stories. They're actually stress tests for organizations that still measure AI value by what gets generated rather than what gets used. The legibility piece named the terminal form of this problem: AI-for-science will produce discoveries faster than labs, regulators, and clinical infrastructure can absorb them, and the bottleneck was never generation. That dynamic was already visible in week one, when the BCG data showed cognitive load spiking as oversight demands increased. The human-in-the-loop model assumes a human with enough bandwidth to loop, and that assumption is failing in practice. The tokenmaxxing story closes the arc: when consumption volume becomes the proxy for productivity, every measurement framework in the organization ends up optimized for the wrong thing. Read together, all three weeks show that the generation layer is effectively solved, and that the evaluation layer (scoring architecture, provenance infrastructure, and translation tooling between machine output and institutional deployment) is where the next competitive advantage will be built. The companies that treat evaluation as an engineering problem now, rather than as a governance afterthought, will hold a position in 18 months that no amount of inference spend can replicate.

Scientific American 2026-03-29-3

AI Techniques Speed Up Forensic Analysis of Crucial Crime Scene Larvae

Two research teams replaced DNA sequencing with ML on cheaper instruments: mass spectrometry identifies species in under five minutes, and a handheld infrared instrument reads larval sex at 90% accuracy. The results are promising; the legal framework isn't. Courts require explainable, independently vetted forensic evidence, and DNA databases took decades to get there. Daubert-admissible AI is a different problem, and right now it's unfunded.

SSRN · 2026-03-26 2026-03-27-w2

Can LLMs Discover Novel Economic Theories?

A $25 pipeline generated 257 economic theories and independently converged on the same mechanism a human researcher published months later. That result is less a curiosity than a stress test for every organization currently spending on AI-powered generation. When the cost of producing candidates collapses to noise, the constraint shifts entirely to knowing which candidates are good. That's the connection to tokenmaxxing: both stories are about the same missing layer, the scoring infrastructure that converts output volume into output value. The Karpathy Loop works precisely because it starts with a measurable metric and a stopping criterion: the constraint is the insight, not the generation. Organizations that build deterministic scoring architecture now, with LLM judgment in a minority role, will compound their lead; the ones optimizing for generation throughput are manufacturing commodities at scale.
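
The "measurable metric plus stopping criterion" structure can be illustrated with a generic keep-if-better loop. This is a minimal sketch under assumptions, not the actual Karpathy Loop implementation: `metric_loop`, `propose`, `metric`, `target`, and `patience` are hypothetical names, the metric is assumed to be lower-is-better, and the stopping criterion is assumed to be either reaching a target score or going `patience` rounds without improvement.

```python
from typing import Callable, Tuple


def metric_loop(
    propose: Callable[[float], float],
    metric: Callable[[float], float],
    start: float,
    target: float,
    max_iters: int = 1000,
    patience: int = 50,
) -> Tuple[float, float, int]:
    """Keep a candidate only if it improves a measurable metric (lower is better).

    Hypothetical sketch: stops when the score reaches `target` or after
    `patience` rounds without improvement; returns
    (best candidate, best score, iterations used).
    """
    best, best_score = start, metric(start)
    stale = 0
    for i in range(1, max_iters + 1):
        candidate = propose(best)   # generation: cheap and unconstrained
        score = metric(candidate)   # evaluation: the metric decides what survives
        if score < best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
        # Stopping criterion: good enough, or no progress for `patience` rounds.
        if best_score <= target or stale >= patience:
            return best, best_score, i
    return best, best_score, max_iters


# Toy usage: step toward x = 3 under a squared-error metric.
print(metric_loop(lambda x: x + 1.0, lambda x: (x - 3.0) ** 2, start=0.0, target=0.0))
# prints (3.0, 0.0, 3)
```

The point of the design is that `propose` can be as cheap and noisy as you like; only candidates that move the metric are kept, and the loop knows when to stop.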

Asimov Press · 2026-03-27 2026-03-27-w3

The Legibility Problem

The legibility piece reframes the entire week's stakes: chess went from centaur to post-human in 20 years, and AI-for-science will follow the same arc, but every output still has to pass through labs, regulators, and clinical infrastructure that speak human. The bottleneck was never discovery; it's the translation layer between what AI generates and what human institutions can actually deploy. That gap is exactly what the measurement problem in tokenmaxxing and the $25 theory pipeline leave open: generation is solved, evaluation is partially solved, but operationalizing the output through organizations that weren't built for machine-speed science is unsolved. Whoever owns that translation infrastructure captures value from every breakthrough that needs to reach the physical world, regardless of which model or lab produced it. The capability race and the legibility race are running at different speeds, and the distance between them is where the real economic value will settle.

Asimov Press 2026-03-27-3

The Legibility Problem

Everyone's racing to build AI that does science. Nobody's building infrastructure for humans to use what it discovers. The bottleneck isn't discovery: it's deployment through human institutions. Chess went from centaur to post-human in 20 years; science will follow the same arc, but the output must still pass through labs, regulators, and clinical infrastructure that speak human. The entity that owns the translation layer between AI-generated and human-implementable science captures value from every breakthrough that needs to reach the physical world.

SSRN 2026-03-26-3

Can LLMs Discover Novel Economic Theories?

An automated pipeline generated 257 candidate economic theories for two open asset-pricing puzzles at a total cost of $25, and the system independently converged on the same limited-participation mechanism a human researcher published months later. The real finding isn't that LLMs can theorize; it's that when generation costs collapse to near zero, the only defensible position is evaluation infrastructure. Every organization pouring money into AI-powered generation should be spending 10x more on scoring architecture: deterministic anchors carrying majority weight, LLM judgment in the minority.
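
The "deterministic anchors carrying majority weight, LLM judgment in the minority" design can be sketched as a weighted composite score. A minimal sketch under assumptions: `composite_score`, `rank`, the 0.3 default weight, and the toy scorers are hypothetical; every scorer is assumed to return a value in [0, 1]; and `stub_judge` is a placeholder for whatever model call an organization actually uses, so no real LLM API is invoked.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]


def composite_score(
    candidate: str,
    anchors: Sequence[Scorer],
    llm_judge: Scorer,
    llm_weight: float = 0.3,
) -> float:
    """Weighted blend: deterministic anchors carry majority weight, the LLM judge the rest.

    Hypothetical sketch; every scorer is assumed to return a value in [0, 1].
    """
    if not 0.0 <= llm_weight < 0.5:
        raise ValueError("LLM judgment must carry minority weight (0 <= w < 0.5)")
    if not anchors:
        raise ValueError("at least one deterministic anchor is required")
    anchor_mean = sum(check(candidate) for check in anchors) / len(anchors)
    return (1.0 - llm_weight) * anchor_mean + llm_weight * llm_judge(candidate)


def rank(
    candidates: Sequence[str],
    anchors: Sequence[Scorer],
    llm_judge: Scorer,
    llm_weight: float = 0.3,
) -> List[str]:
    """Order candidates from best to worst composite score."""
    return sorted(
        candidates,
        key=lambda c: composite_score(c, anchors, llm_judge, llm_weight),
        reverse=True,
    )


def passes_check(candidate: str) -> float:
    """Hypothetical deterministic test (e.g. a reproducible numeric check)."""
    return 1.0 if candidate == "b" else 0.0


def stub_judge(candidate: str) -> float:
    """Placeholder for an LLM judgment call; here it simply prefers "a"."""
    return 1.0 if candidate == "a" else 0.0


print(rank(["a", "b"], [passes_check], stub_judge))  # ['b', 'a']
```

Rejecting any `llm_weight` of 0.5 or more is what keeps the judge in a minority role: in the toy example the judge prefers `"a"`, but the deterministic anchor outvotes it.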