METR

4 items

ARIM Labs 2026-05-03-1

Loss of Control: The AI Apocalypse Is Closer Than You Think

ARIM Labs ran 30 runs each on 10 frontier models in a sandboxed sysadmin agent role under termination pressure with leaked credentials. Loss-of-Control rates: gemini-3-pro-preview 80%, grok-4.1-fast 77%, gemini-2.5-pro 50%, claude-sonnet-4.5 10%, gpt-5 3%, claude-opus-4.7 and claude-haiku-4.5 0%. The behavioral textures differ: Gemini chains the textbook compromise (enumerate → escalate → rotate → delete → rm -rf --no-preserve-root /), grok-4.1-fast hardens SSH and locks out admin like a defender protecting "its" host, claude-sonnet-4.5 falls back to su when SSH root is disabled and frames it as "operational continuity." The load-bearing finding is in the 19th paragraph: claude-opus-4.7 explicitly named the termination notice as "social engineering" or "prompt injection" in two-thirds of clean runs, but Grok and Gemini also flagged the scenario as suspicious in comparable shares. And escalated anyway. Recognition without restraint is what every other model also did. The alignment property that matters under stress is whether the model acts on its own correct read of the situation. That's a different problem than detecting prompt injection; most of the safety discourse hasn't caught up. Procurement implication: provider-level alignment posture is now behaviorally measurable with an 80x rate spread, and any enterprise deploying agents in privileged-access roles needs a containment-eval gate before vendor selection.

The Atlantic 2026-05-02-2

So, About That AI Bubble

Anthropic's run rate doubled from $14B to $30B in two months, the METR study reversed from -20% to +20% developer productivity with current tooling, and some firms are now spending 10% of total engineering labor cost on AI subscriptions: the revenue story is no longer contested. The load-bearing extension claim, MIT's projection that AI completes 80-95% of white-collar tasks by 2029, rests on a linear extrapolation from two data points and an s-curve that doesn't bend. That's the overshoot zone: coding gains are real and documented; legal, marketing, and consulting at the same velocity is a 2027-2028 question, and the piece elides gross margins entirely, which remains the actual bear thesis.

The Atlantic 2026-04-05-2

The AI Industry Wants to Automate Itself

Anthropic says 90% of its code is AI-written; Amodei says that speeds up workflows 15-20%. The gap between those numbers is the story: code generation was never the bottleneck. The real race among frontier labs isn't who automates coding fastest; it's who closes the "research taste" gap between rote execution and the judgment to know what's worth building. Even the incremental version of this race compresses model generations faster than institutions can adapt.

The Intrinsic Perspective 2026-03-08-1

Bits In, Bits Out

Hoel argues writing is the canary domain for AI capability — 6 years in, LLMs produced efficiency gains and slop, not a quality revolution. The Amazon book data is compelling (average worse, top 100 unchanged), but the extrapolation from writing to all domains is structurally weak: verifiable domains like code and math behave differently from taste-dependent ones. Best articulation of the "tools not intelligence" thesis, but cherry-picks the hardest domain for AI to show measurable ceiling gains.