inference-economics

15 items

Ars Technica 2026-06-02-2

AI costs how much? GitHub Copilot users react to new usage-based pricing system

The June 1 Copilot sticker shock isn't a pricing failure; it's the first honest price the market has seen. Flat-rate AI coding was a venture-subsidized illusion: users burning 5,000 credits on two commits were getting $50 of inference for $0. The real problem isn't that AI coding is expensive but that it's unpredictable (the same tool costs 15 or 5,000 credits depending on a model choice the user didn't know they made), so the winners over the next 18 months won't be whoever is cheapest but whoever makes metered pricing predictable.
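
To make the unpredictability concrete, here is a minimal sketch of the spread between the two credit counts quoted above. The $0.01-per-credit rate is an assumption inferred from the "5,000 credits" and "$50" figures in this summary, not a published GitHub rate, and `credits_to_usd` is a hypothetical helper.

```python
# Assumption: 1 credit = $0.01, inferred from "5,000 credits ... $50" above;
# this is not a published GitHub rate.
USD_PER_CREDIT = 50 / 5000


def credits_to_usd(credits: int) -> float:
    """Convert a credit count to dollars at the assumed rate."""
    return credits * USD_PER_CREDIT


cheap_run, costly_run = 15, 5000  # the two credit counts quoted in the summary
spread = costly_run / cheap_run   # how far apart the same tool can land

print(f"cheap run:  ${credits_to_usd(cheap_run):.2f}")
print(f"costly run: ${credits_to_usd(costly_run):.2f}")
print(f"spread:     {spread:.0f}x")
```

Under that assumed rate, the same tool lands anywhere between fifteen cents and fifty dollars, a spread of roughly 333x, which is the predictability gap the summary points at.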

Dwarkesh Podcast 2026-05-28-1

Reiner Pope on Chip Design from the Bottom Up: Data Movement Dominates Arithmetic 7-to-1, B300's FP4-FP8 Gap as First Crack in NVIDIA's FLOPS Marketing, Splittable Systolic Arrays as Maddox's Architectural Wedge

NVIDIA's B300 datasheet ships FP4 at 3x FP8 speed where precision-scaling theory says 4x: the first public number that undercuts marketed FLOPS as a benchmark. The durable accelerator moat is array geometry plus memory hierarchy, not transistor budget, which is why Maddox, Majestic, Groq, and Cerebras all exist as funded alternatives, each architecture matched to a workload profile the general-purpose chip handles inefficiently. By 2027, enterprise procurement moves from "NVIDIA versus not" to "which architectural bet fits the inference batch size."

isaiprofitable.com 2026-05-26-2

Is AI Profitable Yet? — $1.4T Spend vs $613B Revenue, Attribution as the Unfalsifiable Hinge

A solo-dev dashboard puts cumulative industry AI spend at $1.4T against $613B in direct revenue: 33% recovery for pure labs, 7% for hyperscalers, with NVIDIA the only company in the dataset where AI revenue is actually cash-generative. The methodology excludes indirect revenue (Search ad lift, Copilot bundle stickiness, Bedrock attach) because attribution is genuinely unreliable, which is precisely the part the bull case depends on. The bull and bear cases are both consistent with the same data; in public markets, unfalsifiable narratives don't unwind gradually.
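
As a rough illustration of how much weight the attribution question carries, the sketch below computes the aggregate recovery ratio from the two headline totals and the size of the gap that excluded indirect revenue would have to cover. The function names are hypothetical, and treating cumulative revenue against cumulative spend as a break-even test is a simplifying assumption, not the dashboard's methodology. The 33% and 7% segment figures come from the dashboard's own breakdown and cannot be recomputed from these two totals.

```python
# Headline totals quoted in the summary above, in billions of USD.
CUMULATIVE_SPEND_B = 1400.0   # $1.4T
DIRECT_REVENUE_B = 613.0      # $613B


def recovery_ratio(revenue_b: float, spend_b: float) -> float:
    """Share of cumulative spend matched by direct revenue."""
    return revenue_b / spend_b


def unattributed_gap(revenue_b: float, spend_b: float) -> float:
    """Spend not yet matched by direct revenue (simplified break-even gap)."""
    return max(spend_b - revenue_b, 0.0)


ratio = recovery_ratio(DIRECT_REVENUE_B, CUMULATIVE_SPEND_B)
gap_b = unattributed_gap(DIRECT_REVENUE_B, CUMULATIVE_SPEND_B)

print(f"aggregate recovery: {ratio:.1%}")
print(f"gap left to indirect revenue: ${gap_b:.0f}B")
```

On these totals the aggregate recovery is about 44%, leaving roughly $787B that only indirect, hard-to-attribute revenue could account for.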

Wall Street Journal 2026-05-25-1

Anthropic Q2: $10.9B Revenue, $559M Operating Profit, Compute-to-Revenue 71¢→56¢ — Cost-Structure Asymmetry Bifurcates the AI Bubble Thesis

Anthropic disclosed to investors — and WSJ reviewed the projections — Q2 revenue of $10.9B versus $4.8B in Q1, with $559M operating profit and compute-to-revenue down from 71¢ to 56¢. The 56¢ ratio is the first published frontier-lab data point that materially decouples profitability from Nvidia silicon and Microsoft-circular financing. The bubble call now applies to OpenAI-Microsoft specifically, not the sector — and the reseller-gross accounting, which OpenAI's CRO already disputes, is the post-IPO short-report flashpoint to watch.
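
The disclosed figures can be cross-checked with simple arithmetic. This is a minimal sketch that assumes the 56¢ compute-to-revenue ratio applies to Q2 revenue (the summary does not tie the ratio to a specific quarter); the "other costs" line is a derived residual, not a reported number.

```python
# Figures quoted in the summary above, in billions of USD.
Q1_REVENUE_B = 4.8
Q2_REVENUE_B = 10.9
Q2_OPERATING_PROFIT_B = 0.559
# Assumption: the 56-cent ratio describes Q2.
Q2_COMPUTE_PER_REVENUE_DOLLAR = 0.56

qoq_growth = Q2_REVENUE_B / Q1_REVENUE_B - 1
operating_margin = Q2_OPERATING_PROFIT_B / Q2_REVENUE_B
implied_compute_b = Q2_REVENUE_B * Q2_COMPUTE_PER_REVENUE_DOLLAR
# Residual: whatever is neither compute nor operating profit.
implied_other_costs_b = Q2_REVENUE_B - implied_compute_b - Q2_OPERATING_PROFIT_B

print(f"quarter-over-quarter growth: {qoq_growth:.0%}")
print(f"operating margin:            {operating_margin:.1%}")
print(f"implied compute spend:       ${implied_compute_b:.2f}B")
print(f"implied other costs:         ${implied_other_costs_b:.2f}B")
```

Under that assumption, revenue grew about 127% quarter over quarter, the operating margin is about 5.1%, and compute accounts for roughly $6.1B of Q2 costs.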

Capital Gains (The Diff) 2026-05-06-2

Bubbles Don't Pop All At Once

Hobart's AI bubble piece is the first to get the mechanism right, not just the outcome: inference floors at electricity, not zero, so the fiber collapse cannot replay. The actual risk is thesis drift. When applications cool, capital flees to picks-and-shovels infrastructure, and that infrastructure ends up funded by the same venture dollars that evaporate. Amazon grew 0.2% YoY in Q3 2001; the supposedly safe trade killed people. Oracle's counterparty-stretching debt and neocloud vendor financing suggest the 'datacenter investors are more serious this time' claim is true on average and wrong in the tail.

Futurism 2026-05-04-3

The Economics of Using AI to Churn Out Code Are Looking Worse Than Ever

Anthropic doubling its own published Claude Code cost estimate while GitHub Copilot moves to usage-based billing in the same week is the public marker of subsidy-end, not a verdict on AI coding value. Futurism reads the marker as failure; operators should read it as pricing normalization, with the residual mispricing now sitting in equity narratives that still model lab revenue as if flat-rate inference subsidy persists. The mainstream-press leak is itself the signal: the bear thesis is on a four-to-eight week lag from primary sources, and what arrives at Futurism is what gets repriced next.

Wall Street Journal · 2026-04-14 2026-04-17-w1

We're Using So Much AI That Computing Firepower Is Running Out

Retool's CEO switched from Anthropic to OpenAI this quarter, and the reason wasn't a benchmark: it was Anthropic's 98.95% uptime versus the alternative. Enterprise AI competition has shifted from capability to reliability, the same transition cloud infrastructure went through in 2010. The Anthropic paper this week shows the same pattern one layer up: automated alignment research can generate at $22/hour, but generation without stable evaluation infrastructure is just faster reward-hacking. Davies' vigilance decrement argument carries it to the human layer: even if the infrastructure holds, the person reviewing outputs degrades before the system does. Whoever solves five-nines for the full stack (model plus evaluation plus human judgment) owns enterprise regardless of whose Elo score leads.

Anthropic Blog 2026-04-16-2

Introducing Claude Opus 4.7

Anthropic held headline rates at $5/$25 per million tokens while shipping a tokenizer that inflates inputs by up to 35%, which makes price-per-token comparisons meaningless. The capability jump is real: CursorBench up 12 points, Notion tool errors cut by two-thirds, XBOW vision nearly doubled. The only number that matters now is price-per-useful-output, and that requires workload-specific benchmarking most teams won't run.
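
A small sketch shows why an unchanged headline rate can still raise the bill. It assumes the $5/$25 figures are input/output rates per million tokens (the usual convention, not stated explicitly above), applies the 35% worst case to input tokens only, and uses a hypothetical workload of 2M input and 0.5M output tokens measured with the old tokenizer.

```python
# Headline rates from the summary; input/output split is an assumption.
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 25.0
MAX_INPUT_INFLATION = 0.35  # "up to 35%" worst case, applied to inputs only


def request_cost(input_m: float, output_m: float, input_inflation: float = 0.0) -> float:
    """Cost in USD for token counts given in millions (old-tokenizer units)."""
    billed_input_m = input_m * (1 + input_inflation)
    return billed_input_m * INPUT_USD_PER_M + output_m * OUTPUT_USD_PER_M


# Effective input price per million old-tokenizer tokens at the worst case.
effective_input_rate = INPUT_USD_PER_M * (1 + MAX_INPUT_INFLATION)

# Hypothetical workload: 2M input tokens, 0.5M output tokens.
before = request_cost(2.0, 0.5)
after = request_cost(2.0, 0.5, MAX_INPUT_INFLATION)

print(f"effective input rate: ${effective_input_rate:.2f} per million")
print(f"workload cost: ${before:.2f} -> ${after:.2f} ({after / before - 1:.1%} higher)")
```

The increase depends on the input/output mix, which is why a per-workload measurement, rather than the per-token rate, is the meaningful comparison.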

Wall Street Journal 2026-04-14-1

We're Using So Much AI That Computing Firepower Is Running Out

The compute scarcity thesis just went mainstream: WSJ reports Anthropic's uptime at 98.95% as enterprise clients defect to OpenAI, Blackwell GPUs up 48% in two months, and OpenAI killing Sora to free tokens for coding. The buried signal isn't the shortage itself; it's that Retool's CEO switching providers over reliability, not capability, previews what happens when inference demand compounds faster than infrastructure can respond. The company that solves five-nines for AI inference will own enterprise, regardless of whose model benchmarks best.
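
For scale, the distance between 98.95% uptime and five-nines can be expressed as downtime per year. This is a minimal sketch assuming a 365-day year and uniformly measured availability; `downtime_hours_per_year` is a hypothetical helper.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(uptime_pct: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return (1 - uptime_pct / 100) * HOURS_PER_YEAR


reported_hours = downtime_hours_per_year(98.95)        # figure quoted above
five_nines_minutes = downtime_hours_per_year(99.999) * 60

print(f"98.95% uptime:  {reported_hours:.1f} hours down per year")
print(f"99.999% uptime: {five_nines_minutes:.2f} minutes down per year")
```

That is roughly 92 hours a year against a little over 5 minutes, a gap of about three orders of magnitude.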

UK AI Security Institute 2026-04-13-3

AISI Evaluation of Claude Mythos Preview's Cyber Capabilities

A UK government lab confirmed Mythos can autonomously execute a 32-step corporate network attack end-to-end, outperforming every tested model including GPT-5, with performance still scaling at the 100M token ceiling. The evaluation tested capability against undefended ranges, so what AISI validated is threat potential, not operational impact against a real defended environment. The structural shift is that government evaluation infrastructure is becoming the third-party verification layer for frontier AI claims, sitting between self-reported lab benchmarks and the market the way FDA trials sit between pharma and prescribers.

Financial Times 2026-03-28-3

Memory chip stocks shed $100bn as AI-driven shortage trade unwinds

A single Google Research paper on model compression wiped $100 billion from memory chip stocks in five days. Micron dropped 15%; SanDisk, the best S&P 500 performer in 2025, shed $15 billion in market cap. Morgan Stanley's defense was textbook Jevons: efficiency expands demand. But the market just revealed a new risk class: AI efficiency research as a first-order investment catalyst. The next compression paper is already being written; the question is whether you see it before or after the sell-off.

The Economist 2026-03-21-3

Nvidia's Full-Stack Reinvention: The $65B Portfolio Isn't a Moat, It's a Dependency Map

The Economist's GTC week profile frames Nvidia's expansion into networking, CPUs, models, and sovereign AI as a strategic reinvention; the article never asks the margin question. Nvidia's $216B revenue at ~73% gross margin is a GPU monopoly number: networking, CPU-only servers, and government bundles don't carry that margin. The $65B investment portfolio ($30B in OpenAI alone) is presented as ecosystem lock-in, but OpenAI already runs inference on Azure custom silicon. The portfolio isn't a moat; it's a subsidy that masks true cost-of-compute and unwinds the moment inference gets cheap enough on non-Nvidia hardware. The buried structural risk: three hyperscalers account for over half of receivables, and those same three are the ones building the substitutes.

Wall Street Journal 2026-03-17-2

Can Nvidia's Dominance Survive the Sea Change Under Way in AI Computing?

Nvidia's 73% GPU margins are structurally incompatible with an efficiency-first inference economy, but the displacement story isn't "Cerebras replaces Nvidia." Inference is heterogeneous, and Nvidia is racing to sell all three form factors: GPU for training, CPU for orchestration, LPU for inference throughput. The transition from monopolist-margin chipmaker to platform-margin integrator is the real architectural bet at GTC this year.

Meta 2026-03-14-1

Meta and AMD Partner for 6GW AI Infrastructure Agreement

The "6GW" ceiling is a negotiating lever, not an engineering plan: classic dual-sourcing to pressure Nvidia on price and allocation. Zuckerberg's precise language ("efficient inference compute") tells you AMD wins the commodity inference layer while Nvidia retains training. Two weeks later, Nvidia paid $150M to keep AMD GPUs out of the Stargate expansion; the training/inference hardware split is hardening into separate supply chains.

Bloomberg 2026-03-14-2

Nvidia's $2B Nebius Deal: Vendor Financing or Infrastructure Build?

Nvidia's $2B Nebius investment is the third multi-billion neocloud financing in three months, all inference-focused. The Lucent parallel sharpens: the last time a hardware company financed its own customers at this scale, it ended with billions in write-offs. Nobody's publishing the delta between Nvidia's reported revenue growth and organic, non-financed demand growth.