inference-economics
4 items · chronological order
Nvidia's Full-Stack Reinvention: The $65B Portfolio Isn't a Moat, It's a Dependency Map
The Economist's GTC-week profile frames Nvidia's expansion into networking, CPUs, models, and sovereign AI as a strategic reinvention; the article never asks the margin question. Nvidia's $216B revenue at ~73% gross margin is a GPU monopoly number: networking, CPU-only servers, and government bundles don't carry that margin. The $65B investment portfolio ($30B in OpenAI alone) is presented as ecosystem lock-in, but OpenAI already runs inference on Azure custom silicon. The portfolio isn't a moat; it's a subsidy that masks the true cost of compute and unwinds the moment inference gets cheap enough on non-Nvidia hardware. The buried structural risk: three hyperscalers account for over half of receivables, and those same three are building the substitutes.
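For scale, a back-of-envelope blended-margin sketch: only the ~73% GPU margin comes from the item above; the revenue mixes and non-GPU margins below are hypothetical placeholders, not reported figures.

```python
# Hypothetical illustration of gross-margin dilution as revenue mix shifts
# away from GPUs. Only the ~73% GPU margin is from the item above; the
# segment mixes and non-GPU margins are assumptions for illustration.

segments_today = {          # segment: (revenue share, gross margin)
    "gpu":        (0.90, 0.73),
    "networking": (0.07, 0.55),   # assumed
    "cpu_other":  (0.03, 0.40),   # assumed
}

segments_full_stack = {     # hypothetical mix after the "reinvention"
    "gpu":        (0.65, 0.73),
    "networking": (0.20, 0.55),   # assumed
    "cpu_other":  (0.15, 0.40),   # assumed
}

def blended_margin(segments):
    """Revenue-weighted gross margin across segments."""
    return sum(share * margin for share, margin in segments.values())

print(f"blended margin today:      {blended_margin(segments_today):.1%}")
print(f"blended margin full-stack: {blended_margin(segments_full_stack):.1%}")
# today ≈ 70.8%, full-stack ≈ 64.5%: the "GPU monopoly number" erodes as
# lower-margin segments grow, even if total revenue keeps climbing.
```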
Memory chip stocks shed $100bn as AI-driven shortage trade unwinds
A single Google Research paper on model compression wiped $100 billion from memory chip stocks in five days. Micron dropped 15%; SanDisk, the best S&P 500 performer in 2025, shed $15 billion in market cap. Morgan Stanley's defense was textbook Jevons: efficiency expands demand. But the market just revealed a new risk class: AI efficiency research as a first-order investment catalyst. The next compression paper is already being written; the question is whether you see it before or after the sell-off.
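The tug-of-war the market just priced can be sketched as a one-line elasticity toy; every number below is hypothetical, and the `net_memory_demand` function exists only to show which way the trade-off tips under different demand responses.

```python
# Toy model of the Jevons defense vs. the compression scare.
# All numbers are hypothetical; only the shape of the trade-off matters.

def net_memory_demand(baseline_gb: float,
                      compression_factor: float,
                      demand_elasticity: float) -> float:
    """Memory demand after a compression breakthrough.

    compression_factor: fraction of memory each deployment needs afterwards
                        (0.5 means models need half the memory).
    demand_elasticity:  how strongly deployment volume expands as effective
                        cost falls (1.0 means volume scales 1/compression).
    """
    volume_multiplier = (1.0 / compression_factor) ** demand_elasticity
    return baseline_gb * compression_factor * volume_multiplier

baseline = 100.0  # arbitrary units of installed memory demand

for elasticity in (0.5, 1.0, 1.5):
    demand = net_memory_demand(baseline, compression_factor=0.5,
                               demand_elasticity=elasticity)
    print(f"elasticity {elasticity}: demand {demand:.0f}")
# elasticity 0.5 ->  71  (efficiency wins: demand falls, the sell-off case)
# elasticity 1.0 -> 100  (a wash)
# elasticity 1.5 -> 141  (Jevons wins: demand grows, the Morgan Stanley case)
```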
We're Using So Much AI That Computing Firepower Is Running Out
The compute scarcity thesis just went mainstream: WSJ reports Anthropic's 98.95% uptime as enterprise clients defect to OpenAI, Blackwell GPUs up 48% in two months, and OpenAI killed Sora to free tokens for coding. The buried signal isn't the shortage itself; it's that Retool's CEO switched providers over reliability, not capability, a preview of what happens when inference demand compounds faster than infrastructure can respond. The company that solves five-nines for AI inference will own enterprise, regardless of whose model benchmarks best.
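The reliability gap is easier to feel as downtime. A minimal calculation, taking only the 98.95% figure from the item and standard availability arithmetic for the rest:

```python
# Downtime implied by an availability figure, over a 30-day month.

HOURS_PER_MONTH = 30 * 24

def monthly_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per 30-day month at a given availability."""
    return (1.0 - availability) * HOURS_PER_MONTH * 60

for label, availability in [("98.95% (quoted)", 0.9895),
                            ("three nines", 0.999),
                            ("five nines", 0.99999)]:
    print(f"{label:>16}: {monthly_downtime_minutes(availability):8.1f} min/month")
# 98.95%      ≈ 453.6 min (~7.6 hours) of downtime a month
# three nines ≈  43.2 min
# five nines  ≈   0.4 min (~26 seconds), the bar "owning enterprise" implies
```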
We're Using So Much AI That Computing Firepower Is Running Out
Retool's CEO switched from Anthropic to OpenAI this quarter, and the reason wasn't a benchmark: it was 98.95% uptime versus the alternative. Enterprise AI competition has shifted from capability to reliability, the same transition cloud infrastructure went through around 2010. The Anthropic paper this week shows the same pattern one layer up: automated alignment research can generate output at $22/hour, but generation without stable evaluation infrastructure is just faster reward-hacking. Davies' vigilance decrement argument extends the point to the human layer: even if the infrastructure holds, the person reviewing outputs degrades before the system does. Whoever solves five-nines for the full stack, model plus evaluation plus human judgment, owns enterprise regardless of whose Elo score leads.
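If model, evaluation, and human review sit in series, their availabilities multiply, which is why the full-stack bar is so brutal. A quick sketch with hypothetical per-layer numbers:

```python
# End-to-end availability of a serial pipeline: model -> evaluation -> human review.
# Per-layer figures are hypothetical; the point is that they multiply.

from math import prod

layers = {
    "model inference":  0.9990,   # assumed
    "evaluation infra": 0.9990,   # assumed
    "human review":     0.9950,   # assumed; vigilance decrement makes this the weak link
}

end_to_end = prod(layers.values())
print(f"end-to-end availability: {end_to_end:.4%}")   # ≈ 99.3011%

# To hit five nines (99.999%) end to end, three equal layers would each need:
per_layer_target = 0.99999 ** (1 / 3)
print(f"per-layer target: {per_layer_target:.6%}")    # ≈ 99.999667%
# Roughly 9 seconds of slack per layer per month, including the human one.
```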