inference

2 items

Bloomberg 2026-04-25-2

Meta Strikes Multibillion-Dollar Deal to Use Amazon Chips for AI Projects

Meta is renting hundreds of thousands of Graviton chips from AWS for multiple billions of dollars; Graviton is a CPU, not an accelerator. The consensus measures AI capex by GPU count, but at production scale the CPU layer, which handles feature serving, retrieval, ranking, and orchestration, runs at roughly 5-10x the accelerator unit count. This deal is the first explicit public signal reframing general-purpose CPU compute as a distinct AI infrastructure category, and it implies the total AI infrastructure commitment envelope is materially larger than accelerator-only framings capture.
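The undercounting claim is simple arithmetic, sketched below with hypothetical fleet sizes (the 5-10x ratio is from the note; the 100,000-accelerator figure is an illustrative assumption, not from the article):

```python
# Back-of-envelope: if the CPU layer runs at 5-10x the accelerator unit
# count, an accelerator-only framing undercounts total chip units by a
# wide margin. All fleet sizes here are hypothetical.

def total_units(accelerators: int, cpu_ratio: float) -> int:
    """Total chip units when CPU count scales as a multiple of accelerators."""
    return accelerators + int(accelerators * cpu_ratio)

accelerators = 100_000  # hypothetical accelerator fleet
for ratio in (5, 10):
    total = total_units(accelerators, ratio)
    print(f"ratio {ratio}x -> {total:,} total units "
          f"({total / accelerators:.0f}x the accelerator-only count)")
```

At the low end of the stated ratio the accelerator-only count misses five of every six units deployed.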

New York Times 2026-03-17-3

Nvidia Built the A.I. Era. Now It Has to Defend It.

Nvidia is the first major chipmaker to unbundle training from inference at the architecture level, pairing its GPUs with Groq's inference-optimized LPUs in a $20B licensing deal. The supply-chain math is as interesting as the product: Groq's chips are fabbed at Samsung and carry no HBM dependency, sidestepping both TSMC allocation constraints and memory-chip shortages. If inference grows to 70-80% of total AI compute spend, the companies building chip-agnostic inference routing will capture a new middleware layer that doesn't exist yet.
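The 70-80% scenario above can be made concrete with a toy split of a hypothetical total compute budget (the share range is from the note; the $100B total is an illustrative assumption):

```python
# Spend split implied by the 70-80% inference-share scenario, applied to
# a hypothetical total AI compute budget. Figures are illustrative only.

def spend_split(total_spend: float, inference_share: float) -> tuple[float, float]:
    """Return (inference, training) spend for a given inference share."""
    inference = total_spend * inference_share
    return inference, total_spend - inference

total = 100.0  # hypothetical total compute spend, $B
for share in (0.70, 0.80):
    inf, train = spend_split(total, share)
    print(f"{share:.0%} inference -> ${inf:.0f}B inference vs ${train:.0f}B training")
```

At those shares, inference spend runs 2.3-4x training spend, which is the scale argument for a dedicated routing/middleware layer.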