Reiner Pope on Chip Design from the Bottom Up: Data Movement Dominates Arithmetic 7-to-1, B300's FP4-FP8 Gap as First Crack in NVIDIA's FLOPS Marketing, Splittable Systolic Arrays as Maddox's Architectural Wedge
NVIDIA's B300 datasheet ships FP4 at 3x FP8 speed where precision-scaling theory says 4x — the first public number that doesn't square with marketed FLOPS as a benchmark. The durable accelerator moat is array geometry plus memory hierarchy, not transistor budget: that's why Maddox, Majestic, Groq, and Cerebras all exist as funded alternatives, each architecture matched to a workload profile the general-purpose chip handles inefficiently. By 2027, enterprise procurement moves from NVIDIA versus not to which architectural bet fits the inference batch size.