edge-ai — tisram

The Verge 2026-06-02-1

Microsoft to unveil new AI models and Windows improvements at Build

Build 2026 is a developer-trust-repair operation with a second plot running underneath it. Microsoft is assembling the full OpenAI-independence stack: its first reasoning model trained without distillation, its own image models, a new agent, and a hard push toward local inference on Windows silicon. The "no distillation" detail is the tell — Microsoft wants to prove it can train reasoning without learning from another model's outputs.

# tags

microsoft ai-strategy on-device inference-cost-economics vertical-integration developer-tools competitive-positioning msft copilot suleyman github edge-ai nvidia arm verge model-routing

◆ entities

Microsoft Mustafa Suleyman MAI-Thinking-1 Copilot Microsoft Scout GitHub Nvidia RTX Spark Qualcomm OpenAI Satya Nadella Jensen Huang

→ threads

microsoft-openai-independence consumer-edge-inference

⟷ links

art_20260403_microsoft-mid-class-model-admission-comp art_20260529_engadget-microsoft-s-buttoned-up-copilot 2026-03-27-3 2026-04-07-2

Dwarkesh Podcast 2026-05-28-1

Reiner Pope on Chip Design from the Bottom Up: Data Movement Dominates Arithmetic 7-to-1, B300's FP4-FP8 Gap as First Crack in NVIDIA's FLOPS Marketing, Splittable Systolic Arrays as MatX's Architectural Wedge

NVIDIA's B300 datasheet ships FP4 at 3x FP8 speed where precision-scaling theory says 4x: the first public number that doesn't square with treating marketed FLOPS as a benchmark. The durable accelerator moat is array geometry plus memory hierarchy, not transistor budget. That's why MatX, Majestic, Groq, and Cerebras all exist as funded alternatives, each architecture matched to a workload profile the general-purpose chip handles inefficiently. By 2027, enterprise procurement moves from "NVIDIA versus not" to "which architectural bet fits the inference batch size."

# tags

ai-economics ai-infrastructure semiconductor nvidia tpu inference-economics hardware-fragmentation custom-chips ai-1.0-defensibility dwarkesh semiconductors gpu-infrastructure compute-supply-chain harness-as-moat agentic-ai-viability edge-ai podcast

OpenAI Engineering Blog 2026-05-05-1

OpenAI's WebRTC rearchitecture for low-latency voice

OpenAI's voice rearchitecture moves the competition down a layer; the model is no longer where the gap opens. The published mechanics (split relay plus stateful transceiver, ufrag-encoded routing) and the hire of WebRTC's original architects buy deterministic first-packet routing and a Kubernetes-native UDP surface that competitors stitching together LiveKit and ElevenLabs cannot replicate without comparable POP density. The explicit 1:1 framing also breaks the SFU default for voice agents, leaving specialist delivery vendors competing for a multiparty-shaped TAM.
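
A rough sketch of what "ufrag-encoded routing" could look like, not OpenAI's actual scheme: in ICE, the first STUN connectivity check carries a USERNAME of the form `<receiver ufrag>:<sender ufrag>`, so a server that mints its own ufrag can hide a routing key in it and a stateless relay can pick the stateful transceiver from packet one. The fixed-width hex prefix, the 32-bit transceiver id, and both function names are assumptions made up for illustration.

```python
import secrets
import string

# Assumption: a 32-bit transceiver id, hex-encoded as a fixed-width prefix.
ROUTE_HEX_LEN = 8
# Letters and digits are a subset of the characters ICE allows in a ufrag.
_ALNUM = string.ascii_letters + string.digits


def make_ufrag(transceiver_id: int, nonce_len: int = 8) -> str:
    """Mint a server-side ICE ufrag whose prefix encodes the routing target."""
    if not 0 <= transceiver_id < 16 ** ROUTE_HEX_LEN:
        raise ValueError("transceiver_id out of range")
    nonce = "".join(secrets.choice(_ALNUM) for _ in range(nonce_len))
    return f"{transceiver_id:0{ROUTE_HEX_LEN}x}{nonce}"


def route_from_username(username: str) -> int:
    """Recover the transceiver id from a STUN USERNAME ("server:client")."""
    server_ufrag, sep, _client_ufrag = username.partition(":")
    if not sep or len(server_ufrag) < ROUTE_HEX_LEN:
        raise ValueError("not an ICE USERNAME with an encoded route")
    return int(server_ufrag[:ROUTE_HEX_LEN], 16)


if __name__ == "__main__":
    ufrag = make_ufrag(42)
    # The client echoes the server ufrag back in its first binding request.
    print(route_from_username(f"{ufrag}:clientUfrag"))
```

The point of the design is that the relay needs no session table: the first packet is self-describing, which is where the "deterministic first-packet routing" claim would come from under these assumptions.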

# tags

voice-ai ai-infrastructure openai webrtc ai-1.0-defensibility platformization cloudflare elevenlabs audio-stack vertical-integration competitive-strategy agentic-ai-viability reliability evalrig pickrig ai-economics Realtime-API edge-ai