rlft

2 items

WIRED 2026-05-10-2

I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI

Mercor's 300 employees plus tens of thousands of contractors are structurally identical to Medvi's 2 employees plus outsourced clinical labor: same shape, different industry. The frontier labs' "human alignment" premium is a labor-supply-chain bet, and procurement due diligence (DD) that asks about training-data provenance but not evaluation-labor provenance is asking 2024's question. The atomization Fowler describes is the durable feature: a profession unbundled into rate-this, classify-that, evaluate-that, with the person erased and the signal extracted.

a16z Podcast (originally Cheeky Pint) 2026-04-17-3

From Models to Mobility: Waymo Architecture at Scale — Dolgov on the Teacher/Simulator/Critic Triad and the End-to-End Debate Resolution

Waymo's architecture resolves the end-to-end debate: Dolgov states that a pure pixels-to-trajectories model drives "pretty darn well" in the nominal case but is "orders of magnitude away" from what full autonomy requires. The stack behind 500K rides per week is one off-board foundation model fanning out into three specialized teachers (Driver, Simulator, Critic), each distilled into smaller in-car students; RLFT against the critic is the physical-AI analog to RLHF. Enterprise teams shipping pure-LLM agents without the simulator and critic scaffolding are replaying Waymo's 2017, not its 2026: evaluation infrastructure, not model choice, is the reliability gate.
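
To make the "evaluation infrastructure is the reliability gate" point concrete, here is a minimal Python sketch of a simulator-plus-critic release gate. It is a toy under stated assumptions, not Waymo's system: the names `Scenario`, `simulate`, `critic`, and `passes_gate`, the scoring rule, and the 0.9 threshold are all hypothetical and chosen only for illustration. The gate scores a policy on its worst scenario rather than its average, which is where a policy that looks fine in the nominal case gets caught.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """Hypothetical test scenario: a name and the distance to an obstacle."""
    name: str
    obstacle_distance_m: float


# A policy maps a scenario to a chosen speed (m/s). Hypothetical interface.
Policy = Callable[[Scenario], float]


def simulate(policy: Policy, scenario: Scenario) -> float:
    """Toy simulator: run the policy on one scenario and return its chosen speed."""
    return policy(scenario)


def critic(scenario: Scenario, speed: float) -> float:
    """Toy critic: score in [0, 1]; 1.0 if the speed is safe for the distance."""
    safe_speed = scenario.obstacle_distance_m / 2.0  # assumed safety rule
    if safe_speed <= 0:
        return 1.0 if speed <= 0 else 0.0
    if speed <= safe_speed:
        return 1.0
    # Penalize linearly by how far the speed exceeds the safe speed.
    return max(0.0, 1.0 - (speed - safe_speed) / safe_speed)


def passes_gate(policy: Policy, scenarios: List[Scenario], threshold: float = 0.9) -> bool:
    """Release gate: the worst-case critic score must clear the threshold."""
    scores = [critic(s, simulate(policy, s)) for s in scenarios]
    return min(scores) >= threshold


SCENARIOS = [
    Scenario("open road", obstacle_distance_m=40.0),   # nominal case
    Scenario("sudden stop", obstacle_distance_m=4.0),  # long-tail case
]


def reckless(scenario: Scenario) -> float:
    """Always drives at 10 m/s: fine on the open road, unsafe near obstacles."""
    return 10.0


def cautious(scenario: Scenario) -> float:
    """Caps speed at half the obstacle distance, up to 10 m/s."""
    return min(10.0, scenario.obstacle_distance_m / 2.0)


if __name__ == "__main__":
    print("reckless passes:", passes_gate(reckless, SCENARIOS))
    print("cautious passes:", passes_gate(cautious, SCENARIOS))
```

In this sketch the `reckless` policy scores perfectly on the nominal scenario and still fails the gate, because the gate is defined by the scenario suite and the critic rather than by which policy is plugged in.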