The Data Factory: Inside the $100B Race for Post-Training Supremacy, with Labelbox CEO Manu Sharma

Manu Sharma, founder and CEO of Labelbox, explains how frontier AI training data has evolved far beyond simple labeling to sophisticated reinforcement learning environments where domain experts create "gyms" for models to develop complex skills. With every Western frontier lab now spending over a billion dollars annually on training data, the conversation traces the shift from supervised learning to reinforcement learning from verifiable rewards, particularly for coding, mathematical reasoning, and computer use. Sharma reveals how Labelbox operates as a vertically integrated data factory, conducting over 2,000 AI-powered expert interviews daily and paying top specialists more than $250,000 annually. The discussion provides essential insights into the red-hot training data market that's reshaping AI development following major deals like Meta's roughly $15B investment in Scale AI.
Sponsors:
Oracle Cloud Infrastructure: Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive
The AGNTCY: The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at https://agntcy.org
NetSuite by Oracle: NetSuite by Oracle is the AI-powered business management suite trusted by over 42,000 businesses, offering a unified platform for accounting, financial management, inventory, and HR. Gain total visibility and control to make quick decisions and automate everyday tasks—download the free ebook, Navigating Global Trade: Three Insights for Leaders, at https://netsuite.com/cognitive
PRODUCED BY:
https://aipodcast.ing
CHAPTERS:
(00:00) About the Episode
(03:23) Introduction and Industry Chaos
(04:25) AGI Race Components
(11:09) Post-Training Evolution (Part 1)
(11:15) Sponsors: Oracle Cloud Infrastructure | The AGNTCY
(13:15) Post-Training Evolution (Part 2)
(15:28) Compute Budget Shifts
(23:31) Human Data's Role
(25:35) Expert Data Importance
(31:52) Training Paradigm Shift (Part 1)
(31:57) Sponsor: NetSuite by Oracle
(33:21) Training Paradigm Shift (Part 2)
(34:41) Solution Evaluation Framework
(36:48) Long Context Challenges
(38:37) Testing Long Context
(42:17) Data Collection Evolution
(43:41) Fine-Tuning vs Context
(49:55) Context Engineering Dominance
(56:39) Popular Fine-Tuning Models
(57:43) Context Engineering Coaching
(01:03:29) Creative vs Automated
(01:06:32) Frontier vs Enterprise
(01:12:54) Enterprise Implementation Support
(01:15:29) Sovereign AI Strategy
(01:24:24) Computer Use Data
(01:28:08) Generalist Data Contributors
(01:29:05) AI Interviewing Lessons
(01:34:02) Industry Future Outlook
(01:38:30) AGI vs Superintelligence
(01:41:02) Closing Thoughts
(01:41:15) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
YouTube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...