The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research

Exploring TinyStories, a small natural language dataset for modest compute budgets, and its impact on language model performance and interpretability.


Nathan Labenz sits down with Ronen Eldan and Yuanzhi Li of Microsoft Research to discuss TinyStories, the small natural language dataset they created. TinyStories is designed to reflect the full richness of natural language while remaining small enough to support research with modest compute budgets. Using this dataset, they explored aspects of language model performance, behavior, and mechanism by training a series of models ranging in size from just 1 million to a maximum of 33 million parameters, which is still only about 2% the scale of GPT-2. In this conversation, Nathan, Ronen, and Yuanzhi touch on LM reasoning, emergence, interpretability, and how much of this understanding can be extended to LLMs.

TinyStories paper:

(00:00) Episode Preview
(07:12) The inspiration for the TinyStories project
(15:44) Creating the TinyStories dataset
(21:27) GPT-4 vs GPT-3.5
(24:13) Did the TinyStories team try any other versions of GPT-4?
(29:23) Curriculum models and weirder curriculums
(35:34) What does reasoning mean?
(46:27) What does emergence mean?
(01:01:44) The curriculum development space
(01:11:40) The similarities between models and human development
(01:20:12) Fewer layers vs. more layers
(01:29:22) Attention heads
(01:33:40) Semantic attention head
(01:36:54) Neuron technique used in developing the TinyStories model
(01:52:20) Interpretability work that inspires Ronen and Yuanzhi

@EldanRonen (Ronen)
@labenz (Nathan)
@eriktorenberg (Erik)

