Demystifying LLMs with Mechanistic Interpretability Researcher Arthur Conmy

Explore the frontier of AI interpretability with Arthur Conmy's ACDC approach, automating the discovery of sub-circuits within transformers. Sponsored by NetSuite.

Video Description

Join Arthur Conmy and Nathan Labenz for an accessible deep dive into interpretability research. Discover how researchers have isolated sub-circuits within transformers that are responsible for different aspects of AI capability. Arthur introduces ACDC, an approach he co-authored that automates the most time-consuming parts of this research. If you’re looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive

TIMESTAMPS:
(00:00) Episode Preview
(04:40) What attracted Arthur to mechanistic interpretability?
(07:49) LLM information processing: General Understanding vs Stochastic Parrot Paradigm
(14:00) ACDC paper: https://arxiv.org/abs/2304.14997
(14:45) Sponsors: NetSuite | Omneky
(24:30) Putting together data sets
(32:39) How to intervene in an LLM's network activity
(36:00) Setting metrics to evaluate the production of correct completions
(44:20) The future of mechanistic interpretability research
(50:00) Extracting upstream activations in the ACDC project and evaluating their impact on downstream components
(56:00) Anthropic research findings
(01:08:00) 3-step process of the ACDC approach
(01:22:00) Setting a threshold and validation
(01:27:00) Goal of the approach
(01:32:00) Compute requirements
*Correction at 01:33:00: Arthur meant to say "quadratic in nodes"
(01:35:30) Scaling laws for mechanistic interpretability
(01:40:00) Accessibility of this research for casual enthusiasts
(01:46:00) Emergence discourse
(01:56:00) Path to AI safety

LINKS:
Towards Automated Circuit Discovery for Mechanistic Interpretability https://arxiv.org/abs/2304.14997
https://arthurconmy.github.io/

SOCIAL MEDIA:
@labenz (Nathan)
@arthurconmy (Arthur)
@cogrev_podcast

SPONSORS: NetSuite | Omneky

-NetSuite provides financial software for all your business needs. More than thirty-six thousand companies have already upgraded to NetSuite, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you’re looking for an ERP platform, head to NetSuite (http://netsuite.com/cognitive) and defer payments of a FULL NetSuite implementation for six months.

-Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that *actually work*, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

