The Utility of Interpretability — Emmanuel Amiesen
Latent Space: The AI Engineer Podcast
2025/06/06
Shownote
Emmanuel Amiesen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html ), which is part of a duo of MechInterp papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html ).
We recorded the initial conversation a month ago, but then held off publishing until the open source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing
This is a two-part episode: an intro covering the open source release, then a deeper dive into the paper, with guest host Vibhu Sapra (https://x.com/vibhuuuus ) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky ). Thanks to Vibhu for making this episode happen!
While the original blogpost contained some fantastic guided visualizations (which we discuss at the end of this pod!), with the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph ) released this week, you can now explore on your own with Neuronpedia, as we show you in the video version of this pod.
Highlights
This podcast episode covers the work of Emmanuel Amiesen, lead author of Anthropic's recent papers on Circuit Tracing and Mechanistic Interpretability. The discussion is split into two parts: an introduction to the open-source release of circuit tracing tools and a deeper exploration of the research behind them. With guest host Vibhu Sapra and Mochi the MechInterp Pomsky, the episode shows how these tools let users explore the inner workings of language models like Gemma 2 2B.
Chapters
00:00 Intro & Guest Introductions
01:00 Anthropic's Circuit Tracing Release
06:11 Exploring Circuit Tracing Tools & Demos
13:01 Model Behaviors and User Experiments
17:02 Behind the Research: Team and Community
24:19 Main Episode Start: Mech Interp Backgrounds
25:56 Getting Into Mech Interp Research
31:52 History and Foundations of Mech Interp
37:05 Core Concepts: Superposition & Features
39:54 Applications & Interventions in Models
45:59 Challenges & Open Questions in Interpretability
57:15 Understanding Model Mechanisms: Circuits & Reasoning
1:04:24 Model Planning, Reasoning, and Attribution Graphs
1:30:52 Faithfulness, Deception, and Parallel Circuits
1:40:16 Publishing Risks, Open Research, and Visualization
1:49:33 Barriers, Vision, and Call to Action
Transcript
swyx: All right, we are actually going to record this as an intro to the main episode. But here we have my trusty co-host, guest host, I guess, Vibhu, as well as Emmanuel from Anthropic. We're going to talk about the circuit tracing stuff and all the interp...