
The Utility of Interpretability — Emmanuel Amiesen

Show Notes

Emmanuel Amiesen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html), part of a duo of mechanistic interpretability papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html). We recorded the initial conversation a month ago, but held off publishing until the open-source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing

This is a two-part episode: an intro covering the open-source release, then a deeper dive into the paper, with guest host Vibhu Sapra (https://x.com/vibhuuuus) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky). Thanks to Vibhu for making this episode happen!

While the original blog post contained some fantastic guided visualizations (which we discuss at the end of this pod!), with the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph) released this week, you can now explore on your own with Neuronpedia, as we show you in the video version of this pod.

Highlights

This podcast episode delves into the work of Emmanuel Amiesen, lead author of Anthropic's recent papers on circuit tracing and mechanistic interpretability. The discussion is split into two parts: an introduction to the open-source release of the circuit-tracing tools, and a deeper exploration of the research behind them. With guest host Vibhu Sapra and Mochi the MechInterp Pomsky, the episode highlights how these tools let users explore and understand the inner workings of language models like Gemma 2 2B.
01:04
A recent release allows anyone to explore model computation in open-source models.
08:39
Base models are trained for next-token prediction, not chat.
13:01
Notebooks can be run on Google Colab without an expensive GPU.
19:14
Using a model to analyze words related to pomskies and trace model outputs.
24:19
Emmanuel Amiesen shares his personal journey into MechInterp
28:23
Interpretability is easier to transition into as it doesn't require large-scale compute.
34:30
Language models use superposition to pack more information than vision models.
37:05
Features are represented as directions in multi-dimensional space.
42:02
Golden Gate Claude was created by amplifying a specific feature's activation in the model.
55:17
Induction heads are pairs of attention heads that enable text repetition and smart copying in language models.
59:25
Changing an intermediate feature proves reasoning over memorization.
1:17:40
Models plan well in advance and use backward planning to influence sentence structure.
1:33:01
Parallel circuits in models can lead to conflicting interpretations.
1:40:17
Publishing interpretability research involves balancing benefits and risks.
1:51:37
There are many ideas to try on smaller models in mechanistic interpretability.
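The "features are directions" idea from the highlights above can be sketched in a few lines: a feature corresponds to a unit vector in the model's activation space, and how strongly it fires on an input is the projection of that input's activation vector onto the direction. This is a toy illustration with made-up vectors, not Anthropic's actual tooling:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy hidden dimension

# A hypothetical "feature" is a unit-norm direction in activation space.
feature_dir = rng.normal(size=d_model)
feature_dir /= np.linalg.norm(feature_dir)

# A made-up model activation (e.g. one token's residual-stream vector),
# constructed to mostly point along the feature direction plus some noise.
activation = 2.5 * feature_dir + 0.1 * rng.normal(size=d_model)

# The feature's activation strength is the dot product with its direction.
strength = float(activation @ feature_dir)
print(strength)  # close to 2.5, since we built the activation that way
```

Superposition, discussed around 34:30, is the observation that models pack more such directions into the space than it has dimensions, so the directions cannot all be orthogonal and features interfere slightly with each other.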

Chapters

Intro & Guest Introductions
00:00
Anthropic's Circuit Tracing Release
01:00
Exploring Circuit Tracing Tools & Demos
06:11
Model Behaviors and User Experiments
13:01
Behind the Research: Team and Community
17:02
Main Episode Start: Mech Interp Backgrounds
24:19
Getting Into Mech Interp Research
25:56
History and Foundations of Mech Interp
31:52
Core Concepts: Superposition & Features
37:05
Applications & Interventions in Models
39:54
Challenges & Open Questions in Interpretability
45:59
Understanding Model Mechanisms: Circuits & Reasoning
57:15
Model Planning, Reasoning, and Attribution Graphs
1:04:24
Faithfulness, Deception, and Parallel Circuits
1:30:52
Publishing Risks, Open Research, and Visualization
1:40:16
Barriers, Vision, and Call to Action
1:49:33

Transcript

swyx: All right, we are actually going to record this as an intro to the main episode. But here we have my trusty co-host, guest host, I guess, Vibhu, as well as Emmanuel from Anthropic. We're going to talk about the circuit tracing stuff and all the interp...