
Information Theory for Language Models: Jack Morris

In this episode, we sit down with Jack Morris, a PhD student at Cornell Tech whose research focuses on the information-theoretic foundations of large language models. Unlike many of his peers who focus on trending topics like AI agents or benchmarking, Jack delves into the deeper mechanics of how models store and process information. His work spans embeddings, model inversion, and the surprising role of datasets in driving AI innovation. This conversation offers a unique window into some of the most underappreciated yet critical aspects of modern AI research.
Jack Morris discusses his research journey and key contributions to understanding large language models from an information-theoretic perspective. He explores how information is stored and compressed within models, highlighting that GPT-style architectures store around 3.6 bits per parameter. The conversation also covers embedding inversion, where text can be reconstructed from vector representations with high accuracy, revealing the surprising richness of embedded information. Jack introduces the idea that major AI breakthroughs often stem from new datasets rather than novel methods, citing examples like BERT and transformers. He also touches on model universality, drawing parallels to computer vision techniques like CycleGAN, and emphasizes the importance of adapting to evolving AI education and engineering practices.
10:54
Mojo, developed by Chris Lattner, is positioned as a faster alternative to CUDA.
22:25
Training a model on a text and measuring the resulting drop in the text's code length quantifies how much information the model stores (see the capacity sketch after this list).
27:49
Iterative refinement of embedding inversion ultimately recovered the original text with 97% accuracy (the inversion loop is sketched after this list).
47:34
Gemma 3n enables stackable and swappable capabilities in language models.
53:04
GPT-style models store around 3.6–3.9 bits of information per parameter (a back-of-envelope capacity calculation follows this list).
1:06:49
In AI, paradigm shifts often come from unlocking new data to train on, not just from new methods.
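
The capacity measurement at 22:25 can be made concrete: train or fine-tune a model on a fixed text, then treat the drop in the text's code length (its negative log2-likelihood) relative to an untrained baseline as the number of bits the model has absorbed. The sketch below is a minimal illustration of that idea, not Jack's exact protocol; the gpt2 checkpoints and the local fine-tuned path are assumptions.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def total_bits(model, tokenizer, text: str) -> float:
    """Code length of `text` under `model`, in bits (negative log2-likelihood)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # `loss` is the mean cross-entropy per predicted token, in nats.
        loss = model(ids, labels=ids).loss.item()
    n_predicted = ids.shape[1] - 1
    return loss * n_predicted / math.log(2)

# Assumed setup: a baseline checkpoint and the same model after training on `sample.txt`.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
baseline = AutoModelForCausalLM.from_pretrained("gpt2")
trained = AutoModelForCausalLM.from_pretrained("./gpt2-trained-on-sample")  # hypothetical local path

sample = open("sample.txt").read()  # the text the trained model saw
bits_stored = total_bits(baseline, tokenizer, sample) - total_bits(trained, tokenizer, sample)
n_params = sum(p.numel() for p in trained.parameters())
print(f"~{bits_stored:.0f} bits stored, {bits_stored / n_params:.3f} bits per parameter")
```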
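
The 97% figure at 27:49 comes from closing a feedback loop: guess a text, re-embed the guess, compare it with the target embedding, and let a trained corrector propose a better guess. Below is a minimal sketch of that loop; `embed_fn` and the `corrector_model` with its `initial_guess`/`refine` methods are hypothetical stand-ins for a real inversion model, not an actual API.

```python
import torch

def invert_embedding(target_emb, embed_fn, corrector_model, max_rounds=50, tol=1e-4):
    """Recover text whose embedding is close to `target_emb` by iterative correction."""
    # The first hypothesis is generated from the target embedding alone.
    hypothesis = corrector_model.initial_guess(target_emb)
    for _ in range(max_rounds):
        hyp_emb = embed_fn(hypothesis)                   # re-embed the current hypothesis
        if torch.norm(target_emb - hyp_emb) < tol:
            break                                        # close enough: stop refining
        # Propose a new hypothesis conditioned on the old text and both embeddings.
        hypothesis = corrector_model.refine(hypothesis, hyp_emb, target_emb)
    return hypothesis
```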
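
Finally, the 3.6–3.9 bits-per-parameter estimate at 53:04 turns into a quick back-of-envelope capacity figure; the parameter count below is chosen purely for illustration.

```python
BITS_PER_PARAM = 3.6           # lower end of the 3.6–3.9 estimate
n_params = 1_000_000_000       # illustrative 1B-parameter model

capacity_bits = BITS_PER_PARAM * n_params
capacity_megabytes = capacity_bits / 8 / 1e6
print(f"~{capacity_bits:.2e} bits, about {capacity_megabytes:.0f} MB of raw storage capacity")
```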