Columbia CS Professor: Why LLMs Can’t Discover New Science
The a16z Show
2025/10/13
The rapid evolution of large language models has sparked intense debate about their potential to transcend mimicry and contribute meaningfully to scientific discovery. In this conversation, a leading computer scientist unpacks the inner workings of LLMs, challenging assumptions about their capabilities and revealing why they may be fundamentally constrained in generating true innovation.
Current large language models excel at pattern recognition and contextual reasoning through mechanisms like chain-of-thought, which reduce uncertainty by structuring predictions geometrically. Yet despite their fluency, they cannot generate knowledge beyond their training data, no matter how sophisticated the architecture. The speaker argues that these systems interpolate rather than invent: they perform Bayesian inference over fixed distributions without creating new axioms or theories, and while in-context learning gives the illusion of adaptation, it lacks genuine abstraction or recursive self-improvement.

Even with multimodal inputs or increased scale, scientific breakthroughs require architectural innovation, not just refinement. Researchers are hitting diminishing returns, suggesting we are approaching a plateau. True AGI would demand systems capable of autonomous discovery, such as deriving relativity from incomplete data, a feat no LLM can achieve today. The path forward may lie in rethinking foundational models, informed by theoretical work on matrix abstraction and cognitive manifolds.
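The claim that LLMs "interpolate rather than invent" can be illustrated with a toy sketch (my own, not the speaker's example): a bigram sampler whose next-token distribution is fixed entirely by training counts, so it can only recombine transitions it has already seen, never emit a new one.

```python
import random
from collections import Counter, defaultdict

# Toy bigram "language model": next-token probabilities come entirely
# from counts over the training corpus (a fixed distribution).
corpus = "energy equals mass times speed of light squared".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def sample_next(token, rng):
    counts = transitions[token]
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return rng.choices(tokens, weights)[0]

rng = random.Random(0)
out = ["energy"]
while out[-1] in transitions:          # stop at a token with no successors
    out.append(sample_next(out[-1], rng))

# Every emitted bigram already appears in the corpus: the sampler
# recombines observed transitions but cannot produce an unseen one.
seen = set(zip(corpus, corpus[1:]))
assert all(pair in seen for pair in zip(out, out[1:]))
```

Scaling the context window or the parameter count changes how finely such a model carves up its training distribution, but not the fact that samples are drawn from it, which is the crux of the interpolation argument.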
02:50
LLMs create Bayesian manifolds where low entropy means high confidence.
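The "low entropy means high confidence" framing can be made concrete with standard definitions (the logits below are hypothetical, not from the episode): a next-token distribution concentrated on one token has low Shannon entropy, while a spread-out distribution has high entropy.

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max logit.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Shannon entropy in bits; lower = more concentrated = more "confident".
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = softmax([8.0, 1.0, 0.5, 0.2])   # one token dominates
uncertain = softmax([1.0, 1.0, 1.0, 1.0])   # uniform over 4 tokens

assert entropy(confident) < entropy(uncertain)
# A uniform distribution over 4 tokens has entropy exactly 2 bits.
```

In this picture, techniques like chain-of-thought can be read as steering the model toward lower-entropy (more confident) next-token distributions at each step.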
12:45
RAC was accidentally invented while trying to fix StatsGuru and has been in production since 2021.
26:51
LLMs can only generate what they've been trained on, not truly self-improve.
30:40
AGI must create new science, not just interpolate training data.
44:12
An LLM creating a large software project without supervision would convince me we're close to AGI.