Notion’s Token Town: 5 Rebuilds, 100+ Tools, MCP vs CLIs and the Software Factory Future — Simon Last & Sarah Sachs of Notion
This episode features Notion's Sarah Sachs and Simon Last diving deep into the multi-year journey behind Custom Agents—a foundational shift in how productivity software interfaces with AI. They unpack the technical, organizational, and philosophical decisions that shaped one of the most ambitious agent-native systems in enterprise software today.
Notion spent over three years iterating on Custom Agents, rebuilding them four or five times, due to early limitations like weak models, short context windows, and unreliable tool-calling. A major inflection came with stronger reasoning models (e.g., Sonnet 3.6/3.7), enabling robust background execution, fine-grained permissions, and Slack-integrated workflows.

Their approach centers on the 'Agent Lab' thesis: not just wrapping models, but deeply understanding collaboration patterns to build product systems around frontier capabilities. Engineering leadership emphasizes user journeys over novelty, low-ego teams comfortable deleting code, and 'demos over memos': shipping internally first for rapid feedback.

Notion organizes AI across core infrastructure, packaging teams, and a company-wide mandate that every product surface must serve both humans and agents. Their eval philosophy includes 'Frontier/Headroom' tests (designed to fail ~70% of the time) and a dedicated Model Behavior Engineer role focused on capability analysis, not just engineering.

Architecturally, they evolved from XML and JavaScript agents to Markdown- and SQL-like abstractions, prioritizing progressive tool disclosure and deterministic CLI execution over opaque MCP where possible. Pricing uses credits to abstract across tokens, models, web search, and sandboxing, guided by an 'Auto' system that matches models to tasks. Crucially, Notion sees itself not as a hardware or foundation model builder, but as the system of record where collaboration data lives, powering agents, search, and workflows like Meeting Notes, which has become a key growth loop by turning conversations into structured, actionable knowledge.
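As an entirely hypothetical sketch of the progressive tool disclosure idea mentioned above: rather than serializing 100+ tool schemas into the prompt, the agent sees only a token-light index and loads full schemas on demand. All names here are illustrative, not Notion's implementation.

```typescript
// Minimal sketch of progressive tool disclosure (hypothetical names,
// not Notion's actual implementation). The agent starts with a single
// search meta-tool; full schemas are disclosed only when requested.

interface ToolDef {
  name: string;
  description: string;
  schema: object; // JSON Schema for the tool's arguments
  run: (args: unknown) => Promise<string>;
}

class ToolRegistry {
  private tools = new Map<string, ToolDef>();
  private loaded = new Set<string>();

  register(tool: ToolDef) {
    this.tools.set(tool.name, tool);
  }

  // Cheap, token-light listing: names plus one-line descriptions only.
  search(query: string): string[] {
    return [...this.tools.values()]
      .filter(t => t.description.toLowerCase().includes(query.toLowerCase()))
      .map(t => `${t.name}: ${t.description}`);
  }

  // The full schema is disclosed only after the model asks for the
  // tool, so unused tools never cost context tokens.
  load(name: string): ToolDef | undefined {
    const tool = this.tools.get(name);
    if (tool) this.loaded.add(name);
    return tool;
  }

  // Only loaded tools are serialized into the model's context.
  activeSchemas(): object[] {
    return [...this.loaded].map(n => ({
      name: n,
      schema: this.tools.get(n)!.schema,
    }));
  }
}
```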
00:00
MCP is good for narrow, lightweight agents with tight permissions
00:39
Simon and Sarah from Notion join the Latent Space podcast
00:52
Early 2022 attempts failed because models were too dumb and had short context windows
04:32
Notion's two crucial skills for frontier capabilities: avoiding swimming upstream and anticipating product development
11:28
Notion has rebuilt its Custom Agents 3–4 times
14:48
Jimmy’s image generation project on the database collections team became a full-fledged feature thanks to low-ego leadership and rapid iteration
15:43
The majority of traffic will come from agents in the future
19:13
Every team owns their own evals, many integrated into CI or run nightly
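For a sense of what "evals integrated into CI" can look like, here is a minimal hypothetical harness: each case is scored, and the job fails only when the pass rate drops below a team-chosen floor. Nothing here is Notion's actual tooling.

```typescript
// Hypothetical per-team eval suite wired into CI. A nightly or
// headroom run could call runSuite with floor = 0 to report the
// rate without failing the job.

interface EvalCase {
  name: string;
  run: () => Promise<boolean>; // true = the agent's output passed
}

async function runSuite(cases: EvalCase[], floor: number): Promise<void> {
  let passed = 0;
  for (const c of cases) {
    const ok = await c.run().catch(() => false); // a crash counts as a fail
    if (ok) passed++;
    console.log(`${ok ? "PASS" : "FAIL"} ${c.name}`);
  }
  const rate = passed / cases.length;
  console.log(`pass rate: ${(rate * 100).toFixed(1)}%`);
  if (rate < floor) {
    // A non-zero exit code fails the CI job.
    process.exit(1);
  }
}
```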
23:49
Notion's Last Exam only passes 30% of the time and has full-time staff dedicated to it
24:22
The Model Behavior Engineer role combines data science, product management, and prompt engineering to understand model capabilities and headroom
25:58
Supervision for coding agents can come from non-engineers, such as UX research staff, who triage failures and guide investment
26:57
Software engineers at Notion are going through an identity crisis, realizing that delegation and context-switching are more important than code-writing
30:56
Custom agents route bugs to appropriate teams and post in Slack, replacing manual processes rather than people
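A rough sketch of this triage pattern, with placeholder webhook URLs and a stubbed classifier standing in for the model call:

```typescript
// Hypothetical bug-routing agent: a model call classifies the report
// to an owning team, then the agent posts to that team's Slack
// channel via an incoming webhook. All URLs and names are placeholders.

const TEAM_WEBHOOKS: Record<string, string> = {
  "collections": "https://hooks.slack.com/services/T000/B000/xxx",
  "agents-core": "https://hooks.slack.com/services/T000/B001/yyy",
};

// Stand-in for an LLM call that returns one of the known team names.
async function classifyTeam(bugReport: string): Promise<string> {
  return bugReport.includes("database") ? "collections" : "agents-core";
}

async function routeBug(bugReport: string): Promise<void> {
  const team = await classifyTeam(bugReport);
  const webhook = TEAM_WEBHOOKS[team];
  if (!webhook) throw new Error(`no Slack webhook for team: ${team}`);
  // Slack incoming webhooks accept a simple { text } payload.
  await fetch(webhook, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: `New bug routed to ${team}:\n${bugReport}` }),
  });
}
```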
32:22
There's a limit on the number of recursions to avoid infinite loops when composing agents
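A minimal sketch of such a recursion cap, assuming a depth counter threaded through each sub-agent invocation (the names and the limit of 5 are illustrative):

```typescript
// Illustrative recursion cap for agent composition: every sub-agent
// call goes through invokeAgent, which increments a depth counter and
// refuses to spawn past MAX_DEPTH. Two agents that delegate to each
// other therefore terminate instead of looping forever.

const MAX_DEPTH = 5;

interface AgentContext {
  depth: number;
}

type Agent = (task: string, ctx: AgentContext) => Promise<string>;

async function invokeAgent(
  agent: Agent,
  task: string,
  ctx: AgentContext,
): Promise<string> {
  if (ctx.depth >= MAX_DEPTH) {
    // Surface the limit to the caller instead of recursing further.
    return `refused: max agent recursion depth (${MAX_DEPTH}) reached`;
  }
  return agent(task, { depth: ctx.depth + 1 });
}
```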
39:02
Using language models for deterministic tasks and interfacing with third-party providers is wasteful
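One way to picture the alternative, as a hedged sketch: route exact operations through a deterministic dispatch table and reserve the model for genuinely fuzzy requests. The operation names and stubs below are invented for illustration.

```typescript
// Sketch of the point being made here: deterministic operations go
// through a direct code path (no tokens, no mis-parsing); the model
// is only invoked as a fallback. All names are illustrative.

type Handler = (argsJson: string) => Promise<string>;

// Placeholder stubs standing in for real provider APIs.
async function callCalendarApi(args: object): Promise<string> {
  return `event created: ${JSON.stringify(args)}`;
}
async function callMailApi(args: object): Promise<string> {
  return `mail sent: ${JSON.stringify(args)}`;
}
// Placeholder for an actual model call.
async function llmFallback(op: string, input: string): Promise<string> {
  return `LLM handled ${op}: ${input}`;
}

const deterministic: Record<string, Handler> = {
  "calendar.create_event": (json) => callCalendarApi(JSON.parse(json)),
  "mail.send": (json) => callMailApi(JSON.parse(json)),
};

async function dispatch(op: string, argsJson: string): Promise<string> {
  const handler = deterministic[op];
  return handler ? handler(argsJson) : llmFallback(op, argsJson);
}
```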
42:52
Notion built its own mail and calendar in-house, spending time fine-tuning tools, building triggers, and using the right tools at the right time
47:46
Adding new tools hit a bottleneck due to token usage, efficiency, and quality trade-offs
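To make the token-cost side of that trade-off concrete, here is a toy schema budgeter; the 4-characters-per-token estimate and the relevance ranking are placeholder assumptions, not Notion's approach.

```typescript
// Toy illustration of the tool-count bottleneck: every tool schema
// added to the prompt costs tokens, so the runtime caps the total
// schema budget and keeps the most relevant tools first.

interface RankedTool {
  name: string;
  schemaJson: string;
  relevance: number; // e.g. similarity of the tool to the user's task
}

const SCHEMA_TOKEN_BUDGET = 2000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude heuristic, not a real tokenizer
}

function selectTools(tools: RankedTool[]): RankedTool[] {
  const chosen: RankedTool[] = [];
  let used = 0;
  for (const t of [...tools].sort((a, b) => b.relevance - a.relevance)) {
    const cost = estimateTokens(t.schemaJson);
    if (used + cost > SCHEMA_TOKEN_BUDGET) continue; // skip budget-busting tools
    chosen.push(t);
    used += cost;
  }
  return chosen;
}
```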
51:01
Making it too easy to use can diminish the agent's capabilities
52:10
Custom agents can set up and debug themselves, and users can ask about failures and update instructions
1:07:39
Most problems in the system are due to tool bugs rather than model issues
