scripod.com

Building Semantic Memory for AI With Cognee

In this episode of the AI Engineering Podcast, host Tobias Macey speaks with Vasilije Markovic about the evolving challenge of memory integration in large language models (LLMs). As LLMs become more central to complex applications, maintaining context and long-term knowledge remains a persistent hurdle. Markovic shares insights from his experience building Cognee, an open-source semantic memory engine designed to enhance how AI systems retain and retrieve information over time.
Markovic discusses how current LLM architectures struggle with "catastrophic forgetting" due to limited context windows, especially in multi-turn interactions. He introduces the concept of hierarchical memory, balancing short-term retrieval with long-term storage, as a way to improve continuity and reasoning in AI applications. The conversation explores tools like Graph RAAG and semantic memory inspired by cognitive science, which structure knowledge more effectively than traditional vector databases.

Markovic then details the development of Cognee, its architecture built on adaptable data pipelines, and how it integrates with existing systems using AWS services and Python-based frameworks. He compares Cognee's approach to personalization and memory management with other tools like Mem0, and highlights use cases across industries such as agriculture and logistics. Looking ahead, he outlines plans for deeper integrations with emerging AI stacks and predicts future advances in agent communication protocols.
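The Graph RAG idea raised in the conversation can be sketched in a few lines: instead of ranking isolated text chunks by vector similarity, facts are stored as subject-predicate-object triples, and retrieval expands the graph neighborhood around a query entity. The triples and the `expand` helper below are purely illustrative, not Cognee's actual API or data.

```python
from collections import defaultdict

# Tiny illustrative knowledge graph of (subject, predicate, object)
# triples. These facts are invented for the example.
TRIPLES = [
    ("Cognee", "is_a", "semantic memory engine"),
    ("Cognee", "integrates_with", "Neo4j"),
    ("Cognee", "written_in", "Python"),
    ("Neo4j", "is_a", "graph database"),
]

def build_index(triples):
    """Index triples by subject for cheap neighborhood lookups."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index

def expand(index, entity, depth=2):
    """Collect facts reachable from `entity` within `depth` hops,
    surfacing connected context that a flat vector lookup on the
    query string alone would miss."""
    facts, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for p, o in index.get(node, []):
                facts.append(f"{node} {p} {o}")
                next_frontier.append(o)
        frontier = next_frontier
    return facts

index = build_index(TRIPLES)
context = expand(index, "Cognee")  # includes the second-hop Neo4j fact
```

A query about Cognee here pulls in the fact that Neo4j is a graph database, even though that triple never mentions Cognee, which is the structural advantage over similarity-only retrieval.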
00:19 Vasilije has spent the past year building a memory engine for AI applications.
01:39 Memory in LLM systems is currently based on in-context learning, which keeps operation efficient and low-cost.
03:05 Catastrophic forgetting occurs when LLMs fail to retain prior knowledge as prompts are updated.
05:06 Hierarchical memory improves retrieval efficiency in RAG stacks.
06:52 Graph RAG offers a more structured approach to knowledge retrieval.
10:10 A semantic memory layer modeled on human cognition boosts LLM performance.
14:52 Cognee can serve as short-term memory in the stack, reducing the number of LLM calls.
17:15 The initial product was deployed on AWS using the Atkinson-Shiffrin memory model, Neo4j, and VB8.
23:32 Expect increased use of LLMs as costs decrease.
29:53 Cognee supports deployment on AWS and Kubernetes with minimal disruption to existing systems.
34:34 Cognee secured $1.5 million in funding for sustainable solutions in complex domains.
42:06 Manual work may be automated with the right technology, such as RAG and GraphRAG.
51:39 Relational tools may persist, while vector stores could be displaced by open-source alternatives.
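The hierarchical-memory idea from the 05:06 and 14:52 segments can be sketched as a two-tier store: a small short-term buffer answers repeat queries directly, and items evicted from it remain in long-term storage, so only genuinely new queries pay for an LLM call. The class and method names below are hypothetical, a minimal sketch rather than Cognee's implementation.

```python
from collections import OrderedDict

class HierarchicalMemory:
    """Two-tier memory: a bounded short-term cache backed by an
    unbounded long-term store (hypothetical sketch)."""

    def __init__(self, short_term_size=3):
        self.short_term = OrderedDict()  # recent query -> answer
        self.long_term = {}              # everything ever answered
        self.size = short_term_size
        self.llm_calls = 0

    def _call_llm(self, query):
        # Stand-in for a real model call; counts invocations so the
        # savings from memory hits are visible.
        self.llm_calls += 1
        return f"answer({query})"

    def ask(self, query):
        if query in self.short_term:         # short-term hit: free
            self.short_term.move_to_end(query)
            return self.short_term[query]
        if query in self.long_term:          # long-term hit: free
            answer = self.long_term[query]
        else:                                # miss: pay for an LLM call
            answer = self._call_llm(query)
            self.long_term[query] = answer
        self.short_term[query] = answer
        if len(self.short_term) > self.size: # evict oldest recent entry
            self.short_term.popitem(last=False)
        return answer

mem = HierarchicalMemory()
mem.ask("what is cognee")  # miss: one LLM call
mem.ask("what is cognee")  # short-term hit: no new call
```

Even after a query ages out of the short-term buffer, re-asking it is served from long-term storage rather than triggering another model call, which is the cost reduction described in the episode.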