Catching AI Sleeper Agent - LLM Backdoors

Build Wiz AI Show

Feb 05

Shownote

Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly me...