scripod.com
Catching AI Sleeper Agent - LLM Backdoors

Shownote

Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly me...
