scripod.com

The coming AI security crisis (and what to do about it) | Sander Schulhoff

As AI systems grow more autonomous and deeply integrated into critical infrastructure, their security weaknesses are becoming impossible to ignore. Despite the rapid deployment of AI across industries, foundational safeguards remain alarmingly fragile, leaving organizations exposed to increasingly sophisticated threats.
Current AI security measures, particularly guardrails and automated red-teaming, are largely ineffective against prompt injection and jailbreak attacks, which exploit the inherent unpredictability of large language models. Real-world incidents—like compromised AI agents at ServiceNow and leaked secrets from MathGPT—demonstrate tangible risks, while more dangerous exploits, such as using AI to plan physical attacks, underscore the urgency. The lack of robust defenses stems not from technical oversight but systemic issues: frontier labs prioritize capability over safety, enterprises rely on misleading commercial tools, and traditional cybersecurity practices fail to account for AI’s non-deterministic behavior. Effective protection requires merging classical security principles—like sandboxing and access controls—with AI-specific strategies, such as user-defined permissions (e.g., CaMeL) and adversarial training. Long-term solutions depend on adaptive evaluations, better education, and a shift toward intrinsic model safety rather than superficial filters. Without this convergence, the expanding attack surface of agentic systems will outpace defenses.
05:17
05:17
AI guardrails are fundamentally insecure against prompt injection and jailbreaking attacks.
11:42
11:42
The 2025 Las Vegas Cybertruck explosion was planned using ChatGPT or GPT-3.
17:56
17:56
AI-powered robots can suffer real-world harm through prompt injection, such as physical attacks via jailbroken systems.
19:44
19:44
Automated AI red-teaming and AI guardrails are considered less useful
21:09
21:09
Algorithms use LLMs to generate malicious prompts targeting other LLMs
27:07
27:07
Guardrails give enterprises a false sense of security despite ongoing vulnerabilities
31:20
31:20
The number of possible attacks on LLMs is effectively infinite, making guardrail bypass inevitable.
38:22
38:22
You can't patch a brain like a software bug—the core problem in AI security.
54:24
54:24
PDoom (probability of doom) is a serious concern in AI safety
55:49
55:49
Logging system inputs and outputs is a recommended general AI deployment practice
1:05:42
1:05:42
CaMeL can block attacks when AI system permissions are well-defined but may fail if read and write permissions are combined.
1:09:17
1:09:17
The course focuses on education, not selling software or fear-based sales.
1:15:14
1:15:14
Anyone can still easily trick even the most advanced AI models despite progress in capabilities.
1:20:54
1:20:54
Repello identifies more AI systems in a company than the CISO knows about
1:25:08
1:25:08
Guardrails create dangerous overconfidence in AI security despite being ineffective.