Risk & Safety
Jailbreak
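A technique for bypassing an AI system's safety constraints or content policies through carefully crafted prompts or inputs, such as role-play framing, hypothetical scenarios, or obfuscated requests. Differs from prompt injection in its target: a jailbreak aims at the model's own safety training and refusal behavior, typically through the user's direct input, whereas prompt injection smuggles instructions into content the application treats as data in order to override the developer's instructions. The two can overlap, and some taxonomies treat jailbreaking as a form of prompt injection. An ongoing adversarial challenge for AI safety researchers and developers, because new techniques continue to appear as defenses improve.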
Referenced in frameworks
NIST AI 600-1, OWASP LLM Top 10