AI_SLANG_ENTRY
Jailbreak Meaning
A prompt or interaction pattern that tries to push an AI model past its safety rules, policy boundaries, or refusal behavior.
What does Jailbreak mean?
A prompt or interaction pattern that tries to push an AI model past its safety rules, policy boundaries, or refusal behavior.
A jailbreak is an attempt to make an AI system do something its designers tried to prevent, usually by disguising or escalating the instruction inside a prompt.
Origin and usage
Borrowed from older security and device-hacking language, then adopted by LLM users and security researchers for attempts to bypass model guardrails.
Source type: technical-term. Last checked: 2026-07-05.
Established AI security term; OWASP and MITRE ATLAS both track jailbreak-style attacks, while community usage often uses the word more loosely for guardrail bypass prompts.
Examples
- The screenshot looked clever until you realized it was just another jailbreak prompt.
- Treat public jailbreaks as security signals, not party tricks for production agents.