
Claude Fable 5: Cyber-Safeguards and Jailbreak Framework Defined

In brief: Anthropic classifies AI cybersecurity use into four categories and establishes a severity framework for jailbreaks to enable defensive applications while preventing misuse.

Anthropic has globally redeployed Claude Fable 5 and, for the first time, documents its cybersecurity classifiers along with a framework for assessing the severity of AI jailbreaks, with the aim of addressing dual-use risks in a structured way.

Alongside the redeployment, Anthropic shares detailed information on two security topics: the integrated safety classifiers and a jailbreak severity framework. The classifiers are specialized AI systems designed to detect and block dangerous or potentially dangerous cybersecurity uses. The documentation spells out precisely which harm categories the classifiers address and which they do not.

The core challenge lies in the dual-use nature of cybersecurity technology: capabilities such as code vulnerability analysis can be used by defenders to secure systems or by attackers to compromise them. Anthropic therefore refrains from imposing a blanket ban on all cybersecurity activities. Instead, requests to Fable 5 are sorted into four categories: (1) Prohibited Use: highly dangerous with minimal defensive value (blocked); (2) High-Risk Dual Use: widespread among attackers but with legitimate applications (blocked); (3) Low-Risk Dual Use: predominantly defensive but potentially useful to attackers (monitored and partially blocked as a security margin); (4) Benign Use: harmless (allowed with monitoring). The security margin means that some legitimate requests are deliberately blocked out of caution; Anthropic has consciously widened this buffer in Fable 5 compared with earlier versions.
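The four categories and their handling can be restated as a small lookup table. The following Python sketch is purely illustrative: the names Category, HANDLING, handling_for and may_be_blocked are hypothetical, the handling labels are taken from the summary above, and nothing here reflects how Anthropic's classifiers are actually implemented.

```python
from enum import Enum


class Category(Enum):
    """The four request categories named in the article (identifiers are hypothetical)."""
    PROHIBITED_USE = 1
    HIGH_RISK_DUAL_USE = 2
    LOW_RISK_DUAL_USE = 3
    BENIGN_USE = 4


# Handling per category as summarized in the article.
# The string labels are illustrative, not Anthropic's own terminology.
HANDLING = {
    Category.PROHIBITED_USE: "blocked",
    Category.HIGH_RISK_DUAL_USE: "blocked",
    Category.LOW_RISK_DUAL_USE: "monitored and partially blocked",
    Category.BENIGN_USE: "allowed with monitoring",
}


def handling_for(category: Category) -> str:
    """Return the handling label the article associates with a category."""
    return HANDLING[category]


def may_be_blocked(category: Category) -> bool:
    """True for every category in which at least some requests are blocked."""
    return category is not Category.BENIGN_USE
```

Read this way, the security margin is visible in the third row: a Low-Risk Dual Use request may be blocked even though most requests in that category are defensive.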

In parallel, Anthropic presents a jailbreak severity framework, developed with partner Glaswing. AI jailbreaks are unconventional prompt strategies that let users circumvent safeguards. Until now, there has been no standardized way to assess how dangerous a given jailbreak is: some affect only minor behaviors, while others unlock broad harms and significantly amplify AI risks. A common standard would enable developers and governments to communicate consistently about specific jailbreak risks.

Anthropic invites feedback on the Fable 5 cybersecurity strategy at cyber-safeguards@anthropic.com and operates a HackerOne program through which security researchers can submit new jailbreaks. The company positions the approach as a step toward balancing defensive AI use against abuse prevention through standardization.


Source: www.anthropic.com · Published 1 July 2026
Lumi AI News — AI-assisted curation pursuant to Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.2.
