The point: Grammar-Constrained Decoding (GCD), a technique for ensuring syntactically correct code, opens a new jailbreak avenue for attackers, with an average success rate more than 30 percentage points higher than previous approaches.
Grammar-based decoding techniques designed to ensure code quality can themselves be abused as an attack surface for jailbreaks. Researchers have developed the CodeSpear method, which causes LLMs to generate malicious code despite security measures.
Grammar-Constrained Decoding is used in LLM code generation to ensure syntactic validity and reduce errors. However, security researchers have discovered that the technique itself becomes a vulnerability: an attacker can weaponize grammar constraints to bypass an LLM's safety alignment and make it produce malicious code. This attack is called CodeSpear.
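To make the mechanism concrete, here is a minimal toy sketch of grammar-constrained decoding, assuming greedy decoding and a "grammar" reduced to a finite list of valid token sequences. At each step, every token the grammar does not allow gets a score of negative infinity, so the model can only pick grammatical continuations. The vocabulary, `GRAMMAR`, and the `toy_model` scores are all hypothetical. The sketch illustrates one general point, namely that a grammar admitting only code leaves no valid path for a natural-language refusal; it is not the CodeSpear attack itself, whose details the summary does not give.

```python
import math

# Hypothetical toy vocabulary; real tokenizers use sub-word pieces.
VOCAB = ["I'm", "sorry", "def", "run", "(", ")", ":", "pass", "print"]

# Hypothetical toy "grammar": a finite set of valid token sequences standing
# in for a real context-free grammar of a programming language.
GRAMMAR = [
    ["def", "run", "(", ")", ":", "pass"],
    ["print", "(", ")"],
]


def allowed_next(prefix):
    """Return the tokens that keep `prefix` extendable to a valid sequence."""
    n = len(prefix)
    return {seq[n] for seq in GRAMMAR if len(seq) > n and seq[:n] == prefix}


def toy_model(prefix):
    """Hypothetical scores of a safety-aligned model that prefers to refuse."""
    scores = {tok: 0.0 for tok in VOCAB}
    if not prefix:
        scores["I'm"] = 5.0    # start of a refusal: the model's top choice
        scores["def"] = 1.0    # start of code
        scores["print"] = 0.5
    elif prefix == ["I'm"]:
        scores["sorry"] = 5.0
    return scores


def greedy_constrained(model, max_len=20):
    """Greedy decoding where tokens outside the grammar get a score of -inf."""
    out = []
    while len(out) < max_len:
        allowed = allowed_next(out)
        if not allowed:        # no valid continuation: the sequence is complete
            break
        scores = model(out)
        masked = {t: (s if t in allowed else -math.inf) for t, s in scores.items()}
        out.append(max(masked, key=masked.get))
    return out


if __name__ == "__main__":
    first = toy_model([])
    print(max(first, key=first.get))                 # I'm  (unconstrained: refusal)
    print(" ".join(greedy_constrained(toy_model)))   # def run ( ) : pass
```

Unconstrained, the toy model's preferred first token starts a refusal; under the grammar mask, that token is simply not selectable, and decoding is forced down the code path.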
In experiments across 10 popular LLMs and 4 benchmarks, CodeSpear raised the average attack success rate by more than 30 percentage points compared with other jailbreak baselines. The distinctive feature: the attack targets not the model itself, but the grammar component that is supposed to make the output more reliable.
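A gain in percentage points is an absolute difference between success rates, not a relative increase. A minimal formulation, assuming the average is an unweighted mean over the set of models M and the set of benchmarks B (the summary does not specify the averaging):

```latex
\begin{aligned}
\overline{\mathrm{ASR}} &= \frac{1}{|M|\,|B|} \sum_{m \in M} \sum_{b \in B} \mathrm{ASR}_{m,b},
  \qquad |M| = 10,\ |B| = 4 \\
\Delta_{\mathrm{pp}} &= 100 \cdot \bigl( \overline{\mathrm{ASR}}_{\mathrm{CodeSpear}}
  - \overline{\mathrm{ASR}}_{\mathrm{baseline}} \bigr) > 30
\end{aligned}
```

With purely hypothetical values, a baseline average of 0.40 and a CodeSpear average of 0.72 give a gain of 32 percentage points, which corresponds to an 80% relative increase.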
As a countermeasure, the researchers present CodeShield, a safety-alignment technique that makes the model more resistant by retraining it specifically in code mode. CodeShield trains the model to generate semantically harmless "honeypot" code variants under GCD constraints. These variants do not implement the malicious request, yet they are structurally diverse enough that a tightened grammar cannot suppress them. At the same time, CodeShield preserves the model's natural-language refusals.
The research reveals a fundamental security risk in using GCD in production systems and calls for closer attention to the implications of this widely deployed technique. For CTOs, this means that code-quality measures such as GCD must be tested independently against the model's safety alignment to rule out unexpected interactions.
Source: arxiv.org · Published June 9, 2026
Lumi AI News — AI-assisted curation in accordance with Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.6.5.