SafePyramid: Benchmark Reveals Weaknesses in LLM Guardrails for Context-Dependent Policies

30 June 2026
AI Models, Cybersecurity

Even GPT-4.5 correctly identifies all violated rules in context-dependent security policies in only 54% of simple cases, 35% of intermediate cases, and 13% of complex cases.