NEUSparse Autoencoders: Interpretable Features Insufficient for Reliable Model Control18. June 2026AI Models, Cybersecurity, RegulationShare on:SAE-based safety measures are vulnerable to post-intervention recovery: models can restore suppressed behaviors even when targeted features are controlled. Share on: