NEUSteerability of Language Models Can Be Predicted Early15. June 2026AI Models, Claude AIShare on:A trainable classifier predicts with a 0.7 Macro-F1-Score based on early hidden states whether activation steering will succeed without requiring complete generations. Share on: