STARE: Token-Level Stability Procedure Against Policy Entropy Collapse in GRPO Training

19 June 2026 | AI Models, Claude AI

STARE uses surprisal metrics and selective advantage reweighting to keep policy entropy stable across long training sequences while improving accuracy by 4–8%.