RL-Controlled Sampling for Test-Time Scaling in Large Language Models
3. June 2026
4. July 2026
AI Models

A CPU-based RL controller optimizes adaptive sampling during test-time scaling, reducing computational overhead and latency compared to heuristic methods.
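The summary does not describe how the controller is built. As a hypothetical illustration of the general idea only, here is a minimal sketch that treats the controller as an epsilon-greedy bandit. The bandit picks how many samples to draw per query, and its reward is task success minus a per-sample compute penalty. Everything in it is an assumption, not the article's method: the names `SampleBudgetController`, `simulate_query` and `train`, the candidate budgets, the cost constant, and the toy simulator, which treats samples as independently correct with a fixed probability and assumes a perfect verifier. A controller this small runs easily on a CPU, which is the property the summary highlights.

```python
import random


class SampleBudgetController:
    """Hypothetical sketch, not the article's controller: an epsilon-greedy
    bandit whose arms are candidate per-query sample budgets. Each arm's value
    is a running mean of observed reward (success minus compute penalty)."""

    def __init__(self, budgets, epsilon=0.2, seed=0):
        self.budgets = list(budgets)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {b: 0 for b in self.budgets}
        self.values = {b: 0.0 for b in self.budgets}

    def greedy(self):
        # Budget with the highest estimated reward (ties go to the smaller budget).
        return max(self.budgets, key=lambda b: self.values[b])

    def select(self):
        # Explore a random budget with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.budgets)
        return self.greedy()

    def update(self, budget, reward):
        # Incremental running-mean update of the chosen arm's value estimate.
        self.counts[budget] += 1
        self.values[budget] += (reward - self.values[budget]) / self.counts[budget]


def simulate_query(k, p_correct, rng):
    """Toy stand-in for an LLM (assumption): each of k samples is independently
    correct with probability p_correct, and a perfect verifier accepts any
    correct sample."""
    return any(rng.random() < p_correct for _ in range(k))


def train(steps=20_000, p_correct=0.3, cost_per_sample=0.02, seed=0):
    # Hypothetical training loop: choose a budget, observe success, pay for compute.
    controller = SampleBudgetController([1, 2, 4, 8, 16], seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(steps):
        k = controller.select()
        solved = simulate_query(k, p_correct, env_rng)
        reward = float(solved) - cost_per_sample * k
        controller.update(k, reward)
    return controller


if __name__ == "__main__":
    controller = train()
    print("learned budget:", controller.greedy())
```

Under these toy numbers, the expected reward of budget k is 1 - 0.7**k - 0.02*k. This peaks at k = 8 (about 0.78). A fixed heuristic that always draws 16 samples would spend twice the compute for a lower expected reward (about 0.68). The bandit's job is to discover the better budget from feedback alone.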