REVES: Iterative Training for More Efficient Test-Time Scaling in LLMs

In a nutshell: REVES reuses the intermediate steps of successful error corrections as separate training data, achieving better performance at lower computational cost than conventional multi-turn reinforcement learning (RL) methods.

Researchers have developed a two-phase training framework called REVES that makes language models more efficient at test time by learning from the intermediate steps of problem-solving. The approach extracts revision and verification data from the failed attempts that precede a successful correction, significantly reducing computational requirements compared to standard multi-turn RL approaches.

REVES addresses a core challenge in training language models for multi-step inference: standard post-training methods typically optimize a single-shot objective, while test-time scaling through sequential revisions requires multiple steps. REVES instead alternates between two phases: data and prompt augmentation, followed by policy optimization. In the augmentation phase, it converts the intermediate steps of successful corrections (so-called “near-miss” answers) into decoupled revision and verification tasks.
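As a rough illustration of what "decoupled revision and verification tasks" could look like, the sketch below turns one multi-turn correction trajectory into independent single-turn training examples. The `Attempt` record, the `decouple` function, the prompt wording, and the rule that every near-miss is paired with the final correct answer are all assumptions made for this example; the summary does not specify how REVES actually constructs its data.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One step of a hypothetical multi-turn trajectory."""
    answer: str
    is_correct: bool  # assumed to come from an external checker (tests, ground truth)


def decouple(problem: str, attempts: list[Attempt]) -> list[dict]:
    """Split a trajectory into single-turn revision and verification examples.

    Illustrative assumption: only trajectories that end in a correct answer
    (successful corrections) contribute training data.
    """
    if not attempts or not attempts[-1].is_correct:
        return []

    final = attempts[-1]
    examples = []

    # Revision tasks: each failed intermediate answer (near-miss) becomes its
    # own prompt whose target is the final, correct answer.
    for attempt in attempts[:-1]:
        if not attempt.is_correct:
            examples.append({
                "task": "revise",
                "prompt": f"Problem: {problem}\nPrevious answer: {attempt.answer}\nRevise it.",
                "target": final.answer,
            })

    # Verification tasks: each answer becomes a prompt asking for a verdict.
    for attempt in attempts:
        examples.append({
            "task": "verify",
            "prompt": f"Problem: {problem}\nAnswer: {attempt.answer}\nIs it correct?",
            "target": "correct" if attempt.is_correct else "incorrect",
        })

    return examples


trajectory = [Attempt("x = 3", False), Attempt("x = 4", False), Attempt("x = 5", True)]
examples = decouple("Solve x + 1 = 6", trajectory)
```

Because each example is a standalone prompt and target, it can be sampled in a single turn during policy optimization, which is one way to read the claim that the intermediate steps become separate training data.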

The evaluation reports concrete gains. On LiveCodeBench, the method scores 6.5 points above an RL baseline and 4.0 points above standard multi-turn training, measured on publicly available test cases. On circle-packing problems, REVES matches the previous state-of-the-art result of large evolutionary search systems while using a 4B base model and significantly fewer rollouts. On mathematics tasks with ground-truth verification, the results confirm the improved correction capability.

The generalization results are technically interesting: the method transfers to out-of-distribution problems such as n-queens and mini-Sudoku, where correctness is fully defined by the problem constraints. The reduction in computational requirements comes from off-policy data generation, which avoids the long-horizon sampling that standard multi-turn RL requires. Source code is available.
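To make "correctness is fully defined by the problem constraints" concrete for n-queens, here is a minimal checker. The function name and the board encoding (one column index per row) are choices made for this example and are not taken from the REVES code.

```python
def is_valid_n_queens(cols: list[int]) -> bool:
    """Check an n-queens placement where cols[r] is the queen's column in row r."""
    n = len(cols)

    # One queen per row is implied by the encoding. The columns must be a
    # permutation of 0..n-1, which rules out shared or out-of-range columns.
    if sorted(cols) != list(range(n)):
        return False

    # Two queens share a diagonal exactly when their row distance equals
    # their column distance.
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            if abs(cols[r1] - cols[r2]) == r2 - r1:
                return False

    return True


print(is_valid_n_queens([1, 3, 0, 2]))  # a known valid 4-queens placement
print(is_valid_n_queens([0, 1, 2, 3]))  # all queens on one diagonal
```

A checker like this needs no reference solution: any proposed answer can be verified from the rules alone, which is what makes such puzzles convenient for testing revision and verification behavior.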

Source: arxiv.org · Published June 16, 2026
Lumi AI News — AI-assisted curation pursuant to Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.
