RACES automatically composes verifiable environments through recursive combination; DeepSeek-R1-Distill-Qwen-14B improves by 3.1 points and Qwen3-14B by 2.3 points across six benchmarks.
FlowTracer models information propagation as a directed graph and derives token credits from the global flow structure, concentrating reinforcement learning signals on critical reasoning steps.
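
To make the FlowTracer idea concrete, here is a minimal sketch of flow-based token credit. It is not the paper's algorithm: the summary above does not say how edges are built or how credit is computed. The function name `flow_credits`, the edge weights, the choice of a single sink token (for example the final answer token), and the rule "credit equals the share of normalized flow that reaches the sink" are all assumptions chosen for illustration.

```python
from collections import defaultdict


def flow_credits(num_tokens, edges, sink):
    """Hypothetical credit rule: share of information flow reaching `sink`.

    `edges` maps (src, dst) -> non-negative weight with src < dst, so token
    order is already a topological order of the directed graph.
    """
    # Total incoming weight per token, used to normalize each token's inputs.
    incoming_total = defaultdict(float)
    for (src, dst), weight in edges.items():
        if not (0 <= src < dst < num_tokens) or weight < 0:
            raise ValueError(f"bad edge {(src, dst)} with weight {weight}")
        incoming_total[dst] += weight

    # For each source, store the fraction of each destination's input it supplies.
    out_edges = defaultdict(list)
    for (src, dst), weight in edges.items():
        if incoming_total[dst] > 0:
            out_edges[src].append((dst, weight / incoming_total[dst]))

    # Backward pass from the sink: a token inherits credit from the tokens it
    # feeds, in proportion to how much of their input it provides.
    credits = [0.0] * num_tokens
    credits[sink] = 1.0
    for tok in range(sink - 1, -1, -1):
        credits[tok] = sum(
            (frac * credits[dst] for dst, frac in out_edges[tok]), 0.0
        )
    return credits


# Toy graph (assumed weights): token 0 feeds token 1, and tokens 1 and 2
# feed the sink token 3 with weights 3.0 and 1.0.
edges = {(0, 1): 1.0, (1, 3): 3.0, (2, 3): 1.0}
credits = flow_credits(4, edges, sink=3)

# Scale a sequence-level advantage (assumed to be 2.0 here) by token credit.
token_advantages = [2.0 * c for c in credits]
```

In this toy graph, tokens 0 and 1 lie on the heavier path to the sink and receive more credit than token 2, so a credit-weighted advantage concentrates the learning signal on them. Credit depends on the whole graph rather than on a single edge, which is the sense of "global flow structure" assumed here.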