RACES enables automatic composition of verifiable environments through recursive combination, with DeepSeek-R1-Distill-Qwen-14B improving by 3.1 points and Qwen3-14B by 2.3 points across six benchmarks.
FlowTracer models information propagation as a directed graph and derives token credits from global flow structure to precisely concentrate reinforcement learning signals on critical reasoning steps.
FlowTracer assigns credit to tokens based on their measured information throughput in the attention graph rather than treating all equally, yielding consistent performance gains in reasoning tasks.