Bebop: Rejection Sampling Improves Multi-Token Prediction in RL Training

11 June 2026
AI Models, Claude Code

Bebop uses rejection sampling and TV loss optimization to keep MTP acceptance rates stable during RL training, and accelerates rollouts by up to 1.8x.
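The summary links a TV loss to the MTP acceptance rate. A minimal sketch of why the two could be connected, assuming that "TV" means total variation distance and that draft tokens are verified with the standard speculative-sampling rejection rule (neither is confirmed by the text above): under that rule, the probability of accepting a draft token equals one minus the total variation distance between the target and draft distributions. The function names and toy distributions below are hypothetical and are not taken from Bebop.

```python
import random


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def expected_acceptance(p, q):
    """Probability that a draft token sampled from q is accepted under p."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def rejection_sample(p, q, rng):
    """Draw one token with the standard speculative-sampling rule.

    p: target (policy) distribution, q: draft (MTP head) distribution.
    Returns (token, accepted). The returned token is distributed as p.
    """
    vocab = list(range(len(p)))
    draft = rng.choices(vocab, weights=q)[0]
    # Accept the draft token with probability min(1, p/q).
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0], False


def simulate(p, q, n=20000, seed=0):
    """Empirical acceptance rate and token frequencies over n draws."""
    rng = random.Random(seed)
    accepted = 0
    counts = [0] * len(p)
    for _ in range(n):
        token, ok = rejection_sample(p, q, rng)
        accepted += ok
        counts[token] += 1
    return accepted / n, [c / n for c in counts]


# Toy distributions over a three-token vocabulary (illustrative only).
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
rate, freqs = simulate(p, q)
print("TV distance:", total_variation(p, q))
print("expected acceptance:", expected_acceptance(p, q))
print("empirical acceptance:", rate)
```

In this sketch, lowering the total variation distance between draft and target raises the acceptance rate directly, while the rejection step keeps the sampled tokens distributed according to the target. If the policy drifts during RL training and the draft head does not follow, the distance grows and acceptance falls, which is the kind of instability the summary describes.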