Bebop: Rejection Sampling Improves Multi-Token Prediction in RL Training
11 June 2026 | AI Models, Claude Code
Bebop uses rejection sampling and TV loss optimization to keep MTP acceptance rates stable during RL training, accelerating rollouts by up to 1.8x.

Hybrid LLMs Lose Long-Context Capabilities Through CoT Fine-Tuning
10 June 2026 | AI Models, Claude Code
CoT fine-tuning degrades long-context retrieval in hybrid LLMs by distorting query-key projections; QK-Restore fixes this without additional training.