A 35B-parameter agent model trained with horizon scaling and multi-teacher distillation achieves performance comparable to that of trillion-parameter models on long-horizon benchmarks.
A systematic data curation pipeline allows a single agentic model to be trained so that it generalizes across diverse task types, with results competitive with or superior to those of specialized models.