WARP reconstructs the training source mixtures of language models from their weights, achieving mean absolute errors of 0.046 for BERT and 0.104 for GPT-2.
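As a rough illustration of the error metric quoted above, the sketch below computes a mean absolute error between a true and a reconstructed source mixture. It assumes each mixture is a vector of per-source proportions and that the error is averaged over sources; the summary does not say how WARP aggregates the error. The function name `mean_absolute_error` and the example proportions are hypothetical, not taken from the work.

```python
def mean_absolute_error(true_mix, est_mix):
    """Average absolute difference between two mixtures, taken per source.

    Assumption: both arguments list proportions for the same sources in the
    same order. This is an illustrative metric, not WARP's evaluation code.
    """
    if len(true_mix) != len(est_mix):
        raise ValueError("mixtures must cover the same number of sources")
    if not true_mix:
        raise ValueError("mixtures must not be empty")
    return sum(abs(t - e) for t, e in zip(true_mix, est_mix)) / len(true_mix)


# Hypothetical three-source mixture (made-up numbers, not reported results).
true_mix = [0.50, 0.30, 0.20]
est_mix = [0.45, 0.35, 0.20]

mae = mean_absolute_error(true_mix, est_mix)
print(f"MAE = {mae:.3f}")
```

With these made-up proportions the per-source errors are 0.05, 0.05, and 0, so the MAE is about 0.033. Under the same assumed definition, an MAE of 0.046 would mean each estimated proportion is off by roughly 4.6 percentage points on average.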
A systematic data curation pipeline enables training agentic models that generalize across diverse task types while achieving results competitive with or superior to those of specialized models.