How much does distillation really matter for Chinese LLMs?

Distillation has emerged as one of the most frequently discussed issues in the wider narrative around US-China competition and the diffusion of AI technology. The term has multiple meanings, but its common usage today refers to training a weaker model on the outputs of a stronger AI system. It originates from the more precise technical concept of knowledge distillation (Hinton, Vinyals, & Dean 2015), a particular training method in which a student model learns to match the probability distribution of a teacher model.

Today’s version of distillation is more accurately characterized as the use of synthetic data. You take outputs from a more powerful model (typically accessed via an API) and train your own model to imitate or predict them. Knowledge distillation in its technical form isn’t feasible with API-based models, because they don’t expose the necessary internals, such as the teacher’s full output probabilities, to users.

Synthetic data, on the other hand, is probably the most valuable tool an AI researcher relies on today for iteratively improving models. Yes, architecture matters, and certain data will always require purely human input. Emerging ideas like reinforcement learning with verifiable rewards at scale could reshape the industry. Yet the bulk of day-to-day progress in model improvement still comes down to how effectively we can capture and scale synthetic data.

To expand on the point made at the beginning: the repeated claim is that top Chinese labs are using distillation to extract capabilities from the leading American API-based models. The most prominent case to date surrounded the release of DeepSeek R1, when OpenAI accused DeepSeek of stealing its reasoning traces by jailbreaking the API. The traces are not exposed by default. For context, a reasoning trace is a colloquial term of art for a model’s internal reasoning process, such as what open-weight reasoning models expose to the user.
Fear of distillation is also the likely reason Gemini swiftly switched from showing users its reasoning traces to concealing them. There was even prominent, early research on reasoning that built upon Gemini!
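
To make the distinction between the two meanings of distillation concrete, here is a minimal sketch in plain Python. It is an illustration, not anyone's actual training code: the three-token vocabulary, the logits, the temperature, and the function names are all made up for this example. The first loss is in the spirit of knowledge distillation and needs the teacher's full set of logits (the temperature-squared scaling from the original paper is left out for brevity). The second is ordinary cross-entropy on a token the teacher emitted, which is all you need for synthetic data.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def knowledge_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over the whole vocabulary.

    Requires the teacher's logits for every token, which a text-only
    API response does not contain.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def synthetic_data_loss(teacher_token_id, student_logits):
    """Cross-entropy on a single token the teacher generated.

    Requires only the teacher's text output.
    """
    q = softmax(student_logits)
    return -math.log(q[teacher_token_id])

# Toy three-token vocabulary; every number here is hypothetical.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]
teacher_token_id = 0  # the token the teacher actually produced

kd = knowledge_distillation_loss(teacher_logits, student_logits)
sft = synthetic_data_loss(teacher_token_id, student_logits)
print(f"knowledge distillation loss: {kd:.4f}")
print(f"synthetic data loss: {sft:.4f}")
```

The practical difference is in the inputs: the first function cannot be called without the teacher's distribution, while the second only needs the text the teacher wrote, which is why the API-era version of distillation looks like supervised training on synthetic data.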

  Interconnects AI