Are Anthropic’s allegations about “model distillation attacks” well-founded?

Anthropic, a leading AI company, recently released a highly controversial statement claiming that three major Chinese AI labs (DeepSeek, Moonshot, and MiniMax) are conducting “distillation attacks” against its models.

According to Anthropic’s report, these organizations used more than 24,000 fraudulent accounts to generate over 16 million exchanges, attempting to extract the underlying capabilities of the Claude model and use them to train their own models. However, if we examine these figures and the reasoning behind them from the perspective of developers, of how APIs actually operate, and of industry benchmark testing, the accusation turns out to have many questionable gaps.

1. What is a “distillation attack”?

Before discussing the allegation, we need to clarify what “distillation” means in machine learning. Traditional model distillation means feeding prompts to a smarter, larger model, collecting its high-quality outputs, and using that data to train a smaller, cheaper model so that it gains similar capabilities. This is a very common technique in the industry. For example, the programming assistant Cursor pays for API access legitimately and uses the generated data to train its own lighter-weight code model.
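
As a rough illustration of the data-collection half of that process, the sketch below pairs prompts with a teacher's answers and serializes them as training records. The names `build_distillation_set` and `fake_teacher` are hypothetical, and the teacher is a local stub rather than a real API call; the training step itself is not shown.

```python
import json
from typing import Callable

def build_distillation_set(prompts: list[str], teacher: Callable[[str], str]) -> list[dict]:
    """Pair each prompt with the teacher's answer to form supervised training records."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

def fake_teacher(prompt: str) -> str:
    # Hypothetical stand-in for a request to a larger model's API.
    return f"[teacher answer to: {prompt}]"

prompts = ["Reverse a string in Python.", "Explain what a mutex is."]
records = build_distillation_set(prompts, fake_teacher)

# One JSON object per line is a common layout for fine-tuning data.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

A smaller model would then be fine-tuned on these prompt and completion pairs.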

“Distillation attack” is a new term coined by Anthropic. At present, major labs are very wary of distillation. For example, OpenAI believes DeepSeek used data from its o1 model to train the R1 model, so OpenAI decided to hide o1’s reasoning trace and output only the final result. In contrast, when Anthropic first launched models with reasoning capabilities, it chose not to obfuscate or hide these reasoning steps. While this made it easier for developers to debug their systems, it also made Anthropic’s data more valuable to organizations attempting reinforcement learning and distillation training.

2. Scrutinizing the core data: the counting trap behind “number of exchanges”

In its report, Anthropic listed specific “incriminating” figures for each lab, but by ordinary engineering standards these volumes are not large; they are arguably negligible:

  • DeepSeek: accused of about 150,000 exchanges.
  • Moonshot (月之暗面): accused of about 3.4 million exchanges.
  • MiniMax: accused of about 13 million exchanges.
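
As a quick sanity check, the three figures above can be summed and compared with the headline numbers. The per-account average below assumes exchanges were spread evenly across the 24,000 accounts, which the report does not state; it is only an order-of-magnitude guide.

```python
# Figures as quoted in this article (all approximate).
reported = {"DeepSeek": 150_000, "Moonshot": 3_400_000, "MiniMax": 13_000_000}
ACCOUNTS = 24_000  # "more than 24,000" accounts

total = sum(reported.values())
print(total)  # 16550000, consistent with "over 16 million"

# Each lab's share of the total, in percent.
shares = {lab: 100 * n / total for lab, n in reported.items()}
for lab, pct in shares.items():
    print(f"{lab}: {pct:.1f}%")

# Average exchanges per account, assuming an even spread (an assumption).
per_account = total / ACCOUNTS
print(round(per_account))
```

On these numbers DeepSeek accounts for under 1% of the total, and the average account generated only a few hundred exchanges.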

The amplification effect of tool calls on request volume

The key to understanding these numbers lies in Anthropic’s definition of an “exchange.” In modern agent applications, tool calls can amplify a single user request into dozens or even hundreds of exchanges.

When a model is asked to perform a complex task (such as “update the homepage code to include new pricing”), the workflow is as follows:

  1. The model requests a search for relevant files (end of the 1st exchange; the connection is closed).
  2. After the system runs the search, it sends the complete history and results back to the model (2nd exchange).
  3. The model requests to read several specific files (3rd exchange).
  4. The system returns file contents, and the model finally generates code modification suggestions (4th and subsequent exchanges).

If multiple searches or complex codebase analysis are enabled, a simple user prompt can easily turn into hundreds of consecutive “exchanges.”
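
The amplification can be made concrete with a small simulation. This is a sketch only: `ScriptedModel`, `run_tool`, and `run_agent` are hypothetical names, the model is a stub that replays canned replies, and it assumes every request/response round trip counts as one exchange.

```python
from dataclasses import dataclass

@dataclass
class ScriptedModel:
    """Hypothetical stand-in for a model API that replays a fixed list of replies."""
    script: list
    exchanges: int = 0

    def complete(self, history: list) -> dict:
        # Every call is one request/response round trip, i.e. one exchange.
        reply = self.script[self.exchanges]
        self.exchanges += 1
        return reply

def run_tool(call: dict) -> str:
    # Hypothetical tool runner; a real agent would search or read files here.
    return f"result of {call['tool']}({call['arg']})"

def run_agent(model: ScriptedModel, user_prompt: str) -> str:
    history = [("user", user_prompt)]
    while True:
        reply = model.complete(history)  # the full history is resent each time
        history.append(("assistant", reply))
        if reply["type"] == "final":
            return reply["text"]
        history.append(("tool", run_tool(reply)))

script = [
    {"type": "tool_call", "tool": "search", "arg": "pricing"},
    {"type": "tool_call", "tool": "read_file", "arg": "home.tsx"},
    {"type": "tool_call", "tool": "read_file", "arg": "pricing.ts"},
    {"type": "final", "text": "Proposed diff for the homepage."},
]
model = ScriptedModel(script)
answer = run_agent(model, "Update the homepage code to include new pricing")
print(model.exchanges)  # 4 exchanges for a single user prompt
```

Each extra search or file read in the script adds one more exchange, so the count grows with the number of tool calls rather than with the number of user prompts.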

Analysis combining benchmarks and real products

  • DeepSeek’s 150,000 exchanges: For a small-to-medium AI chat application, generating 150,000 requests in a day is very basic. If used to run a standard model benchmark (such as SnitchBench), 150,000 exchanges are only enough to fully run the test 2 to 3 times. All labs need to call competitors’ APIs frequently to calibrate their internal benchmarks.
  • Moonshot and MiniMax’s millions of exchanges: Take the well-known programming benchmark SWE-bench as an example; it contains about 2,300 tasks. If the model is given tool-calling capability during testing, and we conservatively estimate 50 tool-call exchanges per task, a single SWE-bench run requires 115,000 exchanges. Running the benchmark 30 times comes to about 3.45 million exchanges, roughly the 3.4 million attributed to Moonshot.
  • Consumption from legitimate product use: MiniMax once had user-facing Agent products (such as services integrating Gemini and other third-party models). If these products needed to perform deep research and multiple data retrievals, 13 million exchanges is a very easy number to reach in normal user-facing commercial applications.
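
The SWE-bench arithmetic above is easy to reproduce. The task count and the 50 exchanges per task are the article's own estimates, not measured values.

```python
SWE_BENCH_TASKS = 2_300    # approximate task count quoted above
EXCHANGES_PER_TASK = 50    # the article's conservative assumption

per_run = SWE_BENCH_TASKS * EXCHANGES_PER_TASK
print(per_run)  # 115000 exchanges for one full run

# How many full runs each reported figure would correspond to.
reported = {"DeepSeek": 150_000, "Moonshot": 3_400_000, "MiniMax": 13_000_000}
runs = {lab: round(n / per_run, 1) for lab, n in reported.items()}
for lab, r in runs.items():
    print(f"{lab}: about {r} full runs")
```

Under these assumptions, Moonshot's figure corresponds to roughly 30 full runs and MiniMax's to a little over 110.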

In addition, Anthropic mentioned that when it released a new model, MiniMax redirected nearly half of its traffic to the new model within 24 hours. This is entirely consistent with ordinary user behavior: when the UI shows a toggle for the newest flagship model, the vast majority of real users’ traffic will naturally and quickly shift to it.

3. The paradox of the security logic and open-source panic

Anthropic claims that models built via illegal distillation will strip away the original model’s safety guardrails, thereby creating national security risks (for example, being used to develop biological weapons).

This claim contains an obvious logical paradox: if Anthropic’s own safety mechanisms are truly effective, the model should refuse at the source to generate knowledge about biological weapons. If the base model already refuses malicious requests, how could an attacker, simply by sending prompts, “distill” dangerous capabilities that the model never outputs in the first place?

In addition, Anthropic’s report reveals a strong rejection of open-source and open-weight models, implying that open-source distilled models would cause risks to spiral out of control. It is worth noting that Anthropic is currently the only mainstream lab that has never released any open-weight model (OpenAI, Google, and many Chinese labs have released open models). Ironically, there is evidence that Anthropic itself has also used training methods invented by DeepSeek in a technical paper published in 2024.

4. The truth about proxy clusters (Hydra-clusters)

The only relatively credible objective observation in the report is this: there is indeed a large volume of high-frequency access to Claude models from China through commercial proxy services and a “Hydra-cluster” architecture.

The fundamental reason for this is Anthropic’s strict regional blocking and access restrictions on China. To bypass the restrictions, some third-party proxy vendors register massive numbers of accounts to split up requests. Some even offer cheap Claude proxy interfaces, aggregate the data that passes through them, and train their own small models to subsidize proxy costs. This behavior objectively exists, but attributing it directly to official, organized action by top AI labs like DeepSeek lacks conclusive evidence, and the tiny amount of data disclosed cannot support such a sweeping allegation.

Anthropic provided a “typical prompt” as evidence, said to have been used for distillation, which asks the model to play an “expert data analyst” and “provide insights based on real data and transparent reasoning.” From a technical perspective, this is simply a standard and legitimate system prompt for a research-oriented agent, and it is difficult to characterize it as a malicious distillation attack on this basis alone.

5. Conclusion: blurred boundaries and double standards

This entire incident exposes the deep contradictions the AI industry currently faces. The initial training data for massive models from companies like Anthropic and OpenAI was itself obtained by large-scale scraping of public content on the internet (even including copyrighted content). It is precisely the scraping behavior of these large companies that has led to the internet’s data becoming increasingly closed off today.

Yet when other companies attempt to use output data from these models, they are immediately labeled as engaging in “illegal extraction” and “attacks.” The boundaries in today’s Terms of Service (ToS) are extremely vague: for example, does scraping a public GitHub repository that contains Claude-generated code count as “distillation”? In this environment, one-sided blocking and accusations that lack supporting data look less like a safety measure and more like a PR maneuver driven by anxiety over commercial competition.