Ethical Hacking News
Anthropic has revealed industrial-scale distillation attacks against its Claude models carried out by three China-based AI companies, which together used 16 million queries to extract capabilities from Anthropic's LLM in violation of its terms of service, posing significant national security risks.
The three companies, DeepSeek, Moonshot AI, and MiniMax, targeted the most differentiated capabilities of the large language model (LLM): agentic reasoning, tool use, and coding. Their traffic was routed through commercial proxy services that resell access to Claude at scale. Because illicitly distilled models often lack the original's safeguards, Anthropic considers the attacks a national security risk, and it has responded by building classifiers and behavioral fingerprinting systems to identify suspicious distillation patterns.
Anthropic, a prominent artificial intelligence (AI) research and development company, has sounded the alarm on industrial-scale distillation campaigns carried out by three China-based AI companies – DeepSeek, Moonshot AI, and MiniMax. The three firms used a combined 16 million Claude queries to illicitly extract capabilities from Anthropic's large language model (LLM), violating its terms of service and circumventing regional access restrictions.
The illicit distillation targeted Claude's most differentiated capabilities: agentic reasoning, tool use, and coding. DeepSeek focused on refining its reasoning capabilities through 150,000 exchanges with Claude; Moonshot AI worked on agentic coding and tool use through more than 3.4 million exchanges; and MiniMax extracted agentic coding and tool-use capabilities via approximately 13 million exchanges.
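At its core, the technique in question is knowledge distillation: harvest a stronger model's outputs and train a weaker "student" to imitate them. The PyTorch sketch below is a toy illustration of the classic soft-label form on synthetic data; every model and constant in it is invented for the example, and a real attack on a hosted LLM would instead fine-tune the student on collected query/response text.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: "teacher" plays the role of the capable hosted model,
# "student" the model being trained to imitate it.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's output distribution

for step in range(500):
    x = torch.randn(32, 16)           # stand-in for crafted prompts
    with torch.no_grad():
        teacher_logits = teacher(x)   # stand-in for harvested responses
    student_logits = student(x)

    # Classic distillation objective: KL divergence between the
    # temperature-softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Against an API, the attacker sees only text, not logits, so the loss becomes ordinary next-token cross-entropy on the teacher's responses; the sheer scale of the exchanges reported above is what makes that substitution work.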
Anthropic identified these attacks as part of a larger problem: commercial proxy services that resell access to Claude at scale. These services employ "hydra cluster" architectures, massive networks of fraudulent accounts that distribute traffic across Anthropic's API to evade detection. Using this infrastructure, the attackers could issue vast numbers of carefully crafted prompts designed to elicit specific capabilities from the model, then use the high-quality responses as training data for their own AI models.
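One plausible way to detect such a fleet, offered here only as a hypothetical illustration rather than Anthropic's actual system, is to fingerprint the structure of each prompt and flag templates that recur across many nominally unrelated accounts:

from collections import defaultdict
import hashlib
import re

def template_fingerprint(prompt: str) -> str:
    # Mask the variable parts (quoted strings, numbers) so that
    # accounts reusing the same prompt template collide on one hash.
    skeleton = re.sub(r'"[^"]*"', '"<STR>"', prompt)
    skeleton = re.sub(r"\d+", "<NUM>", skeleton)
    return hashlib.sha256(skeleton.encode()).hexdigest()[:16]

def find_coordinated_accounts(requests, min_accounts=10):
    # requests: iterable of (account_id, prompt) pairs.
    # Returns templates shared by suspiciously many distinct accounts,
    # a crude signal of one operator splitting traffic over a fleet.
    accounts_by_template = defaultdict(set)
    for account_id, prompt in requests:
        accounts_by_template[template_fingerprint(prompt)].add(account_id)
    return {tpl: accts for tpl, accts in accounts_by_template.items()
            if len(accts) >= min_accounts}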
These activities pose significant national security risks because illicitly distilled models often lack the safeguards built into the original. Anthropic warns that capabilities extracted this way can proliferate with many protections stripped out entirely, and that foreign AI companies engaging in distillation can weaponize them for malicious ends, including cyber operations and the mass surveillance systems deployed by authoritarian governments.
To mitigate the threat, Anthropic has taken several steps, including building classifiers and behavioral fingerprinting systems that identify suspicious distillation patterns in API traffic. The company has also strengthened verification for educational accounts, security research programs, and startups, and implemented enhanced safeguards that reduce the usefulness of model outputs for illicit distillation.
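Behavioral fingerprinting of API traffic typically reduces each account to a handful of aggregate features and scores them. The sketch below uses invented feature names and hand-picked weights purely to show the shape of the idea; a production system would train a real classifier on labeled traffic, but the signals (sustained volume, low prompt diversity, long outputs, systematic capability targeting) are of the kind the article describes.

from dataclasses import dataclass

@dataclass
class AccountStats:
    requests_per_hour: float     # sustained volume
    distinct_templates: int      # structural variety of prompts
    mean_output_tokens: float    # distillation favors long, complete answers
    capability_hit_rate: float   # share of prompts probing coding/tool use

def distillation_risk_score(s: AccountStats) -> float:
    # Hand-tuned linear score over the features above; each term is
    # clamped to [0, 1] before weighting.
    score = 0.4 * min(s.requests_per_hour / 1000.0, 1.0)
    score += 0.2 * (1.0 - min(s.distinct_templates / 100.0, 1.0))
    score += 0.2 * min(s.mean_output_tokens / 2000.0, 1.0)
    score += 0.2 * s.capability_hit_rate
    return score  # e.g. route accounts scoring above 0.7 to human review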
Anthropic's alert comes weeks after Google Threat Intelligence Group (GTIG) disclosed that it had identified and disrupted distillation and model extraction attacks aimed at Gemini's reasoning capabilities, involving more than 100,000 prompts. The incident underscores growing concern over AI-targeted threats and the need for robust security measures around hosted AI services.
Anthropic emphasized that these risks fall primarily on model developers and service providers rather than posing an existential threat to average users. That caveat does not diminish the importance of the proactive measures organizations like Anthropic are taking to address the problem.
These illicit distillation campaigns make clear that the security landscape for AI services requires urgent attention and a multifaceted response. By recognizing the risks and cooperating on more effective countermeasures, providers and researchers can safeguard an increasingly AI-dependent technological infrastructure against sophisticated adversaries.
Related Information:
https://www.ethicalhackingnews.com/articles/Anthropics-Distillation-Alert-Chinese-AI-Firms-Indulged-in-Industrial-Scale-Claude-Model-Hacking-ehn.shtml
Published: Tue Feb 24 01:42:16 2026 by llama3.2 3B Q4_K_M