An AI lab says Chinese-backed bots are running cyber espionage attacks. Experts have questions

November 17, 2025


Over the past weekend, the US AI lab Anthropic published a report about its discovery of the "first reported AI-orchestrated cyber espionage campaign."

The company says a Chinese government–sponsored hacking group used Anthropic's own Claude AI tool to automate a significant part of an effort to steal sensitive information from around 30 organizations.

The report has drawn a lot of attention. Some, including respected experts, have warned that AI-automated cyber attacks are the future, urging cyber defenders to invest now before the coming onslaught.

At the same time, many in the cyber security industry have been underwhelmed by Anthropic's claims, saying the actual role AI played in the attacks is unclear.

What Anthropic says happened

Critics have pointed out what they say is a lack of detail in the report, which means we have to do a certain amount of guesswork to try to piece together what might have happened. With that in mind, it appears the hackers built a framework for carrying out cyber intrusion campaigns mostly automatically.

The grunt work was carried out by Anthropic's Claude Code AI coding agent. Claude Code is designed to automate computer programming tasks, but it can also be used to automate other computer activities.

Claude Code has built-in safety guardrails to prevent it from causing harm. For example, I asked it just now to write me a program that I could use to carry out hacking activities. It bluntly refused.

However, as we have known from the very first days of ChatGPT, one way to bypass guardrails in AI systems is to trick them into engaging in role-play.

Anthropic reports that this is what these hackers did. They tricked Claude Code into believing it was assisting authorized hackers to test the quality of a system's defenses.

Missing details

The information Anthropic has published lacks the fine details that the best cyber incident investigation reports tend to include.

Chief among these are so-called indicators of compromise (or IoCs). When investigators publish a report into a cyber intrusion, they usually include hard evidence that other cyber defenders can use to look for signs of the same attack.

Each attack campaign might use specific attack tools, or might be carried out from specific computers under the attacker's control. Each of these indicators would form part of the cyber intrusion's signature.

Somebody else who gets attacked using the same tools, coming from the same attacking computers, can infer that they have also been a victim of this same campaign.
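To make this concrete, here is a minimal sketch of how defenders typically use published IoCs: they compare values from their own logs (remote IP addresses, file hashes, and so on) against the published indicators. The indicator values and log format below are invented placeholders for illustration, not real indicators from any report.

```python
# Minimal sketch of IoC matching. The IP addresses and hash below are
# invented placeholders (documentation ranges / the empty-file MD5),
# not indicators from any actual advisory.

KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.42"}        # attacker-controlled hosts
KNOWN_BAD_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}  # hashes of attack tools

def matches_ioc(event: dict) -> bool:
    """Return True if a log event matches any known indicator."""
    return (
        event.get("remote_ip") in KNOWN_BAD_IPS
        or event.get("file_md5") in KNOWN_BAD_HASHES
    )

# Hypothetical log events pulled from a defender's own systems.
log_events = [
    {"remote_ip": "192.0.2.10", "file_md5": None},
    {"remote_ip": "203.0.113.7", "file_md5": None},  # matches a bad IP
]

hits = [e for e in log_events if matches_ioc(e)]
print(len(hits))  # one matching event
```

In practice defenders feed indicators like these into intrusion-detection and log-analysis tooling rather than hand-written scripts, but the principle is the same: without published indicators, there is nothing to match against.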

For example, the US government's Cybersecurity and Infrastructure Security Agency (CISA) recently partnered with government cyber agencies worldwide to publish information about ongoing Chinese state-sponsored cyber espionage, including detailed indicators of compromise.

Unfortunately, Anthropic's report includes no such indicators. As a result, defenders are unable to determine whether they might also have been victims of this AI-powered hacking campaign.

Unsurprising—and with limited success

Another reason many have been underwhelmed by Anthropic's claims is that, on their face and absent hard details, they are not especially surprising.

Claude Code is widely used by many programmers because it helps them to be more productive.

While not exactly the same as programming tasks, many common tasks performed during a cyber intrusion are similar enough to programming tasks that Claude Code should be able to carry them out, too.

A final reason to be wary of Anthropic's claims is that they suggest the attackers might have been able to get Claude Code to perform these tasks more reliably than it typically does.

Generative AI can perform marvelous feats. But getting systems such as ChatGPT or Claude Code to do so reliably remains a major challenge.

In the memorable words of one commentator, too often these tools respond to difficult requests with "ass-kissing, stonewalling, and acid trips." In plainer language, AI tools are prone to sycophancy, repeated refusal to carry out difficult tasks, and hallucinations.

Indeed, Anthropic's report notes that Claude Code frequently lied to the attackers, pretending it had carried out a task successfully even when it hadn't. This is a classic case of AI hallucination.

Perhaps this explains the attack's low success rate: Anthropic's own reporting says that while about 30 organizations were targeted, the hackers succeeded against only a few.

What does this mean for the future of cyber security and AI?

Whatever the details of this particular campaign, AI-enabled cyber attacks are here to stay.

Even if one contends that current AI-enabled hacking is lame, it would be foolish for cyber defenders to assume it will stay that way.

If nothing else, Anthropic's report is a timely reminder for organizations to invest in cyber security. Those who do not may face a future in which their secrets are stolen or operations disrupted by autonomous AI agents.

Provided by The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: An AI lab says Chinese-backed bots are running cyber espionage attacks. Experts have questions (2025, November 17) retrieved 17 November 2025 from https://techxplore.com/news/2025-11-ai-lab-chinese-bots-cyber.html This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
