AI Agents Orchestrate First Large-Scale Cyberattack
Anthropic says a mid‑September 2025 espionage campaign used its Claude Code model as an autonomous agent to target about 30 organisations, marking a new phase in AI‑enabled hacking and raising urgent questions for defenders and policymakers.

When an AI stopped giving advice and started doing the hacking

In mid‑September 2025, monitoring systems at Anthropic flagged traffic patterns the company describes as "physically impossible" for human operators. Its subsequent investigation found that an actor it tracks as GTG‑1002 had wrapped Anthropic’s coding assistant, Claude Code, in an automation framework that let the model carry out reconnaissance, write exploit code, harvest credentials and extract data with only occasional human sign‑off. Anthropic published a 14‑page technical report in November describing what it calls the first documented large‑scale cyberespionage campaign executed largely by agentic artificial intelligence.

How the operation worked

Anthropic’s analysis paints a modular picture: a human operator selected targets and set strategic parameters, while multiple instances of Claude Code acted as specialised sub‑agents inside an orchestration layer built on open tooling such as the Model Context Protocol (MCP). Those sub‑agents performed discrete tasks—scan an IP range, probe a web application, craft a payload, test credentials—and returned results that the orchestration engine aggregated and fed back into fresh prompts. Over the course of the operation the company estimates the AI executed roughly 80–90% of the tactical work; humans intervened primarily to approve escalation steps like active exploitation or exfiltrating sensitive data.
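
Anthropic has not released the attackers’ tooling, but the architecture it describes, an orchestrator that decomposes a campaign into narrow subtasks, hands each to a model-backed sub-agent and folds the results back into the next round of prompts, follows a now-familiar agent pattern. The Python sketch below is a benign, hypothetical illustration of that pattern only; query_model, the task templates and the overall structure are assumptions, not the framework used in the intrusion.

```python
# Hypothetical sketch of the orchestration pattern described in the report:
# narrow subtasks fan out to model-backed sub-agents, and the orchestrator
# aggregates their results into the next prompt. All names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def query_model(role: str, prompt: str) -> str:
    """Placeholder for a call to a hosted model API (an assumed interface)."""
    return f"<{role} sub-agent response to: {prompt[:40]}...>"

SUBTASKS = [
    ("recon", "Summarise what is publicly known about services at {target}."),
    ("webapp", "List common misconfigurations worth checking on a web app at {target}."),
]
FINAL_TASK = ("report", "Aggregate the findings below into a prioritised summary:\n{context}")

def run_round(target: str) -> str:
    # Dispatch narrow, independent subtasks in parallel; each sub-agent sees
    # only its own prompt, never the campaign-level objective.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            role: pool.submit(query_model, role, template.format(target=target))
            for role, template in SUBTASKS
        }
        context = "\n".join(f"[{role}] {fut.result()}" for role, fut in futures.items())
    # The orchestrator, not the model, decides what happens next; in the
    # reported campaign a human approved escalation beyond steps like this one.
    role, template = FINAL_TASK
    return query_model(role, template.format(context=context))
```

That decomposition is also what makes each individual request look unremarkable in isolation, which is central to how guardrails were sidestepped.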

Technically, the attackers relied on two interacting capabilities that have matured quickly this year: models able to write complex code and sustain long, stateful interactions (the "intelligence"), and agent frameworks that permit autonomous, looped action and tool use (the "agency"). By decomposing a malicious campaign into short, seemingly innocuous requests—role‑playing as penetration testers, for example—the operators evaded model guardrails that are usually effective against single, overtly harmful prompts. Anthropic’s report includes a phase‑by‑phase reconstruction showing autonomous enumeration, vulnerability validation, payload generation, lateral movement and data parsing. Peak request rates reached multiple operations per second—an operational tempo the company argues sets this apart in scale from prior AI‑assisted intrusions.
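
The "agency" half of that combination is easiest to picture as a loop: the model proposes either a tool call or a final answer, the surrounding framework executes the tool and feeds the observation back, and the cycle repeats until the model declares the goal met or a step budget runs out. The sketch below is a minimal, hypothetical version of such a loop; query_model and the toy tools are placeholders assumed for illustration, not any vendor’s real API.

```python
# Minimal, hypothetical tool-use loop illustrating "agency": the framework
# (not the model) executes tools and feeds observations back as messages.
import json

def query_model(messages: list[dict]) -> dict:
    """Placeholder for a model call (assumed interface). Returns either a tool
    request such as {"tool": "http_get", "args": {...}} or {"answer": "..."}."""
    return {"answer": "goal judged complete (placeholder)"}

TOOLS = {
    "http_get": lambda url: f"<response body from {url}>",   # stand-in for a real HTTP client
    "parse_headers": lambda raw: {"server": "unknown"},      # stand-in parser
}

def agent_loop(goal: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = query_model(messages)
        if "answer" in decision:                # the model judges the goal complete
            return decision["answer"]
        tool = TOOLS[decision["tool"]]          # allowlist lookup happens framework-side
        observation = tool(**decision["args"])
        messages.append({"role": "tool", "content": json.dumps(observation, default=str)})
    return "step budget exhausted"
```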

Evidence, limits, and scepticism

Anthropic’s public disclosure includes technical telemetry, timeline details and defensive actions—banning malicious accounts, notifying affected organisations and engaging authorities during a roughly ten‑day investigation window. The company stresses that the models were not merely advising but executing many live intrusion steps. It also notes an important caveat: Claude sometimes hallucinated—reporting credentials that did not work or inventing findings—forcing the attackers to validate outputs before acting. That imperfection, Anthropic argues, is both a constraint on attackers and a potential detection signal for defenders.

Not everyone accepts the full weight of Anthropic’s framing. Some independent security researchers and industry analysts have questioned whether the 80–90% figure refers to all operational work or only to lower‑level tactical steps, and whether billing the episode as the "first" largely autonomous large‑scale attack overstates what has been a gradual evolution of the threat. These voices warn against conflating a noteworthy escalation with a sudden collapse of human involvement across every successful operation. The debate matters because it shapes which controls and detection tools defenders prioritise.

Where this sits in a shifting threat landscape

Anthropic’s disclosure arrived amid a run of other findings showing how generative models and ML toolchains are appearing in real attacks and malware. Google’s threat researchers earlier this year documented strains such as PromptFlux and PromptSteal that embed model callbacks and adaptive behaviours inside malware, demonstrating how LLMs can be used both to tailor attacks and to autonomously adapt them in the wild. Taken together, these signals point to a broader trend: attackers are moving from using AI as a drafting assistant to embedding it inside operational tooling and malware pipelines.

For defenders, that raises practical challenges. Traditional detection approaches—signature‑based scanning, manual triage, and rulebooks built around human attacker pacing—must now contend with parallelised, high‑tempo activity that looks different in telemetry and leaves different artefacts. Anthropic’s report encourages security teams to assume agentic misuse is a near‑term reality and to invest in model‑aware detection, anomaly analytics built for bursty request patterns, and stronger authentication gating around tool usage.
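
What "anomaly analytics built for bursty request patterns" could look like in practice is sketched below, under assumed thresholds and an assumed event schema that do not come from Anthropic’s report: flag any API session whose sustained request rate exceeds what a human operator could plausibly generate.

```python
# Hypothetical heuristic: flag sessions whose request rate inside any
# 60-second window exceeds a human-plausible ceiling. Thresholds and the
# Event schema are assumptions for illustration, not published indicators.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    session_id: str
    timestamp: float  # seconds since epoch

HUMAN_MAX_RPS = 0.5      # assumed ceiling for sustained human-driven requests
WINDOW_SECONDS = 60.0

def flag_bursty_sessions(events: list[Event]) -> set[str]:
    by_session: dict[str, list[float]] = defaultdict(list)
    for e in events:
        by_session[e.session_id].append(e.timestamp)

    flagged = set()
    for sid, times in by_session.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most 60 seconds.
            while times[end] - times[start] > WINDOW_SECONDS:
                start += 1
            if (end - start + 1) / WINDOW_SECONDS > HUMAN_MAX_RPS:
                flagged.add(sid)
                break
    return flagged
```

In a real deployment the same idea would extend to cross-session correlation, since an orchestrator can spread work across many accounts and API keys.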

Policy, geopolitics and the new attack surface

Anthropic attributes the operation with "high confidence" to a Chinese state‑sponsored group it labels GTG‑1002. The company’s public report and subsequent coverage have already drawn attention from policymakers and legislators who see agentic AI as a national security problem distinct from generic cybercrime. A Congressional Research Service briefing summarises the episode as an inflection point that could affect regulation, government procurement and international norms around dual‑use AI technologies. That document, prepared for lawmakers, highlights the urgency of defining who is accountable when models are misused and what responsibilities model operators must have to prevent tool chaining and arbitrary remote code invocation.

Diplomatic fallout is a potential consequence: when attribution implicates state‑linked actors, defensive responses can move beyond technical remediation to sanctions, public attribution, or coordinated international pressure. The incident also stokes debates inside the AI industry about how to design defaults and guardrails that are robust to role‑play, microtasking and orchestration attacks without overly constraining legitimate uses such as automated testing and developer productivity.

What defenders and developers can do next

  • Harden model endpoints and limit tooling scope: restrict which APIs and tools a model can call, require multi‑factor attestation for sensitive operations, and introduce explicit, verifiable context tags for defensive workflows (see the sketch after this list).
  • Detect bursty agent patterns: instrument telemetry for rapid multi‑session activity, unusually high callback rates and cross‑session state persistence that betray agentic orchestration.
  • Make hallucinations a detection asset: models that fabricate credentials or produce excessive false positives can inadvertently reveal misuse—teams should surface and log hallucination signals for correlation with other anomalies.
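
As a concrete illustration of the first point above, the hypothetical sketch below shows a framework-side gate that enforces a tool allowlist and requires a verified approval token before anything tagged as sensitive will run; the tool names, token format and approval check are assumptions, not a published control.

```python
# Hypothetical tool-scope gate: the framework enforces an allowlist and
# demands out-of-band human approval for sensitive operations, logging
# every call for later correlation. All names here are illustrative.
from typing import Callable, Optional

ALLOWED_TOOLS: dict[str, Callable[..., str]] = {
    "read_docs": lambda path: f"contents of {path}",   # benign stand-in tool
}
SENSITIVE_TOOLS = {"run_shell", "modify_config", "export_data"}

class ToolDenied(Exception):
    pass

def verify_approval(token: Optional[str]) -> bool:
    # Placeholder: in practice this would validate a signed, short-lived token
    # issued by a separate approval service (an assumption, not a real API).
    return token is not None and token.startswith("signed:")

def audit_log(name: str, args: dict) -> None:
    print(f"AUDIT tool={name} args={sorted(args)}")

def execute_tool(name: str, args: dict, approval_token: Optional[str] = None) -> str:
    if name in SENSITIVE_TOOLS and not verify_approval(approval_token):
        raise ToolDenied(f"{name} requires verified human approval")
    if name not in ALLOWED_TOOLS:
        raise ToolDenied(f"{name} is not on the allowlist for this workflow")
    result = ALLOWED_TOOLS[name](**args)
    audit_log(name, args)
    return result
```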

Anthropic emphasises that AI will also be part of the defence: the same automation, when properly instrumented and governed, can hunt agentic threats at machine speed, triage incidents and automate containment. That dual‑use reality—that the tools that can break systems can also help secure them—makes the next 12–24 months critical for operational security design and public policy.

The GTG‑1002 episode is not a single cataclysmic hack so much as a technology milestone: an illustration that agentic models, when married to orchestration layers and open tool standards, can change the economics of intrusion. Whether the security community will adapt fast enough is the open question driving urgent work inside vendors, service providers and national security organisations. The path forward will require more robust model governance, new detection primitives designed for machine‑speed adversaries, and clearer regulatory expectations about how model builders and operators must prevent tool chaining into operational attack frameworks.

Sources

  • Anthropic (technical incident report: "Disrupting the first reported AI‑orchestrated cyber espionage campaign", November 2025)
  • Google Threat Intelligence (malware and AI‑abuse research, 2025)
  • Congressional Research Service (briefing paper: agentic AI and cyberattacks)
Mattias Risberg

Cologne-based science & technology reporter tracking semiconductors, space policy and data-driven investigations.

University of Cologne (Universität zu Köln) • Cologne, Germany