Anthropic says AI agents require AI defense

12/04/2025


Anthropic could have scored an easy $4.6 million by using its Claude AI models to find and exploit vulnerabilities in blockchain smart contracts.

The AI upstart didn’t use the attack it found, which would have been an illegal act that would also undermine the company’s we-try-harder image. Anthropic can probably also do without $4.6 million, a sum that would vanish as a rounding error amid the billions it’s spending.

But it could have, as the company's security researchers describe. And that's intended as a warning to anyone who remains blasé about the security implications of increasingly capable AI models.

Anthropic this week introduced SCONE-bench, a Smart CONtracts Exploitation benchmark for evaluating how effectively AI agents – models armed with tools – can find and finesse flaws in smart contracts, which consist of code running on a blockchain to automate transactions.

It did so, company researchers say, because AI agents keep getting better at exploiting security flaws – at least as measured by benchmark testing. “Over the last year, exploit revenue from stolen simulated funds roughly doubled every 1.3 months,” Anthropic’s AI eggheads assert.
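For a sense of what a 1.3-month doubling time implies, the back-of-the-envelope Python below simply compounds that rate over a few horizons. It is illustrative arithmetic only, not Anthropic's methodology or data.

```python
# Illustrative arithmetic only: compounds the 1.3-month doubling time quoted
# above over a few horizons. Not Anthropic's model or data.
DOUBLING_TIME_MONTHS = 1.3

def growth_factor(months: float) -> float:
    """Multiplicative growth after `months`, assuming steady doubling."""
    return 2 ** (months / DOUBLING_TIME_MONTHS)

for horizon in (1.3, 6, 12):
    print(f"{horizon:>4} months -> roughly x{growth_factor(horizon):,.0f}")
# 1.3 months -> roughly x2, 6 months -> roughly x25, 12 months -> roughly x601
```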

They argue that SCONE-bench is needed because existing cybersecurity tests fail to assess the financial risks posed by AI agents.

The SCONE-bench dataset consists of 405 smart contracts on three Ethereum-compatible blockchains (Ethereum, Binance Smart Chain, and Base). It’s derived from the DefiHackLabs repository of smart contracts successfully exploited between 2020 and 2025.
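Anthropic's write-up doesn't spell out the dataset schema, so the sketch below is a hypothetical Python rendering of what a single SCONE-bench-style case could contain; every field name and value is an assumption made for illustration.

```python
# Hypothetical sketch of one SCONE-bench-style test case. The benchmark's
# actual schema isn't described in the article, so every field name and value
# here is an assumption for illustration only.
from dataclasses import dataclass

@dataclass
class ExploitBenchmarkCase:
    chain: str                # "ethereum", "bsc", or "base"
    contract_address: str     # address of the target contract
    fork_block: int           # block at which the simulated chain state is forked
    incident_date: str        # date the contract was exploited in the real world
    realized_loss_usd: float  # value drained in the historical incident

example = ExploitBenchmarkCase(
    chain="ethereum",
    contract_address="0x0000000000000000000000000000000000000000",  # placeholder
    fork_block=19_000_000,
    incident_date="2025-04-01",
    realized_loss_usd=250_000.0,
)
print(example.chain, example.incident_date)
```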

Anthropic’s researchers found that for contracts exploited after March 1, 2025 – the training data cut-off date for Opus 4.5 – Claude Opus 4.5, Claude Sonnet 4.5, and OpenAI’s GPT-5 emitted exploit code worth $4.6 million.

The chart below illustrates how 10 frontier models did on the full set of 405 smart contracts.

[Chart: Anthropic's graph of revenue from exploiting vulnerabilities in the benchmark test]

And when the researchers tested Sonnet 4.5 and GPT-5 in a simulation against 2,849 recently deployed contracts with no publicly disclosed vulnerabilities, the two AI agents identified two zero-day flaws and created exploits worth $3,694.

Focusing on GPT-5 “because of its cheaper API costs,” the researchers noted that having GPT-5 test all 2,849 candidate contracts cost a total of $3,476.

The average cost per agent run, they said, came to $1.22; the average cost per vulnerable contract identified was $1,738; the average revenue per exploit was $1,847; and the average net profit was $109.
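Those per-run economics can be reproduced from the figures quoted above. The minimal Python below redoes the arithmetic, assuming exactly two exploitable contracts were found.

```python
# Reconstructs the per-run economics from the figures quoted in the article;
# assumes exactly two exploitable contracts were identified.
TOTAL_API_COST_USD = 3_476
TOTAL_EXPLOIT_REVENUE_USD = 3_694
CONTRACTS_SCANNED = 2_849
VULNERABLE_CONTRACTS_FOUND = 2

cost_per_run = TOTAL_API_COST_USD / CONTRACTS_SCANNED                          # ~$1.22
cost_per_vuln = TOTAL_API_COST_USD / VULNERABLE_CONTRACTS_FOUND                # $1,738
revenue_per_exploit = TOTAL_EXPLOIT_REVENUE_USD / VULNERABLE_CONTRACTS_FOUND   # $1,847
net_profit_per_exploit = revenue_per_exploit - cost_per_vuln                   # $109

print(f"cost/run ${cost_per_run:.2f}, cost/vuln ${cost_per_vuln:,.0f}, "
      f"revenue/exploit ${revenue_per_exploit:,.0f}, net ${net_profit_per_exploit:,.0f}")
```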

“This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense,” the Anthropic bods said in a blog post.

One might also argue that it underscores the dodginess of smart contracts.

Other researchers have developed similar systems to steal cryptocurrency. As we reported in July, computer scientists at University College London and the University of Sydney created an automated exploitation framework called A1 that’s said to have stolen $9.33 million in simulated funds.

At the time, the academics involved said that the cost of identifying a vulnerable smart contract came to about $3,000. By Anthropic’s measure, the cost has fallen to $1,738, underscoring warnings about how the declining cost of finding and exploiting security issues will make these sorts of attacks more financially appealing.

Anthropic’s AI bods conclude by arguing that AI can defend against the risks created by AI. ®
