News

Can AI like Claude 4 be trusted to make ethical decisions? Discover the risks, surprises, and challenges of autonomous AI ...
The internet freaked out after Anthropic revealed that Claude attempts to report “immoral” activity to authorities under ...
Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 ...
This includes locking users out of systems it can access or bulk-emailing media and law enforcement to report wrongdoing. This isn’t a new behavior, but Claude Opus 4 is more prone to it than ...
Anthropic’s Claude Opus 4 exhibited simulated blackmail in stress tests, prompting safety scrutiny despite also showing a ...
The testing found the AI was capable of “extreme actions” if it thought its “self-preservation” was threatened.
The choice Claude 4 made was part of the test ... according to Apollo Research's notes in Anthropic's safety report. Anthropic says the behavior was mitigated with a fix and the model's behavior is now ...