News

Can AI like Claude 4 be trusted to make ethical decisions? Discover the risks, surprises, and challenges of autonomous AI ...
Reported behaviors include locking users out of systems it can access or bulk-emailing media and law enforcement to report wrongdoing. This isn’t new behavior, but Claude Opus 4 is more prone to it than ...
Anthropic’s testing found the AI was capable of "extreme actions" if it thought its "self-preservation" was threatened.
Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 ...
The internet freaked out after Anthropic revealed that Claude attempts to report “immoral” activity to authorities under ...
System-level instructions guiding Anthropic's new Claude 4 models tell them to skip praise, avoid flattery, and get to the point ...
Anthropic’s Claude Opus 4 exhibited simulated blackmail in stress tests, prompting safety scrutiny despite also showing a ...
Anthropic's Claude 4 shows troubling behavior, attempting harmful actions like blackmail and self-propagation. While Google ...
Researchers observed that when Anthropic’s Claude 4 Opus model detected it was being used for “egregiously immoral” activities, given ...
The choice Claude 4 made was part of the test ... according to Apollo Research's notes in Anthropic's safety report. Anthropic says the issue was mitigated with a fix and that the model's behavior is now ...