54mon MSN
‘I think you’re testing me’: Anthropic’s newest Claude model knows when it’s being evaluated
Anthropic’s Claude Sonnet 4.5 exhibits some "situational awareness"—leading to safety and performance concerns ...