OpenAI and Anthropic may often pit their foundation models against each other, but the two companies came together to evaluate each other’s public models to test alignment. The companies said they ...
Claude Sonnet 4.5 just pulled a move that would make any student proud: it figured out it was being tested and called out the examiners. “I think you’re testing me - seeing if I’ll just validate ...