Value Alignment Evaluation

OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations

OpenAI and Anthropic may often pit their foundation models against each other, but the two companies came together to evaluate each other’s public models to test alignment. The companies said they ...

Hosted on MSN

Claude Lies During Safety Tests – What Else Is It Lying About?

Claude Sonnet 4.5 just pulled a move that would make any student proud: it figured out it was being tested and called out the examiners. “I think you’re testing me - seeing if I’ll just validate ...
