frontier proprietary and open-weight models yielded high attack success rates when prompted in verse, indicating a deeper, underlying problem in their ability to process ambiguity veiled in poetry.