Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...
Understanding, predicting, and changing behavior is the holy grail of managers, politicians, and academics alike. I sat down with eminent social psychologist Prof Saadi Lahlou of the London School of ...