A new OpenAI benchmark, GDPval, tests AI models on things people actually do in their jobs — and finds that Claude is about ...