News

Record-Breaking Performance on Industry Benchmarks. Claude Opus 4 scored 72.5% on SWE-bench, a rigorous benchmark used to evaluate AI coding abilities.
Discover how Claude 4 and Grok 4 compare in AI app development. Which model excels in efficiency, reliability, and ...
Anthropic's Claude Opus 4 outperforms OpenAI's GPT-4.1 with unprecedented seven-hour autonomous coding sessions and a record-breaking 72.5% SWE-bench score, transforming AI from a quick-response tool ...
The release features Claude Opus 4 and Claude Sonnet 4, both of which raise the bar for AI reasoning, coding capabilities, and sustained agentic performance. Claude Opus 4, now billed as the world’s ...
Free users of Claude can only access the new Sonnet 4 model. However, Pro, Max, Team, and Enterprise Claude plan users can access both models and extended thinking.
Claude 4 Opus: Built for Complex, Long-Term Workflows. Claude 4 Opus is specifically designed to handle high-performance, long-duration tasks. It excels in advanced reasoning, memory retention ...
According to the SWE-Bench Verified benchmark, Devstral outperforms GPT-4.1-mini and Claude 3.5 Haiku. Its small size allows it to run on a single RTX 4090 or a Mac with 32GB RAM, enabling it to ...
According to Anthropic, Claude Sonnet 4 (its mid-tier model, between Haiku and Opus) significantly improves at coding, reasoning, and instruction following compared to its predecessor, Claude ...
That’s true of both Claude Sonnet 4 and Claude Opus 4. The differences range from small touches, like the little “thinking” icon, to more important ones, like the kind of text each model generates.
Claude Opus 4, now billed as the world’s best coding model, delivers a record-setting 72.5% on SWE-bench and 43.2% on Terminal-bench, outperforming all competitors on long-running and complex ...