Opus 4.6, Codex 5.3, and the post-benchmark era

On Thursday, February 5th, OpenAI and Anthropic simultaneously released the latest versions of their coding-focused models: GPT-5.3-Codex and Claude Opus 4.6. Before these releases, Anthropic had captured most of the mindshare as the industry wrestled with the emerging era of AI agents, largely on the strength of the leap in capabilities delivered by Claude Code powered by Opus 4.5.

This post doesn’t delve into how software is changing forever, how Moltbook is demonstrating the future, how ML research is speeding up, or the many wider implications. Instead, it focuses on how to evaluate, live with, and prepare for new models. The narrow gap between Opus 4.6 and Codex 5.3 is the kind of margin that will show up across many model releases this year, with Opus currently ahead in this matchup on usability.

Going into these releases, I’d been using Claude Code heavily as a general-purpose computer agent for a mix of software engineering, data analysis, automation, and similar tasks. I’d experimented with Codex 5.2 (typically on xhigh, with maximum thinking effort), but it didn’t fully suit my wide range of horizontal tasks. Over the past few days, I’ve been using the two models far more evenly.

To pay it a high compliment: Codex 5.3 now feels much more like Claude. It responds far more quickly and handles a wide range of tasks, from Git operations to data analysis, with much greater competence. (Earlier versions, up through 5.2, frequently stumbled on basic Git operations like creating a new branch.) Codex 5.3 marks a significant advance into Claude’s territory, with stronger product-market fit.

Interconnects AI
