Having written a lot of model release blog posts, I find reviewing open models when they drop much harder than reviewing closed models, especially in 2026. Until recently, open models were quite scarce, so when Llama 3 came out, most people were still actively researching Llama 2 and were thrilled to have a new version. When Qwen 3 dropped, the Llama 4 disaster had just unfolded and an entire research community was forming around RL on Qwen 2.5, so switching was an obvious choice. Today, any new open model has to compete with Qwen 3.5, Kimi K2.5, GLM 53, MiniMax M2.5, GPT-OSS, Arcee Large, Nemotron 3, OLMo 3, and many others.

The space is populated, but still feels full of hidden opportunity. The promise of open models is like dark matter: we sense its immense scale, yet few concrete guides or success stories exist for how to truly harness it. Agentic AI, OpenClaw, and everything brewing in that space are going to spur mass experimentation in open models to complement the likes of Claude and Codex, not replace them.

Especially with open models, the benchmarks at release tell an extremely incomplete story. In some respects, this is exciting: new open models exhibit far greater variance and a remarkable capacity to surprise. Yet it also highlights underlying structural factors that make it more difficult to build viable businesses and compelling AI experiences around open models compared to their closed-source counterparts. Spending a few hours putting a new Claude Opus or GPT through its paces in my agentic workflows is a solid vibe check.
Interconnects AI