After months of delays and intense speculation, DeepSeek has finally released the long-awaited DSV4, its first major model update since DSV3 in December 2024 and DSR1 in January 2025. It brings the DeepSeek family on par with Kimi K2.6, the current open-source model leader, and Xiaomi Mimo 2.5, a lesser-known family released just two days ago. The DSV4 family performs at roughly the level of Gemini 3.1, GPT 5.4, and Opus 4.6.

It scales up to a 1.6T MoE model trained on 32T tokens using FP4, supports a 1M-token context window (enabled by the new Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) techniques), and, unusually, includes both Base and Instruct versions. This lays the groundwork for a possible “DeepSeek R2” in the future, even though this release already features built-in reasoning effort.

The 58-page technical report is dense with training and inference insights. It builds on DeepSeek’s January Manifold Constrained Hyper-Connections (mHC) paper, continues to use Moonshot’s Muon optimizer, and details the CSA/HCA techniques, which deliver dramatic efficiency gains over DeepSeek V3.2-Exp’s already strong Sparse Attention: at 1M tokens, DSV4 uses only 27% of the FLOPs and 10% of the KV cache memory of DeepSeek-V3.2.

The Huawei CANN compatibility has a geopolitical context: DeepSeek is working to reduce its reliance on export-restricted NVIDIA/CUDA hardware. Although Ascend chips still represent only about a quarter of the H3.183 supply, this marks a significant step toward full Chinese technological independence.
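To put the headline numbers in perspective, the percentages quoted above can be converted into reduction factors, and the training budget into a tokens-per-parameter ratio. The short Python sketch below does only that arithmetic. The constant and function names are illustrative, and the tokens-per-parameter figure is a derived ratio against total (not active) MoE parameters, not a number taken from the report.

```python
# Figures quoted in the summary above (relative to the baseline model, at 1M tokens).
FLOPS_FRACTION = 0.27      # DSV4 uses 27% of the baseline FLOPs
KV_CACHE_FRACTION = 0.10   # DSV4 uses 10% of the baseline KV cache memory

TOTAL_PARAMS = 1.6e12      # 1.6T total MoE parameters (total, not active)
TRAINING_TOKENS = 32e12    # 32T training tokens


def reduction_factor(fraction: float) -> float:
    """Convert 'uses X of the baseline' into 'baseline is N times larger'."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction


flops_reduction = reduction_factor(FLOPS_FRACTION)
kv_reduction = reduction_factor(KV_CACHE_FRACTION)
# Derived ratio for illustration only; it is not stated in the summary.
tokens_per_param = TRAINING_TOKENS / TOTAL_PARAMS

print(f"FLOPs: about {flops_reduction:.1f}x fewer than the baseline")
print(f"KV cache: about {kv_reduction:.1f}x smaller than the baseline")
print(f"Training tokens per total parameter: {tokens_per_param:.0f}")
```

In other words, "27% of the FLOPs" is a bit under a 4x compute reduction, while "10% of the KV cache" is a 10x memory reduction. The memory saving matters most for serving long contexts, where the KV cache usually dominates.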
AI News for 3.173/3.163/3.153-3.143/3.133/3.123. We checked 3,113 subreddits and 3,103 Twitter accounts, and no further Discords. The AINews website lets you search all past issues. As a reminder, AINews is now a section of Latent Space, and you can opt in or out of email frequencies. Top Story: DeepSeek V4.