[AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B), Base and Instruct — runnable on Huawei Ascend chips

After months of delays and intense speculation, DeepSeek has finally unveiled the long-awaited DeepSeek V4, its first major model release since DSV3 in December 2024 and DSR1 in January 2025. The release brings the DeepSeek family on par with the current open-source leader, Kimi K2.6, and with the lesser-known Xiaomi Mimo 2.5, which launched just two days ago. The DSV4 family performs at roughly Gemini 3.1 / GPT-5.4 / Opus 4.6 level. It scales up to a 1.6T-parameter MoE architecture trained on 32T tokens in FP4, supports a native 1M-token context window (enabled by the new Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) techniques), and, unusually, includes both Base and Instruct variants. This paves the way for a future "DeepSeek R2," even though the current model already features built-in reasoning effort.

The 58-page technical report is characteristically dense. It details training and inference improvements that build on DeepSeek's January Manifold Constrained Hyper-Connections (mHC) paper, the continued use of Moonshot's Muon optimizer, and the dramatic efficiency gains of CSA/HCA over DeepSeek-V3.2-Exp's already strong sparse attention: at 1M tokens, V4 requires only 27% of the FLOPs and 10% of the KV cache memory of DeepSeek-V3.2.

The geopolitical context for the Huawei CANN compatibility is DeepSeek's push to reduce reliance on export-restricted NVIDIA/CUDA hardware. Although Ascend chips still represent only about a quarter of the H3.183 supply, this marks a significant step toward full Chinese technological independence.

AI News for 3.173/3.163/3.153-3.143/3.133/3.123. We examined 3,113 subreddits and 3,103 Twitter accounts, but found no additional Discords. The AINews website lets you search all previous editions. As a reminder, AINews is now part of Latent Space. You can opt in or out of email frequencies!

Top Story: DeepSeek V4
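As a rough back-of-the-envelope check, the figures quoted above can be turned into ratios. The sketch below assumes the headline names follow the common `<total>-A<active>` convention (so 1.6T-A49B would mean 1.6T total parameters with 49B active per token); that reading is an interpretation, not something confirmed by the report, and the names `MODELS`, `active_fraction`, and `reduction_factor` are purely illustrative.

```python
# Parameter counts read off the headline, assuming "<total>-A<active>" naming.
MODELS = {
    "DeepSeek V4 Pro": {"total_params": 1.6e12, "active_params": 49e9},
    "DeepSeek V4 Flash": {"total_params": 284e9, "active_params": 13e9},
}

# Relative costs at 1M tokens versus the previous generation, as quoted above.
RELATIVE_COST_AT_1M = {"flops": 0.27, "kv_cache": 0.10}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of an MoE model's parameters that are active for each token."""
    return active_params / total_params


def reduction_factor(relative_cost: float) -> float:
    """Convert 'needs only X of the baseline' into an 'N times less' factor."""
    return 1.0 / relative_cost


if __name__ == "__main__":
    for name, params in MODELS.items():
        share = 100 * active_fraction(**params)
        print(f"{name}: {share:.2f}% of parameters active per token")
    for resource, cost in RELATIVE_COST_AT_1M.items():
        print(f"{resource}: about {reduction_factor(cost):.1f}x less than baseline")
```

Under that naming assumption, both models activate only a few percent of their weights per token, which is the usual argument for why a 1.6T-parameter MoE can still be served at reasonable cost.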
