MiniMax Sparse Attention: Efficient Long-Context Processing for Billion-Parameter Models
12 June 2026 | AI Models, Claude Code
MSA reduces attention computation for million-token contexts by a factor of 28.4 through blockwise sparse selection, and achieves practical speedups by co-designing the algorithm and the GPU kernel.

Hybrid LLMs Lose Long-Context Capabilities Through CoT Fine-Tuning
10 June 2026 | AI Models, Claude Code
CoT fine-tuning degrades long-context retrieval in hybrid LLMs by distorting query-key projections; QK-Restore fixes this without additional training.