Kimi K3: Chinese Language Model with 2.8 Trillion Parameters Released

28. July 2026
AI Models

Kimi K3 is an open frontier model with native vision, a million-token context, and 2.5× better scaling efficiency than K2; all weights have been released.

LongStraw: Reinforcement Learning on Millions of Tokens within Fixed GPU Budget

17. July 2026
AI Models

LongStraw enables RL training on 2.1 million tokens using Group Relative Policy Optimization (GRPO) on eight H20 GPUs by optimizing memory accesses and compressing computational graphs through response-branch replay.

Self-Guided Test-Time Training Improves Long-Context Processing in LLMs

13. July 2026
AI Models

Self-Guided TTT improves long-context processing by having the model itself identify relevant text passages before parameter adaptation, rather than selecting spans randomly.

Sparse Delta Memory: Linear RNNs with Sparse Memory Scale State Size Substantially

9. July 2026
AI Models

Sparse Delta Memory significantly increases the state capacity of linear RNNs without raising computational costs, thereby improving long-context and reasoning performance.

FlashMorph: Automatic Selection of Attention Layers in Hybrid Models

3. July 2026
AI Models

FlashMorph converts Transformers into hybrid attention models by optimally determining which layers require full attention and which can be replaced with linear attention.

InfoKV: Entropy-Based KV-Cache Compression for Long Reasoning Sequences

26. June 2026 / 4. July 2026
AI Models

InfoKV combines attention scores with uncertainty signals for KV-cache compression, outperforming purely attention-based methods on long reasoning tasks by a measurable margin.

EvoEmbedding: Context-Dependent Embeddings for Long Sequences

23. June 2026 / 4. July 2026
AI Models

EvoEmbedding maintains a latent memory that is updated during sequential processing, so the same query yields adaptive, context-dependent embeddings.

MiniMax Sparse Attention: Efficient Long-Context Processing for Billion-Parameter Models

12. June 2026 / 4. July 2026
AI Models

MSA reduces attention computation for million-token contexts by a factor of 28.4 through blockwise sparse selection and achieves practical speedups via co-design of algorithm and GPU kernel.

Hybrid LLMs Lose Long-Context Capabilities Through CoT Fine-Tuning

10. June 2026 / 4. July 2026
AI Models

CoT fine-tuning degrades long-context retrieval in hybrid LLMs by distorting query-key projections; QK-Restore fixes this without additional training.

Lookahead Sparse Attention: DeepSeek-V4 Reduces KV-Cache to 13.5 Percent

10. June 2026 / 4. July 2026
AI Models

LSA predicts relevant context sections in advance and retains only these in GPU memory, compressing the KV-cache by over 86 percent without sacrificing accuracy.

Latent Context Language Models: Scalable KV-Cache Compression for Long Contexts

10. June 2026 / 4. July 2026
AI Models

LCLMs use an encoder-decoder architecture to compress KV-caches at ratios of up to 1:16, more efficiently than previous methods, while reducing peak memory consumption and processing time.

Encoder-Decoder Architecture for Efficient Context Compression in LLMs

10. June 2026 / 4. July 2026
AI Models

Encoder-decoder compressors with adaptive expansion improve KV-cache compression methods in speed and memory efficiency without significant quality loss.

Lumi AI News