Frozen 12B Model Achieves 100% Accuracy on Verified Tasks With Zero Token Consumption

28. July 2026
AI Models

A frozen 12B model combined with a verified solution store achieves 100% accuracy on verified problem families with zero token consumption and deterministic, bit-exact results.

Smartsheet Connects AI Agents to AWS Infrastructure via MCP Server

17. July 2026
AI Models

Smartsheet operates an MCP server on AWS that provides AI agents with structured access to platform data and has saved 3 billion tokens to date through token optimizations.

InfoKV: Entropy-Based KV-Cache Compression for Long Reasoning Sequences

26. June 2026 / 4. July 2026
AI Models

InfoKV combines attention scores with uncertainty signals for KV-cache compression, outperforming pure attention-based methods on long reasoning tasks by measurable margins.

Language Compression in LLMs: Output Optimization Saves Costs, Input Reduction Increases Them

26. June 2026 / 4. July 2026
AI Models

Output compression effectively reduces inference costs, while input compression increases overall costs and degrades response quality.

Bebop: Rejection Sampling Improves Multi-Token Prediction in RL Training

11. June 2026 / 4. July 2026
AI Models

Bebop uses rejection sampling and TV loss optimization to maintain stable MTP acceptance rates during RL training and accelerates rollouts by up to 1.8x.

Hybrid LLMs Lose Long-Context Capabilities Through CoT Fine-Tuning

10. June 2026 / 4. July 2026
AI Models

CoT fine-tuning degrades long-context retrieval in hybrid LLMs by distorting query-key projections; QK-Restore fixes this without additional training.

Lookahead Sparse Attention: DeepSeek-V4 Reduces KV-Cache to 13.5 Percent

10. June 2026 / 4. July 2026
AI Models

LSA predicts relevant context sections in advance and retains only these in GPU memory, compressing the KV-cache by over 86 percent without sacrificing accuracy.

KVarN: Variance-Based KV-Cache Quantization Reduces Error Accumulation

3. June 2026 / 4. July 2026
AI Models

KVarN reduces error accumulation when quantizing KV-caches to 2-bit precision through improved token-scale normalization and achieves state-of-the-art results on MATH500, AIME24, and HumanEval.

Lumi AI News
