Anthropic calls for an aviation-like regulatory authority or commissioned private auditors to examine AI models for critical risks before their release.
InternVideo3 enables foundation models to analyze longer video sequences with iterative reasoning and tool use while avoiding efficiency problems in KV cache management.
Arbor enables AI-driven research through systematic hypothesis management and, on six test tasks, achieved improvements averaging 2.5x those of existing code models.
Arbor coordinates autonomous AI agents via persistent hypothesis trees and achieved 2.5× better results than Codex and Claude Code on six research tasks.
Bebop uses rejection sampling and TV loss optimization to maintain stable MTP acceptance rates during RL training and accelerates rollouts by up to 1.8x.
RACES enables automatic composition of verifiable environments through recursive combination, with DeepSeek-R1-Distill-Qwen-14B improving by 3.1 points and Qwen3-14B by 2.3 points across six benchmarks.
Starting with version 12, npm blocks package installation scripts from running automatically by default, a practice that competitors like Yarn, pnpm, and Bun had already established.