InternVideo3 lets foundation models analyze longer video sequences through iterative reasoning and tool use, while avoiding the efficiency problems that KV cache management otherwise introduces.
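Why KV caches become a bottleneck for long videos can be shown with some simple arithmetic. The sketch below uses the standard size formula for a transformer KV cache: one key and one value tensor per layer, per KV head, per token. The layer and head counts are hypothetical and are not InternVideo3's actual configuration, and the sketch does not show how InternVideo3 itself manages its cache.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Memory held by a standard transformer KV cache (keys + values)."""
    # Factor 2: each layer stores one key tensor and one value tensor per token.
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


# Hypothetical decoder configuration, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

if __name__ == "__main__":
    # Long videos produce many visual tokens, so the cache grows linearly with length.
    for tokens in (8_192, 65_536, 524_288):
        gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
        print(f"{tokens:>7} tokens -> {gib:5.1f} GiB")
```

With this configuration each token costs 128 KiB of cache, so memory grows from 1 GiB at 8,192 tokens to 64 GiB at 524,288 tokens. That linear growth is the pressure a long-video model has to manage.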
DiffusionGemma denoises up to 256 tokens in parallel at each step instead of generating them one at a time, and reaches 1,000 tokens/second on an NVIDIA H100 at batch size 1 without relying on the cloud.
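The difference between sequential and parallel decoding can be sketched as a count of forward passes. The toy below assumes a masked-diffusion scheme. Decoding starts from a fully masked block, and at each step the most confident proposals for several positions are committed at once. This confidence-based schedule is a common masked-diffusion strategy, used here only as an assumption; it is not DiffusionGemma's documented method. `toy_denoiser` is a hypothetical stand-in for the model.

```python
import random

MASK = -1  # placeholder id for a not-yet-decoded position


def toy_denoiser(target, rng):
    """Hypothetical stand-in for a model forward pass.

    Proposes a (token, confidence) pair for every position in the block at once.
    A real denoiser would condition on the partially unmasked sequence; this toy
    always proposes the correct token, with a random confidence score.
    """
    return [(tok, rng.random()) for tok in target]


def parallel_denoise(target, num_steps, seed=0):
    """Decode a whole block in num_steps forward passes by unmasking in parallel."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    forward_passes = 0
    for step in range(num_steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        proposals = toy_denoiser(target, rng)
        forward_passes += 1
        # Spread the remaining masked positions evenly over the remaining steps
        # (ceiling division), so the final step always unmasks everything left.
        k = -(-len(masked) // (num_steps - step))
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = proposals[i][0]
    return seq, forward_passes


def autoregressive_passes(num_tokens):
    """Sequential decoding needs one forward pass per generated token."""
    return num_tokens


if __name__ == "__main__":
    block = list(range(256))
    decoded, passes = parallel_denoise(block, num_steps=16)
    print(passes, autoregressive_passes(len(block)))  # 16 vs 256 forward passes
```

In this toy, a 256-token block is finished in 16 forward passes rather than 256. A real model trades the step count against output quality, and the step count DiffusionGemma uses is not stated here.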