DiffusionGemma denoises up to 256 tokens in parallel per step instead of sequentially and achieves 1,000 tokens/second on NVIDIA H100 at batch size 1 — without cloud dependency.
DiffusionGemma replaces the traditional sequential token-generation process with parallel denoising of 256-token blocks, enabling faster inference and improved problem-solving capabilities for complex tasks.