Variable-Width Transformers: Non-Uniform Capacity Distribution Across Layers
17 June 2026 · AI Models

Different layers perform different roles, which suggests distributing parameters and computational resources non-uniformly across layers as an alternative to a constant architectural width.