Uniform 4-bit formats eliminate the systematic shrinkage bias of E2M1 in FP4 LLM training and enable consistently better convergence across all model sizes.
STARE uses surprisal metrics and selective advantage reweighting to maintain policy entropy stability across long training sequences while improving accuracy by 4–8%.