Enterprise Article, published on May 6, 2026.
Rafael Pardinas (rafapi-snow) and Ehsan Kamalloo, ServiceNow AI.

PipelineRL uses vLLM as the inference engine that generates rollouts. Along with each generated token, the engine returns that token's log probability (logprob). The trainer then uses these logprobs to compute policy ratios, KL divergence, clipping rate, entropy, and reward. Any difference in how those logprobs are computed can therefore alter the training dynamics. This train-inference mismatch is what had to be resolved as part of the vLLM V0 → V1 migration.

TL;DR.
Hugging Face – Blog