training-inference-mismatch

Diagnosing Training Inference Mismatch in LLM Reinforcement Learning

arXiv cs.LG · 2026-05-15

This paper diagnoses Training-Inference Mismatch (TIM) in LLM reinforcement learning, showing that small numerical disagreements between training and inference token probabilities can cause training collapse, and proposes remedies.
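To make "small numerical disagreements" concrete, the sketch below compares the log-probabilities that a trainer and an inference engine assign to the same sampled tokens. It is an illustration only and is not taken from the paper: the function name `mismatch_stats`, the returned keys, and the sample numbers are all hypothetical. It shows that per-token gaps add up in log space, so the sequence-level probability ratio can drift further from 1 than any single token's ratio.

```python
import math


def mismatch_stats(train_logprobs, infer_logprobs):
    """Compare per-token log-probs for the same sampled tokens.

    Hypothetical helper: both arguments are equal-length lists of
    log-probabilities, one from the training framework and one from the
    inference engine that generated the rollout.
    """
    if len(train_logprobs) != len(infer_logprobs):
        raise ValueError("log-prob lists must have the same length")
    if not train_logprobs:
        raise ValueError("log-prob lists must not be empty")

    # Per-token gap in log space: log p_train(token) - log p_infer(token).
    gaps = [t - i for t, i in zip(train_logprobs, infer_logprobs)]

    return {
        # Largest single-token disagreement.
        "max_abs_logprob_gap": max(abs(g) for g in gaps),
        # Average signed gap over the sequence.
        "mean_logprob_gap": sum(gaps) / len(gaps),
        # Per-token probability ratios p_train / p_infer.
        "token_ratios": [math.exp(g) for g in gaps],
        # Sequence-level ratio: the product of the token ratios,
        # computed as exp of the summed gaps.
        "sequence_ratio": math.exp(sum(gaps)),
    }


# Made-up values for a three-token response.
train = [-0.105, -2.303, -0.511]
infer = [-0.100, -2.250, -0.520]
stats = mismatch_stats(train, infer)
print(stats["max_abs_logprob_gap"], stats["sequence_ratio"])
```

When the two systems agree exactly, every token ratio and the sequence ratio equal 1. The summary does not say which statistic the paper monitors, so treat these quantities as one plausible way to measure the mismatch rather than the authors' diagnostic.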
