Tag
Next Implicit Token Prediction (NITP) enhances language model pre-training by adding dense continuous supervision in representation space, improving generalization and performance across model sizes with minimal computational overhead.