The paper proposes a hybrid pre-training objective for language models that combines JEPA (Joint-Embedding Predictive Architecture) latent-space prediction with masked language modeling (MLM) reconstruction, showing improved embedding uniformity and a better semantic-lexical balance.
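As a rough illustration of what a hybrid objective of this kind could look like, the sketch below adds a latent-space prediction term to a token-reconstruction term. It is not the paper's formulation. The mean-squared-error latent loss, the convex weighting, and the names `jepa_latent_loss`, `mlm_cross_entropy`, `hybrid_loss`, and `jepa_weight` are all assumptions made for illustration, and practical details such as a separate target encoder or stop-gradient are left out.

```python
import math


def jepa_latent_loss(predicted, target):
    """Mean squared error between predicted and target latent vectors.

    Assumption: MSE is used here for simplicity; other distances are possible.
    """
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def mlm_cross_entropy(logits, target_ids):
    """Average cross-entropy over masked positions (one logit row per masked token)."""
    total = 0.0
    for row, target in zip(logits, target_ids):
        max_logit = max(row)  # subtract the max for numerical stability
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in row))
        total += log_norm - row[target]
    return total / len(target_ids)


def hybrid_loss(predicted, target, logits, target_ids, jepa_weight=0.5):
    """Convex combination of the two terms; jepa_weight is a hypothetical knob."""
    jepa = jepa_latent_loss(predicted, target)
    mlm = mlm_cross_entropy(logits, target_ids)
    return jepa_weight * jepa + (1.0 - jepa_weight) * mlm


# Toy inputs: two masked positions, 2-dimensional latents, vocabulary of size 2.
predicted = [[0.0, 1.0], [1.0, 0.0]]
target = [[0.0, 1.0], [1.0, 2.0]]
logits = [[0.0, 0.0], [0.0, 0.0]]
target_ids = [0, 1]

loss = hybrid_loss(predicted, target, logits, target_ids, jepa_weight=0.5)
print(loss)
```

Setting `jepa_weight` to 0 reduces this sketch to plain MLM training, and setting it to 1 leaves only the latent prediction term, so the single weight is one simple way to trade semantic against lexical signal.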
This paper argues that designing advanced language representations to shape cognitive schemas is a key frontier for expanding LLM intelligence without scaling parameters. It provides formalizations and empirical evidence showing that different linguistic structures significantly impact model performance and internal feature activations.