Next-Latent Prediction Transformers Learn Compact World Models

Papers with Code Trending 11/08/25, 10:41 AM Papers

Summary

Introduces Next-Latent Prediction (NextLat), a self-supervised objective that trains transformers to predict their next latent state, encouraging compact internal world models and improving generalization across sequence modeling tasks.

Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad-hoc look ups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learning solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states, compressed information of the history necessary to predict the future. This simple auxiliary objective also injects a recurrent inductive bias into transformers, while leaving their architecture, parallel training, and inference unchanged. NextLat effectively encourages the transformer to form compact internal world models with its own belief states and transition dynamics -- a crucial property absent in standard next-token prediction transformers. Empirically, across benchmarks targeting core sequence modeling competencies -- world modeling, reasoning, planning, and language modeling -- NextLat demonstrates significant gains over standard next-token training in downstream accuracy, representation compression, and lookahead planning. NextLat stands as a simple and efficient paradigm for shaping transformer representations toward stronger generalization.
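The two properties the abstract names, belief states and consistent transition rules, can be written compactly. The notation below is an illustrative assumption (the symbols $b_t$, $g$, and $\tau$ are not taken from the paper): a belief state is a summary of the history that is sufficient for predicting the future, and a consistent transition rule lets that summary be updated from the previous one and the newest token alone.

```latex
% Illustrative notation only; b_t, g, and \tau are assumed symbols.
% Sufficiency: the summary b_t predicts the future as well as the full history.
b_t = g(x_{1:t}), \qquad
P\left(x_{t+1:T} \mid x_{1:t}\right) = P\left(x_{t+1:T} \mid b_t\right)

% Consistent transition: the next summary depends only on the current
% summary and the next token, not on the rest of the history.
b_{t+1} = \tau\left(b_t, x_{t+1}\right)
```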

Original Article


Cached at: 06/17/26, 11:38 PM


Source: https://huggingface.co/papers/2511.05963

Abstract

Next-Latent Prediction enhances transformer architectures by introducing self-supervised latent state prediction, enabling more effective history compression and improved generalization in sequence modeling tasks.

Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad-hoc look ups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learning solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states, compressed information of the history necessary to predict the future. This simple auxiliary objective also injects a recurrent inductive bias into transformers, while leaving their architecture, parallel training, and inference unchanged. NextLat effectively encourages the transformer to form compact internal world models with its own belief states and transition dynamics -- a crucial property absent in standard next-token prediction transformers. Empirically, across benchmarks targeting core sequence modeling competencies -- world modeling, reasoning, planning, and language modeling -- NextLat demonstrates significant gains over standard next-token training in downstream accuracy, representation compression, and lookahead planning. NextLat stands as a simple and efficient paradigm for shaping transformer representations toward stronger generalization.
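The auxiliary objective described in the abstract, predicting the next latent state from the current latent and the next token, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names (`next_latent_loss`, `combined_loss`, `xor_step`), the mean-squared-error distance, and the weighting term are all hypothetical choices, and a real training setup would operate on tensors from a transformer's hidden states.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def next_latent_loss(
    latents: Sequence[Vector],
    tokens: Sequence[int],
    predict_next: Callable[[Vector, int], Vector],
) -> float:
    """Average error of predicting h_{t+1} from (h_t, x_{t+1}).

    latents[t] is assumed to be the hidden state after reading tokens[0..t].
    MSE is an assumed distance; the paper may use a different one.
    """
    if len(latents) != len(tokens):
        raise ValueError("need exactly one latent per token")
    if len(latents) < 2:
        return 0.0  # no transitions to score
    errors = [
        mse(predict_next(latents[t], tokens[t + 1]), latents[t + 1])
        for t in range(len(latents) - 1)
    ]
    return sum(errors) / len(errors)


def combined_loss(next_token_loss: float, latent_loss: float, weight: float = 1.0) -> float:
    """Standard next-token loss plus a weighted latent-prediction term (assumed form)."""
    return next_token_loss + weight * latent_loss


# Toy world: the latent is the running parity of a bit stream, which is a
# compact belief state with a consistent transition rule (XOR with the next bit).
def xor_step(h: Vector, x: int) -> Vector:
    return [float((int(h[0]) + x) % 2)]


tokens = [1, 0, 1, 1]
latents = [[1.0], [1.0], [0.0], [1.0]]  # parity after each token

consistent = next_latent_loss(latents, tokens, xor_step)
ignores_token = next_latent_loss(latents, tokens, lambda h, x: h)
print(consistent, ignores_token)
```

In this toy case the parity latents are perfectly predictable from the previous latent and the next token, so the auxiliary term is zero, while a predictor that ignores the next token incurs a positive loss. That gap is the pressure the abstract describes: latents are rewarded for behaving like states of a recurrent model.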


Get this paper in your agent:

hf papers read 2511.05963

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Datasets citing this paper: 1

#### JaydenTeoh/manhattan • Updated Mar 2 • 91.6M • 428 • 1



Similar Articles

Next-Latent Prediction Transformers [R]

Looped World Models

NITP: Next Implicit Token Prediction for LLM Pre-training

World Machine: Towards Generative World Modeling for Time-Series

Generative modeling with sparse transformers
