Next-Latent Prediction Transformers [R]

Reddit r/MachineLearning 06/17/26, 08:44 AM Papers

Summary

Microsoft Research introduces Next-Latent Prediction (NextLat), a self-supervised method that trains transformers to predict their own next latent state, enabling compact world models for reasoning and planning and achieving up to 3.3x faster inference via self-speculative decoding.

[Microsoft Research Preprint](https://preview.redd.it/efm7zazr2t7h1.png?width=2950&format=png&auto=webp&s=444dc71b22bca0c499f56367f705fb4ea23d07b8)

Next-token prediction is myopic. What if transformers learned to predict their own next latent state? Microsoft Research presents **Next-Latent Prediction (NextLat)**, a self-supervised learning method that teaches transformers to form compact world models for reasoning and planning. It also unlocks up to 3.3x faster inference via self-speculative decoding!

On top of next-token prediction, NextLat trains the transformer to predict its own next latent state given the current latent state and the next token. NextLat has a few key benefits:

1. **Representation Learning**: NextLat encourages transformers to compress history into compact belief states.
2. **Better Data Efficiency**: predicting in latent space provides denser supervision than predicting one-hot tokens.
3. **Faster Inference**: via recursive multi-step lookahead.

I'm super excited about this work. Please do check it out below:

💬 Blog: [https://jaydenteoh.github.io/blog/2026/nextlat](https://jaydenteoh.github.io/blog/2026/nextlat)
💻 Code: [https://github.com/JaydenTeoh](https://github.com/JaydenTeoh)
📝 Paper: [https://arxiv.org/abs/2511.05963](https://arxiv.org/abs/2511.05963)
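Below is a minimal, framework-free Python sketch of the two mechanisms the post describes, written under stated assumptions. The first is the training objective: next-token cross-entropy plus a term for predicting the next latent from the current latent and the next token. The second is self-speculative decoding, in which a latent dynamics model drafts several tokens by recursive lookahead and the full transformer then verifies them. The function names, the `dynamics` and `head` callables, the weight `lam`, the mean-squared-error latent distance, and the greedy acceptance rule are all hypothetical illustrations. They are not the paper's confirmed formulation, so see the paper and code linked above for the actual method.

```python
import math
from typing import Callable, List, Sequence

Latent = List[float]
# Hypothetical signatures (assumptions): the latent dynamics model maps
# (h_t, x_{t+1}) -> predicted h_{t+1}; the LM head maps a latent to next-token logits.
Dynamics = Callable[[Latent, int], Latent]
Head = Callable[[Latent], List[float]]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, an assumed choice of latent distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def nextlat_loss(hidden: Sequence[Latent], logits: Sequence[Sequence[float]],
                 tokens: Sequence[int], dynamics: Dynamics, lam: float = 1.0) -> float:
    """Next-token loss plus a weighted next-latent loss, averaged over positions.

    hidden[t] is the latent after reading tokens[:t + 1];
    logits[t] is the model's prediction for tokens[t + 1].
    """
    steps = len(tokens) - 1
    ntp = sum(cross_entropy(logits[t], tokens[t + 1]) for t in range(steps)) / steps
    # Predict h_{t+1} from (h_t, x_{t+1}). In an autodiff framework the target
    # hidden[t + 1] would typically be detached (stop-gradient); an assumption here.
    latent = sum(mse(dynamics(hidden[t], tokens[t + 1]), hidden[t + 1])
                 for t in range(steps)) / steps
    return ntp + lam * latent


def draft_tokens(h: Latent, k: int, dynamics: Dynamics, head: Head) -> List[int]:
    """Recursive k-step lookahead: greedily decode from predicted latents
    without running the full transformer."""
    drafted: List[int] = []
    for _ in range(k):
        logits = head(h)
        tok = max(range(len(logits)), key=logits.__getitem__)
        drafted.append(tok)
        h = dynamics(h, tok)  # roll the latent forward using the drafted token
    return drafted


def accept_prefix(drafted: Sequence[int], verified: Sequence[int]) -> List[int]:
    """Greedy verification (assumed rule). verified[i] is the full model's greedy
    token given the context plus drafted[:i]; it has len(drafted) + 1 entries,
    all obtained from one forward pass. Drafts are kept up to the first mismatch,
    where the verifier's token replaces the draft; if every draft matches, the
    verifier's extra (bonus) token is kept too."""
    out: List[int] = []
    for d, v in zip(drafted, verified):
        out.append(v)  # equals d when accepted, otherwise it is the correction
        if d != v:
            return out
    out.append(verified[len(drafted)])
    return out
```

For example, `accept_prefix([1, 2, 3], [1, 2, 5, 9])` returns `[1, 2, 5]`: the first two drafts are accepted and the third is replaced by the verifier's token. Each verification pass therefore yields at least one token, and up to k + 1 when all drafts are correct, which is where a speedup over token-by-token decoding would come from.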

Similar Articles

Next-Latent Prediction Transformers Learn Compact World Models

NITP: Next Implicit Token Prediction for LLM Pre-training

Fast Byte Latent Transformer

Generative modeling with sparse transformers

@machinestein: ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator Why does recursive reasoning, especially …