Next-Latent Prediction Transformers [R]

Reddit r/MachineLearning Papers

Summary

Microsoft Research introduces Next-Latent Prediction (NextLat), a self-supervised method that trains transformers to predict their own next latent state, enabling compact world models for reasoning and planning and achieving up to 3.3x faster inference via self-speculative decoding.

[Microsoft Research Preprint](https://preview.redd.it/efm7zazr2t7h1.png?width=2950&format=png&auto=webp&s=444dc71b22bca0c499f56367f705fb4ea23d07b8)

Next-token prediction is myopic. What if transformers learned to predict their own next latent state?

Microsoft Research presents **Next-Latent Prediction (NextLat)**: a self-supervised learning method that teaches transformers to form compact world models for reasoning and planning. It also unlocks up to 3.3x faster inference via self-speculative decoding!

On top of next-token prediction, NextLat trains the transformer to predict its own next latent state given the current latent state and the next token.

NextLat has a few key benefits:

1. **Representation Learning**: NextLat encourages transformers to compress history into compact belief states.
2. **Better Data Efficiency**: predicting in latent space provides denser supervision than predicting one-hot tokens.
3. **Faster Inference**: recursive multi-step lookahead enables self-speculative decoding.

I'm super excited about this work. Please do check it out below:

- 💬 Blog: [https://jaydenteoh.github.io/blog/2026/nextlat](https://jaydenteoh.github.io/blog/2026/nextlat)
- 💻 Code: [https://github.com/JaydenTeoh](https://github.com/JaydenTeoh)
- 📝 Paper: [https://arxiv.org/abs/2511.05963](https://arxiv.org/abs/2511.05963)
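To make the objective concrete, here is a rough toy sketch in plain Python of what "next-token loss plus next-latent loss" could look like, along with the recursive lookahead idea. It is not the paper's implementation: the `tanh` dynamics head, the mean-squared-error distance, the weight `lam`, and every name here (`predict_next_latent`, `nextlat_loss`, `lookahead`, `W_h`, `W_x`) are assumptions for illustration, and random vectors stand in for real transformer hidden states.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def predict_next_latent(h, tok_emb, W_h, W_x):
    """Hypothetical dynamics head: guess h[t+1] from h[t] and the next token's embedding."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_h, h), matvec(W_x, tok_emb))]

def nextlat_loss(latents, tokens, embed, W_out, W_h, W_x, lam=1.0):
    """Return (total, next_token_loss, next_latent_loss), averaged over steps.

    latents[t] stands in for the hidden state after reading tokens[t].
    """
    ce, lat = 0.0, 0.0
    steps = len(tokens) - 1
    for t in range(steps):
        # Standard next-token cross-entropy from the current latent.
        probs = softmax(matvec(W_out, latents[t]))
        ce += -math.log(probs[tokens[t + 1]])
        # Next-latent term: predict h[t+1] from (h[t], x[t+1]).
        # Assumption: in real training the target would be held constant (stop-gradient).
        pred = predict_next_latent(latents[t], embed[tokens[t + 1]], W_h, W_x)
        lat += sum((p - q) ** 2 for p, q in zip(pred, latents[t + 1])) / len(pred)
    ce, lat = ce / steps, lat / steps
    return ce + lam * lat, ce, lat

def lookahead(h, k, embed, W_out, W_h, W_x):
    """Greedily draft k tokens by rolling the latent forward, with no transformer pass."""
    draft = []
    for _ in range(k):
        logits = matvec(W_out, h)
        tok = max(range(len(logits)), key=logits.__getitem__)
        draft.append(tok)
        h = predict_next_latent(h, embed[tok], W_h, W_x)
    return draft

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D, V, T = 4, 5, 6                      # latent size, vocab size, sequence length
embed = rand_matrix(rng, V, D)         # token embeddings
W_out = rand_matrix(rng, V, D)         # output (unembedding) matrix
W_h, W_x = rand_matrix(rng, D, D), rand_matrix(rng, D, D)
tokens = [rng.randrange(V) for _ in range(T)]
latents = rand_matrix(rng, T, D)       # random stand-ins for hidden states

total, ce, lat = nextlat_loss(latents, tokens, embed, W_out, W_h, W_x)
draft = lookahead(latents[-1], 3, embed, W_out, W_h, W_x)
print(f"total={total:.3f} next_token={ce:.3f} next_latent={lat:.3f} draft={draft}")
```

In a self-speculative setup, tokens drafted by something like `lookahead` would then be checked by a normal forward pass of the same model, keeping only the prefix it agrees with. The verification step is left out here to keep the sketch short.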
