token-prediction

Which tokens does a hybrid model predict better?

Hugging Face Blog · 3d ago

A token-level study comparing Olmo Hybrid with Olmo 3 transformers shows that hybrid models better predict meaningful tokens such as nouns and verbs, while transformers excel at copying tokens from the input.

@MatthieuWyart: LLMs learn by predicting tokens. World models (JEPA, data2vec) learn by predicting their own abstractions. Which needs …

X AI KOLs Timeline · 2026-06-01

This paper proves that learning by predicting latent representations (as in world models like JEPA and data2vec) requires exponentially less data than predicting tokens (as in LLMs) for hierarchical data with hidden structure.

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

arXiv cs.CL · 2026-04-20

This paper proposes STOP (SuperTOken for Pruning), a systematic framework for pruning inefficient reasoning paths early in parallel reasoning with Large Reasoning Models. The method achieves superior efficiency and effectiveness across models from 1.5B to 20B parameters, boosting GPT-OSS-20B accuracy on AIME25 from 84% to 90% under fixed compute budgets.
