next-token-prediction

#next-token-prediction

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

arXiv cs.LG · yesterday

Introduces UniTok, a universal tokenizer that transforms continuous time series into discrete tokens, and UniTok-FM, a foundation model pretrained via next-token prediction that enables zero-shot and prompt-boosted forecasting as well as few-shot generation and classification through training-free in-context inference.

@Hesamation: 3Blue1Brown’s new video explains why every LLM is actually a compression machine. everyone describes pre-training as “n…

X AI KOLs Timeline · 3d ago

3Blue1Brown's new video explains that LLMs are fundamentally compression machines, linking next-token prediction to efficient encoding of human knowledge, which leads to better abstraction and reasoning.
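
The link between prediction and compression can be illustrated with a small sketch. It is not taken from the video; it only uses the standard fact that a token assigned probability p can be encoded in about -log2(p) bits, so a model that predicts the next token better yields a shorter encoding. The function name and the probabilities are hypothetical, chosen for illustration.

```python
import math


def code_length_bits(probs):
    """Ideal code length in bits for a sequence, given the probability
    the model assigned to each token that actually occurred."""
    return sum(-math.log2(p) for p in probs)


# Hypothetical 4-token sequence over a 256-symbol vocabulary.
uniform = [1 / 256] * 4             # a model with no knowledge: 8 bits per token
sharper = [0.5, 0.25, 0.5, 0.125]   # a model that predicts the sequence well

uniform_bits = code_length_bits(uniform)
sharper_bits = code_length_bits(sharper)
print(uniform_bits, sharper_bits)
```

Summed over a corpus, this quantity is the cross-entropy loss measured in bits, which is why lowering next-token loss and compressing the text better amount to the same objective.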

The Need for an External Observer Formalizing the Sufficiency Gap: A Mathematical Extension of Mixture Identifiability and Contextual Grounding in Sequence Models

arXiv cs.CL · 2026-05-27

This paper formalizes the sufficiency gap in next-token prediction, demonstrating that even ideal sequence models can become overconfident when textual prefixes are not sufficient statistics for latent circumstances. It proposes an external observer mechanism to reduce but not eliminate this gap.

Where does next-token prediction leave us?

Hacker News Top · 2026-05-27

A critical examination of how AI maximalists celebrate the obsolescence of human labor through next-token prediction, and the socioeconomic risks this attitude poses, particularly for vulnerable populations.

When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming

arXiv cs.CL · 2026-05-25

This paper distinguishes three probabilistic objects often conflated in language modeling—the full conditional language process, the marginal text-only law, and the model-induced distribution—and analyzes the conditions under which next-token prediction is useful, with RAG and tools interpreted as conditional sufficiency devices.

@pallavishekhar_: https://x.com/pallavishekhar_/status/2058460434035060758

X AI KOLs Timeline · 2026-05-24

Explains what large language models actually do (next-token prediction) and why they sound confident even when wrong. Offers a mental model and verification checklist for using LLMs safely.

Rant: Stop saying LLMs are just “next token predictors.”

Reddit r/singularity · 2026-05-17

A critique of the oversimplified claim that LLMs are 'just next token predictors,' arguing that prediction at scale induces useful representations and capabilities, and that such dismissals confuse objective with learned system.

Conditional Attribute Estimation with Autoregressive Sequence Models

arXiv cs.AI · 2026-05-15

This paper introduces Conditional Attribute Transformers, a method for jointly estimating next-token probability and attribute values conditionally, enabling credit assignment, counterfactual analysis, and steerable generation in a single forward pass.

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

Hugging Face Daily Papers · 2026-05-14

ATLAS presents a visual reasoning framework that combines agentic operations and latent representations using functional tokens, enabling efficient training via next-token prediction and reinforcement learning while avoiding intermediate image generation.

TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG

arXiv cs.CL · 2026-04-20

TPA proposes a novel method for detecting hallucinations in RAG systems by attributing next-token probabilities to seven distinct sources (Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, Initial Embedding) and aggregating by Part-of-Speech tags. The approach achieves state-of-the-art performance across five LLMs including Llama2, Llama3, Mistral, and Qwen.
