marginalization

When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming

arXiv cs.CL · 2026-05-25

This paper distinguishes three probabilistic objects often conflated in language modeling—the full conditional language process, the marginal text-only law, and the model-induced distribution—and analyzes the conditions under which next-token prediction is useful, with RAG and tools interpreted as conditional sufficiency devices.