InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition

Hugging Face Daily Papers 05/04/26, 12:00 AM Papers

Summary

InfoLaw is a data-aware scaling framework that predicts model loss based on token consumption, model size, data mixture weights, and repetition, enabling efficient data-recipe selection under varying compute budgets.

Upweighting high-quality data in LLM pretraining often improves performance, but in data-limited regimes, especially under overtraining, stronger upweighting increases repetition and can degrade performance. Standard scaling laws, however, do not reliably extrapolate across mixture recipes or under repetition, which leaves the choice of an optimal data recipe at scale underdetermined. To address this, we introduce InfoLaw (Information Scaling Laws), a data-aware scaling framework that predicts loss from consumed tokens, model size, data mixture weights, and repetition. The key idea is to model pretraining as information accumulation, where quality controls information density and repetition induces scale-dependent diminishing returns. We first measure model performance after training on datasets that vary in scale, quality distribution, and repetition level. We then construct an information model so that the accumulated information accurately predicts the measured performance. InfoLaw predicts performance on unseen data recipes and larger-scale runs (up to 7B, 425B tokens) with 0.15% mean and 0.96% max absolute error in loss, and it extrapolates reliably across overtraining levels, enabling efficient data-recipe selection under varying compute budgets.
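
The tension between upweighting and repetition comes down to simple arithmetic: under a fixed token budget, the number of passes over a source is its mixture weight times the budget, divided by the source's unique tokens. The sketch below illustrates this with hypothetical source sizes and weights (none of these numbers come from the paper). The `effective_tokens` function is a toy assumption about diminishing returns from repeated passes, included only to make the effect visible; it is not InfoLaw's fitted functional form, which the abstract does not state.

```python
import math


def epochs_per_source(weights, unique_tokens, budget):
    """Passes over each source: normalized weight * budget / unique tokens."""
    total = sum(weights.values())
    return {
        name: (w / total) * budget / unique_tokens[name]
        for name, w in weights.items()
    }


def effective_tokens(unique, epochs, decay=5.0):
    """Toy diminishing-returns model (an assumption, not the paper's law).

    The first pass counts in full; additional passes saturate
    exponentially, so repeated tokens are worth less than fresh ones.
    """
    if epochs <= 1.0:
        return unique * epochs
    extra = epochs - 1.0
    return unique * (1.0 + decay * (1.0 - math.exp(-extra / decay)))


# Hypothetical corpus: a small high-quality source and a large low-quality one.
unique_tokens = {"high": 20e9, "low": 400e9}
budget = 100e9  # total tokens consumed during training

mild = epochs_per_source({"high": 0.2, "low": 0.8}, unique_tokens, budget)
heavy = epochs_per_source({"high": 0.8, "low": 0.2}, unique_tokens, budget)

print("mild upweighting, epochs:", mild)
print("heavy upweighting, epochs:", heavy)
print(
    "heavy: raw vs. effective high-quality tokens:",
    0.8 * budget,
    effective_tokens(unique_tokens["high"], heavy["high"]),
)
```

With these made-up numbers, raising the high-quality weight from 0.2 to 0.8 moves that source from one pass to four, and under the toy saturation model the repeated tokens contribute less than their raw count. The size of that discount relative to the quality gain is the kind of trade-off a data-aware scaling law is meant to quantify.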

Original Article

Cached at: 05/13/26, 12:18 AM

Source: https://huggingface.co/papers/2605.02364

Abstract

Upweighting high-quality data in LLM pretraining often improves performance, but in data-limited regimes, especially under overtraining, stronger upweighting increases repetition and can degrade performance. Standard scaling laws, however, do not reliably extrapolate across mixture recipes or under repetition, which leaves the choice of an optimal data recipe at scale underdetermined. To address this, we introduce InfoLaw (Information Scaling Laws), a data-aware scaling framework that predicts loss from consumed tokens, model size, data mixture weights, and repetition. The key idea is to model pretraining as information accumulation, where quality controls information density and repetition induces scale-dependent diminishing returns. We first measure model performance after training on datasets that vary in scale, quality distribution, and repetition level. We then construct an information model so that the accumulated information accurately predicts the measured performance. InfoLaw predicts performance on unseen data recipes and larger-scale runs (up to 7B, 425B tokens) with 0.15% mean and 0.96% max absolute error in loss, and it extrapolates reliably across overtraining levels, enabling efficient data-recipe selection under varying compute budgets.
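
The reported accuracy figures summarize how far predicted losses fall from measured ones across held-out runs. The abstract does not spell out the exact definition, so the sketch below assumes a relative absolute error expressed in percent. The function name and the loss values are hypothetical and serve only to show the computation.

```python
def abs_pct_errors(predicted, actual):
    """Relative absolute error in percent for each (predicted, actual) pair."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) / a * 100.0 for p, a in zip(predicted, actual)]


# Made-up losses for three held-out runs (not values from the paper).
predicted = [2.510, 2.305, 2.120]
actual = [2.500, 2.300, 2.125]

errors = abs_pct_errors(predicted, actual)
mean_err = sum(errors) / len(errors)
max_err = max(errors)

print(f"mean abs error: {mean_err:.2f}%  max abs error: {max_err:.2f}%")
```

Reporting the maximum alongside the mean matters for recipe selection: a low mean can hide a single recipe where the prediction is badly off.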

Similar Articles

Model Merging Scaling Laws in Large Language Models

Scaling Laws for Mixture Pretraining Under Data Constraints

Scaling laws for neural language models

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

Can LLMs Take Retrieved Information with a Grain of Salt?
