compute-optimal

@lilianweng: A super long overdue (3+ years?) post on scaling laws. Compute is expensive. Scaling laws are a way to help us reason a…

X AI KOLs Timeline · 5d ago

Lilian Weng's blog post provides a comprehensive overview of scaling laws in deep learning, covering their derivation, compute-optimal allocation, and the debate between Kaplan et al. and Chinchilla.

A Bitter Lesson for Data Filtering (1 minute read)

TLDR AI · 2026-05-21

This paper investigates data filtering for large model pretraining and finds that in the high-compute, data-scarce regime, filtering may not be necessary and can even be detrimental; sufficiently trained large models benefit from nominally low-quality data.

Compute Optimal Tokenization (2 minute read)

TLDR AI · 2026-05-13

This paper systematically derives compression-aware neural scaling laws by training nearly 1,300 models, demonstrating that the widely used heuristic of 20 tokens per parameter is an artifact of specific tokenizers. The authors propose a tokenizer-agnostic scaling law based on bytes, offering a new framework for compute-efficient training across diverse languages and modalities.
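For context, the 20-tokens-per-parameter heuristic mentioned above can be turned into a quick back-of-the-envelope calculation. The sketch below is not taken from the paper and does not reproduce its byte-based law. It assumes the commonly used approximation that training compute is about 6 × parameters × tokens, and the function name `heuristic_allocation` and its arguments are hypothetical, chosen only for illustration.

```python
import math


def heuristic_allocation(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget into (parameters, tokens) under a fixed ratio.

    Assumptions (not from the summarized paper):
      - training compute C is approximated as 6 * N * D
      - the token count D is tied to the parameter count N by D = r * N
    Solving C = 6 * N * (r * N) for N gives N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Example: a budget of 1e21 FLOPs under the 20 tokens/parameter heuristic.
n, d = heuristic_allocation(1e21)
print(f"parameters ~ {n:.2e}, tokens ~ {d:.2e}")
```

With a budget of 1e21 FLOPs this gives roughly 2.9 billion parameters and 58 billion tokens. Because the ratio is counted in tokens, the result depends on how much text each token covers, which is the tokenizer dependence the paper's byte-based formulation is meant to remove.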

Prescriptive Scaling Laws for Data Constrained Training

Hugging Face Daily Papers · 2026-05-02

A modified scaling law that accounts for data repetition effects yields compute-optimal training strategies for data-constrained scenarios. It shows that, beyond a certain point, further repetition is counterproductive and compute is better spent on model capacity.
