@kothasuhas: really really cool work. TLDR: it probably does not make sense to filter _any_ data in the infinite compute regime

X AI KOLs Following 05/21/26, 04:39 PM Papers

data-filtering language-models training-data scaling-laws infinite-compute dclm

Summary

New research suggests that, with sufficient compute, filtering training data for language models may be unnecessary: large models tolerate nominally low-quality data and can sometimes even benefit from it.

really really cool work. TLDR: it probably does not make sense to filter _any_ data in the infinite compute regime https://t.co/61P9AOZe2b

Tatsunori Hashimoto (@tatsu_hashimoto): Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally ‘low quality’ data, and can sometimes even benefit.

Similar Articles

A Bitter Lesson for Data Filtering (1 minute read)

TLDR AI

This paper investigates data filtering for large model pretraining and finds that in the high-compute, data-scarce regime, filtering may not be necessary and can even be detrimental; sufficiently trained large models benefit from nominally low-quality data.

@tatsu_hashimoto: Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data …

X AI KOLs Following

Surprising new results show that for large LMs with enough compute, the best data filter might be no filter, as they tolerate low-quality data well.

@AI_Whisper_X: Bitter Lesson Part Two: If you have enough compute, the best data filter is no filter. The biggest takeaway from reading this paper is that Rich Sutton's bitter lesson is now coming to the data side? Stanford's Hashimoto published "A Bitter Lesson for Data Filtering"...

X AI KOLs Timeline

A research paper from Stanford University proposes that with sufficient compute, the best data filtering strategy is no filtering. Experiments show that large-scale models are robust to low-quality data, and unfiltered data pools perform better at larger scales. However, this conclusion applies to standard pre-training of dense models, and filtering remains important when compute is limited.

@FrancoisChauba1: If you train on (unsorted list, bubble sort procedure, sorted list) traces, you will never test time compute (TTC) your…

X AI KOLs Following

A critique arguing that training LLMs on human-generated data limits their ability to discover novel solutions via test-time compute, and that true AGI requires models that can explore hypothesis spaces more broadly, similar to AlphaZero.

@yoonholeee: https://x.com/yoonholeee/status/2064027464926716154

X AI KOLs Following

The author argues that text optimization (prompts, context, memory) is a legitimate and sample-efficient learning mechanism that should be taken more seriously by the ML community, enabling a new scaling axis of update-time compute.
