@tatsu_hashimoto: Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data …

X AI KOLs Following Papers

Summary

Surprising new results show that for large LMs with enough compute, the best data filter might be no filter, as they tolerate low-quality data well.

Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally 'low quality' data, and can sometimes even benefit. https://t.co/VhshLOWBIx
Original Article
View Cached Full Text

Cached at: 05/21/26, 09:37 PM

Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally ‘low quality’ data, and can sometimes even benefit. https://t.co/VhshLOWBIx

Similar Articles

A Bitter Lesson for Data Filtering (1 minute read)

TLDR AI

This paper investigates data filtering for large model pretraining and finds that in the high-compute, data-scarce regime, filtering may not be necessary and can even be detrimental; sufficiently trained large models benefit from nominally low-quality data.

@AI_Whisper_X: Bitter Lesson Part Two: If you have enough compute, the best data filter is no filter. The biggest takeaway from reading this paper is that Rich Sutton's bitter lesson is now coming to the data side? Stanford's Hashimoto published "A Bitter Lesson for Data Filtering"...

X AI KOLs Timeline

A research paper from Stanford University proposes that with sufficient compute, the best data filtering strategy is no filtering. Experiments show that large-scale models are robust to low-quality data, and unfiltered data pools perform better at larger scales. However, this conclusion applies to standard pre-training of dense models, and filtering remains important when compute is limited.