dclm

#dclm

@kothasuhas: really really cool work. TLDR: it probably does not make sense to filter _any_ data in the infinite compute regime

X AI KOLs Following ↗ · 2026-05-21 Cached

New research suggests that with sufficient compute, filtering training data for language models may be unnecessary, and models can benefit from low-quality data.

#dclm

@tatsu_hashimoto: Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data …

X AI KOLs Following ↗ · 2026-05-21 Cached

Surprising new results suggest that for large LMs trained with enough compute, the best data filter might be no filter at all, since such models tolerate low-quality data well.
