@BetaTomorrow: Title: A Bitter Lesson for Data Filtering Authors : Christopher Mohri , John Duchi, Tatsunori Hashimoto (@tatsu_hashimo…

X AI KOLs Following Papers

Summary

This paper argues that for large enough models, unfiltered data can improve generalization by providing weak perturbations, contrary to the common assumption that only high-quality filtered data is beneficial. The authors caution that harmful conditional shifts can still damage models, but over-curation may remove useful perturbations.

Title: A Bitter Lesson for Data Filtering

Authors: Christopher Mohri, John Duchi, Tatsunori Hashimoto (@tatsu_hashimoto)

Filtering helps when the model lacks enough capacity to separate manifold regions. But when the model is large enough, unfiltered data supplies weak stochastic perturbations across a broader manifold. These perturbations can activate more intrinsic pathways, stabilize more fixed-point basins, and improve generalization.

The “bitter lesson” here is not only that scale beats curation; it is that over-curation may remove the very perturbations needed for fixed-point construction in high-order nonlinear data.

One caution: this should not be overstated as “all data is good.” The paper itself says harmful conditional shifts can still damage the model, for example systematically false statements that look like normal high-quality text. Deep Manifold would say the same: useful perturbation nudges the manifold; adversarial or wrong conditional structure can anchor the wrong fixed point.

** Dataualism ** https://x.com/BetaTomorrow/status/2048580677290070016… #DeepManifoldInterpretation

Cached at: 06/13/26, 02:17 PM

Turing Post (@TheTuringPost): Wow, this is interesting..

@Stanford researchers put a common assumption to the test: large models need only “high-quality” filtered training data.

What if the best filter is no filter at all?

They compared full Common Crawl data with heavily filtered versions of it and got
