Tag
A comprehensive overview of scaling laws in deep learning, tracing their theoretical roots and empirical findings, and explaining how loss decreases predictably with model size, data, and compute.
This paper investigates the quantitative limits of parametric memory in LLMs using LoRA as a probe, establishing a power-law relationship and introducing MemFT, a threshold-guided optimization method for improved memory performance.
This paper investigates growth dynamics in deterministic equational discovery across three toy substrates and two real-world replications, finding substrate-conditional saturating power-law scaling.
Foundational empirical study demonstrating power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.