Tag
This paper investigates the quantitative limits of parametric memory in LLMs using LoRA as a probe, establishing a power-law relationship and introducing a threshold-guided optimization method called MemFT for improved memory performance.
This paper investigates growth dynamics in deterministic equational discovery across three toy substrates and two real-world replications, finding substrate-conditional saturating power-law scaling.
This foundational empirical study demonstrates power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.