Tag
Foundational empirical study demonstrating power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.