Tag
This paper investigates why larger models outperform smaller ones. It attributes the gap to reduced gradient interference and better resource allocation, which let larger models learn rare and complex tasks, an advantage that holds even with infinite data. Experiments on synthetic data and OLMo models verify the mechanism: in larger models, gradient updates for common tasks are weaker, so they are less likely to overwrite rare-task features.
Sam Altman announces that a program offering compute capacity will be available until the current allocation sells out, with plans to resume later while reserving capacity for ChatGPT and Codex.
Sam Altman announces OpenAI's Guaranteed Capacity, which offers discounted tokens in exchange for one- to three-year commitments, giving customers certainty about capacity.