Tag
Kyutai Labs trains 6B-parameter models on Common Crawl data ordered sequentially from 2018 to 2025, shows that the performance drop on data from recent years disappears, and open-sources the checkpoints for continual learning research.
The paper addresses catastrophic forgetting in sequentially trained early-exiting neural networks and proposes two methods, based on Elastic Weight Consolidation and Learning without Forgetting, that preserve the performance of earlier exits while new exits are added.
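
For readers unfamiliar with the two base techniques named above, a minimal sketch of their standard regularization terms follows. It is not the paper's formulation, which is not given in this summary: it only illustrates the usual Elastic Weight Consolidation quadratic penalty and the usual Learning without Forgetting distillation loss. All function and parameter names (`ewc_penalty`, `lwf_distillation_loss`, `lam`, `temperature`) are hypothetical, and parameters and logits are represented as plain lists of floats for simplicity.

```python
import math


def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    anchor_params are the weights saved after the earlier training stage;
    fisher holds the diagonal Fisher information estimate for each weight.
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def lwf_distillation_loss(new_logits, old_logits, temperature=2.0):
    """Standard LwF term: cross-entropy between the softened outputs of the
    frozen old model (target) and the model currently being trained."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))
```

In an early-exit setting, one plausible use (an assumption, not a detail from the summary) is to add either term to the loss of the newly attached exit, with the old logits taken from an earlier exit before the new training stage.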