@neural_avb: Give them a bunch of money so they can do these scaling experiments up to 7B LLMs and beyond So much to learn from these…
Summary
Zyphra shares its first work on continual learning for LLMs, which studies whether models can learn indefinitely from new data and derives, from scaling experiments up to 7B parameters, a scaling law for the onset of plasticity loss.
Cached at: 06/26/26, 10:11 AM
Give them a bunch of money so they can do these scaling experiments up to 7B LLMs and beyond
So much to learn from these papers https://t.co/VhZvCJH0nk
Zyphra (@ZyphraAI): Zyphra is sharing our first work in continual learning where we study: Can LLMs learn forever from new data?
Many see continual learning as a path to AGI through recursive self-improvement (RSI).
The first obstacle is plasticity loss. We derive a scaling law for its onset 🧵
Similar Articles
@harshbhatt7585: https://x.com/harshbhatt7585/status/2063593933314113587
The author shares learnings from training a 160M parameter LLM from scratch, experimenting with architectures like multi-token prediction and hierarchical reasoning models. They emphasize the importance of fast iteration, simplifying ideas, and understanding why architectures work.
@neural_avb: If you think about it, LLM training in 2026 is really a 3-step loop : - train it on some data - dogfood it/run categori…
The tweet outlines a 3-step loop for LLM training in 2026: train on data, run evals, and add synthetic data for underperforming tasks. It emphasizes how accessible legal distillation has become through open-source models and cheap APIs, noting that training on reasoning traces alone can achieve high scores.
@lilianweng: A super long overdue (3+ years?) post on scaling laws. Compute is expensive. Scaling laws are a way to help us reason a…
Lilian Weng's blog post provides a comprehensive overview of scaling laws in deep learning, covering their derivation, compute-optimal allocation, and the debate between the Kaplan et al. and Chinchilla (Hoffmann et al.) scaling laws.
Developing an open-source LLM from the ground up, from pretraining to RLHF (PPO/GRPO)
A developer shares progress on training a 7B parameter open source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.
Scaling laws for neural language models
Foundational empirical study demonstrating power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.