@neural_avb: Give them a bunch of money so they can do these scaling experiments up to 7B LLMs and beyond. So much to learn from these…

X AI KOLs Timeline 06/25/26, 01:12 PM Papers

Summary

Zyphra shares their first work on continual learning for LLMs, studying whether models can learn forever from new data, and deriving a scaling law for the onset of plasticity loss in scaling experiments up to 7B parameters.

Give them a bunch of money so they can do these scaling experiments up to 7B LLMs and beyond. So much to learn from these papers: https://t.co/VhZvCJH0nk

Original Article

Cached at: 06/26/26, 10:11 AM

Zyphra (@ZyphraAI): Zyphra is sharing our first work in continual learning where we study: Can LLMs learn forever from new data?

Many see continual learning as a path to AGI through recursive self-improvement (RSI).

The first obstacle is plasticity loss. We derive a scaling law for its onset 🧵

Similar Articles

@harshbhatt7585: https://x.com/harshbhatt7585/status/2063593933314113587

X AI KOLs Timeline

The author shares learnings from training a 160M parameter LLM from scratch, experimenting with architectures like multi-token prediction and hierarchical reasoning models. They emphasize the importance of fast iteration, simplifying ideas, and understanding why architectures work.

@neural_avb: If you think about it, LLM training in 2026 is really a 3-step loop : - train it on some data - dogfood it/run categori…

X AI KOLs Timeline

The tweet outlines a 3-step loop for LLM training in 2026: train on data, run evals, and add synthetic data for underperforming tasks. It emphasizes the accessibility of legal distillation via open source models and cheap APIs, noting that training on reasoning traces alone can achieve high scores.

@lilianweng: A super long overdue (3+ years?) post on scaling laws. Compute is expensive. Scaling laws are a way to help us reason a…

X AI KOLs Timeline

Lilian Weng's blog post provides a comprehensive overview of scaling laws in deep learning, covering their derivation, compute-optimal allocation, and the debate between Kaplan et al. and Chinchilla.

Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)

Reddit r/LocalLLaMA

A developer shares progress on training a 7B parameter open source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.

Scaling laws for neural language models

OpenAI Blog

Foundational empirical study demonstrating power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.
