For over a decade, we've accepted that end-to-end backprop is the only way to train deep networks (1 minute read)
Summary
Sakana AI presents DiffusionBlocks, a method that trains neural networks block-wise by interpreting forward passes as diffusion denoising, significantly reducing memory requirements compared to traditional end-to-end backpropagation.
Cached at: 05/29/26, 06:32 PM
Holding the entire network in memory at once is why AI training is hitting a resource wall. Sakana AI has found a way to break the network into blocks and train them independently. The trick is to treat the network’s forward pass like a diffusion model denoising a signal. This slashes the memory needed to train deep models.
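To make the block-wise idea concrete, here is a toy sketch in plain Python. It is not Sakana AI's implementation, and every name in it (`make_noise_ranges`, `train_block`, the noise bounds, the scalar linear "block") is an assumption made for illustration. Each block is assigned its own range of noise levels and learns to denoise within that range only, so no gradient ever passes from one block to another.

```python
import random

def make_noise_ranges(num_blocks, sigma_min=0.1, sigma_max=2.0):
    """Split [sigma_min, sigma_max] into equal contiguous ranges, one per block.
    The equal split is an illustrative assumption, not a detail from the source."""
    step = (sigma_max - sigma_min) / num_blocks
    return [(sigma_min + i * step, sigma_min + (i + 1) * step)
            for i in range(num_blocks)]

def train_block(sigma_range, steps=5000, lr=0.01, seed=0):
    """Train ONE block in isolation as a linear denoiser x_hat = a * z + c.

    Only this block's two parameters are held and updated here; the other
    blocks are never touched, which is the memory-saving point of the sketch.
    """
    rng = random.Random(seed)
    a, c = 0.0, 0.0
    lo, hi = sigma_range
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)              # toy "clean" sample
        sigma = rng.uniform(lo, hi)          # noise level from this block's range
        z = x + sigma * rng.gauss(0.0, 1.0)  # noisy input
        err = (a * z + c) - x                # denoising error
        a -= lr * err * z                    # SGD step on 0.5 * err**2
        c -= lr * err
    return a, c

# Each call is independent, so blocks could be trained one at a time
# or on separate devices.
ranges = make_noise_ranges(4)
blocks = [train_block(r, seed=i) for i, r in enumerate(ranges)]
for (lo, hi), (a, c) in zip(ranges, blocks):
    print(f"sigma in [{lo:.2f}, {hi:.2f}]: a={a:.3f}, c={c:.3f}")
```

In this toy setup the blocks assigned to higher noise levels should learn a smaller `a`, meaning they shrink the noisy input more strongly. A real network would replace the two scalars with a stack of layers per block, but the training loop for each block would still only need that block's parameters and activations in memory.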
Sakana AI (@SakanaAILabs): Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
https://t.co/45Xvzl2qQS
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the
Similar Articles
@simplifyinAI: BREAKING: NVIDIA proved back-propagation isn't the only way to build an AI. Billion-parameter models were trained witho…
NVIDIA and Oxford University introduced EGGROLL, a scalable evolution strategies algorithm that trains billion-parameter models without backpropagation, using only integers and parallel mutations.
DiffusionBench: On Holistic Evaluation of Diffusion Transformers
Researchers introduce NanoGen, a unified framework for training and evaluating diffusion transformers, and propose DiffusionBench, a holistic benchmark combining ImageNet class-conditional and text-to-image generation to better assess progress in generative modeling.
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
SANA-Video is a small diffusion model that efficiently generates high-resolution, long videos using linear attention and a constant-memory KV cache, achieving competitive performance at dramatically lower cost and faster speed compared to existing models.
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
This paper introduces Learned Relay Representations (Relay), a method that allows masked diffusion models to propagate latent information across denoising steps, overcoming the hard reset problem and improving performance-latency trade-offs. The method is shown to outperform standard supervised finetuning on coding tasks while reducing inference latency by up to 32%.
Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models
Cyclic denoising is introduced as a novel extraction attack that reveals ultrastable memorized training images in diffusion models by repeatedly noising and denoising samples. The technique requires no gradients or weight inspection and has implications for privacy auditing.