For over a decade, we've accepted that end-to-end backprop is the only way to train deep networks (1 minute read)

TLDR AI 05/29/26, 12:00 AM Papers

training backpropagation diffusion memory-efficiency deep-learning research

Summary

Sakana AI presents DiffusionBlocks, a method that trains neural networks block-wise by interpreting forward passes as diffusion denoising, significantly reducing memory requirements compared to traditional end-to-end backpropagation.

End-to-end backpropagation has to hold the entire network in memory at once, which is a major reason AI training is hitting a resource wall. Sakana AI has found a way to break the network into blocks and train them independently. The trick is to treat the network's forward pass like a diffusion model denoising a signal, so each block can be trained on its own. This slashes the memory needed to train deep models.
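
To make the block-wise idea concrete, here is a toy sketch in plain Python. It is not Sakana AI's implementation. It assumes that each block is given its own range of noise levels and learns to denoise only within that range. The scalar "block" (`x0_hat = w * x`), the three noise ranges, and every name in the code are hypothetical, chosen only to show that each block can be trained without touching the parameters or gradients of any other block.

```python
import random

# Hypothetical split of the noise schedule into three blocks (an assumption,
# not taken from the DiffusionBlocks paper).
NOISE_RANGES = [(0.1, 0.5), (0.5, 1.5), (1.5, 4.0)]


def train_block(sigma_lo, sigma_hi, steps=5000, lr=0.002, seed=0):
    """Train one toy block on its own noise range, independently of the others."""
    rng = random.Random(seed)
    w = 0.0  # the block's only parameter: a scalar denoiser x0_hat = w * x
    for _ in range(steps):
        x0 = rng.gauss(0.0, 1.0)                 # clean sample, unit variance
        sigma = rng.uniform(sigma_lo, sigma_hi)  # noise level owned by this block
        x = x0 + sigma * rng.gauss(0.0, 1.0)     # noisy input seen by the block
        grad = 2.0 * (w * x - x0) * x            # d/dw of the loss (w*x - x0)**2
        w -= lr * grad                           # plain SGD step on this block only
    return w


def train_all_blocks():
    # Each call keeps only one block's parameter and gradient alive, so the
    # blocks could be trained one after another or on separate devices.
    return [train_block(lo, hi, seed=i) for i, (lo, hi) in enumerate(NOISE_RANGES)]


if __name__ == "__main__":
    for (lo, hi), w in zip(NOISE_RANGES, train_all_blocks()):
        print(f"sigma in [{lo}, {hi}]: learned w = {w:.3f}")
```

In this toy setting the learned weight should shrink as the noise range grows, since the best linear denoiser for unit-variance data relies less on a noisier input. The point for memory is that no gradient ever flows between blocks, so only one block's training state needs to be held at a time.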

Original Article

Cached at: 05/29/26, 06:32 PM

Sakana AI (@SakanaAILabs): Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

https://t.co/45Xvzl2qQS

What if we didn’t have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the

Similar Articles

@simplifyinAI: BREAKING: NVIDIA proved back-propagation isn't the only way to build an AI. Billion-parameter models were trained witho…

X AI KOLs Timeline

NVIDIA and Oxford University introduced EGGROLL, a scalable evolution strategies algorithm that trains billion-parameter models without backpropagation, using only integers and parallel mutations.

DiffusionBench: On Holistic Evaluation of Diffusion Transformers

Hugging Face Daily Papers

Researchers introduce NanoGen, a unified framework for training and evaluating diffusion transformers, and propose DiffusionBench, a holistic benchmark combining ImageNet class-conditional and text-to-image generation to better assess progress in generative modeling.

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Papers with Code Trending

SANA-Video is a small diffusion model that efficiently generates high-resolution, long videos using linear attention and a constant-memory KV cache, achieving competitive performance at dramatically lower cost and faster speed compared to existing models.

Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

arXiv cs.LG

This paper introduces Learned Relay Representations (Relay), a method that allows masked diffusion models to propagate latent information across denoising steps, overcoming the hard reset problem and improving performance-latency trade-offs. The method is shown to outperform standard supervised finetuning on coding tasks while reducing inference latency by up to 32%.

Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models