activation-compression

Tag

Cards List
#activation-compression

ResBM: a new transformer-based architecture for low-bandwidth pipeline-parallel training, achieving 128× activation compression [R]

Reddit r/MachineLearning · 2026-04-16

ResBM introduces a transformer-based architecture with residual encoder-decoder bottlenecks for pipeline-parallel training, achieving 128× activation compression while maintaining convergence. The work advances decentralized, internet-grade distributed training by reducing inter-stage communication overhead.

0 favorites 0 likes
← Back to home

Submit Feedback