gpu-scalability


PyTorch Distributed: Experiences on Accelerating Data Parallel Training

Papers with Code Trending · 2020-06-28

This paper details the design and optimization of PyTorch's distributed data parallel module, highlighting techniques like gradient bucketing and computation-communication overlap that enable near-linear scalability across 256 GPUs.
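
As a rough illustration of gradient bucketing, the sketch below groups parameters into size-capped buckets so that each bucket could be reduced with one collective call instead of one call per gradient. It is a minimal sketch, not PyTorch's implementation: the function name `assign_buckets`, the byte sizes, and the cap are hypothetical, and it assumes gradients become ready in roughly the reverse order of the parameter list. In PyTorch itself the cap is set through the `bucket_cap_mb` argument of `DistributedDataParallel`.

```python
def assign_buckets(param_sizes, bucket_cap):
    """Group parameter indices into buckets holding at most bucket_cap bytes.

    Hypothetical helper for illustration only. Parameters are visited in
    reverse order, on the assumption that the backward pass produces
    gradients for the last layers first.
    """
    buckets = []
    current, current_size = [], 0
    for idx in reversed(range(len(param_sizes))):
        size = param_sizes[idx]
        # Close the current bucket if adding this parameter would exceed the cap.
        if current and current_size + size > bucket_cap:
            buckets.append(current)
            current, current_size = [], 0
        # A parameter larger than the cap still gets a bucket of its own.
        current.append(idx)
        current_size += size
    if current:
        buckets.append(current)
    return buckets


if __name__ == "__main__":
    sizes = [40, 10, 30, 20, 50]  # made-up gradient sizes in bytes
    print(assign_buckets(sizes, bucket_cap=60))
```

Because the first bucket fills with the gradients that are computed earliest in the backward pass, its reduction can start while the remaining gradients are still being computed, which is where the overlap of computation and communication comes from.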
