Explains the communication model for multi-GPU systems, covers the trade-off between latency and bandwidth, and compares minimum spanning tree (MST) and ring algorithms for collective operations such as broadcast.
A new in-depth blog post explains collective communication for multiple GPUs, covering primitives like broadcast and reduce, and helps beginners understand how to scale experiments.