gpu-clusters

Tag

Cards List
#gpu-clusters

Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)

OpenAI Blog · 5d ago Cached

OpenAI has released MRC (Multipath Reliable Connection), a novel networking protocol developed with industry partners to improve performance and resilience in large-scale AI training clusters. The specification was published via the Open Compute Project to standardize infrastructure for efficient supercomputer operations.

0 favorites 0 likes
#gpu-clusters

Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid

NVIDIA Blog · 2026-03-25 Cached

Emerald AI demonstrated how power-flexible AI factories can autonomously adjust electricity consumption to stabilize grid demand, using NVIDIA GPUs and infrastructure at a London data center to absorb peak power surges without disrupting critical workloads.

0 favorites 0 likes
#gpu-clusters

Scaling Kubernetes to 7,500 nodes

OpenAI Blog · 2021-01-25 Cached

OpenAI shares detailed lessons learned from scaling a single Kubernetes cluster to 7,500 nodes to support large machine learning workloads, covering networking, scheduling, and infrastructure challenges. The post builds on their earlier experience scaling to 2,500 nodes and aims to help the broader Kubernetes community.

0 favorites 0 likes
← Back to home

Submit Feedback