Tag
ZCube is a new network architecture that flattens the topology and mixes single/multi-rail access to optimize KV Cache transmission in long-context and PD separation scenarios. In the GLM-5.1 production cluster, it achieved a 33% reduction in switch/optical module costs, a 15% increase in GPU inference throughput, and a 40.6% decrease in TTFT P99.
This paper presents ZCube, a novel network architecture developed by Z.ai, Harnets.AI, and Tsinghua University to address topology-induced congestion in Prefill-Decode disaggregated LLM inference clusters. Production deployments on GLM-5.1 coding workloads achieved a 33% reduction in network CapEx, 15% throughput improvement, and 40.6% reduction in TTFT P99 latency.
Jane Street allowed Dwarkesh Patel to tour their new Texas data center with 4,032 GPUs, each rack pulling 140 kilowatts, highlighting the massive scale and unique networking choices.
Jane Street revealed inside views of its AI training center in Texas, housing 4,032 GPUs, 8,000 kilometers of fiber optics, and a full liquid cooling system, while recounting the 20-year evolution from a humble start with six Dell hosts to today's extreme trading system.
OpenAI is launching Stargate UK, an AI infrastructure partnership with NVIDIA and Nscale to build sovereign compute capabilities in the United Kingdom, with plans for up to 31,000 GPUs over time. The initiative supports the UK's national AI strategy and will enable OpenAI models to run on local UK compute for critical public services, regulated industries, and national security use cases.