@ollama: GLM 5.2 on Ollama's cloud just doubled GPU capacity to handle the volume of usage! This is all US based, and running on…
Summary
Ollama doubled the GPU capacity serving GLM 5.2 on its cloud to keep up with usage volume. The service is US-based and runs on NVIDIA B300 Blackwell GPUs, and Ollama stresses its commitment to privacy and open models.
Cached at: 06/20/26, 10:24 PM
GLM 5.2 on Ollama’s cloud just doubled GPU capacity to handle the volume of usage!
This is all US based, and running on NVIDIA B300 Blackwell GPUs. We believe privacy matters!
Let’s go open models! ❤️
Similar Articles
Giving GLM-5.2 a spin locally on CPU only! (poor man's rig for big models)
A user runs GLM-5.2 locally on CPU only, demonstrating how to run a large model on a modest setup.
GLM 5.2 API is live, weights are on HF, and ollama has it already
GLM 5.2 has been released with open weights under the MIT license on HuggingFace and is available via API and Ollama. Its benchmark results are competitive, trailing Opus 4.8 by one point and edging out GPT-5.5 by one.
@0xSero: Rejoice fellow 6000 enjoyers. We have GLM at home
A turnkey Docker setup to serve the GLM-5.2-NVFP4-REAP-469B model on 4× RTX PRO 6000 Blackwell GPUs using vLLM, with detailed instructions and configuration options.
@UnslothAI: GLM-5.2 can now be run locally! The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% siz…
UnslothAI announces that GLM-5.2, Z.ai's strongest open model with 744B parameters, can now be run locally. Dynamic GGUF quantization reduces its size by ~84% to 239GB while retaining ~82% accuracy. The quantized model fits on 256GB Macs and supports long-context, reasoning, and agentic tasks.
@tom_doerr: Runs 70B LLMs on single 4GB GPU https://github.com/lyogavin/airllm
AirLLM is an open-source tool that optimizes inference memory usage, enabling 70B LLMs to run on a single 4GB GPU without quantization, and supports 405B models on 8GB VRAM.