Ollama has doubled GPU capacity for GLM 5.2 on its US cloud, which runs on NVIDIA B300 (Blackwell) GPUs; the announcement emphasizes privacy and open models.
Hippocratic AI has partnered with Modular to run large language model inference on the MAX framework. At scale on NVIDIA B300 GPUs, the deployment achieves a time to first token (TTFT) under 500 ms, roughly 30% lower P99 latency, and roughly 22% lower mean latency, while remaining portable to AMD hardware.