GLM 5.2 on Mac Studio Speedup PR

Reddit r/LocalLLaMA Models

Summary

GLM 5.2 reportedly brings major performance gains on a Mac Studio with 512 GB of RAM: prefill speeds stay above 100 t/s at much higher context lengths, and its smaller memory footprint lets 4-bit quants run at well over 100k tokens of context. Details are in a pull request by the oMLX creator.

Just a heads up for the lucky few 512 GB Mac owners: GLM 5.2 is a game changer because prefill speeds stay above 100 t/s at much higher context, and it also takes less space, so we can run 4-bit quants well above 100k context. See this PR by the oMLX creator: https://github.com/jundot/omlx/pull/1984
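For a rough feel of what "above 100 t/s" prefill means in practice, here is a back-of-the-envelope sketch. It assumes a constant prefill rate, which is a simplification (real throughput varies with context length), and the function name `prefill_seconds` and the 32k example size are made up for illustration. Since the post says speeds stay above 100 t/s, the numbers are worst-case wait times at that floor.

```python
def prefill_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds to ingest a prompt, assuming a constant prefill rate (a simplification)."""
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


# 100 t/s is the floor quoted in the post; the prompt sizes are illustrative.
for tokens in (32_000, 100_000):
    secs = prefill_seconds(tokens, 100.0)
    print(f"{tokens:>7} tokens at 100 t/s: {secs:.0f} s ({secs / 60:.1f} min)")
```

So a full 100k-token prompt at the 100 t/s floor is about 1000 seconds of prefill, which is why holding that rate at long context matters so much.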
