model-hacks

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system.

Reddit r/LocalLLaMA · yesterday

A detailed blog post on speeding up GLM-5.2 inference on a dual Grace Hopper (GH200) system from about 2.5 tok/s to over 50 tok/s, a roughly 20x gain. The two main changes are eliminating the model's cross-module traffic and grafting an FP8 MTP (multi-token prediction) head onto the INT4 base model.