Tag
An uncensored version of the GLM5.2 754B parameter model (231GB GGUF) was successfully deployed on a Mac Studio M3 Ultra with 512GB RAM, achieving approximately 3.6 tokens/s.
A user reports that running Hermes Agent as a game master using DeepSeek V4 Flash locally on M3 Ultra yields nearly identical quality to the online version.
AWS has secured a large number of Apple's M3 Ultra Mac Studio units for cloud services, while regular consumers face continued shortages and limited availability.
Antirez reports that DeepSeek v4 PRO runs well on a Mac Studio M3 Ultra with 512GB RAM using 2-bit quantization, achieving 130 t/s prefill and 13 t/s generation.
A researcher shares their home compute setup for MLX and AI research, featuring M3 Ultra with 512GB, RTX PRO 6000 with 96GB, and M3 Max with 96GB for model porting and stress testing.