mac-studio

#mac-studio

@cryptoresetlife: The local uncensored GLM5.2 754B-parameter model (231GB) has been successfully deployed on my Mac Studio M3 Ultra 512GB @support_huihui 948 tokens / 4 min 25 s = 948 / 265 ≈ 3.6 …

X AI KOLs Timeline ↗ · 4d ago Cached

An uncensored version of the GLM5.2 754B parameter model (231GB GGUF) was successfully deployed on a Mac Studio M3 Ultra with 512GB RAM, achieving approximately 3.6 tokens/s.
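
The throughput arithmetic quoted in the post can be checked with a short sketch. The function names are hypothetical, and the bits-per-weight figure rests on two assumptions that the post does not state: that 1 GB means 10^9 bytes and that the whole 231GB file is model weights.

```python
def tokens_per_second(tokens: int, minutes: int, seconds: int) -> float:
    """Average generation rate over a wall-clock duration."""
    elapsed = minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return tokens / elapsed


def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Rough average bits per parameter.

    Assumes 1 GB = 10^9 bytes and that the file holds only weights.
    """
    return file_size_gb * 8 / params_billion


# Figures from the post: 948 tokens in 4 min 25 s, 231GB file, 754B parameters.
rate = tokens_per_second(948, 4, 25)
bpw = bits_per_weight(231, 754)

print(f"{rate:.2f} tokens/s")        # 3.58 tokens/s
print(f"{bpw:.2f} bits per weight")  # 2.45 bits per weight
```

Under those assumptions, the file size works out to roughly 2.45 bits per weight on average.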

#mac-studio

@antirez: Based on what I'm saying with GLM 5.2 implementation inside DwarfStar, there is 90% of probability I'll merge the branc…

X AI KOLs Following ↗ · 2026-06-24

Antirez says there is a 90% probability he will merge the branch implementing GLM 5.2 in DwarfStar. He suggests it could become the best model for the 512GB Mac Studio and might run across distributed 128GB MacBooks with 2-bit quantization.

#mac-studio

GLM 5.2 on Mac Studio Speedup PR

Reddit r/LocalLLaMA ↗ · 2026-06-23

GLM 5.2 delivers major performance gains on Mac Studio with 512GB RAM, achieving prefill speeds above 100 t/s at high context lengths and enabling 4-bit quantization for contexts over 100k tokens, as detailed in a pull request by the oMLX creator.

#mac-studio

@karminski3: Thinking of buying a Mac to run large models? This is a deterrent post. Actually, the estimation method is simple. Even if you buy a MacStudio to run the Qwen3.6-27B 4bit quantized version, then enable DFlash to use Qwen's built-in speculative decoding, it only reaches 65 token/s. And now most large models can run at 40 token/s…

X AI KOLs Timeline ↗ · 2026-06-22 Cached

The author calculates the token cost and break-even period of running large models on a Mac Studio, concluding that it is not cost-effective for ordinary users to buy a Mac for personal large model use, and suggests that using APIs or renting GPUs is more economical.

#mac-studio

@jun_song: If Apple drops the M5 Ultra Mac Studio soon, I am ordering it with max RAM instantly. No time to hesitate. The M3 Ultra…

X AI KOLs Following ↗ · 2026-06-21 Cached

The author states they will immediately order an M5 Ultra Mac Studio with max RAM if Apple releases it soon, citing the M3 Ultra's high resale value and the M5's inference performance leap as reasons.

#mac-studio

@AlexFinn: I can't believe this is real I have GLM 5.2 running 100% locally on my Mac Studio. 2 bit quant. The results I'm getting…

X AI KOLs Following ↗ · 2026-06-18 Cached

A user reports running GLM 5.2 locally on a Mac Studio with 2-bit quantization, claiming it outperforms Opus 4.8 and enables free, private superintelligence for coding and agent tasks.

#mac-studio

@pcuenq: GLM 5.2 has just been released Here it's already running with MLX on two Mac Studios (M3 Ultra). This is comparable to …

X AI KOLs Timeline ↗ · 2026-06-16 Cached

GLM 5.2, an open-weight AI model comparable to top closed models, has been released and is now running on MLX on two Mac Studios (M3 Ultra).

#mac-studio

I compared all specs of the major GPUs/machines that are being used here, because bandwidth is not everything. Some of ya'll need a reality check.

Reddit r/LocalLLaMA ↗ · 2026-05-30

The author compares various GPUs for LLM inference, critiquing common benchmarks and emphasizing the importance of prefill performance over generation speed, offering recommendations for different budgets and use cases.

#mac-studio

"AWS secures rare Mac Studios while ordinary Apple customers remain completely locked out"

Reddit r/LocalLLaMA ↗ · 2026-05-20

AWS has secured a large number of Apple's M3 Ultra Mac Studio units for cloud services, while regular consumers face continued shortages and limited availability.

#mac-studio

@FinanceYF5: A 10-year-old blogger said: "The future belongs to those who understand Tokens." He switched to Mac Studio not for gaming, but to run multiple AI Agents working together. He compared the AI industry chain to a cake, breaking it down layer by layer from energy to application. "Tokens are the hard currency of the AI era…

X AI KOLs Following ↗ · 2026-05-18 Cached

A 10-year-old blogger shared his understanding of the AI era, believing Tokens are hard currency, and runs multiple AI Agents working together.

#mac-studio

@antirez: I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit qua…

X AI KOLs Timeline ↗ · 2026-05-17 Cached

Antirez reports that DeepSeek v4 PRO runs well on a Mac Studio M3 Ultra with 512GB RAM using 2-bit quantization, achieving 130 t/s prefill and 13 t/s generation.
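
To put the two reported rates in perspective, here is a rough latency sketch. The request size (10,000 prompt tokens, 500 generated tokens) is a hypothetical example and does not come from the post. The sketch also assumes both rates stay constant regardless of context length, which is a simplification.

```python
def estimate_latency_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float,
    generation_tps: float,
) -> float:
    """Prefill time plus generation time, assuming constant rates."""
    if prefill_tps <= 0 or generation_tps <= 0:
        raise ValueError("rates must be positive")
    return prompt_tokens / prefill_tps + output_tokens / generation_tps


# Rates from the post: 130 t/s prefill, 13 t/s generation.
# The token counts below are illustrative assumptions.
total = estimate_latency_seconds(10_000, 500, prefill_tps=130, generation_tps=13)
print(f"{total:.1f} s")  # 115.4 s
```

With these assumed sizes, prefill accounts for about 77 of the 115 seconds, so long prompts dominate the wait even when the generation rate looks acceptable.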

#mac-studio

@rohanpaul_ai: China: a 10-year-old casually gets a Mac Studio for “raising lobsters,” aka letting multiple AI agents work together li…

X AI KOLs Following ↗ · 2026-05-17 Cached

A 10-year-old in China uses a Mac Studio to run multiple AI agents, highlighting the emergence of AI-native children who understand tokens and automation.

#mac-studio

@jun_song: Best mid-range local LLM hardware : DGX Spark vs Mac Studio M5 Max 128GB (upcoming) Price: $4.7k (cheaper if used or OE…

X AI KOLs Following ↗ · 2026-05-16 Cached

A comparison of DGX Spark vs Mac Studio M5 Max for running local LLMs, highlighting decode speed, prefill performance, RAM, power consumption, and cost. The Mac wins on decode bandwidth but DGX is faster for prefill and supports batching.

#mac-studio

@ttasanen: Just fired up DS4 by @antirez on my Mac Studio M3 Ultra 256GB and man, it’s seriously impressive. A clean, purpose-buil…

X AI KOLs Timeline ↗ · 2026-05-11 Cached

DS4 is a specialized inference engine by antirez designed to run DeepSeek V4 Flash locally on high-end Mac hardware, featuring optimized KV cache handling and 1M context support.

#mac-studio

Apple Removes 256GB M3 Ultra Mac Studio Model From Online Store

Reddit r/LocalLLaMA ↗ · 2026-05-09

Apple has removed the 256GB M3 Ultra Mac Studio configuration from its online store, raising speculation about the memory configurations of upcoming models.

#mac-studio

@MemoryReboot_: Why Mac Studio is a trap for local AI - Large unified memory looks sexy on paper - Great for chatbots, terrible for 24/…

X AI KOLs Timeline ↗ · 2026-05-09

The article argues that the Mac Studio is a poor choice for 24/7 local AI workflows due to the lack of CUDA support and non-upgradable hardware, despite its large unified memory.

#mac-studio

@Michaelzsguo: Two days ago, I asked whether I should buy a Mac Studio for local LLMs. I was genuinely humbled by how much great feedb…

X AI KOLs Timeline ↗ · 2026-05-09

The author shares a synthesized buying guide for hardware suitable for running local LLMs, comparing Mac Studio, NVIDIA, and AMD options based on community feedback.

#mac-studio

@songjunkr: Sharing my local LLM setup for personal use: Equipment: MacStudio M2 Ultra 64gb Model on load - SuperQwen3.6 35b mlx 4b…

X AI KOLs Timeline ↗ · 2026-04-20 Cached

A user shared their personal local LLM stack running on a MacStudio M2 Ultra 64 GB, combining SuperQwen3.6-35b-mlx-4bit, Ernie Image Turbo, and multiple helper models for coding and chat.

#mac-studio

Bloomberg: No Mac Studios until at least October

Reddit r/LocalLLaMA ↗ · 2026-04-19

Bloomberg reports that new Mac Studio models won't arrive until at least October 2026, raising questions about when Apple hardware will be capable of running models like DeepSeek v4.
