@ivanfioravanti: Interesting video of M5 Max, on impact of Low, Automatic and High power modes on inference. - No external monitor attac…
Summary
A performance test demonstrates the impact of Low, Automatic, and High power modes on LLM inference speed on an M5 Max MacBook, showing significant differences in token generation rates and power consumption.
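For readers who want to reproduce this kind of measurement, here is a minimal sketch using the mlx-lm Python package (the model repo is illustrative, not necessarily the one from the video); the power mode itself is switched between runs in System Settings > Battery > Energy Mode.

```python
# Minimal sketch: measure generation speed with mlx-lm under each power mode.
# Assumes `pip install mlx-lm`; the model repo below is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")

# Rerun once per Energy Mode (Low / Automatic / High, set in System
# Settings > Battery). verbose=True prints prompt and generation
# tokens-per-sec, the numbers being compared in the video.
generate(
    model,
    tokenizer,
    prompt="Explain the difference between latency and throughput.",
    max_tokens=256,
    verbose=True,
)
```

Running the same script under each Energy Mode and comparing the reported generation tokens-per-sec reproduces the comparison shown in the video.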
Similar Articles
@ivanfioravanti: Apple M5 Max + MLX = raw power! Look at this demo I'm playing with "FasterLivePortrait-MLX" I started with MPS but resu…
The author demonstrates that migrating a LivePortrait implementation from PyTorch's MPS backend to Apple's MLX framework on an M5 Max chip yields a significant speedup.
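As a hedged illustration of the kind of MPS-vs-MLX gap being described (not the author's FasterLivePortrait-MLX code), a toy matmul micro-benchmark:

```python
# Toy micro-benchmark contrasting PyTorch-on-MPS with MLX for one matmul.
# Illustrative only -- not the FasterLivePortrait-MLX port from the tweet.
import time

import mlx.core as mx
import torch

N = 4096

# PyTorch on the Metal Performance Shaders (MPS) backend.
a = torch.randn(N, N, device="mps")
torch.mps.synchronize()
t0 = time.perf_counter()
b = a @ a
torch.mps.synchronize()  # MPS ops are async; sync before reading the clock
print(f"MPS matmul: {time.perf_counter() - t0:.4f}s")

# MLX: arrays are lazy, so force evaluation with mx.eval().
x = mx.random.normal((N, N))
mx.eval(x)
t0 = time.perf_counter()
y = x @ x
mx.eval(y)  # materialize the result before timing
print(f"MLX matmul: {time.perf_counter() - t0:.4f}s")
```

Both backends are asynchronous, so the explicit synchronize/eval calls are what make the timings meaningful.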
@AlexJonesax: Qwen3.6-27b absolutely flying on a M5Max with MTP enabled & oMLX inference.
A community report highlights fast inference for the Qwen3.6-27b model on M5 Max hardware using oMLX with multi-token prediction (MTP) enabled.
Localmaxxing (3 minute read)
The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It concludes that local models are 2x faster for routine tasks, making them a practical choice for half of daily workloads despite a slight capability gap.
@stevibe: MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:…
A user tested MiniMax M2.7 (a 230B-parameter model) using Unsloth's UD-IQ3_XXS quantization (80GB) across four hardware configurations (RTX 4090, RTX 5090, RTX PRO 6000, and a DGX setup), reporting token-generation speed and time-to-first-token metrics.
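For context, here is a sketch of how time-to-first-token and generation speed are typically measured for a GGUF quant; it assumes llama-cpp-python (whether the original test used llama.cpp is an assumption), and the model path is hypothetical:

```python
# Hedged sketch: measure TTFT and tokens/sec for a GGUF quant with
# llama-cpp-python. The model path is hypothetical, and whether the
# original test used llama.cpp at all is an assumption.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="MiniMax-M2.7-UD-IQ3_XXS.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=4096,
)

t_start = time.perf_counter()
t_first = None
n_tokens = 0
# Streaming yields roughly one chunk per generated token.
for _chunk in llm("Summarize the history of the transistor.",
                  max_tokens=256, stream=True):
    if t_first is None:
        t_first = time.perf_counter()
    n_tokens += 1

print(f"Time to first token: {t_first - t_start:.2f}s")
print(f"Generation: {n_tokens / (time.perf_counter() - t_first):.1f} tok/s")
```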
@alexocheema: Running Qwen3.6 35B (vision) on 2 x M5 Max MacBook Pro with RDMA over Thunderbolt 5. It describes the image and identif…
A demo shows Qwen3.6 35B vision model running across two M5 Max MacBook Pros connected via RDMA over Thunderbolt 5, achieving near-instant responses with prefix caching. The model correctly identifies Apple Park but misidentifies a person in the image.
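The demo likely relies on a purpose-built stack, but as a rough illustration of the primitive involved, MLX exposes a distributed API whose ring backend can run over Thunderbolt; the launch invocation below is an assumption:

```python
# Hedged sketch of MLX's distributed primitives, the kind of machinery that
# multi-Mac inference rigs build on. Typically launched with the mlx.launch
# helper, e.g. `mlx.launch --hosts mac1,mac2 demo.py` (the exact flags and
# the Thunderbolt/ring configuration are assumptions).
import mlx.core as mx

group = mx.distributed.init()  # joins the group set up by the launcher
x = mx.ones(4)

# Sum the array across all participating machines; with two hosts the
# result on each rank is a vector of 2s.
y = mx.distributed.all_sum(x)
mx.eval(y)
print(f"rank {group.rank()} of {group.size()}: {y}")
```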