@ivanfioravanti: Interesting video of M5 Max, on impact of Low, Automatic and High power modes on inference. - No external monitor attac…
Summary
A performance test demonstrates the impact of Low, Automatic, and High power modes on LLM inference speed on an M5 Max MacBook, showing significant differences in token generation rates and power consumption.
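For readers who want to reproduce this kind of measurement, here is a minimal sketch using the mlx-lm Python package (the model repo is illustrative, not necessarily the one from the video); the power mode itself is switched between runs in System Settings > Battery > Energy Mode.

```python
# Minimal sketch: measure generation speed with mlx-lm under each power mode.
# Assumes `pip install mlx-lm`; the model repo below is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")

# Rerun once per Energy Mode (Low / Automatic / High, set in System
# Settings > Battery). verbose=True prints prompt and generation
# tokens-per-sec, the numbers being compared in the video.
generate(
    model,
    tokenizer,
    prompt="Explain the difference between latency and throughput.",
    max_tokens=256,
    verbose=True,
)
```

Running the same script under each Energy Mode and comparing the reported generation tokens-per-sec reproduces the comparison shown in the video.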
Similar Articles
@ivanfioravanti: Apple M5 Max + MLX = raw power! Look at this demo I'm playing with "FasterLivePortrait-MLX" I started with MPS but resu…
The author demonstrates that migrating a LivePortrait implementation from PyTorch's MPS backend to Apple's MLX framework on an M5 Max chip yields a significant speedup.
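As a hedged illustration of the kind of MPS-vs-MLX gap being described (not the author's FasterLivePortrait-MLX code), a toy matmul micro-benchmark:

```python
# Toy micro-benchmark contrasting PyTorch-on-MPS with MLX for one matmul.
# Illustrative only -- not the FasterLivePortrait-MLX port from the tweet.
import time

import mlx.core as mx
import torch

N = 4096

# PyTorch on the Metal Performance Shaders (MPS) backend.
a = torch.randn(N, N, device="mps")
torch.mps.synchronize()
t0 = time.perf_counter()
b = a @ a
torch.mps.synchronize()  # MPS ops are async; sync before reading the clock
print(f"MPS matmul: {time.perf_counter() - t0:.4f}s")

# MLX: arrays are lazy, so force evaluation with mx.eval().
x = mx.random.normal((N, N))
mx.eval(x)
t0 = time.perf_counter()
y = x @ x
mx.eval(y)  # materialize the result before timing
print(f"MLX matmul: {time.perf_counter() - t0:.4f}s")
```

Both backends are asynchronous, so the explicit synchronize/eval calls are what make the timings meaningful.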
@AlexJonesax: Qwen3.6-27b absolutely flying on a M5Max with MTP enabled & oMLX inference.
A community report highlights fast inference for the Qwen3.6-27b model on M5 Max hardware using oMLX with multi-token prediction (MTP) enabled.
Localmaxxing (3 minute read)
The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It concludes that local models are 2x faster for routine tasks, making them a practical choice for half of daily workloads despite a slight capability gap.
@stevibe: MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:…
A user tested MiniMax M2.7 (a 230B-parameter model) using Unsloth's UD-IQ3_XXS quantization (80GB) across four hardware configurations (RTX 4090, RTX 5090, RTX PRO 6000, and a DGX setup), reporting token-generation speed and time-to-first-token metrics.
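For context, here is a sketch of how time-to-first-token and generation speed are typically measured for a GGUF quant; it assumes llama-cpp-python (whether the original test used llama.cpp is an assumption), and the model path is hypothetical:

```python
# Hedged sketch: measure TTFT and tokens/sec for a GGUF quant with
# llama-cpp-python. The model path is hypothetical, and whether the
# original test used llama.cpp at all is an assumption.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="MiniMax-M2.7-UD-IQ3_XXS.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=4096,
)

t_start = time.perf_counter()
t_first = None
n_tokens = 0
# Streaming yields roughly one chunk per generated token.
for _chunk in llm("Summarize the history of the transistor.",
                  max_tokens=256, stream=True):
    if t_first is None:
        t_first = time.perf_counter()
    n_tokens += 1

print(f"Time to first token: {t_first - t_start:.2f}s")
print(f"Generation: {n_tokens / (time.perf_counter() - t_first):.1f} tok/s")
```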
@alexocheema: Running Qwen3.6 35B (vision) on 2 x M5 Max MacBook Pro with RDMA over Thunderbolt 5. It describes the image and identif…
A demo shows Qwen3.6 35B vision model running across two M5 Max MacBook Pros connected via RDMA over Thunderbolt 5, achieving near-instant responses with prefix caching. The model correctly identifies Apple Park but misidentifies a person in the image.
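The demo likely relies on a purpose-built stack, but as a rough illustration of the primitive involved, MLX exposes a distributed API whose ring backend can run over Thunderbolt; the launch invocation below is an assumption:

```python
# Hedged sketch of MLX's distributed primitives, the kind of machinery that
# multi-Mac inference rigs build on. Typically launched with the mlx.launch
# helper, e.g. `mlx.launch --hosts mac1,mac2 demo.py` (the exact flags and
# the Thunderbolt/ring configuration are assumptions).
import mlx.core as mx

group = mx.distributed.init()  # joins the group set up by the launcher
x = mx.ones(4)

# Sum the array across all participating machines; with two hosts the
# result on each rank is a vector of 2s.
y = mx.distributed.all_sum(x)
mx.eval(y)
print(f"rank {group.rank()} of {group.size()}: {y}")
```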