@stevibe: MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:…

X AI KOLs Following 04/18/26, 03:03 PM News

llm local-inference quantization benchmark consumer-hardware large-language-model

Summary

A user benchmarked MiniMax M2.7 (a 230B-parameter model) with Unsloth's UD-IQ3_XXS quantization (80GB) on four hardware configurations: 4x RTX 4090, 4x RTX 5090, 1x RTX PRO 6000, and a DGX setup. The post reports token generation speed (tok/s) and time to first token (TTFT) for each rig.

MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:

4x RTX 4090 (96GB): 71.52 tok/s, TTFT 1045ms
4x RTX 5090 (128GB): 120.54 tok/s, TTFT 725ms
1x RTX PRO 6000 (96GB): 118.74 tok/s, TTFT 765ms
DGX
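The numbers in the post allow a quick back-of-the-envelope check of why an 80GB file fits on these rigs. The sketch below is illustrative only: it assumes GB means 10^9 bytes, treats the 230B parameter count as exact, and uses hypothetical helper names (`bits_per_weight`, `headroom_gb`). The headroom figure is simply total VRAM minus file size; real memory use also depends on context length, KV cache, and runtime overhead, none of which the post specifies.

```python
# Figures quoted in the post. GB is assumed to mean 10**9 bytes.
PARAMS = 230e9   # MiniMax M2.7 parameter count
QUANT_GB = 80    # size of the UD-IQ3_XXS file

# rig name -> (total VRAM in GB, tok/s, TTFT in ms), as reported
RIGS = {
    "4x RTX 4090": (96, 71.52, 1045),
    "4x RTX 5090": (128, 120.54, 725),
    "1x RTX PRO 6000": (96, 118.74, 765),
}


def bits_per_weight(size_gb: float, params: float) -> float:
    """Average bits stored per parameter for a file of size_gb gigabytes."""
    return size_gb * 1e9 * 8 / params


def headroom_gb(vram_gb: float, model_gb: float = QUANT_GB) -> float:
    """VRAM left after loading the weights (KV cache and overhead not modeled)."""
    return vram_gb - model_gb


if __name__ == "__main__":
    print(f"average: {bits_per_weight(QUANT_GB, PARAMS):.2f} bits/weight")
    for name, (vram, tok_s, ttft_ms) in RIGS.items():
        print(f"{name}: {headroom_gb(vram):.0f} GB headroom, "
              f"{tok_s} tok/s, TTFT {ttft_ms} ms")
```

Under these assumptions the quant averages a little under 3 bits per weight, and the two 96GB rigs leave about 16GB for everything besides the weights.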


Similar Articles

@TeksEdge: With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed: MiniMax M3 (428…

X AI KOLs Following

MiniMax M3, a 428B MoE model with ~23B active parameters, is now open source. It offers ultra-long context (up to 1M) and efficiency improvements, with various quantized sizes and VRAM requirements for local deployment.

MiniMax2.7 @47tg 1200pp

Reddit r/LocalLLaMA

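A post reporting MiniMax M2.7 performance figures of 47 tg and 1200 pp, which in r/LocalLLaMA shorthand most likely means about 47 tokens/s token generation and 1200 tokens/s prompt processing.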

@no_stp_on_snek: Config-I quant of MiniMax-M3 is up on MLX. 2-bit experts, 4-bit attention, 8-bit boundaries + embeddings, f16 router. ~…

X AI KOLs Following

Announces the release of a Config-I quantization of MiniMax-M3 on MLX, using 2-bit experts and 4-bit attention to reduce the 427B MoE model from 869GB to ~167GB, though the quant is untested and requires a patch for mlx_lm.

Dual dgx spark (Asus GX10) MiniMax M2.7 results

Reddit r/LocalLLaMA

User benchmarks dual Asus GX10 (DGX Spark) running MiniMax-M2.7-AWQ-4bit, achieving 30–40 tokens/s while drawing only ~100 W each, replacing noisy multi-GPU rigs.

JANGQ-AI/MiniMax-M2.7-JANGTQ_K : mixed-bit quant of MiniMax M2.7 - 74 GB on disk

Reddit r/LocalLLaMA

Release of a mixed-bit quantized version of the MiniMax M2.7 model, optimized to 74 GB for efficient local inference on Apple Silicon devices.
