@stevibe: MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:…
Summary
A user tested MiniMax M2.7, a 230B-parameter model, using Unsloth's UD-IQ3_XXS quantization (80GB) on four hardware configurations (RTX 4090, RTX 5090, RTX PRO 6000, and DGX setups), reporting token-generation speed and time-to-first-token for each.
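One rough way to read the 80GB figure is as an average bit-width spread across all 230B weights. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and ignores file metadata and any non-weight tensors, so the result is only an approximation. The function name `bits_per_weight` is illustrative, not part of any tool.

```python
def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes (metadata ignored)."""
    total_bits = file_size_gb * 1e9 * 8        # file size in bits
    return total_bits / (params_billions * 1e9)  # divide by the raw parameter count


# UD-IQ3_XXS: 80 GB for a 230B-parameter model
print(f"{bits_per_weight(80, 230):.2f} bits/weight")  # prints "2.78 bits/weight"
```

The same arithmetic works for any quant listed below: divide file size in bits by parameter count.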
Similar Articles
@TeksEdge: With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed: MiniMax M3 (428…
MiniMax M3, a 428B MoE model with ~23B active parameters, is now open source. It offers ultra-long context (up to 1M tokens) and efficiency improvements; the post lists its quantized sizes and the VRAM each requires for local deployment.
MiniMax2.7 @47tg 1200pp
Reports MiniMax M2.7 running at about 47 tokens/s for token generation (tg) and about 1200 tokens/s for prompt processing (pp).
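Reading the "47tg 1200pp" shorthand as llama.cpp-style benchmark figures (tg = token generation, pp = prompt processing, both in tokens/s), the two rates translate into wall-clock latency. The sketch below assumes both rates stay constant and ignores scheduling and sampling overhead; the 8,000-token prompt and 500-token reply are hypothetical inputs chosen only for illustration.

```python
def estimate_latency(prompt_tokens: int, output_tokens: int,
                     pp_tok_s: float, tg_tok_s: float) -> tuple[float, float]:
    """Return (time_to_first_token_s, total_s), assuming constant throughput."""
    ttft = prompt_tokens / pp_tok_s            # the whole prompt is processed first
    total = ttft + output_tokens / tg_tok_s    # then output tokens are generated sequentially
    return ttft, total


# Hypothetical request: 8,000-token prompt, 500-token reply, at 1200 pp / 47 tg
ttft, total = estimate_latency(8000, 500, pp_tok_s=1200, tg_tok_s=47)
print(f"TTFT ~ {ttft:.1f} s, total ~ {total:.1f} s")  # prints "TTFT ~ 6.7 s, total ~ 17.3 s"
```

With long prompts, prompt processing dominates time-to-first-token, while generation speed dominates once the reply grows.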
@no_stp_on_snek: Config-I quant of MiniMax-M3 is up on MLX. 2-bit experts, 4-bit attention, 8-bit boundaries + embeddings, f16 router. ~…
Announces the release of a Config-I quantization of MiniMax-M3 on MLX, using 2-bit experts and 4-bit attention to reduce the 427B MoE model from 869GB to ~167GB, though the quant is untested and requires a patch for mlx_lm.
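The 869GB to ~167GB figures can be sanity-checked against the 427B parameter count. The sketch assumes decimal gigabytes and treats every byte as weight data, so the numbers are approximate: the source size works out to roughly 16 bits per weight (consistent with 16-bit storage), and the quant averages a little over 3 bits, which fits a recipe where most weights are 2-bit experts and a smaller share is kept at 4, 8, or 16 bits. The helper name is illustrative.

```python
def avg_bits(size_gb: float, params_billions: float) -> float:
    # GB (1e9 bytes) and billions (1e9 params) cancel, leaving bytes/param * 8
    return size_gb * 8 / params_billions


ORIGINAL_GB, QUANT_GB, PARAMS_B = 869, 167, 427  # figures from the summary

print(f"original: {avg_bits(ORIGINAL_GB, PARAMS_B):.1f} bits/weight")  # prints "original: 16.3 bits/weight"
print(f"quant:    {avg_bits(QUANT_GB, PARAMS_B):.1f} bits/weight")     # prints "quant:    3.1 bits/weight"
print(f"size reduction: {ORIGINAL_GB / QUANT_GB:.1f}x")                 # prints "size reduction: 5.2x"
```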
Dual DGX Spark (Asus GX10): MiniMax M2.7 results
A user benchmarks two Asus GX10 (DGX Spark) units running MiniMax-M2.7-AWQ-4bit, reaching 30–40 tokens/s while each unit draws only ~100 W, as a replacement for noisy multi-GPU rigs.
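The power figure can be turned into energy per generated token. The sketch below assumes ~100 W per unit means ~200 W for the pair and ignores networking and idle overhead, so it gives an approximate range rather than a measurement.

```python
def joules_per_token(total_watts: float, tokens_per_s: float) -> float:
    # 1 W = 1 J/s, so (J/s) / (tokens/s) = J/token
    return total_watts / tokens_per_s


TOTAL_WATTS = 2 * 100  # assumed: two units at ~100 W each

for rate in (30, 40):  # reported range, tokens/s
    print(f"{rate} tok/s -> {joules_per_token(TOTAL_WATTS, rate):.1f} J/token")
# prints:
# 30 tok/s -> 6.7 J/token
# 40 tok/s -> 5.0 J/token
```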
JANGQ-AI/MiniMax-M2.7-JANGTQ_K : mixed-bit quant of MiniMax M2.7 - 74 GB on disk
Release of a mixed-bit quantized version of MiniMax M2.7 that takes 74 GB on disk and targets efficient local inference on Apple Silicon devices.