@TeksEdge: With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed: MiniMax M3 (428…
Summary
MiniMax M3, a 428B MoE model with ~23B active parameters, is now open source. It offers ultra-long context (up to 1M tokens) and efficiency improvements, and the post lists estimated quantized file sizes and VRAM requirements for local deployment.
Cached at: 06/12/26, 07:01 PM
With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed:
MiniMax M3 (428B MoE, ~23B active)
GGUF size estimates:
Q8_0 → ~430-450 GB
Q6_K → ~340-360 GB
Q5_K_M/XL → ~280-310 GB
Q4_K_M/XL → ~220-250 GB (best balance)
Q3_K_XL → ~170-200 GB
Q2_K → ~110-140 GB (last resort)
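These ranges follow roughly from parameter count × bits per weight ÷ 8. A minimal sketch, assuming every tensor uses the nominal bits per weight of its llama.cpp block format (8.5 for Q8_0, 6.5625 for Q6_K, and so on, worked out from the ggml block layouts). `NOMINAL_BPW` and `gguf_size_gb` are hypothetical names, not part of any library.

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Assumption: every tensor uses the same nominal block format. Real "_M"/"_XL"
# files keep some tensors at higher precision and add metadata, so they run
# somewhat larger than this.

NOMINAL_BPW = {
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale = 34 bytes per 32 weights
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
    "Q5_K": 5.5,     # 176 bytes per 256-weight super-block
    "Q4_K": 4.5,     # 144 bytes per 256-weight super-block
    "Q3_K": 3.4375,  # 110 bytes per 256-weight super-block
    "Q2_K": 2.625,   # 84 bytes per 256-weight super-block
}


def gguf_size_gb(n_params: float, bpw: float) -> float:
    """Approximate file size in decimal gigabytes (1e9 bytes)."""
    return n_params * bpw / 8 / 1e9


if __name__ == "__main__":
    total_params = 428e9  # MiniMax M3 total parameter count from the post
    for name, bpw in NOMINAL_BPW.items():
        print(f"{name}: ~{gguf_size_gb(total_params, bpw):.0f} GB")
```

Under this assumption the sketch gives about 455 GB for Q8_0, 351 GB for Q6_K, 241 GB for Q4_K and 140 GB for Q2_K, close to the post's ranges; mixed-precision files will land somewhat higher.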
Very efficient due to extreme sparsity: only ~23B of the 428B parameters (about 5%) are active per token!
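To put the sparsity in numbers, here is a back-of-envelope sketch. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, ignoring attention over the context and router overhead; the function names are hypothetical.

```python
# Back-of-envelope view of MoE sparsity.
# Assumption: ~2 FLOPs per *active* parameter per token for the forward pass,
# ignoring attention over the context and router overhead.

def active_fraction(active_params: float, total_params: float) -> float:
    """Share of all parameters that participate in one token's forward pass."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost for one token."""
    return 2.0 * active_params


if __name__ == "__main__":
    total, active = 428e9, 23e9  # figures from the post
    print(f"active fraction: {active_fraction(active, total):.1%}")
    moe = forward_flops_per_token(active)
    dense = forward_flops_per_token(total)  # hypothetical dense model of the same size
    print(f"~{moe / 1e9:.0f} GFLOPs/token vs ~{dense / 1e9:.0f} dense "
          f"({dense / moe:.1f}x less compute)")
```

This prints an active fraction of 5.4% and roughly 46 vs 856 GFLOPs per token. Note that sparsity cuts compute, not storage: every expert still has to sit in memory.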
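Sparsity cuts per-token compute, not memory: all 428B weights still have to be loaded, so practical local runs will need high-VRAM setups (multiple RTX 5090s at 32 GB each, or better).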
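A rough way to turn a quant size into a card count, assuming every weight stays in VRAM and about 15% of each 32 GB card is held back for KV cache, activations and runtime buffers. The reserve figure is a hypothetical placeholder; the real headroom depends on context length and backend, and `gpus_needed` is not a library function.

```python
import math

# How many 32 GB cards (e.g. RTX 5090) a quant needs if all weights stay in VRAM.
# Assumption: `reserve` of each card is kept free for KV cache, activations and
# runtime buffers; 0.15 is a placeholder, not a measured value.

def gpus_needed(model_gb: float, vram_per_gpu_gb: float = 32.0, reserve: float = 0.15) -> int:
    usable = vram_per_gpu_gb * (1.0 - reserve)  # VRAM left for weights per card
    return math.ceil(model_gb / usable)


if __name__ == "__main__":
    # Upper ends of the size ranges quoted in the post
    for name, size_gb in [("Q4_K_M/XL", 250), ("Q3_K_XL", 200), ("Q2_K", 140)]:
        print(f"{name}: {gpus_needed(size_gb)} x 32 GB GPUs")
```

With these placeholder numbers the sketch asks for 10 cards at Q4_K_M/XL, 8 at Q3_K_XL and 6 at Q2_K, which is consistent with the post's "multiple 5090s or better".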
ModelScope (@ModelScope2022): MiniMax M3 is now open source! The model combines native multimodal understanding, ultra-long context, and agent capabilities in one.🚀
New MSA architecture: up to 1M context at 1/20 the per-token compute of the previous generation. 9x faster prefilling, 15x faster decoding, on par…
Similar Articles
@stevibe: MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:…
A user tested MiniMax M2.7 (a 230B-parameter model) using Unsloth's UD-IQ3_XXS quantization (80GB) across four different hardware configurations, including RTX 4090, RTX 5090, RTX PRO 6000, and DGX setups, reporting token generation speeds and time-to-first-token metrics.
@no_stp_on_snek: Config-I quant of MiniMax-M3 is up on MLX. 2-bit experts, 4-bit attention, 8-bit boundaries + embeddings, f16 router. ~…
Announces the release of a Config-I quantization of MiniMax-M3 on MLX, using 2-bit experts and 4-bit attention to reduce the 427B MoE model from 869 GB to ~167 GB, though the quant is untested and requires a patch for mlx_lm.
@PrajwalTomar_: Okay this is wild. MiniMax just dropped M3, and it might be the most capable open model for building right now. I gave …
MiniMax released M3, an open-source AI model that leads coding benchmarks and offers a 1M token context for handling entire codebases.
MiniMax M3 (2-minute read)
MiniMax introduces M3, the first open-weights model to combine coding, agentic, and multimodal capabilities with up to 1M context via sparse attention.
MiniMax teases upcoming M3 model with new sparse attention mechanism and 15.6× long-context response speed boost (12-minute read)
MiniMax has released a detailed technical report on its M2 series and teased the upcoming M3 model, which uses a novel sparse attention mechanism to achieve up to 15.6× faster decoding at million-token contexts.