after a month with 5 Chinese coding LLMs, is M3 actually going to take the top spot?

Reddit r/ArtificialInteligence 05/22/26, 12:27 PM News

coding-llms chinese-llms comparison ts-next deepseek minimax cost-efficiency

Summary

A user shares a month-long comparison of five Chinese coding LLMs (Kimi K2.6, GLM-5.1, MiMo V2.5 Pro, MiniMax 2.7, DeepSeek V4 Pro) on a TypeScript/Next.js codebase, picking a preferred model for each of five categories: frontend, backend, code review, all-rounder, and reasoning. Citing a Kilo Code benchmark, they note that MiniMax 2.7 reaches ~90% of Opus 4.6 quality at ~7% of the cost, and ask whether the upcoming MiniMax 3.0 will close its gaps in planning and test coverage and take the top spot.
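A quick check of the cost figure: the post quotes $0.27 vs $3.67 for M2.7 and Opus 4.6 across three coding tasks. The snippet simply redoes that division and, under the loose assumption that the ~90% quality score can be divided by cost, estimates quality per dollar. It is back-of-the-envelope arithmetic on the quoted numbers, not a benchmark.

```python
# Figures as reported in the post (Kilo Code benchmark, three coding tasks).
OPUS_COST_USD = 3.67   # Opus 4.6, total for the three tasks
M27_COST_USD = 0.27    # MiniMax M2.7, same three tasks
M27_QUALITY = 0.90     # "~90% of Opus 4.6 quality", with Opus normalized to 1.0

# Fraction of the Opus spend that M2.7 used.
cost_ratio = M27_COST_USD / OPUS_COST_USD

# Simplifying assumption: quality points per dollar, relative to Opus
# (Opus gets 1.0 quality for 1.0 of its own cost, so its ratio is 1.0).
quality_per_dollar_vs_opus = M27_QUALITY / cost_ratio

print(f"M2.7 cost as share of Opus 4.6: {cost_ratio:.1%}")                # 7.4%
print(f"quality per dollar relative to Opus 4.6: {quality_per_dollar_vs_opus:.1f}x")  # 12.2x
```

So "~7%" is a rounded 7.4%, and the roughly 12x quality-per-dollar figure only holds if the ~90% score is taken at face value.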

been rotating through 5 chinese coding models on a TS/Next codebase for the last 4-5 weeks: Kimi K2.6, GLM-5.1, MiMo V2.5 Pro, MiniMax 2.7, DeepSeek V4 Pro. wanted to share where i landed and ask about M3.

quick per-category from my runs:

* Frontend / design → K2.6
* Backend → K2.6 and GLM-5.1
* Code review → MiMo
* All-rounder → M2.7
* Reasoning-heavy → DeepSeek

afterwards i found llmdevguy posted a near-identical ranking on X a couple weeks back (162k views, 2.3k likes) and ended it with "now i'm waiting for MiniMax 3.0 to take the number 1 spot." weird to land in the exact same place.

https://preview.redd.it/01k9njcpmo2h1.png?width=1190&format=png&auto=webp&s=ef920c65d32a34f1dc054718813d3bb57b54037e

M2.7 didn't win any single category for me. what surprised me is cost. Kilo Code posted a benchmark on ClaudeAI: M2.7 hit ~90% of Opus 4.6 quality at ~7% of the cost ($0.27 vs $3.67 across three coding tasks). my own runs aren't scientific but the ratio tracks.

short version of the shortcomings: thinner tests and it jumps straight to code instead of walking through reasoning. so i reach for it as an executor once a stronger model has planned, not as the planner.

real question is whether M3 closes the planning and test-coverage gap. if it does, all-rounder becomes top of every category pretty fast.

anyone else doing side-by-side runs? does this hold on python / go / rust or is it a TS thing?
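The workflow in the post (per-category picks, with M2.7 used as an executor once a stronger model has planned) can be written down as a small routing table. This is a hypothetical sketch: `PICKS`, `Step`, `route`, and the default planner are illustrative assumptions. The post does not say which model does the planning, and the model names are plain labels, not API identifiers.

```python
from dataclasses import dataclass

# Per-category picks from the post. Names are labels only, not API model IDs.
PICKS: dict[str, list[str]] = {
    "frontend": ["Kimi K2.6"],
    "backend": ["Kimi K2.6", "GLM-5.1"],
    "code_review": ["MiMo V2.5 Pro"],
    "all_rounder": ["MiniMax 2.7"],
    "reasoning": ["DeepSeek V4 Pro"],
}

# The post uses M2.7 as the executor once a stronger model has planned.
EXECUTOR = PICKS["all_rounder"][0]

# Assumption: the post does not name the planner; default to the reasoning-heavy pick.
DEFAULT_PLANNER = PICKS["reasoning"][0]


@dataclass(frozen=True)
class Step:
    role: str   # "plan" or "execute"
    model: str


def route(category: str, needs_plan: bool, planner: str = DEFAULT_PLANNER) -> list[Step]:
    """Hypothetical router: one-shot to the category pick, or plan-then-execute."""
    if category not in PICKS:
        raise KeyError(f"unknown category: {category!r}")
    if not needs_plan:
        # Small, well-scoped task: send it straight to the first pick for the category.
        return [Step("execute", PICKS[category][0])]
    # Planning-heavy task: a stronger model plans, M2.7 writes the code.
    return [Step("plan", planner), Step("execute", EXECUTOR)]


if __name__ == "__main__":
    for step in route("backend", needs_plan=True):
        print(step.role, "->", step.model)
```

Keeping M2.7 as a fixed executor mirrors the post's main caveat (thin tests, little up-front reasoning). If M3 does close the planning gap, as the post speculates, the separate plan step is the part this sketch would drop.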

Similar Articles

@sdrzn: MiniMax's new m3 model scores the same as opus 4.7 on terminal-bench 2.1 at 1/20th the compute/cost of their previous m…

I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed

Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3

@PrajwalTomar_: Everyone's sleeping on MiniMax. Again. They just shipped M3. The first open-weights model to combine frontier coding, 1…

@jiayuan_jy: A few objective clarifications: 1) This post has nothing to do with MiniMax (I never take sponsored posts). 2) 'Subjective feel' is not the same as actual performance; it's not quantitative data. After more extensive experience, overall coding ability is a qualitative improvement compared to m2.7. A current shortcoming is that 1-shot results compared with...