@yidabuilds: https://x.com/yidabuilds/status/2053409619641602286
Summary
The author conducted a comparative evaluation of four domestic AI models: DeepSeek V4, Kimi K2.6, GLM-5.1, and MiniMax M2.7. The analysis covers their strengths and weaknesses regarding cost, long-context processing, coding stability, and reasoning performance, offering specific recommendations on how to route tasks involving large document analysis, long-running background jobs, and bulk content generation.
After Testing Four Chinese AI Models, I Started to Understand Why Some People Are Ditching Claude
Between March and May this year, domestic models were released in rapid succession. I integrated and tested the four strongest domestic models currently available: DeepSeek V4, Kimi K2.6, GLM-5.1, and MiniMax M2.7. My conclusion after running them through their paces is that my primary workflow stays on Claude, but there are several categories of tasks where continuing to use Claude is entirely unnecessary.
Tasks such as large document analysis, long-running background jobs, and bulk content production do not require high model precision but place high demands on cost and throughput. These scenarios fall precisely into the zone where domestic models hold the greatest advantage.
One Sentence Summary for Each
Let’s start with a quick horizontal comparison.
DeepSeek V4 is the only one without an official subscription, operating solely on API billing. However, its Flash tier costs only about 1% of Claude’s rate, its 1-million-token context window is genuinely usable, there is no rate limiting or queuing, and payment via domestic Chinese credit cards is seamless.
Kimi K2.6 starts at 39 RMB. Community users have reported running it continuously for 13 hours to refactor a financial engine without interruption, though its tool-calling stability is average and it tends to queue up during peak hours.
GLM-5.1 starts at 49 RMB for the basic tier and 149 RMB for Pro. The model itself is solid, but its servers frequently overload during peak times, and prices have more than doubled this year.
MiniMax M2.7 is the cheapest at 29 RMB for the starter tier and the fastest at 100 tokens per second. It excels at text generation but enters infinite loops when it hits complex math problems.
As a benchmark for comparison: The Claude Pro subscription overseas costs $20 per month (approx. 144 RMB), while the Code subscription is $200 per month (approx. 1440 RMB).
DeepSeek V4
Released on April 24, exactly 484 days after V3.
DeepSeek is the only model among the four without an official subscription plan; all usage is billed via API. The Flash tier is priced at 1 RMB per million tokens for input and 2 RMB for output, dropping to 0.2 RMB for input when cache hits occur. Based on my actual usage over one month—100 million input tokens and 30 million output tokens—running everything through V4-Flash cost approximately 160 RMB. The same volume via Claude Opus 4.7 would cost roughly 9000 RMB (pure API billing; coding plan prices would naturally be lower).
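To make the billing arithmetic explicit, here is a minimal sketch of that monthly estimate; the per-million-token prices are the Flash-tier figures quoted above, and the cache-hit ratio in the second call is purely an illustrative assumption, not a measured value.

```python
# Rough monthly cost estimate at the V4-Flash rates quoted above:
# 1 RMB / 1M input tokens, 2 RMB / 1M output tokens, 0.2 RMB / 1M cached input.
def flash_cost_rmb(input_tokens: float, output_tokens: float, cache_hit_ratio: float = 0.0) -> float:
    """Return the estimated bill in RMB for one month of usage."""
    million = 1_000_000
    cached = input_tokens * cache_hit_ratio
    fresh = input_tokens - cached
    return (fresh / million) * 1.0 + (cached / million) * 0.2 + (output_tokens / million) * 2.0

# The usage from the article: 100M input + 30M output with no cache hits -> 160 RMB.
print(flash_cost_rmb(100e6, 30e6))                        # 160.0
# Illustrative only: if half the input were served from cache, the bill drops further.
print(flash_cost_rmb(100e6, 30e6, cache_hit_ratio=0.5))   # 120.0
```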
The 1-million-token context window is the most heavily utilized feature in testing, and it is truly 1 million. Some models claim to support long contexts but suffer a noticeable drop in output quality after exceeding 200,000 tokens (a common complaint on the Linux.do forum). V4 does not have this issue. You can throw in a 50-page technical document for analysis, read an entire GitHub repository’s code in one go, or process all clauses of a long contract without segmenting it. Previously, these scenarios required window splitting, retrieval-augmented generation (RAG), and careful context management. With V4, you can simply input everything at once.
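As a rough illustration of the "input everything at once" workflow, here is a minimal sketch that sends an entire document in a single request through an OpenAI-compatible client, which is how DeepSeek currently exposes its API. The base URL matches DeepSeek's existing endpoint; the model string is a placeholder, since the V4-Flash tier described here may be exposed under a different identifier.

```python
# Minimal sketch: feed a whole document in one request instead of chunking + RAG.
# Assumes DeepSeek's OpenAI-compatible endpoint; the model name below is a placeholder.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_KEY",
    base_url="https://api.deepseek.com",
)

with open("contract.txt", "r", encoding="utf-8") as f:
    document = f.read()  # can be hundreds of thousands of tokens with a 1M window

response = client.chat.completions.create(
    model="deepseek-chat",  # placeholder; substitute the actual V4-Flash identifier
    messages=[
        {"role": "system", "content": "You are a contract analyst. Answer only from the document."},
        {"role": "user", "content": f"Document:\n{document}\n\nList every clause that assigns liability."},
    ],
)
print(response.choices[0].message.content)
```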
The most comfortable aspect during testing was the absence of rate limits and queuing. GLM and Kimi often require waiting or become completely unavailable during peak hours (a frequent complaint in the Linux.do community). I encountered no such issues with DeepSeek. Payment is processed directly via domestic Chinese credit cards, eliminating the need to hassle with foreign currency cards or proxies, offering near-zero friction for domestic users.
Although V4-Flash is the cheapest tier, its output quality was sufficient for handling large documents and codebase analysis during testing. The Pro tier is more capable but currently limited by compute resources, resulting in significantly slower response speeds compared to Flash. Improvement is likely only after the release of Huawei’s Ascend 950 chips in the second half of the year. V4 lacks multimodal support, which is its most obvious shortcoming at present. If you prefer a subscription-based experience, you can currently only access DeepSeek via third-party platforms like Alibaba Cloud Bailian, Volcano Engine, or Tencent Cloud, with first-month costs typically ranging from 7 to 9 RMB.
Kimi K2.6
Released on April 20. Moonshot AI’s programming capabilities had previously been questioned by the community. This time, their official positioning was reduced to a single sentence: “Our strongest code model.”
The best indicator of K2.6’s actual capability isn’t benchmark scores, but real-world cases from the community. One developer had it autonomously refactor an 8-year-old financial matching engine, exchange-core. The model ran continuously for 13 hours, iterating through 12 optimization strategies, modifying over 4,000 lines of code, and increasing peak throughput by 133%. Another case involved implementing an inference engine from scratch using Zig, running autonomously for 12 hours, resulting in an engine 20% faster than LM Studio. The ability to run autonomously for over ten hours without drifting off course—a capability previously seen only with Claude Opus—is notable. Users on Linux.do also reported that K2.6’s programming capabilities are significantly improved compared to K2.5, although opinions vary for scenarios outside of frontend development.
A practical solution I currently recommend to clients is using Kimi for automated external link submission. Link building is a core yet extremely time-consuming SEO task requiring visits to dozens or hundreds of different websites, account registration, form filling, and content publishing. I tested three solutions. Codex’s security review is too strict, refusing even legitimate directory submissions. Claude can technically handle it, but pay-per-use costs are high, and more importantly, domestic users frequently face account bans, causing workflows to halt midway. Kimi K2.6 performed surprisingly well in this scenario—it understood the page structure of different sites, autonomously completed the registration and submission process, and adjusted strategies when encountering abnormal pages. Client feedback has been generally positive.
However, there are several issues from testing that you should be aware of. First, the model is large and therefore genuinely slow; the latency is clearly noticeable in long-running tasks. Second, tool-calling stability is insufficient, with occasional 400 errors; many users on Linux.do report that this issue has persisted since the K2.6 launch. Third, like GLM, it is prone to queuing during peak hours.
Subscriptions come in three tiers: Starter at 39 RMB (including K2.6, Agent, and PPT features), mid-tier Allegretto at 199 RMB, and high-tier Allegro at 559 RMB, plus a student plan at 49 RMB.
GLM-5.1
Zhipu AI released this on March 27. The official announcement consisted of a single sentence: “GLM-5.1 is now open to all GLM Coding Plan users.” There was no press conference and no press release.
The core case studies presented by Zhipu themselves include building a complete Linux desktop environment from scratch in 8 hours—window manager, status bar, VPN manager, Chinese fonts, and game library—with all 1,200 steps completed autonomously, producing 4.8MB of supporting files. Another case involved vector database optimization. Given only a target and initial code, the model ran 655 rounds of autonomous iteration, improving query performance from 3,108 to 21,472 (a 6.9x increase). The optimization path was not linear: when retrieval methods were too slow, it switched architectures; when accuracy dropped, it introduced compression; when speed was still insufficient, it added coarse filtering; finally, it overlaid routing and pruning. Every technical route switch was initiated autonomously by the model. If these cases are authentic, the capabilities are indeed strong, but note that they have only been demonstrated by Zhipu themselves, and independent reproduction reports have not yet emerged.
GLM-5.1 was trained entirely on Huawei Ascend 910B chips, utilizing 100,000 units, with no NVIDIA GPUs involved. The model’s inherent capability is indeed good, and its stability in running long-duration tasks is acceptable.
However, there are realistic issues with the user experience. Servers frequently overload during peak hours, leading to queuing or complete unavailability—a point repeatedly mentioned on forums, with some users stating it is “basically unusable during severe peak loads.” A price adjustment in 2026 more than doubled subscription fees: Lite rose from the low 20s to 49 RMB, Pro from the low 100s to 149 RMB, and Max to 469 RMB. This was the largest price hike among the four models. Some users on Linux.do directly complained that “price hikes come with no refunds,” resulting in poor experiences for old users. New users can get a 50% discount on their first quarterly payment. Its 200K context window is the shortest among the four.
MiniMax M2.7
Released on March 19, with an active parameter count of only 10B for the smallest version, making it the lightest among the four. The subscription tiers are Starter at 29 RMB per month (5 hours, 600 requests, coding only), Plus at 49 RMB (including image and audio generation), and Max at 119 RMB (full multimodal including video).
Speed was the most intuitive takeaway from testing. With an output speed of 100 tokens per second—roughly double that of other models—tasks that take others over ten seconds are completed by M2.7 in just a few seconds. I used it for bulk copywriting generation and content summarization, where the speed difference was stark.
M2.7 delivered a counter-intuitive result in text processing: for polishing, copywriting, and summarization, its actual output quality was better than that of models with significantly stronger overall capabilities.
Some clients have also tried using MiniMax for external link submissions. For structurally simple blog platforms—standardized forms for titles, body text, and links—it is indeed competent, offering high speed and low cost. However, for complex sites (multi-step registration, email verification, locating submission entry points), the reasoning chain tends to break, leading to a significant drop in success rates.
However, M2.7 has a structural flaw. I encountered an instance during testing where I gave it a slightly complex math problem, and the model fell into a reasoning infinite loop, repeatedly outputting the same sentence thousands of times without exiting. After that incident, I never assigned any tasks requiring rigorous reasoning to it. This is not an isolated incident—the independent benchmark report by Luo Xiaoshan on Zhihu specifically recorded this phenomenon, and multiple users on Linux.do have reported it. It is most cost-effective for text and speed-sensitive scenarios; avoid using it for reasoning tasks.
How to Distribute Workloads
My conclusion after testing all four is that if you want to offload some tasks from Claude, price alone should not be the deciding factor. Actual testing revealed that cost is just one of three variables; the other two are context capacity and task duration. Below are my distribution recommendations based on the test results.
DeepSeek V4-Flash is recommended for processing large documents and large codebases. Its 1-million-token context window has no alternative among the four. With API billing, monthly costs are under 50 RMB, whereas the same tasks using Claude would cost hundreds of RMB per month. A user on Linux.do summed it up directly: “No rate limiting, fast speed, even the Flash tier is very usable.”
GLM-5.1 Pro (149 RMB/month) is recommended for long-running background tasks. Tasks like database optimization, code refactoring, and test generation share common characteristics: they require no human supervision, run for long durations, and demand high stability throughout. I also considered Kimi K2.6 (which has community-verified performance for 13 hours of continuous coding), but its mid-tier subscription at 199 RMB is 50 RMB more expensive than GLM Pro. Since background tasks do not require high code creativity, GLM offers better cost-effectiveness. Note GLM’s queuing issues during peak hours; try to avoid evening peaks for background tasks.
MiniMax Starter (29 RMB/month) is recommended for bulk content production. Long-tail copywriting, bulk summarization, and large volumes of structured content are tasks sensitive to speed and cost but do not require deep reasoning. M2.7’s output speed advantage of 100 tokens per second is significant. The hard prerequisite is that these tasks must not involve mathematical reasoning.
Kimi K2.6 is recommended for automated web operations. External link submission is a typical scenario where AI needs to autonomously visit different websites, understand page structures, and complete registration and form filling. I tested three solutions for this: Codex refused execution due to security reviews; Claude is unsuitable for long-term use due to cost and the risk of account bans in China; Kimi found an acceptable balance between stability and cost. If you only need to submit to structurally simple blog platforms, MiniMax can also cover this at a lower cost.
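If you want to turn these recommendations into an actual dispatcher, the sketch below encodes the three variables named earlier, with cost showing up implicitly as always preferring the cheapest model that can handle the job. It is a minimal illustration: the thresholds and model labels are assumptions lifted from this article, not measured cutoffs or official API identifiers.

```python
# Illustrative task router following the recommendations above.
# Thresholds and model labels are assumptions drawn from this article, not benchmarks.
from dataclasses import dataclass

@dataclass
class Task:
    context_tokens: int          # how much material must fit in a single window
    expected_hours: float        # how long the job runs unattended
    needs_rigorous_reasoning: bool
    is_web_automation: bool

def route(task: Task) -> str:
    if task.needs_rigorous_reasoning:
        return "claude"              # primary coding and complex reasoning stay put
    if task.context_tokens > 200_000:
        return "deepseek-v4-flash"   # only the 1M window fits whole repos and contracts
    if task.is_web_automation:
        return "kimi-k2.6"           # link submission, registration, form filling
    if task.expected_hours >= 2:
        return "glm-5.1-pro"         # long unattended background jobs (avoid evening peaks)
    return "minimax-m2.7"            # bulk copy, summaries, speed-sensitive text

print(route(Task(600_000, 0.2, False, False)))  # deepseek-v4-flash
print(route(Task(50_000, 8, False, False)))     # glm-5.1-pro
```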
For my daily primary programming and complex reasoning tasks, I continue to stay with Claude. My current workflow relies on two Claude Code Max instances plus Codex Pro, with my entire toolchain built around this setup—IDE plugins, quick actions, and prompt templates are all adapted to it. The adaptation cost of switching a single point to a domestic model would offset the model’s inherent cost-effectiveness. However, if you are building a workflow from scratch, using Kimi’s starter tier to replace Claude can save you 1,400 RMB per month.
Subscribing to the three domestic models that offer plans costs just over 200 RMB per month (all starter tiers together come to only 117 RMB; the recommended configuration comes to about 217 RMB), covering a broader range of scenarios than using Claude alone.
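A quick cross-check of those two totals, assuming the Kimi starter tier is the one used in the recommended configuration (the exact tier is not stated above):

```python
# Cross-checking the two subscription totals quoted above (all prices in RMB/month).
starter = {"minimax_starter": 29, "kimi_starter": 39, "glm_lite": 49}
recommended = {"minimax_starter": 29, "kimi_starter": 39, "glm_pro": 149}
print(sum(starter.values()))      # 117
print(sum(recommended.values()))  # 217
```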
The current pricing of domestic models will not last forever. Kimi K2.6 raised its API prices by 58% upon release, and GLM’s subscription tiers underwent a comprehensive price adjustment in 2026, doubling costs compared to six months ago. The window of opportunity is still open, but it is narrowing.
Similar Articles
@Michaelzsguo: https://x.com/Michaelzsguo/status/2053217839729791221
This article is a guide for local large model deployment, covering hardware selection, memory calculations, Runtime tool comparisons, and model quantization options, helping users from getting started to optimizing their local inference experience.
Two open-sourced models from China just blew Claude Opus 4.6 out of the water (Kimi 2.6 and Xiaomi MiMo v2.5 Pro)
Chinese teams open-sourced Kimi 2.6 and Xiaomi MiMo v2.5 Pro, reportedly surpassing Claude Opus 4.6 on benchmarks.
@geekbb: CodexSaver is an MCP tool that offloads low-risk tasks (writing tests, documentation, code explanations, lint fixes) from Codex coding sessions to a cheaper model like DeepSeek, letting the expensive model only make judgments. It averaged 48% cost savings over five test tasks with about 6 seconds of latency.
@0xshimei: https://x.com/0xshimei/status/2053088751862288846
This article provides a comprehensive 2026 guide to free and low-cost large language models, comparing domestic (China) and international options.
@sanbuphy: K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times…
K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times, boosting throughput from ~15 tokens/s to ~193 tokens/s, ultimately achieving 20% faster inference than LM Studio.