Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM

Reddit r/LocalLLaMA News

Summary

The author highlights how rapidly local AI capabilities have improved, enabling tasks once exclusive to top-tier cloud models to run on affordable hardware using models like Qwen 27b and Minimax 2.7.

I know benchmarks are questionable, imprecise for individual use cases, and LLMs are often trained to excel... But we're not talking numbers here. We're talking about a trend. When I was using GPT-4o or Sonnet 3.7, if you'd told me I could do all those things locally in such a short time, I wouldn't have believed it. Now it's happening, and not just for those with 400GB of VRAM. It's also happening on more affordable hardware.

I think if Qwen 3.6 27b actually comes out soon, it will be truly incredible. True, we're seeing licenses change, and open source developers increasingly need to monetize. But it's a really great time. Yesterday I completed tasks that I normally couldn't finish without Claude, using the odd combo of Qwen 27b + Minimax 2.7 Q4. For those who want GLM 5 Air... rediscover the 4.7, which is still very good and smaller.

This is a chart that answers many questions I read here daily.
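
To put "more affordable hardware" in rough numbers, here is a back-of-envelope sketch that assumes weight memory is simply parameter count times bits per weight. The function name `weight_memory_gb` and the chosen bit widths are illustrative assumptions, not something from the post. Real Q4 formats typically store somewhat more than 4 bits per weight, and KV cache and runtime overhead are ignored here, so treat the results as a lower bound.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    Assumption: every parameter is stored at the same bit width, with no
    per-block scales, KV cache, or activation memory included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # A 27B-parameter model at three nominal precisions.
    for bits in (16, 8, 4):
        print(f"27B @ {bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB")
```

Under these assumptions a 27B model drops from about 54 GB at 16-bit to about 13.5 GB at a nominal 4-bit, which is why a dense model of that size can fit on a single consumer GPU once quantized.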