Newer Qwen models are worse at summarization?
Summary
A comparison of LLM summarization performance finds that Qwen 3 leads among models in the 30B-parameter range, followed by Gemma 4; newer Qwen models may instead be optimized for agentic tasks.
Similar Articles
Is Qwen3.6 current king for local agentic use?
A user reports that Qwen3.6 35B A3B outperforms other local models like Gemma4 and GLM 4.7 Flash REAP for agentic tasks, though occasional loops still occur.
Qwen 35b a3b surprises me
A user reports a positive experience with Qwen 35b a3b for agentic coding tasks, noting that it outperforms Gemma4 26b in their use case and works well for demo/data analytics, especially in agentic mode rather than chat.
I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090
A hands-on benchmark of four local LLMs—Qwen3.6-27B, Qwen3.6-35B, Qwen3.5-27B and Gemma 4—on a 20k-token architecture-writing task shows Qwen3.6-27B delivering the best overall balance of clarity, completeness and usefulness on an RTX 5090.
Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries!
A user reports that Gemma 4 26b outperforms Qwen 3.5/3.6 for language learning and scientific queries, despite trailing in coding tasks, and invites discussion of other non-coding use cases for small MoE models.
What do you all think? Can we say qwen 3.6 27b beats gemini 2.5 pro? Or sonnet 3.7? Because when I tested, I found the 27b does better.
A user asks whether the 27B-parameter Qwen 3.6 model can outperform Gemini 2.5 Pro and Sonnet 3.7 on deep web search, coding, and agentic tasks, and seeks suggestions for the lowest-parameter model that can beat Gemini 2.5 Pro.