Newer Qwen models are worse at summarization?

Reddit r/LocalLLaMA 06/09/26, 08:15 PM News

Summary

A comparison of LLM summarization performance shows Qwen 3 leads the 30B parameter range, followed by Gemma 4, while newer Qwen models may be optimized for agentic tasks.

We have summaries annotated by real humans that we use to benchmark various models, with an LLM as the judge. We found that in the 30B-parameter range, Qwen 3 comes out on top, followed by Gemma 4. It feels like newer Qwens are optimized to perform agentic tasks?
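For anyone who wants to try a similar comparison, here is a minimal sketch of that kind of evaluation loop: score each model's summaries against the human-written references with a judge, then rank models by mean score. The post does not share its judge prompt, scoring scale, or data, so everything here is an assumption: `rank_models`, the `Judge` signature, and the model names are hypothetical, and `overlap_judge` is a crude word-overlap stand-in so the sketch runs offline. In a real setup that function would be replaced by a call to the judge LLM.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical judge signature (an assumption, not from the post):
# (source_text, human_reference, candidate_summary) -> score, higher is better.
Judge = Callable[[str, str, str], float]


def overlap_judge(source: str, reference: str, candidate: str) -> float:
    """Stand-in for an LLM judge: fraction of reference words found in the candidate.

    This is NOT an LLM; it only keeps the sketch runnable without an API.
    """
    ref_words = set(reference.lower().split())
    cand_words = set(candidate.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & cand_words) / len(ref_words)


def rank_models(
    examples: List[Dict[str, str]],
    candidates: Dict[str, List[str]],
    judge: Judge,
) -> List[Tuple[str, float]]:
    """Return (model, mean judge score) pairs, best first.

    `examples` holds dicts with "source" and "reference" keys;
    `candidates` maps a model name to its summaries, aligned with `examples`.
    """
    scores: Dict[str, float] = {}
    for model, summaries in candidates.items():
        if len(summaries) != len(examples):
            raise ValueError(f"{model}: expected {len(examples)} summaries")
        scores[model] = mean(
            judge(ex["source"], ex["reference"], summary)
            for ex, summary in zip(examples, summaries)
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data with placeholder model names; these are not real results.
    examples = [{"source": "The cat sat on the mat all day.", "reference": "the cat sat"}]
    candidates = {"model_a": ["the cat sat down"], "model_b": ["a dog ran"]}
    for model, score in rank_models(examples, candidates, overlap_judge):
        print(model, round(score, 2))
```

Passing the judge in as a plain callable keeps the ranking logic independent of whichever judge model or prompt is used, which makes it easier to check whether a ranking holds up when the judge changes.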

Similar Articles

Is Qwen3.6 current king for local agentic use?

Reddit r/LocalLLaMA

A user reports that Qwen3.6 35B A3B outperforms other local models like Gemma4 and GLM 4.7 Flash REAP for agentic tasks, though occasional loops still occur.

Qwen 35b a3b surprises me

Reddit r/LocalLLaMA

User reports positive experience with Qwen 35b a3b for agentic coding tasks, noting it outperforms Gemma4 26b in their use case and works well for demo/data analytics, especially in agentic mode versus chat.

I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090

Reddit r/LocalLLaMA

A hands-on benchmark of four local LLMs—Qwen3.6-27B, Qwen3.6-35B, Qwen3.5-27B and Gemma 4—on a 20k-token architecture-writing task shows Qwen3.6-27B delivering the best overall balance of clarity, completeness and usefulness on an RTX 5090.

Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries!

Reddit r/LocalLLaMA

User reports that Gemma 4 26b outperforms Qwen 3.5/3.6 for language learning and scientific queries, despite being behind in coding tasks, and invites discussion on other non-coding use cases for small MoE models.

What do you all think? Can we say qwen 3.6 27b beats gemini 2.5 pro? Or sonnet 3.7? Because when I tested, I found the 27b do better.

Reddit r/LocalLLaMA

A user asks whether the 27B-parameter Qwen 3.6 model can outperform Gemini 2.5 Pro and Sonnet 3.7 on deep web search, coding, and agentic tasks, and seeks suggestions for the lowest-parameter model that can beat Gemini 2.5 Pro.
