Newer Qwen models are worse at summarization?

Reddit r/LocalLLaMA News

Summary

A comparison of LLM summarization performance shows Qwen 3 leads the 30B parameter range, followed by Gemma 4, while newer Qwen models may be optimized for agentic tasks.

We have a set of summaries annotated by real humans, and we benchmark various models against them using an LLM as a judge. In the 30B-parameter range, Qwen 3 comes out on top, followed by Gemma 4. It feels like the newer Qwens are optimized for agentic tasks?
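For anyone who wants to try a similar comparison, here is a minimal sketch of an LLM-as-judge ranking loop. It is not the setup used for the results above: the `Judge` signature, the 1 to 5 score scale, the `rank_models` helper, and the model names in the usage note are all assumptions made for illustration, and `overlap_judge` is only a word-overlap stand-in so the script runs without calling any model.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical judge interface: (source_text, human_reference, candidate_summary)
# -> score between 1.0 and 5.0. In a real run this would wrap a call to the
# judge LLM; the scale and signature are assumptions, not from the post.
Judge = Callable[[str, str, str], float]


def rank_models(
    examples: List[Dict[str, str]],
    candidates: Dict[str, List[str]],
    judge: Judge,
) -> List[Tuple[str, float]]:
    """Average the judge score per model and return models sorted best-first.

    examples:   list of {"source": ..., "reference": ...} (human-annotated).
    candidates: model name -> list of summaries, aligned with `examples`.
    """
    averages = {}
    for model, summaries in candidates.items():
        if len(summaries) != len(examples):
            raise ValueError(f"{model}: expected {len(examples)} summaries")
        averages[model] = mean(
            judge(ex["source"], ex["reference"], summary)
            for ex, summary in zip(examples, summaries)
        )
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def overlap_judge(source: str, reference: str, candidate: str) -> float:
    """Stand-in judge (NOT an LLM): share of reference words found in the
    candidate, mapped linearly onto the 1-5 scale. Ignores `source`."""
    ref_words = set(reference.lower().split())
    cand_words = set(candidate.lower().split())
    if not ref_words:
        return 1.0
    return 1.0 + 4.0 * len(ref_words & cand_words) / len(ref_words)
```

Usage would look like `rank_models(examples, {"model_a": [...], "model_b": [...]}, overlap_judge)`, with `overlap_judge` swapped for a function that prompts the judge model and parses its score. Keeping the judge behind a plain callable makes it easy to rerun the same ranking with a different judge and check whether the ordering holds.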
