Good Summarization SLMs for < 2000 tokens

Reddit r/LocalLLaMA News

Summary

A novice asks which small language models and prompting strategies to use for an employee-note summarization engine producing summaries under 2000 tokens, after Qwen2.5-7B-Instruct produced hallucinations.

Novice here. I am trying to build a summarization engine for employee notes. There are between 10 and 50 notes (est. 3000-15000 tokens) that need summarizing. They already come with tags and need to be summarized into a general report of est. 200-1000 tokens. The model needs to recognize when notes are too detailed and generalize several similar notes into a category (i.e. when several notes relate to the same tag category). I tried [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) with some prompting, but it is spewing hallucinations and is not usable. I tried reducing the temperature, without success. What model and what prompting would you recommend for this task?
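To make the task above concrete, here is a minimal sketch of one way to frame it: bucket the notes by their existing tag, then build one short, grounded prompt per tag so the model only sees a few related notes at a time. This is not something the post itself tried. The note shape (`tag` and `text` fields), the helper names `group_notes_by_tag` and `build_prompt`, the sample notes, and the prompt wording are all assumptions made for illustration, and the call to the model is left out.

```python
from collections import defaultdict

# Hypothetical note shape: each note is a dict with a "tag" and a "text" field.
def group_notes_by_tag(notes):
    """Bucket note texts by tag, keeping the original order within each tag."""
    groups = defaultdict(list)
    for note in notes:
        groups[note["tag"]].append(note["text"])
    return dict(groups)

def build_prompt(tag, texts):
    """Build one summarization prompt for a single tag category (wording is illustrative)."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(texts, start=1))
    return (
        f"Summarize the following employee notes tagged '{tag}' in 2-3 sentences.\n"
        "Use only information stated in the notes. Do not add names, dates, "
        "or figures that do not appear below.\n\n"
        f"Notes:\n{numbered}\n\nSummary:"
    )

# Made-up sample notes, only to show the data flow.
notes = [
    {"tag": "onboarding", "text": "New hire laptop arrived two days late."},
    {"tag": "onboarding", "text": "Badge access was not ready on day one."},
    {"tag": "tooling", "text": "CI builds are timing out on large branches."},
]

groups = group_notes_by_tag(notes)
prompts = {tag: build_prompt(tag, texts) for tag, texts in groups.items()}
print(prompts["onboarding"])
```

Each per-tag summary could then be concatenated, or passed through one final prompt, to form the general report. Whether this reduces hallucinations for a given model would need to be checked against the real notes.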

Similar Articles

Newer Qwen models are worse at summarization?

Reddit r/LocalLLaMA

A comparison of LLM summarization performance shows Qwen 3 leads the 30B parameter range, followed by Gemma 4, while newer Qwen models may be optimized for agentic tasks.

Are super tiny LLMs any good?

Reddit r/singularity

Explores whether very small language models can handle casual conversations adequately, and what training factors differentiate the better ones.

Learning to summarize with human feedback

OpenAI Blog

OpenAI demonstrates a technique for improving language model summarization by training a reward model on human preferences and fine-tuning models with reinforcement learning, achieving significant quality improvements that generalize across datasets. This work advances model alignment through human feedback at scale, with applications beyond summarization.