Good Summarization SLMs for < 2000 tokens
Summary
A newcomer asks for recommendations on small language models (SLMs) and prompting strategies for building an engine that summarizes employee notes of under 2000 tokens, after running into hallucinations with Qwen2.5-7B-Instruct.
Similar Articles
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization
This paper proposes a parameter-efficient vocabulary adaptation method for LLM-based text summarization in specialized domains, augmenting pretrained tokenizers with domain-specific tokens and selectively replacing under-trained ones to reduce training time by 35-55% and parameter counts by up to 37%.
Newer Qwen models are worse at summarization?
A comparison of LLM summarization performance shows Qwen 3 leading the 30B-parameter range, followed by Gemma 4, and suggests that newer Qwen models may be optimized for agentic tasks.
Floor for local meeting summarization on a 6GB GPU: qwen3.5:0.8b works at 57s, Granite 4 350M hallucinates
The author introduces VoiceFlow, an open-source local dictation and meeting transcription tool, and benchmarks small LLMs (qwen3.5:0.8b and Granite 4 350M) for meeting summarization on a 6GB GPU, finding the 0.8B Qwen viable while sub-500M models hallucinate. They also ask the community for long-context summarization solutions on low VRAM.
Are super tiny LLMs any good?
A discussion of whether very small language models can adequately handle casual conversation, and which training factors distinguish the better ones.
Learning to summarize with human feedback
OpenAI demonstrates a technique for improving language model summarization by training a reward model on human preferences and fine-tuning models with reinforcement learning, achieving significant quality improvements that generalize across datasets. This work advances model alignment through human feedback at scale, with applications beyond summarization.