Tag
This paper benchmarks 17 compact language models (1B-8B parameters) as generators in Russian-language RAG systems under CPU-only inference, finding that Qwen-family models offer strong quality-latency tradeoffs for private, GPU-free deployment.