Tag
Ahmad Osman announces VibeThinker 3B, a 3-billion-parameter model based on Qwen 2.5, claims its performance is comparable to Claude Opus 4.5, and predicts that it can be deployed locally on consumer hardware.
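This article discusses the difficulty and cost of model distillation, using the distillation of DeepSeek R1 into Llama 3 8B and Qwen 2.5 7B as examples, and asks why distilled models are not more common.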