Tag
Hugging Face's Open R1 project provides a fully open reproduction pipeline for DeepSeek-R1, including distilled datasets, training scripts, and evaluation tools, with the goal of enabling anyone to replicate and build on top of R1's reasoning capabilities.
N-GRPO introduces semantic neighbor mixing in the GRPO framework to enhance mathematical reasoning diversity while preserving semantic consistency, achieving improvements on math benchmarks and out-of-distribution tasks.
This research paper investigates position bias in reasoning models, finding that the bias scales with the length of the reasoning trajectory rather than being eliminated by "more thinking." The study provides causal evidence and a diagnostic toolkit for auditing this length-driven bias in multiple-choice QA evaluations.
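This article discusses the difficulty and cost of model distillation, using the distillation of DeepSeek R1 into Llama 3 8b and Qwen 2.5 7b as examples, and asks why distilled models are not more common.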