deepseek-r1

#deepseek-r1

Open Reproduction of DeepSeek-R1

Hacker News Top · 2026-06-11

Hugging Face's Open R1 project provides a fully open reproduction pipeline for DeepSeek-R1, including distilled datasets, training scripts, and evaluation tools, with the goal of enabling anyone to replicate and build on top of R1's reasoning capabilities.


N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

Hugging Face Daily Papers · 2026-06-09

N-GRPO introduces semantic neighbor mixing in the GRPO framework to enhance mathematical reasoning diversity while preserving semantic consistency, achieving improvements on math benchmarks and out-of-distribution tasks.


More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

arXiv cs.AI · 2026-05-11

This paper investigates position bias in reasoning models and finds that the bias grows with the length of the reasoning trajectory rather than being eliminated by "more thinking." It provides causal evidence and a diagnostic toolkit for auditing this length-driven bias in multiple-choice QA evaluations.


How difficult is distilling?

Reddit r/LocalLLaMA · 2026-05-08

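This post discusses the difficulty and cost of model distillation, using the distillation of DeepSeek R1 into Llama 3 8B and Qwen 2.5 7B as examples, and asks why distilled models are not more common.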
