model-distillation

#model-distillation

Anthropic says Alibaba must be punished for largest Claude cloning attack

Ars Technica · yesterday

Anthropic accuses Alibaba of orchestrating the largest known attempt to clone its Claude model, using 25,000 accounts for 28.8 million exchanges, and calls for Alibaba to be punished, placing the dispute within the broader US-China AI competition.

Anthropic accuses Alibaba of campaign to ‘brazenly’ and ‘illicitly’ extract AI capabilities

Reddit r/LocalLLaMA · 2d ago

Anthropic accuses Alibaba of a campaign to illicitly extract its AI capabilities through model distillation, highlighting ongoing tensions over AI intellectual property.

@f14bertolotti: Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5…

X AI KOLs Timeline · 2026-06-16

This technical report introduces VibeThinker-3B, a 3B-parameter model that achieves frontier-level verifiable reasoning performance through post-training refinements on Qwen2.5-Coder (curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation), matching or exceeding much larger models such as DeepSeek V3.2.

@ziv_ravid: 1/I read the Nemotron 3 Ultra report and it's interesting to compare their post-training to DeepSeek V4's. Both now do …

X AI KOLs Timeline · 2026-06-15

The tweet compares the post-training methods of Nemotron 3 Ultra and DeepSeek V4, noting both use multiple specialist teachers and on-policy distillation into a single student, but differ in support overlap.
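The shared recipe in this summary (several specialist teachers, on-policy distillation into one student) can be illustrated with a minimal sketch. It assumes the common formulation in which the student samples a sequence itself and is trained to minimize the per-token reverse KL to whichever teacher covers that prompt's domain. The domain-routing dict, the `TeacherFn` interface, and the toy teachers are hypothetical and are not taken from either report.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a teacher maps the student's sampled token ids to
# one row of logits per position (conditioned on that same prefix).
TeacherFn = Callable[[List[int]], List[List[float]]]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at one position, summed over the vocabulary."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))


def on_policy_distill_loss(
    sampled_tokens: List[int],
    student_logits: List[List[float]],
    teachers: Dict[str, TeacherFn],
    domain: str,
) -> float:
    """Mean per-token reverse KL on a sequence the student itself sampled.

    The prompt's domain selects which specialist teacher scores the sample,
    so several teachers can be distilled into a single student.
    """
    teacher_logits = teachers[domain](sampled_tokens)
    per_token = [reverse_kl(s, t) for s, t in zip(student_logits, teacher_logits)]
    return sum(per_token) / len(per_token)


# Toy setup: vocabulary of 3 tokens, two specialist teachers (hypothetical).
teachers: Dict[str, TeacherFn] = {
    "math": lambda toks: [[2.0, 0.0, 0.0] for _ in toks],
    "code": lambda toks: [[0.0, 0.0, 2.0] for _ in toks],
}
student = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(on_policy_distill_loss([0, 0], student, teachers, "math"))  # 0.0: student already matches the math teacher
print(on_policy_distill_loss([0, 0], student, teachers, "code") > 0)  # True
```

Reverse KL is the usual choice here because it is evaluated on the student's own samples, which is what makes the procedure on-policy; some implementations replace the full-vocabulary sum with a single-sample estimate on the sampled token.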

Quantifying Subliminal Behavioral Transfer Ratios in Language Model Distillation

arXiv cs.LG · 2026-06-11

This paper quantifies the magnitude of subliminal behavioral transfer in language model distillation, showing that undesirable traits can transfer robustly from teacher to student models even with benign training data, and that transfer scales differently across model families.

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Hugging Face Daily Papers · 2026-05-12

This paper proposes an empirical 'sparse-to-dense' reward principle for language model post-training, arguing that scarce labeled data should be used with sparse rewards for teacher model discovery and dense rewards for student compression via distillation. The authors demonstrate that this staged approach, bridging sparse RL and on-policy distillation, outperforms direct GRPO on deployment-sized models in math benchmarks.
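The sparse/dense distinction this principle relies on can be made concrete with a minimal sketch. It assumes "sparse" means a single verifiable outcome reward per rollout (as in GRPO) and "dense" means a per-token distillation signal from a teacher, here log p_teacher minus log p_student on the student's own tokens. The function names and the numbers are illustrative and do not come from the paper. Per the summary, the staged approach would use the first kind of signal to train a teacher and the second to compress it into a deployment-sized student.

```python
import math
from typing import List


def sparse_rewards(num_tokens: int, is_correct: bool) -> List[float]:
    """Outcome (GRPO-style) signal: one verifiable reward on the final token, zeros elsewhere."""
    rewards = [0.0] * num_tokens
    rewards[-1] = 1.0 if is_correct else 0.0
    return rewards


def dense_rewards(student_logprobs: List[float], teacher_logprobs: List[float]) -> List[float]:
    """Distillation signal: log p_teacher - log p_student for every token the student sampled."""
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


# Toy 3-token rollout (hypothetical probabilities).
student_lp = [math.log(0.5), math.log(0.25), math.log(0.5)]
teacher_lp = [math.log(0.5), math.log(0.5), math.log(0.25)]
print(sparse_rewards(3, is_correct=True))     # [0.0, 0.0, 1.0]
print(dense_rewards(student_lp, teacher_lp))  # roughly [0.0, 0.693, -0.693]
```

The sparse signal says only whether the whole answer was right, whereas the dense signal tells the student, token by token, where it agrees or disagrees with the teacher; the latter is far more informative per sample but presupposes that a good teacher already exists.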

Any news (or hope) of Qwen-3.6 14B and 9B distills for local coding?

Reddit r/LocalLLaMA · 2026-05-11

The author inquires about potential distilled 9B and 14B variants of the Qwen-3.6 model for local coding, citing specific tool-calling and file structure issues encountered with Qwen-3.5 9B on limited hardware.

How difficult is distilling?

Reddit r/LocalLLaMA · 2026-05-08

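The post discusses the difficulty and cost of model distillation, citing DeepSeek R1's distillations into Llama 3 8B and Qwen 2.5 7B as examples, and asks why distilled models are not more common.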
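The R1 distills mentioned in this post were produced by fine-tuning the smaller Llama and Qwen models on samples generated by DeepSeek R1 (sequence-level distillation). The main costs are therefore generating teacher samples and running a standard fine-tune. Below is a minimal sketch of the data-collection step, assuming a simple keep-if-verified filter; `teacher_generate`, `verify`, and `samples_per_prompt` are hypothetical placeholders, not DeepSeek's pipeline.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: in practice teacher_generate would call a large model
# and verify would be an answer checker or grader.
GenerateFn = Callable[[str], str]
VerifyFn = Callable[[str, str], bool]


def build_distillation_set(
    prompts: List[str],
    teacher_generate: GenerateFn,
    verify: VerifyFn,
    samples_per_prompt: int = 4,
) -> List[Dict[str, str]]:
    """Collect teacher completions that pass verification.

    Cost is roughly len(prompts) * samples_per_prompt teacher generations;
    the kept pairs are then used for ordinary supervised fine-tuning of the student.
    """
    dataset: List[Dict[str, str]] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = teacher_generate(prompt)
            if verify(prompt, completion):
                dataset.append({"prompt": prompt, "completion": completion})
    return dataset


# Toy run: the "teacher" gets 2+2 wrong, so only the 1+1 samples survive.
answers = {"1+1": "2", "2+2": "5"}
gold = {"1+1": "2", "2+2": "4"}
data = build_distillation_set(
    ["1+1", "2+2"],
    teacher_generate=lambda p: answers[p],
    verify=lambda p, c: gold[p] == c,
    samples_per_prompt=2,
)
print(len(data))  # 2
```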

Rubric-based On-policy Distillation

Hugging Face Daily Papers · 2026-05-08

This paper introduces ROPD, a rubric-based on-policy distillation framework that achieves superior sample efficiency compared to traditional logit-based methods. It enables model alignment in black-box scenarios by using structured semantic rubrics instead of teacher logits.
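For contrast with the "traditional logit-based methods" this summary mentions, here is a minimal sketch of the classic Hinton-style knowledge-distillation loss (temperature-scaled KL from teacher to student). It is included only to show why such methods need white-box access: the loss consumes the teacher's full logit vector. This is the standard KD objective, not ROPD, and the temperature value is an arbitrary illustrative choice.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled, numerically stable softmax over one vocabulary row."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def logit_kd_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Hinton-style KD at one position: T^2 * KL(p_teacher || p_student).

    Requires the teacher's full logit vector, which black-box APIs usually
    do not return; a rubric-based signal avoids this dependency.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


print(logit_kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))      # 0.0: identical distributions
print(logit_kd_loss([3.0, 0.0, 0.0], [0.0, 0.0, 3.0]) > 0)  # True
```

The T^2 factor keeps gradient magnitudes comparable across temperatures; a rubric-based method instead scores the student's sampled text against structured criteria, so it needs only the teacher's (or grader's) text outputs.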

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

arXiv cs.CL · 2026-04-20

This paper proposes AdaRankLLM, an adaptive retrieval framework that challenges the necessity of adaptive RAG by using listwise ranking to dynamically filter retrieved passages. The work shows that adaptive retrieval serves as a noise filter for weaker models while acting as a cost-efficiency optimizer for stronger models, with extensive experiments across multiple datasets and LLMs.

@AnthropicAI: Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hid…

X AI KOLs · 2026-04-15

Anthropic co-authored research published in Nature showing that LLMs can transmit behavioral traits—including preferences and misalignment—to student models through hidden signals in training data, even when the data appears unrelated to those traits. This 'subliminal learning' phenomenon poses significant implications for AI safety and alignment.

Model Distillation in the API

OpenAI Blog · 2024-10-01

OpenAI introduces a Model Distillation offering in its API, enabling developers to use outputs from frontier models like o1-preview and GPT-4o to fine-tune smaller, cost-efficient models like GPT-4o mini through an integrated pipeline including Stored Completions, Evals, and Fine-tuning.
