This paper challenges the 'Locate-then-Update' paradigm in LLM post-training by demonstrating that static mechanistic localization is unreliable: neural circuits evolve dynamically during fine-tuning, so components identified before training may no longer be the relevant ones afterward. It introduces new metrics for analyzing circuit stability and argues for predictive, rather than static, frameworks for mechanistic localization.
When2Speak is a synthetic dataset and pipeline for training LLMs to decide when to speak in multi-party conversations. Fine-tuning on this dataset significantly improves turn-taking, and reinforcement learning further reduces missed interventions from 50% to roughly 20%.
TRIDENT is a novel framework and dataset-synthesis pipeline for enhancing LLM safety through tri-dimensional red-teaming data covering lexical diversity, malicious intent, and jailbreak tactics. Fine-tuning Llama-3.1-8B on TRIDENT-Edge achieves a 14.29% reduction in Harm Score and a 20% decrease in Attack Success Rate compared to baseline models.
This paper proposes DeepInsightTheorem, a hierarchical dataset and Progressive Multi-Stage SFT training strategy to improve LLMs' informal theorem proving by teaching them to identify and apply core techniques through insight-aware reasoning.
GFT (Group Fine-Tuning) is a unified post-training framework for LLMs that addresses the limitations of supervised fine-tuning (SFT) by using Group Advantage Learning and Dynamic Coefficient Rectification to improve training stability and generalization. The paper shows that SFT can be interpreted as a special case of policy gradient optimization with sparse implicit rewards, and that GFT consistently outperforms SFT-based methods while integrating more smoothly with subsequent RL training.
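The claim that SFT is a special case of policy gradient optimization can be sketched in a few lines; the notation below is a standard reconstruction of that argument, not taken from the paper itself:

```latex
% SFT minimizes negative log-likelihood of the demonstration y* given prompt x:
%   L_SFT(theta) = -log pi_theta(y* | x),
% so its gradient is
%   -nabla_theta L_SFT = nabla_theta log pi_theta(y* | x).
%
% The policy gradient for expected reward J(theta) = E_{y ~ pi_theta}[R(y)] is
%   nabla_theta J = E_{y ~ pi_theta}[ R(y) nabla_theta log pi_theta(y | x) ].
%
% Choosing the sparse implicit reward
%   R(y) = 1[y = y*] / pi_theta(y* | x)
% collapses the expectation to a single term:
\nabla_\theta J
  = \pi_\theta(y^* \mid x)\cdot\frac{1}{\pi_\theta(y^* \mid x)}
    \,\nabla_\theta \log \pi_\theta(y^* \mid x)
  = \nabla_\theta \log \pi_\theta(y^* \mid x)
  = -\nabla_\theta L_{\mathrm{SFT}}.
```

Under this reading, SFT is policy gradient with reward concentrated entirely on the demonstrated trajectory (inverse-probability weighted), which is what makes it "sparse" and motivates group-based advantage estimates as a denser alternative.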