preference-tuning

#preference-tuning

Improving Text-to-Music Generation with Human Preference Rewards

Hugging Face Daily Papers ↗ · 2026-06-19 Cached

This paper presents a text-to-music generation system that leverages reward conditioning, expert iteration, and preference tuning to improve audio quality within a 120M-parameter model, submitted to the ATTM Grand Challenge at ICME 2026.

#preference-tuning

Alignment Tuning for Large Language Models: A Data-Centric Lens on Alignment Data Pipelines

arXiv cs.CL ↗ · 2026-05-27 Cached

This survey reframes the alignment tuning of large language models as a data pipeline design problem, decomposing it into three stages: response synthesis, preference evaluation, and preference instantiation. It identifies design trade-offs and failure modes, and outlines open challenges such as prompt-level alignment and agentic settings.

#preference-tuning

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

arXiv cs.CL ↗ · 2026-05-27 Cached

This paper introduces CroCo, a method for cross-lingual contrastive preference tuning on self-generated responses, showing that a reward model trained on English preferences can effectively rank responses in other languages, improving model performance across 14 languages without language-specific annotations.

#preference-tuning

@neural_avb: Next video is on training tiny (<1B) models for preference tuning. Plus how to generate preference datasets with local …

X AI KOLs Timeline ↗ · 2026-05-26 Cached

Announces an upcoming video on training tiny (<1B) models for preference tuning, covering reward models, RLHF, DPO, and ORPO with Unsloth and TRL.
