TuneJury: An Open Metric for Improving Music Generation Preference Alignment

Hugging Face Daily Papers 06/15/26, 12:00 AM Papers

open-source reward-model text-to-music preference-alignment music-generation pairwise calibration

Summary

TuneJury is an open-source pairwise reward model for text-to-music generation that provides calibrated preference scoring and generalizes across multiple downstream applications.

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.
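To make the calibrated-margin idea concrete, here is a minimal sketch of how a score margin between two clips could be turned into a preference probability and used for threshold-based filtering. It assumes the margin is read through a standard Bradley-Terry (logistic) link; the function names, the example scores, and the `min_margin` value are hypothetical and are not taken from the TuneJury release.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry style probability that clip A is preferred over clip B.

    Assumption: the reward scores behave like Bradley-Terry log-strengths,
    so the preference probability is the logistic function of the margin.
    """
    margin = score_a - score_b
    return 1.0 / (1.0 + math.exp(-margin))


def filter_confident_pairs(pairs, min_margin=1.0):
    """Keep only (score_a, score_b) pairs whose absolute margin is large enough.

    min_margin is a hypothetical threshold chosen for illustration.
    """
    return [(a, b) for a, b in pairs if abs(a - b) >= min_margin]


# Hypothetical reward scores for three clip pairs.
pairs = [(2.0, 0.5), (1.1, 1.0), (-0.3, 1.2)]
kept = filter_confident_pairs(pairs, min_margin=1.0)

for a, b in kept:
    print(f"margin={a - b:+.2f}  P(A preferred)={preference_probability(a, b):.3f}")
```

If the margin is well calibrated, a larger threshold trades dataset size for label confidence: the near-tie pair above is dropped, while the two pairs with a clear margin are kept.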

Original Article

Cached at: 06/16/26, 11:34 AM

Source: https://huggingface.co/papers/2606.17006

Abstract

A novel open-source pairwise reward model for text-to-music generation that provides calibrated preference scoring and generalizes across multiple downstream applications through a frozen reward mechanism.

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.
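Of the three downstream applications, inference-time best-of-N selection is the simplest to illustrate: generate N candidate clips for a prompt, score each with the frozen reward, and keep the highest-scoring one. The sketch below assumes only that the reward can be called as a function of a prompt and a clip; `best_of_n`, `toy_reward`, and the clip identifiers are hypothetical stand-ins and do not reflect the actual TuneJury API.

```python
from typing import Callable, Sequence, TypeVar

Clip = TypeVar("Clip")


def best_of_n(
    prompt: str,
    candidates: Sequence[Clip],
    reward_fn: Callable[[str, Clip], float],
) -> Clip:
    """Return the candidate with the highest reward for the given prompt."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=lambda clip: reward_fn(prompt, clip))


# Hypothetical stand-in for a frozen reward model: a fixed lookup table
# keyed by clip identifier. A real reward would score the audio itself.
TOY_SCORES = {"clip_a": 0.2, "clip_b": 1.7, "clip_c": -0.4}


def toy_reward(prompt: str, clip: str) -> float:
    return TOY_SCORES[clip]


chosen = best_of_n("calm solo piano", ["clip_a", "clip_b", "clip_c"], toy_reward)
print(chosen)
```

Because the reward stays frozen, the same scoring function can be reused unchanged for selection, filtering, or as a training signal; only the generator side changes.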

Models citing this paper: 1

#### TuneJury/tunejury

Updated about 9 hours ago

Spaces citing this paper: 4

Similar Articles

Improving Text-to-Music Generation with Human Preference Rewards

Jukebox

MERIT: Learning Disentangled Music Representations for Audio Similarity

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting)

APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
