TuneJury: An Open Metric for Improving Music Generation Preference Alignment
Summary
TuneJury is an open-source pairwise reward model for text-to-music generation that provides calibrated preference scoring and generalizes across multiple downstream applications.
Cached at: 06/16/26, 11:34 AM
Source: https://huggingface.co/papers/2606.17006
Abstract
A novel open-source pairwise reward model for text-to-music generation that provides calibrated preference scoring and generalizes across multiple downstream applications through a frozen reward mechanism.
We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.
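To make the abstract's use of score margins concrete, the following is a minimal sketch of how a pairwise reward score could feed a Bradley-Terry preference probability, threshold-based pair filtering, and best-of-N selection. It is not the TuneJury implementation: the function names, the plain logistic link without a temperature, and the toy scores are all assumptions made for illustration, and the real scores would come from the released checkpoint.

```python
import math
from typing import Callable, List, Sequence, Tuple


def preference_probability(score_a: float, score_b: float) -> float:
    """Standard Bradley-Terry probability that clip A is preferred over clip B.

    Assumption: the score margin is passed through a plain logistic function;
    the actual model may use a different scale or temperature.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def filter_pairs(
    pairs: Sequence[Tuple[float, float]], min_margin: float
) -> List[Tuple[float, float]]:
    """Keep only pairs whose absolute score margin is at least min_margin."""
    return [(a, b) for a, b in pairs if abs(a - b) >= min_margin]


def best_of_n(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward score."""
    return max(candidates, key=score_fn)


# Hypothetical scores standing in for reward-model outputs on three clips.
toy_scores = {"clip_a": 0.1, "clip_b": 0.9, "clip_c": 0.4}
selected = best_of_n(list(toy_scores), toy_scores.get)  # picks "clip_b"
```

A well-calibrated margin is what makes the threshold in `filter_pairs` meaningful: a larger margin should correspond to a higher observed rate of human agreement with the predicted preference.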
Get this paper in your agent:
hf papers read 2606.17006
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
TuneJury/tunejury (updated about 9 hours ago)
Spaces citing this paper: 4
Similar Articles
Improving Text-to-Music Generation with Human Preference Rewards
This paper presents a text-to-music generation system that leverages reward conditioning, expert iteration, and preference tuning to improve audio quality within a 120M-parameter model, submitted to the ATTM Grand Challenge at ICME 2026.
Jukebox
OpenAI's Jukebox is a generative model that produces music as raw audio, including vocals and instruments, using a VQ-VAE for compression and hierarchical Sparse Transformer priors to handle long-range musical structure. It represents a significant step beyond symbolic music generation by operating directly in the raw audio domain.
MERIT: Learning Disentangled Music Representations for Audio Similarity
MERIT is a framework that learns disentangled music representations for melody, rhythm, and timbre using conditional audio generation and source-separated stems, enabling nuanced and factor-specific audio similarity queries.
Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting)
A revamped TTS benchmark introduces objective standards and live blind voting to produce Elo ratings for 46+ models, with participation open to the community.
APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
APEX is a large-scale multi-task learning framework that predicts both popularity and aesthetic quality of AI-generated music using frozen audio embeddings. The model demonstrates strong generalization across different generative architectures by jointly predicting engagement signals and perceptual quality dimensions.