APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

Hugging Face Daily Papers

Summary

APEX is a large-scale multi-task learning framework that predicts both popularity and aesthetic quality of AI-generated music using frozen audio embeddings. The model demonstrates strong generalization across different generative architectures by jointly predicting engagement signals and perceptual quality dimensions.

Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. A key yet unexplored factor in this pursuit is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio, that jointly predicts engagement-based popularity signals (streams and likes scores) alongside five perceptual aesthetic quality dimensions from frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, comprising pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
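
To make the setup concrete, here is a minimal PyTorch sketch of the kind of model the abstract describes: frozen, pre-extracted MERT embeddings feed a shared trunk with separate regression heads for the two engagement signals (streams and likes scores) and the five aesthetic dimensions, and Music-Arena-style pairwise preference is read off by comparing combined scores. The embedding dimension, layer sizes, loss weighting, and score-combination rule are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch (not the released APEX code): a multi-task regression head
# over frozen MERT embeddings, jointly predicting two engagement signals
# (streams and likes scores) and five aesthetic-quality dimensions.
import torch
import torch.nn as nn

MERT_DIM = 768     # assumed hidden size of a base MERT model
N_AESTHETIC = 5    # five perceptual aesthetic-quality dimensions

class ApexStyleHead(nn.Module):
    def __init__(self, embed_dim: int = MERT_DIM, hidden: int = 256):
        super().__init__()
        # Shared trunk over the (frozen, pre-extracted) audio embedding.
        self.trunk = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(), nn.Dropout(0.1),
        )
        # Task-specific heads: popularity (streams, likes) and aesthetics.
        self.streams = nn.Linear(hidden, 1)
        self.likes = nn.Linear(hidden, 1)
        self.aesthetics = nn.Linear(hidden, N_AESTHETIC)

    def forward(self, emb: torch.Tensor) -> dict[str, torch.Tensor]:
        h = self.trunk(emb)
        return {
            "streams": self.streams(h).squeeze(-1),
            "likes": self.likes(h).squeeze(-1),
            "aesthetics": self.aesthetics(h),
        }

def multitask_loss(pred, target, w_pop=1.0, w_aes=1.0):
    # Simple weighted MSE across tasks; the paper's exact objectives and
    # weights are not given here, so this is a placeholder.
    mse = nn.functional.mse_loss
    pop = mse(pred["streams"], target["streams"]) + mse(pred["likes"], target["likes"])
    aes = mse(pred["aesthetics"], target["aesthetics"])
    return w_pop * pop + w_aes * aes

# Pairwise preference, Music-Arena style: rank two generations by a combined
# score. This combination rule (likes plus mean aesthetics) is an
# illustrative choice, not the paper's.
@torch.no_grad()
def prefers_a(model, emb_a, emb_b):
    sa, sb = model(emb_a), model(emb_b)
    score = lambda s: s["likes"] + s["aesthetics"].mean(-1)
    return score(sa) > score(sb)
```

In practice the frozen MERT encoder would be run once offline to produce an embedding per track, so only the small head above needs training.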

Source: https://huggingface.co/papers/2605.03395

Community

Paper author and submitter (1 day ago):

Large-scale, aesthetics-informed AI music hit prediction model, in terms of streams and likes scores.

Get this paper in your agent:

hf papers read 2605.03395

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

amaai-lab/apex (Feature Extraction) • Updated 1 day ago • 286 • 3

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 1

Similar Articles

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

Hugging Face Daily Papers

ArtifactNet is a lightweight neural network framework that detects AI-generated music by analyzing codec-specific artifacts in audio signals, achieving F1=0.9829 on a new 6,183-track benchmark (ArtifactBench) with 49x fewer parameters than competing methods. The approach uses forensic physics principles to extract codec residuals through a bounded-mask UNet and compact CNN, with codec-aware training reducing cross-codec drift by 83%.

Jukebox

OpenAI Blog

OpenAI's Jukebox is a generative model that produces music as raw audio, including vocals and instruments, using a VQ-VAE for compression and hierarchical Sparse Transformer priors to handle long-range musical structure. It represents a significant step beyond symbolic music generation by operating directly in the raw audio domain.
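
As background for the compression step mentioned above, here is a minimal sketch of a VQ-VAE-style vector quantizer of the kind Jukebox stacks at several temporal resolutions; the codebook size and dimension below are illustrative assumptions, not Jukebox's exact configuration.

```python
# Minimal VQ-VAE-style bottleneck sketch (codebook size/dim are assumptions;
# Jukebox uses a hierarchy of such quantizers at different resolutions).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, n_codes: int = 2048, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, time, dim) continuous encoder output.
        # Squared L2 distance from each frame to every codebook vector.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        codes = dists.argmin(dim=-1)     # (batch, time) discrete tokens
        zq = self.codebook(codes)        # nearest codebook vectors
        # Straight-through estimator: gradients pass through to the encoder.
        zq = z + (zq - z).detach()
        return zq, codes
```

The discrete codes are what the hierarchical Transformer priors then model autoregressively.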

Music AI Sandbox, now with new features and broader access

Google DeepMind Blog

Google DeepMind expands Music AI Sandbox with new features including Lyria 2 music generation model and broader access to musicians in the U.S., enabling AI-assisted music creation through tools for generating, extending, and editing musical content.

Learning from human preferences

OpenAI Blog

OpenAI presents a method for training AI agents using human preference feedback, where an agent learns reward functions from human comparisons of behavior trajectories and uses reinforcement learning to optimize for the inferred goals. The approach demonstrates strong sample efficiency, requiring less than 1000 bits of human feedback to train an agent to perform a backflip.
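
A minimal sketch of the reward-learning step described above, assuming a simple MLP reward model over observation features and the standard Bradley-Terry pairwise loss; the shapes and architecture are illustrative, not OpenAI's implementation.

```python
# Hedged sketch: fit a reward model so the human-preferred trajectory
# segment of each labeled pair gets the higher summed reward
# (Bradley-Terry / logistic pairwise loss).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, time, obs_dim) -> scalar return per segment.
        return self.net(segment).sum(dim=(1, 2))

def preference_loss(rm: RewardModel, seg_a, seg_b, pref_a: torch.Tensor):
    # pref_a: 1.0 if the human preferred segment A, 0.0 if B.
    logits = rm(seg_a) - rm(seg_b)   # Bradley-Terry log-odds
    return nn.functional.binary_cross_entropy_with_logits(logits, pref_a)
```

The learned reward then stands in for an environment reward during reinforcement learning, which is how so few bits of human feedback can steer behavior.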