APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
Summary
APEX is a large-scale multi-task learning framework that predicts both popularity and aesthetic quality of AI-generated music using frozen audio embeddings. The model demonstrates strong generalization across different generative architectures by jointly predicting engagement signals and perceptual quality dimensions.
Source: https://huggingface.co/papers/2605.03395
Abstract
A large-scale multi-task learning framework for AI-generated music predicts both popularity and aesthetic quality using frozen audio embeddings from a self-supervised music understanding model, demonstrating strong generalization across different generative architectures.
Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. Key, yet unexplored in this pursuit, is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio, that jointly predicts engagement-based popularity signals - streams and likes scores - alongside five perceptual aesthetic quality dimensions from frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, comprising pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
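The core architecture described above - shared frozen embeddings feeding lightweight task-specific prediction heads - can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the embedding dimension, the hidden size, and the single shared trunk are assumptions, and random vectors stand in for real MERT embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 1024   # assumed frozen-embedding size (stand-in for MERT features)
HIDDEN = 256       # assumed shared-trunk width

# Shared trunk weights plus two task-specific output heads: two
# popularity targets (streams and likes scores) and five aesthetic
# quality dimensions, as described in the abstract.
W_trunk = rng.normal(size=(EMBED_DIM, HIDDEN)) * 0.02
W_pop = rng.normal(size=(HIDDEN, 2)) * 0.02
W_aes = rng.normal(size=(HIDDEN, 5)) * 0.02

def predict(emb):
    """Multi-task prediction from frozen song-level embeddings."""
    h = np.maximum(emb @ W_trunk, 0.0)   # shared ReLU trunk
    return h @ W_pop, h @ W_aes          # popularity scores, aesthetic scores

# Random vectors stand in for a batch of 4 frozen song embeddings.
emb = rng.normal(size=(4, EMBED_DIM))
pop, aes = predict(emb)
print(pop.shape, aes.shape)   # (4, 2) (4, 5)
```

Because the embeddings are frozen, only the small trunk and head weights would be trained, which is what makes this setup cheap to scale to hundreds of thousands of songs.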
Community
Paper author
Paper submitter
Large-scale, aesthetics-informed AI music hit prediction model, predicting streams and likes scores.
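The abstract's out-of-distribution evaluation frames pairwise human preference battles as a prediction target. One common way to use per-song scores for that - sketched here as an assumption, not the paper's method - is a logistic model over the difference of the two songs' feature vectors (two popularity scores plus five aesthetic scores). All weights and features below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in weights over 7 features: 2 popularity + 5 aesthetic scores.
w = rng.normal(size=7)

def win_probability(feats_a, feats_b):
    """P(song A preferred over song B) from the feature difference."""
    return sigmoid(w @ (feats_a - feats_b))

# Synthetic predicted features for two songs in a "battle".
a = rng.normal(size=7)
b = rng.normal(size=7)
p = win_probability(a, b)

# The model is symmetric by construction: P(A beats B) + P(B beats A) = 1.
assert np.isclose(win_probability(a, b) + win_probability(b, a), 1.0)
print(float(p))
```

Dropping the five aesthetic features from the vectors and retraining would give the ablation the abstract alludes to, where including them consistently improves preference prediction.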
Get this paper in your agent:
hf papers read 2605.03395
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper1
#### amaai-lab/apex
Feature Extraction • Updated 1 day ago • 286 • 3
Datasets citing this paper0
No dataset linking this paper
Cite arxiv.org/abs/2605.03395 in a dataset README.md to link it from this page.
Spaces citing this paper0
No Space linking this paper
Cite arxiv.org/abs/2605.03395 in a Space README.md to link it from this page.
Collections including this paper1
Similar Articles
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
ArtifactNet is a lightweight neural network framework that detects AI-generated music by analyzing codec-specific artifacts in audio signals, achieving F1=0.9829 on a new 6,183-track benchmark (ArtifactBench) with 49x fewer parameters than competing methods. The approach uses forensic physics principles to extract codec residuals through a bounded-mask UNet and compact CNN, with codec-aware training reducing cross-codec drift by 83%.
The BEST local AI music generator is here! Free & unlimited
ACE-Step 1.5 XL is an open-source music generator that surpasses Suno & Udio in quality and speed, running unlimited on a 12 GB GPU with ~120× real-time generation.
Jukebox
OpenAI's Jukebox is a generative model that produces music as raw audio, including vocals and instruments, using a VQ-VAE for compression and hierarchical Sparse Transformer priors to handle long-range musical structure. It represents a significant step beyond symbolic music generation by operating directly in the raw audio domain.
Music AI Sandbox, now with new features and broader access
Google DeepMind expands Music AI Sandbox with new features including Lyria 2 music generation model and broader access to musicians in the U.S., enabling AI-assisted music creation through tools for generating, extending, and editing musical content.
Learning from human preferences
OpenAI presents a method for training AI agents using human preference feedback, where an agent learns reward functions from human comparisons of behavior trajectories and uses reinforcement learning to optimize for the inferred goals. The approach demonstrates strong sample efficiency, requiring less than 1000 bits of human feedback to train an agent to perform a backflip.