APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
Summary
APEX is a large-scale multi-task learning framework that predicts both popularity and aesthetic quality of AI-generated music using frozen audio embeddings. The model demonstrates strong generalization across different generative architectures by jointly predicting engagement signals and perceptual quality dimensions.
Cached at: 05/08/26, 08:06 AM
Source: https://huggingface.co/papers/2605.03395
Abstract
A large-scale multi-task learning framework for AI-generated music predicts both popularity and aesthetic quality using frozen audio embeddings from a self-supervised music understanding model, demonstrating strong generalization across different generative architectures.
Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. Key, yet unexplored in this pursuit, is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio, that jointly predicts engagement-based popularity signals - streams and likes scores - alongside five perceptual aesthetic quality dimensions from frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, comprising pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
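The multi-task setup described in the abstract can be illustrated with a minimal NumPy sketch: a shared layer over pre-extracted, frozen embeddings feeds two output heads, one for the two popularity scores and one for the five aesthetic dimensions, trained with a summed loss. This is not the authors' architecture. The embedding size (768), the hidden width, the single ReLU trunk, the mean-squared-error loss, and the `aes_weight` parameter are all assumptions made for illustration, and random vectors stand in for real MERT embeddings.

```python
import numpy as np

EMB_DIM = 768   # assumed embedding size; not stated in the abstract
HIDDEN = 256    # hypothetical width of the shared trunk
N_POP = 2       # streams score and likes score
N_AES = 5       # five perceptual aesthetic quality dimensions


def init_params(rng):
    """Randomly initialise a shared trunk and two task-specific linear heads."""
    return {
        "W_shared": rng.normal(0.0, 0.02, (EMB_DIM, HIDDEN)),
        "b_shared": np.zeros(HIDDEN),
        "W_pop": rng.normal(0.0, 0.02, (HIDDEN, N_POP)),
        "b_pop": np.zeros(N_POP),
        "W_aes": rng.normal(0.0, 0.02, (HIDDEN, N_AES)),
        "b_aes": np.zeros(N_AES),
    }


def forward(params, emb):
    """Map a batch of frozen embeddings to popularity and aesthetic predictions."""
    # Shared representation with a ReLU non-linearity.
    h = np.maximum(emb @ params["W_shared"] + params["b_shared"], 0.0)
    pop = h @ params["W_pop"] + params["b_pop"]
    aes = h @ params["W_aes"] + params["b_aes"]
    return pop, aes


def joint_loss(pop_pred, aes_pred, pop_true, aes_true, aes_weight=1.0):
    """Sum of per-task mean squared errors (the weighting is an assumption)."""
    pop_mse = np.mean((pop_pred - pop_true) ** 2)
    aes_mse = np.mean((aes_pred - aes_true) ** 2)
    return pop_mse + aes_weight * aes_mse


rng = np.random.default_rng(0)
params = init_params(rng)

# Random stand-ins for a batch of four pre-extracted song embeddings and targets.
emb = rng.normal(size=(4, EMB_DIM))
pop_true = rng.normal(size=(4, N_POP))
aes_true = rng.normal(size=(4, N_AES))

pop_pred, aes_pred = forward(params, emb)
loss = joint_loss(pop_pred, aes_pred, pop_true, aes_true)
```

Because the embeddings are computed once and kept frozen, only the small trunk and heads would need gradients in a setup like this, which is one plausible reason such a design scales to hundreds of thousands of songs.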
Community
A large-scale, aesthetics-informed hit prediction model for AI-generated music that predicts a streams score and a likes score.
Get this paper in your agent:
hf papers read 2605.03395
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
amaai-lab/apex • Feature Extraction • Updated 1 day ago • 286 • 3
Similar Articles
APEX: Adaptive Principle EXtraction A Three-Layer Self-Evolution Framework for Production AI Agents
APEX proposes a three-layer self-evolution framework for production AI agents that simultaneously optimizes the harness, behavioural principles, and workflow topology. Experiments on a production agent show significant improvements in health score and workflow quality with minimal LLM calls.
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
ArtifactNet is a lightweight neural network framework that detects AI-generated music by analyzing codec-specific artifacts in audio signals, achieving F1=0.9829 on a new 6,183-track benchmark (ArtifactBench) with 49x fewer parameters than competing methods. The approach uses forensic physics principles to extract codec residuals through a bounded-mask UNet and compact CNN, with codec-aware training reducing cross-codec drift by 83%.
APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations
APEX is a network-native, decoder-only transformer for forecasting and anomaly detection in wireless edge telemetry, pre-trained on data from ~4,500 production networks. It achieves 18% lower MAE than the best general-purpose time-series foundation model on a DHCP degradation benchmark and enables sub-second inference on edge hardware.
APEX: Automated Prompt Engineering eXpert with Dynamic Data Selection
APEX introduces a dynamic data selection strategy for automatic prompt optimization, stratifying datasets into easy, hard, and mixed tiers to improve data efficiency, achieving significant performance gains over initial prompts on multiple benchmarks.
Improving Text-to-Music Generation with Human Preference Rewards
This paper presents a text-to-music generation system that leverages reward conditioning, expert iteration, and preference tuning to improve audio quality within a 120M-parameter model, submitted to the ATTM Grand Challenge at ICME 2026.