How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling
Summary
This paper evaluates how small adaptation interfaces (LoRA, IA3, BitFit, prefix tuning, full fine-tuning) extend a frozen Music Transformer to eleven target genres for chord-symbol time-series modeling. Results show consistent harmonic prediction improvement but limited genre identity representation, concluding that chord symbols alone are insufficient to capture complete genre identity.
Cached at: 06/08/26, 11:15 AM
Source: https://huggingface.co/papers/2606.07334
Abstract
Small adaptation interfaces extend a frozen Music Transformer model to multiple genres, showing consistent improvement in harmonic prediction but limited genre identity representation.
Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical convention meet. This report treats chord-symbol sequences not as a complete representation of music, but as an interpretable, controllable time series for genre-local harmonic modeling. Starting from a frozen pop-jazz Music Transformer checkpoint, I evaluate how far small adaptation interfaces can extend the model to eleven target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main evaluation compares LoRA, IA3, BitFit, prefix tuning, and full fine-tuning over 11 genres and 3 seeds, a complete 165-cell grid. All five methods improve over the frozen base on held-out chord prediction, with macro gains from +2.89 to +3.61 points; LoRA and IA3 score highest, but Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A matched-data-size control sharpens this: when genres are sub-sampled to a common corpus size, IA3 stays on top but LoRA’s full-data edge disappears and it falls to last, indicating the small gaps are partly data-driven. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting much of the effect comes from lightweight conditioning over a reusable harmonic base rather than one particular adapter family. Additional diagnostics (rank sweeps, wrong-genre rotation, a base-checkpoint ablation, chord-only genre classification, generated-output statistics, real-song evaluation, and duplicate analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry complete genre identity. The report therefore avoids claims about perceived genre authenticity or full musical quality, which require controlled listener or musician evaluation.
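The adaptation interfaces compared here differ mainly in which parameters are trained while the base checkpoint stays frozen. The following is a minimal NumPy sketch on a single linear layer, not the report's implementation: the layer sizes, the LoRA rank and `ALPHA`, and the helper names (`lora`, `ia3`, `bitfit`, `trainable_params`) are assumptions for illustration. The IA3 variant is simplified to rescaling one layer's outputs (the original method rescales attention keys, values, and feed-forward activations), and prefix tuning is omitted because it prepends trainable key/value vectors inside attention rather than modifying one projection.

```python
import numpy as np

# Hypothetical sizes for one projection layer; not taken from the report.
D_IN, D_OUT, RANK, ALPHA = 16, 16, 4, 8.0

rng = np.random.default_rng(0)
W = rng.normal(size=(D_OUT, D_IN))  # frozen base weight
b = rng.normal(size=D_OUT)          # frozen base bias


def base(x):
    """Frozen base layer: y = W x + b."""
    return W @ x + b


def lora(x, A, B):
    """LoRA: add a trainable low-rank update (ALPHA / RANK) * B A to the frozen W."""
    return base(x) + (ALPHA / RANK) * (B @ (A @ x))


def ia3(x, scale):
    """IA3-style (simplified): rescale the layer's outputs with a trainable vector."""
    return scale * base(x)


def bitfit(x, delta_b):
    """BitFit: train only an offset to the bias; W is untouched."""
    return W @ x + (b + delta_b)


def trainable_params():
    """Number of trainable values each interface adds for this one layer."""
    return {
        "full_ft": W.size + b.size,
        "lora": RANK * D_IN + D_OUT * RANK,  # A is (RANK, D_IN), B is (D_OUT, RANK)
        "ia3": D_OUT,
        "bitfit": D_OUT,
    }


# Usage: with B initialized to zero, LoRA starts out identical to the frozen base.
x = rng.normal(size=D_IN)
A = rng.normal(size=(RANK, D_IN)) * 0.01
B = np.zeros((D_OUT, RANK))
assert np.allclose(lora(x, A, B), base(x))
print(trainable_params())  # {'full_ft': 272, 'lora': 128, 'ia3': 16, 'bitfit': 16}
```

The zero-initialized LoRA `B` and an all-ones IA3 scale both make the adapted layer start from the frozen checkpoint's exact behavior, so every interface begins training from the pop-jazz base rather than from a perturbed model.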
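The 165-cell figure is simply 5 methods × 11 genres × 3 seeds. A minimal sketch of enumerating that grid and computing a macro gain in points follows, assuming "macro" means averaging seeds within each genre and then weighting all eleven genres equally; the report's exact aggregation may differ. The identifiers (`GRID`, `macro_gain`, the genre keys) are hypothetical, and the accuracy values in the usage lines are placeholders, not results from the report.

```python
from itertools import product
from statistics import mean

METHODS = ["lora", "ia3", "bitfit", "prefix", "full_ft"]
GENRES = ["blues", "bossa_nova", "bach_chorales", "country", "electronic", "folk",
          "funk", "gospel", "hip_hop", "rnb_soul", "rock"]
SEEDS = [0, 1, 2]

# Every (method, genre, seed) run: 5 x 11 x 3 = 165 cells.
GRID = list(product(METHODS, GENRES, SEEDS))


def macro_gain(acc, base_acc, method):
    """Macro gain in percentage points for one method.

    acc:      {(method, genre, seed): held-out chord accuracy in [0, 1]}
    base_acc: {genre: frozen-base accuracy in [0, 1]}
    Assumption: seeds are averaged within a genre, then genres are weighted equally.
    """
    per_genre = [
        100.0 * (mean(acc[(method, g, s)] for s in SEEDS) - base_acc[g])
        for g in GENRES
    ]
    return mean(per_genre)


# Usage with made-up placeholder numbers (not results from the report).
base_acc = {g: 0.50 for g in GENRES}
acc = {cell: 0.53 for cell in GRID}
print(len(GRID))                                   # 165
print(round(macro_gain(acc, base_acc, "ia3"), 2))  # 3.0
```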
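Because several pairwise method comparisons are tested at once, the raw p-values (for example from paired Wilcoxon signed-rank tests such as `scipy.stats.wilcoxon`, assuming one paired score per genre and seed) need a multiplicity correction. Below is a dependency-free sketch of the two corrections the abstract names: Holm's step-down adjustment, which controls the family-wise error rate, and the Benjamini-Hochberg step-up adjustment, which controls the false discovery rate. The p-values are hypothetical; `statsmodels.stats.multitest.multipletests` with `method="holm"` or `method="fdr_bh"` provides the same adjustments.

```python
def holm(pvals):
    """Holm step-down adjusted p-values (controls family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank k is multiplied by (m - k), capped at 1.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max  # keep adjusted values monotone in rank
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (controls false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in holm(raw)])                # [0.04, 0.09, 0.09, 0.2]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

On the same p-values the Holm adjustment is never smaller than the Benjamini-Hochberg one, so reporting both shows whether a "no decisive winner" conclusion depends on which error rate is controlled.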
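The matched-data-size control can be sketched as sub-sampling every genre to one shared corpus size before training, so that differences between methods cannot come from some genres simply having more data. This sketch assumes the common size is the smallest genre's count of chord sequences and that sampling is without replacement under a fixed seed; the report may match on a different unit (such as tokens) or a different target size, and `match_corpus_sizes` is a hypothetical helper.

```python
import random


def match_corpus_sizes(corpora, seed=0):
    """Sub-sample each genre to the size of the smallest genre.

    corpora: {genre: list of chord sequences}
    Assumption: size is counted in sequences; sampling is without replacement.
    """
    target = min(len(seqs) for seqs in corpora.values())
    rng = random.Random(seed)  # fixed seed so repeated runs draw the same subsets
    return {
        genre: rng.sample(seqs, target)
        for genre, seqs in sorted(corpora.items())  # sorted keys: input-order independent
    }


# Toy corpora: string IDs stand in for chord sequences (not the report's data).
corpora = {
    "rock": [f"rock_{i}" for i in range(200)],
    "blues": [f"blues_{i}" for i in range(50)],
    "funk": [f"funk_{i}" for i in range(80)],
}
matched = match_corpus_sizes(corpora)
print({g: len(s) for g, s in matched.items()})  # {'blues': 50, 'funk': 50, 'rock': 50}
```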
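The wrong-genre rotation diagnostic pairs each genre's held-out set with an adapter trained on a different genre. One simple way to build such pairings, assumed here rather than taken from the report, is a cyclic shift over the genre list: every adapter is used exactly once and never on its own genre. `wrong_genre_pairs` is a hypothetical name.

```python
GENRES = ["blues", "bossa_nova", "bach_chorales", "country", "electronic", "folk",
          "funk", "gospel", "hip_hop", "rnb_soul", "rock"]


def wrong_genre_pairs(genres, shift=1):
    """Return (eval_genre, adapter_genre) pairs from a cyclic shift of the genre list."""
    n = len(genres)
    if shift % n == 0:
        raise ValueError("shift must be nonzero modulo the number of genres")
    return [(genres[i], genres[(i + shift) % n]) for i in range(n)]


for eval_genre, adapter_genre in wrong_genre_pairs(GENRES)[:3]:
    print(f"evaluate {eval_genre} with the {adapter_genre} adapter")
# evaluate blues with the bossa_nova adapter
# evaluate bossa_nova with the bach_chorales adapter
# evaluate bach_chorales with the country adapter
```

Scoring each pair against both the frozen base and the matching-genre adapter separates a generic adaptation benefit from a genre-specific one; the abstract reports that wrong-genre adapters often beat the frozen base.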
Models citing this paper: 17
#### PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa
#### PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline
#### PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80
#### PearlLeeStudio/TheArtist-MusicTransformer-ft-pop67
Datasets citing this paper: 0
No dataset linking this paper
Spaces citing this paper: 0
No Space linking this paper
Collections including this paper: 0
No Collection including this paper
Similar Articles
Under the Hood: Building a Real-Time Chord Recognizer
This article explains the technical architecture of a real-time chord recognizer, detailing a four-stage pipeline using pitch-class bitmasks, candidate generation, score normalization, and musical heuristics.
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
This paper introduces Live Music Diffusion Models (LMDMs), which modify the diffusion process to enable efficient block-wise processing and novel training paradigms for real-time interactive music generation on consumer hardware, outperforming discrete autoregressive models in inference complexity and enabling stable post-training alignment.
Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology
This paper investigates how domain adaptation reshapes explanatory behavior in language models by training on a pre-Copernican corpus, finding that fine-tuning shifts explanatory framing more than cosmological stance.
Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
Chronicle is a 324M-parameter decoder-only transformer pretrained from scratch on both natural language and time series, achieving competitive performance on NLU and time series classification tasks, and setting new state-of-the-art for frozen-embedding time series classification on UCR/UEA datasets.
ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models
ADAPTOOD is a novel framework that uses data uncertainty to quantify distribution shift severity and guide fine-tuning of ECG time series models for out-of-distribution settings. It combines uncertainty estimation with low-rank model updates and adaptive hyperparameter optimization, achieving up to 7% higher accuracy and 12.9% higher precision than existing OOD adaptation methods.