Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

Hugging Face Daily Papers 05/21/26, 12:00 AM Papers

Summary

This paper introduces Live Music Diffusion Models (LMDMs), which modify the diffusion process to enable efficient block-wise processing and novel training paradigms for real-time interactive music generation on consumer hardware, outperforming discrete autoregressive models in inference complexity and enabling stable post-training alignment.

Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we investigate whether audio diffusion models, with their wide support in the open-source community but non-streaming bidirectional nature, can be repurposed efficiently into interactive models accessible on consumer hardware. By taking a critical look at the modern pipeline for block-wise outpainting diffusion, we identify critical inefficiencies during inference that result in strictly worse computational efficiency than their discrete-AR counterparts. We propose Live Music Diffusion Models (LMDMs), a simple modification of the generative diffusion process that recovers, and then outperforms, the inference complexity of the discrete Live Music Models (LMMs) through block-wise KV Caching. Unlike LMMs, LMDMs further enable stable post-training alignment through our novel ARC-Forcing paradigm, reducing error accumulation without any explicit RL or reward models. We demonstrate the application of LMDMs in a number of creative domains, including text-conditioned generation, sketch-based music synthesis, and jamming. We finally show how LMDMs can be used as a generative instrument in a real artist-AI collaboration, utilizing LMDMs as a "generative delay" to transform musicians' improvisation live for variable timbral effects while running locally on a consumer gaming laptop.
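The efficiency argument above can be made concrete with a toy cost model. This is only a sketch under stated assumptions, not the paper's method: the summary does not describe how LMDMs implement block-wise KV caching. The sketch assumes that plain block-wise outpainting re-runs the network over context plus new block at every denoising step, and that a cached variant processes only the new block per step and adds one extra pass to store the finished block's keys and values. The names `BlockConfig`, `outpainting_cost`, and `cached_cost`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockConfig:
    """Hypothetical sizes for one generation block (illustrative only)."""
    context_tokens: int  # previously generated tokens visible as context
    block_tokens: int    # new tokens produced per block
    denoise_steps: int   # diffusion sampling steps per block


def outpainting_cost(cfg: BlockConfig) -> tuple[int, int]:
    """Per-block cost when context is re-encoded at every denoising step.

    Returns (token forward passes, attention query-key pairs).
    """
    seq = cfg.context_tokens + cfg.block_tokens
    token_passes = cfg.denoise_steps * seq
    attn_pairs = cfg.denoise_steps * seq * seq
    return token_passes, attn_pairs


def cached_cost(cfg: BlockConfig) -> tuple[int, int]:
    """Per-block cost when context keys/values are read from a cache.

    Assumption: only the new block is processed at each denoising step,
    plus one extra pass to write the finished block into the cache.
    """
    seq = cfg.context_tokens + cfg.block_tokens
    token_passes = (cfg.denoise_steps + 1) * cfg.block_tokens
    # Each processed token attends to every cached and current key.
    attn_pairs = token_passes * seq
    return token_passes, attn_pairs


if __name__ == "__main__":
    cfg = BlockConfig(context_tokens=750, block_tokens=50, denoise_steps=8)
    print("outpainting (passes, pairs):", outpainting_cost(cfg))
    print("cached      (passes, pairs):", cached_cost(cfg))
```

Under these assumptions the uncached cost grows with the full context length at every step, while the cached cost grows only with the block size times the number of steps. How LMDMs actually compare to discrete LMMs depends on details given in the full paper.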

Original Article

Cached at: 05/22/26, 10:22 PM

Source: https://huggingface.co/papers/2605.22717 Authors:

Abstract

Audio diffusion models are adapted for interactive music generation through efficient block-wise processing and novel training paradigms that enable real-time performance on consumer hardware.

Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we investigate whether audio diffusion models, with their wide support in the open-source community but non-streaming bidirectional nature, can be repurposed efficiently into interactive models accessible on consumer hardware. By taking a critical look at the modern pipeline for block-wise outpainting diffusion, we identify critical inefficiencies during inference that result in strictly worse computational efficiency than their discrete-AR counterparts. We propose Live Music Diffusion Models (LMDMs), a simple modification of the generative diffusion process that recovers, and then outperforms, the inference complexity of the discrete Live Music Models (LMMs) through block-wise KV Caching. Unlike LMMs, LMDMs further enable stable post-training alignment through our novel ARC-Forcing paradigm, reducing error accumulation without any explicit RL or reward models. We demonstrate the application of LMDMs in a number of creative domains, including text-conditioned generation, sketch-based music synthesis, and jamming. We finally show how LMDMs can be used as a generative instrument in a real artist-AI collaboration, utilizing LMDMs as a “generative delay” to transform musicians’ improvisation live for variable timbral effects while running locally on a consumer gaming laptop.

Get this paper in your agent:

hf papers read 2605.22717

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Learnability-Informed Fine-Tuning of Diffusion Language Models

DEMON: Diffusion Engine for Musical Orchestrated Noise

Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation

Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion
