PianoKontext: Expressive Performance Rendering from Deadpan Context

Hugging Face Daily Papers 06/10/26, 12:00 AM Papers

music-generation piano flow-matching expressive-performance audio-synthesis midi dynamic-time-warping

Summary

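PianoKontext generates variable-length expressive piano performances from deadpan MIDI scores. Each score is synthesized into deadpan audio, Dynamic Time Warping in the latent space of a pretrained Music2Latent model is used to construct paired training data, and a flow matching model with DiT blocks learns from the aligned embeddings.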

Expressive performance rendering (EPR) aims to generate realistic performances constrained on sequences of notes. However, flow matching audio editing models manipulate only synchronized music samples of the same duration, limiting their understanding of expressive timing. We introduce PianoKontext, a flow matching rendering model for classical piano music that generates variable-length performances in the latent space of a pretrained Music2Latent model. We synthesize MIDI scores into deadpan audio and employ Dynamic Time Warping (DTW) in the latent space to construct paired data for training. The aligned embeddings are concatenated in DiT blocks, allowing for a simple and effective learning of the dependencies between the score and performances. Audio samples are available at our demo page: https://realfolkcode.github.io/pianokontext_demo/.
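As a rough illustration of the DTW pairing step described above, the following sketch aligns two sequences of latent frames and resamples one onto the timeline of the other. It is not the authors' implementation: the function names `dtw_path` and `warp_to_target`, the Euclidean frame distance, the direction of the warp (deadpan onto performance), and the "last match wins" rule are all assumptions made for the example, and the toy one-dimensional frames stand in for real Music2Latent embeddings.

```python
import math


def dtw_path(a, b):
    """Return a minimum-cost DTW path between two non-empty sequences of vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (assumed metric)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def warp_to_target(source, target):
    """Resample `source` to len(target) frames by following the DTW path."""
    warped = [None] * len(target)
    for i, j in dtw_path(source, target):
        warped[j] = source[i]  # if several source frames match j, the last one wins
    return warped


# Toy latents: the "performance" lingers on the first and last frames.
deadpan = [[0.0], [1.0], [2.0]]
performance = [[0.0], [0.0], [1.0], [2.0], [2.0]]
aligned_deadpan = warp_to_target(deadpan, performance)
print(len(aligned_deadpan), len(performance))
```

After warping, the two sequences have the same number of frames, which is what allows them to be paired frame by frame during training.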

Original Article

Cached at: 06/12/26, 10:52 AM

Source: https://huggingface.co/papers/2606.12282

Published on Jun 10

Submitted by Dmitry (https://huggingface.co/realfolkcode) on Jun 12

Abstract

PianoKontext generates variable-length piano performances by aligning deadpan audio synthesized from MIDI scores with performances in latent space using DTW, then learning from the aligned embeddings with DiT blocks.

Expressive performance rendering (EPR) aims to generate realistic performances constrained on sequences of notes. However, flow matching audio editing models manipulate only synchronized music samples of the same duration, limiting their understanding of expressive timing. We introduce PianoKontext, a flow matching rendering model for classical piano music that generates variable-length performances in the latent space of a pretrained Music2Latent model. We synthesize MIDI scores into deadpan audio and employ Dynamic Time Warping (DTW) in the latent space to construct paired data for training. The aligned embeddings are concatenated in DiT blocks, allowing for a simple and effective learning of the dependencies between the score and performances. Audio samples are available at our demo page: https://realfolkcode.github.io/pianokontext_demo/.
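To make the flow matching and concatenation ideas in the abstract more concrete, here is a minimal sketch of how one training example could be built from a pair of aligned latent sequences. The abstract does not specify these details, so everything here is an assumption: the straight-line path between noise and data, the velocity target `x1 - x0`, concatenation along the channel axis, and the hypothetical helper name `flow_matching_example`. The paper may concatenate differently inside its DiT blocks.

```python
import random


def flow_matching_example(perf_latents, score_latents, t, rng):
    """Build one (model input, target velocity) pair for a time t in [0, 1].

    perf_latents:  performance frames, the data x1 (list of float vectors)
    score_latents: aligned deadpan-score frames, same number of frames
    """
    assert len(perf_latents) == len(score_latents)
    model_input, target = [], []
    for x1, cond in zip(perf_latents, score_latents):
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]                # Gaussian noise sample
        xt = [(1 - t) * n + t * d for n, d in zip(x0, x1)]    # point on the straight path
        model_input.append(xt + cond)                         # channel-wise concat (assumed)
        target.append([d - n for n, d in zip(x0, x1)])        # velocity of the straight path
    return model_input, target


rng = random.Random(0)
perf = [[1.0, 2.0], [3.0, 4.0]]   # toy performance latents, 2 frames x 2 channels
score = [[0.5], [0.7]]            # toy aligned score latents, 2 frames x 1 channel
inputs, velocities = flow_matching_example(perf, score, t=0.3, rng=rng)
print(len(inputs[0]), len(velocities[0]))
```

A network trained this way would regress `velocities` from `inputs` and `t`. Because the score frames are already aligned to the performance frames, the per-frame concatenation needs no further time matching.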

Similar Articles

PianoCoRe: Combined and Refined Piano MIDI Dataset

Coffee Piano

@danshipper: codex teaches me to play piano:

@iScienceLuvr: Music-JEPA: Learning a World Model of Sound from Action "we propose to learn a world model of piano sound using JEPA by…

Wan-Dancer: A Hierarchical Framework for Minute-scale Coherent Music-to-Dance Generation
