PianoKontext: Expressive Performance Rendering from Deadpan Context

Hugging Face Daily Papers

Summary

PianoKontext generates variable-length expressive piano performances from deadpan MIDI scores by aligning audio and MIDI in latent space using Dynamic Time Warping and flow matching with DiT blocks.
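The summary names flow matching as the training framework. As a rough illustration of the idea, the sketch below builds one training example for the common linear-path variant: a noise sample is interpolated toward a target latent, and the model regresses the constant velocity between them. The linear path, the function names, and the toy `velocity_model` (a stand-in for the DiT, conditioned here by concatenating along the feature axis) are all assumptions made for illustration; the source does not state which formulation or concatenation axis PianoKontext uses.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Linear-path interpolant x_t and its target velocity (assumed variant)."""
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight line from x0 to x1
    v_target = x1 - x0              # velocity of that line, constant in t
    return x_t, v_target

def flow_matching_loss(velocity_model, x1, cond, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    x0 = rng.standard_normal(x1.shape)   # noise endpoint
    t = rng.uniform()                    # random time in [0, 1)
    x_t, v_target = flow_matching_pair(x0, x1, t)
    v_pred = velocity_model(x_t, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))

# Hypothetical stand-in for the DiT: a fixed linear map applied to the
# noisy latents concatenated with the conditioning latents.
rng = np.random.default_rng(0)
frames, dim = 7, 4
weights = rng.standard_normal((2 * dim, dim)) * 0.1

def velocity_model(x_t, t, cond):
    features = np.concatenate([x_t, cond], axis=-1)  # (frames, 2 * dim)
    return features @ weights                        # (frames, dim)

perf_latents = rng.standard_normal((frames, dim))    # toy performance latents
score_latents = rng.standard_normal((frames, dim))   # toy aligned score latents
loss = flow_matching_loss(velocity_model, perf_latents, score_latents, rng)
```

At sampling time, a model trained this way would be integrated from noise at t = 0 to a performance latent at t = 1 with an ODE solver. That step is omitted here.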

Expressive performance rendering (EPR) aims to generate realistic performances conditioned on sequences of notes. However, flow matching audio editing models manipulate only synchronized music samples of the same duration, which limits their understanding of expressive timing. We introduce PianoKontext, a flow matching rendering model for classical piano music that generates variable-length performances in the latent space of a pretrained Music2Latent model. We synthesize MIDI scores into deadpan audio and employ Dynamic Time Warping (DTW) in the latent space to construct paired data for training. The aligned embeddings are concatenated in DiT blocks, allowing simple and effective learning of the dependencies between the score and the performance. Audio samples are available at our demo page: https://realfolkcode.github.io/pianokontext_demo/.
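The DTW pairing step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: it runs textbook DTW with Euclidean frame distances on two latent sequences of different lengths, then resamples the score latents onto the performance timeline so both have the same number of frames. The function names, the distance metric, and the choice to keep the last matched score frame per performance frame are assumptions; random-free toy arrays stand in for Music2Latent embeddings.

```python
import numpy as np

def dtw_path(score_latents, perf_latents):
    """Return the optimal DTW path as a list of (score_idx, perf_idx) pairs."""
    n, m = len(score_latents), len(perf_latents)
    # Pairwise Euclidean distance between every score and performance frame.
    diff = score_latents[:, None, :] - perf_latents[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)
    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]

def warp_score_to_performance(score_latents, perf_latents):
    """Resample score latents so they have one frame per performance frame."""
    path = dtw_path(score_latents, perf_latents)
    index = np.zeros(len(perf_latents), dtype=int)
    for score_idx, perf_idx in path:
        index[perf_idx] = score_idx  # later pairs overwrite earlier ones
    return score_latents[index]

# Toy example: a 4-frame "deadpan" sequence and a 7-frame time-stretched
# "performance" of the same content (1-dimensional latents for readability).
score = np.arange(4, dtype=float)[:, None]
perf = np.array([0, 0, 1, 1, 2, 3, 3], dtype=float)[:, None]
path = dtw_path(score, perf)
aligned_score = warp_score_to_performance(score, perf)
```

After warping, `aligned_score` and `perf` share a frame count, which is what makes frame-wise pairing of score and performance embeddings possible even though the original sequences differ in duration.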



Source: https://huggingface.co/papers/2606.12282

Published on Jun 10

Submitted by Dmitry (https://huggingface.co/realfolkcode) on Jun 12




