MuseNet

OpenAI Blog

Summary

OpenAI released MuseNet, a deep neural network based on the GPT-2 architecture that generates 4-minute musical compositions with 10 instruments by learning patterns from hundreds of thousands of MIDI files. The model can combine multiple musical styles and blend them in novel ways.

We’ve created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. MuseNet was not explicitly programmed with our understanding of music, but instead discovered patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technology as GPT-2, a large-scale transformer model trained to predict the next token in a sequence, whether audio or text.

Cached at: 04/20/26, 02:55 PM

# MuseNet

Source: [https://openai.com/index/musenet/](https://openai.com/index/musenet/)

We’ve created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. MuseNet was not explicitly programmed with our understanding of music, but instead discovered patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technology as [GPT-2](https://openai.com/index/better-language-models/), a large-scale [transformer](https://arxiv.org/abs/1706.03762) model trained to predict the next token in a sequence, whether audio or text.

Since MuseNet knows many different styles, we can blend generations in novel ways.[A](https://openai.com/index/musenet/#citation-bottom-A) Here the model is given the first 6 notes of a Chopin Nocturne, but is asked to generate a piece in a pop style with piano, drums, bass, and guitar. The model manages to blend the two styles convincingly, with the full band joining in at around the 30-second mark.

We collected training data for MuseNet from many different sources. [ClassicalArchives](https://www.classicalarchives.com/) and [BitMidi](https://bitmidi.com/) donated their large collections of MIDI files for this project, and we also found several collections online, including jazz, pop, African, Indian, and Arabic styles. Additionally, we used the [MAESTRO dataset](https://arxiv.org/abs/1810.12247).

The transformer is trained on sequential data: given a set of notes, we ask it to predict the upcoming note. We experimented with several different ways to encode the MIDI files into tokens suitable for this task. First, we tried a chordwise approach that considered every combination of notes sounding at one time as an individual “chord” and assigned a token to each chord. Second, we tried condensing the musical patterns by focusing only on the starts of notes, and tried further compressing that using a byte pair encoding scheme. We also tried two different methods of marking the passage of time: either tokens scaled according to the piece’s tempo (so that the tokens represented a musical beat or fraction of a beat), or tokens that marked absolute time in seconds. We landed on an encoding that combines expressivity with conciseness: combining the pitch, volume, and instrument information into a single token, as sketched in the example below.
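To make that final encoding concrete, here is a minimal Python sketch of packing pitch, volume, and instrument into a single token ID, with time-advance tokens kept in a separate vocabulary range. The bin counts, vocabulary layout, and function names (`encode_note`, `decode_note`, `encode_time_step`) are illustrative assumptions; OpenAI has not published MuseNet's exact tokenizer.

```python
# Sketch of a MuseNet-style event encoding (illustrative only).
# The bin counts, vocabulary layout, and function names are assumptions
# for demonstration; this is not OpenAI's actual implementation.

N_INSTRUMENTS = 10   # MuseNet compositions use 10 instruments
N_PITCHES = 128      # MIDI pitch range 0-127
N_VOLUME_BINS = 32   # quantized velocity buckets (assumed bin count)

def encode_note(instrument: int, pitch: int, velocity: int) -> int:
    """Pack instrument, pitch, and quantized volume into one token ID."""
    volume_bin = velocity * N_VOLUME_BINS // 128  # 0-127 -> 0-31
    return (instrument * N_PITCHES + pitch) * N_VOLUME_BINS + volume_bin

def decode_note(token: int) -> tuple[int, int, int]:
    """Invert encode_note, returning (instrument, pitch, volume_bin)."""
    token, volume_bin = divmod(token, N_VOLUME_BINS)
    instrument, pitch = divmod(token, N_PITCHES)
    return instrument, pitch, volume_bin

# Time-advance tokens occupy a separate range after the note tokens;
# the article describes both tempo-relative and absolute-time variants.
NOTE_VOCAB_SIZE = N_INSTRUMENTS * N_PITCHES * N_VOLUME_BINS

def encode_time_step(step: int) -> int:
    """Map a time-advance step (beat fraction or seconds bucket) to a token."""
    return NOTE_VOCAB_SIZE + step

# Example: a moderately loud middle C (pitch 60) on instrument 3:
# encode_note(3, 60, 100) -> 14233; decode_note(14233) -> (3, 60, 25)
```

Packing all three fields into one token keeps sequences short, one token per note event, which is the conciseness the article mentions, while still letting the model distinguish, say, a loud piano middle C from a quiet guitar one.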

Similar Articles

Music AI Sandbox, now with new features and broader access

Google DeepMind Blog

Google DeepMind expands Music AI Sandbox with new features, including the Lyria 2 music generation model, and broader access for musicians in the U.S., enabling AI-assisted music creation through tools for generating, extending, and editing musical content.

Jukebox

OpenAI Blog

OpenAI's Jukebox is a generative model that produces music as raw audio, including vocals and instruments, using a VQ-VAE for compression and hierarchical Sparse Transformer priors to handle long-range musical structure. It represents a significant step beyond symbolic music generation by operating directly in the raw audio domain.

GPT-4

OpenAI Blog

OpenAI releases GPT-4, a large multimodal model that accepts image and text inputs and demonstrates human-level performance on professional and academic benchmarks, significantly outperforming GPT-3.5 across various evaluation metrics.

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

Hugging Face Daily Papers

ArtifactNet is a lightweight neural network framework that detects AI-generated music by analyzing codec-specific artifacts in audio signals, achieving F1=0.9829 on a new 6,183-track benchmark (ArtifactBench) with 49x fewer parameters than competing methods. The approach uses forensic physics principles to extract codec residuals through a bounded-mask UNet and compact CNN, with codec-aware training reducing cross-codec drift by 83%.