The Pop-K MIDI Dataset is an open collection of modern pop melodies developed for training and testing symbolic music models in a constrained musical domain. The dataset contains 305,815 files augmented from a base dataset of 8-bar vocal lead, chords, and bass melody tracks. An accompanying model trained on this dataset can be found on GitHub.
The dataset was created to evaluate how limited training data can be scaled through augmentation to efficiently train a model that generates a specific musical style. All melodies were transposed to C major or A minor, and timing information was normalized to 120 BPM at a 96-tick resolution. The full collection amounts to approximately 1,360 hours of musical notation.
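The stated total duration can be sanity-checked from the figures above. The following is a minimal sketch, not the authors' own calculation: it assumes every file is exactly 8 bars long, that the meter is 4/4 (the description does not state the time signature), and that the 96-tick resolution means 96 ticks per quarter note.

```python
# Figures taken from the dataset description.
FILES = 305_815
BARS_PER_FILE = 8
BPM = 120
TICKS_PER_BEAT = 96  # assumption: resolution is ticks per quarter note

# Assumption: 4/4 time, which the description does not state explicitly.
BEATS_PER_BAR = 4


def ticks_to_seconds(ticks, bpm=BPM, ticks_per_beat=TICKS_PER_BEAT):
    """Convert a tick count to seconds at a fixed tempo."""
    beats = ticks / ticks_per_beat
    return beats * 60.0 / bpm


ticks_per_file = BARS_PER_FILE * BEATS_PER_BAR * TICKS_PER_BEAT
seconds_per_file = ticks_to_seconds(ticks_per_file)
total_hours = FILES * seconds_per_file / 3600

print(f"{ticks_per_file} ticks per file, {seconds_per_file:.0f} s per file")
print(f"about {total_hours:.0f} hours in total")
```

Under these assumptions each file lasts 16 seconds, and the total comes out just under 1,360 hours, which agrees with the approximate figure quoted above.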
The Pop-K MIDI Dataset is licensed under the Creative Commons Attribution-NonCommercial (CC BY-NC) license. While efforts have been made to augment and transform the original melodies, some segments may still resemble the source material.
See the examples folder to preview MIDI and MP3 demos.
Direct download (56.2 MB): popk_dataset_300k_mid.tar.gz
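After downloading, the normalized resolution can be spot-checked without any third-party MIDI library, because the time division is stored in the 14-byte header chunk of every Standard MIDI File. The sketch below uses only the Python standard library. The helper names `read_midi_header` and `find_unexpected_resolutions` are hypothetical, and the code assumes the archive contains files ending in `.mid` or `.midi`; the internal folder layout is not described here, so the scan simply walks every member.

```python
import struct
import tarfile


def read_midi_header(data: bytes):
    """Parse the MThd chunk and return (format, track_count, ticks_per_quarter)."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Big-endian: 4-byte chunk length, then three 16-bit fields.
    _length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    if division & 0x8000:
        raise ValueError("SMPTE time division, not ticks per quarter note")
    return fmt, ntracks, division


def find_unexpected_resolutions(archive_path, expected=96):
    """Return (name, resolution) pairs for MIDI files whose resolution differs."""
    mismatches = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Assumption: dataset files use a .mid or .midi extension.
            if not member.isfile():
                continue
            if not member.name.lower().endswith((".mid", ".midi")):
                continue
            head = tar.extractfile(member).read(14)
            _fmt, _ntracks, division = read_midi_header(head)
            if division != expected:
                mismatches.append((member.name, division))
    return mismatches
```

Calling `find_unexpected_resolutions("popk_dataset_300k_mid.tar.gz")` should return an empty list if every file uses the 96-tick resolution described above. Note that the 120 BPM tempo is stored in a track-level meta event rather than the header, so this check covers resolution only.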
If you use this dataset in a research or development project, please cite the following reference:
@misc{popk_midi_dataset,
  publisher = {Patchbanks},
  title = {Pop-K: Augmented MIDI Dataset for Learning Constrained Modern Pop Melodies},
  year = {2025},
  doi = {10.5281/zenodo.14791511},
  url = {https://doi.org/10.5281/zenodo.14791511},
}
For any questions or feedback, please contact us.