Latent diffusion-based approach for generating sequences of MIDI music from scratch. Paper + code.

hirenkumawat/latent-diffusion-MIDI

 
 


Latent Diffusion MIDI

This is a fork of the Latent Diffusion repo, adapted to generate sequences of MIDI music from scratch. We take a dataset of MIDI files, convert them to images using midi2img, and train the model on those images so that it learns their distribution and can create similar MIDI music starting from pure Gaussian noise.
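
To make the MIDI-to-image step concrete, here is a minimal sketch of a piano-roll conversion. It is not the midi2img implementation: the function names, the `(pitch, start_step, end_step)` note format, and the 128-row image layout are all assumptions chosen for illustration.

```python
import numpy as np


def notes_to_piano_roll(notes, n_steps, n_pitches=128):
    """Rasterise (pitch, start_step, end_step) tuples into a grayscale image.

    Hypothetical helper: rows are MIDI pitches, columns are time steps,
    and a pixel is 255 while the note is sounding, 0 otherwise.
    """
    roll = np.zeros((n_pitches, n_steps), dtype=np.uint8)
    for pitch, start, end in notes:
        roll[pitch, start:end] = 255
    return roll


def piano_roll_to_notes(roll, threshold=127):
    """Recover (pitch, start_step, end_step) tuples from a piano-roll image.

    Thresholding is needed because a generated image is continuous-valued,
    while MIDI note events are discrete.
    """
    notes = []
    for pitch in range(roll.shape[0]):
        active = roll[pitch] > threshold
        # Pad with False so runs touching either image edge are detected.
        padded = np.concatenate(([False], active, [False]))
        changes = np.flatnonzero(padded[1:] != padded[:-1])
        # Changes alternate between run starts and exclusive run ends.
        for start, end in zip(changes[::2], changes[1::2]):
            notes.append((pitch, int(start), int(end)))
    return notes
```

Calling `notes_to_piano_roll([(60, 0, 4), (67, 2, 6)], n_steps=8)` yields a 128 x 8 image, and `piano_roll_to_notes` inverts it. One limitation of this simple encoding is that two back-to-back notes on the same pitch merge into a single longer note.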

The full paper describing our methods, experiments, and results is linked here.


Our Approach and Motivation

We observed that the main challenge in applying diffusion models to this problem space is the lack of continuity in the MIDI domain. Prior works have addressed this by using a VAE to map from the discrete MIDI domain to a continuous latent space. Instead of a VAE, we believe it is more useful to employ an encoder-decoder architecture to solve the continuity issue. Latent diffusion has shown remarkable results in vision, which is attributed to the more semantic nature of the encoded domain. Our hypothesis is that a latent-diffusion approach will result in high-quality MIDI generation, and specifically that the encoder-decoder architecture will do a better job of finding a latent domain that is meaningful for the diffusion model. An encoder-decoder architecture is also more interpretable, since its underlying attention mechanisms can be used to further analyse intermediate outputs. Finally, using a non-VAE approach to access the latent space gives us a deterministic way to interact with that space, which we believe helps the model learn reconstruction during training.
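
The difference between a deterministic encoder and a VAE encoder can be sketched as follows. This is a toy illustration, not the model in this repo: the random linear maps `W_mu` and `W_logvar`, the dimensions, and the function names are assumptions standing in for trained networks.

```python
import numpy as np

# Toy linear "encoders" (hypothetical stand-ins for trained networks):
# a 16-dimensional input is mapped to a 4-dimensional latent.
_init_rng = np.random.default_rng(0)
W_mu = _init_rng.normal(size=(4, 16))
W_logvar = 0.1 * _init_rng.normal(size=(4, 16))


def deterministic_encode(x):
    """Plain encoder: the same input always maps to the same latent."""
    return W_mu @ x


def vae_encode(x, rng):
    """VAE-style encoder: the latent is sampled around a predicted mean.

    Uses the reparameterisation z = mu + sigma * eps with eps ~ N(0, I),
    so repeated calls on the same input give different latents.
    """
    mu = W_mu @ x
    logvar = W_logvar @ x
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

With the deterministic encoder, every training image corresponds to a single fixed point in the latent space, so the diffusion model and the decoder always see the same target for a given input. With the VAE-style encoder, that target varies from call to call.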


You can find the original Latent Diffusion repo and its complete documentation here.
