Skip to content

Commit d72add0

Browse files
Add some new audio papers - what a week!
1 parent 7b007e7 commit d72add0

File tree

1 file changed

+6
-0
lines changed

1 file changed

+6
-0
lines changed

unit4/README.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -98,9 +98,15 @@ _A spectrogram generated with Riffusion ([image source](https://www.riffusion.co
9898

9999
While there has been some work on generating audio directly using diffusion models (e.g. [DiffWave](https://arxiv.org/abs/2009.09761)) the most successful approach so far has been to convert the audio signal into something called a spectrogram, which effectively 'encodes' the audio as a 2D "image" which can then be used to train the kinds of diffusion models we're used to using for image generation. The resulting generated spectrograms can then be converted into audio using existing methods. This approach is behind the recently-released Riffusion, which fine-tuned Stable Diffusion to generate spectrograms conditioned on text - [try it out here](https://www.riffusion.com/).
100100

101+
The field of audio generation is moving extremely quickly. Over the past week (at the time of writing) there have been at least 5 new advances announced, which are marked with a star in the list below:
102+
101103
Key references:
102104
- [DiffWave: A Versatile Diffusion Model for Audio Synthesis](https://arxiv.org/abs/2009.09761)
103105
- ['Riffusion'](https://www.riffusion.com/about) (and [code](https://github.com/riffusion/riffusion))
106+
- *[MusicLM](https://google-research.github.io/seanet/musiclm/examples/) by Google generates consistent audio from text and can be conditioned with hummed or whistled melodies
107+
- *[RAVE2](https://github.com/acids-ircam/RAVE) - a new version of a Variational Auto-Encoder that will be useful for latent diffusion on audio tasks. This is used in the soon-to-be-announced *[AudioLDM](https://twitter.com/LiuHaohe/status/1619119637660327936?s=20&t=jMkPWBFuAH19HI9m5Sklmg) model
108+
- *[Noise2Music](https://noise2music.github.io/) - A diffusion model trained to produce high-quality 30-second clips of audio based on text descriptions
109+
- *[Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models](https://text-to-audio.github.io/) - a diffusion model trained to generate diverse sounds based on text
104110

105111
## New Architectures and Approaches - Towards 'Iterative Refinement'
106112

0 commit comments

Comments
 (0)