
A Fully Differentiable Model for Unsupervised Singing Voice Separation

Description

This is the source code for the experiments related to our ICASSP 2024 paper, A Fully Differentiable Model for Unsupervised Singing Voice Separation.

We propose to extend the work of Schulze-Forster et al.$^{1}$ and build a complete, fully differentiable model by integrating a multipitch estimator and a novel differentiable voice assignment module within the core model.

Note 1: This project builds upon the model of Schulze-Forster et al., and parts of the code are taken or adapted from their repository.

Note 2: The trained models of multif0-estimation-polyvocals$^{2}$ and voas-vocal-quartets$^{3}$ have been used in our experiments.

  1. K. Schulze-Forster, G. Richard, L. Kelley, C. S. J. Doire and R. Badeau, "Unsupervised Music Source Separation Using Differentiable Parametric Source Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp. 1-14, 2023

  2. H. Cuesta, B. McFee, and E. Gómez, “Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks”, in ISMIR, Montréal, Canada, 2020

  3. H. Cuesta and E. Gómez, “Voice Assignment in Vocal Quartets Using Deep Learning Models Based on Pitch Salience”, Transactions of the International Society for Music Information Retrieval, 2022

Links

📄 Paper

🔊 Audio examples

📁 CSD Database

📁 Cantoría Database

Installing the working environment

With conda

Create an environment using the environment.yml file:

conda env create -f environment.yml
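
Then activate the environment. The environment name is defined inside environment.yml; umss below is only an illustrative placeholder:

conda activate umss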

Training

To start training, run the train.py or train_u_nets.py script:

python train.py -c config.txt
python train_u_nets.py -c unet_config.txt

Evaluation

To evaluate the model, run the eval.py script:

python eval.py --tag TAG --f0-from-mix --test-set CSD --show-progress

Note: TAG is the name of the model to evaluate (e.g., UMSS_4s_bcbq).
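
For instance, evaluating the UMSS_4s_bcbq model on the CSD test set with F0s estimated from the mixture would look like this (a sketch using only the flags shown above):

python eval.py --tag UMSS_4s_bcbq --f0-from-mix --test-set CSD --show-progress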

Inference

To separate the voices of a mixture, run the inference.py script:

python inference.py --audio_path AUDIO_PATH --tag TAG --mode MODE --output_dir OUTPUT_DIR --device DEVICE

with:

  • AUDIO_PATH: path to the mixture audio file
  • TAG: name of the model to use (one of our trained models; default is W-Up_bcbq)
  • MODE: how to save the audio files (either segmented_audio or full_audio; default is segmented_audio)
  • OUTPUT_DIR: path where the separated voices will be saved (default is ./output)
  • DEVICE: device to use (cpu or cuda; default is cpu)

Note: Except for AUDIO_PATH, all other arguments are optional and have default values.
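
As an example, assuming a mixture file at ./audio/mix.wav (an illustrative path), separation on GPU with the default model could be run as:

python inference.py --audio_path ./audio/mix.wav --mode full_audio --device cuda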

Trained models

The trained models used in our experiments are available here.
