
A Fully Differentiable Model for Unsupervised Singing Voice Separation

Description

This is the source code for the experiments related to our ICASSP 2024 paper, A Fully Differentiable Model for Unsupervised Singing Voice Separation.

We propose to extend the work of Schulze-Forster et al.$^{1}$ and build a complete, fully differentiable model by integrating a multipitch estimator and a novel differentiable voice assignment module within the core model.

Note 1: This project builds upon the model of Schulze-Forster et al., and parts of the code are taken or adapted from their repository.

Note 2: The trained models of multif0-estimation-polyvocals$^{2}$ and voas-vocal-quartets$^{3}$ have been used in our experiments.

  1. K. Schulze-Forster, G. Richard, L. Kelley, C. S. J. Doire and R. Badeau, "Unsupervised Music Source Separation Using Differentiable Parametric Source Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp. 1-14, 2023

  2. H. Cuesta, B. McFee, and E. Gómez, “Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks”, in ISMIR, Montréal, Canada, 2020

  3. H. Cuesta and E. Gómez, “Voice Assignment in Vocal Quartets Using Deep Learning Models Based on Pitch Salience”, Transactions of the International Society for Music Information Retrieval, 2022

Links

📄 Paper

🔊 Audio examples

📁 CSD Database

📁 Cantoría Database

Installing the working environment

With conda

Create an environment using the environment.yml file:

conda env create -f environment.yml
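
Then activate the environment. The environment name is defined inside environment.yml; umss below is only an illustrative placeholder:

conda activate umss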

Training

To start training, run the train.py or train_u_nets.py script:

python train.py -c config.txt
python train_u_nets.py -c unet_config.txt

Evaluation

To evaluate the model, run the eval.py script:

python eval.py --tag TAG --f0-from-mix --test-set CSD --show-progress

Note: TAG is the name of the model to evaluate (e.g., UMSS_4s_bcbq).
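
For instance, evaluating the UMSS_4s_bcbq model on the CSD test set with F0s estimated from the mixture would look like this (a sketch using only the flags shown above):

python eval.py --tag UMSS_4s_bcbq --f0-from-mix --test-set CSD --show-progress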

Inference

To separate the voices of a mixture, run the inference.py script:

python inference.py --audio_path AUDIO_PATH --tag TAG --mode MODE --output_dir OUTPUT_DIR --device DEVICE

with:

  • AUDIO_PATH: path to the mixture audio file
  • TAG: name of the model to use (one of our trained models; default is W-Up_bcbq)
  • MODE: how to save the audio files (either segmented_audio or full_audio; default is segmented_audio)
  • OUTPUT_DIR: path where the separated voices will be saved (default is ./output)
  • DEVICE: device to use (cpu or cuda; default is cpu)

Note: Except for AUDIO_PATH, all other arguments are optional and have default values.
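
As an example, assuming a mixture file at ./audio/mix.wav (an illustrative path), separation on GPU with the default model could be run as:

python inference.py --audio_path ./audio/mix.wav --mode full_audio --device cuda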

Trained models

The trained models used in our experiments are available here.
