This is the source code for the experiments related to our ICASSP 2024 paper, A Fully Differentiable Model for Unsupervised Singing Voice Separation.
We propose to extend the work of Schulze-Forster et al.
Note 1: This project builds upon the model of Schulze-Forster et al., and parts of the code are taken or adapted from their repository.
Note 2: The trained models of multif0-estimation-polyvocals are used for multiple F0 estimation.
- K. Schulze-Forster, G. Richard, L. Kelley, C. S. J. Doire, and R. Badeau, "Unsupervised Music Source Separation Using Differentiable Parametric Source Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp. 1-14, 2023.
- H. Cuesta, B. McFee, and E. Gómez, "Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks," in ISMIR, Montréal, Canada, 2020.
- H. Cuesta and E. Gómez, "Voice Assignment in Vocal Quartets Using Deep Learning Models Based on Pitch Salience," Transactions of the International Society for Music Information Retrieval, 2022.
📁 CSD Database | Cantoría Database
Create an environment using the `environment.yml` file:
conda env create -f environment.yml
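Once the environment is created, activate it before running any of the scripts below. The environment name used here, `umss`, is only an assumption; use the name declared in `environment.yml`:

```bash
# Activate the conda environment created from environment.yml.
# "umss" is a placeholder; replace it with the name field of environment.yml.
conda activate umss
```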
To start the training, run the `train.py` or `train_u_nets.py` script:
python train.py -c config.txt
python train_u_nets.py -c unet_config.txt
To evaluate the model, run the `eval.py` script:
python eval.py --tag TAG --f0-from-mix --test-set CSD --show-progress
Note: `TAG` is the evaluated model's name (example: `UMSS_4s_bcbq`).
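For example, evaluating the `UMSS_4s_bcbq` model on the CSD test set, with the F0s estimated from the mixture:

```bash
python eval.py --tag UMSS_4s_bcbq --f0-from-mix --test-set CSD --show-progress
```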
To separate the voices of a mixture, run the `inference.py` script:
python inference.py --audio_path AUDIO_PATH --tag TAG --mode MODE --output_dir OUTPUT_DIR --device DEVICE
with:
- `AUDIO_PATH`: path to the mixture audio file
- `TAG`: name of the model to use, one of our trained models (default: `W-Up_bcbq`)
- `MODE`: how to save the audio files, either `segmented_audio` or `full_audio` (default: `segmented_audio`)
- `OUTPUT_DIR`: path where the separated voices will be saved (default: `./output`)
- `DEVICE`: device to use, either `cpu` or `cuda` (default: `cpu`)
Note: Except for `AUDIO_PATH`, all other arguments are optional and have default values.
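For example, assuming a mixture file `mixture.wav` in the current directory (the file name is only illustrative), the following call uses the default model and settings:

```bash
# Separate a mixture with the default model (W-Up_bcbq) on CPU;
# the separated voices are written to ./output as segmented audio files.
python inference.py --audio_path ./mixture.wav --tag W-Up_bcbq --mode segmented_audio --output_dir ./output --device cpu
```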
The trained models used in our experiments are available here.