The code in this repo aims to help reproduce the results in the work:
Jordi Pons, Rong Gong, and Xavier Serra. 2017. Score-informed Syllable Segmentation for A Cappella Singing Voice with Convolutional Neural Networks. In 18th International Society for Music Information Retrieval Conference. Suzhou, China.
This paper introduces a new score-informed method for the segmentation of jingju a cappella singing voice into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and its score. In the paper, we first examine the structure of jingju syllables and propose a definition of the term “syllable onset”. Then, we identify the challenges that jingju a cappella singing poses. We propose using a score-informed Viterbi algorithm, instead of thresholding the onset function, because the available musical knowledge can be used to inform the Viterbi algorithm in order to overcome the identified challenges. In addition, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that efficiently captures different time-frequency scales for estimating syllable onsets. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions.
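To make the idea of score-informed decoding more concrete, here is a minimal sketch of a Viterbi-style dynamic program that picks syllable boundaries from an ODF using relative syllable durations taken from a score. It is not the implementation in this repository: the function name `decode_boundaries`, the Gaussian duration prior, the `sigma_ratio` value and the assumption that the first syllable starts at frame 0 are all illustrative choices.

```python
import numpy as np


def decode_boundaries(odf, score_durations, sigma_ratio=0.35, eps=1e-6):
    """Return the interior syllable boundaries (frame indices).

    odf             : 1-D array, onset detection function per frame.
    score_durations : relative syllable durations read from the score.
    sigma_ratio     : hypothetical tolerance of the duration prior.
    """
    odf = np.asarray(odf, dtype=float)
    n_frames = len(odf)
    rel = np.asarray(score_durations, dtype=float)
    mu = rel / rel.sum() * n_frames      # expected syllable lengths in frames
    sigma = sigma_ratio * mu
    n_syl = len(mu)

    def log_dur(length, k):
        # Unnormalised log Gaussian prior on the length of syllable k.
        return -0.5 * ((length - mu[k]) / sigma[k]) ** 2

    log_odf = np.log(odf + eps)
    # delta[k, t]: best score of any path in which syllable k starts at frame t.
    delta = np.full((n_syl, n_frames), -np.inf)
    back = np.zeros((n_syl, n_frames), dtype=int)
    delta[0, 0] = 0.0                    # assumption: first onset at frame 0
    for k in range(1, n_syl):
        for t in range(k, n_frames):
            prev = np.arange(k - 1, t)   # candidate starts of syllable k - 1
            cand = delta[k - 1, prev] + log_dur(t - prev, k - 1)
            best = int(np.argmax(cand))
            delta[k, t] = cand[best] + log_odf[t]
            back[k, t] = prev[best]

    # The last syllable has to end at the end of the recording.
    starts = np.arange(n_syl - 1, n_frames)
    final = delta[n_syl - 1, starts] + log_dur(n_frames - starts, n_syl - 1)
    t = int(starts[int(np.argmax(final))])

    # Backtrack from the start of the last syllable to the start of the first.
    path = [t]
    for k in range(n_syl - 1, 0, -1):
        t = int(back[k, t])
        path.append(t)
    return path[::-1][1:]                # drop the fixed start at frame 0


# Toy ODF with two true onsets (28, 72) and one spurious peak (50).
toy_odf = np.zeros(100)
toy_odf[[28, 50, 72]] = 1.0
print(decode_boundaries(toy_odf, [3, 4, 3]))  # [28, 72]
```

In this toy case a plain threshold on the ODF would report three onsets, while the duration information from the score lets the decoder discard the spurious peak at frame 50.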
Steps to reproduce the experiment results:
- Clone this repository.
- Download the jingju a cappella singing dataset, scores and syllable boundary annotations from https://goo.gl/y0P7BL
- Change the `dataset_root_path` variable in `src/filePath.py` to point to the downloaded dataset.
- Install the Python dependencies from `requirements.txt`. Python 2.7.9 and Essentia 2.1-beta3 were used in the paper.
- Set the `mth_ODF`, `layer2`, `fusion` and `filter_shape` variables in `src/parameters.py`.
- Run `python onsetFunctionCalc.py` to produce the experiment results for the above parameter setting.
- Run `python eval_demo.py` to produce the evaluation result.
Steps to train the CNN models:
- Do steps 1, 2, 3 and 4 of the steps to reproduce the experiment results.
- Run `python trainingSampleCollection.py` to calculate the mel-band features.
- The CNN model training code is located in the `localDLScripts` folder. Use the scripts that match your computing configuration (CPU or GPU).
- Pre-trained models are located in the `cnnModels` folder.
Dependencies: numpy, scipy, matplotlib, essentia, scikit-learn, cython, keras, theano, hyperopt
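Before running the scripts, it can help to check which of the dependencies listed above are importable in the current environment. The following is a small sketch, not part of this repository: the helper name `missing_packages` is hypothetical, and the import names (`sklearn` for scikit-learn, `Cython` for cython) are the usual module names for those packages rather than something stated in this README.

```python
# Maps each package name to the module name it is normally imported as
# (an assumption for scikit-learn and cython, whose import names differ).
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "essentia": "essentia",
    "scikit-learn": "sklearn",
    "cython": "Cython",
    "keras": "keras",
    "theano": "theano",
    "hyperopt": "hyperopt",
}


def missing_packages(required):
    """Return the package names whose module cannot be imported, sorted."""
    missing = []
    for package, module in sorted(required.items()):
        try:
            __import__(module)
        except ImportError:
            missing.append(package)
    return missing
```

Calling `missing_packages(REQUIRED)` returns the names that still need to be installed, and an empty list when everything is importable. The check only tests importability. It does not verify versions, so it cannot confirm that Essentia 2.1-beta3 is the installed release.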
License: GNU Affero General Public License version 3