PARC — The Polyphonic Audio to Roman Corpus

The Polyphonic Audio to Roman Corpus, or PARC, is a large-scale dataset for Roman Numeral Analysis (RNA) from polyphonic audio, built using HookTheory's TheoryTab Database. It pairs time-aligned RN annotations with real recordings (sourced from YouTube) and is intended primarily for RNA research, though it can also be extended to chord estimation, sequence modeling, and other MIR tasks.

PARC contains over 11,000 songs segmented into TheoryTabs (sections), spanning 33 genres, and organized into three stratified split levels: theorytab, song and artist. It includes 1,000+ unique RN labels and over 7 million annotations, sampled at a 1/32-beat resolution.

The dataset is provided in JSON format, following the guidelines of Donahue et al. in their earlier work with the HookTheory database, and is accompanied by features pre-extracted using the NNLS-Chroma VAMP plugin.
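As a quick illustration, the file can be inspected with nothing more than the standard library. This is a minimal sketch: it assumes only that parc.json is valid JSON placed at the path described in the setup below, and makes no assumption about its exact schema.

import json

# Load the main dataset file (path follows the setup instructions below).
with open("dataset/parc.json") as f:
    parc = json.load(f)

# Peek at the top-level structure without assuming a specific schema.
print(type(parc).__name__)
if isinstance(parc, dict):
    first_key = next(iter(parc))
    print(first_key, "->", list(parc[first_key])[:5])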

The notebooks/ folder contains helper notebooks, such as dataset_exploration.ipynb, for visualizing and exploring PARC's contents, structure, and features.

Download and setup

The dataset and features can be downloaded here. The same link also contains the model checkpoints used to report the metrics in the paper, which can be found under the checkpoints folder.

Once downloaded:

  1. Place the parc.json file and the segments folder inside the dataset/ folder at the root of this repository.
  2. Update the path variables in source/constants.py so that they point to your local copies (an illustration follows).
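For illustration only, the updated variables might look like the sketch below. All names other than AUDIOS_FILEPATH (which the feature-extraction note references) are hypothetical, so keep whatever names source/constants.py actually defines.

# source/constants.py (hypothetical variable names; keep the ones the file actually uses)
DATASET_FILEPATH = "/path/to/parc/dataset/parc.json"   # hypothetical name
SEGMENTS_DIRPATH = "/path/to/parc/dataset/segments"    # hypothetical name
AUDIOS_FILEPATH = "/path/to/your/audios.h5"            # referenced in the note below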

Copyright note: We cannot distribute the original audio files due to copyright and ethical restrictions. If you want to extract features from your own audio set, you can run:

python -m scripts.extract_vamp_features

To use the script, you'll need to update AUDIOS_FILEPATH in source/constants.py to point to your audio .h5 file. Your audio files must be organized by ID and sampled at 44100 Hz to be compatible.
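As a hedged sketch of building a compatible file with h5py, assuming one dataset per song ID holding mono samples at 44100 Hz (check scripts/extract_vamp_features.py for the exact layout it expects):

import h5py
import numpy as np

SAMPLE_RATE = 44100  # required sampling rate

# One dataset per song ID; three seconds of silence stand in for real audio.
audio_by_id = {
    "song_0001": np.zeros(SAMPLE_RATE * 3, dtype=np.float32),
}

with h5py.File("audios.h5", "w") as h5:
    for song_id, samples in audio_by_id.items():
        h5.create_dataset(song_id, data=samples)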


Quick start

# 1) clone this repo
git clone https://github.com/uai-ufmg/parc.git
cd parc

# 2) create a venv (recommended)
python -m venv .venv
source .venv/bin/activate

# 3) install requirements
pip install -r requirements.txt

We recommend Python 3.10 and a GPU for faster training and evaluation.

Training and evaluation

The repository provides two main entry points for model experiments (the YAML configuration file is explained in a later section).

# To train your models
python -m scripts.train --yaml path/to/config.yaml

# To evaluate your models
python -m scripts.evaluate --yaml path/to/config.yaml

Besides the --yaml command-line argument, the evaluation script also accepts five other optional arguments:

  • --n_bootstraps: Number of bootstrap iterations for confidence intervals (default is 300).
  • --ci: Confidence interval for bootstrap (default is 0.95).
  • --run_genre_eval: Flag to also run evaluation per genre, reporting the metrics for each genre of the dataset.
  • --run_complexity_eval: Flag to also run evaluation per complexity, reporting the metrics for each complexity of the dataset.
  • --checkpoint: Specifies a checkpoint file for the evaluation procedure. If omitted, the evaluation script searches for a best_model.ckpt checkpoint file in the directory given by the output_dir key of the YAML configuration file.

Here is an example of a more customized evaluation:

# Running evaluation with fewer bootstrap iterations and also per genre
python -m scripts.evaluate --yaml path/to/config.yaml --n_bootstraps 100 --run_genre_eval
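For context, --n_bootstraps and --ci refer to the standard percentile bootstrap. The sketch below shows the general technique, not the repository's implementation, and per_item_scores is a stand-in for whatever per-example metric the script aggregates.

import numpy as np

def bootstrap_ci(per_item_scores, n_bootstraps=300, ci=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean score."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_item_scores)
    # Resample with replacement and record the mean of each resample.
    means = np.array([
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_bootstraps)
    ])
    alpha = (1.0 - ci) / 2.0
    low, high = np.quantile(means, [alpha, 1.0 - alpha])
    return scores.mean(), (low, high)

# Example: a 95% CI over 100 resamples of 500 dummy scores.
mean, (low, high) = bootstrap_ci(np.random.rand(500), n_bootstraps=100)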

YAML configuration file

As mentioned in the previous section, both training and evaluation are driven by a .yaml configuration file located in the configs/ folder, where an example configuration file is also provided. Each key is explained below, followed by an illustrative sketch:

  • model: Which model to use. Available options are: Frog, AudioAugmentedNet, BiGRU and NaiveBaseline.

  • num_epochs: The number of epochs to train the model.

  • output_dir: Where to save the experiment outputs, such as training and evaluation results (stored in .csv format) and the best model checkpoint.

  • data: Data-related settings, with the following keys:

    • split_level: Which split level to use. Available options are theorytab, song and artist.
    • use_semitone_spectrum: Boolean indicating whether to use the Semitone Spectrum audio feature (True) or the NNLS-Chroma Chroma+Bass features (False).
  • dataloader: Settings for PyTorch's data loading, with the following keys:

    • num_workers: Number of workers used to load the data.
    • batch_size: Batch size to train the model.
  • optimizer: Settings for PyTorch's optimizer, in this case Adam, with the following keys:

    • lr: The learning rate to use.
    • weight_decay: The weight decay to use.
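Putting the keys together, here is an illustrative configuration built and written from Python with PyYAML. Every value below is a placeholder, so refer to the example file in configs/ for realistic settings.

import yaml  # PyYAML

# Illustrative configuration mirroring the keys above (placeholder values).
config = {
    "model": "BiGRU",
    "num_epochs": 50,
    "output_dir": "outputs/bigru_song_split",
    "data": {
        "split_level": "song",
        "use_semitone_spectrum": False,
    },
    "dataloader": {
        "num_workers": 4,
        "batch_size": 32,
    },
    "optimizer": {
        "lr": 1e-3,
        "weight_decay": 1e-5,
    },
}

with open("configs/my_config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)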

Notes:

  • The same YAML configuration file can be used in both training and evaluation procedures!
  • The keys under data, dataloader and optimizer can be extended to cover any argument of the TheoryTabDataset class (located in source/data.py), PyTorch's DataLoader, and the Adam optimizer, respectively. In particular, you can set wanted_genres and wanted_complexities under the data key to load only songs with that set of genres and complexities. For the full list of available genres and complexities, see source/constants.py.

Licensing considerations

Although all of the code in this repository is released under a permissive MIT license, the dataset and trained models are distributed under CC BY-NC-SA 3.0, as they are derived from user contributions on HookTheory.

For more information please check HookTheory's Terms of Service and the LICENSE file.
