The Polyphonic Audio To Roman Corpus, or PARC, is a large-scale dataset for Roman Numeral Analysis (RNA) from polyphonic audio, built using HookTheory's TheoryTab Database. It pairs time-aligned RN annotations with real recordings (sourced from YouTube) and is intended mainly for research in RNA but can also be extended to chord estimation, sequence modeling, and other MIR tasks.
PARC contains over 11,000 songs segmented into TheoryTabs (sections), spanning 33 genres, and organized into three stratified split levels: `theorytab`, `song`, and `artist`. It includes 1,000+ unique RN labels and over 7 million annotations, sampled at a 1/32-beat resolution.
The dataset is provided in a JSON format following the guidelines of Donahue et al. in their earlier work with the HookTheory database, and is accompanied by features pre-extracted with the NNLS-Chroma VAMP plugin.
Useful notebooks can be found in the `notebooks/` folder, e.g. `dataset_exploration.ipynb`, to help you visualize and explore PARC's contents, structure, and features.
The dataset and features can be downloaded here. The same link also hosts the model checkpoints used to report the metrics in the paper, which can be found under the `checkpoints` folder.
Once downloaded:

- Place the `parc.json` file and the `segments` folder inside the `dataset/` folder at the root of this repository.
- Update the path variables in `source/constants.py` so that they point to your local copies (see the sketch below).
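For illustration, the relevant entries in `source/constants.py` might look like the sketch below; apart from `AUDIOS_FILEPATH` (referenced later in this README), the variable names are hypothetical placeholders:

```python
# source/constants.py -- illustrative sketch only.
# Apart from AUDIOS_FILEPATH, these variable names are hypothetical
# placeholders; use the names actually defined in the file.
PARC_JSON_FILEPATH = "dataset/parc.json"  # main annotation file
SEGMENTS_DIRPATH = "dataset/segments"     # pre-extracted feature segments
AUDIOS_FILEPATH = "dataset/audios.h5"     # your own audio .h5 (optional)
```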
Copyright note: We cannot distribute the original audio files due to copyright and ethical restrictions. If you want to extract features from your own audio set, you can run:
```bash
python -m scripts.extract_vamp_features
```
To use the script, you'll need to update `AUDIOS_FILEPATH` in `source/constants.py` to point to your audio `.h5` file. Your audio files must be organized by ID and sampled at 44,100 Hz to be compatible.
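As a reference, here is a minimal sketch of building such a file with `h5py` and `librosa`, assuming one waveform dataset per song ID; the exact layout expected by the script is an assumption, so check `scripts/extract_vamp_features.py` for the actual schema:

```python
# Minimal sketch: one waveform dataset per song ID at 44,100 Hz.
# The exact layout expected by the script is an assumption.
import h5py
import librosa

SAMPLE_RATE = 44100

# Hypothetical mapping from song IDs to local audio files.
audio_paths = {
    "some_song_id": "audio/some_song_id.mp3",
}

with h5py.File("dataset/audios.h5", "w") as f:
    for song_id, path in audio_paths.items():
        # Load mono audio, resampling to 44,100 Hz for compatibility.
        waveform, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
        f.create_dataset(song_id, data=waveform)
```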
```bash
# 1) clone this repo
git clone https://github.com/uai-ufmg/parc.git
cd parc

# 2) create a venv (recommended)
python -m venv .venv
source .venv/bin/activate

# 3) install requirements
pip install -r requirements.txt
```
We recommend Python 3.10 and running on a GPU for faster training and evaluation.
The repository provides two main entry points for model experiments (the YAML configuration file is explained later):
```bash
# To train your models
python -m scripts.train --yaml path/to/config.yaml

# To evaluate your models
python -m scripts.evaluate --yaml path/to/config.yaml
```
Besides the `--yaml` command-line argument, the evaluation script also has five other optional arguments:

- `--n_bootstraps`: Number of bootstrap iterations for confidence intervals (default is 300).
- `--ci`: Confidence interval for bootstrap (default is 0.95).
- `--run_genre_eval`: Flag to also run evaluation per genre, reporting the metrics for each genre of the dataset.
- `--run_complexity_eval`: Flag to also run evaluation per complexity, reporting the metrics for each complexity of the dataset.
- `--checkpoint`: Use this to specify a checkpoint file for the evaluation procedure. If this is not used, the evaluation script will search for a `best_model.ckpt` checkpoint file in the `output_dir` key value of the YAML configuration file.
Here is an example of a more customized evaluation:
```bash
# Running evaluation with fewer bootstrap iterations and also per genre
python -m scripts.evaluate --yaml path/to/config.yaml --n_bootstraps 100 --run_genre_eval
```
As mentioned in the previous section, both training and evaluation are driven by a `.yaml` configuration file located in the `configs/` folder, where an example configuration file is provided. Each key is explained next:
- `model`: Which model to use. Available options are `Frog`, `AudioAugmentedNet`, `BiGRU`, and `NaiveBaseline`.
- `num_epochs`: The number of epochs to train the model.
- `output_dir`: Where to save the experiment outputs, such as training and evaluation results (stored in `.csv` format) and the best model checkpoint.
- `data`: The data-related information, with the following keys:
  - `split_level`: Which split level to use. Available options are `theorytab`, `song`, and `artist`.
  - `use_semitone_spectrum`: Boolean indicating whether to use the Semitone Spectrum audio feature (`True`) or the NNLS-Chroma Chroma+Bass features (`False`).
- `dataloader`: PyTorch's data-loading information, with the following keys:
  - `num_workers`: Number of workers to load the data.
  - `batch_size`: Batch size to train the model.
- `optimizer`: PyTorch's optimizer information (ADAM in this case), with the following keys:
  - `lr`: Which learning rate to use.
  - `weight_decay`: Which weight decay to use.
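Putting these keys together, a minimal configuration sketch might look like the following; the values are illustrative placeholders, not recommended settings (see the example file in `configs/` for reference):

```yaml
# Illustrative configuration sketch; values are placeholders,
# not recommended settings.
model: BiGRU
num_epochs: 50
output_dir: outputs/bigru_song_split

data:
  split_level: song
  use_semitone_spectrum: false

dataloader:
  num_workers: 4
  batch_size: 32

optimizer:
  lr: 0.001
  weight_decay: 0.0001
```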
Notes:

- The same YAML configuration file can be used in both the training and evaluation procedures!
- The keys under `data`, `dataloader`, and `optimizer` can actually be customized to cover all arguments of the `TheoryTabDataset` class (located at `source/data.py`) and of PyTorch's DataLoader and ADAM optimizer classes. In particular, you can set `wanted_genres` and `wanted_complexities` under the `data` key to load only songs with that set of genres and complexities, as sketched below. For a detailed list of the available genres and complexities, please take a look at the `source/constants.py` file.
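For instance, a `data` block restricted to a subset of genres and complexities might look like the following; the genre and complexity values shown here are hypothetical, and the valid ones are listed in `source/constants.py`:

```yaml
data:
  split_level: song
  use_semitone_spectrum: false
  wanted_genres: [Pop, Rock]          # hypothetical genre names
  wanted_complexities: [low, medium]  # hypothetical complexity values
```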
Although all of the code in this repository is released under a permissive MIT license, the dataset and trained models are distributed under CC BY-NC-SA 3.0, as they are derived from user contributions on HookTheory.
For more information, please check HookTheory's Terms of Service and the `LICENSE` file.