Extended Metadata for MGPHot (audio links and more!)

This repository is still under construction.

Purpose of this repository

This repository enables the community to use MGPHot in further research without redistributing restricted files. The license of the original dataset forbids redistribution of derivative files and does not provide audio. Therefore, this repository does not include gene_values or any audio.

Instead, you will:

- Reconstruct the three canonical indices locally.
- Verify each index with MD5 checksums.
- Collect the audio for each track from public sources and verify the files.

What we provide:

Get the data in two steps: python reconstruct.py and python download_audio.py.
data_preparation/: scripts to collect audio and build the indices.
evaluation_probes/: code to train lightweight models for evaluation.

Compliance note: do not upload reconstructed indices or audio to this repository or any online service. The goal is reproducible use of MGPHot while respecting the original license.

What you reconstruct

You will obtain three JSON index files:

- genome_index_split.json: regression on gene_values (continuous targets).
- genome_index_split_positive.json: positive music autotagging (binary tags from thresholds over gene_values).
- genome_index_split_negative.json: negative music autotagging (complement of the positive tags).

Each index already includes the train/validation/test split in the field split. MD5 files are used to guarantee that every index is canonical in content and formatting.
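To make the relation between the three tasks concrete, here is a minimal sketch of how continuous gene values could be turned into positive tags and their complement. The function names, the example gene names, and the single threshold of 0.5 are assumptions for illustration only; the actual rules live in genome_positive.py and genome_negative.py and may differ.

```python
from typing import Dict


def positive_tags(gene_values: Dict[str, float], threshold: float = 0.5) -> Dict[str, int]:
    """Return 1 for every gene whose value reaches the threshold, else 0.

    The threshold value is a placeholder, not the one used by the repository.
    """
    return {gene: int(value >= threshold) for gene, value in gene_values.items()}


def negative_tags(gene_values: Dict[str, float], threshold: float = 0.5) -> Dict[str, int]:
    """Return the complement of the positive tags (1 becomes 0 and 0 becomes 1)."""
    return {gene: 1 - tag for gene, tag in positive_tags(gene_values, threshold).items()}


# Hypothetical gene names and values for one track.
example = {"gene_a": 0.8, "gene_b": 0.2}
print(positive_tags(example))
print(negative_tags(example))
```

The regression index keeps the continuous values as targets, while the two autotagging indices would store binary labels of this kind.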

How to reconstruct

Run the reconstruction script.

python reconstruct.py

It will:

- download the Zenodo TSV with gene_values,
- rebuild the base index with gene_values,
- generate positive and negative indices,
- compare each output with its reference MD5,
- print a short report with dashed separators.

Outputs created (plus their .md5 files):

genome_index_split.json
genome_index_split_positive.json
genome_index_split_negative.json
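Once the indices exist, the split field can be used to partition the tracks. The sketch below counts tracks per split. It assumes the index is either a JSON object mapping track IDs to records or a JSON array of records, and that each record carries a split field; the exact layout and the split names shown in the toy data are assumptions, so adapt the access pattern to the real file.

```python
import json
from collections import Counter
from pathlib import Path


def split_counts(index) -> Counter:
    """Count records per split for a dict-of-records or list-of-records index."""
    records = index.values() if isinstance(index, dict) else index
    return Counter(record["split"] for record in records)


def load_index(path: str):
    """Load a reconstructed index from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Toy data with hypothetical track IDs and split names.
toy_index = {
    "track_1": {"split": "train"},
    "track_2": {"split": "train"},
    "track_3": {"split": "validation"},
    "track_4": {"split": "test"},
}
print(split_counts(toy_index))
```

With a real file, the call would look like split_counts(load_index("genome_index_split.json")).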

If an MD5 does not match, the script reports the mismatch clearly. A matching MD5 confirms a byte-for-byte identical file, including field order, indentation, and the trailing newline policy.
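If you want to repeat the check by hand, the following sketch computes the MD5 of a file and compares it with a reference digest. It assumes each reference lives in a file named after the index with an added .md5 suffix and that the file starts with the hex digest; both the location and the format of the reference files are assumptions, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(index_path, md5_path) -> bool:
    """Compare a file's digest with the first token stored in its .md5 file."""
    expected = Path(md5_path).read_text(encoding="utf-8").split()[0].lower()
    return md5_of_file(index_path) == expected


if __name__ == "__main__":
    for name in (
        "genome_index_split.json",
        "genome_index_split_positive.json",
        "genome_index_split_negative.json",
    ):
        index_path, md5_path = Path(name), Path(name + ".md5")
        if not (index_path.exists() and md5_path.exists()):
            print(f"{name}: skipped (file or reference missing)")
            continue
        status = "OK" if matches_reference(index_path, md5_path) else "MISMATCH"
        print(f"{name}: {status}")
```

Because the digest covers every byte, re-saving an index with a different indentation or without the final newline changes the MD5 even when the JSON content is logically the same.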

Download the audio

python download_audio.py

Repository layout

data_preparation/ — Clean and reliable process to obtain YouTube links and to build the indices.
download_audio/ — Scripts to download and verify all audio.
evaluation_probes/ — Training and evaluation code for the benchmark (regression and autotagging probes).
reconstruct.py — Rebuilds the three indices and verifies MD5 for each.
genome_positive.py / genome_negative.py — Convert gene_values to positive and negative tags.

Contribute

Audio download is semi-automatic. If you find a wrong or broken link, please open an issue.

Please include:

- Artist name: <Artist>
- Track title: <Title>
- Old YouTube URL: <https://www.youtube.com/watch?v=...>
- Old YouTube ID: <...>
- New YouTube URL: <https://www.youtube.com/watch?v=...>
- New YouTube ID: <...>
- Notes (optional): <...>

Citation

If you use this repository in research, please cite the paper:

and the original dataset:

License

Code and index definitions are released for research and non‑commercial use. See the LICENSE file for details.

