This script processes and harmonizes the MusicBrainz release.tar.xz
data dump. It extracts essential release information, including:
- Release titles and dates
- Artist names
- Track listings with durations
- Genres and subgenres (from tags)
- Labels and country of release
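As a rough illustration of the fields listed above, the following sketch pulls them out of one decoded release record. The key names (artist-credit, media, tracks, label-info, tags) and the millisecond track length follow the MusicBrainz JSON format as commonly documented, but they are assumptions here, and extract_essentials is a hypothetical helper rather than the script's own function.

```python
def extract_essentials(release):
    """Reduce one decoded release record to the fields listed above.

    Key names are assumed from the MusicBrainz JSON format; missing or
    null values are tolerated rather than treated as errors.
    """
    artists = [
        credit.get("name")
        for credit in release.get("artist-credit") or []
        if credit.get("name")
    ]
    tracks = [
        # "length" is assumed to be a duration in milliseconds (may be None)
        {"title": track.get("title"), "length_ms": track.get("length")}
        for medium in release.get("media") or []
        for track in medium.get("tracks") or []
    ]
    label_names = [
        (info.get("label") or {}).get("name")
        for info in release.get("label-info") or []
    ]
    genres = [
        tag.get("name")
        for tag in release.get("tags") or []
        if tag.get("name")
    ]
    return {
        "title": release.get("title"),
        "date": release.get("date"),
        "country": release.get("country"),
        "artists": artists,
        "tracks": tracks,
        "labels": [name for name in label_names if name],
        "genres": genres,
    }
```

Using `or []` instead of a default argument keeps the helper safe when a key is present but set to null.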
The script handles large datasets efficiently using multiprocessing and logs critical errors for review.
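One way such processing could be arranged is sketched below: the archive is read as a forward-only stream and lines are handed to a worker pool. This is not the script's actual code. It assumes the archive holds a member named mbdump/release with one JSON document per line, uses the standard json module in place of ujson to stay self-contained, and the function names are hypothetical.

```python
import json
import tarfile
from multiprocessing import Pool


def iter_release_lines(tar_path, member_name="mbdump/release"):
    """Yield non-empty raw lines from one archive member without unpacking to disk."""
    # Mode "r|xz" reads the compressed archive sequentially as a stream.
    with tarfile.open(tar_path, mode="r|xz") as tar:
        for member in tar:
            if member.name != member_name:  # assumed member name
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            for line in handle:
                if line.strip():
                    yield line
            return


def parse_line(line):
    """Worker: decode one line, returning (record, None) or (None, error text)."""
    try:
        return json.loads(line), None
    except ValueError as exc:  # covers JSON and Unicode decoding errors
        return None, str(exc)


def process_dump(tar_path, workers=4, chunksize=1000):
    """Parse every line in parallel and return (parsed_count, failed_count)."""
    parsed = failed = 0
    with Pool(processes=workers) as pool:
        results = pool.imap_unordered(
            parse_line, iter_release_lines(tar_path), chunksize=chunksize
        )
        for record, error in results:
            if error is None:
                parsed += 1
            else:
                failed += 1
    return parsed, failed
```

On platforms that start workers with spawn, process_dump should be called from under an `if __name__ == "__main__":` guard. The chunksize value is a tuning guess, not a measured setting.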
Ensure you have the required dependencies installed:
pip install ujson tqdm
Run the script with:
python prepare20M.py --tar-file path/to/release.tar.xz --output-dir output_directory
Replace path/to/release.tar.xz with the path to your MusicBrainz data file and output_directory with your desired output folder.
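For readers who want to see how the two options might be declared, here is a minimal argparse sketch. Only the flag names --tar-file and --output-dir come from this README. Marking both as required, converting them to Path objects, and the build_parser name are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Declare the two command-line options named in this README."""
    parser = argparse.ArgumentParser(
        description="Process the MusicBrainz release dump."
    )
    # required=True is an assumption; the real script may provide defaults.
    parser.add_argument("--tar-file", type=Path, required=True,
                        help="path to release.tar.xz")
    parser.add_argument("--output-dir", type=Path, required=True,
                        help="folder for processed output")
    return parser


# Parsing an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(
    ["--tar-file", "path/to/release.tar.xz", "--output-dir", "output_directory"]
)
print(args.tar_file, args.output_dir)
```

argparse turns the dashes in option names into underscores, so the values are available as args.tar_file and args.output_dir.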
The script produces:
- Processed JSON files containing essential release data.
- A log file, mb_processing_errors.log, capturing critical errors.
- A rejected_records folder with samples of records that were not processed.
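The log file and the rejected_records folder could be produced along the following lines. This is a sketch under assumptions: only the names mb_processing_errors.log and rejected_records come from this README, while the logger name, log level, sample file naming, and the cap on saved samples are hypothetical.

```python
import json
import logging
from pathlib import Path


def setup_error_log(output_dir, filename="mb_processing_errors.log"):
    """Return a logger that writes errors to a file inside output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("mb_processing")  # hypothetical logger name
    logger.setLevel(logging.ERROR)               # assumed threshold
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(out / filename, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def save_rejected_sample(output_dir, index, raw_line, reason, max_samples=100):
    """Write one rejected record to rejected_records/, up to max_samples files."""
    if index >= max_samples:
        return None  # keep only a sample, not every rejected record
    folder = Path(output_dir) / "rejected_records"
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"rejected_{index:05d}.json"  # assumed naming scheme
    target.write_text(
        json.dumps({"reason": reason, "raw": raw_line}, ensure_ascii=False),
        encoding="utf-8",
    )
    return target
```

Capping the number of saved samples keeps the folder small even when many records in a large dump are rejected.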
This project is licensed under the MIT License.