etsabary/musicbrainz_data_parser

MusicBrainz Data Parser

This script processes and harmonizes the MusicBrainz release.tar.xz data dump. It extracts essential release information, including:

  • Release titles and dates
  • Artist names
  • Track listings with durations
  • Genres and subgenres (from tags)
  • Labels and country of release
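As a rough illustration of the fields listed above, the sketch below flattens one release record into a smaller dictionary. It is not the script's actual code: the function name `harmonize_release` and the output keys are hypothetical, and the input keys (`artist-credit`, `media`, `tracks`, `length` in milliseconds, `label-info`, `tags`) are assumed to follow the MusicBrainz JSON release format. How the script splits tags into genres and subgenres is not described here, so the sketch only collects tag names.

```python
def harmonize_release(record):
    """Reduce one release record to a few essential fields (illustrative sketch)."""
    # Artist names, assuming each artist-credit entry carries a "name" key.
    artists = [
        credit["name"]
        for credit in record.get("artist-credit") or []
        if credit.get("name")
    ]

    # Track listing with durations; "length" is assumed to be in milliseconds.
    tracks = []
    for medium in record.get("media") or []:
        for track in medium.get("tracks") or []:
            length_ms = track.get("length")
            tracks.append({
                "title": track.get("title"),
                "duration_s": round(length_ms / 1000, 1) if length_ms is not None else None,
            })

    # Label names; a label-info entry may have no label attached.
    labels = [
        info["label"]["name"]
        for info in record.get("label-info") or []
        if info.get("label") and info["label"].get("name")
    ]

    # Tag names ordered by vote count (highest first), then alphabetically.
    tags = sorted(record.get("tags") or [], key=lambda t: (-t.get("count", 0), t.get("name", "")))

    return {
        "title": record.get("title"),
        "date": record.get("date"),
        "country": record.get("country"),
        "artists": artists,
        "tracks": tracks,
        "labels": labels,
        "tags": [t["name"] for t in tags if t.get("name")],
    }
```

Missing or null fields fall back to `None` or an empty list, so a sparse record still yields a dictionary with the same keys.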

The script uses multiprocessing to handle large datasets efficiently and logs critical errors to mb_processing_errors.log for review.
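One common way to combine multiprocessing with error logging is sketched below. It is an assumption about the general approach, not a copy of the script: `parse_line`, `process_lines`, the `"mb_parser"` logger name, and the "record must have an `id`" check are all hypothetical. Workers return errors to the parent process instead of logging them directly, which avoids several processes writing to one log file at once.

```python
import json
import logging
from multiprocessing import Pool

logger = logging.getLogger("mb_parser")  # hypothetical logger name


def parse_line(line):
    """Parse one JSON line; return (record, None) or (None, error message)."""
    try:
        record = json.loads(line)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        return None, f"invalid JSON: {exc}"
    if not isinstance(record, dict) or "id" not in record:
        return None, "record has no 'id' field"
    return record, None


def _collect(results):
    """Split worker results into parsed records and logged error messages."""
    records, errors = [], []
    for record, error in results:
        if error is None:
            records.append(record)
        else:
            errors.append(error)
            logger.error(error)  # logged in the parent process only
    return records, errors


def process_lines(lines, workers=1, chunksize=1000):
    """Parse lines serially (workers <= 1) or with a process pool."""
    if workers <= 1:
        return _collect(map(parse_line, lines))
    with Pool(processes=workers) as pool:
        # imap keeps input order and sends lines to workers in chunks,
        # which reduces inter-process overhead for many small items.
        return _collect(pool.imap(parse_line, lines, chunksize=chunksize))
```

For a full dump, the parent would normally write records out in batches rather than collect them all in memory; the list is kept here only to keep the sketch short. On platforms that start workers with `spawn`, call `process_lines` with `workers > 1` from under an `if __name__ == "__main__":` guard.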

Usage

Ensure you have the required dependencies installed:

pip install ujson tqdm

Run the script with:

python prepare20M.py --tar-file path/to/release.tar.xz --output-dir output_directory

Replace path/to/release.tar.xz with the path to your MusicBrainz data file and output_directory with your desired output folder.
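For readers curious how a command line like this can be wired up, here is a minimal sketch. Only the two flags `--tar-file` and `--output-dir` come from the usage above; the helper names `build_parser` and `iter_dump_lines` are hypothetical, and the archive member name `mbdump/release` (one JSON document per line) is an assumption about the dump layout that may need adjusting.

```python
import argparse
import tarfile
from pathlib import Path


def build_parser():
    """Command-line interface with the two flags shown in the usage example."""
    parser = argparse.ArgumentParser(description="Parse a MusicBrainz release dump.")
    parser.add_argument("--tar-file", required=True, type=Path,
                        help="path to release.tar.xz")
    parser.add_argument("--output-dir", required=True, type=Path,
                        help="folder for the processed output")
    return parser


def iter_dump_lines(tar_path, member_name="mbdump/release"):
    """Yield non-empty text lines from one archive member without unpacking to disk.

    member_name is an assumed default; check the archive listing if nothing is yielded.
    """
    # "r|xz" reads the archive as a forward-only stream, so the whole
    # dump never has to be decompressed into memory or onto disk.
    with tarfile.open(tar_path, mode="r|xz") as archive:
        for member in archive:
            if member.name != member_name or not member.isfile():
                continue
            handle = archive.extractfile(member)
            for raw in handle:
                line = raw.strip()
                if line:
                    yield line.decode("utf-8")
            return  # the target member has been fully read


if __name__ == "__main__":
    args = build_parser().parse_args()
    args.output_dir.mkdir(parents=True, exist_ok=True)
    for count, _line in enumerate(iter_dump_lines(args.tar_file), start=1):
        pass  # a real run would parse and harmonize each line here
```

Because the archive is opened in stream mode, members must be read in the order they appear, which is why the loop handles the target member inside the iteration.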

Output

  • Processed JSON files containing essential release data.
  • A log file mb_processing_errors.log capturing critical errors.
  • A rejected_records folder with samples of records that were not processed.
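The three outputs above could be produced along the lines of the sketch below. The file name `mb_processing_errors.log` and the folder name `rejected_records` are taken from the list; everything else is an assumption made for illustration, including the helper names, the `releases_00000.json` batch naming, the sample cap, and placing the log file inside the output directory.

```python
import json
import logging
from pathlib import Path


def configure_error_log(output_dir):
    """Attach a file handler that writes errors to mb_processing_errors.log."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("mb_parser_output")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    handler = logging.FileHandler(output_dir / "mb_processing_errors.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def write_batch(records, output_dir, batch_index):
    """Write one batch of processed records to a numbered JSON file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"releases_{batch_index:05d}.json"  # assumed naming scheme
    with path.open("w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False)
    return path


def save_rejected_sample(raw_line, output_dir, sample_index, max_samples=100):
    """Keep only the first max_samples rejected records, one file each."""
    if sample_index >= max_samples:
        return None
    folder = Path(output_dir) / "rejected_records"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"rejected_{sample_index:05d}.txt"
    path.write_text(raw_line, encoding="utf-8")
    return path
```

Capping the number of saved samples keeps the `rejected_records` folder small even when a large share of the dump fails to parse.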

License

This project is licensed under the MIT License.

About

Command-line tool for efficient MusicBrainz JSON release dump parsing and harmonizing