Datasets of MusicBrainz, Tidal, Spotify, Deezer
You can use these tables for MiniMedia's database. This saves you a huge amount of time compared with calling the APIs yourself
The Tidal, Spotify and Deezer datasets were obtained through their APIs, which took months of calling them 24/7
The MusicBrainz data came mostly from their own published dataset, plus their API
Note for the Deezer dataset: the Preview Url (to listen to the first x seconds of a song) and TrackToken (for playback) fields are empty, because storing them took too much space
Packed: CSV format 4.6GB, SQL 16.2GB
Unpacked: CSV format 82.2GB, SQL 114.3GB
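With the unpacked CSV files being this large, it helps to stream them row by row instead of loading a whole file into memory. Below is a minimal sketch of that idea using only the Python standard library. It assumes the CSV files start with a header row; the column names and the two sample rows are made up for illustration, so check the real files before relying on it.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def iter_rows(handle: TextIO) -> Iterator[dict]:
    """Yield one row at a time as a dict keyed by the header row."""
    yield from csv.DictReader(handle)


def count_rows(rows: Iterable[dict]) -> int:
    """Count rows without keeping any of them in memory."""
    return sum(1 for _ in rows)


# Tiny in-memory stand-in for a real export (assumed layout, made-up data).
sample = io.StringIO("ArtistId,ArtistName\n1,Example Artist\n2,Another Artist\n")
total = count_rows(iter_rows(sample))
print(total)
```

For a real file, replace the `io.StringIO` stand-in with `open(path, newline="", encoding="utf-8")`, where `path` points at one of the unpacked CSV files.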
Loving the work I do? Buy me a coffee: https://buymeacoffee.com/musicmovearr
You can download the official MusicBrainz dataset here: https://metabrainz.org/datasets/postgres-dumps#musicbrainz
But the official dataset is huge, a lot larger than what I'm sharing, because I stored it more efficiently
Total Size: ~20GB in Postgres, versus 270GB as provided by MusicBrainz in JSON format
Artists: 2.5mil
Albums: 4.8mil
Tracks: 49mil
Total Size: ~3GB in postgres, 1.2GB in CSV-Format
Artists: 214k
Albums: 408k
Tracks: 2.1mil
Total Size: ~15GB in postgres, 3GB in CSV-Format
Artists: 456k
Albums: 2.3mil
Tracks: 14.6mil
Total Size: ~120GB in postgres, 73.8GB in CSV-Format
Artists: 4.1mil
Albums: 21.7mil
Tracks: 118.7mil
Datasets of MusicBrainz, Tidal, Spotify
These datasets contain zero modifications from me, they're straight from the source
You can use these tables for MiniMedia's database. This saves you a huge amount of time compared with calling the APIs yourself
The Tidal and Spotify datasets were obtained through their APIs, which took months of calling them 24/7
The MusicBrainz data came mostly from their own published dataset, plus their API
Packed: 4.5GB
Unpacked: 44.6GB
You can download the official MusicBrainz dataset here: https://metabrainz.org/datasets/postgres-dumps#musicbrainz
But the official dataset is huge, a lot larger than what I'm sharing, because I stored it more efficiently
Total Size: ~20GB in Postgres, versus 270GB as provided by MusicBrainz in JSON format
Artists: 2.5mil
Albums: 4.8mil
Tracks: 49mil
Total Size: ~1GB in postgres
Artists: 64k
Albums: 196k
Tracks: 1.1mil
Total Size: ~3GB in postgres
Artists: 118k
Albums: 403k
Tracks: 2.5mil
Spotify's rate limit is a pain: I can only call their API once every 10 seconds to avoid tripping the rate limiter, and when it trips, the API key gets blocked for ~15 hours...
Keep in mind that the Spotify dataset is not complete, since I can only fetch ~500 artists a day...
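The "one call every 10 seconds" pacing can be sketched as a fixed-interval throttle. This is only an illustration of the idea, not the code that was actually used to build the dataset. The class name `IntervalThrottle` is hypothetical, and the demo uses a tiny interval so it finishes quickly; the 10 second value comes from the note above.

```python
import time
from typing import Callable, Optional


class IntervalThrottle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        # Record when this call is (about to be) made.
        self._last_call = now + delay
        return delay


# Demo with a short interval; use min_interval=10.0 for the pacing described above.
throttle = IntervalThrottle(min_interval=0.01)
for artist_id in ["a", "b", "c"]:  # made-up ids
    throttle.wait()
    # The actual API request for artist_id would go here.
```

The clock and sleep functions are injectable so the pacing logic can be checked without really waiting.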
Tidal's API rate limiter is alright, but I can only make ~200 API calls per window (15 minutes?). It's not super fast, but unlike Spotify it doesn't block me for ~15 hours
Keep in mind that the Tidal dataset is not complete either
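A limit of roughly 200 calls per 15 minutes can be respected with a sliding-window limiter. The sketch below is an assumption about how one might do it, not the author's actual code. The class name `SlidingWindowLimiter` is hypothetical, and both numbers are the approximate values from the note above rather than documented Tidal limits.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window_seconds` span."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds slept."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        waited = 0.0
        if len(self._calls) >= self.max_calls:
            # Wait until the oldest call in the window expires.
            waited = self._calls[0] + self.window_seconds - now
            self._sleep(waited)
            now += waited
            self._calls.popleft()
        self._calls.append(now)
        return waited


# ~200 calls per 15 minutes, as estimated above; three calls never have to wait.
limiter = SlidingWindowLimiter(max_calls=200, window_seconds=15 * 60)
waits = [limiter.acquire() for _ in range(3)]
print(waits)
```

Compared with a fixed delay between calls, this allows short bursts while still staying under the per-window budget.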
I can say with 99% confidence that the Deezer dataset is complete, though there are surely a few artists I missed
Type | Name |
---|---|
long | ArtistId |
string | ArtistName |
int | ArtistNbAlbum |
int | ArtistNbFan |
bool | ArtistRadio |
string | ArtistType |
string | ArtistHref |
string | ArtistImageHref |
long | AlbumId |
long | AlbumArtistId |
string | AlbumName |
string | AlbumMd5Image |
int | AlbumGenreId |
long | AlbumFans |
string | AlbumReleaseDate |
string | AlbumRecordType |
bool | AlbumExplicitLyrics |
int | AlbumExplicitContentLyrics |
int | AlbumExplicitContentCover |
string | AlbumType |
string | AlbumUPC |
string | Label |
long | AlbumNbTracks |
TimeSpan | AlbumDuration |
bool | AlbumAvailable |
string | AlbumHref |
string | AlbumGenreName |
string | AlbumGenrePicture |
long | TrackId |
bool | TrackReadable |
string | TrackTitle |
string | TrackTitleShort |
string | TrackTitleVersion |
string | TrackISRC |
TimeSpan | TrackDuration |
int | TrackPosition |
int | TrackDiscNumber |
long | TrackRank |
string | TrackReleaseDate |
bool | TrackExplicitLyrics |
int | TrackExplicitContentLyrics |
int | TrackExplicitContentCover |
double | TrackBPM |
double | TrackGain |
string | TrackMd5Image |
long | TrackArtistId |
long | TrackAlbumId |
string | TrackType |
string | TrackHref |
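When reading the CSV export, every value arrives as a string, so the types in the table above have to be applied by hand. The sketch below converts a few of the listed columns. It assumes booleans are written in one of several common spellings and that an empty field means "no value"; neither assumption has been checked against the real files, and the sample row is made up.

```python
from typing import Callable, Dict

TRUE_VALUES = {"true", "t", "1", "yes"}
FALSE_VALUES = {"false", "f", "0", "no"}


def parse_bool(text: str) -> bool:
    """Parse a boolean written in one of a few common spellings (assumed)."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised boolean: {text!r}")


# A few columns from the table above mapped to Python converters.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "ArtistId": int,            # long
    "ArtistNbFan": int,         # int
    "ArtistRadio": parse_bool,  # bool
    "TrackBPM": float,          # double
    "TrackGain": float,         # double
}


def convert_row(row: Dict[str, str]) -> Dict[str, object]:
    """Convert known columns; keep other columns as strings, empty fields as None."""
    converted: Dict[str, object] = {}
    for name, text in row.items():
        convert = CONVERTERS.get(name)
        if convert is None:
            converted[name] = text
        elif text == "":
            converted[name] = None
        else:
            converted[name] = convert(text)
    return converted


# Made-up sample row for illustration.
raw = {
    "ArtistId": "27",
    "ArtistName": "Example Artist",
    "ArtistRadio": "True",
    "TrackBPM": "120.5",
    "TrackGain": "",
}
typed = convert_row(raw)
print(typed)
```

Extend `CONVERTERS` with the remaining columns as needed; string columns can simply be left out.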
Type | Name |
---|---|
string | ArtistId |
string | ArtistName |
int | ArtistPopularity |
string | ArtistType |
string | ArtistUri |
int | ArtistTotalFollowers |
string | ArtistHref |
string | ArtistGenres |
string | ArtistCoverUrl |
string | AlbumId |
string | AlbumAlbumGroup |
string | AlbumAlbumType |
string | AlbumName |
string | AlbumReleaseDate |
string | AlbumReleaseDatePrecision |
int | AlbumTotalTracks |
string | AlbumType |
string | AlbumUri |
string | AlbumLabel |
int | AlbumPopularity |
string | AlbumArtistId |
string | AlbumCoverUrl |
string | AlbumUPC |
string | TrackId |
string | TrackAlbumId |
int | TrackDiscNumber |
TimeSpan | TrackDuration |
bool | TrackExplicit |
string | TrackHref |
bool | TrackIsPlayable |
string | TrackName |
string | TrackPreviewUrl |
int | TrackNumber |
string | TrackType |
string | TrackUri |
string | TrackISRC |
Type | Name |
---|---|
int | ArtistId |
string | ArtistName |
float | ArtistPopularity |
string | ArtistImageHref |
int | AlbumId |
string | AlbumTitle |
string | AlbumBarcodeId |
int | AlbumNumberOfVolumes |
int | AlbumNumberOfItems |
string | AlbumDuration |
string | AlbumExplicit |
string | AlbumReleaseDate |
string | AlbumCopyright |
float | AlbumPopularity |
string | AlbumAvailability |
string | AlbumMediatags |
string | AlbumImageHref |
int | TrackId |
int | TrackAlbumId |
string | TrackTitle |
string | TrackISRC |
string | TrackDuration |
string | TrackCopyright |
bool | TrackExplicit |
float | TrackPopularity |
string | TrackAvailability |
string | TrackMediatags |
int | TrackVolumeNumber |
int | TrackTrackNumber |
string | TrackVersion |
string | ArtistHref |
string | AlbumHref |
string | TrackHref |
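All three tables above have a TrackISRC column, so one way to link the same recording across services is to match on it. The sketch below shows that idea on rows already loaded as dicts. The function names are hypothetical, the sample rows are made up, and normalising the ISRC by trimming and upper-casing is an assumption about how the values are stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def normalise_isrc(row: Row) -> str:
    """Return the row's ISRC trimmed and upper-cased, or '' if missing."""
    return (row.get("TrackISRC") or "").strip().upper()


def index_by_isrc(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Group rows by ISRC, skipping rows without one."""
    index: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        isrc = normalise_isrc(row)
        if isrc:
            index[isrc].append(row)
    return index


def match_by_isrc(left: Iterable[Row], right: Iterable[Row]) -> List[Tuple[str, str]]:
    """Return (left TrackId, right TrackId) pairs that share an ISRC."""
    right_index = index_by_isrc(right)
    pairs: List[Tuple[str, str]] = []
    for row in left:
        isrc = normalise_isrc(row)
        if not isrc:
            continue
        for other in right_index.get(isrc, []):
            pairs.append((row["TrackId"], other["TrackId"]))
    return pairs


# Made-up rows standing in for two of the datasets.
service_a = [
    {"TrackId": "101", "TrackISRC": "AAAAA0000001"},
    {"TrackId": "102", "TrackISRC": ""},
]
service_b = [
    {"TrackId": "x1", "TrackISRC": "aaaaa0000001 "},
    {"TrackId": "x2", "TrackISRC": "BBBBB0000002"},
]
matches = match_by_isrc(service_a, service_b)
print(matches)
```

One ISRC can map to several tracks on the same service (for example the same recording on different albums), which is why the index keeps a list per ISRC instead of a single row.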