This is a personal Spotify listening history tracking pipeline. It connects to the Spotify Web API to fetch recently played tracks (a single call returns at most the 50 most recent tracks), stores them locally in a structured CSV format, and enriches them with additional metadata such as track duration and artist genres. The system is built to run incrementally, adding only new records and maintaining JSON-based caches for artist and song metadata.
You can also manually fill in missing or ambiguous metadata, which is especially useful for instrumental or lesser-known tracks where Spotify may not provide genre information. The search feature seems to yield no results for a surprisingly large number of artists.
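
The incremental behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `append_new_plays`, the column names in `FIELDS`, and the use of `played_at` as the deduplication key are all assumptions.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real CSV may use different names.
FIELDS = ["played_at", "track_name", "artist_name"]


def append_new_plays(csv_path, plays):
    """Append only plays whose `played_at` timestamp is not already stored.

    Returns the number of rows written.
    """
    path = Path(csv_path)
    seen = set()
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            seen = {row["played_at"] for row in csv.DictReader(f)}

    # Keep first occurrence of each unseen timestamp, preserving order.
    new_rows = []
    for play in plays:
        if play["played_at"] not in seen:
            seen.add(play["played_at"])
            new_rows.append(play)
    if not new_rows:
        return 0

    path.parent.mkdir(parents=True, exist_ok=True)
    # Write the header only when starting a fresh (or empty) file.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Because consecutive fetches usually overlap, calling this with the full batch from each run should only add the tracks played since the previous run.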
- You will need to create a new app in the Spotify Developer Dashboard and generate your own client ID and secret for your Spotify listening account (this is free).
- Add them to a `.env` file in your project, similar to `.env_example`.
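
For illustration, the credentials could then be read at startup roughly like this. The sketch parses simple `KEY=VALUE` lines using only the standard library. The variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` are assumptions (check `.env_example` for the names the scripts actually expect), and the project itself may well load the file differently, for example with a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_credentials(path=".env"):
    """Return (client_id, client_secret); real environment variables win."""
    env = load_env_file(path) if Path(path).exists() else {}
    # Hypothetical variable names, assumed for this example.
    client_id = os.environ.get("SPOTIFY_CLIENT_ID") or env.get("SPOTIFY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIFY_CLIENT_SECRET") or env.get("SPOTIFY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Spotify client ID/secret not found in environment or .env")
    return client_id, client_secret
```

Keeping the secret in `.env` rather than in the scripts means the file can stay out of version control.
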

`extract_script.py`
- The main script that calls the Spotify Web API, generates the main CSV file, and stores it in `./data/spotify_data_<current_year>.csv`. The results of all runs (automated or otherwise) are recorded in `auto_extract_log.txt` with appropriate timestamp information.

`extract_with_metadata.py`
- This script searches for and organizes the genre and track duration, and adds them as two columns to the existing DataFrame generated by `extract_script`. The result is stored as `./data/spotify_data_with_metadata_<current_year>.csv`. It also generates the metadata files `./metadata/artist_metadata.json`, `./metadata/song_metadata.json` and `./metadata/missing_queries.json`.

`spotify_logger.bat`
- The batch file used with the Windows Task Scheduler to run `extract_script.py` every hour to check for recent history. This ensures that tracks listened to are recorded well within the 50-track limit.
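
The JSON metadata files listed above suggest a lookup-then-cache pattern. A minimal sketch of that idea follows. The cache layout (artist name mapped to a list of genres), the function names, and the `fetch_genres` callable standing in for the Spotify search call are all assumptions, not the repository's actual code.

```python
import json
from pathlib import Path


def load_json(path, default):
    """Load a JSON file, returning `default` if it does not exist yet."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return default


def save_json(path, data):
    """Write `data` as pretty-printed JSON, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def genres_for(artist, fetch_genres, cache_path, missing_path):
    """Return genres for `artist`, calling `fetch_genres` only on a cache miss.

    Artists for which the lookup returns nothing are recorded in the missing
    file so they can be filled in by hand later.
    """
    cache = load_json(cache_path, {})
    if artist in cache:
        return cache[artist]

    genres = fetch_genres(artist) or []
    if genres:
        cache[artist] = genres
        save_json(cache_path, cache)
    else:
        missing = load_json(missing_path, [])
        if artist not in missing:
            missing.append(artist)
            save_json(missing_path, missing)
    return genres
```

Under this layout, a manual fix would amount to adding the artist and its genres to the cache file by hand, so that later runs find the entry without querying Spotify again.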