This repo contains a Spotify ETL project.
The idea is to query the Spotify API regularly to obtain the latest information about songs a user listened to. This info is gathered to build a database of historical music choices. Additionally, I plan to query other sources for information about events, weather, etc. to see if we are really "singing in the rain" or dancing in summer :) The end goal is a dashboard, automatic playlist generation, etc., to give more insight into one's music choices over time and in relation to outside influences (if there are any). Maybe this can even be enriched with a sensible recommendation.
Additionally, I want to use this project to learn and practice best practices such as CI/CD, type hinting and checking, testing, and data validation with pydantic.
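As an illustration of the pydantic-based validation, here is a minimal sketch of what a validated listening-history record could look like; the model name and fields are hypothetical placeholders, not the actual schema used in this repo:

```python
from datetime import datetime

from pydantic import BaseModel


class PlayedTrack(BaseModel):
    """Hypothetical model for one entry of a user's listening history."""

    track_id: str
    track_name: str
    artist_name: str
    played_at: datetime


# Validating one (made-up) item from the "recently played" response:
record = PlayedTrack(
    track_id="3n3Ppam7vgaVa1iaRUc9Lp",
    track_name="Mr. Brightside",
    artist_name="The Killers",
    played_at="2023-05-01T12:00:00Z",
)
```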
- Python (pandas, spotipy, mypy, pydantic, psycopg2, boto3, black)
- PostgreSQL on RDS
- AWS EC2 + cronjob (Airflow could technically be used and the DAG is functional, but the complications of setting it up on EC2 didn't warrant the effort; a simple cronjob works for the hourly data ingest, see the sketch below this list)
- after my free AWS tier ran out, the project was stopped
- decided to relaunch on Azure
- created a setup on Databricks as this will be the quickest and easiest for me (everything necessary is in `./databricks`)
- once I have it running I may add a platform-independent version again
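As a rough idea of the hourly, cron-driven ingest into Postgres on RDS, here is a minimal sketch; the table name, columns, and connection settings are hypothetical placeholders, not the repo's actual schema:

```python
import os

import psycopg2  # Postgres driver used to write into the RDS instance

# Hypothetical DSN built from environment variables set on the EC2 instance.
DSN = (
    f"host={os.environ['PG_HOST']} dbname={os.environ['PG_DB']} "
    f"user={os.environ['PG_USER']} password={os.environ['PG_PASSWORD']}"
)


def store_plays(plays: list[dict]) -> None:
    """Insert recently played tracks, skipping rows that are already stored."""
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        for play in plays:
            cur.execute(
                # 'played_tracks' and its columns are placeholder names; the
                # ON CONFLICT clause assumes a unique constraint on played_at.
                "INSERT INTO played_tracks (track_id, track_name, played_at) "
                "VALUES (%s, %s, %s) ON CONFLICT (played_at) DO NOTHING",
                (play["track_id"], play["track_name"], play["played_at"]),
            )


# Invoked hourly by cron on the EC2 box, e.g.:
#   0 * * * * /usr/bin/python3 /home/ec2-user/spotify_etl/ingest.py
```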
Authentication is handled by spotipy and OAuth2. To use `spotipy.oauth2`, the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables need to be set.
To obtain a set of credentials, you need to register with the Spotify Developer Service.
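For reference, a minimal sketch of how the spotipy OAuth flow can be used once those variables are set; the scope shown here and the `SPOTIPY_REDIRECT_URI` mentioned in the comments are assumptions to adjust to your own setup:

```python
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# SpotifyOAuth picks up SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET from the
# environment; it also expects a redirect URI (SPOTIPY_REDIRECT_URI), which
# must match one configured for your app in the Spotify developer dashboard.
auth_manager = SpotifyOAuth(scope="user-read-recently-played")
sp = spotipy.Spotify(auth_manager=auth_manager)

# Fetch the user's most recently played tracks (up to 50 per call).
recent = sp.current_user_recently_played(limit=50)
for item in recent["items"]:
    print(item["played_at"], item["track"]["name"])
```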
Additionally, if you want to run this on Databricks, you need to authenticate the user once in an interactive setting, then transfer the `.cache` file with the token to a path on DBFS.
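A minimal sketch of how the cached token could then be reused from a Databricks job or notebook; the DBFS path and scope are assumptions, and `CacheFileHandler` is the spotipy helper (available in recent spotipy versions) that reads and writes the token cache file:

```python
import spotipy
from spotipy.cache_handler import CacheFileHandler
from spotipy.oauth2 import SpotifyOAuth

# Hypothetical DBFS location; upload the locally created .cache file there,
# e.g. with dbutils.fs.cp("file:/tmp/.cache", "dbfs:/spotify_etl/.cache").
CACHE_PATH = "/dbfs/spotify_etl/.cache"

auth_manager = SpotifyOAuth(
    scope="user-read-recently-played",
    cache_handler=CacheFileHandler(cache_path=CACHE_PATH),
    open_browser=False,  # no interactive login on the cluster
)
sp = spotipy.Spotify(auth_manager=auth_manager)
recent = sp.current_user_recently_played(limit=50)
```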