ETL POC example that extracts data from the Spotify API, transforms it by filtering out unwanted records, and loads it into a PostgreSQL database, using Alembic for database versioning

EspositoLucas/Spotify-ETL-POC

Spotify API ETL POC

This POC extracts data from the Spotify API, transforms it by filtering out unwanted records, and loads it into a PostgreSQL database, using Alembic for database versioning.

Inspired by

Karolina Sowinska - Data Engineering https://www.youtube.com/playlist?list=PLNkCniHtd0PNM4NZ5etgYMw4ojid0Aa6i

CodinEric - Data Engineering https://youtube.com/@CodinEric?si=slN4oCnuDbhr6FIn

Extract

I am using the Spotify API to extract data from my account.

Get Spotify creds

https://developer.spotify.com/dashboard/login
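As a rough sketch of the extract step, the snippet below assumes the data comes from Spotify's recently-played endpoint (`/v1/me/player/recently-played`) and that you already hold an OAuth access token with the `user-read-recently-played` scope. The function names (`build_request`, `parse_items`, `fetch_recently_played`) and the choice of output fields are illustrative, not taken from this repo.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint: the README does not say which one the POC calls.
RECENTLY_PLAYED_URL = "https://api.spotify.com/v1/me/player/recently-played"


def build_request(token: str, since: datetime, limit: int = 50) -> urllib.request.Request:
    """Build the GET request for tracks played after `since`."""
    params = urllib.parse.urlencode({
        "after": int(since.timestamp() * 1000),  # the API expects Unix milliseconds
        "limit": limit,                          # 50 is the maximum page size
    })
    return urllib.request.Request(
        f"{RECENTLY_PLAYED_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


def parse_items(payload: dict) -> list[dict]:
    """Flatten the JSON response into one row per played track."""
    return [
        {
            "track_name": item["track"]["name"],
            "artist_name": item["track"]["artists"][0]["name"],
            "played_at": item["played_at"],
        }
        for item in payload.get("items", [])
    ]


def fetch_recently_played(token: str, since: datetime) -> list[dict]:
    """Call the API (needs network access and a valid token) and return rows."""
    with urllib.request.urlopen(build_request(token, since)) as response:
        return parse_items(json.load(response))
```

Keeping `parse_items` separate from the network call makes it possible to check the parsing against a saved response without hitting the API.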

Transform

Using pandas I can filter out the unwanted records. As you can see, the time window is loose, and I am planning on running this as Airflow tasks :P
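A minimal sketch of what such a filter could look like with pandas. The 24-hour default window, the column names, and the `filter_recent` helper are assumptions for illustration; the actual rules used in this POC may differ.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def filter_recent(rows: list[dict], now: datetime,
                  window: timedelta = timedelta(days=1)) -> pd.DataFrame:
    """Keep unique plays whose `played_at` falls inside the last `window`.

    `now` must be timezone-aware so it can be compared with UTC timestamps.
    """
    df = pd.DataFrame(rows, columns=["track_name", "artist_name", "played_at"])
    # Parse ISO 8601 strings into timezone-aware UTC timestamps.
    df["played_at"] = pd.to_datetime(df["played_at"], utc=True)
    # Drop incomplete rows and repeated plays (same timestamp).
    df = df.dropna().drop_duplicates(subset="played_at")
    # Discard anything older than the time window.
    recent = df[df["played_at"] >= now - window]
    return recent.reset_index(drop=True)
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the step deterministic, which helps if it later runs as a scheduled task.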

Load

Local postgres db

docker run -d --name spoty_etl_pg -v my_dbdata:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=spotipy postgres

Test it with

docker exec -it spoty_etl_pg psql -h localhost -U postgres -W spotipy

Alembic

Add alembic to the project

alembic init alembic
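After `alembic init`, the generated `alembic.ini` contains a `sqlalchemy.url` entry that has to point at the database. The sketch below shows one way to assemble that URL from the same values passed to `docker run` above. The `database_url` helper and the `POSTGRES_HOST` / `POSTGRES_PORT` variable names are hypothetical; the defaults mirror the container settings.

```python
import os
from urllib.parse import quote_plus


def database_url(env=os.environ) -> str:
    """Build a SQLAlchemy URL for the local Postgres container."""
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    host = env.get("POSTGRES_HOST", "localhost")  # hypothetical variable name
    port = env.get("POSTGRES_PORT", "5432")       # hypothetical variable name
    name = env.get("POSTGRES_DB", "spotipy")
    # quote_plus escapes characters such as "@" that would break the URL.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )
```

With the defaults this yields `postgresql+psycopg2://postgres:postgres@localhost:5432/spotipy`, which can be pasted into `alembic.ini` or set from `alembic/env.py`.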

Known issue

When installing Alembic, you might also need to add

poetry add psycopg2-binary

Create a Migration Script

alembic revision --autogenerate -m "First"
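For `--autogenerate` to detect anything, `target_metadata` in `alembic/env.py` has to reference the metadata of your SQLAlchemy models (it is `None` in the generated template). Below is a minimal sketch of such a model, assuming SQLAlchemy 1.4 or newer. The `PlayedTrack` class, the table name, and the columns are hypothetical and only illustrate the shape; they are not this repo's schema.

```python
from sqlalchemy import Column, DateTime, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class PlayedTrack(Base):
    """Hypothetical table holding one row per played track."""

    __tablename__ = "played_tracks"

    # The play timestamp doubles as the primary key in this sketch.
    played_at = Column(DateTime(timezone=True), primary_key=True)
    track_name = Column(String, nullable=False)
    artist_name = Column(String, nullable=False)


# In alembic/env.py you would then set:
#   target_metadata = Base.metadata
```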

Running First Migration

alembic upgrade head
