Skip to content

ETL pipeline to find trends over movie theaters, libraries and museums in different parts of Argentina using the Argentinian open data portal

Notifications You must be signed in to change notification settings

EspositoLucas/Argentina-Open-Data-Challenge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Argentina Open Data Challenge

Using the Argentinian open data portal, I created an ETL pipeline to find trends over movie theaters, libraries and museums in different parts of the country

You can find the challenge here

Prerequisites

Poetry

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.

You need to install poetry with pip install poetry then all other dependencies will be managed by it.

Using poetry install it will create a virtual env with all the necesary dependencies then you can access it with poetry shell.

If you need to add a dependency just use poetry add <dependency> or you can customice much more editing the pyproyect.toml I exported all necesary dependencies to a requirements.txt

You can read more about Poetry here

Also you can expand even further with this post Hypermodern Python

Data research

You can find the data exploratory research on this notebook

Setup database

You can setup the dabase by running

docker run -d --name challenge_pg -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres e POSTGRES_DB=postgres postgres

You can access the database by running

docker exec -it challenge_pg psql -U postgres -d postgres

Creating the database

You can create the dabase by running

python script.py

This script will read all the files in/challenge/sql/ and run the sql script in them to create each table.

Running the ETL

First you need to create a settings.ini file. Use the settings.ini.ex for reference You can also set the variables as enviromental vairables by doing

export DB_CONNSTR=value 
export MUSEO_URL=value
export CINES_URL=value
export ESPACIOS_URL=value

Where value is the correct value you need.

You can run the etl by using the command.

python python challenge/main.py --date 2021-08-31

About

ETL pipeline to find trends over movie theaters, libraries and museums in different parts of Argentina using the Argentinian open data portal

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published