Argentina Open Data Challenge

Using the Argentinian open data portal, I created an ETL pipeline to find trends in movie theaters, libraries and museums across different parts of the country.

You can find the challenge here

Prerequisites

Poetry

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.

Install Poetry with pip install poetry; it then manages all other dependencies.

Running poetry install creates a virtual environment with all the necessary dependencies, which you can then enter with poetry shell.

To add a dependency, use poetry add <dependency>; for further customization, edit pyproject.toml. I also exported all necessary dependencies to a requirements.txt.

You can read more about Poetry here

For a deeper look, see the post Hypermodern Python.

Data research

You can find the exploratory data research in this notebook.

Setup database

You can set up the database by running

docker run -d --name challenge_pg -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres postgres

You can access the database by running

docker exec -it challenge_pg psql -U postgres -d postgres

Creating the database

You can create the database tables by running

python script.py

This script reads all the files in /challenge/sql/ and runs the SQL in each one to create the corresponding table.
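The behavior described above can be sketched roughly as follows. This is not the project's actual script.py, which is not shown here: the function name run_sql_files is hypothetical, and the sketch assumes one statement per .sql file and any DB-API connection (sqlite3 is used only so the example is self-contained; the project itself targets PostgreSQL).

```python
import sqlite3
from pathlib import Path


def run_sql_files(conn, sql_dir):
    """Execute every .sql file in sql_dir, in file-name order.

    conn is any DB-API connection. Returns the names of the files that ran.
    Hypothetical helper: assumes each file holds a single SQL statement.
    """
    executed = []
    cur = conn.cursor()
    for path in sorted(Path(sql_dir).glob("*.sql")):
        # Read the whole file and send it to the database as one statement.
        cur.execute(path.read_text(encoding="utf-8"))
        executed.append(path.name)
    conn.commit()
    cur.close()
    return executed


if __name__ == "__main__":
    # Illustration only: an in-memory SQLite database stands in for Postgres.
    connection = sqlite3.connect(":memory:")
    print(run_sql_files(connection, "challenge/sql"))
```

Sorting by file name keeps the creation order deterministic, which matters if one table depends on another.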

Running the ETL

First, create a settings.ini file, using settings.ini.ex for reference. You can also set the variables as environment variables:

export DB_CONNSTR=value 
export MUSEO_URL=value
export CINES_URL=value
export ESPACIOS_URL=value

Replace value with the appropriate value for each variable.
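As an illustration of how the two configuration sources could be combined, here is a minimal sketch using only the standard library. It is an assumption, not the project's actual loader: the function load_settings, the [settings] section name, and the rule that environment variables take precedence over settings.ini are all hypothetical. Only the four variable names come from this README.

```python
import configparser
import os

# Variable names taken from the README; everything else here is illustrative.
KEYS = ("DB_CONNSTR", "MUSEO_URL", "CINES_URL", "ESPACIOS_URL")


def load_settings(ini_path="settings.ini", section="settings", environ=None):
    """Return a dict of settings, preferring environment variables over the file."""
    environ = os.environ if environ is None else environ

    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the upper-case key names
    parser.read(ini_path)     # a missing file is silently skipped
    from_file = dict(parser[section]) if parser.has_section(section) else {}

    settings = {}
    for key in KEYS:
        value = environ.get(key, from_file.get(key))
        if value is None:
            raise KeyError(f"missing setting: {key}")
        settings[key] = value
    return settings
```

Failing early on a missing key makes a misconfigured run stop before any download or database work starts.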

You can run the ETL with the command:

python challenge/main.py --date 2021-08-31
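For reference, a --date option like the one in the command above could be parsed along these lines. This is a sketch under assumptions, not the contents of challenge/main.py: the parse_args function is hypothetical, and whether the real script requires the flag or validates its format is not stated in this README.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Parse command-line arguments; --date must be an ISO date (YYYY-MM-DD)."""
    parser = argparse.ArgumentParser(description="Run the ETL for a given date.")
    parser.add_argument(
        "--date",
        type=date.fromisoformat,  # argparse reports an error if this raises ValueError
        required=True,            # assumption: the real script may supply a default
        help="run date in YYYY-MM-DD format",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Running ETL for {args.date.isoformat()}")
```

Converting the value to a date object at the command-line boundary means a malformed date is rejected before any extraction begins.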
