This challenge is a task provided by Indicium to obtain a Trainee-level skill certification on the Airflow platform.
To run this project, please follow the instructions below:
Note: all commands assume a Linux environment. If you are using Windows, please use the equivalent commands for your operating system.
- Clone the repository:
git clone https://github.com/moise-s/-Lighthouse-Airflow_Challenge.git
- Create a Virtual Environment:
virtualenv venv -p python3
- Activate the newly created environment:
source venv/bin/activate
- Install the required libraries:
pip install -r requirements.txt
- It is necessary to tell the system the location of the `AIRFLOW_HOME` variable. To do so, create a file named `.env` in the project root containing the following line, changing the path according to your environment:
export AIRFLOW_HOME=<path-to-project-root-folder>
- Load the variable into your shell by sourcing the file in the terminal:
source .env
- Finally, run Airflow:
airflow standalone
- In the terminal, you will find the credentials to access the project on `localhost:8080`, as in the example below:
standalone | Airflow is ready
standalone | Login with username: admin password: GdKc3u5PNE5qZGGG
standalone | Airflow Standalone is for development purposes only. Do not use this in production!
After accessing `localhost:8080` in your browser and logging in with the credentials obtained from the terminal, you can browse the DAGs. Click on the DAG named `DesafioAirflow`. There you can navigate through the views. The Graph view shows the DAG's flow, as in the image below:
To run the DAG, click the play icon at the top right of the screen, and then Trigger DAG.
After triggering, the DAG should run successfully and the result will be shown in the Grid view (green for success!).
- Connecting to SQLite with the Northwind database and exporting data (`SELECT * FROM "Order"`) from this DB to a CSV file named `output_orders.csv`. The function is below:
import sqlite3
import pandas as pd

def sqlite_to_csv():
    # Export the whole Order table from the Northwind database to a CSV file.
    con = sqlite3.connect('data/Northwind_small.sqlite')
    df = pd.read_sql_query('select * from "order"', con)
    df.to_csv(path_or_buf='data/output_orders.csv')
    con.close()
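A quick way to verify the export is to read the generated file back with pandas. This is only a sanity-check sketch, assuming the DAG has already run and `data/output_orders.csv` exists:

```python
import pandas as pd

# Read the exported file back and inspect it (assumes the DAG has already
# produced data/output_orders.csv in the project root).
orders = pd.read_csv('data/output_orders.csv')
print(orders.shape)   # number of exported rows and columns
print(orders.head())  # first few orders for a visual check
```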
- Reading data from the `OrderDetail` table and writing the result of a filtered query (joining the `Order` and `OrderDetail` tables) to a file:
- Sum of `Quantity` sold to `ShipCity = "Rio de Janeiro"`;
So, the function with the query and the file writing is as follows:
def sqlite_join():
    # Sum the quantities of all order items shipped to Rio de Janeiro.
    con = sqlite3.connect('data/Northwind_small.sqlite')
    df = pd.read_sql_query("""
        select SUM(OrderDetail.quantity)
        from "Order"
        LEFT join OrderDetail
        on "order"."Id" = OrderDetail.orderid
        where "order"."shipcity" = "Rio de Janeiro";
        """, con)
    # Write the single resulting value to a text file.
    f = open("data/count.txt", "w+")
    f.write(str(df.values).strip('[]'))
    f.close()
    con.close()
Running this function produces the file `data/count.txt` with the following content: `1893`.
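The same number can be cross-checked outside the DAG with a pandas merge. This is only an illustrative sketch, assuming the same `data/Northwind_small.sqlite` file and the standard Northwind column names (`OrderId`, `Id`, `ShipCity`, `Quantity`):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect('data/Northwind_small.sqlite')
orders = pd.read_sql_query('SELECT * FROM "Order"', con)
details = pd.read_sql_query('SELECT * FROM OrderDetail', con)
con.close()

# Join OrderDetail to Order and sum the quantity shipped to Rio de Janeiro,
# mirroring the SQL used by the join_count task.
merged = details.merge(orders, left_on='OrderId', right_on='Id')
total = merged.loc[merged['ShipCity'] == 'Rio de Janeiro', 'Quantity'].sum()
print(total)  # expected to match the 1893 written to data/count.txt
```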
- While running the last task of the DAG, called `export_final_output`, the file `final_output.txt` is created. Its content is:
bW9pc2VzLm5hc2NpbWVudG9AaW5kaWNpdW0udGVjaDE4OTM=
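This value is a Base64 string built from an e-mail address followed by the count from the previous step. A minimal sketch of how such content could be generated, assuming the count is read from `data/count.txt` (the e-mail value below is a placeholder, not the repository's actual code):

```python
import base64

# Illustrative sketch only: encode e-mail + count as Base64 and save it.
email = '<your-email>'  # hypothetical placeholder
with open('data/count.txt') as f:
    count = f.read().strip()

encoded = base64.b64encode((email + count).encode('utf-8')).decode('utf-8')
with open('data/final_output.txt', 'w') as f:
    f.write(encoded)
```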
- When executing the DAG, it is important to make sure the tasks run in the correct order. That is, it was necessary to provide the following line in the `dags/desafio.py` file:
export_CSV >> join_count >> export_final_output
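A minimal sketch of how these tasks might be wired together with `PythonOperator` (the DAG id, schedule, and dates are assumptions; the repository's `dags/desafio.py` may differ in its details):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumed wiring: each step above becomes a PythonOperator, and the >> chain
# enforces the execution order. sqlite_to_csv and sqlite_join are the
# functions shown earlier in this README.
with DAG(
    dag_id='DesafioAirflow',
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    export_CSV = PythonOperator(task_id='export_CSV', python_callable=sqlite_to_csv)
    join_count = PythonOperator(task_id='join_count', python_callable=sqlite_join)
    export_final_output = PythonOperator(
        task_id='export_final_output',
        python_callable=lambda: None,  # placeholder for the real export function
    )

    export_CSV >> join_count >> export_final_output
```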