joao-victor-campos/pyspark-data-aggregation-pipeline

PySpark Data Aggregation Pipeline

Python Version · Code style: black · Flake8 · Imports: isort · Checked with mypy · pytest coverage: 100%

Build Status:

Core Docker Image
Tests Docker Image

Introduction

Repository for our PySpark pipeline project, created to put our studies into practice: Python, PySpark, the ETL process, and software engineering skills in general, such as unit and integration testing, CI, Docker, and code quality.

Running the pipeline

Clone the project:

git clone git@github.com:joao-victor-campos/pyspark-data-aggregation-pipeline.git
cd pyspark-data-aggregation-pipeline

Build the Docker image:

docker build -t pyspark-data-aggregation-pipeline .

Run the app in Docker:

docker run pyspark-data-aggregation-pipeline

Expected output:

Pipeline execution finished!!!

Development

Install dependencies

make requirements

Code Style

Apply code style with black and isort

make apply-style

Perform all checks (flake8, black and mypy)

make checks

Testing and Coverage

Unit tests:

make unit-tests

Integration tests:

make integration-tests

All (unit + integration) tests with coverage report:

make tests-coverage
