Please do not fork this repository; instead, use it as a template for your refactoring project. Make pull requests to your own repository even if you work alone, and mark the checkboxes with an `x` in the pull request message once you are done with a topic.
You can find today's task in the data-pipeline-project.md file.
Use the commands below to create a new environment for this task:
```bash
pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
What are the steps you took to complete the project?
- Created a Jupyter notebook, SQL-with-spark_rev_per_day.ipynb, to test downloading the file from the data source and calculating the revenue per day
- Created data_ingestion.py to ingest the data into the GCS bucket (see the first sketch after this list)
- Created a PySpark script, revenue_report.py, that transforms the data in a Dataproc job and saves the results to the GCS bucket (see the second sketch after this list)
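As a rough illustration of the ingestion step, here is a minimal sketch of downloading a file and uploading it to GCS with the google-cloud-storage client. The URL, bucket name, and file paths are placeholders, not the actual values used in data_ingestion.py.

```python
# Minimal sketch of the ingestion step; all names below are placeholders.
import requests
from google.cloud import storage

DATA_URL = "https://example.com/sales.csv"   # placeholder data source URL
BUCKET_NAME = "my-project-bucket"            # placeholder GCS bucket name
LOCAL_PATH = "sales.csv"
BLOB_NAME = "raw/sales.csv"

def download_file(url: str, local_path: str) -> None:
    """Download the source file to the local filesystem."""
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(local_path, "wb") as f:
        f.write(response.content)

def upload_to_gcs(bucket_name: str, blob_name: str, local_path: str) -> None:
    """Upload a local file to the given GCS bucket."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path)

if __name__ == "__main__":
    download_file(DATA_URL, LOCAL_PATH)
    upload_to_gcs(BUCKET_NAME, BLOB_NAME, LOCAL_PATH)
```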
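And here is a minimal sketch of the revenue-per-day aggregation that a script like revenue_report.py could perform on Dataproc. The column names (purchase_time, amount) and GCS paths are assumptions, since the actual schema is not shown here.

```python
# Minimal sketch of a revenue-per-day aggregation in PySpark.
# Column names and GCS paths are placeholders, not the project's real values.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("revenue_report").getOrCreate()

# Read the raw data that the ingestion step placed in the bucket.
df = spark.read.csv(
    "gs://my-project-bucket/raw/sales.csv", header=True, inferSchema=True
)

# Aggregate revenue per calendar day.
revenue_per_day = (
    df.withColumn("date", F.to_date("purchase_time"))
      .groupBy("date")
      .agg(F.sum("amount").alias("revenue"))
      .orderBy("date")
)

# Write the report back to the bucket; a Dataproc job would run this script.
revenue_per_day.write.mode("overwrite").parquet(
    "gs://my-project-bucket/reports/revenue_per_day"
)
```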
What are the challenges you faced?
- Applying what I learned over the whole week of the bootcamp to a new task was exciting and thrilling at the same time. The bonus part is very challenging and needs more time; I will attempt it again after the bootcamp.
What are the things you would do differently if you had more time?
- I would like to get into the Prefect topic, as workflow orchestration can take a lot of the manual work out of data science; a minimal sketch of what that could look like follows.
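Prefect is not used in this project yet, so the following is only a sketch, assuming Prefect 2.x, of how the two existing steps could be chained into one orchestrated flow. The task bodies are stubs standing in for the logic from the sketches above.

```python
# Minimal Prefect 2.x sketch (an assumption; Prefect is not part of this project yet)
# chaining the ingestion and reporting steps into one orchestrated flow.
from prefect import flow, task

@task(retries=2)
def ingest_data() -> None:
    # Would call the download/upload logic from data_ingestion.py.
    ...

@task
def build_revenue_report() -> None:
    # Would submit the revenue_report.py job to Dataproc.
    ...

@flow(name="data-pipeline")
def pipeline() -> None:
    ingest_data()
    build_revenue_report()

if __name__ == "__main__":
    pipeline()
```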