Welcome to the project.
Requirements for completing the project:
- A laptop with at least 4 GB of RAM and an Intel Core i3 processor or better (8 GB of RAM with an i5 is recommended).
- A local Spark setup. This part is tricky, so once everything works, keep every component and version exactly as installed. Use Python 3.10.11 rather than Python 3.6 or 3.9.
- PyCharm installed on your system.
- MySQL Workbench installed on your system.
- A GitHub account is good to have but not required.
- An AWS account is needed.
- It is recommended to execute the project in a virtual environment.
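Before continuing, it can help to confirm the requirements above on your own machine. The script below is a minimal sketch and is not part of the project. It assumes Python 3.10 (only major.minor is compared, so 3.10.11 passes), that Spark's Java runtime is found through `JAVA_HOME` or `PATH`, and that MySQL listens on the default port 3306 on `localhost`. All function names are hypothetical.

```python
import importlib.util
import os
import shutil
import socket
import sys

# Assumed target from the requirements; only major.minor is compared.
REQUIRED_PYTHON = (3, 10)


def python_version_ok(required: tuple = REQUIRED_PYTHON) -> bool:
    """True if the running interpreter's major.minor version equals `required`."""
    return sys.version_info[:2] == tuple(required)


def module_available(name: str) -> bool:
    """True if the top-level module `name` (e.g. "pyspark") is importable."""
    return importlib.util.find_spec(name) is not None


def java_available() -> bool:
    """True if JAVA_HOME is set or a `java` executable is on PATH (Spark runs on the JVM)."""
    return bool(os.environ.get("JAVA_HOME") or shutil.which("java"))


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds, e.g. to a MySQL server."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(mysql_host: str = "localhost", mysql_port: int = 3306) -> dict:
    """Run every check and return a mapping of check name -> passed."""
    return {
        "python 3.10": python_version_ok(),
        "pyspark importable": module_available("pyspark"),
        "java found": java_available(),
        "mysql reachable": port_open(mysql_host, mysql_port),
    }


if __name__ == "__main__":
    for check, passed in preflight().items():
        print(f"[{'OK' if passed else 'FAIL'}] {check}")
```

Run it with the same interpreter you plan to use for the project (ideally from inside the virtual environment), so the PySpark check reflects that environment.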
Project structure:
my_project/
├── docs/
│   └── readme.md
├── resources/
│   ├── __init__.py
│   ├── dev/
│   │   ├── config.py
│   │   └── requirement.txt
│   ├── qa/
│   │   ├── config.py
│   │   └── requirement.txt
│   ├── prod/
│   │   ├── config.py
│   │   └── requirement.txt
│   └── sql_scripts/
│       └── table_scripts.sql
└── src/
    ├── main/
    │   ├── __init__.py
    │   ├── delete/
    │   │   ├── aws_delete.py
    │   │   ├── database_delete.py
    │   │   └── local_file_delete.py
    │   ├── download/
    │   │   └── aws_file_download.py
    │   ├── move/
    │   │   └── move_files.py
    │   ├── read/
    │   │   ├── aws_read.py
    │   │   └── database_read.py
    │   ├── transformations/
    │   │   └── jobs/
    │   │       ├── customer_mart_sql_transform_write.py
    │   │       ├── dimension_tables_join.py
    │   │       ├── main.py
    │   │       └── sales_mart_sql_transform_write.py
    │   ├── upload/
    │   │   └── upload_to_s3.py
    │   ├── utility/
    │   │   ├── encrypt_decrypt.py
    │   │   ├── logging_config.py
    │   │   ├── s3_client_object.py
    │   │   ├── spark_session.py
    │   │   └── my_sql_session.py
    │   └── write/
    │       ├── database_write.py
    │       └── parquet_write.py
    └── test/
        ├── scratch_pad.py.py
        └── generate_csv_data.py
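The tree keeps one `config.py` and one `requirement.txt` per environment (`dev`, `qa`, `prod`) under `resources/`. A minimal sketch of how a script could load the right `config.py` at runtime is shown below, assuming the environment name is passed in explicitly. `load_config` and the `bucket_name` setting used in the example are hypothetical; this is not necessarily how the project itself imports its configuration.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType

# Environment folders that exist under resources/ in the project tree.
ENVIRONMENTS = ("dev", "qa", "prod")


def load_config(project_root: str | Path, env: str = "dev") -> ModuleType:
    """Load resources/<env>/config.py from the project root as a module object."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment {env!r}; expected one of {ENVIRONMENTS}")
    config_path = Path(project_root) / "resources" / env / "config.py"
    if not config_path.is_file():
        raise FileNotFoundError(config_path)
    # Build a module from the file path without needing it on sys.path.
    spec = importlib.util.spec_from_file_location(f"config_{env}", config_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

For example, `load_config(".", "qa").bucket_name` would return whatever a hypothetical `bucket_name` variable is set to in `resources/qa/config.py`. Restricting `env` to the three known folders makes a typo fail loudly instead of silently falling back to another environment.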
How to run the program in PyCharm:
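- Open PyCharm.
- Open the project locally, or clone it from GitHub.
- Open the terminal from the bottom pane.
- Go to the virtual environment and activate it. For example, if your virtual environment is named venv, run: i) cd venv ii) cd Scripts iii) activate (if activate does not work, use ./activate). These commands are for Windows; on macOS or Linux, run source venv/bin/activate instead.
- Create an IAM user on AWS, grant it full S3 access (for example, the AmazonS3FullAccess managed policy), and add its access key and secret key to the config file (config.py for your environment under resources/).
- Run main.py (src/main/transformations/jobs/main.py) using the green Run (play) button at the top right.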
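The activation step assumes a virtual environment named `venv` already exists. As a minimal sketch, the standard-library `venv` module can create one and report where its activation script lives; `ensure_venv`, `activate_script` and `in_virtualenv` are hypothetical helper names, not part of the project.

```python
from __future__ import annotations

import os
import sys
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the current interpreter runs inside a venv (its prefix differs from the base)."""
    return sys.prefix != sys.base_prefix


def activate_script(venv_dir: str | Path) -> Path:
    """Activation script path: Scripts/ on Windows, bin/ on macOS and Linux."""
    scripts_dir = "Scripts" if os.name == "nt" else "bin"
    return Path(venv_dir) / scripts_dir / "activate"


def ensure_venv(venv_dir: str | Path, with_pip: bool = True) -> Path:
    """Create the virtual environment if it is missing and return its activation script."""
    script = activate_script(venv_dir)
    if not script.exists():
        venv.create(venv_dir, with_pip=with_pip)
    return script
```

Calling `ensure_venv("venv")` from the project root creates the environment once and returns its activation script; subsequent calls just return the existing path. After activating, `in_virtualenv()` should return `True` when checked from the project's interpreter.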
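The AWS step places the IAM user's access key and secret key in the config file. If the project is pushed to GitHub, hard-coded keys risk being committed. An alternative (not what the steps above describe) is to read them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The sketch below assumes that approach; `get_aws_credentials` is a hypothetical helper.

```python
from __future__ import annotations

import os
from typing import Mapping

# Standard environment variable names recognized by AWS SDKs.
KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def get_aws_credentials(env: Mapping[str, str] | None = None) -> dict:
    """Read the IAM user's keys from environment variables instead of hard-coding them.

    Returns keyword arguments named the way boto3.client() accepts them.
    Raises RuntimeError listing any variable that is missing or empty.
    """
    source = os.environ if env is None else env
    missing = [name for name in KEY_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(f"Missing AWS credentials: {', '.join(missing)}")
    return {
        "aws_access_key_id": source["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": source["AWS_SECRET_ACCESS_KEY"],
    }
```

With boto3 installed, `boto3.client("s3", **get_aws_credentials())` would create an S3 client from these values. boto3 also reads these two variables on its own when no keys are passed, so the helper mainly turns a missing key into an early, readable error.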
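The last step runs `main.py` through PyCharm's Run button. PyCharm run configurations typically add the project root to `PYTHONPATH`, so a plain `python main.py` from the terminal may fail on imports. A minimal sketch of an equivalent terminal launch is shown below, assuming the project's imports resolve from the project root; `run_main` is a hypothetical helper and the script path is taken from the project tree.

```python
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path

# Entry point location, taken from the project tree.
MAIN_SCRIPT = Path("src") / "main" / "transformations" / "jobs" / "main.py"


def run_main(project_root: str | Path) -> subprocess.CompletedProcess:
    """Run main.py with the project root as working directory and on PYTHONPATH."""
    root = Path(project_root).resolve()
    env = os.environ.copy()
    # Assumption: imports in the project resolve from the root, as in a PyCharm run.
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in (str(root), env.get("PYTHONPATH", "")) if p
    )
    return subprocess.run(
        [sys.executable, str(root / MAIN_SCRIPT)],
        cwd=root,
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
```

Run from the activated virtual environment, `run_main(".")` uses that environment's interpreter (`sys.executable`). Checking `result.returncode` and printing `result.stdout` and `result.stderr` roughly reproduces what the Run window in PyCharm shows.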