@@ -12,7 +12,7 @@ The pipeline follows a multi-layer data architecture:
 1. **Raw Layer** - Contains raw CO2 data loaded directly from source files
 2. **Harmonized Layer** - Standardized data with consistent formatting and data quality checks
 3. **Analytics Layer** - Derived tables with aggregations, metrics, and enriched attributes for analysis
-
+4. **External Layer** - Stores all stages and implements the external access integrations and policies for outbound network calls.
 ### Key Components:

 - **Raw Data Ingestion** - Loads CO2 data from S3 into the raw layer
@@ -28,6 +28,7 @@ The pipeline follows a multi-layer data architecture:
 - **Snowpark** - Snowflake's Python API for data processing
 - **GitHub Actions** - CI/CD pipeline
 - **AWS S3** - Data storage for source files
+- **AWS Lambda** - Lambda function with API Gateway for routing network API calls
 - **pytest** - Testing framework

 ## Setup and Installation
@@ -44,28 +45,33 @@ The pipeline follows a multi-layer data architecture:
 1. Clone the repository:

 ```bash
-git clone <repository-url>
+git clone https://github.com/BigDataTeam5/Incremental_DataPipleine_using_Snowflake.git
 cd Incremental_DataPipleine_using_Snowflake
 ```

-2. Create and activate a virtual environment:
-
-```bash
-python -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-```
+2. Create and activate a virtual environment using Poetry:
+### Windows Installation
+```
+# Using PowerShell
+(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
+```
+After installing, change into the project directory and list its dependencies:
+```
+cd Incremental_DataPipleine_using_Snowflake
+poetry show
+```
+Run any Python file through Poetry:
+```
+poetry run python <your_script>.py
+```

-3. Install dependencies:

-```bash
-pip install -r requirements.txt
-```

 4. Set up RSA key pair authentication:

 ```bash
 mkdir -p ~/.snowflake/keys
-python scripts/rsa_key_pair_authentication/generate_snowflake_keys.py
+poetry run python scripts/rsa_key_pair_authentication/generate_snowflake_keys.py
 ```

 5. Configure Snowflake connection by creating `~/.snowflake/connections.toml`:
@@ -74,6 +80,7 @@ python scripts/rsa_key_pair_authentication/generate_snowflake_keys.py
 [dev]
 account = "your-account"
 user = "your-username"
+password = "your-password"
 private_key_path = "~/.snowflake/keys/rsa_key.p8"
 warehouse = "CO2_WH_DEV"
 role = "CO2_ROLE_DEV"
@@ -84,6 +91,7 @@ client_request_mfa_token = false
 [prod]
 account = "your-account"
 user = "your-username"
+password = "your-password"
 private_key_path = "~/.snowflake/keys/rsa_key.p8"
 warehouse = "CO2_WH_PROD"
 role = "CO2_ROLE_PROD"
@@ -122,7 +130,7 @@ ALTER USER YourUsername SET RSA_PUBLIC_KEY='<public-key-string>';
 2. Create required Snowflake resources:

 ```bash
-python scripts/deployment_files/snowflake_deployer.py sql --profile dev --file scripts/setup_dev.sql
+poetry run python scripts/deployment_files/snowflake_deployer.py sql --profile dev --file scripts/setup_dev.sql
 ```

 ## Project Structure
@@ -157,13 +165,13 @@ Incremental_DataPipleine_using_Snowflake/
 ### Loading Raw Data

 ```bash
-python scripts/raw\ data\ loading\ and\ stream\ creation/raw_co2_data.py
+poetry run python scripts/raw\ data\ loading\ and\ stream\ creation/raw_co2_data.py
 ```

 ### Creating Streams for Change Data Capture

 ```bash
-python scripts/raw\ data\ loading\ and\ stream\ creation/02_create_rawco2data_stream.py
+poetry run python scripts/raw\ data\ loading\ and\ stream\ creation/02_create_rawco2data_stream.py
 ```

 ### Running Tests
@@ -176,7 +184,7 @@ pytest tests/

 Deploy all components:
 ```bash
-python scripts/deployment_files/snowflake_deployer.py deploy-all --profile dev --path udfs_and_spoc --check-changes
+poetry run python scripts/deployment_files/snowflake_deployer.py deploy-all --profile dev --path udfs_and_spoc --check-changes
 ```

 Deploy a specific component:
@@ -218,7 +226,7 @@ Common issues and solutions:

 - **Authentication Errors**: Verify key permissions and format
 - **Deployment Failures**: Check function signatures and parameter counts
-- **Connection Issues**: Run `python scripts/deployment_files/check_connections_file.py` to validate your connections.toml
+- **Connection Issues**: Run `poetry run python scripts/deployment_files/check_connections_file.py` to validate your connections.toml

 ## Contributing
