This repo contains details about a Kafka real-time stock market data engineering project.
Introduction and Data Flow:
In this project, you will execute an end-to-end data engineering project on real-time stock market data using Kafka. It uses technologies such as Python, Amazon Web Services (AWS), Apache Kafka, AWS Glue, Amazon Athena, and SQL. Extensions made on top of the base project include:
- Connecting PowerBI to AWS S3 JSON File and building a sample PowerBI Report
- Connecting PowerBI to AWS Athena Database and building a sample PowerBI Report
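The pipeline above starts with a Python producer streaming stock events into a Kafka topic running on EC2. A minimal sketch of such a producer, assuming the `kafka-python` package and simulated records; the broker address, topic name, and record fields here are placeholders, not values taken from this repo:

```python
import json
import random
import time

# Assumed names -- replace with your EC2 public DNS and topic
BOOTSTRAP_SERVERS = ["<EC2-PUBLIC-DNS>:9092"]
TOPIC = "stock_market_demo"

def serialize(record: dict) -> bytes:
    """Serialize a stock record to UTF-8 JSON bytes for Kafka."""
    return json.dumps(record).encode("utf-8")

def sample_record() -> dict:
    """Simulate one row of stock market data (stand-in for a real feed)."""
    return {
        "Index": "NYA",
        "Open": round(random.uniform(100, 200), 2),
        "Close": round(random.uniform(100, 200), 2),
    }

def run_producer() -> None:
    """Stream simulated records to Kafka (requires a reachable broker)."""
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=serialize,
    )
    while True:
        producer.send(TOPIC, sample_record())
        time.sleep(1)  # throttle to roughly one event per second
```

Throttling the send loop matters in practice: an unthrottled producer reading a local CSV can fill the downstream S3 bucket (and your AWS free tier) very quickly.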
Technology Used:
- Programming Languages: Python and R
- Cloud Provider: AWS - S3, Athena, Glue Data Catalog, Glue Crawler, EC2
- Reporting Tool: PowerBI
- ODBC Driver: Simba ODBC Driver to connect PowerBI to AWS Athena
- R Packages: aws.s3 package to connect to the AWS S3 bucket
Code and Resource Links:
Kafka Producer Code: https://github.com/vinaykm5758/Kafka_Real_Time_Stock_Market_Data_Engineering_Project/blob/main/Kafka_Producer.ipynb
Kafka Consumer Code: https://github.com/vinaykm5758/Kafka_Real_Time_Stock_Market_Data_Engineering_Project/blob/main/KafkaConsumer.ipynb
Connecting to Kafka components on an AWS EC2 instance (screenshot): https://github.com/vinaykm5758/Kafka_Real_Time_Stock_Market_Data_Engineering_Project/blob/main/Kafka_1.PNG
AWS S3 Data: https://github.com/vinaykm5758/Kafka_Real_Time_Stock_Market_Data_Engineering_Project/blob/main/AWS_S3_Data.PNG
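The consumer notebook linked above reads events from the topic and lands each one in S3 as a numbered JSON object (e.g. stock_market_95.json, the object the R script below reads). A sketch of that flow, assuming `kafka-python` and `boto3`; the topic name and broker address are placeholders:

```python
import json

# Bucket name taken from the R script in this README; topic/broker are assumed
BUCKET = "kafka-stock-market-demo-viinay"
TOPIC = "stock_market_demo"

def object_key(count: int) -> str:
    """Build the per-message S3 key, e.g. stock_market_95.json."""
    return f"stock_market_{count}.json"

def run_consumer() -> None:
    """Consume from Kafka and write each message to S3.

    Requires a reachable broker and AWS credentials in the environment.
    """
    import boto3                      # pip install boto3
    from kafka import KafkaConsumer   # pip install kafka-python
    s3 = boto3.client("s3")
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=["<EC2-PUBLIC-DNS>:9092"],
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for count, message in enumerate(consumer):
        # One small JSON object per event; the Glue crawler later infers
        # the schema from these objects for querying in Athena
        s3.put_object(
            Bucket=BUCKET,
            Key=object_key(count),
            Body=json.dumps(message.value),
        )
```

One object per event keeps the demo simple; for higher volumes you would normally batch messages into larger objects before writing to S3.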
PowerBI Reports:
- Connecting PowerBI with a sample JSON file from the AWS S3 bucket (report Page 1): https://github.com/vinaykm5758/Kafka_Real_Time_Stock_Market_Data_Engineering_Project/blob/main/Realtime_AWS_Athena_PowerBI_Report.pbix
- Connecting PowerBI with the AWS Athena table "stock_market_kafka.kafka_stock_market_demo_viinay" (report Page 2): https://github.com/vinaykm5758/Kafka_Real_Time_Stock_Market_Data_Engineering_Project/blob/main/Realtime_AWS_Athena_PowerBI_Report.pbix
Data Validations:
- Validated the counts of the Index column from AWS Athena vs. the PowerBI report in real time: counts matched
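The count check above can also be scripted against Athena with `boto3`. A sketch assuming the database and table names from the report link; the results bucket is a hypothetical placeholder you must supply:

```python
def count_query(database: str, table: str) -> str:
    """Athena SQL used to cross-check row counts against the PowerBI report.

    "index" is quoted because it is a reserved word in Athena's SQL dialect.
    """
    return f'SELECT COUNT("index") AS row_count FROM "{database}"."{table}"'

def run_validation() -> None:
    """Submit the count query to Athena (requires AWS credentials)."""
    import boto3  # pip install boto3
    athena = boto3.client("athena", region_name="us-east-1")
    athena.start_query_execution(
        QueryString=count_query("stock_market_kafka",
                                "kafka_stock_market_demo_viinay"),
        ResultConfiguration={
            # Hypothetical bucket for Athena query output -- replace with yours
            "OutputLocation": "s3://<your-athena-results-bucket>/"
        },
    )
```

Comparing this result against the row count shown in the PowerBI visual confirms the report is reading the same data Athena sees.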
R Script used in PowerBI:
# Set AWS credentials for the session (keep real keys out of version control)
Sys.setenv(
  "AWS_ACCESS_KEY_ID"     = 'XXX',
  "AWS_SECRET_ACCESS_KEY" = 'XX',
  "AWS_DEFAULT_REGION"    = "us-east-1"
)

# Read one object from the bucket; the object is JSON, so parse it with
# jsonlite::fromJSON rather than read.csv, and pass the bare bucket name
test_data <- aws.s3::s3read_using(
  FUN    = jsonlite::fromJSON,
  object = 'stock_market_95.json',
  bucket = 'kafka-stock-market-demo-viinay'
)