The purpose of this project is to learn how to upload real-time data to S3 using Kafka (running in a Docker container), with local producers and consumers written in Python.
To set up this project:
- Docker must be installed
- Python must be installed and available in your chosen IDE (VS Code recommended)
- An AlphaVantage API key must be acquired and added to the project:
- Enter your details at "https://www.alphavantage.co/support/#api-key" to request a free API key
- Create ".env" file in the project root folder
- Add "ALPHA_VANTAGE_API_KEY = " to the .env file
- Create a dedicated S3 bucket and IAM user:
- Create an S3 bucket
- Create an IAM user with an appropriate name, e.g. "kafka-s3-uploader"
- Add permissions for the user:
- Go to the user page
- Go to "Add permissions"
- Choose the "Create inline policy" option
- Change the policy editor view to "JSON"
- Copy the contents from "amazon_s3_bucker_user_policy.json" and paste them into the inline policy editor (a sketch of what this policy typically grants appears after this list)
- Replace "your bucket name" with the name of your S3 bucket
- Hit "Next" and "Create policy" to finalise the new permissions for the user
- Acquire and add the user access key & secret to the .env file:
- Go to the user page
- Hit "Create access key" in the Summary section
- Choose "Local Code"
- Give the key an appropriate description and hit "Create access key"
- Download the CSV file to store the access key and secret
- Add " access key: " & " secret: " to the .env file in the project root folder
- Create a virtual environment to run the Python scripts: "python -m venv venv" (Windows) or "python3 -m venv venv" (macOS/Linux)
- Activate the virtual environment: "venv\Scripts\activate" (Windows) or "source venv/bin/activate" (macOS/Linux)
- Install required dependencies: "pip install -r requirements.txt"
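The exact permissions used by this project are defined in "amazon_s3_bucker_user_policy.json", so copy that file rather than writing the policy by hand. As a rough idea of what such an inline policy typically grants (this sketch is an assumption, not the repo's actual file, and the bucket name is a placeholder), it looks something like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```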
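Before moving on, it can help to confirm the .env values are actually picked up. The snippet below is a hypothetical sanity check, not a script in the repo; it assumes "python-dotenv" and "boto3" are among the installed dependencies and that "your-bucket-name" is replaced with your bucket:

```python
# sanity_check.py -- hypothetical helper, not part of the repo.
# Loads the .env credentials and verifies the AlphaVantage key and S3 access.
import os

import boto3                    # assumed to be installed via requirements.txt
from dotenv import load_dotenv  # assumed to be installed via requirements.txt

load_dotenv()  # reads .env from the project root

api_key = os.environ["ALPHA_VANTAGE_API_KEY"]
print(f"AlphaVantage key loaded ({len(api_key)} characters)")

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
s3 = boto3.client("s3", region_name=os.getenv("AWS_DEFAULT_REGION", "us-east-1"))
s3.head_bucket(Bucket="your-bucket-name")  # raises if the bucket is unreachable
print("S3 bucket reachable with the configured IAM credentials")
```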
To run this project:
- run "docker-compose up -d" to start Kafka in docker
- Activate the virtual environment: "venv\scripts\activate" or "source venv/bin/activate" (macOS)
- Run the Kafka producer: "python kafka-producer.py"
- Run the consumer: "python kafka-consumer.py" (illustrative sketches of both scripts follow below)
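The real producer logic lives in "kafka-producer.py". Purely as an illustration of the flow (fetch a quote from AlphaVantage and publish it as JSON to a Kafka topic), a minimal sketch could look like the following; the broker address, topic name "stock-quotes", symbol, and use of the "kafka-python" and "requests" libraries are assumptions, not necessarily what the repo does:

```python
# Minimal producer sketch -- illustrative only, see kafka-producer.py for the real logic.
import json
import os
import time

import requests
from dotenv import load_dotenv
from kafka import KafkaProducer  # kafka-python, assumed to be a dependency

load_dotenv()
API_KEY = os.environ["ALPHA_VANTAGE_API_KEY"]

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # broker exposed by docker-compose (assumed port)
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # GLOBAL_QUOTE returns the latest price data for a single symbol
    resp = requests.get(
        "https://www.alphavantage.co/query",
        params={"function": "GLOBAL_QUOTE", "symbol": "IBM", "apikey": API_KEY},
        timeout=10,
    )
    producer.send("stock-quotes", resp.json())
    producer.flush()
    time.sleep(60)  # stay well within the free-tier rate limit
```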
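On the other side, "kafka-consumer.py" takes messages off the topic and uploads them to S3. A matching sketch, under the same assumptions (kafka-python, boto3, and placeholder topic, bucket, and object-key layout):

```python
# Minimal consumer sketch -- illustrative only, see kafka-consumer.py for the real logic.
import json
import os
import time

import boto3
from dotenv import load_dotenv
from kafka import KafkaConsumer  # kafka-python, assumed to be a dependency

load_dotenv()  # supplies AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY for boto3

s3 = boto3.client("s3", region_name=os.getenv("AWS_DEFAULT_REGION", "us-east-1"))

consumer = KafkaConsumer(
    "stock-quotes",                      # same illustrative topic as the producer sketch
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    # One S3 object per Kafka message, keyed by arrival time in milliseconds
    key = f"quotes/{int(time.time() * 1000)}.json"
    s3.put_object(
        Bucket="your-bucket-name",  # replace with your bucket
        Key=key,
        Body=json.dumps(message.value).encode("utf-8"),
    )
    print(f"Uploaded {key}")
```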