rushawnwhite29/apache-kakfa-capstone

Apache Kafka Capstone Project

This project demonstrates the use of Apache Kafka for publishing, consuming, and processing GitHub account and commit data. It consists of three Spring Boot applications:

  1. 1-github-metrics-streams: Processes GitHub metrics using Kafka Streams.
  2. 2-github-commits-publisher: Consumes GitHub account messages, retrieves commit data, and publishes commit messages to Kafka topics.
  3. 3-github-accounts-publisher: Retrieves GitHub account data and publishes it to a Kafka topic.

Prerequisites

Before setting up and running the applications, ensure the following are installed on your system:

  • Java 17 or higher
  • Maven (for building the project)
  • Docker and Docker Compose (for running Kafka)
  • A valid GitHub Personal Access Token (PAT) with the required permissions to access the GitHub API.

Setup Instructions

1. Clone the Repository

git clone <repository-url>
cd <repository-directory>

2. Configure Environment Variables

Create a .env file in the root directory and add the following variables:

GITHUB_PAT=<your-github-personal-access-token>
KAFKA_BROKER=localhost:9092

Replace <your-github-personal-access-token> with your GitHub PAT.
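To confirm the file is usable before starting anything else, a small check like the one below can help. This is a minimal sketch, not part of the repository: `load_env` and `require_vars` are hypothetical helper names. It sources the file as shell code, so the `<...>` placeholder must already be replaced with a real token.

```shell
# Hypothetical helper: export every KEY=VALUE line of a .env-style file
# into the current shell. The file is sourced, so it must be valid shell.
load_env() {
    env_file="${1:-.env}"
    # A bare file name would be looked up on PATH by `.`, so anchor it to the cwd.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
    set -a              # auto-export variables assigned while sourcing
    . "$env_file"
    set +a
}

# Hypothetical helper: fail if any named variable is unset or empty.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            return 1
        fi
    done
}
```

Typical use from the repository root would be `load_env .env && require_vars GITHUB_PAT KAFKA_BROKER`.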

3. Start Kafka Using Docker Compose

Run the following command to start Kafka and Zookeeper:

docker-compose up -d

Ensure Kafka is running on localhost:9092.
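One way to check this from a script is to poll the port until something answers. The sketch below assumes `nc` (netcat) is installed; `wait_for_kafka` is a hypothetical helper name, and its defaults simply mirror the address used in this README. An open port only shows that something is listening, not that the broker is healthy, so `docker-compose ps` and `docker-compose logs` are still worth a look if later steps fail.

```shell
# Hypothetical helper: poll host:port until a TCP connection succeeds.
# Assumption: `nc` supports -z (scan only) and -w (timeout in seconds).
wait_for_kafka() {
    host="${1:-localhost}"
    port="${2:-9092}"
    attempts="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if nc -z -w 2 "$host" "$port" 2>/dev/null; then
            echo "port is reachable at $host:$port"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "nothing is listening on $host:$port after $attempts attempts" >&2
    return 1
}
```

Calling `wait_for_kafka` with no arguments checks localhost:9092 up to ten times, one second apart.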

4. Create Kafka Topics

Create the required Kafka topics by running the provided script with the variables from .env passed into its environment:

env $(cat .env | xargs) ./create-topics.sh
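The `env $(cat .env | xargs)` prefix can look opaque, so here is a small demonstration of what it does, using a throwaway file instead of the real .env. `xargs` flattens the file into a single line of KEY=VALUE words, and `env` applies them to the child process only, leaving the current shell untouched. The temporary directory and the `broker_seen_by_child` variable are illustrative only.

```shell
# Build a throwaway .env so the real one is not touched.
demo_dir=$(mktemp -d)
printf 'KAFKA_BROKER=localhost:9092\n' > "$demo_dir/.env"

# env receives KAFKA_BROKER=localhost:9092 as an argument and sets it
# only for the child command (here a tiny sh script that echoes it back).
broker_seen_by_child=$(env $(cat "$demo_dir/.env" | xargs) sh -c 'echo "$KAFKA_BROKER"')

echo "$broker_seen_by_child"   # prints localhost:9092
rm -rf "$demo_dir"
```

Because the words are split on whitespace, this pattern is likely to break if the .env file contains comment lines or values with spaces.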

5. Build the Applications

Navigate to the root directory and build all applications using Maven:

mvn clean install

Running the Applications

1. Start the GitHub Metrics Streams

This application processes GitHub metrics using Kafka Streams. Run it from the repository root in its own terminal:

cd 1-github-metrics-streams
mvn spring-boot:run

2. Start the GitHub Commits Publisher

This application consumes GitHub account messages, retrieves commit data, and publishes commit messages to Kafka topics. Run it from the repository root in a second terminal:

cd 2-github-commits-publisher
mvn spring-boot:run

3. Start the GitHub Accounts Publisher

This application retrieves GitHub account data and publishes it to a Kafka topic. Run it from the repository root in a third terminal:

cd 3-github-accounts-publisher
mvn spring-boot:run

4. Verify the Applications are Running

You can check the logs of each application to ensure they are running correctly. You can also use a Kafka consumer to verify that messages are being published to the Kafka topics.
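For the consumer check, Kafka's own console consumer is usually enough. The sketch below wraps it in a hypothetical `consume_topic` helper. It assumes the Kafka CLI tools are on your PATH as `kafka-console-consumer.sh` (some distributions drop the `.sh` suffix, and with Docker you may need to run it inside the Kafka container instead). The topic name is whatever create-topics.sh created; none is assumed here.

```shell
# Hypothetical helper: read a topic from the beginning with the console consumer.
# Assumption: kafka-console-consumer.sh is on PATH; KAFKA_BROKER falls back
# to the address used in this README.
consume_topic() {
    topic="${1:-}"
    broker="${KAFKA_BROKER:-localhost:9092}"
    if [ -z "$topic" ]; then
        echo "usage: consume_topic <topic-name>" >&2
        return 2
    fi
    kafka-console-consumer.sh \
        --bootstrap-server "$broker" \
        --topic "$topic" \
        --from-beginning
}
```

Run `consume_topic <topic-name>` and stop it with Ctrl+C once messages appear.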

Verifying the Workflow

  1. GitHub Accounts Publisher: Check the logs to ensure GitHub account data is being published to the Kafka topic.
  2. GitHub Commits Publisher: Verify that commit data is being consumed and published to the appropriate Kafka topics.
  3. GitHub Metrics Streams: Confirm that metrics are being processed and output as expected.

Stopping the Applications

To stop the applications, press Ctrl+C in each terminal where the applications are running.

To stop Kafka and Zookeeper, run:

docker-compose down

Troubleshooting

  • Ensure all environment variables are correctly set in the .env file.
  • Verify that Kafka is running and accessible on localhost:9092.
  • Check the logs of each application for detailed error messages if something goes wrong.
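When the console output scrolls by too quickly, it can help to save it and filter for problems afterwards. The sketch below assumes you captured a log first, for example with `mvn spring-boot:run 2>&1 | tee app.log`. The `show_errors` name and the `app.log` file name are hypothetical, and the pattern is only a rough filter for lines containing ERROR or Exception.

```shell
# Hypothetical helper: print numbered lines that look like errors in a saved log.
show_errors() {
    log_file="${1:-app.log}"
    [ -f "$log_file" ] || { echo "no such log file: $log_file" >&2; return 2; }
    # -n prefixes each match with its line number; -E enables alternation.
    grep -nE 'ERROR|Exception' "$log_file"
}
```

`show_errors app.log` exits with a non-zero status when no matching line is found, which is how `grep` reports an empty result.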
