This project demonstrates the use of Apache Kafka for publishing, consuming, and processing GitHub account and commit data. It consists of three Spring Boot applications:
- 1-github-metrics-streams: Processes GitHub metrics using Kafka Streams.
- 2-github-commits-publisher: Consumes GitHub account messages, retrieves commit data, and publishes commit messages to Kafka topics.
- 3-github-accounts-publisher: Retrieves GitHub account data and publishes it to a Kafka topic.
Before setting up and running the applications, ensure the following are installed on your system:
- Java 17 or higher
- Maven (for building the project)
- Docker and Docker Compose (for running Kafka)
- A valid GitHub Personal Access Token (PAT) with the required permissions to access the GitHub API.
git clone <repository-url>
cd <repository-directory>
Create a .env file in the root directory and add the following variables:
GITHUB_PAT=<your-github-personal-access-token>
KAFKA_BROKER=localhost:9092
Replace <your-github-personal-access-token> with your GitHub PAT.
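Before starting anything, it can help to confirm that both variables are actually defined in the file. The following is a minimal sketch, assuming a POSIX-style shell and the two variable names listed above; the function name check_env_file is hypothetical and not part of this repository.

```shell
# check_env_file: hypothetical helper, not part of this repository.
# Prints one line per required variable that is missing (or empty)
# in the given env file and returns non-zero if any is missing.
check_env_file() {
  env_file="${1:-.env}"
  missing=0
  for name in GITHUB_PAT KAFKA_BROKER; do
    # Match lines of the form NAME=value with a non-empty value.
    if ! grep -Eq "^${name}=.+" "$env_file" 2>/dev/null; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_env_file .env && echo "env file looks complete"` prints the confirmation only when both entries are present. It does not check that the token itself is valid.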
Run the following command to start Kafka and ZooKeeper:
docker-compose up -d
Ensure Kafka is running on localhost:9092.
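One quick way to check that something is listening on the broker port is to attempt a TCP connection. This is a rough sketch, assuming bash is installed (it relies on bash's /dev/tcp redirection); the function name kafka_port_open is hypothetical. An open port only shows that a listener exists, not that the broker is fully ready.

```shell
# kafka_port_open: hypothetical helper, not part of this repository.
# Succeeds if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device, so bash must be available.
kafka_port_open() {
  host="${1:-localhost}"
  port="${2:-9092}"
  bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: `kafka_port_open localhost 9092 && echo "port is open"`. If the check fails shortly after starting the containers, waiting a few seconds and retrying is usually enough.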
You can create the required Kafka topics using the following command:
env $(cat .env | xargs) ./create-topics.sh
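The command above passes every KEY=value line from .env to the script as an environment variable without exporting anything into your current shell. The same pattern can be sketched as a small reusable function; the name run_with_env_file is hypothetical, and the approach assumes the values contain no spaces or quotes, because the file contents are split on whitespace.

```shell
# run_with_env_file: hypothetical helper, not part of this repository.
# Runs a command with the KEY=value pairs from an env file set only
# for that command. Assumes values contain no whitespace or quotes.
run_with_env_file() {
  env_file="$1"
  shift
  # xargs flattens the file into one line of KEY=value words,
  # which env then applies to the command that follows.
  env $(cat "$env_file" | xargs) "$@"
}
```

Usage: `run_with_env_file .env ./create-topics.sh` behaves like the command shown above.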
Navigate to the root directory and build all applications using Maven:
mvn clean install
This application processes GitHub metrics using Kafka Streams.
cd 1-github-metrics-streams
mvn spring-boot:run
This application consumes GitHub account messages, retrieves commit data, and publishes commit messages to Kafka topics.
cd ../2-github-commits-publisher
mvn spring-boot:run
This application retrieves GitHub account data and publishes it to a Kafka topic.
cd ../3-github-accounts-publisher
mvn spring-boot:run
You can check the logs of each application to ensure they are running correctly. You can also use a Kafka consumer to verify that messages are being published to the Kafka topics.
- GitHub Accounts Publisher: Check the logs to ensure GitHub account data is being published to the Kafka topic.
- GitHub Commits Publisher: Verify that commit data is being consumed and published to the appropriate Kafka topics.
- GitHub Metrics Streams: Confirm that metrics are being processed and output as expected.
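To read a few messages from a topic, the console consumer that ships with Kafka can be used. The sketch below wraps it in a function; consume_sample and the KAFKA_CONSUMER_CMD override are hypothetical names, and the topic name must be taken from your own setup because the topic names are not listed here. The script is called kafka-console-consumer.sh in the Apache Kafka distribution and kafka-console-consumer in some Docker images, which is why the command name is overridable.

```shell
# consume_sample: hypothetical helper, not part of this repository.
# Reads a limited number of messages from the start of a topic.
# KAFKA_CONSUMER_CMD may be set when the console consumer has a
# different name or path in your installation.
consume_sample() {
  topic="$1"
  count="${2:-5}"
  "${KAFKA_CONSUMER_CMD:-kafka-console-consumer.sh}" \
    --bootstrap-server "${KAFKA_BROKER:-localhost:9092}" \
    --topic "$topic" \
    --from-beginning \
    --max-messages "$count"
}
```

Usage: `consume_sample <topic-name> 10`, replacing <topic-name> with one of the topics created earlier. If the tool is only available inside the Kafka container, run the same command through docker exec instead.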
To stop the applications, press Ctrl+C in each terminal where the applications are running.
To stop Kafka and ZooKeeper, run:
docker-compose down
- Ensure all environment variables are correctly set in the .env file.
- Verify that Kafka is running and accessible on localhost:9092.
- Check the logs of each application for detailed error messages if something goes wrong.