Setting up a highly available Kubernetes cluster, a big data streaming service, and a CI/CD pipeline. This project demonstrates security configuration, monitoring, and logging in a multi-node Kubernetes environment; real-time data pipeline integration with Kafka, NiFi, Spark, and Airflow; and a fully functional CI/CD pipeline built with Jenkins.
This repository contains the implementation for a comprehensive DevOps challenge, focusing on setting up:
- A highly available Kubernetes cluster.
- A big data streaming service using tools like Kafka, Spark, NiFi, and Airflow.
- A CI/CD pipeline using Jenkins.

Each section provides step-by-step instructions, explanations, and configurations for the challenge.
## Shortcuts

- Kubernetes Cluster Setup
- Big Data Streaming Service
- CI/CD Pipeline Setup
- Prerequisites
- How to Run
- Conclusion
## Kubernetes Cluster Setup

This section demonstrates how to create a highly available, performant, and secure Kubernetes cluster with the following features:

- Two master nodes with a Virtual IP (VIP) managed by keepalived.
- Two worker nodes for deploying applications.
- Monitoring and logging with Prometheus, Grafana, and Fluentd.

Steps:
- Environment Setup: Provision two master and two worker nodes as virtual machines, bare-metal servers, or cloud instances.
- Kubernetes Installation: Install Docker, kubeadm, kubectl, and kubelet on all nodes (see the install sketch below).
- Cluster Initialization: Initialize the first master node, then join the second master and both worker nodes.
- Keepalived Configuration: Configure a VIP for high availability between the two master nodes (a sample keepalived.conf follows).
- Monitoring and Logging: Install Prometheus and Grafana for monitoring, and Fluentd for centralized logging (Helm commands below).
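A minimal sketch of the installation and initialization steps, assuming Ubuntu nodes with the Kubernetes apt repository already configured; the VIP 192.168.1.100 and the pod CIDR are placeholder values:

```bash
# Run on every node: container runtime plus Kubernetes tools.
# Assumes the Kubernetes apt repository has already been added.
sudo apt-get update
sudo apt-get install -y docker.io kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# On the first master only: initialize against the keepalived VIP
# so either master can serve the API endpoint.
sudo kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr "10.244.0.0/16"

# kubeadm prints two 'kubeadm join' commands: run the
# --control-plane variant on the second master and the
# plain variant on both workers.
```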
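A sample keepalived configuration for the VIP, sketched under the assumption that eth0 carries cluster traffic; the VIP, password, and interface name are placeholders to adapt:

```bash
# Run on both masters; flip state/priority on the second one.
sudo apt-get install -y keepalived

sudo tee /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance VI_1 {
    state MASTER          # BACKUP on the second master
    interface eth0        # NIC that carries cluster traffic
    virtual_router_id 51
    priority 101          # 100 on the second master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-vip # placeholder shared secret
    }
    virtual_ipaddress {
        192.168.1.100     # the control-plane VIP
    }
}
EOF

sudo systemctl enable --now keepalived
```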
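One way to install the monitoring and logging stack is with the community Helm charts; the repository's own manifests may pin different charts or versions:

```bash
# Community chart repositories for Prometheus/Grafana and Fluentd
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

# kube-prometheus-stack bundles Prometheus and Grafana
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Fluentd runs as a DaemonSet for centralized log collection
helm install fluentd fluent/fluentd \
  --namespace logging --create-namespace
```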
## Big Data Streaming Service

This section walks through setting up a big data streaming service to manage both SQL-based transactional data and NoSQL-based analytical data using:

- Apache Kafka for message brokering.
- Apache NiFi for data flow management.
- Apache Spark for real-time data processing.
- Apache Airflow for workflow orchestration.

Steps:
- Kafka Installation: Set up Kafka for streaming data (quickstart sketch below).
- NiFi Installation: Use NiFi for data routing between the databases (startup commands below).
- Spark Integration: Process streaming data from Kafka with Spark (a sample job follows).
- Airflow Setup: Manage workflows with Airflow for automated pipeline orchestration (install commands below).
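A single-broker Kafka quickstart in KRaft mode, assuming Kafka 3.x; the version and the `transactions` topic name are placeholders:

```bash
# Unpack a Kafka release (version is a placeholder)
tar -xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Format storage and start the broker in KRaft mode (no ZooKeeper)
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties
bin/kafka-server-start.sh -daemon config/kraft/server.properties

# Create a topic for the streaming pipeline
bin/kafka-topics.sh --create --topic transactions \
  --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
```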
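A minimal NiFi startup sketch; the version is a placeholder, and the generated admin credentials are read from the application log (NiFi 1.14+ generates them on first start):

```bash
# Unpack and start NiFi (version is a placeholder)
tar -xzf nifi-1.25.0-bin.tar.gz && cd nifi-1.25.0
./bin/nifi.sh start    # UI defaults to https://localhost:8443/nifi

# Recent NiFi releases log a generated admin username and password
grep -i "generated" logs/nifi-app.log
```

The database-routing flow itself (e.g., pulling from the transactional database and publishing to Kafka) is built interactively in the NiFi UI.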
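A sketch of a Spark Structured Streaming job consuming the Kafka topic above; `stream_job.py` is a hypothetical script, and the connector package must match your Spark and Scala versions:

```bash
# Write a minimal PySpark job that echoes Kafka messages to the console
cat > stream_job.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read the raw event stream from Kafka
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "transactions")
          .load())

# Decode message values and print them as a smoke test
query = (events.selectExpr("CAST(value AS STRING) AS message")
         .writeStream.format("console")
         .start())
query.awaitTermination()
EOF

# The Kafka connector is pulled in via --packages at submit time
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  stream_job.py
```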
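A development-mode Airflow install following the official constraints-file pattern; the Airflow and Python versions shown are assumptions:

```bash
# Pin dependencies with the matching constraints file (per Airflow docs)
pip install "apache-airflow==2.9.1" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.1/constraints-3.10.txt"

# Standalone mode initializes the DB and starts the webserver + scheduler
airflow standalone
```

Production deployments would run the webserver, scheduler, and a proper metadata database separately; standalone mode is only for trying out DAGs.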
## CI/CD Pipeline Setup

This section outlines the setup of a CI/CD pipeline to automate application deployment:

- Jenkins is used to manage the pipeline.
- The pipeline builds, tests, and deploys a sample application to the Kubernetes cluster.

Steps:
- Jenkins Installation: Set up Jenkins on a dedicated server (see the install sketch below).
- Pipeline Configuration: Configure a Jenkins pipeline that triggers builds on Git commits and deploys the application using kubectl (a sample Jenkinsfile follows).
- Deployment: The CI/CD pipeline deploys the application to the Kubernetes cluster.
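A sketch of a Jenkins LTS install on Ubuntu using the upstream apt repository; verify the key URL against the current Jenkins installation docs before running:

```bash
# Jenkins needs a recent JRE
sudo apt-get install -y openjdk-17-jre

# Add the Jenkins apt repository and its signing key
curl -fsSL https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key |
  sudo tee /usr/share/keyrings/jenkins-keyring.asc > /dev/null
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" \
  "https://pkg.jenkins.io/debian-stable binary/" |
  sudo tee /etc/apt/sources.list.d/jenkins.list > /dev/null

sudo apt-get update && sudo apt-get install -y jenkins
sudo systemctl enable --now jenkins   # UI on http://<server>:8080
```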
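A minimal declarative Jenkinsfile with build, test, and deploy stages; the image name, manifest path, and kubeconfig credentials ID are hypothetical, and the actual pipeline in the ci-cd/ folder may differ:

```bash
# Write a sample Jenkinsfile at the repository root
cat > Jenkinsfile <<'EOF'
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'docker build -t sample-app:${BUILD_NUMBER} .'
            }
        }
        stage('Test') {
            steps {
                sh 'docker run --rm sample-app:${BUILD_NUMBER} ./run-tests.sh'
            }
        }
        stage('Deploy') {
            steps {
                // Assumes a kubeconfig file credential configured in Jenkins
                withCredentials([file(credentialsId: 'kubeconfig', variable: 'KUBECONFIG')]) {
                    sh 'kubectl apply -f k8s/deployment.yaml'
                }
            }
        }
    }
}
EOF
```

The Git-commit trigger itself is configured on the Jenkins job (a webhook or SCM polling), not in the Jenkinsfile.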
## Prerequisites

Before running this project, ensure you have the following installed:

- Docker on all machines.
- Kubernetes tools (kubectl, kubeadm, kubelet) on all nodes.
- Kafka, NiFi, Spark, and Airflow on their respective machines.
- Jenkins on a separate machine for the CI/CD pipeline.

Ensure that all nodes are networked properly and can communicate with each other (a quick check follows below).

Required tools:

- Linux-based virtual machines (VMs) or cloud instances.
- Helm (for Kubernetes package management).
- Git (for version control).
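A quick sanity check that the required tools are present and the nodes can reach one another; the hostnames are placeholders for your actual node names:

```bash
# Confirm the core tooling is installed
docker --version
kubectl version --client
kubeadm version
helm version
git --version

# Confirm basic connectivity between nodes
for host in master-1 master-2 worker-1 worker-2; do
  ping -c 1 "$host" > /dev/null && echo "$host reachable"
done
```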
## How to Run

Kubernetes cluster:

- Clone the repository.
- Follow the instructions in the kubernetes/ folder to set up the Kubernetes cluster, install keepalived, and configure the VIP, Prometheus, Grafana, and Fluentd.

Big data streaming service:

- Navigate to the big-data-streaming/ folder for instructions on installing Kafka, NiFi, Spark, and Airflow.
- Follow the provided documentation to configure each component and test the pipeline.

CI/CD pipeline:

- Set up Jenkins by following the steps in the ci-cd/ folder.
- Run the pipeline to build and deploy a sample application to the Kubernetes cluster.
## Conclusion

This repository demonstrates the complete implementation of:

- A highly available Kubernetes cluster with monitoring and logging >>> Kubernetes-Setup
- A big data streaming service that integrates transactional and analytical databases >>> Big Data Streaming Branch
- A CI/CD pipeline that automates the deployment of applications to Kubernetes >>> CI-CD-Pipeline