Setting up a highly available Kubernetes cluster, a big data streaming service, and a CI/CD pipeline. This project demonstrates security configuration, monitoring, and logging in a multi-node Kubernetes environment; real-time data pipeline integration with Kafka, NiFi, Spark, and Airflow; and a fully functional CI/CD pipeline built with Jenkins.
This repository contains the implementation for a comprehensive DevOps challenge, focusing on setting up:
- A highly available Kubernetes cluster.
- A big data streaming service using tools like Kafka, Spark, NiFi, and Airflow.
- A CI/CD pipeline using Jenkins.

Each section provides step-by-step instructions, explanations, and configurations for the challenge.
## Shortcuts

- Kubernetes Cluster Setup
- Big Data Streaming Service
- CI/CD Pipeline Setup
- Prerequisites
- How to Run
- Conclusion
## Kubernetes Cluster Setup

This section demonstrates how to create a highly available, performant, and secure Kubernetes cluster with the following features:

- Two master nodes with a Virtual IP (VIP) managed by keepalived.
- Two worker nodes for deploying applications.
- Monitoring and logging with Prometheus, Grafana, and Fluentd.

Steps:
- Environment Setup: Provision two master and two worker nodes as virtual machines, bare-metal servers, or cloud instances.
- Kubernetes Installation: Install Docker, kubeadm, kubectl, and kubelet on all nodes (see the install sketch below).
- Cluster Initialization: Initialize the first master node, then join the second master and both worker nodes.
- Keepalived Configuration: Configure a VIP for high availability between the two master nodes (a sample keepalived.conf follows).
- Monitoring and Logging: Install Prometheus and Grafana for monitoring, and Fluentd for centralized logging (Helm commands below).
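A minimal sketch of the installation and initialization steps, assuming Ubuntu nodes with the Kubernetes apt repository already configured; the VIP 192.168.1.100 and the pod CIDR are placeholder values:

```bash
# Run on every node: container runtime plus Kubernetes tools.
# Assumes the Kubernetes apt repository has already been added.
sudo apt-get update
sudo apt-get install -y docker.io kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# On the first master only: initialize against the keepalived VIP
# so either master can serve the API endpoint.
sudo kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr "10.244.0.0/16"

# kubeadm prints two 'kubeadm join' commands: run the
# --control-plane variant on the second master and the
# plain variant on both workers.
```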
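A sample keepalived configuration for the VIP, sketched under the assumption that eth0 carries cluster traffic; the VIP, password, and interface name are placeholders to adapt:

```bash
# Run on both masters; flip state/priority on the second one.
sudo apt-get install -y keepalived

sudo tee /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance VI_1 {
    state MASTER          # BACKUP on the second master
    interface eth0        # NIC that carries cluster traffic
    virtual_router_id 51
    priority 101          # 100 on the second master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-vip # placeholder shared secret
    }
    virtual_ipaddress {
        192.168.1.100     # the control-plane VIP
    }
}
EOF

sudo systemctl enable --now keepalived
```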
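One way to install the monitoring and logging stack is with the community Helm charts; the repository's own manifests may pin different charts or versions:

```bash
# Community chart repositories for Prometheus/Grafana and Fluentd
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

# kube-prometheus-stack bundles Prometheus and Grafana
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Fluentd runs as a DaemonSet for centralized log collection
helm install fluentd fluent/fluentd \
  --namespace logging --create-namespace
```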
## Big Data Streaming Service

This section walks through setting up a big data streaming service to manage both SQL-based transactional data and NoSQL-based analytical data using:

- Apache Kafka for message brokering.
- Apache NiFi for data flow management.
- Apache Spark for real-time data processing.
- Apache Airflow for workflow orchestration.

Steps:
- Kafka Installation: Set up Kafka for streaming data (quickstart sketch below).
- NiFi Installation: Use NiFi for data routing between the databases (startup commands below).
- Spark Integration: Process streaming data from Kafka with Spark (a sample job follows).
- Airflow Setup: Manage workflows with Airflow for automated pipeline orchestration (install commands below).
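A single-broker Kafka quickstart in KRaft mode, assuming Kafka 3.x; the version and the `transactions` topic name are placeholders:

```bash
# Unpack a Kafka release (version is a placeholder)
tar -xzf kafka_2.13-3.7.0.tgz && cd kafka_2.13-3.7.0

# Format storage and start the broker in KRaft mode (no ZooKeeper)
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties
bin/kafka-server-start.sh -daemon config/kraft/server.properties

# Create a topic for the streaming pipeline
bin/kafka-topics.sh --create --topic transactions \
  --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
```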
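A minimal NiFi startup sketch; the version is a placeholder, and the generated admin credentials are read from the application log (NiFi 1.14+ generates them on first start):

```bash
# Unpack and start NiFi (version is a placeholder)
tar -xzf nifi-1.25.0-bin.tar.gz && cd nifi-1.25.0
./bin/nifi.sh start    # UI defaults to https://localhost:8443/nifi

# Recent NiFi releases log a generated admin username and password
grep -i "generated" logs/nifi-app.log
```

The database-routing flow itself (e.g., pulling from the transactional database and publishing to Kafka) is built interactively in the NiFi UI.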
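A sketch of a Spark Structured Streaming job consuming the Kafka topic above; `stream_job.py` is a hypothetical script, and the connector package must match your Spark and Scala versions:

```bash
# Write a minimal PySpark job that echoes Kafka messages to the console
cat > stream_job.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read the raw event stream from Kafka
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "transactions")
          .load())

# Decode message values and print them as a smoke test
query = (events.selectExpr("CAST(value AS STRING) AS message")
         .writeStream.format("console")
         .start())
query.awaitTermination()
EOF

# The Kafka connector is pulled in via --packages at submit time
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  stream_job.py
```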
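A development-mode Airflow install following the official constraints-file pattern; the Airflow and Python versions shown are assumptions:

```bash
# Pin dependencies with the matching constraints file (per Airflow docs)
pip install "apache-airflow==2.9.1" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.1/constraints-3.10.txt"

# Standalone mode initializes the DB and starts the webserver + scheduler
airflow standalone
```

Production deployments would run the webserver, scheduler, and a proper metadata database separately; standalone mode is only for trying out DAGs.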
## CI/CD Pipeline Setup

This section outlines the setup of a CI/CD pipeline to automate application deployment:

- Jenkins is used to manage the pipeline.
- The pipeline builds, tests, and deploys a sample application to the Kubernetes cluster.

Steps:
- Jenkins Installation: Set up Jenkins on a dedicated server (see the install sketch below).
- Pipeline Configuration: Configure a Jenkins pipeline that triggers builds on Git commits and deploys the application using kubectl (a sample Jenkinsfile follows).
- Deployment: The CI/CD pipeline deploys the application to the Kubernetes cluster.
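A sketch of a Jenkins LTS install on Ubuntu using the upstream apt repository; verify the key URL against the current Jenkins installation docs before running:

```bash
# Jenkins needs a recent JRE
sudo apt-get install -y openjdk-17-jre

# Add the Jenkins apt repository and its signing key
curl -fsSL https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key |
  sudo tee /usr/share/keyrings/jenkins-keyring.asc > /dev/null
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" \
  "https://pkg.jenkins.io/debian-stable binary/" |
  sudo tee /etc/apt/sources.list.d/jenkins.list > /dev/null

sudo apt-get update && sudo apt-get install -y jenkins
sudo systemctl enable --now jenkins   # UI on http://<server>:8080
```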
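A minimal declarative Jenkinsfile with build, test, and deploy stages; the image name, manifest path, and kubeconfig credentials ID are hypothetical, and the actual pipeline in the ci-cd/ folder may differ:

```bash
# Write a sample Jenkinsfile at the repository root
cat > Jenkinsfile <<'EOF'
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'docker build -t sample-app:${BUILD_NUMBER} .'
            }
        }
        stage('Test') {
            steps {
                sh 'docker run --rm sample-app:${BUILD_NUMBER} ./run-tests.sh'
            }
        }
        stage('Deploy') {
            steps {
                // Assumes a kubeconfig file credential configured in Jenkins
                withCredentials([file(credentialsId: 'kubeconfig', variable: 'KUBECONFIG')]) {
                    sh 'kubectl apply -f k8s/deployment.yaml'
                }
            }
        }
    }
}
EOF
```

The Git-commit trigger itself is configured on the Jenkins job (a webhook or SCM polling), not in the Jenkinsfile.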
## Prerequisites

Before running this project, ensure you have the following installed:

- Docker on all machines.
- Kubernetes tools (kubectl, kubeadm, kubelet) on all nodes.
- Kafka, NiFi, Spark, and Airflow on their respective machines.
- Jenkins on a separate machine for the CI/CD pipeline.

Ensure that all nodes are networked properly and can communicate with each other (a quick check follows below).

Required tools:

- Linux-based virtual machines (VMs) or cloud instances.
- Helm (for Kubernetes package management).
- Git (for version control).
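A quick sanity check that the required tools are present and the nodes can reach one another; the hostnames are placeholders for your actual node names:

```bash
# Confirm the core tooling is installed
docker --version
kubectl version --client
kubeadm version
helm version
git --version

# Confirm basic connectivity between nodes
for host in master-1 master-2 worker-1 worker-2; do
  ping -c 1 "$host" > /dev/null && echo "$host reachable"
done
```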
## How to Run

Kubernetes cluster:

- Clone the repository.
- Follow the instructions in the kubernetes/ folder to set up the Kubernetes cluster, install keepalived, and configure the VIP, Prometheus, Grafana, and Fluentd.

Big data streaming service:

- Navigate to the big-data-streaming/ folder for instructions on installing Kafka, NiFi, Spark, and Airflow.
- Follow the provided documentation to configure each component and test the pipeline.

CI/CD pipeline:

- Set up Jenkins by following the steps in the ci-cd/ folder.
- Run the pipeline to build and deploy a sample application to the Kubernetes cluster.
## Conclusion

This repository demonstrates the complete implementation of:

- A highly available Kubernetes cluster with monitoring and logging >>> Kubernetes-Setup
- A big data streaming service that integrates transactional and analytical databases >>> Big Data Streaming Branch
- A CI/CD pipeline that automates the deployment of applications to Kubernetes >>> CI-CD-Pipeline