diff --git a/README.md b/README.md index 85ef94f6..af8e6c32 100644 --- a/README.md +++ b/README.md @@ -161,6 +161,29 @@ Kubernetes, also known as K8s, is an open-source container orchestration platfor | 47 | 2025-04-27 | [⎈ K8s Tools Docker Images — kubectl ️](https://medium.com/@muppedaanvesh/k8s-tools-docker-images-kubectl-%EF%B8%8F-acd446b5c079?source=rss-15b2de10f77d------2) | | 48 | 2025-04-29 | [⎈ Containerized FluxCD: Zero-Install Cluster Management ️](https://medium.com/@muppedaanvesh/containerized-fluxcd-zero-install-cluster-management-%EF%B8%8F-4f2ace623eb4?source=rss-15b2de10f77d------2) | | 49 | 2025-05-11 | [⎈ kubectl-ai: Speak, Don’t Script ️](https://medium.com/@muppedaanvesh/kubectl-ai-speak-dont-script-%EF%B8%8F-f16e79b0fdaa?source=rss-15b2de10f77d------2) | +| 50 | 2025-05-12 | [⎈ Introducing the Official Kubernetes Hands-On Guides ️](https://medium.com/@muppedaanvesh/introducing-the-official-kubernetes-hands-on-guides-%EF%B8%8F-8e4da946b92d?source=rss-15b2de10f77d------2) | +| 51 | 2025-05-12 | [⎈ Amazon EKS Auto Mode: A Hands-On Guide ️](https://medium.com/@muppedaanvesh/amazon-eks-auto-mode-a-hands-on-guide-%EF%B8%8F-466880dc9f07?source=rss-15b2de10f77d------2) | +| 52 | 2025-05-18 | [⎈ EKS Node Viewer: A Hands-On Guide ️](https://medium.com/@muppedaanvesh/eks-node-viewer-a-hands-on-guide-%EF%B8%8F-bae01bf4a91b?source=rss-15b2de10f77d------2) | +| 53 | 2025-05-18 | [⎈ Amazon EKS Kubecost: A Hands-On Guide ️](https://medium.com/@muppedaanvesh/amazon-eks-kubecost-a-hands-on-guide-%EF%B8%8F-a85a81c9226c?source=rss-15b2de10f77d------2) | + + + + + + + + + + + + + + + + + + + diff --git a/docs/013-ado-agents/_category_.json b/docs/013-ado-agents/_category_.json new file mode 100644 index 00000000..6978fb30 --- /dev/null +++ b/docs/013-ado-agents/_category_.json @@ -0,0 +1,8 @@ +{ + "label": "Kubernetes ADO Self-Hosted Agents", + "position": 13, + "link": { + "type": "generated-index", + "description": "Are you tired of wrestling with manual agent management, lengthy pipeline queues, or soaring infrastructure costs in your Azure DevOps setup? Look no further! In this comprehensive series of guides, we’ll explore how leveraging Kubernetes for self-hosted agents can revolutionize your CI/CD processes. Say goodbye to operational headaches and hello to streamlined, efficient pipelines." + } +} diff --git a/docs/013-ado-agents/azure-devops-part-1.md b/docs/013-ado-agents/azure-devops-part-1.md new file mode 100644 index 00000000..2ae4cce4 --- /dev/null +++ b/docs/013-ado-agents/azure-devops-part-1.md @@ -0,0 +1,374 @@ +# ⎈ Azure DevOps — Self Hosted Agents on Kubernetes — PART-1 ⎈ +#### *Unlocking Efficiency and Scalability for Your CI/CD Workflows🚀* + +![img](./img/linux-agent.png.webp) + + +Are you tired of wrestling with manual agent management, lengthy pipeline queues, or soaring infrastructure costs in your Azure DevOps setup? Look no further! In this comprehensive series of guides, we’ll explore how leveraging Kubernetes for self-hosted agents can revolutionize your CI/CD processes. Say goodbye to operational headaches and hello to streamlined, efficient pipelines. + +### Introduction + +If you find yourself grappling with the management of Azure DevOps self-hosted agents or experiencing prolonged wait times in pipeline queues, or perhaps facing higher costs associated with agent infrastructure, or struggling with the setup and maintenance of self-hosted agent environments, then Kubernetes presents an optimal solution to address these challenges. 
+ +Furthermore, the absence of Kubernetes orchestration complicates the setup of diverse environments for running pipelines on self-hosted agents. Organizations may struggle to configure and manage different agent environments for testing, staging, and production, resulting in inconsistency and potential deployment issues. Additionally, without the scalability and elasticity provided by Kubernetes, organizations may experience bottlenecks and delays due to long queue times in Azure DevOps pipelines, leading to decreased productivity and slower time-to-market. + +In summary, without leveraging Kubernetes for Azure DevOps self-hosted agents, organizations may face a myriad of challenges including operational inefficiencies, pipeline delays, increased costs, and complexity in environment management. Embracing Kubernetes as a solution offers scalability, flexibility, cost-effectiveness, and streamlined operations, enabling organizations to optimize their CI/CD workflows and accelerate software delivery. + +### Challenges with Traditional Self-Hosted Agents + +- Management overhead: The manual effort required to provision, configure, and manage self-hosted agents. + +- Queue times: Prolonged wait times in pipeline queues due to limited agent capacity. + +- Cost implications: Inefficient resource utilization leading to increased infrastructure costs. + +- Environment setup complexities: Difficulties in configuring and managing diverse environments for running pipelines. + +### Benefits of using the Kubernetes for Self-Hosted Agents + +**Key features:** Scalability, flexibility, portability, and automated management. + +- **Benefits for CI/CD:** Enabling dynamic scaling, resource optimization, and simplified management of agent infrastructure. +- **Scalability:** Ability to scale agent capacity up or down based on workload demands. +- **Flexibility:** Customizable agent configurations and support for diverse environments. +- **Cost-effectiveness:** Efficient resource utilization leading to reduced infrastructure costs. +- **Simplified management:** Automated deployment, scaling, and maintenance of agent infrastructure. + +### Prerequisites + +- Kubernetes environment. +- Azure Devops active account. +- Docker environment + +### Getting Started: Setting Up Azure DevOps Self-Hosted Agents on Kubernetes + +Let’s start setting up these Azure DevOps Self-Hosted Agents on Kubernetes by following the below steps one by one. + +- **Building Docker images:** Building Docker images containing Azure Pipelines Agent software. +- **Configuring Azure DevOps:** Setting up agent pools and generating Personal Access Tokens (PATs). +- **Deploying agent pods:** Using Kubernetes manifests to deploy agent pods in the cluster. +- **Registering agents:** Configuring agent pods to register themselves with Azure DevOps using PATs. + +#### Building Docker Images: + +Assume you have Docker environment is ready to build the required docker images. + +1. Open the target CLI to run docker commands + +2. Create a new directory. +```yaml +mkdir azsh-linux-agent/ +``` + +3. Change the directory to newly created. +```yaml +cd azsh-linux-agent/ +``` +4. Create the Dockerfile with below content, name it as **azsh-linux-agent.dockerfile** +```yaml +FROM ubuntu:22.04 + +RUN apt update +RUN apt upgrade -y +RUN apt install -y curl git jq libicu70 + +# Also can be "linux-arm", "linux-arm64". 
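# TARGETARCH is consumed by start.sh below when it queries Azure DevOps for a matching agent package.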
+ENV TARGETARCH="linux-x64" + +WORKDIR /azp/ + +COPY ./start.sh ./ +RUN chmod +x ./start.sh + +RUN useradd agent +RUN chown agent ./ +USER agent +# Another option is to run the agent as root. +# ENV AGENT_ALLOW_RUNASROOT="true" + +ENTRYPOINT ./start.sh +``` + +5. Create a **start.sh** file using below content, which will be used by the above docker file. + +```yaml +#!/bin/bash +set -e + +if [ -z "${AZP_URL}" ]; then + echo 1>&2 "error: missing AZP_URL environment variable" + exit 1 +fi + +if [ -z "${AZP_TOKEN_FILE}" ]; then + if [ -z "${AZP_TOKEN}" ]; then + echo 1>&2 "error: missing AZP_TOKEN environment variable" + exit 1 + fi + + AZP_TOKEN_FILE="/azp/.token" + echo -n "${AZP_TOKEN}" > "${AZP_TOKEN_FILE}" +fi + +unset AZP_TOKEN + +if [ -n "${AZP_WORK}" ]; then + mkdir -p "${AZP_WORK}" +fi + +cleanup() { + trap "" EXIT + + if [ -e ./config.sh ]; then + print_header "Cleanup. Removing Azure Pipelines agent..." + + # If the agent has some running jobs, the configuration removal process will fail. + # So, give it some time to finish the job. + while true; do + ./config.sh remove --unattended --auth "PAT" --token $(cat "${AZP_TOKEN_FILE}") && break + + echo "Retrying in 30 seconds..." + sleep 30 + done + fi +} + +print_header() { + lightcyan="\033[1;36m" + nocolor="\033[0m" + echo -e "\n${lightcyan}$1${nocolor}\n" +} + +# Let the agent ignore the token env variables +export VSO_AGENT_IGNORE="AZP_TOKEN,AZP_TOKEN_FILE" + +print_header "1. Determining matching Azure Pipelines agent..." + +AZP_AGENT_PACKAGES=$(curl -LsS \ + -u user:$(cat "${AZP_TOKEN_FILE}") \ + -H "Accept:application/json;" \ + "${AZP_URL}/_apis/distributedtask/packages/agent?platform=${TARGETARCH}&top=1") + +AZP_AGENT_PACKAGE_LATEST_URL=$(echo "${AZP_AGENT_PACKAGES}" | jq -r ".value[0].downloadUrl") + +if [ -z "${AZP_AGENT_PACKAGE_LATEST_URL}" -o "${AZP_AGENT_PACKAGE_LATEST_URL}" == "null" ]; then + echo 1>&2 "error: could not determine a matching Azure Pipelines agent" + echo 1>&2 "check that account "${AZP_URL}" is correct and the token is valid for that account" + exit 1 +fi + +print_header "2. Downloading and extracting Azure Pipelines agent..." + +curl -LsS "${AZP_AGENT_PACKAGE_LATEST_URL}" | tar -xz & wait $! + +source ./env.sh + +trap "cleanup; exit 0" EXIT +trap "cleanup; exit 130" INT +trap "cleanup; exit 143" TERM + +print_header "3. Configuring Azure Pipelines agent..." + +./config.sh --unattended \ + --agent "${AZP_AGENT_NAME:-$(hostname)}" \ + --url "${AZP_URL}" \ + --auth "PAT" \ + --token $(cat "${AZP_TOKEN_FILE}") \ + --pool "${AZP_POOL:-Default}" \ + --work "${AZP_WORK:-_work}" \ + --replace \ + --acceptTeeEula & wait $! + +print_header "4. Running Azure Pipelines agent..." + +chmod +x ./run.sh + +# To be aware of TERM and INT signals call ./run.sh +# Running it with the --once flag at the end will shut down the agent after the build is executed +./run.sh "$@" & wait $! +``` + +6. Let’s build the docker image using below command. +```yaml +docker build --tag "azsh-linux-agent:tag" --file "./azsh-linux-agent.dockerfile" . +``` + +7. Now push the above docker image to your target repository. +### Configuring Azure DevOps +**Create Agent Pool** + +- In your Azure DevOps organization, navigate to “Project Settings” > “Agent Pools”. +- Create a new agent pool or use an existing one for your Kubernetes agents. +- Click on the “New agent pool” button to create a new pool, or select an existing one. + +**Create PAT Token** + +- Click on “**User Settings**” from top-right corner of the page. 
+- Select “**Personal access tokens**” from the dropdown menu. +- Generate a Personal Access Token (PAT) with the appropriate scope for registering agents, and save it for next steps. + +**Deploying agent pods** + +Imagine you’ve got your Kubernetes cluster ready to roll. If not, don’t worry — we’ve got you covered. Even if you’re new to Kubernetes, setting up Azure DevOps agents on your behalf is a breeze. Simply follow along with the steps outlined in this guide, and you’ll be up and running in no time! + +Create a **azsh-linux-agent-deployment.yaml** file using below content. +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: azsh-linux + namespace: az-devops + labels: + app: azsh-linux-agent +spec: + replicas: 1 + selector: + matchLabels: + app: azsh-linux-agent + template: + metadata: + labels: + app: azsh-linux-agent + spec: + containers: + - name: kubepodcreation + image: anvesh35/azsh-linux-agent:02062024 + env: + - name: AZP_URL + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_URL + - name: AZP_TOKEN + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_TOKEN + - name: AZP_POOL + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_POOL + volumeMounts: + - mountPath: /var/run/docker.sock + name: docker-volume + volumes: + - name: docker-volume + hostPath: + path: /var/run/docker.sock +``` +Create a **azsh-linux-agent-secret.yaml** file using below content. This secret will help us to authenticate our kubernetes pods with Azure devops. + +```yaml +apiVersion: v1 +data: + AZP_POOL: + AZP_TOKEN: + AZP_URL: +kind: Secret +metadata: + name: azdevops + namespace: az-devops +type: Opaque +``` + +Here replace your agent-pool-name with your newly created Agent Pool name, PAT-Token with PAT token, and AZP_URL with your Azure devops portal URL. + +Now create a new namespace for azure devops self-hosted agents in your kubernetes cluster. +```yaml +kubectl create namespace az-devops +``` +Let’s deploy the deployment using below command. +```yaml +kubectl apply -f azsh-linux-agent-deployment.yaml +``` +Let’s create the secret to authenticate our agent pods to azure devops using below command. +```yaml +kubectl apply -f azsh-linux-agent-secret.yaml +``` + +Verify the newly created pods and secrets status using below command. +```yaml +kubectl get pods -n az-devops +kubectl get secret -n az-devops +``` +```yaml +kubectl get pods -n az-devops +NAME READY STATUS RESTARTS AGE +azdevops-deployment-b684fd756-nmn54 1/1 Running 0 21h + +kubectl get secret -n az-devops +NAME TYPE DATA AGE +azdevops Opaque 3 21h +``` + +Now the pods are up and running, let’s check in our Azure Devops Portal for our self-hosted agents in our agent pool. + + +That’s it your agents are online now you can run your pipelines based on your number of replicas. + +### Running your first Azure DevOps pipeline on Kubernetes Self-Hosted agents +Lets create a new sample pipeline using below steps. + +Create a new file in **/pipelines/sample-pipeline.yaml** path in your azure devops repository using below content. 
+ +```yaml +trigger: none + +pool: + +resources: + repositories: + - repository: templates + displayname: templates from devops + type: git + name: / + ref: + +stages: +- stage: Deploy + jobs: + - deployment: helloWorld + displayName: 'Agent Setup' + strategy: + runOnce: + deploy: + steps: + #Branch Checkout + - checkout: self + persistCredentials: true + + # Hello World program + - template: /scripts/kubernetes/hello-world.yaml@templates +``` +Create the script **/scripts/kubernetes/hello-world.yaml** to run the sample program on self-hosted agent on kubernetes. +```yaml + +steps: + - bash: | + echo "Hello, Welcome to Anvesh World!" + displayName: 'Hello World' + +``` + +Now create a new pipeline using below steps. +Select the project where you want to create the YAML pipeline. + +1. Click on the “Pipelines” menu option in the left sidebar. +2. You should see a button labeled “New pipeline” on the Pipelines page. +3. Click on it to start creating a new pipeline. +4. Choose the repository where your code is located. Azure DevOps supports Git repositories, GitHub repositories, and others. +5. Choose where your YAML file is located(i.e., /pipelines/sample-pipeline.yaml) +6. Now save the pipeline. +7. should see your pipeline listed in the Pipelines page of your Azure DevOps project. You can review the pipeline configuration and manually trigger a run to test it. + +That’s it! You’ve successfully created a YAML pipeline in Azure DevOps and ran it using self hosted agent from kubernetes. + +![img](./img/sample-pipeline.png.webp) + + +## Best Practices and Tips +- **Resource management:** Setting resource limits and requests for agent pods to ensure efficient resource utilization. +- **Monitoring and logging:** Implementing monitoring and logging solutions to track agent performance and troubleshoot issues. +- **Automation:** Leveraging automation tools and scripts for seamless deployment and configuration of agent infrastructure. +- Will discuss all these in upcoming parts. \ No newline at end of file diff --git a/docs/013-ado-agents/azure-devops-part-2.md b/docs/013-ado-agents/azure-devops-part-2.md new file mode 100644 index 00000000..1338f786 --- /dev/null +++ b/docs/013-ado-agents/azure-devops-part-2.md @@ -0,0 +1,299 @@ +# ⎈ Azure DevOps — Self Hosted Agents on Kubernetes — PART-2 ⎈ +#### *Build and Deploy Windows SelfHosted Agents 🚀* + +![img](./img/windows-agents.png.webp) + +## Welcome to Part 2 +### Deploying Windows Self-Hosted Agents on Kubernetes for Azure DevOps + + +In the first part of our blog series, we explored the setup of **Linux self-hosted agents on Kubernetes,** demonstrating how to seamlessly run sample scripts from Kubernetes-managed agents within Azure DevOps pipelines. Now, in Part 2, we delve into the world of Windows self-hosted agents, focusing on building Docker images tailored for Windows environments, deploying these agents to Kubernetes clusters, and executing command-line scripts on them. + +Throughout this installment, we’ll guide you through the process of creating a Windows self-hosted agent Docker image, deploying it to Kubernetes, and harnessing its power to execute Windows command-line scripts within Azure DevOps pipelines. Whether you’re a seasoned DevOps engineer or just beginning your journey into automation, this guide will equip you with the knowledge and tools needed to streamline your CI/CD workflows in Windows environments. 
+ + +Join us as we unlock the potential of Windows self-hosted agents in Kubernetes and Azure DevOps, empowering you to elevate your development and deployment processes to new heights. Let’s dive in! + +### 1. Building Windows Self-Hosted Agent Docker Image + +1. Create **Dockerfile:** +Start by creating a Dockerfile for your Windows self-hosted agent. + +```yaml +FROM mcr.microsoft.com/windows/servercore:ltsc2022 + +WORKDIR /azp/ + +COPY ./start.ps1 ./ + +CMD powershell .\start.ps1 +``` + +2. Create start.ps1 in same directory with below content. + +```yaml + +function Print-Header ($header) { + Write-Host "`n${header}`n" -ForegroundColor Cyan +} + +if (-not (Test-Path Env:AZP_URL)) { + Write-Error "error: missing AZP_URL environment variable" + exit 1 +} + +if (-not (Test-Path Env:AZP_TOKEN_FILE)) { + if (-not (Test-Path Env:AZP_TOKEN)) { + Write-Error "error: missing AZP_TOKEN environment variable" + exit 1 + } + + $Env:AZP_TOKEN_FILE = "\azp\.token" + $Env:AZP_TOKEN | Out-File -FilePath $Env:AZP_TOKEN_FILE +} + +Remove-Item Env:AZP_TOKEN + +if ((Test-Path Env:AZP_WORK) -and -not (Test-Path $Env:AZP_WORK)) { + New-Item $Env:AZP_WORK -ItemType directory | Out-Null +} + +New-Item "\azp\agent" -ItemType directory | Out-Null + +# Let the agent ignore the token env variables +$Env:VSO_AGENT_IGNORE = "AZP_TOKEN,AZP_TOKEN_FILE" + +Set-Location agent + +Print-Header "1. Determining matching Azure Pipelines agent..." + +$base64AuthInfo = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$(Get-Content ${Env:AZP_TOKEN_FILE})")) +$package = Invoke-RestMethod -Headers @{Authorization=("Basic $base64AuthInfo")} "$(${Env:AZP_URL})/_apis/distributedtask/packages/agent?platform=win-x64&`$top=1" +$packageUrl = $package[0].Value.downloadUrl + +Write-Host $packageUrl + +Print-Header "2. Downloading and installing Azure Pipelines agent..." + +$wc = New-Object System.Net.WebClient +$wc.DownloadFile($packageUrl, "$(Get-Location)\agent.zip") + +Expand-Archive -Path "agent.zip" -DestinationPath "\azp\agent" + +try { + Print-Header "3. Configuring Azure Pipelines agent..." + + .\config.cmd --unattended ` + --agent "$(if (Test-Path Env:AZP_AGENT_NAME) { ${Env:AZP_AGENT_NAME} } else { hostname })" ` + --url "$(${Env:AZP_URL})" ` + --auth PAT ` + --token "$(Get-Content ${Env:AZP_TOKEN_FILE})" ` + --pool "$(if (Test-Path Env:AZP_POOL) { ${Env:AZP_POOL} } else { 'Default' })" ` + --work "$(if (Test-Path Env:AZP_WORK) { ${Env:AZP_WORK} } else { '_work' })" ` + --replace + + Print-Header "4. Running Azure Pipelines agent..." + + .\run.cmd +} + finally +{ + Print-Header "Cleanup. Removing Azure Pipelines agent..." + + .\config.cmd remove --unattended ` + --auth PAT ` + --token "$(Get-Content ${Env:AZP_TOKEN_FILE})" +} + +``` +3. Run the following command within that directory: + +```yaml +docker build --tag ":" --file "./Dockerfile" . +``` + +4. Push the above windows docker image to your repository. + +```yaml +docker push : +``` +### 2. Deploying Windows Self-Hosted Agent to Kubernetes + +***Ensuring that your Kubernetes cluster has Windows worker nodes is crucial for deploying and running Windows self-hosted agents. Windows worker nodes provide the necessary environment for executing Windows-based containers, allowing your self-hosted agents to operate effectively within the Kubernetes cluster.*** + + +***Before proceeding with the deployment of Windows self-hosted agents on Kubernetes, verify that your cluster configuration includes Windows worker nodes alongside Linux nodes. 
This ensures compatibility and availability for running Windows containers and accommodating the specific requirements of your Windows-based workloads.*** + +Create a new namespace for Azure DevOps Self Hosted Agents using the below command + +```yaml +kubectl create namespace az-devops +``` + +***Create All Azure DevOps self hosted agents related workloads within the az-devops namespace*** + +Create Kubernetes Deployment with below content and name it as **windows-sh-agent-deploy.yaml** + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: azsh-windows + namespace: az-devops + labels: + app: azsh-windows-agent +spec: + replicas: 1 + selector: + matchLabels: + app: azsh-windows-agent + template: + metadata: + labels: + app: azsh-windows-agent + spec: + containers: + - name: kubepodcreation + image: anvesh35/azsh-windows-agent:1602204 + env: + - name: AZP_URL + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_URL + - name: AZP_TOKEN + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_TOKEN + - name: AZP_POOL + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_POOL + volumeMounts: + - mountPath: /var/run/docker.sock + name: docker-volume + volumes: + - name: docker-volume + hostPath: + path: /var/run/docker.sock +``` + +Create Kubernetes secret which is used to connect to the Azure DevOps + +```yaml +kubectl -n az-devops create secret generic azdevops \ + --from-literal=AZP_URL=https://dev.azure.com/yourOrg \ + --from-literal=AZP_TOKEN=YourPAT \ + --from-literal=AZP_POOL=NameOfYourPool +``` + +Now apply deployment manifest file using the below commands + +```yaml +kubectl apply -f windows-sh-agent-deploy.yaml +``` + +Verify that the agent pods are created and running successfully by checking their status with below command + +```yaml +kubectl get pods -n az-devops +``` +```yaml +NAME READY STATUS RESTARTS AGE +azsh-windows-768bd8bdf8-2ng48 1/1 Running 0 3h1m +``` + +Now Azure DevOps windows Self-Hosted Agent pod is up and running. Now, let’s ensure it’s available in the Azure DevOps Agent Pool by following these steps: + + +1. **Login to Azure DevOps Portal:** Go to the Azure DevOps portal and log in with your credentials. +2. **Navigate to Project Settings:** Once logged in, navigate to your project settings. You can usually find this option in the bottom-left corner of the Azure DevOps portal. + +3. **Click on Agent Pools:** In the project settings, click on the “Agent pools” option. This will take you to the page where you can manage agent pools for your project. +4. **Select the Target Agent Pool:** Choose the target agent pool that you are using within the secret mentioned in your setup. This is the pool where your Windows Self-Hosted Agent pod should be available. +5. **Go to the Agents Section:** Within the selected agent pool, navigate to the “Agents” section. Here, you should be able to see a list of agents registered in this pool. +6. **Verify Your Windows Agent:** Look for your Windows Self-Hosted Agent in the list. It should be listed here if it’s successfully registered and connected to the Azure DevOps Agent Pool. + +By following these steps, you can ensure that your Windows Self-Hosted Agent is correctly configured and available in the desired agent pool within Azure DevOps. If you encounter any issues, double-check the configuration and connectivity settings of your agent. + +![img](./img/agent-on-kubernetes.png.webp) + + +### 3. 
Running Command-Line Scripts on Windows Self-Hosted Agents +Lets create a new sample windows pipeline using below steps. + + +Create a new file in **/pipelines/sample-windows-pipeline.yaml** path in your Azure DevOps repository using below content. + +```yaml +trigger: none + +pool: + name: + vmImage: + +stages: +- stage: Deploy + jobs: + - deployment: agentSetup + environment: test + displayName: 'Running on Windows Agent' + strategy: + runOnce: + deploy: + steps: + #Hello World From CMDLINE + - template: ../scripts/kubernetes/windows/hello-world.yaml + + #Print Hostname from CMDLINE + - template: ../scripts/kubernetes/windows/windows-hostname.yaml + +``` + +Create the first command line script /scripts/kubernetes/hello-world.yaml to run the Hello World program on Windows Self-Hosted Agent on kubernetes. + +```yaml +steps: + - task: CmdLine@2 + displayName: 'Run Hello World' + inputs: + script: 'echo Hello, World!' +``` +Create the second command line script **/scripts/kubernetes/hello-world.yaml** to display the hostname on Windows Self-Hosted Agent on kubernetes. + +```yaml +steps: + - task: CmdLine@2 + displayName: 'Display Hostname' + inputs: + script: 'hostname' +``` +Now create a new pipeline using below steps. + +1. Select the project where you want to create the YAML pipeline. +2. Click on the “Pipelines” menu option in the left sidebar. +3. You should see a button labeled “New pipeline” on the Pipelines page. Click on it to start creating a new pipeline. +4. Choose the repository where your code is located. Azure DevOps supports Git repositories, GitHub repositories, and others. +5. Choose where your YAML file is located(i.e., /pipelines/sample-pipeline.yaml) +6. Now save the pipeline. +7. should see your pipeline listed in the Pipelines page of your Azure DevOps project. You can review the pipeline configuration and manually trigger a run to test it. + +That’s it! You’ve successfully created a YAML pipeline in Azure DevOps and ran it using Windows Self-Hosted Agent from kubernetes cluster. +![img](./img/pipeline.png.webp) + +![img](./img/sample-run.png.webp) + +![img](./img/hello-world.png.webp) + +![img](./img/hostname.png.webp) + + + +### 4. Best Practices and Tips + +- **Resource management:** Setting resource limits and requests for agent pods to ensure efficient resource utilization. +- **Monitoring and logging:** Implementing monitoring and logging solutions to track agent performance and troubleshoot issues. +- **Automation:** Leveraging automation tools and scripts for seamless deployment and configuration of agent infrastructure. +- Will discuss all these in upcoming parts. \ No newline at end of file diff --git a/docs/013-ado-agents/azure-devops-part-3.md b/docs/013-ado-agents/azure-devops-part-3.md new file mode 100644 index 00000000..0c0854db --- /dev/null +++ b/docs/013-ado-agents/azure-devops-part-3.md @@ -0,0 +1,275 @@ +# ⎈ Azure DevOps — Self Hosted Agents on Kubernetes — PART-3 ⎈ + +#### *Scaling Self-Hosted Agents on Kubernetes with KEDA 📈* + +![img](./img/auto-scaling.png.webp) + +### Welcome to Part 3 +Welcome back to our ongoing journey through the fusion of Azure DevOps and Kubernetes! In this third installment, we’re diving deep into a fascinating realm of Kubernetes management: the auto-scaling of self-hosted agents using Kubernetes Event-driven Autoscaling (KEDA). + +Get ready to bid farewell to those lengthy queues and usher in a new era of efficiency in our CI/CD pipelines! 
In this latest chapter of our Azure DevOps and Kubernetes saga, we’re embracing KEDA to say goodbye to wait times and revolutionize how we allocate resources. With KEDA at our disposal, we’re poised to transform the scalability of our self-hosted agents, banishing bottlenecks and ensuring swift deployments every time. Join us on this journey towards operational excellence as we harness the power of KEDA and wave goodbye to long queues once and for all! +### Install KEDA(Kubernetes Event-driven Autoscaling) + +To deploy KEDA (Kubernetes Event-driven Autoscaling) using Helm, you can follow these steps: + +1. **Add Helm Repo:** Add the KEDA Helm repository to your Helm configuration. Run the following command: + +```yaml +helm repo add kedacore https://kedacore.github.io/charts +``` + +2. **Update Helm Repo:** After adding the repository, make sure to update your local Helm repository cache with the latest information from the added repositories. Run: + +```yaml +helm repo update +``` + +3. **Install KEDA Helm Chart:** Use Helm to install the KEDA Helm chart. Since you want to install it in the keda namespace and create the namespace if it doesn't exist, use the following command: + +```yaml +helm install keda kedacore/keda --namespace keda --create-namespace +``` +This command installs the KEDA Helm chart named keda from the kedacore repository into the keda namespace, creating the namespace if it doesn't already exist. + +4. Verify the all KEDA resources are up and running using below command + +```yaml +kubectl get all -n keda +``` +```yaml +kubectl get all -n keda +NAME READY STATUS RESTARTS AGE +pod/keda-admission-webhooks-689544998-g9lpt 1/1 Running 0 75m +pod/keda-operator-898ccf84f-j6ghd 1/1 Running 1 (77m ago) 77m +pod/keda-operator-metrics-apiserver-688659cccb-g6n2g 1/1 Running 0 73m + +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +service/keda-admission-webhooks ClusterIP 10.0.189.174 443/TCP 98m +service/keda-operator ClusterIP 10.0.198.219 9666/TCP 98m +service/keda-operator-metrics-apiserver ClusterIP 10.0.188.67 443/TCP,8080/TCP 98m + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/keda-admission-webhooks 1/1 1 1 98m +deployment.apps/keda-operator 1/1 1 1 98m +deployment.apps/keda-operator-metrics-apiserver 1/1 1 1 98m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/keda-admission-webhooks-54764ff7d5 0 0 0 98m +replicaset.apps/keda-admission-webhooks-689544998 1 1 1 75m +replicaset.apps/keda-operator-567cb596fd 0 0 0 98m +replicaset.apps/keda-operator-898ccf84f 1 1 1 77m +replicaset.apps/keda-operator-metrics-apiserver-6475bf5fff 0 0 0 98m +replicaset.apps/keda-operator-metrics-apiserver-688659cccb 1 1 1 73m +``` +With these steps, you’ll have deployed KEDA into your Kubernetes cluster using Helm. Make sure to adjust the namespace or any other parameters as needed for your specific environment. + + +### Deploy KEDA ScaledObject + +After successfully installing KEDA into your Kubernetes cluster, it’s essential to verify that all KEDA resources are up and running to ensure proper functionality. This step ensures that KEDA is ready to scale your deployments based on specified triggers. + +Once you’ve confirmed that KEDA resources are operational, the next step is to deploy the KEDA ScaledObject. This deployment enables KEDA to initiate scaling actions based on predefined triggers. In this case, we’ll configure the ScaledObject to scale based on the queue length of an Azure Pipelines agent pool. 
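The `pipeline-auth` Secret created in the steps below stores the PAT in base64 form. A quick way to produce that value (a convenience sketch, assuming a typical Linux shell; `<your-PAT>` is a placeholder) is:

```yaml
# Encode the PAT without a trailing newline and paste the output into the Secret manifest
echo -n '<your-PAT>' | base64
```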
+ +Let’s Deploy the KEDA ScaledObject Using the below steps: + +Create a Secret: + +Use the following YAML to create a Secret named pipeline-auth with your personal access token (PAT) encoded in base64: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: pipeline-auth +data: + personalAccessToken: '' +``` +Replace your base64 PAT with your actual personal access token encoded in base64. + +Run the below command(assuming you save the YAML in a file named secret.yaml) to create the Secret in your Kubernetes cluster or you can use the previous secret which we created part of the windows/linux agent deployment. + +```yaml +kubectl apply -f secret.yaml +``` +Create a TriggerAuthentication: + +Use the following YAML to create a TriggerAuthentication named pipeline-trigger-auth: +```yaml +apiVersion: keda.sh/v1alpha1 +kind: TriggerAuthentication +metadata: + name: pipeline-trigger-auth +spec: + secretTargetRef: + - parameter: personalAccessToken + name: pipeline-auth + key: personalAccessToken +``` +Run below command to create the TriggerAuthentication in your Kubernetes cluster. +```yaml +kubectl apply -f trigger-auth.yaml +``` + + +Create a ScaledObject: + +Use the following YAML to create a ScaledObject named azure-pipelines-scaledobject.yaml: + +```yaml +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +metadata: + name: azure-pipelines-scaledobject + namespace: az-devops +spec: + scaleTargetRef: + name: azsh-windows #Target agent deployment name + minReplicaCount: 1 + maxReplicaCount: 5 #Maximum number of parallel instances + triggers: + - type: azure-pipelines + metadata: + poolID: "" #Replace with your agent pool ID + organizationURLFromEnv: "AZP_URL" + authenticationRef: + name: pipeline-trigger-auth +``` +Replace Agent Pool Id with your actual value. By specifying the Agent Pool ID parameter in the ScaledObject configuration, KEDA will automatically scale the deployment according to the queue size of the specified agent pool. Additionally, the ScaledObject will ensure that scaling operations stay within the defined minimum and maximum replica counts, providing flexibility and control over resource allocation. + +To get the Agent Pool ID from the Azure DevOps console, follow these steps: + + +1. **Navigate to Azure DevOps:** Go to the Azure DevOps portal at dev.azure.com. + +2. **Select Organization:** Choose the organization where your agent pool is located. + +3. **Go to Agent Pools:** In the left sidebar, click on “Organization settings” (you may need to click on the gear icon to access this), then select “Agent pools” under the “Pipelines” section. + +4. **View Agent Pool:** You will see a list of agent pools available in your project. Click on the agent pool that you want to get the ID for. + +5. **Get Agent Pool ID:** Once you’re in the agent pool details view, you should see the ID displayed in the URL of your browser. The URL will look something like this: + +```yaml +https://dev.azure.com/{organization}/{project}/_settings/agentpools?poolId={poolId} +``` +Now run the below command to create the ScaledObject in your Kubernetes cluster. + +```yaml +kubectl apply -f scaled-object.yaml +``` +By running above command our scaled object is created succesfully. 
To ensure that the KEDA ScaledObject is properly configured and aligned with your target deployment, you can use the following commands to check its status and the status of all resources in the az-devops namespace: +```yaml +kubectl get scaledobject -n az-devops +kubectl get all -n az-devops +``` +The first command (kubectl get scaledobject -n az-devops) will display information about the KEDA ScaledObject, including its name, the associated deployment, the minimum and maximum replica counts, and any configured triggers. This will help you verify that the ScaledObject is configured correctly according to your requirements. + +```yaml +kubectl get scaledobject -n az-devops +NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK PAUSED AGE +azure-pipelines-scaledobject apps/v1.Deployment azsh-windows 1 5 azure-pipelines pipeline-trigger-auth True False False Unknown 95m + +``` +The second command (kubectl get all -n az-devops) will provide information about all resources in the az-devops namespace, including deployments, pods, services, and any other resources that may be present. This will allow you to verify the overall status of your deployment and ensure that all associated resources are running as expected. +```yaml +kubectl get all -n az-devops +NAME READY STATUS RESTARTS AGE +pod/azsh-windows-768bd8bdf8-2zddt 1/1 Running 0 101m + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/azsh-windows 1/1 1 1 101m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/azsh-windows-768bd8bdf8 1 1 1 101m + +NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE +horizontalpodautoscaler.autoscaling/keda-hpa-azure-pipelines-scaledobject Deployment/azsh-windows 0/1 (avg) 1 5 1 86m + +``` +By running these commands, you can confirm that the KEDA ScaledObject is configured properly and that your deployment is ready to scale based on the specified triggers. If everything looks correct, you can proceed confidently knowing that your Kubernetes environment is set up for efficient and automated scaling. + +In summary, deploying the KEDA ScaledObject with the azure-pipelines trigger enables dynamic scaling of deployments in response to changes in the Azure Pipelines agent pool queue length. This automation enhances efficiency and responsiveness in managing workload demands within the Kubernetes environment. +### Auto Scaling Testing using KEDA +Create two pipelines to test the auto scaling make sure both the pipelines are aligned with the single agent pool which we mentioned in the above ScaledObject. + +***Follow Part 1 & Part 2 to create the pipelines*** + +![img](./img/kubernetes.png.webp) + +Make sure the agent is ready within the agent pool to run the pipelines. + +![img](./img/windows-agents-on-kubernetes.png.webp) +Indeed, with traditional setup, if you have a single agent capable of running only one pipeline at a time, the second pipeline will be queued until the first one completes its execution. However, with KEDA auto-scaling configured based on the Azure Pipelines agent pool queue size, the behavior might change. + +Once KEDA is set up and configured properly, it will monitor the queue size of the Azure Pipelines agent pool. As soon as a pipeline is queued, KEDA will detect the increased load and trigger scaling actions automatically. It will dynamically adjust the number of replicas of your deployment to accommodate the increased workload. 
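If you want to watch this happen in real time, keeping a watch on the agent pods and the KEDA-generated HorizontalPodAutoscaler is a simple option (purely for observation; it is not required for the setup to work):

```yaml
kubectl get pods -n az-devops -w
kubectl get hpa -n az-devops -w
```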
+ +In this scenario, when the first pipeline is queued and starts running, KEDA will detect the increased demand for resources and scale up the deployment accordingly to handle the workload. As a result, the second pipeline won’t be left waiting in the queue for an available agent. Instead, KEDA will ensure that there are sufficient resources (replicas) available to run both pipelines simultaneously, or at least minimize the wait time for the second pipeline. + +Now, once you trigger the both pipelines +horizontalpodautoscaler.autoscaling startnig creating the new agent. + +```yaml +kubectl get all -n az-devops +NAME READY STATUS RESTARTS AGE +pod/azsh-windows-768bd8bdf8-2zddt 1/1 Running 0 112m +pod/azsh-windows-768bd8bdf8-jfcj9 0/1 ContainerCreating 0 5s + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/azsh-windows 1/2 2 1 112m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/azsh-windows-768bd8bdf8 2 2 1 112m + +NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE +horizontalpodautoscaler.autoscaling/keda-hpa-azure-pipelines-scaledobject Deployment/azsh-windows 2/1 (avg) 1 5 1 97m +``` +Already, first pipeline is running on first pod and once the second pod is started running then automatically second pipeline will assign to the second pod. +```yaml +kubectl get all -n az-devops +NAME READY STATUS RESTARTS AGE +pod/azsh-windows-768bd8bdf8-2zddt 1/1 Running 0 113m +pod/azsh-windows-768bd8bdf8-jfcj9 1/1 Running 0 31s + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/azsh-windows 2/2 2 2 113m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/azsh-windows-768bd8bdf8 2 2 2 113m + +NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE +horizontalpodautoscaler.autoscaling/keda-hpa-azure-pipelines-scaledobject Deployment/azsh-windows 1/1 (avg) 1 5 2 97m +``` +Now both the agent are online and ready to execute the pipelines. + +![img](./img/windows-agents-on-kubernetes-2.png.webp) + +So finally without any wait time and without any queue, we successfully completed both the pipelines. + +![img](./img/pipeline-1.png.webp) + +![img](./img/pipeline-2.png.webp) + +Once both the pipelines are executed successfully, the extra agent will be deleted automatically. + +```yaml +kubectl get all -n az-devops +NAME READY STATUS RESTARTS AGE +pod/azsh-windows-768bd8bdf8-2zddt 1/1 Running 0 101m + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/azsh-windows 1/1 1 1 101m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/azsh-windows-768bd8bdf8 1 1 1 101m + +NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE +horizontalpodautoscaler.autoscaling/keda-hpa-azure-pipelines-scaledobject Deployment/azsh-windows 0/1 (avg) 1 5 1 86m +``` + +By leveraging KEDA auto-scaling, you can improve the efficiency of your CI/CD pipeline execution by dynamically adjusting resource allocation based on workload demands, ultimately reducing queue times and improving overall throughput. + + + + + + diff --git a/docs/013-ado-agents/azure-devops-part-4.md b/docs/013-ado-agents/azure-devops-part-4.md new file mode 100644 index 00000000..3f1c6c50 --- /dev/null +++ b/docs/013-ado-agents/azure-devops-part-4.md @@ -0,0 +1,539 @@ +# ⎈ Azure DevOps — Self Hosted Agents on Kubernetes — PART-4 ⎈ + +#### *Build and Deploy ‘Docker in Docker’ SelfHosted Agents 🐳* + + +![img](./img/docker-in-docker.png.webp) + +Welcome to our series on Azure DevOps self-hosted agents! In this blog, we’ll delve into the powerful realm of Docker in Docker (DinD). 
This technique allows us to run Docker commands within a Docker container, opening up a world of possibilities for containerized workflows. +This series on making Docker and Azure DevOps work seamlessly together! In this blog post, we’re exploring a cool trick called Docker in Docker (DinD). It’s like nesting containers within containers, and it’s a game-changer for anyone working with Kubernetes. +Imagine being able to run Docker commands right from your Azure DevOps pipelines, all within Kubernetes. That’s exactly what we’ll show you how to do! +We’ll walk you through creating Docker in Docker images and running them in Kubernetes pods. And we’ll connect everything back to Azure DevOps, so you can run your Docker tasks hassle-free. +Get ready to simplify your Kubernetes workflows and take your DevOps game to the next level with Docker in Docker and Azure DevOps. Let’s dive in! + +### Understanding Docker in Docker +Docker in Docker, as the name suggests, enables us to nest Docker containers within one another. This capability is particularly useful in scenarios where we need to build, test, or deploy Dockerized applications within an isolated environment. + +### Setting up Docker in Docker Image +To begin, we’ll create a Docker in Docker image. This image will allow us to execute Docker commands seamlessly within a container. By encapsulating the Docker runtime environment, we ensure consistency and portability across different platforms. + +Let’s build a Docker in Docker(DinD) image: + +Create **Dockerfile** with below content in your docker environment + +```yaml +# +# Ubuntu Bionic + Docker +# +# Instructions for docker installation taken from: +# https://docs.docker.com/install/linux/docker-ce/ubuntu/ +# + +FROM ubuntu:bionic + +# Docker install +RUN apt-get update && apt-get install --no-install-recommends -y \ + apt-transport-https \ + ca-certificates \ + curl \ + gnupg-agent \ + software-properties-common +RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - +RUN apt-key fingerprint 0EBFCD88 + +RUN add-apt-repository \ + "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ + $(lsb_release -cs) \ + stable" +RUN apt-get update && apt-get install --no-install-recommends -y docker-ce docker-ce-cli containerd.io + +RUN apt update -y && apt upgrade -y && apt install curl git jq libicu60 -y + +# Also can be "linux-arm", "linux-arm64". +ENV TARGETARCH="linux-x64" + +WORKDIR /azp/ + +COPY ./start.sh ./ +RUN chmod +x ./start.sh + +ENV AGENT_ALLOW_RUNASROOT="true" + +# # Set start.sh script as ENTRYPOINT. +ENTRYPOINT ["/azp/start.sh"] +``` +Create a start.sh script within the same path with below content(this script will help us to start the docker inside the container and connect to the Azure DevOps agent pool) + +```yaml +#!/bin/bash +set -e + +print_header() { + lightcyan="\033[1;36m" + nocolor="\033[0m" + echo -e "\n${lightcyan}$1${nocolor}\n" +} + +print_header "Starting the Docker Process..." +dockerd > /var/log/dockerd.log 2>&1 & + +if [ -z "${AZP_URL}" ]; then + echo 1>&2 "error: missing AZP_URL environment variable" + exit 1 +fi + +if [ -z "${AZP_TOKEN_FILE}" ]; then + if [ -z "${AZP_TOKEN}" ]; then + echo 1>&2 "error: missing AZP_TOKEN environment variable" + exit 1 + fi + + AZP_TOKEN_FILE="/azp/.token" + echo -n "${AZP_TOKEN}" > "${AZP_TOKEN_FILE}" +fi + +unset AZP_TOKEN + +if [ -n "${AZP_WORK}" ]; then + mkdir -p "${AZP_WORK}" +fi + +cleanup() { + trap "" EXIT + + if [ -e ./config.sh ]; then + print_header "Cleanup. 
Removing Azure Pipelines agent..." + + # If the agent has some running jobs, the configuration removal process will fail. + # So, give it some time to finish the job. + while true; do + ./config.sh remove --unattended --auth "PAT" --token $(cat "${AZP_TOKEN_FILE}") && break + + echo "Retrying in 30 seconds..." + sleep 30 + done + fi +} + +# Let the agent ignore the token env variables +export VSO_AGENT_IGNORE="AZP_TOKEN,AZP_TOKEN_FILE" + +print_header "1. Determining matching Azure Pipelines agent..." + +AZP_AGENT_PACKAGES=$(curl -LsS \ + -u user:$(cat "${AZP_TOKEN_FILE}") \ + -H "Accept:application/json;" \ + "${AZP_URL}/_apis/distributedtask/packages/agent?platform=${TARGETARCH}&top=1") + +AZP_AGENT_PACKAGE_LATEST_URL=$(echo "${AZP_AGENT_PACKAGES}" | jq -r ".value[0].downloadUrl") + +if [ -z "${AZP_AGENT_PACKAGE_LATEST_URL}" -o "${AZP_AGENT_PACKAGE_LATEST_URL}" == "null" ]; then + echo 1>&2 "error: could not determine a matching Azure Pipelines agent" + echo 1>&2 "check that account "${AZP_URL}" is correct and the token is valid for that account" + exit 1 +fi + +print_header "2. Downloading and extracting Azure Pipelines agent..." + +curl -LsS "${AZP_AGENT_PACKAGE_LATEST_URL}" | tar -xz & wait $! + +source ./env.sh + +trap "cleanup; exit 0" EXIT +trap "cleanup; exit 130" INT +trap "cleanup; exit 143" TERM + +print_header "3. Configuring Azure Pipelines agent..." + +./config.sh --unattended \ + --agent "${AZP_AGENT_NAME:-$(hostname)}" \ + --url "${AZP_URL}" \ + --auth "PAT" \ + --token $(cat "${AZP_TOKEN_FILE}") \ + --pool "${AZP_POOL:-Default}" \ + --work "${AZP_WORK:-_work}" \ + --replace \ + --acceptTeeEula & wait $! + +print_header "4. Running Azure Pipelines agent..." + +chmod +x ./run.sh + +# To be aware of TERM and INT signals call ./run.sh +# Running it with the --once flag at the end will shut down the agent after the build is executed +./run.sh "$@" & wait $! +``` +Now build the image using below command + +```yaml +docker build -t : . +``` +Remember to replace dind-image and dind-tag with your desired image name and tag. + +Let’s check the newly created image using below command + +```yaml +docker images +``` +Following these steps ensures the creation of a Docker image capable of running Docker commands within containers i.e., Docker in Docker (DinD). + +### Deploying Docker in Docker(DinD) image in Kubernetes +Before diving into deploying our Docker in Docker(DinD) image in Kubernetes, it’s crucial to prepare our Kubernetes cluster to support Docker in Docker without necessitating root privileges. To accomplish this, we’ll employ Sysbox, a powerful tool that facilitates running Docker in Docker within Kubernetes pods without the need for elevated permissions. + +By leveraging Sysbox, we can seamlessly execute Docker commands within our pods, bypassing the requirement for root privileges on the worker nodes’ Docker daemon. This not only simplifies the setup process but also enhances security by mitigating potential root access vulnerabilities. + +Before proceeding with our Docker in Docker deployment, let’s first establish Sysbox within our Kubernetes cluster. This foundational step ensures a smooth and secure environment for our containerized workflows. + +***Setting up sysbox*** + +Installation is easily done via a daemonset called “sysbox-deploy-k8s”, which installs the Sysbox and CRI-O binaries onto the desired K8s nodes and performs all associated config. 
+ +***Select the specific worker nodes to run your sysbox using labels so that other workloads cannot be disturbed*** + +**Steps:** + +Add labels to the target worker nodes + +```yaml +kubectl label nodes sysbox-install=yes +``` +Create the sysbox-daemon.yaml manifest file using below content: + +```yaml +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: sysbox-label-node + namespace: kube-system +--- +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: sysbox-node-labeler +rules: +- apiGroups: [""] + resources: ["nodes"] + verbs: ["get", "patch"] +--- +kind: ClusterRoleBinding +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: sysbox-label-node-rb +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: sysbox-node-labeler +subjects: +- kind: ServiceAccount + name: sysbox-label-node + namespace: kube-system +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: sysbox-deploy-k8s + namespace: kube-system +spec: + selector: + matchLabels: + sysbox-install: "yes" + template: + metadata: + labels: + sysbox-install: "yes" + spec: + serviceAccountName: sysbox-label-node + nodeSelector: + sysbox-install: "yes" + tolerations: + - key: "sysbox-runtime" + operator: "Equal" + value: "not-running" + effect: "NoSchedule" + containers: + - name: sysbox-deploy-k8s + image: registry.nestybox.com/nestybox/sysbox-deploy-k8s:v0.6.3 + imagePullPolicy: Always + command: [ "bash", "-c", "/opt/sysbox/scripts/sysbox-deploy-k8s.sh ce install" ] + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + securityContext: + privileged: true + volumeMounts: + - name: host-etc + mountPath: /mnt/host/etc + - name: host-osrelease + mountPath: /mnt/host/os-release + - name: host-dbus + mountPath: /var/run/dbus + - name: host-run-systemd + mountPath: /run/systemd + - name: host-lib-systemd + mountPath: /mnt/host/lib/systemd/system + - name: host-etc-systemd + mountPath: /mnt/host/etc/systemd/system + - name: host-lib-sysctl + mountPath: /mnt/host/lib/sysctl.d + - name: host-opt-lib-sysctl + mountPath: /mnt/host/opt/lib/sysctl.d + - name: host-usr-bin + mountPath: /mnt/host/usr/bin + - name: host-opt-bin + mountPath: /mnt/host/opt/bin + - name: host-usr-local-bin + mountPath: /mnt/host/usr/local/bin + - name: host-opt-local-bin + mountPath: /mnt/host/opt/local/bin + - name: host-usr-lib-mod-load + mountPath: /mnt/host/usr/lib/modules-load.d + - name: host-opt-lib-mod-load + mountPath: /mnt/host/opt/lib/modules-load.d + - name: host-run + mountPath: /mnt/host/run + - name: host-var-lib + mountPath: /mnt/host/var/lib + volumes: + - name: host-etc + hostPath: + path: /etc + - name: host-osrelease + hostPath: + path: /etc/os-release + - name: host-dbus + hostPath: + path: /var/run/dbus + - name: host-run-systemd + hostPath: + path: /run/systemd + - name: host-lib-systemd + hostPath: + path: /lib/systemd/system + - name: host-etc-systemd + hostPath: + path: /etc/systemd/system + - name: host-lib-sysctl + hostPath: + path: /lib/sysctl.d + - name: host-opt-lib-sysctl + hostPath: + path: /opt/lib/sysctl.d + - name: host-usr-bin + hostPath: + path: /usr/bin/ + - name: host-opt-bin + hostPath: + path: /opt/bin/ + - name: host-usr-local-bin + hostPath: + path: /usr/local/bin/ + - name: host-opt-local-bin + hostPath: + path: /opt/local/bin/ + - name: host-usr-lib-mod-load + hostPath: + path: /usr/lib/modules-load.d + - name: host-opt-lib-mod-load + hostPath: + path: /opt/lib/modules-load.d + - name: host-run + hostPath: + path: /run + - 
name: host-var-lib + hostPath: + path: /var/lib + updateStrategy: + rollingUpdate: + maxUnavailable: 1 + type: RollingUpdate +--- +apiVersion: node.k8s.io/v1 +kind: RuntimeClass +metadata: + name: sysbox-runc +handler: sysbox-runc +scheduling: + nodeSelector: + sysbox-runtime: running +--- +``` +Now let’s deploy the above daemonset using below command +```yaml +kubectl apply -f sysbox-daemon.yaml +``` +Make sure the sysbox-deploy-k8s daemonset pods are up and runing without any errors. +```yaml +kubectl get pods -n kube-system -l sysbox-install=yes +NAME READY STATUS RESTARTS AGE +sysbox-deploy-k8s-7zhbl 1/1 Running 1 24h +``` +```yaml +kubectl get nodes -l sysbox-install=yes +NAME STATUS ROLES AGE VERSION +aks-userpool-48158934-vmss000000 Ready agent 25h v1.27.9 +``` +### Now let’s deploy our Docker in Docker(DinD) +Create dind-deploy.yaml manifest file using below content: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: azsh-dind + namespace: az-devops + labels: + app: azsh-dind-agent +spec: + replicas: 1 + selector: + matchLabels: + app: azsh-dind-agent + template: + metadata: + labels: + app: azsh-dind-agent + annotations: + io.kubernetes.cri-o.userns-mode: "auto:size=65536" + spec: + runtimeClassName: sysbox-runc + containers: + - name: dind + image: anvesh35/azsh-dind-agent + command: ["/azp/start.sh"] + env: + - name: AZP_URL + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_URL + - name: AZP_TOKEN + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_TOKEN + - name: AZP_POOL + valueFrom: + secretKeyRef: + name: azdevops + key: AZP_POOL +``` + +Create Kubernetes secret which is used to connect & authenticate withthe Azure DevOps + +```yaml +kubectl -n az-devops create secret generic azdevops \ + --from-literal=AZP_URL=https://dev.azure.com/yourOrg \ + --from-literal=AZP_TOKEN=YourPAT \ + --from-literal=AZP_POOL=NameOfYourPool +``` +Now apply above dind-deploy.yaml manifest file using the below commands +```yaml +kubectl apply -f dind-deploy.yaml +``` +Verify that the DinD agent pods are created and running successfully by checking their status with below command + +```yaml +kubectl get pods -n az-devops +``` +```yaml +NAME READY STATUS RESTARTS AGE +azsh-dind-5d74b8576c-ns4n4 1/1 Running 0 3h +``` +Now Azure DevOps DinD Self-Hosted Agent pod is up and running. Now, let’s ensure it’s available in the Azure DevOps Agent Pool by following these steps: + +1. **Login to Azure DevOps Portal:** Go to the Azure DevOps portal and log in with your credentials. +2. **Navigate to Project Settings:** Once logged in, navigate to your project settings. You can usually find this option in the bottom-left corner of the Azure DevOps portal. +3. **Click on Agent Pools:** In the project settings, click on the “Agent pools” option. This will take you to the page where you can manage agent pools for your project. +4. **Select the Target Agent Pool:** Choose the target agent pool that you are using within the secret mentioned in your setup. This is the pool where your DinD Self-Hosted Agent pod should be available. +5. **Go to the Agents Section:** Within the selected agent pool, navigate to the “Agents” section. Here, you should be able to see a list of agents registered in this pool. +6. **Verify Your DinD Agent:** Look for your DinD Self-Hosted Agent in the list. It should be listed here if it’s successfully registered and connected to the Azure DevOps Agent Pool. 
+ + +By following these steps, you can ensure that your DinD Self-Hosted Agent is correctly configured and available in the desired agent pool within Azure DevOps. If you encounter any issues, double-check the configuration and connectivity settings of your agent. + +![img](./img/dind-agent.png.webp) + +#### Running Docker commands on DinD Self-Hosted Agents +Lets create a new sample DinD pipeline using below steps. + +Create a new file in **/pipelines/sample-dind-pipeline.yaml** path in your Azure DevOps repository using below content. + +```yaml +trigger: none + +pool: agent-on-kubernetes + +resources: + repositories: + - repository: templates + displayname: templates from devops + type: git + name: selfhosted-agents-kubernetes/selfhosted-agents-on-k8s + ref: main + +stages: +- stage: Deploy + jobs: + - deployment: agentSetup + environment: test + displayName: 'Agent Setup' + strategy: + runOnce: + deploy: + steps: + #Branch Checkout + - checkout: self + persistCredentials: true + + - template: /scripts/dind/hello-world.yaml@templates +``` + +Create the first simeple docker commands script /scripts/kubernetes/hello-world.yaml to run the simple Hello World container on DinD Self-Hosted Agent on kubernetes. +```yaml +steps: + - bash: | + echo "Running Hello World Docker image" + docker run hello-world + displayName: 'Test Docker Commands' +``` +Now create a new pipeline using below steps. + +1. Select the project where you want to create the YAML pipeline. +2. Click on the “Pipelines” menu option in the left sidebar. +3. You should see a button labeled “New pipeline” on the Pipelines page. Click on it to start creating a new pipeline. +4. Choose the repository where your code is located. Azure DevOps supports Git repositories, GitHub repositories, and others. +5. Choose where your YAML file is located(i.e., /pipelines/sample-dind-pipeline.yaml) +6. Now save the pipeline. +7. Now you should see your pipeline listed in the Pipelines page of your Azure DevOps project. You can review the pipeline configuration and manually trigger a run to test it. + +That’s it! You’ve successfully created a YAML pipeline in Azure DevOps and ran it using **DinD Self-Hosted Agent** from kubernetes cluster. +![img](./img/dind-pipeline.png.webp) + +![img](./img/dind-run-on-k8s-node.png.webp) + +![img](./img/dind-output-feom-k8s-pod.png.webp) + +Kubernetes DinD pod logs: +```yaml +$ kubectl logs azsh-dind-5d74b8576c-ns4n4 -n az-devops --tail=10 +Testing agent connection. +2024-03-14 00:15:57Z: Settings Saved. + +4. Running Azure Pipelines agent... + +Scanning for tool capabilities. +Connecting to the server. +2024-03-14 00:15:58Z: Listening for Jobs +2024-03-14 00:28:10Z: Running job: Agent Setup +2024-03-14 00:28:19Z: Job Agent Setup completed with result: Succeeded +``` +### Advantages of Docker in Docker(DinD) for Azure DevOps + +- **Enhanced Isolation:** Docker in Docker provides a sandboxed environment for executing Docker commands, minimizing interference with the host system. +- **Improved Portability:** By encapsulating the Docker runtime environment within a container, we ensure consistent behavior across different environments. +- **Seamless Integration:** Integrating Docker in Docker with Azure DevOps self-hosted agents enables us to seamlessly incorporate Docker tasks into our CI/CD pipelines. +- **Scalability:** Leveraging Kubernetes for orchestration allows us to scale our containerized workloads dynamically, ensuring optimal resource utilization. 
+- **Reduced Privilege Requirements:** With DinD, Docker commands can be executed within containers without requiring root privileges on the host machine. This is particularly advantageous in Kubernetes environments where granting root access to containers can introduce security risks. By utilizing DinD, Kubernetes clusters can maintain stricter security postures by minimizing the need for elevated permissions. +- **Enhanced Resource Utilization:** Kubernetes excels at efficiently managing containerized workloads, including DinD containers. By orchestrating DinD containers alongside other Kubernetes resources, such as pods and deployments, clusters can optimize resource utilization while maintaining security boundaries. This ensures that resources are allocated appropriately and reduces the risk of resource contention or abuse. \ No newline at end of file diff --git a/docs/013-ado-agents/azure-devops-part-5.md b/docs/013-ado-agents/azure-devops-part-5.md new file mode 100644 index 00000000..68a5c61e --- /dev/null +++ b/docs/013-ado-agents/azure-devops-part-5.md @@ -0,0 +1,59 @@ +# ⎈ Azure DevOps — Self Hosted Agents on Kubernetes — PART-5 ⎈ +#### *Deploy Linux, Windows, DinD Self-Hosted Agents using Helm Charts 🐳* + +![img](./img/helm-charts.png.gif) + +Welcome to Part-5. + In our journey towards optimizing our CI/CD workflows, we’ve explored setting up Linux, Windows, and Docker-in-Docker (DinD) self-hosted agents, integrating them into Kubernetes, and ensuring seamless connections with Azure DevOps. However, managing these agents individually through manifest files can become cumbersome in real-world projects. To streamline this process, we’re introducing Helm charts for deploying these agents. +### Introducing Helm Charts +Helm charts offer a convenient way to manage the deployment of complex applications and services on Kubernetes. By encapsulating the configuration details into reusable templates, Helm charts simplify the deployment process and enable better control over various components. + +### Helm Chart Structure + +```yaml +az-selfhosted-agents/ + ├── charts/ + ├── templates/ + │ ├── dind-deploy.yaml + │ ├── windows-deploy.yaml + │ ├── linux-deploy.yaml + │ ├── secret.yaml + │ ├── sysbox-install.yaml + │ ├── _helpers.tpl + ├── values.yaml + ├── .helmignore + ├── Chart.yaml + ├── LICENSE + └── README.md +``` + +In our Helm chart, we’ve consolidated the deployment manifest files for all three types of agents(Linux, Windows, and DinD) along with the necessary configurations for secrets and Sysbox setup. + +### Deploying Agents with Helm Charts + +In this part, we’ll deploy Linux, Windows, and DinD agents using a single Helm chart. The flexibility of Helm allows us to selectively install or skip specific Self-Hosted Agents based on our project requirements. + +By default, all three types of agents (Linux, Windows, and DinD) are disabled in the Helm chart. To install specific agents, we can use the following command: + + +Example: Linux + +```yaml +helm install az-selfhosted-agents ./az-selfhosted-agents \ + --set linux.enabled=true \ + --create-namespace -n az-devops +``` +This command creates a new namespace az-devops and installs the specified agents i.e., Linux Agent. 
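Before enabling more agents, it can help to confirm what the release actually deployed. The commands below assume the release name and namespace used above:

```yaml
helm status az-selfhosted-agents -n az-devops
kubectl get deployments,pods -n az-devops
```

At this point only the Linux agent workload should be present, since the other agent types are still disabled in the chart values.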
+ +Alternatively, if you want to install all agents, you can use the following command: + +```yaml +helm install az-selfhosted-agents ./az-selfhosted-agents \ + --set windows.enabled=true \ + --set linux.enabled=true \ + --set dind.enabled=true \ + --create-namespace -n az-devops +``` + +## Conclusion +With Helm charts, managing the deployment of self-hosted agents becomes more efficient and scalable. By leveraging Helm’s capabilities, we can easily configure and deploy agents according to our project requirements, simplifying the CI/CD pipeline setup process. diff --git a/docs/013-ado-agents/img/agent-on-kubernetes.png.webp b/docs/013-ado-agents/img/agent-on-kubernetes.png.webp new file mode 100644 index 00000000..40add324 Binary files /dev/null and b/docs/013-ado-agents/img/agent-on-kubernetes.png.webp differ diff --git a/docs/013-ado-agents/img/auto-scaling.png.webp b/docs/013-ado-agents/img/auto-scaling.png.webp new file mode 100644 index 00000000..d7393b22 Binary files /dev/null and b/docs/013-ado-agents/img/auto-scaling.png.webp differ diff --git a/docs/013-ado-agents/img/dind-agent.png.webp b/docs/013-ado-agents/img/dind-agent.png.webp new file mode 100644 index 00000000..f1f9a7e2 Binary files /dev/null and b/docs/013-ado-agents/img/dind-agent.png.webp differ diff --git a/docs/013-ado-agents/img/dind-output-feom-k8s-pod.png.webp b/docs/013-ado-agents/img/dind-output-feom-k8s-pod.png.webp new file mode 100644 index 00000000..17182eac Binary files /dev/null and b/docs/013-ado-agents/img/dind-output-feom-k8s-pod.png.webp differ diff --git a/docs/013-ado-agents/img/dind-pipeline.png.webp b/docs/013-ado-agents/img/dind-pipeline.png.webp new file mode 100644 index 00000000..e93386c9 Binary files /dev/null and b/docs/013-ado-agents/img/dind-pipeline.png.webp differ diff --git a/docs/013-ado-agents/img/dind-run-on-k8s-node.png.webp b/docs/013-ado-agents/img/dind-run-on-k8s-node.png.webp new file mode 100644 index 00000000..7007c084 Binary files /dev/null and b/docs/013-ado-agents/img/dind-run-on-k8s-node.png.webp differ diff --git a/docs/013-ado-agents/img/docker-in-docker.png.webp b/docs/013-ado-agents/img/docker-in-docker.png.webp new file mode 100644 index 00000000..9dee9e02 Binary files /dev/null and b/docs/013-ado-agents/img/docker-in-docker.png.webp differ diff --git a/docs/013-ado-agents/img/hello-world.png.webp b/docs/013-ado-agents/img/hello-world.png.webp new file mode 100644 index 00000000..6879533c Binary files /dev/null and b/docs/013-ado-agents/img/hello-world.png.webp differ diff --git a/docs/013-ado-agents/img/helm-charts.png.gif b/docs/013-ado-agents/img/helm-charts.png.gif new file mode 100644 index 00000000..6c74b209 Binary files /dev/null and b/docs/013-ado-agents/img/helm-charts.png.gif differ diff --git a/docs/013-ado-agents/img/hostname.png.webp b/docs/013-ado-agents/img/hostname.png.webp new file mode 100644 index 00000000..a6923f39 Binary files /dev/null and b/docs/013-ado-agents/img/hostname.png.webp differ diff --git a/docs/013-ado-agents/img/kubernetes.png.webp b/docs/013-ado-agents/img/kubernetes.png.webp new file mode 100644 index 00000000..408e5c4b Binary files /dev/null and b/docs/013-ado-agents/img/kubernetes.png.webp differ diff --git a/docs/013-ado-agents/img/linux-agent.png.webp b/docs/013-ado-agents/img/linux-agent.png.webp new file mode 100644 index 00000000..ec9b5925 Binary files /dev/null and b/docs/013-ado-agents/img/linux-agent.png.webp differ diff --git a/docs/013-ado-agents/img/pipeline-1.png.webp b/docs/013-ado-agents/img/pipeline-1.png.webp new 
file mode 100644 index 00000000..6e9e1491 Binary files /dev/null and b/docs/013-ado-agents/img/pipeline-1.png.webp differ diff --git a/docs/013-ado-agents/img/pipeline-2.png.webp b/docs/013-ado-agents/img/pipeline-2.png.webp new file mode 100644 index 00000000..7807b547 Binary files /dev/null and b/docs/013-ado-agents/img/pipeline-2.png.webp differ diff --git a/docs/013-ado-agents/img/pipeline.png.webp b/docs/013-ado-agents/img/pipeline.png.webp new file mode 100644 index 00000000..27cc4c27 Binary files /dev/null and b/docs/013-ado-agents/img/pipeline.png.webp differ diff --git a/docs/013-ado-agents/img/sample-pipeline.png.webp b/docs/013-ado-agents/img/sample-pipeline.png.webp new file mode 100644 index 00000000..032b0fdc Binary files /dev/null and b/docs/013-ado-agents/img/sample-pipeline.png.webp differ diff --git a/docs/013-ado-agents/img/sample-run.png.webp b/docs/013-ado-agents/img/sample-run.png.webp new file mode 100644 index 00000000..7c0b0c5f Binary files /dev/null and b/docs/013-ado-agents/img/sample-run.png.webp differ diff --git a/docs/013-ado-agents/img/windows-agents-on-kubernetes-2.png.webp b/docs/013-ado-agents/img/windows-agents-on-kubernetes-2.png.webp new file mode 100644 index 00000000..ed75941e Binary files /dev/null and b/docs/013-ado-agents/img/windows-agents-on-kubernetes-2.png.webp differ diff --git a/docs/013-ado-agents/img/windows-agents-on-kubernetes.png.webp b/docs/013-ado-agents/img/windows-agents-on-kubernetes.png.webp new file mode 100644 index 00000000..d6fb49bf Binary files /dev/null and b/docs/013-ado-agents/img/windows-agents-on-kubernetes.png.webp differ diff --git a/docs/013-ado-agents/img/windows-agents.png.webp b/docs/013-ado-agents/img/windows-agents.png.webp new file mode 100644 index 00000000..ca15306a Binary files /dev/null and b/docs/013-ado-agents/img/windows-agents.png.webp differ diff --git a/docs/deployment-strategies/blue-green.md b/docs/deployment-strategies/blue-green.md index bb932665..a7a42820 100644 --- a/docs/deployment-strategies/blue-green.md +++ b/docs/deployment-strategies/blue-green.md @@ -16,6 +16,7 @@ In the fast-paced world of software development, deploying updates and new featu #### What is Blue-Green Deployment? Blue-Green Deployment is a technique used to release software updates with minimal downtime and risk. In this approach, two identical environments, typically referred to as “blue” and “green,” are set up: one represents the currently live production environment (blue), while the other is a clone where the new version is deployed (green). Once the new version in the green environment is tested and ready, traffic is switched from blue to green, making the green environment the new production environment. + ![Blue-Green Deployment Flowchart](./img/blue-green-deployment-flowchart.png.webp) ### Benefits of Blue-Green Deployment: @@ -156,8 +157,10 @@ replicaset.apps/green-deploy-6c976bd585 3 3 3 15m ``` Let’s try to access the application to verify the traffic and functionality. 
#### Testing-blue-application + ![blue-application](./img/testing-blue-applications.webp) + Or we can use the below curl command to test the traffic: ```yaml $ for i in $(seq 1 10); do curl ; done | grep -o '[^<]*' | sed 's/<[^>]*>//g' @@ -230,6 +233,7 @@ kubectl apply -f svc-manifest.yaml Once the service is routed to older version(v1.0.0 from Blue environment) then verify the traffic status: #### Rollout-older-version + ![Rollout-older-version](./img/rollout-older-version.png.webp) diff --git a/docs/deployment-strategies/canary.md b/docs/deployment-strategies/canary.md index 1e108c3e..3c62ac31 100644 --- a/docs/deployment-strategies/canary.md +++ b/docs/deployment-strategies/canary.md @@ -14,6 +14,7 @@ In Kubernetes, canary deployments are achieved by running multiple versions of a ![Canary Deployment](./img/canary-deployment.png.webp) + ### Why Canary Deployment? Canary deployments offer several advantages, including: **Risk Reduction:** By exposing only a small percentage of users to the new version, you can mitigate the impact of potential bugs or issues before they affect the entire user base. @@ -182,6 +183,7 @@ base-app-5dbddc57c5-n8l5q The output from the previous command shows that the curl command hits the older version (base application) 7 times and the newer version (canary application) 3 times. **Old-version** + ![Old-version](./img/old-version.png.webp) **New-version** @@ -193,5 +195,3 @@ The output from the previous command shows that the curl command hits the older - - diff --git a/docs/deployment-strategies/rolling-and-recreate-update.md b/docs/deployment-strategies/rolling-and-recreate-update.md index 55aed01d..cc63d3d2 100644 --- a/docs/deployment-strategies/rolling-and-recreate-update.md +++ b/docs/deployment-strategies/rolling-and-recreate-update.md @@ -9,6 +9,7 @@ sidebar_position: 3 #### *✨ Choose the Right Strategy for Seamless Deployments* + ![Rolling Update and Recreate Deployment Strategies](./img/rolling-and-recreate-strategies.png.webp) When deploying applications in Kubernetes, choosing the right deployment strategy is crucial for ensuring minimal downtime and smooth updates. Two common strategies are Rolling Updates and Recreate. In this blog post, we’ll explore these strategies, their differences, and how to configure them effectively. @@ -24,6 +25,7 @@ Rolling update is the default deployment strategy in Kubernetes. It ensures that **2. Gradual Rollout:** It ensures that a specified number of new pods are available and healthy before terminating old ones. **3. Controlled Progress:** The rollout is managed by controlling parameters like maxUnavailable and maxSurge. ### Flow-chart + ![Deployment Strategies](./img/deployment-strategies.png.webp) ### Animated Flowchart diff --git a/docs/monitoring/alertmanager.md b/docs/monitoring/alertmanager.md index 7251eeb8..a6574c7a 100644 --- a/docs/monitoring/alertmanager.md +++ b/docs/monitoring/alertmanager.md @@ -5,85 +5,241 @@ sidebar_id: "alertmanager" sidebar_position: 3 --- -# Alertmanager: Managing Alerts in Kubernetes +# ⎈ A Hands-On Guide: Setting Up Prometheus and AlertManager in Kubernetes with Custom Alerts 🛠️ -Alertmanager is a component of the Prometheus ecosystem that handles alerts generated by Prometheus. It routes, deduplicates, and manages alerts, ensuring that critical issues are brought to the attention of the right people. This guide provides an overview of Alertmanager, its benefits, and how to set it up in a Kubernetes cluster. 
+#### *⇢ Understanding Prometheus & AlertManager Setup in Kubernetes with Custom Rules: A Comprehensive Guide* ---- -
-

🚧 Work in Progress

-

This page is currently under construction. Please check back later for detailed information about Alertmanager setup and usage in Kubernetes.

-
---- +![img](./img/alertmanager.png.webp) -## Table of Contents -- [Introduction](#introduction) -- [Why Use Alertmanager?](#why-use-alertmanager) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Best Practices](#best-practices) +Monitoring your Kubernetes cluster is crucial for maintaining the health and performance of your applications. In this guide, we’ll walk through setting up Prometheus and Alertmanager using Helm and configuring custom alert rules to monitor your cluster effectively. +If you haven’t already, I recommend checking out my previous blog post on Kubernetes monitoring using Prometheus and Grafana for a comprehensive overview of setting up Prometheus and Grafana. ---- +### Prerequisites -## Introduction -Alertmanager is a critical component for managing alerts in Kubernetes. It works with Prometheus to ensure that alerts are routed to the appropriate channels, such as email, Slack, or PagerDuty, and provides mechanisms for silencing and grouping alerts. +Before we start, ensure you have the following: ---- +- A running Kubernetes cluster. +- Helm installed on your local machine. -## Why Use Alertmanager? -- **Centralized Alert Management**: Consolidates alerts from multiple Prometheus instances. -- **Routing and Notification**: Sends alerts to the right people or systems based on defined rules. -- **Deduplication**: Prevents duplicate alerts from overwhelming notification channels. -- **Silencing**: Temporarily suppresses alerts during maintenance or known issues. +![img](./img/custom-alerts.png.gif) ---- +### Step 1: Install Prometheus and Alertmanager using Helm -## Architecture -Alertmanager works as follows: -1. **Prometheus**: Generates alerts based on defined rules. -2. **Alertmanager**: Receives alerts from Prometheus and processes them. -3. **Notification Channels**: Sends alerts to configured channels like email, Slack, or PagerDuty. +We’ll use the kube-prometheus-stack Helm chart from the Prometheus community. This chart includes Prometheus, Alertmanager, and Grafana, along with several pre-configured dashboards and alerting rules. ---- -## Installation -> **Note:** Detailed installation steps will be added soon. +First, create a custom-values.yaml file to specify our custom configurations: + +```yaml +# custom-values.yaml +prometheus: + service: + type: NodePort +grafana: + service: + type: NodePort +alertmanager: + service: + type: NodePort +``` +Next, install the kube-prometheus-stack using Helm: + +```yaml +helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f custom-values.yaml +``` +This command will deploy Prometheus, Alertmanager, and Grafana to your cluster with the services exposed as NodePort. + +![img](./img/alert-manager-architecture.png.webp) + +### Step 2: Verifying the Setup +To verify that Prometheus and Alertmanager are running correctly, you can access their web UIs. Since we exposed their services as NodePort, you can use kubectl port-forward to access them locally or you can use external IP of the cluster and nodeport of the respective service. 
+ +For Prometheus: + +![img](./img/prometheus-ui.png.webp) + + +For Alertmanager: + +![img](./img/alertmanager-ui.png.webp) + +For Grafana: + +![img](./img/grafana-dashboard.png.webp) + +Access the default Alertmanager rules: +To access the alertmanager rules/alerts, navigate to Alerts section on prometheus UI: + + +![img](./img/alerts-in-prometheus-ui.png.webp) + + +Here we can see that three alerts are in Firing state, so these alerts we can see in AlertManager UI to manage: + +![img](./img/alerts-fired.png.webp) + +### Step 3: Configuring Custom Alert Rules +From the above steps we can see that the default alerts are configured in prometheus and alertmanager. Now, let’s add custom alert rules to monitor our Kubernetes cluster. We’ll create a PrometheusRule manifest to define these alerts. + +Create a file named custom-alert-rules.yaml with the following content: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + labels: + app: kube-prometheus-stack + app.kubernetes.io/instance: kube-prometheus-stack + release: kube-prometheus-stack + name: kube-pod-not-ready +spec: + groups: + - name: my-pod-demo-rules + rules: + - alert: KubernetesPodNotHealthy + expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0 + for: 1m + labels: + severity: critical + annotations: + summary: Kubernetes Pod not healthy (instance {{ $labels.instance }}) + description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: KubernetesDaemonsetRolloutStuck + expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0 + for: 10m + labels: + severity: warning + annotations: + summary: Kubernetes DaemonSet rollout stuck (instance {{ $labels.instance }}) + description: "Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled or not ready\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: ContainerHighCpuUtilization + expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, container) / sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""}) by (pod, container) * 100) > 80 + for: 2m + labels: + severity: warning + annotations: + summary: Container High CPU utilization (instance {{ $labels.instance }}) + description: "Container CPU utilization is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: ContainerHighMemoryUsage + expr: (sum(container_memory_working_set_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) > 80 + for: 2m + labels: + severity: warning + annotations: + summary: Container High Memory usage (instance {{ $labels.instance }}) + description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: KubernetesContainerOomKiller + expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1 + for: 0m + labels: + severity: warning + annotations: + summary: Kubernetes Container oom killer (instance {{ $labels.instance }}) + description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has 
been OOMKilled {{ $value }} times in the last 10 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - alert: KubernetesPodCrashLooping + expr: increase(kube_pod_container_status_restarts_total[1m]) > 3 + for: 2m + labels: + severity: warning + annotations: + summary: Kubernetes pod crash looping (instance {{ $labels.instance }}) + description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" +``` +Apply the manifest to your Kubernetes cluster: + +```yaml +kubectl apply -f custom-alert-rules.yaml +``` +Once the PromethuesRule is created then check the newly created alerts on Prometheus UI. + +![img](./img/promethues-rule.png.webp) + +That’s it we have successfully added our new custom alerts on alertmanager. + +### Step 4: Test the custom rules: +To ensure our custom alert rules are working correctly, we’ll simulate a failure by creating a pod with an incorrect image tag. This will help us verify if the alerts are triggered and properly reported in Alertmanager. KubernetesPodNotHealthy alert is responsible to report this alert. + + +1. **Create a Pod with an Invalid Image** + +This will simulate a failure by using an incorrect image tag: + +```yaml +kubectl run nginx-pod --image=nginx:lates3 +``` +Note: The correct tag is latest, so lates3 is intentionally incorrect to cause the pod to fail. + +2. **Verify the Pod Status** + +Check the status of the pod to confirm that it is failing: + +```yaml +kubectl get pods nginx-pod +NAME READY STATUS RESTARTS AGE +nginx-pod 0/1 ImagePullBackOff 0 5m35s +``` + +You should see the pod in a ErrImagePull state. You can also describe the pod for more details ---- -## Configuration -Alertmanager configuration involves defining alert routing, grouping, and notification channels. Example configuration: +```yaml +kubectl describe pod nginx-pod +``` +This will provide information about why the pod is failing. + +3. **Check for Alerts in Alertmanager** + +Since you have set up custom alert rules, these should trigger an alert when the pod fails. Look for alerts related to pod failures. The custom alerts you configured should appear in the Alertmanager interface. + +![img](./img/alert-triggered-on-prometheus.png.webp) +![img](./img/alert-triggered-on-alertmanager.png.webp) + +This process ensures that your custom alerting rules are working correctly and that you are notified when a pod fails. + +### Step 5: Understanding Custom Alert Rules +To better understand how to create and customize alert rules, let’s break down one of the alert rules defined in our custom-alert-rules.yaml. We'll use the KubernetesPodNotHealthy alert as an example: + +```yaml +- alert: KubernetesPodNotHealthy + expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0 + for: 1m + labels: + severity: critical + annotations: + summary: Kubernetes Pod not healthy (instance {{ $labels.instance }}) + description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" +``` +## Alert Fields Breakdown + +**alert:** The name of the alert (KubernetesPodNotHealthy). +**expr:** The Prometheus expression to evaluate. This alert triggers if any pod in a Pending, Unknown, or Failed state is detected. +**for:** The duration for which the condition should be true before the alert fires (1m or 1 minute). +**labels:** Additional labels to categorize the alert. 
In this case, we label it with a severity of critical. +**annotations:** Descriptive information about the alert. These fields can provide context when the alert is triggered: +**— — summary:** A brief description of the alert (Kubernetes Pod not healthy (instance labels.instance )). +**— — description:** A detailed description that includes dynamic values from the alert labels (Pod $labels.namespace /labels.pod has been in a non-running state for longer than 15 minutes.\n VALUE = value \n LABELS = $labels ). + +These fields help to provide clarity and context when an alert is triggered, making it easier to diagnose and respond to issues in your cluster. + + +For more examples of custom Prometheus alert rules, you can refer to this Awesome Prometheus Alerts repository. +### Step 5: Cleanup +If you want to remove Prometheus, Alertmanager, and Grafana from your Kubernetes cluster, you can do so with the following commands: + +1. **Uninstall the Helm Chart:** +```yaml +helm uninstall kube-prometheus-stack +``` +2. **Verify Resources Are Deleted:** +Check that the Prometheus, AlertManager, and Grafana resources have been removed: ```yaml -global: - resolve_timeout: 5m - -route: - receiver: 'email-alerts' - group_by: ['alertname', 'cluster', 'service'] - group_wait: 30s - group_interval: 5m - repeat_interval: 3h - -receivers: - - name: 'email-alerts' - email_configs: - - to: 'alerts@example.com' - from: 'alertmanager@example.com' - smarthost: 'smtp.example.com:587' - auth_username: 'user' - auth_password: 'password' +kubectl get all -l release=kube-prometheus-stack ``` +## Conclusion -## Best Practices -- Use grouping to consolidate similar alerts into a single notification. -- Define silences for planned maintenance windows to avoid unnecessary alerts. -- Integrate Alertmanager with multiple notification channels for redundancy. -- Monitor Alertmanager itself to ensure it is functioning correctly. +In this guide, we have successfully set up Prometheus and Alertmanager in a Kubernetes cluster using Helm and configured custom alert rules to monitor the cluster’s health. We also explored the components of an alert rule to better understand how they work. This setup provides a robust monitoring solution that can be further extended and customized to suit your needs. For more examples of custom Prometheus alert rules, you can refer to this Awesome Prometheus Alerts repository. ---- -Stay tuned for updates as we continue to enhance this guide! \ No newline at end of file diff --git a/docs/monitoring/elk-stack.md b/docs/monitoring/elk-stack.md index f24288cf..78b3085b 100644 --- a/docs/monitoring/elk-stack.md +++ b/docs/monitoring/elk-stack.md @@ -5,94 +5,354 @@ sidebar_id: "elk-stack" sidebar_position: 4 --- -# ELK Stack: Centralized Logging for Kubernetes +# ⎈ A Hands-On Guide to Kubernetes Logging Using ELK Stack & Filebeat ⚙️ -The ELK Stack (Elasticsearch, Logstash, and Kibana) is a popular solution for centralized logging and log analysis. It allows you to collect, process, and visualize logs from Kubernetes clusters, making it easier to monitor and troubleshoot applications. This guide provides an overview of the ELK Stack, its benefits, and how to set it up in a Kubernetes environment. +#### *⇢ A Comprehensive Guide to Setting Up the ELK Stack on Kubernetes with Helm with Practical Example* ---- +![img](./img/elk-and-filebeat.png.webp) -
-

🚧 Work in Progress

-

This page is currently under construction. Please check back later for detailed information about ELK Stack setup and usage in Kubernetes.

-
+In this blog post, we’ll guide you through setting up the ELK stack (Elasticsearch, Logstash, and Kibana) on a Kubernetes cluster using Helm. Helm simplifies the deployment and management of applications on Kubernetes, making it an excellent tool for deploying complex stacks like ELK. We’ll also configure Filebeat to collect and forward logs to Logstash. ---- -## Table of Contents -- [Introduction](#introduction) -- [Why Use the ELK Stack?](#why-use-the-elk-stack) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Best Practices](#best-practices) +### Prerequisites +Before we get started, make sure you have: ---- +- A Kubernetes cluster up and running. -## Introduction -The ELK Stack is a powerful tool for managing and analyzing logs in Kubernetes. It consists of: -- **Elasticsearch**: A distributed search and analytics engine for storing and querying logs. -- **Logstash**: A data processing pipeline that ingests, transforms, and forwards logs to Elasticsearch. -- **Kibana**: A visualization tool for exploring and analyzing logs stored in Elasticsearch. +- Helm installed and configured. ---- +- kubectl installed and configured. -## Why Use the ELK Stack? -- **Centralized Logging**: Collect logs from all Kubernetes pods and nodes in one place. -- **Powerful Querying**: Elasticsearch provides advanced search and analytics capabilities. -- **Visualization**: Kibana offers customizable dashboards for log analysis. -- **Scalability**: The ELK Stack can handle large-scale Kubernetes clusters. +![img](./img/animated-elk-and-filebeat.png.gif) ---- +### Step 1: Install Elasticsearch -## Architecture -The ELK Stack works as follows: -1. **Logstash**: Collects logs from Kubernetes pods and nodes, processes them, and forwards them to Elasticsearch. -2. **Elasticsearch**: Stores the logs and makes them searchable. -3. **Kibana**: Visualizes the logs and provides an interface for querying and analyzing them. +Elasticsearch is the core component of the ELK stack, responsible for storing and indexing logs. We’ll use the official Elasticsearch Helm chart for deployment. ---- +1. **Add the Elastic Helm repository:** -## Installation -> **Note:** Detailed installation steps will be added soon. +```yaml +helm repo add elastic https://helm.elastic.co +helm repo update +``` ---- -## Configuration -The ELK Stack requires configuration for each component: -1. **Logstash**: Define input sources, filters, and output destinations. -2. **Elasticsearch**: Configure storage, indexing, and cluster settings. -3. **Kibana**: Set up dashboards and connect to Elasticsearch. +2. **Create a elasticsearch-values.yaml file with the following content:** -Example Logstash configuration: ```yaml -input { - file { - path => "/var/log/*.log" - type => "kubernetes-logs" - } -} +resources: + requests: + cpu: "200m" + memory: "200Mi" + limits: + cpu: "1000m" + memory: "2Gi" + +antiAffinity: "soft" +``` + + +antiAffinity: "soft": Configures soft anti-affinity, allowing pods to be scheduled on the same node if necessary, but preferring to spread them across nodes when possible. + +3. **Install Elasticsearch:** + +```yaml +helm install elasticsearch elastic/elasticsearch -f elasticsearch-values.yaml +``` + +This command installs Elasticsearch with the specified configurations. + +### Step 2: Configure and Install Filebeat + +Filebeat is a lightweight shipper for forwarding and centralizing log data. 
We’ll configure Filebeat to collect logs from containerized applications and forward them to Logstash. + + +1. **Create a filebeat-values.yaml file with the following content:** + + +```yaml +filebeatConfig: + filebeat.yml: | + filebeat.inputs: + - type: container + paths: + - /var/log/containers/*.log + processors: + - add_kubernetes_metadata: + host: ${NODE_NAME} + matchers: + - logs_path: + logs_path: "/var/log/containers/" + + output.logstash: + hosts: ["logstash-logstash:5044"] +``` + + +**Explanation:** + + +**filebeat.inputs:** Configures Filebeat to collect logs from container directories. The path /var/log/containers/*.log is where Kubernetes stores container logs. +**processors:** Adds Kubernetes metadata to the logs to provide context, such as pod names and namespaces. + +**output.logstash:** Configures Filebeat to send logs to Logstash at port 5044. + +2. **Install Filebeat using Helm:** + + +```yaml +helm install filebeat elastic/filebeat -f filebeat-values.yaml +``` + +This command installs Filebeat with the specified configuration, ensuring that logs are collected from containers and forwarded to Logstash. + +### Step 3: Configure and Install Logstash + + +Logstash processes and transforms logs before indexing them in Elasticsearch. We’ll set up Logstash to receive logs from Filebeat and send them to Elasticsearch. + +1. **Create a logstash-values.yaml file with the following content:** +```yaml +extraEnvs: + - name: "ELASTICSEARCH_USERNAME" + valueFrom: + secretKeyRef: + name: elasticsearch-master-credentials + key: username + - name: "ELASTICSEARCH_PASSWORD" + valueFrom: + secretKeyRef: + name: elasticsearch-master-credentials + key: password + +logstashConfig: + logstash.yml: | + http.host: 0.0.0.0 + xpack.monitoring.enabled: false + +logstashPipeline: + logstash.conf: | + input { + beats { + port => 5044 + } + } + + output { + elasticsearch { + hosts => ["https://elasticsearch-master:9200"] + cacert => "/usr/share/logstash/config/elasticsearch-master-certs/ca.crt" + user => '${ELASTICSEARCH_USERNAME}' + password => '${ELASTICSEARCH_PASSWORD}' + } + } + +secretMounts: + - name: "elasticsearch-master-certs" + secretName: "elasticsearch-master-certs" + path: "/usr/share/logstash/config/elasticsearch-master-certs" + +service: + type: ClusterIP + ports: + - name: beats + port: 5044 + protocol: TCP + targetPort: 5044 + - name: http + port: 8080 + protocol: TCP + targetPort: 8080 + +resources: + requests: + cpu: "200m" + memory: "200Mi" + limits: + cpu: "1000m" + memory: "1536Mi" + +``` + + +**Explanation:** + +**extraEnvs:** Sets environment variables for Elasticsearch authentication using Kubernetes secrets. +**logstashConfig:** Configures Logstash settings, including enabling HTTP and disabling monitoring. +**logstashPipeline:** Configures Logstash to listen on port 5044 for incoming logs from Filebeat and forward them to Elasticsearch. +**secretMounts:** Mounts the Elasticsearch CA certificate for secure communication between Logstash and Elasticsearch. +**service:** Configures Logstash’s service type as ClusterIP, making it accessible only within the cluster. + -filter { - grok { - match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}" } - } -} -output { - elasticsearch { - hosts => ["http://elasticsearch:9200"] - index => "kubernetes-logs-%{+YYYY.MM.dd}" - } +2. 
**Install Logstash using Helm:** + +```yaml +helm install logstash elastic/logstash -f logstash-values.yaml +``` +This command installs Logstash with the specified configuration, ensuring that it can receive logs from Filebeat and forward them to Elasticsearch. + + +### Step 4: Configure and Install Kibana + +Kibana provides a user interface for visualizing and interacting with Elasticsearch data. + +1. **Create a kibana-values.yaml file with the following content:** + +```yaml +service: + type: NodePort + port: 5601 + +resources: + requests: + cpu: "200m" + memory: "200Mi" + limits: + cpu: "1000m" + memory: "2Gi" +``` + +***Explanation:*** + +**service.type: NodePort:** Exposes Kibana on a specific port on all nodes in the Kubernetes cluster. This makes it accessible from outside the cluster for development and testing purposes. + + +**port: 5601:** The default port for Kibana, which is exposed for accessing the Kibana web interface. + +2. **Install Kibana using Helm:** + +```yaml +helm install kibana elastic/kibana -f kibana-values.yaml +``` +This command installs Kibana with the specified configuration, allowing you to access it through the exposed port. + +### Step 5: Access Kibana and View Logs + +Now that Kibana is installed and running, you can access it to visualize and analyze the logs collected by Filebeat and processed by Logstash. + + 1. **Find the NodePort assigned to Kibana:** + + ```yaml + kubectl get svc kibana-kibana -n elk -o jsonpath="{.spec.ports[0].nodePort}" + ``` + + + This command retrieves the NodePort assigned to Kibana, which you will use to access the Kibana web interface. + + 2. **Access Kibana:** + + Open your web browser and navigate to: +```yaml + http://: + ``` + + Replace EXTERNAL-IP with the IP address of your Kubernetes cluster and NODE-PORT with the NodePort value obtained in step 1. + + ![img](./img/kibana-login-page.png.webp) + + 3. **Log in to Kibana:** + + You can get the login credentials for Kibana from the elastic secrets using the below commands. + +```yaml +$ kubectl get secret elasticsearch-master-credentials -o jsonpath="{.data.username}" | base64 --decode + +$ kubectl get secret elasticsearch-master-credentials -o jsonpath="{.data.password}" | base64 --decode +``` +Once you access Kibana, you can start exploring your log data. + +![img](./img/kibana-dashboard.png.webp) + + +Access the logs + +![img](./img/kibana-logs.png.webp) + + +### Step 6: Check Elasticsearch Cluster Health + +To ensure that your Elasticsearch cluster is functioning correctly, you need to verify its health. Here’s how you can check the health of your Elasticsearch cluster: + +**Check Cluster Health:** + +Execute the below command to check the health of your Elasticsearch cluster by querying the _cluster/health endpoint: +```yaml +kubectl exec -it -- curl -XGET -u elastic -vk 'https://elasticsearch-master:9200/_cluster/health?pretty' +``` +**Output:** + +```yaml +$ kubectl exec -it elasticsearch-master-0 -- curl -XGET -u elastic -vk 'https://elasticsearch-master:9200/_cluster/health?pretty' + +Defaulted container "elasticsearch" out of: elasticsearch, configure-sysctl (init) +Enter host password for user 'elastic': +Note: Unnecessary use of -X or --request, GET is already inferred. +* Trying 10.245.158.126:9200... 
+* TCP_NODELAY set +* Connected to elasticsearch-master (10.245.158.126) port 9200 (#0) +* ALPN, offering h2 +* ALPN, offering http/1.1 +* successfully set certificate verify locations: +* CAfile: /etc/ssl/certs/ca-certificates.crt + CApath: /etc/ssl/certs +* TLSv1.3 (OUT), TLS handshake, Client hello (1): +* TLSv1.3 (IN), TLS handshake, Server hello (2): +* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): +* TLSv1.3 (IN), TLS handshake, Certificate (11): +* TLSv1.3 (IN), TLS handshake, CERT verify (15): +* TLSv1.3 (IN), TLS handshake, Finished (20): +* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): +* TLSv1.3 (OUT), TLS handshake, Finished (20): +* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 +* ALPN, server did not agree to a protocol +* Server certificate: +* subject: CN=elasticsearch-master +* start date: Sep 11 00:42:27 2024 GMT +* expire date: Sep 11 00:42:27 2025 GMT +* issuer: CN=elasticsearch-ca +* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway. +* Server auth using Basic with user 'elastic' +> GET /_cluster/health?pretty HTTP/1.1 +> Host: elasticsearch-master:9200 +> Authorization: Basic ZWxhc3RpYzp6a3J6Z2lqd3NDUWlLaDJW +> User-Agent: curl/7.68.0 +> Accept: */* +> +* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): +* Mark bundle as not supporting multiuse +< HTTP/1.1 200 OK +< X-elastic-product: Elasticsearch +< content-type: application/json +< content-length: 468 +< +{ + "cluster_name" : "elasticsearch", + "status" : "green", + "timed_out" : false, + "number_of_nodes" : 3, + "number_of_data_nodes" : 3, + "active_primary_shards" : 13, + "active_shards" : 26, + "relocating_shards" : 0, + "initializing_shards" : 0, + "unassigned_shards" : 0, + "delayed_unassigned_shards" : 0, + "number_of_pending_tasks" : 0, + "number_of_in_flight_fetch" : 0, + "task_max_waiting_in_queue_millis" : 0, + "active_shards_percent_as_number" : 100.0 } -``` +* Connection #0 to host elasticsearch-master left intact +``` + + + +Review the output to understand the cluster’s health status. + +**Conclusion** -## Best Practices -- Use Kubernetes labels and annotations to organize logs effectively. -- Monitor the resource usage of Elasticsearch and Logstash to ensure they scale with your cluster. -- Set up retention policies in Elasticsearch to manage log storage. -- Regularly back up Elasticsearch data to prevent data loss. -- Use Kibana's visualization features to create dashboards for monitoring application performance and troubleshooting issues. +You’ve now set up the ELK stack on Kubernetes using Helm with the provided configurations! Your setup includes Elasticsearch for storing and indexing logs, Logstash for processing and forwarding logs, Filebeat for collecting and shipping logs, and Kibana for visualizing and analyzing your data. This powerful stack will help you monitor and analyze logs from your containerized applications. ---- -Stay tuned for more detailed information on setting up and using the ELK Stack in Kubernetes! \ No newline at end of file +Feel free to customize these configurations based on your specific requirements and environment. Happy logging! 
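If you later want to tear the stack down, the Helm releases installed in this guide can be removed individually. The commands below assume the release names used above; add `-n <namespace>` if you installed into a dedicated namespace rather than the default one.

```yaml
helm uninstall kibana
helm uninstall logstash
helm uninstall filebeat
helm uninstall elasticsearch
```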
diff --git a/docs/monitoring/grafana-loki.md b/docs/monitoring/grafana-loki.md index 040e9b1f..f3d17f02 100644 --- a/docs/monitoring/grafana-loki.md +++ b/docs/monitoring/grafana-loki.md @@ -5,78 +5,275 @@ sidebar_id: "grafana-loki" sidebar_position: 1 --- -# Grafana Loki: Log Aggregation for Kubernetes +# ⎈ A Hands-On Guide to Kubernetes Logging Using Grafana Loki ⚙️ -Grafana Loki is a log aggregation system designed for Kubernetes. It is lightweight, cost-effective, and integrates seamlessly with Grafana for log visualization. This document provides an overview of Grafana Loki, its benefits, and how to set it up in a Kubernetes cluster. +#### *⇢ A Comprehensive Guide to Setting Up the Grafana Loki on Kubernetes with Helm: Practical Example* ---- +![img](./img/logging-grafana-loki.png.webp) -
-

🚧 Work in Progress

-

This page is currently under construction. Please check back later for detailed information about Grafana Loki setup and usage in Kubernetes.

-
+In a microservices architecture, monitoring and logging are essential to keep track of various components. Kubernetes generates a large number of logs, and managing them effectively is key to running a healthy cluster. **Grafana Loki** is a highly efficient logging solution that integrates seamlessly with **Grafana** for visualizing logs, allowing you to query and explore logs from multiple sources in one place. ---- +In this guide, I’ll walk you through setting up Grafana Loki in a Kubernetes cluster using Helm, a package manager for Kubernetes. We will use the Loki Stack, which comes bundled with Loki, Promtail, and optionally Grafana. -## Table of Contents -- [Introduction](#introduction) -- [Why Use Grafana Loki?](#why-use-grafana-loki) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Querying Logs](#querying-logs) -- [Best Practices](#best-practices) ---- +![img](./img/animated-logging-grafana-loki.png.gif) -## Introduction -Grafana Loki is a log aggregation system optimized for Kubernetes. Unlike traditional log aggregation systems, Loki does not index the content of logs but instead indexes metadata such as labels. This makes it highly efficient and cost-effective for Kubernetes environments. +### Prerequisites ---- +Before starting, make sure you have: -## Why Use Grafana Loki? -- **Kubernetes-Native**: Designed to work seamlessly with Kubernetes labels and metadata. -- **Cost-Effective**: Minimal indexing reduces storage and processing costs. -- **Integration with Grafana**: Provides a unified interface for metrics and logs. -- **Scalable**: Can handle large-scale Kubernetes clusters with ease. ---- +- A Kubernetes cluster up and running +- Helm installed on your system +- kubectl configured to interact with your cluster -## Architecture -Grafana Loki consists of the following components: -1. **Promtail**: A lightweight agent that collects logs from Kubernetes pods and forwards them to Loki. -2. **Loki**: The central log aggregation system that stores and indexes logs. -3. **Grafana**: A visualization tool used to query and display logs from Loki. +## Steps to Set Up Grafana Loki on Kubernetes ---- +Once you have the prerequisites in place, follow the steps below to set up Grafana Loki using Helm. -## Installation -> **Note:** Detailed installation steps will be added soon. +### Step 1: Add the Grafana Helm Repository ---- +The first step is to add the Grafana Helm repository, which contains the Helm chart for deploying Loki. -## Configuration -> **Note:** Configuration details for Promtail, Loki, and Grafana will be added soon. +Run the following command to add the Grafana repo to Helm: + +```yaml + helm repo add grafana https://grafana.github.io/helm-charts +``` + +After adding the repository, it’s a good practice to update the Helm repo to ensure you have the latest chart versions. Use the command: + +```yaml +helm repo update +``` + +Now, list every repository with the word “Loki” in it by running: + +```yaml +helm search repo loki +``` + +You should see several results, but we will be using the grafana/loki-stack repository to deploy Promtail and Grafana, and to configure Loki. + +### Step 2: Customize Helm Chart Configuration Values +Before deploying Loki, you may want to customize some of the default values in the Helm chart. This step is especially important if you want to install Grafana alongside Loki or configure other advanced features like persistent storage. 
+ + +First, download the default values of the Loki Helm chart into a YAML file by running: + +```yaml +helm show values grafana/loki-stack > loki-custom-values.yaml +``` +Now, open the loki-values.yaml file and make the following changes to meet your specific configuration needs. +Here is the custom loki-custom-values.yaml file: + +```yaml +test_pod: + enabled: true + image: bats/bats:1.8.2 + pullPolicy: IfNotPresent + +loki: + enabled: true + isDefault: true + url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }} + readinessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + livenessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + datasource: + jsonData: "{}" + uid: "" + + +promtail: + enabled: true + config: + logLevel: info + serverPort: 3101 + clients: + - url: http://{{ .Release.Name }}:3100/loki/api/v1/push + +fluent-bit: + enabled: false + +grafana: + enabled: true + sidecar: + datasources: + label: "" + labelValue: "" + enabled: true + maxLines: 1000 + image: + tag: 10.3.3 + service: + type: NodePort + +prometheus: + enabled: false + isDefault: false + url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }} + datasource: + jsonData: "{}" + +filebeat: + enabled: false + filebeatConfig: + filebeat.yml: | + # logging.level: debug + filebeat.inputs: + - type: container + paths: + - /var/log/containers/*.log + processors: + - add_kubernetes_metadata: + host: ${NODE_NAME} + matchers: + - logs_path: + logs_path: "/var/log/containers/" + output.logstash: + hosts: ["logstash-loki:5044"] + +logstash: + enabled: false + image: grafana/logstash-output-loki + imageTag: 1.0.1 + filters: + main: |- + filter { + if [kubernetes] { + mutate { + add_field => { + "container_name" => "%{[kubernetes][container][name]}" + "namespace" => "%{[kubernetes][namespace]}" + "pod" => "%{[kubernetes][pod][name]}" + } + replace => { "host" => "%{[kubernetes][node][name]}"} + } + } + mutate { + remove_field => ["tags"] + } + } + outputs: + main: |- + output { + loki { + url => "http://loki:3100/loki/api/v1/push" + #username => "test" + #password => "test" + } + # stdout { codec => rubydebug } + } + +# proxy is currently only used by loki test pod +# Note: If http_proxy/https_proxy are set, then no_proxy should include the +# loki service name, so that tests are able to communicate with the loki +# service. +proxy: + http_proxy: "" + https_proxy: "" + no_proxy: "" +``` + +***Key Points in Custom Configuration:*** + + +- **Loki** is enabled and configured with readiness and liveness probes for health checking. +- **Promtail** is enabled to forward logs from Kubernetes nodes to Loki. +- **Grafana** is enabled with a **NodePort** service to allow access to the Grafana UI from outside the cluster. +- **Prometheus**, **Filebeat**, and **Logstash** are explicitly disabled. + +### Step 3: Deploy the Loki Stack with Custom Values + +After editing the loki-cusomt-values.yaml file, you are ready to deploy the Loki stack. Use the following command to install or upgrade the Helm release: +```yaml + +helm upgrade --install --values loki-custom-values.yaml loki grafana/loki-stack -n grafana-loki --create-namespace +``` +**This command:** +- Deploys the **Loki**, **Promtail**, and **Grafana** components. +- Disables the **Prometheus**, **Filebeat**, and **Logstash** components as per the configuration. 
+- Creates a namespace grafana-loki and deploys all components inside this namespace. + +### Step 4: Access Grafana and Configure Data Source +Once the Helm chart has been successfully deployed, it’s time to access Grafana and verify that everything is working correctly. + +1. ***First, check the pods in the grafana-loki namespace to ensure everything is running:*** +```yaml +$ kubectl get pods -n grafana-loki +NAME READY STATUS RESTARTS AGE +loki-0 1/1 Running 0 19m +loki-grafana-567d65596c-gvt5q 2/2 Running 0 17m +loki-promtail-8jng6 1/1 Running 0 19m +loki-promtail-hb6x2 1/1 Running 0 19m +``` + +2. ***Find the NodePort assigned to Grafana:*** + +```yaml +$ kubectl get svc loki-grafana -n grafana-loki -o jsonpath="{.spec.ports[0].nodePort}" +30287 +``` +This command retrieves the NodePort assigned to Grafana, which you will use to access the Grafana web interface. + +3. ***Access Kibana:*** + +Open your web browser and navigate to: + +```yaml +http://: +``` + +Replace EXTERNAL-IP with the IP address of your Kubernetes cluster and NODE-PORT with the NodePort value obtained in step 1 + +![img](./img/grafana-ui.png.webp) + +4. ***Log in to Grafana:*** + +You can get the login credentials for Grafana from the loki-grafana secret using the below commands. + +```yaml +$ kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-user}" | base64 --decode +admin +$ kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-password}" | base64 --decode +C43ICy6t22dwI3W93DsDPiiSUeX5Z4aHMwKWkNvq% +``` + +Once you log in you can see the home screen of Grafana, press the three lines at the top left corner you can see the menu then go to **Connections > Data sources** as shown in the below image. + +![img](./img/grafana-home.png.webp) + +In Data sources you can see Loki has been configured as shown below + +![img](./img/grafana-data-sources.png.webp) + + +Now, check if you are getting logs or not. Go to Explore by pressing the Explore button. + + + +To query logs select a Label and Value, Loki will collect every log in your Kubernetes cluster and label it according to container, pod, namespace, deployments, jobs and other objects of Kubernetes. + +![img](./img/grafana-query.png.webp) + +After selecting a Label(namespace) and Value(grafana-loki), press the blue button at the top right corner(Run Query)to query logs. + +![img](./img/grafana-logs.png.webp) + + +Promtail, running as a DaemonSet, will collect logs from all nodes and forward them to Loki. You can query these logs in Grafana, making it easy to monitor your Kubernetes applications. ---- -## Querying Logs -Grafana Loki uses a query language called **LogQL** to filter and analyze logs. Example queries: -- Retrieve logs for a specific pod: - ```logql - {pod="my-app-pod"} - ``` -- Filter logs by a specific label: - ```logql - {app="my-app", level="error"} - ``` - -## Best Practices -- Use Kubernetes labels effectively to organize and query logs. -- Monitor Loki's resource usage to ensure it scales with your cluster. -- Set up retention policies to manage log storage efficiently. -- Integrate Loki with Grafana dashboards for unified monitoring. ---- -Stay tuned for updates as we continue to enhance this guide! \ No newline at end of file +## Conclusion +In this post, we walked through how to deploy Grafana Loki on Kubernetes using Helm with customized values. 
By enabling Loki, Promtail, and Grafana, and disabling unnecessary components like Prometheus, Filebeat, and Logstash, we tailored the setup to meet specific logging needs. +Grafana Loki offers an efficient, scalable solution for Kubernetes log management. With this setup, you can now monitor and explore your Kubernetes logs with ease. \ No newline at end of file diff --git a/docs/monitoring/img/alert-manager-architecture.png.webp b/docs/monitoring/img/alert-manager-architecture.png.webp new file mode 100644 index 00000000..964d628a Binary files /dev/null and b/docs/monitoring/img/alert-manager-architecture.png.webp differ diff --git a/docs/monitoring/img/alert-triggered-on-alertmanager.png.webp b/docs/monitoring/img/alert-triggered-on-alertmanager.png.webp new file mode 100644 index 00000000..022ddb0d Binary files /dev/null and b/docs/monitoring/img/alert-triggered-on-alertmanager.png.webp differ diff --git a/docs/monitoring/img/alert-triggered-on-prometheus.png.webp b/docs/monitoring/img/alert-triggered-on-prometheus.png.webp new file mode 100644 index 00000000..f801f259 Binary files /dev/null and b/docs/monitoring/img/alert-triggered-on-prometheus.png.webp differ diff --git a/docs/monitoring/img/alertmanager-ui.png.webp b/docs/monitoring/img/alertmanager-ui.png.webp new file mode 100644 index 00000000..3424a15e Binary files /dev/null and b/docs/monitoring/img/alertmanager-ui.png.webp differ diff --git a/docs/monitoring/img/alertmanager.png.webp b/docs/monitoring/img/alertmanager.png.webp new file mode 100644 index 00000000..ea8b6f7e Binary files /dev/null and b/docs/monitoring/img/alertmanager.png.webp differ diff --git a/docs/monitoring/img/alerts-fired.png.webp b/docs/monitoring/img/alerts-fired.png.webp new file mode 100644 index 00000000..cf19d9f1 Binary files /dev/null and b/docs/monitoring/img/alerts-fired.png.webp differ diff --git a/docs/monitoring/img/alerts-in-prometheus-ui.png.webp b/docs/monitoring/img/alerts-in-prometheus-ui.png.webp new file mode 100644 index 00000000..b0aa152b Binary files /dev/null and b/docs/monitoring/img/alerts-in-prometheus-ui.png.webp differ diff --git a/docs/monitoring/img/animated-elk-and-filebeat.png.gif b/docs/monitoring/img/animated-elk-and-filebeat.png.gif new file mode 100644 index 00000000..7805b820 Binary files /dev/null and b/docs/monitoring/img/animated-elk-and-filebeat.png.gif differ diff --git a/docs/monitoring/img/animated-logging-grafana-loki.png.gif b/docs/monitoring/img/animated-logging-grafana-loki.png.gif new file mode 100644 index 00000000..2eb86f8b Binary files /dev/null and b/docs/monitoring/img/animated-logging-grafana-loki.png.gif differ diff --git a/docs/monitoring/img/animated-promethes-and-grafana.png.gif b/docs/monitoring/img/animated-promethes-and-grafana.png.gif new file mode 100644 index 00000000..6f9b6bc1 Binary files /dev/null and b/docs/monitoring/img/animated-promethes-and-grafana.png.gif differ diff --git a/docs/monitoring/img/custom-alerts.png.gif b/docs/monitoring/img/custom-alerts.png.gif new file mode 100644 index 00000000..514687a2 Binary files /dev/null and b/docs/monitoring/img/custom-alerts.png.gif differ diff --git a/docs/monitoring/img/dashboard-current-alerts.png.webp b/docs/monitoring/img/dashboard-current-alerts.png.webp new file mode 100644 index 00000000..67ae851e Binary files /dev/null and b/docs/monitoring/img/dashboard-current-alerts.png.webp differ diff --git a/docs/monitoring/img/dashboard-id.png.webp b/docs/monitoring/img/dashboard-id.png.webp new file mode 100644 index 00000000..5a66e7c9 
Binary files /dev/null and b/docs/monitoring/img/dashboard-id.png.webp differ diff --git a/docs/monitoring/img/dashboard-in-grafana.png.webp b/docs/monitoring/img/dashboard-in-grafana.png.webp new file mode 100644 index 00000000..360e3269 Binary files /dev/null and b/docs/monitoring/img/dashboard-in-grafana.png.webp differ diff --git a/docs/monitoring/img/dashboard-k8s.png.webp b/docs/monitoring/img/dashboard-k8s.png.webp new file mode 100644 index 00000000..b397855a Binary files /dev/null and b/docs/monitoring/img/dashboard-k8s.png.webp differ diff --git a/docs/monitoring/img/dashboard.png.webp b/docs/monitoring/img/dashboard.png.webp new file mode 100644 index 00000000..8743346d Binary files /dev/null and b/docs/monitoring/img/dashboard.png.webp differ diff --git a/docs/monitoring/img/elk-and-filebeat.png.webp b/docs/monitoring/img/elk-and-filebeat.png.webp new file mode 100644 index 00000000..ebd7da82 Binary files /dev/null and b/docs/monitoring/img/elk-and-filebeat.png.webp differ diff --git a/docs/monitoring/img/grafana-dashboard-metrics.png.webp b/docs/monitoring/img/grafana-dashboard-metrics.png.webp new file mode 100644 index 00000000..4709e444 Binary files /dev/null and b/docs/monitoring/img/grafana-dashboard-metrics.png.webp differ diff --git a/docs/monitoring/img/grafana-dashboard.png.webp b/docs/monitoring/img/grafana-dashboard.png.webp new file mode 100644 index 00000000..415b81d1 Binary files /dev/null and b/docs/monitoring/img/grafana-dashboard.png.webp differ diff --git a/docs/monitoring/img/grafana-dashboards.png.webp b/docs/monitoring/img/grafana-dashboards.png.webp new file mode 100644 index 00000000..86fd3d85 Binary files /dev/null and b/docs/monitoring/img/grafana-dashboards.png.webp differ diff --git a/docs/monitoring/img/grafana-data-sources.png.webp b/docs/monitoring/img/grafana-data-sources.png.webp new file mode 100644 index 00000000..b3388073 Binary files /dev/null and b/docs/monitoring/img/grafana-data-sources.png.webp differ diff --git a/docs/monitoring/img/grafana-home.png.webp b/docs/monitoring/img/grafana-home.png.webp new file mode 100644 index 00000000..3891e7c4 Binary files /dev/null and b/docs/monitoring/img/grafana-home.png.webp differ diff --git a/docs/monitoring/img/grafana-kube-state-metrics-v2.png.webp b/docs/monitoring/img/grafana-kube-state-metrics-v2.png.webp new file mode 100644 index 00000000..a7582ca8 Binary files /dev/null and b/docs/monitoring/img/grafana-kube-state-metrics-v2.png.webp differ diff --git a/docs/monitoring/img/grafana-library.png.webp b/docs/monitoring/img/grafana-library.png.webp new file mode 100644 index 00000000..12516962 Binary files /dev/null and b/docs/monitoring/img/grafana-library.png.webp differ diff --git a/docs/monitoring/img/grafana-logs.png.webp b/docs/monitoring/img/grafana-logs.png.webp new file mode 100644 index 00000000..63a1782c Binary files /dev/null and b/docs/monitoring/img/grafana-logs.png.webp differ diff --git a/docs/monitoring/img/grafana-query.png.webp b/docs/monitoring/img/grafana-query.png.webp new file mode 100644 index 00000000..0e5c8782 Binary files /dev/null and b/docs/monitoring/img/grafana-query.png.webp differ diff --git a/docs/monitoring/img/grafana-ui.png.webp b/docs/monitoring/img/grafana-ui.png.webp new file mode 100644 index 00000000..ff890b26 Binary files /dev/null and b/docs/monitoring/img/grafana-ui.png.webp differ diff --git a/docs/monitoring/img/import-dashboard-11455.png.webp b/docs/monitoring/img/import-dashboard-11455.png.webp new file mode 100644 index 00000000..bf93a200 Binary 
files /dev/null and b/docs/monitoring/img/import-dashboard-11455.png.webp differ diff --git a/docs/monitoring/img/import-dashboard-13345.webp b/docs/monitoring/img/import-dashboard-13345.webp new file mode 100644 index 00000000..77d29467 Binary files /dev/null and b/docs/monitoring/img/import-dashboard-13345.webp differ diff --git a/docs/monitoring/img/import-dashboard.png.webp b/docs/monitoring/img/import-dashboard.png.webp new file mode 100644 index 00000000..2be73425 Binary files /dev/null and b/docs/monitoring/img/import-dashboard.png.webp differ diff --git a/docs/monitoring/img/import-grafana-dashboard.png.webp b/docs/monitoring/img/import-grafana-dashboard.png.webp new file mode 100644 index 00000000..eede0b97 Binary files /dev/null and b/docs/monitoring/img/import-grafana-dashboard.png.webp differ diff --git a/docs/monitoring/img/kibana-dashboard.png.webp b/docs/monitoring/img/kibana-dashboard.png.webp new file mode 100644 index 00000000..63db65eb Binary files /dev/null and b/docs/monitoring/img/kibana-dashboard.png.webp differ diff --git a/docs/monitoring/img/kibana-login-page.png.webp b/docs/monitoring/img/kibana-login-page.png.webp new file mode 100644 index 00000000..cf8df6d9 Binary files /dev/null and b/docs/monitoring/img/kibana-login-page.png.webp differ diff --git a/docs/monitoring/img/kibana-logs.png.webp b/docs/monitoring/img/kibana-logs.png.webp new file mode 100644 index 00000000..07963b61 Binary files /dev/null and b/docs/monitoring/img/kibana-logs.png.webp differ diff --git a/docs/monitoring/img/kube-state-metrics-v2.png.webp b/docs/monitoring/img/kube-state-metrics-v2.png.webp new file mode 100644 index 00000000..adbc80d0 Binary files /dev/null and b/docs/monitoring/img/kube-state-metrics-v2.png.webp differ diff --git a/docs/monitoring/img/load-grafana-dashboard.png.webp b/docs/monitoring/img/load-grafana-dashboard.png.webp new file mode 100644 index 00000000..a5e7a009 Binary files /dev/null and b/docs/monitoring/img/load-grafana-dashboard.png.webp differ diff --git a/docs/monitoring/img/logging-and-metrics.png.webp b/docs/monitoring/img/logging-and-metrics.png.webp new file mode 100644 index 00000000..e4dd2dee Binary files /dev/null and b/docs/monitoring/img/logging-and-metrics.png.webp differ diff --git a/docs/monitoring/img/logging-grafana-loki.png.webp b/docs/monitoring/img/logging-grafana-loki.png.webp new file mode 100644 index 00000000..75c38920 Binary files /dev/null and b/docs/monitoring/img/logging-grafana-loki.png.webp differ diff --git a/docs/monitoring/img/pod-dashboard-example.png.webp b/docs/monitoring/img/pod-dashboard-example.png.webp new file mode 100644 index 00000000..13639441 Binary files /dev/null and b/docs/monitoring/img/pod-dashboard-example.png.webp differ diff --git a/docs/monitoring/img/prometheus-alerts.png.webp b/docs/monitoring/img/prometheus-alerts.png.webp new file mode 100644 index 00000000..2dae20f7 Binary files /dev/null and b/docs/monitoring/img/prometheus-alerts.png.webp differ diff --git a/docs/monitoring/img/prometheus-and-grafana-flowchart.png.webp b/docs/monitoring/img/prometheus-and-grafana-flowchart.png.webp new file mode 100644 index 00000000..a003dc72 Binary files /dev/null and b/docs/monitoring/img/prometheus-and-grafana-flowchart.png.webp differ diff --git a/docs/monitoring/img/prometheus-ui.png.webp b/docs/monitoring/img/prometheus-ui.png.webp new file mode 100644 index 00000000..f4f2df6e Binary files /dev/null and b/docs/monitoring/img/prometheus-ui.png.webp differ diff --git 
a/docs/monitoring/img/promethues-rule.png.webp b/docs/monitoring/img/promethues-rule.png.webp new file mode 100644 index 00000000..29115ed1 Binary files /dev/null and b/docs/monitoring/img/promethues-rule.png.webp differ diff --git a/docs/monitoring/metrics-loggings-grafana-loki.md b/docs/monitoring/metrics-loggings-grafana-loki.md new file mode 100644 index 00000000..b46a1d62 --- /dev/null +++ b/docs/monitoring/metrics-loggings-grafana-loki.md @@ -0,0 +1,332 @@ +--- +// filepath: /Users/anveshmuppeda/Desktop/anvesh/tech/git/kubernetes/docs/monitoring/metrics-loggings grafana-loki.md +sidebar_label: "Metrics Loggings" +sidebar_id: "metrics-loggings" +sidebar_position: 5 +--- +# ⎈ A Hands-On Guide to Kubernetes Monitoring: Metrics and Logging with Grafana Loki ⚙️ + +#### *⇢ A Step-by-Step Guide to Setting Up Metrics and Logging in Kubernetes Using the Grafana, Loki, Prometheus, Logstash, and Filebeat for Full Cluster Observability* + +![img](./img/logging-and-metrics.png.webp) + +In a microservices architecture, monitoring both **metrics** and **logs** is critical for ensuring the health and performance of your applications. When running Kubernetes clusters, the ability to efficiently collect and visualize this data can be complex. With tools like **Grafana**, **Loki**, **Prometheus**, **Logstash**, and **Filebeat**, we can set up a powerful monitoring stack that provides complete observability. + + + +This blog will guide you through setting up a comprehensive monitoring solution in Kubernetes, focusing on both metrics and logging. We will use the following tools: + +- **Grafana:** For visualizing metrics and logs. +- **Loki:** For aggregating and storing logs. +- **Prometheus:** For collecting metrics. +- **Logstash:** For log processing and forwarding. +- **Filebeat:** For collecting log files from Kubernetes pods. + + +![img](./img/logging-and-metrics.png.webp) + + +We’ll use Helm to deploy these tools as it simplifies managing Kubernetes applications through charts. This tutorial builds upon the previous setup of Grafana Loki for logging and expands it to include Prometheus for metrics and more robust log collection with Logstash and Filebeat. + +## Prerequisites + +Before starting, ensure you have the following: + + +- A Kubernetes cluster up and running. +- Helm installed on your machine. +- kubectl configured to interact with your cluster. + +### Step 1: Add the Grafana Helm Repository + +To begin, add the Grafana Helm repository, which contains the charts for deploying Loki and other monitoring tools: + +```yaml +helm repo add grafana https://grafana.github.io/helm-charts +helm repo update +``` +Next, search for the Loki chart: + +```yaml +helm search repo loki +``` + +We will be using the grafana/loki-stack chart for this deployment, which includes Grafana, Loki, and additional components. + +### Step 2: Customize Helm Chart Configuration + +We’ll customize the default Helm chart values to enable Prometheus for metrics, configure Filebeat for log collection, and set up Logstash for advanced log processing. 
Below is the updated loki-custom-values.yaml file: + +```yaml + +test_pod: + enabled: true + image: bats/bats:1.8.2 + pullPolicy: IfNotPresent + +loki: + enabled: true + isDefault: true + fullnameOverride: loki + url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }} + readinessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + livenessProbe: + httpGet: + path: /ready + port: http-metrics + initialDelaySeconds: 45 + datasource: + jsonData: "{}" + uid: "" + + +promtail: + enabled: false + config: + logLevel: info + serverPort: 3101 + clients: + - url: http://{{ .Release.Name }}:3100/loki/api/v1/push + +fluent-bit: + enabled: false + +grafana: + enabled: true + sidecar: + datasources: + label: "" + labelValue: "" + enabled: true + maxLines: 1000 + image: + tag: 10.3.3 + service: + type: NodePort + +prometheus: + enabled: true + isDefault: false + url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }} + datasource: + jsonData: "{}" + server: + service: + type: NodePort + persistentVolume: + ## If true, Prometheus server will create/use a Persistent Volume Claim + ## If false, use emptyDir + ## + enabled: false + +filebeat: + enabled: true + filebeatConfig: + filebeat.yml: | + # logging.level: debug + filebeat.inputs: + - type: container + paths: + - /var/log/containers/*.log + processors: + - add_kubernetes_metadata: + host: ${NODE_NAME} + matchers: + - logs_path: + logs_path: "/var/log/containers/" + output.logstash: + hosts: ["logstash-loki-headless:5044"] + +logstash: + enabled: true + image: grafana/logstash-output-loki + imageTag: 1.0.1 + + fullnameOverride: logstash-loki + + logstashConfig: + logstash.yml: | + http.host: 0.0.0.0 + xpack.monitoring.enabled: false + + logstashPipeline: + logstash.conf: | + input { + beats { + port => 5044 + } + } + + filter { + if [kubernetes] { + mutate { + add_field => { + "container" => "%{[kubernetes][container][name]}" + "namespace" => "%{[kubernetes][namespace]}" + "pod" => "%{[kubernetes][pod][name]}" + } + replace => { "host" => "%{[kubernetes][node][name]}"} + } + } + mutate { + remove_field => ["tags"] + } + } + + output { + loki { + url => "http://loki:3100/loki/api/v1/push" + } + # stdout { codec => rubydebug } + } + +proxy: + http_proxy: "" + https_proxy: "" + no_proxy: "" +``` + +Key points: + +- **Prometheus** is enabled for metrics collection with NodePort service. +- **Filebeat** is enabled for log collection from Kubernetes pods. +- **Logstash** is enabled and configured to receive logs from Filebeat and forward them to Loki. + +### Step 3: Deploy the Monitoring Stack + +Once the loki-custom-values.yaml file is ready, deploy the stack using Helm: + +```yaml +helm upgrade --install --values loki-custom-values.yaml loki grafana/loki-stack -n grafana-loki --create-namespace +``` +This command: + +- Deploys Loki, Prometheus, Filebeat, Logstash, and Grafana. +- Disable Promtail. +- Configures Prometheus to collect metrics and Filebeat to collect logs. +- Sets up Logstash to forward logs to Loki for central logging. + +### Step 4: Access the Cluster Logs on Grafana +After the deployment, you need to access Grafana and configure data sources for metrics and logs. + +1. 
**Check the Pods:** Verify that all the components are running correctly in the grafana-loki namespace:

```yaml
$ kubectl get pods -n grafana-loki

NAME READY STATUS RESTARTS AGE
logstash-loki-0 1/1 Running 0 59m
loki-0 1/1 Running 0 6h5m
loki-alertmanager-0 1/1 Running 0 22m
loki-filebeat-6gl8t 1/1 Running 0 53m
loki-filebeat-jrn5n 1/1 Running 0 53m
loki-filebeat-p8pl8 1/1 Running 0 53m
loki-grafana-568895c66-c7pxl 2/2 Running 0 59m
loki-kube-state-metrics-77ffbdd8db-x64lh 1/1 Running 0 50m
loki-prometheus-node-exporter-2hfgb 1/1 Running 0 50m
loki-prometheus-node-exporter-9qq9c 1/1 Running 0 50m
loki-prometheus-node-exporter-tkctf 1/1 Running 0 50m
loki-prometheus-pushgateway-69d48d6874-hgd7v 1/1 Running 0 50m
loki-prometheus-server-8475684f7c-qh44p 2/2 Running 0 48m
```

2. **Find the NodePort for Grafana:** Retrieve the NodePort assigned to the Grafana service:

```yaml
$ kubectl get svc loki-grafana -n grafana-loki -o jsonpath="{.spec.ports[0].nodePort}"

32181
```

3. **Access the Grafana UI:** Open your browser and navigate to:

```yaml
http://<EXTERNAL-IP>:<NODE-PORT>
```

Replace EXTERNAL-IP with your cluster's node IP address and NODE-PORT with the NodePort you retrieved.

![img](./img/grafana-ui.png.webp)

4. **Log in to Grafana:** Retrieve the default login credentials:

```yaml
kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-user}" | base64 --decode
kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-password}" | base64 --decode
```

Once you log in, you will see the Grafana home screen. Click the menu icon (the three lines in the top-left corner), then go to **Connections > Data sources**, as shown in the image below.

![img](./img/grafana-home.png.webp)

Under Data sources, you can see that Loki has already been configured, as shown below.

![img](./img/grafana-data-sources.png.webp)

Now, verify that logs are arriving: open Explore by pressing the Explore button.

To query logs, select a Label and a Value. Loki collects every log in your Kubernetes cluster and labels it by container, pod, namespace, deployment, job, and other Kubernetes objects.

![img](./img/grafana-query.png.webp)

After selecting a Label (namespace) and a Value (grafana-loki), press the blue Run Query button at the top-right corner to query the logs.

![img](./img/grafana-logs.png.webp)

Filebeat, running as a DaemonSet, collects logs from all nodes and sends them to Logstash, which forwards them to Loki. You can query these logs in Grafana, making it easy to monitor your Kubernetes applications.

### Step 5: Access Metrics on Grafana by Adding New Dashboards

1. Log in to Grafana by following the same steps as above.
2. Navigate to the Home > Dashboards section.

![img](./img/grafana-dashboard-metrics.png.webp)

3. Add/Create new Dashboards

We also have the flexibility to create our own dashboards from scratch or import multiple Grafana dashboards from the Grafana library.

To import a Grafana dashboard, follow these steps:

**Step 1:** Access the Grafana library.

**Step 2:** Select the desired dashboard ID to add.

Consider the kube-state-metrics-v2 dashboard:

![img](./img/kube-state-metrics-v2.png.webp)

Copy the ID of the kube-state-metrics-v2 dashboard, i.e., 13332.

**Step 3: Import the selected dashboard in Grafana**

Access the Home > Dashboards section and click on Import.
+ +![img](./img/import-grafana-dashboard.png.webp) + +![img](./img/import-dashboard-13345.webp) + +Now enter the ID of the target new Dashboard i.e., 13332 then click on Load to load the new dashboard into Grafana. + + +![img](./img/load-grafana-dashboard.png.webp) + +Click on Import to import the new Dashboard & Access it. + +![img](./img/grafana-kube-state-metrics-v2.png.webp) + + +These steps allow us to easily integrate any dashboard from the Grafana library. Now that everything is set up, you can start visualizing both metrics and logs in Grafana. + + +## Conclusion + +In this blog, we have built a complete monitoring stack for Kubernetes that includes both metrics and logs. By using Grafana for visualization, Loki for log aggregation, Prometheus for metrics collection, Filebeat for log collection, and Logstash for log processing, you can ensure that your Kubernetes cluster is fully observable. This setup provides a powerful way to monitor and troubleshoot your applications, ensuring better reliability and performance. diff --git a/docs/monitoring/promrtheus-grafana.md b/docs/monitoring/promrtheus-grafana.md index 1cbffc43..59d39279 100644 --- a/docs/monitoring/promrtheus-grafana.md +++ b/docs/monitoring/promrtheus-grafana.md @@ -5,82 +5,375 @@ sidebar_id: "prometheus-grafana" sidebar_position: 2 --- -# Prometheus and Grafana: Monitoring Kubernetes Clusters +# ⎈ A Hands-On Guide to Kubernetes Monitoring Using Prometheus & Grafana🛠️ -Prometheus and Grafana are widely used tools for monitoring and visualizing metrics in Kubernetes clusters. Prometheus collects and stores metrics, while Grafana provides a powerful interface for querying and visualizing these metrics. This guide provides an overview of Prometheus and Grafana, their benefits, and how to set them up in a Kubernetes cluster. +#### *⇢ Understanding Prometheus & Grafana Setup in Kubernetes: A Comprehensive Guide* ---- +![img](./img/prometheus-and-grafana-flowchart.png.webp) -
-🚧 Work in Progress
-
-This page is currently under construction. Please check back later for detailed information about Prometheus and Grafana setup and usage in Kubernetes.
---- +## Introduction +In the dynamic world of containerized applications and microservices, monitoring is indispensable for maintaining the health, performance, and reliability of your infrastructure. Kubernetes, with its ability to orchestrate containers at scale, introduces new challenges and complexities in monitoring. This is where tools like Prometheus and Grafana come into play. +**Prometheus** is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It excels at monitoring metrics and providing powerful query capabilities against time-series data. Meanwhile, **Grafana** complements Prometheus by offering visualization capabilities through customizable dashboards and graphs. +- In this blog post, we will guide you through the process of setting up Prometheus and Grafana on a Kubernetes cluster using Helm. By the end of this tutorial, you will have a robust monitoring solution that allows you to: +- Collect and store metrics from your Kubernetes cluster and applications. +Visualize these metrics through intuitive dashboards. +- Set up alerts based on predefined thresholds or anomalies. +- Gain insights into the performance and resource utilization of your cluster. -## Table of Contents -- [Introduction](#introduction) -- [Why Use Prometheus and Grafana?](#why-use-prometheus-and-grafana) -- [Architecture](#architecture) -- [Installation](#installation) -- [Configuration](#configuration) -- [Creating Dashboards](#creating-dashboards) -- [Best Practices](#best-practices) ---- -## Introduction -Prometheus and Grafana are essential tools for monitoring Kubernetes clusters. Prometheus collects metrics from Kubernetes components, applications, and infrastructure, while Grafana visualizes these metrics in customizable dashboards. +Whether you are deploying your first Kubernetes cluster or looking to enhance your existing monitoring setup, understanding how to leverage Prometheus and Grafana effectively is essential. Let’s dive into the step-by-step process of deploying and configuring these powerful tools on Kubernetes. ---- +### Prerequisites +Before we get started, ensure you have the following: -## Why Use Prometheus and Grafana? -- **Comprehensive Monitoring**: Collects metrics from Kubernetes nodes, pods, and applications. -- **Custom Dashboards**: Grafana allows you to create tailored dashboards for specific use cases. -- **Alerting**: Prometheus supports alerting rules to notify you of critical issues. -- **Scalability**: Both tools can handle large-scale Kubernetes clusters. +- A running Kubernetes cluster. ---- +- kubectl command-line tool configured to communicate with your cluster. +- Helm (the package manager for Kubernetes) installed. -## Architecture -Prometheus and Grafana work together as follows: -1. **Prometheus**: Scrapes metrics from Kubernetes components and stores them in a time-series database. -2. **Grafana**: Queries Prometheus for metrics and visualizes them in dashboards. -3. **Alertmanager**: (Optional) Used with Prometheus to send alerts based on defined rules. +### Setting up Prometheus and Grafana +**Step 1: Adding the Helm Repository** ---- +First, add the Prometheus community Helm repository and update it: -## Installation -> **Note:** Detailed installation steps will be added soon. +```yaml +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts +helm repo update +``` ---- -## Configuration -> **Note:** Configuration details for Prometheus, Grafana, and Alertmanager will be added soon. 
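Optionally, confirm that the repository was added and that the chart used in the next step is available. This is a quick sanity check against the chart name kube-prometheus-stack referenced later in this guide:

```yaml
# List the kube-prometheus-stack chart from the newly added repository
helm search repo prometheus-community/kube-prometheus-stack
```

If the chart shows up in the search results, you are ready to install it.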
+**Step 2: Installing Prometheus and Grafana** + +Create a custom-values.yaml file to customize the Helm chart installation. This file will configure Prometheus and Grafana to be exposed via NodePorts. +```yaml +# custom-values.yaml +prometheus: + service: + type: NodePort +grafana: + service: + type: NodePort +``` + + +Then, install the kube-prometheus-stack using Helm: +```yaml +helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f custom-values.yaml +``` +Output: +```yaml +$ helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f custom-values.yaml +Release "kube-prometheus-stack" does not exist. Installing it now. +NAME: kube-prometheus-stack +LAST DEPLOYED: Sun Jun 16 17:04:53 2024 +NAMESPACE: default +STATUS: deployed +REVISION: 1 +NOTES: +kube-prometheus-stack has been installed. Check its status by running: + kubectl --namespace default get pods -l "release=kube-prometheus-stack" + +Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator. +``` +**Step 3: Verifying the Installation** + +After the installation, you can verify that the Prometheus and Grafana services are created and exposed on NodePorts: +```yaml +kubectl get services +``` +You should see output similar to this, showing the services with their respective NodePorts: +```yaml +$ kubectl get services +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +alertmanager-operated ClusterIP None 9093/TCP,9094/TCP,9094/UDP 5m19s +kube-prometheus-stack-alertmanager ClusterIP 10.245.239.151 9093/TCP,8080/TCP 5m22s +kube-prometheus-stack-grafana NodePort 10.245.30.17 80:31519/TCP 5m22s +kube-prometheus-stack-kube-state-metrics ClusterIP 10.245.26.205 8080/TCP 5m22s +kube-prometheus-stack-operator ClusterIP 10.245.19.171 443/TCP 5m22s +kube-prometheus-stack-prometheus NodePort 10.245.151.164 9090:30090/TCP,8080:32295/TCP 5m22s +kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.245.22.30 9100/TCP 5m22s +kubernetes ClusterIP 10.245.0.1 443/TCP 57d +prometheus-operated ClusterIP None 9090/TCP 5m19s +``` +**Step 4: Accessing Prometheus and Grafana** + +To access Prometheus and Grafana dashboards outside the cluster, you need the external IP of any node in the cluster and the NodePorts on which the services are exposed. 
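If the cluster nodes do not have reachable external IPs (for example, on a local or private cluster), port-forwarding the services is an alternative. A minimal sketch, using the service names and ports from the output above:

```yaml
# Forward Grafana (service port 80) to http://localhost:3000
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80

# Forward Prometheus (service port 9090) to http://localhost:9090
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090
```

Otherwise, continue with the NodePort approach below.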
+
+
+Get the external IP addresses of your nodes:
+```yaml
+kubectl get nodes -o wide
+```
+
+You should see output similar to this:
+
+```yaml
+$ kubectl get nodes -o wide
+NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
+pool-t5ss0fagn-jeb47 Ready 57d v1.29.1 10.124.0.2 146.190.55.222 Debian GNU/Linux 12 (bookworm) 6.1.0-17-amd64 containerd://1.6.28
+```
+
+Use the external IP of any node and the NodePorts to access the dashboards:
+
+
+Prometheus: http://146.190.55.222:30090
+Grafana: http://146.190.55.222:31519
+
+### Access Prometheus
+Use the link format shown above to access the Prometheus UI:
+
+PublicIP:Prometheus-Port
+
+![img](./img/prometheus-ui.png.webp)
+
+![img](./img/prometheus-alerts.png.webp)
+
+### Access Grafana Default Dashboards
+Use the link format shown above to access the Grafana UI:
+
+PublicIP:GRAFANA-PORT
+
+![img](./img/grafana-ui.png.webp)
+
+
+Use the commands below to get the Grafana admin login credentials:
+
+
+**Username:**
+
+```yaml
+$ kubectl get secret --namespace default kube-prometheus-stack-grafana -o jsonpath="{.data.admin-user}" | base64 --decode ; echo
+admin
+```
+**Password:**
+```yaml
+$ kubectl get secret --namespace default kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
+prom-operator
+```
+
+
+
+![img](./img/grafana-dashboards.png.webp)
+
+### Default Dashboards
+
+By default, our setup adds a few dashboards:
+
+![img](./img/dashboard.png.webp)
+
+Using these dashboards, we can easily monitor our Kubernetes cluster.
+
+![img](./img/pod-dashboard-example.png.webp)
+
+### Add/Create new Dashboards
+
+We also have the flexibility to create our own dashboards from scratch or import multiple Grafana dashboards from the Grafana library.
+
+
+To import a Grafana dashboard, follow these steps:
+
+
+**Step 1:** Access the Grafana library.
+
+![img](./img/grafana-library.png.webp)
+
+**Step 2:** Select the desired dashboard ID to add.
+
+Consider the K8s/Storage/Volumes/Namespace dashboard:
+
+![img](./img/dashboard-k8s.png.webp)
+
+
+![img](./img/dashboard-id.png.webp)
+
+Copy the ID of the K8s/Storage/Volumes/Namespace dashboard, i.e., 11455.
+
+**Step 3: Import selected Dashboard in Grafana**
+
+Access the Dashboards section and click on Import.
+
+![img](./img/dashboard-in-grafana.png.webp)
+
+Now enter the ID of the target dashboard, i.e., 11455.
+
+![img](./img/import-dashboard-11455.png.webp)
+
+Click on Load to load the new dashboard into Grafana.
+
+![img](./img/import-dashboard.png.webp)
+
+Click on Import to import the new dashboard and access it.
+
+![img](./img/dashboard-current-alerts.png.webp)
+
+
+These steps allow us to easily integrate any dashboard from the Grafana library.
+
+
+### Prometheus Architecture
+
+Prometheus is a powerful monitoring and alerting toolkit designed for reliability and scalability. Understanding its architecture helps in leveraging its full potential. The Prometheus architecture comprises several key components:
+
+![img](./img/animated-promethes-and-grafana.png.gif)
+
+### Prometheus Server
+
+The Prometheus server is the core component responsible for:
+
+1. **Data Scraping:** Prometheus periodically scrapes metrics from configured targets, which are typically HTTP endpoints exposing metrics in a specified format.
+2. **Data Storage:** It stores all scraped samples locally using a time series database. Prometheus is designed to be efficient with both storage and retrieval of time series data.
+3. 
**Querying:** Prometheus allows you to query the time series data via the Prometheus Query Language (PromQL), which enables complex aggregations and calculations. + +### Prometheus Components + +1. **Prometheus Server:** The main component that does the bulk of the work, including scraping metrics from targets, storing the data, and providing a powerful query interface. +2. **Pushgateway:** An intermediary service used for pushing metrics from short-lived jobs that cannot be scraped directly by Prometheus. This is particularly useful for batch jobs and other processes with a finite lifespan. +3. **Exporters:** Exporters are used to expose metrics from third-party systems as Prometheus metrics. For example, Node Exporter collects hardware and OS metrics from a node, while other exporters exist for databases, web servers, and more. +4. **Alertmanager:** This component handles alerts generated by the Prometheus server. It can deduplicate, group, and route alerts to various receivers such as email, Slack, PagerDuty, or other notification systems. +5. **Service Discovery:** Prometheus supports various service discovery mechanisms to automatically find targets to scrape. This includes static configuration, DNS-based service discovery, and integrations with cloud providers and orchestration systems like Kubernetes. +6. **PromQL:** The powerful query language used by Prometheus to retrieve and manipulate time series data. PromQL supports a wide range of operations such as arithmetic, aggregation, and filtering. + +### Data Flow in Prometheus + +1. **Scraping Metrics:** Prometheus scrapes metrics from HTTP endpoints (targets) at regular intervals. These targets can be predefined or discovered dynamically through service discovery. +2. **Storing Metrics:** Scraped metrics are stored as time series data, identified by a metric name and a set of key-value pairs (labels). +3. **Querying Metrics:** Users can query the stored metrics using PromQL. Queries can be executed via the Prometheus web UI, HTTP API, or integrated with Grafana for visualization. +4. **Alerting:** Based on predefined rules, Prometheus can evaluate metrics data and trigger alerts. These alerts are sent to Alertmanager, which then processes and routes them to the appropriate notification channels. + +### Example of Prometheus Workflow + +1. **Service Discovery:** Prometheus discovers targets to scrape metrics from using service discovery mechanisms. For example, in a Kubernetes environment, it discovers pods, services, and nodes. +2. **Scraping:** Prometheus scrapes metrics from discovered targets at defined intervals. Each target is an endpoint exposing metrics in a format Prometheus understands (typically plain text). +3. **Storing:** Scraped metrics are stored in Prometheus’s time series database, indexed by the metric name and labels. +4. **Querying:** Users can query the data using PromQL for analysis, visualization, or alerting purposes. +5. **Alerting:** When certain conditions are met (defined by alerting rules), Prometheus generates alerts and sends them to Alertmanager. +6. **Alertmanager:** Alertmanager processes the alerts, deduplicates them, groups them if necessary, and sends notifications to configured receivers. + +### Understanding the Kubernetes Objects + +The Helm chart deploys various Kubernetes objects to set up Prometheus and Grafana. 
+```yaml +$ kubectl get all +NAME READY STATUS RESTARTS AGE +pod/alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 38m +pod/kube-prometheus-stack-grafana-76858ff8dd-76bn4 3/3 Running 0 38m +pod/kube-prometheus-stack-kube-state-metrics-84958579f9-g44sk 1/1 Running 0 38m +pod/kube-prometheus-stack-operator-554b777575-hgm8b 1/1 Running 0 38m +pod/kube-prometheus-stack-prometheus-node-exporter-cl98x 1/1 Running 0 38m +pod/prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 38m + +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +service/alertmanager-operated ClusterIP None 9093/TCP,9094/TCP,9094/UDP 38m +service/kube-prometheus-stack-alertmanager ClusterIP 10.245.239.151 9093/TCP,8080/TCP 38m +service/kube-prometheus-stack-grafana NodePort 10.245.30.17 80:31519/TCP 38m +service/kube-prometheus-stack-kube-state-metrics ClusterIP 10.245.26.205 8080/TCP 38m +service/kube-prometheus-stack-operator ClusterIP 10.245.19.171 443/TCP 38m +service/kube-prometheus-stack-prometheus NodePort 10.245.151.164 9090:30090/TCP,8080:32295/TCP 38m +service/kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.245.22.30 9100/TCP 38m +service/kubernetes ClusterIP 10.245.0.1 443/TCP 57d +service/prometheus-operated ClusterIP None 9090/TCP 38m + +NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE +daemonset.apps/kube-prometheus-stack-prometheus-node-exporter 1 1 1 1 1 kubernetes.io/os=linux 38m + +NAME READY UP-TO-DATE AVAILABLE AGE +deployment.apps/kube-prometheus-stack-grafana 1/1 1 1 38m +deployment.apps/kube-prometheus-stack-kube-state-metrics 1/1 1 1 38m +deployment.apps/kube-prometheus-stack-operator 1/1 1 1 38m + +NAME DESIRED CURRENT READY AGE +replicaset.apps/kube-prometheus-stack-grafana-76858ff8dd 1 1 1 38m +replicaset.apps/kube-prometheus-stack-kube-state-metrics-84958579f9 1 1 1 38m +replicaset.apps/kube-prometheus-stack-operator-554b777575 1 1 1 38m + +NAME READY AGE +statefulset.apps/alertmanager-kube-prometheus-stack-alertmanager 1/1 38m +statefulset.apps/prometheus-kube-prometheus-stack-prometheus 1/1 38m +``` + +Here’s a brief explanation of each type of object used: + +### Deployments: + +Deployments ensure that a specified number of pod replicas are running at any given time. They manage the creation, update, and deletion of pods. In this setup, Deployments are used for: + +**Grafana:** Manages the Grafana instance, ensuring it is always available. +**Kube-State-Metrics:** Exposes Kubernetes cluster-level metrics. + +Example: + +```yaml +deployment.apps/kube-prometheus-stack-grafana 1/1 1 1 38m +deployment.apps/kube-prometheus-stack-kube-state-metrics 1/1 1 1 38m +``` +### StatefulSets: + +StatefulSets are used for managing stateful applications that require persistent storage and stable network identities. They ensure that the pods are deployed in a specific order and have unique, stable identifiers. In this setup, StatefulSets are used for: + + +**Prometheus:** Ensures Prometheus instances have persistent storage for metric data. +**Alertmanager:** Manages the Alertmanager instances. + +Example: + +```yaml +statefulset.apps/alertmanager-kube-prometheus-stack-alertmanager 1/1 38m +statefulset.apps/prometheus-kube-prometheus-stack-prometheus 1/1 38m +``` +### DaemonSets: + +DaemonSets ensure that a copy of a pod is running on all (or some) nodes in the cluster. They are commonly used for logging and monitoring agents. In this setup, DaemonSets are used for: + +**Node Exporter:** Collects hardware and OS metrics from the nodes. 
+ +Example: + +```yaml +daemonset.apps/kube-prometheus-stack-prometheus-node-exporter 1/1 38m +``` +### Cleanup Section +Use the below command to uninstall the prometheus stack + +```yaml +$ helm uninstall kube-prometheus-stack +``` + +### Advantages of Using Prometheus and Grafana + +Using Prometheus and Grafana together provides a powerful and flexible monitoring solution for Kubernetes clusters. Here are some of the key advantages: + +### Prometheus + +1. **Open Source and Community-Driven:** Prometheus is a widely adopted open-source monitoring solution with a large community, ensuring continuous improvements, support, and a plethora of plugins and integrations. +2. **Dimensional Data Model:** Prometheus uses a multi-dimensional data model with time series data identified by metric name and key/value pairs. This makes it highly flexible and powerful for querying. +3. **Powerful Query Language (PromQL):** Prometheus Query Language (PromQL) allows for complex queries and aggregations, making it easy to extract meaningful insights from the collected metrics. +4. **Efficient Storage:** Prometheus has an efficient storage engine designed for high performance and scalability. It uses a local time series database, making it fast and reliable. + + +5. **Alerting:** Prometheus has a built-in alerting system that allows you to define alerting rules based on metrics. Alerts can be sent to various receivers like email, Slack, or custom webhooks using the Alertmanager component. +6. **Service Discovery:** Prometheus supports multiple service discovery mechanisms, including Kubernetes, which makes it easy to dynamically discover and monitor new services as they are deployed. +### Grafana + +1. **Rich Visualization:** Grafana provides a wide range of visualization options, including graphs, charts, histograms, and heatmaps, allowing you to create comprehensive dashboards. +2. **Customizable Dashboards:** Grafana dashboards are highly customizable, enabling you to create tailored views that meet the specific needs of your team or organization. +3. **Integration with Multiple Data Sources:** While Grafana works seamlessly with Prometheus, it also supports many other data sources such as Elasticsearch, InfluxDB, and Graphite, making it a versatile tool for centralized monitoring. +4. **Alerting:** Grafana offers its own alerting system, allowing you to set up alert rules on dashboard panels and receive notifications via multiple channels, such as email, Slack, and PagerDuty. +5. **Templating:** Grafana allows the use of template variables in dashboards, making them reusable and more interactive. This feature helps in creating dynamic and flexible dashboards. +6. **User Management and Sharing:** Grafana supports user authentication and role-based access control, making it easier to manage access to dashboards. Dashboards can also be easily shared with team members or embedded in other applications. +7. **Plugins and Extensions:** Grafana has a rich ecosystem of plugins for different data sources, panels, and apps, allowing you to extend its functionality to meet your specific monitoring needs. +### Combined Benefits + +1. **Comprehensive Monitoring Solution:** Together, Prometheus and Grafana provide a complete monitoring solution, from metrics collection and storage (Prometheus) to powerful visualization and analysis (Grafana). +2. **Scalability:** Both Prometheus and Grafana are designed to scale with your infrastructure. 
Prometheus can handle millions of time series, while Grafana can manage numerous dashboards and data sources. +3. **Real-Time Monitoring and Alerting:** With Prometheus’s real-time metrics collection and Grafana’s real-time visualization, you can monitor your infrastructure’s health continuously and get alerted to issues promptly. +4. **Ease of Use:** Setting up Prometheus and Grafana is straightforward, especially with tools like Helm for Kubernetes, making it easy to deploy and manage the monitoring stack. +5. **Extensibility:** Both tools are highly extensible, allowing you to integrate them with other systems and customize them to fit your specific requirements. + + +By leveraging the strengths of Prometheus and Grafana, you can ensure that your Kubernetes environment is well-monitored, making it easier to maintain performance, reliability, and efficiency. +## Conclusion +Setting up Prometheus and Grafana on Kubernetes using Helm is straightforward and provides a powerful monitoring solution for your cluster. By exposing the services via NodePorts, you can easily access the dashboards from outside the cluster. This setup allows you to monitor your cluster’s performance, visualize metrics, and set up alerts to ensure your applications run smoothly. ---- -## Creating Dashboards -Grafana allows you to create custom dashboards to visualize metrics. Example steps: -1. Log in to Grafana. -2. Add Prometheus as a data source. -3. Create a new dashboard and add panels for specific metrics. -4. Use PromQL (Prometheus Query Language) to query metrics. - -Example PromQL queries: -- CPU usage of a pod: - ```promql - sum(rate(container_cpu_usage_seconds_total{pod="my-app-pod"}[5m])) - ``` -- Memory usage of a node: - ```promql - sum(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) - ``` -## Best Practices -- Use labels effectively in Prometheus to organize and query metrics. -- Set up retention policies to manage storage usage. -- Use Alertmanager to configure alerts for critical metrics. -- Monitor Prometheus and Grafana resource usage to ensure scalability. - ---- -Stay tuned for updates as we continue to enhance this guide!