Milestone 5
- Deploying our services across both the IU and TACC instances of Jetstream (using Kubespray).
- Installing Istio (Service Mesh) on both IU & TACC
- Modifying Jenkins pipeline to deploy latest versions to both IU & TACC (for keeping both in sync)
- Switching between the TACC and IU VMs without service disruption on network failure, using “blue-green” deployments.
- Testing failover
The diagram below shows our Jetstream deployment, where both IU and TACC have a Kubernetes cluster installed. The only difference is that IU hosts two extra instances: HAProxy and Jenkins.
- Injected sidecar proxy
- Connects Control plane (istiod) with the data plane (pods with sidecar proxy)
The following commands install Istio 1.9.3, enable sidecar injection for the default namespace, redeploy our services, and deploy the Kiali dashboard with its Service type changed from ClusterIP to LoadBalancer.
  curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.9.3 sh - &&
  cd istio-1.9.3/ &&
  export PATH=$PWD/bin:$PATH &&
  istioctl install -y &&
  kubectl label namespace default istio-injection=enabled &&
  cd .. &&
  kubectl delete -f PingIntelligence/ &&
  kubectl apply -f PingIntelligence/ &&
  cd PingIntelligence/ &&
  git checkout automation-script &&
  cp ./kiali.yaml ../istio-1.9.3/samples/addons/ &&
  git checkout kubernetes_files &&
  cd .. &&
  kubectl apply -f istio-1.9.3/samples/addons/
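The commands above switch Kiali to a LoadBalancer by copying a modified kiali.yaml over the bundled add-on manifest. As an alternative sketch (not what the commands above do), the Service type could be patched in place once the add-ons are applied. This assumes the Service is named `kiali` and lives in the `istio-system` namespace, as in the stock Istio add-on manifests; the function name `expose_kiali` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: expose the Kiali Service through a LoadBalancer by patching it in place.
# Assumption: the Service is called "kiali" in namespace "istio-system".

expose_kiali() {
  local namespace="${1:-istio-system}"
  local service="${2:-kiali}"
  # Change only the Service type; all other fields are left untouched.
  kubectl -n "$namespace" patch service "$service" \
    -p '{"spec": {"type": "LoadBalancer"}}'
}
```

After calling `expose_kiali`, `kubectl -n istio-system get service kiali` should list the Service with type LoadBalancer, provided the cluster can provision one.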
Here is a snapshot of the Kiali dashboard:
The Kiali dashboard is accessible from the URLs below:
To achieve blue-green deployment, we created bash scripts that query Kubernetes and Istio for the status of each cluster. The script files deployed on each instance can be found at the links below:
The monitor script on the HAProxy instance (149.165.172.138) runs continuously and watches for failures in IU (the blue deployment). When it detects one, the database state is extracted and loaded onto the TACC cluster (the green deployment), after which HAProxy redirects traffic from IU's master node to TACC's master node. Likewise, if a failure occurs on TACC, control switches from TACC back to IU.
Hence, at any given time, both IU and TACC are up and running, and the service is reached through the HAProxy URL: http://149.165.172.138/
Assumption: IU/TACC clusters may go down but HAProxy Server won’t fail.
- Free, Open-source.
- Provides high-availability load balancer and proxy server across multiple servers.
- Configuring HAProxy (port conflicts with Nginx)
- Extracting & loading DB states
- Identifying reasons for failover
- Building shell script for continuous monitoring using HAProxy Server
The HAProxy status can also be monitored from the HAProxy dashboard using the link.
Username: ubuntu
Password: ubuntu
Below is the snapshot of the dashboard:
Below are the links we consulted to identify the possible disruptions, which we tried to address in these scripts:
- https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
- https://kubernetes.io/docs/concepts/architecture/nodes/#condition
- https://istio.io/latest/docs/ops/diagnostic-tools/proxy-cmd/#:~:text=STALE%20means%20that%20Istiod%20has,a%20bug%20with%20Istio%20itself.
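The node conditions and proxy sync states described in these links can be checked from a script. The sketch below is an assumption about how such checks might look, not a copy of our scripts: it parses the text output of `kubectl get nodes --no-headers` and `istioctl proxy-status`, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect unhealthy nodes and stale sidecar configuration.
# Both functions read command output on stdin so they can be tried on saved output.

# Count nodes whose STATUS column is not exactly "Ready".
# Note: cordoned nodes ("Ready,SchedulingDisabled") are counted as unready too.
count_unready_nodes() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Succeeds if any proxy reports STALE, meaning istiod sent an update
# that the sidecar has not acknowledged.
has_stale_proxy() {
  grep -q 'STALE'
}
```

Typical use would be `kubectl get nodes --no-headers | count_unready_nodes` and `istioctl proxy-status | has_stale_proxy && echo "stale sidecar detected"`; a non-zero count or a stale proxy could then be treated as a reason to fail over.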
Another alternative we considered was the Keepalived package. Its advantage is that it requires no additional instances and redirection happens automatically, without a custom script. Redirection relies on three mechanisms: a VIP (Virtual IP), VRRP (Virtual Router Redundancy Protocol), and a heartbeat used to elect the next master node if the current master fails. The master is elected from a set of backup nodes based on their configured priority. The one challenge we faced was determining the status of the master node, that is, when control shifts from master to backup according to the health-check script. Hence, we went ahead with HAProxy.
We tested the failover from IU to TACC by pausing worker node 2. This led to a disruption of the services provided by the pods on that node.
Similarly, we tested the TACC to IU failover using a different failure: deleting all deployments (and hence all their pods) from the Kubernetes cluster.
  kubectl delete deployment --all
In both failovers, we observed a successful transition of control, first from IU -> TACC and then from TACC -> IU, with the database state preserved.
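One way to observe such a transition from the outside is to poll the HAProxy URL while the failure is injected and count the failed requests. The sketch below is an illustration rather than the procedure we followed: it assumes any 2xx or 3xx HTTP status counts as "up", and the function names and one-second interval are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: probe a URL repeatedly during a failover test.

# Print the HTTP status code for a URL; curl prints 000 if no response arrives.
probe() {
  curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1" || true
}

# Classify a status code: prints "up" for 2xx/3xx, "down" otherwise.
classify() {
  case "$1" in
    2??|3??) echo up ;;
    *) echo down ;;
  esac
}

# Probe a URL N times, one second apart, and print "<ups> <downs>".
# $1 = URL, $2 = number of probes
watch_failover() {
  local url="$1" count="$2" ups=0 downs=0 i=0
  while [ "$i" -lt "$count" ]; do
    if [ "$(classify "$(probe "$url")")" = "up" ]; then
      ups=$((ups + 1))
    else
      downs=$((downs + 1))
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$ups $downs"
}
```

For example, running `watch_failover http://149.165.172.138/ 60` while pausing a node would report how many of the 60 probes succeeded and how many failed during the switch.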