AKS: Not all services are starting #198

@greg-bucko

Description

The README.md for setup_f5_aks.sh states: "The cluster will have three Standard_D4_v3 nodes which have 4 CPU cores and 16 GB of memory." However, that does not seem to be enough for all of the services to start.
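
For context, this is roughly how I have been checking what the nodes can actually hold, and how a larger pool could be added if the sizing really is the problem (the resource group, cluster, and pool names below are placeholders, not the ones the script creates):

# Compare each node's allocatable CPU/memory with what is already requested
kubectl describe nodes | grep -A 8 "Allocated resources"

# Hypothetical example: add a second node pool with bigger VMs so the Pending pods can schedule
az aks nodepool add \
  --resource-group <my-resource-group> \
  --cluster-name <my-cluster> \
  --name largepool \
  --node-count 3 \
  --node-vm-size Standard_D8s_v3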

kubectl get pods
NAME READY STATUS RESTARTS AGE
f5-admin-ui-6896d75965-g4mqp 1/1 Running 0 6d17h
f5-ambassador-57c8d798d-cjzwc 0/1 CrashLoopBackOff 1869 6d17h
f5-api-gateway-df9b65dc6-dfsd5 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-argo-ui-7c87f99b64-vmx2s 1/1 Running 0 6d17h
f5-auth-ui-7765cd5488-h5wx8 1/1 Running 0 6d17h
f5-classic-rest-service-0 0/1 Init:0/3 1361 6d17h
f5-connectors-68f64c488f-m6mts 0/1 Pending 0 6d17h
f5-connectors-backend-69d877b594-k6tb5 0/1 Pending 0 6d17h
f5-devops-ui-86bc48f54-2c65h 1/1 Running 0 6d17h
f5-fusion-admin-75794787c9-pn294 0/1 Pending 0 6d17h
f5-fusion-indexing-8479f45ffc-bmqkj 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-fusion-log-forwarder-9c768c45-tg4m9 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-insights-5ff56c5d-95vcd 1/1 Running 0 6d17h
f5-job-launcher-6f7896dc-59g8m 0/1 CrashLoopBackOff 2220 6d17h
f5-job-rest-server-58994d99dd-6v64z 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-ml-model-service-7448f97bf6-s9m6s 0/1 Init:CrashLoopBackOff 1357 6d17h
f5-monitoring-grafana-6647cddd56-m45cl 1/1 Running 0 6d17h
f5-monitoring-prometheus-kube-state-metrics-647cd65579-qc8kc 1/1 Running 0 6d17h
f5-monitoring-prometheus-pushgateway-5dd445ff4f-pccht 1/1 Running 0 6d17h
f5-monitoring-prometheus-server-0 2/2 Running 0 6d17h
f5-mysql-5666f7474f-xz7cs 1/1 Running 0 6d17h
f5-pm-ui-5d4cb9f8f6-xbsr8 1/1 Running 0 6d17h
f5-pulsar-bookkeeper-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-pulsar-broker-0 0/1 Init:0/4 0 6d17h
f5-pulsar-broker-1 0/1 Init:0/4 0 6d17h
f5-query-pipeline-6c4ff48788-8rw6c 0/1 Pending 0 6d17h
f5-rules-ui-5fd49b5974-smq4k 1/1 Running 0 6d17h
f5-solr-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-solr-exporter-778cfc8566-fqtg8 0/1 Init:0/1 0 6d17h
f5-templating-567f74c8c4-d8skj 0/1 Pending 0 6d17h
f5-tikaserver-6bbd4dd778-59hw8 1/1 Running 0 6d17h
f5-webapps-c5cb654cc-njjcs 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-workflow-controller-7bc469557b-l2dml 1/1 Running 0 6d17h
f5-zookeeper-0 1/1 Running 0 6d17h
f5-zookeeper-1 0/1 Pending 0 6d17h
milvus-writable-64bc9f8b75-hdfsw 1/1 Running 0 6d17h
seldon-controller-manager-85cc4458dc-w9zmw 1/1 Running 2 6d17h

I believe most of the containers are in CrashLoopBackOff because they cannot verify a connection to ZooKeeper.
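
If that is the case, the failed connection attempts should show up in the logs of the check-zk init container, e.g. (pod names taken from the listing above):

# Init container that gates the main container on ZooKeeper being reachable
kubectl logs f5-api-gateway-df9b65dc6-dfsd5 -c check-zk

# The one ZooKeeper replica that did start, to see whether it is waiting for quorum
kubectl logs f5-zookeeper-0 --tail=100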

kubectl describe pod f5-api-gateway-df9b65dc6-dfsd5
Name:           f5-api-gateway-df9b65dc6-dfsd5
Namespace:      default
Priority:       0
Node:           aks-agentpool-20404971-vmss000000/10.240.0.4
Start Time:     Wed, 03 Nov 2021 01:26:45 +0000
Labels:         app.kubernetes.io/component=api-gateway
                app.kubernetes.io/instance=f5
                app.kubernetes.io/part-of=fusion
                pod-template-hash=df9b65dc6
Annotations:    prometheus.io/path: /actuator/prometheus
                prometheus.io/port: 6764
                prometheus.io/scrape: true
Status:         Pending
IP:             10.244.0.18
IPs:
  IP:           10.244.0.18
Controlled By:  ReplicaSet/f5-api-gateway-df9b65dc6
Init Containers:
  check-zk:
    Container ID:  containerd://764fad878747462caeb8147f618c8613ef9e1be76d446a31a61c452f8630056e
    Image:         lucidworks/check-fusion-dependency:v1.2.0
    Image ID:      docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
    Port:
    Host Port:
    Args:
      zookeeper
    State:          Waiting
      Reason:       CrashLoopBackOff

Events from kubectl describe pod f5-zookeeper-1:

Events:
Type     Reason             Age    From               Message
----     ------             ----   ----               -------
Warning FailedScheduling 2m41s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 91m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 79m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 77m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 67m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 66m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 55m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 52m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 42m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636480742}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 41m default-scheduler 0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {node.cloudprovider.kubernetes.io/shutdown: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 39m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 29m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 28m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 17m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m47s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m36s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Normal NotTriggerScaleUp 2m42s (x29319 over 3d23h) cluster-autoscaler pod didn't trigger scale-up: 1 max node group size reached
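
For what it's worth, the two constraints in those events look like (a) the Azure data disk attach limit on one of the nodes and (b) a scheduling constraint on f5-zookeeper-1 (its pod affinity rules or the node affinity of its bound volume). This is how I have been trying to inspect both; the location below is a placeholder:

# Affinity rules on the pending ZooKeeper pod
kubectl get pod f5-zookeeper-1 -o jsonpath='{.spec.affinity}'

# Where the persistent volumes are bound
kubectl get pv -o wide

# Maximum number of data disks the node VM size can attach
az vm list-sizes --location <my-location> --query "[?name=='Standard_D4_v3'].maxDataDiskCount"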

Would you be able to let me know what resources are needed in order to have all services up?

Thanks,
Greg
