AKS: Not all services are starting #198

@greg-bucko

Description

The README.md for setup_f5_aks.sh states: "The cluster will have three Standard_D4_v3 nodes which have 4 CPU cores and 16 GB of memory." However, that does not seem to be enough for all of the services to start.
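
For context, this is roughly how I have been checking what the nodes can actually hold, and how a larger pool could be added if the sizing really is the problem (the resource group, cluster, and pool names below are placeholders, not the ones the script creates):

# Compare each node's allocatable CPU/memory with what is already requested
kubectl describe nodes | grep -A 8 "Allocated resources"

# Hypothetical example: add a second node pool with bigger VMs so the Pending pods can schedule
az aks nodepool add \
  --resource-group <my-resource-group> \
  --cluster-name <my-cluster> \
  --name largepool \
  --node-count 3 \
  --node-vm-size Standard_D8s_v3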

kubectl get pods
NAME READY STATUS RESTARTS AGE
f5-admin-ui-6896d75965-g4mqp 1/1 Running 0 6d17h
f5-ambassador-57c8d798d-cjzwc 0/1 CrashLoopBackOff 1869 6d17h
f5-api-gateway-df9b65dc6-dfsd5 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-argo-ui-7c87f99b64-vmx2s 1/1 Running 0 6d17h
f5-auth-ui-7765cd5488-h5wx8 1/1 Running 0 6d17h
f5-classic-rest-service-0 0/1 Init:0/3 1361 6d17h
f5-connectors-68f64c488f-m6mts 0/1 Pending 0 6d17h
f5-connectors-backend-69d877b594-k6tb5 0/1 Pending 0 6d17h
f5-devops-ui-86bc48f54-2c65h 1/1 Running 0 6d17h
f5-fusion-admin-75794787c9-pn294 0/1 Pending 0 6d17h
f5-fusion-indexing-8479f45ffc-bmqkj 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-fusion-log-forwarder-9c768c45-tg4m9 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-insights-5ff56c5d-95vcd 1/1 Running 0 6d17h
f5-job-launcher-6f7896dc-59g8m 0/1 CrashLoopBackOff 2220 6d17h
f5-job-rest-server-58994d99dd-6v64z 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-ml-model-service-7448f97bf6-s9m6s 0/1 Init:CrashLoopBackOff 1357 6d17h
f5-monitoring-grafana-6647cddd56-m45cl 1/1 Running 0 6d17h
f5-monitoring-prometheus-kube-state-metrics-647cd65579-qc8kc 1/1 Running 0 6d17h
f5-monitoring-prometheus-pushgateway-5dd445ff4f-pccht 1/1 Running 0 6d17h
f5-monitoring-prometheus-server-0 2/2 Running 0 6d17h
f5-mysql-5666f7474f-xz7cs 1/1 Running 0 6d17h
f5-pm-ui-5d4cb9f8f6-xbsr8 1/1 Running 0 6d17h
f5-pulsar-bookkeeper-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-pulsar-broker-0 0/1 Init:0/4 0 6d17h
f5-pulsar-broker-1 0/1 Init:0/4 0 6d17h
f5-query-pipeline-6c4ff48788-8rw6c 0/1 Pending 0 6d17h
f5-rules-ui-5fd49b5974-smq4k 1/1 Running 0 6d17h
f5-solr-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-solr-exporter-778cfc8566-fqtg8 0/1 Init:0/1 0 6d17h
f5-templating-567f74c8c4-d8skj 0/1 Pending 0 6d17h
f5-tikaserver-6bbd4dd778-59hw8 1/1 Running 0 6d17h
f5-webapps-c5cb654cc-njjcs 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-workflow-controller-7bc469557b-l2dml 1/1 Running 0 6d17h
f5-zookeeper-0 1/1 Running 0 6d17h
f5-zookeeper-1 0/1 Pending 0 6d17h
milvus-writable-64bc9f8b75-hdfsw 1/1 Running 0 6d17h
seldon-controller-manager-85cc4458dc-w9zmw 1/1 Running 2 6d17h

I believe most of the containers are in CrashLoopBackOff because they cannot verify a connection to ZooKeeper.
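
If that is the case, the failed connection attempts should show up in the logs of the check-zk init container, e.g. (pod names taken from the listing above):

# Init container that gates the main container on ZooKeeper being reachable
kubectl logs f5-api-gateway-df9b65dc6-dfsd5 -c check-zk

# The one ZooKeeper replica that did start, to see whether it is waiting for quorum
kubectl logs f5-zookeeper-0 --tail=100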

kubectl describe pod f5-api-gateway-df9b65dc6-dfsd5
Name:           f5-api-gateway-df9b65dc6-dfsd5
Namespace:      default
Priority:       0
Node:           aks-agentpool-20404971-vmss000000/10.240.0.4
Start Time:     Wed, 03 Nov 2021 01:26:45 +0000
Labels:         app.kubernetes.io/component=api-gateway
                app.kubernetes.io/instance=f5
                app.kubernetes.io/part-of=fusion
                pod-template-hash=df9b65dc6
Annotations:    prometheus.io/path: /actuator/prometheus
                prometheus.io/port: 6764
                prometheus.io/scrape: true
Status:         Pending
IP:             10.244.0.18
IPs:
  IP:           10.244.0.18
Controlled By:  ReplicaSet/f5-api-gateway-df9b65dc6
Init Containers:
  check-zk:
    Container ID:  containerd://764fad878747462caeb8147f618c8613ef9e1be76d446a31a61c452f8630056e
    Image:         lucidworks/check-fusion-dependency:v1.2.0
    Image ID:      docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
    Port:
    Host Port:
    Args:
      zookeeper
    State:          Waiting
      Reason:       CrashLoopBackOff

Events from kubectl describe pod f5-zookeeper-1:

Events:
Type     Reason             Age    From               Message
----     ------             ----   ----               -------
Warning FailedScheduling 2m41s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 91m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 79m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 77m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 67m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 66m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 55m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 52m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 42m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636480742}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 41m default-scheduler 0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {node.cloudprovider.kubernetes.io/shutdown: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 39m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 29m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 28m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 17m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m47s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m36s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Normal NotTriggerScaleUp 2m42s (x29319 over 3d23h) cluster-autoscaler pod didn't trigger scale-up: 1 max node group size reached
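
For what it's worth, the two constraints in those events look like (a) the Azure data disk attach limit on one of the nodes and (b) a scheduling constraint on f5-zookeeper-1 (its pod affinity rules or the node affinity of its bound volume). This is how I have been trying to inspect both; the location below is a placeholder:

# Affinity rules on the pending ZooKeeper pod
kubectl get pod f5-zookeeper-1 -o jsonpath='{.spec.affinity}'

# Where the persistent volumes are bound
kubectl get pv -o wide

# Maximum number of data disks the node VM size can attach
az vm list-sizes --location <my-location> --query "[?name=='Standard_D4_v3'].maxDataDiskCount"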

Would you be able to let me know what resources are needed in order to have all services up?

Thanks,
Greg
