APISIX 100% Memory Usage on RedHat Openshift Kubernetes Cluster #12227
-
Hi Community, I'm seeing high memory usage of the apisix-data-plane pod on a RedHat Openshift cluster. After periodic requests the usage climbs to 100% of the pod's 3Gi memory limit, whereas on an AWS EKS cluster the memory usage is significantly lower.

APISIX version: 3.12.0
LuaRocks version: 3.8.0

Can anyone help me identify the issue? Thanks
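A useful first datapoint when comparing the two clusters is how the pod's memory splits across processes. A minimal check, with the namespace, label selector, and deployment name as placeholders for whatever the release generated, and assuming `ps` is available in the image:

```sh
# Pod-level usage vs. the 3Gi limit
kubectl -n apisix top pod -l app.kubernetes.io/component=data-plane

# Largest resident processes inside the pod
kubectl -n apisix exec deploy/apisix-data-plane -- sh -c 'ps -eo pid,rss,comm --sort=-rss | head'
```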
-
Hi @geeky-akshay, to debug the root cause we need a reproducible environment like yours.
Can you take some time to organize one and send it to us, so we can have a try?
-
Environment

RedHat Openshift Container Platform version: 4.16.36
LoadBalancer: MetalLB
Monitoring stack: https://docs.redhat.com/en/documentation/openshift_container_platform/3.11/html/configuring_clusters/prometheus-cluster-monitoring#installing-monitoring-stack

Deployment

APISIX is deployed using the Bitnami Helm chart: https://github.com/bitnami/charts/tree/main/bitnami/apisix (version: 4.2.5, appVersion: 3.12.0)

Custom values:

```yaml
global:
  security:
    allowInsecureImages: true
controlPlane:
  extraConfig:
    apisix:
      ssl:
        ssl_trusted_certificate: '/bitnami/certs/{{ .Values.controlPlane.tls.certCAFilename }}'
  lifecycleHooks:
    postStart:
      exec:
        command:
          - /bin/sh
          - -c
          - |
            sleep 5;
            rm /usr/local/apisix/logs/worker_events.sock
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      labels:
        clusterobservability: "1"
        cluster_component: "apisix-control-plane"
      metricRelabelings:
        - targetLabel: "cluster_component"
          replacement: "apisix-control-plane"
  resourcesPreset: "large"
dataPlane:
  extraConfig:
    apisix:
      ssl:
        fallback_sni: "apisix-data-plane.local"
    plugin_attr:
      prometheus:
        metrics:
          http_status:
            expire: 600
          http_latency:
            expire: 600
          bandwidth:
            expire: 600
          upstream_status:
            expire: 600
      redirect:
        https_port: 443
  lifecycleHooks:
    postStart:
      exec:
        command:
          - /bin/sh
          - -c
          - |
            sleep 5;
            rm /usr/local/apisix/logs/worker_events.sock
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      labels:
        clusterobservability: "1"
        cluster_component: "apisix-data-plane"
      metricRelabelings:
        - targetLabel: "cluster_component"
          replacement: "apisix-data-plane"
  resourcesPreset: "large"
ingressController:
  extraConfig:
    apisix_resource_sync_interval: 30m
    kubernetes:
      resync_interval: 1h
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      labels:
        clusterobservability: "1"
        cluster_component: "apisix-ingress-controller"
      metricRelabelings:
        - targetLabel: "cluster_component"
          replacement: "apisix-ingress-controller"
  replicaCount: 3
  resourcesPreset: "small"
etcd:
  autoCompactionMode: "revision"
  autoCompactionRetention: "3"
  extraEnvVars:
    - name: ETCD_QUOTA_BACKEND_BYTES
      value: "4294967296"
  metrics:
    enabled: true
    podMonitor:
      enabled: true
      namespace: "openshift-monitoring"
      additionalLabels:
        clusterobservability: "1"
        cluster_component: "apisix-etcd"
      relabelings:
        - targetLabel: "cluster_component"
          replacement: "apisix-etcd"
  resourcesPreset: null
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
```

Extra resources:

```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixClusterConfig
metadata:
  name: default
  namespace: apisix
spec:
  monitoring:
    prometheus:
      enable: true
```

ApisixGlobalRule.yaml

```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixGlobalRule
metadata:
  name: global
  namespace: apisix
spec:
  plugins:
    - name: redirect
      enable: true
      config:
        http_to_https: true
```

ApisixTls.yaml

```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixTls
metadata:
  name: apisix-tls
  namespace: apisix
spec:
  hosts:
    - <APISIX-EXTERNAL-IP-ADDRESS>
    - apisix-data-plane.local
  secret:
    name: <CERT-CONTAINING-APISIX-EXTERNAL-IP-ADDRESS>
    namespace: apisix
```
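In case it helps reproduce: the `extraConfig` values above get merged into the generated APISIX config, which can be dumped straight from the running pod (deployment name is a placeholder; the path follows the standard APISIX layout that the postStart hooks above already reference):

```sh
kubectl -n apisix exec deploy/apisix-data-plane -- cat /usr/local/apisix/conf/config.yaml
```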
-
Upon more investigation I found that in AWS I see 4 nginx worker processes, hence the lower memory usage, while in Openshift I see 80 worker processes, hence the higher memory usage. When I explicitly set the worker process count to a small fixed value, memory usage stays within the limit.
Is there a way to control nginx worker processes in auto mode so that it doesn't run out of memory?
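For anyone else hitting this: APISIX exposes nginx's worker count as `nginx_config.worker_processes` in its config.yaml, defaulting to `auto`. A minimal sketch of pinning it through the chart values used above, assuming `dataPlane.extraConfig` is merged into the generated config the same way the `ssl` block is (the value 4 is just an example):

```yaml
dataPlane:
  extraConfig:
    nginx_config:
      # "auto" (the default) starts one worker per CPU visible on the
      # host; a fixed count caps memory at roughly N x per-worker RSS.
      worker_processes: 4
```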
-
More investigation: in AWS, `worker_processes auto` resolves to the node's 4 CPUs, whereas in Openshift it resolves to the node's 80 CPUs. So the worker count follows the host's CPU count, not the CPUs allocated to the container.
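A quick way to confirm this from inside the container (deployment name is a placeholder; assumes `nproc` and `ps` exist in the image):

```sh
# CPUs visible to the container: the host's count, not the cgroup CPU limit
kubectl -n apisix exec deploy/apisix-data-plane -- nproc

# Running workers: with "auto", one "nginx: worker process" per visible CPU
kubectl -n apisix exec deploy/apisix-data-plane -- sh -c "ps ax | grep -c '[n]ginx: worker'"
```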
I guess setting `worker_processes` based on the number of CPUs allocated to the container would be the solution.
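A sketch of that idea in the values format used above, pairing an explicit CPU limit with a matching worker count (the resource numbers are placeholders, not recommendations):

```yaml
dataPlane:
  resourcesPreset: null
  resources:
    requests:
      cpu: "2"
      memory: 1Gi
    limits:
      cpu: "4"
      memory: 3Gi
  extraConfig:
    nginx_config:
      # Keep in sync with the CPU limit above so "auto" on an 80-CPU
      # node cannot spawn 80 workers inside a 3Gi pod.
      worker_processes: 4
```

Since nginx derives `auto` from the CPUs it can see rather than from the cgroup CPU quota, an explicit count like this is the reliable way to bound per-pod worker memory.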