K8S Deployment seeing uncapped memory growth #2042

MaxwellDPS · 2023-04-17T15:46:30Z

MaxwellDPS
Apr 17, 2023

Hey all!

I have an SC4S instance deployed on a k8s cluster that seems to keep up with the volume of traffic it receives (~1.5+ Gbps).
However the memory usage of the pods continues to grow and never seems to decrease as load decreases, After the pods get to ~ 5.5 of their 6GB limit they start to drop events.

This seems like it may be related, but context is lacking #1946

Is it expected to have memory grow from ~900 MB to ~5 GB in just over a day, without any decrease in usage?

We have been handling this by preemptively restarting the statefulset every ~12 Hours, are there any know issues with this approach?

What can we do to try and stabilize memory use over time?

Thanks in advance!!

Extra Info

Deployment info

Host(s): 2x 12c @ 3+GHZ | 32GB DDR4 | 10G networking (These are worker nodes dedicated to SC4S)
OS: CentOS Stream 9 5.14.0-176.el9.x86_64
Kubernetes: v1.25.3
Container Runtime: cri-o://1.25.2

Ingress provided by MetalLB, with a externalTrafficPolicy of local

SC4S Config

~ 12 Vendor ports listening
Output to a single HEC endpoint

dest_hec_workers: 50
Reliable: no
Disk buffering is also showing no messages being written to disk ( All files, all pods)

Disk-buffer state loaded; filename='/var/lib/syslog-ng/syslog-ng-00000.qf', number_of_messages='0'

Command used to test disk buffer files

for pod in $(kubectl get po -n sc4s --output=jsonpath={.items..metadata.name}); do echo $pod && kubectl exec -it $pod -n sc4s -- bash -c 'for qfile in /var/lib/syslog-ng/*.qf ; do dqtool info $qfile 2>&1 ; done'; done

HEC Destination

SC4S feeds a scaleable NGINX LB that forwards round robin to 30+ indexers
This does not seem to be a bottleneck

Pod info

resources:
  limits:
#    cpu: 100m 🚀
    memory: 6Gi
  requests:
    cpu: 2000m
    memory: 2Gi

autoscaling:
  enabled: true
  minReplicas: 4
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

k top no (under ~80% max load)

NAME                       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY% 
<DEDICATED NODE #1>        3436m        28%    6417Mi          20%       
<DEDICATED NODE #2>        3030m        25%    4976Mi          16%

MaxwellDPS · 2023-04-24T15:59:07Z

MaxwellDPS
Apr 24, 2023
Author

@rjha-splunk Howdy! You seem to be the primary maintainer for SC4S these days, have you encountered this issue / do you have any recommendations for a fix?

1 reply

rjha-splunk Aug 11, 2023
Maintainer

Somehow looks like i didnt see this message, although we havent seen it before, i will check it next week.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

K8S Deployment seeing uncapped memory growth #2042

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

K8S Deployment seeing uncapped memory growth #2042

Uh oh!

MaxwellDPS Apr 17, 2023

Extra Info

Deployment info

SC4S Config

HEC Destination

Pod info

k top no (under ~80% max load)

Replies: 1 comment · 1 reply

Uh oh!

MaxwellDPS Apr 24, 2023 Author

Uh oh!

rjha-splunk Aug 11, 2023 Maintainer

MaxwellDPS
Apr 17, 2023

Replies: 1 comment 1 reply

MaxwellDPS
Apr 24, 2023
Author

rjha-splunk Aug 11, 2023
Maintainer