Replies: 2 comments
-
Hi @abhisgup,

Just because Vector's process memory stays high in your S3-to-S3 pipeline doesn't always mean it's actively using it. For complex topics like this, I usually keep a notebook with multiple timeseries to get an overview at an abstract level. I will try to describe all the important Vector themes that affect memory usage.

🔍 What affects Vector's memory usage?

1. In-Memory Buffers
2. Batching in Sinks: for example, a 50 MB batch will hold roughly 50 MB of memory (a very rough estimate) until it is flushed.
3. High-Cardinality Output Keys
4. Transform Behavior
5. Concurrency in the S3 Source

Each of these maps to a setting in the sketch after this list.
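A minimal sketch, assuming a typical `aws_s3` source fed by SQS notifications, a `remap` transform, and an `aws_s3` sink; the component names, queue URL, bucket, and sizes below are placeholders rather than recommendations:

```toml
# 5. Concurrency in the S3 source: every object being processed at once is
#    downloaded and decompressed in memory.
[sources.input_s3]
type = "aws_s3"
region = "us-east-1"

[sources.input_s3.sqs]
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

# 4. Transform behavior: remap allocates per event; aggregating transforms hold state.
[transforms.shape]
type = "remap"
inputs = ["input_s3"]
source = '. = parse_json!(string!(.message))'

[sinks.output_s3]
type = "aws_s3"
inputs = ["shape"]
bucket = "my-output-bucket"
# 3. A high-cardinality key_prefix means many partially filled batches held in memory at once.
key_prefix = "date=%F/"
compression = "gzip"

[sinks.output_s3.encoding]
codec = "json"

# 2. Batching: each in-flight batch sits in memory until it is flushed.
[sinks.output_s3.batch]
max_bytes = 10_000_000
timeout_secs = 300

# 1. In-memory buffer in front of the sink: bounded by event count, not bytes.
[sinks.output_s3.buffer]
type = "memory"
max_events = 500
when_full = "block"
```

Very roughly, the sink side's worst case in RAM is the in-flight batches plus whatever the memory buffer holds, on top of whatever the source currently has in flight.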
🧠 When Is a Memory Spike "Normal"?

A memory spike can be expected if the pipeline is absorbing a temporary burst of input; memory should stabilize after the burst.

🚨 When Should You Worry?
✅ What You Can Do
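One low-effort way to get the kind of timeseries overview mentioned above is to expose Vector's own internal metrics and watch memory, buffer, and batch behavior over time. A minimal sketch (the `internal_metrics` source, `prometheus_exporter` sink, and `vector top` are real Vector features; the address is an assumption):

```toml
# Expose Vector's own metrics so memory, buffer, and batch behavior can be graphed over time.
[sources.vector_metrics]
type = "internal_metrics"

[sinks.metrics_out]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598"   # assumed port; scrape it with Prometheus or curl it ad hoc

# Optional: enable the API so `vector top` can show per-component throughput live.
[api]
enabled = true
```

Graphing these next to the container's RSS makes it easier to tell whether a spike lines up with an ingest burst or with batches piling up in a sink.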
📚 References
-
@pront Thanks a lot for responding. I have gone through the links you shared and will be trying the suggestions.
Each input file has JSON-formatted log records without a newline separator, because of which I had to write …

Vector ingested around 9 GB of gzipped logs over a period of 1 hour in all the runs, with the log generation rate (in the input S3 bucket) being constant, i.e. about 2.56 MB of compressed logs per second.

If Vector always took 20-30 GB for this workload, I could have done the capacity planning taking that into account. But Vector produced the output without any extra delay even while using 3 GB of memory for quite some time (40 minutes out of 1 hour in the first run). That makes it difficult to do capacity planning: I can still plan for the highest observed memory usage, but that might lead me to overprovision the machines, which isn't very economical. High memory usage is less of a problem than the variance in memory usage for the same workload.
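One change I'm planning to try from the suggestions is putting a disk buffer in front of the S3 sink so that backlog accumulates on disk rather than in RAM. A rough sketch of that change (`type = "disk"` and `max_size` are standard buffer options; the sink name and size are placeholders I haven't validated for this workload):

```toml
# Hypothetical change, not the config from these runs: buffer the S3 sink's backlog on
# disk instead of in memory. Sink name and size are placeholders; disk buffers also
# need a writable data_dir.
[sinks.output_s3.buffer]
type = "disk"
max_size = 2_147_483_648   # ~2 GiB on disk (assumed value)
when_full = "block"        # apply backpressure instead of dropping events
```

Whether this actually flattens the memory curve depends on where the allocations happen (source-side decompression vs. sink-side batches), which is where the metrics suggestion above should help narrow things down.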
-
Today I conducted two runs (each lasting around an hour) to ingest logs into a deployment of Vector running on AWS. The rate at which the logs were generated was the same for both runs.
Vector was reading the SQS event notifications coming from the input S3 bucket, performing some transformations on the logs, and writing them to an output S3 bucket.
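The shape of the config is roughly the following; the component names, region, queue URL, bucket, and the remap program are placeholders standing in for the real settings and transformations:

```toml
# Placeholder sketch of the pipeline shape, not the exact config.
[sources.input_s3]
type = "aws_s3"
region = "us-east-1"

[sources.input_s3.sqs]
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/input-bucket-notifications"

[transforms.reshape]
type = "remap"             # stands in for the actual transformations
inputs = ["input_s3"]
source = '. = parse_json!(string!(.message))'

[sinks.output_s3]
type = "aws_s3"
inputs = ["reshape"]
bucket = "output-bucket"
region = "us-east-1"
compression = "gzip"

[sinks.output_s3.encoding]
codec = "json"
```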
I noticed that Vector was consuming around 3 GB of memory for the first 40 minutes of the first run, and then the memory usage shot up to around 25 GB. The second run used around 25 GB for its entire duration.
Between the two runs, the Docker container running Vector was restarted.
timberio/vector:0.46.1-debian is the Docker image I was using.
A few days ago I had used the timberio/vector:0.46.0.custom.fba8185-debian Docker image and conducted multiple runs (with the same log generation rate and duration), where I observed that Vector would produce the output appropriately (there was no significant lag) while consuming 3 GB of memory for quite some time, and then the memory requirement would suddenly shoot up to 20-30 GB.
Any idea why the memory usage of Vector might be so erratic?