A large number of files leads to a slow throughput rate to Kafka #15467
Unanswered · WilliamEricCheung asked this question in Q&A
Replies: 1 comment
- Since no one has commented on my post, we tried modifying our scripts to keep only about 100 logs within 4 hours. Now Vector runs smoothly, but this is not the solution we want for our business.
- #15276 (comment)
A note for the community
Problem
Background

Hello, we are migrating service logs through a vector (1 host) -- kafka (3 hosts) -- logstash (3 hosts) -- opensearch (1 kibana host) pipeline. Our services write about 25 gzip files per hour into a certain directory, each about 900 MB in size, and we have a script that removes logs older than one day from this directory to reduce storage pressure.
When we started Vector at Nov 16, 2022 @ 9:00:00, throughput was steady until 12:00:00; during those 3 hours we got about 450M index hits per hour in Kibana. However, the hit rate dropped rapidly after 12 PM: we got only 100M at 3 PM, 80M at 6 PM, 70M at 9 PM, 60M at 0 AM on Nov 17, 2022, 50M at 3 AM, and 40M at 6 AM.
Our thinking
Our services write 25 files every hour, so over a whole day we accumulate 24 * 25 = 600 files. Even though we use a script to keep only one day of logs, the total file size is still very heavy: 600 * 900 MB = 540 GB.
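If the growing file count is indeed the bottleneck, note that (as far as I understand the file source) Vector keeps re-globbing the include pattern and tracking a checkpoint for every matching file, so a directory with 600 resident files is a much heavier scan than the ~25 files present in the first hour. Below is a hedged sketch of file-source options that are commonly used to limit this work; the option names are taken from the file source documentation and should be verified against 0.23 before use, and the values are placeholders.

```toml
# Hypothetical tuning sketch for a directory that accumulates ~600 files per day.
# Verify these option names against the Vector 0.23 file source docs before use.

[sources.request_logs]
type    = "file"
include = ["/local/vectorLog/requests.log.*.gz"]
oldest_first             = true    # drain older files before newer ones
ignore_older_secs        = 86400   # skip files not modified within the last day (placeholder value)
glob_minimum_cooldown_ms = 60000   # re-scan the directory at most once per minute (placeholder value)
```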
Question
Basic Information
Source format: gzip files
Files directory: /local/vectorLog
File name format: /local/vectorLog/requests.log.yyyy-MM-dd-HH.gz
File size: ~900 MB
Configuration
Version
vector 0.23.3 (x86_64-unknown-linux-gnu af8c9e1 2022-08-10)
Debug Output
Example Data
No response
Additional Context
No response
References
No response