S3 Batching with VRL Template syntax Key_prefix #23013
-
Quick question on the expected behavior of batching to S3. I have the below S3 config for my sink, which is accepting many different data sources. Based on this config, would my assumption of multiple 30 MB files being written to S3, or one file every 60 seconds, be correct? In reality we are seeing hundreds of thousands of small ~10 KB files. The running theory we are testing is that the VRL template syntax in `key_prefix` is causing the files to be closed and written to S3 before they hit either condition. For example, we have a stream of two data sources with different partition values:

```yaml
type: aws_s3
inputs:
  - 'ocsf_*'
bucket: "dest_bucket_for_ocsf"
region: "us-east-1"
key_prefix: "ocsf/{{.unmapped.dl_partition_class_name}}/{{.unmapped.dl_partition_vendor_name}}/%Y%m%d/"
compression: zstd
healthcheck:
  enabled: true
  timeout_secs: 15
batch:
  max_bytes: 30973400 # 30 MB
  timeout_secs: 60
encoding:
  codec: json
```
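As a sketch of why a templated `key_prefix` like this can produce many distinct object keys, the following toy rendering (hypothetical field values, not Vector's actual template engine) shows that each distinct class/vendor/date combination resolves to its own prefix:

```python
from datetime import datetime, timezone

# Hypothetical events from two different data sources.
events = [
    {"unmapped": {"dl_partition_class_name": "network_activity",
                  "dl_partition_vendor_name": "vendor_a"}},
    {"unmapped": {"dl_partition_class_name": "authentication",
                  "dl_partition_vendor_name": "vendor_b"}},
]

def render_key_prefix(event: dict) -> str:
    """Approximate how the templated key_prefix resolves for one event."""
    u = event["unmapped"]
    date = datetime.now(timezone.utc).strftime("%Y%m%d")
    return (f"ocsf/{u['dl_partition_class_name']}/"
            f"{u['dl_partition_vendor_name']}/{date}/")

# Each distinct (class, vendor, date) combination yields a separate key.
print({render_key_prefix(e) for e in events})  # two distinct prefixes
```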
-
When batching is configured like this:

```yaml
batch:
  max_bytes: 30973400 # 30 MB
  timeout_secs: 60    # 60 s
```

Vector should flush a batch to S3 when either `max_bytes` is reached or `timeout_secs` has elapsed. However, a separate batch is maintained for each unique S3 object key, and in your case, the key is dynamically derived from the `key_prefix` template. If …
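The effect of per-key batching on object size can be sketched with a toy model (hypothetical throughput and event-size numbers; this is not Vector's implementation): when events fan out across many distinct rendered keys, each key's batch accumulates bytes slowly and is flushed by the 60 s timeout long before it reaches 30 MB.

```python
# Toy model of per-key batching: events are routed to a separate batch
# per rendered key, and each batch flushes when it hits max_bytes OR
# when timeout_secs elapses.

MAX_BYTES = 30_973_400   # 30 MB, from the sink config
TIMEOUT_SECS = 60        # from the sink config
EVENT_BYTES = 500        # hypothetical average encoded event size
EVENTS_PER_SEC = 10_000  # hypothetical total throughput

def flushed_object_size(num_distinct_keys: int) -> int:
    """Bytes accumulated per key when the timeout fires first."""
    per_key_rate = EVENTS_PER_SEC * EVENT_BYTES / num_distinct_keys
    return int(min(MAX_BYTES, per_key_rate * TIMEOUT_SECS))

# One key: the byte limit wins, so ~30 MB objects are written.
print(flushed_object_size(1))       # prints 30973400

# 30,000 distinct class/vendor/date combinations: each batch only
# reaches ~10 KB before the 60 s timeout flushes it.
print(flushed_object_size(30_000))  # prints 10000
```

Under these assumed numbers, the many-small-files symptom in the question falls out directly from key cardinality rather than from the batch settings themselves.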
Fixed the link. Unless specified, it should use the default. In this case, it actually is `None`, so this setting doesn't really affect batching unless it is specified in the config.
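Assuming the setting in question is `batch.max_events` (the thread doesn't name it explicitly, so this is an inference from the `None` default), it would only come into play if set explicitly, e.g.:

```yaml
batch:
  max_bytes: 30973400 # 30 MB
  timeout_secs: 60
  max_events: 100000  # hypothetical value; unset (None) by default
```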