Save files based on processing time and not event time #44

@KhaoticMind

Currently the output plugin saves temporary files (the ones that will be sent to ADX) based on the @timestamp field of the events.

When working with a large environment, we may have cases where devices aren't fully time-synced and send events close to each other but with very different timestamp values. In this case the plugin ends up creating many small files to send to ADX, which can increase the load on the cluster (many small files being ingested) and increase the cost of the service (each small file triggers its own write operations, which add up in the total bill).
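To illustrate the effect, here is a minimal Ruby sketch, not the plugin's actual code. It assumes temporary files are named after a per-minute bucket of a timestamp. The `bucket_file` helper, the path pattern, and the clock skews are all made up for the example.

```ruby
# Hypothetical helper: maps a timestamp to a per-minute temp file name.
# The path pattern is an assumption for illustration, not the plugin's code.
def bucket_file(time)
  time.getutc.strftime('/tmp/kusto/%Y-%m-%d-%H-%M.txt')
end

# Five events that arrive at the same moment, from devices whose clocks disagree.
arrival = Time.utc(2024, 1, 1, 12, 0, 30)
clock_skews_seconds = [0, -3600, 7200, -90, 600]
events = clock_skews_seconds.map do |skew|
  { event_time: arrival + skew, processed_at: arrival }
end

# Distinct files produced when bucketing by each kind of timestamp.
files_by_event_time = events.map { |e| bucket_file(e[:event_time]) }.uniq
files_by_processing_time = events.map { |e| bucket_file(e[:processed_at]) }.uniq

puts "files when bucketing by event time:      #{files_by_event_time.size}"
puts "files when bucketing by processing time: #{files_by_processing_time.size}"
```

With these made-up skews, every event lands in its own file when bucketed by event time, while bucketing by processing time puts all of them in a single file.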

We were facing this issue in our environment. By customizing the filter step in Logstash as shown below, we forced all events to be written to the file based on the processing time rather than the event time. This helped us reduce the write-operation costs by 95% (yes, ninety-five percent!).

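   # Keep the original event time in a separate field
   mutate {
      copy => { "@timestamp" => "event_timestamp" }
   }
   # Record the time at which Logstash processed the event
   ruby {
      code => "event.set('logstash_processed_at', Time.now());"
   }
   # Overwrite @timestamp with the processing time
   mutate {
      copy => { "logstash_processed_at" => "@timestamp" }
   }
   # Drop the temporary field
   mutate {
      remove_field  => ["logstash_processed_at"]
   }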
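As a rough illustration of the net effect of this filter chain, the sketch below applies the same swap to a plain Ruby Hash standing in for an event. The `stamp_with_processing_time` function is hypothetical and does not use the real Logstash event API.

```ruby
# Hypothetical stand-in for a Logstash event: a plain Hash, not LogStash::Event.
# Returns a copy whose '@timestamp' is the processing time, with the original
# event time preserved under 'event_timestamp'.
def stamp_with_processing_time(event, now = Time.now)
  stamped = event.dup
  stamped['event_timestamp'] = stamped['@timestamp'] # keep the original event time
  stamped['@timestamp'] = now                        # file bucketing now follows processing time
  stamped
end

event = { '@timestamp' => Time.utc(2024, 1, 1, 11, 0, 30), 'message' => 'hello' }
now = Time.utc(2024, 1, 1, 12, 0, 30)
result = stamp_with_processing_time(event, now)
puts result
```

The original event time is still available for querying in `event_timestamp`, so no information is lost by overwriting `@timestamp`.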

Before the change we saw many small files (kilobytes each) being ingested every minute; now we have just one file of 100-200 MB per minute.

While we know that we need to fix the clocks of the devices sending the data, this issue might also happen because of buffering and send delays (network disconnects and so on).
@avneraa and @ag-ramachandran are aware of our situation.
Unfortunately, we haven't had the time to change the plugin code and contribute a PR.
