Hi, peeps. Here's the problem: Vector is used in a volatile, short-lived pipeline. The scenario is as follows:
But here is the catch: how do I know that Vector has finished and all data is already out of it? In other words, when do I begin my log data analysis? Immediately after EOF? After 10 seconds? A minute? Longer? The easiest way would be to know how many messages it has processed so far, or at least to have some way to "kick" it via the API or similar, to make sure it shuts down properly and flushes everything.

For example, if I send SIGINT to the process (https://vector.dev/docs/reference/cli/#vector_graceful_shutdown_limit_secs), Vector claims "I am shutting down gracefully", but it actually still seems to be chewing through log messages in the PostgreSQL sink's pool, effectively killing them and ultimately undermining my analysis results. 😉 Currently I have just duct-taped it with a sleep of ~10 seconds followed by SIGINT (and that works). But this also sucks, for obvious reasons. So what would be the idea here? How do I shut it down properly while being 100% sure that Vector has really finished? @jszwedko, it would be nice if you could shed some light on this! (thanks)
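For reference, the "sleep, then SIGINT" workaround could be scripted roughly like this. It is a minimal sketch, assuming a Unix host with the POSIX `kill` utility on `PATH`; the helper name `sigint_after` is hypothetical, and the fixed grace period remains a guess rather than proof that every sink has flushed.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: wait a fixed grace period, then ask the child to shut
/// down gracefully with SIGINT and block until it exits (Unix only).
fn sigint_after(mut child: Child, grace: Duration) -> io::Result<ExitStatus> {
    // Fixed delay after input reaches EOF (about 10 s in the post above).
    thread::sleep(grace);
    // std's Child::kill sends SIGKILL on Unix, which skips graceful shutdown,
    // so deliver SIGINT through the POSIX `kill` utility instead.
    let status = Command::new("kill")
        .arg("-INT")
        .arg(child.id().to_string())
        .status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "kill -INT failed"));
    }
    // Block until the process finishes its own shutdown sequence.
    child.wait()
}
```

In the setup described here the child would be spawned with something like `Command::new("vector").args(["--config", "vector.toml"]).spawn()`. The weakness is the same as with the shell version: the grace period is either wasted time or too short.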
Hi @isbm, unfortunately Vector lacks a good way to know when processing is done. I think the workaround you came up with is probably the best you can do. #11095 is tracking these sorts of "ETL" use cases.
@jszwedko Well, OK, but "graceful shutdown" on SIGINT should still at least catch all the messages waiting or being processed in all pools/buffers and wait until they are done, right? In other words, you don't quit while any pool, source, or sink anywhere in the pipeline still contains anything, and you quit if and only if all sources and sinks have a length of 0. Basically, I would still expect:
If that is not the case right now, I would consider it a bug that needs to be fixed, no? 😉 Otherwise you are losing data. Unless I am wrong, I don't think the fix is hard. All that needs to be done is to define a "length" on each sink/source in a trait and require every component to implement it. Then you just loop over all defined sources/sinks, checking whether they are all 0, and sleep for 0.05 seconds between checks. 😉 Would that fix work? If so, we would submit a PR for it. At some point...
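The proposed fix could look roughly like the following. This is a minimal sketch under stated assumptions: the `Pending` trait, its `pending` method, and `wait_until_drained` are hypothetical names for the "length" idea and do not exist in Vector; the 50 ms interval mirrors the 0.05 s from the post.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical trait from the proposal: every source, buffer, and sink
/// reports how many events it still holds. Vector has no such trait today.
trait Pending {
    fn pending(&self) -> usize;
}

/// Poll every component and sleep `interval` between passes,
/// returning once all of them report zero pending events.
fn wait_until_drained(components: &[&dyn Pending], interval: Duration) {
    while components.iter().any(|c| c.pending() > 0) {
        thread::sleep(interval);
    }
}
```

One caveat this sketch does not address: a pass in which every component reports zero can still race with an event that is in flight between two components, so a real implementation would likely need to track events end to end rather than sample queue lengths.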
@jszwedko OK, so in this case a timeout is needed anyway. The only remaining question is: should we make the timeout dynamic? That is, Vector would know for certain when it is safe to terminate completely, because all sinks and sources are already empty, instead of waiting for no reason until the last second of a fixed timeout, which could also turn out to be too short, since you never know what kind of processing users do. The benefit of this approach is that Vector would not need to wait a full minute but could terminate immediately once there are no more messages left to process, saving DevOps time when running their tests; likewise, it would wait even longer if necessary.
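The dynamic timeout idea can be sketched as a drain loop that exits as soon as nothing is pending, with an optional upper bound (`None` meaning wait indefinitely). This is an illustration under assumptions only: `drain_with_limit`, the `Shutdown` enum, and the `pending_total` closure (standing in for the sum of the hypothetical per-component lengths) are not part of Vector.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical outcome of a shutdown wait.
#[derive(Debug, PartialEq)]
enum Shutdown {
    Drained,
    TimedOut { still_pending: usize },
}

/// Return as soon as `pending_total()` reports zero, or give up once
/// `limit` has elapsed. `limit: None` waits for as long as it takes.
fn drain_with_limit<F>(mut pending_total: F, limit: Option<Duration>, interval: Duration) -> Shutdown
where
    F: FnMut() -> usize,
{
    let started = Instant::now();
    loop {
        let left = pending_total();
        if left == 0 {
            // Nothing left anywhere: terminate immediately instead of idling.
            return Shutdown::Drained;
        }
        if let Some(max) = limit {
            if started.elapsed() >= max {
                // Upper bound reached: report how much would be dropped.
                return Shutdown::TimedOut { still_pending: left };
            }
        }
        thread::sleep(interval);
    }
}
```

Returning the leftover count on timeout, rather than exiting silently, is one way to make any data loss visible to the caller.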
It is possible to disable the timeout via `--no-graceful-shutdown-limit`. See https://vector.dev/docs/reference/cli/#vector_no_graceful_shutdown_limit. Note that Vector will shut down earlier if there is no more input to process; the limit is just the maximum amount of time it will wait.