intelmqctl stop bots are still running #2595
Comments
short remark on:
You can also run the bots as systemd services directly: https://github.com/certtools/intelmq/tree/develop/contrib/systemd
I'll look into your report in more detail tomorrow.
Thought about it, but I wanted to avoid having to keep systemd and intelmq bots in sync (just one more step/layer on top). So for us it's currently about ensuring that intelmq runs when the server is rebooted (and catching the case where a bot crashes and needs to be restarted is something we try by monitoring the logs).
Oh, this may explain some issues I sometimes see. 🤔 I'd actually suggest not "just" increasing the hardcoded time, but rebuilding it a little to have x retries with a shorter sleep between them, ideally checking only the outstanding bots in every retry.
My intention was to simply make the time configurable (not only changing the hardcoded time). This obviously is the easy fix. I didn't think about multiple retries yet (they would reduce the latency when a larger sleep time is configured but not actually needed). I'm not sure whether checking more often should be done (I don't have a feeling for how many bots other users usually have running, and checking on a large number often might be something we want to avoid; maybe some kind of exponential backoff, and not checking on bots which we know have already stopped, helps here).
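To make the backoff idea concrete, here is a rough sketch. It is not intelmq code: `wait_with_backoff`, the `is_running` callback and all default values are hypothetical and only chosen for illustration. The delay doubles after every round, and bots that were already seen as stopped are not checked again.

```python
import time
from typing import Callable, Iterable, Set


def wait_with_backoff(
    bot_ids: Iterable[str],
    is_running: Callable[[str], bool],   # hypothetical status check
    initial_delay: float = 0.05,         # assumed value, not from intelmq
    factor: float = 2.0,
    max_total: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Return the ids of bots still running once the time budget is used up."""
    remaining = {bot for bot in bot_ids if is_running(bot)}
    delay, waited = initial_delay, 0.0
    while remaining and waited < max_total:
        # Never sleep longer than what is left of the overall budget.
        step = min(delay, max_total - waited)
        sleep(step)
        waited += step
        # Only re-check bots that were still running in the previous round.
        remaining = {bot for bot in remaining if is_running(bot)}
        delay *= factor
    return remaining
```

With a large botnet, the number of status checks shrinks each round as bots drop out of `remaining`, while a botnet that stops quickly only pays the first short delay.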
Not sure if I understand this correctly. If the operation succeeded, the exit code should be 0 and if it didn't (as in the case you describe), anything unequal to 0 is correct. What is the problem you are experiencing with the exit code?
Yes. It's correct, and the behaviour you see (bots not exiting in time and thus causing the "confusion") is not particularly new either, but it was never critical enough to be addressed. The 0.75 delay was a value that worked reasonably well and was small enough to not take too long. A proper solution would be what @kamil-certat said, and the effort is approximately equal to a configurable delay.
What I meant is that the stop operation actually worked (the bots are all stopped after all) but the exit code indicates some sort of error.
Alright, didn't find an issue for it (but searching for I see this is not high on your list of priorities so you might not bother implementing this. Because of the other feature I need to set up some place where I can develop anyhow, so I'd just implement this as well (if you're willing to merge something like this). Just to be clear you are more on the side of a simple
I guess there is none.
That would be greatly appreciated.
I guess increments of 0.1s and a maximum waiting time of 5s (or equivalent: maximum steps) would be sensible defaults. You get bonus points if the loop iterations only check the status of the not-yet stopped bots instead of checking all the bots in every iteration =) (causes fewer delays)
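A minimal sketch of these defaults (0.1 s steps, at most 5 s), using plain child processes as stand-ins for bots. `SLOW_BOT`, `start_slow_bot` and `stop_and_wait` are made-up names and not part of intelmq, and the example assumes a POSIX system where `Popen.terminate()` sends SIGTERM.

```python
import subprocess
import sys
import time

# Stand-in for a bot that needs about 0.3 s to shut down after SIGTERM.
SLOW_BOT = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda *_: (time.sleep(0.3), sys.exit(0)))\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.05)\n"
)


def start_slow_bot():
    """Start one fake bot and wait until its SIGTERM handler is installed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SLOW_BOT], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # blocks until the child has printed 'ready'
    return proc


def stop_and_wait(procs, step=0.1, max_wait=5.0):
    """Send SIGTERM to all processes, then poll only the ones still running.

    Returns the list of processes that did not stop within max_wait seconds.
    """
    for proc in procs:
        proc.terminate()  # SIGTERM on POSIX
    remaining = list(procs)
    deadline = time.monotonic() + max_wait
    while True:
        # Processes that have exited drop out and are not polled again.
        remaining = [p for p in remaining if p.poll() is None]
        if not remaining or time.monotonic() >= deadline:
            return remaining
        time.sleep(step)
```

A caller could map the result to an exit code, for example 0 if the returned list is empty and non-zero otherwise, so the command only reports a failure when a process is really still alive after the full waiting time.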
retry multiple times on `intelmqctl stop` to check if bots really stopped, since the bots might take longer to stop. Using retry in contrast to increasing the sleep_time keeps the delay short in case the bots did already stop.
Hi,
working more with intelmq these days, I noticed that when executing `intelmqctl stop`, sometimes some bots are still reported as running afterwards in the output (not that big of an issue) and the exit code is != 0 (bigger issue, since my wrapper script, which uses systemd for restarting and, most importantly, for starting when the server boots, reacts to this).

I noticed that when running `intelmqctl status` after the `intelmqctl stop`, the bots actually are reported as stopped. Looking deeper into the code responsible for stopping the bots, I noticed intelmq(ctl) uses the following procedure for stopping the whole botnet:

- intelmq/intelmq/bin/intelmqctl.py, lines 563 to 564 in aadc887
- send a `SIGTERM` signal (intelmq/intelmq/lib/processmanager.py, lines 197 to 199 in aadc887)
- sleep `0.75` seconds (intelmq/intelmq/bin/intelmqctl.py, line 567 in aadc887)
- check on the bots (intelmq/intelmq/bin/intelmqctl.py, lines 568 to 571 in aadc887)

So to me it looks like on our server it takes too long until all the bots are finally stopped (when executing `intelmqctl status`, the bots are stopped after all). In our case we're speaking about 16 bots on a server with 4 GiB RAM and 2 cores (not that impressive specs, but so far we're not dealing with massive amounts of data and half of the bots are really just for testing purposes).

With this in mind, does my analysis make sense to you (as people knowing intelmq much better than I do)?

So far my approach would be simply increasing the time `intelmqctl stop` sleeps until checking on the bots (not generally, but adding this as a parameter to the CLI). Am I missing a simpler solution here?