Delete unused indices. #919
Conversation
We should determine the conditions to keep an index. Right now, it is specifically related to the live iteration and those around it, but we could also add more predicates, such as how old an iteration is (for example, "whatever happens, keep the indices for the iterations created in the last 2 weeks").
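One way to keep these conditions extensible is a list of composable "keep" predicates, where an iteration's indices survive if any predicate matches. The sketch below is only an illustration of that idea; the `Iteration` class and all field names are assumptions, not the project's actual types.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical shape of an iteration record; these field names are
# assumptions for the sake of the example.
class Iteration:
    def __init__(self, id, created_at, is_live_adjacent):
        self.id = id
        self.created_at = created_at
        self.is_live_adjacent = is_live_adjacent

# Each predicate answers: must this iteration's indices be kept?
def keep_if_live_adjacent(it, now):
    return it.is_live_adjacent

def keep_if_recent(it, now, max_age=timedelta(weeks=2)):
    # "Whatever happens, keep the indices for the iterations
    # created in the last 2 weeks."
    return now - it.created_at <= max_age

KEEP_PREDICATES = [keep_if_live_adjacent, keep_if_recent]

def should_keep_indices(it, now):
    # Indices are only deletion candidates if no predicate protects them.
    return any(pred(it, now) for pred in KEEP_PREDICATES)
```

Adding a new retention rule then only means appending a predicate, without touching the deletion logic itself.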
After discussion, I think this could be improved with some sort of minimal-set algorithm. If we can afford several passes during the job execution, we can find, for all selected iterations, the minimal set of smallest indices required for all iterations to work properly. If we merge that with a guard that skips initial index creation when the required index is already included in a larger one, we reach an ideal situation, short of using statistics for the heuristic.
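For btree indices, "included in a larger one" usually means the required columns are a leading prefix of an existing index's columns. Under that assumption, the minimal set can be computed by dropping every required index that is a prefix of another required index. This is a hedged sketch, not the PR's actual algorithm:

```python
def is_prefix(short, long):
    # A btree index on `long` can serve queries that only need `short`
    # when `short` is a leading prefix of `long`'s columns.
    return len(short) <= len(long) and tuple(long[: len(short)]) == tuple(short)

def minimal_index_set(required):
    # `required`: iterable of column tuples needed across all selected
    # iterations. Keep only indices not covered by a larger required one.
    unique = set(required)
    return {
        idx
        for idx in unique
        if not any(idx != other and is_prefix(idx, other) for other in unique)
    }
```

The same `is_prefix` check could back the creation-time guard: skip creating an index whose columns are already a leading prefix of an existing one.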
I left a few last comments; this looks about good to merge in dry-run mode.
I just realized something is missing (even in the current code): we have a blind spot on aggregates used in screening rule trigger conditions, which we don't take into account anywhere. It probably makes most sense to address this in a separate PR, since it is more of a (possible, rare) bugfix than directly related to this feature, although it also indirectly concerns it (we could fetch the screening configs along with the iterations in the big SQL query).
LGTM!
When creating scenario iterations that perform aggregate queries, the prepare step creates database indices in the background. Those indices are never deleted if the live scenario iteration does not require them anymore, which contributes to database bloat and slower ingestion.
This PR adds a worker that compares the set of required indices to those existing in the database and deletes any index deemed unnecessary for operations.
Since index creation can be a time-consuming operation that can delay a scenario publication, we are somewhat lenient as to what we consider a required index. As it is, we expand the notion of "live iteration" to "live iteration and neighbors", which means the previous iteration and all next iterations.
For example, if a scenario has iterations 1-13, with iteration 10 being live, we will consider iterations 9-13 as live-adjacent and keep their indices.
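The "live iteration and neighbors" selection described above can be sketched as follows; this is an illustration of the stated rule (previous iteration plus all later ones), assuming iterations are identified by sortable ids, which is an assumption for the example:

```python
def live_adjacent(iteration_ids, live_id):
    # Iterations in creation order; keep the one just before the live
    # iteration, the live iteration itself, and everything after it.
    ids = sorted(iteration_ids)
    pos = ids.index(live_id)
    start = max(pos - 1, 0)
    return ids[start:]
```

With iterations 1-13 and iteration 10 live, this yields 9-13, matching the example in the description.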
We should only consider indices that are used for aggregations here, so we explicitly exclude other indices based on their names (which is why it is important, when adding a new type of index, to name it carefully). As of now, we have four types of indices on the client tables:
- `*_pkey`
- `nav_*`
- `uniq_idx_*`
- `idx_*`
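Filtering by name then amounts to excluding the protected families and only treating aggregation indices (`idx_*`) as deletion candidates. A minimal sketch of that check, assuming the four naming patterns listed above are exhaustive:

```python
import fnmatch

# Index-name families that must never be garbage-collected; only
# `idx_*` indices (created for aggregations) are deletion candidates.
PROTECTED_PATTERNS = ["*_pkey", "nav_*", "uniq_idx_*"]

def is_deletion_candidate(index_name):
    if any(fnmatch.fnmatch(index_name, p) for p in PROTECTED_PATTERNS):
        return False
    return index_name.startswith("idx_")
```

This is also why careful naming matters: any new index family would need either a distinctive protected prefix or an explicit entry in the candidate logic.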
An edge case was found when a user re-prepares an old version of a scenario after its indices were deleted: if the job runs after the preparation, it would immediately delete the newly created index, and the old version could not be activated.
A tentative commit was added on top of this tree to bump an iteration's `updated_at` when preparing it, so it can be included in the "live-adjacent" iterations (if an iteration's `updated_at` is more recent than that of the live iteration, it means it could become live).
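The timestamp-based rule boils down to a simple comparison; this sketch only illustrates the described condition, with parameter names chosen for the example:

```python
def could_become_live(iteration_updated_at, live_updated_at):
    # Bumping `updated_at` on (re-)preparation makes a freshly prepared
    # old iteration sort after the live one, so the cleanup job treats
    # it as live-adjacent and keeps its newly created indices.
    return iteration_updated_at > live_updated_at
```

This closes the edge case: re-preparing an old iteration refreshes its `updated_at`, so the deletion job no longer races against its index creation.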