Skip to content

Delete unused indices. #919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 16, 2025
Merged

Delete unused indices. #919

merged 5 commits into from
Jul 16, 2025

Conversation

apognu
Copy link
Contributor

@apognu apognu commented Mar 19, 2025

When creating scenario iterations that perform aggregate queries, the prepare step creates database indices in the background. Those indices are never deleted if the live scenario iteration does not require them anymore, which contributes to database bloat and slower ingestion.

This PR adds a worker that compares all required indices to those existing in the database, and deletes them if they are deemed not necessary for operations.

Since index creation can be a time-consuming operation that can delay a scenario publication, we are somewhat lenient as to what we consider a required index. As it is, we expand the notion of "live iteration" to "live iteration and neighbors", which means the previous iteration and all next iterations.

For example, if a scenario has iterations 1-13, with iteration 10 being live, we will consider iterations 9-13 as live-adjacent and keep their indices.

We should only consider indices that are used for aggregations here, so we explicitly exclude other indices based on their names (which is why it is important, when adding a new type of index, to name it carefully). As of now, we have four types of indices on the client tables:

  • Primary keys (*_pkey)
  • Navigation indices (nav_*)
  • Unicity indices (uniq_idx_*)
  • Aggregation indices (idx_)

An edge case was found if a user re-prepares (after its induces were deleted) an old version of a scenario: the job would, if it runs after the preparation, immediately delete the newly-created index, and the old version could not be activated.

A tentative commit was added on top of this tree to bump an iteration updated_at when preparing it so it can be included in the "live-adjacent" iterations (if an iteration's updated_at is more recent than that of the live iteration, it means it could become live).

@apognu apognu added enhancement New feature or request WIP go Pull requests that update Go code do-not-merge labels Mar 19, 2025
@apognu apognu self-assigned this Mar 19, 2025
@apognu apognu force-pushed the feat/index-deletion branch from 922abe2 to ab6c9c0 Compare March 19, 2025 13:29
@apognu apognu force-pushed the feat/index-deletion branch from ab6c9c0 to 455bde4 Compare April 9, 2025 07:29
@apognu apognu force-pushed the feat/index-deletion branch 4 times, most recently from 62f09b1 to 988c2e6 Compare July 1, 2025 09:47
@apognu
Copy link
Contributor Author

apognu commented Jul 1, 2025

We should determine the conditions to keep an index. Right now, it is specifically related to the live iterations and those around it, but we could also add some more predicates, such as how old an iteration is (for example, "whatever happens, keep the indices for the iterations created in the last 2 weeks").

@apognu apognu marked this pull request as ready for review July 1, 2025 12:25
@apognu apognu force-pushed the feat/index-deletion branch from 988c2e6 to a8ff882 Compare July 1, 2025 13:02
@apognu apognu marked this pull request as draft July 3, 2025 13:19
@apognu
Copy link
Contributor Author

apognu commented Jul 3, 2025

After discussion, I think this could be improve with some sort of minimal set algorithm.

If we can afford to do several passes during the job execution, we can find, for all selected iterations, the minimal set of smallest indices required for all iterations to work properly.

If we merge that with a guard that skips initial index creation if the required index is already included in a larger one, we can have an ideal situation, short of using statistics for the heuristic.

@apognu apognu force-pushed the feat/index-deletion branch 4 times, most recently from 51b3aaf to 0a55b8e Compare July 11, 2025 09:32
Copy link
Contributor

@Pascal-Delange Pascal-Delange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few last comments, looks about good to merge in dry run mode.
I just realized something that's missing (even in current code): we have a dead angle on any aggregates used in screening rule trigger conditions that we don't take into account anywhere. Possibly makes most sense to include this in a separate PR since it's more of a (possible, rare) bugfix than directly related to this feature, although it also indirectly concerns it (we could add the fetching of screening configs with the iterations in the big SQL query)

Copy link
Contributor

@Pascal-Delange Pascal-Delange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@apognu apognu force-pushed the feat/index-deletion branch from bf4311a to 4583080 Compare July 16, 2025 12:36
@apognu apognu marked this pull request as ready for review July 16, 2025 12:42
@apognu apognu force-pushed the feat/index-deletion branch from 4583080 to c695afb Compare July 16, 2025 12:42
@apognu apognu merged commit 7dc70f4 into main Jul 16, 2025
4 checks passed
@apognu apognu deleted the feat/index-deletion branch July 16, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge enhancement New feature or request go Pull requests that update Go code WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants