[WIP] improve indexes to query decisions #1010

Draft · Pascal-Delange wants to merge 1 commit into main from pascal/improvement-on-decisions-query

Conversation

@Pascal-Delange (Contributor) commented May 18, 2025

Master plan, pre-implem

Everything in this PR is about the decisions table and its indexes.

  • improve the table model by removing denormalized fields that should never have made it to the main table (data that belongs to the scenario & iteration)
  • review the indexes on the table "from scratch", to end up with a reasonable minimal set that lets the most important query paths run quickly enough
  • document which workflows the indexes on the table are for - where it's not self-explanatory
  • start rewriting the query structure (and indexes) so that we get index-only scans where possible, at least for an initial filtering of rows
  • update some settings on the table: vacuum frequency, toast_tuple_target

Some preliminary comments based on my exploration

  • the table does not have toast_tuple_target set up (only decision_rules does)
    • as a consequence, and because a typical decision row is about 500-1000 bytes (pre-compression, perhaps 20% less post-compression), all data is currently held in the main table (no TOAST at all) => my goal would be for all but the smallest payloads to go to the TOAST table
    • Actually (sadness sadness 😢), toasting is only triggered once a row exceeds TOAST_TUPLE_THRESHOLD, which is about 2kB by default and is a compile-time constant of postgres that we can't change on Cloud SQL. So much for toasting: if we want to offload the payloads, we'll need to move them to a secondary table.
  • surprisingly, vacuum full does not update the visibility map - only regular vacuum does. So you need to run a plain vacuum after a vacuum full - and it takes quite some time to run. (No particular impact on our prod, where we can't do a full vacuum anyway, but interesting still - a quick way to check this is sketched after this list)
  • also, vacuum full only rewrites the table in the same "shape", it does not move things to toast table if the setting has changed :'(
  • the overly complex query structure we have (with 3 levels of subquery) can already be simplified since we removed the option for backwards pagination - even before removing (if we do it) the returning of row numbers.
  • doing a first subquery with the filtering, returning only the id (currently not in the index), and only then joining the main table to read the other columns seems to help, by letting the filtering run as an index-only scan - I would have expected this to be equivalent and handled by the DB under the hood, but the evidence points in the other direction (see the query sketch after this list)
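
For the visibility map point above, a minimal way to observe it (a sketch, assuming direct SQL access to the instance and relying on the planner statistics in pg_class; relallvisible counts the pages marked all-visible):

```sql
-- Right after a VACUUM FULL, relallvisible stays at (or near) 0 even though the
-- table was just rewritten; it only goes back up once a plain VACUUM has rebuilt
-- the visibility map.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'decisions';

VACUUM decisions;  -- plain vacuum: rebuilds the visibility map

SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'decisions';
```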
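
For the last point, a sketch of the two-step query shape (simplified: filters, pagination and column list are illustrative, and it assumes id is added to the relevant index, e.g. as an INCLUDE column, so that the first step can run as an index-only scan):

```sql
-- Step 1: filter and paginate using only columns present in the index,
-- returning just the ids (ideally via an index-only scan).
-- Step 2: join back on the primary key to read the remaining columns.
-- MATERIALIZED prevents Postgres (12+) from inlining the CTE into the outer query.
WITH filtered AS MATERIALIZED (
    SELECT id, created_at
    FROM decisions
    WHERE org_id = $1
      AND created_at < $2          -- keyset pagination cursor
    ORDER BY created_at DESC
    LIMIT 25
)
SELECT d.*
FROM decisions AS d
JOIN filtered AS f ON f.id = d.id
ORDER BY f.created_at DESC;
```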

Actual changes

Revamp indexes

Current indexes on the table

  • id: PK
  • org_id, object_id
  • org_id, pivot_value, created_at
  • org_id, created_at INCLUDE(scenario_id, outcome, trigger_object_type, case_id) => Basic search
  • org_id, pivot_value, case_id. WHERE (case_id and pivot_value not null) => For "add to case" workflow. limited size because filtered on case_id not null
  • scenario_iteration_id => Duplicate of another one below? Created because of the FK? But I'm removing this FK anyway
  • pivot_id => for cascade delete of FK only? But I'm removing this FK
  • case_id, org_id WHERE case_id not null => for listing in the case page. Limited size because filtered on case_id not null
  • org_id, created_at INCLUDE(scenario_id, outcome, trigger_object, case_id, review_status) WHERE case_id not null => duplicate of the base search index without the case filter, because postgres was selecting indexes strangely if passing "case_id is not null".
  • scheduled_execution_id, created_at WHERE scheduled_execution_id IS NOT NULL => For pagination by scheduled_exec_id (but missing org_id in index)

In total, 9 indexes + PK, and 3 of them are filtered on decisions with a case, 1 filtered on decisions with a scheduled_exec_id

New set of indexes

  • id PK

  For specific workflows:
  • org_id, pivot_value, case_id WHERE (case_id and pivot_value not null) => For the "add to case" workflow. Limited in size because filtered. Must be in this order.

  Very specific indexes; it makes sense to have them as we « hope » they get picked automatically whenever one of the object_id / pivot_value filters is passed (which are expected to be among the main filters):
  • org_id, pivot_value, created_at WHERE pivot_value IS NOT NULL (which should eventually be most decisions, so the filter may not reduce the size by much… but we might as well keep it, since we cannot pass « null » as a filter value when this index is used)
  • org_id, object_id, created_at => also used in a workflow for transfercheck - but useful for search anyway

  Strong filters, limited index size (in most setups):
  • scheduled_execution_id, org_id, created_at WHERE scheduled_execution_id IS NOT NULL
  • org_id, case_id, created_at WHERE case_id not null

  Otherwise, default to this:
  • org_id, created_at INCLUDE (scenario_id (or not?)) => "include" columns to refine
  • org_id, created_at INCLUDE (scenario_id (or not?)) WHERE case_id is not null => "include" columns to refine, double-check if we can do without it
  • org_id, outcome, created_at INCLUDE (case_id, review_status, scenario_id (or not?)) => "include" columns to refine

=> 8 indexes + PK, of which 3 are filtered on decisions with cases, 1 on batch decisions, and 1 on decisions that have a non-null pivot value (so 4 or 5 « large » ones in total).
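
As a rough translation into DDL (a sketch only: index names are made up, the INCLUDE lists are still to be refined per the notes above, and the ASC/DESC choice on created_at is not settled):

```sql
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

-- "Add to case" workflow
CREATE INDEX CONCURRENTLY idx_decisions_org_pivot_case
    ON decisions (org_id, pivot_value, case_id)
    WHERE case_id IS NOT NULL AND pivot_value IS NOT NULL;

-- Strong filters on pivot_value / object_id
CREATE INDEX CONCURRENTLY idx_decisions_org_pivot_created
    ON decisions (org_id, pivot_value, created_at)
    WHERE pivot_value IS NOT NULL;
CREATE INDEX CONCURRENTLY idx_decisions_org_object_created
    ON decisions (org_id, object_id, created_at);

-- Strong filters, limited index size
CREATE INDEX CONCURRENTLY idx_decisions_sched_exec
    ON decisions (scheduled_execution_id, org_id, created_at)
    WHERE scheduled_execution_id IS NOT NULL;
CREATE INDEX CONCURRENTLY idx_decisions_org_case_created
    ON decisions (org_id, case_id, created_at)
    WHERE case_id IS NOT NULL;

-- Default search paths
CREATE INDEX CONCURRENTLY idx_decisions_org_created
    ON decisions (org_id, created_at) INCLUDE (scenario_id);
CREATE INDEX CONCURRENTLY idx_decisions_org_created_with_case
    ON decisions (org_id, created_at) INCLUDE (scenario_id)
    WHERE case_id IS NOT NULL;
CREATE INDEX CONCURRENTLY idx_decisions_org_outcome_created
    ON decisions (org_id, outcome, created_at)
    INCLUDE (case_id, review_status, scenario_id);
```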

Schema changes

  • remove FKs on:
    • scenario_iteration_id => saves an index, faster writes
    • pivot_id => saves an index, faster writes
    • org_id, or just remove the "on delete cascade"? (not decided yet - does not save an index either way)
  • remove the columns scenario_name, scenario_description, scenario_version (re-normalize the schema; read them at query time with a join instead)
  • change review_status varchar(10) => text
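
A sketch of what the migration could look like (constraint names are assumptions and would have to match what actually exists in the target database; the corresponding single-column indexes are handled by the index revamp above):

```sql
-- Drop the FKs whose only purpose was referential integrity / cascade deletes.
ALTER TABLE decisions DROP CONSTRAINT IF EXISTS decisions_scenario_iteration_id_fkey;
ALTER TABLE decisions DROP CONSTRAINT IF EXISTS decisions_pivot_id_fkey;

-- Re-normalize: scenario metadata is read with a join at query time instead.
ALTER TABLE decisions
    DROP COLUMN scenario_name,
    DROP COLUMN scenario_description,
    DROP COLUMN scenario_version;

-- varchar(10) -> text is binary-compatible, so this does not rewrite the table.
ALTER TABLE decisions ALTER COLUMN review_status TYPE text;
```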

Config changes

  • set toast_tuple_target on the table, value TBD but probably in the 256-ish range => dropped: not relevant, since we can't update the TOAST_TUPLE_THRESHOLD param
  • make the table be vacuumed more aggressively. Probably (it will be done in another PR, on terraform, anyway) by changing autovacuum_vacuum_insert_scale_factor to something like 0.02 and autovacuum_analyze_scale_factor to 0.01 (1/10 of the default values)
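
If this ends up being set per table rather than through instance flags in terraform, it would look something like this (a sketch, using the values discussed above):

```sql
ALTER TABLE decisions SET (
    autovacuum_vacuum_insert_scale_factor = 0.02,  -- default 0.2
    autovacuum_analyze_scale_factor       = 0.01   -- default 0.1
);
```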

@Pascal-Delange force-pushed the pascal/improvement-on-decisions-query branch from c8c2d78 to 9004d8c on May 19, 2025 09:13

@apognu (Contributor) commented May 20, 2025

> also, vacuum full only rewrites the table in the same "shape", it does not move things to toast table if the setting has changed :'(

As a note, vacuum full would also actually drop the dropped columns, instead of simply marking them as invisible, which would help with row packing and therefore loads. We still cannot do it on prod, though. :)

@Pascal-Delange (Contributor, Author)

> As a note, vacuum full would also actually drop the dropped columns, instead of simply marking them as invisible, which would help with row packing and therefore loads. We still cannot do it on prod, though. :)

TBH I think that some day in the future, perhaps, if still relevant, we'll do a thing where we create a temporary copy of the decisions table, vacuum full the main one, then stitch them together again. Perhaps.
