Skip to content

H2 selects less selective index in query plan #4281

@ashleymercer

Description

@ashleymercer

Suppose I have a table which has foreign keys to two different parent tables:

    +----------+        +----------+
    | PARENT_A |        | PARENT_B |
	+----------+        +----------+
	| ID (int) |        | ID (int) |
	+----------+        +----------+
	     ^                    ^
		 |   +-----------+    |
		 |   |   CHILD   |    |
		 |   +-----------+    |
		 \---| A_ID (int)|    |
		     | B_ID (int)|----/
			 +-----------+

and that we also have separate indexes on each of the child columns A_ID and B_ID (either because we created them manually, or because H2 automatically created them as part of the foreign key).

If I query for all rows where A_ID = x, or if I query where B_ID = y, then H2 uses the appropriate index in each case. If instead I filter where A_ID = x and B_ID = y then H2 picks one of the two indexes to satisfy one of the two conditions, then filters within the remaining rows to satisfy the other.

Which of the two indexes gets picked matters, especially if there is a large difference in selectivity between the indexes, but I'm seeing cases where H2 picks the "wrong" (less selective) index which can lead to much worse performance.

I've attached a script which replicates this problem: it models a child table which records some kind of per-second samples from attached devices, with foreign keys to one parent table of the devices themselves (small number of devices, but high number of samples per device, so low selectivity); and a second parent table which buckets the samples by hour (large number of hours, but small number of samples per hour, so high selectivity).

Running the script sets up some dummy data. The following queries then show H2 choosing to use sbs_device_id_idx which involves orders of magnitude more reads, when ideally it would prefer sbs_hour_id_idx:

select count(*)
from samples_by_second
where sample_hours_id = 3
;

select count(*)
from samples_by_second
where device_id = 2
;

explain analyze
select sample_hours_id, min(epoch_seconds), max(epoch_seconds)
from samples_by_second

-- toggle this to see the two different query plans
--  use index (sbs_hour_id_idx)

where sample_hours_id = 3
and device_id = 2
group by sample_hours_id
;

h2database_issue_540_wrong_query_plan.sql

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions