-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Suppose I have a table which has foreign keys to two different parent tables:
+----------+ +----------+
| PARENT_A | | PARENT_B |
+----------+ +----------+
| ID (int) | | ID (int) |
+----------+ +----------+
^ ^
| +-----------+ |
| | CHILD | |
| +-----------+ |
\---| A_ID (int)| |
| B_ID (int)|----/
+-----------+
and that we also have separate indexes on each of the child columns A_ID
and B_ID
(either because we created them manually, or because H2 automatically created them as part of the foreign key).
If I query for all rows where A_ID = x
, or if I query where B_ID = y
, then H2 uses the appropriate index in each case. If instead I filter where A_ID = x and B_ID = y
then H2 picks one of the two indexes to satisfy one of the two conditions, then filters within the remaining rows to satisfy the other.
Which of the two indexes gets picked matters, especially if there is a large difference in selectivity between the indexes, but I'm seeing cases where H2 picks the "wrong" (less selective) index which can lead to much worse performance.
I've attached a script which replicates this problem: it models a child table which records some kind of per-second samples from attached devices, with foreign keys to one parent table of the devices themselves (small number of devices, but high number of samples per device, so low selectivity); and a second parent table which buckets the samples by hour (large number of hours, but small number of samples per hour, so high selectivity).
Running the script sets up some dummy data. The following queries then show H2 choosing to use sbs_device_id_idx
which involves orders of magnitude more reads, when ideally it would prefer sbs_hour_id_idx
:
select count(*)
from samples_by_second
where sample_hours_id = 3
;
select count(*)
from samples_by_second
where device_id = 2
;
explain analyze
select sample_hours_id, min(epoch_seconds), max(epoch_seconds)
from samples_by_second
-- toggle this to see the two different query plans
-- use index (sbs_hour_id_idx)
where sample_hours_id = 3
and device_id = 2
group by sample_hours_id
;