Skip to content

JOINing with OR condition results in a super slow query #3813

@cmaion

Description

@cmaion

Bug Description:

Hi,

I'm joining a vector table (t2, primary search source) with another one (t1) that contains attributes needed to filter the result set.

In some cases I need to use a OR condition on the right-side table. This however generates a super slow query, much slower than running the query twice with either side of the OR condition.

Looks like an (unneeded?) full table scan is performed in that case?

> DESCRIBE t1;
+-------------------+-----------+------------+
| Field             | Type      | Properties |
+-------------------+-----------+------------+
| id                | bigint    |            |
[...]
| site_id           | mva64     |            |
| parent_id         | mva64     |            |
| flag              | bool      |            |
+-------------------+-----------+------------+
19 rows in set (0.00 sec)

> DESCRIBE t2;
+------------+--------------+------------+
| Field      | Type         | Properties |
+------------+--------------+------------+
| id         | bigint       |            |
| account_id | bigint       |            |
| t1_id      | bigint       |            |
[...]
| embedding  | float_vector | knn        |
+------------+--------------+------------+
5 rows in set (0.00 sec)

This is the result of running the needed query, with the OR condition: (t1.flag=1 OR ANY(parent_id) IN (67890))

> SELECT t1_id, knn_dist() AS knn_dist, t1.site_id AS site_id, t1.parent_id AS parent_id
FROM t2
INNER JOIN t1 ON t1.id=t2.t1_id
WHERE KNN(embedding, 4000, (...vector...), {ef=2000, oversampling=3.0, rescore=1})
AND knn_dist<=0.35 
AND account_id=3
AND ANY(site_id)=12345
AND (t1.flag=1 OR ANY(parent_id) IN (67890))
LIMIT 2000 OPTION max_matches=2000;
show meta;
[...]
2000 rows in set (0.27 sec)
--- 2000 out of 5049 results in 269ms ---

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 2000  |
| total_found    | 5049  |
| total_relation | eq    |
| time           | 0.269 |
+----------------+-------+

Result returned in 269ms.
Now lets run the same query but with the right side of the OR condition removed: (t1.flag=1)

> SELECT t1_id, knn_dist() AS knn_dist, t1.site_id AS site_id, t1.parent_id AS parent_id
FROM t2
INNER JOIN t1 ON t1.id=t2.t1_id
WHERE KNN(embedding, 4000, (...vector...), {ef=2000, oversampling=3.0, rescore=1})
AND knn_dist<=0.35
AND account_id=3
AND ANY(site_id)=12345
AND t1.flag=1
LIMIT 2000 OPTION max_matches=2000;
show meta;
Empty set (0.01 sec)
--- 0 out of 0 results in 10ms ---

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 0     |
| total_found    | 0     |
| total_relation | eq    |
| time           | 0.010 |
+----------------+-------+
4 rows in set (0.00 sec)

This is super fast.
Now with the left side of the OR condition removed: (ANY(parent_id) IN (67890))

> SELECT t1_id, knn_dist() AS knn_dist, t1.site_id AS site_id, t1.parent_id AS parent_id 
FROM t2
INNER JOIN t1 ON t1.id=t2.t1_id
WHERE KNN(embedding, 4000, (...vector...), {ef=2000, oversampling=3.0, rescore=1})
AND knn_dist<=0.35
AND account_id=3
AND ANY(site_id)=56391
AND ANY(parent_id) IN (3677791)
LIMIT 2000 OPTION comment='console',max_matches=2000;
show meta;
[...]
2000 rows in set (0.02 sec)
--- 2000 out of 5049 results in 19ms ---

+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| total          | 2000  |
| total_found    | 5049  |
| total_relation | eq    |
| time           | 0.019 |
+----------------+-------+
4 rows in set (0.00 sec)

Still very fast, <20ms!

Increasing join_batch_size does improve the response time of the complete query (eg, using join_batch_size=4000 returns in 150ms instead of 260ms with the default join_batch_size=1000), but this is still an order of magnitude greater.

Switching the order of the OR condition doesn't change anything.

Secondary indices are disabled (secondary_indexes = 0).

Manticore Search Version:

13.13.0 e5465fe44@25100704 (columnar 8.1.0 e1522a2@25100213) (secondary 8.1.0 e1522a2@25100213) (knn 8.1.0 e1522a2@25100213) (embeddings 1.0.1)

Operating System Version:

Ubuntu 24.04

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions