I wrote a simple test to measure the speed of MERGE JOIN.
RECREATE TABLE BIG_1 (
  ID_1 BIGINT NOT NULL,
  F_1 BIGINT,
  CONSTRAINT PK_BIG_1 PRIMARY KEY (ID_1)
);

RECREATE TABLE BIG_2 (
  ID_2 BIGINT NOT NULL,
  F_2 BIGINT,
  CONSTRAINT PK_BIG_2 PRIMARY KEY (ID_2)
);

SET TERM ^;
EXECUTE BLOCK
AS
  DECLARE I BIGINT;
  DECLARE A BIGINT;
BEGIN
  I = 0;
  WHILE (I < 2000000) DO
  BEGIN
    I = I + 1;
    A = NULL;
    IF (MOD(I, 19) = 0) THEN
      A = I;
    INSERT INTO BIG_1 (ID_1, F_1)
    VALUES (:I, :A);
  END

  I = 0;
  WHILE (I < 1500000) DO
  BEGIN
    I = I + 1;
    A = NULL;
    IF (MOD(I, 13) = 0) THEN
      A = I;
    INSERT INTO BIG_2 (ID_2, F_2)
    VALUES (:I, :A);
  END
END^
SET TERM ;^

COMMIT;
As you can see, most values in the columns the tables are joined on are NULL.
Then I executed a simple query. I didn't wait for the result.
SELECT COUNT(*)
FROM
  BIG_1
  JOIN BIG_2 ON BIG_2.F_2 = BIG_1.F_1
Now we rewrite it in an equivalent form:
SELECT COUNT(*)
FROM
  BIG_1
  JOIN BIG_2 ON BIG_2.F_2 = BIG_1.F_1
WHERE
  BIG_2.F_2 IS NOT NULL
Select Expression
    -> Aggregate
        -> Filter
            -> Hash Join (inner)
                -> Table "BIG_1" Full Scan
                -> Record Buffer (record length: 33)
                    -> Filter
                        -> Table "BIG_2" Full Scan
COUNT
=====================
6072
Current memory = 1806305664
Delta memory = 0
Max memory = 1811848224
Elapsed time = 0.944 sec
Buffers = 204800
Reads = 0
Writes = 0
Fetches = 3596854
This is understandable: in a hash table, all the entries with the value NULL fall into one bucket.
Why does the optimizer add records with NULL values to the hash table? Why isn't the filter applied implicitly? After all, for an inner join of tables using HASH/MERGE JOIN, it is a priori clear that the condition will be false for NULLs.
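To illustrate the effect described above, here is a minimal Python sketch of a hash join (my own simplified model, not Firebird's actual implementation; the `hash_join` function and its parameters are hypothetical). All NULL keys land in one bucket, so probing degenerates toward pairwise comparisons within that bucket, while pre-filtering NULLs, as the manual `IS NOT NULL` rewrite does, avoids the work entirely without changing the result:

```python
# Simplified hash-join model (illustrative only, not Firebird's code).
from collections import defaultdict

def hash_join(left, right, prefilter_nulls=False):
    """Inner equi-join on single keys; returns (match count, comparisons made)."""
    buckets = defaultdict(list)
    for key in right:
        if prefilter_nulls and key is None:
            continue  # the manual "IS NOT NULL" rewrite: drop NULLs up front
        buckets[key].append(key)
    matches = comparisons = 0
    for key in left:
        if prefilter_nulls and key is None:
            continue
        for candidate in buckets.get(key, []):
            comparisons += 1
            # SQL '=' semantics: NULL = NULL is not true, so NULLs never match
            if key is not None and candidate == key:
                matches += 1
    return matches, comparisons

# Keys mimic the test tables at 1/1000 scale: mostly NULL, occasional values.
left = [i if i % 19 == 0 else None for i in range(1, 2001)]
right = [i if i % 13 == 0 else None for i in range(1, 1501)]

m1, c1 = hash_join(left, right)                        # NULLs kept
m2, c2 = hash_join(left, right, prefilter_nulls=True)  # NULLs filtered
assert m1 == m2  # identical result either way
print(m1, c1, c2)  # far more comparisons when NULLs enter the hash table
```

Even at this tiny scale, keeping NULLs costs millions of comparisons that produce no matches, which is exactly why the full-size query never finished.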
I tried it in 4.0 and 5.0; no difference. Although I still don't understand why MERGE JOIN is so slow in this case.
The algorithm does not check whether equality or IS NOT DISTINCT FROM is used as the join condition (both options are possible), and preserves NULLs to handle both cases. I agree that such a check could be added to improve performance.
Still, joins are more often made using equality, and NULL is not such a rare value, so it would be nice to add such a check. Of course, only in 6.0.