Commit dda7fb1

refactor(optimizer): refactor the optimizer tests (#18093)

* optimizer(test): refine the optimizer test
* optimizer: refine the optimizer test to cases/statistics/results
* refine the files list
* refactor: make the code more clean
1 parent 1b4d6b6 commit dda7fb1

57 files changed: +869 / -566 lines
Lines changed: 78 additions & 13 deletions
@@ -1,16 +1,31 @@
-# TPC-DS Optimizer Test Data
+# Optimizer Test Data

-This directory contains test data for TPC-DS optimizer tests. The tests are structured as follows:
+This directory contains test data for query optimizer tests. The tests are structured as follows:

 ## Directory Structure

 ```
 data
-├── tables/ # SQL table definitions
-├── statistics/ # SQL table definitions
-└── cases/ # YAML test case definitions and golden files
+├── tables/ # SQL table definitions
+│   ├── basic/ # Basic table definitions
+│   ├── tpcds/ # TPC-DS table definitions
+│   └── obfuscated/ # Obfuscated table definitions
+├── statistics/ # Statistics files
+│   ├── basic/ # Basic statistics
+│   ├── tpcds/ # TPC-DS statistics
+│   └── obfuscated/ # Obfuscated statistics
+├── cases/ # YAML test case definitions
+│   ├── basic/ # Basic test cases
+│   ├── tpcds/ # TPC-DS test cases
+│   └── obfuscated/ # Obfuscated test cases
+└── results/ # Test result files (generated)
+    ├── basic/ # Results for basic test cases
+    ├── tpcds/ # Results for TPC-DS test cases
+    └── obfuscated/ # Results for obfuscated test cases
 ```

+The test framework supports hierarchical subdirectory structures for better organization of test cases, tables, and results.
+
 ## YAML Test Case Format

 Each test case is defined in a YAML file with the following structure:
@@ -22,7 +37,12 @@ description: "Test description" # Optional description
 sql: | # SQL query to test
   SELECT ...

-table_statistics: # Table statistics
+auto_statistics: false # Whether to use CollectStatisticsOptimizer (default: false)
+
+statistics_file: "tpcds_100g" # Optional: reference external statistics file
+                              # (from statistics/ directory, extension optional)
+
+table_statistics: # Inline table statistics (can be combined with statistics_file)
   table_name:
     num_rows: 1000
     data_size: 102400
@@ -31,7 +51,7 @@ table_statistics: # Table statistics
     number_of_blocks: 10
     number_of_segments: 2

-column_statistics: # Column statistics
+column_statistics: # Inline column statistics (can be combined with statistics_file)
   table_name.column_name:
     min: 1990 # Min value (can be number or string)
     max: 2000 # Max value (can be number or string)
@@ -42,21 +62,66 @@ good_plan: | # Optional expected good plan
   ...
 ```

+## External Statistics Files
+
+Statistics can be defined in separate YAML files in the `statistics/` directory:
+
+```yaml
+# statistics/tpcds/tpcds_100g.yaml
+table_statistics:
+  catalog_sales:
+    num_rows: 143997065
+    data_size: 12959733850
+    # ... other stats
+
+column_statistics:
+  catalog_sales.cs_sold_date_sk:
+    min: 2450815
+    max: 2452921
+    ndv: 1823
+    null_count: 0
+  # ... other columns
+```
+
+Test cases can reference these files using the `statistics_file` field. The framework will automatically search for matching files (with or without numeric prefixes like `01_tpcds_100g.yaml`).
+
 ## Table Definitions

-Table definitions are stored in SQL files in the `tables` directory. Each file contains a `CREATE TABLE` statement for a specific table used in the tests.
+Table definitions are stored in SQL files in the `tables/` directory. Each file contains `CREATE TABLE` statements. The framework will execute these SQL statements to set up the test environment. If a table already exists, the error will be ignored.
+
+## Running Tests
+
+### Run All Tests
+```bash
+cargo test --package databend-query --test it -- sql::planner::optimizer::optimizer_test::test_optimizer --exact --nocapture
+```
+
+### Run Tests from Specific Subdirectory
+```bash
+TEST_SUBDIR=tpcds cargo test --package databend-query --test it -- sql::planner::optimizer::optimizer_test::test_optimizer --exact --nocapture
+```
+
+## Generated Result Files
+
+Each test case generates three result files in the corresponding subdirectory under `results/`:
+- `{test_name}_raw.txt` - The raw plan before optimization
+- `{test_name}_optimized.txt` - The optimized logical plan
+- `{test_name}_physical.txt` - The physical execution plan

 ## Adding New Tests

 To add a new test case:

-1. Create a new YAML file in the `yaml` directory with the test case definition.
-2. If the test uses new tables, add the table definitions to the `tables` directory.
-3. The test runner will automatically discover and run all test cases in the `yaml` directory.
+1. Create a new YAML file in the appropriate subdirectory under `cases/` (e.g., `basic/`, `tpcds/`, or `obfuscated/`).
+2. If the test uses new tables, add the table definitions to the corresponding subdirectory under `tables/`.
+3. If needed, add statistics files to the corresponding subdirectory under `statistics/`.
+4. The test runner will automatically discover and run all test cases recursively in all subdirectories.
+5. Test results will be saved in a matching subdirectory structure under the main `results/` directory.

 ## Updating Existing Tests

 If the expected output of a test changes (e.g., due to optimizer improvements):

-1. Run the test with UPDATE_GOLDENFILES to generate new file.
-2. Checking that changes to files are as expected.
+1. Run the test with `UPDATE_GOLDENFILES` environment variable to generate new result files.
+2. The new result files will be automatically saved in the correct subdirectory structure under the main `results/` directory.
+3. Review the changes to ensure they are as expected.

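The README changes above describe three behaviors of the reorganized runner: cases are discovered recursively under `cases/`, `TEST_SUBDIR` narrows a run to one subdirectory, and each case writes `_raw`, `_optimized`, and `_physical` result files into the mirrored subdirectory of `results/`. The Rust sketch below shows one way to express that layout using only `std::fs`. `discover_cases` and `result_files` are hypothetical names, not the functions in `optimizer_test`, and this commit does not show how the real harness walks directories or reads the environment variable.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collect every `*.yaml` case under
/// `cases_root`, optionally limited to one subdirectory (the role the
/// README gives to `TEST_SUBDIR`). Paths are sorted so runs are deterministic.
pub fn discover_cases(cases_root: &Path, test_subdir: Option<&str>) -> io::Result<Vec<PathBuf>> {
    let start = match test_subdir {
        Some(sub) => cases_root.join(sub),
        None => cases_root.to_path_buf(),
    };
    let mut found = Vec::new();
    collect_yaml(&start, &mut found)?;
    found.sort();
    Ok(found)
}

/// Depth-first walk that keeps only files with a `.yaml` extension.
fn collect_yaml(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_yaml(&path, out)?;
        } else if path.extension().and_then(|e| e.to_str()) == Some("yaml") {
            out.push(path);
        }
    }
    Ok(())
}

/// Hypothetical helper: map a case file to its three result files, mirroring
/// the case's subdirectory under `results_root`
/// (e.g. cases/tpcds/Q01.yaml -> results/tpcds/Q01_raw.txt).
pub fn result_files(cases_root: &Path, results_root: &Path, case: &Path, test_name: &str) -> Vec<PathBuf> {
    // Relative directory of the case inside cases/, empty for top-level cases.
    let rel_dir = case
        .parent()
        .and_then(|p| p.strip_prefix(cases_root).ok())
        .unwrap_or(Path::new(""));
    ["raw", "optimized", "physical"]
        .iter()
        .map(|kind| results_root.join(rel_dir).join(format!("{test_name}_{kind}.txt")))
        .collect()
}
```

A caller could pass `std::env::var("TEST_SUBDIR").ok().as_deref()` as `test_subdir`, so `TEST_SUBDIR=tpcds` would collect only files under `cases/tpcds/` such as `Q01.yaml` and `Q03.yaml`, and `Q01` would map to `results/tpcds/Q01_raw.txt` plus its two siblings. Sorting the paths keeps the run order stable regardless of the order in which the filesystem returns directory entries.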
src/query/service/tests/it/sql/planner/optimizer/data/cases/Q98.yaml

Lines changed: 0 additions & 23 deletions
This file was deleted.

src/query/service/tests/it/sql/planner/optimizer/data/cases/Q99.yaml

Lines changed: 0 additions & 23 deletions
This file was deleted.
@@ -1,10 +1,9 @@
-name: "Q04"
-description: "Cross test"
+name: "01_cross_join_aggregation"
+description: "Test for cross join with aggregation functions"

 sql: |
   SELECT SUM(i1.i), MIN(i1.i), MAX(i2.i) FROM integers i1, integers i2;

-
 auto_statistics: true

 good_plan: |
@@ -19,12 +18,3 @@ good_plan: |
 │ └── columns: [D_DATE_SK, D_YEAR, D_MOY]
 └── Aggregate [group by: ITEM.I_BRAND_ID, ITEM.I_BRAND, STORE_SALES.SS_SOLD_DATE_SK]
 └── InnerJoin [join key: (ITEM.I_ITEM_SK = STORE_SALES.SS_ITEM_SK)]
-├── Aggregate [group by: ITEM.I_ITEM_SK, ITEM.I_BRAND_ID, ITEM.I_BRAND]
-│ └── Filter [condition: ITEM.I_MANUFACT_ID = 128]
-│ └── TableScan (ITEM) [partitions: 2/2, bytes: 23,811,584]
-│ └── columns: [I_ITEM_SK, I_BRAND_ID, I_BRAND, I_MANUFACT_ID]
-└── Aggregate [group by: STORE_SALES.SS_SOLD_DATE_SK, STORE_SALES.SS_ITEM_SK]
-└── Filter [condition: STORE_SALES.SS_SOLD_DATE_SK IS NOT NULL]
-└── JoinFilter [join key: (DT.D_DATE_SK = STORE_SALES.SS_SOLD_DATE_SK)]
-└── TableScan (STORE_SALES) [partitions: 70,412/72,718, bytes: 1,212,628,258,304]
-└── columns: [SS_SOLD_DATE_SK, SS_ITEM_SK, SS_EXT_SALES_PRICE]
@@ -0,0 +1,8 @@
+name: "01_multi_join_avg_case_expression"
+description: "Test for multiple left outer joins with AVG and CASE expression"
+
+sql: |
+  SELECT t.sell_mnt = 0 FROM (SELECT a.a0d, a.a0k, a.a0m, c.a5m, avg(CASE WHEN d.a1v = '603020' THEN 1 ELSE 0 END) + 3 AS sell_mnt FROM a0c AS a LEFT OUTER JOIN a1z AS b ON a.a0k = b.a0k AND a.a0n = b.a0n AND b.a2c <= a.a0d AND b.a2k > a.a0d LEFT OUTER JOIN a2x AS c ON a.a0m = c.a0m LEFT OUTER JOIN a5r AS d ON a.a0l = d.a5t WHERE a.a0d BETWEEN '20240526' AND '20250525' AND b.a2t = '624100' AND SUBSTRING(c.a4m FROM 20 FOR 1) = '1' AND SUBSTRING(d.a5w FROM 1 FOR 1) = '1' GROUP BY a.a0d, a.a0k, a.a0m, c.a5m) AS t;
+
+# Reference to external statistics file
+statistics_file: obfuscated/01_multi_join_case_expression_stats.yaml
@@ -0,0 +1,8 @@
+name: "01_multi_join_sum_case_expression"
+description: "Test for multiple left outer joins with SUM and CASE expression"
+
+sql: |
+  SELECT t.sell_mnt = 0 FROM (SELECT a.a0d, a.a0k, a.a0m, c.a5m, sum(CASE WHEN d.a1v = '603020' THEN 1 ELSE 0 END) AS sell_mnt FROM a0c AS a LEFT OUTER JOIN a1z AS b ON a.a0k = b.a0k AND a.a0n = b.a0n AND b.a2c <= a.a0d AND b.a2k > a.a0d LEFT OUTER JOIN a2x AS c ON a.a0m = c.a0m LEFT OUTER JOIN a5r AS d ON a.a0l = d.a5t WHERE a.a0d BETWEEN '20240526' AND '20250525' AND b.a2t = '624100' AND SUBSTRING(c.a4m FROM 20 FOR 1) = '1' AND SUBSTRING(d.a5w FROM 1 FOR 1) = '1' GROUP BY a.a0d, a.a0k, a.a0m, c.a5m) AS t;
+
+# Reference to external statistics file
+statistics_file: obfuscated/01_multi_join_case_expression_stats.yaml

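Both obfuscated cases query the tables `a0c`, `a1z`, `a2x`, and `a5r`, whose definitions the README places in the matching subdirectory under `tables/`. According to the README, the framework executes those `CREATE TABLE` statements and ignores the error when a table already exists. The following is a minimal Rust sketch of that tolerance. `run_table_setup` is a hypothetical function, and its `execute` callback stands in for the harness's real query API. Splitting on `;` and matching the text "already exists" are both assumptions, flagged in the comments.

```rust
/// Hypothetical setup step: run every statement from one tables/**/*.sql
/// file through `execute`, skipping errors that report an existing table.
/// Returns how many statements succeeded.
pub fn run_table_setup<F>(sql: &str, mut execute: F) -> Result<usize, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut executed = 0;
    // ASSUMPTION: statements are separated by `;` and no `;` appears inside literals.
    for stmt in sql.split(';').map(str::trim).filter(|s| !s.is_empty()) {
        match execute(stmt) {
            Ok(()) => executed += 1,
            // ASSUMPTION: an existing table is reported with "already exists" in the message.
            Err(e) if e.contains("already exists") => {}
            Err(e) => return Err(format!("setup statement failed: {stmt}: {e}")),
        }
    }
    Ok(executed)
}
```

Passing the executor in as a closure keeps the sketch independent of the query engine's session types. A real harness would also need a proper statement splitter if any definition contains `;` inside a string literal.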
src/query/service/tests/it/sql/planner/optimizer/data/cases/Q01.yaml renamed to src/query/service/tests/it/sql/planner/optimizer/data/cases/tpcds/Q01.yaml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ sql: |
   LIMIT 100

 # Reference to external statistics file
-statistics_file: statistics.yaml
+statistics_file: tpcds/tpcds_100g.yaml

 # Converted from tabular format to tree format based on parent-child relationships
 good_plan: |

src/query/service/tests/it/sql/planner/optimizer/data/cases/Q03.yaml renamed to src/query/service/tests/it/sql/planner/optimizer/data/cases/tpcds/Q03.yaml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ sql: |
   LIMIT 100

 # Reference to external statistics file
-statistics_file: statistics.yaml
+statistics_file: tpcds/tpcds_100g.yaml

 good_plan: |
   Result [output: DT.D_YEAR, ITEM.I_BRAND_ID, ITEM.I_BRAND, SUM(...)]

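These renames switch `statistics_file` from a flat `statistics.yaml` to the subdirectory path `tpcds/tpcds_100g.yaml`, and the obfuscated cases use `obfuscated/01_multi_join_case_expression_stats.yaml`. The README states that the extension is optional and that matching files are found with or without numeric prefixes such as `01_tpcds_100g.yaml`. The Rust sketch below implements one lookup order consistent with those rules. It first tries the exact path under `statistics/`, adding `.yaml` when no extension is given. It then searches the referenced subtree recursively while ignoring a leading `NN_` prefix. `resolve_statistics_file` is a hypothetical name, and the real framework's search order may differ.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Strip a leading numeric prefix such as `01_` (so `01_tpcds_100g.yaml`
/// becomes `tpcds_100g.yaml`); other names are returned unchanged.
fn strip_numeric_prefix(name: &str) -> &str {
    match name.split_once('_') {
        Some((prefix, rest)) if !prefix.is_empty() && prefix.bytes().all(|b| b.is_ascii_digit()) => rest,
        _ => name,
    }
}

/// Recursively collect files under `dir` whose prefix-stripped name equals `target`.
fn find_matching(dir: &Path, target: &str, out: &mut Vec<PathBuf>) {
    let Ok(entries) = fs::read_dir(dir) else { return };
    for path in entries.filter_map(Result::ok).map(|e| e.path()) {
        if path.is_dir() {
            find_matching(&path, target, out);
        } else if path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| strip_numeric_prefix(n) == target)
        {
            out.push(path);
        }
    }
}

/// Hypothetical resolver for a case's `statistics_file` value, relative to statistics/.
/// ASSUMED order: exact path first, then a prefix-insensitive recursive search.
pub fn resolve_statistics_file(statistics_root: &Path, reference: &str) -> Option<PathBuf> {
    let mut wanted = PathBuf::from(reference);
    if wanted.extension().is_none() {
        wanted.set_extension("yaml"); // "extension optional" per the README
    }
    let direct = statistics_root.join(&wanted);
    if direct.is_file() {
        return Some(direct);
    }
    // Search the referenced subdirectory (or all of statistics/ for a bare name).
    let search_dir = statistics_root.join(wanted.parent().unwrap_or(Path::new("")));
    let target = strip_numeric_prefix(wanted.file_name()?.to_str()?).to_string();
    let mut matches = Vec::new();
    find_matching(&search_dir, &target, &mut matches);
    matches.sort(); // arbitrary but deterministic tie-break
    matches.into_iter().next()
}
```

With this order, the README's bare `statistics_file: "tpcds_100g"` would still resolve to `statistics/tpcds/01_tpcds_100g.yaml` if only the prefixed file exists, because the fallback search starts at the root of `statistics/`. When several files match, the lexicographically first path wins. That tie-break is a choice made for this sketch, not documented behavior.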