refactor(optimizer): refactor the optimizer tests (#18093)
* optimizer(test): refine the optimizer test
* optimizer: refine the optimizer test to cases/statistics/results
* refine the files list
* refactor: make the code more clean
column_statistics: # Inline column statistics (can be combined with statistics_file)
  table_name.column_name:
    min: 1990 # Min value (can be number or string)
    max: 2000 # Max value (can be number or string)
@@ -42,21 +62,66 @@ good_plan: | # Optional expected good plan
...
```
## External Statistics Files

Statistics can be defined in separate YAML files in the `statistics/` directory:
```yaml
# statistics/tpcds/tpcds_100g.yaml
table_statistics:
  catalog_sales:
    num_rows: 143997065
    data_size: 12959733850
    # ... other stats

column_statistics:
  catalog_sales.cs_sold_date_sk:
    min: 2450815
    max: 2452921
    ndv: 1823
    null_count: 0
  # ... other columns
```
Test cases can reference these files using the `statistics_file` field. The framework will automatically search for matching files (with or without numeric prefixes like `01_tpcds_100g.yaml`).
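The lookup can be sketched as follows. This is a minimal sketch that makes three assumptions. First, `statistics_file` holds the unprefixed file name (for example `tpcds_100g.yaml`). Second, a "numeric prefix" is a run of digits followed by `_`. Third, the candidates are a plain list of file names, whereas the real framework may walk the `statistics/` tree instead. `matches_stats_name` and `find_stats_file` are hypothetical names, not the framework's API.

```rust
/// Hypothetical helper: true if `file_name` equals `wanted`, or equals it once
/// a leading numeric prefix such as `01_` is removed.
fn matches_stats_name(file_name: &str, wanted: &str) -> bool {
    if file_name == wanted {
        return true;
    }
    // Drop leading ASCII digits. Treat them as a prefix only if at least one
    // digit was removed and an underscore follows.
    let rest = file_name.trim_start_matches(|c: char| c.is_ascii_digit());
    rest.len() < file_name.len() && rest.strip_prefix('_') == Some(wanted)
}

/// Hypothetical helper: picks a statistics file from `candidates`,
/// preferring an exact name match over a prefixed one.
fn find_stats_file<'a>(candidates: &[&'a str], wanted: &str) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .find(|name| *name == wanted)
        .or_else(|| {
            candidates
                .iter()
                .copied()
                .find(|name| matches_stats_name(name, wanted))
        })
}
```

Consider `find_stats_file(&["02_tpcds_10g.yaml", "01_tpcds_100g.yaml"], "tpcds_100g.yaml")`, where `tpcds_10g` is a hypothetical name. It returns `Some("01_tpcds_100g.yaml")`. Preferring exact matches keeps the lookup deterministic when both a prefixed and an unprefixed file exist.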
## Table Definitions

Table definitions are stored in SQL files in the `tables/` directory. Each file contains `CREATE TABLE` statements, which the framework executes to set up the test environment. If a table already exists, the resulting error is ignored.
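The setup step above can be sketched as follows. This is a minimal sketch under several assumptions. The statements are already split one per entry. A stand-in closure replaces the real query executor. "Already exists" failures are recognised by their message text. `setup_tables` is a hypothetical name.

```rust
/// Hypothetical setup helper: runs each `CREATE TABLE` statement through
/// `execute` (a stand-in for the real query executor) and skips errors whose
/// message mentions "already exists". Any other error aborts the setup.
fn setup_tables<F>(statements: &[&str], mut execute: F) -> Result<(), String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    for &sql in statements {
        match execute(sql) {
            Ok(()) => {}
            // Assumption: "table exists" errors are recognised by message text.
            // A real implementation would more likely match an error code.
            Err(e) if e.contains("already exists") => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

Ignoring only this one class of error keeps setup idempotent across test runs. Genuine mistakes in a table definition still surface.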
## Running Tests

### Run All Tests

```bash
cargo test --package databend-query --test it -- sql::planner::optimizer::optimizer_test::test_optimizer --exact --nocapture
```

### Run Tests from a Specific Subdirectory

```bash
TEST_SUBDIR=tpcds cargo test --package databend-query --test it -- sql::planner::optimizer::optimizer_test::test_optimizer --exact --nocapture
```
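The effect of `TEST_SUBDIR` can be sketched as follows. This minimal sketch assumes the variable is read as a path relative to the `cases/` root and that an unset or empty value means every case is scanned. Both assumptions are illustrative, and `cases_dir` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: chooses the directory to scan for test cases.
/// Assumes `TEST_SUBDIR` is interpreted relative to the `cases/` root and that
/// an unset or empty value means "scan everything".
fn cases_dir(cases_root: &Path, test_subdir: Option<&str>) -> PathBuf {
    match test_subdir {
        Some(sub) if !sub.is_empty() => cases_root.join(sub),
        _ => cases_root.to_path_buf(),
    }
}
```

A caller would pass `std::env::var("TEST_SUBDIR").ok().as_deref()` as the second argument.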
## Generated Result Files

Each test case generates three result files in the corresponding subdirectory under `results/`:

- `{test_name}_raw.txt` - The raw plan before optimization
- `{test_name}_optimized.txt` - The optimized logical plan
- `{test_name}_physical.txt` - The physical execution plan
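The naming scheme in the list above can be sketched as follows. This minimal sketch derives the three paths for one case. `result_files` is a hypothetical helper, and `q01` in the usage note is a made-up test name.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds the raw, optimized and physical result paths
/// for one test case under `results_root/subdir`.
fn result_files(results_root: &Path, subdir: &Path, test_name: &str) -> [PathBuf; 3] {
    let dir = results_root.join(subdir);
    ["raw", "optimized", "physical"].map(|kind| dir.join(format!("{test_name}_{kind}.txt")))
}
```

For example, `result_files(Path::new("results"), Path::new("tpcds"), "q01")` yields three paths: `results/tpcds/q01_raw.txt`, `results/tpcds/q01_optimized.txt` and `results/tpcds/q01_physical.txt`.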
## Adding New Tests

To add a new test case:

1. Create a new YAML file in the appropriate subdirectory under `cases/` (e.g., `basic/`, `tpcds/`, or `obfuscated/`).
2. If the test uses new tables, add the table definitions to the corresponding subdirectory under `tables/`.
3. If needed, add statistics files to the corresponding subdirectory under `statistics/`.
4. The test runner automatically discovers and runs all test cases, recursing into every subdirectory.
5. Test results are saved in a matching subdirectory structure under the main `results/` directory.
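Recursive discovery and the mirrored results layout can be sketched as follows. This minimal sketch uses only the standard library. It assumes that test cases are exactly the files with a `.yaml` extension. It also assumes each case's results directory is its path relative to `cases/`, minus the file name, joined onto `results/`. Both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collects `.yaml` files under `dir`,
/// sorted so test order is stable across platforms.
fn discover_cases(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            found.extend(discover_cases(&path)?);
        } else if path.extension().map_or(false, |ext| ext == "yaml") {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}

/// Hypothetical helper: maps `cases/<sub>/<name>.yaml` to `results/<sub>`,
/// mirroring the case layout under the results root.
fn results_dir_for(case: &Path, cases_root: &Path, results_root: &Path) -> Option<PathBuf> {
    let rel = case.strip_prefix(cases_root).ok()?;
    Some(results_root.join(rel.parent()?))
}
```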
## Updating Existing Tests

If the expected output of a test changes (e.g., due to optimizer improvements):

1. Run the test with the `UPDATE_GOLDENFILES` environment variable set to regenerate the result files.
2. The new result files are saved automatically in the matching subdirectory structure under the main `results/` directory.
3. Review the changes to ensure they are as expected.
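The compare-or-regenerate behaviour behind `UPDATE_GOLDENFILES` can be sketched as follows. This is a minimal sketch, not the framework's implementation, which may delegate to a golden-file library. Here the caller decides `update`, for example from `std::env::var_os("UPDATE_GOLDENFILES").is_some()`. `check_golden` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: in update mode, writes `actual` to the golden file
/// (creating parent directories) and reports success. Otherwise it reads the
/// existing golden file and reports whether it matches `actual`.
fn check_golden(path: &Path, actual: &str, update: bool) -> io::Result<bool> {
    if update {
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(path, actual)?;
        return Ok(true);
    }
    let expected = fs::read_to_string(path)?;
    Ok(expected == actual)
}
```

Update mode always rewrites the file, so step 3 (reviewing the diff) is what catches unintended plan changes.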
description: "Test for multiple left outer joins with AVG and CASE expression"

sql: |
  SELECT t.sell_mnt = 0 FROM (SELECT a.a0d, a.a0k, a.a0m, c.a5m, avg(CASE WHEN d.a1v = '603020' THEN 1 ELSE 0 END) + 3 AS sell_mnt FROM a0c AS a LEFT OUTER JOIN a1z AS b ON a.a0k = b.a0k AND a.a0n = b.a0n AND b.a2c <= a.a0d AND b.a2k > a.a0d LEFT OUTER JOIN a2x AS c ON a.a0m = c.a0m LEFT OUTER JOIN a5r AS d ON a.a0l = d.a5t WHERE a.a0d BETWEEN '20240526' AND '20250525' AND b.a2t = '624100' AND SUBSTRING(c.a4m FROM 20 FOR 1) = '1' AND SUBSTRING(d.a5w FROM 1 FOR 1) = '1' GROUP BY a.a0d, a.a0k, a.a0m, c.a5m) AS t;

description: "Test for multiple left outer joins with SUM and CASE expression"

sql: |
  SELECT t.sell_mnt = 0 FROM (SELECT a.a0d, a.a0k, a.a0m, c.a5m, sum(CASE WHEN d.a1v = '603020' THEN 1 ELSE 0 END) AS sell_mnt FROM a0c AS a LEFT OUTER JOIN a1z AS b ON a.a0k = b.a0k AND a.a0n = b.a0n AND b.a2c <= a.a0d AND b.a2k > a.a0d LEFT OUTER JOIN a2x AS c ON a.a0m = c.a0m LEFT OUTER JOIN a5r AS d ON a.a0l = d.a5t WHERE a.a0d BETWEEN '20240526' AND '20250525' AND b.a2t = '624100' AND SUBSTRING(c.a4m FROM 20 FOR 1) = '1' AND SUBSTRING(d.a5w FROM 1 FOR 1) = '1' GROUP BY a.a0d, a.a0k, a.a0m, c.a5m) AS t;