|
1 | 1 | # Glossary
|
2 | 2 |
|
3 |
| -Definitions in query optimization can get very overloaded. Below is the language optd developers speak. |
| 3 | +We have found internally that definitions in query optimization have become overloaded. This |
| 4 | +document defines key names and definitions for concepts that are required in optimization. |
| 5 | + |
| 6 | +Many of the names and definitions will be inspired by the Cascades framework. However, there are a |
| 7 | +few important differences that need to be addressed considering our memo table will be persistent. |
| 8 | + |
| 9 | +# Contents |
| 10 | + |
| 11 | +- [Memo Table] |
| 12 | +- [Expression] |
| 13 | + - [Relational Expression] |
| 14 | + - [Logical Expression] |
| 15 | + - [Physical Expression] |
| 16 | + - [Scalar Expression] |
| 17 | + - **[Equivalence of Expressions](#expression-equivalence)** |
| 18 | +- [Group] |
| 19 | + - [Relational Group] |
| 20 | + - [Scalar Group] |
| 21 | +- [Query Plan] |
| 22 | + - [Logical Plan] |
| 23 | + - [Physical Plan] |
| 24 | +- [Operator] / [Plan Node] |
| 25 | + - [Relational Operator] |
| 26 | + - [Logical Operator] |
| 27 | + - [Physical Operator] |
| 28 | + - [Scalar Operator] |
| 29 | +- [Property] |
| 30 | + - [Logical Property] |
| 31 | + - [Physical Property] |
| 32 | + - ? Derived Property ? |
| 33 | +- [Rule] |
| 34 | + - [Transformation Rule] |
| 35 | + - [Implementation Rule] |
| 36 | + |
| 37 | +[EQOP]: https://www.microsoft.com/en-us/research/publication/extensible-query-optimizers-in-practice/ |
| 38 | +[Memo Table]: #memo-table |
| 39 | +[Expression]: #expression |
| 40 | +[Relational Expression]: #relational-expression |
| 41 | +[Logical Expression]: #logical-expression |
| 42 | +[Physical Expression]: #physical-expression |
| 43 | +[Scalar Expression]: #scalar-expression |
| 44 | +[Group]: #group |
| 45 | +[Relational Group]: #relational-group |
| 46 | +[Scalar Group]: #scalar-group |
| 47 | +[Query Plan]: #query-plan |
| 48 | +[Logical Plan]: #logical-plan |
| 49 | +[Physical Plan]: #physical-plan |
| 50 | +[Plan Node]: #operator |
| 51 | +[Operator]: #operator |
| 52 | +[Relational Operator]: #relational-operator |
| 53 | +[Logical Operator]: #logical-operator |
| 54 | +[Physical Operator]: #physical-operator |
| 55 | +[Scalar Operator]: #scalar-operator |
| 56 | +[Property]: #property |
| 57 | +[Logical Property]: #logical-property |
| 58 | +[Physical Property]: #physical-property |
| 59 | +[Rule]: #rule |
| 60 | +[Transformation Rule]: #transformation-rule |
| 61 | +[Implementation Rule]: #implementation-rule |
| 62 | +[Enforcer Rule]: #enforcer-rule |
| 63 | +[Enforcer Operator]: #enforcer-operator |
4 | 64 |
|
5 |
| -### Relational operator |
6 |
| -A **relation operator** (`RelNode`) describes an operation that can be evaluated to obtain a bag of tuples. In other literature this is also referred to as a query plan. A relational operator can be either logical or physical. |
| 65 | +# Comparison with Cascades |
7 | 66 |
|
8 |
| -### Scalar operator |
| 67 | +In the Cascades framework, an expression is a tree of operators. In `optd`, we are instead defining |
| 68 | +a logical or physical [Query Plan] to be a tree or DAG of [Operator]s. An expression in `optd` |
| 69 | +strictly refers to the representation of an operator in the [Memo Table], not in query plans. |
9 | 70 |
|
10 |
| -A **scalar operator** (`ScalarNode`) describes an operation that can be evaluated to obtain a single value. In other literature this is also referred to as a sql expression or a row expression. |
| 71 | +See the [section below](#expression) on the kinds of expressions for more |
| 72 | +information. |
11 | 73 |
|
12 |
| -## Cascades |
| 74 | +Most other terms in `optd` are similar to Cascades or are self-explanatory. |
13 | 75 |
|
14 |
| -### Expressions |
| 76 | +<br> |
15 | 77 |
|
16 |
| -A **logical expression** is a tree/DAG of logical operators. |
| 78 | +# Memo Table Terms |
17 | 79 |
|
18 |
| -A **physical expression** is a tree/DAG of physical operators. |
| 80 | +This section describes names and definitions of concepts related to the memo table. |
19 | 81 |
|
20 |
| -The term **expression** in the context of Cascades can refer to either a relational or a scalar expression. |
| 82 | +## Memo Table |
21 | 83 |
|
22 |
| -### Properties |
| 84 | +The memo table is the data structure used for dynamic programming in a top-down plan enumeration |
| 85 | +search algorithm. The memo table consists of a mutually recursive data structure made up of |
| 86 | +[Expression]s and [Group]s. |
23 | 87 |
|
24 |
| -**Properties** are metadata computed (and sometimes stored) for each node in an expression. |
25 |
| -Properties of an expression may be **required** by the original SQL query or **derived** from **physical properties of one of its inputs.** |
| 88 | +## Expression |
26 | 89 |
|
| 90 | +An expression is the representation of a non-materialized operator _inside_ of the [Memo Table]. |
27 | 91 |
|
28 |
| -**Logical properties** describe the structure and content of data returned by an expression. |
| 92 | +There are 2 types of expressions: [Relational Expression]s and [Scalar Expression]s. A [Relational |
| 93 | +Expression] can be either a [Logical Expression] or a [Physical Expression]. |
29 | 94 |
|
30 |
| -- Examples: row count, operator type,statistics, whether relational output columns can contain nulls. |
| 95 | +Note that different kinds of expressions can have the same names as [Operator]s or [Plan Node]s, but |
| 96 | +expressions solely indicate non-materialized relational or scalar operators in the [Memo Table]. |
31 | 97 |
|
32 |
| -**Physical properties** are characteristics of an expression that |
33 |
| -impact its layout, presentation, or location, but not its logical content. |
| 98 | +Operators outside of the [Memo Table] should _**not**_ be referred to as expressions, and should |
| 99 | +instead be referred to as [Operator]s or [Plan Node]s. |
34 | 100 |
|
35 |
| -- Examples: order and data distribution. |
| 101 | +Notably, when we refer to an expression, _we are specifically talking about the representation of_ |
| 102 | +_operators inside the memo table_. A logical operator from an incoming logical plan should _not_ |
| 103 | +be called a [Logical Expression], and similarly a physical execution operator in the final output |
| 104 | +physical plan should also _not_ be called a [Physical Expression]. |
36 | 105 |
|
| 106 | +Another way to think about this is that expressions are _not_ materialized, and plan nodes and |
| 107 | +operators inside query plans _are_ materialized. Operators inside of query plans (both logical and |
| 108 | +physical) should be referred to as either logical or physical [Operator]s or logical or physical |
| 109 | +[Plan Node]s. |
37 | 110 |
|
38 |
| -### Equivalence |
| 111 | +Another key difference between expressions and [Plan Node]s is that expressions have 0 or more |
| 112 | +**Group Identifiers** as children, and [Plan Node]s have 0 or more other [Plan Node]s as children. |
39 | 113 |
|
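The difference in children described above can be sketched in Rust. This is only an illustration, not the actual `optd` representation: the names `GroupId`, `LogicalExpression`, and `LogicalOperator`, as well as their variants and fields, are assumptions made for this example.

```rust
/// Hypothetical identifier of a group in the memo table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);

/// Sketch of a logical expression: it lives inside the memo table,
/// so its children are group identifiers (not materialized).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LogicalExpression {
    Scan { table: String },
    Join { left: GroupId, right: GroupId },
}

/// Sketch of a logical operator (plan node): it lives in a query plan,
/// so its children are other plan nodes (materialized).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogicalOperator {
    Scan { table: String },
    Join {
        left: Box<LogicalOperator>,
        right: Box<LogicalOperator>,
    },
}

impl LogicalExpression {
    /// Returns the 0 or more group identifiers that are this expression's children.
    pub fn children(&self) -> Vec<GroupId> {
        match self {
            LogicalExpression::Scan { .. } => vec![],
            LogicalExpression::Join { left, right } => vec![*left, *right],
        }
    }
}

impl LogicalOperator {
    /// Returns the 0 or more plan nodes that are this operator's children.
    pub fn children(&self) -> Vec<&LogicalOperator> {
        match self {
            LogicalOperator::Scan { .. } => vec![],
            LogicalOperator::Join { left, right } => vec![left.as_ref(), right.as_ref()],
        }
    }
}
```

Under these assumptions, a Join expression only knows the identifiers of the groups it joins, while a Join plan node owns its entire subtrees.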
40 |
| -Two logical expressions are equivalent if the logical properties of the two expressions are the same. They should produce the same set of rows and columns. |
| 114 | +## Relational Expression |
41 | 115 |
|
42 |
| -Two physical expressions are equivalent if their logical and physical properties are the same. |
| 116 | +A relational expression is either a [Logical Expression] or a [Physical Expression]. |
43 | 117 |
|
44 |
| -Logical expression with a required physical property is equivalent to a physical expression if the physical expression has the same logical property and delivers the physical property. |
| 118 | +When we say "relational", we mean representations of operations in the relational algebra of SQL. |
45 | 119 |
|
| 120 | +Relational expressions differ from [Scalar Expression]s in that the result of algebraically |
| 121 | +evaluating a relational expression produces a bag of tuples instead of a single scalar value. |
46 | 122 |
|
47 |
| -### Group |
| 123 | +See the following sections for more information. |
48 | 124 |
|
49 |
| -A **group** consists of equivalent logical expressions. |
| 125 | +## Logical Expression |
50 | 126 |
|
51 |
| -A **relational group** consists of logically equivalent logical relational operators. |
| 127 | +A logical expression is a version of a [Relational Expression]. |
52 | 128 |
|
53 |
| -A **scalar group** consists of logically equivalent logical scalar operators. |
| 129 | +TODO(connor) Add more details. |
54 | 130 |
|
55 |
| -### Rule |
| 131 | +Examples of logical expressions include Logical Scan, Logical Join, or Logical Sort expressions |
| 132 | +(which can just be shorthanded to Scan, Join, or Sort). |
56 | 133 |
|
57 |
| -a **rule** in Cascades transforms an expression into equivalent expressions. It has the following interface. |
| 134 | +## Physical Expression |
| 135 | + |
| 136 | +A physical expression is a version of a [Relational Expression]. |
| 137 | + |
| 138 | +TODO(connor) Add more details. |
| 139 | + |
| 140 | +Examples of physical expressions include Table Scan, Index Scan, Hash Join, or Sort Merge Join. |
| 141 | + |
| 142 | +## Scalar Expression |
| 143 | + |
| 144 | +A scalar expression is a version of an [Expression]. |
| 145 | + |
| 146 | +A scalar expression describes an operation that can be evaluated to obtain a single value. This can |
| 147 | +also be referred to as a SQL expression, a row expression, or a SQL predicate. |
| 148 | + |
| 149 | +TODO(everyone) Figure out the semantics of what a scalar expression really is. |
| 150 | + |
| 151 | +Examples of scalar expressions include the expressions `t1.a < 42` or `t1.b = t2.c`. |
| 152 | + |
| 153 | +## Expression Equivalence |
| 154 | + |
| 155 | +Two [Logical Expression]s are equivalent if the [Logical Property]s of the two expressions are the |
| 156 | +same. In other words, the [Logical Plan]s they represent produce the same set of rows and columns. |
| 157 | + |
| 158 | +Two [Physical Expression]s are equivalent if their [Logical Property]s and [Physical Property]s are the same. |
| 159 | +In other words, the [Physical Plan]s they represent produce the same set of rows and columns, in the |
| 160 | +exact same order and distribution. |
| 161 | + |
| 162 | +TODO This next part is unclear? |
| 163 | + |
| 164 | +A [Logical Expression] with a required [Physical Property] is equivalent to a [Physical Expression] |
| 165 | +if the [Physical Expression] has the same [Logical Property] and delivers the [Physical Property]. |
| 166 | + |
| 167 | +## Group |
| 168 | + |
| 169 | +A **group** is a set of equivalent [Expression]s. |
| 170 | + |
| 171 | +We follow the definition of groups in the Volcano and Cascades frameworks. From the [EQOP] Microsoft |
| 172 | +article (Section 2.2, page 205): |
| 173 | + |
| 174 | +> In the memo, each class of equivalent expressions is called an _equivalence class_ or a _group_, |
| 175 | +> and all equivalent expressions within the class are called _group expressions_ or simply |
| 176 | +> _expressions_. |
| 177 | +
|
| 178 | +## Relational Group |
| 179 | + |
| 180 | +A relational group is a set of 1 or more equivalent [Logical Expression]s and 0 or more equivalent |
| 181 | +[Physical Expression]s. |
| 182 | + |
| 183 | +For a given relational group, the first step of optimization is exploration, in which equivalent |
| 184 | +[Logical Expression]s are added to the group via [Transformation Rule]s. Once the search space for |
| 185 | +the group has been exhausted (all possible transformation rules have been applied to all logical |
| 186 | +expressions in the group), the group can be physically optimized. At this point, the search |
| 187 | +algorithm will apply [Implementation Rule]s to cost and find the best execution plan. |
| 188 | + |
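The two phases described above (exploration, then implementation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the `optd` search algorithm: `Memo`, `RelationalGroup`, and all method names are hypothetical, join commutativity stands in for the full set of transformation rules, each logical expression is implemented by a single fixed physical expression, and costing is omitted entirely.

```rust
/// Hypothetical identifier of a group (an index into the memo's group list).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub usize);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LogicalExpression {
    Scan(String),
    Join(GroupId, GroupId),
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum PhysicalExpression {
    TableScan(String),
    HashJoin(GroupId, GroupId),
}

/// 1 or more equivalent logical expressions, 0 or more physical expressions.
#[derive(Debug, Default)]
pub struct RelationalGroup {
    pub logical: Vec<LogicalExpression>,
    pub physical: Vec<PhysicalExpression>,
}

#[derive(Debug, Default)]
pub struct Memo {
    groups: Vec<RelationalGroup>,
}

impl Memo {
    /// Creates a new group seeded with a single logical expression.
    pub fn add_group(&mut self, expr: LogicalExpression) -> GroupId {
        self.groups.push(RelationalGroup { logical: vec![expr], physical: vec![] });
        GroupId(self.groups.len() - 1)
    }

    pub fn group(&self, id: GroupId) -> &RelationalGroup {
        &self.groups[id.0]
    }

    /// Exploration: apply the (only) transformation rule, join commutativity,
    /// until no new logical expression is added to the group.
    pub fn explore(&mut self, id: GroupId) {
        let group = &mut self.groups[id.0];
        let mut i = 0;
        while i < group.logical.len() {
            if let LogicalExpression::Join(left, right) = group.logical[i].clone() {
                let swapped = LogicalExpression::Join(right, left);
                if !group.logical.contains(&swapped) {
                    group.logical.push(swapped);
                }
            }
            i += 1;
        }
    }

    /// Implementation: map every logical expression in the group to a
    /// physical expression. A real optimizer would also cost the results.
    pub fn implement(&mut self, id: GroupId) {
        let group = &mut self.groups[id.0];
        for expr in &group.logical {
            let physical = match expr {
                LogicalExpression::Scan(table) => PhysicalExpression::TableScan(table.clone()),
                LogicalExpression::Join(left, right) => PhysicalExpression::HashJoin(*left, *right),
            };
            if !group.physical.contains(&physical) {
                group.physical.push(physical);
            }
        }
    }
}
```

In this sketch, exploring a group that contains `Join(a, b)` adds `Join(b, a)` to the same group, and implementing the group then yields one hash join per logical join.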
| 189 | +TODO Add more details. |
| 190 | + |
| 191 | +TODO Add example. |
| 192 | + |
| 193 | +## Scalar Group |
| 194 | + |
| 195 | +A scalar group consists of equivalent [Scalar Expression]s. |
| 196 | + |
| 197 | +TODO Add more details. |
| 198 | + |
| 199 | +TODO Add example. |
| 200 | + |
| 201 | +<br> |
| 202 | + |
| 203 | +# Plan Enumeration and Search Concepts |
| 204 | + |
| 205 | +This section describes names and definitions of concepts related to the general plan enumeration and |
| 206 | +search of optimal query plans. |
| 207 | + |
| 208 | +## Query Plan |
| 209 | + |
| 210 | +A query plan is a tree or DAG of relational and scalar operators. We can consider query optimization |
| 211 | +to be a function from an unoptimized query plan to an optimized query plan. More specifically, the |
| 212 | +input plan is generally a [Logical Plan] and the output plan is always a [Physical Plan]. |
| 213 | + |
| 214 | +We generally consider query plans to either be completely logical or completely physical. However, |
| 215 | +when dealing with rule matching and rule application to enumerate different but equivalent query |
| 216 | +plans, we also deal with partially materialized query plans that can be a mix of both logical and |
| 217 | +physical operators (as well as group identifiers and other scalar operators). |
| 218 | + |
| 219 | +TODO Add more details about partially materialized plans. |
| 220 | + |
| 221 | +## Logical Plan |
| 222 | + |
| 223 | +A logical plan is a tree or DAG of [Logical Operator]s that can be evaluated to produce a bag of |
| 224 | +tuples. This can also be referred to as a logical query plan. The [Operator]s that make up this |
| 225 | +logical plan can be considered logical plan nodes. |
| 226 | + |
| 227 | +## Physical Plan |
| 228 | + |
| 229 | +A physical plan is a tree or DAG of [Physical Operator]s that can be evaluated by an execution |
| 230 | +engine to produce a table. This can also be referred to as a physical query plan. The [Operator]s |
| 231 | +that make up this physical plan can be considered physical plan nodes. |
| 232 | + |
| 233 | +## Operator |
| 234 | + |
| 235 | +An operator is the materialized version of an [Expression]. Like expressions, there are both |
| 236 | +relational operators and scalar operators. |
| 237 | + |
| 238 | +See the following sections for more information. |
| 239 | + |
| 240 | +## Relational Operator |
| 241 | + |
| 242 | +A relational operator is a node in a [Query Plan] (which is a tree or DAG), and is the materialized |
| 243 | +version of a [Relational Expression]. |
| 244 | + |
| 245 | +## Logical Operator |
| 246 | + |
| 247 | +A logical operator is a node in a [Logical Plan] (which is a tree or DAG), and is the materialized |
| 248 | +version of a [Logical Expression]. |
| 249 | + |
| 250 | +## Physical Operator |
| 251 | + |
| 252 | +A physical operator is a node in a [Physical Plan] (which is a tree or DAG), and is the materialized |
| 253 | +version of a [Physical Expression]. |
| 254 | + |
| 255 | +## Scalar Operator |
| 256 | + |
| 257 | +A scalar operator is a node in a [Query Plan] that describes a scalar expression, and can be |
| 258 | +considered the materialized version of a [Scalar Expression]. |
| 259 | + |
| 260 | +## Property |
| 261 | + |
| 262 | +A property is metadata computed (and sometimes stored) for a given relational expression. |
| 263 | + |
| 264 | +Properties of an expression may be _required_ by the original SQL query or _derived_ from the |
| 265 | +[Physical Property] of one of its inputs. |
| 266 | + |
| 267 | +TODO Add more details. |
| 268 | + |
| 269 | +## Logical Property |
| 270 | + |
| 271 | +A logical property describes the structure and content of data returned by a given expression. |
| 272 | + |
| 273 | +Examples: row count, operator type, statistics, whether relational output columns can contain nulls. |
| 274 | + |
| 275 | +TODO Clean up and add more details. |
| 276 | + |
| 277 | +## Physical Property |
| 278 | + |
| 279 | +A physical property is a characteristic of an expression that impacts its layout, presentation, or |
| 280 | +location, but not its logical content. |
| 281 | + |
| 282 | +Examples: order and data distribution. |
| 283 | + |
| 284 | +TODO Clean up and add more details. |
| 285 | + |
| 286 | +## Rule |
| 287 | + |
| 288 | +A rule transforms a query plan or sub-plan into an equivalent plan. |
| 289 | + |
| 290 | +Rules should have an interface similar to the following: |
58 | 291 |
|
59 | 292 | ```rust
|
60 | 293 | trait Rule {
|
61 | 294 | /// Checks whether the rule is applicable on the input expression.
|
62 | 295 | fn check_pattern(expr: Expr) -> bool;
|
| 296 | + |
63 | 297 | /// Transforms the expression into one or more equivalent expressions.
|
64 | 298 | fn transform(expr: Expr) -> Vec<Expr>;
|
65 | 299 | }
|
66 | 300 | ```
|
67 | 301 |
|
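As an illustration of how a rule could implement the interface above, here is a sketch of join commutativity. The `Expr` and `GroupId` types and the `JoinCommutativity` rule are assumptions made for this example; the glossary does not define `Expr`, and the interface itself may change.

```rust
/// Hypothetical identifier of a group in the memo table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GroupId(pub usize);

/// Hypothetical expression type; children are group identifiers.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expr {
    Scan(String),
    Join(GroupId, GroupId),
}

/// Same shape as the interface shown above.
pub trait Rule {
    /// Checks whether the rule is applicable on the input expression.
    fn check_pattern(expr: Expr) -> bool;

    /// Transforms the expression into one or more equivalent expressions.
    fn transform(expr: Expr) -> Vec<Expr>;
}

/// Example rule: `Join(A, B)` is equivalent to `Join(B, A)`.
pub struct JoinCommutativity;

impl Rule for JoinCommutativity {
    fn check_pattern(expr: Expr) -> bool {
        matches!(expr, Expr::Join(_, _))
    }

    fn transform(expr: Expr) -> Vec<Expr> {
        match expr {
            Expr::Join(left, right) => vec![Expr::Join(right, left)],
            // The rule does not apply, so no alternatives are produced.
            _ => vec![],
        }
    }
}
```

Because the children are group identifiers, the rule only rewrites the matched join and never needs to materialize the inputs of the join.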
68 |
| -A **transformation rule** transforms a **part** of the logical expression into logical expressions. This is also called a logical to logical transformation in other systems. |
| 302 | +TODO Actually figure out the interface for rules since it's probably not going to look like that. |
| 303 | + |
| 304 | +TODO Clean up and add more details. |
69 | 305 |
|
70 |
| -A **implementation rule** transforms a **part** of a logical expression to an equivalent physical expression with physical properties. |
| 306 | +## Transformation Rule |
71 | 307 |
|
72 |
| -In Cascades, you don't need to materialize the entire query tree when applying rules. Instead, you can materialize expressions on demand while leaving unrelated parts of the tree as group identifiers. |
| 308 | +A transformation rule transforms a _part_ of a logical expression into equivalent logical expressions. |
| 309 | + |
| 310 | +This is also called a logical to logical transformation in other systems. |
| 311 | + |
| 312 | +TODO Clean up and add more details. |
| 313 | + |
| 314 | +## Implementation Rule |
| 315 | + |
| 316 | +An implementation rule transforms a _part_ of a logical expression to an equivalent physical |
| 317 | +expression with physical properties. |
| 318 | + |
| 319 | +In Cascades, you don't need to materialize the entire query tree when applying rules. Instead, you |
| 320 | +can materialize expressions on demand while leaving unrelated parts of the tree as group identifiers. |
73 | 321 |
|
74 | 322 | In other systems, there are physical-to-physical expression transformations for execution-engine-specific optimization, physical property enforcement, or distributed planning. At the moment, we are **not** considering physical-to-physical transformations.
|
75 | 323 |
|
76 |
| -**Enforcer rule:** *TODO!* |
| 324 | +TODO Clean up and add more details. |
| 325 | + |
| 326 | +## Enforcer Rule |
| 327 | + |
| 328 | +TODO Write this section. |
| 329 | + |
| 330 | +## Enforcer Operator |
| 331 | + |
| 332 | +TODO Write this section. |