Adding an SQL Operator
The current architecture of Peloton contains a mix of both PostgreSQL components and our own components. In particular, we use the SQL parser and planner of Postgres, and then we use our own execution engine to execute the generated plans. We also use our own storage engine to store the databases. More information on the tile-based architecture of our execution and storage engines is available here.
Before adding an operator, you might want to look at the existing Peloton operators in `src/backend/executor`. In particular, the limit operator is a straightforward example to start with.
All operators inherit from the abstract operator class. In particular, each operator has `Init` and `Execute` functions, which initialize (or reinitialize) and execute the operator, respectively.
When a parent operator invokes the `Execute` function of a child operator, the child returns `false` if and only if it has already returned all the logical tiles it produced to the parent; it never returns an empty logical tile. Otherwise, the child returns `true`, and the parent can use `GetOutput` to obtain the logical tile produced by the child. A parent operator can therefore repeatedly invoke the `Execute` function of a child operator to obtain all the logical tiles the child produces.
We intercept the plan trees generated by the Postgres planner component and execute them with our own executor component. Now we will get into the specifics on our side of things.
Queries are classified into data definition language (DDL) queries and data manipulation language (DML) queries. These two categories of queries take different processing paths, both within the Postgres frontend and within Peloton.
In Postgres, DML queries are executed in four stages; take a look at the entry point of the Postgres executor module here. `ExecutorStart()` performs initialization that sets up the dynamic plan state tree from the static plan tree. `ExecutorRun()` invokes the plan state tree. `ExecutorFinish()` and `ExecutorEnd()` take care of cleaning things up, but they are not relevant to us. Peloton takes over query execution when queries reach `ExecutorRun()`, and we therefore only make use of `ExecutorStart()` in our system.
In the case of DDL queries, Peloton intercepts them in the `ProcessUtility` function here.
Peloton cannot directly execute the Postgres plan state tree, as our executors only understand Peloton's own query plan tree. So we need to transform the Postgres plan state tree into a Peloton plan tree before execution. We refer to this process as plan mapping or plan transformation. After mapping the plan, Peloton executes the plan tree by recursively executing its nodes. Query processing yields Peloton tuples, which we then transform back into Postgres tuples before sending them to the client via the Postgres frontend.
After taking over from Postgres, DDL queries are handled by `peloton_ddl()`, whereas DML queries are processed by `peloton_dml()`. These functions are located in the peloton module here.
Plan mapping is done only for DML queries, since DDL queries do not require any planning. The high-level idea is to recursively map each plan node in the Postgres plan state tree to a corresponding plan node in the Peloton plan tree. The plan mapper module first preprocesses the plan state tree, extracting the critical information from each Postgres plan node; this preprocessing is performed by functions in the `peloton::bridge::DMLUtils` namespace. The main `PlanTransformer` then transforms the preprocessed plan by recursively invoking sub-transformers based on the type of each node in the tree. An entry point for this module is `peloton::bridge::PlanTransformer::TransformPlan()`.
Peloton then builds an executor tree based on the Peloton query plan tree. It then runs the executor tree recursively.
An execution context is the state associated with one execution of a plan, such as parameter values and transaction information. By separating the execution context from the query plan, we can support prepared statements: a query plan that has been planned and mapped once can be reused with different execution contexts, saving the time spent on query planning and mapping.
After that, query execution consists of two stages: the executor tree is initialized (`DInit()`) and then executed (`DExecute()`). An entry point for this module is here.
We have our own expression system in Peloton. We transform Postgres expressions into Peloton expressions and evaluate them. All expressions derive from the abstract expression class.
The code related to our expression system is located under `src/backend/expression`. There are several files containing utility functions, like the one containing date-related functions.
The expression system is tightly coupled with our type system, which is based on an abstract data type called `Value`. The associated code is located here.