Document Table Constraint Enforcement Behavior in Custom Table Providers Guide #16340
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Table Constraint Enforcement | ||
|
||
Table providers can describe table constraints using the | ||
[`TableConstraint`] and [`Constraints`] APIs. These constraints include | ||
primary keys, unique keys, foreign keys and check constraints. | ||
|
||
DataFusion does **not** currently enforce these constraints at runtime. | ||
They are provided for informational purposes and can be used by custom | ||
`TableProvider` implementations or other parts of the system. | ||
|
||
- **Nullability**: The only property enforced by DataFusion is the | ||
nullability of each [`Field`] in a schema. Columns marked as not | ||
nullable should not produce null values during execution. DataFusion | ||
does not check this when data is ingested. | ||
- **Primary and unique keys**: DataFusion does not verify that the data | ||
satisfies primary or unique key constraints. Table providers that | ||
require this behaviour must implement their own checks. | ||
- **Foreign keys and check constraints**: These constraints are parsed | ||
but are not validated or used during query planning. | ||
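
Since a provider that needs key enforcement has to check the data itself, the following is a minimal sketch of what such a check could look like. It is not DataFusion code and uses only the Rust standard library: the `Row` alias, the `check_primary_key` function, and the choice to identify key columns by index are all assumptions made for illustration. A real provider would run an equivalent check over Arrow record batches.

```rust
use std::collections::HashSet;

/// A row of nullable 64-bit integers; a hypothetical stand-in for real column data.
type Row = Vec<Option<i64>>;

/// Checks primary-key semantics over `rows`: every key column must hold a
/// non-null value, and the combination of key values must be unique.
/// `key_cols` lists the key columns by index (an assumption of this sketch).
fn check_primary_key(rows: &[Row], key_cols: &[usize]) -> Result<(), String> {
    let mut seen: HashSet<Vec<i64>> = HashSet::new();
    for (row_idx, row) in rows.iter().enumerate() {
        let mut key = Vec::with_capacity(key_cols.len());
        for &col in key_cols {
            // `get` guards against short rows; `flatten` merges "missing" and "null".
            match row.get(col).copied().flatten() {
                Some(value) => key.push(value),
                None => {
                    return Err(format!(
                        "row {row_idx}: null or missing value in key column {col}"
                    ))
                }
            }
        }
        // `insert` returns false when the same key was already recorded.
        if !seen.insert(key) {
            return Err(format!("row {row_idx}: duplicate primary key"));
        }
    }
    Ok(())
}
```

A provider could call a check like this on each batch before accepting an insert and reject the write when it returns `Err`. Note that the sketch only sees the rows it is given, so uniqueness across batches would need the `seen` set to outlive a single call.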
|
||
The optimizer also does not assume that these constraints hold when | ||
rewriting queries. For example, declaring a column as a primary key will | ||
not allow the optimizer to skip a `DISTINCT` aggregation. | ||
I didn't think this was true -- I was pretty sure there is some ordering / functional dependency check that relies on declared constraints, but I couldn't find it quickly when searching. Maybe @mustafasrepo remembers 🤔

Hi @alamb, you're right. I tested this in datafusion-cli:

-- Test 1: Create table with more data to see if DISTINCT appears
CREATE TABLE test_pk_large (
id INTEGER PRIMARY KEY,
name VARCHAR(50)
);
-- Insert duplicate names but unique IDs
INSERT INTO test_pk_large VALUES
(1, 'Alice'),
(2, 'Alice'),
(3, 'Bob'),
(4, 'Bob'),
(5, 'Charlie');
-- Test DISTINCT on primary key column
EXPLAIN SELECT DISTINCT id FROM test_pk_large;
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ bytes: 376 │ |
| | │ format: memory │ |
| | │ rows: 1 │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
-- Test 2
CREATE TABLE test_no_pk (
id INTEGER,
name VARCHAR(50)
);
-- Insert unique IDs (same as before)
INSERT INTO test_no_pk VALUES
(1, 'Alice'),
(2, 'Alice'),
(3, 'Bob'),
(4, 'Bob'),
(5, 'Charlie');
EXPLAIN SELECT DISTINCT id FROM test_no_pk;
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ AggregateExec │ |
| | │ -------------------- │ |
| | │ group_by: id │ |
| | │ │ |
| | │ mode: │ |
| | │ FinalPartitioned │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ RepartitionExec │ |
| | │ -------------------- │ |
| | │ partition_count(in->out): │ |
| | │ 10 -> 10 │ |
| | │ │ |
| | │ partitioning_scheme: │ |
| | │ Hash([id@0], 10) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ RepartitionExec │ |
| | │ -------------------- │ |
| | │ partition_count(in->out): │ |
| | │ 1 -> 10 │ |
| | │ │ |
| | │ partitioning_scheme: │ |
| | │ RoundRobinBatch(10) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ AggregateExec │ |
| | │ -------------------- │ |
| | │ group_by: id │ |
| | │ mode: Partial │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ bytes: 376 │ |
| | │ format: memory │ |
| | │ rows: 1 │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+

In other words, the declared constraints do affect the optimizer.
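
One way to picture why the two plans differ is the functional-dependency reasoning mentioned earlier in this thread: if the columns being de-duplicated include every column of a declared non-null key, no two rows can agree on all of them, so the aggregation has nothing to remove. The sketch below illustrates that rule only. It is not how DataFusion implements the optimization, and `distinct_is_redundant` and its index-based parameters are hypothetical names chosen for this example.

```rust
/// Returns true when de-duplicating on `distinct_cols` cannot remove any row
/// because at least one declared key is fully contained in those columns.
///
/// Assumption of this sketch: every key in `declared_keys` is unique AND
/// non-null (primary-key semantics). A unique key that admits NULLs would
/// not be safe here, since DISTINCT treats NULLs as equal to each other.
fn distinct_is_redundant(distinct_cols: &[usize], declared_keys: &[Vec<usize>]) -> bool {
    declared_keys.iter().any(|key| {
        // An empty key carries no information, so it never qualifies.
        !key.is_empty() && key.iter().all(|col| distinct_cols.contains(col))
    })
}
```

With the tables above, `SELECT DISTINCT id` on `test_pk_large` corresponds to `distinct_is_redundant(&[0], &[vec![0]])`, which holds, while `test_no_pk` has no declared keys and the aggregation has to stay.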
||
|
||
[`tableconstraint`]: https://docs.rs/datafusion/latest/datafusion/sql/planner/enum.TableConstraint.html | ||
[`constraints`]: https://docs.rs/datafusion/latest/datafusion/common/functional_dependencies/struct.Constraints.html | ||
[`field`]: https://docs.rs/arrow/latest/arrow/datatype/struct.Field.html |