Description
Functional dependencies are a generalization of unique constraints.
They are very much needed for virtualization when dealing with denormalized data, which is almost the de facto norm when data comes from files (e.g. JSON files, CSV extracts, Excel).
To give an example, let's consider a table describing persons and their addresses. It has a unique constraint over `person_id`.
With functional dependencies, we can further declare that `city_id` determines `city_name` and `region_id`, that `region_id` determines `region_name` and `country_id`, and so on. These dependencies cannot be expressed as unique constraints, because the determinant values (e.g. the same `city_id`) are repeated in many rows.
Functional dependencies are also typically transitive (e.g. `city_id` ultimately determines `country_id`). It is important to let the processor compute the transitive closure, as specifying it manually can be very cumbersome and error-prone.
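As an illustration of what computing the transitive closure could involve, here is a minimal Python sketch of the standard attribute-closure fixpoint. The function name `attribute_closure` and the representation of a dependency as a `(determinants, dependents)` pair are assumptions made for this example, not part of any existing processor.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency: a set of determinant columns and a set of dependent columns.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def attribute_closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every column determined by `attrs` under the declared dependencies."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until no dependency adds a new column
        changed = False
        for determinants, dependents in fds:
            if determinants <= closure and not dependents <= closure:
                closure |= dependents
                changed = True
    return closure


# Only the direct dependencies from the example above are declared
# (person_id -> city_id stands in for the unique constraint).
fds: List[FD] = [
    (frozenset({"person_id"}), frozenset({"city_id"})),
    (frozenset({"city_id"}), frozenset({"city_name", "region_id"})),
    (frozenset({"region_id"}), frozenset({"region_name", "country_id"})),
]

print(sorted(attribute_closure({"city_id"}, fds)))
# ['city_id', 'city_name', 'country_id', 'region_id', 'region_name']
```

The user declares three dependencies, and the derived fact that `city_id` determines `country_id` falls out of the closure without being written down.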
This feature is supported in Ontop lenses: https://ontop-vkg.org/guide/advanced/lenses.html#otherfunctionaldependency .
At Ontopic, we use them a lot in projects. In particular, they are key for dealing with JSON-like data structures in large datasets available in platforms like BigQuery or SparkSQL. Without them, virtualization wouldn't have been feasible over these large datasets.
Functional dependencies enable some self-inner-join and self-left-join optimizations.
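To make the self-join case concrete, the following sketch uses Python's built-in `sqlite3` module on a small hypothetical `persons` table (the table layout and sample rows are invented for illustration). It assumes `city_id` is non-null and that the query is evaluated with `DISTINCT`, and it only checks the equivalence on this sample data; it does not show how any particular processor implements the rewriting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons ("
    " person_id INTEGER PRIMARY KEY,"
    " city_id INTEGER NOT NULL,"
    " city_name TEXT NOT NULL)"
)
# Denormalized sample data in which city_id -> city_name holds.
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [(1, 10, "Bolzano"), (2, 10, "Bolzano"), (3, 20, "Trento")],
)

# Self-join on the determinant; only a dependent column is read from p2.
with_self_join = """
    SELECT DISTINCT p1.person_id, p2.city_name
    FROM persons p1
    JOIN persons p2 ON p1.city_id = p2.city_id
"""

# Because city_id determines city_name, the join can be dropped.
without_self_join = "SELECT DISTINCT person_id, city_name FROM persons"

rows_joined = sorted(conn.execute(with_self_join).fetchall())
rows_simple = sorted(conn.execute(without_self_join).fetchall())
print(rows_joined == rows_simple)  # True
```

Without the declared dependency, a processor cannot know that every row sharing a `city_id` carries the same `city_name`, so it has to keep the join.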