Lance Table and Column Level Governance #3997
jackye1995 started this conversation in Ideas
Replies: 1 comment
That's a cool idea!
Governance is usually a catalog-level feature, and it typically comes with some lock-in: you have to use a specific catalog, plus specific engines that respect the access policies defined in that catalog.
With Lance's unique design, we can actually achieve table- and column-level governance even without a catalog. Because Lance can store different columns in different files, as long as we can set different encryption keys for different data files and metadata files, users can simply grant or deny access to those encryption keys to control access to different columns in the same table.
For example, consider a table with columns (c1, c2, c3, ssn). A user can configure two column groups, g1 = [c1, c2, c3] and g2 = [ssn] (see more details in the column group discussion: Column group with horizontal compaction and split #3995), and then use encryption key k1 to encrypt g1 and encryption key k2 to encrypt g2. Only users who have access to k2 can read the ssn column.
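A minimal sketch of this access model, assuming a simple in-memory mapping. The dictionaries and the `readable_columns` helper are hypothetical and are not a Lance API; the sketch only models data-file keys and leaves metadata-file keys out.

```python
# Hypothetical illustration: group names, key ids, and the helper below are
# made up for this example and are not part of any Lance API.
from typing import Dict, List, Set

# Column groups from the example: g1 holds ordinary columns, g2 holds ssn.
COLUMN_GROUPS: Dict[str, List[str]] = {
    "g1": ["c1", "c2", "c3"],
    "g2": ["ssn"],
}

# Each column group's data files are encrypted with its own key.
GROUP_KEYS: Dict[str, str] = {
    "g1": "k1",
    "g2": "k2",
}


def readable_columns(granted_keys: Set[str]) -> List[str]:
    """Return the columns a reader can decrypt, given the keys it may use.

    Access control happens purely through key access: a column is readable
    only if the reader can use the key that encrypts that column's group.
    """
    columns: List[str] = []
    for group, cols in COLUMN_GROUPS.items():
        if GROUP_KEYS[group] in granted_keys:
            columns.extend(cols)
    return columns


if __name__ == "__main__":
    print(readable_columns({"k1"}))        # ['c1', 'c2', 'c3']
    print(readable_columns({"k1", "k2"}))  # ['c1', 'c2', 'c3', 'ssn']
```

The point of the design is that no engine has to evaluate a policy: a reader without k2 simply cannot decrypt the files holding ssn.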
Technically, Parquet also has a similar column-level encryption feature (Parquet Modular Encryption), but because all columns must be stored in the same Parquet file, it makes the Parquet spec quite complicated. Users have to upgrade to use it, and all readers, writers, related table formats, and engines have to be updated to support it, which makes wide ecosystem adoption much harder (personal opinion :P). In comparison, Lance just needs to let users set the key and make sure the key information is passed to the object storage layer (e.g. S3 SSE-KMS). No additional handling is needed at the file format level.
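As a rough sketch of what "passing the key to the object storage layer" could look like with S3 SSE-KMS, the helper below builds keyword arguments for boto3's `put_object` so each data file is encrypted with its column group's KMS key. The key ARNs, bucket, file path, and the `sse_kms_put_args` helper are hypothetical placeholders, not how Lance implements this. On the read side, fetching an SSE-KMS object requires `kms:Decrypt` permission on that key, which is what turns key access into column access.

```python
# Sketch only: KMS key ARNs, bucket, and file paths are hypothetical placeholders.
from typing import Any, Dict

# Hypothetical mapping from column group to the KMS key protecting its files.
GROUP_KMS_KEYS: Dict[str, str] = {
    "g1": "arn:aws:kms:us-east-1:111122223333:key/k1-placeholder",
    "g2": "arn:aws:kms:us-east-1:111122223333:key/k2-placeholder",
}


def sse_kms_put_args(bucket: str, path: str, group: str, body: bytes) -> Dict[str, Any]:
    """Build keyword arguments for boto3's S3 put_object so the object is
    encrypted server-side with the KMS key assigned to its column group."""
    return {
        "Bucket": bucket,
        "Key": path,  # S3 object key (the file path), not the encryption key
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # request SSE-KMS
        "SSEKMSKeyId": GROUP_KMS_KEYS[group],  # per-column-group key
    }


if __name__ == "__main__":
    args = sse_kms_put_args("my-bucket", "my_table.lance/data/ssn-file.lance", "g2", b"...")
    print(args["SSEKMSKeyId"])
    # With AWS credentials configured, the actual write would be:
    #   import boto3
    #   boto3.client("s3").put_object(**args)
```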