Row Level Commit Resolution #3726
westonpace
started this conversation in
Lance Table Format
Lance's conflict resolution currently happens at the fragment level. In other words, if two operations modify the same fragment (e.g. to mark a row deleted because it has been updated elsewhere) then those operations will conflict. This can be problematic for small parallel updates (or merge_insert operations). For example, if two parallel processes each perform an "update by id" on distinct ids, many users are surprised to find that these operations can still conflict.
Pre-requisites
Challenges
Clearly define affected operations: We should clearly define which operations need row-level conflict resolution (in terms based on #3724). At the moment I think this matters for updates, merge inserts, and deletes.
Determining overlap: We need to be able to identify whether two operations overlap. When checking for a conflict, all we have are the transactions, and at the moment they do not contain enough information to tell whether two operations overlap. We could potentially save the modified row ids in the transaction for update / delete operations. Alternatively, we could bring more information into the overlap detection. For example, if we determine there is potential overlap (because two operations modify the same fragments), then we may need to pull up the affected manifests to get row-id information.
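To make the first option concrete, here is a rough sketch of a two-step check: compare fragments first, then compare rows only in the fragments both operations touched. The `Transaction` shape and the `conflicts` function are hypothetical and are not Lance's actual types or API. It assumes each transaction records, per fragment, the set of row offsets it modified, with `None` meaning the row-level detail was not recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Transaction:
    # Hypothetical shape, not Lance's real transaction type.
    # Maps fragment id -> row offsets modified within that fragment.
    # None means the row-level detail was not recorded.
    modified: Dict[int, Optional[Set[int]]] = field(default_factory=dict)


def conflicts(a: Transaction, b: Transaction) -> bool:
    """Return True if the two transactions must be treated as conflicting."""
    # Step 1: fragment-level check. Disjoint fragments never conflict.
    shared_fragments = a.modified.keys() & b.modified.keys()
    for frag_id in shared_fragments:
        rows_a = a.modified[frag_id]
        rows_b = b.modified[frag_id]
        # Without row-level detail, fall back to the conservative
        # fragment-level answer.
        if rows_a is None or rows_b is None:
            return True
        # Step 2: row-level check within the shared fragment.
        if rows_a & rows_b:
            return True
    return False
```

With this shape, two "update by id" operations that land in the same fragment but touch different rows would not conflict, while transactions that lack row information keep today's fragment-level behaviour.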
Merging changes: If two updates happen to the same fragment (but different rows) then we will need to somehow merge the two delete files into a single delete file. There may be other changes like this that will need to be merged.
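As a sketch of what that merge might look like, the snippet below models each delete file as a plain set of deleted row offsets per fragment and merges by taking the union. This is a simplification for illustration: `DeletionVectors` and `merge_deletions` are hypothetical names, and the real on-disk delete file format is not modelled here.

```python
from typing import Dict, Set

# Hypothetical in-memory model: fragment id -> deleted row offsets.
DeletionVectors = Dict[int, Set[int]]


def merge_deletions(
    committed: DeletionVectors, incoming: DeletionVectors
) -> DeletionVectors:
    """Union the deleted rows of two non-overlapping operations."""
    # Copy so that neither input is mutated.
    merged = {frag: set(rows) for frag, rows in committed.items()}
    for frag, rows in incoming.items():
        merged.setdefault(frag, set()).update(rows)
    return merged
```

For example, if one update deleted rows 1 and 2 of fragment 0 and a concurrent update deleted row 5 of the same fragment, the merged result for fragment 0 would contain rows 1, 2, and 5.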