Problem
Transaction throughput is largely limited by latency to object storage. Transactions and manifests need to be written sequentially. And when there are conflicts, we need to do a retry loop. In practice, we can only achieve 1-4 transactions per second on object stores.
When there are writers across different processes, this is unavoidable. But often we have many writers in the same process: a server handling requests to Lance, or the same embedded client writing to a local dataset. If we can buffer and combine transactions within each process, we can reduce the physical TPS and concurrency.
Idea
Instead of eagerly writing out files, we can buffer writes and commits in memory, then flush periodically. Flushing consists of merging all buffered transactions into one and committing that merged transaction. The flush period can be short (1-2 s), so we can wait to return success to the caller until after the flush has happened. Thus, even though we physically commit only 1 TPS, we can handle many more transactions per second from the user.
To put it another way, if a user does N appends in a single process concurrently, we should be able to coalesce those in memory and write them as a single transaction.
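A rough sketch of this flow, where everything is hypothetical (`BufferedCommitter`, `Pending`, and `commit_merged` are invented names, not existing Lance APIs): callers enqueue transactions along with a oneshot channel, and a background task merges whatever has accumulated each tick, commits once, then acknowledges every waiting caller.

```rust
use std::time::Duration;
use tokio::sync::{mpsc, oneshot};

/// Hypothetical stand-in for a Lance transaction.
struct Transaction;

/// A queued transaction plus a channel to notify the caller once it commits.
struct Pending {
    txn: Transaction,
    done: oneshot::Sender<Result<(), String>>,
}

/// Hypothetical buffered committer: callers enqueue transactions and await
/// the flush that physically commits them as one merged transaction.
struct BufferedCommitter {
    queue: mpsc::Sender<Pending>,
}

impl BufferedCommitter {
    fn new(flush_interval: Duration) -> Self {
        let (tx, mut rx) = mpsc::channel::<Pending>(1024);
        tokio::spawn(async move {
            let mut ticker = tokio::time::interval(flush_interval);
            let mut buffered: Vec<Pending> = Vec::new();
            loop {
                tokio::select! {
                    Some(pending) = rx.recv() => buffered.push(pending),
                    _ = ticker.tick() => {
                        if buffered.is_empty() {
                            continue;
                        }
                        let batch = std::mem::take(&mut buffered);
                        // Merge all buffered transactions into one and commit it
                        // (stand-in for CommitBuilder::commit_batch() plus
                        // conflict resolution).
                        let result = commit_merged(batch.iter().map(|p| &p.txn)).await;
                        // Only now do callers see success: one physical commit
                        // acknowledges N logical transactions.
                        for pending in batch {
                            let _ = pending.done.send(result.clone());
                        }
                    }
                }
            }
        });
        Self { queue: tx }
    }

    /// Enqueue a transaction and wait for the flush that commits it.
    async fn commit(&self, txn: Transaction) -> Result<(), String> {
        let (done, wait) = oneshot::channel();
        self.queue
            .send(Pending { txn, done })
            .await
            .map_err(|e| e.to_string())?;
        wait.await.map_err(|e| e.to_string())?
    }
}

/// Stand-in for merging the batch and doing the single physical commit.
async fn commit_merged(_txns: impl Iterator<Item = &Transaction>) -> Result<(), String> {
    Ok(())
}
```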
Example
Consider 3 single-row upsert operations (that affect different rows).
Currently, each has to:
1. Scan
2. Write a data file
3. Write a deletion file
   - (Actually it's worse than this: during conflict resolution we need to merge these deletion files again. So two of these transactions will need to rewrite their deletion files, for a total of 5 deletion files written. But we ignore this for simplicity.)
4. Write a transaction file
5. Commit a manifest (when no one else is committing)
If we buffered and merged the transactions:
- We could compact the data files in memory, writing only one
- We could merge the deletion files in memory, writing only one
- We would commit only one transaction
- There would be no contention when writing the manifest
This means the following changes:
- 6 write IO hops -> 2 hops
- 12 write IO requests -> 4 requests
- No retry loop is needed to commit the manifest
- Less compaction: 1 data file instead of 3
Where to put the buffer
We can attach a central committer, which holds the buffer, to the `Session` object.
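A minimal sketch of that attachment, reusing the hypothetical `BufferedCommitter` from above (the field name and accessor are assumptions, not the current `Session` layout):

```rust
use std::sync::Arc;

/// Hypothetical extension of the existing `Session`: one committer shared by
/// every writer in the process, so their transactions can be coalesced.
struct Session {
    // ... existing fields (index cache, metadata cache, etc.) ...
    committer: Arc<BufferedCommitter>,
}

impl Session {
    /// All datasets opened through this session commit through the same
    /// buffered committer.
    fn committer(&self) -> Arc<BufferedCommitter> {
        self.committer.clone()
    }
}
```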
Merging transactions
We currently have basic facilities for merging transactions; see `CommitBuilder::commit_batch()`. We can use the `ConflictResolver` to make this work for upsert, update, and delete transactions as well.
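A sketch of what the flush-time merge could look like. `CommitBuilder::commit_batch()` and `ConflictResolver` are real names from the codebase, but the shapes below (`merge_buffered`, `rebase`) are invented for illustration:

```rust
/// Fold N buffered transactions into one, reusing the hypothetical
/// `Transaction` type from the earlier sketch. A real implementation would
/// lean on ConflictResolver, the same machinery the commit retry loop uses,
/// to rebase each transaction rather than rejecting it outright.
fn merge_buffered(txns: Vec<Transaction>) -> Result<Transaction, String> {
    let mut iter = txns.into_iter();
    let mut merged = iter.next().ok_or("no transactions to merge")?;
    for txn in iter {
        merged = rebase(merged, txn)?;
    }
    Ok(merged)
}

/// Hypothetical rebase step: union deleted-row bitmaps, append new fragments,
/// and error out on genuinely conflicting operations.
fn rebase(base: Transaction, _next: Transaction) -> Result<Transaction, String> {
    // ... ConflictResolver-style checks and merging would go here ...
    Ok(base)
}
```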
Deferring deletion file writes
If we write out deletion files and then have to read them back and merge them later, this can slow us down. One option we should add is deferring writing the deletion file(s) until the commit step. The `RoaringBitmaps` can just be passed to the committer, just like the `affected_rows` parameter.
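For example, using `RoaringTreemap` from the `roaring` crate (the `DeferredDeletes` shape and `fragment_id` field are assumptions): the bitmaps stay in memory until flush, when the committer unions those that target the same fragment, so we write one deletion file per fragment rather than one per transaction.

```rust
use roaring::RoaringTreemap;

/// Hypothetical deferred deletion: the in-memory bitmap travels with the
/// transaction to the committer instead of being written out eagerly.
struct DeferredDeletes {
    /// Which fragment the deletes apply to (hypothetical field).
    fragment_id: u64,
    /// Rows to delete, kept in memory until the commit step.
    deleted_rows: RoaringTreemap,
}

/// At flush time, union bitmaps that target the same fragment so only one
/// deletion file per fragment gets written.
fn merge_deferred(mut all: Vec<DeferredDeletes>) -> Vec<DeferredDeletes> {
    all.sort_by_key(|d| d.fragment_id);
    let mut merged: Vec<DeferredDeletes> = Vec::new();
    for d in all {
        match merged.last_mut() {
            Some(last) if last.fragment_id == d.fragment_id => {
                last.deleted_rows |= d.deleted_rows; // bitmap union
            }
            _ => merged.push(d),
        }
    }
    merged
}
```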
Deferring data file writes
The other problem that comes with allowing higher-throughput writes is that we will create many small files. Unless compaction can keep up, this will make scans slower with each new write, and the overall throughput of upserts would be undermined.
To mitigate this, we should consider deferring data file writes as well. The object store writer we pass to the file writer could handle this. For example, if the data written is < 5 MB, instead of doing the upload, it could just return an in-memory copy of the data to be written. This could be written later in the commit step, or compacted together with other deferred writes.
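A sketch of that write path (the 5 MB threshold comes from the paragraph above; `WriteOutcome` and `finish_write` are invented names, and a real version would hook into Lance's object store wrapper):

```rust
/// Threshold below which we keep the bytes in memory instead of uploading.
const SMALL_FILE_LIMIT: usize = 5 * 1024 * 1024; // 5 MB

/// Hypothetical result of finishing a file write.
enum WriteOutcome {
    /// Small write: the committer decides later whether to upload it as-is
    /// or compact it together with other deferred writes.
    Deferred { path: String, data: Vec<u8> },
    /// Large write: uploaded immediately, as today.
    Uploaded { path: String },
}

fn finish_write(path: String, data: Vec<u8>) -> WriteOutcome {
    if data.len() < SMALL_FILE_LIMIT {
        // Skip the upload and hand the bytes back for the commit step.
        WriteOutcome::Deferred { path, data }
    } else {
        // A real implementation would perform the object store upload here.
        WriteOutcome::Uploaded { path }
    }
}
```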
Drawback
This will mess with the transaction history
Users who want to see a record of each transaction won't see that reflected in the new versions, since several logical writes collapse into one version. However, users who want high-throughput writes likely don't care as much about this, so it could make sense to control this with a setting.
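Such a setting might look like the following; `coalesce_commits` is a made-up name, and whether it belongs on write options or on the `Session` is an open question:

```rust
/// Hypothetical opt-in flag for users who prefer throughput over a 1:1
/// mapping between their writes and dataset versions.
struct WriteOptions {
    // ... existing options ...
    /// If true, this write may be buffered and merged with other writes in
    /// the same process, so several logical writes share one version.
    coalesce_commits: bool,
}
```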
Long-term, this could be addressed by the design in #3734, which separates the idea of user-facing transactions and the Lance-level transactions.