-
Notifications
You must be signed in to change notification settings - Fork 281
feat(transaction): Add retry logic to transaction #1484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious how do we plan to test the retry? i.e. is it possible to use mock catalog and inject retriable / non-retriable error?
Hi @dentiny , this is a good question:) I'm thinking of using a mock catalog, but haven't figured out exactly how |
.expect("Invalid value for commit.retry.num-retries"), | ||
) | ||
.with_factor(2.0) | ||
.build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really related to this change, but something I noticed while writing the PR:
- There is not a centralized properties class, which makes properties hard to locate in the code base: Some properties are defined in table_metadata.rs, and some are defined in the classes that use them: example
- No orthogonal way to fetch and parse property value, which makes error handling inconsistent. For example, this LOC will silently fallback to the default value if property parsing failed
For 1, we can have a dedicated file like table_metadata.rs
to hold all hardcoded property keys, or we can have a property struct to define properties we support like what datafusion does
For 2, something like iceberg-java's PropertyUtil can be very useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of 1. I agree that we could have a module like table_properties
to contain all the properties, and have sth to parse properties in a type safe way like following:
pub struct TableProperties {
#[key="commit.num.retries", default="1"]
commitRetries: i32
}
We could track this in following up issue.
cc @liurenjie1024 @Xuanwo this PR is ready for review, PTAL, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CTTY for this pr, generally LGTM! Just some nits.
pub const COMMIT_NUM_RETRIES: &str = "commit.retry.num-retries"; | ||
/// Default value for number of commit retries. | ||
pub const COMMIT_NUM_RETRIES_DEFAULT: usize = 4; | ||
|
||
/// Property key for minimum wait time (ms) between retries. | ||
pub const COMMIT_MIN_RETRY_WAIT_MS: &str = "commit.retry.min-wait-ms"; | ||
/// Default value for minimum wait time (ms) between retries. | ||
pub const COMMIT_MIN_RETRY_WAIT_MS_DEFAULT: u64 = 100; | ||
|
||
/// Property key for maximum wait time (ms) between retries. | ||
pub const COMMIT_MAX_RETRY_WAIT_MS: &str = "commit.retry.max-wait-ms"; | ||
/// Default value for maximum wait time (ms) between retries. | ||
pub const COMMIT_MAX_RETRY_WAIT_MS_DEFAULT: u64 = 60 * 1000; // 1 minute | ||
|
||
/// Property key for total maximum retry time (ms). | ||
pub const COMMIT_TOTAL_RETRY_TIME_MS: &str = "commit.retry.total-timeout-ms"; | ||
/// Default value for total maximum retry time (ms). | ||
pub const COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT: u64 = 30 * 60 * 1000; // 30 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add PROPERTY_
prefix following current convention.
.expect("Invalid value for commit.retry.num-retries"), | ||
) | ||
.with_factor(2.0) | ||
.build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of 1. I agree that we could have a module like table_properties
to contain all the properties, and have sth to parse properties in a type safe way like following:
pub struct TableProperties {
#[key="commit.num.retries", default="1"]
commitRetries: i32
}
We could track this in following up issue.
.get(COMMIT_MIN_RETRY_WAIT_MS) | ||
.map(|s| s.parse()) | ||
.unwrap_or_else(|| Ok(COMMIT_MIN_RETRY_WAIT_MS_DEFAULT)) | ||
.expect("Invalid value for commit.retry.min-wait-ms"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not panic here, we should return error.
.get(COMMIT_MAX_RETRY_WAIT_MS) | ||
.map(|s| s.parse()) | ||
.unwrap_or_else(|| Ok(COMMIT_MAX_RETRY_WAIT_MS_DEFAULT)) | ||
.expect("Invalid value for commit.retry.max-wait-ms"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
.get(COMMIT_TOTAL_RETRY_TIME_MS) | ||
.map(|s| s.parse()) | ||
.unwrap_or_else(|| Ok(COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT)) | ||
.expect("Invalid value for commit.retry.total-timeout-ms"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
.get(COMMIT_NUM_RETRIES) | ||
.map(|s| s.parse()) | ||
.unwrap_or_else(|| Ok(COMMIT_NUM_RETRIES_DEFAULT)) | ||
.expect("Invalid value for commit.retry.num-retries"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
.returning_st(move |_| { | ||
if let Some(attempts) = success_after_attempts { | ||
static mut ATTEMPTS: u32 = 0; | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need unsafe
here? Modifying static var? We could use Atomic
variable to avoid using unsafe
Which issue does this PR close?
Transaction::commit
method #1387What changes are included in this PR?
backon
Are these changes tested?
Added unit tests