Skip to content

feat(transaction): Add retry logic to transaction #1484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

CTTY
Copy link
Contributor

@CTTY CTTY commented Jul 1, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Added retry logic using backon

Are these changes tested?

Added unit tests

@CTTY CTTY marked this pull request as draft July 1, 2025 18:43
Copy link
Contributor

@dentiny dentiny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious how do we plan to test the retry? i.e. is it possible to use mock catalog and inject retriable / non-retriable error?

@CTTY
Copy link
Contributor Author

CTTY commented Jul 1, 2025

curious how do we plan to test the retry? i.e. is it possible to use mock catalog and inject retriable / non-retriable error?

Hi @dentiny , this is a good question:) I'm thinking of using a mock catalog, but haven't figured out exactly how

@dentiny
Copy link
Contributor

dentiny commented Jul 1, 2025

Hi @dentiny , this is a good question:) I'm thinking of using a mock catalog, but haven't figured out exactly how

I use mockall on traits.
Just for reference:)

.expect("Invalid value for commit.retry.num-retries"),
)
.with_factor(2.0)
.build()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related to this change, but something I noticed while writing the PR:

  1. There is not a centralized properties class, which makes properties hard to locate in the code base: Some properties are defined in table_metadata.rs, and some are defined in the classes that use them: example
  2. No orthogonal way to fetch and parse property value, which makes error handling inconsistent. For example, this LOC will silently fallback to the default value if property parsing failed

For 1, we can have a dedicated file like table_metadata.rs to hold all hardcoded property keys, or we can have a property struct to define properties we support like what datafusion does

For 2, something like iceberg-java's PropertyUtil can be very useful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of 1. I agree that we could have a module like table_properties to contain all the properties, and have sth to parse properties in a type safe way like following:

pub struct TableProperties {
     #[key="commit.num.retries", default="1"]
     commitRetries: i32
}

We could track this in following up issue.

@CTTY CTTY marked this pull request as ready for review July 3, 2025 04:36
@CTTY
Copy link
Contributor Author

CTTY commented Jul 3, 2025

cc @liurenjie1024 @Xuanwo this PR is ready for review, PTAL, thanks!

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CTTY for this pr, generally LGTM! Just some nits.

Comment on lines +102 to +119
pub const COMMIT_NUM_RETRIES: &str = "commit.retry.num-retries";
/// Default value for number of commit retries.
pub const COMMIT_NUM_RETRIES_DEFAULT: usize = 4;

/// Property key for minimum wait time (ms) between retries.
pub const COMMIT_MIN_RETRY_WAIT_MS: &str = "commit.retry.min-wait-ms";
/// Default value for minimum wait time (ms) between retries.
pub const COMMIT_MIN_RETRY_WAIT_MS_DEFAULT: u64 = 100;

/// Property key for maximum wait time (ms) between retries.
pub const COMMIT_MAX_RETRY_WAIT_MS: &str = "commit.retry.max-wait-ms";
/// Default value for maximum wait time (ms) between retries.
pub const COMMIT_MAX_RETRY_WAIT_MS_DEFAULT: u64 = 60 * 1000; // 1 minute

/// Property key for total maximum retry time (ms).
pub const COMMIT_TOTAL_RETRY_TIME_MS: &str = "commit.retry.total-timeout-ms";
/// Default value for total maximum retry time (ms).
pub const COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT: u64 = 30 * 60 * 1000; // 30 minutes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add PROPERTY_ prefix following current convention.

.expect("Invalid value for commit.retry.num-retries"),
)
.with_factor(2.0)
.build()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of 1. I agree that we could have a module like table_properties to contain all the properties, and have sth to parse properties in a type safe way like following:

pub struct TableProperties {
     #[key="commit.num.retries", default="1"]
     commitRetries: i32
}

We could track this in following up issue.

.get(COMMIT_MIN_RETRY_WAIT_MS)
.map(|s| s.parse())
.unwrap_or_else(|| Ok(COMMIT_MIN_RETRY_WAIT_MS_DEFAULT))
.expect("Invalid value for commit.retry.min-wait-ms"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not panic here, we should return error.

.get(COMMIT_MAX_RETRY_WAIT_MS)
.map(|s| s.parse())
.unwrap_or_else(|| Ok(COMMIT_MAX_RETRY_WAIT_MS_DEFAULT))
.expect("Invalid value for commit.retry.max-wait-ms"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

.get(COMMIT_TOTAL_RETRY_TIME_MS)
.map(|s| s.parse())
.unwrap_or_else(|| Ok(COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT))
.expect("Invalid value for commit.retry.total-timeout-ms"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

.get(COMMIT_NUM_RETRIES)
.map(|s| s.parse())
.unwrap_or_else(|| Ok(COMMIT_NUM_RETRIES_DEFAULT))
.expect("Invalid value for commit.retry.num-retries"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

.returning_st(move |_| {
if let Some(attempts) = success_after_attempts {
static mut ATTEMPTS: u32 = 0;
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need unsafe here? Modifying static var? We could use Atomic variable to avoid using unsafe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add retry in Transaction::commit method
3 participants