
feat: add ExponentialJitterBackoff backoff strategy #256

Open · wants to merge 1 commit into main
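For context, a minimal sketch of how a caller could opt into the new strategy, assuming the existing go-retryablehttp client API that this PR targets (the example itself is not part of the diff):

```go
package main

import (
	"log"
	"time"

	retryablehttp "github.com/hashicorp/go-retryablehttp"
)

func main() {
	client := retryablehttp.NewClient()
	client.RetryMax = 5
	client.RetryWaitMin = 1 * time.Second
	client.RetryWaitMax = 30 * time.Second
	// Swap the default strategy for the one added in this PR.
	client.Backoff = retryablehttp.ExponentialJitterBackoff

	resp, err := client.Get("https://example.com")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.StatusCode)
}
```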
30 changes: 30 additions & 0 deletions client.go
@@ -638,6 +638,36 @@ func LinearJitterBackoff(min, max time.Duration, attemptNum int, resp *http.Resp
return time.Duration(jitterMin * int64(attemptNum))
}

// ExponentialJitterBackoff is an extension of DefaultBackoff that applies
// a jitter to avoid thundering herd.
func ExponentialJitterBackoff(min, max time.Duration, attemptNum int, resp *http.Response) time.Duration {
baseBackoff := DefaultBackoff(min, max, attemptNum, resp)

if resp != nil {

Question: Default backoff already handles Retry-After. Is this check required again? ref

Author:

This check is needed to avoid adding jitter when the Retry-After header has already been handled; in that case the backoff returned by DefaultBackoff is returned as-is.

if retryAfterHeaders := resp.Header["Retry-After"]; len(retryAfterHeaders) > 0 && retryAfterHeaders[0] != "" {
return baseBackoff
}
}

// Seed randomization; it's OK to do it every time
rnd := rand.New(rand.NewSource(time.Now().UnixNano()))

Suggestion: To maintain consistency, can we use the same source as here?

Author (@wandering-tales), Apr 29, 2025:

I can factor out a new function to get the randomization source. Is that what you mean?
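A minimal sketch of what that factored-out helper could look like (the name newJitterSource is hypothetical and not part of this PR):

```go
package retryablehttp

import (
	"math/rand"
	"time"
)

// newJitterSource returns a freshly seeded random source, so that
// LinearJitterBackoff and ExponentialJitterBackoff can share the same
// seeding logic instead of each building their own.
func newJitterSource() *rand.Rand {
	return rand.New(rand.NewSource(time.Now().UnixNano()))
}
```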


jitter := rnd.Float64()*0.5 - 0.25 // Random value between -0.25 and +0.25
jitteredSleep := time.Duration(float64(baseBackoff) * (1.0 + jitter))

return clampDuration(jitteredSleep, min, max)
Comment on lines +655 to +658

Question: Why is the jitter between -0.25 and +0.25 and not between min and max like in LinearJitterBackoff?

Also, since min and max mean different things between the two functions, it would be worthwhile adding it to the function documentation.

Author (@wandering-tales), Apr 29, 2025:

> Question: Why is the jitter between -0.25 and +0.25 and not between min and max like in LinearJitterBackoff?

As far as I can see, the min and max arguments should be treated as absolute lower and upper limits for the computed backoff, respectively. In that regard, the LinearJitterBackoff function is the only one that treats them slightly differently. Quoting its documentation:

> min and max here are not absolute values. The number to be multiplied by
> the attempt number will be chosen at random from between them, thus they are
> bounding the jitter.

That said, to guarantee the exponential trend of the backoff computed by DefaultBackoff, I think the jitter should stay within a neighborhood, in the mathematical sense, of the value returned by DefaultBackoff. Hence, a random value is picked from the interval [backoff - 0.25 * backoff, backoff + 0.25 * backoff].
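For example, if DefaultBackoff returns a base of 16s, the jittered sleep is drawn uniformly from [12s, 20s] (16s ± 25%) and only afterwards clamped to the absolute min and max limits.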

Regarding your other observation:

> Also, since min and max mean different things between the two functions, it would be worthwhile adding it to the function documentation.

I'll improve the documentation of the new function to explain how the arguments are used.
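For reference, one possible wording based on the points above (illustrative only, not the author's final text):

```go
// ExponentialJitterBackoff behaves like DefaultBackoff: it grows exponentially
// with the attempt number and returns a valid Retry-After value as-is, but it
// additionally perturbs the computed backoff by a random factor in [-25%, +25%]
// to avoid thundering herd. Unlike LinearJitterBackoff, min and max are treated
// as absolute lower and upper bounds on the returned duration.
```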

}

func clampDuration(d, min, max time.Duration) time.Duration {
if d < min {
return min
}
if d > max {
return max
}
return d
}

Comment on lines +661 to +670


Nit: This function could've been inlined.

Author:

The reason this function exists is so it can be fully tested in isolation.

Since the clamping is applied to a random value, exercising it only through ExponentialJitterBackoff would give no guarantee of full test coverage.

// PassthroughErrorHandler is an ErrorHandler that directly passes through the
// values from the net/http library for the final request. The body is not
// closed.
152 changes: 151 additions & 1 deletion client_test.go
@@ -251,7 +251,7 @@ func testClientDo(t *testing.T, body interface{}) {
}

if resp.StatusCode != 200 {
t.Fatalf("exected 200, got: %d", resp.StatusCode)
t.Fatalf("expected 200, got: %d", resp.StatusCode)
}

if retryCount < 0 {
@@ -896,6 +896,156 @@ func TestClient_DefaultBackoff(t *testing.T) {
}
}

func TestClient_ExponentialJitterBackoff(t *testing.T) {
const retriableStatusCode int = http.StatusServiceUnavailable

t.Run("with non-empty first value of Retry-After header in response", func(t *testing.T) {
response := &http.Response{
StatusCode: retriableStatusCode,
Header: http.Header{
"Content-Type": []string{"application/json"},
"Retry-After": []string{"42"},
},
}
backoff := ExponentialJitterBackoff(retryWaitMin, retryWaitMax, 3, response)
expectedBackoff := 42 * time.Second

if backoff != expectedBackoff {
t.Fatalf("expected default backoff from Retry-After header (%s), got %s", expectedBackoff, backoff)
}
})

invalidRetryAfterHeaderCases := []struct {
name string
makeResponse func() *http.Response
}{
{
name: "with empty first value of Retry-After header in response",
makeResponse: func() *http.Response {
return &http.Response{
StatusCode: retriableStatusCode,
Header: http.Header{
"Content-Type": []string{"application/json"},
"Retry-After": []string{""},
},
}
},
},
{
name: "without Retry-After header in response",
makeResponse: func() *http.Response {
return &http.Response{
StatusCode: retriableStatusCode,
Header: http.Header{"Content-Type": []string{"application/json"}},
}
},
},
{
name: "with nil response",
makeResponse: func() *http.Response {
return nil
},
},
}

for _, irahc := range invalidRetryAfterHeaderCases {
t.Run(irahc.name, func(t *testing.T) {
attemptNumCases := []struct {
name string
attemptNum int
expectedBackoffWithoutJitter time.Duration
}{
{
name: "with first attempt",
attemptNum: 0,
expectedBackoffWithoutJitter: retryWaitMin,
},
{
name: "with low attempt number",
attemptNum: 3,
expectedBackoffWithoutJitter: 16 * time.Second,
},
{
name: "with high attempt number",
attemptNum: 10,
expectedBackoffWithoutJitter: retryWaitMax,
},
}

for _, anc := range attemptNumCases {
t.Run(anc.name, func(t *testing.T) {
backoff := ExponentialJitterBackoff(defaultRetryWaitMin, defaultRetryWaitMax, anc.attemptNum, irahc.makeResponse())
expectedJitterDelta := float64(anc.expectedBackoffWithoutJitter) * 0.25
expectedMinTime := anc.expectedBackoffWithoutJitter - time.Duration(expectedJitterDelta)
expectedMaxTime := anc.expectedBackoffWithoutJitter + time.Duration(expectedJitterDelta)
expectedBackoffLowerLimit := max(expectedMinTime, retryWaitMin)
expectedBackoffUpperLimit := min(expectedMaxTime, retryWaitMax)

t.Run("returns exponential backoff with jitter, clamped within min and max limits", func(t *testing.T) {
if backoff < expectedBackoffLowerLimit || backoff > expectedBackoffUpperLimit {
t.Fatalf("expected backoff to be within range [%s, %s], got %s", expectedBackoffLowerLimit, expectedBackoffUpperLimit, backoff)
}
})
})
}
})
}
}

func Test_clampDuration(t *testing.T) {
const (
minDuration time.Duration = 500 * time.Millisecond
maxDuration time.Duration = 10 * time.Minute
)

testCases := []struct {
name string
errorMessage string
duration time.Duration
expectedClampedDuration time.Duration
}{
{
name: "with duration below min value",
errorMessage: "should return the min value",
duration: 60 * time.Microsecond,
expectedClampedDuration: minDuration,
},
{
name: "with duration equal to min value",
errorMessage: "should return the min value",
duration: minDuration,
expectedClampedDuration: minDuration,
},
{
name: "with duration strictly within min and max range",
errorMessage: "should return the given value",
duration: 45 * time.Second,
expectedClampedDuration: 45 * time.Second,
},
{
name: "with duration equal to max value",
errorMessage: "should return the max value",
duration: maxDuration,
expectedClampedDuration: maxDuration,
},
{
name: "with duration above max value",
errorMessage: "should return the max value",
duration: 2 * time.Hour,
expectedClampedDuration: maxDuration,
},
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
duration := clampDuration(tc.duration, minDuration, maxDuration)
if duration != tc.expectedClampedDuration {
t.Fatalf("expected duration %s, got %s", expectedBackoff, backoff)
}
})
}
}

func TestClient_DefaultRetryPolicy_TLS(t *testing.T) {
ts := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(200)