feat(lazer): add resilient client in rust #2859

Merged
merged 14 commits into from
Jul 23, 2025

Conversation

keyvankhademi
Contributor

Summary

This PR turns the Rust client into a resilient client that maintains multiple connections.

Rationale

We need this both for consumers and for the monitor service.

How has this been tested?

  • Current tests cover my changes
  • Added new tests
  • Manually tested the code

Manually ran and tested the connections.

Contributor

@Riateche Riateche left a comment

Very nice!

/// * `access_token` - The access token for authentication
/// * `num_connections` - The number of WebSocket connections to maintain
pub fn new(
    endpoints: Vec<String>,
Contributor

Url type maybe?

Contributor Author

Done

let (sender, receiver) = mpsc::channel::<AnyResponse>(CHANNEL_CAPACITY);

for i in 0..self.num_connections {
    let endpoint = self.endpoints[i % self.endpoints.len()].clone();
Contributor

It will panic if endpoints is empty. We should probably check for that in the constructor.

Contributor Author

Fixed.
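
A minimal sketch of the constructor check discussed in this thread, assuming a hypothetical `ClientConfig` type (not the PR's actual struct) and plain `String` endpoints. Rejecting an empty list up front means the round-robin index `i % endpoints.len()` can never divide by zero later.

```rust
/// Hypothetical stand-in for the client's configuration; names are illustrative.
#[derive(Debug)]
pub struct ClientConfig {
    endpoints: Vec<String>,
    num_connections: usize,
}

impl ClientConfig {
    /// Fails early instead of panicking later on `i % 0`.
    pub fn new(endpoints: Vec<String>, num_connections: usize) -> Result<Self, String> {
        if endpoints.is_empty() {
            return Err("at least one endpoint is required".to_string());
        }
        Ok(Self {
            endpoints,
            num_connections,
        })
    }

    /// Round-robin assignment of an endpoint to connection slot `i`.
    pub fn endpoint_for(&self, i: usize) -> &str {
        &self.endpoints[i % self.endpoints.len()]
    }

    /// Endpoint chosen for every connection slot, in order.
    pub fn assignments(&self) -> Vec<&str> {
        (0..self.num_connections)
            .map(|i| self.endpoint_for(i))
            .collect()
    }
}
```

With two endpoints and three connections, the slots would map to the first, second, and first endpoint again.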

}

pub async fn start(&mut self) -> Result<mpsc::Receiver<AnyResponse>> {
    let (sender, receiver) = mpsc::channel::<AnyResponse>(CHANNEL_CAPACITY);
Contributor

Channel capacity should be configurable by the caller.

Contributor Author

Done. The internal channels don't need to be configurable, right?

}

let streams: Vec<_> = self.receivers.drain(..).map(ReceiverStream::new).collect();
let mut merged_stream = stream::select_all(streams);
Contributor

I think it's easier and more efficient to use a single channel here.

Contributor Author

right. fixed.
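
A rough sketch of the single-channel idea, under stated assumptions: `std::sync::mpsc` and OS threads stand in for the async channel and tasks used in the PR, and `collect_from_connections` is a hypothetical helper. Every producer gets a clone of one sender, so there is only one receiver and no need to merge per-connection streams.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns `num_connections` producers that all send into one bounded channel.
/// Threads and `std::sync::mpsc` are stand-ins for async tasks and an async channel.
pub fn collect_from_connections(num_connections: usize, per_connection: usize) -> Vec<String> {
    let (sender, receiver) = mpsc::sync_channel::<String>(16);

    let handles: Vec<_> = (0..num_connections)
        .map(|id| {
            // Each connection gets its own clone of the single sender.
            let sender = sender.clone();
            thread::spawn(move || {
                for seq in 0..per_connection {
                    // Stop quietly if the receiver has gone away.
                    if sender.send(format!("conn{id}:msg{seq}")).is_err() {
                        break;
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all clones are dropped.
    drop(sender);

    let received: Vec<String> = receiver.iter().collect();
    for handle in handles {
        handle.join().expect("producer thread panicked");
    }
    received
}
```

Messages from different connections interleave in arrival order, so a consumer that needs per-connection ordering would have to tag messages, as the illustrative `conn{id}` prefix does here.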

Comment on lines 94 to 96
if last_failure_time.elapsed() > BACKOFF_RESET_DURATION {
    self.backoff.reset();
}
Contributor

This feels like a hack. I think we should reset the backoff object in start() when we successfully connect and subscribe.

Contributor Author

I want the behavior to be a little different from that. For example, if the client can connect and subscribe but gets disconnected immediately afterwards, we should not reset.

Contributor

This code means that when the endpoint is down, rather than doing a proper exponential backoff, it will start spamming the endpoint again every 10 seconds. I think what you want can be better achieved with this:

pub async fn start(...) {
    //...
    let stream = ws_connection.start().await?;
    let connected_at = Instant::now();
    let mut num_responses = 0;
    let mut confirmed_healthy = false;
    //...
            Ok(response) => {
                if !confirmed_healthy {
                    num_responses += 1;
                    if num_responses >= CONFIRM_HEALTHY_NUM_RESPONSES
                        && connected_at.elapsed() > CONFIRM_HEALTHY_DELAY
                    {
                        confirmed_healthy = true;
                        self.backoff.reset();
                    }
                }
                sender
                    .send(response)
                    .await
                    .context("Failed to send response")?;
            }
    //...
}

Contributor Author

It only happens when max_interval is greater than 10 seconds. Otherwise it will try connecting every max_interval.
I think the best solution is to set BACKOFF_RESET_DURATION to be 2*max_interval.

Contributor Author

I fixed it another way. Now we check the duration of the connection: if it was connected for more than BACKOFF_RESET_DURATION and timeout + second, we consider it to have connected successfully and reset the backoff.
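
The behavior described here could be sketched roughly as follows. This is an illustration only: `SimpleBackoff` is a hypothetical hand-rolled type without jitter (not the `backoff` crate's `ExponentialBackoff`), and `reset_after` stands in for whatever threshold the client actually uses. The point is that a connection that drops quickly keeps the delay growing, while one that stayed up long enough resets it.

```rust
use std::time::Duration;

/// Hypothetical minimal exponential backoff; not the `backoff` crate's type.
pub struct SimpleBackoff {
    initial: Duration,
    max: Duration,
    current: Duration,
}

impl SimpleBackoff {
    pub fn new(initial: Duration, max: Duration) -> Self {
        Self {
            initial,
            max,
            current: initial,
        }
    }

    /// Returns the delay to wait now and doubles the next one, capped at `max`.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(self.max);
        delay
    }

    pub fn reset(&mut self) {
        self.current = self.initial;
    }
}

/// Called after a connection ends; `connected_for` is how long it stayed up.
/// The backoff is reset only if the connection outlived `reset_after`.
pub fn delay_after_disconnect(
    backoff: &mut SimpleBackoff,
    connected_for: Duration,
    reset_after: Duration,
) -> Duration {
    if connected_for > reset_after {
        backoff.reset();
    }
    backoff.next_delay()
}
```

With a 1 second initial delay and an 8 second cap, repeated immediate disconnects would wait 1, 2, 4, 8, 8 seconds, and a connection that lasted longer than `reset_after` would bring the next wait back to 1 second.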

    request: SubscribeRequest,
) -> Result<()> {
    self.subscriptions.push(request.clone());
    return ws_connection.subscribe(request).await;
Contributor

nit: unneeded return

Contributor Author

fixed.

request.subscription_id
);
}
return ws_connection.unsubscribe(request).await;
Contributor

nit: unneeded return

Contributor Author

fixed.

endpoints: Vec<String>,
access_token: String,
num_connections: usize,
backoff: ExponentialBackoff,
Collaborator

This seems to be part of your public interface, so you need to re-export it for downstream consumers. The other option is to have a wrapper that exposes only what you need (because I see you are not supporting everything there).

Contributor Author

added a builder wrapper.

Contributor

A builder would also be nice for the PythLazerClient itself. You can move channel_capacity from start() argument to the builder. The builder can also provide default values for everything except access_token.

Contributor Author

good idea.

Contributor Author

Done.
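
A sketch of what such a builder might look like, assuming hypothetical names (`ClientSettings`, `ClientSettingsBuilder`) and placeholder defaults that are not taken from the PR. Only `access_token` is required; everything else, including the channel capacity, has a default that the caller can override.

```rust
use std::time::Duration;

/// Hypothetical settings produced by the builder; names and defaults are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientSettings {
    pub access_token: String,
    pub endpoints: Vec<String>,
    pub num_connections: usize,
    pub channel_capacity: usize,
    pub timeout: Duration,
}

#[derive(Debug, Clone)]
pub struct ClientSettingsBuilder {
    settings: ClientSettings,
}

impl ClientSettingsBuilder {
    /// `access_token` has no sensible default, so it is required up front.
    pub fn new(access_token: impl Into<String>) -> Self {
        Self {
            settings: ClientSettings {
                access_token: access_token.into(),
                // Placeholder endpoint and values, not the real defaults.
                endpoints: vec!["wss://example.invalid/stream".to_string()],
                num_connections: 4,
                channel_capacity: 1000,
                timeout: Duration::from_secs(5),
            },
        }
    }

    pub fn with_endpoints(mut self, endpoints: Vec<String>) -> Self {
        self.settings.endpoints = endpoints;
        self
    }

    pub fn with_num_connections(mut self, num_connections: usize) -> Self {
        self.settings.num_connections = num_connections;
        self
    }

    pub fn with_channel_capacity(mut self, channel_capacity: usize) -> Self {
        self.settings.channel_capacity = channel_capacity;
        self
    }

    /// Validates the settings so an unusable client is never constructed.
    pub fn build(self) -> Result<ClientSettings, String> {
        if self.settings.endpoints.is_empty() {
            return Err("at least one endpoint is required".to_string());
        }
        if self.settings.num_connections == 0 {
            return Err("num_connections must be at least 1".to_string());
        }
        Ok(self.settings)
    }
}
```

Taking the token in `new` rather than through a setter keeps the builder from ever representing a state without credentials.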

};

#[derive(Debug)]
pub struct PythLazerExponentialBackoffBuilder {
Collaborator

document here as well? maybe default as well?

Contributor Author

The reason I didn't implement Default is that we need to have an access token. A default doesn't make sense since it would be in an invalid state.

.with_max_interval(self.max_interval)
.with_max_elapsed_time(None)
.build()
}
Collaborator

I'm fine with this, but generally my take was to re-export what we use in the API, and we still use ExponentialBackoff from the backoff crate. Now that you have a builder, it makes sense to have your own wrapper type for the API of the Lazer client. Later, when you convert it, you can also remove the following error: bail!("max_elapsed_time is not supported in Pyth Lazer client");

Contributor Author

yeah it's a good improvement. I'll add this + docs in another PR to unblock the progress of monitor for now.
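
One possible shape for the wrapper type mentioned in this thread, as a sketch under assumptions: `LazerBackoffConfig` and its default values are hypothetical and not taken from the PR or the `backoff` crate. Because the type has no `max_elapsed_time` field at all, an unsupported setting cannot be expressed, so a runtime error for it would no longer be needed.

```rust
use std::time::Duration;

/// Hypothetical wrapper exposing only the backoff settings the client supports.
/// There is deliberately no `max_elapsed_time` field.
#[derive(Debug, Clone, PartialEq)]
pub struct LazerBackoffConfig {
    pub initial_interval: Duration,
    pub multiplier: f64,
    pub max_interval: Duration,
}

impl Default for LazerBackoffConfig {
    fn default() -> Self {
        // Placeholder defaults chosen for illustration only.
        Self {
            initial_interval: Duration::from_millis(500),
            multiplier: 2.0,
            max_interval: Duration::from_secs(30),
        }
    }
}

impl LazerBackoffConfig {
    /// Delay before retry number `attempt` (0-based), capped at `max_interval`.
    /// Jitter is omitted to keep the sketch deterministic.
    pub fn delay_for_attempt(&self, attempt: u32) -> Duration {
        let scaled = self.initial_interval.as_secs_f64() * self.multiplier.powi(attempt as i32);
        Duration::from_secs_f64(scaled.min(self.max_interval.as_secs_f64()))
    }
}
```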

@keyvankhademi keyvankhademi merged commit 928f003 into main Jul 23, 2025
11 checks passed
@keyvankhademi keyvankhademi deleted the res-client branch July 23, 2025 15:56
3 participants