dynamic-load-balance channel can not work #2257
Comments
This issue reproduces 100% of the time. When connecting to multiple C++ gRPC server endpoints, only one endpoint gets established; the others end up in CLOSE_WAIT. The code is based on examples/src/dynamic_load_balance/client.rs:

```rust
pub mod pb {
    tonic::include_proto!("grpc.examples.unaryecho");
}

use pb::{echo_client::EchoClient, EchoRequest};
use tonic::transport::channel::Change;
use tonic::transport::Channel;
use tonic::transport::Endpoint;

use tracing::level_filters::LevelFilter;
use tracing_subscriber::fmt;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::Layer;

use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
use tokio::time::timeout;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::registry()
        .with(
            fmt::layer()
                .with_ansi(true)
                .with_file(true)
                .with_line_number(true)
                .with_thread_ids(false)
                .with_target(false)
                .with_filter(LevelFilter::TRACE),
        )
        .init();

    let e1 = Endpoint::from_static("http://127.0.0.1:50051");
    let e2 = Endpoint::from_static("http://11.124.250.156:9090");

    let (channel, rx) = Channel::balance_channel(10);
    let mut client = EchoClient::new(channel);

    tokio::spawn(async move {
        println!("Added first endpoint");
        let change = Change::Insert("1", e1);
        let res = rx.send(change).await;
        println!("{:?}", res);

        println!("Added second endpoint");
        let change = Change::Insert("2", e2);
        let res = rx.send(change).await;
        println!("{:?}", res);
    });

    {
        tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;

        let request = tonic::Request::new(EchoRequest {
            message: "hello".into(),
        });

        let rx = client.unary_echo(request);
        if let Ok(resp) = timeout(tokio::time::Duration::from_secs(10), rx).await {
            println!("RESPONSE={:?}", resp);
        } else {
            println!("did not receive value within 10 secs");
        }
    }

    println!("... Bye");
    tokio::time::sleep(tokio::time::Duration::from_secs(5000)).await;

    Ok(())
}
```
```
$ netstat -antlp | grep :5005
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:50051      0.0.0.0:*              LISTEN       112413/dynamic-load
tcp        0      0 127.0.0.1:50052      0.0.0.0:*              LISTEN       112413/dynamic-load
tcp        0      0 127.0.0.1:50051      127.0.0.1:42554        ESTABLISHED  112413/dynamic-load
tcp        0      0 127.0.0.1:42554      127.0.0.1:50051        ESTABLISHED  50880/dynamic-load-

$ netstat -antlp | grep :9090
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp       52      0 11.165.57.131:46340  11.124.250.156:9090    ESTABLISHED  50880/dynamic-load-
```

After about 2 minutes, the TCP status becomes:

```
$ netstat -antlp | grep :9090
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp       53      0 11.165.57.131:46340  11.124.250.156:9090    CLOSE_WAIT   50880/dynamic-load-
```

From then on, if the client issues a request that hits the CLOSE_WAIT endpoint, it fails to get a response.
Yeah, I think this is definitely a bug with tower::balance. I see you made an issue there already; we should use that one to track this, and then we can bump a new release when it gets "fixed" there. Though this is probably a pretty big foundational issue with balance. We are working on new gRPC code with a new load balancer, but that is still quite a ways out.
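Until that is resolved upstream, one workaround sketch (unverified, and assuming the balancer will dial a re-inserted endpoint fresh) is to remove the affected key from the balance set and insert it again via the change sender returned by Channel::balance_channel:

```rust
use tokio::sync::mpsc::Sender;
use tonic::transport::channel::Change;
use tonic::transport::Endpoint;

// Unverified workaround sketch: drop a (possibly half-closed) member from the
// balance set and add it back, so the channel dials the endpoint again.
// Whether this recovers the stuck member depends on the tower::balance issue
// discussed above.
async fn reinsert_endpoint(
    tx: &Sender<Change<&'static str, Endpoint>>,
    key: &'static str,
    endpoint: Endpoint,
) {
    // If the receiving side (the Channel) has been dropped, there is nothing to do.
    if tx.send(Change::Remove(key)).await.is_err() {
        return;
    }
    let _ = tx.send(Change::Insert(key, endpoint)).await;
}
```

Calling this with the same key and a freshly built Endpoint should trigger a new connection attempt for that member.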
Thanks for your reply; I'll keep track of it.
@LucioFranco I have updated my investigation of this issue with the details provided in the reference above.
Bug Report
Version
v0.12.3
Platform
x86_64-unknown-linux
Crates
Description
I use tonic as a gRPC client with Channel::balance_channel(10). When the client calls the server (multiple nodes) for the first time, it works; after waiting a moment (about 1 minute), the TCP connection status turns to CLOSE_WAIT (i.e. the server closes the connection as there are no other streams), and from then on, later calls to the server fail to get a response, as in the TCP status below:

[screenshot: TCP Status]
I don't know if there is something wrong with my usage; please help confirm. The following is the creation of the channel. Thank you.
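For reference, this is the channel-creation excerpt from the full client program shown earlier in the thread (same code, just trimmed to the relevant part):

```rust
let e1 = Endpoint::from_static("http://127.0.0.1:50051");
let e2 = Endpoint::from_static("http://11.124.250.156:9090");

// `rx` is the sender half used to feed endpoint changes into the channel.
let (channel, rx) = Channel::balance_channel(10);
let mut client = EchoClient::new(channel);

tokio::spawn(async move {
    println!("Added first endpoint");
    let res = rx.send(Change::Insert("1", e1)).await;
    println!("{:?}", res);

    println!("Added second endpoint");
    let res = rx.send(Change::Insert("2", e2)).await;
    println!("{:?}", res);
});
```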