-
We are developing a database in Rust that uses the Arrow Flight SQL protocol for client-server communication. The server runs many workers, each created via tokio::spawn. When the number of workers is large, such as 512, the server occasionally hangs: the client calls stream.next and the server never responds.
Replies: 2 comments 5 replies
-
Tokio tasks have no such limitation, so I suspect a deadlock or some other issue outside of tokio. For debugging, you could use
-
I found an interesting phenomenon: some streams are never polled by tokio, while others are. I've added a log in the
Does tokio have the capability to refuse to schedule some tasks?
It's now clear what makes the tasks hang. Tonic does not really support lazy streams.
When we create a cross-node stream through the Arrow Flight protocol, which runs over gRPC as implemented by tonic, tonic polls the stream eagerly on the server side instead of polling it lazily as the client consumes it.
Therefore, if there are many streams and each one carries a large amount of data, the server side eagerly consumes all of the streams in parallel.
I'm not sure of the direct reason the tokio tasks hang, but after we forced the streams to be consumed lazily, the tasks no longer hang.
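
For anyone who hits the same problem, here is a rough sketch of the difference between eager and lazy consumption. It is not our actual fix and it does not use tonic or Arrow Flight at all: it is a standard-library analogy in which a bounded channel (std::sync::mpsc::sync_channel) stands in for a stream that only advances when the consumer asks for more. The function name produced_before_first_read and its total and capacity parameters are made up for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of tonic or Arrow Flight).
/// A producer thread tries to emit `total` items through a channel that
/// buffers at most `capacity` items. Returns how many items the producer had
/// generated before the consumer read anything, and how many items the
/// consumer received in the end.
fn produced_before_first_read(total: usize, capacity: usize) -> (usize, usize) {
    // Bounded channel: `send` blocks once `capacity` items are buffered.
    let (tx, rx) = sync_channel::<usize>(capacity);
    let produced = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&produced);

    let producer = thread::spawn(move || {
        for i in 0..total {
            // Count the item as generated, then try to hand it over.
            counter.fetch_add(1, Ordering::SeqCst);
            if tx.send(i).is_err() {
                break; // the consumer went away
            }
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });

    // Give the producer time to run ahead as far as the channel allows.
    thread::sleep(Duration::from_millis(200));
    let before = produced.load(Ordering::SeqCst);

    // Only now start consuming; each read lets the producer advance one step.
    let received = rx.iter().count();
    producer.join().unwrap();
    (before, received)
}
```

Calling produced_before_first_read(100, 2) should show the producer stalled after at most 3 items (2 buffered plus 1 waiting in send) until the consumer starts reading, while all 100 items are still delivered. With an unbounded buffer the producer would instead generate everything up front, which is the kind of eager behavior described above when it happens across hundreds of streams at once.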