Strange CPU Usage/Scheduling Behavior #4753
-
When moving from an 8-core to a 32-core machine, I see the same behavior. Despite 4x the number of cores and an overall faster completion time, there is still an unexplained period where the CPUs sit at 0% utilization, sandwiched between periods of 100% usage.
-
Sorry, did you say 0.19.1? Or is this 1.19.1?
-
Thanks, that's a typo. I'm on the latest: 1.19.1.
-
You say that you are using mpsc channels. Are your channels bounded? Perhaps your threads are sleeping because they are trying to send a message on a bounded channel that doesn't have space for the message?
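For illustration, a minimal sketch (not the poster's code) of the failure mode being suggested: with a bounded `tokio::sync::mpsc` channel, `send().await` parks the sending task until the receiver frees a slot, so a stalled consumer shows up as idle CPUs rather than a busy core.

```rust
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Bounded channel with room for only 2 in-flight messages.
    let (tx, mut rx) = mpsc::channel::<u64>(2);

    let producer = tokio::spawn(async move {
        for i in 0..10u64 {
            // If the receiver is not draining, this await parks the task:
            // the thread goes idle rather than spinning at 100% CPU.
            tx.send(i).await.expect("receiver dropped");
            println!("sent {i}");
        }
    });

    // A deliberately slow consumer, so the producer spends most of its
    // time waiting for channel capacity.
    while let Some(i) = rx.recv().await {
        tokio::time::sleep(Duration::from_millis(200)).await;
        println!("received {i}");
    }

    producer.await.unwrap();
}
```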
-
Thanks, they are indeed bounded. I had a limit of 1000 on all three of the channels. If one of them were full, I'd expect at least one core to show high usage while the backlog is drained. To test, I increased the bound on all the channels to 1,000,000, which is much larger than the total number of messages sent throughout execution. There was no change in behavior: still unexplained periods of 0% CPU usage. I'll see if I can swap out the computation with some boilerplate and share the code.
-
Here is a minimal reproducing sample: https://github.com/xanderdunn/tokio-sample. I'm sure it's architected sub-optimally, so if any feedback comes to mind when you take a look, please let me know.
-
That's still a lot of code for me to go through, so I've only skimmed some of the files. However, the widespread usage of locks makes me uncomfortable. I would investigate whether you're holding a lock somewhere that is preventing the threads from making progress. I have a draft for a blog post about this kind of thing. I haven't finished it and the project is currently on hold, but you can read it here: link
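One common shape of that advice, sketched here with placeholder types (this is not the blog post's code): keep the lock private to a small struct and expose synchronous methods, so a guard is never held across an `.await`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Wraps the lock so callers never see the guard and therefore
/// cannot accidentally hold it across an `.await`.
#[derive(Clone, Default)]
struct SharedResults {
    inner: Arc<Mutex<HashMap<u64, String>>>,
}

impl SharedResults {
    fn insert(&self, key: u64, value: String) {
        // Lock, mutate, and release inside one synchronous call.
        self.inner.lock().unwrap().insert(key, value);
    }

    fn get(&self, key: u64) -> Option<String> {
        self.inner.lock().unwrap().get(&key).cloned()
    }
}
```

Async code then calls these methods directly; since the guard never crosses an `.await`, a held lock cannot keep a worker thread parked while the task is suspended.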
-
This is a great post, thanks very much for sharing! Some points that stood out for me:

- I definitely have some encapsulation and cleanup of my lock usage to do.
- I am essentially doing this: I am storing two channel …

I don't yet know if this is the source of my issue, but I'll work on these directions.
-
I encapsulated all of my lock usage into structs as described in your article. See here. Unfortunately, this doesn't appear to have affected the 0% CPU usage stalling behavior. However, the code is much nicer; the `with_*` approach is very nice! I also tried switching out all of my … This is in line with my above observation:

> If no write locks are being acquired, I wouldn't expect locks to be degrading performance.
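For readers following along, the `with_*` pattern mentioned here typically looks something like the sketch below (placeholder types, not the actual commit): the caller passes a closure, and the lock is acquired and released entirely inside the method.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Clone, Default)]
struct NodeState {
    peers: Arc<RwLock<HashMap<u32, String>>>,
}

impl NodeState {
    /// Run `f` with read access to the map; the guard never escapes.
    fn with_peers<R>(&self, f: impl FnOnce(&HashMap<u32, String>) -> R) -> R {
        let guard = self.peers.read().unwrap();
        f(&guard)
    }

    /// Run `f` with write access to the map.
    fn with_peers_mut<R>(&self, f: impl FnOnce(&mut HashMap<u32, String>) -> R) -> R {
        let mut guard = self.peers.write().unwrap();
        f(&mut guard)
    }
}

fn example(state: &NodeState) -> usize {
    // The closure runs synchronously while the lock is held,
    // so there is no way to `.await` with the guard alive.
    state.with_peers(|peers| peers.len())
}
```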
-
I've narrowed it down to specifically a lag between the sending and receiving of messages over the network's bidirectional gRPC streams, even in the absence of all computationally intensive work, so I posted on tonic, since it's doing the network communication here.
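One way to confirm this kind of lag, sketched here as an assumption rather than the actual measurement used, is to stamp each message when it is handed to the outgoing stream and log the difference when it arrives (this assumes sender and receiver clocks are comparable, which holds when all containers run on one host).

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Wire format carrying the payload plus the moment it was enqueued.
/// (In a real protobuf message this would be an extra uint64 field.)
struct StampedMsg {
    sent_unix_micros: u128,
    payload: Vec<u8>,
}

fn now_micros() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before unix epoch")
        .as_micros()
}

fn stamp(payload: Vec<u8>) -> StampedMsg {
    StampedMsg { sent_unix_micros: now_micros(), payload }
}

fn log_lag(msg: &StampedMsg) {
    let lag = now_micros().saturating_sub(msg.sent_unix_micros);
    // Lag far above a few milliseconds on a single host points at
    // queuing between sender and receiver, not at the computation.
    println!("message lag: {} µs ({} bytes)", lag, msg.payload.len());
}
```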
-
This issue ended up being specific to my use of Docker containers. When I perform exactly the same test on my local machine without containers, there is no lag in messaging at all. I haven't figured out what about the Docker containers is causing the problem, but I am at least unblocked for my performance testing.
-
Thanks for this stellar library. I only began using it last week, and it's likely going to make its way into production for us.
I have a gRPC server using tonic with bidirectional streams. To test it, we use Docker Compose to spin up 75 nodes, each running the same Rust binary on an 8-core machine. They perform some work in `tokio::spawn(async ...)` and `tokio::task::spawn_blocking` tasks, share the results with each other over gRPC bidirectional `tokio_stream::wrappers::ReceiverStream`s, and then exit. Each node computes 225 cryptographic values and sends its values to every other node via `ReceiverStream`s, for a total of 75*225 values computed and sent over streams.

I'm seeing some strange behavior where, in the middle of the computationally intense process of creating and sharing all of those values, the CPU usage just goes to 0% and nothing happens for 10-30 seconds. Then, after some time and for no apparent reason, the processors suddenly max out again until the job completes. See a video of this here: https://youtu.be/c9UQPLjj6jM. The video starts after node setup has completed and the computationally and message intensive portion we're interested in begins. You'll see 100% CPU usage, followed by a suddenly silent period of 0% usage. This abrupt 100% -> 0% -> 100% happens a couple of times.
At smaller numbers of nodes, like 6 or 20, I do not see this behavior. The CPUs max out until the task is complete. This is what I would expect.
Things I'm checking:

- I have an `Arc<RwLock<HashMap>>` object, but I put a log statement before the acquiring of every `.write()`, and none of them occur during this phase of the program, only during startup.
- The heavy computation is done in `tokio::task::spawn_blocking` tasks and the results are put on mpsc channels so that async tasks can make use of them. All of the computationally expensive work should be in blocking tasks. The total number of calls to `tokio::task::spawn_blocking` across all 75 nodes on a successful end-to-end run is 450 (not all are alive simultaneously).

I realize this is impossible to debug without a minimal reproducing example, but I wonder if any debugging ideas come to mind.
tokio 1.19.1. Ubuntu 20.04.
Is it possible worker thread scheduling may be expected to perform this way under high CPU load?
Things I could try next:

- `spawn_blocking` …

It's a relatively simple program: a single file with 640 lines of code. If I can't make any progress, I should be able to replace the computation with some boilerplate and copy the tokio/tonic logic to share.
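Since the post describes the program's shape rather than listing code, here is a rough sketch of that shape under stated assumptions (the names and the computation are placeholders, and in the real service the stream would be returned from a tonic streaming handler rather than consumed locally): heavy work runs in `tokio::task::spawn_blocking`, results flow through a bounded `mpsc` channel, and the receiver is wrapped in a `tokio_stream::wrappers::ReceiverStream`.

```rust
use tokio::sync::mpsc;
use tokio_stream::{wrappers::ReceiverStream, StreamExt};

/// Placeholder for the expensive cryptographic computation.
fn compute_value(i: u32) -> Vec<u8> {
    i.to_be_bytes().to_vec()
}

/// Compute `n` values on the blocking pool and expose them as a stream.
/// In the real service this stream would be handed to a tonic
/// bidirectional-streaming handler instead of being consumed locally.
fn value_stream(n: u32) -> ReceiverStream<Vec<u8>> {
    let (tx, rx) = mpsc::channel(1000);

    tokio::spawn(async move {
        for i in 0..n {
            // CPU-heavy work stays off the async worker threads.
            let value = tokio::task::spawn_blocking(move || compute_value(i))
                .await
                .expect("blocking task panicked");

            // Backpressure: this await parks if the consumer falls behind.
            if tx.send(value).await.is_err() {
                break; // receiver gone, stop producing
            }
        }
    });

    ReceiverStream::new(rx)
}

#[tokio::main]
async fn main() {
    let mut stream = value_stream(225);
    let mut count = 0usize;
    while let Some(_value) = stream.next().await {
        count += 1;
    }
    println!("received {count} values");
}
```

The bounded channel is what gives the stream backpressure; with an effectively unbounded channel (as tested earlier in the thread) the producer never parks waiting for capacity.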