Replies: 1 comment
I'm realizing this probably belongs in the docs/ folder. I'll get a pull request together later, but in the meantime it may be useful for anyone else to chime in with what they know about network tuning for Mantis.
I spent some time last week partnered with @mcowgill-stripe investigating throughput limitations on network connections in Mantis. I thought the write-up might be interesting to other Mantis users, so I'm posting it here.
TL;DR -- A single connection appears to max out around 400Mbits/sec and a single worker appears to hit a lock contention limit at approximately 500k events per second on an m7g.2xl instance.
Drops
Drops will occur on the incoming side if the stage logic is not ready to process events. This is smoothed for bursty connections with a buffer controlled by the following properties for stage to stage and JobSource respectively.
Properties
Most of the properties below can be set on a job-by-job basis by calling System.setProperty(String, String) in either a static context or in getJobInstance() before actually constructing the job. There are some exceptions that are noted below. All values below are examples and are not necessarily recommended values.
Stage to Stage
mantis.w2w.spsc - Enables single producer single consumer. We generally have this set to false as it limits worker to worker communication to a single thread.
mantis.w2w.toKeyThreads - Tunes the number of threads that are draining the outgoing queue.
mantis.w2w.toKeyBuffer - Tunes the length of the buffer that each of the consumer threads uses.
mantis.w2w.toKeyMaxChunkSize - The maximum batch size to send to the downstream. This will trigger an automatic send if the thread’s internal buffer reaches this size.
mantis.w2w.toKeyMaxChunkTimeMSec - The maximum batch age before sending to the downstream. This will trigger an automatic send of the batch if the thread goes this long without sending.
mantis.worker.connectionsPerEndpoint - The number of connections initiated between each host stage to stage. This defaults to 2.
Job to Job
JobSource responds to the following properties:
// ServerSentEventsSink equivalents of the above…
System.setProperty("mantis.sse.spsc", "false");
System.setProperty("mantis.sse.numConsumerThreads", "8");
System.setProperty("mantis.sse.bufferCapacity", "250000");
System.setProperty("mantis.sse.maxChunkSize", "1000");
System.setProperty("mantis.sse.batchInterval", "100");
See above for descriptions.
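To make the mechanics concrete, here is a minimal sketch of applying the stage to stage and JobSource properties in one place before a job is constructed. The class name NetworkTuningSketch and its two methods are hypothetical helpers, not part of Mantis. The mantis.w2w values simply mirror the mantis.sse example values above and are not recommendations. mantis.worker.connectionsPerEndpoint is left out on purpose, since some properties are exceptions to the System.setProperty approach and I have not confirmed which ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mantis): collects example tuning
 * properties and applies them with System.setProperty.
 */
final class NetworkTuningSketch {

    /** Example values only; the w2w values mirror the sse examples. */
    public static Map<String, String> exampleProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Stage to stage (worker to worker).
        props.put("mantis.w2w.spsc", "false");
        props.put("mantis.w2w.toKeyThreads", "8");
        props.put("mantis.w2w.toKeyBuffer", "250000");
        props.put("mantis.w2w.toKeyMaxChunkSize", "1000");
        props.put("mantis.w2w.toKeyMaxChunkTimeMSec", "100");
        // Job to job (JobSource).
        props.put("mantis.sse.spsc", "false");
        props.put("mantis.sse.numConsumerThreads", "8");
        props.put("mantis.sse.bufferCapacity", "250000");
        props.put("mantis.sse.maxChunkSize", "1000");
        props.put("mantis.sse.batchInterval", "100");
        return props;
    }

    /** Sets every entry as a JVM system property. */
    public static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }

    public static void main(String[] args) {
        apply(exampleProperties());
        System.out.println(System.getProperty("mantis.w2w.toKeyThreads"));
    }
}
```

In a real job, the apply(...) call would go in a static initializer or at the top of getJobInstance(), so the properties are in place before the job is built.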
Practical Tips
A few things to try immediately:
Set enableBinaryCompression: true in target for JobSource in order to increase the amount of information that fits into 400Mbits/sec.
Raise connectionsPerEndpoint to ensure stage to stage communication can utilize all available network resources. For example, 5 connections at roughly 400Mbit/sec each gives 5*400Mbit/sec=2Gbit/sec.
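The connection arithmetic above can be sketched as a small calculator. This assumes throughput scales linearly with the number of connections at roughly 400Mbit/sec each, which is what the 5*400 estimate implies. In practice other limits (NIC bandwidth, the roughly 500k events per second per-worker ceiling) will cap it sooner. The class and method names are made up for illustration.

```java
/**
 * Hypothetical back-of-envelope calculator for connectionsPerEndpoint,
 * assuming each connection tops out near 400 Mbit/sec and that
 * connections scale linearly.
 */
final class ConnectionBudget {

    // Approximate per-connection ceiling observed in the investigation.
    static final long PER_CONNECTION_MBIT = 400;

    /** Aggregate ceiling in Mbit/sec for a given number of connections. */
    public static long aggregateMbit(int connections) {
        return connections * PER_CONNECTION_MBIT;
    }

    /** Smallest connection count whose aggregate ceiling covers targetMbit. */
    public static int connectionsFor(long targetMbit) {
        // Integer ceiling division.
        return (int) ((targetMbit + PER_CONNECTION_MBIT - 1) / PER_CONNECTION_MBIT);
    }

    public static void main(String[] args) {
        System.out.println(aggregateMbit(5));     // 2000 Mbit/sec, i.e. 2 Gbit/sec
        System.out.println(connectionsFor(2500)); // 7
    }
}
```

The same 400Mbit/sec figure is 50 MB/sec, so at 500k events per second a single connection becomes the bottleneck once events average more than about 100 bytes on the wire. That is the case where compression or extra connections should help the most.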
Future Work
It would be very interesting to implement a connectionsPerEndpoint equivalent in JobSource to allow higher bandwidth job to job. This has become especially important as we've become more cost sensitive and attempt to scale down our jobs.
It would also be interesting to experiment with the number of Netty threads dedicated to the NIO loop.