Huge Performance Drop observed with large payloads of around 300KB or more #702
Unanswered
YashasAnand
asked this question in Q&A
Replies: 2 comments 8 replies
-
Hey! On what machines is the cluster running?
-
this might be an issue. What do you see if you remove the fire-and-forget Task.Run and await the call directly?
- _ = Task.Run(async () => await ProcessMessageAsync(natsConsumerOptions, consumerOptions, onMessageReceived, jsMsg));
+ await ProcessMessageAsync(natsConsumerOptions, consumerOptions, onMessageReceived, jsMsg);
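For context, here is a minimal sketch of the two patterns being compared, assuming the NATS.Net v2 client; the stream/consumer names and ProcessMessageAsync are placeholders rather than the poster's actual code:

```csharp
// Sketch of the difference the suggested diff makes, assuming NATS.Net v2;
// the stream/consumer names and ProcessMessageAsync are placeholders.
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var consumer = await js.GetConsumerAsync("JOBS", "JobTestTopic1");

await foreach (var msg in consumer.ConsumeAsync<byte[]>())
{
    // Fire-and-forget: every message immediately spawns a task, so with large
    // payloads many unprocessed messages (and their buffers) pile up in memory.
    // _ = Task.Run(async () => await ProcessMessageAsync(msg));

    // Awaited: the loop does not pull the next message until the current one is
    // processed and acked, which gives natural back-pressure.
    await ProcessMessageAsync(msg);
}

static async Task ProcessMessageAsync(NatsJSMsg<byte[]> msg)
{
    // ... application-specific handling ...
    await msg.AckAsync();
}
```

If processing genuinely needs to overlap, a bounded alternative (e.g. a SemaphoreSlim or Channel limiting in-flight work) keeps back-pressure while still running handlers in parallel.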
-
I am running a benchmark at around 300 RPS with a large payload of around 300KB against NATS on k8s, installed using Helm. My client code is written in C#.
This is my producer code:
NatsMessageModel.cs
Consumer Code:
The payload I am sending is around 300KB. If I send small payloads of around 1KB or 10KB instead, we see very fast consumption & produce rates. Can the above code be optimized, or is there anything specific we can do about this? I've also observed that the stream size grows exponentially in this case.
The CPU & memory on the nodes are going very high, around 90% on each NATS node, given a 3-node NATS JetStream cluster with the stream replicas currently set to 1 (temporarily for testing; I will increase the replicas later).
NATS was installed using the Helm chart as suggested in the community GitHub repo https://github.com/nats-io/k8s
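For reference, this is not the attached producer code, just a minimal sketch assuming the NATS.Net v2 client and hypothetical stream/subject names; it also sets explicit retention limits (MaxBytes/MaxAge), which keep a file-backed stream from growing without bound during a benchmark:

```csharp
// Minimal sketch, not the poster's attached producer; assumes NATS.Net v2
// and hypothetical names ("JOBS_TEST", "jobs.test.*").
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

await using var nats = new NatsConnection(new NatsOpts { Url = "nats://nats:4222" });
var js = new NatsJSContext(nats);

// Explicit limits keep file storage from growing unbounded while benchmarking.
await js.CreateStreamAsync(new StreamConfig(name: "JOBS_TEST", subjects: new[] { "jobs.test.*" })
{
    MaxBytes = 10L * 1024 * 1024 * 1024, // cap the stream at ~10GiB
    MaxAge = TimeSpan.FromHours(1),      // drop messages older than 1h
});

var payload = new byte[300 * 1024];      // ~300KB test payload

for (var i = 0; i < 1000; i++)
{
    // PublishAsync waits for the JetStream ack, which provides back-pressure.
    var ack = await js.PublishAsync($"jobs.test.{i % 40}", payload);
    ack.EnsureSuccess();
}
```

With the default limits-based retention and no limits set, the stream keeps every message until MaxBytes/MaxAge/MaxMsgs is hit, which is one common reason a benchmark stream appears to grow very quickly.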
Edit 1:
@Jarema please find the metrics below:
K8s cluster consisting of a 3-node NATS deployment installed through the official Helm chart.
-----------------------------------------------------TEST Case 1---------------------------------------------------------------------------------
Test Parameters:
PayloadSize: 1KB
RPS: 1.5k
Test Duration: 10m
Stream Replicas: 3 (file storage)
Node Config: 2 cores, 4GB RAM (1:1 CPU-to-vCPU ratio), 20GB storage
Consumer Info:
We are using 75 consumers of the JobTestTopic type per pod, and we have 3 pods of the consumer service (225 consumers in total). 40 of these are unique durable consumers, e.g. JobTestTopic1, JobTestTopic2 .. JobTestTopic39; the other 35 are clients bound to the same consumers, and JetStream distributes messages to them in round-robin fashion across the consumer threads.
Messages are round-robined to each consumer by the producer using a custom algorithm to distribute load across all consumers (a sketch of this kind of selection is shown below).
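For illustration only (this is not the actual custom algorithm described above; only the JobTestTopic naming comes from the post), a thread-safe round-robin subject picker could look like:

```csharp
// Hypothetical round-robin subject picker, not the poster's custom algorithm;
// only the JobTestTopic naming is taken from the post.
using System.Linq;
using System.Threading;

public sealed class RoundRobinSubjectPicker
{
    private readonly string[] _subjects;
    private int _counter = -1;

    public RoundRobinSubjectPicker(int topicCount) =>
        // e.g. JobTestTopic1 .. JobTestTopic40
        _subjects = Enumerable.Range(1, topicCount)
                              .Select(i => $"JobTestTopic{i}")
                              .ToArray();

    public string Next()
    {
        // Interlocked keeps the counter consistent when several producer threads
        // publish concurrently; the unsigned cast keeps the index valid even
        // after the counter wraps around int.MaxValue.
        var i = unchecked((uint)Interlocked.Increment(ref _counter));
        return _subjects[(int)(i % (uint)_subjects.Length)];
    }
}
```

The producer would then call Next() once per publish to pick the target subject.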
Observations
CPU of the 3 NATS nodes: around 90%
Service (producer & consumer) CPUs: around 60 to 70%
NOTE: the produce rate and consume rate stay almost equal, because processing & acking happen on a different thread in the consumer, as shown above.
Here is the stream & consumer info.
This is the consumer info for 2 of the consumers:
-----------------------------------------------------TEST Case 2---------------------------------------------------------------------------------
Exact same test as above but with a freshly created stream, a payload size of 270KB, and around 300 RPS.
Test Parameters:
PayloadSize: 270KB
RPS: 300
Test Duration: 10m
Stream Replicas: 3 (file storage)
Node Config: 2 cores, 4GB RAM (1:1 CPU-to-vCPU ratio), 20GB storage
Consumer Info:
Same topology as in Test Case 1: 75 consumers per pod across 3 consumer-service pods (225 consumers in total), with the same round-robin distribution by the producer.
Observations
CPU of the 3 NATS nodes: around 85 to 100%
Service (producer & consumer) CPUs: around 35 to 50%
Observed NATS pod restarts, as well as produce errors in the producer service (logs attached in a later section).
We were able to reach only 50 RPS.
Stream config
Consumer Info:
These logs were observed before the pod restarts:
Producer service exceptions: I also feel that, for some reason, the "No response received from the server" error is happening because the NATS server is being overwhelmed, since I can see the consumers still consuming, just slowly. Need your advice on this.
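Not the poster's code, but a minimal sketch of one way to bound in-flight publishes and retry when a JetStream publish ack does not come back, assuming the NATS.Net v2 client; the exact exception type behind "No response received from the server" can differ by client version, so the catch below deliberately uses the broad NatsJSException, and all subject/stream names are hypothetical:

```csharp
// Sketch only: bound concurrent publishes and retry on missing JetStream acks.
// Assumes NATS.Net v2; subject names are hypothetical.
using System.Linq;
using System.Threading;
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);

var payload = new byte[270 * 1024];
using var inFlight = new SemaphoreSlim(32); // cap concurrent publishes

var tasks = Enumerable.Range(0, 1_000).Select(async i =>
{
    await inFlight.WaitAsync();
    try
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                var ack = await js.PublishAsync($"jobs.test.{i % 40}", payload);
                ack.EnsureSuccess();
                return;
            }
            catch (NatsJSException) when (attempt < 3)
            {
                // Ack did not arrive (server busy); back off and retry.
                await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
            }
        }
    }
    finally
    {
        inFlight.Release();
    }
});

await Task.WhenAll(tasks);
```

The idea is simply to stop publishing faster than the server can ack, rather than to hide the errors.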