Huge Performance Drop observed with large payloads of around 300KB or more #702
Unanswered
YashasAnand
asked this question in Q&A
Replies: 2 comments 8 replies
-
Hey! On what machines is the cluster running?
-
this might be an issue. What do you see if you remove the fire-and-forget Task.Run and await the call directly?
- _ = Task.Run(async () => await ProcessMessageAsync(natsConsumerOptions, consumerOptions, onMessageReceived, jsMsg));
+ await ProcessMessageAsync(natsConsumerOptions, consumerOptions, onMessageReceived, jsMsg);
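For context, here is a minimal sketch of the two patterns being compared, assuming the NATS.Net v2 client; the stream/consumer names and ProcessMessageAsync are placeholders rather than the poster's actual code:

```csharp
// Sketch of the difference the suggested diff makes, assuming NATS.Net v2;
// the stream/consumer names and ProcessMessageAsync are placeholders.
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var consumer = await js.GetConsumerAsync("JOBS", "JobTestTopic1");

await foreach (var msg in consumer.ConsumeAsync<byte[]>())
{
    // Fire-and-forget: every message immediately spawns a task, so with large
    // payloads many unprocessed messages (and their buffers) pile up in memory.
    // _ = Task.Run(async () => await ProcessMessageAsync(msg));

    // Awaited: the loop does not pull the next message until the current one is
    // processed and acked, which gives natural back-pressure.
    await ProcessMessageAsync(msg);
}

static async Task ProcessMessageAsync(NatsJSMsg<byte[]> msg)
{
    // ... application-specific handling ...
    await msg.AckAsync();
}
```

If processing genuinely needs to overlap, a bounded alternative (e.g. a SemaphoreSlim or Channel limiting in-flight work) keeps back-pressure while still running handlers in parallel.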
-
I am running a benchmark at around 300 RPS with a large payload of around 300KB against NATS on k8s, installed using Helm. My client code is written in C#.
This is my producer code:
NatsMessageModel.cs
Consumer Code:
The payload I am sending is around 300KB. If I send small payloads of around 1KB or 10KB instead, we see very fast consumption & produce rates. Can the above code be optimized, or is there anything specific we can do about this? I've also observed that the stream size grows exponentially in this case.
The CPU & memory on the nodes are going very high, around 90% on each NATS node, given a 3-node NATS JetStream cluster with the stream replicas currently set to 1 (temporarily for testing; I will increase the replicas later).
NATS was installed using the Helm chart as suggested in the community GitHub repo https://github.com/nats-io/k8s
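For reference, this is not the attached producer code, just a minimal sketch assuming the NATS.Net v2 client and hypothetical stream/subject names; it also sets explicit retention limits (MaxBytes/MaxAge), which keep a file-backed stream from growing without bound during a benchmark:

```csharp
// Minimal sketch, not the poster's attached producer; assumes NATS.Net v2
// and hypothetical names ("JOBS_TEST", "jobs.test.*").
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

await using var nats = new NatsConnection(new NatsOpts { Url = "nats://nats:4222" });
var js = new NatsJSContext(nats);

// Explicit limits keep file storage from growing unbounded while benchmarking.
await js.CreateStreamAsync(new StreamConfig(name: "JOBS_TEST", subjects: new[] { "jobs.test.*" })
{
    MaxBytes = 10L * 1024 * 1024 * 1024, // cap the stream at ~10GiB
    MaxAge = TimeSpan.FromHours(1),      // drop messages older than 1h
});

var payload = new byte[300 * 1024];      // ~300KB test payload

for (var i = 0; i < 1000; i++)
{
    // PublishAsync waits for the JetStream ack, which provides back-pressure.
    var ack = await js.PublishAsync($"jobs.test.{i % 40}", payload);
    ack.EnsureSuccess();
}
```

With the default limits-based retention and no limits set, the stream keeps every message until MaxBytes/MaxAge/MaxMsgs is hit, which is one common reason a benchmark stream appears to grow very quickly.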
Edit 1:
@Jarema please find the metrics below:
K8s cluster consisting of a 3-node NATS deployment installed through the official Helm chart.
-----------------------------------------------------TEST Case 1---------------------------------------------------------------------------------
Test Parameters:
PayloadSize: 1KB
RPS: 1.5k
Test Duration: 10m
Stream Replicas: 3 (file storage)
Node Config: 2 cores, 4GB RAM (1:1 CPU-to-vCPU ratio), 20GB storage
Consumer Info:
We are using 75 consumers of the JobTestTopic type per pod, and we have 3 pods of the consumer service (225 consumers in total). 40 of these are unique durable consumers, e.g. JobTestTopic1, JobTestTopic2 .. JobTestTopic39; the other 35 are clients bound to the same consumers, and JetStream distributes messages to them in round-robin fashion across the consumer threads.
Messages are round-robined to each consumer by the producer using a custom algorithm to distribute load across all consumers (a sketch of this kind of selection is shown below).
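For illustration only (this is not the actual custom algorithm described above; only the JobTestTopic naming comes from the post), a thread-safe round-robin subject picker could look like:

```csharp
// Hypothetical round-robin subject picker, not the poster's custom algorithm;
// only the JobTestTopic naming is taken from the post.
using System.Linq;
using System.Threading;

public sealed class RoundRobinSubjectPicker
{
    private readonly string[] _subjects;
    private int _counter = -1;

    public RoundRobinSubjectPicker(int topicCount) =>
        // e.g. JobTestTopic1 .. JobTestTopic40
        _subjects = Enumerable.Range(1, topicCount)
                              .Select(i => $"JobTestTopic{i}")
                              .ToArray();

    public string Next()
    {
        // Interlocked keeps the counter consistent when several producer threads
        // publish concurrently; the unsigned cast keeps the index valid even
        // after the counter wraps around int.MaxValue.
        var i = unchecked((uint)Interlocked.Increment(ref _counter));
        return _subjects[(int)(i % (uint)_subjects.Length)];
    }
}
```

The producer would then call Next() once per publish to pick the target subject.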
Observations
CPU of the 3 NATS nodes: around 90%
Service (producer & consumer) CPUs: around 60 to 70%
NOTE: the produce rate and consume rate stay almost equal, because processing & acking happen on a different thread in the consumer, as shown above.
Here is the stream & consumer info.
This is the consumer info for 2 of the consumers:
-----------------------------------------------------TEST Case 2---------------------------------------------------------------------------------
Exact same test as above but with a freshly created stream, a payload size of 270KB, and around 300 RPS.
Test Parameters:
PayloadSize: 270KB
RPS: 300
Test Duration: 10m
Stream Replicas: 3 (file storage)
Node Config: 2 cores, 4GB RAM (1:1 CPU-to-vCPU ratio), 20GB storage
Consumer Info:
Same topology as in Test Case 1: 75 consumers per pod across 3 consumer-service pods (225 consumers in total), with the same round-robin distribution by the producer.
Observations
CPU of the 3 NATS nodes: around 85 to 100%
Service (producer & consumer) CPUs: around 35 to 50%
Observed NATS pod restarts, as well as produce errors in the producer service (logs attached in a later section).
We were able to reach only 50 RPS.
Stream config
Consumer Info:
These logs were observed before the pod restarts:
Producer service exceptions: I also feel that, for some reason, the "No response received from the server" error is happening because the NATS server is being overwhelmed, since I can see the consumers still consuming, just slowly. Need your advice on this.
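Not the poster's code, but a minimal sketch of one way to bound in-flight publishes and retry when a JetStream publish ack does not come back, assuming the NATS.Net v2 client; the exact exception type behind "No response received from the server" can differ by client version, so the catch below deliberately uses the broad NatsJSException, and all subject/stream names are hypothetical:

```csharp
// Sketch only: bound concurrent publishes and retry on missing JetStream acks.
// Assumes NATS.Net v2; subject names are hypothetical.
using System.Linq;
using System.Threading;
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);

var payload = new byte[270 * 1024];
using var inFlight = new SemaphoreSlim(32); // cap concurrent publishes

var tasks = Enumerable.Range(0, 1_000).Select(async i =>
{
    await inFlight.WaitAsync();
    try
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                var ack = await js.PublishAsync($"jobs.test.{i % 40}", payload);
                ack.EnsureSuccess();
                return;
            }
            catch (NatsJSException) when (attempt < 3)
            {
                // Ack did not arrive (server busy); back off and retry.
                await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
            }
        }
    }
    finally
    {
        inFlight.Release();
    }
});

await Task.WhenAll(tasks);
```

The idea is simply to stop publishing faster than the server can ack, rather than to hide the errors.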