One out of 3 NATS servers becomes unavailable during long-run test (approx. more than 2 hours) #6897
Santhoshbesagarahalli asked this question in Q&A
When the node is in a stuck state, can you please fetch & attach the |
Set-up Information:
Server: Intel(R) Xeon(R) Gold 6430, 1.2GHz, RAM 502G
OS: Rocky Linux 8.10 (Green Obsidian)
Kubernetes version: v1.30.9 (single node)
NATS server: v2.11.2
NATS Helm: 1.3.4
CNATS: 3.10.1
Deployment information:
Number of NATS instances: 3 (1 cluster)
JetStream enabled: true
NATS configuration is as below:
NATS Cluster information
As part of our setup, we have 6 to 10 other C++ client application pods that connect to the NATS server and update a KV store with bucket name “service_reg_db_app_status” (underlying stream name “KV_service_reg_db_app_status”). Each application writes an update every 20 seconds. A snapshot of the stream is below:
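For a sense of scale, the update cadence described above produces a very small write load. A rough sketch of the arithmetic, assuming each pod writes exactly once per 20 seconds and taking the 2-hour mark at which the issue starts to appear (the helper name `expected_updates` is hypothetical and not part of NATS or CNATS):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of NATS or CNATS): number of KV puts issued
// by `pods` clients that each write once every `interval_s` seconds over a
// window of `window_s` seconds.
constexpr std::uint64_t expected_updates(std::uint64_t pods,
                                         std::uint64_t interval_s,
                                         std::uint64_t window_s) {
    return pods * (window_s / interval_s);
}

// Figures taken from the description: 6 to 10 pods, one put every 20 s,
// issue seen after roughly 2 hours (7200 s).
static_assert(expected_updates(6, 20, 7200) == 2160, "lower bound");
static_assert(expected_updates(10, 20, 7200) == 3600, "upper bound");
```

Under these assumptions the bucket has received only a few thousand writes by the time the server stops responding, which is at most one put every two seconds across all clients.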

The JetStream report during startup is below:
The cluster report during startup is below:
When the issue occurs, one of the NATS servers, “cucp-cd-nats-1”, stops responding to client requests to update the KV store, but the RAFT cluster is still up and running.
The NATS server “cucp-cd-nats-1” itself is still running and able to respond to NATS CLI commands.
Below are the snapshots captured when the issue was observed:
Output of server report routes, where the “cucp-cd-nats-1” server information is missing:
Output of server report JetStream, where the “cucp-cd-nats-1” server information is missing from the JetStream summary even though the server is still part of the RAFT group:
We get a deadline exceeded error when we try to fetch the stream information using the NATS CLI:
We have also observed that sometimes the RAFT leader stops responding and no RAFT leader information is reported. The output of the JetStream server report in that case is below:
We ran the same test with NATS server version 2.10.22 and did not observe this issue even after running for 7 hours.
Could you please help us debug the issue?