How to Handle System Failures Caused by Hardware Issues in NATS Clusters #6892
lightwinglc asked this question in Q&A
I would like to ask how to handle the following issue:
When a host running both a NATS server and our application suffers a hardware failure (monitoring shows the CPU pinned at 100%), we can neither log in to the host nor have our monitoring processes kill the application. Logs on the other, healthy NATS servers show that the faulty server was dropped from the cluster as a "slow consumer". Even so, consumers in external services keep reporting that the system is slow until the host finally crashes 10+ minutes later, after which they report normal operation again.
Question:
Is there an API that removes both the NATS server and all client connections associated with a specific IP address from the cluster?
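As a partial step, connections coming from the faulty host's IP can at least be identified from a healthy server's HTTP monitoring endpoint `/connz`. The sketch below is illustrative only: it assumes the monitoring port is enabled (commonly 8222) and that each entry in the `connections` array of the `/connz` JSON carries `cid` and `ip` fields. It only lists matching client IDs. It does not disconnect anything, and it does not answer whether nats-server exposes an operation for that. The helper names `fetchConnz` and `clientIDsFromIP` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// connzResponse models only the parts of the /connz payload used here.
// Assumption: field names follow the nats-server monitoring JSON.
type connzResponse struct {
	Connections []struct {
		Cid  uint64 `json:"cid"`
		IP   string `json:"ip"`
		Port int    `json:"port"`
	} `json:"connections"`
}

// fetchConnz (hypothetical helper) downloads /connz from a healthy server,
// e.g. baseURL = "http://healthy-server:8222". Not called in main below.
func fetchConnz(baseURL string) ([]byte, error) {
	resp, err := http.Get(baseURL + "/connz")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// clientIDsFromIP returns the cid of every connection whose remote IP matches ip.
func clientIDsFromIP(payload []byte, ip string) ([]uint64, error) {
	var resp connzResponse
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, err
	}
	var cids []uint64
	for _, c := range resp.Connections {
		if c.IP == ip {
			cids = append(cids, c.Cid)
		}
	}
	return cids, nil
}

func main() {
	// Sample payload standing in for a real /connz response.
	sample := []byte(`{"connections":[
		{"cid":1,"ip":"10.0.0.5","port":52001},
		{"cid":2,"ip":"10.0.0.6","port":52002},
		{"cid":3,"ip":"10.0.0.5","port":52003}]}`)
	cids, err := clientIDsFromIP(sample, "10.0.0.5")
	if err != nil {
		panic(err)
	}
	fmt.Println(cids) // prints [1 3]
}
```

Because the faulty server itself is unresponsive, this would have to be run against each healthy server. It would only find the application's connections that landed on those servers, not clients still attached to the faulty one.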