How to Handle System Failures Caused by Hardware Issues in NATS Clusters #6892

lightwinglc · 2025-05-14T02:59:02Z

I would like to ask how to handle the following situation.
When the host running both a NATS server and our application suffers a hardware failure (monitoring shows CPU at 100%), we can neither log in to the host nor kill the application through our monitoring processes. Logs on the other, healthy NATS servers show that the faulty server was removed from the cluster as a "slow consumer". Even so, external consumers of the service keep reporting slowness until the host crashes completely, more than 10 minutes later; after that they report normal operation.

Question:
Is there an API that can remove from the cluster both the NATS server on a specific IP and the client connections associated with that IP?
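
For context, here is a minimal sketch of how the client connections coming from one IP could at least be identified. It does not remove anything. It assumes the HTTP monitoring port is enabled on the healthy servers (8222 is the conventional port) and reads the `/connz` monitoring endpoint, filtering the `connections` array on its `ip` field. The function names, host name, and IP address are hypothetical, chosen only for illustration.

```python
import json
from urllib.request import urlopen


def fetch_connz(base_url: str, timeout: float = 2.0) -> dict:
    """Read /connz from a NATS server's HTTP monitoring port.

    Assumption: monitoring is enabled and reachable at base_url.
    /connz is paginated, so a busy server may need several requests
    using its offset/limit query parameters.
    """
    with urlopen(f"{base_url}/connz", timeout=timeout) as resp:
        return json.load(resp)


def connections_from_ip(connz: dict, ip: str) -> list:
    """Return the connection entries whose client address equals ip."""
    return [c for c in connz.get("connections", []) if c.get("ip") == ip]


# Usage (hypothetical host name and IP address):
#   connz = fetch_connz("http://nats-1.example.internal:8222")
#   for conn in connections_from_ip(connz, "10.0.0.15"):
#       print(conn.get("cid"), conn.get("ip"), conn.get("port"))
```

Because the faulty host itself is unresponsive, the query would have to go to each healthy server, and each one reports only the clients connected to it.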

Replies: 0 comments
