Loki Ingesters and Mimir Ingesters Sharing a Ring?! #11423
I woke up this morning to the strangest state in my telemetry cluster. I installed Mimir using the mimir-distributed Helm chart, and Loki using the loki Helm chart in microservices mode, on a Kubernetes cluster. Each is in its own namespace. However, this morning the Mimir queriers were throwing this error:

I checked the Mimir ring by port-forwarding the Mimir ingester pod and going to https://localhost:8080/ingester/ring, and found that the Loki ingesters were on the same ring as the Mimir ingesters?! Has anyone seen this before? Surely this isn't normal, right?
Replies: 1 comment 1 reply
Kubernetes pods suffer from an issue that we call "fast IP recycling". The problem is that the IP of a terminating Loki pod may be quickly reused for a starting Mimir pod (and vice versa). If there isn't enough delay between when an IP is released and when it is reused (recycled) for another pod, other Loki pods may connect to that IP believing it still belongs to a Loki pod (when it is actually a Mimir pod now), and this can cause the two memberlist clusters to join.

How to protect against this? The solution is to configure the following in Mimir to enable memberlist cluster "verification" (Loki has similar settings too):
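As a minimal sketch of what cluster verification can look like, assuming Mimir's `memberlist.cluster_label` and `memberlist.cluster_label_verification_disabled` options: the label value `mimir-cluster` is a hypothetical placeholder, not something taken from this thread.

```yaml
# Mimir configuration fragment (sketch, not a complete config).
# "mimir-cluster" is a hypothetical label: any string works, as long as every
# Mimir member uses the same one and Loki uses a different one.
memberlist:
  # Gossip messages carrying a different label are rejected once
  # verification is enabled.
  cluster_label: "mimir-cluster"
  # false means verification is on.
  cluster_label_verification_disabled: false
```

With the mimir-distributed Helm chart, a fragment like this would typically go under `mimir.structuredConfig` in the values file. Check that key against the chart version you run.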
To roll out this change with no downtime, you will have to change the config in a couple of steps. Please refer to this doc:
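The staged change can be sketched roughly as follows. This is an assumption based on how label verification is usually introduced into a running memberlist cluster, not a copy of the referenced doc, and `mimir-cluster` is again a hypothetical label. Each stage should be fully rolled out to every Mimir component before moving to the next one.

```yaml
# Stage 1: disable verification on all members while no label is set yet,
# so labeled and unlabeled members can still gossip during the next stage.
memberlist:
  cluster_label_verification_disabled: true
---
# Stage 2: set the label on all members, with verification still disabled.
memberlist:
  cluster_label: "mimir-cluster"  # hypothetical label
  cluster_label_verification_disabled: true
---
# Stage 3: re-enable verification once every member carries the label.
memberlist:
  cluster_label: "mimir-cluster"  # hypothetical label
  cluster_label_verification_disabled: false
```

Skipping stage 1 and enabling the label with verification in a single rollout would likely split the ring for the duration of the rollout, because already-updated members would reject gossip from members that have no label yet.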