"target not available" error when losing one of three thanos receive replicas #5108
dwilliams782 asked this question in Questions & Answers
Perhaps the problem is that you are pointing each node at a Service, which is a load balancer in front of the pods in Kubernetes, right? Perhaps you could try creating a headless service and then use |
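To illustrate the headless-service idea: when a StatefulSet's serviceName points at a headless Service, each pod gets its own stable DNS name of the form <pod>.<service>.<namespace>.svc.cluster.local, so each hashring endpoint can address one specific receiver instead of a load-balanced Service. A minimal sketch of generating those addresses follows; the Service name "thanos-receive-headless", the "monitoring" namespace, and port 10901 (Thanos' default gRPC port) are assumptions for illustration only.

```python
def headless_endpoints(statefulset: str, service: str, namespace: str,
                       replicas: int, port: int) -> list[str]:
    """Build one stable per-pod address per StatefulSet replica.

    StatefulSet pods are named <statefulset>-<ordinal>; behind a headless
    Service each gets the DNS name <pod>.<service>.<namespace>.svc.cluster.local.
    All argument values used below are hypothetical placeholders.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # Hypothetical names: adjust to your own StatefulSet, Service and namespace.
    for endpoint in headless_endpoints("thanos-receive", "thanos-receive-headless",
                                       "monitoring", 3, 10901):
        print(endpoint)
```

Listing per-pod addresses like these in the hashring means a request for a given replica always goes to the same pod, rather than to whichever pod the Service happens to pick.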
Hi all,
I am attempting to run Thanos Receive in k8s as a StatefulSet. Our use case is very small, as Prometheus does almost all of our heavy lifting; however, we evaluate some rules using the Loki Ruler and use remote write to send the results to Prometheus. Because our Prometheus is HA, there is potential for data loss if we lose an instance, so we want to use Thanos Receive to provide an HA solution for the remote-written metrics.
I have the following config:
And the hashring config:
We don't need three replicas, but based on this discussion (#3194), three replicas with a replication factor of 2 seems to be the minimum required to have metrics replicated on two instances of Receive while providing resiliency against instance failure?
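The replication-factor question largely comes down to write quorum. The sketch below assumes Thanos Receive treats a write as successful once floor(RF / 2) + 1 of its replicas acknowledge it (this matches my reading of the Thanos source, but verify it against the version you run). Under that assumption, RF=2 gives a quorum of 2, so every write needs both of its replicas and tolerates no replica failure; RF=3 is the smallest factor that survives one failed replica.

```python
def write_quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a write, assuming floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Failed replicas a single write can absorb and still reach quorum."""
    return replication_factor - write_quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"RF={rf}: quorum={write_quorum(rf)}, "
              f"tolerates {tolerated_failures(rf)} failed replica(s)")
```

Note that the number of Receive instances does not change this arithmetic: with three instances and RF=2, each individual write still needs both of its two assigned replicas.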
I can see data being received on two of the three instances (thanos-receive-1 and thanos-receive-2), as expected:
To test that we have resiliency against a pod failing, I deleted thanos-receive-1, expecting that requests would start getting routed to thanos-receive-0. Instead, the logs in thanos-receive-2 started spamming multiple times per second:
This continued until I restored thanos-receive-1, so nothing was ever replicated to the -0 instance.
Have I misunderstood a concept here, or got some configuration wrong?
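A toy model of the failure test above may help reason about it. It assumes a hashmod-style hashring where replica k of a series is sent to endpoint (hash + k) mod N, together with a floor(RF / 2) + 1 write quorum; both are assumptions about Thanos' behaviour, and the "hash buckets" here are illustrative stand-ins for the real hash of tenant and series labels. In this model nothing redirects a failed replica to another node, so with thanos-receive-1 down, every series whose replica set includes node 1 falls short of quorum instead of spilling over to thanos-receive-0.

```python
N_NODES = 3             # thanos-receive-0, -1, -2
REPLICATION_FACTOR = 2  # as in the question


def replica_nodes(series_hash: int, n_nodes: int = N_NODES,
                  rf: int = REPLICATION_FACTOR) -> list[int]:
    """Hashmod-style placement (assumed): replica k goes to (hash + k) mod n_nodes."""
    return [(series_hash + k) % n_nodes for k in range(rf)]


def write_succeeds(series_hash: int, down: set[int],
                   rf: int = REPLICATION_FACTOR) -> bool:
    """True if at least floor(rf / 2) + 1 of the series' replicas are up (assumed quorum)."""
    quorum = rf // 2 + 1
    up = sum(1 for node in replica_nodes(series_hash, rf=rf) if node not in down)
    return up >= quorum


if __name__ == "__main__":
    down = {1}  # simulate deleting thanos-receive-1
    for bucket in range(N_NODES):
        status = "ok" if write_succeeds(bucket, down) else "fails quorum"
        print(f"hash bucket {bucket}: replicas {replica_nodes(bucket)} -> {status}")
```

In this toy model, two of the three hash buckets fail while node 1 is down, and rerunning it with REPLICATION_FACTOR = 3 makes every bucket succeed, since each write then needs only two of its three replicas.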