couchdb v3.1 cluster stability issue #3559
Unanswered
nicknaychov
asked this question in
Q&A
Replies: 0 comments
Hi,
We recently upgraded from 2.3.1 to 3.1.1. We noticed that when we restart one of the nodes, all nodes start reporting errors and the whole cluster becomes unusable. To fix the issue we have to restart the rest of the nodes as well. This does not seem like a reliable design for v3; I do not think it is normal for the restart of one node to bring the whole cluster down. Even if our cluster is not set up correctly, that behavior still seems very odd to me.
[cluster] q=5 n=3 placement = z1:2,z2:1
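For reference, the placement string above tells CouchDB how to spread the n=3 replicas across zones. A minimal sketch of how those zone counts line up with n (purely illustrative; `parse_placement` is not a CouchDB API, just a helper for this check):

```python
def parse_placement(placement: str) -> dict:
    """Turn a CouchDB placement string like "z1:2,z2:1" into {"z1": 2, "z2": 1}."""
    zones = {}
    for part in placement.split(","):
        zone, count = part.split(":")
        zones[zone.strip()] = int(count)
    return zones


zones = parse_placement("z1:2,z2:1")
# The per-zone replica counts should sum to the cluster's n value (3 here),
# so z2 holds exactly one copy of every shard range.
assert sum(zones.values()) == 3
```

With this layout, losing the single pbx1-z2 node removes one of the three copies of every shard, so the remaining nodes should still be able to reach quorum; the question is why they keep timing out instead.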
Errors we get after restart of pbx1-z2:
on pbx1-z2:
[error] 2021-05-11T11:35:43.688273Z couchdb3@pbx1-z2.domain.ca <0.502.0> -------- Error checking security objects for _replicator :: {error,timeout}
[error] 2021-05-11T11:35:43.723096Z couchdb3@pbx1-z2.domain.ca <0.561.0> -------- fabric_worker_timeout get_all_security,'couchdb3@pbx1-z1.domain.ca',<<"shards/99999999-cccccccb/_users.1619581901">>
[error] 2021-05-11T11:35:43.723325Z couchdb3@pbx1-z2.domain.ca <0.561.0> -------- Error checking security objects for _users :: {error,timeout}
pbx2-z1 node:
[error] 2021-05-11T11:35:42.566342Z couchdb3@pbx2-z1.domain.ca <0.17661.0> 0794fd3f5d fabric_worker_timeout open_doc,'couchdb3@pbx1-z1.domain.ca',<<"shards/66666666-99999998/_users.1619581901">>
[error] 2021-05-11T11:35:42.566343Z couchdb3@pbx2-z1.domain.ca <0.17660.0> dbd0ec51bc fabric_worker_timeout open_doc,'couchdb3@pbx1-z1.domain.ca',<<"shards/66666666-99999998/_users.1619581901">>
pbx1-z1 node:
Let me know if you need any further details.
Thank you