couchdb v3.1 cluster stability issue #3559
Unanswered
nicknaychov
asked this question in
Q&A
Replies: 0 comments
Hi,
We recently upgraded from 2.3.1 to 3.1.1. We noticed that when we restart one of the nodes, all nodes start reporting errors and the whole cluster becomes unusable. To fix the issue we have to restart the rest of the nodes as well. This does not seem like a reliable design for v3; I do not think it is normal for the restart of one node to bring the whole cluster down. Even if our cluster is not set up correctly, that behavior still seems very odd to me.
[cluster] q=5 n=3 placement = z1:2,z2:1
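For reference, the placement string above tells CouchDB how to spread the n=3 replicas across zones. A minimal sketch of how those zone counts line up with n (purely illustrative; `parse_placement` is not a CouchDB API, just a helper for this check):

```python
def parse_placement(placement: str) -> dict:
    """Turn a CouchDB placement string like "z1:2,z2:1" into {"z1": 2, "z2": 1}."""
    zones = {}
    for part in placement.split(","):
        zone, count = part.split(":")
        zones[zone.strip()] = int(count)
    return zones


zones = parse_placement("z1:2,z2:1")
# The per-zone replica counts should sum to the cluster's n value (3 here),
# so z2 holds exactly one copy of every shard range.
assert sum(zones.values()) == 3
```

With this layout, losing the single pbx1-z2 node removes one of the three copies of every shard, so the remaining nodes should still be able to reach quorum; the question is why they keep timing out instead.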
Errors we get after restart of pbx1-z2:
on pbx1-z2:
[error] 2021-05-11T11:35:43.688273Z couchdb3@pbx1-z2.domain.ca <0.502.0> -------- Error checking security objects for _replicator :: {error,timeout}
[error] 2021-05-11T11:35:43.723096Z couchdb3@pbx1-z2.domain.ca <0.561.0> -------- fabric_worker_timeout get_all_security,'couchdb3@pbx1-z1.domain.ca',<<"shards/99999999-cccccccb/_users.1619581901">>
[error] 2021-05-11T11:35:43.723325Z couchdb3@pbx1-z2.domain.ca <0.561.0> -------- Error checking security objects for _users :: {error,timeout}
pbx2-z1 node:
[error] 2021-05-11T11:35:42.566342Z couchdb3@pbx2-z1.domain.ca <0.17661.0> 0794fd3f5d fabric_worker_timeout open_doc,'couchdb3@pbx1-z1.domain.ca',<<"shards/66666666-99999998/_users.1619581901">>
[error] 2021-05-11T11:35:42.566343Z couchdb3@pbx2-z1.domain.ca <0.17660.0> dbd0ec51bc fabric_worker_timeout open_doc,'couchdb3@pbx1-z1.domain.ca',<<"shards/66666666-99999998/_users.1619581901">>
pbx1-z1 node:
Let me know if you need any further details.
Thank you