pages/clustering/high-availability.mdx
8 additions & 6 deletions
@@ -635,8 +635,8 @@ the wrong state of other clusters, it can become a leader without being connecte
 
 ## Recovering from errors
 
-Distributed systems can fail in numerous ways. With the current implementation, Memgraph instances are resilient to occasional network
-failures and independent machine failures. Byzantine failures aren't handled since the Raft consensus protocol cannot deal with them either.
+Distributed systems can fail in numerous ways. Memgraph processes are resilient to network
+failures, omission faults and independent machine failures. Byzantine failures aren't tolerated since the Raft consensus protocol cannot deal with them either.
 
 Recovery Time Objective (RTO) is an often used term for measuring the maximum tolerable length of time that an instance or cluster can be down.
 Since every highly available Memgraph cluster has two types of instances, we need to analyze the failures of each separately.
@@ -652,9 +652,6 @@ and the time needed to realize the instance is down (`--instance-down-timeout-se
 using just a handful of RPC messages (correct time depends on the distance between instances). It is important to mention that the whole failover is performed without the loss of committed data
 if the newly chosen MAIN (previously REPLICA) had all up-to-date data.
 
-Current deployment assumes the existence of only one datacenter, which automatically means that Memgraph won't be available in the case the whole datacenter goes down. We are actively
-working on 2 datacenter (2-DC) architecture.
-
 ## Raft configuration parameters
 
 Several Raft-related parameters are important for the correct functioning of the cluster. The leader coordinator sends a heartbeat
@@ -664,9 +661,14 @@ expiration is set to 2000ms so that cluster can never get into situation where m
 the ability to survive occasional network hiccups without triggering leadership changes.
 
 
+## Data center failure
+
+The architecture we currently use allows us to deploy coordinators in 3 data centers and hence tolerate a failure of the whole data center. Data instances can be freely
+distributed in any way you want between data centers. The failover time will be slightly increased due to the network communication needed.
+
 ## Kubernetes
 
-We support deploying Memgraph HA instances as part of the Kubernetes cluster.
+We support deploying Memgraph HA as part of the Kubernetes cluster through Helm charts.
 You can see example configurations [here](/getting-started/install-memgraph/kubernetes#memgraph-high-availability-helm-chart).
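
Note on the "Data center failure" section added in this diff: the claim that three data centers tolerate the loss of one follows from Raft needing a strict majority of coordinators to elect a leader. The sketch below illustrates that arithmetic only. It assumes one coordinator per data center in the three-DC case, and the function name, the dictionary layout, and the `dc1`/`dc2`/`dc3` labels are hypothetical, not part of Memgraph's API or configuration.

```python
def has_quorum(coordinators_per_dc, failed_dcs):
    """Return True if the surviving coordinators still form a strict majority.

    coordinators_per_dc: hypothetical mapping of data center name -> number
    of coordinators placed there. failed_dcs: set of data centers that are down.
    """
    total = sum(coordinators_per_dc.values())
    alive = sum(
        count
        for dc, count in coordinators_per_dc.items()
        if dc not in failed_dcs
    )
    # Raft makes progress only while more than half of the members are reachable.
    return alive > total // 2


# Assumed layouts for illustration only.
three_dcs = {"dc1": 1, "dc2": 1, "dc3": 1}
two_dcs = {"dc1": 2, "dc2": 1}

print(has_quorum(three_dcs, {"dc1"}))  # True: 2 of 3 coordinators remain
print(has_quorum(two_dcs, {"dc1"}))    # False: only 1 of 3 coordinators remains
```

With three coordinators split across only two data centers, one data center always holds the majority, so losing it stops leader election. Spreading them over three data centers avoids that single point of failure.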