Improve failover mechanism for primary and secondary replica connections #243

Merged
merged 20 commits into from
Jan 26, 2021
Changes from 3 commits
4 changes: 2 additions & 2 deletions Grakn.java
@@ -114,8 +114,8 @@ interface Transaction extends AutoCloseable {

         enum Type {
             READ(0),
-            READ_REPLICA(2),
-            WRITE(1);
+            WRITE(1),
+            READ_SECONDARY(2);
Member Author

The reason for renaming it to READ_SECONDARY is that it allows you to read from secondary replicas, whereas READ and WRITE both go to the primary replica.

Member

We agreed in a verbal discussion with Haikal to get rid of the READ_SECONDARY transaction type and replace it with a new Option.

Member Author

Fixed in fec34cc, 62c4368, 8f31ae6, and b8d3aa5 (I don't know why it took me 4 commits to do so :D )
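
The thread above settles on dropping the READ_SECONDARY transaction type in favour of an option. A minimal sketch of what that shape could look like follows; the option name `readAnyReplica`, the `Options` class, and the `route` helper are hypothetical stand-ins for illustration and are not taken from the client's actual API.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are not the client's real types.
final class ReadOptionSketch {
    enum TransactionType { READ, WRITE }

    enum Target { PRIMARY, ANY_REPLICA }

    // An options object carrying the hypothetical "readAnyReplica" flag.
    static final class Options {
        private Boolean readAnyReplica; // null means "not set, use the default"

        Options readAnyReplica(boolean value) {
            this.readAnyReplica = value;
            return this;
        }

        Optional<Boolean> readAnyReplica() {
            return Optional.ofNullable(readAnyReplica);
        }
    }

    // Writes always go to the primary; reads go to the primary unless the option opts in.
    static Target route(TransactionType type, Options options) {
        boolean anyReplica = options.readAnyReplica().orElse(false);
        if (type == TransactionType.READ && anyReplica) return Target.ANY_REPLICA;
        return Target.PRIMARY;
    }

    public static void main(String[] args) {
        System.out.println(route(TransactionType.READ, new Options()));
        System.out.println(route(TransactionType.READ, new Options().readAnyReplica(true)));
        System.out.println(route(TransactionType.WRITE, new Options().readAnyReplica(true)));
    }
}
```

With this shape the enum keeps only READ and WRITE, and reading from a secondary becomes an opt-in on an otherwise ordinary READ transaction.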


             private final int id;
             private final boolean isWrite;
60 changes: 27 additions & 33 deletions GraknClient.java
@@ -36,10 +36,9 @@
 import java.util.Map;
 import java.util.Set;
 import java.util.concurrent.TimeUnit;
-import java.util.concurrent.atomic.AtomicInteger;
 import java.util.stream.Collectors;

-import static grakn.client.common.exception.ErrorMessage.Client.CLUSTER_NOT_AVAILABLE;
+import static grakn.client.common.exception.ErrorMessage.Client.CLUSTER_UNABLE_TO_CONNECT;
 import static grakn.common.collection.Collections.pair;

 public class GraknClient {
@@ -106,23 +105,20 @@ public Channel channel() {

     public static class Cluster implements Grakn.Client {
         private static final Logger LOG = LoggerFactory.getLogger(Cluster.class);
-        private final Map<Address.Cluster.Server, Core> coreClientMap;
-        private final Core[] coreClientArray;
-        private final AtomicInteger selectedCoreClientIndex;
-        private GraknClusterGrpc.GraknClusterBlockingStub clusterDiscoveryRPC;
+        private final Map<Address.Cluster.Server, Core> coreClients;
+        private final Map<Address.Cluster.Server, GraknClusterGrpc.GraknClusterBlockingStub> graknClusterRPCs;
         private final RPCDatabaseManager.Cluster databases;
         private boolean isOpen;

         private Cluster(String address) {
-            Pair<GraknClusterGrpc.GraknClusterBlockingStub, Set<Address.Cluster.Server>> discovery = discoverCluster(address);
-            clusterDiscoveryRPC = discovery.first();
-            coreClientMap = discovery.second().stream()
+            coreClients = discoverCluster(address).stream()
                     .map(addr -> pair(addr, new Core(addr.client())))
                     .collect(Collectors.toMap(Pair::first, Pair::second));
-            coreClientArray = coreClientMap.values().toArray(new Core[] {});
-            selectedCoreClientIndex = new AtomicInteger();
+            graknClusterRPCs = coreClients.entrySet().stream()
+                    .map(client -> pair(client.getKey(), GraknClusterGrpc.newBlockingStub(client.getValue().channel())))
+                    .collect(Collectors.toMap(Pair::first, Pair::second));
             databases = new RPCDatabaseManager.Cluster(
-                    coreClientMap.entrySet().stream()
+                    coreClients.entrySet().stream()
                             .map(client -> pair(client.getKey(), client.getValue().databases()))
                             .collect(Collectors.toMap(Pair::first, Pair::second))
             );
@@ -136,13 +132,7 @@ public RPCSession.Cluster session(String database, Grakn.Session.Type type) {

         @Override
         public RPCSession.Cluster session(String database, Grakn.Session.Type type, GraknOptions options) {
-            return new RPCSession.Cluster(this, database, type, options, clusterDiscoveryRPC);
-        }
-
-        public GraknClusterGrpc.GraknClusterBlockingStub selectNextClusterDiscoveryRPC() {
-            Core selected = selectNextCoreClient();
-            clusterDiscoveryRPC = GraknClusterGrpc.newBlockingStub(selected.channel());
-            return clusterDiscoveryRPC;
+            return new RPCSession.Cluster(this, database, type, options);
         }

         @Override
@@ -157,33 +147,37 @@ public boolean isOpen() {

         @Override
         public void close() {
-            coreClientMap.values().forEach(GraknClient.Core::close);
+            coreClients.values().forEach(GraknClient.Core::close);
             isOpen = false;
         }

-        public Map<Address.Cluster.Server, Core> coreClients() {
-            return coreClientMap;
+        public Set<Address.Cluster.Server> servers() {
+            return coreClients.keySet();
         }

+        public Core coreClient(Address.Cluster.Server address) {
+            return coreClients.get(address);
+        }
+
-        private Pair<GraknClusterGrpc.GraknClusterBlockingStub, Set<Address.Cluster.Server>> discoverCluster(String... addresses) {
+        public GraknClusterGrpc.GraknClusterBlockingStub graknClusterRPC(Address.Cluster.Server address) {
+            return graknClusterRPCs.get(address);
+        }
+
+        private Set<Address.Cluster.Server> discoverCluster(String... addresses) {
             for (String address: addresses) {
                 try (GraknClient.Core client = new Core(address)) {
-                    LOG.info("Performing server discovery to {}...", address);
-                    GraknClusterGrpc.GraknClusterBlockingStub clusterDiscoveryRPC = GraknClusterGrpc.newBlockingStub(client.channel());
+                    LOG.debug("Performing server discovery to {}...", address);
+                    GraknClusterGrpc.GraknClusterBlockingStub graknClusterRPC = GraknClusterGrpc.newBlockingStub(client.channel());
                     ClusterProto.Cluster.Discover.Res res =
-                            clusterDiscoveryRPC.clusterDiscover(ClusterProto.Cluster.Discover.Req.newBuilder().build());
+                            graknClusterRPC.clusterDiscover(ClusterProto.Cluster.Discover.Req.newBuilder().build());
                     Set<Address.Cluster.Server> servers = res.getServersList().stream().map(Address.Cluster.Server::parse).collect(Collectors.toSet());
-                    LOG.info("Discovered {}", servers);
-                    return pair(clusterDiscoveryRPC, servers);
+                    LOG.debug("Discovered {}", servers);
+                    return servers;
                 } catch (StatusRuntimeException e) {
                     LOG.error("Server discovery to {} failed.", address);
                 }
             }
-            throw new GraknClientException(CLUSTER_NOT_AVAILABLE.message((Object) addresses)); // remove ambiguity by casting to Object
-        }
-
-        private Core selectNextCoreClient() {
-            return coreClientArray[selectedCoreClientIndex.getAndIncrement() % coreClientMap.size()];
+            throw new GraknClientException(CLUSTER_UNABLE_TO_CONNECT, (Object[]) addresses); // remove ambiguity by casting to Object
         }
     }
 }
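
The discovery loop in this file tries each seed address in turn, returns the first successful server list, and only fails once every address has been tried. The pattern can be sketched in isolation as below. This is an illustration under assumptions: the `discoverFrom` function is a hypothetical stand-in for the gRPC discovery call, server addresses are plain strings, and a generic `RuntimeException` takes the place of `StatusRuntimeException`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical, dependency-free sketch of "try each address until one answers".
final class DiscoverySketch {
    // discoverFrom is an assumed stand-in for the real discovery RPC.
    static Set<String> discover(Function<String, Set<String>> discoverFrom, String... addresses) {
        for (String address : addresses) {
            try {
                // The first address that answers wins; the rest are never contacted.
                return discoverFrom.apply(address);
            } catch (RuntimeException e) {
                // Stand-in for catching StatusRuntimeException: report and move on.
                System.err.println("Server discovery to " + address + " failed.");
            }
        }
        // Every address failed, so surface all attempted addresses in the error.
        throw new IllegalStateException(
                "Unable to connect. Attempted these servers: " + Arrays.toString(addresses));
    }

    public static void main(String[] args) {
        Function<String, Set<String>> fake = address -> {
            if (address.equals("down:1729")) throw new RuntimeException("unavailable");
            return new HashSet<>(Arrays.asList("a:1729", "b:1729"));
        };
        System.out.println(discover(fake, "down:1729", "up:1729"));
    }
}
```

Because the method returns only the server set, the caller can build one client and one stub per server afterwards, which is what the constructor in this diff does.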
27 changes: 19 additions & 8 deletions common/exception/ErrorMessage.java
@@ -35,16 +35,15 @@ public static class Client extends ErrorMessage {
         public static final Client MISSING_RESPONSE =
                 new Client(4, "The required field 'res' of type '%s' was not set.");
         public static final Client UNKNOWN_REQUEST_ID =
-                new Client(5, "Received a response with unknown request id '%s'.");
+                new Client(5, "Received a response with unknown request id '%s'.") ;
         public static final Client ILLEGAL_ARGUMENT = new Client(6, "Illegal argument passed into the method: '%s'.");
-
+        public static final Client UNABLE_TO_CONNECT = new Client(7, "Unable to connect to Grakn Core Server.");
         public static final Client CLUSTER_LEADER_NOT_YET_ELECTED =
-                new Client(7, "No leader has been elected for latest known term '%s'.");
-
-        public static final Client CLUSTER_NOT_AVAILABLE =
-                new Client(8, "Attempted connecting to these servers, but none are available: '%s'.");
-        public static final Client UNABLE_TO_CONNECT =
-                new Client(9, "Unable to connect to Grakn Core Server.");
+                new Client(8, "No leader has been elected for latest known term '%s'.");
+        public static final Client CLUSTER_UNABLE_TO_CONNECT =
+                new Client(9, "Unable to connect to Grakn Cluster. Attempted connecting to these servers, but none are available: '%s'.");
+        public static final Client CLUSTER_SERVER_NOT_A_LEADER =
+                new Client(10, "Server is not a leader");

         private static final String codePrefix = "CLI";
         private static final String messagePrefix = "Illegal Client State";
@@ -97,4 +96,16 @@ public static class Query extends ErrorMessage {
             super(codePrefix, number, messagePrefix, message);
         }
     }
+
+    public static class Internal extends ErrorMessage {
+        public static final Internal UNEXPECTED_INTERRUPTION =
+                new Internal(1, "Unexpected thread interruption!");
+
+        private static final String codePrefix = "INT";
+        private static final String messagePrefix = "Internal Error";
+
+        Internal(int number, String message) {
+            super(codePrefix, number, messagePrefix, message);
+        }
+    }
 }
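
The CLUSTER_UNABLE_TO_CONNECT template has a single '%s', while the call site in GraknClient.java passes `(Object[]) addresses`, which spreads the array into one varargs parameter per address. The sketch below shows how the two casts behave. It assumes that `ErrorMessage.message(Object...)` delegates to `String.format`, which this diff does not show; the `message` helper here is a hypothetical stand-in written on that assumption.

```java
import java.util.Arrays;

// Hypothetical stand-in; assumes the real message(Object...) delegates to String.format.
final class VarargsFormatSketch {
    static final String TEMPLATE =
            "Attempted connecting to these servers, but none are available: '%s'.";

    static String message(Object... parameters) {
        return String.format(TEMPLATE, parameters);
    }

    public static void main(String[] args) {
        String[] addresses = {"a:1729", "b:1729"};

        // (Object[]) spreads the array: two parameters for one '%s', so the
        // second address is silently ignored by String.format.
        System.out.println(message((Object[]) addresses));

        // (Object) passes the array as one parameter, which '%s' renders with
        // the array's default toString() rather than its contents.
        System.out.println(message((Object) addresses));

        // Rendering the array first gives one parameter that lists every address.
        System.out.println(message(Arrays.toString(addresses)));
    }
}
```

Under that assumption, an error raised with several seed addresses would name only the first one, so joining or rendering the addresses before formatting may be worth considering.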
7 changes: 5 additions & 2 deletions common/exception/GraknClientException.java
@@ -23,6 +23,7 @@
 import io.grpc.StatusRuntimeException;

 import javax.annotation.Nullable;
+import java.util.Objects;

 public class GraknClientException extends RuntimeException {

@@ -33,15 +34,17 @@ public GraknClientException(String error) {
         this.errorMessage = null;
     }

-    public GraknClientException(ErrorMessage error) {
-        super(error.toString());
+    public GraknClientException(ErrorMessage error, Object... parameters) {
+        super(error.message(parameters));
         assert !getMessage().contains("%s");
         this.errorMessage = error;
     }

     public static GraknClientException of(StatusRuntimeException statusRuntimeException) {
         if (statusRuntimeException.getStatus().getCode() == Status.Code.UNAVAILABLE) {
             return new GraknClientException(ErrorMessage.Client.UNABLE_TO_CONNECT);
+        } else if (statusRuntimeException.getStatus().getCode() == Status.Code.INTERNAL && statusRuntimeException.getStatus().getDescription().contains("[RFT01]")) {
Member Author

In this particular if block I want to check if the server throws an exception because there's no leader yet.

The server will throw an exception of code "[RFT01]", and the only way to propagate that information about the exception is by embedding it in the message, hence the need for getDescription().contains(...).

Is there a better way to propagate exceptions than this?

Do you have better ideas?

Member

@alexjpwalker, Jan 25, 2021

The best alternatives that we have right now would be to either:

  • change the protocol and server: don't throw an exception if there's no leader, but rather encode the error in protobuf (but this modifies the behaviour in a strange way); or
  • add a new method to protocol and server to check if the leader exists - but this is both inefficient and non-atomic.

So in conclusion parsing the error message is the least bad option right now.

There are plans, in the future, to implement a dedicated Error message in protocol, which we will be able to interpret without the need for hacks: see https://github.com/graknlabs/client-java/issues/180 .

Member Author

I've moved the hacky code into its own method and added a TODO: 268c884
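
A minimal sketch of what isolating the check in its own method could look like is shown below. It is not the code from commit 268c884, which is not visible here. The `Code` and `ClientError` enums and both method names are hypothetical stand-ins so the example runs without gRPC on the classpath. The null guard is there because `Status.getDescription()` is nullable in grpc-java.

```java
// Hypothetical, dependency-free sketch; Code and ClientError stand in for
// io.grpc.Status.Code and ErrorMessage.Client and are not the real types.
final class LeaderErrorSketch {
    enum Code { UNAVAILABLE, INTERNAL, OTHER }

    enum ClientError { UNABLE_TO_CONNECT, CLUSTER_SERVER_NOT_A_LEADER, UNKNOWN }

    // TODO: replace with a dedicated protocol error once one exists, so the
    // client no longer has to inspect the human-readable description.
    static boolean isNoLeaderError(Code code, String description) {
        // The description may be null, so guard before searching it.
        return code == Code.INTERNAL
                && description != null
                && description.contains("[RFT01]");
    }

    static ClientError classify(Code code, String description) {
        if (code == Code.UNAVAILABLE) return ClientError.UNABLE_TO_CONNECT;
        if (isNoLeaderError(code, description)) return ClientError.CLUSTER_SERVER_NOT_A_LEADER;
        return ClientError.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(Code.INTERNAL, "[RFT01] no leader elected"));
        System.out.println(classify(Code.INTERNAL, null));
        System.out.println(classify(Code.UNAVAILABLE, "connection refused"));
    }
}
```

Keeping the string match behind one predicate means only that method needs to change when the protocol gains a structured error.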

+            return new GraknClientException(ErrorMessage.Client.CLUSTER_SERVER_NOT_A_LEADER);
         }
         return new GraknClientException(statusRuntimeException.getStatus().getDescription());
     }
4 changes: 2 additions & 2 deletions dependencies/graknlabs/repositories.bzl
@@ -50,8 +50,8 @@ def graknlabs_dependencies():
 def graknlabs_protocol():
     git_repository(
         name = "graknlabs_protocol",
-        remote = "https://github.com/graknlabs/protocol",
-        tag = "2.0.0-alpha-6", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_protocol
+        remote = "https://github.com/lolski/protocol",
Member Author

Revert to graknlabs when typedb/typedb-protocol#110 is merged.

+        commit = "125b3b88d817e84dcb83702eb557eb4d5bcea4a1", # sync-marker: do not remove this comment, this is used for sync-dependencies by @graknlabs_protocol
     )

 def graknlabs_behaviour():
1 change: 0 additions & 1 deletion rpc/RPCDatabaseManager.java
@@ -29,7 +29,6 @@

 import java.util.List;
 import java.util.Map;
-import java.util.concurrent.ConcurrentMap;
 import java.util.function.Supplier;

 import static grakn.client.common.exception.ErrorMessage.Client.MISSING_DB_NAME;