Description
As we continue to evaluate OrientDB, we've discovered a critical bug in OrientJS: a "hang" which occurs whenever an OrientJS client attempts to interact with a distributed OrientDB instance as the database host is initializing.
First, some context: We run an online MMO game. On the weekends we usually see a spike in database connections. One of our test cases for evaluating OrientDB is to add additional replicas to our server cluster to accommodate the increased load we see on weekends (to grow connection throughput). In our tests, as we launched new database instances while keeping the service online, our client apps became "stuck" as they attempted to connect to newly launched distributed hosts.
repro steps
To reproduce the issue, run the simple example below while launching a distributed database. Note that we are running OrientDB 2.1-rc5 in distributed mode (dserver.sh) and are using the latest OrientJS package from npm. This bug only happens if the client attempts to connect while the database is coming online: that is, while it is in the "starting" state, before it logs that it is in the "ready" state.
- Launch dserver.sh
- Immediately after, as the server is initializing, run our test app.
- Note that the query never completes - you can enable detailed logging in OrientJS and see that it hangs when connecting to the database.
discussion
We have spent significant time investigating the issue and have concluded that the flaw must be in OrientJS. We wrote a simple program that does the same thing in Java and noticed that the Java driver appears to handle this edge case gracefully, emitting the following error:
com.orientechnologies.orient.core.exception.OStorageException: Cannot create a connection to remote server address(es): [ip address].
From the server's perspective, we see the following error in its log, indicating that it is refusing connections, since it is not yet "ready".
2015-07-29 11:10:55:943 WARNI Distributed server is not yet ONLINE (status=STARTING), reject incoming connection from /127.0.0.1:50438. If you are trying to shutdown the server, please kill the process [OServerNetworkListener]
When we probed deeper into OrientJS in an attempt to isolate the flaw, we discovered that the problem appears to be in establishing a connection to the database. The Promise used by the binary transport to connect to the database (in negotiateConnection) appears to never be fulfilled. We can tell that the process itself does not "hang" because the heartbeat print in our example continues to appear, but the Promise never settles, and as such, the query never completes, resulting in a "hung" Promise chain.
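Until the driver settles that Promise on its own, a caller can at least bound the wait. The following is a minimal sketch, not part of OrientJS: withTimeout is a hypothetical helper name, and it assumes a runtime with a global Promise. A Promise that never settles stands in for the stuck connection, so the sketch runs without a database.

```javascript
'use strict';

// Hypothetical helper (not an OrientJS API): rejects if `promise` has not
// settled within `ms` milliseconds, otherwise passes its result through.
function withTimeout(promise, ms, label) {
  var timer;
  var timeout = new Promise(function (resolve, reject) {
    timer = setTimeout(function () {
      reject(new Error(label + ' timed out after ' + ms + ' ms'));
    }, ms);
  });

  // Cancel the timer once the wrapped Promise settles either way,
  // so a pending timer does not keep the process alive.
  function clear() {
    clearTimeout(timer);
  }
  promise.then(clear, clear);

  return Promise.race([promise, timeout]);
}

// Stand-in for the connection Promise that never settles.
var neverSettles = new Promise(function () {});

withTimeout(neverSettles, 100, 'connect')
  .catch(function (err) {
    // Reached after roughly 100 ms instead of waiting forever.
    console.error(err.message);
  });
```

In our example app this would wrap the call to db.query, so that the .catch handler finally runs. Note that this only surfaces the hang to the caller; it does not close or retry the underlying connection, so the real fix still belongs in the driver.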
example app
'use strict';

var orientjs = require('orientjs');

// NOTE change these to match your local setup:
var server = orientjs({
  host: 'localhost',
  port: 2424,
  username: 'root',
  password: 'password',
});

// NOTE change these to match your local setup:
var db = server.use({
  name: 'GratefulDeadConcerts',
  username: 'admin',
  password: 'admin'
});

db.query('delete vertex v limit 500')
  .then(function(info) {
    // NOTE this never happens
    console.log(info);
    process.exit(0);
  })
  .catch(function(err) {
    // NOTE this never happens
    console.error(err);
    process.exit(1);
  });

// NOTE this is here to illustrate that the process itself is not hung
setInterval(function() {
  console.log('heartbeat');
}, 1000);