-
Notifications
You must be signed in to change notification settings - Fork 2
Overview
JEM clustering is based on Hazelcast. Each cluster member (calles node) has the same rights and responsibilities of the others (with the exception of the oldest member, that we are going to see in details): this is because Hazelcast implements a peer-to-peer clustering, so that there's no "master" node.
When a node starts up, it checks to see if there's already a cluster in the network. There are two ways to find this out:
- Multicast discovery: if multicast discovery is enabled (this is the default), the node will send a join request in the form of a multicast datagram packet.
- Unicast discovery: if multicast discovery is disabled and TCP/IP join is enabled, the node will try to connect to the IPs defined. If it successfully connects to (at least) one node, then it will send a join request through the TCP/IP connection.
If no cluster is found, the node will be the first member of the cluster. If multicast is enabled, it starts a multicast listener so that it can respond to incoming join requests. Otherwise, it will listen for join request coming via TCP/IP.
If there is an existing cluster already, then the oldest member in the cluster will receive the join request and checks if the request is for the right group. If so, the oldest member in the cluster will start the join process.
In the join process, the oldest member will:
- send the new member list to all members
- tell members to sync data in order to balance the data load
Every member in the cluster has the same member list in the same order. First member is the oldest member so if the oldest member dies, second member in the list becomes the first member in the list and the new oldest member. The oldest member is considered as the JEM cluster coordinator: it will execute those actions that must be executed by a single member (i.e. locks releasing due to a member crash).
Aside the "normal" nodes, there are the Hazelcast clients which are used for the Web Application (running on Apache Tomcat, as well as on any other application server).
Here is a diagram of the various nodes' statuses:
Status | Description |
---|---|
STARTING | in start up phase, node registers itself with this status |
INACTIVE | it’s ready to take JCL to execute |
ACTIVE | the job is running and the node is managing it |
DRAINING | operators perform drain command to node, to block any processing, but the node was ACTIVE and a job is still running |
DRAINED | operators block any processing of this node |
UNKNOWN | when a node is no longer joined to cluster and its status is unknown |
The Execution Environment is a set of logical definition related to cluster which must be used to address the job to the right member to be executed. JEM implements 3 kinds of coordinates, used as tags, named:
- Environment: the name of the cluster jobs must be run on. It must be the same of Hazelcast group name.
- Domain: a subset of nodes (identified by a tag) jobs must be run on (i.e. have a domain for application code).
- Affinity: a subset of nodes of a domain (by one or more tags) jobs must be run on.
Each node belongs to:
- one (and only one) Environment
- zero or one Domain
- zero or more Affinities
Each JCL can be defined to be run on:
- one (and only one) Environment
- zero or one Domain
- zero or more Affinities
JEM manages several queues used to maintain the life-cycle of a job: the queues are implemented using Hazelcast data sharing.
Here is the explanation:
- when a job is submitted for execution by a submitter, it's moved to
preinput
queue: while there, JCL is validated (JCL validation is done by a cluster's node) - after successful JCL validation, job is moved to
input
queue, waiting for job execution - according to Domain and Affinity tags, job is run on an appropriate node and moved to
running
queue - after job ends, it is moved to
output
queue - according to Environment tag, a job can be moved from the input queue to the
routing
queue, waiting for another JEM cluster which will fetch and execute it (see Routing) - if JCL validation is unsuccessful, job is moved into
output
queue
When a job is moved into output queue, the submitter will receive a "job ended" notification (via topic).
In addition to a memory data sharing, one of most important requirements for JEM is to use a global file system (GFS). The main goal is to be able to store data on a common file system so that all jobs could manage them (reading and writing). Nevertheless a GFS is not mandatory, if you desire to have all data spread on all machines and configuring JEM to have separate Environment, by specific Domains and Affinties.
Anyway a GFS is suggested to be used to put the keys and keystores for encryption and licenses used by JEM.
Following folders should be configured:
- data path where all datasets will be stored
- output path where JEM nodes will store all output produced by job, during its execution (see next section about output management)
- sources path where JCL sources are stored. This is necessary for usual import and include statements of JCL
- library path where all native system libraries (like .dll, .so) that are nedded by the executable files present in the binary folder should be stored
- binary path where all the executable files (like .exe, .cmd, .sh ) that are called by the JCL should be stored
- classpath path where all the library (like jar, zip), needed at runtime to accomplished a JCL, should be stored
-
persistence
contains the following directories structure:-
[ENV-NAME]
is the folder containing all the information relative to a specific environment-
config
path containing the configuration files relative to the environment. See Configuring Jem Cluster -
keystores
path where keystores, used by JEM when running in secure mode, are stored. For more information see JEM cluster security -
policy
path where is present the standard policy file provide by JEM installation (policy.js). This file can be modify to reach custom needs. Each of these paths should be mount in a shared file system (may be different shared file systems, one for each path if needed) so that all the nodes in the cluster will refers to files in the same way, will avoid redundancy and will always be up to date relative to the libraries versions, binary versions etc...
-
-
In this documentation, when we referred to the JEM GFS (global file system) we are referring to these paths.
- Introduction
- Installation
- Configuration
- Job Execution
- JCL
- User Interfaces
- Security
- REST api
- Features
- Log Messages
- Sandbox
- Software Quality
- Some performance data