JCLReferenceUsingSpringBatch

stockiNail edited this page Oct 12, 2015 · 11 revisions

Spring Batch as JCL

Many applications within the enterprise domain require bulk processing to perform business operations in mission-critical environments. These business operations include automated, complex processing of large volumes of information that is most efficiently processed without user interaction. These operations typically include time-based events (for instance month-end calculations, notices or correspondence), periodic application of complex business rules processed repetitively across very large data sets (for instance insurance benefit determination or rate adjustments), or the integration of information received from internal and external systems that typically requires formatting, validation and processing in a transactional manner into the system of record. Batch processing is used to process billions of transactions every day for enterprises.

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advanced enterprise services when necessary.

Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques. Simple as well as complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.

Spring Batch is part of Spring.

Installation of JEM-SpringBatch module

Spring Batch integration is provided out of the box in the JEM distribution and installation file.

See here how Spring Batch is configured.

Properties definition

To use Spring Batch inside JEM, the Spring Batch XML JCL must contain the properties that JEM needs, defined by a bean of class org.pepstock.jem.springbatch.JemBean whose id attribute is set to jem.bean.

The properties are described as follows:

  • Job Name is an optional string property, set by the JemBean property jobName. If missing, the id attribute of the job element is used. If both are missing, an exception occurs.
  • Environment is an optional string property, set by the JemBean property environment. If missing, the environment defined for the JEM node is used.
  • Domain is an optional string property, set by the JemBean property domain. If missing, the default value (***) is used.
  • Affinity is an optional string property, set by the JemBean property affinity. If missing, the default value (***) is used.
  • User is an optional string property, set by the JemBean property user. If missing, the default value (null) is used. An exception occurs if the user who submitted the job is not authorized to change the user the job runs as.
  • Locking Scope is an optional string property, set by the JemBean property lockingScope. If missing, the default value (job) is used. If the value is not one of the allowed values (job, step), an exception occurs.
  • Hold is an optional boolean property, set by the JemBean property hold. If missing, the default value (false) is used.
  • Priority is an optional integer property, set by the JemBean property priority. If missing, the default value (10) is used.
  • Memory is an optional integer property, set by the JemBean property memory, expressed in megabytes. If missing, the default value (128) is used.
  • Classpath is an optional string property, set by the JemBean property classPath, which extends the classpath of the JVM where Spring Batch runs. The value is a list of files separated by semicolons (;). Files that are not absolute paths are resolved relative to the jem.classpath folder. The value can contain variables, which JEM substitutes. If missing, the default value (null) is used. The paths are added at the end of the default classpath built by JEM.
  • Prior classpath is an optional string property, set by the JemBean property priorClassPath, which extends the classpath of the JVM where Spring Batch runs. The value is a list of files separated by semicolons (;). Files that are not absolute paths are resolved relative to the jem.classpath folder. The value can contain variables, which JEM substitutes. If missing, the default value (null) is used. The paths are added at the beginning of the default classpath built by JEM.
  • Emails Notification is an optional string property, set by the JemBean property emailNotificationAddresses. If missing, the default value (null) is used.
  • Java is an optional string property, set by the JemBean property java. If missing, JEM uses the JRE of the JEM node.
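The classPath and priorClassPath rules above can be illustrated with a small stand-alone Java sketch. It is not JEM's implementation: the ClassPathComposer class is hypothetical, variable substitution is assumed to have already happened, and joining the result with the platform path separator is an assumption.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JEM) showing how priorClassPath and
// classPath could be combined with the default classpath built by JEM.
final class ClassPathComposer {

    private ClassPathComposer() {
    }

    // Splits a semicolon-separated value and resolves relative entries
    // against the folder referenced by jem.classpath.
    static List<String> resolve(String value, Path jemClasspathFolder) {
        List<String> entries = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            return entries; // property missing: nothing to add
        }
        for (String raw : value.split(";")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            Path path = Paths.get(entry);
            Path resolved = path.isAbsolute() ? path : jemClasspathFolder.resolve(path);
            entries.add(resolved.toString());
        }
        return entries;
    }

    // priorClassPath entries first, then the default classpath,
    // then classPath entries at the end.
    static String compose(String priorClassPath, List<String> defaultEntries,
                          String classPath, Path jemClasspathFolder) {
        List<String> all = new ArrayList<>(resolve(priorClassPath, jemClasspathFolder));
        all.addAll(defaultEntries);
        all.addAll(resolve(classPath, jemClasspathFolder));
        return String.join(File.pathSeparator, all);
    }
}
```

For example, with jem.classpath pointing to /jem/classpath, the value "a.jar;/abs/b.jar" resolves to /jem/classpath/a.jar followed by /abs/b.jar.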

Two more properties can be set to be compliant with a CommandLineJobRunner invocation:

  • options could be:
    • -restart: (optional) to restart the last failed execution
    • -stop: (optional) to stop a running execution
    • -abandon: (optional) to abandon a stopped execution
    • -next: (optional) to start the next in a sequence
  • parameters are the job parameters used to launch the job, specified as key=value pairs separated by blanks
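A minimal sketch of how the options and parameters values could be interpreted, assuming a single option and parameter values that contain no blanks. LaunchArguments is a hypothetical helper, not part of JEM or Spring Batch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of JEM) interpreting "options" and "parameters".
final class LaunchArguments {

    // The four options listed for CommandLineJobRunner compatibility.
    static final List<String> VALID_OPTIONS =
            Arrays.asList("-restart", "-stop", "-abandon", "-next");

    private LaunchArguments() {
    }

    // Splits "key1=value1 key2=value2" on blanks, then each pair on the first '='.
    static Map<String, String> parseParameters(String parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        if (parameters == null || parameters.trim().isEmpty()) {
            return result;
        }
        for (String pair : parameters.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a key=value pair: " + pair);
            }
            result.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return result;
    }

    // Rejects anything that is not one of the documented options; null means no option.
    static String checkOption(String option) {
        if (option != null && !VALID_OPTIONS.contains(option)) {
            throw new IllegalArgumentException("Unknown option: " + option);
        }
        return option;
    }
}
```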

An example of org.pepstock.jem.springbatch.JemBean usage follows:

<beans:bean id="jem.bean" class="org.pepstock.jem.springbatch.JemBean">
   <beans:property name="jobName" value="JOB1"/>
   <beans:property name="environment" value="ENV1"/>
   <beans:property name="domain" value="domain"/>
   <beans:property name="affinity" value="classA"/>
   <beans:property name="user" value="newUser"/>
   <beans:property name="hold" value="true"/>
   <beans:property name="lockingScope" value="job"/>
   <beans:property name="priority" value="99"/>
   <beans:property name="memory" value="1024"/>
   <beans:property name="classPath" value="${jem.classpath}/jdbc.jar"/>
   <beans:property name="priorClassPath" value="${jem.classpath}/jdbc1.jar"/>
   <beans:property name="emailNotificationAddresses" value="m1@pepstock.org;m2@pepstock.org"/>
   <beans:property name="options" value="-next"/>
   <beans:property name="parameters" value="key1=value1 key2=value2"/>
   <beans:property name="java" value="jdk7"/>
</beans:bean>

To work correctly inside JEM, the JemBean must be defined in the Spring Batch JCL, because this bean performs all the customization needed to integrate Spring Batch and JEM. Without it the job still runs, but some JEM job management features are not available.

The job name is read from the id attribute of the <job> element in the JCL and MUST be equal to the jobName property of org.pepstock.jem.springbatch.JemBean, as follows:

<job id="JOB1">
   <step id="step0">
      <tasklet ref="test"></tasklet>
   </step>
</job>
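The job name rules (jobName property, id of the job element, and the requirement that they match) can be checked outside JEM with the JDK DOM parser. JobNameCheck is a hypothetical helper for validating a JCL before submitting it, not the check JEM performs internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical consistency check (not JEM's implementation) of the job name rules.
final class JobNameCheck {

    private JobNameCheck() {
    }

    // Returns the effective job name of a Spring Batch JCL.
    static String resolveJobName(String jclXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JCLs mix the beans: prefix and the batch namespace
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(jclXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("JCL is not well-formed XML", e);
        }

        // id attribute of the first <job> element, whatever its namespace
        NodeList jobs = doc.getElementsByTagNameNS("*", "job");
        String jobId = jobs.getLength() > 0 ? ((Element) jobs.item(0)).getAttribute("id") : "";

        // jobName property of the bean whose id is "jem.bean"
        String beanJobName = "";
        NodeList beans = doc.getElementsByTagNameNS("*", "bean");
        for (int i = 0; i < beans.getLength(); i++) {
            Element bean = (Element) beans.item(i);
            if (!"jem.bean".equals(bean.getAttribute("id"))) {
                continue;
            }
            NodeList properties = bean.getElementsByTagNameNS("*", "property");
            for (int j = 0; j < properties.getLength(); j++) {
                Element property = (Element) properties.item(j);
                if ("jobName".equals(property.getAttribute("name"))) {
                    beanJobName = property.getAttribute("value");
                }
            }
        }

        if (beanJobName.isEmpty() && jobId.isEmpty()) {
            throw new IllegalStateException("No job name: set jobName or the job id");
        }
        if (!beanJobName.isEmpty() && !jobId.isEmpty() && !beanJobName.equals(jobId)) {
            throw new IllegalStateException(
                    "jobName '" + beanJobName + "' differs from job id '" + jobId + "'");
        }
        return beanJobName.isEmpty() ? jobId : beanJobName;
    }
}
```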

Data Descriptions and Datasets in Spring Batch

Data descriptions and datasets are implemented by specific beans. Since everything in Spring is a bean, a data description can be defined with the org.pepstock.jem.springbatch.tasks.DataDescription bean, as follows:

<beans:bean id="id-data-description" class="org.pepstock.jem.springbatch.tasks.DataDescription">
   <beans:property name="name" value="name-data-description"/>
   <beans:property name="sysout" value="true/false"/>
   <beans:property name="disposition" value="NEW/OLD/MOD/SHR"/>
   <beans:property name="datasets">
      <beans:list>
      ... ... ...
      </beans:list>
   </beans:property>
</beans:bean>

The id attribute of the bean is used only by Spring Batch, but it is needed to link data descriptions to tasks and/or beans.

A data description needs a mandatory name property. The business logic uses this name to access the datasets by JNDI, so it must be unique in the task definition. The optional sysout property, when present, marks the data description as a sysout. The optional disposition property must have one of the following values: NEW, MOD, OLD or SHR.
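The name and disposition rules can be sketched as follows. Disposition and DataDescriptionNames are hypothetical types written for this example, and the sketch assumes that disposition values are matched case-sensitively.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not the JEM classes) of the disposition property.
enum Disposition {
    NEW, OLD, MOD, SHR;

    // A missing value is allowed because disposition is optional;
    // any other value must match one of the four constants exactly.
    static Disposition parse(String value) {
        if (value == null) {
            return null;
        }
        for (Disposition d : values()) {
            if (d.name().equals(value)) {
                return d;
            }
        }
        throw new IllegalArgumentException(
                "Invalid disposition '" + value + "': expected NEW, MOD, OLD or SHR");
    }
}

final class DataDescriptionNames {

    private DataDescriptionNames() {
    }

    // Every data description needs a name, and names must be unique in one task
    // because the business logic looks them up by JNDI.
    static void check(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("name is mandatory");
            }
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate data description name: " + name);
            }
        }
    }
}
```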

Datasets are children of a data description and identify a file or data needed by the business logic. Datasets can be defined as follows:

<beans:bean class="org.pepstock.jem.springbatch.tasks.DataSet">
   <beans:property name="name" value="file-name"/>
   <beans:property name="datasource" value="datasource-name"/>
</beans:bean>
<!-- Inline dataset -->
<beans:bean class="org.pepstock.jem.springbatch.tasks.DataSet">
   <beans:property name="text" value=" Here I am This is a DD"/>
</beans:bean>

Unless it is an inline dataset, a DataSet bean must have a name, set by the value of its name property, which can represent:

  • a file name (absolute or relative path), built from Spring Batch properties if necessary. For a relative path, JEM prepends the value of the jem.data variable, which identifies the global file system holding all data.
  • a GDG file name, following the same rules as a normal file name (see the previous item).
  • a file name with the temporary prefix.

If the name property is missing and the text property is present, a temporary file is created containing the value of the text property. A dataset can also have a datasource property, which links it to a data source defined previously. This works only with FTP resources, and the dataset name is the file to manage by FTP. Reference datasets are not implemented in Spring Batch: since datasets are beans, it is enough to pass the id of the bean to every step that needs it.
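A sketch of how a dataset could be turned into a file path under the rules above. DataSetResolver is hypothetical, the jem.data folder is passed in as a parameter, and GDG relative generations such as (0) or (+1) are not mapped to real generations here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolver (not JEM's code) for the dataset rules above.
final class DataSetResolver {

    private DataSetResolver() {
    }

    // name: file name (may be null); text: inline content (may be null);
    // jemData: folder referenced by the jem.data variable.
    static Path resolve(String name, String text, Path jemData) {
        if (name != null && !name.isEmpty()) {
            Path path = Paths.get(name);
            // relative names are placed under the global data file system
            return path.isAbsolute() ? path : jemData.resolve(path);
        }
        if (text != null) {
            try {
                // inline dataset: its content is written to a temporary file
                Path temp = Files.createTempFile("jem-inline-", ".txt");
                Files.write(temp, text.getBytes(StandardCharsets.UTF_8));
                return temp;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException("A dataset needs either a name or inline text");
    }
}
```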

Data sources in Spring Batch

Data sources are implemented by specific beans and can be defined as follows:

<!-- DB datasource reference -->
<beans:bean id="my-db" class="org.pepstock.jem.springbatch.tasks.DataSource">
   <beans:property name="name" value="my-db"></beans:property>
   <beans:property name="resource" value="resource-db"></beans:property>
</beans:bean>
<!-- FTP datasource reference -->
<beans:bean id="my-ftp" class="org.pepstock.jem.springbatch.tasks.DataSource">
   <beans:property name="name" value="my-ftp"></beans:property>
   <beans:property name="resource" value="my-resource-ftp"></beans:property>
   <beans:property name="properties">
      <beans:list>
         <beans:bean class="org.pepstock.jem.springbatch.tasks.Property">
            <beans:property name="name" value="binary"></beans:property>
            <beans:property name="value" value="true"></beans:property>
         </beans:bean>
      </beans:list>
   </beans:property>
</beans:bean>

The id attribute of the bean is used only by Spring Batch, but it is needed to link data sources to tasks and/or beans.

A data source can have a name property. The business logic uses this name to access the data source (of any kind, such as JDBC or FTP) by JNDI, so it must be unique in the task definition. If missing, the name to use in the business logic or in a dataset definition is the resource name defined in JEM. A data source needs a mandatory resource property, which locates the common resource previously defined in JEM; if the resource is not defined, an exception occurs. A data source can contain one or more properties (Property beans in the properties list) that override at runtime some of the properties used to create the data source. Each property needs a mandatory name. An exception can occur if you try to override a property created in JEM with override="true", typically used for password, user id or URL.

Jdbc-Template of Spring

For data sources of JDBC type, the data source can be used directly with org.springframework.jdbc.core.JdbcTemplate, the Spring utility class for accessing databases. For more details on how to use JdbcTemplate, see the Spring documentation.

Here is an example:

<!-- JdbcTemplate Definition -->
<beans:bean class="org.springframework.jdbc.core.JdbcTemplate" id="jdbcTemplate">  
    <beans:property name="dataSource" ref="jem-db" />  
</beans:bean>  

<!-- DataSource Definition -->
<beans:bean id="jem-db" class="org.pepstock.jem.springbatch.tasks.DataSource">
    <beans:property name="name" value="jem-db" />
    <beans:property name="resource" value="JUNIT_JDBC_JEM" />
</beans:bean>

Locks in Spring Batch

Locks are implemented by specific beans and can be defined as follows:

<beans:bean id="id-lock" class="org.pepstock.jem.springbatch.tasks.Lock">
   <beans:property name="name" value="unique-name-for-env" />
</beans:bean>

The id attribute of the bean is used only by Spring Batch, but it is needed to link locks to tasks and/or beans. A lock needs a mandatory name property, used to create an exclusive lock inside the JEM environment. This name must be unique in the task definition.

Tasklets

Tasklet is a simple interface with one method, execute, which is called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. Each call to the Tasklet is wrapped in a transaction. Tasklet implementers might call a stored procedure, a script, or a simple SQL update statement.

JEM provides its own tasklet implementation, org.pepstock.jem.springbatch.tasks.JemTasklet, which implements the execute method as final and calls another method that the developer must implement. This method looks like the following:

/**
* Abstract method to implement with the business logic, where
* resources can be accessed by JNDI.
*
* @param stepContribution step contribution
* @param chunkContext chunk context
* @return status of execution
* @throws Exception if errors occur
*/
public abstract RepeatStatus run(StepContribution stepContribution, ChunkContext chunkContext) throws Exception;
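The relationship between Tasklet.execute, the TaskletStep loop and JemTasklet.run can be modeled in plain Java. The types below are simplified stand-ins invented for this sketch (no StepContribution or ChunkContext arguments, no transactions), not the real Spring Batch or JEM classes; prepareResources is a hypothetical hook.

```java
// Simplified stand-ins for the Spring Batch types: NOT the real
// org.springframework.batch classes, only enough to show the control flow.
enum RepeatStatus { CONTINUABLE, FINISHED }

interface SimpleTasklet {
    RepeatStatus execute() throws Exception;
}

// Template method in the spirit of JemTasklet: execute is final,
// runs a common preparation step and then delegates to run.
abstract class SimpleJemTasklet implements SimpleTasklet {

    @Override
    public final RepeatStatus execute() throws Exception {
        prepareResources();
        return run();
    }

    // Hypothetical hook: in JEM this is where resources become available by JNDI.
    protected void prepareResources() {
        // nothing to prepare in this sketch
    }

    // The business logic supplied by the developer.
    protected abstract RepeatStatus run() throws Exception;
}

// Minimal model of the TaskletStep loop: execute is called until it returns
// FINISHED; an exception ends the loop and marks the step as failed.
final class SimpleTaskletStep {

    private SimpleTaskletStep() {
    }

    static int runStep(SimpleTasklet tasklet) {
        int calls = 0;
        RepeatStatus status;
        try {
            do {
                // in Spring Batch each call would run inside its own transaction
                status = tasklet.execute();
                calls++;
            } while (status != RepeatStatus.FINISHED);
        } catch (Exception e) {
            throw new IllegalStateException("step failed after " + calls + " successful calls", e);
        }
        return calls;
    }
}
```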

A sample showing how to define your own tasklet with data descriptions follows:

<beans:bean id="my-tasklet" class="org.pepstock.test.springbatch.MyTaskLet">
   <beans:property name="dataDescriptionList">
      <beans:list>
         <beans:ref local="my-single"></beans:ref>
      </beans:list>
   </beans:property>
</beans:bean>

Here is the Java code of the sample tasklet:

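package org.pepstock.test.springbatch;

import java.io.FileInputStream;
import java.util.Hashtable;
import java.util.Scanner;

import javax.naming.Context;
import javax.naming.InitialContext;

import org.pepstock.jem.springbatch.tasks.JemTasklet;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTaskLet extends JemTasklet {

   @Override
   public RepeatStatus run(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
      // JNDI environment that points to the JEM context factory
      Hashtable<String, String> env = new Hashtable<String, String>();
      env.put(Context.INITIAL_CONTEXT_FACTORY, "org.pepstock.jem.node.tasks.jndi.JemContextFactory");
      InitialContext context = new InitialContext(env);

      // looks up the data description by name: a single file is returned as an input stream
      FileInputStream inputStream = (FileInputStream) context.lookup("my-single");

      // reads and prints the records; try-with-resources closes the scanner and the stream
      try (Scanner sc = new Scanner(inputStream)) {
         while (sc.hasNextLine()) {
            String record = sc.nextLine();
            System.out.println(record);
         }
      }
      // exceptions are not caught here, so a failure ends the step in error
      return RepeatStatus.FINISHED;
   }
}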

Chunks

Spring Batch uses a 'chunk-oriented' processing style within its most common implementation. Chunk-oriented processing refers to reading the data one item at a time and creating 'chunks' that are written out within a transaction boundary. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.
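The chunk-oriented loop described above can be sketched in plain Java. SimpleItemReader and SimpleItemWriter are simplified stand-ins for ItemReader and ItemWriter invented for this example, the processor is a java.util.function.Function, and transactions are only marked by comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified stand-ins for ItemReader and ItemWriter, NOT the Spring Batch API.
interface SimpleItemReader<T> {
    T read(); // null means "no more items", as in Spring Batch
}

interface SimpleItemWriter<T> {
    void write(List<? extends T> items);
}

final class ChunkLoop {

    private ChunkLoop() {
    }

    // Reads one item at a time, passes it to the processor and collects the
    // result; every commitInterval items the whole chunk is written (the point
    // where a real step commits its transaction). Returns the number of chunks.
    static <I, O> int process(SimpleItemReader<I> reader, Function<I, O> processor,
                              SimpleItemWriter<O> writer, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        int chunks = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.apply(item));
            if (chunk.size() == commitInterval) {
                writer.write(chunk); // write, then the transaction would be committed
                chunks++;
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) { // last, partial chunk
            writer.write(chunk);
            chunks++;
        }
        return chunks;
    }
}
```

With five items and a commit interval of 2, the writer is called three times: two full chunks and a final chunk with one item.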

JEM provides an ItemReader and an ItemWriter, to be used inside the Spring Batch JCL, that can read and write the datasets defined in a data description.

They are org.pepstock.jem.springbatch.items.DataDescriptionItemReader and org.pepstock.jem.springbatch.items.DataDescriptionItemWriter.

They delegate the actual reading and writing to org.pepstock.jem.springbatch.items.SimpleFileItemReader and org.pepstock.jem.springbatch.items.SimpleFileItemWriter.

The data source and lock concepts are not available with the chunk approach.

A sample showing how to use a chunk with data descriptions follows:

<!-- Item reader and item writer definition -->
<beans:bean id="delegateItemReader" class="org.pepstock.jem.springbatch.items.SimpleFileItemReader">
</beans:bean>

<beans:bean id="delegateItemWriter" class="org.pepstock.jem.springbatch.items.SimpleFileItemWriter">
</beans:bean>

<beans:bean id="itemReader" class="org.pepstock.jem.springbatch.items.DataDescriptionItemReader">
   <beans:property name="dataDescription">
      <beans:ref local="input"></beans:ref>
   </beans:property>
   <beans:property name="delegate" ref="delegateItemReader"></beans:property>
</beans:bean>

<beans:bean id="itemWriter" class="org.pepstock.jem.springbatch.items.DataDescriptionItemWriter">
   <beans:property name="dataDescription">
      <beans:ref local="output"></beans:ref>
   </beans:property>
   <beans:property name="delegate" ref="delegateItemWriter"></beans:property>
</beans:bean>

A sample showing how to configure the chunk in the step element follows:

<job id="JOB1">
   <step id="step0">
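      <tasklet>
         <!-- Spring Batch requires commit-interval (or a completion policy); 10 is an example value -->
         <chunk reader="itemReader" writer="itemWriter" commit-interval="10"></chunk>
      </tasklet>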
   </step>
</job>

Utilities

JEM includes some common tasklets that are helpful in your JCLs.

JEM provides:

  • NullTasklet: does nothing, it just ends with RC=0
  • WaitTasklet: waits for a number of seconds (set as a property) and ends with RC=0
  • CopyTasklet: copies one or more files into another one
  • LauncherTasklet: executes a Runnable or any other bean defined in the application context
  • MainLauncherTasklet: executes a Java main class, even from another classpath

NULL tasklet

NullTasklet is a Spring Batch tasklet that does nothing: its main method contains no statements. It can be useful because it still acquires all the locks for the datasets defined in the tasklet.

Here is a sample:

<!--
Tasklet Definition
-->
<beans:bean id="null" class="org.pepstock.jem.springbatch.tasks.utilities.NullTasklet"></beans:bean>
<!--
null: does nothing
-->
<job id="NULL">
   <step id="nothing">
      <tasklet ref="null"></tasklet>
   </step>
</job>

WAIT tasklet

WaitTasklet is a Spring Batch tasklet that waits for a number of seconds before ending with return code 0. The number of seconds is set by the seconds property. If the property is missing, WaitTasklet waits forever and can be ended only by cancelling it. It can be useful because it acquires all the locks for the datasets defined in the tasklet.

Here is a sample:

<!--
Tasklet Definition
-->
<beans:bean id="wait" class="org.pepstock.jem.springbatch.tasks.utilities.WaitTasklet">
   <beans:property name="seconds" value="60"></beans:property>
</beans:bean>
<!--
wait: wait for 60 seconds
-->
<job id="WAIT">
   <step id="waiting">
      <tasklet ref="wait"></tasklet>
   </step>
</job>
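The WaitTasklet behaviour can be sketched as follows. WaitSketch is hypothetical, and modelling a cancel as a thread interrupt that raises an exception is an assumption, not documented JEM behaviour.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the WaitTasklet behaviour, not the JEM source.
final class WaitSketch {

    private WaitSketch() {
    }

    // Waits for the given number of seconds and returns 0, the return code.
    // With a null value it waits until the thread is interrupted, which is
    // how this sketch models cancelling the job.
    static int waitFor(Integer seconds) {
        try {
            if (seconds == null) {
                while (true) {
                    Thread.sleep(Long.MAX_VALUE);
                }
            }
            if (seconds < 0) {
                throw new IllegalArgumentException("seconds must not be negative");
            }
            TimeUnit.SECONDS.sleep(seconds);
            return 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("wait cancelled", e);
        }
    }
}
```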

COPY tasklet

CopyTasklet is a Spring Batch tasklet that copies one or more datasets into another one. The input datasets must be defined in a data description called INPUT and the output dataset must be defined in a data description called OUTPUT.

Here is a sample:

<!--
Tasklet Definition
-->
<beans:bean id="icegener" class="org.pepstock.jem.springbatch.tasks.utilities.CopyTasklet">
   <beans:property name="dataDescriptionList">
      <beans:list>
         <beans:ref local="INPUT"></beans:ref>
         <beans:ref local="OUTPUT"></beans:ref>
      </beans:list>
   </beans:property>
</beans:bean>
<!--
Data description list
INPUT file
-->
<beans:bean id="INPUT" class="org.pepstock.jem.springbatch.tasks.DataDescription">
   <beans:property name="name" value="INPUT"></beans:property>
   <beans:property name="disposition" value="SHR"></beans:property>
   <beans:property name="datasets">
      <beans:list>
         <beans:bean class="org.pepstock.jem.springbatch.tasks.DataSet">
            <beans:property name="name" value="gdg/jemtest(0)"/>
         </beans:bean>
         <beans:bean class="org.pepstock.jem.springbatch.tasks.DataSet">
            <beans:property name="text" value="
             These records are added to OUTPUT file:
             Record1 test abcdefghjklilmnopqrstuvzxw
             Record2 test abcdefghjklilmnopqrstuvzxw
             Record3 test abcdefghjklilmnopqrstuvzxw"/>
         </beans:bean>
      </beans:list>
   </beans:property>
</beans:bean>
<!--
Data description list
OUTPUT file
-->
<beans:bean id="OUTPUT" class="org.pepstock.jem.springbatch.tasks.DataDescription">
   <beans:property name="name" value="OUTPUT"></beans:property>
   <beans:property name="disposition" value="NEW"></beans:property>
   <beans:property name="datasets">
      <beans:list>
         <beans:bean class="org.pepstock.jem.springbatch.tasks.DataSet">
            <beans:property name="name" value="gdg/jemtest(+1)"/>
         </beans:bean>
      </beans:list>
   </beans:property>
</beans:bean>
<!--
copy: copies GDG generation 0 and the inline records into a new generation
-->
<job id="ICEGENER">
   <step id="copy">
      <tasklet ref="icegener"></tasklet>
   </step>
</job>
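What CopyTasklet does with INPUT and OUTPUT can be approximated with java.nio. This sketch works on plain paths instead of JNDI lookups; CopySketch is a hypothetical class, and concatenating the inputs in list order is an assumption about how several INPUT datasets are combined.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the copy behaviour, not the JEM CopyTasklet source.
final class CopySketch {

    private CopySketch() {
    }

    // Writes every INPUT dataset, in order, into the OUTPUT dataset
    // (created or truncated, as for disposition NEW). Returns the bytes copied.
    static long concatenate(List<Path> inputs, Path output) {
        long total = 0;
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path input : inputs) {
                total += Files.copy(input, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```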

Launcher tasklet

LauncherTasklet is a Spring Batch tasklet that can execute beans defined in the application context.

LauncherTasklet accepts one property:

  • object is a reference to a bean defined in the Spring Batch XML file. The launcher executes java.lang.Runnable instances, or beans that have a method annotated with @ToBeExecuted.

To execute a method annotated with @ToBeExecuted, the launcher can invoke a method without arguments or with 2 arguments, instances of org.springframework.batch.core.StepContribution and org.springframework.batch.core.scope.context.ChunkContext.

Here are the signatures of the executable methods:


class#method()

class#method(org.springframework.batch.core.StepContribution, org.springframework.batch.core.scope.context.ChunkContext)

If StepContribution and ChunkContext are needed, you can also use 2 special annotations at field level:

  • @AssignStepContribution assigns the StepContribution instance to a field of the bean to be executed.
  • @AssignChunkContext assigns the ChunkContext instance to a field of the bean to be executed.

Here is a sample of a Spring Batch application context that configures the execution of a bean:

<beans:bean id="run1" class="org.pepstock.jem.junit.test.springbatch.java.RestRunnable"/>

<beans:bean id="step1" class="org.pepstock.jem.springbatch.tasks.utilities.LauncherTasklet">
	<beans:property name="object" ref="run1" />
	<beans:property name="dataSourceList">
			<beans:list>
				<beans:ref local="jem-db"/>
			</beans:list>
	</beans:property>		
</beans:bean>

<!-- DataSource Definition -->
<beans:bean id="jem-db" class="org.pepstock.jem.springbatch.tasks.DataSource">
	<beans:property name="name" value="JUNIT-REST-RESOURCE" />
	<beans:property name="resource" value="JUNIT-REST-RESOURCE" />
</beans:bean>

Here are some samples of how to write a bean that the launcher can execute:

RUNNABLE

public class MyRunnable implements Runnable {
	
	@AssignStepContribution
	StepContribution stepContribution = null;
	
	@AssignChunkContext
	ChunkContext chunkContext = null;

	@Override
	public void run(){

	}
}

BEAN with annotations

public class MyBean {
	
	@AssignStepContribution
	StepContribution stepContribution = null;
	
	@AssignChunkContext
	ChunkContext chunkContext = null;

	@ToBeExecuted
	public void exec(){

	}
}

BEAN without field annotations, but with the two-argument method

public class MyBean {
	
	@ToBeExecuted
	public void exec(StepContribution stepContribution, ChunkContext chunkContext){

	}
}
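The way LauncherTasklet chooses what to run can be sketched with reflection. The annotations and the StepContributionStub and ChunkContextStub classes are local stand-ins declared only for this example (the real annotations are provided by JEM), and LauncherSketch is hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Local stand-ins for this sketch only, not the JEM annotations or Spring Batch types.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ToBeExecuted { }
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface AssignStepContribution { }
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface AssignChunkContext { }
class StepContributionStub { }
class ChunkContextStub { }

final class LauncherSketch {

    private LauncherSketch() {
    }

    static void launch(Object bean, StepContributionStub contribution, ChunkContextStub context) {
        try {
            // 1. inject the contexts into annotated fields
            for (Field field : bean.getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(AssignStepContribution.class)) {
                    field.setAccessible(true);
                    field.set(bean, contribution);
                } else if (field.isAnnotationPresent(AssignChunkContext.class)) {
                    field.setAccessible(true);
                    field.set(bean, context);
                }
            }
            // 2. Runnable beans are simply run
            if (bean instanceof Runnable) {
                ((Runnable) bean).run();
                return;
            }
            // 3. otherwise look for a @ToBeExecuted method with a supported signature
            for (Method method : bean.getClass().getMethods()) {
                if (!method.isAnnotationPresent(ToBeExecuted.class)) {
                    continue;
                }
                Class<?>[] params = method.getParameterTypes();
                method.setAccessible(true);
                if (params.length == 0) {
                    method.invoke(bean);
                    return;
                }
                if (params.length == 2 && params[0] == StepContributionStub.class
                        && params[1] == ChunkContextStub.class) {
                    method.invoke(bean, contribution, context);
                    return;
                }
            }
            throw new IllegalArgumentException(bean.getClass().getName()
                    + " is neither Runnable nor has a supported @ToBeExecuted method");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```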

Main Launcher tasklet

MainLauncherTasklet is a Spring Batch tasklet that can execute a Java main class (a class with a public static void main(String[] args) method).

The tasklet can use a custom class loader, separate from the system one. To stay compliant with JEM and use data sources or datasets, add the system property ${java.class.path} at the end of your classpath.

MainLauncherTasklet accepts the following properties:

  • className is a string with the name of the class to load and execute. It must be a Java main class.
  • arguments is a list of arguments to pass to the main class.
  • classPath is a list of files (an element can also contain more than one file, separated by semicolons).

If you do not set classPath, you can use the following field annotations in your code:

  • @AssignStepContribution assigns the StepContribution instance to a field of the main class to be executed (the field must be static).
  • @AssignChunkContext assigns the ChunkContext instance to a field of the main class to be executed (the field must be static).

Here is a sample of how to configure the tasklet with a specific classPath:

<beans:bean id="null" class="org.pepstock.jem.springbatch.tasks.utilities.MainLauncherTasklet">
	<beans:property name="className" value="org.pepstock.jem.junit.test.springbatch.java.DataSourceConnMain" />
	<beans:property name="arguments">
		<beans:list>
		<beans:value>argument1</beans:value>
		<beans:value>argument2</beans:value>
		</beans:list>
	</beans:property>
	<beans:property name="classPath">
		<beans:list>
		   <beans:value>${JEM_HOME}/lib/jem-junit.jar</beans:value>
		   <beans:value>${JEM_HOME}/lib/db/*</beans:value>
		   <beans:value>${java.class.path}</beans:value>
		</beans:list>
	</beans:property>
	<beans:property name="dataSourceList">
			<beans:list>
				<beans:ref local="jem-db"/>
			</beans:list>
	</beans:property>		
</beans:bean>

<!-- DataSource Definition -->
<beans:bean id="jem-db" class="org.pepstock.jem.springbatch.tasks.DataSource">
	<beans:property name="name" value="jem-db" />
	<beans:property name="resource" value="JUNIT_JDBC_JEM" />
</beans:bean>
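A sketch of how a main class can be loaded through a separate class loader and invoked with its arguments. MainLauncherSketch is hypothetical; it assumes a class loader whose parent is the bootstrap loader, which is why ${java.class.path} would have to be listed explicitly, and it does not expand variables or wildcards such as ${JEM_HOME}/lib/db/*.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the JEM source) of running a main class in its own class loader.
final class MainLauncherSketch {

    private MainLauncherSketch() {
    }

    // Each classPath element may hold several paths separated by ';'
    // (or by the platform path separator, as in ${java.class.path}).
    static URL[] toUrls(List<String> classPath) {
        String separators = "[;" + File.pathSeparator + "]";
        List<URL> urls = new ArrayList<>();
        for (String element : classPath) {
            for (String path : element.split(separators)) {
                String trimmed = path.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                try {
                    urls.add(new File(trimmed).toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalArgumentException("Invalid path: " + trimmed, e);
                }
            }
        }
        return urls.toArray(new URL[0]);
    }

    // Loads className with a class loader whose parent is the bootstrap loader
    // (null), so only the listed paths and the JDK classes are visible, then
    // invokes its static main(String[]) method.
    static void launch(String className, List<String> arguments, List<String> classPath) {
        try (URLClassLoader loader = new URLClassLoader(toUrls(classPath), null)) {
            Class<?> mainClass = Class.forName(className, true, loader);
            Method main = mainClass.getMethod("main", String[].class);
            if (!Modifier.isStatic(main.getModifiers())) {
                throw new IllegalArgumentException(className + ".main is not static");
            }
            // the cast passes the array as the single String[] argument
            main.invoke(null, (Object) arguments.toArray(new String[0]));
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException("Cannot run " + className, e);
        }
    }
}
```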

Remember that you can also use the classPath property of JemBean (see here for more details).

Extensible XML authoring by JEM

Since version 2.0, Spring has featured a mechanism for schema-based extensions to the basic Spring XML format for defining and configuring beans.

JEM uses this Spring feature, adding XML elements to define its own entities, such as data descriptions, datasets and tasklets. To activate this feature, add the JEM namespace as follows:

<beans:beans ....
        xmlns:jem="http://www.pepstock.org/schema/jem" 
        ...
        xsi:schemaLocation="
           ...
           http://www.pepstock.org/schema/jem
           http://www.pepstock.org/schema/jem/jem.xsd">

By importing this namespace (see the complete schema here), you can make Spring Batch JCLs easier to read, using specific XML elements to define:

  • JEM configuration
  • JEM tasklet
    • Data description
    • Data set
    • Lock
    • Data source
  • JEM itemReader and ItemWriter
  • JEM launcher
  • JEM JAVA main class launcher

The root elements are tasklet (JPPF or not) and configuration. All other elements are children of tasklet, that is, nested elements usable only inside a tasklet.

JEM configuration

To use Spring Batch inside JEM, the Spring Batch XML JCL must contain the properties that JEM needs, defined by a bean of class org.pepstock.jem.springbatch.JemBean whose id attribute is set to jem.bean.

With the extensible XML authoring, a new XML tag has been created, which you can use as follows:

  <jem:configuration jobName="ICEGENER-2" environment="TEST-Env" />

All attribute names are the same as the property names you can set on the jem.bean bean. Please have a look here for the list of properties you can use.

This is the only mandatory element you must define in your Spring Batch JCL to run it inside JEM, because JEM needs to be aware of the Spring application context (it adds its own StepListener).

JEM tasklet

JEM provides its own tasklet implementation, org.pepstock.jem.springbatch.tasks.JemTasklet, which implements the execute method as final and calls another method that the developer must implement.

With the extensible XML authoring, a new XML tag has been created, which you can use as follows:

<!-- Tasklet Definition -->
<jem:tasklet id="icegener"
                class="org.pepstock.jem.springbatch.tasks.utilities.CopyTasklet">
...
...
</jem:tasklet>

Tasklets usually use data descriptions (and therefore datasets), locks and data sources.

With the extensible XML authoring, new XML tags have been created for all these entities. These tags must be nested inside the tasklet tag, as follows:

<!-- Tasklet Definition -->
<jem:tasklet id="icegener"
                class="org.pepstock.jem.springbatch.tasks.utilities.CopyTasklet">
    <jem:dataDescription name="INPUT" disposition="SHR">
         <jem:dataSet name="gdg1/jemtest(0)" />
         <jem:dataSet>
             These records are added to OUTPUT file:
             Record1 test
             abcdefghjklilmnopqrstuvzxw
             Record2 test abcdefghjklilmnopqrstuvzxw
             Record3 test abcdefghjklilmnopqrstuvzxw
         </jem:dataSet>
    </jem:dataDescription>

    <jem:dataDescription name="OUTPUT" disposition="NEW">
        <jem:dataSet name="gdg1/jemtest(1)" />
    </jem:dataDescription>

</jem:tasklet>


<jem:tasklet id="ftp" class="org.pepstock.jem.springbatch.tasks.utilities.CopyTasklet">

    <jem:lock name="traffic-light"/>
        
    <jem:dataDescription name="INPUT" disposition="SHR">
         <jem:dataSet name="Action.java" datasource="localhost" />
    </jem:dataDescription>

    <jem:dataDescription name="OUTPUT" disposition="NEW">
         <jem:dataSet name="gdg1/jemtest(1)" />
    </jem:dataDescription>
                
    <jem:dataSource name="localhost" resource="FTPlocalhost">
         <jem:property name="binary">true</jem:property>
    </jem:dataSource>

</jem:tasklet>

The tasklet element has 2 mandatory attributes:

  • id: bean id, to use in the step element of Spring Batch
  • class: java class, implementation of org.pepstock.jem.springbatch.tasks.JemTasklet

The dataDescription, dataSet, lock and dataSource elements use the attributes already explained in the previous sections.

JEM itemReader and itemWriter

JEM provides its own item implementations, org.pepstock.jem.springbatch.items.DataDescriptionItemReader and org.pepstock.jem.springbatch.items.DataDescriptionItemWriter, which use a delegate to read and write.

With the extensible XML authoring, new XML tags have been created, which you can use as follows:

<jem:itemReader id="reader"	delegate="org.pepstock.jem.springbatch.items.SimpleFileItemReader">
	...
</jem:itemReader>

<jem:itemWriter id="writer"	delegate="org.pepstock.jem.springbatch.items.SimpleFileItemWriter">
	...
</jem:itemWriter>

Item readers and writers usually use data descriptions (and therefore datasets), locks and data sources.

With the extensible XML authoring, new XML tags have been created for all these entities. These tags must be nested inside the itemReader or itemWriter tag, as follows:

<jem:itemReader id="reader"
	delegate="org.pepstock.jem.springbatch.items.SimpleFileItemReader">
	<jem:dataDescription name="INPUT" disposition="SHR">
		<jem:dataSet name="gdg/jemtest(0)" />
		<jem:dataSet>
			These records are added to OUTPUT file:
			Record1 test abcdefghjklilmnopqrstuvzxw
			Record2 test abcdefghjklilmnopqrstuvzxw
			Record3 test abcdefghjklilmnopqrstuvzxw
		</jem:dataSet>
	</jem:dataDescription>
</jem:itemReader>

<jem:itemWriter id="writer"
	delegate="org.pepstock.jem.springbatch.items.SimpleFileItemWriter">
	<jem:dataDescription name="OUTPUT" disposition="NEW">
		<jem:dataSet name="gdg/jemtest(1)" />
	</jem:dataDescription>
</jem:itemWriter>

JEM launcher

JEM provides a launcher tasklet implementation, org.pepstock.jem.springbatch.tasks.utilities.LauncherTasklet, which can execute beans defined in the application context.

With the extensible XML authoring, a new XML tag has been created, which you can use as follows:

<jem:launcher id="step1" object="bean_reference">
...
</jem:launcher>

The launcher tasklet usually uses data descriptions (and therefore datasets), locks and datasources.

With the extensible XML authoring, new XML tags have been created for all of these entities. These tags must be nested inside the launcher tag, as follows:

<beans:bean id="run1" class="org.pepstock.jem.junit.test.springbatch.java.RestRunnable"/>

<jem:launcher id="step1" object="run1">
	<jem:dataSource name="JUNIT-REST-RESOURCE" resource="JUNIT-REST-RESOURCE"/>
</jem:launcher>
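The launcher executes a bean defined in the application context, such as run1 above. The sketch below is a hypothetical bean (RecordCounter and its records property are illustrative names, not JEM classes). It assumes, as the name of the RestRunnable test class suggests, that the launched object implements java.lang.Runnable; the contract actually required by LauncherTasklet should be checked against the JEM API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical business bean referenced by <jem:launcher object="...">.
// Assumption: LauncherTasklet calls run() on the referenced bean.
class RecordCounter implements Runnable {

    // Injected by Spring through <beans:property name="records">
    private List<String> records = new ArrayList<>();

    // Result of the last run, readable once the step has completed
    private int nonBlankRecords;

    public void setRecords(List<String> records) {
        this.records = records;
    }

    public int getNonBlankRecords() {
        return nonBlankRecords;
    }

    @Override
    public void run() {
        // Pure business logic: count the records that carry some data
        int count = 0;
        for (String record : records) {
            if (record != null && !record.trim().isEmpty()) {
                count++;
            }
        }
        nonBlankRecords = count;
    }
}
```

Such a bean would be declared as a beans:bean with its fully qualified class name, like run1, and referenced through the object attribute of jem:launcher.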

JEM Java main class launcher

JEM provides a launcher tasklet implementation, org.pepstock.jem.springbatch.tasks.utilities.MainLauncherTasklet, which is able to execute Java main classes (classes with a public static void main(String[] args) method).

With the extensible XML authoring, a new XML tag has been created, which you can use as follows:

<jem:main-launcher id="null" className="my.MainClass">
	...
	<jem:arguments>
		<jem:argument>arg 1</jem:argument>
		...
		<jem:argument>arg N</jem:argument>
	</jem:arguments>
	<jem:classPath>
		<jem:pathElement>path 1</jem:pathElement>
		...
		<jem:pathElement>path N</jem:pathElement>
	</jem:classPath>
</jem:main-launcher>

The main launcher tasklet usually uses data descriptions (and therefore datasets), locks and datasources. The nested elements are:

  • arguments: the list of arguments passed to the main method as an array
  • classPath: the path elements (jars or folders) to load in order to run the Java main class
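A class run by the main launcher only needs a public static void main(String[] args) method; the jem:argument values are passed in args, presumably in the order in which they are declared. The sketch below is a hypothetical MainClass (the JCL examples use my.MainClass; the package is omitted here) that keeps its logic in a separate method so it can be tested without the launcher.

```java
import java.util.Arrays;

// Hypothetical main class for <jem:main-launcher className="...">.
// Assumption: the jem:argument values arrive in args in declaration order.
class MainClass {

    public static void main(String[] args) {
        System.out.println(summarize(args));
    }

    // Business logic kept apart from main() so that it can be unit tested
    static String summarize(String[] args) {
        if (args.length == 0) {
            return "no arguments";
        }
        return args.length + " argument(s): " + Arrays.toString(args);
    }
}
```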

With the extensible XML authoring, new XML tags have been created for all of these entities. These tags must be nested inside the main-launcher tag, as follows:

<jem:main-launcher id="null" className="my.MainClass">
	<jem:dataSource name="JUNIT-REST-RESOURCE" resource="JUNIT-REST-RESOURCE"/>
	<jem:arguments>
		<jem:argument>FirstArgument</jem:argument>
	</jem:arguments>
	<jem:classPath>
		<jem:pathElement>${jem.library}/my/my.jar</jem:pathElement>
		<jem:pathElement>${java.class.path}</jem:pathElement>
	</jem:classPath>
</jem:main-launcher>
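Conceptually, a main class launcher has to build a class loader from the classPath elements, load the class named by className and invoke its main method with the arguments. The sketch below shows that generic technique with the standard URLClassLoader and reflection; it is not the code of MainLauncherTasklet, and MainLauncherSketch and EchoMain are hypothetical names. Each path element is split on the platform path separator, because a value such as ${java.class.path} expands to several entries.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of launching a main class; not JEM's implementation
final class MainLauncherSketch {

    private MainLauncherSketch() {
    }

    // Turns classPath elements into URLs; one element may hold several entries
    // separated by File.pathSeparator (e.g. the resolved ${java.class.path})
    static URL[] toUrls(List<String> pathElements) {
        List<URL> urls = new ArrayList<>();
        for (String element : pathElements) {
            for (String entry : element.split(File.pathSeparator)) {
                if (entry.isEmpty()) {
                    continue;
                }
                try {
                    urls.add(new File(entry).toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalArgumentException("Invalid path element: " + entry, e);
                }
            }
        }
        return urls.toArray(new URL[0]);
    }

    // Loads className through a class loader built on the path elements
    // and calls its public static void main(String[] args) synchronously
    static void launch(String className, String[] args, List<String> pathElements) {
        ClassLoader parent = MainLauncherSketch.class.getClassLoader();
        try (URLClassLoader loader = new URLClassLoader(toUrls(pathElements), parent)) {
            Class<?> mainClass = Class.forName(className, true, loader);
            Method main = mainClass.getMethod("main", String[].class);
            if (!Modifier.isStatic(main.getModifiers())) {
                throw new IllegalArgumentException(className + ".main is not static");
            }
            // Cast to Object so the array is passed as the single main() argument
            main.invoke(null, (Object) args);
        } catch (InvocationTargetException e) {
            // Exception thrown by main() itself
            throw new IllegalStateException("main failed in " + className, e.getCause());
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException("Unable to launch " + className, e);
        }
    }
}

// Hypothetical target class, used only to demonstrate the launch
class EchoMain {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args;
    }
}
```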

Property placeholder

Spring provides a property placeholder configurer, org.springframework.beans.factory.config.PropertyPlaceholderConfigurer, which resolves ${...} placeholders against local properties and/or system properties and environment variables.

Here is an example:

<beans:bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <beans:property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/>
    <beans:property name="ignoreUnresolvablePlaceholders" value="true"/>
</beans:bean>

However, this utility resolves placeholders only inside the Spring application context (the JEM JCL).

JEM also needs a property placeholder that can resolve placeholders inside the properties of resources.

For instance, a connection property of a database could set the path where a trace log is written. In this case, the resource property can be set using a placeholder, which JEM resolves at runtime using local properties and/or system properties and environment variables.

For this reason, JEM provides its own property placeholder, org.pepstock.jem.springbatch.tasks.PropertyPlaceholder, which extends the Spring one, keeping all of its capabilities and adding placeholder resolution inside resource properties.

Here is an example of how to use it inside the Spring application context (JEM JCL):

<beans:bean class="org.pepstock.jem.springbatch.tasks.PropertyPlaceholder">
    <beans:property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/>
    <beans:property name="ignoreUnresolvablePlaceholders" value="true"/>
</beans:bean>
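To illustrate what resolving placeholders inside resource properties means, the sketch below resolves ${...} in a property value with the precedence that SYSTEM_PROPERTIES_MODE_OVERRIDE gives the Spring configurer: system properties first, then environment variables, then local properties. Unresolvable placeholders are left untouched, as with ignoreUnresolvablePlaceholders set to true. It is a plain Java illustration, not the code of org.pepstock.jem.springbatch.tasks.PropertyPlaceholder; PlaceholderSketch and the property names used with it are hypothetical, and nested placeholders or default values are not handled.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of placeholder resolution in resource properties
final class PlaceholderSketch {

    // Matches ${name}; nested placeholders and ${name:default} are not supported
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private PlaceholderSketch() {
    }

    // Mirrors SYSTEM_PROPERTIES_MODE_OVERRIDE: system property, then
    // environment variable, then local property
    static String lookup(String name, Properties local) {
        String value = System.getProperty(name);
        if (value == null) {
            value = System.getenv(name);
        }
        if (value == null) {
            value = local.getProperty(name);
        }
        return value;
    }

    // Resolves every ${...} in a resource property value; unresolvable
    // placeholders are kept as-is, like ignoreUnresolvablePlaceholders=true
    static String resolve(String text, Properties local) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = lookup(matcher.group(1), local);
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```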

Restartability

(quoted from the Spring Batch documentation)

One key issue when executing a batch job concerns the behavior of a job when it is restarted. The launching of a job is considered to be a 'restart' if a job execution already exists for the particular job instance. Ideally, all jobs should be able to start up where they left off, but there are scenarios where this is not possible. It is entirely up to the developer to ensure that a new job instance is created in this scenario. However, Spring Batch does provide some help.

(Figure from the Spring Batch reference documentation: http://docs.spring.io/spring-batch/reference/html/images/job-repository-advanced.png)

Restartability on JEM

JEM, the BEE, provides a set of beans to use Spring Batch restartability out of the box, so that developers can concentrate only on the business logic and use the StepExecution and JobExecution beans to save checkpoints during the execution. Spring Batch relies on the following beans to implement restartability:

  • JobRepository, bean name "jobRepository": used for basic CRUD operations on the various persisted domain objects within Spring Batch, such as JobExecution and StepExecution.
  • JobExplorer, bean name "jobExplorer": provides the ability to query the repository for existing executions.
  • TransactionManager, bean name "transactionManager": ensures that the batch metadata, including the state that is necessary for restarts after a failure, is persisted correctly.

JEM provides specific implementations of these 3 beans, to use restartability out of the box.

This feature is not activated in the standard JEM configuration. To activate it, change the jem_env.xml configuration file, where the SpringBatch JCL factory is defined, as follows:

<factories>
	...
	<factory className="org.pepstock.jem.springbatch.SpringBatchFactory">
		<properties>
			<property name="jem.jdbc.url" value="jdbc:mysql://hostname/springbatch" /> 
			<property name="jem.jdbc.driver" value="com.mysql.jdbc.Driver" /> 
			<property name="jem.jdbc.user" value="userid" /> 
			<property name="jem.jdbc.password" value="blahblah" />
			<property name="jem.jdbc.type" value="mysql" />
		</properties>
	</factory>
	...
</factories>

The SpringBatch JCL factory needs specific properties that define the datasource used for restartability:

  • jem.jdbc.url: the JDBC URL of the datasource
  • jem.jdbc.driver: the Java JDBC driver class of the datasource. Put the driver jars in the /lib/ext folder of JEM
  • jem.jdbc.user: the JDBC user ID of the datasource
  • jem.jdbc.password: the JDBC password of the datasource
  • jem.jdbc.type: the type of the datasource. You must specify one of the types supported by Spring Batch:
    • db2 for DB2 or DB2 for z/OS
    • derby for Apache Derby
    • h2 for H2
    • hsqldb for HSQL Database Engine
    • mysql for MySQL
    • oracle10g for Oracle
    • postgresql for PostgreSQL
    • sqlserver for Microsoft SQL Server
    • sybase for Sybase
    • sqlf for SQLFire
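
A missing property or a mistyped jem.jdbc.type is easy to overlook in jem_env.xml, so a small check before deploying the configuration can help. The sketch below is a hypothetical pre-flight check (RestartConfigCheck is not part of JEM) that verifies the five jem.jdbc.* properties are set and that jem.jdbc.type is one of the values listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical validation of the SpringBatch JCL factory properties
final class RestartConfigCheck {

    // Properties required by the factory to define the restartability datasource
    static final List<String> REQUIRED = Collections.unmodifiableList(Arrays.asList(
            "jem.jdbc.url", "jem.jdbc.driver", "jem.jdbc.user",
            "jem.jdbc.password", "jem.jdbc.type"));

    // Values accepted for jem.jdbc.type, as listed in this page
    static final Set<String> TYPES = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "db2", "derby", "h2", "hsqldb", "mysql", "oracle10g",
            "postgresql", "sqlserver", "sybase", "sqlf")));

    private RestartConfigCheck() {
    }

    // Returns the list of problems found; an empty list means the properties look consistent.
    // The type is compared case-sensitively, which is an assumption.
    static List<String> validate(Map<String, String> properties) {
        List<String> problems = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = properties.get(name);
            if (value == null || value.trim().isEmpty()) {
                problems.add("missing " + name);
            }
        }
        String type = properties.get("jem.jdbc.type");
        if (type != null && !type.trim().isEmpty() && !TYPES.contains(type)) {
            problems.add("unsupported jem.jdbc.type: " + type);
        }
        return problems;
    }
}
```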

When the JCL factory is configured as above, all Spring Batch jobs can use restartability, provided that they use the specific beans developed for JEM (described below). The data structures are created automatically if they are missing.

JemTransactionManager

JemTransactionManager is a custom Spring PlatformTransactionManager which uses the properties configured for the JCL factory to create the datasource. This transaction manager doesn't need any additional parameters. It can be added to the Spring Batch JCL as a specific transaction manager (not the main one, with ID "transactionManager"), if you need more than one transaction manager.

Here is how to configure it in your JCL:

<beans:bean class="org.pepstock.jem.springbatch.tasks.JemTransactionManager" id="jemtransactionManager"/>

JemJobRepository

JemJobRepository is a bean which extends the standard JobRepositoryFactoryBean. It can receive a transaction manager as a parameter, but this transaction manager MUST be a JemTransactionManager instance.

Here is how to configure it in your JCL:

<beans:bean class="org.pepstock.jem.springbatch.tasks.JemJobRepository" id="jobRepository">
    <beans:property ref="jemtransactionManager" name="transactionManager"/>
</beans:bean>

JemJobExplorer

JemJobExplorer is a bean which extends the standard JobExplorerFactoryBean. This bean doesn't need any additional parameters.

<beans:bean class="org.pepstock.jem.springbatch.tasks.JemJobExplorer" id="jobExplorer"> </beans:bean>

Putting it all together, here is an example of a Spring Batch JCL that leverages JEM restartability:

<beans:bean class="org.pepstock.jem.springbatch.tasks.JemTransactionManager" id="jemtransactionManager"/>

<beans:bean class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" id="transactionManager"/>

<beans:bean class="org.pepstock.jem.springbatch.tasks.JemJobRepository" id="jobRepository">
	<beans:property ref="jemtransactionManager" name="transactionManager"/>
</beans:bean>

<beans:bean class="org.pepstock.jem.springbatch.tasks.JemJobExplorer" id="jobExplorer"> </beans:bean>

<beans:bean class="org.springframework.batch.core.launch.support.SimpleJobLauncher" id="jobLauncher">
	<beans:property ref="jobRepository" name="jobRepository"/>
</beans:bean>

How to force the restart

JEM uses the Spring Batch command line job runner to execute jobs. To restart a job in JEM, submit the job again with the -restart parameter. To do that, specify the restart option in the JemBean of your Spring Batch JCL, as follows:

<beans:bean class="org.pepstock.jem.springbatch.JemBean" id="jem.bean">
	<beans:property name="jobName" value="NAME"/>
	<beans:property name="environment" value="TEST-Env"/>
	<beans:property name="options" value="-restart 3"/>
</beans:bean>

To restart a Spring Batch job, you need the ID of the execution to restart. You can find this number in the job output log of the previous execution, as follows:

date time INFO   [main] JEMS0022 Job "NAME" is using "job" locking scope.
date time INFO   [main] Job: [FlowJob: [name=NAME]] launched with the following parameters: [{jem.job.id=0000000000000000003-0000001439810106362}]
date time INFO   [main] JEMS0067 JOB ID: 3
date time INFO   [main] Executing step: [waiting]
date time INFO   [main] Job: [FlowJob: [name=NAME]] completed with the following parameters: [{jem.job.id=0000000000000000003-0000001439810106362}] and the following status: [COMPLETED]

Message JEMS0067 reports the job ID (3 in this case) to use for the next restart, if necessary.
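
When restarts are frequent, the ID can be extracted from the previous output log instead of being read by hand. The sketch below assumes the log format shown above (a line containing "JEMS0067 JOB ID:" followed by the number) and builds the value for the options property of JemBean; RestartOptions is a hypothetical helper, not part of JEM.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that reads the restart ID from a JEM job output log
final class RestartOptions {

    // Assumes the message format "JEMS0067 JOB ID: <number>" shown in the sample log
    private static final Pattern JOB_ID = Pattern.compile("JEMS0067 JOB ID: (\\d+)");

    private RestartOptions() {
    }

    // Returns the job ID reported by the last JEMS0067 message in the log, if any
    static OptionalLong findJobId(List<String> logLines) {
        OptionalLong id = OptionalLong.empty();
        for (String line : logLines) {
            Matcher matcher = JOB_ID.matcher(line);
            if (matcher.find()) {
                id = OptionalLong.of(Long.parseLong(matcher.group(1)));
            }
        }
        return id;
    }

    // Builds the value of the JemBean "options" property, e.g. "-restart 3"
    static String restartOption(long jobId) {
        return "-restart " + jobId;
    }
}
```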

Cancelling a job

JEM recognizes when a job is cancelled and automatically updates the restartability datasource, forcing the job execution status to FAILED. Therefore you can resubmit a cancelled job without any manual intervention.

Job ends correctly

To avoid tedious manual cleanup of the Spring Batch restartability datasource, every time a job ends correctly, all executions related to that job are removed automatically.
