MarkLogic Content Pump (mlcp) and Gradle
The `MlcpTask` class allows you to invoke MarkLogic Content Pump (mlcp) via a Gradle task.

One benefit of using `MlcpTask` instead of `JavaExec` is that `MlcpTask` uses the `mlHost`, `mlUsername` (or `mlRestAdminUsername`), and `mlPassword` (or `mlRestAdminPassword`) properties by default. These are defined in the `mlAppConfig` instance that ml-gradle creates in Gradle. Another benefit is that you don't need to download mlcp and put its executable on your path. You can run it from anywhere, because all of mlcp's libraries are downloaded via Gradle. That's also handy for running mlcp on a CI server such as Jenkins.
`MlcpTask` also provides task properties for most of mlcp's command-line arguments. These are just syntactic sugar: since `MlcpTask` extends `JavaExec`, you can always pass arguments through JavaExec's `args` property.
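For example, an mlcp option you'd rather pass directly can go through `args`. This is a sketch that assumes the `mlcp` configuration declared in the example build file further down this page, and assumes `MlcpTask` passes `args` along with the arguments it builds from its own properties. The task name is hypothetical; `-thread_count` is a standard mlcp import option and the value 4 is arbitrary.

```groovy
// Hypothetical task name
task importWithArgs(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  input_file_path = "my-input-file.txt"
  // Passed to mlcp as-is via JavaExec's args property
  args = ["-thread_count", "4"]
}
```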
Note that you don't need `MlcpTask` to use mlcp at all: you can use `JavaExec` and configure all of the command-line arguments yourself. In particular, if you use an MLCP options file to specify arguments for MLCP, the syntactic sugar provided by `MlcpTask` won't be of any help.
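For the options-file case, a plain `JavaExec` task might look like the sketch below. It assumes the `mlcp` configuration from the example build file on this page; the task name and the `mlcp-options.txt` file name are hypothetical. `com.marklogic.contentpump.ContentPump` is mlcp's main class, and `-options_file` is mlcp's own option for reading arguments from a file.

```groovy
// Hypothetical task name; runs mlcp directly, without MlcpTask's defaults
task importWithOptionsFile(type: JavaExec) {
  // Assumes the "mlcp" configuration declared elsewhere in this build file
  classpath = configurations.mlcp
  // mlcp's main class
  mainClass = "com.marklogic.contentpump.ContentPump"
  // Command first, then the (hypothetical) options file holding host, credentials, etc.
  args = ["import", "-options_file", "mlcp-options.txt"]
}
```

Because the connection details live in the options file rather than in `args`, only the file name appears when Gradle logs the `JavaExec` command line.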
As of ml-gradle 4.3.0, you should use at least Gradle 6.6. If you'd like to use Gradle 7.0 or higher, you must use ml-gradle 4.3.1 or higher.
Below is an example of using `MlcpTask` and pulling in the mlcp dependencies (this omits the configuration needed for pulling in ml-gradle; see the mlcp-project build file for a more complete example, which shows both import and export tasks):
```groovy
plugins {
  id "com.marklogic.ml-gradle" version "4.3.1"
}

repositories {
  mavenCentral()
  maven { url "https://developer.marklogic.com/maven2/" }
}

configurations {
  mlcp
}

dependencies {
  mlcp "com.marklogic:mlcp:10.0.6.2"
}

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  database = "my-database"
  input_file_path = "my-input-file.txt"
  input_file_type = "delimited_text"
  output_collections = "my-collection"
  // Can also override the default properties
  // username = "some-other-username"
  // etc.
}
```
See Dynamically creating tasks for tips on reducing duplication across many MLCP tasks.
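As a rough illustration of that idea, the sketch below registers one `MlcpTask` per input file in a loop. The file names, collection names, and resulting task names are hypothetical, and it assumes the `mlcp` configuration from the example above.

```groovy
// Hypothetical mapping of input files to target collections
def imports = [
  "customers.csv": "customers",
  "orders.csv"   : "orders"
]

imports.each { fileName, collectionName ->
  // Registers tasks named importCustomers and importOrders
  tasks.register("import${collectionName.capitalize()}", com.marklogic.gradle.task.MlcpTask) {
    classpath = configurations.mlcp
    command = "IMPORT"
    input_file_path = "data/${fileName}"
    input_file_type = "delimited_text"
    output_collections = collectionName
  }
}
```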
MLCP uses Log4j for logging. When you pull in MLCP as a Gradle dependency, you don't get a default log4j.properties file, so you won't see any logging from MLCP.

This is easy to fix. One way is to create a file such as ./lib/log4j.properties (the directory name can be anything) and then add "lib" as an mlcp dependency, which puts the directory on mlcp's classpath where Log4j can find the file:
```groovy
dependencies {
  mlcp "com.marklogic:mlcp:10.0.6.2"
  mlcp files("lib")
}
```
And here's a very simple log4j.properties file that you can use as a starting point:
```properties
# Root logger option
log4j.rootLogger=INFO, stdout

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```
If you execute an instance of `MlcpTask` with Gradle's info or debug logging enabled, all of the arguments passed to MLCP, including passwords, will be logged by the `JavaExec` parent class. To avoid this, choose one of the following options:

- Don't use info or debug logging when running an instance of `MlcpTask`, or any instance of `JavaExec` where passwords are passed as plaintext.
- Use an MLCP options file. In that case, just use `JavaExec` so that you don't inherit the unwanted behavior where `MlcpTask` automatically sets a password based on `mlRestAdminPassword`.
Note that if neither info nor debug logging is enabled, `MlcpTask` will print all of the non-password arguments passed to it.
Be aware that `MlcpTask` defaults to using port 8000. If you specify a transform parameter in your `MlcpTask`, you will need to set the "port" parameter to the port of your XDBC server, or of a REST server that supports XDBC requests.
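A sketch of a transform-enabled import follows. It assumes that `MlcpTask` exposes a `transform_module` property mirroring mlcp's `-transform_module` option (in the same way `input_file_path` mirrors `-input_file_path`), and that the REST server on your `mlRestPort` accepts XDBC requests. The task name and module path are hypothetical.

```groovy
// Hypothetical task name
task importWithTransform(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  input_file_path = "my-input-file.txt"
  // Override the default of 8000; assumes this REST server accepts XDBC requests
  port = mlAppConfig.restPort
  // Hypothetical path of a transform module in the app's modules database
  transform_module = "/transforms/my-transform.sjs"
}
```

The port matters because mlcp looks up the transform module in the modules database of the app server it connects to.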
New in ml-gradle 2.6.0: you can set the `logOutputUri` parameter to define a URI that mlcp log output is written to:
```groovy
task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
}
```
And new in 3.12.0: you can provide a custom `DatabaseClient` to control which database the log output is written to (it defaults to `mlAppConfig.newDatabaseClient()`):
```groovy
task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
  logClient = mlAppConfig.newModulesDatabaseClient() // Just notional - reference or construct any DatabaseClient you want
}
```
When running mlcp via Gradle on Windows, you're likely to see the following message logged:

```
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
```
It reads as an exception, but unless you're using certain Hadoop-based features within mlcp, you can safely ignore it. If MLCP instead throws an error later on, you should probably use the MLCP standalone distribution instead of `MlcpTask`.
You can also suppress the message by performing the following steps:

- Create a dummy lib\bin\winutils.exe file in your project.
- Add the following to your task of type `MlcpTask`:

```groovy
systemProperties = ["hadoop.home.dir" : "$project.rootDir/lib"]
```
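If the same build also runs on Linux or macOS, you may prefer to set the property only on Windows. The sketch below does that with a plain `os.name` check and uses `JavaExec`'s `systemProperty` method, which adds one entry instead of replacing the whole map. The task name is hypothetical, and `lib` is assumed to be the directory containing the dummy `bin\winutils.exe`.

```groovy
// Hypothetical task name
task importOnAnyOs(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  // ... other mlcp properties ...

  // Hadoop only looks for bin\winutils.exe on Windows
  if (System.getProperty("os.name").toLowerCase().contains("windows")) {
    systemProperty "hadoop.home.dir", "$project.rootDir/lib"
  }
}
```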