csvw-validator is java implementation of W3C CSV on the Web validator.
Java - version 8 (higher versions should be compatible, but are not tested)
Maven - version 3.3.9+
Run command:
mvn clean install
which will install the application with all 3 modules:
csvw-validator-lib- module which contains the whole validator logic (including parsing).csvw-validator-cli-app- module with simple command line validator applicationcsvw-validator-web-app- module, which contains Spring Boot application with REST web service and simple Vaadin Web UI.
Web application is in module csvw-validator-web-app. After project installation you should open the folder with this module and run command:
mvn spring-boot:run
or you can take generated JAR (eg. csvw-validator-web-app-1.0.0-SNAPSHOT.jar) and run command:
java -jar csvw-validator-web-app-1.0.0-SNAPSHOT.jar
Both commands will start the application server on port 8080. After startup open http://localhost:8080/ in your browser.
If you want to specify the port, use these command instead:
mvn spring-boot:run -Dspring-boot.run.arguments=--server.port={port}
java -jar csvw-validator-web-app-1.0.0-SNAPSHOT.jar --server.port={port}
where instead {port} you can insert the port number.
Web service is in module csvw-validator-web-app. After project installation you should open the folder with this module and run command:
mvn spring-boot:run
or you can take generated JAR (eg. csvw-validator-web-app-1.0.0-SNAPSHOT.jar) and run command:
java -jar csvw-validator-web-app-1.0.0-SNAPSHOT.jar
Both commands will start the application server on port 8080. After startup you will have ready REST web service at http://localhost:8080/.
If you want to specify the port, use these command instead:
mvn spring-boot:run -Dspring-boot.run.arguments=--server.port={port}
java -jar csvw-validator-web-app-1.0.0-SNAPSHOT.jar --server.port={port}
where instead {port} you can insert the port number.
Web service has two endpoints:
/validate- for validating one tabular data file and/or metadata file./validateBatch- for validating multiple files./validationResult/{resultId}- for obtaining result of validation.
REST API documentation is included in RAML in file /resources/validator.raml.
Endpoint /validate validates one tabular data file or metadata file or both together.
It receives POST HTTP requests with content-type and accept set to application/json.
So request body has to be JSON with format:
{
"tabularUrl": "http://www.w3.org/2013/csvw/tests/test286.csv",
"metadataUrl": "http://www.w3.org/2013/csvw/tests/test286-metadata.json",
"strictMode": false,
"language": "en"
}
Where each property is optional, but at least one of tabularUrl or metadataUrl must be specified.
TabularUrlis string property for URL of tabular data file.MetadataUrlis string property for URL of metadata file.StrictModeis boolean property, which disable/enable strict mode. By default is strict mode enabled. In strict mode tabular files are validating against RFC 4180 (CSV comma delimiter, line endings \r\n, UTF-8 encoding,...).Languageis code of language in which all error messages should translate. Default is "en", validator also supports "cs".
Response looks like this:
{
"id": "resultId",
"validationStatus": "ERROR",
"tabularUrl": "http://www.w3.org/2013/csvw/tests/test286.csv",
"metadataUrl": "http://www.w3.org/2013/csvw/tests/test286-metadata.json",
"validationErrors": [
{
"severity": "ERROR",
"message": "Cell (row 2 column 1) cannot be formatted as 'integer' dataype."
}
],
"warningCount": 0,
"errorCount": 1,
"fatalCount": 0,
"totalErrorsCount": 1
}
Id identifier of the result.
ValidationStatus is result of validation - PASSED, WARNING or ERROR.
TabularUrl and metadataUrl are URL of validated files.
ValidationErrors contains errors created during validation. Each error has severity (WARNING, ERROR, FATAL).
Then there are number of all errors (totalErrorsCount), warning errors (warningCount), error errorss (errorCount) and fatal errors (fatalCount).
Example request in cURL:
curl -X POST \
http://localhost:8080/validate \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-H 'Postman-Token: cbfb628b-0444-48f4-8307-5f8a3d011644' \
-H 'cache-control: no-cache' \
-d '{
"tabularUrl": "http://www.w3.org/2013/csvw/tests/test286.csv",
"metadataUrl": "http://www.w3.org/2013/csvw/tests/test286-metadata.json",
"strictMode": false
}'
Endpoint /validateBatch validates multiple tabular data files and/or metadata files.
It receives POST HTTP requests with content-type and accept set to application/json.
So request body has to be JSON with format:
{
"filesToProcess": [
{
"tabularUrl": "http://www.w3.org/2013/csvw/tests/test282.csv",
"metadataUrl": "http://www.w3.org/2013/csvw/tests/test282-metadata.json"
},
{
"tabularUrl": "http://www.w3.org/2013/csvw/tests/test283.csv"
},
{
"metadataUrl": "http://www.w3.org/2013/csvw/tests/test284-metadata.json"
},
{
"tabularUrl": "http://www.w3.org/2013/csvw/tests/test286.csv",
"metadataUrl": "http://www.w3.org/2013/csvw/tests/test286-metadata.json"
}
],
"strictMode": false
"filesResults": false
}
Where stricMode and language are the same properties as in /validate endpoint - it enables strict mode and set language of messages.
Property filesToProcess is array property, which contains object with tabularUrl and/or metadataUrl as in /validate endpoint.
Last property is boolean property filesResults, which enables or disables results for each validated file in response.
If it is disabled, only result ID is sent back and not the whole result. By default is set to true.
Response can look like this JSON:
{
"filesCount": 3 ,
"passedFilesCount": 3 ,
"warningFilesCount": 0 ,
"errorFilesCount": 0 ,
"filesResults": [
{
"id": "resultId1"
},
{
"id": "resultId2"
},
{
"id": "resultId3"
},
{
"id": "resultId4"
}
]
}
It has property filesCount, which contains number of validated files.
Then there are counters for passed (passedFileCount), warning (warningFileCount) and error (errorFileCount) files.
For each validated files, there is resultId and ValidationResponse (object from /validate endpoint) in filesResults.
In case when property filesResults was set to false in the request, there are just resultIds and not ValidationResponses.
In project there is Shell script batchValidate.sh which runs batchValidation with over 1000 files.
Endpoint /validationResult/{resultId} returns result in JSON format as in /validate endpoint.
If result with given ID doesn't exists, endpoint will return 404 response code.
Command-line application is located in module csvw-validator-cli-app.
Take generated JAR (eg. csvw-validator-cli-app-1.0.0-SNAPSHOT.jar.) and start the application with command:
java -jar csvw-validator-cli-app-1.0.0-SNAPSHOT.jar [-f <FILE>] [-s <SCHEMA>]
[-o <OUTPUT>] [--strict] [-h] [--rdf] [--csv]
-f,--file <FILE> The CSV file URL to be processed
-s,--schema <SCHEMA> The CSVW schema URL to be processed
-o,--output <OUTPUT> The name of output file without suffix
--not-strict Disables strict mode
--csv Validator will generate output also in CSV format.
--rdf Validator will generate output also in RDF format
-h,--help Show help
At least FILE or SCHEMA must be specified.
Default file name (path) of output file is result. Command-line application will print result of validating to output and also create
default text result file (eg. result.txt).
If --rdf or --csv arguments are specified, then validator will create extra files with these formats (eg. result.ttl and result.csv).
If OUTPUT is specified, then it is used as output file instead result.
Option --not-strict disables the strict mode.
Example: For validating by command line with these arguments:
java -jar validator.jar -f http://www.w3.org/2013/csvw/tests/test286.csv
-s http://www.w3.org/2013/csvw/tests/test286-metadata.json --rdf --csv -o ../newResult
text result is:
Tabular URL: http://www.w3.org/2013/csvw/tests/test286.csv
Metadata URL: http://www.w3.org/2013/csvw/tests/test286-metadata.json
Result: ERROR
Strict mode: false
Total errors: 1
Warning errors: 0
Error errors: 1
Fatal errors: 0
Errors:
ERROR: Cell (row 2 column 1) cannot be formatted as 'integer' datatype.
which will be printed to output and also it will be in file newResults.txt generated in parent folder.
Module csvw-validator-lib can be used as validator.
If you use Maven you can easily add dependency, but first you need to add maven-dependency-plugin.
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<dependencies>
<dependency>
<groupId>org.apache.felix</groupId>
<artifactId>maven-bundle-plugin</artifactId>
<version>2.4.0</version>
<type>maven-plugin</type>
</dependency>
</dependencies>
<extensions>true</extensions>
</plugin>
</plugins>
</build>
then you have to add dependencies:
<dependencies>
<dependency>
<groupId>com.malyvoj3</groupId>
<artifactId>csvw-validator-lib</artifactId>
<version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>5.0.8.RELEASE</version>
</dependency>
</dependencies>
Than you can use classes from csvw-validator-lib. But this module uses Spring framework for dependency injection, so you have to create
the whole dependencies between class instances or use Spring. If you want for example just class CsvwProcessor, which is main class for validating,
you can get it with all its dependencies with this code (which is also used in command-line application):
ApplicationContext ctx = new AnnotationConfigApplicationContext(ProcessorConfig.class);
CsvwProcessor processor = ctx.getBean(CsvwProcessor.class);
// processor is the Spring bean with all dependencies
You can generate HTML with command:
mvn javadoc:javadoc
After that, each module will have HTML documentation in /target/site folder.
Or it is at GitHub pages: https://malyvoj3.github.io/csvw-validator/