This repository contains a Dockerfile which generates TDB2 datasets for Fuseki.
It can:
- Create TDB2 datasets.
- Create a spatial index for the dataset.
- Create a text index for the dataset.
Create a TDB2 dataset in the current directory from the RDF files in ./data:
docker run \
-v "./data:/rdf" \
-v "$(pwd):/fuseki/databases" \
--rm \
ghcr.io/kurrawong/tdb2-generation:latest
Note
To persist the generated dataset files, you need to mount a volume to the location
where the dataset will be created.
Typically, this is the location of the tdb2 dataset as specified in the mounted
assembler description (/config.ttl).
If no assembler description is given then the dataset will be created at
/fuseki/databases/ds
This can be overridden with the $DATASET environment variable.
See the Environment Variables section below for more information.
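The lookup order described in this note can be sketched in shell as follows. This is an illustration only, not the image's actual entrypoint: the function name `resolve_dataset_path` and the `sed` pattern are assumptions, and the pattern only handles a `tdb2:location "..."` statement written on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the documented precedence for the dataset path.
#   1. $DATASET, if set
#   2. tdb2:location "..." from the assembler description, if the file exists
#   3. /fuseki/databases/ds
resolve_dataset_path() {
    local config="${1:-/config.ttl}"
    local location=""

    if [ -n "${DATASET:-}" ]; then
        printf '%s\n' "$DATASET"
        return 0
    fi

    if [ -f "$config" ]; then
        # Take the first quoted value that follows tdb2:location (assumed to be on one line).
        location="$(sed -n 's/.*tdb2:location[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)"
    fi

    # Fall back to the documented default when nothing was found.
    printf '%s\n' "${location:-/fuseki/databases/ds}"
}
```

Calling `resolve_dataset_path ./config.ttl` on the host prints the container path that would need a volume mounted over it for the dataset files to persist.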
The loading process can be configured by passing environment variables to the container. See the table below for all available options.
Text and spatial indexes are opt-in and are not generated by default.
To create a TDB2 dataset with both a text and a spatial index:
docker run \
-e "SPATIAL=true" \
-e "TEXT=true" \
-v "./data:/rdf" \
-v "$(pwd):/fuseki/databases" \
-v "./config.ttl:/config.ttl" \
--rm \
ghcr.io/kurrawong/tdb2-generation:latest
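Because text indexing requires an assembler description at /config.ttl, it can help to check the host-side paths before starting the container. When the host side of a `-v` bind mount does not exist, Docker creates it as an empty directory, so a missing ./config.ttl does not fail at mount time. A minimal sketch, assuming the paths from the example above; the function name `check_inputs` is hypothetical and not part of the image:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the host-side paths used in the docker run example.
check_inputs() {
    local data_dir="${1:-./data}"
    local config="${2:-./config.ttl}"

    if [ ! -d "$data_dir" ]; then
        echo "RDF directory not found: $data_dir" >&2
        return 1
    fi

    # TEXT indexing needs the assembler description; it must be a regular file.
    if [ -n "${TEXT:-}" ] && [ ! -f "$config" ]; then
        echo "TEXT is set but $config is not a file" >&2
        return 1
    fi
    return 0
}
```

The check reads `TEXT` from the calling shell, so set it there (for example `export TEXT=true`) before calling `check_inputs ./data ./config.ttl`, and only start the container if the function succeeds.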
Variable | Purpose | Default | Usage Example |
---|---|---|---|
JENA_VERSION | Which version of Jena/Fuseki to use for building the database. | 5.5.0 (options: [ 5.5.0, ... ]) | JENA_VERSION=5.5.0 |
SPATIAL | If set, do spatial indexing. | unset (false) | SPATIAL=true |
TEXT | If set, do text indexing. Requires an assembler description mounted at /config.ttl. | unset (false) | TEXT=true |
THREADS | Sets the number of threads to use for processing (only applies to tdb2.xloader). | Number of available processors minus 1 | THREADS=4 |
USE_XLOADER | If set, use tdb2.xloader instead of tdb2.tdbloader. See tdb.xloader | unset (false) | USE_XLOADER=true |
TDB2_MODE | Specifies the loader mode for tdb2.tdbloader. See tdbloader options | phased if not set | TDB2_MODE=sequential |
DATASET | Specifies the path where the TDB2 dataset should be created. | If no assembler description is mounted at /config.ttl, it defaults to /fuseki/databases/ds. Otherwise it is derived from the tdb2:location "..." ; statement in /config.ttl. | DATASET=/fuseki/databases/myds |
SKIP_VALIDATION | If set, skip the validation check. By default, invalid RDF files are marked as *.invalid and not processed. | unset (false) | SKIP_VALIDATION=true |
SKIP_LOAD | If set, skip the TDB2 generation. Allows indexing an already built dataset or applying validation only. | unset (false) | SKIP_LOAD=true |
GRAPH | Optional named graph for triples (only used for tdb2.tdbloader, not tdb2.xloader). | unset | GRAPH=https://graphs/example |
JVM_ARGS | General Java args. | unset | JVM_ARGS=-Xmx4G |
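As the SKIP_LOAD row suggests, loading and indexing can be split into two runs: build the dataset first, then add the indexes to the already built dataset. The sketch below wraps both steps in shell functions. The function names and the `DOCKER` override (useful for a dry run with `DOCKER=echo`) are assumptions for illustration and are not provided by the image.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented docker run invocation.
# Set DOCKER=echo to print the command instead of running it.
IMAGE="ghcr.io/kurrawong/tdb2-generation:latest"

run_generation() {
    # Extra arguments (e.g. -e "SKIP_LOAD=true") are passed through to docker run.
    "${DOCKER:-docker}" run "$@" \
        -v "./data:/rdf" \
        -v "$(pwd):/fuseki/databases" \
        -v "./config.ttl:/config.ttl" \
        --rm \
        "$IMAGE"
}

# Step 1: load the RDF files without building any index.
load_only() {
    run_generation
}

# Step 2: skip the load and index the dataset built in step 1.
index_only() {
    run_generation -e "SKIP_LOAD=true" -e "SPATIAL=true" -e "TEXT=true"
}
```

Running `load_only` followed by `index_only` from the directory that holds ./data and ./config.ttl mounts the same volumes in both steps, so the second run sees the dataset written by the first.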
To build the image locally:
docker build . -t tdb2-generation:dev
To run it against some test data and config:
docker compose up