
Kurrawong/tdb2-generation


Overview

This repository contains a Dockerfile which generates TDB2 datasets for Fuseki.

It can:

  1. Create TDB2 datasets.
  2. Create a spatial index for the dataset.
  3. Create a text index for the dataset.

Usage

Create a TDB2 dataset from the RDF files in ./data. With no assembler description mounted, the dataset is written to ./ds in the current directory:

docker run \
  -v "$(pwd)/data:/rdf" \
  -v "$(pwd):/fuseki/databases" \
  --rm \
  ghcr.io/kurrawong/tdb2-generation:latest
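Before running the command above, ./data needs at least one RDF file. A minimal sketch for a first trial, assuming the image accepts Turtle (.ttl) files from /rdf; the file name and the example.org IRI are hypothetical:

```shell
# Hypothetical sample input: a single triple in Turtle, written to ./data/example.ttl.
mkdir -p data
cat > data/example.ttl <<'EOF'
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://example.org/thing/1> rdfs:label "Example thing" .
EOF
```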

Note

To persist the generated dataset files, you need to mount a volume to the location
where the dataset will be created.

Typically, this is the TDB2 dataset location given by tdb2:location in the mounted
assembler description (/config.ttl).

If no assembler description is given, the dataset is created at
/fuseki/databases/ds.

This can be overridden with the DATASET environment variable.
See the Environment Variables section below for more information.
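The rules above for choosing the dataset path can be summarised as a small shell function. This is a hypothetical sketch, not the image's entrypoint: resolve_dataset is an illustrative name, the sed expression assumes the tdb2:location value sits on one line, and it assumes DATASET, when set, also takes precedence over tdb2:location (the text above only says it overrides the default).

```shell
# Hypothetical sketch of the documented dataset-path precedence:
#   1. $DATASET, if set (assumed to win over /config.ttl as well);
#   2. otherwise the tdb2:location "..." value from the assembler description;
#   3. otherwise /fuseki/databases/ds.
# Usage: resolve_dataset [CONFIG_PATH]   (CONFIG_PATH defaults to /config.ttl)
resolve_dataset() {
  config="${1:-/config.ttl}"
  if [ -n "${DATASET:-}" ]; then
    echo "$DATASET"
  elif [ -f "$config" ]; then
    # Print the first quoted value following tdb2:location (assumed to be on one line);
    # prints nothing if the file has no tdb2:location.
    sed -n 's/.*tdb2:location[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1
  else
    echo /fuseki/databases/ds
  fi
}
```

For example, with no /config.ttl present and DATASET unset, resolve_dataset prints /fuseki/databases/ds, while DATASET=/fuseki/databases/myds resolve_dataset prints /fuseki/databases/myds.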

The loading process can be configured by passing environment variables to the container. See the table below for all available options.

Text and spatial index creation are opt-in; neither index is generated by default.

To create a TDB2 dataset with both a text and a spatial index:

docker run \
  -e "SPATIAL=true" \
  -e "TEXT=true" \
  -v "$(pwd)/data:/rdf" \
  -v "$(pwd):/fuseki/databases" \
  -v "$(pwd)/config.ttl:/config.ttl" \
  --rm \
  ghcr.io/kurrawong/tdb2-generation:latest
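TEXT=true needs an assembler description at /config.ttl. A minimal sketch of one, written with a shell heredoc: it wraps a TDB2 dataset at /fuseki/databases/ds in a jena-text text:TextDataset and indexes rdfs:label into a Lucene directory. The resource names, the indexed predicate and the index directory /fuseki/databases/ds-text are assumptions, chosen so that with the mounts above both the dataset and the index land in the current directory; a Fuseki service definition could be added to the same file.

```shell
# Hypothetical ./config.ttl for TEXT=true: a jena-text dataset wrapping a TDB2
# dataset, with rdfs:label indexed into Lucene. Names and paths are illustrative.
cat > config.ttl <<'EOF'
@prefix :     <#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix text: <http://jena.apache.org/text#> .

:text_dataset rdf:type text:TextDataset ;
    text:dataset :tdb_dataset ;
    text:index   :lucene_index .

:tdb_dataset rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "/fuseki/databases/ds" .

:lucene_index rdf:type text:TextIndexLucene ;
    text:directory <file:/fuseki/databases/ds-text> ;
    text:entityMap :entity_map .

:entity_map rdf:type text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "label" ;
    text:map ( [ text:field "label" ; text:predicate rdfs:label ] ) .
EOF
```

With this file mounted, the dataset location comes from its tdb2:location, so the command above writes ./ds.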

Environment Variables

| Variable | Purpose | Default | Usage Example |
| --- | --- | --- | --- |
| JENA_VERSION | Which version of Jena/Fuseki to use for building the database. | 5.5.0 (options: [ 5.5.0, ... ]) | JENA_VERSION=5.5.0 |
| SPATIAL | If set, do spatial indexing. | unset (false) | SPATIAL=true |
| TEXT | If set, do text indexing. Requires an assembler description mounted at /config.ttl. | unset (false) | TEXT=true |
| THREADS | Number of threads to use for processing (only applies to tdb2.xloader). | Number of available processors minus 1 | THREADS=4 |
| USE_XLOADER | If set, use tdb2.xloader instead of tdb2.tdbloader. See tdb.xloader. | unset (false) | USE_XLOADER=true |
| TDB2_MODE | Loader mode for tdb2.tdbloader. See tdbloader options. | phased | TDB2_MODE=sequential |
| DATASET | Path where the TDB2 dataset is created. | If no assembler description is mounted at /config.ttl, /fuseki/databases/ds; otherwise derived from the tdb2:location "..." ; statement in /config.ttl. | DATASET=/fuseki/databases/myds |
| SKIP_VALIDATION | If set, skip the validation check. By default, invalid RDF files are marked as *.invalid and not processed. | unset (false) | SKIP_VALIDATION=true |
| SKIP_LOAD | If set, skip TDB2 generation. Allows indexing an already-built dataset or running validation only. | unset (false) | SKIP_LOAD=true |
| GRAPH | Optional named graph to load the triples into (only used by tdb2.tdbloader, not tdb2.xloader). | unset | GRAPH=https://graphs/example |
| JVM_ARGS | General Java arguments. | unset | JVM_ARGS=-Xmx4G |
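How the loader-related variables in the table combine can be sketched as a shell function that only prints a command line. This is a hypothetical illustration, not the image's entrypoint: build_load_cmd is an illustrative name, the flag spellings (--loader, --graph, --threads, --loc) should be checked against tdb2.tdbloader --help and tdb2.xloader --help for the chosen JENA_VERSION, and clamping THREADS to at least 1 is an assumption for single-CPU hosts.

```shell
# Hypothetical sketch: print (not run) the load command implied by the
# USE_XLOADER, THREADS, TDB2_MODE and GRAPH variables.
# Usage: build_load_cmd DATASET_DIR FILE...
build_load_cmd() {
  loc="$1"
  shift
  if [ -n "${USE_XLOADER:-}" ]; then
    # THREADS defaults to (available processors - 1); assumed minimum of 1.
    threads="${THREADS:-$(( $(getconf _NPROCESSORS_ONLN) - 1 ))}"
    if [ "$threads" -lt 1 ]; then
      threads=1
    fi
    # GRAPH is ignored here: it only applies to tdb2.tdbloader.
    echo "tdb2.xloader --threads=$threads --loc $loc $*"
  else
    # TDB2_MODE selects the tdb2.tdbloader loader, defaulting to phased.
    cmd="tdb2.tdbloader --loader=${TDB2_MODE:-phased} --loc $loc"
    if [ -n "${GRAPH:-}" ]; then
      cmd="$cmd --graph=$GRAPH"
    fi
    echo "$cmd $*"
  fi
}
```

For example, USE_XLOADER=true THREADS=4 build_load_cmd /fuseki/databases/ds /rdf/a.ttl prints tdb2.xloader --threads=4 --loc /fuseki/databases/ds /rdf/a.ttl. Printing rather than executing keeps the sketch runnable without Jena installed.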

Development

To build the image locally

docker build . -t tdb2-generation:dev

To run it against some test data and configuration with Docker Compose:

docker compose up

About

Dockerfile to generate Jena TDB2 database from RDF, including spatial index
