Overview

This repository contains a Dockerfile which generates TDB2 datasets for Fuseki.

It can:

- Create TDB2 datasets.
- Create a spatial index for the dataset.
- Create a text index for the dataset.

Usage

Create a TDB2 dataset in the current directory from the RDF files in ./data:

docker run \
  -v "./data:/rdf" \
  -v "$(pwd):/fuseki/databases" \
  --rm \
  ghcr.io/kurrawong/tdb2-generation:latest

Note

To persist the generated dataset files, mount a volume at the location
where the dataset will be created.

Typically, this is the location of the TDB2 dataset as specified in the mounted
assembler description (/config.ttl).

If no assembler description is given, the dataset is created at
/fuseki/databases/ds.

This can be overridden with the DATASET environment variable.
See the Environment Variables section below for more information.
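
The precedence described in this note can be sketched as a small shell function. This is an illustrative sketch only, not the image's actual entrypoint.sh: the function name `resolve_dataset_path` is hypothetical, and the fallback to the default when the assembler file contains no `tdb2:location` statement is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: NOT the image's entrypoint.sh.
# resolve_dataset_path is a hypothetical helper mirroring the documented
# precedence: $DATASET, then tdb2:location in the assembler file, then the default.
resolve_dataset_path() {
  local config="${1:-/config.ttl}"
  local location=""

  # 1. An explicit DATASET value always wins.
  if [ -n "${DATASET:-}" ]; then
    printf '%s\n' "$DATASET"
    return 0
  fi

  # 2. Otherwise take the first tdb2:location "..." value from the assembler file.
  if [ -f "$config" ]; then
    location="$(sed -n 's/.*tdb2:location[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)"
  fi
  if [ -n "$location" ]; then
    printf '%s\n' "$location"
    return 0
  fi

  # 3. Fall back to the documented default (assumed also when no location is found).
  printf '%s\n' "/fuseki/databases/ds"
}
```

Whichever path results is the container directory that needs a volume mounted over it (or over its parent) for the dataset to survive after the container exits.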

The loading process can be configured by passing environment variables to the container. See the table below for all available options.

Text and spatial index creation is opt-in; neither index is generated by default.

To create a TDB2 dataset with a text and spatial index:

docker run \
  -e "SPATIAL=true" \
  -e "TEXT=true" \
  -v "./data:/rdf" \
  -v "$(pwd):/fuseki/databases" \
  -v "./config.ttl:/config.ttl" \
  --rm \
  ghcr.io/kurrawong/tdb2-generation:latest
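
Because text indexing requires an assembler description at /config.ttl, it can help to check on the host that the file exists before starting the container; with `-v`, Docker typically creates a missing host path as an empty directory rather than failing. The helper below is a hypothetical host-side sketch, not part of this repository, and it assumes that any non-empty TEXT value counts as "set".

```shell
#!/usr/bin/env bash
# Hypothetical host-side helper; not part of the repository.
# Returns non-zero when text indexing is requested but the assembler
# description that would be mounted at /config.ttl does not exist.
check_index_prereqs() {
  local text="${1:-}"               # value intended for -e "TEXT=..."
  local config="${2:-./config.ttl}" # host path that would be mounted

  # Assumption: any non-empty TEXT value enables text indexing.
  if [ -n "$text" ] && [ ! -f "$config" ]; then
    echo "TEXT is set but no assembler description found at $config" >&2
    return 1
  fi
  return 0
}
```

A typical use would be to guard the command above, for example `check_index_prereqs true ./config.ttl && docker run ...`.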

Environment Variables

| Variable | Purpose | Default | Usage Example |
| --- | --- | --- | --- |
| `JENA_VERSION` | Which version of Jena/Fuseki to use for building the database. | `5.5.0` (options: 5.5.0, ...) | `JENA_VERSION=5.5.0` |
| `SPATIAL` | If set, do spatial indexing. | unset (false) | `SPATIAL=true` |
| `TEXT` | If set, do text indexing. Requires an assembler description mounted at `/config.ttl`. | unset (false) | `TEXT=true` |
| `THREADS` | Sets the number of threads to use for processing (only applies to tdb2.xloader). | Number of available processors minus 1 | `THREADS=4` |
| `USE_XLOADER` | If set, use tdb2.xloader instead of tdb2.tdbloader. See the Jena tdb.xloader documentation. | unset (false) | `USE_XLOADER=true` |
| `TDB2_MODE` | Specifies the loader mode for tdb2.tdbloader. See the tdbloader options in the Jena documentation. | `phased` | `TDB2_MODE=sequential` |
| `DATASET` | Specifies the path where the TDB2 dataset should be created. | `/fuseki/databases/ds` if no assembler description is mounted at `/config.ttl`; otherwise derived from the `tdb2:location "..." ;` statement in `/config.ttl`. | `DATASET=/fuseki/databases/myds` |
| `SKIP_VALIDATION` | If set, skip the validation check. By default, invalid RDF files are marked as `*.invalid` and not processed. | unset (false) | `SKIP_VALIDATION=true` |
| `SKIP_LOAD` | If set, skip the TDB2 generation. Allows indexing an already built dataset or applying validation only. | unset (false) | `SKIP_LOAD=true` |
| `GRAPH` | Optional named graph for triples (only used for tdb2.tdbloader, not tdb2.xloader). | unset | `GRAPH=https://graphs/example` |
| `JVM_ARGS` | General Java arguments. | unset | `JVM_ARGS=-Xmx4G` |
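
To make the interaction between `USE_XLOADER`, `THREADS`, `TDB2_MODE`, and `GRAPH` concrete, here is a minimal sketch of how a loader command line could be chosen from these variables. It is an assumption about the behaviour described in the table, not a copy of entrypoint.sh: `build_loader_command` is a hypothetical name, it only prints the command rather than running it, and the `--loc`, `--loader`, `--graph`, and `--threads` flags reflect my understanding of the Jena command-line tools, so check each tool's `--help` output before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real entrypoint.sh may differ.
# Prints (does not run) the loader command implied by the environment.
build_loader_command() {
  local dataset="$1"; shift   # remaining arguments are the RDF files
  local cpus threads cmd

  if [ -n "${USE_XLOADER:-}" ]; then
    # THREADS only applies to tdb2.xloader; default is processors minus 1.
    cpus="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)"
    threads="${THREADS:-$(( cpus > 1 ? cpus - 1 : 1 ))}"
    echo "tdb2.xloader --loc $dataset --threads $threads $*"
  else
    # TDB2_MODE and GRAPH only apply to tdb2.tdbloader.
    cmd="tdb2.tdbloader --loc $dataset --loader=${TDB2_MODE:-phased}"
    if [ -n "${GRAPH:-}" ]; then
      cmd="$cmd --graph=$GRAPH"
    fi
    echo "$cmd $*"
  fi
}
```

From the host, the same selection is made by passing the variables to the container, for example `-e "USE_XLOADER=true" -e "THREADS=4"`.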

Development

To build the image locally:

docker build . -t tdb2-generation:dev

To run it against some test data and config:

docker compose up
