This repository contains the code for the Master's thesis Text2SHACL: LLM-Driven Generation of Validation Graphs for Automatic Assessment of Social Benefit Eligibility. The project explores how large language models (LLMs) can support the automatic generation of SHACL shapes graphs from natural language text, in particular descriptions of eligibility requirements for social benefits.
Name: Seike Appold
Email: seike.appold@stud.leuphana.de
Institution: Leuphana University Lüneburg, Institute of Information Systems
Program: Management & Data Science
Each experiment consists of prompting one or more LLMs to generate SHACL shapes graphs from natural language eligibility requirements. The experiments differ in the components included in the prompts:
- Baseline: Instruction + ontology + input text
- Fewshot: Instruction + ontology + selected worked examples + input text
- Chain-of-Thought (CoT): Instruction + ontology + selected worked examples with intermediate reasoning steps + input text
The ontology specifies the classes, properties, and individuals used in the RDF data to be validated. In the few-shot and CoT settings, worked examples are selected based on embedding similarity to the input text.
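The embedding-based example selection can be sketched as follows. This is a minimal illustration, not the thesis implementation: the embedding model is left abstract, and `pool`, `select_examples`, and the vector format are hypothetical names chosen for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_examples(input_embedding, pool, num_examples=1):
    """Pick the worked examples whose embeddings are closest to the input.

    `pool` maps example IDs to precomputed embedding vectors
    (a hypothetical format for this sketch).
    """
    ranked = sorted(pool.items(),
                    key=lambda item: cosine_similarity(input_embedding, item[1]),
                    reverse=True)
    return [example_id for example_id, _ in ranked[:num_examples]]
```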
The generated shapes graphs are compared against expert-annotated gold graphs using two groups of metrics:
- Syntactic quality: Graph Edit Distance (GED), G-BERTScore, Triple Match (Precision, Recall, F1)
- Semantic quality: Comparison of validation outcomes on synthetic user profiles (Precision, Recall, F1, Accuracy)
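The Triple Match scores can be illustrated with a minimal sketch, assuming both graphs are available as Python sets of (subject, predicate, object) tuples; how the evaluation code actually represents triples may differ.

```python
def triple_match(generated, gold):
    """Precision, recall, and F1 over exact triple overlap.

    `generated` and `gold` are sets of (subject, predicate, object) tuples.
    """
    true_positives = len(generated & gold)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```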
Run `pip install -r requirements.txt`
To run an experiment with the default configurations from the thesis, run the following command from the root directory, replacing the placeholders as specified below:
```shell
python Pipeline/Inference/RunInference.py --mode <mode> \
    --api_key <api_key> \
    --base_url <base_url>
```
| Parameter | Required | Description | Default |
|---|---|---|---|
| `--mode` | ✅ | Prompting strategy: `baseline`, `fewshot`, or `cot` (chain-of-thought). | — |
| `--api_key` | ✅ | Your API key for the Chat-AI API. | — |
| `--base_url` | ✅ | Base URL for the Chat-AI API endpoint. | — |
| `--num_examples` | ❌ | Number of worked examples (`fewshot`/`cot` only). | 1 (`fewshot`/`cot`), 0 (`baseline`) |
| `--k` | ❌ | Number of folds for k-fold cross-validation (`fewshot`/`cot` only). Must be at least 2. | 3 (`fewshot`/`cot`), 0 (`baseline`) |
| `--custom_models` | ❌ | Space-separated list of model names to use. See the Chat-AI documentation for available models. | None |
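In the few-shot and CoT settings, k-fold cross-validation presumably controls which items may serve as worked examples for which test inputs. A rough sketch of one way such a split could work; the function name and the splitting scheme are illustrative, and the thesis code may split differently.

```python
def k_fold_splits(items, k):
    """Partition items into k folds; each fold serves once as the test set,
    while the remaining folds provide the candidate worked examples."""
    if k < 2:
        raise ValueError("k must be at least 2")
    folds = [items[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        example_pool = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield test_fold, example_pool
```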
Note: The script expects a standard directory structure for test inputs, prompt components, results, and gold data. You can override these with `--test_dir`, `--prompt_components_dir`, `--results_dir`, and `--groundtruth_dir`.
To compute the performance metrics specified above for a given experiment run, run the following command from the root directory:
```shell
python Pipeline/Evaluation/RunEvaluation.py --experiment <experiment>
```
| Parameter | Required | Description | Default |
|---|---|---|---|
| `--experiment` | ✅ | Name of the experiment to evaluate. Must match the name of the experiment folder. | — |
Note: The script expects a standard directory structure for results, SHACL gold files, and user profiles. You can override these with `--results_dir`, `--shacl_gold_dir`, and `--profiles_dir`.
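The semantic metrics compare validation outcomes: each synthetic user profile is validated against both the generated and the gold shapes graph, and the resulting eligibility decisions are compared. A minimal sketch of that comparison, assuming the per-profile outcomes are already available as booleans; the function name and the data layout are illustrative.

```python
def outcome_metrics(predicted, gold):
    """Accuracy, precision, recall, and F1 over per-profile eligibility decisions.

    `predicted` and `gold` map profile IDs to booleans (eligible or not),
    derived from validating each profile against the generated and the
    gold shapes graph, respectively.
    """
    tp = sum(predicted[p] and gold[p] for p in gold)
    fp = sum(predicted[p] and not gold[p] for p in gold)
    fn = sum(not predicted[p] and gold[p] for p in gold)
    tn = sum(not predicted[p] and not gold[p] for p in gold)
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```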
The experiments and evaluation were conducted using Python 3.12.3 on Ubuntu 24.04.2 LTS, with model inference run on the HPC infrastructure provided by the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen.
The dataset consists of two parts:
- Requirements texts: The natural language descriptions of eligibility requirements for social benefits were retrieved via the Suchdienst-API of the Portalverbund Online Gateway (PVOG). The data was downloaded on January 20, 2025. For the raw data, only exemplary files are included in the remote repository.
- SHACL Gold: The shapes graphs used as ground truth in this project were generated by the author and independently verified by two domain experts.
The repository is structured as follows.
- `text-to-SHACL`
  - `Analysis`: Code and files for analyzing and visualizing results.
  - `Pipeline`
    - `Inference`: Code for generating model output with different prompts.
    - `Evaluation`: Code for evaluating LLM-generated SHACL shapes.
    - `Preprocessing`: Code for scraping and preparing the dataset.
    - `Utils`: General helper functions used throughout the repository.
  - `data`
    - `processed`
      - `requirements_texts`: Input requirements texts for selected benefits.
      - `shacl_gold`: Human-annotated SHACL graphs (ground truth).
    - `raw`
      - `service_catalogs`: All administrative services by municipality.
      - `all_service_descriptions`: Full descriptions of all administrative services.
      - `social_benefit_descriptions`: Intermediate benefit selection.
    - `resources`
      - `requiremets_decomposition`: Extracted individual requirements.
      - `schemata`: Metadata about benefits, experiments, and SHACL vocabulary.
      - `templates`: SHACL & decomposition templates.
      - `user_profiles`: Synthetic user profiles in RDF.
  - `results`: Model output and main syntactic and semantic evaluation metrics.
    - `<run_id>/`: One folder per experiment run (e.g., `baseline_0ex0fcv_1745664019`)
      - `<model_name>/`: One folder per model used in the run
        - `metrics/`: Evaluation metrics for this model
        - `output/`: Raw and parsed model outputs
        - `parsed_output/`: Generated SHACL graphs per requirement
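The run folder names appear to encode the experiment configuration and a Unix timestamp, e.g. `baseline_0ex0fcv_1745664019` for the baseline mode with 0 examples and 0 folds. A hedged sketch of parsing such a name; the `<mode>_<N>ex<K>fcv_<timestamp>` pattern is inferred from that single example and may not cover every run.

```python
import re

def parse_run_id(run_id):
    """Split a run folder name like 'baseline_0ex0fcv_1745664019' into parts.

    The '<mode>_<N>ex<K>fcv_<timestamp>' pattern is an assumption based on
    the example name above, not a documented naming scheme.
    """
    match = re.fullmatch(r"(\w+?)_(\d+)ex(\d+)fcv_(\d+)", run_id)
    if match is None:
        raise ValueError(f"unrecognized run id: {run_id}")
    mode, num_examples, k, timestamp = match.groups()
    return {"mode": mode, "num_examples": int(num_examples),
            "k": int(k), "timestamp": int(timestamp)}
```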
Special thanks to Ben and Benjamin from Förderfunke for supporting the annotation process with their practical expertise and providing the idea to use SHACL validation to assess social benefit eligibility, which inspired and shaped this project. A running demo of their system is available on their website, and additional resources can be found on their GitHub.