# NeuReg

NeuReg is a neuro-symbolic QA generation framework that transforms complex regulatory documents into intelligent and explainable question-answering systems. It seamlessly integrates ontology-guided knowledge graphs (KGs) and regulatory text chunks to generate high-quality, semantically grounded QA pairs. By combining structured symbolic knowledge from domain ontologies with the contextual richness of unstructured policy text, NeuReg enables accurate, diverse, and interpretable QA generation using large language models (LLMs).
## Table of Contents

- Pipeline Overview
- Model Architecture
- Motivation
- Key Contributions
- QA Generation Types
- Repository Structure
- Installation
- Getting Started
- Citation
- License
## Pipeline Overview

NeuReg consists of two main stages: Knowledge Extraction and Question Answer Generation.

### Stage 1: Knowledge Extraction

- Regulatory text is split into coherent text chunks.
- A domain-specific Educational Funding Regulation Ontology (EFRO) guides schema extraction and triple generation using GPT-4 Turbo.
- Each output triple is structured as (subject, predicate, object) and post-processed (sketched below).
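As a concrete illustration, the extraction step can be pictured as below. This is a minimal, hypothetical sketch of what `data/ontology/KG_Extraction.ipynb` does; the prompt wording, schema format, and helper names are assumptions, not the notebook's actual code.

```python
# Hypothetical sketch of ontology-guided triple extraction with GPT-4 Turbo.
# Prompt wording and output handling are illustrative assumptions.
import ast
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_triples(chunk: str, ontology_schema: str) -> list:
    """Request (subject, predicate, object) triples constrained to the
    EFRO ontology schema, then post-process the model's raw output."""
    prompt = (
        "Using only the classes and relations in this ontology schema:\n"
        f"{ontology_schema}\n\n"
        "Extract knowledge-graph triples from the passage below. "
        "Return a Python list of (subject, predicate, object) tuples.\n\n"
        f"Passage:\n{chunk}"
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    raw = response.choices[0].message.content.strip()
    # Post-processing: parse the raw string and keep only well-formed triples.
    triples = ast.literal_eval(raw)
    return [(s, p, o) for s, p, o in triples if s and p and o]
```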
### Stage 2: Question Answer Generation

Each chunk is mapped to its corresponding KG (linked by `chunk_id`). QA generation then follows four steps:

1. Question Type Selection → Factual, Relational, Comparative, Inferential
2. Prompt Augmentation → zero-shot, one-shot, and few-shot prompting strategies
3. QA Filtering → answer-length checks and semantic-similarity deduplication (cosine < 0.85), with up to 3 retries (sketched below)
4. Validation → human annotation and automatic scoring by LLM judges
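The filtering step is easy to picture in code. The sketch below uses the thresholds stated above (cosine similarity below 0.85, up to 3 retries), but the embedding model, the minimum answer length, and the function names are assumptions, not the repository's actual implementation:

```python
# Sketch of QA filtering: reject short answers and near-duplicate questions
# (cosine similarity >= 0.85 against accepted ones), retrying up to 3 times.
# The embedding model, answer-length threshold, and names are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def filter_qa(generate_qa, accepted, min_answer_words=3,
              sim_threshold=0.85, max_retries=3):
    """generate_qa() is any callable returning {'question': ..., 'answer': ...};
    accepted is the list of QA pairs kept so far."""
    for _ in range(max_retries):
        qa = generate_qa()
        if len(qa["answer"].split()) < min_answer_words:
            continue  # answer too short: retry
        if accepted:
            new_emb = encoder.encode(qa["question"], convert_to_tensor=True)
            old_embs = encoder.encode([q["question"] for q in accepted],
                                      convert_to_tensor=True)
            if util.cos_sim(new_emb, old_embs).max().item() >= sim_threshold:
                continue  # too similar to an existing question: retry
        return qa
    return None  # all retries exhausted: discard
```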
## Model Architecture

Figure 1: NeuReg, a neuro-symbolic framework for regulatory QA generation using zero-shot (ZS), one-shot (OS), and few-shot (FS) prompting with ontology-guided KG extraction.
## Motivation

Access to education funding is governed by complex and evolving regulations. These policies are often communicated through lengthy documents that are difficult for students and institutional staff to interpret. NeuReg addresses this challenge by transforming unstructured regulatory guidance into structured and explainable QA datasets, bridging the gap between dense policy language and actionable decision support.
## Key Contributions

- We present NeuReg, a neuro-symbolic question-answer generation framework that integrates the generative power of large language models (LLMs) with structured knowledge from ontology-guided knowledge graphs and their aligned regulatory text segments. This hybrid approach enables the generation of high-quality, semantically grounded QA pairs tailored to complex regulatory domains.
- We construct a domain-specific QA dataset for regulatory compliance in education funding, encompassing four distinct question types: Factual (FactQ), Relational (RelQ), Comparative (CompQ), and Inferential (InferQ). These QA pairs are generated using multi-strategy prompting and rigorously validated through comparative assessment by expert human annotators and state-of-the-art (SOTA) LLM judges. To the best of our knowledge, this is the first QA dataset of its kind within this domain.
- We conduct controlled ablation studies to quantify the individual contributions of structured KG triples and unstructured text chunks to QA generation quality, demonstrating the indispensable role of symbolic knowledge. Additionally, we evaluate the practical utility of the generated datasets by fine-tuning multiple LLMs (T5, FLAN-T5), analyzing the effects of prompting strategies (ZS, OS, FS) and model scale on QA performance.
## QA Generation Types

| Type | Description |
|---|---|
| FactQ | Extract concrete details (e.g., definitions, thresholds, dates) for grounded information retrieval. |
| RelQ | Examine entity interactions within regulatory structures, reflecting KG-based links (e.g., between providers and funding authorities). |
| CompQ | Contrast policies, programmes, or entities to highlight distinctions or trade-offs. |
| InferQ | Require synthesis or multi-hop reasoning across text and KG to derive implicit conclusions. |
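To make the distinction operational, type selection can simply swap the instruction given to the LLM. The templates below are illustrative only; they are not the prompts used in the `qa_generation/` notebooks:

```python
# Illustrative per-type prompt instructions (assumed, not the actual prompts).
TYPE_INSTRUCTIONS = {
    "FactQ": "Ask for a specific fact (a definition, threshold, or date) "
             "stated explicitly in the passage.",
    "RelQ": "Ask how two entities in the knowledge graph are related, "
            "e.g. a provider and its funding authority.",
    "CompQ": "Ask for a comparison between two policies, programmes, or "
             "entities mentioned in the passage.",
    "InferQ": "Ask a question whose answer must be inferred by combining "
              "the passage with the knowledge-graph triples.",
}

def build_prompt(question_type, chunk, triples):
    """Compose a QA-generation prompt from a text chunk and its KG triples."""
    return (f"{TYPE_INSTRUCTIONS[question_type]}\n\n"
            f"Passage:\n{chunk}\n\n"
            f"KG triples:\n{triples}\n\n"
            "Return the question and its answer.")
```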
## Repository Structure

```
NeuReg/
├── README.md                                    # Overview of the project, contributions, pipeline, and structure
├── LICENSE                                      # Project license (MIT)
├── requirements.txt                             # Python dependencies for reproducing the results
├── data/                                        # Preprocessing and knowledge graph construction
│   ├── README.md                                # Overview of chunk- and triple-level statistics
│   ├── chunks/                                  # Regulatory text chunk extraction
│   │   ├── chunks.csv                           # Chunk dataset
│   │   └── chunks.ipynb                         # Chunk extraction notebook
│   └── ontology/                                # Ontology schema and KG triples
│       ├── ontology_schema.json                 # Extracted ontology schema in JSON
│       ├── Ontology_Guided_Triples.csv          # Ontology-guided KG triples
│       ├── Ontology_Guided_Triples_statistics.json  # Stats on generated triples
│       ├── EFRO_Schema_Extraction.ipynb         # Extract ontology schema from guidance
│       └── KG_Extraction.ipynb                  # Generate KG using ontology + chunks
├── qa_generation/                               # QA dataset generation using prompting
│   ├── README.md
│   ├── Zero-shot.ipynb                          # Zero-shot QA generation
│   ├── One-shot.ipynb                           # One-shot QA generation
│   ├── Few-shot.ipynb                           # Few-shot QA generation
│   ├── Zero-Shot_qa_dataset.json                # Output QA dataset (zero-shot)
│   ├── One-Shot_qa_dataset.json                 # Output QA dataset (one-shot)
│   ├── Few-Shot_qa_dataset.json                 # Output QA dataset (few-shot)
│   ├── Zero_Shot_QA_analysis_report.json        # Analysis report (zero-shot)
│   ├── One_Shot_QA_analysis_report.json         # Analysis report (one-shot)
│   └── Few_Shot_QA_analysis_report.json         # Analysis report (few-shot)
├── evaluation/                                  # Complete evaluation framework
│   ├── README.md                                # Central summary of all evaluation types and modules
│   ├── Ontology-Guided_KG_Evaluation/           # Evaluation of KG triples
│   │   ├── README.md
│   │   ├── Evaluation.ipynb                     # Validates triple structure and semantics
│   │   ├── evaluation_results.csv               # Per-triple validation outcomes
│   │   └── evaluation_report.json               # Aggregate KG validation statistics
│   ├── LLM-as-a-Judge/                          # LLM-based QA evaluation (5 models)
│   │   ├── README.md                            # Overview of LLM evaluation setup and metric definitions
│   │   ├── DeepSeek-R1-Distill-Llama-70B/       # Evaluation results from DeepSeek-R1
│   │   │   ├── DeepSeek-R1-Distill-Llama-70B.ipynb       # Evaluation notebook
│   │   │   ├── DeepSeek_zeroshot_evaluation_results.csv  # Zero-shot QA results
│   │   │   ├── DeepSeek_oneshot_evaluation_results.csv   # One-shot QA results
│   │   │   └── DeepSeek_fewshot_evaluation_results.csv   # Few-shot QA results
│   │   ├── Gemma-2 Instruct 27B/                # Notebook and evaluation results
│   │   ├── LLaMA 3.3 70B/                       # Notebook and evaluation results
│   │   ├── mixtral-8x22b-instruct-v0.1/         # Notebook and evaluation results
│   │   └── Qwen3-32B/                           # Notebook and evaluation results
│   ├── llms results analysis/                   # Cross-model aggregation and statistics
│   │   ├── README.md
│   │   ├── LLM results analysis.ipynb           # Compare results across LLM judges
│   │   └── comprehensive_analysis_report.json   # Metrics summary (means, deviations, majority-voting agreement)
│   ├── Human Judgements/                        # Human evaluation and sampling
│   │   ├── README.md
│   │   ├── Evaluation_Template.md               # Annotation form and scoring rubric
│   │   ├── stratified sampling method.ipynb     # Script for stratified QA sampling
│   │   ├── QA_Human_Eval_Stratified_5percent.csv     # Final sampled QA set for annotation
│   │   ├── QA_Sampling_Summary_Statistics.csv   # Summary of sampled distribution
│   │   ├── QA_Stratified_Sampling_Visualization.png  # Sample distribution plots
│   │   ├── human results analysis.ipynb         # Human score processing and statistics
│   │   └── human_evaluation_analysis_report.json     # Metrics summary (means, deviations, majority-voting agreement)
│   └── LLM vs Human/                            # Correlation between LLM and human scores
│       ├── README.md
│       ├── LLM vs Human.ipynb                   # Notebook to compare LLM vs human scores
│       └── human_llm_comparison_results.csv     # EM and F1 comparison results
├── analysis/                                    # Statistical analysis & insights
│   ├── README.md
│   ├── Statistical_Analysis.ipynb
│   ├── Readability_Analysis.csv                 # FKGL, Flesch, etc.
│   ├── Vocabulary_Diversity_Analysis.csv
│   ├── Length_Distribution_Analysis.csv
│   ├── LLMs_based_results_analysis.ipynb
│   └── LLMs_Analysis_report.csv
├── Ablation Studies/
│   ├── Ablation Study 1/
│   │   ├── chunks_only_qa_dataset.ipynb
│   │   ├── Ablation_1_chunks_only_analysis_report.json
│   │   └── Ablation_1_chunks_only_qa_dataset.json
│   ├── Ablation Study 2/
│   │   ├── KG_only_qa_dataset.ipynb
│   │   ├── Ablation_2_kg_only_analysis_report.json
│   │   └── Ablation_2_kg_only_qa_dataset.json
│   ├── Evaluation/
│   │   ├── chunks_only_Evaluation/
│   │   │   ├── DeepSeek-R1-Distill-Llama-70B/
│   │   │   ├── Gemma-2 Instruct (27B)/
│   │   │   ├── Llama 3.3 70B/
│   │   │   ├── mixtral-8x22b-instruct-v0.1/
│   │   │   └── Qwen3-32B/
│   │   └── KG_only_Evaluation/
│   │       ├── DeepSeek-R1-Distill-Llama-70B/
│   │       ├── Gemma-2 Instruct (27B)/
│   │       ├── Llama 3.3 70B/
│   │       ├── mixtral-8x22b-instruct-v0.1/
│   │       └── Qwen3-32B/
│   └── Results Analysis/
│       ├── Chunks Only/
│       │   └── Chunks Only Evaluation Analysis.ipynb
│       └── KG Only/
│           └── KG Only Evaluation Analysis.ipynb
└── fine_tuning/                                 # Fine-tuning experiments on QA datasets
    ├── README.md
    ├── t5_small/
    ├── t5_base/
    ├── t5_large/
    ├── flan_t5_small/
    ├── flan_t5_base/
    └── flan_t5_large/                           # Results
```

Each model folder under `evaluation/LLM-as-a-Judge/` includes one `.ipynb` notebook plus three CSVs with the Zero-/One-/Few-Shot QA evaluation results.
## Installation

```bash
git clone https://github.com/RGU-Computing/NeuReg.git
cd NeuReg
pip install -r requirements.txt
```
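The schema- and KG-extraction notebooks call GPT-4 Turbo, so an OpenAI API key presumably needs to be available before running them (an assumption about the notebooks' configuration, not a documented requirement):

```bash
export OPENAI_API_KEY="sk-..."   # replace with your own key
```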
## Getting Started

To reproduce the NeuReg QA generation pipeline:

**1. Extract regulatory text chunks**

```bash
cd data/chunks/
jupyter notebook chunks.ipynb
```

**2. Extract the ontology schema and generate KG triples**

```bash
cd data/ontology/
jupyter notebook EFRO_Schema_Extraction.ipynb   # Extract EFRO ontology
jupyter notebook KG_Extraction.ipynb            # Generate KG triples
```

**3. Generate QA pairs.** Choose your prompting strategy:

```bash
cd qa_generation/
# Choose one of the following:
jupyter notebook Zero-shot.ipynb   # Zero-shot prompting
jupyter notebook One-shot.ipynb    # One-shot prompting
jupyter notebook Few-shot.ipynb    # Few-shot prompting
```

**4. Run LLM-as-a-judge evaluation** (paths with spaces are quoted)

```bash
cd "evaluation/LLM-as-a-Judge/[ModelName]/"
# Example:
cd "evaluation/LLM-as-a-Judge/DeepSeek-R1-Distill-Llama-70B/"
jupyter notebook DeepSeek-R1-Distill-Llama-70B.ipynb
```
**5. Compare LLM and human scores**

```bash
cd "evaluation/LLM vs Human/"
jupyter notebook "LLM vs Human.ipynb"
```
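At its core, this comparison reduces to correlating the two score sets. A minimal sketch follows; the CSV column names are assumptions about `human_llm_comparison_results.csv`, not its documented schema:

```python
# Sketch of LLM-vs-human score correlation; column names are assumed.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

df = pd.read_csv("human_llm_comparison_results.csv")
r, r_p = pearsonr(df["llm_score"], df["human_score"])
rho, rho_p = spearmanr(df["llm_score"], df["human_score"])
print(f"Pearson r = {r:.3f} (p = {r_p:.3g})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3g})")
```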
**6. Fine-tune models on the generated QA datasets**

```bash
cd fine_tuning/t5_small/   # or flan_t5_base/, flan_t5_large/, etc.
# Choose based on your dataset:
jupyter notebook t5_small_zero.ipynb   # Zero-shot dataset
jupyter notebook t5_small_one.ipynb    # One-shot dataset
jupyter notebook t5_small_few.ipynb    # Few-shot dataset
```
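For orientation, the core of any of these runs looks roughly like the sketch below, using Hugging Face `transformers`. The JSON field names, paths, and hyperparameters are illustrative assumptions, not the notebooks' actual settings:

```python
# Minimal T5 fine-tuning sketch on a generated QA dataset (assumed schema:
# a JSON list of {"question": ..., "answer": ...} records).
import json
from transformers import (AutoTokenizer, T5ForConditionalGeneration,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

with open("../../qa_generation/Zero-Shot_qa_dataset.json") as f:
    qa_pairs = json.load(f)

def encode(example):
    """Tokenize one QA pair into model inputs with padded labels."""
    enc = tokenizer("question: " + example["question"], truncation=True,
                    padding="max_length", max_length=256)
    labels = tokenizer(example["answer"], truncation=True,
                       padding="max_length", max_length=64)["input_ids"]
    # Mask padding tokens so they are ignored by the loss.
    enc["labels"] = [t if t != tokenizer.pad_token_id else -100 for t in labels]
    return enc

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="t5_small_zero_out",
                           per_device_train_batch_size=8,
                           num_train_epochs=3,
                           learning_rate=3e-4),
    train_dataset=[encode(ex) for ex in qa_pairs],
)
trainer.train()
```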
## Citation

Arshad et al., "NeuReg: Neuro-Symbolic QA Generation from Regulatory Compliance," submitted to the International Conference on Knowledge Capture (K-CAP), 2025. GitHub repository: https://github.com/RGU-Computing/NeuReg
## License

This project is licensed under the MIT License. © 2025 School of Computing, Engineering and Technology, Robert Gordon University, UK. For full license details, see the LICENSE file.