
🧠 NeuReg: Neuro-Symbolic QA Generation from Regulatory Compliance

NeuReg is a neuro-symbolic QA generation framework that transforms complex regulatory documents into intelligent and explainable question–answering systems. It seamlessly integrates ontology-guided knowledge graphs (KGs) and regulatory text chunks to generate high-quality, semantically grounded QA pairs. By combining structured symbolic knowledge from domain ontologies with the contextual richness of unstructured policy text, NeuReg enables accurate, diverse, and interpretable QA generation using large language models (LLMs).



🔄 Pipeline Overview

NeuReg consists of two main stages: Knowledge Extraction and Question Answer Generation.

1️⃣ Ontology-Guided Knowledge Extraction

  • Regulatory text is split into coherent text chunks.
  • A domain-specific Educational Funding Regulation Ontology (EFRO) is used to guide schema extraction and triple generation using GPT-4 Turbo.
  • Each output triple is structured as (subject, predicate, object) with post-processing (a minimal extraction sketch follows below).
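
The sketch below shows roughly what this extraction step could look like, assuming the `openai` Python client, a schema dictionary loaded from `data/ontology/ontology_schema.json`, and a JSON-list output format. The prompt wording and post-processing here are illustrative only; the actual implementation lives in `data/ontology/KG_Extraction.ipynb`.

```python
# Hedged sketch: ontology-guided triple extraction with GPT-4 Turbo.
# Prompt wording, schema handling, and post-processing are illustrative,
# not the exact code used in data/ontology/KG_Extraction.ipynb.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def extract_triples(chunk_id: str, chunk_text: str, efro_schema: dict) -> list[dict]:
    """Ask the LLM for (subject, predicate, object) triples constrained by the EFRO schema."""
    prompt = (
        "You are an information extraction system for education funding regulations.\n"
        f"Use only classes and relations from this ontology schema:\n{json.dumps(efro_schema)}\n\n"
        'Extract knowledge-graph triples from the text below and return a JSON list of '
        'objects with keys "subject", "predicate", "object".\n\n'
        f"TEXT:\n{chunk_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    triples = json.loads(response.choices[0].message.content)
    # Post-processing: keep only well-formed triples and attach the source chunk_id
    return [
        {"chunk_id": chunk_id, **t}
        for t in triples
        if {"subject", "predicate", "object"} <= set(t)
    ]
```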

2️⃣ Question Answer Generation

Each chunk is mapped to its corresponding KG (based on chunk_id). The QA generation follows four steps:

  1. Question Type Selection – Factual, Relational, Comparative, Inferential
  2. Prompt Augmentation – Zero-shot, One-shot, Few-shot prompting strategies
  3. QA Filtering – Answer length, semantic similarity (cosine < 0.85), retry up to 3 times (see the filtering sketch below)
  4. Validation – Human annotation and automatic scoring from LLMs
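
The filtering step (3) can be sketched roughly as follows, assuming a `sentence-transformers` embedding model for the semantic-similarity check. The 0.85 cosine threshold and the 3-retry budget come from the pipeline above; the embedding model and the answer-length bounds are illustrative assumptions.

```python
# Hedged sketch of the QA filtering step (step 3 above).
# The 0.85 cosine threshold and 3 retries come from the pipeline description;
# the embedding model and answer-length bounds are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def is_acceptable(question: str, answer: str, accepted_questions: list[str],
                  min_answer_words: int = 3, max_answer_words: int = 80) -> bool:
    # Answer-length filter (bounds are illustrative)
    n_words = len(answer.split())
    if not (min_answer_words <= n_words <= max_answer_words):
        return False
    # Semantic-similarity filter: reject near-duplicates of already accepted questions
    if accepted_questions:
        sims = util.cos_sim(embedder.encode(question), embedder.encode(accepted_questions))
        if float(sims.max()) >= 0.85:
            return False
    return True

def generate_with_retries(generate_fn, accepted_questions: list[str], max_retries: int = 3):
    """Call the (LLM-backed) generate_fn until a QA pair passes the filters, up to 3 retries."""
    for _ in range(max_retries):
        question, answer = generate_fn()
        if is_acceptable(question, answer, accepted_questions):
            return question, answer
    return None  # discard if no attempt passes
```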

🧠 Model Architecture

NeuReg Framework

Figure 1: NeuReg: Neuro-symbolic framework for regulatory QA generation using ZS, OS, and FS prompting with ontology-guided KG extraction.


🚀 Motivation

Access to education funding is governed by complex and evolving regulations. These policies are often communicated through lengthy documents that are difficult for students and institutional staff to interpret. NeuReg addresses this challenge by transforming unstructured regulatory guidance into structured and explainable QA datasets, bridging the gap between dense policy language and actionable decision support.


✨ Key Contributions

🧠 Neuro-Symbolic QA Generation Framework

We present NeuReg, a neuro-symbolic question–answer generation framework that integrates the generative power of large language models (LLMs) with structured knowledge from ontology-guided knowledge graphs and their aligned regulatory text segments. This hybrid approach enables the generation of high-quality, semantically grounded QA pairs tailored to complex regulatory domains.

📊 First-of-Its-Kind Regulatory QA Dataset

We construct a domain-specific QA dataset for regulatory compliance in education funding, encompassing four distinct question types: Factual (FactQ), Relational (RelQ), Comparative (CompQ), and Inferential (InferQ). These QA pairs are generated using multi-strategy prompting and rigorously validated through comparative assessment by expert human annotators and state-of-the-art (SOTA) LLM judges. To the best of our knowledge, this is the first QA dataset of its kind within this domain.

🔬 Empirical Validation through Ablation and Fine-Tuning

We conduct controlled ablation studies to quantify the individual contributions of structured KG triples and unstructured text chunks to QA generation quality, demonstrating the indispensable role of symbolic knowledge. Additionally, we evaluate the practical utility of the generated datasets by fine-tuning multiple LLMs (T5, FLAN-T5), analyzing the effects of prompting strategies (ZS, OS, FS) and model scale on QA performance.


🎯 QA Generation Types

Type    Description
FactQ   Extract concrete details (e.g., definitions, thresholds, dates) for grounded information retrieval.
RelQ    Examine entity interactions within regulatory structures, reflecting KG-based links (e.g., between providers and funding authorities).
CompQ   Contrast policies, programmes, or entities to highlight distinctions or trade-offs.
InferQ  Require synthesis or multi-hop reasoning across text and KG to derive implicit conclusions.
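
For illustration only, a single generated record might look like the hypothetical example below; the field names and values are assumptions, and the released datasets in qa_generation/ define the actual format.

```python
# Hypothetical QA record (field names and values are illustrative assumptions,
# not copied from the released dataset files).
example_record = {
    "chunk_id": "chunk_042",
    "question_type": "RelQ",
    "prompting_strategy": "few-shot",
    "question": "Which authority must a provider report enrolment changes to?",
    "answer": "The provider must report enrolment changes to the designated funding authority.",
    "supporting_triples": [
        ("Provider", "reportsTo", "FundingAuthority"),
    ],
}
```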

📂 Repository Structure

NeuReg/
├── README.md                          # Overview of the project, contributions, pipeline, and structure
├── LICENSE                            # Project license (MIT)
├── requirements.txt                   # Python dependencies for reproducing the results

├── data/                              # Preprocessing and knowledge graph construction
│   ├── README.md                      # Overview of chunk & triple-level statistics
│   ├── chunks/                        # Extracting regulatory text chunks
│   │   ├── chunks.csv                 # Chunk dataset
│   │   └── chunks.ipynb               # Chunk extraction notebook
│   ├── ontology/                      # Ontology schema and KG triples
│   │   ├── ontology_schema.json       # Extracted ontology schema in JSON
│   │   ├── Ontology_Guided_Triples.csv              # Ontology-guided KG triples
│   │   ├── Ontology_Guided_Triples_statistics.json  # Stats on generated triples
│   │   ├── EFRO_Schema_Extraction.ipynb             # Extract ontology schema from guidance
│   │   └── KG_Extraction.ipynb                      # Generate KG using ontology + chunks

├── qa_generation/                     # QA dataset generation using prompting
│   ├── README.md
│   ├── Zero-shot.ipynb                # Zero-shot QA generation
│   ├── One-shot.ipynb                 # One-shot QA generation
│   ├── Few-shot.ipynb                 # Few-shot QA generation
│   ├── Zero-Shot_qa_dataset.json      # Output QA dataset (zero-shot)
│   ├── One-Shot_qa_dataset.json       # Output QA dataset (one-shot)
│   ├── Few-Shot_qa_dataset.json       # Output QA dataset (few-shot)
│   ├── Zero_Shot_QA_analysis_report.json  # Analysis report (zero-shot)
│   ├── One_Shot_QA_analysis_report.json   # Analysis report (one-shot)
│   └── Few_Shot_QA_analysis_report.json   # Analysis report (few-shot)

├── evaluation/                        # Complete evaluation framework
│   ├── README.md                      # Central summary of all evaluation types and modules
│   ├── Ontology-Guided_KG_Evaluation/         # Evaluation of KG triples
│   │   ├── README.md
│   │   ├── Evaluation.ipynb                    # Validates triple structure and semantics
│   │   ├── evaluation_results.csv              # Per-triple validation outcomes
│   │   └── evaluation_report.json              # Aggregate KG validation statistics
│   ├── LLM-as-a-Judge/                        # LLM-based QA evaluation (5 models)
│   │   ├── README.md                          # Overview of LLM evaluation setup and metric definitions
│   │   ├── DeepSeek-R1-Distill-Llama-70B/     # Evaluation results from DeepSeek-R1
│   │   │   ├── DeepSeek-R1-Distill-Llama-70B.ipynb         # Evaluation notebook
│   │   │   ├── DeepSeek_zeroshot_evaluation_results.csv    # Zero-shot QA results
│   │   │   ├── DeepSeek_oneshot_evaluation_results.csv     # One-shot QA results
│   │   │   └── DeepSeek_fewshot_evaluation_results.csv     # Few-shot QA results
│   │   ├── Gemma-2 Instruct 27B/              # Notebook and evaluation results
│   │   ├── LLaMA 3.3 70B/                     # Notebook and evaluation results
│   │   ├── mixtral-8x22b-instruct-v0.1/       # Notebook and evaluation results
│   │   └── Qwen3-32B/                         # Notebook and evaluation results
│   │       # Each model folder includes one .ipynb notebook plus 3 CSVs for Zero-/One-/Few-Shot QA evaluation results
│   ├── llms results analysis/                 # Cross-model aggregation and statistics
│   │   ├── README.md
│   │   ├── LLM results analysis.ipynb         # Compare results across LLM judges
│   │   └── comprehensive_analysis_report.json # Metrics summary (means, deviations, majority voting agreement)
│   ├── Human Judgements/                      # Human evaluation and sampling
│   │   ├── README.md
│   │   ├── Evaluation_Template.md             # Annotation form and scoring rubric
│   │   ├── stratified sampling method.ipynb   # Script for stratified QA sampling
│   │   ├── QA_Human_Eval_Stratified_5percent.csv     # Final sampled QA set for annotation
│   │   ├── QA_Sampling_Summary_Statistics.csv        # Summary of sampled distribution
│   │   ├── QA_Stratified_Sampling_Visualization.png  # Sample distribution plots
│   │   ├── human results analysis.ipynb       # Human score processing and statistics
│   │   └── human_evaluation_analysis_report.json  # Metrics summary (means, deviations, majority voting agreement)
│   ├── LLM vs Human/                          # Correlation between LLM and human scores
│   │   ├── README.md
│   │   ├── LLM vs Human.ipynb                 # Notebook to compare LLM vs human scores
│   │   └── human_llm_comparison_results.csv   # Exact-match (EM) and F1 comparison results

├── analysis/                          # Statistical analysis & insights
│   ├── README.md
│   ├── Statistical_Analysis.ipynb
│   ├── Readability_Analysis.csv       # FKGL, Flesch, etc.
│   ├── Vocabulary_Diversity_Analysis.csv
│   ├── Length_Distribution_Analysis.csv
│   ├── LLMs_based_results_analysis.ipynb
│   └── LLMs_Analysis_report.csv

Ablation Studies/
│
├── Ablation Study 1/
│   ├── chunks_only_qa_dataset.ipynb
│   ├── Ablation_1_chunks_only_analysis_report.json
│   └── Ablation_1_chunks_only_qa_dataset.json
│
├── Ablation Study 2/
│   ├── KG_only_qa_dataset.ipynb
│   ├── Ablation_2_kg_only_analysis_report.json
│   └── Ablation_2_kg_only_qa_dataset.json
│
├── Evaluation/
│   ├── chunks_only_Evaluation/
│   │   ├── DeepSeek-R1-Distill-Llama-70B/
│   │   ├── Gemma-2 Instruct (27B)/
│   │   ├── Llama 3.3 70B/
│   │   ├── mixtral-8x22b-instruct-v0.1/
│   │   └── Qwen3-32B/
│   │
│   └── KG_only_Evaluation/
│       ├── DeepSeek-R1-Distill-Llama-70B/
│       ├── Gemma-2 Instruct (27B)/
│       ├── Llama 3.3 70B/
│       ├── mixtral-8x22b-instruct-v0.1/
│       └── Qwen3-32B/
│
└── Results Analysis/
    ├── Chunks Only/
    │   └── Chunks Only Evaluation Analysis.ipynb
    └── KG Only/
        └── KG Only Evaluation Analysis.ipynb

├── fine_tuning/                       # Fine-tuning experiments on QA datasets
│   ├── README.md
│   ├── t5_small/
│   ├── t5_base/
│   ├── t5_large/
│   ├── flan_t5_small/
│   ├── flan_t5_base/
│   └── flan_t5_large/                 # Results

βš™οΈ Installation

git clone https://github.com/RGU-Computing/NeuReg.git
cd NeuReg
pip install -r requirements.txt

▶️ Getting Started

To reproduce the NeuReg QA generation pipeline:

Step 1: Preprocess Regulatory Text

cd data/chunks/
jupyter notebook chunks.ipynb
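
If you just want the idea behind the chunking step, a rough paragraph-aligned sketch is shown below; the real logic (and the exact column layout of chunks.csv) is defined in chunks.ipynb, and the input file name and word limit here are assumptions.

```python
# Minimal chunking sketch (illustrative; the actual logic lives in data/chunks/chunks.ipynb).
# Splits a regulatory guidance document into paragraph-aligned chunks and writes them
# with chunk_ids, mirroring the chunks.csv layout assumed here.
import csv

def chunk_document(text: str, max_words: int = 250) -> list[str]:
    chunks, current = [], []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

with open("guidance.txt", encoding="utf-8") as f:  # hypothetical input file
    chunks = chunk_document(f.read())

with open("chunks.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["chunk_id", "text"])
    writer.writerows((f"chunk_{i:03d}", c) for i, c in enumerate(chunks))
```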

Step 2: Generate Ontology-Guided KG Triples

cd data/ontology/
jupyter notebook EFRO_Schema_Extraction.ipynb  # Extract EFRO ontology
jupyter notebook KG_Extraction.ipynb           # Generate KG triples

Step 3: Generate QA Pairs

Choose your prompting strategy:

cd qa_generation/
# Choose one of the following:
jupyter notebook Zero-shot.ipynb   # Zero-shot prompting
jupyter notebook One-shot.ipynb    # One-shot prompting  
jupyter notebook Few-shot.ipynb    # Few-shot prompting
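
Conceptually, the three notebooks differ mainly in how many exemplar QA pairs are prepended to the generation prompt. A rough sketch is given below, with hypothetical exemplars and prompt wording; the actual prompts are defined in the notebooks.

```python
# Hedged sketch of the three prompting strategies (zero-/one-/few-shot).
# Exemplar QA pairs and prompt wording are illustrative assumptions.
EXEMPLARS = [
    {"question": "Which authority administers the funding?",   # hypothetical exemplar
     "answer": "The designated funding authority named in the guidance."},
    {"question": "What must providers retain as evidence?",    # hypothetical exemplar
     "answer": "The eligibility documentation listed in the guidance."},
]

def build_prompt(chunk_text: str, triples: list, question_type: str, strategy: str) -> str:
    header = (f"Generate one {question_type} question and its answer grounded in the "
              "regulatory text and KG triples below.\n")
    if strategy == "one-shot":
        examples = EXEMPLARS[:1]
    elif strategy == "few-shot":
        examples = EXEMPLARS
    else:  # zero-shot
        examples = []
    shots = "".join(f"\nExample Q: {e['question']}\nExample A: {e['answer']}\n" for e in examples)
    return f"{header}{shots}\nTEXT:\n{chunk_text}\nTRIPLES:\n{triples}\n"
```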

Step 4: Evaluate QA Quality

cd "evaluation/LLM-as-a-Judge/[ModelName]/"
# Example:
cd "evaluation/LLM-as-a-Judge/DeepSeek-R1-Distill-Llama-70B/"
jupyter notebook DeepSeek-R1-Distill-Llama-70B.ipynb
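
As a rough illustration of LLM-as-a-judge scoring, the sketch below sends a QA pair and its source chunk to an OpenAI-compatible endpoint and asks for numeric scores. The endpoint, model identifier, and metric names are placeholders; each model folder's notebook defines the real setup and rubric.

```python
# Hedged sketch of LLM-as-a-judge scoring (illustrative; the endpoint, model id,
# and metric names are assumptions -- see each model folder's notebook for the real setup).
import json
from openai import OpenAI

# Many hosted judges expose an OpenAI-compatible API; base_url and api_key are placeholders.
client = OpenAI(base_url="https://example-inference-host/v1", api_key="YOUR_KEY")

def judge_qa(chunk_text: str, question: str, answer: str) -> dict:
    prompt = (
        "Rate the following question-answer pair against the source text on a 1-5 scale "
        "for relevance, correctness, and clarity. Return JSON with those three keys.\n\n"
        f"SOURCE:\n{chunk_text}\n\nQ: {question}\nA: {answer}"
    )
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",  # placeholder model identifier
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```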

Step 5 (Optional): Analyze LLM vs Human Agreement

cd "evaluation/LLM vs Human/"
jupyter notebook "LLM vs Human.ipynb"

Step 6: Fine-tune Models

cd fine_tuning/t5_small/  # or flan_t5_base/, flan_t5_large/, etc.
# Choose based on your dataset:
jupyter notebook t5_small_zero.ipynb   # Zero-shot dataset
jupyter notebook t5_small_one.ipynb    # One-shot dataset
jupyter notebook t5_small_few.ipynb    # Few-shot dataset
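
A condensed sketch of what fine-tuning T5-small on one of the generated datasets might look like with Hugging Face transformers is shown below; the dataset field names, file path, and hyperparameters are assumptions, and the per-model notebooks contain the actual training code.

```python
# Hedged fine-tuning sketch for T5-small on a generated QA dataset.
# The dataset field names ("question", "answer"), file path, and hyperparameters
# are illustrative assumptions; the released notebooks define the actual setup.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Assumed location and format of the zero-shot QA dataset
dataset = load_dataset("json", data_files="../../qa_generation/Zero-Shot_qa_dataset.json")["train"]

def preprocess(batch):
    inputs = tokenizer(["question: " + q for q in batch["question"]],
                       truncation=True, max_length=256)
    labels = tokenizer(text_target=batch["answer"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="t5_small_zero", num_train_epochs=3,
                                  per_device_train_batch_size=8, learning_rate=3e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```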

📖 Citation

Arshad et al., "NeuReg: Neuro-Symbolic QA Generation from Regulatory Compliance," submitted to International Conference on Knowledge Capture 2025. GitHub Repository: https://github.com/RGU-Computing/NeuReg


📄 License

This project is licensed under the MIT License. Β© 2025 School of Computing, Engineering and Technology, Robert Gordon University, UK. For full license details, see the LICENSE file.
