NL2SPARQL Calvin

An automatic conversion system from natural language questions in French to SPARQL queries for querying historical knowledge graphs.

Python 3.9+ · License: MIT · LangChain

📋 Table of Contents

  • Overview
  • Usage Example
  • Installation
  • Configuration
  • Usage
  • Architecture
  • Contributing
  • License
  • Authors
  • Acknowledgments

🎯 Overview

This project, developed as part of a Bachelor's Thesis at the University of Geneva, makes it possible to:

  • Convert French questions to SPARQL queries
  • Validate automatically generated queries
  • Correct errors via a feedback system
  • Evaluate performance with precise metrics
  • Interact via a conversational web interface

🔬 Based on SIB work: This system builds on and extends the sparql-llm framework developed by the Swiss Institute of Bioinformatics, adapted for French historical data and a conversational interface.

Usage Example

โ“ Question: "Liste tous les registres et leur numรฉro."

🔄 Automatic SPARQL query generation:

PREFIX rc: <http://purl.org/rcnum/onto/rc#>
SELECT ?registre ?numero WHERE {
  SERVICE <http://localhost:7200/repositories/Calvin> {
    ?registre a rc:Registre ;
              rc:no_registre ?numero .
  }
}
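
Note how the generated query wraps the actual triple pattern in a SERVICE clause targeting the GraphDB repository. A minimal sketch of how such a federated wrapper can be assembled from a bare pattern (the helper name is illustrative, not part of the project's API; the endpoint URL matches the example above):

```python
# Sketch: wrap a bare graph pattern in a SERVICE clause targeting the
# GraphDB repository, as in the generated query above. `wrap_in_service`
# is a hypothetical helper, not part of the nl2sparql_calvin API.

GRAPHDB_ENDPOINT = "http://localhost:7200/repositories/Calvin"

def wrap_in_service(prefixes: str, select_vars: str, pattern: str,
                    endpoint: str = GRAPHDB_ENDPOINT) -> str:
    """Build a federated SPARQL query around a bare graph pattern."""
    return (
        f"{prefixes}\n"
        f"SELECT {select_vars} WHERE {{\n"
        f"  SERVICE <{endpoint}> {{\n"
        f"    {pattern}\n"
        f"  }}\n"
        f"}}"
    )

query = wrap_in_service(
    "PREFIX rc: <http://purl.org/rcnum/onto/rc#>",
    "?registre ?numero",
    "?registre a rc:Registre ;\n              rc:no_registre ?numero .",
)
```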

🚀 Installation

Prerequisites

  • Python 3.9+
  • GraphDB (>= 9.0)
  • Qdrant Vector Database
  • OpenAI API Key

Quick Installation

# Clone the repository
git clone https://gitlab.unige.ch/Filipe.Ramos/nl2sparql_calvin.git
cd nl2sparql_calvin

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Required Services

  1. GraphDB:

Start GraphDB on port 7200.

  2. Qdrant:
# Docker
docker run -p 6333:6333 qdrant/qdrant

# Or local installation
# See: https://qdrant.tech/documentation/quick-start/

โš™๏ธ Configuration

Environment Variables

Copy the .env.example file to .env and configure:

# LLM Model
MODEL_NAME=xxxxx
OPENAI_API_KEY=your_openai_api_key

# Database
GRAPHDB_ENDPOINT=http://localhost:7200/repositories/Calvin
VOID_FILE=/path/to/void.ttl

# Qdrant
QDRANT_HOST=http://localhost:6333
QDRANT_COLLECTION_NAME=calvin-sparql-docs

# LangSmith (optional)
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=nl2sparql_calvin
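
The pipeline reads these variables from the environment at startup. A minimal sketch of loading them with local defaults (only the variable names come from .env.example; the loader itself is illustrative, not the project's actual configuration code, and the model-name default is an assumption based on the GPT-4o mention below):

```python
import os

# Sketch: read the configuration above from the environment with local
# defaults. Variable names come from .env.example; this helper and the
# "gpt-4o" default are illustrative, not the project's config loader.

def load_config() -> dict:
    return {
        "model_name": os.environ.get("MODEL_NAME", "gpt-4o"),
        "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
        "graphdb_endpoint": os.environ.get(
            "GRAPHDB_ENDPOINT", "http://localhost:7200/repositories/Calvin"),
        "qdrant_host": os.environ.get("QDRANT_HOST", "http://localhost:6333"),
        "qdrant_collection": os.environ.get(
            "QDRANT_COLLECTION_NAME", "calvin-sparql-docs"),
    }

config = load_config()
```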

Data Initialization

# Compile SPARQL examples
bash compile_examples.sh

# Initialize vector database
python -c "from llm.index import check_collection; check_collection()"

🎮 Usage

Web Interface (Recommended)

# Start the Chainlit interface
bash ./run_ui.sh
# Access: http://localhost:8000

CLI Interface

# Interactive command-line mode
python -m llm.app

Programmatic API

from llm.chain_pipeline import build_chain
from sparql_llm.utils import get_prefixes_and_schema_for_endpoints
from llm.index import endpoints

# Initialization
prefixes_map, endpoints_void_dict = get_prefixes_and_schema_for_endpoints(endpoints)
chain = build_chain(skip_retriever=False)

# Usage
ctx = {
    "prefixes_map": prefixes_map,
    "endpoints_void_dict": endpoints_void_dict,
}

result = chain.invoke({
    "question": "Your question here",
    **ctx
})

๐Ÿ—๏ธ Architecture

nl2sparql_calvin/
├── llm/                    # Main LLM module
│   ├── chain_pipeline.py   # Processing pipeline
│   ├── calvin_sparql_validation.py  # SPARQL validation
│   ├── retriever.py        # Context retrieval
│   └── index.py            # Data management
├── UI/                     # User interface
│   ├── app.py              # Chainlit application
│   └── chainlit.yaml       # UI configuration
├── benchmark/              # Evaluation system
│   ├── eval.py             # Evaluation pipeline
│   └── sparql/             # Reference queries
├── folds/                  # Cross-validation data
│   └── fold_1/ ... fold_5/
└── Graph/                  # Data and ontologies

Processing Flow

  1. User question → Interface
  2. Context retrieval → Qdrant vector database
  3. SPARQL generation → LLM model (GPT-4o)
  4. Validation → Syntax + Semantics (ShEx)
  5. Correction → Feedback if errors (max 3 attempts)
  6. Execution → GraphDB
  7. Results → User interface
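
The correction loop in steps 3-5 can be sketched as a small retry routine. Here `generate` and `validate` are stand-ins for the real LLM call and the syntax/ShEx validator; only the max-3-attempts policy comes from the flow above:

```python
# Sketch of the generate -> validate -> correct loop (steps 3-5 above).
# `generate` and `validate` are stand-ins for the real LLM call and the
# syntax/ShEx validator; only the 3-attempt limit comes from the text.

MAX_ATTEMPTS = 3

def generate_with_feedback(question, generate, validate):
    feedback = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        query = generate(question, feedback)   # LLM call (stubbed here)
        errors = validate(query)               # [] when the query is valid
        if not errors:
            return query, attempt
        feedback = "\n".join(errors)           # fed into the next prompt
    raise RuntimeError(f"No valid query after {MAX_ATTEMPTS} attempts")

# Toy demonstration: the second attempt succeeds once feedback is provided.
def fake_generate(question, feedback):
    return "SELECT ?s WHERE { ?s ?p ?o }" if feedback else "SELEC ?s"

def fake_validate(query):
    return [] if query.startswith("SELECT") else ["syntax error near 'SELEC'"]

query, attempts = generate_with_feedback("demo", fake_generate, fake_validate)
```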

Contributing

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/new-feature)
  3. Commit changes (git commit -am 'Add new feature')
  4. Push to branch (git push origin feature/new-feature)
  5. Create a Pull Request


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👥 Authors

  • Filipe Ramos - Main Developer - University of Geneva
  • Marco Sorbi - Supervisor and Mentor - PhD Researcher
  • Laurent Moccozet - Bachelor's Thesis Director, Senior Lecturer (MER) - University of Geneva

๐Ÿ™ Acknowledgments

Supervision and Mentoring

  • Laurent Moccozet - Senior Lecturer (MER), University of Geneva

    • Bachelor's Thesis supervision
    • Initiation and facilitation of collaboration with the history department
    • Pedagogical and scientific expertise throughout the project
  • Marco Sorbi - University of Geneva

    • Technical supervision and daily mentoring
    • Expertise in SPARQL technologies
    • Continuous support for implementation and evaluation

Institutions and Teams

  • University of Geneva - University Computing Center and History Department
  • Swiss Institute of Bioinformatics (SIB) - For their SPARQL tools and frameworks
  • UNIGE Computer Science research team
  • LangChain community
  • OpenAI for access to GPT models

SIB Work and Tools Used

This project heavily relies on work and tools developed by the Swiss Institute of Bioinformatics (SIB):

  • sparql-llm: Main framework for SPARQL-LLM integration

    • Used for: Base pipeline, endpoint management, query validation
    • Adapted modules: SparqlEndpointLinks, SparqlExamplesLoader, validation utilities
  • sparql-examples-utils: SPARQL examples compilation and validation tools

    • Used for: Training examples generation, dataset compilation
  • void-generator: VOID metadata generator

    • Used for: Automatic VOID schema generation for GraphDB

Adaptations and Extensions

SIB tools have been adapted and extended for this project:

  • Calvin Validation : Extension of SPARQL validation system with ShEx
  • Multilingual Pipeline : Adaptation for French and historical data
  • Conversational Interface : Integration with Chainlit for UI
  • Evaluation Metrics : Specialized evaluation system with LangSmith
  • Error Handling : Automatic feedback and correction system

This work uses and extends SPARQL-LLM tools developed by the Swiss Institute of Bioinformatics (SIB), available at: https://github.com/sib-swiss/sparql-llm

Project developed as part of a Bachelor's Thesis
University of Geneva - 2025
