Natural language to SPARQL query

This system is designed to take natural language input, convert it into SPARQL queries, execute them on DBPedia, and return the results. Below, I'll break down how it works and how you can start using it.

1. Overview of the Workflow

- Convert natural language to SPARQL (e.g., "List all Nobel Prize winners in Physics after 2000").
- Execute the SPARQL query on DBPedia.
- Validate the results to ensure they are meaningful.
- Display the output in a readable format.

There are also pathology-specific queries for retrieving medical information.

2. Understanding the Key Files

Each file serves a role in the pipeline.

Core Scripts

| File | Purpose |
| --- | --- |
| `query_generator.py` | Uses OpenAI GPT-4 to convert natural language into SPARQL queries. |
| `executor.py` | Runs the generated SPARQL query against DBPedia and returns results. |
| `validator.py` | Checks if the SPARQL query results are valid and meaningful. |

Automation & Testing

| File | Purpose |
| --- | --- |
| `automate_queries.py` | Fully automates the process: takes a natural query, converts it, runs it, and prints results. Uses OpenAI GPT-4. |
| `automate_with_ollama.py` | Same as `automate_queries.py`, but uses Ollama instead of OpenAI's API. |
| `test_queries.py` | Unit tests for query generation, execution, and validation. |

Pathology-Specific Scripts

| File | Purpose |
| --- | --- |
| `run_pathology_queries.py` | Runs five pre-defined pathology-related SPARQL queries on DBPedia. |
| `pathology.py` | Runs a single pathology-related SPARQL query. |

3. Getting Started

There are two main ways to start using this system:

Method 1: Run automate_queries.py (for automated natural language to SPARQL)
Method 2: Manually use query_generator.py + executor.py (for step-by-step control)

Method 1: Fully Automated (Best for Testing)

Open a terminal in the project directory.
Run:
```
python automate_queries.py
```
The script will:
- Take a natural language query ("Who are some famous pathologists?")
- Generate a SPARQL query using GPT-4.
- Execute the SPARQL query on DBPedia.
- Print the results.
If you want to modify the query, open automate_queries.py and change:
```
natural_query = "Who are some famous pathologists?"
```
to the question you want to ask.
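
The flow described above can be sketched as one function. This is a minimal sketch, not the contents of `automate_queries.py`: `run_pipeline` and its three callable parameters are hypothetical names, and the lambdas in the usage part stand in for the real GPT-4 and DBPedia calls so the example runs offline.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_pipeline(
    natural_query: str,
    generate: Callable[[str], str],
    execute: Callable[[str], List[Row]],
    validate: Callable[[List[Row]], bool],
) -> List[Row]:
    """Chain the steps: question -> SPARQL -> rows -> validity check."""
    sparql = generate(natural_query)
    rows = execute(sparql)
    if not validate(rows):
        raise ValueError(f"No meaningful results for: {natural_query!r}")
    return rows


if __name__ == "__main__":
    # Stand-ins (assumptions) so the sketch runs without any API or network access.
    rows = run_pipeline(
        "Who are some famous pathologists?",
        generate=lambda q: "SELECT ?name WHERE { ?p foaf:name ?name } LIMIT 1",
        execute=lambda s: [{"name": "example"}],
        validate=lambda r: len(r) > 0,
    )
    for row in rows:
        print(row["name"])
```

Passing the three steps in as functions keeps the sketch independent of which model or endpoint is used.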

Method 2: Step-by-Step Execution

If you want to control each step manually:

Step 1: Generate a SPARQL Query

Run:

```
python query_generator.py
```

This will take a natural language question (e.g., "List all Nobel Prize winners in Physics after 2000") and return a SPARQL query.
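
The two parts of this step that do not depend on a particular model API are building the prompt and cleaning up the reply. The sketch below shows one possible way to do both. The prompt wording and the names `build_prompt` and `extract_sparql` are assumptions for illustration; `query_generator.py` may work differently.

```python
import re

# Hypothetical prompt wording; the real script may phrase this differently.
PROMPT_TEMPLATE = (
    "Translate the question into a SPARQL query for DBPedia. "
    "Return only the query.\n\nQuestion: {question}"
)

FENCE = "`" * 3  # a Markdown code fence, which models often wrap around code
FENCED_BLOCK = re.compile(
    FENCE + r"(?:sparql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE
)


def build_prompt(question: str) -> str:
    """Fill the prompt template with the user's question."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_sparql(reply: str) -> str:
    """Return the query from a model reply, dropping a code fence if present."""
    match = FENCED_BLOCK.search(reply)
    query = match.group(1) if match else reply
    return query.strip()
```

The string returned by `build_prompt` would be sent to the model, and the model's reply passed through `extract_sparql` before execution.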

Step 2: Execute the SPARQL Query

Copy the generated query and run:

```
python executor.py
```

This script will send the query to DBPedia and return the results.
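
As a rough illustration of what sending a query involves, the sketch below uses only the standard library and the SPARQL Protocol (a `query` parameter plus an `Accept` header asking for JSON results). The function names are hypothetical and `executor.py` may use a different library. `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(query: str, timeout: float = 30.0) -> List[Dict[str, str]]:
    """Send the query and return the flattened rows (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Keeping request building and result parsing separate from the network call makes both easy to test offline.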

Step 3: Validate Results

If you want to validate whether the results are useful, call:

```
from validator import validate_results

valid = validate_results(results)  # Pass the results from executor.py
print("Valid:", valid)
```
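
This README does not spell out what counts as "valid", so the following is only a guess at a reasonable minimum: at least one row, and no row that is entirely blank. `looks_meaningful` is a hypothetical stand-in, not the actual `validate_results`, and it assumes results are a list of dictionaries mapping variable names to string values.

```python
from typing import Dict, List


def looks_meaningful(rows: List[Dict[str, str]]) -> bool:
    """Return True when there is at least one row and no row is entirely blank."""
    if not rows:
        return False
    return all(any(value.strip() for value in row.values()) for row in rows)
```

An empty result list is treated as invalid here because it usually means the generated query did not match anything in DBPedia.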

4. Running Pathology Queries

If you're interested in medical queries, run:

```
python run_pathology_queries.py
```

This will run five pathology-related queries, including:

- Common diseases
- Cancers and ICD-10 codes
- Liver diseases
- Pathology scientists

Alternatively, to run a single pathology-related query, use:

```
python pathology.py
```
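
Running a fixed set of named queries can be sketched as a dictionary plus a loop. The query texts below are placeholders written for this example (they assume DBPedia terms such as `dbo:Disease` and `dbr:Pathology`), not the queries shipped in `run_pathology_queries.py`, and `run_all` is a hypothetical helper.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Placeholder queries (assumptions); the real ones live in run_pathology_queries.py.
PATHOLOGY_QUERIES: Dict[str, str] = {
    "Common diseases": "SELECT ?disease WHERE { ?disease a dbo:Disease } LIMIT 10",
    "Pathology scientists": (
        "SELECT ?person WHERE { ?person dbo:field dbr:Pathology } LIMIT 10"
    ),
}


def run_all(
    queries: Dict[str, str], execute: Callable[[str], List[Row]]
) -> Dict[str, List[Row]]:
    """Run each named query with the given executor and collect rows by label."""
    return {label: execute(sparql) for label, sparql in queries.items()}
```

Any function that takes a SPARQL string and returns rows can be passed as `execute`.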

5. Running Tests

To test the system, run:

```
python -m unittest discover
```

This will check:

- Whether SPARQL queries are generated correctly.
- Whether they execute successfully.
- Whether the results are valid.
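
For readers new to `unittest`, a test for the first check might look roughly like the sketch below. The helper `is_select_query` and the test names are made up for this example; the actual tests are in `test_queries.py`.

```python
import unittest


def is_select_query(sparql: str) -> bool:
    """Hypothetical check: does the generated text start with PREFIX or SELECT?"""
    return sparql.lstrip().upper().startswith(("SELECT", "PREFIX"))


class QueryGenerationTest(unittest.TestCase):
    def test_generated_text_is_a_select_query(self):
        self.assertTrue(is_select_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"))

    def test_plain_prose_is_rejected(self):
        self.assertFalse(is_select_query("Sorry, I cannot help with that."))
```

`python -m unittest discover` collects test classes like this one from files whose names match `test*.py`.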

6. Alternative: Running with Ollama Instead of OpenAI

If you want to avoid using OpenAI's API, you can use Ollama.

Run:

```
python automate_with_ollama.py
```

It will:

- Use Ollama's Mistral model instead of GPT-4.
- Convert natural language to SPARQL.
- Execute the query.
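
One way to reach a local model is Ollama's REST API, which by default listens on `http://localhost:11434` and accepts a JSON body on `/api/generate`. The sketch below assumes that route and a pulled `mistral` model. Whether `automate_with_ollama.py` uses this API or the `ollama` Python package is not stated here, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_ollama_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral", timeout: float = 120.0) -> str:
    """Return the model's reply text (requires a running Ollama server)."""
    request = build_ollama_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```

With `"stream": False` the server returns a single JSON object, so the reply can be read in one call.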

⚠️ Note: in the author's experience, OpenAI's GPT-4 performs better than the Mistral model served by Ollama.

7. Example Inputs & Expected Outputs

Example 1: Finding Nobel Prize Winners

Input (Natural Language)

"List all Nobel Prize winners in Physics after 2000."

Generated SPARQL Query

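```
SELECT ?name WHERE {
    ?person a dbo:Scientist .
    ?person dbo:award dbr:Nobel_Prize .
    ?person dbo:field dbr:Physics .
    ?person foaf:name ?name .
    # FILTER only accepts expressions, so bind the award date to a variable first.
    # dbo:awardYear is kept from the example as-is; check that DBPedia defines it.
    ?person dbo:awardYear ?awardDate .
    FILTER (year(?awardDate) > 2000)
} LIMIT 10
```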
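
The query above uses the `dbo:`, `dbr:`, and `foaf:` prefixes without declaring them. Some endpoints predefine common prefixes, but a standalone query needs explicit `PREFIX` lines. The sketch below adds them when missing. `with_prefixes` is a hypothetical helper, and the substring check it uses is deliberately simple.

```python
from typing import Dict

DBPEDIA_PREFIXES: Dict[str, str] = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def with_prefixes(query: str, prefixes: Dict[str, str] = DBPEDIA_PREFIXES) -> str:
    """Prepend a PREFIX line for each prefix the query uses but does not declare."""
    lines = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in prefixes.items()
        if f"{name}:" in query and f"PREFIX {name}:" not in query
    ]
    return "\n".join(lines + [query])
```

Calling `with_prefixes` on a query that already declares its prefixes returns it unchanged.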

Output (Results)

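The names below only illustrate the output format, one name per line. They are not actual results: none of these scientists received the prize after 2000, so a correct query would not return them.

```
Albert Einstein
Richard Feynman
Marie Curie
...
```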

8. Troubleshooting & Debugging

1. No results found?

- Check if the generated SPARQL query is valid.
- Run the query manually in DBPedia's Query Editor.
- Adjust filtering conditions in the query.

2. OpenAI API Issues?

- Ensure `OPENAI_API_KEY` is set in your environment variables.
- Try switching to `automate_with_ollama.py`.
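
A missing key is easier to diagnose if the script fails early with a clear message. The helper below is a small sketch of that idea; `require_env` is a hypothetical name and is not part of the project.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or raise a clear error if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return value
```

For example, `api_key = require_env("OPENAI_API_KEY")` at the top of a script stops it before any API call is attempted.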

3. DBPedia Not Responding?

- DBPedia's SPARQL endpoint sometimes throttles requests.
- Try running queries during off-peak hours.
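
Besides waiting for off-peak hours, a common way to cope with throttling is to retry with an increasing delay. The sketch below shows exponential backoff around any callable. `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary defaults, not values recommended by DBPedia.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; on OSError wait base_delay, then twice as long, and retry."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Network errors raised by `urllib` (including HTTP errors and timeouts) are subclasses of `OSError`, so a call such as `with_retries(lambda: urllib.request.urlopen(url))` would be retried on those failures.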

9. Summary

| Task | Recommended Script |
| --- | --- |
| Full automation | `automate_queries.py` |
| Step-by-step execution | `query_generator.py` → `executor.py` |
| Validate query results | `validator.py` |
| Run pathology-related queries | `run_pathology_queries.py` |
| Test the system | `test_queries.py` |
| Use Ollama instead of OpenAI | `automate_with_ollama.py` |

With this system you can query DBPedia in natural language. You can either:

- Use `automate_queries.py` for a quick, fully automated approach.
- Generate and execute queries manually with `query_generator.py` and `executor.py` for finer control.
