This system is designed to take natural language input, convert it into SPARQL queries, execute them on DBPedia, and return the results. Below, I'll break down how it works and how you can start using it.
- Convert natural language to SPARQL (e.g., "List all Nobel Prize winners in Physics after 2000").
- Execute the SPARQL query on DBPedia.
- Validate the results to ensure they are meaningful.
- Display the output in a readable format.
There are also pathology-specific queries for retrieving medical information.
Each file serves a role in the pipeline.
File | Purpose |
---|---|
query_generator.py | Uses OpenAI GPT-4 to convert natural language into SPARQL queries. |
executor.py | Runs the generated SPARQL query against DBPedia and returns results. |
validator.py | Checks if the SPARQL query results are valid and meaningful. |
File | Purpose |
---|---|
automate_queries.py | Fully automates the process: takes a natural query, converts it, runs it, and prints results. Uses OpenAI GPT-4. |
automate_with_ollama.py | Same as automate_queries.py, but uses Ollama instead of OpenAI's API. |
test_queries.py | Unit tests for query generation, execution, and validation. |
File | Purpose |
---|---|
run_pathology_queries.py | Runs five pre-defined pathology-related SPARQL queries on DBPedia. |
pathology.py | Runs a single pathology-related SPARQL query. |
There are two main ways to start using this system:
- Method 1: Run automate_queries.py (for automated natural language to SPARQL)
- Method 2: Manually use query_generator.py + executor.py (for step-by-step control)
- Open a terminal in the project directory.
- Run:
  python automate_queries.py
- The script will:
  - Take a natural language query ("Who are some famous pathologists?").
  - Generate a SPARQL query using GPT-4.
  - Execute the SPARQL query on DBPedia.
  - Print the results.
- If you want to modify the query, open automate_queries.py and change:
  natural_query = "Who are some famous pathologists?"
  to whatever you want.
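The four steps the script performs can be sketched as a single function. This is only an illustration of the flow, not the contents of automate_queries.py: `run_pipeline` and its three callable parameters are hypothetical names, and the results are assumed to arrive in the standard SPARQL JSON layout (`results` → `bindings`).

```python
from typing import Callable


def run_pipeline(
    natural_query: str,
    generate: Callable[[str], str],
    execute: Callable[[str], dict],
    validate: Callable[[dict], bool],
) -> list[str]:
    """Generate a query, run it, validate the response, and return printable rows."""
    sparql = generate(natural_query)   # natural language -> SPARQL
    results = execute(sparql)          # SPARQL -> JSON response
    if not validate(results):          # stop early on empty or unusable results
        return []
    rows = []
    for binding in results["results"]["bindings"]:
        # Join every column of the row into one readable line.
        rows.append(", ".join(cell["value"] for cell in binding.values()))
    return rows
```

Passing the three steps in as arguments keeps the flow testable without an API key or network access, since each step can be replaced by a stub.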
If you want to control each step manually:
Run:
python query_generator.py
This will take a natural language question (e.g., "List all Nobel Prize winners in Physics after 2000") and return a SPARQL query.
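Language models often wrap their answer in a Markdown code fence or add a sentence around it, so the generated text usually needs a little cleanup before it can be executed. The sketch below shows one way to build the prompt and extract the bare query. The prompt wording and the names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_sparql` are assumptions for illustration; query_generator.py may do this differently.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

# Hypothetical prompt template; the real script may word this differently.
PROMPT_TEMPLATE = (
    "Translate the question into a SPARQL query for DBpedia.\n"
    "Return only the query.\n\n"
    "Question: {question}\n"
)


def build_prompt(question: str) -> str:
    """Fill the template with the user's question."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_sparql(reply: str) -> str:
    """Return the query from a model reply, dropping any surrounding code fence."""
    pattern = FENCE + r"(?:sparql)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL | re.IGNORECASE)
    query = match.group(1) if match else reply
    return query.strip()
```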
Copy the generated query and run:
python executor.py
This script will send the query to DBPedia and return the results.
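For reference, sending a query to DBPedia needs nothing beyond the Python standard library. The following is a minimal sketch, not the code in executor.py: `build_request_url` and `run_query` are hypothetical names, and it assumes the public endpoint at https://dbpedia.org/sparql with JSON results requested through the `format` parameter.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint


def build_request_url(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Encode the query as a GET request asking for SPARQL JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"


def run_query(query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request_url(query), timeout=timeout) as response:
        return json.load(response)
```

The returned dictionary follows the SPARQL JSON results layout: column names under `head.vars` and one entry per row under `results.bindings`.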
If you want to validate whether the results are useful, call:
from validator import validate_results
valid = validate_results(results) # Pass the results from executor.py
print("Valid:", valid)
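The internals of validator.py are not shown here, so the following is only a guess at what a check like `validate_results` might do. It assumes `results` is the SPARQL JSON response and treats it as valid when at least one row contains a non-empty value.

```python
def validate_results(results: dict) -> bool:
    """Return True if the response holds at least one row with a non-empty value."""
    # Missing keys are treated the same as an empty result set.
    bindings = results.get("results", {}).get("bindings", [])
    for row in bindings:
        if any(cell.get("value", "").strip() for cell in row.values()):
            return True
    return False
```

A stricter check could also compare the returned columns against the variables the query was expected to select.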
If you're interested in medical queries, run:
python run_pathology_queries.py
This will run five pathology-related queries, including:
- Common diseases
- Cancers and ICD-10 codes
- Liver diseases
- Pathology scientists
Alternatively, to run a single pathology-related query, use:
python pathology.py
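A batch script of this kind usually comes down to a dictionary of named queries and a loop. The sketch below illustrates that shape only: the two queries are hypothetical stand-ins (the five real ones live in run_pathology_queries.py), `dbo:Disease` and `dbo:icd10` are assumed to be the relevant DBPedia ontology terms, and `execute` is any function that takes a SPARQL string and returns the JSON response.

```python
from typing import Callable

# Hypothetical query set for illustration; not the queries shipped with the project.
PATHOLOGY_QUERIES = {
    "Common diseases": """
        SELECT ?label WHERE {
          ?disease a dbo:Disease ;
                   rdfs:label ?label .
          FILTER (lang(?label) = "en")
        } LIMIT 10
    """,
    "Diseases with ICD-10 codes": """
        SELECT ?label ?code WHERE {
          ?disease a dbo:Disease ;
                   rdfs:label ?label ;
                   dbo:icd10 ?code .
          FILTER (lang(?label) = "en")
        } LIMIT 10
    """,
}


def run_all(queries: dict[str, str], execute: Callable[[str], dict]) -> dict[str, int]:
    """Run each named query and report how many rows it returned."""
    counts = {}
    for title, sparql in queries.items():
        results = execute(sparql)
        counts[title] = len(results.get("results", {}).get("bindings", []))
    return counts
```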
To test the system, run:
python -m unittest discover
This will check:
- Whether SPARQL queries are generated correctly.
- Whether they execute successfully.
- Whether the results are valid.
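To give a sense of what such tests can look like without calling GPT-4 or DBPedia, here is a small sketch that replaces the model with a mock. It is not taken from test_queries.py: `is_select_query` and `QueryPipelineTests` are hypothetical, and the structural check is deliberately simple (it would reject a query that begins with PREFIX declarations).

```python
import unittest
from unittest import mock


def is_select_query(sparql: str) -> bool:
    """Cheap structural check: starts with SELECT, has WHERE, braces are balanced."""
    text = sparql.upper()
    return (
        text.lstrip().startswith("SELECT")
        and "WHERE" in text
        and sparql.count("{") == sparql.count("}") > 0
    )


class QueryPipelineTests(unittest.TestCase):
    def test_generated_query_looks_like_sparql(self):
        # Stand-in for the model call, so the test needs no API key.
        generate = mock.Mock(return_value="SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
        self.assertTrue(is_select_query(generate("any question")))
        generate.assert_called_once_with("any question")

    def test_unbalanced_query_is_rejected(self):
        self.assertFalse(is_select_query("SELECT ?s WHERE { ?s ?p ?o"))
```

Saved in a file whose name starts with `test`, this class is picked up by `python -m unittest discover` along with the project's own tests.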
If you want to avoid using OpenAI's API, you can use Ollama.
Run:
python automate_with_ollama.py
It will:
- Use Ollama's Mistral model instead of GPT-4.
- Convert natural language to SPARQL.
- Execute the query.
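Ollama serves models over a local HTTP API, so the generation step can be reproduced with a plain POST request. The sketch below assumes Ollama is running on its default address (http://localhost:11434) and that the `mistral` model has already been pulled; `build_ollama_request` and `generate_with_ollama` are hypothetical names, not functions from automate_with_ollama.py.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_ollama_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Prepare a non-streaming generation request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_with_ollama(prompt: str) -> str:
    """Send the prompt and return the model's text (requires a running Ollama server)."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=120) as response:
        return json.load(response)["response"]
```

Setting `stream` to False makes the server return a single JSON object, which avoids having to reassemble the answer from streamed chunks.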
"List all Nobel Prize winners in Physics after 2000."
SELECT ?name WHERE {
  ?person a dbo:Scientist .
  ?person dbo:award dbr:Nobel_Prize .
  ?person dbo:field dbr:Physics .
  ?person foaf:name ?name .
  ?person dbo:awardYear ?awardYear .  # bind the year first; FILTER takes an expression, not a triple pattern
  FILTER (year(?awardYear) > 2000)
} LIMIT 10
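Example output (illustrative names only; none of these laureates won after 2000, so a correct query would not return them):
Albert Einstein
Richard Feynman
Marie Curie
...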
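A list like this can be produced from the JSON that a SPARQL endpoint returns, where `head.vars` names the columns and `results.bindings` holds one entry per row. The following sketch shows one possible formatter; `format_table` is a hypothetical helper, not part of the project.

```python
def format_table(results: dict) -> str:
    """Render a SPARQL JSON response as pipe-separated lines with a header row."""
    columns = results["head"]["vars"]
    lines = [" | ".join(columns)]
    for row in results["results"]["bindings"]:
        # A variable can be unbound in a given row, so fall back to an empty cell.
        lines.append(" | ".join(row.get(col, {}).get("value", "") for col in columns))
    return "\n".join(lines)
```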
- Check if the generated SPARQL query is valid.
- Run the query manually in DBPedia's Query Editor.
- Adjust filtering conditions in the query.
- Ensure OPENAI_API_KEY is set in your environment variables.
- Try switching to automate_with_ollama.py.
- DBPedia's SPARQL endpoint sometimes throttles requests.
- Try running queries during off-peak hours.
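Besides waiting for a quieter time, a throttled request can simply be retried with a growing pause between attempts. The sketch below shows that pattern under stated assumptions: `with_retries` is a hypothetical helper, and it treats any `OSError` (which covers urllib's `URLError`, `HTTPError`, and socket timeouts) as a reason to retry.

```python
import time
from typing import Callable


def with_retries(
    action: Callable[[], dict],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call action, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return action()
        except OSError:
            # Out of attempts: let the caller see the last error.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` is enough, for example `with_retries(lambda: execute(sparql))` where `execute` is whatever function sends the query.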
Task | Recommended Script |
---|---|
Full automation | automate_queries.py |
Step-by-step execution | query_generator.py → executor.py |
Validate query results | validator.py |
Run pathology-related queries | run_pathology_queries.py |
Test the system | test_queries.py |
Use Ollama instead of OpenAI | automate_with_ollama.py |
This system lets you query DBPedia using natural language. You can either:
- Use automate_queries.py for a quick, fully automated approach.
- Manually generate and execute queries for fine control.