This tool builds a knowledge graph from unstructured text in PDF documents and stores it in a Neo4j database. It is well suited to documents such as insurance policies or annual reports.
Before you begin, ensure that Python 3.x is installed on your system. You can download it from Python's official website.
Ensure your PDF files are formatted correctly for ingestion. Update the main.py file with a list of documents, as shown in the sample structure:
{
"url": "PDF URL https://...",
"title": "Title of the pdf",
"context": "Insurance Document or Annual Report"
}
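As a minimal sketch, the list in main.py might look like the following, with a small check that every entry carries the three keys from the sample. The names `documents`, `REQUIRED_KEYS`, and `find_invalid_documents`, along with the example URL, are assumptions made for illustration; the variable name that main.py actually expects may differ.

```python
# Hypothetical sketch: the variable name `documents` and the helper below
# are illustrative assumptions, not part of the tool itself.
REQUIRED_KEYS = ("url", "title", "context")

documents = [
    {
        "url": "https://example.com/sample-policy.pdf",  # placeholder URL
        "title": "Sample Policy",
        "context": "Insurance Document",
    },
]


def find_invalid_documents(docs):
    """Return (index, missing_keys) pairs for entries with a missing or blank key."""
    problems = []
    for index, doc in enumerate(docs):
        # A key counts as missing if it is absent or only whitespace.
        missing = [key for key in REQUIRED_KEYS if not str(doc.get(key, "")).strip()]
        if missing:
            problems.append((index, missing))
    return problems


if __name__ == "__main__":
    for index, missing in find_invalid_documents(documents):
        print(f"Document {index} is missing: {', '.join(missing)}")
```

Running a check like this before the import can catch an incomplete entry early, instead of partway through processing.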
Install the required Python dependencies:
pip install -r requirements.txt
Set up the necessary environment variables:
- Copy the sample environment configuration file:
cp .env.example .env
- Fill in the details in .env for OpenAI and your Neo4j database.
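Before launching, it can help to confirm that the settings are visible to Python. The sketch below checks the process environment for a set of variables. The names `OPENAI_API_KEY`, `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are assumptions; use whichever keys .env.example actually defines. The helper `missing_env_vars` is likewise hypothetical.

```python
import os

# Assumed variable names; replace with the keys listed in .env.example.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All expected settings are present.")
```

Note that this reads the process environment only and does not parse the .env file itself, so the values must already be loaded, for example by exporting them in your shell or through whatever loader the project uses.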
Launch the application to begin importing your PDFs into Neo4j:
python main.py
To start the Streamlit-based user interface, use the following command from the root of the repository:
streamlit run ./ui/chatbot.py
This command starts the UI, which lets you interact with the extracted text data through a web interface.