A support chatbot that leverages local LLMs, vector similarity search, and knowledge graphs to provide contextual assistance by finding and presenting solutions from historical support tickets.
Screen.Recording.2025-06-02.at.23.57.54.mov
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Streamlit β β LLM Handler β β Ollama API β
β Chat UI βββββΊβ (OpenAI βββββΊβ (Local LLM) β
β β β Compatible) β β β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β β
β βΌ
β ββββββββββββββββββββ βββββββββββββββββββ
β β Knowledge Graph βββββΊβ Neo4j β
β β Retriever β β Database β
β ββββββββββββββββββββ βββββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββ ββββββββββββββββββββ
β Conversation β β Embedding β
β History β β Computation β
β (JSON Files) β β β
βββββββββββββββββββ ββββββββββββββββββββ
A more detailed workflow flowchart diagram can be found in the docs/1. Analysis/main_workflow_diagram.mmd file.
- Python 3.13.2
- Neo4j Database with APOC plugin
- Ollama
-
Ollama Server
# Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Pull required models ollama pull llama3.2 # or your preferred chat model ollama pull nomic-embed-text # for embeddings
-
Neo4j Database
Install Neo4j: https://neo4j.com/deployment-center/
Important: Make sure to install the APOC plugin for Neo4j as it's required for embedding operations.
-
Clone the repository
git clone "https://github.com/AlexisBalayre/3ds_Assignement_Data_Scientist_Customer_Success" cd 3ds_Assignement_Data_Scientist_Customer_Success
-
Install Python dependencies using Poetry
# Install Poetry if you haven't already curl -sSL https://install.python-poetry.org | python3 - # Install project dependencies poetry install # Activate the virtual environment poetry shell
-
Create an environment file
cp exemple.env .env
The project includes Cypher scripts to set up the complete database schema and load sample data:
-
Database Structure Setup (
datasets/setup_database.cypher
)- Creates unique constraints for all entities
- Loads sample data from CSV files (users, tickets, comments, etc.)
- Establishes relationships between entities
- Cleans up temporary foreign key properties
-
Embeddings Setup (
datasets/upload_embeddings.cypher
)- Loads pre-computed embeddings for tickets
- Creates vector index for similarity search
- Configures cosine similarity with proper dimensions
-
Prepare your CSV files in the Neo4j import directory:
neo4j/import/ βββ users_sample.csv βββ status_sample.csv βββ priority_sample.csv βββ category_sample.csv βββ tickets_sample.csv βββ comments_sample.csv βββ documentation_sample.csv βββ tickets_sample_with_embeddings.csv
Note: The Cypher scripts use
http://localhost:11001/project-<id>/
URLs for CSV import. Update these URLs in the scripts to match your Neo4j import directory or file server setup. -
Run the database setup script:
// In Neo4j Browser or cypher-shell // Copy and paste the contents of datasets/setup_database.cypher
-
Generate embeddings for your tickets:
# Process your tickets CSV to add embeddings poetry run python compute_embedding.py
-
Upload embeddings to Neo4j:
// In Neo4j Browser // Copy and paste the contents of datasets/upload_embeddings.cypher
The database follows this entity relationship structure:
User ββSUBMITSβββΊ Ticket ββHAS_STATUSβββΊ Status
β β
βββPOSTSβββΊ Comment β
β βββHAS_PRIORITYβββΊ Priority
β β
CONTAINSβββ β ββHAS_CATEGORYβββΊ Category
β β
βββCONTAINSβββΊ Comment β
β β
DocumentationArticle ββββ΄ββββββββ
β
REFERENCED_BYβββComment
A more detailed architecture diagram can be found in the docs/1. Analysis/knowledge_graph_architecture_diagram.mmd file.
The graph traversal workflow flowchart can be found in the docs/1. Analysis/graph_traversal_workflow_diagram.mmd file.
# Using Poetry (recommended)
poetry run streamlit run app.py
# Or activate the virtual environment first
poetry shell
streamlit run app.py
# Or run directly
poetry run python app.py
- Access the application at http://localhost:8501
- Select a model from the sidebar dropdown
- Configure parameters (optional):
- System prompt
- Temperature (creativity level)
- Number of similar tickets to retrieve
- Similarity threshold
- Context comments to include
- Start chatting with questions about technical issues
- View sources when the AI finds similar tickets
User: "I get a 500 error when I try to open Aura. What should I do?"
AI: [Searches for similar tickets and finds relevant solutions] "Based on similar tickets, this is typically caused by authentication service issues. Here's how to resolve it:
- Check if the authentication service is running
- Restart the auth service if needed
- Clear browser cache and cookies
- Try logging in again"
[Shows source tickets with similarity scores]
3ds_Assignement_Data_Scientist_Customer_Success/
βββ app.py # Application entry point
βββ chatbot.py # Streamlit chat interface
βββ llm_handler.py # LLM operations and tool integration
βββ knowledge_graph_retriever.py # Neo4j data retriever handler with similarity search
βββ compute_embedding.py # Text embedding generation handler
βββ config.py # Configuration settings
βββ pyproject.toml # Poetry dependencies and project config
βββ poetry.lock # Poetry lock file for reproducible builds
βββ datasets/
β βββ data/
β β βββ tickets_sample.csv
β β βββ tickets_sample_with_embeddings.csv
β β βββ users_sample.csv
β β βββ comments_sample.csv
β β βββ status_sample.csv
β β βββ priority_sample.csv
β β βββ category_sample.csv
β β βββ documentation_sample.csv
β βββ setup_database.cypher # Neo4j database schema and data setup
β βββ upload_embeddings.cypher # Vector embeddings and index setup
βββ llm_history/ # Conversation storage
β βββ *.json # Individual conversation files
βββ README.md # This file
- Chat Models: Any Ollama-compatible model (llama2, mistral, codellama, etc.)
- Embedding Models: Models that produce 768-dimensional vectors (e.g., sentence-transformers)
- Vector Dimensions: Must match your embedding model (project configured for 768 dimensions)
- Neo4j Requirements: APOC procedures must be installed for embedding handling
- top_k: Number of similar tickets to retrieve (1-20)
- context_comments: Comments to include for context (0-10)
- min_similarity_score: Similarity threshold (0.7-0.95)
- temperature: Response creativity (0.0-1.0)
The system intelligently decides when to search for similar tickets:
- Searches when: Technical issues, error reports, how-to questions
- Doesn't search for: Greetings, general questions, casual conversation
This project was built using knowledge and inspiration from:
- Knowledge Graphs + LLMs: Multi-Hop Question Answering - Neo4j blog on integrating knowledge graphs with LLMs
- LLM Fundamentals: Vectors & Semantic Search - GraphAcademy course on vector indexes and semantic search with neo4j
- Structured Outputs With LLM - OpenAI guide on structured outputs for LLMs
- Function Calling - OpenAI documentation on function calling with LLMs