
3DS Assignement - Data Scientist - Customer Success: AuraHelpeskGraph πŸ€–

A support chatbot that leverages local LLMs, vector similarity search, and knowledge graphs to provide contextual assistance by finding and presenting solutions from historical support tickets.

Demo video: Screen.Recording.2025-06-02.at.23.57.54.mov

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Streamlit     β”‚    β”‚   LLM Handler    β”‚    β”‚   Ollama API    β”‚
β”‚   Chat UI       │◄──►│   (OpenAI        │◄──►│   (Local LLM)   β”‚
β”‚                 β”‚    β”‚   Compatible)    β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚
         β”‚                       β–Ό
         β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚              β”‚  Knowledge Graph │◄──►│     Neo4j       β”‚
         β”‚              β”‚    Retriever     β”‚    β”‚   Database      β”‚
         β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚
         β–Ό                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Conversation   β”‚    β”‚   Embedding      β”‚
β”‚   History       β”‚    β”‚   Computation    β”‚
β”‚   (JSON Files)  β”‚    β”‚                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

A more detailed workflow flowchart can be found in the docs/1. Analysis/main_workflow_diagram.mmd file.
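
To illustrate the "OpenAI Compatible" arrow above: because Ollama exposes an OpenAI-compatible API, the LLM handler can talk to the local model through a standard OpenAI client pointed at localhost. This is a minimal sketch, not the project's llm_handler.py; the model name and port are the defaults used later in this README:

from openai import OpenAI

# Ollama serves an OpenAI-compatible API on port 11434; the API key is not
# checked, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",  # any chat model you have pulled into Ollama
    messages=[
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "I get a 500 error when I try to open Aura."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)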

πŸ“‹ Prerequisites

System Requirements

  • Python 3.13.2
  • Neo4j Database with APOC plugin
  • Ollama

Required Services

  1. Ollama Server

    # Install Ollama
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Pull required models
    ollama pull llama3.2  # or your preferred chat model
    ollama pull nomic-embed-text  # for embeddings
  2. Neo4j Database

    Install Neo4j: https://neo4j.com/deployment-center/

    Important: Make sure to install the APOC plugin for Neo4j as it's required for embedding operations.
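
Once both services are running, an optional sanity check can confirm they are reachable before you continue. This is an illustrative sketch; the Neo4j URI and credentials are placeholders to replace with your own:

import requests
from neo4j import GraphDatabase

# Ollama: list the models the local server has pulled.
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print([model["name"] for model in tags.get("models", [])])

# Neo4j: verify that the database accepts connections.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your-password"))
driver.verify_connectivity()
driver.close()
print("Neo4j connection OK")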

πŸš€ Installation

Setup

  1. Clone the repository

    git clone "https://github.com/AlexisBalayre/3ds_Assignement_Data_Scientist_Customer_Success"
    cd 3ds_Assignement_Data_Scientist_Customer_Success
  2. Install Python dependencies using Poetry

    # Install Poetry if you haven't already
    curl -sSL https://install.python-poetry.org | python3 -
    
    # Install project dependencies
    poetry install
    
    # Activate the virtual environment
    poetry shell
  3. Create an environment file

    cp exemple.env .env
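
The application reads its connection settings from this .env file (see config.py). Below is a sketch of how such settings are typically loaded; the variable names and defaults are assumptions, so check exemple.env for the actual keys:

import os
from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # read .env from the project root

# Variable names below are illustrative -- use the keys defined in exemple.env.
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")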

πŸ’Ύ Data Setup

Neo4j Database Schema Setup

The project includes Cypher scripts to set up the complete database schema and load sample data:

  1. Database Structure Setup (datasets/setup_database.cypher)

    • Creates unique constraints for all entities
    • Loads sample data from CSV files (users, tickets, comments, etc.)
    • Establishes relationships between entities
    • Cleans up temporary foreign key properties
  2. Embeddings Setup (datasets/upload_embeddings.cypher)

    • Loads pre-computed embeddings for tickets
    • Creates vector index for similarity search
    • Configures cosine similarity with proper dimensions
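
For reference, the vector index that the upload script configures can be expressed roughly as follows, here run through the Python driver. The index name, node label, and property name are assumptions; the authoritative statements live in datasets/upload_embeddings.cypher:

from neo4j import GraphDatabase

# Rough equivalent of the index created by upload_embeddings.cypher.
CREATE_INDEX = """
CREATE VECTOR INDEX ticket_embeddings IF NOT EXISTS
FOR (t:Ticket) ON (t.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 768,
  `vector.similarity_function`: 'cosine'
}}
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your-password"))
with driver.session() as session:
    session.run(CREATE_INDEX)
driver.close()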

Setting Up Your Database

  1. Prepare your CSV files in the Neo4j import directory:

    neo4j/import/
    β”œβ”€β”€ users_sample.csv
    β”œβ”€β”€ status_sample.csv
    β”œβ”€β”€ priority_sample.csv
    β”œβ”€β”€ category_sample.csv
    β”œβ”€β”€ tickets_sample.csv
    β”œβ”€β”€ comments_sample.csv
    β”œβ”€β”€ documentation_sample.csv
    └── tickets_sample_with_embeddings.csv
    

    Note: The Cypher scripts use http://localhost:11001/project-<id>/ URLs for CSV import. Update these URLs in the scripts to match your Neo4j import directory or file server setup.

  2. Run the database setup script:

    // In Neo4j Browser or cypher-shell
    // Copy and paste the contents of datasets/setup_database.cypher
  3. Generate embeddings for your tickets (see the sketch after this list):

    # Process your tickets CSV to add embeddings
    poetry run python compute_embedding.py
  4. Upload embeddings to Neo4j:

    // In Neo4j Browser
    // Copy and paste the contents of datasets/upload_embeddings.cypher
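
Step 3 above is handled by compute_embedding.py. As a minimal sketch, a ticket text can be embedded with the nomic-embed-text model through Ollama's embeddings endpoint; the helper function below is illustrative, not the project's actual code:

import requests

OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/api/embeddings"

def embed_text(text: str) -> list[float]:
    """Return a 768-dimensional embedding for `text` (illustrative helper)."""
    response = requests.post(
        OLLAMA_EMBEDDINGS_URL,
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    embedding = response.json()["embedding"]
    # The dimension must match the vector index (768 in this project).
    assert len(embedding) == 768
    return embedding

vector = embed_text("Aura returns a 500 error on login")
print(len(vector))  # 768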

Data Schema

The database follows this entity relationship structure:

User ──SUBMITS──► Ticket ──HAS_STATUS──► Status
 β”‚                   β”‚
 └──POSTS──► Comment β”‚
             β”‚       β”œβ”€β”€HAS_PRIORITY──► Priority
             β”‚       β”‚
         CONTAINS◄── β”‚ ──HAS_CATEGORY──► Category
                     β”‚                      β”‚
                     └──CONTAINS──► Comment β”‚
                                    β”‚       β”‚
            DocumentationArticle β—„β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                REFERENCED_BY◄──Comment

A more detailed architecture diagram can be found in the docs/1. Analysis/knowledge_graph_architecture_diagram.mmd file.

The graph traversal workflow flowchart can be found in the docs/1. Analysis/graph_traversal_workflow_diagram.mmd file.
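
To make the retrieval step concrete, here is a hedged sketch of the kind of query the knowledge graph retriever can run: a vector search over the ticket index followed by a hop to the comments attached to each match. The index name and property names are assumptions; the actual logic lives in knowledge_graph_retriever.py:

import requests
from neo4j import GraphDatabase

def embed_text(text: str) -> list[float]:
    # Same illustrative embedding call as in the data setup section above.
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embedding"]

SIMILAR_TICKETS = """
CALL db.index.vector.queryNodes($index_name, $top_k, $query_embedding)
YIELD node AS ticket, score
WHERE score >= $min_similarity_score
MATCH (ticket)-[:CONTAINS]->(comment:Comment)
RETURN ticket.title AS title, score,
       collect(comment.text)[..$context_comments] AS comments
ORDER BY score DESC
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your-password"))
with driver.session() as session:
    results = session.run(
        SIMILAR_TICKETS,
        index_name="ticket_embeddings",  # assumed index name
        top_k=5,
        min_similarity_score=0.8,
        query_embedding=embed_text("500 error when opening Aura"),
        context_comments=3,
    )
    for record in results:
        print(record["title"], round(record["score"], 3))
driver.close()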

🎯 Usage

Running the Application

# Using Poetry (recommended)
poetry run streamlit run app.py

# Or activate the virtual environment first
poetry shell
streamlit run app.py

# Or run directly
poetry run python app.py

Using the Chat Interface

  1. Access the application at http://localhost:8501
  2. Select a model from the sidebar dropdown
  3. Configure parameters (optional):
    • System prompt
    • Temperature (creativity level)
    • Number of similar tickets to retrieve
    • Similarity threshold
    • Context comments to include
  4. Start chatting with questions about technical issues
  5. View sources when the AI finds similar tickets

Example Interactions

User: "I get a 500 error when I try to open Aura. What should I do?"

AI: [Searches for similar tickets and finds relevant solutions] "Based on similar tickets, this is typically caused by authentication service issues. Here's how to resolve it:

  1. Check if the authentication service is running
  2. Restart the auth service if needed
  3. Clear browser cache and cookies
  4. Try logging in again"

[Shows source tickets with similarity scores]

πŸ“ Project Structure

3ds_Assignement_Data_Scientist_Customer_Success/
β”œβ”€β”€ app.py                     # Application entry point
β”œβ”€β”€ chatbot.py                  # Streamlit chat interface
β”œβ”€β”€ llm_handler.py             # LLM operations and tool integration
β”œβ”€β”€ knowledge_graph_retriever.py # Neo4j data retriever handler with similarity search
β”œβ”€β”€ compute_embedding.py       # Text embedding generation handler
β”œβ”€β”€ config.py                  # Configuration settings
β”œβ”€β”€ pyproject.toml             # Poetry dependencies and project config
β”œβ”€β”€ poetry.lock               # Poetry lock file for reproducible builds
β”œβ”€β”€ datasets/
β”‚   β”œβ”€β”€ data/
β”‚   β”‚   β”œβ”€β”€ tickets_sample.csv
β”‚   β”‚   β”œβ”€β”€ tickets_sample_with_embeddings.csv
β”‚   β”‚   β”œβ”€β”€ users_sample.csv
β”‚   β”‚   β”œβ”€β”€ comments_sample.csv
β”‚   β”‚   β”œβ”€β”€ status_sample.csv
β”‚   β”‚   β”œβ”€β”€ priority_sample.csv
β”‚   β”‚   β”œβ”€β”€ category_sample.csv
β”‚   β”‚   └── documentation_sample.csv
β”‚   β”œβ”€β”€ setup_database.cypher   # Neo4j database schema and data setup
β”‚   └── upload_embeddings.cypher # Vector embeddings and index setup
β”œβ”€β”€ llm_history/              # Conversation storage
β”‚   └── *.json               # Individual conversation files
└── README.md                 # This file

βš™οΈ Configuration

Model Configuration

  • Chat Models: Any Ollama-compatible model (llama2, mistral, codellama, etc.)
  • Embedding Models: Models that produce 768-dimensional vectors (e.g., nomic-embed-text, as pulled above, or comparable sentence-transformers models)
  • Vector Dimensions: Must match your embedding model (project configured for 768 dimensions)
  • Neo4j Requirements: APOC procedures must be installed for embedding handling

Performance Tuning

  • top_k: Number of similar tickets to retrieve (1-20)
  • context_comments: Comments to include for context (0-10)
  • min_similarity_score: Similarity threshold (0.7-0.95)
  • temperature: Response creativity (0.0-1.0)
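
As an example, a reasonable starting point within these ranges might look like the following (illustrative values, not the project's defaults):

# Illustrative starting values within the ranges above -- tune for your data.
retrieval_settings = {
    "top_k": 5,                   # similar tickets to retrieve
    "context_comments": 3,        # comments pulled in for extra context
    "min_similarity_score": 0.8,  # cosine similarity cut-off
    "temperature": 0.2,           # keep answers factual rather than creative
}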

πŸ”§ Advanced Features

Tool Usage

The system intelligently decides when to search for similar tickets:

  • Searches when: Technical issues, error reports, how-to questions
  • Doesn't search for: Greetings, general questions, casual conversation
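
A common way to implement this behaviour is to expose the retriever as a tool and let the model decide whether to call it. Below is a sketch using the OpenAI-compatible tool-calling interface; the tool name and parameter schema are assumptions, and the actual wiring lives in llm_handler.py:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Assumed tool definition: the model only calls it when a ticket search seems useful.
tools = [{
    "type": "function",
    "function": {
        "name": "search_similar_tickets",
        "description": "Find historical support tickets similar to the user's issue.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The user's problem description"},
                "top_k": {"type": "integer", "description": "Number of tickets to return"},
            },
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "I get a 500 error when I try to open Aura."}],
    tools=tools,
)

# A greeting usually gets a direct answer; an error report usually comes back
# as a tool call that the handler resolves against Neo4j before replying.
print(response.choices[0].message.tool_calls)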
