3DS Assignement - Data Scientist - Customer Success: AuraHelpeskGraph 🤖

A support chatbot that leverages local LLMs, vector similarity search, and knowledge graphs to provide contextual assistance by finding and presenting solutions from historical support tickets.

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Streamlit     │    │   LLM Handler    │    │   Ollama API    │
│   Chat UI       │◄──►│   (OpenAI        │◄──►│   (Local LLM)   │
│                 │    │   Compatible)    │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │
         │                       ▼
         │              ┌──────────────────┐    ┌─────────────────┐
         │              │  Knowledge Graph │◄──►│     Neo4j       │
         │              │    Retriever     │    │   Database      │
         │              └──────────────────┘    └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌──────────────────┐
│  Conversation   │    │   Embedding      │
│   History       │    │   Computation    │
│   (JSON Files)  │    │                  │
└─────────────────┘    └──────────────────┘

A more detailed workflow diagram is available in docs/1. Analysis/main_workflow_diagram.mmd.
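To make the "LLM Handler ◄──► Ollama API" link in the diagram more concrete, here is a minimal sketch of a request to Ollama's OpenAI-compatible chat endpoint. It is not taken from llm_handler.py: the function name, the default model, and the default temperature are assumptions for illustration, and the base URL assumes a default local Ollama install.

```python
import json
from urllib.request import Request


def build_chat_request(
    messages: list[dict],
    model: str = "llama3.2",
    temperature: float = 0.2,
    base_url: str = "http://localhost:11434/v1",  # assumed default Ollama address
) -> Request:
    """Build (but do not send) a chat completion request for Ollama."""
    body = {"model": model, "messages": messages, "temperature": temperature}
    return Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it once the Ollama server is running.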

📋 Prerequisites

System Requirements

Python 3.13.2
Neo4j Database with APOC plugin
Ollama

Required Services

Ollama Server

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull llama3.2  # or your preferred chat model
ollama pull nomic-embed-text  # for embeddings
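Before launching the app, it can help to confirm that both models are present. The sketch below assumes the default Ollama address (http://localhost:11434) and uses its /api/tags endpoint, which lists locally pulled models. The helper names are illustrative and not part of this repository.

```python
import json
from urllib.request import urlopen

REQUIRED_MODELS = ("llama3.2", "nomic-embed-text")


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that do not appear in an /api/tags payload."""
    # Ollama reports names with a tag suffix, e.g. "llama3.2:latest".
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With the server running, missing_models(fetch_tags()) returns an empty list when both models have been pulled.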

Neo4j Database

Install Neo4j: https://neo4j.com/deployment-center/

Important: Make sure to install the APOC plugin for Neo4j as it's required for embedding operations.

🚀 Installation

Setup

Clone the repository

git clone "https://github.com/AlexisBalayre/3ds_Assignement_Data_Scientist_Customer_Success"
cd 3ds_Assignement_Data_Scientist_Customer_Success

Install Python dependencies using Poetry

# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -

# Install project dependencies
poetry install

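# Activate the virtual environment
# (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# `poetry env activate` prints the activation command instead)
poetry shell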

Create an environment file
```
cp exemple.env .env
```
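The contents of exemple.env are not reproduced here, so the following is only a sketch of how such settings could be read once they are in the process environment (for example through a loader such as python-dotenv, which is an assumption about this project). Every variable name and default below is hypothetical; check exemple.env and config.py for the real ones.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    ollama_base_url: str


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables (names are assumptions)."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", ""),
        # Ollama exposes an OpenAI-compatible API under /v1.
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    )
```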

💾 Data Setup

Neo4j Database Schema Setup

The project includes Cypher scripts to set up the complete database schema and load sample data:

Database Structure Setup (datasets/setup_database.cypher)
- Creates unique constraints for all entities
- Loads sample data from CSV files (users, tickets, comments, etc.)
- Establishes relationships between entities
- Cleans up temporary foreign key properties

Embeddings Setup (datasets/upload_embeddings.cypher)
- Loads pre-computed embeddings for tickets
- Creates vector index for similarity search
- Configures cosine similarity with proper dimensions
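For readers unfamiliar with the metric, cosine similarity compares the direction of two embedding vectors and ignores their length. The plain-Python sketch below is purely illustrative: in this project the comparison is done by the Neo4j vector index, not by application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

The dimension check mirrors the requirement that the index dimension matches the embedding model.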

Setting Up Your Database

Prepare your CSV files in the Neo4j import directory:

neo4j/import/
├── users_sample.csv
├── status_sample.csv
├── priority_sample.csv
├── category_sample.csv
├── tickets_sample.csv
├── comments_sample.csv
├── documentation_sample.csv
└── tickets_sample_with_embeddings.csv

Note: The Cypher scripts use http://localhost:11001/project-<id>/ URLs for CSV import. Update these URLs in the scripts to match your Neo4j import directory or file server setup.

Run the database setup script:

// In Neo4j Browser or cypher-shell
// Copy and paste the contents of datasets/setup_database.cypher

Generate embeddings for your tickets:

# Process your tickets CSV to add embeddings
poetry run python compute_embedding.py

Upload embeddings to Neo4j:

// In Neo4j Browser
// Copy and paste the contents of datasets/upload_embeddings.cypher
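The embedding step above can be pictured roughly as follows. This is a sketch, not the contents of compute_embedding.py: the column names (title, description, embedding) and both function names are assumptions. The HTTP helper targets Ollama's /api/embeddings endpoint on its default port.

```python
import json
from urllib.request import Request, urlopen


def ollama_embed(text: str, model: str = "nomic-embed-text",
                 base_url: str = "http://localhost:11434") -> list[float]:
    """Request one embedding vector from a running Ollama server."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = Request(f"{base_url}/api/embeddings", data=body,
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=60) as response:
        return json.load(response)["embedding"]


def add_embeddings(rows, embed=ollama_embed, text_fields=("title", "description")):
    """Yield each ticket row with an added JSON-encoded 'embedding' column."""
    for row in rows:
        # Concatenate the (hypothetical) text columns into one string to embed.
        text = "\n".join(row.get(field, "") for field in text_fields).strip()
        yield {**row, "embedding": json.dumps(embed(text))}
```

Rows read with csv.DictReader can be passed straight to add_embeddings and written back with csv.DictWriter. Injecting embed keeps the function testable without a running server.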

Data Schema

The database follows this entity relationship structure:

User ──SUBMITS──► Ticket ──HAS_STATUS──► Status
 │                   │
 └──POSTS──► Comment │
             │       ├──HAS_PRIORITY──► Priority
             │       │
         CONTAINS◄── │ ──HAS_CATEGORY──► Category
                     │                      │
                     └──CONTAINS──► Comment │
                                    │       │
            DocumentationArticle ◄──┴───────┘
                     │
                REFERENCED_BY◄──Comment

A more detailed architecture diagram is available in docs/1. Analysis/knowledge_graph_architecture_diagram.mmd.

The graph traversal workflow is charted in docs/1. Analysis/graph_traversal_workflow_diagram.mmd.
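As a rough illustration of how a vector search could be combined with a traversal over the relationships above, the sketch below builds a Cypher query around Neo4j's db.index.vector.queryNodes procedure. The index name ticket_embeddings, the function name, and the returned columns are assumptions; the actual query in knowledge_graph_retriever.py may differ.

```python
def build_similar_tickets_query(
    embedding: list[float],
    top_k: int = 5,
    min_score: float = 0.7,
    index_name: str = "ticket_embeddings",  # hypothetical index name
) -> tuple[str, dict]:
    """Return a Cypher query and its parameters for a similar-ticket lookup."""
    query = (
        "CALL db.index.vector.queryNodes($index_name, $top_k, $embedding) "
        "YIELD node AS ticket, score "
        "WHERE score >= $min_score "
        # Enrich each hit with neighbouring nodes from the schema above.
        "OPTIONAL MATCH (ticket)-[:HAS_STATUS]->(status:Status) "
        "OPTIONAL MATCH (ticket)-[:HAS_CATEGORY]->(category:Category) "
        "RETURN ticket, score, status, category "
        "ORDER BY score DESC"
    )
    params = {
        "index_name": index_name,
        "top_k": top_k,
        "embedding": embedding,
        "min_score": min_score,
    }
    return query, params
```

With the official neo4j Python driver, the pair can be passed to session.run(query, params).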

🎯 Usage

Running the Application

# Using Poetry (recommended)
poetry run streamlit run app.py

# Or activate the virtual environment first
poetry shell
streamlit run app.py

# Or run directly
poetry run python app.py

Using the Chat Interface

1. Open the application at http://localhost:8501.
2. Select a model from the sidebar dropdown.
3. Configure parameters (optional):
   - System prompt
   - Temperature (creativity level)
   - Number of similar tickets to retrieve
   - Similarity threshold
   - Context comments to include
4. Start chatting with questions about technical issues.
5. View sources when the AI finds similar tickets.

Example Interactions

User: "I get a 500 error when I try to open Aura. What should I do?"

AI: [Searches for similar tickets and finds relevant solutions] "Based on similar tickets, this is typically caused by authentication service issues. Here's how to resolve it:

Check if the authentication service is running
Restart the auth service if needed
Clear browser cache and cookies
Try logging in again"

[Shows source tickets with similarity scores]
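The "source tickets with similarity scores" shown after an answer could be produced along these lines. This is a sketch under assumptions: the ticket dictionary keys (id, title, score), the function name, and the default values are hypothetical and not taken from chatbot.py.

```python
def format_sources(tickets: list[dict], min_score: float = 0.7, top_k: int = 5) -> list[str]:
    """Keep the best-scoring tickets above the threshold and render one line each."""
    kept = sorted(
        (t for t in tickets if t["score"] >= min_score),
        key=lambda t: t["score"],
        reverse=True,
    )[:top_k]
    return [f"Ticket {t['id']} (similarity {t['score']:.2f}): {t['title']}" for t in kept]
```

Filtering before rendering keeps weak matches out of both the prompt context and the displayed sources.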

📁 Project Structure

3ds_Assignement_Data_Scientist_Customer_Success/
├── app.py                     # Application entry point
├── chatbot.py                  # Streamlit chat interface
├── llm_handler.py             # LLM operations and tool integration
├── knowledge_graph_retriever.py # Neo4j data retriever handler with similarity search
├── compute_embedding.py       # Text embedding generation handler
├── config.py                  # Configuration settings
├── pyproject.toml             # Poetry dependencies and project config
├── poetry.lock               # Poetry lock file for reproducible builds
├── datasets/
│   ├── data/
│   │   ├── tickets_sample.csv
│   │   ├── tickets_sample_with_embeddings.csv
│   │   ├── users_sample.csv
│   │   ├── comments_sample.csv
│   │   ├── status_sample.csv
│   │   ├── priority_sample.csv
│   │   ├── category_sample.csv
│   │   └── documentation_sample.csv
│   ├── setup_database.cypher   # Neo4j database schema and data setup
│   └── upload_embeddings.cypher # Vector embeddings and index setup
├── llm_history/              # Conversation storage
│   └── *.json               # Individual conversation files
└── README.md                 # This file
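The llm_history/ directory stores one JSON file per conversation. A minimal sketch of that pattern is shown below. The function names and the message layout (a list of role/content dictionaries) are assumptions, not the project's actual storage format.

```python
import json
from pathlib import Path


def save_conversation(history_dir, conversation_id: str, messages: list[dict]) -> Path:
    """Write one conversation to <history_dir>/<conversation_id>.json."""
    directory = Path(history_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{conversation_id}.json"
    path.write_text(json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_conversation(history_dir, conversation_id: str) -> list[dict]:
    """Read a conversation back, or return an empty list if it does not exist."""
    path = Path(history_dir) / f"{conversation_id}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```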

⚙️ Configuration

Model Configuration

Chat Models: Any Ollama-compatible model (llama2, mistral, codellama, etc.)
Embedding Models: Models that produce 768-dimensional vectors (e.g., nomic-embed-text via Ollama, or a 768-dimensional sentence-transformers model)
Vector Dimensions: Must match your embedding model (project configured for 768 dimensions)
Neo4j Requirements: APOC procedures must be installed for embedding handling

Performance Tuning

top_k: Number of similar tickets to retrieve (1-20)
context_comments: Comments to include for context (0-10)
min_similarity_score: Similarity threshold (0.7-0.95)
temperature: Response creativity (0.0-1.0)
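The ranges above can be enforced in code before a request is sent. The sketch below is illustrative only: the class name and the default values are assumptions, while the bounds are the ones listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalSettings:
    top_k: int = 5                      # hypothetical default
    context_comments: int = 3           # hypothetical default
    min_similarity_score: float = 0.8   # hypothetical default
    temperature: float = 0.2            # hypothetical default

    def __post_init__(self):
        # (value, lower bound, upper bound) for each tunable parameter.
        checks = {
            "top_k": (self.top_k, 1, 20),
            "context_comments": (self.context_comments, 0, 10),
            "min_similarity_score": (self.min_similarity_score, 0.7, 0.95),
            "temperature": (self.temperature, 0.0, 1.0),
        }
        for name, (value, low, high) in checks.items():
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

Constructing RetrievalSettings(top_k=0) raises ValueError, so an out-of-range value fails early instead of reaching the retriever.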

🔧 Advanced Features

Tool Usage

The LLM decides, through tool (function) calling, when to search for similar tickets:

Searches when: Technical issues, error reports, how-to questions
Doesn't search for: Greetings, general questions, casual conversation
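In OpenAI-compatible APIs, this kind of decision is usually delegated to the model by declaring a tool it may call. The sketch below shows what such a declaration and its dispatch could look like. The tool name search_similar_tickets, its single query parameter, and the dispatch helper are assumptions rather than the definitions used in llm_handler.py.

```python
import json

# Tool declaration in the OpenAI chat-completions "tools" format.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_similar_tickets",  # hypothetical tool name
        "description": (
            "Search historical support tickets similar to the user's technical "
            "issue. Do not call for greetings or casual conversation."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The issue to search for."}
            },
            "required": ["query"],
        },
    },
}


def dispatch_tool_call(name: str, arguments_json: str, handlers: dict):
    """Run the handler matching a tool call emitted by the model."""
    if name not in handlers:
        raise KeyError(f"unknown tool: {name}")
    # The model returns tool arguments as a JSON-encoded string.
    return handlers[name](**json.loads(arguments_json))
```

When the model answers a greeting, it returns plain text and no tool call, so nothing is dispatched.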

📚 Resources & References

This project was built using knowledge and inspiration from:

Knowledge Graphs + LLMs: Multi-Hop Question Answering - Neo4j blog on integrating knowledge graphs with LLMs
LLM Fundamentals: Vectors & Semantic Search - GraphAcademy course on vector indexes and semantic search with Neo4j
Structured Outputs With LLM - OpenAI guide on structured outputs for LLMs
Function Calling - OpenAI documentation on function calling with LLMs
