The LLM-Based Ticket Reply Evaluation is a Streamlit-based web application that evaluates AI-generated customer support responses. It uses OpenAI's GPT-4o model to assess replies based on content relevance, correctness, completeness, clarity, structure, and grammar.
Users can upload a CSV file containing customer tickets and AI responses, and the app will generate evaluations in a chat-like format.
✅ Upload & Process CSV Data – Supports ticket-response datasets.
✅ AI-Powered Evaluation – Uses GPT-4o for scoring replies.
✅ Dynamic Data Editing – Modify CSV content before processing.
✅ Interactive Chat Display – Shows conversations with AI feedback.
✅ Download Evaluations – Export results as a CSV file.
✅ Customizable Prompt – Uses a YAML file for prompt customization.
```bash
git clone https://github.com/flakula/rauda.ai.interview.git
cd rauda.ai.interview
python -m venv .venv
.\.venv\Scripts\activate        # Windows; on macOS/Linux use: source .venv/bin/activate
python -m pip install -r requirements.txt
```
Create a `.env` file in the project directory with:

```
OPENAI_API_KEY=your-api-key-here
```
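For reference, here is a minimal sketch of how the key can be read from `.env` at runtime, assuming `python-dotenv` is available (the app's actual loading code may differ):

```python
# Minimal sketch: load OPENAI_API_KEY from .env (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set - check your .env file.")
```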
Modify the `prompt.yml` file to adjust the AI evaluation criteria.
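As a rough illustration of how the prompt could be loaded (the key name below is an assumption, not the project's actual schema):

```python
# Hedged sketch: read the evaluation prompt from prompt.yml with PyYAML.
import yaml

with open("prompt.yml", "r", encoding="utf-8") as f:
    prompt_config = yaml.safe_load(f)

# "prompt" is a hypothetical key; check prompt.yml for the real structure.
system_prompt = prompt_config["prompt"]
```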
```bash
streamlit run app.py
```
This will launch the web app in your browser.
**Upload Customer Tickets & AI Replies**

- Navigate to the "Ticket Reply Evaluation" application.
- Upload a CSV file containing customer support tickets and AI-generated replies.
- The app expects at least two columns (a minimal validation sketch follows this list):
  - `ticket` – the customer's issue/question.
  - `reply` – the AI's response to the ticket.
- Example CSV format:

  ```csv
  ticket,reply
  "My order hasn't arrived.","Your order is in transit and should arrive within 3 days."
  "I need to cancel my subscription.","Sure! You can cancel anytime via your account settings."
  ```

- You can modify the dataframe in the corresponding widget before processing.
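A minimal sketch of the kind of column check described above, assuming pandas (the function name is illustrative, not the app's actual code):

```python
# Hedged sketch: load and validate an uploaded CSV before evaluation.
import pandas as pd

REQUIRED_COLUMNS = {"ticket", "reply"}

def load_tickets(path_or_buffer) -> pd.DataFrame:
    df = pd.read_csv(path_or_buffer)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(sorted(missing))}")
    return df
```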
**AI Evaluation Process**

Once the file is uploaded:

- The app calls GPT-4o to analyze each response (a rough sketch of such a call follows this step).
- Evaluations are based on:
  - Content Score (1-5): measures relevance, correctness, and completeness.
  - Format Score (1-5): evaluates clarity, grammar, and structure.
- Each evaluation follows this format:

  ```json
  {
    "content_score": 4,
    "content_explanation": "The response is somewhat relevant but lacks details.",
    "format_score": 5,
    "format_explanation": "The response is well-structured and grammatically correct."
  }
  ```
**Chat-Style Display**

After processing, the results appear in an interactive chat-like format (a rendering sketch follows the list):

- User Message: displays the original ticket.
- AI Reply: shows the AI-generated response.
- Evaluation Summary: lists content and format scores with explanations.
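A rough sketch of rendering one evaluated row with Streamlit chat elements; the widget choices here are illustrative, not the app's actual layout:

```python
# Hedged sketch: show one ticket/reply pair plus its scores in a chat-like layout.
import streamlit as st

def render_row(row) -> None:
    with st.chat_message("user"):
        st.write(row["ticket"])
    with st.chat_message("assistant"):
        st.write(row["reply"])
        st.caption(
            f"Content: {row['content_score']}/5 - {row['content_explanation']}  \n"
            f"Format: {row['format_score']}/5 - {row['format_explanation']}"
        )
```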
**Export Evaluations**

- Once responses are evaluated, users can download the results as a CSV file for further analysis.
- Click "Download Evaluated Tickets" to save the processed dataset (a widget sketch follows the example below).
- The exported file retains the original tickets, replies, and AI scores.

Example output (`tickets_evaluated.csv`):

```csv
ticket,reply,content_score,content_explanation,format_score,format_explanation
"My order hasn't arrived.","Your order is in transit and should arrive within 3 days.",4,"Lacks estimated delivery details.",5,"Clear and well-structured."
"I need to cancel my subscription.","Sure! You can cancel anytime via your account settings.",5,"Completely relevant response.",5,"Perfectly formatted."
```
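A minimal sketch of the export widget, assuming `st.download_button`; the label and filename mirror the text above, but the app's real code may differ:

```python
# Hedged sketch: offer the evaluated dataframe as a CSV download.
import pandas as pd
import streamlit as st

def offer_download(evaluated_df: pd.DataFrame) -> None:
    st.download_button(
        label="Download Evaluated Tickets",
        data=evaluated_df.to_csv(index=False).encode("utf-8"),
        file_name="tickets_evaluated.csv",
        mime="text/csv",
    )
```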
```
📂 rauda.ai.interview
├── 📂 .venv                         # Virtual environment (dependencies)
├── 📂 pages                         # Streamlit multi-page application
│   ├── 1_📝_Readme.py               # Page for README visualization
│   └── 2_✏️_Edit_Prompt.py           # Page for editing the evaluation prompt
├── 📄 .gitignore                    # Git ignore file
├── 📄 .env                          # Environment variables (API keys)
├── 📄 AI Engineer Assignment.pdf    # Assignment details
├── 📄 prompt.yml                    # YAML file containing the evaluation prompt
├── 📄 README.md                     # Project documentation
├── 📄 requirements.txt              # Python dependencies
├── 📄 tickets.csv                   # Original customer ticket dataset
├── 📄 tickets_evaluated.csv         # AI-evaluated ticket dataset
├── 📄 🔎_Ticket_Reply_Evaluation     # Main file for processing ticket responses
└── 📂 tests                         # Unit test suite
    ├── 📄 test_prompt.py            # Tests for prompt loading
    ├── 📄 test_processing.py        # Tests for CSV handling
    ├── 📄 test_ai_response.py       # Tests for AI response parsing
    └── 📄 test_download.py          # Tests for CSV downloads
```
To ensure application functionality, a test suite has been added using pytest. It covers:
✅ Prompt Validation: Ensures prompt.yml loads correctly.
✅ Data Processing Tests: Validates CSV structure and missing values.
✅ AI Response Handling: Checks OpenAI output parsing.
✅ Download Functionality: Ensures the evaluated CSV can be downloaded.
Run the test suite with:

```bash
pytest tests/
```
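For illustration, a test in the spirit of `tests/test_prompt.py` might look like this (the assertion is an assumption, not the repository's actual test):

```python
# Hedged sketch: verify that prompt.yml parses and is non-empty.
import yaml

def test_prompt_yaml_loads():
    with open("prompt.yml", "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)
    assert config, "prompt.yml should not be empty"
```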
📌 Support for multiple AI models
📌 Advanced analytics & visualizations
📌 More customizable evaluation criteria
📌 Multilingual support for diverse customer tickets