The LLM-Based Ticket Reply Evaluation is a Streamlit-based web application that evaluates AI-generated customer support responses. It uses OpenAI's GPT-4o model to assess replies based on content relevance, correctness, completeness, clarity, structure, and grammar.
Users can upload a CSV file containing customer tickets and AI responses, and the app will generate evaluations in a chat-like format.
✅ Upload & Process CSV Data – Supports ticket-response datasets.
✅ AI-Powered Evaluation – Uses GPT-4o for scoring replies.
✅ Dynamic Data Editing – Modify CSV content before processing.
✅ Interactive Chat Display – Shows conversations with AI feedback.
✅ Download Evaluations – Export results as a CSV file.
✅ Customizable Prompt – Uses a YAML file for prompt customization.
```bash
git clone https://github.com/flakula/rauda.ai.interview.git
cd rauda.ai.interview
python -m venv .venv
.\.venv\Scripts\activate        # Windows; on macOS/Linux use: source .venv/bin/activate
python -m pip install -r requirements.txt
```
Create a `.env` file in the project directory with:

```
OPENAI_API_KEY=your-api-key-here
```
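For reference, here is a minimal sketch of how the key can be read from `.env` at runtime, assuming `python-dotenv` is available (the app's actual loading code may differ):

```python
# Minimal sketch: load OPENAI_API_KEY from .env (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set - check your .env file.")
```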
Modify the `prompt.yml` file to adjust the AI evaluation criteria.
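As a rough illustration of how the prompt could be loaded (the key name below is an assumption, not the project's actual schema):

```python
# Hedged sketch: read the evaluation prompt from prompt.yml with PyYAML.
import yaml

with open("prompt.yml", "r", encoding="utf-8") as f:
    prompt_config = yaml.safe_load(f)

# "prompt" is a hypothetical key; check prompt.yml for the real structure.
system_prompt = prompt_config["prompt"]
```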
```bash
streamlit run app.py
```
This will launch the web app in your browser.
**Upload Customer Tickets & AI Replies**

- Navigate to the "Ticket Reply Evaluation" application.
- Upload a CSV file containing customer support tickets and AI-generated replies.
- The app expects at least two columns (a minimal validation sketch follows this list):
  - `ticket` – the customer's issue/question.
  - `reply` – the AI's response to the ticket.
- Example CSV format:

  ```csv
  ticket,reply
  "My order hasn't arrived.","Your order is in transit and should arrive within 3 days."
  "I need to cancel my subscription.","Sure! You can cancel anytime via your account settings."
  ```

- You can modify the dataframe in the corresponding widget before processing.
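A minimal sketch of the kind of column check described above, assuming pandas (the function name is illustrative, not the app's actual code):

```python
# Hedged sketch: load and validate an uploaded CSV before evaluation.
import pandas as pd

REQUIRED_COLUMNS = {"ticket", "reply"}

def load_tickets(path_or_buffer) -> pd.DataFrame:
    df = pd.read_csv(path_or_buffer)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(sorted(missing))}")
    return df
```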
**AI Evaluation Process**

Once the file is uploaded:

- The app calls GPT-4o to analyze each response (a rough sketch of such a call follows this step).
- Evaluations are based on:
  - Content Score (1-5): measures relevance, correctness, and completeness.
  - Format Score (1-5): evaluates clarity, grammar, and structure.
- Each evaluation follows this format:

  ```json
  {
    "content_score": 4,
    "content_explanation": "The response is somewhat relevant but lacks details.",
    "format_score": 5,
    "format_explanation": "The response is well-structured and grammatically correct."
  }
  ```
**Chat-Style Display**

After processing, the results appear in an interactive chat-like format (a rendering sketch follows the list):

- User Message: displays the original ticket.
- AI Reply: shows the AI-generated response.
- Evaluation Summary: lists content and format scores with explanations.
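A rough sketch of rendering one evaluated row with Streamlit chat elements; the widget choices here are illustrative, not the app's actual layout:

```python
# Hedged sketch: show one ticket/reply pair plus its scores in a chat-like layout.
import streamlit as st

def render_row(row) -> None:
    with st.chat_message("user"):
        st.write(row["ticket"])
    with st.chat_message("assistant"):
        st.write(row["reply"])
        st.caption(
            f"Content: {row['content_score']}/5 - {row['content_explanation']}  \n"
            f"Format: {row['format_score']}/5 - {row['format_explanation']}"
        )
```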
**Export Evaluations**

- Once responses are evaluated, users can download the results as a CSV file for further analysis.
- Click "Download Evaluated Tickets" to save the processed dataset (a widget sketch follows the example below).
- The exported file retains the original tickets, replies, and AI scores.

Example output (`tickets_evaluated.csv`):

```csv
ticket,reply,content_score,content_explanation,format_score,format_explanation
"My order hasn't arrived.","Your order is in transit and should arrive within 3 days.",4,"Lacks estimated delivery details.",5,"Clear and well-structured."
"I need to cancel my subscription.","Sure! You can cancel anytime via your account settings.",5,"Completely relevant response.",5,"Perfectly formatted."
```
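A minimal sketch of the export widget, assuming `st.download_button`; the label and filename mirror the text above, but the app's real code may differ:

```python
# Hedged sketch: offer the evaluated dataframe as a CSV download.
import pandas as pd
import streamlit as st

def offer_download(evaluated_df: pd.DataFrame) -> None:
    st.download_button(
        label="Download Evaluated Tickets",
        data=evaluated_df.to_csv(index=False).encode("utf-8"),
        file_name="tickets_evaluated.csv",
        mime="text/csv",
    )
```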
```
📂 rauda.ai.interview
├── 📂 .venv                         # Virtual environment (dependencies)
├── 📂 pages                         # Streamlit multi-page application
│   ├── 1_📝_Readme.py               # Page for README visualization
│   └── 2_✏️_Edit_Prompt.py           # Page for editing the evaluation prompt
├── 📄 .gitignore                    # Git ignore file
├── 📄 .env                          # Environment variables (API keys)
├── 📄 AI Engineer Assignment.pdf    # Assignment details
├── 📄 prompt.yml                    # YAML file containing the evaluation prompt
├── 📄 README.md                     # Project documentation
├── 📄 requirements.txt              # Python dependencies
├── 📄 tickets.csv                   # Original customer ticket dataset
├── 📄 tickets_evaluated.csv         # AI-evaluated ticket dataset
├── 📄 🔎_Ticket_Reply_Evaluation     # Main file for processing ticket responses
└── 📂 tests                         # Unit test suite
    ├── 📄 test_prompt.py            # Tests for prompt loading
    ├── 📄 test_processing.py        # Tests for CSV handling
    ├── 📄 test_ai_response.py       # Tests for AI response parsing
    └── 📄 test_download.py          # Tests for CSV downloads
```
To ensure application functionality, a test suite has been added using pytest. It covers:
✅ Prompt Validation: Ensures prompt.yml loads correctly.
✅ Data Processing Tests: Validates CSV structure and missing values.
✅ AI Response Handling: Checks OpenAI output parsing.
✅ Download Functionality: Ensures the evaluated CSV can be downloaded.
Run the test suite with:

```bash
pytest tests/
```
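For illustration, a test in the spirit of `tests/test_prompt.py` might look like this (the assertion is an assumption, not the repository's actual test):

```python
# Hedged sketch: verify that prompt.yml parses and is non-empty.
import yaml

def test_prompt_yaml_loads():
    with open("prompt.yml", "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)
    assert config, "prompt.yml should not be empty"
```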
📌 Support for multiple AI models
📌 Advanced analytics & visualizations
📌 More customizable evaluation criteria
📌 Multilingual support for diverse customer tickets