LLM-Based Ticket Reply Evaluation

Overview

The LLM-Based Ticket Reply Evaluation is a Streamlit web application that evaluates AI-generated customer support responses. It uses OpenAI's GPT-4o model to assess each reply on content (relevance, correctness, completeness) and on format (clarity, structure, grammar).

Users can upload a CSV file containing customer tickets and AI responses, and the app will generate evaluations in a chat-like format.

Features

Upload & Process CSV Data – Supports ticket-response datasets.
AI-Powered Evaluation – Uses GPT-4o for scoring replies.
Dynamic Data Editing – Modify CSV content before processing.
Interactive Chat Display – Shows conversations with AI feedback.
Download Evaluations – Export results as a CSV file.
Customizable Prompt – Evaluation criteria are defined in a YAML file (prompt.yml).

Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/flakula/rauda.ai.interview.git
cd rauda.ai.interview

2️⃣ Set Up a Virtual Environment (Optional)

python -m venv .venv
.\.venv\Scripts\activate        # Windows
source .venv/bin/activate       # macOS / Linux

3️⃣ Install Dependencies

python -m pip install -r requirements.txt

4️⃣ Set Up Environment Variables

Create a .env file in the project directory with:

OPENAI_API_KEY=your-api-key-here
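
The app presumably reads this key at startup, for example via python-dotenv and the OpenAI client. A minimal sketch, assuming both packages are listed in requirements.txt (the app's actual loading code may differ):

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # pulls OPENAI_API_KEY from .env into the environment
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))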

5️⃣ Prepare the Prompt YAML File

Modify the prompt.yml file to adjust AI evaluation criteria:
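
For illustration, prompt.yml might look roughly like this (the key name and wording below are assumptions, not the repository's actual prompt; adapt them to the real file):

system_prompt: |
  You are a quality reviewer for customer support replies.
  Score each reply on content (relevance, correctness, completeness)
  and on format (clarity, structure, grammar), each from 1 to 5.
  Respond with JSON containing content_score, content_explanation,
  format_score and format_explanation.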

6️⃣ Run the Application

streamlit run app.py

This will launch the web app in your browser.

Usage

  1. Upload Customer Tickets & AI Replies

    • Navigate to the "Ticket Reply Evaluation" application.

    • Upload a CSV file containing customer support tickets and AI-generated replies.

    • The app expects at least two columns:

      • ticket – The customer's issue/question.
      • reply – The AI's response to the ticket.

      Example CSV format:

      ticket,reply
      "My order hasn't arrived.","Your order is in transit and should arrive within 3 days."
      "I need to cancel my subscription.","Sure! You can cancel anytime via your account settings."
      
    • You can edit the uploaded data in the interactive data editor widget before evaluation.

  2. AI Evaluation Process

    Once the file is uploaded (an illustrative end-to-end sketch of this flow follows the list below):

    • The app calls GPT-4o to analyze each response.

    • Evaluations are based on:

      • Content Score (1-5): Measures relevance, correctness, and completeness.
      • Format Score (1-5): Evaluates clarity, grammar, and structure.

      Each evaluation follows this format:

      {
        "content_score": 4,
        "content_explanation": "The response is somewhat relevant but lacks details.",
        "format_score": 5,
        "format_explanation": "The response is well-structured and grammatically correct."
      }
  3. Chat-Style Display

    After processing, the results appear in an interactive chat-like format:

    • User Message: Displays the original ticket.
    • AI Reply: Shows the AI-generated response.
    • Evaluation Summary: Lists content and format scores with explanations.
  4. Export Evaluations

    • Once responses are evaluated, users can download results as a CSV file for further analysis.
    • Click "Download Evaluated Tickets" to save the processed dataset.
    • The exported file retains original tickets, replies, and AI scores.

    Example output (tickets_evaluated.csv):

    ticket,reply,content_score,content_explanation,format_score,format_explanation
    "My order hasn't arrived.","Your order is in transit and should arrive within 3 days.",4,"Lacks estimated delivery details.",5,"Clear and well-structured."
    "I need to cancel my subscription.","Sure! You can cancel anytime via your account settings.",5,"Completely relevant response.",5,"Perfectly formatted."
    
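The snippet below sketches this end-to-end flow for illustration only; it is not the app's actual code. Column names follow the CSV example above, the system_prompt key matches the sample prompt.yml shown earlier, and the evaluate_reply helper is hypothetical:

import json
import os

import pandas as pd
import streamlit as st
import yaml
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
with open("prompt.yml", encoding="utf-8") as f:
    prompt = yaml.safe_load(f)  # evaluation instructions for the model

def evaluate_reply(ticket: str, reply: str) -> dict:
    """Ask GPT-4o to score one reply, expecting the JSON format shown above."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": prompt["system_prompt"]},  # key name assumed
            {"role": "user", "content": f"Ticket: {ticket}\nReply: {reply}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

uploaded = st.file_uploader("Upload tickets CSV", type="csv")
if uploaded:
    df = st.data_editor(pd.read_csv(uploaded))  # editable grid of tickets and replies
    scores = df.apply(lambda row: evaluate_reply(row["ticket"], row["reply"]), axis=1)
    evaluated = pd.concat([df, pd.DataFrame(list(scores))], axis=1)

    # Chat-style display: ticket as the user message, reply plus scores as the assistant message
    for _, row in evaluated.iterrows():
        st.chat_message("user").write(row["ticket"])
        with st.chat_message("assistant"):
            st.write(row["reply"])
            st.caption(f"Content: {row['content_score']}/5. {row['content_explanation']}")
            st.caption(f"Format: {row['format_score']}/5. {row['format_explanation']}")

    st.download_button(
        "Download Evaluated Tickets",
        evaluated.to_csv(index=False).encode("utf-8"),
        file_name="tickets_evaluated.csv",
        mime="text/csv",
    )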

File Structure

 📂 rauda.ai.interview
 ├── 📂 .venv                          # Virtual environment (dependencies)
 ├── 📂 pages                          # Streamlit multi-page application
 │   ├── 1_📝_Readme.py                # Page for README visualization
 │   ├── 2_✏️_Edit_Prompt.py           # Page for editing the evaluation prompt
 ├── 📄 .gitignore                     # Git ignore file
 ├── 📄 .env                           # Environment variables (API keys)
 ├── 📄 AI Engineer Assignment.pdf     # Assignment details
 ├── 📄 prompt.yml                     # YAML file containing evaluation prompt
 ├── 📄 README.md                      # Project documentation
 ├── 📄 requirements.txt               # Python dependencies
 ├── 📄 tickets.csv                    # Original customer ticket dataset
 ├── 📄 tickets_evaluated.csv          # AI-evaluated ticket dataset
 ├── 📄 🔎_Ticket_Reply_Evaluation    # Main file for processing ticket responses
 ├── 📂 tests                          # Unit test suite
 │   ├── 📄 test_prompt.py             # Tests for prompt loading
 │   ├── 📄 test_processing.py         # Tests for CSV handling
 │   ├── 📄 test_ai_response.py        # Tests for AI response parsing
 │   ├── 📄 test_download.py           # Tests for CSV downloads

🧪 Testing

To ensure application functionality, a test suite has been added using pytest. It covers:

Prompt Validation: Ensures prompt.yml loads correctly.
Data Processing Tests: Validates CSV structure and missing values.
AI Response Handling: Checks OpenAI output parsing.
Download Functionality: Ensures the evaluated CSV can be downloaded.

Running Tests

pytest tests/
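
For example, a parsing test might look like the following (parse_evaluation is a hypothetical helper standing in for whatever function the app actually exposes for turning the model's JSON output into score fields):

import json

def parse_evaluation(raw: str) -> dict:
    """Hypothetical helper mirroring the app's parsing of the model output."""
    return json.loads(raw)

def test_parse_evaluation_extracts_scores():
    raw = (
        '{"content_score": 4, "content_explanation": "ok",'
        ' "format_score": 5, "format_explanation": "ok"}'
    )
    result = parse_evaluation(raw)
    assert result["content_score"] == 4
    assert result["format_score"] == 5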

Future Improvements

📌 Support for multiple AI models
📌 Advanced analytics & visualizations
📌 More customizable evaluation criteria
📌 Multilingual support for diverse customer tickets
