A comprehensive conversation analysis tool designed for marriage/wedding organizations to analyze customer interactions using OpenAI's GPT models. This project enables both automated analysis and manual labeling of conversations to evaluate bot performance and customer sentiment. I have developed a more advanced version of this project for a client which is private. This version is a more rudimentary version that, with the explicit permission of the client, I have open-sourced (sensitive data omitted).
-
Automated Conversation Analysis: Uses OpenAI's GPT models to analyze conversations and extract:
- Overall customer sentiment (positive, neutral, negative)
- Bot understanding quality (good, acceptable, poor)
- Bot performance assessment (good, acceptable, poor)
- Response completeness (whether bot answered the last message)
- Conversation categorization
-
Manual Labeling Interface: Beautiful web-based tool for creating ground truth labels
- Interactive conversation viewer
- Progress tracking and navigation
- Export/import functionality for labels
- Keyboard shortcuts for efficient labeling
-
Fine-Tuning Support: Prepare data for OpenAI fine-tuning to improve analysis accuracy
- JSONL format validation
- Data quality checks
- Ground truth integration
# Clone the repository
git clone https://github.com/yourusername/fine-tuned-sentiment-analyser.git
cd fine-tuned-sentiment-analyser
# Install dependencies
pip install -r requirements.txtCreate a secrets.env file in the project root:
cp secrets.env.example secrets.envEdit secrets.env and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
The tool expects conversation data in a specific JSON format. Use the provided preprocessing script:
# Update process_conversations.py with your input file path
input_file = "your_conversations.json"
output_file = "cleaned_conversations.json"
python process_conversations.pyRun the main analyzer:
python conversation_analyzer.pyFollow the prompts to select how many conversations to analyze.
The main analysis script (conversation_analyzer.py) provides several options:
- Analyze specific number:
10(analyzes first 10 conversations) - Analyze range:
11-50(analyzes conversations 11 through 50) - Analyze all:
allora(analyzes all conversations)
Results are saved to classification_results.json with detailed analysis for each conversation.
- Open
conversation_labeler.htmlin your web browser - Load your conversation JSON file using the "Load Conversations" button
- Navigate through conversations and provide manual labels
- Export your ground truth labels when complete
Keyboard Shortcuts:
Ctrl + ←: Previous conversationCtrl + →: Next conversationCtrl + S: Save current labels
Use process_conversations.py to convert raw conversation data to the required format:
def process_conversations_file(input_file, output_file, max_conversations=100):
# Processes conversations and creates cleaned formatExpected input format:
[
{
"conversation_id": "unique_id",
"messages": [
{
"id": "message_id",
"type": "TEXT",
"sender_id": "sender_identifier",
"content": {"text": "message content"},
"created_at": "timestamp",
"is_internal": false
}
]
}
]- Create manual ground truth using the labeling interface
- Generate fine-tuning data (implement your own script based on ground truth)
- Validate JSONL format:
python validate_jsonl.py- Upload to OpenAI and update model ID in
conversation_analyzer.pyline 174
The tool categorizes conversations into predefined categories relevant to wedding/marriage organizations:
- Düğün Mekanları (Wedding Venues)
- Düğün Organizasyon (Wedding Organization)
- Kına Gecesi (Henna Night)
- Nişan ve Söz (Engagement)
- Mezuniyet ve Balo (Graduation & Prom)
- Doğum Günü & Baby Shower
- Düğün Fotoğrafçıları (Wedding Photographers)
- And more...
To adapt for your domain:
- Update categories in
conversation_analyzer.py(lines 58-64) - Modify analysis prompt for your specific use case (lines 84-129)
- Adjust bot identification in
process_conversations.py(line 64)
fine-tuned-sentiment-analyser/
├── conversation_analyzer.py # Main analysis script
├── conversation_labeler.html # Manual labeling interface
├── process_conversations.py # Data preprocessing
├── validate_jsonl.py # JSONL validation for fine-tuning
├── requirements.txt # Python dependencies
├── secrets.env.example # Environment variables template
├── fine_tuning.jsonl # Fine-tuning data (placeholder)
└── example-last-500-conversations.json # Example data format
Each analyzed conversation produces:
{
"conversation_id": "unique_identifier",
"llm_classification": {
"overall_sentiment": "positive|neutral|negative",
"bot_understanding": "good|acceptable|poor",
"bot_performance": "good|acceptable|poor",
"bot_answered": true|false,
"categories": ["Category1", "Category2"],
"to_improve_understanding": "explanation or null",
"to_improve_performance": "explanation or null"
}
}{
"exported_at": "2024-01-01T00:00:00.000Z",
"total_labeled": 50,
"labels": [
{
"conversation_id": "conv_123",
"ground_truth": {
"overall_sentiment": "positive",
"bot_understanding": "good",
"bot_performance": "acceptable",
"bot_answered": true
}
}
]
}from conversation_analyzer import ConversationAnalyzer
# Initialize
analyzer = ConversationAnalyzer()
# Load conversations
conversations = analyzer.load_conversations("your_file.json")
# Analyze specific range
results = analyzer.analyze_conversations(conversations, start=0, end=10)
# Save results
analyzer.save_results(results, "output.json")- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- openai: For GPT model integration
- tqdm: Progress bars during analysis
- python-dotenv: Environment variable management
- API Key Error: Ensure your OpenAI API key is correctly set in
secrets.env - JSON Format Error: Validate your input data format matches the expected structure
- Rate Limiting: The tool includes automatic delays to prevent API rate limits
- Memory Issues: For large datasets, process in smaller batches
- Use ranges (e.g.,
1-100) for large datasets - Monitor API usage and costs
- Consider fine-tuning for better accuracy on your specific domain
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for marriage/wedding organization customer service analysis
- Uses OpenAI's GPT models for intelligent conversation analysis
- Inspired by the need for automated customer interaction quality assessment
For questions, issues, or contributions, please:
- Check existing issues in the repository
- Create a new issue with detailed information
- Provide sample data and error messages when applicable
Note: This tool is designed for Turkish wedding/marriage organization conversations but can be adapted for other domains by modifying the categories and analysis prompts.