A machine learning system that automatically categorizes emails into 10 distinct topics using both content and sender information. The classifier uses a dual-feature pipeline combining TF-IDF vectorization with sender domain analysis to achieve high accuracy on both straightforward and ambiguous cases.
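As a rough illustration of the dual-feature idea, the sketch below computes TF-IDF weights for the email text and appends a one-hot sender-domain feature to each vector. It uses only the standard library. The function names (`sender_domain`, `tfidf_vectors`, `combine_features`) and the `word=` / `domain=` key format are assumptions made for this example; the project's own pipeline may build its features differently.

```python
import math
from collections import Counter
from email.utils import parseaddr


def sender_domain(from_header):
    """Return the lowercased domain of a From header, or '' if none is found."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower() if "@" in address else ""


def tfidf_vectors(texts):
    """Compute a smoothed TF-IDF weight for every token of every text."""
    tokenized = [text.lower().split() for text in texts]
    # Document frequency: in how many texts does each token appear?
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
    n_docs = len(tokenized)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            f"word={tok}": (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        })
    return vectors


def combine_features(texts, from_headers):
    """Merge text features with a one-hot sender-domain feature per email."""
    combined = []
    for vector, header in zip(tfidf_vectors(texts), from_headers):
        domain = sender_domain(header)
        if domain:
            vector[f"domain={domain}"] = 1.0
        combined.append(vector)
    return combined


features = combine_features(
    ["Your payment has been processed", "Your flight is booked"],
    ["Acme Bank <billing@acmebank.example>", "trips@airline.example"],
)
print(sorted(features[0]))
```

Keeping the domain as its own feature lets a downstream model weigh who sent a message separately from what the message says, which is one plausible way sender information could help with ambiguous wording.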
- 10 distinct email topics
- High-accuracy classification (85% on straightforward cases, 70% on ambiguous cases)
- Interactive HTML report
- Test suite with separate evaluation of straightforward and ambiguous cases
- Evaluation metrics, including cross-validation scores and a confusion matrix
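To make the evaluation features above more concrete, here is a minimal sketch of a confusion matrix and of accuracy reported separately for straightforward and ambiguous cases. The helper names and the tiny hand-written label lists are hypothetical and exist only for illustration; they are not the project's evaluation code or results, and cross-validation is not covered here.

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (0.0 for empty input)."""
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Count how often each (true topic, predicted topic) pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy_by_case_type(y_true, y_pred, case_types):
    """Group examples by case type and compute accuracy within each group."""
    grouped = defaultdict(lambda: ([], []))
    for true, pred, case in zip(y_true, y_pred, case_types):
        grouped[case][0].append(true)
        grouped[case][1].append(pred)
    return {case: accuracy(ts, ps) for case, (ts, ps) in grouped.items()}


# Hypothetical labels, made up for this example only.
y_true = ["finance", "travel", "spam", "work"]
y_pred = ["finance", "travel", "promotions", "work"]
case_types = ["straightforward", "straightforward", "ambiguous", "ambiguous"]

print(accuracy_by_case_type(y_true, y_pred, case_types))
print(confusion_matrix(y_true, y_pred)[("spam", "promotions")])
```

Off-diagonal entries such as `("spam", "promotions")` show which topics get confused with each other, which a single accuracy number hides.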
The classifier can categorize emails into these topics:
- Work
- Shipping
- Finance
- Travel
- Promotions
- Social
- Updates
- Support
- Spam
- Events
- Clone the repository:
git clone https://github.com/Karan5352/email-topic-classifier.git
cd email-topic-classifier
- Set up the environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- Export emails from Gmail:
# 1. Generate an App Password (Security Key):
#    - Go to your Google Account settings
#    - Navigate to Security
#    - Enable 2-Step Verification if not already enabled
#    - Go to App Passwords
#    - Select "Mail" and your device
#    - Copy the generated 16-character security key
# 2. Export emails using the security key:
#    Use --days 30 to only export emails from the past 30 days
python export_gmail.py --email "your.email@gmail.com" --app-password "your-16-char-security-key" --output "emails" --days 30
- Train the model:
python email_topic_classifier.py --train
- Process your emails:
# Process .eml files and organize by topic
python email_processor.py --input "emails" --output "organized_emails"
- View the report: open organized_emails/email_report.html in your browser.
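The processing step above takes a folder of .eml files and sorts them by predicted topic. A minimal sketch of that idea follows, using only the standard library's `email` package. The functions `email_text` and `organize_by_topic`, and the one-folder-per-topic layout, are assumptions for illustration; `email_processor.py` may work differently, and report generation is not shown.

```python
import shutil
from email import policy
from email.parser import BytesParser
from pathlib import Path


def email_text(eml_path):
    """Return the subject and body of one .eml file, preferring the plain-text part."""
    with open(eml_path, "rb") as handle:
        message = BytesParser(policy=policy.default).parse(handle)
    body_part = message.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return f"{message['subject'] or ''}\n{body}"


def organize_by_topic(input_dir, output_dir, predict):
    """Copy each .eml file into output_dir/<predicted topic>/ and return per-topic counts.

    `predict` is any callable mapping text to a topic name, for example
    the `predict` method of a trained classifier.
    """
    counts = {}
    for eml_path in sorted(Path(input_dir).glob("*.eml")):
        topic = predict(email_text(eml_path))
        target = Path(output_dir) / topic
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(eml_path, target / eml_path.name)
        counts[topic] = counts.get(topic, 0) + 1
    return counts
```

Passing the prediction function in as an argument keeps the file handling independent of any particular model, so the same loop can be exercised with a simple stand-in during testing.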
from email_topic_classifier import EmailTopicClassifier
# Initialize and train (texts: a sequence of email strings; labels: the matching topic names)
classifier = EmailTopicClassifier()
classifier.train(texts, labels)
# Make predictions
topic = classifier.predict("Your payment has been processed")
print(f"Predicted topic: {topic}")
# Export emails from Gmail
python export_gmail.py --email "your.email@gmail.com" --app-password "your-16-char-security-key" --output "emails" --days 30
# Process and organize them
python email_processor.py --input "emails" --output "organized_emails"
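For readers curious what an export with an App Password and a day limit might involve, the sketch below uses the standard library's `imaplib` against Gmail's IMAP server (`imap.gmail.com`). It is an assumption-laden illustration, not the contents of `export_gmail.py`: the function names `imap_since` and `export_recent`, the read-only INBOX selection, and the file naming are all hypothetical.

```python
import imaplib
from datetime import date, timedelta
from pathlib import Path

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_since(days, today=None):
    """Format the cutoff date the way IMAP SEARCH expects, independent of locale."""
    cutoff = (today or date.today()) - timedelta(days=days)
    return f"{cutoff.day:02d}-{_MONTHS[cutoff.month - 1]}-{cutoff.year}"


def export_recent(email_address, app_password, output_dir, days=30):
    """Save INBOX messages from the last `days` days as .eml files; return the count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    with imaplib.IMAP4_SSL("imap.gmail.com") as imap:
        imap.login(email_address, app_password)
        imap.select("INBOX", readonly=True)  # read-only: do not mark mail as seen
        _, data = imap.search(None, "SINCE", imap_since(days))
        message_ids = data[0].split()
        for message_id in message_ids:
            _, parts = imap.fetch(message_id, "(RFC822)")
            # parts[0][1] holds the raw message bytes for this fetch response
            (out / f"{message_id.decode()}.eml").write_bytes(parts[0][1])
    return len(message_ids)
```

The date helper spells out month names by hand because `strftime("%b")` depends on the system locale, while IMAP requires English abbreviations.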
custom_emails = [
("Your meeting is scheduled for tomorrow", "work"),
("Family dinner this weekend", "personal"),
# Add more examples...
]
texts, labels = zip(*custom_emails)
classifier.train(texts, labels)
python -m unittest discover tests
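A test module that the command above would discover might look roughly like this sketch, which checks straightforward and ambiguous cases separately against the accuracy figures quoted earlier (85% and 70%). `KeywordStubClassifier` is a hypothetical stand-in so the example runs on its own; it is not the project's `EmailTopicClassifier`, and the example emails are invented.

```python
import unittest


class KeywordStubClassifier:
    """Hypothetical stand-in with the same predict(text) shape as the real classifier."""

    KEYWORDS = {"invoice": "finance", "flight": "travel", "tracking": "shipping"}

    def predict(self, text):
        lowered = text.lower()
        for keyword, topic in self.KEYWORDS.items():
            if keyword in lowered:
                return topic
        return "updates"


def accuracy(classifier, cases):
    """Fraction of (text, expected_topic) pairs the classifier gets right."""
    return sum(classifier.predict(text) == topic for text, topic in cases) / len(cases)


class TestTopicAccuracy(unittest.TestCase):
    STRAIGHTFORWARD = [
        ("Your invoice is attached", "finance"),
        ("Flight confirmation for Monday", "travel"),
        ("Tracking number for your order", "shipping"),
    ]
    AMBIGUOUS = [
        ("Invoice for your flight", "finance"),
        ("Track your flight status", "travel"),
        ("Your account was updated", "updates"),
        ("Your tracking link for the flight gift box", "shipping"),
    ]

    def setUp(self):
        self.classifier = KeywordStubClassifier()

    def test_straightforward_cases(self):
        self.assertGreaterEqual(accuracy(self.classifier, self.STRAIGHTFORWARD), 0.85)

    def test_ambiguous_cases(self):
        self.assertGreaterEqual(accuracy(self.classifier, self.AMBIGUOUS), 0.70)
```

Saved as a file whose name starts with `test` inside the tests directory, a module like this is picked up by unittest discovery without any extra configuration.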
This project is licensed under the MIT License.