# Extracting Information from Emails with DSPy

This tutorial shows how to build an email processing system with DSPy. The system classifies each email's type and urgency, extracts key information, and structures the results for further processing.

## What You'll Build

By the end of this tutorial, you'll have a DSPy-powered email processing system that can:

- **Classify email types** (order confirmation, support request, meeting invitation, etc.)
- **Extract key entities** (dates, amounts, product names, contact info)
- **Determine urgency levels** and required actions
- **Structure extracted data** into consistent formats
- **Handle multiple email formats** robustly

## Prerequisites

- Basic understanding of DSPy modules and signatures
- Python 3.9+ installed
- OpenAI API key (or access to another supported LLM)

## Installation and Setup

```bash
pip install dspy
```

## Step 1: Define Our Data Structures

First, let's define the types of information we want to extract from emails:

```python
import dspy
from typing import List, Optional
from pydantic import BaseModel
from enum import Enum

class EmailType(str, Enum):
    ORDER_CONFIRMATION = "order_confirmation"
    SUPPORT_REQUEST = "support_request"
    MEETING_INVITATION = "meeting_invitation"
    NEWSLETTER = "newsletter"
    PROMOTIONAL = "promotional"
    INVOICE = "invoice"
    SHIPPING_NOTIFICATION = "shipping_notification"
    OTHER = "other"

class UrgencyLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class ExtractedEntity(BaseModel):
    entity_type: str
    value: str
    confidence: float

class EmailInsight(BaseModel):
    email_type: EmailType
    urgency: UrgencyLevel
    summary: str
    key_entities: List[ExtractedEntity]
    action_required: bool
    deadline: Optional[str] = None
    amount: Optional[float] = None
    sender_info: Optional[str] = None
```

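Because `EmailType` mixes in `str`, its members can be built from, and compared with, plain strings. The sketch below illustrates this with a trimmed copy of the enum. `parse_email_type` and `label` are hypothetical helpers (not part of DSPy), and falling back to `OTHER` for unknown labels is an assumption. `label` exists because the way a mixed-in enum member renders inside an f-string has differed between Python versions, whereas `.value` is stable.

```python
from enum import Enum


class EmailType(str, Enum):
    # Trimmed copy of the tutorial's enum, for illustration only.
    ORDER_CONFIRMATION = "order_confirmation"
    INVOICE = "invoice"
    OTHER = "other"


def parse_email_type(raw: str) -> EmailType:
    """Map a raw label to an EmailType, falling back to OTHER (an assumption)."""
    try:
        return EmailType(raw.strip().lower())
    except ValueError:
        return EmailType.OTHER


def label(value) -> str:
    """Return the plain string for an enum member or an already-plain string."""
    return value.value if isinstance(value, Enum) else str(value)


print(label(parse_email_type(" Invoice ")))  # invoice
print(label(parse_email_type("spam")))       # other
print(EmailType.INVOICE == "invoice")        # True
```

Using `label(...)` when printing results keeps the displayed text the same regardless of how the interpreter formats enum members.
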
## Step 2: Create DSPy Signatures

Now let's define the signatures for our email processing pipeline:

```python
class ClassifyEmail(dspy.Signature):
    """Classify the type and urgency of an email based on its content."""

    email_subject: str = dspy.InputField(desc="The subject line of the email")
    email_body: str = dspy.InputField(desc="The main content of the email")
    sender: str = dspy.InputField(desc="Email sender information")

    email_type: EmailType = dspy.OutputField(desc="The classified type of email")
    urgency: UrgencyLevel = dspy.OutputField(desc="The urgency level of the email")
    reasoning: str = dspy.OutputField(desc="Brief explanation of the classification")

class ExtractEntities(dspy.Signature):
    """Extract key entities and information from email content."""

    email_content: str = dspy.InputField(desc="The full email content including subject and body")
    email_type: EmailType = dspy.InputField(desc="The classified type of email")

    key_entities: List[ExtractedEntity] = dspy.OutputField(desc="List of extracted entities with type, value, and confidence")
    financial_amount: Optional[float] = dspy.OutputField(desc="Any monetary amount found, as a number (e.g., 99.99)")
    important_dates: List[str] = dspy.OutputField(desc="List of important dates found in the email")
    contact_info: List[str] = dspy.OutputField(desc="Relevant contact information extracted")

class GenerateActionItems(dspy.Signature):
    """Determine what actions are needed based on the email content and extracted information."""

    email_type: EmailType = dspy.InputField()
    urgency: UrgencyLevel = dspy.InputField()
    email_summary: str = dspy.InputField(desc="Brief summary of the email content")
    extracted_entities: List[ExtractedEntity] = dspy.InputField(desc="Key entities found in the email")

    action_required: bool = dspy.OutputField(desc="Whether any action is required")
    action_items: List[str] = dspy.OutputField(desc="List of specific actions needed")
    deadline: Optional[str] = dspy.OutputField(desc="Deadline for action if applicable")
    priority_score: int = dspy.OutputField(desc="Priority score from 1-10")

class SummarizeEmail(dspy.Signature):
    """Create a concise summary of the email content."""

    email_subject: str = dspy.InputField()
    email_body: str = dspy.InputField()
    key_entities: List[ExtractedEntity] = dspy.InputField()

    summary: str = dspy.OutputField(desc="A 2-3 sentence summary of the email's main points")
```

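Entity values are stored as strings (`ExtractedEntity.value`), so an amount captured as an entity may look like `"$2,399.00"` rather than a number. The following is a minimal sketch of turning such text into a float. `parse_amount` is a hypothetical helper, and it assumes US-style formatting (comma as thousands separator, dot as decimal point); other locales would need different handling.

```python
import re
from typing import Optional

# First run of digits, optionally with thousands separators and a decimal part.
_AMOUNT_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_amount(raw) -> Optional[float]:
    """Best-effort conversion of an amount like '$2,399.00' to a float."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    match = _AMOUNT_RE.search(str(raw))
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


print(parse_amount("$2,399.00"))      # 2399.0
print(parse_amount("no amount here"))  # None
```
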
## Step 3: Build the Email Processing Module

Now let's create our main email processing module:

```python
class EmailProcessor(dspy.Module):
    """A comprehensive email processing system using DSPy."""

    def __init__(self):
        super().__init__()

        # Initialize our processing components
        self.classifier = dspy.ChainOfThought(ClassifyEmail)
        self.entity_extractor = dspy.ChainOfThought(ExtractEntities)
        self.action_generator = dspy.ChainOfThought(GenerateActionItems)
        self.summarizer = dspy.ChainOfThought(SummarizeEmail)

    def forward(self, email_subject: str, email_body: str, sender: str = ""):
        """Process an email and extract structured information."""

        # Step 1: Classify the email
        classification = self.classifier(
            email_subject=email_subject,
            email_body=email_body,
            sender=sender
        )

        # Step 2: Extract entities
        full_content = f"Subject: {email_subject}\n\nFrom: {sender}\n\n{email_body}"
        entities = self.entity_extractor(
            email_content=full_content,
            email_type=classification.email_type
        )

        # Step 3: Generate summary
        summary = self.summarizer(
            email_subject=email_subject,
            email_body=email_body,
            key_entities=entities.key_entities
        )

        # Step 4: Determine actions
        actions = self.action_generator(
            email_type=classification.email_type,
            urgency=classification.urgency,
            email_summary=summary.summary,
            extracted_entities=entities.key_entities
        )

        # Step 5: Structure the results
        return dspy.Prediction(
            email_type=classification.email_type,
            urgency=classification.urgency,
            summary=summary.summary,
            key_entities=entities.key_entities,
            financial_amount=entities.financial_amount,
            important_dates=entities.important_dates,
            action_required=actions.action_required,
            action_items=actions.action_items,
            deadline=actions.deadline,
            priority_score=actions.priority_score,
            reasoning=classification.reasoning,
            contact_info=entities.contact_info
        )
```

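The 1-10 range for `priority_score` is stated only in the field description, so it may be worth guarding against out-of-range values before relying on the score downstream. The sketch below is one possible post-processing step. `clamp_priority` is a hypothetical helper, and the fallback value of 5 is an arbitrary assumption.

```python
def clamp_priority(score, low: int = 1, high: int = 10, default: int = 5) -> int:
    """Coerce a model-provided score to an int within [low, high]."""
    try:
        value = int(round(float(score)))
    except (TypeError, ValueError, OverflowError):
        # Not a usable number (None, free text, NaN, infinity): use the fallback.
        return default
    return max(low, min(high, value))


print(clamp_priority(12))    # 10
print(clamp_priority("7"))   # 7
print(clamp_priority(None))  # 5
```

It could be applied to `actions.priority_score` just before the `dspy.Prediction` is built.
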
## Step 4: Running the Email Processing System

Let's create a simple function to test our email processing system:

```python
import os

def run_email_processing_demo():
    """Demonstration of the email processing system."""

    # Configure DSPy. Set the API key before creating the LM;
    # outside of a demo, export OPENAI_API_KEY in your shell instead of hardcoding it.
    os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI KEY>"
    lm = dspy.LM(model='openai/gpt-4o-mini')
    dspy.configure(lm=lm)

    # Create our email processor
    processor = EmailProcessor()

    # Sample emails for testing
    sample_emails = [
        {
            "subject": "Order Confirmation #12345 - Your MacBook Pro is on the way!",
            "body": """Dear John Smith,

Thank you for your order! We're excited to confirm that your order #12345 has been processed.

Order Details:
- MacBook Pro 14-inch (Space Gray)
- Order Total: $2,399.00
- Estimated Delivery: December 15, 2024
- Tracking Number: 1Z999AA1234567890

If you have any questions, please contact our support team at support@techstore.com.

Best regards,
TechStore Team""",
            "sender": "orders@techstore.com"
        },
        {
            "subject": "URGENT: Server Outage - Immediate Action Required",
            "body": """Hi DevOps Team,

We're experiencing a critical server outage affecting our production environment.

Impact: All users unable to access the platform
Started: 2:30 PM EST

Please join the emergency call immediately: +1-555-123-4567

This is our highest priority.

Thanks,
Site Reliability Team""",
            "sender": "alerts@company.com"
        },
        {
            "subject": "Meeting Invitation: Q4 Planning Session",
            "body": """Hello team,

You're invited to our Q4 planning session.

When: Friday, December 20, 2024 at 2:00 PM - 4:00 PM EST
Where: Conference Room A

Please confirm your attendance by December 18th.

Best,
Sarah Johnson""",
            "sender": "sarah.johnson@company.com"
        }
    ]

    # Process each email and display results
    print("🚀 Email Processing Demo")
    print("=" * 50)

    for i, email in enumerate(sample_emails):
        print(f"\n📧 EMAIL {i+1}: {email['subject'][:50]}...")

        # Process the email
        result = processor(
            email_subject=email["subject"],
            email_body=email["body"],
            sender=email["sender"]
        )

        # Display key results
        print(f" 📊 Type: {result.email_type}")
        print(f" 🚨 Urgency: {result.urgency}")
        print(f" 📝 Summary: {result.summary}")

        if result.financial_amount:
            print(f" 💰 Amount: ${result.financial_amount:,.2f}")

        if result.action_required:
            print(" ✅ Action Required: Yes")
            if result.deadline:
                print(f" ⏰ Deadline: {result.deadline}")
        else:
            print(" ✅ Action Required: No")

# Run the demo
if __name__ == "__main__":
    run_email_processing_demo()
```

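The demo writes a placeholder key into `os.environ`, which is convenient for a tutorial but easy to commit by accident. A common alternative is to export `OPENAI_API_KEY` in the shell and fail early with a clear message when it is missing. The sketch below shows that check using only the standard library; `require_api_key` is a hypothetical helper name.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the environment variable `name`, or raise."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before running the demo."
        )
    return key
```

Calling `require_api_key()` at the top of `run_email_processing_demo()` would surface a missing key before any request is sent to the model.
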
## Expected Output

Exact wording varies with the model and between runs, but the output should resemble the following:

```
🚀 Email Processing Demo
==================================================

📧 EMAIL 1: Order Confirmation #12345 - Your MacBook Pro is on...
 📊 Type: order_confirmation
 🚨 Urgency: low
 📝 Summary: The email confirms John Smith's order #12345 for a MacBook Pro 14-inch in Space Gray, totaling $2,399.00, with an estimated delivery date of December 15, 2024. It includes a tracking number and contact information for customer support.
 💰 Amount: $2,399.00
 ✅ Action Required: No

📧 EMAIL 2: URGENT: Server Outage - Immediate Action Required...
 📊 Type: other
 🚨 Urgency: critical
 📝 Summary: The Site Reliability Team has reported a critical server outage that began at 2:30 PM EST, preventing all users from accessing the platform. They have requested the DevOps Team to join an emergency call immediately to address the issue.
 ✅ Action Required: Yes
 ⏰ Deadline: Immediately

📧 EMAIL 3: Meeting Invitation: Q4 Planning Session...
 📊 Type: meeting_invitation
 🚨 Urgency: medium
 📝 Summary: Sarah Johnson has invited the team to a Q4 planning session on December 20, 2024, from 2:00 PM to 4:00 PM EST in Conference Room A. Attendees are asked to confirm their participation by December 18th.
 ✅ Action Required: Yes
 ⏰ Deadline: December 18th
```

## Next Steps

- **Add more email types** and refine classification (newsletter, promotional, etc.)
- **Add integration** with email providers (Gmail API, Outlook, IMAP)
- **Experiment with different LLMs** and optimization strategies
- **Add multilingual support** for international email processing
- **Optimize the pipeline** with DSPy optimizers to improve extraction quality
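
As a starting point for the email-provider integration, a raw message fetched over IMAP (or saved as an `.eml` file) first has to be split into the three fields that `EmailProcessor` takes. The sketch below uses only Python's standard `email` package. `email_to_fields` is a hypothetical helper, and it assumes the message carries a plain-text body; HTML-only mail would need extra handling.

```python
from email import message_from_string
from email.policy import default


def email_to_fields(raw: str) -> dict:
    """Split a raw RFC 5322 message into the processor's keyword arguments."""
    msg = message_from_string(raw, policy=default)
    # Prefer the plain-text part; returns None if the message has none.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "email_subject": str(msg["subject"] or ""),
        "email_body": body.strip(),
        "sender": str(msg["from"] or ""),
    }


raw_message = (
    "From: sarah.johnson@company.com\n"
    "Subject: Meeting Invitation: Q4 Planning Session\n"
    "\n"
    "Please confirm your attendance by December 18th.\n"
)
fields = email_to_fields(raw_message)
print(fields["email_subject"])
```

The returned dictionary can be passed straight to the processor as `processor(**fields)`.
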