Commit dc64c67

Add real world tutorial for email extraction (#8426)
* add real world tutorial for email extraction
* simplify install
1 parent a97437b commit dc64c67

2 files changed: +325 -0 lines changed

Lines changed: 324 additions & 0 deletions
@@ -0,0 +1,324 @@
# Extracting Information from Emails with DSPy

This tutorial demonstrates how to build an intelligent email processing system using DSPy. We'll create a system that can automatically extract key information from various types of emails, classify their intent, and structure the data for further processing.

## What You'll Build

By the end of this tutorial, you'll have a DSPy-powered email processing system that can:

- **Classify email types** (order confirmation, support request, meeting invitation, etc.)
- **Extract key entities** (dates, amounts, product names, contact info)
- **Determine urgency levels** and required actions
- **Structure extracted data** into consistent formats
- **Handle multiple email formats** robustly

## Prerequisites

- Basic understanding of DSPy modules and signatures
- Python 3.9+ installed
- OpenAI API key (or access to another supported LLM)

## Installation and Setup

```bash
pip install dspy
```

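The prerequisites call for an OpenAI API key. One way to avoid pasting the key into source code is to read it from the environment and fail early when it is missing. The sketch below assumes the key is exported as `OPENAI_API_KEY`; `require_api_key` is a hypothetical helper written for this tutorial, not part of DSPy.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a clear error.

    Hypothetical helper for this tutorial; DSPy does not provide it.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running the tutorial."
        )
    return key
```

Calling `require_api_key()` once at startup surfaces a missing key before any model request is made.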
## Step 1: Define Our Data Structures

First, let's define the types of information we want to extract from emails:

```python
import dspy
from typing import List, Optional, Literal
from datetime import datetime
from pydantic import BaseModel
from enum import Enum

class EmailType(str, Enum):
    ORDER_CONFIRMATION = "order_confirmation"
    SUPPORT_REQUEST = "support_request"
    MEETING_INVITATION = "meeting_invitation"
    NEWSLETTER = "newsletter"
    PROMOTIONAL = "promotional"
    INVOICE = "invoice"
    SHIPPING_NOTIFICATION = "shipping_notification"
    OTHER = "other"

class UrgencyLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class ExtractedEntity(BaseModel):
    entity_type: str
    value: str
    confidence: float

class EmailInsight(BaseModel):
    email_type: EmailType
    urgency: UrgencyLevel
    summary: str
    key_entities: List[ExtractedEntity]
    action_required: bool
    deadline: Optional[str] = None
    amount: Optional[float] = None
    sender_info: Optional[str] = None
```

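Because `EmailType` and `UrgencyLevel` subclass `str`, their members compare equal to plain strings and can be looked up by value. The sketch below repeats the `EmailType` definition so it runs on its own, and adds `to_email_type`, a hypothetical helper (not part of the tutorial or DSPy) that maps loosely formatted labels to a member and falls back to `OTHER`.

```python
from enum import Enum


class EmailType(str, Enum):
    # Same members as the tutorial's EmailType, repeated so this block is self-contained.
    ORDER_CONFIRMATION = "order_confirmation"
    SUPPORT_REQUEST = "support_request"
    MEETING_INVITATION = "meeting_invitation"
    NEWSLETTER = "newsletter"
    PROMOTIONAL = "promotional"
    INVOICE = "invoice"
    SHIPPING_NOTIFICATION = "shipping_notification"
    OTHER = "other"


def to_email_type(label: str) -> EmailType:
    """Map a raw label to an EmailType, falling back to OTHER (hypothetical helper)."""
    normalized = label.strip().lower().replace(" ", "_").replace("-", "_")
    try:
        return EmailType(normalized)  # lookup by value
    except ValueError:
        return EmailType.OTHER


print(to_email_type("Order Confirmation").value)  # order_confirmation
print(to_email_type("spam").value)                # other
print(EmailType.INVOICE == "invoice")             # True
```

Printing `.value` gives the bare label on every Python version; formatting the member itself in an f-string yields `EmailType.INVOICE` on Python 3.11 and later.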
## Step 2: Create DSPy Signatures

Now let's define the signatures for our email processing pipeline:

```python
class ClassifyEmail(dspy.Signature):
    """Classify the type and urgency of an email based on its content."""

    email_subject: str = dspy.InputField(desc="The subject line of the email")
    email_body: str = dspy.InputField(desc="The main content of the email")
    sender: str = dspy.InputField(desc="Email sender information")

    email_type: EmailType = dspy.OutputField(desc="The classified type of email")
    urgency: UrgencyLevel = dspy.OutputField(desc="The urgency level of the email")
    reasoning: str = dspy.OutputField(desc="Brief explanation of the classification")

class ExtractEntities(dspy.Signature):
    """Extract key entities and information from email content."""

    email_content: str = dspy.InputField(desc="The full email content including subject and body")
    email_type: EmailType = dspy.InputField(desc="The classified type of email")

    key_entities: List[ExtractedEntity] = dspy.OutputField(desc="List of extracted entities with type, value, and confidence")
    financial_amount: Optional[float] = dspy.OutputField(desc="Any monetary amounts found (e.g., '$99.99')")
    important_dates: list[str] = dspy.OutputField(desc="List of important dates found in the email")
    contact_info: list[str] = dspy.OutputField(desc="Relevant contact information extracted")

class GenerateActionItems(dspy.Signature):
    """Determine what actions are needed based on the email content and extracted information."""

    email_type: EmailType = dspy.InputField()
    urgency: UrgencyLevel = dspy.InputField()
    email_summary: str = dspy.InputField(desc="Brief summary of the email content")
    extracted_entities: List[ExtractedEntity] = dspy.InputField(desc="Key entities found in the email")

    action_required: bool = dspy.OutputField(desc="Whether any action is required")
    action_items: list[str] = dspy.OutputField(desc="List of specific actions needed")
    deadline: Optional[str] = dspy.OutputField(desc="Deadline for action if applicable")
    priority_score: int = dspy.OutputField(desc="Priority score from 1-10")

class SummarizeEmail(dspy.Signature):
    """Create a concise summary of the email content."""

    email_subject: str = dspy.InputField()
    email_body: str = dspy.InputField()
    key_entities: List[ExtractedEntity] = dspy.InputField()

    summary: str = dspy.OutputField(desc="A 2-3 sentence summary of the email's main points")
```

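`financial_amount` is typed as `Optional[float]`, while its description gives the string `'$99.99'` as an example. If you want to cross-check the model's number against the email text, a small parser can turn a formatted amount into a float. This is a minimal sketch that assumes US-style formatting (comma thousands separators, dot decimals); `parse_amount` is a hypothetical helper, not part of DSPy.

```python
import re
from typing import Optional

# First run of digits, allowing comma separators and an optional decimal part.
_AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_amount(text: str) -> Optional[float]:
    """Return the first US-formatted amount in `text` as a float, or None."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(parse_amount("Order Total: $2,399.00"))  # 2399.0
print(parse_amount("No charge"))               # None
```

Comparing `parse_amount(email_body)` with the extracted `financial_amount` is one inexpensive sanity check, though it only looks at the first number it finds.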
## Step 3: Build the Email Processing Module

Now let's create our main email processing module:

```python
class EmailProcessor(dspy.Module):
    """A comprehensive email processing system using DSPy."""

    def __init__(self):
        super().__init__()

        # Initialize our processing components
        self.classifier = dspy.ChainOfThought(ClassifyEmail)
        self.entity_extractor = dspy.ChainOfThought(ExtractEntities)
        self.action_generator = dspy.ChainOfThought(GenerateActionItems)
        self.summarizer = dspy.ChainOfThought(SummarizeEmail)

    def forward(self, email_subject: str, email_body: str, sender: str = ""):
        """Process an email and extract structured information."""

        # Step 1: Classify the email
        classification = self.classifier(
            email_subject=email_subject,
            email_body=email_body,
            sender=sender
        )

        # Step 2: Extract entities
        full_content = f"Subject: {email_subject}\n\nFrom: {sender}\n\n{email_body}"
        entities = self.entity_extractor(
            email_content=full_content,
            email_type=classification.email_type
        )

        # Step 3: Generate summary
        summary = self.summarizer(
            email_subject=email_subject,
            email_body=email_body,
            key_entities=entities.key_entities
        )

        # Step 4: Determine actions
        actions = self.action_generator(
            email_type=classification.email_type,
            urgency=classification.urgency,
            email_summary=summary.summary,
            extracted_entities=entities.key_entities
        )

        # Step 5: Structure the results
        return dspy.Prediction(
            email_type=classification.email_type,
            urgency=classification.urgency,
            summary=summary.summary,
            key_entities=entities.key_entities,
            financial_amount=entities.financial_amount,
            important_dates=entities.important_dates,
            action_required=actions.action_required,
            action_items=actions.action_items,
            deadline=actions.deadline,
            priority_score=actions.priority_score,
            reasoning=classification.reasoning,
            contact_info=entities.contact_info
        )
```

## Step 4: Running the Email Processing System

Let's create a simple function to test our email processing system:

```python
import os

def run_email_processing_demo():
    """Demonstration of the email processing system."""

    # Configure DSPy. Set the API key before creating the LM
    # (or export OPENAI_API_KEY in your shell and drop this line).
    os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI KEY>"
    lm = dspy.LM(model='openai/gpt-4o-mini')
    dspy.configure(lm=lm)

    # Create our email processor
    processor = EmailProcessor()

    # Sample emails for testing
    sample_emails = [
        {
            "subject": "Order Confirmation #12345 - Your MacBook Pro is on the way!",
            "body": """Dear John Smith,

Thank you for your order! We're excited to confirm that your order #12345 has been processed.

Order Details:
- MacBook Pro 14-inch (Space Gray)
- Order Total: $2,399.00
- Estimated Delivery: December 15, 2024
- Tracking Number: 1Z999AA1234567890

If you have any questions, please contact our support team at support@techstore.com.

Best regards,
TechStore Team""",
            "sender": "orders@techstore.com"
        },
        {
            "subject": "URGENT: Server Outage - Immediate Action Required",
            "body": """Hi DevOps Team,

We're experiencing a critical server outage affecting our production environment.

Impact: All users unable to access the platform
Started: 2:30 PM EST

Please join the emergency call immediately: +1-555-123-4567

This is our highest priority.

Thanks,
Site Reliability Team""",
            "sender": "alerts@company.com"
        },
        {
            "subject": "Meeting Invitation: Q4 Planning Session",
            "body": """Hello team,

You're invited to our Q4 planning session.

When: Friday, December 20, 2024 at 2:00 PM - 4:00 PM EST
Where: Conference Room A

Please confirm your attendance by December 18th.

Best,
Sarah Johnson""",
            "sender": "sarah.johnson@company.com"
        }
    ]

    # Process each email and display results
    print("🚀 Email Processing Demo")
    print("=" * 50)

    for i, email in enumerate(sample_emails):
        print(f"\n📧 EMAIL {i+1}: {email['subject'][:50]}...")

        # Process the email
        result = processor(
            email_subject=email["subject"],
            email_body=email["body"],
            sender=email["sender"]
        )

        # Display key results
        print(f" 📊 Type: {result.email_type}")
        print(f" 🚨 Urgency: {result.urgency}")
        print(f" 📝 Summary: {result.summary}")

        if result.financial_amount:
            print(f" 💰 Amount: ${result.financial_amount:,.2f}")

        if result.action_required:
            print(" ✅ Action Required: Yes")
            if result.deadline:
                print(f" ⏰ Deadline: {result.deadline}")
        else:
            print(" ✅ Action Required: No")

# Run the demo
if __name__ == "__main__":
    run_email_processing_demo()
```

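The demo loop stops at the first email whose processing raises, for example on a network error or an unparseable model response. A minimal sketch of a more forgiving loop follows; `process_all` is a hypothetical helper that accepts any callable with the same keyword arguments as `EmailProcessor`, so it can be tried offline with a stand-in function.

```python
from typing import Any, Callable, Dict, List, Tuple


def process_all(
    processor: Callable[..., Any],
    emails: List[Dict[str, str]],
) -> Tuple[List[Any], List[Tuple[str, Exception]]]:
    """Run `processor` on each email, collecting failures instead of stopping."""
    results: List[Any] = []
    failures: List[Tuple[str, Exception]] = []
    for email in emails:
        try:
            results.append(
                processor(
                    email_subject=email["subject"],
                    email_body=email["body"],
                    sender=email.get("sender", ""),
                )
            )
        except Exception as exc:  # record the failure and keep going
            failures.append((email.get("subject", ""), exc))
    return results, failures
```

With the tutorial's objects, `results, failures = process_all(EmailProcessor(), sample_emails)` would return the predictions alongside a list of `(subject, exception)` pairs to inspect or retry.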
## Expected Output

Because the results come from a language model, exact wording and classifications can vary between runs. On Python 3.11 and later, the `str`-based enum fields may print as `EmailType.ORDER_CONFIRMATION` rather than `order_confirmation` unless you print their `.value`. A run should look similar to:

```
🚀 Email Processing Demo
==================================================

📧 EMAIL 1: Order Confirmation #12345 - Your MacBook Pro is on...
 📊 Type: order_confirmation
 🚨 Urgency: low
 📝 Summary: The email confirms John Smith's order #12345 for a MacBook Pro 14-inch in Space Gray, totaling $2,399.00, with an estimated delivery date of December 15, 2024. It includes a tracking number and contact information for customer support.
 💰 Amount: $2,399.00
 ✅ Action Required: No

📧 EMAIL 2: URGENT: Server Outage - Immediate Action Required...
 📊 Type: other
 🚨 Urgency: critical
 📝 Summary: The Site Reliability Team has reported a critical server outage that began at 2:30 PM EST, preventing all users from accessing the platform. They have requested the DevOps Team to join an emergency call immediately to address the issue.
 ✅ Action Required: Yes
 ⏰ Deadline: Immediately

📧 EMAIL 3: Meeting Invitation: Q4 Planning Session...
 📊 Type: meeting_invitation
 🚨 Urgency: medium
 📝 Summary: Sarah Johnson has invited the team to a Q4 planning session on December 20, 2024, from 2:00 PM to 4:00 PM EST in Conference Room A. Attendees are asked to confirm their participation by December 18th.
 ✅ Action Required: Yes
 ⏰ Deadline: December 18th
```

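Once each email has an urgency and a priority score, a natural follow-up is to order the inbox by them. The sketch below uses `SimpleNamespace` objects as stand-ins for the predictions; the priority scores are made-up illustrative values, and `triage` is a hypothetical helper rather than part of the tutorial's code.

```python
from types import SimpleNamespace

# Lower rank sorts first.
URGENCY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def urgency_rank(result) -> int:
    # Works whether urgency is a plain string or a str-based Enum member.
    label = getattr(result.urgency, "value", result.urgency)
    return URGENCY_RANK.get(label, len(URGENCY_RANK))


def triage(results):
    """Sort by urgency first, then by descending priority score."""
    return sorted(results, key=lambda r: (urgency_rank(r), -r.priority_score))


# Stand-ins for processed emails; the scores are illustrative, not real model output.
inbox = [
    SimpleNamespace(subject="Order Confirmation #12345", urgency="low", priority_score=2),
    SimpleNamespace(subject="URGENT: Server Outage", urgency="critical", priority_score=10),
    SimpleNamespace(subject="Meeting Invitation", urgency="medium", priority_score=5),
]

for item in triage(inbox):
    print(item.urgency, "-", item.subject)
```

Unknown urgency labels sort last, which keeps the function from failing if the model returns something outside the enum.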
## Next Steps

- **Add more email types** and refine classification (newsletter, promotional, etc.)
- **Add integration** with email providers (Gmail API, Outlook, IMAP)
- **Experiment with different LLMs** and optimization strategies
- **Add multilingual support** for international email processing
- **Optimize the pipeline** with DSPy optimizers to improve the performance of your program
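
For the provider-integration step, messages fetched over IMAP or exported from a mail client usually arrive as raw RFC 5322 text. The sketch below uses Python's standard `email` package to turn such a message into the three keyword arguments that `EmailProcessor.forward` expects. `email_to_inputs` is a hypothetical helper, and the sketch only reads the plain-text body, ignoring HTML parts and attachments.

```python
from email import message_from_string, policy


def email_to_inputs(raw_message: str) -> dict:
    """Convert a raw email into keyword arguments for the processor (sketch)."""
    msg = message_from_string(raw_message, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "email_subject": str(msg.get("Subject", "")),
        "email_body": body.strip(),
        "sender": str(msg.get("From", "")),
    }


raw = (
    "From: sarah.johnson@company.com\n"
    "Subject: Meeting Invitation: Q4 Planning Session\n"
    "\n"
    "Please confirm your attendance by December 18th.\n"
)
inputs = email_to_inputs(raw)
print(inputs["email_subject"])
```

The resulting dictionary can be passed straight through as `processor(**inputs)`.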

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ nav:
  - Async: tutorials/async/index.md
  - Real-World Examples:
  - Generating llms.txt: tutorials/llms_txt_generation/index.md
+ - Email Information Extraction: tutorials/email_extraction/index.md
  - DSPy in Production: production/index.md
  - Community:
  - Community Resources: community/community-resources.md
