Currently, fast_mail_parser returns only the last header value when a message contains multiple headers with the same key (like Received, which is almost always the case).
An example program using Python's built-in parser:
# Hardcoded email data (for the sake of example, the email data is included here as a raw string)
email_data = """\
From: sender@example.com
To: recipient@example.com
Subject: Test Email
Date: Mon, 13 Sep 2024 10:00:00 +0200
Received: from mail.example.com (mail.example.com [192.0.2.1])
    by smtp.example.com with ESMTP id abc123
    for <recipient@example.com>; Mon, 13 Sep 2024 09:55:00 +0200
Received: from smtp2.example.com (smtp2.example.com [192.0.2.2])
    by mail.example.com with ESMTP id def456
    for <recipient@example.com>; Mon, 13 Sep 2024 09:50:00 +0200
Received: from relay.example.com (relay.example.com [192.0.2.3])
    by smtp2.example.com with ESMTP id ghi789
    for <recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This is a test email.
"""
import email
from email import policy
from email.parser import BytesParser
from io import BytesIO
def parse_received_headers(email_data):
    """Parse the Received: headers from a hardcoded EML string."""
    # Convert the email string to bytes (as we would normally read an EML file in bytes)
    email_bytes = BytesIO(email_data.encode('utf-8'))
    # Parse the email message
    email_message = BytesParser(policy=policy.default).parse(email_bytes)
    # Extract Received headers in the order they appear
    received_headers = []
    for header, value in email_message.items():
        if header.lower() == 'received':
            received_headers.append(value)
    return received_headers
# Call the function with the hardcoded email
received_headers = parse_received_headers(email_data)
# Print out the Received headers in order
for i, header in enumerate(received_headers, 1):
    print(f"Received Header {i}:\n{header}\n")
$ python test.py
Received Header 1:
from mail.example.com (mail.example.com [192.0.2.1]) by smtp.example.com with ESMTP id abc123 for <recipient@example.com>; Mon, 13 Sep 2024 09:55:00 +0200
Received Header 2:
from smtp2.example.com (smtp2.example.com [192.0.2.2]) by mail.example.com with ESMTP id def456 for <recipient@example.com>; Mon, 13 Sep 2024 09:50:00 +0200
Received Header 3:
from relay.example.com (relay.example.com [192.0.2.3]) by smtp2.example.com with ESMTP id ghi789 for <recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200
And the same with fast_mail_parser:
email_data = """\
From: sender@example.com
To: recipient@example.com
Subject: Test Email
Date: Mon, 13 Sep 2024 10:00:00 +0200
Received: from mail.example.com (mail.example.com [192.0.2.1])
    by smtp.example.com with ESMTP id abc123
    for <recipient@example.com>; Mon, 13 Sep 2024 09:55:00 +0200
Received: from smtp2.example.com (smtp2.example.com [192.0.2.2])
    by mail.example.com with ESMTP id def456
    for <recipient@example.com>; Mon, 13 Sep 2024 09:50:00 +0200
Received: from relay.example.com (relay.example.com [192.0.2.3])
    by smtp2.example.com with ESMTP id ghi789
    for <recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This is a test email.
"""
from fast_mail_parser import parse_email, ParseError
from pprint import pprint as pp
email = parse_email(email_data)
pp(email.headers)
$ python test-fmp.py
{'Content-Transfer-Encoding': '7bit',
 'Content-Type': 'text/plain; charset="utf-8"',
 'Date': 'Mon, 13 Sep 2024 10:00:00 +0200',
 'From': 'sender@example.com',
 'MIME-Version': '1.0',
 'Received': 'from relay.example.com (relay.example.com [192.0.2.3]) by '
             'smtp2.example.com with ESMTP id ghi789 for '
             '<recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200',
 'Subject': 'Test Email',
 'To': 'recipient@example.com'}
For the Received header specifically, the most significant value is the first one, but fast_mail_parser returns only the last. Could you please add another header representation (so as not to break compatibility) that correctly lists all headers, perhaps as a list of tuples?
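To make the request concrete, here is a rough sketch of what a list-of-tuples representation could look like. It uses only Python's standard email package, not fast_mail_parser. The helper names header_pairs and group_headers are made up for illustration and are not part of any existing API, and the sample message is a shortened version of the one above.

```python
from collections import defaultdict
from email import message_from_string

# Shortened sample message with two Received headers (illustrative data only).
raw = (
    "From: sender@example.com\n"
    "Received: from mail.example.com by smtp.example.com; Mon, 13 Sep 2024 09:55:00 +0200\n"
    "Received: from relay.example.com by mail.example.com; Mon, 13 Sep 2024 09:45:00 +0200\n"
    "Subject: Test Email\n"
    "\n"
    "This is a test email.\n"
)


def header_pairs(text):
    """Hypothetical helper: all headers as (name, value) tuples, in message order."""
    return message_from_string(text).items()


def group_headers(pairs):
    """Hypothetical helper: map each lower-cased name to the list of its values."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name.lower()].append(value)
    return dict(grouped)


pairs = header_pairs(raw)

# Every Received header survives, and the first one is still first.
received = [value for name, value in pairs if name.lower() == "received"]
print(received[0])

# Collapsing the same pairs into a plain dict keeps only the last value per key,
# which matches the behaviour reported above.
print(dict(pairs)["Received"])
```

A list of tuples keeps both duplicates and ordering, and a dict-of-lists view like group_headers can be derived from it when lookup by name is more convenient. The existing headers dict could stay as it is, so current users would not be affected.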