Releases: serejekee/thebat_parser

1.0

20 May 17:23
c9e6ad0

📦 Changelog: https://github.com/serejekee/thebat_parser/commits/1.0

[v1.0.0] - Initial Release

✨ New Features

  • Parses .eml files from the data/ directory.
  • Extracts key email fields: date, sender, recipient, subject, body, and attachments.
  • Converts HTML bodies to plain text using BeautifulSoup.
  • Cleans and formats email content for readability.
  • Generates a structured .docx file with all parsed emails in table format using python-docx.
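
The field extraction listed above could look roughly like the following. This is a minimal sketch built only on the standard `email` library, not the project's actual code: the function name `parse_eml_bytes` and the keys of the returned dictionary are hypothetical. HTML-to-text conversion with BeautifulSoup is left out so the sketch has no third-party dependencies.

```python
from email import policy
from email.parser import BytesParser


def parse_eml_bytes(raw: bytes) -> dict:
    """Extract the main fields from one raw .eml message (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content().strip() if body_part is not None else ""

    # Collect the file names of all attachment parts that have one.
    attachments = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]

    return {
        "date": str(msg["Date"] or ""),
        "sender": str(msg["From"] or ""),
        "recipient": str(msg["To"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body,
        "attachments": attachments,
    }
```

A caller would read each file in `data/` as bytes and pass the contents to this function. If only an HTML body is present, the returned `body` still contains markup, which is where the BeautifulSoup step would apply.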

🛠️ Technologies Used

  • Python 3.8+
  • beautifulsoup4 for HTML parsing
  • python-docx for document creation
  • Python's standard-library email module for parsing .eml messages

📁 Output

  • emails.docx containing a summary of all emails with:
    • Date/Time
    • Sender
    • Recipient
    • Subject + Message Body
    • Attachment Names
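
One way to produce a table with these five columns using python-docx is sketched below. It assumes each parsed email is a dictionary with the hypothetical keys `date`, `sender`, `recipient`, `subject`, `body`, and `attachments`. The helper names and the "Table Grid" style are illustrative choices, not taken from the project.

```python
from typing import Iterable, List

HEADERS = [
    "Date/Time",
    "Sender",
    "Recipient",
    "Subject + Message Body",
    "Attachment Names",
]


def email_to_row(email: dict) -> List[str]:
    """Flatten one parsed email (hypothetical dict layout) into five cell strings."""
    return [
        email.get("date", ""),
        email.get("sender", ""),
        email.get("recipient", ""),
        # Subject and body share one cell, separated by a blank line.
        f"{email.get('subject', '')}\n\n{email.get('body', '')}".strip(),
        ", ".join(email.get("attachments", [])),
    ]


def write_docx(emails: Iterable[dict], path: str = "emails.docx") -> None:
    """Write one table row per email using python-docx."""
    # Imported here so email_to_row stays usable without python-docx installed.
    from docx import Document

    doc = Document()
    table = doc.add_table(rows=1, cols=len(HEADERS))
    table.style = "Table Grid"

    # The first row holds the column headers.
    for cell, header in zip(table.rows[0].cells, HEADERS):
        cell.text = header

    # Append one row per email.
    for email in emails:
        for cell, value in zip(table.add_row().cells, email_to_row(email)):
            cell.text = value

    doc.save(path)
```

Keeping the row-building step separate from the document writing makes the column layout easy to check without creating a file.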

⚠️ Notes

  • Ensure .eml files are placed in the data/ folder before running.
  • pandas is included in requirements.txt but not currently used in the script.
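
The first note can be turned into a small pre-flight check. The sketch below is an assumption about how such a check might be written, not part of the released script: `find_eml_files` is a hypothetical helper, and it matches the `.eml` extension case-insensitively.

```python
from pathlib import Path
from typing import List


def find_eml_files(data_dir: str = "data") -> List[Path]:
    """Return the .eml files in data_dir, sorted by path; raise if there are none."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Input folder not found: {folder}")

    # Only regular files directly inside the folder are considered.
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".eml"
    )
    if not files:
        raise FileNotFoundError(f"No .eml files found in {folder}")
    return files
```

Calling `find_eml_files()` before parsing gives a clear error when the folder is missing or empty, rather than an empty output document.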