WhatsApp Chat to PDF Transcriber

Convert WhatsApp chat exports to beautifully formatted PDFs with automatic audio transcription using AI.

⭐ Star this repo if you find it useful!

📸 Screenshots

Click to see examples

WhatsApp-Style Layout (Default)

Beautiful chat bubbles with green messages and audio transcriptions

Minimal Layout

Clean, simple design perfect for archiving

Sample Output

Generated PDF with embedded images and transcribed audio

✨ Features

  • 📝 Automatic message parsing from WhatsApp exports
  • 🎙️ AI-powered audio transcription using OpenAI Whisper
  • 🖼️ Embedded images directly in PDF
  • 📎 Media file references (documents, videos)
  • 🌍 100+ languages supported (6 built-in translations + auto-detect)
  • 🎨 Customizable HTML templates (WhatsApp-style layouts)
  • 🌍 Multi-language interface (language files in languages/ folder)
  • ⚙️ Highly customizable (colors, fonts, spacing)
  • 💾 Smart caching (instant regeneration, up to 98% time savings)
  • 🔄 Batch processing (multiple chats at once)
  • 🔒 Privacy options (exclude images)

🚀 Quick Start

# Install dependencies
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Single chat
python3 main.py "chat.zip"

# With language specification
python3 main.py "chat.zip" -l en

# All chats in folder (batch mode)
python3 main.py --batch

Or use the convenience wrapper:

# Automatically sets up environment
./convert.sh "chat.zip"

# Verify setup
./check_setup.sh

📱 Getting the Chat ZIP from WhatsApp

WhatsApp exports are created on your phone. Here are the easiest ways to transfer them to your computer:

Method 1: Direct Upload via Web Interface ⭐ EASIEST!

No file transfer needed! Upload directly from your phone:

Option A: Same WiFi (Local Network)

# Start the web server on your computer
python3 web_upload.py

Then on your phone:

  1. Connect to the same WiFi as your computer
  2. Open browser → go to URL shown (e.g., http://192.168.1.100:8080)
  3. Upload WhatsApp ZIP file
  4. Download generated PDF!

Option B: HTTPS Tunnel (PWA Share) 🚀 RECOMMENDED!

Enable direct sharing from WhatsApp!

# Option 1: Cloudflared (Recommended - No warning page)
# Install cloudflared first:
# Linux:
wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
sudo mv cloudflared-linux-amd64 /usr/local/bin/cloudflared
sudo chmod +x /usr/local/bin/cloudflared

# macOS:
brew install cloudflared

# Windows:
# Method 1: Using Scoop (recommended)
scoop install cloudflared
# Method 2: Using Chocolatey
choco install cloudflared
# Method 3: Manual installation
# 1. Download from: https://github.com/cloudflare/cloudflared/releases/latest
#    (choose: cloudflared-windows-amd64.exe for 64-bit or cloudflared-windows-386.exe for 32-bit)
# 2. Add the downloaded file to a folder (e.g., C:\Program Files\cloudflared\)
# 3. Add this folder to your PATH:
#    - Open Environment Variables: Win+R → sysdm.cpl → Advanced tab → Environment Variables
#    - Add the folder to the "Path" variable in System variables
# 4. Restart your terminal

# Start with HTTPS tunnel (temporary URL, changes on restart)
python3 web_upload.py --https

# For PERSISTENT URL (same URL every time):
# Easy setup with helper script:
./setup_tunnel.sh

# Or manual setup:
# 1. Login to Cloudflare (one time)
cloudflared tunnel login
# 2. Create a named tunnel (one time)
cloudflared tunnel create chat2pdf
# 3. Configure DNS route (one time)
cloudflared tunnel route dns chat2pdf chat2pdf.yourdomain.com
# 4. Run with your tunnel name
python3 web_upload.py --tunnel-name chat2pdf
# Your URL will be: https://chat2pdf.yourdomain.com

# Option 2: ngrok (Alternative - Shows warning page on first visit)
pip install pyngrok
python3 web_upload.py --ngrok

Note on Persistent URLs:

  • Named tunnels require a domain managed by Cloudflare (free account OK)
  • If you don't have a domain, use the quick tunnel (--https) - URL changes but it's instant
  • The helper script setup_tunnel.sh will guide you through the setup

Setup (one time only):

  1. Open the HTTPS URL shown on your phone (e.g., https://xxxx.trycloudflare.com or https://xxxx.ngrok-free.app)
  2. Tap "Install app" or "Add to Home Screen"
  3. The "Chat2PDF" app will be installed on your phone

Every time you export a chat:

  1. WhatsApp → Open chat → ⋮ → More → Export chat
  2. Choose "Include Media"
  3. Tap "Share" button
  4. Select "Chat2PDF" from the share menu! 📱
  5. File uploads automatically → Download PDF when ready!

Requirements:

  • ✅ Android + Chrome/Edge (iOS Safari doesn't support the Share Target API)
  • ✅ Cloudflared installed (recommended) OR a free ngrok account: sign up → get an auth token → run ngrok config add-authtoken YOUR_TOKEN

Features:

  • 📱 Mobile-friendly interface
  • 🎨 Drag & drop upload
  • 🌍 Language selection
  • ⚡ Auto processing
  • 📥 Direct PDF download
  • 🔗 PWA Share Target (share directly from WhatsApp!)
  • 🚀 Cloudflared: No warning pages, instant access

Method 2: Cloud Storage

  1. Export chat on WhatsApp → Choose "Include Media"
  2. Save to Google Drive, iCloud, Dropbox, etc.
  3. Download on your computer from the cloud service

Method 3: Email

  1. Export chat on WhatsApp
  2. Choose "Email" as share method
  3. Open email on computer and download attachment
  4. ⚠️ Limit: email attachments are usually capped at about 25 MB

Method 4: USB Cable

  1. Export chat on WhatsApp → Save to phone storage
  2. Connect phone to computer via USB cable
  3. Copy ZIP file from phone's WhatsApp folder:
    • Android: /Internal Storage/WhatsApp/
    • iOS: Use iTunes File Sharing or Finder

Method 5: Messaging Apps

  1. Export chat on WhatsApp
  2. Share via Telegram (send to "Saved Messages"), Signal, etc.
  3. Download from desktop app

Method 6: Local Network Transfer

Use apps like:

  • SendAnywhere (no account needed)
  • LocalSend (open source, no internet needed)
  • Snapdrop (web-based, same network)

💡 Tip: For large chats with media, cloud storage or a local network transfer is usually fastest!

📋 Requirements

  • Python 3.8+
  • ffmpeg (for audio conversion)
  • ~500 MB disk space (for Whisper model)

Installing ffmpeg

# Linux (Ubuntu/Debian)
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from: https://ffmpeg.org/download.html
# Or use: choco install ffmpeg

📖 Usage

Single File

# Basic usage
python3 main.py chat.zip

# Specify output filename
python3 main.py chat.zip -o output.pdf

# Specify language for transcription
python3 main.py chat.zip -l en

# Using the wrapper script (easier)
./convert.sh chat.zip
./convert.sh chat.zip -o output.pdf -l en

Tip: The convert.sh wrapper script automatically:

  • Creates virtual environment if needed
  • Installs dependencies
  • Activates the environment
  • Runs the transcriber

Batch Mode

Process multiple chat files at once:

# Process all .zip files in current directory
python3 main.py --batch

# With language specification
python3 main.py --batch -l en

# Skip files that already have PDF output
python3 main.py --batch --skip-existing

# Custom file pattern
python3 main.py --batch --pattern "WhatsApp*.zip"

Command Line Options

Single file mode:
  python3 main.py <zip_file> [-o output.pdf] [-l language]

Batch mode:
  python3 main.py --batch [-l language] [--pattern "*.zip"] [--skip-existing]

Options:
  -o, --output      Output PDF filename
  -l, --language    Language for transcription (e.g., en, es, it, fr)
  --batch           Process all .zip files in current directory
  --pattern         File pattern for batch mode (default: *.zip)
  --skip-existing   Skip files that already have PDF output
  --help            Show help message
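The option list above maps naturally onto Python's argparse module. The following is a minimal sketch under that assumption; the actual parsing code in main.py is not shown in this README, and the function names build_parser and parse_args are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Convert a WhatsApp chat export to PDF.",
    )
    # Optional positional: needed in single-file mode, unused with --batch.
    parser.add_argument("zip_file", nargs="?", help="WhatsApp export (.zip)")
    parser.add_argument("-o", "--output", help="Output PDF filename")
    parser.add_argument("-l", "--language", help="Language for transcription")
    parser.add_argument("--batch", action="store_true",
                        help="Process all .zip files in current directory")
    parser.add_argument("--pattern", default="*.zip",
                        help="File pattern for batch mode")
    parser.add_argument("--skip-existing", action="store_true",
                        help="Skip files that already have PDF output")
    return parser


def parse_args(argv):
    """Parse argv and enforce the two documented modes."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Either --batch is given or a single ZIP file must be named.
    if not args.batch and args.zip_file is None:
        parser.error("a zip file is required unless --batch is given")
    return args
```

argparse adds --help on its own, so it does not need a separate add_argument call.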

⚙️ Configuration

Customize everything via config.ini:

# Copy example configuration
cp config.example.ini config.ini

# Edit as needed
nano config.ini

Key Configuration Sections

PDF Settings

[PDF]
page_size = A4              # or letter
left_margin = 0.5           # inches
right_margin = 0.5
max_image_width = 3.0       # inches
max_image_height = 2.5

Whisper AI Settings

[WHISPER]
model = small               # tiny, base, small, medium, large
language = en               # Leave empty for auto-detect

Layout Customization

[LAYOUT]
title_font_size = 20
sender_font_size = 10
message_font_size = 9
message_alignment = JUSTIFY  # LEFT, CENTER, RIGHT, JUSTIFY

Colors

[COLORS]
title_color = 075E54        # Hex color without #
sender_color = 25D366       # WhatsApp green
media_color = 0084FF
system_color = 808080
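The sample settings carry trailing "# ..." notes and store colors as hex strings without a leading #. If you read config.ini from your own Python tooling, the sketch below shows one way to handle both with the standard configparser module. It is an illustration, not the project's own loader; load_config and hex_to_rgb are hypothetical names.

```python
import configparser

SAMPLE = """
[COLORS]
title_color = 075E54        # Hex color without #
sender_color = 25D366       # WhatsApp green
"""


def load_config(text: str) -> configparser.ConfigParser:
    # inline_comment_prefixes makes configparser drop the trailing "# ..." notes;
    # by default they would be kept as part of the value.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def hex_to_rgb(value: str) -> tuple:
    """Convert 'RRGGBB' (no leading '#') to an (r, g, b) tuple of ints."""
    value = value.strip().lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


config = load_config(SAMPLE)
title_rgb = hex_to_rgb(config["COLORS"]["title_color"])
```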

Privacy

[PRIVACY]
exclude_images = false      # Set to true to exclude images from PDF

HTML Templates

[HTML_TEMPLATE]
enabled = true                    # HTML templates enabled by default
template_file = templates/template.html     # WhatsApp-style layout
show_stats = true                 # Show message/media statistics

Available templates:

  • templates/template.html - Full WhatsApp-style layout (default)
  • templates/template_minimal.html - Minimal clean layout
  • templates/template_simple.html - Simple text-based layout

Language Translation

[LANGUAGE]
code = en                         # en, es, fr, de, it, pt

The program loads language-specific strings from languages/XX.ini files. These control:

  • Pattern matching in exported WhatsApp files
    • "file attached" (English)
    • "archivo adjunto" (Spanish)
    • "fichier joint" (French)
  • PDF labels: "Audio:", "IMAGE", "VIDEO", "DOCUMENT"
  • System messages: "excluded for privacy", "Transcription failed"

Note: All language-dependent strings now live in the languages/ folder; the config file no longer contains them.

See languages/README.md for how to add new languages.

💾 Transcription Cache

Audio transcriptions are automatically cached to save time:

# First time: ~10 minutes
python3 main.py chat.zip

# Regeneration: ~3 seconds ⚡
python3 main.py chat.zip -o chat_v2.pdf

Time savings: up to 98%!

The cache is stored in the .transcription_cache/ directory, which is created automatically when needed.
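To make the idea concrete, here is a minimal sketch of a file-based transcription cache. The README only states that the cache is automatic and file-based, so the key scheme (SHA-256 of the audio bytes plus model and language), the JSON entry format, and the names cache_key and cached_transcribe are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".transcription_cache")  # directory name taken from this README


def cache_key(audio_path: Path, model: str, language: str) -> str:
    """Hash the audio bytes plus settings, so changed settings miss the cache."""
    digest = hashlib.sha256()
    digest.update(audio_path.read_bytes())
    digest.update(f"|{model}|{language}".encode("utf-8"))
    return digest.hexdigest()


def cached_transcribe(audio_path, model, language, transcribe, cache_dir=CACHE_DIR):
    """Return a cached transcript, calling transcribe(audio_path) only on a miss."""
    audio_path = Path(audio_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)  # created on first use
    entry = cache_dir / f"{cache_key(audio_path, model, language)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))["text"]
    text = transcribe(audio_path)
    entry.write_text(json.dumps({"text": text}), encoding="utf-8")
    return text
```

Keying on file content rather than file name means a regenerated PDF reuses transcripts even if the export ZIP is renamed.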

🌍 Language Support

Whisper Transcription Languages

Whisper AI supports 100+ languages for audio transcription:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • ru - Russian
  • ja - Japanese
  • zh - Chinese
  • ar - Arabic
  • hi - Hindi
  • ko - Korean
  • And many more...

Tip: Specifying the language with -l is faster than auto-detect (up to 50% faster).
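For reference, this is roughly how an explicit language can be passed to Whisper when calling it from Python. It is a sketch that assumes the openai-whisper package is installed; it is not taken from main.py, and transcribe_options and transcribe are hypothetical helper names.

```python
def transcribe_options(language=None):
    """Build keyword arguments for Whisper's transcribe() call.

    An empty or missing language means auto-detect, so no language is passed.
    """
    options = {}
    if language:
        options["language"] = language
    return options


def transcribe(audio_path, model_name="small", language=None):
    # Imported lazily so this module loads even where openai-whisper is absent.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path), **transcribe_options(language))
    return result["text"].strip()
```

Calling transcribe("PTT-0001.opus", language="en") would skip Whisper's language-detection step for that file; the file name here is only an example.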

Interface Translations

Built-in translations for the PDF interface (labels, patterns, messages):

| Language | Code | File | Status |
|---|---|---|---|
| 🇬🇧 English | en | languages/en.ini | ✅ Default |
| 🇪🇸 Spanish | es | languages/es.ini | ✅ Complete |
| 🇫🇷 French | fr | languages/fr.ini | ✅ Complete |
| 🇩🇪 German | de | languages/de.ini | ✅ Complete |
| 🇮🇹 Italian | it | languages/it.ini | ✅ Complete |
| 🇵🇹 Portuguese | pt | languages/pt.ini | ✅ Complete |

These files control:

  • WhatsApp export patterns ("file attached" vs "archivo adjunto")
  • PDF labels ("Audio:", "IMAGE", "VIDEO", etc.)
  • System messages

To use a different language:

# Set in config.ini
[LANGUAGE]
code = es

# Or use command line
python3 main.py chat.zip -l es

See languages/README.md to add new translations.

📂 Project Structure

WhatsappTranscriber/
├── main.py                 # Main script
├── convert.sh              # Wrapper script
├── check_setup.sh          # Environment verification script
├── config.example.ini      # Example configuration
├── requirements.txt        # Python dependencies
├── templates/              # HTML templates
│   ├── template.html           # Full WhatsApp-style layout (default)
│   ├── template_minimal.html   # Minimal clean layout
│   └── template_simple.html    # Simple text-based layout
├── languages/              # Language files
│   ├── README.md
│   ├── en.ini              # English (default)
│   ├── es.ini              # Spanish
│   ├── fr.ini              # French
│   ├── de.ini              # German
│   ├── it.ini              # Italian
│   └── pt.ini              # Portuguese
├── LICENSE                 # MIT License
└── README.md               # This file

🛠️ Helper Scripts

convert.sh

Convenience wrapper that handles environment setup automatically:

./convert.sh chat.zip              # Single file
./convert.sh --batch               # All .zip files  
./convert.sh chat.zip -l en        # With language
./convert.sh --help                # Show help

check_setup.sh

Verifies your environment is correctly configured:

./check_setup.sh

This checks:

  • ✅ Python 3 installation
  • ✅ Virtual environment
  • ✅ Required dependencies (ReportLab, Pillow, PyDub, Whisper)
  • ✅ FFmpeg availability
  • ✅ Project files integrity

📥 Exporting Chats from WhatsApp

Android

  1. Open WhatsApp
  2. Open the chat you want to export
  3. Tap ⋮ (menu) → More → Export chat
  4. Choose "Include Media"
  5. Save the .zip file

iPhone

  1. Open WhatsApp
  2. Open the chat you want to export
  3. Tap the contact/group name at the top
  4. Scroll down and tap Export Chat
  5. Choose "Attach Media"
  6. Save the .zip file

The exported .zip file contains:

chat.zip
├── _chat.txt              # Message history
├── IMG-*.jpg              # Images
├── PTT-*.opus             # Audio messages
├── VID-*.mp4              # Videos
└── *.pdf                  # Documents
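To inspect an export before converting it, the archive can be inventoried with Python's standard zipfile module. The sketch below groups members by extension, using the audio and image extensions listed under Technical Details in this README; classify and inventory are hypothetical helpers, not part of the project.

```python
import zipfile
from pathlib import PurePosixPath

# Extension lists taken from the Technical Details section of this README.
AUDIO_EXTS = {".opus", ".m4a", ".mp3", ".wav", ".aac"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def classify(name: str) -> str:
    """Return 'chat', 'audio', 'image', or 'other' for an archive member name."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix == ".txt":
        return "chat"
    if suffix in AUDIO_EXTS:
        return "audio"
    if suffix in IMAGE_EXTS:
        return "image"
    return "other"


def inventory(zip_path) -> dict:
    """Group the file names inside an export ZIP by kind."""
    groups = {"chat": [], "audio": [], "image": [], "other": []}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                groups[classify(info.filename)].append(info.filename)
    return groups
```

Videos and documents fall into "other" here, matching the README's note that they are referenced rather than embedded.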

📄 Output

The generated PDF includes:

  • ✅ Title and metadata
  • ✅ Formatted messages (sender, date, time)
  • ✅ Audio transcriptions embedded inline
  • ✅ Images embedded (optional)
  • ✅ Links to documents/videos
  • ✅ System messages (group changes, etc.)

Example output: chat_transcript.pdf

💡 Examples

Example 1: Single Chat

python3 main.py "WhatsApp Chat with John.zip" -l en

Output: WhatsApp_Chat_with_John_transcript.pdf

Example 2: Multiple Chats

# Process all WhatsApp exports in folder
python3 main.py --batch -l en

Example 3: Regenerate After Layout Changes

# Modify config.ini (colors, margins, etc.)

# Regenerate all PDFs (uses cached audio transcriptions)
python3 main.py --batch

# Fast! ⚡ (seconds instead of minutes)

Example 4: Only New Chats

# Process only files without existing PDF output
python3 main.py --batch --skip-existing
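The selection logic behind --pattern and --skip-existing can be pictured as follows. This is a sketch only: the output naming rule (spaces replaced by underscores, "_transcript.pdf" appended) is inferred from Example 1 and may not match main.py exactly, and expected_pdf and pending_exports are hypothetical names.

```python
from pathlib import Path


def expected_pdf(zip_path: Path) -> Path:
    """Guess the output name: spaces become underscores, '_transcript.pdf' is appended.

    Inferred from Example 1 only; the real naming rule may differ.
    """
    stem = zip_path.stem.replace(" ", "_")
    return zip_path.with_name(f"{stem}_transcript.pdf")


def pending_exports(folder, pattern="*.zip", skip_existing=False):
    """List ZIP files matching pattern, optionally dropping those with a PDF."""
    zips = sorted(Path(folder).glob(pattern))
    if skip_existing:
        zips = [z for z in zips if not expected_pdf(z).exists()]
    return zips
```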

🔧 Advanced Customization

HTML Templates

The project uses HTML templates for PDF generation. You can customize the layout by editing or creating your own template:

Template Variables:

  • {{chat_title}} - Chat name
  • {{generation_date}} - PDF generation date
  • {{total_messages}} - Message count
  • {{total_media}} - Media files count
  • {{total_transcriptions}} - Transcribed audio count

Message Loop:

{{#each messages}}
  <div class="message {{this.message_class}}">
    <strong>{{this.sender}}</strong>
    <span>{{this.time}}</span>
    <p>{{this.text}}</p>
    {{#if this.transcription}}
      <em>🎙️ {{this.transcription}}</em>
    {{/if}}
    {{#if this.media}}
      <!-- Media handling -->
    {{/if}}
  </div>
{{/each}}

Conditionals and loops:

  • {{#if condition}}...{{/if}} - Show if true
  • {{#if condition}}...{{else}}...{{/if}} - If-else
  • {{#each array}}...{{/each}} - Loop through array
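When testing a custom template outside the tool, the top-level variables can be filled in with a few lines of Python. The sketch below handles only plain {{name}} placeholders such as {{chat_title}}; it deliberately leaves {{#if}}, {{#each}}, and {{this.x}} tags untouched, and it is not the project's template engine. render_variables is a hypothetical helper.

```python
import html
import re

# Matches {{name}} but not block tags like {{#if x}}, {{/each}}, or {{this.sender}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_variables(template: str, values: dict) -> str:
    """Replace {{name}} placeholders; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)
        # Escape so chat text cannot inject markup into the page.
        return html.escape(str(values[name]))

    return PLACEHOLDER.sub(substitute, template)
```

For example, render_variables("<h1>{{chat_title}}</h1>", {"chat_title": "Family"}) yields "<h1>Family</h1>".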

Available templates:

  1. templates/template.html - WhatsApp-style with green bubbles and statistics
  2. templates/template_minimal.html - Clean minimal design
  3. templates/template_simple.html - Simple text-based layout

To use a different template, edit config.ini:

[HTML_TEMPLATE]
enabled = true
template_file = templates/template_minimal.html

Language Translation Files

Interface strings are stored in languages/XX.ini files. Create new translations by copying an existing file:

cp languages/en.ini languages/ja.ini

Then edit the strings:

[PATTERNS]
# Must match WhatsApp export format in your language
attached_file = 添付ファイル

[LABELS]
# Labels shown in PDF
audio = オーディオ:
image = 画像
video = ビデオ

[MESSAGES]
# System messages
image_excluded = プライバシーのため除外
transcription_failed = 転写に失敗しました

See languages/README.md for detailed instructions.
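A quick way to sanity-check a new translation file is to load it with configparser and try the attachment pattern against a sample line. The sketch below assumes the section and key names shown above ([PATTERNS] attached_file, [LABELS], [MESSAGES]); the sample chat line and the helpers load_language and is_attachment_line are hypothetical.

```python
import configparser

SAMPLE_LANGUAGE_FILE = """
[PATTERNS]
attached_file = file attached

[LABELS]
audio = Audio:
image = IMAGE

[MESSAGES]
transcription_failed = Transcription failed
"""


def load_language(text: str) -> dict:
    """Flatten the sections of a language file into nested dictionaries."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def is_attachment_line(line: str, strings: dict) -> bool:
    """True when an exported chat line contains the 'attached file' marker."""
    return strings["PATTERNS"]["attached_file"] in line


strings = load_language(SAMPLE_LANGUAGE_FILE)
```

If is_attachment_line returns False for lines copied from a real export in your language, the attached_file string probably does not match WhatsApp's wording.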

🆘 Troubleshooting

Error: "No module named reportlab"

# Make sure virtual environment is activated
source venv/bin/activate
pip install -r requirements.txt

Error: "ffmpeg not found"

Install ffmpeg (see Requirements section above).

Cache Not Working

Cache is stored in .transcription_cache/:

# Verify cache directory exists
ls -la .transcription_cache/

# Clear cache to force re-transcription
rm -rf .transcription_cache/

Batch Mode: "No files found"

# Check for .zip files
ls *.zip

# Use specific pattern
python3 main.py --batch --pattern "WhatsApp*.zip"

Poor Transcription Quality

  • Specify the language explicitly: -l en
  • Use a better model in config.ini: model = medium or model = large
  • Check audio quality in original files

⚡ Performance

| Operation | First Time | With Cache |
|---|---|---|
| 1 chat (10 audio) | ~10 min | ~3 sec ⚡ |
| 10 chats (100 audio) | ~100 min | ~30 sec ⚡ |
| PDF regeneration | ~10 min | ~3 sec ⚡ |

Cache provides up to 98% time savings!

📝 Technical Details

  • AI Model: Whisper by OpenAI
  • Default Model: small (466 MB)
  • First Run: Model download (~2-3 min)
  • Accuracy: 85-95% (depends on audio quality)
  • Supported Audio: opus, m4a, mp3, wav, aac
  • Supported Images: jpg, jpeg, png, gif, webp
  • Cache: Automatic, file-based

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

See CONTRIBUTING.md for detailed guidelines on:

  • Reporting bugs
  • Suggesting enhancements
  • Adding language translations
  • Creating templates
  • Code contributions

Ways to contribute:

  • 🐛 Report bugs
  • 💡 Suggest features
  • 🌍 Add language translations
  • 🎨 Create new templates
  • 📝 Improve documentation
  • ⭐ Star this repository

📄 License

This project is released under the MIT License. See the LICENSE file for details.

It uses the following open-source libraries:

  • Whisper by OpenAI (MIT License)
  • ReportLab (BSD License)
  • WeasyPrint (BSD License)

🙏 Acknowledgments

  • OpenAI for the amazing Whisper model
  • The ReportLab team for PDF generation
  • The WeasyPrint team for HTML to PDF conversion
  • Cloudflare for cloudflared tunneling

Made with ❤️ for preserving your conversations
