A comprehensive suite of scripts and utilities to enhance, extract, and package EPUB files for personal ebook libraries, especially web novel rips or incomplete books.
- Features
- Prerequisites
- Installation
- Scripts
- Usage
- Workflow Examples
- Configuration
- Troubleshooting
- Contributing
- License
- 🖼️ Inject Cover: Replace or add a high-quality cover image with proper EPUB metadata
- 📝 Fix Titles: Auto-generate missing
<h1>
titles from the EPUB navigation files (EPUB 2/3) - 📄 Add Title Page: Create a custom styled title page as the book's opening
- 📋 Extract Chapters: Dump chapter titles from an EPUB's TOC to a text file
- 🔧 DOCX Heading Fixer: Fix misformatted or missing headings inside DOCX files
- 📖 Build EPUB: Stitch Markdown-converted chapters into a single EPUB via Bash
- ✅ Proper EPUB Packaging: Maintains EPUB standards (e.g., mimetype first uncompressed)
- ⚡ Lightweight and Fast: Uses lxml, BeautifulSoup, and pandoc for quick processing
- Python: 3.6 or higher
- Bash: For shell scripts (Linux/macOS/WSL)
- Operating System: Linux, macOS, Windows (with WSL), or Android (with Termux)
- pandoc: Required for
make_epub.sh
script - zip/unzip: Standard archive utilities (usually pre-installed)
git clone https://github.com/yourusername/epub-tools-collection.git
cd epub-tools-collection
pip install -r requirements.txt
Or install manually:
pip install lxml beautifulsoup4 ebooklib
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install pandoc
macOS (with Homebrew):
brew install pandoc
Windows: Download from pandoc.org
Android/Termux:
pkg install python pandoc
pip install lxml beautifulsoup4 ebooklib
chmod +x make_epub.sh
python --version
pandoc --version
Function: Adds or overwrites the cover image in an EPUB file while maintaining proper EPUB structure and metadata.
Command:
python inject_cover.py input.epub output.epub cover.jpg
Features:
- Supports JPG, PNG, and GIF formats
- Automatically updates OPF manifest and metadata
- Preserves existing EPUB structure
- Handles both EPUB 2 and EPUB 3 formats
Example:
python inject_cover.py "Harry Potter.epub" "Harry Potter with Cover.epub" cover.jpg
Function: Inserts missing <h1>
titles into each chapter HTML/XHTML file, using the EPUB TOC (NCX or NAV).
Command:
python fix_epub_titles.py input.epub output.epub
Features:
- Automatically detects EPUB 2 (NCX) or EPUB 3 (NAV) format
- Preserves existing formatting and styles
- Handles nested chapter structures
- Skips chapters that already have titles
Example:
python fix_epub_titles.py "raw_novel.epub" "novel_with_titles.epub"
Function: Generates and inserts a styled title page at the start of an EPUB.
Command:
python add_title.py input.epub output.epub
Features:
- Creates professional-looking title page
- Extracts title and author from EPUB metadata
- Customizable CSS styling
- Inserts at the beginning of reading order
Example:
python add_title.py "novel.epub" "novel_with_title_page.epub"
Function: Reads an EPUB's TOC and writes all chapter titles to chapters.txt
.
Command:
python extract_chapters.py input.epub
Output: Creates chapters.txt
in the current directory with all chapter titles.
Features:
- Extracts from both NCX and NAV files
- Preserves chapter hierarchy
- Handles special characters and encoding
Example:
python extract_chapters.py "my_novel.epub"
# Creates chapters.txt with all chapter titles
Function: Automatically fixes misformatted DOCX chapter headings. Useful for batch-fixing heading levels before EPUB creation.
Command:
python "heading fixer docx.py" input_folder/
Features:
- Processes all DOCX files in the specified folder
- Fixes heading levels and styles
- Ensures consistent formatting
- Creates backup of original files
Notes:
- Processes all DOCX files in the provided folder
- Ensures consistent heading styles for proper EPUB conversion
- Backup files are created with
.bak
extension
Example:
python "heading fixer docx.py" ./chapters/
Function: Converts a series of DOCX chapter files into Markdown, cleans headings, and assembles them into a single EPUB with metadata and TOC via pandoc.
Prerequisites:
- Name your chapters
ch01.docx
throughch21.docx
(or adjust the seq range in script) - Update the script's variables:
TITLE
: Book titleAUTHOR
: Author namePUBLISHER
: Publisher nameOUT
: Output filename
Usage:
# 1. Make executable
chmod +x make_epub.sh
# 2. Edit variables in script
vim make_epub.sh
# 3. Run
./make_epub.sh
Features:
- Converts DOCX to Markdown automatically
- Generates proper EPUB metadata
- Creates table of contents
- Handles multiple chapters seamlessly
Example structure:
project_folder/
├── ch01.docx
├── ch02.docx
├── ...
├── ch21.docx
└── make_epub.sh
# Basic workflow for enhancing an existing EPUB
python inject_cover.py raw.epub temp1.epub my_cover.jpg
python fix_epub_titles.py temp1.epub temp2.epub
python add_title.py temp2.epub final.epub
# Complete enhancement pipeline
python inject_cover.py raw.epub temp1.epub my_cover.jpg
python fix_epub_titles.py temp1.epub temp2.epub
python add_title.py temp2.epub temp3.epub
python extract_chapters.py temp3.epub
python "heading fixer docx.py" chapters_folder/
./make_epub.sh
Extract chapter names for reference:
python extract_chapters.py "my_book.epub"
cat chapters.txt
Fix DOCX files before conversion:
python "heading fixer docx.py" ./raw_chapters/
Create EPUB from DOCX files:
# Edit make_epub.sh variables first
./make_epub.sh
graph TD;
A[Raw EPUB] --> B[inject_cover.py]
B --> C[fix_epub_titles.py]
C --> D[add_title.py]
D --> E[extract_chapters.py]
E --> F[Enhanced EPUB]
G[DOCX Files] --> H[heading fixer docx.py]
H --> I[make_epub.sh]
I --> J[New EPUB]
F --> K[Final Enhanced EPUB]
J --> K
# For web novel rips or incomplete EPUBs
python inject_cover.py raw_novel.epub step1.epub cover.jpg
python fix_epub_titles.py step1.epub step2.epub
python add_title.py step2.epub final_novel.epub
# For converting Word documents to EPUB
python "heading fixer docx.py" chapters/
./make_epub.sh # Creates final EPUB
# For maximum enhancement
python inject_cover.py raw.epub temp1.epub cover.jpg
python fix_epub_titles.py temp1.epub temp2.epub
python add_title.py temp2.epub enhanced.epub
python extract_chapters.py enhanced.epub # For reference
Edit the following variables in make_epub.sh
:
# Book metadata
TITLE="Your Book Title"
AUTHOR="Author Name"
PUBLISHER="Publisher Name"
OUT="Final Book Title.epub"
# Chapter range (adjust as needed)
# Current: ch01.docx through ch21.docx
seq -f "ch%02g.docx" 1 21
Edit the HTML template in add_title.py
to customize the title page appearance:
# Look for the title_page_content variable
title_page_content = """
<!DOCTYPE html>
<html>
<head>
<title>Title Page</title>
<style>
/* Customize CSS here */
</style>
</head>
<body>
<!-- Customize HTML here -->
</body>
</html>
"""
Cause: Improper EPUB structure or missing mimetype file. Solution:
- Ensure mimetype is uncompressed and first in archive
- Use the provided rebuild logic in scripts
- Verify with
unzip -l your_file.epub
to check structure
Cause: Reader cache or incorrect metadata. Solution:
- Clear reader cache/restart reader app
- Confirm cover.jpg path and
properties="cover-image"
in OPF - Verify image format is supported (JPG, PNG, GIF)
Cause: Invalid or missing navigation files. Solution:
- Verify EPUB has a valid NCX (EPUB2) or NAV (EPUB3) file
- Check if TOC structure is properly formatted
- Use
python extract_chapters.py
to verify TOC content
Cause: Missing pandoc or incorrect file structure. Solution:
- Install pandoc:
sudo apt-get install pandoc
- Verify DOCX files are named correctly (ch01.docx, ch02.docx, etc.)
- Check that all DOCX files are in the same directory as the script
Cause: Scripts not executable or insufficient permissions. Solution:
chmod +x make_epub.sh
chmod +x *.py
Cause: Missing dependencies. Solution:
pip install lxml beautifulsoup4 ebooklib
Cause: Corrupted DOCX files or unsupported format. Solution:
- Verify DOCX files open correctly in Word/LibreOffice
- Re-save problematic files in standard DOCX format
- Check file permissions
epub-tools-collection/
├── README.md
├── requirements.txt
├── LICENSE
├── inject_cover.py
├── fix_epub_titles.py
├── add_title.py
├── extract_chapters.py
├── heading fixer docx.py
├── make_epub.sh
└── examples/
├── sample_cover.jpg
└── sample_chapters/
├── ch01.docx
└── ch02.docx
lxml>=4.6.0
beautifulsoup4>=4.9.0
ebooklib>=0.18
python-docx>=0.8.11
We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch (
git checkout -b feature/new-feature
) - Commit your changes (
git commit -am 'Add new feature'
) - Push to the branch (
git push origin feature/new-feature
) - Create a Pull Request
git clone https://github.com/yourusername/epub-tools-collection.git
cd epub-tools-collection
pip install -r requirements.txt
- Always backup your EPUB files before modifications
- Scripts assume standard EPUB structure; heavily corrupted files may fail
- You can edit the title page template in
add_title.py
to customize styling - For large batches, consider using shell loops:
for file in *.epub; do python inject_cover.py "$file" "enhanced_$file" cover.jpg done
MIT License — freely modify and share.
MIT License
Copyright (c) 2024 EPUB Tools Collection
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
If you encounter issues or have questions:
- Check the Troubleshooting section
- Search existing Issues
- Create a new issue with:
- Your operating system
- Python version
- Error messages
- Steps to reproduce
Happy EPUB editing! 📚✨