
A metadata-based and LLM-driven ebook collection renaming toolkit. It offers both traditional metadata extraction (e.g., title, author, ISBN) for file naming and an LLM-driven approach that feeds book content to the model for analysis when metadata is poor or missing altogether. Locally hosted LLMs can be used via an OpenAI-compatible API.


ngpepin/LLM-rename-ebooks


Rename Ebooks

Overview

This project builds on ebook-tools and provides two distinct, stand-alone approaches for renaming and organizing ebooks:

  1. Metadata-Based Renaming (default): Uses traditional metadata extraction from ebook files (such as title, author, ISBN) via ebook-tools. This method works well for well-formatted files with embedded or inferable metadata.

  2. LLM-Based Renaming: Uses a Large Language Model (LLM) to analyze ebook content and generate context-aware filenames. It is ideal for ebooks with poor or missing metadata.

Each method has its own script and configuration, so either can be used independently depending on the nature of your ebook collection.
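The metadata-based method relies in part on ISBNs found in the book. How ebook-tools validates ISBN candidates is not reproduced here; as a minimal sketch, assuming Bash 4 or later, the standard ISBN-10 and ISBN-13 check-digit rules that separate real ISBNs from arbitrary digit runs look like this. isbn_is_valid is a hypothetical helper name, not a function from this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): succeeds if the argument is an
# ISBN-10 or ISBN-13 whose check digit is correct. Requires Bash 4+ for ${var^^}.
isbn_is_valid() {
  local isbn="${1//[- ]/}"   # drop hyphens and spaces
  isbn="${isbn^^}"           # accept a lower-case x as the ISBN-10 check digit
  local sum=0 i d
  if [[ $isbn =~ ^[0-9]{13}$ ]]; then
    # ISBN-13: weights alternate 1, 3, 1, 3, ...; the total must be divisible by 10.
    for (( i = 0; i < 13; i++ )); do
      d=${isbn:i:1}
      sum=$(( sum + (i % 2 == 0 ? d : 3 * d) ))
    done
    (( sum % 10 == 0 ))
  elif [[ $isbn =~ ^[0-9]{9}[0-9X]$ ]]; then
    # ISBN-10: weights run from 10 down to 1 (X stands for 10); the total must be divisible by 11.
    for (( i = 0; i < 10; i++ )); do
      d=${isbn:i:1}
      [[ $d == X ]] && d=10
      sum=$(( sum + (10 - i) * d ))
    done
    (( sum % 11 == 0 ))
  else
    return 1
  fi
}
```

For example, `isbn_is_valid 978-0-306-40615-7 && echo valid` prints `valid`, while changing the last digit to 8 makes the check fail.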

This project extends ebook-tools by encapsulating it and allowing complex ebook-tools options to be captured in a JSON configuration file for simpler CLI invocation. It also includes an additional script that renames, organizes, and corrects ebook files affected by apparent ebook-tools bug(s). It uses the updated/forked Docker image didc/ebook-tools:latest and was tested on Ubuntu.

The scripts include:

  • rename-ebooks.sh: Handles renaming and metadata extraction using organize-ebooks.sh and fix-matches.sh.
  • fix-matches.sh: Fixes issues where ebooks are placed in incorrect directories.
  • organize-ebooks.sh: Organizes ebooks based on metadata. (A lightly modified version of the original script from ebook-tools.)
  • lib.sh: Contains utility functions used by other scripts. (A lightly modified version of the original script from ebook-tools.)

The project runs ebook-tools in Dockerized form to process and rename books. The lightly modified ebook-tools scripts organize-ebooks.sh and lib.sh are provided in this repo and must be in the same directory as rename-ebooks.sh, because they are bind-mounted into the Docker container.

The provided Dockerfile creates directories that are bind-mounted to the host filesystem to receive output: successfully processed, corrupt, "pamphlet" (short non-book documents), "uncertain", and failed ebook files. Although the container directory names appear in config.json, any change to them must also be made manually in the Dockerfile.

Features

  • Renames ebooks using metadata extracted from various sources.
  • Corrects issues where files are placed in unnecessary subdirectories.
  • Determines file types based on content rather than just extensions.
  • Uses a configurable JSON file for flexible behavior.
  • Supports PDF, EPUB, MOBI, and TXT file formats.
  • Uses Docker to simplify dependency management.

Installation

  1. Clone the repository:
    git clone https://github.com/ngpepin/rename-ebooks.git
    cd rename-ebooks
  2. Ensure dependencies are installed:
    sudo apt install jq docker.io unzip poppler-utils calibre
  3. Pull the required Docker image (if not already available):
    docker pull didc/ebook-tools:latest

Usage

1. Metadata-Based Renaming

This is the default and well-established method based on extracting embedded ebook metadata.

./rename-ebooks.sh [OPTIONS] -i /path/to/input -o /path/to/output

Options:

  • -c, --config <file>: Use a custom JSON config file.
  • -i, --input <dir>: Input directory.
  • -o, --output <dir>: Output directory.
  • -f, --fresh: Redownload the Docker image.
  • -d, --debug: Enable debug mode.
  • -h, --help: Show help.

2. LLM-Based Renaming

A newer approach that uses an LLM to understand ebook content and generate meaningful filenames.

./rename-using-llm.sh -i /path/to/input -o /path/to/output -c rename-using-llm.conf

Configuration File (rename-using-llm.conf):

  • PROJ_DIR: Path to the project.
  • API_ENDPOINT: URL to the LLM chat completion endpoint (e.g., http://localhost:4141/v1/chat/completions).

This method is particularly useful for ebooks with ambiguous or no metadata.
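How rename-using-llm.sh builds its requests is not documented here. As a minimal sketch, assuming rename-using-llm.conf is a shell-sourceable file that defines API_ENDPOINT and that the endpoint accepts the OpenAI chat-completions request format, a script could send text from the first pages of a PDF to the model and clean up the reply as shown below. The function names, the model name, the prompt, and the 5-page / 4000-character limits are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only. Assumes: source ./rename-using-llm.conf  (sets API_ENDPOINT)
# build_rename_request, sanitize_filename and ask_llm_for_name are hypothetical.

# Build the JSON request body with jq so that the book text is escaped safely.
build_rename_request() {
  local model="$1" excerpt="$2"
  jq -n --arg model "$model" --arg text "$excerpt" '{
    model: $model,
    temperature: 0,
    messages: [
      {role: "system", content: "Reply with only: Author - Title (Year). No other text."},
      {role: "user", content: $text}
    ]
  }'
}

# Remove characters that are unsafe in filenames and collapse whitespace.
sanitize_filename() {
  printf '%s' "$1" | tr -d '/\\:*?"<>|' | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Extract text from the first pages of a PDF (pdftotext comes from poppler-utils),
# post it to the chat-completions endpoint, and print a cleaned-up name suggestion.
ask_llm_for_name() {
  local pdf="$1" model="${2:-local-model}" excerpt reply
  : "${API_ENDPOINT:?API_ENDPOINT is not set}"
  excerpt=$(pdftotext -f 1 -l 5 "$pdf" - 2>/dev/null | head -c 4000)
  reply=$(build_rename_request "$model" "$excerpt" |
    curl -s -H "Content-Type: application/json" --data-binary @- "$API_ENDPOINT" |
    jq -r '.choices[0].message.content')
  sanitize_filename "$reply"
}
```

Building the payload with `jq --arg` rather than string concatenation keeps quotes and newlines in the book text from breaking the JSON.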

Fixing Matches

If ebooks are placed in incorrect directories (due to an ebook-tools issue), the main script runs fix-matches.sh automatically. You can also run it manually:

./fix-matches.sh [-i /path/to/input-directory -o /path/to/output-directory]

This script will:

  • Detect misplaced ebooks.
  • Rename them based on the correct structure.
  • Move files to the correct location.
  • Determine the correct file types.
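The last step, determining file types from content rather than from extensions, can be approximated with the MIME detection of the `file` utility. This is a minimal sketch under that assumption, not the logic fix-matches.sh actually uses; ext_from_content and fix_extension are hypothetical names, and the mapping covers only the four formats listed under Features.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not from fix-matches.sh): choose an extension from the
# file's content, as reported by `file`, instead of trusting its current name.
ext_from_content() {
  case "$(file --brief --mime-type -- "$1")" in
    application/pdf)                echo pdf  ;;
    application/epub+zip)           echo epub ;;
    application/x-mobipocket-ebook) echo mobi ;;
    text/plain)                     echo txt  ;;
    *)                              return 1  ;;
  esac
}

# Rename a file so that its extension matches its detected type (never overwrites).
fix_extension() {
  local f="$1" ext base
  ext=$(ext_from_content "$f") || return 1
  base="${f%.*}"
  # Only strip a suffix if the dot is in the file name, not in a parent directory.
  [[ $(basename -- "$f") == *.* ]] || base="$f"
  [[ "$f" == "$base.$ext" ]] || mv -n -- "$f" "$base.$ext"
}
```

For example, a PDF saved as `book.txt` would be renamed to `book.pdf`.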

Configuration

The project uses a config.json file to define how books are processed. Below is a complete example configuration showing the full set of keys:

{
  "docker": {
    "mounts": {
      "input": "input",
      "output": "output",
      "corrupt": "corrupt",
      "pamphlets": "pamphlets",
      "uncertain": "uncertain",
      "failed": "failed"
    },
    "dirs": {
      "input_home": "/my-input-home",
      "input": "",
      "output_home": "/my-output-home",
      "output": "",
      "corrupt": "/Corrupt",
      "pamphlets": "/Pamphlets",
      "uncertain": "/Uncertain",
      "failed": "/Failed"
    },
    "image": "didc/ebook-tools:latest",
    "dockerfile": "",
    "remove_container": true
  },
  "script_general": {
    "verbose": false,
    "keep_metadata": true,
    "corruption_check_only": false,
    "input_extensions": "^(7z|bz2|chm|arj|cab|gz|tgz|gzip|zip|rar|xz|tar|epub|docx|odt|ods|cbr|cbz|maff|iso)$",
    "output_format": ""
  },
  "isbn": {
    "metadata_fetch_order": "Goodreads,Amazon.com,Google,ISBNDB,WorldCat xISBN,OZON.ru",
    "reorder_text_to_find_isbn": "true, 400, 50",
    "organize_without_isbn": true,
    "without-isbn-sources": "Goodreads,Amazon.com,Google"
  },
  "ocr": {
    "enabled": true,
    "lang": "eng",
    "only_first_last_pages": "7,3"
  }
}
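Because jq is already a dependency, a script can read individual settings from this file with jq path expressions. The sketch below is illustrative only: cfg, read_settings, and the variable names are hypothetical, and reading "7,3" as "the first 7 and last 3 pages" is an assumption based on the key name. Note that the hyphenated key without-isbn-sources needs jq's bracket syntax.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read settings from config.json (or from $CONFIG if set).
cfg() { jq -r "$1" "${CONFIG:-config.json}"; }

read_settings() {
  IMAGE=$(cfg '.docker.image')
  REMOVE_CONTAINER=$(cfg '.docker.remove_container')   # prints true or false
  FETCH_ORDER=$(cfg '.isbn.metadata_fetch_order')
  # Hyphenated keys cannot use the plain .a.b form and need bracket syntax.
  NO_ISBN_SOURCES=$(cfg '.isbn["without-isbn-sources"]')
  # Assumed meaning of "7,3": OCR the first 7 and the last 3 pages.
  IFS=, read -r OCR_FIRST OCR_LAST <<< "$(cfg '.ocr.only_first_last_pages')"
}
```

With the configuration above, `read_settings` would set OCR_FIRST=7 and OCR_LAST=3.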

Dependencies

Ensure the following dependencies are installed:

  • jq (for parsing JSON configs)
  • docker (for running ebook-tools)
  • unzip (for checking EPUB files)
  • poppler-utils (for handling PDFs)
  • calibre (for metadata extraction)
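A quick way to confirm these tools are on your PATH is sketched below. It is a hypothetical helper, not part of the repo; pdftotext and ebook-meta serve as representative commands installed by poppler-utils and calibre respectively.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on PATH,
# and return non-zero if any are missing.
missing_deps() {
  local cmd missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if (( ${#missing[@]} )); then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}
```

For example: `missing_deps jq docker unzip pdftotext ebook-meta || echo "Some dependencies are missing"`.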

Contributing

Contributions are welcome! Please open an issue or submit a pull request with improvements.

License

This project is licensed under the MIT License. See the LICENSE file (if included) for details.
