PDF Rasterize

A Python-based tool that splits a PDF by its bookmarks, rasterizes the pages of each split file into high-resolution images, and merges the results back into a new PDF. Rasterizing flattens complex vector graphics into images, which can improve compatibility and, depending on the content and the chosen resolution, reduce file size.

The project provides both a command-line interface (CLI) and a graphical user interface (GUI).

Features

Splitting: Splits a master PDF into multiple smaller PDFs based on its bookmark hierarchy.
Rasterizing: Converts the pages of the split PDFs into high-resolution PNG images with Ghostscript, then reassembles the images into PDFs with ImageMagick.
Merging: Combines the rasterized PDFs back into a single, final PDF.
Bookmark Recreation: Preserves the original bookmark structure in the final merged PDF.
GUI & CLI: Can be run through an easy-to-use graphical interface (built with PyQt6) or as a command-line script.
Parallel Processing: Uses multiple CPU cores to speed up the rasterization process.
Flexible Configuration: Allows customization of tool paths, resolution, and other settings.
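
The splitting step can be pictured as turning bookmark start pages into page ranges. The sketch below is illustrative only and is not taken from the project's source: the function name `bookmark_page_ranges` is hypothetical, it assumes the bookmarks have already been read (for example with pypdf) into `(title, start_page)` pairs, and it ignores nested bookmarks.

```python
from typing import List, Tuple


def bookmark_page_ranges(
    bookmarks: List[Tuple[str, int]], total_pages: int
) -> List[Tuple[str, int, int]]:
    """Turn (title, start_page) bookmarks into (title, start, end) ranges.

    Pages are 0-based and `end` is exclusive. Each section runs until the
    next bookmark starts; the last section runs to the end of the document.
    """
    # Sort by start page so the input order does not matter.
    ordered = sorted(bookmarks, key=lambda item: item[1])
    ranges = []
    for index, (title, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]
        else:
            end = total_pages
        if end > start:  # skip a bookmark that shares its page with the next one
            ranges.append((title, start, end))
    return ranges


if __name__ == "__main__":
    sample = [("Intro", 0), ("Methods", 3), ("Results", 7)]
    for title, start, end in bookmark_page_ranges(sample, total_pages=10):
        print(f"{title}: pages {start} to {end - 1}")
```

Each resulting range would then be written out as its own PDF before rasterization.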

Prerequisites

Before you can run this project, you must have the following software installed on your system:

Python 3.8+: The project is written in Python.
Ghostscript: Used for converting PDF pages to PNG images.
ImageMagick: Used for combining the PNG images back into a PDF.
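
To make the two roles concrete, here is a rough sketch of the kind of commands involved. The exact options the project passes are not documented here, so the flag selection, the `page_%04d.png` naming pattern, and both function names are assumptions; the options themselves (`-sDEVICE=png16m`, `-r`, `-sOutputFile`) are standard Ghostscript options.

```python
from pathlib import Path
from typing import List


def ghostscript_png_command(
    pdf: Path, out_dir: Path, dpi: int = 300, gs: str = "gs"
) -> List[str]:
    """Build a Ghostscript command that writes one PNG per PDF page."""
    return [
        gs,
        "-dNOPAUSE", "-dBATCH", "-dSAFER",  # run non-interactively
        "-sDEVICE=png16m",                  # 24-bit colour PNG output
        f"-r{dpi}",                         # resolution in DPI
        f"-sOutputFile={out_dir / 'page_%04d.png'}",  # %04d is the page number
        str(pdf),
    ]


def magick_pdf_command(
    pngs: List[Path], out_pdf: Path, magick: str = "magick"
) -> List[str]:
    """Build an ImageMagick command that combines PNGs into a single PDF."""
    # Sorting keeps the zero-padded page images in page order.
    return [magick, *[str(p) for p in sorted(pngs)], str(out_pdf)]


if __name__ == "__main__":
    print(ghostscript_png_command(Path("chapter.pdf"), Path("pages"), dpi=300))
    print(magick_pdf_command([Path("pages/page_0001.png")], Path("chapter_rasterized.pdf")))
```

Either list can be handed to `subprocess.run(command, check=True)` once both tools are installed.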

Ensure that the executables for gs (Ghostscript) and magick (ImageMagick) are available in your system's PATH, or specify their locations in the config.json file.
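
A quick way to check this before running the tool is to ask Python where it would find each executable. This is a minimal sketch using only the standard library; the helper names `resolve_tool` and `missing_tools` are hypothetical and not part of the project.

```python
import shutil
from typing import Dict, List, Optional


def resolve_tool(configured: str) -> Optional[str]:
    """Return the full path of an executable, or None if it cannot be found.

    `configured` may be a bare name that is looked up on PATH (for example
    "gs") or an absolute path such as one stored in config.json.
    """
    return shutil.which(configured)


def missing_tools(tools: Dict[str, str]) -> List[str]:
    """Return the keys whose executables could not be located."""
    return [key for key, value in tools.items() if resolve_tool(value) is None]


if __name__ == "__main__":
    missing = missing_tools({"gs_path": "gs", "magick_path": "magick"})
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("Ghostscript and ImageMagick were both found.")
```

On Windows the Ghostscript console executable is usually named gswin64c rather than gs, so a "not found" result there may just mean the configured name needs changing.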

Installation

Clone the repository:

git clone https://github.com/kush-chou/PDF_Rasterize.git
cd PDF_Rasterize

Install Python dependencies: The project's dependencies are listed in pyproject.toml. The main dependency is pypdf. The GUI also requires PyQt6. You can install them using pip:
```
pip install pypdf PyQt6
```

Usage

The project can be used via the GUI or the CLI.

Graphical User Interface (GUI)

To run the GUI, execute the pdf_gui.py script:

python pdf_gui.py

The GUI provides two main tabs:

Split & Rasterize: Select an input PDF and an output folder. Adjust settings like DPI and start the process.
Merge PDFs: Select a directory containing previously rasterized PDFs to merge them into a final document.

Command-Line Interface (CLI)

The core logic is available through pdf_split_rasterize.py.

To Split and Rasterize:

python pdf_split_rasterize.py --input /path/to/your/document.pdf --output /path/to/output_folder --resolution 300

Key Arguments:

--input: The source PDF file.
--output: The directory where the output will be saved.
--resolution or -r: The DPI for rasterization (default: 300).
--keep-originals: Prevents the deletion of the intermediate split PDF files.
--workers or -w: Number of parallel processes to use for rasterization.
--flatten-output: Moves all generated files into a single flat directory.
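
The effect of --workers can be illustrated with a small worker pool that caps how many files are handled at once. This is only a sketch under stated assumptions: `run_in_parallel` and `fake_rasterize` are hypothetical names, and the example uses threads for simplicity, since each job would mostly wait on an external Ghostscript process. The project itself is described as using parallel processes; `concurrent.futures.ProcessPoolExecutor` offers the same interface for that.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def run_in_parallel(
    jobs: Iterable[str],
    work: Callable[[str], str],
    workers: Optional[int] = None,
) -> Dict[str, str]:
    """Run `work` on every job with at most `workers` running at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the job it belongs to.
        futures = {pool.submit(work, job): job for job in jobs}
        for future in as_completed(futures):
            # .result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Stand-in for the real rasterization step.
    def fake_rasterize(pdf_name: str) -> str:
        return pdf_name.replace(".pdf", "_rasterized.pdf")

    print(run_in_parallel(["a.pdf", "b.pdf"], fake_rasterize, workers=2))
```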

To Merge:

python pdf_split_rasterize.py --merge /path/to/output_folder

Key Arguments:

--merge: The directory containing the *_rasterized.pdf files to be merged.
--merge-output: Specify a custom path for the final merged PDF.
--no-recreate-bookmarks: Disables the automatic recreation of bookmarks from the original structure.
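
As an illustration of what the merge step has to gather, the sketch below collects the *_rasterized.pdf files from a folder. Whether the project searches subfolders and how it orders the files are assumptions made here; `find_rasterized_pdfs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_rasterized_pdfs(folder: Path) -> List[Path]:
    """Return every *_rasterized.pdf under `folder`, sorted by relative path."""
    return sorted(
        folder.rglob("*_rasterized.pdf"),
        # Sort on the path relative to the folder so the order does not
        # depend on where the folder itself lives.
        key=lambda p: p.relative_to(folder).as_posix(),
    )


if __name__ == "__main__":
    for pdf in find_rasterized_pdfs(Path(".")):
        print(pdf)
```

A plain string sort places "10_..." before "2_...", so zero-padded prefixes are the safer choice if file names are meant to carry the order.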

Configuration

You can configure the paths to the Ghostscript and ImageMagick executables by editing the config.json file:

{
    "gs_path": "gs",
    "magick_path": "magick"
}

If the executables are in your system's PATH, the default values should work. Otherwise, provide the full absolute path to gs and magick. The GUI also provides a settings window to configure these paths.
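
For scripts that need to read the same file, a loader along these lines falls back to the default values shown above when config.json is missing or incomplete. It is a minimal sketch, not the project's own loader; `load_config` and `DEFAULTS` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Dict

# Default values, matching the config.json shown above.
DEFAULTS: Dict[str, str] = {"gs_path": "gs", "magick_path": "magick"}


def load_config(path: Path) -> Dict[str, str]:
    """Read config.json and fill in defaults for any missing key."""
    config = dict(DEFAULTS)
    if path.is_file():
        with path.open(encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


if __name__ == "__main__":
    print(load_config(Path("config.json")))
```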

