PDF Watermark Remover

Removes 'RETRACTED' watermarks from academic PDF articles.

This tool provides a web interface and a command-line utility for removing watermarks from PDF files. It offers three aggressivity levels. Higher levels remove more and may change the final document more, but images and photos embedded in the PDF are always preserved.

Features

Web Interface: An easy-to-use interface to upload and clean PDFs.
Command-Line Interface: For batch processing and integration into workflows.
Multiple Aggressivity Levels: Choose the best watermark removal strategy for your needs.
Image Preservation: Images and photos within the PDF are not affected.

Aggressivity Levels

Level 1: Removes all PDF stream resources that are explicitly identified as watermarks (e.g., using /Watermark or /Background tags).
Level 2 (Default): Includes all removals from Level 1, and also removes graphical elements that appear more than once across the PDF pages, as well as all instances of the word 'RETRACTED'. Note: For some PDFs, this level might remove all text from a page.
Level 3: Includes all removals from Levels 1 and 2, and also removes all graphical elements from the PDF.
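
The "appears more than once" rule in Level 2 can be illustrated with a small, self-contained sketch. It is not the project's implementation: the function names are hypothetical, and each page is modeled as a plain list of byte strings rather than real PDF objects. It also assumes that "more than once" means the total count across all pages.

```python
import hashlib
from collections import Counter


def repeated_elements(pages):
    """Return digests of elements that occur more than once across all pages.

    `pages` is a list of pages; each page is a list of raw element bytes.
    (Hypothetical data model, used here only for illustration.)
    """
    counts = Counter(
        hashlib.sha256(element).hexdigest()
        for page in pages
        for element in page
    )
    return {digest for digest, n in counts.items() if n > 1}


def strip_repeated(pages):
    """Drop every element whose content is repeated anywhere in the document."""
    repeated = repeated_elements(pages)
    return [
        [e for e in page if hashlib.sha256(e).hexdigest() not in repeated]
        for page in pages
    ]


# Two pages share the same stamp; each has its own figure.
pages = [
    [b"stamp", b"figure-1"],
    [b"stamp", b"figure-2"],
]
cleaned = strip_repeated(pages)
```

In this sketch any repeated element is dropped, whether it is a watermark stamp or a legitimate element such as a logo that recurs on every page. That is one way a repetition-based rule can remove more than intended.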

Web Interface Quick Start

The simplest way to use the PDF Watermark Remover is through its web interface.

1. Installation

Clone the repository and install the required Python packages:

git clone https://github.com/your-username/pdf-watermark-remover.git
cd pdf-watermark-remover
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Run the Web Server

Start the Flask application:

python main.py

Open your web browser and navigate to http://127.0.0.1:5000.

3. Usage

Click the "Upload PDF File" button and select your PDF.
Click "Remove Watermarks".
The cleaned PDF will be automatically downloaded.

Command-Line Quick Start

1. Installation

Follow the same installation steps as for the web interface.

2. Usage

Run the watermark remover from the command line:

python main.py -i <PDF-input> -o <PDF-output> -m [mode of aggressivity]

<PDF-input>: Path to your input PDF.
<PDF-output>: Path for the cleaned output PDF.
[mode of aggressivity]: 1, 2, or 3 (defaults to 2).
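
For readers who want to wrap or reimplement the command line, the flags above can be sketched with argparse. This is a minimal sketch, not the parsing code in main.py: the flags -i, -o and -m and the default of 2 come from the usage line above, while the function name and the dest names (input_pdf, output_pdf, mode) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented -i / -o / -m flags."""
    parser = argparse.ArgumentParser(
        description="Remove watermarks from a PDF (illustrative sketch)."
    )
    # dest names below are hypothetical; only the short flags are documented.
    parser.add_argument("-i", dest="input_pdf", required=True,
                        help="path to the input PDF")
    parser.add_argument("-o", dest="output_pdf", required=True,
                        help="path for the cleaned output PDF")
    parser.add_argument("-m", dest="mode", type=int, choices=(1, 2, 3),
                        default=2, help="aggressivity level (defaults to 2)")
    return parser


# Omitting -m falls back to the documented default level.
args = build_parser().parse_args(["-i", "in.pdf", "-o", "out.pdf"])
```

Restricting -m with choices means an unsupported level is rejected with a usage error before any PDF is opened.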

For Developers: Frontend Setup with Tailwind CSS

The web interface is built with Flask and styled with Tailwind CSS. If you want to modify the frontend, you'll need to set up the Tailwind CSS development environment.

1. Prerequisites

Node.js and npm

2. Install Dependencies

Install the necessary npm packages:

npm install

This will install Tailwind CSS, PostCSS, and Autoprefixer, as defined in package.json.

3. Run the Tailwind CSS Build Process

To watch for changes in the CSS and automatically generate the output.css file, run the following command:

npm run build-css

This command, defined in package.json, uses tailwindcss to compile static/style.css into static/output.css. The --watch flag keeps the process running and automatically recompiles when you make changes to your HTML or CSS files.

4. How it Works

tailwind.config.js: This file configures Tailwind CSS. The content array tells Tailwind to scan all HTML and JavaScript files in the templates and static directories for class names.
postcss.config.js: This file configures PostCSS to use the Tailwind CSS and Autoprefixer plugins.
static/style.css: This is the main CSS source file. It includes the base Tailwind CSS styles.
static/output.css: This is the generated CSS file that is included in the main HTML template (templates/index.html). Do not edit this file directly, as it is overwritten every time the build-css script is run.

Project Structure

.
├── app/                  # Core application logic (if any)
├── main.py               # Main Flask application and CLI entry point
├── package.json          # Node.js dependencies and scripts for frontend
├── pdf_processing/       # Modules for PDF manipulation
│   ├── watermark_remover.py
│   └── ...
├── requirements.txt      # Python dependencies
├── static/               # Static assets (CSS, JS)
│   ├── style.css         # Source CSS file for Tailwind
│   └── output.css        # Generated CSS file
├── templates/            # HTML templates for Flask
│   └── index.html
├── tailwind.config.js    # Tailwind CSS configuration
└── postcss.config.js     # PostCSS configuration

s24hira/pdf-watermark-remover
