Removes 'RETRACTED' watermarks from academic PDF articles.
This tool provides a web interface and a command-line utility to remove watermarks from PDF files. It offers three levels of aggressivity for watermark removal. Higher levels are more aggressive and may cause more changes to the final document, but images and photos embedded in the PDF are always preserved.
- Web Interface: An easy-to-use interface to upload and clean PDFs.
- Command-Line Interface: For batch processing and integration into workflows.
- Multiple Aggressivity Levels: Choose the best watermark removal strategy for your needs.
- Image Preservation: Images and photos within the PDF are not affected.
- Level 1: Removes all PDF stream resources that are explicitly identified as watermarks (e.g., using /Watermark or /Background tags).
- Level 2 (Default): Includes all removals from Level 1, plus it removes graphical elements that appear more than once across the PDF pages and all instances of the word 'RETRACTED'. Note: For some PDFs, this level might remove the entire text from a page.
- Level 3: Includes all removals from Levels 1 and 2, and also removes all graphical elements from the PDF.
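The "appears more than once" rule of Level 2 can be illustrated with a small, self-contained sketch. It is not the tool's actual implementation: it assumes each page has already been reduced to a list of serialized graphical elements (plain bytes here), and the names fingerprint, find_repeated_elements and strip_repeated are hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(element: bytes) -> str:
    """Return a stable hash so identical elements can be matched across pages."""
    return hashlib.sha256(element).hexdigest()


def find_repeated_elements(pages):
    """Return fingerprints of elements that occur more than once in the document."""
    counts = Counter(fingerprint(e) for page in pages for e in page)
    return {fp for fp, n in counts.items() if n > 1}


def strip_repeated(pages):
    """Drop every element whose fingerprint is repeated anywhere in the document."""
    repeated = find_repeated_elements(pages)
    return [[e for e in page if fingerprint(e) not in repeated] for page in pages]


# Hypothetical input: two pages that share one stamped overlay.
pages = [
    [b"stamp: RETRACTED diagonal", b"figure 1 outline"],
    [b"stamp: RETRACTED diagonal", b"figure 2 outline"],
]
cleaned = strip_repeated(pages)
```

In this sketch anything that repeats is dropped, whether or not it is a watermark, which is one way a repetition-based heuristic can remove more than intended.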
The simplest way to use the PDF Watermark Remover is through its web interface.
Clone the repository and install the required Python packages:
git clone https://github.com/your-username/pdf-watermark-remover.git
cd pdf-watermark-remover
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Start the Flask application:
python main.py
Open your web browser and navigate to http://127.0.0.1:5000.
- Click the "Upload PDF File" button and select your PDF.
- Click "Remove Watermarks".
- The cleaned PDF will be automatically downloaded.
Follow the same installation steps as for the web interface.
Run the watermark remover from the command line:
python main.py -i <PDF-input> -o <PDF-output> -m [mode of aggressivity]
- <PDF-input>: Path to your input PDF.
- <PDF-output>: Path for the cleaned output PDF.
- [mode of aggressivity]: 1, 2, or 3 (defaults to 2).
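For batch processing, the command above can be wrapped in a short script. The following is a minimal sketch that uses only the documented -i, -o and -m options; the helper names build_command and clean_directory are hypothetical, and it assumes the script is run from the repository root with the virtual environment active.

```python
import subprocess
import sys
from pathlib import Path

VALID_MODES = (1, 2, 3)


def build_command(pdf_in, pdf_out, mode=2):
    """Build the argument list for one run of main.py (mode defaults to 2)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [sys.executable, "main.py",
            "-i", str(pdf_in), "-o", str(pdf_out), "-m", str(mode)]


def clean_directory(in_dir, out_dir, mode=2, dry_run=True):
    """Build one command per *.pdf in in_dir; run them unless dry_run is set."""
    out_dir = Path(out_dir)
    commands = []
    for pdf_in in sorted(Path(in_dir).glob("*.pdf")):
        cmd = build_command(pdf_in, out_dir / pdf_in.name, mode)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True stops the batch on the first failing file.
            subprocess.run(cmd, check=True)
    return commands
```

Calling clean_directory("papers", "cleaned", mode=1, dry_run=False) would process every PDF in a papers directory at Level 1; with dry_run=True it only returns the commands so they can be inspected first.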
The web interface is built with Flask and styled with Tailwind CSS. If you want to modify the frontend, you'll need to set up the Tailwind CSS development environment.
Install the necessary npm packages:
npm install
This will install Tailwind CSS, PostCSS, and Autoprefixer, as defined in package.json.
To watch for changes in the CSS and automatically generate the output.css file, run the following command:
npm run build-css
This command, defined in package.json, uses tailwindcss to compile static/style.css into static/output.css. The --watch flag keeps the process running and automatically recompiles when you make changes to your HTML or CSS files.
- tailwind.config.js: This file configures Tailwind CSS. The content array tells Tailwind to scan all HTML and JavaScript files in the templates and static directories for class names.
- postcss.config.js: This file configures PostCSS to use the Tailwind CSS and Autoprefixer plugins.
- static/style.css: This is the main CSS source file. It includes the base Tailwind CSS styles.
- static/output.css: This is the generated CSS file that is included in the main HTML template (templates/index.html). Do not edit this file directly, as it is overwritten every time the build-css script is run.
.
├── app/ # Core application logic (if any)
├── main.py # Main Flask application and CLI entry point
├── package.json # Node.js dependencies and scripts for frontend
├── pdf_processing/ # Modules for PDF manipulation
│ ├── watermark_remover.py
│ └── ...
├── requirements.txt # Python dependencies
├── static/ # Static assets (CSS, JS)
│ ├── style.css # Source CSS file for Tailwind
│ └── output.css # Generated CSS file
├── templates/ # HTML templates for Flask
│ └── index.html
├── tailwind.config.js # Tailwind CSS configuration
└── postcss.config.js # PostCSS configuration