mule-pdfbox-module

MuleSoft Apache PDFBox Module to Manipulate PDF

MuleSoft PDF Utilities Connector (Java SDK)

A lightweight MuleSoft connector that enables PDF manipulation using Apache PDFBox. This module provides a set of high-performance operations to extract information, manipulate pages, and split documents inside Mule flows.

📦 Features

Extract PDF Info: Retrieve metadata such as author, title, subject, and number of pages.
Extract Text by Page Range: Extract visible text from a specified range of pages.
Filter Pages: Remove blank pages and/or keep only selected page ranges.
Rotate Pages: Rotate a range of pages clockwise or counterclockwise.
Split Pages: Split a PDF into individual single-page PDF files.
Merge PDFs: Merge an array of PDFs into a single PDF file.

🧰 Built With

Apache PDFBox
MuleSoft Java SDK (for Mule 4)

🚀 Operations

`extractPdfInfo`

Description: Extracts metadata and document properties.
Input: PDF file as InputStream
Output:

```json
{
  "title": "Sample",
  "author": "John Doe",
  "subject": "Contracts",
  "keywords": "MuleSoft,PDF",
  "version": "1.4",
  "encrypted": false,
  "numberOfPages": 5
}
```

`extractTextByPageRange`

Description: Extracts plain text from a specified page range.
Inputs:

PDF InputStream
Optional startPage and endPage
Output: Extracted text as String
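
The README does not say how the optional bounds are interpreted. The sketch below shows one plausible way to resolve them, assuming page numbers are 1-based and inclusive, that a missing `startPage` means the first page, and that a missing `endPage` means the last page. `PageRangeSketch` and `resolve` are hypothetical names used for illustration only, not part of the module.

```java
// Hypothetical helper, not part of the module: resolves optional page bounds.
final class PageRangeSketch {

    /**
     * Resolves optional bounds against the document's page count.
     * Assumptions: pages are 1-based and inclusive; a null startPage means
     * page 1 and a null endPage means the last page.
     */
    static int[] resolve(Integer startPage, Integer endPage, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("Document has no pages");
        }
        int start = (startPage == null) ? 1 : startPage;
        int end = (endPage == null) ? pageCount : endPage;
        if (start < 1 || end > pageCount || start > end) {
            throw new IllegalArgumentException(
                "Invalid page range " + start + "-" + end + " for " + pageCount + " pages");
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int[] all = resolve(null, null, 5);
        int[] tail = resolve(3, null, 5);
        System.out.println(all[0] + "-" + all[1]);   // prints 1-5
        System.out.println(tail[0] + "-" + tail[1]); // prints 3-5
    }
}
```

Rejecting an inverted or out-of-bounds range up front keeps the failure message close to the caller's input instead of surfacing later during text extraction.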

`filterPages`

Description: Removes blank pages and/or filters based on page range.
Inputs:

PDF InputStream
removeBlankPages (boolean)
Optional startPage and endPage
Output: Filtered PDF as InputStream
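
To illustrate how the two filters might combine, here is a minimal sketch that selects which page numbers survive. It assumes the page range is applied first and blank pages are then dropped from within it, and it takes blank detection as given in a `boolean[]`, since the README does not describe how the module decides a page is blank. `PageFilterSketch` and `pagesToKeep` are hypothetical names.

```java
// Hypothetical helper, not part of the module: picks the pages to keep.
final class PageFilterSketch {

    /**
     * Returns the 1-based page numbers to keep.
     * blank[i] is true when page i + 1 is considered blank (detection is
     * outside the scope of this sketch).
     */
    static int[] pagesToKeep(boolean[] blank, boolean removeBlankPages,
                             int startPage, int endPage) {
        if (startPage < 1 || endPage > blank.length || startPage > endPage) {
            throw new IllegalArgumentException("Invalid page range");
        }
        return java.util.stream.IntStream.rangeClosed(startPage, endPage)
            // Drop a page only when blank removal is on and the page is blank.
            .filter(page -> !(removeBlankPages && blank[page - 1]))
            .toArray();
    }

    public static void main(String[] args) {
        boolean[] blank = {false, true, false, true, false};
        int[] kept = pagesToKeep(blank, true, 2, 5);
        System.out.println(java.util.Arrays.toString(kept)); // prints [3, 5]
    }
}
```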

`rotatePages`

Description: Rotates a specific range of pages clockwise or counterclockwise.
Inputs:

PDF InputStream
startPage, endPage
clockwise (boolean)
Output: Rotated PDF as InputStream
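
A PDF page stores its rotation as a multiple of 90 degrees, measured clockwise. The sketch below shows how a boolean `clockwise` flag could map to a new rotation value, assuming each call turns the page by one 90-degree step (the README does not state the angle). `RotationSketch` and `nextRotation` are hypothetical names; in PDFBox the result would typically be applied per page with `PDPage.setRotation`.

```java
// Hypothetical helper, not part of the module: computes a page's new rotation.
final class RotationSketch {

    /**
     * Returns the rotation after one 90-degree step, normalized to 0, 90, 180 or 270.
     * Assumption: the operation rotates by exactly 90 degrees per call.
     */
    static int nextRotation(int currentRotation, boolean clockwise) {
        int delta = clockwise ? 90 : -90;
        // floorMod keeps the result non-negative for counterclockwise steps.
        return Math.floorMod(currentRotation + delta, 360);
    }

    public static void main(String[] args) {
        System.out.println(nextRotation(0, true));    // prints 90
        System.out.println(nextRotation(0, false));   // prints 270
        System.out.println(nextRotation(270, true));  // prints 0
    }
}
```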

`splitPages`

Description: Splits a multi-page PDF into a list of single-page PDF files.
Input: PDF InputStream
Output: List of InputStreams, one per page

`mergePDFs`

Description: Merges an array of PDFs into a single PDF file.
Input: Array of PDF InputStreams, one per PDF
Output: Merged PDF as InputStream

📂 Usage

This connector is designed for use in Mule 4 Java SDK-based modules. Register the operations in your Extension class and call them from flows or other operations using standard SDK syntax.

⚠️ Limitations

No image compression or DPI downsampling.
Does not preserve digital signatures if present.
Very large PDFs may consume significant memory during processing.

MuleSoft-Forge/mule-pdfbox-module

Folders and files

Latest commit

History

Repository files navigation

mule-pdfbox-module

MuleSoft PDF Utilities Connector (Java SDK)

📦 Features

🧰 Built With

🚀 Operations

extractPdfInfo

extractTextByPageRange

filterPages

rotatePages

splitPages

mergePDFs

📂 Usage

⚠️ Limitations

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages

`extractPdfInfo`

`extractTextByPageRange`

`filterPages`

`rotatePages`

`splitPages`

`mergePDFs`