This project automates the redaction of sensitive information in PDF files. It identifies sensitive data using a combination of natural language processing (NLP) and regular expressions, replaces it, and automatically saves each redacted version as a new PDF file in a specified output folder.
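As a rough illustration of the regular-expression half of this approach, the sketch below masks matches in plain text. The pattern set, the `PATTERNS` name, and the `redact_text` helper are assumptions made for this example, not the project's actual rules; the NLP step and the PDF handling are left out.

```python
import re

# Hypothetical patterns, chosen only for illustration; the project's real
# rules are not shown in this README.
PATTERNS = {
    "id_number": re.compile(r"\b\d{11}\b"),  # an 11-digit identifier
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b0\d{3}[ -]?\d{3}[ -]?\d{2}[ -]?\d{2}\b"),
}


def redact_text(text: str, mask: str = "*") -> str:
    """Replace each pattern match with mask characters of the same length."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: mask * len(m.group()), text)
    return text
```

Masking with the same number of characters keeps the text length unchanged, which can help when the redacted text has to fit back into the original layout.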
- `input_pdfs/`: Contains the original PDF files that need to be redacted.
  - Example: `input_pdfs/test.pdf`
- `output_pdfs/`: Will hold the redacted PDF files. The filenames will include a `_redacted` suffix.
  - Example: `output_pdfs/test_redacted.pdf`
- `run.py`: The main script for processing PDFs. It reads from `input_pdfs`, applies redactions, and saves the results to `output_pdfs`.
- `requirements.txt`: Lists the required Python libraries for the project.
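The naming convention above can be sketched with `pathlib`. The `redacted_path` helper and the two constants are hypothetical names introduced for this example; `run.py` may build its output paths differently.

```python
from pathlib import Path

INPUT_DIR = Path("input_pdfs")
OUTPUT_DIR = Path("output_pdfs")


def redacted_path(src: Path, output_dir: Path = OUTPUT_DIR) -> Path:
    """Build the output path by adding a _redacted suffix to the file stem."""
    return output_dir / f"{src.stem}_redacted{src.suffix}"


# Prints: output_pdfs/test_redacted.pdf
print(redacted_path(INPUT_DIR / "test.pdf").as_posix())
```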
- Sensitive Data Redaction: Automatically redact sensitive information (e.g., personal identifiers and numbers) from PDFs.
- Automation: Streamline the redaction process for multiple PDFs, ensuring consistency and efficiency.
- Output Management: Save redacted PDFs with a clear naming convention for easy identification.
- Clone the Repository: run `git clone https://github.com/omertascioglu/Data-Redact-In-Pdf-Using-Turkish-NLP.git`, then `cd Data-Redact-In-Pdf-Using-Turkish-NLP`.