🔒 PII Detection & Redaction App

A Python web application built with Streamlit for detecting and redacting Personally Identifiable Information (PII) in documents. It accepts PDFs, images, and text files, and uses regular-expression patterns written for Indian identity documents.

🌟 Features

- 🔍 Smart PII Detection: Detects Aadhaar numbers, PAN numbers, driving license numbers, and Voter IDs using regular expressions
- 🎭 Intelligent Masking: Replaces detected PII with 'X' characters while preserving document structure
- 📄 PDF Redaction: Produces redacted PDFs with detected PII blacked out
- 📁 Multi-format Support: Processes PDF, PNG, JPG, JPEG, and TXT files
- 🖥️ User-friendly Interface: Clean, intuitive Streamlit web interface
- ⚡ Real-time Processing: Shows detection and masking results as soon as the upload has been processed

🚀 Quick Start

Prerequisites

- Python 3.7 or higher
- Git (for cloning the repository)

Installation

Clone the repository

git clone https://github.com/yourusername/pii-detection-app.git
cd pii-detection-app

Install Python dependencies
```
pip install -r requirements.txt
```
Install Tesseract OCR (optional, needed only for image processing)
- Windows: Download from Tesseract OCR
- macOS: `brew install tesseract`
- Linux (Debian/Ubuntu): `sudo apt-get install tesseract-ocr`
Run the application
```
streamlit run main.py
```
Open your browser and navigate to http://localhost:8501

Note: The app works with PDF and TXT files even without Tesseract. Image processing requires Tesseract OCR installation.
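
One way an app could decide whether to offer image uploads is to look for the Tesseract executable at startup. The following is a minimal sketch under that assumption, not the code in `main.py`; the helper names `ocr_available` and `accepted_extensions` are hypothetical.

```python
import shutil

# Formats that need no OCR, and formats that do (from the supported-format list).
TEXT_TYPES = ["pdf", "txt"]
IMAGE_TYPES = ["png", "jpg", "jpeg"]


def ocr_available():
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def accepted_extensions(has_ocr):
    """Hypothetical helper: offer image formats only when OCR can run."""
    if has_ocr:
        return TEXT_TYPES + IMAGE_TYPES
    return list(TEXT_TYPES)


if __name__ == "__main__":
    print(accepted_extensions(ocr_available()))
```

A PATH lookup can miss a Tesseract installation that was placed elsewhere (common on Windows), so treat a negative result as "not found on PATH" rather than "not installed".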

📖 How It Works

1. Upload Document

- Drag and drop or browse to select your document
- Supports PDF, PNG, JPG, JPEG, and TXT formats
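
The upload step might look roughly like the sketch below. It uses Streamlit's `st.file_uploader`, which returns `None` until a file is chosen; the function names `file_kind` and `upload_widget` and the kind labels are illustrative assumptions, not names taken from `main.py`.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED = {"pdf", "png", "jpg", "jpeg", "txt"}


def file_kind(filename):
    """Map a filename to "pdf", "text", or "image" (hypothetical labels)."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED:
        raise ValueError("unsupported file type: {!r}".format(ext))
    if ext == "pdf":
        return "pdf"
    if ext == "txt":
        return "text"
    return "image"


def upload_widget():
    """Render the uploader; return (name, bytes, kind), or None if nothing is uploaded."""
    import streamlit as st  # imported lazily so file_kind works without Streamlit

    uploaded = st.file_uploader("Upload a document", type=sorted(SUPPORTED))
    if uploaded is None:
        return None
    return uploaded.name, uploaded.getvalue(), file_kind(uploaded.name)
```

Passing `type=` restricts the file picker to the listed extensions; the separate `file_kind` check keeps the routing decision testable outside Streamlit.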

2. Automatic PII Detection

Regular expressions scan the extracted text for:
- Aadhaar Numbers: 12-digit unique identification numbers
- PAN Numbers: 10-character alphanumeric tax identification
- Driving License: State-specific license number patterns
- Voter ID: Election Commission identification numbers
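
The detection step can be sketched with Python's `re` module. The patterns below are assumptions derived from the examples in this README, not the exact expressions in `main.py`; in particular, the README only says "state code + digits" for driving licenses, so the digit count (10 to 13) is a guess. The function name `detect_pii` is hypothetical.

```python
import re

# Assumed patterns, one per PII type described in this README.
PATTERNS = {
    "Aadhaar": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),   # 12 digits, optional spaces
    "PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),          # 5 letters + 4 digits + 1 letter
    "Driving License": re.compile(r"\b[A-Z]{2}\d{10,13}\b"),  # digit count is an assumption
    "Voter ID": re.compile(r"\b[A-Z]{3}\d{7}\b"),          # 3 letters + 7 digits
}


def detect_pii(text):
    """Return {pii_type: [matches]} for every type found in `text`."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


if __name__ == "__main__":
    sample = "PAN ABCDE1234F, Aadhaar 1234 5678 9012"
    print(detect_pii(sample))
```

The `\b` word boundaries keep a pattern from matching inside a longer alphanumeric token, which is why the digits of a license number are not also reported as an Aadhaar number.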

3. Smart Processing

- PDFs: Direct text extraction and redaction
- Images: OCR-based text recognition (requires Tesseract)
- Text Files: Direct content analysis
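
The three paths above could be routed through one function, as in this sketch. It assumes PyMuPDF (`fitz`) for PDFs and `pytesseract` with Pillow for images; the function name `extract_text` and the kind labels are hypothetical, and the real app may structure this differently.

```python
import io


def extract_text(data, kind):
    """Return the text of an uploaded file given its raw bytes and its kind.

    `kind` is one of "text", "pdf", or "image" (hypothetical labels).
    """
    if kind == "text":
        # Text files: decode directly; undecodable bytes become U+FFFD.
        return data.decode("utf-8", errors="replace")
    if kind == "pdf":
        import fitz  # PyMuPDF, imported lazily so TXT handling needs no extra package

        doc = fitz.open(stream=data, filetype="pdf")
        try:
            return "\n".join(page.get_text() for page in doc)
        finally:
            doc.close()
    if kind == "image":
        # OCR path: needs Pillow, pytesseract, and the Tesseract binary.
        from PIL import Image
        import pytesseract

        return pytesseract.image_to_string(Image.open(io.BytesIO(data)))
    raise ValueError("unsupported kind: {!r}".format(kind))
```

Direct extraction only sees a PDF's text layer, so a scanned PDF without one yields little or no text on the `"pdf"` path.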

4. Secure Output

- View detected PII in organized format
- Download redacted PDFs with PII blacked out
- See masked versions with 'X' replacements
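
Both output forms can be sketched briefly: masking replaces each matched character with 'X', and PDF redaction draws black boxes over matches using PyMuPDF's redaction annotations. The combined pattern and the function names `mask_pii` and `redact_pdf` are assumptions for illustration (the driving-license digit count in particular is a guess), not the implementation in `main.py`.

```python
import re

# Assumed combined pattern covering the four PII types in this README.
PII_PATTERN = re.compile(
    r"\b(?:\d{4}\s?\d{4}\s?\d{4}"   # Aadhaar: 12 digits, optional spaces
    r"|[A-Z]{5}\d{4}[A-Z]"          # PAN
    r"|[A-Z]{2}\d{10,13}"           # Driving license (digit count assumed)
    r"|[A-Z]{3}\d{7})\b"            # Voter ID
)


def mask_pii(text):
    """Replace every non-space character of each match with 'X'."""
    return PII_PATTERN.sub(lambda m: re.sub(r"\S", "X", m.group()), text)


def redact_pdf(pdf_bytes, terms):
    """Black out every occurrence of each string in `terms`; return the new PDF bytes."""
    import fitz  # PyMuPDF, imported lazily so mask_pii works without it

    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    try:
        for page in doc:
            for term in terms:
                for rect in page.search_for(term):
                    page.add_redact_annot(rect, fill=(0, 0, 0))
            # Applying redactions removes the underlying text, not just its appearance.
            page.apply_redactions()
        return doc.tobytes()
    finally:
        doc.close()


if __name__ == "__main__":
    print(mask_pii("Aadhaar 1234 5678 9012 and PAN ABCDE1234F"))
```

Masking only non-space characters keeps the spacing of a grouped Aadhaar number intact, which is one way to preserve the layout of the surrounding text.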

🎯 Usage Example

1. Launch the app: `streamlit run main.py`
2. Open browser: Navigate to http://localhost:8501
3. Upload file: Choose your document
4. Review results: See detected PII and masked versions
5. Download: Get redacted PDF if applicable

🛡️ Supported PII Types

| PII Type | Pattern | Example |
| --- | --- | --- |
| Aadhaar | 12 digits (with or without spaces) | `1234 5678 9012` |
| PAN | 5 letters + 4 digits + 1 letter | `ABCDE1234F` |
| Driving License | State code + digits | `MH1234567890` |
| Voter ID | 3 letters + 7 digits | `ABC1234567` |

⚠️ Important Notes

- Privacy First: All processing happens locally on your machine
- No Data Storage: Files are processed temporarily and cleaned up automatically
- OCR Dependency: Image processing requires Tesseract OCR installation
- Accuracy: Detection accuracy depends on document quality and text clarity
- Indian Focus: Current patterns are optimized for Indian identity documents
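
The "No Data Storage" behaviour can be approximated with the standard `tempfile` module, as in the sketch below. This is an assumption about how temporary processing might be done, not a description of `main.py`; the function name `process_in_temp_file` and its `handler` parameter are hypothetical.

```python
import os
import tempfile


def process_in_temp_file(data, suffix, handler):
    """Write `data` to a temporary file, call handler(path), then always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handler(path)
    finally:
        # Runs even if the handler raises, so no upload is left on disk.
        os.remove(path)


if __name__ == "__main__":
    size = process_in_temp_file(b"sample upload", ".txt", os.path.getsize)
    print(size)
```

`mkstemp` is used here instead of `NamedTemporaryFile` because the file is closed before the handler opens it by path, which avoids the sharing restrictions that apply to open temporary files on Windows.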

🤝 Contributing

We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

- Built with Streamlit for the web interface
- Uses Tesseract OCR for image text extraction
- PDF processing powered by PyMuPDF and PyPDF2

⭐ Star this repository if you find it helpful!

deeksha006/PII-Detection-
