This Spring Boot project allows you to:
- Upload PDF files
- Generate SHA-256 hashes
- Asynchronously extract and store metadata in a MySQL database
- Retrieve metadata by hash
Features:
- 📦 RESTful endpoints for uploading PDFs and retrieving their metadata
- 🔐 SHA-256-based hash identification
- ⚙️ Asynchronous metadata extraction
- 🗃️ MySQL persistence
- 🐳 Docker & Docker Compose support
- 📝 SLF4J logging for traceability
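The upload flow (hash first, extract in the background, look up later) can be sketched with plain JDK types. This is a minimal sketch, not the project's code: `ScanPipeline` and its `extract` function are hypothetical, `CompletableFuture.runAsync` on an executor stands in for whatever async mechanism the project uses (for example Spring's `@Async`), and a `ConcurrentHashMap` stands in for the MySQL/JPA repository.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: hash synchronously, extract metadata in the background.
final class ScanPipeline {
    // Returned to the caller right away, like the {"sha256": ...} response of /scan.
    record ScanResult(String sha256, CompletableFuture<Void> extraction) {}

    // In-memory stand-in for the JPA repository: hash -> extracted metadata.
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Daemon threads so the sketch never keeps the JVM alive on its own.
    private final ExecutorService executor = Executors.newFixedThreadPool(2, task -> {
        Thread t = new Thread(task, "pdf-extract");
        t.setDaemon(true);
        return t;
    });
    // Placeholder for the PDFBox extraction step (assumption).
    private final Function<byte[], String> extract;

    ScanPipeline(Function<byte[], String> extract) {
        this.extract = extract;
    }

    // Mirrors POST /scan: compute the hash now, schedule extraction, return immediately.
    ScanResult scan(byte[] pdf) {
        String hash = sha256Base64(pdf);
        CompletableFuture<Void> job =
                CompletableFuture.runAsync(() -> store.put(hash, extract.apply(pdf)), executor);
        return new ScanResult(hash, job);
    }

    // Mirrors GET /lookup/{hash}: empty until the background job has stored a result.
    Optional<String> lookup(String hash) {
        return Optional.ofNullable(store.get(hash));
    }

    private static String sha256Base64(byte[] data) {
        try {
            return Base64.getEncoder()
                    .encodeToString(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}
```

Because extraction runs after the scan call returns, a lookup issued immediately afterwards may find nothing yet; in this sketch, joining `extraction()` is what guarantees the result is present.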
Prerequisites:
- Docker and Docker Compose installed
- Optional: Java 17+ and Maven, if running locally without Docker
Clone the repository into a pdf-metadata-scanner directory:
git clone https://github.com/arpitsingh134/PDF-Metadata-Scanne.git pdf-metadata-scanner
cd pdf-metadata-scanner
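Build with Maven (mvn clean install also works; it runs package and additionally installs the artifact into the local Maven repository, so one of the two is enough):
mvn clean package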
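Start the application and MySQL, rebuilding the image:
docker-compose up --build
Stop and remove the containers together with their volumes (any stored database data is lost):
docker-compose down -v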
Inspect the database from inside the MySQL container (enter testpass when prompted for the password):
docker exec -it mysqldb mysql -u testuser -p
use pdfscanner;
show tables;
select * from pdf_metadata;
To run locally without Docker, make sure MySQL is running and application.properties contains the correct DB credentials, then start the app:
mvn clean spring-boot:run
POST /scan uploads the file and triggers metadata extraction asynchronously. The response contains the file's SHA-256 hash:
curl -F "file=@sample.pdf" http://localhost:8080/scan
{
  "sha256": "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y="
}
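The hash in this response is standard Base64 and can contain `/`, `+` and `=`, so it has to be percent-encoded before it goes into the `/lookup` path. A minimal sketch of building that URL in Java; `LookupUrls` is a hypothetical helper, not part of the project:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the /lookup URL for a Base64 SHA-256 returned by /scan.
final class LookupUrls {
    private LookupUrls() {}

    static String lookupUrl(String baseUrl, String sha256Base64) {
        // '/' -> %2F, '+' -> %2B, '=' -> %3D; letters and digits are left unchanged.
        String encoded = URLEncoder.encode(sha256Base64, StandardCharsets.UTF_8);
        return baseUrl + "/lookup/" + encoded;
    }
}
```

URLEncoder performs HTML form encoding (a space would become `+`), which is acceptable here only because Base64 output never contains spaces.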
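GET /lookup/{hash} returns the stored metadata. The hash must be URL-encoded, because Base64 contains characters such as / and =:
curl http://localhost:8080/lookup/hqR2EoK%2Fq7NQ0%2FDGXAJfI%2FDa8mqwYZcD3TxA%2FpdKX1Y%3D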
{
  "sha256": "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y=",
  "version": "1.7",
  "producer": "Apache PDFBox",
  "author": "Arpit Singh",
  "created": "D:20250614010000Z",
  "modified": "D:20250614020000Z",
  "scanned": "2025-06-14T05:45:00Z",
  "filename": "sample_20250614_114500.pdf"
}
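In this sample, `created` and `modified` keep the raw PDF date string (a `D:` prefix, then `YYYYMMDDHHmmSS` and a time-zone marker), whereas `scanned` is ISO-8601. If a client needs them as instants, a minimal sketch using `java.time` could look like this; `PdfDates` is a hypothetical helper that only handles full-precision dates and treats a missing zone as UTC (an assumption).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for PDF date strings such as "D:20250614010000Z".
final class PdfDates {
    private static final DateTimeFormatter BASE = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    private PdfDates() {}

    // Accepts "D:YYYYMMDDHHmmSS" followed by "Z", "+HH'mm'", "-HH'mm'" or nothing.
    // Partial dates such as "D:2025" are rejected; a missing zone is treated as UTC (assumption).
    static Instant parse(String pdfDate) {
        String s = pdfDate.startsWith("D:") ? pdfDate.substring(2) : pdfDate;
        if (s.length() < 14) {
            throw new IllegalArgumentException("Unsupported PDF date: " + pdfDate);
        }
        LocalDateTime local = LocalDateTime.parse(s.substring(0, 14), BASE);
        // "+05'30'" -> "+0530", which ZoneOffset.of understands.
        String zone = s.substring(14).replace("'", "");
        ZoneOffset offset = (zone.isEmpty() || zone.startsWith("Z")) ? ZoneOffset.UTC : ZoneOffset.of(zone);
        return local.toInstant(offset);
    }
}
```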
| Layer | Description |
|---|---|
| controller | Handles /scan and /lookup/{hash} |
| service | Extracts metadata using PDFBox |
| model | PdfMetadata entity |
| repository | Spring Data JPA repository for MySQL |
| util | SHA-256 hash utility |
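The sample hashes are 44-character standard Base64 strings, which is exactly what a 32-byte SHA-256 digest encodes to. A minimal sketch of such a utility, assuming the project hashes the raw upload bytes and encodes them with `java.util.Base64` (inferred from the sample output, not confirmed); `HashUtil` is a hypothetical name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical hash utility; the project's actual util class may differ.
final class HashUtil {
    private HashUtil() {}

    // Streams the upload through SHA-256 in 8 KiB chunks, so large PDFs are not held in memory.
    // The caller remains responsible for closing the stream.
    static String sha256Base64(InputStream in) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Standard Base64 (may contain '/', '+', '='): 32 digest bytes -> 44 characters.
        return Base64.getEncoder().encodeToString(digest.digest());
    }
}
```

In a Spring controller the stream would typically come from `MultipartFile.getInputStream()`, opened in a try-with-resources block.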
Tech stack:
- Java 17
- Spring Boot
- Spring Web + Spring Data JPA
- MySQL
- Apache PDFBox
- Docker / Docker Compose
- Lombok + SLF4J
Key files:
- Dockerfile: containerizes the Spring Boot app
- docker-compose.yml: spins up the Spring app and the MySQL database
- application.properties: DB configuration and JPA tuning
- PdfMetadata: JPA entity model for the metadata
Arpit Singh 📧 arpitsingh134@gmail.com 🔗 LinkedIn