PDF Extractor Tools - Noësis project

This app is part of a larger project called Noësis (more information coming soon). It aims (among other goals) to provide effective tools for researchers and students.

Current key features

Academic citations extraction

From a PDF, you can export every citation in the original document: traditional citations between quotation marks, Harvard citations, and block citations. Exports are available as PDF, Word, or txt. The process relies on a precise analysis of characters (including Unicode symbols, note numbers, and their coordinates) to reconstruct citations and their associated footnotes.
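
As a much-simplified illustration of two of the citation categories named above, the sketch below finds quoted passages and Harvard-style references in plain text using regular expressions. It ignores character coordinates, footnotes, and block citations, which the actual tool relies on. The class name `CitationSketch`, both patterns, and the sample sentence are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: NOT the project's extraction code, only a text-level sketch.
final class CitationSketch {

    // Text between straight, curly, or French quotation marks (written as Unicode escapes).
    private static final Pattern QUOTED = Pattern.compile(
            "[\"\u201C\u00AB]\\s*([^\"\u201C\u201D\u00AB\u00BB]+?)\\s*[\"\u201D\u00BB]");

    // Harvard style such as (Author, 2020) or (Author, 2020, p. 12).
    private static final Pattern HARVARD = Pattern.compile(
            "\\(([\\p{L}][\\p{L} .&'-]*?),\\s*(\\d{4})(?:,\\s*pp?\\.\\s*[\\d-]+)?\\)");

    // Returns the text found inside each pair of quotation marks.
    static List<String> quoted(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Returns "Author Year" for each Harvard-style reference.
    static List<String> harvard(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HARVARD.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + " " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "The author calls it \u201Ca sample quotation\u201D (Doe, 2020, p. 12).";
        System.out.println(quoted(sample));
        System.out.println(harvard(sample));
    }
}
```

Pattern matching on flat text cannot tell a quotation from dialogue or link a note number to its footnote, which is presumably why the tool works from character positions instead.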

The citation extractor is also available as a standalone tool here, along with more information.

Annotations extraction

From a PDF, you can export all the annotations that were added to it. Exports are available as PDF, Word, or txt.

It is also available as a standalone tool here.

A basic authentication service

Because this is still a demo, registration is disabled. A demo user is provided for trying out the main features.

Technical details

Tech Stack

Java 17
Spring Boot 3.5
JWT Authentication (access token only in demo mode -> refresh token planned in prod)
Bucket4j for rate limiting
Maven
H2 (demo) -> PostgreSQL (prod)
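
To make the "access token" entry concrete, here is a minimal sketch of how an HS256-signed JWT is assembled and checked using only the JDK. It is not the project's JWT service, which presumably relies on a dedicated library; the class `AccessTokenSketch`, its method names, the claim set, and the secret are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: a real service should use a maintained JWT library.
final class AccessTokenSketch {

    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    // HMAC-SHA256 over "header.payload", encoded as unpadded base64url.
    private static String sign(String signingInput, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return B64.encodeToString(mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
        }
    }

    // Issues a token; assumes the subject contains no characters that need JSON escaping.
    static String issue(String subject, long expiresAtEpochSeconds, byte[] secret) {
        String header = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
        String payload = "{\"sub\":\"" + subject + "\",\"exp\":" + expiresAtEpochSeconds + "}";
        String signingInput = B64.encodeToString(header.getBytes(StandardCharsets.UTF_8))
                + "." + B64.encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + sign(signingInput, secret);
    }

    // Recomputes the signature and compares it in constant time.
    // Expiry and claim validation are deliberately left out of this sketch.
    static boolean hasValidSignature(String token, byte[] secret) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        String expected = sign(token.substring(0, lastDot), secret);
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                token.substring(lastDot + 1).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] secret = "demo-only-secret-change-me-please".getBytes(StandardCharsets.UTF_8);
        long expiry = Instant.now().plusSeconds(900).getEpochSecond(); // 15-minute lifetime, an assumption
        String token = issue("demo-user", expiry, secret);
        System.out.println(token);
        System.out.println("signature valid: " + hasValidSignature(token, secret));
    }
}
```

A client would send such a token in the `Authorization: Bearer` header. With only an access token, the user has to log in again once it expires, which is the gap a refresh token would close.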

Security and Middleware

JWT Authentication with Authorization: Bearer token
Rate limiting with Bucket4j
- General: 100 requests / 15 min
- Auth: 5 requests / 1 min
- Critical (PDF extraction): 3 requests / 1 min
Custom middleware implemented with Spring interceptors
Custom error handling
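
The three limits listed above can be pictured with a small token bucket. The sketch below is plain Java and does not use the Bucket4j API; it refills the whole bucket once per period, which is a simplification. The class `RateLimitSketch` and its factory methods are hypothetical names, and the clock is passed in explicitly so the behaviour is easy to test.

```java
import java.time.Duration;

// Hypothetical plain-Java token bucket; the project itself uses Bucket4j.
final class RateLimitSketch {

    private final long capacity;
    private final long refillPeriodNanos;
    private long tokens;
    private long lastRefillNanos;

    RateLimitSketch(long capacity, Duration refillPeriod, long nowNanos) {
        this.capacity = capacity;
        this.refillPeriodNanos = refillPeriod.toNanos();
        this.tokens = capacity;
        this.lastRefillNanos = nowNanos;
    }

    // One factory per tier from the list above.
    static RateLimitSketch general(long nowNanos) {
        return new RateLimitSketch(100, Duration.ofMinutes(15), nowNanos);
    }

    static RateLimitSketch auth(long nowNanos) {
        return new RateLimitSketch(5, Duration.ofMinutes(1), nowNanos);
    }

    static RateLimitSketch critical(long nowNanos) {
        return new RateLimitSketch(3, Duration.ofMinutes(1), nowNanos);
    }

    // Refills the bucket if a full period has elapsed, then tries to take one token.
    synchronized boolean tryConsume(long nowNanos) {
        if (nowNanos - lastRefillNanos >= refillPeriodNanos) {
            tokens = capacity;
            lastRefillNanos = nowNanos;
        }
        if (tokens == 0) {
            return false; // the caller would typically answer with HTTP 429
        }
        tokens--;
        return true;
    }

    public static void main(String[] args) {
        RateLimitSketch bucket = critical(System.nanoTime());
        for (int i = 1; i <= 4; i++) {
            System.out.println("request " + i + " allowed: " + bucket.tryConsume(System.nanoTime()));
        }
    }
}
```

In an interceptor-based setup, a check like `tryConsume` would run before the handler, with one bucket per client and per tier.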

Tests

Password hashing with Argon2 verified
JWT service tested
Authenticated endpoints tested with MockMvc
Extraction endpoints tested: valid ZIP response with dummy files
Rate limits tested

Features in Code

| Feature | Path | Description |
| --- | --- | --- |
| Register | `GET /auth/register` | Returns the new user's non-sensitive info (disabled in demo mode) |
| Login | `POST /auth/login` | Returns a JWT |
| User Info | `GET /user/me` | Requires a JWT |
| Update User | `PUT /user/update` | Requires authentication |
| Citation Extraction | `POST /extract/citations` | Takes a file and export formats, returns a ZIP |
| Annotation Extraction | `POST /extract/annotations` | Takes a file and export formats, returns a ZIP |
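
Both extraction endpoints return a ZIP. As a rough sketch of that last step, the code below packs a few in-memory exports into a single archive with `java.util.zip` and reads the entry names back. The class `ZipResponseSketch`, the file names, and the dummy contents are assumptions; how the project actually builds its response is not shown in this README.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of bundling one export per requested format into a ZIP.
final class ZipResponseSketch {

    // Writes each (file name, content) pair as one archive entry, in map order.
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray(); // complete only after the stream above is closed
    }

    // Lists entry names, e.g. to assert on a response body in a test.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, byte[]> exports = new LinkedHashMap<>();
        exports.put("citations.txt", "dummy text export".getBytes(StandardCharsets.UTF_8));
        exports.put("citations.pdf", "dummy pdf bytes".getBytes(StandardCharsets.UTF_8));
        byte[] archive = zip(exports);
        System.out.println(entryNames(archive));
    }
}
```

In a Spring controller, the resulting byte array would typically be returned with an `application/zip` content type.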

Repository: CamilleNerriere/pdf-extractor-tools
