This is a Node.js server that provides REST APIs for text extraction and bounding box detection using Tesseract OCR.

Prerequisites:
- Node.js (v14 or higher)
- Tesseract OCR v5.0.0
- npm or yarn
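
Installing Tesseract:

Ubuntu/Debian:
sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

macOS (Homebrew):
brew install tesseract

Windows:
Download and install the binary from UB Mannheim (https://github.com/UB-Mannheim/tesseract/wiki), then add the install directory to PATH if the installer did not.

Run tesseract --version afterwards to confirm you have v5; some older Linux distribution releases still package Tesseract 4.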

Install the Node.js dependencies:
npm install

Running the server:

Development mode:
npm run dev
Production mode:
npm start

API endpoints:

Text extraction:
- URL: /get-text
- Method: POST
- Content-Type: multipart/form-data
- Body:
  - image: Image file (PNG, JPEG, TIFF)
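
A minimal Node.js client for /get-text might look like the sketch below. It assumes Node.js 18 or newer on the client side (for the built-in fetch, FormData and Blob; the server itself only needs v14) and a server on http://localhost:3000, and it returns the response body as plain text because the response format is not specified here. getText, buildTextForm and mimeTypeFor are illustrative names, not part of this project.

```javascript
// Sketch of a client for POST /get-text (Node.js 18+: global fetch/FormData/Blob).
// Assumptions: server at http://localhost:3000; response returned as raw text.
const { readFile } = require("node:fs/promises");
const path = require("node:path");

const BASE_URL = "http://localhost:3000"; // assumption, matches the Docker port mapping

// MIME types for the accepted formats (PNG, JPEG, TIFF), keyed by extension.
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".tif": "image/tiff",
  ".tiff": "image/tiff",
};

// Look up the MIME type from the file extension, rejecting unsupported files early.
function mimeTypeFor(fileName) {
  const type = MIME_TYPES[path.extname(fileName).toLowerCase()];
  if (!type) throw new Error(`Unsupported image format: ${fileName}`);
  return type;
}

// Build a multipart form with the single "image" field the endpoint expects.
function buildTextForm(bytes, fileName) {
  const form = new FormData();
  form.append("image", new Blob([bytes], { type: mimeTypeFor(fileName) }), fileName);
  return form;
}

async function getText(filePath) {
  const bytes = await readFile(filePath);
  const form = buildTextForm(bytes, path.basename(filePath));
  // fetch sets "multipart/form-data; boundary=..." itself for a FormData body.
  const res = await fetch(`${BASE_URL}/get-text`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`get-text failed: HTTP ${res.status}`);
  return res.text();
}

// Example (server must be running):
// getText("./sample.png").then(console.log).catch(console.error);
```

Avoid setting the Content-Type header by hand: fetch adds multipart/form-data together with the boundary string the server needs to split the parts.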

Bounding box detection:
- URL: /get-bboxes
- Method: POST
- Content-Type: multipart/form-data
- Body:
  - image: Image file (PNG, JPEG, TIFF)
  - type: Level at which boxes are returned, one of ["word", "line", "paragraph", "block", "page"]
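
Requests to /get-bboxes send a type field alongside the image. The sketch below assumes a Node.js 18+ client (built-in fetch, FormData and Blob), a server on http://localhost:3000, and an undocumented response format, which it returns as text; it rejects any type outside the five listed values before sending anything. getBBoxes and buildBBoxForm are hypothetical names, and the caller supplies the MIME type matching the file.

```javascript
// Sketch of a client for POST /get-bboxes (Node.js 18+: global fetch/FormData/Blob).
// Assumptions: server at http://localhost:3000; response returned as raw text.
const { readFile } = require("node:fs/promises");
const path = require("node:path");

const BASE_URL = "http://localhost:3000"; // assumption, matches the Docker port mapping

// Values the "type" field accepts, as listed for the endpoint.
const BOX_TYPES = ["word", "line", "paragraph", "block", "page"];

// Build the multipart form, rejecting an unknown "type" before any request is made.
function buildBBoxForm(bytes, fileName, mimeType, type) {
  if (!BOX_TYPES.includes(type)) {
    throw new Error(`type must be one of ${BOX_TYPES.join(", ")}; got "${type}"`);
  }
  const form = new FormData();
  form.append("image", new Blob([bytes], { type: mimeType }), fileName);
  form.append("type", type);
  return form;
}

// mimeType should match the file: "image/png", "image/jpeg" or "image/tiff".
async function getBBoxes(filePath, type, mimeType) {
  const bytes = await readFile(filePath);
  const form = buildBBoxForm(bytes, path.basename(filePath), mimeType, type);
  // Let fetch generate the multipart Content-Type header and boundary.
  const res = await fetch(`${BASE_URL}/get-bboxes`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`get-bboxes failed: HTTP ${res.status}`);
  return res.text();
}

// Example (server must be running):
// getBBoxes("./page.jpg", "line", "image/jpeg").then(console.log).catch(console.error);
```

If the server responds with JSON, the returned string can be passed to JSON.parse; the field names of the boxes are not documented here.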

Running the tests:
npm test

Docker:

Build the image:
docker build -t tesseract-ocr-server .
Run the container, publishing container port 3000 on host port 3000:
docker run -p 3000:3000 tesseract-ocr-server