r69shabh/visionlab-tesseract

Tesseract OCR API Server

This is a Node.js server that provides REST APIs for text extraction and bounding box detection using Tesseract OCR.

Prerequisites

  1. Node.js (v14 or higher)
  2. Tesseract OCR v5.0.0
  3. npm or yarn
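
The version requirements above can be checked from Node.js before starting the server. The following is a minimal sketch, not part of this repository: the helper names `parseTesseractMajor` and `checkPrerequisites` are hypothetical, and it assumes the `tesseract` binary is on `PATH` and prints its version to standard output, as recent releases do.

```javascript
const { execFileSync } = require("node:child_process");

// Extract the major version from `tesseract --version` output,
// e.g. "tesseract 5.3.0" or "tesseract v5.3.0.20221214".
// Returns null when no version string is found.
function parseTesseractMajor(versionOutput) {
  const match = /tesseract\s+v?(\d+)\./i.exec(versionOutput);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: reports whether Node.js is v14+ and Tesseract is v5+.
function checkPrerequisites() {
  const nodeMajor = Number(process.versions.node.split(".")[0]);
  let tesseractMajor = null;
  try {
    const output = execFileSync("tesseract", ["--version"], { encoding: "utf8" });
    tesseractMajor = parseTesseractMajor(output);
  } catch {
    // tesseract is not on PATH or could not be executed; leave the value as null.
  }
  return {
    nodeOk: nodeMajor >= 14,
    tesseractOk: tesseractMajor !== null && tesseractMajor >= 5,
  };
}
```

Calling `checkPrerequisites()` returns an object with two booleans, so a startup script can print a clear message instead of failing later on the first OCR request.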

Installation

1. Install Tesseract OCR

Ubuntu/Debian

sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

macOS

brew install tesseract

Windows

Download and run the Tesseract installer provided by UB Mannheim (https://github.com/UB-Mannheim/tesseract/wiki).

2. Install Node.js dependencies

npm install

Running the Server

Development mode:

npm run dev

Production mode:

npm start

API Endpoints

1. Extract Text

  • URL: /get-text
  • Method: POST
  • Content-Type: multipart/form-data
  • Body:
    • image: Image file (PNG, JPEG, TIFF)
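
A request to this endpoint could look like the sketch below. It is a client-side illustration only and makes several assumptions not stated in this README: the server is reachable at `http://localhost:3000` (the port used in the Docker example), the client runs on Node.js 18 or newer (which provides global `fetch`, `FormData`, and `Blob`), and the response format is unknown, so the body is returned as raw text. The names `buildTextForm`, `getText`, and `OCR_BASE_URL` are hypothetical.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Assumption: port 3000, taken from the Docker example; override via OCR_BASE_URL.
const BASE_URL = process.env.OCR_BASE_URL || "http://localhost:3000";

// Build the multipart body: a single file field named "image".
// mimeType is an illustrative parameter; adjust it to match the file being sent.
function buildTextForm(imageBytes, filename, mimeType = "image/png") {
  const form = new FormData();
  form.append("image", new Blob([imageBytes], { type: mimeType }), filename);
  return form;
}

// Upload the image to /get-text and return the raw response body.
// The response schema is not documented here, so nothing is parsed.
async function getText(imagePath) {
  const bytes = await fs.readFile(imagePath);
  const form = buildTextForm(bytes, path.basename(imagePath));
  // fetch sets the multipart/form-data Content-Type (with boundary) itself.
  const res = await fetch(`${BASE_URL}/get-text`, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`OCR request failed with status ${res.status}`);
  }
  return res.text();
}
```

For example, `getText("./sample.png").then(console.log)` would print whatever the server returns for a local file named `sample.png`.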

2. Get Bounding Boxes

  • URL: /get-bboxes
  • Method: POST
  • Content-Type: multipart/form-data
  • Body:
    • image: Image file (PNG, JPEG, TIFF)
    • type: One of ["word", "line", "paragraph", "block", "page"]
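
The sketch below shows one way a client might call this endpoint and reject an unsupported `type` before sending anything. As with any client example here, it rests on assumptions: a server at `http://localhost:3000`, Node.js 18 or newer for global `fetch`, `FormData`, and `Blob`, and an undocumented response format that is returned as raw text. The names `buildBboxForm`, `getBboxes`, and `OCR_URL` are hypothetical.

```javascript
const { readFile } = require("node:fs/promises");
const { basename } = require("node:path");

// Assumption: port 3000, taken from the Docker example.
const OCR_URL = process.env.OCR_BASE_URL || "http://localhost:3000";

// The values accepted by the "type" field, as listed above.
const BBOX_TYPES = ["word", "line", "paragraph", "block", "page"];

// Build the multipart body with the "image" file field and the "type" text field.
// Throws a RangeError for a type the endpoint does not list.
function buildBboxForm(imageBytes, filename, type) {
  if (!BBOX_TYPES.includes(type)) {
    throw new RangeError(`type must be one of: ${BBOX_TYPES.join(", ")}`);
  }
  const form = new FormData();
  form.append("image", new Blob([imageBytes]), filename);
  form.append("type", type);
  return form;
}

// Upload the image to /get-bboxes and return the raw response body.
async function getBboxes(imagePath, type) {
  const bytes = await readFile(imagePath);
  const form = buildBboxForm(bytes, basename(imagePath), type);
  const res = await fetch(`${OCR_URL}/get-bboxes`, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Bounding box request failed with status ${res.status}`);
  }
  return res.text();
}
```

Validating `type` on the client keeps typos such as `"words"` from reaching the server, whose handling of invalid values is not described in this README.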

Running Tests

npm test

Docker Support

Build the image:

docker build -t tesseract-ocr-server .

Run the container:

docker run -p 3000:3000 tesseract-ocr-server
