This CLAMS app wraps Tesseract OCR to perform OCR on images or video frames. The wrapper takes a VideoDocument with SWT TimeFrame annotations; specifically, it uses the representative TimePoint annotations from SWT v4 TimeFrame annotations to extract the frames to run OCR on.
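At its core, the frame-extraction step maps a representative TimePoint to a frame index in the video. A minimal sketch of that conversion (the helper name and the millisecond unit are illustrative assumptions, not the app's actual code):

```python
def timepoint_to_frame(timepoint_ms: float, fps: float) -> int:
    """Map a TimePoint (assumed to be in milliseconds) to the nearest frame index."""
    return round(timepoint_ms / 1000 * fps)

# e.g. a representative TimePoint at 2500 ms in a 29.97 fps video
frame_idx = timepoint_to_frame(2500, 29.97)
print(frame_idx)  # 75
```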
As described in the tesseract documentation, pytesseract's `image_to_data` function returns a dict with the following layout:

`{'level': [], 'page_num': [], 'block_num': [], 'par_num': [], 'line_num': [], 'word_num': [], 'left': [], 'top': [], 'width': [], 'height': [], 'conf': [], 'text': []}`
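In that layout every key holds a parallel list, one entry per detected element, and the `level` value marks the hierarchy (1=page, 2=block, 3=paragraph, 4=line, 5=word). A small sketch of filtering out the word-level text, using hand-made sample data rather than real tesseract output:

```python
# Hand-made sample mimicking pytesseract's Output.DICT layout (not real OCR results)
data = {
    'level':     [1, 2, 3, 4, 5, 5],
    'block_num': [0, 1, 1, 1, 1, 1],
    'par_num':   [0, 0, 1, 1, 1, 1],
    'line_num':  [0, 0, 0, 1, 1, 1],
    'word_num':  [0, 0, 0, 0, 1, 2],
    'conf':      [-1, -1, -1, -1, 96, 88],
    'text':      ['', '', '', '', 'HELLO', 'WORLD'],
}

def words_above(data, min_conf=0):
    """Collect word-level entries (level 5) with nonempty text above min_conf."""
    return [t for lvl, conf, t in zip(data['level'], data['conf'], data['text'])
            if lvl == 5 and float(conf) >= min_conf and t.strip()]

print(words_above(data))  # ['HELLO', 'WORLD']
```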
The tesseract wrapper preserves this structured information in the output MMIF by creating LAPPS Paragraph, Sentence, and Token annotations corresponding to the block, line, and word levels in the tesseract output.
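That mapping can be pictured as grouping word entries by their block and line numbers: one Token per word, one Sentence per line, one Paragraph per block. A sketch under the same parallel-list assumption, with hand-made rows (this is an illustration, not the app's actual code):

```python
from collections import defaultdict

# Hand-made word-level rows: (block_num, line_num, text)
words = [(1, 1, 'HELLO'), (1, 1, 'WORLD'), (1, 2, 'AGAIN'), (2, 1, 'BYE')]

sentences = defaultdict(list)   # Sentence  <- tesseract line
paragraphs = defaultdict(set)   # Paragraph <- tesseract block
for block, line, text in words:
    sentences[(block, line)].append(text)  # each text would become a Token
    paragraphs[block].add(line)

print(len(sentences), len(paragraphs))  # 3 2
```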
General user instructions for CLAMS apps are available in the CLAMS Apps documentation. Below is additional information specific to this app.
- tesseract-ocr: this tool relies on the Tesseract OCR engine (the container image is built with tesseract-ocr version 5.3 on Debian Bookworm; see https://packages.debian.org/source/bookworm/tesseract)
- pytesseract: the Python library used to call tesseract
- mmif-python[cv]: required for the VideoDocument helper functions
For the full list of parameters, please refer to the app metadata in the CLAMS App Directory or the metadata.py file in this repository.