DeOCR

DeOCR (de-cor) is a reverse OCR tool that renders Hugging Face-compatible datasets as images of a specified size (e.g., 512x512). It can serve as a text-to-image data pre-processing component in pipelines such as DeepSeek-OCR.

---
title: DeOCR Usage in LLM Pipeline
---
flowchart LR
  TEXTDATA[/"some context in text form"/]
  MMDATA[/"Does this particular car <br/> &lt;image&gt; present in here &lt;image&gt; ?"/]
  HFDATASET[("huggingface dataset")] 
  subgraph DeOCR
    CSS1["cli --style red-text textit"]
    CSS2["cli --style default"]
    CSS3["cli --style default"]
    MAPPER["DeOCR Dataset Mapper"]
  end
  TEXTDATA --> CSS1 --> IMG1[["some context in text form"]]:::redText
  TEXTDATA --> CSS2 --> IMG2[["some context in text form"]]
  MMDATA --> CSS3 --> IMG3[["Does this particular car <br/> 🖼️🖼️🖼️🖼️🖼️🖼️🖼️<br/>🖼️🖼️🖼️🚗🖼️🖼️🖼️<br/>🖼️🖼️🖼️🖼️🖼️🖼️🖼️<br/> present in here <br/> 🖼️🖼️🖼️🖼️🖼️🖼️🖼️<br/>🖼️🖼️🖼️🖼️🖼️🖼️🖼️<br/>🖼️🖼️🖼️🖼️🖼️🖼️🖼️<br/>?"]]
  HFDATASET --> MAPPER --> DEOCRDATASET[("🖼️ imagified dataset")]
  DEOCRDATASET & IMG1 & IMG2 & IMG3 -.-> MODEL["LLMs or VLMs<br/> Evaluation"]
  classDef redText color:#ff0000,font-style:italic;
  IMG1 ~~~|"fa:fa-mobile-screen A screenshot of text <br/>w. special formatting"| IMG1
  IMG2 ~~~|"fa:fa-mobile-screen A plain screenshot of text"| IMG2
  IMG3 ~~~|"fa:fa-mobile-screen A screenshot of both text and images"| IMG3
Here is an example output, sized `512x512`, rendered from a random string as context:

a 512x512 example
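
The flow in the diagram can be pictured as three steps: wrap the text in styled HTML, screenshot it in a headless browser at a fixed viewport, and apply that to each row of a dataset. The following is a minimal sketch of that general idea, not DeOCR's actual implementation or API. The function names (`text_to_html`, `render_png`, `imagify_row`), the `text` column name, and the `image_png` output key are all hypothetical; the only real dependency assumed is Playwright's sync API with Chromium installed via `playwright install chromium`.

```python
import html


def text_to_html(text: str, css: str = "", size: int = 512) -> str:
    """Wrap plain text in a minimal HTML page sized to a square canvas."""
    # Escape so tokens such as "<image>" are shown as text, not parsed as tags.
    escaped = html.escape(text)
    return (
        "<!DOCTYPE html><html><head><style>"
        f"html, body {{ margin: 0; width: {size}px; height: {size}px; }}"
        f"body {{ white-space: pre-wrap; overflow-wrap: anywhere; {css} }}"
        "</style></head>"
        f"<body>{escaped}</body></html>"
    )


def render_png(page_html: str, size: int = 512) -> bytes:
    """Screenshot the page in headless Chromium and return PNG bytes."""
    # Imported lazily so text_to_html stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": size, "height": size})
        page.set_content(page_html)
        png = page.screenshot()  # captures the viewport by default
        browser.close()
    return png


def imagify_row(row: dict, text_column: str = "text", renderer=render_png) -> dict:
    """Hypothetical per-row mapper: returns the new column to add to the row."""
    return {"image_png": renderer(text_to_html(row[text_column]))}


# Example: the red italic style from the diagram, expressed as plain CSS.
red_page = text_to_html(
    "some context in text form", css="color: #ff0000; font-style: italic;"
)
```

A function shaped like `imagify_row` could be passed to `Dataset.map` from the Hugging Face `datasets` library. Launching a browser per row is slow, so a real pipeline would likely reuse one browser across rows.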

Quick Start

pip install deocr
# activate your python environment, then install playwright deps
playwright install chromium
Alternatively, install from source:
# uv
uv add "deocr @ git+https://github.com/Moenupa/DeOCR.git"
# for pip or conda
pip install "git+https://github.com/Moenupa/DeOCR.git"
# activate your python environment, then install playwright deps
playwright install chromium
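
Either install route can be sanity-checked from Python before rendering anything. The sketch below only checks that the packages are importable. It assumes the import name matches the PyPI name `deocr` (an assumption, not confirmed here), and `missing_modules` is a hypothetical helper. It does not verify that the Chromium binary was downloaded.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "deocr" as an import name is an assumption; "playwright" is the real package name.
    missing = missing_modules(["deocr", "playwright"])
    print("missing:", missing if missing else "none")
```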
For development

Please use uv to manage the environment:

git clone https://github.com/Moenupa/DeOCR.git
cd DeOCR
uv venv
uv sync --dev
source .venv/bin/activate
playwright install chromium
pre-commit install
