This repository is dedicated to implementing Deep Learning-based Scene Text Recognition models using a two-step approach: Text Detection (TD) followed by Text Recognition (TR). The TD step employs YOLOv8, while the TR step uses a Convolutional Recurrent Neural Network (CRNN). The dataset used for training and evaluation is provided by TextOCR, with ~1M high-quality word annotations on TextVQA images. The dataset can be accessed through this Kaggle link.
Download the dataset manually or via the Kaggle API (a minimal download sketch follows the directory tree below), then extract it into the `datasets` directory. The directory should look something like this:
```
├── datasets
│   └── archive
│       ├── train_val_images
│       │   └── train_images
│       │       ├── img1.jpg
│       │       ├── img2.jpg
│       │       └── ...
│       ├── annot.csv
│       ├── annot.parquet
│       ├── img.csv
│       ├── img.parquet
│       └── TextOCR_0.1_train.json
├── demo
└── ...
```
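For reference, here is a minimal sketch of downloading and extracting the dataset with the Kaggle Python API. The dataset slug is an assumption based on the Kaggle page title, and valid Kaggle credentials (`~/.kaggle/kaggle.json`) are required:

```python
# Minimal sketch: fetch the TextOCR dataset via the Kaggle API.
# Assumption: the dataset slug below matches the Kaggle page linked above.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads ~/.kaggle/kaggle.json
api.dataset_download_files(
    "robikscube/textocr-text-extraction-from-images-dataset",  # assumed slug
    path="datasets/archive",
    unzip=True,  # extract the archive in place
)
```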
The project workflow is straightforward: given an image, text detection and recognition are performed by the YOLOv8 and CRNN models, respectively. YOLOv8 detects and extracts text instances, which are stored as a collection of cropped text images. These crops serve as input to the CRNN model, which recognizes the text within them. The final results are then plotted on the original image. An illustration of the workflow is presented below:
All scripts and notebooks are located under the `src/` directory:
- `yolov8_datagen.py`: reformats the dataset into the YOLOv8 training format for TD.
- `yolov8_workflow.ipynb`: provides a step-by-step guide on custom training and evaluating YOLOv8 models using the data generation script (`yolov8_datagen.py`). It covers the entire workflow from data preparation to model training.
- `crnn_datagen.py`: crops text instances from images and creates the TR dataset.
- `crnn_dataset.py`: manages the dataset for the CRNN model, including loading and preprocessing.
- `crnn_decoder.py`: contains the decoder implementation for the CRNN model.
- `crnn_model.py`: defines the architecture of the CRNN model for text recognition (an illustrative sketch follows this list).
- `crnn_predict.py`: predicts text instances using the trained CRNN model.
- `crnn_train.py`: trains the CRNN model on the prepared dataset.
- `crnn_evaluate.py`: evaluates the performance of the trained CRNN model on a validation set.
- `predict.py`: performs inference of the full STR workflow using the YOLOv8 detector and CRNN recognizer.
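For orientation, below is an illustrative PyTorch sketch of a CRNN in the spirit of `crnn_model.py`: a convolutional feature extractor that collapses the image height, a bidirectional LSTM reading features along the width, and a linear head producing per-timestep logits for CTC. The layer sizes and the 32-pixel input height are assumptions for illustration, not the repo's exact architecture:

```python
# Illustrative CRNN sketch: the CNN collapses image height to 1, the BiLSTM
# reads one feature vector per remaining column, and the linear layer emits
# per-timestep class logits for CTC (num_classes includes the blank).
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.cnn = nn.Sequential(                                          # in: (B, 1, 32, W)
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # -> 16 x W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 8 x W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                                          # -> 4 x W/4
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1)),                                          # -> 1 x W/4
        )
        self.rnn = nn.LSTM(256, 128, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x).squeeze(2)     # (B, 256, W/4): height collapsed to 1
        feats = feats.permute(0, 2, 1)     # (B, W/4, 256): one timestep per column
        out, _ = self.rnn(feats)           # (B, W/4, 256): bidirectional outputs
        return self.fc(out)                # (B, W/4, num_classes) CTC logits

# Example: four 32x128 grayscale crops -> (4, 32, num_classes) logits
logits = CRNN(num_classes=37)(torch.randn(4, 1, 32, 128))
```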
- To begin using the pretrained YOLO Text Detector, you can use the Ultralytics YOLO API through the CLI:

  ```bash
  yolo detect predict model=checkpoints/yolov8_5k.pt source=demo/TD.jpg
  ```
- To begin using the pretrained CRNN Text Recognizer, you can use the `crnn_predict.py` script through the CLI:

  ```bash
  python src/crnn_predict.py --cp_path checkpoints/crnn_s100k.pt --source demo/TR_Harris.png
  ```
- To begin using the full STR workflow of both YOLOv8 and CRNN, you can use the `predict.py` script through the CLI:
  - Image

    ```bash
    python src/predict.py --detector checkpoints/yolov8_5k.pt --recognizer checkpoints/crnn_s100k.pt --source demo/TD.jpg
    ```

  - Video

    ```bash
    python src/predict.py --detector checkpoints/yolov8_5k.pt --recognizer checkpoints/crnn_s100k.pt --source demo/street.mp4
    ```
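If you prefer the Python API over the CLI, the detect-and-crop half of the workflow can be sketched as follows. The YOLO calls follow the Ultralytics API and the checkpoint path comes from the demo commands above; saving crops to a `crops/` directory is an assumption for illustration (the repo's `predict.py` handles recognition and plotting itself):

```python
# Minimal sketch: run the YOLOv8 text detector and save each detected
# text instance as a cropped image, ready for the CRNN recognizer.
import os
import cv2
from ultralytics import YOLO

detector = YOLO("checkpoints/yolov8_5k.pt")   # checkpoint from the demo commands
image = cv2.imread("demo/TD.jpg")
result = detector(image)[0]                   # one input image -> one Results object

os.makedirs("crops", exist_ok=True)           # assumed output directory
for i, (x1, y1, x2, y2) in enumerate(result.boxes.xyxy.int().tolist()):
    cv2.imwrite(f"crops/text_{i}.png", image[y1:y2, x1:x2])
```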
The demo video is a portion of a street-view video from the Walking Around YouTube channel. The full video can be accessed here.
- YOLOv8 Small Model for Text Detection
  - Training Details:
    - Model: YOLOv8 Small
    - Fine-tuned on 5,000 images from the TextOCR dataset
    - Training Epochs: 20
  - Losses:
    - Train Box Loss: 1.305
    - Validation Box Loss: 1.2908
  - Performance Metrics:
    - Mean Average Precision (mAP50): 67.559%
- CRNN Pretrained Model for Text Recognition
  - Training Details:
    - Model: CRNN
    - Pretrained on the Synth90k dataset (link)
    - Fine-tuned on 100,000 cropped text images from the TextOCR dataset
    - Training Epochs: 5
    - CTC Decoder: greedy algorithm for faster inference time (a sketch follows this list)
  - Losses:
    - Train CTC Loss: 5.948
    - Validation CTC Loss: 4.664
  - Accuracy:
    - Validation Accuracy: 58%
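As a reference for the greedy CTC decoding mentioned above, here is a minimal sketch: take the argmax class at each timestep, collapse consecutive repeats, and drop blanks. The character set and the blank-at-index-0 convention are assumptions; see `src/crnn_decoder.py` for the actual implementation:

```python
# Minimal greedy CTC decode: argmax per timestep, collapse repeats, drop blanks.
# Assumptions: blank is class 0 and the alphabet is digits + lowercase letters.
import torch

CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed alphabet (36 chars)

def ctc_greedy_decode(logits: torch.Tensor) -> str:
    """Decode one sequence of shape (T, num_classes) into a string."""
    best_path = logits.argmax(dim=1).tolist()  # most likely class per timestep
    decoded, prev = [], -1
    for idx in best_path:
        if idx != 0 and idx != prev:           # skip blanks and repeated classes
            decoded.append(CHARSET[idx - 1])   # shift by 1 for blank at index 0
        prev = idx
    return "".join(decoded)

# Example with random logits over 37 classes (blank + 36 characters)
print(ctc_greedy_decode(torch.randn(32, 37)))
```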
For more information about training, evaluation, and dataset generation, refer to the source code.
Ensure all necessary dependencies are installed by running:

```bash
pip install -r requirements.txt
```
Feel free to customize and adapt the code to fit your specific requirements. If you encounter any issues or have suggestions for improvement, please open an issue or submit a pull request.
This project is heavily inspired by the Ultralytics and CRNN-PyTorch GitHub repositories:
- Ultralytics Documentation Page: Ultralytics
- CRNN Implementation in PyTorch: CRNN-Pytorch
- Original CRNN Research Paper: CRNN
- TextOCR - Text Extraction from Images Dataset on Kaggle: TextOCR
```bibtex
@software{Jocher_Ultralytics_YOLO_2023,
  author = {Jocher, Glenn and Chaurasia, Ayush and Qiu, Jing},
  license = {AGPL-3.0},
  month = jan,
  title = {{Ultralytics YOLO}},
  url = {https://github.com/ultralytics/ultralytics},
  version = {8.0.0},
  year = {2023}
}

@inproceedings{singh2021textocr,
  title = {{TextOCR}: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text},
  author = {Singh, Amanpreet and Pang, Guan and Toh, Mandy and Huang, Jing and Galuba, Wojciech and Hassner, Tal},
  booktitle = {The Conference on Computer Vision and Pattern Recognition},
  year = {2021}
}

@misc{shi2015endtoend,
  title = {An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition},
  author = {Baoguang Shi and Xiang Bai and Cong Yao},
  year = {2015},
  eprint = {1507.05717},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}
```
For any inquiries or feedback, please contact Fadhil Umar at fadhilumaraf.9a@gmail.com.