We adapted color-based and geometry-based data extraction for pie charts and for scatter plots and their variants (simple, dot and bubble) in this system.
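Here is one way color-based extraction can work for a pie chart. Every slice is a sector of the same disc, so its area, and therefore its pixel count, is proportional to its angle and hence to its share. Counting pixels per slice color gives the percentages. This is a minimal sketch and not this system's implementation. It assumes the slice colors are already known (for example from the legend) and are distinct, and that the image is loaded as an RGB numpy array. The function name `pie_shares_by_color` and the `tol` parameter are hypothetical.

```python
import numpy as np

def pie_shares_by_color(image, slice_colors, tol=10):
    """Estimate each pie slice's percentage by counting pixels of its color.

    image:        H x W x 3 uint8 RGB array containing the pie.
    slice_colors: dict mapping a slice label (e.g. read from the legend) to an (R, G, B) tuple.
    tol:          per-channel tolerance for anti-aliasing and compression noise (hypothetical default).
    Assumes slice colors differ by more than tol, so no pixel matches two slices.
    """
    pixels = image.astype(np.int16)  # signed type so the subtraction below cannot wrap around
    counts = {}
    for label, rgb in slice_colors.items():
        diff = np.abs(pixels - np.array(rgb, dtype=np.int16))
        # A pixel belongs to the slice if every channel is within tol of the slice color.
        counts[label] = int(np.all(diff <= tol, axis=-1).sum())
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    # Sector area is proportional to its angle, so pixel share approximates value share.
    return {label: 100.0 * n / total for label, n in counts.items()}

# Usage: a toy 10x10 "chart" with 30 red and 10 blue pixels on a white background.
img = np.full((10, 10, 3), 255, dtype=np.uint8)
img[:3, :] = (255, 0, 0)
img[3, :] = (0, 0, 255)
print(pie_shares_by_color(img, {"A": (255, 0, 0), "B": (0, 0, 255)}))  # {'A': 75.0, 'B': 25.0}
```

Background pixels match no slice color, so they drop out of the total automatically. Anti-aliased slice borders fall outside the tolerance and are lost from every slice roughly equally.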
The OCR module performs text detection and recognition on the chart image. We use deep-learning-based OCR: Character Region Awareness for Text Detection, CRAFT [Paper | Code], followed by a scene text recognition framework, STR [Paper | Code].
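Chart-extraction pipelines commonly use recognized text for axis calibration. Once the numeric tick labels along an axis have been read and located, a linear fit from their pixel positions to their values converts any marker position into a data value. The source does not say how this system maps pixels to values, so the following is a hypothetical illustration and not code from this repository. It assumes a linear (not logarithmic) axis. It also assumes each tick label has already been parsed into a number and paired with the pixel coordinate of its text-box center. `fit_axis` and `pixel_to_value` are names chosen for illustration.

```python
def fit_axis(tick_pixels, tick_values):
    """Least-squares fit of value = slope * pixel + intercept from recognized tick labels.

    tick_pixels: pixel coordinate of each tick label's box center along the axis.
    tick_values: the numbers the OCR read from those labels.
    Assumes a linear axis and at least two ticks at distinct positions.
    """
    n = len(tick_pixels)
    if n < 2 or n != len(tick_values):
        raise ValueError("need at least two matching (pixel, value) pairs")
    mean_p = sum(tick_pixels) / n
    mean_v = sum(tick_values) / n
    sxx = sum((p - mean_p) ** 2 for p in tick_pixels)
    if sxx == 0:
        raise ValueError("tick labels must sit at distinct pixel positions")
    sxy = sum((p - mean_p) * (v - mean_v) for p, v in zip(tick_pixels, tick_values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_p
    return slope, intercept

def pixel_to_value(pixel, axis):
    """Map a pixel coordinate (e.g. a scatter marker's center) to a data value."""
    slope, intercept = axis
    return slope * pixel + intercept

# Usage: y-axis ticks 0, 10, 20, 30 recognized at image rows 400, 300, 200, 100.
y_axis = fit_axis([400, 300, 200, 100], [0, 10, 20, 30])
print(round(pixel_to_value(150, y_axis), 6))  # 25.0
```

Image rows grow downward, so the fitted y-axis slope comes out negative without any special handling. A least-squares fit averages out small errors in the detected box positions, but a single misread label will skew it, so a robust fit may be preferable in practice.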
Before running the code:
- Download the pretrained model craft_mlt_25k.pth and place it at the following path:
  ChartDecode/CRAFT_TextDetector/craft_mlt_25k.pth
- Download the pretrained model TPS-ResNet-BiLSTM-Attn.pth and place it at the following path:
  ChartDecode/Deep_TextRecognition/TPS-ResNet-BiLSTM-Attn.pth
- The code was developed and tested on Python 3.6. Install the dependencies from the attached requirements.txt to avoid errors caused by compatibility issues.
- Finally, run main.py and provide the path to your chart image file. It generates the following output files, where filename stands for the name of your chart image:
  - data_filename.csv: the extracted data values, along with additional semantic attributes such as chart_type, title, x-title and y-title that support chart reconstruction and summarization
  - Reconstructed_filename.png: the chart reconstructed from the extracted data_filename.csv file
  - summ_filename.txt: a text summary of the chart, generated with a templated NLG approach based on our user-study observations
- The synthetically generated test dataset for this system, along with its results, is available at ChartDecode/SYNTHETIC_DATA.
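The text summary is produced with templated NLG: fixed sentence patterns whose slots are filled from the extracted data. The project's actual templates are derived from its user study and are not reproduced here. The following is a minimal hypothetical sketch for a pie chart. It assumes the title and per-slice percentages have already been read from the extracted data file. The function name `summarize_pie` and the template wording are illustrative only.

```python
def summarize_pie(title, shares):
    """Fill fixed sentence templates from extracted pie-chart data (hypothetical templates).

    title:  chart title recognized by OCR; an empty string means no title was found.
    shares: dict mapping each slice label to its percentage.
    """
    if not shares:
        return "No data values were extracted."
    # Rank slices from largest to smallest so the templates can name the extremes.
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    noun = "slice" if len(ranked) == 1 else "slices"
    if title:
        opening = 'The pie chart titled "{}" has {} {}.'.format(title, len(ranked), noun)
    else:
        opening = "The pie chart has {} {}.".format(len(ranked), noun)
    sentences = [opening]
    top_label, top_value = ranked[0]
    sentences.append("The largest slice is {} at {:.1f}%.".format(top_label, top_value))
    if len(ranked) > 1:
        low_label, low_value = ranked[-1]
        sentences.append("The smallest slice is {} at {:.1f}%.".format(low_label, low_value))
    return " ".join(sentences)

print(summarize_pie("Market share", {"A": 75.0, "B": 25.0}))
# The pie chart titled "Market share" has 2 slices. The largest slice is A at 75.0%. The smallest slice is B at 25.0%.
```

Keeping the templates fixed and only filling slots makes every summary predictable and easy to check against the data file. The cost is that the phrasing never adapts beyond what the templates anticipate.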