Skip to content

avadesh02/Sanskrit-letter-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Sanskrit-letter-dataset

This dataset has been created for the research and development of a Sanskrit OCR. The dataset contains 7702 images of sanskrit(Devanagari) letters belonging to 602 classes.

This is a publicly available dataset and can be used for further research.

The dataset is structured in the following manner: The data is in the form of a 2D array. The first dimension indicates the indexed letter in the dataset. The second dimension is again a 1D dimensional array containing three elements- The first element is the image in array form. The second element is the corresponding class index number. The third element is the corresponding English class Annotation.

Sample Images of Sanskrit letters in the dataset:

alt text alt text alt text

alt text alt text alt text

alt text alt text alt text

alt text

Please run dbreader.py to read and understand the structure of the dataset.

The dataset file is dev_letter_D.p.

print('reading...')
db = pickle.load(open("dev_letter_D.p","rb"))
print("Number of letter images in the dataset are:" + str(len(db)))
i = 0
cv2.imshow('a',db[i][0])
cv2.waitKey()
print("The Class index number of the sanskrit letter Image is : " + str(db[i][1]))
print("English Class Annotation of the sanskrit Image(I-Trans) is : " + str(db[i][2]))

To Cite this dataset in your academic research, please use the following citation :

Avadesh, Meduri, and Navneet Goyal. "Optical Character Recognition for Sanskrit Using Convolution Neural Networks." 
In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 447-452. IEEE, 2018.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •