Commit 07c56f1

Merge pull request opencv#17313 from hunter-college-ossd-spr-2020:revise-knn-tutorials
* Revise and expand kNN Python tutorials
* Correct NPTEL link
1 parent 0e1c7ed commit 07c56f1

2 files changed, +97 -96 lines changed

doc/py_tutorials/py_ml/py_knn/py_knn_opencv/py_knn_opencv.markdown

Lines changed: 34 additions & 30 deletions
@@ -4,20 +4,20 @@ OCR of Hand-written Data using kNN {#tutorial_py_knn_opencv}
 Goal
 ----

-In this chapter
-- We will use our knowledge on kNN to build a basic OCR application.
-- We will try with Digits and Alphabets data available that comes with OpenCV.
+In this chapter:
+- We will use our knowledge on kNN to build a basic OCR (Optical Character Recognition) application.
+- We will try our application on Digits and Alphabets data that comes with OpenCV.

 OCR of Hand-written Digits
 --------------------------

-Our goal is to build an application which can read the handwritten digits. For this we need some
-train_data and test_data. OpenCV comes with an image digits.png (in the folder
+Our goal is to build an application which can read handwritten digits. For this we need some
+training data and some test data. OpenCV comes with an image digits.png (in the folder
 opencv/samples/data/) which has 5000 handwritten digits (500 for each digit). Each digit is
-a 20x20 image. So our first step is to split this image into 5000 different digits. For each digit,
-we flatten it into a single row with 400 pixels. That is our feature set, ie intensity values of all
-pixels. It is the simplest feature set we can create. We use first 250 samples of each digit as
-train_data, and next 250 samples as test_data. So let's prepare them first.
+a 20x20 image. So our first step is to split this image into 5000 different digit images. Then for each digit (20x20 image),
+we flatten it into a single row with 400 pixels. That is our feature set, i.e. intensity values of all
+pixels. It is the simplest feature set we can create. We use the first 250 samples of each digit as
+training data, and the other 250 samples as test data. So let's prepare them first.
 @code{.py}
 import numpy as np
 import cv2 as cv
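
The split-and-flatten layout described above can be sanity checked in a few lines (a minimal sketch, assuming digits.png from opencv/samples/data/ is in the working directory):
@code{.py}
import numpy as np
import cv2 as cv

# Load the digits image and split it into the 50x100 grid of 20x20 cells
gray = cv.cvtColor(cv.imread('digits.png'), cv.COLOR_BGR2GRAY)
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]
x = np.array(cells)

print( x.shape )                  # (50, 100, 20, 20): 5000 cells of 20x20 pixels
print( x.reshape(-1,400).shape )  # (5000, 400): one 400-pixel row per digit
@endcode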
@@ -28,10 +28,10 @@ gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
 # Now we split the image to 5000 cells, each 20x20 size
 cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]

-# Make it into a Numpy array. It size will be (50,100,20,20)
+# Make it into a Numpy array: its size will be (50,100,20,20)
 x = np.array(cells)

-# Now we prepare train_data and test_data.
+# Now we prepare the training data and test data
 train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
 test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)

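The slicing above takes the left 50 columns of the cell grid for training and the right 50 for testing. The same reshape pattern on a small hypothetical grid, purely for illustration:
@code{.py}
import numpy as np

# Toy grid: 2 rows x 4 columns of 3x3 cells, analogous to the 50x100 grid above
toy = np.arange(2*4*3*3).reshape(2,4,3,3)
left  = toy[:,:2].reshape(-1,9)   # left half of the columns  -> (4, 9)
right = toy[:,2:4].reshape(-1,9)  # right half of the columns -> (4, 9)
print( left.shape, right.shape )
@endcode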
@@ -40,7 +40,7 @@ k = np.arange(10)
 train_labels = np.repeat(k,250)[:,np.newaxis]
 test_labels = train_labels.copy()

-# Initiate kNN, train the data, then test it with test data for k=1
+# Initiate kNN, train it on the training data, then test it with the test data with k=5
 knn = cv.ml.KNearest_create()
 knn.train(train, cv.ml.ROW_SAMPLE, train_labels)
 ret,result,neighbours,dist = knn.findNearest(test,k=5)
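
For orientation, the three arrays returned by findNearest can be inspected as follows (a sketch, assuming the variables from the snippet above, with 2500 test rows and k=5):
@code{.py}
# result holds the predicted label for each test row,
# neighbours the labels of the k nearest training samples,
# dist the distances to those neighbours
print( result.shape, neighbours.shape, dist.shape )  # (2500, 1) (2500, 5) (2500, 5)
print( neighbours[0], result[0] )  # the 5 neighbour labels and the winning vote
@endcode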
@@ -52,13 +52,15 @@ correct = np.count_nonzero(matches)
 accuracy = correct*100.0/result.size
 print( accuracy )
 @endcode
-So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option
-improve accuracy is to add more data for training, especially the wrong ones. So instead of finding
-this training data every time I start application, I better save it, so that next time, I directly
-read this data from a file and start classification. You can do it with the help of some Numpy
-functions like np.savetxt, np.savez, np.load etc. Please check their docs for more details.
+So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option to
+improve accuracy is to add more data for training, especially for the digits where we had more errors.
+
+Instead of finding
+this training data every time I start the application, I better save it, so that the next time, I can directly
+read this data from a file and start classification. This can be done with the help of some Numpy
+functions like np.savetxt, np.savez, np.load, etc. Please check the NumPy docs for more details.
 @code{.py}
-# save the data
+# Save the data
 np.savez('knn_data.npz',train=train, train_labels=train_labels)

 # Now load the data
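
The loading half of this snippet continues past the end of the hunk. For reference, reading the arrays back with np.load might look like this (a minimal sketch; if the data was saved as np.uint8 as suggested below, convert back to np.float32 after loading):
@code{.py}
# Load the saved training data back from the .npz archive
with np.load('knn_data.npz') as data:
    train = data['train']
    train_labels = data['train_labels']
@endcode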
@@ -71,36 +73,36 @@ In my system, it takes around 4.4 MB of memory. Since we are using intensity val
 features, it would be better to convert the data to np.uint8 first and then save it. It takes only
 1.1 MB in this case. Then while loading, you can convert back into float32.

-OCR of English Alphabets
+OCR of the English Alphabet
 ------------------------

-Next we will do the same for English alphabets, but there is a slight change in data and feature
+Next we will do the same for the English alphabet, but there is a slight change in data and feature
 set. Here, instead of images, OpenCV comes with a data file, letter-recognition.data in
 opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, on first sight, look
-like garbage. Actually, in each row, first column is an alphabet which is our label. Next 16 numbers
-following it are its different features. These features are obtained from [UCI Machine Learning
+like garbage. Actually, in each row, the first column is a letter which is our label. The next 16 numbers
+following it are the different features. These features are obtained from the [UCI Machine Learning
 Repository](http://archive.ics.uci.edu/ml/). You can find the details of these features in [this
 page](http://archive.ics.uci.edu/ml/datasets/Letter+Recognition).

-There are 20000 samples available, so we take first 10000 data as training samples and remaining
-10000 as test samples. We should change the alphabets to ascii characters because we can't work with
-alphabets directly.
+There are 20000 samples available, so we take the first 10000 as training samples and the remaining
+10000 as test samples. We must convert the letters to numbers, since we cannot work with the
+letters directly.
 @code{.py}
 import cv2 as cv
 import numpy as np

-# Load the data, converters convert the letter to a number
+# Load the data and convert the letters to numbers
 data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
                     converters= {0: lambda ch: ord(ch)-ord('A')})

-# split the data to two, 10000 each for train and test
+# Split the dataset in two, with 10000 samples each for training and test sets
 train, test = np.vsplit(data,2)

-# split trainData and testData to features and responses
+# Split trainData and testData into features and responses
 responses, trainData = np.hsplit(train,[1])
 labels, testData = np.hsplit(test,[1])

-# Initiate the kNN, classify, measure accuracy.
+# Initiate the kNN, classify, measure accuracy
 knn = cv.ml.KNearest_create()
 knn.train(trainData, cv.ml.ROW_SAMPLE, responses)
 ret, result, neighbours, dist = knn.findNearest(testData, k=5)
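
Because the labels were built as ord(ch)-ord('A'), the numeric predictions can be mapped back to letters for display (a small sketch, assuming result from the snippet above):
@code{.py}
# Map the numeric predictions back to letters
predicted_letters = [chr(int(v) + ord('A')) for v in result.ravel()]
print( predicted_letters[:10] )
@endcode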
@@ -110,10 +112,12 @@ accuracy = correct*100.0/10000
 print( accuracy )
 @endcode
 It gives me an accuracy of 93.22%. Again, if you want to increase accuracy, you can iteratively add
-error data in each level.
+more data.

 Additional Resources
 --------------------
+1. [Wikipedia article on Optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition)

 Exercises
 ---------
+1. Here we used k=5. What happens if you try other values of k? Can you find a value that maximizes accuracy (minimizes the number of errors)?
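
As a starting point for the exercise, a sweep over k on the digits data might look like this (a sketch, assuming knn, test, and test_labels from the digits section are still in scope):
@code{.py}
# Try several values of k and report the accuracy of each
for k in range(1, 11):
    ret, result, neighbours, dist = knn.findNearest(test, k=k)
    accuracy = np.count_nonzero(result == test_labels) * 100.0 / result.size
    print( k, accuracy )
@endcode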
