
Applied computer vision principles with Watson Visual Recognition service to detect and identify objects and faces in the images.


i-krishna/Watson-VisualRecognition


Watson-Visual-Recognition (WVR) Summary

Bigger picture

Watson has various services that you weave together to solve the user’s problem. Watson does not just know; it has to be taught. Cognitive systems are not programmed; they are trained. There are five key Watson patterns: Engagement, Discovery, Decision, Policy, and Exploration.

Discover - Vision - Visual Recognition

Let us look further into one Watson API learning model, Visual Recognition. WVR has six basic models, shown below.

[Figure: the six basic WVR models]

Working of various models

Consider the tyre image from the online demo and go through its classification results, shown below.

[Figure: classification results for the tyre image]

Get the API key credentials for the Visual Recognition service from your IBM Cloud account.

Then submit the images in img_db as input using curl, Swagger, or Postman.
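The curl commands in this README all repeat the same gateway URL, version date and authentication. A minimal shell sketch that factors them out is shown here; the URL and version come from the commands below, while the WVR_APIKEY environment variable and the CURL=echo dry-run switch are hypothetical conveniences, not part of the service.

```shell
# Hypothetical helper: WVR_APIKEY holds your IBM Cloud API key, and
# setting CURL=echo prints the arguments instead of sending the request.
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# wvr_classify IMAGE [EXTRA_CURL_ARGS...]
# Sends IMAGE (a single image or a .zip of images) to /v3/classify.
wvr_classify() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"   # fail early without a key
  image="$1"
  shift
  # ${CURL:-curl} is left unquoted on purpose so CURL can name a command.
  ${CURL:-curl} -X POST -u "apikey:${WVR_APIKEY}" \
    --form "images_file=@${image}" "$@" \
    "${WVR_URL}/classify?version=${WVR_VERSION}"
}
```

For example, WVR_APIKEY=... wvr_classify img_db/fruitbowl.jpg -F "classifier_ids=food" should send the same request as the food-model command in Assertion 3, and prefixing CURL=echo shows the arguments without contacting the service.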

In each example below, the input image is on the left and the JSON response is on the right.

WVR Food model

WVR Face model

WVR General model

WVR Custom model

We created a custom model that classifies dogs and supplied cat images as negative examples. The JSON response below shows the custom classifier dogs_2025763446 in its training phase.

Once training is done, make a GET request for the classifier and confirm that its status is ready before submitting test samples.
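The status check can be automated with a polling loop. This is a sketch, assuming the GET /v3/classifiers/{classifier_id} response carries a single "status" field that reads ready once training completes (failed is treated as a terminal error). wvr_status_of and wvr_wait_until_ready are hypothetical helpers, the status is extracted with crude text matching instead of a JSON parser such as jq, and the retry count and 30-second pause are arbitrary.

```shell
# Hypothetical helpers; WVR_APIKEY and CURL=<command> as in the classify sketch.
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# wvr_status_of JSON: print the value of the "status" field.
# Text matching only, not a real JSON parser; assumes "status" appears once.
wvr_status_of() {
  printf '%s\n' "$1" | tr -d '\n' |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z_]*\)".*/\1/p'
}

# wvr_wait_until_ready CLASSIFIER_ID [MAX_TRIES] [PAUSE_SECONDS]
# Returns 0 once the status is "ready"; returns 1 on "failed" or timeout.
wvr_wait_until_ready() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"
  id="$1"
  tries="${2:-60}"
  pause="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    json=$(${CURL:-curl} -s -u "apikey:${WVR_APIKEY}" \
      "${WVR_URL}/classifiers/${id}?version=${WVR_VERSION}")
    state=$(wvr_status_of "$json")
    case "$state" in
      ready) return 0 ;;
      failed) echo "training failed for ${id}" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$pause"
  done
  echo "timed out waiting for ${id}" >&2
  return 1
}
```

Usage: wvr_wait_until_ready dogs_2025763446 && echo "ready for test samples".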

Below are the JSON responses for one positive sample (a golden retriever) and one negative sample (apples) passed to the custom classifier dogs_2025763446.

The documentation lists the limitations of WVR custom models.

Issues / Limitations

Assertion 1: The documentation says that the form parameter images_file can be a single image or a .zip file with at most 20 images, and that such a .zip file can be at most 100 MB. This is not ideal for real-time video classification at more than 20 frames per second, since a single request carries at most 20 images.
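Working within these limits means splitting a larger image set into several archives. A sketch, assuming the images sit in one directory as .jpg, .jpeg or .png files; split_into_batches and the PREFIX_N.txt list-file naming are hypothetical, and the 100 MB archive limit is not checked.

```shell
# split_into_batches DIR PREFIX
# Writes the image paths found in DIR into list files PREFIX_1.txt,
# PREFIX_2.txt, ... with at most 20 paths each (the per-request limit).
# It does NOT check the 100 MB .zip size limit.
split_into_batches() {
  dir="$1"
  prefix="$2"
  max=20
  rm -f "${prefix}"_*.txt          # start from a clean set of list files
  count=0
  for f in "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.png; do
    [ -e "$f" ] || continue        # skip patterns that matched nothing
    batch=$((count / max + 1))
    printf '%s\n' "$f" >> "${prefix}_${batch}.txt"
    count=$((count + 1))
  done
  echo "$count images in $(( (count + max - 1) / max )) batches"
}
```

Each list can then be zipped with Info-ZIP reading file names from standard input, e.g. for list in batch_*.txt; do zip -j "${list%.txt}.zip" -@ < "$list"; done, and each .zip submitted as images_file. Keeping list files separate from the zipping step makes the batching testable without the zip tool.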


Assertion 2: The general model does not report every object in an image. For example, the apples in fruitbowl.jpg do not appear in the JSON response to the following request.

curl -X POST -u "apikey:{apikey}" --form "images_file=@/Users/krishna/Desktop/img_db/fruitbowl.jpg" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
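To compare the general and food models in a single call, the classifier_ids form field can, as we understand the v3 API reference, take a comma-separated list that includes the built-in IDs default and food. This is an untested sketch under that assumption; wvr_classify_ids, WVR_APIKEY and the CURL=echo dry-run switch are hypothetical.

```shell
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# wvr_classify_ids IMAGE CLASSIFIER_IDS
# Applies a comma-separated list of classifiers (e.g. default,food)
# to one image in a single /v3/classify request.
wvr_classify_ids() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"
  ${CURL:-curl} -X POST -u "apikey:${WVR_APIKEY}" \
    --form "images_file=@${1}" \
    --form "classifier_ids=${2}" \
    "${WVR_URL}/classify?version=${WVR_VERSION}"
}
```

Usage: wvr_classify_ids img_db/fruitbowl.jpg default,food. If the service accepts the list, the response should contain one entry per classifier under the image's classifiers array.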

Assertion 3: Because the general model in Assertion 2 does not show all the objects in the food image, we now pass food as the classifier ID. Even then, not all fruits (oranges, for example) are classified at the default threshold, as shown below.

Test with the sample image fruitbowl.jpg

curl -X POST -u "apikey:{apikey}" --form "images_file=@/Users/krishna/Desktop/img_db/fruitbowl.jpg" -F "classifier_ids=food" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"

Threshold of 0 and 0.5 (default)

Threshold of 0.6 and 0.9

Note:

  • In fruitbowl.jpg (640 × 426 pixels), apples and bananas are not recognized when the threshold is above 0.6. At the default threshold of 0.5, or any lower value, apples and bananas are recognized. Raising the threshold keeps only higher-confidence predictions, but some objects can then drop out of the results completely.
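The threshold comparison above can be scripted so each image is tested at several values in one run. A sketch, assuming the threshold form field of /v3/classify (the minimum score a class needs to be returned, default 0.5), which is the value the captions above vary; sweep_thresholds, WVR_APIKEY and the CURL=echo dry-run switch are hypothetical.

```shell
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# sweep_thresholds IMAGE THRESHOLD...
# Repeats the food-model request once per threshold so the JSON
# responses can be compared side by side.
sweep_thresholds() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"
  image="$1"
  shift
  for t in "$@"; do
    echo "== threshold ${t} =="
    ${CURL:-curl} -s -X POST -u "apikey:${WVR_APIKEY}" \
      --form "images_file=@${image}" \
      --form "classifier_ids=food" \
      --form "threshold=${t}" \
      "${WVR_URL}/classify?version=${WVR_VERSION}"
    echo   # blank line between responses
  done
}
```

Usage: sweep_thresholds img_db/fruitbowl.jpg 0 0.5 0.6 0.9 reproduces the four settings tested above with one command.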

Test with another sample image, Apples_green_red.jpg

Default threshold of 0.5

Threshold of 0.7 and 0.8

Note:

  • In Apples_green_red.jpg (342 × 147 pixels), none of the objects are recognized when the threshold is raised to 0.8. This image has a lower resolution than fruitbowl.jpg (640 × 426 pixels).
  • Instead of tuning the threshold to improve the predictions, we can keep the default threshold of 0.5 and submit higher-resolution images.
  • More broadly, in these tests the usable threshold appears to rise with image quality: objects in higher-quality images are still recognized at higher thresholds, while objects in lower-quality images are recognized only at lower thresholds. Some tips on choosing the right threshold value for custom classifiers are shown here: 3rd point in Questions.
  • The documentation also says that images in the training and testing sets should resemble each other; significant visual differences between the two sets will result in poor performance. Beyond image resolution, a number of other factors affect the quality of training: lighting, angle, focus, color, shape, distance from the subject, and the presence of other objects in the image.
  • So far, we have tested images against pre-trained (built-in) classifiers, whose training images we have no control over. Custom classifiers give much more control over the training and test samples, so the points above can be applied to improve accuracy.

Assertion 3: Faces are detected in the food image fruitbowl.jpg by the following command:

curl -X POST -u "apikey:{apikey}" --form "images_file=@/Users/krishna/Desktop/img_db/fruitbowl.jpg" "https://gateway.watsonplatform.net/visual-recognition/api/v3/detect_faces?version=2018-03-19"

Assertion 4: The documentation says that, for a given image, age and gender are classified by the general model. However, the JSON responses to curl requests using the general model for the Ginni / Trump images above do not show such a classification.

Assertion 5: The general model does not detect multiple faces within a single image, as shown below.

General model response for face detection


For such classification to happen, we have to explicitly call the detect_faces endpoint when submitting the image through curl, as shown below. This means we have to know whether an image contains faces, objects, or food before submitting it.

detect_faces endpoint called in the curl request

curl -X POST -u "apikey:{apikey}" --form "images_file=@/Users/krishna/Desktop/img_db/6_faces_in_single_image.jpg.jpg" "https://gateway.watsonplatform.net/visual-recognition/api/v3/detect_faces?version=2018-03-19"
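One workaround for not knowing the image content in advance is to send every image to both endpoints and read whichever response is relevant. A sketch under the same hypothetical WVR_APIKEY and CURL=echo conventions; analyze_image is a hypothetical helper, and it doubles the number of requests per image.

```shell
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# analyze_image IMAGE
# Sends the same image to /v3/classify and /v3/detect_faces,
# printing a header line before each JSON response.
analyze_image() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"
  for endpoint in classify detect_faces; do
    echo "== ${endpoint} =="
    ${CURL:-curl} -s -X POST -u "apikey:${WVR_APIKEY}" \
      --form "images_file=@${1}" \
      "${WVR_URL}/${endpoint}?version=${WVR_VERSION}"
    echo   # blank line between responses
  done
}
```

Usage: analyze_image img_db/6_faces_in_single_image.jpg.jpg returns the general-model classes and the face detections for the same file.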

Assertion 6: Response time grows with the number of image files. A single image takes less than 1 second, while .zip files with 5 and 22 images take 2.5 seconds and more than 8 seconds, respectively.

Time < 1 s for 1 file

Time = 2.5 s for 5 files

Time > 8 s for 22 files
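These timings can be reproduced with curl's --write-out (-w) option, which prints %{time_total}, the total time of the transfer in seconds, after each request. time_request is a hypothetical helper using the same WVR_APIKEY and CURL=echo conventions; the figure it reports includes network latency, not only server-side processing.

```shell
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# time_request FILE
# Sends FILE (an image or a .zip) to /v3/classify, discards the JSON
# body, and prints only curl's total transfer time in seconds.
time_request() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"
  ${CURL:-curl} -s -o /dev/null -w '%{time_total}\n' -X POST \
    -u "apikey:${WVR_APIKEY}" --form "images_file=@${1}" \
    "${WVR_URL}/classify?version=${WVR_VERSION}"
}
```

Usage: for f in img_db/*; do printf '%s: ' "$f"; time_request "$f"; done prints one timing per file or archive.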

Detailed JSON response for the 22-image .zip file: only 20 files are processed, as specified in the documentation.

Assertion 7: The detailed JSON response above shows that images containing faces carry no age or gender information. We also passed images that combine categories, such as a face and food, food and text, or food and hands. In such cases, each JSON response is restricted to a single category.

Assertion 8: The current UI does not show a Train button for uploading images when creating a custom model, so we trained our custom models by passing the training datasets through curl requests. See the demo below for details.
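The training request itself is not reproduced in this README. A sketch of what such a curl call can look like, modeled on the form layout of POST /v3/classifiers (one CLASS_positive_examples .zip per class, an optional negative_examples .zip, and a name field). dogs.zip and cats.zip are hypothetical archive names for the dog and cat samples described under WVR Custom model; wvr_create_classifier, WVR_APIKEY and CURL=echo are hypothetical as well.

```shell
WVR_URL="https://gateway.watsonplatform.net/visual-recognition/api/v3"
WVR_VERSION="2018-03-19"

# wvr_create_classifier NAME CLASS POSITIVE_ZIP NEGATIVE_ZIP
# Starts training a custom classifier with one positive class and a set
# of negative examples; the response should report status "training".
wvr_create_classifier() {
  : "${WVR_APIKEY:?set WVR_APIKEY first}"
  ${CURL:-curl} -X POST -u "apikey:${WVR_APIKEY}" \
    --form "${2}_positive_examples=@${3}" \
    --form "negative_examples=@${4}" \
    --form "name=${1}" \
    "${WVR_URL}/classifiers?version=${WVR_VERSION}"
}
```

Usage: wvr_create_classifier dogs dogs dogs.zip cats.zip. The service then assigns an ID derived from the name, such as dogs_2025763446 above.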

Relevant studies

On-premises offering

References
