The main idea of this project is to present training methods based on the YOLOv5 neural network architecture. This project shows several ways to collect data, walks through the entire training process on the gathered data (plastic Coca-Cola bottles), and runs inference on Jetson devices. Training was done on a desktop machine, while the retrained model was benchmarked on a Jetson Nano using the official YOLOv5 repository.
The Jetson Nano is an AI single-board computer for embedded developers. This small embedded system is designed for prototyping solutions in the field of machine learning and artificial intelligence. Out of the box, it works with many peripherals designed for the Raspberry Pi and Adafruit boards. Thanks to the NVIDIA JetPack SDK, every necessary software tool comes pre-installed on the Linux system, which is based on the Ubuntu 18.04 distribution. It is therefore possible to benefit from software packages and tools such as CUDA, TensorRT and cuDNN. It is worth noting that this small and relatively powerful device costs around 99€/59€ (4GB/2GB of RAM). There are also more expensive models such as the Jetson TX1, Jetson TX2, Jetson Xavier NX, Jetson AGX Xavier and Jetson Orin. Official documentation with various code samples is available on the GitHub repository - link.
Dustin Franklin, a developer on the NVIDIA Jetson team, provides a complete GitHub repository with software implementations dedicated to the Jetson Nano. Any specific problem regarding the Jetson Nano can be reported in its issues section. In this project, I decided not to apply any pre-trained and optimized model from those tutorials, but to take a closer look at the YOLO (You Only Look Once) algorithm instead.
Nonetheless, GitHub is the best place for users to familiarize themselves with coding on the Jetson Nano. The images shown below present a Jetson Nano device.
Figure 1. Jetson Nano module integrated on the board B01
Figure 2. Board B01 designed by NVIDIA
- microSD card slot for main storage
- 40-pin expansion header
- Micro-USB port for 5V power input or for data
- Gigabit Ethernet port
- USB 3.0 ports (x4)
- HDMI output port
- DisplayPort connector
- DC Barrel jack for 5V power input
- MIPI CSI camera connectors
- Antmicro project: GitHub Link
Figure 3. Jetson Nano module integrated on Antmicro's baseboard
Figure 4. Baseboard designed by Antmicro
The collection of 250 Coca-Cola images includes different bottle positions and backgrounds. The images are divided into three folders: test (5%), train (70%) and valid (25%). Each of them contains subfolders with images and labels. All images have been resized to 416x416 pixels. The majority of the pictures contain only standing, fully filled Coca-Cola bottles.
- Create categories and specific label names (Category example "Plastic bottles", label names "Coca-Cola", "Pepsi" ...)
- Create tight bounding boxes
- Annotate all objects of interest
- Annotate occluded objects
- Annotate the entirety of an object
- Create clear annotation instructions
- Use good annotation tools
The annotations for these tasks take the form of bounding boxes and class names: the extreme coordinates of each bounding box and the class ID are set as the ground truth. Detection output comes in the same form, where the network predicts the bounding box coordinates of each object and its corresponding class label. Guide to Image Annotation
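For illustration, a single line of a YOLO (Darknet) .txt label file encodes one box as the class id followed by the box centre coordinates and size, all normalised by the image dimensions. The numbers below are made up, and with a single Coca-Cola class the id is always 0:

```text
0 0.512 0.430 0.210 0.650
```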
The default directory structure contains two main scripts: detect.py and train.py. The first one includes command-line options to parse arguments; it runs inference with the chosen data, weights, configuration, etc. The second one loads data, sets hyperparameters, imports a YOLO model and plots the results either in TensorBoard or with matplotlib. The weights directory contains the individual model weights as .pt files, the format typical for the PyTorch framework. The models directory includes the default .yaml configuration files. These files must be modified in order to train properly on a custom dataset: the number of classes should be reduced to a single one, standing for the one category class of a Coca-Cola bottle. Besides this, several files need to be converted from the PyTorch model format into the Open Neural Network Exchange (ONNX) format.
Figure 11. How to convert PyTorch to a TensorFlow model?
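As a rough illustration of that conversion step, the sketch below runs a plain torch.onnx.export on a trained checkpoint; the file names, input size and opset version are assumptions, and the YOLOv5 repository also provides its own export script for this:

```python
# Hedged sketch: export a trained YOLOv5 checkpoint to ONNX.
# Run from inside the YOLOv5 repo, since unpickling the checkpoint
# needs the repository's model classes on the Python path.
import torch

ckpt = torch.load("weights/last.pt", map_location="cpu")
model = ckpt["model"].float().eval()      # FP32 weights, inference mode

dummy = torch.zeros(1, 3, 416, 416)       # batch, channels, height, width
torch.onnx.export(model, dummy, "yolov5s_cola.onnx",
                  opset_version=12,
                  input_names=["images"], output_names=["output"])
```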
The utils directory includes additional files with methods used by the scripts presented above. Finally, all images are located in the data directory. To simplify the process of organising the folder structure, I created a Python script called create_data.py. It automatically divides the gathered images and labels into separate directories and converts the resolution of the images.
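The actual create_data.py lives in the repository; the snippet below is only a minimal sketch of the same idea under the 70/25/5 split described earlier, assuming each image's YOLO .txt label sits next to it in a raw_data folder:

```python
# Minimal sketch of a create_data.py-style helper: shuffle images,
# split them 70/25/5 into train/valid/test and resize to 416x416.
import random
import shutil
from pathlib import Path

from PIL import Image

SRC, DST = Path("raw_data"), Path("data")  # assumed folder names
images = sorted(SRC.glob("*.jpg"))
random.seed(0)          # reproducible split
random.shuffle(images)

n = len(images)
splits = {"train": images[: int(0.70 * n)],
          "valid": images[int(0.70 * n): int(0.95 * n)],
          "test":  images[int(0.95 * n):]}

for split, files in splits.items():
    img_dir, lbl_dir = DST / split / "images", DST / split / "labels"
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img_path in files:
        # Resize the image and copy its label file, if one exists.
        Image.open(img_path).resize((416, 416)).save(img_dir / img_path.name)
        label = img_path.with_suffix(".txt")
        if label.exists():
            shutil.copy(label, lbl_dir / label.name)
```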
The standard configuration for YOLOv5 models is a .yaml file, while the labels use the Darknet .txt format. The model .yaml is effectively a configuration file that defines the architecture and its parameters. All I had to do in the files yolov5s.yaml and yolov5x.yaml was to change the number of classes in the second line. The essential file for training is the dataset .yaml, which I keep in the directory with the images to avoid confusing it with the other configuration files. It contains the paths for training and validation, which should be modified accordingly. Furthermore, the number of classes should be set to 1, and the name of the single object class should be defined as Coca-Cola. An example of the COCO format (JSON) and its conversion into the YOLO format (Darknet .txt) can be found under this link
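A minimal dataset .yaml along those lines might look like the sketch below; the paths are assumptions and must match your own folder layout:

```yaml
train: ../data/train/images   # training images
val: ../data/valid/images     # validation images

nc: 1                         # number of classes: only the Coca-Cola bottle
names: ["Coca-Cola"]          # class names, index 0 = Coca-Cola
```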
This is truly the most time-consuming part of the entire project. Out of the 250 images, only 13 were set aside for testing purposes without bounding boxes. For labelling the images, I decided to use the web application LabelBox; I can also recommend Label Studio, an open-source project built on the Django framework. I uploaded all the resized images so that I could launch the label editor. There are several options for outlining objects: polygon, bounding box, polyline, point, entity and segmentation. After all the images have been annotated with bounding boxes, the next step is to download the labelled files, available either in .json or .csv format. It was necessary to convert them into the specific YOLO .txt format before training. Therefore, I uploaded everything to Roboflow, a smart web application that automatically pairs images with labels. It offers many functions, such as a dataset health check that can recognise incorrectly labelled images, and image transformations such as preprocessing and augmentation.
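If you prefer to do the conversion yourself instead of going through Roboflow, the sketch below shows the core of a COCO-JSON-to-YOLO-txt conversion; the export file name and the single class id 0 are assumptions:

```python
# Hedged sketch: convert COCO bounding boxes (top-left x, y, width, height
# in pixels) to YOLO lines (class x_center y_center width height, 0-1).
import json
from pathlib import Path

coco = json.loads(Path("export.json").read_text())  # assumed export name
sizes = {img["id"]: (img["width"], img["height"], img["file_name"])
         for img in coco["images"]}

labels = {}
for ann in coco["annotations"]:
    w_img, h_img, name = sizes[ann["image_id"]]
    x, y, w, h = ann["bbox"]
    line = (f"0 {(x + w / 2) / w_img:.6f} {(y + h / 2) / h_img:.6f} "
            f"{w / w_img:.6f} {h / h_img:.6f}")
    labels.setdefault(name, []).append(line)

# One .txt file per image, named after the image file.
for name, lines in labels.items():
    Path(name).with_suffix(".txt").write_text("\n".join(lines) + "\n")
```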
Figure 5. YOLOv5s batch size - 16, epochs - 100, training time - 10 minutes
In the case of YOLOv5, the training process is relatively easy to complete. After data preparation, all that is required is to run the training script train.py. I decided to take advantage of transfer learning using checkpoints, both of which were pre-trained on the COCO dataset (the default GPU common in the research field is the NVIDIA Tesla P100). The smallest weight file, YOLOv5s, holds 7.5 million parameters, while YOLOv5x holds 89 million. I trained four different networks to gain a better understanding of the impact on the final training results. My first attempt with YOLOv5s was surprising: with a batch size of 16 and 100 epochs, the whole training process on 237 images lasted only 10 minutes. In comparison, YOLOv5x training with batch size 8 (due to a problem with memory allocation) lasted 22 minutes. The rapid training process encouraged me to repeat the experiment with more epochs, so both models were then trained for 3000 epochs, with batch sizes of 32 and 6, again limited by memory allocation. This time the training times were not proportional: the bigger model trained for 10 hours and 30 minutes, while the smaller one took only 2 hours and 15 minutes. Overall, on the desktop machine (GTX 1080), YOLOv5s achieved a stable 100 FPS, which is impressive, while YOLOv5x obtained a stable 30 FPS.
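For reference, a 100-epoch YOLOv5s run along these lines can be launched with a command such as the one below; the paths are assumptions, while the flags are the standard train.py options:

python3 train.py --img 416 --batch 16 --epochs 100 --data data/dataset.yaml --cfg models/yolov5s.yaml --weights yolov5s.pt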
The impressive part is that it takes only 10 minutes for the neural network to learn to recognize the Coca-Cola pattern during training. The images below present the Generalized Intersection over Union (GIoU) and objectness losses for both training and validation data, detection precision with recall, and the mean Average Precision.
In this scenario, I decided to discuss only GIoU and mAP. GIoU is one of the most common evaluation metrics applied in object detection. In general, it measures the overlap between the predicted and ground-truth boxes while also taking into account the area of the smallest shape enclosing both, which makes it a good basis for a loss function (more details). 100 epochs seem insufficient to decrease it below a value of 0.02.
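For reference, GIoU extends plain IoU with a penalty based on the smallest enclosing box $C$ of the prediction $A$ and the ground truth $B$; this is the standard definition from the GIoU paper, not something specific to this project:

$$\mathrm{GIoU}(A,B) = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad \mathcal{L}_{\mathrm{GIoU}} = 1 - \mathrm{GIoU}(A,B)$$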
Figure 6. YOLOv5s batch size - 16, epochs - 100, training time - 10 minutes
Figure 7. YOLOv5x batch size - 8, epochs - 100, training time - 22 minutes
mAP (mean Average Precision), in turn, measures the accuracy of the examined object detector and is derived from precision and recall. Precision can be defined as the percentage of predictions that are correct, while recall measures the percentage of actual positive cases that the model manages to find. Average Precision is the integral of the area under the precision-recall curve; in other words, it quantifies the accuracy of the trained model. Both figures below present successfully trained models. The difference is that the larger YOLO architecture demands roughly five times more training time than the smaller one.
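In formulas, with $TP$, $FP$ and $FN$ denoting true positives, false positives and false negatives (standard definitions, given here only for reference):

$$\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad \mathrm{Recall} = \frac{TP}{TP+FN}, \qquad \mathrm{AP} = \int_0^1 p(r)\,dr$$

With a single class, mAP simply equals the AP of the Coca-Cola class.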
Figure 8. YOLOv5s batch size - 32, epochs - 3000, training time - 2.25 hours
Figure 9. YOLOv5x batch size - 6, epochs - 3000, training time - 10.5 hours
To deploy the retrained models, it is necessary to copy the weight file called last.pt into the weights directory of the project on the Jetson Nano. In order to run the retrained object detector in real time on a web camera, type the following command:
python3 detect.py --source 0 --weights weights/yolov5s_cola.pt
Here yolov5s_cola.pt is our weight file holding the learned parameters of the network. If no other peripherals are connected to the Jetson Nano, the parameter --source 0 selects the default camera device. The Python script detect.py invokes the methods that run inference on the web camera.
Tests done on the Jetson Nano demonstrated practical usage on this embedded platform. Accuracy remains essentially the same on the desktop machine and on the Jetson Nano; the only significant difference is speed. The stable 100 FPS of the small YOLOv5 version and 30 FPS of the extra-large version dropped to 15 FPS for YOLOv5s and 1.6 FPS for YOLOv5x, respectively. The gap reflects the difference in raw compute: roughly 472 GFLOPS on the Jetson Nano GPU versus 8873 GFLOPS on the NVIDIA GTX 1080 graphics card.
Additionally, I noticed that as the number of training iterations increases, the detector tends to focus only on the complete object. In other words, it will detect only fully filled 1.25 l Coca-Cola plastic bottles with high precision, because all the training images contained this sort of object.
Yolov5s 100 epochs | Yolov5s 3000 epochs |
---|---|
![]() | ![]() |
Yolov5x 100 epochs | Yolov5x 3000 epochs |
---|---|
![]() | ![]() |
Transfer Learning Before | Transfer Learning After |
---|---|
![]() | ![]() |
Please notice that after transfer learning has been applied, the Coca-Cola bottle displayed on the laptop screen is considered both a water bottle and a Coca-Cola bottle at the same time, due to the low prediction confidence in both cases.
Figure 10. Synthetic Data of Coca-Cola bottle
Synthetic data is data generated through computer programs such as Blender, game engines such as Unity, or dedicated synthetic-data-generation engines such as NVIDIA Omniverse Replicator.
11. Synthetic Data usage in the future
- Open the Blender file named Create_model.blend
- Click the Scripting workspace tab, then Run Script.
Example 1 | Example 2 |
---|---|
![]() | ![]() |
Example 3 | Example 4 |
---|---|
![]() | ![]() |
TODO:
- Fix issue with the wrong mesh textures.
How to debug Python scripts in VS Code [BLENDER]
This document outlines the unit tests for the Blender script. The tests cover various aspects of the script's functionality, ensuring that it behaves as expected under different scenarios.
- Test STL and image file paths exist
- Test image loading and existence
- Test camera and light creation
- Test mesh import and material assignment
- Test texture coordinate generation and mapping
- Test random mesh transformations
- Test rendering settings configuration
- Test image generation for scenes and angles
- Test image resolution and format
- Test camera removal after rendering
- Test handling of no valid mesh object
- Test error handling for file paths and image loading
- Test image generation limit per scene
- Test scene generation limit
- Test entire script execution end-to-end with different scenarios
To run the unit tests, follow these steps:
- Set up the necessary paths and variables in the TestBlenderScript class.
- Open the Blender Text Editor.
- Copy and paste the test code into a new script.
- Run the script to execute the unit tests.
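For orientation, a skeleton of such a test class might look like the sketch below; the class name follows the steps above, while the asset paths and test bodies are assumptions:

```python
# Hedged skeleton of the Blender script's unit tests; paths are assumptions.
import unittest
from pathlib import Path


class TestBlenderScript(unittest.TestCase):
    STL_PATH = Path("models/bottle.stl")        # assumed asset location
    TEXTURE_PATH = Path("textures/label.png")   # assumed texture image

    def test_file_paths_exist(self):
        # The script aborts early if its input assets are missing.
        self.assertTrue(self.STL_PATH.exists())
        self.assertTrue(self.TEXTURE_PATH.exists())


if __name__ == "__main__":
    unittest.main()
```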
This Blender script generates random scenes and renders images of a 3D model. It applies texture colour from an image file to the model and creates variations in scene composition; a minimal sketch of the idea follows the feature list below.
- Generates random scenes with texture-applied models
- Applies texture color from an image to the model
- Creates variations in mesh transformations and camera angles
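The following sketch shows the core of that idea with Blender's bpy API; the object name, transformation ranges and scene count are assumptions, not the actual script:

```python
# Hedged sketch of randomised scene generation in Blender (bpy API).
import math
import random

import bpy

model = bpy.data.objects["CocaColaBottle"]  # assumed object name

for i in range(10):  # assumed number of rendered scenes
    # Randomise the model's pose so every render differs.
    model.rotation_euler = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
    model.location.x = random.uniform(-0.5, 0.5)
    model.location.y = random.uniform(-0.5, 0.5)

    scene = bpy.context.scene
    scene.render.resolution_x = 416
    scene.render.resolution_y = 416
    scene.render.filepath = f"//renders/scene_{i:03d}.png"  # blend-relative path
    bpy.ops.render.render(write_still=True)
```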
- Open the Blender Text Editor.
- Copy and paste the script code into a new script.
- Customize the file paths and parameters as needed.
- Run the script to generate and render random scenes.
To ensure the functionality and correctness of the script, unit tests are available. Refer to the Unit Tests document for detailed test cases and instructions on running them.
- Blender 3.0 or later
TODO LIST
Tool: Blender
- Create / Apply a 3D Model of the Coca-Cola bottle
- New inference version: YOLOv8
- a) Create 3D Model using lidar & photogrammetry methods implemented in Scaniverse iPhone 13 Pro
- b) Create 3D Model using depth camera (IR laser projector + RGB Camera) implemented in RealSense D435
- Define custom scenes in Blender 🌟 NEW
- Render example scene with applied 3D model
- Create a script for random data generation
- Create a script to annotate images automatically