This project develops object detection models for aquatic environments. Stereo video captured with a ZED camera is processed in real time: YOLO models detect the objects, and stereo disparity maps provide depth estimates. The output consists of bounding boxes with estimated distances and an optional video output.
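The depth estimation mentioned above rests on the standard pinhole stereo relation Z = f * B / d, where d is the disparity in pixels, f the focal length in pixels, and B the baseline between the two lenses. The following is a minimal sketch of that relation for a rectified stereo pair. The function name and the calibration values are illustrative assumptions, not this project's code or calibration; in practice the ZED SDK supplies calibrated values and computes depth internally.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a stereo disparity (pixels) to depth (metres) via Z = f * B / d.

    Returns math.inf for non-positive disparities, which correspond to
    points at infinity or to failed stereo matches.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


# Illustrative values only (assumptions, not this project's calibration).
FOCAL_PX = 700.0    # focal length in pixels
BASELINE_M = 0.12   # distance between the two lenses in metres

depth_m = disparity_to_depth(20.0, FOCAL_PX, BASELINE_M)
print(f"{depth_m:.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects than for nearby ones.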
The proposed neural network models were validated by running inference on video captured with the ZED camera, following the steps below.
To run the code, you will need to install the following dependencies beforehand:
- Ultralytics >=8.3.20
- Python
- PyTorch
- ClearML 1.18.0
The best way to install Python dependencies is inside a virtual environment:
$ sudo apt install virtualenv
$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install numpy
To deactivate the virtual environment, run:
$ deactivate
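Once the environment is active, it can help to confirm that the listed dependencies are actually installed. The sketch below uses only the standard library. The helper names are hypothetical, the package names are the ones published on PyPI (`torch` for PyTorch), and treating the ClearML version as a minimum rather than an exact pin is an assumption.

```python
from importlib import metadata


def parse_version(text):
    """Turn '8.3.20' into (8, 3, 20); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


# Minimums taken from the dependency list; None means no version is stated.
# Assumption: the ClearML version is treated as a minimum, not an exact pin.
REQUIREMENTS = {"ultralytics": "8.3.20", "torch": None, "clearml": "1.18.0"}


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: status string} for every required package."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is None or meets_minimum(installed, minimum):
            report[name] = installed + " (ok)"
        else:
            report[name] = installed + " (needs >= " + minimum + ")"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```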
Datasense@CRAS/
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── .... (other image files)
│   └── labels/
│       ├── label1.txt
│       ├── label2.txt
│       └── .... (other label files)
├── valid/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── .... (other image files)
│   └── labels/
│       ├── label1.txt
│       ├── label2.txt
│       └── .... (other label files)
├── test/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── .... (other image files)
│   └── labels/
│       ├── label1.txt
│       ├── label2.txt
│       └── .... (other label files)
└── data.yaml
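Before training, the layout above can be sanity-checked. The sketch below assumes each split holds `images/` and `labels/` subfolders and that labels are paired with images by file stem (`images/frame.jpg` with `labels/frame.txt`), which is how Ultralytics locates labels; the file names in the tree are illustrative. The function name and the set of image suffixes are hypothetical choices for this example.

```python
from pathlib import Path

# Assumption: only these image types are present in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabelled_images(dataset_root, splits=("train", "valid", "test")):
    """Return {split: [image names without a matching labels/<stem>.txt]}."""
    root = Path(dataset_root)
    missing = {}
    for split in splits:
        image_dir = root / split / "images"
        label_dir = root / split / "labels"
        images = []
        if image_dir.is_dir():
            images = sorted(
                p for p in image_dir.iterdir()
                if p.suffix.lower() in IMAGE_SUFFIXES
            )
        # An image counts as unlabelled when no same-stem .txt file exists.
        missing[split] = [
            p.name for p in images
            if not (label_dir / (p.stem + ".txt")).is_file()
        ]
    return missing


# Example (the path is a placeholder):
# for split, names in find_unlabelled_images("Datasense@CRAS").items():
#     print(split, len(names), "images without labels")
```

Ultralytics treats images without a label file as background samples, so a non-empty result is worth reviewing but is not necessarily an error.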
- Original dataset: Datasense@CRAS.
- Original dataset: SeaDroneSee v2.
- Our dataset as a contribution: USVDD.
To launch training, run:
$ yolo detect train data=/path/data.yaml model=yolov8n.pt epochs=150 imgsz=640 batch=16 lr0=0.001 momentum=0.9 weight_decay=0.0005 plots=True save=True amp=True
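The same training run can also be started from Python through the Ultralytics `YOLO` class. This is a sketch that mirrors the hyperparameters of the command above; the `train` wrapper and `TRAIN_ARGS` name are hypothetical, and the data path is a placeholder, as in the CLI command.

```python
# Hyperparameters copied from the CLI command above.
TRAIN_ARGS = {
    "epochs": 150,
    "imgsz": 640,
    "batch": 16,
    "lr0": 0.001,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "plots": True,
    "save": True,
    "amp": True,
}


def train(data_yaml, weights="yolov8n.pt", **overrides):
    """Train a detector with the settings above; overrides replace defaults."""
    # Imported lazily so this module loads even without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **{**TRAIN_ARGS, **overrides})


# Example (placeholder path):
# train("/path/data.yaml")
```

Passing a different `weights` file or keyword overrides (for example `batch=8` on a memory-constrained GPU) reuses the remaining settings unchanged.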
- Datasense@CRAS: nano model, small model, medium model.
- SeaDroneSee v2: nano model, small model, medium model.
- USVDD: nano model, small model, medium model.
- JetPack SDK
- ZED SDK
- Ultralytics
- CUDA
To run inference with our models on the videos captured with the ZED camera, we use inference code based on the Python script detector.py provided by Stereolabs. We have also developed an equivalent C++ version with similar performance.
Testing was also performed with videos available in the Videos/ folder, using the nano, small, and medium models trained on the Datasense@CRAS and USVDD datasets for object detection. The results are available in the Videos/ObjectDetection/ folder.
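One generic way to attach a distance to a detection is to take the median of the valid depth values inside its bounding box, which is robust to holes in the depth map and to background pixels near the box edges. The sketch below illustrates that idea on a plain nested list. It is an assumption for illustration and not necessarily what detector.py or the C++ version does; the function name and sample depth grid are hypothetical.

```python
import math
from statistics import median


def box_distance(depth_map, box):
    """Median of the valid depth values inside box = (x1, y1, x2, y2).

    depth_map is a row-major grid of depths in metres; NaN, inf and
    non-positive entries are treated as invalid and skipped.
    Returns None when the box contains no valid depth.
    """
    x1, y1, x2, y2 = box
    height, width = len(depth_map), len(depth_map[0])
    # Clamp the box to the image so detections near the border are safe.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    values = [
        depth_map[y][x]
        for y in range(y1, y2)
        for x in range(x1, x2)
        if math.isfinite(depth_map[y][x]) and depth_map[y][x] > 0
    ]
    return median(values) if values else None


nan = float("nan")
# Hypothetical 3x4 depth grid: a near object on the left, far water elsewhere.
depth = [
    [5.0, 5.1, nan, 9.0],
    [5.2, 5.0, 5.1, 9.0],
    [9.0, 9.0, 9.0, 9.0],
]
print(box_distance(depth, (0, 0, 3, 2)))
```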
Title: "YOLO-Based Power-Efficient Object Detection on Edge Devices for USVs"
Journal: Journal of Real-Time Image Processing (Published)
DOI: https://doi.org/10.1007/s11554-025-01682-2
This paper has been partially funded by the EU (FEDER), the Spanish MINECO under grants PID2021-126576NB-I00 and TED2021-130123B-I00 funded by MCIN/AEI/10.13039/501100011033 and by European Union "ERDF A way of making Europe" and the NextGenerationEU/PRT. J.L.M. thanks the National Secretariat of Science, Technology and Innovation (SENACYT) of Panama for financial support during the completion of his PhD.