This repository implements YOLO inference on an FPGA with no software interaction, using the FINN framework, Vivado, and Verilog HDL. The target board is the ZCU102; a PYNQ-Z2 was also used for testing the algorithm.
1- You first have to train your network and retrieve the .pt weights file
I used a car-detection dataset in this project
2- Export your pre-trained model using file 1
3- Generate your IP through either file 2 or file 3 from the following notebooks
4- Verify the flow steps through file 4
5- Design your block diagram in Vivado as in file 6 after adding file 5 to your project sources
6- Simulate your system with the Vivado simulator using file 7 (check that the signal names match those in your project)
The first step after training is exporting the model to QONNX format. You will need the best.pt file to instantiate the pre-trained model before the export.
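The export step can look like the following sketch. It assumes a Brevitas-quantized model definition (the `QuantYOLO` class and the checkpoint layout are hypothetical placeholders for the model in this repo) and uses Brevitas's QONNX export helper:

```python
import torch
from brevitas.export import export_qonnx

from model import QuantYOLO  # hypothetical module holding the quantized YOLO definition

model = QuantYOLO()
# Load the trained weights obtained from step 1 (checkpoint key is an assumption).
model.load_state_dict(torch.load("best.pt", map_location="cpu"))
model.eval()

# Dummy input matching the expected image size (1 x 3 x 416 x 416).
dummy = torch.randn(1, 3, 416, 416)
export_qonnx(model, input_t=dummy, export_path="yolo_qonnx.onnx")
```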
The YOLO model used in this repository can be found in this repo
This is the main notebook of the project. It generates the streaming dataflow IP (the NN accelerator) using FINN; detailed explanations are in the markdown cells. Note that the notebook is static when viewed on GitHub, and you must first install FINN to be able to run it. The folding values in this notebook were set as in this repo.
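Outside the notebook, the same flow can be driven by FINN's dataflow builder. The sketch below is a configuration fragment only, not this repo's exact setup: the file paths, clock period, and folding-file name are assumptions, while the FPGA part is the ZCU102's XCZU9EG.

```python
from finn.builder.build_dataflow import build_dataflow_cfg
import finn.builder.build_dataflow_config as build_cfg

cfg = build_cfg.DataflowBuildConfig(
    output_dir="build_zcu102",
    synth_clk_period_ns=10.0,            # 100 MHz target clock (assumed)
    fpga_part="xczu9eg-ffvb1156-2-e",    # ZCU102
    folding_config_file="folding.json",  # per-layer PE/SIMD values (assumed name)
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.STITCHED_IP,
    ],
)
build_dataflow_cfg("yolo_qonnx.onnx", cfg)
```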
This notebook is almost the same as the previous one, but I reduced the parallelism (folding) in the network to lower the required resources and make deployment on the PYNQ-Z2 possible. The goal is to visualize the final output through the PYNQ Jupyter notebook.
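Lowering the folding amounts to shrinking the PE/SIMD attributes of the HLS layers. A configuration sketch, assuming the newer FINN op name `MatrixVectorActivation` (older versions use `StreamingFCLayer_Batch`) and hypothetical file names:

```python
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp

model = ModelWrapper("yolo_hls_layers.onnx")
for node in model.graph.node:
    if node.op_type == "MatrixVectorActivation":
        inst = getCustomOp(node)
        inst.set_nodeattr("PE", 1)    # minimum parallelism -> fewest resources
        inst.set_nodeattr("SIMD", 1)  # at the cost of throughput
model.save("yolo_hls_layers_lowfold.onnx")
```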
This notebook verifies the ONNX graphs generated internally during the flow, before the hardware conversion, by comparing their output values with those of the original model. The main function used is FINN's built-in execute_onnx.
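execute_onnx runs a graph on a given input dictionary; the pass/fail decision then reduces to an element-wise tolerance comparison. A minimal stdlib sketch of that comparison (function name and tolerances are hypothetical, mirroring what a numpy.isclose-based check would do):

```python
def outputs_match(expected, produced, rel_tol=1e-3, abs_tol=1e-5):
    """Element-wise closeness check between two flat lists of floats."""
    if len(expected) != len(produced):
        return False
    return all(
        abs(e - p) <= abs_tol + rel_tol * abs(e)
        for e, p in zip(expected, produced)
    )

# Example: outputs of the original model vs. a transformed ONNX graph.
ref = [0.12, -3.5, 7.25]
out = [0.12, -3.5, 7.2500001]
print(outputs_match(ref, out))  # True
```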
This Verilog module is the main core of the hardware inference. It post-processes the raw output of the YOLO network without any software interaction. Only the class score and objectness score are used; the object's position inside its grid cell is ignored, since it is sufficient to know whether each 32 x 32-pixel grid cell contains an object or not.
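As a software reference for the Verilog post-processor's decision logic, the per-cell check can be sketched in Python. The threshold value and the raw integer score layout are assumptions, not the module's exact parameters:

```python
def grid_has_object(objectness, class_score, threshold=128):
    """Decide whether a grid cell contains an object, using only the
    objectness and class scores (object position ignored)."""
    return objectness >= threshold and class_score >= threshold

def detect_grid(cells, threshold=128):
    """cells: list of (objectness, class_score) per grid cell.
    A 416x416 input with 32x32 cells gives 13*13 = 169 cells.
    Returns one detection bit per cell."""
    return [1 if grid_has_object(o, c, threshold) else 0 for o, c in cells]

# Example: two cells, the first one contains a car.
print(detect_grid([(200, 150), (30, 90)]))  # [1, 0]
```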
This is how the system should look for simulation. You can also integrate it into your own project or use it as a standalone system; just take care of the I/O mapping to generate your bitstream properly.
This is a simple testbench that simulates the system over the AXI-Stream protocol, feeding two consecutive images loaded from the two memory files. The images must be 416 x 416 pixels for correct results, and the pixel values must be represented in hexadecimal format.
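The memory files can be produced from an image with a short script. This is a sketch assuming 8-bit RGB pixels packed as one 24-bit hex word per line (the file name and channel packing are assumptions; adapt them to the word layout your testbench's $readmemh expects):

```python
def write_memfile(pixels, path):
    """Write RGB pixels as one 24-bit hex word per line for $readmemh.
    pixels: iterable of (r, g, b) tuples; a real image must be 416x416."""
    with open(path, "w") as f:
        for r, g, b in pixels:
            f.write("{:06x}\n".format((r << 16) | (g << 8) | b))

# Tiny example (a real image would have 416*416 = 173056 entries):
write_memfile([(255, 0, 0), (0, 128, 64)], "image0.mem")
print(open("image0.mem").read().split())  # ['ff0000', '008040']
```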