F-E3D: FPGA-based Acceleration of an Efficient 3D Convolutional Neural Network for Human Action Recognition
If you have any questions, please open an issue or email me at h.fan17@imperial.ac.uk.
This is the evaluation code and model for the paper "F-E3D: FPGA-based Acceleration of an Efficient 3D Convolutional Neural Network for Human Action Recognition". The whole implementation is based on MXNet.
Please cite our paper if it is helpful for your research.
- Set up your environment with Python and pip. You can use your native environment or Anaconda.
- Install CUDA 9.2 and cuDNN.
- Install MXNet with `pip install mxnet-cu92`.
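After installation, a quick sanity check (a minimal sketch, not part of this repository) can confirm that MXNet sees the GPU:

```python
# Minimal sanity check: verify MXNet is installed and can allocate memory on the GPU.
import mxnet as mx

print(mx.__version__)                    # should print the installed mxnet-cu92 version
x = mx.nd.zeros((2, 3), ctx=mx.gpu(0))   # fails here if CUDA/cuDNN is not set up correctly
print(x.context)                         # expected: gpu(0)
```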
- Download the UCF101 dataset from this link.
- Decompress the downloaded dataset and note its absolute path.
- Since training on the Kinetics dataset takes nearly one month even on a six-GPU cluster, the checkpoint of the E3DNet model pretrained on Kinetics is available at `ckpt/E3DNet_ckpt_kinetics` (see the loading sketch below).
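For reference, MXNet checkpoints are loaded with `mx.model.load_checkpoint`; the prefix and epoch below are hypothetical placeholders, so check the actual `*-symbol.json` / `*-NNNN.params` file names under `ckpt/E3DNet_ckpt_kinetics`:

```python
# Sketch only: the checkpoint prefix and epoch below are hypothetical placeholders.
import mxnet as mx

prefix = 'ckpt/E3DNet_ckpt_kinetics/E3DNet'   # hypothetical prefix, match your file names
epoch = 0                                     # hypothetical epoch number
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
print(sym.list_outputs())                     # inspect the network outputs
```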
- In `script/finetuning_E3DNet_ucf101.sh`, change the path of the UCF101 dataset to your local location.
- Run the script: `sh script/finetuning_E3DNet_ucf101.sh`
Optional: you can use `screen` to detach the training process from your terminal session.
- In `script/validation_ucf101.sh`, change the path of the UCF101 dataset to your local location and point the model path to the finetuned model.
- The accuracy of other models is shown in the table below.
Please note that the accuracy we report is Clip@1 accuracy (prediction using only 1 clip). Many papers report Video@1 accuracy (prediction averaged over 10 or 20 clips), which is impractical for real-life applications. A short sketch of the difference between the two metrics follows the table.
| Model | ResNeXt-101 (Link1, Link2) | P3D (Link) | C3D (Link) | E3DNet |
|---|---|---|---|---|
| Clip@1 Accuracy | 87.7% | 84.2% | 79.87% | 85.17% |
For models that do not report Clip@1 accuracy in their paper, we use the Clip@1 accuracy reported in their GitHub repository; the links in the table point to the corresponding GitHub implementations.
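To make the distinction concrete, here is a minimal sketch of the two metrics; `predict_clip` is a hypothetical function standing in for one forward pass of the model and is not part of this repository:

```python
import numpy as np

def clip_at_1_correct(predict_clip, clip, label):
    """Clip@1: the prediction for a video comes from a single clip."""
    probs = predict_clip(clip)                 # class probabilities for one clip
    return int(np.argmax(probs) == label)

def video_at_1_correct(predict_clip, clips, label):
    """Video@1: probabilities from N clips (typically 10 or 20) sampled from
    the same video are averaged before taking the arg-max."""
    probs = np.mean([predict_clip(c) for c in clips], axis=0)
    return int(np.argmax(probs) == label)
```

Video@1 therefore requires N forward passes per video, which is why Clip@1 is the more realistic metric for real-time inference.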