A higher-performance NVCaffe implementation of the Single-Shot Refinement Neural Network for Object Detection (RefineDet). This code is based on NVCaffe version 17.0.2. The official and original Caffe code can be found here.
This code has been verified on Ubuntu 16.04 LTS 64-bit with CUDA 9.0 and cuDNN 7.0.
Arch | Paper | Caffe Version | Our NVCaffe Version |
---|---|---|---|
RefineDet320 | 80.0% | 79.52% | 79.98% |
RefineDet512 | 81.8% | 81.85% | 81.8% |
RefineDet320 from Scratch | - | - | 72.27% |
- Clone this repository.
- Then download the dataset by following the instructions below.
make -j{cpu_core_num} all
make pycaffe
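The `{cpu_core_num}` placeholder in the build command above can be filled in automatically. The following is a minimal sketch, assuming a Bash shell with `nproc` or `getconf` available; `detect_jobs` and `build_all` are hypothetical helper names, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: derive the `make -j` job count from the machine instead of typing it by hand.

# Print the number of online CPU cores; fall back to 1 if it cannot be detected.
detect_jobs() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Build Caffe and the Python bindings (run from the repository root).
build_all() {
  local jobs
  jobs="$(detect_jobs)"
  echo "Building with ${jobs} parallel jobs"
  make -j"${jobs}" all && make pycaffe
}
```

Source the snippet and call `build_all` from the repository root. Nothing is built until the function is called.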
- First download the fc-reduced VGG-16 PyTorch base network weights at: https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
- By default, we assume you have downloaded the file in the `RefineDet.PyTorch/weights` dir:
cd models/VGGNet
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
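The download step above can be made safe to re-run. The following is a minimal sketch, assuming `wget` is installed; `fetch_weights` is a hypothetical helper name, and the default target directory `models/VGGNet` is taken from the `cd` command above.

```shell
#!/usr/bin/env bash
# Sketch: download the base network weights only if they are not already present.

WEIGHTS_URL="https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth"

# Usage: fetch_weights [target_dir]   (target_dir defaults to models/VGGNet)
fetch_weights() {
  local dir="${1:-models/VGGNet}"
  local file
  file="${dir}/$(basename "${WEIGHTS_URL}")"
  mkdir -p "${dir}"
  if [ -s "${file}" ]; then
    # A non-empty file is already there, so skip the download.
    echo "already present: ${file}"
  else
    # -c resumes a partial download; -P sets the directory to save into.
    wget -c -P "${dir}" "${WEIGHTS_URL}"
  fi
}
```

Calling `fetch_weights` with no argument mirrors the two commands above; pass a different directory if your weights live elsewhere.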
- To train RefineDet320 or RefineDet512, use the train scripts `train_refinedet320.sh` and `train_refinedet512.sh`. You can edit them as needed.
./refinedet_train.sh
- Note:
- For training, an NVIDIA GPU is strongly recommended for speed.
- For instructions on Visdom usage/installation, see the Installation section.
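The training step above can be wrapped so that the script is chosen by input size and the output is kept in a log file. This is only a sketch: `train_script_for` and `run_training` are hypothetical helper names, the script names are the two mentioned above, and their location in the current directory is an assumption to adjust for your checkout.

```shell
#!/usr/bin/env bash
# Sketch: choose the training script by input size and keep a log of the run.

# Map an input size (320 or 512) to the script name mentioned above.
train_script_for() {
  case "$1" in
    320|512) echo "train_refinedet$1.sh" ;;
    *) echo "unsupported input size: $1" >&2; return 1 ;;
  esac
}

# Run the chosen script and copy its output to a log file.
# ASSUMPTION: the script lives in the current directory.
run_training() {
  local script
  script="$(train_script_for "$1")" || return 1
  "./${script}" 2>&1 | tee "train_refinedet$1.log"
}
```

For example, `run_training 320` would run `./train_refinedet320.sh` and write `train_refinedet320.log` alongside it.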
To evaluate a trained network:
python test/test_refinedet.py
- Original Implementation (CAFFE)
- NVCAFFE
- A list of other great SSD ports that were sources of Readme.md:
NVIDIA Caffe (NVIDIA Corporation ©2017) is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations. Here are the major features:
- 16-bit (half) floating-point training and inference support.
- Mixed-precision support. Data can be stored and/or computed in 64-, 32- or 16-bit formats. Precision can be defined for every layer (forward and backward passes may differ too), or it can be set for the whole Net.
- Layer-wise Adaptive Rate Control (LARC) and adaptive global gradient scaler for better accuracy, especially in 16-bit training.
- Integration with cuDNN v7.
- Automatic selection of the best cuDNN convolution algorithm.
- Integration with v2.2 (or higher) of NCCL library for improved multi-GPU scaling.
- Optimized GPU memory management for data and parameters storage, I/O buffers and workspace for convolutional layers.
- Parallel data parser, transformer and image reader for improved I/O performance.
- Parallel back propagation and gradient reduction on multi-GPU systems.
- Fast solvers implementation with fused CUDA kernels for weights and history update.
- Multi-GPU test phase for even memory load across multiple GPUs.
- Backward compatibility with BVLC Caffe and NVCaffe 0.15 and higher.
- Extended set of optimized models (including 16 bit floating point examples).
- Experimental feature (no official support): multi-node training (since v0.17.1; NCCL 2.2 and OpenMPI 2 required).
- Experimental feature (no official support): TRTLayer (since v0.17.1; can be used as an inference plugin).
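As an illustration of the whole-Net precision setting in the mixed-precision item above, the sketch below prepends net-level precision fields to a copy of an existing prototxt. The field names (`default_forward_type`, `default_backward_type`, `default_forward_math`, `default_backward_math`) are given as I recall them from NVCaffe's `caffe.proto` and should be treated as an assumption to verify against your checkout; `make_fp16_copy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: create a copy of a net definition with FP16 storage and FP32 math set net-wide.
# ASSUMPTION: the four field names below match the NetParameter fields in your NVCaffe version.
# The source prototxt must not already set these fields, or the parser will
# reject them as specified more than once.

# Usage: make_fp16_copy <source.prototxt> <destination.prototxt>
make_fp16_copy() {
  local src="$1" dst="$2"
  {
    cat <<'EOF'
default_forward_type: FLOAT16
default_backward_type: FLOAT16
default_forward_math: FLOAT
default_backward_math: FLOAT
EOF
    cat "${src}"
  } > "${dst}"
}
```

The original file is left untouched, so the FP32 and FP16 variants can be trained side by side and compared.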
Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
@article{jia2014caffe,
Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
Journal = {arXiv preprint arXiv:1408.5093},
Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
Year = {2014}
}
Please read, sign and attach the enclosed agreement NVIDIA_CLA_v1.0.1.docx to your PR.
The libturbojpeg library has been used since 0.16.5. It has a packaging bug. Please execute the following (required for Makefile, optional for CMake):
sudo apt-get install libturbojpeg
sudo ln -s /usr/lib/x86_64-linux-gnu/libturbojpeg.so.0.1.0 /usr/lib/x86_64-linux-gnu/libturbojpeg.so
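The symlink command above fails if the link already exists or if the versioned library is installed under a different name. A guarded variant can be sketched as follows; `link_turbojpeg` is a hypothetical helper name, and the default library directory and file names are copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: create the libturbojpeg.so symlink only when it is missing and the target exists.

# Usage: link_turbojpeg [lib_dir]   (lib_dir defaults to /usr/lib/x86_64-linux-gnu)
link_turbojpeg() {
  local libdir="${1:-/usr/lib/x86_64-linux-gnu}"
  local target="${libdir}/libturbojpeg.so.0.1.0"
  local link="${libdir}/libturbojpeg.so"
  if [ -e "${link}" ] || [ -L "${link}" ]; then
    # Something is already at the link path; leave it alone.
    echo "exists: ${link}"
  elif [ -e "${target}" ]; then
    ln -s "${target}" "${link}" && echo "linked: ${link}"
  else
    echo "missing: ${target}" >&2
    return 1
  fi
}
```

Writing to the system library directory needs root privileges, so save the snippet as a script that calls `link_turbojpeg` and run that script with `sudo`.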