Note: This work is not related to event cameras.
Code and pre-trained models for evMLP: An Efficient Event-Driven MLP Architecture for Vision
This is a highly experimental implementation of evMLP. Training code, pre-trained models, and evaluation scripts will be updated in the near future.
Please refer to requirements.txt. If you prefer not to install the dependencies from requirements.txt, torch>=2.0.0 is required, and installing the latest versions of einops and thop is recommended so that all operators are supported and the computational cost is calculated correctly.
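As a quick check of these requirements, the sketch below reports the installed versions of torch, einops and thop using only the standard library (importlib.metadata) and compares torch against 2.0.0. The helper names are illustrative and not part of this repository; the comparison looks only at the numeric release part, so local tags such as +cu118 and pre-release suffixes are ignored.

```python
from __future__ import annotations

import re
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare release segments, padding with zeros so '2.0' counts as '2.0.0'."""
    have, need = parse_release(version), parse_release(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_versions(names=("torch", "einops", "thop")) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if missing."""
    found: dict[str, str | None] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    torch_version = versions["torch"]
    if torch_version is None or not meets_minimum(torch_version, "2.0.0"):
        print("evMLP needs torch>=2.0.0")
```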
You can train on ImageNet-1K by modifying references/classification/train.py in the torchvision repository. To train the models in the paper (the default configuration in evmlp.py), use the following settings (4 GPUs):
torchrun --nproc_per_node=4 \
train.py \
--auto-augment imagenet \
--label-smoothing 0.1 \
--random-erase=0.1 \
--mixup-alpha 0.2 \
--cutmix-alpha 1.0 \
--epochs 300 \
--batch-size 256 \
--opt sgd \
--lr 0.1 \
--lr-scheduler cosineannealinglr \
--lr-min 0.00001 \
--lr-warmup-method=linear \
--lr-warmup-epochs=5 \
--workers 8 \
--wd 0.00001 \
--data-path /path/to/dataset
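With 4 processes and --batch-size 256 (torchvision's train.py treats this as the per-GPU batch size), the effective batch size is 1024. The sketch below reproduces the per-epoch learning-rate curve these flags should produce, assuming torchvision's defaults: a LinearLR warmup that starts at --lr-warmup-decay (default 0.01) times the base LR, followed by CosineAnnealingLR over the remaining 295 epochs down to --lr-min, with the scheduler stepped once per epoch. It is a plain-Python approximation for inspecting the schedule, not code from this repository.

```python
import math


def lr_at_epoch(epoch: int, base_lr: float = 0.1, min_lr: float = 1e-5,
                epochs: int = 300, warmup_epochs: int = 5,
                warmup_decay: float = 0.01) -> float:
    """Learning rate in effect during `epoch` (0-indexed), one scheduler step per epoch."""
    if not 0 <= epoch < epochs:
        raise ValueError(f"epoch must be in [0, {epochs})")
    if epoch < warmup_epochs:
        # Linear warmup: the LR factor ramps from warmup_decay toward 1.0.
        factor = warmup_decay + (1.0 - warmup_decay) * epoch / warmup_epochs
        return base_lr * factor
    # Cosine annealing from base_lr toward min_lr over the remaining epochs.
    t = epoch - warmup_epochs
    t_max = epochs - warmup_epochs
    return min_lr + (base_lr - min_lr) * (1 + math.cos(math.pi * t / t_max)) / 2


if __name__ == "__main__":
    for epoch in (0, 4, 5, 150, 299):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```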
Here are the pre-trained models:
evmlp_b_224_imagenet1k.pth: default configuration in evmlp.py, trained from scratch on ImageNet-1K.
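The exact layout of evmlp_b_224_imagenet1k.pth is not documented here. If you load it yourself, a small helper like the hypothetical extract_state_dict below covers the two layouts that torchvision-style training typically produces: a bare state_dict, or a checkpoint dict with the weights under a "model" key. It also strips the "module." prefix that DistributedDataParallel adds to parameter names, in case the weights were saved from the wrapped model. It only manipulates dictionaries, so it works on whatever torch.load returns.

```python
from collections import OrderedDict


def extract_state_dict(checkpoint, prefix="module."):
    """Hypothetical helper: return a parameter dict ready for model.load_state_dict().

    Accepts either a bare state_dict or a torchvision-style training checkpoint
    that stores the weights under the "model" key, and strips the prefix that
    DistributedDataParallel adds to parameter names.
    """
    state = checkpoint.get("model", checkpoint)
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state.items()
    )


# Usage (assumes PyTorch and a model built from the default configuration in evmlp.py):
#   import torch
#   checkpoint = torch.load("evmlp_b_224_imagenet1k.pth", map_location="cpu")
#   model.load_state_dict(extract_state_dict(checkpoint))
```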
Process videos using eval_video_dir.py:
python eval_video_dir.py <weights.pth> <dir_path> <event_threshold>
For example, download the model file evmlp_b_224_imagenet1k.pth, place the video files in /path/to/videos, and use an event threshold of 0.05:
python eval_video_dir.py evmlp_b_224_imagenet1k.pth /path/to/videos 0.05
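The event threshold decides which parts of a frame are recomputed. eval_video_dir.py has its own definition of an event; as an illustration only, the sketch below assumes an event is a non-overlapping p×p patch whose mean absolute pixel change since the previous frame exceeds the threshold, with frames scaled to [0, 1]. Under that assumption, a larger threshold yields fewer events and therefore fewer MACs, presumably at the cost of reusing more stale patch features. The function name patch_events and the patch size used in the demo are hypothetical.

```python
import numpy as np


def patch_events(prev_frame: np.ndarray, curr_frame: np.ndarray,
                 patch_size: int, threshold: float) -> np.ndarray:
    """Illustrative event mask: True where a patch changed more than `threshold`.

    Frames are (H, W, C) arrays scaled to [0, 1]; H and W must be multiples of
    patch_size. Returns a boolean array of shape (H // patch_size, W // patch_size).
    """
    h, w, c = curr_frame.shape
    p = patch_size
    if prev_frame.shape != curr_frame.shape or h % p or w % p:
        raise ValueError("frames must match and be divisible by patch_size")
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    # Split into non-overlapping p x p patches: (H/p, p, W/p, p, C).
    patches = diff.reshape(h // p, p, w // p, p, c)
    # Mean absolute change per patch, compared against the event threshold.
    return patches.mean(axis=(1, 3, 4)) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev = rng.random((224, 224, 3), dtype=np.float32)
    curr = prev.copy()
    curr[:32, :32] += 0.2  # simulate motion in the top-left corner
    events = patch_events(prev, curr, patch_size=16, threshold=0.05)
    print(f"{events.sum()} of {events.size} patches are events")
```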
eval_video_dir.py uses opencv_python to load video files. By default, only files with the extensions .avi and .mp4 are processed. To accept other formats, edit the extension set on line 31 of eval_video_dir.py:
video_extensions = {'.avi', '.mp4'}
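To see which files a given extension set would pick up before running the script, the standalone sketch below applies the same kind of filter to a directory. It matches suffixes case-insensitively, so clip.MP4 is accepted as well; whether the filter in eval_video_dir.py does this is not stated here. The extra .mkv and .mov entries are hypothetical additions, and OpenCV can only open them if your opencv_python build has a backend that decodes those containers and codecs.

```python
from pathlib import Path

# Hypothetical extended set; the repository default is {'.avi', '.mp4'}.
VIDEO_EXTENSIONS = {".avi", ".mp4", ".mkv", ".mov"}


def list_videos(dir_path, extensions=VIDEO_EXTENSIONS):
    """Return the video files directly inside dir_path, sorted by path."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path for path in Path(dir_path).iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


if __name__ == "__main__":
    import sys

    for video in list_videos(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(video)
```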
Q: Can evMLP be used for other computer vision tasks besides image classification?
A: Yes. The feature maps that evMLP reconstructs through the rearrange operation preserve the adjacency between patches relative to the input image, so evMLP is directly applicable to tasks such as object detection and segmentation. If time permits, I will add examples of applying evMLP to other tasks.
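The adjacency argument can be seen with a plain reshape. Assuming the patch tokens are held as a (B, N, C) array with the N = H×W patches in row-major order (the actual tensor layout in evmlp.py may differ), rearranging them into a (B, C, H, W) map places the token for patch (row r, column c) at spatial position [r, c]. Neighboring patches in the image therefore remain neighbors in the feature map, which is the layout convolutional detection and segmentation heads expect. The sketch uses NumPy; the einops pattern in the docstring is the equivalent call, and the 14×14 grid in the demo is hypothetical.

```python
import numpy as np


def tokens_to_feature_map(tokens: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Rearrange patch tokens (B, N, C) into a feature map (B, C, grid_h, grid_w).

    Assumes the N = grid_h * grid_w tokens are stored in row-major patch order.
    Equivalent einops call: rearrange(tokens, "b (h w) c -> b c h w", h=grid_h, w=grid_w)
    """
    b, n, c = tokens.shape
    if n != grid_h * grid_w:
        raise ValueError(f"expected {grid_h * grid_w} tokens, got {n}")
    return tokens.reshape(b, grid_h, grid_w, c).transpose(0, 3, 1, 2)


if __name__ == "__main__":
    tokens = np.random.rand(1, 14 * 14, 64)  # hypothetical 14x14 patch grid
    fmap = tokens_to_feature_map(tokens, 14, 14)
    # Patch (row 3, col 5) is token 3 * 14 + 5 and lands at spatial position [3, 5].
    assert np.array_equal(fmap[0, :, 3, 5], tokens[0, 3 * 14 + 5])
    print(fmap.shape)
```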
Q: Why has the number of MACs decreased, but the execution time increased instead?
A: This repository only provides experimental Python code, and fewer MACs do not automatically mean a shorter run time. Compare these two snippets:
Code 1:
import numpy
N = 1_000_000  # any array length
a = numpy.random.rand(N)
total = 0.
for i in a:
    total += i
Code 2:
import numpy
N = 1_000_000  # any array length
a = numpy.random.rand(N)
total = a.sum()
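Both snippets sum the array a, yet Code 2 is typically much faster than Code 1: numpy's sum runs a compiled loop, while Code 1 pays Python interpreter overhead on every element. Likewise, evMLP's event-driven processing reduces the number of MACs, but the overhead of the experimental Python implementation can outweigh that saving. For practical applications, the code can be implemented in C/C++; an FPGA implementation is also a good option.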
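To see the gap on your own machine, the two snippets can be timed with the standard timeit module. Both functions perform the same additions, i.e. the same arithmetic cost; only the execution path differs (interpreted loop versus NumPy's compiled reduction). The array size and repeat count are arbitrary choices.

```python
import timeit

import numpy

N = 1_000_000
a = numpy.random.rand(N)


def loop_sum(values):
    """Code 1: one interpreted Python addition per element."""
    total = 0.0
    for x in values:
        total += x
    return total


def vectorized_sum(values):
    """Code 2: a single call into NumPy's compiled reduction."""
    return values.sum()


if __name__ == "__main__":
    t_loop = timeit.timeit(lambda: loop_sum(a), number=3)
    t_vec = timeit.timeit(lambda: vectorized_sum(a), number=3)
    print(f"loop: {t_loop:.3f} s, numpy: {t_vec:.4f} s, ratio: {t_loop / t_vec:.0f}x")
```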