GitHub - i-evi/evMLP: evMLP: An Efficient Event-Driven MLP Architecture for Vision

Note: This work is not related to event cameras.

Code and Pretrained Models for: evMLP: An Efficient Event-Driven MLP Architecture for Vision

This is a highly experimental implementation of evMLP. Training code, pre-trained models, and evaluation scripts will be updated in the near future.

Dependencies

Please refer to "requirements.txt". If you don't want to install dependencies according to "requirements.txt", torch>=2.0.0 is necessary, and it's better to install the latest versions of einops and thop to support operators and correctly calculate the cost.

Trainning

You can train on ImageNet-1K by modifying the references/classification/train.py in torchvision.

To train the models in the paper (default configuration in evmlp.py), you can use the following settings (using 4 GPUs):

torchrun --nproc_per_node=4 \
   train.py \
  --auto-augment imagenet \
  --label-smoothing 0.1 \
  --random-erase=0.1 \
  --mixup-alpha 0.2 \
  --cutmix-alpha 1.0 \
  --epochs 300 \
  --batch-size 256 \
  --opt sgd \
  --lr 0.1 \
  --lr-scheduler cosineannealinglr \
  --lr-min 0.00001 \
  --lr-warmup-method=linear \
  --lr-warmup-epochs=5 \
  --workers 8 \
  --wd 0.00001 \
  --data-path /path/to/dataset

Pre-trained models

Here are the pre-trained models:

Google Drive

evmlp_b_224_imagenet1k.pth: Using the default configuration in evmlp.py, trained from scratch on ImageNet-1K.

Video processing

Process videos using eval_video_dir.py:

python eval_video_dir.py <weights.pth> <dir_path> <event_threshold>

For example, download the model file evmlp_b_224_imagenet1k.pth, place the video files in /path/to/videos, and use an event threshold of 0.05:

python eval_video_dir.py evmlp_b_224_imagenet1k.pth /path/to/videos 0.05

eval_video_dir.py uses opencv_python to load video files. The default filter list only supports video files with extensions .avi and .mp4. If necessary, you can edit the following code:

L31@eval_video_dir.py: video_extensions = {'.avi', '.mp4'}

FAQs

Q: Can evMLP be used for other computer vision tasks besides image classification?

A: Certainly. The feature maps reconstructed by evMLP through the rearrange operation can maintain the adjacency relationship between neuron patches relative to the input image, making it directly applicable to tasks such as object detection and segmentation. If I have time, I will update some examples of applying evMLP to other tasks.

Q: Why has the number of MACs decreased, but the execution time increased instead?

A: This repository only provides experimental Python code. If you understand that:

Code 1:

a = numpy.random.rand(N)
sum = 0.
for i in a:
  sum += i

Code 2:

a = numpy.random.rand(N)
sum = a.sum()

Even though both codes sum the array a, the execution time of Code 2 might be significantly shorter than Code 1. For practical applications, the code can be implemented in C/C++. Alternatively, using FPGA for implementation is also a great option.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
eval_video_dir.py		eval_video_dir.py
event_driven.py		event_driven.py
evmlp.py		evmlp.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Dependencies

Trainning

Pre-trained models

Video processing

FAQs

About

Uh oh!

Releases

Packages

Languages

i-evi/evMLP

Folders and files

Latest commit

History

Repository files navigation

Dependencies

Trainning

Pre-trained models

Video processing

FAQs

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages