
A project using Python, PyTorch, ViT, and EfficientNet to rethink the traditional Vision Transformer architecture for improved face recognition performance and computational efficiency, incorporating EfficientNet into ViT to overcome the drawbacks of the traditional architecture.


Face Transformer - Rethinking model incorporating EfficientNet into ViT

Prof. Srinibas Rana*, Debargha Mitra Roy, Bikash Shaw, Suprio Kundu

Jalpaiguri Government Engineering College

Publication Implementation


Recently there has been great interest in Transformers, not only in NLP but also in Computer Vision (CV). We ask whether Transformers can be used for face recognition by incorporating EfficientNet into ViT, and whether they outperform CNNs. We therefore investigate the performance of Transformer models on face recognition. The models are trained on a large-scale face recognition database, CASIA-WebFace, and evaluated on several mainstream benchmarks, including the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP & AGEDB databases. We demonstrate that Transformer models achieve performance comparable to CNNs with a similar number of parameters and MACs. The Face Transformer mainly uses the ViT (Vision Transformer) architecture. We then examine whether transfer learning and fine-tuning with EfficientNet, merged into ViT, yields better results.


Abstract:

Face recognition has achieved remarkable progress in recent years, but challenges remain in robustness, efficiency, and scalability. Transformers have emerged as powerful models for various vision tasks, but their direct application to face recognition is hampered by computational cost and potential overfitting. EfficientNets, on the other hand, offer a balance of accuracy and efficiency in convolutional neural networks. In this work, we propose a novel approach that rethinks face transformers by integrating EfficientNets with ViT: a hybrid architecture that leverages the strengths of both, aiming for robust and efficient face recognition. We employ an EfficientNet as the backbone for feature extraction, producing informative and compact features while maintaining computational efficiency. Our findings demonstrate that the proposed hybrid architecture significantly surpasses existing methods in face recognition performance while maintaining excellent computational efficiency, paving the way for robust, efficient, and scalable face recognition systems with diverse applications, from security and access control to personalized user experiences and social media.

Objectives

  • To learn a representation of face images that is invariant to variations in lighting, pose, and expression.

  • To achieve state-of-the-art results on face recognition benchmarks by fine-tuning with EfficientNet and integrating it into ViT.

  • To be robust to variations in input image quality, evaluated on the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP & AGEDB databases.

  • To be efficient in terms of computational cost and memory.

Model Architecture

(Model architecture diagram)

Usage Instructions

1. Preparation

This code is mainly adapted from Vision Transformer, DeiT & Face Evolve. In addition to PyTorch and torchvision, install vit_pytorch by Phil Wang, efficientnet_pytorch by Luke Melas-Kyriazi & the timm package by Ross Wightman. We sincerely appreciate their contributions.

All required packages are listed in requirements.txt. Install them with:

pip install -r requirements.txt

Files of the vit_pytorch folder:

.
├── __init__.py
├── vit.py
├── vit_face.py
└── vits_face.py

Files of the util folder:

.
├── __init__.py
├── test.py
├── utils.py
└── verification.py

2. Databases

3. Train Models

  • EfficientNet + ViT

    CUDA_VISIBLE_DEVICES='0' python3 -u train.py -b <batch_size> -w 0 -d casia -n <network_name> -head CosFace --outdir <path_to_model> --warmup-epochs 0 --lr 3e-5 -r <path_to_model>
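The `-head CosFace` flag selects a large-margin cosine loss head. As a hedged illustration of what such a head computes — the scale `s` and margin `m` below are typical defaults from the CosFace paper, not necessarily this repository's values, and the class name is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosFaceHead(nn.Module):
    """Sketch of a CosFace (large-margin cosine) classification head."""
    def __init__(self, embed_dim, num_classes, s=64.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class weights
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Subtract the margin m from the target-class cosine only,
        # forcing the model to separate classes by at least that margin
        margin = F.one_hot(labels, cosine.size(1)).float() * self.m
        logits = self.s * (cosine - margin)
        return F.cross_entropy(logits, labels)
```

At test time the head is discarded; only the normalised embeddings are compared.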

4. Pretrained Models and Test Models (on LFW, SLLFW, CALFW, CPLFW, TALFW, CFP_FP, AGEDB)

You can download the following models:

| Model | Google Drive |
| --- | --- |
| ViT-P8S8 | LINK |
| EfficientNet + ViT | LINK |

You can test the models as follows.

The content of the property file for the CASIA-WebFace dataset is: `10572,112,112` (number of classes, image height, image width).

python3 test.py --model <path_to_model> --network <network_name> --batch_size <batch_size> --target <eval_data>
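Evaluation on the listed benchmarks follows the standard pair-verification protocol: cosine similarities between embeddings of face pairs are compared against a threshold chosen to maximise accuracy. A simplified sketch of that thresholding step — the function name is hypothetical, and util/verification.py presumably implements the full k-fold version:

```python
import numpy as np

def best_threshold_accuracy(scores, labels, num_thresholds=400):
    """Pick the similarity threshold that maximises pair accuracy.

    scores: cosine similarities for each face pair (numpy array)
    labels: True if the pair is the same identity, else False
    Simplified illustration of the LFW-style protocol (no k-fold splits).
    """
    thresholds = np.linspace(-1.0, 1.0, num_thresholds)
    # Accuracy at each candidate threshold: predict "same" when score > t
    accs = [float(np.mean((scores > t) == labels)) for t in thresholds]
    i = int(np.argmax(accs))
    return thresholds[i], accs[i]
```

In the full protocol the threshold is selected on nine folds and accuracy is reported on the held-out tenth, averaged across folds.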

References

This implementation builds on the research paper Face Transformer for Recognition; the repository is forked from zhongyy/Face-Transformer.

Contact

If you have any questions, please create an issue on this repository or contact debarghamitraroy@gmail.com.
