Transformer Based Fusion of Static and Dynamic Features

Problem Statement

This project develops a unified multimodal framework for Action Unit (AU) detection and expression recognition using a Transformer-based approach. The goal is to fuse static (image, text) and dynamic (audio) features effectively, improving the accuracy and robustness of facial expression analysis systems.

Objectives

  • Propose a unified multimodal framework for AU detection and expression recognition.
  • Incorporate basic expressions, Action Units (AUs), and Valence-Arousal (VA) features into the model (a sketch of such multi-task output heads follows this list).
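
The model therefore needs three kinds of output: categorical expression logits, multi-label AU logits, and continuous VA values. Below is a minimal PyTorch-style sketch of such heads over a shared fused embedding; every size (embedding width, expression and AU counts) and every name here is an illustrative assumption, not a setting specified by this project.

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Hypothetical task heads over one fused multimodal embedding."""
    def __init__(self, d_model=256, n_expressions=8, n_aus=12):
        super().__init__()
        self.expr_head = nn.Linear(d_model, n_expressions)  # basic expressions (softmax over logits)
        self.au_head = nn.Linear(d_model, n_aus)            # AUs are multi-label (sigmoid per unit)
        self.va_head = nn.Linear(d_model, 2)                # valence and arousal, bounded regression

    def forward(self, fused):  # fused: (batch, d_model)
        return {
            "expression": self.expr_head(fused),    # class logits
            "au": self.au_head(fused),              # per-AU logits
            "va": torch.tanh(self.va_head(fused)),  # values in [-1, 1]
        }

heads = MultiTaskHeads()
outputs = heads(torch.randn(4, 256))
print({k: v.shape for k, v in outputs.items()})
```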

Context of the Study

Facial expression analysis is a core problem in computer vision and human-computer interaction, with applications ranging from emotion recognition systems to virtual agents and affective computing.

Modalities

The project will use the following modalities for feature extraction and fusion (a fusion sketch follows the list):

  • Images (static features)
  • Text (such as transcripts or textual descriptions associated with the facial expressions)
  • Audio (dynamic features)
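
A common Transformer-based fusion pattern, consistent with the references below, is to project each modality's features to a shared width, concatenate them into one token sequence, and let self-attention mix static (image, text) and dynamic (audio) information. The sketch below illustrates this pattern only; the feature dimensions, layer counts, and mean pooling are assumptions, not this project's final architecture.

```python
import torch
import torch.nn as nn

class FusionTransformer(nn.Module):
    """Hypothetical fusion encoder over concatenated modality tokens."""
    def __init__(self, d_img=512, d_txt=768, d_aud=128, d_model=256):
        super().__init__()
        # Per-modality projections into a common embedding width
        self.proj = nn.ModuleDict({
            "image": nn.Linear(d_img, d_model),
            "text": nn.Linear(d_txt, d_model),
            "audio": nn.Linear(d_aud, d_model),
        })
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image, text, audio):
        # Inputs: (batch, tokens_per_modality, d_modality); audio typically
        # contributes many frame tokens (dynamic), image/text fewer (static).
        tokens = torch.cat(
            [self.proj["image"](image),
             self.proj["text"](text),
             self.proj["audio"](audio)],
            dim=1,
        )
        fused = self.encoder(tokens)  # self-attention mixes all modalities
        return fused.mean(dim=1)      # pooled joint representation (batch, d_model)

model = FusionTransformer()
pooled = model(torch.randn(2, 1, 512),   # one image token
               torch.randn(2, 16, 768),  # text tokens
               torch.randn(2, 40, 128))  # audio frames
print(pooled.shape)  # torch.Size([2, 256])
```

Mean pooling is only one option; a learned classification token or per-task attention pooling over the fused sequence would fit this design just as well.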

References

  1. Zhang, Wei, et al. "Transformer-based multimodal information fusion for facial expression analysis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

  2. Kim, Jun-Hwa, Namho Kim, and Chee Sun Won. "Multi-modal facial expression recognition with transformer-based fusion networks and dynamic sampling." arXiv preprint arXiv:2303.08419 (2023).
