
Assessment of Cognitive Load and Emotion using Ocular Features and Facial Microexpressions

This project assesses the validity of using eye-motion features as indicators of cognitive load and emotional state. Ocular features such as saccades and fixations, subjective questionnaires, and task performance measures have been used to assess mental workload while the user performs the designed tasks. A physiological signal, currently a high-speed image sequence of the user's face, is acquired simultaneously. Facial micro-expression intensity is estimated using a ResNet-18 based model trained with Knowledge Distillation from a Masked Auto-Encoder, along with ocular features, while users are shown emotional stimuli.

The scheme has been validated with psychological tests such as the visual response test (VRT), which induces mental fatigue, and the N-back test, which induces memory load. The physiological signals (eye movement and blink) were observed to correlate with the psychological test responses as the mental workload changed. Moreover, the user's emotional state was observed to correlate with eye-motion behaviour, and this was validated by detecting activation of the corresponding facial Action Units based on the Facial Action Coding System (FACS).

Module Contents

| # | Module | Contents |
|---|--------|----------|
| 1 | Eye Tracking | Eye detection, blink detection, eye motion feature classification, screen gaze, emotion classification |
| 2 | Visual Response Test | Psychometric test game made using PyGame |
| 3 | Facial Expression Estimation | Deep learning model for facial Action Unit intensity estimation |

Brief Steps

  • Developed a face and facial landmark detection pipeline for video, and performed pupil localization by radial inspection of image gradients. Developed CUDA algorithms using Numba to accelerate execution by 300 times (a minimal kernel sketch follows this list).
  • Alternatively, used the MediaPipe Face Landmark model to detect and track the iris, eye corners and eyelid coordinates.
  • Performed blink detection and iris location correction using cubic spline interpolation (sketched below).
  • Estimated screen gaze and gaze heatmaps using polynomial regression. Classified eye motion into saccades and fixations using iris velocity and dispersion based thresholds (see the gaze and classification sketch below).
  • Designed psychometric games, a Visual Response Test and an N-Back Test, which induce different levels of mental workload.
  • Built a deep learning model using ResNet-18 by pre-training on large facial expression datasets (AffectNet and EmotioNet) and fine-tuning on action unit intensity labels (DISFA dataset), with simultaneous Knowledge Distillation from a larger ViT (Vision Transformer) based Masked Auto-Encoder model, to estimate facial micro-expressions.
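
The gradient-based pupil localization mentioned in the first bullet can be parallelized on the GPU with Numba. The kernel below is a minimal, illustrative sketch of the idea (score each candidate centre by how well image gradients point radially outward from it); the kernel name, sub-sampling stride and launch configuration are assumptions, not the project's actual implementation.

```python
import math
import numpy as np
from numba import cuda

@cuda.jit
def radial_gradient_score(gx, gy, scores):
    # One thread per candidate pupil centre (cy, cx).
    cy, cx = cuda.grid(2)
    h, w = gx.shape
    if cy >= h or cx >= w:
        return
    acc = 0.0
    # Sub-sampled sweep over the image (stride 4 is illustrative, for speed only).
    for y in range(0, h, 4):
        for x in range(0, w, 4):
            dx, dy = x - cx, y - cy
            norm = math.sqrt(dx * dx + dy * dy)
            if norm > 0.0:
                # Reward gradients aligned with the outward radial direction
                # from the candidate centre (dark pupil -> bright iris/sclera).
                acc += max(0.0, (dx / norm) * gx[y, x] + (dy / norm) * gy[y, x])
    scores[cy, cx] = acc

# Usage sketch: with gx, gy as image gradients (e.g. from a Sobel filter) copied
# to the device, launch e.g.
#   radial_gradient_score[(h // 16 + 1, w // 16 + 1), (16, 16)](d_gx, d_gy, d_scores)
# and take the argmax of `scores` as the pupil centre.
```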
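The iris-coordinate correction step can be sketched as below, assuming the blink is detected from the eyelid gap (e.g. the distance between upper and lower eyelid landmarks); the variable names and the threshold are illustrative, not the project's calibrated values.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def correct_iris_over_blinks(iris_y, eyelid_gap, blink_thresh=0.15):
    """Replace iris samples recorded during blinks with spline-interpolated values.

    iris_y       : per-frame vertical iris coordinate
    eyelid_gap   : per-frame upper/lower eyelid distance (blink indicator)
    blink_thresh : eyelid-gap value below which a frame is treated as a blink
                   (illustrative value only)
    """
    y = np.asarray(iris_y, dtype=float)
    gap = np.asarray(eyelid_gap, dtype=float)
    t = np.arange(len(y))
    open_eye = gap > blink_thresh                 # frames where the eye is open

    # Fit a cubic spline on open-eye frames only ...
    spline = CubicSpline(t[open_eye], y[open_eye])

    # ... and overwrite the corrupted samples inside blinks with interpolated values.
    corrected = y.copy()
    corrected[~open_eye] = spline(t[~open_eye])
    return corrected, ~open_eye                   # corrected signal, blink mask
```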
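The gaze mapping and the saccade/fixation split can be outlined as follows. This assumes a simple per-axis polynomial calibration and a velocity-threshold (I-VT style) rule; the polynomial degree, the velocity threshold and the degrees-per-unit conversion are placeholders, not the values used in the project.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Calibration: map a normalized iris coordinate to a screen coordinate with a
# polynomial fitted on calibration points (one model per axis).
def fit_gaze_map(iris_coord, screen_coord, degree=2):
    return Polynomial.fit(iris_coord, screen_coord, degree)

# Velocity-threshold classification of eye motion into saccades and fixations.
def classify_motion(iris_xy, fps, vel_thresh=30.0, deg_per_unit=1.0):
    """iris_xy: (N, 2) array of iris positions; returns per-sample velocity and labels."""
    steps = np.diff(np.asarray(iris_xy, dtype=float), axis=0)      # frame-to-frame displacement
    velocity = np.linalg.norm(steps, axis=1) * fps * deg_per_unit  # approx. deg/s
    labels = np.where(velocity > vel_thresh, "saccade", "fixation")
    return velocity, labels
```

Usage would look like `gaze_x = fit_gaze_map(calib_iris_x, calib_screen_x)(live_iris_x)` for each axis, with the predicted screen points accumulated into a 2-D histogram for the heatmap.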

Eye Detection and Tracking

Cognitive Load Assessment Methodology

Emotion Assessment Methodology

  • The objective is to estimate facial emotion from expression in real time, using facial Action Unit (AU) intensities and FACS (Facial Action Coding System).
  • To accomplish this, a large-scale pre-trained network (Masked Auto-Encoder) is used, and feature-wise knowledge distillation with task-specific fine-tuning is performed on a lightweight model (ResNet-18) to obtain facial Action Unit intensities in real time.
  • Visual emotion stimuli were designed to induce different emotions while eye coordinates and face video are acquired simultaneously, so that eye-motion features and facial micro-expressions corresponding to the shown stimulus can be estimated.

Method

  • The training method has been adapted from Chang et al. (see References).
  • A ViT (Vision Transformer) based Masked Auto-Encoder (MAE) is used, pre-trained in a self-supervised manner (masked input image reconstruction) on the EmotioNet dataset to mitigate the lack of labelled training data. Subsequently, only the encoder is extracted, a linear classification layer is attached, and the model is further pre-trained on the AffectNet and FFHQ datasets, which are large face datasets, before finally being fine-tuned on the DISFA dataset for facial Action Unit intensity estimation.
  • Since the MAE is a large model, feature-wise knowledge distillation is employed to transfer the teacher model's (MAE) knowledge to a lightweight student model (ResNet-18) for faster, real-time estimation.
  • The ResNet-18 model, with a linear classification layer attached, is first pre-trained on the same AffectNet and FFHQ datasets and then fine-tuned on DISFA with simultaneous knowledge distillation from the teacher model for facial Action Unit intensity estimation.
  • From the predicted Action Unit intensities, AU activation is assessed, and the overall facial emotion is estimated based on FACS (Facial Action Coding System), which relates Action Unit combinations to emotions (an illustrative mapping is sketched below).
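
As a rough illustration of the last step, the sketch below maps activated AUs to basic emotions using commonly cited FACS-style AU prototypes. The AU sets, the activation threshold and the scoring rule are illustrative assumptions, not the project's exact rules; in practice only the 12 AUs predicted from DISFA would be available.

```python
# Illustrative FACS-style mapping from Action Units to basic emotions.
EMOTION_AUS = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15},
}

def estimate_emotion(au_intensity, threshold=1.0):
    """au_intensity: dict {AU number: predicted intensity (0-5 in DISFA)}."""
    active = {au for au, v in au_intensity.items() if v >= threshold}
    if not active:
        return "neutral"
    # Score each emotion by the fraction of its defining AUs that are active.
    scores = {emo: len(active & aus) / len(aus) for emo, aus in EMOTION_AUS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"
```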

Losses

  1. Feature Matching Loss: An MSE loss between hidden feature layers of the teacher model and the student model.
    $\mathcal{L}_{FM} = \left\|f_{T} - \textbf{I}(f_{S})\right\| ^{2}$

  2. KL Divergence Loss: between (i) the teacher model's output for the input face image and (ii) the output obtained by passing the student model's hidden feature into the teacher's linear classification layer.
    $\mathcal{L}_{KLD} = \widehat{y}_{T} ~ \text{log} \left(\frac{\widehat{y}_{T}}{\widehat{y}_{S}}\right)$

  3. Task Loss: The training MSE loss for the student network.
    $\mathcal{L}_{Task} = \left\|\widehat{y} - y\right\| ^{2}$

  • Overall Loss: $\mathcal{L} = \mathcal{L}_{FM} + \alpha\mathcal{L}_{Task} + \beta\mathcal{L}_{KLD}$
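
As a rough illustration of how these three terms combine, here is a minimal PyTorch sketch. The function and argument names (`distillation_loss`, `proj`, `teacher_head`, etc.) are assumptions for the sketch, and the softmax normalization inside the KL term is illustrative rather than the exact formulation used by the authors.

```python
import torch
import torch.nn.functional as F

def distillation_loss(f_t, f_s, y_hat_t, y_hat_s, y,
                      proj, teacher_head, alpha=1.0, beta=1.0):
    """Combined loss L = L_FM + alpha * L_Task + beta * L_KLD.

    f_t, f_s     : teacher / student hidden features
    y_hat_t      : teacher AU-intensity predictions for the input image
    y_hat_s      : student AU-intensity predictions for the input image
    y            : ground-truth AU intensity labels
    proj         : linear layer I(.) matching the student feature to the teacher feature size
    teacher_head : the teacher's linear classification layer
    """
    # 1. Feature-matching loss: L_FM = || f_T - I(f_S) ||^2
    l_fm = F.mse_loss(proj(f_s), f_t)

    # 2. KL-divergence loss between the teacher's prediction for the image and the
    #    prediction from passing the (projected) student feature through the teacher's head.
    #    Softmax is used here only to form distributions for the KL term (assumption).
    z_s = teacher_head(proj(f_s))
    l_kld = F.kl_div(F.log_softmax(z_s, dim=-1),
                     F.softmax(y_hat_t, dim=-1),
                     reduction="batchmean")

    # 3. Task loss on the student's own predictions: L_Task = || y_hat - y ||^2
    l_task = F.mse_loss(y_hat_s, y)

    # Overall loss: L = L_FM + alpha * L_Task + beta * L_KLD
    return l_fm + alpha * l_task + beta * l_kld
```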

Performance on the DISFA dataset for the facial Action Unit intensity estimation task:

| Method | PCC | MAE | MSE | Remarks |
|--------|-----|-----|-----|---------|
| ResNet-18 | 0.518 | 0.278 | 0.352 | - |
| ResNet-18 + Pre-Train | 0.614 | 0.236 | 0.260 | - |
| ResNet-18 + FM Distill | 0.628 | 0.231 | 0.260 | Better performance and faster |
| MAE + Pre-Train | 0.674 | 0.202 | 0.270 | Best performance, but heavy and slowest |

Datasets

| Dataset | Type | Size | Features |
|---------|------|------|----------|
| EmotioNet | Image | 975,000 | 8 Emotions |
| AffectNet | Image | 450,000 | 16 Overall Emotions, 6 Basic Emotions |
| DISFA | Video | 27 videos | 12 Action Units |

Contributions

Saurabh Chatterjee
MTech, Signal Processing and Machine Learning
Indian Institute of Technology (IIT) Kharagpur

References

  • D. Chang, Y. Yin, Z. Li, M. Tran and M. Soleymani, "LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis," in 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, pp. 8190-8200. doi: 10.1109/WACV57701.2024.00802.
