This project assesses the validity of using eye-motion features as indicators of cognitive load and emotional state. Ocular features such as saccades and fixations, together with subjective questionnaires and task-performance measures, have been used to assess mental workload while the user performs the designed tasks. Physiological signals are acquired simultaneously, currently a high-speed image sequence of the user's face. Facial micro-expression intensity has been estimated using a ResNet-18 based model trained with knowledge distillation from a Masked Auto-Encoder, along with ocular features, while users were shown emotional stimuli.
The scheme has been validated with psychological tests such as the Visual Response Test (VRT), which induces mental fatigue, and the N-back test, which induces memory load. Correlation between the physiological signals (eye movement and blink) and the psychological test responses has been observed as mental workload changes. Moreover, the user's emotional state has been observed to correlate with eye-motion behaviour, and this has been validated by detecting activation of the corresponding facial Action Units based on the Facial Action Coding System (FACS).
Module | Name | Contents |
---|---|---|
1 | Eye Tracking | Eye detection, blink detection, eye motion features classification, screen gaze, emotion classification |
2 | Visual Response Test | Psychometric test game made using PyGame |
3 | Facial Expression Estimation | Deep learning model for facial Action Unit intensity estimation |
- Developed a face and facial-landmark detection pipeline for video, and performed pupil localization by radial inspection of image gradients. Developed CUDA kernels using Numba to accelerate execution by 300× (a CUDA kernel sketch follows this list).
- Alternatively, used the MediaPipe Face Landmark model to detect and track the iris, eye-corner, and eyelid coordinates.
- Performed blink detection, and corrected iris locations during blinks using cubic-spline interpolation (see the MediaPipe sketch after this list).
- Estimated screen gaze and gaze heatmaps using polynomial regression, and classified eye motion into saccades and fixations using iris-velocity and dispersion-based thresholds (see the gaze sketch after this list).
- Designed psychometric games, a Visual Response Test and an N-Back Test, in PyGame to induce different levels of mental workload (a minimal PyGame sketch follows below).
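A minimal sketch of the gradient-based pupil-localization step written as a Numba CUDA kernel. This is an illustrative reconstruction, not the project code: each candidate centre is scored by how well the unit displacement vectors from it align with the unit image gradients (a radial-gradient objective), and the function names, block/thread sizes, and brute-force search are assumptions.

```python
# Hypothetical sketch: gradient-based pupil localization accelerated with a Numba CUDA kernel.
# All names and launch parameters are illustrative; the real pipeline may differ.
import numpy as np
from numba import cuda


@cuda.jit
def pupil_objective_kernel(gx, gy, score):
    """Score each candidate centre by the mean squared dot product between the
    normalised displacement vectors (centre -> pixel) and the unit image gradients."""
    cy, cx = cuda.grid(2)
    h, w = gx.shape
    if cy >= h or cx >= w:
        return
    acc = 0.0
    for y in range(h):
        for x in range(w):
            dy = y - cy
            dx = x - cx
            norm = (dx * dx + dy * dy) ** 0.5
            if norm == 0.0:
                continue
            dot = (dx / norm) * gx[y, x] + (dy / norm) * gy[y, x]
            if dot > 0.0:              # keep only gradients pointing away from the centre
                acc += dot * dot
    score[cy, cx] = acc / (h * w)


def locate_pupil(eye_gray):
    """Return the (row, col) candidate centre that maximises the gradient objective."""
    gy, gx = np.gradient(eye_gray.astype(np.float64))
    mag = np.hypot(gx, gy) + 1e-9
    gx, gy = gx / mag, gy / mag                      # unit gradient field
    score = cuda.device_array(eye_gray.shape, dtype=np.float64)
    threads = (16, 16)
    blocks = ((eye_gray.shape[0] + 15) // 16, (eye_gray.shape[1] + 15) // 16)
    pupil_objective_kernel[blocks, threads](cuda.to_device(gx), cuda.to_device(gy), score)
    return np.unravel_index(np.argmax(score.copy_to_host()), eye_gray.shape)
```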
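A minimal sketch of the MediaPipe-based eye-feature extraction, blink detection, and cubic-spline correction described above, assuming the MediaPipe Face Mesh solution with iris refinement and SciPy. The landmark indices, the openness threshold, and the helper names are assumptions, not project constants.

```python
# Minimal sketch, assuming MediaPipe Face Mesh with iris refinement and SciPy.
# Landmark indices below (eye corners, eyelids, iris centre) follow common Face Mesh
# conventions but should be treated as assumptions.
import numpy as np
import cv2
import mediapipe as mp
from scipy.interpolate import CubicSpline

RIGHT_EYE = {"outer": 33, "inner": 133, "upper": 159, "lower": 145, "iris": 468}
EYE_OPENNESS_THRESHOLD = 0.20          # assumed blink threshold; tune per user / camera

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)


def eye_features(frame_bgr):
    """Return (iris_xy, openness_ratio) in pixel coordinates, or None if no face."""
    h, w = frame_bgr.shape[:2]
    res = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        return None
    lm = res.multi_face_landmarks[0].landmark
    pt = lambda i: np.array([lm[i].x * w, lm[i].y * h])
    openness = np.linalg.norm(pt(RIGHT_EYE["upper"]) - pt(RIGHT_EYE["lower"])) / (
        np.linalg.norm(pt(RIGHT_EYE["outer"]) - pt(RIGHT_EYE["inner"])) + 1e-9)
    return pt(RIGHT_EYE["iris"]), openness


def repair_iris_track(times, iris_xy, openness):
    """Drop samples where the eye is closed (blink) and re-fill them with
    cubic-spline interpolation over the open-eye samples."""
    iris_xy = np.asarray(iris_xy, dtype=float)
    open_mask = np.asarray(openness) > EYE_OPENNESS_THRESHOLD
    t = np.asarray(times, dtype=float)
    for axis in range(2):
        spline = CubicSpline(t[open_mask], iris_xy[open_mask, axis])
        iris_xy[~open_mask, axis] = spline(t[~open_mask])
    return iris_xy, ~open_mask          # corrected track, blink mask
```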
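A short sketch of the gaze mapping and saccade/fixation classification, assuming scikit-learn for the polynomial regression and a simple velocity-threshold (I-VT style) rule. The polynomial degree and the velocity threshold are illustrative assumptions.

```python
# Illustrative sketch of the gaze-mapping and eye-motion classification steps.
# Polynomial degree and velocity threshold are assumptions, not project values.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

SACCADE_VELOCITY_THRESHOLD = 30.0      # px/s (or deg/s), assumed I-VT style cut-off


def fit_gaze_mapper(iris_xy, screen_xy, degree=2):
    """Fit a polynomial regression from calibration iris coordinates to screen coordinates."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(np.asarray(iris_xy), np.asarray(screen_xy))
    return model                        # model.predict(iris_xy) -> screen gaze points


def classify_fixations_saccades(times, iris_xy):
    """Velocity-threshold labelling of samples as 'fixation' or 'saccade'."""
    iris_xy = np.asarray(iris_xy, dtype=float)
    t = np.asarray(times, dtype=float)
    velocity = np.linalg.norm(np.diff(iris_xy, axis=0), axis=1) / np.diff(t)
    labels = np.where(velocity > SACCADE_VELOCITY_THRESHOLD, "saccade", "fixation")
    return np.append(labels, labels[-1])   # pad so labels align with samples
```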
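A minimal PyGame sketch of a Visual Response Test trial loop: a target appears after a random foreperiod and the reaction time to a key press is recorded. Window size, delays, and trial count are illustrative assumptions, not the actual test parameters.

```python
# Minimal sketch of a Visual Response Test loop in PyGame; all parameters are illustrative.
import random
import pygame

pygame.init()
screen = pygame.display.set_mode((800, 600))
clock = pygame.time.Clock()
reaction_times = []

for trial in range(10):                            # assumed number of trials
    screen.fill((0, 0, 0))
    pygame.display.flip()
    pygame.time.delay(random.randint(1000, 3000))  # random foreperiod in ms
    pygame.draw.circle(screen, (255, 0, 0), (400, 300), 40)
    pygame.display.flip()
    shown_at = pygame.time.get_ticks()

    waiting = True
    while waiting:
        for event in pygame.event.get():
            if event.type == pygame.KEYDOWN:       # any key counts as a response
                reaction_times.append(pygame.time.get_ticks() - shown_at)
                waiting = False
            elif event.type == pygame.QUIT:
                pygame.quit()
                raise SystemExit
        clock.tick(200)

print("mean reaction time (ms):", sum(reaction_times) / len(reaction_times))
pygame.quit()
```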
- Built a deep-learning model based on ResNet-18 by pre-training on large facial-expression datasets (AffectNet and EmotioNet) and fine-tuning with Action Unit intensity labels (DISFA dataset), with simultaneous knowledge distillation from a larger ViT (Vision Transformer) based Masked Auto-Encoder model, to estimate facial micro-expressions.
- The objective is to estimate facial emotion from expression in real time, using facial Action Unit (AU) intensities and the Facial Action Coding System (FACS).
- To accomplish this, a large-scale pre-trained network (Masked Auto-Encoder) is used as the teacher, and feature-wise knowledge distillation with task-specific fine-tuning is performed on a lightweight model (ResNet-18) to obtain facial Action Unit intensities in real time.
- Designed visual emotion stimuli to induce different emotions while simultaneously acquiring eye coordinates and face video, so that the eye-motion features and facial micro-expressions corresponding to the shown stimulus can be estimated.
- The training method has been adapted from Chang et al. (LibreFace).
- A ViT (Vision Transformer) based Masked Auto-Encoder (MAE), pre-trained in a self-supervised manner (masked-image reconstruction) on the EmotioNet dataset, is used to overcome the shortage of labelled training data. The encoder is then extracted, a linear classification layer is attached, and the model is further pre-trained on the large-scale face datasets AffectNet and FFHQ before finally being fine-tuned on the DISFA dataset for facial Action Unit intensity estimation.
- Since the MAE is a large model, feature-wise knowledge distillation is employed to transfer the teacher model's (MAE) knowledge to a lightweight student model (ResNet-18) for faster, real-time estimation.
- The ResNet-18 student, with a linear classification layer attached, is first pre-trained on the same AffectNet and FFHQ datasets and then fine-tuned on DISFA with simultaneous knowledge distillation from the teacher model for facial Action Unit intensity estimation.
- Using the facial Action Unit intensity values, AU activation is assessed, and the overall facial emotion is estimated based on FACS (Facial Action Coding System), which relates Action Units to emotions (see the mapping sketch below).
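A small sketch of this final mapping step, assuming commonly cited FACS/EMFACS prototypical AU combinations and a simple activation threshold. The threshold, the scoring rule, and the prototype table are illustrative assumptions; DISFA predicts only 12 AUs, so some prototype AUs may be unobserved in practice.

```python
# Hedged sketch: map estimated AU intensities to an emotion label via prototypical
# FACS/EMFACS AU combinations. Threshold and scoring rule are assumptions.
AU_ACTIVATION_THRESHOLD = 1.0          # DISFA intensities range 0-5; assumed cut-off

# Commonly cited prototypical AU sets for the basic emotions (EMFACS-style).
# Note: DISFA labels 12 AUs, so some prototype AUs (e.g. AU7, AU16, AU23) may be missing.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "fear":      {1, 2, 4, 5, 7, 20, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15, 16},
}


def estimate_emotion(au_intensities):
    """au_intensities: dict {AU number: predicted intensity}. Returns the emotion
    whose prototype AUs have the largest fraction currently active."""
    active = {au for au, v in au_intensities.items() if v >= AU_ACTIVATION_THRESHOLD}
    scores = {emo: len(active & aus) / len(aus) for emo, aus in EMOTION_PROTOTYPES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"
```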
- Feature Matching Loss: an MSE loss between the hidden feature layers of the teacher and student models.
  $\mathcal{L}_{FM} = \left\|f_{T} - \mathbf{I}(f_{S})\right\|^{2}$
- KL Divergence Loss: between the teacher model's outputs for (i) the input face image and (ii) the student model's hidden features passed through the teacher's linear classification layer.
  $\mathcal{L}_{KLD} = \widehat{y}_{T} \log\left(\frac{\widehat{y}_{T}}{\widehat{y}_{S}}\right)$
- Task Loss: the training MSE loss for the student network.
  $\mathcal{L}_{Task} = \left\|\widehat{y}_{S} - y\right\|^{2}$
- Overall Loss:
  $\mathcal{L} = \mathcal{L}_{FM} + \alpha\mathcal{L}_{Task} + \beta\mathcal{L}_{KLD}$
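A PyTorch-style sketch of how the three loss terms above can be combined during student fine-tuning. The projection layer I(·), the softmax used to turn the AU outputs into distributions for the KL term, and the default weights are assumptions made for illustration; they may differ from the LibreFace implementation.

```python
# Sketch of the combined distillation objective, following the loss terms above.
# The projection I(.), the softmax before the KL term, and alpha/beta are assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(f_t, f_s, y_true, student_head, teacher_head, proj,
                      alpha=1.0, beta=1.0):
    """
    f_t: teacher (MAE encoder) features for the face image, shape (B, D_t)
    f_s: student (ResNet-18) features for the same image, shape (B, D_s)
    proj: linear layer I(.) mapping student features into the teacher feature space
    """
    # Feature matching: MSE between teacher features and projected student features.
    l_fm = F.mse_loss(proj(f_s), f_t)

    # Task loss: MSE between the student's predicted AU intensities and the labels.
    y_pred = student_head(f_s)
    l_task = F.mse_loss(y_pred, y_true)

    # KL term: teacher head applied to (i) teacher features and (ii) projected student
    # features; softmax turns both outputs into distributions for the KL divergence.
    with torch.no_grad():
        y_t = F.softmax(teacher_head(f_t), dim=-1)
    y_s = F.log_softmax(teacher_head(proj(f_s)), dim=-1)
    l_kld = F.kl_div(y_s, y_t, reduction="batchmean")

    return l_fm + alpha * l_task + beta * l_kld
```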
Performance on the DISFA dataset for the facial Action Unit intensity estimation task:
Method | PCC (↑) | MAE (↓) | MSE (↓) | Remarks |
---|---|---|---|---|
ResNet-18 | 0.518 | 0.278 | 0.352 | - |
ResNet-18 + Pre-Train | 0.614 | 0.236 | 0.260 | - |
ResNet-18 + FM Distill | 0.628 | 0.231 | 0.260 | Better performance and faster |
MAE + Pre-Train | 0.674 | 0.202 | 0.270 | Best performance, but heavy and slowest |
Dataset | Type | Size | Features |
---|---|---|---|
EmotioNet | Image | 975,000 | 8 Emotions |
AffectNet | Image | 450,000 | 16 Overall Emotions, 6 Basic Emotions |
DISFA | Video | 27 videos | 12 Action Units |
Saurabh Chatterjee
MTech, Signal Processing and Machine Learning
Indian Institute of Technology (IIT) Kharagpur
- D. Chang, Y. Yin, Z. Li, M. Tran, and M. Soleymani, "LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis," in 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, pp. 8190-8200, doi: 10.1109/WACV57701.2024.00802.