This project implements a deep learning solution for recognizing emotions from facial expressions using a fine-tuned ResNet18 architecture. The model classifies faces into seven emotion categories: angry, disgust, fear, happy, neutral, sad, and surprise.
Contents:

- Project Overview
- Technical Highlights
- Installation
- Dataset
- Model Architecture
- Data Augmentation
- Training Pipeline
- Evaluation and Results
- Usage
- References
Facial emotion recognition has applications in numerous fields including:
- Human-computer interaction
- Mental health monitoring
- Customer sentiment analysis
- Educational technology
- Accessibility solutions
This project demonstrates how transfer learning can be effectively applied to the challenging task of emotion recognition by fine-tuning a pre-trained CNN architecture.
- Transfer Learning: Leverages ImageNet-pretrained ResNet18 for efficient training
- Architecture Adaptation: Modified for grayscale input while preserving learned features
- Selective Layer Freezing: Strategically unfreezes specific layers for optimal fine-tuning
- Advanced Data Augmentation: Implements multiple augmentation techniques to improve generalization
- Learning Rate Scheduling: Uses ReduceLROnPlateau to optimize training
- Gradient Clipping: Prevents exploding gradients during backpropagation
- Best Model Checkpointing: Saves the model with highest validation accuracy
To run this project, you need Python 3.6+ and the following packages:

```bash
pip install torch torchvision numpy pandas opencv-python Pillow scikit-learn matplotlib seaborn tqdm
```

Then clone the repository:

```bash
git clone https://github.com/yourusername/emotion-detector.git
cd emotion-detector
```
The model is trained on the FER2013 (Facial Expression Recognition 2013) dataset, which contains 35,887 grayscale images of faces categorized into seven emotions. The images are expected in the following directory structure:

```
fer2013/
├── train/
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── neutral/
│   ├── sad/
│   └── surprise/
└── test/
    ├── angry/
    ├── disgust/
    ├── fear/
    ├── happy/
    ├── neutral/
    ├── sad/
    └── surprise/
```
The dataset contains seven emotion categories, represented by the following label mapping:

```python
EMOTIONS = {
    'angry': 0,
    'disgust': 1,
    'fear': 2,
    'happy': 3,
    'neutral': 4,
    'sad': 5,
    'surprise': 6
}
```
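Class sizes in FER2013 are noticeably imbalanced (disgust, for instance, is far rarer than happy, as the per-class support in the results below shows), which is worth checking before training. A minimal sketch for counting images per class, assuming the directory layout above:

```python
import os

def count_images(split_dir):
    """Count images per emotion class in a train/ or test/ split."""
    return {
        emotion: len(os.listdir(os.path.join(split_dir, emotion)))
        for emotion in EMOTIONS
        if os.path.isdir(os.path.join(split_dir, emotion))
    }

print(count_images('fer2013/train'))
print(count_images('fer2013/test'))
```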
ResNet (Residual Network) is known for its skip connections that help mitigate the vanishing gradient problem in deep networks. ResNet18 has 18 layers and provides an excellent balance of depth and computational efficiency.
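Conceptually, each residual block computes F(x) + x, so gradients can flow through the identity path even when the convolutional branch saturates. A simplified sketch of the idea (torchvision's actual BasicBlock additionally handles striding and channel changes via a downsample branch):

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleResidualBlock(nn.Module):
    """Simplified residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection
```

The project adapts torchvision's pretrained ResNet18 for grayscale input and selective fine-tuning: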
```python
import torch.nn as nn
from torchvision.models import resnet18

def create_resnet18_model(num_classes=7):
    # Load ImageNet-pretrained weights
    model = resnet18(weights='IMAGENET1K_V1')

    # Modify for grayscale input: average the pretrained RGB filters
    # into a single input channel so the learned features are preserved
    original_conv = model.conv1.weight.data
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.conv1.weight.data = original_conv.mean(dim=1, keepdim=True)

    # Strategic layer freezing/unfreezing: only the listed layers train.
    # 'conv1' is included so the per-layer learning rates configured in
    # the optimizer below actually take effect.
    layers_to_unfreeze = ['conv1', 'layer3', 'layer4', 'fc']
    for name, param in model.named_parameters():
        param.requires_grad = any(layer in name for layer in layers_to_unfreeze)

    # Replace the final classifier with dropout + a num_classes-way head
    num_features = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(num_features, num_classes)
    )
    return model
```
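A quick smoke test of the adapted architecture, assuming 224x224 grayscale inputs (matching the transforms below):

```python
import torch

model = create_resnet18_model()
dummy = torch.randn(2, 1, 224, 224)  # batch of two grayscale images
print(model(dummy).shape)            # expected: torch.Size([2, 7])
```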
To enhance model robustness and generalization, a comprehensive set of data augmentations is applied to the training images:
```python
import torchvision.transforms as T
from PIL import Image

train_transform = T.Compose([
    T.Lambda(lambda x: Image.fromarray(x)),  # convert NumPy array to PIL Image
    T.Resize((224, 224)),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(10),
    T.RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.95, 1.05)),
    T.RandomAdjustSharpness(sharpness_factor=1.5, p=0.3),
    T.RandomAutocontrast(p=0.3),
    T.ToTensor(),
    T.Normalize(mean=[0.485], std=[0.229])  # single-channel ImageNet-style stats
])
```
For evaluation, only resizing and normalization are applied:
```python
test_transform = T.Compose([
    T.Lambda(lambda x: Image.fromarray(x)),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485], std=[0.229])
])
```
A custom `FERDataset` class inherits from PyTorch's `Dataset` class to load and preprocess the images:
```python
import os
import cv2
from torch.utils.data import Dataset

class FERDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        self.transform = transform
        self.images = []
        self.labels = []
        # Collect image paths and labels from the per-emotion subdirectories
        for emotion, label in EMOTIONS.items():
            path = os.path.join(data_dir, emotion)
            if not os.path.exists(path):
                continue
            for img in os.listdir(path):
                self.images.append(os.path.join(path, img))
                self.labels.append(label)

    def __getitem__(self, idx):
        # Read the image as grayscale (FER2013 images are single-channel)
        image = cv2.imread(self.images[idx], cv2.IMREAD_GRAYSCALE)
        if self.transform:
            image = self.transform(image)
        return image, self.labels[idx]

    def __len__(self):
        return len(self.images)
```
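A minimal sketch of wiring the dataset to DataLoaders, assuming the fer2013/ layout above; the batch size and worker count are illustrative:

```python
from torch.utils.data import DataLoader

train_dataset = FERDataset('fer2013/train', transform=train_transform)
test_dataset = FERDataset('fer2013/test', transform=test_transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=2)
```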
Training uses the AdamW optimizer with discriminative learning rates, so earlier layers receive smaller updates than the freshly initialized classifier head:

```python
import torch.optim as optim

learning_rate = 1e-3  # illustrative base learning rate; tune as needed

optimizer = optim.AdamW([
    {'params': model.conv1.parameters(), 'lr': learning_rate * 0.1},
    {'params': model.layer3.parameters(), 'lr': learning_rate * 0.3},
    {'params': model.layer4.parameters(), 'lr': learning_rate * 0.5},
    {'params': model.fc.parameters(), 'lr': learning_rate}
], weight_decay=0.01)
```
To reduce the learning rate when the validation loss plateaus:

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau

scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=2)
```
The training process includes:
- Forward pass and loss computation
- Backward pass with gradient clipping
- Optimization step
- Metrics tracking for both train and validation sets
- Model checkpointing based on validation accuracy (see the sketch below)
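A condensed sketch of one epoch, assuming the model, loaders, optimizer, and scheduler defined above. The loss criterion, clipping threshold, and epoch count are illustrative, and the test split doubles as the validation set since FER2013 ships with only train/ and test/:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = nn.CrossEntropyLoss()
num_epochs = 30
best_val_acc = 0.0

for epoch in range(num_epochs):
    # Forward pass, loss computation, backward pass with gradient clipping
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()

    # Validation metrics
    model.eval()
    val_loss, correct = 0.0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            val_loss += criterion(outputs, labels).item() * images.size(0)
            correct += (outputs.argmax(1) == labels).sum().item()
    val_loss /= len(test_dataset)
    val_acc = correct / len(test_dataset)

    scheduler.step(val_loss)  # ReduceLROnPlateau steps on validation loss

    # Keep the checkpoint with the highest validation accuracy
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save({'model_state_dict': model.state_dict()}, 'best_model.pth')
```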
The model achieves the following performance:
- Best Training Accuracy: ~74.52%
- Best Validation Accuracy: ~66.88%
- Final Training Loss: 0.64
- Final Validation Loss: 0.98
Per-class results on the test set:

| Emotion  | Precision | Recall | F1-score | Support |
|----------|-----------|--------|----------|---------|
| angry    | 0.61      | 0.59   | 0.60     | 958     |
| disgust  | 0.72      | 0.58   | 0.64     | 111     |
| fear     | 0.48      | 0.50   | 0.49     | 937     |
| happy    | 0.86      | 0.88   | 0.87     | 1774    |
| neutral  | 0.63      | 0.65   | 0.64     | 1233    |
| sad      | 0.57      | 0.55   | 0.56     | 1247    |
| surprise | 0.81      | 0.80   | 0.81     | 831     |
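A sketch of how such a report can be generated with scikit-learn, assuming the test loader defined earlier:

```python
import torch
from sklearn.metrics import classification_report

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images.to(device))
        all_preds.extend(outputs.argmax(1).cpu().tolist())
        all_labels.extend(labels.tolist())

print(classification_report(all_labels, all_preds,
                            target_names=list(EMOTIONS.keys())))
```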
To run inference on a single image:

```python
import cv2
import torch

def predict_emotion(image_path, model):
    # Load the image as grayscale and apply the evaluation transforms
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    image_tensor = test_transform(image).unsqueeze(0).to(device)

    # Get prediction
    model.eval()
    with torch.no_grad():
        output = model(image_tensor)
        _, predicted = torch.max(output, 1)
        emotion_idx = predicted.item()

    # Map the predicted index back to an emotion name
    emotions = list(EMOTIONS.keys())
    return emotions[emotion_idx]

# Example usage
emotion = predict_emotion("path/to/image.jpg", model)
print(f"Predicted emotion: {emotion}")
```
To restore a saved checkpoint:

```python
import torch

def load_model(model_path, num_classes=7):
    model = create_resnet18_model(num_classes)
    # map_location lets a GPU-trained checkpoint load on CPU-only machines
    checkpoint = torch.load(model_path, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.to(device)
    return model

# Example usage
model = load_model("best_model.pth")
```
- FER2013 Dataset: Kaggle
- ResNet Architecture: He et al., "Deep Residual Learning for Image Recognition" (2015)
- Transfer Learning: Pan and Yang, "A Survey on Transfer Learning" (2010)