This academic mini project detects and classifies a person's emotional state from voice and facial expressions. By combining speech and image data, the system aims to provide an early indication of mental health status using machine learning techniques.
The system is divided into three core modules:
- Voice Emotion Recognition – classifies emotions based on audio signals
- Facial Emotion Recognition – identifies emotions from facial images
- Integrated Model – combines predictions from both modules for better accuracy
This project was developed as a team effort during our final year of B.Tech in Information Technology.
Voice Emotion Recognition:
- Dataset: RAVDESS (not included in the repo due to size)
- Features Used: MFCC, Chroma, Mel Spectrogram
- Model: CNN-LSTM
- Output: Emotion label (e.g., happy, sad, angry)
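A minimal sketch of the audio feature extraction with Librosa is shown below; the parameter choices (e.g. `n_mfcc=40`) and the averaging of each feature over time are illustrative assumptions, not the project's exact settings.

```python
import numpy as np
import librosa

def extract_audio_features(path):
    """Extract MFCC, chroma, and mel-spectrogram features from one audio clip.

    Each feature is averaged over time frames so every clip yields a single
    fixed-length vector (an assumption for illustration, not a confirmed detail).
    """
    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr).T, axis=0)
    return np.hstack([mfcc, chroma, mel])  # one feature vector per clip

# features = extract_audio_features("path/to/ravdess_clip.wav")
```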
Facial Emotion Recognition:
- Dataset: FER2013 (publicly available on Kaggle)
- Model: CNN
- Output: Facial emotion classification into predefined categories
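For reference, a small Keras CNN of the kind typically trained on 48x48 grayscale FER2013 images might look like the sketch below; the layer sizes and the seven-class output are assumptions for illustration rather than the project's exact architecture.

```python
from tensorflow.keras import layers, models

def build_fer_cnn(num_classes=7):
    """Small CNN for 48x48 grayscale facial images (FER2013-style input)."""
    model = models.Sequential([
        layers.Input(shape=(48, 48, 1)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # one probability per emotion
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```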
Integrated Model:
- Merges predictions from both modules
- Gives a more reliable assessment of emotional state
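One simple way to merge the two modules is late fusion, i.e. a weighted average of their class probabilities. The sketch below assumes both models output probabilities over the same, identically ordered emotion classes; the fusion rule and the 50/50 weighting are illustrative assumptions, not the project's confirmed method.

```python
import numpy as np

def fuse_predictions(voice_probs, face_probs, voice_weight=0.5):
    """Late fusion: weighted average of the two models' class probabilities.

    Returns the index of the fused emotion label.
    """
    fused = (voice_weight * np.asarray(voice_probs)
             + (1 - voice_weight) * np.asarray(face_probs))
    return int(np.argmax(fused))
```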
Technologies Used:
- Python
- TensorFlow / Keras
- NumPy, Pandas
- Librosa (audio feature extraction)
- OpenCV (image processing; see the face preprocessing sketch after this list)
- Scikit-learn
- Matplotlib (visualization)
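As a preprocessing example, OpenCV can be used to detect and crop the face before it is passed to the facial model. The Haar-cascade detector and the 48x48 grayscale target size below are assumptions chosen to match FER2013-style input, not confirmed project details.

```python
import cv2
import numpy as np

def preprocess_face(image_path):
    """Detect the largest face in an image, crop it, and return a
    48x48 grayscale array scaled to [0, 1] (shape (48, 48, 1))."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found in the frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest detection
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
    return face.astype("float32")[..., np.newaxis] / 255.0
```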