This project focuses on recognizing human emotions from speech audio using machine learning and deep learning techniques. It combines multiple datasets and extracts meaningful features to classify emotions like happy, sad, angry, and more.
The goal is to build a model that can identify the emotional tone of a speaker based on audio input. This can be useful in applications like virtual assistants, call center analytics, and mental health monitoring.
- 📁 Jupyter Notebook: all code is in `SER_model.ipynb`
- 🧪 Feature extraction using MFCCs via `librosa`
- 🧠 LSTM-based deep learning model built with Keras
- 📊 Accuracy tracking and result visualization
```
speech-emotion-recognition/
│
├── SER_model.ipynb     # Main notebook
├── requirements.txt    # Python dependencies
├── README.md           # You’re here!
└── data/
    └── README.md       # Dataset download instructions
```
This project uses two popular emotional speech datasets:
- RAVDESS (hosted on Zenodo)
- TESS (University of Toronto)

Download links and instructions are provided in `data/README.md`.
1. Clone this repository:

   ```bash
   git clone https://github.com/your-username/speech-emotion-recognition.git
   cd speech-emotion-recognition
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the datasets as described in `data/README.md` and place them in the `data/` folder.

4. Launch the notebook:

   ```bash
   jupyter notebook SER_model.ipynb
   ```

5. Run all cells in order to train the model and view the results.
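The model itself is defined in the notebook; as a rough idea of what a Keras LSTM classifier for this task can look like (layer sizes, dropout rate, and the seven-class output here are illustrative assumptions, not the notebook's exact configuration):

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.models import Sequential


def build_model(timesteps=40, n_features=1, n_classes=7):
    # One LSTM layer over the feature sequence, then a small classifier head
    model = Sequential([
        Input(shape=(timesteps, n_features)),
        LSTM(128),
        Dropout(0.3),
        Dense(64, activation="relu"),
        Dense(n_classes, activation="softmax"),  # one probability per emotion
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With time-averaged MFCC vectors, each 40-dim vector is reshaped to `(40, 1)` so the LSTM treats the coefficients as a sequence; feeding full per-frame MFCC sequences instead is a common variation.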
- Emotion classification accuracy
- Training vs validation plots
- Spectrogram and MFCC visualizations of audio samples
- Add real-time emotion prediction from microphone input
- Experiment with CNN architectures
- Hyperparameter tuning for improved accuracy