Speech Emotion Recognition 📢😄😢😡

Overview

The Speech Emotion Recognition (SER) project is designed to detect emotions from human speech. By analyzing vocal patterns, this project aims to identify specific emotions such as happiness, sadness, anger, and more. Speech Emotion Recognition has applications in various domains, including customer service, mental health monitoring, and interactive voice assistants, where understanding the speaker's emotional state can enhance the overall experience and response accuracy.

Project Objectives

- To develop a machine learning model capable of identifying emotions from audio files.
- To preprocess speech data for optimal model training and evaluation.
- To evaluate model performance and refine it for greater accuracy in real-world applications.
- To build a pipeline for converting raw audio files into labeled emotions and visualizing the results.

Dataset

This project utilizes a speech emotion dataset that includes labeled audio files with emotions such as:

- Happiness
- Sadness
- Anger
- Fear
- Neutral

Popular datasets for SER include RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set), which contain audio samples with labeled emotions.
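If RAVDESS is used, the emotion label is encoded in the third hyphen-separated field of each filename, so a small helper can turn raw files into labeled examples. The sketch below is illustrative: the data/ directory and the helper name are assumptions, not part of this repository.

```python
from pathlib import Path

# RAVDESS encodes metadata in the filename; the third hyphen-separated
# field is the emotion code (per the dataset's published naming scheme).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def label_ravdess_files(data_dir="data"):
    """Return (path, emotion) pairs for every RAVDESS .wav file under data_dir."""
    labeled = []
    for wav in Path(data_dir).rglob("*.wav"):
        code = wav.stem.split("-")[2]   # e.g. "03-01-06-01-02-01-12" -> "06"
        labeled.append((wav, RAVDESS_EMOTIONS.get(code, "unknown")))
    return labeled

if __name__ == "__main__":
    for path, emotion in label_ravdess_files()[:5]:
        print(path.name, "->", emotion)
```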

Methodologies and Techniques

1. Data Preprocessing

- Audio Cleaning: Removing background noise to improve model performance.
- Feature Extraction: Extracting Mel-frequency cepstral coefficients (MFCCs), chroma, and mel spectrogram features from the audio files, which are essential for recognizing patterns in speech.
- Label Encoding: Encoding emotional labels as numerical values for model training.
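A minimal preprocessing sketch using librosa and scikit-learn, assuming silence trimming and peak normalization stand in for heavier noise removal; the function name and parameter values are illustrative.

```python
import librosa
import numpy as np
from sklearn.preprocessing import LabelEncoder

def preprocess_audio(path, sr=22050):
    """Load a clip, trim leading/trailing silence, and peak-normalize it."""
    y, sr = librosa.load(path, sr=sr)          # resample to a fixed rate
    y, _ = librosa.effects.trim(y, top_db=25)  # drop silent edges (crude cleaning)
    y = y / (np.max(np.abs(y)) + 1e-9)         # peak normalization
    return y, sr

# Encode string labels ("happy", "sad", ...) as integers for model training.
labels = ["happy", "sad", "angry", "fearful", "neutral"]
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(labels)
print(dict(zip(encoder.classes_, encoder.transform(encoder.classes_))))
```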

2. Feature Extraction

- MFCC (Mel-Frequency Cepstral Coefficients): Summarizes the short-term spectral envelope of the signal, capturing timbre-related cues that are critical for emotion recognition.
- Chroma Features: Analyze the pitch-class distribution, which is useful for distinguishing emotional tones.
- Spectral Features: Capture the energy distribution across different frequencies, useful for identifying mood or tone variations.
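These three feature families can be extracted with librosa; averaging each one over time gives a fixed-length vector per clip. This is a common approach, sketched here under that assumption rather than as the repository's exact pipeline.

```python
import librosa
import numpy as np

def extract_features(path, sr=22050, n_mfcc=40):
    """Return a fixed-length vector: time-averaged MFCC, chroma, and mel bands."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # spectral envelope
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # pitch-class energy
    mel = librosa.feature.melspectrogram(y=y, sr=sr)          # mel-scaled spectrum
    # Average each feature over time so every clip maps to the same vector length.
    return np.concatenate([
        mfcc.mean(axis=1),
        chroma.mean(axis=1),
        librosa.power_to_db(mel).mean(axis=1),
    ])
```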

3. Model Building and Training

- Model Selection: Common models include:
  - CNN (Convolutional Neural Networks): Effective for image-like audio spectrogram input.
  - LSTM (Long Short-Term Memory) networks: Useful for sequential data like audio.
  - Random Forests & SVMs: Traditional classifiers for simpler audio feature sets.
- Model Evaluation: Models are evaluated using metrics such as accuracy, precision, recall, and F1-score to ensure balanced and reliable emotion recognition.
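As an illustration of the CNN option, a compact Keras model over mel-spectrogram "images" might look like the sketch below. The input shape, layer sizes, and the assumption that train_model.py follows this design are all illustrative.

```python
import tensorflow as tf

NUM_CLASSES = 5              # happiness, sadness, anger, fear, neutral
INPUT_SHAPE = (128, 128, 1)  # assumed: mel spectrograms padded/cropped to 128x128

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=INPUT_SHAPE),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Typical training call, assuming X_train / y_train hold spectrograms and
# integer-encoded labels:
# model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=32)
```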

4. Data Visualization

- Confusion Matrix: Displays the model's accuracy by emotion class.
- Audio Signal Plots: Visual representations of different audio signals to show differences in waveform for various emotions.
- Feature Analysis: Visualizes feature distributions to assess how well different emotions separate.
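A minimal sketch of the confusion-matrix plot using scikit-learn and seaborn; the function and variable names are illustrative, not taken from visualize_results.py.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, class_names):
    """Heatmap of per-emotion classification counts."""
    cm = confusion_matrix(y_true, y_pred)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=class_names, yticklabels=class_names)
    plt.xlabel("Predicted emotion")
    plt.ylabel("True emotion")
    plt.title("Confusion matrix by emotion class")
    plt.tight_layout()
    plt.show()

# Example with hypothetical predictions:
# plot_confusion_matrix(y_test, model.predict(X_test).argmax(axis=1),
#                       class_names=encoder.classes_)
```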

Installation and Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/Speech-Emotion-Recognition.git
   cd Speech-Emotion-Recognition
   ```

2. Install the required libraries (Python 3.6+):

   ```bash
   pip install -r requirements.txt
   ```

3. Download the dataset: Place the audio dataset in the data/ directory and ensure each file is labeled with its corresponding emotion.

4. Train and test the model:

   ```bash
   python train_model.py
   ```

5. Visualize the results (accuracy, loss, and the confusion matrix):

   ```bash
   python visualize_results.py
   ```

This project requires the following libraries:

- Librosa: For audio analysis and feature extraction
- NumPy & Pandas: For data manipulation
- Scikit-Learn: For machine learning algorithms and evaluation metrics
- TensorFlow/Keras: For deep learning models (CNN, LSTM)
- Matplotlib & Seaborn: For visualizing data and results

To install all dependencies:

```bash
pip install -r requirements.txt
```

Results and Evaluation

The model's accuracy and performance were assessed using:

- Accuracy: Percentage of correctly classified emotions.
- Precision and Recall: Measure how well the model identifies each emotion.
- F1-Score: Balances precision and recall for a robust single metric of model performance.

Initial tests yielded promising accuracy across emotions, with some misclassifications caused by overlapping audio characteristics. Further tuning and additional training data can enhance these results.
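These metrics can be computed with scikit-learn's classification_report; the labels and predictions below are hypothetical placeholders, not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical integer-encoded labels and predictions for illustration only.
y_true = [0, 1, 2, 2, 3, 4, 1, 0]
y_pred = [0, 1, 2, 1, 3, 4, 1, 2]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(
    y_true, y_pred,
    target_names=["angry", "fearful", "happy", "neutral", "sad"]))
```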

Potential Applications

- Customer Service Automation: Detects customer sentiment in call centers to improve response quality.
- Mental Health Monitoring: Analyzes emotional states for early detection of mental health concerns.
- Virtual Assistants: Enables more natural interactions by detecting user emotions.
- Entertainment: Customizes music or video recommendations based on the user's emotional state.

Future Improvements

- Data Augmentation: Introduce more diverse audio samples to enhance model generalizability.
- Real-Time Recognition: Optimize the model for real-time emotion detection.
- Multi-Language Support: Adapt the model to different languages and accents to broaden its application.
- Advanced Models: Experiment with transformers or hybrid models for improved accuracy in emotion recognition.

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue if you'd like to improve the project or add new features.
