Ever wondered how a machine can feel your vibes? Well, here's your answer! This project dives into the world of audio signals and extracts the hidden emotions behind them. This README provides an overview of the code, datasets used, and key functionalities.
- Installation
- Datasets
- Data Preparation
- Feature Extraction
- Spectrogram Visualization
- Model Training
- Results
- Contributing
- License
To run this project, you need to install the required libraries. Use the following commands to set up your environment:
!apt-get update
!apt-get install -y libsndfile1
!pip install librosa seaborn tensorflow keras
The project uses the following datasets:
- RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song): Contains 24 professional actors (12 male, 12 female) vocalizing two lexically matched statements in a neutral North American accent.
- CREMA-D (Crowd-Sourced Emotional Multimodal Actors Dataset): Contains 7,442 original clips from 91 actors.
- SAVEE (Surrey Audio-Visual Expressed Emotion): Contains recordings from 4 male actors expressing 7 different emotions.
- TESS (Toronto Emotional Speech Set): Contains 200 target words spoken in the carrier phrase "Say the word _" by two actresses.
The data preparation involves reading the audio files from each dataset and extracting relevant information such as file paths and emotion labels. The code performs the following steps (a minimal loading sketch follows the list):
- Mount Google Drive: To access the datasets stored in Google Drive.
- Load RAVDESS Dataset: Extracts file paths and emotions, and stores them in a DataFrame.
- Load CREMA-D Dataset: Extracts file paths and emotions, and stores them in a DataFrame.
- Load SAVEE Dataset: Extracts file paths and emotions, and stores them in a DataFrame.
- Load TESS Dataset: Extracts file paths and emotions, and stores them in a DataFrame.
- Combine All Datasets: Combines the DataFrames from all datasets into a single DataFrame.
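As a rough illustration, the sketch below loads the RAVDESS files and maps the emotion code embedded in each filename to a label. The dataset path and column names are assumptions, not the project's actual configuration; the other three datasets are parsed the same way before everything is concatenated.

```python
import os
import pandas as pd

# Hypothetical dataset location; adjust to wherever RAVDESS lives in your Drive.
RAVDESS_PATH = "/content/drive/MyDrive/datasets/RAVDESS/"

# RAVDESS filenames encode metadata as dash-separated fields,
# e.g. "03-01-06-01-02-01-12.wav"; the third field is the emotion code.
ravdess_emotions = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fear", "07": "disgust", "08": "surprise",
}

paths, emotions = [], []
for root, _, files in os.walk(RAVDESS_PATH):
    for name in files:
        if name.endswith(".wav"):
            code = name.split("-")[2]  # emotion code field
            paths.append(os.path.join(root, name))
            emotions.append(ravdess_emotions.get(code, "unknown"))

ravdess_df = pd.DataFrame({"path": paths, "emotion": emotions})

# The other datasets encode the emotion in their file or folder names and are
# parsed similarly, then combined into a single DataFrame, e.g.:
# data_df = pd.concat([ravdess_df, crema_df, savee_df, tess_df], ignore_index=True)
```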
Various audio features are extracted to represent the audio signals (a librosa-based extraction sketch follows the list). The features include:
- RMS Energy: Root Mean Square energy of the audio signal.
- Zero Crossing Rate (ZCR): The rate at which the signal changes sign.
- Band Energy Ratio (BER): The ratio of energy in different frequency bands.
- Spectral Centroid: The center of mass of the spectrum.
- Spectral Bandwidth: The spread of the spectrum around its spectral centroid.
- Mel-Frequency Cepstral Coefficients (MFCCs): A representation of the short-term power spectrum of sound.
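The sketch below shows one way these features could be computed with librosa. The frame sizes, the 2 kHz split frequency for the band energy ratio, and the number of MFCCs are illustrative assumptions rather than the project's exact settings; librosa has no built-in band energy ratio, so it is derived from the STFT here.

```python
import numpy as np
import librosa

def extract_features(path, sr=22050, n_fft=2048, hop_length=512, split_freq=2000):
    """Return a dict of frame-wise features for one audio file (illustrative settings)."""
    y, sr = librosa.load(path, sr=sr)

    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop_length)[0]
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop_length)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length)[0]
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop_length)

    # Band energy ratio: per-frame energy below `split_freq` Hz divided by the
    # energy above it, computed from the power spectrogram.
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    split_bin = int(np.floor(split_freq * n_fft / sr))
    ber = power[:split_bin].sum(axis=0) / (power[split_bin:].sum(axis=0) + 1e-10)

    return {"rms": rms, "zcr": zcr, "ber": ber,
            "centroid": centroid, "bandwidth": bandwidth, "mfcc": mfcc}
```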
The project includes code for visualizing the spectrograms of audio signals (a plotting sketch follows the list). This includes:
- Magnitude Spectrum: The magnitude of the Fourier Transform of the signal.
- Spectrogram: A visual representation of the spectrum of frequencies of the signal as it varies with time.
- Log-Amplitude Spectrogram: The logarithm of the amplitude of the spectrogram.
- Mel Spectrogram: A spectrogram where the frequencies are converted to the Mel scale.
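A minimal plotting sketch, assuming librosa and matplotlib and a placeholder file name ("sample.wav"), showing how a log-amplitude spectrogram and a mel spectrogram could be displayed side by side:

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("sample.wav")  # any audio file from the combined DataFrame

# Log-amplitude spectrogram from the short-time Fourier transform.
stft = librosa.stft(y)
log_spec = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Mel spectrogram, converted to decibels for plotting.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
librosa.display.specshow(log_spec, sr=sr, x_axis="time", y_axis="log", ax=axes[0])
axes[0].set_title("Log-amplitude spectrogram")
librosa.display.specshow(log_mel, sr=sr, x_axis="time", y_axis="mel", ax=axes[1])
axes[1].set_title("Mel spectrogram")
plt.tight_layout()
plt.show()
```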
The project utilizes several machine learning models for emotion recognition from audio signals (a minimal Keras sketch follows the list). The models include:
- Convolutional Neural Networks (CNNs): For extracting spatial features from spectrograms.
- Long Short-Term Memory (LSTM): For capturing temporal dependencies in the audio signals.
- GRU (Gated Recurrent Unit): An alternative to LSTM for capturing temporal dependencies.
The models are built using Keras and TensorFlow.
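The sketch below is one plausible Keras architecture combining the building blocks listed above (1D convolutions followed by an LSTM). The input shape, layer sizes, and number of classes are assumptions, not the project's exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical input: 13 MFCC coefficients per frame over 228 frames, 8 emotion classes.
N_FRAMES, N_MFCC, N_CLASSES = 228, 13, 8

model = models.Sequential([
    layers.Input(shape=(N_FRAMES, N_MFCC)),
    # 1D convolutions pick up local spectral patterns across frames.
    layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    # A recurrent layer models temporal dependencies across the sequence.
    layers.LSTM(128),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Swapping `layers.LSTM(128)` for `layers.GRU(128)` gives the GRU variant with no other changes.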
The project evaluates model performance with a confusion matrix and a classification report, which summarize accuracy, precision, recall, and F1-score for each emotion class.
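A brief evaluation sketch, assuming a trained model plus hypothetical X_test, y_test (one-hot encoded), and class_names variables, using scikit-learn for the metrics and seaborn for the confusion-matrix heatmap:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix

# y_test is one-hot encoded; take the argmax to recover class indices.
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)

print(classification_report(y_true, y_pred, target_names=class_names))

cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
```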
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for more details.