# Sound Classification

Sound classification is considered one of the most important tasks in the field of deep learning. It has a great impact on voice-recognition applications within virtual assistants (such as Siri or Alexa) and customer service, as well as on music and media content recommendation systems. It also impacts the medical field, whether in detecting abnormalities in heartbeats or in respiratory sounds. In addition, it is used within security and surveillance systems to help detect and assess a possible security breach inside a home, whether it is inferred from distress calls or even gunshots and breaking glass. Therefore, we aim to develop deep learning algorithms that can properly classify the environmental sounds provided by the UrbanSound8k dataset.
This project was developed using a Jupyter Notebook. Therefore, if you are looking forward to testing it out yourself, keep in mind that you will need either an Anaconda distribution or third-party software that lets you inspect and execute notebooks. For more information regarding the virtual environment used in Anaconda, consider checking the DEPENDENCIES.md file.
The project includes several key phases:

- **Exploratory Data Analysis**: We begin by examining the UrbanSound8k dataset to gain deeper insight into its structure and content, helping us understand the distribution of sound classes.
- **Data Pre-processing**: Cleaning and preparing the audio samples to ensure their consistency and quality across the dataset.
- **Feature Engineering**: Utilizing the Librosa library, we extract meaningful features from the audio data, such as Mel-frequency cepstral coefficients (MFCCs); see the sketch after this list.
- **Model Architecture Definition**: We develop artificial neural network architectures tailored for sound classification, which involves experimenting with different deep learning models.
- **Training and Performance Evaluation**: Employing the pre-partitioned dataset, we perform 10-fold cross-validation on each developed network and assess the models' performance using key metrics such as accuracy and confusion matrices.
- **Statistical Inference**: Performing a statistical evaluation of the performance of all the developed networks.
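As a concrete illustration of the feature-engineering phase, here is a minimal sketch of MFCC extraction with Librosa. The file path and the choice of `n_mfcc=40` are assumptions for illustration, not necessarily the exact settings used in the notebook:

```python
import librosa
import numpy as np

def extract_mfcc(path, n_mfcc=40):
    """Load an audio clip and summarize it as a fixed-length MFCC vector."""
    # librosa resamples to 22.05 kHz mono by default, which normalizes
    # the varying sample rates found across UrbanSound8k clips.
    signal, sr = librosa.load(path)
    # Compute n_mfcc coefficients per frame, then average over time to get
    # a 1-D feature vector suitable for models such as an MLP.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)

# Hypothetical clip path following the UrbanSound8k naming scheme.
features = extract_mfcc("UrbanSound8K/audio/fold1/101415-3-0-2.wav")
print(features.shape)  # (40,)
```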
The UrbanSound8k dataset contains 8732 labeled sound excerpts (up to 4 seconds each) drawn from 10 urban sound classes. For a detailed description of the dataset, please consider checking the dataset web page available here. If you are interested in the compilation process, the dataset creators have published a paper outlining their Taxonomy for Urban Sound Research; you can access it here.
If you're interested in trying this project yourself, you'll need access to the complete datasets
we've created. Since GitHub has file size limits, we've made them all available here.
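Once the dataset is downloaded, its official metadata file ties every clip to its class and to one of the 10 predefined folds used for cross-validation. A minimal sketch, assuming the standard UrbanSound8K folder layout:

```python
import pandas as pd

# Path assumes the dataset was extracted into ./UrbanSound8K.
meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

# Each row maps a WAV file to its label and to one of the 10 predefined
# folds, which should be respected during 10-fold cross-validation.
print(meta[["slice_file_name", "fold", "class"]].head())
print(meta["class"].value_counts())  # class distribution, useful for the EDA phase
```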
| Network Architecture | Final Global Confusion Matrix |
|---|---|
| MLP | |
| CNN | |
| CNN Pre-Trained with YAMNet | |
| ResNet | |
- MLP: Achieved 45% accuracy, struggling with the complexity of the sound data.
- CNN: Performed better at 55%, benefiting from 2D time-frequency representations (MFCCs); a minimal sketch of this kind of network follows this list.
- YAMNet: Leveraging transfer learning, YAMNet outperformed other models with 70% accuracy.
- ResNet: Achieved 55%, similar to CNN, but not as effective as YAMNet.
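To make the CNN entry concrete, below is a minimal Keras sketch of a 2-D CNN over MFCC inputs. The input shape (40 coefficients × 174 frames), filter counts, and regularization settings are illustrative assumptions, not the exact architecture from the notebooks:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(40, 174, 1), n_classes=10):
    """Illustrative 2-D CNN over MFCC 'images'; sizes are assumptions."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),  # Dropout regularization, as used in the project
        layers.Dense(128, activation="relu",
                     kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```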
| Data Distribution Scatter Plots | 1-Dimensional Processed MFCCs | 2-Dimensional Raw MFCCs |
|---|---|---|
| PCA | | |
| t-SNE | | |
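The projections above can be reproduced with scikit-learn. Here is a sketch with a stand-in feature matrix; in the project this would be the MFCC feature matrix from the feature-engineering phase, and the t-SNE perplexity is a typical default rather than a confirmed setting:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in data: 200 clips x 40 MFCC coefficients (random for illustration).
X = np.random.rand(200, 40)

X_pca = PCA(n_components=2).fit_transform(X)    # linear projection
X_tsne = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(X)  # non-linear embedding
print(X_pca.shape, X_tsne.shape)                # (200, 2) (200, 2)
```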
This critical difference diagram shows the average rank of every model (a sketch of the underlying test follows this list):

- YAMNet (rank 1) is the best model, significantly outperforming the others.
- CNN (2.6) and ResNet (2.6) have similar performance, with no statistical difference between them.
- MLP (3.8) is the worst, significantly worse than YAMNet and likely worse than CNN and ResNet.
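Critical difference diagrams are commonly derived from a Friedman test over the per-fold results, followed by a post-hoc test (e.g. Nemenyi) on the average ranks. A sketch with made-up per-fold accuracies standing in for the real notebook outputs:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracies (10 folds per model); the real values
# come from the 10-fold cross-validation results.
mlp    = np.array([0.44, 0.46, 0.43, 0.45, 0.47, 0.44, 0.46, 0.45, 0.43, 0.46])
cnn    = np.array([0.54, 0.56, 0.53, 0.55, 0.57, 0.54, 0.56, 0.55, 0.53, 0.56])
resnet = np.array([0.55, 0.54, 0.56, 0.53, 0.55, 0.56, 0.54, 0.55, 0.53, 0.56])
yamnet = np.array([0.69, 0.71, 0.70, 0.72, 0.68, 0.70, 0.71, 0.69, 0.70, 0.71])

# Friedman test: do the models' per-fold ranks differ significantly?
statistic, p_value = stats.friedmanchisquare(mlp, cnn, resnet, yamnet)
print(f"Friedman chi-square = {statistic:.2f}, p = {p_value:.4f}")
```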
Our experiments showed that YAMNet with transfer learning produced the best results. Regularization techniques such as Dropout and L2 regularization helped reduce overfitting. While the models performed well overall, difficulties in distinguishing similar sound classes suggest that further improvements in feature extraction and model design could enhance performance.
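For reference, a transfer-learning setup of the kind described uses YAMNet (from TensorFlow Hub) as a frozen embedding extractor, with a small classification head trained on top. The head architecture below is an assumption for illustration, not the exact head used in the notebooks:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained YAMNet model from TensorFlow Hub.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def embed(waveform):
    """Return a clip-level embedding for a 16 kHz mono float32 waveform."""
    # YAMNet returns per-frame class scores, 1024-D embeddings, and a
    # log-mel spectrogram; the embeddings are averaged over time.
    scores, embeddings, spectrogram = yamnet(waveform)
    return tf.reduce_mean(embeddings, axis=0)

# Small classifier head trained on the frozen embeddings (sizes are illustrative).
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
head.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
```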
- Authors → Gonçalo Esteves, Nuno Gomes and Pedro Afonseca
- Course → Machine Learning II [CC3043]
- University → Faculty of Sciences, University of Porto
README.md by Gonçalo Esteves