
ML2 | Urban Sound Classification


Project Overview

Sound classification is considered one of the most important tasks in the field of deep learning. It has a great impact on voice recognition in virtual assistants (such as Siri or Alexa), on customer service, and on music and media content recommendation systems. It also matters in the medical field, where it can detect abnormalities in heartbeats or respiratory sounds, and in security and surveillance systems, where it helps detect and assess a possible security breach inside a home, whether signalled by distress calls, gunshots, or breaking glass. Therefore, we aim to develop deep learning algorithms that can properly classify the environmental sounds provided by the UrbanSound8K dataset.

Project Development

Dependencies & Execution

This project was developed as a Notebook. Therefore, if you are looking forward to testing it yourself, keep in mind that you will need either an Anaconda distribution or third-party software that lets you inspect and execute notebooks.

For more information regarding the virtual environment used in Anaconda, consider checking the DEPENDENCIES.md file.

Planned Work

The project is organised into several key phases:

  1. Exploratory Data Analysis: We begin by examining the UrbanSound8K dataset to gain deeper insights into its structure and content, which helps us understand the distribution of sound classes.
  2. Data Pre-processing: Cleaning and preparing the audio samples to ensure their consistency and quality.
  3. Feature Engineering: Using the Librosa library, we extract meaningful features from the audio data, such as Mel-frequency cepstral coefficients (MFCCs); see the extraction sketch after this list.
  4. Model Architecture Definition: We design artificial neural network architectures tailored for sound classification, which involves experimenting with different deep learning models.
  5. Training and Performance Evaluation: Using the pre-partitioned dataset, we perform 10-fold cross-validation on each developed network and assess the models' performance with key metrics such as accuracy and confusion matrices; a sketch of this evaluation loop also follows the list.
  6. Statistical Inference: We perform a statistical evaluation of the performance differences between all the developed networks.
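
As a rough illustration of the Feature Engineering phase, the sketch below shows how MFCCs could be extracted with Librosa. It is not the project's actual code: the number of coefficients, the time-averaging step, and the example path are assumptions.

```python
import librosa
import numpy as np

def extract_mfccs(audio_path: str, n_mfcc: int = 40) -> np.ndarray:
    """Load an audio clip and return its mean MFCC vector over time."""
    # librosa.load resamples to 22050 Hz by default and converts to mono
    signal, sample_rate = librosa.load(audio_path)
    # MFCC matrix of shape (n_mfcc, n_frames)
    mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    # Average over the time axis to obtain a fixed-length feature vector
    return np.mean(mfccs, axis=1)

# Example usage (the path and filename are placeholders)
# features = extract_mfccs("UrbanSound8K/audio/fold1/some_clip.wav")
# print(features.shape)  # (40,)
```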

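For the Training and Performance Evaluation phase, the loop below sketches 10-fold cross-validation driven by the dataset's pre-assigned folds. `build_model`, `features`, and `labels` are hypothetical placeholders; only the `fold` column name comes from the UrbanSound8K metadata.

```python
import numpy as np
import pandas as pd

def evaluate_with_predefined_folds(metadata: pd.DataFrame, features: np.ndarray,
                                   labels: np.ndarray, build_model) -> list:
    """Run 10-fold cross-validation using the dataset's pre-assigned folds.

    `build_model` is a hypothetical factory returning a fresh, untrained model
    exposing fit/predict; `features` and `labels` are aligned with `metadata`.
    """
    accuracies = []
    for fold in sorted(metadata["fold"].unique()):
        # Official protocol: train on the other nine folds, test on this one
        test_mask = metadata["fold"].values == fold
        model = build_model()
        model.fit(features[~test_mask], labels[~test_mask])
        predictions = model.predict(features[test_mask])
        accuracies.append(np.mean(predictions == labels[test_mask]))
    return accuracies
```
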
UrbanSound8K Dataset

The UrbanSound8K dataset contains 8732 labeled sound excerpts ($\le$ 4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy.

For a detailed description of the dataset please consider checking the dataset web page available here. In case you are interested in the compilation process, the dataset creators have published a paper outlining the Taxonomy for Urban Sound Research. You can access it here.
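
As a quick way to inspect the class distribution described above, the snippet below loads the metadata file that ships with the dataset; the path is an assumption and should be adjusted to wherever the archive is extracted.

```python
import pandas as pd

# Path is an assumption; adjust it to where the dataset was extracted
metadata = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

# Number of clips per class and per pre-assigned fold
print(metadata["class"].value_counts())
print(metadata.groupby("fold")["class"].count())
```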

Additional Datasets

If you're interested in trying this project yourself, you'll need access to the complete datasets we've created. Since GitHub has file size limits, we've made them all available here.

Project Results

Model Performance

The final global confusion matrices (included as figures in the repository) are reported for each network architecture: MLP, CNN, CNN pre-trained with YAMNet, and ResNet.

  • MLP: Achieved 45% accuracy, struggling with the complexity of the sound data.
  • CNN: Performed better at 55%, benefiting from 2D time-frequency representations (MFCCs).
  • YAMNet: Leveraging transfer learning, YAMNet outperformed other models with 70% accuracy.
  • ResNet: Achieved 55%, similar to CNN, but not as effective as YAMNet.

Dimensionality Reduction Visualization

Scatter plots (included as figures in the repository) show the data distribution of the 1-dimensional processed MFCCs and the 2-dimensional raw MFCCs under both PCA and t-SNE projections.
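
The projections above could be reproduced with scikit-learn; the following is a minimal sketch, where the 2D output dimensionality and the perplexity value are assumptions rather than the project's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def project_features(features: np.ndarray, method: str = "pca") -> np.ndarray:
    """Project per-clip feature vectors to 2D for a scatter plot."""
    if method == "pca":
        return PCA(n_components=2).fit_transform(features)
    # t-SNE is non-linear and considerably slower; the perplexity is a guess here
    return TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

# Example usage with hypothetical MFCC vectors of shape (n_clips, 40)
# embedding = project_features(mfcc_vectors, method="tsne")
```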

Critical Differences Diagram


The critical difference diagram (included as a figure in the repository) shows the average rank of every model:

  1. YAMNet (1.0) is the best model, significantly outperforming the others.
  2. CNN (2.6) and ResNet (2.6) have similar performance, with no statistically significant difference between them.
  3. MLP (3.8) is the worst, significantly worse than YAMNet and likely worse than CNN / ResNet.
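
A ranking like the one above is typically backed by a Friedman test over the per-fold accuracies. The sketch below shows one way to compute the test statistic and the average ranks with SciPy; the scores array is a placeholder, not the project's results.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Placeholder per-fold accuracies, one column per model (10 folds x 4 models);
# replace with the real cross-validation results
scores = np.random.rand(10, 4)
model_names = ["MLP", "CNN", "YAMNet", "ResNet"]

# Friedman test: do the models' fold-wise accuracies come from the same distribution?
statistic, p_value = friedmanchisquare(*scores.T)

# Average rank of each model across folds (rank 1 = best accuracy on that fold)
ranks = rankdata(-scores, axis=1).mean(axis=0)
print(dict(zip(model_names, ranks)), p_value)
```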

Conclusion

Our experiments showed that YAMNet with transfer learning produced the best results. Regularization techniques such as Dropout and L2 regularization helped reduce overfitting. While the models performed well overall, difficulties in distinguishing similar sound classes suggest that further improvements in feature extraction and model design could enhance performance.
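
As a minimal illustration of the two regularization techniques mentioned above, the Keras snippet below applies Dropout and L2 weight decay to a small dense classifier. The layer sizes, dropout rate, and L2 factor are illustrative, and the TensorFlow/Keras stack is assumed from the use of YAMNet.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Small dense classifier showing Dropout and L2 regularization;
# hyperparameter values here are illustrative, not the project's settings
model = tf.keras.Sequential([
    layers.Input(shape=(40,)),                      # e.g. a 40-dimensional MFCC vector
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),                            # zero half the activations during training
    layers.Dense(10, activation="softmax"),         # 10 UrbanSound8K classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```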

Authorship

README.md by Gonçalo Esteves

