Speech Recognition Using CNN and FSDD

Project Description

This project implements a convolutional neural network (CNN) to recognize spoken digits using the Free Spoken Digit Dataset (FSDD). The dataset includes recordings of spoken digits in WAV files, making it a suitable choice for developing and testing speech recognition models.

Dataset

The FSDD is a simple audio/speech dataset that includes recordings of spoken digits in English, trimmed to minimal silence. It features:

6 speakers
3,000 recordings (50 of each digit per speaker)

Dependencies

To run this notebook, you will need the following packages:

hub
numpy
scikit-learn (imported as sklearn)
matplotlib
tensorflow

Install these packages using pip:

pip install hub numpy scikit-learn matplotlib tensorflow

Usage

Data Preprocessing: Load the dataset and split it into training, validation, and testing sets.
Build and Train the Model: Run the notebook cells to build the CNN model and train it using the training data.
Evaluate the Model: Assess the model's performance on the test dataset using accuracy metrics and a confusion matrix.
Predictions: Use the trained model to make predictions on new data.
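
The split in the first step can be sketched as follows. This is a minimal illustration that uses only NumPy and placeholder labels; the notebook may well use scikit-learn for this instead. The helper name `stratified_split` and the 70/15/15 proportions are assumptions, not values taken from the notebook.

```python
import numpy as np


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return (train_idx, val_idx, test_idx), splitting every class in the same proportions.

    Hypothetical helper for illustration; the fractions are assumed, not from the notebook.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then carve off test and validation slices.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(idx) * test_frac))
        n_val = int(round(len(idx) * val_frac))
        test.extend(idx[:n_test])
        val.extend(idx[n_test:n_test + n_val])
        train.extend(idx[n_test + n_val:])
    return np.array(train), np.array(val), np.array(test)


# Placeholder labels with the same class balance as FSDD:
# 10 digits x 300 recordings each (6 speakers x 50 recordings).
labels = np.repeat(np.arange(10), 300)
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class keeps all ten digits equally represented in each subset, which makes the later accuracy figures easier to interpret.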

Model Details

The CNN model consists of several convolutional and pooling layers, followed by dense layers for classification. Training details, including the number of epochs and optimizer, are documented in the notebook.
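
A model of the kind described above might look like the sketch below. The input shape (64 x 64 single-channel spectrograms), the number of layers, the filter counts, and the Adam optimizer are all illustrative assumptions; the actual architecture and training settings are the ones in the notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_cnn(input_shape=(64, 64, 1), num_classes=10):
    """Build a small CNN classifier. All sizes here are assumed, not taken from the notebook."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        # Convolution + pooling stages extract local time-frequency patterns.
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        # Dense layers map the extracted features to the ten digit classes.
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels 0-9
        metrics=["accuracy"],
    )
    return model


model = build_cnn()
model.summary()
```

Training would then be a call such as `model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=...)`, with the epoch count chosen as in the notebook.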

Results

The notebook includes visualizations of model accuracy and loss, as well as a confusion matrix and classification report for a comprehensive evaluation.
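
To make the evaluation concrete, here is a small NumPy sketch of how a confusion matrix, overall accuracy, and per-class recall relate to one another. The labels below are made-up toy values, not results from this project, and the notebook itself may rely on scikit-learn's reporting functions instead.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=10):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Count each (true, predicted) pair; add.at handles repeated index pairs correctly.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly (0 for empty classes)."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)


# Toy example with three classes (illustrative values only).
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
recall = per_class_recall(cm)
print(cm)
print(accuracy, recall)
```

Off-diagonal entries show which digits are confused with which, something a single accuracy number hides.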

Save and Load the Model

The trained model is saved as Speech Recognition.h5. It can be loaded using TensorFlow/Keras for further use or deployment.
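
Loading the saved file could look like the sketch below. The helper name `load_digit_model` is hypothetical, and the sketch assumes the file sits in the current working directory; `tf.keras.models.load_model` is the standard Keras loading call.

```python
from pathlib import Path


def load_digit_model(path="Speech Recognition.h5"):
    """Load the saved Keras model, failing early with a clear error if the file is missing.

    Hypothetical helper; the default path assumes the file is in the working directory.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"Model file not found: {model_path}")
    # Imported here so the path check runs before the heavy TensorFlow import.
    import tensorflow as tf
    return tf.keras.models.load_model(str(model_path))


# Only attempt to load when the file is actually present.
if Path("Speech Recognition.h5").is_file():
    model = load_digit_model()
    model.summary()
```

Once loaded, `model.predict(batch)` returns one row of class probabilities per input, and taking the argmax of each row gives the predicted digit. The inputs must be preprocessed exactly as they were during training.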
