SherifGamal9441/Spoken-Digit-Recognition

🎙️ Spoken Digit Recognition

A simple yet effective bidirectional LSTM model trained to recognize spoken digits using spectrograms. This project serves as an end-to-end learning exercise in audio preprocessing, feature extraction, and sequence modeling.
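As a rough illustration of the feature-extraction step, the sketch below computes a log-magnitude spectrogram with plain NumPy. The function name `log_spectrogram`, the frame length, and the hop size are assumptions made for this example; the project itself lists librosa among its packages and may well use a mel spectrogram (for instance via `librosa.feature.melspectrogram`) instead. The 8 kHz sample rate matches how FSDD recordings are distributed.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a (frames, frequency_bins) log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not the project's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short clips so that at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)

# Synthetic stand-in for a recording: one second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
```

Each row of `spec` is one time step, which is the layout a sequence model such as an LSTM consumes.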


📌 Project Highlights

  • 🔊 Audio Classification: Predicts digits (0–9) from short spoken audio clips.
  • 📈 High Accuracy: Reaches a 96% F1 score with minimal preprocessing.
  • 🧠 Deep Learning: Utilizes a Bidirectional LSTM model trained on spectrogram features.
  • 🔁 Augmentation: Applies time-stretching and noise injection to improve generalization.
  • 📊 Visualization: Includes confusion matrix, spectrogram plot, and training curves.
  • 🎓 Educational Purpose: Built as a foundational step into speech and audio modeling.
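The two augmentations named above can be sketched in a few lines of NumPy. This is a simplified illustration, not the project's code: `time_stretch_naive` and `add_noise` are hypothetical helpers, and the naive stretch resamples by linear interpolation, which also shifts pitch. A phase-vocoder stretch such as `librosa.effects.time_stretch` changes duration without changing pitch and is likely closer to what the project uses. The stretch rate and signal-to-noise ratio are arbitrary example values.

```python
import numpy as np

def time_stretch_naive(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    Simplification: unlike a phase-vocoder stretch, this also shifts pitch.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_out = max(1, int(round(len(signal) / rate)))
    old_pos = np.arange(len(signal))
    new_pos = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_pos, old_pos, signal)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
# Synthetic stand-in for a one-second recording at 8 kHz.
clip = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
faster = time_stretch_naive(clip, rate=1.25)   # shorter clip
noisy = add_noise(clip, snr_db=20, rng=rng)    # same length, added noise
```

Augmentations like these are normally applied to training clips only, so that validation scores reflect unmodified audio.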

📂 Dataset

This project uses the Free Spoken Digit Dataset (FSDD), which contains:

  • Recordings of digits (0–9)
  • Multiple speakers
  • Clean and well-labeled audio, ideal for quick experimentation
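FSDD encodes its labels in the file names, which follow the pattern `{digit}_{speaker}_{index}.wav`. The helper below is a minimal sketch of reading those labels; `parse_fsdd_name` is a hypothetical name introduced here, and the `recordings/` directory in the example path is an assumption about where the files are stored locally.

```python
from pathlib import Path

def parse_fsdd_name(path):
    """Split an FSDD file name into (digit, speaker, index)."""
    stem = Path(path).stem
    # The digit is the first field and the index the last one.
    digit, rest = stem.split("_", 1)
    speaker, index = rest.rsplit("_", 1)
    return int(digit), speaker, int(index)

label = parse_fsdd_name("recordings/7_jackson_32.wav")
```

Keeping the speaker name alongside the digit makes it possible to hold out whole speakers for validation, which is a stricter test than a random split.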

📈 Evaluation & Results

Observations:

  • The model performs well on validation data with minimal overfitting.
  • Confusion matrix shows strong classification accuracy, especially for clearly spoken digits.
  • The model achieves a 96% F1 score.
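For reference, an F1 score can be derived directly from a confusion matrix. The sketch below computes the macro-averaged variant, which is an assumption: the averaging method behind the reported figure is not stated. The helper `macro_f1` is hypothetical, and the small three-class matrix is made-up data used only to exercise the function. It is not the project's confusion matrix.

```python
import numpy as np

def macro_f1(confusion):
    """Macro-averaged F1 from a confusion matrix (rows: true, cols: predicted)."""
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)
    predicted = confusion.sum(axis=0)   # column sums: how often each class was predicted
    actual = confusion.sum(axis=1)      # row sums: how often each class truly occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()

# Made-up 3-class example, for illustration only.
toy = np.array([
    [9, 1, 0],
    [1, 8, 1],
    [0, 0, 10],
])
score = macro_f1(toy)
```

With balanced classes, as in FSDD, macro and weighted averages of F1 coincide, so the choice matters less than it would on a skewed dataset.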

Conclusion:

This project shows that even simple models can be powerful when combined with clean datasets and good preprocessing. Bidirectional LSTMs capture temporal features well, and augmentation helps further boost performance. The approach provides a solid foundation for more complex speech-based applications.
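To make the bidirectional idea concrete, the sketch below runs one LSTM forward over a spectrogram and a second one over the time-reversed spectrogram, then concatenates their final states before a softmax over the ten digits. It is a NumPy forward pass with random, untrained weights, written only to show the data flow. The hidden size, input dimensions, and all function names are assumptions. The project itself is built with TensorFlow/Keras, where a `Bidirectional` wrapper around an `LSTM` layer plays this role.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(n_features, hidden, rng):
    """Random (untrained) LSTM weights; the four gates are stacked column-wise."""
    W = rng.normal(0, 0.1, (n_features, 4 * hidden))  # input weights
    U = rng.normal(0, 0.1, (hidden, 4 * hidden))      # recurrent weights
    b = np.zeros(4 * hidden)
    return W, U, b

def lstm_final_state(x, W, U, b):
    """Run one LSTM over x (time, features) and return the last hidden state."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)   # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def bilstm_digit_probs(spec, fwd, bwd, W_out, b_out):
    """Concatenate forward and backward final states, then apply softmax."""
    h_fwd = lstm_final_state(spec, *fwd)
    h_bwd = lstm_final_state(spec[::-1], *bwd)  # same clip, reversed in time
    features = np.concatenate([h_fwd, h_bwd])
    logits = features @ W_out + b_out
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
n_features, hidden, n_digits = 129, 16, 10      # illustrative sizes
fwd = init_lstm(n_features, hidden, rng)
bwd = init_lstm(n_features, hidden, rng)
W_out = rng.normal(0, 0.1, (2 * hidden, n_digits))
b_out = np.zeros(n_digits)

spec = rng.normal(size=(61, n_features))        # stand-in for a real spectrogram
probs = bilstm_digit_probs(spec, fwd, bwd, W_out, b_out)
```

Because the backward pass sees the end of the clip first, the combined feature vector summarizes the utterance from both directions, which is the property the conclusion above credits for capturing temporal structure.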


🧪 Virtual Environment

  • Python 3.10
  • Key Packages:
    • tensorflow==2.19.0
    • tf-keras==2.19.0
    • keras==3.9.2
    • pandas==1.4.2
    • numpy==1.26.4
    • matplotlib==3.10.0
    • seaborn==0.13.2
    • librosa==0.11.0
