An implementation of a sequence-to-sequence (seq2seq) model for speech recognition, with an architecture similar to "Listen, Attend and Spell": https://arxiv.org/pdf/1508.01211.pdf
Example Output:
Predicted: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
Actual: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
Requirements:
- TensorFlow
- numpy
- pandas
- librosa
- python_speech_features
Dataset:
I used the LibriSpeech dataset, which contains about 1000 hours of 16 kHz read English speech. Source: http://www.openslr.org/12/
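A minimal feature-extraction sketch using the librosa and python_speech_features dependencies listed above; the file name, MFCC settings, and per-utterance normalization are illustrative assumptions, not necessarily what this repo uses:

```python
import librosa
import numpy as np
from python_speech_features import mfcc

# Load a LibriSpeech utterance at its native 16 kHz sample rate
# (the file name below is a placeholder).
signal, sample_rate = librosa.load("61-70968-0000.flac", sr=16000)

# 13 MFCCs per 25 ms window with a 10 ms hop -> shape (num_frames, 13).
features = mfcc(signal, samplerate=sample_rate,
                winlen=0.025, winstep=0.01, numcep=13)

# Per-utterance mean/variance normalization (an assumed preprocessing step).
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
print(features.shape)
```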
Architecture Used:
Seq2Seq model
The encoder uses pyramidal bidirectional LSTMs: each pyramid layer concatenates adjacent pairs of frames from the layer below, halving the time resolution. This shortens the sequence the attention mechanism has to search over and improves performance on longer utterances (see the sketch after the list below).
- Encoder-Decoder
- Pyramidal Bidirectional LSTM
- Bahdanau Attention
- Adam Optimizer
- Exponential or Cyclic Learning Rate
- Beam Search or Greedy Decoding
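As referenced above, here is a minimal sketch of the pyramidal reduction in tf.keras; the layer width, number of pyramid levels, and input feature size are illustrative, not this repo's settings:

```python
import tensorflow as tf

class PyramidalBLSTM(tf.keras.layers.Layer):
    """One pBLSTM layer: a BiLSTM followed by merging adjacent frame pairs."""
    def __init__(self, units):
        super().__init__()
        self.blstm = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True))

    def call(self, inputs):
        x = self.blstm(inputs)                  # (batch, T, 2 * units)
        t = tf.shape(x)[1]
        x = x[:, : (t // 2) * 2, :]             # drop a trailing odd frame
        # Concatenate adjacent frames: (batch, T, D) -> (batch, T // 2, 2 * D).
        return tf.reshape(x, [tf.shape(x)[0], -1, 2 * x.shape[-1]])

# Three pyramid levels reduce the frame rate by 8x, as in the LAS paper.
x = tf.random.normal([4, 200, 13])              # (batch, frames, MFCCs)
for _ in range(3):
    x = PyramidalBLSTM(128)(x)
print(x.shape)                                  # (4, 25, 512)
```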
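The two learning-rate options could look like the following with the Adam optimizer; the decay constants and cycle bounds are placeholder values, not tuned settings from this repo:

```python
import tensorflow as tf

# Option 1: exponential decay, built into tf.keras.
exp_lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=10000, decay_rate=0.96)

# Option 2: a simple triangular cyclic schedule (Smith-style CLR).
class TriangularCyclicLR(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, base_lr, max_lr, step_size):
        self.base_lr, self.max_lr, self.step_size = base_lr, max_lr, step_size

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        cycle = tf.floor(1.0 + step / (2.0 * self.step_size))
        x = tf.abs(step / self.step_size - 2.0 * cycle + 1.0)
        # Rises linearly from base_lr to max_lr, then falls back to base_lr.
        return self.base_lr + (self.max_lr - self.base_lr) * tf.maximum(0.0, 1.0 - x)

optimizer = tf.keras.optimizers.Adam(learning_rate=exp_lr)  # or TriangularCyclicLR(...)
```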
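Finally, a minimal greedy-decoding sketch; `decoder_step` is a hypothetical stand-in for one step of the attention decoder, and the special-token ids are assumptions:

```python
import numpy as np

SOS_ID, EOS_ID = 0, 1  # assumed ids for the start- and end-of-sequence tokens

def greedy_decode(decoder_step, state, max_len=200):
    """Feed the argmax token back into the decoder until <EOS> or max_len."""
    token, output = SOS_ID, []
    for _ in range(max_len):
        logits, state = decoder_step(token, state)
        token = int(np.argmax(logits))
        if token == EOS_ID:
            break
        output.append(token)
    return output
```

Beam search generalizes this loop by keeping the k highest-scoring partial transcripts at each step instead of only the argmax token.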