
added DNN training #14


Open
wants to merge 13 commits into master
137 changes: 137 additions & 0 deletions python/DNN_training/README.md
@@ -0,0 +1,137 @@
# DNNs for ASR
This work is part of a Google Summer of Code project whose goal was to integrate DNNs with CMUSphinx. This repository contains convenience scripts that wrap Keras code and make it easy to train DNNs.
## Getting Started
Start by cloning the repository.
### Prerequisites
The required Python libraries available from PyPI are listed in requirements.txt. Install them by running:
```
pip install -r requirements.txt
```
Additional libraries not available from PyPI:
- tfrbm: used for DBN-DNN pretraining
  - available at https://github.com/meownoid/tensorfow-rbm
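
One possible way to install it directly from GitHub (assuming that repository ships a standard setup.py; verify before relying on it):
```
pip install git+https://github.com/meownoid/tensorfow-rbm.git
```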
## Usage
Since the project is primarily intended to be used with PocketSphinx, the feature files, state-segmentation output files, and prediction files all use Sphinx formats, described below.
### Feature File Format
```
N: number of frames
M: dimension of the feature vector
N*M (4 bytes)
Frame 1: f_1...f_M (4*M bytes)
.
.
.
Frame N: f_1...f_M (4*M bytes)
```
See readMFC in utils.py for a reference implementation.
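A minimal NumPy sketch of a reader for this layout (a simplified stand-in for the actual readMFC in utils.py; the byte order and the assumption that the feature dimension is known in advance are mine, not the original code's):
```
import numpy as np

def read_feat_file(path, n_filts):
    """Read a Sphinx-format feature file laid out as described above."""
    with open(path, 'rb') as f:
        # 4-byte header holding N*M, the total number of float values in the file
        n_values = int(np.fromfile(f, dtype=np.int32, count=1)[0])
        # N frames of M 4-byte floats each
        feats = np.fromfile(f, dtype=np.float32, count=n_values)
    return feats.reshape(-1, n_filts)
```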
### State-Segmentation Files
Format for each frame:
```
2 2 2 1 4 bytes
st1 [st2 st3] pos scr
```
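A sketch of unpacking one such frame record with Python's struct module; the presence of the bracketed st2/st3 fields and the big-endian byte order are assumptions here, so check utils.py for what the actual files contain:
```
import struct

# st1, st2, st3 (2 bytes each), pos (1 byte), scr (4 bytes) = 11 bytes per frame
FRAME_FMT = '>HHHBi'
FRAME_SIZE = struct.calcsize(FRAME_FMT)

def read_stseg_frame(buf, offset=0):
    """Unpack a single state-segmentation frame record from a bytes buffer."""
    st1, st2, st3, pos, scr = struct.unpack_from(FRAME_FMT, buf, offset)
    return st1, st2, st3, pos, scr
```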
### Prediction Output
Format for each frame:
```
N: number of states
N (2 bytes)
scr_1...scr_N (2*N bytes)
```
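A sketch of writing one frame of scores in this layout (assuming 16-bit integer scores and native byte order; the writer actually used by runNNPredict may differ):
```
import numpy as np

def write_pred_frame(f, scores):
    """Write one frame of N state scores in the layout described above."""
    np.array([len(scores)], dtype=np.int16).tofile(f)   # N (2 bytes)
    np.asarray(scores, dtype=np.int16).tofile(f)        # scr_1...scr_N (2*N bytes)
```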
### Wrapper Scripts
```
runDatasetGen.py -train_fileids -val_fileids [-test_fileids] -n_filts -feat_dir -feat_ext -stseg_dir -stseg_ext -mdef [-outfile_prefix] [-keep_utts]
```
runDatasetGen takes feature files and state-segmentation files stored in Sphinx format, along with the model definition (mdef) file of the GMM-HMM model, and generates a set of NumPy arrays that form a Python-readable dataset.
runDatasetGen writes the following files to the directory it was called from:
- Data Files
- <outfile_prefix>_train.npy
- <outfile_prefix>_dev.npy
- <outfile_prefix>_test.npy
- Label files
- <outfile_prefix>_train_label.npy
- <outfile_prefix>_dev_label.npy
- <outfile_prefix>_test_label.npy
- Metadata file
- <outfile_prefix>_meta.npz

The metadata file is a zipped collection of arrays with the following keys (a loading sketch follows this list):
- File names for utterances
- filenames_Train
- filenames_Dev
- filenames_Test
- Number of frames per utterance (useful if -keep_utts is not set)
- framePos_Train
- framePos_Dev
- framePos_Test
- State Frequencies (useful for scaling in some cases)
- state_freq_Train
- state_freq_Dev
- state_freq_Test
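
A sketch of loading the generated arrays back into Python; the prefix wsj0 is hypothetical:
```
import numpy as np

prefix = 'wsj0'  # hypothetical value of -outfile_prefix

# if -keep_utts was set, the data files may hold one array per utterance
train_data = np.load(prefix + '_train.npy')
train_labels = np.load(prefix + '_train_label.npy')
meta = np.load(prefix + '_meta.npz')

print(train_data.shape, train_labels.shape)
print(meta['framePos_Train'][:5], meta['state_freq_Train'][:5])
```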
```
runNNTrain.py -train_data -train_labels -val_data -val_labels -nn_config [-context_win] [-cuda_device_id] [-pretrain] [-keras_model] -model_name
```
runNNTrain takes the training and validation data files (as generated by runDatasetGen) and trains a neural network on them.
The architecture and parameters of the neural network are defined in a text configuration file. The script currently supports four network types:
- MLP (mlp)
- Convolutional Neural Network (conv)
- MLP with short cut connections (resnet)
- Convolutional Network with residual connections in the fully connected layers (conv + resnet)
See sample_nn.cfg for an example; an illustrative snippet is also given after the parameter list below.
The configuration file consists of ```param``` and ```value``` pairs. If a value has multiple elements (represented by ... below), they should be separated by spaces.
Params and possible values:
- **type** mlp, conv, resnet, conv+resnet
- **width** any integer value
- **depth** any integer value
- **dropout** float in (0,1)
- **batch_norm** -
- **activation** sigmoid, hard_sigmoid, elu, relu, selu, tanh, softplus, softsign, softmax, linear
- **optimizer** sgd, adam, adagrad
- **lr** float in (0,1)
- **batch_size** any integer value
- **ctc_loss** -
- for type = conv and type = conv+resnet
- **conv** [n_filters, filter_window]...
- **pooling** None, [max/avg, window_size, stride_size]
- for type = resnet and type = conv+resnet
- **block_depth** any integer value
- **n_blocks** any integer value
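
A hypothetical configuration in this format (not the contents of the actual sample_nn.cfg; the values are illustrative only):
```
type mlp
width 1024
depth 5
dropout 0.2
activation relu
optimizer adam
lr 0.001
batch_size 256
```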
```
runNNPredict.py -keras_model -ctldir -inext -outdir -outext -nfilts [-acoustic_weight] [-context_win] [-cuda_device_id]
```
runNNPredict takes a Keras model and a list of feature files and generates predictions. The predictions are stored as binary files in the Sphinx-readable format defined above.
Please ensure that the dimensionality of the feature vectors matches nfilts and that the context window is the same as the one the model was trained with.
The acoustic_weight is used to scale the output scores. This is needed because if the scores passed to a decoder such as PocketSphinx are too small or too large, decoding performance suffers. One way of estimating this weight is to generate scores from the GMM-HMM decoder being used, fit a linear regression between the GMM-HMM scores and the NN scores, and use the regression coefficient as the weight.
```
readSen.py -gmm_score_dir -gmm_ctllist -nn_score_dir -nn_ctllist [-gmm_ext] [-nn_ext]
```
readSen takes scores (stored in Sphinx-readable binary files) obtained from a GMM-HMM decoder and from an NN, and fits a regression between them.
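
The weight estimate described above, as a minimal NumPy sketch; it assumes the two score sets have already been read into aligned 1-D arrays, and the actual readSen.py may differ in details:
```
import numpy as np

def estimate_acoustic_weight(gmm_scores, nn_scores):
    """Fit gmm ~ w * nn + b and return the slope w as the acoustic weight."""
    gmm = np.asarray(gmm_scores, dtype=np.float64).ravel()
    nn = np.asarray(nn_scores, dtype=np.float64).ravel()
    w, b = np.polyfit(nn, gmm, 1)   # slope and intercept of the linear fit
    return float(w)
```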

## Example workflow with CMUSphinx
- Feature extraction using sphinx_fe:
```
sphinx_fe -argfile ../../en_us.ci_cont/feat.params -c etc/wsj0_train.fileids -di wav/ -do feat_ci_mls -mswav yes -eo mls -ei wav -ofmt sphinx -logspec yes
```
- State-segmentation using sphinx3_align:
```
sphinx3_align -hmm ../../en_us.ci_cont/ -dict etc/cmudict.0.6d.wsj0 -ctl etc/wsj0_train.fileids -cepdir feat_ci_mls/ -cepext .mfc -insent etc/wsj0.transcription -outsent wsj0.out -stsegdir stateseg_ci_dir/ -cmn batch
```
- Generate the dataset using runDatasetGen.py
- Train the NN using runNNTrain.py (example invocations for both steps are sketched below)
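For instance (paths, prefixes, and parameter values here are hypothetical; only the flags documented above are used):
```
python runDatasetGen.py -train_fileids etc/wsj0_train.fileids -val_fileids etc/wsj0_dev.fileids -n_filts 25 -feat_dir feat_ci_mls -feat_ext .mls -stseg_dir stateseg_ci_dir -stseg_ext .stseg -mdef ../../en_us.ci_cont/mdef -outfile_prefix wsj0
python runNNTrain.py -train_data wsj0_train.npy -train_labels wsj0_train_label.npy -val_data wsj0_dev.npy -val_labels wsj0_dev_label.npy -nn_config sample_nn.cfg -model_name wsj0_mlp
```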
### EITHER
- Generate predictions from the NN using runNNPredict.py
- Generate predictions from PocketSphinx:
```
pocketsphinx_batch -hmm ../../en_us.ci_cont/ -lm ../../tcb20onp.Z.DMP -cepdir feat_ci_mfc/ -ctl ../../GSOC/SI_ET_20.NDX -dict etc/cmudict.0.6d.wsj0 -senlogdir sendump_ci/ -compallsen yes -bestpath no -fwdflat no -remove_noise no -remove_silence no -logbase 1.0001 -pl_window 0
```
- Compute the acoustic weight using readSen.py
- Decode the scaled NN predictions with PocketSphinx
```
pocketsphinx_batch -hmm ../../en_us.ci_cont/ -lm ../../tcb20onp.Z.DMP -cepdir senscores/ -cepext .sen -hyp NN2.hyp -ctl ../../GSOC/SI_ET_20.NDX -dict etc/cmudict.0.6d.wsj0 -compallsen yes -logbase 1.0001 -pl_window 0 -senin yes
```
### OR
- Predict and decode with the PocketSphinx DNN decoder by passing your Keras model to it and setting the other required parameters:
```
pocketsphinx_batch -hmm ../../en_us.ci_cont_2/ -lm ../../tcb20onp.Z.DMP -cepdir feat_ci_dev_mls/ -cmn batch -hyp test_ci_2-2.hyp -ctl etc/wsj0_dev.fileids -dict etc/cmudict.0.6d.wsj0 -nnmgau ../../GSOC/bestModels/best_CI.h5 -pl_window 0 -ceplen 25 -ncep 25 -cudaid 2
```
NOTE: If you are using the PocketSphinx DNN decoder, please ensure that you select the appropriate feature type for your model. Be extra careful if you are training models that process the data utterance-wise instead of frame-wise, since the default behaviour of PocketSphinx is to perform frame-wise classification.
67 changes: 67 additions & 0 deletions python/DNN_training/requirements.txt
@@ -0,0 +1,67 @@
appdirs==1.4.3
audioread==2.1.5
backports.weakref==1.0rc1
bleach==1.5.0
cycler==0.10.0
Cython==0.25.2
daemonize==2.4.7
decorator==4.0.6
editdistance==0.3.1
funcsigs==1.0.2
functools32==3.2.3.post2
graphviz==0.7.1
guppy==0.1.10
h5py==2.7.0
htk-io==0.5
html5lib==0.9999999
ipython==2.4.1
joblib==0.11
Keras==2.0.6
Lasagne==0.2.dev1
librosa==0.5.1
Mako==1.0.6
Markdown==2.2.0
MarkupSafe==1.0
matplotlib==2.0.2
memory-profiler==0.47
mock==2.0.0
nose==1.3.7
numpy==1.13.1
packaging==16.8
pbr==3.1.1
pexpect==4.0.1
posix-ipc==1.0.0
protobuf==3.3.0
ptyprocess==0.5
py==1.4.33
pycurl==7.43.0
pydot==1.2.3
pydot-ng==1.0.0
pyfst==0.2.3
pygpu==0.6.5
pyliblzma==0.5.3
pyparsing==2.2.0
pysqlite==1.0.1
pytest==3.0.7
python-apt==1.1.0b1
python-dateutil==2.6.0
python-speech-features==0.5
pytools==2016.2.6
pytz==2017.2
PyYAML==3.12
resampy==0.1.5
rpm-python==4.12.0.1
scikit-learn==0.18.1
scipy==0.19.1
simplegeneric==0.8.1
six==1.10.0
subprocess32==3.2.7
tensorflow-gpu==1.2.0
tfrbm==0.0.2
Theano==0.9.0
tkinter==0.2.0
tqdm==4.14.0
urlgrabber==3.9.1
virtualenv==15.0.1
Werkzeug==0.12.2
yum-metadata-parser==1.1.4
9 changes: 9 additions & 0 deletions python/DNN_training/test.csv
@@ -0,0 +1,9 @@
epoch,acc,loss,lr,val_acc,val_loss
0,0.11622738801684533,5.8389106012230449,0.001,0.10160732136648122,5.7048709947973366
1,0.16153809341500766,4.8325580953638916,0.001,0.10914953494436717,5.4223426403024435
2,0.18681057379402757,4.4621437494093934,0.001,0.12780201451668985,5.2290357653056798
3,0.21127847434915772,4.1832528039470747,0.001,0.14062635822322669,5.0586822548562527
4,0.2326776057618683,3.9420465787738608,0.001,0.15207753824756606,4.8969079582681241
0,0.11584453962480858,5.8249653400724917,0.001,0.10603784553198888,5.6919887321217502
1,0.16048675583843797,4.8297243269807897,0.001,0.10991421462100139,5.4031377342712235
2,0.18789032589969373,4.4382342056329911,0.001,0.12922543245827539,5.2118432286385863
Binary file added python/DNN_training/test.png
9 changes: 8 additions & 1 deletion python/Makefile.am
@@ -72,6 +72,13 @@ nobase_scripts_SCRIPTS = \
cmusphinx/prune_mixw.py \
cmusphinx/quantize_mixw.py \
cmusphinx/lat2dot.py \
cmusphinx/qmwx.pyx
cmusphinx/qmwx.pyx \
cmusphinx/readSen.py \
cmusphinx/runDatasetGen.py \
cmusphinx/runNNPredict.py \
cmusphinx/runNNTrain.py \
cmusphinx/genLabels.py \
cmusphinx/utils.py \
cmusphinx/NNTrain.py

EXTRA_DIST = $(nobase_scripts_SCRIPTS)