
added DNN training #14


Open
wants to merge 13 commits into master
137 changes: 137 additions & 0 deletions python/DNN_training/README.md
@@ -0,0 +1,137 @@
# DNNs for ASR
This work is part of a Google Summer of Code project whose goal was to integrate DNNs with CMUSphinx. This repository contains convenience scripts that wrap Keras code and make it easy to train DNNs.
## Getting Started
Start by cloning the repository.
### Prerequisites
The required Python libraries available from PyPI are listed in requirements.txt. Install them by running:
```
pip install -r requirements.txt
```
Additional libraries not available from PyPI:
- tfrbm: used for DBN-DNN pretraining
  - available at https://github.com/meownoid/tensorfow-rbm
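
One possible way to install it directly from GitHub (assuming that repository ships a standard setup.py; verify before relying on it):
```
pip install git+https://github.com/meownoid/tensorfow-rbm.git
```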
## Usage
Since the project is primarily intended to be used with PocketSphinx, the feature files, state-segmentation output files, and prediction files all use Sphinx formats, described below.
### Feature File Format
```
N: number of frames
M: dimension of the feature vector
N*M (4 bytes)
Frame 1: f_1...f_M (4*M bytes)
.
.
.
Frame N: f_1...f_M (4*M bytes)
```
See readMFC in utils.py for a reference implementation.
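A minimal NumPy sketch of a reader for this layout (a simplified stand-in for the actual readMFC in utils.py; the byte order and the assumption that the feature dimension is known in advance are mine, not the original code's):
```
import numpy as np

def read_feat_file(path, n_filts):
    """Read a Sphinx-format feature file laid out as described above."""
    with open(path, 'rb') as f:
        # 4-byte header holding N*M, the total number of float values in the file
        n_values = int(np.fromfile(f, dtype=np.int32, count=1)[0])
        # N frames of M 4-byte floats each
        feats = np.fromfile(f, dtype=np.float32, count=n_values)
    return feats.reshape(-1, n_filts)
```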
### State-Segmentation Files
Format for each frame:
```
2 2 2 1 4 bytes
st1 [st2 st3] pos scr
```
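A sketch of unpacking one such frame record with Python's struct module; the presence of the bracketed st2/st3 fields and the big-endian byte order are assumptions here, so check utils.py for what the actual files contain:
```
import struct

# st1, st2, st3 (2 bytes each), pos (1 byte), scr (4 bytes) = 11 bytes per frame
FRAME_FMT = '>HHHBi'
FRAME_SIZE = struct.calcsize(FRAME_FMT)

def read_stseg_frame(buf, offset=0):
    """Unpack a single state-segmentation frame record from a bytes buffer."""
    st1, st2, st3, pos, scr = struct.unpack_from(FRAME_FMT, buf, offset)
    return st1, st2, st3, pos, scr
```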
### Prediction Output
Format for each frame:
```
N: number of states
N (2 bytes)
scr_1...scr_N (2*N bytes)
```
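A sketch of writing one frame of scores in this layout (assuming 16-bit integer scores and native byte order; the writer actually used by runNNPredict may differ):
```
import numpy as np

def write_pred_frame(f, scores):
    """Write one frame of N state scores in the layout described above."""
    np.array([len(scores)], dtype=np.int16).tofile(f)   # N (2 bytes)
    np.asarray(scores, dtype=np.int16).tofile(f)        # scr_1...scr_N (2*N bytes)
```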
### Wrapper Scripts
```
runDatasetGen.py -train_fileids -val_fileids [-test_fileids] -n_filts -feat_dir -feat_ext -stseg_dir -stseg_ext -mdef [-outfile_prefix] [-keep_utts]
```
runDatasetGen takes feature files and state-segmentation files stored in Sphinx format, along with the model definition (mdef) file of the GMM-HMM model, and generates a set of NumPy arrays that form a Python-readable dataset.
runDatasetGen writes the following files to the directory it was called from:
- Data Files
- <outfile_prefix>_train.npy
- <outfile_prefix>_dev.npy
- <outfile_prefix>_test.npy
- Label files
- <outfile_prefix>_train_label.npy
- <outfile_prefix>_dev_label.npy
- <outfile_prefix>_test_label.npy
- Metadata file
- <outfile_prefix>_meta.npz

The metadata file is a zipped collection of arrays with the following keys (a loading sketch follows this list):
- File names for utterances
- filenames_Train
- filenames_Dev
- filenames_Test
- Number of frames per utterance (useful if -keep_utts is not set)
- framePos_Train
- framePos_Dev
- framePos_Test
- State Frequencies (useful for scaling in some cases)
- state_freq_Train
- state_freq_Dev
- state_freq_Test
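
A sketch of loading the generated arrays back into Python; the prefix wsj0 is hypothetical:
```
import numpy as np

prefix = 'wsj0'  # hypothetical value of -outfile_prefix

# if -keep_utts was set, the data files may hold one array per utterance
train_data = np.load(prefix + '_train.npy')
train_labels = np.load(prefix + '_train_label.npy')
meta = np.load(prefix + '_meta.npz')

print(train_data.shape, train_labels.shape)
print(meta['framePos_Train'][:5], meta['state_freq_Train'][:5])
```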
```
runNNTrain.py -train_data -train_labels -val_data -val_labels -nn_config [-context_win] [-cuda_device_id] [-pretrain] [-keras_model] -model_name
```
runNNTrain takes the training and validation data files (as generated by runDatasetGen) and trains a neural network on them.
The architecture and parameters of the neural network are defined in a text configuration file. The script currently supports four network types:
- MLP (mlp)
- Convolutional Neural Network (conv)
- MLP with short cut connections (resnet)
- Convolutional Network with residual connections in the fully connected layers (conv + resnet)
See sample_nn.cfg for an example; an illustrative snippet is also given after the parameter list below.
The configuration file consists of ```param``` and ```value``` pairs. If a value has multiple elements (represented by ... below), they should be separated by spaces.
Params and possible values:
- **type** mlp, conv, resnet, conv+resnet
- **width** any integer value
- **depth** any integer value
- **dropout** float in (0,1)
- **batch_norm** -
- **activation** sigmoid, hard_sigmoid, elu, relu, selu, tanh, softplus, softsign, softmax, linear
- **optimizer** sgd, adam, adagrad
- **lr** float in (0,1)
- **batch_size** any integer value
- **ctc_loss** -
- for type = conv and type = conv+resnet
- **conv** [n_filters, filter_window]...
- **pooling** None, [max/avg, window_size, stride_size]
- for type = resnet and type = conv+resnet
- **block_depth** any integer value
- **n_blocks** any integer value
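
A hypothetical configuration in this format (not the contents of the actual sample_nn.cfg; the values are illustrative only):
```
type mlp
width 1024
depth 5
dropout 0.2
activation relu
optimizer adam
lr 0.001
batch_size 256
```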
```
runNNPredict.py -keras_model -ctldir -inext -outdir -outext -nfilts [-acoustic_weight] [-context_win] [-cuda_device_id]
```
runNNPredict takes a Keras model and a list of feature files and generates predictions. The predictions are stored as binary files in the Sphinx-readable format defined above.
Please ensure that the dimensionality of the feature vectors matches nfilts and that the context window is the same as the one the model was trained with.
The acoustic_weight is used to scale the output scores. This is needed because if the scores passed to a decoder such as PocketSphinx are too small or too large, decoding performance suffers. One way of estimating this weight is to generate scores from the GMM-HMM decoder being used, fit a linear regression between the GMM-HMM scores and the NN scores, and use the regression coefficient as the weight.
```
readSen.py -gmm_score_dir -gmm_ctllist -nn_score_dir -nn_ctllist [-gmm_ext] [-nn_ext]
```
readSen takes scores (stored in Sphinx-readable binary files) obtained from a GMM-HMM decoder and from an NN, and fits a regression between them.
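
The weight estimate described above, as a minimal NumPy sketch; it assumes the two score sets have already been read into aligned 1-D arrays, and the actual readSen.py may differ in details:
```
import numpy as np

def estimate_acoustic_weight(gmm_scores, nn_scores):
    """Fit gmm ~ w * nn + b and return the slope w as the acoustic weight."""
    gmm = np.asarray(gmm_scores, dtype=np.float64).ravel()
    nn = np.asarray(nn_scores, dtype=np.float64).ravel()
    w, b = np.polyfit(nn, gmm, 1)   # slope and intercept of the linear fit
    return float(w)
```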

## Example workflow with CMUSphinx
- Feature extraction using sphinx_fe:
```
sphinx_fe -argfile ../../en_us.ci_cont/feat.params -c etc/wsj0_train.fileids -di wav/ -do feat_ci_mls -mswav yes -eo mls -ei wav -ofmt sphinx -logspec yes
```
- State-segmentation using sphinx3_align:
```
sphinx3_align -hmm ../../en_us.ci_cont/ -dict etc/cmudict.0.6d.wsj0 -ctl etc/wsj0_train.fileids -cepdir feat_ci_mls/ -cepext .mfc -insent etc/wsj0.transcription -outsent wsj0.out -stsegdir stateseg_ci_dir/ -cmn batch
```
- Generate the dataset using runDatasetGen.py
- Train the NN using runNNTrain.py (example invocations for both steps are sketched below)
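For instance (paths, prefixes, and parameter values here are hypothetical; only the flags documented above are used):
```
python runDatasetGen.py -train_fileids etc/wsj0_train.fileids -val_fileids etc/wsj0_dev.fileids -n_filts 25 -feat_dir feat_ci_mls -feat_ext .mls -stseg_dir stateseg_ci_dir -stseg_ext .stseg -mdef ../../en_us.ci_cont/mdef -outfile_prefix wsj0
python runNNTrain.py -train_data wsj0_train.npy -train_labels wsj0_train_label.npy -val_data wsj0_dev.npy -val_labels wsj0_dev_label.npy -nn_config sample_nn.cfg -model_name wsj0_mlp
```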
### EITHER
- Generate predictions from the NN using runNNPredict.py
- Generate predictions from PocketSphinx:
```
pocketsphinx_batch -hmm ../../en_us.ci_cont/ -lm ../../tcb20onp.Z.DMP -cepdir feat_ci_mfc/ -ctl ../../GSOC/SI_ET_20.NDX -dict etc/cmudict.0.6d.wsj0 -senlogdir sendump_ci/ -compallsen yes -bestpath no -fwdflat no -remove_noise no -remove_silence no -logbase 1.0001 -pl_window 0
```
- Compute the acoustic weight using readSen.py
- Decode the scaled NN predictions with PocketSphinx
```
pocketsphinx_batch -hmm ../../en_us.ci_cont/ -lm ../../tcb20onp.Z.DMP -cepdir senscores/ -cepext .sen -hyp NN2.hyp -ctl ../../GSOC/SI_ET_20.NDX -dict etc/cmudict.0.6d.wsj0 -compallsen yes -logbase 1.0001 -pl_window 0 -senin yes
```
### OR
- Predict and decode with the PocketSphinx DNN decoder by passing your Keras model to it and setting the other required parameters:
```
pocketsphinx_batch -hmm ../../en_us.ci_cont_2/ -lm ../../tcb20onp.Z.DMP -cepdir feat_ci_dev_mls/ -cmn batch -hyp test_ci_2-2.hyp -ctl etc/wsj0_dev.fileids -dict etc/cmudict.0.6d.wsj0 -nnmgau ../../GSOC/bestModels/best_CI.h5 -pl_window 0 -ceplen 25 -ncep 25 -cudaid 2
```
NOTE: If you are using the PocketSphinx DNN decoder, please ensure that you select the appropriate feature type for your model. Be extra careful if you are training models that process the data utterance-wise instead of frame-wise, since the default behaviour of PocketSphinx is to perform frame-wise classification.
67 changes: 67 additions & 0 deletions python/DNN_training/requirements.txt
@@ -0,0 +1,67 @@
appdirs==1.4.3
audioread==2.1.5
backports.weakref==1.0rc1
bleach==1.5.0
cycler==0.10.0
Cython==0.25.2
daemonize==2.4.7
decorator==4.0.6
editdistance==0.3.1
funcsigs==1.0.2
functools32==3.2.3.post2
graphviz==0.7.1
guppy==0.1.10
h5py==2.7.0
htk-io==0.5
html5lib==0.9999999
ipython==2.4.1
joblib==0.11
Keras==2.0.6
Lasagne==0.2.dev1
librosa==0.5.1
Mako==1.0.6
Markdown==2.2.0
MarkupSafe==1.0
matplotlib==2.0.2
memory-profiler==0.47
mock==2.0.0
nose==1.3.7
numpy==1.13.1
packaging==16.8
pbr==3.1.1
pexpect==4.0.1
posix-ipc==1.0.0
protobuf==3.3.0
ptyprocess==0.5
py==1.4.33
pycurl==7.43.0
pydot==1.2.3
pydot-ng==1.0.0
pyfst==0.2.3
pygpu==0.6.5
pyliblzma==0.5.3
pyparsing==2.2.0
pysqlite==1.0.1
pytest==3.0.7
python-apt==1.1.0b1
python-dateutil==2.6.0
python-speech-features==0.5
pytools==2016.2.6
pytz==2017.2
PyYAML==3.12
resampy==0.1.5
rpm-python==4.12.0.1
scikit-learn==0.18.1
scipy==0.19.1
simplegeneric==0.8.1
six==1.10.0
subprocess32==3.2.7
tensorflow-gpu==1.2.0
tfrbm==0.0.2
Theano==0.9.0
tkinter==0.2.0
tqdm==4.14.0
urlgrabber==3.9.1
virtualenv==15.0.1
Werkzeug==0.12.2
yum-metadata-parser==1.1.4
9 changes: 9 additions & 0 deletions python/DNN_training/test.csv
@@ -0,0 +1,9 @@
epoch,acc,loss,lr,val_acc,val_loss
0,0.11622738801684533,5.8389106012230449,0.001,0.10160732136648122,5.7048709947973366
1,0.16153809341500766,4.8325580953638916,0.001,0.10914953494436717,5.4223426403024435
2,0.18681057379402757,4.4621437494093934,0.001,0.12780201451668985,5.2290357653056798
3,0.21127847434915772,4.1832528039470747,0.001,0.14062635822322669,5.0586822548562527
4,0.2326776057618683,3.9420465787738608,0.001,0.15207753824756606,4.8969079582681241
0,0.11584453962480858,5.8249653400724917,0.001,0.10603784553198888,5.6919887321217502
1,0.16048675583843797,4.8297243269807897,0.001,0.10991421462100139,5.4031377342712235
2,0.18789032589969373,4.4382342056329911,0.001,0.12922543245827539,5.2118432286385863
Binary file added python/DNN_training/test.png
9 changes: 8 additions & 1 deletion python/Makefile.am
@@ -72,6 +72,13 @@ nobase_scripts_SCRIPTS = \
cmusphinx/prune_mixw.py \
cmusphinx/quantize_mixw.py \
cmusphinx/lat2dot.py \
cmusphinx/qmwx.pyx
cmusphinx/qmwx.pyx \
cmusphinx/readSen.py \
cmusphinx/runDatasetGen.py \
cmusphinx/runNNPredict.py \
cmusphinx/runNNTrain.py \
cmusphinx/genLabels.py \
cmusphinx/utils.py \
cmusphinx/NNTrain.py

EXTRA_DIST = $(nobase_scripts_SCRIPTS)