TDNN-LSTM-CTC
=============

This tutorial shows you how to run a TDNN-LSTM-CTC model with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.


.. HINT::

  We assume you have read the page :ref:`install icefall` and have set up
  the environment for ``icefall``.


Data preparation
----------------

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ./prepare.sh

The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.

The data preparation contains several stages. You can use the following two
options:

  - ``--stage``
  - ``--stop-stage``

to control which stage(s) should be run. By default, all stages are executed.

For example,

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ./prepare.sh --stage 0 --stop-stage 0

means to run only stage 0.

To run stage 2 to stage 5, use:

.. code-block:: bash

  $ ./prepare.sh --stage 2 --stop-stage 5


Training
--------

This section describes the training of the TDNN-LSTM-CTC model, which is contained in
the `tdnn_lstm_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc>`_
folder.

The command to run the training part is:

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ export CUDA_VISIBLE_DEVICES="0,1,2,3"
  $ ./tdnn_lstm_ctc/train.py --world-size 4

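The command above assumes 4 GPUs are available. If you have only a single GPU,
a sketch of the corresponding command is shown below (assuming ``--world-size``
simply sets the number of GPU processes, as in the 4-GPU example above):

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ export CUDA_VISIBLE_DEVICES="0"
  $ ./tdnn_lstm_ctc/train.py --world-size 1
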
By default, it will run ``20`` epochs. Training logs and checkpoints are saved
in ``tdnn_lstm_ctc/exp``.

In ``tdnn_lstm_ctc/exp``, you will find the following files:

  - ``epoch-0.pt``, ``epoch-1.pt``, ..., ``epoch-19.pt``

    These are checkpoint files, containing the model ``state_dict`` and the optimizer ``state_dict``.
    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:

    .. code-block:: bash

      $ ./tdnn_lstm_ctc/train.py --start-epoch 11

  - ``tensorboard/``

    This folder contains TensorBoard logs. Training loss, validation loss, learning
    rate, etc., are recorded in these logs. You can visualize them by:

    .. code-block:: bash

      $ cd tdnn_lstm_ctc/exp/tensorboard
      $ tensorboard dev upload --logdir . --description "TDNN LSTM training for librispeech with icefall"

  - ``log/log-train-xxxx``

    This is the detailed training log in text format, the same as the one
    printed to the console during training.
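
    For example, to look at the last lines of the most recent log, you can use
    the following sketch (the actual file name contains a timestamp, so we
    pick the newest file matching the pattern):

    .. code-block:: bash

      $ tail -n 20 $(ls -t tdnn_lstm_ctc/exp/log/log-train-* | head -n 1)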

To see the available training options, you can use:

.. code-block:: bash

  $ ./tdnn_lstm_ctc/train.py --help

Other training options, e.g., the learning rate and the results directory, are
pre-configured in the function ``get_params()``
in `tdnn_lstm_ctc/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/train.py>`_.
Normally, you don't need to change them. If you do want to change them, modify the code.

Decoding
--------

The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.

The command for decoding is:

.. code-block:: bash

  $ export CUDA_VISIBLE_DEVICES="0"
  $ ./tdnn_lstm_ctc/decode.py

You will see the WER in the output log.

Decoded results are saved in ``tdnn_lstm_ctc/exp``.

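To take a quick look at what was written there, here is a sketch (the exact
file names depend on the decoding method and the test sets, so treat the
``*.txt`` pattern below as an assumption rather than a guaranteed layout):

.. code-block:: bash

  $ ls -l tdnn_lstm_ctc/exp
  $ grep -r "WER" tdnn_lstm_ctc/exp --include "*.txt" | head
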
.. code-block:: bash

  $ ./tdnn_lstm_ctc/decode.py --help

shows you the available decoding options.

Some commonly used options are:

  - ``--epoch``

    It selects which checkpoint to use for decoding.
    For instance, ``./tdnn_lstm_ctc/decode.py --epoch 10`` means to use
    ``./tdnn_lstm_ctc/exp/epoch-10.pt`` for decoding.

  - ``--avg``

    It is related to model averaging. It specifies the number of checkpoints
    to average. The averaged model is used for decoding.
    For example, the following command:

    .. code-block:: bash

      $ ./tdnn_lstm_ctc/decode.py --epoch 10 --avg 3

    uses the average of ``epoch-8.pt``, ``epoch-9.pt``, and ``epoch-10.pt``
    for decoding.

  - ``--export``

    If it is ``True``, i.e., ``./tdnn_lstm_ctc/decode.py --export 1``, the code
    will save the averaged model to ``tdnn_lstm_ctc/exp/pretrained.pt``.
    See :ref:`tdnn_lstm_ctc use a pre-trained model` for how to use it.

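The options above can be combined. For example, the following sketch decodes
with the average of the last 5 checkpoints and also exports the averaged model
(the specific values here are only an illustration, not recommended settings):

.. code-block:: bash

  $ ./tdnn_lstm_ctc/decode.py --epoch 19 --avg 5 --export 1
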
.. HINT::

  There are several decoding methods provided in `tdnn_lstm_ctc/decode.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py>`_. You can change the decoding method by modifying the ``method`` parameter in the function ``get_params()``.


.. _tdnn_lstm_ctc use a pre-trained model:

Pre-trained Model
-----------------

We have uploaded the pre-trained model to
`<https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc>`_.

The following shows you how to use the pre-trained model.

Download the pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ mkdir tmp
  $ cd tmp
  $ git lfs install
  $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc

.. CAUTION::

  You have to use ``git lfs`` to download the pre-trained model.

After downloading, you will have the following files:

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ tree tmp

.. code-block:: bash

  tmp/
  `-- icefall_asr_librispeech_tdnn-lstm_ctc
      |-- README.md
      |-- data
      |   |-- lang_phone
      |   |   |-- HLG.pt
      |   |   |-- tokens.txt
      |   |   `-- words.txt
      |   `-- lm
      |       `-- G_4_gram.pt
      |-- exp
      |   `-- pretrained.pt
      `-- test_wavs
          |-- 1089-134686-0001.flac
          |-- 1221-135766-0001.flac
          |-- 1221-135766-0002.flac
          `-- trans.txt

  6 directories, 10 files

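If ``git lfs`` was not installed when you ran ``git clone``, the ``*.pt`` files
above may be small text pointer files instead of the real checkpoints. A quick
sanity check is sketched below (the exact sizes do not matter, but they should
be far larger than a few hundred bytes):

.. code-block:: bash

  $ ls -lh tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt
  $ ls -lh tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt
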
Download kaldifeat
~~~~~~~~~~~~~~~~~~

`kaldifeat <https://github.com/csukuangfj/kaldifeat>`_ is used for extracting
features from a single sound file or from multiple sound files. Please refer to
`<https://github.com/csukuangfj/kaldifeat>`_ to install ``kaldifeat`` first.

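For reference, one common way to install it is via ``pip`` (this is only a
sketch and assumes a working PyTorch is already installed in your environment;
if it does not work for your setup, follow the instructions in the
``kaldifeat`` repository instead):

.. code-block:: bash

  $ pip install kaldifeat
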
Inference with a pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ./tdnn_lstm_ctc/pretrained.py --help

shows the usage information of ``./tdnn_lstm_ctc/pretrained.py``.

To decode with the ``1best`` method, we can use:

.. code-block:: bash

  ./tdnn_lstm_ctc/pretrained.py \
    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac

The output is:

.. code-block::

  2021-08-24 16:57:13,315 INFO [pretrained.py:168] device: cuda:0
  2021-08-24 16:57:13,315 INFO [pretrained.py:170] Creating model
  2021-08-24 16:57:18,331 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
  2021-08-24 16:57:27,581 INFO [pretrained.py:199] Constructing Fbank computer
  2021-08-24 16:57:27,584 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
  2021-08-24 16:57:27,599 INFO [pretrained.py:215] Decoding started
  2021-08-24 16:57:27,791 INFO [pretrained.py:245] Use HLG decoding
  2021-08-24 16:57:28,098 INFO [pretrained.py:266]
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


  2021-08-24 16:57:28,099 INFO [pretrained.py:268] Decoding Done

To decode with the ``whole-lattice-rescoring`` method, you can use:

.. code-block:: bash

  ./tdnn_lstm_ctc/pretrained.py \
    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
    --method whole-lattice-rescoring \
    --G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt \
    --ngram-lm-scale 0.8 \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac

The decoding output is:

.. code-block::

  2021-08-24 16:39:24,725 INFO [pretrained.py:168] device: cuda:0
  2021-08-24 16:39:24,725 INFO [pretrained.py:170] Creating model
  2021-08-24 16:39:29,403 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
  2021-08-24 16:39:40,631 INFO [pretrained.py:190] Loading G from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt
  2021-08-24 16:39:53,098 INFO [pretrained.py:199] Constructing Fbank computer
  2021-08-24 16:39:53,107 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
  2021-08-24 16:39:53,121 INFO [pretrained.py:215] Decoding started
  2021-08-24 16:39:53,443 INFO [pretrained.py:250] Use HLG decoding + LM rescoring
  2021-08-24 16:39:54,010 INFO [pretrained.py:266]
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


  2021-08-24 16:39:54,010 INFO [pretrained.py:268] Decoding Done


Colab notebook
--------------

We provide a Colab notebook for decoding with the pre-trained model.

|librispeech tdnn_lstm_ctc colab notebook|

.. |librispeech tdnn_lstm_ctc colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd


**Congratulations!** You have finished the TDNN-LSTM-CTC recipe on LibriSpeech in ``icefall``.