
Commit f4223ee

Add TDNN-LSTM-CTC Results (#25)

* Add tdnn-lstm pretrained model and results
* Add docs for TDNN-LSTM-CTC
* Minor fix
* Fix typo
* Fix style checking

1 parent 1bd5dcc commit f4223ee

File tree

9 files changed: +915 -6 lines changed


.flake8

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ max-line-length = 80
 per-file-ignores =
     # line too long
     egs/librispeech/ASR/conformer_ctc/conformer.py: E501,
+    egs/librispeech/ASR/conformer_ctc/decode.py: E501,

 exclude =
     .git,

Lines changed: 321 additions & 1 deletion
@@ -1,2 +1,322 @@
-TDNN LSTM CTC
+TDNN-LSTM-CTC
 =============

This tutorial shows you how to run a TDNN-LSTM-CTC model with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.

.. HINT::

  We assume you have read the page :ref:`install icefall` and have set up
  the environment for ``icefall``.

Data preparation
----------------

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ./prepare.sh

The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.

The data preparation consists of several stages. You can use the following two
options:

- ``--stage``
- ``--stop-stage``

to control which stage(s) should be run. By default, all stages are executed.

For example,

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ./prepare.sh --stage 0 --stop-stage 0

means to run only stage 0.

To run stage 2 to stage 5, use:

.. code-block:: bash

  $ ./prepare.sh --stage 2 --stop-stage 5
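After ``./prepare.sh`` finishes, the prepared data lives under ``data/``. As a
quick sanity check, you can list that directory (a sketch; the exact
sub-directories are an assumption based on this recipe's conventions and
depend on which stages you ran):

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ls data
  # expected (assumption): sub-directories such as fbank, lang_phone and lm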
Training
--------

We now describe the training of the TDNN-LSTM-CTC model, which is contained in
the `tdnn_lstm_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc>`_
folder.

The command to run the training part is:

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ export CUDA_VISIBLE_DEVICES="0,1,2,3"
  $ ./tdnn_lstm_ctc/train.py --world-size 4
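If you have only one GPU, a reduced variant of the same command should work (a
sketch; with fewer GPUs you may need to adjust the number of epochs or the
learning rate to reach a comparable WER):

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ export CUDA_VISIBLE_DEVICES="0"
  $ ./tdnn_lstm_ctc/train.py --world-size 1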
By default, it will run ``20`` epochs. Training logs and checkpoints are saved
in ``tdnn_lstm_ctc/exp``.

In ``tdnn_lstm_ctc/exp``, you will find the following files:

- ``epoch-0.pt``, ``epoch-1.pt``, ..., ``epoch-19.pt``

  These are checkpoint files, each containing the model ``state_dict`` and the
  optimizer ``state_dict`` (a sketch for inspecting a checkpoint's contents
  follows this list).
  To resume training from some checkpoint, say ``epoch-10.pt``, you can use:

  .. code-block:: bash

    $ ./tdnn_lstm_ctc/train.py --start-epoch 11
- ``tensorboard/``

  This folder contains TensorBoard logs. Training loss, validation loss,
  learning rate, etc., are recorded in these logs. You can visualize them by:

  .. code-block:: bash

    $ cd tdnn_lstm_ctc/exp/tensorboard
    $ tensorboard dev upload --logdir . --description "TDNN LSTM training for librispeech with icefall"

- ``log/log-train-xxxx``

  It is the detailed training log in text format, the same as the one
  you saw printed to the console during training.
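If you are curious what a checkpoint actually holds, you can inspect it from
the shell (a sketch; the key names are an assumption based on the description
above, not verified against the training script):

.. code-block:: bash

  $ python3 -c "import torch; ckpt = torch.load('tdnn_lstm_ctc/exp/epoch-10.pt', map_location='cpu'); print(list(ckpt.keys()))"
  # expected (assumption): keys such as 'model' and 'optimizer'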
To see the available training options, you can use:

.. code-block:: bash

  $ ./tdnn_lstm_ctc/train.py --help

Other training options, e.g., learning rate, results dir, etc., are
pre-configured in the function ``get_params()``
in `tdnn_lstm_ctc/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/train.py>`_.
Normally, you don't need to change them; if you do, you can change them by
modifying the code.

Decoding
--------

The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.

The command for decoding is:

.. code-block:: bash

  $ export CUDA_VISIBLE_DEVICES="0"
  $ ./tdnn_lstm_ctc/decode.py

You will see the WER in the output log.

Decoded results are saved in ``tdnn_lstm_ctc/exp``.
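The exact names of the result files are not spelled out in this tutorial;
based on icefall's conventions you will typically see per-test-set transcripts
and error statistics (an assumption; check what your own run actually
produces):

.. code-block:: bash

  $ ls tdnn_lstm_ctc/exp
  # e.g. (assumption): recogs-test-clean-*.txt, errs-test-clean-*.txt,
  # plus the log/ and tensorboard/ directories described above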
.. code-block:: bash

  $ ./tdnn_lstm_ctc/decode.py --help

shows you the available decoding options.

Some commonly used options are:

- ``--epoch``

  You can select which checkpoint to use for decoding.
  For instance, ``./tdnn_lstm_ctc/decode.py --epoch 10`` means to use
  ``./tdnn_lstm_ctc/exp/epoch-10.pt`` for decoding.

- ``--avg``

  It's related to model averaging. It specifies the number of checkpoints
  to be averaged. The averaged model is used for decoding.
  For example, the following command:

  .. code-block:: bash

    $ ./tdnn_lstm_ctc/decode.py --epoch 10 --avg 3

  uses the average of ``epoch-8.pt``, ``epoch-9.pt`` and ``epoch-10.pt``
  for decoding.

- ``--export``

  If it is ``True``, i.e., ``./tdnn_lstm_ctc/decode.py --export 1``, the code
  will save the averaged model to ``tdnn_lstm_ctc/exp/pretrained.pt``.
  See :ref:`tdnn_lstm_ctc use a pre-trained model` for how to use it.
  A sketch combining these options appears right after this list.
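Putting these options together, a decoding run that averages the last few
checkpoints and also exports the averaged model might look like this (a
sketch; the values 19 and 6 are illustrative, chosen to match the averaging
range reported in RESULTS.md, not tuned settings):

.. code-block:: bash

  $ ./tdnn_lstm_ctc/decode.py --epoch 19 --avg 6 --export 1
  # averages epoch-14.pt ... epoch-19.pt and saves the averaged model
  # to tdnn_lstm_ctc/exp/pretrained.pt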
.. HINT::

  There are several decoding methods provided in `tdnn_lstm_ctc/decode.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py>`_; you can change the decoding method by modifying the ``method`` parameter in the function ``get_params()``.


.. _tdnn_lstm_ctc use a pre-trained model:

Pre-trained Model
-----------------

We have uploaded the pre-trained model to
`<https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc>`_.

The following shows you how to use the pre-trained model.

Download the pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ mkdir tmp
  $ cd tmp
  $ git lfs install
  $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc

.. CAUTION::

  You have to use ``git lfs`` to download the pre-trained model.
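Since ``git lfs`` keeps large files as small pointer stubs until they are
fetched, it is worth confirming that the real checkpoint arrived (a sketch; a
pointer stub is only a few hundred bytes, while the actual model is far
larger):

.. code-block:: bash

  $ ls -lh icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt
  # a tiny size here means git lfs did not fetch the file;
  # in that case, run `git lfs pull` inside the cloned directory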
After downloading, you will have the following files:

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ tree tmp

.. code-block:: bash

  tmp/
  `-- icefall_asr_librispeech_tdnn-lstm_ctc
      |-- README.md
      |-- data
      |   |-- lang_phone
      |   |   |-- HLG.pt
      |   |   |-- tokens.txt
      |   |   `-- words.txt
      |   `-- lm
      |       `-- G_4_gram.pt
      |-- exp
      |   `-- pretrained.pt
      `-- test_wavs
          |-- 1089-134686-0001.flac
          |-- 1221-135766-0001.flac
          |-- 1221-135766-0002.flac
          `-- trans.txt

  6 directories, 10 files

Download kaldifeat
~~~~~~~~~~~~~~~~~~

`kaldifeat <https://github.com/csukuangfj/kaldifeat>`_ is used for extracting
features from a single sound file or multiple sound files. Please refer to
`<https://github.com/csukuangfj/kaldifeat>`_ to install ``kaldifeat`` first.
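One common installation route is via pip, followed by a quick import check (a
sketch; ``kaldifeat`` builds against your local PyTorch, so consult its README
for authoritative instructions if this fails):

.. code-block:: bash

  $ pip install kaldifeat
  $ python3 -c "import kaldifeat; print('kaldifeat ok')"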
Inference with a pre-trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  $ cd egs/librispeech/ASR
  $ ./tdnn_lstm_ctc/pretrained.py --help

shows the usage information of ``./tdnn_lstm_ctc/pretrained.py``.

To decode with the ``1best`` method, we can use:

.. code-block:: bash

  ./tdnn_lstm_ctc/pretrained.py \
    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac

The output is:

.. code-block::

  2021-08-24 16:57:13,315 INFO [pretrained.py:168] device: cuda:0
  2021-08-24 16:57:13,315 INFO [pretrained.py:170] Creating model
  2021-08-24 16:57:18,331 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
  2021-08-24 16:57:27,581 INFO [pretrained.py:199] Constructing Fbank computer
  2021-08-24 16:57:27,584 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
  2021-08-24 16:57:27,599 INFO [pretrained.py:215] Decoding started
  2021-08-24 16:57:27,791 INFO [pretrained.py:245] Use HLG decoding
  2021-08-24 16:57:28,098 INFO [pretrained.py:266]
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


  2021-08-24 16:57:28,099 INFO [pretrained.py:268] Decoding Done
To decode with the ``whole-lattice-rescoring`` method, you can use:

.. code-block:: bash

  ./tdnn_lstm_ctc/pretrained.py \
    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
    --method whole-lattice-rescoring \
    --G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt \
    --ngram-lm-scale 0.8 \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac

The decoding output is:

.. code-block::

  2021-08-24 16:39:24,725 INFO [pretrained.py:168] device: cuda:0
  2021-08-24 16:39:24,725 INFO [pretrained.py:170] Creating model
  2021-08-24 16:39:29,403 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
  2021-08-24 16:39:40,631 INFO [pretrained.py:190] Loading G from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt
  2021-08-24 16:39:53,098 INFO [pretrained.py:199] Constructing Fbank computer
  2021-08-24 16:39:53,107 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
  2021-08-24 16:39:53,121 INFO [pretrained.py:215] Decoding started
  2021-08-24 16:39:53,443 INFO [pretrained.py:250] Use HLG decoding + LM rescoring
  2021-08-24 16:39:54,010 INFO [pretrained.py:266]
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


  2021-08-24 16:39:54,010 INFO [pretrained.py:268] Decoding Done


Colab notebook
--------------

We provide a colab notebook for decoding with the pre-trained model.

|librispeech tdnn_lstm_ctc colab notebook|

.. |librispeech tdnn_lstm_ctc colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd

**Congratulations!** You have finished the TDNN-LSTM-CTC recipe on librispeech in ``icefall``.

egs/librispeech/ASR/RESULTS.md

Lines changed: 24 additions & 1 deletion
@@ -6,7 +6,7 @@

 TensorBoard log is available at https://tensorboard.dev/experiment/GnRzq8WWQW62dK4bklXBTg/#scalars

-Pretrained model is available at https://huggingface.co/pkufool/conformer_ctc
+Pretrained model is available at https://huggingface.co/pkufool/icefall_asr_librispeech_conformer_ctc

 The best decoding results (WER) are listed below. We got these results by averaging models from epoch 15 to 34, and using the `attention-decoder` decoder with num_paths equal to 100.

@@ -21,3 +21,26 @@ To get more unique paths, we scaled the lattice.scores with 0.5 (see https://git
 |test-clean|1.3|1.2|
 |test-other|1.2|1.1|

+### LibriSpeech training results (Tdnn-Lstm)
+#### 2021-08-24
+
+(Wei Kang): Result of the phone-based Tdnn-Lstm model.
+
+Icefall version: https://github.com/k2-fsa/icefall/commit/caa0b9e9425af27e0c6211048acb55a76ed5d315
+
+Pretrained model is available at https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
+
+The best decoding results (WER) are listed below. We got these results by averaging the models from epoch 14 to epoch 19 and using the `whole-lattice-rescoring` decoding method.
+
+||test-clean|test-other|
+|--|--|--|
+|WER| 6.59% | 17.69% |
+
+We searched lm_score_scale for the best results; the scales that produced the WERs above are listed below.
+
+||lm_scale|
+|--|--|
+|test-clean|0.8|
+|test-other|0.9|
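Such a scale search can be scripted. The sketch below loops `--ngram-lm-scale` values through `pretrained.py` (the flag shown in the tutorial added by this commit) on one of the bundled test waves; it illustrates the idea and is not the script that produced the table above:

```bash
cd egs/librispeech/ASR
for scale in 0.6 0.7 0.8 0.9 1.0; do
  ./tdnn_lstm_ctc/pretrained.py \
    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
    --method whole-lattice-rescoring \
    --G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt \
    --ngram-lm-scale $scale \
    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac
done
```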
Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
-
 Please visit
 <https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html>
 for how to run this recipe.

egs/librispeech/ASR/conformer_ctc/decode.py

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ def get_parser():
     - (3) nbest-rescoring. Extract n paths from the decoding lattice,
       rescore them with an n-gram LM (e.g., a 4-gram LM), the path with
       the highest score is the decoding result.
-    - (4) whole-lattice. Rescore the decoding lattice with an n-gram LM
+    - (4) whole-lattice-rescoring. Rescore the decoding lattice with an n-gram LM
       (e.g., a 4-gram LM), the best path of rescored lattice is the
       decoding result.
     - (5) attention-decoder. Extract n paths from the LM rescored lattice,
