How much memory is needed for the Speaker and Diarization Example #5986

Okohedeki · 2023-02-10T08:46:52Z


While working through the example found here, I have been running into a memory issue. The file I am using is a 10-second clip of about 1.8 MB. However, when I call

asr_decoder_ts.run_ASR(asr_model)

I run into a NumPy memory error saying it needs approximately 609 GiB. The line causing the issue is in decoder_timestamps_utils.py, where the samples are padded.

Full traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/VideoTranslate/SpeakerSplit/getSignalSampleRate.py", line 162, in <module>
    cfg_with_params = setCFGParams(cfg_with_manifest_path, data_dir)
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/VideoTranslate/SpeakerSplit/getSignalSampleRate.py", line 113, in setCFGParams
    word_hyp, word_ts_hyp = asr_decoder_ts.run_ASR(asr_model)
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/LTVE/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/decoder_timestamps_utils.py", line 656, in run_ASR_BPE_CTC
    hyp, greedy_predictions_list, log_prob = get_wer_feat_logit(
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/LTVE/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/decoder_timestamps_utils.py", line 230, in get_wer_feat_logit
    asr.read_audio_file_and_return(audio_file_path, delay, model_stride_in_secs)
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/LTVE/lib/python3.10/site-packages/nemo/collections/asr/parts/utils/decoder_timestamps_utils.py", line 258, in read_audio_file_and_return
    samples = np.pad(samples, (0, int(delay * model_stride_in_secs * self.asr_model._cfg.sample_rate)))
  File "<array_function internals>", line 180, in pad
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/LTVE/lib/python3.10/site-packages/numpy/lib/arraypad.py", line 793, in pad
    padded, original_area_slice = _pad_simple(array, pad_width)
  File "/home/okohedeki/Desktop/LanguageTranslate/src/backend/LTVE/lib/python3.10/site-packages/numpy/lib/arraypad.py", line 114, in _pad_simple
    padded = np.empty(new_shape, dtype=array.dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 609. GiB for an array with shape (240001, 681000) and data type float32
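As a sanity check on the number in the error: the requested array holds 240001 × 681000 float32 values at 4 bytes each. The snippet below only redoes that multiplication from the shape and dtype in the message; it does not allocate anything.

```python
import numpy as np

# Shape and dtype copied from the _ArrayMemoryError message above.
shape = (240001, 681000)
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per float32 element

n_bytes = shape[0] * shape[1] * itemsize
gib = n_bytes / 2**30  # NumPy reports sizes in binary gibibytes

print(f"{n_bytes:,} bytes = {gib:.1f} GiB")  # 653,762,724,000 bytes = 608.9 GiB
```

That rounds to the 609 GiB NumPy reports, so the size itself is not a miscalculation; the problem is the shape. A mono 10-second clip should produce a 1-D array of a few hundred thousand samples at most, not a 2-D array with 681000 columns.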

I believe the example audio in the tutorial is around 300 KB, so the size of my file may be the issue. However, I wanted to ask whether something might be wrong with my setup, since I don't see anyone else reporting memory issues when calling asr_decoder_ts.run_ASR(asr_model), and the array shape seems far too large for a file of that size.

Setup

Python 3.10.6
numpy==1.23.5
nemo-toolkit==1.16.0rc0
Ubuntu 22.04

Answered by Okohedeki


Okohedeki · 2023-02-11T10:18:43Z


In case anyone else runs into this error: the issue was that the input file must have only one audio channel (mono). The file I was using had 2 audio channels, which caused nemo/collections/asr/parts/utils/audio_utils.py to return a list of lists (a 2-D array), due to how soundfile and/or librosa handle multichannel audio. Because the samples were nested, the line samples = np.pad(samples, (0, int(delay * model_stride_in_secs * self.asr_model._cfg.sample_rate))) in nemo/collections/asr/parts/utils/decoder_timestamps_utils.py blew up: the formula there expects a flat, single-channel list. np.pad applies a single (before, after) pair to every axis of its input, so both dimensions of the 2-D array grew by the full pad width, and that is where the enormous memory requirement came from.
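A small NumPy-only sketch of the padding mechanism described above. When np.pad receives one (before, after) pair, it reuses that pair for every axis, so a 2-D array grows along both axes. The small pad width stands in for int(delay * model_stride_in_secs * sample_rate). The large-scale numbers at the end are hypothetical: they assume a channels-first (2, N) array and were picked only because they reproduce the exact shape in the traceback, not read from NeMo.

```python
import numpy as np

pad = 3  # stand-in for int(delay * model_stride_in_secs * sample_rate)

mono = np.zeros(10, dtype=np.float32)         # 1-D: 10 samples
stereo = np.zeros((2, 10), dtype=np.float32)  # 2-D: 2 channels x 10 samples

print(np.pad(mono, (0, pad)).shape)    # (13,): only the sample axis grows
print(np.pad(stereo, (0, pad)).shape)  # (5, 13): BOTH axes grow by `pad`

# The same arithmetic at the scale of the traceback, computed without
# allocating anything. Hypothetical inputs consistent with the error message:
channels, n_samples, pad_big = 2, 441001, 239999
print((channels + pad_big, n_samples + pad_big))  # (240001, 681000)
```

With mono input the padded array is just the clip plus a short tail of zeros; with a 2-D input the channel axis is also stretched to the pad width, which multiplies the memory requirement by roughly that factor.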

You can use something like

    ffprobe -i TestVideo.mp4 -show_entries stream=channels -select_streams a -of compact=p=0:nk=1 -v 0

to check how many channels your audio has. If it is greater than 1, use

    ffmpeg -i TestVideo.mp4 -ac 1 MonoTestVideo.mp4

to merge (downmix) the audio channels into one before passing the file as AUDIO_FILENAME in the manifest JSON.
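For WAV input, the same check and downmix can also be done from Python without ffmpeg. This is a minimal sketch using the standard-library wave module plus NumPy; the function names wav_channel_count and downmix_wav_to_mono are hypothetical, it assumes 16-bit PCM, and it simply averages the channels (ffmpeg's own downmix may differ in detail).

```python
import wave

import numpy as np


def wav_channel_count(path):
    """Return the number of audio channels in a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels()


def downmix_wav_to_mono(src, dst):
    """Average all channels of a 16-bit PCM WAV file into a mono WAV file."""
    with wave.open(src, "rb") as wf:
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    # WAV stores samples interleaved (L, R, L, R, ...): reshape to (frames, channels).
    data = np.frombuffer(frames, dtype="<i2").reshape(-1, n_channels)

    # Average across channels, then round and clip back into the int16 range.
    mono = data.astype(np.float32).mean(axis=1)
    mono = np.clip(np.round(mono), -32768, 32767).astype("<i2")

    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)  # keep the original sample rate
        out.writeframes(mono.tobytes())
```

Usage would be along the lines of calling downmix_wav_to_mono("input.wav", "input_mono.wav") whenever wav_channel_count("input.wav") > 1, then pointing the manifest at the mono file. For MP4 or other containers, the ffmpeg command above is the simpler route.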

It would be nice to have a check in audio_utils that verifies the number of channels in a file (or the shape of the loaded array) before allowing someone to proceed.
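A rough sketch of what such a guard could look like, applied to the loaded array before it is padded. The name ensure_mono and its downmix flag are hypothetical and not part of NeMo; the sketch assumes the shorter axis of a 2-D array is the channel axis, which holds for any clip longer than a handful of samples and covers both channels-first and channels-last layouts.

```python
import numpy as np


def ensure_mono(samples, downmix=True):
    """Return a 1-D float32 array of samples.

    Multichannel (2-D) input is averaged down to one channel when
    `downmix` is True; otherwise a ValueError is raised with the shape,
    instead of letting a later np.pad call silently blow up.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 1:
        return samples
    if samples.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {samples.shape}")
    if not downmix:
        raise ValueError(
            f"expected mono audio, got shape {samples.shape}; "
            "convert the file to a single channel first"
        )
    # Assumption: the channel axis is the shorter of the two.
    channel_axis = int(np.argmin(samples.shape))
    return samples.mean(axis=channel_axis)
```

With downmix=False, a stereo file like the one in this thread would presumably fail immediately with a readable shape in the message rather than a 609 GiB allocation error several calls later.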
