Open
Description
I hit "soundfile.LibsndfileError: Error : flac decoder lost sync." while downloading and processing the ReazonSpeech "small-v1" dataset with lhotse. The error is raised during the dataset mapping phase, when the audio column is decoded.
It is reproducible: the map always fails at example 778 of 2637.
2025-03-27 14:48:34,459 INFO [config.py:54] PyTorch version 2.4.1+cu118 available.
2025-03-27 14:48:34,690 INFO [reazonspeech.py:101] Downloading ReazonSpeech part: small-v1
Downloading data: 100%|███████████████████████████████████████████████████████████████| 275k/275k [00:00<00:00, 895kB/s]
Downloading data: 100%|██████████████████████████████████████████████████████████████| 321M/321M [00:16<00:00, 20.0MB/s]
Generating train split: 2637 examples [00:00, 9166.16 examples/s]
Map: 30%|███████████████████▊ | 778/2637 [00:01<00:04, 458.21 examples/s]
Traceback (most recent call last):
  File "/home/nroy/anaconda3/envs/icefall/bin/lhotse", line 8, in <module>
    sys.exit(cli())
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/lhotse/bin/modes/recipes/reazonspeech.py", line 59, in reazonspeech
    download_reazonspeech(target_dir, dataset_parts=subset, num_jobs=num_jobs)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/lhotse/recipes/reazonspeech.py", line 119, in download_reazonspeech
    ds = ds.map(
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 560, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3055, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3428, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3320, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/lhotse/recipes/reazonspeech.py", line 113, in format_example
    example["audio_filepath"] = example["audio"]["path"]
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 279, in __getitem__
    value = self.format(key)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 377, in format
    return self.formatter.format_column(self.pa_table.select([key]))[0]
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 449, in format_column
    column = self.python_features_decoder.decode_column(column, pa_table.column_names[0])
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 225, in decode_column
    return self.features.decode_column(column, column_name) if self.features else column
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/features/features.py", line 2066, in decode_column
    [decode_nested_example(self[column_name], value) if value is not None else None for value in column]
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/features/features.py", line 2066, in <listcomp>
    [decode_nested_example(self[column_name], value) if value is not None else None for value in column]
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/features/features.py", line 1405, in decode_nested_example
    return schema.decode_example(obj, token_per_repo_id=token_per_repo_id)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/datasets/features/audio.py", line 184, in decode_example
    array, sampling_rate = sf.read(f)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/soundfile.py", line 308, in read
    data = f.read(frames, dtype, always_2d, fill_value, out)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/soundfile.py", line 942, in read
    frames = self._array_io('read', out, frames)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/soundfile.py", line 1394, in _array_io
    return self._cdata_io(action, cdata, ctype, frames)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/soundfile.py", line 1404, in _cdata_io
    _error_check(self._errorcode)
  File "/home/nroy/anaconda3/envs/icefall/lib/python3.8/site-packages/soundfile.py", line 1480, in _error_check
    raise LibsndfileError(err, prefix=prefix)
soundfile.LibsndfileError: Error : flac decoder lost sync.
Python: 3.8
PyTorch: 2.4.1+cu118
lhotse: 1.31.0.dev0+git.aa38c0f.clean
soundfile: 0.13.1
datasets: 3.1.0
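Since the failure is always at the same example, it may help to confirm which file libsndfile cannot decode, independently of lhotse and datasets. Below is a minimal sketch of such a check, not something from either library: `find_undecodable` and `_soundfile_decode` are hypothetical helper names, and the default decoder assumes the audio files exist on disk and that `soundfile.read` is the right call to reproduce the error seen in the traceback.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def _soundfile_decode(path: str) -> None:
    # Imported lazily so the helper also works with a custom decoder
    # when soundfile is not installed.
    import soundfile as sf

    # Fully decode the file, as the traceback shows datasets doing.
    sf.read(path)


def find_undecodable(
    paths: Iterable[str],
    decode: Optional[Callable[[str], object]] = None,
) -> List[Tuple[int, str, str]]:
    """Try to decode every path; return (index, path, error) per failure."""
    decode = decode or _soundfile_decode
    failures = []
    for index, path in enumerate(paths):
        try:
            decode(path)
        except Exception as exc:  # e.g. soundfile.LibsndfileError
            failures.append((index, path, f"{type(exc).__name__}: {exc}"))
    return failures
```

If the paths are passed in dataset order, a failure at index 777 or 778 would point to the file behind this report. One way to collect the paths without triggering decoding might be to cast the audio column with `datasets.Audio(decode=False)` first; whether that fits the lhotse recipe is an assumption I have not verified.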