Open
Description
See
lhotse/lhotse/dataset/speech_recognition.py
Lines 206 to 215 in aa38c0f
tol = 2e-3  # 1ms
for cut in cuts:
    for supervision in cut.supervisions:
        assert supervision.start >= -tol, (
            f"Supervisions starting before the cut are not supported for ASR"
            f" (sup id: {supervision.id}, cut id: {cut.id})"
        )
        # Supervision start time is relative to Cut ...
        # https://lhotse.readthedocs.io/en/v0.10_e/cuts.html
The code assumes that supervision.start is non-negative (up to the tolerance tol).
However, the comment at
lhotse/lhotse/dataset/speech_recognition.py
Line 214 in aa38c0f
# Supervision start time is relative to Cut ...
says that supervision.start is relative to the cut, which means it can be negative.
See also
Line 77 in aa38c0f
In some cases, the supervision might have a negative start, or a duration exceeding the duration of the cut;
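To illustrate why a cut-relative start can be negative, here is a minimal sketch in plain Python. It does not use lhotse; the helper name `to_cut_relative` and the example times are made up for illustration only.

```python
def to_cut_relative(abs_time: float, cut_start: float) -> float:
    """Convert a time measured from the start of the recording
    into a time measured from the start of the cut."""
    return abs_time - cut_start


# Hypothetical example: a supervision spans 10.0 s to 15.0 s in the
# recording, and the cut begins at 12.0 s in the same recording.
sup_abs_start, sup_abs_end = 10.0, 15.0
cut_start = 12.0

rel_start = to_cut_relative(sup_abs_start, cut_start)
rel_end = to_cut_relative(sup_abs_end, cut_start)

# The supervision begins before the cut, so its relative start is negative.
print(rel_start, rel_end)
```

Any supervision that begins before the cut but still overlaps it ends up with a negative relative start in this model.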
I found this because I have the following cut, which fails the validation.
# Edited to remove some info
MonoCut(id='xxxx', start=555.4219954648526, duration=24.081995464852607, channel=0,
supervisions=[SupervisionSegment(id='xxxx', recording_id='xxx', start=-8.854,
duration=8.954000000000065, channel=0, text='xxxx', language='zh',
speaker=None, gender=None, alignment=None)],
features=Features(type='kaldi-fbank', num_frames=2408,
num_features=80, frame_shift=0.01, sampling_rate=16000,
start=555.4219954648526, duration=24.082,
storage_type='lilcom_chunky',
storage_path='/xxxx/feats-0.lca', storage_key='501861,44277,43037,43447,43043,35284',
recording_id='None', channels=0),
recording=Recording(id='xxxx',
sources=[AudioSource(type='file', channels=[0, 1], source='xxxx.m4a')],
sampling_rate=16000, num_samples=19222768,
duration=1201.423, channel_ids=[0, 1],
transforms=[Resample(source_sampling_rate=44100,
target_sampling_rate=16000)]), custom=None)
You can see that supervisions[0].start is -8.854, which is negative.
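For reference, the check can be replayed on the numbers from this cut with a small standalone sketch. The `Sup` class is a hypothetical stand-in for SupervisionSegment (not lhotse's class), and `overlap_with_cut` is an illustrative helper, not an existing lhotse function.

```python
from dataclasses import dataclass

TOL = 2e-3  # same tolerance value as in the quoted check


@dataclass
class Sup:
    """Hypothetical stand-in for SupervisionSegment (times relative to the cut)."""
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def passes_start_check(sup: Sup, tol: float = TOL) -> bool:
    """Mirror the condition of the assertion quoted above."""
    return sup.start >= -tol


def overlap_with_cut(sup: Sup, cut_duration: float) -> float:
    """Length of the part of the supervision that lies inside [0, cut_duration]."""
    lo = max(0.0, sup.start)
    hi = min(cut_duration, sup.end)
    return max(0.0, hi - lo)


# Values copied from the cut shown above.
sup = Sup(start=-8.854, duration=8.954000000000065)
cut_duration = 24.081995464852607

ok = passes_start_check(sup)
overlap = overlap_with_cut(sup, cut_duration)
print(ok, overlap)
```

Under these assumptions the check fails, and only about 0.1 s of the supervision actually lies inside the cut.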