Support multiple parallel augmentations?

Piotr, something it would be nice to support at some point is multiple parallel versions of the same audio, each with different augmentations.
As you probably know, all our current recipes in Icefall depend on [CR-CTC](https://arxiv.org/abs/2410.05101), where we have two versions of the same audio with different SpecAug masks; in the extra loss introduced in CR-CTC, the CTC output for one copy acts as a reference for the network's CTC log-probs of the other copy.  (In the SpecAug used for CR-CTC, we use values of num_frame_masks and max_frames_mask_fraction that are 2.5 times larger than in the default setup.)
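To make the "one copy acts as a reference for the other" idea concrete, here is a minimal pure-Python sketch. It assumes the consistency term is a symmetric frame-wise KL divergence between the two copies' CTC posteriors, which is my reading of the paper and not the Icefall implementation. The function names are hypothetical, and a real version would also stop gradients through the reference copy, which plain Python cannot express.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def frame_kl(ref_log_probs: Sequence[float], log_probs: Sequence[float]) -> float:
    """KL(ref || other) for a single frame, both given as log-probabilities."""
    return sum(math.exp(r) * (r - p) for r, p in zip(ref_log_probs, log_probs))


def consistency_loss(
    logits_a: Sequence[Sequence[float]], logits_b: Sequence[Sequence[float]]
) -> float:
    """Symmetric frame-wise KL between two time-aligned copies.

    Assumption: each copy serves as the (constant) reference for the other,
    and the two directions are averaged. Requires equal numbers of frames.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("copies must have the same number of frames")
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        la, lb = log_softmax(frame_a), log_softmax(frame_b)
        # Copy b is the reference for a, and copy a is the reference for b.
        total += 0.5 * (frame_kl(lb, la) + frame_kl(la, lb))
    return total
```

Note that the frame-by-frame pairing is why the two copies need to stay the same length and time-aligned for the CTC version of this loss.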
I was speaking about this with MILA's group (@mravenelli-mila) and one of them asked me whether we did different music-and-noise augmentation in the two copies.  I said we don't.  But I wonder how hard this would be to implement in Lhotse, and whether it can be done without too-unpleasant code changes?
In our current recipe we move the SpecAug out of Lhotse.
One of those guys from MILA also mentioned that they are working on something where they have several copies of the augmented data, but the copy that produces the "reference output" stays clean without augmentations.  This is just something to bear in mind; I'm not saying we necessarily need to support this, as I don't know how the code would be structured.
In MILA's work they do the same kind of thing as we do in CR-CTC but with the attention decoder.  (I'm not sure if this is in addition to a CTC version.)  In that case there wouldn't be a natural need to keep the two copies of each utterance synchronized time-wise, because there is no concept of time in the attention-decoder outputs.
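Just to illustrate the behaviour I have in mind, here is a rough pure-Python sketch. It is not Lhotse code and does not use any Lhotse API; `make_parallel_views` and `add_noise` are hypothetical names, and the Gaussian noise is only a toy stand-in for real music-and-noise mixing. Each copy gets its own random stream so the augmentations differ, the first copy can optionally stay clean to serve as the reference, and the augmentation preserves length so the copies stay time-synchronized for the CTC case.

```python
import random
from typing import Callable, List, Sequence

# An augmentation takes the samples plus an RNG and returns new samples.
Augment = Callable[[Sequence[float], random.Random], List[float]]


def add_noise(samples: Sequence[float], rng: random.Random, scale: float = 0.05) -> List[float]:
    """Toy stand-in for noise/music mixing: adds Gaussian noise, keeps length."""
    return [s + rng.gauss(0.0, scale) for s in samples]


def make_parallel_views(
    samples: Sequence[float],
    augment: Augment,
    num_copies: int = 2,
    clean_reference: bool = False,
    seed: int = 0,
) -> List[List[float]]:
    """Return `num_copies` versions of the same audio.

    Every copy is augmented with an independent RNG derived from `seed`.
    If `clean_reference` is True, copy 0 is left unaugmented.
    """
    master = random.Random(seed)
    views: List[List[float]] = []
    for i in range(num_copies):
        # Draw a per-copy seed on every iteration so results are reproducible
        # regardless of whether copy 0 is kept clean.
        copy_rng = random.Random(master.getrandbits(64))
        if clean_reference and i == 0:
            views.append(list(samples))
        else:
            views.append(augment(samples, copy_rng))
    return views


views = make_parallel_views([0.0] * 8, add_noise, num_copies=3, clean_reference=True, seed=1)
```

For the attention-decoder variant the length-preserving constraint could be dropped, since nothing there requires the copies to line up frame by frame.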

Support multiple parallel augmentations? #1477
